Ambiguity Sets and their applications to SVM


1 Ambiguity Sets and their applications to SVM
Ammon Washburn, University of Arizona
April 22, 2016

2 Introduction
Go over some (very little) set theory
Explain what φ-divergences are
Apply them to Support Vector Machines

3 Measures and Probability Measures
A measure space consists of three things, (X, 𝒳, µ): a set X, a σ-algebra 𝒳 of subsets of X, and a measure µ
After several weeks deep into measure theory you realize power sets are bad
So for X = ℝ we just take 𝒳 to be the Borel sets and µ to be Lebesgue measure
A probability measure P is a measure with total mass one: P(X) = 1

4 Overview of Important Concepts in Probability Theory
For Lebesgue measure we can use integrals: µ(A) = ∫_A dx
If a probability measure P is absolutely continuous with respect to (dominated by) Lebesgue measure, then there exists a function p(x) so that P(A) = ∫_A p(x) dx
p(x) is called the density of P, and by the properties of P we know that ∫_ℝ p(x) dx = 1
Abstractly, if P is dominated by Q then there exists a function (dP/dQ)(x) (the Radon-Nikodym derivative) so that P(A) = ∫_A (dP/dQ)(x) dQ
In other words, we only have to worry about the measure Q and then find the special function that makes it work

5 Examples
Suppose we have a probability measure that is dominated by Lebesgue measure and has density 2x·1_[0,1](x)
What is the probability of the set A = {1/2}?
What is the probability of the set A = [1, ∞)?
What is the probability of the set A = [0, 1/2]?
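As a quick numerical check (my own sketch, not from the slides), the answers are 0, 0, and 1/4: singletons and sets outside the support carry no mass, and P([0, 1/2]) = ∫₀^{1/2} 2x dx = 1/4.

```python
from scipy.integrate import quad

density = lambda x: 2 * x if 0 <= x <= 1 else 0.0

# P({1/2}) = 0: singletons are Lebesgue-null, so any density assigns them no mass.
# P([1, oo)) = 0: the density vanishes outside [0, 1]; integrate a stand-in tail.
p_tail, _ = quad(density, 1, 10)
# P([0, 1/2]) = integral of 2x from 0 to 1/2 = 1/4.
p_half, _ = quad(density, 0, 0.5)
print(p_tail, p_half)  # 0.0 0.25
```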

6 φ-divergences
We can measure how different two points are by taking their distance ‖x − y‖₂. How can we measure the distance between two distributions (two probability measures)?

D(P, Q) = ∫_X φ(dP/dQ) dQ = ∫_X φ(p(z)/q(z)) q(z) dz    (1)

where φ is a convex function with φ(1) = 0, 0·φ(a/0) ≔ a·lim_{t→∞} φ(t)/t, and 0·φ(0/0) ≔ 0.
Note that P must be dominated by Q (denoted P ≪ Q) or the divergence is infinite.
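For discrete distributions the integral in (1) becomes the sum D(p, q) = Σ_i q_i φ(p_i/q_i). Here is a minimal sketch (my own illustration, not from the slides) with the Kullback-Leibler choice φ(t) = t log(t):

```python
import numpy as np

def phi_divergence(p, q, phi):
    """D(p, q) = sum_i q_i * phi(p_i / q_i) for discrete p, q with q_i > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * phi(p / q)))

# Kullback-Leibler: phi(t) = t * log(t), with the convention 0 * log(0) = 0.
def kl_phi(t):
    t = np.asarray(t, float)
    return np.where(t > 0, t * np.log(np.where(t > 0, t, 1.0)), 0.0)

print(phi_divergence([0.2, 0.5, 0.3], [1/3, 1/3, 1/3], kl_phi))  # positive
print(phi_divergence([1/3, 1/3, 1/3], [1/3, 1/3, 1/3], kl_phi))  # 0 iff p = q
```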

7 Solving Robust Linear Optimization in the discrete case
Consider the following problem, from Ben-Tal et al. (2013):

min_w  c^T w                          (2a)
s.t.   (a + Bp)^T w ≤ β  ∀ p ∈ U      (2b)

U ⊆ ℝ^m is our uncertainty region, and we make the problem robust by requiring that the constraint be fulfilled for all p ∈ U.

8 Theorem for RLOs and φ-divergences
Theorem. Let U = {p ∈ ℝ^m : p ≥ 0, Cp ≤ d, D(p, q) ≤ ρ}. Then the constraint in equation (2) can be replaced by the following constraints:

a^T w + d^T η + ρλ + λ Σ_{i=1}^m q_i φ*((b_i^T w − c_i^T η)/λ) ≤ β    (3a)
η ≥ 0,  λ ≥ 0                                                         (3b)

Here φ* is the convex conjugate of φ, b_i is the i-th column of B, and c_i is the i-th column of C.

9 Some φ-Divergence Examples with Their Conjugates and Adjoints
[Table 4 of Ben-Tal et al. (2013), reproduced here as Figure 1. For each divergence (Kullback-Leibler, Burg entropy, J-divergence, χ²-distance, modified χ²-distance, Hellinger distance, χ-divergence of order θ > 1, variation distance, Cressie-Read) the table lists the conjugate φ*(s), e.g. φ*(s) = e^s − 1 for Kullback-Leibler, and the tractability of the robust counterpart (1): S.C. = admits a self-concordant barrier, CQP = conic quadratic program, LP = linear program.]
Figure 1: This figure is from Ben-Tal et al. (2013)

10 Proof of Theorem (1)
Our constraint can be turned into the following maximization problem:

max_{p≥0} { (a + Bp)^T w : Cp ≤ d, Σ_{i=1}^m q_i φ(p_i/q_i) ≤ ρ } ≤ β    (4)

Then we can find the Lagrangian function L and the dual objective function g:

L(p, λ, η) = (a + Bp)^T w + ρλ − λ Σ_{i=1}^m q_i φ(p_i/q_i) + η^T(d − Cp)    (5)
g(λ, η) = max_{p≥0} L(p, λ, η)                                               (6)

We then just need to check that min_{λ,η≥0} g(λ, η) ≤ β.

11 Proof of Theorem (2)
Now we can show the following:

g(λ, η) = a^T w + d^T η + ρλ + max_{p≥0} Σ_{i=1}^m (p_i(b_i^T w) − p_i(c_i^T η) − λq_i φ(p_i/q_i))
        = a^T w + d^T η + ρλ + Σ_{i=1}^m max_{p_i≥0} (p_i(b_i^T w − c_i^T η) − λq_i φ(p_i/q_i))
        = a^T w + d^T η + ρλ + Σ_{i=1}^m λq_i max_{t≥0} (t(b_i^T w − c_i^T η)/λ − φ(t))
        = a^T w + d^T η + ρλ + Σ_{i=1}^m λq_i φ*((b_i^T w − c_i^T η)/λ)

(substituting t = p_i/q_i and using the definition of the conjugate, φ*(s) = sup_{t≥0}(st − φ(t))).

12 Corollary
Theorem. If we are just concerned with probability vectors, i.e. U = {p ∈ ℝ^m : p ≥ 0, e^T p = 1, D(p, q) ≤ ρ}, then we can reduce the constraint in equation (2) to the following:

a^T w + η + ρλ + λ Σ_{i=1}^m q_i φ*((b_i^T w − η)/λ) ≤ β    (7a)
λ ≥ 0                                                        (7b)

(η is now free, since it is the multiplier of the equality constraint e^T p = 1).
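To make the corollary concrete, here is a minimal numerical sketch (my own, on made-up data): robust feasibility of a fixed w holds iff some (η, λ) makes the left-hand side of (7a) at most β, so we simply minimize that left-hand side over (η, λ), using the Kullback-Leibler conjugate φ*(s) = e^s − 1 from Figure 1.

```python
import numpy as np
from scipy.optimize import minimize

# Made-up instance of the robust constraint  a'w + p'(B'w) <= beta  for every
# probability vector p with D_KL(p, q) <= rho  (corollary, KL case).
rng = np.random.default_rng(0)
m = 5
a = rng.normal(size=3)
B = rng.normal(size=(3, m))
w = np.array([0.2, -0.1, 0.3])
q = np.full(m, 1.0 / m)            # nominal (e.g. empirical) probabilities
rho, beta = 0.05, 2.0
bw = B.T @ w                       # b_i'w, one entry per scenario i

def lhs(x):
    eta, lam = x[0], np.exp(x[1])                 # lam = e^{x[1]} > 0
    phi_star = np.exp((bw - eta) / lam) - 1.0     # KL conjugate phi*(s) = e^s - 1
    return a @ w + eta + rho * lam + lam * np.sum(q * phi_star)

res = minimize(lhs, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
print("min of (7a) LHS:", res.fun, "-> robust feasible:", res.fun <= beta)
```

Any other divergence from Figure 1 drops in by swapping out phi_star.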

13 Application (1)
Consider the robust newsvendor model (this is also from Ben-Tal et al. (2013)):

max_Q min_{p∈U} Σ_{i=1}^m p_i u(r(Q, d_i))                            (8)
s.t.  r(Q, d_i) = v·min(d_i, Q) + s(Q − d_i)₊ − l(d_i − Q)₊ − cQ      (9)

(here v is the sale price, s the salvage value, l the lost-sales penalty, and c the unit cost). We just have the historical sample frequencies q, so U = {p ∈ ℝ^m : e^T p = 1, D(p, q) ≤ ρ}.

14 Application (2)
After applying the theorem and adjusting some things we get

max_{Q,η,λ≥0} { η − ρλ − λ Σ_{i=1}^m q_i φ*((η − u(r(Q, d_i)))/λ) }    (10)

If u(x) is concave and non-decreasing and v + l ≥ s (stated incorrectly in the paper), then the problem is still convex.
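A rough numerical sketch of (10) (my own, with hypothetical data and a risk-neutral utility), maximizing over (Q, η, λ) with the Kullback-Leibler conjugate:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: v = sale price, s = salvage value, l = lost-sale penalty, c = cost.
v, s, l, c = 10.0, 2.0, 4.0, 6.0          # v + l >= s keeps r(Q, d) concave in Q
d = np.array([20.0, 40.0, 60.0, 80.0])    # historical demand levels
q = np.full(d.size, 0.25)                 # their sample frequencies
rho = 0.1                                 # KL-divergence radius
u = lambda x: x                           # risk-neutral utility: concave, non-decreasing

def r(Q, d):
    return (v * np.minimum(d, Q) + s * np.maximum(Q - d, 0.0)
            - l * np.maximum(d - Q, 0.0) - c * Q)

def neg_obj(x):                           # maximizing (10) = minimizing its negative
    Q, eta, lam = x[0], x[1], np.exp(x[2])
    phi_star = np.exp((eta - u(r(Q, d))) / lam) - 1.0   # KL conjugate phi*(s) = e^s - 1
    return -(eta - rho * lam - lam * np.sum(q * phi_star))

res = minimize(neg_obj, x0=np.array([50.0, 0.0, np.log(50.0)]), method="Nelder-Mead")
print("robust order quantity Q* ~", res.x[0], "worst-case expected utility ~", -res.fun)
```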

15 Ambiguity and SVM
For most applications of SVM, the uncertainty in the data is continuous (w.r.t. Lebesgue measure) and not discrete
The φ-divergence between a continuous and a discrete distribution is always infinite
Maybe we can just get probabilities from the nominal distribution and then turn the discrete nominal distribution into a continuous one
Let's try the same strategy as Ben-Tal et al. (2013) but with continuous distributions

16 SVM with the Kantorovich metric
The Kantorovich metric ρ is defined as follows:

ρ(P, Q) ≔ inf_K { ∫_{X×X} d(x, y) K(dx, dy) : the marginals of K are P and Q }    (11)

Now the distance between a continuous and a discrete distribution is not infinite.
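A small illustration of that point (my own sketch, approximating the continuous distribution by a large sample and using SciPy's one-dimensional 1-Wasserstein implementation):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
empirical = rng.normal(size=30)         # a discrete distribution: 30 data points
continuous = rng.normal(size=100_000)   # large sample standing in for N(0, 1)

# Any phi-divergence between the empirical measure and N(0, 1) is infinite,
# but the Kantorovich (1-Wasserstein) distance is finite and small:
print(wasserstein_distance(empirical, continuous))
```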

17 Semi-infinite DR SVM
The DR-SVM, with uncertainty set U defined via the Kantorovich metric, is

min_{w,b}  (1/2) w^T w + C sup_{P∈U} ∫ [1 − y(x^T w − b)]₊ P(dx, dy)    (12)

and is equivalent to (Lee and Mehrotra)

min_{w,b,ξ,u}  (1/2) w^T w + C((1/m) Σ_{j=1}^m ξ_j + ηu)                          (13a)
s.t.  ξ_j ≥ 1 − y(x^T w − b) − [d(x, x_j) + d_y(y, y_j)]u,  ∀(x, y) ∈ X×Y, ∀j     (13b)
      ξ ≥ 0,  u ≥ 0                                                               (13c)

where η is the radius of the Kantorovich ball.

18 Sketch of proof
We just consider the second part of (12) and use a fact from Shapiro (2001): the following is a primal-dual pair for our problem.

max_{P∈U}  E_P[f(x, y)]
s.t.  E_P[g_i(x, y)] = b_i,  i = 1, ..., t

min_ξ  b^T ξ
s.t.  Σ_{i=1}^t ξ_i g_i(x, y) − f(x, y) ∈ C

where C = {f : ∫_X f(x, y) P(dx, dy) ≥ 0, ∀ P ∈ U}.

19 KL Divergence
Let φ(x) = x log(x). This gives us the Kullback-Leibler divergence:

D(P, P₀) = ∫_X p(z) log(p(z)/p₀(z)) dz    (14)

We would like to use this to bound, in a robust way, the true probability P via the nominal probability P₀, which is given by the data
Define U = {P : D(P, P₀) ≤ η}

20 Result
Take our original problem and make it robust:

min h(w)  s.t.  min_{P∈U} Pr_P{H(w, x) ≤ 0} ≥ 1 − ε

then transform it into a tractable problem (Hu and Hong (2013)) where there is no ambiguity:

min h(w)  s.t.  Pr_{P₀}{H(w, x) ≤ 0} ≥ 1 − ε̄

where ε̄ = sup_{t>0} (e^{−η}(t + 1)^ε − 1)/t.
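The adjusted risk level ε̄ is a one-dimensional maximization over t, so it is cheap to compute numerically. A minimal sketch (my own, not from the slides):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def adjusted_eps(eps, eta):
    """eps_bar = sup_{t>0} (e^{-eta} * (t + 1)^eps - 1) / t."""
    obj = lambda t: -(np.exp(-eta) * (t + 1.0) ** eps - 1.0) / t
    res = minimize_scalar(obj, bounds=(1e-9, 1e6), method="bounded")
    return -res.fun

print(adjusted_eps(0.05, 0.0))  # ~0.05: zero ambiguity recovers the nominal risk level
print(adjusted_eps(0.05, 0.1))  # much smaller: ambiguity forces a stricter nominal level
```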

21 Proof (1)
Consider the following problem:

min_{w∈W} max_{P∈U} E_P[H(w, x)]

By multiplying by p₀(x)/p₀(x) and letting L(x) = p(x)/p₀(x), we can change the inner maximization problem to look like this:

max_{L∈𝓛}  E_{P₀}[H(w, x)L(x)]
s.t.  E_{P₀}[L log(L)] ≤ η

where 𝓛 = {L : E_{P₀}[L] = 1, L ≥ 0 a.s.}.

22 Proof (2)
Form the Lagrangian of the inner problem:

l(α, L) = E_{P₀}[H(w, x)L(x)] − α(E_{P₀}[L(x) log(L(x))] − η)

If we maximize this over L ∈ 𝓛 and then take the minimum over α ≥ 0, then we solve the dual, and by the convexity of the problem the two values are equal:

max_L  E_{P₀}[H(w, x)L(x) − αL(x) log(L(x))]
s.t.  E_{P₀}[L(x)] = 1,  L(x) ≥ 0

23 Proof (3)
Now we form the functionals J(f) = E_{P₀}[H(w, x)f(x) − αf(x) log(f(x))] and J_c(f) = E_{P₀}[f(x)] − 1. We then set up an unconstrained optimization problem for these functionals and solve it. After some functional analysis you get

L*(x) = e^{H(w,x)/α} / E_{P₀}[e^{H(w,x)/α}]    (15)

Plug that back into l(L, α) and you get

l(L*, α) = v(α) = α log(E_{P₀}[e^{H(w,x)/α}]) + αη
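A numerical sanity check of (15) (my own sketch, approximating P₀ by a large Monte Carlo sample): at the α minimizing v(α), the density ratio L* attains the dual value and spends exactly the KL budget η.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
H = rng.normal(size=200_000)   # Monte Carlo stand-in for H(w, x) under P0
eta = 0.1

v = lambda a: a * np.log(np.mean(np.exp(H / a))) + a * eta
res = minimize_scalar(v, bounds=(0.1, 100.0), method="bounded")
a_star = res.x

L = np.exp(H / a_star) / np.mean(np.exp(H / a_star))   # equation (15) on the sample
print("dual bound v(a*):", res.fun)
print("E_P0[H L]:       ", np.mean(H * L))              # attains the bound
print("E_P0[L log L]:   ", np.mean(L * np.log(L)))      # ~ eta at the optimal alpha
```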

24 Proof (4)
Now just notice that Pr_{P₀}(H(w, x) ≤ 0) = E_{P₀}[1_{H(w,x)≤0}(x)], plug this into the solution above, and you get the constraint from before:

Pr_{P₀}{H(w, x) ≤ 0} ≥ 1 − ε̄,  where  ε̄ = sup_{t>0} (e^{−η}(t + 1)^ε − 1)/t

25 References
Aharon Ben-Tal, Dick den Hertog, Anja De Waegenaere, Bertrand Melenberg, and Gijs Rennen. Robust solutions of optimization problems affected by uncertain probabilities. Management Science, 59(2):341-357, 2013.
Zhaolin Hu and L. Jeff Hong. Kullback-Leibler divergence constrained distributionally robust optimization. Available at Optimization Online, 2013.
Changhyeok Lee and Sanjay Mehrotra. A distributionally-robust approach for finding support vector machines. Not yet published.
Alexander Shapiro. On duality theory of conic linear problems. In Semi-Infinite Programming, pages 135-165. Kluwer, 2001.
