Ambiguity Sets and their applications to SVM
1 Ambiguity Sets and their applications to SVM Ammon Washburn University of Arizona April 22, 2016 Ammon Washburn (University of Arizona) Ambiguity Sets April 22, / 25
2 Introduction
- Go over some (very little) set theory
- Explain what φ-divergences are
- Apply them to Support Vector Machines
3 Measures and Probability Measures
- A measure space consists of three things, $(X, \mathcal{X}, \mu)$: a set $X$, a σ-algebra $\mathcal{X}$ of measurable subsets of $X$, and a measure $\mu$.
- After several weeks deep into measure theory you realize power sets are bad (not every subset of $\mathbb{R}$ can be assigned a Lebesgue measure).
- So for $X = \mathbb{R}$ we just take $\mathcal{X}$ to be the Borel sets and $\mu$ to be Lebesgue measure.
- A probability measure $P$ is a measure with total mass one: $P(X) = 1$.
4 Overview of Important Concepts in Probability Theory
- For Lebesgue measure we can use integrals: $\mu(A) = \int_A dx$.
- If a probability measure $P$ is absolutely continuous with respect to (dominated by) Lebesgue measure, then there exists a function $p(x)$ such that $P(A) = \int_A p(x)\,dx$.
- $p(x)$ is called the density of $P$, and by the properties of $P$ we know that $\int_{\mathbb{R}} p(x)\,dx = 1$.
- Abstractly, if $P$ is dominated by $Q$, then there exists a function $\frac{dP}{dQ}(x)$ such that $P(A) = \int_A \frac{dP}{dQ}(x)\,dQ$.
- In other words, we only have to worry about the measure $Q$ and then find the special function that makes it work.
5 Examples
Suppose we have a probability measure that is dominated by Lebesgue measure and has density $2x\,1_{[0,1]}(x)$.
- What is the probability of the set $A = \{1/2\}$?
- What is the probability of the set $A = [1, \infty)$?
- What is the probability of the set $A = [0, 1/2]$?
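For reference, the three probabilities can be worked out directly from the density $p(x) = 2x\,1_{[0,1]}(x)$. A single point has Lebesgue measure zero, and the density vanishes on $(1, \infty)$:

```latex
\begin{aligned}
P(\{1/2\}) &= \int_{\{1/2\}} 2x \, dx = 0, \\
P([1, \infty)) &= \int_1^\infty 2x \, 1_{[0,1]}(x) \, dx = 0, \\
P([0, 1/2]) &= \int_0^{1/2} 2x \, dx = \left[ x^2 \right]_0^{1/2} = \tfrac{1}{4}.
\end{aligned}
```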
6 φ-divergences
We can measure how different two points are by taking their distance $\|x - y\|_2$. How can we measure the distance between two distributions (two probability measures)?
$$D(P, Q) = \int \phi\!\left(\frac{dP}{dQ}\right) dQ = \int_X \phi\!\left(\frac{p(z)}{q(z)}\right) q(z)\,dz \tag{1}$$
where $\phi$ is a convex function with $\phi(1) = 0$, $0\,\phi(a/0) := a \lim_{t \to \infty} \phi(t)/t$, and $0\,\phi(0/0) := 0$. Note that $P$ must be dominated by $Q$ (denoted $P \ll Q$); otherwise the divergence is infinite whenever $\lim_{t \to \infty} \phi(t)/t = \infty$, as it is for the Kullback-Leibler divergence.
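A minimal sketch of definition (1) for discrete distributions, where the integral becomes $\sum_i q_i\,\phi(p_i/q_i)$. The function names, the example vectors, and the `slope_at_inf` parameter are illustrative assumptions, not part of the slides. The Kullback-Leibler generator is written here as $t\log t - t + 1$, which gives the same divergence as $t \log t$ when both vectors sum to one.

```python
import math

def phi_kl(t):
    """Kullback-Leibler generator phi(t) = t*log(t) - t + 1, with phi(0) = 1."""
    return 1.0 if t == 0 else t * math.log(t) - t + 1.0

def phi_chi2_modified(t):
    """Modified chi-squared generator phi(t) = (t - 1)^2."""
    return (t - 1.0) ** 2

def phi_divergence(p, q, phi, slope_at_inf=math.inf):
    """D(p, q) = sum_i q_i * phi(p_i / q_i) for discrete distributions.

    slope_at_inf stands for lim_{t->inf} phi(t)/t and is used when q_i = 0 < p_i.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if q_i > 0:
            total += q_i * phi(p_i / q_i)
        elif p_i > 0:
            # convention 0*phi(a/0) = a * lim phi(t)/t
            total += p_i * slope_at_inf
        # q_i == 0 and p_i == 0 contributes 0 by the convention 0*phi(0/0) = 0
    return total

q = [0.5, 0.3, 0.2]  # hypothetical nominal distribution
p = [0.4, 0.4, 0.2]  # hypothetical alternative distribution
kl = phi_divergence(p, q, phi_kl)
chi2 = phi_divergence(p, q, phi_chi2_modified)
# p puts mass where q has none, so p is not dominated by q
not_dominated = phi_divergence([0.5, 0.5], [1.0, 0.0], phi_kl)
```

In this sketch the last call returns infinity, which is the discrete analogue of the domination requirement $P \ll Q$.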
7 Solving Robust Linear Optimization in the discrete case
Consider the following problem, from Ben-Tal et al. (2013):
$$\min_{w} \; c^\top w \tag{2a}$$
$$\text{s.t.} \quad (a + Bp)^\top w \le \beta \quad \forall p \in U \tag{2b}$$
$U \subseteq \mathbb{R}^m$ is our uncertainty region, and we make the problem robust by requiring that the constraint be fulfilled for all $p \in U$.
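8 Theorem for RLOs and φ-divergences
Theorem: Let $U = \{p \in \mathbb{R}^m \mid p \ge 0,\ Cp \le d,\ D(p, q) \le \rho\}$. Then the constraint (2b) can be replaced by the following constraints:
$$a^\top w + d^\top \eta + \rho\lambda + \lambda \sum_{i=1}^m q_i\, \phi^*\!\left(\frac{b_i^\top w - c_i^\top \eta}{\lambda}\right) \le \beta \tag{3a}$$
$$\eta \ge 0, \quad \lambda \ge 0 \tag{3b}$$
Here $\phi^*(s) = \sup_{t \ge 0} \{st - \phi(t)\}$ is the conjugate of $\phi$, and $b_i$ and $c_i$ are the $i$-th columns of $B$ and $C$.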
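9 Table 4: Some φ-divergence examples with their conjugates and adjoints
Each entry lists the conjugate $\phi^*(s)$, the adjoint $\tilde{\phi}(t)$, and the tractability class of the robust counterpart (RCP).
- Kullback-Leibler: $\phi^*(s) = e^s - 1$; adjoint $\phi_b(t)$; RCP: S.C.
- Burg entropy: $\phi^*(s) = -\log(1 - s)$, $s < 1$; adjoint $\phi_{kl}(t)$; RCP: S.C.
- J-divergence: no closed form; adjoint $\phi_j(t)$; RCP: S.C.
- χ²-distance: $\phi^*(s) = 2 - 2\sqrt{1 - s}$, $s < 1$; adjoint $\phi_{mc}(t)$; RCP: CQP
- Modified χ²-distance: $\phi^*(s) = -1$ for $s < -2$ and $s + s^2/4$ for $s \ge -2$; adjoint $\phi_c(t)$; RCP: CQP
- Hellinger distance: $\phi^*(s) = s/(1 - s)$, $s < 1$; adjoint $\phi_h(t)$; RCP: CQP
- χ-divergence of order $\theta > 1$: $\phi^*(s) = s + (\theta - 1)(|s|/\theta)^{\theta/(\theta - 1)}$; adjoint $t^{1-\theta}\phi_{ca}(t)$
- Variation distance: $\phi^*(s) = -1$ for $s \le -1$ and $s$ for $-1 \le s \le 1$; adjoint $\phi_v(t)$; RCP: LP
- Cressie-Read: $\phi^*(s) = \frac{1}{\theta}\left(1 - s(1 - \theta)\right)^{\theta/(\theta - 1)} - \frac{1}{\theta}$, $s < 1/(1 - \theta)$; adjoint $\phi_{cr}^{1-\theta}(t)$; RCP: CQP
Notes: RCP indicates the tractability of the robust counterpart. S.C.: admits a self-concordant barrier. CQP: conic quadratic program. LP: linear program.
Figure 1: This figure is from Ben-Tal et al. (2013).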
10 Proof of Theorem (1)
Our constraint can be turned into the following maximization problem:
$$\beta \ge \max_{p \ge 0} \left\{ (a + Bp)^\top w \;\middle|\; Cp \le d,\ \sum_{i=1}^m q_i\, \phi\!\left(\frac{p_i}{q_i}\right) \le \rho \right\} \tag{4}$$
Then we can find the Lagrangian function $L$ and the dual objective function $g$:
$$g(\lambda, \eta) = \max_{p \ge 0} L(p, \lambda, \eta) \tag{5}$$
$$= \max_{p \ge 0} \left\{ (a + Bp)^\top w + \rho\lambda - \lambda \sum_{i=1}^m q_i\, \phi\!\left(\frac{p_i}{q_i}\right) + \eta^\top (d - Cp) \right\} \tag{6}$$
We just need to ensure that $\min_{\lambda, \eta \ge 0} g(\lambda, \eta) \le \beta$.
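11 Proof of Theorem (2)
Now we can show the following:
$$g(\lambda, \eta) = a^\top w + d^\top \eta + \rho\lambda + \max_{p \ge 0} \sum_{i=1}^m \left( p_i (b_i^\top w) - p_i (c_i^\top \eta) - \lambda q_i\, \phi(p_i/q_i) \right)$$
$$= a^\top w + d^\top \eta + \rho\lambda + \sum_{i=1}^m \max_{p_i \ge 0} \left( p_i (b_i^\top w - c_i^\top \eta) - \lambda q_i\, \phi(p_i/q_i) \right)$$
$$= a^\top w + d^\top \eta + \rho\lambda + \sum_{i=1}^m \lambda q_i \max_{t \ge 0} \left( t (b_i^\top w - c_i^\top \eta)/\lambda - \phi(t) \right)$$
$$= a^\top w + d^\top \eta + \rho\lambda + \sum_{i=1}^m \lambda q_i\, \phi^*\!\left( (b_i^\top w - c_i^\top \eta)/\lambda \right)$$
The third line substitutes $t = p_i/q_i$, and the last line uses the definition of the conjugate, $\phi^*(s) = \max_{t \ge 0} (st - \phi(t))$.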
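The last step of the derivation relies on the conjugate $\phi^*(s) = \max_{t \ge 0}(ts - \phi(t))$. As a hedged numerical illustration, the sketch below approximates that maximum for the Kullback-Leibler generator $\phi(t) = t\log t - t + 1$ and compares it with the closed form $e^s - 1$. The ternary search, the bound `t_max`, and the function names are assumptions of this sketch.

```python
import math

def phi_kl(t):
    """Kullback-Leibler generator phi(t) = t*log(t) - t + 1, with phi(0) = 1."""
    return 1.0 if t == 0 else t * math.log(t) - t + 1.0

def conjugate(phi, s, t_max=100.0, iters=200):
    """Approximate phi*(s) = sup_{t >= 0} (s*t - phi(t)) by ternary search.

    Valid when t -> s*t - phi(t) is concave (phi convex) and the maximizer
    lies in [0, t_max]; t_max is an assumption of this sketch.
    """
    lo, hi = 0.0, t_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if s * m1 - phi(m1) < s * m2 - phi(m2):
            lo = m1  # the maximizer cannot lie left of m1
        else:
            hi = m2  # the maximizer cannot lie right of m2
    t = 0.5 * (lo + hi)
    return s * t - phi(t)

# (s, numerical conjugate, closed form e^s - 1)
checks = [(s, conjugate(phi_kl, s), math.exp(s) - 1.0)
          for s in (-2.0, -0.5, 0.0, 1.0, 2.0)]
```

For this generator the maximizer is $t = e^s$, so the search interval above covers every value of $s$ in the example.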
12 Corollary
Theorem: If we are only concerned with probability vectors, then we can reduce the constraint (2b) to the following:
$$a^\top w + \eta + \rho\lambda + \lambda \sum_{i=1}^m q_i\, \phi^*\!\left(\frac{b_i^\top w - \eta}{\lambda}\right) \le \beta \tag{7a}$$
$$\lambda \ge 0 \tag{7b}$$
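To see the corollary at work, the following sketch evaluates the worst-case expectation $\sup_p \sum_i p_i v_i$ over probability vectors with Kullback-Leibler divergence at most $\rho$ from $q$, using the left-hand side of (7a) with $a = 0$ and $b_i^\top w = v_i$. It assumes the Kullback-Leibler conjugate $\phi^*(s) = e^s - 1$; in that case minimizing over $\eta$ in closed form leaves $\lambda \log \sum_i q_i e^{v_i/\lambda} + \rho\lambda$, which is minimized over $\lambda$ by ternary search. All names, bounds, and numbers are illustrative assumptions.

```python
import math

def log_mgf(v, q, lam):
    """Numerically stable log sum_i q_i * exp(v_i / lam)."""
    m = max(v) / lam
    return m + math.log(sum(q_i * math.exp(v_i / lam - m) for v_i, q_i in zip(v, q)))

def dual_objective(v, q, rho, lam):
    """eta + rho*lam + lam * sum_i q_i * (exp((v_i - eta)/lam) - 1) at the best eta.

    Setting the eta-derivative to zero gives eta = lam * log sum_i q_i exp(v_i/lam),
    which collapses the expression to lam * log_mgf + rho * lam.
    """
    return lam * log_mgf(v, q, lam) + rho * lam

def worst_case_mean(v, q, rho, lam_lo=1e-6, lam_hi=1e3, iters=300):
    """Minimize the dual objective (convex in lam) by ternary search."""
    lo, hi = lam_lo, lam_hi
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if dual_objective(v, q, rho, m1) > dual_objective(v, q, rho, m2):
            lo = m1
        else:
            hi = m2
    lam = 0.5 * (lo + hi)
    return dual_objective(v, q, rho, lam), lam

q = [0.5, 0.3, 0.2]   # hypothetical nominal frequencies
v = [1.0, 2.0, 4.0]   # hypothetical values b_i^T w
rho = 0.05
value, lam = worst_case_mean(v, q, rho)

# Recover the worst-case distribution p_i proportional to q_i * exp(v_i / lam)
weights = [q_i * math.exp(v_i / lam) for v_i, q_i in zip(v, q)]
p = [w_i / sum(weights) for w_i in weights]
primal_value = sum(p_i * v_i for p_i, v_i in zip(p, v))
kl = sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q))
nominal = sum(q_i * v_i for q_i, v_i in zip(q, v))
```

The recovered distribution sits on the boundary of the divergence ball and attains the dual value, which is a quick sanity check that the dual reformulation is tight in this example.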
13 Application (1)
Consider the robust newsvendor model (this is also from Ben-Tal et al. (2013)):
$$\max_{Q} \min_{p \in U} \sum_{i=1}^m p_i\, u(r(Q, i)) \tag{8}$$
$$\text{s.t.} \quad r(Q, i) = v \min(d_i, Q) + s (Q - d_i)^+ - l (d_i - Q)^+ - cQ \tag{9}$$
We only have the historical sample frequencies $q$, so $U = \{p \in \mathbb{R}^m \mid p \ge 0,\ e^\top p = 1,\ D(p, q) \le \rho\}$.
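A small sketch of the revenue function in (9) and of the expected utility for one fixed probability vector. Reading $v$ as the unit selling price, $s$ as the salvage value, $l$ as the shortage cost, and $c$ as the purchase cost is an assumption of this sketch, as are the function names and the numbers.

```python
def revenue(Q, d, v, s, l, c):
    """r(Q, d) = v*min(d, Q) + s*(Q - d)^+ - l*(d - Q)^+ - c*Q."""
    return v * min(d, Q) + s * max(Q - d, 0.0) - l * max(d - Q, 0.0) - c * Q

def expected_utility(Q, demands, probs, u, **prices):
    """sum_i p_i * u(r(Q, d_i)) for one fixed probability vector p."""
    return sum(p_i * u(revenue(Q, d_i, **prices)) for d_i, p_i in zip(demands, probs))

prices = dict(v=6.0, s=2.0, l=4.0, c=4.0)      # hypothetical prices and costs
leftover_case = revenue(10.0, 8.0, **prices)   # order exceeds demand
shortage_case = revenue(10.0, 15.0, **prices)  # demand exceeds order
risk_neutral = expected_utility(10.0, [8.0, 15.0], [0.5, 0.5], lambda x: x, **prices)
```

The robust model (8) replaces the fixed vector `probs` by the worst case over the ambiguity set $U$.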
14 Application (2)
After applying the theorem and adjusting some things we get
$$\max_{Q, \eta, \lambda} \left\{ -\eta - \rho\lambda - \lambda \sum_{i=1}^m q_i\, \phi^*\!\left(\frac{-u(r(Q, d_i)) - \eta}{\lambda}\right) \right\} \tag{10}$$
If $u(x)$ is concave and non-decreasing and $v + l \ge s$ (stated incorrectly in the paper), then the problem is still convex: $r(Q, d_i)$ is then concave in $Q$, and $u(r(Q, d_i))$ inherits that concavity.
15 Ambiguity and SVM
- For most applications of SVM, the uncertainty in the data is continuous (w.r.t. Lebesgue measure) and not discrete.
- The divergence between a continuous distribution and a discrete one is infinite for divergences such as Kullback-Leibler, because neither distribution dominates the other.
- Maybe we can just get probabilities in the nominal distribution and then turn the discrete nominal distribution into a continuous one.
- Let's try the same strategy as Ben-Tal et al. (2013), but with continuous distributions.
16 SVM with Kantorovich metric
The Kantorovich metric $\rho$ is defined as follows:
$$\rho(P, Q) := \inf_{K} \left\{ \int_{X \times X} d(x, y)\, K(dx, dy) \;:\; \text{the marginals of } K \text{ are } P \text{ and } Q \right\} \tag{11}$$
Now the distance between a continuous and a discrete distribution is not infinite.
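As a hedged one-dimensional illustration of (11): for two empirical distributions on the real line with the same number of equally weighted atoms and $d(x, y) = |x - y|$, an optimal coupling pairs the sorted samples, so the infimum reduces to an average of absolute differences. The function name and the example are assumptions of this sketch. The fine grid below is only a discretized stand-in for the continuous Uniform(0, 1) distribution.

```python
def kantorovich_1d(xs, ys):
    """Kantorovich (Wasserstein-1) distance between two empirical distributions
    on the real line with equally many, equally weighted atoms and d(x, y) = |x - y|.
    In one dimension an optimal coupling pairs the sorted samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many atoms")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

n = 1000
uniform_grid = [(i + 0.5) / n for i in range(n)]  # stand-in for Uniform(0, 1)
point_mass = [0.5] * n                            # discrete distribution at 0.5
distance = kantorovich_1d(point_mass, uniform_grid)
```

The result is close to $\int_0^1 |x - 0.5|\,dx = 0.25$, a finite distance between a point mass and an (approximately) continuous distribution.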
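17 Semi-infinite DR SVM
The DR-SVM with the uncertainty set $U$ defined through the Kantorovich metric is
$$\min_{w, b} \; \tfrac{1}{2} w^\top w + C \sup_{P \in U} \int \left[1 - y(x^\top w - b)\right]_+ P(dx, dy) \tag{12}$$
which is equivalent to (Lee and Mehrotra)
$$\min_{w, b, \xi, u} \; \tfrac{1}{2} w^\top w + C \left( \frac{1}{m} \sum_{j=1}^m \xi_j + \eta u \right)$$
$$\text{s.t.} \quad \xi_j \ge 1 - y(x^\top w - b) - \left[d(x, x_j) + d_y(y, y_j)\right] u, \tag{13a}$$
$$\forall (x, y) \in X,\ \forall j \tag{13b}$$
$$\xi, u \ge 0 \tag{13c}$$
Constraint (13a) must hold for every $(x, y) \in X$, which is what makes the program semi-infinite.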
18 Sketch of proof
We consider only the second part of (12) and use a fact from Shapiro (2001): the following are the primal and dual of our problem.
$$\max_{P \in U} \; E_P[f(x, y)] \quad \text{s.t.} \quad E_P[g_i(x, y)] = b_i, \quad i = 1, \ldots, t$$
$$\min_{\xi \ge 0} \; b^\top \xi \quad \text{s.t.} \quad \sum_{i=1}^t \xi_i\, g_i(x, y) - f(x, y) \in C$$
where $C = \{ f : X \to \mathbb{R} \;:\; \int_X f(x, y)\, P(dx, dy) \ge 0,\ \forall P \in U \}$.
19 KL Divergence
Let $\phi(x) = x \log(x)$. This gives us the Kullback-Leibler divergence:
$$D(P, P_0) = \int_X p(z) \log\!\left(\frac{p(z)}{p_0(z)}\right) dz \tag{14}$$
We would like to use this to bound, in a robust way, the true probability $P$ by the nominal probability $P_0$, which is given by the data. Define $U = \{P : D(P, P_0) \le \eta\}$.
20 Result
Take our original problem and make it robust:
$$\min \; h(w) \quad \text{s.t.} \quad \min_{P \in U} \Pr_P\{H(w, x) \le 0\} \ge 1 - \epsilon$$
Then transform it into a tractable problem (Hu and Hong (2013)) in which there is no ambiguity:
$$\min \; h(w) \quad \text{s.t.} \quad \Pr_{P_0}\{H(w, x) \le 0\} \ge 1 - \bar{\epsilon}$$
where
$$\bar{\epsilon} = \sup_{t > 0} \frac{e^{-\eta} (t + 1)^{\epsilon} - 1}{t}$$
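The adjusted risk level can be evaluated numerically. The sketch below approximates $\sup_{t > 0} \left(e^{-\eta}(t+1)^{\epsilon} - 1\right)/t$ on a logarithmic grid. The grid bounds, grid size, and function name are assumptions of this sketch, and a grid search only approximates the supremum from below.

```python
import math

def adjusted_epsilon(eps, eta, t_min=1e-8, t_max=1e8, n_grid=20001):
    """Approximate sup_{t>0} (exp(-eta) * (t + 1)**eps - 1) / t on a log grid."""
    best = -math.inf
    for k in range(n_grid):
        t = t_min * (t_max / t_min) ** (k / (n_grid - 1))
        # expm1/log1p keep precision when t is tiny:
        # exp(-eta) * (1 + t)**eps - 1 == expm1(eps * log1p(t) - eta)
        value = math.expm1(eps * math.log1p(t) - eta) / t
        best = max(best, value)
    return best

no_ambiguity = adjusted_epsilon(0.05, 0.0)     # eta = 0: the ball is just {P_0}
with_ambiguity = adjusted_epsilon(0.05, 0.01)  # eta > 0: a stricter nominal level
```

With $\eta = 0$ the value approaches $\epsilon$ itself as $t \to 0$, and with $\eta > 0$ it is smaller, so the nominal chance constraint becomes more conservative.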
21 Proof (1)
Consider the following problem:
$$\min_{w \in W} \max_{P \in U} E_P[H(w, x)]$$
By multiplying by $p_0(x)/p_0(x)$ and letting $L(x) = p(x)/p_0(x)$, we can change the inner maximization problem to look like this:
$$\max_{L \in \mathcal{L}} E_{P_0}[H(w, x) L(x)] \quad \text{s.t.} \quad E_{P_0}[L \log(L)] \le \eta$$
where $\mathcal{L} = \{L \mid E_{P_0}[L] = 1,\ L \ge 0 \text{ a.s.}\}$.
22 Proof (2)
$$l(\alpha, L) = E_{P_0}[H(w, x) L(x)] - \alpha \left( E_{P_0}[L(x) \log(L(x))] - \eta \right)$$
If we maximize this over $L \in \mathcal{L}$ and then take the minimum over $\alpha \ge 0$, we solve the dual problem, and by the convexity of the problem the primal and dual values are equal. For fixed $\alpha$ the term $\alpha\eta$ is constant, so the inner problem is
$$\max_{L} \; E_{P_0}[H(w, x) L(x) - \alpha L(x) \log(L(x))] \quad \text{s.t.} \quad E_{P_0}[L(x)] = 1, \quad L(x) \ge 0$$
23 Proof (3)
Now we define the functionals $J(f) = E_{P_0}[H(w, x) f(x) - \alpha f(x) \log(f(x))]$ and $J_c(f) = E_{P_0}[f(x)] - 1$. We then set up an unconstrained optimization problem for these functionals and solve it. After some functional analysis you get
$$L^*(x) = \frac{e^{H(w, x)/\alpha}}{E_{P_0}[e^{H(w, x)/\alpha}]} \tag{15}$$
Plug that back into $l(\alpha, L)$ and you get
$$l(\alpha, L^*) = v(\alpha) = \alpha \log\!\left(E_{P_0}[e^{H(w, x)/\alpha}]\right) + \alpha\eta$$
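The closed form in (15) can be checked on a small finite support. The sketch below builds $L^*$, confirms that it integrates to one under $P_0$, and compares $E_{P_0}[H L^* - \alpha L^* \log L^*]$ with $\alpha \log E_{P_0}[e^{H/\alpha}]$. The support values, probabilities, and $\alpha$ are hypothetical numbers chosen for illustration.

```python
import math

def worst_case_ratio(h, p0, alpha):
    """L*(x) = exp(H/alpha) / E_P0[exp(H/alpha)] on a finite support."""
    z = sum(p * math.exp(h_i / alpha) for h_i, p in zip(h, p0))
    return [math.exp(h_i / alpha) / z for h_i in h], z

h = [-1.0, 0.5, 2.0]   # hypothetical values of H(w, x) at three support points
p0 = [0.2, 0.5, 0.3]   # hypothetical nominal probabilities
alpha = 1.5

L, z = worst_case_ratio(h, p0, alpha)
mass = sum(p * l for p, l in zip(p0, L))  # E_P0[L*], should equal 1
inner = sum(p * (h_i * l - alpha * l * math.log(l)) for h_i, p, l in zip(h, p0, L))
closed_form = alpha * math.log(z)         # alpha * log E_P0[exp(H/alpha)]
baseline = sum(p * h_i for h_i, p in zip(h, p0))  # objective at L = 1
```

Here `baseline` is the objective at the feasible choice $L \equiv 1$, which should not exceed the value attained by $L^*$.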
24 Proof (4)
Now just notice that $\Pr_{P_0}\{H(w, x) \le 0\} = E_{P_0}[1_{\{H(w, x) \le 0\}}(x)]$, plug it into the above solution, and you get the constraint from before:
$$\Pr_{P_0}\{H(w, x) \le 0\} \ge 1 - \bar{\epsilon}, \qquad \bar{\epsilon} = \sup_{t > 0} \frac{e^{-\eta} (t + 1)^{\epsilon} - 1}{t}$$
25 References
- Aharon Ben-Tal, Dick den Hertog, Anja De Waegenaere, Bertrand Melenberg, and Gijs Rennen. Robust solutions of optimization problems affected by uncertain probabilities. Management Science, 59(2), 2013.
- Zhaolin Hu and L. Jeff Hong. Kullback-Leibler divergence constrained distributionally robust optimization. Available at Optimization Online, 2013.
- Changhyeok Lee and Sanjay Mehrotra. A distributionally-robust approach for finding support vector machines. Not yet published.
- Alexander Shapiro. On duality theory of conic linear problems. 2001.
More informationOptimality Conditions for Constrained Optimization
72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)
More informationTutorial on Convex Optimization: Part II
Tutorial on Convex Optimization: Part II Dr. Khaled Ardah Communications Research Laboratory TU Ilmenau Dec. 18, 2018 Outline Convex Optimization Review Lagrangian Duality Applications Optimal Power Allocation
More informationMAT 578 FUNCTIONAL ANALYSIS EXERCISES
MAT 578 FUNCTIONAL ANALYSIS EXERCISES JOHN QUIGG Exercise 1. Prove that if A is bounded in a topological vector space, then for every neighborhood V of 0 there exists c > 0 such that tv A for all t > c.
More informationConvergence Analysis for Distributionally Robust Optimization and Equilibrium Problems*
MATHEMATICS OF OPERATIONS RESEARCH Vol. 00, No. 0, Xxxxx 0000, pp. 000 000 issn 0364-765X eissn 1526-5471 00 0000 0001 INFORMS doi 10.1287/xxxx.0000.0000 c 0000 INFORMS Authors are encouraged to submit
More informationExercises Measure Theoretic Probability
Exercises Measure Theoretic Probability 2002-2003 Week 1 1. Prove the folloing statements. (a) The intersection of an arbitrary family of d-systems is again a d- system. (b) The intersection of an arbitrary
More informationDistributionally robust simple integer recourse
Distributionally robust simple integer recourse Weijun Xie 1 and Shabbir Ahmed 2 1 Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA 24061 2 School of Industrial & Systems
More informationLinear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)
Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training
More informationSupport Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Linear classifier Which classifier? x 2 x 1 2 Linear classifier Margin concept x 2
More informationAmbiguity in portfolio optimization
May/June 2006 Introduction: Risk and Ambiguity Frank Knight Risk, Uncertainty and Profit (1920) Risk: the decision-maker can assign mathematical probabilities to random phenomena Uncertainty: randomness
More informationConvex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014
Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,
More informationLearning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley
Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Course Synopsis A finite comparison class: A = {1,..., m}. 1. Prediction with expert advice. 2. With perfect
More informationSupport Vector Machines
Support Vector Machines Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/32 Margin Classifiers margin b = 0 Sridhar Mahadevan: CMPSCI 689 p.
More informationLecture 7: Lagrangian Relaxation and Duality Theory
Lecture 7: Lagrangian Relaxation and Duality Theory (3 units) Outline Lagrangian dual for linear IP Lagrangian dual for general IP Dual Search Lagrangian decomposition 1 / 23 Joseph Louis Lagrange Joseph
More information