# Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725

Save this PDF as:

Size: px
Start display at page:

Download "Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725" ## Transcription

1 Karush-Kuhn-Tucker Conditions Lecturer: Ryan Tibshirani Convex Optimization /

2 Given a minimization problem Last time: duality min x subject to f(x) h i (x) 0, i = 1,... m l j (x) = 0, j = 1,... r we defined the Lagrangian: L(x, u, v) = f(x) + and Lagrange dual function: m u i h i (x) + g(u, v) = min x L(x, u, v) r v j l j (x) j=1 2

3 The subsequent dual problem is: Important properties: max u,v g(u, v) subject to u 0 Dual problem is always convex, i.e., g is always concave (even if primal problem is not convex) The primal and dual optimal values, f and g, always satisfy weak duality: f g Slater s condition: for convex primal, if there is an x such that h 1 (x) < 0,... h m (x) < 0 and l 1 (x) = 0,... l r (x) = 0 then strong duality holds: f = g. Can be further refined to strict inequalities over the nonaffine h i, i = 1,... m 3

4 Outline Today: KKT conditions Examples Constrained and Lagrange forms Uniqueness with l 1 penalties 4

5 Given general problem Karush-Kuhn-Tucker conditions min x subject to f(x) h i (x) 0, i = 1,... m l j (x) = 0, j = 1,... r The Karush-Kuhn-Tucker conditions or KKT conditions are: ( m r ) 0 f(x) + u i h i (x) + v j l j (x) (stationarity) u i h i (x) = 0 for all i j=1 h i (x) 0, l j (x) = 0 for all i, j u i 0 for all i (complementary slackness) (primal feasibility) (dual feasibility) 5

6 Necessity Let x and u, v be primal and dual solutions with zero duality gap (strong duality holds, e.g., under Slater s condition). Then f(x ) = g(u, v ) = min x f(x) + f(x ) + f(x ) m u i h i (x) + m u i h i (x ) + r vj l j (x) j=1 r vj l j (x ) j=1 In other words, all these inequalities are actually equalities 6

7 Two things to learn from this: The point x minimizes L(x, u, v ) over x R n. Hence the subdifferential of L(x, u, v ) must contain 0 at x = x this is exactly the stationarity condition We must have m u i h i(x ) = 0, and since each term here is 0, this implies u i h i(x ) = 0 for every i this is exactly complementary slackness Primal and dual feasibility hold by virtue of optimality. Therefore: If x and u, v are primal and dual solutions, with zero duality gap, then x, u, v satisfy the KKT conditions (Note that this statement assumes nothing a priori about convexity of our problem, i.e., of f, h i, l j ) 7

8 Sufficiency If there exists x, u, v that satisfy the KKT conditions, then g(u, v ) = f(x ) + = f(x ) m u i h i (x ) + r vj l j (x ) j=1 where the first equality holds from stationarity, and the second holds from complementary slackness Therefore the duality gap is zero (and x and u, v are primal and dual feasible) so x and u, v are primal and dual optimal. Hence, we ve shown: If x and u, v satisfy the KKT conditions, then x and u, v are primal and dual solutions 8

9 Putting it together In summary, KKT conditions: always sufficient necessary under strong duality Putting it together: For a problem with strong duality (e.g., assume Slater s condition: convex problem and there exists x strictly satisfying nonaffine inequality contraints), x and u, v are primal and dual solutions x and u, v satisfy the KKT conditions (Warning, concerning the stationarity condition: for a differentiable function f, we cannot use f(x) = { f(x)} unless f is convex!) 9

10 10 What s in a name? Older folks will know these as the KT (Kuhn-Tucker) conditions: First appeared in publication by Kuhn and Tucker in 1951 Later people found out that Karush had the conditions in his unpublished master s thesis of 1939 For unconstrained problems, the KKT conditions are nothing more than the subgradient optimality condition For general convex problems, the KKT conditions could have been derived entirely from studying optimality via subgradients 0 f(x ) + m N {hi 0}(x ) + r N {lj =0}(x ) j=1 where recall N C (x) is the normal cone of C at x

11 11 Example: quadratic with equality constraints Consider for Q 0, min 2 xt Qx + c T x subject to Ax = 0 x 1 E.g., as we will see, this corresponds to Newton step for equalityconstrained problem min x f(x) subject to Ax = b Convex problem, no inequality constraints, so by KKT conditions: x is a solution if and only if [ ] ] Q A T A 0 ] [ x u = [ c 0 for some u. Linear system combines stationarity, primal feasibility (complementary slackness and dual feasibility are vacuous)

12 12 Example: water-filling Example from B & V page 245: consider problem min x n log(α i + x i ) subject to x 0, 1 T x = 1 Information theory: think of log(α i + x i ) as communication rate of ith channel. KKT conditions: 1/(α i + x i ) u i + v = 0, i = 1,... n Eliminate u: u i x i = 0, i = 1,... n, x 0, 1 T x = 1, u 0 1/(α i + x i ) v, i = 1,... n x i (v 1/(α i + x i )) = 0, i = 1,... n, x 0, 1 T x = 1

13 13 Can argue directly stationarity and complementary slackness imply { 1/v α i if v < 1/α i x i = = max{0, 1/v α i }, i = 1,... n 0 if v 1/α i Still need x to be feasible, i.e., 1 T x = 1, and this gives n max{0, 1/v α i } = 1 Univariate equation, piecewise linear in 1/v and not hard to solve This reduced problem is called water-filling (From B & V page 246) 1/ν x i α i i

14 14 Example: support vector machines Given y { 1, 1} n, and X R n p, the support vector machine problem is: min β,β 0,ξ subject to 1 2 β C n ξ i ξ i 0, i = 1,... n y i (x T i β + β 0 ) 1 ξ i, i = 1,... n Introduce dual variables v, w 0. KKT stationarity condition: n 0 = w i y i, β = Complementary slackness: n w i y i x i, w = C1 v v i ξ i = 0, w i ( 1 ξi y i (x T i β + β 0 ) ) = 0, i = 1,... n

15 15 Hence at optimality we have β = n w iy i x i, and w i is nonzero only if y i (x T i β + β 0) = 1 ξ i. Such points i are called the support points For support point i, if ξ i = 0, then x i lies on edge of margin, and w i (0, C]; For support point i, if ξ i 0, then x i lies on wrong side of margin, and w i = C x T β + β 0 =0 ξ 4 ξ5 ξ ξ 3 1 ξ 2 M = 1 β M = 1 β margin KKT conditions do not really give us a way to find solution, but gives a better understanding In fact, we can use this to screen away non-support points before performing optimization

16 16 Constrained and Lagrange forms Often in statistics and machine learning we ll switch back and forth between constrained form, where t R is a tuning parameter, min x f(x) subject to h(x) t (C) and Lagrange form, where λ 0 is a tuning parameter, min x f(x) + λ h(x) (L) and claim these are equivalent. Is this true (assuming convex f, h)? (C) to (L): if problem (C) is strictly feasible, then strong duality holds, and there exists some λ 0 (dual solution) such that any solution x in (C) minimizes so x is also a solution in (L) f(x) + λ (h(x) t)

17 17 (L) to (C): if x is a solution in (L), then the KKT conditions for (C) are satisfied by taking t = h(x ), so x is a solution in (C) Conclusion: {solutions in (L)} λ 0 λ 0 {solutions in (L)} {solutions in (C)} t {solutions in (C)} t such that (C) is strictly feasible This is nearly a perfect equivalence. Note: when the only value of t that leads to a feasible but not strictly feasible constraint set is t = 0, then we do get perfect equivalence So, e.g., if h 0, and (C), (L) are feasible for all t, λ 0, then we do get perfect equivalence

18 Uniqueness in l 1 penalized problems Using the KKT conditions and simple probability arguments, we have the following (perhaps surprising) result: Theorem: Let f be differentiable and strictly convex, let X R n p, λ > 0. Consider min β f(xβ) + λ β 1 If the entries of X are drawn from a continuous probability distribution (on R np ), then w.p. 1 there is a unique solution and it has at most min{n, p} nonzero components Remark: here f must be strictly convex, but no restrictions on the dimensions of X (we could have p n) Proof: the KKT conditions are { X T {sign(β i )} if β i 0 f(xβ) = λs, s i [ 1, 1] if β i = 0, i = 1,... n 18

19 19 Note that Xβ, s are unique. Define S = {j : Xj T f(xβ) = λ}, also unique, and note that any solution satisfies β i = 0 for all i / S First assume that rank(x S ) < S (here X R n S, submatrix of X corresponding to columns in S). Then for some i S, X i = for constants c j R, so that s i X i = j S\{i} j S\{i} c j X j s j c j λ(s j X j ) Hence taking an inner product with f(xβ), λ = j S\{i} (s i s j c j )λ, i.e., j S\{i} s i s j c j = 1

20 20 In other words, we ve proved that rank(x S ) < S implies s i X i is in the affine span of s j X j, j S \ {i} (subspace of dimension < n) We say that the matrix X has columns in general position if any affine subspace L of dimension k < n does not contain more than k + 1 elements; of {±X 1,... ± X p } (excluding antipodal pairs) It is straightforward to show that, if the entries of X have a density over R np, then X is in general position with probability 1 X 3 X 4 X 2 X 1

21 21 Therefore, if entries of X are drawn from continuous probability distribution, any solution must satisfy rank(x S ) = S Recalling the KKT conditions, this means the number of nonzero components in any solution at most S min{n, p}. Further, we can reduce our optimization problem (by partially solving) to min β S R S f(x S β S ) + λ β S 1 Finally, strict convexity implies uniqueness of the solution in this problem, and hence in our original problem

22 22 Back to duality One of the most important uses of duality is that, under strong duality, we can characterize primal solutions from dual solutions Recall that under strong duality, the KKT conditions are necessary for optimality. Given dual solutions u, v, any primal solution x satisfies the stationarity condition 0 f(x ) + m u i h i (x ) + In other words, x solves min x L(x, u, v ) r vi l j (x ) j=1 Generally, this reveals a characterization of primal solutions In particular, if this is satisfied uniquely (i.e., above problem has a unique minimizer), then the corresponding point must be the primal solution

23 23 References S. Boyd and L. Vandenberghe (2004), Convex optimization, Chapter 5 R. T. Rockafellar (1970), Convex analysis, Chapters 28 30

### Duality Uses and Correspondences. Ryan Tibshirani Convex Optimization Duality Uses and Correspondences Ryan Tibshirani Conve Optimization 10-725 Recall that for the problem Last time: KKT conditions subject to f() h i () 0, i = 1,... m l j () = 0, j = 1,... r the KKT conditions

### Convex Optimization & Lagrange Duality Convex Optimization & Lagrange Duality Chee Wei Tan CS 8292 : Advanced Topics in Convex Optimization and its Applications Fall 2010 Outline Convex optimization Optimality condition Lagrange duality KKT

### COM S 672: Advanced Topics in Computational Models of Learning Optimization for Learning COM S 672: Advanced Topics in Computational Models of Learning Optimization for Learning Lecture Note 4: Optimality Conditions Jia (Kevin) Liu Assistant Professor Department of Computer Science Iowa State

### CS-E4830 Kernel Methods in Machine Learning CS-E4830 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 27. September, 2017 Juho Rousu 27. September, 2017 1 / 45 Convex optimization Convex optimisation This

### ICS-E4030 Kernel Methods in Machine Learning ICS-E4030 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 28. September, 2016 Juho Rousu 28. September, 2016 1 / 38 Convex optimization Convex optimisation This

### Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014 Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,

### Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

### Support vector machines Support vector machines Guillaume Obozinski Ecole des Ponts - ParisTech SOCN course 2014 SVM, kernel methods and multiclass 1/23 Outline 1 Constrained optimization, Lagrangian duality and KKT 2 Support

### Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

### I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010 I.3. LMI DUALITY Didier HENRION henrion@laas.fr EECI Graduate School on Control Supélec - Spring 2010 Primal and dual For primal problem p = inf x g 0 (x) s.t. g i (x) 0 define Lagrangian L(x, z) = g 0

### Lecture 18: Optimization Programming Fall, 2016 Outline Unconstrained Optimization 1 Unconstrained Optimization 2 Equality-constrained Optimization Inequality-constrained Optimization Mixture-constrained Optimization 3 Quadratic Programming

### Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research Introduction to Machine Learning Lecture 7 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Convex Optimization Differentiation Definition: let f : X R N R be a differentiable function,

### Duality. Geoff Gordon & Ryan Tibshirani Optimization / Duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Duality in linear programs Suppose we want to find lower bound on the optimal value in our convex problem, B min x C f(x) E.g., consider

### Lecture: Duality of LP, SOCP and SDP 1/33 Lecture: Duality of LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html wenzw@pku.edu.cn Acknowledgement:

### Lecture 6: Conic Optimization September 8 IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions

### The Lagrangian L : R d R m R r R is an (easier to optimize) lower bound on the original problem: HT05: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford Convex Optimization and slides based on Arthur Gretton s Advanced Topics in Machine Learning course

### COM S 578X: Optimization for Machine Learning COM S 578X: Optimization for Machine Learning Lecture Note 5: Optimality Conditions Jia (Kevin) Liu Assistant Professor Department of Computer Science Iowa State University Ames Iowa USA Fall 2018 JKL

### Support Vector Machines Support Vector Machines Support vector machines (SVMs) are one of the central concepts in all of machine learning. They are simply a combination of two ideas: linear classification via maximum (or optimal

### 5. Duality. Lagrangian 5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

### Duality in Linear Programs. Lecturer: Ryan Tibshirani Convex Optimization /36-725 Duality in Linear Programs Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: proximal gradient descent Consider the problem x g(x) + h(x) with g, h convex, g differentiable, and

### Optimization for Machine Learning Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html

### 14. Duality. ˆ Upper and lower bounds. ˆ General duality. ˆ Constraint qualifications. ˆ Counterexample. ˆ Complementary slackness. CS/ECE/ISyE 524 Introduction to Optimization Spring 2016 17 14. Duality ˆ Upper and lower bounds ˆ General duality ˆ Constraint qualifications ˆ Counterexample ˆ Complementary slackness ˆ Examples ˆ Sensitivity

### Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST) Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual

### Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 4 Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 2 4.1. Subgradients definition subgradient calculus duality and optimality conditions Shiqian

### Convex Optimization Boyd & Vandenberghe. 5. Duality 5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

### Convex Optimization M2 Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization

### Duality. Lagrange dual problem weak and strong duality optimality conditions perturbation and sensitivity analysis generalized inequalities Duality Lagrange dual problem weak and strong duality optimality conditions perturbation and sensitivity analysis generalized inequalities Lagrangian Consider the optimization problem in standard form

### Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

### Introduction to Machine Learning Spring 2018 Note Duality. 1.1 Primal and Dual Problem CS 189 Introduction to Machine Learning Spring 2018 Note 22 1 Duality As we have seen in our discussion of kernels, ridge regression can be viewed in two ways: (1) an optimization problem over the weights

### Constrained Optimization 1 / 22 Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 30, 2015 2 / 22 1. Equality constraints only 1.1 Reduced gradient 1.2 Lagrange

### On the Method of Lagrange Multipliers On the Method of Lagrange Multipliers Reza Nasiri Mahalati November 6, 2016 Most of what is in this note is taken from the Convex Optimization book by Stephen Boyd and Lieven Vandenberghe. This should

### The Karush-Kuhn-Tucker (KKT) conditions The Karush-Kuhn-Tucker (KKT) conditions In this section, we will give a set of sufficient (and at most times necessary) conditions for a x to be the solution of a given convex optimization problem. These

### Support Vector Machines: Maximum Margin Classifiers Support Vector Machines: Maximum Margin Classifiers Machine Learning and Pattern Recognition: September 16, 2008 Piotr Mirowski Based on slides by Sumit Chopra and Fu-Jie Huang 1 Outline What is behind

### Lecture 7: Convex Optimizations Lecture 7: Convex Optimizations Radu Balan, David Levermore March 29, 2018 Convex Sets. Convex Functions A set S R n is called a convex set if for any points x, y S the line segment [x, y] := {tx + (1

### Lagrangian Duality and Convex Optimization Lagrangian Duality and Convex Optimization David Rosenberg New York University February 11, 2015 David Rosenberg (New York University) DS-GA 1003 February 11, 2015 1 / 24 Introduction Why Convex Optimization?

### Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R

### Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem: CDS270 Maryam Fazel Lecture 2 Topics from Optimization and Duality Motivation network utility maximization (NUM) problem: consider a network with S sources (users), each sending one flow at rate x s, through

### Convex Optimization Overview (cnt d) Conve Optimization Overview (cnt d) Chuong B. Do November 29, 2009 During last week s section, we began our study of conve optimization, the study of mathematical optimization problems of the form, minimize

### Nonlinear Programming and the Kuhn-Tucker Conditions Nonlinear Programming and the Kuhn-Tucker Conditions The Kuhn-Tucker (KT) conditions are first-order conditions for constrained optimization problems, a generalization of the first-order conditions we

### ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints Instructor: Prof. Kevin Ross Scribe: Nitish John October 18, 2011 1 The Basic Goal The main idea is to transform a given constrained

### Lecture: Duality. Lecture: Duality http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Introduction 2/35 Lagrange dual problem weak and strong

### Machine Learning. Support Vector Machines. Manfred Huber Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data

### Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee227c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee227c@berkeley.edu

### Homework 3. Convex Optimization /36-725 Homework 3 Convex Optimization 10-725/36-725 Due Friday October 14 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

### Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

### Lecture Notes on Support Vector Machine Lecture Notes on Support Vector Machine Feng Li fli@sdu.edu.cn Shandong University, China 1 Hyperplane and Margin In a n-dimensional space, a hyper plane is defined by ω T x + b = 0 (1) where ω R n is

### Primal/Dual Decomposition Methods Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients

### Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus 1/41 Subgradient Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes definition subgradient calculus duality and optimality conditions directional derivative Basic inequality

### Optimality, Duality, Complementarity for Constrained Optimization Optimality, Duality, Complementarity for Constrained Optimization Stephen Wright University of Wisconsin-Madison May 2014 Wright (UW-Madison) Optimality, Duality, Complementarity May 2014 1 / 41 Linear

### 4TE3/6TE3. Algorithms for. Continuous Optimization 4TE3/6TE3 Algorithms for Continuous Optimization (Duality in Nonlinear Optimization ) Tamás TERLAKY Computing and Software McMaster University Hamilton, January 2004 terlaky@mcmaster.ca Tel: 27780 Optimality

### Convex Optimization and Modeling Convex Optimization and Modeling Duality Theory and Optimality Conditions 5th lecture, 12.05.2010 Jun.-Prof. Matthias Hein Program of today/next lecture Lagrangian and duality: the Lagrangian the dual

### Support Vector Machines, Kernel SVM Support Vector Machines, Kernel SVM Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 27, 2017 1 / 40 Outline 1 Administration 2 Review of last lecture 3 SVM

### E5295/5B5749 Convex optimization with engineering applications. Lecture 5. Convex programming and semidefinite programming E5295/5B5749 Convex optimization with engineering applications Lecture 5 Convex programming and semidefinite programming A. Forsgren, KTH 1 Lecture 5 Convex optimization 2006/2007 Convex quadratic program

### 1. f(β) 0 (that is, β is a feasible point for the constraints) xvi 2. The lasso for linear models 2.10 Bibliographic notes Appendix Convex optimization with constraints In this Appendix we present an overview of convex optimization concepts that are particularly useful

### Lagrange duality. The Lagrangian. We consider an optimization program of the form Lagrange duality Another way to arrive at the KKT conditions, and one which gives us some insight on solving constrained optimization problems, is through the Lagrange dual. The dual is a maximization

### EE/AA 578, Univ of Washington, Fall Duality 7. Duality EE/AA 578, Univ of Washington, Fall 2016 Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

### Lecture 3. Optimization Problems and Iterative Algorithms Lecture 3 Optimization Problems and Iterative Algorithms January 13, 2016 This material was jointly developed with Angelia Nedić at UIUC for IE 598ns Outline Special Functions: Linear, Quadratic, Convex

### Pattern Classification, and Quadratic Problems Pattern Classification, and Quadratic Problems (Robert M. Freund) March 3, 24 c 24 Massachusetts Institute of Technology. 1 1 Overview Pattern Classification, Linear Classifiers, and Quadratic Optimization

### Support Vector Machines Support Vector Machines Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/32 Margin Classifiers margin b = 0 Sridhar Mahadevan: CMPSCI 689 p.

### Semidefinite Programming Semidefinite Programming Notes by Bernd Sturmfels for the lecture on June 26, 208, in the IMPRS Ringvorlesung Introduction to Nonlinear Algebra The transition from linear algebra to nonlinear algebra has

### Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization / Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods

### Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725 Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)

### Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training

### Convex Optimization Lecture 6: KKT Conditions, and applications Convex Optimization Lecture 6: KKT Conditions, and applications Dr. Michel Baes, IFOR / ETH Zürich Quick recall of last week s lecture Various aspects of convexity: The set of minimizers is convex. Convex

### subject to (x 2)(x 4) u, Exercises Basic definitions 5.1 A simple example. Consider the optimization problem with variable x R. minimize x 2 + 1 subject to (x 2)(x 4) 0, (a) Analysis of primal problem. Give the feasible set, the

### Chap 2. Optimality conditions Chap 2. Optimality conditions Version: 29-09-2012 2.1 Optimality conditions in unconstrained optimization Recall the definitions of global, local minimizer. Geometry of minimization Consider for f C 1

### Tutorial on Convex Optimization: Part II Tutorial on Convex Optimization: Part II Dr. Khaled Ardah Communications Research Laboratory TU Ilmenau Dec. 18, 2018 Outline Convex Optimization Review Lagrangian Duality Applications Optimal Power Allocation

### Convex Optimization and Support Vector Machine Convex Optimization and Support Vector Machine Problem 0. Consider a two-class classification problem. The training data is L n = {(x 1, t 1 ),..., (x n, t n )}, where each t i { 1, 1} and x i R p. We

### Convex Optimization and SVM Convex Optimization and SVM Problem 0. Cf lecture notes pages 12 to 18. Problem 1. (i) A slab is an intersection of two half spaces, hence convex. (ii) A wedge is an intersection of two half spaces, hence

### Duality Theory of Constrained Optimization Duality Theory of Constrained Optimization Robert M. Freund April, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 2 1 The Practical Importance of Duality Duality is pervasive

### Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Module - 5 Lecture - 22 SVM: The Dual Formulation Good morning.

### CS711008Z Algorithm Design and Analysis CS711008Z Algorithm Design and Analysis Lecture 8 Linear programming: interior point method Dongbo Bu Institute of Computing Technology Chinese Academy of Sciences, Beijing, China 1 / 31 Outline Brief

### 10701 Recitation 5 Duality and SVM. Ahmed Hefny 10701 Recitation 5 Duality and SVM Ahmed Hefny Outline Langrangian and Duality The Lagrangian Duality Eamples Support Vector Machines Primal Formulation Dual Formulation Soft Margin and Hinge Loss Lagrangian

### Lecture 14: Optimality Conditions for Conic Problems EE 227A: Conve Optimization and Applications March 6, 2012 Lecture 14: Optimality Conditions for Conic Problems Lecturer: Laurent El Ghaoui Reading assignment: 5.5 of BV. 14.1 Optimality for Conic Problems

### Machine Learning. Lecture 6: Support Vector Machine. Feng Li. Machine Learning Lecture 6: Support Vector Machine Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Warm Up 2 / 80 Warm Up (Contd.)

### CSC 411 Lecture 17: Support Vector Machine CSC 411 Lecture 17: Support Vector Machine Ethan Fetaya, James Lucas and Emad Andrews University of Toronto CSC411 Lec17 1 / 1 Today Max-margin classification SVM Hard SVM Duality Soft SVM CSC411 Lec17

### Numerical Optimization Constrained Optimization Computer Science and Automation Indian Institute of Science Bangalore 560 012, India. NPTEL Course on Constrained Optimization Constrained Optimization Problem: min h j (x) 0,

### Newton s Method. Ryan Tibshirani Convex Optimization /36-725 Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

### Support Vector Machines EE 17/7AT: Optimization Models in Engineering Section 11/1 - April 014 Support Vector Machines Lecturer: Arturo Fernandez Scribe: Arturo Fernandez 1 Support Vector Machines Revisited 1.1 Strictly) Separable

### Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training Subgradients subgradients and quasigradients subgradient calculus optimality conditions via subgradients directional derivatives Prof. S. Boyd, EE392o, Stanford University Basic inequality recall basic

### Sparsity and the Lasso Sparsity and the Lasso Statistical Machine Learning, Spring 205 Ryan Tibshirani (with Larry Wasserman Regularization and the lasso. A bit of background If l 2 was the norm of the 20th century, then l is

### Lecture 10: Duality in Linear Programs 10-725/36-725: Convex Optimization Spring 2015 Lecture 10: Duality in Linear Programs Lecturer: Ryan Tibshirani Scribes: Jingkun Gao and Ying Zhang Disclaimer: These notes have not been subjected to the

### Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

### Lagrange Relaxation and Duality Lagrange Relaxation and Duality As we have already known, constrained optimization problems are harder to solve than unconstrained problems. By relaxation we can solve a more difficult problem by a simpler

### Seminars on Mathematics for Economics and Finance Topic 5: Optimization Kuhn-Tucker conditions for problems with inequality constraints 1 Seminars on Mathematics for Economics and Finance Topic 5: Optimization Kuhn-Tucker conditions for problems with inequality constraints 1 Session: 15 Aug 2015 (Mon), 10:00am 1:00pm I. Optimization with

### Support Vector Machines and Kernel Methods 2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University

### Interior Point Algorithms for Constrained Convex Optimization Interior Point Algorithms for Constrained Convex Optimization Chee Wei Tan CS 8292 : Advanced Topics in Convex Optimization and its Applications Fall 2010 Outline Inequality constrained minimization problems

### Lagrangian Duality Theory Lagrangian Duality Theory Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapter 14.1-4 1 Recall Primal and Dual

### Generalization to inequality constrained problem. Maximize Lecture 11. 26 September 2006 Review of Lecture #10: Second order optimality conditions necessary condition, sufficient condition. If the necessary condition is violated the point cannot be a local minimum

### Lecture 23: November 21 10-725/36-725: Convex Optimization Fall 2016 Lecturer: Ryan Tibshirani Lecture 23: November 21 Scribes: Yifan Sun, Ananya Kumar, Xin Lu Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

### Lecture 8. Strong Duality Results. September 22, 2008 Strong Duality Results September 22, 2008 Outline Lecture 8 Slater Condition and its Variations Convex Objective with Linear Inequality Constraints Quadratic Objective over Quadratic Constraints Representation

### Subgradients. subgradients. strong and weak subgradient calculus. optimality conditions via subgradients. directional derivatives Subgradients subgradients strong and weak subgradient calculus optimality conditions via subgradients directional derivatives Prof. S. Boyd, EE364b, Stanford University Basic inequality recall basic inequality

### Nonlinear Optimization: What s important? Nonlinear Optimization: What s important? Julian Hall 10th May 2012 Convexity: convex problems A local minimizer is a global minimizer A solution of f (x) = 0 (stationary point) is a minimizer A global

### The Karush-Kuhn-Tucker conditions Chapter 6 The Karush-Kuhn-Tucker conditions 6.1 Introduction In this chapter we derive the first order necessary condition known as Karush-Kuhn-Tucker (KKT) conditions. To this aim we introduce the alternative COMP-566 Rohan Shah (1) Support Vector Machines for Regression Provided with n training data points {(x 1, y 1 ), (x 2, y 2 ),, (x n, y n )} R s R we seek a function f for a fixed ɛ > 0 such that: f(x Consider constraints Notes for 2017-04-24 So far, we have considered unconstrained optimization problems. The constrained problem is minimize φ(x) s.t. x Ω where Ω R n. We usually define x in terms of