The Karush-Kuhn-Tucker (KKT) conditions


In this section, we give a set of sufficient (and, in most cases, necessary) conditions for a point $x^\star$ to be a solution of a given convex optimization problem. These are called the Karush-Kuhn-Tucker (KKT) conditions, and they play a fundamental role in both the theory and practice of convex optimization. We have already derived these conditions (and shown that they are both necessary and sufficient) in some special cases in the previous notes. We start here by considering a general convex program with inequality constraints only; this simply makes the exposition easier. Once this is established, we show how to include equality constraints (which must always be affine in convex programming). A great source for the material in this section is [Lau13, Chap. 10]. Everywhere in this section, the functionals $f_0(x), f_1(x), \ldots, f_M(x)$, with $f_m : \mathbb{R}^N \to \mathbb{R}$, are convex and differentiable.

KKT (inequality only)

The KKT conditions for the convex program
\[
\begin{array}{ll}
\underset{x \in \mathbb{R}^N}{\text{minimize}} & f_0(x) \\
\text{subject to} & f_m(x) \leq 0, \quad m = 1, \ldots, M,
\end{array}
\tag{1}
\]
in $x \in \mathbb{R}^N$ and $\lambda \in \mathbb{R}^M$ are
\[
\begin{aligned}
& f_m(x) \leq 0, \quad m = 1, \ldots, M, && \text{(K1)} \\
& \lambda \geq 0, && \text{(K2)} \\
& \lambda_m f_m(x) = 0, \quad m = 1, \ldots, M, && \text{(K3)} \\
& \nabla f_0(x) + \sum_{m=1}^M \lambda_m \nabla f_m(x) = 0. && \text{(K4)}
\end{aligned}
\]
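To make the conditions concrete, here is a minimal numerical sketch (a hypothetical two-variable example, not from the notes) that checks (K1)-(K4) at a point whose optimality can be verified by hand: minimize $\|x - (1,1)\|^2$ subject to $x_1 + x_2 \leq 1$, whose solution is $x^\star = (0.5, 0.5)$ with multiplier $\lambda^\star = 1$.

```python
import numpy as np

# Hypothetical example: minimize f0(x) = ||x - (1,1)||^2
# subject to f1(x) = x1 + x2 - 1 <= 0.
# By symmetry the solution is x* = (0.5, 0.5) with multiplier lambda* = 1.
x_star = np.array([0.5, 0.5])
lam = np.array([1.0])

f1 = x_star.sum() - 1.0                            # constraint value at x*
grad_f0 = 2.0 * (x_star - np.array([1.0, 1.0]))    # gradient of the objective
grad_f1 = np.array([1.0, 1.0])                     # gradient of the affine constraint

print("K1 (feasibility):      ", f1 <= 1e-12)
print("K2 (nonnegativity):    ", np.all(lam >= 0))
print("K3 (compl. slackness): ", abs(lam[0] * f1) <= 1e-12)
print("K4 (stationarity):     ", np.allclose(grad_f0 + lam[0] * grad_f1, 0.0))
```

All four checks print True, which (by the sufficiency result below) certifies that $x^\star$ is a minimizer.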

We start by establishing that these are sufficient conditions for a minimizer.

If the KKT conditions hold for $x^\star$ and some $\lambda \in \mathbb{R}^M$, then $x^\star$ is a solution to the program (1).

Below, we denote the feasible set as
\[
\mathcal{C} = \{x \in \mathbb{R}^N : f_m(x) \leq 0, \ m = 1, \ldots, M\}.
\]
It should be clear that the convexity of the $f_m$ implies the convexity of $\mathcal{C}$: the $f_m$ are convex functions, so their sublevel sets are convex sets, and $\mathcal{C}$ is an intersection of sublevel sets. The sufficiency proof simply relies on the convexity of $\mathcal{C}$, the convexity of $f_0$, and the concept of a descent/ascent direction (see the previous notes).

Suppose $x^\star, \lambda$ obey the KKT conditions. The first thing to note is that if $\lambda_1 = \lambda_2 = \cdots = \lambda_M = 0$, then (K4) implies that $\nabla f_0(x^\star) = 0$, and hence $x^\star$ is a global minimizer, since by the convexity of $f_0$,
\[
f_0(x) \geq f_0(x^\star) + \langle x - x^\star, \nabla f_0(x^\star) \rangle = f_0(x^\star)
\]
for all $x \in \mathcal{C}$.

Now suppose that $R > 0$ entries of $\lambda$ are positive; without loss of generality, we take these to be the first $R$:
\[
\lambda_1 > 0, \ \lambda_2 > 0, \ \ldots, \ \lambda_R > 0, \qquad \lambda_{R+1} = 0, \ \ldots, \ \lambda_M = 0.
\]
We can rewrite (K4) as
\[
\nabla f_0(x^\star) + \lambda_1 \nabla f_1(x^\star) + \cdots + \lambda_R \nabla f_R(x^\star) = 0, \tag{2}
\]
and note that by (K3), $f_1(x^\star) = 0, \ldots, f_R(x^\star) = 0$. Consider any $x \in \mathcal{C}$, $x \neq x^\star$. As $\mathcal{C}$ is convex, every point in between $x$ and $x^\star$ must also be in $\mathcal{C}$, meaning
\[
f_m(x^\star + \theta(x - x^\star)) \leq 0 = f_m(x^\star), \quad m = 1, \ldots, R,
\]

for all $0 \leq \theta \leq 1$. This means that $x - x^\star$ cannot be an ascent direction for any of $f_1, \ldots, f_R$, and so
\[
\langle x - x^\star, \nabla f_m(x^\star) \rangle \leq 0, \quad m = 1, \ldots, R.
\]
It is clear, then, that
\[
\langle x - x^\star, \nabla f_0(x^\star) \rangle \geq 0,
\]
as otherwise there is no way (2) can hold with positive $\lambda_m$. Along with the convexity of $f_0$, this means that
\[
f_0(x) \geq f_0(x^\star) + \langle x - x^\star, \nabla f_0(x^\star) \rangle \geq f_0(x^\star).
\]
Since this holds for all $x \in \mathcal{C}$, $x^\star$ is a minimizer.

Necessity

To establish the necessity of the KKT conditions, we need one piece of mathematical technology that we have not been exposed to yet. The Farkas lemma is a fundamental result in convex analysis; we will prove it in the Technical Details section.

Farkas Lemma: Let $A$ be an $M \times N$ matrix and $b \in \mathbb{R}^M$. Then exactly one of the following two things is true:

1. there exists $x \geq 0$ such that $Ax = b$;
2. there exists $\lambda \in \mathbb{R}^M$ such that $A^T \lambda \leq 0$ and $\langle b, \lambda \rangle > 0$.
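As a quick sanity check on the lemma, here is a small sketch (hypothetical data, using scipy) that tests which alternative holds for a given $A$ and $b$: alternative 1 is a linear feasibility problem in $x \geq 0$, and alternative 2 can be searched for by maximizing $\langle b, \lambda \rangle$ subject to $A^T \lambda \leq 0$, with $\lambda$ boxed so the LP stays bounded (any strictly positive optimum certifies alternative 2).

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data; any A, b of compatible sizes can be substituted.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([3.0, 1.0])

# Alternative 1: does there exist x >= 0 with Ax = b?  (pure feasibility LP)
alt1 = linprog(c=np.zeros(A.shape[1]), A_eq=A, b_eq=b,
               bounds=[(0, None)] * A.shape[1])

# Alternative 2: does there exist lam with A^T lam <= 0 and <b, lam> > 0?
# Maximize <b, lam> subject to A^T lam <= 0; box-constrain lam so the LP is bounded.
alt2 = linprog(c=-b, A_ub=A.T, b_ub=np.zeros(A.shape[1]),
               bounds=[(-1, 1)] * A.shape[0])

print("alternative 1 holds:", alt1.success)
print("alternative 2 holds:", alt2.success and -alt2.fun > 1e-9)
```

For this particular $A$ and $b$ the point $x = (3, 0, 1) \geq 0$ satisfies $Ax = b$, so the script reports that alternative 1 holds and alternative 2 does not.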

With this in place, we can give two different situations under which KKT is necessary. These are by no means the only situations for which this is true, but these two cover a high percentage of the cases encountered in practice.

Suppose $x^\star$ is a solution to a convex program with affine inequality constraints:
\[
\underset{x \in \mathbb{R}^N}{\text{minimize}} \ f_0(x) \quad \text{subject to} \quad Ax \leq b.
\]
Then there exists a $\lambda$ such that $x^\star, \lambda$ obey the KKT conditions.

In this case, the constraint functions have the form $f_m(x) = \langle x, a_m \rangle - b_m$, and so $\nabla f_m(x) = a_m$, where $a_m^T$ is the $m$th row of $A$. Since $x^\star$ is feasible, K1 must hold. If none of the constraints are active, meaning $f_m(x^\star) < 0$ for $m = 1, \ldots, M$ and $x^\star$ lies in the interior of $\mathcal{C}$, then it must be that $\nabla f_0(x^\star) = 0$, and K2-K4 hold with $\lambda = 0$.

Suppose instead that there are $R$ active constraints at $x^\star$; without loss of generality, we take these to be the first $R$:
\[
f_1(x^\star) = 0, \ f_2(x^\star) = 0, \ \ldots, \ f_R(x^\star) = 0, \qquad f_{R+1}(x^\star) < 0, \ \ldots, \ f_M(x^\star) < 0.
\]
We start by taking $\lambda_{R+1} = \lambda_{R+2} = \cdots = \lambda_M = 0$, which means K3 will hold. Suppose that there were no $\lambda \geq 0$ such that
\[
\nabla f_0(x^\star) + \lambda_1 \nabla f_1(x^\star) + \cdots + \lambda_R \nabla f_R(x^\star) = 0. \tag{3}
\]

With $\tilde{A}$ as the $R \times N$ matrix consisting of the first $R$ rows of $A$, and $\tilde{b} \in \mathbb{R}^R$ as the first $R$ entries of $b$, this means that there is no $\lambda \in \mathbb{R}^R$ such that
\[
\tilde{A}^T \lambda = -\nabla f_0(x^\star), \quad \lambda \geq 0.
\]
By the Farkas lemma, this means that there is a $d \in \mathbb{R}^N$ such that $\tilde{A} d \leq 0$ and $\langle d, -\nabla f_0(x^\star) \rangle > 0$, which means, since $\nabla f_m(x) = a_m$,
\[
\langle d, \nabla f_0(x^\star) \rangle < 0, \qquad \langle d, \nabla f_m(x^\star) \rangle \leq 0, \quad m = 1, \ldots, R.
\]
This means that $d$ is a descent direction for $f_0$, and is not an ascent direction for $f_1, \ldots, f_R$. Because the constraint functionals are affine, if $\langle d, \nabla f_m(x^\star) \rangle = 0$ above, then $f_m(x^\star + t d) = f_m(x^\star)$; this means that moving in the direction $d$ will not increase $f_1, \ldots, f_R$. Since the last $M - R$ constraints are not active, we can move at least a small amount in any direction and they stay inactive. This means that there exists a $t > 0$ such that moving to $x^\star + t d$ decreases the objective while maintaining feasibility:
\[
f_0(x^\star + t d) < f_0(x^\star), \qquad f_m(x^\star + t d) \leq 0, \quad m = 1, \ldots, M.
\]
This directly contradicts the assertion that $x^\star$ is optimal, and so $\lambda_1, \ldots, \lambda_R \geq 0$ must exist such that (3) holds.
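The stationarity condition (3) says that $-\nabla f_0(x^\star)$ lies in the cone generated by the gradients of the active constraints. Here is a small sketch (hypothetical data, using scipy's nonnegative least squares) of how one can recover the multipliers of the active affine constraints at a known solution; a residual of essentially zero means a valid $\lambda \geq 0$ exists.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical example: minimize ||x - c||^2 subject to x1 + x2 <= 1 and -x2 <= 0.
c = np.array([2.0, 0.0])
x_star = np.array([1.0, 0.0])      # solution found by hand; both constraints are active

# Rows a_m of the active constraints and the objective gradient at x*.
A_active = np.array([[1.0, 1.0],
                     [0.0, -1.0]])
grad_f0 = 2.0 * (x_star - c)

# Solve A_active^T lambda = -grad_f0 with lambda >= 0 via nonnegative least squares.
lam, residual = nnls(A_active.T, -grad_f0)
print("multipliers:", lam)         # expected: [2. 2.]
print("residual:", residual)       # ~0 means (3) holds, so the KKT conditions are satisfied
```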

For general convex inequality constraints, there are various other scenarios under which the KKT conditions are necessary; these are called constraint qualifications. We have already seen that polygonal (affine) constraints qualify. Another constraint qualification is Slater's condition:

Slater's condition: There exists at least one strictly feasible point, i.e., an $x$ such that none of the constraints are active:
\[
f_1(x) < 0, \ f_2(x) < 0, \ \ldots, \ f_M(x) < 0.
\]

Suppose that Slater's condition holds for $f_1, \ldots, f_M$, and let $x^\star$ be a solution to
\[
\underset{x \in \mathbb{R}^N}{\text{minimize}} \ f_0(x) \quad \text{subject to} \quad f_m(x) \leq 0, \ m = 1, \ldots, M.
\]
Then there exists a $\lambda$ such that $x^\star, \lambda$ obey the KKT conditions.

This is proved in much the same way as in the affine inequality case. Suppose that $x^\star$ is a solution, and that
\[
f_1(x^\star) = 0, \ f_2(x^\star) = 0, \ \ldots, \ f_R(x^\star) = 0, \qquad f_{R+1}(x^\star) < 0, \ \ldots, \ f_M(x^\star) < 0.
\]
We take $\lambda_{R+1} = \cdots = \lambda_M = 0$, and show that if there are no $\lambda_1, \ldots, \lambda_R \geq 0$ such that
\[
\nabla f_0(x^\star) + \sum_{m=1}^R \lambda_m \nabla f_m(x^\star) = 0, \tag{4}
\]
then there is another feasible point with a smaller value of $f_0$.

By the Farkas lemma, if there does not exist $\lambda_1, \ldots, \lambda_R \geq 0$ such that (4) holds, then there must be a $u \in \mathbb{R}^N$ such that
\[
\langle u, \nabla f_0(x^\star) \rangle < 0, \qquad \langle u, \nabla f_m(x^\star) \rangle \leq 0, \quad m = 1, \ldots, R.
\]
Now let $z$ be a strictly feasible point, $f_m(z) < 0$ for all $m$. We know that
\[
0 > f_m(z) \geq f_m(x^\star) + \langle z - x^\star, \nabla f_m(x^\star) \rangle = \langle z - x^\star, \nabla f_m(x^\star) \rangle
\]
for $m = 1, \ldots, R$, since for these $m$ we have $f_m(x^\star) = 0$; hence $\langle z - x^\star, \nabla f_m(x^\star) \rangle < 0$. So $u$ is a descent direction for $f_0$, and $z - x^\star$ is a descent direction for all of the constraint functionals $f_m$, $m = 1, \ldots, R$, that are active. We consider a convex combination of these two vectors,
\[
d_\theta = (1 - \theta) u + \theta (z - x^\star).
\]
We know that $\langle d_\theta, \nabla f_m(x^\star) \rangle < 0$ for all $0 < \theta \leq 1$, $m = 1, \ldots, R$. We also know that there is a $\theta$ small enough so that $d_\theta$ is a descent direction for $f_0$; there exists $0 < \epsilon_0 < 1$ such that $\langle d_{\epsilon_0}, \nabla f_0(x^\star) \rangle < 0$. Finally, we also know that we can move a small enough amount in any direction and keep the constraints $f_{R+1}, \ldots, f_M$ inactive. Thus there is a $t > 0$ such that
\[
f_0(x^\star + t d_{\epsilon_0}) < f_0(x^\star), \qquad f_m(x^\star + t d_{\epsilon_0}) \leq 0, \quad m = 1, \ldots, M,
\]
which directly contradicts the assertion that $x^\star$ is optimal.
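Slater's condition is easy to check numerically when a candidate point is available; more generally, one can search for a strictly feasible point by minimizing the worst constraint value. The sketch below (hypothetical constraints, using scipy) reports whether the point it finds satisfies $\max_m f_m(x) < 0$.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical constraints: f1(x) = ||x||^2 - 1 <= 0 and f2(x) = x1 + x2 - 1 <= 0.
constraints = [lambda x: np.dot(x, x) - 1.0,
               lambda x: x[0] + x[1] - 1.0]

def worst(x):
    # Largest constraint value; strictly negative means x is strictly feasible.
    return max(f(x) for f in constraints)

# Nelder-Mead handles the nonsmooth max; any point with worst(x) < 0 certifies Slater.
res = minimize(worst, x0=np.zeros(2), method="Nelder-Mead")

print("candidate point:", res.x)
print("Slater's condition holds:", worst(res.x) < 0)   # True here: x = 0 is already strictly feasible
```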

It should be clear from the two arguments above that Slater's condition can be refined: we only need a point which obeys $f_m(z) < 0$ for the $f_m$ which are not affine. We now state this formally:

Suppose that $f_1, \ldots, f_{M'}$ are affine functionals, and $f_{M'+1}, \ldots, f_M$ are convex functionals which are not affine. Suppose that Slater's condition holds for $f_{M'+1}, \ldots, f_M$, and let $x^\star$ be a solution to
\[
\underset{x \in \mathbb{R}^N}{\text{minimize}} \ f_0(x) \quad \text{subject to} \quad f_m(x) \leq 0, \ m = 1, \ldots, M.
\]
Then there exists a $\lambda$ such that $x^\star, \lambda$ obey the KKT conditions.
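To see why the affine constraints are exempt from the strict feasibility requirement, here is a minimal sketch (a hypothetical one-variable example) in which no strictly feasible point exists, yet KKT multipliers still exist at the solution because the constraints are affine.

```python
import numpy as np

# Hypothetical example: minimize (x - 1)^2 subject to x <= 0 and -x <= 0.
# The only feasible point is x = 0, so there is no strictly feasible point,
# but both constraints are affine, so Lagrange multipliers still exist.
x_star = 0.0
lam = np.array([2.0, 0.0])            # chosen so that stationarity holds

grad_f0 = 2.0 * (x_star - 1.0)        # = -2
grad_f = np.array([1.0, -1.0])        # gradients of f1(x) = x and f2(x) = -x
f_vals = np.array([x_star, -x_star])  # both constraints are active at x* = 0

print("K1:", np.all(f_vals <= 0))
print("K2:", np.all(lam >= 0))
print("K3:", np.allclose(lam * f_vals, 0.0))
print("K4:", np.isclose(grad_f0 + lam @ grad_f, 0.0))
```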

The above statement lets us extend the KKT conditions to optimization problems with linear equality constraints, which we now state.

KKT (with equality constraints)

The KKT conditions for the optimization program
\[
\begin{array}{ll}
\underset{x}{\text{minimize}} & f_0(x) \\
\text{subject to} & f_m(x) \leq 0, \quad m = 1, \ldots, M, \\
& h_p(x) = 0, \quad p = 1, \ldots, P,
\end{array}
\tag{5}
\]
in $x \in \mathbb{R}^N$, $\lambda \in \mathbb{R}^M$, and $\nu \in \mathbb{R}^P$ are
\[
\begin{aligned}
& f_m(x) \leq 0, \quad m = 1, \ldots, M, \qquad h_p(x) = 0, \quad p = 1, \ldots, P, && \text{(K1)} \\
& \lambda \geq 0, && \text{(K2)} \\
& \lambda_m f_m(x) = 0, \quad m = 1, \ldots, M, && \text{(K3)} \\
& \nabla f_0(x) + \sum_{m=1}^M \lambda_m \nabla f_m(x) + \sum_{p=1}^P \nu_p \nabla h_p(x) = 0. && \text{(K4)}
\end{aligned}
\]
We call the $\lambda$ and $\nu$ above Lagrange multipliers. Notice that $\lambda$ is constrained to be nonnegative, while $\nu$ can be arbitrary. Also, if the $h_p$ are affine, which they have to be for the program above to be convex, then we can write the equality constraints $h_p(x) = 0$, $p = 1, \ldots, P$, as $Ax = b$ for some $A \in \mathbb{R}^{P \times N}$ and $b \in \mathbb{R}^P$, and we can rewrite (K4) as
\[
\nabla f_0(x) + \sum_{m=1}^M \lambda_m \nabla f_m(x) + A^T \nu = 0.
\]
If the $f_m$ are convex and the $h_p$ affine, then the KKT conditions are sufficient for $x$ to be the solution to the convex program (5).
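A quick numerical check of these conditions for an equality-constrained problem (a hypothetical least-norm sketch, not from the notes): for minimize $\|x\|^2$ subject to $Ax = b$ with no inequality constraints, stationarity reads $2x + A^T\nu = 0$, so $x^\star$ and $\nu$ can be computed in closed form and then verified against (K1) and (K4).

```python
import numpy as np

# Hypothetical data: minimize ||x||^2 subject to Ax = b (no inequality constraints).
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])

# Stationarity: 2x + A^T nu = 0  =>  x = -A^T nu / 2; feasibility Ax = b then gives nu.
nu = np.linalg.solve(A @ A.T, -2.0 * b)
x_star = -A.T @ nu / 2.0

print("feasibility (K1): ", np.allclose(A @ x_star, b))
print("stationarity (K4):", np.allclose(2.0 * x_star + A.T @ nu, 0.0))
print("matches least-norm (pseudoinverse) solution:",
      np.allclose(x_star, np.linalg.pinv(A) @ b))
```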

If Slater's condition holds for the non-affine $f_m$, then they are also necessary. Almost nothing changes in the proofs above: we can simply separate an equality constraint of the form $\langle x, a \rangle = b$ into $\langle x, a \rangle - b \leq 0$ and $-\langle x, a \rangle + b \leq 0$. Then we can recombine the result, taking $\nu = \lambda^{(1)} - \lambda^{(2)}$, where $\lambda^{(1)}$ is the Lagrange multiplier for $\langle x, a \rangle - b$ and $\lambda^{(2)}$ is the same for $-\langle x, a \rangle + b$.
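The splitting trick is easy to verify numerically. Continuing the hypothetical least-norm sketch above, each equality $\langle x, a_p \rangle = b_p$ is replaced by a pair of inequalities, and one valid choice of inequality multipliers is $\lambda^{(1)}_p = \max(\nu_p, 0)$ and $\lambda^{(2)}_p = \max(-\nu_p, 0)$: both are nonnegative and they recombine to $\nu_p$.

```python
import numpy as np

# Equality multipliers nu from the previous least-norm sketch.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
nu = np.linalg.solve(A @ A.T, -2.0 * b)
x_star = -A.T @ nu / 2.0

# Split each equality into two inequalities and pick nonnegative multipliers
# lam1 = max(nu, 0), lam2 = max(-nu, 0) for the two halves.
lam1 = np.maximum(nu, 0.0)
lam2 = np.maximum(-nu, 0.0)

print("nonnegative:", np.all(lam1 >= 0) and np.all(lam2 >= 0))
print("recombine to nu:", np.allclose(lam1 - lam2, nu))
# Stationarity of the split program, 2x + A^T lam1 + (-A)^T lam2 = 0,
# reduces to the equality-constrained condition 2x + A^T nu = 0.
print("stationarity of split program:",
      np.allclose(2.0 * x_star + A.T @ lam1 - A.T @ lam2, 0.0))
```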

Technical Details: Proof of the Farkas Lemma

We prove the Farkas lemma: if $A$ is an $M \times N$ matrix and $b \in \mathbb{R}^M$ is a given vector, then exactly one of the following two things is true:

1. there exists $x \geq 0$ such that $Ax = b$;
2. there exists $v \in \mathbb{R}^M$ such that $A^T v \leq 0$ and $\langle b, v \rangle > 0$.

It is clear that if the first condition holds, the second cannot, as $\langle b, v \rangle = \langle x, A^T v \rangle$ for any $x$ such that $Ax = b$, and $\langle x, A^T v \rangle \leq 0$ for any $x \geq 0$ and $A^T v \leq 0$. It is more difficult to argue that if the first condition does not hold, the second must. This ends up being a direct result of the separating hyperplane theorem.

Let $\mathcal{C}(A)$ be the (convex) cone generated by the columns $a_1, \ldots, a_N$ of $A$:
\[
\mathcal{C}(A) = \left\{ v \in \mathbb{R}^M : v = \sum_{n=1}^N \theta_n a_n, \ \theta_n \geq 0, \ n = 1, \ldots, N \right\}.
\]
Then condition 1 above is clearly equivalent to $b \in \mathcal{C}(A)$. Since $\mathcal{C}(A)$ is closed and convex, and $b$ is a single point, we know that if $b \notin \mathcal{C}(A)$, then $\mathcal{C}(A)$ and $b$ are strongly separated by a hyperplane. That is, $b \notin \mathcal{C}(A)$ implies that there exists a $v \in \mathbb{R}^M$ such that
\[
v^T b > v^T z \quad \text{for all } z \in \mathcal{C}(A),
\]
which is the same as saying
\[
v^T b > \sup_{z \in \mathcal{C}(A)} v^T z = \sup_{x \geq 0} v^T A x.
\]

We know that $0 \in \mathcal{C}(A)$, so we must have $v^T b > 0$. The above equation also gives a finite upper bound (namely, whatever the actual value of $v^T b$ is) on the function $v^T A x$ for all $x \geq 0$. But this means that $A^T v \leq 0$, as otherwise we would have the following contradiction. If there were some index $n$ such that $(A^T v)[n] = \epsilon > 0$, then with $e_n \geq 0$ as the unit vector
\[
e_n[k] = \begin{cases} 1, & k = n, \\ 0, & k \neq n, \end{cases}
\]
we have
\[
\sup_{x \geq 0} v^T A x \ \geq \ \sup_{\alpha \geq 0} v^T A (\alpha e_n) \ = \ \sup_{\alpha \geq 0} \alpha \epsilon \ = \ \infty,
\]
which contradicts the existence of this upper bound.

References

[Lau13] N. Lauritzen. Undergraduate Convexity. World Scientific, 2013.