UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems

Robert M. Freund
February 2016

© 2016 Massachusetts Institute of Technology. All rights reserved.

1 Introduction

We consider constrained optimization problems of the form:

$$(P)\qquad \min_{x}\ f(x) \quad \text{s.t.} \quad g(x) \le 0, \quad h(x) = 0, \quad x \in X,$$

where $X$ is an open set and $g(x) = (g_1(x), \ldots, g_m(x)) : \mathbb{R}^n \to \mathbb{R}^m$, $h(x) = (h_1(x), \ldots, h_l(x)) : \mathbb{R}^n \to \mathbb{R}^l$. Let $F$ denote the feasible region of (P), namely,

$$F := \{x \in X : g(x) \le 0,\ h(x) = 0\}.$$

Then the problem (P) can be written as $\min_{x \in F} f(x)$.

Recall that $\bar{x}$ is a local minimum of (P) if there exists $\varepsilon > 0$ such that $f(\bar{x}) \le f(y)$ for all $y \in F \cap B(\bar{x}, \varepsilon)$. Also recall that local and global minima and maxima, strict and non-strict, are defined analogously.

We will often use the following shorthand notation:

$$\nabla g(x) = \begin{pmatrix} \nabla g_1(x)^T \\ \vdots \\ \nabla g_m(x)^T \end{pmatrix} \quad \text{and} \quad \nabla h(x) = \begin{pmatrix} \nabla h_1(x)^T \\ \vdots \\ \nabla h_l(x)^T \end{pmatrix}.$$

Here $\nabla g(x) \in \mathbb{R}^{m \times n}$ and $\nabla h(x) \in \mathbb{R}^{l \times n}$ are each Jacobian matrices, the $i$th row of which is the transpose of the corresponding gradient.
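As a concrete illustration of the feasible region $F$ and of the Jacobian shorthand, the following is a minimal Python sketch. The helper names `is_feasible` and `numerical_jacobian`, the tolerance, the finite-difference step, and the toy instance (a unit disk intersected with the line $x_1 = x_2$) are all assumptions made here for illustration; they are not part of the notes.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
VectorFunction = Callable[[Vector], Sequence[float]]


def is_feasible(x: Vector, g: VectorFunction, h: VectorFunction,
                tol: float = 1e-9) -> bool:
    """True if every g_i(x) <= tol and every |h_j(x)| <= tol."""
    return (all(gi <= tol for gi in g(x))
            and all(abs(hj) <= tol for hj in h(x)))


def numerical_jacobian(func: VectorFunction, x: Vector,
                       step: float = 1e-6) -> List[List[float]]:
    """Central-difference Jacobian: row i approximates the gradient of func_i."""
    x = list(x)
    rows = len(func(x))
    jac = [[0.0] * len(x) for _ in range(rows)]
    for j in range(len(x)):
        x_plus, x_minus = x[:], x[:]
        x_plus[j] += step
        x_minus[j] -= step
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(rows):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * step)
    return jac


# Hypothetical toy instance: g(x) = x1^2 + x2^2 - 1 <= 0, h(x) = x1 - x2 = 0.
def g_toy(x: Vector) -> List[float]:
    return [x[0] ** 2 + x[1] ** 2 - 1.0]


def h_toy(x: Vector) -> List[float]:
    return [x[0] - x[1]]


print(is_feasible([0.5, 0.5], g_toy, h_toy))   # inside the disk, on the line
print(is_feasible([1.0, 1.0], g_toy, h_toy))   # outside the disk
print(numerical_jacobian(g_toy, [0.5, 0.5]))   # approximately [[1.0, 1.0]]
```

The finite-difference Jacobian is only an approximation; when analytic gradients are available, as in the examples later in these notes, they are preferable.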

2 Necessary Optimality Conditions

2.1 Geometric Necessary Conditions

A set $C \subseteq \mathbb{R}^n$ is a cone if for every $x \in C$, $\alpha x \in C$ for any $\alpha \ge 0$. A set $C$ is a convex cone if $C$ is a cone and $C$ is a convex set.

Suppose $\bar{x} \in F$. We have the following definitions:

$F_0 := \{d : \nabla f(\bar{x})^T d < 0\}$ is the cone of improving directions of $f(x)$ at $\bar{x}$ (not including the origin 0 in the cone).

$I := \{i : g_i(\bar{x}) = 0\}$ is the set of indices of the binding inequality constraints at $\bar{x}$.

$G_0 := \{d : \nabla g_i(\bar{x})^T d < 0 \text{ for all } i \in I\}$ is the cone of inward pointing directions for the binding constraints at $\bar{x}$ (not including the origin 0 in the cone).

$H_0 := \{d : \nabla h_i(\bar{x})^T d = 0 \text{ for all } i = 1, \ldots, l\}$ is the set of tangent directions for the equality constraints at $\bar{x}$.

Theorem 2.1 (Geometric First-order Necessary Conditions) If $\bar{x}$ is a local minimum of (P) and either:

(i) $h(x)$ is a linear function, i.e., $h(x) = Ax - b$ for $A \in \mathbb{R}^{l \times n}$, or

(ii) $\nabla h_i(\bar{x})$, $i = 1, \ldots, l$, are linearly independent,

then $F_0 \cap G_0 \cap H_0 = \emptyset$.

Proof: Assume $h(x)$ is the linear function $h(x) = Ax - b$, whereby $\nabla h_i(\bar{x})^T = A_i$ (the $i$th row of $A$) for $i = 1, \ldots, l$ and $H_0 = \{d : Ad = 0\}$. Suppose $d \in F_0 \cap G_0 \cap H_0$. Suppose that $\lambda > 0$ and is sufficiently small. Then $g_i(\bar{x} + \lambda d) < g_i(\bar{x}) = 0$ for $i \in I$ because $d \in G_0$. Also $g_i(\bar{x} + \lambda d) < 0$ for $i \notin I$ from continuity and the fact that $g_i(\bar{x}) < 0$. Furthermore $h(\bar{x} + \lambda d) = (A\bar{x} - b) + \lambda A d = 0$.

Therefore $\bar{x} + \lambda d \in F$ for all $\lambda > 0$ and sufficiently small. On the other hand, for all sufficiently small $\lambda > 0$ it holds that $f(\bar{x} + \lambda d) < f(\bar{x})$. This contradicts the assumption that $\bar{x}$ is a local minimum of (P).

The proof when $h(x)$ is nonlinear is rather involved, as it relies on the Implicit Function Theorem. We present the proof in Section 6 at the end of this lecture note.

Note that Theorem 2.1 essentially states that if a point $\bar{x}$ is (locally) optimal, there is no direction $d$ which is both a feasible direction (i.e., such that $g(\bar{x} + \lambda d) \le 0$ and $h(\bar{x} + \lambda d) \approx 0$ for small $\lambda > 0$) and an improving direction (i.e., such that $f(\bar{x} + \lambda d) < f(\bar{x})$ for small $\lambda > 0$), which makes sense intuitively.

2.2 Separation of Convex Sets

We will shortly attempt to restate the geometric necessary local optimality conditions $F_0 \cap G_0 \cap H_0 = \emptyset$ as a constructive and computable algebraic statement about the gradients of the objective function and the constraint functions. The vehicle that will make this happen involves the separation theory of convex sets. We start with some definitions:

If $p \ne 0$ is a vector in $\mathbb{R}^n$ and $\alpha$ is a scalar, $H := \{x \in \mathbb{R}^n : p^T x = \alpha\}$ is a hyperplane, and $H^+ = \{x \in \mathbb{R}^n : p^T x \ge \alpha\}$, $H^- = \{x \in \mathbb{R}^n : p^T x \le \alpha\}$ are halfspaces.

Let $S$ and $T$ be two non-empty sets in $\mathbb{R}^n$. A hyperplane $H = \{x : p^T x = \alpha\}$ is said to separate $S$ and $T$ if $p^T x \ge \alpha$ for all $x \in S$ and $p^T x \le \alpha$ for all $x \in T$, i.e., if $S \subseteq H^+$ and $T \subseteq H^-$. If, in addition, $S \cup T \not\subseteq H$, then $H$ is said to properly separate $S$ and $T$.

$H$ is said to strictly separate $S$ and $T$ if $p^T x > \alpha$ for all $x \in S$ and $p^T x < \alpha$ for all $x \in T$.

$H$ is said to strongly separate $S$ and $T$ if for some $\varepsilon > 0$ it holds that $p^T x \ge \alpha + \varepsilon$ for all $x \in S$ and $p^T x \le \alpha - \varepsilon$ for all $x \in T$.
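The four notions of separation just defined can be checked mechanically when $S$ and $T$ are finite point sets. The sketch below is only an illustration under that assumption: the function name `classify_separation` and the sample points are hypothetical. For finite sets, strict separation always implies strong separation for some $\varepsilon$, so the code tests strong separation only for a caller-supplied $\varepsilon$.

```python
from typing import Dict, Sequence

Vector = Sequence[float]


def dot(p: Vector, x: Vector) -> float:
    """Inner product p^T x."""
    return sum(pi * xi for pi, xi in zip(p, x))


def classify_separation(p: Vector, alpha: float,
                        S: Sequence[Vector], T: Sequence[Vector],
                        eps: float = 0.0) -> Dict[str, bool]:
    """Test the hyperplane {x : p^T x = alpha} against finite sets S and T."""
    s_vals = [dot(p, x) for x in S]
    t_vals = [dot(p, x) for x in T]
    separates = min(s_vals) >= alpha and max(t_vals) <= alpha
    # Proper: separates, and S union T is not contained in the hyperplane.
    proper = separates and any(v != alpha for v in s_vals + t_vals)
    strict = min(s_vals) > alpha and max(t_vals) < alpha
    strong = (eps > 0 and min(s_vals) >= alpha + eps
              and max(t_vals) <= alpha - eps)
    return {"separates": separates, "proper": proper,
            "strict": strict, "strong": strong}


# Hypothetical sample data in R^2.
S_points = [(1.0, 0.0), (2.0, 1.0)]
T_points = [(-1.0, 0.0), (0.0, 3.0)]
p_vec = (1.0, 0.0)

# alpha = 0 touches T at (0, 3): separation is proper but not strict.
print(classify_separation(p_vec, 0.0, S_points, T_points))
# alpha = 0.5 with eps = 0.5 gives strong (hence strict) separation.
print(classify_separation(p_vec, 0.5, S_points, T_points, eps=0.5))
```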

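Proposition 2.1 Let $S$ be a nonempty closed convex set in $\mathbb{R}^n$, and $y \notin S$. Then there exists a unique point $\bar{x} \in S$ of minimum distance to $y$. Furthermore, $\bar{x}$ is the minimizing point if and only if

$$(y - \bar{x})^T (x - \bar{x}) \le 0 \quad \text{for all } x \in S. \qquad (1)$$

Proof: Let $f(x) = \|x - y\|^2$. Let $\hat{x}$ be an arbitrary point in $S$, and let $\hat{S} = S \cap \{x : \|x - y\| \le \|\hat{x} - y\|\}$. Then $\hat{S}$ is a compact set and $\hat{S} \subseteq S$. Furthermore, since $f(x)$ is a continuous function it follows from the Weierstrass Theorem that $f(\cdot)$ attains its minimum over the set $\hat{S}$ at some point $\bar{x} \in \hat{S}$. As a consequence of the construction of $\hat{S}$ it also follows that $\bar{x}$ minimizes $f(\cdot)$ over the set $S$. Note that $\bar{x} \ne y$. Furthermore, because $f(\cdot)$ is a strictly convex function, $\bar{x}$ is unique.

We now establish that $\bar{x}$ is the point in $S$ that minimizes the distance to $y$ if and only if (1) holds. Let us first assume that (1) holds; then note that for any $x \in S$,

$$\|x - y\|^2 = \|(x - \bar{x}) - (y - \bar{x})\|^2 = \|x - \bar{x}\|^2 + \|y - \bar{x}\|^2 - 2(x - \bar{x})^T (y - \bar{x}) \ge \|\bar{x} - y\|^2.$$

Therefore $f(x) \ge f(\bar{x})$, and so $\bar{x}$ is the point in $S$ that minimizes the distance to $y$.

Conversely, let us assume that $\bar{x}$ is the point in $S$ that minimizes the distance to $y$. For any $x \in S$, $\lambda x + (1 - \lambda)\bar{x} \in S$ for any $\lambda \in [0, 1]$. Therefore $\|\lambda x + (1 - \lambda)\bar{x} - y\| \ge \|\bar{x} - y\|$. Thus:

$$\|\bar{x} - y\|^2 \le \|\lambda x + (1 - \lambda)\bar{x} - y\|^2 = \|\lambda(x - \bar{x}) + (\bar{x} - y)\|^2 = \lambda^2 \|x - \bar{x}\|^2 + 2\lambda (x - \bar{x})^T (\bar{x} - y) + \|\bar{x} - y\|^2,$$

which when rearranged yields:

$$\lambda^2 \|x - \bar{x}\|^2 \ge 2\lambda (y - \bar{x})^T (x - \bar{x}).$$

Restricting attention to when $\lambda > 0$, we divide by $\lambda$ to yield:

$$\lambda \|x - \bar{x}\|^2 \ge 2 (y - \bar{x})^T (x - \bar{x}).$$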

Now let $\lambda \to 0$, whereby we obtain $(y - \bar{x})^T (x - \bar{x}) \le 0$.

Theorem 2.2 Let $S$ be a nonempty closed convex set in $\mathbb{R}^n$, and suppose that $y \notin S$. Then there exists $p \ne 0$ and $\alpha$ such that $H = \{x : p^T x = \alpha\}$ strongly separates $S$ and $\{y\}$.

Proof: Let $\bar{x}$ be the point in $S$ that minimizes the distance from $y$ to the set $S$. Note that $\bar{x} \ne y$. Let $p = y - \bar{x}$, $\alpha = \tfrac{1}{2}(p^T y + p^T \bar{x})$, and $\varepsilon = \tfrac{1}{2}(p^T y - p^T \bar{x}) = \tfrac{1}{2}\|y - \bar{x}\|^2 > 0$. From Proposition 2.1 it follows that for any $x \in S$, $(x - \bar{x})^T (y - \bar{x}) \le 0$, and so

$$p^T x = (y - \bar{x})^T x \le (y - \bar{x})^T \bar{x} = p^T \bar{x} = \alpha - \varepsilon.$$

Therefore $p^T x \le \alpha - \varepsilon$ for all $x \in S$. On the other hand, $p^T y = \alpha + \varepsilon$, which completes the proof.

We now use Theorem 2.2 to prove two useful results about the solvability of linear inequalities:

Theorem 2.3 (Farkas Lemma) Given an $m \times n$ matrix $A$ and an $n$-vector $c$, exactly one of the following two systems has a solution:

(i) $Ax \le 0$, $c^T x > 0$, or

(ii) $A^T y = c$, $y \ge 0$.

Proof: First note that both systems (i) and (ii) cannot have a solution, since then we would have $0 < c^T x = y^T A x \le 0$. Suppose the system (ii) has no solution. Let $S = \{x : x = A^T y \text{ for some } y \ge 0\}$. Then $c \notin S$. $S$ is easily seen to be a convex set. Also, $S$ is a closed set (for an exact proof of this fact, see Appendix B.3 of Nonlinear Programming by Dimitri Bertsekas, Athena Scientific, 1999). Therefore there exists $p \ne 0$ and $\alpha$ such that $c^T p > \alpha$ and $p^T x \le \alpha$ for all $x \in S$. It then follows by construction of $S$ that $p^T (A^T y) = (Ap)^T y \le \alpha$ for all $y \ge 0$.

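If $(Ap)_i > 0$ for some $i$, one could set $y_i$ sufficiently large so that $(Ap)^T y > \alpha$, a contradiction. Thus $Ap \le 0$. Taking $y = 0$, we also have that $\alpha \ge 0$, and so $c^T p > 0$, whereby $p$ is a solution of (i).

Lemma 2.1 [Key Lemma] Given matrices $\bar{A}$, $B$, and $H$ of appropriate dimensions, exactly one of the following two systems has a solution:

(i) $\bar{A}x < 0$, $Bx \le 0$, $Hx = 0$, or

(ii) $\bar{A}^T u + B^T w + H^T v = 0$, $u \ge 0$, $w \ge 0$, $e^T u = 1$.

Proof: It is easy to show that both (i) and (ii) cannot have a solution. Suppose (i) does not have a solution. Then the system:

$$\bar{A}x + e\theta \le 0, \quad \theta > 0, \quad Bx \le 0, \quad Hx \le 0, \quad -Hx \le 0$$

has no solution. This system can be re-written in the form:

$$\begin{pmatrix} \bar{A} & e \\ B & 0 \\ H & 0 \\ -H & 0 \end{pmatrix} \begin{pmatrix} x \\ \theta \end{pmatrix} \le 0, \qquad (0, \ldots, 0, 1) \begin{pmatrix} x \\ \theta \end{pmatrix} > 0.$$

From Farkas Lemma, there exists a vector $(u; w; v^1; v^2) \ge 0$ such that:

$$\begin{pmatrix} \bar{A} & e \\ B & 0 \\ H & 0 \\ -H & 0 \end{pmatrix}^T \begin{pmatrix} u \\ w \\ v^1 \\ v^2 \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}.$$

This can be rewritten as: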

$$\bar{A}^T u + B^T w + H^T (v^1 - v^2) = 0, \qquad e^T u = 1.$$

Setting $v = v^1 - v^2$ completes the proof.

We conclude this subsection with some extensions/consequences of Theorem 2.2. The first extension is to the strong separation of two convex sets when one of the sets is closed and bounded, as follows:

Theorem 2.4 Suppose $S_1$ and $S_2$ are disjoint nonempty closed convex sets and $S_1$ is bounded. Then $S_1$ and $S_2$ can be strongly separated by a hyperplane.

Proof: Let $T = \{x \in \mathbb{R}^n : x = y - z, \text{ where } y \in S_1, z \in S_2\}$. Then it is easy to show that $T$ is a convex set. We also claim that $T$ is a closed set. To see this, let $\{x_i\} \subset T$, and suppose $\bar{x} = \lim_{i \to \infty} x_i$. Then $x_i = y_i - z_i$ for $\{y_i\} \subset S_1$ and $\{z_i\} \subset S_2$. By the Weierstrass Theorem, some subsequence of $\{y_i\}$ converges to a point $\bar{y} \in S_1$. Then $z_i = y_i - x_i \to \bar{y} - \bar{x}$ (over this subsequence), so that $\bar{z} = \bar{y} - \bar{x}$ is a limit point of $\{z_i\}$. Since $S_2$ is also closed, $\bar{z} \in S_2$, and therefore $\bar{x} = \bar{y} - \bar{z} \in T$, proving that $T$ is a closed set.

By hypothesis $S_1 \cap S_2 = \emptyset$, whereby $0 \notin T$. Since $T$ is convex and closed, there exists a hyperplane $H = \{x : p^T x = \bar{\alpha}\}$ such that $p^T x > \bar{\alpha}$ for $x \in T$ and $p^T 0 < \bar{\alpha}$ (and hence $\bar{\alpha} > 0$).

Let $y \in S_1$ and $z \in S_2$. Then $x = y - z \in T$, and so $p^T (y - z) = p^T x > \bar{\alpha} > 0$ for any $y \in S_1$ and $z \in S_2$.

Let $\alpha_1 = \inf\{p^T y : y \in S_1\}$ and $\alpha_2 = \sup\{p^T z : z \in S_2\}$, and note that $0 < \bar{\alpha} \le \alpha_1 - \alpha_2$. Define $\alpha = \tfrac{1}{2}(\alpha_1 + \alpha_2)$ and $\varepsilon = \tfrac{1}{2}\bar{\alpha} > 0$. Then for all $y \in S_1$ and $z \in S_2$ we have

$$p^T y \ge \alpha_1 = \tfrac{1}{2}(\alpha_1 + \alpha_2) + \tfrac{1}{2}(\alpha_1 - \alpha_2) \ge \alpha + \tfrac{1}{2}\bar{\alpha} = \alpha + \varepsilon$$

and

$$p^T z \le \alpha_2 = \tfrac{1}{2}(\alpha_1 + \alpha_2) - \tfrac{1}{2}(\alpha_1 - \alpha_2) \le \alpha - \tfrac{1}{2}\bar{\alpha} = \alpha - \varepsilon.$$

Therefore $S_1$ and $S_2$ are strongly separated by the hyperplane $H := \{x \in \mathbb{R}^n : p^T x = \alpha\}$.
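The construction in the proof of Theorem 2.2 ($p = y - \bar{x}$, with $\alpha$ and $\varepsilon$ defined from $p^T y$ and $p^T \bar{x}$) can be traced numerically whenever the projection onto $S$ is easy to compute. The sketch below assumes $S$ is a coordinate box, for which the closest point is obtained by clipping each coordinate; the box, the point $y$, and all function names are hypothetical choices made for illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(p: Vector, x: Vector) -> float:
    return sum(pi * xi for pi, xi in zip(p, x))


def project_onto_box(y: Vector, lower: Vector, upper: Vector) -> List[float]:
    """Closest point to y in the box {x : lower <= x <= upper} (clip per coordinate)."""
    return [min(max(yi, lo), hi) for yi, lo, hi in zip(y, lower, upper)]


def separating_hyperplane(y: Vector, x_bar: Vector) -> Tuple[List[float], float, float]:
    """Return (p, alpha, eps) exactly as constructed in the proof of Theorem 2.2."""
    p = [yi - xi for yi, xi in zip(y, x_bar)]
    alpha = 0.5 * (dot(p, y) + dot(p, x_bar))
    eps = 0.5 * (dot(p, y) - dot(p, x_bar))
    return p, alpha, eps


# Hypothetical data: S is the unit square, y lies outside it.
lower, upper = [0.0, 0.0], [1.0, 1.0]
y = [2.0, 0.5]
x_bar = project_onto_box(y, lower, upper)
p, alpha, eps = separating_hyperplane(y, x_bar)
corners = [list(c) for c in product(*zip(lower, upper))]

# Inequality (1) of Proposition 2.1, checked at the corners of the box.
ineq_1_holds = all(
    dot(p, [ci - xi for ci, xi in zip(c, x_bar)]) <= 0 for c in corners)
# Strong separation: p^T x <= alpha - eps on S and p^T y = alpha + eps.
strongly_separated = (all(dot(p, c) <= alpha - eps for c in corners)
                      and dot(p, y) == alpha + eps)

print(x_bar, p, alpha, eps)
print(ineq_1_holds, strongly_separated)
```

Checking only the corners suffices here because a linear function attains its maximum over a box at a corner.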

Corollary 2.1 If $S$ is a closed convex set in $\mathbb{R}^n$, then $S$ is the intersection of all halfspaces that contain $S$.

Theorem 2.5 Let $S \subseteq \mathbb{R}^n$, and let $C$ be the intersection of all halfspaces containing $S$. Then $C$ is the smallest closed convex set containing $S$.

2.3 Algebraic Necessary Conditions

In this subsection we restate the geometric necessary local optimality conditions of Theorem 2.1, $F_0 \cap G_0 \cap H_0 = \emptyset$, as a constructive and computable algebraic statement about the gradient of the objective function and gradients of the constraint functions.

Theorem 2.6 (Fritz John Necessary Conditions) Let $\bar{x}$ be a feasible solution of (P). If $\bar{x}$ is a local minimum of (P), then there exists $(u_0, u, v)$ such that

$$u_0 \nabla f(\bar{x}) + \sum_{i=1}^{m} u_i \nabla g_i(\bar{x}) + \sum_{i=1}^{l} v_i \nabla h_i(\bar{x}) = 0,$$

$$u_0, u \ge 0, \qquad (u_0, u, v) \ne 0,$$

$$u_i g_i(\bar{x}) = 0, \quad i = 1, \ldots, m.$$

(Note that the first equation can be rewritten as $u_0 \nabla f(\bar{x}) + \nabla g(\bar{x})^T u + \nabla h(\bar{x})^T v = 0$.)

Proof: If the vectors $\nabla h_i(\bar{x})$ are linearly dependent, then there exists $v \ne 0$ such that $\sum_{i=1}^{l} v_i \nabla h_i(\bar{x}) = 0$. Setting $(u_0, u) = 0$ establishes the result.

Suppose now that the vectors $\nabla h_i(\bar{x})$ are linearly independent. Then we can apply Theorem 2.1 and assert that $F_0 \cap G_0 \cap H_0 = \emptyset$. Assume for simplicity that $I = \{1, \ldots, p\}$. Let

$$\bar{A} = \begin{pmatrix} \nabla f(\bar{x})^T \\ \nabla g_1(\bar{x})^T \\ \vdots \\ \nabla g_p(\bar{x})^T \end{pmatrix}, \qquad H = \begin{pmatrix} \nabla h_1(\bar{x})^T \\ \vdots \\ \nabla h_l(\bar{x})^T \end{pmatrix}.$$

Then there is no $d$ that satisfies $\bar{A}d < 0$, $Hd = 0$. From the Key Lemma (Lemma 2.1), there exists $(u_0, u_1, \ldots, u_p)$ and $(v_1, \ldots, v_l)$ such that

$$u_0 \nabla f(\bar{x}) + \sum_{i=1}^{p} u_i \nabla g_i(\bar{x}) + \sum_{i=1}^{l} v_i \nabla h_i(\bar{x}) = 0,$$

with $u_0 + u_1 + \cdots + u_p = 1$ and $(u_0, u_1, \ldots, u_p) \ge 0$. Define $u_{p+1}, \ldots, u_m = 0$. Then $(u_0, u) \ge 0$, $(u_0, u) \ne 0$, and for any $i$, either $g_i(\bar{x}) = 0$, or $u_i = 0$. Furthermore,

$$u_0 \nabla f(\bar{x}) + \sum_{i=1}^{m} u_i \nabla g_i(\bar{x}) + \sum_{i=1}^{l} v_i \nabla h_i(\bar{x}) = 0.$$

When $u_0 > 0$, the Fritz John conditions state that the gradient of the objective function and the gradients of the active constraints must line up in an intuitively obvious way having to do with the local geometry of feasible directions at the point $\bar{x}$.

The condition $u_i g_i(\bar{x}) = 0$, $i = 1, \ldots, m$, can be equivalently stated as either $g_i(\bar{x}) = 0$ or $u_i = 0$, $i = 1, \ldots, m$, and is called the complementary slackness condition or simply the complementarity condition.

Theorem 2.7 (Karush-Kuhn-Tucker (KKT) Necessary Conditions) Let $\bar{x}$ be a feasible solution of (P) and let $I = \{i : g_i(\bar{x}) = 0\}$. Further, suppose that $\nabla h_i(\bar{x})$ for $i = 1, \ldots, l$ and $\nabla g_i(\bar{x})$ for $i \in I$ are linearly independent. If $\bar{x}$ is a local minimum, there exists $(u, v)$ such that

$$\nabla f(\bar{x}) + \nabla g(\bar{x})^T u + \nabla h(\bar{x})^T v = 0,$$

$$u \ge 0, \qquad u_i g_i(\bar{x}) = 0, \quad i = 1, \ldots, m.$$

Proof: $\bar{x}$ must satisfy the Fritz John conditions. If $u_0 > 0$, we can redefine $u \leftarrow u/u_0$ and $v \leftarrow v/u_0$, whereby the redefined values $(u, v)$ satisfy the conditions of the theorem. If $u_0 = 0$, then $(u, v) \ne 0$ and complementarity gives $u_i = 0$ for $i \notin I$, so

$$\sum_{i \in I} u_i \nabla g_i(\bar{x}) + \sum_{i=1}^{l} v_i \nabla h_i(\bar{x}) = 0$$

with multipliers that are not all zero, and so the above gradients are linearly dependent. This contradicts the assumptions of the theorem.

Example 1 Consider the problem:

$$\begin{array}{ll} \min & 6(x_1 - 10)^2 + 4(x_2 - 12.5)^2 \\ \text{s.t.} & x_1^2 + (x_2 - 5)^2 \le 50 \\ & x_1^2 + 3x_2^2 \le 200 \\ & (x_1 - 6)^2 + x_2^2 \le 37 \end{array}$$

In this problem, we have:

$$f(x) = 6(x_1 - 10)^2 + 4(x_2 - 12.5)^2$$
$$g_1(x) = x_1^2 + (x_2 - 5)^2 - 50$$
$$g_2(x) = x_1^2 + 3x_2^2 - 200$$
$$g_3(x) = (x_1 - 6)^2 + x_2^2 - 37$$

We also have:

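$$\nabla f(x) = \begin{pmatrix} 12(x_1 - 10) \\ 8(x_2 - 12.5) \end{pmatrix}, \quad \nabla g_1(x) = \begin{pmatrix} 2x_1 \\ 2(x_2 - 5) \end{pmatrix}, \quad \nabla g_2(x) = \begin{pmatrix} 2x_1 \\ 6x_2 \end{pmatrix}, \quad \nabla g_3(x) = \begin{pmatrix} 2(x_1 - 6) \\ 2x_2 \end{pmatrix}.$$

Let us determine whether or not the point $\bar{x} = (\bar{x}_1, \bar{x}_2) = (7, 6)$ is a candidate to be an optimal solution to this problem. We first check for feasibility:

$$g_1(\bar{x}) = 0 \le 0, \qquad g_2(\bar{x}) = -43 < 0, \qquad g_3(\bar{x}) = 0 \le 0.$$

To check for optimality, we compute all gradients at $\bar{x}$: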

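$$\nabla f(\bar{x}) = \begin{pmatrix} -36 \\ -52 \end{pmatrix}, \quad \nabla g_1(\bar{x}) = \begin{pmatrix} 14 \\ 2 \end{pmatrix}, \quad \nabla g_2(\bar{x}) = \begin{pmatrix} 14 \\ 36 \end{pmatrix}, \quad \nabla g_3(\bar{x}) = \begin{pmatrix} 2 \\ 12 \end{pmatrix}.$$

We next check to see if the gradients line up, by trying to solve for $u_1 \ge 0$, $u_2 = 0$, $u_3 \ge 0$ in the following system:

$$\begin{pmatrix} -36 \\ -52 \end{pmatrix} + u_1 \begin{pmatrix} 14 \\ 2 \end{pmatrix} + u_2 \begin{pmatrix} 14 \\ 36 \end{pmatrix} + u_3 \begin{pmatrix} 2 \\ 12 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

Notice that $\bar{u} = (\bar{u}_1, \bar{u}_2, \bar{u}_3) = (2, 0, 4)$ solves this system, and that $\bar{u} \ge 0$ and $\bar{u}_2 = 0$. Therefore $\bar{x}$ is a candidate to be an optimal solution of this problem.

Example 2 Consider the problem (P):

$$(P): \quad \max_x\ x^T Q x \quad \text{s.t.} \quad \|x\| \le 1,$$

where $Q$ is symmetric. This is equivalent to:

$$\min_x\ -x^T Q x \quad \text{s.t.} \quad x^T x \le 1.$$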

The Fritz John conditions are:

$$-2u_0 Q x + 2u x = 0, \quad x^T x \le 1, \quad (u_0, u) \ge 0, \quad (u_0, u) \ne 0, \quad u(1 - x^T x) = 0.$$

Suppose that $u_0 = 0$. Then $u \ne 0$. However, the gradient condition implies that $x = 0$, and then from complementarity we have $u = 0$, which is a contradiction. Therefore we can assume that $u_0 > 0$ and so instead work directly with the KKT conditions:

$$-2Qx + 2ux = 0, \quad x^T x \le 1, \quad u \ge 0, \quad u(1 - x^T x) = 0.$$

First note that if $x$ and $u$ satisfy the KKT conditions, then $x^T Q x = u x^T x = u$, whereby we see that $u$ is the objective function value of the problem (P) at $x$. One solution to the KKT system is $x = 0$, $u = 0$, with objective function value $u = x^T Q x = 0$. Are there any better solutions to the KKT system?

If $x \ne 0$ is a solution of the KKT system together with some value $u$, then $x$ is an eigenvector of $Q$ with nonnegative eigenvalue $u$, so the objective value of this solution is $u$. Therefore the solution of (P) with the largest objective function value is $x = 0$ if the largest eigenvalue of $Q$ is nonpositive. If the largest eigenvalue of $Q$ is positive, then the optimal objective value of (P) is the largest eigenvalue, and the optimal solution is any eigenvector $x$ corresponding to this eigenvalue, normalized so that $\|x\| = 1$.

Example 3 Consider the problem:

$$\begin{array}{ll} \min & (x_1 - 12)^2 + (x_2 + 6)^2 \\ \text{s.t.} & x_1^2 + 3x_1 + x_2^2 - 4.5x_2 \le 6.5 \\ & (x_1 - 9)^2 + x_2^2 \le 64 \\ & 8x_1 + 4x_2 = 20 \end{array}$$

In this problem, we have:

$$f(x) = (x_1 - 12)^2 + (x_2 + 6)^2$$
$$g_1(x) = x_1^2 + 3x_1 + x_2^2 - 4.5x_2 - 6.5$$
$$g_2(x) = (x_1 - 9)^2 + x_2^2 - 64$$
$$h_1(x) = 8x_1 + 4x_2 - 20$$

Let us determine whether or not the point $\bar{x} = (\bar{x}_1, \bar{x}_2) = (2, 1)$ is a candidate to be an optimal solution to this problem. We first check for feasibility:

$$g_1(\bar{x}) = 0 \le 0, \qquad g_2(\bar{x}) = -14 < 0, \qquad h_1(\bar{x}) = 0.$$

To check for optimality, we compute all gradients at $\bar{x}$:

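$$\nabla f(\bar{x}) = \begin{pmatrix} -20 \\ 14 \end{pmatrix}, \quad \nabla g_1(\bar{x}) = \begin{pmatrix} 7 \\ -2.5 \end{pmatrix}, \quad \nabla g_2(\bar{x}) = \begin{pmatrix} -14 \\ 2 \end{pmatrix}, \quad \nabla h_1(\bar{x}) = \begin{pmatrix} 8 \\ 4 \end{pmatrix}.$$

We next check to see if the gradients line up, by trying to solve for $u_1 \ge 0$, $u_2 = 0$, $v_1$ in the following system:

$$\begin{pmatrix} -20 \\ 14 \end{pmatrix} + u_1 \begin{pmatrix} 7 \\ -2.5 \end{pmatrix} + u_2 \begin{pmatrix} -14 \\ 2 \end{pmatrix} + v_1 \begin{pmatrix} 8 \\ 4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

Notice that $(\bar{u}, \bar{v}) = (\bar{u}_1, \bar{u}_2, \bar{v}_1) = (4, 0, -1)$ solves this system and that $\bar{u} \ge 0$ and $\bar{u}_2 = 0$. Therefore $\bar{x}$ is a candidate to be an optimal solution of this problem.

2.4 Other Constraint Qualifications for KKT Necessary Conditions

Recall that the statement of the KKT necessary conditions established herein in Theorem 2.7 has the form "if $\bar{x}$ is a local minimum of (P) and (some requirement for the constraints) then the KKT conditions must hold at $\bar{x}$." This additional requirement for the constraints that enables us to proceed with the proof of the KKT conditions is called a constraint qualification.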
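The multiplier computations in Examples 1 and 3 amount to solving a 2-by-2 linear system in the multipliers of the active constraints, with the multiplier of the inactive constraint set to zero. The following sketch redoes both computations in plain Python; the helper `solve_2x2` (Cramer's rule) and the variable names are hypothetical, while the gradient values are those obtained from the problem data at $(7, 6)$ and $(2, 1)$.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]


def solve_2x2(col_a: Vector, col_b: Vector, rhs: Vector) -> Tuple[float, float]:
    """Solve s*col_a + t*col_b = rhs for (s, t) by Cramer's rule."""
    det = col_a[0] * col_b[1] - col_b[0] * col_a[1]
    if det == 0:
        raise ValueError("columns are linearly dependent")
    s = (rhs[0] * col_b[1] - col_b[0] * rhs[1]) / det
    t = (col_a[0] * rhs[1] - rhs[0] * col_a[1]) / det
    return s, t


# Example 1 at x_bar = (7, 6): g_1 and g_3 are active, g_2 is not (u_2 = 0).
x1, x2 = 7.0, 6.0
grad_f = (12 * (x1 - 10), 8 * (x2 - 12.5))
grad_g1 = (2 * x1, 2 * (x2 - 5))
grad_g3 = (2 * (x1 - 6), 2 * x2)
# Stationarity: grad_f + u1*grad_g1 + u3*grad_g3 = 0.
u1, u3 = solve_2x2(grad_g1, grad_g3, (-grad_f[0], -grad_f[1]))
print("Example 1 multipliers:", u1, 0.0, u3)

# Example 3 at x_bar = (2, 1): g_1 and h_1 are active, g_2 is not (u_2 = 0).
y1, y2 = 2.0, 1.0
grad_f3 = (2 * (y1 - 12), 2 * (y2 + 6))
grad_g1_3 = (2 * y1 + 3, 2 * y2 - 4.5)
grad_h1_3 = (8.0, 4.0)
w1, v1 = solve_2x2(grad_g1_3, grad_h1_3, (-grad_f3[0], -grad_f3[1]))
print("Example 3 multipliers:", w1, 0.0, v1)

# KKT sign check: inequality multipliers must be nonnegative; v1 is free in sign.
print(u1 >= 0 and u3 >= 0, w1 >= 0)
```

Note that the equality-constraint multiplier in Example 3 comes out negative, which is permitted because only the multipliers of inequality constraints are sign-restricted.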

In Theorem 2.7 we established the following constraint qualification:

Linear Independence Constraint Qualification: The gradients $\nabla g_i(\bar{x})$, $i \in I$, $\nabla h_i(\bar{x})$, $i = 1, \ldots, l$ are linearly independent.

We will now establish two alternative (and useful) constraint qualifications. We start with a definition:

Definition 2.1 A point $x$ is called a Slater point if $x$ satisfies $g(x) < 0$ and $h(x) = 0$, that is, $x$ is feasible and satisfies all inequalities strictly.

Theorem 2.8 (Slater condition) Suppose that $g_i(\cdot)$, $i = 1, \ldots, m$ are convex, $h_i(\cdot)$, $i = 1, \ldots, l$ are linear, and $\nabla h_i(x)$, $i = 1, \ldots, l$ are linearly independent for every $x$, and (P) has a Slater point. Then the KKT conditions are necessary to characterize a local optimal solution.

Proof: Let $\bar{x}$ be a local minimum. The Fritz John conditions are necessary for this problem, whereby there must exist $(u_0, u, v) \ne 0$ such that $(u_0, u) \ge 0$ and

$$u_0 \nabla f(\bar{x}) + \sum_{i=1}^{m} u_i \nabla g_i(\bar{x}) + \sum_{i=1}^{l} v_i \nabla h_i(\bar{x}) = 0, \qquad u_i g_i(\bar{x}) = 0, \quad i = 1, \ldots, m.$$

If $u_0 > 0$, dividing through by $u_0$ demonstrates the KKT conditions. Now suppose $u_0 = 0$. Let $x^0$ be a Slater point, and define $d := x^0 - \bar{x}$. Then for each $i \in I$, $0 = g_i(\bar{x}) > g_i(x^0)$, and by the convexity of $g_i(\cdot)$ we have $\nabla g_i(\bar{x})^T d < 0$. Also, since $h_i(x)$ is linear, $\nabla h_i(\bar{x})^T d = 0$, $i = 1, \ldots, l$. Suppose that $u_i > 0$ for some $i \in I$. It then would be true that

$$0 = 0^T d = \left( u_0 \nabla f(\bar{x}) + \sum_{i=1}^{m} u_i \nabla g_i(\bar{x}) + \sum_{i=1}^{l} v_i \nabla h_i(\bar{x}) \right)^T d < 0,$$

which yields a contradiction. Therefore $u_i = 0$ for all $i \in I$. But this then implies that $v \ne 0$ and $\sum_{i=1}^{l} v_i \nabla h_i(\bar{x}) = 0$, which violates the linear independence assumption. This is a contradiction, and so $u_0 > 0$.
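The Linear Independence Constraint Qualification can be tested numerically by comparing the rank of the matrix of active gradients with the number of active gradients. The sketch below uses a small Gaussian-elimination rank routine written here for illustration; the names `matrix_rank` and `licq_holds` and the pivot tolerance are assumptions. The sample gradients are those of Example 1 at $(7, 6)$.

```python
from typing import List, Sequence


def matrix_rank(rows: Sequence[Sequence[float]], tol: float = 1e-10) -> int:
    """Rank via Gaussian elimination with partial pivoting."""
    m: List[List[float]] = [list(map(float, r)) for r in rows]
    if not m:
        return 0
    n_cols = len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == len(m):
            break
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def licq_holds(active_gradients: Sequence[Sequence[float]]) -> bool:
    """LICQ: the active gradients are linearly independent."""
    return matrix_rank(active_gradients) == len(active_gradients)


# Example 1 at (7, 6): the active constraints are g_1 and g_3.
print(licq_holds([(14.0, 2.0), (2.0, 12.0)]))
# Three gradients in R^2 can never be linearly independent.
print(licq_holds([(14.0, 2.0), (14.0, 36.0), (2.0, 12.0)]))
```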

Theorem 2.9 (Linear constraints) If all constraints are linear, the KKT conditions are necessary to characterize a local optimal solution.

Proof: Our problem is

$$(P)\qquad \min_x\ f(x) \quad \text{s.t.} \quad Ax \le b, \quad Mx = g.$$

Suppose $\bar{x}$ is a local optimum. Without loss of generality, we can partition the constraints $Ax \le b$ into groups $A_I x \le b_I$ and $A_{\bar{I}} x \le b_{\bar{I}}$ such that $A_I \bar{x} = b_I$ and $A_{\bar{I}} \bar{x} < b_{\bar{I}}$. Then at $\bar{x}$, the set $S := \{d : A_I d \le 0,\ Md = 0\}$ is precisely the set of feasible directions. Thus, in particular, for every $d$ as above, $\nabla f(\bar{x})^T d \ge 0$, for otherwise $d$ would be a feasible descent direction at $\bar{x}$, violating its local optimality. Therefore, the linear system

$$\nabla f(\bar{x})^T d < 0, \quad A_I d \le 0, \quad Md = 0$$

has no solution. Applying the Key Lemma (Lemma 2.1) using $\bar{A} := \nabla f(\bar{x})^T$, $B := A_I$, and $H := M$, there exists $(u_0, w, v)$ satisfying $u_0 = 1$, $w \ge 0$, and

$$u_0 \nabla f(\bar{x}) + A_I^T w + M^T v = 0,$$

which are precisely the KKT conditions.

3 Sufficient Conditions for Optimality for Convex Optimization Problems

We start with a definition. If $f(\cdot) : X \to \mathbb{R}$, then the $\alpha$ level set of $f(\cdot)$ is defined as:

$$S_\alpha := \{x \in X : f(x) \le \alpha\}$$

for each $\alpha \in \mathbb{R}$.

Proposition 3.1 If $f(\cdot)$ is convex on $X$, then $S_\alpha$ is a convex set for every $\alpha \in \mathbb{R}$.

Proof: Suppose that $f(\cdot)$ is convex. For any given value of $\alpha$, suppose that $x, y \in S_\alpha$. Let $z = \lambda x + (1 - \lambda) y$ for some $\lambda \in [0, 1]$. Then $f(z) \le \lambda f(x) + (1 - \lambda) f(y) \le \lambda \alpha + (1 - \lambda)\alpha = \alpha$, so $z \in S_\alpha$, which shows that $S_\alpha$ is a convex set.

The program

$$(P)\qquad \min_x\ f(x) \quad \text{s.t.} \quad g(x) \le 0, \quad h(x) = 0, \quad x \in X$$

is called a convex problem if $f(x)$, $g_i(x)$, $i = 1, \ldots, m$, are convex functions, $h_i(x)$, $i = 1, \ldots, l$ are linear functions, and $X$ is an open convex set.

Proposition 3.2 If the optimization problem (P) is a convex problem, then the feasible region of (P) is a convex set.

Proof: Because each function $g_i(\cdot)$ is convex, the level sets

$$C_i := \{x \in X : g_i(x) \le 0\}, \quad i = 1, \ldots, m$$

are convex sets. Also, because each function $h_i(\cdot)$ is linear, the sets

$$D_i = \{x \in X : h_i(x) = 0\}, \quad i = 1, \ldots, l$$

are convex sets. Thus, since the intersection of convex sets is also a convex set, the feasible region

$$F = \{x \in X : g_i(x) \le 0,\ i = 1, \ldots, m,\ h_i(x) = 0,\ i = 1, \ldots, l\}$$

is a convex set.

Theorem 3.1 (KKT Sufficient Conditions for Convex Problems) Suppose that (P) is a convex problem. Let $\bar{x}$ be a feasible solution of (P), and suppose $\bar{x}$ together with multipliers $(u, v)$ satisfies

$$\nabla f(\bar{x}) + \sum_{i=1}^{m} u_i \nabla g_i(\bar{x}) + \sum_{i=1}^{l} v_i \nabla h_i(\bar{x}) = 0,$$

$$u \ge 0, \qquad u_i g_i(\bar{x}) = 0, \quad i = 1, \ldots, m.$$

Then $\bar{x}$ is a global optimal solution of (P).

Proof: We know from Proposition 3.2 that the feasible region of (P) is a convex set. Let $I = \{i \mid g_i(\bar{x}) = 0\}$ denote the index set of active constraints at $\bar{x}$. Let $x \in F$ be any point different from $\bar{x}$. Thus for $i \in I$ we have $g_i(x) \le 0$ and $g_i(\bar{x}) = 0$, whereby:

$$0 \ge g_i(x) \ge g_i(\bar{x}) + \nabla g_i(\bar{x})^T (x - \bar{x}) = \nabla g_i(\bar{x})^T (x - \bar{x}),$$

whereby $\nabla g_i(\bar{x})^T (x - \bar{x}) \le 0$ for all $i \in I$. Also, since $h_i(\cdot)$ is a linear function, we have $h_i(\bar{x} + \lambda(x - \bar{x})) = 0$ for all $\lambda \in [0, 1]$, whereby $\nabla h_i(\bar{x})^T (x - \bar{x}) = 0$ for each $i = 1, \ldots, l$.

Thus, from the KKT conditions,

$$\nabla f(\bar{x})^T (x - \bar{x}) = -\left[ \sum_{i \in I} u_i \nabla g_i(\bar{x}) + \sum_{i=1}^{l} v_i \nabla h_i(\bar{x}) \right]^T (x - \bar{x}) \ge 0,$$

and by the gradient inequality, $f(x) \ge f(\bar{x}) + \nabla f(\bar{x})^T (x - \bar{x}) \ge f(\bar{x})$. As this is true for all $x \in F$, it follows that $\bar{x}$ is a global optimal solution of (P).
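Theorem 3.1 can be sanity-checked on Example 1, which is a convex problem whose KKT point is $(7, 6)$. The sketch below scans a grid of candidate points, keeps the feasible ones, and reports the smallest objective value found. The grid bounds and spacing are arbitrary choices made here, and a grid scan is only a numerical plausibility check, not a proof.

```python
from typing import Optional, Tuple


def objective(x1: float, x2: float) -> float:
    return 6 * (x1 - 10) ** 2 + 4 * (x2 - 12.5) ** 2


def constraint_values(x1: float, x2: float) -> Tuple[float, float, float]:
    """Values of g_1, g_2, g_3 from Example 1; feasible means all <= 0."""
    return (x1 ** 2 + (x2 - 5) ** 2 - 50,
            x1 ** 2 + 3 * x2 ** 2 - 200,
            (x1 - 6) ** 2 + x2 ** 2 - 37)


def best_feasible_grid_point(per_unit: int = 20):
    """Scan x1 in [-1, 8], x2 in [-3, 7] with spacing 1/per_unit."""
    best_value: Optional[float] = None
    best_point: Optional[Tuple[float, float]] = None
    feasible_count = 0
    for i in range(-1 * per_unit, 8 * per_unit + 1):
        for j in range(-3 * per_unit, 7 * per_unit + 1):
            x1, x2 = i / per_unit, j / per_unit
            if all(g <= 0 for g in constraint_values(x1, x2)):
                feasible_count += 1
                value = objective(x1, x2)
                if best_value is None or value < best_value:
                    best_value, best_point = value, (x1, x2)
    return best_value, best_point, feasible_count


best_value, best_point, feasible_count = best_feasible_grid_point()
print(best_value, best_point, feasible_count)
print(objective(7.0, 6.0))  # objective value at the KKT point
```

With this grid, the point $(7, 6)$ is itself a grid point, so the scan should return it as the best feasible point, in agreement with the theorem.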

Example 4 Continuing Example 1, note that $f(\cdot)$, $g_1(\cdot)$, $g_2(\cdot)$, and $g_3(\cdot)$ are all convex functions. Therefore the problem is a convex problem, and the KKT conditions are sufficient for optimality. Since the point $\bar{x} = (7, 6)$ was shown to satisfy the KKT conditions, it follows that $\bar{x} = (7, 6)$ is a global minimum.

Example 5 Continuing Example 3, note that $f(\cdot)$, $g_1(\cdot)$, $g_2(\cdot)$ are all convex functions and that $h_1(\cdot)$ is a linear function. Therefore the problem is a convex problem, and the KKT conditions are sufficient for optimality. Since the point $\bar{x} = (2, 1)$ was shown to satisfy the KKT conditions, it follows that $\bar{x} = (2, 1)$ is a global minimum.

4 Generalizations of Convexity and Weaker Sufficient Conditions for Optimality

In Section 3 we showed the sufficiency of the KKT conditions for optimality under the condition that all relevant functions are convex. Herein we explore ways that the convexity conditions can be relaxed while still maintaining the sufficiency of the KKT conditions. It turns out that the KKT conditions will be sufficient for optimality so long as:

(i) $f(\cdot)$ is a pseudoconvex function (to be defined shortly),

(ii) $g_1(\cdot), \ldots, g_m(\cdot)$ are quasiconvex functions (also to be defined shortly), and

(iii) $h_1(\cdot), \ldots, h_l(\cdot)$ are linear functions.

Suppose $X$ is a convex set in $\mathbb{R}^n$. The differentiable function $f(\cdot) : X \to \mathbb{R}$ is a pseudoconvex function if for every $x, y \in X$ the following holds:

$$\nabla f(x)^T (y - x) \ge 0 \ \Rightarrow\ f(y) \ge f(x).$$

It is straightforward to show that a differentiable convex function is pseudoconvex, which we will prove at the end of this subsection.
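To see that pseudoconvexity is genuinely weaker than convexity, consider the one-dimensional function $f(x) = x + x^3$, an example chosen here for illustration and not taken from the notes. Its derivative $1 + 3x^2$ is always positive, so $f'(x)(y - x) \ge 0$ forces $y \ge x$ and hence $f(y) \ge f(x)$; yet $f$ is not convex because $f''(x) = 6x < 0$ for $x < 0$. The sketch below checks both facts on a grid.

```python
def f(x: float) -> float:
    return x + x ** 3


def f_prime(x: float) -> float:
    return 1 + 3 * x ** 2


def pseudoconvex_on_grid(points) -> bool:
    """Check: f'(x)(y - x) >= 0 implies f(y) >= f(x), for all grid pairs."""
    for x in points:
        for y in points:
            if f_prime(x) * (y - x) >= 0 and not f(y) >= f(x):
                return False
    return True


def midpoint_convex_on_grid(points) -> bool:
    """Check the midpoint inequality f((x+y)/2) <= (f(x)+f(y))/2 on grid pairs."""
    for x in points:
        for y in points:
            if f((x + y) / 2) > (f(x) + f(y)) / 2:
                return False
    return True


grid = [k / 4 for k in range(-12, 13)]  # -3.0, -2.75, ..., 3.0

print(pseudoconvex_on_grid(grid))      # implication holds at every grid pair
print(midpoint_convex_on_grid(grid))   # fails, e.g. at x = -2, y = 0
```

A grid check cannot prove pseudoconvexity; here the monotonicity argument above does that, and the code merely illustrates the definition.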

Incidentally, we define a differentiable function $f(\cdot) : X \to \mathbb{R}$ to be pseudoconcave if for every $x, y \in X$ the following holds:

$$\nabla f(x)^T (y - x) \le 0 \ \Rightarrow\ f(y) \le f(x).$$

Suppose $X$ is a convex set in $\mathbb{R}^n$. The function $f(\cdot) : X \to \mathbb{R}$ is a quasiconvex function if for all $x, y \in X$ and for all $\lambda \in [0, 1]$,

$$f(\lambda x + (1 - \lambda) y) \le \max\{f(x), f(y)\}.$$

$f(\cdot)$ is quasiconcave if for all $x, y \in X$ and for all $\lambda \in [0, 1]$,

$$f(\lambda x + (1 - \lambda) y) \ge \min\{f(x), f(y)\}.$$

It is also straightforward to show that a convex function is quasiconvex, which will also be proved at the end of this subsection.

Proposition 4.1 A function $f(\cdot)$ is quasiconvex on $X$ if and only if $S_\alpha$ is a convex set for every $\alpha \in \mathbb{R}$.

Proof: Suppose that $f(x)$ is quasiconvex. For any given value of $\alpha$, suppose that $x, y \in S_\alpha$. Let $z = \lambda x + (1 - \lambda) y$ for some $\lambda \in [0, 1]$. Then $f(z) \le \max\{f(x), f(y)\} \le \alpha$, so $z \in S_\alpha$, which shows that $S_\alpha$ is a convex set.

Conversely, suppose $S_\alpha$ is a convex set for every $\alpha$. Let $x$ and $y$ be given, and let $\alpha = \max\{f(x), f(y)\}$, and hence $x, y \in S_\alpha$. Then for any $\lambda \in [0, 1]$, $f(\lambda x + (1 - \lambda) y) \le \alpha = \max\{f(x), f(y)\}$, and so $f(x)$ is a quasiconvex function.

Theorem 4.1 (KKT Sufficient Conditions for Generalized Convex Problems) Suppose that $f(x)$ is a pseudoconvex function, $g_i(x)$, $i = 1, \ldots, m$, are quasiconvex functions, and $h_i(x)$, $i = 1, \ldots, l$, are linear functions. Let $\bar{x}$ be a feasible solution of (P), and suppose $\bar{x}$ together with multipliers $(u, v)$ satisfies

$$\nabla f(\bar{x}) + \sum_{i=1}^{m} u_i \nabla g_i(\bar{x}) + \sum_{i=1}^{l} v_i \nabla h_i(\bar{x}) = 0,$$

$$u \ge 0, \qquad u_i g_i(\bar{x}) = 0, \quad i = 1, \ldots, m.$$

Then $\bar{x}$ is a global optimal solution of (P).

Proof: Because each $g_i(\cdot)$ is quasiconvex, the level sets

$$C_i := \{x \in X : g_i(x) \le 0\}, \quad i = 1, \ldots, m$$

are convex sets. Also, because each $h_i(\cdot)$ is linear, the sets

$$D_i = \{x \in X : h_i(x) = 0\}, \quad i = 1, \ldots, l$$

are convex sets. Thus, since the intersection of convex sets is also a convex set, the feasible region

$$F = \{x \in X : g(x) \le 0,\ h(x) = 0\}$$

is a convex set.

Let $I = \{i \mid g_i(\bar{x}) = 0\}$ denote the index set of active constraints at $\bar{x}$. Let $x \in F$ be any point different from $\bar{x}$. Then $\lambda x + (1 - \lambda)\bar{x}$ is feasible for all $\lambda \in [0, 1]$. Thus for $i \in I$ we have

$$g_i(\bar{x} + \lambda(x - \bar{x})) = g_i(\lambda x + (1 - \lambda)\bar{x}) \le 0 = g_i(\bar{x})$$

for any $\lambda \in [0, 1]$, and since the value of $g_i(\cdot)$ does not strictly increase by moving in the direction $x - \bar{x}$, we must have $\nabla g_i(\bar{x})^T (x - \bar{x}) \le 0$, for all $i \in I$.

Similarly, $h_i(\bar{x} + \lambda(x - \bar{x})) = 0$, and so $\nabla h_i(\bar{x})^T (x - \bar{x}) = 0$ for all $i = 1, \ldots, l$.

Thus, from the KKT conditions,

$$\nabla f(\bar{x})^T (x - \bar{x}) = -\left( \nabla g(\bar{x})^T u + \nabla h(\bar{x})^T v \right)^T (x - \bar{x}) \ge 0,$$

and by pseudoconvexity, $f(x) \ge f(\bar{x})$ for any feasible $x$.
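A tiny one-dimensional instance illustrates Theorem 4.1 with a nonconvex objective. The toy problem below is an assumption made for illustration: minimize $f(x) = x + x^3$ subject to $g(x) = 1 - x \le 0$. Here $f$ is pseudoconvex (its derivative is positive everywhere) but not convex, and $g$ is linear and hence quasiconvex. The sketch computes the multiplier at $\bar{x} = 1$ from the stationarity condition and compares $f(\bar{x})$ with sampled feasible values.

```python
def f(x: float) -> float:
    return x + x ** 3


def f_prime(x: float) -> float:
    return 1 + 3 * x ** 2


def g(x: float) -> float:
    return 1 - x  # g(x) <= 0 is the constraint x >= 1


def g_prime(x: float) -> float:
    return -1.0


x_bar = 1.0
# Stationarity f'(x_bar) + u * g'(x_bar) = 0, solved for u.
u = -f_prime(x_bar) / g_prime(x_bar)
stationarity_residual = f_prime(x_bar) + u * g_prime(x_bar)
complementarity = u * g(x_bar)

# Sample feasible points x = 1.0, 1.1, ..., 5.0 and compare objective values.
samples = [1 + k / 10 for k in range(41)]
no_better_sample = all(g(x) <= 0 and f(x) >= f(x_bar) for x in samples)

print(u, stationarity_residual, complementarity)
print(f(x_bar), no_better_sample)
```

The multiplier is nonnegative and the constraint is active, so all KKT conditions hold at $\bar{x} = 1$, and the sampled feasible points are consistent with global optimality as the theorem asserts.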

The following summarizes how these different generalizations of convexity are related.

Proposition 4.2 It holds that:

(i) A differentiable convex function is pseudoconvex.

(ii) A convex function is quasiconvex.

(iii) A pseudoconvex function is quasiconvex.

Proof: To prove the first claim, note that if $f(\cdot)$ is convex and differentiable, then $f(y) \ge f(x) + \nabla f(x)^T (y - x)$. Hence, if $\nabla f(x)^T (y - x) \ge 0$, then $f(y) \ge f(x)$, and so $f(x)$ is pseudoconvex.

To prove the second claim, observe from Proposition 3.1 that the level sets of a convex function $f(\cdot)$ are convex sets, whereby from Proposition 4.1 it follows that $f(\cdot)$ is quasiconvex.

To show the third claim, suppose $f(x)$ is pseudoconvex. Let $x$, $y$ and $\lambda \in [0, 1]$ be given, and let $z = \lambda x + (1 - \lambda) y$. If $\lambda = 0$ or $\lambda = 1$, then $f(z) \le \max\{f(x), f(y)\}$ trivially; therefore, assume $0 < \lambda < 1$. Let $d = y - x$.

If $\nabla f(z)^T d \ge 0$, then

$$\nabla f(z)^T (y - z) = \nabla f(z)^T (\lambda (y - x)) = \lambda \nabla f(z)^T d \ge 0,$$

so $f(z) \le f(y) \le \max\{f(x), f(y)\}$.

On the other hand, if $\nabla f(z)^T d \le 0$, then

$$\nabla f(z)^T (x - z) = \nabla f(z)^T (-(1 - \lambda)(y - x)) = -(1 - \lambda) \nabla f(z)^T d \ge 0,$$

so $f(z) \le f(x) \le \max\{f(x), f(y)\}$. Thus $f(x)$ is quasiconvex.
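The implications in Proposition 4.2 do not reverse in general. As an illustration chosen here (not an example from the notes), $f(x) = x^3$ on $\mathbb{R}$ is quasiconvex because it is monotone, but it is not pseudoconvex: at $x = 0$ the derivative vanishes, so $f'(0)(y - 0) \ge 0$ holds for every $y$, yet $f(y) < f(0)$ for $y < 0$. The sketch below checks the quasiconvexity inequality on a grid and exhibits the failing pair.

```python
def f(x: float) -> float:
    return x ** 3


def f_prime(x: float) -> float:
    return 3 * x ** 2


def quasiconvex_on_grid(points, lambdas) -> bool:
    """Check f(lam*x + (1-lam)*y) <= max(f(x), f(y)) on all grid triples."""
    for x in points:
        for y in points:
            for lam in lambdas:
                z = lam * x + (1 - lam) * y
                if f(z) > max(f(x), f(y)):
                    return False
    return True


def violates_pseudoconvexity(x: float, y: float) -> bool:
    """True if f'(x)(y - x) >= 0 holds but f(y) >= f(x) fails."""
    return f_prime(x) * (y - x) >= 0 and f(y) < f(x)


# Grid values are multiples of 1/2 and 1/4, so all arithmetic below is exact.
grid = [k / 2 for k in range(-6, 7)]
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]

print(quasiconvex_on_grid(grid, lambdas))
print(violates_pseudoconvexity(0.0, -1.0))
```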

Example 6 Let $D = \mathbb{R}^n$ be the domain of the convex function $f(\cdot)$ and let $\mathbb{R}_{++}$ denote the positive half-line, that is, $\mathbb{R}_{++} := \{\alpha \in \mathbb{R} \mid \alpha > 0\}$. For every $x \in \mathbb{R}^n$ define the new function $h(x, t) : \mathbb{R}^n \times \mathbb{R}_{++} \to \mathbb{R}$ as follows:

$$h(x, t) := f\left( \frac{x}{t} \right).$$

Then it is a straightforward exercise to show that $h(x, t)$ is a quasiconvex function on its domain $D := \mathbb{R}^n \times \mathbb{R}_{++}$.

5 Second-Order Optimality Conditions

To describe the second order conditions for optimality, we will define the following function, known as the Lagrangian function, or simply the Lagrangian:

$$L(x, u, v) = f(x) + \sum_{i=1}^{m} u_i g_i(x) + \sum_{i=1}^{l} v_i h_i(x) = f(x) + u^T g(x) + v^T h(x).$$

Using the Lagrangian, we can, for example, re-write the gradient conditions of the KKT necessary conditions as follows:

$$\nabla_x L(\bar{x}, u, v) = 0, \quad u^T g(\bar{x}) = 0, \quad u \ge 0, \quad g(\bar{x}) \le 0, \qquad (2)$$

since $\nabla_x L(x, u, v) = \nabla f(x) + \nabla g(x)^T u + \nabla h(x)^T v$. Also, note that

$$\nabla^2_{xx} L(x, u, v) = \nabla^2 f(x) + \sum_{i=1}^{m} u_i \nabla^2 g_i(x) + \sum_{i=1}^{l} v_i \nabla^2 h_i(x).$$

Here we use the standard notation: $\nabla^2 q(x)$ denotes the Hessian of the function $q(x)$, and $\nabla^2_{xx} L(x, u, v)$ denotes the submatrix of the Hessian of $L(x, u, v)$ corresponding to the second partial derivatives with respect to the $x$ variables only.

Theorem 5.1 (KKT second order necessary conditions) Suppose $\bar{x}$ is a local minimum of (P), and $\nabla g_i(\bar{x})$, $i \in I$, and $\nabla h_i(\bar{x})$, $i = 1, \ldots, l$, are linearly independent. Then $\bar{x}$ must satisfy the KKT conditions. Furthermore, every $d$ that satisfies:

$$\nabla g_i(\bar{x})^T d \le 0, \ i \in I, \qquad \nabla h_i(\bar{x})^T d = 0, \ i = 1, \ldots, l$$

must also satisfy

$$d^T \nabla^2_{xx} L(\bar{x}, u, v) d \ge 0.$$

Theorem 5.2 (KKT second order sufficient conditions) Suppose the point $\bar{x} \in F$ together with multipliers $(u, v)$ satisfies the KKT conditions. Let $I^+ = \{i \in I : u_i > 0\}$ and $I^0 = \{i \in I : u_i = 0\}$. Additionally, suppose that every $d \ne 0$ that satisfies

$$\nabla g_i(\bar{x})^T d = 0, \ i \in I^+, \qquad \nabla g_i(\bar{x})^T d \le 0, \ i \in I^0, \qquad \nabla h_i(\bar{x})^T d = 0, \ i = 1, \ldots, l$$

also satisfies

$$d^T \nabla^2_{xx} L(\bar{x}, u, v) d > 0.$$

Then $\bar{x}$ is a strict local minimum of (P).

6 A Proof of Theorem 2.1

The proof of Theorem 2.1 relies on the Implicit Function Theorem. To motivate the Implicit Function Theorem, consider a system of linear functions:

$$h(x) := Ax - b$$

and suppose that we are interested in solving $h(x) = Ax - b = 0$.

Let us assume that $A \in \mathbb{R}^{l \times n}$ has full row rank (i.e., its rows are linearly independent). Then we can partition the columns of $A$ and the elements of $x$ as follows: $A = [B \ N]$, $x = (y; z)$, so that $B \in \mathbb{R}^{l \times l}$ is non-singular, and $h(x) = By + Nz - b$. Let $s(z) = B^{-1} b - B^{-1} N z$. Then for any $z$, $h(s(z), z) = B s(z) + N z - b = 0$, i.e., $x = (s(z), z)$ solves $h(x) = 0$. This idea of invertibility of a system of equations is generalized (although only locally) by the following version of the Implicit Function Theorem, where we will preserve the notation used above:

Theorem 6.1 (Implicit Function Theorem) Let $h(x) : \mathbb{R}^n \to \mathbb{R}^l$ and $\bar{x} = (\bar{y}_1, \ldots, \bar{y}_l, \bar{z}_1, \ldots, \bar{z}_{n-l}) = (\bar{y}, \bar{z})$ satisfy:

1. $h(\bar{x}) = 0$
2. $h(x)$ is continuously differentiable in a neighborhood of $\bar{x}$
3. The $l \times l$ Jacobian matrix

$$\begin{pmatrix} \dfrac{\partial h_1(\bar{x})}{\partial y_1} & \cdots & \dfrac{\partial h_1(\bar{x})}{\partial y_l} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial h_l(\bar{x})}{\partial y_1} & \cdots & \dfrac{\partial h_l(\bar{x})}{\partial y_l} \end{pmatrix}$$

is nonsingular.

Then there exists $\varepsilon > 0$ along with functions $s(z) = (s_1(z), \ldots, s_l(z))$ such that for all $z \in B(\bar{z}, \varepsilon)$, $h(s(z), z) = 0$. Moreover, $s_k(z)$ is continuously differentiable, $k = 1, \ldots, l$. Furthermore, for all $i = 1, \ldots, l$ and $j = 1, \ldots, n - l$ we have:

$$\sum_{k=1}^{l} \frac{\partial h_i(y, z)}{\partial y_k} \cdot \frac{\partial s_k(z)}{\partial z_j} + \frac{\partial h_i(y, z)}{\partial z_j} = 0.$$
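For the linear case that motivates Theorem 6.1, both the solution map $s(z) = B^{-1} b - B^{-1} N z$ and the derivative identity can be verified exactly. The sketch below uses a small hypothetical system with $l = 2$ and $n = 3$; the matrix entries, the helper `solve2`, and the test point are assumptions chosen for illustration only. In the linear case the derivative identity reduces to $B \, (\partial s / \partial z) + N = 0$.

```python
from fractions import Fraction as Fr

# Hypothetical data: A = [B N] with B the first l = 2 columns, b the right-hand side.
B = [[Fr(2), Fr(1)], [Fr(1), Fr(3)]]
N = [[Fr(1)], [Fr(-1)]]
b = [Fr(4), Fr(5)]

def solve2(M, r):
    """Solve the 2x2 system M y = r by Cramer's rule (M must be nonsingular)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    assert det != 0, "B must be nonsingular"
    return [(r[0] * M[1][1] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - M[1][0] * r[0]) / det]

def s(z):
    """s(z) = B^{-1} b - B^{-1} N z, computed as the solution of B y = b - N z."""
    return solve2(B, [b[i] - N[i][0] * z for i in range(2)])

def h(y, z):
    """h(y, z) = B y + N z - b."""
    return [B[i][0] * y[0] + B[i][1] * y[1] + N[i][0] * z - b[i] for i in range(2)]

z = Fr(7, 10)                     # an arbitrary value of the free variable
residual = h(s(z), z)             # should vanish for every z

# Derivative of s with respect to z is -B^{-1} N; check B * ds/dz + N = 0.
ds_dz = solve2(B, [-N[0][0], -N[1][0]])
identity_residual = [B[i][0] * ds_dz[0] + B[i][1] * ds_dz[1] + N[i][0]
                     for i in range(2)]

print(residual, identity_residual)
```

Exact rational arithmetic is used so that both residuals are exactly zero rather than zero up to rounding.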

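Proof of Theorem 2.1: Let $A = \nabla h(\bar{x}) \in \mathbb{R}^{l \times n}$. Then $A$ has full row rank, and its columns (along with the corresponding elements of $\bar{x}$) can be re-arranged so that $A = [B \ N]$ and $\bar{x} = (\bar{y}; \bar{z})$, where $B$ is non-singular. Let $z$ lie in a small neighborhood of $\bar{z}$. Then, from the Implicit Function Theorem, there exists $s(z)$ such that $h(s(z), z) = 0$.

Suppose that $d \in F_0 \cap G_0 \cap H_0$, and let us write $d = (q; p)$. Then $0 = Ad = Bq + Np$, whereby $q = -B^{-1} N p$. Let $z(\theta) = \bar{z} + \theta p$, $y(\theta) = s(z(\theta)) = s(\bar{z} + \theta p)$, and $x(\theta) = (y(\theta), z(\theta))$. We will derive a contradiction by showing that $d$ is an improving feasible direction, i.e., for small $\theta > 0$, $x(\theta)$ is feasible and $f(x(\theta)) < f(\bar{x})$.

To show feasibility of $x(\theta)$, note that for $\theta > 0$ sufficiently small, it follows from the Implicit Function Theorem that:

$$h(x(\theta)) = h(s(z(\theta)), z(\theta)) = 0.$$

Furthermore, for $i = 1, \ldots, l$ we have:

$$0 = \frac{\partial h_i(x(\theta))}{\partial \theta} = \sum_{k=1}^{l} \frac{\partial h_i(s(z(\theta)), z(\theta))}{\partial y_k} \cdot \frac{\partial s_k(z(\theta))}{\partial \theta} + \sum_{k=1}^{n-l} \frac{\partial h_i(s(z(\theta)), z(\theta))}{\partial z_k} \cdot \frac{\partial z_k(\theta)}{\partial \theta}.$$

Let $r_k = \frac{\partial s_k(z(\theta))}{\partial \theta}$, and recall that $\frac{\partial z_k(\theta)}{\partial \theta} = p_k$. At $\theta = 0$, the above equation system can then be re-written as $0 = Br + Np$, or $r = -B^{-1} N p = q$. Therefore, $\frac{\partial x_k(\theta)}{\partial \theta} = d_k$ for $k = 1, \ldots, n$.

For $i \in I$,

$$g_i(x(\theta)) = g_i(\bar{x}) + \theta \left. \frac{\partial g_i(x(\theta))}{\partial \theta} \right|_{\theta=0} + \theta \alpha_i(\theta) = \theta \sum_{k=1}^{n} \left. \frac{\partial g_i(x(\theta))}{\partial x_k} \cdot \frac{\partial x_k(\theta)}{\partial \theta} \right|_{\theta=0} + \theta \alpha_i(\theta) = \theta \nabla g_i(\bar{x})^T d + \theta \alpha_i(\theta),$$

where $\alpha_i(\theta) \to 0$ as $\theta \to 0$. Since $d \in G_0$ gives $\nabla g_i(\bar{x})^T d < 0$ for $i \in I$, and since $g_i(\bar{x}) < 0$ with $g_i(\cdot)$ continuous for $i \notin I$, it follows that $g_i(x(\theta)) < 0$ for all $i = 1, \ldots, m$ for $\theta > 0$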

sufficiently small, and therefore, $x(\theta)$ is feasible for any $\theta > 0$ sufficiently small. On the other hand,

$$f(x(\theta)) = f(\bar{x}) + \theta \nabla f(\bar{x})^T d + \theta \alpha(\theta) < f(\bar{x})$$

for $\theta > 0$ sufficiently small, where $\alpha(\theta) \to 0$ as $\theta \to 0$. But this contradicts the local optimality of $\bar{x}$. Therefore no such $d$ can exist, and the theorem is proved.

7 Summary Road Map of Constrained Optimization Optimality Conditions

Here is a road map of the structure of the constrained optimization optimality conditions we have developed herein.

Necessary Conditions: If $\bar{x}$ solves P, then it holds that....

1. Geometric Conditions (Theorem 2.1): $F_0 \cap G_0 \cap H_0 = \emptyset$

2. Analytic Conditions:

   (i) Fritz John Necessary Conditions (Theorem 2.6)

   $$u_0 \nabla f(\bar{x}) + \sum_{i=1}^{m} u_i \nabla g_i(\bar{x}) + \sum_{i=1}^{l} v_i \nabla h_i(\bar{x}) = 0$$

   (ii) KKT Necessary Conditions

   $$\nabla f(\bar{x}) + \sum_{i=1}^{m} u_i \nabla g_i(\bar{x}) + \sum_{i=1}^{l} v_i \nabla h_i(\bar{x}) = 0$$

   The KKT conditions require a constraint qualification such as:
   (a) linear independence of active gradients (Theorem 2.7), or
   (b) Slater condition (Theorem 2.8), or
   (c) all constraints are linear (Theorem 2.9), or
   (d) some other constraint qualification.

Sufficient Conditions: If P satisfies... and $\bar{x}$ satisfies..., then $\bar{x}$ is an optimal solution.

1. If convexity conditions hold, then the KKT conditions are sufficient for optimality (Theorem 3.1)

2. If generalized convexity conditions hold, then the KKT conditions are sufficient for optimality (Theorem 4.1)
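As a small illustration of how the conditions in this road map are checked in practice, the sketch below verifies the KKT gradient condition and the second-order sufficient condition of Theorem 5.2 on a hypothetical problem that is not taken from the notes: minimize $f(x) = -x_1 x_2$ subject to $h(x) = x_1 + x_2 - 2 = 0$, with candidate point $\bar{x} = (1, 1)$ and multiplier $v = 1$. The objective is not convex, so the convexity-based sufficient conditions do not apply and the second-order test is what certifies local optimality. All names and values are assumptions made for the example.

```python
# Hypothetical problem: minimize f(x) = -x1*x2 subject to h(x) = x1 + x2 - 2 = 0.
x_bar = (1.0, 1.0)   # candidate point
v = 1.0              # candidate multiplier for the equality constraint

grad_f = (-x_bar[1], -x_bar[0])     # gradient of f at x_bar
grad_h = (1.0, 1.0)                 # gradient of h (constant, since h is linear)
h_val = x_bar[0] + x_bar[1] - 2.0   # feasibility residual

# KKT gradient condition: grad f + v * grad h = 0.
kkt_residual = tuple(gf + v * gh for gf, gh in zip(grad_f, grad_h))

# Hessian of the Lagrangian in x: h is linear, so it equals the Hessian of f.
hess_L = ((0.0, -1.0), (-1.0, 0.0))

def quad_form(H, d):
    """Return d^T H d for a 2x2 matrix H and a 2-vector d."""
    return sum(d[i] * H[i][j] * d[j] for i in range(2) for j in range(2))

# Directions with grad_h^T d = 0 are multiples of (1, -1); checking one
# representative suffices because the quadratic form scales with the square of d.
d = (1.0, -1.0)
tangent_residual = grad_h[0] * d[0] + grad_h[1] * d[1]
curvature = quad_form(hess_L, d)

# The Hessian itself is indefinite: it has negative curvature along (1, 1).
off_cone_curvature = quad_form(hess_L, (1.0, 1.0))

print(kkt_residual, h_val, tangent_residual, curvature, off_cone_curvature)
```

The curvature is positive along the only direction permitted by the constraint even though the Hessian is indefinite, so under Theorem 5.2 the candidate point is a strict local minimum of this example problem.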