Chap 2. Optimality conditions


Chap 2. Optimality conditions (Version: 29-09-2012)

2.1 Optimality conditions in unconstrained optimization

Recall the definitions of global and local minimizer.

Geometry of minimization. Consider, for $f \in C^1$, a point $x$ in the level set $D_\alpha^{=}$ (the set $\{x : f(x) = \alpha\}$) with $\nabla f(x) \neq 0$. Then:
- In a neighborhood of $x$ the solution set $D_\alpha^{=}$ is a $C^1$-manifold of dimension $n-1$, and at $x$ we have $\nabla f(x) \perp D_\alpha^{=}$, i.e., $\nabla f(x)$ is perpendicular to $D_\alpha^{=}$.
- $\nabla f(x)$ points into the direction where $f$ has increasing values.

Example: the humpback function
$$\min\ x_1^2\Big(4 - 2.1\,x_1^2 + \tfrac{1}{3}x_1^4\Big) + x_1 x_2 + x_2^2\big(-4 + 4x_2^2\big).$$
Two global minima, $(0.0898, -0.7127)$ and $(-0.0898, 0.7127)$, and four strict local minima.
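
As a quick numerical check of these minima, here is a small sketch using scipy.optimize; the grid of starting points is an arbitrary choice, not part of the original example:

```python
import numpy as np
from scipy.optimize import minimize

def humpback(x):
    # six-hump camelback function from the slide
    x1, x2 = x
    return x1**2 * (4 - 2.1 * x1**2 + x1**4 / 3) + x1 * x2 + x2**2 * (-4 + 4 * x2**2)

# start local searches from a coarse grid and collect distinct minimizers
starts = [(a, b) for a in np.linspace(-2, 2, 9) for b in np.linspace(-1, 1, 9)]
found = []
for s in starts:
    res = minimize(humpback, s, method="BFGS")
    if res.success and not any(np.allclose(res.x, m, atol=1e-4) for m in found):
        found.append(res.x)

for m in sorted(found, key=humpback):
    print(np.round(m, 4), round(humpback(m), 4))
# the two best points are approximately (0.0898, -0.7127) and (-0.0898, 0.7127)
```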

Fermat's theorem (necessary condition)

Th. 2.5 (Fermat) Let $f : \mathbb{R}^n \to \mathbb{R}$ be in $C^1$ ($C^2$). If the point $x \in \mathbb{R}^n$ is a local minimizer of the function $f$, then $\nabla f(x) = 0$ (and $\nabla^2 f(x)$ is psd).

Ex. The conditions $\nabla f(x) = 0$, $\nabla^2 f(x)$ psd, are necessary but not sufficient minimality conditions. Take $f(x) = x^3$ or $\pm x^4$ as examples.

Sufficient conditions for a local minimizer

Cor. 2.8 Let $f : \mathbb{R}^n \to \mathbb{R}$ be in $C^2$. If $\nabla f(x) = 0$ and $\nabla^2 f(x)$ is positive definite, then the point $x$ is a strict local minimizer of the function $f$.

Global minimizers of convex functions

For convex functions the situation is easier.

L.2.4 (Ex. 2.1) Any (strict) local minimizer of a convex function $f$ is a (strict) global minimizer.

T.2.6 Let $f : \mathbb{R}^n \to \mathbb{R}$ be convex and in $C^1$. The point $x \in \mathbb{R}^n$ is a global minimizer of $f$ if and only if $\nabla f(x) = 0$.

Ex. (convex quadratic function) With positive definite $Q$ and $c \in \mathbb{R}^n$, consider the quadratic function $f(x) := \tfrac{1}{2}x^T Q x + c^T x$. Now $f$ is convex, and therefore $x$ is a global minimizer iff $\nabla f(x) = 0$, i.e., $Qx + c = 0$. The global minimizer is therefore given by $x = -Q^{-1}c$.
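
A minimal numerical sketch of the formula $x = -Q^{-1}c$; the matrix $Q$ and vector $c$ below are made-up example data:

```python
import numpy as np

# example data: a positive definite Q and some c
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
c = np.array([1.0, -2.0])

x_star = np.linalg.solve(Q, -c)      # global minimizer x = -Q^{-1} c
grad = Q @ x_star + c                # gradient of f(x) = 0.5 x'Qx + c'x
print(x_star, np.allclose(grad, 0))  # gradient vanishes at the minimizer
```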

2.2 Optimality conditions for constrained (convex) optimization

We consider the convex optimization problem
$$(CO)\qquad \min\ f(x) \quad \text{s.t.} \quad g_j(x) \le 0,\ j \in J, \quad x \in C,$$
where $J = \{1,\dots,m\}$, $C \subseteq \mathbb{R}^n$ is a convex (closed) set, and $f, g_1, \dots, g_m$ are convex functions on $C$ (or on an open set containing $C$).

The set of feasible points (feasible set) is
$$\mathcal{F} = \{x \in C \mid g_j(x) \le 0,\ j = 1,\dots,m\}.$$

Ex. Show that $\mathcal{F}$ is a convex set.

Consider first $P_0:\ \min\{f(x) : x \in C\}$, where $C \neq \emptyset$ is a relatively open convex set and $f$ is convex and differentiable on $C$.

Th. 2.9 The point $x \in C$ is a (global) minimizer of $P_0$ if and only if $\nabla f(x)^T s = 0$ for all $s \in L$, where $L$ denotes the linear subspace defined by $\operatorname{aff}(C) = x + L$ for any $x \in C$. (Here $\operatorname{aff}(C)$ is the affine hull of $C$.)

Remark: Essentially, $P_0$ is an unconstrained problem.

Example 1. Consider the relative interior of the simplex $S = \{x \in \mathbb{R}^3 \mid \sum_{i=1}^{3} x_i = 1,\ x \ge 0\}$ in $\mathbb{R}^3$:
$$C = \Big\{x \in \mathbb{R}^3 : \sum_{i=1}^{3} x_i = 1,\ x > 0\Big\},$$
and define the (convex!) function $f : C \to \mathbb{R}$ by
$$f(x) = (x_1 - 1)^2 + (x_2 - 1)^2 + (x_3 - 1)^2.$$
By Exercise 1.6, $\operatorname{aff}(C) = (1, 0, 0) + L$, where $L$ is the linear subspace
$$L := \{s \in \mathbb{R}^3 : s_1 + s_2 + s_3 = 0\} = \left\{\alpha \begin{pmatrix}1\\-1\\0\end{pmatrix} + \beta \begin{pmatrix}1\\0\\-1\end{pmatrix} \ :\ \alpha, \beta \in \mathbb{R}\right\}.$$
Now $x \in C$ is a minimizer of $\min_{x \in C} f(x)$ if and only if $\nabla f(x)^T s = 0$ for all $s \in L$.

Since
$$\nabla f(x) = 2\begin{pmatrix} x_1 - 1\\ x_2 - 1\\ x_3 - 1\end{pmatrix},$$
this holds if and only if $x_1 - 1 = x_2 - 1$ and $x_1 - 1 = x_3 - 1$, which gives $x_1 = x_2 = x_3$. Since $x_1 + x_2 + x_3 = 1$, it follows that $f$ has a unique minimizer, namely $x = (1/3, 1/3, 1/3)$.
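
A small sketch verifying the optimality condition $\nabla f(x)^T s = 0$ on the spanning vectors of $L$ given above (any basis of $L$ would do):

```python
import numpy as np

x_bar = np.array([1/3, 1/3, 1/3])
grad = 2 * (x_bar - 1)                       # gradient of f at x_bar
basis_L = [np.array([1.0, -1.0, 0.0]),       # spanning vectors of L
           np.array([1.0, 0.0, -1.0])]
print([float(grad @ s) for s in basis_L])    # both inner products are 0
```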

Feasible directions

The next characterization of minimizers is based on the concept of feasible directions.

Def. 2.10 The vector $s \in \mathbb{R}^n$ is a feasible direction at a point $x \in \mathcal{F}$ if there is a $\lambda_0 > 0$ such that $x + \lambda s \in \mathcal{F}$ for all $0 \le \lambda \le \lambda_0$. The set of feasible directions at $x \in \mathcal{F}$ is denoted by $FD(x)$. Note that $0 \in FD(x)$.

Recall: $K \subseteq \mathbb{R}^n$ is a cone if $x \in K,\ \lambda > 0 \Rightarrow \lambda x \in K$.

L.2.13 For any $x \in \mathcal{F}$ the set $FD(x)$ is a convex cone.

Th. 2.14 (Let $f \in C^1$.) The feasible point $x \in \mathcal{F}$ is a (global) minimizer of (CO) if and only if
$$\delta f(x, s) = \nabla f(x)^T s \ge 0 \qquad \forall s \in FD(x).$$

Example 1 (continued). Reconsider the simplex in $\mathbb{R}^3$,
$$C = \Big\{x \in \mathbb{R}^3 : \sum_{i=1}^{3} x_i = 1,\ x \ge 0\Big\},$$
and $f : C \to \mathbb{R}$ given by $f(x) = (x_1 - 1)^2 + (x_2 - 1)^2 + (x_3 - 1)^2$.

At the point $x = (1/3, 1/3, 1/3)$, the cone of feasible directions is simply
$$FD(x) = \{s \in \mathbb{R}^3 : s_1 + s_2 + s_3 = 0\}. \quad \text{(why?)}$$
Therefore, $x = (1/3, 1/3, 1/3)$ is an optimal solution of $\min_{x \in C} f(x)$, because
$$\nabla f(x) = 2\begin{pmatrix} x_1 - 1\\ x_2 - 1\\ x_3 - 1\end{pmatrix} = -\frac{4}{3}\begin{pmatrix}1\\1\\1\end{pmatrix},$$
and hence $\nabla f(x)^T s = 0$ for all $s \in FD(x)$.

Optimality conditions based on the Karush-Kuhn-Tucker relations

The conditions in Theorems 2.9/2.14 are not easy to check (in practice). We are thus interested in easier minimality conditions.

Definition 2.15 With respect to (CO), an index $j \in J$ is called active at $x \in \mathcal{F}$ if $g_j(x) = 0$. The set of all active indices, also called the active index set at $x$, is denoted by $J_x$.

Let us assume (in the rest of this subsection): $C = \mathbb{R}^n$ and $f, g_j \in C^1$.

Def. The feasible point $x \in \mathcal{F}$ satisfies the Karush-Kuhn-Tucker (KKT) conditions if there exist multipliers $y_j$, $j \in J$, such that
$$y_j \ge 0,\ j \in J, \qquad -\nabla f(x) = \sum_{j \in J} y_j \nabla g_j(x), \qquad y_j\, g_j(x) = 0,\ j \in J.$$
The KKT conditions can equivalently be given as
$$-\nabla f(x) = \sum_{j \in J_x} y_j \nabla g_j(x), \qquad y_j \ge 0,\ j \in J_x.$$

Th. 2.16 (Sufficient condition) If $x \in \mathcal{F}$ satisfies the KKT conditions, then $x$ is a minimizer of (CO) (with $C = \mathbb{R}^n$).
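
A numerical sketch of checking the KKT conditions at a candidate point; the instance (a linear objective over a disk) is a made-up example, not taken from the slides:

```python
import numpy as np

# example: min x1 + x2  s.t.  g(x) = x1^2 + x2^2 - 2 <= 0
# candidate minimizer x = (-1, -1) with multiplier y = 1/2
x = np.array([-1.0, -1.0])
y = 0.5

grad_f = np.array([1.0, 1.0])
grad_g = 2 * x
g_val = x @ x - 2

stationarity = np.allclose(-grad_f, y * grad_g)   # -grad f = y * grad g
complementarity = np.isclose(y * g_val, 0.0)
print(stationarity, complementarity, y >= 0, g_val <= 1e-12)
```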

Ex. The KKT conditions are not necessary for minimality. Take as example $\min\{x : x^2 \le 0\}$. The minimum occurs at $x = 0$ (which is the only feasible point!). But there is no $y$ such that
$$-1 = y \cdot 0, \qquad y \cdot 0 = 0, \qquad y \ge 0.$$
(The problem here is that the feasible set $\mathcal{F} = \{0\}$ does not have interior points.)

Some examples (application of the KKT conditions)

Ex. (minimizing a linear function over a sphere) Let $a, c \in \mathbb{R}^n$, $c \neq 0$. Consider the problem
$$\min_{x \in \mathbb{R}^n} \Big\{ \sum_{i=1}^{n} c_i x_i \ \Big|\ \sum_{i=1}^{n} (x_i - a_i)^2 \le 1 \Big\}.$$
The objective function is linear (hence convex), and there is one constraint function $g_1(x)$ ($m = 1$), which is convex. The KKT conditions give
$$-\begin{pmatrix} c_1\\ \vdots\\ c_n \end{pmatrix} = 2\lambda \begin{pmatrix} x_1 - a_1\\ \vdots\\ x_n - a_n \end{pmatrix}, \qquad \lambda\Big(\sum_{i=1}^{n} (x_i - a_i)^2 - 1\Big) = 0, \qquad \lambda \ge 0.$$

Since $c \neq 0$, this implies $\lambda > 0$ and $\sum_{i=1}^{n}(x_i - a_i)^2 = 1$. We conclude that $x - a = -\alpha c$ for $\alpha = 1/(2\lambda) > 0$. Since $\|x - a\| = 1$, we have $\alpha\|c\| = 1$. So the minimizing vector is
$$x = a - \alpha c = a - \frac{c}{\|c\|}.$$
Hence the minimal value of the objective function is
$$c^T x = c^T a - \frac{c^T c}{\|c\|} = c^T a - \frac{\|c\|^2}{\|c\|} = c^T a - \|c\|.$$
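
The closed-form solution $x = a - c/\|c\|$ can be compared against a generic constrained solver; the vectors $a$ and $c$ below are arbitrary example data, and SLSQP is just one possible solver choice:

```python
import numpy as np
from scipy.optimize import minimize

a = np.array([1.0, 2.0, -1.0])
c = np.array([2.0, -1.0, 2.0])

x_formula = a - c / np.linalg.norm(c)      # KKT-based closed form

res = minimize(lambda x: c @ x, a,
               constraints=[{"type": "ineq",
                             "fun": lambda x: 1 - np.sum((x - a) ** 2)}],
               method="SLSQP")
print(np.round(x_formula, 6))
print(np.round(res.x, 6))                  # should agree up to solver tolerance
print(c @ x_formula, c @ a - np.linalg.norm(c))
```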

Ex. (Exercise 2.7) Find a cylindrical can with height $h$, radius $r$ and volume at least $V$ such that the total surface area is minimal. Denoting the surface by $p(h, r)$, we want to solve
$$\min\ p(h, r) := 2\pi(r^2 + rh) \quad \text{s.t.} \quad \pi r^2 h \ge V, \quad (r, h > 0).$$
This is not a convex optimization problem! (why?)

Setting $r = e^{x_1}$ and $h = e^{x_2}$ the problem becomes
$$\min\Big\{ 2\pi\big(e^{2x_1} + e^{x_1 + x_2}\big) \ \Big|\ \pi e^{2x_1 + x_2} \ge V \Big\},$$
which is equivalent to
$$\min\Big\{ 2\pi\big(e^{2x_1} + e^{x_1 + x_2}\big) \ \Big|\ \ln\tfrac{V}{\pi} - 2x_1 - x_2 \le 0,\ x_1 \in \mathbb{R},\ x_2 \in \mathbb{R} \Big\}.$$
This is a convex optimization problem! (why?)

Ex. Exercise 2.7 (ctd.) Let
$$f(x) = 2\pi\big(e^{2x_1} + e^{x_1 + x_2}\big) \quad \text{and} \quad g(x) = \ln\tfrac{V}{\pi} - 2x_1 - x_2.$$
Now the problem becomes $\min\{ f(x) \mid g(x) \le 0,\ x \in \mathbb{R}^2 \}$.

KKT conditions for this problem: we are looking for some $x \in \mathcal{F}$ such that $-\nabla f(x) = y\,\nabla g(x)$ for some scalar $y \ge 0$ with $y\,g(x) = 0$. So the KKT conditions are
$$-\nabla f(x) = y\,\nabla g(x), \qquad y\,g(x) = 0, \qquad y \ge 0.$$

Ex. Exercise 2.7 (ctd.) Calculate the gradients to get
$$(*)\qquad 2\pi \begin{pmatrix} 2e^{2x_1} + e^{x_1 + x_2}\\ e^{x_1 + x_2} \end{pmatrix} = -y\,\nabla g(x) = y \begin{pmatrix} 2\\ 1 \end{pmatrix},$$
together with $y\,g(x) = 0$, $y \ge 0$. From $(*)$ we derive $y > 0$ and therefore
$$\frac{2e^{2x_1} + e^{x_1 + x_2}}{e^{x_1 + x_2}} = \frac{2y}{y} = 2.$$
This implies $2e^{x_1} = e^{x_2}$, whence $2r = h$.

Conclusion: The surface is minimal if $2r = h$. Now use $g(x) = 0$ to get $r$ and $h$ in terms of $V$.

Remark: Later (in Chapter 6, Nonlinear Optimization) we will be able to solve the problem without the transformation trick.
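
A numerical sanity check of the conclusion $h = 2r$ (a sketch: $V = 1$ is an arbitrary example value, and SLSQP handles the volume constraint directly, without the exponential substitution):

```python
import numpy as np
from scipy.optimize import minimize

V = 1.0  # required volume (arbitrary example value)

def surface(z):
    r, h = z
    return 2 * np.pi * (r**2 + r * h)

res = minimize(surface, x0=[1.0, 1.0],
               constraints=[{"type": "ineq",
                             "fun": lambda z: np.pi * z[0]**2 * z[1] - V}],
               bounds=[(1e-6, None), (1e-6, None)],
               method="SLSQP")
r, h = res.x
print(round(h / r, 4))   # approximately 2, confirming h = 2r
```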

Question to be answered now: Under which condition is the KKT relation necessary for minimality?

2.2.2 The Slater condition

Def. 2.18 A point $x^0 \in C^0$ is called a Slater point of (CO) if
- $g_j(x^0) < 0$ for all $j$ where $g_j$ is nonlinear,
- $g_j(x^0) \le 0$ for all $j$ where $g_j$ is (affine) linear.

We say that (CO) satisfies the Slater condition (constraint qualification) if a Slater point exists.

Example (Slater regularity). Consider the optimization problem
$$\min\ f(x) \quad \text{s.t.} \quad x_1^2 + x_2^2 \le 4, \quad -x_1 - x_2 \le -2, \quad x_2 \le 1, \quad C = \mathbb{R}^2.$$
The points $(1, 1)$ and $(\tfrac{3}{2}, \tfrac{3}{4})$ are Slater points; $(2, 0)$ is not.
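
A small helper classifying the three points above (a sketch; the constraint list encodes the reconstruction of this example, with the circle as the only nonlinear constraint):

```python
import numpy as np

# constraints in the form g_j(x) <= 0; the flag says whether g_j is nonlinear
constraints = [
    (lambda x: x[0]**2 + x[1]**2 - 4, True),   # nonlinear
    (lambda x: -x[0] - x[1] + 2,      False),  # linear
    (lambda x: x[1] - 1,              False),  # linear
]

def is_slater_point(x):
    # nonlinear constraints must hold strictly, linear ones only weakly
    return all(g(x) < 0 if nonlin else g(x) <= 0 for g, nonlin in constraints)

for p in [(1, 1), (1.5, 0.75), (2, 0)]:
    print(p, is_slater_point(np.array(p, dtype=float)))
# (1, 1) and (1.5, 0.75) are Slater points, (2, 0) is not
```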

Definition. Some constraints $g_j(x) \le 0$ may take the value zero at all feasible points. Each such constraint can be replaced by the equality constraint $g_j(x) = 0$ without changing the feasible region. These constraints are called singular, while the others are called regular. Under the Slater condition, we define the index sets $J_s$ of singular and $J_r$ of regular constraints:
$$J_s := \{j \in J \mid g_j(x) = 0 \ \ \forall x \in \mathcal{F}\}, \qquad J_r := J \setminus J_s = \{j \in J \mid g_j(x) < 0 \text{ for some } x \in \mathcal{F}\}.$$
Remark: Note that if (CO) satisfies the Slater condition, then all singular constraints must be linear.

Ideal Slater points

Def. 2.20 A Slater point $x^* \in C^0$ is called an ideal Slater point of (CO) if
$$g_j(x^*) < 0 \ \text{for all } j \in J_r, \qquad g_j(x^*) = 0 \ \text{for all } j \in J_s.$$
L.2.21 If (CO) has a Slater point, then it also has an ideal Slater point $x^* \in \mathcal{F}$.

Rem: An ideal Slater point is in the relative interior of $\mathcal{F}$ (Exercise 2.10).

Example (Slater regularity, continued). Reconsider the program
$$\min\ f(x) \quad \text{s.t.} \quad x_1^2 + x_2^2 \le 4, \quad -x_1 - x_2 \le -2, \quad x_2 \le 1, \quad C = \mathbb{R}^2.$$
The point $(1, 1)$ is a Slater point, but not an ideal Slater point. The point $(\tfrac{3}{2}, \tfrac{3}{4})$ is an ideal Slater point.

2.2.3 Convex Farkas lemma

Remark: The so-called convex Farkas lemma is the basis for the duality theory in convex optimization.

We begin with an auxiliary result (a separation theorem).

Th. 2.23 (see also [FKS, Th. 10.1]) Let $\emptyset \neq U \subseteq \mathbb{R}^n$ be convex and $w \notin U$. Then there exists a separating hyperplane $H = \{x \mid a^T x = \alpha\}$, $0 \neq a \in \mathbb{R}^n$, $\alpha \in \mathbb{R}$, such that
$$a^T w \ge \alpha \ge a^T x \ \ \forall x \in U \qquad \text{and} \qquad \alpha > a^T u_0 \ \text{for some } u_0 \in U.$$
Proof: See the pdf file "extraproofs" for a proof.

Observation: A number $a$ is a lower bound for the optimal value of (CO) if and only if the system
$$(1)\qquad f(x) < a, \qquad g_j(x) \le 0,\ j = 1,\dots,m, \qquad x \in C$$
has no solution.

Lemma 2.25 (Farkas) Let (CO) satisfy the Slater condition. Then the inequality system (1) has no solution if and only if there exists a vector $y = (y_1, \dots, y_m)$ such that
$$(2)\qquad y \ge 0, \qquad f(x) + \sum_{j=1}^{m} y_j g_j(x) \ge a \quad \forall x \in C.$$
The systems (1) and (2) are called alternative systems, because exactly one of them has a solution.

Proof of the (convex) Farkas lemma. We need to show that precisely one of the systems (1) or (2) has a solution.

First we show (the easy part) that not both systems can have a solution: otherwise, for some $x \in C$ and $y \ge 0$, the relation
$$a \le f(x) + \sum_{j=1}^{m} y_j g_j(x) \le f(x) < a$$
would hold, a contradiction.

Then we show that at least one of the two systems has a solution. To do so, we show that if (1) has no solution, then (2) has a solution.

Proof (ctd.): Define the set $U \subseteq \mathbb{R}^{m+1}$ as follows:
$$U = \Big\{ u = (u_0; \dots; u_m) \ \Big|\ \exists x \in C \text{ such that } f(x) < a + u_0,\ \ g_j(x) \le u_j \text{ if } j \in J_r,\ \ g_j(x) = u_j \text{ if } j \in J_s \Big\}.$$
Since (1) is infeasible, $0 \notin U$. One easily checks that $U$ is a nonempty convex set (using that the singular functions are linear). So there exists a hyperplane that separates $0$ from $U$, i.e., there exists a nonzero vector $y = (y_0, y_1, \dots, y_m)$ such that
$$(*)\qquad y^T u \ge 0 = y^T 0 \ \ \text{for all } u \in U, \qquad y^T u > 0 \ \text{for some } u \in U.$$
The rest of the proof is divided into four parts.

I. Prove that $y_0 \ge 0$ and $y_j \ge 0$ for all $j \in J_r$.

II. Establish that
$$y_0\,(f(x) - a) + \sum_{j=1}^{m} y_j g_j(x) \ge 0 \qquad \forall x \in C$$
holds.

III. Prove that $y_0$ must be positive.

IV. Show by induction that we may assume $y_j > 0$ for all $j \in J_s$.

Example (illustration of the Farkas lemma). Consider the convex optimization problem
$$(CO)\qquad \min\ 1 + x \quad \text{s.t.} \quad x^2 - 1 \le 0, \quad x \in \mathbb{R}.$$
Then (CO) is Slater regular. The optimal value is $0$ (attained at $x = -1$). Hence the system
$$1 + x < 0, \qquad x^2 - 1 \le 0, \qquad x \in \mathbb{R}$$
has no solution. By the Farkas lemma there must exist a real number $y \ge 0$ such that
$$(*)\qquad 1 + x + y\,(x^2 - 1) \ge 0 \quad \forall x \in \mathbb{R}.$$
Indeed, taking $y = \tfrac{1}{2}$ we get
$$x + 1 + y(x^2 - 1) = \tfrac{1}{2}x^2 + x + \tfrac{1}{2} = \tfrac{1}{2}(x + 1)^2 \ge 0.$$
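
The nonnegativity of $1 + x + \tfrac{1}{2}(x^2 - 1)$ can also be checked symbolically (a minimal sketch using sympy):

```python
import sympy as sp

x = sp.symbols('x', real=True)
y = sp.Rational(1, 2)
expr = (1 + x) + y * (x**2 - 1)
print(sp.factor(expr))   # (x + 1)**2 / 2, which is nonnegative for all real x
```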

Exercise 2.11 (Linear Farkas Lemma) Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Exactly one of the following alternative systems (I) or (II) is solvable:
$$(\mathrm{I})\quad A^T x \ge 0,\ x \ge 0,\ b^T x < 0, \qquad \text{or} \qquad (\mathrm{II})\quad Ay \le b,\ y \ge 0.$$
Proof: Apply the Farkas lemma to the special case
$$f(x) = b^T x, \qquad g_j(x) = -(A^T x)_j = -(a^j)^T x, \quad j = 1,\dots,n,$$
where $a^j$ denotes column $j$ of $A$, and $C$ is the nonnegative orthant (nonnegative vectors) of $\mathbb{R}^m$. Make use of
$$(*)\qquad z^T x = \sum_i z_i x_i \ge 0 \ \ \forall x \ge 0 \quad \Longleftrightarrow \quad z \ge 0.$$
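
A quick experiment with scipy.optimize.linprog illustrating the alternative on a random instance (a sketch; each system's solvability is tested via a small LP, and the random data are arbitrary):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
b = rng.normal(size=3)

# System (II): A y <= b, y >= 0  (feasibility LP with zero objective)
sys2 = linprog(c=np.zeros(4), A_ub=A, b_ub=b, bounds=[(0, None)] * 4)

# System (I): A^T x >= 0, x >= 0, b^T x < 0.
# (I) describes a cone, so it is solvable iff min{b^T x : A^T x >= 0, 0 <= x <= 1} < 0.
sys1 = linprog(c=b, A_ub=-A.T, b_ub=np.zeros(4), bounds=[(0, 1)] * 3)

solvable_I = sys1.status == 0 and sys1.fun < -1e-9
solvable_II = sys2.status == 0
print(solvable_I, solvable_II)   # exactly one of the two should be True
```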

2.2.4 Karush-Kuhn-Tucker theory (duality theory)

Remark: This (Lagrange) duality theory is based on the concept of saddle points.

Def. We introduce the Lagrangian function of (CO):
$$L(x, y) := f(x) + \sum_{j=1}^{m} y_j g_j(x), \qquad y \ge 0.$$
Note that $L(x, y)$ is convex in $x$ and linear in $y$.

Note: For any $y \ge 0$,
$$\inf_{x \in C}\Big\{f(x) + \sum_{j=1}^{m} y_j g_j(x)\Big\} \ \le\ \inf_{\substack{x \in C\\ g_j(x) \le 0,\ \forall j}}\Big\{f(x) + \underbrace{\sum_{j=1}^{m} y_j g_j(x)}_{\le\, 0}\Big\} \ \le\ \inf_{\substack{x \in C\\ g_j(x) \le 0,\ \forall j}} f(x).$$

So: for any $y \ge 0$,
$$\inf_{x \in C}\Big\{f(x) + \sum_{j=1}^{m} y_j g_j(x)\Big\} \ \le\ \inf_{x \in C}\{f(x) \mid g_j(x) \le 0,\ \forall j\}. \qquad \text{(CO)}$$
Note further that for any fixed $x$,
$$\sup_{y \ge 0}\Big\{f(x) + \sum_{j=1}^{m} y_j g_j(x)\Big\} = \begin{cases} f(x) & \text{if } g_j(x) \le 0,\ \forall j,\\ +\infty & \text{otherwise.}\end{cases}$$
By taking (above) the sup over $y \ge 0$ we find

Weak duality:
$$\sup_{y \ge 0}\ \inf_{x \in C} L(x, y) \ \le\ \inf_{x \in C}\ \sup_{y \ge 0} L(x, y). \qquad (*)$$
The right-hand side is equivalent to (CO); the left-hand side is called the (Lagrangian) dual problem of (CO). (See also Chapter 3.)
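
A tiny numerical illustration of weak duality (a sketch; the one-dimensional instance $\min\{x^2 : x \ge 1\}$ with optimal value $1$ is a made-up example):

```python
import numpy as np

# example problem: min x^2  s.t.  g(x) = 1 - x <= 0   (optimal value 1, at x = 1)
def dual_function(y):
    # inf_x L(x, y) = inf_x { x^2 + y (1 - x) }, attained at x = y/2
    x = y / 2
    return x**2 + y * (1 - x)

dual_vals = [dual_function(y) for y in np.linspace(0, 4, 9)]
primal_opt = 1.0
print(max(dual_vals), "<=", primal_opt)  # every dual value lower-bounds the primal optimum
# here the dual maximum (attained at y = 2) equals the primal optimum: strong duality holds
```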

Saddle point theory

Def. 2.26 A vector pair $(\bar{x}, \bar{y})$, with $\bar{x} \in C$ and $\bar{y} \ge 0$, is called a saddle point of the Lagrangian function $L(x, y)$ if
$$L(\bar{x}, y) \le L(\bar{x}, \bar{y}) \le L(x, \bar{y}) \qquad \forall x \in C,\ y \ge 0.$$
Lemma 2.28 A saddle point $(\bar{x}, \bar{y})$ of $L(x, y)$ satisfies the strong duality relation
$$\sup_{y \ge 0}\ \inf_{x \in C} L(x, y) = L(\bar{x}, \bar{y}) = \inf_{x \in C}\ \sup_{y \ge 0} L(x, y).$$
Theorem 2.30 Let the problem (CO) satisfy the Slater condition. Then the vector $\bar{x}$ is a minimizer of (CO) if and only if there is a vector $\bar{y}$ such that $(\bar{x}, \bar{y})$ is a saddle point of the Lagrangian function $L$.

We are now able to prove the converse of Th. 2.16.

Cor. 2.33 Let $C = \mathbb{R}^n$ and let $f, g_j \in C^1$ be convex functions. Suppose (CO) satisfies the Slater condition. Then $\bar{x}$ is a minimizer of (CO) if and only if for some $\bar{y} \ge 0$ the pair $(\bar{x}, \bar{y})$ satisfies the KKT conditions.

Ex. (without Slater condition) Show that the pair $(\bar{x}, \bar{y})$ ($\bar{x} \in \mathcal{F}$, $\bar{y} \ge 0$) satisfies the KKT conditions if and only if $(\bar{x}, \bar{y})$ is a saddle point of $L$.