
John Nachbar
Washington University
March 21, 2017

Finite Dimensional Optimization Part III: Convex Optimization 1

1 Saddle points and KKT.

These notes cover another important approach to optimization, related to, but in some ways distinct from, the KKT Theorem. For simplicity, I focus on MAX problems with a single variable, x ∈ R, and a single constraint, g : R → R. It should be evident that the theory below extends easily to x ∈ R^N and K constraints. And, as usual, there is an analog for MIN problems.

Given a MAX problem, define L : R × R₊ → R by

L(x, λ) = f(x) − λg(x).

L is called the Lagrangian. Say that (x*, λ*) is a saddle point of L iff

L(x, λ*) ≤ L(x*, λ*) ≤ L(x*, λ) for all x ∈ R, λ ∈ R₊.

That is, (x*, λ*) is a saddle point iff x* maximizes L with respect to x and λ* minimizes L with respect to λ.

Recall that (one form of) the KKT condition for a MAX problem is that there exists a λ* ≥ 0 such that ∇f(x*) = λ*∇g(x*), with λ* = 0 if g(x*) < 0. Theorem 1 states that if (x*, λ*) is a saddle point then the KKT condition holds at x* (with λ* as the multiplier). The proof is in Section 2.

Theorem 1. Given a MAX problem in standard form and associated Lagrangian L, if (x*, λ*) is a saddle point of L and L is differentiable then x* is feasible and the KKT condition holds at x*.

The next two results in this section establish that x* solves the MAX problem iff there is a λ* ≥ 0 such that (x*, λ*) is a saddle point of the associated L. The proofs are again in Section 2. The first result gives sufficiency.

Theorem 2. Given a MAX problem in standard form and associated L, if (x*, λ*) is a saddle point of L then x* solves MAX.

1 This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License.
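
To make the saddle point definition concrete, here is a minimal numerical sketch (mine, not part of the notes) that checks the two saddle point inequalities on finite grids. The problem used, f(x) = −(x − 1)² with g(x) = x, is a hypothetical illustration rather than one of the examples below; its solution is x* = 0 with multiplier λ* = 2.

    # Minimal sketch: grid check of L(x, lam*) <= L(x*, lam*) <= L(x*, lam).
    def L(x, lam, f, g):
        return f(x) - lam * g(x)  # Lagrangian of the MAX problem

    def looks_like_saddle(x_star, lam_star, f, g, xs, lams, tol=1e-9):
        center = L(x_star, lam_star, f, g)
        max_part = all(L(x, lam_star, f, g) <= center + tol for x in xs)      # x* maximizes L(., lam*)
        min_part = all(center <= L(x_star, lam, f, g) + tol for lam in lams)  # lam* minimizes L(x*, .)
        return max_part and min_part

    # Illustrative MAX problem: maximize f(x) = -(x - 1)^2 subject to g(x) = x <= 0.
    f = lambda x: -(x - 1.0) ** 2
    g = lambda x: x
    xs = [i / 100.0 for i in range(-500, 501)]   # x grid on [-5, 5]
    lams = [i / 100.0 for i in range(0, 501)]    # lambda grid on [0, 5]
    print(looks_like_saddle(0.0, 2.0, f, g, xs, lams))  # True

A grid check of this kind can only suggest, not prove, the saddle point property, but it is a convenient way to experiment with the examples that follow.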

Recall the Slater condition; since there is only one constraint, the Slater condition holds iff there is an x such that g(x) < 0. The next result gives necessity.

Theorem 3. Given a MAX problem in standard form and associated L, suppose that x* solves MAX. If f is concave, g is convex, and g satisfies the Slater condition, then there is a λ* ≥ 0 such that (x*, λ*) is a saddle point of L.

If f and g are differentiable, then Theorem 3 follows from the KKT Theorem. Explicitly, if the constraint is convex and Slater holds then constraint qualification holds, and the KKT Theorem then states that the KKT condition holds at x*. If g is not binding (i.e., if g(x*) < 0), then L(x*, λ) is strictly increasing in λ and hence is, indeed, minimized over λ ≥ 0 at λ = λ* = 0. If, on the other hand, g is binding (i.e., if g(x*) = 0), then L(x*, λ) = f(x*), which does not depend on λ and hence is (trivially) minimized at λ = λ*. As for x*, the KKT condition implies that ∇_x L(x*, λ*) = 0. Since L(·, λ*) is concave in x (because f is concave and g is convex), ∇_x L(x*, λ*) = 0 is a sufficient condition for x* to maximize L(·, λ*) (by the notes on sufficiency).

Theorem 3 does not, however, assume differentiability. It requires, as a consequence, a new proof, given in Section 2. Moreover, because Theorem 3 does not assume differentiability, one can use it to construct a version of KKT that holds even without differentiability, provided f is concave and g is convex. Instead of gradients, one works with subgradients, which I discuss briefly in the notes on concave functions. I do not pursue the topic of KKT without differentiability here, but it is important in some applications.

A corollary of the above results is the following, a special case of a result that appears in the Sufficiency notes (where it was proved by different means).

Theorem 4. Given a differentiable MAX problem in standard form, if the objective function is concave, the constraint is convex, and Slater holds, then a necessary and sufficient condition for a feasible x* to be a solution is that the KKT condition holds at x*.

Proof. If x* is a solution, then Theorem 3 says that there is a λ* ≥ 0 such that (x*, λ*) is a saddle point of L, and Theorem 1 then says that the KKT condition holds. Conversely, if the KKT condition holds at x*, then the argument given above, following the statement of Theorem 3, shows that there is a λ* such that (x*, λ*) is a saddle point of L. Theorem 2 then says that x* solves MAX.

The following examples illustrate the role played by the conditions in Theorem 3.

Example 1. The following example appeared in the KKT notes. Let f(x) = x and let g(x) = x². Then the constraint set is {0}. The solution is, trivially, x* = 0. Here, f is concave and g is convex, but Slater fails. The KKT condition fails: since ∇f(x*) = 1 while ∇g(x*) = 0, there is no λ* ≥ 0 such that ∇f(x*) = λ*∇g(x*). The saddle point property fails: there is no λ* such that L(x, λ*) ≤ L(x*, λ*) for all x.
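
A quick numerical illustration of Example 1 (my own sketch, not from the notes): for each of several candidate multipliers λ ≥ 0, the maximum of L(x, λ) = x − λx² over a grid of x values is strictly positive, while L(x*, λ) = 0, so the first saddle point inequality fails for every candidate.

    # Example 1: f(x) = x, g(x) = x^2, x* = 0, so L(x*, lam) = 0 for every lam.
    f = lambda x: x
    g = lambda x: x ** 2
    xs = [i / 1000.0 for i in range(-2000, 2001)]  # grid on [-2, 2]

    for lam in [0.0, 0.5, 1.0, 2.0, 10.0]:
        best = max(f(x) - lam * g(x) for x in xs)
        # For lam > 0 the exact maximum of x - lam*x^2 is 1/(4*lam) > 0.
        print(lam, best)  # every value printed is strictly positive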

Example 2. The following example appeared in the KKT notes. Let f(x) = x and g(x) = x³. The constraint set is (−∞, 0] and the solution is x* = 0. Here, f is concave and Slater holds (for example, g(−1) = −1 < 0), but g is not convex. The KKT condition fails: since ∇f(x*) = 1 while ∇g(x*) = 0, there is no λ* such that ∇f(x*) = λ*∇g(x*). The saddle point property fails: there is no λ* such that L(x, λ*) ≤ L(x*, λ*) for all x.

In the next example, Slater holds and g is convex but f is not concave. In contrast to the previous examples, a KKT multiplier exists, but the saddle point property still fails: there is no λ* such that L(x, λ*) ≤ L(x*, λ*) for all x.

Example 3. f(x) = e^x − 1 and g(x) = x. The constraint set is (−∞, 0] and the solution is x* = 0. Slater holds (g(−1) = −1 < 0) and g is convex. ∇f(0) = 1 and ∇g(0) = 1, so the KKT condition holds with λ* = 1. As in the previous examples, g(x*) = 0, and so L(x*, λ*) ≤ L(x*, λ) for any λ*, λ ≥ 0. The condition L(x, λ*) ≤ L(x*, λ*) reduces to

0 ≥ e^x − 1 − λ*x.

There is no λ* ≥ 0 for which this holds for all x; for λ* = 1 (as required by the KKT condition), the inequality does not hold for any x ≠ 0.

In the next, and last, example, all conditions are met.

Example 4. f(x) = −e^{−x} + 1 and g(x) = x. The constraint set is (−∞, 0] and the solution is, once again, x* = 0. Slater holds (g(−1) = −1 < 0) and g is convex, so local Slater holds as well, and in fact LI holds. As in Example 3, ∇f(0) = 1 and ∇g(0) = 1, so the KKT condition holds with λ* = 1. Finally, the saddle point property holds. As in the previous examples, the concern is the inequality L(x, λ*) ≤ L(x*, λ*), which here becomes

0 ≥ −e^{−x} + 1 − λ*x.

For λ* = 1, this inequality holds for all x (indeed, with strict inequality except for x = x*).
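
The contrast between Examples 3 and 4 is easy to see numerically. The sketch below (mine, not part of the notes) checks the inequality 0 ≥ f(x) − λ*x with λ* = 1 on a grid of nonzero points: it fails at every grid point for f(x) = e^x − 1 and holds strictly at every grid point for f(x) = −e^{−x} + 1.

    import math

    xs = [i / 100.0 for i in range(-300, 301) if i != 0]  # nonzero grid points on [-3, 3]

    f3 = lambda x: math.exp(x) - 1.0     # Example 3: convex objective
    f4 = lambda x: -math.exp(-x) + 1.0   # Example 4: concave objective
    lam = 1.0                            # the KKT multiplier in both examples

    fails_3 = all(f3(x) - lam * x > 0 for x in xs)  # 0 >= f3(x) - x fails for every x != 0
    holds_4 = all(f4(x) - lam * x < 0 for x in xs)  # 0 >= f4(x) - x holds strictly for every x != 0
    print(fails_3, holds_4)  # True True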

2 Proofs of the Saddle Point Theorems.

Proof of Theorem 1. If (x*, λ*) is a saddle point then ∇_x L(x*, λ*) = 0, hence ∇f(x*) − λ*∇g(x*) = 0, or ∇f(x*) = λ*∇g(x*), which is the KKT condition.

To see that x* is feasible, note that, since (x*, λ*) is a saddle point, ∇_λ L(x*, λ*) ≥ 0, with ∇_λ L(x*, λ*) = 0 if λ* > 0. (If the implicit constraint λ ≥ 0 is binding then ∇_λ L(x*, λ*) > 0 is possible: reducing λ below 0 would reduce L further, but λ < 0 is infeasible.) ∇_λ L(x*, λ*) ≥ 0 implies −g(x*) ≥ 0, hence g(x*) ≤ 0: x* is feasible.

Proof of Theorem 2. Since (x*, λ*) is a saddle point of L, L(x*, λ*) ≤ L(x*, λ) for every λ ≥ 0, hence

f(x*) − λ*g(x*) ≤ f(x*) − λg(x*),

or

(λ − λ*)g(x*) ≤ 0. (1)

Since λ can be arbitrarily large, this can hold for all λ ≥ 0 only if g(x*) ≤ 0, which establishes that x* is feasible. Since λ* ≥ 0 and g(x*) ≤ 0, I must have λ*g(x*) ≤ 0. On the other hand, setting λ = 0 in Inequality (1) implies −λ*g(x*) ≤ 0, or λ*g(x*) ≥ 0. Combining, λ*g(x*) = 0.

Since (x*, λ*) is a saddle point of L, L(x, λ*) ≤ L(x*, λ*) for every x, hence

f(x) − λ*g(x) ≤ f(x*) − λ*g(x*)

for every x. Since, as just established, λ*g(x*) = 0, this implies

f(x*) ≥ f(x) − λ*g(x)

for every x. If x is feasible, then g(x) ≤ 0, hence −g(x) ≥ 0, hence (since λ* ≥ 0) −λ*g(x) ≥ 0. Therefore, for feasible x, f(x) − λ*g(x) ≥ f(x), hence

f(x*) ≥ f(x),

which establishes that x* solves MAX.
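
The chain of inequalities in the proof of Theorem 2 can be traced numerically for Example 4 (a sketch of mine, not part of the notes): with (x*, λ*) = (0, 1), complementary slackness λ*g(x*) = 0 holds, and for every feasible grid point f(x) ≤ f(x) − λ*g(x) ≤ f(x*).

    import math

    f = lambda x: -math.exp(-x) + 1.0   # Example 4 objective
    g = lambda x: x                     # constraint g(x) <= 0
    x_star, lam_star = 0.0, 1.0

    print(lam_star * g(x_star) == 0.0)  # complementary slackness: True

    feasible = [i / 100.0 for i in range(-300, 1)]  # grid of feasible points, x <= 0
    chain_ok = all(f(x) <= f(x) - lam_star * g(x) <= f(x_star) for x in feasible)
    print(chain_ok)  # True: hence f(x) <= f(x*) at every feasible grid point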

Proof of Theorem 3. As discussed following the statement of Theorem 3, that theorem is almost immediate if f and g are differentiable. The novelty in the following argument is that it does not require differentiability.

Define two sets

A = { a ∈ R² : there is an x ∈ R such that a₁ ≤ f(x) and a₂ ≤ −g(x) }

and

B = { b ∈ R² : b₁ > f(x*) and b₂ > 0 }.

These sets are convex. In particular, A is convex since f is concave and g is convex (hence −g is concave). B is convex since it is a translation of the strictly positive orthant in R².

The sets are also disjoint. To see this, suppose there is an x such that f(x) > f(x*), and hence there is an a₁ such that f(x*) < a₁ ≤ f(x). Since, by assumption, x* solves MAX, x must be infeasible: g(x) > 0, or −g(x) < 0. Hence, for any such x, a₂ ≤ −g(x) < 0. Then (a₁, a₂) ∉ B.

By the Separating Hyperplane Theorem (for sets that are convex but not necessarily closed/compact), since A and B are disjoint convex sets, there is a vector q = (q₁, q₂) ≠ (0, 0) such that for any a ∈ A, b ∈ B,

a · q ≤ b · q. (2)

Since I can take a₁ and a₂ to be arbitrarily negative, Inequality (2) implies that q₁, q₂ ≥ 0.

I claim that q₁ > 0. The argument is by contraposition: if q₁ = 0 then Slater cannot hold. Explicitly, taking a sequence in B converging to (f(x*), 0) establishes that, for any x, since (f(x), −g(x)) ∈ A,

q₁f(x) − q₂g(x) ≤ q₁f(x*). (3)

If q₁ = 0, then, since (q₁, q₂) ≠ (0, 0) and q₂ ≥ 0, it follows that q₂ > 0. Moreover, if q₁ = 0, then from Inequality (3), −q₂g(x) ≤ 0, or q₂g(x) ≥ 0. If q₂ > 0, this holds iff g(x) ≥ 0 for every x. This shows that Slater does not hold. By contraposition, if Slater holds then q₁ > 0.

Define λ* = q₂/q₁, which is well defined since q₁ > 0. Then for any a ∈ A and any b ∈ B,

a₁ + λ*a₂ ≤ b₁ + λ*b₂.

Again taking a sequence in B converging to (f(x*), 0), this implies that for any x,

f(x) − λ*g(x) ≤ f(x*). (4)

In particular, for x = x*,

f(x*) − λ*g(x*) ≤ f(x*),

or λ*g(x*) ≥ 0.

On the other hand, x* is feasible, hence g(x*) ≤ 0, hence, since λ* ≥ 0, λ*g(x*) ≤ 0. Combining, λ*g(x*) = 0. Therefore, for any x,

f(x) − λ*g(x) ≤ f(x*) = f(x*) − λ*g(x*),

which shows that L(x, λ*) ≤ L(x*, λ*).

It remains to show that L(x*, λ*) ≤ L(x*, λ) for every λ ≥ 0. Again, λ*g(x*) = 0. On the other hand, g(x*) ≤ 0, hence for any λ ≥ 0, λg(x*) ≤ 0, hence −λg(x*) ≥ 0. It follows that for any λ ≥ 0,

−λ*g(x*) = 0 ≤ −λg(x*),

or

f(x*) − λ*g(x*) ≤ f(x*) − λg(x*),

which shows that L(x*, λ*) ≤ L(x*, λ). Thus (x*, λ*) is a saddle point of L.
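
To see the separation construction at work, here is a closing sketch (mine, not part of the notes) for Example 4, where f(x) = −e^{−x} + 1, g(x) = x, x* = 0, and f(x*) = 0. The vector q = (1, 1) separates A and B: for any a ∈ A with witness x, a₁ + a₂ ≤ f(x) − g(x) = −e^{−x} + 1 − x ≤ 0, while b₁ + b₂ > 0 for any b ∈ B, and the resulting multiplier λ* = q₂/q₁ = 1 is exactly the KKT multiplier found in Example 4.

    import math

    f = lambda x: -math.exp(-x) + 1.0
    g = lambda x: x
    q = (1.0, 1.0)  # candidate separating vector; lam* = q2 / q1 = 1

    # Boundary points of A: a = (f(x), -g(x)); points deeper inside A only lower a . q.
    a_points = [(f(x), -g(x)) for x in (i / 100.0 for i in range(-300, 301))]
    # Sample points of B: b1 > f(x*) = 0 and b2 > 0.
    b_points = [(0.1 * i, 0.1 * j) for i in range(1, 20) for j in range(1, 20)]

    sep_A = max(q[0] * a1 + q[1] * a2 for a1, a2 in a_points)  # equals 0, attained at x = x*
    sep_B = min(q[0] * b1 + q[1] * b2 for b1, b2 in b_points)  # strictly positive
    print(sep_A <= 0.0 < sep_B)  # True: q separates the sampled points of A and B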