Constrained Optimization Theory


Constrained Optimization Theory
Stephen J. Wright
Computer Sciences Department, University of Wisconsin-Madison
IMA, August 2016

Constrained Optimization Theory

How can we recognize solutions of constrained optimization problems? Answering this question is crucial to algorithm design. We already covered some cases, associated with the geometric formulation
$$\min f(x) \quad \text{s.t.} \quad x \in \Omega,$$
where $\Omega$ is closed and convex. We'll review these. But we focus mainly on the case in which the feasible set is specified algebraically, that is,
$$c_i(x) = 0 \text{ for all } i \in \mathcal{E}, \qquad c_i(x) \le 0 \text{ for all } i \in \mathcal{I}.$$
This general case admits a few complications! But ultimately we get a set of checkable conditions.

Closed Convex Ω

Recall: When $\Omega$ is closed and convex, we have the normal cone, defined for all $x \in \Omega$ by
$$N_\Omega(x) := \{ d : d^T (y - x) \le 0 \text{ for all } y \in \Omega \}.$$

Theorem. Suppose $\Omega$ is closed and convex and that $f$ is convex with continuous gradients. Then a necessary and sufficient condition for $x^*$ to be a (global) solution of $\min f(x)$ s.t. $x \in \Omega$ is that $-\nabla f(x^*) \in N_\Omega(x^*)$.
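A quick numerical check of this condition (example mine, assuming NumPy): minimize $f(x) = \|x - p\|^2$ over the unit ball. For $\|p\| > 1$ the solution is the projection $x^* = p/\|p\|$, and at a boundary point $x$ of the ball, $N_\Omega(x) = \{tx : t \ge 0\}$, so $-\nabla f(x^*)$ should come out as a nonnegative multiple of $x^*$.

```python
import numpy as np

# Sketch (not from the slides): minimize f(x) = ||x - p||^2 over the
# closed unit ball Omega.  For ||p|| > 1 the solution is x* = p / ||p||,
# and at a boundary point x the normal cone is N_Omega(x) = { t*x : t >= 0 }.
# Verify that -grad f(x*) = 2(p - x*) is a nonnegative multiple of x*.

p = np.array([2.0, 1.0])
x_star = p / np.linalg.norm(p)          # projection of p onto the unit ball
neg_grad = -2.0 * (x_star - p)          # -grad f(x*)

t = neg_grad @ x_star                   # coefficient t, since ||x*|| = 1
assert t >= 0.0
assert np.allclose(neg_grad, t * x_star)
print(f"-grad f(x*) = {t:.4f} * x*  =>  -grad f(x*) in N_Omega(x*)")
```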

Proof. Suppose that $-\nabla f(x^*) \in N_\Omega(x^*)$. By convexity we have for any $y \in \Omega$ that
$$f(y) \ge f(x^*) + \nabla f(x^*)^T (y - x^*) \ge f(x^*),$$
where the last inequality arises from $-\nabla f(x^*) \in N_\Omega(x^*)$ and the definition of $N_\Omega(x^*)$. Thus $x^*$ is a global solution.

Now suppose that $-\nabla f(x^*) \notin N_\Omega(x^*)$. Then there exists $y \in \Omega$ such that $-\nabla f(x^*)^T (y - x^*) > 0$. From Taylor's theorem and continuity of $\nabla f$, we have for small $\alpha > 0$ that $x^* + \alpha(y - x^*) \in \Omega$ (by convexity of $\Omega$) and
$$f(x^* + \alpha(y - x^*)) = f(x^*) + \alpha \nabla f(x^*)^T (y - x^*) + o(\alpha) < f(x^*),$$
where the inequality holds for sufficiently small $\alpha > 0$. Thus $x^*$ is not a minimizer.

Ω closed and convex, f smooth

When $f$ is not convex, the condition $-\nabla f(x^*) \in N_\Omega(x^*)$ is still a necessary condition.

Theorem. Suppose $\Omega$ is closed and convex and that $f$ has continuous gradients. If $x^*$ is a local solution of $\min f(x)$ s.t. $x \in \Omega$, then $-\nabla f(x^*) \in N_\Omega(x^*)$.

Proof. Suppose that $-\nabla f(x^*) \notin N_\Omega(x^*)$. Then we use the second part of the previous proof to identify $y \in \Omega$ such that $f(x^* + \alpha(y - x^*)) < f(x^*)$ for all $\alpha$ sufficiently small and positive. Any neighborhood of $x^*$ contains a point of the form $x^* + \alpha(y - x^*)$ for some small enough $\alpha > 0$, and this point has a lower function value than $f(x^*)$. Thus $x^*$ is not a local solution.

Tangent Cone

When $\Omega$ is specified algebraically and/or is nonconvex, we have the issue of how to define the normal cone $N_\Omega(x)$. We propose a more general definition (which coincides with the one above when $\Omega$ is convex). The new definition is based on the tangent cone $T_\Omega(x)$, which is defined as the cone of limiting feasible directions. We say that $d$ is a limiting feasible direction to $\Omega$ at $x$ if there exist a sequence $\{z_k\}$ with $z_k \in \Omega$ for all $k$ and $z_k \to x$, and a sequence $\{t_k\}$ of positive scalars with $t_k \downarrow 0$, such that
$$\lim_{k \to \infty} \frac{z_k - x}{t_k} = d.$$

Some nonconvex examples:
$$\Omega = \{(x_1, x_2) : x_2 \le x_1^2\}, \qquad \Omega = \{(x_1, x_2) : -x_1^2 \le x_2 \le x_1^2\}.$$

Tangent Cone of a Polyhedron

When $\Omega$ is defined by linear constraints, it's easy to identify the tangent cone algebraically:
$$\Omega = \{x : a_i^T x = b_i, \ i \in \mathcal{E}; \ a_i^T x \le b_i, \ i \in \mathcal{I}\}.$$
At a given point $\bar{x} \in \Omega$, we define the active set
$$\mathcal{A}(\bar{x}) := \mathcal{E} \cup \{i \in \mathcal{I} : a_i^T \bar{x} = b_i\}.$$
The tangent cone at $\bar{x}$ is then
$$T_\Omega(\bar{x}) = \{d : a_i^T d = 0, \ i \in \mathcal{E}; \ a_i^T d \le 0, \ i \in \mathcal{A}(\bar{x}) \cap \mathcal{I}\}.$$

Proof (of the inclusion $\subseteq$): For $i \in \mathcal{A}(\bar{x})$, we have
$$a_i^T d = \lim_{k \to \infty} \frac{a_i^T z_k - a_i^T \bar{x}}{t_k} = \lim_{k \to \infty} \frac{a_i^T z_k - b_i}{t_k}.$$
For $i \in \mathcal{E}$, we have $a_i^T z_k - b_i = 0$ for all $k$, so $a_i^T d = 0$. For $i \in \mathcal{I} \cap \mathcal{A}(\bar{x})$, we have $a_i^T z_k - b_i \le 0$, so in the limit we have $a_i^T d \le 0$.
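A small computational sketch of this (helper and example mine, assuming NumPy): find the active inequalities at a feasible $\bar{x}$, then test a candidate direction $d$ against the algebraic description of $T_\Omega(\bar{x})$.

```python
import numpy as np

# Sketch (not from the slides): for Omega = {x : A_eq x = b_eq, A_in x <= b_in},
# identify the active inequalities at a feasible xbar; then d is in
# T_Omega(xbar) iff A_eq d = 0 and a_i^T d <= 0 for every active row a_i.

def tangent_cone_member(A_eq, b_eq, A_in, b_in, xbar, d, tol=1e-9):
    active = np.abs(A_in @ xbar - b_in) <= tol       # active inequalities
    eq_ok = np.all(np.abs(A_eq @ d) <= tol)          # a_i^T d = 0, i in E
    in_ok = np.all(A_in[active] @ d <= tol)          # a_i^T d <= 0, active i
    return bool(eq_ok and in_ok)

# Example: Omega = {x in R^2 : x_1 >= 0, x_2 >= 0}, written as -x_i <= 0.
A_eq, b_eq = np.zeros((0, 2)), np.zeros(0)
A_in, b_in = -np.eye(2), np.zeros(2)
xbar = np.array([0.0, 1.0])                          # only x_1 >= 0 is active

print(tangent_cone_member(A_eq, b_eq, A_in, b_in, xbar, np.array([1.0, -1.0])))  # True
print(tangent_cone_member(A_eq, b_eq, A_in, b_in, xbar, np.array([-1.0, 0.0])))  # False
```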

Nonlinear Algebraic Constraints

What about nonlinear algebraic constraints
$$c_i(x) = 0, \ i \in \mathcal{E}; \qquad c_i(x) \le 0, \ i \in \mathcal{I}?$$
Can we linearize these constraints at a given feasible $\bar{x}$, then use the polyhedron methodology to find $T_\Omega(\bar{x})$?

Following the previous slide, we can define the active set
$$\mathcal{A}(\bar{x}) := \mathcal{E} \cup \{i \in \mathcal{I} : c_i(\bar{x}) = 0\}$$
and the feasible direction set
$$\mathcal{F}(\bar{x}) := \{d : \nabla c_i(\bar{x})^T d = 0, \ i \in \mathcal{E}; \ \nabla c_i(\bar{x})^T d \le 0, \ i \in \mathcal{A}(\bar{x}) \cap \mathcal{I}\}.$$

But we don't necessarily have $\mathcal{F}(\bar{x}) = T_\Omega(\bar{x})$! We need extra conditions, called constraint qualifications, to guarantee that this equality holds.

Nonlinear Constraints

It's easy to prove that $T_\Omega(\bar{x}) \subseteq \mathcal{F}(\bar{x})$, provided the constraints $c_i$ are smooth. This is a consequence of Taylor's theorem. (Exercise!) The issue comes with proving the converse: $\mathcal{F}(\bar{x}) \subseteq T_\Omega(\bar{x})$. Here's where we need the constraint qualifications.

Example: $x_2 \le 0$, $x_2 \ge x_1^2$. Here we have a single feasible point, $\bar{x} = (0, 0)$, but
$$T_\Omega(0) = \{0\}, \qquad \mathcal{F}(0) = \{(d_1, 0) : d_1 \in \mathbb{R}\}.$$

An even more elementary example: $x^3 \le 0$. Here we have
$$T_\Omega(0) = \{d : d \le 0\}, \qquad \mathcal{F}(0) = \mathbb{R}.$$
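The second example is easy to probe numerically (sketch mine, assuming NumPy): the linearization at $0$ has zero gradient, so $\mathcal{F}(0) = \mathbb{R}$, while feasibility of the sequences $z_k = t_k d$ rules out $d > 0$.

```python
import numpy as np

# Sketch of the slide's one-dimensional example c(x) = x^3 <= 0, so
# Omega = (-inf, 0].  The linearization at xbar = 0 is blind to the
# constraint: grad c(0) = 0, hence F(0) = R, yet no feasible sequence
# approaches 0 along d = +1, so T_Omega(0) = (-inf, 0].

c = lambda x: x**3
grad_c = lambda x: 3 * x**2

print("grad c(0) =", grad_c(0.0))        # 0.0  =>  F(0) is all of R

# Directions d in T_Omega(0) need z_k = t_k * d to be feasible:
for d in (+1.0, -1.0):
    t = 10.0 ** -np.arange(1, 6)         # t_k -> 0
    z = t * d                            # candidate feasible sequence
    print(f"d = {d:+.0f}: c(z_k) <= 0 for all k?", bool(np.all(c(z) <= 0)))
# d = +1 fails feasibility, confirming F(0) != T_Omega(0).
```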

Constraint Qualifications

Two famous CQs that ensure $\mathcal{F}(\bar{x}) \subseteq T_\Omega(\bar{x})$ are:

LICQ (Linear Independence Constraint Qualification): the active constraint gradients $\nabla c_i(\bar{x})$, $i \in \mathcal{A}(\bar{x})$, are linearly independent. Can use the implicit function theorem to prove $\mathcal{F}(\bar{x}) \subseteq T_\Omega(\bar{x})$.

MFCQ (Mangasarian-Fromovitz Constraint Qualification): the equality constraint gradients are linearly independent, and there exists a vector $d$ such that
$$\nabla c_i(\bar{x})^T d = 0, \ i \in \mathcal{E}; \qquad \nabla c_i(\bar{x})^T d < 0, \ i \in \mathcal{A}(\bar{x}) \cap \mathcal{I}.$$
(Note: strict inequality!)
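Both CQs are checkable with basic linear algebra plus one LP feasibility problem. A hedged sketch (function names mine, assuming NumPy and SciPy), applied to the first example from the previous slide:

```python
import numpy as np
from scipy.optimize import linprog

# Sketch: check LICQ and MFCQ at xbar given the active-constraint
# Jacobians.  J_eq stacks grad c_i for i in E; J_act stacks grad c_i
# for the active inequalities.

def licq_holds(J_eq, J_act):
    J = np.vstack([J_eq, J_act])
    return np.linalg.matrix_rank(J) == J.shape[0]    # rows independent?

def mfcq_holds(J_eq, J_act):
    n_eq = J_eq.shape[0]
    if n_eq > 0 and np.linalg.matrix_rank(J_eq) < n_eq:
        return False
    # Seek d with J_eq d = 0 and J_act d <= -1 (the strict inequality is
    # rescaled to -1; valid since the condition is positively homogeneous).
    n = J_eq.shape[1]
    res = linprog(c=np.zeros(n),
                  A_ub=J_act, b_ub=-np.ones(J_act.shape[0]),
                  A_eq=J_eq if n_eq else None,
                  b_eq=np.zeros(n_eq) if n_eq else None,
                  bounds=[(None, None)] * n)
    return res.status == 0                           # feasible => MFCQ holds

# Slide example x2 <= 0, x1^2 - x2 <= 0 at xbar = (0,0): gradients (0,1), (0,-1).
J_eq = np.zeros((0, 2))
J_act = np.array([[0.0, 1.0], [0.0, -1.0]])
print("LICQ:", licq_holds(J_eq, J_act))   # False: gradients are dependent
print("MFCQ:", mfcq_holds(J_eq, J_act))   # False: no d with d2 < 0 and -d2 < 0
```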

First-Order Necessary Condition

A fundamental necessary condition for $x^*$ to be a local min is that
$$\nabla f(x^*)^T d \ge 0 \quad \text{for all } d \in T_\Omega(x^*).$$
(Prove using Taylor's theorem: if this condition does not hold, we can construct a sequence $\{z_k\}$ with $z_k \in \Omega$ and $f(z_k) < f(x^*)$, so $x^*$ cannot be a local min.)

But as written, this condition is hard to check! We now give the general definition of the normal cone:
$$N_\Omega(\bar{x}) = \{v : v^T d \le 0 \text{ for all } d \in T_\Omega(\bar{x})\}.$$
That is, the normal cone is the polar of the tangent cone. (Exercise: show that this coincides with the earlier definition for convex $\Omega$!)

The first-order necessary condition then becomes $-\nabla f(x^*) \in N_\Omega(x^*)$!

CQs and First-Order Necessary

If a CQ holds, we have $\mathcal{F}(x^*) = T_\Omega(x^*)$. To calculate the normal cone in this case, we need to find the vectors normal to the polyhedral cone $\mathcal{F}(x^*)$. Here's where Farkas's Lemma is useful!

Farkas's Lemma: Given vectors $a_i$, $i = 1, 2, \ldots, m$, and $b$, EITHER there are coefficients $\lambda_i \ge 0$ such that $b = \sum_{i=1}^m \lambda_i a_i$, OR there is a vector $w$ such that $b^T w > 0$ and $a_i^T w \le 0$, $i = 1, 2, \ldots, m$.

We apply this lemma, defining the $a_i$ vectors to be
$$\pm \nabla c_i(x^*), \ i \in \mathcal{E}; \qquad \nabla c_i(x^*), \ i \in \mathcal{A}(x^*) \cap \mathcal{I},$$
and $b = -\nabla f(x^*)$.
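Farkas's Lemma can also be exercised computationally: the first alternative is an LP feasibility problem, and when it fails, LP duality produces the separating $w$. A sketch (example data mine, assuming SciPy's linprog):

```python
import numpy as np
from scipy.optimize import linprog

# Sketch: decide which Farkas alternative holds by testing feasibility
# of { lambda >= 0 : sum_i lambda_i a_i = b }.  If that system is
# infeasible, LP duality guarantees a w with b^T w > 0, a_i^T w <= 0.

def farkas(A, b):
    """A: m-by-n array whose rows are the a_i; b: length-n vector."""
    m, n = A.shape
    res = linprog(c=np.zeros(m), A_eq=A.T, b_eq=b, bounds=[(0, None)] * m)
    if res.status == 0:
        return "b = sum_i lambda_i a_i with lambda >= 0", res.x
    # Certificate: maximize b^T w subject to a_i^T w <= 0, ||w||_inf <= 1.
    res = linprog(c=-b, A_ub=A, b_ub=np.zeros(m), bounds=[(-1, 1)] * n)
    return "exists w with b^T w > 0 and a_i^T w <= 0", res.x

a = np.array([[1.0, 0.0], [0.0, 1.0]])      # a_1, a_2 as rows
print(farkas(a, np.array([2.0, 3.0])))      # first alternative, lambda = (2, 3)
print(farkas(a, np.array([-1.0, 0.0])))     # second alternative, e.g. w = (-1, 0)
```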

KKT Conditions

The first-order necessary condition tells us that there is no vector $d$ with $-\nabla f(x^*)^T d > 0$ and
$$d^T \nabla c_i(x^*) = 0, \ i \in \mathcal{E}; \qquad d^T \nabla c_i(x^*) \le 0, \ i \in \mathcal{A}(x^*) \cap \mathcal{I}.$$
Farkas's Lemma tells us that the alternative must be true, that is, there are nonnegative coefficients such that
$$-\nabla f(x^*) = \sum_{i \in \mathcal{E}} (\lambda_i^+ - \lambda_i^-) \nabla c_i(x^*) + \sum_{i \in \mathcal{A}(x^*) \cap \mathcal{I}} \lambda_i \nabla c_i(x^*).$$
By combining $\lambda_i := \lambda_i^+ - \lambda_i^-$ for $i \in \mathcal{E}$, we obtain the condition
$$-\nabla f(x^*) = \sum_{i \in \mathcal{E}} \lambda_i \nabla c_i(x^*) + \sum_{i \in \mathcal{A}(x^*) \cap \mathcal{I}} \lambda_i \nabla c_i(x^*).$$
We combine this equality with other conditions that ensure feasibility of $x^*$ to obtain the KKT conditions.

KKT Conditions: First-Order Necessary

Theorem. Suppose that $x^*$ is a local solution of the problem
$$\min f(x) \quad \text{s.t.} \quad c_i(x) = 0, \ i \in \mathcal{E}, \quad c_i(x) \le 0, \ i \in \mathcal{I},$$
and that a constraint qualification (linear constraints, LICQ, MFCQ) is satisfied at $x^*$. Then there are coefficients $\lambda_i$, $i \in \mathcal{E} \cup \mathcal{I}$, such that the following are true:
$$-\nabla f(x^*) = \sum_{i \in \mathcal{E} \cup \mathcal{I}} \lambda_i \nabla c_i(x^*),$$
$$0 \le \lambda_i \perp c_i(x^*) \le 0, \ i \in \mathcal{I}, \qquad c_i(x^*) = 0, \ i \in \mathcal{E}.$$

Note that the complementarity condition forces $\lambda_i = 0$ for $i \notin \mathcal{A}(x^*)$.
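A worked check of these conditions (example mine, assuming NumPy): $\min x_1 + x_2$ subject to $x_1^2 + x_2^2 - 2 \le 0$, whose solution is $x^* = (-1, -1)$ with $\lambda^* = 1/2$.

```python
import numpy as np

# Sketch: verify the KKT conditions at a candidate (x*, lambda*) for
#   min x1 + x2   s.t.   c(x) = x1^2 + x2^2 - 2 <= 0.
# The solution is x* = (-1, -1) with lambda* = 1/2.

grad_f = lambda x: np.array([1.0, 1.0])
c      = lambda x: x @ x - 2.0
grad_c = lambda x: 2.0 * x

def kkt_satisfied(x, lam, tol=1e-8):
    stationarity    = grad_f(x) + lam * grad_c(x)  # -grad f = lam * grad c
    feasibility     = max(c(x), 0.0)               # c(x) <= 0
    nonnegativity   = max(-lam, 0.0)               # lam >= 0
    complementarity = abs(lam * c(x))              # lam * c(x) = 0
    return (np.linalg.norm(stationarity) < tol and feasibility < tol
            and nonnegativity < tol and complementarity < tol)

x_star, lam_star = np.array([-1.0, -1.0]), 0.5
print("KKT satisfied:", kkt_satisfied(x_star, lam_star))   # True
```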

Lagrangian

The Lagrangian is a linear combination of the objective function and constraints, with coefficients $\lambda_i$ called Lagrange multipliers:
$$L(x, \lambda) = f(x) + \sum_{i \in \mathcal{E} \cup \mathcal{I}} \lambda_i c_i(x).$$
We can restate the first of the KKT conditions succinctly using the Lagrangian:
$$\nabla_x L(x, \lambda) = 0.$$
It's also useful in defining second-order conditions that characterize solutions.
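Continuing the previous example, the first KKT condition is exactly $\nabla_x L(x^*, \lambda^*) = 0$, which even a finite-difference gradient confirms (sketch mine):

```python
import numpy as np

# Sketch: with f, c as in the previous snippet, confirm that
# grad_x L(x*, lambda*) = 0 for L(x, lam) = f(x) + lam * c(x),
# using a central finite difference.

f = lambda x: x[0] + x[1]
c = lambda x: x @ x - 2.0
L = lambda x, lam: f(x) + lam * c(x)

def fd_grad(func, x, h=1e-6):
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (func(x + e) - func(x - e)) / (2 * h)
    return g

x_star, lam_star = np.array([-1.0, -1.0]), 0.5
print(fd_grad(lambda x: L(x, lam_star), x_star))   # ~ [0, 0]
```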

Unconstrained: Second-Order Conditions

Go back to the unconstrained problem $\min f(x)$ with $f$ smooth. If $f$ is convex, the first-order condition $\nabla f(x^*) = 0$ is necessary and sufficient for $x^*$ to be a solution of the problem. If $f$ is nonconvex, we can't say much about global solutions (except in special cases), but we can talk about conditions for local solutions.

We saw earlier that $\nabla f(x^*) = 0$ is a first-order necessary (1oN) condition for a local solution. We can also identify second-order conditions that make reference to the Hessian $\nabla^2 f(x^*)$.

Second-Order Necessary (2oN) Conditions

Theorem. If $x^*$ is a local minimizer of $f$ and $\nabla^2 f$ is continuous in an open neighborhood of $x^*$, then $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive semidefinite.

Proof. We know from the 1oN condition that $\nabla f(x^*) = 0$. Assume for contradiction that there is a direction $p$ such that $p^T \nabla^2 f(x^*) p < 0$. By Taylor's theorem, we have
$$f(x^* + \alpha p) = f(x^*) + \tfrac{1}{2} \alpha^2 p^T \nabla^2 f(x^*) p + o(\alpha^2) < f(x^*)$$
for all $\alpha > 0$ sufficiently small. Thus $x^*$ cannot be a local solution.
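In practice, the 2oN conditions are tested by checking that the gradient vanishes and the Hessian's eigenvalues are nonnegative. A sketch (example mine, assuming NumPy):

```python
import numpy as np

# Sketch: test the unconstrained 2oN conditions at a candidate x* for
# f(x) = x1^4 + x1^2 - x1*x2 + x2^2, using its analytic gradient and Hessian.

grad_f = lambda x: np.array([4*x[0]**3 + 2*x[0] - x[1], -x[0] + 2*x[1]])
hess_f = lambda x: np.array([[12*x[0]**2 + 2, -1.0], [-1.0, 2.0]])

x_star = np.array([0.0, 0.0])
print("grad zero:", np.allclose(grad_f(x_star), 0.0))
eigs = np.linalg.eigvalsh(hess_f(x_star))          # symmetric => eigvalsh
print("Hessian PSD:", bool(np.all(eigs >= -1e-12)), "eigenvalues:", eigs)
```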

Second-Order Sufficient (2oS) Conditions

Theorem. Suppose that $\nabla^2 f$ is continuous in a neighborhood of $x^*$, that $\nabla f(x^*) = 0$, and that $\nabla^2 f(x^*)$ is positive definite. Then $x^*$ is a strict local minimizer of $f$.

The proof follows again from Taylor's theorem, using a second-order expansion around $x^*$.

Critical Cone

An important quantity in the second-order conditions is the critical cone $C(x^*, \lambda^*)$. This is the cone of directions $w$ in the linearized feasible set $\mathcal{F}(x^*)$ for which the KKT conditions alone do not tell us whether $f$ increases along $w$; we need higher-order information to resolve the issue.
$$C(x^*, \lambda^*) = \{w \in \mathcal{F}(x^*) : \nabla c_i(x^*)^T w = 0 \text{ for all } i \in \mathcal{A}(x^*) \cap \mathcal{I} \text{ with } \lambda_i^* > 0\}.$$
This excludes directions $w \in \mathcal{F}(x^*)$ such that $\nabla c_i(x^*)^T w < 0$ for some $i \in \mathcal{I}$ with $\lambda_i^* > 0$.

For $w \in C(x^*, \lambda^*)$ we have from KKT that
$$w^T \nabla f(x^*) = -\sum_{i \in \mathcal{E} \cup \mathcal{I}} \lambda_i^* \nabla c_i(x^*)^T w = 0,$$
so the first-order conditions alone are not enough to verify that $w$ is an ascent direction for $f$. (Other directions in $\mathcal{F}(x^*)$ yield $w^T \nabla f(x^*) > 0$.)
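For the running example $\min x_1 + x_2$ s.t. $x_1^2 + x_2^2 \le 2$ (introduced in an earlier sketch), $\lambda^* = 1/2 > 0$, so $C(x^*, \lambda^*) = \{w : \nabla c(x^*)^T w = 0\} = \operatorname{span}\{(1, -1)\}$, the directions tangent to the constraint circle. A membership test (sketch mine):

```python
import numpy as np

# Sketch: at x* = (-1,-1) with lambda* = 1/2 > 0, the critical cone is
# C = { w : grad c(x*)^T w = 0 }.  Test a few directions in F(x*).

grad_c_star = np.array([-2.0, -2.0])       # grad c(x*) = 2 x*

def in_critical_cone(w, tol=1e-9):
    in_F = grad_c_star @ w <= tol          # w in the linearized feasible set
    # lambda* > 0, so membership in C also needs grad c(x*)^T w = 0:
    return bool(in_F and abs(grad_c_star @ w) <= tol)

print(in_critical_cone(np.array([1.0, -1.0])))  # True:  tangent to the circle
print(in_critical_cone(np.array([1.0, 1.0])))   # False: points strictly into Omega
```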

Second-Order Necessary (2oN) Conditions

Theorem. Suppose that $x^*$ is a local solution at which a CQ holds, and suppose that the KKT conditions are satisfied by $(x^*, \lambda^*)$. Then
$$w^T \nabla_{xx}^2 L(x^*, \lambda^*) w \ge 0 \quad \text{for all } w \in C(x^*, \lambda^*).$$

The proof uses the fact that $w$ is a limiting feasible direction, Taylor's theorem applied to $L(\cdot, \lambda^*)$, and the definition of $C(x^*, \lambda^*)$. See (Nocedal and Wright, 2006, Theorem 12.5).

Second-Order Sufficient (2oS) Conditions

Theorem. Suppose that $x^*$ is feasible and there exists $\lambda^*$ such that $(x^*, \lambda^*)$ satisfies the KKT conditions. Suppose that $w^T \nabla_{xx}^2 L(x^*, \lambda^*) w > 0$ for all $w \in C(x^*, \lambda^*)$ with $w \ne 0$. Then $x^*$ is a strict local solution.

The proof uses Taylor's theorem and compactness of $\{d \in C(x^*, \lambda^*) : \|d\| = 1\}$.

Note the differences between the 2oN and 2oS results: strict inequality in the curvature condition; strict local solution in the 2oS result, not just local solution; the 2oS result does not require a CQ.
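For the running example, $\nabla_{xx}^2 L(x^*, \lambda^*) = \lambda^* \cdot 2I = I$ (since $f$ is linear and $c$ has Hessian $2I$), so checking the 2oS condition reduces to verifying positive curvature along a basis of the critical cone (sketch mine):

```python
import numpy as np

# Sketch: test the 2oS condition for the running example.  Here
# hess_xx L(x*, lambda*) = 0.5 * 2I = I, and C(x*, lambda*) = span{(1, -1)},
# so it suffices to check the reduced Hessian on a basis of that subspace.

hess_L = 0.5 * 2.0 * np.eye(2)                    # = I
W = np.array([[1.0, -1.0]])                       # basis for the critical cone
reduced = W @ hess_L @ W.T                        # 1x1 reduced Hessian
eigs = np.linalg.eigvalsh(reduced)
print("2oS holds:", bool(np.all(eigs > 0)))       # True => strict local solution
```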

References

Nocedal, J. and Wright, S. J. (2006). Numerical Optimization. Springer, New York.