SECTION C: CONTINUOUS OPTIMISATION LECTURE 9: FIRST ORDER OPTIMALITY CONDITIONS FOR CONSTRAINED NONLINEAR PROGRAMMING


HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY, HILARY TERM 2005, DR RAPHAEL HAUSER

1. Optimality Conditions: What We Know So Far.

In an earlier lecture we showed that ∇f(x*) = 0 and D^2 f(x*) ⪰ 0 are necessary optimality conditions for unconstrained optimisation, and we found that the stronger conditions ∇f(x*) = 0, D^2 f(x*) ≻ 0 are sufficient to guarantee that x* is a strict local minimiser. In effect, sufficiency holds because D^2 f(x*) ≻ 0 guarantees that f is locally strictly convex. Indeed, if convexity of f is a given, we can neglect second derivatives altogether, as ∇f(x*) = 0 is then a necessary and sufficient condition, as we saw earlier.

In the exercises of Problem Set 4 we used the fundamental theorem of linear inequalities to derive the LP duality theorem, yielding necessary and sufficient optimality conditions for linear programming, the simplest case of a constrained optimisation problem, in which the objective and constraint functions are all linear. Note that the LP duality theorem involves only first order derivatives, and that linear programming is a special case of a convex optimisation problem, that is, the minimisation of a convex function over a convex domain. It is thus natural to ask whether the optimality conditions of the constrained optimisation problem

    (NLP)    min_{x ∈ R^n} f(x)
             s.t.  g_i(x) = 0  (i ∈ E),
                   g_j(x) ≥ 0  (j ∈ I)

mirror the situation in unconstrained optimisation, that is, whether first order conditions are necessary and sufficient for convex problems, and whether second order conditions rely on strict local convexity. In the next two lectures we will see that both points need much further refinement, but that they hold under proper regularity assumptions.

2. First Order Necessary Optimality Conditions.

2.1. Mechanistic Interpretation. A useful picture in unconstrained optimisation is to imagine a point mass m, or an infinitesimally small ball, that moves without friction on the hard surface

    F := {(x, f(x)) : x ∈ R^n},

see Figure 2.1. The external forces acting on the point mass are the gravity force

    (0^T, −mg)^T    (here 0 ∈ R^n)

and the reaction force

    N_f = mg / (1 + ||∇f(x)||^2) · (−∇f(x)^T, 1)^T,

which acts in the direction normal to the surface and counters the normal component of gravity, so that the resulting total external force is

    R = (0^T, −mg)^T + N_f = mg / (1 + ||∇f(x)||^2) · (−∇f(x)^T, −||∇f(x)||^2)^T.

[Fig. 2.1. Mechanistic interpretation of unconstrained optimality conditions.]

This total force equals zero if and only if ∇f(x) = 0, and if the test mass is placed at such a point then it will not move away, which is why we call such points stationary points of f. When the test mass is moved slightly away from a local maximiser, the external forces pull it further away, whereas in a neighbourhood of a local minimiser they restore the point mass to its former position. This is expressed by the second order optimality conditions: an equilibrium position is stable if D^2 f(x) ≻ 0 and unstable if D^2 f(x) ≺ 0.
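The force picture is easy to test numerically. The following sketch is our own illustration with a made-up objective, not part of the original notes; it evaluates the total external force R for a sample function and confirms that R vanishes exactly at a stationary point:

# Numerical check of the force-balance picture for unconstrained problems:
# the total external force R vanishes exactly where the gradient does.
# The objective f(x) = (x1 - 1)^2 + 2*x2^2 is a hypothetical test function.
import numpy as np

def grad_f(x):
    # gradient of f(x) = (x1 - 1)^2 + 2*x2^2
    return np.array([2.0 * (x[0] - 1.0), 4.0 * x[1]])

def total_force(x, m=1.0, g=9.81):
    """Total external force R = gravity + N_f on a mass resting on the
    surface F = {(x, f(x))}, expressed in (n+1)-dimensional ambient space."""
    df = grad_f(x)
    gravity = np.append(np.zeros_like(df), -m * g)          # (0, -mg)
    normal = m * g / (1.0 + df @ df) * np.append(-df, 1.0)  # reaction force N_f
    return gravity + normal

print(total_force(np.array([1.0, 0.0])))  # stationary point: R = (0, 0, 0)
print(total_force(np.array([0.0, 0.5])))  # non-stationary point: R != 0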
This mechanistic interpretation extends to constrained optimisation: we can interpret an inequality constraint g(x) ≥ 0 as a hard smooth surface

    G := {(x, z) ∈ R^n × R : g(x) = 0},

which is everywhere parallel to the z-axis and keeps the point mass from rolling into the domain where g(x) < 0. Such a surface can exert only a normal force that points towards the domain {x : g(x) > 0}. Therefore, the reaction force must be of the form

    N_g = μ_g (∇g(x)^T, 0)^T,  where μ_g ≥ 0.

In the case depicted in Figure 2.2, where there is only one inequality constraint, the point mass is at rest and does not roll to lower terrain if the sum of external forces is zero, that is, N_f + N_g + (0^T, −mg)^T = 0. Since N_f = μ_f (−∇f(x)^T, 1)^T for some μ_f ≥ 0, we find

    μ_f (−∇f(x)^T, 1)^T + μ_g (∇g(x)^T, 0)^T + (0^T, −mg)^T = 0,

from which it follows that μ_f = mg and

    ∇f(x) = λ ∇g(x)    (2.1)

with λ = μ_g/(mg) ≥ 0. In other words, N_f is determined by the condition that its vertical component counter-balances the force of gravity, and N_g by the condition that it counter-balances the horizontal component of N_f. This second condition is expressed in the balance equation (2.1).

[Fig. 2.2. Mechanistic interpretation of constrained first order optimality conditions: the sum of external forces has to be zero.]

When multiple inequality constraints are present, the horizontal component of N_f must be counter-balanced by the sum of the reaction forces exerted by the constraint manifolds that touch the test mass. The balance equation (2.1) must then be replaced by

    ∇f(x) = Σ_{j ∈ I} λ_j ∇g_j(x)

for some λ_j ≥ 0, and since constraints for which g_j(x) > 0 cannot exert a force on the test mass, we must set λ_j = 0 for these indices; equivalently, the equation λ_j g_j(x) = 0 must hold for all j ∈ I.
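The balance equation (2.1) can be verified mechanically in the lifted space R^{n+1}. The sketch below is our own numerical illustration, with a hypothetical one-constraint problem not taken from the notes; it computes the three forces at a constrained minimiser and checks that they sum to zero:

# Check of the force balance N_f + N_g + gravity = 0 behind equation (2.1).
# Hypothetical example: f(x) = x1^2 + x2 on the half-plane g(x) = x2 >= 0,
# which is minimised at the origin, where the constraint is active.
import numpy as np

m, grav = 1.0, 9.81
x = np.zeros(2)                      # candidate minimiser
df = np.array([2 * x[0], 1.0])       # grad f(x)
dg = np.array([0.0, 1.0])            # grad g(x)

mu_f = m * grav                      # vertical balance: mu_f = mg
lam = 1.0                            # from grad f = lam * grad g at x
mu_g = lam * m * grav                # horizontal balance: mu_g = lam * mg

N_f = mu_f * np.append(-df, 1.0)     # reaction force of the surface F
N_g = mu_g * np.append(dg, 0.0)      # reaction force of the constraint wall G
gravity = np.append(np.zeros(2), -m * grav)
print(N_f + N_g + gravity)           # [0. 0. 0.]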

It remains to discuss the influence of equality constraints when they are present. Replacing g_i(x) = 0 by the two inequality constraints g_i(x) ≥ 0 and −g_i(x) ≥ 0, our mechanistic interpretation yields two parallel surfaces G_i^+ and G_i^−, leaving an infinitesimally thin space between them within which our point mass is constrained to move, see Figure 2.3. The net reaction force of the two surfaces is of the form

    λ_i^+ ∇g_i(x) + λ_i^− (−∇g_i(x)) = λ_i ∇g_i(x),

where we replaced the difference λ_i^+ − λ_i^− of the bound-constrained variables λ_i^+, λ_i^− ≥ 0 by a single unconstrained variable λ_i = λ_i^+ − λ_i^−. Note that in this case the conditions λ_i^+ g_i(x) = 0 and λ_i^− (−g_i(x)) = 0 are satisfied automatically, since g_i(x) = 0 if x is feasible.

[Fig. 2.3. Mechanistic interpretation of an equality constraint: the net reaction force can point to either side of G_i. Here, we are looking down the z-axis.]

In summary, our mechanistic motivation suggests that if x* is a local minimiser of (NLP), then there exist Lagrange multipliers λ ∈ R^{|I ∪ E|} such that

    ∇f(x*) − Σ_{i ∈ I ∪ E} λ_i ∇g_i(x*) = 0,
    g_i(x*) = 0        (i ∈ E),
    g_j(x*) ≥ 0        (j ∈ I),
    λ_j g_j(x*) = 0    (j ∈ I),
    λ_j ≥ 0            (j ∈ I).

These are the so-called Karush-Kuhn-Tucker (KKT) conditions. Our intuitive motivation cannot replace a rigorous proof, but the physical interpretation provides an easy explanation and serves as a memory aid.
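Before turning to the rigorous derivation, it may help to see the KKT conditions as a concrete computational test. The following sketch is a minimal checker of the five conditions at a candidate pair (x, λ); the helper name, data layout, and tolerance are our own choices, not part of the notes:

# A small numerical KKT test for (NLP): given a candidate point x and
# multipliers lam, verify stationarity, feasibility, complementarity and
# dual feasibility up to a tolerance. Illustrative helper, not from the notes.
import numpy as np

def kkt_residuals(grad_f, grads_g, g_vals, E, I, x, lam, tol=1e-8):
    """grad_f(x): objective gradient; grads_g[i](x), g_vals[i](x): gradient
    and value of constraint i; E, I: index lists of equality and inequality
    constraints (convention g_i = 0 for i in E, g_j >= 0 for j in I)."""
    stat = grad_f(x) - sum(lam[i] * grads_g[i](x) for i in E + I)
    ok = np.linalg.norm(stat) <= tol                         # D_x L = 0
    ok &= all(abs(g_vals[i](x)) <= tol for i in E)           # g_i = 0
    ok &= all(g_vals[j](x) >= -tol for j in I)               # g_j >= 0
    ok &= all(lam[j] >= -tol for j in I)                     # lam_j >= 0
    ok &= all(abs(lam[j] * g_vals[j](x)) <= tol for j in I)  # complementarity
    return stat, bool(ok)

# Example: min x1 + x2 s.t. 1 - x1^2 - x2^2 >= 0, minimised at -(1,1)/sqrt(2).
gf = lambda x: np.array([1.0, 1.0])
gg = {0: lambda x: np.array([-2 * x[0], -2 * x[1]])}
gv = {0: lambda x: 1.0 - x[0] ** 2 - x[1] ** 2}
x = -np.ones(2) / np.sqrt(2)
print(kkt_residuals(gf, gg, gv, E=[], I=[0], x=x, lam={0: 1 / np.sqrt(2)}))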
2.2. Constraint Qualification. Before we start deriving the KKT conditions more rigorously, we introduce a few technical concepts and some notation.

Definition 2.1. Let x* be feasible for the problem (NLP). We say that the inequality constraint g_j(x) ≥ 0 is active at x* if g_j(x*) = 0. We write

    A(x*) := {j ∈ I : g_j(x*) = 0}

for the set of indices corresponding to active inequality constraints.

Of course, equality constraints are always active, but we will account for their indices separately. If J ⊆ E ∪ I is a subset of indices, we write g_J for the vector-valued map that has the g_i (i ∈ J) as its components in some specific order. Furthermore, we write g for g_{E ∪ I}.

There are situations in which our mechanical picture does not apply: if two inequality constraints have first order contact at a local minimiser, as in Figure 2.4, then they cannot annul the horizontal part of N_f, and the mechanistic interpretation is flawed. When more constraints are present, generalisations of this situation can occur. In order to prove the KKT conditions, we must therefore make a regularity assumption on the constraints:

Definition 2.2. If {∇g_i(x*) : i ∈ E ∪ A(x*)} is a linearly independent set of vectors, we say that the linear independence constraint qualification (LICQ) holds at x*.

The LICQ assumption guarantees that the linearisation of (NLP) around x* is differential-topologically equivalent to (NLP) in a neighbourhood of x*: the dimension of the manifold formed by the strictly feasible points in a neighbourhood of x* must remain the same after replacing each of the constraint surfaces by its tangent plane at x*, see Figure 2.6.

We illustrate two situations in which the dimension of the strictly feasible set changes in the linearised problem.

Some of the active inequality constraint surfaces may not intersect properly: Figure 2.5 shows that when two constraint surfaces have first order contact, the dimension of the strictly feasible set in the linearised problem collapses.

Figure 2.6 shows that this does not happen when the constraint surfaces intersect properly.

Some of the equality constraints may not intersect properly: Figure 2.7 shows that when two equality constraint surfaces have first order contact at x*, the set of points satisfying the equality constraints has higher dimension in the linearised problem. At x_1, however, the equality constraints intersect properly and the dimension remains the same.

[Fig. 2.4. The mechanistic interpretation breaks down because the surfaces G_1 and G_2 are tangential to one another at x*.]

[Fig. 2.5. Two active inequality constraints do not intersect properly.]

[Fig. 2.6. Two active inequality constraints intersect properly.]

[Fig. 2.7. Two equality constraints have first order contact at x* but intersect properly at x_1, where T_{x_1} := {y : ∇g_1(x_1)^T (y − x_1) = 0, ∇g_2(x_1)^T (y − x_1) = 0}.]
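In computations, checking the LICQ at a given point reduces to a rank test on the Jacobian of the equality and active inequality constraints. The sketch below uses our own helper names; the example constraints mimic the first order contact of Figures 2.4 and 2.5:

# LICQ test: stack the gradients of the equality and active inequality
# constraints and check linear independence via the matrix rank.
# Illustrative sketch with our own naming conventions, not from the notes.
import numpy as np

def licq_holds(grads_g, g_vals, E, I, x, act_tol=1e-8):
    """Return True if {grad g_i(x) : i in E + A(x)} is linearly independent."""
    active = [j for j in I if abs(g_vals[j](x)) <= act_tol]  # A(x)
    rows = [grads_g[i](x) for i in list(E) + active]
    if not rows:                 # no active constraints: LICQ holds trivially
        return True
    J = np.vstack(rows)          # Jacobian Dg_{E + A(x)}(x)
    return np.linalg.matrix_rank(J) == J.shape[0]

# First order contact: g1(x) = x2 - x1^2 >= 0 and g2(x) = -x2 >= 0 are both
# active at the origin with parallel gradients, so the LICQ fails there.
gg = {1: lambda x: np.array([-2 * x[0], 1.0]),
      2: lambda x: np.array([0.0, -1.0])}
gv = {1: lambda x: x[1] - x[0] ** 2, 2: lambda x: -x[1]}
print(licq_holds(gg, gv, E=[], I=[1, 2], x=np.zeros(2)))  # False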

2.3. A Key Lemma. In this section we prove a lemma that will be extremely useful in everything that follows.

Lemma 2.3. Consider the problem (NLP) where f and the g_i (i ∈ E ∪ I) are C^k functions with k ≥ 1. Let x* be a feasible point at which the LICQ holds, and let d ∈ R^n be a vector such that

    d ≠ 0,  d^T ∇g_i(x*) = 0  (i ∈ E),  d^T ∇g_j(x*) ≥ 0  (j ∈ A(x*)).    (2.2)

Then for ε > 0 small enough there exists a path x ∈ C^k((−ε, +ε), R^n) such that

    x(0) = x*,
    (d/dt) x(0) = d,
    g_i(x(t)) = t d^T ∇g_i(x*)    (i ∈ E ∪ A(x*), t ∈ (−ε, ε)),
    g_i(x(t)) = 0                 (i ∈ E, t ∈ (−ε, ε)),
    g_j(x(t)) ≥ 0                 (j ∈ I, t ∈ [0, ε)).    (2.3)

Proof. Let l = |A(x*) ∪ E|. Since the LICQ holds, it is possible to choose Z ∈ R^{(n−l) × n} such that the n × n matrix

    M := [Dg_{A(x*) ∪ E}(x*); Z],

obtained by stacking the Jacobian of the active constraints on top of Z, is nonsingular. Let h : R^n × R → R^n be defined by

    h(x, t) := [g_{A(x*) ∪ E}(x) − t Dg_{A(x*) ∪ E}(x*) d; Z (x − x* − t d)].

Then h(x*, 0) = 0 and Dh(x*, 0) = [D_x h(x*, 0), D_t h(x*, 0)], where

    D_x h(x*, 0) = M   and   D_t h(x*, 0) = −[Dg_{A(x*) ∪ E}(x*) d; Z d] = −M d

are the leading n × n block and the trailing column respectively. Since D_x h(x*, 0) is nonsingular, the Implicit Function Theorem (see Lecture 8) implies that for ε > 0 small enough there exist a unique C^k function x : (−ε, ε) → R^n and a neighbourhood V(x*) such that for x ∈ V(x*) and t ∈ (−ε, ε),

    h(x, t) = 0  ⟺  x = x(t).

In particular, we have x(0) = x* and g_i(x(t)) = t d^T ∇g_i(x*) for all i ∈ A(x*) ∪ E and t ∈ (−ε, ε). (2.2) therefore implies that g_i(x(t)) = 0 (i ∈ E) and g_i(x(t)) ≥ 0 (i ∈ A(x*)) for t ∈ [0, ε). On the other hand, since g_j(x*) > 0 for j ∉ A(x*), the continuity of x(t) implies that there exists ε_1 ∈ (0, ε) such that g_j(x(t)) > 0 (j ∈ I \ A(x*)) for t ∈ (−ε_1, ε_1). Finally,

    (d/dt) x(0) = −(D_x h(x*, 0))^{−1} D_t h(x*, 0) = d

follows from the second part of the Implicit Function Theorem.

2.4. KKT Conditions. We are now ready to prove a theorem which shows that if x* is a local minimiser of (NLP) and the LICQ holds at x*, then x* is a minimiser of the linear programming problem obtained by linearising (NLP) around x*.

Theorem 2.4. If x* is a local minimiser of (NLP) at which the LICQ holds, then

    ∇f(x*) ∈ cone({±∇g_i(x*) : i ∈ E} ∪ {∇g_j(x*) : j ∈ A(x*)}).

Proof. Suppose our claim is wrong. Then the fundamental theorem of linear inequalities implies that there exists a vector d ∈ R^n such that d^T ∇g_j(x*) ≥ 0 (j ∈ A(x*)), ±d^T ∇g_i(x*) ≥ 0, that is, d^T ∇g_i(x*) = 0 (i ∈ E), and d^T ∇f(x*) < 0. Since d satisfies (2.2), Lemma 2.3 implies that there exists a path x : (−ε, ε) → R^n satisfying (2.3). Taylor's theorem then implies that

    f(x(t)) = f(x*) + t d^T ∇f(x*) + O(t^2) < f(x*)   for 0 < t ≪ 1.

Since (2.3) shows that x(t) is feasible for t ∈ [0, ε), this contradicts the assumption that x* is a local minimiser.

Note that the condition ∇f(x*) ∈ cone({±∇g_i(x*) : i ∈ E} ∪ {∇g_j(x*) : j ∈ A(x*)}) is equivalent to the existence of λ ∈ R^{|E ∪ I|} such that

    ∇f(x*) = Σ_{i ∈ E ∪ I} λ_i ∇g_i(x*),    (2.4)

where λ_j ≥ 0 (j ∈ A(x*)) and λ_j = 0 (j ∈ I \ A(x*)). Moreover, x* must be feasible. Thus, Theorem 2.4 shows that when x* is a local minimiser at which the LICQ holds, the KKT conditions must hold. We can formulate this result in slightly more abstract form in terms of the Lagrangian associated with (NLP),

    L : R^n × R^m → R,
    (x, λ) ↦ f(x) − Σ_{i=1}^m λ_i g_i(x),

where m = |E ∪ I|. Equation (2.4) says that the derivative of the Lagrangian with respect to the x coordinates is zero.
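Equation (2.4) also suggests a practical recipe: under the LICQ the multipliers are uniquely determined, so at a candidate point they can be recovered by solving the linear system ∇f(x*) = Σ λ_i ∇g_i(x*) over the active constraints. A sketch (helper names are ours; the numerical data anticipate Example 2.6 below):

# Under LICQ the multipliers in (2.4) are unique and can be recovered by
# solving grad f(x) = G @ lam over the equality and active inequality
# constraints. Illustrative sketch, not part of the original notes.
import numpy as np

def lagrange_multipliers(grad_f, grads_g, active, x):
    """Solve grad f(x) = G @ lam in the least-squares sense, where the
    columns of G are the gradients of the constraints listed in `active`."""
    G = np.column_stack([grads_g[i](x) for i in active])
    lam, res, rank, _ = np.linalg.lstsq(G, grad_f(x), rcond=None)
    # Under LICQ, rank == len(active); at a KKT point the residual is zero.
    return lam

# Candidate x[1] from Example 2.6 below: only g1 (the equality) is active.
grad_f = lambda x: np.array([3 * x[0] ** 2, 1.0])
grads_g = {1: lambda x: np.array([2 * x[0], 4 * x[1]])}
x1 = np.array([0.2430, 0.6859])
print(lagrange_multipliers(grad_f, grads_g, [1], x1))  # approx [0.3645] = 3*x1/2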

Putting all the pieces together, we obtain the following theorem:

Theorem 2.5 (First Order Necessary Optimality Conditions). If x* is a local minimiser of (NLP) at which the LICQ holds, then there exists λ* ∈ R^m such that (x*, λ*) solves the following system of equations and inequalities:

    D_x L(x*, λ*) = 0,
    λ*_j ≥ 0            (j ∈ I),
    λ*_i g_i(x*) = 0    (i ∈ E ∪ I),
    g_j(x*) ≥ 0         (j ∈ I),
    g_i(x*) = 0         (i ∈ E).

2.5. The Method of Lagrange Multipliers. In this section we show by way of an example how the KKT conditions can be used to solve nonlinear optimisation problems. This approach is called the method of Lagrange multipliers. Even though it can be carried through explicitly only when the number of constraints is small, numerical algorithms for nonlinear programming are designed to do the same in situations where the calculations cannot be done by hand.

Example 2.6. Solve the following nonlinear programming problem:

    min_{x ∈ R^2} f(x) = x_1^3 + x_2
    s.t.  g_1(x) := x_1^2 + 2 x_2^2 − 1 = 0,    (2.5)
          g_2(x) := x_1 ≥ 0.

We have

    ∇g_1(x) = (2 x_1, 4 x_2)^T,   ∇g_2(x) = (1, 0)^T,   ∇f(x) = (3 x_1^2, 1)^T.

If we want to find all points that satisfy the conditions of Theorem 2.4, we have to distinguish two cases, corresponding to A(x) = ∅ and A(x) = {2}.

If A(x) = ∅ then x_1 > 0. We must find λ_1 ∈ R such that

    (3 x_1^2, 1)^T = λ_1 (2 x_1, 4 x_2)^T,

which, since x_1 > 0, is equivalent to λ_1 = 3 x_1 / 2 and λ_1 = 1/(4 x_2), and implies

    6 x_1 x_2 = 1.    (2.6)

In particular x_1, x_2 ≠ 0, and hence ∇g_1(x) and ∇g_2(x) are linearly independent, that is, the LICQ holds at these points. We also need x to be feasible,

    x_1^2 + 2 x_2^2 − 1 = 0.    (2.7)

(2.6) and (2.7) together imply 18 x_1^4 − 18 x_1^2 + 1 = 0, which shows that x_1^2 ∈ {0.9410, 0.0590}. Since we have assumed x_1 > 0, this leaves the two possible solutions

    x[1] = (0.2430, 0.6859)^T,   x[2] = (0.9700, 0.1718)^T

with function values f(x[1]) = 0.7002 and f(x[2]) = 1.0845.

In the second case, where A(x) = {2}, we have x_1 = 0. We need to find λ_1 ∈ R and λ_2 ≥ 0 such that

    (0, 1)^T = λ_1 (0, 4 x_2)^T + λ_2 (1, 0)^T.

The first component of this equation implies λ_2 = 0. Moreover, since x must be feasible and hence 2 x_2^2 = 1, we have x_2 = ±1/√2 and λ_1 = 1/(4 x_2). This yields the two further candidate points

    x[3], x[4] = (0, ±1/√2)^T.

At these points we have ∇g_1(x) = (0, ±2√2)^T and ∇g_2(x) = (1, 0)^T, and hence the LICQ holds. The objective function values are f(x[3]) = 0.7071 and f(x[4]) = −0.7071.

Overall we find four candidate points where the KKT conditions hold. Among these four points, x[4] has the smallest objective value, and this must be the global minimiser of our problem.
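The hand calculations of Example 2.6 are easy to cross-check numerically. The sketch below, our own addition, re-evaluates the four candidate points and then solves the same problem with SciPy's SLSQP method, assuming SciPy is available; started near the lower half of the ellipse it should return x[4]:

# Numerical cross-check of Example 2.6: evaluate feasibility and objective at
# the four KKT candidates, then solve the same problem with SciPy's SLSQP.
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0] ** 3 + x[1]
g1 = lambda x: x[0] ** 2 + 2 * x[1] ** 2 - 1   # equality constraint: g1(x) = 0
g2 = lambda x: x[0]                            # inequality constraint: g2(x) >= 0

candidates = [np.array([0.2430, 0.6859]), np.array([0.9700, 0.1718]),
              np.array([0.0, 1 / np.sqrt(2)]), np.array([0.0, -1 / np.sqrt(2)])]
for k, x in enumerate(candidates, start=1):
    print(f"x[{k}]: f = {f(x):+.4f}, g1 = {g1(x):+.1e}, g2 = {g2(x):+.4f}")

# SciPy's "ineq" convention (fun >= 0) matches the sign convention of (NLP).
res = minimize(f, x0=np.array([0.5, -0.5]), method="SLSQP",
               constraints=[{"type": "eq", "fun": g1},
                            {"type": "ineq", "fun": g2}])
print(res.x, res.fun)  # should approach x[4] = (0, -1/sqrt(2)), f = -0.7071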