Enhanced Fritz John Optimality Conditions and Sensitivity Analysis

Enhanced Fritz John Optimality Conditions and Sensitivity Analysis
Dimitri P. Bertsekas
Laboratory for Information and Decision Systems
Massachusetts Institute of Technology
March 2016

Constrained Optimization Problem

We focus on the problem

    minimize f(x) subject to x ∈ X, g_j(x) ≤ 0, j = 1, ..., r,

where f : R^n → R, g_j : R^n → R, and X ⊂ R^n.

The Lagrangian function

    L(x, µ) = f(x) + ∑_{j=1}^r µ_j g_j(x)

involves the multiplier vector µ = (µ_1, ..., µ_r).

We aim to find µ* (Lagrange multipliers) that facilitate analysis or algorithms.
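
To make the notation concrete, here is a minimal Python sketch (a toy instance of my own, not from the talk) that sets up f, the constraint vector g, and the Lagrangian, and checks stationarity of L(·, µ*) at the minimum; the particular functions and numbers are illustrative assumptions.

```python
import numpy as np

# Hypothetical instance: minimize f(x) = x1^2 + x2^2
# subject to g1(x) = 1 - x1 <= 0, g2(x) = 1 - x2 <= 0, with X = R^2.
def f(x):
    return x[0] ** 2 + x[1] ** 2

def g(x):
    return np.array([1.0 - x[0], 1.0 - x[1]])

def lagrangian(x, mu):
    # L(x, mu) = f(x) + sum_j mu_j * g_j(x)
    return f(x) + mu @ g(x)

x_star = np.array([1.0, 1.0])   # the constrained minimum of this instance
mu_star = np.array([2.0, 2.0])  # its Lagrange multipliers (2*x_j - mu_j = 0)

# Stationarity of L(., mu*) at x*: the gradient should vanish (X = R^2 here).
grad_L = 2 * x_star - mu_star
print(lagrangian(x_star, mu_star), grad_L)  # -> 2.0 [0. 0.]
```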

Salient Properties of Lagrange Multipliers

They enter in optimality conditions: they convert the problem to an "unconstrained or less constrained optimization" of the Lagrangian.

They are central in sensitivity analysis: they quantify the rate of cost improvement as the constraint level is perturbed. Lagrange multipliers are "some sort of derivative" of the primal function

    p(u) = inf { f(x) : x ∈ X, g(x) ≤ u }.

This talk will revolve around these two properties.
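
A minimal numerical check of the "derivative of the primal function" property (my own 1-D toy instance, not from the slides): for a problem with a known multiplier, a finite-difference slope of p at u = 0 matches −µ*.

```python
import numpy as np

# Hypothetical 1-D instance: minimize x^2 subject to g(x) = 1 - x <= 0 (X = R).
# Primal function: p(u) = inf { x^2 : 1 - x <= u } = max(1 - u, 0)^2.
def p(u):
    return max(1.0 - u, 0.0) ** 2

mu_star = 2.0  # Lagrange multiplier at x* = 1 (from 2*x* - mu* = 0)

# The "derivative" property: -p'(0) should equal mu*.
h = 1e-6
slope_at_zero = (p(h) - p(-h)) / (2 * h)   # central finite difference
print(slope_at_zero, -mu_star)             # both approximately -2.0
```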

Fritz John Conditions: A Classical Line of Development of L-Multiplier Theory

There are several different lines of development of L-multiplier theory (forms of the implicit function theorem, forms of Farkas' lemma, penalty functions, etc.).

The FJ conditions are a classical line, but not the most popular. This talk will be in the direction of strengthening this line.

Our starting point: a more powerful version of the classical FJ conditions.
They include extra conditions that narrow down the candidate multipliers.
They are based on the combination of two related but complementary works: Hestenes (1975) and Rockafellar (1993).
The line of proof is based on penalty functions; it goes back to a 4-page paper by McShane (1973).

This allows:
An "easy" development of unifying constraint qualifications.
A direct connection to sensitivity.

References for this Overview Talk

Joint and individual works with Asuman Ozdaglar and Paul Tseng:

Bertsekas and Ozdaglar, 2002. "Pseudonormality and a Lagrange Multiplier Theory for Constrained Optimization," J. Opt. Th. and Appl.
Ozdaglar and Bertsekas, 2004. "The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization," Optimization Methods and Software.
Bertsekas, 2005. "Lagrange Multipliers with Optimal Sensitivity Properties in Constrained Optimization," in Proc. of the 2004 Erice Workshop on Large Scale Nonlinear Optimization, Erice, Italy, Kluwer.
Bertsekas, Ozdaglar, and Tseng, 2006. "Enhanced Fritz John Optimality Conditions for Convex Programming," SIAM J. on Optimization.

An umbrella reference is the book:
Bertsekas, D. P., Nedić, A., and Ozdaglar, A. E., 2003. Convex Analysis and Optimization, Athena Scientific, Belmont, MA.
But it does not contain some refinements in the last three papers.

Outline

1. Enhanced FJ Conditions for Nonconvex/Differentiable Problems
2. Pseudonormality: A Unifying Constraint Qualification
3. Sensitivity for Nonconvex/Differentiable Problems
4. Convex Problems
5. Sensitivity Analysis for Convex Problems

Lagrange Multipliers for Nonconvex Differentiable Problems

    minimize f(x) subject to x ∈ X, g_j(x) ≤ 0, j = 1, ..., r,

where f : R^n → R and g_j : R^n → R are continuously differentiable, and X ⊂ R^n is closed. The Lagrangian function is

    L(x, µ) = f(x) + ∑_{j=1}^r µ_j g_j(x).

We focus on a local minimum x*.

An L-multiplier at x* is a µ* = (µ_1*, ..., µ_r*) ≥ 0 such that L(·, µ*) is stationary at x* and

    g_j(x*) = 0 for all j with µ_j* > 0    (Complementary Slackness, CS).

Meaning of stationarity:
Case X = R^n: ∇_x L(x*, µ*) = 0.
Case X ⊂ R^n: ∇_x L(x*, µ*)ᵀ y ≥ 0 for all y ∈ T_X(x*), the tangent cone of X at x* (we will define this later).
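
As a quick sanity check of this definition (again a toy instance of my own, not from the talk), the sketch below verifies stationarity and CS for a problem with one active and one inactive constraint, in the simple case X = R^2.

```python
import numpy as np

# Hypothetical instance with X = R^2:
#   minimize f(x) = x1^2 + x2^2
#   subject to g1(x) = 1 - x1 <= 0 (active at x*), g2(x) = -x2 - 5 <= 0 (inactive).
x_star = np.array([1.0, 0.0])
mu_star = np.array([2.0, 0.0])   # CS forces mu2* = 0 since g2(x*) = -5 < 0

grad_f = 2 * x_star                       # gradient of f at x*
grad_g = np.array([[-1.0, 0.0],           # gradient of g1
                   [0.0, -1.0]])          # gradient of g2
g_vals = np.array([1.0 - x_star[0], -x_star[1] - 5.0])

stationarity = grad_f + mu_star @ grad_g  # should be the zero vector
cs = mu_star * g_vals                     # complementary slackness: each product is 0
print(stationarity, cs)
```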

FJ Necessary Conditions for the Case X = R^n (Hestenes 1975)

There exists (µ_0*, µ*) ≥ (0, 0) with (µ_0*, µ*) ≠ (0, 0) such that:
(1) µ_0* ∇f(x*) + ∑_{j=1}^r µ_j* ∇g_j(x*) = 0
(2) In every neighborhood of x* there exists an x such that

    f(x) < f(x*), and g_j(x) > 0 for all j with µ_j* > 0.

Condition (2) (called CV, for complementary violation) implies CS.

Two cases (after scaling so that µ_0* is 0 or 1):
µ_0* = 0. Then µ* ≠ 0 and satisfies ∑_{j=1}^r µ_j* ∇g_j(x*) = 0 and the stronger CV condition. This greatly facilitates proofs that µ_0* ≠ 0 under various constraint qualifications.
µ_0* = 1. Then µ* is an L-multiplier, and the positive µ_j* indicate the constraints j that need to be violated to effect cost improvement: a sensitivity property.

An Example

A problem with an equality constraint h(x) = 0, split as h(x) ≤ 0 and −h(x) ≤ 0.

[Figure: the level set f(x) = f(x*), the constraints g_1(x) = −h(x) ≤ 0 and g_2(x) = h(x) ≤ 0, the gradients ∇f(x*) and ∇h(x*) at x*, and the multipliers µ_1* > 0, µ_2* = 0.]

CS requires that µ_1* ≥ 0 and µ_2* ≥ 0 (there is an infinite number of such multipliers).
CV requires that µ_1* > 0 and µ_2* = 0 (we cannot violate both constraints simultaneously).
The multiplier satisfying CV is unique. Through its sign pattern it indicates that the constraint g_1(x) ≤ 0 should be violated for cost reduction.
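
A one-dimensional instance of this construction (my own choice of f and h, not in the slides) makes the infinite CS family and the unique CV multiplier explicit.

```python
import numpy as np

# Hypothetical 1-D instance of the slide's setup: f(x) = x, h(x) = x,
# equality h(x) = 0 split as g1(x) = -x <= 0 and g2(x) = x <= 0; local min x* = 0.
grad_f, grad_g1, grad_g2 = 1.0, -1.0, 1.0

# CS multipliers: all (mu1, mu2) >= 0 with grad_f + mu1*grad_g1 + mu2*grad_g2 = 0,
# i.e. mu1 = 1 + mu2: an infinite family, parameterized by t = mu2 >= 0.
for t in [0.0, 1.0, 5.0]:
    mu = np.array([1.0 + t, t])
    print(mu, grad_f + mu[0] * grad_g1 + mu[1] * grad_g2)  # stationarity residual = 0

# The CV (informative) multiplier is the one with mu2 = 0: mu* = (1, 0).
# Only g1 needs to be violated (x < 0) to reduce the cost f(x) = x below f(x*) = 0.
```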

Case X ⊂ R^n (Rockafellar 1993)

Extension of the FJ conditions, with the Lagrangian stationarity at the local min x* expressed in terms of N_X(x*), the normal cone of X at x*.

Definitions of the Tangent Cone T_X(x) and Normal Cone N_X(x)

The tangent cone T_X(x) at some x ∈ X is the set of all y such that y = 0, or there exists a sequence {x_k} ⊂ X with x_k ≠ x for all k and

    x_k → x,    (x_k − x)/‖x_k − x‖ → y/‖y‖.

The normal cone N_X(x) at some x ∈ X is the set of all z such that there exist sequences {x_k} ⊂ X and {z_k} with x_k → x, z_k → z, and z_k ∈ T_X(x_k)* [the polar of T_X(x_k)].

Note that T_X(x)* ⊂ N_X(x). If equality holds, X is called regular at x (if X is convex, it is regular at all points).

[Figure: two examples at x = 0: a convex X, which is regular with N_X(x) = T_X(x)*, and a nonconvex X that is not regular at x = 0.]
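
As a worked illustration of these definitions (my own example, not from the slides), the following computes both cones for a convex orthant and for the "cross" set, a standard example that fails regularity at the origin:

```latex
\begin{align*}
&\text{Convex case: } X = \{x \in \mathbb{R}^2 : x \ge 0\},\; x = 0 \text{ (regular):}\\
&\qquad T_X(0) = \{y : y \ge 0\}, \qquad T_X(0)^* = \{z : z \le 0\} = N_X(0).\\[4pt]
&\text{Nonconvex case: } X = \{x \in \mathbb{R}^2 : x_1 x_2 = 0\} \text{ (the two coordinate axes),}\; x = 0:\\
&\qquad T_X(0) = X, \qquad T_X(0)^* = \{0\}, \qquad N_X(0) = \{z : z_1 z_2 = 0\} \supsetneq T_X(0)^*,\\
&\qquad \text{so this } X \text{ is not regular at } 0.
\end{align*}
```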

FJ Conditions for the Case X ⊂ R^n (Rockafellar 1993)

Constrained optimization problem

    minimize f(x) subject to x ∈ X, g_j(x) ≤ 0, j = 1, ..., r,

where f : R^n → R and g_j : R^n → R are continuously differentiable, and X ⊂ R^n is closed.

FJ Conditions
There exists (µ_0*, µ*) ≥ (0, 0) with (µ_0*, µ*) ≠ (0, 0) such that:
(1) −( µ_0* ∇f(x*) + ∑_{j=1}^r µ_j* ∇g_j(x*) ) ∈ N_X(x*)
(2) g_j(x*) = 0 for all j with µ_j* > 0, i.e., CS holds.

If µ_0* = 1 and X is regular at x*, then N_X(x*) = T_X(x*)*, and condition (1) becomes

    ( ∇f(x*) + ∑_{j=1}^r µ_j* ∇g_j(x*) )ᵀ y ≥ 0 for all y ∈ T_X(x*),

so µ* is an L-multiplier satisfying CS.

Enhanced FJ Necessary Conditions for the Case X ⊂ R^n (B+O 2002)

Constrained optimization problem

    minimize f(x) subject to x ∈ X, g_j(x) ≤ 0, j = 1, ..., r,

where f : R^n → R and g_j : R^n → R are continuously differentiable, and X ⊂ R^n is closed.

There exists (µ_0*, µ*) ≥ (0, 0) with (µ_0*, µ*) ≠ (0, 0) such that:
(1) −( µ_0* ∇f(x*) + ∑_{j=1}^r µ_j* ∇g_j(x*) ) ∈ N_X(x*)
(2) In every neighborhood of x* there exists an x such that

    f(x) < f(x*), and g_j(x) > 0 for all j with µ_j* > 0,

i.e., CV holds.

So if µ_0* = 1 and X is regular at x*, then µ* is an L-multiplier satisfying CV. We call such a multiplier informative.

Constraint Qualifications: Conditions Under Which µ_0* ≠ 0

Pseudonormality: an umbrella constraint qualification (B+O 2002)

The local minimum x* is pseudonormal if there is no µ ≥ 0 and no sequence {x_k} ⊂ X with x_k → x* such that

    −∑_{j=1}^r µ_j ∇g_j(x*) ∈ N_X(x*),    ∑_{j=1}^r µ_j g_j(x_k) > 0 for all k.

Note: If x* is pseudonormal and X is regular at x*, there is an informative L-multiplier.

All the principal constraint qualifications for the existence of L-multipliers, and some new ones, can be shown (very easily) to imply pseudonormality:
Linear constraints
Linear independence
A Slater constraint qualification
Concave inequalities
Arrow-Hurwicz-Uzawa-Mangasarian

Sensitivity: A Hierarchy of L-Multipliers (B+O 2002, 2005)

Assume T_X(x*) is convex and the set of L-multipliers is nonempty. Then:

    L-Multipliers (satisfy CS) ⊃ Informative L-Multipliers (satisfy CV) ∋ Min-Norm L-Multiplier µ*

Sensitivity Result

Cost improvement bound: For every sequence of infeasible vectors {x_k} ⊂ X with x_k → x* we have

    f(x*) − f(x_k) ≤ ‖µ*‖ ‖g^+(x_k)‖ + o(‖x_k − x*‖),

where g^+(x) = (g_1^+(x), ..., g_r^+(x)), with g_j^+(x) = max{0, g_j(x)}, is the constraint violation vector.

Optimal constraint violation direction: If µ* ≠ 0, there exists an infeasible sequence {x_k} ⊂ X with x_k → x* such that

    lim_{k→∞} (f(x*) − f(x_k)) / ‖g^+(x_k)‖ = ‖µ*‖,    lim_{k→∞} g_j^+(x_k) / ‖g^+(x_k)‖ = µ_j* / ‖µ*‖,  j = 1, ..., r.
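
A sketch (under the assumption X = R^n, reusing the toy split-equality instance above) of how the min-norm multiplier can be computed as a small quadratic program over the multiplier set; scipy's SLSQP solver is used purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Min-norm multiplier as a small QP (X = R^n case):
#   minimize ||mu||^2  subject to  grad_f + sum_j mu_j * grad_g_j = 0,  mu >= 0.
# Toy data: the 1-D "split equality" example, whose multiplier set is
# {(1 + t, t) : t >= 0}; the min-norm multiplier should be (1, 0).
grad_f = np.array([1.0])
grad_g = np.array([[-1.0],   # gradient of g1(x) = -x
                   [1.0]])   # gradient of g2(x) =  x

stationarity = {'type': 'eq',
                'fun': lambda mu: grad_f + grad_g.T @ mu}
res = minimize(lambda mu: mu @ mu, x0=np.array([1.0, 1.0]),
               bounds=[(0, None), (0, None)], constraints=[stationarity],
               method='SLSQP')
print(res.x)  # approximately [1., 0.]
```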

The Convex Case (B+O+T 2006)

Consider the problem

    minimize f(x) subject to x ∈ X, g_j(x) ≤ 0, j = 1, ..., r,

where f : R^n → R and g_j : R^n → R are convex, and X ⊂ R^n is convex and closed.

Primal function: p(u) = inf { f(x) : x ∈ X, g(x) ≤ u }.
Dual function: q(µ) = inf_{x ∈ X} L(x, µ).
Optimal primal value: f* = p(0). Optimal dual value: q* = sup_{µ ≥ 0} q(µ).
Duality gap: f* − q*.
µ* is a Lagrange multiplier if f* − q* = 0 and µ* is a dual optimal solution.

Primal and dual FJ conditions adapted to convex programming
There exists (µ_0*, µ*) ≥ (0, 0) with (µ_0*, µ*) ≠ (0, 0) such that

    µ_0* f* = inf_{x ∈ X} { µ_0* f(x) + ∑_{j=1}^r µ_j* g_j(x) }    (Primal FJ Conditions)
    µ_0* q* = inf_{x ∈ X} { µ_0* f(x) + ∑_{j=1}^r µ_j* g_j(x) }    (Dual FJ Conditions)

and CS holds (there are versions with CV also).
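
A minimal numerical illustration of these objects, reusing the hypothetical 1-D convex instance from an earlier sketch: the primal and dual functions agree at the optimum (no duality gap), and the dual maximizer is the Lagrange multiplier.

```python
import numpy as np

# 1-D convex instance (mine, for illustration): minimize x^2 s.t. g(x) = 1 - x <= 0, X = R.
def p(u):
    # primal function: inf { x^2 : 1 - x <= u } = max(1 - u, 0)^2
    return max(1.0 - u, 0.0) ** 2

def q(mu):
    # dual function: inf_x { x^2 + mu * (1 - x) }, the infimum is attained at x = mu / 2
    return mu - mu ** 2 / 4.0

f_star = p(0.0)                      # optimal primal value
mus = np.linspace(0.0, 10.0, 10001)
q_star = np.max(q(mus))              # optimal dual value over a grid of mu >= 0
print(f_star, q_star)                # both 1.0: no duality gap
print(mus[np.argmax(q(mus))])        # 2.0, the Lagrange multiplier (= -p'(0))
```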

Visualization with the Primal Function p(u) = inf { f(x) : x ∈ X, g(x) ≤ u }

[Figure: the graph of p(u) with the point (0, f*) and a supporting hyperplane with normal (µ_0*, µ*), µ_0* = 1; points u = g(x_k) approach 0 along the u-axis.]

The most favorable case: ∇p(0) exists, and µ* = −∇p(0) is the unique L-multiplier.
Rate of cost improvement: the slope of the support hyperplane at (0, f*).

Visualization with the Primal Function p(u) = inf { f(x) : x ∈ X, g(x) ≤ u }

[Figure: the graph of a nondifferentiable p(u) with the point (0, f*) and supporting hyperplanes with normals (µ_0*, µ*), µ_0* = 1.]

p is subdifferentiable at 0. Set of L-multipliers = −∂p(0).
Optimal rate of cost improvement: the slope of the min-norm support hyperplane at (0, f*).

Visualization with the Primal Function p(u) = inf { f(x) : x ∈ X, g(x) ≤ u } (continued)

[Figures: further cases contrasted side by side: p supported at (0, f*) by a hyperplane with µ_0* = 1 (L-multipliers exist); p supported at (0, f*) only by a vertical hyperplane with µ_0* = 0 (no L-multiplier); and cases with a duality gap, where supporting hyperplanes with µ_0* = 1 or µ_0* = 0 pass through (0, q*) with q* < f*.]

Dual Sensitivity Result

Assume that −∞ < q* ≤ f* < ∞, and that the set of dual optimal solutions is nonempty. Let µ* be the min-norm dual optimal solution (it may not be an L-multiplier).

Cost improvement bound: For all x ∈ X,

    q* − f(x) ≤ ‖µ*‖ ‖g^+(x)‖.

If µ* ≠ 0, there exists an infeasible sequence {x_k} ⊂ X such that the bound is asymptotically sharp:

    f(x_k) → q*,    ‖g^+(x_k)‖ → 0,    (q* − f(x_k)) / ‖g^+(x_k)‖ → ‖µ*‖.

Also, µ* is the optimal constraint violation direction:

    g^+(x_k) / ‖g^+(x_k)‖ → µ* / ‖µ*‖.

[Note: {g^+(x_k)} may need to lie on a curve.]
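
Continuing with the same hypothetical 1-D convex instance, a quick check that the cost improvement bound holds for all x and becomes asymptotically sharp along a sequence approaching the constraint boundary from the infeasible side.

```python
import numpy as np

# Checking the dual cost improvement bound on the 1-D convex instance:
# minimize x^2 s.t. g(x) = 1 - x <= 0;  q* = f* = 1, min-norm dual solution mu* = 2.
q_star, mu_star = 1.0, 2.0
f = lambda x: x ** 2
g_plus = lambda x: max(0.0, 1.0 - x)

xs = np.linspace(-3.0, 3.0, 13)
assert all(q_star - f(x) <= mu_star * g_plus(x) + 1e-12 for x in xs)

# Asymptotic sharpness: along x_k -> 1 from below, the ratio tends to ||mu*|| = 2.
for xk in [0.9, 0.99, 0.999]:
    print((q_star - f(xk)) / g_plus(xk))   # -> 1.9, 1.99, 1.999
```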

Thank you!