Optimality, Duality, Complementarity for Constrained Optimization


Stephen Wright
University of Wisconsin-Madison
May 2014

Linear Programming

The fundamental problem in constrained optimization is linear programming (LP):

- Continuous variables, gathered into a vector x ∈ R^n;
- A linear objective function (just one!);
- Linear constraints (usually many!), which can be equalities or inequalities.

A standard form of LP is

    min_x c^T x   subject to   Ax = b,  x ≥ 0,

where x ∈ R^n are the variables; A ∈ R^{m×n} is the constraint matrix; b ∈ R^m is the right-hand side of the constraints; c ∈ R^n is the cost vector; and x ≥ 0 means that we require all components of x to be nonnegative.
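As a quick sanity check of the standard form, here is a minimal sketch using SciPy's `linprog` (the data values below are made up for illustration):

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data for  min_x c^T x  s.t.  Ax = b, x >= 0  (values are made up).
c = np.array([1.0, 2.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

# linprog's default bounds are (0, None), i.e. x >= 0 componentwise.
res = linprog(c, A_eq=A, b_eq=b, method="highs")
print(res.x, res.fun)  # the minimizer puts all weight on the zero-cost component
```

Here the feasible set is the unit simplex, so the optimal strategy is to put all mass on the cheapest coordinate.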

An LP in Two Variables

[Figure: an LP in two variables]

Other Forms of LP

Any LP can be converted to the standard form by adding extra variables and constraints, and doing other simple manipulations.

Example 1: the constraint 3x_1 + 5x_2 ≥ 0 can be converted to standard form by introducing a slack variable s_1:

    3x_1 + 5x_2 - s_1 = 0,  s_1 ≥ 0.

Example 2: the free variable x_10 can be replaced by a difference of two nonnegative variables:

    x_10 = x_10^+ - x_10^-,  x_10^+ ≥ 0,  x_10^- ≥ 0.
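A small sketch of these two conversions, on a made-up toy LP: we solve the original form and the converted standard form and confirm they give the same optimal value (data and variable names are illustrative only):

```python
import numpy as np
from scipy.optimize import linprog

# Toy LP (made-up data): min x1 + x2  s.t.  x1 + x2 >= 1, x1 >= 0, x2 free.
# Original form: linprog wants A_ub x <= b_ub, so negate the >= constraint.
res1 = linprog(c=[1.0, 1.0], A_ub=[[-1.0, -1.0]], b_ub=[-1.0],
               bounds=[(0, None), (None, None)], method="highs")

# Standard form: variables (x1, x2p, x2m, s), with x2 = x2p - x2m and slack s,
# all nonnegative (linprog's default bounds).
c_std = [1.0, 1.0, -1.0, 0.0]
A_eq = [[1.0, 1.0, -1.0, -1.0]]   # x1 + x2p - x2m - s = 1
res2 = linprog(c=c_std, A_eq=A_eq, b_eq=[1.0], method="highs")
print(res1.fun, res2.fun)          # both forms give the same optimal value
```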

Does it have a solution?

There are three possible outcomes for an LP. It can be:

INFEASIBLE: there is no x that satisfies all the constraints, e.g.

    min_{x_1,x_2} 3x_1 + 2x_2   s.t.   x_1 + x_2 = -3,  x_1 ≥ 0,  x_2 ≥ 0.

UNBOUNDED: there is a feasible ray along which the objective decreases to -∞, e.g.

    min_{x_1} x_1   s.t.   x_1 ≤ 0.

OPTIMAL: the LP has one or more points that achieve the optimal objective value.

LP Dual

Associated with any LP is another LP called its dual. Together, these two LPs form a primal-dual pair of LPs.

The dual takes the same data that defines the primal LP (A, b, c) but arranges it differently: the cost vector switches with the right-hand side, and the constraint matrix is transposed. Primal and dual give two different perspectives on the same data.

    (Primal)  min_x c^T x   subject to   Ax = b,  x ≥ 0,
    (Dual)    max_λ b^T λ   subject to   A^T λ ≤ c.

We can introduce a slack s to get an alternative form of the dual:

    (Dual)    max_λ b^T λ   subject to   A^T λ + s = c,  s ≥ 0.
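A hedged sketch of forming and solving both members of a primal-dual pair, on made-up data; the primal and dual optimal values come out equal, as the duality theory below predicts:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up standard-form data.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
c = np.array([2.0, 1.0, 3.0])

# Primal: min c^T x  s.t.  Ax = b, x >= 0 (linprog's default bounds).
primal = linprog(c, A_eq=A, b_eq=b, method="highs")

# Dual: max b^T lam  s.t.  A^T lam <= c, lam free; linprog minimizes, so negate b.
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)], method="highs")
print(primal.fun, -dual.fun)   # the two optimal values agree
```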

LP Duality

Practically speaking, the dual may be easier to solve than the primal. More importantly, the primal and dual problems give a great deal of valuable information about each other.

There are two big theorems about LP duality:

- Weak duality: one-line proof!
- Strong duality: really hard to prove!

Theorem (Weak Duality). If x is feasible for the primal and (λ, s) is feasible for the dual, we have c^T x ≥ b^T λ.

Proof. b^T λ = x^T A^T λ = x^T (A^T λ - c) + c^T x ≤ c^T x, since x ≥ 0 and A^T λ - c ≤ 0.

LP Duality

Theorem (Strong Duality). Given a primal-dual pair of LPs, exactly one of the following three statements is true.
(a) Both are feasible, in which case both have solutions, and their objectives are equal at the solutions: c^T x* = b^T λ*.
(b) One of the pair is unbounded, in which case the other is infeasible.
(c) Both are infeasible.

Proof. One way is to show that the simplex method works, in which case it can resolve cases (a) and (b). But this is not at all trivial; in particular, we have to enhance simplex to avoid getting stuck. Case (c) can be illustrated with a simple example.

LP General Form

For the purposes of finding duals, it can be a pain to convert to standard form, take the dual, then simplify. We can shorten this process by taking a general-form LP and defining its dual directly.

    Primal:  min_{x,y} c^T x + d^T y   s.t.   Ax + By ≥ b,  Ex + Fy = g,  x ≥ 0.
    Dual:    max_{u,v} b^T u + g^T v   s.t.   A^T u + E^T v ≤ c,  B^T u + F^T v = d,  u ≥ 0.

Dual variable u is associated with the first primal constraint; we have u ≥ 0 because this constraint is an inequality. Dual variable v is associated with the second primal constraint; v is free because this constraint is an equality.

Karush-Kuhn-Tucker (KKT) Conditions

Back to standard form... The KKT conditions are a set of algebraic conditions satisfied whenever x is a primal solution and (λ, s) is a dual solution:

    Ax = b,
    A^T λ + s = c,
    0 ≤ x ⊥ s ≥ 0,

where x ⊥ s means x^T s = 0: perpendicularity. The last KKT condition means that x_i = 0 and/or s_i = 0 for all i = 1, 2, ..., n.

Another strategy for solving an LP would be to solve the KKT conditions. In fact, this strategy would yield solutions to both primal and dual. Primal-dual interior-point methods use this strategy.
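A minimal numeric check of the three KKT conditions, on a tiny made-up LP whose solution is obvious by inspection:

```python
import numpy as np

# Tiny LP: min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0.  By inspection x* = (1, 0).
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])
x = np.array([1.0, 0.0])       # primal solution
lam = np.array([1.0])          # dual solution (multiplier on the equality)
s = c - A.T @ lam              # dual slack, so that A^T lam + s = c holds
print(x, s, x @ s)             # complementarity: x^T s = 0
```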

KKT for General Form

We can write KKT conditions for more general LP formulations. For the one on the earlier slide we have

    0 ≤ Ax + By - b ⊥ u ≥ 0,
    Ex + Fy = g,
    0 ≤ x ⊥ c - A^T u - E^T v ≥ 0,
    B^T u + F^T v = d.

The KKT conditions consist of:

- all primal and dual constraints, and
- complementarity between each inequality constraint and its corresponding dual variable.

Example

Consider minimization of a linear function over the simplex:

    min_x c^T x   s.t.   Σ_{i=1}^n x_i = 1,  x ≥ 0.

The dual is:

    max_λ λ   s.t.   λ ≤ c_i,  i = 1, 2, ..., n.

The KKT conditions are:

    Σ_{i=1}^n x_i = 1,   0 ≤ x_i ⊥ (c_i - λ) ≥ 0,  i = 1, 2, ..., n.

The solution of the dual is totally obvious, by inspection: λ* = min_i c_i. We can use the KKT conditions to figure out the solution to the primal: x* is any vector for which

    x* ≥ 0,   Σ_{i=1}^n x_i* = 1,   x_j* = 0 if c_j > min_i c_i.
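This example can be checked directly, with no solver at all; the cost vector below is made up:

```python
import numpy as np

c = np.array([3.0, 1.0, 2.0, 1.0])   # made-up cost vector
lam = c.min()                        # dual solution: lambda* = min_i c_i
x = (c == lam).astype(float)
x /= x.sum()                         # uniform weight on the argmin set is primal-optimal
print(lam, x, c @ x)
```

Any other distribution supported on the argmin set would pass the same KKT checks.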

Theorems of the Alternative: Party Tricks with Duality

LP duality can be used to prove an interesting class of results known as theorems of the alternative. These theorems have a generic form: two logical statements, each consisting of a set of algebraic conditions, labelled I and II; exactly one of I and II is true.

Lemma (Farkas Lemma). Given a matrix A ∈ R^{m×n} and a vector c ∈ R^n, exactly one of the following two statements is true:
I. There exists µ ∈ R^m with µ ≥ 0 such that A^T µ = c;
II. There exists y ∈ R^n such that Ay ≥ 0 and c^T y < 0.

Farkas Lemma: Proof

Proof. Consider the following LP and its dual:

    P:  min_y c^T y   s.t.   Ay ≥ 0,
    D:  max_µ 0       s.t.   A^T µ = c,  µ ≥ 0.

Suppose first that II is true. Then P is unbounded, so by strong duality D is infeasible. Hence I is false.

Now suppose that II is false. Then the optimal objective for P must be 0; in fact, y = 0 is optimal, with objective 0. Strong duality tells us that D also has a solution (with objective 0, trivially). Any solution of D will satisfy I, so I is true.
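The two alternatives can be illustrated with explicit certificates; the matrix and vectors below are made-up illustrations, not part of the proof:

```python
import numpy as np

A = np.eye(2)                 # made-up data

# Case 1: c = (1, 1).  Statement I holds with mu = (1, 1) >= 0, A^T mu = c.
c1 = np.array([1.0, 1.0])
mu = np.array([1.0, 1.0])
case1_I = bool(np.all(mu >= 0) and np.allclose(A.T @ mu, c1))

# Case 2: c = (1, -1).  Statement II holds with y = (0, 1): Ay >= 0 and c^T y < 0.
# (No mu >= 0 can satisfy A^T mu = c here: its second component would be negative.)
c2 = np.array([1.0, -1.0])
y = np.array([0.0, 1.0])
case2_II = bool(np.all(A @ y >= 0) and (c2 @ y < 0))
print(case1_I, case2_II)
```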

Convex Quadratic Programming (Convex QP)

Standard form:

    min_x (1/2) x^T Q x + c^T x   subject to   Ax = b,  x ≥ 0,

where Q is symmetric positive semidefinite (that is, x^T Q x ≥ 0 for all x).

The KKT conditions are a straightforward extension of those for LP:

    Ax = b,
    A^T λ + s = Qx + c,
    0 ≤ x ⊥ s ≥ 0.

Duality is a bit more complicated. We define it via a Lagrangian function:

    L(x, λ, s) := (1/2) x^T Q x + c^T x - λ^T (Ax - b) - s^T x,

which combines the objective and constraints, using the dual variables as coefficients for the constraints.

QP Dual

The Wolfe dual is:

    max_{x,λ,s} L(x, λ, s)   s.t.   ∇_x L(x, λ, s) = 0,  s ≥ 0,

which reduces to

    max_{x,λ,s} (1/2) x^T Q x + c^T x - λ^T (Ax - b) - s^T x   s.t.   Qx + c - A^T λ - s = 0,  s ≥ 0,

or equivalently

    max_{x,λ,s} -(1/2) x^T Q x + b^T λ   s.t.   Qx + c - A^T λ - s = 0,  s ≥ 0.

When Q = 0 (linear programming) this simplifies to

    max_{λ,s} b^T λ   s.t.   c - A^T λ - s = 0,  s ≥ 0,

so that x disappears from the problem, and we have exactly the dual LP form obtained earlier.

Dual: Eliminating x

If Q is positive definite (that is, positive semidefinite and nonsingular), we can eliminate x entirely from the QP dual. From the constraint Qx + c - A^T λ - s = 0 we can write

    x = Q^{-1} (A^T λ + s - c).

By substitution into the objective we obtain

    max_{λ,s} -(1/2) (A^T λ + s - c)^T Q^{-1} (A^T λ + s - c) + b^T λ   s.t.   s ≥ 0.

This form may be easier to solve when Q is easy to invert (e.g. diagonal), because it has only nonnegativity constraints s ≥ 0 and no general constraints.
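A sketch of this eliminated dual in the simplest setting, with made-up data: A vacuous (bound constraints only) and Q diagonal, so the dual max_{s≥0} -(1/2)(s - c)^T Q^{-1} (s - c) separates by coordinate and is solved by s_i = max(c_i, 0):

```python
import numpy as np

# Bound-constrained convex QP (made-up data): min 0.5 x^T Q x + c^T x  s.t.  x >= 0,
# with Q diagonal and A vacuous, so the eliminated dual is
#     max_{s >= 0}  -0.5 (s - c)^T Q^{-1} (s - c),
# solved coordinate-wise by s_i = max(c_i, 0).
Q = np.diag([2.0, 1.0, 4.0])
c = np.array([-4.0, 3.0, -2.0])
s = np.maximum(c, 0.0)                   # dual solution
x = np.linalg.solve(Q, s - c)            # recover the primal via x = Q^{-1}(s - c)
primal = 0.5 * x @ Q @ x + c @ x
dual = -0.5 * (s - c) @ np.linalg.solve(Q, s - c)
print(primal, dual)                      # equal objectives
```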

Lagrangian Duality

Consider a constrained optimization problem involving possibly nonlinear functions:

    min f(x)   subject to   c_i(x) ≥ 0,  i = 1, 2, ..., m.

Define the Lagrangian in the obvious way:

    L(x, λ) := f(x) - λ^T c(x) = f(x) - Σ_{i=1}^m λ_i c_i(x),

and define the dual objective q(λ) as:

    q(λ) := inf_x L(x, λ).

Note that we require the global infimum of L with respect to x to make this work, but this infimum can be found tractably when the problem is convex: f convex and each c_i concave, so that L(·, λ) is convex in x for λ ≥ 0.

The Lagrangian dual problem is then:

    max_λ q(λ)   s.t.   λ ≥ 0.

Some Properties

Since we wish to maximize q, we are not interested in values of λ for which q(λ) = -∞. Hence define the domain:

    D := {λ | q(λ) > -∞}.

Theorem. q is concave and its domain D is convex.

Theorem (Weak Duality). For any x̄ feasible for the primal and λ̄ feasible for the dual, we have q(λ̄) ≤ f(x̄).

Proof. q(λ̄) = inf_x [f(x) - λ̄^T c(x)] ≤ f(x̄) - λ̄^T c(x̄) ≤ f(x̄), since λ̄ ≥ 0 and c(x̄) ≥ 0.
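These properties can be seen on a one-dimensional toy problem (chosen here for illustration), where the inner infimum can be worked out in closed form:

```python
import numpy as np

# Toy problem: min f(x) = x^2  s.t.  c(x) = x - 1 >= 0, with solution x* = 1, f(x*) = 1.
# L(x, lam) = x^2 - lam*(x - 1); the inner inf over x is attained at x = lam/2, so
# q(lam) = -lam^2/4 + lam.
def q(lam):
    x = lam / 2.0
    return x**2 - lam * (x - 1.0)

lams = np.linspace(0.0, 4.0, 101)
vals = q(lams)
print(vals.max())   # dual optimum q(2) = 1 matches the primal optimum f(1) = 1
```

Sampling q on a grid shows weak duality (q never exceeds the primal optimal value 1) and the concavity of q.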

Another View of the Primal

Another way to define the primal problem is as minimizing the max of the Lagrangian over λ ≥ 0. Consider

    r(x) := sup_{λ≥0} L(x, λ) = max_{λ≥0} [f(x) - λ^T c(x)].

If c_i(x) < 0 for any i, we can drive λ_i to +∞ to make r(x) = +∞! Hence, if we are looking to minimize r, we need only consider values of x for which c(x) ≥ 0. When c(x) ≥ 0, the max w.r.t. λ ≥ 0 is attained at λ = 0, and for this λ we have r(x) = f(x). Thus the problem min_x r(x) is equivalent to the original primal!

There's a nice symmetry between primal and dual:

- Primal objective is sup_{λ≥0} L(x, λ);
- Dual objective is inf_x L(x, λ).

Equality Constraints

Given the primal problem

    min f(x)   s.t.   c_i(x) ≥ 0,  i = 1, 2, ..., m;   d_j(x) = 0,  j = 1, 2, ..., p,

define the Lagrangian as

    L(x, λ, µ) = f(x) - λ^T c(x) - µ^T d(x),

and the dual objective as before: q(λ, µ) := inf_x L(x, λ, µ). The dual problem is then:

    max_{λ,µ} q(λ, µ)   s.t.   λ ≥ 0   (µ is free).

Note that the primal objective is sup_{λ≥0, µ} L(x, λ, µ).

Linear Complementarity Problem (LCP)

Complementarity problems involve algebraic and complementary relationships. In linear complementarity, all the relationships are linear. The basic LCP is defined by a matrix M ∈ R^{N×N} and a vector q ∈ R^N. The problem is:

    Find z ∈ R^N such that 0 ≤ z ⊥ Mz + q ≥ 0.

There's no objective function! This is not an optimization problem. But it's closely related to optimization (via KKT conditions), and can also be used to formulate problems in economics, game theory, and contact problems in mechanics.

Varieties of LCP

- Monotone LCP: M is positive semidefinite: z^T M z ≥ 0 for all z.
- Strictly monotone LCP: M is positive definite (z^T M z > 0 for all z ≠ 0).
- Mixed LCP: contains equality constraints as well as complementarity conditions. Partition M as

      M = [ M_11  M_12 ]
          [ M_21  M_22 ],   with M_11 and M_22 square,

  and partition q and z accordingly. The mixed LCP is defined as:

      M_11 z_1 + M_12 z_2 + q_1 = 0,
      0 ≤ M_21 z_1 + M_22 z_2 + q_2 ⊥ z_2 ≥ 0.

LP as LCP

The KKT conditions for LP and QP form an LCP (usually mixed, depending on the formulation). KKT for LP:

    Ax - b = 0,
    0 ≤ -A^T λ + c ⊥ x ≥ 0.

This is a mixed LCP with

    M = [ M_11  M_12 ] = [  0    A ],   q = [ -b ],   z = [ λ ].
        [ M_21  M_22 ]   [ -A^T  0 ]        [  c ]        [ x ]

In fact, it's a monotone mixed LCP, since z^T M z = λ^T A x - λ^T A x = 0.
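The monotonicity claim is easy to verify numerically: the block matrix M built from any A is skew-symmetric, so z^T M z vanishes identically (the matrix below is random, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))          # made-up LP constraint matrix
M = np.block([[np.zeros((3, 3)), A],
              [-A.T, np.zeros((5, 5))]])
# M is skew-symmetric, so z^T M z = 0 for every z: the LP's KKT LCP is monotone.
z = rng.standard_normal(8)
val = z @ M @ z
print(val)
```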

QP as LCP

Similarly, we can write the KKT conditions for QP as a mixed LCP:

    Ax - b = 0,
    0 ≤ -A^T λ + Qx + c ⊥ x ≥ 0.

This is a mixed LCP with

    M = [  0    A ],   q = [ -b ],   z = [ λ ].
        [ -A^T  Q ]        [  c ]        [ x ]

Note that

    z^T M z = λ^T A x - λ^T A x + x^T Q x = x^T Q x,

so it's a monotone LCP provided that Q is positive semidefinite, i.e. the QP is convex. It can't be a strictly monotone LCP unless A is vacuous (that is, the QP has only bound constraints x ≥ 0) and Q is positive definite.

Algorithms for LCP

If we have an algorithm for solving monotone LCP, then we also have an algorithm for LP and convex QP. (Algorithms for nonmonotone LCP and nonconvex QP are a different proposition; these are hard problems in general, for which polynomial algorithms are not known to exist.)

Two main classes of algorithms are of practical interest:

- Simplex algorithms for LP, related to active-set algorithms for QP and Lemke's method for LCP.
- Primal-dual interior-point methods, which are quite similar for all three classes of problems.

Bimatrix Games as LCP

In a bimatrix game there are two players, each of whom can play one of a finite number of moves. Depending on the combination of moves played, each player wins or loses something, the amount being determined by an entry in a loss matrix.

- Player 1 has m possible moves: i = 1, 2, ..., m;
- Player 2 has n possible moves: j = 1, 2, ..., n;
- There are m × n loss matrices A and B, such that if Player 1 plays move i and Player 2 plays move j, then Player 1 loses A_ij dollars while Player 2 loses B_ij dollars.

It's a zero-sum game if A + B = 0.

Example (Matching Pennies): each player shows either H or T. Player 1 wins $1 and Player 2 loses $1 when the pennies match; Player 1 loses $1 and Player 2 wins $1 when the pennies don't match.

Matching Pennies loss matrices:

    A = [ -1   1 ],   B = [  1  -1 ].
        [  1  -1 ]        [ -1   1 ]

Assume that the bimatrix game is played repeatedly. Usually both players play a mixed strategy, in which they choose each move randomly with a certain probability, and independently of moves before and after:

- Player 1 plays move i with probability x_i;
- Player 2 plays move j with probability y_j.

Since x and y denote vectors of probabilities, we have

    x ≥ 0,  y ≥ 0,  e^T x = 1,  e^T y = 1.

Nash Equilibrium

A Nash equilibrium is a pair of mixed strategies x̄ and ȳ such that neither player can gain an advantage by changing to a different strategy, provided that the opponent also does not change. Formally:

    (x - x̄)^T A ȳ ≥ 0   for all x with x ≥ 0 and e^T x = 1;
    x̄^T B (y - ȳ) ≥ 0   for all y with y ≥ 0 and e^T y = 1.

Note that the definition is not changed if we add a constant to all elements of A and B. Thus we can assume that A and B have all positive elements. (Useful for computation.)

We can find Nash equilibria by solving an LCP with

    M = [  0   A ],   q = -e,
        [ B^T  0 ]

where e = (1, 1, ..., 1)^T is the vector of ones.
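For Matching Pennies, the uniform strategies x̄ = ȳ = (1/2, 1/2) form the (well-known) equilibrium; a sketch checking the two inequalities above. Since both conditions are linear in x (resp. y), it suffices to verify them at the simplex vertices:

```python
import numpy as np

A = np.array([[-1.0, 1.0], [1.0, -1.0]])   # Player 1's loss matrix (Matching Pennies)
B = -A                                      # zero-sum, so Player 2's losses are -A
xbar = np.array([0.5, 0.5])                 # candidate equilibrium strategies
ybar = np.array([0.5, 0.5])

# The equilibrium inequalities are linear in x (resp. y), so checking them at
# the simplex vertices e_i (resp. e_j) is enough.
gaps1 = A @ ybar - xbar @ A @ ybar          # (e_i - xbar)^T A ybar for each i
gaps2 = xbar @ B - xbar @ B @ ybar          # xbar^T B (e_j - ybar) for each j
print(gaps1, gaps2)                         # all entries nonnegative
```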

Lemma. Let A and B be positive loss matrices of dimension m × n, and suppose that (s, t) ∈ R^{m+n} solves the LCP above. Then the point (x̄, ȳ) = (s/(e^T s), t/(e^T t)) is a Nash equilibrium.

Proof. By complementarity, we have

    x̄^T (At - e) = (1/(e^T s)) s^T (At - e) = 0,

so that x̄^T A t = x̄^T e = 1. We thus have

    A ȳ - (x̄^T A ȳ) e = (1/(e^T t)) (At - (x̄^T A t) e) = (1/(e^T t)) (At - e) ≥ 0.

Thus for any x with e^T x = 1 and x ≥ 0, we have

    0 ≤ x^T (A ȳ - e (x̄^T A ȳ)) = (x - x̄)^T A ȳ,

which is the first equilibrium condition; a symmetric argument gives the condition for ȳ.

KKT Conditions for Nonlinear Problems

Consider now the more general problem

    min f(x)   subject to   x ∈ Ω,

where f is smooth and Ω is a polyhedral set, defined by linear inequalities:

    Ω := {x | a_i^T x ≥ b_i, i = 1, 2, ..., m}.

Special cases:

- Positive orthant: Ω = R^n_+ = {x | x ≥ 0};
- Bound constraints: Ω = {x | l ≤ x ≤ u}.

We can define optimality conditions for this problem using local linear approximations of f. (The constraint set is already defined by linear quantities, so there is no need to approximate it.)

Taylor's Theorem (version 1)

Theorem. If f : R^n → R is a continuously differentiable function, then for x, p ∈ R^n there is s ∈ (0, 1) such that

    f(x + p) = f(x) + ∇f(x + sp)^T p.

A consequence is that if d is a direction with ∇f(x)^T d < 0, then for all ε > 0 sufficiently small, we have

- ∇f(x + sεd)^T d < 0 for all s ∈ [0, 1] (by continuity of ∇f);
- f(x + εd) < f(x) (by applying Taylor's theorem with p = εd).
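This consequence (a negative directional derivative means a small step decreases f) can be checked numerically on a made-up smooth function:

```python
import numpy as np

# If grad f(x)^T d < 0, then f(x + eps*d) < f(x) for all sufficiently small eps > 0.
f = lambda x: x[0]**2 + 3.0 * x[1]**2      # made-up smooth function
grad = lambda x: np.array([2.0 * x[0], 6.0 * x[1]])
x = np.array([1.0, 1.0])
d = -grad(x)                               # a descent direction: grad^T d < 0
slope = grad(x) @ d
decreased = f(x + 1e-4 * d) < f(x)
print(slope, decreased)
```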

Active Sets and Feasible Directions

Given a feasible point x ∈ Ω, we can define the

- Active set A(x) := {i = 1, 2, ..., m | a_i^T x = b_i};
- Inactive set I(x) := {1, 2, ..., m} \ A(x) = {i = 1, 2, ..., m | a_i^T x > b_i}.

The feasible directions F(x) are the directions that move into Ω from x:

    F(x) := {d | a_i^T d ≥ 0, i ∈ A(x)}.

We have that d ∈ F(x) ⇒ x + εd ∈ Ω for all ε ≥ 0 sufficiently small, because

- i ∈ A(x) implies that a_i^T (x + εd) = b_i + ε a_i^T d ≥ b_i;
- i ∈ I(x) implies that a_i^T (x + εd) > b_i for ε small enough.
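A small sketch computing these sets and testing a feasible direction, using the positive quadrant in R^2 as an illustrative Ω:

```python
import numpy as np

# Omega = {x : a_i^T x >= b_i}.  Illustrative data: the positive quadrant of R^2.
A = np.eye(2)             # rows are the a_i^T
b = np.zeros(2)
x = np.array([0.0, 1.0])  # on the boundary: the first constraint is active
active = [i for i in range(2) if np.isclose(A[i] @ x, b[i])]
inactive = [i for i in range(2) if A[i] @ x > b[i]]
d = np.array([1.0, -1.0])
feasible_dir = all(A[i] @ d >= 0 for i in active)   # a_i^T d >= 0 on the active set
eps = 1e-3
inside = bool(np.all(A @ (x + eps * d) >= b))        # a small step along d stays in Omega
print(active, inactive, feasible_dir, inside)
```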

[Figure: a point x on the boundary of Ω and its cone of feasible directions]

Optimality: A Necessary Condition

Lemma. If x* is a local solution of min_{x∈Ω} f(x) (that is, there are no other points close to x* that are feasible and have a lower function value), then there can be no feasible direction d with ∇f(x*)^T d < 0.

Proof. Suppose that in fact we have d ∈ F(x*) with ∇f(x*)^T d < 0. By definition of F(x*), we have x* + εd ∈ Ω for all ε ≥ 0 sufficiently small. Moreover, from our consequence of Taylor's theorem, we also have f(x* + εd) < f(x*) for all ε sufficiently small. Hence there are feasible points with lower values of f arbitrarily close to x*, so x* is not a local minimum.

This is neat, but it's not a checkable, practical condition, because F(x*) contains infinitely many directions in general. However, we can use Farkas's Lemma to turn it into KKT conditions.

KKT Conditions for Linearly Constrained Optimization

Theorem. If x* is a local solution of min_{x∈Ω} f(x), then there exist Lagrange multipliers λ_i* ≥ 0, i ∈ A(x*), such that

    ∇f(x*) = Σ_{i∈A(x*)} a_i λ_i*.

Proof. By the lemma above, there can be no direction d such that ∇f(x*)^T d < 0 and a_i^T d ≥ 0 for all i ∈ A(x*). Thus Farkas's Lemma tells us that the alternative statement must be true, which is exactly the expression above.

The full KKT conditions are obtained by adding feasibility for x*: a_i^T x* ≥ b_i.

KKT Conditions for Linearly Constrained Optimization

We can restate the KKT conditions by using complementarity conditions to absorb the definition of A(x*):

    0 ≤ a_i^T x* - b_i ⊥ λ_i* ≥ 0,  i = 1, 2, ..., m,
    ∇f(x*) - Σ_{i=1}^m a_i λ_i* = 0.

The complementarity condition implies that λ_i* = 0 for i ∈ I(x*), so the inactive constraints do not contribute to the sum in the second condition.

KKT Conditions Using the Lagrangian

We can define the Lagrangian for this case too:

    L(x, λ) := f(x) - Σ_{i=1}^m λ_i (a_i^T x - b_i) = f(x) - λ^T (Ax - b),

where A is the matrix with rows a_1^T, a_2^T, ..., a_m^T and b = (b_1, b_2, ..., b_m)^T. Restating the KKT conditions in this notation, we have:

    0 ≤ Ax* - b ⊥ λ* ≥ 0,
    ∇_x L(x*, λ*) = 0.

Example: when Ω = R^n_+, we have A(x*) = {i | x_i* = 0}, a_i = e_i, b_i = 0. The KKT conditions reduce to

    [∇f(x*)]_i ≥ 0 for i ∈ A(x*);   [∇f(x*)]_i = 0 for i ∈ I(x*).
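A quick check of the Ω = R^n_+ case, on a made-up quadratic whose constrained minimizer is known in closed form (the projection of a point onto the nonnegative orthant):

```python
import numpy as np

# f(x) = 0.5*||x - a||^2 minimized over Omega = R^n_+ (made-up vector a).
a = np.array([-1.0, 2.0])
grad = lambda x: x - a
xstar = np.maximum(a, 0.0)       # the minimizer is the projection of a onto x >= 0
g = grad(xstar)
active = xstar == 0.0
# KKT for Omega = R^n_+: [grad f]_i >= 0 on active indices, = 0 on inactive ones.
print(xstar, g, active)
```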

Nonlinear Constraints: What Could Possibly Go Wrong?

Suppose that Ω is defined by nonlinear algebraic inequalities:

    Ω := {x | c_i(x) ≥ 0, i = 1, 2, ..., m}.

A natural extension of the KKT conditions above is obtained by linearizing each c_i around the point x*, just as we did with f. Define the Lagrangian

    L(x, λ) = f(x) - Σ_{i=1}^m λ_i c_i(x),

and the KKT conditions:

    ∇_x L(x*, λ*) = 0,
    0 ≤ c_i(x*) ⊥ λ_i* ≥ 0,  i = 1, 2, ..., m.

Can we say that when x* is a local minimizer, there must be λ* such that these KKT conditions hold? NOT QUITE! We need constraint qualifications to make sure that nothing pathological is happening with the constraints at x*.

Constraint Qualifications

Constraint qualifications (CQs) ensure that the linearized approximation to the feasible set Ω, evaluated at x*, has a similar geometry to Ω itself. The linearization comes from a first-order Taylor expansion of the active constraints around x*:

    {x | ∇c_i(x*)^T (x - x*) ≥ 0, i ∈ A(x*)}.

[Figure: a set Ω, a point x* on its boundary, and the linearized Ω at x*]

Here the linearization captures the geometry of Ω well, so a CQ would be satisfied.

[Figure: Ω is a single point, while the linearized Ω is an entire line]

The true Ω is the single point x*, whereas the linearization is the entire line: very different geometry. CQs would not be satisfied here, and the KKT conditions would not be satisfied in general at x*, even though x* must be a local solution, regardless of f.
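A standard one-dimensional pathology of this kind (an illustrative example, not necessarily the exact one in the figure) can be checked numerically: for Ω = {x ∈ R : -x² ≥ 0} = {0}, the active constraint gradient vanishes, so no multiplier can balance ∇f when f(x) = x:

```python
import numpy as np

# Omega = {x in R : c(x) = -x^2 >= 0} = {0}, a single point (illustrative pathology).
grad_c = lambda x: -2.0 * x
xstar = 0.0                          # the only feasible point: a local min of any f
g_c = grad_c(xstar)                  # the active constraint gradient is zero
# Linearized Omega at xstar is {x : g_c * (x - xstar) >= 0} = all of R.
# KKT for f(x) = x would need 1 = lam * g_c for some lam >= 0: impossible.
lams = np.linspace(0.0, 100.0, 1001)
residual = np.min(np.abs(1.0 - lams * g_c))
print(g_c, residual)                 # residual stays at 1 for every multiplier tried
```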
