Optimality, Duality, Complementarity for Constrained Optimization

Stephen Wright
University of Wisconsin-Madison
May 2014
Linear Programming

The fundamental problem in constrained optimization is linear programming (LP):
- Continuous variables, gathered into a vector x ∈ R^n;
- A linear objective function (just one!);
- Linear constraints (usually many!), which can be equalities or inequalities.

A standard form of LP is:

    min_x c^T x  subject to  Ax = b,  x ≥ 0,

where x ∈ R^n are the variables; A ∈ R^{m×n} is the constraint matrix; b ∈ R^m is the right-hand side of the constraints; c ∈ R^n is the cost vector; and x ≥ 0 means that we require all components of x to be nonnegative.
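As a quick illustration (not from the slides), a standard-form LP can be handed directly to SciPy's `linprog`; the data below is a made-up toy instance with one equality constraint:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative standard-form data: min c^T x  s.t.  Ax = b, x >= 0.
c = np.array([1.0, 2.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])   # single equality constraint: sum(x) = 1
b = np.array([1.0])

# linprog minimizes c^T x subject to A_eq x = b_eq and bounds (x >= 0 here).
res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3)
print(res.x, res.fun)  # optimum puts all mass on the cheapest coordinate
```

Here the optimal point is x = (0, 0, 1) with objective 0, since the third coordinate has zero cost.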
An LP in Two Variables

[Figure omitted.]
Other Forms of LP

Any LP can be converted to the standard form by adding extra variables and constraints, and doing other simple manipulations.

Example 1: the constraint 3x_1 + 5x_2 ≥ 0 can be converted to standard form by introducing a slack variable s_1:

    3x_1 + 5x_2 − s_1 = 0,  s_1 ≥ 0.

Example 2: the free variable x_10 can be replaced by a difference of two nonnegative variables:

    x_10 = x_10^+ − x_10^−,  x_10^+ ≥ 0,  x_10^− ≥ 0.
Does it have a solution?

There are three possible outcomes for an LP. It can be:

INFEASIBLE: There is no x that satisfies all the constraints:

    min_{x_1,x_2} 3x_1 + 2x_2  s.t.  x_1 + x_2 = −3,  x_1 ≥ 0,  x_2 ≥ 0.

UNBOUNDED: There is a feasible ray along which the objective decreases to −∞:

    min_{x_1} −x_1  s.t.  x_1 ≥ 0.

OPTIMAL: The LP has one or more points that achieve the optimal objective value.
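A solver reports these three outcomes through its status code; a sketch using SciPy's `linprog`, whose convention (an assumption worth checking against the SciPy docs for your version) is 0 = optimal, 2 = infeasible, 3 = unbounded:

```python
import numpy as np
from scipy.optimize import linprog

# INFEASIBLE: x1 + x2 = -3 has no solution with x >= 0.
infeas = linprog(c=[3.0, 2.0], A_eq=[[1.0, 1.0]], b_eq=[-3.0],
                 bounds=[(0, None)] * 2)

# UNBOUNDED: min -x1 s.t. x1 >= 0 decreases without limit.
unbdd = linprog(c=[-1.0], bounds=[(0, None)])

# OPTIMAL: min x1 s.t. x1 >= 0 is attained at x1 = 0.
opt = linprog(c=[1.0], bounds=[(0, None)])

print(infeas.status, unbdd.status, opt.status)
```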
LP Dual

Associated with any LP is another LP called its dual. Together, these two LPs form a primal-dual pair. The dual takes the same data that defines the primal LP (A, b, c) but arranges it differently: the cost vector switches with the right-hand side, and the constraint matrix is transposed. Primal and dual give two different perspectives on the same data.

    (Primal)  min_x c^T x  subject to  Ax = b,  x ≥ 0,
    (Dual)    max_λ b^T λ  subject to  A^T λ ≤ c.

We can introduce a slack s to get an alternative form of the dual:

    (Dual)  max_{λ,s} b^T λ  subject to  A^T λ + s = c,  s ≥ 0.
LP Duality

Practically speaking, the dual may be easier to solve than the primal. More importantly, the primal and dual problems give a great deal of valuable information about each other. There are two big theorems about LP duality:
- Weak duality: one-line proof!
- Strong duality: really hard to prove!

Theorem (Weak Duality). If x is feasible for the primal and (λ, s) is feasible for the dual, we have c^T x ≥ b^T λ.

Proof. b^T λ = (Ax)^T λ = x^T (A^T λ − c) + c^T x ≤ c^T x, since x ≥ 0 and A^T λ − c ≤ 0.
LP Duality

Theorem (Strong Duality). Given a primal-dual pair of LPs, exactly one of the following three statements is true.
(a) Both are feasible, in which case both have solutions, and their objectives are equal at the solutions: c^T x = b^T λ.
(b) One of the pair is unbounded, in which case the other is infeasible.
(c) Both are infeasible.

Proof. One way is to show that the simplex method works, in which case it can resolve cases (a) and (b). But this is not at all trivial; in particular, we have to enhance simplex to avoid getting stuck. Case (c) can be illustrated with a simple example.
LP General Form

For purposes of finding duals, etc., it can be a pain to convert to standard form, take the dual, then simplify. We can shorten this process by taking a general-form LP and defining its dual.

    Primal:  min_{x,y} c^T x + d^T y  s.t.  Ax + By ≥ b,  Ex + Fy = g,  x ≥ 0.
    Dual:    max_{u,v} b^T u + g^T v  s.t.  A^T u + E^T v ≤ c,  B^T u + F^T v = d,  u ≥ 0.

Dual variable u is associated with the first primal constraint. We have u ≥ 0 because this constraint is an inequality. Dual variable v is associated with the second primal constraint. We have v free because this constraint is an equality.
Karush-Kuhn-Tucker (KKT) Conditions

Back to standard form... KKT conditions are a set of algebraic conditions satisfied whenever x is a primal solution and (λ, s) is a dual solution:

    Ax = b,
    A^T λ + s = c,
    0 ≤ x ⊥ s ≥ 0,

where x ⊥ s means x^T s = 0: perpendicularity. The last KKT condition means that x_i = 0 AND/OR s_i = 0 for all i = 1, 2, ..., n.

Another strategy for solving an LP would be to solve the KKT conditions. In fact, this strategy would yield solutions to both primal and dual. Primal-dual interior-point methods use this strategy.
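Strong duality (c^T x = b^T λ) and complementarity (x^T s = 0) can both be verified numerically by solving the primal and the dual as separate LPs. A sketch on made-up data:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data for min c^T x s.t. Ax = b, x >= 0 (hypothetical example).
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 0.0]])
b = np.array([1.0, 0.5])
c = np.array([1.0, 2.0, 3.0])

# Primal.
p = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3)

# Dual: max b^T lam s.t. A^T lam <= c, i.e. min -b^T lam s.t. A^T lam <= c.
d = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)] * 2)

x, lam = p.x, d.x
s = c - A.T @ lam        # dual slack
print(c @ x, b @ lam)    # equal at optimality (strong duality)
print(x @ s)             # ~0 (complementarity)
```

For this instance the common optimal objective is 2, attained at x = (0.5, 0, 0.5) and λ = (3, −2).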
KKT for General Form

We can write KKT conditions for more general LP formulations. For the one on the earlier slide we have

    0 ≤ Ax + By − b ⊥ u ≥ 0,
    Ex + Fy = g,
    0 ≤ x ⊥ c − A^T u − E^T v ≥ 0,
    B^T u + F^T v = d.

KKT conditions consist of
- all primal and dual constraints, and
- complementarity between each inequality constraint and its corresponding dual variable.
Example

Consider minimization of a linear function over the simplex:

    min_x c^T x  s.t.  Σ_{i=1}^n x_i = 1,  x ≥ 0.

The dual is:

    max_λ λ  s.t.  λ ≤ c_i,  i = 1, 2, ..., n.

The KKT conditions are:

    Σ_{i=1}^n x_i = 1,  0 ≤ x_i ⊥ (c_i − λ) ≥ 0,  i = 1, 2, ..., n.

The solution of the dual is totally obvious, by inspection: λ = min_i c_i. We can use the KKT conditions to figure out the solution to the primal: x* is the set of all vectors for which

    x ≥ 0,  Σ_{i=1}^n x_i = 1,  x_j = 0 if c_j > min_i c_i.
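This closed-form solution is easy to sanity-check in a few lines of NumPy; the cost vector below is an illustrative example:

```python
import numpy as np

# Minimizing c^T x over the simplex, using the closed-form KKT solution.
c = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

lam = c.min()                    # dual solution: lambda = min_i c_i
x = (c == lam).astype(float)     # put mass on the argmin set ...
x /= x.sum()                     # ... normalized so that sum(x) = 1

# Verify the KKT conditions: feasibility and complementarity.
print(x.sum())                   # = 1
print(x @ (c - lam))             # = 0: x_i > 0 only where c_i = lambda
print(c @ x)                     # primal objective equals lam (strong duality)
```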
Theorems of the Alternative: Party Tricks with Duality

LP duality can be used to prove an interesting class of results known as theorems of the alternative. These theorems have a generic form:
- two logical statements, each consisting of a set of algebraic conditions, labelled I and II;
- exactly one of I and II is true.

Lemma (Farkas Lemma). Given a matrix A ∈ R^{m×n} and a vector c ∈ R^n, exactly one of the following two statements is true:
I. There exists µ ∈ R^m with µ ≥ 0 such that A^T µ = c;
II. There exists y ∈ R^n such that Ay ≥ 0 and c^T y < 0.
Farkas Lemma: Proof

Proof. Consider the following LP and its dual:

    P:  min_y c^T y  s.t.  Ay ≥ 0,
    D:  max_µ 0  s.t.  A^T µ = c,  µ ≥ 0.

Suppose first that II is true. Then P is unbounded, so by strong duality D is infeasible. Hence I is false.

Now suppose that II is false. Then the optimal objective for P must be ≥ 0. In fact, y = 0 is optimal, with objective 0. Strong duality tells us that D also has a solution (with objective 0, trivially). Any solution of D will satisfy I, so I is true.
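The proof suggests a computational test: solve D with an LP solver and read off which alternative holds. A sketch (the helper `farkas` and the data below are illustrative, not from the slides; recovering an explicit certificate y in case II is omitted):

```python
import numpy as np
from scipy.optimize import linprog

def farkas(A, c):
    """Return ('I', mu) or ('II', None) according to Farkas's lemma.

    Decides by solving D: max 0 s.t. A^T mu = c, mu >= 0. If D is
    feasible, its solution mu certifies I; otherwise II must hold.
    """
    m = A.shape[0]
    res = linprog(np.zeros(m), A_eq=A.T, b_eq=c, bounds=[(0, None)] * m)
    if res.status == 0:
        return "I", res.x
    return "II", None

A = np.array([[1.0, 0.0], [0.0, 1.0]])
print(farkas(A, np.array([2.0, 3.0]))[0])   # c = 2*a1 + 3*a2: statement I
print(farkas(A, np.array([-1.0, 0.0]))[0])  # no nonnegative combination: II
```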
Convex Quadratic Programming (Convex QP)

Standard form:

    min_x (1/2) x^T Q x + c^T x  subject to  Ax = b,  x ≥ 0,

where Q is symmetric positive semidefinite (that is, x^T Q x ≥ 0 for all x).

KKT conditions are a straightforward extension of those for LP:

    Ax = b,
    A^T λ + s = Qx + c,
    0 ≤ x ⊥ s ≥ 0.

Duality is a bit more complicated. We define it via a Lagrangian function:

    L(x, λ, s) := (1/2) x^T Q x + c^T x − λ^T (Ax − b) − s^T x,

which combines the objective and constraints, using the dual variables as coefficients for the constraints.
QP Dual

The Wolfe dual is:

    max_{x,λ,s} L(x, λ, s)  s.t.  ∇_x L(x, λ, s) = 0,  s ≥ 0,

which reduces to

    max_{x,λ,s} (1/2) x^T Q x + c^T x − λ^T (Ax − b) − s^T x  s.t.  Qx + c − A^T λ − s = 0,  s ≥ 0,

or equivalently

    max_{x,λ,s} −(1/2) x^T Q x + b^T λ  s.t.  Qx + c − A^T λ − s = 0,  s ≥ 0.

When Q = 0 (linear programming) this simplifies to

    max_{λ,s} b^T λ  s.t.  c − A^T λ − s = 0,  s ≥ 0,

so that x disappears from the problem, and we have exactly the dual LP form obtained earlier.
Dual: Eliminating x

If Q is positive definite (that is, positive semidefinite and nonsingular), we can eliminate x entirely from the QP dual. From the constraint Qx + c − A^T λ − s = 0 we can write

    x = Q^{−1} (A^T λ + s − c).

By substitution into the objective we obtain

    max_{λ,s} −(1/2) (A^T λ + s − c)^T Q^{−1} (A^T λ + s − c) + b^T λ  s.t.  s ≥ 0.

This form may be easier to solve when Q is easy to invert (e.g. diagonal), because it has only nonnegativity constraints s ≥ 0, no general constraints.
Lagrangian Duality

Consider a constrained optimization problem involving possibly nonlinear functions:

    min f(x)  subject to  c_i(x) ≥ 0,  i = 1, 2, ..., m.

Define the Lagrangian in the obvious way:

    L(x, λ) := f(x) − λ^T c(x) = f(x) − Σ_{i=1}^m λ_i c_i(x).

Define the dual objective q(λ) as:

    q(λ) := inf_x L(x, λ).

Note that we require the global infimum of L with respect to x to make this work, but this can be found tractably when f and c_i, i = 1, 2, ..., m, are all convex.

The Lagrangian dual problem is then:

    max_λ q(λ)  s.t.  λ ≥ 0.
Some Properties

Since we wish to maximize q, we are not interested in values of λ for which q(λ) = −∞. Hence define the domain:

    D := {λ | q(λ) > −∞}.

Theorem. q is concave and its domain D is convex.

Theorem (Weak Duality). For any x̄ feasible for the primal and λ̄ feasible for the dual, we have q(λ̄) ≤ f(x̄).

Proof. q(λ̄) = inf_x [f(x) − λ̄^T c(x)] ≤ f(x̄) − λ̄^T c(x̄) ≤ f(x̄), since λ̄ ≥ 0 and c(x̄) ≥ 0.
Another View of the Primal

Another way to define the primal problem is as minimizing the max of the Lagrangian over λ ≥ 0. Consider

    r(x) := sup_{λ≥0} L(x, λ) = max_{λ≥0} [f(x) − λ^T c(x)].

If c_i(x) < 0 for any i, we can drive λ_i to +∞ to make r(x) = +∞! Hence, if we are looking to minimize r, we need only consider values of x for which c(x) ≥ 0. When c(x) ≥ 0, the max w.r.t. λ ≥ 0 is attained at λ = 0. For this λ we have r(x) = f(x). Thus the problem min_x r(x) is equivalent to the original primal!

There's a nice symmetry between primal and dual:
- Primal objective is sup_{λ≥0} L(x, λ);
- Dual objective is inf_x L(x, λ).
Equality Constraints

Given the primal problem

    min f(x)  s.t.  c_i(x) ≥ 0, i = 1, 2, ..., m;  d_j(x) = 0, j = 1, 2, ..., p,

define the Lagrangian as

    L(x, λ, µ) = f(x) − λ^T c(x) − µ^T d(x),

and the dual objective as before:

    q(λ, µ) := inf_x L(x, λ, µ).

The dual problem is then:

    max_{λ,µ} q(λ, µ)  s.t.  λ ≥ 0.  (µ is free.)

Note that the primal objective is sup_{λ≥0, µ} L(x, λ, µ).
Linear Complementarity Problem (LCP)

Complementarity problems involve algebraic and complementary relationships. In linear complementarity, all the relationships are linear. The basic LCP is defined by a matrix M ∈ R^{N×N} and a vector q ∈ R^N. The problem is:

    Find z ∈ R^N such that 0 ≤ z ⊥ Mz + q ≥ 0.

There's no objective function! This is not an optimization problem. But it's closely related to optimization (via KKT conditions) and can also be used to formulate problems in economics, game theory, and contact problems in mechanics.
Varieties of LCP

- Monotone LCP: M is positive semidefinite: z^T M z ≥ 0 for all z.
- Strictly monotone LCP: M is positive definite (z^T M z > 0 for all z ≠ 0).
- Mixed LCP: contains equality constraints as well as complementarity conditions. Partition M as

      M = [ M_11  M_12 ]
          [ M_21  M_22 ],  with M_11 and M_22 square,

  and partition q and z accordingly. The mixed LCP is defined as:

      M_11 z_1 + M_12 z_2 + q_1 = 0,
      0 ≤ z_2 ⊥ M_21 z_1 + M_22 z_2 + q_2 ≥ 0.
LP as LCP

The KKT conditions for LP and QP form an LCP (usually mixed, depending on the formulation). KKT for LP:

    Ax − b = 0,
    0 ≤ x ⊥ −A^T λ + c ≥ 0.

This is a mixed LCP with

    M = [ M_11  M_12 ] = [ 0     A ],   q = [ −b ],   z = [ λ ].
        [ M_21  M_22 ]   [ −A^T  0 ]        [  c ]        [ x ]

In fact, it's a monotone mixed LCP, since z^T M z = λ^T Ax − λ^T Ax = 0.
QP as LCP

Similarly, we can write the KKT conditions for QP as a mixed LCP:

    Ax − b = 0,
    0 ≤ −A^T λ + Qx + c ⊥ x ≥ 0.

This is a mixed LCP with

    M = [ 0     A ],   q = [ −b ],   z = [ λ ].
        [ −A^T  Q ]        [  c ]        [ x ]

Note that

    z^T M z = λ^T Ax − λ^T Ax + x^T Qx = x^T Qx,

so that it's a monotone LCP provided that Q is positive semidefinite, i.e. the QP is convex. It can't be a strictly monotone LCP unless A is vacuous (that is, the QP has only bound constraints x ≥ 0) and Q is positive definite.
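The identity z^T M z = x^T Q x is easy to confirm numerically; a sketch with randomly generated (illustrative) data:

```python
import numpy as np

# Build the mixed-LCP matrix M from the KKT system of a convex QP and
# check monotonicity: z^T M z = x^T Q x >= 0 when Q is PSD.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 4))
G = rng.standard_normal((4, 4))
Q = G.T @ G                        # symmetric positive semidefinite

M = np.block([[np.zeros((2, 2)), A],
              [-A.T,             Q]])

z = rng.standard_normal(6)
lam, x = z[:2], z[2:]
print(np.isclose(z @ M @ z, x @ Q @ x))   # True: the A-blocks cancel
print(z @ M @ z >= 0)                     # monotone, since Q is PSD
```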
Algorithms for LCP

If we have an algorithm for solving monotone LCP, then we also have an algorithm for LP and convex QP. (Algorithms for nonmonotone LCP and nonconvex QP are a different proposition; these are hard problems in general, for which polynomial algorithms are not known to exist.)

Two main classes of algorithms are of practical interest:
- Simplex algorithms for LP, related to active-set algorithms for QP and Lemke's method for LCP.
- Primal-dual interior-point methods, which are quite similar for all three classes of problems.
Bimatrix Games as LCP

In a bimatrix game there are two players, each of whom can play one of a finite number of moves. Depending on the combination of moves played, each player wins or loses something, the amount being determined by an entry in a loss matrix.
- Player 1 has m possible moves: i = 1, 2, ..., m;
- Player 2 has n possible moves: j = 1, 2, ..., n;
- There are m × n loss matrices A and B, such that if Player 1 plays move i and Player 2 plays move j, then Player 1 loses A_ij dollars while Player 2 loses B_ij dollars.

It's a zero-sum game if A + B = 0.

Example: Matching Pennies. Each player shows either H or T. Player 1 wins $1 and Player 2 loses $1 when the pennies match; Player 1 loses $1 and Player 2 wins $1 when the pennies don't match.
Matching Pennies loss matrices:

    A = [ −1   1 ],   B = [  1  −1 ].
        [  1  −1 ]        [ −1   1 ]

Assume that the bimatrix game is played repeatedly. Usually both players play a mixed strategy, in which they choose each move randomly with a certain probability, and independently of moves before and after:
- Player 1 plays move i with probability x_i;
- Player 2 plays move j with probability y_j.

Since x and y denote vectors of probabilities, we have

    x ≥ 0,  y ≥ 0,  e^T x = 1,  e^T y = 1.
Nash Equilibrium

A Nash equilibrium is a pair of mixed strategies x̄ and ȳ such that neither player can gain an advantage by changing to a different strategy, provided that the opponent also does not change. Formally:

    (x − x̄)^T A ȳ ≥ 0, for all x with x ≥ 0 and e^T x = 1;
    x̄^T B (y − ȳ) ≥ 0, for all y with y ≥ 0 and e^T y = 1.

Note that the definition is not changed if we add a constant to all elements of A and B. Thus we can assume that A and B have all positive elements. (Useful for computation.)

We can find Nash equilibria by solving an LCP with

    M = [ 0    A ],   q = −e,
        [ B^T  0 ]

where e = (1, 1, ..., 1)^T is the vector of ones.
Lemma. Let A and B be positive loss matrices of dimension m × n, and suppose that (s, t) ∈ R^{m+n} solves the LCP above. Then the point (x̄, ȳ) = (s/(e^T s), t/(e^T t)) is a Nash equilibrium.

Proof. By complementarity, we have

    x̄^T (At − e) = (1/(e^T s)) s^T (At − e) = 0,

so that x̄^T At = x̄^T e = 1. We thus have

    A ȳ − (x̄^T A ȳ) e = (1/(e^T t)) (At − (x̄^T At) e) = (1/(e^T t)) (At − e) ≥ 0.

Thus for any x with e^T x = 1 and x ≥ 0, we have

    0 ≤ x^T (A ȳ − e (x̄^T A ȳ)) = (x − x̄)^T A ȳ,

which is the first Nash inequality. (The argument for the second is similar.)
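For Matching Pennies the equilibrium is uniform play by both players, which can be checked directly against the Nash inequalities. By linearity in x and y, it suffices to compare against pure strategies:

```python
import numpy as np

# Verify that uniform mixed strategies form a Nash equilibrium of
# Matching Pennies.
A = np.array([[-1.0, 1.0], [1.0, -1.0]])   # Player 1's losses
B = -A                                      # zero-sum game: B = -A

xbar = np.array([0.5, 0.5])
ybar = np.array([0.5, 0.5])

# (x - xbar)^T A ybar >= 0 for all x in the simplex
# iff the best pure row does no better than xbar:
worst1 = (A @ ybar).min() - xbar @ A @ ybar
# xbar^T B (y - ybar) >= 0 for all y in the simplex, similarly:
worst2 = (xbar @ B).min() - xbar @ B @ ybar
print(worst1, worst2)   # both 0: no profitable deviation for either player
```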
KKT Conditions for Nonlinear Problems

Consider now the more general problem

    min f(x)  subject to  x ∈ Ω,

where f is smooth and Ω is a polyhedral set, defined by linear inequalities:

    Ω := {x | a_i^T x ≥ b_i, i = 1, 2, ..., m}.

Special cases:
- Positive orthant: Ω = R^n_+ = {x | x ≥ 0}.
- Bound constraints: Ω = {x | l ≤ x ≤ u}.

We can define optimality conditions for this problem using local linear approximations of f. (The constraint set is already defined by linear functions, so there is no need to approximate it.)
Taylor's Theorem (version 1)

Theorem. If f: R^n → R is a continuously differentiable function, we have for x, p ∈ R^n that there is s ∈ (0, 1) such that

    f(x + p) = f(x) + ∇f(x + sp)^T p.

A consequence is that if d is a direction with ∇f(x)^T d < 0, then for all ε > 0 sufficiently small, we have
- ∇f(x + sεd)^T d < 0, for all s ∈ [0, 1] (by continuity of ∇f);
- f(x + εd) < f(x) (by applying Taylor's theorem with p = εd).
Active Sets and Feasible Directions

Given a feasible point x ∈ Ω, we can define the
- Active set A(x) := {i = 1, 2, ..., m | a_i^T x = b_i};
- Inactive set I(x) := {1, 2, ..., m} \ A(x) = {i = 1, 2, ..., m | a_i^T x > b_i}.

The feasible directions F(x) are the directions that move into Ω from x:

    F(x) := {d | a_i^T d ≥ 0, i ∈ A(x)}.

We have that d ∈ F(x) ⇒ x + εd ∈ Ω for all ε ≥ 0 sufficiently small, because
- i ∈ A(x) implies that a_i^T (x + εd) = b_i + ε a_i^T d ≥ b_i;
- i ∈ I(x) implies that a_i^T (x + εd) = a_i^T x + ε a_i^T d > b_i for ε small enough.
[Figure: a feasible point x in Ω and its cone of feasible directions.]
Optimality: A Necessary Condition

Lemma. If x* is a local solution of min_{x∈Ω} f(x) (that is, there are no other points close to x* that are feasible and have a lower function value), then there can be no feasible direction d with ∇f(x*)^T d < 0.

Proof. Suppose that in fact we have d ∈ F(x*) with ∇f(x*)^T d < 0. By definition of F(x*), we have x* + εd ∈ Ω for all ε ≥ 0 sufficiently small. Moreover, from our consequence of Taylor's theorem, we also have f(x* + εd) < f(x*) for all ε > 0 sufficiently small. Hence there are feasible points with lower values of f arbitrarily close to x*, so x* is not a local minimum.

This is neat, but it's not a checkable, practical condition, because F(x*) contains infinitely many directions in general. However, we can use Farkas's Lemma to turn it into KKT conditions.
KKT Conditions for Linearly Constrained Optimization

Theorem. If x* is a local solution of min_{x∈Ω} f(x), then there exist Lagrange multipliers λ_i ≥ 0, i ∈ A(x*), such that

    ∇f(x*) = Σ_{i∈A(x*)} a_i λ_i.

Proof. By the lemma above, there can be no direction d such that ∇f(x*)^T d < 0 and a_i^T d ≥ 0 for all i ∈ A(x*). Thus Farkas's Lemma tells us that the alternative statement must be true, which is exactly the expression above.

Full KKT conditions are obtained by adding feasibility for x*: a_i^T x* ≥ b_i.
KKT Conditions for Linearly Constrained Optimization

We can restate the KKT conditions by using complementarity conditions to absorb the definition of A(x*):

    0 ≤ a_i^T x* − b_i ⊥ λ_i* ≥ 0,  i = 1, 2, ..., m,
    ∇f(x*) − Σ_{i=1}^m a_i λ_i* = 0.

The complementarity condition implies that λ_i* = 0 for i ∈ I(x*), so the inactive constraints do not contribute to the sum in the second condition.
KKT conditions using Lagrangian

We can define the Lagrangian for this case too:

    L(x, λ) := f(x) − Σ_{i=1}^m λ_i (a_i^T x − b_i) = f(x) − λ^T (Ax − b),

where A is the matrix with rows a_i^T and b = (b_1, b_2, ..., b_m)^T. Restating the KKT conditions in this notation, we have:

    0 ≤ Ax* − b ⊥ λ* ≥ 0,
    ∇_x L(x*, λ*) = 0.

Example: When Ω = R^n_+, we have A(x*) = {i | x_i* = 0}, a_i = e_i, b_i = 0. The KKT conditions reduce to
- [∇f(x*)]_i ≥ 0 for i ∈ A(x*);
- [∇f(x*)]_i = 0 for i ∈ I(x*).
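The positive-orthant case can be checked on a concrete problem. A sketch (the problem below, projecting a point a onto R^n_+ by minimizing f(x) = (1/2)||x − a||^2 over x ≥ 0, is an illustrative example, not from the slides):

```python
import numpy as np

# min (1/2)||x - a||^2 over x >= 0 has solution x* = max(a, 0).
a = np.array([1.5, -2.0, 0.7, -0.1])
xstar = np.maximum(a, 0.0)
grad = xstar - a                     # gradient of f at x*

active = xstar == 0.0                # active set: components at the bound
print(np.all(grad[active] >= 0))     # [grad f]_i >= 0 on the active set
print(np.all(grad[~active] == 0))    # [grad f]_i =  0 on the inactive set
```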
Nonlinear Constraints: What Could Possibly Go Wrong?

Suppose that Ω is defined by nonlinear algebraic inequalities:

    Ω := {x | c_i(x) ≥ 0, i = 1, 2, ..., m}.

A natural extension of the KKT conditions above is obtained by linearizing each c_i around the point x*, just as we did with f. Define the Lagrangian

    L(x, λ) = f(x) − Σ_{i=1}^m λ_i c_i(x),

and the KKT conditions:

    ∇_x L(x*, λ*) = 0,
    0 ≤ c_i(x*) ⊥ λ_i* ≥ 0,  i = 1, 2, ..., m.

Can we say that when x* is a local minimizer, there must be λ* such that these KKT conditions hold? NOT QUITE! We need constraint qualifications to make sure that nothing pathological is happening with the constraints at x*.
Constraint Qualifications

Constraint qualifications (CQs) ensure that the linearized approximation to the feasible set Ω, evaluated at x*, has a similar geometry to Ω itself. The linearization comes from a first-order Taylor expansion of the active constraints around x*:

    {x | ∇c_i(x*)^T (x − x*) ≥ 0, i ∈ A(x*)}.

[Figure: Ω and its linearization at x*.] Here the linearization captures the geometry of Ω well, so a CQ would be satisfied.
[Figure: Ω is a single point; its "linearized" Ω is an entire line.]

The true Ω is the single point x*, whereas the linearization is the entire line: very different geometry. CQs would not be satisfied here, and the KKT conditions would not be satisfied in general at x*, even though it must be a local solution, regardless of f.