CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS


CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS

A Dissertation Submitted for the Award of the Degree of Master of Philosophy in Mathematics

Neelam Patel
School of Mathematics
Devi Ahilya Vishwavidyalaya (NAAC Accredited Grade A)
Indore (M.P.)

Contents

Introduction
Chapter 1  Preliminaries
Chapter 2  Constraint Qualifications
Chapter 3  Lagrangian Duality & Saddle Point Optimality Conditions
References

Introduction

This dissertation is a study of constraint qualifications, Lagrangian duality and saddle point optimality conditions. In fact, it is a reading of Chapters 5 and 6 of [1]. The first chapter collects preliminaries: results which are useful in subsequent chapters, such as the Fritz John necessary and sufficient conditions for optimality and the Karush-Kuhn-Tucker (KKT) necessary and sufficient conditions for optimality. In the second chapter we define the cone of tangents T and show that F0 ∩ T = ∅ is a necessary condition for local optimality. The constraint qualifications defined are Abadie's, Slater's, Cottle's, Zangwill's, Kuhn-Tucker's and the linear independence constraint qualification, and we prove the chain of implications

    SCQ / LICQ ⟹ CCQ ⟹ ZCQ ⟹ KTCQ ⟹ ACQ.

We derive the KKT conditions under these various constraint qualifications and study their interrelationships. In the third chapter we define the Lagrangian dual problem and give its geometric interpretation. We prove the weak and strong duality theorems. We also develop the saddle point optimality conditions and their relationship with the KKT conditions. Further, some important properties of the dual function, such as concavity, differentiability and subdifferentiability, are discussed. Special cases of Lagrangian duality for linear and quadratic programs are also discussed.

Chapter 1
Preliminaries

We collect definitions and results which will be useful in the sequel.

Definition 1.1 (Convex function): Let f : S → R, where S is a nonempty convex set in Rⁿ. The function f is said to be convex on S if
    f(λx1 + (1-λ)x2) ≤ λf(x1) + (1-λ)f(x2)
for each x1, x2 ∈ S and for each λ ∈ (0, 1).

Definition 1.2 (Pseudoconvex function): Let S be a nonempty open set in Rⁿ, and let f : S → R be differentiable on S. The function f is said to be pseudoconvex if for each x1, x2 ∈ S with ∇f(x1)ᵗ(x2 - x1) ≥ 0 we have f(x2) ≥ f(x1).

Definition 1.3 (Strictly pseudoconvex function): Let S be a nonempty open set in Rⁿ, and let f : S → R be differentiable on S. The function f is said to be strictly pseudoconvex if for each x1 ≠ x2 with ∇f(x1)ᵗ(x2 - x1) ≥ 0 we have f(x2) > f(x1).

Definition 1.4 (Quasiconvex function): Let f : S → R, where S is a nonempty convex set in Rⁿ. The function f is said to be quasiconvex if, for each x1 and x2 ∈ S,
    f(λx1 + (1-λ)x2) ≤ max {f(x1), f(x2)}
for each λ ∈ (0, 1).

Notation 1.5: F0 = {d : ∇f(x0)ᵗd < 0}.
The cone of feasible directions of S at x: D = {d : d ≠ 0, x + λd ∈ S for all λ ∈ (0, δ) for some δ > 0}.

Theorem 1.6: Consider the problem to minimize f(x) subject to x ∈ S, where f : Rⁿ → R and S is a nonempty set in Rⁿ. Suppose f is differentiable at x0 ∈ S. If x0 is a local minimum, then F0 ∩ D = ∅. Conversely, suppose F0 ∩ D = ∅, f is pseudoconvex at x0, and there exists an ε-neighborhood Nε(x0), ε > 0, such that d = (x - x0) ∈ D for any x ∈ S ∩ Nε(x0). Then x0 is a local minimum of f.

Lemma 1.7: Consider the feasible region S = {x ∈ X : gi(x) ≤ 0 for i = 1,…,m}, where X is a nonempty open set in Rⁿ, and where gi : Rⁿ → R for i = 1,…,m. Given a feasible point x0 ∈ S, let I = {i : gi(x0) = 0} be the index set for the binding or active

constraints, and assume that gi for i ∈ I are differentiable at x0 and that the gi's for i ∉ I are continuous at x0. Define the sets
    G0 = {d : ∇gi(x0)ᵗd < 0 for each i ∈ I}   [cone of interior directions at x0]
    G0′ = {d ≠ 0 : ∇gi(x0)ᵗd ≤ 0 for each i ∈ I}.
Then we have G0 ⊆ D ⊆ G0′.

Theorem 1.8: Consider Problem P to minimize f(x) subject to x ∈ X and gi(x) ≤ 0 for i = 1,…,m, where X is a nonempty open set in Rⁿ, f : Rⁿ → R, and gi : Rⁿ → R for i = 1,…,m. Let x0 be a feasible point, and denote I = {i : gi(x0) = 0}. Furthermore, suppose f and gi for i ∈ I are differentiable at x0 and gi for i ∉ I are continuous at x0. If x0 is a local optimal solution, then F0 ∩ G0 = ∅. Conversely, if F0 ∩ G0 = ∅, and if f is pseudoconvex at x0 and gi for i ∈ I are strictly pseudoconvex over some ε-neighborhood of x0, then x0 is a local minimum.

Theorem 1.9 (The Fritz John Necessary Conditions): Let X be a nonempty open set in Rⁿ and let f : Rⁿ → R and gi : Rⁿ → R for i = 1,…,m. Consider Problem P to minimize f(x) subject to x ∈ X and gi(x) ≤ 0 for i = 1,…,m. Let x0 be a feasible solution, and denote I = {i : gi(x0) = 0}. Furthermore, suppose f and gi for i ∈ I are differentiable at x0 and gi for i ∉ I are continuous at x0. If x0 locally solves Problem P, then there exist scalars u0 and ui for i ∈ I such that
    u0∇f(x0) + Σi∈I ui∇gi(x0) = 0
    u0 ≥ 0, ui ≥ 0 for i ∈ I
    (u0, uI) ≠ (0, 0),
where uI is the vector whose components are ui for i ∈ I. Furthermore, if gi for i ∉ I are also differentiable at x0, then the foregoing conditions can be written in the following equivalent form:
    u0∇f(x0) + Σi=1,…,m ui∇gi(x0) = 0
    ui gi(x0) = 0 for i = 1,…,m
    u0 ≥ 0, ui ≥ 0 for i = 1,…,m
    (u0, u) ≠ (0, 0),
where u is the vector whose components are ui for i = 1,…,m.
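As a concrete illustration of Theorem 1.9 (and of the KKT form in Theorem 1.11 below), the following sketch, which is not part of the dissertation, numerically recovers the multipliers for the toy problem of minimizing x1² + x2² subject to 1 - x1 - x2 ≤ 0; the problem, the point (1/2, 1/2) and the code are illustrative assumptions only.

```python
# Hedged sketch: recover Fritz John / KKT multipliers for
#   minimize x1^2 + x2^2  subject to  g(x) = 1 - x1 - x2 <= 0,
# whose minimum is x0 = (1/2, 1/2) with the constraint active.
import numpy as np

x0 = np.array([0.5, 0.5])
grad_f = 2 * x0                      # gradient of the objective at x0
grad_g = np.array([-1.0, -1.0])      # gradient of the active constraint at x0

# KKT form (u0 normalised to 1): solve grad_f + u1 * grad_g = 0 in the least-squares sense.
u1, residual, _, _ = np.linalg.lstsq(grad_g.reshape(-1, 1), -grad_f, rcond=None)
print("u1 =", u1[0])                                                       # expected 1.0
print("stationarity residual =", np.linalg.norm(grad_f + u1[0] * grad_g))  # ~0
print("dual feasibility u1 >= 0:", u1[0] >= 0)
# Since u0 = 1 > 0 works here, (u0, u1) = (1, 1) also satisfies the Fritz John conditions.
```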

Theorem 1.10 (Fritz John Sufficient Conditions): Let X be a nonempty open set in Rⁿ and let f : Rⁿ → R and gi : Rⁿ → R for i = 1,…,m. Consider Problem P to minimize f(x) subject to x ∈ X and gi(x) ≤ 0 for i = 1,…,m. Let x0 be an FJ solution and denote I = {i : gi(x0) = 0}. Define S as the relaxed feasible region for Problem P in which the nonbinding constraints are dropped.
a. If there exists an ε-neighborhood Nε(x0), ε > 0, such that f is pseudoconvex over Nε(x0) ∩ S and gi, i ∈ I, are strictly pseudoconvex over Nε(x0) ∩ S, then x0 is a local minimum for Problem P.
b. If f is pseudoconvex at x0 and if gi, i ∈ I, are both strictly pseudoconvex and quasiconvex at x0, then x0 is a global optimal solution for Problem P. In particular, if these generalized convexity assumptions hold true only by restricting the domain of f to Nε(x0) for some ε > 0, then x0 is a local minimum for Problem P.

Theorem 1.11 (Karush-Kuhn-Tucker Necessary Conditions): Let X be a nonempty open set in Rⁿ and let f : Rⁿ → R and gi : Rⁿ → R for i = 1,…,m. Consider Problem P to minimize f(x) subject to x ∈ X and gi(x) ≤ 0 for i = 1,…,m. Let x0 be a feasible solution, and denote I = {i : gi(x0) = 0}. Suppose f and gi for i ∈ I are differentiable at x0 and gi for i ∉ I are continuous at x0. Furthermore, suppose ∇gi(x0) for i ∈ I are linearly independent. If x0 locally solves Problem P, then there exist scalars ui for i ∈ I such that
    ∇f(x0) + Σi∈I ui∇gi(x0) = 0
    ui ≥ 0 for i ∈ I.
In addition to the above assumptions, if gi for each i ∉ I is also differentiable at x0, then the foregoing conditions can be written in the following equivalent form:
    ∇f(x0) + Σi=1,…,m ui∇gi(x0) = 0
    ui gi(x0) = 0 for i = 1,…,m
    ui ≥ 0 for i = 1,…,m.

Theorem 1.12 (Karush-Kuhn-Tucker Sufficient Conditions): Let X be a nonempty open set in Rⁿ and let f : Rⁿ → R and gi : Rⁿ → R for i = 1,…,m. Consider Problem P to minimize f(x) subject to x ∈ X and gi(x) ≤ 0 for i = 1,…,m. Let x0 be a

KKT solution, and denote I = {i : gi(x0) = 0}. Define S as the relaxed feasible region for Problem P in which the constraints that are not binding at x0 are dropped. Then:
a. If there exists an ε-neighborhood Nε(x0), ε > 0, such that f is pseudoconvex over Nε(x0) ∩ S and gi, i ∈ I, are differentiable at x0 and quasiconvex over Nε(x0) ∩ S, then x0 is a local minimum for Problem P.
b. If f is pseudoconvex at x0, and if gi, i ∈ I, are differentiable and quasiconvex at x0, then x0 is a global optimal solution to Problem P. In particular, if this assumption holds true with the domain of f restricted to Nε(x0) for some ε > 0, then x0 is a local minimum for P.

Theorem 1.13 (Farkas' Lemma): Let A be an m × n matrix and c an n-vector. Then exactly one of the following two systems has a solution:
    System 1: Ax ≤ 0 and cᵗx > 0 for some x ∈ Rⁿ;
    System 2: Aᵗy = c and y ≥ 0 for some y ∈ Rᵐ.

Theorem 1.14 (Gordan's Theorem): Let A be an m × n matrix. Then exactly one of the following systems has a solution:
    System 1: Ax < 0 for some x ∈ Rⁿ;
    System 2: Aᵗp = 0 and p ≥ 0 for some nonzero p ∈ Rᵐ.

Theorem 1.15 (Closest point theorem): Let S be a nonempty closed convex set in Rⁿ and y ∉ S. Then there exist a nonzero vector p and a scalar α such that pᵗy > α and pᵗx ≤ α for each x ∈ S.

Corollary 1.16 (Existence of a supporting hyperplane): Let S be a nonempty convex set in Rⁿ and x0 ∉ int S. Then there is a nonzero vector p such that pᵗ(x - x0) ≤ 0 for each x ∈ cl S.

Lemma 1.17: Let f : Rⁿ → R be a convex function. Consider any point x0 ∈ Rⁿ and a nonzero direction d ∈ Rⁿ. Then the directional derivative f′(x0; d) of f at x0 in the direction d exists.

Theorem 1.18: Let S be a nonempty convex set in Rⁿ, and let f : S → R be convex. Then for x0 ∈ int S there exists a vector ξ such that the hyperplane
    H = {(x, y) : y = f(x0) + ξᵗ(x - x0)}
supports epi f at (x0, f(x0)). In particular,
    f(x) ≥ f(x0) + ξᵗ(x - x0) for each x ∈ S,

i.e., ξ is a subgradient of f at x0.

Theorem 1.19: Let S be a nonempty convex set in Rⁿ, and let f : S → R be convex on S. Consider the problem to minimize f(x) subject to x ∈ S. Suppose that x0 ∈ S is a local optimal solution to the problem.
1. Then x0 is a global optimal solution.
2. If either x0 is a strict local minimum or f is strictly convex, then x0 is the unique global optimal solution, and it is also a strong local minimum.

Theorem 1.20: Let f : Rⁿ → R be a convex function, and let S be a nonempty compact polyhedral set in Rⁿ. Consider the problem to maximize f(x) subject to x ∈ S. An optimal solution x0 to the problem then exists, where x0 is an extreme point of S.
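Farkas' Lemma (Theorem 1.13) is the engine behind the KKT derivations of Chapter 2, so a small numerical illustration may help fix ideas. The sketch below is not from the dissertation; the matrix A, the vector c and the use of scipy.optimize.linprog are illustrative assumptions.

```python
# Hedged sketch illustrating Farkas' Lemma (Theorem 1.13): for a given A and c,
# exactly one of
#   System 1:  A x <= 0,  c^t x > 0
#   System 2:  A^t y = c,  y >= 0
# has a solution.  A and c below are arbitrary sample data.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 2.0],
              [-1.0, 1.0]])
c = np.array([3.0, 1.0])

# System 2: feasibility of A^t y = c, y >= 0 (zero objective).
res2 = linprog(c=np.zeros(A.shape[0]), A_eq=A.T, b_eq=c,
               bounds=[(0, None)] * A.shape[0], method="highs")

# System 1: maximize c^t x subject to A x <= 0; box bounds keep the LP bounded,
# and System 1 is solvable exactly when the optimal value over the box is positive.
res1 = linprog(c=-c, A_ub=A, b_ub=np.zeros(A.shape[0]),
               bounds=[(-1, 1)] * A.shape[1], method="highs")

sys2_solvable = res2.status == 0
sys1_solvable = res1.status == 0 and -res1.fun > 1e-9
print("System 2 solvable:", sys2_solvable)   # for this data: False
print("System 1 solvable:", sys1_solvable)   # for this data: True (exactly one holds)
```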

Chapter 2
Constraint Qualifications

Consider a problem P:
    Minimize f(x)
    subject to gi(x) ≤ 0, i = 1, 2,…,m
               x ∈ X.
Usually the Fritz John necessary conditions (at local optimality) are derived first. Then, under certain constraint qualifications, it is asserted that the multiplier associated with the objective function is positive at a local minimum. These are called the Karush-Kuhn-Tucker (KKT) necessary conditions. In view of Theorem 1.8, local optimality implies that F0 ∩ G0 = ∅, which implies the Fritz John conditions. Under the linear independence constraint qualification or, more generally, G0 ≠ ∅, we deduce that the Fritz John conditions can only be satisfied if the Lagrangian multiplier associated with the objective function is positive. This leads to the KKT conditions.

    Local optimality ⟹ F0 ∩ D = ∅ ⟹ FJ conditions ⟹ KKT conditions
        (Theorem 1.6)      (Theorem 1.8)    (constraint qualification)

Below we first show that a necessary condition for local optimality is that F0 ∩ T = ∅, where T is the cone of tangents. Using the constraint qualification T = G, we then get F0 ∩ G = ∅. Further, using Farkas' Lemma (1.13), we obtain the KKT conditions.

    Local optimality ⟹ F0 ∩ T = ∅ ⟹ F0 ∩ G = ∅ ⟹ KKT conditions
        (Theorem 2.5)      (Theorem 2.7)    (Farkas' Lemma)

Definition 2.1 (The cone of tangents of S at x0): Let S be a nonempty set in Rⁿ, and let x0 ∈ cl S. The cone of tangents of S at x0, denoted by T, is the set of all directions d such that
    d = lim_{k→∞} λk(xk - x0),
where λk > 0, xk ∈ S for each k, and xk → x0.

Note 2.2: It is clear that d belongs to the cone of tangents if there is a feasible sequence {xk} converging to x0 such that the directions of the chords xk - x0 converge to d.

Remark 2.3 (Alternative equivalent descriptions): The cone of tangents T can be equivalently characterized in either of the following ways:
a) T = {d : there exist a sequence {λk} → 0⁺ and a function α : R → Rⁿ with α(λ) → 0 as λ → 0⁺, such that xk = x0 + λk d + λk α(λk) ∈ S for each k};
b) T = {d : d = lim_{k→∞} (xk - x0)/λk, where λk > 0, λk → 0⁺, xk ∈ S and xk → x0}.
Proof: Let d ∈ T as in Definition 2.1, so that d = lim_{k→∞} λk(xk - x0) with λk > 0, xk ∈ S and xk → x0. Writing μk = 1/λk and α(μk) = (xk - x0)/μk - d, we have α(μk) → 0 as k → ∞ and xk = x0 + μk d + μk α(μk) ∈ S for each k, which is description (a). Next, writing λk = 1/μk in Definition 2.1 gives description (b).

Proposition 2.4: T is closed.
Proof: T is a collection of limits (of directions of feasible chords), and is therefore closed.

Here we develop the KKT conditions directly, without first deriving the Fritz John conditions. This is done under various constraint qualifications.

Theorem 2.5: Let S be a nonempty set in Rⁿ and let x0 ∈ S. Furthermore, suppose that f : Rⁿ → R is differentiable at x0. If x0 locally solves the problem to minimize f(x) subject to x ∈ S, then F0 ∩ T = ∅, where F0 = {d : ∇f(x0)ᵗd < 0} and T is the cone of tangents of S at x0.
Proof: Let d ∈ T, i.e., d = lim_{k→∞} λk(xk - x0), where λk > 0, xk ∈ S for each k and xk → x0. By the differentiability of f at x0, we get
    f(xk) - f(x0) = ∇f(x0)ᵗ(xk - x0) + ‖xk - x0‖ α(x0; xk - x0),    (2.1)
where α(x0; xk - x0) → 0 as xk → x0. By the local optimality of x0, we have f(xk) ≥ f(x0) for k large. By (2.1), we get
    ∇f(x0)ᵗ(xk - x0) + ‖xk - x0‖ α(x0; xk - x0) ≥ 0.
Multiplying by λk > 0 and taking the limit as k → ∞, the above inequality implies that

    ∇f(x0)ᵗd ≥ 0.
Hence d ∉ F0, and so F0 ∩ T = ∅.

Note 2.6: The condition F0 ∩ T = ∅ is not sufficient for local optimality of x0. This condition holds true whenever F0 = ∅, which is not sufficient for local optimality. However, if there exists an ε-neighborhood Nε(x0) about x0 such that Nε(x0) ∩ S is convex and f is pseudoconvex over Nε(x0) ∩ S, then F0 ∩ T = ∅ is sufficient to claim that x0 is a local minimum.

Abadie Constraint Qualification: T = G.

Theorem 2.7 (KKT Necessary Conditions): Let X be a nonempty set in Rⁿ and let f : Rⁿ → R and gi : Rⁿ → R, i = 1,…,m. Consider the problem
    Minimize f(x)
    subject to gi(x) ≤ 0, i = 1,…,m
               x ∈ X.
Let x0 be a feasible solution and let I = {i : gi(x0) = 0}. Suppose f and gi, i ∈ I, are differentiable at x0. Furthermore, suppose the constraint qualification T = G holds, where G = {d : ∇gi(x0)ᵗd ≤ 0, i ∈ I}. If x0 is a local optimal solution, then there exist nonnegative scalars ui for i ∈ I such that
    ∇f(x0) + Σi∈I ui∇gi(x0) = 0.
Proof: By Theorem 2.5 above, we have F0 ∩ T = ∅. By assumption T = G, so that F0 ∩ G = ∅. This means the following system has no solution:
    ∇f(x0)ᵗd < 0, ∇gi(x0)ᵗd ≤ 0 for i ∈ I.
By Farkas' Lemma, the following system has a nonnegative solution:
    ∇f(x0) + Σi∈I ui∇gi(x0) = 0.

Example 2.8:
    Minimize -x1
    subject to x2 - (1 - x1)³ ≤ 0
               -x2 ≤ 0.
The optimal point is x0 = (1, 0), with I = {1, 2} and
    ∇f(x0) = (-1, 0)ᵗ, ∇g1(x0) = (0, 1)ᵗ, ∇g2(x0) = (0, -1)ᵗ.

[Figure 2.1]
Now we have
    G = {d : ∇gi(x0)ᵗd ≤ 0, i ∈ I}.
From ∇g1(x0)ᵗd ≤ 0 we get (0, 1)(d1, d2)ᵗ ≤ 0, i.e., d2 ≤ 0, and from ∇g2(x0)ᵗd ≤ 0 we get (0, -1)(d1, d2)ᵗ ≤ 0, i.e., d2 ≥ 0. Hence d2 = 0 and d1 ∈ R. Therefore
    G = {(d1, 0) : d1 ∈ R},
whereas T, the shaded region in Figure 2.1, consists only of the directions (d1, 0) with d1 ≤ 0. Thus Abadie's constraint qualification T = G is not satisfied.

Remark 2.9: The Abadie constraint qualification T = G can be equivalently stated as G ⊆ T, since T ⊆ G is always true. We have

13 S= {x X : g i (x) 0, i = 1,,m} Let d T. Then d = lim (x k x 0 ) Where k > 0, x k S for each k and x k x 0. Since x k S g i (x k ) 0 For i I, we write g i (x 0 + k (x k -x 0 )) = g i (x 0 ) + k g i (x 0 ) t (x k - x 0 ) + 2 k x k -x 0 2 (x 0, k ( x k -x 0 )) = k g i (x 0 ) t (x k -x 0 ) + 2 k x k -x 0 2 (x 0, k (x k -x 0 )) Choose k, such that x 0 + k (x k -x 0 ) S, so that g i (x k + k (x k -x 0 )) 0 and hence as k g i (x 0 ) t d 0 d G Linearly Constrained Problem: Below, we shall show that if the constraints are linear, then Abadie constraint is automatically true. Further, it implies that KKT conditions are always necessary for problem with linear constraints whether the objective function is linear or nonlinear. Lemma 2.10: Let A be an m n matrix. Let b be an m vector, and let S = {x : Ax b}. Suppose x 0 S is such that A 1 x = b 1 and A 2 x < b 2, where A t = (A t 1, A t 2 ) and b t = (b t 1, b t 2 ). Then T = G. Proof: If A 1 = Then by definition of G, G = R n. Furthermore, as x 0 int S, T = R n. Thus G = T. Now, suppose that A 1. Let d T i.e, d = lim (x k x 0 ), where x k S, k > 0 for each k Then, A 1 (x k -x 0 ) b 1 - b 1 = 0 (2.2) Multiplying (2.2) by k > 0 and taking the limit as k A 1 d 0 d G T G

14 Now, let d G i.e, A 1 d 0. To show that d T. Since A 2 x 0 < b 2, there is a > 0 such that A 2 (x 0 + k d) < b 2 for all (0,) Further, since A 1 x 0 = b 1 and A 1 d 0 then A 1 (x 0 +d) b 1 for all > 0 x 0 +d S for each (0,) d T [ Use definition of T ] G T Therefore, T = G. Other Constraint Qualifications The KKT conditions can be developed under various constraint qualifications. In this section, we present some important constraint qualifications. We know that local optimality implies that F 0 T = and that the KKT conditions follow under the constraint qualification T = G. If we define a cone C T, then F 0 T = also implies that F 0 C =. Therefore, any constraint of the form C = G will lead to the KKT conditions. Since C T G, the constraint qualification C = G implies T = G. Hence, the constraint qualification C = G is more restrictive than Abadie s qualification. Theorem 2.5 if C T constraint qualification C = G local optimality F 0 T = F 0 C = F 0 G = Farkas theorem KKT Conditions We consider below several such cones whose closures are contained in T. Note that S = {x X : g i (x) 0, i=1,...,m}, x 0 is a feasible point, I = {i : g i (x 0 ) = 0} The cone of feasible Directions of S at x 0 : D = {d : d 0, x + d S, for all (0, ) for some > 0} The Cone of Attainable Directions of S at x 0 :

15 It is a cone of directions d such that there exist a > 0 and an : R R n such that () S for (0, ), (0) = x 0 and (( lim = d It is denoted by A. In other words, d belongs to the cone of attainable direction if there is a feasible arc starting from x 0 that is tangential to d. The Cone of Interior Directions of S at x 0 : G 0 = {d : g i (x 0 ) t d < 0, i I} Note 2.11: If X is open and each g i, for i I is continuous at x 0 then d G 0 implies that x 0 +d X 0 for > 0 and sufficiently small. Below, we show that all above cones and their closures are contained in T. Lemma 2.12: Let X be a non-empty set in R n, and let f : R n R and g i : R n R for i = 1,,m. Consider the problem to minimize f(x) subject to g i (x) 0, i=1,,m and x X. Let x 0 be a feasible point and let I = {i : g i (x 0 ) = 0}. Suppose that each g i for i I is differentiable at x 0 and let G = {d : g i (x 0 ) t d 0, i I}. Then cld cla T G where D, A and T are, respectively, the cone of feasible directions, the cone of attainable directions, and the cone of tangents of feasible region at x 0. Furthermore if X is open and each g i for i I is continuous at x 0 then G 0 D, so that clg 0 cld cla T G where G 0 is the cone of interior directions of the feasible region at x 0. Proof: If we consider (x) = x, an identity mapping, D A. Next, if we consider, since ( X, any sequence x k = ( k ) which converges to x 0 then d = lim ( ( = lim ( = Therefore A T. By remark 2.9. T G. Hence D A T G As T and G are closed cld cla T G Now note that, by lemma 1.7, G 0 D. Hence, clg 0 cld cla T G
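To make the cones in Lemma 2.12 concrete, the following sketch (not part of the dissertation) numerically probes membership in G0, D and G for Example 2.8 at x0 = (1, 0); the sampled directions, step sizes and feasibility tolerance are illustrative assumptions.

```python
# Hedged sketch: probe the cones G0, D and G of Example 2.8 at x0 = (1, 0),
# where g1(x) = x2 - (1 - x1)^3 <= 0 and g2(x) = -x2 <= 0 (both active at x0).
import numpy as np

x0 = np.array([1.0, 0.0])
grad_g = np.array([[0.0, 1.0],    # gradient of g1 at x0
                   [0.0, -1.0]])  # gradient of g2 at x0

def g(x):
    return np.array([x[1] - (1 - x[0]) ** 3, -x[1]])

def in_G0(d):
    return np.all(grad_g @ d < 0)

def in_G(d):
    return np.all(grad_g @ d <= 0)

def in_D(d, lambdas=np.linspace(1e-6, 1e-2, 50)):
    # feasible-direction test: x0 + lam*d must stay feasible for all small lam > 0
    return all(np.all(g(x0 + lam * d) <= 1e-12) for lam in lambdas)

for d in [np.array([-1.0, 0.0]), np.array([1.0, 0.0]), np.array([-1.0, 0.5])]:
    print(d, "G0:", in_G0(d), "D:", in_D(d), "G:", in_G(d))
# Expected: (-1,0) is in D and G but not G0; (1,0) is in G only; (-1,0.5) is in none,
# consistent with G0 being empty and D being a proper subset of G here.
```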

16 Remark 2.13: To see how each of five containments can be strict we consider examples. Example 2.14: Consider figure 2.2. Figure 2.2 As there is no interior in the immediate vicinity of, G 0 = = clg 0 ; whereas clearly D = cld = G ; a feasible direction is along the edge incident at x 0. Note that any d G 0 is a direction leading to interior feasible solutions, it is not true that any feasible direction that leads to interior points belongs to G 0. Example 2.15: [Figure 2.3] Minimize -x 1 Subject to x 2 (1- x 1 ) 3 0 -x 2 (1- x 1 ) 3 0.

17 Figure 2.3 Obviously G 0 = at x 0 = (1, 0) t where as d = (-1, 0) t gives interior feasible solutions. Following example shows cl D cl (A). Example 2.16: Consider the region defined by 2 x 1 x x 1 + x i.e., x 1 = x 2 The set of feasible points lies on the parabola x 1 = x 2 2. At x 0 = (0, 0) t, D = = cl D while A = {d / d = (0, 1) t or d = (0, -1) t, 0} = G Now, we give an example to show that cl A T. Example 2.17: Suppose that the feasible region S =, 0 / = 1,2 Thus A = = cl A, since there are no feasible arcs. By definition, T = {d / d = (1,0), 0} and T = G. Next, to see that T G.

Example 2.18 [Figure 2.1]: Consider again
    Minimize -x1
    subject to x2 - (1 - x1)³ ≤ 0
               -x2 ≤ 0.
Here, at x0 = (1, 0),
    T = {d : d = λ(-1, 0)ᵗ, λ ≥ 0},
while
    G = {d : d = λ(-1, 0)ᵗ or d = λ(1, 0)ᵗ, λ ≥ 0},
so T is a proper subset of G.

Below we state some constraint qualifications that validate the KKT conditions and discuss their interrelationships.

Slater's Constraint Qualification (SCQ): The set X is open, each gi, i ∈ I, is pseudoconvex at x0, each gi, i ∉ I, is continuous at x0, and there is an x ∈ X such that gi(x) < 0 for all i ∈ I.

Linear Independence Constraint Qualification (LICQ): The set X is open, each gi, i ∉ I, is continuous at x0, and ∇gi(x0), i ∈ I, are linearly independent.

Cottle's Constraint Qualification (CCQ): The set X is open, each gi, i ∉ I, is continuous at x0, and cl G0 = G.

Zangwill's Constraint Qualification (ZCQ): cl D = G.

Kuhn-Tucker's Constraint Qualification (KTCQ): cl A = G.

Validity of the Constraint Qualifications and their Interrelationships

In Theorem 2.7, we showed that the KKT necessary optimality conditions hold under Abadie's constraint qualification T = G. We show below that all the constraint qualifications stated above imply Abadie's constraint qualification and hence each validates the KKT necessary conditions.

From Lemma 2.12 it is clear that Cottle's constraint qualification implies Zangwill's, which implies that of Kuhn and Tucker, which in turn implies Abadie's qualification:
    Cottle's CQ ⟹ Zangwill's CQ ⟹ KTCQ ⟹ Abadie's CQ.

To show that Slater's CQ ⟹ Cottle's CQ and LICQ ⟹ Cottle's CQ:

Suppose Slater's constraint qualification holds. Then there is an x ∈ X such that gi(x) < 0, i ∈ I. Since gi(x) < 0 and gi(x0) = 0, i.e., gi(x) < gi(x0) = 0, the pseudoconvexity of gi at x0 gives
    ∇gi(x0)ᵗ(x - x0) < 0,
so d = (x - x0) ∈ G0, i.e., G0 ≠ ∅. Therefore cl G0 = G, and hence Slater's CQ ⟹ Cottle's CQ.

Now suppose that the linear independence constraint qualification is satisfied, i.e., Σi∈I ui∇gi(x0) = 0 has no nonzero solution. By Theorem 1.14 (Gordan's theorem), it follows that there exists a vector d such that ∇gi(x0)ᵗd < 0, i ∈ I. Thus G0 ≠ ∅, and hence Cottle's qualification holds true. Therefore LICQ ⟹ Cottle's CQ.

Counterexamples show that all these implications are one-way.

Example 2.19 [Example 2.8, Figure 2.1]:
    Minimize -x1
    subject to x2 - (1 - x1)³ ≤ 0
               -x2 ≤ 0.
At x0 = (1, 0),
    ∇f(x0) = (-1, 0)ᵗ, ∇g1(x0) = (0, 1)ᵗ, ∇g2(x0) = (0, -1)ᵗ.
In Figure 2.1, observe that the FJ conditions hold only with u0 = 0 and u1 = u2 (e.g., u1 = u2 = 1). Hence x0 is not a KKT point. Thus no constraint qualification can possibly hold true at x0.
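The failure of the KKT conditions in Example 2.19 can also be checked numerically. The sketch below is illustrative only (not part of the dissertation): it shows that with u0 = 1 the stationarity system has no solution, while the Fritz John system is satisfied with u0 = 0.

```python
# Hedged sketch for Example 2.19 at x0 = (1, 0):
#   grad f = (-1, 0),  grad g1 = (0, 1),  grad g2 = (0, -1).
# KKT would require  grad_f + u1*grad_g1 + u2*grad_g2 = 0 with u1, u2 >= 0.
import numpy as np

grad_f = np.array([-1.0, 0.0])
G = np.array([[0.0, 1.0],
              [0.0, -1.0]]).T          # columns are grad g1, grad g2

# Best least-squares attempt at KKT stationarity (u0 normalised to 1):
u, *_ = np.linalg.lstsq(G, -grad_f, rcond=None)
residual = np.linalg.norm(grad_f + G @ u)
print("least-squares multipliers:", u, "residual:", residual)
# The residual stays at 1: the first component -1 can never be cancelled,
# so x0 is not a KKT point.

# Fritz John with u0 = 0: u1*grad_g1 + u2*grad_g2 = 0 holds for u1 = u2 = 1.
print("FJ check:", np.linalg.norm(1.0 * G[:, 0] + 1.0 * G[:, 1]))   # 0.0
```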

20 Note 2.20: Cottle s constraint qualification is equivalent to requiring that G 0. Let A be an m n matrix and consider the cones G 0 = {d / Ad < 0} G = {d / Ad 0} It is easy to see that G 0 is an open convex cone and G is a closed convex cone. Further G 0 = int G. Next, clg 0 = G G 0 Remark 2.21: We know Slater s CQ and LICQ both imply Cottle s CQ. Hence whenever these constraints qualifications hold at a local minimum x 0, then x 0 is a FJ point with the Lagrangian multiplier u 0 > 0. Other way we might have Zangwill s or KT s or Abadie s CQs holding at a local minimum x 0 with possibly be zero in some solution to the FJ conditions.
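Remark 2.21 highlights that Slater's CQ and the LICQ are the constraint qualifications easiest to verify in practice. The short sketch below (an illustration under assumed data, not part of the dissertation) checks both for a small convex program: Slater's condition by exhibiting a strictly feasible point, and the LICQ by computing the rank of the active constraint gradients.

```python
# Hedged sketch: check Slater's CQ and LICQ for
#   g1(x) = x1^2 + x2^2 - 5 <= 0,   g2(x) = -x1 <= 0        (illustrative data).
import numpy as np

def g(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 5.0, -x[0]])

def grad_g(x):
    return np.array([[2 * x[0], 2 * x[1]], [-1.0, 0.0]])

# Slater's CQ: a point with every constraint strictly negative.
x_slater = np.array([1.0, 1.0])
print("Slater point works:", np.all(g(x_slater) < 0))      # True: g = (-3, -1)

# LICQ at a boundary point where both constraints are active, x0 = (0, sqrt(5)):
x0 = np.array([0.0, np.sqrt(5.0)])
active = np.isclose(g(x0), 0.0)
A = grad_g(x0)[active]
print("active gradients:\n", A)
print("LICQ holds:", np.linalg.matrix_rank(A) == A.shape[0])   # rank 2 -> True
```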

Chapter 3
Lagrangian Duality & Saddle Point Optimality Conditions

Given a nonlinear programming problem, there is another nonlinear programming problem closely associated with it. The former is called the primal problem and the latter is called the Lagrangian dual problem. Under certain convexity assumptions and suitable constraint qualifications, the primal and dual problems have equal optimal objective values. Hence it is possible to solve the primal problem indirectly by solving the dual problem.

The Lagrangian Dual Problem

Consider the following nonlinear programming Problem P, called the primal problem.

Primal Problem P:
    Minimize f(x)
    subject to gi(x) ≤ 0, i = 1,…,m
               hi(x) = 0, i = 1,…,l
               x ∈ X.

Lagrangian Dual Problem D:
    Maximize θ(u, v)
    subject to u ≥ 0,
where θ(u, v) = inf {f(x) + Σi=1,…,m ui gi(x) + Σi=1,…,l vi hi(x) : x ∈ X}.

Remark 3.1: The Lagrangian dual function θ may assume the value -∞ for some vector (u, v). The optimization problem that evaluates θ(u, v) is called the Lagrangian dual subproblem. The multiplier ui is nonnegative whereas vi is

22 unrestricted. Since the dual problem consist of maximizing the infimum (glb of the function ( + ( + h ( it is sometimes referred to as max-min min dual problem. Note 3.2: The primal and Lagrangian dual problems can be written in the following form using vector notation, where f : R n R, g : R n R m is a vector function whose i th component is gi and h : R n R is a vector function whose i th component is h i. Primal Problem P: Minimize f(x Subject to g(x 0 h(x = 0 x X Lagrangian Dual Problem D : Maximize (u, v Subject to u 0 where (u, v = inf {f(x + u t g(x + v t h(x : x X} Remark 3.3: Given a non-linear programming problem, several Lagrangian dual problems can be devised, depending on which constraints are handled as g(x 0 and h(x = 0 and which constraint are treated by the set X. This, choice can affect both the optimal value of D (as in non-convex situations and the effort expended in evaluating and updating the dual function during the course of solving the dual problem. Hence, an appropriate selection of the set X must be made, depending on the structure of the problem and the purpose for solving D.
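Since the dual subproblem is just an infimum over X for fixed (u, v), it can be approximated numerically whenever X can be sampled. The sketch below is an illustrative assumption (not from the dissertation): it evaluates θ(u, v) for a toy problem by brute-force minimization over a finite grid standing in for X.

```python
# Hedged sketch: evaluate the Lagrangian dual function
#   theta(u, v) = inf { f(x) + u*g(x) + v*h(x) : x in X }
# by brute force over a grid approximating X.  All data here are illustrative.
import numpy as np

def f(x):                 # objective
    return x[0] ** 2 + x[1] ** 2

def g(x):                 # inequality constraint  g(x) <= 0
    return 4.0 - x[0] - x[1]

def h(x):                 # equality constraint    h(x) = 0
    return x[0] - x[1]

# Grid standing in for X = {x : 0 <= x1, x2 <= 6}.
grid = np.linspace(0.0, 6.0, 121)
X = [(a, b) for a in grid for b in grid]

def theta(u, v):
    return min(f(x) + u * g(x) + v * h(x) for x in X)

print("theta(4, 0) ~", theta(4.0, 0.0))   # ~ 8 for this data (dual value at u = 4)
```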

23 Geometric Interpretation of the Dual Problem For the simplicity, we shall consider only one inequality constraint and assume that no equality constraints exist. Then, the primal problem is Minimize f(x Subject to g(x 0 x X In the (y, z plane, the set G = {(y, z : y = g(x, z = f(x for some x X}is the image of X under the (g, f map. The primal problem is to find a point in G with y 0 that has minimum ordinate. Obviously, this points is (, in Figure 3.1. Figure Geometric interpretation of Lagrangian duality Now suppose that u 0 is given. To determine (u, we need to minimize f(x + ug(x over all x X. Put y = g(x and z = f(x, x X. This problem is equivalent to minimize z + uy over points in G. Note that z + uy = is an equation of a straight line with slop u and intercept on the z-axis. To minimize z + uy over G, we need to move the line z + uy = parallel to itself as far down (along its negative gradient as possible until it supports G from below, i.e, the set G is above the line and touches it. Then, the intercept on the z axis gives (u.

24 The dual problem is therefore equivalent to finding the slop of the supporting hyperplane such that its intercept on the z axis is maximal. In figure such a hyperplane has and supports the set G at the point (,. Thus, the optimal dual solution is, and the optimal dual objective value is. Furthermore, the optimal primal and dual objectives are equal in this case. There is another related interesting interpretation that provides an important conceptual tool. For the problem under consideration, define the function (y = min {f(x : g(x y, x X} The function is called a perturbation function since it is the optimal value function of a problem obtained from the original problem by perturbing the right-hand side of the inequality constraint g(x 0 to y from the value of zero. Note that (y is a non-increasing function of y since, as y increases, the feasible region of the perturbed problem enlarges (or stays the same. In figure 3.1, observe that corresponds here to the lower envelope of G between points A and B because this envelope is itself monotone-decreasing. Moreover, remains constant at the value at point B for values of y higher than that at B, and becomes for points to the left of A because of infeasibility. In particular, if is differentiable at the origin, we observe that (0 = -. Hence, the marginal rate of change in objective function value with an increase in the right-hand side of the constraint from its present value of zero is given by -, the negative of the Lagrangian multiplier value at optimality. If is convex but is not differentiable at the origin, then is evidently a subgradient of at y = 0. In either case, we know that (y (0 y for all y R We shall see later, can be non-differentiable and/or non-convex, but the condition (y (0 y holds true for all y R, if and only if is a KKT Lagrangian multiplier corresponding to an optimal solution x0 such that it solves the dual problem with equal primal and dual objective values. This happens to be the case in figure 3.1.

25 Example 3.4: Consider the following primal Problem Minimize x1 2 + x2 2 Subject to -x1 x x 1, x 2 0 Observe that the optimal solution occurs at the point (x 1, x 2 = (2, 2 with objective value 8. Let g(x = -x 1 - x 2 + 4, X = {(x 1, x 2 : x 1, x 2 0} The dual function is given by (u = inf {x x u(-x 1 - x : x 1, x 2 0} = inf {x ux 1 : x 1 0} + inf {x ux 2 : x 2 0} + 4u It is easy to see that the above infima are achieved at x 1 = x 2 = if u 0 and x1 = x 2 = 0 if u < 0. Hence, (u = < 0 Figure 3.2 Graph of primal and dual function Note that is a concave function, and its maximum over u 0 occurs at = 4. Note also that the optimal primal and dual objective values are both equal to 8. Now let us consider the problem in the (y, z plane, where y = g(x and z = f(x. To find G, the image of X = {(x 1, x 2 : x 1 0, x 2 0}, under the (g, f map. This is done by finding explicit expressions for the lower and upper envelops of G, denoted respectively and.

26 Given y, note that ( and ( are the optimal objective values of following problems P1 and P2 respectively Problem P1 Problem P2 Minimize x x 2 2 Maximize x x 2 2 Subject to -x 1 x = 0 Subject to -x 1 x = 0 x1, x2 0 x1, x2 0 For P1: P It is easy to see that optimal (minimum value exists at, with objective function value ( = ( for y 4 For P 2: Optimal points (0, 4, (4, 0 with ( = (4 for y 4. Note that x X implies that x 1, x 2 0 so that -x 1 - x Thus, every point x X corresponds y 4. Note that the optimal dual solution is = 4, which is the negative of the slop of the supporting hyperplane shown in figure 3.2. The optimal dual objective value is (0 = 8 and is equal to the optimal primal objective value. Next, the perturbation function (y, y R corresponds to the lower envelope ( for y 4 and (y remains constant at the value of 0 for y 4. The slop (0 equals -4, the negative of the optimal Lagrangian multiplier value. Moreover, we have (y (0 4y for all y R
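Example 3.4 can be checked end to end with a few lines of code. The sketch below is illustrative (not part of the dissertation): it uses the closed form of the inner minimization, inf{x² - ux : x ≥ 0} = -u²/4 for u ≥ 0, to evaluate θ(u) and confirm that the dual maximum 8 at ū = 4 matches the primal optimum at (2, 2).

```python
# Hedged sketch for Example 3.4:
#   primal: minimize x1^2 + x2^2  s.t.  -x1 - x2 + 4 <= 0,  x1, x2 >= 0   (optimum 8 at (2,2))
#   dual:   theta(u) = inf_{x>=0} [x1^2 + x2^2 + u(-x1 - x2 + 4)] = -u^2/2 + 4u  for u >= 0.
import numpy as np

def theta(u):
    # each separable term: inf_{x>=0} (x^2 - u*x) = -(u^2)/4 when u >= 0, else 0
    inner = -(u ** 2) / 4.0 if u >= 0 else 0.0
    return 2.0 * inner + 4.0 * u

us = np.linspace(0.0, 8.0, 801)
vals = np.array([theta(u) for u in us])
u_best = us[vals.argmax()]
print("dual maximiser  u* ~", u_best)                 # ~ 4
print("dual optimum        ", vals.max())             # ~ 8
print("primal optimum      ", 2.0 ** 2 + 2.0 ** 2)    # 8 -> no duality gap
```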

[Figure 3.3: Geometric illustration of the example]

We shall show that this is a necessary and sufficient condition for the primal and dual objective values to match at optimality.

Duality Theorems and Saddle Point Optimality Conditions

Weak Duality
Theorem 3.5: Let x be a feasible solution to Problem P: minimize f(x) subject to g(x) ≤ 0, h(x) = 0, x ∈ X. Let (u, v) be a feasible solution to Problem D: maximize θ(u, v) subject to u ≥ 0, where θ(u, v) = inf {f(x) + uᵗg(x) + vᵗh(x) : x ∈ X}. Then f(x) ≥ θ(u, v).

Proof: By the definition of θ, and since x ∈ X, we have
    θ(u, v) = inf {f(y) + uᵗg(y) + vᵗh(y) : y ∈ X} ≤ f(x) + uᵗg(x) + vᵗh(x) ≤ f(x),
since u ≥ 0, g(x) ≤ 0, and h(x) = 0.

Corollary 3.6: inf {f(x) : x ∈ X, g(x) ≤ 0, h(x) = 0} ≥ sup {θ(u, v) : u ≥ 0}.

Corollary 3.7: If f(x0) = θ(ū, v̄), where ū ≥ 0 and x0 ∈ {x ∈ X : g(x) ≤ 0, h(x) = 0}, then x0 and (ū, v̄) solve the primal and dual problems, respectively.

Proof: For every feasible (u, v) and every feasible x, weak duality gives θ(u, v) ≤ f(x0) = θ(ū, v̄) ≤ f(x). Hence x0 is an optimal solution of P and (ū, v̄) is an optimal solution of D.

Corollary 3.8: If inf {f(x) : x ∈ X, g(x) ≤ 0, h(x) = 0} = -∞, then θ(u, v) = -∞ for each u ≥ 0.

Corollary 3.9: If sup {θ(u, v) : u ≥ 0} = +∞, then the primal problem has no feasible solution.

Duality Gap: From Corollary 3.6 to the weak duality theorem, the optimal objective value of the primal problem is greater than or equal to the optimal objective value of the dual problem. If strict inequality holds true, then a duality gap is said to exist.
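A duality gap is easiest to see on a discrete set X, as in Example 3.10 below. The sketch here is an illustrative preview (not part of the dissertation): it enumerates the six points of X, computes θ(v) = min over X of f(x) + v h(x), and shows that the best dual value (-6) stays strictly below the primal optimum (-3).

```python
# Hedged preview of Example 3.10:
#   minimize f(x) = -2*x1 + x2  subject to  h(x) = x1 + x2 - 3 = 0,
#   X = {(0,0), (0,4), (4,4), (4,0), (1,2), (2,1)}.
import numpy as np

X = [(0, 0), (0, 4), (4, 4), (4, 0), (1, 2), (2, 1)]
f = lambda x: -2 * x[0] + x[1]
h = lambda x: x[0] + x[1] - 3

primal_opt = min(f(x) for x in X if h(x) == 0)          # feasible points: (1,2), (2,1)
theta = lambda v: min(f(x) + v * h(x) for x in X)       # dual function

vs = np.linspace(-5.0, 5.0, 1001)
dual_opt = max(theta(v) for v in vs)
print("primal optimum:", primal_opt)   # -3, attained at (2, 1)
print("dual optimum:  ", dual_opt)     # -6, attained near v = 2  ->  duality gap of 3
```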

29 Figure Illustration of a duality gap. The figure illustrates the case of a duality gap for a problem with a single inequality constraint and no equality constraints. The perturbation function (y for y R, is the greatest monotone nonincreasing function that envelopes G from below. The optimal primal value is (0. The greatest intercept on the ordinates z-axis achieved by a hyperplane that supports G from below gives the optimal dual objective value. In particular, there does not exist a such that (y (0 - y for all y R. Example 3.10: Consider the following problem Minimize f(x = -2x 1 + x 2 Subject to h(x = x1 + x2-3 = 0 (x1, x2 X where X = {(0, 0, (0, 4, (4, 4, (4, 0, (1, 2, (2, 1} It is easy to see that x 0 = (2, 1 is the optimal solution to the primal solution. With objective value equal to -3. The dual objective function is given by (v = min {f(x + vh(x : x X} = min {-2x 1 + x 2 + v(x 1 + x 2-3 : (x 1, x 2 X} = min {-3v, 4+v, 5v-4, v-8, 0, -3}

30 Thus, the explicit expression for is given by (v = Figure Dual function for Example 3.10 The dual function is shown in above figure 3.5, and the optimal solution is = 2 with objective value -6. There exists a duality gap. Now, consider the graph G = {(h(x, f(x: x X}, which consist of a finite number of points. In particular, G = {(x1 + x2 3, -2x1 + x2 : (x1, x2 X} Figure Geometric interpretation of Example 3.10

31 The supporting hyperplane, whose intercept on the vertical axis is maximal, is equal to -6 with the slop -2. Thus the optimal dual solution is = 2, with objective value -6. Furthermore, note that the points in the set G on the vertical axis correspond to the primal feasible points and hence the primal objective value is equal to -3. The perturbation function here is defined as (y = min {f(x : h(x = y, x X} Because of the discrete nature of X, h(x can take only a finite possible number of values. Hence noting G in figure 3.6, we obtain (-3 = 0, (0 = -3, (1 = -8, and (5 = -4 with (y = - for all y R otherwise. Again, the optimal primal value is (0 = -3, and there does not exist a such that (y (0 y. Hence, a duality gap exists. Conditions that guarantee absence of a duality gap are given by Strong duality theorem. Further, we shell related these conditions to perturbation function. Lemma 3.11: Let X be a non-empty convex set in R n. Let : R n R and g : R n R m be convex, and let h : R n R be affine; i.e, h is of the form h(x = Ax b. If system 1 below has no solution x, then system 2 has a solution (u 0, u, v. The converse holds if u0 > 0. System 1: α(x < 0 g(x 0 h(x = 0 for some x X System 2: u 0(x + u t g(x + v t h(x 0 for all x X (u 0, u 0 (u 0, u, v 0 Proof: Suppose the System 1 has no solution, and consider the following set: = {(p, q, r : p > (x, q = g(x, r = h(x for some x X} As X, and g are convex and h is affine, is convex. Since System 1 has no solution (0, 0, 0. By corollary 1.16, there exists a non-zero (u 0, u, v such that u0p + u t q + v t r 0 for each (p, q, r cl (3.1

32 Now fix an x X. Since p and q can be made arbitrarily large, (3.1 holds true only if u0 0 and u 0. Furthermore; (p, q, r = [(x, g(x, h(x] cl Therefore, from (3.1, we get u 0(x + u t g(x + v t h(x 0 Since the above inequality is true for each x X, System 2 has a solution. To prove the converse, assume that System 2 has a solution (u0, u, v such that u 0 > 0 and u 0, satisfying u 0(x + u t g(x + v t h(x 0 for each x X Now let x X be such that g (x 0 and h (x = 0. From the above inequality, since u 0, we conclude that u 0(x 0. Since u 0 > 0, (x 0; and, hence, System 1 has no solution. Following theorem is the Strong duality theorem which shows that under suitable convexity assumptions and under a constraint qualification, the optimal objective function values of the primal and dual problems are equal. Theorem 3.12(Strong Duality Theorem: Let X be a nonempty convex set in R n. Let f : R n R and g : R n R m be convex, and let h : R n R be affine; i.e., h is of the form h(x = Ax b. Suppose that the following constraint qualification holds true. There exists an X such that g( < 0 and h( = 0, and 0 int {h(x = {h(x : x X}}. Then inf {f(x : x X, g(x 0, h(x = 0} = sup {(u, v :u 0} (3.2 Furthermore, if the inf is finite, then sup {(u, v : u 0} is achieved at (, with 0. If the inf is achieved at x0, then t g(x0 = 0. Proof: Let = inf {f(x : x X, g(x 0, h(x = 0}. By assumption <. If = -, then by corollary 3.8, to weak duality theorem, sup {(u, v : u 0} = -. Therefore (3.2 holds true.

33 Hence, suppose <. Consider the following system: f(x < 0 g(x 0 h(x = 0 x X By definition of, this system has no solution. Hence, from lemma 3.11, there exists a non-zero vector (u 0, u, v with (u 0, u 0 such that u 0[f(x - ] + u t g(x + v t h(x 0 for all x X (3.3 To show that u0 > 0. If u0 = 0, by assumption, there exists an X such that g( < 0 and h( = 0. Substituting in (3.3, it follows u t g( 0. Since g( < 0 and u 0, u t g( 0 is possible only if u = 0. But from (3.3, u 0 = 0 and u = 0, implies that v t h(x 0 for all x X. Further, since 0 int h(x, we can pick an x X such that h(x = -v, > 0. Therefore, 0 v t h(x = - 2 v = 0 Thus, we have shown that u0 = 0 implies that (u0, u, v = 0, which is impossible. Hence u 0 > 0. Dividing (3.3 by u 0 and denoting u/u 0 and v/u 0 by and, respectively, we get f(x + t g(x + t h(x for all x X (3.4 This shows that (, = inf {f(x + t g(x + t h(x : x X}. In view of Weak duality theorem, it is then clear that (, =. and (, solves the dual problem. Now, suppose that x 0 is an optimal solution to the primal problem, i.e, x 0 X, g(x 0 0, h(x 0 = 0, and f(x 0 =. From (3.4, let x = x 0, we get t g(x 0 0. Since 0 and g(x0 0, we get t g(x0 = 0. Definition 3.13 (Saddle Point Criteria: Given the primal problem P, define the Lagrangian function (x, u, v = f(x + u t g(x + v t h(x A solution (x0,, is called a saddle point of the Lagrangian function if x0 X, 0, and

34 (x 0, u, v (x 0,, (x,, for all x X, and all (u, v with u 0 (3.5 Observe that x0 minimizes over X when (u, v is fixed at (,, and that (, maximizes over all (u, v with u 0 when x is fixed at x 0. The following result characterizes a saddle point solution and shows that its existence is a necessary and sufficient condition for the absence of a duality gap. Theorem 3.14(Saddle Point Optimality and Absence of a Duality Gap: A solution (x0,, with x0 X and 0 is a saddle point for the Lagrangian function (x, u, v = f(x + u t g(x + v t h(x if and only if a. (x 0,, = minimum { (x,, : x X} b. g(x0 0, h(x0 = 0, and c. t g(x0 = 0 Moreover, (x0,, is a saddle point if and only if x0 and (, are, respectively, optimal solutions to the primal and dual problems P and D with no duality gap, i.e, with f(x 0 = (,. Proof: Suppose that (x0,, is a saddle point for the Lagrangian function. By definition, condition (a is true. Again by definition 3.13,(first inequality f(x 0 + t g(x 0 + t h(x 0 f(x 0 + u t g(x 0 + v t h(x 0 for all (u, v, u 0 (3.6 (u - t g(x 0 + (v - t h(x 0 0 As u 0 and v is arbitrary, we must have g(x0 0 and h(x0 = 0. If u = 0 in (3.6; then t g(x 0 0 Next, 0 and g(x0 0 implies t g(x0 0. Therefore, t g(x0 = 0 Thus, conditions (a, (b, and (c hold. Conversely, suppose that we are given (x 0,, with x 0 X and 0 such that conditions (a, (b, and (c hold. Then, by property (a (x0,, (x,, for all x X Furthermore,

35 (x 0,, = f(x 0 + t g(x 0 + t h(x 0 = f(x0 [ t g(x0 = 0, h(x0 = 0] f(x0 + u t g(x0 + v t h(x0 [g(x0 0 and h(x0 = 0] = (x 0, u, v for all (u, v with u 0 Thus we get, (x0, u, v (x0,, (x,, Hence, (x0,, is a saddle point. Now, to prove the second part of the theorem, suppose (x 0,, is a saddle point. By property (b, x 0 is feasible to problem P. Since 0, we also have that (, is feasible to D. Moreover, by properties (a, (b and (c (, = (x 0,, = f(x 0 + t g(x 0 + t h(x 0 = f(x0 [ t g(x0 = 0, h(x0 = 0] By corollary 3.7, to Weak duality theorem, x0 and (, solve P and D, respectively, with no duality gap. Now, suppose that x 0 and (, are optimal solutions to problems P and D, respectively, with f(x0 = (,. Hence we have x0 X, g(x0 0, h(x0 = 0, and 0, by primal dual feasibility. Thus, (, = min {f(x + t g(x + t h(x : x X} f(x 0 + t g(x 0 + t h(x 0 = f(x0 + t g(x0 f(x0 t g(x 0 = 0 [f(x 0 = (, ] and (x0,, = f(x0 = (, = min { (x0,, : x X} Hence, properties (a, (b, and (c hold. x0 X and 0 imply that (x0,, is a saddle point. Corollary 3.15: Suppose that X, f and g are convex and that h is affine; i.e, h is of the form h(x = Ax b. Further, suppose 0 int h(x and that there exists an X with g( < 0 h( = 0. If x0 is an optimal solution to the primal Problem P, then there exists a vector (, with 0, such that (x 0,, is a saddle point.

36 Proof: By Strong duality theorem, there exists an optimal solution (,, 0 to problem D such that f(x0 = (,. Hence, by theorem 3.14, (x0,, is a saddle point solution. The dual optimal value is given by sup inf (,: [ (,, ] If we interchange the other of optimization, we get inf sup [ (,, ] (,: But the Sup of (x, u, v = f(x + u t g(x + v t h(x over (u, v with u 0 is infnity, unless g(x 0 and h(x = 0, whence it is f(x. Hence which is the primal optimal value. inf sup [ (,, ] (,: inf {f(x: g(x 0, h(x = 0, x X} Hence we see that the primal and dual objective values match at optimality if and only f the interchange of the foregoing inf and sup operations leaves the optimal values unchanged. By above theorem assuming that an optimum exists, this occurs if and only if there exists a saddle point (x 0,, for the Lagrangian function. Relationship Between B the Saddlepoint Criteria and the KKT Conditions Theorem 3.16: Let S = {x : g(x 0, h(x = 0}, and consider Problem P to minimize f(x subject to x S. 1. Suppose that x 0 S satisfies the KKT conditions, i.e, there exists 0 and such that f(x0 + g(x0 t + h(x0 t = 0 (3.7 t g(x 0 = 0 2. Suppose that f, g i for i I are convex at x 0, where I = {i : g i(x 0 = 0}.

37 3. Further suppose if i 0, then h i is affine. Then (x0,, is a saddle point for the Lagrangian function (x, u, v = f(x + u t g(x + v t h(x Conversely, Suppose that (x0,, with x0 int X and 0 is a saddle point solution. Then, x 0 is a feasible to problem P and, furthermore, (x 0,, satisfies the KKT conditions specified by (3.7. Proof: Suppose that (x0,,, with x0 S and 0 satisfies the KKT conditions specified by (3.7. By convexity at x 0 of f and g i, i I, and since h i is affine for i 0, we get for all x X f(x f(x0 + f(x0 t (x x0 (3.8a g i(x g i(x 0 + g i(x 0 t (x x 0 for i I (3.8b h i(x = h i(x 0 + h i(x 0 t (x x 0 for i = 1,,l, i 0 (3.8c Multiplying (3.8b by i 0 and (3.8c by i, and adding all to (3.8a, f(x + igi(x + ihi(x f(x0 + igi(x0 + ihi(x0 + { f(x 0 t + i g i(x 0 t + i h i(x 0 t }(x x 0 f(x + ig i(x + ih i(x f(x 0 + ig i(x 0 + ih i(x 0 [use (3.7] i.e., (x,, (x 0,, for all x X Also, since g(x0 0, h(x0 = 0 and t g(x0 = 0 f(x0 + igi(x0 + ihi(x0 f(x0 + igi(x0 + ihi(x0 i.e., (x0, u, v (x0,, for all (x0, u, v with u 0. Thus (x 0, u, v (x 0,, (x,, i.e., (x0,, satisfies the saddle point conditions.

38 For converse, suppose (x 0,, with x 0 int X and 0 is a saddle point solution. Since (x0, u, v (x0,, for all (u, v with u 0 By theorem 3.14, g(x 0 0, h(x 0 = 0 and t g(x 0 = 0 This shows that x0 is feasible to problem P. Since (x 0,, (x,, for all x X, than x 0 solves the problem to minimize (x,, subject to x X. Since x 0 int X, then x (x 0,, = 0 i.e, f(x 0 + g(x 0 t + h(x 0 t = 0 Remark 3.17: The theorem shows that if x0 is a KKT point then under certain convexity assumptions, the Lagrangian multipliers in the KKT conditions also serve as the multipliers in the saddle point criteria. Conversely, the multipliers in the saddle point conditions are the Lagrangian multipliers of the KKT conditions. Properties of the Dual Function: In view of theorem 3.12 & 3.14, it would be possible to solve the primal problem indirectly by solving the dual problem. For this, we need to examine the properties of the dual function. We shall assume that the set X is compact. This will simplify the proofs of several of the theorems. Note that this assumption is not unduly restrictive. If X is not bounded, one could add suitable lower and upper bounds on the variables such that the feasible region is not affected. For convenience we shall also combine the vectors u and v as w and the functions g and h as. i.e., w = (u, v t, =(g, h t.
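Before turning to the properties of the dual function, the saddle point characterization of Theorem 3.14 (and its link to the KKT conditions in Theorem 3.16) can be verified numerically for Example 3.4. The sketch below is illustrative only; the grid used to approximate X and the candidate pair (x0, ū) = ((2, 2), 4) are assumptions drawn from that example.

```python
# Hedged sketch: check the saddle point conditions of Theorem 3.14 for Example 3.4,
#   phi(x, u) = f(x) + u*g(x),  f(x) = x1^2 + x2^2,  g(x) = -x1 - x2 + 4,
# at the candidate (x0, u_bar) = ((2, 2), 4).
import numpy as np

f = lambda x: x[0] ** 2 + x[1] ** 2
g = lambda x: -x[0] - x[1] + 4.0
phi = lambda x, u: f(x) + u * g(x)

x0, u_bar = np.array([2.0, 2.0]), 4.0

# (a) x0 minimises phi(., u_bar) over X = {x >= 0} (checked on a grid)
grid = np.linspace(0.0, 6.0, 121)
X = [(a, b) for a in grid for b in grid]
print("(a) x0 minimises phi over the grid:",
      min(phi(x, u_bar) for x in X) >= phi(x0, u_bar) - 1e-9)

# (b) primal feasibility and (c) complementary slackness
print("(b) g(x0) <= 0:", g(x0) <= 0, "  (c) u_bar*g(x0) = 0:", u_bar * g(x0) == 0)

# saddle inequality in u:  phi(x0, u) <= phi(x0, u_bar) for every u >= 0
us = np.linspace(0.0, 10.0, 101)
print("phi(x0, u) <= phi(x0, u_bar) for all sampled u >= 0:",
      all(phi(x0, u) <= phi(x0, u_bar) + 1e-9 for u in us))
```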

39 First we show that is concave Theorem 3.18: Let X be a non-empty compact set in R n, and let f : R n R and : R n R m+l be continuous. Then, defined by (w = inf {f(x + w t (x : x X} is concave over R m+l m+l. Proof: Since f and are continuous and X is compact is finite everywhere on R m+l m+l. Let w1, w2 R m+l and (0, 1, we then have [w 1 + (1 - w 2] = inf {f(x + [w 1 + (1 - w 2] t (x : x X} = inf {[f(x + w t 1 (x] + (1 - [f(x + w 2t (x] : x X} inf {f(x+w1 t (x: x X}+(1 inf{f(x + w2 t (x: x X} = (w 1 + (1 - (w 2 since, inf f(x + inf g(x inf {f(x + g(x} Therefore, is concave. Remark : Since is concave, by theorem 1.19; a local optimal of is also global optimum. Differentiability of : It will convenient to introduce the following set of optimal solutions to the Lagrangian dual sub-problem X(w = {y : y minimizes f(x + w t (x over x X} The differentiability of at any given point depends on the element of X (. Theorem 3.21 below shows that if the set X ( is a singleton then is differentiable at. Lemma 3.20: Let X be a non-empty compact set in R n and f : R n R and : R n R m+l m+l be continuous. Let R m+l, and suppose that X ( is the singleton {x 0}. Suppose that w k, and let x k X(w kfor each k. Then x k x 0. Proof: By contradiction, suppose that wk, xk X(wk and > > 0 for k K, where K is some index set.

40 Since X is compact, then the sequence {x k} K has a convergent subsequence {xk}k, with limit y in X. Note that > 0, and hence y and x0 are distinct. Furthermore, for each w k with k K we have f(x k + w k t (x k f(x 0 + w k (x 0 Taking the limit as k in K and note that xk y, wk, and that f and are continuous, it follows that f(y + t (y f(x 0 + t (x 0 Therefore, y X(, contradicting the assumption that X( is a singleton. Theorem 3.21: Let X be non-empty compact set in R n, and let f : R n R, and : R n R m+l be continuous. Let R m+l, and suppose that X( is the singleton {x 0}. Then, is differentiable at with gradient ( = (x 0. Proof: Since f and are continuous and X is compact, then, for any given w, there exists an xw X(w. From the definition of, the following two inequalities hold true: (w ( f(x 0 + w t (x 0 f(x 0 - t (x 0 = (w - t (x 0 ( (w f(xw + t (xw f(xw - w t (xw = (- w t (xw From (3.9 and (3.10 and the Schwartz inequality, it follows that This further implies that 0 (w ( (w - t (x 0 (w - t [(x w - (x 0] 0 ((( ( - - w (x w - (x 0 - (x w (x 0 (3.9 (3.10 (3.11 As w, then, by lemma 3.20, xwx0 and by the continuity of, (xw(x0. Therefore, from (3.11, we get lim ((( ( Hence, is differentiable at with gradient (x 0. Note 3.22: As is concave (theorem 3.19, by theorem 1.14, is subdifferentiable i.e., it has subgradients. We shall show that these subgradients characterize the direction of ascent. = 0.
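Theorem 3.21 says that when X(w̄) is a singleton, θ is differentiable at w̄ with gradient β(x0), the value of the combined constraint map β = (g, h) at the unique minimizer. The sketch below is illustrative only, using the data of Example 3.4 (where the dual subproblem has the unique minimizer (u/2, u/2) for u > 0), and compares this formula with a finite-difference estimate of θ′.

```python
# Hedged sketch: for Example 3.4, theta(u) = -u^2/2 + 4u and, for u > 0, the dual
# subproblem has the unique minimiser x_u = (u/2, u/2).  Theorem 3.21 then gives
# theta'(u) = g(x_u) = 4 - u; we compare with a central finite difference.
import numpy as np

theta = lambda u: -u ** 2 / 2.0 + 4.0 * u          # dual function (u >= 0)
x_of = lambda u: np.array([u / 2.0, u / 2.0])      # unique minimiser for u > 0
g = lambda x: -x[0] - x[1] + 4.0                   # constraint function

u_bar, eps = 3.0, 1e-6
fd = (theta(u_bar + eps) - theta(u_bar - eps)) / (2 * eps)
print("g(x_u)            =", g(x_of(u_bar)))       # 1.0
print("finite difference =", fd)                   # ~ 1.0
```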

Theorem 3.23: Let X be a nonempty compact set in Rⁿ, and let f : Rⁿ → R and β : Rⁿ → R^(m+l) be continuous, so that for any w̄ ∈ R^(m+l) the set X(w̄) is not empty. If x0 ∈ X(w̄), then β(x0) is a subgradient of θ at w̄.

Proof: Since f and β are continuous and X is compact, X(w̄) ≠ ∅ for any w̄ ∈ R^(m+l); let x0 ∈ X(w̄). Then
    θ(w) = inf {f(x) + wᵗβ(x) : x ∈ X}
         ≤ f(x0) + wᵗβ(x0)
         = f(x0) + w̄ᵗβ(x0) + (w - w̄)ᵗβ(x0)
         = θ(w̄) + (w - w̄)ᵗβ(x0).
Therefore β(x0) is a subgradient of θ at w̄.

Example 3.24: Consider the following primal problem:
    Minimize -x1 - x2
    subject to x1 + 2x2 - 3 ≤ 0
               x1, x2 = 0, 1, 2, or 3.
Let g(x1, x2) = x1 + 2x2 - 3 and X = {(x1, x2) : x1, x2 = 0, 1, 2, or 3}, so that |X| = 16. Then the dual function is given by
    θ(u) = inf {-x1 - x2 + u(x1 + 2x2 - 3) : x1, x2 = 0, 1, 2, or 3}
         = min {-3u, -1-u, -2+u, -3+3u, -1-2u, -2, -3+2u, -4+4u, -2-u, -3+u, -4+3u, -5+5u, -3, -4+2u, -5+4u, -6+6u}.

[Figure 3.7: Graph of the dual function]

42 and hence, (u = Now let =. In order to find a subgradient of at, consider the following subproblem: Minimize -x 1 x 2 + (x1 + 2x2 3 Subject to x 1, x 2 = 0, 1, 2, or 3 Note that the set X( of optimal solutions to the above problem is {(3, 0, (3, 1, (3, 2, and (3, 3}. Thus, from theorem 3.23, g(0, 3 = 0, g(3, 1 = 2, g(3, 2 = 4, and g(3, 3 = 6 are subgradients of at. Note, however, that is also a subgradient of at, but can not be represented as g(x0 for any x0 X(. Note 3.25: From the above example, it is clear that theorem 3.23, gives only a sufficient characterization of subgradients. Theorem 3.26: Let X be a non-empty compact set in R n, and let f : R n R and : R n R m+l be continuous. Let, d R m+l. Then, the directional derivative of at in the direction d satisfies (; d d t (x0 for some x0 X( Proof roof: Consider + kd, where k 0 +. For each k, there exists an xk X( + kd; and since X is compact, there is a convergent subsequence {x k} K with limit x 0 in X. Given an x X, note that f(x +( + kd t (x f(xk + ( + kd t (xk for each k K. Taking the limit as k, it follows that f(x + t (x f(x 0 + t (x 0 i.e., x 0 X(. Furthermore, by the definition of ( + kd and (, we get ( + kd - ( = f(xk + ( + k d t (xk - ( = f(x k + t (x k - ( + kd t (x k k d t (x k [ f(x k + t (x k - ( 0]

43 The above inequality holds true for each k K. Note that x kx 0 as k foe each k K, we get, By lemma 1.17, exists. ( lim ( d t (x0 (; d = lim ( ( Corollary 3.27: Let ( be the collection of sub-gradients of at, and suppose that assumptions of the theorem hold true. Then, (; d = inf {d t : (} Proof: Let x 0 X(. By theorem 3.23, (x 0 (; and, hence, theorem 3.26, implies (; d inf { d t : (} Now let (. Since is concave ( + d - ( d t Dividing by > 0 and taking the limit as 0 + Since this is true for each (. (; d d t (; d inf {d t : (} Thus (3.12 & (3.13 imply the result. (3.12 (3.13 Theorem 3.28: Let X be a nonempty compact set in R n, and let f : R n R, and : R n R m+l be continuous. Then, is a subgradient of at R m+l if and only if belongs to the convex hull of {(y : y X(}. Proof: Write = {(y : y X(} Let H( be the convex hull of. By theorem 3.23, (; and since ( is convex, H( (. As X is compact and is continuous, is compact. Furthermore, the convex hull of a compact set is closed. Therefore, H( is a closed set.

44 We shall now show that H( (. By contradiction, suppose ( but H(. By theorem 1.15, there exists a scalar and a nonzero vector d such that d t (y for all y X( (3.14 d t < (3.15 By theorem 3.26, there exists a y X( such that (; d d t (y and by (3.14. We have (; d But the corollary 3.27, and (3.15 gives (; d = inf {d t : (} d t < which is a contradiction. Therefore, H(, and ( = H(. Example 3.29: Consider the problem (Example 3.10 Minimum f(x = -2x1 + x2 Subject to h(x = x1 + x2 3 = 0 (x 1, x 2 X Where X = {(0, 0, (0, 4, (4, 4, (4, 0, (1, 2, (2, 1} The dual function (v, v R is as shown in figure 3.5. Note That is differentiable for all v except for v = -1 and v = 2. Consider v = 2, for example. The set X(2 is given by set of alternative optimal solutions to the problem (2 = min {3x 2 6 : (x 1, x 2 X} Hence X(2 = {(0, 0, (4, 0}, with (2 = -6. By theorem 3.23, the subgradients of the form (x0 for x0 X(2 are h(0, 0 = -3 and h(4, 0 = 1. Observe that, in Figure 3.5, these values are the slopes of two affine segments defining the graph of that are incident at the point (v, (v = (2, -6. Therefore, as in theorem 3.28, the set of subgradients of at v = 2, which is given by the slopes of the two affine supports for the hypograph of, is precisely [-3, 1], the set of convex combinations of -3 and 1. Example 3.30: Consider the following primal problem Minimize -(x1-4 2 (x2-4 2 Subject to x 1 3 0


Lecture: Duality of LP, SOCP and SDP 1/33 Lecture: Duality of LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html wenzw@pku.edu.cn Acknowledgement:

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

Numerical Optimization

Numerical Optimization Constrained Optimization Computer Science and Automation Indian Institute of Science Bangalore 560 012, India. NPTEL Course on Constrained Optimization Constrained Optimization Problem: min h j (x) 0,

More information

Lecture: Duality.

Lecture: Duality. Lecture: Duality http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Introduction 2/35 Lagrange dual problem weak and strong

More information

Finite Dimensional Optimization Part III: Convex Optimization 1

Finite Dimensional Optimization Part III: Convex Optimization 1 John Nachbar Washington University March 21, 2017 Finite Dimensional Optimization Part III: Convex Optimization 1 1 Saddle points and KKT. These notes cover another important approach to optimization,

More information

Solving Dual Problems

Solving Dual Problems Lecture 20 Solving Dual Problems We consider a constrained problem where, in addition to the constraint set X, there are also inequality and linear equality constraints. Specifically the minimization problem

More information

A Brief Review on Convex Optimization

A Brief Review on Convex Optimization A Brief Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one convex, two nonconvex sets): A Brief Review

More information

Division of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45

Division of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45 Division of the Humanities and Social Sciences Supergradients KC Border Fall 2001 1 The supergradient of a concave function There is a useful way to characterize the concavity of differentiable functions.

More information

CSCI : Optimization and Control of Networks. Review on Convex Optimization

CSCI : Optimization and Control of Networks. Review on Convex Optimization CSCI7000-016: Optimization and Control of Networks Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one

More information

Chapter 2 Convex Analysis

Chapter 2 Convex Analysis Chapter 2 Convex Analysis The theory of nonsmooth analysis is based on convex analysis. Thus, we start this chapter by giving basic concepts and results of convexity (for further readings see also [202,

More information

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus 1/41 Subgradient Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes definition subgradient calculus duality and optimality conditions directional derivative Basic inequality

More information

Constraint qualifications for nonlinear programming

Constraint qualifications for nonlinear programming Constraint qualifications for nonlinear programming Consider the standard nonlinear program min f (x) s.t. g i (x) 0 i = 1,..., m, h j (x) = 0 1 = 1,..., p, (NLP) with continuously differentiable functions

More information

Subgradients. subgradients. strong and weak subgradient calculus. optimality conditions via subgradients. directional derivatives

Subgradients. subgradients. strong and weak subgradient calculus. optimality conditions via subgradients. directional derivatives Subgradients subgradients strong and weak subgradient calculus optimality conditions via subgradients directional derivatives Prof. S. Boyd, EE364b, Stanford University Basic inequality recall basic inequality

More information

Introduction to Convex Analysis Microeconomics II - Tutoring Class

Introduction to Convex Analysis Microeconomics II - Tutoring Class Introduction to Convex Analysis Microeconomics II - Tutoring Class Professor: V. Filipe Martins-da-Rocha TA: Cinthia Konichi April 2010 1 Basic Concepts and Results This is a first glance on basic convex

More information

Lagrange Relaxation and Duality

Lagrange Relaxation and Duality Lagrange Relaxation and Duality As we have already known, constrained optimization problems are harder to solve than unconstrained problems. By relaxation we can solve a more difficult problem by a simpler

More information

Optimization for Communications and Networks. Poompat Saengudomlert. Session 4 Duality and Lagrange Multipliers

Optimization for Communications and Networks. Poompat Saengudomlert. Session 4 Duality and Lagrange Multipliers Optimization for Communications and Networks Poompat Saengudomlert Session 4 Duality and Lagrange Multipliers P Saengudomlert (2015) Optimization Session 4 1 / 14 24 Dual Problems Consider a primal convex

More information

CONVEX FUNCTIONS AND OPTIMIZATION TECHINIQUES A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

CONVEX FUNCTIONS AND OPTIMIZATION TECHINIQUES A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF CONVEX FUNCTIONS AND OPTIMIZATION TECHINIQUES A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN MATHEMATICS SUBMITTED TO NATIONAL INSTITUTE OF TECHNOLOGY,

More information

Introduction to Optimization Techniques. Nonlinear Optimization in Function Spaces

Introduction to Optimization Techniques. Nonlinear Optimization in Function Spaces Introduction to Optimization Techniques Nonlinear Optimization in Function Spaces X : T : Gateaux and Fréchet Differentials Gateaux and Fréchet Differentials a vector space, Y : a normed space transformation

More information

Characterizations of Solution Sets of Fréchet Differentiable Problems with Quasiconvex Objective Function

Characterizations of Solution Sets of Fréchet Differentiable Problems with Quasiconvex Objective Function Characterizations of Solution Sets of Fréchet Differentiable Problems with Quasiconvex Objective Function arxiv:1805.03847v1 [math.oc] 10 May 2018 Vsevolod I. Ivanov Department of Mathematics, Technical

More information

A SET OF LECTURE NOTES ON CONVEX OPTIMIZATION WITH SOME APPLICATIONS TO PROBABILITY THEORY INCOMPLETE DRAFT. MAY 06

A SET OF LECTURE NOTES ON CONVEX OPTIMIZATION WITH SOME APPLICATIONS TO PROBABILITY THEORY INCOMPLETE DRAFT. MAY 06 A SET OF LECTURE NOTES ON CONVEX OPTIMIZATION WITH SOME APPLICATIONS TO PROBABILITY THEORY INCOMPLETE DRAFT. MAY 06 CHRISTIAN LÉONARD Contents Preliminaries 1 1. Convexity without topology 1 2. Convexity

More information

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented

More information

The Karush-Kuhn-Tucker (KKT) conditions

The Karush-Kuhn-Tucker (KKT) conditions The Karush-Kuhn-Tucker (KKT) conditions In this section, we will give a set of sufficient (and at most times necessary) conditions for a x to be the solution of a given convex optimization problem. These

More information

Inequality Constraints

Inequality Constraints Chapter 2 Inequality Constraints 2.1 Optimality Conditions Early in multivariate calculus we learn the significance of differentiability in finding minimizers. In this section we begin our study of the

More information

4. Algebra and Duality

4. Algebra and Duality 4-1 Algebra and Duality P. Parrilo and S. Lall, CDC 2003 2003.12.07.01 4. Algebra and Duality Example: non-convex polynomial optimization Weak duality and duality gap The dual is not intrinsic The cone

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Optimality Conditions for Nonsmooth Convex Optimization

Optimality Conditions for Nonsmooth Convex Optimization Optimality Conditions for Nonsmooth Convex Optimization Sangkyun Lee Oct 22, 2014 Let us consider a convex function f : R n R, where R is the extended real field, R := R {, + }, which is proper (f never

More information

Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem:

Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem: CDS270 Maryam Fazel Lecture 2 Topics from Optimization and Duality Motivation network utility maximization (NUM) problem: consider a network with S sources (users), each sending one flow at rate x s, through

More information

Linear Programming. Larry Blume Cornell University, IHS Vienna and SFI. Summer 2016

Linear Programming. Larry Blume Cornell University, IHS Vienna and SFI. Summer 2016 Linear Programming Larry Blume Cornell University, IHS Vienna and SFI Summer 2016 These notes derive basic results in finite-dimensional linear programming using tools of convex analysis. Most sources

More information

EE/AA 578, Univ of Washington, Fall Duality

EE/AA 578, Univ of Washington, Fall Duality 7. Duality EE/AA 578, Univ of Washington, Fall 2016 Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

More information

Constrained Optimization Theory

Constrained Optimization Theory Constrained Optimization Theory Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. IMA, August 2016 Stephen Wright (UW-Madison) Constrained Optimization Theory IMA, August

More information

Lecture 8 Plus properties, merit functions and gap functions. September 28, 2008

Lecture 8 Plus properties, merit functions and gap functions. September 28, 2008 Lecture 8 Plus properties, merit functions and gap functions September 28, 2008 Outline Plus-properties and F-uniqueness Equation reformulations of VI/CPs Merit functions Gap merit functions FP-I book:

More information

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST) Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual

More information

Convex Optimization Lecture 6: KKT Conditions, and applications

Convex Optimization Lecture 6: KKT Conditions, and applications Convex Optimization Lecture 6: KKT Conditions, and applications Dr. Michel Baes, IFOR / ETH Zürich Quick recall of last week s lecture Various aspects of convexity: The set of minimizers is convex. Convex

More information

Jørgen Tind, Department of Statistics and Operations Research, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen O, Denmark.

Jørgen Tind, Department of Statistics and Operations Research, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen O, Denmark. DUALITY THEORY Jørgen Tind, Department of Statistics and Operations Research, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen O, Denmark. Keywords: Duality, Saddle point, Complementary

More information

Lecture 9 Monotone VIs/CPs Properties of cones and some existence results. October 6, 2008

Lecture 9 Monotone VIs/CPs Properties of cones and some existence results. October 6, 2008 Lecture 9 Monotone VIs/CPs Properties of cones and some existence results October 6, 2008 Outline Properties of cones Existence results for monotone CPs/VIs Polyhedrality of solution sets Game theory:

More information

Symmetric and Asymmetric Duality

Symmetric and Asymmetric Duality journal of mathematical analysis and applications 220, 125 131 (1998) article no. AY975824 Symmetric and Asymmetric Duality Massimo Pappalardo Department of Mathematics, Via Buonarroti 2, 56127, Pisa,

More information

Nonlinear Programming 3rd Edition. Theoretical Solutions Manual Chapter 6

Nonlinear Programming 3rd Edition. Theoretical Solutions Manual Chapter 6 Nonlinear Programming 3rd Edition Theoretical Solutions Manual Chapter 6 Dimitri P. Bertsekas Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts 1 NOTE This manual contains

More information

The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization 1

The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization 1 October 2003 The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization 1 by Asuman E. Ozdaglar and Dimitri P. Bertsekas 2 Abstract We consider optimization problems with equality,

More information

Lecture 8. Strong Duality Results. September 22, 2008

Lecture 8. Strong Duality Results. September 22, 2008 Strong Duality Results September 22, 2008 Outline Lecture 8 Slater Condition and its Variations Convex Objective with Linear Inequality Constraints Quadratic Objective over Quadratic Constraints Representation

More information

Outline. Roadmap for the NPP segment: 1 Preliminaries: role of convexity. 2 Existence of a solution

Outline. Roadmap for the NPP segment: 1 Preliminaries: role of convexity. 2 Existence of a solution Outline Roadmap for the NPP segment: 1 Preliminaries: role of convexity 2 Existence of a solution 3 Necessary conditions for a solution: inequality constraints 4 The constraint qualification 5 The Lagrangian

More information

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research Introduction to Machine Learning Lecture 7 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Convex Optimization Differentiation Definition: let f : X R N R be a differentiable function,

More information

Appendix PRELIMINARIES 1. THEOREMS OF ALTERNATIVES FOR SYSTEMS OF LINEAR CONSTRAINTS

Appendix PRELIMINARIES 1. THEOREMS OF ALTERNATIVES FOR SYSTEMS OF LINEAR CONSTRAINTS Appendix PRELIMINARIES 1. THEOREMS OF ALTERNATIVES FOR SYSTEMS OF LINEAR CONSTRAINTS Here we consider systems of linear constraints, consisting of equations or inequalities or both. A feasible solution

More information

TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM

TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM H. E. Krogstad, IMF, Spring 2012 Karush-Kuhn-Tucker (KKT) Theorem is the most central theorem in constrained optimization, and since the proof is scattered

More information

ICS-E4030 Kernel Methods in Machine Learning

ICS-E4030 Kernel Methods in Machine Learning ICS-E4030 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 28. September, 2016 Juho Rousu 28. September, 2016 1 / 38 Convex optimization Convex optimisation This

More information

Lecture 5. Theorems of Alternatives and Self-Dual Embedding

Lecture 5. Theorems of Alternatives and Self-Dual Embedding IE 8534 1 Lecture 5. Theorems of Alternatives and Self-Dual Embedding IE 8534 2 A system of linear equations may not have a solution. It is well known that either Ax = c has a solution, or A T y = 0, c

More information

Convex Functions. Pontus Giselsson

Convex Functions. Pontus Giselsson Convex Functions Pontus Giselsson 1 Today s lecture lower semicontinuity, closure, convex hull convexity preserving operations precomposition with affine mapping infimal convolution image function supremum

More information

Chapter 1. Preliminaries

Chapter 1. Preliminaries Introduction This dissertation is a reading of chapter 4 in part I of the book : Integer and Combinatorial Optimization by George L. Nemhauser & Laurence A. Wolsey. The chapter elaborates links between

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee227c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee227c@berkeley.edu

More information

Constrained Optimization

Constrained Optimization 1 / 22 Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 30, 2015 2 / 22 1. Equality constraints only 1.1 Reduced gradient 1.2 Lagrange

More information

CONSTRAINED NONLINEAR PROGRAMMING

CONSTRAINED NONLINEAR PROGRAMMING 149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach

More information

Lectures 9 and 10: Constrained optimization problems and their optimality conditions

Lectures 9 and 10: Constrained optimization problems and their optimality conditions Lectures 9 and 10: Constrained optimization problems and their optimality conditions Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lectures 9 and 10: Constrained

More information

Franco Giannessi, Giandomenico Mastroeni. Institute of Mathematics University of Verona, Verona, Italy

Franco Giannessi, Giandomenico Mastroeni. Institute of Mathematics University of Verona, Verona, Italy ON THE THEORY OF VECTOR OPTIMIZATION AND VARIATIONAL INEQUALITIES. IMAGE SPACE ANALYSIS AND SEPARATION 1 Franco Giannessi, Giandomenico Mastroeni Department of Mathematics University of Pisa, Pisa, Italy

More information

Summary Notes on Maximization

Summary Notes on Maximization Division of the Humanities and Social Sciences Summary Notes on Maximization KC Border Fall 2005 1 Classical Lagrange Multiplier Theorem 1 Definition A point x is a constrained local maximizer of f subject

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Solutions Chapter 5. The problem of finding the minimum distance from the origin to a line is written as. min 1 2 kxk2. subject to Ax = b.

Solutions Chapter 5. The problem of finding the minimum distance from the origin to a line is written as. min 1 2 kxk2. subject to Ax = b. Solutions Chapter 5 SECTION 5.1 5.1.4 www Throughout this exercise we will use the fact that strong duality holds for convex quadratic problems with linear constraints (cf. Section 3.4). The problem of

More information

CO 250 Final Exam Guide

CO 250 Final Exam Guide Spring 2017 CO 250 Final Exam Guide TABLE OF CONTENTS richardwu.ca CO 250 Final Exam Guide Introduction to Optimization Kanstantsin Pashkovich Spring 2017 University of Waterloo Last Revision: March 4,

More information

UTILITY OPTIMIZATION IN A FINITE SCENARIO SETTING

UTILITY OPTIMIZATION IN A FINITE SCENARIO SETTING UTILITY OPTIMIZATION IN A FINITE SCENARIO SETTING J. TEICHMANN Abstract. We introduce the main concepts of duality theory for utility optimization in a setting of finitely many economic scenarios. 1. Utility

More information

ON GENERALIZED-CONVEX CONSTRAINED MULTI-OBJECTIVE OPTIMIZATION

ON GENERALIZED-CONVEX CONSTRAINED MULTI-OBJECTIVE OPTIMIZATION ON GENERALIZED-CONVEX CONSTRAINED MULTI-OBJECTIVE OPTIMIZATION CHRISTIAN GÜNTHER AND CHRISTIANE TAMMER Abstract. In this paper, we consider multi-objective optimization problems involving not necessarily

More information

Chapter 2: Preliminaries and elements of convex analysis

Chapter 2: Preliminaries and elements of convex analysis Chapter 2: Preliminaries and elements of convex analysis Edoardo Amaldi DEIB Politecnico di Milano edoardo.amaldi@polimi.it Website: http://home.deib.polimi.it/amaldi/opt-14-15.shtml Academic year 2014-15

More information

Lagrangian Duality Theory

Lagrangian Duality Theory Lagrangian Duality Theory Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapter 14.1-4 1 Recall Primal and Dual

More information

Duality (Continued) min f ( x), X R R. Recall, the general primal problem is. The Lagrangian is a function. defined by

Duality (Continued) min f ( x), X R R. Recall, the general primal problem is. The Lagrangian is a function. defined by Duality (Continued) Recall, the general primal problem is min f ( x), xx g( x) 0 n m where X R, f : X R, g : XR ( X). he Lagrangian is a function L: XR R m defined by L( xλ, ) f ( x) λ g( x) Duality (Continued)

More information

A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions

A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions Angelia Nedić and Asuman Ozdaglar April 16, 2006 Abstract In this paper, we study a unifying framework

More information

Mathematical Foundations -1- Constrained Optimization. Constrained Optimization. An intuitive approach 2. First Order Conditions (FOC) 7

Mathematical Foundations -1- Constrained Optimization. Constrained Optimization. An intuitive approach 2. First Order Conditions (FOC) 7 Mathematical Foundations -- Constrained Optimization Constrained Optimization An intuitive approach First Order Conditions (FOC) 7 Constraint qualifications 9 Formal statement of the FOC for a maximum

More information

Econ 508-A FINITE DIMENSIONAL OPTIMIZATION - NECESSARY CONDITIONS. Carmen Astorne-Figari Washington University in St. Louis.

Econ 508-A FINITE DIMENSIONAL OPTIMIZATION - NECESSARY CONDITIONS. Carmen Astorne-Figari Washington University in St. Louis. Econ 508-A FINITE DIMENSIONAL OPTIMIZATION - NECESSARY CONDITIONS Carmen Astorne-Figari Washington University in St. Louis August 12, 2010 INTRODUCTION General form of an optimization problem: max x f

More information

Convex Programs. Carlo Tomasi. December 4, 2018

Convex Programs. Carlo Tomasi. December 4, 2018 Convex Programs Carlo Tomasi December 4, 2018 1 Introduction In an earlier note, we found methods for finding a local minimum of some differentiable function f(u) : R m R. If f(u) is at least weakly convex,

More information

Duality. for The New Palgrave Dictionary of Economics, 2nd ed. Lawrence E. Blume

Duality. for The New Palgrave Dictionary of Economics, 2nd ed. Lawrence E. Blume Duality for The New Palgrave Dictionary of Economics, 2nd ed. Lawrence E. Blume Headwords: CONVEXITY, DUALITY, LAGRANGE MULTIPLIERS, PARETO EFFICIENCY, QUASI-CONCAVITY 1 Introduction The word duality is

More information

On smoothness properties of optimal value functions at the boundary of their domain under complete convexity

On smoothness properties of optimal value functions at the boundary of their domain under complete convexity On smoothness properties of optimal value functions at the boundary of their domain under complete convexity Oliver Stein # Nathan Sudermann-Merx June 14, 2013 Abstract This article studies continuity

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions

More information

BASICS OF CONVEX ANALYSIS

BASICS OF CONVEX ANALYSIS BASICS OF CONVEX ANALYSIS MARKUS GRASMAIR 1. Main Definitions We start with providing the central definitions of convex functions and convex sets. Definition 1. A function f : R n R + } is called convex,

More information

Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725

Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725 Karush-Kuhn-Tucker Conditions Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given a minimization problem Last time: duality min x subject to f(x) h i (x) 0, i = 1,... m l j (x) = 0, j =

More information

CONSTRAINED OPTIMALITY CRITERIA

CONSTRAINED OPTIMALITY CRITERIA 5 CONSTRAINED OPTIMALITY CRITERIA In Chapters 2 and 3, we discussed the necessary and sufficient optimality criteria for unconstrained optimization problems. But most engineering problems involve optimization

More information

CS-E4830 Kernel Methods in Machine Learning

CS-E4830 Kernel Methods in Machine Learning CS-E4830 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 27. September, 2017 Juho Rousu 27. September, 2017 1 / 45 Convex optimization Convex optimisation This

More information

On Weak Pareto Optimality for Pseudoconvex Nonsmooth Multiobjective Optimization Problems

On Weak Pareto Optimality for Pseudoconvex Nonsmooth Multiobjective Optimization Problems Int. Journal of Math. Analysis, Vol. 7, 2013, no. 60, 2995-3003 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ijma.2013.311276 On Weak Pareto Optimality for Pseudoconvex Nonsmooth Multiobjective

More information

10 Numerical methods for constrained problems

10 Numerical methods for constrained problems 10 Numerical methods for constrained problems min s.t. f(x) h(x) = 0 (l), g(x) 0 (m), x X The algorithms can be roughly divided the following way: ˆ primal methods: find descent direction keeping inside

More information