Lecture 11. 26 September 2006
Review of Lecture #10: Second-order optimality conditions: necessary condition, sufficient condition. If the necessary condition is violated, the point cannot be a local minimum point. If the sufficient condition is not met, the point may not be an isolated local minimum point.
Comments on use of optimality conditions: Always use the standard form of the NLP problem. Make sure to check all the KKT conditions. Check regularity of the candidate solution points.
Review for midterm exam. Exam will be on Thursday, 9/28/06.
Duality in Nonlinear Programming: Lagrangian duality, or local duality.
Equality-constrained problem: x* is a local minimum for the equality-constrained problem as well as for the Lagrangian function at u*. Given the optimum u, the optimum x can be found by minimizing the Lagrangian. Given u in the neighborhood of its optimum, the x found by minimizing the Lagrangian is also in the neighborhood of its optimum value. Thus there is a unique correspondence between u and x; x = x(u), and x(u) is a differentiable function of u.
Dual function: definition of the dual function; gradient and Hessian of the dual function. Local duality theorem. Maximize the dual function. Example problem.
Generalization to the inequality-constrained problem: maximize the dual function subject to non-negativity of the dual variables. Strong duality theorem; weak duality theorem.
Saddle points. Saddle point theorem. Example problem.
Read: Duality in NLP.
53:235 Applied Optimal Design
Project #2: report due today.
HW#9: Solve Exercise 5.4 using the KKT optimality conditions; check the duality assumption; calculate the dual function; maximize the dual function; show x* = x(u*) and f(x*) = φ(u*). No need to submit.
4.8 DUALITY IN NONLINEAR PROGRAMMING (J.S. Arora)

4.8.1 Introduction
Given a nonlinear programming problem, there is another nonlinear programming problem closely associated with it. The former is called the primal problem, and the latter is called the Lagrangian dual problem, or simply the dual problem. Under certain convexity assumptions, the primal and dual problems have equal optimal cost values, and therefore it is possible to solve the primal problem indirectly by solving the dual problem. As a by-product of one of the duality theorems, we obtain the saddle point necessary optimality conditions, which are explained later. In recent years, duality has played a very important role in the development of optimization theory and numerical methods.
Development of the duality theory requires assumptions about the convexity of the problem. However, to be broadly applicable, the theory should require a minimum of convexity assumptions. This leads to the concept of local convexity and to the local duality theory. In this section, we present only the local duality theory and discuss its computational aspects. The theory can be used to develop computational methods for solving optimization problems. We shall see later that it can be used to develop the so-called multiplier or augmented Lagrangian methods.

4.8.2 Local Duality
4.8.2.1 EQUALITY CONSTRAINT CASE. For the sake of developing the local duality theory, we consider the equality-constrained problem first:

Problem PE:
Minimize f(x), x ∈ Rⁿ  (4.8.1)
subject to gᵢ(x) = 0; i = 1 to p  (4.8.2)

Later on we will extend the theory to problems with both equality and inequality constraints. The theory we are going to present is sometimes called strong duality or Lagrangian duality. We assume that f, gᵢ ∈ C², i = 1 to p. Let x* be a local minimum of Problem PE that is also a
regular point of the constraint set. Then there exists a unique Lagrange multiplier vector u* ∈ Rᵖ such that

∇f(x*) + ∇g(x*)u* = 0  (4.8.3)

where ∇g is an n × p matrix whose columns are the gradients of the constraints. Also, the Hessian of the Lagrange function, ∇²ₓL(x*,u*), where

L(x,u) = f(x) + (u, g(x))  (4.8.4)

must be at least positive semidefinite (second-order necessary condition) on the tangent subspace

M = {y ∈ Rⁿ : (∇gᵢ(x*), y) = 0, i = 1 to p; y ≠ 0}  (4.8.5)

or,

(y, ∇²ₓL(x*,u*)y) ≥ 0 for all y ∈ M  (4.8.6)

Now we introduce the assumption that ∇²ₓL(x*,u*) is actually positive definite; i.e.,

(y, ∇²ₓL(x*,u*)y) > 0 for all y ∈ Rⁿ, y ≠ 0  (4.8.7)

This assumption is necessary for the development of the local duality theory. The assumption guarantees that the Lagrangian of Eq. (4.8.4) is locally convex at x*. It also satisfies the sufficiency condition for x* to be an isolated local minimum of Problem PE. With this assumption, the point x* is not only a local minimum of Problem PE; it is also a local minimum of the unconstrained problem:

Minimize L(x,u*) = f(x) + (u*, g(x))  (4.8.8)

where u* is the vector of Lagrange multipliers at x*. The necessary and sufficient conditions for this unconstrained problem are the same as for the constrained Problem PE (with ∇²ₓL(x*,u*) positive definite). In addition, for any u sufficiently close to u*, the Lagrange function f(x) + (u, g(x)) has a local minimum at a point x near x*. We now establish the condition that x(u) exists and is a differentiable function of u. The Karush-Kuhn-Tucker necessary condition is

∇ₓL(x,u) ≡ ∇f + (∇g)u = 0  (4.8.9)
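The positive-definiteness assumption of Eq. (4.8.7) is straightforward to test numerically: a symmetric matrix is positive definite exactly when all of its eigenvalues are positive. A minimal sketch (the matrix below is an assumed example, not taken from the text):

```python
import numpy as np

# Assumed example Hessian of a Lagrangian (symmetric 2x2 matrix).
H = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

# Positive definite <=> all eigenvalues > 0, i.e. Eq. (4.8.7) holds for every y != 0.
eigvals = np.linalg.eigvalsh(H)      # ascending order for symmetric input
print(eigvals)                       # [1. 3.]
print(bool(np.all(eigvals > 0.0)))   # True: H is positive definite
```

An equivalent check is to attempt a Cholesky factorization, which succeeds only for positive definite matrices.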
Since ∇²ₓL(x*,u*) is positive definite, it is nonsingular. Also because of positive definiteness, ∇²ₓL(x,u) is nonsingular in a neighborhood of (x*,u*). This is a generalization of a theorem from calculus: if a function is positive at a point, it is positive in a neighborhood of that point. ∇²ₓL(x,u) is also the Jacobian of the necessary conditions of Eq. (4.8.9) with respect to x. Therefore, Eq. (4.8.9) has a solution x near x* when u is near u*. Thus, locally there is a unique correspondence between u and x through the solution of the unconstrained problem

Minimize L(x,u) = f(x) + (u, g(x))  (4.8.10)

Furthermore, for a given u, x(u) is a differentiable function (by the Implicit Function Theorem of calculus). The necessary condition for problem (4.8.10) can be written as

∇f(x) + ∇g(x)u = 0  (4.8.11)

and ∇²ₓL(x,u) is positive definite since ∇²ₓL(x*,u*) is positive definite.

Def. 4.8.1 (Dual Function): Near u*, we define the dual function φ by the equation

φ(u) = min_x [f(x) + (u, g(x))] = min_x L(x,u)  (4.8.12)

In this definition, the minimum is taken locally with respect to x near x*. With this definition of the dual function, we can show that locally the original constrained Problem PE is equivalent to unconstrained local maximization of the dual function φ with respect to u. Thus, we can establish an equivalence between a constrained problem in x and an unconstrained problem in u. To establish the duality relation, we must prove two lemmas.

Lemma 4.8.1: The dual function φ(u) has gradient

∇φ(u) = g(x(u))  (4.8.13)

Proof: Let x(u) represent a local minimum of the Lagrange function

L(x,u) = f(x) + (u, g(x))  (4.8.14)

Then the dual function can be written explicitly from Eq. (4.8.12) as

φ(u) = f(x(u)) + (u, g(x(u)))  (4.8.15)

Therefore,
∇φ(u) ≡ dφ/du = ∂L/∂u + (∇_u x)(∂L/∂x) = g(x(u)) + (∇_u x)(∂L/∂x)  (4.8.16)

But ∂L/∂x in Eq. (4.8.16) is zero because x(u) minimizes the Lagrange function of Eq. (4.8.14). This proves the result of Eq. (4.8.13). Lemma 4.8.1 is of extreme practical importance, since it shows that the gradient of the dual function is quite simple to calculate: once the dual function is evaluated by minimization with respect to x, the corresponding g(x), which is the gradient of φ(u), is available without any further calculation.

Lemma 4.8.2: The Hessian of the dual function is

∇²φ(u) = -∇g(x)ᵀ [∇²ₓL(x)]⁻¹ ∇g(x)  (4.8.17)

Proof: By Lemma 4.8.1,

∇²φ(u) = ∇_u(∇φ) = ∇_u g(x(u)) = (∇_u x) ∇g(x)  (4.8.18)

To calculate ∇_u x, we observe that

∇ₓL(x,u) ≡ ∇f(x) + ∇g(x)u = 0  (4.8.19)

where L(x,u) is defined in Eq. (4.8.14). Differentiating Eq. (4.8.19) with respect to u,

∇_u(∇ₓL) = ∇g(x)ᵀ + (∇_u x) ∇²ₓL(x) = 0, so ∇_u x = -∇g(x)ᵀ [∇²ₓL(x)]⁻¹  (4.8.20)

Substituting Eq. (4.8.20) into Eq. (4.8.18), we obtain the result of Eq. (4.8.17) that was to be proved.

Since [∇²ₓL(x)]⁻¹ is positive definite, and since ∇g(x) is of full column rank near x*, the p × p matrix ∇²φ(u) (the Hessian of φ) is negative definite. This observation and the Hessian of φ play a dominant role in the analysis of dual methods.

Theorem 4.8.1 (Local Duality Theorem): Consider the problem

Minimize f(x) subject to g(x) = 0
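To make Def. 4.8.1 and the two lemmas concrete, here is a minimal numerical sketch on a hypothetical problem (the quadratic objective, the linear constraint, and the use of SciPy are assumptions for illustration, not part of the text): minimize f = x₁² + x₂² subject to g = x₁ + x₂ - 2 = 0, for which ∇²ₓL = 2I is positive definite everywhere, so the local duality theory applies globally.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical problem (an assumption for illustration):
#   minimize f(x) = x1^2 + x2^2   subject to   g(x) = x1 + x2 - 2 = 0
# Analytically: x* = (1, 1), u* = -2, f(x*) = 2, and phi(u) = -u^2/2 - 2u.
f = lambda x: x[0] ** 2 + x[1] ** 2
g = lambda x: x[0] + x[1] - 2.0

def x_of_u(u):
    """x(u): minimizer of the Lagrangian for fixed u (Eq. 4.8.10)."""
    return minimize(lambda x: f(x) + u * g(x), np.zeros(2)).x

def phi(u):
    """Dual function phi(u) = min_x L(x, u) (Eq. 4.8.12)."""
    x = x_of_u(u)
    return f(x) + u * g(x)

# Lemma 4.8.1: grad phi(u) = g(x(u)).  Compare against a central difference.
u, h = -1.5, 1e-5
fd_grad = (phi(u + h) - phi(u - h)) / (2 * h)
print(fd_grad, g(x_of_u(u)))        # both approx -0.5

# Lemma 4.8.2: Hess phi = -grad_g^T [Hess_L]^{-1} grad_g, here a 1x1 matrix.
grad_g, H_L = np.array([1.0, 1.0]), 2.0 * np.eye(2)
print(-grad_g @ np.linalg.inv(H_L) @ grad_g)   # -1.0, matching phi''(u) = -1

# Local duality (Theorem 4.8.1): phi is maximized at u* = -2 with phi(u*) = f(x*).
print(phi(-2.0))                    # approx 2.0
```

Note that the Hessian of φ comes out negative definite (the scalar -1), consistent with the observation after Lemma 4.8.2.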
Let (i) x* be a local minimum, (ii) x* be a regular point, (iii) u* be the Lagrange multiplier vector at x*, and (iv) ∇²ₓL(x*,u*) be positive definite. Then the dual problem

Maximize φ(u)

has a local solution at u* with x* = x(u*). The maximum value of the dual function is equal to the minimum value of f(x); i.e., φ(u*) = f(x*).

Proof: It is clear that x* = x(u*) by the definition of φ. Now at u* we have, by Lemma 4.8.1, ∇φ(u*) = g(x*) = 0, and by Lemma 4.8.2 the Hessian of φ is negative definite. Thus, u* satisfies the first-order necessary and second-order sufficiency conditions for an unconstrained local maximum of φ. Substituting u* into the definition of φ in Eq. (4.8.15),

φ(u*) = f(x(u*)) + (u*, g(x(u*))) = f(x*) + (u*, g(x*)) = f(x*)

which was to be proved.

4.8.2.2 INEQUALITY CONSTRAINT CASE. Consider the inequality-constrained problem:

Problem P:
Minimize f(x)
subject to x ∈ S
S = {x ∈ Rⁿ : gᵢ(x) = 0, i = 1 to p; gᵢ(x) ≤ 0, i = p+1 to m}  (4.8.21)

Define the Lagrange function as

L(x,u) = f(x) + (u, g(x)), with uᵢ ≥ 0 for i > p  (4.8.22)

The dual function for Problem P is defined as

φ(u) = min_x L(x,u); uᵢ ≥ 0, i > p  (4.8.23)

The dual problem is defined as

Maximize φ(u)
subject to uᵢ ≥ 0, i > p  (4.8.24)

Theorem 4.8.2 (Strong Duality Theorem): Let (i) x* be a local minimum of Problem P, (ii) x* be a regular point, (iii) ∇²ₓL(x*,u*) be positive definite, and (iv) u* be the Lagrange multiplier vector at the optimum point x*. Then u* solves the dual problem defined in Eqs. (4.8.23) and (4.8.24) with f(x*) = φ(u*) and x* = x(u*).

If the assumption of positive definiteness of ∇²ₓL(x*,u*) is not made, we get the weak duality theorem.

Theorem 4.8.3 (Weak Duality Theorem): Let x be a feasible solution to Problem P and let u be a feasible solution to the dual problem defined in Eqs. (4.8.23) and (4.8.24); i.e., gᵢ(x) = 0, i = 1 to p; gᵢ(x) ≤ 0, i = p+1 to m; and uᵢ ≥ 0, i = p+1 to m. Then

φ(u) ≤ f(x)

Proof: By definition,

φ(u) = min_x L(x,u) = min_x [f(x) + (u, g(x))] ≤ f(x) + (u, g(x)) ≤ f(x)

since uᵢ ≥ 0 and gᵢ(x) ≤ 0 for i = p+1 to m, and gᵢ(x) = 0 for i = 1 to p.

From Theorem 4.8.3, we obtain the following results:
1. Minimum [f(x), x ∈ S] ≥ Maximum [φ(u), uᵢ ≥ 0 for i = p+1 to m].
2. If f(x*) = φ(u*) with uᵢ* ≥ 0, i = p+1 to m and x* ∈ S, then x* and u* solve the primal and dual problems, respectively.
3. If Minimum [f(x), x ∈ S] = -∞, then the dual is infeasible, and vice versa (i.e., if the dual is infeasible, the primal is unbounded).
4. If Maximum [φ(u), uᵢ ≥ 0 for i > p] = +∞, then the primal problem has no feasible solution, and vice versa (i.e., if the primal is infeasible, the dual is unbounded).
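The dual problem above can be attacked directly with a simple dual-ascent iteration: for the current u, minimize the Lagrangian in x; step u along ∇φ(u) = g(x(u)) (Lemma 4.8.1); and project the inequality multipliers back onto uᵢ ≥ 0. A minimal sketch, where the problem data, step size, and iteration count are assumptions for illustration, not from the text:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical inequality-constrained problem (an assumption):
#   minimize f(x) = (x1-2)^2 + (x2-2)^2   subject to   g(x) = x1 + x2 - 2 <= 0
# The constraint is active at the optimum: x* = (1, 1), u* = 2, f(x*) = 2.
f = lambda x: (x[0] - 2.0) ** 2 + (x[1] - 2.0) ** 2
g = lambda x: x[0] + x[1] - 2.0

def x_of_u(u):
    """Minimize the Lagrangian in x for a fixed multiplier u."""
    return minimize(lambda x: f(x) + u * g(x), np.zeros(2)).x

u, step = 0.0, 0.5
for _ in range(60):
    x = x_of_u(u)
    u = max(0.0, u + step * g(x))   # ascent along grad phi = g, project onto u >= 0

print(round(u, 4), np.round(x_of_u(u), 4))   # converges to u = 2.0, x = (1, 1)

# Weak duality (Theorem 4.8.3): any u >= 0 gives a lower bound phi(u) <= f(x*) = 2.
phi = lambda u: f(x_of_u(u)) + u * g(x_of_u(u))
print(phi(1.0) <= 2.0)                       # True (phi(1) = 1.5 here)
```

Each dual iterate thus supplies a certified lower bound on the optimal primal cost, which is exactly the practical use of Lemma 4.8.3 below.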
Lemma 4.8.3 (Lower Bound for Primal Cost Function): Let u ∈ Rᵐ. Then for any u with uᵢ ≥ 0, i = p+1 to m,

φ(u) ≤ f(x*)

Proof: φ(u) ≤ Maximum [φ(u); uᵢ ≥ 0, i = p+1 to m] = f(x*).

The above lemma is quite useful for practical applications: it tells us how to find a lower bound on the optimal primal cost function. The dual cost function, for arbitrary uᵢ, i = 1 to p and uᵢ ≥ 0, i = p+1 to m, provides a lower bound on the optimal cost. For any x ∈ S, f(x) provides an upper bound on the optimal cost.

Def. 4.8.2 (Saddle Points): Let L(x,u) be the Lagrange function with u ∈ Rᵐ. L has a saddle point at (x*,u*) subject to uᵢ ≥ 0, i = p+1 to m if

L(x*,u) ≤ L(x*,u*) ≤ L(x,u*)

holds for all x near x* and all u near u* with uᵢ ≥ 0 for i = p+1 to m.

Theorem 4.8.4 (Saddle Point Theorem): Consider the NLP problem: Minimize f(x), x ∈ S. Let f, gᵢ ∈ C², i = 1 to m, and let L(x,u) be defined as

L(x,u) = f(x) + (u, g(x))

Let L(x*,u*) exist with uᵢ* ≥ 0, i = p+1 to m. Also let ∇²ₓL(x*,u*) be positive definite. Then x*, satisfying a suitable constraint qualification, is a local minimum of the NLP problem if and only if (x*,u*) is a saddle point of the Lagrangian; i.e.,

L(x*,u) ≤ L(x*,u*) ≤ L(x,u*)

for all x near x* and all u near u* with uᵢ ≥ 0 for i = p+1 to m. See Bazaraa and Shetty (1979), p. 185 for a proof.

Example: Consider the following problem in two variables (Ref. ):

Minimize f = -x₁x₂
subject to (x₁ - 3)² + x₂² = 5
Let us first solve the problem using the optimality conditions. The Lagrangian for the problem is defined as

L = -x₁x₂ + u[(x₁ - 3)² + x₂² - 5]  (a)

The first-order necessary conditions are

-x₂ + (2x₁ - 6)u = 0  (b)
-x₁ + 2x₂u = 0  (c)

together with the equality constraint. These equations have the solution

x₁ = 4, x₂ = 2, u = 1, f = -8  (d)

The Hessian of the Lagrangian at this point is

∇²ₓL = [  2  -1 ]
       [ -1   2 ]  (e)

Since this is positive definite, we conclude that the solution obtained is an isolated local minimum. Since ∇²ₓL(x*) is positive definite, we can apply the local duality theory near the solution. Define the dual function as

φ(u) = min_x L(x,u)  (f)

Solving Eqs. (b) and (c), we get x₁ and x₂ in terms of u as

x₁ = 12u² / (4u² - 1)  (g)
x₂ = 6u / (4u² - 1)  (h)

provided 4u² - 1 ≠ 0. Substituting Eqs. (g) and (h) into Eq. (f), the dual function is given as

φ(u) = (4u³ + 4u - 80u⁵) / (4u² - 1)²  (i)

valid for u ≠ ±1/2. This φ has a local maximum at u* = 1. Substituting u = 1 into Eqs. (g) and (h), we recover the solution of Eq. (d). Note that φ(u*) = -8 (= f*).
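The example can be verified numerically; a minimal sketch (the use of SciPy and the choice of starting point are assumptions, not part of the text):

```python
import numpy as np
from scipy.optimize import minimize

# The example problem: f = -x1*x2, constraint (x1-3)^2 + x2^2 - 5 = 0.
f = lambda x: -x[0] * x[1]
g = lambda x: (x[0] - 3.0) ** 2 + x[1] ** 2 - 5.0

def phi(u):
    """phi(u) = min_x L(x, u); for u near u* = 1 the Lagrangian is locally convex."""
    res = minimize(lambda x: f(x) + u * g(x), x0=np.array([3.0, 1.0]))
    return res.fun, res.x

# Closed form of Eq. (i): phi(u) = (4u^3 + 4u - 80u^5) / (4u^2 - 1)^2
phi_i = lambda u: (4 * u**3 + 4 * u - 80 * u**5) / (4 * u**2 - 1) ** 2

val, x = phi(1.0)
print(val, x)                     # approx -8.0 at x = (4, 2)
print(phi_i(1.0))                 # -8.0, confirming phi(u*) = f* = -8
print(phi_i(0.9) < phi_i(1.0))    # True: nearby dual values are smaller
print(phi_i(1.1) < phi_i(1.0))    # True: so u* = 1 is a local maximum of phi
```

Minimizing the Lagrangian at u = 1 recovers x* = (4, 2) directly, and the closed-form dual of Eq. (i) peaks at u* = 1 with φ(u*) = f* = -8, as the local duality theorem requires.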