Nonlinear Optimization
Etienne de Klerk (UvT) / Kees Roos
e-mail: C.Roos@ewi.tudelft.nl
URL: http://www.isa.ewi.tudelft.nl/ roos
Course WI3031 (Week 4), February-March, A.D. 2005
Optimization Group 1
Outline for today
- NECESSARY optimality conditions for convex constrained optimization;
- the Farkas lemma;
- the Lagrange function;
- saddle points of the Lagrange function;
- the Karush-Kuhn-Tucker (KKT) optimality conditions revisited;
- examples.
In the course notes: Section 2.2.3 till the end of Chapter 2.
The generic constrained convex problem (CO)

(CO)   min f(x)   s.t.   g_j(x) ≤ 0, j = 1, ..., m,   x ∈ C,

where
- C ⊆ R^n is a convex set;
- f, g_1, ..., g_m are convex functions on C (or on an open set that contains the set C).
The set of feasible solutions will be denoted by F, hence

F = {x ∈ C : g_j(x) ≤ 0, j = 1, ..., m}.
Slater points
Recall from last week:
Definition: A vector (point) x^0 ∈ C^0 is called a Slater point of (CO) if
- g_j(x^0) < 0 for all j where g_j is nonlinear,
- g_j(x^0) ≤ 0 for all j where g_j is linear.
If a Slater point exists we say that (CO) is Slater regular, or that (CO) satisfies the Slater condition, or that (CO) satisfies the Slater constraint qualification.
Slater points: singular constraints
Some constraint functions g_j(x) might take the value zero for all feasible points. Such constraints are called singular, while the others are called regular. We define the index sets J_s for the singular and J_r for the regular constraints:

J_s = {j ∈ J : g_j(x) = 0 for all x ∈ F},
J_r = J \ J_s = {j ∈ J : g_j(x) < 0 for some x ∈ F}.

Remark: Note that if (CO) is Slater regular, then all singular constraints must be linear.
Ideal Slater points
Definition: A point x* ∈ C^0 is called an ideal Slater point of (CO) if
g_j(x*) < 0 for all j ∈ J_r, and g_j(x*) = 0 for all j ∈ J_s.
(The second condition holds automatically for any feasible point, by definition of J_s.)
Lemma: If (CO) is Slater regular then there exists an ideal Slater point x* ∈ F.
NB: An ideal Slater point is in the relative interior of F.
Example
Consider the optimization problem

min f(x)
s.t. x_1^2 + x_2^2 ≤ 4,
     x_1 + x_2 ≥ 2,
     x_2 ≤ 1,
     C = R^2.

(Figure: the feasible region F in the (x_1, x_2)-plane.)
The point (1, 1) is a Slater point, but not an ideal Slater point. The point (3/2, 3/4) is an ideal Slater point.
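As a quick sanity check, the Slater and ideal-Slater properties of these two points can be verified numerically. A minimal Python sketch, assuming the three constraints as stated above (all of them regular here, since each admits a strictly feasible point):

```python
# Constraints in g_j(x) <= 0 form; "linear" flags mark the linear g_j.
def g(x):
    x1, x2 = x
    return [x1**2 + x2**2 - 4,   # nonlinear: disk of radius 2
            2 - x1 - x2,         # linear: x1 + x2 >= 2
            x2 - 1]              # linear: x2 <= 1

linear = [False, True, True]

def is_slater(x):
    # strict inequality for nonlinear constraints, non-strict for linear ones
    return all(v < 0 if not lin else v <= 0 for v, lin in zip(g(x), linear))

def is_ideal(x):
    # strict inequality for every regular constraint (all three, here)
    return all(v < 0 for v in g(x))

print(is_slater((1, 1)), is_ideal((1, 1)))            # True False
print(is_slater((1.5, 0.75)), is_ideal((1.5, 0.75)))  # True True
```

At (1, 1) the two linear constraints are active, which is allowed for a Slater point but rules out an ideal one.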
Convex Farkas Lemma
The number a is a lower bound for the optimal value of (CO) if and only if the inequality system

f(x) < a, g_j(x) ≤ 0 (j = 1, ..., m), x ∈ C    (1)

has no solution.
Lemma 2.22 (Farkas): If the inequality system (1) satisfies the Slater condition, then it has no solution if and only if there exists a vector y = (y_1, ..., y_m) ≥ 0 such that

f(x) + Σ_{j=1}^m y_j g_j(x) ≥ a for all x ∈ C.    (2)

The systems (1) and (2) are called alternative systems, because exactly one of them has a solution.
Example: application of the Farkas lemma
Let us consider the convex optimization problem

(CO) min 1 + x s.t. x^2 - 1 ≤ 0, x ∈ R.

Then (CO) is Slater regular (why?). The optimal value is 0. Hence, the system

1 + x < 0, x^2 - 1 ≤ 0, x ∈ R

has no solution. By the Farkas lemma there must exist a real number y ≥ 0 such that

1 + x + y(x^2 - 1) ≥ 0, x ∈ R.

Indeed, taking y = 1/2 we get

g(x) = 1 + x + y(x^2 - 1) = (1/2)x^2 + x + 1/2 = (1/2)(x + 1)^2 ≥ 0.
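The certificate y = 1/2 is easy to check numerically. A small Python sketch (the grid is only illustrative; the identity (1/2)(x + 1)^2 ≥ 0 is what proves nonnegativity for all x):

```python
# Farkas certificate y = 1/2 for min 1 + x s.t. x^2 - 1 <= 0:
# phi(x) = 1 + x + y*(x^2 - 1) should be nonnegative for ALL real x.
y = 0.5
phi = lambda x: 1 + x + y * (x**2 - 1)

xs = [i / 100 for i in range(-500, 501)]   # grid on [-5, 5]
assert min(phi(x) for x in xs) >= 0        # phi(x) = 0.5*(x+1)^2 >= 0
assert phi(-1) == 0                        # minimum attained at x = -1

# optimal value of (CO) itself: min of 1 + x over the feasible set [-1, 1]
feas = [x for x in xs if x**2 - 1 <= 0]
print(min(1 + x for x in feas))            # 0.0
```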
Exercise 2.9
Let A ∈ R^{m×n} and b ∈ R^m. Exactly one of the following alternative systems (I) or (II) is solvable:

(I) A^T x ≥ 0, x ≥ 0, b^T x < 0,

or

(II) Ay ≤ b, y ≥ 0.

This follows from the convex Farkas lemma applied with a = 0,

f(x) = b^T x, g_j(x) = -(A^T x)_j = -(a^j)^T x, j = 1, ..., n,

where a^j denotes column j of A, and C is the positive orthant (nonnegative vectors) of R^m.
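As an illustration, system (II) can be tested for concrete data with an off-the-shelf LP solver. A sketch using scipy.optimize.linprog; the matrix A and vector b below are made-up example data, not from the exercise:

```python
# Feasibility check of system (II): find y >= 0 with A y <= b.
# A and b are illustrative data only (m = 3, n = 2).
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 3.0])

# zero objective: we only ask the solver for a feasible point
res = linprog(c=np.zeros(2), A_ub=A, b_ub=b, bounds=[(0, None)] * 2)
print(res.success)   # True: (II) is solvable (y = 0 already works)
```

Since (II) is solvable, the lemma says (I) has no solution; indeed, with b ≥ 0 here, b^T x < 0 is impossible for x ≥ 0.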
Lagrangean
The Lagrange function (or Lagrangean) of (CO) is defined by

L(x, y) := f(x) + Σ_{j=1}^m y_j g_j(x), where x ∈ C and y ≥ 0.

Note that the Lagrangean is convex in x and linear in y.
If x ∈ F then g_j(x) ≤ 0 for each j. Since y ≥ 0 we then have Σ_{j=1}^m y_j g_j(x) ≤ 0. So

L(x, y) ≤ f(x), x ∈ F, y ≥ 0.

Equality holds if and only if Σ_{j=1}^m y_j g_j(x) = 0, which is equivalent to y_j g_j(x) = 0, j = 1, ..., m.
Another consequence is that for x ∈ F we have

sup_{y ≥ 0} L(x, y) = f(x).
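On the earlier one-dimensional example (f(x) = 1 + x, g(x) = x^2 - 1) the bound L(x, y) ≤ f(x) for feasible x is easy to check numerically. A Python sketch on a finite grid:

```python
# Weak-duality check: for feasible x and y >= 0 the Lagrangean
# L(x, y) = f(x) + y*g(x) never exceeds f(x).
def f(x): return 1 + x
def g(x): return x**2 - 1
def L(x, y): return f(x) + y * g(x)

xs = [i / 10 for i in range(-10, 11)]   # feasible grid: [-1, 1]
ys = [0, 0.5, 1, 2, 5]
assert all(L(x, y) <= f(x) + 1e-12 for x in xs for y in ys)

# equality exactly when y*g(x) = 0: y = 0, or x on the boundary x = +-1
assert L(1.0, 2.0) == f(1.0)
```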
Saddle points
Definition: A vector pair (x*, y*), with x* ∈ C and y* ≥ 0, is called a saddle point of the Lagrange function L(x, y) if

L(x*, y) ≤ L(x*, y*) ≤ L(x, y*), for all x ∈ C, y ≥ 0.

Due to the definition of the Lagrangean this means

f(x*) + Σ_{j=1}^m y_j g_j(x*) ≤ f(x*) + Σ_{j=1}^m y_j^* g_j(x*) ≤ f(x) + Σ_{j=1}^m y_j^* g_j(x), for all x ∈ C, y ≥ 0,

or, equivalently,

Σ_{j=1}^m y_j g_j(x*) ≤ Σ_{j=1}^m y_j^* g_j(x*) for all y ≥ 0,
f(x*) + Σ_{j=1}^m y_j^* g_j(x*) ≤ f(x) + Σ_{j=1}^m y_j^* g_j(x) for all x ∈ C.

N.B. Skip Lemma 2.26 in the book. It is false!!
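Continuing the one-dimensional example, the pair (x*, y*) = (-1, 1/2) found via the Farkas lemma is in fact a saddle point. A grid-based Python sketch (finite samples stand in for all of C = R and all y ≥ 0):

```python
# Saddle-point check for f(x) = 1 + x, g(x) = x^2 - 1:
# verify L(x*, y) <= L(x*, y*) <= L(x, y*) on sample grids.
def L(x, y):
    return 1 + x + y * (x**2 - 1)

xstar, ystar = -1.0, 0.5
xs = [i / 10 for i in range(-50, 51)]   # sample of C = R
ys = [i / 10 for i in range(0, 51)]     # sample of y >= 0

assert all(L(xstar, y) <= L(xstar, ystar) + 1e-12 for y in ys)
assert all(L(xstar, ystar) <= L(x, ystar) + 1e-12 for x in xs)
print(L(xstar, ystar))                  # 0.0, the optimal value
```

Here L(x*, y) = 0 for every y because g(x*) = 0, and L(x, y*) = (1/2)(x + 1)^2 ≥ 0, so both inequalities hold with x* a global minimizer of L(·, y*).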
Karush-Kuhn-Tucker Theorem
Recall the saddle-point characterization:

Σ_{j=1}^m y_j g_j(x*) ≤ Σ_{j=1}^m y_j^* g_j(x*) for all y ≥ 0,
f(x*) + Σ_{j=1}^m y_j^* g_j(x*) ≤ f(x) + Σ_{j=1}^m y_j^* g_j(x) for all x ∈ C.

Theorem 2.27: The problem (CO) is given. Assume that the Slater regularity condition is satisfied. The vector x* is an optimal solution of (CO) if and only if there is a vector y* such that (x*, y*) is a saddle point of the Lagrange function L.
As we will see, the "if" part does not require regularity or convexity: a saddle point of the Lagrangean always corresponds to an optimal solution! The proof of the "only if" part will make clear that we need convexity and Slater regularity to ensure that the Lagrangean has a saddle point if (CO) has an optimal solution.
A corollary of the proof of the "if" part is that for any saddle point (x*, y*) we have the so-called complementarity property:

y_j^* g_j(x*) = 0, j = 1, ..., m.
Karush-Kuhn-Tucker Theorem: proof of the "if" part
Let (x*, y*) be a saddle point. Then x* ∈ C, y* ≥ 0, and L(x*, y) ≤ L(x*, y*) ≤ L(x, y*) for all x ∈ C and all y ≥ 0.
The first inequality gives

Σ_{j=1}^m y_j g_j(x*) ≤ Σ_{j=1}^m y_j^* g_j(x*), for all y ≥ 0.

The right-hand side is fixed. If g_j(x*) > 0 for some j, we can let the left-hand side go to infinity by taking y_i = 0 for i ≠ j and y_j → ∞. This contradiction makes clear that g_j(x*) ≤ 0 for all j. Thus it follows that x* ∈ F.
Since y* ≥ 0 and g_j(x*) ≤ 0, we see that the right-hand side is ≤ 0. By taking y = 0 the left-hand side equals 0. Hence we must have Σ_{j=1}^m y_j^* g_j(x*) = 0.
The second inequality now gives that x* is optimal, since for x ∈ F ⊆ C one has

f(x*) = f(x*) + Σ_{j=1}^m y_j^* g_j(x*) ≤ f(x) + Σ_{j=1}^m y_j^* g_j(x) ≤ f(x).
Karush-Kuhn-Tucker Theorem: proof of the "only if" part
Let us take an optimal solution x* of (CO). Then the inequality system

f(x) < f(x*), g_j(x) ≤ 0 (j = 1, ..., m), x ∈ C

is infeasible. By the convex Farkas lemma there must exist a y* ≥ 0 such that

f(x) + Σ_{j=1}^m y_j^* g_j(x) ≥ f(x*), for all x ∈ C.

Taking x = x* we obtain Σ_{j=1}^m y_j^* g_j(x*) ≥ 0. The converse inequality also holds (since y* ≥ 0 and g_j(x*) ≤ 0), hence Σ_{j=1}^m y_j^* g_j(x*) = 0. Consequently, f(x*) = L(x*, y*). Now we have, for all x ∈ C and y ≥ 0,

L(x*, y) = f(x*) + Σ_{j=1}^m y_j g_j(x*) ≤ f(x*) ≤ f(x) + Σ_{j=1}^m y_j^* g_j(x) = L(x, y*),

proving that (x*, y*) is a saddle point of L(x, y).
Karush-Kuhn-Tucker (KKT) points
Definition: Let us assume that C = R^n and the functions f, g_1, ..., g_m are continuously differentiable. The vector (x*, y*) ∈ R^{n+m} is called a Karush-Kuhn-Tucker (KKT) point of (CO) if

(i) g_j(x*) ≤ 0, for all j = 1, ..., m,
(ii) 0 = ∇f(x*) + Σ_{j=1}^m y_j^* ∇g_j(x*),
(iii) y_j^* g_j(x*) = 0, for all j = 1, ..., m,
(iv) y* ≥ 0.

NB: Saddle point of L ⇔ KKT point of (CO) if the Slater condition holds.
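Conditions (i)-(iv) translate directly into a numeric test. A minimal one-dimensional Python sketch, applied to the earlier example min 1 + x s.t. x^2 - 1 ≤ 0, whose KKT point is x* = -1, y* = 1/2:

```python
# Generic KKT check for a one-dimensional problem (n = 1):
# g is a list of constraint functions, grad_f / grad_g their derivatives.
def is_kkt(x, y, g, grad_f, grad_g, tol=1e-9):
    return (all(gj(x) <= tol for gj in g)                    # (i)   feasibility
            and abs(grad_f(x) + sum(yj * dgj(x)
                    for yj, dgj in zip(y, grad_g))) <= tol   # (ii)  stationarity
            and all(abs(yj * gj(x)) <= tol
                    for yj, gj in zip(y, g))                 # (iii) complementarity
            and all(yj >= -tol for yj in y))                 # (iv)  dual feasibility

g = [lambda x: x**2 - 1]
grad_f = lambda x: 1.0
grad_g = [lambda x: 2 * x]

print(is_kkt(-1.0, [0.5], g, grad_f, grad_g))   # True
print(is_kkt(0.0, [0.0], g, grad_f, grad_g))    # False: gradient not zero
```

Note that y* = 1/2 is exactly the Farkas multiplier found earlier, as the theory predicts.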
Karush-Kuhn-Tucker (KKT) points (ctd.)
Via Exercises 2.12 and 2.13 one can prove the following corollary of the Karush-Kuhn-Tucker theorem.
Corollary 2.30: Let us assume that C = R^n, the functions f, g_1, ..., g_m are continuously differentiable convex functions, and Slater regularity holds. Then there exists a KKT point (x*, y*) if and only if x* is an optimal solution of (CO).
Discussion
Last week we proved that existence of a KKT point is a sufficient condition for optimality. Today we proved it is also a necessary condition if the Slater condition holds.
Exercise 3.3, parts i, ii, v (instead of Exercise 2.15)
We want to design a box with dimensions l × b × h such that the volume of the box is at least V, and the total surface area is minimal:

min_{l,b,h} 2(lb + bh + lh) s.t. lbh ≥ V, l, b, h > 0.

Non-convex problem! Replace l by e^{x_1}, b by e^{x_2}, h by e^{x_3}:

min_{x_1,x_2,x_3} 2(e^{x_1+x_2} + e^{x_2+x_3} + e^{x_1+x_3}) s.t. x_1 + x_2 + x_3 ≥ ln(V), x_1, x_2, x_3 ∈ R.

Let f(x_1, x_2, x_3) = 2(e^{x_1+x_2} + e^{x_2+x_3} + e^{x_1+x_3}) and g(x_1, x_2, x_3) = ln V - (x_1 + x_2 + x_3). The transformed problem is of the form (CO) and satisfies Slater's regularity condition (why?).
Example (ctd.)
KKT conditions: we are looking for some x ∈ F such that ∇f(x) = -y∇g(x) for some y ≥ 0 with yg(x) = 0. Since ∇g(x) = -(1, 1, 1)^T, this reads

2 (e^{x_1+x_2} + e^{x_1+x_3}, e^{x_1+x_2} + e^{x_2+x_3}, e^{x_1+x_3} + e^{x_2+x_3})^T = y (1, 1, 1)^T.

Solution to this system (KKT point): x_1 = x_2 = x_3 = (1/3) ln(V), y = 4V^{2/3}.
Original variables: l = e^{x_1} = V^{1/3}, b = e^{x_2} = V^{1/3}, h = e^{x_3} = V^{1/3}.
The optimal solution is the cube (l = b = h)!
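The claimed KKT point can be verified numerically; a Python sketch with the illustrative choice V = 8 (so the cube has side V^{1/3} = 2):

```python
# Verify the KKT point x1 = x2 = x3 = ln(V)/3, y = 4*V^(2/3) for V = 8.
import math

V = 8.0
x1 = x2 = x3 = math.log(V) / 3
y = 4 * V ** (2 / 3)

# stationarity: each component of the gradient of f equals y
grad_f = (2 * (math.exp(x1 + x2) + math.exp(x1 + x3)),
          2 * (math.exp(x1 + x2) + math.exp(x2 + x3)),
          2 * (math.exp(x1 + x3) + math.exp(x2 + x3)))
assert all(abs(c - y) < 1e-9 for c in grad_f)

# complementarity: g(x) = ln V - (x1 + x2 + x3) = 0, so y*g(x) = 0
assert abs(math.log(V) - (x1 + x2 + x3)) < 1e-12

# original variables: the cube with side 2 and surface area 24
l = b = h = math.exp(x1)
print(round(l, 6), round(2 * (l*b + b*h + l*h), 6))   # approximately 2.0 24.0
```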