# Date: July 5, Contents

Save this PDF as:

Size: px
Start display at page: ## Transcription

1 2 Lagrange Multipliers Date: July 5, 2001 Contents 2.1. Introduction to Lagrange Multipliers p Enhanced Fritz John Optimality Conditions p Informative Lagrange Multipliers p Pseudonormality and Constraint Qualifications..... p Exact Penalty Functions p Using the Extended Representation p Extensions Under Convexity Assumptions p Notes and Sources p. 65 1

2 2 Lagrange Multipliers Chap. 2 The optimality conditions developed in Chapter 1 apply to any type of constraint set. On the other hand, the constraint set of an optimization problem is usually specified in terms of equality and inequality constraints. If we take into account this structure, we can obtain more refined optimality conditions, involving Lagrange multipliers. These multipliers facilitate the characterization of optimal solutions and often play an important role in computational methods. They are also central in duality theory, as will be discussed in Chapter 3. In this chapter, we provide a new treatment of Lagrange multiplier theory. It is simple and it is also more powerful than the classical treatments. In particular, it establishes new necessary conditions that nontrivially supplement the classical Karush-Kuhn-Tucker conditions. It also deals effectively with problems where in addition to equality and inequality constraints, there is an additional abstract set constraint. The starting point for our development is an enhanced set of necessary conditions of the Fritz John type, which is given in Section 2.2. In Section 2.3, we provide a classification of different types of Lagrange multipliers, and in Section 2.4, we introduce the notion of pseudonormality, which unifies and expands the conditions that guarantee the existence of Lagrange multipliers. In the final sections of the chapter, we discuss various topics related to Lagrange multipliers, including their connection with exact penalty functions. 2.1 INTRODUCTION TO LAGRANGE MULTIPLIERS Let us provide some orientation by first considering a problem with equality constraints of the form minimize f(x) subject to h i (x) = 0, i = 1,..., m. We assume that f : R n R and h i : R n R, i = 1,..., m, are smooth functions. The basic Lagrange multiplier theorem for this problem states that, under appropriate assumptions, for a given local minimum x, there exist scalars λ 1,..., λ m, called Lagrange multipliers, such that f(x ) + λ i h i(x ) = 0. (2.1) These are n equations which together with the m constraints h i (x ) = 0, form a system of n + m equations with n + m unknowns, the vector x and the multipliers λ i. Thus, a constrained optimization problem can be transformed into a problem of solving a system of nonlinear equations.

3 Sec. 2.1 Introduction to Lagrange Multipliers 3 This is the traditional rationale for illustrating the importance of Lagrange multipliers. To see why Lagrange multipliers may ordinarily be expected to exist, consider the case where the h i are linear, so that h i (x) = a i x b i, i = 1,..., m, and some vectors a i and scalars b i. Then it can be seen that the tangent cone of the constraint set at the given local minimum x is T(x ) = {y a iy = 0, i = 1,..., m}, and according to the necessary conditions of Chapter 1, we have f(x ) T(x ). Since T(x ) is the nullspace of the m n matrix having as rows the a i, T(x ) is the range space of the matrix having as columns the a i. It follows that f(x ) can be expressed as a linear combination of the a i, so f(x ) + λ i a i = 0 for some scalars λ i. In the general case where the h i are nonlinear, the argument above would work if we could guarantee that the tangent cone T(x ) can be represented as T(x ) = {y h i (x ) y = 0, i = 1,..., m}. (2.2) Unfortunately, additional assumptions are needed to guarantee the validity of this representation. One such assumption, called regularity of x, is that the gradients h i (x ) are linearly independent; this will be shown in Section 2.4. Thus, when x is regular, there exist Lagrange multipliers. However, in general the equality (2.2) may fail, and there may not exist Lagrange multipliers, as shown by the following example. Example (A Problem with no Lagrange Multipliers) Consider the problem of minimizing subject to the two constraints f(x) = x 1 + x 2 h 1 (x) = (x 1 + 1) 2 + x = 0, h 2 (x) = (x 1 2) 2 + x = 0.

4 4 Lagrange Multipliers Chap. 2 x 2 h 2 (x) = 0 h 1 (x) = 0 f(x * ) = (1,1) h 2 (x * ) = (-4,0) 1 x * h 1 (x * ) = (2,0) 2 x 1 Figure Illustration of how Lagrange multipliers may not exist (cf. Example 2.1.1). The problem here is minimize f(x) = x 1 + x 2 subject to h 1 (x) = (x 1 + 1) 2 + x = 0, h 2 (x) = (x 1 2) 2 + x = 0, with a local minimum x = (0,0) (the only feasible solution). Here f(x ) cannot be expressed as a linear combination of h 1 (x ) and h 2 (x ), so there are no Lagrange multipliers. The geometry of this problem is illustrated in Fig It can be seen that at the local minimum x = (0,0) (the only feasible solution), the cost gradient f(x ) = (1,1) cannot be expressed as a linear combination of the constraint gradients h 1 (x ) = (2,0) and h 2 (x ) = ( 4,0). Thus the Lagrange multiplier condition f(x ) + λ 1 h 1 (x ) + λ 2 h 2 (x ) = 0 cannot hold for any λ 1 and λ 2. The difficulty here is that T(x ) = {0}, while the set {y h 1 (x ) y = 0, h 2 (x ) y = 0} is equal to {(0, γ) γ R}, so the characterization (2.2) fails. Fritz-John Conditions In summary, based on the preceding assertions, there are two possibilities at a local minimum x :

5 Sec. 2.1 Introduction to Lagrange Multipliers 5 (a) The gradients h i (x ) are linearly independent (x is regular). Then, there exist scalars/lagrange multipliers λ 1,..., λ m such that f(x ) + λ i h i(x ) = 0. (b) The gradients h i (x ) are linearly dependent, so there exist scalars λ 1,..., λ m, not all equal to 0, such that λ i h i(x ) = 0. These two possibilities can be lumped into a single condition: at a local minimum x there exist scalars µ 0, λ 1,..., λ m, not all equal to 0, such that µ 0 0 and µ 0 f(x ) + λ i h i (x ) = 0. (2.3) Possibility (a) corresponds to the case where µ 0 > 0, in which case the scalars λ i = λ i /µ 0 are Lagrange multipliers. Possibility (b) corresponds to the case where µ 0 = 0, in which case the condition (2.3) provides no information regarding the existence of Lagrange multipliers. Necessary conditions that involve a nonnegative multiplier µ 0 for the cost gradient, such as the one of Eq. (2.3), are known as Fritz John necessary conditions, and were first proposed in 1948 by John [Joh48]. These conditions can be extended to inequality-constrained problems, and they hold without any further assumptions on x (such as regularity). However, this extra generality comes at a price, because the issue of whether the cost multiplier µ 0 can be taken to be positive is left unresolved. Unfortunately, asserting that µ 0 > 0 is nontrivial under some commonly used assumptions, and for this reason, traditionally Fritz John conditions in their classical form have played a somewhat peripheral role in the development of Lagrange multiplier theory. Nonetheless, the Fritz John conditions, when properly strengthened, can provide a simple and powerful line of analysis, as we will see in the next section. Sensitivity Lagrange multipliers frequently have an interesting interpretation in specific practical contexts. In economic applications they can often be interpreted as prices, while in other problems they represent quantities with concrete physical meaning. It turns out that within our mathematical framework, they can be viewed as rates of change of the optimal cost as

6 6 Lagrange Multipliers Chap. 2 the level of constraint changes. This is fairly easy to show for the case of linear and independent constraints, as indicated in Fig When the constraints are nonlinear, the sensitivity interpretation of Lagrange multipliers is valid, provided some assumptions are satisfied. Typically, these assumptions include the linear independence of the constraint gradients, but also additional conditions involving second derivatives (see e.g., the textbook [Ber99] and the exercises). Inequality Constraints The preceding discussion can be extended to the case where there are both equality and inequality constraints. Consider for example the case of linear inequality constraints: minimize f(x) subject to a j x b j, j = 1,..., r. The tangent cone of the constraint set at a local minimum x is given by T(x ) = { y a j y 0, j A(x ) }, where A(x ) is the set of indices of the active constraints, i.e., those constraints that are satisfied as equations at x, A(x ) = {j a j x = 0}. The necessary condition f(x ) T(x ) can be shown to be equivalent to the existence of scalars µ j, such that f(x ) + µ j a j = 0, µ j 0, j = 1,..., r, µ j = 0, j A(x ). This follows from the characterization of the cone T(x ) as the cone generated by the vectors a j, j A(x ) (Farkas Lemma). The above necessary conditions, properly generalized for nonlinear constraints, are known as the Karush-Kuhn-Tucker conditions. Similar to the equality-constrained case, the typical method to ascertain existence of Lagrange multipliers for the nonlinearly-constrained general problem minimize f(x) subject to h i (x) = 0, i = 1,..., m, g j (x) 0, j = 1,..., r, (2.4)

7 Sec. 2.1 Introduction to Lagrange Multipliers 7 f(x * ) a x * + x a'x = b + b x * x a'x = b Figure Illustration of the sensitivity theorem for a problem involving a single linear constraint, minimize f(x) subject to a x = b, where a 0. Here, x is a local minimum and λ is a corresponding Lagrange multiplier. If the level of constraint b is changed to b + b, the minimum x will change to x + x. Since b + b = a (x + x) = a x + a x = b + a x, we see that the variations x and b are related by a x = b. Using the Lagrange multiplier condition f(x ) = λ a, the corresponding cost change can be written as cost = f(x + x) f(x ) = f(x ) x + o( x ) = λ a x + o( x ). By combining the above two relations, we obtain cost = λ b + o( x ), so up to first order we have λ = cost b. Thus, the Lagrange multiplier λ gives the rate of optimal cost decrease as the level of constraint increases. In the case where there are multiple constraints a i x = b i, i = 1,..., m,the preceding argument can be appropriately modified. In particular, assuming the a i are linearly independent, for any set for changes b i the system a i x = b i+ b i has a solution x + x, and we have cost = f(x + x) f(x ) = f(x ) x + o( x ) = λ i a i x + o( x ), and a i x = b i for all i, so we obtain cost = m λ i b i + o( x ).

8 8 Lagrange Multipliers Chap. 2 is to assume structure which guarantees that the tangent cone has the form T(x ) = { y h i (x ) y = 0, i = 1,..., m, g j (x ) y 0, j A(x ) } ; (2.5) a condition known as quasiregularity. This is the classical line of development of Lagrange multiplier theory: conditions implying quasiregularity (known as constraint qualifications) are established, and the existence of Lagrange multipliers is then inferred using Farkas Lemma. This line of analysis is insightful and intuitive, but has traditionally required fairly complex proofs to show that specific constraint qualifications imply the quasiregularity condition (2.5). A more serious difficulty, however, is that the analysis based on quasiregularity does not extend to the case where, in addition to the equality and inequality constraints, there is an additional abstract set constraint x X. Uniqueness of Lagrange Multipliers For a given local minimum, the set of Lagrange multiplier vectors, call it M, may contain more than one element. Indeed, if M is nonempty, it contains either one or an infinite number of elements. For example, for the case of equality constraints only, M contains a single element if and only if the gradients h 1 (x ),..., h m (x ) are linearly independent. Generally, if M is nonempty, it is a linear manifold, which is a translation of the nullspace of the n m matrix h(x ) = [ h 1 (x ) h m (x ) ]. For the mixed equality/inequality constrained case of problem (2.4), it can be seen that the set of Lagrange multipliers M = (λ, µ) µ 0, f(x ) + λ i h i (x ) + µ j g j (x ) = 0 is a polyhedral set, which (if nonempty) may be bounded or unbounded. What kind of sensitivity information do Lagrange multipliers convey when they are not unique? The classical theory does not offer satisfactory answers to this question, as indicated by the following example. Example Assume that f and the g j are linear functions, f(x) = c x, g j (x) = a jx b j, j = 1,..., r,

9 Sec. 2.1 Introduction to Lagrange Multipliers 9 and for simplicity assume that there are no equality constraints and that all the inequality constraints are active at a minimum x. The Lagrange multiplier vectors are the µ = (µ 1,..., µ r) such that µ 0 and c + µ 1a µ ra r = 0. In a naive extension of the traditional sensitivity interpretation, a positive (or zero) multiplier µ j would indicate that the jth inequality is significant (or insignificant ) in that the cost can be improved (or cannot be improved) through its violation. Figure 2.1.3, however, indicates that for each j, one can find a Lagrange multiplier vector where µ j > 0 and another Lagrange multiplier vector where µ j = 0. Furthermore, one can find Lagrange multiplier vectors where the signs of the multipliers are wrong in the sense that it is impossible to improve the cost while violating only constraints with positive multipliers. Thus the traditional sensitivity interpretation fails generically when there is a multiplicity of Lagrange multipliers. Exact Penalty Functions An important analytical and algorithmic technique in nonlinear programming involves the use of penalty functions, whereby the equality and inequality constraints are discarded and are replaced by additional terms in the cost function that penalize their violation. An important example for the mixed equality and inequality constraint problem (2.4) is the quadratic penalty function Q c (x) = f(x) + c ( hi (x) ) 2 ( + g + 2 j (x) ) 2, where c is a positive penalty parameter, and we use the notation g + j (x) = max{ 0, g j (x) }. We may expect that by minimizing Q c k(x) over X for a sequence {c k } of penalty parameters with c k, we will obtain in the limit a solution of the original problem. Indeed, convergence of this type can generically be shown, and it turns out that typically a Lagrange multiplier vector can also be simultaneously obtained (assuming such a vector exists); see e.g., [Ber99]. We will use these convergence ideas in various proofs, starting with the next section. The quadratic penalty function is not exact in the sense that a local minimum x of the constrained minimization problem is typically not a local minimum of Q c (x) for any value of c. A different type of penalty function is given by F c (x) = f(x) + c h i (x) + g j + (x),

10 10 Lagrange Multipliers Chap. 2 a 2 - c a 1 y a 3 a 4 Hyperplane with normal y x * Constraint set Figure Illustration of Lagrange multipliers for the case of a two-dimensional problem with linear cost and four linear inequality constraints (cf. Example 2.1.2). We assume that c and a 1,..., a 4 lie in the same halfspace, and c lies between a 1 and a 2 on one side and a 3 and a 4 on the other side, as shown in the figure. For (µ 1,..., µ 4 ) to be a Lagrange multiplier vector, we must have µ j 0 for all j, and c = µ 1 a 1 + µ 2 a 2 + µ 3 a 3 + µ 4 a 4. There exist exactly four Lagrange multiplier vectors with two positive multipliers/components and the other two components equal to zero. It can be seen that it is impossible to improve the cost by moving away from x, while violating the constraints with positive multipliers but not violating any constraints with zero multipliers. For example, when µ 1 = 0, µ 2 > 0, µ 3 > 0, µ 4 = 0, moving from x along a direction y such that c y < 0 will violate either the 1st or the 4th inequality constraint. To see this, note that to improve the cost while violating only the constraints with positive multipliers, the direction y must be the normal of a hyperplane that contains the vectors c, a 2, and a 3 in one of its halfspaces, and the vectors a 1 and a 4 in the complementary subspace (see the figure). There also exist an infinite number of Lagrange multiplier vectors where all four multipliers are positive, or any three multipliers are positive and the fourth is zero. For any one of these vectors, it is possible to find a direction y such that c y < 0 and moving from x in the direction of y will violate only the constraints that have positive multipliers. where c is a positive penalty parameter. It can be shown that x is typically a local minimum of F c, provided c is larger than some threshold value. The conditions under which this is true and the threshold value for c bear an intimate connection with Lagrange multipliers. Figure illustrates this connection for the case of a problem with a single inequality constraint. In

11 Sec. 2.1 Introduction to Lagrange Multipliers 11 particular, it can be seen that usually (though not always) if the penalty parameter exceeds the value of a Lagrange multiplier, x is also a local minimum of F c. F c (x) f(x) f(x) g(x) < 0 g(x) < 0 x * x x * x µ * g(x) g(x) (a) (b) Figure Illustration of an exact penalty function for the case of a onedimensional problem with a single inequality constraint. Figure (a) illustrates that, typically, if µ is a Lagrange multiplier and c > µ, then x is also a local minimum of F c (x) = f(x) + cg + (x). Figure (b) illustrates an exceptional case where the penalty function is not exact, even though there exists a Lagrange multiplier. In this case, g(x ) = 0, thus violating the condition of constraint gradient linear independence (we will show later that one condition guaranteeing the exactness of F c is that the constraint gradients at x are linearly independent). Here we have f(x ) = g(x ) = 0, so any nonnegative scalar µ is a Lagrange multiplier. However, it is possible that F c (x) does not have a local minimum at x for any c > 0 (if for example the downward order of growth of f exceeds the upward order of growth of g at x when moving from x towards smaller values). Our Treatment of Lagrange Multipliers Our development of Lagrange multiplier theory in this chapter differs from the classical treatment in a number of ways: (a) The optimality conditions of the Lagrange multiplier type that we will develop are sharper than the classical Kuhn-Tucker conditions (they include extra conditions, which may narrow down the field of candidate local minima). They are also more general in that they apply when in addition to the equality and inequality constraints, there is an additional abstract set constraint. (b) We will simplify the proofs of various Lagrange multiplier theorems by introducing the notion of pseudonormality, which serves as a connecting link between the major constraint qualifications and quasireg-

12 12 Lagrange Multipliers Chap. 2 ularity. This analysis carries through even in the case of an additional set constraint, where the classical proof arguments based on quasiregularity fail. (c) We will see in Section 2.3 that there may be several different types of Lagrange multipliers for a given problem, which can be characterized in terms of their sensitivity properties and the information they provide regarding the significance of the corresponding constraints. We show that one particular Lagrange multiplier vector, the one that has minimum norm, has nice sensitivity properties in that it characterizes the direction of steepest rate of improvement of the cost function for a given level of the norm of the constraint violation. Along that direction, the equality and inequality constraints are violated consistently with the signs of the corresponding multipliers. (d) We will make a connection between pseudonormality, the existence of Lagrange multipliers, and the minimization over X of the penalty function F c (x) = f(x) + c h i (x) + g j + (x), where c is a positive penalty parameter. In particular, we will show in Section 2.5 that pseudonormality implies that local minima of the general nonlinear programming problem (2.4) are also local minima (over just X) of F c when c is large enough. We will also show that this in turn implies the existence of Lagrange multipliers. Much of our analysis is based on an enhanced set of Fritz John necessary conditions that are introduced in the next section. E X E R C I S E S (Second Order Sufficiency Conditions) Assume that f, h, and g are twice continuously differentiable, and let x R n, λ R m, and µ R r satisfy x L(x, λ, µ ) = 0, h(x ) = 0, g(x ) 0, µ j 0, j = 1,..., r,

13 Sec. 2.2 Enhanced Fritz John Optimality Conditions 13 µ j = 0, j / A(x ), y 2 xxl(x, λ, µ )y > 0, for all y 0 such that h i (x ) y = 0, i = 1,..., m, g j (x ) y = 0, j A(x ). Assume also that µ j > 0, j A(x ). Then x is a strict local minimum of f subject to h(x) = 0, g(x) (Sensitivity Under Second Order Conditions) Let x and λ be a local minimum and Lagrange multiplier, respectively, satisfying the second order sufficiency conditions of Exercise 2.1.1, and assume that the gradients h i (x ), i = 1,..., m, g j (x ), j A(x ) are linearly independent. Consider the family of problems minimize f(x) subject to h(x) = u, g(x) v, (2.6) parameterized by the vectors u R m and v R r. Then there exists an open sphere S centered at (u, v) = (0,0) such that for every (u, v) S there is an x(u, v) R n and λ(u, v) R m, µ(u, v) R r, which are a local minimum and associated Lagrange multiplier vectors of problem (2.6). Furthermore, x( ), λ( ), and µ( ) are continuously differentiable in S and we have x(0, 0) = x, λ(0,0) = λ, µ(0, 0) = µ. In addition, for all (u, v) S, there holds u p(u, v) = λ(u, v), v p(u, v) = µ(u, v), where p(u, v) is the optimal cost parameterized by (u, v), p(u, v) = f ( x(u, v) ) (General Sufficiency Condition) Consider the problem minimize f(x) subject to x X, g j (x) 0, j = 1,..., r, where f and g j are real valued functions on R n and X is a given subset of R n. Let x be a feasible vector which together with a vector µ = (µ 1,..., µ r), satisfies µ j 0, j = 1,..., r, µ j = 0, j / A(x ), and minimizes the Lagrangian function L(x, µ ) over x X: x = arg min x X L(x, µ ). Then x is a global minimum of the problem.

14 14 Lagrange Multipliers Chap ENHANCED FRITZ JOHN OPTIMALITY CONDITIONS We will develop general optimality conditions for optimization problems of the form minimize f(x) (2.7) subject to x C, where the constraint set C consists of equality and inequality constraints as well as an additional abstract set constraint X: C = X { x h 1 (x) = 0,..., h m (x) = 0 } { x g 1 (x) 0,..., g r (x) 0 }. (2.8) Except for Section 2.7, where we introduce some convexity conditions, we assume in this chapter that f, h i, g j are smooth functions from R n to R, and X is a nonempty closed set. We will use frequently the tangent and normal cone of X at any x X, which are denoted by T X (x) and N X (x), respectively. We recall from Section 1.5 that X is said to be regular at a point x X if N X (x) = T X (x). Furthermore, if X is convex, then it is regular at all x X. Definition 2.2.1: We say that the constraint set C, as represented by Eq. (2.8), admits Lagrange multipliers at a point x C if for every smooth cost function f for which x is a local minimum of problem (2.7) there exist vectors λ = (λ 1,..., λ m) and µ = (µ 1,..., µ r) that satisfy the following conditions: f(x ) + λ i h i(x ) + µ j µ j g j(x ) y 0, y T X (x ), (2.9) 0, j = 1,..., r, (2.10) µ j = 0, j / A(x ), (2.11) where A(x ) = { j g j (x ) = 0 } is the index set of inequality constraints that are active at x. A pair (λ, µ ) satisfying Eqs. (2.9)- (2.11) is called a Lagrange multiplier vector corresponding to f and x. When there is no danger of confusion, we refer to (λ, µ ) simply as a Lagrange multiplier vector or a Lagrange multiplier, without reference to the corresponding local minimum x and the function f. Figure illustrates the definition of a Lagrange multiplier. Condition (2.11) is referred to as the complementary slackness condition (CS for short). Note that from Eq. (2.9), it follows that the set of Lagrange multiplier vectors

15 Sec. 2.2 Enhanced Fritz John Optimality Conditions 15 corresponding to a given f and x is the intersection of a collection of closed half spaces [one for each y T X (x )], so it is a (possibly empty) closed and convex set. (T X (x * )) * Level Sets of f g(x) < 0 g(x * ) X T X (x * ) f(x * ) x L(x *, µ * ) Figure Illustration of a Lagrange multiplier for the case of a single inequality constraint and the spherical set X shown in the figure. The tangent cone T X (x ) is a halfspace and its polar T X (x ) is the halfline shown in the figure. There is a unique Lagrange multiplier µ, and it is such that x L(x, µ ) belongs to T X (x ). Note that for this example, we have µ > 0, and that there is a sequence {x k } X that converges to x and is such that f(x k ) < f(x ) and g(x k ) > 0 for all k, consistently with condition (iv) of the subsequent Prop The condition (2.9), referred to as the Lagrangian stationarity condition, is consistent with the Lagrange multiplier theory outlined in the preceding section for the case where X = R n. It can be viewed as the necessary condition for x to be a local minimum of the Lagrangian function L(x, λ, µ ) = f(x) + λ i h i(x) + µ j g j(x) over x X. When X is a convex set, Eq. (2.9) is equivalent to f(x ) + λ i h i(x ) + µ j g j(x ) (x x ) 0, x X. (2.12) This is because when X is convex, T X (x ) is equal to the closure of the set of feasible directions F X (x ), which is in turn equal to the set of vectors

16 16 Lagrange Multipliers Chap. 2 of the form α(x x ), where α > 0 and x X. If X = R n, Eq. (2.12) becomes f(x ) + λ i h i(x ) + µ j g j(x ) = 0, which together with the nonnegativity condition (2.10) and the CS condition (2.11), comprise the classical Karush-Kuhn-Tucker conditions. The following proposition is central in the line of development of this chapter. It enhances the classical Fritz John optimality conditions, and forms the basis for enhancing the classical Karush-Kuhn-Tucker conditions as well. The proposition asserts that there exist multipliers corresponding to a local minimum x, including a multiplier µ 0 for the cost function gradient. These multipliers have standard properties [conditions (i)-(iii) below], but they also have a special nonstandard property [condition (iv) below]. This condition asserts that by violating the constraints corresponding to nonzero multipliers, we can improve the optimal cost (the remaining constraints, may also need to be violated, but the degree of their violation is arbitrarily small relative to the other constraints). Proposition 2.2.1: Let x be a local minimum of problem (2.7)- (2.8). Then there exist scalars µ 0, λ 1,..., λ m, and µ 1,..., µ r, satisfying the following conditions: ( (i) µ 0 f(x ) + m λ i h i(x ) + ) r µ j g j(x ) N X (x ). (ii) µ j 0 for all j = 0, 1,..., r. (iii) µ 0, λ 1,..., λ m, µ 1,..., µ r are not all equal to 0. (iv) If the index set I J is nonempty where I = {i λ i 0}, J = {j 0 0 µ j > 0}, there exists a sequence {x k } X that converges to x and is such that for all k, f(x k ) < f(x ), λ i h i(x k ) > 0, i I, µ j g j(x k ) > 0, j J, (2.13) h i (x k ) = o ( w(x k ) ), i / I, g j (x k ) = o ( w(x k ) ), j / J, (2.14) where { } w(x) = min min h i(x), min g j(x). (2.15) i I j J Proof: We use a quadratic penalty function approach. For each k =

17 Sec. 2.2 Enhanced Fritz John Optimality Conditions 17 1, 2,..., consider the penalized problem minimize F k (x) f(x) + k 2 subject to x X S, where we denote ( hi (x) ) 2 + k 2 ( g + j (x) ) x x 2 g + (x) = max { 0, g j (x) }, S = {x x x ɛ}, and ɛ > 0 is such that f(x ) f(x) for all feasible x with x S. Since X S is compact, by Weierstrass theorem, we can select an optimal solution x k of the above problem. We have for all k, F k (x k ) F k (x ), which is can be written as f(x k ) + k 2 ( hi (x k ) ) 2 k ( + g + 2 j (x k ) ) xk x 2 f(x ). (2.16) Since f(x k ) is bounded over X S, we obtain lim h i(x k ) = 0, i = 1,..., m, lim k k g+ j (xk ) = 0, j = 1,..., r; otherwise the left-hand side of Eq. (2.16) would become unbounded from above as k. Therefore, every limit point x of {x k } is feasible, i.e., x C. Furthermore, Eq. (2.16) yields f(x k ) + (1/2) x k x 2 f(x ) for all k, so by taking the limit as k, we obtain f(x) x x 2 f(x ). Since x S and x is feasible, we have f(x ) f(x), which when combined with the preceding inequality yields x x = 0 so that x = x. Thus the sequence {x k } converges to x, and it follows that x k is an interior point of the closed sphere S for all k greater than some k. For k k, we have the necessary optimality condition F k (x k ) T X (x k ), which [by using the formula ( g j + (x)) 2 = 2g + j (x) g j (x)] is written as f(x k ) + ξi k h i(x k ) + ζj k g j(x k ) + (x k x ) T X (x k ), where (2.17) ξ k i = kh i(x k ), ζ k j = kg+ j (xk ). (2.18)

18 18 Lagrange Multipliers Chap. 2 Denote δ k = 1 + (ξi k)2 + (ζj k)2, (2.19) µ k 0 = 1 δ, k λk i = ξk i δ, i = 1,..., m, k µk j = ζk j, j = 1,..., r. δk (2.20) Then by dividing Eq. (2.17) with δ k, we obtain µ k 0 f(xk ) + λ k i h i(x k ) + µ k j g j(x k ) + 1 δ k (xk x ) T X (x k ) Since by construction we have (µ k 0 )2 + (λ k i )2 + (2.21) (µ k j )2 = 1, (2.22) the sequence {µ k 0, λk 1,..., λk m, µ k 1,..., µk r } is bounded and must contain a subsequence that converges to some limit {µ 0, λ 1,..., λ m, µ 1,..., µ r}. From Eq. (2.21) and the defining property of the normal cone N X (x ) [x k x, z k z, and z k T X (x k ) for all k, imply that z N X (x )], we see that µ 0, λ i, and µ j must satisfy condition (i). From Eqs. (2.18) and (2.20), µ 0 and µ j must satisfy condition (ii), and from Eq. (2.22), µ 0, λ i, and µ j must satisfy condition (iii). Finally, to show that condition (iv) is satisfied, assume that I J is nonempty, and note that for all sufficiently large k within the index set K of the convergent subsequence, we must have λ i λk i > 0 for all i I and µ j µk j > 0 for all j J. Therefore, for these k, from Eqs. (2.18) and (2.20), we must have λ i h i(x k ) > 0 for all i I and µ j g j(x k ) > 0 for all j J, while from Eq. (2.16), we have f(x k ) < f(x ) for k sufficiently large (the case where x k = x for infinitely many k is excluded by the assumption that I J is nonempty). Furthermore, the conditions h i (x k ) = o ( w(x k ) ) for all i / I, and g j (x k ) = o ( w(x k ) ) for all j / J are equivalent to { }) λ k i (min = o min i I λk i, min j J µk j, i / I, and { }) µ k j (min = o min i I λk i, min j J µk j, j / J, respectively, so they hold for k K. This proves condition (iv). Q.E.D. Note that condition (iv) of Prop is stronger than the CS condition (2.11). [If µ j > 0, then according to condition (iv), the corresponding

19 Sec. 2.2 Enhanced Fritz John Optimality Conditions 19 jth inequality constraint must be violated arbitrarily close to x [cf. Eq. (2.13)], implying that g j (x ) = 0.] For ease of reference, we refer to condition (iv) as the complementary violation condition (CV for short). This condition can be visualized in the examples of Fig It will turn out to be of crucial significance in the following sections. Level Sets of f (T X (x * )) * Level Sets of f g 1 (x * ) x*... x k g 2 (x * ) g(x) < 0 x*... g(x * ) x k g 1 (x) < 0 f(x * ) X g 2 (x) < 0 f(x * ) (a) (b) Figure Illustration of the CV condition. In the example of (a), where X = R 2, the triplets that satisfy the enhanced Fritz-John conditions are the positive multiples of a unique vector of the form (1, µ 1, µ 2 ) where µ 1 > 0 and µ 2 > 0. It is possible to reduce the cost function while violating both constraints by approaching x along the sequence {x k } shown. The example of (b) is the one of Fig , and is similar to the one of (a) except that X is the shaded sphere shown rather than X = R 2. If X is regular at x, i.e., N X (x ) = T X (x ), condition (i) of Prop becomes µ 0 f(x ) + λ i h i(x ) + µ j g j(x ) T X (x ), or equivalently µ 0 f(x ) + λ i h i(x ) + µ j g j(x ) y 0, y T X (x ). This term is in analogy with complementary slackness, which is the condition that for all j, µ j > 0 implies g j (x ) = 0. Thus complementary violation reflects the condition that for all j, µ j > 0 implies g j (x) > 0 for some x arbitrarily close to x.

20 20 Lagrange Multipliers Chap. 2 If in addition, the scalar µ 0 can be shown to be strictly positive, then by normalization we can choose µ 0 = 1, and condition (i) of Prop becomes equivalent to the Lagrangian stationarity condition (2.9). Thus, if X is regular at x and we can guarantee that µ 0 = 1, the vector (λ, µ ) = {λ 1,..., λ m, µ 1,..., µ r} is a Lagrange multiplier vector, which satisfies the CV condition (a stronger condition than the CS condition, as mentioned above). As an example, suppose that there is no abstract set constraint (X = R n ), and the gradients h i (x ), i = 1,..., m, and g j (x ), j A(x ) are linearly independent, then we cannot have µ 0 = 0, since then condition (i) of Prop would be violated. It follows that there exists a Lagrange multiplier vector, which in this case is unique in view of the linear independence assumption. We thus obtain the Lagrange multiplier theorem alluded to in Section 2.1. This is a classical result, found in almost all nonlinear programming textbooks, but it is obtained here in a stronger form, which includes the assertion that the multipliers satisfy the CV condition in place of the CS condition. To illustrate the use of the generalized Fritz John conditions of Prop and the CV condition in particular, consider the following example. Example Suppose that we convert the problem min h(x)=0 f(x), involving a single equality constraint, to the inequality constrained problem minimize f(x) subject to h(x) 0, h(x) 0. The Fritz John conditions, in their classical form, assert the existence of nonnegative µ 0, λ +, λ, not all zero, such that µ 0 f(x ) + λ + h(x ) λ h(x ) = 0. (2.23) The candidate multipliers that satisfy the above condition as well as the CS condition λ + h(x ) = λ h(x ) = 0, include those of the form µ 0 = 0 and λ + = λ > 0, which provide no relevant information about the problem. However, these multipliers fail the stronger CV condition of Prop , showing that if µ 0 = 0, we must have either λ + 0 and λ = 0 or λ + = 0 and λ 0. Assuming h(x ) 0, this violates Eq. (2.23), so it follows that µ 0 > 0. Thus, by dividing Eq. (2.23) by µ 0, we recover the familiar first order condition f(x )+ λ h(x ) = 0 with λ = (λ + λ )/µ 0, under the assumption h(x ) 0. Note that this deduction would not have been possible without the CV condition. We will explore further the CV condition as a vehicle for characterizing Lagrange multipliers in the next section. Subsequently, in Section 1.4, we will derive conditions that guarantee that one can take µ 0 = 1 in Prop , by using the notion of pseudonormality.

21 Sec. 2.3 Informative Lagrange Multipliers INFORMATIVE LAGRANGE MULTIPLIERS The Lagrange multipliers whose existence is guaranteed by Prop (assuming that µ 0 = 1) are special: they satisfy the stronger CV condition in place of the CS condition. These multipliers provide a significant amount of sensitivity information by in effect indicating which constraints to violate in order to effect a cost reduction. In view of this interpretation, we refer to a Lagrange multiplier vector (λ, µ ) that satisfies, in addition to Eqs. (2.9)- (2.11), the CV condition [condition (iv) of Prop ] as being informative. The salient property of informative Lagrange multipliers is consistent with the classical sensitivity interpretation of a Lagrange multiplier as the rate of cost reduction when the corresponding constraint is violated. Here we are not making enough assumptions for this stronger type of sensitivity interpretation to be valid. Yet it is remarkable that with hardly any assumptions, at least one informative Lagrange multiplier vector exists if X is regular and we can guarantee that we can take µ 0 = 1 in Prop In fact we will show shortly a stronger and more definitive property: if the tangent cone T X (x ) is convex and there exists at least one Lagrange multiplier vector, there exists one that is informative. An informative Lagrange multiplier vector is useful, among other things, if one is interested in identifying redundant constraints. Given such a vector, one may simply discard the constraints whose multipliers are 0 and check to see whether x is still a local minimum. While there is no general guarantee that this will be true, in many cases it will be; for example, as we will show in Chapter 3, in the special case where f and X are convex, the g j are convex, and the h i are linear, x is guaranteed to be a global minimum, even after the constraints whose multipliers are 0 are discarded. Now if discarding constraints is of analytical or computational interest and constraints with 0 Lagrange multipliers can be discarded as indicated above, we are motivated to find multiplier vectors that have a minimal number of nonzero components (a minimal support). We call such Lagrange multiplier vectors minimal, and we define them as having support I J that does not strictly contain the support of any other Lagrange multiplier vector. Minimal Lagrange multipliers are not necessarily informative. For example, think of the case where some of the constraints are duplicates of others. Then in a minimal Lagrange multiplier vector, at most one of each set of duplicate constraints can have a nonzero multiplier, while in an informative Lagrange multiplier vector, either all or none of these duplicate constraints will have a nonzero multiplier. Another interesting example is provided by Fig There are four minimal Lagrange multiplier vectors in this example, and each of them has two nonzero components. However, none of these multipliers is informative. In fact all the other Lagrange multiplier vectors, those involving three out of four, or all four components

22 22 Lagrange Multipliers Chap. 2 positive, are informative. Thus, in the example of Fig , the sets of minimal and informative Lagrange multipliers are nonempty and disjoint. Nonetheless, minimal Lagrange multipliers turn out to be informative after the constraints corresponding to zero multipliers are neglected, as can be inferred by the subsequent Prop In particular, let us say that a Lagrange multiplier (λ, µ ) is strong if in addition to Eqs. (2.9)-(2.11), it satisfies the condition (iv ) If the set I J is nonempty, where I = {i λ i 0} and J = {j 0 µ j > 0}, there exists a sequence {xk } X that converges to x and is such that for all k, f(x k ) < f(x ), λ i h i(x k ) > 0, i I, µ j g j(x k ) > 0, j J. (2.24) This condition resembles the CV condition, but is weaker in that it makes no provision for negligibly small violation of the constraints corresponding to zero multipliers, as per Eqs. (2.14) and (2.15). As a result, informative Lagrange multipliers are also strong, but not reversely. The following proposition, illustrated in Fig , clarifies the relationships between different types of Lagrange multipliers. It requires that the tangent cone T X (x ) be convex, which is true in particular if X is convex. In fact regularity of X at x is sufficient to guarantee that T X (x ) is convex (we mentioned this fact in Section 1.5 without proof; see Rockafellar and Wets [RoW98]). Strong Informative Minimal Lagrange multipliers Figure Relations of different types of Lagrange multipliers, assuming that the tangent cone T X (x ) is convex (which is true in particular if X is regular at x ). Proposition 2.3.1: Let x be a local minimum of problem (2.7)- (2.8). Assume that the tangent cone T X (x ) is convex and that the set of Lagrange multipliers is nonempty. Then:

23 Sec. 2.3 Informative Lagrange Multipliers 23 (a) The set of informative Lagrange multiplier vectors is nonempty, and in fact the Lagrange multiplier vector that has minimum norm is informative. (b) Each minimal Lagrange multiplier vector is strong. Proof: (a) We summarize the essence of the proof argument in the following lemma. Lemma 2.3.1: Let N be a closed convex cone in R n, and let a 0,..., a r be given vectors in R n. Suppose that the closed and convex set M R r given by M = µ 0 a 0 + µ j a j N is nonempty. Then there exists a sequence {d k } N such that a 0 dk µ 2, (2.25) (a j dk ) + µ j, j = 1,..., r, (2.26) where µ is the vector of minimum norm in M and we use the notation (a j dk ) + = max{0, a j dk }. Furthermore, we have 1 2 µ 2 = inf d N a 0 d + 1 ( (a 2 j d) +) 2 = lim k a 0 dk + 1 ( (a 2 j d k ) +) 2. In addition, if the problem (2.27) minimize a 0 d subject to d N, ( (a j d) +) 2 (2.28)

24 24 Lagrange Multipliers Chap. 2 has an optimal solution, denoted d, we have a 0 d = µ 2, (a j d ) + = µ j, j = 1,..., r. (2.29) Proof: For any γ 0, consider the function L γ (d, µ) = a 0 + µ j a j d + γ 2 d µ 2. (2.30) Our proof will revolve around saddle point properties of the convex/concave function L 0, but to derive these properties, we will work with its γ-perturbed and coercive version L γ for γ > 0, and then take the limit as γ 0. From the saddle point theorem of Section 1.4, for all γ > 0, the coercive convex/concave quadratic function L γ has a saddle point, denoted (d γ, µ γ ), over d N and µ 0. This saddle point is unique and can be easily characterized, taking advantage of the quadratic nature of L γ. In particular, the maximization over µ 0 when d = d γ yields µ γ j = (a j dγ ) +, j = 1,..., r (2.31) [to maximize µ j a j dγ (1/2)µ 2 j subject to the constraint µ j 0, we calculate the unconstrained maximum, which is a j dγ, and if it is negative we set it to 0, so that the maximum subject to µ j 0 is attained for µ j = (a j dγ ) + ]. The minimization over d N when µ = µ γ is equivalent to projecting the vector s γ = a 0 + r µγ j a j γ on the cone N, since by adding and subtracting (γ/2) s γ 2, we can write L γ (d, µ γ ) as It follows that L γ (d, µ γ ) = γ 2 d sγ 2 γ 2 sγ µγ 2. (2.32) d γ = P N (s γ ), (2.33) where P N ( ) denotes projection on N. Furthermore, using the projection theorem and the fact that N is a cone, we have so Eq. (2.32) yields s γ 2 d γ s γ 2 = d γ 2 = P N (s γ ) 2, L γ (d γ, µ γ ) = γ 2 P N (sγ ) µγ 2, (2.34)

25 Sec. 2.3 Informative Lagrange Multipliers 25 or equivalently using the definition of s γ : ( ( P N a 0 + )) r L γ (d γ, µ γ µγ j a 2 j ) = 2γ 1 2 µγ 2. (2.35) We will use the preceding facts to show that as γ 0, µ γ µ, while d γ yields the desired sequence d k, which satisfies Eqs. (2.25)-(2.27). Furthermore, we will show that if d is an optimal solution of problem (2.28), then (d, µ ) is a saddle point of the function L 0. We note that { 1 inf L 0(d, µ) = d N 2 µ 2 if µ M, otherwise, so since µ is the vector of minimum norm in M, we obtain for all γ > 0, 1 2 µ 2 = sup inf L 0(d, µ) d N µ 0 inf sup L 0 (d, µ) d N µ 0 inf sup L γ (d, µ) d N µ 0 = L γ (d γ, µ γ ). (2.36) Combining Eqs. (2.35) and (2.36), we obtain P N L γ (d γ, µ γ ) = ( ( a 0 + )) r µγ j a 2 j 1 2 µγ µ 2. 2γ (2.37) From this, we see that µ γ µ, so that µ γ remains bounded as γ 0. By taking the limit above as γ 0, we see that lim γ 0 P N a 0 + µ γ j a j = 0, so (by the continuity ( of ( the projection operation) any limit point of µ γ, call it µ, satisfies P N a 0 + )) ( r µ j a j = 0, or a 0 + ) r µ j a j N. Since µ γ 0, it follows that µ 0, so µ M. We also have µ µ (since µ γ µ ), so by using the minimum norm property of µ, we conclude that any limit point µ of µ γ must be equal to µ. Thus, µ γ µ. From Eq. (2.37) we then obtain L γ (d γ, µ γ ) 1 2 µ 2, (2.38)

26 26 Lagrange Multipliers Chap. 2 while from Eqs. (2.33) and (2.34), we have We also have L γ (d γ, µ γ ) = sup L γ (d γ, µ) µ 0 γ 2 dγ 2 0. (2.39) = a 0 dγ + γ 2 dγ ( (a j d γ ) +) 2 = a 0 dγ + γ 2 dγ µγ 2. (2.40) Taking the limit as γ 0 and using the fact µ γ µ and (γ/2) d γ 2 0 [cf. Eq. (2.39)], it follows that lim L γ(d γ, µ γ ) = lim a γ 0 γ 0 0 dγ µ 2. Combining this equation with Eq. (2.38), we obtain lim γ 0 a 0 dγ = µ 2, which together with the fact a j dγ = µ γ j µ j shown earlier, proves Eqs. (2.25) and (2.26). The maximum of L 0 over µ 0 for fixed d is attained at µ j = (a j d)+ [compare with Eq. (2.31)], so that sup L 0 (d, µ) = a 0 d + 1 ( (a µ 0 2 j d) +) 2. Hence inf sup L 0 (d, µ) = inf d N µ 0 d N Combining this with the equation a 0 d ( (a j d) +) 2. (2.41) inf sup L 0 (d, µ) = lim L γ (d γ, µ γ ) = 1 d N µ 0 γ 0 2 µ 2 [cf. Eqs. (2.36) and (2.38)], and Eqs. (2.39) and (2.40), we obtain the desired relation 1 2 µ 2 = inf d N a 0 d + 1 ( (a 2 j d) +) 2 = lim γ 0 a 0 dγ + 1 ( (a 2 j d γ ) +) 2

27 Sec. 2.3 Informative Lagrange Multipliers 27 [cf. Eq. (2.27)]. Finally, if d attains the infimum in the right-hand side above, Eqs. (2.36), (2.38), and (2.41) show that (d, µ ) is a saddle point of L 0, and that a 0 d = µ 2, (a j d ) + = µ j, j = 1,..., r. Q.E.D. We now return to the proof of Prop (a). For simplicity we assume that all the constraints are inequalities that are active at x (equality constraints can be handled by conversion to two inequalities, and inactive inequality constraints are inconsequential in the subsequent analysis). We will use Lemma with the following identifications: N = T X (x ), a 0 = f(x ), a j = g j (x ), j = 1,..., r, M = set of Lagrange multipliers, µ = Lagrange multiplier of minimum norm. If µ = 0, then µ is an informative Lagrange multiplier and we are done. If µ 0, by Lemma [cf. (2.25) and (2.26)], for any ɛ > 0, there exists a d N = T X (x ) such that a 0d < 0, (2.42) a j d > 0, j J, a j d ɛ min l J a l d, j / J, (2.43) where J = {j µ j > 0}. By suitably scaling the vector d, we can assume that d = 1. Let {x k } X be such that x k x for all k and x k x, x k x x k x d. Using Taylor s theorem for the cost function f, we have for some vector sequence ξ k converging to 0 f(x k ) f(x ) = f(x ) (x k x ) + o( x k x ) = f(x ) ( d + ξ k) x k x + o( x k x ) ( = x k x f(x ) d + f(x ) ξ k + o( xk x ) x k x (2.44) From Eq. (2.42), we have f(x ) d < 0, so we obtain f(x k ) < f(x ) for k sufficiently large. Using also Taylor s theorem for the constraint functions g j, we have for some vector sequence ξ k converging to 0, g j (x k ) g j (x ) = g j (x ) (x k x ) + o( x k x ) ). = g j (x ) ( d + ξ k) x k x + o( x k x ) ( ) = x k x g j (x ) d + g j (x ) ξ k + o( xk x ). x k x (2.45)

28 28 Lagrange Multipliers Chap. 2 This, combined with Eq. (2.43), shows that for k sufficiently large, g j (x k ) is bounded from below by a constant times x k x for all j such that µ j > 0 [and hence g j(x ) = 0], and satisfies g j (x k ) o( x k x ) for all j such that µ j = 0 [and hence g j(x ) 0]. Thus, the sequence {x k } can be used to establish the CV condition for µ, and it follows that µ is an informative Lagrange multiplier. (b) We summarize the essence of the proof argument of this part in the following lemma. Lemma 2.3.2: Let N be a closed convex cone in R n, let a 0, a 1,..., a r be given vectors in R n. Suppose that the closed and convex set M R r given by M = µ 0 a 0 + µ j a j N is nonempty. Among index subsets J {1,..., r} such that for some µ M we have J = {j µ j > 0}, let J {1,..., r} have a minimal number of elements. Then if J is nonempty, there exists a vector d N such that a 0 d < 0, a jd > 0, for all j J. (2.46) Proof: We apply Lemma with the vectors a 1,..., a r replaced by the vectors a j, j J. The subset of M given by M = µ 0 a 0 + µ j a j N, µ j = 0, j / J j J is nonempty by assumption. Let µ be the vector of minimum norm on M. Since J has a minimal number of indices, we must have µ j > 0 for all j J. If J is nonempty, Lemma implies that there exists a d N such that Eq. (2.46) holds. Q.E.D. Given Lemma 2.3.2, the proof of Prop (b) is very similar to the corresponding part of the proof of Prop (a). Q.E.D. Sensitivity Let us consider now the special direction d that appears in Lemma 2.3.1, and is a solution of problem (2.28) (assuming this problem has an optimal

29 Sec. 2.3 Informative Lagrange Multipliers 29 solution). Let us note that this problem is guaranteed to have at least one solution when N is a polyhedral cone. This is because problem (2.28) can be written as minimize a 0 d + 1 zj 2 2 subject to d N, 0 z j, a j d z j, j = 1,..., r, where the z j are auxiliary variables. Thus, if N is polyhedral, then problem (2.28) is a quadratic program with a cost function that is bounded below by Eq. (2.27), and as shown in Section 1.3, it has an optimal solution. An important context where this is relevant is when X = R n in which case N X (x ) = T X (x ) = R n, or more generally when X is polyhedral, in which case T X (x ) is polyhedral. It can also be proved that problem (2.28) has an optimal solution if there exists a vector µ M such that a 0 + µ j a j ri(n), where ri(n) denotes the relative interior of the cone N. We will show this in the next section after we develop the theory of pseudonormality and constraint qualifications. Assuming now that T X (x ) is polyhedral [or more generally, that problem (2.28) has an optimal solution], the line of proof of Prop (a) [combine Eqs. (2.44) and (2.45)] can be used to show that if the Lagrange multiplier that has minimum norm, denoted by (λ, µ ), is nonzero, there exists a sequence {x k } X, corresponding to the vector d T X (x ) of Eq. (2.29) such that f(x k ) = f(x ) λ i h i(x k ) µ j g j(x k ) + o( x k x ). (2.47) Furthermore, the vector d solves problem (2.28), from which it can be seen that d solves the problem minimize f(x ) d ( subject to hi (x ) d ) 2 + where β is given by β = j A(x ) ( gj (x ) d) +) 2 = β, d TX (x ), ( hi (x ) d ) 2 (( + gj (x ) d ) +) 2. j A(x )

### The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization 1 October 2003 The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization 1 by Asuman E. Ozdaglar and Dimitri P. Bertsekas 2 Abstract We consider optimization problems with equality,

### Enhanced Fritz John Optimality Conditions and Sensitivity Analysis Enhanced Fritz John Optimality Conditions and Sensitivity Analysis Dimitri P. Bertsekas Laboratory for Information and Decision Systems Massachusetts Institute of Technology March 2016 1 / 27 Constrained

### Pseudonormality and a Lagrange Multiplier Theory for Constrained Optimization 1 November 2000 Submitted for publication to JOTA Pseudonormality and a Lagrange Multiplier Theory for Constrained Optimization 1 by Dimitri P. Bertsekas and Asuman E. Ozdaglar 2 Abstract We consider optimization

### UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction

### Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version Convex Optimization Theory Chapter 5 Exercises and Solutions: Extended Version Dimitri P. Bertsekas Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com

### Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

### Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

### Convex Optimization & Lagrange Duality Convex Optimization & Lagrange Duality Chee Wei Tan CS 8292 : Advanced Topics in Convex Optimization and its Applications Fall 2010 Outline Convex optimization Optimality condition Lagrange duality KKT

### A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions Angelia Nedić and Asuman Ozdaglar April 15, 2006 Abstract We provide a unifying geometric framework for the

### Chap 2. Optimality conditions Chap 2. Optimality conditions Version: 29-09-2012 2.1 Optimality conditions in unconstrained optimization Recall the definitions of global, local minimizer. Geometry of minimization Consider for f C 1

### In view of (31), the second of these is equal to the identity I on E m, while this, in view of (30), implies that the first can be written 11.8 Inequality Constraints 341 Because by assumption x is a regular point and L x is positive definite on M, it follows that this matrix is nonsingular (see Exercise 11). Thus, by the Implicit Function

### Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

### LECTURE 10 LECTURE OUTLINE LECTURE 10 LECTURE OUTLINE Min Common/Max Crossing Th. III Nonlinear Farkas Lemma/Linear Constraints Linear Programming Duality Convex Programming Duality Optimality Conditions Reading: Sections 4.5, 5.1,5.2,

### A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions Angelia Nedić and Asuman Ozdaglar April 16, 2006 Abstract In this paper, we study a unifying framework

### Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee227c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee227c@berkeley.edu

### Nonlinear Programming and the Kuhn-Tucker Conditions Nonlinear Programming and the Kuhn-Tucker Conditions The Kuhn-Tucker (KT) conditions are first-order conditions for constrained optimization problems, a generalization of the first-order conditions we

### Convex Analysis and Optimization Chapter 2 Solutions Convex Analysis and Optimization Chapter 2 Solutions Dimitri P. Bertsekas with Angelia Nedić and Asuman E. Ozdaglar Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com

### Inequality Constraints Chapter 2 Inequality Constraints 2.1 Optimality Conditions Early in multivariate calculus we learn the significance of differentiability in finding minimizers. In this section we begin our study of the

### Numerical Optimization Constrained Optimization Computer Science and Automation Indian Institute of Science Bangalore 560 012, India. NPTEL Course on Constrained Optimization Constrained Optimization Problem: min h j (x) 0,

### Nonlinear Programming 3rd Edition. Theoretical Solutions Manual Chapter 6 Nonlinear Programming 3rd Edition Theoretical Solutions Manual Chapter 6 Dimitri P. Bertsekas Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts 1 NOTE This manual contains

### Convex Analysis and Optimization Chapter 4 Solutions Convex Analysis and Optimization Chapter 4 Solutions Dimitri P. Bertsekas with Angelia Nedić and Asuman E. Ozdaglar Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com

### 5. Duality. Lagrangian 5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

### LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization

### TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM H. E. Krogstad, IMF, Spring 2012 Karush-Kuhn-Tucker (KKT) Theorem is the most central theorem in constrained optimization, and since the proof is scattered

### Optimization. A first course on mathematics for economists Optimization. A first course on mathematics for economists Xavier Martinez-Giralt Universitat Autònoma de Barcelona xavier.martinez.giralt@uab.eu II.3 Static optimization - Non-Linear programming OPT p.1/45

### Introduction to Optimization Techniques. Nonlinear Optimization in Function Spaces Introduction to Optimization Techniques Nonlinear Optimization in Function Spaces X : T : Gateaux and Fréchet Differentials Gateaux and Fréchet Differentials a vector space, Y : a normed space transformation

### CONSTRAINED NONLINEAR PROGRAMMING 149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach

### Convex Optimization M2 Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization

### Constrained Optimization Theory Constrained Optimization Theory Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. IMA, August 2016 Stephen Wright (UW-Madison) Constrained Optimization Theory IMA, August

### CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS A Dissertation Submitted For The Award of the Degree of Master of Philosophy in Mathematics Neelam Patel School of Mathematics

### Lecture 3. Optimization Problems and Iterative Algorithms Lecture 3 Optimization Problems and Iterative Algorithms January 13, 2016 This material was jointly developed with Angelia Nedić at UIUC for IE 598ns Outline Special Functions: Linear, Quadratic, Convex

### I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010 I.3. LMI DUALITY Didier HENRION henrion@laas.fr EECI Graduate School on Control Supélec - Spring 2010 Primal and dual For primal problem p = inf x g 0 (x) s.t. g i (x) 0 define Lagrangian L(x, z) = g 0

### Linear Programming. Larry Blume Cornell University, IHS Vienna and SFI. Summer 2016 Linear Programming Larry Blume Cornell University, IHS Vienna and SFI Summer 2016 These notes derive basic results in finite-dimensional linear programming using tools of convex analysis. Most sources

### Convex Optimization Boyd & Vandenberghe. 5. Duality 5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

### CS-E4830 Kernel Methods in Machine Learning CS-E4830 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 27. September, 2017 Juho Rousu 27. September, 2017 1 / 45 Convex optimization Convex optimisation This

### Convex Optimization Theory. Athena Scientific, Supplementary Chapter 6 on Convex Optimization Algorithms Convex Optimization Theory Athena Scientific, 2009 by Dimitri P. Bertsekas Massachusetts Institute of Technology Supplementary Chapter 6 on Convex Optimization Algorithms This chapter aims to supplement

### Lecture 1: Introduction. Outline. B9824 Foundations of Optimization. Fall Administrative matters. 2. Introduction. 3. Existence of optima B9824 Foundations of Optimization Lecture 1: Introduction Fall 2009 Copyright 2009 Ciamac Moallemi Outline 1. Administrative matters 2. Introduction 3. Existence of optima 4. Local theory of unconstrained

### Constrained maxima and Lagrangean saddlepoints Division of the Humanities and Social Sciences Ec 181 KC Border Convex Analysis and Economic Theory Winter 2018 Topic 10: Constrained maxima and Lagrangean saddlepoints 10.1 An alternative As an application

### Optimality Conditions Chapter 2 Optimality Conditions 2.1 Global and Local Minima for Unconstrained Problems When a minimization problem does not have any constraints, the problem is to find the minimum of the objective function.

### Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725 Karush-Kuhn-Tucker Conditions Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given a minimization problem Last time: duality min x subject to f(x) h i (x) 0, i = 1,... m l j (x) = 0, j =

### Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

### Extended Monotropic Programming and Duality 1 March 2006 (Revised February 2010) Report LIDS - 2692 Extended Monotropic Programming and Duality 1 by Dimitri P. Bertsekas 2 Abstract We consider the problem minimize f i (x i ) subject to x S, where

### SECTION C: CONTINUOUS OPTIMISATION LECTURE 9: FIRST ORDER OPTIMALITY CONDITIONS FOR CONSTRAINED NONLINEAR PROGRAMMING Nf SECTION C: CONTINUOUS OPTIMISATION LECTURE 9: FIRST ORDER OPTIMALITY CONDITIONS FOR CONSTRAINED NONLINEAR PROGRAMMING f(x R m g HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 5, DR RAPHAEL

### Some Properties of the Augmented Lagrangian in Cone Constrained Optimization MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented

### 5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

### Mathematical Foundations -1- Constrained Optimization. Constrained Optimization. An intuitive approach 2. First Order Conditions (FOC) 7 Mathematical Foundations -- Constrained Optimization Constrained Optimization An intuitive approach First Order Conditions (FOC) 7 Constraint qualifications 9 Formal statement of the FOC for a maximum

### CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008. 1 ECONOMICS 594: LECTURE NOTES CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS W. Erwin Diewert January 31, 2008. 1. Introduction Many economic problems have the following structure: (i) a linear function

### Lecture: Duality. Lecture: Duality http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Introduction 2/35 Lagrange dual problem weak and strong

### The Karush-Kuhn-Tucker (KKT) conditions The Karush-Kuhn-Tucker (KKT) conditions In this section, we will give a set of sufficient (and at most times necessary) conditions for a x to be the solution of a given convex optimization problem. These

### Solutions Chapter 5. The problem of finding the minimum distance from the origin to a line is written as. min 1 2 kxk2. subject to Ax = b. Solutions Chapter 5 SECTION 5.1 5.1.4 www Throughout this exercise we will use the fact that strong duality holds for convex quadratic problems with linear constraints (cf. Section 3.4). The problem of

### Second Order Optimality Conditions for Constrained Nonlinear Programming Second Order Optimality Conditions for Constrained Nonlinear Programming Lecture 10, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk)

### GENERALIZED CONVEXITY AND OPTIMALITY CONDITIONS IN SCALAR AND VECTOR OPTIMIZATION Chapter 4 GENERALIZED CONVEXITY AND OPTIMALITY CONDITIONS IN SCALAR AND VECTOR OPTIMIZATION Alberto Cambini Department of Statistics and Applied Mathematics University of Pisa, Via Cosmo Ridolfi 10 56124

### Lecture 1: Introduction. Outline. B9824 Foundations of Optimization. Fall Administrative matters. 2. Introduction. 3. Existence of optima B9824 Foundations of Optimization Lecture 1: Introduction Fall 2010 Copyright 2010 Ciamac Moallemi Outline 1. Administrative matters 2. Introduction 3. Existence of optima 4. Local theory of unconstrained

### Seminars on Mathematics for Economics and Finance Topic 5: Optimization Kuhn-Tucker conditions for problems with inequality constraints 1 Seminars on Mathematics for Economics and Finance Topic 5: Optimization Kuhn-Tucker conditions for problems with inequality constraints 1 Session: 15 Aug 2015 (Mon), 10:00am 1:00pm I. Optimization with

### Jørgen Tind, Department of Statistics and Operations Research, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen O, Denmark. DUALITY THEORY Jørgen Tind, Department of Statistics and Operations Research, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen O, Denmark. Keywords: Duality, Saddle point, Complementary

### Convexity, Duality, and Lagrange Multipliers LECTURE NOTES Convexity, Duality, and Lagrange Multipliers Dimitri P. Bertsekas with assistance from Angelia Geary-Nedic and Asuman Koksal Massachusetts Institute of Technology Spring 2001 These notes

### ICS-E4030 Kernel Methods in Machine Learning ICS-E4030 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 28. September, 2016 Juho Rousu 28. September, 2016 1 / 38 Convex optimization Convex optimisation This

### Lagrangian Duality Theory Lagrangian Duality Theory Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapter 14.1-4 1 Recall Primal and Dual

### Convex Optimization and Modeling Convex Optimization and Modeling Duality Theory and Optimality Conditions 5th lecture, 12.05.2010 Jun.-Prof. Matthias Hein Program of today/next lecture Lagrangian and duality: the Lagrangian the dual

### EE/AA 578, Univ of Washington, Fall Duality 7. Duality EE/AA 578, Univ of Washington, Fall 2016 Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

### Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R

### Lagrange Multipliers with Optimal Sensitivity Properties in Constrained Optimization Lagrange Multipliers with Optimal Sensitivity Properties in Constrained Optimization Dimitri P. Bertsekasl Dept. of Electrical Engineering and Computer Science, M.I.T., Cambridge, Mass., 02139, USA. (dimitribhit.

### Elements of Convex Optimization Theory Elements of Convex Optimization Theory Costis Skiadas August 2015 This is a revised and extended version of Appendix A of Skiadas (2009), providing a self-contained overview of elements of convex optimization

### Chapter 2 Convex Analysis Chapter 2 Convex Analysis The theory of nonsmooth analysis is based on convex analysis. Thus, we start this chapter by giving basic concepts and results of convexity (for further readings see also [202,

### g(x,y) = c. For instance (see Figure 1 on the right), consider the optimization problem maximize subject to 1 of 11 11/29/2010 10:39 AM From Wikipedia, the free encyclopedia In mathematical optimization, the method of Lagrange multipliers (named after Joseph Louis Lagrange) provides a strategy for finding the

### ON LICQ AND THE UNIQUENESS OF LAGRANGE MULTIPLIERS ON LICQ AND THE UNIQUENESS OF LAGRANGE MULTIPLIERS GERD WACHSMUTH Abstract. Kyparisis proved in 1985 that a strict version of the Mangasarian- Fromovitz constraint qualification (MFCQ) is equivalent to

### where u is the decision-maker s payoff function over her actions and S is the set of her feasible actions. Seminars on Mathematics for Economics and Finance Topic 3: Optimization - interior optima 1 Session: 11-12 Aug 2015 (Thu/Fri) 10:00am 1:00pm I. Optimization: introduction Decision-makers (e.g. consumers,

### Algorithms for constrained local optimization Algorithms for constrained local optimization Fabio Schoen 2008 http://gol.dsi.unifi.it/users/schoen Algorithms for constrained local optimization p. Feasible direction methods Algorithms for constrained

### 2.3 Linear Programming 2.3 Linear Programming Linear Programming (LP) is the term used to define a wide range of optimization problems in which the objective function is linear in the unknown variables and the constraints are

### Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen

### The general programming problem is the nonlinear programming problem where a given function is maximized subject to a set of inequality constraints. 1 Optimization Mathematical programming refers to the basic mathematical problem of finding a maximum to a function, f, subject to some constraints. 1 In other words, the objective is to find a point,

### Generalization to inequality constrained problem. Maximize Lecture 11. 26 September 2006 Review of Lecture #10: Second order optimality conditions necessary condition, sufficient condition. If the necessary condition is violated the point cannot be a local minimum

### Lecture 18: Optimization Programming Fall, 2016 Outline Unconstrained Optimization 1 Unconstrained Optimization 2 Equality-constrained Optimization Inequality-constrained Optimization Mixture-constrained Optimization 3 Quadratic Programming

### Outline. Roadmap for the NPP segment: 1 Preliminaries: role of convexity. 2 Existence of a solution Outline Roadmap for the NPP segment: 1 Preliminaries: role of convexity 2 Existence of a solution 3 Necessary conditions for a solution: inequality constraints 4 The constraint qualification 5 The Lagrangian

### 1 Review of last lecture and introduction Semidefinite Programming Lecture 10 OR 637 Spring 2008 April 16, 2008 (Wednesday) Instructor: Michael Jeremy Todd Scribe: Yogeshwer (Yogi) Sharma 1 Review of last lecture and introduction Let us first

### CO 250 Final Exam Guide Spring 2017 CO 250 Final Exam Guide TABLE OF CONTENTS richardwu.ca CO 250 Final Exam Guide Introduction to Optimization Kanstantsin Pashkovich Spring 2017 University of Waterloo Last Revision: March 4,

### Lecture: Duality of LP, SOCP and SDP 1/33 Lecture: Duality of LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html wenzw@pku.edu.cn Acknowledgement:

### Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 4 Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 2 4.1. Subgradients definition subgradient calculus duality and optimality conditions Shiqian

### LECTURE SLIDES ON BASED ON CLASS LECTURES AT THE CAMBRIDGE, MASS FALL 2007 BY DIMITRI P. BERTSEKAS. LECTURE SLIDES ON CONVEX ANALYSIS AND OPTIMIZATION BASED ON 6.253 CLASS LECTURES AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY CAMBRIDGE, MASS FALL 2007 BY DIMITRI P. BERTSEKAS http://web.mit.edu/dimitrib/www/home.html

### Econ 508-A FINITE DIMENSIONAL OPTIMIZATION - NECESSARY CONDITIONS. Carmen Astorne-Figari Washington University in St. Louis. Econ 508-A FINITE DIMENSIONAL OPTIMIZATION - NECESSARY CONDITIONS Carmen Astorne-Figari Washington University in St. Louis August 12, 2010 INTRODUCTION General form of an optimization problem: max x f

### Solving Dual Problems Lecture 20 Solving Dual Problems We consider a constrained problem where, in addition to the constraint set X, there are also inequality and linear equality constraints. Specifically the minimization problem

### MATH2070 Optimisation MATH2070 Optimisation Nonlinear optimisation with constraints Semester 2, 2012 Lecturer: I.W. Guo Lecture slides courtesy of J.R. Wishart Review The full nonlinear optimisation problem with equality constraints

### 4TE3/6TE3. Algorithms for. Continuous Optimization 4TE3/6TE3 Algorithms for Continuous Optimization (Duality in Nonlinear Optimization ) Tamás TERLAKY Computing and Software McMaster University Hamilton, January 2004 terlaky@mcmaster.ca Tel: 27780 Optimality

### LECTURE 9 LECTURE OUTLINE. Min Common/Max Crossing for Min-Max Min-Max Problems Saddle Points LECTURE 9 LECTURE OUTLINE Min Common/Max Crossing for Min-Max Given φ : X Z R, where X R n, Z R m consider minimize sup φ(x, z) subject to x X and maximize subject to z Z.

### Mathematics 530. Practice Problems. n + 1 } Department of Mathematical Sciences University of Delaware Prof. T. Angell October 19, 2015 Mathematics 530 Practice Problems 1. Recall that an indifference relation on a partially ordered set is defined

### Appendix A Taylor Approximations and Definite Matrices Appendix A Taylor Approximations and Definite Matrices Taylor approximations provide an easy way to approximate a function as a polynomial, using the derivatives of the function. We know, from elementary

### Optimization Problems with Constraints - introduction to theory, numerical Methods and applications Optimization Problems with Constraints - introduction to theory, numerical Methods and applications Dr. Abebe Geletu Ilmenau University of Technology Department of Simulation and Optimal Processes (SOP)

### Support vector machines Support vector machines Guillaume Obozinski Ecole des Ponts - ParisTech SOCN course 2014 SVM, kernel methods and multiclass 1/23 Outline 1 Constrained optimization, Lagrangian duality and KKT 2 Support

### September Math Course: First Order Derivative September Math Course: First Order Derivative Arina Nikandrova Functions Function y = f (x), where x is either be a scalar or a vector of several variables (x,..., x n ), can be thought of as a rule which

### Constrained Optimization 1 / 22 Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 30, 2015 2 / 22 1. Equality constraints only 1.1 Reduced gradient 1.2 Lagrange

### 6.254 : Game Theory with Engineering Applications Lecture 7: Supermodular Games 6.254 : Game Theory with Engineering Applications Lecture 7: Asu Ozdaglar MIT February 25, 2010 1 Introduction Outline Uniqueness of a Pure Nash Equilibrium for Continuous Games Reading: Rosen J.B., Existence

### Summary Notes on Maximization Division of the Humanities and Social Sciences Summary Notes on Maximization KC Border Fall 2005 1 Classical Lagrange Multiplier Theorem 1 Definition A point x is a constrained local maximizer of f subject

### Optimality, Duality, Complementarity for Constrained Optimization Optimality, Duality, Complementarity for Constrained Optimization Stephen Wright University of Wisconsin-Madison May 2014 Wright (UW-Madison) Optimality, Duality, Complementarity May 2014 1 / 41 Linear

### Duality Theory of Constrained Optimization Duality Theory of Constrained Optimization Robert M. Freund April, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 2 1 The Practical Importance of Duality Duality is pervasive

### Quiz Discussion. IE417: Nonlinear Programming: Lecture 12. Motivation. Why do we care? Jeff Linderoth. 16th March 2006 Quiz Discussion IE417: Nonlinear Programming: Lecture 12 Jeff Linderoth Department of Industrial and Systems Engineering Lehigh University 16th March 2006 Motivation Why do we care? We are interested in Convex Sets Prof. Dan A. Simovici UMB 1 / 57 Outline 1 Closures, Interiors, Borders of Sets in R n 2 Segments and Convex Sets 3 Properties of the Class of Convex Sets 4 Closure and Interior Points of Convex Convex Optimization Theory A SUMMARY BY DIMITRI P. BERTSEKAS We provide a summary of theoretical concepts and results relating to convex analysis, convex optimization, and duality theory. In particular,