Master's Thesis

Differentiable exact penalty functions for nonlinear optimization with easy constraints

Guidance: Assistant Professor Ellen Hidemi FUKUDA

Takuma NISHIMURA

Department of Applied Mathematics and Physics
Graduate School of Informatics
Kyoto University

February 2015

Abstract

One approach for solving nonlinear constrained optimization problems is to use methods based on exact penalty functions. With an appropriate choice of the penalty parameter, the optimal solutions of the original constrained problem are obtained by solving an unconstrained one, which is easier to solve. Recently, Andreani, Fukuda and Silva proposed an implementable exact penalty method for nonlinear optimization problems with general equality and inequality constraints. In this paper, we extend their work by considering problems that distinguish the easy constraints from the general difficult ones. In this case, the definition of the exact penalty function is changed in such a way that the original problem is replaced by a problem containing only easy constraints, which is also easier to solve. In order to construct such an exact penalty function, we consider the case in which the easy constraints are defined by linear equalities, and then create an estimate of the Lagrange multipliers associated with a point. We incorporate this multipliers estimate in an augmented Lagrangian function, and then prove all the exactness results. Finally, we propose to use the spectral projected gradient method with a dynamic update of the penalty parameter.

Contents

1 Introduction
2 Preliminaries
3 Constructing the penalty function
  3.1 The multipliers estimate
  3.2 The penalty function
4 Exactness results
  4.1 Analysis of KKT points
  4.2 Optimality results
5 Algorithm
  5.1 Updating the penalty parameter
  5.2 Spectral projected gradient method
6 Conclusion
References

1 Introduction

We consider the following nonlinear constrained optimization problem:
\[
\begin{array}{ll}
\min & f(x) \\
\text{s.t.} & g(x) \le 0, \\
 & h(x) = 0, \\
 & x \in X,
\end{array}
\tag{1.1}
\]
where $f\colon \mathbb{R}^n \to \mathbb{R}$, $g\colon \mathbb{R}^n \to \mathbb{R}^m$, and $h\colon \mathbb{R}^n \to \mathbb{R}^p$ are twice continuously differentiable functions, and $X \subseteq \mathbb{R}^n$ is a nonempty closed convex set. Here, $X$ is the set of easy constraints, in particular
\[
X = \{x \in \mathbb{R}^n \mid Ax = b\}, \tag{1.2}
\]
with $A \in \mathbb{R}^{l \times n}$, $b \in \mathbb{R}^l$, $l \le n$ and $\operatorname{rank}(A) = l$.

Many methods for nonlinear constrained minimization problems have been considered in the literature, but here we focus on the penalty function approach. Among those methods, we cite the quadratic penalty functions, the augmented Lagrangian functions, and the exact penalty functions. The latter approach consists of replacing the constrained minimization problem with a single unconstrained one, which is easier to solve. Moreover, the penalty parameter of the unconstrained problem is finite.

One of the most famous exact penalty functions is the one proposed by Zangwill in [16]. For problem (1.1) with $X = \mathbb{R}^n$, it is defined by
\[
f(x) + c \max\{0, g_1(x), \dots, g_m(x), |h_1(x)|, \dots, |h_p(x)|\},
\]
where $c > 0$ is a penalty parameter. Under reasonable assumptions and for $c$ sufficiently large, we can get a solution of (1.1) by minimizing the above penalty function. However, such a function is nondifferentiable because it contains a maximum term in its formula. Also, it is not easy to find the appropriate penalty parameter.

After Zangwill, many researchers proposed differentiable exact penalty functions, starting with Fletcher in 1970 [8], Mukai and Polak in 1975 [13], and Glad and Polak in 1979 [10]. In the 1980s, Di Pillo and Grippo proposed another type of exact penalty function, based on the Lagrange multipliers estimate given by Glad and Polak [10]. They first considered optimization problems with inequality constraints in [6] and further extended the idea to problems with both inequality and equality constraints [7].
The idea of Di Pillo and Grippo was to incorporate a Lagrange multipliers estimate in an augmented Lagrangian function, so that the obtained function has exactness properties. More recently, in 2010, André and Silva extended Di Pillo and Grippo's idea to solve variational inequality problems with the feasible set defined by functional inequality constraints [3]. We recall that variational inequality problems have many applications, and also extend the first-order necessary optimality condition of nonlinear programming problems. In 2013, based on André and Silva's approach, Andreani et al. proposed a Gauss-Newton-type method based on exact penalty functions to solve optimization problems with general inequality and equality constraints [2]. If second-order methods, like the Newton-type method, are applied to solve the problem, it would be necessary to deal with third-order derivatives of the problem data, which is difficult from the numerical point of view. To overcome this difficulty, they proposed to incorporate a multipliers estimate in the augmented Lagrangian function for variational inequalities.

Another approach that does not deal with these third-order derivatives was given in 2012 by Fukuda et al. [9]. They proposed an exact penalty function for nonlinear second-order cone programs, which are an extension of nonlinear programming problems. However, in this case, they extended the multipliers estimate given by Lucidi in [12], and incorporated it into the classical augmented Lagrangian for nonlinear programming. Also, they approximated the B-subdifferential of the gradient of the penalty function and proved that the Newton-type method has global and superlinear convergence.

In this paper, we further extend the exact penalty approach given by Andreani et al., but use a multipliers estimate and an augmented Lagrangian function that are similar to the ones given by Fukuda et al. Moreover, the nonlinear optimization problems considered here distinguish the easy constraints from the general (equality and inequality) ones. We are particularly interested in the case in which the easy constraints are given by general linear equality constraints. For such a case, the definition of exact penalty functions changes. Here, the original problem can be solved by minimizing the exact penalty function subject to the easy constraints, instead of performing an unconstrained minimization. In fact, one can observe that optimization problems containing only easy constraints are not difficult to solve compared to the unconstrained ones. In augmented Lagrangian methods, for example, it is also common to distinguish the easy constraints.

The paper is organized as follows. In Section 2, we give some notions and results that are necessary for the construction of the penalty function. In Section 3, we first propose an extension of Lucidi's multipliers estimate using the fact that the projection mapping onto the set $X$ is easy to compute. Next, we show the properties associated with this estimate and incorporate it in the augmented Lagrangian for nonlinear programming problems. In Section 4, we prove that the constructed function is in fact an exact penalty function. In order to do that, we carry out the analysis for stationary points, global optimizers and local optimizers. In Section 5, we propose a way to dynamically update the penalty parameter, and suggest using the spectral projected gradient method to solve the problem. We conclude in Section 6 with some remarks and future work.

Throughout the paper, we use the following notation. We denote the Euclidean norm by $\|\cdot\|$, the supremum norm by $\|\cdot\|_\infty$ and the inner product by $\langle \cdot, \cdot \rangle$. The set of positive real numbers is denoted by $\mathbb{R}_{++}$. We also denote the identity matrix of dimension $n$ by $I_n$, and the transpose of a matrix $Z$ by $Z^T$. For functions $\theta\colon \mathbb{R}^n \to \mathbb{R}$ and $\nu\colon \mathbb{R}^n \to \mathbb{R}^m$, the gradient of $\theta$, the Hessian of $\theta$, and the Jacobian matrix of $\nu$ at $x \in \mathbb{R}^n$ are given by $\nabla\theta(x)$, $\nabla^2\theta(x)$ and $J\nu(x)$, respectively. For a function $\eta\colon \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$, the gradient and the Hessian of $\eta$ with respect to the first variable are given by $\nabla_x \eta(x, y)$ and $\nabla^2_{xx}\eta(x, y)$, respectively. Moreover, given a vector $z := (z_1, \dots, z_n)^T \in \mathbb{R}^n$, the diagonal matrix with diagonal entries $z_i$, $i = 1, \dots, n$, is denoted by $\operatorname{diag}(z)$.
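As a small illustration of the nonsmooth penalty of Zangwill recalled in the Introduction, the following Python sketch evaluates $f(x) + c\max\{0, g_i(x), |h_j(x)|\}$ on a toy one-dimensional problem. The toy problem, the grid search, and all function names are assumptions made only for this illustration; they do not come from the thesis.

```python
def zangwill_penalty(f, gs, hs, c):
    """Return x -> f(x) + c * max{0, g_1(x), ..., g_m(x), |h_1(x)|, ..., |h_p(x)|}."""
    def penalty(x):
        violation = max([0.0] + [g(x) for g in gs] + [abs(h(x)) for h in hs])
        return f(x) + c * violation
    return penalty


# Hypothetical toy problem: min x^2  s.t.  1 - x <= 0 (solution x = 1).
def f(x):
    return x * x


def g(x):
    return 1.0 - x


def grid_argmin(fun, lo=-2.0, hi=3.0, steps=5000):
    """Crude minimizer on a uniform grid; enough for a one-dimensional sketch."""
    points = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(points, key=fun)


# A small parameter gives an infeasible minimizer, a larger one recovers x = 1.
x_small_c = grid_argmin(zangwill_penalty(f, [g], [], c=1.0))
x_large_c = grid_argmin(zangwill_penalty(f, [g], [], c=4.0))
print(x_small_c, x_large_c)
```

In this toy problem the Lagrange multiplier at the solution is 2, so a parameter below 2 leaves the minimizer infeasible, while any larger value recovers the constrained solution. The kink of the penalty at $x = 1$ is the nondifferentiability mentioned above.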

2 Preliminaries

In this section, we introduce some basic notions and results in order to construct a penalty function. First, we describe the Karush-Kuhn-Tucker (KKT) conditions of problem (1.1), which can be written as follows:
\begin{align}
-\nabla_x L(x, \lambda, \mu) &\in N_X(x), \tag{2.1} \\
g(x) &\le 0, \tag{2.2} \\
h(x) &= 0, \tag{2.3} \\
\lambda &\ge 0, \tag{2.4} \\
\langle \lambda, g(x) \rangle &= 0, \tag{2.5}
\end{align}
where $L(x, \lambda, \mu) := f(x) + \langle \lambda, g(x) \rangle + \langle \mu, h(x) \rangle$ is the Lagrangian function associated with (1.1), and $\lambda \in \mathbb{R}^m$ and $\mu \in \mathbb{R}^p$ are the Lagrange multipliers associated with the inequality constraints $g(x) \le 0$ and the equality constraints $h(x) = 0$, respectively. Moreover, $N_X(x)$ is the normal cone to the set of easy constraints $X$ at $x$, that is,
\[
N_X(x) := \{z \in \mathbb{R}^n \mid \langle z, y - x \rangle \le 0, \ \forall y \in X\}.
\]
Concerning condition (2.1), we can prove the following lemma.

Lemma 2.1. For all $(x, \lambda, \mu) \in \mathbb{R}^{n+m+p}$, we have
\[
-\nabla_x L(x, \lambda, \mu) \in N_X(x) \iff P_X(-\nabla_x L(x, \lambda, \mu) + x) - x = 0, \tag{2.6}
\]
where $P_X$ denotes the projection onto $X$.

Proof. The condition $-\nabla_x L(x, \lambda, \mu) \in N_X(x)$ is equivalent to
\[
\langle -\nabla_x L(x, \lambda, \mu), y - x \rangle \le 0, \quad \forall y \in X,
\]
which can be written as
\[
\langle (-\nabla_x L(x, \lambda, \mu) + x) - x, y - x \rangle \le 0, \quad \forall y \in X.
\]
Note that a point $z \in X$ is the projection of $u$ onto $X$ if and only if $\langle u - z, y - z \rangle \le 0$ for all $y \in X$. Therefore, the condition above holds if and only if $P_X(-\nabla_x L(x, \lambda, \mu) + x) = x$.

3 Constructing the penalty function

In this section, we construct an exact penalty function based on the ideas given by Di Pillo and Grippo in [7] and Andreani et al. in [2]. Here, the definition of exact penalty functions is changed. In fact, instead of replacing the original problem with an unconstrained one, we replace it with a problem containing only easy constraints. More precisely, we transform problem (1.1) into the following problem:
\[
\begin{array}{ll}
\min & w_c(x) \\
\text{s.t.} & x \in X.
\end{array}
\tag{3.1}
\]
The basic idea to construct $w_c$ is to build a Lagrange multipliers estimate associated with a point and incorporate it in the classical augmented Lagrangian for nonlinear programming problems.

3.1 The multipliers estimate

In order to construct the penalty function, we first consider the following unconstrained minimization problem. The idea was given by Glad and Polak in [10] and further extended by Lucidi in [12]. In this work, it consists of finding an estimate of the Lagrange multipliers associated with a point $x \in \mathbb{R}^n$ by solving the problem
\[
\min_{\lambda, \mu} \ \|P_X(-\nabla_x L(x, \lambda, \mu) + x) - x\|^2 + \zeta_1^2 \|G(x)\lambda\|^2 + \zeta_2^2 \alpha(x) \left(\|\lambda\|^2 + \|\mu\|^2\right), \tag{3.2}
\]
where $\nabla_x L(x, \lambda, \mu)$ is the gradient of the Lagrangian function with respect to $x$, $\zeta_1, \zeta_2 > 0$, and $G(x) := \operatorname{diag}(g_1(x), \dots, g_m(x))$ is the diagonal matrix with diagonal entries $g_i(x)$, $i = 1, \dots, m$. Moreover,
\[
\alpha(x) := \frac{1}{2}\left(\|\max\{g(x), 0\}\|^2 + \|h(x)\|^2\right) = \frac{1}{2}\left(\sum_{i=1}^m \max\{g_i(x), 0\}^2 + \sum_{i=1}^p h_i(x)^2\right) \tag{3.3}
\]
is a function that measures the infeasibility of a point $x$ with respect to the equality and inequality constraints. Note that if $G(x)\lambda = 0$ holds, then the complementarity condition (2.5) is satisfied. Now, we show a property related to $P_X$.

Lemma 3.1. Let $z \in \mathbb{R}^n$ and let $X$ be given by (1.2). Then, the projection of $z$ onto $X$ can be written as
\[
P_X(z) = \left(I_n - A^T (AA^T)^{-1} A\right) z + A^T (AA^T)^{-1} b. \tag{3.4}
\]

Proof. First, given $z \in \mathbb{R}^n$, we consider the following minimization problem:
\[
\begin{array}{ll}
\min_w & \frac{1}{2}\|w - z\|^2 \\
\text{s.t.} & Aw = b.
\end{array}
\]
Its Lagrangian function is
\[
L(w, \lambda) = \frac{1}{2}\|w - z\|^2 - \langle \lambda, Aw - b \rangle.
\]

Considering $\nabla_w L(w, \lambda)$, we get
\[
\nabla_w L(w, \lambda) = w - z - A^T \lambda.
\]
If $\nabla_w L(w, \lambda) = 0$, then we obtain
\[
Aw - Az - AA^T \lambda = 0.
\]
Since $\operatorname{rank}(A) = l$ and $Aw = b$, we get
\[
\lambda = (AA^T)^{-1}(b - Az).
\]
Therefore, we obtain
\[
w = \left(I_n - A^T (AA^T)^{-1} A\right) z + A^T (AA^T)^{-1} b.
\]

Observe that from Lemma 3.1, we have
\[
P_X(-\nabla_x L(x, \lambda, \mu) + x) - x = q(x) - P \nabla_x L(x, \lambda, \mu),
\]
where
\[
P := I_n - A^T (AA^T)^{-1} A, \qquad q(x) := Px + A^T (AA^T)^{-1} b - x. \tag{3.5}
\]
Here, we also note that $P = P^T = P^2$ and $Pq(x) = 0$. This result shows that (3.2) is equivalent to the following problem:
\[
\min_{\lambda, \mu} \ \left\|
\begin{bmatrix}
PJg(x)^T & PJh(x)^T \\
\zeta_1 G(x) & 0 \\
\zeta_2 \alpha(x)^{1/2} I_m & 0 \\
0 & \zeta_2 \alpha(x)^{1/2} I_p
\end{bmatrix}
\begin{bmatrix} \lambda \\ \mu \end{bmatrix}
-
\begin{bmatrix} q(x) - P\nabla f(x) \\ 0 \\ 0 \\ 0 \end{bmatrix}
\right\|^2, \tag{3.6}
\]
which is a linear least squares problem. From now on, we consider that the following assumption holds.

Assumption 3.1. A point $x \in \mathbb{R}^n$ satisfies the linear independence constraint qualification (LICQ) on the set of feasible points. More precisely, the gradients $\nabla g_i(x)$, $\nabla h_j(x)$ and $a_k$ (where $a_k$ is the $k$th column of $A^T$) are linearly independent for all $i \in \{i \in \{1, \dots, m\} \mid g_i(x) = 0\}$, all $j = 1, \dots, p$ and all $k = 1, \dots, l$.

If Assumption 3.1 is satisfied, then the following result holds.

Proposition 3.1. Suppose that $x \in \mathbb{R}^n$ satisfies Assumption 3.1. Then, $P\nabla g_i(x)$ and $P\nabla h_j(x)$ are linearly independent for all $i \in \{i \in \{1, \dots, m\} \mid g_i(x) = 0\}$ and all $j = 1, \dots, p$.

Proof. Suppose that Assumption 3.1 is satisfied. It means that
\[
\sum_{i \in I} \alpha_i \nabla g_i(x) + \sum_{j=1}^p \beta_j \nabla h_j(x) + A^T \gamma = 0 \ \Longrightarrow \ \alpha_i = 0, \ \beta_j = 0, \ \gamma = 0, \tag{3.7}
\]
where $I := \{i \in \{1, \dots, m\} \mid g_i(x) = 0\}$. Now, assume that there exist $\alpha_i$, $i \in I$, and $\beta_j$, $j = 1, \dots, p$, such that
\[
\sum_{i \in I} \alpha_i P\nabla g_i(x) + \sum_{j=1}^p \beta_j P\nabla h_j(x) = 0.
\]
From (3.5), we get
\[
\left(I_n - A^T (AA^T)^{-1} A\right)\left(\sum_{i \in I} \alpha_i \nabla g_i(x) + \sum_{j=1}^p \beta_j \nabla h_j(x)\right) = 0. \tag{3.8}
\]
Defining $v := \sum_{i \in I} \alpha_i \nabla g_i(x) + \sum_{j=1}^p \beta_j \nabla h_j(x)$, (3.8) is equivalent to
\[
\sum_{i \in I} \alpha_i \nabla g_i(x) + \sum_{j=1}^p \beta_j \nabla h_j(x) + A^T\left(-(AA^T)^{-1} A v\right) = 0. \tag{3.9}
\]
Comparing with (3.7), this means that $\alpha_i = 0$ and $\beta_j = 0$ for all $i \in I$, $j = 1, \dots, p$, and $(AA^T)^{-1} A v = 0$. Thus, we conclude that $P\nabla g_i(x)$, $P\nabla h_j(x)$, $i \in I$, $j = 1, \dots, p$, are linearly independent.

The following proposition gives some properties associated with the multipliers estimate.

Proposition 3.2. Suppose that $x \in \mathbb{R}^n$ satisfies Assumption 3.1. Define the matrix $N(x)$ as follows:
\[
N(x) :=
\begin{bmatrix}
Jg(x) P Jg(x)^T + \zeta_1^2 G(x)^2 + \zeta_2^2 \alpha(x) I_m & Jg(x) P Jh(x)^T \\
Jh(x) P Jg(x)^T & Jh(x) P Jh(x)^T + \zeta_2^2 \alpha(x) I_p
\end{bmatrix}.
\]
Then,

(a) The matrix $N(x)$ is positive definite.

(b) The solution of (3.6) (equivalently, (3.2)) is unique and it is given by
\[
\begin{bmatrix} \lambda(x) \\ \mu(x) \end{bmatrix}
= N^{-1}(x)
\begin{bmatrix} Jg(x) P \\ Jh(x) P \end{bmatrix}
\left(q(x) - P\nabla f(x)\right).
\]

(c) If $(\bar{x}, \bar{\lambda}, \bar{\mu}) \in \mathbb{R}^{n+m+p}$ satisfies the KKT conditions (2.1)-(2.5), then $\bar{\lambda} = \lambda(\bar{x})$ and $\bar{\mu} = \mu(\bar{x})$.

(d) The Jacobian matrices of $\lambda(\cdot)$ and $\mu(\cdot)$ are given by
\[
\begin{bmatrix} J\lambda(x) \\ J\mu(x) \end{bmatrix}
= -N^{-1}(x)
\begin{bmatrix} R_1(x) \\ R_2(x) \end{bmatrix},
\]
with
\begin{align*}
R_1(x) &:= Jg(x) P \nabla^2_{xx} L(x, \lambda(x), \mu(x)) + 2\zeta_1^2 \Lambda(x) G(x) Jg(x) + \zeta_2^2 \lambda(x) \nabla\alpha(x)^T \\
&\qquad + \sum_{i=1}^m e_i^m \nabla_x L(x, \lambda(x), \mu(x))^T P \nabla^2 g_i(x), \\
R_2(x) &:= Jh(x) P \nabla^2_{xx} L(x, \lambda(x), \mu(x)) + \zeta_2^2 \mu(x) \nabla\alpha(x)^T + \sum_{i=1}^p e_i^p \nabla_x L(x, \lambda(x), \mu(x))^T P \nabla^2 h_i(x),
\end{align*}
where $\Lambda(x) := \operatorname{diag}(\lambda_1(x), \dots, \lambda_m(x))$ is the diagonal matrix with diagonal entries $\lambda_i(x)$, and $e_i^m$ and $e_i^p$ denote the $i$th canonical vectors of $\mathbb{R}^m$ and $\mathbb{R}^p$, respectively.

Proof. (a) We consider the matrix $A(x) \in \mathbb{R}^{(n+2m+p) \times (m+p)}$ associated with the linear least squares problem (3.6), that is,
\[
A(x) :=
\begin{bmatrix}
PJg(x)^T & PJh(x)^T \\
\zeta_1 G(x) & 0 \\
\zeta_2 \alpha(x)^{1/2} I_m & 0 \\
0 & \zeta_2 \alpha(x)^{1/2} I_p
\end{bmatrix}. \tag{3.10}
\]
If $x$ is infeasible, then $A(x)$ has full column rank, since in this case $\alpha(x) \ne 0$. Now, assume that $x$ is feasible, so that $\alpha(x) = 0$. Without loss of generality, we can write $Jg(x)^T = [Jg(x)_=^T \ \ Jg(x)_{\ne}^T]$, where $Jg(x)_=$ and $Jg(x)_{\ne}$ correspond to the parts of $Jg(x)$ where $g_i(x) = 0$ and $g_i(x) \ne 0$, respectively. In the same way, we can define the matrices $Jh(x)_=$, $Jh(x)_{\ne}$, and $G(x)_{\ne}$. Moreover, we define $m_1$ and $m_2$ as the number of rows of $Jg(x)_=$ and $Jg(x)_{\ne}$, respectively. Then, we have
\[
A(x) =
\begin{bmatrix}
PJg(x)_=^T & PJg(x)_{\ne}^T & PJh(x)^T \\
0 & 0 & 0 \\
0 & \zeta_1 G(x)_{\ne} & 0 \\
0 & 0 & 0
\end{bmatrix}.
\]
We can see that $A(x)$ has linearly independent columns, by Proposition 3.1 and because the diagonal block $G(x)_{\ne}$ has nonzero diagonal entries. Furthermore, we can see that $N(x) = A(x)^T A(x)$, so we conclude that $N(x)$ is nonsingular and positive definite.

(b) If we differentiate the objective function of problem (3.6) and set the result to zero, it yields
\[
A(x)^T A(x) \begin{bmatrix} \lambda(x) \\ \mu(x) \end{bmatrix}
= A(x)^T \begin{bmatrix} q(x) - P\nabla f(x) \\ 0 \\ 0 \\ 0 \end{bmatrix},
\]
where $A(x)$ is defined in (3.10). The result follows since $P = P^T$ and $N(x) = A(x)^T A(x)$ is nonsingular by item (a).

(c) From the KKT conditions and the equivalence (2.6), we have $P_X(-\nabla_x L(\bar{x}, \bar{\lambda}, \bar{\mu}) + \bar{x}) - \bar{x} = 0$ and $G(\bar{x})\bar{\lambda} = 0$, and, since $g(\bar{x}) \le 0$ and $h(\bar{x}) = 0$, also $\alpha(\bar{x}) = 0$. So the value of the objective function of (3.2) at $(\bar{\lambda}, \bar{\mu})$ is zero. The result follows since the solution of (3.2) is unique by (b), and because the value of the objective function is always nonnegative.

(d) From item (b), we have
\begin{align*}
Jg(x) P q(x) - Jg(x) P \nabla f(x) &= \left(Jg(x) P Jg(x)^T + \zeta_1^2 G(x)^2 + \zeta_2^2 \alpha(x) I_m\right)\lambda(x) + Jg(x) P Jh(x)^T \mu(x), \\
Jh(x) P q(x) - Jh(x) P \nabla f(x) &= Jh(x) P Jg(x)^T \lambda(x) + \left(Jh(x) P Jh(x)^T + \zeta_2^2 \alpha(x) I_p\right)\mu(x),
\end{align*}
which, since $Pq(x) = 0$, is equivalent to
\begin{align}
Jg(x) P \nabla_x L(x, \lambda(x), \mu(x)) + \left(\zeta_1^2 G(x)^2 + \zeta_2^2 \alpha(x) I_m\right)\lambda(x) &= 0, \tag{3.11} \\
Jh(x) P \nabla_x L(x, \lambda(x), \mu(x)) + \zeta_2^2 \alpha(x) \mu(x) &= 0. \tag{3.12}
\end{align}
From equation (3.11), we obtain
\[
\sum_{i=1}^m e_i^m \nabla g_i(x)^T P \nabla_x L(x, \lambda(x), \mu(x)) + \left(\zeta_1^2 G(x)^2 + \zeta_2^2 \alpha(x) I_m\right)\lambda(x) = 0.
\]
Thus, differentiating it with respect to $x$ yields
\begin{align*}
0 &= \sum_{i=1}^m e_i^m \nabla_x L(x, \lambda(x), \mu(x))^T P \nabla^2 g_i(x) + 2\zeta_1^2 \Lambda(x) G(x) Jg(x) \\
&\quad + \zeta_1^2 G(x)^2 J\lambda(x) + \zeta_2^2 \lambda(x) \nabla\alpha(x)^T + \zeta_2^2 \alpha(x) J\lambda(x) \\
&\quad + Jg(x) P \left(\nabla^2_{xx} L(x, \lambda(x), \mu(x)) + Jg(x)^T J\lambda(x) + Jh(x)^T J\mu(x)\right) \\
&= R_1(x) + Jg(x) P Jg(x)^T J\lambda(x) + \zeta_1^2 G(x)^2 J\lambda(x) + \zeta_2^2 \alpha(x) J\lambda(x) + Jg(x) P Jh(x)^T J\mu(x).
\end{align*}
Similarly, from equation (3.12), we obtain
\[
0 = R_2(x) + Jh(x) P Jg(x)^T J\lambda(x) + Jh(x) P Jh(x)^T J\mu(x) + \zeta_2^2 \alpha(x) J\mu(x).
\]
These two equations give the desired result.

Note that the solution of the minimization problem (3.6) is unique under the LICQ assumption. In Subsection 3.2, we will see that the solution of (3.6), that is, $\lambda(x)$, $\mu(x)$, is used to construct a penalty function. Furthermore, from (b) and (d) of the above proposition, we observe that the same matrix $N(x)$ is used to define the estimates $\lambda(x)$, $\mu(x)$ and their Jacobians $J\lambda(x)$, $J\mu(x)$. This means that, once $\lambda(x)$ and $\mu(x)$ have been computed, the computation of $J\lambda(x)$ and $J\mu(x)$ does not require much additional effort, because the factorization of $N(x)$ can be reused.
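The formulas of Lemma 3.1 and Proposition 3.2(b) can be sketched in a few lines of Python. To keep the example free of linear algebra libraries, it assumes the simplest setting $l = 1$, $m = 1$, $p = 0$, so that $A = a^T$ is a single row and $N(x)$ is a scalar; the function names and the toy problem are hypothetical and serve only as an illustration.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def apply_P(a, v):
    """Compute P v with P = I - a a^T / (a^T a), i.e. (3.5) for A = a^T."""
    s = dot(a, v) / dot(a, a)
    return [vi - s * ai for vi, ai in zip(v, a)]


def project_onto_X(a, b, z):
    """Projection (3.4) onto X = {x : a^T x = b}."""
    s = (dot(a, z) - b) / dot(a, a)
    return [zi - s * ai for zi, ai in zip(z, a)]


def multiplier_estimate(a, grad_f, g_val, grad_g, zeta1=1.0, zeta2=1.0):
    """lambda(x) of Proposition 3.2(b), assuming m = 1, p = 0 and l = 1."""
    alpha = 0.5 * max(g_val, 0.0) ** 2  # infeasibility measure (3.3)
    # N(x) is the scalar Jg P Jg^T + zeta1^2 g^2 + zeta2^2 alpha.
    N = dot(grad_g, apply_P(a, grad_g)) + zeta1 ** 2 * g_val ** 2 + zeta2 ** 2 * alpha
    # Since P q(x) = 0, the right-hand side reduces to -grad_g^T P grad_f.
    return -dot(grad_g, apply_P(a, grad_f)) / N


# Hypothetical toy problem: min x1^2 + x2^2  s.t.  1.5 - x1 <= 0,  x1 + x2 = 2.
a, b = [1.0, 1.0], 2.0
x_star = [1.5, 0.5]                       # its solution
grad_f = [2.0 * x_star[0], 2.0 * x_star[1]]
g_val = 1.5 - x_star[0]
grad_g = [-1.0, 0.0]

lam = multiplier_estimate(a, grad_f, g_val, grad_g)
z_proj = project_onto_X(a, b, [0.0, 0.0])
print(lam, z_proj)
```

At the solution of this toy problem the KKT multiplier of the inequality constraint is 2, and the estimate reproduces it, in agreement with Proposition 3.2(c). For general $A$, $m$ and $p$, one would instead solve the linear system with matrix $N(x)$, or the least squares problem (3.6), with a numerical linear algebra routine.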

3.2 The penalty function

The construction of the penalty function derives from the idea given by Di Pillo and Grippo in [6, 7]. It consists in including the multipliers estimate $\lambda(x)$, $\mu(x)$, solution of (3.2) (equivalently, (3.6)), into the classical augmented Lagrangian function [11, 14, 15], given by
\[
L_c(x, \lambda, \mu) := f(x) + \langle \lambda, g(x) \rangle + \frac{c}{2}\|g(x)\|^2 - \frac{1}{2c}\sum_{i=1}^m \max\{0, -\lambda_i - c\, g_i(x)\}^2 + \langle \mu, h(x) \rangle + \frac{c}{2}\|h(x)\|^2,
\]
where $c > 0$ is the penalty parameter. Thus, the following is our candidate penalty function:
\[
w_c(x) := L_c(x, \lambda(x), \mu(x)). \tag{3.13}
\]
The gradient of $w_c$ at $x$ is as follows:
\begin{align}
\nabla w_c(x) &= \nabla f(x) + Jg(x)^T \lambda(x) + \left(c\,Jg(x)^T + J\lambda(x)^T\right)\left(g(x) + y_c(x)\right) \notag \\
&\quad + Jh(x)^T \mu(x) + \left(c\,Jh(x)^T + J\mu(x)^T\right) h(x) \tag{3.14} \\
&= \nabla_x L(x, \lambda(x), \mu(x)) + \left(c\,Jg(x)^T + J\lambda(x)^T\right)\left(g(x) + y_c(x)\right) + \left(c\,Jh(x)^T + J\mu(x)^T\right) h(x), \notag
\end{align}
where
\[
y_c(x) := \max\left\{0, -\frac{\lambda(x)}{c} - g(x)\right\}.
\]
Although $\nabla w_c$ is not differentiable because it includes the maximum function, we can see that it is semismooth. Moreover, $\nabla w_c(x)$ has $J\lambda(x)$, $J\mu(x)$ in its formula, which contain the second-order terms $\nabla^2 f(x)$, $\nabla^2 g_i(x)$, $\nabla^2 h_i(x)$. If a second-order method, like the Newton method, is used, then we have to deal with third-order terms of the problem data, which should be avoided for numerical reasons.

4 Exactness results

In this section, we show some exactness results for $w_c$ defined in (3.13). We follow closely the results presented by Di Pillo and Grippo [6, 7], André and Silva [3], and Andreani, Fukuda and Silva [2].

4.1 Analysis of KKT points

First, we show that if a point satisfies the KKT conditions, then it satisfies $-\nabla w_c(x) \in N_X(x)$ for an arbitrary parameter $c > 0$.

Proposition 4.1. Let $(x, \lambda, \mu)$ be a KKT triple associated with problem (1.1). Then, $-\nabla w_c(x) \in N_X(x)$ for all $c > 0$.

Proof. From (2.2), (2.4), (2.5), Proposition 3.2(c) and the fact that
\[
g(x) + y_c(x) = \max\left\{g(x), -\frac{\lambda(x)}{c}\right\},
\]
we obtain $g(x) + y_c(x) = 0$. From (2.3), we also have $h(x) = 0$. Therefore, we obtain
\[
\nabla w_c(x) = \nabla f(x) + Jg(x)^T \lambda + Jh(x)^T \mu = \nabla_x L(x, \lambda, \mu).
\]
Since (2.1) holds, we get $-\nabla w_c(x) \in N_X(x)$.

In order to prove the converse of the above result, we consider the function $\alpha(x)$ defined in (3.3). We recall that the function $\alpha$ is an infeasibility measure. Moreover, $\alpha(x) = 0$ is equivalent to $g(x) \le 0$ and $h(x) = 0$. Considering the problem
\[
\begin{array}{ll}
\min & \alpha(x) \\
\text{s.t.} & x \in X,
\end{array}
\tag{4.1}
\]
we see that it is a minimization of the infeasibility measure subject to the easy constraints. Furthermore, if
\[
-\nabla\alpha(x) = -Jg(x)^T \max\{0, g(x)\} - Jh(x)^T h(x) \in N_X(x)
\]
holds, then we call the point $x$ a stationary point of the minimization of the infeasibility problem (4.1).

Now, we will show that the converse implication holds for $c$ large enough, for instance, under a boundedness assumption. As we will see, instead of a KKT point, we may find a stationary point of the minimization of the infeasibility that is infeasible for (1.1).

Proposition 4.2. Let $\{x^k\} \subset \mathbb{R}^n$ and $\{c_k\} \subset \mathbb{R}_{++}$ be sequences such that $c_k \to \infty$, $x^k \to \bar{x}$ and $-\nabla w_{c_k}(x^k) \in N_X(x^k)$ for all $k$. Then $\bar{x}$ is a stationary point of the minimization of the infeasibility (4.1).

Proof. By the definition of $\nabla w_{c_k}(x^k)$ in (3.14), we get
\[
-\nabla_x L\left(x^k, \lambda(x^k), \mu(x^k)\right) - \left(c_k Jg(x^k)^T + J\lambda(x^k)^T\right)\max\left\{g(x^k), -\lambda(x^k)/c_k\right\} - \left(c_k Jh(x^k)^T + J\mu(x^k)^T\right) h(x^k) \in N_X(x^k).
\]
Since $\lambda(\cdot)$, $\mu(\cdot)$ are continuous by LICQ, and $f$, $g$, $h$ are twice continuously differentiable, we can divide the above expression by $c_k$ and take the limit, so
\[
-\left(Jg(\bar{x})^T \max\{g(\bar{x}), 0\} + Jh(\bar{x})^T h(\bar{x})\right) \in N_X(\bar{x}),
\]
which means that $\bar{x}$ is a stationary point of the minimization of the infeasibility (4.1).
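The stationarity condition for (4.1) can be checked numerically. The Python sketch below evaluates $\alpha(x)$ and $\nabla\alpha(x) = Jg(x)^T\max\{0, g(x)\} + Jh(x)^T h(x)$, and tests $-\nabla\alpha(x) \in N_X(x)$ through the equivalent condition $P\nabla\alpha(x) = 0$, assuming that $X$ is given by a single linear equality $a^T x = b$ and that $x \in X$. The function names and the toy data (a disc that does not intersect the line) are assumptions made only for this illustration.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def apply_P(a, v):
    """Compute P v with P = I - a a^T / (a^T a) (single-row A = a^T)."""
    s = dot(a, v) / dot(a, a)
    return [vi - s * ai for vi, ai in zip(v, a)]


def alpha(g_vals, h_vals):
    """Infeasibility measure (3.3)."""
    return 0.5 * (sum(max(gi, 0.0) ** 2 for gi in g_vals)
                  + sum(hj ** 2 for hj in h_vals))


def grad_alpha(g_vals, h_vals, Jg, Jh, n):
    """Jg^T max{0, g} + Jh^T h, with Jg and Jh given as lists of rows."""
    grad = [0.0] * n
    for gi, row in zip(g_vals, Jg):
        weight = max(gi, 0.0)
        for k in range(n):
            grad[k] += weight * row[k]
    for hj, row in zip(h_vals, Jh):
        for k in range(n):
            grad[k] += hj * row[k]
    return grad


# Hypothetical data: g(x) = x1^2 + x2^2 - 1 <= 0 and X = {x : x1 + x2 = 4}.
# The unit disc does not meet the line, so problem (1.1) is infeasible.
a = [1.0, 1.0]
x = [2.0, 2.0]                               # point of X closest to the disc
g_vals = [x[0] ** 2 + x[1] ** 2 - 1.0]
Jg = [[2.0 * x[0], 2.0 * x[1]]]

alpha_x = alpha(g_vals, [])
projected_gradient = apply_P(a, grad_alpha(g_vals, [], Jg, [], n=2))
print(alpha_x, projected_gradient)
```

Here $\alpha(x) > 0$ while the projected gradient vanishes, so $x = (2, 2)$ is a stationary point of (4.1) that is infeasible for the toy problem. This is the kind of point that the analysis cannot exclude in general.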

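Proposition 4.3. Let $\bar{x} \in \mathbb{R}^n$ be a feasible point of problem (1.1). Then there exist $\bar{c}, \bar{\delta} > 0$ such that if $\|x - \bar{x}\| \le \bar{\delta}$, $c \ge \bar{c}$ and $-\nabla w_c(x) \in N_X(x)$, then $(x, \lambda(x), \mu(x))$ satisfies the KKT conditions (2.1)-(2.5).

Proof. First, it is easy to prove that
\[
Y_c(x)\lambda(x) = -c\,Y_c(x)\left(g(x) + y_c(x)\right), \tag{4.2}
\]
where $Y_c(x) := \operatorname{diag}((y_c)_1(x), \dots, (y_c)_m(x))$. From equations (3.11) and (3.12), we obtain
\begin{align}
Jg(x) P \nabla_x L(x, \lambda(x), \mu(x)) &= -\left(\zeta_1^2 G(x)^2 + \zeta_2^2 \alpha(x) I_m\right)\lambda(x), \tag{4.3} \\
Jh(x) P \nabla_x L(x, \lambda(x), \mu(x)) &= -\zeta_2^2 \alpha(x) \mu(x). \tag{4.4}
\end{align}
Observe that (4.3) can be written as
\begin{align*}
Jg(x) P \nabla_x L(x, \lambda(x), \mu(x)) &= -\left(\zeta_1^2 G(x)^2 + \zeta_2^2 \alpha(x) I_m\right)\lambda(x) \\
&= -\zeta_1^2 G(x)\left(G(x) + Y_c(x)\right)\lambda(x) + \left(\zeta_1^2 G(x) Y_c(x) - \zeta_2^2 \alpha(x) I_m\right)\lambda(x) \\
&= -\zeta_1^2 G(x)\Lambda(x)\left(g(x) + y_c(x)\right) + \left(\zeta_1^2 G(x) Y_c(x) - \zeta_2^2 \alpha(x) I_m\right)\lambda(x).
\end{align*}
From (4.2), we get
\[
\frac{1}{c} Jg(x) P \nabla_x L(x, \lambda(x), \mu(x)) = -\zeta_1^2 G(x)\left(\frac{1}{c}\Lambda(x) + Y_c(x)\right)\left(g(x) + y_c(x)\right) - \frac{1}{c}\zeta_2^2 \alpha(x)\lambda(x).
\]
From the definition of $\nabla w_c(x)$ given in (3.14), we have
\begin{align}
\frac{1}{c} Jg(x) P \nabla w_c(x) &= \frac{1}{c} Jg(x) P \nabla_x L(x, \lambda(x), \mu(x)) + Jg(x) P \left(Jg(x)^T + \frac{1}{c} J\lambda(x)^T\right)\left(g(x) + y_c(x)\right) \notag \\
&\quad + Jg(x) P \left(Jh(x)^T + \frac{1}{c} J\mu(x)^T\right) h(x) \notag \\
&= \left(Jg(x) P \left(Jg(x)^T + \frac{1}{c} J\lambda(x)^T\right) - \zeta_1^2 G(x)\left(\frac{1}{c}\Lambda(x) + Y_c(x)\right)\right)\left(g(x) + y_c(x)\right) \notag \\
&\quad - \frac{1}{c}\zeta_2^2 \alpha(x)\lambda(x) + Jg(x) P \left(Jh(x)^T + \frac{1}{c} J\mu(x)^T\right) h(x). \tag{4.5}
\end{align}
Moreover, using equation (4.4), we also obtain
\[
\frac{1}{c} Jh(x) P \nabla w_c(x) = -\frac{1}{c}\zeta_2^2 \alpha(x)\mu(x) + Jh(x) P \left(Jg(x)^T + \frac{1}{c} J\lambda(x)^T\right)\left(g(x) + y_c(x)\right) + Jh(x) P \left(Jh(x)^T + \frac{1}{c} J\mu(x)^T\right) h(x). \tag{4.6}
\]
From equations (4.5) and (4.6), we get
\[
\frac{1}{c}\begin{bmatrix} Jg(x) \\ Jh(x) \end{bmatrix} P \nabla w_c(x)
= K_c(x)\begin{bmatrix} g(x) + y_c(x) \\ h(x) \end{bmatrix}
- \frac{\zeta_2^2 \alpha(x)}{c}\begin{bmatrix} \lambda(x) \\ \mu(x) \end{bmatrix}, \tag{4.7}
\]
where
\[
K_c(x) := \begin{bmatrix} (K_c(x))_{11} & (K_c(x))_{12} \\ (K_c(x))_{21} & (K_c(x))_{22} \end{bmatrix}
\]
and
\begin{align*}
(K_c(x))_{11} &:= Jg(x) P \left(Jg(x)^T + \frac{1}{c} J\lambda(x)^T\right) - \zeta_1^2 G(x)\left(\frac{1}{c}\Lambda(x) + Y_c(x)\right), \\
(K_c(x))_{12} &:= Jg(x) P \left(Jh(x)^T + \frac{1}{c} J\mu(x)^T\right), \\
(K_c(x))_{21} &:= Jh(x) P \left(Jg(x)^T + \frac{1}{c} J\lambda(x)^T\right), \\
(K_c(x))_{22} &:= Jh(x) P \left(Jh(x)^T + \frac{1}{c} J\mu(x)^T\right).
\end{align*}
Now, denoting by $\sigma_{m+p}(K_c(x))$ the smallest singular value of $K_c(x)$, we have
\[
\left\|K_c(x)\begin{bmatrix} g(x) + y_c(x) \\ h(x) \end{bmatrix}\right\|^2
\ge \sigma_{m+p}(K_c(x))^2 \left\|\begin{bmatrix} g(x) + y_c(x) \\ h(x) \end{bmatrix}\right\|^2
= \sigma_{m+p}(K_c(x))^2 \left(\|g(x) + y_c(x)\|^2 + \|h(x)\|^2\right).
\]
Furthermore, from the definition of $\alpha(x)$ and $y_c(x)$, we obtain
\[
\alpha(x) = \frac{1}{2}\left(\|\max\{g(x), 0\}\|^2 + \|h(x)\|^2\right) \le \frac{1}{2}\left(\|g(x) + y_c(x)\|^2 + \|h(x)\|^2\right). \tag{4.8}
\]
Thus, considering the square of the norm in (4.7) and using the basic inequality
\[
\|u - v\|^2 \ge \frac{1}{2}\|u\|^2 - \|v\|^2 \quad \text{for all } u, v,
\]
we have
\begin{align*}
\frac{1}{c^2}\left\|\begin{bmatrix} Jg(x) \\ Jh(x) \end{bmatrix} P \nabla w_c(x)\right\|^2
&\ge \frac{1}{2}\left\|K_c(x)\begin{bmatrix} g(x) + y_c(x) \\ h(x) \end{bmatrix}\right\|^2 - \frac{\zeta_2^4}{c^2}\alpha(x)^2\left(\|\lambda(x)\|^2 + \|\mu(x)\|^2\right) \\
&\ge \left(\frac{1}{2}\sigma_{m+p}(K_c(x))^2 - \frac{\zeta_2^4}{2c^2}\alpha(x)\left(\|\lambda(x)\|^2 + \|\mu(x)\|^2\right)\right)\left(\|g(x) + y_c(x)\|^2 + \|h(x)\|^2\right).
\end{align*}
Because $\bar{x}$ is a feasible point, we have $y_c(\bar{x}) \to -g(\bar{x})$ as $c \to \infty$, and we get $K_c(\bar{x}) \to N(\bar{x})$. Recalling that $N(\bar{x})$ is nonsingular, by continuity there exist $\bar{c}$ and $\bar{\delta}$ such that if $\|x - \bar{x}\| \le \bar{\delta}$ and $c \ge \bar{c}$, then $K_c(x)$ is also nonsingular. The nonsingularity of $K_c(x)$ implies the existence of $\bar{\delta}, \bar{c}, \rho > 0$ such that, for any $x \in \mathbb{R}^n$ with $\|x - \bar{x}\| \le \bar{\delta}$ and $c \ge \bar{c}$,
\[
\frac{1}{2}\sigma_{m+p}(K_c(x))^2 - \frac{\zeta_2^4}{2c^2}\alpha(x)\left(\|\lambda(x)\|^2 + \|\mu(x)\|^2\right) \ge \rho > 0.
\]
Thus, we obtain
\[
\frac{1}{c^2}\left\|\begin{bmatrix} Jg(x) \\ Jh(x) \end{bmatrix} P \nabla w_c(x)\right\|^2 \ge \rho\left(\|g(x) + y_c(x)\|^2 + \|h(x)\|^2\right).
\]
Now, we take any $x$ and $c$ such that $-\nabla w_c(x) \in N_X(x)$, $\|x - \bar{x}\| \le \bar{\delta}$, and $c \ge \bar{c}$. Recalling that $-\nabla w_c(x) \in N_X(x)$ is equivalent to $P\nabla w_c(x) = q(x)$, and observing that $P^2 = P$ implies $P\nabla w_c(x) = Pq(x) = 0$, the left-hand side of the above expression is zero. Hence, we get $g(x) + y_c(x) = 0$ and $h(x) = 0$. Next, from the definition of $\nabla w_c(x)$, we obtain $-\nabla_x L(x, \lambda(x), \mu(x)) \in N_X(x)$. Moreover, $g(x) + y_c(x) = 0$ implies $g(x) \le 0$, $\lambda(x) \ge 0$ and $\langle g(x), \lambda(x) \rangle = 0$, and we conclude that $(x, \lambda(x), \mu(x))$ is a KKT triple.

From these two results, we can prove the following theorem.

Theorem 4.1. Let $\{x^k\} \subset \mathbb{R}^n$ and $\{c_k\} \subset \mathbb{R}_{++}$ be sequences such that $c_k \to \infty$ and $-\nabla w_{c_k}(x^k) \in N_X(x^k)$ for all $k$. Also, consider a subsequence $\{x^{k_j}\}$ of $\{x^k\}$ such that $x^{k_j} \to \bar{x}$ for some $\bar{x} \in \mathbb{R}^n$. Then, either there exists $K$ such that $(x^{k_j}, \lambda(x^{k_j}), \mu(x^{k_j}))$ is a KKT triple associated with (1.1) for all $k_j > K$, or $\bar{x}$ is a stationary point of the minimization of the infeasibility (4.1) that is infeasible for (1.1).

Proof. From Proposition 4.2, $\bar{x}$ is a stationary point of the minimization of the infeasibility (4.1). If $\bar{x}$ is a feasible point, then from Proposition 4.3 there exists $K$ such that $(x^{k_j}, \lambda(x^{k_j}), \mu(x^{k_j}))$ is a KKT triple for all $k_j > K$.

Note that in the above theorem, we assume that a converging subsequence of $\{x^k\}$ exists. This happens, for example, under a boundedness assumption. Furthermore, if we assume that all stationary points of the minimization of the infeasibility (4.1) are feasible, we can show the following corollary, which is similar to Theorem 4.1. Such a property holds, for example, when the functions $g_i$, $i = 1, \dots, m$, are convex and the functions $h_i$, $i = 1, \dots, p$, are affine.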

Corollary 4.1. Assume that there exists $\bar{c} > 0$ such that the set
\[
Z := \{x \in \mathbb{R}^n \mid -\nabla w_c(x) \in N_X(x), \ c > \bar{c}\}
\]
is bounded. Assume that all stationary points of the minimization of the infeasibility (4.1) are feasible for problem (1.1). Then, there exists $\tilde{c} > 0$ such that if $-\nabla w_c(x) \in N_X(x)$ and $c > \tilde{c}$, then $(x, \lambda(x), \mu(x))$ is a KKT triple associated with problem (1.1).

Proof. Suppose that there is no such $\tilde{c}$. Then, there exist two sequences $\{x^k\} \subset \mathbb{R}^n$ and $\{c_k\} \subset \mathbb{R}_{++}$ with $-\nabla w_{c_k}(x^k) \in N_X(x^k)$ and $c_k \to \infty$, and such that $(x^k, \lambda(x^k), \mu(x^k))$ is not a KKT triple. But for $c_k > \bar{c}$, we have $x^k \in Z$, which is bounded. It means that there exists a convergent subsequence $\{x^{k_j}\}$ of $\{x^k\}$. This contradicts Theorem 4.1, because there is no stationary point of the minimization of the infeasibility (4.1) that is infeasible for (1.1).

4.2 Optimality results

Here, we will prove that $w_c$ is in fact an exact penalty function. First, we define $G_f$ and $L_f$ as the sets of global and local optimizers, respectively, of problem (1.1). Considering problem (3.1), we also define $G_w(c)$ and $L_w(c)$ as the sets of global and local minimizers, respectively, of (3.1). The definition of a (weakly) exact penalty function is as follows.

Definition 4.1. The function $w_c$ is a weakly exact penalty function if there exists $\bar{c} > 0$ such that for all $c \ge \bar{c}$, $G_w(c) = G_f$. Also, the function $w_c$ is an exact penalty function if there exists $\bar{c} > 0$ such that for all $c \ge \bar{c}$, $G_w(c) = G_f$ and $L_w(c) \subseteq L_f$.

First, we will show that $w_c$ is a weakly exact penalty function, by showing the equivalence of the sets of global minimizers. The following two lemmas will be useful for such a proof.

Lemma 4.1. The function $w_c$ defined in (3.13) can be written at $x \in \mathbb{R}^n$ as
\[
w_c(x) = f(x) + \langle \lambda(x), g(x) + y_c(x) \rangle + \frac{c}{2}\|g(x) + y_c(x)\|^2 + \langle \mu(x), h(x) \rangle + \frac{c}{2}\|h(x)\|^2.
\]

Proof. It follows from [2, Lemma 4.1].

Lemma 4.2. Let $(x, \lambda, \mu)$ be a KKT triple associated with problem (1.1) such that $x \in \mathbb{R}^n$. Then, $w_c(x) = f(x)$ for all $c > 0$.

Proof. It follows from [2, Lemma 4.2].

Proposition 4.4. Let $\{x^k\} \subset \mathbb{R}^n$ and $\{c_k\} \subset \mathbb{R}_{++}$ be sequences such that $\{x^k\}$ is bounded, $c_k \to \infty$ and $x^k \in G_w(c_k)$ for all $k$. If $G_f \ne \emptyset$ holds, then there exists $K$ such that $x^k \in G_f$ for all $k > K$.

Proof. Suppose that for all $K$, there exists $k > K$ such that $x^k \notin G_f$. First, let $\hat{x} \in G_f$, which exists because $G_f \ne \emptyset$. Since $\hat{x}$ is a KKT point and satisfies LICQ, from Lemma 4.2, we have
\[
w_{c_k}(x^k) \le w_{c_k}(\hat{x}) = f(\hat{x}) \tag{4.9}
\]
for all $k$. Since $\{x^k\} \subset X$ is bounded, there exists a subsequence of $\{x^k\}$ converging to some $\bar{x} \in X$. Without loss of generality, we can write $\lim_{k \to \infty} x^k = \bar{x}$. So, taking the limit superior on both sides of (4.9), we obtain
\[
\limsup_{k \to \infty} w_{c_k}(x^k) \le f(\hat{x}). \tag{4.10}
\]
Now, from Lemma 4.1, $w_{c_k}$ can be written as
\[
w_{c_k}(x^k) = f(x^k) + \langle \lambda(x^k), g(x^k) + y_{c_k}(x^k) \rangle + \frac{c_k}{2}\|g(x^k) + y_{c_k}(x^k)\|^2 + \langle \mu(x^k), h(x^k) \rangle + \frac{c_k}{2}\|h(x^k)\|^2.
\]
Thus, by the continuity of the functions involved, inequality (4.10) implies that $h(\bar{x}) = 0$ and $g(\bar{x}) + \max\{0, -g(\bar{x})\} = 0$, which implies $g(\bar{x}) \le 0$. Moreover, it is easy to show that $f(\bar{x}) \le \limsup_{k \to \infty} w_{c_k}(x^k)$. Therefore, $f(\bar{x}) \le f(\hat{x})$, that is, $\bar{x} \in G_f$.

Since $\bar{x}$ is feasible and satisfies LICQ, there exist $\bar{c}$ and $\bar{\delta}$ as in Proposition 4.3. Let $K$ be sufficiently large such that $\|x^k - \bar{x}\| \le \bar{\delta}$, $c_k \ge \bar{c}$ and $x^k \in G_w(c_k)$ for all $k > K$. Since $x^k \in G_w(c_k)$ implies $-\nabla w_{c_k}(x^k) \in N_X(x^k)$, the same proposition ensures that $x^k$ is a KKT point. It means that $x^k$ is feasible for all $k > K$. Furthermore, Lemma 4.2 and inequality (4.9) yield
\[
f(x^k) = w_{c_k}(x^k) \le f(\hat{x}) \tag{4.11}
\]
for all $k > K$. We conclude that, for such $K$, $x^k \in G_f$ for all $k > K$, which is a contradiction.

Proposition 4.5. Assume that $G_f \ne \emptyset$ holds. Then, $G_w(c) \subseteq G_f$ implies that $G_w(c) = G_f$ for all $c > 0$.

Proof. It follows from [2, Proposition 4.5].

Theorem 4.2. If there exists $\bar{c} > 0$ such that $\bigcup_{c \ge \bar{c}} G_w(c)$ is bounded and $G_f \ne \emptyset$ holds, then $w_c$ is a weakly exact penalty function for problem (1.1).

Proof. It follows from [2, Theorem 4.2].

Now, we will show that $w_c$ is in fact an exact penalty function, by relating the sets of local minimizers. We recall that such a proof is important because optimization solvers, in general, search for local solutions instead of global ones. Before presenting the results for local minimizers, we state an additional lemma, which shows that the value of $w_c$ at a feasible point is not greater than the value of the objective function.

Lemma 4.3. Let $x \in \mathbb{R}^n$ be a feasible point for (1.1). Then, $w_c(x) \le f(x)$ for all $c > 0$.

Proof. It follows from [2, Lemma 4.3].
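Lemma 4.1 and Lemma 4.3 rest on an algebraic identity that holds componentwise for any multiplier values, so they can be checked numerically without computing $\lambda(x)$ and $\mu(x)$. The Python sketch below compares the augmented Lagrangian of Section 3.2 with the expression of Lemma 4.1 for arbitrary (hypothetical) numbers standing in for $f(x)$, $g(x)$, $h(x)$, $\lambda$ and $\mu$; the function names are assumptions made for this illustration.

```python
def augmented_lagrangian(f_val, g_vals, h_vals, lam, mu, c):
    """L_c(x, lambda, mu) as defined in Section 3.2."""
    value = f_val
    value += sum(l * g for l, g in zip(lam, g_vals))
    value += 0.5 * c * sum(g * g for g in g_vals)
    value -= sum(max(0.0, -l - c * g) ** 2 for l, g in zip(lam, g_vals)) / (2.0 * c)
    value += sum(m * h for m, h in zip(mu, h_vals))
    value += 0.5 * c * sum(h * h for h in h_vals)
    return value


def lemma_4_1_form(f_val, g_vals, h_vals, lam, mu, c):
    """Same quantity written with g + y_c = max{g, -lambda / c}."""
    shifted = [max(g, -l / c) for l, g in zip(lam, g_vals)]
    value = f_val
    value += sum(l * s for l, s in zip(lam, shifted))
    value += 0.5 * c * sum(s * s for s in shifted)
    value += sum(m * h for m, h in zip(mu, h_vals))
    value += 0.5 * c * sum(h * h for h in h_vals)
    return value


# Arbitrary test values (not taken from the thesis).
f_val, lam, mu, c = 1.0, [0.5, 0.4], [-1.2], 3.0

# Infeasible point: one violated inequality and a nonzero equality residual.
gap = abs(augmented_lagrangian(f_val, [0.3, -2.0], [0.1], lam, mu, c)
          - lemma_4_1_form(f_val, [0.3, -2.0], [0.1], lam, mu, c))

# Feasible point (g <= 0, h = 0): the penalty value does not exceed f.
w_feasible = augmented_lagrangian(f_val, [-2.0, -0.05], [0.0], lam, mu, c)
print(gap, w_feasible)
```

The first quantity is zero up to rounding, which is the identity of Lemma 4.1, and the second one is not greater than the stand-in value of $f$, as stated in Lemma 4.3.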

Theorem 4.3. Let $\{x^k\} \subset \mathbb{R}^n$ and $\{c_k\} \subset \mathbb{R}_{++}$ be sequences such that $c_k \to \infty$ and $x^k \in L_w(c_k)$ for all $k$. Let $\{x^{k_j}\}$ be a subsequence of $\{x^k\}$ such that $x^{k_j} \to \bar{x}$. If $G_f \ne \emptyset$ holds, then either there exists $K$ such that $x^{k_j} \in L_f$ for all $k_j > K$, or $\bar{x}$ is a stationary point of the minimization of the infeasibility (4.1) that is infeasible for (1.1).

Proof. Since $x^{k_j} \in L_w(c_{k_j})$ implies $-\nabla w_{c_{k_j}}(x^{k_j}) \in N_X(x^{k_j})$ for all $k_j$, from Theorem 4.1 there is $K$ such that $x^{k_j}$ is a KKT point for all $k_j > K$, or $\bar{x}$ is a stationary point of the minimization of the infeasibility (4.1) that is infeasible. Considering the first case and fixing $k_j > K$, from Lemma 4.2 and the local optimality of $x^{k_j}$ there exists a neighborhood $V(x^{k_j})$ of $x^{k_j}$ such that
\[
f(x^{k_j}) = w_{c_{k_j}}(x^{k_j}) \le w_{c_{k_j}}(x) \quad \text{for all } x \in V(x^{k_j}) \cap X.
\]
Note that the above statement is also true for all $x \in V(x^{k_j}) \cap X \cap \{x \mid g(x) \le 0, \ h(x) = 0\}$. Finally, from Lemma 4.3 we conclude that
\[
f(x^{k_j}) \le w_{c_{k_j}}(x) \le f(x)
\]
for all $x \in V(x^{k_j})$ that are feasible for (1.1). This means that $x^{k_j} \in L_f$ for all $k_j > K$, which completes the proof.

Corollary 4.2. Suppose that there exists $\bar{c} > 0$ such that $\bigcup_{c \ge \bar{c}} L_w(c)$ is bounded. Assume also that $G_f \ne \emptyset$ holds and that all stationary points of the minimization of the infeasibility (4.1) are feasible for problem (1.1). Then there exists $\tilde{c} > 0$ such that if $x \in L_w(c)$ and $c > \tilde{c}$, then $x \in L_f$.

Proof. It follows from [2, Corollary 4.4].

5 Algorithm

In the previous section, we proved that the function $w_c$ is in fact an exact penalty function in the sense of Definition 4.1. It means that we are able to solve the original problem (1.1) by solving problem (3.1), which contains only easy constraints.

5.1 Updating the penalty parameter

As we noted previously, it is important to choose a good penalty parameter. In this work, we extend the dynamic update of the parameter proposed by Glad and Polak [10]. The basic idea is to use a test function that measures the risk of computing a point $x$ satisfying $-\nabla w_c(x) \in N_X(x)$ that is not a KKT point.
First, we define the following function:
\[ a_c(x) := g(x) + y_c(x) = \max\left\{ g(x), -\frac{\lambda(x)}{c} \right\}. \]
Note that, for any $c > 0$, $a_c(x) = 0$ is equivalent to $\langle g(x), \lambda(x) \rangle = 0$, $g(x) \le 0$, and $\lambda(x) \ge 0$. Also observe that if $(x, \lambda(x), \mu(x))$ is a KKT triple, then $-\nabla w_c(x) \in N_X(x)$. Finally, we define a test function given by
\[ t_c(x) := -\left\| P_X\big(-\nabla w_c(x) + x\big) - x \right\|^2 + \frac{1}{c^\gamma} \left( \|a_c(x)\|^2 + \|h(x)\|^2 \right), \]
where $\gamma > 0$. The function $t_c$ is continuous because all the functions involved in its definition are continuous. In the next proposition, we show that $t_c$ is indeed a test function.

Proposition 5.1. The following statements are equivalent:
(a) $(x, \lambda(x), \mu(x))$ is a KKT triple for (1.1);
(b) $-\nabla w_c(x) \in N_X(x)$, $a_c(x) = 0$, and $h(x) = 0$;
(c) $-\nabla w_c(x) \in N_X(x)$ and $t_c(x) \le 0$.

Proof. (a) $\Rightarrow$ (b): From (2.2), (2.4), and (2.5), we have $a_c(x) = 0$. Also from (2.1) and (2.3), we conclude that $-\nabla w_c(x) \in N_X(x)$.
(b) $\Rightarrow$ (a): It holds trivially.
(b) $\Rightarrow$ (c): We just have to show that $t_c(x) \le 0$. Since $a_c(x) = 0$ and $h(x) = 0$, we have $t_c(x) = -\| P_X(-\nabla w_c(x) + x) - x \|^2 \le 0$.
(c) $\Rightarrow$ (b): First, recall that $-\nabla w_c(x) \in N_X(x)$ is equivalent to $P_X(-\nabla w_c(x) + x) = x$. Thus, $t_c(x) \le 0$ implies
\[ t_c(x) = \frac{1}{c^\gamma} \left( \|a_c(x)\|^2 + \|h(x)\|^2 \right) \le 0, \]
which means that $a_c(x) = 0$ and $h(x) = 0$.

In the next results, we show that either $\bar{x}$ is a stationary point of the minimization of the infeasibility (4.1) that is infeasible for (1.1), or there exists $\bar{c}$ large enough such that $t_c(x) \le 0$ for all $c \ge \bar{c}$ and all $x$ in a neighborhood of $\bar{x}$. From Proposition 5.1, we observe that the latter case suggests a way to update the penalty parameter $c$. More precisely, each time we compute $x$ satisfying $-\nabla w_c(x) \in N_X(x)$, we increase the value of $c$ if $t_c(x)$ is greater than zero.

Lemma 5.1. Let $S \subset \mathbb{R}^n$ be a compact set that contains no KKT points. Then, either there exist $\bar{c}, \bar{\epsilon} > 0$ such that $\| P_X(-\nabla w_c(x) + x) - x \| \ge \bar{\epsilon}$ for all $x \in S$ and all $c \ge \bar{c}$; or there exist $\{x_k\} \subset S$ and $\{c_k\} \subset \mathbb{R}_{++}$ such that $c_k \to \infty$, $\| P_X(-\nabla w_{c_k}(x_k) + x_k) - x_k \| \to 0$, and $\{x_k\}$ converges to a stationary point of the minimization of the infeasibility (4.1) that is infeasible for (1.1).

Proof. If the first condition does not hold, there exist two sequences $\{x_k\} \subset S$ and $\{c_k\} \subset \mathbb{R}_{++}$ such that $x_k \to \bar{x} \in S$, $c_k \to \infty$, and $\| P_X(-\nabla w_{c_k}(x_k) + x_k) - x_k \| \to 0$.
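Recalling the definition of $\nabla w_{c_k}(x_k)$ and $P_X$, we have, from the continuity of the involved functions,
\[
\begin{aligned}
& -P\Big( \nabla f(x_k) + Jg(x_k)^T \lambda(x_k) + Jh(x_k)^T \mu(x_k) + J\lambda(x_k)^T \max\{g(x_k), -\lambda(x_k)/c_k\} + J\mu(x_k)^T h(x_k) \Big) \\
& \quad - P\Big( c_k Jg(x_k)^T \max\{g(x_k), -\lambda(x_k)/c_k\} + c_k Jh(x_k)^T h(x_k) \Big) + q(x_k) \to 0.
\end{aligned}
\tag{5.1}
\]
Now, if $c_k \to \infty$, we get
\[ -P\Big( Jg(\bar{x})^T \max\{g(\bar{x}), 0\} + Jh(\bar{x})^T h(\bar{x}) \Big) + q(\bar{x}) = 0, \]
which is equivalent to
\[ -\Big( Jg(\bar{x})^T \max\{g(\bar{x}), 0\} + Jh(\bar{x})^T h(\bar{x}) \Big) \in N_X(\bar{x}). \]
Hence, $\bar{x}$ is a stationary point of the minimization of the infeasibility (4.1).

Next, we assume that $\bar{x}$ is feasible. First, we define $\lambda_k$ and $\mu_k$ as follows:
\[
\begin{aligned}
\lambda_k &:= \lambda(x_k) + c_k \max\{g(x_k), -\lambda(x_k)/c_k\} = \max\{\lambda(x_k) + c_k g(x_k), 0\}, \\
\mu_k &:= \mu(x_k) + c_k h(x_k).
\end{aligned}
\]
From (5.1), we obtain
\[ P\Big( \nabla f(x_k) + Jg(x_k)^T \lambda_k + Jh(x_k)^T \mu_k + J\lambda(x_k)^T \max\{g(x_k), -\lambda(x_k)/c_k\} + J\mu(x_k)^T h(x_k) \Big) - q(x_k) \to 0. \]
By continuity, we get $\lambda_k \to \bar{\lambda} \ge 0$, $\mu_k \to \bar{\mu}$, and
\[ P\Big( \nabla f(\bar{x}) + Jg(\bar{x})^T \bar{\lambda} + Jh(\bar{x})^T \bar{\mu} + J\lambda(\bar{x})^T \max\{g(\bar{x}), 0\} + J\mu(\bar{x})^T h(\bar{x}) \Big) = q(\bar{x}). \]
From the definition of $\lambda_k$, if $g_i(\bar{x}) < 0$, then $\bar{\lambda}_i = 0$. Moreover, $h(\bar{x}) = \max\{g(\bar{x}), 0\} = 0$ because $\bar{x}$ is feasible. This shows that $P_X\big( -\nabla_x L(\bar{x}, \bar{\lambda}, \bar{\mu}) + \bar{x} \big) = \bar{x}$. Therefore, from (2.6), $(\bar{x}, \bar{\lambda}, \bar{\mu})$ is a KKT triple, which is a contradiction because $\bar{x} \in S$ and $S$ has no KKT points.

Proposition 5.2. For all $\bar{x} \in \mathbb{R}^n$, either $\bar{x}$ is a stationary point of the minimization of the infeasibility (4.1), or there exist $\bar{c}, \bar{\delta} > 0$ such that if $c \ge \bar{c}$ and $\|x - \bar{x}\| \le \bar{\delta}$, then $t_c(x) \le 0$.

Proof. Suppose that the second condition does not hold, that is, there are sequences $\{x_k\} \subset \mathbb{R}^n$ and $\{c_k\} \subset \mathbb{R}_{++}$ such that $x_k \to \bar{x}$, $c_k \to \infty$, and $t_{c_k}(x_k) > 0$. Note that in this case, by Proposition 5.1, no $x_k$ is a KKT point. We consider two cases.

1. Assume that $\bar{x}$ is not a KKT point. Applying Lemma 5.1 with $S := \{x_k\} \cup \{\bar{x}\}$, we obtain that either $\bar{x}$ is an infeasible stationary point of the minimization of the infeasibility (4.1), or
\[ t_{c_k}(x_k) \le -\bar{\epsilon}^2 + \frac{1}{c_k^\gamma} \left( \|a_{c_k}(x_k)\|^2 + \|h(x_k)\|^2 \right) \]
for all $k$ large enough. Since $c_k^\gamma \to \infty$, this gives $t_{c_k}(x_k) \le 0$ for $k$ large enough, which is a contradiction.

2. Assume now that $\bar{x}$ is a KKT point. From equation (4.7) and the fact that $P \nabla w_{c_k}(x_k) = P\big( P \nabla w_{c_k}(x_k) - q(x_k) \big)$, we obtain
\[
K_{c_k}(x_k) \begin{bmatrix} a_{c_k}(x_k) \\ h(x_k) \end{bmatrix}
= \frac{1}{c_k} \begin{bmatrix} Jg(x_k) \\ Jh(x_k) \end{bmatrix} P \big[ P \nabla w_{c_k}(x_k) - q(x_k) \big]
+ \frac{\zeta_2^2 \alpha(x_k)}{c_k} \begin{bmatrix} \lambda(x_k) \\ \mu(x_k) \end{bmatrix}
\]
for all $k$. Note that $\|P\| = 1$, that $K_{c_k}(x_k)$ converges to a nonsingular matrix $N(\bar{x})$, and that $Jg(x_k)$, $Jh(x_k)$, $\lambda(x_k)$, $\mu(x_k)$ converge to $Jg(\bar{x})$, $Jh(\bar{x})$, $\lambda(\bar{x})$, $\mu(\bar{x})$, respectively. Therefore, for sufficiently large $k$, we have
\[
\left\| \begin{bmatrix} a_{c_k}(x_k) \\ h(x_k) \end{bmatrix} \right\|
\le \frac{1}{c_k} \left\| N^{-1}(\bar{x}) \right\| \left( \left\| \begin{bmatrix} Jg(\bar{x}) \\ Jh(\bar{x}) \end{bmatrix} \right\| \left\| P \nabla w_{c_k}(x_k) - q(x_k) \right\| + \zeta_2^2 \alpha(\bar{x}) \left\| \begin{bmatrix} \lambda(\bar{x}) \\ \mu(\bar{x}) \end{bmatrix} \right\| \right).
\]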

From (4.8) and observing that $\max\{g(\bar{x}), 0\} = 0$ and $h(\bar{x}) = 0$, we get
\[
\left\| \begin{bmatrix} a_{c_k}(x_k) \\ h(x_k) \end{bmatrix} \right\|
\le \frac{1}{c_k} \left\| N^{-1}(\bar{x}) \right\| \left\| \begin{bmatrix} Jg(\bar{x}) \\ Jh(\bar{x}) \end{bmatrix} \right\| \left\| P \nabla w_{c_k}(x_k) - q(x_k) \right\|.
\]
Squaring both sides of the above inequality gives
\[
\|a_{c_k}(x_k)\|^2 + \|h(x_k)\|^2 \le \frac{1}{c_k^2} \left\| N^{-1}(\bar{x}) \right\|^2 \left( \|Jg(\bar{x})\|^2 + \|Jh(\bar{x})\|^2 \right) \left\| P \nabla w_{c_k}(x_k) - q(x_k) \right\|^2.
\]
Therefore,
\[
\begin{aligned}
t_{c_k}(x_k) &= -\left\| P \nabla w_{c_k}(x_k) - q(x_k) \right\|^2 + \frac{1}{c_k^\gamma} \left( \|a_{c_k}(x_k)\|^2 + \|h(x_k)\|^2 \right) \\
&\le \left( \frac{1}{c_k^{\gamma+2}} \left\| N^{-1}(\bar{x}) \right\|^2 \left( \|Jg(\bar{x})\|^2 + \|Jh(\bar{x})\|^2 \right) - 1 \right) \left\| P \nabla w_{c_k}(x_k) - q(x_k) \right\|^2,
\end{aligned}
\]
which is not positive for large $k$ since $c_k^{\gamma+2} \to \infty$, giving again a contradiction.

Based on the above results, we construct a framework of an algorithm to update the penalty parameter. We also show a theorem associated with it.

Algorithm 5.1. Dynamical update of the penalty parameter.
Step 1. Let $A(x, c)$ be an algorithm that computes a point $x$ satisfying $-\nabla w_c(x) \in N_X(x)$. Initialize $x_0 \in \mathbb{R}^n$, $c_0 > 0$, $\xi > 1$ and $\gamma > 0$. Set $k = 0$.
Step 2. If $x_k$ is a KKT point of the problem, stop.
Step 3. While $t_{c_k}(x_k) > 0$, do $c_k = \xi c_k$.
Step 4. Compute $x_{k+1} = A(x_k, c_k)$, set $k = k + 1$ and go to Step 2.

Theorem 5.1. Let $\{x_k\} \subset \mathbb{R}^n$ be a sequence computed by Algorithm 5.1. If $\{x_k\}$ is bounded and infinite, then each of its accumulation points is either a KKT point or a stationary point of the minimization of the infeasibility (4.1) that is infeasible for (1.1).

Proof. Suppose that $\bar{x}$ is an accumulation point of $\{x_k\}$. Then, by Proposition 5.2, if $\bar{x}$ is not a stationary point of the minimization of the infeasibility (4.1) that is infeasible, then $t_{c_k}(x_k) \le 0$ for all $k$ large enough. Let $\hat{c}$ be the largest value of $c_k$ that was computed. Since $\bar{x}$ is a feasible accumulation point of the algorithm $A(x, c)$, we have $-\nabla w_{\hat{c}}(\bar{x}) \in N_X(\bar{x})$ and, in particular, $\| P_X(-\nabla w_{\hat{c}}(\bar{x}) + \bar{x}) - \bar{x} \| = 0$. From the continuity of $t_c$, we also obtain $t_{\hat{c}}(\bar{x}) \le 0$, and so we conclude that $\bar{x}$ is a KKT point.
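The test function $t_c$ and the outer loop of Algorithm 5.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: $X = \{x \mid Ax = b\}$ with $A$ of full row rank, the values of $\nabla w_c(x)$, $g(x)$, $\lambda(x)$ and $h(x)$ are supplied by the caller (the multipliers estimate itself is not implemented here), and the names `make_affine_projection`, `penalty_test_value`, `dynamic_penalty_update`, as well as the safeguards `max_outer` and `c_cap`, are hypothetical and not part of the algorithm stated above.

```python
import numpy as np


def make_affine_projection(A, b):
    """Orthogonal projection onto X = {x : A x = b}; assumes A has full row rank."""
    gram = A @ A.T

    def project(z):
        return z - A.T @ np.linalg.solve(gram, A @ z - b)

    return project


def penalty_test_value(x, grad_w, g_val, lam_val, h_val, c, gamma, project):
    """t_c(x) = -||P_X(x - grad w_c(x)) - x||^2 + (||a_c(x)||^2 + ||h(x)||^2) / c^gamma."""
    a_c = np.maximum(g_val, -lam_val / c)   # a_c(x) = max{g(x), -lambda(x)/c}
    residual = project(x - grad_w) - x      # projected-gradient residual
    return -(residual @ residual) + (a_c @ a_c + h_val @ h_val) / c**gamma


def dynamic_penalty_update(x0, c0, xi, t, inner_solver, is_kkt,
                           max_outer=50, c_cap=1e16):
    """Outer loop in the spirit of Algorithm 5.1.

    t(x, c), inner_solver(x, c) and is_kkt(x) are user-supplied callables.
    max_outer and c_cap are safeguards added for this sketch only.
    """
    x, c = x0, c0
    for _ in range(max_outer):
        if is_kkt(x):                       # Step 2
            break
        while t(x, c) > 0 and c < c_cap:    # Step 3: c <- xi * c
            c *= xi
        x = inner_solver(x, c)              # Step 4
    return x, c


# Toy data: X = {x in R^2 : x_1 + x_2 = 1}.
project = make_affine_projection(np.array([[1.0, 1.0]]), np.array([1.0]))
x = np.array([0.5, 0.5])
grad_w = np.array([1.0, 1.0])               # normal to X, so the residual vanishes
t_ok = penalty_test_value(x, grad_w, np.array([-1.0]), np.array([0.0]),
                          np.array([0.0]), c=1.0, gamma=2.0, project=project)
t_bad = penalty_test_value(x, grad_w, np.array([0.2]), np.array([0.0]),
                           np.array([0.0]), c=1.0, gamma=2.0, project=project)
```

In the toy data, `t_ok` corresponds to a point with $a_c(x) = 0$ and $h(x) = 0$, so the test is passed, while `t_bad` has a violated inequality constraint and a vanishing projected gradient, which is exactly the situation in which Step 3 increases $c$.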

5.2 Spectral projected gradient method

In this section, we present an algorithm to solve problem (3.1). Since it is not an unconstrained problem, the algorithm proposed in [2] cannot be used here. Thus, instead of a Newton-type method, we propose to use the spectral projected gradient (SPG) method [4] to solve the easy-constrained problem (3.1). The SPG method utilizes the orthogonal projection onto the set $X$. Note that the computation of an orthogonal projection is difficult in general. However, such a computation is trivial here, because $X$ is the set of easy constraints. In particular, if $X$ is defined as in (1.2), the projection is given by (3.4). The main algorithm is as follows.

Algorithm 5.2. The spectral projected gradient method with the exact penalty function.
Step 1. Choose $x_0 \in \mathbb{R}^n$, $c_0 > 0$, $\beta_0 \in [\beta_{\min}, \beta_{\max}]$, $\xi > 1$, $\epsilon \ge 0$ and $\sigma \in (0, 1/2)$. Set $k = 0$.
Step 2. If $\| P_X\big( x_k - \nabla w_{c_k}(x_k) \big) - x_k \| \le \epsilon$, stop.
Step 3. While $t_{c_k}(x_k) > 0$, do $c_k = \xi c_k$.
Step 4. Compute $d_k = P_X\big( x_k - \beta_k \nabla w_{c_k}(x_k) \big) - x_k$.
Step 5. Find $t_k > 0$ such that $w_{c_k}(x_k + t_k d_k) \le w_{c_k}(x_k) + \sigma t_k \langle \nabla w_{c_k}(x_k), d_k \rangle$ with a backtracking strategy.
Step 6. Set $x_{k+1} = x_k + t_k d_k$.
Step 7. Let $s_k = x_{k+1} - x_k$ and $y_k = \nabla w_{c_k}(x_{k+1}) - \nabla w_{c_k}(x_k)$. If $\langle s_k, y_k \rangle \le 0$, then set $\beta_{k+1} = \beta_{\max}$. Otherwise, set
\[ \beta_{k+1} = \max\left\{ \beta_{\min}, \min\left\{ \frac{\|s_k\|^2}{\langle s_k, y_k \rangle}, \beta_{\max} \right\} \right\}. \]
Step 8. Set $k = k + 1$ and go to Step 2.

In Step 1, the parameter $\beta_0 \in [\beta_{\min}, \beta_{\max}]$ is arbitrary, but one possible choice, given in [4], is
\[ \beta_0 := \max\left\{ \beta_{\min}, \min\left\{ \frac{1}{\| P_X\big( x_0 - \nabla w_{c_0}(x_0) \big) - x_0 \|}, \beta_{\max} \right\} \right\}, \]
assuming that $P_X\big( x_0 - \nabla w_{c_0}(x_0) \big) \neq x_0$. Another possibility, given in [5], is
\[
\beta_0 :=
\begin{cases}
\max\left\{ \beta_{\min}, \min\left\{ \dfrac{\|\bar{s}\|^2}{\langle \bar{s}, \bar{y} \rangle}, \beta_{\max} \right\} \right\}, & \text{if } \langle \bar{s}, \bar{y} \rangle > 0, \\
\beta_{\max}, & \text{otherwise},
\end{cases}
\]
where $\bar{s} := \bar{x} - x_0$, $\bar{y} := \nabla w_{c_0}(\bar{x}) - \nabla w_{c_0}(x_0)$, with $\bar{x} := x_0 - \max\{ \epsilon_{\mathrm{rel}} \|x_0\|, \epsilon_{\mathrm{abs}} \} \nabla w_{c_0}(x_0)$, and $\epsilon_{\mathrm{rel}}$, $\epsilon_{\mathrm{abs}}$ are small relative and absolute tolerances, respectively, related to the machine precision.

Furthermore, a typical choice for the other parameters is $\beta_{\min} = 10^{-30}$, $\beta_{\max} = 10^{30}$, $\sigma = 10^{-4}$, $\xi = 10$, $\gamma = 2$, $\zeta_1 = \zeta_2 = 2$, and $\epsilon = 10^{-8}$.
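As an illustration of Steps 2 and 4 to 7 of Algorithm 5.2, the following Python sketch runs the spectral projected gradient iteration with a monotone Armijo backtracking on a toy problem. Several points are assumptions made for this sketch and are not taken from the thesis: a fixed smooth objective `fun` with gradient `grad` stands in for $w_{c_k}$ (so Step 3, the penalty update, is omitted), $X = \{x \mid Ax = b\}$ with $A$ of full row rank, the initial point is projected onto $X$ before iterating, the step is halved in the backtracking, and the names `make_affine_projection`, `spg`, `max_iter` and `t_floor` are hypothetical.

```python
import numpy as np


def make_affine_projection(A, b):
    """Orthogonal projection onto X = {x : A x = b}; assumes A has full row rank."""
    gram = A @ A.T

    def project(z):
        return z - A.T @ np.linalg.solve(gram, A @ z - b)

    return project


def spg(fun, grad, project, x0, beta0=1.0, beta_min=1e-30, beta_max=1e30,
        sigma=1e-4, eps=1e-8, max_iter=500, t_floor=1e-12):
    """Spectral projected gradient iteration for a fixed objective (sketch)."""
    x = project(np.asarray(x0, dtype=float))  # assumption: start from a point of X
    beta = beta0
    for _ in range(max_iter):
        gx = grad(x)
        if np.linalg.norm(project(x - gx) - x) <= eps:        # Step 2
            break
        d = project(x - beta * gx) - x                        # Step 4
        slope = gx @ d
        fx = fun(x)
        t = 1.0
        while fun(x + t * d) > fx + sigma * t * slope and t > t_floor:
            t *= 0.5                                          # Step 5: backtracking
        x_new = x + t * d                                     # Step 6
        s = x_new - x
        y = grad(x_new) - gx
        sy = s @ y
        if sy <= 0:                                           # Step 7
            beta = beta_max
        else:
            beta = max(beta_min, min((s @ s) / sy, beta_max))
        x = x_new
    return x


# Toy problem: minimize 0.5 * ||x - p||^2 subject to x_1 + x_2 + x_3 = 1.
p = np.array([1.0, 2.0, 3.0])
project = make_affine_projection(np.ones((1, 3)), np.array([1.0]))
x_star = spg(lambda x: 0.5 * np.sum((x - p) ** 2), lambda x: x - p,
             project, np.zeros(3))
```

For this toy problem the minimizer is the projection of `p` onto the hyperplane, so `x_star` should agree with `project(p)` up to the tolerance `eps`.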

As for the initial value of the penalty parameter, we can take the idea given in [1], which considers the scaling of the objective function and the constraints, so
\[
c_0 := \max\left\{ c_{\min}, \min\left\{ \frac{10 \max\{ |f(x_0)|, 1 \}}{\max\left\{ 1, \tfrac{1}{2} \left( \|h(x_0)\|^2 + \|\max\{g(x_0), 0\}\|^2 + \|P_X(x_0) - x_0\|^2 \right) \right\}}, c_{\max} \right\} \right\},
\]
where $c_{\min}$ and $c_{\max}$ are the minimum and maximum values allowed for the penalty parameter, for example, $c_{\min} = 1$ and $c_{\max} = 10^{8}$.

We also observe that in Step 5, an Armijo-type line search is performed in order to find a step size $t_k$. For each iteration of the line search, we need to compute $w_{c_k}(x_k + t_k d_k)$, which requires the computation of the multipliers estimates $\lambda(x_k + t_k d_k)$ and $\mu(x_k + t_k d_k)$. This, in turn, requires a matrix factorization to solve a linear least squares problem of size $(n + 2m + p) \times (m + p)$. Since the trial point changes at each iteration of the line search, this strategy may be computationally expensive.

Another fact that we should point out is that the (spectral) projected gradient method is a first-order method, in the sense that it requires only the computation of the gradient $\nabla w_{c_k}(x_k)$ at each iteration. In [2], the authors proposed to use a Gauss-Newton-type method to solve problem (1.1) with $X = \mathbb{R}^n$. Recalling that an element of the B-subdifferential $\partial_B \nabla w_{c_k}(x_k)$ has third-order terms of the problem data, the idea given in [2] is to ignore those terms by considering the augmented Lagrangian function for variational inequalities. Another approach, given in [9] for nonlinear second-order cone programming problems, is to use an approximation of an element of $\partial_B \nabla w_{c_k}(x_k)$, so that the Newton method applied to the problem has global and superlinear convergence. The use of a second-order method for the case considered here (that is, a minimization with easy constraints) is still an ongoing topic of research.
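The initial penalty parameter described above can be computed with a short helper. The following Python sketch assumes that the values $f(x_0)$, $g(x_0)$, $h(x_0)$ and $P_X(x_0)$ have already been evaluated by the caller; the function name `initial_penalty` and its argument names are hypothetical, and the defaults for `c_min` and `c_max` follow the example values mentioned in the text.

```python
import numpy as np


def initial_penalty(f0, g0, h0, x0, proj_x0, c_min=1.0, c_max=1e8):
    """Scaled initial penalty parameter c_0, clipped to [c_min, c_max].

    f0, g0, h0 are the values f(x_0), g(x_0), h(x_0); proj_x0 is P_X(x_0).
    """
    infeasibility = 0.5 * (
        h0 @ h0
        + np.sum(np.maximum(g0, 0.0) ** 2)   # only violated inequalities count
        + np.sum((proj_x0 - x0) ** 2)        # distance to the easy set X
    )
    c0 = 10.0 * max(abs(f0), 1.0) / max(1.0, infeasibility)
    return max(c_min, min(c0, c_max))


# Toy values: f(x_0) = 5, one violated inequality, one violated equality, x_0 in X.
c0 = initial_penalty(5.0, np.array([1.0, -2.0]), np.array([2.0]),
                     np.zeros(2), np.zeros(2))
```

With these toy values the infeasibility measure is 2.5, so the unclipped ratio is 50 / 2.5 = 20, which lies inside the interval and is returned unchanged.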
6 Conclusion

In this paper, we extended Andreani, Fukuda and Silva's exact penalty function approach to nonlinear optimization with easy constraints, in particular linear equality constraints. In order to do that, we proposed a modification of the multipliers estimate and showed some exactness results. We also gave a way to dynamically update the penalty parameter, and proposed to use the spectral projected gradient method to solve the problem. As future work, it is desirable to use a second-order method, like the Newton method, and show the corresponding convergence results. Numerical experiments should also be done, including comparisons with other methods.

Acknowledgments

I would like to express my appreciation to Assistant Professor Ellen Hidemi Fukuda. She always encouraged and supervised me kindly, although I often troubled her. Moreover, she gave me precious advice during my research. Without her help, I could not have obtained the results in this paper. I would also like to express my gratitude to Professor Nobuo Yamashita, who always supported me during my master's course. He gave me invaluable comments and warm encouragement. Finally, I would like to thank all members of Yamashita Laboratory, my friends and my family for their encouraging words.

References

[1] R. Andreani, E. G. Birgin, J. M. Martínez, M. Schuverdt. On augmented Lagrangian methods with general lower-level constraints, SIAM J. Optim. 18(4), pp. 1286-1309 (2007)
[2] R. Andreani, E. H. Fukuda, P. J. S. Silva. A Gauss-Newton approach for solving constrained optimization problems using differentiable exact penalties, J. Optim. Theory Appl. 156(2), pp. 417-419 (2013)
[3] T. A. André, P. J. S. Silva. Exact penalties for variational inequalities with applications to nonlinear complementarity problems, Comput. Optim. Appl. 47(3), pp. 401-429 (2010)
[4] E. G. Birgin, J. M. Martínez, M. Raydan. Nonmonotone spectral gradient methods on convex sets, SIAM J. Optim. 10(4), pp. 1196-1211 (2000)
[5] E. G. Birgin, J. M. Martínez, M. Raydan. Inexact spectral projected gradient methods on convex sets, IMA J. Numer. Anal. 23, pp. 539-559 (2003)
[6] G. Di Pillo, L. Grippo. A continuously differentiable exact penalty function for nonlinear programming problems with inequality constraints, SIAM J. Control Optim. 23, pp. 72-84 (1985)
[7] G. Di Pillo, L. Grippo. Exact penalty functions in constrained optimization, SIAM J. Control Optim. 27(6), pp. 1333-1360 (1989)
[8] R. Fletcher. A class of methods for nonlinear programming with termination and convergence properties, In: J. Abadie (ed.) Integer and Nonlinear Programming, North-Holland, Amsterdam, pp. 157-173 (1970)
[9] E. H. Fukuda, P. J. S. Silva, M. Fukushima. Differentiable exact penalty functions for nonlinear second-order cone programs, SIAM J. Optim. 22(4), pp. 1607-1633 (2012)
[10] T. Glad, E. Polak. A multiplier method with automatic limitation of penalty growth, Math. Program. 17(2), pp. 140-155 (1979)
[11] M. R. Hestenes. Multiplier and gradient methods, J. Optim. Theory Appl. 4, pp. 303-320 (1969)
[12] S. Lucidi. New results on a continuously differentiable exact penalty function, SIAM J. Optim. 2(4), pp. 558-574 (1992)
[13] H. Mukai, E. Polak. A quadratically convergent primal-dual algorithm with global convergence properties for solving optimization problems with equality constraints, Math. Program. 9(3), pp. 336-349 (1975)
[14] M. J. D. Powell. A method for nonlinear constraints in minimization problems, In: R. Fletcher (ed.) Optimization, Academic Press, New York, pp. 283-298 (1969)
[15] R. T. Rockafellar. Augmented Lagrange multiplier functions and duality in nonconvex programming, SIAM J. Control Optim. 12(2), pp. 268-285 (1974)
[16] W. I. Zangwill. Nonlinear programming via penalty functions, Manag. Sci. 13, pp. 344-358 (1967)