A New Sequential Optimality Condition for Constrained Nonsmooth Optimization


Elias Salomão Helou, Sandra A. Santos, Lucas E. A. Simões

November 23, 2018

Abstract. We introduce a sequential optimality condition for locally Lipschitz constrained nonsmooth optimization, verifiable using only derivative information, and which holds even in the absence of any constraint qualification. The proposed sequential optimality condition is not only novel for nonsmooth problems, but brings new insights for the smooth case as well. We present a practical algorithm that generates iterates fulfilling the new necessary optimality condition. A main feature of the devised algorithm is that it allows a stronger control over the infeasibility of the iterates than is usually obtained by exact penalty strategies, ensuring theoretical and practical advantages. Illustrative numerical experiments highlight the potentialities of the algorithm.

Keywords: nonsmooth nonconvex optimization; constrained optimization; sequential optimality condition; constraint qualification

AMS Subject Classifications: 90C26, 90C30, 90C46

1 Introduction

We consider constrained nonsmooth optimization problems of the form

  min_{x ∈ R^n} f(x)  s.t.  c(x) ≤ 0,   (P)

where both f : R^n → R and c : R^n → R^p are locally Lipschitz continuous functions. Equality constraints of the type h(x) = 0 can also easily be incorporated into our framework (see Remark 1 for more details).

A common way of solving constrained optimization problems is to turn (P) into an unconstrained minimization by penalizing points x ∈ R^n that violate the constraints, i.e., one seeks a solution to

  min_{x ∈ R^n} f(x) + ρ z(x),   (Unc-P)

where ρ is a positive real number and z : R^n → R is a continuous function satisfying z(x) > 0 if c(x) ≰ 0, and z(x) = 0 if c(x) ≤ 0.

Institute of Mathematical Sciences and Computation, University of São Paulo, São Carlos - SP, Brazil (elias@icmc.usp.br). Department of Applied Mathematics, University of Campinas, Campinas - SP, Brazil (sandra@ime.unicamp.br, simoes.lea@gmail.com).
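As a concrete choice of z (an illustration, not taken from the paper), the ℓ1 violation z(x) = Σ_i max{c_i(x), 0} satisfies both requirements above and is the classical nondifferentiable exact penalty term. The sketch below, in Python with numpy, simply evaluates the penalized objective of (Unc-P) for that choice on a hypothetical toy instance; the names `f`, `c` and the data are assumptions of this example.

```python
import numpy as np

def l1_violation(c_val):
    """z(x) = sum_i max{c_i(x), 0}: positive iff some constraint is violated, zero otherwise."""
    return float(np.sum(np.maximum(c_val, 0.0)))

def penalized_objective(x, f, c, rho):
    """Objective of (Unc-P) with the l1 exact penalty term."""
    return f(x) + rho * l1_violation(c(x))

# Toy instance: min x subject to 1 - x <= 0 (i.e. x >= 1).  For rho > 1 the penalized
# minimizer coincides with the constrained one, which is the idea behind exact penalization.
f = lambda x: float(x[0])
c = lambda x: np.array([1.0 - x[0]])
print(penalized_objective(np.array([0.5]), f, c, rho=2.0))   # 0.5 + 2*0.5 = 1.5
print(penalized_objective(np.array([1.0]), f, c, rho=2.0))   # 1.0 at the feasible minimizer
```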

Under certain conditions, and considering ρ to be large enough, the unconstrained minimization (Unc-P) and the original problem (P) are equivalent [12]. This approach is known as exact penalization [28, 50] and it can be used to derive most of the theory behind constrained optimization in finite dimensions [13]. For the majority of optimization problems, the equivalence between (P) and (Unc-P) can only be achieved for finite ρ when z is a nondifferentiable function. However, when both objective and constraint functions are smooth in the entire domain, one usually does not want to trade the constrained smooth problem for a nonsmooth unconstrained one. Hence, many alternatives have been proposed over the years to overcome the nondifferentiable nature of (Unc-P). For instance, the Augmented Lagrangian [1, 20] and Sequential Quadratic Programming (SQP) [9, 46] are widely known methods. However, when (P) already presents nonsmoothness, the nondifferentiability of z usually does not introduce any additional difficulty. For this reason, many methods developed for solving constrained nonsmooth optimization problems are based on exact penalty functions [22, 23, 38, 39, 48].

There is a strong connection between constraint qualifications (CQ) [5, 12, 46] for problem (P) and the existence of a finite penalty parameter that makes (Unc-P) equivalent to (P). Moreover, convergence results for exact penalization methods (and, more generally, for the majority of optimization algorithms) are based on different kinds of CQs. However, checking the validity of constraint qualifications is not an easy task, which may justify the fact that most algorithms are not designed to test any kind of CQ even when the theoretical convergence of the method relies on such conditions. As a result, practical necessary optimality conditions for (P) that do not depend on constraint qualifications are of great importance for establishing theoretically sound stopping criteria for any optimization method.

The approximate Karush-Kuhn-Tucker (AKKT) and the complementary approximate Karush-Kuhn-Tucker (CAKKT) conditions [2, 4] present themselves as reliable necessary optimality conditions for smooth optimization problems. Looking at auxiliary functions that approximate the exact penalty approach, the authors of both studies show that any solution x* of the smooth optimization problem must satisfy the following sequential optimality condition

  lim_{k→∞} ‖ ∇f(x^k) + Σ_{i=1}^p μ_i^k ∇c_i(x^k) ‖ = 0,   (1)

where ‖x‖ is the Euclidean norm of x ∈ R^n, {x^k} ⊂ R^n is a sequence converging to x*, and {μ^k} ⊂ R^p_+ is a sequence of vectors whose components μ_i^k must satisfy

  lim_{k→∞} min{ −c_i(x^k), μ_i^k } = 0,

where c_i : R^n → R are the components of c : R^n → R^p for each i ∈ {1, ..., p}.

A natural attempt to generalize (1) to the nonsmooth case is to use the subdifferential concept, i.e., to consider a sequence of elements

  v^k ∈ ( ∂f(x^k) + Σ_{i=1}^p μ_i^k ∂c_i(x^k) ),  with  lim_{k→∞} ‖v^k‖ = 0,   (2)

where ∂g(x) stands for the Clarke subdifferential set of the function g at the point x [18]. Unfortunately, appropriate vectors from the subdifferential satisfying these conditions are, in general, not expected to be computed by numerical algorithms, which invalidates the use of (2) as a practical stopping criterion.
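For a smooth instance, the two quantities that must vanish along an AKKT sequence are cheap to monitor. The snippet below is only an illustrative sketch: `grad_f`, `jac_c` and `c` are hypothetical user-supplied callbacks, and the toy data at the end are an assumption of this example.

```python
import numpy as np

def akkt_residuals(x, mu, grad_f, jac_c, c):
    """Residuals that vanish along an AKKT sequence (smooth case).

    x      : current iterate, shape (n,)
    mu     : approximate multipliers, shape (p,), nonnegative
    grad_f : callable returning the gradient of f at x, shape (n,)
    jac_c  : callable returning the Jacobian of c at x, shape (p, n)
    c      : callable returning the constraint values at x, shape (p,)
    """
    g = grad_f(x) + jac_c(x).T @ mu          # nabla f + sum_i mu_i nabla c_i, as in (1)
    stat = np.linalg.norm(g)                 # stationarity residual
    compl = np.max(np.minimum(-c(x), mu))    # max_i min{-c_i(x), mu_i}
    return stat, compl

# Toy usage: min x1 + x2 s.t. x1^2 + x2^2 - 2 <= 0; minimizer (-1, -1) with multiplier 1/2.
f_grad = lambda x: np.array([1.0, 1.0])
c_fun  = lambda x: np.array([x[0]**2 + x[1]**2 - 2.0])
c_jac  = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]]])
print(akkt_residuals(np.array([-1.0, -1.0]), np.array([0.5]), f_grad, c_jac, c_fun))
```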
Using the fact that many real nonsmooth optimization problems are described by functions that are continuously differentiable in a full-measure subset of R^n, the authors of [27] present a necessary optimality condition that relies upon the fact that, in many cases, the sets ∂f(x*) and ∂c_i(x*) can be traded for ∂f(x̂) and ∂c_i(x̂), where x̂ is a point sufficiently close to x*. However, if no constraint qualification is assumed, one must have x̂ = x*, which brings back the same issues discussed in the previous paragraph.

In this paper, we propose a new sequential optimality condition for constrained nonsmooth optimization problems that allows the user to work with derivatives even in the absence of any CQ.

To show that our necessary optimality condition is practical, we present an algorithm that is proven to generate a sequence of points satisfying such conditions. A main feature of the proposed algorithm is that it exerts a stronger control over the infeasibility of the iterates than usual exact penalty methods, which has important practical consequences.

One of the main challenges of exact penalty methods is the selection of the initial value of the penalty parameter. In case this value is chosen too large, the method may favor feasibility to the detriment of optimality, which, in turn, can cause very short steps toward the optimal solution and/or may cause the penalized problem to be ill-conditioned. On the other hand, if the penalty parameter value is much smaller than the magnitude of the objective function at infeasible points, the method may rapidly be attracted by unconstrained minimizers that possess very low function values, preventing the user from obtaining a successful solution for reasonable values of ρ. Such a phenomenon is called greediness [7, 17], and may occur even if the user starts the method at a feasible point of the optimization problem. Unlike some existing penalty methods, for which there is no simple way to compute a suitable initial value for the penalty parameter so as to easily confine the iterates to an almost feasible region, our algorithm allows the user to set a tolerance target value ξ > 0 for infeasibility. Outside this tolerance region, the method is indifferent to the objective function, which ensures control of the infeasibility even at the initial iterations. As a consequence, it prevents our method from suffering from the greediness phenomenon.

The outline of the paper is as follows. Section 2 presents theoretical results supporting that every local minimizer of (P) must fulfill our proposed sequential optimality conditions. Section 3 shows the relation between our sequential optimality conditions and the AKKT and CAKKT conditions in the case where the objective and constraint functions of (P) are all smooth. In Section 4, we state a general algorithm that produces a sequence of iterates fulfilling the proposed sequential optimality conditions. Section 5 brings numerical results that illustrate some of the important properties of our method. Finally, we leave Section 6 for the conclusions of this study.

Throughout the manuscript, the following notation is frequently used: ‖·‖ is the Euclidean norm; B(x, ε) is the Euclidean closed ball with center x and radius ε; P(v | X) is the Euclidean projection of v onto X; R_+ is the set of all nonnegative real numbers; R_{++} is the set of all strictly positive real numbers; v_+ := max{v, 0}, and if v is a vector, then v_+ is also a vector where every entry is taken as the maximum between the respective entry of v and zero; conv X denotes the convex hull of X; cl X denotes the closure of X; A + B represents the set {x + y : x ∈ A, y ∈ B}; A·B represents the set {xy : x ∈ A, y ∈ B}.

2 Establishing new necessary optimality conditions

We start this section by introducing the concept of the (Goldstein) ε-subdifferential set of a locally Lipschitz continuous function f : R^n → R [31].

Definition 2.1 (ε-subdifferential set, ε-subgradient, ε-stationary point). The ε-subdifferential set of f at x is given by

  ∂_ε f(x) := conv ∂f(B(x, ε)),

where conv S is the convex hull of the set S and B(x, ε) := {y : ‖y − x‖ ≤ ε}. Any v ∈ ∂_ε f(x) is known as an ε-subgradient of f at x. Moreover, if 0 ∈ ∂_ε f(x), then we say that x is an ε-stationary point for f.

One of the greatest advantages of looking at ε-subdifferential sets instead of subdifferential sets is the possibility of seeing ∂_ε f(x) as a convex hull of the derivatives of f nearby x. Indeed, for any locally Lipschitz continuous function f and any full-measure subset D_f of R^n such that f is differentiable at every point of D_f (this subset always exists due to Rademacher's theorem [30, Theorem 3.1.6]), the following relations hold (see [14, 40])

  G_ε^f(x) ⊂ ∂_ε f(x)  and  ∂_{ε₁} f(x) ⊂ G_{ε₂}^f(x)  (0 ≤ ε₁ < ε₂),   (3)

where G_ε^f(x) := cl conv ∇f(B(x, ε) ∩ D_f), with cl S being the closure of the set S. Recent advances on practical tools [14, 15, 16] to approximate G_ε^f(x) allow us to consider a new sequential optimality condition of practical use. The subgradients involved in this new necessary optimality condition are those associated with the objective function and with the constraints related to the following index set

  I_ε(x) := {i : ∃ y ∈ B(x, ε) with c_i(y) ≥ 0}.

Definition 2.2 (weak ε-approximate Nonsmooth Optimality Condition). A feasible point x* ∈ R^n of (P) is said to satisfy the weak ε-approximate Nonsmooth Optimality Condition (weak ε-ANOC) if there exist sequences {x^k} ⊂ R^n, {ε_k} ⊂ R_{++}, where R_{++} is the set of positive real numbers, {v^k} ⊂ R^n and {μ^k} ⊂ R^p_+ such that x^k → x*, ε_k → 0 and v^k → 0, where

  v^k ∈ ( G_{ε_k}^f(x^k) + Σ_{i=1}^p μ_i^k G_{ε_k}^{c_i}(x^k) )  and  i ∉ I_{ε_k}(x^k) ⇒ μ_i^k = 0.   (4)

Notice that the above definition does not impose any control over the speed of the convergence ε_k → 0. Not establishing a relation between the sequences {ε_k} and {x^k} allows ε_k ≫ ‖x^k − x*‖, which, in turn, means that the weak ε-ANOC may depart too much from the necessary condition that at least one generalized derivative of f near x* must be close to a linear combination of the subgradients of the active constraints at x*. Fig. 1 exemplifies this issue, showing that subgradients of far away inactive constraints may be considered in the linear combination.

Figure 1: Illustration of how the value of ε in the ε-subdifferential can influence the activeness of the constraints.

Using the exact penalization approach for problem (P) and a reasoning similar to the one employed by the authors in [4, Theorem 3.3], one can infer the weak ε-ANOC as a necessary optimality condition for nonsmooth problems. However, this strategy does not clarify the question of how fast the sequence {ε_k} must go to zero. To overcome this issue, we have applied a new penalization strategy that has its roots in [8, Section 4.1].
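Since G_ε^f(x) is the closed convex hull of gradients taken in B(x, ε) ∩ D_f, it can be approximated in practice by sampling: draw points in the ball where f is differentiable, collect their gradients, and measure ε-stationarity through the norm of the minimum-norm element of their convex hull, i.e., an estimate of ‖P(0 | G_ε^f(x))‖. The sketch below is only an illustration of this idea (it is not the method of [14, 15, 16]); the sample size, iteration counts and the toy function are arbitrary assumptions, and the min-norm step uses a plain Frank-Wolfe iteration on the unit simplex.

```python
import numpy as np

def min_norm_in_hull(G, iters=500):
    """Approximate the min-norm point of conv{rows of G} by Frank-Wolfe on the simplex."""
    m = G.shape[0]
    lam = np.full(m, 1.0 / m)                       # start at the barycenter
    for _ in range(iters):
        v = lam @ G                                 # current point in the hull
        grad = 2.0 * (G @ v)                        # gradient of ||sum_j lam_j G_j||^2 w.r.t. lam
        j = int(np.argmin(grad))                    # best vertex of the simplex
        d = G[j] - v                                # Frank-Wolfe direction in point space
        denom = d @ d
        if denom <= 1e-16:
            break
        t = np.clip(-(v @ d) / denom, 0.0, 1.0)     # exact line search for the quadratic
        lam = (1.0 - t) * lam
        lam[j] += t
    return lam @ G

def eps_stationarity(grad_f, x, eps, n_samples=30, rng=None):
    """Sample gradients in B(x, eps) and return an estimate of ||P(0 | G_eps^f(x))||."""
    rng = np.random.default_rng(rng)
    n = x.size
    pts = [x]
    for _ in range(n_samples):
        u = rng.normal(size=n)
        u *= rng.uniform() ** (1.0 / n) / np.linalg.norm(u)   # uniform point in the unit ball
        pts.append(x + eps * u)
    G = np.array([grad_f(p) for p in pts])
    return np.linalg.norm(min_norm_in_hull(G))

# f(x) = |x1| + |x2| is nonsmooth at the origin: the measure is small for points near it.
grad_abs = lambda p: np.sign(p)    # a valid gradient wherever no entry of p is zero
print(eps_stationarity(grad_abs, np.array([0.05, -0.02]), eps=0.1))
```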

The inspiring idea behind this new penalization is to trade the original nonsmooth problem for the minimization of a discontinuous function Θ_ρ : R^n → R,

  Θ_ρ(x) = 0, if c(x) ≰ 0;  Θ_ρ(x) = f(x) − ρ, otherwise,

where ρ ∈ R plays a different role than the one it has in the exact penalization function. Here, the parameter ρ is any constant ensuring that f(x*) − ρ < 0, with x* being a solution of (P). Like the strategy of solving constrained smooth optimization problems by the use of an exact penalty function, where a degree of difficulty is added by introducing a nondifferentiable term in order to avoid a constrained minimization, this approach also adds one extra difficulty to the nonsmoothness of the original problem by using a discontinuous function. Therefore, we avoid using this idea directly.

Theorem 2.1 ahead shows that the weak ε-ANOC is, indeed, a necessary optimality condition for problem (P) and also elucidates the matter of the speed of the convergence ε_k → 0. Its proof relies on a sequence {Ψ_{ξ_k,ρ}} of nonsmooth continuous functions Ψ_{ξ,ρ} : R^n → R approximating the discontinuous function Θ_ρ. Given a scalar ξ ∈ R_{++}, we define

  Ψ_{ξ,ρ}(x) := max{ 1 − ‖c(x)_+‖₁ / ξ, 0 } [f(x) − ρ] + ‖c(x)_+‖₁.   (5)

It is worth noticing that, since f and c are locally Lipschitz continuous functions, the map Ψ_{ξ,ρ} is also locally Lipschitz continuous. Preceding the result that establishes the weak ε-ANOC as a necessary optimality condition, we present two lemmas that will facilitate the proof of Theorem 2.1.

Lemma 2.1. Let x ∈ R^n, (ε, ξ) ∈ R_+ × R_{++}, ρ ∈ R with ρ > f(x), and v ∈ ∂_ε Ψ_{ξ,ρ}(x). Then, if ε > 0 is such that y ∈ B(x, ε) ⇒ ρ ≥ f(y), there exist {x^j}_{j=1}^{n+1} ⊂ B(x, ε) and λ ∈ R^{n+1}_+ with e^T λ = 1, where e := (1, 1, ..., 1)^T, such that

  v ∈ Σ_{j=1}^{n+1} λ_j ( max{ 1 − ‖c(x^j)_+‖₁ / ξ, 0 } ∂f(x^j) + σ_j ∂‖c(x^j)_+‖₁ ),   (6)

where σ_j ≥ 1, j ∈ {1, ..., n+1}.

Proof. Recalling that ∂_ε Ψ_{ξ,ρ}(x) is a convex set, it follows from Carathéodory's Theorem [49, Theorem 2.29] that there exist {s^j}_{j=1}^{n+1} ⊂ ∂Ψ_{ξ,ρ}(B(x, ε)) and λ ∈ R^{n+1}_+, with e^T λ = 1, such that v = Σ_{j=1}^{n+1} λ_j s^j. By the calculus rules of [18] (in particular, [18, Theorem 2.3.9] and [18, Proposition 2.3.3]), we know that ∂Ψ_{ξ,ρ}(x) is a subset of

  max{ 1 − ‖c(x)_+‖₁ / ξ, 0 } ∂f(x) + ( 1 + [ρ − f(x)] conv{1/ξ, 0} ) ∂‖c(x)_+‖₁.

So, since, by hypothesis, ρ − f(x^j) ≥ 0 for all j ∈ {1, ..., n+1}, it follows that there exist {x^j}_{j=1}^{n+1} ⊂ B(x, ε) such that (6) holds with σ_j ∈ 1 + [ρ − f(x^j)] conv{1/ξ, 0}, and hence σ_j ≥ 1.

The next result presents sufficient conditions for the feasible point x* to satisfy the weak ε-ANOC.

Lemma 2.2. Suppose x* is a feasible point of problem (P), {x^k} is a sequence converging to x*, {ε_k} and {ξ_k} are both real-valued sequences with ε_k → 0, ξ_k → 0 and ε_k/ξ_k → 0, and ρ is a real value satisfying f(x*) − ρ < 0. If lim_{k→∞} ‖c(x^k)_+‖₁/ξ_k exists and is strictly less than one, and there exists a sequence {r^k} ⊂ R^n such that r^k → 0 and r^k ∈ ∂_{ε_k} Ψ_{ξ_k,ρ}(x^k) for all k ∈ N, then x* satisfies the weak ε-ANOC.

6 Proof. Let us choose a sequence {ζ } satisfying ζ < ɛ for all N. Recalling Lemma 2.1, it follows that, for all large enough, there exists {x,j } n+1 B(x, ζ ) such that n+1 r λ j ( { } ) max 1 c(x,j ) + 1, 0 f(x,j ) + σ j c(x,j ) + 1, ξ with σj 1, j {1,..., n + 1}. Therefore, for each j, there must exist s,j f c i (x,j ) for each i, such that s,j c i where Then, defining n+1 r = λ j ( max { } 1 c(x,j ) + 1, 0 s,j f + σj ξ {1}, if c i (x) > 0 i (x) conv{1, 0}, if c i (x) = 0 {0}, if c i (x) < 0 n+1 ι f = { } λ j max 1 c(x,j ) + 1, 0 ξ and. n+1 ι ci = f(x,j ) and i (x,j )s,j c i, (7) ) λ j σj i (x,j ), (8) one can see, for large values of, that ι f is strictly positive due to the fact that lim c(x ) + 1 /ξ is strictly less than one, x x,j ɛ, and ɛ /ξ 0. Hence, for large enough, we have, for all i and j, that 0 τ f,j := λ j max { } 1 c(x,j ) + 1 ξ, 0 Additionally, the following holds: n+1 τ f,j = 1 and n+1 τ ci ι f and 0 τ ci,j := { λ j σ j i(x,j ) ι c i { 1, if ι c,j = i > 0 0, if ι ci = 0 for all i {1,..., p}., if ι ci > 0 0, if ι ci = 0. ( This together with (7) imply that r ι f ζ f(x ) + ) p ιci ζ c(x ). So, choosing v := 1 r, ι ( f we obtain v ζ f(x ) + ) p ι c i ι f ζ c(x ). Now, notice that, because of the way ι ci and i were defined, it must follow i / I ζ (x ) µ i := ιc i = 0. Moreover, since r 0, it yields ι f v 0 as well. So, ( ) v ζ f(x ) + µ i ζ c i (x ) and i / I ζ (x ) µ i = 0. (9) Remembering the inclusions presented in (3) and that ɛ1 g(x) ɛ2 g(x) for any nonsmooth function g and 0 ɛ 1 ɛ 2, it yields v ( Gɛ f (x ) + p µ i Gci ɛ (x ) ) and i / I ɛ (x ) µ i = 0, which proves the statement. We are finally able to introduce the theorem that guarantees that the wea ɛ-anoc is a necessary optimality condition. Theorem 2.1. Let x be a local minimizer of (P). Then, x satisfies the wea ɛ-anoc. 6

7 Proof. Since x is a local minimizer of (P), there must exist δ 1 > 0 such that f(x) f(x ) for all x B(x, δ 1 ) F, where F is the feasible set of (P). Moreover, setting ρ as any real number satisfying ρ > f(x ), it yields that there exists δ 2 > 0 with ρ > f(x) for all x B(x, δ 2 ). Choosing δ = min{δ 1, δ 2 }/2, we define Φ ξ, ρ : R n R as { Φ ξ,ρ (x) := max 1 max{ x } { x δ, 0}, 0 max 1 c(x) } + 1, 0 [f(x) ρ] ξ ξ + c(x) x x 2. Since Φ ξ,ρ is a coercive function for any ξ > 0, a global minimizer of this function always exists. So, taing {ξ } R + as any sequence satisfying ξ 0, we consider an infinite sequence {x } R n such that x argmin x R n Φ ξ,ρ(x). Consequently, Φ ξ,ρ(x ) Φ ξ,ρ(x ) = f(x ) ρ < 0, which yields c(x ) + 1 < ξ and x x < ξ + δ. (10) This implies that the sequence {x } must be bounded. So, let ˆx R n be a cluster point of {x }, which means that there exists an infinite index set K N such that x ˆx. We then define K { 0 τ := max 1 max{ x x } { } δ, 0}, 0 max 1 c(x ) + 1, 0 1. ξ ξ Therefore, it is possible to find an infinite index set K K such that τ ˆτ, for some ˆτ [0, 1]. Hence, since c(ˆx) + 1 = 0 because of (10), we have Φ ξ,ρ(x ) Φ ξ,ρ(x ) lim Φ K ξ,ρ(x ) lim Φ K ξ,ρ(x ) ˆτ[f(ˆx) ρ] ˆx x 2 f(x ) ρ. By (10), we now that ˆx x δ. Then, because of the way we have defined δ, we see that f(ˆx) ρ and f(x ) ρ are strictly negative numbers. Consequently, recalling that x is a local minimizer of (P), we must have ˆτ = 1 and ˆx = x. Therefore, since ˆx and ˆτ are arbitrary cluster points of {x } and {τ }, respectively, this means that any cluster point of {x } must be x and that any cluster point of {τ } must be 1. Since both sequences are bounded, it follows that x x c(x and τ 1. In particular, this last limit also ensures that lim ) + 1 ξ exists and it is strictly less than one. The equality ˆx = x implies, for any sufficiently large N, that x x δ/2 Φ ξ,ρ(x ) = Ψ ξ,ρ(x ) + x x. However, by the way we have defined x, we now that 0 Φ ξ,ρ(x ), which, for any ζ 0, gives us x x Ψ ξ,ρ(x ) r ζ Ψ ξ,ρ(x ) with r 0. Due to Lemma 2.2, the statement is proven. The demonstration of Lemma 2.2 gives a clue about the matter of convergence of ɛ. Notice that, along the proof, we had to assume ɛ /ξ > ζ /ξ 0, where each ξ is a parameter for the nonsmooth function Ψ ξ,ρ. This was necessary to ensure that the multiplier associated with ζ f(x ) is bounded away from 0 in (8), which consequently means that the objective function is always considered in the sequential optimality condition (9). However, notice that even the imposition ɛ /ξ 0 only subjects ɛ to the function Ψ ξ,ρ, meaning that this limit is useful only 7

when one has a method to solve nonsmooth problems that is based on such a function, which reduces the generality of the necessary condition. This issue can be overcome by noticing that μ^k and the limit ε_k/ξ_k → 0 are linked inside the proof of Lemma 2.2. So, by defining a new sequential optimality condition that requires ε_k ‖μ^k‖ → 0, we introduce the ε-approximate Nonsmooth Optimality Condition.

Definition 2.3 (ε-approximate Nonsmooth Optimality Condition). A feasible point x* ∈ R^n of (P) is said to satisfy the ε-approximate Nonsmooth Optimality Condition (ε-ANOC) if there exist sequences {x^k} ⊂ R^n, {ε_k} ⊂ R_{++}, {v^k} ⊂ R^n and {μ^k} ⊂ R^p_+ fulfilling the weak ε-approximate Nonsmooth Optimality Condition at x* and, moreover, ε_k ‖μ^k‖ → 0 holds.

From the proof of Theorem 2.1, one can see that the ε-ANOC is also a necessary optimality condition for any local solution x* of (P).

Theorem 2.2. Let x* be a local minimizer of (P). Then, x* satisfies the ε-ANOC.

Proof. Looking at the proof of Theorem 2.1, and since x* is a local minimizer of (P), one can find a sequence {x^k} converging to x*, and associated sequences {ε_k}, {ξ_k} satisfying ε_k → 0, ξ_k → 0 and ε_k/ξ_k → 0, such that the assumptions of Lemma 2.2 hold and (9) is also valid (where ζ_k < ε_k). Additionally, since ε_k/ξ_k → 0, we also have conv{0, 1/ξ_k} ε_k → 0. Hence, recalling that, in (8), ι_f^k is bounded away from zero for large values of k, and that {ι_{c_i}^k ε_k} approaches zero for all i ∈ {1, ..., p} (due to the fact that conv{0, 1/ξ_k} ε_k → 0), it must follow that ε_k μ_i^k = ε_k (ι_{c_i}^k / ι_f^k) → 0 for all i ∈ {1, ..., p}. This information together with Theorem 2.1 ensure the result.

Remark 1. Although the sequential optimality conditions that we have proposed so far consider only inequality constraints, one can easily generalize them to nonsmooth problems that may include equality constraints. Indeed, if in problem (P) the feasible set is given by F = {x ∈ R^n : c(x) ≤ 0, h(x) = 0}, where h : R^n → R^l, we can apply the concepts of the weak ε-ANOC and ε-ANOC to the problem

  min_{x ∈ R^n} f(x)  s.t.  c(x) ≤ 0,  g(x) ≤ 0,

where g(x) = max{h(x), −h(x)}. Applying (4) to the above equality-free optimization problem and using the definition of g, one obtains a sequence {λ^k} ⊂ R^l, free of sign, that satisfies

  v^k ∈ G_{ε_k}^f(x^k) + Σ_{i=1}^p μ_i^k G_{ε_k}^{c_i}(x^k) + Σ_{j=1}^l λ_j^k G_{ε_k}^{h_j}(x^k)  and  i ∉ I_{ε_k}(x^k) ⇒ μ_i^k = 0.

For the ε-ANOC, one only needs to replace ε_k ‖μ^k‖ → 0 by ε_k ‖((μ^k)^T, (λ^k)^T)‖ → 0.

Although the ε-ANOC is more restrictive over the sequence {ε_k} than the weak ε-ANOC, it is not clear whether the former has practical advantages over the latter. The next example has the goal of showing that the ε-ANOC is, in fact, stronger than the weak ε-ANOC.

Example 1. Let us consider the following nonsmooth optimization problem

  min_x f(x) = max{x, x² − 2}  s.t.  c₁(x) = x ≤ 0,  c₂(x) = −x² ≤ 0.

We choose {x^k} ⊂ R as a sequence of positive numbers converging to x* = 0, and also let {ε_k} be such that 0 < ε_k < x^k. So, 2 ∉ I_{ε_k}(x^k). Then, for all sufficiently large k such that x^k < 1/2, and in case μ_2^k = 0, we obtain

  | G_{ε_k}^f(x^k) + μ_1^k G_{ε_k}^{c₁}(x^k) + μ_2^k G_{ε_k}^{c₂}(x^k) | = 1 + μ_1^k ≥ 1.

This shows that if {x^k} fulfills the required conditions of the weak ε-ANOC, the sequence {ε_k} must satisfy ε_k ≥ x^k for every k large enough. This guarantees that 2 ∈ I_{ε_k}(x^k). So, for sufficiently large values of ε_k, one can always set μ_1^k = 0 and find μ_2^k large enough in order to have ‖P( 0 | G_{ε_k}^f(x^k) + μ_1^k G_{ε_k}^{c₁}(x^k) + μ_2^k G_{ε_k}^{c₂}(x^k) )‖ → 0. The previous reasoning ensures that, although x* is a local maximum point, it satisfies the weak ε-ANOC.

9 However, we state that x does not satisfy the stronger condition ɛ-anoc. By contradiction, suppose x satisfies the ɛ-anoc. Then, in particular, it must fulfill the wea ɛ-anoc, which gives us ɛ x for every large. Also, notice that any element in Gɛ c2 (x ) is always greater than 2( x + ɛ ). Combining this with ɛ x gives gc 2 Gɛ c2 (x ) gc 2 4ɛ. So, v ( Gɛ f (x ) + µ 1Gɛ c1 (x ) + µ 2Gɛ c2 (x ) ) implies v = 1 + µ 1 + µ 2g c 2 1 4µ 2ɛ. (11) Since we supposed that x satisfies the ɛ-anoc, it follows ɛ µ 0. This last fact together with (11) show that v cannot converge to zero, which evinces a contradiction with the statement that x satisfies the ɛ-anoc. We now prove that our proposed optimality conditions must be, at least, as strong as the necessary optimality condition presented in [29] (considering D(x ) = R n in [29, Proposition 3.1]), which has its roots in the study presented in [36]. Theorem 2.3. If x R n is a feasible point for (P) satisfying the (wea) ɛ-anoc, then the generalized Fritz-John optimality conditions hold at x, i.e., there must exist positive real values λ 0, λ 1,..., λ p, not all simultaneously zero, such that 0 λ 0 f(x ) + λ i c i (x ) and λ i c i (x ) = 0, for all i {1,..., p}. (12) Proof. For any feasible point x R n satisfying the (wea) ɛ-anoc, there exist sequences {x } R n, {ɛ } R +, {v } R n and {µ } R p + such that x x, ɛ 0 and v 0, with ( ) v = v f + µ i v c i and i / I ɛ (x ) µ i = 0, (13) where v f Gf ɛ (x ) ɛ f(x ) and v c i Gɛ ci (x ) ɛ c i (x ). Then two situations can happen: (a) the sequence {µ } is bounded, or (b) the sequence {µ } is unbounded. Let us consider that (a) is the case. Since x x and all functions considered here are locally Lipschitz continuous functions, it follows that {v f } and {v c i }, i {1,..., p}, are all bounded sequences. Therefore, there must exist an infinite index set K N such that µ µ, v f K K v f and v c i v c K i, i {1,..., p}. So considering the result of [40, Lemma 3.2 (iii)] for the auxiliary functions f(x) = f(x) v T f x and c i (x) = c i (x) v c T i x, i {1,..., p}, it follows that v f f(x ) and v c i c i (x ), i {1,..., p}. Therefore, recalling that v 0, we get 0 ( f(x ) + p µ i c i(x )). Notice also that if c i (x ) < 0 for some i, then i / I ɛ (x )( µ i = 0) for all N sufficiently large, which yields µ i = 0. This assures the result for case (a). If (b) holds, then one only has to divide the equation that appears in (13) by max i {µ i }. This ensures that the new multipliers of v c i, i {1,..., p}, will be bounded and the multiplier of v f will be 1/ max i {µ i }. Hence, the same reasoning used in case (a) can be employed, obtaining the Fritz-John conditions with zero being the multiplier of f(x ). Considering the Mangasarian-Fromovitz constraint qualification for nonsmooth problems [27, Section 3], which guarantees that (12) cannot be true if λ 0 = 0, and the generalized KKT condition, as defined below, the corollary presented in the sequence follows immediately from the theorem above. Definition 2.4 (Generalized KKT conditions [18, Section 6.3]). A feasible point x R n of (P) is said to satisfy the generalized KKT conditions if there exists µ R p + such that ( ) 0 f(x ) + µ i c i (x ) and µ i c i (x ) = 0 for all i {1,..., p}. 9

Corollary 2.1. If a feasible point x* ∈ R^n of (P) satisfies the (weak) ε-ANOC, and the nonsmooth Mangasarian-Fromovitz constraint qualification holds at x*, then the generalized KKT conditions also hold at x*.

We believe that the results obtained in this section have great relevance in the nonsmooth optimization field, especially when one is concerned with the convergence analysis of nonsmooth optimization methods. In the nonsmooth context, it is common to prove the convergence of an algorithm by showing that every accumulation point generated by the iterates of such a method is a stationary point of (Unc-P) (see, for example, [23, 38, 39]). This can be well justified when a property called calmness holds at a local minimizer x* of the original problem (P).

Definition 2.5 (Calmness [12, 32]). Consider x* a local minimizer of (P). We say that problem (P) is calm at x* when one can find ρ > 0 sufficiently large such that x* is also a locally optimal solution of (Unc-P).

Thus, under the calmness hypothesis, the statement that x* is a locally optimal solution of (Unc-P) becomes a necessary optimality condition. However, this is not a legitimate necessary optimality condition, in the sense that one needs to assume more than just the fact that x* is a local minimizer of (P). That is not the case for the (weak) ε-ANOC: by the results of Theorems 2.1 and 2.2, we have shown that the (weak) ε-ANOC at x* is a legitimate necessary optimality condition. Therefore, the results presented here give alternative ways of dealing with the convergence analysis of nonsmooth optimization methods, even in the absence of calmness.

We end our theoretical analysis with a brief section presenting the relations between the ε-ANOC and the sequential optimality conditions AKKT and CAKKT [2, 4] in the smooth context.

3 ε-ANOC applied to smooth optimization

The goal of this section is to show that, under mild assumptions, the scheme of Fig. 2 holds true when the objective and constraint functions of (P) are all differentiable. For completeness, we define below the concepts of AKKT and CAKKT for the following smooth optimization problem

  min_{x ∈ R^n} f(x)  s.t.  c(x) ≤ 0,  h(x) = 0,   (P-smooth)

where f : R^n → R, c : R^n → R^p and h : R^n → R^l are all functions of class C¹.

Figure 2: Scheme of relations between the ε-ANOC and the well-known sequential optimality conditions in the smooth setting.

Definition 3.1 (AKKT, CAKKT [2, 4]). A feasible point x* of (P-smooth) is an AKKT point if there exist sequences {x^k}, {μ^k} ⊂ R^p_+ and {λ^k} ⊂ R^l such that x^k → x*, lim_{k→∞} min{−c_i(x^k), μ_i^k} = 0 for all i ∈ {1, ..., p}, and

  lim_{k→∞} ‖ ∇f(x^k) + Σ_{i=1}^p μ_i^k ∇c_i(x^k) + Σ_{j=1}^l λ_j^k ∇h_j(x^k) ‖ = 0.

If, in addition, lim_{k→∞} μ_i^k c_i(x^k) = 0 and lim_{k→∞} λ_j^k h_j(x^k) = 0 for all i ∈ {1, ..., p} and j ∈ {1, ..., l}, then we call x* a CAKKT point.

11 First, we prove that every feasible point x of any smooth optimization problem satisfying the ɛ-anoc must also be an AKKT point. Theorem 3.1. Suppose the functions c and h in (P-smooth) have locally Lipschitz continuous derivatives. If x is a feasible point of (P-smooth) and satisfies the ɛ-anoc, then x is also an AKKT point. Proof. Since x satisfies the ɛ-anoc, there must exist sequences {x } R n, {ɛ } R +, {v } R n, {µ } R p + and {λ } R l such that x x, ɛ 0 and v 0, where i / I ɛ (x ) µ i = 0 and v G ɛ f (x ) + ɛ ((µ ) T, (λ ) T ) 0, µ i G ci ɛ (x ) + l λ j Gɛ hj (x ). (14) Since the derivatives of the involved functions are locally Lipschitz continuous, it follows that v = f(x ) + µ i c i (x ) + l λ j h j (x ) + r f + µ i r c i + l λ j r h j, where r f, r c i and r h j are all error vectors of order ɛ. Since v 0, ɛ 0 and ɛ ((µ ) T, (λ ) T ) 0, we get l f(x ) + µ i c i (x ) + λ j h j (x ) 0. (15) Additionally, because i / I ɛ (x ) µ i = 0, one can see that min{ c i (x ), µ i } 0 for all i {1,..., p}. This together with (15) fulfill the conditions of an AKKT point, which completes the proof. Remar 2. It is worth noticing that the reciprocal is not true, i.e., AKKT does not imply the ɛ-anoc in the smooth case. Indeed, considering the optimization problem given in Example 1, x = 0 is an AKKT point (around x the optimization problem is smooth, that justifies applying the AKKT concept at this point). However, we have showed that x does not satisfy the ɛ-anoc, which guarantees that AKKT ɛ-anoc. Hence, our new sequential optimality condition is also stronger than AKKT in the smooth context. We now proceed with two examples showing that CAKKT and the ɛ-anoc are not connected to each other. Example 2. Let us consider the following constrained optimization problem min x f(x) = x 1 x 2 s.t. x 1 0, x 2 0, x 1 x 2 0. Although the point x = 0 is not a local minimizer for the problem, if one chooses x = (1/, 1/) T, µ 1 = µ 2 = 0 and µ 3 =, one can see that x is a CAKKT point [3]. However, we state that x does not fulfill the ɛ-anoc. Indeed, suppose, by contradiction, that x satisfies the ɛ-anoc. Recalling the arguments used to go from (14) to (15), and since the objective and constraint functions are all smooth and have locally Lipschitz continuous derivatives, there must exist {x } = {(x 1, x 2) T } and {µ } R 3 + with ɛ µ 0 such that x x and [ 1 1 ] + µ 1 [ 1 0 ] [ + µ ] + µ 3 [ x 2 x 1 ] 0. 11

By the second line of the above relation, we have that, for large values of k, the following must hold: x_1^k > 0 and μ_3^k > 1/(2 x_1^k). Since ε_k μ_3^k → 0, it yields ε_k = o(x_1^k). Hence, 1 ∉ I_{ε_k}(x^k) and, consequently, μ_1^k = 0 for large values of k. So, by the first line, we obtain μ_3^k x_2^k → −1, which assures x_2^k < 0 and μ_3^k > 1/(2 |x_2^k|) for k sufficiently large. Since μ_3^k can only be different from zero if 3 ∈ I_{ε_k}(x^k), the value ε_k must be greater than min{x_1^k, |x_2^k|} for large values of k. This contradicts the fact that ε_k ‖μ^k‖ → 0, which allows us to conclude that x* does not satisfy the ε-ANOC.

Example 3. We now consider the smooth optimization problem

  min_x f(x) = (x_2 − 2)²/2  s.t.  x_1 = 0,  x_1 x_2 = 0.

As shown in [4], the point x* = (0, 1)^T is not a CAKKT point. However, if we choose x^k = (1/k, 1)^T, λ^k = (−k, k)^T and ε_k = 1/k², we obtain that the ε-ANOC is fulfilled in the sense of Remark 1. Therefore, x* satisfies the ε-ANOC.

The above examples show that, in fact, the ε-ANOC is a new legitimate necessary optimality condition (i.e., one that does not require any kind of constraint qualification) even in the case where all functions are smooth.

4 A practical algorithm

The goal of this section is to present an algorithm that is able to generate a sequence of iterates satisfying the sequential optimality conditions introduced in the last section. Looking at relation (4) and the proof of Theorem 2.1, one can easily come up with the following algorithm: given a small ξ > 0 and ρ ∈ R sufficiently large, whenever possible, set

  x* ∈ argmin_{x ∈ R^n} Ψ_{ξ,ρ}(x).   (16)

Although conceptually simple, the above procedure cannot be used in a practical method. For optimization problems involving nonconvex functions, an implementable algorithm usually guarantees only that the method will find stationary points of the problem, not minimizers. Moreover, the set of minimizers of the function Ψ_{ξ,ρ} may be empty, so this procedure may not even be well defined. We thus present Algorithm 1 (PACNO - Penalized Algorithm for Constrained Nonsmooth Optimization), where we avoid these issues by using a solution of a relaxed version of (16) (see Remark 3 for a more detailed discussion about this algorithm): we trade the global minimization for the easier problem of Step 1 of Algorithm 1, which is concerned only with approximate stationary points.

This is sufficient to ensure that Algorithm 1 generates a sequence of iterates that verifies the ε-ANOC. However, we first need to show that the set S_Ψ^k defined in (17) will eventually be nonempty and, moreover, that the sequence {ρ_k} generated by the algorithm will stabilize at a sufficiently large value. To guarantee that S_Ψ^k ≠ ∅, we need to consider one extra assumption.

Assumption 1. There exists α > 0 such that F_α := {x ∈ R^n : ‖c(x)_+‖₁ ≤ α} is compact.

Since we are dealing with possibly nonconvex functions, it is not sufficient to ask for the compactness of the feasible set; instead, one needs to ask for the compactness of the perturbed feasible set F_α. This is required to exclude some pathological functions like the one illustrated in Fig. 3, where the feasible set is compact but there is no α > 0 ensuring the compactness of F_α. Nevertheless, the assumption can easily be ensured if the feasible points are additionally restricted by box constraints. Noticing that unbounded solutions do not occur in the majority of real problems, the user may include sufficiently large artificial box constraints to enforce the condition.

We start by showing a technical lemma that will be helpful in the subsequent results.

Algorithm 1: Penalized Algorithm for Constrained Nonsmooth Optimization

Step 0. Choose x⁰ ∈ R^n, ρ₁ ∈ [0, +∞), M ∈ (0, +∞), θ_ξ, θ_ε, θ_ν ∈ (0, 1), ω ∈ (0, 1], ξ₁, ε₁, ν₁ ∈ (0, +∞), ξ_opt ∈ [0, ξ₁], ε_opt ∈ [0, ε₁] and ν_opt ∈ [0, ν₁]. Set k = 1.

Step 1. Define

  S_Ψ^k := { x : Ψ_{ξ_k,ρ_k}(x) ≤ Ψ_{ξ_k,ρ_k}(x^{k−1}) and ‖P(0 | ∂_{ε_k} Ψ_{ξ_k,ρ_k}(x))‖ ≤ ν_k }.   (17)

If S_Ψ^k ≠ ∅, then set x^k as any point in S_Ψ^k. Otherwise, set x^k as any point x satisfying ‖c(x)_+‖₁ ≤ ‖c(x^{k−1})_+‖₁ and ‖P(0 | ∂_{ε_k} ‖c(x)_+‖₁)‖ ≤ ν_k, and go to Step 3.

Step 2. If ‖c(x^k)_+‖₁ < ξ_opt, then terminate if ε_k ≤ ε_opt and ν_k ≤ ν_opt; otherwise, set ξ_{k+1} = ξ_k and go to Step 5.

Step 3. If ‖c(x^k)_+‖₁ > ω ξ_k with ε_k ≤ ε_opt and ν_k ≤ ν_opt, then stop: the iterate x^k is close to a stationary point of the infeasibility measure.

Step 4. Set ξ_{k+1} = ξ_k − (1 − θ_ξ)[ξ_k − ‖c(x^k)_+‖₁]_+.

Step 5. If f(x^k) ≤ ρ_k − M, then set ρ_{k+1} = ρ_k. Otherwise, set ρ_{k+1} = ρ_k + 2(M + [f(x^k) − ρ_k]_+).

Step 6. Set ε_{k+1} = θ_ε ε_k, ν_{k+1} = θ_ν ν_k and k ← k + 1. Go back to Step 1.

Figure 3: An illustration of a function that does not have a compact F_α. The set F stands for the feasible region; one cannot find α > 0 that ensures the compactness of F_α.

Lemma 4.1. Let {x^k} be a sequence of iterates generated by Algorithm 1 with ξ_opt = ε_opt = ν_opt = 0. Then, there must exist a sequence {δ_k} ⊂ R_+ converging to zero such that ξ_k − ‖c(x^k)_+‖₁ ≤ δ_k. Moreover, if lim inf_{k→∞} ‖c(x^k)_+‖₁ = 0, then ξ_k → 0.

Proof. Since ξ_opt = ε_opt = ν_opt = 0, the algorithm performs infinitely many iterations. Moreover, notice that {ξ_k} is a positive, bounded, monotone decreasing sequence, which ensures that it must converge to some nonnegative real value. Hence, since ξ_k is always updated by Step 4 (when ξ_opt = 0), we have

  ξ_{k+1} = ξ_k − (1 − θ_ξ)[ξ_k − ‖c(x^k)_+‖₁]_+  ⇒  (1 − θ_ξ)[ξ_k − ‖c(x^k)_+‖₁]_+ = ξ_k − ξ_{k+1}  ⇒  lim_{k→∞} [ξ_k − ‖c(x^k)_+‖₁]_+ = 0.

In other words, there exists {δ_k} ⊂ R_+, with δ_k → 0, such that ξ_k − ‖c(x^k)_+‖₁ ≤ δ_k. So, if lim inf_{k→∞} ‖c(x^k)_+‖₁ = 0, it is straightforward to see that ξ_k → 0.

The next lemma is useful to prove that, eventually, the set S_Ψ^k will have at least one element. It gives a sufficient condition for the existence of a global minimizer of Ψ_{ξ,ρ}.

14 Lemma 4.2. Let ρ be a real number such that min f(x) ρ < 0 x F α/2 and ξ (0, α], with α being a strictly positive real number satisfying Assumption 1. function Ψ ξ,ρ defined in (5) has a global minimizer. Then, the Proof. Consider any fixed ξ (0, α]. Let us first show that Ψ ξ,ρ is bounded from below. By contradiction, let us suppose the opposite, i.e., we can find a sequence {x } such that Ψ ξ,ρ (x ). Since is a negative number, we must have, for all N, that c(x ) + 1 ξ, since, otherwise, we would have Ψ ξ,ρ (x ) = c(x ) Because ξ (0, α], this means that x F α, for all N. However, Ψ ξ,ρ is a continuous function, which yields that it must assume a minimum value on F α. This is a contradiction with the fact that Ψ ξ,ρ (x ), for all N. Therefore, the function Ψ ξ,ρ must be bounded. Now, let us then consider a sequence {z } such that Ψ ξ,ρ (z ) inf x R n Ψ ξ,ρ (x). By hypothesis, we now that there exists ˆx F α/2 such that f(ˆx) ρ < 0. This implies that inf Ψ ξ,ρ(x) Ψ ξ,ρ (ˆx) c(ˆx) + 1. x R n Therefore, for any large N, we must have Ψ ξ,ρ (z ) 2 c(ˆx) + 1. Since, ˆx F α/2, it implies that Ψ ξ,ρ (z ) α. Recalling that ξ α, we must have z F α, for any large N. So, because F α is compact, one can ensure that there exists a subsequence of {z } converging to z F α such that Ψ ξ,ρ (z) = inf x R n Ψ ξ,ρ (x), which ends the proof. The following result is a technical lemma and it proves that c(x ) + 1 will be related with the value ξ and the infeasibility measure of the past iteration. It also ensures that our method will not suffer from the greediness phenomenon. Lemma 4.3. Let N be any iteration of Algorithm 1. Then, c(x ) + 1 max { ξ, c(x 1 ) + 1 }. Proof. By contradiction, suppose that there exists an iteration such that c(x ) + 1 > max { ξ, c(x 1 ) + 1 }. (18) Then, because of Step 1, we have x S Ψ. Therefore, Ψ ξ,ρ (x ) Ψ ξ,ρ (x 1 ). Moreover, because of Step 5, one can see that f(x 1 ) ρ < 0, which implies Ψ ξ,ρ (x ) Ψ ξ,ρ (x 1 ) c(x 1 ) + 1. Thus, by (18), we have Ψ ξ,ρ (x ) < c(x ) + 1. Consequently, the above inequality can only be true in the case that c(x ) + 1 < ξ, which contradicts (18). The next result tells us that, when the sequence of iterates approaches the feasible set, then, eventually, the set S Ψ will be nonempty and the sequence {ρ } will stabilize. Lemma 4.4. Let {x } be a sequence of iterates generated by Algorithm 1 with ξ opt = ɛ opt = ν opt = 0. If lim inf c(x ) + 1 = 0, then there exist a sufficiently large ˆ N and ρ R such that, for all ˆ, we have S Ψ and ρ = ρ. Proof. First, let us prove that S Ψ. Since, lim inf c(x ) + 1 = 0, we must have an iteration ˆ such that c(xˆ) + 1 α/2, where α is the positive real number satisfying Assumption 1. Consequently, by Step 5 of the algorithm, one can see that min x F α/2 f(x) ρ f(xˆ) ρ < 0, for any > ˆ. 14

15 Additionally, Lemma 4.1 tells us that ξ 0. So, by Lemma 4.2, we see, for any large N, that S Ψ. Let us now prove that there exists ρ R such that ρ = ρ, for any large N. By contradiction, let us assume that the statement is false. This means, by the way the algorithm was designed, that there exists an infinite index set K N such that f(x ) ρ > M, for all K, (19) implying that ρ. Because lim inf c(x ) + 1 = 0, it means that there exists an iteration ˆ satisfying c(xˆ) + 1 α and ξˆ α (because of Lemma 4.1). Consequently, due to Lemma 4.3 and recalling that {ξ } is a monotone decreasing sequence, we have c(x ) + 1 α, i.e., x F α, for all ˆ. However, by hypothesis, F α is compact, which, together with the assumption ρ, implies ρ max x F α f(x) + M f(x ) + M, for any large N. (20) This is a contradiction with (19). Therefore, there must exist ρ such that ρ = ρ, for any large N. We are now ready to present the convergence theorem of Algorithm 1. Because the nonsmooth optimization problem in hand might involve nonconvex functions, the result ensures that a cluster point x of the iterate sequence {x } can either be a stationary point for the infeasibility measure c( ) + 1, or satisfies the ɛ-anoc. Theorem 4.1. If {x } is the sequence of iterates generated by Algorithm 1 with ξ opt = ɛ opt = ν opt = 0 and θ ɛ (0, θ ξ ), then, given any infinite index set K N such that x x, for some K x R n, one of the following statements must be true: a) x satisfies 0 c(x ) + 1 ; b) x is a feasible point for (P) and it satisfies the ɛ-anoc. Moreover, there exists ρ R such that ρ = ρ, for any large N. Proof. We start by noticing that, since ξ opt = ɛ opt = ν opt = 0, Algorithm 1 in fact generates an infinite sequence of points {x } R n. Moreover, ξ is always updated by Step 4 (since ξ opt = 0), hence ξ +1 = ξ (1 θ ξ )[ξ c(x ) + 1 ] + θ ξ ξ. Therefore, because we suppose θ ɛ (0, θ ξ ), it yields ɛ +1 η ɛ η 2 ɛ 1... η ɛ 1, where η = θ ɛ < 1, ξ +1 ξ ξ 1 ξ 1 θ ξ which ensures ɛ /ξ 0. We then divide the proof in the following cases: i) lim inf K c(x ) + 1 /ξ 1 and there exists an infinite index set ˆK K such that S Ψ =, for all ˆK; ii) lim inf K c(x ) + 1 /ξ 1 and SΨ, for all K sufficiently large; iii) lim inf K c(x ) + 1 /ξ < 1. Suppose i) holds. By Step 1, it follows that P ( ) 0 ɛ c(x ) Consequently, since ˆK ɛ 0, [40, Lemma 3.2 (iii)] guarantees that 0 c(x ) + 1. Assume now that case ii) holds. This yields { } max 1 c(x ) + 1, 0 0. (21) ξ K 15

16 Additionally, by the way Algorithm 1 updates the value ρ, we must have ρ > f(x ) + M, for any large iteration. Recalling that ν 0, ɛ 0 (because ν opt = 0 and ɛ opt = 0), SΨ for any large iteration K, and the result of Lemma 2.1, it follows, for any large K, that there exists {x,j } n+1 B(x, ɛ ), and λ R n+1 with e T λ = 1, such that there exists r Ψ ξ,ρ (x ) with r 0 and n+1 r λ j ( { } ) max 1 c(x,j ) + 1, 0 f(x,j ) + σ j c(x,j ) + 1, (22) ξ where σj 1, j {1,..., n + 1}. Due to (21), x x,j ɛ, and ɛ /ξ 0, we get, for any j {1,..., n + 1}, { } max 1 c(x,j ) + 1, 0 ξ 0. K The above limit together with (22) tell us that P(0 ɛ c(x ) + 1 ) K 0. So, since ɛ 0, we obtain 0 c(x ) + 1 due to [40, Lemma 3.2 (iii)]. Finally, we assume the validity of case iii), and we claim c(x ) Indeed, because {ξ } is a monotone decreasing and bounded sequence, it must follow that ξ ξ, for some ξ 0. According to Lemma 4.1, there exists {δ } R + satisfying δ 0 such that ξ c(x ) δ 1 lim inf K c(x ) + 1 ξ + lim inf K δ δ lim inf > 0. ξ K ξ This yields ξ = 0. However, by the way we have designed our algorithm, this only happens if there exists an infinite index set ˆK N with c(x ) Therefore, given any τ > 0, there exists ˆ such that c(xˆ) + 1 < τ and ξˆ < τ. Because of Lemma 4.3, this ensures ˆ c(x ) + τ. Since τ > 0 is arbitrary, we obtain c(x ) + 1 0, which guarantees that x is a feasible point. Then, Lemma 4.4 assures that, for all large enough, we have S Ψ and ρ = ρ for some ρ R. Notice that Step 5 ensures f(x ) ρ M. This information, together with Step 1, Lemma 2.2, the limit ɛ /ξ 0 and the reasoning used in the proof of Theorem 2.2, prove that the cluster point x satisfies the ɛ-anoc. ˆK Remar 3. Some comments regarding Algorithm 1 are in order: (a) The technique based on sampling points to approximate G ɛ (x) that originated the method nown as Gradient Sampling (GS) [16, 40], and its recent variants [24, 25, 33, 34, 35, 44] have shown to be effective tools for minimizing nonsmooth and nonconvex functions. The good results of those methods assure that Step 1 is not just an idealized step, but it can be performed in practice 1. In addition, BFGS techniques, which are not guaranteed to converge but wor well in practice, may be used as well [22, 42]. Both possibilities were considered in Section 5. (b) Looing carefully at the proof of Theorem 4.1, we can understand the importance of Step 3 inside Algorithm 1. When c(x ) + 1 /ξ is greater than or approaching one, this implies that we are losing information about the objective function along the iterations, meaning that the method is tending to a stationary point of the infeasibility measure. Therefore, this suggests that one should choose ω 1. (c) For the case that problem (P) satisfies calmness [12] and ρ is large enough, every local minimizer of (P) is also a local minimizer of Ψ ξ,ρ for every ξ small enough (this can be easily proven by following the same reasoning used in the proof of [18, Proposition 6.4.3]). This ensures that, in many cases, PACNO will find a good approximation to the solution of the problem without needing to bring ξ down to zero. 1 Assuming that f and c i, i {1,..., p}, are continuously differentiable in full-measure open subsets of R n, all hypotheses required by the convergence theory of the GS method are satisfied. 16
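The outer loop of Algorithm 1 is easy to transcribe once an inner solver for Step 1 is available. The sketch below is a simplified reading of the listing above, not the authors' implementation: `inner_solver` stands for any routine that, given Ψ_{ξ_k,ρ_k}, the reference point x^{k−1}, the radius ε_k and the tolerance ν_k, tries to return a point of S_Ψ^k and a success flag, while `infeas_solver` plays the analogous role for the infeasibility measure. Both callbacks, and the simple success flag they return, are assumptions of this sketch.

```python
import numpy as np

def infeasibility(c_val):
    """||c(x)_+||_1."""
    return float(np.sum(np.maximum(c_val, 0.0)))

def psi(x, f, c, xi, rho):
    """The penalty function (5)."""
    viol = infeasibility(c(x))
    return max(1.0 - viol / xi, 0.0) * (f(x) - rho) + viol

def pacno(f, c, x0, inner_solver, infeas_solver,
          rho=0.0, M=10.0, theta_xi=0.5, theta_eps=0.25, theta_nu=0.5, omega=0.99,
          xi=1.0, eps=1.0, nu=1.0, xi_opt=1e-6, eps_opt=1e-6, nu_opt=1e-6, max_it=100):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_it):
        # Step 1: try to find a point of S_Psi; on failure, only reduce the infeasibility.
        x_new, ok = inner_solver(lambda z: psi(z, f, c, xi, rho), x, eps, nu)
        if not ok:
            x_new, _ = infeas_solver(lambda z: infeasibility(c(z)), x, eps, nu)
        x = x_new
        viol = infeasibility(c(x))
        if ok and viol < xi_opt:
            # Step 2: almost feasible iterate; terminate only when the tolerances are tight.
            if eps <= eps_opt and nu <= nu_opt:
                return x, 'approximate solution'
            # otherwise keep xi and fall through to Step 5
        else:
            # Step 3: near-stationarity of the infeasibility measure.
            if viol > omega * xi and eps <= eps_opt and nu <= nu_opt:
                return x, 'near-stationary for the infeasibility measure'
            # Step 4: shrink the infeasibility target toward the current violation.
            xi = xi - (1.0 - theta_xi) * max(xi - viol, 0.0)
        # Step 5: keep rho well above the current objective value.
        if f(x) > rho - M:
            rho = rho + 2.0 * (M + max(f(x) - rho, 0.0))
        # Step 6: tighten the sampling radius and the stationarity tolerance.
        eps, nu = theta_eps * eps, theta_nu * nu
    return x, 'iteration limit'
```

In the spirit of Remark 3(a), the `inner_solver` placeholder is where a Gradient Sampling or BFGS routine would plug in; any routine that returns a point satisfying the two membership tests of (17), together with a flag saying whether it succeeded, fits this interface.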

5 Numerical results

This section illustrates different properties of PACNO. Subsection 5.1 is devoted to showing that, since our method is based on a legitimate necessary optimality condition (i.e., our convergence result does not depend on any kind of constraint qualification), PACNO may achieve good convergence to the solution of the nonsmooth optimization problem even in the absence of calmness. Additionally, we exhibit a practical example in which PACNO prevents the greediness phenomenon. Subsection 5.2 reveals that our method may converge even when the optimization problem does not satisfy all of our convergence hypotheses. Finally, Subsection 5.3 aims to illustrate that Step 1 of PACNO may accept different solvers to find a stationary point of Ψ_{ξ,ρ}. The tests were performed on a Dell Latitude 7490 notebook, with an Intel Core i7-8650U processor, CPU 2.11GHz, and 16GB RAM (64-bit), using Matlab R2018a.

5.1 Nonsmooth optimization in the absence of calmness

In the previous sections, we have discussed the theoretical benefits of applying the penalization strategy used in Ψ_{ξ,ρ}, rather than the exact penalization approach, to produce a practical necessary optimality condition. However, one can wonder whether our penalization idea has any practical advantage when compared to the standard exact penalization procedure. Aiming to elucidate this matter, we present a simple nonsmooth optimization problem:

  min_{x₁,x₂} f(x₁, x₂) = max{x₁³ − x₂, x₂}  s.t.  c(x₁, x₂) = (x₁ − 10)² ≤ 0.   (23)

The optimal solution of this problem is x* = (10, 500)^T, with optimal value f* = 500. Due to our choice of the constraint function defining the feasible set F = {(x₁, x₂) ∈ R² : x₁ = 10}, the calmness property does not hold at the optimal point. This has a great impact on the behavior of methods based on the exact penalization approach, since it implies that one cannot solve the original problem with a finite penalty parameter [12]. In case this parameter value is allowed to go to infinity, this usually produces a sequence of points with poor precision in the optimal function value.

To illustrate how a method based on exact penalization behaves in the absence of calmness, we have solved the nonsmooth problem (23) using SLQP-GS (version 1.3) [23] in its default mode, with the exception that we have allowed the SLQP-GS algorithm to run 10⁴ iterations, in order to observe the optimal value precision that such a method is able to reach. In a similar way, the GS algorithm [16, 40] was chosen as our internal solver for Step 1 (a version that we call PACNO_GS), and we have used the following parameter values for PACNO_GS: ρ₁ = 0, M = 10, θ_ξ = 0.5, θ_ε = 0.25, θ_ν = 0.5, ω = 0.99, ξ₁ = ε₁ = ν₁ = 1 and ξ_opt = ε_opt = ν_opt = . In addition, for each call of GS, the sampling radius was set first as min{10ε_k, 1}, with optimality tolerance given by min{10ν_k, 1}, and then min{ε_k, 1} as the final sampling radius, with optimality tolerance min{ν_k, 1}.

The results are shown in Figs. 4 and 5, so that one can follow the reached distributions of the infeasibility measure in (a), and of the relative errors ((b) in the domain, and (c) in the image space). Because the SLQP-GS method relies on exact penalization and, additionally, its convergence theory is established only when the nonsmooth problem satisfies the calmness property at the solution, it was expected that the precision achieved by the method regarding the optimal function value would not be satisfactory.
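For readers who want to reproduce the setting outside Matlab, problem (23) is easy to state in code. The snippet below is an illustration only (the experiments reported here used the Matlab GS/SLQP-GS implementations): it gives f, c and one Clarke subgradient of f, which is all a sampling-based inner solver needs, and it also shows how Ψ_{ξ,ρ} ignores the objective outside the infeasibility tolerance ξ, which is the mechanism behind the greediness control discussed next.

```python
import numpy as np

def f23(x):
    """Objective of (23): f(x1, x2) = max{x1^3 - x2, x2}."""
    return max(x[0] ** 3 - x[1], x[1])

def f23_subgrad(x):
    """One Clarke subgradient of f at x: the gradient of an active piece."""
    if x[0] ** 3 - x[1] >= x[1]:
        return np.array([3.0 * x[0] ** 2, -1.0])
    return np.array([0.0, 1.0])

def c23(x):
    """Constraint of (23): c(x1, x2) = (x1 - 10)^2 <= 0, i.e. x1 = 10 (calmness fails here)."""
    return np.array([(x[0] - 10.0) ** 2])

def psi(x, f, c, xi, rho):
    """The penalty function (5) used by PACNO."""
    viol = float(np.sum(np.maximum(c(x), 0.0)))
    return max(1.0 - viol / xi, 0.0) * (f(x) - rho) + viol

x_star = np.array([10.0, 500.0])
print(f23(x_star), c23(x_star)[0])    # 500.0 and 0.0: the reported optimum of (23)

# At an infeasible point with a very low objective value, Psi only sees the violation,
# so such points cannot attract the iterates (no greediness):
x_bad = np.array([-100.0, -5.0e5])    # f23(x_bad) = -5e5, ||c(x_bad)_+||_1 = 12100
print(psi(x_bad, f23, c23, xi=1.0, rho=510.0))   # 12100.0, independent of f23(x_bad)
```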
On the other hand, our algorithm possesses a convergence theory even in the absence of calmness and, as anticipated, the precision obtained for the optimality measure is considerably better. In addition, the PACNO_GS algorithm is able to keep the iterates closer to the feasible set. Nevertheless, one can easily represent the feasible set F in a manner such that the calmness property is satisfied. Indeed, we can consider the constraint |x₁ − 10| ≤ 0 instead of (x₁ − 10)² ≤ 0 (see

18 (a) c(x SLQP) + 1 (b) xslqp x x (c) f(xslqp) f f Figure 4: Boxplots of the results of twenty runs (depicted as ) of the SLQP-GS method with different initial points chosen in a box [ 5, 5] 2 centered at the optimal solution of problem (23). The last iterate obtained by the SLQP-GS method is represented by x SLQP. (a) c(x PACNO) + 1 (b) xpacno x x (c) f(xpacno) f f Figure 5: Boxplots of the result of twenty runs (depicted as ) of the PACNO GS method with different initial points chosen in a box [ 5, 5] 2 centered at the optimal solution of problem (23). The last iterate obtained by the PACNO GS method is represented by x PACNO. Figs. 6 and 7). For this new optimization problem, the penalty parameter in the exact penalization approach no longer must go to infinity. However, methods based on exact penalization may experience another undesirable behavior: by initializing the method with an inappropriate value for the penalty parameter, such methods may present the greediness phenomenon. This anomaly can be seen when one tries to solve this new nonsmooth problem with the SLQP-GS method. In many runs, due to the initial value of the penalty parameter in the standard configuration of the SLQP-GS algorithm, the method gives excessive importance to the objective function in the first iterations, which carries the iterates away from the feasible set, preventing the method to converge even when one allows a large number of iterations to be performed (10 4 iterations). However, it is worth mentioning that, if the user sets a better scaled penalty parameter value, the SLQP-GS algorithm will easily converge to the solution point. In contrast, the PACNO GS algorithm is able to reach the solution without needing to bring the parameter ξ down to zero (in the twenty runs that were performed, the mean value of ξ was ept above 10 3 ), and, due to Lemma 4.3, the good behavior of PACNO GS is not subjected to a tuned value of ξ. 5.2 Bilevel optimization via a nonsmooth approach One of the most usual ways to solve a bilevel optimization problem [19] is to consider a mathematical problem with equilibrium constraints (MPEC) instead of the original multilevel instance. However, 18

19 (a) c(x SLQP) + 1 (b) xslqp x x (c) f(xslqp) f f Figure 6: Boxplots of the result of twenty runs (depicted as ) of the SLQP-GS method with different initial points chosen in a box [ 5, 5] 2 centered at the optimal solution of problem (23). The last iterate obtained by the SLQP-GS method is represented by x SLQP. (a) c(x PACNO) + 1 (b) xpacno x x (c) f(xpacno) f f Figure 7: Boxplots of the result of twenty runs (depicted as ) of the PACNO GS method with different initial points chosen in a box [ 5, 5] 2 centered at the optimal solution of problem (23). The last iterate obtained by the PACNO GS method is represented by x PACNO. there are situations in which this approach may generate an MPEC that is not equivalent to the original problem. For example, the authors of [26] show that the following minimization min x,y x s.t. x 0, y Λ(x), with Λ(x) := argmin y {y 1 : y 2 1 y 2 x, y y 2 0}, may generate an MPEC that does not possess an optimal solution when one considers the KKT conditions of the minimization problem related to the definition of Λ(x). Indeed, consider the following nonsmooth single level optimization problem in which the first three constraints are associated with the KKT conditions of the implicit optimization problem presented in Λ(x), min x,y,λ x s.t. 2λ 1 y 1 + 2λ 2 y λ 1 + λ 2 0 min{ y1 2 + y 2 + x, λ 1 } 2 + min{ y1 2 y 2, λ 2 } 2 0 x 0. (24) One can see that, for every x > 0, it is always possible to find y(x) and λ(x) such that (x, y(x), λ(x)) is a feasible point. Although the objective function value of this problem converges to zero when x 0, the feasible set is empty for x = 0. This indicates that the MPEC instance (24) does not possess 19

More information

Nonlinear Programming, Elastic Mode, SQP, MPEC, MPCC, complementarity

Nonlinear Programming, Elastic Mode, SQP, MPEC, MPCC, complementarity Preprint ANL/MCS-P864-1200 ON USING THE ELASTIC MODE IN NONLINEAR PROGRAMMING APPROACHES TO MATHEMATICAL PROGRAMS WITH COMPLEMENTARITY CONSTRAINTS MIHAI ANITESCU Abstract. We investigate the possibility

More information

LIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS

LIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS LIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS Napsu Karmitsa 1 Marko M. Mäkelä 2 Department of Mathematics, University of Turku, FI-20014 Turku,

More information

INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE

INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE HANDE Y. BENSON, ARUN SEN, AND DAVID F. SHANNO Abstract. In this paper, we present global

More information

A GLOBALLY CONVERGENT STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE

A GLOBALLY CONVERGENT STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE A GLOBALLY CONVERGENT STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE Philip E. Gill Vyacheslav Kungurtsev Daniel P. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-14-1 June 30,

More information

Lecture 13: Constrained optimization

Lecture 13: Constrained optimization 2010-12-03 Basic ideas A nonlinearly constrained problem must somehow be converted relaxed into a problem which we can solve (a linear/quadratic or unconstrained problem) We solve a sequence of such problems

More information

A STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE

A STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE A STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE Philip E. Gill Vyacheslav Kungurtsev Daniel P. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-14-1 June 30, 2014 Abstract Regularized

More information

Sequential Unconstrained Minimization: A Survey

Sequential Unconstrained Minimization: A Survey Sequential Unconstrained Minimization: A Survey Charles L. Byrne February 21, 2013 Abstract The problem is to minimize a function f : X (, ], over a non-empty subset C of X, where X is an arbitrary set.

More information

An inexact subgradient algorithm for Equilibrium Problems

An inexact subgradient algorithm for Equilibrium Problems Volume 30, N. 1, pp. 91 107, 2011 Copyright 2011 SBMAC ISSN 0101-8205 www.scielo.br/cam An inexact subgradient algorithm for Equilibrium Problems PAULO SANTOS 1 and SUSANA SCHEIMBERG 2 1 DM, UFPI, Teresina,

More information

Douglas-Rachford splitting for nonconvex feasibility problems

Douglas-Rachford splitting for nonconvex feasibility problems Douglas-Rachford splitting for nonconvex feasibility problems Guoyin Li Ting Kei Pong Jan 3, 015 Abstract We adapt the Douglas-Rachford DR) splitting method to solve nonconvex feasibility problems by studying

More information

Implications of the Constant Rank Constraint Qualification

Implications of the Constant Rank Constraint Qualification Mathematical Programming manuscript No. (will be inserted by the editor) Implications of the Constant Rank Constraint Qualification Shu Lu Received: date / Accepted: date Abstract This paper investigates

More information

Sequential equality-constrained optimization for nonlinear programming

Sequential equality-constrained optimization for nonlinear programming Sequential equality-constrained optimization for nonlinear programming E. G. Birgin L. F. Bueno J. M. Martínez September 9, 2015 Abstract A novel idea is proposed for solving optimization problems with

More information

1. Introduction. We consider the following constrained optimization problem:

1. Introduction. We consider the following constrained optimization problem: SIAM J. OPTIM. Vol. 26, No. 3, pp. 1465 1492 c 2016 Society for Industrial and Applied Mathematics PENALTY METHODS FOR A CLASS OF NON-LIPSCHITZ OPTIMIZATION PROBLEMS XIAOJUN CHEN, ZHAOSONG LU, AND TING

More information

AN INTERIOR-POINT METHOD FOR NONLINEAR OPTIMIZATION PROBLEMS WITH LOCATABLE AND SEPARABLE NONSMOOTHNESS

AN INTERIOR-POINT METHOD FOR NONLINEAR OPTIMIZATION PROBLEMS WITH LOCATABLE AND SEPARABLE NONSMOOTHNESS AN INTERIOR-POINT METHOD FOR NONLINEAR OPTIMIZATION PROBLEMS WITH LOCATABLE AND SEPARABLE NONSMOOTHNESS MARTIN SCHMIDT Abstract. Many real-world optimization models comse nonconvex and nonlinear as well

More information

Algorithms for Constrained Optimization

Algorithms for Constrained Optimization 1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic

More information

Date: July 5, Contents

Date: July 5, Contents 2 Lagrange Multipliers Date: July 5, 2001 Contents 2.1. Introduction to Lagrange Multipliers......... p. 2 2.2. Enhanced Fritz John Optimality Conditions...... p. 14 2.3. Informative Lagrange Multipliers...........

More information

On the minimization of possibly discontinuous functions by means of pointwise approximations

On the minimization of possibly discontinuous functions by means of pointwise approximations On the minimization of possibly discontinuous functions by means of pointwise approximations E. G. Birgin N. Krejić J. M. Martínez April 5, 2016 Abstract A general approach for the solution of possibly

More information

Optimality, Duality, Complementarity for Constrained Optimization

Optimality, Duality, Complementarity for Constrained Optimization Optimality, Duality, Complementarity for Constrained Optimization Stephen Wright University of Wisconsin-Madison May 2014 Wright (UW-Madison) Optimality, Duality, Complementarity May 2014 1 / 41 Linear

More information

ON AUGMENTED LAGRANGIAN METHODS WITH GENERAL LOWER-LEVEL CONSTRAINTS. 1. Introduction. Many practical optimization problems have the form (1.

ON AUGMENTED LAGRANGIAN METHODS WITH GENERAL LOWER-LEVEL CONSTRAINTS. 1. Introduction. Many practical optimization problems have the form (1. ON AUGMENTED LAGRANGIAN METHODS WITH GENERAL LOWER-LEVEL CONSTRAINTS R. ANDREANI, E. G. BIRGIN, J. M. MARTíNEZ, AND M. L. SCHUVERDT Abstract. Augmented Lagrangian methods with general lower-level constraints

More information

The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization 1

The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization 1 October 2003 The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization 1 by Asuman E. Ozdaglar and Dimitri P. Bertsekas 2 Abstract We consider optimization problems with equality,

More information

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Roger Behling a, Clovis Gonzaga b and Gabriel Haeser c March 21, 2013 a Department

More information

A SHIFTED PRIMAL-DUAL INTERIOR METHOD FOR NONLINEAR OPTIMIZATION

A SHIFTED PRIMAL-DUAL INTERIOR METHOD FOR NONLINEAR OPTIMIZATION A SHIFTED RIMAL-DUAL INTERIOR METHOD FOR NONLINEAR OTIMIZATION hilip E. Gill Vyacheslav Kungurtsev Daniel. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-18-1 February 1, 2018

More information

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Zhaosong Lu October 5, 2012 (Revised: June 3, 2013; September 17, 2013) Abstract In this paper we study

More information

Technische Universität Dresden Herausgeber: Der Rektor

Technische Universität Dresden Herausgeber: Der Rektor Als Manuskript gedruckt Technische Universität Dresden Herausgeber: Der Rektor The Gradient of the Squared Residual as Error Bound an Application to Karush-Kuhn-Tucker Systems Andreas Fischer MATH-NM-13-2002

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions

More information

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 30 Notation f : H R { } is a closed proper convex function domf := {x R n

More information

A convergence result for an Outer Approximation Scheme

A convergence result for an Outer Approximation Scheme A convergence result for an Outer Approximation Scheme R. S. Burachik Engenharia de Sistemas e Computação, COPPE-UFRJ, CP 68511, Rio de Janeiro, RJ, CEP 21941-972, Brazil regi@cos.ufrj.br J. O. Lopes Departamento

More information

Lecture 3. Optimization Problems and Iterative Algorithms

Lecture 3. Optimization Problems and Iterative Algorithms Lecture 3 Optimization Problems and Iterative Algorithms January 13, 2016 This material was jointly developed with Angelia Nedić at UIUC for IE 598ns Outline Special Functions: Linear, Quadratic, Convex

More information

First-order optimality conditions for mathematical programs with second-order cone complementarity constraints

First-order optimality conditions for mathematical programs with second-order cone complementarity constraints First-order optimality conditions for mathematical programs with second-order cone complementarity constraints Jane J. Ye Jinchuan Zhou Abstract In this paper we consider a mathematical program with second-order

More information

Chapter 2 Convex Analysis

Chapter 2 Convex Analysis Chapter 2 Convex Analysis The theory of nonsmooth analysis is based on convex analysis. Thus, we start this chapter by giving basic concepts and results of convexity (for further readings see also [202,

More information

Evaluation complexity for nonlinear constrained optimization using unscaled KKT conditions and high-order models by E. G. Birgin, J. L. Gardenghi, J. M. Martínez, S. A. Santos and Ph. L. Toint Report NAXYS-08-2015

More information

On proximal-like methods for equilibrium programming

On proximal-like methods for equilibrium programming On proximal-lie methods for equilibrium programming Nils Langenberg Department of Mathematics, University of Trier 54286 Trier, Germany, langenberg@uni-trier.de Abstract In [?] Flam and Antipin discussed

More information

NOTES ON EXISTENCE AND UNIQUENESS THEOREMS FOR ODES

NOTES ON EXISTENCE AND UNIQUENESS THEOREMS FOR ODES NOTES ON EXISTENCE AND UNIQUENESS THEOREMS FOR ODES JONATHAN LUK These notes discuss theorems on the existence, uniqueness and extension of solutions for ODEs. None of these results are original. The proofs

More information

A Quasi-Newton Algorithm for Nonconvex, Nonsmooth Optimization with Global Convergence Guarantees

A Quasi-Newton Algorithm for Nonconvex, Nonsmooth Optimization with Global Convergence Guarantees Noname manuscript No. (will be inserted by the editor) A Quasi-Newton Algorithm for Nonconvex, Nonsmooth Optimization with Global Convergence Guarantees Frank E. Curtis Xiaocun Que May 26, 2014 Abstract

More information

Zangwill s Global Convergence Theorem

Zangwill s Global Convergence Theorem Zangwill s Global Convergence Theorem A theory of global convergence has been given by Zangwill 1. This theory involves the notion of a set-valued mapping, or point-to-set mapping. Definition 1.1 Given

More information

U e = E (U\E) e E e + U\E e. (1.6)

U e = E (U\E) e E e + U\E e. (1.6) 12 1 Lebesgue Measure 1.2 Lebesgue Measure In Section 1.1 we defined the exterior Lebesgue measure of every subset of R d. Unfortunately, a major disadvantage of exterior measure is that it does not satisfy

More information

A smoothing augmented Lagrangian method for solving simple bilevel programs

A smoothing augmented Lagrangian method for solving simple bilevel programs A smoothing augmented Lagrangian method for solving simple bilevel programs Mengwei Xu and Jane J. Ye Dedicated to Masao Fukushima in honor of his 65th birthday Abstract. In this paper, we design a numerical

More information

The exact absolute value penalty function method for identifying strict global minima of order m in nonconvex nonsmooth programming

The exact absolute value penalty function method for identifying strict global minima of order m in nonconvex nonsmooth programming Optim Lett (2016 10:1561 1576 DOI 10.1007/s11590-015-0967-3 ORIGINAL PAPER The exact absolute value penalty function method for identifying strict global minima of order m in nonconvex nonsmooth programming

More information

Complementarity Formulations of l 0 -norm Optimization Problems

Complementarity Formulations of l 0 -norm Optimization Problems Complementarity Formulations of l 0 -norm Optimization Problems Mingbin Feng, John E. Mitchell,Jong-Shi Pang, Xin Shen, Andreas Wächter Original submission: September 23, 2013. Revised January 8, 2015

More information

An Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints

An Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints An Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints Klaus Schittkowski Department of Computer Science, University of Bayreuth 95440 Bayreuth, Germany e-mail:

More information

Mingbin Feng, John E. Mitchell, Jong-Shi Pang, Xin Shen, Andreas Wächter

Mingbin Feng, John E. Mitchell, Jong-Shi Pang, Xin Shen, Andreas Wächter Complementarity Formulations of l 0 -norm Optimization Problems 1 Mingbin Feng, John E. Mitchell, Jong-Shi Pang, Xin Shen, Andreas Wächter Abstract: In a number of application areas, it is desirable to

More information

Complementarity Formulations of l 0 -norm Optimization Problems

Complementarity Formulations of l 0 -norm Optimization Problems Complementarity Formulations of l 0 -norm Optimization Problems Mingbin Feng, John E. Mitchell, Jong-Shi Pang, Xin Shen, Andreas Wächter May 17, 2016 Abstract In a number of application areas, it is desirable

More information

On the complexity of an Inexact Restoration method for constrained optimization

On the complexity of an Inexact Restoration method for constrained optimization On the complexity of an Inexact Restoration method for constrained optimization L. F. Bueno J. M. Martínez September 18, 2018 Abstract Recent papers indicate that some algorithms for constrained optimization

More information

Optimality Conditions for Nonsmooth Convex Optimization

Optimality Conditions for Nonsmooth Convex Optimization Optimality Conditions for Nonsmooth Convex Optimization Sangkyun Lee Oct 22, 2014 Let us consider a convex function f : R n R, where R is the extended real field, R := R {, + }, which is proper (f never

More information

Algorithms for Nonsmooth Optimization

Algorithms for Nonsmooth Optimization Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization

More information

Penalty and Barrier Methods General classical constrained minimization problem minimize f(x) subject to g(x) 0 h(x) =0 Penalty methods are motivated by the desire to use unconstrained optimization techniques

More information

PDE-Constrained and Nonsmooth Optimization

PDE-Constrained and Nonsmooth Optimization Frank E. Curtis October 1, 2009 Outline PDE-Constrained Optimization Introduction Newton s method Inexactness Results Summary and future work Nonsmooth Optimization Sequential quadratic programming (SQP)

More information

Mathematics 530. Practice Problems. n + 1 }

Mathematics 530. Practice Problems. n + 1 } Department of Mathematical Sciences University of Delaware Prof. T. Angell October 19, 2015 Mathematics 530 Practice Problems 1. Recall that an indifference relation on a partially ordered set is defined

More information

Some new facts about sequential quadratic programming methods employing second derivatives

Some new facts about sequential quadratic programming methods employing second derivatives To appear in Optimization Methods and Software Vol. 00, No. 00, Month 20XX, 1 24 Some new facts about sequential quadratic programming methods employing second derivatives A.F. Izmailov a and M.V. Solodov

More information

Some Remarks on Constrained Optimization. José Mario Martínez

Some Remarks on Constrained Optimization. José Mario Martínez José Mario Martínez wwwimeunicampbr/ martinez Department of Applied Mathematics, University of Campinas, Brazil 2011 Collaborators: Roberto Andreani (Unicamp), Ernesto Birgin (USP), Laura Schuverdt (La

More information

CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS

CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS A Dissertation Submitted For The Award of the Degree of Master of Philosophy in Mathematics Neelam Patel School of Mathematics

More information

Algorithms for constrained local optimization

Algorithms for constrained local optimization Algorithms for constrained local optimization Fabio Schoen 2008 http://gol.dsi.unifi.it/users/schoen Algorithms for constrained local optimization p. Feasible direction methods Algorithms for constrained

More information

Necessary optimality conditions for optimal control problems with nonsmooth mixed state and control constraints

Necessary optimality conditions for optimal control problems with nonsmooth mixed state and control constraints Necessary optimality conditions for optimal control problems with nonsmooth mixed state and control constraints An Li and Jane J. Ye Abstract. In this paper we study an optimal control problem with nonsmooth

More information

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.

More information

Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms

Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms Carlos Humes Jr. a, Benar F. Svaiter b, Paulo J. S. Silva a, a Dept. of Computer Science, University of São Paulo, Brazil Email: {humes,rsilva}@ime.usp.br

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

1 Lyapunov theory of stability

1 Lyapunov theory of stability M.Kawski, APM 581 Diff Equns Intro to Lyapunov theory. November 15, 29 1 1 Lyapunov theory of stability Introduction. Lyapunov s second (or direct) method provides tools for studying (asymptotic) stability

More information

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods Quasi-Newton Methods General form of quasi-newton methods: x k+1 = x k α

More information

How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization

How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization Frank E. Curtis Department of Industrial and Systems Engineering, Lehigh University Daniel P. Robinson Department

More information

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented

More information

Sequential equality-constrained optimization for nonlinear programming

Sequential equality-constrained optimization for nonlinear programming Sequential equality-constrained optimization for nonlinear programming E. G. Birgin L. F. Bueno J. M. Martínez September 9, 2015 Abstract A new method is proposed for solving optimization problems with

More information

On Penalty and Gap Function Methods for Bilevel Equilibrium Problems

On Penalty and Gap Function Methods for Bilevel Equilibrium Problems On Penalty and Gap Function Methods for Bilevel Equilibrium Problems Bui Van Dinh 1 and Le Dung Muu 2 1 Faculty of Information Technology, Le Quy Don Technical University, Hanoi, Vietnam 2 Institute of

More information

STATIONARITY RESULTS FOR GENERATING SET SEARCH FOR LINEARLY CONSTRAINED OPTIMIZATION

STATIONARITY RESULTS FOR GENERATING SET SEARCH FOR LINEARLY CONSTRAINED OPTIMIZATION STATIONARITY RESULTS FOR GENERATING SET SEARCH FOR LINEARLY CONSTRAINED OPTIMIZATION TAMARA G. KOLDA, ROBERT MICHAEL LEWIS, AND VIRGINIA TORCZON Abstract. We present a new generating set search (GSS) approach

More information

ON LICQ AND THE UNIQUENESS OF LAGRANGE MULTIPLIERS

ON LICQ AND THE UNIQUENESS OF LAGRANGE MULTIPLIERS ON LICQ AND THE UNIQUENESS OF LAGRANGE MULTIPLIERS GERD WACHSMUTH Abstract. Kyparisis proved in 1985 that a strict version of the Mangasarian- Fromovitz constraint qualification (MFCQ) is equivalent to

More information

Set, functions and Euclidean space. Seungjin Han

Set, functions and Euclidean space. Seungjin Han Set, functions and Euclidean space Seungjin Han September, 2018 1 Some Basics LOGIC A is necessary for B : If B holds, then A holds. B A A B is the contraposition of B A. A is sufficient for B: If A holds,

More information

A STABILIZED SQP METHOD: GLOBAL CONVERGENCE

A STABILIZED SQP METHOD: GLOBAL CONVERGENCE A STABILIZED SQP METHOD: GLOBAL CONVERGENCE Philip E. Gill Vyacheslav Kungurtsev Daniel P. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-13-4 Revised July 18, 2014, June 23,

More information

Inequality Constraints

Inequality Constraints Chapter 2 Inequality Constraints 2.1 Optimality Conditions Early in multivariate calculus we learn the significance of differentiability in finding minimizers. In this section we begin our study of the

More information

Solving generalized semi-infinite programs by reduction to simpler problems.

Solving generalized semi-infinite programs by reduction to simpler problems. Solving generalized semi-infinite programs by reduction to simpler problems. G. Still, University of Twente January 20, 2004 Abstract. The paper intends to give a unifying treatment of different approaches

More information

Problem: A class of dynamical systems characterized by a fast divergence of the orbits. A paradigmatic example: the Arnold cat.

Problem: A class of dynamical systems characterized by a fast divergence of the orbits. A paradigmatic example: the Arnold cat. À È Ê ÇÄÁ Ë ËÌ ÅË Problem: A class of dynamical systems characterized by a fast divergence of the orbits A paradigmatic example: the Arnold cat. The closure of a homoclinic orbit. The shadowing lemma.

More information

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality

More information

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction

More information

Preprint ANL/MCS-P , Dec 2002 (Revised Nov 2003, Mar 2004) Mathematics and Computer Science Division Argonne National Laboratory

Preprint ANL/MCS-P , Dec 2002 (Revised Nov 2003, Mar 2004) Mathematics and Computer Science Division Argonne National Laboratory Preprint ANL/MCS-P1015-1202, Dec 2002 (Revised Nov 2003, Mar 2004) Mathematics and Computer Science Division Argonne National Laboratory A GLOBALLY CONVERGENT LINEARLY CONSTRAINED LAGRANGIAN METHOD FOR

More information

A FRITZ JOHN APPROACH TO FIRST ORDER OPTIMALITY CONDITIONS FOR MATHEMATICAL PROGRAMS WITH EQUILIBRIUM CONSTRAINTS

A FRITZ JOHN APPROACH TO FIRST ORDER OPTIMALITY CONDITIONS FOR MATHEMATICAL PROGRAMS WITH EQUILIBRIUM CONSTRAINTS A FRITZ JOHN APPROACH TO FIRST ORDER OPTIMALITY CONDITIONS FOR MATHEMATICAL PROGRAMS WITH EQUILIBRIUM CONSTRAINTS Michael L. Flegel and Christian Kanzow University of Würzburg Institute of Applied Mathematics

More information

Affine scaling interior Levenberg-Marquardt method for KKT systems. C S:Levenberg-Marquardt{)KKTXÚ

Affine scaling interior Levenberg-Marquardt method for KKT systems. C S:Levenberg-Marquardt{)KKTXÚ 2013c6 $ Ê Æ Æ 117ò 12Ï June, 2013 Operations Research Transactions Vol.17 No.2 Affine scaling interior Levenberg-Marquardt method for KKT systems WANG Yunjuan 1, ZHU Detong 2 Abstract We develop and analyze

More information

CONVERGENCE AND STABILITY OF A REGULARIZATION METHOD FOR MAXIMAL MONOTONE INCLUSIONS AND ITS APPLICATIONS TO CONVEX OPTIMIZATION

CONVERGENCE AND STABILITY OF A REGULARIZATION METHOD FOR MAXIMAL MONOTONE INCLUSIONS AND ITS APPLICATIONS TO CONVEX OPTIMIZATION Variational Analysis and Appls. F. Giannessi and A. Maugeri, Eds. Kluwer Acad. Publ., Dordrecht, 2004 CONVERGENCE AND STABILITY OF A REGULARIZATION METHOD FOR MAXIMAL MONOTONE INCLUSIONS AND ITS APPLICATIONS

More information

Generalized Uniformly Optimal Methods for Nonlinear Programming

Generalized Uniformly Optimal Methods for Nonlinear Programming Generalized Uniformly Optimal Methods for Nonlinear Programming Saeed Ghadimi Guanghui Lan Hongchao Zhang Janumary 14, 2017 Abstract In this paper, we present a generic framewor to extend existing uniformly

More information

Priority Programme 1962

Priority Programme 1962 Priority Programme 1962 An Example Comparing the Standard and Modified Augmented Lagrangian Methods Christian Kanzow, Daniel Steck Non-smooth and Complementarity-based Distributed Parameter Systems: Simulation

More information

GENERAL NONCONVEX SPLIT VARIATIONAL INEQUALITY PROBLEMS. Jong Kyu Kim, Salahuddin, and Won Hee Lim

GENERAL NONCONVEX SPLIT VARIATIONAL INEQUALITY PROBLEMS. Jong Kyu Kim, Salahuddin, and Won Hee Lim Korean J. Math. 25 (2017), No. 4, pp. 469 481 https://doi.org/10.11568/kjm.2017.25.4.469 GENERAL NONCONVEX SPLIT VARIATIONAL INEQUALITY PROBLEMS Jong Kyu Kim, Salahuddin, and Won Hee Lim Abstract. In this

More information

Subdifferential representation of convex functions: refinements and applications

Subdifferential representation of convex functions: refinements and applications Subdifferential representation of convex functions: refinements and applications Joël Benoist & Aris Daniilidis Abstract Every lower semicontinuous convex function can be represented through its subdifferential

More information

A Primal-Dual Augmented Lagrangian Penalty-Interior-Point Filter Line Search Algorithm

A Primal-Dual Augmented Lagrangian Penalty-Interior-Point Filter Line Search Algorithm Journal name manuscript No. (will be inserted by the editor) A Primal-Dual Augmented Lagrangian Penalty-Interior-Point Filter Line Search Algorithm Rene Kuhlmann Christof Büsens Received: date / Accepted:

More information

Spectral gradient projection method for solving nonlinear monotone equations

Spectral gradient projection method for solving nonlinear monotone equations Journal of Computational and Applied Mathematics 196 (2006) 478 484 www.elsevier.com/locate/cam Spectral gradient projection method for solving nonlinear monotone equations Li Zhang, Weijun Zhou Department

More information

PARALLEL SUBGRADIENT METHOD FOR NONSMOOTH CONVEX OPTIMIZATION WITH A SIMPLE CONSTRAINT

PARALLEL SUBGRADIENT METHOD FOR NONSMOOTH CONVEX OPTIMIZATION WITH A SIMPLE CONSTRAINT Linear and Nonlinear Analysis Volume 1, Number 1, 2015, 1 PARALLEL SUBGRADIENT METHOD FOR NONSMOOTH CONVEX OPTIMIZATION WITH A SIMPLE CONSTRAINT KAZUHIRO HISHINUMA AND HIDEAKI IIDUKA Abstract. In this

More information