
Als Manuskript gedruckt (printed as manuscript)

Technische Universität Dresden
Institut für Numerische Mathematik

An LP-Newton Method: Nonsmooth Equations, KKT Systems, and Nonisolated Solutions

F. Facchinei, A. Fischer, and M. Herrich

MATH NM 5 2011
September 2011

An LP-Newton Method: Nonsmooth Equations, KKT Systems, and Nonisolated Solutions

Francisco Facchinei
Department of Computer and System Sciences Antonio Ruberti
Università di Roma La Sapienza, Via Ariosto 25, 00185 Rome, Italy
facchinei@dis.uniroma1.it

Andreas Fischer (1) and Markus Herrich
Institute of Numerical Mathematics, Department of Mathematics
Technische Universität Dresden, 01062 Dresden, Germany
Andreas.Fischer@tu-dresden.de, Markus.Herrich@tu-dresden.de

September 15, 2011

Abstract. We define a new Newton-type method for the solution of constrained systems of equations and analyze its properties in detail. Under suitable conditions, which require neither differentiability nor local uniqueness of solutions, the method converges locally and quadratically to a solution of the system of equations, thus filling an important gap in the existing theory. The new algorithm improves on known methods and, when particularized to KKT systems derived from optimality conditions for constrained optimization or variational inequalities, it has theoretical advantages even over methods specifically designed to solve such systems.

Keywords. Quadratic convergence, Newton method, nonsmooth system, nonisolated solution, KKT system

Mathematics Subject Classification (2010). 90C30, 90C33, 49M15, 65K05

(1) Part of this research was done while this author was visiting the Department of Computer and System Sciences Antonio Ruberti at the University of Rome La Sapienza. The financial support by the University of Rome La Sapienza is kindly acknowledged.

1 Introduction

In this paper we develop a fast, local method for the solution of the constrained system of equations

    F(z) = 0,  z ∈ Ω,                                                     (1)

where Ω ⊆ R^n is a nonempty and closed set and F : R^n → R^m is a given continuous map. This problem has a long and glorious history and its importance cannot be overestimated. To put this paper in perspective it might be useful to present a very short review of results. We begin by considering the most classical case of problem (1): the solution of a square system of equations, which corresponds to n = m and Ω = R^n; that is, we first consider the solution of

    F(z) = 0                                                              (2)

with F : R^n → R^n. The prototype of all fast, local algorithms is Newton's method, which starts at a point z^0 close to a solution and generates a sequence {z^k} by setting z^{k+1} equal to the solution of the linear system

    F'(z^k)(z − z^k) = −F(z^k),                                           (3)

where F' denotes the Jacobian of F. The classical result for this method states that if z* is a solution of (2), F is twice continuously differentiable in a neighborhood of z*, and F'(z*) is nonsingular, then {z^k} converges quadratically to z* provided that z^0 belongs to a suitably small neighborhood of z* (throughout this paper all convergence rates are quotient rates, so that quadratic stands for Q-quadratic). This result is the cornerstone for the development of a host of variants and relaxations, all of which assume at least the continuous differentiability of F and the nonsingularity of the Jacobian of F at z*. However, in the optimization community the need for results that go beyond the classical ones has long been felt. The development of new fields of application and the refinement of the analysis of standard settings pointed to the need to extend the classical Newton method in at least three directions:

1. Relaxation of the differentiability assumption;
2. Relaxation of the nonsingularity assumption;
3. The ability to find a solution lying in some prescribed set Ω.

The differentiability issue is probably the one that has attracted the most attention, due to its practical importance. In fact, it is well understood that nondifferentiable systems of equations arise quite naturally both when modeling natural phenomena and in the study of problems that are usually thought of as smooth, for example the Kojima reformulation of the KKT system of an optimization problem or the equation reformulations of complementarity systems. We cannot go into the details of these developments, but it is safe to say that, computationally, the big breakthrough has been the development of semismooth Newton methods ([26, 30, 31] and [12] for more bibliographical references). For example, if the function F is assumed to be strongly semismooth, the iteration (3) can be replaced by

    V_k (z − z^k) = −F(z^k),  V_k ∈ ∂F(z^k),

where ∂F(z^k) denotes the generalized Jacobian of Clarke. Assuming that all the elements of ∂F(z*) are nonsingular, the semismooth Newton method can be shown to be well defined and quadratically convergent to z* provided that z^0 belongs to a suitably small neighborhood of z*. The remarkable features of this semismooth Newton method are its simplicity, its formal similarity to the classical Newton method, and the fact that, when F happens to be twice continuously differentiable around z*, it automatically reduces to the classical method.

Advances on the relaxation of the nonsingularity assumption that are useful in optimization contexts are more recent. They were prompted by the desire to develop fast methods for the solution of classical problems, typically KKT systems and complementarity problems, under assumptions that are weaker than the traditional ones. For example, the study of the convergence properties of algorithms for the solution of the KKT system of a constrained optimization problem was traditionally carried out assuming the linear independence of the gradients of the active constraints. But this is a strong assumption, and as soon as one relaxes it, the solutions of the KKT system become nonisolated in general, since the multipliers are no longer unique. It is then clear that in this setting the nonsingularity assumption is not reasonable, since it obviously implies local uniqueness of the solution. The key to progress on these issues was the understanding that what is crucial to the quadratic convergence of the Newton method is not the nonsingularity of the Jacobian per se, but rather one of its consequences: the error bound condition. Let Z be the set of solutions of the system F(z) = 0. We say that F provides a local error bound around some z* ∈ Z if there exist positive constants l and δ such that

    dist[s, Z] ≤ l ‖F(s)‖   for all s ∈ B_δ(z*),

where B_δ(z*) := {z ∈ R^n : ‖z − z*‖ ≤ δ} is the ball of radius δ > 0 around z* and dist[s, Z] := inf{‖s − z‖ : z ∈ Z} denotes the distance of a point s to the solution set Z. Roughly speaking, the error bound condition holds if the function F itself provides an (over-)estimate of the distance to the solution set for every point sufficiently close to the solution z*. Assuming for simplicity twice continuous differentiability of F and this error bound condition, it is possible to design a Levenberg-Marquardt method that retains quadratic convergence even in the case of nonisolated solutions. With this approach, given z^k, the new iterate z^{k+1} is the unique solution of the following convex quadratic optimization problem:

    min_z  ‖F(z^k) + F'(z^k)(z − z^k)‖_2^2 + µ_k ‖z − z^k‖_2^2,           (4)

where µ_k is a strictly positive scalar. If the sequence {µ_k} is chosen appropriately, it can be shown that the resulting algorithm generates a sequence {z^k} that converges quadratically to a possibly nonisolated solution of (2) [13, 17, 38]; see also [10] for the use of a Levenberg-Marquardt method to compute an isolated solution of a nonsmooth system of equations. One important point to note, however, is that, in spite of the many efforts devoted to this issue, the semismooth Newton method and the Levenberg-Marquardt method under an error bound condition seem "incompatible": fine details apart, when F is nondifferentiable, to date there is no general method with a fast local convergence rate for systems F(z) = 0 with nonisolated solutions under a mere error bound condition.
This is a crucial issue, since it is by now very clear that this is precisely the feature one needs in order to make interesting advances in the solution of structured systems of equations like those arising from the KKT conditions of optimization problems, variational inequalities, and generalized Nash games (see [12] for more information about variational inequalities and [11] for a discussion of generalized Nash games).
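For concreteness, here is a minimal numerical sketch of one classical Newton step (3) and one Levenberg-Marquardt step for subproblem (4). It is purely illustrative: the function names and the particular choice µ_k := ‖F(z^k)‖^2 (one of the choices studied in the error bound literature) are assumptions of this sketch, not prescriptions of the paper.

```python
import numpy as np

def newton_step(F, J, zk):
    """One classical Newton step (3): solve F'(z^k) d = -F(z^k); needs a square, nonsingular Jacobian."""
    return zk + np.linalg.solve(J(zk), -F(zk))

def lm_step(F, J, zk):
    """One Levenberg-Marquardt step, i.e. the minimizer of subproblem (4):
    ||F(z^k) + F'(z^k)(z - z^k)||_2^2 + mu_k ||z - z^k||_2^2,
    computed via the normal equations of this regularized least-squares problem."""
    Fk, Jk = F(zk), J(zk)
    mu = float(Fk @ Fk)                    # mu_k := ||F(z^k)||_2^2 (an assumed, common choice)
    d = np.linalg.solve(Jk.T @ Jk + mu * np.eye(len(zk)), -Jk.T @ Fk)
    return zk + d
```

Unlike the Newton step, the Levenberg-Marquardt step remains well defined even when F'(z^k) is rank deficient, which is exactly what makes it attractive in the presence of nonisolated solutions.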

Turning to the addition of the constraint z ∈ Ω, the subject is still in its infancy. The usefulness of limiting the search for solutions to a prescribed set is of obvious practical importance. Indeed, in many situations it is known a priori that the solutions we are interested in should belong to some set: for example, the multipliers associated with the inequality constraints in a KKT system should be nonnegative. The presence of a constraint z ∈ Ω can also have a less obvious technical significance: by restricting the region where we look for a solution we may also obtain gains in terms of differentiability and nonsingularity assumptions, which can lead to improvements with respect to the developments outlined above. This is a rather technical issue that we do not discuss further at this point, but it will play an important role in this paper. For the time being we only mention that the presence of a constraint set Ω, if convex, can easily be incorporated in the Levenberg-Marquardt approach, essentially by changing the subproblem (4) to

    min_z  ‖F(z^k) + F'(z^k)(z − z^k)‖_2^2 + µ_k ‖z − z^k‖_2^2   s.t.  z ∈ Ω,        (5)

see [25] and [1]. Other approaches, essentially interior point methods, may also be suited for the solution of constrained systems of equations, but they are not very relevant to our developments; we refer the interested reader to [12] for further information.

In this paper we propose a new, fast, local algorithm for the solution of the constrained system of equations (1). The new method is rather different from previous methods. Given an iterate z^k ∈ Ω, we take z^{k+1} as a solution of the subproblem

    min_{z,γ}  γ
    s.t.  z ∈ Ω,
          ‖F(z^k) + G(z^k)(z − z^k)‖ ≤ γ ‖F(z^k)‖²,
          ‖z − z^k‖ ≤ γ ‖F(z^k)‖,
          γ ≥ 0,

where G(z^k) is a suitable substitute for the Jacobian of F at z^k (if the function F is differentiable, we can take it to be the Jacobian). If ‖·‖ is the infinity norm and Ω is polyhedral (an assumption satisfied in practically all applications), the above problem is a simple linear program, whence the name LP-Newton method. The main contributions we provide in this paper are:

- The definition of the new LP-Newton method and the investigation of its local convergence properties based on a new set of assumptions;
- A thorough analysis of the assumptions under which quadratic convergence can be guaranteed, proving that the new method can successfully deal with interesting classes of constrained nonsmooth systems of equations with nonisolated solutions, thus overcoming one of the main limitations of all Newton-type methods proposed so far;
- A detailed discussion of the applicability of the LP-Newton method to KKT systems arising from constrained minimization problems or variational inequalities, showing that the new method compares favorably with existing, specialized methods for KKT systems, such as those in [14, 16, 20, 22, 23, 24, 35, 36, 37].

We also give a few hints on the applicability of the LP-Newton method to a wide range of problems beyond KKT systems and discuss some numerical examples.

The rest of the paper is organized as follows. Section 2 describes our new method for the solution of problem (1). It is shown that this method converges quadratically to a solution of (1) under weak assumptions. In Section 3 these assumptions are analyzed in great detail. The results of Section 3 are applied in Section 4 to establish the local convergence properties of our method for the solution of KKT systems. In the last section we give further applications of our method and some numerical examples.

2 The Algorithm and its Local Convergence Properties

In this section we describe our iterative method for the solution of the constrained system of nonlinear equations (1) and show that the algorithm is well defined. Then, after stating our basic assumptions, we investigate the local convergence properties of the algorithm. The main result of this section is Theorem 1 on the local quadratic convergence of the method. Throughout the paper the solution set of (1) is denoted by Z and it is assumed that this set is nonempty, i.e.,

    Z := {z ∈ Ω : F(z) = 0} ≠ ∅.                                          (6)

We recall that Ω is simply assumed to be a closed set. For computational reasons (see the comments after (7)) the convexity of Ω is a highly desirable property, but convexity is not strictly necessary from the theoretical point of view. However, even though we will not assume convexity explicitly, we stress from the outset that, in essentially all problems we are interested in, Ω is either just the whole space, Ω = R^n, or a polyhedral set.

In the description of the algorithm we use a mapping G : R^n → R^{m×n}. We will shortly describe the requirements this mapping must meet; for the time being, one should think of G(s) as a suitable substitute for the Jacobian. If the function F is continuously differentiable, then taking G(s) := F'(s) is the natural choice, while, if F is only locally Lipschitzian, G(s) can be taken to be an element of the B-subdifferential ∂_B F(s) (see Subsection 3.1 for the definition) or of its convex hull ∂F(s). For a given s ∈ Ω we consider the following optimization problem:

    min_{z,γ}  γ
    s.t.  z ∈ Ω,
          ‖F(s) + G(s)(z − s)‖ ≤ γ ‖F(s)‖²,
          ‖z − s‖ ≤ γ ‖F(s)‖,
          γ ≥ 0.                                                          (7)

The subproblems of our algorithm will be of this form. By ‖·‖ we denote an arbitrary but fixed vector norm in R^n or R^m. A convenient choice of the norm is the infinity norm, i.e., ‖·‖ = ‖·‖_∞. Then, if Ω is polyhedral, (7) is just a linear optimization problem; a sketch of this reformulation is given below. It is here that the convexity of Ω plays an important computational role, since in any case, whatever the norm, if Ω is convex then (7) is a convex optimization problem and, as such, easily solvable.
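To make the linear-programming reformulation concrete, the following sketch solves (7) for the infinity norm and a polyhedral set Ω = {z ∈ R^n : A_Ω z ≤ b_Ω}, with decision variables (z, γ). It is only an illustration under these assumptions; the function name and the use of SciPy's linprog are choices of this sketch, not part of the paper.

```python
import numpy as np
from scipy.optimize import linprog

def lp_newton_subproblem(F, G, s, A_omega, b_omega):
    """Solve subproblem (7) with the infinity norm and Omega = {z : A_omega z <= b_omega}."""
    s = np.asarray(s, dtype=float)
    Fs, Gs = np.atleast_1d(F(s)), np.atleast_2d(G(s))
    m, n = Gs.shape
    nf = np.linalg.norm(Fs, np.inf)                     # ||F(s)||_inf
    ones_m, ones_n = np.ones((m, 1)), np.ones((n, 1))
    I = np.eye(n)
    # Rows encode |F(s) + G(s)(z - s)|_i <= gamma ||F(s)||^2, |z - s|_j <= gamma ||F(s)||, z in Omega.
    A_ub = np.block([
        [ Gs,      -nf**2 * ones_m],
        [-Gs,      -nf**2 * ones_m],
        [  I,      -nf * ones_n],
        [ -I,      -nf * ones_n],
        [A_omega,  np.zeros((A_omega.shape[0], 1))],
    ])
    b_ub = np.concatenate([Gs @ s - Fs, Fs - Gs @ s, s, -s, b_omega])
    c = np.zeros(n + 1)
    c[-1] = 1.0                                         # minimize gamma
    bounds = [(None, None)] * n + [(0, None)]           # z free, gamma >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n], res.x[-1]                         # next iterate z and optimal value gamma(s)
```

Any LP solver could be used instead; for a general norm and a convex Ω, the same data define a convex program.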

The next proposition shows that problem (7) has a solution for any s ∈ R^n.

Proposition 1 For any s ∈ R^n, (a) the optimization problem (7) has a solution, and (b) the optimal value of (7) is zero if and only if s is a solution of (1).

Proof. Assertion (b) is obvious, so we only prove (a). If s ∈ Z the assertion is clear. Otherwise, let z̄ be an element of Ω. With

    γ̄ := max{ ‖F(s) + G(s)(z̄ − s)‖ ‖F(s)‖^{−2}, ‖z̄ − s‖ ‖F(s)‖^{−1} },

we see that the feasible set of (7) contains (z̄, γ̄). This remains true if we modify (7) by adding the constraint γ ≤ γ̄. The modified problem has a nonempty compact feasible set and a continuous objective. Thus, the Weierstrass theorem shows that the modified problem is solvable. Since (7) has the same solution set, it also has a solution.

Due to Proposition 1 the optimal value of problem (7) is well defined for any s and will be denoted by γ(s). Now we formally describe our method for the solution of problem (1).

Algorithm 1: LP-Newton Algorithm
(S.0): Choose a starting point z^0 ∈ Ω. Set k := 0.
(S.1): If z^k ∈ Z, then stop.
(S.2): Compute a solution (z^{k+1}, γ_{k+1}) of (7) with s := z^k.
(S.3): Set k := k + 1 and go to (S.1).

Algorithm 1 is well defined for any starting point z^0 due to Proposition 1. Moreover, although subproblem (7) need not have a unique solution with respect to the z-part, it suffices that the algorithm picks an arbitrary solution of the subproblem.

To analyze the local convergence properties of a sequence generated by Algorithm 1 we now state some assumptions. To this end, let z* ∈ Z denote an arbitrary but fixed solution of (1). Moreover, let δ > 0 be the arbitrary but fixed radius of the ball B_δ(z*) around z*.

Assumption 1 There exists L > 0 such that

    ‖F(s)‖ ≤ L dist[s, Z]

holds for all s ∈ B_δ(z*) ∩ Ω.

This assumption is a very weak one and it is satisfied, in particular, if F is locally Lipschitz continuous.

Assumption 2 There exists l > 0 such that

    dist[s, Z] ≤ l ‖F(s)‖

holds for all s ∈ B_δ(z*) ∩ Ω.

If Assumption 2 holds, we also say that F provides a local error bound around z* on Ω. As we discussed in the Introduction, Assumption 2 turned out to be a key condition for proving local superlinear convergence in cases when z* is not an isolated solution of problem (1); see

[1, 9, 13, 17, 18, 19, 25, 33, 38] as examples for its use in Levenberg-Marquardt methods and [4, 5, 14, 16, 17, 20, 23, 24, 27, 35, 37, 39] for further methods and related assumptions.

The next two assumptions are rather more technical and, to some extent, new, at least in the analysis of Newton-type methods. While their technicality may make it difficult to immediately appreciate their significance, it is precisely this quality that will allow us to obtain some of the strong results described later on. These assumptions will be discussed in depth in the next section. Note also that these assumptions impose implicit conditions on the choice of the mapping G.

Assumption 3 There exists Γ ≥ 1 such that

    γ(s) ≤ Γ

holds for all s ∈ B_δ(z*) ∩ Ω.

Recall that γ(s), the optimal value of program (7), is well defined by Proposition 1. Assumption 3 requires that these optimal values are uniformly bounded in a neighborhood of z* (intersected with Ω). In the analysis of stability issues of parametric optimization problems, an assumption that turns out to be important is an inf-boundedness condition; see for example [2] and the references therein. It can be shown that Assumption 3 is in fact equivalent to requiring uniform inf-boundedness around z* of the subproblems (7) when these are seen as parametric problems with s as parameter. A key part of our developments will be understanding when, in practical settings, this condition is satisfied. In Section 3 we will show that Assumption 3 is satisfied in the standard case in which F is differentiable, has a locally Lipschitz continuous derivative, and provides a local error bound on Ω. In addition, in Section 3 we will present further sufficient conditions for Assumption 3 that cover nonsmooth settings.

Assumption 4 There exists α̂ > 0 such that

    w ∈ L(s, α) := { w ∈ Ω : ‖w − s‖ ≤ α, ‖F(s) + G(s)(w − s)‖ ≤ α² }

implies

    ‖F(w)‖ ≤ α̂ α²

for all s ∈ (B_δ(z*) ∩ Ω) \ Z and all α ∈ [0, δ].

This assumption requires that the mapping w ↦ F(s) + G(s)(w − s) is, in some sense, a good approximation of the mapping w ↦ F(w) for w ∈ Ω with w sufficiently close to s. In Section 3 we will provide some sufficient conditions for Assumption 4. In particular, we will prove in Subsection 3.1 that Assumption 4 also holds in the standard setting in which F is differentiable with F' locally Lipschitzian. Moreover, Subsection 3.3 will show that this assumption may hold also for structured nondifferentiable functions.

The next theorem provides the local quadratic convergence of Algorithm 1 if Assumptions 1-4 are satisfied; a compact sketch of the overall iteration (S.0)-(S.3) is given first.
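The driver below is a minimal, illustrative implementation of Algorithm 1. It reuses the hypothetical lp_newton_subproblem routine from the sketch after (7), and it replaces the exact test z^k ∈ Z of step (S.1) by a small residual tolerance, which is a practical assumption of this sketch only.

```python
import numpy as np

def lp_newton(F, G, z0, A_omega, b_omega, tol=1e-12, max_iter=50):
    """Illustrative driver for Algorithm 1 (LP-Newton).
    (S.0) start at z0 in Omega; (S.1) stop when the residual is (numerically) zero;
    (S.2) obtain z^{k+1} from subproblem (7) with s := z^k; (S.3) repeat."""
    z = np.asarray(z0, dtype=float)
    for _ in range(max_iter):
        if np.linalg.norm(F(z), np.inf) <= tol:        # stand-in for the test z^k in Z
            return z
        z, gamma = lp_newton_subproblem(F, G, z, A_omega, b_omega)
    return z
```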

Theorem 1 Algorithm 1 is well-defined for any starting point z 0 Ω. If Assumptions 1 4 are satisfied, then there is r > 0 such that any infinite sequence {z k } generated by Algorithm 1 with starting point z 0 B r (z ) Ω converges quadratically to some ẑ Z. In order to prove this theorem we need two preliminary lemmas that are stated and proved next. Lemma 1 Let Assumption 3 be satisfied and define the set F (s,γ) by F (s,γ) := { z Ω z s Γ F(s), F(s) + G(s)(z s) Γ F(s) 2}. Then, for any s B δ (z ) Ω, the set F (s,γ) is nonempty. If, in addition, Assumption 1 is satisfied, then F(s) + G(s)(z s) ΓL 2 dist[s,z] 2 and z s ΓLdist[s,Z] hold for all z F (s,γ). Proof. Let us choose any s B δ (z ) Ω and let (z(s),γ(s)) be a solution of problem (7). Then, Assumption 3 yields z(s) F (s,γ) so that this set is nonempty. Now, let z F (s,γ) be arbitrary but fixed. The definition of the set F (s,γ) and Assumption 1 imply F(s) + G(s)(z s) Γ F(s) 2 ΓL 2 dist[s,z] 2 and z s Γ F(s) ΓLdist[s,Z]. Lemma 2 Let Assumptions 1 4 be satisfied. Then, there are ε > 0 and C > 0 such that, for any s B ε (z ) Ω, holds for all z F (s,γ). Proof. Let us first choose any ε according to dist[z,z] C dist[s,z] 2 1 2 dist[s,z] 0 < ε 1 2 min{ δ, δγ 1 L 1, ˆα 1 l 1 Γ 2 L 2}. (8) For s Z the assertion is clear because then F (s,γ) = {s} holds. So, let us choose s (B ε (z ) Ω)\Z and z F (s,γ). Lemma 1, together with (8), provides z s ΓLdist[s,Z] ΓL s z ΓLε 1 2 δ. Therefore, z z z s + s z δ 2 + δ 2 = δ, i.e., z B δ (z ) Ω follows. Since z F (s,γ) and Γ 1 yield F(s) + G(s)(z s) Γ F(s) 2 Γ 2 F(s) 2 9

and z s Γ F(s), we have z L (s,α), with α := Γ F(s). Moreover, α = Γ F(s) ΓLdist[s,Z] ΓLε δ 2 follows by Assumption 1 and (8). Thus, Assumption 4 implies F(z) ˆαα 2 = ˆαΓ 2 F(s) 2. Using this, Assumptions 1 and 2, and (8), we obtain dist[z, Z] l F(z) ˆαlΓ 2 F(s) 2 ˆαlΓ 2 L 2 dist[s,z] 2 ˆαlΓ 2 L 2 ε dist[s,z] 1 2 dist[s,z]. Hence, the assertion follows with C := ˆαlΓ 2 L 2. Proof of Theorem 1 We already noted that Algorithm 1 is well-defined for any z 0 Ω. With ε according to (8) let us choose r so that ε 0 < r 1 + 2ΓL. We first show by induction that the assertions and z k B ε (z ) Ω (9) z k+1 F (z k,γ) (10) are valid for all k N. For k = 0 the first assertion is clear by r < ε. Moreover, since r < ε < δ due to (8), Assumption 3 implies z 1 F (z 0,Γ). Suppose now that (9) and (10) hold for k = 0,...,ν. To show z ν+1 B ε (z ) Ω we first note that z ν+1 z z ν z + z ν+1 z ν z 0 z ν + }{{} z j+1 z j. (11) j=0 r Due to (9) and (10) for k = 0,...,ν and Lemma 1 we have, for all j = 0,...,ν, Because of (9) and (10), Lemma 2 implies z j+1 z j ΓLdist[z j,z]. (12) dist[z j,z] 1 2 dist[z j 1,Z] 10 ( ) 1 j dist[z 0,Z] 2

for j = 0,..., ν. Therefore, using (11) and (12), we obtain z ν+1 z r + ΓLdist[z 0,Z] }{{} r (1 + 2ΓL)r ε. ν ( ) 1 j j=0 2 }{{} 2 Thus, z ν+1 B ε (z ) Ω is valid. This and Assumption 3 imply z ν+2 F (z ν+1,γ). Hence, (9) and (10) hold for k = ν + 1 and, consequently, for all k N. Because of (9) and (10), Lemma 2 provides for all k N. This yields dist[z k+1,z] C dist[z k,z] 2 1 2 dist[zk,z] (13) lim k dist[zk,z] = 0. (14) For j,k N with k > j we obtain from Lemma 1, (10), and Lemma 2 that z k z j k 1 ( ) 1 i z i+1 z i ΓLdist[z j,z] 2ΓLdist[z j,z]. (15) i= j 2 So, due to (14), {z k } is a Cauchy sequence and thus, by the closedness of Z, converges to some ẑ Z. Finally, we prove the convergence rate. The use of (15) for k + 1 instead of j and k + j instead of k together with (13) leads to k 1 i= j z k+ j z k+1 2ΓLdist[z k+1,z] 2CΓLdist[z k,z] 2 for any k, j N with j > 1. Passing to the limit for j we obtain ẑ z k+1 2CΓLdist[z k,z] 2 2CΓL ẑ z k 2. The latter shows the quadratic convergence of {z k } to ẑ and completes the proof. Remark 1 Algorithm 1 can be modified in such a way that in each step the subproblem (7) with s := z k is not solved exactly. Instead, only a feasible point (z k+1,γ k+1 ) of (7) is determined. If, for some S > 0, γ k S can be ensured for all k N the theory provided in this section is applicable (just replace Γ by S in the proofs) and shows the local quadratic convergence of the modified algorithm. There are several possibilities to achieve this. Such techniques for saving computational costs belong to our current research work [6] but they are not within the scope of this paper. It may be useful to conclude this section by illustrating the behavior of the method on a very simple linear problem. Suppose that F : R R is given by F(z) := z, Ω := R, and G(s) := F (s) = 1. The only solution of this problem is z = 0. It can be readily checked that Assumptions 1 4 are all satisfied. Given z k, subproblem (7) can be written as min z,γ γ γ z k 2 z γ z k 2, γ z k z z k γ z k, γ 0. 11

It is easy to verify that this problem has a unique solution given by

    z^{k+1} = (z^k)³ / (z^k + (z^k)²).

Using this, it is easily seen that {z^k} converges to 0 from any starting point z^0 and that the convergence rate is quadratic. We remark that the standard Newton method would converge to the solution in one iteration on this linear problem. The non-finiteness on linear problems is, in a sense, the price we have to pay in order to be able to tackle a much wider array of problems than the standard Newton method can. However, without going into details, we mention that under suitable assumptions the standard Newton method may be regarded as an inexact version of Algorithm 1, cf. Remark 1.

3 Discussion of Assumptions

In this section we discuss in detail the new Assumptions 3 and 4. To this end, we first investigate these assumptions in relation to existing conditions, particularly the (semi)smoothness of F. In Subsection 3.1 we show that some standard conditions that have been used in the literature to establish local convergence of Newton-type methods in nonstandard situations imply our assumptions. Furthermore, in Subsections 3.2 and 3.3 we establish that our key Assumptions 3 and 4 can hold for some genuinely nonsmooth functions F with nonisolated solutions. In particular, in Subsection 3.2 we develop sufficient conditions for Assumption 3 that are applicable when F is a continuous selection of functions, while Subsection 3.3 deals with conditions which guarantee Assumption 4 for an important class of continuous selections. The results of this section will be used in Sections 4 and 5 to analyze the applicability of Algorithm 1 to the solution of, among others, KKT systems, complementarity problems, and feasibility problems.

3.1 Relations to Existing Conditions

In this subsection we first introduce two conditions that have been used in the literature to establish state-of-the-art local quadratic convergence results for systems that are either nonsmooth or have nonisolated solutions. Roughly speaking, we prove that either of these two conditions, Conditions 1 and 2 below, implies our Assumptions 3 and 4. Examples in Section 5 show that the reverse implications do not hold, thus establishing that our framework improves on existing methods. Condition 1 below is a smoothness condition that, in combination with Assumption 2, was used to prove local convergence properties of a Levenberg-Marquardt method for the solution of constrained systems of equations with possibly nonisolated solutions, see [25]. Like Assumptions 3 and 4, Condition 1 restricts the choice of the mapping G.

Condition 1 There exist κ_0 > 0 and δ_0 > 0 such that

    ‖F(s) + G(s)(w − s) − F(w)‖ ≤ κ_0 ‖w − s‖²                            (16)

holds for all pairs (w, s) with w ∈ B_{δ_0}(z*) ∩ Ω and s ∈ (B_{δ_0}(z*) ∩ Ω) \ Z.

Note that Condition 1 implies the differentiability of F for all s in the interior of the set (B δ (z ) Ω) \ Z. Vice versa, if F is continuously differentiable with Lipschitz gradient on B δ0 (z ) Ω then Condition 1 holds. Proposition 2 will show that Condition 1 implies both Assumption 3 (if F provides a local error bound on Ω) and Assumption 4. The second condition we will consider plays a crucial role for proving local quadratic convergence of semismooth Newton methods. Let F be locally Lipschitz continuous around a solution z ; we denote by F(z) the generalized Jacobian of Clarke that, in turn, is the convex hull of the limiting Jacobian (or B-subdifferential) B F(z), i.e., F(z) := conv B F(z) with B F(z) := { lim l F (z l ) lim l z l = z,z l D F }, where D F R n is the set of points where F is differentiable. Condition 2 F is locally Lipschitz continuous and there exists κ 1 > 0 such that holds for all s B δ (z ) Ω. sup{ F(s) +V (z s) V F(s)} κ 1 z s 2 (17) Condition 2 does not imply differentiability of F. Even for Ω = R n, this assumption is slightly weaker than the strong semismoothness of F at a solution z, see [15, Section 5] for a discussion. Moreover, it is easily seen that we can equivalently state the condition by replacing F(z) with B F(z) in (17). To prove superlinear or quadratic convergence of the semismooth Newton method the nonsingularity of all matrices in B F(z ) is needed in [30], where it is further assumed that m = n. In our setting, in which m and n can be different, we will generalize the condition in [30] by assuming that the rank of B F(z ) is n. Note that this condition implies that the solution z is locally unique. Proposition 4 will show that Condition 2 and the local uniqueness of z imply Assumption 3 when G(s) F(s). Moreover, it will be clear from Proposition 5 that Condition 2 with the rank condition above guarantees Assumption 4 for G(s) B F(s). Given the facts described above, to show that Assumptions 3 and 4 can hold although neither Condition 1 nor Condition 2 (the latter associated with the full rank condition) is satisfied, we refer to Examples 5 and 7 11 in Section 5. We now show that in our setting Condition 1 implies both Assumptions 3 and 4. Proposition 2 Let Condition 1 be satisfied. Then, for δ > 0 sufficiently small, the following assertions are valid: (a) If Assumption 2 holds then Assumption 3 is satisfied. (b) Assumption 4 is satisfied. Proof. Let us choose δ according to 0 < δ 1 2 δ 0 with δ 0 from Condition 1. (a) First note that γ(s) = 0 for any s Z. Now, for s (B δ (z ) Ω) \ Z let s Z be so that s s = dist[s,z] holds. Then, we have s z s s + s z 2δ δ 0. Thus, inequality (16) with w := s and Assumption 2 yield F(s) + G(s)(s s) κ 0 s s 2 κ 0 l 2 F(s) 2. 13

Assumption 2 further provides s s = dist[s,z] l F(s). Obviously, (s,γ ) with Γ := lmax{1,κ 0 l} is feasible for problem (7). Hence, Assumption 3 is satisfied. (b) Let s (B δ (z ) Ω) \ Z and α [0,δ] be arbitrarily chosen. Then, w L (s,α) implies w Ω and w z w s + s z α + δ 2δ δ 0. By (16) and w L (s,α) it follows that F(w) F(s) + G(s)(w s) κ }{{} 0 w s 2 κ 0 α 2 α 2 and therefore F(w) (1 + κ 0 )α 2. Thus, Assumption 4 is satisfied, with ˆα := 1 + κ 0. The next result easily follows by recalling that Condition 1 holds if F is sufficiently smooth in a neighborhood of z. Corollary 1 Let F be differentiable with a locally Lipschitz continuous derivative around z. Then, with G(s) = F (s) for all s B δ (z ) Ω, assertions (a) and (b) of Proposition 2 are valid for δ > 0 sufficiently small. We now consider Condition 2 and its relationship to Assumptions 3 and 4. To analyze this point we first state a further condition, Condition 3 below, showing that Proposition 3 is equivalent to Assumption 3, provided that Assumptions 1 and 2 hold. Condition 3 There exists κ > 0 such that, for any s B δ (z ) Ω, an s Ω exists with s s κ dist[s,z] (18) and F(s) + G(s)(s s) κ dist[s,z] 2. (19) Note that, if z is an isolated solution and δ > 0 is sufficiently small, we can take s := z. Then, (18) holds for any κ 1 and (19) is related to Condition 2. This shows that Condition 3, and therefore Assumption 3 (see next proposition), can also be viewed as a relaxation of strong semismoothness. Proposition 3 The following assertions are valid: (a) If Assumption 2 is satisfied then Condition 3 implies Assumption 3. (b) If Assumptions 1 is satisfied then Assumption 3 implies Condition 3. 14

Proof. Let s B δ (z ) Ω be arbitrarily chosen. (a) Due to Condition 3, there is s Ω satisfying (18) and (19). This and Assumption 2 imply and s s κl F(s) F(s) + G(s)(s s) κl 2 F(s) 2. Obviously, (s,γ ) with Γ := κlmax{1,l} is feasible for problem (7). Hence, Assumption 3 is satisfied. (b) Let (z(s),γ(s)) denote a solution of problem (7) which exists due to Proposition 1. By Assumption 3, we have z(s) F (s,γ) (see Lemma 1 for the definition of the set F (s,γ)). This, together with Assumption 1, yields z(s) s Γ F(s) ΓLdist[s,Z] and F(s) + G(s)(z(s) s) Γ F(s) 2 ΓL 2 dist[s,z] 2. Thus, Condition 3 is satisfied for s := z(s) and κ := ΓLmax{1,L}. The following two propositions establish the desired relations between Condition 2 and Assumptions 3 and 4, respectively. Proposition 4 Let Assumption 2 and Condition 2 be satisfied and suppose that z is the only solution of (1) within B δ1 (z ) Ω for some δ 1 > 0. Then, with G(s) F(s) for all s B δ (z ) Ω, Assumption 3 holds for δ > 0 sufficiently small. Proof. Since z is the only solution of (1) in B δ1 (z ) Ω, we have s z = dist[s,z] for all s B δ (z ) Ω if δ (0, 1 2 δ 1]. Furthermore, (17) in Condition 2 and G(s) F(s) yield F(s) + G(s)(z s) κ 1 s z 2 = κ 1 dist[s,z] 2. Thus, Condition 3 is satisfied with κ := max{1,κ 1 } and with s := z for all s B δ (z ) Ω. By Proposition 3 (a), Assumption 3 holds for δ (0, 1 2 δ 1]. Proposition 5 Suppose that Condition 2 is satisfied and that the rank of all matrices in B F(z ) is equal to n. Then, with G(s) B F(s) for all s B δ (z ) Ω, Assumption 4 holds for δ > 0 sufficiently small. Proof. Let s (B δ (z ) Ω) \ Z and α [0,δ] be arbitrarily chosen. Take any w L (s,α). Then, the inequalities hold and, by the latter inequality, F(s) + G(s)(w s) α 2 and w s α (20) w B 2δ (z ) Ω (21) follows. The local Lipschitz continuity of F in Condition 2 implies that, for all x,y B 2δ (z ) Ω, F(x) F(y) L 0 x y, 15

with some L 0 > 0. In particular, F(w) = F(w) F(z ) L 0 w z (22) holds. Furthermore, choosing δ > 0 sufficiently small, the rank condition on B F(z ) and the upper semicontinuity of B F : R n R m yield that any V B F(s) has (full) rank n for all s B δ (z ). In addition, there exists c > 0 such that, for all s B δ (z ) Ω, V + c for all V B F(s), (23) where V + R n m denotes the pseudo-inverse of V. Note that rank(v ) = n m implies V + V = I R n n. Setting V := G(s), we therefore obtain w z = w s + s z = V + ((F(s) +V (w s)) F(s) V (z s)). Thus, by (23), (20), and Condition 2 it follows that w z V + ( F(s) +V (w s) + F(s) +V (z s) ) c(α 2 + κ 1 z s 2 ) c(α 2 + κ 1 ( w z 2 + 2 w z w s + w s 2 )) c(α 2 + κ 1 ( w z 2 + 2α w z + α 2 )) and w z (1 cκ 1 w z 2cκ 1 α) c(1 + κ 1 )α 2. (24) For δ > 0 sufficiently small, we have 1 cκ 1 w z 2cκ 1 α 1 2 due to (21) and α [0,δ] (as needed for Assumption 4). This, (22), and (24) yield F(w) L 0 w z 2cL 0 (1 + κ 1 )α 2. Hence, Assumption 4 holds with ˆα := 2cL 0 (1 + κ 1 ) and δ > 0 sufficiently small. The next corollary summarizes the assertions of the last two propositions. Corollary 2 Let Condition 2 be satisfied and suppose that the rank of all matrices in B F(z ) is equal to n. Then, with G(s) B F(s) for all s B δ (z ) Ω, Assumptions 3 and 4 are satisfied for δ > 0 sufficiently small. 3.2 Assumption 3 This subsection deals with continuous selections of functions. We will provide conditions under which Assumption 3 holds for continuous selections. The results of this subsection are used in Section 4 to prove convergence properties of Algorithm 1 for the solution of reformulated KKT systems. 16

Definition 1 Let F 1,...,F p : R n R m be continuous functions. A function F : R n R m is said to be a continuous selection of the functions F 1,...,F p on the set N R n if F is continuous on N and F(z) {F 1 (z),...,f p (z)} for all z N. We denote by A (z) := { i {1,..., p} F(z) = F i (z) } the set of indices of those selection functions that are active at z and by Z i := {z Ω F i (z) = 0} the solution set of the constrained equation F i (z) = 0, z Ω. The function F is termed piecewise affine (linear) if N = R n and the selection functions are all affine (linear). For a better understanding we give an example of a continuous selection. Example 1 Let H : R n R n be a continuous function. Then, the function F : R n R n with min{z 1,H 1 (z)} F(z) :=. min{z n,h n (z)} is a continuous selection of functions F 1,F 2,...,F 2n on R n. For example, if n = 2 we have the functions F i with i = 1,...,4 given by F 1 (z) := (z 1,z 2 ), F 2 (z) := (H 1 (z),z 2 ), F 3 (z) := (z 1,H 2 (z)), F 4 (z) := (H 1 (z),h 2 (z)). The next is the key result of this section, roughly speaking showing that, with respect to Assumption 3, a continuous selection inherits the behavior of its pieces. Theorem 2 Let F : R n R m be a continuous selection of the functions F 1,...,F p : R n R m on the set B δ (z ) Ω. Moreover, suppose that, for every i A (z ), a number Γ i 1 and a mapping G i : R n R m n exist such that γ i (s) Γ i holds for all s B δ (z ) Ω, where γ i (s) denotes the optimal value of the program min z,γ γ z Ω, F i (s) + G i (s)(z s) γ F i (s) 2, z s γ F i (s), γ 0. (25) Then, with G(s) {G i (s) i A (s)} for all s B δ (z ) Ω, Assumption 3 is satisfied. 17

Proof. Let us assume the contrary. Then, a sequence {s k } in B δ (z ) Ω exists such that {s k } converges to z and {γ(s k )} tends to infinity. The latter means that the sequence of optimal values of the programs min z,γ γ z Ω, F(s k ) + G(s k )(z s k ) γ F(s k ) 2, z s k γ F(s k ), γ 0 goes to infinity. Subsequencing if necessary, we can assume without loss of generality that there is an index j {1,..., p} with j A (s k ) for all k N such that for all k N. Therefore, we have F(s k ) = F j (s k ) and G(s k ) = G j (s k ) lim γ j(s k ) = (26) k for the optimal values γ j (s k ) of the programs (25) for i = j. By continuity of F j and F, we obviously get F j (z ) = lim F j (s k ) = lim F(s k ) = F(z ) k k so that j A (z ) follows. Therefore, by assumption, γ j (s k ) Γ j holds for all sufficiently large k, which contradicts (26). Hence, Assumption 3 is valid. In order to guarantee that the pieces of a continuous selection satisfy the assumptions of Theorem 2 we can use, in view of Proposition 2 (see below for the details), the following condition. Condition 4 The function F : R n R m is a continuous selection of the functions F 1,...,F p : R n R m on the set B δ (z ) Ω and, for every i A (z ), an l i > 0 exists such that holds for all s B δ (z ) Ω. dist[s,z i ] l i F i (s) Note that, for every i A (z ), Z i is nonempty since F i (z ) = F(z ) = 0. The next result uses Condition 4 to guarantee Assumption 3 is satisfied. Corollary 3 Suppose that Condition 4 is satisfied. Moreover, suppose that the functions F 1,...,F p : R n R m are differentiable on B δ (z ) Ω with locally Lipschitz continuous derivatives. Then, with G(s) { (F i ) (s) i A (s) } for all s B δ (z ) Ω, Assumption 3 is satisfied for δ > 0 sufficiently small. Proof. Let i A (z ) be arbitrary but fixed. Note, that Condition 4 implies Assumption 2 for the function F i. Since F i is differentiable and has a locally Lipschitz continuous derivative, Assumption 3 holds for F i (for δ > 0 sufficiently small) due to Corollary 1. So, there is a constant Γ i 1 such that the optimal values γ i (s) of the programs (25), with G i (s) := (F i ) (s), 18

are bounded above by Γ i for all s B δ (z ) Ω. Therefore, the hypotheses of Theorem 2 are satisfied and Assumption 3 holds. From the result just proved we can easily deduce that Assumption 3 is in particular satisfied if F is piecewise affine and Ω is a polyhedral set. Corollary 4 Let F : R n R m be piecewise affine and Ω R n a polyhedral set. Then, Assumption 2 is satisfied. Moreover, with G(s) {(F i ) (s) i A (s)} for all s B δ (z ) Ω, Assumption 3 is also valid for δ > 0 sufficiently small. Proof. The validity of Assumption 2 can be derived from Theorem 2.1 in [28]. Due to Hoffman s error bound [21], Condition 4 is satisfied and, by Corollary 3, Assumption 3 as well (for δ > 0 sufficiently small). The next two results, assuming that the function F has a rather special structure, are more technical; their importance will become clear when we deal with KKT systems in the next section. Let us consider the situation, where the mapping F and the vector z are split so that F(z) = (F a (z),f b (z)) R m a R m b, z = (x,y) R n x R n y, (27) where m a,m b,n x,n y N with m a +m b = m and n x +n y = n. Moreover, the matrix G(z) R m n (i.e., the Jacobian or a substitute) is also split accordingly, namely ( G(z) = (G x (z),g y G x (z)) = a (z) G y ) a(z) G x b (z) Gy b (z) with G x a(z) R m a n x, G y a(z) R m a n y, G x b (z) Rm b n x, and G y b (z) Rm b n y. Theorem 3 Let Assumption 2 be satisfied and let F and z be split according to (27). Suppose that z = (x,y) Z B δ (z ) implies x = x and that Ω has the form Ω = R n x Ω for some polyhedral set Ω R n y. Moreover, suppose that F is a continuous selection of functions F 1,...,F p on R n, that F a is differentiable with locally Lipschitz continuous derivative, that F a (x, ) is affine and that F b does not depend on the variable x and is piecewise affine. Then, with G(s) {(F i ) (s) i A (s)} for all s B δ (z ) Ω, Assumption 3 is satisfied for δ > 0 sufficiently small. Proof. By the assumptions made, the mapping F(x, ) : R n y R m is piecewise affine and thus, by Corollary 4, the optimal value γ(y) of the program min w,γ γ (x,w) Ω, F(x,y) + G y (x,y)(w y) γ F(x,y) 2, w y γ F(x,y), γ 0 is bounded by some Γ 1 for all (x,y) B δ (z ) Ω with δ > 0 sufficiently small. This means that, for any (x,y) B δ (z ) Ω, there is ŷ Ω such that F(x,y) + G y (x,y)(ŷ y) Γ F(x,y) 2 (28) 19

and Moreover, note that, for all (x,y) R n x R n y, ŷ y Γ F(x,y). (29) F b (x,y) = F b (x,y), (30) G x b (x,y) = 0, and Gy b (x,y) = Gy b (x,y) (31) hold since F b does not depend on x. Now let us choose any s := (x,y) B δ (z ) Ω. Setting ŝ := (x,ŷ) we obtain from (28) (31) that F(s) + G(s)(ŝ s) = F(s) + G x (s)(x x) + G y (s)(ŷ y) = F(x,y) + G y (x,y)(ŷ y) + F(s) F(x,y) + G x (s)(x x) + (G y (s) G y (x,y))(ŷ y) Γ F(x,y) 2 + F a (s) F a (x,y) + G x a(s)(x x) }{{} L 0 x x 2 + F b (s) F b (x },y) + G x b (s)(x x) {{} =0 + G y a(s) G y a(x,y) y ŷ + G y }{{} b (s) Gy b (x,y) y ŷ }{{} L 0 x x =0 Γ F(x,y) 2 ( + L 0 x x 2 + y ŷ x x ) Γ F(x,y) 2 + L 0 (dist[s,z] 2 + Γ F(x,y) dist[s,z]), (32) where L 0 > 0 exists due to the local Lipschitz continuity of (G x a,g y a) = F a. By the assumptions made, F is Lipschitz continuous on B δ (z ) Ω with some modulus L 1 > 0. This yields F(x,y) F(s) F(x,y) F(x,y) L 1 x x. Thus, with dist[s,z] l F(s) by Assumption 2, F(x,y) L 1 x x + F(s) L 1 dist[s,z] + F(s) (L 1 l + 1) F(s) follows. Therefore, using Assumption 2 again, we obtain from (32) that F(s) + G(s)(ŝ s) (Γ (L 1 l + 1) 2 + L 0 l 2 + L 0 Γ l(l 1 l + 1)) F(s) 2 (34) and, with (29) and (33), that Hence, setting s ŝ x x + y ŷ dist[s,z] + Γ (L 1 l + 1) F(s) (l + Γ (L 1 l + 1)) F(s). ˆΓ := max{γ (L 1 l + 1) 2 + L 0 l 2 + L 0 Γ l(l 1 l + 1), l + Γ (L 1 l + 1)} (34) and (35) show that the point (ŝ, ˆΓ) is feasible for problem (7) so that its optimal value is bounded by ˆΓ. Since s B δ (z ) Ω was chosen arbitrarily, Assumption 3 is satisfied for Γ := ˆΓ and δ > 0 sufficiently small. 20 (33) (35)

Proposition 6 Let Assumption 2 be satisfied and let F be split according to (27). Suppose that F is a continuous selection of functions F 1,...,F p : R n R m on R n, that F a is differentiable with locally Lipschitz continuous derivative, that F b is piecewise affine, and ˆδ > 0 exists such that A (z) = A (z ) holds for all z Z B ˆδ (z ). Then, with G(s) {(F i ) (s) i A (s)} for all s B δ (z ) Ω, Assumption 3 is satisfied for δ > 0 sufficiently small. Proof. By the continuity of the selection functions F 1,...,F p we can assume that δ > 0 is sufficiently small so that A (s) A (z ) for all s B δ (z ). (36) For the rest of the proof, let s B δ (z ) Ω be arbitrarily chosen and s Z satisfy dist[s,z] = s s. This implies s Z B 2δ (z ). Thus, with δ (0, 1 2 ˆδ], we obtain that A (s ) = A (z ) by assumption. Due to (36) this implies A (s) A (s ). Hence, there is an index i A (s), such that F(s) = F i (s), G(s) = (F i ) (s), and F(s ) = F i (s ) (37) hold. Since F i b is affine, we have Fi b (s) = A is + b i, with some A i R m b n and b i R m b, and (G x b (s),gy b (s)) = A i. This together with (37) yields F b (s) + (G x b,gy b )(s s) = A i s + b i + A i s A i s = A i s + b i = F b (s ) = 0. Thus, taking into account Assumption 2, we obtain F(s) + G(s)(s s) = F a (s) + (G x a,g y a)(s s) = F a (s) + F a(s)(s s) L 0 s s 2 L 0 l 2 F(s) 2, (38) where L 0 > 0 exists due to the local Lipschitz continuity of F a. Assumption 2 also ensures s s = dist[s,z] l F(s) for all s B δ (z ). (39) Since s B δ (z ) Ω was chosen arbitrarily, (38) and (39) show that Assumption 3 is satisfied for Γ := lmax{l 0 l, 1} and δ > 0 sufficiently small. 3.3 Assumption 4 In this subsection we show that Assumption 4 is satisfied by a particular, but practically important class of nonsmooth functions. The interesting thing to note is that the set Ω will play an important role in the analysis. It will turn out that even if Ω = R n in the statement of the problem it might be crucial to force Ω to be a smaller set. This will be done without eliminating any solution and will have the advantage of making Assumption 4 satisfied. In the Introduction we mentioned that the set Ω could have a not obvious technical use: the results in this subsection substantiate this statement. We already presented sufficient conditions for Assumption 4 in Propositions 2 and 5. In this subsection we investigate Assumption 4 for a class of continuous selections with a particular structure. To this end, let F be split again according to (27). Moreover, suppose that 21

F_a : R^n → R^{m_a} is differentiable with a locally Lipschitz continuous derivative and that F_b : R^n → R^{m_b} is given by

    F_b(z) := min_{1 ≤ j ≤ q} { B^j(z) },                                 (40)

where the minimum is taken componentwise, i.e., the t-th component of F_b(z) is min_{1 ≤ j ≤ q} B^j_t(z) for t = 1, ..., m_b, and where B^1, ..., B^q : R^n → R^{m_b} are given differentiable functions with locally Lipschitz continuous derivatives. Obviously, F is a continuous selection of functions F^1, ..., F^p : R^n → R^m on R^n with p = q^{m_b}, where F^i (for i = 1, ..., p) is differentiable and has a locally Lipschitz continuous derivative.

To motivate our approach we first consider a simple example whose purpose is twofold. On the one hand, it shows how the structure described above arises quite naturally when dealing with optimization and complementarity problems. On the other hand, it illustrates how we can, quite naturally, add some constraints to a system of equations without changing its solution set.

Example 2 The complementarity problem

    T(x) ≥ 0,  x ≥ 0,  xᵀ T(x) = 0

with a function T : R^n → R^n can be reformulated as the following system of equations:

    F(x, y) := ( T(x) − y, min{x, y} ) = 0.

This system fits the above setting for T sufficiently smooth; just set F_a(z) := T(x) − y, B^1(z) := x, and B^2(z) := y. It is clear that the solutions of the system

    F(x, y) = 0,  (x, y) ∈ R^n × R^n

are the solutions of the original complementarity system. We can also define an alternative, constrained system by setting

    Ω := {(x, y) : x ≥ 0, y ≥ 0}.

Then, the solutions of the constrained system

    F(x, y) = 0,  (x, y) ∈ Ω

are the same as those of the unconstrained system and of the original complementarity problem (if we disregard the y-component). We will show shortly that, although equivalent from the point of view of the solution sets, the constrained system is advantageous with respect to the unconstrained one in that Assumption 4 can be shown to hold for the constrained system.

The next theorem formalizes and extends what we illustrated in Example 2; first, a small computational sketch of the reformulation in Example 2 is given.
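As a small, purely illustrative sketch of Example 2: the code below builds F(x, y) = (T(x) − y, min{x, y}) for an assumed affine T, the polyhedral set Ω = {(x, y) : x ≥ 0, y ≥ 0} in the form A_Ω z ≤ b_Ω, and one matrix G(z) ∈ {(F^i)'(z) : i ∈ A(z)} obtained by differentiating, componentwise, a piece that attains the minimum. All concrete data here are assumptions of the sketch.

```python
import numpy as np

n = 2
T = lambda x: np.array([2*x[0] + x[1] - 1.0, x[0] + 2*x[1] - 1.0])   # assumed affine T : R^n -> R^n
T_prime = lambda x: np.array([[2.0, 1.0], [1.0, 2.0]])

def F(z):
    """F(x, y) = (T(x) - y, min{x, y}) as in Example 2."""
    x, y = z[:n], z[n:]
    return np.concatenate([T(x) - y, np.minimum(x, y)])

def G(z):
    """One element of {(F^i)'(z) : i in A(z)}: for every component of min{x, y},
    take the derivative of a piece active at z (x_t if x_t <= y_t, otherwise y_t)."""
    x, y = z[:n], z[n:]
    top = np.hstack([T_prime(x), -np.eye(n)])      # Jacobian of F_a(x, y) = T(x) - y
    bottom = np.zeros((n, 2 * n))
    for t in range(n):
        if x[t] <= y[t]:
            bottom[t, t] = 1.0                     # active piece B^1_t(z) = x_t
        else:
            bottom[t, n + t] = 1.0                 # active piece B^2_t(z) = y_t
    return np.vstack([top, bottom])

# Omega = {(x, y) : x >= 0, y >= 0} written as A_omega z <= b_omega.
A_omega, b_omega = -np.eye(2 * n), np.zeros(2 * n)
# One LP-Newton step from a point of Omega, using the subproblem sketch from Section 2:
# z_next, gamma = lp_newton_subproblem(F, G, np.array([0.5, 0.5, 0.2, 0.2]), A_omega, b_omega)
```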

Theorem 4 Let F be split according to (27) with F b : R n R m b given by (40) and consider the problem F(z) = 0, z Ω, (41) where Ω R n is a nonempty and closed set. Suppose that F is a continuous selection of functions F 1,...,F p on R n, that F a : R n R m a and B 1,...,B q : R n R m b are differentiable with locally Lipschitz continuous derivatives. For Ω := Ω {z R n B j (z) 0, j = 1,...,q}, the solution set of the problem F(z) = 0, z Ω (42) coincides with the solution set of (41). Furthermore, with G(s) {(F i ) (s) i A (s)} for all s B δ (z ) Ω, Assumption 4 is satisfied for problem (42). Proof. By (40), the equivalence of the solution sets of (41) and (42) is obvious and does not need a proof. To prove the assertion on Assumption 4 let s B δ (z ) Ω, α [0,δ], and w L (s, α) be arbitrarily chosen. Then, w s α and F(s) + G(s)(w s) = F i(s) + (F i(s) ) (s)(w s) α 2 (43) hold for any i(s) A (s). Since the selection functions F 1,...,F p are differentiable and have locally Lipschitz continuous Jacobians, a constant L > 0 exists such that Therefore, by (43), we obtain F i(s) (w) F i(s) (s) (F i(s) ) (s)(w s) L w s 2. F i(s) (w) L w s 2 + F i(s) (s) + (F i(s) ) (s)(w s) (L + 1)α 2. (44) For any i(w) A (w), this implies F i(w) t (w) = F i(s) t (w) (L + 1)α 2 for all t = 1,...,m a (45) since F a = F i a for all i = 1,..., p. Because of w L (s,α) Ω we obtain (taking into account the definition of Ω) 0 min {B t j (w)} Bt(w) i for all i = 1,...,q and all t = 1,...,m b. 1 j q Using this, (40), and (44), we get, for t = 1,...,m b, 0 min {B t j (w)} = F ma +t(w) = F i(w) m 1 j q a +t (w) Fi(s) m a +t (w) (L + 1)α2. (46) Now, the latter and (45) provide F(w) (L + 1)α 2. By the equivalence of norms in R m, this shows that Assumption 4 is satisfied. We already discussed the importance of the convexity of the set Ω from a computational point of view. Assuming that Ω in the previous theorem is convex, then the same holds for Ω if all B j are concave. In particular, this condition will be satisfied if the B j are affine; this latter case is 23

the most common one and will be illustrated in the next section. Another observation that may be of interest from the computational point of view is that the set Ω̃, as defined in Theorem 4, requires the use of q additional constraints. In some cases it is possible to reduce this number and define a different Ω̃. Corollary 5 below deals exactly with this case and covers a situation of interest; in particular, it can be applied to Example 2 and to the reformulation of the KKT systems we discuss in the next section. We need a simple preliminary result.

Lemma 3 Suppose that a, b ∈ R are given so that a + b ≥ 0. Then,

    |min{a, b}| ≤ max{|a|, |b|}

holds.

Corollary 5 Suppose that the assumptions of Theorem 4 are satisfied, except that now q := 2 and that

    Ω̃ := Ω ∩ {z ∈ R^n : B¹(z) + B²(z) ≥ 0}.

Then, the assertion of Theorem 4 still holds.

Proof. Just repeat the proof of Theorem 4 for q = 2 and note that, instead of (46), we get

    |F^{i(w)}_{m_a+t}(w)| ≤ |F^{i(s)}_{m_a+t}(w)| ≤ (L + 1)α²   for all t = 1, ..., m_b.

To verify this, note first that the right inequality is implied by (44) as in the proof of Theorem 4. The left inequality follows from the application of Lemma 3, with a := B¹_t(w) and b := B²_t(w), by taking into account that a + b ≥ 0 due to the definition of Ω̃ and that

    F^{i(w)}_{m_a+t}(w) = min{B¹_t(w), B²_t(w)},   F^{i(s)}_{m_a+t}(w) ∈ {B¹_t(w), B²_t(w)}

holds by (40).

One might think that our analysis is simply lacking and that, in the setting of Theorem 4, Assumption 4 can also be shown to hold for the problem F(z) = 0, z ∈ Ω. The following example shows that this is not the case and that the redefinition of Ω is indeed necessary.

Example 3 Let F : R² → R be given by F(x, y) := min{x, y} and take Ω = R², i.e., we want to solve the problem

    min{x, y} = 0,  (x, y) ∈ R².

Theorem 4 gives us the alternative formulation

    min{x, y} = 0,  x ≥ 0,  y ≥ 0,

while Corollary 5 gives rise to

    min{x, y} = 0,  x + y ≥ 0.

Theorem 4 and Corollary 5 ensure that the second and third constrained systems do satisfy Assumption 4. We now show that the first, unconstrained reformulation does not. To this end, take z* := (0, 0), s := (β, 0), and w := (−β, 0) (see the definition of Assumption 4), where β is a positive number. Then F(s) = 0, F(w) = −β, and G(s) = (0, 1). Whatever α > 0, if we take β := α/2 we have w ∈ L(s, α). But |F(w)| = α/2, and therefore it is not possible to find a positive α̂ for which α/2 = |F(w)| ≤ α̂ α² for all α small enough, as required by Assumption 4.
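A tiny numerical check of Example 3 (purely illustrative, with assumed test values): for the unconstrained formulation, the point w = (−β, 0) with β = α/2 belongs to L(s, α) while ‖F(w)‖ = α/2 is not of order α², so no α̂ can work; under the additional constraint x + y ≥ 0 from Corollary 5 the same point w is simply infeasible.

```python
import numpy as np

F = lambda z: min(z[0], z[1])                                             # F(x, y) = min{x, y}
G = lambda z: np.array([0.0, 1.0]) if z[1] <= z[0] else np.array([1.0, 0.0])  # gradient of an active piece

for alpha in [1e-1, 1e-2, 1e-3]:
    beta = alpha / 2.0
    s, w = np.array([beta, 0.0]), np.array([-beta, 0.0])
    in_L = (np.linalg.norm(w - s, np.inf) <= alpha
            and abs(F(s) + G(s) @ (w - s)) <= alpha**2)                   # w in L(s, alpha), Omega = R^2
    ratio = abs(F(w)) / alpha**2                                          # should stay bounded under Assumption 4
    feasible_cor5 = (w[0] + w[1] >= 0)                                    # constraint x + y >= 0 from Corollary 5
    print(alpha, in_L, ratio, feasible_cor5)                              # True, ratio grows like 1/(2*alpha), False
```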