A nonmonotone semismooth inexact Newton method


SILVIA BONETTINI, FEDERICA TINTI
Dipartimento di Matematica, Università di Modena e Reggio Emilia, Italy
(Corresponding author. Email: bonettini.silvia@unimo.it. This research was supported by the Italian Ministry for Education, University and Research (MIUR), FIRB Project RBAU01JYPN.)

Abstract. In this work we propose a variant of the inexact Newton method for the solution of semismooth nonlinear systems of equations. We introduce a nonmonotone scheme which couples the inexact features with nonmonotone strategies. For the nonmonotone scheme we present convergence theorems. Finally, we show how these strategies can be applied in the variational inequality context and we present some numerical examples.

Keywords: Semismooth Systems, Inexact Newton Methods, Nonmonotone Convergence, Variational Inequality Problems, Nonlinear Programming Problems.

1 Introduction

The inexact Newton method is one of the methods for the solution of the nonlinear system of equations

    F(x) = 0,    (1)

where F : R^n → R^n is a continuously differentiable function. The idea of this method was first presented in [4], with local convergence properties; a global version of the method was then proposed in [7]. Furthermore, the inexact Newton method has also been proposed for the solution of nonsmooth equations (see for example [12], [8]).

An inexact Newton method is any method which generates a sequence {x_k} satisfying the following properties:

    ‖F(x_k) + G(x_k, s_k)‖ ≤ η_k ‖F(x_k)‖    (2)

and

    ‖F(x_k + s_k)‖ ≤ (1 - β(1 - η_k)) ‖F(x_k)‖,    (3)

where G : R^n × R^n → R^n is a given iteration function, x_{k+1} = x_k + s_k, η_k is the forcing term, i.e. a scalar parameter chosen in the interval [0, 1), and β is a positive number fixed in (0, 1). In the smooth case we can choose G(x, s) = ∇F(x)^T s, where ∇F(x)^T is the Jacobian matrix of F, while if F is just semismooth we can take G(x, s) = Hs, where H is a matrix of the B-subdifferential of F at x (for the definition of semismooth function and B-subdifferential see for example [17] and [18]).

The quantity ‖F(x_k)‖ on the right hand side of condition (2) represents the residual of the nonlinear system (1) at the iterate k. Furthermore, condition (2) states that the vector s_k is an approximate solution of the equation

    F(x_k) + G(x_k, s) = 0,    (4)

since the left hand side of (2) is the norm of the residual of (4), and the tolerance of this approximation is given by the term η_k ‖F(x_k)‖. Condition (3) requires that the ratio of the norms of F at two successive iterates be at most (1 - β(1 - η_k)), a quantity less than one. It is worth stressing that both conditions depend on the forcing term η_k.

Algorithms with inexact features offer many advantages, from both the theoretical and the practical point of view. Indeed, global convergence theorems can be proved under some standard assumptions. Furthermore, condition (2) tells us that an adaptive tolerance can be introduced in the solution of the iteration function equation (4), saving unnecessary computations when we are far from the solution.
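As a concrete illustration (not part of the original paper), conditions (2)-(3) can be checked for a candidate step as in the following minimal Python sketch; the step s and the vector G(x, s) are assumed to be supplied by the caller, and the function name is only illustrative.

```python
import numpy as np

def satisfies_inexact_conditions(F, x, s, Gxs, eta, beta):
    """Check conditions (2)-(3) for a candidate step s at the point x.

    F    : callable returning the residual vector F(x)
    Gxs  : the vector G(x, s), e.g. a Jacobian-vector product in the smooth case
    eta  : forcing term in [0, 1)
    beta : fixed parameter in (0, 1)
    """
    res_norm = np.linalg.norm(F(x))
    # condition (2): s approximately solves the linearized equation (4)
    linear_ok = np.linalg.norm(F(x) + Gxs) <= eta * res_norm
    # condition (3): sufficient decrease of the nonlinear residual
    decrease_ok = np.linalg.norm(F(x + s)) <= (1.0 - beta * (1.0 - eta)) * res_norm
    return linear_ok and decrease_ok
```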

A further relaxation of these requirements can be obtained by allowing nonmonotone choices. Nonmonotone strategies (see for example [10]) are well known in the literature for their effectiveness in the choice of the steplength in many line search algorithms. In [1], nonmonotone convergence has been proved in the smooth case for an inexact Newton line search algorithm, and the numerical experience shows that nonmonotone strategies can be useful in this kind of algorithm to avoid stagnation of the iterates in a neighborhood of some critical points.

In this paper we propose to modify the general inexact Newton scheme (2)-(3) in a nonmonotone way, by substituting (2) and (3) with the following conditions:

    ‖F(x_k) + G(x_k, s_k)‖ ≤ η_k ‖F(x_{l(k)})‖    (5)

and

    ‖F(x_k + s_k)‖ ≤ (1 - β(1 - η_k)) ‖F(x_{l(k)})‖,    (6)

where F is a semismooth function and, given N ∈ N, x_{l(k)} is the element with the following property:

    ‖F(x_{l(k)})‖ = max_{0 ≤ j ≤ min(N,k)} ‖F(x_{k-j})‖.    (7)

The nonmonotone conditions (5)-(6) can be considered as a generalization of the global method for smooth equations presented in [1], and in this work we provide convergence theorems under analogous assumptions.

In the following section we recall some basic definitions and results on semismooth functions; in Section 3 we describe the general scheme of our nonmonotone semismooth inexact method and prove the convergence theorems; in Section 4 we apply the method to a particular semismooth system arising from variational inequalities and nonlinear programming problems and, in Section 5, we report the numerical results.

2 The semismooth case

We now consider the nonlinear system of equations (1) with a nonsmooth operator F; in particular, we focus on the case in which the system (1) is semismooth. In order to introduce the notion of semismoothness, we also recall the definitions of the B-subdifferential and of the generalized Jacobian. We consider a vector-valued function F : R^n → R^n, with F(x) = [f_1(x), ..., f_n(x)]^T, and we assume that each component f_i satisfies a Lipschitz condition near a given point x. This means that a function F : R^n → R^n is said to be locally Lipschitz near a given point x ∈ R^n if there exists a positive number δ such that each f_i satisfies

    |f_i(x_1) - f_i(x_2)| ≤ l_i ‖x_1 - x_2‖    for all x_1, x_2 ∈ N_δ(x),  l_i ∈ R,

where N_δ(x) is the set {y ∈ R^n : ‖y - x‖ < δ} and L = (l_1, ..., l_n) is called the rank of F. The function F : R^n → R^n is locally Lipschitz if, for any x ∈ R^n, F is locally Lipschitz near x. Rademacher's Theorem asserts that F is differentiable almost everywhere (i.e. each f_i is differentiable almost everywhere) on any neighborhood of x in which F is locally Lipschitz. We denote by Ω_F the set of points at which F fails to be differentiable.

Definition 2.1 (B-subdifferential [17]) Let F : R^n → R^n be a locally Lipschitz function near a given point x (possibly with x ∈ Ω_F). The B-subdifferential of F at x is

    ∂_B F(x) = { Z ∈ R^{n×n} : there exists {x_k} ⊂ R^n \ Ω_F with x_k → x and lim_{k→∞} ∇F(x_k)^T = Z }.

Definition 2.2 (Clarke's generalized Jacobian [3]) Let F : R^n → R^n be a locally Lipschitz function near a given point x. Clarke's generalized Jacobian of F at x is

    ∂F(x) = co ∂_B F(x),

where co denotes the convex hull in the space R^{n×n}.

Remark: Clarke's generalized Jacobian is the convex hull of all matrices Z obtained as the limit of sequences of the form ∇F(x_i)^T, where x_i → x and x_i ∉ Ω_F.

Now we can finally define a semismooth function, as follows.

Definition 2.3 ([18]) Let F : R^n → R^n be locally Lipschitz near a given point x ∈ R^n. We say that F is semismooth at x if the limit

    lim_{Z ∈ ∂F(x + t v'), v' → v, t ↓ 0} Z v'

exists for all v ∈ R^n.

The following definition of a BD-regular vector plays a crucial role in establishing global convergence results for several iterative methods.

Definition 2.4 ([16]) Let F : R^n → R^n. We say that a point x ∈ R^n is BD-regular for F (or that F is BD-regular at x) if F is locally Lipschitz near x and all the elements of the B-subdifferential ∂_B F(x) are nonsingular.

The next results play an important role in establishing the global convergence of semismooth Newton methods.

Proposition 2.1 ([17]) If F : R^n → R^n is BD-regular at x*, then there exist a positive number δ and a constant K > 0 such that, for all x ∈ N_δ(x*) and all H ∈ ∂_B F(x), H is nonsingular and ‖H^{-1}‖ ≤ K.

Proposition 2.2 ([16]) If F : R^n → R^n is semismooth at a point x ∈ R^n, then for any ε > 0 there exists δ > 0 such that

    ‖F(y) - F(x) - H(y - x)‖ ≤ ε ‖y - x‖

for all H ∈ ∂_B F(y) and all y ∈ N_δ(x).

3 Nonmonotone semismooth inexact Newton methods

We call a nonmonotone semismooth inexact Newton method any method which generates a sequence {x_k} such that

    ‖F(x_k) + H_k s_k‖ ≤ η_k ‖F(x_{l(k)})‖    (8)

and

    ‖F(x_k + s_k)‖ ≤ (1 - β(1 - η_k)) ‖F(x_{l(k)})‖,    (9)

where x_{l(k)} is defined in (7), x_{k+1} = x_k + s_k, H_k ∈ ∂_B F(x_k), η_k ∈ [0, 1) and β is a positive parameter fixed in (0, 1). We call a vector s_k satisfying (8) a nonmonotone semismooth inexact Newton step at the level η_k.
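The reference value ‖F(x_{l(k)})‖ in (8)-(9) is simply the largest residual norm over the last min(N, k) + 1 iterates. The following Python sketch (illustrative, not from the paper; the class name and the window convention are assumptions matching (7)) keeps track of it with a sliding window.

```python
from collections import deque

class NonmonotoneReference:
    """Track ||F(x_{l(k)})|| = max of the last min(N, k) + 1 residual norms, as in (7)."""

    def __init__(self, N):
        self.window = deque(maxlen=N + 1)   # stores ||F(x_{k-j})|| for j = 0, ..., min(N, k)

    def push(self, residual_norm):
        self.window.append(residual_norm)   # call once per iterate, with the new ||F(x_k)||

    def value(self):
        return max(self.window)             # the nonmonotone reference value ||F(x_{l(k)})||
```

With a window of length one the reference value is just the current residual norm, and (8)-(9) reduce to the monotone conditions (2)-(3) with G(x, s) = Hs.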

For a sequence satisfying (8) and (9) it is possible to prove the following convergence result, which is fundamental for the convergence proofs of the algorithms presented below. An analogous result for the smooth case can be found in [7, Theorem 3.3] and [19].

Theorem 3.1 Let F : R^n → R^n be a locally Lipschitz function. Let {x_k} be a sequence such that lim_{k→∞} ‖F(x_k)‖ = 0 and, for each k, the following conditions hold:

    ‖F(x_k) + H_k s_k‖ ≤ η ‖F(x_{l(k)})‖,    (10)
    ‖F(x_{k+1})‖ ≤ ‖F(x_{l(k)})‖,    (11)

where H_k ∈ ∂_B F(x_k), s_k = x_{k+1} - x_k and η < 1. If x* is an accumulation point¹ of {x_k}, then F(x*) = 0. Furthermore, if F is semismooth and BD-regular at x*, then the sequence {x_k} converges to x*.

Proof. If x* is an accumulation point of the sequence {x_k}, there exists a subsequence {x_{k_j}} of {x_k} convergent to x*. By the continuity of F, we obtain

    F(x*) = F(lim_{j→∞} x_{k_j}) = lim_{j→∞} F(x_{k_j}) = 0.

Furthermore, since {x_{l(k)}} is a subsequence of {x_k}, the sequence {‖F(x_{l(k)})‖} also converges to zero as k diverges. From Proposition 2.1, there exist δ > 0 and a constant K such that each H ∈ ∂_B F(x) is nonsingular and ‖H^{-1}‖ ≤ K for any x ∈ N_δ(x*); we can suppose that δ is small enough that Proposition 2.2 yields

    ‖F(y) - F(x*) - H_y (y - x*)‖ ≤ (1/(2K)) ‖y - x*‖

for any y ∈ N_δ(x*) and any H_y ∈ ∂_B F(y). Then, for any y ∈ N_δ(x*), we have

    ‖F(y)‖ = ‖H_y (y - x*) + F(y) - F(x*) - H_y (y - x*)‖
           ≥ ‖H_y (y - x*)‖ - ‖F(y) - F(x*) - H_y (y - x*)‖
           ≥ (1/K) ‖y - x*‖ - (1/(2K)) ‖y - x*‖ = (1/(2K)) ‖y - x*‖.

Then

    ‖y - x*‖ ≤ 2K ‖F(y)‖    (12)

holds for any y ∈ N_δ(x*). Now let ε ∈ (0, δ/4); since x* is an accumulation point of {x_k}, there exists a k sufficiently large that x_k ∈ N_{δ/2}(x*) and x_{l(k)} ∈ S_ε := { y : ‖F(y)‖ < ε/(K(1 + η)) }.

¹ We say that x* is an accumulation point of the sequence {x_k} if for any positive number δ and any index n there exists an index j ≥ n such that ‖x_j - x*‖ < δ. We say that the sequence {x_k} converges to a point x* (or, equivalently, that x* is the limit point of the sequence {x_k}) if for any positive number δ there exists an integer n such that ‖x_j - x*‖ < δ for any j ≥ n.

Note that, since x_{l(k)} ∈ S_ε, we also have x_{k+1} ∈ S_ε, because ‖F(x_{k+1})‖ ≤ ‖F(x_{l(k)})‖. For the direction s_k, by (10), (11) and since ‖H_k^{-1}‖ ≤ K, the following inequality holds:

    ‖s_k‖ ≤ ‖H_k^{-1}‖ (‖F(x_k)‖ + ‖F(x_k) + H_k s_k‖)
          ≤ K (‖F(x_{l(k)})‖ + η ‖F(x_{l(k)})‖) = K(1 + η) ‖F(x_{l(k)})‖ < ε < δ/2.

Since s_k = x_{k+1} - x_k, the previous inequality implies ‖x_{k+1} - x*‖ < δ, and from (12) we obtain

    ‖x_{k+1} - x*‖ ≤ 2K ‖F(x_{k+1})‖ < 2K ε/(K(1 + η)) < δ/2,

which implies x_{k+1} ∈ N_{δ/2}(x*). Therefore x_{l(k+1)} ∈ S_ε, since ‖F(x_{l(k+1)})‖ ≤ ‖F(x_{l(k)})‖. It follows that, for any j sufficiently large, x_j ∈ N_δ(x*), and from (12)

    ‖x_j - x*‖ ≤ 2K ‖F(x_j)‖.

Since ‖F(x_j)‖ converges to 0, we can conclude that x_j converges to x*. □

3.1 A line search semismooth inexact Newton algorithm

In this section we describe a line search algorithm: once a semismooth inexact Newton step has been computed, the steplength is reduced by a backtracking procedure until an acceptance rule is satisfied. In the remainder of the section we prove that the proposed algorithm is well defined.

Algorithm 3.1

Step 1 Choose x_0 ∈ R^n, β ∈ (0, 1), 0 < θ_min < θ_max < 1, η_max ∈ (0, 1). Set k = 0.

Step 2 (Search direction) Select an element H_k ∈ ∂_B F(x_k). Determine η_k ∈ [0, η_max] and s_k that satisfy

    ‖H_k s_k + F(x_k)‖ ≤ η_k ‖F(x_{l(k)})‖;

Step 3 (Line search) While ‖F(x_k + α_k s_k)‖ > (1 - α_k β(1 - η_k)) ‖F(x_{l(k)})‖

    Step 3.a Choose θ ∈ [θ_min, θ_max];
    Step 3.b Set α_k = θ α_k;

End

Step 4 Set x_{k+1} = x_k + α_k s_k;

Step 5 Set k = k + 1 and go to Step 2.

The steplength is represented by the damping parameter α_k, which is reduced until the backtracking condition

    ‖F(x_k + α_k s_k)‖ ≤ (1 - α_k β(1 - η_k)) ‖F(x_{l(k)})‖    (13)

is satisfied. Condition (13) is more general than the Armijo condition employed, for example, in [8], since it does not require the differentiability of the merit function Ψ(x) = (1/2)‖F(x)‖². The final inexact Newton step is given by α_k s_k, and it satisfies conditions (8) and (9) with forcing term η̃_k = 1 - α_k(1 - η_k). We will simply assume that at each iterate k it is possible to compute a vector s_k which is an inexact Newton step at the level η_k (see for example assumption A1 in [12] for a sufficient condition); a sketch of the resulting procedure is reported below.
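The following Python sketch (illustrative, not part of the paper) summarizes Algorithm 3.1. It assumes a user-supplied routine inexact_step(x, eta_max, ref) returning a direction s and a forcing term eta ≤ eta_max satisfying the residual condition of Step 2, and it reuses the NonmonotoneReference helper sketched above; the reduction factor θ is kept fixed and a simple cap on the number of reductions is added as a safeguard.

```python
import numpy as np

def line_search_inexact_newton(F, inexact_step, x0, N=5, beta=1e-4,
                               theta=0.5, eta_max=0.9, tol=1e-8, max_iter=500):
    """Sketch of Algorithm 3.1 with the nonmonotone backtracking rule (13)."""
    x = x0
    ref = NonmonotoneReference(N)            # last residual norms, see (7)
    ref.push(np.linalg.norm(F(x)))
    for _ in range(max_iter):
        if np.linalg.norm(F(x)) <= tol:
            return x
        # Step 2: direction s and forcing term eta with ||H s + F(x)|| <= eta * ref.value()
        s, eta = inexact_step(x, eta_max, ref.value())
        # Step 3: backtracking until condition (13) holds
        alpha, reductions = 1.0, 0
        while np.linalg.norm(F(x + alpha * s)) > (1.0 - alpha * beta * (1.0 - eta)) * ref.value():
            alpha *= theta
            reductions += 1
            if reductions > 30:              # simple safeguard against breakdown
                raise RuntimeError("backtracking failed")
        # Steps 4-5: update the iterate and the nonmonotone reference value
        x = x + alpha * s
        ref.push(np.linalg.norm(F(x)))
    return x
```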

The next lemma shows that, under the previous assumption, the sequence generated by Algorithm 3.1 satisfies conditions (8) and (9).

Lemma 3.1 Let β ∈ (0, 1) and suppose that there exist η̄ ∈ [0, 1) and a vector s̄ satisfying

    ‖F(x_k) + H_k s̄‖ ≤ η̄ ‖F(x_{l(k)})‖.

Then there exists α_max ∈ (0, 1] such that, for any α ∈ (0, α_max], the vector s = α s̄ satisfies

    ‖F(x_k) + H_k s‖ ≤ η ‖F(x_{l(k)})‖,    (14)
    ‖F(x_k + s)‖ ≤ (1 - βα(1 - η̄)) ‖F(x_{l(k)})‖,    (15)

where η = 1 - α(1 - η̄) ∈ [η̄, 1).

Proof. Let s = α s̄. Then we have

    ‖F(x_k) + H_k s‖ = ‖F(x_k) - αF(x_k) + αF(x_k) + αH_k s̄‖
                     ≤ (1 - α) ‖F(x_k)‖ + α ‖F(x_k) + H_k s̄‖
                     ≤ (1 - α) ‖F(x_{l(k)})‖ + α η̄ ‖F(x_{l(k)})‖ = η ‖F(x_{l(k)})‖,

so (14) is proved. Now let

    ε = (1 - β)(1 - η̄) ‖F(x_{l(k)})‖ / ‖s̄‖,    (16)

and let δ > 0 be sufficiently small (see Proposition 2.2) that

    ‖F(x_k + s) - F(x_k) - H_k s‖ ≤ ε ‖s‖    (17)

whenever ‖s‖ ≤ δ. Choosing α_max = min(1, δ/‖s̄‖), for any α ∈ (0, α_max] we have ‖s‖ ≤ δ, and then, using (16) and (17), we obtain

    ‖F(x_k + s)‖ ≤ ‖F(x_k + s) - F(x_k) - H_k s‖ + ‖F(x_k) + H_k s‖
                 ≤ εα ‖s̄‖ + η ‖F(x_{l(k)})‖
                 = ((1 - β)(1 - η̄)α + 1 - α(1 - η̄)) ‖F(x_{l(k)})‖
                 = (1 - βα(1 - η̄)) ‖F(x_{l(k)})‖,

which completes the proof. □

A consequence of the previous lemma is that the backtracking loop at Step 3 of Algorithm 3.1 terminates, at each iterate k, in a finite number of steps. Indeed, at each iterate k the backtracking condition (13) is satisfied for α < α_max, where α_max depends on k. Since the value of α_k is reduced by a factor θ ≤ θ_max < 1, there exists a positive integer p such that (θ_max)^p < α_max, and so the while loop terminates after at most p steps. When, at some iterate k, it is impossible to determine the next point x_{k+1} satisfying (8) and (9), we say that the algorithm breaks down. Then, Lemma 3.1 shows that, assuming it is possible to compute a semismooth inexact Newton step s_k satisfying (8), Algorithm 3.1 does not break down and is well defined.

3.2 Convergence Analysis

The next theorem proves, under appropriate assumptions, that the sequence {x_k} generated by Algorithm 3.1 converges to a solution of the system (1). The proof is carried out by showing that lim_{k→∞} ‖F(x_k)‖ = 0, so that the convergence of the sequence is ensured by Theorem 3.1.

Theorem 3.2 Suppose that {x_k} is the sequence generated by Algorithm 3.1, with 2β < 1 - η_max. Assume that the following conditions hold:

A1 There exists an accumulation point x* of the sequence {x_k} such that F is semismooth and BD-regular at x*;

A2 At each iterate k it is possible to find a forcing term η_k and a vector s_k such that the inexact residual condition (8) is satisfied;

A3 For every sequence {x_k} converging to x*, every convergent sequence {s_k} and every sequence {λ_k} of positive scalars converging to zero,

    limsup_{k→∞} (Ψ(x_k + λ_k s_k) - Ψ(x_{l(k)})) / λ_k ≤ lim_{k→∞} F(x_k)^T H_k s_k,

where Ψ(x) = (1/2)‖F(x)‖², whenever the limit on the right-hand side exists;

A4 For every sequence {x_{k_j}} such that α_{k_j} converges to zero, the sequence {s_{k_j}} is bounded.

Then F(x*) = 0 and the sequence {x_k} converges to x*.

Proof. Assumption A1 implies that the norm of the vector s_k is bounded in a neighborhood of the point x*. Indeed, from Proposition 2.1, there exist a positive number δ and a constant K such that ‖H_k^{-1}‖ ≤ K for any H_k ∈ ∂_B F(x_k) and any x_k ∈ N_δ(x*). Thus the following inequalities hold:

    ‖s_k‖ ≤ ‖H_k^{-1}‖ (‖F(x_k)‖ + ‖F(x_k) + H_k s_k‖)
          ≤ K (‖F(x_{l(k)})‖ + η_max ‖F(x_{l(k)})‖)
          = K(1 + η_max) ‖F(x_{l(k)})‖ ≤ K(1 + η_max) ‖F(x_0)‖.

Furthermore, condition A2 ensures that Algorithm 3.1 does not break down, so it generates an infinite sequence. We now consider separately the two following cases:

a) there exists a set of indices K such that {x_k}_{k∈K} converges to x* and liminf_{k→∞, k∈K} α_k = 0;

b) for any subsequence {x_k}_{k∈K} converging to x* we have liminf_{k→∞, k∈K} α_k = τ > 0.

a) Since {‖F(x_{l(k)})‖} is a monotone nonincreasing, bounded sequence, there exists L ≥ 0 such that

    L = lim_{k→∞} ‖F(x_{l(k)})‖.    (18)

From the definition (7) it follows that ‖F(x_{l(k)})‖ ≥ ‖F(x_k)‖, thus

    L ≥ lim_{k→∞, k∈I} ‖F(x_k)‖,    (19)

where {x_k}_{k∈I} is any subsequence of {x_k} such that the limit of {‖F(x_k)‖}_{k∈I} exists. Since α_k is the final value after the backtracking loop, we must have

    ‖F(x_k + (α_k/θ) s_k)‖ > (1 - (α_k/θ) β(1 - η_k)) ‖F(x_{l(k)})‖,    (20)

which yields

    lim_{k→∞, k∈K} ‖F(x_k + (α_k/θ) s_k)‖ ≥ lim_{k→∞, k∈K} (1 - (α_k/θ) β(1 - η_k)) ‖F(x_{l(k)})‖.    (21)

If we choose K as the set of indices with property a), exploiting the continuity of F, recalling that η_k is bounded, that ‖s_k‖ is bounded and passing to a subsequence to ensure the existence of the limit of α_k, we obtain ‖F(x*)‖ ≥ L. On the other hand, from (19) we also have L ≥ ‖F(x*)‖, thus it follows that

    L = ‖F(x*)‖.    (22)

Furthermore, by squaring both sides of (20), we obtain the following inequalities:

    ‖F(x_k + (α_k/θ) s_k)‖² > (1 - (α_k/θ) β(1 - η_k))² ‖F(x_{l(k)})‖²
                             ≥ (1 - 2(α_k/θ) β(1 - η_k)) ‖F(x_{l(k)})‖².

This yields

    ‖F(x_k + (α_k/θ) s_k)‖² - ‖F(x_{l(k)})‖² > -2(α_k/θ) β(1 - η_k) ‖F(x_{l(k)})‖².    (23)

Dividing both sides by α_k/θ, passing to the limit and exploiting assumption A3, we obtain

    lim_{k→∞, k∈K} F(x_k)^T H_k s_k ≥ lim_{k→∞, k∈K} (‖F(x_k + (α_k/θ) s_k)‖² - ‖F(x_{l(k)})‖²) / (α_k/θ)
                                    ≥ lim_{k→∞, k∈K} -2β(1 - η_k) ‖F(x_{l(k)})‖².    (24)

Since (22) holds and taking into account that η_k ≥ 0, we have

    lim_{k→∞, k∈K} F(x_k)^T H_k s_k ≥ -2β ‖F(x*)‖².    (25)

On the other hand, we have

    F(x_k)^T H_k s_k = F(x_k)^T [-F(x_k) + F(x_k) + H_k s_k]
                     = -‖F(x_k)‖² + F(x_k)^T [F(x_k) + H_k s_k]
                     ≤ -‖F(x_k)‖² + ‖F(x_k)‖ ‖F(x_k) + H_k s_k‖
                     ≤ -‖F(x_k)‖² + η_max ‖F(x_{l(k)})‖²,    (26)

thus we can write

    lim_{k→∞} F(x_k)^T H_k s_k ≤ lim_{k→∞} (-‖F(x_k)‖² + η_max ‖F(x_{l(k)})‖²).

Furthermore, considering the subsequence {x_k}_{k∈K}, it follows that

    lim_{k→∞, k∈K} F(x_k)^T H_k s_k ≤ -(1 - η_max) ‖F(x*)‖².    (27)

From (25) and (27) we deduce

    -2β ‖F(x*)‖² ≤ -(1 - η_max) ‖F(x*)‖².

Since we set (1 - η_max) > 2β, we must have ‖F(x*)‖ = 0. This implies that lim_{k→∞} ‖F(x_{l(k)})‖ = 0 and, consequently, from (7), lim_{k→∞} ‖F(x_k)‖ = 0. Thus the convergence of the sequence is ensured by Theorem 3.1.

b) Writing the backtracking condition for the iterate l(k), we obtain

    ‖F(x_{l(k)})‖ ≤ (1 - α_{l(k)-1} β(1 - η_{l(k)-1})) ‖F(x_{l(l(k)-1)})‖.    (28)

When k diverges, we can write

    L ≤ L - L lim_{k→∞} α_{l(k)-1} β(1 - η_{l(k)-1}),    (29)

where L is defined as in (18). Since β is a positive constant and 1 - η_j ≥ 1 - η_max > 0 for any j, the inequality (29) yields

    L lim_{k→∞} α_{l(k)-1} ≤ 0,

which implies that either L = 0 or

    lim_{k→∞} α_{l(k)-1} = 0.    (30)

Suppose that L ≠ 0, so that (30) holds. Defining ˆl(k) = l(k + N + 1), so that ˆl(k) > k, we show by induction that for any j ≥ 1 we have

    lim_{k→∞} α_{ˆl(k)-j} = 0    (31)

and

    lim_{k→∞} ‖F(x_{ˆl(k)-j})‖ = L.    (32)

For j = 1, since {α_{ˆl(k)-1}} is a subsequence of {α_{l(k)-1}}, (30) implies (31). Thanks to assumption A4, we also obtain

    lim_{k→∞} ‖x_{ˆl(k)} - x_{ˆl(k)-1}‖ = 0.    (33)

By exploiting the Lipschitz property of F and the inequality |‖F(x)‖ - ‖F(y)‖| ≤ ‖F(x) - F(y)‖, from (33) we obtain

    lim_{k→∞} ‖F(x_{ˆl(k)-1})‖ = L.    (34)

Assume now that (31) and (32) hold for a given j. We have

    ‖F(x_{ˆl(k)-j})‖ ≤ (1 - α_{ˆl(k)-(j+1)} β(1 - η_{ˆl(k)-(j+1)})) ‖F(x_{l(ˆl(k)-(j+1))})‖.

Using the same arguments employed above, since L > 0, we obtain

    lim_{k→∞} α_{ˆl(k)-(j+1)} = 0,

and so

    lim_{k→∞} ‖x_{ˆl(k)-j} - x_{ˆl(k)-(j+1)}‖ = 0,    lim_{k→∞} ‖F(x_{ˆl(k)-(j+1)})‖ = L.

Thus we conclude that (31) and (32) hold for any j ≥ 1. Now, for any k, we can write

    ‖x_{k+1} - x_{ˆl(k)}‖ ≤ Σ_{j=1}^{ˆl(k)-k-1} α_{ˆl(k)-j} ‖s_{ˆl(k)-j}‖,

so that, since ˆl(k) - k - 1 ≤ N, we have

    lim_{k→∞} ‖x_{k+1} - x_{ˆl(k)}‖ = 0.    (35)

Furthermore, we have

    ‖x_{ˆl(k)} - x*‖ ≤ ‖x_{ˆl(k)} - x_{k+1}‖ + ‖x_{k+1} - x*‖.    (36)

Since x* is an accumulation point of {x_k} and (35) holds, (36) implies that x* is an accumulation point of the sequence {x_{ˆl(k)}}. From (33) we conclude that x* is an accumulation point also of the sequence {x_{ˆl(k)-1}}, which contradicts the assumption made. Indeed, a subsequence of {x_{ˆl(k)-1}} converges to x*, so by hypothesis b) the corresponding values α_{ˆl(k)-1} should be bounded away from zero, contradicting (31). Hence we necessarily have L = 0, which implies

    lim_{k→∞} ‖F(x_k)‖ = 0.

Now Theorem 3.1 completes the proof. □

The previous theorem is proved under the assumptions A1-A4: hypothesis A4 is analogous to the one employed in [1] in the smooth case, while A3 is the nonmonotone, and weaker, version of assumption (A4) in [12]. This hypothesis is not required in the smooth case, thanks to the stronger properties of the function F and of its Jacobian ∇F(x)^T (see 3.2.10 in [14]).

4 An application to Karush-Kuhn-Tucker systems

In this section we consider a particular semismooth system of equations derived from the optimality conditions of variational inequalities or nonlinear programming problems. We consider the classical variational inequality problem VIP(C,V), which is to find x* ∈ C such that

    ⟨V(x*), x - x*⟩ ≥ 0    for all x ∈ C,    (37)

where C is a nonempty closed convex subset of R^n, ⟨·,·⟩ is the usual inner product in R^n and V : R^n → R^n is a continuous function. When V is the gradient mapping of a real-valued function f : R^n → R, the

problem VIP(C,V) becomes the stationary point problem of the following optimization problem:

    min f(x)   s.t. x ∈ C.    (38)

We assume, as in [20], that the feasible set C can be represented as follows:

    C = { x ∈ R^n : h(x) = 0, g(x) ≥ 0, Π_l x ≥ l, Π_u x ≤ u },    (39)

where h : R^n → R^{neq}, g : R^n → R^m, Π_l ∈ R^{nl×n} and Π_u ∈ R^{nu×n}; Π_l (or Π_u) denotes the matrix formed by the rows of the identity matrix whose indices are those of the entries of x which are bounded below (above). Furthermore, nl and nu denote the number of entries of the vector x subject to lower and upper bounds, respectively.

We consider the following conditions, representing the Karush-Kuhn-Tucker (KKT) optimality conditions of VIP(C,V) or of the nonlinear programming problem (38):

    L(x, λ, μ, κ_l, κ_u) = 0
    h(x) = 0
    μ^T g(x) = 0,   g(x) ≥ 0,   μ ≥ 0    (40)
    κ_l^T (Π_l x - l) = 0,   Π_l x - l ≥ 0,   κ_l ≥ 0
    κ_u^T (u - Π_u x) = 0,   u - Π_u x ≥ 0,   κ_u ≥ 0,

where

    L(x, λ, μ, κ_l, κ_u) = V(x) - ∇h(x)λ - ∇g(x)μ - Π_l^T κ_l + Π_u^T κ_u

is the Lagrangian function. Here ∇h(x)^T and ∇g(x)^T are the Jacobian matrices of h(x) and g(x), respectively. In order to rewrite the KKT conditions as a nonlinear system of equations, we make use of the Fischer function [9], ϕ : R² → R, defined by

    ϕ(a, b) := √(a² + b²) - a - b.

The main property of this function is the following characterization of its zeros:

    ϕ(a, b) = 0  ⟺  a ≥ 0, b ≥ 0, ab = 0.
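As a quick illustration (not part of the paper; the function name is only for this sketch), the Fischer function and its complementarity characterization can be checked numerically:

```python
import math

def fischer(a, b):
    """Fischer function: phi(a, b) = sqrt(a^2 + b^2) - a - b."""
    return math.hypot(a, b) - a - b

print(fischer(0.0, 3.0))   # 0.0 -> a >= 0, b >= 0, a*b = 0
print(fischer(2.0, 0.0))   # 0.0 -> a >= 0, b >= 0, a*b = 0
print(fischer(1.0, 1.0))   # sqrt(2) - 2 < 0 -> both strictly positive, a*b != 0
print(fischer(-1.0, 0.0))  # 2.0 != 0 -> violates a >= 0
```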

Therefore, the KKT conditions (40) can be equivalently written as the nonlinear system of equations

    V(x) - ∇h(x)λ - ∇g(x)μ - Π_l^T κ_l + Π_u^T κ_u = 0
    h(x) = 0
    ϕ_I(μ, g(x)) = 0
    ϕ_l(κ_l, Π_l x - l) = 0
    ϕ_u(κ_u, u - Π_u x) = 0

or, in more concise form,

    Φ(w) = 0,    (41)

where w = (x^T, λ^T, μ^T, κ_l^T, κ_u^T)^T; ϕ_I : R^{2m} → R^m, with ϕ_I(μ, g(x)) := (ϕ(μ_1, g_1), ..., ϕ(μ_m, g_m))^T; ϕ_l : R^{2nl} → R^{nl}, with ϕ_l(κ_l, Π_l x - l) ∈ R^{nl} defined componentwise in the same way; ϕ_u : R^{2nu} → R^{nu}, with ϕ_u(κ_u, u - Π_u x) ∈ R^{nu} defined analogously. Note that the functions ϕ_I, ϕ_l, ϕ_u are not differentiable at the origin, so the system (41) is a semismooth reformulation of the KKT conditions (40).

The system (41) can be solved by the semismooth inexact Newton method [8], given by w_{k+1} = w_k + α_k Δw_k with a given starting point w_0, where α_k is a damping parameter and Δw_k is the solution of the following linear system:

    H_k Δw = -Φ(w_k) + r_k,    (42)

where H_k ∈ ∂_B Φ(w_k) and r_k is a residual vector satisfying the condition ‖r_k‖ ≤ η_k ‖Φ(w_k)‖.
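To make the reformulation concrete, the following sketch (not from the paper; all names are illustrative) assembles Φ(w) for the simplified case in which C is described only by inequality constraints g(x) ≥ 0; equality constraints and bound constraints would contribute the blocks h(x), ϕ_l and ϕ_u in exactly the same fashion.

```python
import numpy as np

def fischer_vec(a, b):
    """Componentwise Fischer function applied to vectors a and b."""
    return np.sqrt(a**2 + b**2) - a - b

def kkt_residual(x, mu, V, g, jac_g):
    """Semismooth KKT residual Phi(w) for the simplified case C = {x : g(x) >= 0}.

    V      : callable, V(x) in R^n (the gradient of f in the optimization case)
    g      : callable, g(x) in R^m
    jac_g  : callable, the m-by-n Jacobian of g at x
    """
    grad_lag = V(x) - jac_g(x).T @ mu        # V(x) - grad g(x) mu
    comp = fischer_vec(mu, g(x))             # phi_I(mu, g(x))
    return np.concatenate([grad_lag, comp])  # Phi(w), with w = (x, mu)
```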

As shown in [20], permuting the equations of the system (42) and changing the sign of the fourth equation, the system (42) can be written as follows:

    ⎡ R_κl     0       0       R_l Π_l                           0      ⎤ ⎡ Δκ_l ⎤   ⎡ -ϕ_l(κ_l, Π_l x - l) ⎤
    ⎢ 0        R_κu    0      -R_u Π_u                           0      ⎥ ⎢ Δκ_u ⎥   ⎢ -ϕ_u(κ_u, u - Π_u x) ⎥
    ⎢ 0        0       R_μ     R_g (∇g(x))^T                     0      ⎥ ⎢ Δμ   ⎥ = ⎢ -ϕ_I(μ, g(x))        ⎥ + P r,
    ⎢ Π_l^T   -Π_u^T   ∇g(x)  -∇V(x) + ∇²g(x)μ + ∇²h(x)λ         ∇h(x)  ⎥ ⎢ Δx   ⎥   ⎢  α                   ⎥
    ⎣ 0        0       0       (∇h(x))^T                         0      ⎦ ⎣ Δλ   ⎦   ⎣ -h(x)                ⎦

where P r denotes the correspondingly permuted residual vector and

    α = V(x) - ∇h(x)λ - ∇g(x)μ - Π_l^T κ_l + Π_u^T κ_u;

    R_g = diag((r_g)_1, ..., (r_g)_m),
    (r_g)_i = g_i(x)/√(μ_i² + g_i(x)²) - 1   if (g_i(x), μ_i) ≠ (0, 0),    (r_g)_i = -1   otherwise;

    R_μ = diag((r_μ)_1, ..., (r_μ)_m),
    (r_μ)_i = μ_i/√(μ_i² + g_i(x)²) - 1   if (g_i(x), μ_i) ≠ (0, 0),    (r_μ)_i = -1   otherwise;

    R_l = diag((r_l)_1, ..., (r_l)_{nl}),
    (r_l)_i = ((Π_l x)_i - l_i)/√((κ_l)_i² + ((Π_l x)_i - l_i)²) - 1   if ((Π_l x)_i - l_i, (κ_l)_i) ≠ (0, 0),    (r_l)_i = -1   otherwise;

    R_κl = diag((r_κl)_1, ..., (r_κl)_{nl}),
    (r_κl)_i = (κ_l)_i/√((κ_l)_i² + ((Π_l x)_i - l_i)²) - 1   if ((κ_l)_i, (Π_l x)_i - l_i) ≠ (0, 0),    (r_κl)_i = -1   otherwise;

    R_u = diag((r_u)_1, ..., (r_u)_{nu}),
    (r_u)_i = (u_i - (Π_u x)_i)/√((κ_u)_i² + (u_i - (Π_u x)_i)²) - 1   if (u_i - (Π_u x)_i, (κ_u)_i) ≠ (0, 0),    (r_u)_i = -1   otherwise;

    R_κu = diag((r_κu)_1, ..., (r_κu)_{nu}),
    (r_κu)_i = (κ_u)_i/√((κ_u)_i² + (u_i - (Π_u x)_i)²) - 1   if ((κ_u)_i, u_i - (Π_u x)_i) ≠ (0, 0),    (r_κu)_i = -1   otherwise.
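These diagonal entries are just the partial derivatives of the Fischer function evaluated componentwise, with the value -1 taken when both arguments vanish. A small illustrative sketch (not from the paper):

```python
import numpy as np

def fb_diagonals(a, b):
    """Diagonal entries of an element of the B-subdifferential of the componentwise Fischer function.

    For phi(a_i, b_i) = sqrt(a_i^2 + b_i^2) - a_i - b_i the partial derivatives are
    a_i/sqrt(a_i^2 + b_i^2) - 1 and b_i/sqrt(a_i^2 + b_i^2) - 1; the value -1 is used
    at components where (a_i, b_i) = (0, 0), as in the definitions above.
    """
    r = np.hypot(a, b)
    safe = np.where(r > 0.0, r, 1.0)              # avoid division by zero at the origin
    da = np.where(r > 0.0, a / safe - 1.0, -1.0)  # e.g. the entries of R_mu for (a, b) = (mu, g(x))
    db = np.where(r > 0.0, b / safe - 1.0, -1.0)  # e.g. the entries of R_g
    return da, db
```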

Now we define the merit function Ψ : R^{n+neq+m+nl+nu} → R as

    Ψ(w) = (1/2) ‖Φ(w)‖².    (43)

The differentiability of the function Ψ(w) plays a crucial role in the globalization strategy of the semismooth inexact Newton method proposed in [8]. In the approach followed here this property is not required, since the convergence Theorem 3.2 can be proved without assuming this hypothesis, thanks to the backtracking rule (13), which is similar to the ones proposed in [7] and in [12]. Now we introduce the nonmonotone inexact Newton algorithm, as follows.

Algorithm 4.1

Step 1 Choose w_0 = (x_0, λ_0, μ_0, κ_{l0}, κ_{u0}) ∈ R^{n+neq+m+nl+nu} with λ_0 = 0, μ_0 = 0, κ_{l0} = 0, κ_{u0} = 0; choose θ ∈ (0, 1), β ∈ (0, 1/2) and fix η_max < 1.

Step 2 (Stopping criterion) If ‖Φ(w_k)‖ ≤ tol then stop; else

Step 3 (Search direction Δw) Select an element H_k ∈ ∂_B Φ(w_k). Find the direction Δw_k and a parameter η_k ∈ [0, η_max] such that

    ‖H_k Δw_k + Φ(w_k)‖ ≤ η_k ‖Φ(w_{l(k)})‖;    (44)

Step 4 (Line search) Compute the minimum integer h such that, with α_k = θ^h, the following condition holds:

    ‖Φ(w_k + α_k Δw_k)‖ ≤ (1 - βα_k(1 - η_k)) ‖Φ(w_{l(k)})‖;    (45)

Step 5 Compute w_{k+1} = w_k + α_k Δw_k and go to Step 2.

It is straightforward to observe that Algorithm 4.1 is a special case of Algorithm 3.1. Furthermore, the merit function Ψ(w) is differentiable and ∇Ψ(w) = H^T Φ(w), where H belongs to ∂_B Φ(w) (see Proposition 4.2 in [8]). This implies that hypothesis A3 holds [12]. Moreover, we assume that H_k in (44) is nonsingular and that all the iterates w_k belong to a compact set. As a consequence, the norm of the search direction Δw_k is bounded: indeed, for any k, from (44) we obtain

    ‖Δw_k‖ ≤ M(1 + η_max) ‖Φ(w_0)‖,

where M = max_k ‖H_k^{-1}‖.

5 Numerical results

In this section we report some numerical experiments, obtained by coding Algorithm 4.1 in FORTRAN 90, in double precision, on an HP zx6000 workstation with an Itanium2 processor at 1.3 GHz and 2 GB of RAM, running the HP-UX operating system. In particular, we set β = 10⁻⁴, θ = 0.5, tol = 10⁻⁸. Our aim is to compare the performance of Algorithm 4.1 with different monotonicity degrees, obtained by choosing different values of the parameter N. We declare a failure of the algorithm when the tolerance tol cannot be reached within 500 iterations or when more than 30 reductions of the damping parameter are needed to satisfy the backtracking condition (45). The forcing term η_k is chosen adaptively as

    η_k = max( 1/(1 + k), 10⁻⁸ ).

The solution of the linear system (44) is computed by the LSQR method [15] with a suitable preconditioner proposed in [20]; the stopping criterion for the inner linear solver is condition (44).
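For reference, the adaptive forcing term and the failure safeguards just described can be written down as follows (a minimal sketch under the stated parameter choices, not the authors' FORTRAN 90 code):

```python
def forcing_term(k, eta_min=1e-8):
    """Adaptive forcing term eta_k = max(1/(1 + k), 1e-8) used in the experiments."""
    return max(1.0 / (1.0 + k), eta_min)

TOL = 1e-8             # outer stopping tolerance on ||Phi(w_k)||
MAX_OUTER = 500        # failure if TOL is not reached within 500 iterations
MAX_BACKTRACKS = 30    # failure if condition (45) needs more than 30 reductions of alpha
```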

The test problems we considered are the nonlinear programming problems and the complementarity problems listed in Table 1, where we also report the number of variables n, the number of equality and inequality constraints (neq and m, respectively) and the number of lower and upper bounds (nl and nu, respectively).

Tables 2 and 3 summarize the results obtained on this set of test problems. The tables compare the performance of the algorithm for different choices of the parameter N (N = 1, 3, 5, 7) in terms of the number of external and inner iterations (rows labelled "ext." and "inn.", respectively) and of the number of backtracking reductions (rows labelled "back"). The case N = 1 is the usual monotone case. Tables 2 and 3 show that the nonmonotone strategies can produce a sensible decrease not only in the number of backtracking reductions, but also in the number of inner iterations. Furthermore, in some cases the number of external iterations is also reduced when nonmonotone choices are employed. This fact can be explained by observing that different choices of the parameter N imply different values of the inner tolerance: since the direction Δw_k computed at Step 3 of Algorithm 4.1 depends on the inner tolerance, different values of N produce different search directions.

Figure 1 depicts the decrease of the function Ψ(w) defined in (43): the value Ψ(w_k) is reported in logarithmic scale on the y axis for each iteration of Algorithm 4.1 applied to the MCP problem lincont. For N = 3, N = 5 and N = 7 a nonmonotone decrease can be observed, and the tolerance of 10⁻⁸ is reached after 32, 33 and 34 iterations respectively, while in the monotone case (N = 1) the same tolerance is satisfied after 46 iterations. A similar behaviour has been observed for most of the MCP and NLP test problems, for example ehl-kost, ehl-def, optcntrl, marine and rosenbr. On the other hand, a too large value of the parameter N can, in some cases, produce a degenerate behaviour of the algorithm, as we observed for example in the MCP problem duopoly.

The decrease in the number of iterations and in the number of backtracking steps corresponds to a decrease in the execution time. For example, for the problem lincont the execution time of the monotone algorithm is 3.62 seconds; setting N = 3, the time is reduced to 1.37 seconds, which is less than half the time of the monotone algorithm. The CPU times for N = 5 and N = 7 are 1.36 and 1.42 seconds, respectively.

A significant reduction has been obtained, for example, also for the test problem ehl-kost, for which we measured 0.43, 0.26, 0.29 and 0.34 seconds with N = 1, N = 3, N = 5 and N = 7 respectively, and for the problems ehl-def (0.23, 0.15, 0.17, 0.21 seconds), dtoc6 (0.11, 3.1e-2, 2.3e-2, 2.3e-2), marine (1.68, 1.06, 1.06, 1.11) and opt-cont3 (10.3, 18.7, 18.7, 18.7). Since the execution time for the other test problems is very small (less than one second), the graphs in Figures 2 and 3 report the ratio between the execution time of the nonmonotone algorithms obtained with the different values of N and the execution time of the monotone algorithm; thus, the value 1 on the y axis represents the time employed by the monotone algorithm.

In general, the nonmonotone choices improved the execution times, but we observed that, for large values of the parameter N, the number of inner iterations can decrease while the number of external iterations rises, which leads to an increase in the execution time. Thus, we can conclude that the nonmonotone strategies are effective in improving the performance of Algorithm 4.1, but the nonmonotonicity parameter N has to be chosen carefully.

Acknowledgements. The authors are grateful to the referees for their comments, which helped to improve the presentation of the paper.

Table 1: Test problems

NLP problems:

    Problem    Ref.            n    neq  m  nl   nu
    harkerp2   [2]             100  0    0  100  0
    himmelbk   [2]             24   14   0  24   0
    optcdeg2   [2]             295  197  0  197  0
    optcdeg3   [2]             295  197  0  197  99
    optcntrl   [2]             28   19   1  20   10
    aug2dc     [2]             220  96   0  18   0
    minsurf    [6]             225  0    0  225  0
    marine     [6]             175  152  0  15   0
    steering   [6]             294  236  0  61   60
    dtoc2      [2]             294  196  0  0    0
    dtoc6      [2]             298  149  0  0    0
    lukvle8    [11]            300  298  0  0    0
    blend      *               24   14   0  24   0
    branin     *               2    2    0  2    0
    kowalik    *               4    0    0  4    4
    osbornea   [2]             5    0    0  5    5
    rosenbr    [2]             2    0    0  0    0
    hs6        [2]             2    0    1  0    0
    mitt105    [13, Ex. 5.5]   65   45   0  65   65   (α = 0.01, N = 5)
    mitt305    [13, Ex. 4]     70   45   0  25   50   (α = 0.001, N = 5)
    mitt405    [13, Ex. 3]     50   25   0  25   50   (α = 0.001, N = 5)

MCP problems:

    Problem    Ref.  n      neq  m  nl    nu
    ehl-kost   [5]   101    0    0  100   0
    ehl-def    [5]   101    0    0  100   0
    bertsek    [5]   15     0    0  10    0
    choi       [5]   13     0    0  0     0
    josephy    [5]   4      0    0  4     0
    bai-haung  [5]   4900   0    0  4900  0
    bratu      [5]   5929   0    0  5625  5625
    duopoly    [5]   69     0    0  63    0
    ehl-k40    [5]   41     0    0  40    0
    hydroc06   [5]   29     0    0  11    0
    lincont    [5]   419    0    0  170   0
    opt-cont1  [5]   1024   0    0  512   512
    opt-cont2  [5]   4096   0    0  2048  2048
    opt-cont3  [5]   16384  0    0  8192  8192

    * http://scicomp.ewha.ac.kr/netlib/ampl/models/nlmodels/

Table 2: Nonmonotone results for the NLP problems

    Problem          N=1   N=3   N=5   N=7
    harkerp2  ext.   105   104   104   104
              inn.   471   404   415   438
              back    37    20    14     8
    himmelbk  ext.    22    23    27    25
              inn.   114    96   149   110
              back    22    60    83    58
    optcdeg2  ext.     -     -    87    75
              inn.     -     -   294   278
              back     -     -   313   289
    optcdeg3  ext.     -   100    73    68
              inn.     -   266   158   134
              back     -   315   133    82
    optcntrl  ext.    31    25    23    23
              inn.    78    54    45    45
              back   135    70    54    54
    aug2dc    ext.     7     7     7     7
              inn.    12     7     7     7
              back     0     0     0     0
    minsurf   ext.     5     5     5     5
              inn.    10     8     8     5
              back     0     0     0     0
    marine    ext.    94    68    67    72
              inn.   436   296   284   287
              back   473   327   287   309
    steering  ext.    11     -     -     -
              inn.    30     -     -     -
              back    37     -     -     -
    dtoc2     ext.     7     7     7     7
              inn.    11     8     8     8
              back     1     1     1     1
    dtoc6     ext.    21     9     9     9
              inn.    21     9     9     9
              back    39     1     0     0
    lukvle8   ext.     -    20    21    25
              inn.     -   418   365   455
              back     -    34    34    34
    blend     ext.    23    20    19    19
              inn.    79   112    82    75
              back   225    98    72    72
    branin    ext.     8     8     8     8
              inn.     8     8     8     8
              back    11     2     2     2
    kowalik   ext.    37    11    22    20
              inn.    37    11    22    20
              back   149     1    14     6

    Problem          N=1   N=3   N=5   N=7
    osborne1  ext.   121    16    16    16
              inn.   121    16    16    16
              back   443     0     0     0
    rosenbr   ext.   180    14     9     9
              inn.   180    14     9     9
              back  1121     4     2     2
    hs6       ext.     8     7     7     7
              inn.     8     7     7     7
              back    14    11    11    11
    mitt105   ext.    11     9     8     8
              inn.    19    13     9     9
              back    13     4     0     0
    mitt305   ext.    31    24    24    20
              inn.    71    47    46    37
              back   195   105   102    68
    mitt405   ext.    32    26    19    19
              inn.    77    54    39    39
              back   227   144    81    81

    "-": the algorithm does not converge.

References

[1] S. Bonettini (2005). A nonmonotone inexact Newton method, Optimization Methods and Software, 20(4-5), 475-491.

[2] I. Bongartz, A. R. Conn, N. Gould and Ph. L. Toint (1995). CUTE: Constrained and Unconstrained Testing Environment, ACM Transactions on Mathematical Software, 21, 123-160.

[3] F. H. Clarke (1983). Optimization and Nonsmooth Analysis, John Wiley & Sons, New York.

[4] R. S. Dembo, S. C. Eisenstat and T. Steihaug (1982). Inexact Newton methods, SIAM Journal on Numerical Analysis, 19, 400-408.

Table 3: Nonmonotone results for the MCP problems

    Problem            N=1   N=3   N=5   N=7
    ehl-kost   ext.     14    12    14    16
               inn.    104    50    50    48
               back     17     0     0     0
    ehl-def    ext.     14    12    14    16
               inn.    103    50    50    48
               back     17     0     0     0
    bertsek    ext.      6     6     6     6
               inn.      8     7     6     6
               back      0     0     0     0
    choi       ext.      5     5     5     5
               inn.      5     5     5     5
               back      0     0     0     0
    josephy    ext.      6     6     7     7
               inn.      8     7     7     7
               back      2     2     2     2
    bai-haung  ext.      6     6     6     6
               inn.     13     9     9     9
               back      0     0     0     0
    bratu      ext.      5     5     5     5
               inn.     10     6     6     6
               back      0     0     0     0
    duopoly    ext.     44    38    48     †
               inn.    135   127   140     †
               back    225   158    81     †
    ehl-k40    ext.    203   333     †     †
               inn.   1292  2056     †     †
               back   2252  3230     †     †
    hydroc06   ext.      5     5     5     5
               inn.      8     6     6     6
               back      1     1     1     1
    lincont    ext.     46    32    33    34
               inn.    385   165   144   170
               back    220    89    77    73
    opt-cont1  ext.     10    10    10    10
               inn.     49    33    21    19
               back      0     0     0     0
    opt-cont2  ext.      9     9     9    11
               inn.    101    52    42    42
               back      0     0     0     0
    opt-cont3  ext.      8     8     8     8
               inn.     22    15    14    14
               back      0     0     0     0

    †: maximum number of backtracking reductions reached.

Figure 1: Decrease of the merit function Ψ(w_k) (logarithmic scale on the y axis) for the MCP problem lincont, with one panel for each of N = 1, 3, 5, 7.

Figure 2: Time comparison for the MCP problems (execution time of the nonmonotone algorithm with N = 3, 5, 7 relative to the monotone algorithm) for ehl-kost, ehl-def, bertsek, choi, josephy, bai-haung, bratu, duopoly, hydroc06, lincont, opt-cont1, opt-cont2 and opt-cont3.

Figure 3: Time comparison for the NLP problems (execution time of the nonmonotone algorithm with N = 3, 5, 7 relative to the monotone algorithm) for harkerp2, himmelbk, optcntrl, aug2dc, minsurf, marine, dtoc2, dtoc6, blend, branin, kowalik, osborne1, rosenbr, hs6, mitt105, mitt305 and mitt405.

[5] S. P. Dirkse and M. C. Ferris (1995). A collection of nonlinear mixed complementarity problems, Optimization Methods and Software, 5, 123-156.

[6] E. D. Dolan, J. J. Moré and T. S. Munson (2004). Benchmarking optimization software with COPS 3.0, Technical Report ANL/MCS-TM-273, Argonne National Laboratory, Illinois, USA.

[7] S. C. Eisenstat and H. F. Walker (1994). Globally convergent inexact Newton methods, SIAM Journal on Optimization, 4, 393-422.

[8] F. Facchinei, A. Fischer and C. Kanzow (1996). Inexact Newton methods for semismooth equations with applications to variational inequality problems, in G. Di Pillo and F. Giannessi (eds.), Nonlinear Optimization and Applications, Plenum Press, New York, 125-139.

[9] A. Fischer (1992). A special Newton-type optimization method, Optimization, 24, 269-284.

[10] L. Grippo, F. Lampariello and S. Lucidi (1986). A nonmonotone line search technique for Newton's method, SIAM Journal on Numerical Analysis, 23, 707-716.

[11] L. Lukšan and J. Vlček (1999). Sparse and partially separable test problems for unconstrained and equality constrained optimization, Technical Report 767, Institute of Computer Science, Academy of Sciences of the Czech Republic.

[12] J. M. Martínez and L. Qi (1995). Inexact Newton methods for solving nonsmooth equations, Journal of Computational and Applied Mathematics, 60, 127-145.

[13] H. D. Mittelmann and H. Maurer (1999). Optimization techniques for solving elliptic control problems with control and state constraints: Part 1. Boundary control, Computational Optimization and Applications, 16, 29-55.

[14] J. M. Ortega and W. C. Rheinboldt (1970). Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York.

[15] C. C. Paige and M. A. Saunders (1982). LSQR: An algorithm for sparse linear equations and sparse least squares, ACM Transactions on Mathematical Software, 8(1), 43-71.

[16] J. S. Pang and L. Qi (1993). Nonsmooth equations: motivation and algorithms, SIAM Journal on Optimization, 3, 443-465.

[17] L. Qi (1993). Convergence analysis of some algorithms for solving nonsmooth equations, Mathematics of Operations Research, 18, 227-244.

[18] L. Qi and J. Sun (1993). A nonsmooth version of Newton's method, Mathematical Programming, 58, 353-367.

[19] W. C. Rheinboldt (1998). Methods for Solving Systems of Nonlinear Equations, Second Edition, SIAM, Philadelphia.

[20] V. Ruggiero and F. Tinti (2005). A preconditioner for solving large-scale variational inequality problems by a semismooth inexact approach, Technical Report no. 69, Dipartimento di Matematica, Università di Modena e Reggio Emilia, Modena.