Algebraic rules for computing the regularization parameter of the Levenberg-Marquardt method
E. W. Karas    S. A. Santos    B. F. Svaiter

May 7, 2015

Abstract

A class of Levenberg-Marquardt methods for solving the nonlinear least-squares problem is proposed, with explicit algebraic rules for computing the regularization parameter. The convergence properties of this class of methods are analyzed. All accumulation points of the generated sequence are proved to be stationary. A q-quadratic rate of convergence for the zero-residual problem is obtained under an error bound condition. Illustrative numerical experiments are presented, with encouraging results.

Keywords: Nonlinear least-squares problems; Levenberg-Marquardt method; regularization; global convergence; local convergence; computational results.

AMS Classification: 90C30, 65K05, 49M37.

1 Introduction

Given F : Rⁿ → Rᵐ, the nonlinear least-squares (NLS) problem is stated as

    min_{x ∈ Rⁿ} ‖F(x)‖²,

where ‖·‖ is the Euclidean norm. It is important from both theoretical and practical viewpoints [3, 18]. The seminal works of Levenberg [12], Morrison [15] and Marquardt [13] provided a regularization strategy for improving the convergence of the Gauss-Newton (GN) method, possibly the first and simplest choice of a method to address the NLS problem. In the GN method, the nonlinear residual function F is replaced by its linear approximation at the current iterate, so that the NLS problem is solved by means of a sequence of quadratic problems, without any additional parameter. In the Levenberg-Marquardt (LM) method, a sequence of quadratic problems is also generated, but a regularization term is introduced, depending on the so-called LM parameter in an essential way.
In practice, especially for very large problems, such as those arising in data assimilation, it may be necessary to make further approximations within the GN method in order to reduce computational

---
* Department of Mathematics, Federal University of Paraná, CP 19081, Curitiba, PR, Brazil. This author was partially supported by CNPq grants /2013-3 and /.
† Department of Applied Mathematics, University of Campinas, Campinas, SP, Brazil. This author was partially supported by CNPq grant 30403/2010-7, FAPESP grants 2013/ and 2013/, and PRONEX Optimization.
‡ IMPA, Estrada Dona Castorina, 110, Rio de Janeiro, Brazil. This author was partially supported by CNPq grants 3096/2011-5 and /2010-7, FAPERJ grant E-26/102.940/2011 and PRONEX Optimization.
costs. Gratton et al. [10] point out conditions ensuring that the truncated and perturbed GN methods converge, and also derive rates of convergence for the iterations. Inexact versions of the LM method have also been considered, usually under an error bound condition, which is weaker than assuming nonsingularity of the appropriate matrix at a solution, namely the Jacobian JF ∈ Rᵐˣⁿ if m = n, or the square matrix JFᵀJF otherwise [2, 8]. Local convergence of the LM method has been studied under an error bound condition as well. Yamashita and Fukushima [17] proved a q-quadratic rate for the LM method with the LM parameter sequence set as µ_k = ‖F(x_k)‖². Later, this q-quadratic rate was extended by Fan and Yuan [6] to the setting µ_k = ‖F(x_k)‖^δ, with δ ∈ [1, 2]. Fan and Pan [5] enlarged the analysis to the variation of the exponent δ ∈ (0, 1) within the aforementioned update of the LM parameter, showing local superlinear convergence in this case. Accelerated versions of the LM method have been recently proposed by Fan [4], in which cubic local convergence is reached. When it comes to complexity analysis, Ueda and Yamashita [16] have investigated a global complexity bound for the LM method.

In this work we devise algebraic rules for computing the LM parameter, so that µ_k belongs to an interval that is proportional to ‖F(x_k)‖. Under Lipschitz continuity of the Jacobian, the rules are obtained with the aim of accepting the full LM step in the Armijo sufficient-decrease condition. On one hand, when the Lipschitz constant is available, we prove that the full steps are accepted. On the other hand, in case the Lipschitz constant is unknown, we propose a scheme for dynamically updating an estimate of such a constant, within a well-defined and globally convergent algorithm for solving the NLS problem. The q-quadratic rate of convergence for the zero-residual problem is obtained under an error bound condition.
Numerical results illustrate the performance of the proposed algorithm for a set of test problems from the CUTEst collection, with promising results in terms of both efficiency and robustness.

The text is organized as follows. The basic results that generate the proposed algorithm are presented in Section 2. The algorithm and its underlying properties are given in Section 3. The global convergence analysis is provided in Section 4, whereas the local convergence result is developed in Section 5. The numerical experiments are presented and examined in Section 6. A summary of the work is stated in Section 7, and the tables with the complete numerical results compose the Appendix.

2 Technical results

We consider F : Rⁿ → Rᵐ of class C¹ with an L-Lipschitz continuous Jacobian JF(x) ∈ Rᵐˣⁿ, that is, for all x, y ∈ Rⁿ,

    ‖JF(x) − JF(y)‖ ≤ L‖x − y‖,    (1)

where on the left-hand side of the previous inequality we have the operator norm induced by the canonical norms in Rⁿ and Rᵐ. More specifically, we are concerned with the choice of the regularization parameter µ in the Levenberg-Marquardt method,

    s = −(JF(x)ᵀJF(x) + µI)⁻¹ JF(x)ᵀF(x),    x⁺ = x + ts,    0 < t ≤ 1,

where x⁺ is the new iterate and t is the step length. Initially, a classical result is recalled [3, Lemma 4.1.1].

Lemma 2.1 (Linearization error). For any x, s ∈ Rⁿ,

    ‖F(x + s) − (F(x) + JF(x)s)‖ ≤ (L/2)‖s‖².
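The LM step above can be checked numerically. The following sketch (NumPy; the function name and the toy data are our own illustration, not from the paper) computes the step for a given µ and verifies that it minimizes the regularized linear model:

```python
import numpy as np

def lm_step(J, F, mu):
    """Levenberg-Marquardt step s = -(J^T J + mu I)^{-1} J^T F,
    the minimizer of the regularized model ||F + J u||^2 + mu ||u||^2."""
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + mu * np.eye(n), -(J.T @ F))

J = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])
F = np.array([1.0, -2.0, 0.5])
mu = 0.1
s = lm_step(J, F, mu)
# The gradient of the regularized model vanishes at its minimizer s ...
grad = J.T @ (F + J @ s) + mu * s
assert np.allclose(grad, 0.0)
# ... and, as Lemma 2.2 below states with t = 1, the linearized
# residual does not exceed the current residual.
assert np.linalg.norm(F + J @ s) <= np.linalg.norm(F)
```

For µ → 0 this step tends to the Gauss-Newton step, while large µ shrinks it toward a short steepest-descent-like step.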
From now on, in this section, s is the Levenberg-Marquardt step at x ∈ Rⁿ with regularization parameter µ > 0, that is,

    s = argmin_u ‖F(x) + JF(x)u‖² + µ‖u‖² = −(JF(x)ᵀJF(x) + µI)⁻¹ JF(x)ᵀF(x).    (2)

Define also φ : R → R,

    φ(t) = ‖F(x + ts)‖².    (3)

Note that

    φ′(0) = 2⟨F(x), JF(x)s⟩ = −2⟨JF(x)ᵀF(x), (JF(x)ᵀJF(x) + µI)⁻¹ JF(x)ᵀF(x)⟩ ≤ 0.    (4)

First we analyze the norm of the linearization of F along the direction s.

Lemma 2.2 (Linearization's norm). For any t ∈ [0, 1],

    ‖F(x) + tJF(x)s‖² = ‖F(x)‖² + t⟨F(x), JF(x)s⟩ + (t² − t)‖JF(x)s‖² − tµ‖s‖² ≤ ‖F(x)‖².

Proof.

    ‖F(x) + tJF(x)s‖² = ‖F(x)‖² + 2t⟨JF(x)ᵀF(x), s⟩ + t²‖JF(x)s‖²
        = ‖F(x)‖² + t⟨F(x), JF(x)s⟩ + t⟨JF(x)ᵀF(x), s⟩ + t²‖JF(x)s‖²
        = ‖F(x)‖² + t⟨F(x), JF(x)s⟩ − t⟨(JF(x)ᵀJF(x) + µI)s, s⟩ + t²‖JF(x)s‖²
        = ‖F(x)‖² + t⟨F(x), JF(x)s⟩ + (t² − t)‖JF(x)s‖² − tµ‖s‖².

Using (4) and the fact that t ∈ [0, 1], we conclude the proof.

Now we analyze the norm of F along the direction s.

Lemma 2.3. For any t ∈ [0, 1],

    ‖F(x + ts)‖² ≤ ‖F(x)‖² + t⟨F(x), JF(x)s⟩ + (t² − t)‖JF(x)s‖² + t‖s‖² [ (L²/4)t³‖s‖² + Lt‖F(x) + tJF(x)s‖ − µ ]
        ≤ ‖F(x)‖² + t⟨F(x), JF(x)s⟩ + (t² − t)‖JF(x)s‖² + t‖s‖² [ (L²/4)t³‖s‖² + Lt‖F(x)‖ − µ ].

Proof. Let R(t) = F(x + ts) − (F(x) + tJF(x)s). Since F(x + ts) = F(x) + tJF(x)s + R(t),

    ‖F(x + ts)‖² = ‖F(x) + tJF(x)s‖² + ‖R(t)‖² + 2⟨F(x) + tJF(x)s, R(t)⟩
        ≤ ‖F(x) + tJF(x)s‖² + ‖R(t)‖² + 2‖F(x) + tJF(x)s‖ ‖R(t)‖
        ≤ ‖F(x) + tJF(x)s‖² + ‖R(t)‖² + 2‖F(x)‖ ‖R(t)‖,
where the first inequality follows from the Cauchy-Schwarz inequality and the second one from Lemma 2.2. From the definition of R(t) and the assumption of JF being L-Lipschitz continuous, we have that ‖R(t)‖ ≤ Lt²‖s‖²/2. Therefore,

    ‖F(x + ts)‖² ≤ ‖F(x) + tJF(x)s‖² + (L²/4)t⁴‖s‖⁴ + Lt²‖F(x)‖ ‖s‖².

To end the proof, use Lemma 2.2 again.

In view of Lemma 2.3, we need a bound for ‖s‖ in order to choose, at a non-stationary point, a regularization parameter µ. The next technical lemma will be useful to obtain such a bound.

Lemma 2.4. For any A ∈ Rᵐˣⁿ, b ∈ Rᵐ and µ > 0,

    ‖(AᵀA + µI)⁻¹Aᵀb‖ ≤ (1/(2√µ)) ‖P_{R(A)} b‖,

where R(A) is the range of A and P_{R(A)} is the orthogonal projection onto this subspace.

Proof. Let b̄ = P_{R(A)} b and s = (AᵀA + µI)⁻¹Aᵀb. Observe that Aᵀb = Aᵀb̄, so that

    (AᵀA + µI)s = Aᵀb̄.    (5)

We will use an SVD of A. There exist unitary matrices U and V, of orders m×m and n×n respectively, such that

    UᵀAV = D,    d_{i,j} = σ_i if i = j, 0 otherwise,

where σ_i ≥ 0 for all i = 1, ..., min{m, n}. Note that Vᵀ = V⁻¹, Uᵀ = U⁻¹ and VᵀAᵀAV = (UᵀAV)ᵀ UᵀAV = DᵀD. Therefore, pre-multiplying both sides of (5) by Vᵀ and using the substitutions

    s̃ = Vᵀs,    b̃ = Uᵀb̄,

we conclude that

    Dᵀb̃ = Vᵀ(AᵀA + µI)V s̃ = (DᵀD + µI)s̃.

It follows from this equation that if n ≤ m then

    s̃_i = σ_i b̃_i / (σ_i² + µ),    i = 1, ..., n;

while if n > m then

    s̃_i = σ_i b̃_i / (σ_i² + µ) for 1 ≤ i ≤ m,    s̃_i = 0 for m < i ≤ n.

Since t/(t² + µ) ≤ 1/(2√µ) for µ > 0 and t ≥ 0, we have ‖s̃‖ ≤ (1/(2√µ))‖b̃‖. To end the proof, note that ‖s̃‖ = ‖s‖ and ‖b̃‖ = ‖b̄‖.
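Lemma 2.4 is easy to sanity-check numerically. The sketch below (our own illustration, not from the paper) compares ‖(AᵀA + µI)⁻¹Aᵀb‖ with ‖P_{R(A)}b‖/(2√µ) for a few shapes, including the n > m case treated in the proof:

```python
import numpy as np

def lemma_2_4_sides(A, b, mu):
    """Return (lhs, rhs) of the bound in Lemma 2.4:
    ||(A^T A + mu I)^{-1} A^T b|| <= ||P_{R(A)} b|| / (2 sqrt(mu))."""
    n = A.shape[1]
    s = np.linalg.solve(A.T @ A + mu * np.eye(n), A.T @ b)
    # Orthogonal projection of b onto range(A) via a least-squares solve.
    Pb = A @ np.linalg.lstsq(A, b, rcond=None)[0]
    return np.linalg.norm(s), np.linalg.norm(Pb) / (2.0 * np.sqrt(mu))

rng = np.random.default_rng(0)
for m, n in [(6, 3), (3, 6), (5, 5)]:
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)
    lhs, rhs = lemma_2_4_sides(A, b, 0.3)
    assert lhs <= rhs + 1e-12
```

The bound is tight exactly when a singular value satisfies σ² = µ, since t/(t² + µ) attains its maximum 1/(2√µ) at t = √µ.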
In the following result, the main ingredients for the devised algebraic rules are established.

Theorem 2.5. For any µ > 0,

    ‖s‖ ≤ (1/(2√µ)) ‖P(F(x))‖ ≤ (1/(2√µ)) ‖F(x)‖,

where P is the orthogonal projection onto the range of JF(x). Moreover, if

    µ ≥ (L/4) ( 2‖F(x)‖ + √(4‖F(x)‖² + ‖P(F(x))‖²) ),    (6)

then

    ‖F(x + s)‖² ≤ ‖F(x)‖² + ⟨F(x), JF(x)s⟩.

Proof. The first part of the theorem, namely the bounds for ‖s‖, follows from (2), Lemma 2.4 and the metric properties of the orthogonal projection. Combining the first part of the theorem with Lemma 2.3 (with t = 1), we conclude that

    ‖F(x + s)‖² ≤ ‖F(x)‖² + ⟨F(x), JF(x)s⟩ + (‖s‖²/µ) [ (L²/16)‖P(F(x))‖² + L‖F(x)‖µ − µ² ].

To end the proof, note that the right-hand side of inequality (6) is the largest root of the concave quadratic

    µ ↦ (L²/16)‖P(F(x))‖² + L‖F(x)‖µ − µ².

Remark 2.6. In view of Theorem 2.5, possible choices for µ at a point x are

    µ = (L/4) ( 2‖F(x)‖ + √(4‖F(x)‖² + ‖P(F(x))‖²) ),    (7)

or

    µ = (L/4) ( 4‖F(x)‖ + ‖P(F(x))‖ ),    (8)

or

    µ = ((2 + √5)/4) L ‖F(x)‖.    (9)

The first value is the smallest one, and the first two values require the computation of ‖P(F(x))‖, although an upper bound for this norm can also be used.
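For concreteness, the three rules of Remark 2.6 can be written as a short routine (a sketch with our own naming; here ‖P(F(x))‖ is evaluated through a least-squares solve, which is one of several possibilities):

```python
import numpy as np

def mu_choices(L, J, F):
    """The parameters (7), (8), (9) of Remark 2.6, in that order.
    L is (an estimate of) the Lipschitz constant of the Jacobian."""
    nF = np.linalg.norm(F)
    # ||P(F)||: norm of F projected onto range(J), via least squares.
    nPF = np.linalg.norm(J @ np.linalg.lstsq(J, F, rcond=None)[0])
    mu7 = (L / 4.0) * (2.0 * nF + np.sqrt(4.0 * nF**2 + nPF**2))
    mu8 = (L / 4.0) * (4.0 * nF + nPF)
    mu9 = (2.0 + np.sqrt(5.0)) / 4.0 * L * nF
    return mu7, mu8, mu9

J = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 3.0]])
F = np.array([0.5, -1.0, 2.0])
mu7, mu8, mu9 = mu_choices(1.0, J, F)
assert mu7 <= mu8 and mu7 <= mu9   # (7) is the smallest admissible value
```

Note that (8) and (9) need not be ordered between themselves; only (7) is guaranteed to be the smallest, since ‖P(F(x))‖ ≤ ‖F(x)‖.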
3 The algorithm

Now we propose and analyze the Levenberg-Marquardt algorithm with algebraic rules for computing the regularization parameter.

Algorithm 1.
Input: x₀ ∈ Rⁿ, β ∈ (0, 1), η ∈ [0, 1), L₀ > 0 and δ ≥ 0 with L₀ ≥ δ.
1.  k ← 0
2.  while JF(x_k)ᵀF(x_k) ≠ 0 do
3.    Set F_k = F(x_k), J_k = JF(x_k)
4.    Choose µ_k ∈ [µ_k⁻, µ_k⁺], where
          µ_k⁻ = (L_k/4) ( 2‖F_k‖ + √(4‖F_k‖² + ‖P_k(F_k)‖²) ),    µ_k⁺ = ((2 + √5)/4) L_k ‖F_k‖,
      and P_k is the projection onto the range of J_k
5.    Compute s_k = −(J_kᵀJ_k + µ_kI)⁻¹ J_kᵀF_k
6.    t ← 1
7.    while ‖F(x_k + t s_k)‖² > ‖F_k‖² + βt ⟨s_k, J_kᵀF_k⟩ do
8.      t ← t/2
9.    end while
10.   t_k = t
11.   x_{k+1} = x_k + t_k s_k
12.   if t_k < 1 then
13.     L_{k+1} = 2L_k
14.   else
15.     Ared = ‖F_k‖² − ‖F(x_{k+1})‖²
16.     Pred = ‖F_k‖² − ‖F_k + J_k s_k‖² − µ_k‖s_k‖² = −⟨s_k, J_kᵀF_k⟩
17.     if Ared > η Pred then
18.       L_{k+1} = max{L_k/2, δ}
19.     else
20.       L_{k+1} = L_k
21.     end if
22.   end if
23.   k ← k + 1
24. end while

Concerning Algorithm 1, it is worth noticing that:

i) Iteration ℓ begins with k = ℓ − 1, and ends with k = ℓ if JF(x_{ℓ−1})ᵀF(x_{ℓ−1}) ≠ 0.

ii) If the algorithm does not end at iteration k + 1, then µ_k > 0, s_k is well defined and it is a descent direction for ‖F(·)‖². Therefore, the Armijo line search in Steps 6-10 of Algorithm 1 has finite termination. Altogether, Algorithm 1 is well defined and either it terminates after ℓ steps with JF(x_{ℓ−1})ᵀF(x_{ℓ−1}) = 0 or it generates infinite sequences (x_k), (s_k), (t_k), (µ_k), (L_k).

iii) L_k plays the role of the Lipschitz constant of JF. If δ > 0, this parameter also plays the role of a safeguard which prevents L_k from becoming too small.

From now on, we assume that Algorithm 1 with input x₀ ∈ Rⁿ, β ∈ (0, 1), η ∈ [0, 1), L₀ > 0 and δ ≥ 0 does not stop at Step 2, and that (x_k), (s_k), (t_k), (µ_k), (L_k) are the (infinite) sequences generated by it.
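A compact NumPy sketch of Algorithm 1 follows. This is our own transcription, not the authors' code: the default parameter values are illustrative, a finite stationarity tolerance replaces the exact test of Step 2, and µ_k is fixed at the upper end µ_k⁺ of Step 4, which avoids computing the projected residual norm.

```python
import numpy as np

def lm_algebraic(F, JF, x0, beta=1e-4, eta=0.5, L0=1e-6, delta=1e-6,
                 tol=1e-8, max_iter=200):
    """Sketch of Algorithm 1 with the choice mu_k = mu_k^+ =
    (2+sqrt(5))/4 * L_k * ||F_k||.  F maps R^n -> R^m, JF is its Jacobian."""
    x, Lk = np.asarray(x0, dtype=float), max(L0, delta)
    for _ in range(max_iter):
        Fk, Jk = F(x), JF(x)
        g = Jk.T @ Fk                     # J_k^T F_k, stationarity measure
        if np.linalg.norm(g) <= tol:
            break
        mu = (2.0 + np.sqrt(5.0)) / 4.0 * Lk * np.linalg.norm(Fk)
        s = np.linalg.solve(Jk.T @ Jk + mu * np.eye(x.size), -g)
        # Armijo backtracking on ||F||^2 (Steps 6-10); s is a descent
        # direction, so the halving terminates.
        t = 1.0
        while (np.linalg.norm(F(x + t * s))**2
               > np.linalg.norm(Fk)**2 + beta * t * (s @ g)):
            t *= 0.5
        x_new = x + t * s
        if t < 1.0:                       # full step rejected: raise L_k
            Lk = 2.0 * Lk
        else:                             # actual vs. predicted decrease
            ared = np.linalg.norm(Fk)**2 - np.linalg.norm(F(x_new))**2
            pred = -(s @ g)
            Lk = max(Lk / 2.0, delta) if ared > eta * pred else Lk
        x = x_new
    return x

# Zero-residual example: F(x) = (x1 + x2 - 3, x1*x2 - 2) has root (1, 2).
F = lambda x: np.array([x[0] + x[1] - 3.0, x[0] * x[1] - 2.0])
JF = lambda x: np.array([[1.0, 1.0], [x[1], x[0]]])
x = lm_algebraic(F, JF, [0.5, 2.5])
assert np.linalg.norm(F(x)) < 1e-6
```

With the tiny initial estimate L₀, the early steps are essentially Gauss-Newton steps; the doubling/halving of L_k then mimics Steps 12-22.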
We analyze next the basic properties of Algorithm 1.

Proposition 3.1. If L_k ≥ L, then t_k = 1 and L_{k+1} = max{L_k/2, δ}.

Proof. Suppose that L_k ≥ L. From this assumption and the definition of µ_k (in Step 4) we have that

    µ_k ≥ (L/4) ( 2‖F(x_k)‖ + √(4‖F(x_k)‖² + ‖P(F(x_k))‖²) ),

where P is the orthogonal projection onto the range of JF(x_k). The first equality of the proposition follows from the above inequality; the definition of s_k (in Step 5); Theorem 2.5 with µ = µ_k, x = x_k, s = s_k; and Steps 6-10. The second equality comes from the first one and Steps 12-22.

Proposition 3.2. For all k,

    δ ≤ L_k ≤ max{L₀, 2L},    (10)

and, for infinitely many k's, t_k = 1.

Proof. Since L₀ ≥ δ and L_{k+1} ≥ max{L_k/2, δ} for all k, the first inequality in (10) holds for all k. We will prove the second inequality by induction on k. The inequality holds trivially for k = 0. Assume that it holds for some k. Steps 12-22 of the algorithm imply that if t_k = 1, then L_{k+1} = L_k or L_{k+1} = max{δ, L_k/2} ≤ L_k, and in both cases the inequality holds for k + 1. If t_k < 1, it follows from Proposition 3.1 that L_k < L and then L_{k+1} = 2L_k ≤ 2L. So the inequality holds for k + 1 and the induction proof is complete. To prove the second part of the proposition, suppose that t_k < 1 for all k ≥ k₀. Then

    L_k = 2^{k−k₀} L_{k₀},    k = k₀, k₀ + 1, ...,

in contradiction with (10).

From Proposition 3.2 and Step 4 of the algorithm, we have, for all k,

    δ‖F(x_k)‖ ≤ µ_k ≤ ((2 + √5)/4) max{L₀, 2L} ‖F(x_k)‖.    (11)

Proposition 3.3. For each k,

    ‖F(x_{k+1})‖² ≤ ‖F(x_k)‖² + βt_k ⟨JF(x_k)ᵀF(x_k), s_k⟩ ≤ ‖F(x_k)‖² − βt_k ‖JF(x_k)ᵀF(x_k)‖² / (‖JF(x_k)‖² + µ_k).    (12)

As a consequence, the sequence (‖F(x_k)‖) is strictly decreasing and

    Σ_{k=0}^∞ βt_k ‖JF(x_k)ᵀF(x_k)‖² / (‖JF(x_k)‖² + µ_k) ≤ ‖F(x₀)‖².

Proof. The first inequality follows from the stopping condition of the Armijo line search (Steps 6-10).
In view of the definition of s_k and µ_k, and the fact that µ_k > 0,

    −⟨JF(x_k)ᵀF(x_k), s_k⟩ = ⟨JF(x_k)ᵀF(x_k), (JF(x_k)ᵀJF(x_k) + µ_kI)⁻¹ JF(x_k)ᵀF(x_k)⟩ ≥ ‖JF(x_k)ᵀF(x_k)‖² / (‖JF(x_k)ᵀJF(x_k)‖ + µ_k),

which trivially implies the second inequality, because ‖JF(x_k)ᵀJF(x_k)‖ = ‖JF(x_k)‖². The last statement of the proposition follows directly from (12).
4 General convergence analysis

The convergence of Algorithm 1 is examined in the following results.

Proposition 4.1. If the sequence (x_k) is bounded, then it has a stationary accumulation point.

Proof. By Proposition 3.2, t_k = 1 for infinitely many k's. Since (x_k) is bounded, there exists a subsequence (x_{k_j}) convergent to some x̄, such that t_{k_j} = 1 for all j. Thus, by Proposition 3.3,

    Σ_{j=1}^∞ β ‖JF(x_{k_j})ᵀF(x_{k_j})‖² / (‖JF(x_{k_j})‖² + µ_{k_j}) ≤ ‖F(x₀)‖².

Moreover, since F and JF are continuous, ‖F(x_{k_j})‖, ‖JF(x_{k_j})‖ and µ_{k_j} are bounded. Hence the sequence (JF(x_{k_j})ᵀF(x_{k_j})) converges to 0. To end the proof, use again the continuity of F and JF to conclude that JF(x̄)ᵀF(x̄) = 0.

From the bound for the norm of F along the direction s, obtained in Lemma 2.3, we prove next that the step length is bounded away from zero.

Proposition 4.2. If δ > 0, then

    t_k ≥ 8δ² / (L² + 16Lδ)

for all k.

Proof. For any t ∈ [0, 1] it holds that

    (L²/4)t³‖s_k‖² + Lt‖F(x_k)‖ − µ_k ≤ t [ (L²/4)‖s_k‖² + L‖F(x_k)‖ ] − µ_k
        ≤ t [ (L²/(16µ_k))‖F(x_k)‖² + L‖F(x_k)‖ ] − µ_k
        ≤ t [ (L²/(16δ))‖F(x_k)‖ + L‖F(x_k)‖ ] − δ‖F(x_k)‖
        = δ‖F(x_k)‖ [ t ( L²/(16δ²) + L/δ ) − 1 ],    (13)

where the first inequality comes from the bound on t, the second from Theorem 2.5, and the third from the fact that µ_k ≥ δ‖F(x_k)‖, as stated in (11). From inequality (13) and Lemma 2.3, we have that if

    0 ≤ t ≤ 1 / ( L²/(16δ²) + L/δ ) = 16δ² / (L² + 16Lδ),

then ‖F(x_k + ts_k)‖² ≤ ‖F(x_k)‖² + t⟨s_k, JF(x_k)ᵀF(x_k)⟩. So, the result follows from Steps 6-10 of Algorithm 1.

We now present the global convergence result of Algorithm 1.
Proposition 4.3. If δ > 0, then all accumulation points of the sequence (x_k) are stationary for the function ‖F(x)‖².

Proof. Suppose that (x_{k_j}) converges to some x̄. From Propositions 3.3 and 4.2, we have that

    Σ_{j=1}^∞ ‖JF(x_{k_j})ᵀF(x_{k_j})‖² / (‖JF(x_{k_j})‖² + µ_{k_j}) < ∞.

It follows from (11) and from the continuity of F and JF that ‖JF(x_{k_j})‖² + µ_{k_j} is bounded. Therefore JF(x_{k_j})ᵀF(x_{k_j}) converges to 0.

A complexity result concerning the stationarity measure of Algorithm 1 is given next.

Proposition 4.4. If δ > 0 and (x_k) is bounded, then

    min_{i=1,...,k} ‖JF(x_i)ᵀF(x_i)‖ = O(1/√k).

Proof. Define M = sup_k { ‖JF(x_k)‖² + ((2 + √5)/4) max{L₀, 2L} ‖F(x_k)‖ }, which is finite because (x_k) is bounded and F, JF are continuous. Then, by (11) and Propositions 3.3 and 4.2, for any k,

    k β (8δ²/(L² + 16Lδ)) (1/M) min_{i=1,...,k} ‖JF(x_i)ᵀF(x_i)‖² ≤ Σ_{i=1}^k βt_i ‖JF(x_i)ᵀF(x_i)‖² / (‖JF(x_i)‖² + µ_i) ≤ ‖F(x₀)‖²,

and the conclusion follows.

5 Quadratic convergence under an error bound condition

Given F : Rⁿ → Rᵐ, consider the system

    F(x) = 0    (14)

and let X* be its solution set, that is,

    X* = {x ∈ Rⁿ : F(x) = 0}.    (15)

Our aim is to prove that (x_k) converges quadratically near solutions of (14) where, locally, ‖F(x)‖ provides an error bound for this system, in the sense of [17, Definition 1]. For completeness, we present this definition in the sequel (see Definition 5.3). Define, for γ > 0, S_γ : Rⁿ \ X* → Rⁿ,

    S_γ(x) = −( JF(x)ᵀJF(x) + γ‖F(x)‖I )⁻¹ JF(x)ᵀF(x).    (16)

Auxiliary bounds are established in the next two results.
Proposition 5.1. If x ∈ Rⁿ \ X*, γ > 0, s = S_γ(x) and x⁺ = x + s, then

    ‖JF(x⁺)ᵀF(x⁺)‖ ≤ (γ + L)‖F(x)‖‖s‖ + (L/2)‖JF(x⁺)‖‖s‖².

Proof. Direct algebraic manipulation yields

    JF(x⁺)ᵀF(x⁺) = JF(x)ᵀ(F(x) + JF(x)s) + (JF(x⁺) − JF(x))ᵀ(F(x) + JF(x)s) + JF(x⁺)ᵀ[F(x⁺) − (F(x) + JF(x)s)].

It follows from (16) that JF(x)ᵀ(F(x) + JF(x)s) = −γ‖F(x)‖s. To end the proof, combine this result with the above equality, assumption (1), Lemma 2.1 and Lemma 2.2.

Proposition 5.2. If x ∈ Rⁿ \ X*, x̄ ∈ X*, γ > 0, s = S_γ(x) and s̄ = x̄ − x, then

1. ‖F(x) + JF(x)s̄‖ ≤ (L/2)‖s̄‖²;

2. ‖F(x) + JF(x)s‖² + ‖JF(x)(s̄ − s)‖² + γ‖F(x)‖ ( ‖s‖² + ‖s̄ − s‖² ) ≤ (L²/4)‖s̄‖⁴ + γ‖F(x)‖‖s̄‖².

Proof. Since F(x̄) = 0,

    F(x) + JF(x)s̄ = F(x) + JF(x)(x̄ − x) − F(x̄),

and the first inequality follows from this equality and Lemma 2.1. Define

    ψ_{γ,x} : Rⁿ → R,    ψ_{γ,x}(u) = ‖F(x) + JF(x)u‖² + γ‖F(x)‖‖u‖².

From item 1 we have

    ψ_{γ,x}(s̄) ≤ (L²/4)‖s̄‖⁴ + γ‖F(x)‖‖s̄‖².

Observe that s = argmin_{u ∈ Rⁿ} ψ_{γ,x}(u). Since ψ_{γ,x} is a quadratic with Hessian 2(JF(x)ᵀJF(x) + γ‖F(x)‖I) and it is minimized by s,

    ψ_{γ,x}(s̄) = ψ_{γ,x}(s) + ‖JF(x)(s̄ − s)‖² + γ‖F(x)‖‖s̄ − s‖² = ‖F(x) + JF(x)s‖² + ‖JF(x)(s̄ − s)‖² + γ‖F(x)‖ ( ‖s‖² + ‖s̄ − s‖² ).

The second inequality of the proposition follows from the two above relations.

We will analyze the local convergence of the sequence (x_k) under the local error bound condition, as defined next, which is weaker than assuming nonsingularity of JF(x)ᵀJF(x) for x in the solution set X*.

Definition 5.3 ([17, Definition 1]). Let V be an open subset of Rⁿ such that V ∩ X* ≠ ∅, where X* is as in (15). We say that ‖F(x)‖ provides a local error bound on V for the system (14) if there exists a positive constant c such that

    c dist(x, X*) ≤ ‖F(x)‖

for all x ∈ V.
The next lemma, which will be instrumental in the main result of this section, was proved in [7, Corollary 2]. A proof is provided here for the sake of completeness, where the function F is simply continuously differentiable.

Lemma 5.4. If F : Rⁿ → Rᵐ is continuously differentiable and ‖F(x)‖ provides an error bound for (14) in a neighborhood V of x* ∈ X* as in Definition 5.3, then ‖JF(x)ᵀF(x)‖ also provides an error bound for (14) in some neighborhood of x*.

Proof. Suppose that V ⊂ Rⁿ is a neighborhood of x* where ‖F(x)‖ provides an error bound for (14), that is,

    c dist(x, X*) ≤ ‖F(x)‖,    x ∈ V.

Define, for x, y ∈ Rⁿ,

    R(y, x) = F(y) − (F(x) + JF(x)(y − x)).

Since F is continuously differentiable, there exists r > 0 such that B(x*, r) ⊂ V and

    ‖R(y, x)‖ ≤ (c/2)‖y − x‖,    y, x ∈ B(x*, r).

Take x ∈ B(x*, r/2) and x̄ ∈ argmin_{z ∈ X*} ‖z − x‖. Then dist(x, X*) = ‖x̄ − x‖ < r/2, x̄ ∈ B(x*, r) and, in view of the above assumptions,

    c‖x̄ − x‖ ≤ ‖F(x)‖,    ‖R(x̄, x)‖ ≤ (c/2)‖x̄ − x‖.    (17)

Since F(x̄) = 0, JF(x)(x̄ − x) = −F(x) + R(x̄, x) and

    ⟨JF(x)ᵀF(x), x̄ − x⟩ = ⟨F(x), JF(x)(x̄ − x)⟩ = −‖F(x)‖² + ⟨F(x), R(x̄, x)⟩.

Using the above equalities, the Cauchy-Schwarz inequality and (17), we conclude that

    ‖JF(x)ᵀF(x)‖ ‖x̄ − x‖ ≥ ‖F(x)‖ ( ‖F(x)‖ − ‖R(x̄, x)‖ ) ≥ ‖F(x)‖ ( c‖x̄ − x‖ − (c/2)‖x̄ − x‖ ) ≥ (c²/2)‖x̄ − x‖².

Therefore, ‖JF(x)ᵀF(x)‖ ≥ (c²/2) dist(x, X*) for any x ∈ B(x*, r/2).

From now on, in this section, x* ∈ Rⁿ and r, c > 0 are such that

    F(x*) = 0,    c dist(x, X*) ≤ ‖F(x)‖ for all x ∈ B(x*, r).    (18)

Another auxiliary result is provided next.

Lemma 5.5. Consider x* ∈ Rⁿ and r, c > 0 satisfying (18). If x ∈ B(x*, r) \ X*, x̄ ∈ argmin_{u ∈ X*} ‖u − x‖, γ > 0, s = S_γ(x) and s̄ = x̄ − x, then

1. ‖s̄‖ = dist(x, X*) ≤ ‖F(x)‖/c;

2. ‖s‖ ≤ ( 1 + (L²/(4cγ))‖s̄‖ )^{1/2} ‖s̄‖;

3. ‖F(x) + JF(x)s‖ ≤ ( (L²/(4c²γ²))‖s̄‖² + (1/(cγ))‖s̄‖ )^{1/2} γ‖F(x)‖.
Moreover, if ‖s̄‖ ≤ cγ/(2L²), then

    ‖F(x + s)‖² ≤ ‖F(x)‖² + ⟨s, JF(x)ᵀF(x)⟩.

Proof. The first relation in item 1 follows trivially from the definition of x̄, while the second relation comes from (18). From item 2 of Proposition 5.2 and item 1 of this lemma, we have that

    γ‖F(x)‖‖s‖² ≤ (L²/4)‖s̄‖⁴ + γ‖F(x)‖‖s̄‖² ≤ (L²/(4c))‖F(x)‖‖s̄‖³ + γ‖F(x)‖‖s̄‖²

and

    ‖F(x) + JF(x)s‖² ≤ (L²/4)‖s̄‖⁴ + γ‖F(x)‖‖s̄‖² ≤ ( (L²/(4c²γ²))‖s̄‖² + (1/(cγ))‖s̄‖ ) γ²‖F(x)‖²,

which trivially imply items 2 and 3, respectively. To prove the last part of the lemma, suppose that ‖s̄‖ ≤ cγ/(2L²) and define

    a = (L²/4)‖s‖² + L‖F(x) + JF(x)s‖ − γ‖F(x)‖,    w = (L²/(4cγ))‖s̄‖.

From items 2 and 3, we have

    a ≤ (L²/4)(1 + w)‖s̄‖² + L ( (L²/(4c²γ²))‖s̄‖² + (1/(cγ))‖s̄‖ )^{1/2} γ‖F(x)‖ − γ‖F(x)‖
      ≤ [ w(1 + w) + (4w² + 4w)^{1/2} − 1 ] γ‖F(x)‖
      = [ w(1 + w) + 2√w (1 + w)^{1/2} − 1 ] γ‖F(x)‖,

where the second inequality follows from item 1 and the equality comes from the definition of w. To end the proof, observe that since w ≤ 1/8, a < 0, and use the first inequality in Lemma 2.3 with t = 1 and µ = γ‖F(x)‖.

Observe that in Algorithm 1, s_k = S_{γ_k}(x_k) for γ_k = µ_k/‖F(x_k)‖. In order to simplify the proofs, define

    D = ((2 + √5)/4) max{L₀, 2L},    γ_k = µ_k/‖F(x_k)‖,    k ∈ N.    (19)

In view of (11) and the definition of s_k in Algorithm 1, for all k ∈ N,

    δ ≤ γ_k ≤ D,    s_k = S_{γ_k}(x_k).    (20)

Assuming that the residual function F provides an error bound for the solution set of the zero-residual NLS problem, the local convergence of Algorithm 1 is established as follows.

Theorem 5.6. Consider x* ∈ Rⁿ and r, c > 0 satisfying (18). There exists r̄ > 0 such that if x_{k₀} ∈ B(x*, r̄) for some k₀, then either Algorithm 1 stops at some x_k solution of (14) or (x_k) converges q-quadratically to some x̂ solution of (14).
Proof. In view of Lemma 5.4, there exist 0 < r₁ ≤ r and c₁ > 0 such that

    c₁ dist(x, X*) ≤ ‖JF(x)ᵀF(x)‖,    x ∈ B(x*, r₁).    (21)

Define

    M₁ = max { ‖JF(x)‖ : x ∈ B(x*, r₁) },    M₂ = (3M₁/(2c₁)) ( D + 7L/4 )

and

    ρ = min { cδ/(2L²), r₁/3, 1/(2M₂) }.

We claim that if

    x ∈ B(x*, ρ) \ X*,  δ ≤ γ ≤ D,  s = S_γ(x),  x⁺ = x + s,  x̄ ∈ argmin_{u ∈ X*} ‖u − x‖,  s̄ = x̄ − x,    (22)

then

    ‖s‖ ≤ (3/2) dist(x, X*) ≤ (3/2)ρ,    (23)

    dist(x⁺, X*) ≤ M₂ dist(x, X*)² ≤ dist(x, X*)/2,    (24)

    ‖F(x⁺)‖² ≤ ‖F(x)‖² + ⟨s, JF(x)ᵀF(x)⟩.    (25)

The first inequality in (23) follows from item 2 of Lemma 5.5 and the definitions of s̄ and ρ. The second inequality comes from the inclusions x̄ ∈ X* and x ∈ B(x*, ρ). To prove (24), first observe that

    ‖x⁺ − x*‖ ≤ ‖x − x*‖ + ‖s‖ ≤ ρ + (3/2)ρ < r₁,

which comes from the definition of x⁺, the triangle inequality, (23) and the definition of ρ. Consequently, from (21) applied at x⁺ and Proposition 5.1,

    dist(x⁺, X*) ≤ (1/c₁) ‖JF(x⁺)ᵀF(x⁺)‖ ≤ (1/c₁) ( (γ + L)‖F(x)‖ + (L/2)‖JF(x⁺)‖‖s‖ ) ‖s‖.

Since JF is continuous, F(x) = F(x) − F(x̄) = ∫₀¹ JF(x̄ + t(x − x̄))(x − x̄) dt. As F(x̄) = 0, we have

    ‖F(x)‖ ≤ ( max_{z ∈ B(x*, r₁)} ‖JF(z)‖ ) ‖s̄‖ ≤ M₁‖s̄‖.

Using the two above displayed relations, the bounds for γ in (22), (23), and the definitions of M₁ and M₂, we have

    dist(x⁺, X*) ≤ (3M₁/(2c₁)) ( D + 7L/4 ) ‖s̄‖² = M₂ dist(x, X*)²,

which proves the first inequality of (24). The second inequality in (24) comes from the fact that dist(x, X*) ≤ ρ and from the definition of ρ.
Our third claim, (25), follows directly from the last part of Lemma 5.5, since ‖s̄‖ ≤ ρ ≤ cγ/(2L²).

Next we define a family of sets in Rⁿ which are, in some sense, well behaved with respect to Algorithm 1. Let r̄ = ρ/4 and define, for 0 ≤ τ < 1,

    W_τ = { x ∈ Rⁿ : ‖x − x*‖ ≤ (1 + 3τ)r̄,  dist(x, X*) ≤ (1 − τ)r̄ },    W = ∪_{0 ≤ τ < 1} W_τ.

Since x* ∈ X* and dist(x, X*) ≤ ‖x − x*‖, we have W₀ = B(x*, r̄). Hence

    B(x*, r̄) ⊂ W ⊂ B(x*, ρ) ⊂ B(x*, r₁).

Let x_k ∈ W. If JF(x_k)ᵀF(x_k) = 0, then the algorithm stops at x_k and, in view of (21), F(x_k) = 0. Next we analyze the case JF(x_k)ᵀF(x_k) ≠ 0. It follows from (20) and (25) that the full step is accepted, that is, x_{k+1} = x_k + S_{γ_k}(x_k) = x_k + s_k. As x_k ∈ W, we have x_k ∈ W_τ for some 0 ≤ τ < 1. Therefore, from the triangle inequality, the definition of W_τ, and (23),

    ‖x_{k+1} − x*‖ ≤ ‖x_k − x*‖ + ‖s_k‖ ≤ (1 + 3τ)r̄ + (3/2) dist(x_k, X*) ≤ (1 + 3τ)r̄ + (3/2)(1 − τ)r̄ = ( 1 + 3(1 + τ)/2 ) r̄.

Additionally, from (24) and the definition of W_τ, we also have

    dist(x_{k+1}, X*) ≤ (1/2) dist(x_k, X*) ≤ ((1 − τ)/2) r̄ = ( 1 − (1 + τ)/2 ) r̄.

Altogether we proved that

    0 ≤ τ < 1,  x_k ∈ W_τ,  JF(x_k)ᵀF(x_k) ≠ 0   ⟹   x_{k+1} ∈ W_{(1+τ)/2}.

Suppose that x_{k₀} ∈ B(x*, r̄). We have just proved that, in this case, either the algorithm stops at some x_k ∈ X* or an infinite sequence is generated with x_k ∈ B(x*, ρ) for all k ≥ k₀. Assume that an infinite sequence is generated and define

    d_k = dist(x_k, X*).

It follows from (23) and (24) that, for k ≥ k₀,

    ‖s_k‖ ≤ (3/2) d_k,    d_{k+1} ≤ M₂ d_k² ≤ d_k/2.    (26)
Hence,

    Σ_{j=k₀}^∞ ‖x_{j+1} − x_j‖ ≤ (3/2) Σ_{j=k₀}^∞ d_j ≤ 3 d_{k₀}.

As (x_k) is a Cauchy sequence, it converges to some x̂. Since d_k = dist(x_k, X*) converges to 0, F(x̂) = 0, that is, x̂ ∈ X*. By the triangle inequality and the first inequality in (26),

    ‖x_k − x̂‖ ≤ ‖s_k‖ + Σ_{j=k+1}^∞ ‖s_j‖ ≤ (3/2) ( d_k + Σ_{j=k+1}^∞ d_j ).

From the two last inequalities in (26) we have

    Σ_{j=k+1}^∞ d_j ≤ M₂ Σ_{j=k}^∞ d_j² ≤ 2M₂ d_k² ≤ d_k.

Therefore, from the two previously displayed relations,

    d_k ≤ ‖x_k − x̂‖ ≤ 3 d_k,

where the first inequality comes from the inclusion x̂ ∈ X*. Consequently, using (26) again and the definition of d_k, we conclude that

    ‖x_{k+1} − x̂‖ ≤ 3 d_{k+1} ≤ 3M₂ d_k² ≤ 3M₂ ‖x_k − x̂‖²,

ensuring the q-quadratic convergence and completing the proof.

6 Numerical experiments

In this section we describe the experiments prepared to illustrate the practical performance of the algorithm. We start with the algorithmic and computational choices made for obtaining the numerical results.

6.1 On the choice of µ_k

Similarly to the analysis developed for the unconstrained minimization problem (cf. [11]), as the positive scalar µ is a lower bound for the smallest eigenvalue of JF(x)ᵀJF(x) + µI, it follows that

    µ‖s‖² ≤ ⟨s, (JF(x)ᵀJF(x) + µI)s⟩ = −⟨s, JF(x)ᵀF(x)⟩ ≤ ‖s‖ ‖JF(x)ᵀF(x)‖,

and so

    ‖s‖ ≤ ‖JF(x)ᵀF(x)‖ / µ.

By Lemma 2.3, with t = 1,

    ‖F(x + s)‖² ≤ ‖F(x)‖² + ⟨F(x), JF(x)s⟩ + ‖s‖² [ (L²/4)‖s‖² + L‖F(x)‖ − µ ].

For obtaining µ that ensures (L²/4)‖s‖² + L‖F(x)‖ − µ ≤ 0, the previous bound upon the step gives

    L²‖JF(x)ᵀF(x)‖² + 4L‖F(x)‖µ² − 4µ³ ≤ 0.    (27)
Let µ² = z and consider the concave function ψ : R₊ → R given by

    ψ(z) = L²‖JF(x)ᵀF(x)‖² + 4L‖F(x)‖z − 4z^{3/2}.

Consider z₀ = L²‖F(x)‖². Note that ψ(z₀) = L²‖JF(x)ᵀF(x)‖² ≥ 0. An iteration of Newton's method from z₀ gives us

    z₁ = z₀ − ψ(z₀)/ψ′(z₀) = L²‖F(x)‖² + L‖JF(x)ᵀF(x)‖² / (2‖F(x)‖).

Then (27) is guaranteed for all

    µ ≥ √( L²‖F(x)‖² + L‖JF(x)ᵀF(x)‖² / (2‖F(x)‖) ) =: µ_J.

Observe that L‖F(x)‖ ≤ µ_J and µ⁺ = ((2 + √5)/4) L‖F(x)‖ ≥ L‖F(x)‖. Nevertheless, there is no guarantee that µ_J ≤ µ⁺ holds, so we set µ_k = min{µ_k⁺, µ_J}.

6.2 On the computation of the step s_k

The equivalence between the problems

    min_s ‖J_k s + F_k‖² + µ_k‖s‖²    and    min_s ‖ [J_k; √µ_k I] s + [F_k; 0] ‖²

implies that the system

    (J_kᵀJ_k + µ_kI)s + J_kᵀF_k = 0    (28)

may be handled by the normal equations as follows (cf. [14]):

    [J_kᵀ  √µ_k I] ( [J_k; √µ_k I] s + [F_k; 0] ) = 0.

As an alternative to the Cholesky factorization of J_kᵀJ_k + µ_kI, the QR factorization of the stacked matrix [J_k; √µ_k I] avoids performing the product J_kᵀJ_k:

    [J_k; √µ_k I] = Q [R_µ; 0],

where Q ∈ R^{(m+n)×(m+n)} is orthogonal and R_µ ∈ R^{n×n} is upper triangular. Indeed, the economic version of the factorization may be used, in which just the first n columns of the orthogonal matrix Q are computed. The direction s is thus obtained from the system (28) by means of the relationship

    J_kᵀJ_k + µ_kI = R_µᵀR_µ.

6.3 About the backtracking scheme

Besides the simple bisection of Steps 6-10 of Algorithm 1, we have implemented a quadratic-cubic interpolation scheme (cf. [3, §6.3.2]), which performed slightly better in preliminary experiments, so it was the choice for the reported results.
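The QR route of Section 6.2 can be sketched as follows (our own illustration using NumPy's economic QR, not the authors' Matlab code); it reproduces the step obtained from the normal equations without forming J_kᵀJ_k:

```python
import numpy as np

def lm_step_qr(J, F, mu):
    """LM step via QR of the (m+n) x n stacked matrix [J; sqrt(mu) I],
    avoiding the explicit product J^T J (cf. Section 6.2)."""
    m, n = J.shape
    A = np.vstack([J, np.sqrt(mu) * np.eye(n)])
    b = np.concatenate([F, np.zeros(n)])
    Q, R = np.linalg.qr(A)            # economic QR: Q is (m+n) x n
    # Least-squares solution of A s = -b: R s = -Q^T b, which is the
    # normal-equations system since A^T A = R^T R = J^T J + mu I.
    return np.linalg.solve(R, -(Q.T @ b))

J = np.array([[2.0, 0.0], [1.0, 3.0], [0.0, 1.0]])
F = np.array([1.0, -1.0, 2.0])
mu = 0.7
s_qr = lm_step_qr(J, F, mu)
s_ne = np.linalg.solve(J.T @ J + mu * np.eye(2), -(J.T @ F))
assert np.allclose(s_qr, s_ne)
```

The √µ I block guarantees that the stacked matrix has full column rank, so R is nonsingular even when J is rank deficient; this is the usual argument for preferring the QR route on ill-conditioned Jacobians.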
6.4 About the initialization of the sequence (L_k)

We set L₀ = 10⁻⁶, computed x₁ and then defined

    L₁ = max { ‖JF(x₁) − JF(x₀)‖ / ‖x₁ − x₀‖, δ }.

After that, the updating scheme follows Steps 12-22 of Algorithm 1.

6.5 The results

The tests were performed in a notebook DELL Intel Core i7-4510U, CPU 2.00GHz × 4, with 16GB RAM, Inspiron A30, 64-bit, using Matlab 2014a. The remaining parameters of Algorithm 1 were defined as β = 10⁻⁴, η = 0.5 and a small safeguard δ > 0. The set of test problems consists of all 53 problems from the CUTEst collection [9] such that the nonlinear equations F(x) = 0 are recast from feasibility instances, i.e., problems without objective function, with nonlinear equality constraints and without fixed variables, for which the Jacobian matrix is available. The initial point x₀ was always the default of the collection. To put our approach in perspective, the same problems were addressed by the benchmark modular code lsqnonlin (available within the Matlab software), based on the Levenberg-Marquardt method [12, 13, 15]. Summing up, there are three strategies under analysis:

S₁: lsqnonlin;
S₂: Algorithm 1 with Cholesky factorization to compute s_k;
S₃: Algorithm 1 with QR factorization to compute s_k.

The stopping criteria of the three strategies were the following, where ε_mac is the machine precision:

(1) Convergence to a solution with relative stationarity: ‖JF(x_k)ᵀF(x_k)‖ ≤ ε max{‖JF(x₀)ᵀF(x₀)‖, ε_mac}, for a prescribed tolerance ε;
(2) Change in x too small: t_k‖s_k‖ ≤ 10⁻⁹ (√ε_mac + ‖x_{k+1}‖);
(3) Change in the norm of the residue too small: |‖F(x_{k+1})‖² − ‖F(x_k)‖²| ≤ 10⁻⁶ ‖F(x_k)‖²;
(4) Computed search direction too small: ‖s_k‖ ≤ 10⁻⁹;
(0) Maximum number of function evaluations exceeded: max{2000, 10n};
(−4) Line search failed, as the stepsize t_k is too small.

Let f_i* be the objective function value obtained by strategy S_i when applied to a given problem, run with the default initial point. We consider that the strategy S_i found a solution (cf.
[1]) if

    (f_i* − f_min) / max{1, f_min} ≤ 0.01,    (29)

where f_min = min{f_i* : i = 1, 2, 3} and f(x) = ‖F(x)‖². The complete outputs are reported in the Appendix. Tables 1-2 list the 53 test problems, displaying in the columns the problem name; the dimensions (n and m); the number of performed iterations (#Iter); the number of function evaluations (#Fun); the function value at the last iterate
(f*); the demanded CPU time in seconds (CPU); and the reason for stopping (exit) for the three solvers, namely S₁, S₂ and S₃, the results of which are displayed row by row for each problem. As it is usual to have some variation of the CPU time from one execution of an algorithm to another, for each problem we ran all the solvers 5 times and considered the average CPU time of the 5 runs. A marker indicates that the obtained solution does not satisfy (29). It is worth mentioning that S₂ spent one Cholesky factorization per iteration. The only exception occurred at a single iteration of the problem 10FOLDTR, in which S₂ demanded an additional Cholesky factorization with a slight increase of µ_k, to ensure the numerically safe positive definiteness of J_kᵀJ_k + µ_kI.

The results corresponding to the solved problems are depicted in the performance profiles of Figure 1 for the number of iterations, the number of function evaluations and the demanded CPU time, respectively, where a logarithmic scale was used in the horizontal axis for better visualization of the results. In terms of efficiency, our strategies slightly outperform the benchmark when it comes to the number of iterations and function evaluations. Indeed, 56.6% of the problems were solved with the fewest number of iterations by S₁ and 58.5% by S₂ and S₃, whereas 52.8% of the problems were solved with the fewest number of function evaluations by S₁, and 60.4% by S₂ and S₃. Now, concerning the CPU time, the difference is more significant, as just 1.9% of the problems were solved with the least amount of time by S₁, against 28.3% by S₂ and 69.8% by S₃. Strategies S₂ and S₃ are the most robust, solving 52 of the 53 problems. Problem EIGENB was considered not solved by strategy S₂, and 10FOLDTR was not solved by S₃. On the other hand, the benchmark did not solve three problems (10FOLDTR, CYCLIC3, YATP2SQ), showing that our approach is competitive.
Figure 1: Performance profiles for the number of iterations, the number of function evaluations and the demanded CPU time.

With the aim of illustrating the rate of convergence of the proposed algebraic rules, Figure 2 shows the logarithm (base 10) of the residual value against iterations for both strategies S₂ and S₃ for two typical problems.
Figure 2: The residual value ‖F(x_k)‖ against iterations, for the problems HEART6 (left) and HYDCAR6 (right).

7 Final remarks

We have proposed and analyzed a class of Levenberg-Marquardt methods in which algebraic rules for computing the regularization parameter were devised. Under the Lipschitz continuity of the Jacobian of the residual function, the algebraic rules were obtained so as to allow the full LM step to be accepted by the Armijo sufficient-decrease condition. In terms of global convergence, all the accumulation points of the sequence generated by the algorithm are stationary. When it comes to local convergence, for zero-residual problems we have proved a q-quadratic rate under an error bound condition, which is less restrictive than assuming nonsingularity at the solution. A set of numerical experiments was prepared to illustrate the practical performance of the proposed algorithm. Our approach has shown both efficiency and robustness for the 53 feasibility instances from the CUTEst collection, compared with the benchmark lsqnonlin of Matlab. Obtaining inexact solutions for the linear systems, with adequately matching stopping criteria, as well as considering nonzero-residual problems in the local convergence analysis, are topics for future investigation.

Acknowledgements. The authors are thankful to Abel S. Siqueira for installing the CUTEst and preparing our machine to run it within Matlab.

References

[1] E. G. Birgin and J. M. Gentil. Evaluating bound-constrained minimization software. Comput. Optim. Appl., 53(2), 2012.

[2] H. Dan, N. Yamashita, and M. Fukushima. Convergence properties of the inexact Levenberg-Marquardt method under local error bound conditions. Optim. Methods Softw., 17(4):605-626, 2002.

[3] J. E. Dennis, Jr. and R. B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations, volume 16 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1996. Corrected reprint of the 1983 original.
[4] J. Fan. Accelerating the modified Levenberg-Marquardt method for nonlinear equations. Math. Comp., 83(287), 2014.

[5] J. Fan and J. Pan. Convergence properties of a self-adaptive Levenberg-Marquardt algorithm under local error bound condition. Comput. Optim. Appl., 34(1):47–62, 2006.

[6] J. Fan and Y.-X. Yuan. On the quadratic convergence of the Levenberg-Marquardt method without nonsingularity assumption. Computing, 74(1):23–39, 2005.

[7] A. Fischer. Local behavior of an iterative framework for generalized equations with nonisolated solutions. Math. Program., 94(1, Ser. A):91–124, 2002.

[8] A. Fischer, P. K. Shukla, and M. Wang. On the inexactness level of robust Levenberg-Marquardt methods. Optimization, 59(2):273–287, 2010.

[9] N. Gould, D. Orban, and P. Toint. CUTEst: a constrained and unconstrained testing environment with safe threads. Technical Report RAL-TR, Rutherford Appleton Laboratory, Oxfordshire, England, 2013.

[10] S. Gratton, A. S. Lawless, and N. K. Nichols. Approximate Gauss-Newton methods for nonlinear least squares problems. SIAM J. Optim., 18(1):106–132, 2007.

[11] E. W. Karas, S. A. Santos, and B. F. Svaiter. Algebraic rules for quadratic regularization of Newton's method. Comput. Optim. Appl., 60, 2014.

[12] K. Levenberg. A method for the solution of certain non-linear problems in least squares. Quart. Appl. Math., 2:164–168, 1944.

[13] D. W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Indust. Appl. Math., 11:431–441, 1963.

[14] J. J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In G. A. Watson, editor, Lecture Notes in Mathematics 630: Numerical Analysis, pages 105–116. Springer-Verlag, New York, 1978.

[15] D. D. Morrison. Methods for nonlinear least squares problems and convergence proofs. In J. Lorell and F. Yagi, editors, Proceedings of the Seminar on Tracking Programs and Orbit Determination, pages 1–9. Jet Propulsion Laboratory, Pasadena, CA, USA, 1960.

[16] K. Ueda and N. Yamashita.
On a global complexity bound on the Levenberg-Marquardt method. J. Optim. Theory Appl., 147, 2010.

[17] N. Yamashita and M. Fukushima. On the rate of convergence of the Levenberg-Marquardt method. In Topics in Numerical Analysis, Comput. Suppl. 15, pages 239–249, 2001.

[18] Y.-X. Yuan. Recent advances in numerical methods for nonlinear equations and nonlinear least squares. Numer. Algebra Control Optim., 1(1):15–34, 2011.

Appendix

The complete computational results are presented next.
Table 1: Numerical results (columns: Problem, n, m, #Iter, #Fun, f, CPU, exit) for the problems FOLDTR, ARGAUSS, ARGLALE, ARGLBLE, ARGTRIG, ARWHDNE, BOOTH, BROWNALE, BROYDN3D, BROYDNBD, CHANDHEU, CHNRSBNE, CLUSTER, COOLHANS, CUBENE, CYCLIC, EIGENAU, EIGENB, EIGENC, GOTTFR, GROWTH, HATFLDF, HATFLDG, HEART6, HEART8, HIMMELBA and HIMMELBC.
Table 2: Numerical results (columns: Problem, n, m, #Iter, #Fun, f, CPU, exit) for the problems HIMMELBD, HIMMELBE, HYDCAR6, HYDCAR20, HYPCIR, KSS, METHANB, METHANL, MSQRTA, MSQRTB, OSCIGRNE, OSCIPANE, POWELLBS, POWELLSQ, RECIPE, RSNBRNE, SINVALNE, SPIN, SPIN2, SPMSQRT, YATP1NE, YATP1SQ, YATP1SS, YATP2SQ, YFITNE and ZANGWIL3.
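For reference, the basic shape of the Levenberg-Marquardt iteration benchmarked in these tables can be sketched as follows. This is a deliberately minimal Python illustration: it uses a simple doubling/halving damping update together with an Armijo acceptance test on the full step, and it does not implement the paper's algebraic rules for the regularization parameter; all names are illustrative.

```python
import numpy as np

def lm_solve(F, J, x0, mu0=1e-2, sigma=1e-4, tol=1e-8, max_iter=100):
    """Generic Levenberg-Marquardt method with an Armijo acceptance test.

    F : residual function R^n -> R^m;  J : its Jacobian.
    Minimizes f(x) = 0.5 * ||F(x)||^2.  The damping update below is a
    plain doubling/halving heuristic, used only for illustration.
    """
    x = np.asarray(x0, dtype=float)
    mu = mu0
    for _ in range(max_iter):
        r = F(x)
        Jx = J(x)
        g = Jx.T @ r                      # gradient of 0.5*||F||^2
        if np.linalg.norm(g) <= tol:
            break
        # Regularized normal equations: (J^T J + mu I) d = -J^T r
        d = np.linalg.solve(Jx.T @ Jx + mu * np.eye(x.size), -g)
        # Armijo sufficient decrease on the full LM step
        f0 = 0.5 * r @ r
        f1 = 0.5 * F(x + d) @ F(x + d)
        if f1 <= f0 + sigma * (g @ d):    # accept: move, relax damping
            x = x + d
            mu = max(mu / 2.0, 1e-12)
        else:                             # reject: increase damping
            mu *= 2.0
    return x
```

For small damping the step approaches the Gauss-Newton step, which is what yields fast local convergence on zero-residual problems; for large damping it approaches a short steepest-descent step, which guarantees that the Armijo test is eventually satisfied.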