Algebraic rules for computing the regularization parameter of the Levenberg-Marquardt method
E. W. Karas    S. A. Santos    B. F. Svaiter

May 7, 2015

Abstract

A class of Levenberg-Marquardt methods for solving the nonlinear least-squares problem is proposed, with explicit algebraic rules for computing the regularization parameter. The convergence properties of this class of methods are analyzed. All accumulation points of the generated sequence are proved to be stationary. A q-quadratic rate of convergence for the zero-residual problem is obtained under an error bound condition. Illustrative numerical experiments are presented, with encouraging results.

Keywords: Nonlinear least-squares problems; Levenberg-Marquardt method; regularization; global convergence; local convergence; computational results.

AMS Classification: 90C30, 65K05, 49M37.

1 Introduction

Given F : Rⁿ → Rᵐ, the nonlinear least-squares (NLS) problem is stated as

    min_{x ∈ Rⁿ} ‖F(x)‖²,

where ‖·‖ is the Euclidean norm. It is important from both theoretical and practical viewpoints [3, 18]. The seminal works of Levenberg [12], Morrison [15] and Marquardt [13] provided a regularization strategy for improving the convergence of the Gauss-Newton (GN) method, possibly the first and simplest choice of a method to address the NLS problem. In the GN method, the nonlinear residual function F is replaced by its linear approximation at the current iterate, so that the NLS problem is solved by means of a sequence of quadratic problems, without any additional parameter. In the Levenberg-Marquardt (LM) method, a sequence of quadratic problems is also generated, but a regularization term is introduced, depending on the so-called LM parameter in an essential way.
In practice, especially for very large problems, such as those arising in data assimilation, it may be necessary to make further approximations within the GN method in order to reduce computational

---
* Department of Mathematics, Federal University of Paraná, CP 19081, Curitiba, PR, Brazil. This author was partially supported by CNPq grants /2013-3 and /.
† Department of Applied Mathematics, University of Campinas, Campinas, SP, Brazil. This author was partially supported by CNPq grant 30403/2010-7, FAPESP grants 2013/ and 2013/, and PRONEX Optimization.
‡ IMPA, Estrada Dona Castorina, 110, Rio de Janeiro, Brazil. This author was partially supported by CNPq grants 3096/2011-5 and /2010-7, FAPERJ grant E-26/102.940/2011 and PRONEX Optimization.
costs. Gratton et al. [10] point out conditions ensuring that the truncated and perturbed GN methods converge, and also derive rates of convergence for the iterations. Inexact versions of the LM method have also been considered, usually under an error bound condition, which is weaker than assuming nonsingularity of the appropriate matrix at a solution, namely the Jacobian JF ∈ Rᵐˣⁿ if m = n, or the square matrix JFᵀJF otherwise [2, 8]. Local convergence of the LM method has been studied under an error bound condition as well. Yamashita and Fukushima [17] proved a q-quadratic rate for the LM method with the LM parameter sequence set as µ_k = ‖F(x_k)‖². Later, this q-quadratic rate was extended by Fan and Yuan [6] to the setting µ_k = ‖F(x_k)‖^δ, with δ ∈ [1, 2]. Fan and Pan [5] enlarged the analysis to the variation of the exponent δ ∈ (0, 1) within the aforementioned update of the LM parameter, showing local superlinear convergence in this case. Accelerated versions of the LM method have been recently proposed by Fan [4], in which cubic local convergence is reached. When it comes to complexity analysis, Ueda and Yamashita [16] have investigated a global complexity bound for the LM method.

In this work we devise algebraic rules for computing the LM parameter, so that µ_k belongs to an interval that is proportional to ‖F(x_k)‖. Under Lipschitz continuity of the Jacobian, the rules are obtained with the aim of accepting the full LM step in the Armijo sufficient-decrease condition. On one hand, when the Lipschitz constant is available, we prove that the full steps are accepted. On the other hand, in case the Lipschitz constant is unknown, we propose a scheme for dynamically updating an estimate of such a constant, within a well-defined and globally convergent algorithm for solving the NLS problem. The q-quadratic rate of convergence for the zero-residual problem is obtained under an error bound condition.
Numerical results illustrate the performance of the proposed algorithm for a set of test problems from the CUTEst collection, with promising results in terms of both efficiency and robustness.

The text is organized as follows. The basic results that generate the proposed algorithm are presented in Section 2. The algorithm and its underlying properties are given in Section 3. The global convergence analysis is provided in Section 4, whereas the local convergence result is developed in Section 5. The numerical experiments are presented and examined in Section 6. A summary of the work is stated in Section 7, and the tables with the complete numerical results compose the Appendix.

2 Technical results

We consider F : Rⁿ → Rᵐ of class C¹ with an L-Lipschitz continuous Jacobian JF(x) ∈ Rᵐˣⁿ, that is, for all x, y ∈ Rⁿ,

    ‖JF(x) − JF(y)‖ ≤ L‖x − y‖,    (1)

where on the left-hand side of the previous inequality we have the operator norm induced by the canonical norms in Rⁿ and Rᵐ. More specifically, we are concerned with the choice of the regularization parameter µ in the Levenberg-Marquardt method,

    s = −(JF(x)ᵀJF(x) + µI)⁻¹ JF(x)ᵀF(x),    x⁺ = x + ts,    0 < t ≤ 1,

where x⁺ is the new iterate and t is the step length. Initially, a classical result is recalled [3, Lemma 4.1.1].

Lemma 2.1 (Linearization error). For any x, s ∈ Rⁿ,

    ‖F(x + s) − (F(x) + JF(x)s)‖ ≤ (L/2)‖s‖².
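The LM step above can be checked numerically. The following sketch (NumPy; the function name and the toy data are our own illustration, not from the paper) computes the step for a given µ and verifies that it minimizes the regularized linear model:

```python
import numpy as np

def lm_step(J, F, mu):
    """Levenberg-Marquardt step s = -(J^T J + mu I)^{-1} J^T F,
    the minimizer of the regularized model ||F + J u||^2 + mu ||u||^2."""
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + mu * np.eye(n), -(J.T @ F))

J = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])
F = np.array([1.0, -2.0, 0.5])
mu = 0.1
s = lm_step(J, F, mu)
# The gradient of the regularized model vanishes at its minimizer s ...
grad = J.T @ (F + J @ s) + mu * s
assert np.allclose(grad, 0.0)
# ... and, as Lemma 2.2 below states with t = 1, the linearized
# residual does not exceed the current residual.
assert np.linalg.norm(F + J @ s) <= np.linalg.norm(F)
```

For µ → 0 this step tends to the Gauss-Newton step, while large µ shrinks it toward a short steepest-descent-like step.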
From now on, in this section, s is the Levenberg-Marquardt step at x ∈ Rⁿ with regularization parameter µ > 0, that is,

    s = argmin_u ‖F(x) + JF(x)u‖² + µ‖u‖² = −(JF(x)ᵀJF(x) + µI)⁻¹ JF(x)ᵀF(x).    (2)

Define also φ : R → R,

    φ(t) = ‖F(x + ts)‖².    (3)

Note that

    φ′(0) = 2⟨F(x), JF(x)s⟩ = −2⟨JF(x)ᵀF(x), (JF(x)ᵀJF(x) + µI)⁻¹ JF(x)ᵀF(x)⟩ ≤ 0.    (4)

First we analyze the norm of the linearization of F along the direction s.

Lemma 2.2 (Linearization's norm). For any t ∈ [0, 1],

    ‖F(x) + tJF(x)s‖² = ‖F(x)‖² + t⟨F(x), JF(x)s⟩ + (t² − t)‖JF(x)s‖² − tµ‖s‖² ≤ ‖F(x)‖².

Proof.

    ‖F(x) + tJF(x)s‖² = ‖F(x)‖² + 2t⟨JF(x)ᵀF(x), s⟩ + t²‖JF(x)s‖²
        = ‖F(x)‖² + t⟨F(x), JF(x)s⟩ + t⟨JF(x)ᵀF(x), s⟩ + t²‖JF(x)s‖²
        = ‖F(x)‖² + t⟨F(x), JF(x)s⟩ − t⟨(JF(x)ᵀJF(x) + µI)s, s⟩ + t²‖JF(x)s‖²
        = ‖F(x)‖² + t⟨F(x), JF(x)s⟩ + (t² − t)‖JF(x)s‖² − tµ‖s‖².

Using (4) and the fact that t ∈ [0, 1], we conclude the proof.

Now we analyze the norm of F along the direction s.

Lemma 2.3. For any t ∈ [0, 1],

    ‖F(x + ts)‖² ≤ ‖F(x)‖² + t⟨F(x), JF(x)s⟩ + (t² − t)‖JF(x)s‖² + t‖s‖² [ (L²/4)t³‖s‖² + Lt‖F(x) + tJF(x)s‖ − µ ]
        ≤ ‖F(x)‖² + t⟨F(x), JF(x)s⟩ + (t² − t)‖JF(x)s‖² + t‖s‖² [ (L²/4)t³‖s‖² + Lt‖F(x)‖ − µ ].

Proof. Let R(t) = F(x + ts) − (F(x) + tJF(x)s). Since F(x + ts) = F(x) + tJF(x)s + R(t),

    ‖F(x + ts)‖² = ‖F(x) + tJF(x)s‖² + ‖R(t)‖² + 2⟨F(x) + tJF(x)s, R(t)⟩
        ≤ ‖F(x) + tJF(x)s‖² + ‖R(t)‖² + 2‖F(x) + tJF(x)s‖ ‖R(t)‖
        ≤ ‖F(x) + tJF(x)s‖² + ‖R(t)‖² + 2‖F(x)‖ ‖R(t)‖,
where the first inequality follows from the Cauchy-Schwarz inequality and the second one from Lemma 2.2. From the definition of R(t) and the assumption of JF being L-Lipschitz continuous, we have that ‖R(t)‖ ≤ Lt²‖s‖²/2. Therefore,

    ‖F(x + ts)‖² ≤ ‖F(x) + tJF(x)s‖² + (L²/4)t⁴‖s‖⁴ + Lt²‖F(x)‖ ‖s‖².

To end the proof, use Lemma 2.2 again.

In view of Lemma 2.3, we need a bound for ‖s‖ in order to choose, at a non-stationary point, a regularization parameter µ. The next technical lemma will be useful to obtain such a bound.

Lemma 2.4. For any A ∈ Rᵐˣⁿ, b ∈ Rᵐ and µ > 0,

    ‖(AᵀA + µI)⁻¹Aᵀb‖ ≤ (1/(2√µ)) ‖P_{R(A)} b‖,

where R(A) is the range of A and P_{R(A)} is the orthogonal projection onto this subspace.

Proof. Let b̄ = P_{R(A)} b and s = (AᵀA + µI)⁻¹Aᵀb. Observe that Aᵀb = Aᵀb̄, so that

    (AᵀA + µI)s = Aᵀb̄.    (5)

We will use an SVD of A. There exist unitary matrices U and V, of orders m×m and n×n respectively, such that

    UᵀAV = D,    d_{i,j} = σ_i if i = j, 0 otherwise,

where σ_i ≥ 0 for all i = 1, ..., min{m, n}. Note that Vᵀ = V⁻¹, Uᵀ = U⁻¹ and VᵀAᵀAV = (UᵀAV)ᵀ UᵀAV = DᵀD. Therefore, pre-multiplying both sides of (5) by Vᵀ and using the substitutions

    s̃ = Vᵀs,    b̃ = Uᵀb̄,

we conclude that

    Dᵀb̃ = Vᵀ(AᵀA + µI)V s̃ = (DᵀD + µI)s̃.

It follows from this equation that if n ≤ m then

    s̃_i = σ_i b̃_i / (σ_i² + µ),    i = 1, ..., n;

while if n > m then

    s̃_i = σ_i b̃_i / (σ_i² + µ) for 1 ≤ i ≤ m,    s̃_i = 0 for m < i ≤ n.

Since t/(t² + µ) ≤ 1/(2√µ) for µ > 0 and t ≥ 0, we have ‖s̃‖ ≤ (1/(2√µ))‖b̃‖. To end the proof, note that ‖s̃‖ = ‖s‖ and ‖b̃‖ = ‖b̄‖.
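Lemma 2.4 is easy to sanity-check numerically. The sketch below (our own illustration, not from the paper) compares ‖(AᵀA + µI)⁻¹Aᵀb‖ with ‖P_{R(A)}b‖/(2√µ) for a few shapes, including the n > m case treated in the proof:

```python
import numpy as np

def lemma_2_4_sides(A, b, mu):
    """Return (lhs, rhs) of the bound in Lemma 2.4:
    ||(A^T A + mu I)^{-1} A^T b|| <= ||P_{R(A)} b|| / (2 sqrt(mu))."""
    n = A.shape[1]
    s = np.linalg.solve(A.T @ A + mu * np.eye(n), A.T @ b)
    # Orthogonal projection of b onto range(A) via a least-squares solve.
    Pb = A @ np.linalg.lstsq(A, b, rcond=None)[0]
    return np.linalg.norm(s), np.linalg.norm(Pb) / (2.0 * np.sqrt(mu))

rng = np.random.default_rng(0)
for m, n in [(6, 3), (3, 6), (5, 5)]:
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)
    lhs, rhs = lemma_2_4_sides(A, b, 0.3)
    assert lhs <= rhs + 1e-12
```

The bound is tight exactly when a singular value satisfies σ² = µ, since t/(t² + µ) attains its maximum 1/(2√µ) at t = √µ.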
In the following result, the main ingredients for the devised algebraic rules are established.

Theorem 2.5. For any µ > 0,

    ‖s‖ ≤ (1/(2√µ)) ‖P(F(x))‖ ≤ (1/(2√µ)) ‖F(x)‖,

where P is the orthogonal projection onto the range of JF(x). Moreover, if

    µ ≥ (L/4) ( 2‖F(x)‖ + √(4‖F(x)‖² + ‖P(F(x))‖²) ),    (6)

then

    ‖F(x + s)‖² ≤ ‖F(x)‖² + ⟨F(x), JF(x)s⟩.

Proof. The first part of the theorem, namely the bounds for ‖s‖, follows from (2), Lemma 2.4 and the metric properties of the orthogonal projection. Combining the first part of the theorem with Lemma 2.3 (with t = 1), we conclude that

    ‖F(x + s)‖² ≤ ‖F(x)‖² + ⟨F(x), JF(x)s⟩ + (‖s‖²/µ) [ (L²/16)‖P(F(x))‖² + L‖F(x)‖µ − µ² ].

To end the proof, note that the right-hand side of inequality (6) is the largest root of the concave quadratic

    µ ↦ (L²/16)‖P(F(x))‖² + L‖F(x)‖µ − µ².

Remark 2.6. In view of Theorem 2.5, possible choices for µ at a point x are

    µ = (L/4) ( 2‖F(x)‖ + √(4‖F(x)‖² + ‖P(F(x))‖²) ),    (7)

or

    µ = (L/4) ( 4‖F(x)‖ + ‖P(F(x))‖ ),    (8)

or

    µ = ((2 + √5)/4) L ‖F(x)‖.    (9)

The first value is the smallest one, and the first two values require the computation of ‖P(F(x))‖, although an upper bound for this norm can also be used.
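For concreteness, the three rules of Remark 2.6 can be written as a short routine (a sketch with our own naming; here ‖P(F(x))‖ is evaluated through a least-squares solve, which is one of several possibilities):

```python
import numpy as np

def mu_choices(L, J, F):
    """The parameters (7), (8), (9) of Remark 2.6, in that order.
    L is (an estimate of) the Lipschitz constant of the Jacobian."""
    nF = np.linalg.norm(F)
    # ||P(F)||: norm of F projected onto range(J), via least squares.
    nPF = np.linalg.norm(J @ np.linalg.lstsq(J, F, rcond=None)[0])
    mu7 = (L / 4.0) * (2.0 * nF + np.sqrt(4.0 * nF**2 + nPF**2))
    mu8 = (L / 4.0) * (4.0 * nF + nPF)
    mu9 = (2.0 + np.sqrt(5.0)) / 4.0 * L * nF
    return mu7, mu8, mu9

J = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 3.0]])
F = np.array([0.5, -1.0, 2.0])
mu7, mu8, mu9 = mu_choices(1.0, J, F)
assert mu7 <= mu8 and mu7 <= mu9   # (7) is the smallest admissible value
```

Note that (8) and (9) need not be ordered between themselves; only (7) is guaranteed to be the smallest, since ‖P(F(x))‖ ≤ ‖F(x)‖.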
3 The algorithm

Now we propose and analyze the Levenberg-Marquardt algorithm with algebraic rules for computing the regularization parameter.

Algorithm 1.
Input: x₀ ∈ Rⁿ, β ∈ (0, 1), η ∈ [0, 1), L₀ > 0 and δ ≥ 0 with L₀ ≥ δ.
1.  k ← 0
2.  while JF(x_k)ᵀF(x_k) ≠ 0 do
3.    Set F_k = F(x_k), J_k = JF(x_k)
4.    Choose µ_k ∈ [µ_k⁻, µ_k⁺], where
          µ_k⁻ = (L_k/4) ( 2‖F_k‖ + √(4‖F_k‖² + ‖P_k(F_k)‖²) ),    µ_k⁺ = ((2 + √5)/4) L_k ‖F_k‖,
      and P_k is the projection onto the range of J_k
5.    Compute s_k = −(J_kᵀJ_k + µ_kI)⁻¹ J_kᵀF_k
6.    t ← 1
7.    while ‖F(x_k + t s_k)‖² > ‖F_k‖² + βt ⟨s_k, J_kᵀF_k⟩ do
8.      t ← t/2
9.    end while
10.   t_k = t
11.   x_{k+1} = x_k + t_k s_k
12.   if t_k < 1 then
13.     L_{k+1} = 2L_k
14.   else
15.     Ared = ‖F_k‖² − ‖F(x_{k+1})‖²
16.     Pred = ‖F_k‖² − ‖F_k + J_k s_k‖² − µ_k‖s_k‖² = −⟨s_k, J_kᵀF_k⟩
17.     if Ared > η Pred then
18.       L_{k+1} = max{L_k/2, δ}
19.     else
20.       L_{k+1} = L_k
21.     end if
22.   end if
23.   k ← k + 1
24. end while

Concerning Algorithm 1, it is worth noticing that:

i) Iteration ℓ begins with k = ℓ − 1, and ends with k = ℓ if JF(x_{ℓ−1})ᵀF(x_{ℓ−1}) ≠ 0.

ii) If the algorithm does not end at iteration k + 1, then µ_k > 0, s_k is well defined and it is a descent direction for ‖F(·)‖². Therefore, the Armijo line search in Steps 6-10 of Algorithm 1 has finite termination. Altogether, Algorithm 1 is well defined and either it terminates after ℓ steps with JF(x_{ℓ−1})ᵀF(x_{ℓ−1}) = 0 or it generates infinite sequences (x_k), (s_k), (t_k), (µ_k), (L_k).

iii) L_k plays the role of the Lipschitz constant of JF. If δ > 0, this parameter also plays the role of a safeguard which prevents L_k from becoming too small.

From now on, we assume that Algorithm 1 with input x₀ ∈ Rⁿ, β ∈ (0, 1), η ∈ [0, 1), L₀ > 0 and δ ≥ 0 does not stop at Step 2, and that (x_k), (s_k), (t_k), (µ_k), (L_k) are the (infinite) sequences generated by it.
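A compact NumPy sketch of Algorithm 1 follows. This is our own transcription, not the authors' code: the default parameter values are illustrative, a finite stationarity tolerance replaces the exact test of Step 2, and µ_k is fixed at the upper end µ_k⁺ of Step 4, which avoids computing the projected residual norm.

```python
import numpy as np

def lm_algebraic(F, JF, x0, beta=1e-4, eta=0.5, L0=1e-6, delta=1e-6,
                 tol=1e-8, max_iter=200):
    """Sketch of Algorithm 1 with the choice mu_k = mu_k^+ =
    (2+sqrt(5))/4 * L_k * ||F_k||.  F maps R^n -> R^m, JF is its Jacobian."""
    x, Lk = np.asarray(x0, dtype=float), max(L0, delta)
    for _ in range(max_iter):
        Fk, Jk = F(x), JF(x)
        g = Jk.T @ Fk                     # J_k^T F_k, stationarity measure
        if np.linalg.norm(g) <= tol:
            break
        mu = (2.0 + np.sqrt(5.0)) / 4.0 * Lk * np.linalg.norm(Fk)
        s = np.linalg.solve(Jk.T @ Jk + mu * np.eye(x.size), -g)
        # Armijo backtracking on ||F||^2 (Steps 6-10); s is a descent
        # direction, so the halving terminates.
        t = 1.0
        while (np.linalg.norm(F(x + t * s))**2
               > np.linalg.norm(Fk)**2 + beta * t * (s @ g)):
            t *= 0.5
        x_new = x + t * s
        if t < 1.0:                       # full step rejected: raise L_k
            Lk = 2.0 * Lk
        else:                             # actual vs. predicted decrease
            ared = np.linalg.norm(Fk)**2 - np.linalg.norm(F(x_new))**2
            pred = -(s @ g)
            Lk = max(Lk / 2.0, delta) if ared > eta * pred else Lk
        x = x_new
    return x

# Zero-residual example: F(x) = (x1 + x2 - 3, x1*x2 - 2) has root (1, 2).
F = lambda x: np.array([x[0] + x[1] - 3.0, x[0] * x[1] - 2.0])
JF = lambda x: np.array([[1.0, 1.0], [x[1], x[0]]])
x = lm_algebraic(F, JF, [0.5, 2.5])
assert np.linalg.norm(F(x)) < 1e-6
```

With the tiny initial estimate L₀, the early steps are essentially Gauss-Newton steps; the doubling/halving of L_k then mimics Steps 12-22.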
We analyze next the basic properties of Algorithm 1.

Proposition 3.1. If L_k ≥ L, then t_k = 1 and L_{k+1} = max{L_k/2, δ}.

Proof. Suppose that L_k ≥ L. From this assumption and the definition of µ_k (in Step 4) we have that

    µ_k ≥ (L/4) ( 2‖F(x_k)‖ + √(4‖F(x_k)‖² + ‖P(F(x_k))‖²) ),

where P is the orthogonal projection onto the range of JF(x_k). The first equality of the proposition follows from the above inequality; the definition of s_k (in Step 5); Theorem 2.5 with µ = µ_k, x = x_k, s = s_k; and Steps 6-10. The second equality comes from the first one and Steps 12-22.

Proposition 3.2. For all k,

    δ ≤ L_k ≤ max{L₀, 2L},    (10)

and, for infinitely many k's, t_k = 1.

Proof. Since L₀ ≥ δ and L_{k+1} ≥ max{L_k/2, δ} for all k, the first inequality in (10) holds for all k. We will prove the second inequality by induction on k. The inequality holds trivially for k = 0. Assume that it holds for some k. Steps 12-22 of the algorithm imply that if t_k = 1, then L_{k+1} = L_k or L_{k+1} = max{δ, L_k/2} ≤ L_k, and in both cases the inequality holds for k + 1. If t_k < 1, it follows from Proposition 3.1 that L_k < L and then L_{k+1} = 2L_k ≤ 2L. So the inequality holds for k + 1 and the induction proof is complete. To prove the second part of the proposition, suppose that t_k < 1 for all k ≥ k₀. Then

    L_k = 2^{k−k₀} L_{k₀},    k = k₀, k₀ + 1, ...,

in contradiction with (10).

From Proposition 3.2 and Step 4 of the algorithm, we have, for all k,

    δ‖F(x_k)‖ ≤ µ_k ≤ ((2 + √5)/4) max{L₀, 2L} ‖F(x_k)‖.    (11)

Proposition 3.3. For each k,

    ‖F(x_{k+1})‖² ≤ ‖F(x_k)‖² + βt_k ⟨JF(x_k)ᵀF(x_k), s_k⟩ ≤ ‖F(x_k)‖² − βt_k ‖JF(x_k)ᵀF(x_k)‖² / (‖JF(x_k)‖² + µ_k).    (12)

As a consequence, the sequence (‖F(x_k)‖) is strictly decreasing and

    Σ_{k=0}^∞ βt_k ‖JF(x_k)ᵀF(x_k)‖² / (‖JF(x_k)‖² + µ_k) ≤ ‖F(x₀)‖².

Proof. The first inequality follows from the stopping condition of the Armijo line search (Steps 6-10).
In view of the definition of s_k and µ_k, and the fact that µ_k > 0,

    −⟨JF(x_k)ᵀF(x_k), s_k⟩ = ⟨JF(x_k)ᵀF(x_k), (JF(x_k)ᵀJF(x_k) + µ_kI)⁻¹ JF(x_k)ᵀF(x_k)⟩ ≥ ‖JF(x_k)ᵀF(x_k)‖² / (‖JF(x_k)ᵀJF(x_k)‖ + µ_k),

which trivially implies the second inequality, because ‖JF(x_k)ᵀJF(x_k)‖ = ‖JF(x_k)‖². The last statement of the proposition follows directly from (12).
4 General convergence analysis

The convergence of Algorithm 1 is examined in the following results.

Proposition 4.1. If the sequence (x_k) is bounded, then it has a stationary accumulation point.

Proof. By Proposition 3.2, t_k = 1 for infinitely many k's. Since (x_k) is bounded, there exists a subsequence (x_{k_j}) convergent to some x̄, such that t_{k_j} = 1 for all j. Thus, by Proposition 3.3,

    Σ_{j=1}^∞ β ‖JF(x_{k_j})ᵀF(x_{k_j})‖² / (‖JF(x_{k_j})‖² + µ_{k_j}) ≤ ‖F(x₀)‖².

Moreover, since F and JF are continuous, ‖F(x_{k_j})‖, ‖JF(x_{k_j})‖ and µ_{k_j} are bounded. Hence the sequence (JF(x_{k_j})ᵀF(x_{k_j})) converges to 0. To end the proof, use again the continuity of F and JF to conclude that JF(x̄)ᵀF(x̄) = 0.

From the bound for the norm of F along the direction s, obtained in Lemma 2.3, we prove next that the step length is bounded away from zero.

Proposition 4.2. If δ > 0, then

    t_k ≥ 8δ² / (L² + 16Lδ)

for all k.

Proof. For any t ∈ [0, 1] it holds that

    (L²/4)t³‖s_k‖² + Lt‖F(x_k)‖ − µ_k ≤ t [ (L²/4)‖s_k‖² + L‖F(x_k)‖ ] − µ_k
        ≤ t [ (L²/(16µ_k))‖F(x_k)‖² + L‖F(x_k)‖ ] − µ_k
        ≤ t [ (L²/(16δ))‖F(x_k)‖ + L‖F(x_k)‖ ] − δ‖F(x_k)‖
        = δ‖F(x_k)‖ [ t ( L²/(16δ²) + L/δ ) − 1 ],    (13)

where the first inequality comes from the bound on t, the second from Theorem 2.5, and the third from the fact that µ_k ≥ δ‖F(x_k)‖, as stated in (11). From inequality (13) and Lemma 2.3, we have that if

    0 ≤ t ≤ 1 / ( L²/(16δ²) + L/δ ) = 16δ² / (L² + 16Lδ),

then ‖F(x_k + ts_k)‖² ≤ ‖F(x_k)‖² + t⟨s_k, JF(x_k)ᵀF(x_k)⟩. So, the result follows from Steps 6-10 of Algorithm 1.

We now present the global convergence result of Algorithm 1.
Proposition 4.3. If δ > 0, then all accumulation points of the sequence (x_k) are stationary for the function ‖F(x)‖².

Proof. Suppose that (x_{k_j}) converges to some x̄. From Propositions 3.3 and 4.2, we have that

    Σ_{j=1}^∞ ‖JF(x_{k_j})ᵀF(x_{k_j})‖² / (‖JF(x_{k_j})‖² + µ_{k_j}) < ∞.

It follows from (11) and from the continuity of F and JF that ‖JF(x_{k_j})‖² + µ_{k_j} is bounded. Therefore JF(x_{k_j})ᵀF(x_{k_j}) converges to 0.

A complexity result concerning the stationarity measure of Algorithm 1 is given next.

Proposition 4.4. If δ > 0 and (x_k) is bounded, then

    min_{i=1,...,k} ‖JF(x_i)ᵀF(x_i)‖ = O(1/√k).

Proof. Define M = sup_k { ‖JF(x_k)‖² + ((2 + √5)/4) max{L₀, 2L} ‖F(x_k)‖ }, which is finite because (x_k) is bounded and F, JF are continuous. Then, by (11) and Propositions 3.3 and 4.2, for any k,

    k β (8δ²/(L² + 16Lδ)) (1/M) min_{i=1,...,k} ‖JF(x_i)ᵀF(x_i)‖² ≤ Σ_{i=1}^k βt_i ‖JF(x_i)ᵀF(x_i)‖² / (‖JF(x_i)‖² + µ_i) ≤ ‖F(x₀)‖²,

and the conclusion follows.

5 Quadratic convergence under an error bound condition

Given F : Rⁿ → Rᵐ, consider the system

    F(x) = 0    (14)

and let X* be its solution set, that is,

    X* = {x ∈ Rⁿ : F(x) = 0}.    (15)

Our aim is to prove that (x_k) converges quadratically near solutions of (14) where, locally, ‖F(x)‖ provides an error bound for this system, in the sense of [17, Definition 1]. For completeness, we present this definition in the sequel (see Definition 5.3). Define, for γ > 0, S_γ : Rⁿ \ X* → Rⁿ,

    S_γ(x) = −( JF(x)ᵀJF(x) + γ‖F(x)‖I )⁻¹ JF(x)ᵀF(x).    (16)

Auxiliary bounds are established in the next two results.
Proposition 5.1. If x ∈ Rⁿ \ X*, γ > 0, s = S_γ(x) and x⁺ = x + s, then

    ‖JF(x⁺)ᵀF(x⁺)‖ ≤ (γ + L)‖F(x)‖‖s‖ + (L/2)‖JF(x⁺)‖‖s‖².

Proof. Direct algebraic manipulation yields

    JF(x⁺)ᵀF(x⁺) = JF(x)ᵀ(F(x) + JF(x)s) + (JF(x⁺) − JF(x))ᵀ(F(x) + JF(x)s) + JF(x⁺)ᵀ[F(x⁺) − (F(x) + JF(x)s)].

It follows from (16) that JF(x)ᵀ(F(x) + JF(x)s) = −γ‖F(x)‖s. To end the proof, combine this result with the above equality, assumption (1), Lemma 2.1 and Lemma 2.2.

Proposition 5.2. If x ∈ Rⁿ \ X*, x̄ ∈ X*, γ > 0, s = S_γ(x) and s̄ = x̄ − x, then

1. ‖F(x) + JF(x)s̄‖ ≤ (L/2)‖s̄‖²;

2. ‖F(x) + JF(x)s‖² + ‖JF(x)(s̄ − s)‖² + γ‖F(x)‖ ( ‖s‖² + ‖s̄ − s‖² ) ≤ (L²/4)‖s̄‖⁴ + γ‖F(x)‖‖s̄‖².

Proof. Since F(x̄) = 0,

    F(x) + JF(x)s̄ = F(x) + JF(x)(x̄ − x) − F(x̄),

and the first inequality follows from this equality and Lemma 2.1. Define

    ψ_{γ,x} : Rⁿ → R,    ψ_{γ,x}(u) = ‖F(x) + JF(x)u‖² + γ‖F(x)‖‖u‖².

From item 1 we have

    ψ_{γ,x}(s̄) ≤ (L²/4)‖s̄‖⁴ + γ‖F(x)‖‖s̄‖².

Observe that s = argmin_{u ∈ Rⁿ} ψ_{γ,x}(u). Since ψ_{γ,x} is a quadratic with Hessian 2(JF(x)ᵀJF(x) + γ‖F(x)‖I) and it is minimized by s,

    ψ_{γ,x}(s̄) = ψ_{γ,x}(s) + ‖JF(x)(s̄ − s)‖² + γ‖F(x)‖‖s̄ − s‖² = ‖F(x) + JF(x)s‖² + ‖JF(x)(s̄ − s)‖² + γ‖F(x)‖ ( ‖s‖² + ‖s̄ − s‖² ).

The second inequality of the proposition follows from the two above relations.

We will analyze the local convergence of the sequence (x_k) under the local error bound condition, as defined next, which is weaker than assuming nonsingularity of JF(x)ᵀJF(x) for x in the solution set X*.

Definition 5.3 ([17, Definition 1]). Let V be an open subset of Rⁿ such that V ∩ X* ≠ ∅, where X* is as in (15). We say that ‖F(x)‖ provides a local error bound on V for the system (14) if there exists a positive constant c such that

    c dist(x, X*) ≤ ‖F(x)‖

for all x ∈ V.
The next lemma, which will be instrumental in the main result of this section, was proved in [7, Corollary 2]. A proof is provided here for the sake of completeness, where the function F is simply continuously differentiable.

Lemma 5.4. If F : Rⁿ → Rᵐ is continuously differentiable and ‖F(x)‖ provides an error bound for (14) in a neighborhood V of x* ∈ X* as in Definition 5.3, then ‖JF(x)ᵀF(x)‖ also provides an error bound for (14) in some neighborhood of x*.

Proof. Suppose that V ⊂ Rⁿ is a neighborhood of x* where ‖F(x)‖ provides an error bound for (14), that is,

    c dist(x, X*) ≤ ‖F(x)‖,    x ∈ V.

Define, for x, y ∈ Rⁿ,

    R(y, x) = F(y) − (F(x) + JF(x)(y − x)).

Since F is continuously differentiable, there exists r > 0 such that B(x*, r) ⊂ V and

    ‖R(y, x)‖ ≤ (c/2)‖y − x‖,    y, x ∈ B(x*, r).

Take x ∈ B(x*, r/2) and x̄ ∈ argmin_{z ∈ X*} ‖z − x‖. Then dist(x, X*) = ‖x̄ − x‖ < r/2, x̄ ∈ B(x*, r) and, in view of the above assumptions,

    c‖x̄ − x‖ ≤ ‖F(x)‖,    ‖R(x̄, x)‖ ≤ (c/2)‖x̄ − x‖.    (17)

Since F(x̄) = 0, JF(x)(x̄ − x) = −F(x) + R(x̄, x) and

    ⟨JF(x)ᵀF(x), x̄ − x⟩ = ⟨F(x), JF(x)(x̄ − x)⟩ = −‖F(x)‖² + ⟨F(x), R(x̄, x)⟩.

Using the above equalities, the Cauchy-Schwarz inequality and (17), we conclude that

    ‖JF(x)ᵀF(x)‖ ‖x̄ − x‖ ≥ ‖F(x)‖ ( ‖F(x)‖ − ‖R(x̄, x)‖ ) ≥ ‖F(x)‖ ( c‖x̄ − x‖ − (c/2)‖x̄ − x‖ ) ≥ (c²/2)‖x̄ − x‖².

Therefore, ‖JF(x)ᵀF(x)‖ ≥ (c²/2) dist(x, X*) for any x ∈ B(x*, r/2).

From now on, in this section, x* ∈ Rⁿ and r, c > 0 are such that

    F(x*) = 0,    c dist(x, X*) ≤ ‖F(x)‖ for all x ∈ B(x*, r).    (18)

Another auxiliary result is provided next.

Lemma 5.5. Consider x* ∈ Rⁿ and r, c > 0 satisfying (18). If x ∈ B(x*, r) \ X*, x̄ ∈ argmin_{u ∈ X*} ‖u − x‖, γ > 0, s = S_γ(x) and s̄ = x̄ − x, then

1. ‖s̄‖ = dist(x, X*) ≤ ‖F(x)‖/c;

2. ‖s‖ ≤ ( 1 + (L²/(4cγ))‖s̄‖ )^{1/2} ‖s̄‖;

3. ‖F(x) + JF(x)s‖ ≤ ( (L²/(4c²γ²))‖s̄‖² + (1/(cγ))‖s̄‖ )^{1/2} γ‖F(x)‖.
Moreover, if ‖s̄‖ ≤ cγ/(2L²), then

    ‖F(x + s)‖² ≤ ‖F(x)‖² + ⟨s, JF(x)ᵀF(x)⟩.

Proof. The first relation in item 1 follows trivially from the definition of x̄, while the second relation comes from (18). From item 2 of Proposition 5.2 and item 1 of this lemma, we have that

    γ‖F(x)‖‖s‖² ≤ (L²/4)‖s̄‖⁴ + γ‖F(x)‖‖s̄‖² ≤ (L²/(4c))‖F(x)‖‖s̄‖³ + γ‖F(x)‖‖s̄‖²

and

    ‖F(x) + JF(x)s‖² ≤ (L²/4)‖s̄‖⁴ + γ‖F(x)‖‖s̄‖² ≤ ( (L²/(4c²γ²))‖s̄‖² + (1/(cγ))‖s̄‖ ) γ²‖F(x)‖²,

which trivially imply items 2 and 3, respectively. To prove the last part of the lemma, suppose that ‖s̄‖ ≤ cγ/(2L²) and define

    a = (L²/4)‖s‖² + L‖F(x) + JF(x)s‖ − γ‖F(x)‖,    w = (L²/(4cγ))‖s̄‖.

From items 2 and 3, we have

    a ≤ (L²/4)(1 + w)‖s̄‖² + L ( (L²/(4c²γ²))‖s̄‖² + (1/(cγ))‖s̄‖ )^{1/2} γ‖F(x)‖ − γ‖F(x)‖
      ≤ [ w(1 + w) + (4w² + 4w)^{1/2} − 1 ] γ‖F(x)‖
      = [ w(1 + w) + 2√w (1 + w)^{1/2} − 1 ] γ‖F(x)‖,

where the second inequality follows from item 1 and the equality comes from the definition of w. To end the proof, observe that since w ≤ 1/8, a < 0, and use the first inequality in Lemma 2.3 with t = 1 and µ = γ‖F(x)‖.

Observe that in Algorithm 1, s_k = S_{γ_k}(x_k) for γ_k = µ_k/‖F(x_k)‖. In order to simplify the proofs, define

    D = ((2 + √5)/4) max{L₀, 2L},    γ_k = µ_k/‖F(x_k)‖,    k ∈ N.    (19)

In view of (11) and the definition of s_k in Algorithm 1, for all k ∈ N,

    δ ≤ γ_k ≤ D,    s_k = S_{γ_k}(x_k).    (20)

Assuming that the residual function F provides an error bound for the solution set of the zero-residual NLS problem, the local convergence of Algorithm 1 is established as follows.

Theorem 5.6. Consider x* ∈ Rⁿ and r, c > 0 satisfying (18). There exists r̄ > 0 such that if x_{k₀} ∈ B(x*, r̄) for some k₀, then either Algorithm 1 stops at some x_k solution of (14) or (x_k) converges q-quadratically to some x̂ solution of (14).
Proof. In view of Lemma 5.4, there exist 0 < r₁ ≤ r and c₁ > 0 such that

    c₁ dist(x, X*) ≤ ‖JF(x)ᵀF(x)‖,    x ∈ B(x*, r₁).    (21)

Define

    M₁ = max { ‖JF(x)‖ : x ∈ B(x*, r₁) },    M₂ = (3M₁/(2c₁)) ( D + 7L/4 )

and

    ρ = min { cδ/(2L²), r₁/3, 1/(2M₂) }.

We claim that if

    x ∈ B(x*, ρ) \ X*,  δ ≤ γ ≤ D,  s = S_γ(x),  x⁺ = x + s,  x̄ ∈ argmin_{u ∈ X*} ‖u − x‖,  s̄ = x̄ − x,    (22)

then

    ‖s‖ ≤ (3/2) dist(x, X*) ≤ (3/2)ρ,    (23)

    dist(x⁺, X*) ≤ M₂ dist(x, X*)² ≤ dist(x, X*)/2,    (24)

    ‖F(x⁺)‖² ≤ ‖F(x)‖² + ⟨s, JF(x)ᵀF(x)⟩.    (25)

The first inequality in (23) follows from item 2 of Lemma 5.5 and the definitions of s̄ and ρ. The second inequality comes from the inclusions x̄ ∈ X* and x ∈ B(x*, ρ). To prove (24), first observe that

    ‖x⁺ − x*‖ ≤ ‖x − x*‖ + ‖s‖ ≤ ρ + (3/2)ρ < r₁,

which comes from the definition of x⁺, the triangle inequality, (23) and the definition of ρ. Consequently, from (21) applied at x⁺ and Proposition 5.1,

    dist(x⁺, X*) ≤ (1/c₁) ‖JF(x⁺)ᵀF(x⁺)‖ ≤ (1/c₁) ( (γ + L)‖F(x)‖ + (L/2)‖JF(x⁺)‖‖s‖ ) ‖s‖.

Since JF is continuous, F(x) = F(x) − F(x̄) = ∫₀¹ JF(x̄ + t(x − x̄))(x − x̄) dt. As F(x̄) = 0, we have

    ‖F(x)‖ ≤ ( max_{z ∈ B(x*, r₁)} ‖JF(z)‖ ) ‖s̄‖ ≤ M₁‖s̄‖.

Using the two above displayed relations, the bounds for γ in (22), (23), and the definitions of M₁ and M₂, we have

    dist(x⁺, X*) ≤ (3M₁/(2c₁)) ( D + 7L/4 ) ‖s̄‖² = M₂ dist(x, X*)²,

which proves the first inequality of (24). The second inequality in (24) comes from the fact that dist(x, X*) ≤ ρ and from the definition of ρ.
Our third claim, (25), follows directly from the last part of Lemma 5.5, since ‖s̄‖ ≤ ρ ≤ cγ/(2L²).

Next we define a family of sets in Rⁿ which are, in some sense, well behaved with respect to Algorithm 1. Let r̄ = ρ/4 and define, for 0 ≤ τ < 1,

    W_τ = { x ∈ Rⁿ : ‖x − x*‖ ≤ (1 + 3τ)r̄,  dist(x, X*) ≤ (1 − τ)r̄ },    W = ∪_{0 ≤ τ < 1} W_τ.

Since x* ∈ X* and dist(x, X*) ≤ ‖x − x*‖, we have W₀ = B(x*, r̄). Hence

    B(x*, r̄) ⊂ W ⊂ B(x*, ρ) ⊂ B(x*, r₁).

Let x_k ∈ W. If JF(x_k)ᵀF(x_k) = 0, then the algorithm stops at x_k and, in view of (21), F(x_k) = 0. Next we analyze the case JF(x_k)ᵀF(x_k) ≠ 0. It follows from (20) and (25) that the full step is accepted, that is, x_{k+1} = x_k + S_{γ_k}(x_k) = x_k + s_k. As x_k ∈ W, we have x_k ∈ W_τ for some 0 ≤ τ < 1. Therefore, from the triangle inequality, the definition of W_τ, and (23),

    ‖x_{k+1} − x*‖ ≤ ‖x_k − x*‖ + ‖s_k‖ ≤ (1 + 3τ)r̄ + (3/2) dist(x_k, X*) ≤ (1 + 3τ)r̄ + (3/2)(1 − τ)r̄ = ( 1 + 3(1 + τ)/2 ) r̄.

Additionally, from (24) and the definition of W_τ, we also have

    dist(x_{k+1}, X*) ≤ (1/2) dist(x_k, X*) ≤ ((1 − τ)/2) r̄ = ( 1 − (1 + τ)/2 ) r̄.

Altogether we proved that

    0 ≤ τ < 1,  x_k ∈ W_τ,  JF(x_k)ᵀF(x_k) ≠ 0   ⟹   x_{k+1} ∈ W_{(1+τ)/2}.

Suppose that x_{k₀} ∈ B(x*, r̄). We have just proved that, in this case, either the algorithm stops at some x_k ∈ X* or an infinite sequence is generated with x_k ∈ B(x*, ρ) for all k ≥ k₀. Assume that an infinite sequence is generated and define

    d_k = dist(x_k, X*).

It follows from (23) and (24) that, for k ≥ k₀,

    ‖s_k‖ ≤ (3/2) d_k,    d_{k+1} ≤ M₂ d_k² ≤ d_k/2.    (26)
Hence,

    Σ_{j=k₀}^∞ ‖x_{j+1} − x_j‖ ≤ (3/2) Σ_{j=k₀}^∞ d_j ≤ 3 d_{k₀}.

As (x_k) is a Cauchy sequence, it converges to some x̂. Since d_k = dist(x_k, X*) converges to 0, F(x̂) = 0, that is, x̂ ∈ X*. By the triangle inequality and the first inequality in (26),

    ‖x_k − x̂‖ ≤ ‖s_k‖ + Σ_{j=k+1}^∞ ‖s_j‖ ≤ (3/2) ( d_k + Σ_{j=k+1}^∞ d_j ).

From the two last inequalities in (26) we have

    Σ_{j=k+1}^∞ d_j ≤ M₂ Σ_{j=k}^∞ d_j² ≤ 2M₂ d_k² ≤ d_k.

Therefore, from the two previously displayed relations,

    d_k ≤ ‖x_k − x̂‖ ≤ 3 d_k,

where the first inequality comes from the inclusion x̂ ∈ X*. Consequently, using (26) again and the definition of d_k, we conclude that

    ‖x_{k+1} − x̂‖ ≤ 3 d_{k+1} ≤ 3M₂ d_k² ≤ 3M₂ ‖x_k − x̂‖²,

ensuring the q-quadratic convergence and completing the proof.

6 Numerical experiments

In this section we describe the experiments prepared to illustrate the practical performance of the algorithm. We start with the algorithmic and computational choices made for obtaining the numerical results.

6.1 On the choice of µ_k

Similarly to the analysis developed for the unconstrained minimization problem (cf. [11]), as the positive scalar µ is a lower bound for the smallest eigenvalue of JF(x)ᵀJF(x) + µI, it follows that

    µ‖s‖² ≤ ⟨s, (JF(x)ᵀJF(x) + µI)s⟩ = −⟨s, JF(x)ᵀF(x)⟩ ≤ ‖s‖ ‖JF(x)ᵀF(x)‖,

and so

    ‖s‖ ≤ ‖JF(x)ᵀF(x)‖ / µ.

By Lemma 2.3, with t = 1,

    ‖F(x + s)‖² ≤ ‖F(x)‖² + ⟨F(x), JF(x)s⟩ + ‖s‖² [ (L²/4)‖s‖² + L‖F(x)‖ − µ ].

For obtaining µ that ensures (L²/4)‖s‖² + L‖F(x)‖ − µ ≤ 0, the previous bound upon the step gives

    L²‖JF(x)ᵀF(x)‖² + 4L‖F(x)‖µ² − 4µ³ ≤ 0.    (27)
Let µ² = z and consider the concave function ψ : R₊ → R given by

    ψ(z) = L²‖JF(x)ᵀF(x)‖² + 4L‖F(x)‖z − 4z^{3/2}.

Consider z₀ = L²‖F(x)‖². Note that ψ(z₀) = L²‖JF(x)ᵀF(x)‖² ≥ 0. An iteration of Newton's method from z₀ gives us

    z₁ = z₀ − ψ(z₀)/ψ′(z₀) = L²‖F(x)‖² + L‖JF(x)ᵀF(x)‖² / (2‖F(x)‖).

Then (27) is guaranteed for all

    µ ≥ √( L²‖F(x)‖² + L‖JF(x)ᵀF(x)‖² / (2‖F(x)‖) ) =: µ_J.

Observe that L‖F(x)‖ ≤ µ_J and µ⁺ = ((2 + √5)/4) L‖F(x)‖ ≥ L‖F(x)‖. Nevertheless, there is no guarantee that µ_J ≤ µ⁺ holds, so we set µ_k = min{µ_k⁺, µ_J}.

6.2 On the computation of the step s_k

The equivalence between the problems

    min_s ‖J_k s + F_k‖² + µ_k‖s‖²    and    min_s ‖ [J_k; √µ_k I] s + [F_k; 0] ‖²

implies that the system

    (J_kᵀJ_k + µ_kI)s + J_kᵀF_k = 0    (28)

may be handled by the normal equations as follows (cf. [14]):

    [J_kᵀ  √µ_k I] ( [J_k; √µ_k I] s + [F_k; 0] ) = 0.

As an alternative to the Cholesky factorization of J_kᵀJ_k + µ_kI, the QR factorization of the stacked matrix [J_k; √µ_k I] avoids performing the product J_kᵀJ_k:

    [J_k; √µ_k I] = Q [R_µ; 0],

where Q ∈ R^{(m+n)×(m+n)} is orthogonal and R_µ ∈ R^{n×n} is upper triangular. Indeed, the economic version of the factorization may be used, in which just the first n columns of the orthogonal matrix Q are computed. The direction s is thus obtained from the system (28) by means of the relationship

    J_kᵀJ_k + µ_kI = R_µᵀR_µ.

6.3 About the backtracking scheme

Besides the simple bisection of Steps 6-10 of Algorithm 1, we have implemented a quadratic-cubic interpolation scheme (cf. [3, §6.3.2]), which performed slightly better in preliminary experiments, so it was the choice for the reported results.
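The QR route of Section 6.2 can be sketched as follows (our own illustration using NumPy's economic QR, not the authors' Matlab code); it reproduces the step obtained from the normal equations without forming J_kᵀJ_k:

```python
import numpy as np

def lm_step_qr(J, F, mu):
    """LM step via QR of the (m+n) x n stacked matrix [J; sqrt(mu) I],
    avoiding the explicit product J^T J (cf. Section 6.2)."""
    m, n = J.shape
    A = np.vstack([J, np.sqrt(mu) * np.eye(n)])
    b = np.concatenate([F, np.zeros(n)])
    Q, R = np.linalg.qr(A)            # economic QR: Q is (m+n) x n
    # Least-squares solution of A s = -b: R s = -Q^T b, which is the
    # normal-equations system since A^T A = R^T R = J^T J + mu I.
    return np.linalg.solve(R, -(Q.T @ b))

J = np.array([[2.0, 0.0], [1.0, 3.0], [0.0, 1.0]])
F = np.array([1.0, -1.0, 2.0])
mu = 0.7
s_qr = lm_step_qr(J, F, mu)
s_ne = np.linalg.solve(J.T @ J + mu * np.eye(2), -(J.T @ F))
assert np.allclose(s_qr, s_ne)
```

The √µ I block guarantees that the stacked matrix has full column rank, so R is nonsingular even when J is rank deficient; this is the usual argument for preferring the QR route on ill-conditioned Jacobians.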
6.4 About the initialization of the sequence (L_k)

We set L₀ = 10⁻⁶, computed x₁ and then defined

    L₁ = max { ‖JF(x₁) − JF(x₀)‖ / ‖x₁ − x₀‖, δ }.

After that, the updating scheme follows Steps 12-22 of Algorithm 1.

6.5 The results

The tests were performed in a notebook DELL Intel Core i7-4510U, CPU 2.00GHz × 4, with 16GB RAM, Inspiron A30, 64-bit, using Matlab 2014a. The remaining parameters of Algorithm 1 were defined as β = 10⁻⁴, η = 0.5 and a small safeguard δ > 0. The set of test problems consists of all 53 problems from the CUTEst collection [9] such that the nonlinear equations F(x) = 0 are recast from feasibility instances, i.e., problems without objective function, with nonlinear equality constraints and without fixed variables, for which the Jacobian matrix is available. The initial point x₀ was always the default of the collection. To put our approach in perspective, the same problems were addressed by the benchmark modular code lsqnonlin (available within the Matlab software), based on the Levenberg-Marquardt method [12, 13, 15]. Summing up, there are three strategies under analysis:

S₁: lsqnonlin;
S₂: Algorithm 1 with Cholesky factorization to compute s_k;
S₃: Algorithm 1 with QR factorization to compute s_k.

The stopping criteria of the three strategies were the following, where ε_mac is the machine precision:

(1) Convergence to a solution with relative stationarity: ‖JF(x_k)ᵀF(x_k)‖ ≤ ε max{‖JF(x₀)ᵀF(x₀)‖, ε_mac}, for a prescribed tolerance ε;
(2) Change in x too small: t_k‖s_k‖ ≤ 10⁻⁹ (√ε_mac + ‖x_{k+1}‖);
(3) Change in the norm of the residue too small: |‖F(x_{k+1})‖² − ‖F(x_k)‖²| ≤ 10⁻⁶ ‖F(x_k)‖²;
(4) Computed search direction too small: ‖s_k‖ ≤ 10⁻⁹;
(0) Maximum number of function evaluations exceeded: max{2000, 10n};
(−4) Line search failed, as the stepsize t_k is too small.

Let f_i* be the objective function value obtained by strategy S_i when applied to a given problem, run with the default initial point. We consider that the strategy S_i found a solution (cf.
[1]) if

    (f_i* − f_min) / max{1, f_min} ≤ 0.01,    (29)

where f_min = min{f_i* : i = 1, 2, 3} and f(x) = ‖F(x)‖². The complete outputs are reported in the Appendix. Tables 1-2 list the 53 test problems, displaying in the columns the problem name; the dimensions (n and m); the number of performed iterations (#Iter); the number of function evaluations (#Fun); the function value at the last iterate
(f*); the demanded CPU time in seconds (CPU); and the reason for stopping (exit) for the three solvers, namely S₁, S₂ and S₃, the results of which are displayed row by row for each problem. As it is usual to have some variation of the CPU time from one execution of an algorithm to another, for each problem we ran all the solvers 5 times and considered the average CPU time of the 5 runs. A marker indicates that the obtained solution does not satisfy (29). It is worth mentioning that S₂ spent one Cholesky factorization per iteration. The only exception occurred at a single iteration of the problem 10FOLDTR, in which S₂ demanded an additional Cholesky factorization with a slight increase of µ_k, to ensure the numerically safe positive definiteness of J_kᵀJ_k + µ_kI.

The results corresponding to the solved problems are depicted in the performance profiles of Figure 1 for the number of iterations, the number of function evaluations and the demanded CPU time, respectively, where a logarithmic scale was used in the horizontal axis for better visualization of the results. In terms of efficiency, our strategies slightly outperform the benchmark when it comes to the number of iterations and function evaluations. Indeed, 56.6% of the problems were solved with the fewest number of iterations by S₁ and 58.5% by S₂ and S₃, whereas 52.8% of the problems were solved with the fewest number of function evaluations by S₁, and 60.4% by S₂ and S₃. Now, concerning the CPU time, the difference is more significant, as just 1.9% of the problems were solved with the least amount of time by S₁, against 28.3% by S₂ and 69.8% by S₃. Strategies S₂ and S₃ are the most robust, solving 52 of the 53 problems. Problem EIGENB was considered not solved by strategy S₂, and 10FOLDTR was not solved by S₃. On the other hand, the benchmark did not solve three problems (10FOLDTR, CYCLIC3, YATP2SQ), showing that our approach is competitive.
Figure 1: Performance profiles for the number of iterations, the number of function evaluations and the demanded CPU time.

With the aim of illustrating the rate of convergence of the proposed algebraic rules, Figure 2 shows the logarithm (base 10) of the residual value against iterations for both strategies S₂ and S₃ for two typical problems.
Figure 2: The residual value ‖F(x_k)‖ against iterations, for the problems HEART6 (left) and HYDCAR6 (right).

7 Final remarks

We have proposed and analyzed a class of Levenberg-Marquardt methods in which algebraic rules for computing the regularization parameter were devised. Under the Lipschitz continuity of the Jacobian of the residual function, the algebraic rules were obtained so as to allow the full LM step to be accepted by the Armijo sufficient-decrease condition. In terms of global convergence, all the accumulation points of the sequence generated by the algorithm are stationary. When it comes to local convergence, for zero-residual problems we have proved a q-quadratic rate under an error bound condition, which is less restrictive than assuming nonsingularity at the solution. A set of numerical experiments was prepared to illustrate the practical performance of the proposed algorithm. Our approach has shown both efficiency and robustness for the 53 feasibility instances from the CUTEst collection, compared with the benchmark lsqnonlin of Matlab. Obtaining inexact solutions for the linear systems, with adequately matching stopping criteria, as well as considering nonzero-residual problems in the local convergence analysis, are topics for future investigation.

Acknowledgements. The authors are thankful to Abel S. Siqueira for installing the CUTEst and preparing our machine to run it within Matlab.

References

[1] E. G. Birgin and J. M. Gentil. Evaluating bound-constrained minimization software. Comput. Optim. Appl., 53(2), 2012.

[2] H. Dan, N. Yamashita, and M. Fukushima. Convergence properties of the inexact Levenberg-Marquardt method under local error bound conditions. Optim. Methods Softw., 17(4):605-626, 2002.

[3] J. E. Dennis, Jr. and R. B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations, volume 16 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1996. Corrected reprint of the 1983 original.
[4] J. Fan. Accelerating the modified Levenberg-Marquardt method for nonlinear equations. Math. Comp., 83(287), 2014.

[5] J. Fan and J. Pan. Convergence properties of a self-adaptive Levenberg-Marquardt algorithm under local error bound condition. Comput. Optim. Appl., 34(1):47–62, 2006.

[6] J. Fan and Y.-X. Yuan. On the quadratic convergence of the Levenberg-Marquardt method without nonsingularity assumption. Computing, 74(1):23–39, 2005.

[7] A. Fischer. Local behavior of an iterative framework for generalized equations with nonisolated solutions. Math. Program., 94(1, Ser. A):91–124, 2002.

[8] A. Fischer, P. K. Shukla, and M. Wang. On the inexactness level of robust Levenberg-Marquardt methods. Optimization, 59(2):273–287, 2010.

[9] N. Gould, D. Orban, and P. Toint. CUTEst: a constrained and unconstrained testing environment with safe threads. Technical Report RAL-TR, Rutherford Appleton Laboratory, Oxfordshire, England, 2013.

[10] S. Gratton, A. S. Lawless, and N. K. Nichols. Approximate Gauss-Newton methods for nonlinear least squares problems. SIAM J. Optim., 18(1):106–132, 2007.

[11] E. W. Karas, S. A. Santos, and B. F. Svaiter. Algebraic rules for quadratic regularization of Newton's method. Comput. Optim. Appl., 60, 2014.

[12] K. Levenberg. A method for the solution of certain non-linear problems in least squares. Quart. Appl. Math., 2:164–168, 1944.

[13] D. W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Indust. Appl. Math., 11:431–441, 1963.

[14] J. J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In G. A. Watson, editor, Lecture Notes in Mathematics 630: Numerical Analysis, pages 105–116. Springer-Verlag, New York, 1978.

[15] D. D. Morrison. Methods for nonlinear least squares problems and convergence proofs. In J. Lorell and F. Yagi, editors, Proceedings of the Seminar on Tracking Programs and Orbit Determination, pages 1–9. Jet Propulsion Laboratory, Pasadena, CA, USA, 1960.

[16] K. Ueda and N. Yamashita.
On a global complexity bound on the Levenberg-Marquardt method. J. Optim. Theory Appl., 147, 2010.

[17] N. Yamashita and M. Fukushima. On the rate of convergence of the Levenberg-Marquardt method. In Topics in Numerical Analysis, Comput. Suppl. 15, pages 239–249, 2001.

[18] Y.-X. Yuan. Recent advances in numerical methods for nonlinear equations and nonlinear least squares. Numer. Algebra Control Optim., 1(1):15–34, 2011.

Appendix

The complete computational results are presented next.
Table 1: Numerical results (columns: Problem, n, m, #Iter, #Fun, f, CPU, exit) for the problems FOLDTR, ARGAUSS, ARGLALE, ARGLBLE, ARGTRIG, ARWHDNE, BOOTH, BROWNALE, BROYDN3D, BROYDNBD, CHANDHEU, CHNRSBNE, CLUSTER, COOLHANS, CUBENE, CYCLIC, EIGENAU, EIGENB, EIGENC, GOTTFR, GROWTH, HATFLDF, HATFLDG, HEART6, HEART8, HIMMELBA and HIMMELBC.
Table 2: Numerical results (columns: Problem, n, m, #Iter, #Fun, f, CPU, exit) for the problems HIMMELBD, HIMMELBE, HYDCAR6, HYDCAR20, HYPCIR, KSS, METHANB, METHANL, MSQRTA, MSQRTB, OSCIGRNE, OSCIPANE, POWELLBS, POWELLSQ, RECIPE, RSNBRNE, SINVALNE, SPIN, SPIN2, SPMSQRT, YATP1NE, YATP1SQ, YATP1SS, YATP2SQ, YFITNE and ZANGWIL3.
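For reference, the basic shape of the Levenberg-Marquardt iteration benchmarked in these tables can be sketched as follows. This is a deliberately minimal Python illustration: it uses a simple doubling/halving damping update together with an Armijo acceptance test on the full step, and it does not implement the paper's algebraic rules for the regularization parameter; all names are illustrative.

```python
import numpy as np

def lm_solve(F, J, x0, mu0=1e-2, sigma=1e-4, tol=1e-8, max_iter=100):
    """Generic Levenberg-Marquardt method with an Armijo acceptance test.

    F : residual function R^n -> R^m;  J : its Jacobian.
    Minimizes f(x) = 0.5 * ||F(x)||^2.  The damping update below is a
    plain doubling/halving heuristic, used only for illustration.
    """
    x = np.asarray(x0, dtype=float)
    mu = mu0
    for _ in range(max_iter):
        r = F(x)
        Jx = J(x)
        g = Jx.T @ r                      # gradient of 0.5*||F||^2
        if np.linalg.norm(g) <= tol:
            break
        # Regularized normal equations: (J^T J + mu I) d = -J^T r
        d = np.linalg.solve(Jx.T @ Jx + mu * np.eye(x.size), -g)
        # Armijo sufficient decrease on the full LM step
        f0 = 0.5 * r @ r
        f1 = 0.5 * F(x + d) @ F(x + d)
        if f1 <= f0 + sigma * (g @ d):    # accept: move, relax damping
            x = x + d
            mu = max(mu / 2.0, 1e-12)
        else:                             # reject: increase damping
            mu *= 2.0
    return x
```

For small damping the step approaches the Gauss-Newton step, which is what yields fast local convergence on zero-residual problems; for large damping it approaches a short steepest-descent step, which guarantees that the Armijo test is eventually satisfied.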