Gradient methods exploiting spectral properties

Noname manuscript No. (will be inserted by the editor)

Gradient methods exploiting spectral properties

Yakui Huang · Yu-Hong Dai · Xin-Wei Liu

Received: date / Accepted: date

Abstract A new stepsize is derived for the gradient method. It is shown that this stepsize asymptotically converges to the reciprocal of the largest eigenvalue of the Hessian of the objective function. Based on this spectral property, we develop a monotone gradient method that takes a certain number of steps with the asymptotic optimal stepsize proposed by Dai and Yang (Computational Optimization and Applications, 2006, 33(1): 73-88), followed by some short steps associated with the new stepsize. By employing a one-step retard of the asymptotic optimal stepsize, a nonmonotone variant is suggested. R-linear convergence is established for the proposed methods. Numerical comparisons with other recent successful methods demonstrate the efficiency of our methods.

Keywords gradient method · spectral property · convergence · quadratic optimization

This work was supported by the National Natural Science Foundation of China (Grant Nos. , , , ) and the National 973 Program of China (Grant No. 2015CB856002).

Yakui Huang, Institute of Mathematics, Hebei University of Technology, Tianjin, China. huangyakui2006@gmail.com

Yu-Hong Dai, State Key Laboratory of Scientific and Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China. dyh@lsec.cc.ac.cn

Xin-Wei Liu, Institute of Mathematics, Hebei University of Technology, Tianjin, China. mathlxw@hebut.edu.cn

1 Introduction

We consider the problem of minimizing a convex quadratic function

    min f(x) = (1/2) x^T A x − b^T x,    (1)

where b ∈ R^n and A ∈ R^{n×n} is symmetric positive definite with eigenvalues 0 < λ_1 ≤ λ_2 ≤ ... ≤ λ_n and condition number κ = λ_n/λ_1. This problem is one of the simplest non-trivial nonlinear programming problems. Solving problem (1) is usually a prerequisite for a method to be generalized to solve more general problems. In addition, various optimization problems arising in many applications, including machine learning [6], sparse reconstruction [18] and nonnegative matrix factorization [25,27], can be formulated in the form of (1), possibly with the addition of regularization or bound constraints.

The simplest method for solving (1) is the gradient method, which updates iterates by

    x_{k+1} = x_k − α_k g_k,    (2)

where g_k = ∇f(x_k) and α_k > 0 is the stepsize determined in some way. The classic steepest descent (SD) method can be dated back to Cauchy [5], who suggested computing the stepsize by exact line search:

    α_k^{SD} = arg min_α f(x_k − α g_k) = g_k^T g_k / (g_k^T A g_k).    (3)

It has been shown that the method converges linearly [1] with Q-linear factor (κ − 1)/(κ + 1). Thus, the SD method can be very slow, especially when the condition number is large. Further analysis shows that the gradients will asymptotically produce zigzags between two orthogonal directions associated with the two eigenvectors corresponding to λ_1 and λ_n; see [20,29] for more details.

In 1988, from the point of view of quasi-Newton methods, Barzilai and Borwein [2] designed the following two ingenious stepsizes that significantly improve the performance of gradient methods:

    α_k^{BB1} = s_{k−1}^T s_{k−1} / (s_{k−1}^T y_{k−1}),   α_k^{BB2} = s_{k−1}^T y_{k−1} / (y_{k−1}^T y_{k−1}),    (4)

where s_{k−1} = x_k − x_{k−1} and y_{k−1} = g_k − g_{k−1}. The BB method was shown to be globally convergent for general n-dimensional strictly convex quadratics [30] and the convergence rate is R-linear [11]. Recently, the BB method and its variants have successfully been extended to general unconstrained problems [31], to constrained optimization problems [3,24] and to various applications [25,26,28]; see also [4,10,16,19,33] and references therein.
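For readers who prefer code, the update rule (2) and the stepsizes (3) and (4) can be written down directly. The following Python/NumPy sketch is an illustration added in this transcription (it is not the authors' Matlab code, and the function names are ours):

    import numpy as np

    def sd_stepsize(A, g):
        # Cauchy's exact line search stepsize (3): g_k^T g_k / (g_k^T A g_k).
        return g.dot(g) / g.dot(A.dot(g))

    def bb_stepsizes(s, y):
        # Barzilai-Borwein stepsizes (4), with s = x_k - x_{k-1} and y = g_k - g_{k-1}.
        return s.dot(s) / s.dot(y), s.dot(y) / y.dot(y)

    def gradient_method(A, b, x, choose_alpha, tol=1e-6, max_iter=10000):
        # Generic gradient method (2) for the quadratic (1);
        # choose_alpha(k, A, x, g) must return the stepsize alpha_k.
        g = A.dot(x) - b
        g0 = np.linalg.norm(g)
        for k in range(max_iter):
            if np.linalg.norm(g) <= tol * g0:
                break
            x = x - choose_alpha(k, A, x, g) * g
            g = A.dot(x) - b
        return x, k

For example, passing choose_alpha = lambda k, A, x, g: sd_stepsize(A, g) reproduces the SD method.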

Let {ξ_1, ξ_2, ..., ξ_n} be the orthonormal eigenvectors associated with the eigenvalues, and denote the components of g_k along the eigenvectors ξ_i by μ_i^k, i = 1, ..., n, i.e.,

    g_k = Σ_{i=1}^n μ_i^k ξ_i,

which together with the update rule (2) gives that

    g_{k+1} = g_k − α_k A g_k = Π_{j=1}^k (I − α_j A) g_1 = Σ_{i=1}^n μ_i^{k+1} ξ_i,

where

    μ_i^{k+1} = (1 − α_k λ_i) μ_i^k = μ_i^1 Π_{j=1}^k (1 − α_j λ_i).

This implies that the closer α_k is to 1/λ_i, the smaller |μ_i^{k+1}|. In addition, if μ_i^k = 0, the corresponding component vanishes at all subsequent iterations. Since the SD method will asymptotically zigzag between ξ_1 and ξ_n, a natural way to break the zigzagging pattern is eliminating the component μ_1^k or μ_n^k, which can be achieved by employing a stepsize that approximates 1/λ_1 or 1/λ_n.

One pioneering work in this line is due to Yuan [14,32], who derived the following stepsize by imposing finite termination for two-dimensional convex quadratics:

    α_k^Y = 2 / ( sqrt( (1/α_{k−1}^{SD} − 1/α_k^{SD})^2 + 4‖g_k‖^2/(α_{k−1}^{SD} ‖g_{k−1}‖)^2 ) + 1/α_{k−1}^{SD} + 1/α_k^{SD} ).    (5)

Dai and Yuan [14] further suggested a new gradient method whose stepsize is given by

    α_k^{DY} = α_k^{SD},  if mod(k, 4) < 2;   α_k^Y,  otherwise.    (6)

The DY method (6) keeps monotonicity and appears better than the nonmonotone BB method; see [14]. It was shown by De Asmundis et al. [15] that the stepsize α_k^Y converges to 1/λ_n if the SD method is applied to problem (1). That is, occasionally employing the stepsize α_k^Y along the SD method will enhance the elimination of the component μ_n^k. Recently, Gonzaga and Schneider [23] suggested a monotone method where all stepsizes are of the form (3). Particularly, their method approximates 1/λ_n by a short stepsize calculated by replacing g_k in (3) with (I − ηA) g_k, where η is a large number.

Nonmonotone gradient methods exploiting spectral properties have been developed as well. Frassoldati et al. [21] developed a new short stepsize by maximizing the next SD stepsize α_{k+1}^{SD} according to the current stepsize. They further suggested the ABBmin2 method, which tries to exploit BB1 stepsizes close to 1/λ_1 by using short stepsizes to eliminate those components associated with large eigenvalues, thereby forcing the gradient to be dominated by components corresponding to small eigenvalues. More recently, based on the favourable property of α_k^Y, De Asmundis et al. [15] suggested reusing it in a cyclic fashion after a certain number of SD steps. Precisely, their approach, referred to as the SDC method, employs the stepsize

    α_k^{SDC} = α_k^{SD},  if mod(k, h + s) < h;   α_t^Y,  otherwise,    (7)

with t = max{i ≤ k : mod(i, h + s) = h}, where h ≥ 2 and s ≥ 1.
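As a concrete reading of (5) and (6), a small Python helper could look as follows; this is our own illustrative sketch (argument names are ours), not code from the paper:

    import numpy as np

    def yuan_stepsize(a_sd_prev, a_sd, gnorm_prev, gnorm):
        # Yuan stepsize (5), built from two consecutive SD stepsizes and gradient norms.
        diff = 1.0 / a_sd_prev - 1.0 / a_sd
        root = np.sqrt(diff**2 + 4.0 * gnorm**2 / (a_sd_prev * gnorm_prev)**2)
        return 2.0 / (root + 1.0 / a_sd_prev + 1.0 / a_sd)

    def dy_stepsize(k, a_sd_prev, a_sd, gnorm_prev, gnorm):
        # Dai-Yuan rule (6): alternate two SD steps with two Yuan steps.
        if k % 4 < 2:
            return a_sd
        return yuan_stepsize(a_sd_prev, a_sd, gnorm_prev, gnorm)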

They also proposed a monotone version of (7) by imposing safeguards on the stepsizes.

One common characteristic of the aforementioned methods is making use of spectral properties of the stepsizes. The recent study in [16] points out that gradient methods using long and short stepsizes that attempt to exploit their spectral properties show better numerical behaviour both in the quadratic and in the general case. For more works on gradient methods, see [7,9,15,16,21,23,33,34] and references therein.

Recently, Dai and Yang [12] introduced a gradient method with a new stepsize that possesses a similar spectral property as α_k^Y. In particular, their stepsize is calculated by

    α_k^{AOPT} = ‖g_k‖ / ‖A g_k‖,    (8)

which asymptotically converges to 2/(λ_1 + λ_n), the value that is optimal in the sense that it minimizes ‖I − αA‖; see for example [12,17]. Since α_k^{AOPT} ≤ α_k^{SD}, the Dai-Yang method (8) is monotone but without exact line searches. In addition, the method converges Q-linearly and enjoys the same rate as the SD method. More importantly, it is able to recover the eigenvectors ξ_1 and ξ_n.

In this paper, based on the Dai-Yang method (8), we propose a new stepsize to exploit the spectral property. Particularly, our new stepsize is given by

    ᾱ_k = d_k^T d_k / (d_k^T A d_k),    (9)

where

    d_k = g_{k−1}/‖g_{k−1}‖ − g_k/‖g_k‖.    (10)

We show that the stepsize ᾱ_k asymptotically converges to 1/λ_n if the Dai-Yang method (8) is applied to problem (1). Therefore, the stepsize ᾱ_k is helpful in eliminating the component μ_n^k. Thanks to this desired property, we are able to develop a new efficient gradient method by taking a certain number of steps with the asymptotic optimal stepsize α_k^{AOPT} followed by some short steps which are determined by the smaller of α_k^{AOPT} and ᾱ_{k−1}. Thus, our method is monotone without exact line searches. We also consider a nonmonotone variant of the method which is the same as the monotone one except for using the stepsize α_k^{AOPT} with one step retard. R-linear convergence of the proposed methods is established. Preliminary numerical comparisons with the DY, ABBmin2 and SDC methods are presented to demonstrate the efficiency of our methods.

The paper is organized as follows. In Section 2 we analyze the asymptotic spectral property of the stepsize ᾱ_k and present the new method as well as its nonmonotone variant. In Section 3 we show the proposed methods are R-linearly convergent. In Section 4 we report some numerical comparisons of our methods and the DY, ABBmin2 and SDC methods. Finally, a brief discussion is presented.
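The two quantities (8)-(10) are straightforward to compute from two consecutive gradients. The helpers below are our own illustrative sketch (function names are ours), reused conceptually in the later sketches:

    import numpy as np

    def aopt_stepsize(A, g):
        # Dai-Yang asymptotically optimal stepsize (8): ||g_k|| / ||A g_k||.
        return np.linalg.norm(g) / np.linalg.norm(A.dot(g))

    def bar_alpha(A, g_prev, g):
        # New stepsize (9)-(10): inverse Rayleigh quotient of
        # d_k = g_{k-1}/||g_{k-1}|| - g_k/||g_k||.
        d = g_prev / np.linalg.norm(g_prev) - g / np.linalg.norm(g)
        return d.dot(d) / d.dot(A.dot(d))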

2 Proposed methods

In this section we first analyze the spectral property of the stepsize ᾱ_k and then present our new gradient methods.

2.1 Spectral property of ᾱ_k

We recall important properties of the method (8).

Lemma 1 [12] For any starting point x_1 satisfying μ_1^1 ≠ 0 and μ_n^1 ≠ 0, let {x_k} be the iterations generated by the method (8). Then we have that

    lim_{k→∞} α_k^{AOPT} = 2/(λ_1 + λ_n).

Furthermore,

    lim_{k→∞} μ_i^{2k−1} / sqrt( Σ_{j=1}^n (μ_j^{2k−1})^2 ) = { sign(μ_1^1)√c_1, if i = 1;  0, if i = 2, ..., n−1;  sign(μ_n^1)√c_2, if i = n }

and

    lim_{k→∞} μ_i^{2k} / sqrt( Σ_{j=1}^n (μ_j^{2k})^2 ) = { sign(μ_1^1)√c_1, if i = 1;  0, if i = 2, ..., n−1;  −sign(μ_n^1)√c_2, if i = n },

which indicates that

    lim_{k→∞} ( g_{k−1}/‖g_{k−1}‖ + g_k/‖g_k‖ ) = 2 sign(μ_1^1)√c_1 ξ_1

and

    lim_{k→∞} ( g_{k−1}/‖g_{k−1}‖ − g_k/‖g_k‖ ) = ±2√c_2 ξ_n,

where

    c_1 = (3λ_n + λ_1) / (4(λ_1 + λ_n)),   c_2 = (3λ_1 + λ_n) / (4(λ_1 + λ_n)).

Lemma 1 indicates that the method (8) asymptotically reduces its search to the two-dimensional subspace spanned by ξ_1 and ξ_n. In order to accelerate the method, we can employ some stepsize that approximates 1/λ_1 or 1/λ_n to eliminate the component μ_1^k or μ_n^k. Note that the vector d_k tends to align with the eigenvector ξ_n. If we take some gradient steps with α_k^{AOPT} such that d_k ≈ ±2√c_2 ξ_n, the stepsize ᾱ_k will be an approximation of 1/λ_n. The next theorem provides theoretical evidence for this.

Theorem 1 Under the conditions in Lemma 1, let {g_k} be the sequence generated by applying the method (8) to problem (1). Then we have

    lim_{k→∞} ᾱ_k = 1/λ_n.

Proof From the definition (10) of d_k, we have

    d_k^T d_k = 2 − 2 g_{k−1}^T g_k / (‖g_{k−1}‖ ‖g_k‖)    (11)

and

    d_k^T A d_k = g_{k−1}^T A g_{k−1}/‖g_{k−1}‖^2 + g_k^T A g_k/‖g_k‖^2 − 2 g_{k−1}^T A g_k / (‖g_{k−1}‖ ‖g_k‖),    (12)

which, by Lemma 1, indicate that

    lim_{k→∞} d_k^T d_k = 2 − 2 lim_{k→∞} Σ_{i=1}^n [ μ_i^{k−1} / sqrt(Σ_{j=1}^n (μ_j^{k−1})^2) ] [ μ_i^k / sqrt(Σ_{j=1}^n (μ_j^k)^2) ] = 2 − 2(c_1 − c_2)    (13)

and

    lim_{k→∞} d_k^T A d_k = lim_{k→∞} [ Σ_{i=1}^n λ_i (μ_i^{k−1})^2 / Σ_{j=1}^n (μ_j^{k−1})^2 + Σ_{i=1}^n λ_i (μ_i^k)^2 / Σ_{j=1}^n (μ_j^k)^2 − 2 Σ_{i=1}^n λ_i μ_i^{k−1} μ_i^k / (‖g_{k−1}‖ ‖g_k‖) ] = 2(λ_1 c_1 + λ_n c_2) − 2(λ_1 c_1 − λ_n c_2) = 4 λ_n c_2.    (14)

It follows from (13), (14) and the definition (9) of ᾱ_k that

    lim_{k→∞} ᾱ_k = lim_{k→∞} d_k^T d_k / (d_k^T A d_k) = (2 − 2(c_1 − c_2)) / (4 λ_n c_2) = (4(λ_1 + λ_n) − (2λ_n − 2λ_1)) / (2λ_n (3λ_1 + λ_n)) = 1/λ_n.

This completes the proof.
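The limit algebra used at the end of the proof is easy to sanity-check numerically; the snippet below is our own check (not part of the paper), with c_1 and c_2 as in Lemma 1:

    import numpy as np

    # Verify (2 - 2(c_1 - c_2)) / (4 lambda_n c_2) = 1 / lambda_n for a few eigenvalue pairs.
    for lam1, lamn in [(1.0, 10.0), (0.5, 1e3), (2.0, 1e6)]:
        c1 = (3 * lamn + lam1) / (4 * (lam1 + lamn))
        c2 = (3 * lam1 + lamn) / (4 * (lam1 + lamn))
        assert np.isclose((2 - 2 * (c1 - c2)) / (4 * lamn * c2), 1.0 / lamn)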

Using the same argument as in the proof of Theorem 1, we obtain the following result.

Theorem 2 Under the conditions in Lemma 1, let {g_k} be the sequence generated by applying the method (8) to problem (1). Then we have

    lim_{k→∞} ˆα_k = 1/λ_1,

where ˆα_k = ˆd_k^T ˆd_k / (ˆd_k^T A ˆd_k) with ˆd_k = g_{k−1}/‖g_{k−1}‖ + g_k/‖g_k‖.

2.2 The algorithm

Theorems 1 and 2 in the former subsection provide us the possibility of employing the two stepsizes ˆα_k and ᾱ_k to eliminate the components μ_1^k and μ_n^k. However, the following example shows negative aspects of using ˆα_k. Particularly, we applied the method (8) to a problem of the form (1) with

    A = diag{a_1, a_2, ..., a_n},   b = 0,    (15)

where a_1 = 1, a_n = n and a_i is randomly generated in (1, n), i = 2, ..., n−1. Figure 1 presents the result of an instance with n = 1000.

Fig. 1 Problem (15): convergence history of the sequences {ˆα_k} and {ᾱ_k} for the first 100 iterations of the method (8).

We can see that ᾱ_k approximates 1/λ_n with satisfactory accuracy in a few iterations. Although we did observe that after 1000 iterations the value of |ˆα_k − 1/λ_1| is reduced by a factor of 0.01, ˆα_k converges to 1/λ_1 very slowly in the first few hundreds of iterations.

Now we give a rough explanation of the above phenomenon. Since the gradient norm decreases very slowly, by the update rule (2) we have that

    g_k^{(i)}/‖g_k‖ = (1 − α_{k−1} λ_i) (g_{k−1}^{(i)}/‖g_{k−1}‖) (‖g_{k−1}‖/‖g_k‖) ≈ (1 − α_{k−1} λ_i) g_{k−1}^{(i)}/‖g_{k−1}‖.

Suppose that α_k satisfies |1 − α_k λ_i| ≤ 1 for all i = 1, 2, ..., n and k ≥ k_0. This will be true since α_k^{AOPT} approximates 2/(λ_1 + λ_n) as the iteration process goes on. Let d_k^{(i)} be the i-th component of d_k, i.e.,

    d_k^{(i)} = g_{k−1}^{(i)}/‖g_{k−1}‖ − g_k^{(i)}/‖g_k‖.

For the stepsize ᾱ_k, k ≥ k_0, we consider the following two cases:

Case 1. |1 − α_k λ_i| ≤ 0.6. By trivial computation we know that after five steps the value of |g_k^{(i)}|/‖g_k‖ is less than 8% of its initial value and keeps small at all subsequent iterations; thus d_k^{(i)} can be neglected.

Case 2. |1 − α_k λ_i| > 0.6.

(i) If 1 − α_k λ_i ≥ 0.9, the value of g_k^{(i)}/‖g_k‖ will not change much and thus d_k^{(i)} may not affect the value of ᾱ_k too much, which indicates that the component can also be neglected.

(ii) If 0.6 < 1 − α_k λ_i < 0.9, i.e., 0.1 < α_k λ_i < 0.4, the value of |g_k^{(i)}|/‖g_k‖ will be small in a few iterations, which implies that d_k^{(i)} is safe to be abandoned.

(iii) If 1 − α_k λ_i < −0.6, the value of g_k^{(i)}/‖g_k‖ changes signs and d_k^{(i)} may increase.

The above analysis shows that ᾱ_k will only be dominated by the components corresponding to those eigenvalues in (iii) of Case 2. Notice that the required inequality in (iii) implies that λ_i > 4(λ_1 + λ_n)/5. If A has many large eigenvalues satisfying this condition, ᾱ_k will be an estimate of the reciprocal of one of those eigenvalues, which is also an approximation of 1/λ_n. If A has only a few such large eigenvalues, ᾱ_k will be dominated by the components corresponding to the first few largest eigenvalues, which also yields a good estimate of 1/λ_n. In other words, ᾱ_k will approximate 1/λ_n with satisfactory accuracy in a few iterations. This coincides with our observation in Figure 1.

For the stepsize ˆα_k, when 1 − α_k λ_i > 0, the value of g_{k−1}^{(i)}/‖g_{k−1}‖ + g_k^{(i)}/‖g_k‖ may increase even if 1 − α_k λ_i ≤ 0.1. That is, most of the components of the gradient corresponding to those eigenvalues less than 1/α_k will affect the value of ˆα_k. Thus, ˆα_k would not be a good approximation of 1/λ_1 until those components become very small. Moreover, a rough estimate of 1/λ_1 may yield a large step which will increase most of the components of the gradient. As a result, it may be impractical to use ˆα_k for eliminating the component μ_1^k.
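The experiment behind Figure 1 (and Figure 2 below) is easy to reproduce. The following sketch is our own illustration under the stated setup of problem (15); the dimension, iteration count, random seed and starting point are assumptions, and this is not the authors' code:

    import numpy as np

    def run_problem_15(n=1000, iters=100, seed=0):
        # Problem (15): A = diag{a_1,...,a_n}, b = 0, a_1 = 1, a_n = n, a_i random in (1, n).
        rng = np.random.default_rng(seed)
        a = np.sort(rng.uniform(1.0, n, size=n))
        a[0], a[-1] = 1.0, float(n)
        x = rng.standard_normal(n)          # starting point with nonzero first/last components
        g = a * x                           # gradient of (1) with b = 0 and diagonal A
        for k in range(iters):
            alpha = np.linalg.norm(g) / np.linalg.norm(a * g)   # stepsize (8)
            x = x - alpha * g
            g_prev, g = g, a * x
            d = g_prev / np.linalg.norm(g_prev) - g / np.linalg.norm(g)
            dh = g_prev / np.linalg.norm(g_prev) + g / np.linalg.norm(g)
            bar = d.dot(d) / d.dot(a * d)       # expected to approach 1/lambda_n quickly
            hat = dh.dot(dh) / dh.dot(a * dh)   # approaches 1/lambda_1 only slowly
            print(k, abs(bar - 1.0 / a[-1]), abs(hat - 1.0 / a[0]))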

Based on the above observations, our method only combines ᾱ_k with α_k^{AOPT}. Note that, for quadratics, the BB1 stepsize α_k^{BB1} is the former SD stepsize (3), i.e., α_k^{BB1} = α_{k−1}^{SD}. Motivated by this and the good performance of the BB method, the stepsize ᾱ_k with one step delay is employed for the proposed method. As ᾱ_{k−1} is expected to be short in order to approximate 1/λ_n, we resort to α_k^{AOPT} if ᾱ_{k−1} becomes large. More precisely, our method takes h steps with α_k^{AOPT} and then s short steps, that is,

    α_k = α_k^{AOPT},  if mod(k, h + s) < h;   min{α_k^{AOPT}, ᾱ_{k−1}},  otherwise.    (16)

Remark 1 Although our method (16) looks like the SDC method (7), they differ in the following ways: (i) the method (16) does not use exact line searches, which are necessary for the SDC method; (ii) the method (16) does not reuse any stepsize, while the SDC method uses the same Yuan stepsize α_t^Y for s steps; (iii) the method (16) is monotone, while the SDC method is nonmonotone and its monotone version is obtained by using a safeguard with 2α_k^{SD}.
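A compact implementation of the monotone method (16) could look as follows; this is our own sketch (parameter defaults and names are assumptions), not the authors' Matlab code. The nonmonotone variant (17) introduced below is obtained by replacing α_k^{AOPT} with its value from the previous iteration.

    import numpy as np

    def method_16(A, b, x, h=10, s=80, tol=1e-9, max_iter=20000):
        # Monotone method (16): h steps with alpha_k^{AOPT}, then s steps with
        # min{alpha_k^{AOPT}, bar_alpha_{k-1}}.
        g = A.dot(x) - b
        g0 = np.linalg.norm(g)
        g_prev, bar_km1 = None, np.inf       # bar_alpha_{k-1} is unavailable at the start
        for k in range(max_iter):
            gnorm = np.linalg.norm(g)
            if gnorm <= tol * g0:
                break
            if g_prev is not None:           # bar_alpha_k from g_{k-1} and g_k, see (9)-(10)
                d = g_prev / np.linalg.norm(g_prev) - g / gnorm
                bar_k = d.dot(d) / d.dot(A.dot(d))
            else:
                bar_k = np.inf
            aopt = gnorm / np.linalg.norm(A.dot(g))
            alpha = aopt if k % (h + s) < h else min(aopt, bar_km1)
            g_prev, bar_km1 = g, bar_k       # shift the one-step history
            x = x - alpha * g
            g = A.dot(x) - b
        return x, k

Keeping only the previous gradient and the previously computed ᾱ value realizes the one-step delay in (16) without any extra matrix-vector products.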

Fig. 2 Problem (15): history of the sequence {ᾱ_k} for the first 100 iterations of the gradient method with stepsize α_{k−1}^{AOPT}.

The analysis at the beginning of this subsection indicates that d_k^{(i)} will be small if |1 − α_k λ_i| is small. Thus, any stepsize that approximates some 1/λ_i is expected to enforce a small ᾱ_k. It is natural to consider the retarded stepsize α_{k−1}^{AOPT}, which has been analyzed in [8]. We can see from Figure 2 that, when the gradient method with α_{k−1}^{AOPT} is applied to problem (15), the stepsize ᾱ_k approximates 1/λ_n with high accuracy if its value is small. Motivated by this observation, we suggest the following nonmonotone variant:

    α_k = α_{k−1}^{AOPT},  if mod(k, h + s) < h;   min{α_{k−1}^{AOPT}, ᾱ_{k−1}},  otherwise.    (17)

3 Convergence

In this section, we establish the R-linear convergence of our method (16) and its nonmonotone variant (17). Since the gradient method (2) is invariant under translations and rotations when applied to problem (1), we make the following assumption throughout the analysis.

Assumption 1. The matrix A is diagonal, i.e.,

    A = diag{λ_1, λ_2, ..., λ_n},    (18)

with 0 < λ_1 < λ_2 < ... < λ_n.

In order to give a uniform analysis of the methods (16) and (17), we resort to the following property proposed by Dai [7].

Property (A) [7]. Suppose that there exist an integer m and positive constants M_1 ≥ λ_1 and M_2 such that
(i) λ_1 ≤ α_k^{−1} ≤ M_1;
(ii) for any integer l ∈ [1, n−1] and ε > 0, if G(k−j, l) ≤ ε and (g_{k−j}^{(l+1)})^2 ≥ M_2 ε hold for j ∈ [0, min{k, m} − 1], then α_k^{−1} ≥ (2/3) λ_{l+1}.
Here, G(k, l) = Σ_{i=1}^l (g_k^{(i)})^2.

Dai [7] has proved that if A has the form (18) with λ_1 = 1 and the stepsizes of the gradient method (2) have Property (A), then either g_k = 0 for some finite k or the sequence {‖g_k‖} converges to zero R-linearly. Therefore, we show R-linear convergence of the proposed methods by proving that (16) and (17) satisfy Property (A).

Theorem 3 Suppose that the sequence {‖g_k‖} is generated by either of the methods (16) and (17) applied to an n-dimensional quadratic with the matrix A of the form (18) and 1 = λ_1 < λ_2 < ... < λ_n. Then either g_k = 0 for some finite k or the sequence {‖g_k‖} converges to zero R-linearly.

Proof We show that the stepsize α_k has Property (A) with m = 2, M_1 = λ_n and M_2 = 2. Clearly, λ_1 ≤ α_k^{−1} ≤ λ_n for all k ≥ 1. Thus, (i) of Property (A) holds with M_1 = λ_n.

Notice that

    (α_k^{AOPT})^{−1} ≥ (α_k^{SD})^{−1}   and   (α_{k−1}^{AOPT})^{−1} ≥ (α_{k−1}^{SD})^{−1} = (α_k^{BB1})^{−1}.

Thus it suffices to show that (ii) holds for α_k^{BB1}. If G(k−j, l) ≤ ε and (g_{k−j}^{(l+1)})^2 ≥ 2ε hold for j ∈ [0, min{k, m} − 1], we have

    (α_{k−j}^{BB1})^{−1} = (α_{k−j−1}^{SD})^{−1} = Σ_{i=1}^n λ_i (g_{k−j−1}^{(i)})^2 / Σ_{i=1}^n (g_{k−j−1}^{(i)})^2
        ≥ λ_{l+1} Σ_{i=l+1}^n (g_{k−j−1}^{(i)})^2 / ( G(k−j−1, l) + Σ_{i=l+1}^n (g_{k−j−1}^{(i)})^2 )
        ≥ λ_{l+1} · 2ε / (ε + 2ε) = (2/3) λ_{l+1}.

This completes the proof.

4 Numerical results

In this section, we compare the performance of our methods (16) and (17) with the DY method (6) in [14], with the ABBmin2 method in [21], and with the SDC method (7) in [15]. Since the SDC method performs better than its monotone counterpart for most problems, we only run SDC. All codes were written in Matlab (v.9.0-R2016a). All the runs were carried out on a laptop with an Intel Core i7, 2.9 GHz processor and 8 GB of RAM running a Windows system.

For all the test problems, the iteration was stopped if

    ‖g_k‖ ≤ ε ‖g_1‖    (19)

is satisfied with a given ε, or if the iteration number exceeds 20,000. Based on the observations from Figures 1 and 2, we tested h with the values 10 and 20 for our methods. As in [21], the parameter τ used by the ABBmin2 method is set to 0.9 for all the problems.

Firstly, the compared methods were applied to 1000-dimensional quadratic problems with different spectral distributions. In particular, the problem used in [10,13,22,34] is employed, where A = QVQ^T with

    Q = (I − 2w_3 w_3^T)(I − 2w_2 w_2^T)(I − 2w_1 w_1^T),

and w_1, w_2, and w_3 are unitary random vectors, V = diag(v_1, ..., v_n) is a diagonal matrix where v_1 = 1 and v_n = κ, and v_j is randomly generated between 1 and κ for j = 2, ..., n−1. The entries of b were randomly generated between −10 and 10. Five sets of different spectral distributions, presented in Table 1, were carried out. For each problem set, three different values of the condition number κ as well as three accuracies ε were tested. For each value of κ or ε, 10 instances were randomly generated. Table 2 presents the number of iterations averaged over those instances using the starting point x_1 = (1, ..., 1)^T, where the parameter pair (h, s) used for the SDC method was set to (8, 6), which is more efficient than other choices in our test.

Table 1 Distributions of v_j.

Problem 1: {v_2, ..., v_{n−1}} ⊂ (1, κ)
Problem 2: {v_2, ..., v_{n/5}} ⊂ (1, 100); {v_{n/5+1}, ..., v_{n−1}} ⊂ (κ/2, κ)
Problem 3: {v_2, ..., v_{n/2}} ⊂ (1, 100); {v_{n/2+1}, ..., v_{n−1}} ⊂ (κ/2, κ)
Problem 4: {v_2, ..., v_{4n/5}} ⊂ (1, 100); {v_{4n/5+1}, ..., v_{n−1}} ⊂ (κ/2, κ)
Problem 5: {v_2, ..., v_{n/5}} ⊂ (1, 100); {v_{n/5+1}, ..., v_{4n/5}} ⊂ (100, κ/2); {v_{4n/5+1}, ..., v_{n−1}} ⊂ (κ/2, κ)
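The random test problems just described are straightforward to generate. The sketch below is our own illustration of the construction A = QVQ^T with the spectrum of the first problem set in Table 1; the function names, seed handling and the restriction to problem set 1 are assumptions of this transcription:

    import numpy as np

    def householder_q(n, rng):
        # Q = (I - 2 w3 w3^T)(I - 2 w2 w2^T)(I - 2 w1 w1^T) with unit random vectors w_i.
        Q = np.eye(n)
        for _ in range(3):
            w = rng.standard_normal(n)
            w /= np.linalg.norm(w)
            Q -= 2.0 * np.outer(w, Q.T @ w)   # Q <- (I - 2 w w^T) Q
        return Q

    def make_problem(n, kappa, rng):
        # Problem set 1 in Table 1: v_1 = 1, v_n = kappa, v_j random in (1, kappa).
        v = np.empty(n)
        v[0], v[-1] = 1.0, kappa
        v[1:-1] = rng.uniform(1.0, kappa, size=n - 2)
        Q = householder_q(n, rng)
        A = Q @ np.diag(v) @ Q.T
        b = rng.uniform(-10.0, 10.0, size=n)  # entries of b in (-10, 10)
        return A, b

For instance, A, b = make_problem(1000, 1e4, np.random.default_rng(0)) generates one instance; the other problem sets in Table 1 only change how v_2, ..., v_{n−1} are drawn.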

We can see from Table 2 that our method (16) is very competitive with the DY, ABBmin2 and SDC methods. For a fixed h, larger values of s seem to be preferable for our method (16). In addition, different settings of s lead to comparable results, with differences of less than 10% in the number of iterations for most of the test problems. For the first problem set, our method (16) outperforms the DY and SDC methods, although the ABBmin2 method is the fastest one among the compared methods. Particularly, when a high accuracy is required, the method (16) with (h, s) = (20, 100) takes less than 1/6 and 1/4 of the iterations needed by the DY and SDC methods, respectively. As for the second to fourth problem sets, the method (16) with different settings performs better than the DY and ABBmin2 methods. Moreover, it is comparable to and even better than the SDC method if proper h and s are selected. Our method (16) also shows advantages over the DY and SDC methods on the last problem set. The total number of iterations indicates the effectiveness of our method (16) as well.

We have to point out that our method (16) and the DY method are monotone while the ABBmin2 and SDC methods are not. Moreover, the per-iteration cost of the ABBmin2 method is higher because, in addition to one matrix-vector product, it needs four vector-vector multiplications to compute its stepsize at each iteration, while our method (16) only takes three.

Table 2 Number of averaged iterations of the methods (16), DY, ABBmin2 and SDC on problems in Table 1, for each problem set and accuracy ε. Columns for the method (16) correspond to (h, s) = (10, 20), (10, 30), (10, 50), (10, 80), (10, 100), (20, 20), (20, 30), (20, 50), (20, 80), (20, 100).

Table 3 presents the averaged number of iterations of our method (17) on the problems in Table 1. For comparison purposes, the results of the DY, ABBmin2 and SDC methods are restated there. Similar performance as for the method (16) shown in Table 2 can be observed. In particular, for each accuracy level, our method (17) takes around 30% fewer total iterations than the ABBmin2 method and even fewer than those required by the DY and SDC methods. Notice that the problems of the last set are difficult for the compared methods since they require more iterations than the other four sets. However, our method (17) always dominates the three compared methods except with the pair (h, s) = (10, 20). As compared with the method (16), the retard strategy used in the method (17) tends to improve the performance when h = 10. For the case h = 20, the method (17) is comparable to and even better than (16) in terms of total iterations.

Table 3 Number of averaged iterations of the methods (17), DY, ABBmin2 and SDC on problems in Table 1, for each problem set and accuracy ε. Columns for the method (17) correspond to (h, s) = (10, 20), (10, 30), (10, 50), (10, 80), (10, 100), (20, 20), (20, 30), (20, 50), (20, 80), (20, 100).

Secondly, we compared the methods on two large-scale real problems, Laplace1(a) and Laplace1(b), described in [19]. Both problems require the solution of an elliptic system of linear equations arising from a 3D Laplacian on a box, discretized using a standard 7-point finite difference stencil. The solution is fixed by a Gaussian function whose center is (α, β, γ), multiplied by x(x−1). A parameter σ is used to control the rate of decay of the Gaussian. Both Laplace1(a) and Laplace1(b) have n = N^3 variables, with N being the number of interior nodes taken in each coordinate direction, and a highly sparse Hessian matrix with a large condition number. We refer the readers to [19] for more details on these problems. In our tests, the associated parameters are set as follows:

(a) σ = 20, α = β = γ = 0.5; (b) σ = 50, α = 0.4, β = 0.7, γ = 0.5.

We used the null vector as the initial point. The numbers of iterations required by the compared methods on the two problems Laplace1(a) and Laplace1(b) are listed in Tables 4 and 5. From Table 4 we can see that, for the problem Laplace1(a), our method (16) with a large s outperforms the DY and SDC methods when the accuracy level is high. Moreover, with a proper (h, s), it performs better than the ABBmin2 method, which dominates the DY and SDC methods. For the problem Laplace1(b), our method (16) is still competitive with the compared methods.

Table 5 shows a similar tendency to the former one. However, for the problem Laplace1(b), our method (17) clearly outperforms the DY and SDC methods, especially when the required accuracy is high. Furthermore, our method (17) is very competitive with the ABBmin2 method in terms of total iterations.

Table 4 Number of iterations of the methods (16), DY, ABBmin2 and SDC on the 3D Laplacian problems Laplace1(a) and Laplace1(b), for each n and accuracy ε. Columns for the method (16) correspond to (h, s) = (10, 20), (10, 30), (10, 50), (10, 80), (10, 100), (20, 20), (20, 30), (20, 50), (20, 80), (20, 100).

5 Discussion

Based on the asymptotic optimal stepsize, we have proposed a new monotone gradient method which employs a new stepsize that converges to the reciprocal of the largest eigenvalue of the Hessian of the objective function. A nonmonotone variant has been presented as well. R-linear convergence of the proposed methods has been established. Our numerical experiments showed that the proposed methods are very competitive with other recent successful gradient methods.

As mentioned in Section 2, any stepsize that approximates some 1/λ_i is expected to enforce a small ᾱ_k. Thus, many stepsizes can be selected, such as α_k^{BB1} and α_k^{BB2}. We trivially combined the method (16) with the two BB stepsizes, that is,

    α_k = α_k^{BB1},  if mod(k, h + s) < h;   min{α_k^{BB1}, ᾱ_{k−1}},  otherwise,    (20)

and

    α_k = α_k^{BB2},  if mod(k, h + s) < h;   min{α_k^{BB2}, ᾱ_{k−1}},  otherwise.    (21)

Promising numerical results of the methods (20) and (21) have been observed. More sophisticated strategies that exploit the favorable properties of the BB method and our new stepsize ᾱ_k can be expected. An interesting question is how to approximate 1/λ_1 so as to eliminate the corresponding components. This will be challenging because it needs a large stepsize, which may increase those components related to small eigenvalues and deteriorate the performance. Another future concern is to apply the proposed methods to general unconstrained optimization and to constrained optimization.
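As a brief illustration of the stepsize rules (20) and (21), reusing the BB formulas (4), one might write the selection as follows (our own sketch; names are ours):

    def bb_variant_stepsize(k, h, s, a_bb, bar_prev):
        # Stepsize rule of (20)/(21): a_bb is alpha_k^{BB1} or alpha_k^{BB2} from (4),
        # bar_prev is bar_alpha_{k-1} from (9)-(10).
        if k % (h + s) < h:
            return a_bb
        return min(a_bb, bar_prev)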

Table 5 Number of iterations of the methods (17), DY, ABBmin2 and SDC on the 3D Laplacian problems Laplace1(a) and Laplace1(b), for each n and accuracy ε. Columns for the method (17) correspond to (h, s) = (10, 20), (10, 30), (10, 50), (10, 80), (10, 100), (20, 20), (20, 30), (20, 50), (20, 80), (20, 100).

References

1. Akaike, H.: On a successive transformation of probability distribution and its application to the analysis of the optimum gradient method. Ann. Inst. Stat. Math. 11(1), 1-16 (1959)
2. Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8(1) (1988)
3. Birgin, E.G., Martínez, J.M., Raydan, M.: Nonmonotone spectral projected gradient methods on convex sets. SIAM J. Optim. 10(4) (2000)
4. Birgin, E.G., Martínez, J.M., Raydan, M., et al.: Spectral projected gradient methods: review and perspectives. J. Stat. Softw. 60(3) (2014)
5. Cauchy, A.: Méthode générale pour la résolution des systèmes d'équations simultanées. Comp. Rend. Sci. Paris 25 (1847)
6. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3) (1995)
7. Dai, Y.H.: Alternate step gradient method. Optimization 52(4-5) (2003)
8. Dai, Y.H., Al-Baali, M., Yang, X.: A positive Barzilai-Borwein-like stepsize and an extension for symmetric linear systems. In: Numerical Analysis and Optimization. Springer (2015)
9. Dai, Y.H., Fletcher, R.: Projected Barzilai-Borwein methods for large-scale box-constrained quadratic programming. Numer. Math. 100(1) (2005)
10. Dai, Y.H., Huang, Y., Liu, X.W.: A family of spectral gradient methods for optimization. Available at: HTML/2018/05/6628.html (2018)
11. Dai, Y.H., Liao, L.Z.: R-linear convergence of the Barzilai and Borwein gradient method. IMA J. Numer. Anal. 22(1), 1-10 (2002)

12. Dai, Y.H., Yang, X.: A new gradient method with an optimal stepsize property. Comp. Optim. Appl. 33(1), 73-88 (2006)
13. Dai, Y.H., Yuan, Y.X.: Alternate minimization gradient method. IMA J. Numer. Anal. 23(3) (2003)
14. Dai, Y.H., Yuan, Y.X.: Analysis of monotone gradient methods. J. Ind. Manag. Optim. 1(2), 181 (2005)
15. De Asmundis, R., Di Serafino, D., Hager, W.W., Toraldo, G., Zhang, H.: An efficient gradient method using the Yuan steplength. Comp. Optim. Appl. 59(3) (2014)
16. Di Serafino, D., Ruggiero, V., Toraldo, G., Zanni, L.: On the steplength selection in gradient methods for unconstrained optimization. Appl. Math. Comput. 318 (2018)
17. Elman, H.C., Golub, G.H.: Inexact and preconditioned Uzawa algorithms for saddle point problems. SIAM J. Numer. Anal. 31(6) (1994)
18. Figueiredo, M.A., Nowak, R.D., Wright, S.J.: Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signal Process. 1(4) (2007)
19. Fletcher, R.: On the Barzilai-Borwein method. In: Optimization and Control with Applications (2005)
20. Forsythe, G.E.: On the asymptotic directions of the s-dimensional optimum gradient method. Numer. Math. 11(1) (1968)
21. Frassoldati, G., Zanni, L., Zanghirati, G.: New adaptive stepsize selections in gradient methods. J. Ind. Manag. Optim. 4(2), 299 (2008)
22. Friedlander, A., Martínez, J.M., Molina, B., Raydan, M.: Gradient method with retards and generalizations. SIAM J. Numer. Anal. 36(1) (1998)
23. Gonzaga, C.C., Schneider, R.M.: On the steepest descent algorithm for quadratic functions. Comp. Optim. Appl. 63(2) (2016)
24. Huang, Y., Liu, H.: Smoothing projected Barzilai-Borwein method for constrained non-Lipschitz optimization. Comp. Optim. Appl. 65(3) (2016)
25. Huang, Y., Liu, H., Zhou, S.: Quadratic regularization projected Barzilai-Borwein method for nonnegative matrix factorization. Data Min. Knowl. Disc. 29(6) (2015)
26. Jiang, B., Dai, Y.H.: Feasible Barzilai-Borwein-like methods for extreme symmetric eigenvalue problems. Optim. Method Softw. 28(4) (2013)
27. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401(6755), 788 (1999)
28. Liu, Y.F., Dai, Y.H., Luo, Z.Q.: Coordinated beamforming for MISO interference channel: Complexity analysis and efficient algorithms. IEEE Trans. Signal Process. 59(3) (2011)
29. Nocedal, J., Sartenaer, A., Zhu, C.: On the behavior of the gradient norm in the steepest descent method. Comp. Optim. Appl. 22(1), 5-35 (2002)
30. Raydan, M.: On the Barzilai and Borwein choice of steplength for the gradient method. IMA J. Numer. Anal. 13(3) (1993)
31. Raydan, M.: The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM J. Optim. 7(1) (1997)
32. Yuan, Y.X.: A new stepsize for the steepest descent method. J. Comput. Math. (2006)
33. Yuan, Y.X.: Step-sizes for the gradient method. AMS/IP Studies in Advanced Mathematics 42(2) (2008)
34. Zhou, B., Gao, L., Dai, Y.H.: Gradient methods with adaptive step-sizes. Comp. Optim. Appl. 35(1) (2006)


More information

Nonlinear conjugate gradient methods, Unconstrained optimization, Nonlinear

Nonlinear conjugate gradient methods, Unconstrained optimization, Nonlinear A SURVEY OF NONLINEAR CONJUGATE GRADIENT METHODS WILLIAM W. HAGER AND HONGCHAO ZHANG Abstract. This paper reviews the development of different versions of nonlinear conjugate gradient methods, with special

More information

Interior-Point Methods as Inexact Newton Methods. Silvia Bonettini Università di Modena e Reggio Emilia Italy

Interior-Point Methods as Inexact Newton Methods. Silvia Bonettini Università di Modena e Reggio Emilia Italy InteriorPoint Methods as Inexact Newton Methods Silvia Bonettini Università di Modena e Reggio Emilia Italy Valeria Ruggiero Università di Ferrara Emanuele Galligani Università di Modena e Reggio Emilia

More information

ABSTRACT 1. INTRODUCTION

ABSTRACT 1. INTRODUCTION A DIAGONAL-AUGMENTED QUASI-NEWTON METHOD WITH APPLICATION TO FACTORIZATION MACHINES Aryan Mohtari and Amir Ingber Department of Electrical and Systems Engineering, University of Pennsylvania, PA, USA Big-data

More information

Line Search Methods for Unconstrained Optimisation

Line Search Methods for Unconstrained Optimisation Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic

More information

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf.

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf. Maria Cameron 1. Trust Region Methods At every iteration the trust region methods generate a model m k (p), choose a trust region, and solve the constraint optimization problem of finding the minimum of

More information

A Smoothing Newton Method for Solving Absolute Value Equations

A Smoothing Newton Method for Solving Absolute Value Equations A Smoothing Newton Method for Solving Absolute Value Equations Xiaoqin Jiang Department of public basic, Wuhan Yangtze Business University, Wuhan 430065, P.R. China 392875220@qq.com Abstract: In this paper,

More information

On the convergence properties of the modified Polak Ribiére Polyak method with the standard Armijo line search

On the convergence properties of the modified Polak Ribiére Polyak method with the standard Armijo line search ANZIAM J. 55 (E) pp.e79 E89, 2014 E79 On the convergence properties of the modified Polak Ribiére Polyak method with the standard Armijo line search Lijun Li 1 Weijun Zhou 2 (Received 21 May 2013; revised

More information

TMA4180 Solutions to recommended exercises in Chapter 3 of N&W

TMA4180 Solutions to recommended exercises in Chapter 3 of N&W TMA480 Solutions to recommended exercises in Chapter 3 of N&W Exercise 3. The steepest descent and Newtons method with the bactracing algorithm is implemented in rosenbroc_newton.m. With initial point

More information

Modifications of Steepest Descent Method and Conjugate Gradient Method Against Noise for Ill-posed Linear Systems

Modifications of Steepest Descent Method and Conjugate Gradient Method Against Noise for Ill-posed Linear Systems Available online at www.ispacs.com/cna Volume 2012, Year 2012 Article ID cna-00115, 24 pages doi: 10.5899/2012/cna-00115 Research Article Modifications of Steepest Descent Method and Conjugate Gradient

More information

MS&E 318 (CME 338) Large-Scale Numerical Optimization

MS&E 318 (CME 338) Large-Scale Numerical Optimization Stanford University, Management Science & Engineering (and ICME) MS&E 318 (CME 338) Large-Scale Numerical Optimization 1 Origins Instructor: Michael Saunders Spring 2015 Notes 9: Augmented Lagrangian Methods

More information

Sparse Optimization Lecture: Dual Methods, Part I

Sparse Optimization Lecture: Dual Methods, Part I Sparse Optimization Lecture: Dual Methods, Part I Instructor: Wotao Yin July 2013 online discussions on piazza.com Those who complete this lecture will know dual (sub)gradient iteration augmented l 1 iteration

More information

Adaptive First-Order Methods for General Sparse Inverse Covariance Selection

Adaptive First-Order Methods for General Sparse Inverse Covariance Selection Adaptive First-Order Methods for General Sparse Inverse Covariance Selection Zhaosong Lu December 2, 2008 Abstract In this paper, we consider estimating sparse inverse covariance of a Gaussian graphical

More information

Multilevel Optimization Methods: Convergence and Problem Structure

Multilevel Optimization Methods: Convergence and Problem Structure Multilevel Optimization Methods: Convergence and Problem Structure Chin Pang Ho, Panos Parpas Department of Computing, Imperial College London, United Kingdom October 8, 206 Abstract Building upon multigrid

More information

Newton-like method with diagonal correction for distributed optimization

Newton-like method with diagonal correction for distributed optimization Newton-lie method with diagonal correction for distributed optimization Dragana Bajović Dušan Jaovetić Nataša Krejić Nataša Krlec Jerinić February 7, 2017 Abstract We consider distributed optimization

More information

The semi-convergence of GSI method for singular saddle point problems

The semi-convergence of GSI method for singular saddle point problems Bull. Math. Soc. Sci. Math. Roumanie Tome 57(05 No., 04, 93 00 The semi-convergence of GSI method for singular saddle point problems by Shu-Xin Miao Abstract Recently, Miao Wang considered the GSI method

More information

GLOBAL CONVERGENCE OF CONJUGATE GRADIENT METHODS WITHOUT LINE SEARCH

GLOBAL CONVERGENCE OF CONJUGATE GRADIENT METHODS WITHOUT LINE SEARCH GLOBAL CONVERGENCE OF CONJUGATE GRADIENT METHODS WITHOUT LINE SEARCH Jie Sun 1 Department of Decision Sciences National University of Singapore, Republic of Singapore Jiapu Zhang 2 Department of Mathematics

More information

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Optimization Escuela de Ingeniería Informática de Oviedo (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Unconstrained optimization Outline 1 Unconstrained optimization 2 Constrained

More information

Steepest descent method implementation on unconstrained optimization problem using C++ program

Steepest descent method implementation on unconstrained optimization problem using C++ program IOP Conerence Series: Materials Science and Engineering PAPER OPEN ACCESS Steepest descent method implementation on unconstrained optimization problem using C++ program o cite this article: H Napitupulu

More information

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Coralia Cartis, University of Oxford INFOMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems Methods

More information

Lecture 3: Linesearch methods (continued). Steepest descent methods

Lecture 3: Linesearch methods (continued). Steepest descent methods Lecture 3: Linesearch methods (continued). Steepest descent methods Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lecture 3: Linesearch methods (continued).

More information

THE solution of the absolute value equation (AVE) of

THE solution of the absolute value equation (AVE) of The nonlinear HSS-like iterative method for absolute value equations Mu-Zheng Zhu Member, IAENG, and Ya-E Qi arxiv:1403.7013v4 [math.na] 2 Jan 2018 Abstract Salkuyeh proposed the Picard-HSS iteration method

More information