New Accelerated Conjugate Gradient Algorithms for Unconstrained Optimization

Neculai Andrei

Research Institute for Informatics, Center for Advanced Modeling and Optimization, 8-10, Averescu Avenue, Bucharest, Romania, and Academy of Romanian Scientists, 54, Splaiul Independentei, Bucharest 5, Romania

Abstract. New accelerated nonlinear conjugate gradient algorithms, which are mainly modifications of the Dai and Yuan computational scheme for unconstrained optimization, are proposed. Under the exact line search the algorithms reduce to the Dai and Yuan conjugate gradient computational scheme. For inexact line search the algorithms satisfy the sufficient descent condition. Since the step lengths in conjugate gradient algorithms may differ from 1 by two orders of magnitude and tend to vary in a very unpredictable manner, we suggest an acceleration scheme able to improve the efficiency of the algorithms. A global convergence result is proved when the strong Wolfe line search conditions are used. Computational results for a set of 750 unconstrained optimization test problems show that these new conjugate gradient algorithms substantially outperform the Dai-Yuan conjugate gradient algorithm and its hybrid variants.

Keywords: Unconstrained optimization, conjugate gradient method, sufficient descent condition, conjugacy condition, Newton direction, numerical comparisons.

AMS 2000 Mathematics Subject Classification: 49M07, 49M10, 90C06, 65K.

Dedicated to the memory of Professor Gene H. Golub (1932-2007)

1. Introduction

Conjugate gradient methods represent an important class of unconstrained optimization algorithms with strong local and global convergence properties and modest memory requirements. A history of these algorithms has been given by Golub and O'Leary [28], as well as by O'Leary [38]. A survey of the development of different versions of nonlinear conjugate gradient methods, with special attention to their global convergence properties, is presented by Hager and Zhang [31]. A survey of their definitions, covering 40 conjugate gradient algorithms for unconstrained optimization, is given by Andrei [7]. The conjugate gradient algorithms were introduced in the early 1950s by Cornelius Lanczos [34, 35] and by Magnus Hestenes, with the cooperation of J.B. Rosser, G.E. Forsythe, L. Paige, M. Stein, R. Hayes and U. Hochstrasser at the Institute for Numerical Analysis, a part of the National Bureau of Standards in Los Angeles, and by Eduard Stiefel at the Eidgenössische Technische Hochschule in Zürich (see Hestenes and Stiefel [32]). Even though the conjugate gradient methods are now over 50 years old, they continue to be of considerable interest, particularly due to their convergence properties and to the very easy implementation of the corresponding algorithms in computer programs. Many researchers in this area have brought significant contributions, clarifying many theoretical and computational aspects of this class of algorithms. In particular, for the development of effective algorithms for solving basic linear algebra problems, Golub and Kahan [27] discussed the use of the Lanczos algorithm in computing the singular value decomposition of a matrix, and Concus, Golub and O'Leary [15] considered preconditioning of conjugate gradient algorithms.

An important extension of the conjugate gradient algorithm was given by Concus and Golub [16], where the Hermitian part of the matrix is taken as a preconditioner. A development of a block form of the Lanczos algorithm has been proposed by Golub, Underwood and Wilkinson [29]. Fletcher and Reeves [25] extended the conjugate gradient algorithm to the minimization of non-quadratic functions, thus opening an important area of research and applications. Developments of this algorithm have been considered, inter alia, by Polak and Ribière [41], Polyak [42], Perry [40], Shanno [44, 45], Gilbert and Nocedal [26], Liu and Storey [37], Hu and Storey [33] and Touati-Ahmed and Storey [46]. Recently, the papers of Nocedal [39], Dai and Yuan [20, 21], Hager and Zhang [30] and Andrei [1-6, 8-12] present important theoretical and computational contributions to the development of this class of algorithms.

In this paper we suggest new nonlinear conjugate gradient algorithms which are mainly modifications of the Dai and Yuan [20] conjugate gradient computational scheme. In these algorithms the direction $d_{k+1}$ is computed as a linear combination of $-g_{k+1}$ and $s_k$, i.e. $d_{k+1} = -\theta_{k+1} g_{k+1} + \beta_k s_k$, where $s_k = x_{k+1} - x_k$. The parameter $\theta_{k+1}$ is computed in such a way that the direction $d_{k+1}$ is the Newton direction or satisfies the conjugacy condition. On the other hand, $\beta_k$ is a proper modification of the Dai and Yuan computational scheme such that the direction $d_{k+1}$ satisfies the sufficient descent condition. For the exact line search the proposed algorithms reduce to the Dai and Yuan conjugate gradient computational scheme.

The paper has the following structure. In Section 2 we present the development of the conjugate gradient algorithms with sufficient descent condition, while in Section 3 we prove the global convergence of the algorithms under the strong Wolfe line search conditions. In Section 4 we present an acceleration of the algorithm, and in Sections 5 and 6 we describe the resulting algorithms and compare their computational performance against the Dai and Yuan method and its hybrid variants [21], using 750 unconstrained optimization test problems from the CUTE [14] library along with some other large-scale unconstrained optimization problems presented in [12]. Using the Dolan and Moré performance profiles [23] we show that these new accelerated conjugate gradient algorithms outperform the Dai-Yuan algorithm as well as its hybrid variants.

2. Modifications of the Dai-Yuan conjugate gradient algorithm

For solving the unconstrained optimization problem

$$\min \{ f(x) : x \in R^n \},$$ (1)

where $f : R^n \to R$ is continuously differentiable, we consider a nonlinear conjugate gradient algorithm

$$x_{k+1} = x_k + \alpha_k d_k,$$ (2)

where the stepsize $\alpha_k$ is positive and the directions are computed by the rule

$$d_{k+1} = -\theta_{k+1} g_{k+1} + \beta_k s_k, \quad d_0 = -g_0,$$ (3)

where

$$\beta_k = \frac{\|g_{k+1}\|^2}{y_k^T s_k} - \frac{(s_k^T g_{k+1})\,\|g_{k+1}\|^2}{(y_k^T s_k)^2},$$ (4)

and $\theta_{k+1}$ is a parameter which is to be determined. Here $g_k = \nabla f(x_k)$, $y_k = g_{k+1} - g_k$ and $s_k = x_{k+1} - x_k$. Observe that if $f$ is a quadratic function and $\alpha_k$ is selected to achieve the exact minimum of $f$ in the direction $d_k$, then $s_k^T g_{k+1} = 0$ and the formula (4) for $\beta_k$ reduces to the Dai and Yuan computational scheme [20]. However, in this paper we refer to general nonlinear functions and inexact line search.
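To make the computational scheme (3)-(4) concrete, here is a small sketch (not taken from the paper; it assumes NumPy vectors and uses function names of my own choosing) that evaluates $\beta_k$ and the direction $d_{k+1}$. For an exact line search $s_k^T g_{k+1} = 0$, so the first routine returns exactly the Dai-Yuan value $\|g_{k+1}\|^2/(y_k^T s_k)$.

```python
import numpy as np

def beta_modified_dy(g_next, s, y):
    """beta_k of (4): a modification of the Dai-Yuan scheme.
    g_next = g_{k+1}, s = s_k = x_{k+1} - x_k, y = y_k = g_{k+1} - g_k."""
    ys = y @ s                # y_k^T s_k, positive under the Wolfe conditions
    gg = g_next @ g_next      # ||g_{k+1}||^2
    sg = s @ g_next           # s_k^T g_{k+1}, zero for an exact line search
    return gg / ys - sg * gg / ys**2

def new_direction(theta_next, g_next, s, y):
    """d_{k+1} of (3): d_{k+1} = -theta_{k+1} g_{k+1} + beta_k s_k."""
    return -theta_next * g_next + beta_modified_dy(g_next, s, y) * s
```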

We were led to this computational scheme by modifying the Dai and Yuan algorithm

$$\beta_k^{DY} = \frac{\|g_{k+1}\|^2}{y_k^T s_k}$$

in order to conserve the sufficient descent condition and to have some other properties of an efficient conjugate gradient algorithm. Using a standard Wolfe line search, the Dai and Yuan method always generates descent directions and, under the Lipschitz assumption, it is globally convergent. In [17] Dai established a remarkable property relating the descent directions to the sufficient descent condition, showing that if there exist constants $\gamma_1$ and $\gamma_2$ such that $\gamma_1 \le \|g_k\| \le \gamma_2$ for all $k$, then for any $p \in (0,1)$ there exists a constant $c > 0$ such that the sufficient descent condition $g_i^T d_i \le -c\|g_i\|^2$ holds for at least $\lfloor pk \rfloor$ indices $i \in [0, k]$, where $\lfloor j \rfloor$ denotes the largest integer not exceeding $j$. In our algorithm the parameter $\beta_k$ is selected in such a manner that the sufficient descent condition is satisfied at every iteration. As we know, despite the strong convergence theory that has been developed for the Dai and Yuan method, it is susceptible to jamming, that is, it may begin to take small steps without making significant progress to the minimum. When the iterates jam, $y_k$ becomes tiny while $\|g_k\|$ is bounded away from zero. Therefore, $\beta_k$ is a proper modification of $\beta_k^{DY}$.

Theorem 1. If $\theta_{k+1} \ge 1/4$, then the direction $d_{k+1} = -\theta_{k+1} g_{k+1} + \beta_k s_k$ ($d_0 = -g_0$), where $\beta_k$ is given by (4), satisfies the sufficient descent condition

$$g_{k+1}^T d_{k+1} \le -\left(\theta_{k+1} - \frac{1}{4}\right)\|g_{k+1}\|^2.$$ (5)

Proof. Since $d_0 = -g_0$, we have $g_0^T d_0 = -\|g_0\|^2$, which satisfies (5). Multiplying (3) by $g_{k+1}^T$, we have

$$g_{k+1}^T d_{k+1} = -\theta_{k+1}\|g_{k+1}\|^2 + \frac{\|g_{k+1}\|^2 (g_{k+1}^T s_k)}{y_k^T s_k} - \frac{(s_k^T g_{k+1})^2 \|g_{k+1}\|^2}{(y_k^T s_k)^2}.$$ (6)

Now, using the inequality $u^T v \le \frac{1}{2}(\|u\|^2 + \|v\|^2)$ with $u = \frac{1}{\sqrt{2}}(y_k^T s_k) g_{k+1}$ and $v = \sqrt{2}\,(g_{k+1}^T s_k) g_{k+1}$, we have

$$\frac{\|g_{k+1}\|^2 (g_{k+1}^T s_k)}{y_k^T s_k} = \frac{[(y_k^T s_k) g_{k+1}]^T [(g_{k+1}^T s_k) g_{k+1}]}{(y_k^T s_k)^2} \le \frac{\frac{1}{4}(y_k^T s_k)^2 \|g_{k+1}\|^2 + (g_{k+1}^T s_k)^2 \|g_{k+1}\|^2}{(y_k^T s_k)^2} = \frac{1}{4}\|g_{k+1}\|^2 + \frac{(g_{k+1}^T s_k)^2 \|g_{k+1}\|^2}{(y_k^T s_k)^2}.$$ (7)

Using (7) in (6) we get (5). ∎

To conclude the sufficient descent condition from (5), the quantity $\theta_{k+1} - 1/4$ is required to be nonnegative. Supposing that $\theta_{k+1} - 1/4 > 0$, the direction given by (3) and (4) is a descent direction. Dai and Yuan [20, 21] present conjugate gradient schemes with the property that $g_k^T d_k < 0$ when $y_k^T s_k > 0$. If $f$ is strongly convex or the line search satisfies the Wolfe conditions, then $y_k^T s_k > 0$ and the Dai and Yuan scheme yields descent.
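The inequality (5) can also be checked numerically. The following toy check (illustrative only, under the stated assumption $y_k^T s_k > 0$; vectors and seeds are arbitrary choices of mine) builds the direction from (3)-(4) for random data and any $\theta_{k+1} \ge 1/4$ and asserts the sufficient descent bound.

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    g, s, y = rng.normal(size=(3, 10))     # stand-ins for g_{k+1}, s_k, y_k
    ys = y @ s
    if ys <= 0.1:                          # the analysis assumes y_k^T s_k > 0
        continue
    theta = 0.25 + rng.uniform(0.0, 2.0)   # any theta_{k+1} >= 1/4
    beta = (g @ g) / ys - (s @ g) * (g @ g) / ys**2   # beta_k of (4)
    d = -theta * g + beta * s              # d_{k+1} of (3)
    assert g @ d <= -(theta - 0.25) * (g @ g) + 1e-9  # inequality (5)
```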

In our algorithm, observe that if $\theta_{k+1} \ge 1/4$ for all $k$ and the line search satisfies the Wolfe conditions, then for all $k$ the search directions given by (3) and (4) satisfy the sufficient descent condition. Note that in (5) we bound $g_{k+1}^T d_{k+1}$ by $-(\theta_{k+1} - 1/4)\|g_{k+1}\|^2$, while for the scheme of Dai and Yuan only the negativity of $g_{k+1}^T d_{k+1}$ is established.

To determine the parameter $\theta_{k+1}$ in (3) we suggest the following two procedures.

A) Our motivation to get a good algorithm for solving (1) is to choose the parameter $\theta_{k+1}$ in such a way that for every $k$ the direction $d_{k+1}$ given by (3) is the Newton direction. Therefore, from the equation

$$-\nabla^2 f(x_{k+1})^{-1} g_{k+1} = -\theta_{k+1} g_{k+1} + \beta_k s_k,$$ (8)

after some algebra we get

$$\theta_{k+1} = \frac{1}{s_k^T \nabla^2 f(x_{k+1}) g_{k+1}} \left[ \left( \frac{\|g_{k+1}\|^2}{y_k^T s_k} - \frac{(s_k^T g_{k+1})\,\|g_{k+1}\|^2}{(y_k^T s_k)^2} \right) s_k^T \nabla^2 f(x_{k+1}) s_k + s_k^T g_{k+1} \right].$$ (9)

The salient point in this formula for $\theta_{k+1}$ is the presence of the Hessian. For large-scale problems, choices for the update parameter that do not require the evaluation of the Hessian matrix are often preferred in practice to methods that require the Hessian at each iteration. Therefore, in order to have an algorithm for solving large-scale problems, we assume that in (8) we use an approximation $B_{k+1}$ of the true Hessian $\nabla^2 f(x_{k+1})$ and let $B_{k+1}$ satisfy the quasi-Newton equation $B_{k+1} s_k = y_k$. This leads us to

$$\theta_{k+1} = \frac{1}{y_k^T g_{k+1}} \left[ \|g_{k+1}\|^2 - \frac{(s_k^T g_{k+1})\,\|g_{k+1}\|^2}{y_k^T s_k} + s_k^T g_{k+1} \right].$$ (10)

Observe that if $\theta_{k+1}$ given by (10) is greater than or equal to $1/4$, then according to Theorem 1 the direction (3) satisfies the sufficient descent condition (5). On the other hand, if in (10) $\theta_{k+1} < 1/4$, then we take ex abrupto $\theta_{k+1} = 1$ in (3).

B) The second procedure is based on the conjugacy condition. Dai and Liao [18] introduced the conjugacy condition $y_k^T d_{k+1} = -t\, s_k^T g_{k+1}$, where $t \ge 0$ is a scalar. This is indeed very reasonable since in real computation the inexact line search is generally used. However, this condition is very dependent on the nonnegative parameter $t$, for which we do not know any formula to choose it in an optimal manner. Therefore, even if in our developments we use the inexact line search, we adopt here a more conservative approach and consider the conjugacy condition $y_k^T d_{k+1} = 0$. This leads us to

$$\theta_{k+1} = \frac{1}{y_k^T g_{k+1}} \left[ \|g_{k+1}\|^2 - \frac{(s_k^T g_{k+1})\,\|g_{k+1}\|^2}{y_k^T s_k} \right].$$ (11)

Again, if $\theta_{k+1}$ given by (11) is greater than or equal to $1/4$, then according to Theorem 1 the direction (3) satisfies the sufficient descent condition (5). On the other hand, if in (11) $\theta_{k+1} < 1/4$, then we take $\theta_{k+1} = 1$ in (3).
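The two choices of $\theta_{k+1}$ can be summarized in a few lines. The sketch below is an illustration under my own naming conventions (it is not the paper's Fortran code, and it omits safeguards against $y_k^T g_{k+1} = 0$ or $y_k^T s_k = 0$); inputs are NumPy vectors. It implements (10) for the quasi-Newton based choice and (11) for the conjugacy based one, with the fallback $\theta_{k+1} = 1$ whenever the computed value drops below $1/4$.

```python
def theta_next(g_next, s, y, conjugacy=False):
    """theta_{k+1} from (10) (quasi-Newton choice, B_{k+1} s_k = y_k) or,
    when conjugacy=True, from (11) (conjugacy condition y_k^T d_{k+1} = 0).
    Returns 1 whenever the computed value falls below 1/4."""
    ys = y @ s
    yg = y @ g_next
    gg = g_next @ g_next
    sg = s @ g_next
    theta = (gg - sg * gg / ys) / yg       # common part of (10) and (11)
    if not conjugacy:
        theta += sg / yg                   # extra s_k^T g_{k+1} term in (10)
    return theta if theta >= 0.25 else 1.0
```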

The line search in the conjugate gradient algorithms for the computation of $\alpha_k$ is often based on the standard Wolfe conditions:

$$f(x_k + \alpha_k d_k) - f(x_k) \le \rho\, \alpha_k\, g_k^T d_k,$$ (12)

$$g_{k+1}^T d_k \ge \sigma\, g_k^T d_k,$$ (13)

where $d_k$ is a descent direction and $0 < \rho \le \sigma < 1$. In [21] Dai and Yuan proved the global convergence of a conjugate gradient algorithm for which $\beta_k = \beta_k^{DY} t_k$, where $t_k \in [-c, 1]$ with $c = (1-\sigma)/(1+\sigma)$. Our algorithm is a modification of the Dai and Yuan scheme with the following property. Observe that

$$\beta_k = \frac{\|g_{k+1}\|^2}{y_k^T s_k} - \frac{(s_k^T g_{k+1})\,\|g_{k+1}\|^2}{(y_k^T s_k)^2} = \beta_k^{DY} r_k,$$ (14)

where

$$r_k = 1 - \frac{s_k^T g_{k+1}}{y_k^T s_k}.$$ (15)

From the second Wolfe condition (13) it follows that $s_k^T g_{k+1} \ge \sigma s_k^T g_k = \sigma (s_k^T g_{k+1} - y_k^T s_k)$, i.e. $(1-\sigma) s_k^T g_{k+1} \ge -\sigma y_k^T s_k$. Since by the Wolfe conditions $y_k^T s_k > 0$, it follows that $s_k^T g_{k+1}/(y_k^T s_k) \ge -\sigma/(1-\sigma)$. Hence $r_k \le 1/(1-\sigma)$. Note also that $r_k = -g_k^T s_k/(y_k^T s_k) \ge 0$, so that $\beta_k \ge 0$. Therefore $\beta_k \le \beta_k^{DY}/(1-\sigma)$.

3. Convergence analysis

In this section we analyze the convergence of the algorithm (2), (3), (4) and (10) or (11), where $d_0 = -g_0$. In the following we consider that $g_k \ne 0$ for all $k$, otherwise a stationary point has been obtained. Assume that:

(i) The level set $S = \{x \in R^n : f(x) \le f(x_0)\}$ is bounded.

(ii) In a neighborhood $N$ of $S$, the function $f$ is continuously differentiable and its gradient is Lipschitz continuous, i.e. there exists a constant $L > 0$ such that $\|\nabla f(x) - \nabla f(y)\| \le L\|x - y\|$ for all $x, y \in N$.

Under these assumptions on $f$ there exists a constant $\Gamma \ge 0$ such that $\|\nabla f(x)\| \le \Gamma$ for all $x \in S$. In order to prove the global convergence, we assume that the step size $\alpha_k$ in (2) is obtained by the strong Wolfe line search, that is,

$$f(x_k + \alpha_k d_k) - f(x_k) \le \rho\, \alpha_k\, g_k^T d_k,$$ (3.1)

$$|g_{k+1}^T d_k| \le -\sigma\, g_k^T d_k,$$ (3.2)

where $\rho$ and $\sigma$ are positive constants such that $0 < \rho \le \sigma < 1$. Dai et al. [22] proved that for any conjugate gradient method with strong Wolfe line search the following general result holds.

Lemma 3.1. Suppose that the assumptions (i) and (ii) hold and consider any conjugate gradient method (2) and (3), where $d_k$ is a descent direction and $\alpha_k$ is obtained by the strong Wolfe line search (3.1) and (3.2). If

$$\sum_{k \ge 0} \frac{1}{\|d_k\|^2} = \infty,$$ (3.3)

then

$$\liminf_{k \to \infty} \|g_k\| = 0.$$ (3.4)

Theorem 3.2. Suppose that the assumptions (i) and (ii) hold and consider the algorithm (2), (3), (4) and (10) or (11), where $d_{k+1}$ is a descent direction and $\alpha_k$ is obtained by the strong Wolfe line search (3.1) and (3.2). If there exists a constant $\gamma \ge 0$ such that $\|\nabla f(x)\| \ge \gamma$, if $1/4 \le \theta_k \le \tau$, where $\tau$ is a positive constant, and if the angle $\varphi_k$ between $-g_k$ and $d_k$ is bounded, i.e. $\cos \varphi_k \ge \xi > 0$ for all $k = 0, 1, \ldots$, then the algorithm satisfies $\liminf_{k \to \infty} \|g_k\| = 0$.

Proof. Observe that

$$y_k^T s_k = g_{k+1}^T s_k - g_k^T s_k \ge (\sigma - 1) g_k^T s_k = (1-\sigma)(-g_k^T s_k).$$

But $-g_k^T s_k = \|g_k\|\,\|s_k\| \cos \varphi_k$. Since $d_k$ is a descent direction, it follows that $-g_k^T s_k \ge \gamma \xi \|s_k\| \ge 0$ for all $k = 0, 1, \ldots$, i.e.

$$y_k^T s_k \ge (1-\sigma)\,\gamma \xi\, \|s_k\|.$$

With these, and since $\|g_{k+1}\| \le \Gamma$,

$$0 \le \beta_k \le \frac{\beta_k^{DY}}{1-\sigma} = \frac{\|g_{k+1}\|^2}{(1-\sigma)\, y_k^T s_k} \le \frac{\Gamma^2}{(1-\sigma)^2 \xi \gamma \|s_k\|} = \frac{\eta}{\|s_k\|},$$

where

$$\eta = \frac{\Gamma^2}{(1-\sigma)^2 \xi \gamma}.$$

Therefore

$$\|d_{k+1}\| \le \theta_{k+1} \|g_{k+1}\| + \beta_k \|s_k\| \le \tau \Gamma + \eta.$$

This relation shows that

$$\sum_{k \ge 0} \frac{1}{\|d_k\|^2} \ge \sum_{k \ge 0} \frac{1}{(\tau \Gamma + \eta)^2} = \infty.$$

Hence, from Lemma 3.1 it follows that $\liminf_{k \to \infty} \|g_k\| = 0$. ∎

4. Acceleration of the algorithm

In conjugate gradient algorithms the search directions tend to be poorly scaled and, as a consequence, the line search must perform more function evaluations in order to obtain a suitable steplength $\alpha_k$. In order to improve the performance of conjugate gradient algorithms, efforts were directed to design procedures for direction computation based on second order information; for example, CONMIN [43] and SCALCG [2-5] take this idea of BFGS preconditioning. In this section we focus on the step length modification. In the context of the gradient descent algorithm with backtracking, the step length modification was considered for the first time in [6]. Nocedal [39] pointed out that in conjugate gradient methods the step lengths may differ from 1 in a very unpredictable manner: they can be larger or smaller than 1 depending on how the problem is scaled. Numerical comparisons between conjugate gradient methods and the limited memory quasi-Newton method, by Liu and Nocedal [36], show that the latter is more successful [8]. One explanation of the efficiency of the limited memory quasi-Newton method is given by its ability to accept unit step lengths along the iterations. In this section we take advantage of this behavior of conjugate gradient algorithms and present an acceleration scheme. Basically, it modifies the step length in a multiplicative manner to improve the reduction of the function values along the iterations.

Line search. For using the algorithm (2) one of the crucial elements is the stepsize computation. For the sake of generality, in the following we consider line searches that satisfy either the Goldstein conditions [24]:

$$\rho_1 \alpha_k g_k^T d_k \le f(x_k + \alpha_k d_k) - f(x_k) \le \rho_2 \alpha_k g_k^T d_k,$$ (4.1)

where $0 < \rho_2 < 1/2 < \rho_1 < 1$ and $\alpha_k > 0$, or the Wolfe conditions (12) and (13).

Proposition 4.1. Assume that $d_k$ is a descent direction and that $\nabla f$ satisfies the Lipschitz condition $\|\nabla f(x) - \nabla f(x_k)\| \le L\|x - x_k\|$ for all $x$ on the line segment connecting $x_k$ and $x_{k+1}$, where $L$ is a positive constant. If the line search satisfies the Goldstein conditions (4.1), then

$$\alpha_k \ge \frac{(1-\rho_1)\,|g_k^T d_k|}{L\,\|d_k\|^2}.$$ (4.2)

If the line search satisfies the Wolfe conditions (12) and (13), then

$$\alpha_k \ge \frac{(1-\sigma)\,|g_k^T d_k|}{L\,\|d_k\|^2}.$$ (4.3)

Proof. If the Goldstein conditions are satisfied, then using the mean value theorem from (4.1) we get

$$\rho_1 \alpha_k g_k^T d_k \le f(x_k + \alpha_k d_k) - f(x_k) = \alpha_k \nabla f(x_k + \xi_k d_k)^T d_k \le \alpha_k g_k^T d_k + L \alpha_k^2 \|d_k\|^2,$$

where $\xi_k \in [0, \alpha_k]$ and the last inequality follows from the Lipschitz condition. From this inequality we immediately get (4.2). Now, to prove (4.3), subtract $g_k^T d_k$ from both sides of (13); using the Lipschitz condition we get

$$(\sigma - 1)\, g_k^T d_k \le (g_{k+1} - g_k)^T d_k \le L \alpha_k \|d_k\|^2.$$ (4.4)

But $d_k$ is a descent direction and $\sigma < 1$, so we immediately get (4.3). ∎

Therefore, under the Goldstein or the Wolfe line search conditions, $\alpha_k$ is bounded away from zero, i.e. there exists a positive constant $\omega$ such that $\alpha_k \ge \omega$.

Acceleration scheme [6]. Given the initial point $x_0$ we can compute $f_0 = f(x_0)$ and $g_0 = \nabla f(x_0)$, and by the Wolfe line search conditions (12) and (13) the steplength $\alpha_0$ is determined. With these, the next iterate is computed as $x_1 = x_0 + \alpha_0 d_0$ ($d_0 = -g_0$), where $f_1$ and $g_1$ are immediately determined and the direction $d_1$ can be computed as $d_1 = -\theta_1 g_1 + \beta_0 s_0$, where the conjugate gradient parameter $\beta_0$ is computed as in (4) and the scaling factor $\theta_1$ is computed as in (10) or (11). Therefore, at iteration $k = 1, 2, \ldots$ we know $x_k$, $f_k$, $g_k$ and $d_k = -\theta_k g_k + \beta_{k-1} s_{k-1}$. Suppose that $d_k$ is a descent direction (i.e. $\theta_k \ge 1/4$). By the Wolfe line search (12) and (13) we can compute $\alpha_k$, with which the point $z = x_k + \alpha_k d_k$ is determined. The first Wolfe condition (12) shows that the steplength $\alpha_k > 0$ satisfies

$$f(z) = f(x_k + \alpha_k d_k) \le f(x_k) + \rho\, \alpha_k\, g_k^T d_k.$$

With these, let us introduce the accelerated conjugate gradient algorithm by means of the following iterative scheme:

$$x_{k+1} = x_k + \gamma_k \alpha_k d_k,$$ (4.5)

where $\gamma_k > 0$ is a parameter which is to be determined in such a manner as to improve the behavior of the algorithm. Now, we have:

$$f(x_k + \alpha_k d_k) = f(x_k) + \alpha_k g_k^T d_k + \tfrac{1}{2} \alpha_k^2 d_k^T \nabla^2 f(x_k) d_k + o(\|\alpha_k d_k\|^2).$$ (4.6)

On the other hand, for $\gamma > 0$ we have:

$$f(x_k + \gamma \alpha_k d_k) = f(x_k) + \gamma \alpha_k g_k^T d_k + \tfrac{1}{2} \gamma^2 \alpha_k^2 d_k^T \nabla^2 f(x_k) d_k + o(\|\gamma \alpha_k d_k\|^2).$$ (4.7)

With these we can write

$$f(x_k + \gamma \alpha_k d_k) = f(x_k + \alpha_k d_k) + \Psi_k(\gamma),$$ (4.8)

where

$$\Psi_k(\gamma) = \tfrac{1}{2}(\gamma^2 - 1)\, \alpha_k^2 d_k^T \nabla^2 f(x_k) d_k + (\gamma - 1)\, \alpha_k g_k^T d_k + \gamma^2 o(\|\alpha_k d_k\|^2) - o(\|\alpha_k d_k\|^2).$$ (4.9)

Let us denote

$$a_k = \alpha_k g_k^T d_k, \quad b_k = \alpha_k^2 d_k^T \nabla^2 f(x_k) d_k, \quad \varepsilon_k = o(\|\alpha_k d_k\|^2).$$ (4.10)

Observe that $a_k \le 0$, since $d_k$ is a descent direction, and that for convex functions $b_k \ge 0$. Therefore

$$\Psi_k(\gamma) = \tfrac{1}{2}(\gamma^2 - 1) b_k + (\gamma - 1) a_k + (\gamma^2 - 1)\varepsilon_k.$$

Now, we see that $\Psi_k'(\gamma) = (b_k + 2\varepsilon_k)\gamma + a_k$ and $\Psi_k'(\gamma_m) = 0$, where

$$\gamma_m = \frac{-a_k}{b_k + 2\varepsilon_k}.$$ (4.11)

Observe that $\Psi_k'(0) = a_k \le 0$. Therefore, assuming that $b_k + 2\varepsilon_k > 0$, $\Psi_k(\gamma)$ is a convex quadratic function with minimum attained at $\gamma_m$ and

$$\Psi_k(\gamma_m) = -\frac{(a_k + (b_k + 2\varepsilon_k))^2}{2(b_k + 2\varepsilon_k)} \le 0.$$

Considering $\gamma = \gamma_m$ in (4.8) and since $b_k \ge 0$, we see that for every $k$

$$f(x_k + \gamma_m \alpha_k d_k) = f(x_k + \alpha_k d_k) - \frac{(a_k + (b_k + 2\varepsilon_k))^2}{2(b_k + 2\varepsilon_k)} \le f(x_k + \alpha_k d_k),$$

which is a possible improvement of the values of the function $f$ (when $a_k + (b_k + 2\varepsilon_k) \ne 0$). Therefore, using this simple multiplicative modification of the stepsize $\alpha_k$ as $\gamma_k \alpha_k$, where $\gamma_k = \gamma_m = -a_k/(b_k + 2\varepsilon_k)$, we get

$$f(x_{k+1}) = f(x_k + \gamma_k \alpha_k d_k) \le f(x_k) + \rho \alpha_k g_k^T d_k - \frac{(a_k + (b_k + 2\varepsilon_k))^2}{2(b_k + 2\varepsilon_k)} = f(x_k) + \rho a_k - \frac{(a_k + (b_k + 2\varepsilon_k))^2}{2(b_k + 2\varepsilon_k)} \le f(x_k),$$ (4.12)

since $a_k \le 0$ ($d_k$ is a descent direction).
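For a convex quadratic the remainder $\varepsilon_k$ vanishes and $\gamma_m = -a_k/b_k$ minimizes $\Psi_k$ exactly. The following toy check (illustrative only; the matrix, seed and trial steplength are arbitrary choices, not data from the paper) confirms that the accelerated point never has a larger function value than the trial point $x_k + \alpha_k d_k$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.normal(size=(n, n))
A = A @ A.T + n * np.eye(n)       # positive definite Hessian: f is a convex quadratic
c = rng.normal(size=n)
f = lambda x: 0.5 * x @ A @ x + c @ x
x = rng.normal(size=n)
g = A @ x + c                     # gradient at x_k
d = -g                            # a descent direction
alpha = 0.3                       # an arbitrary trial steplength
a = alpha * (g @ d)               # a_k <= 0
b = alpha**2 * (d @ A @ d)        # b_k = alpha_k^2 d_k^T Hessian d_k > 0
gamma_m = -a / b                  # (4.11) with epsilon_k = 0
assert f(x + gamma_m * alpha * d) <= f(x + alpha * d) + 1e-12
```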

Observe that if $d_k$ is a descent direction and the contribution of $\varepsilon_k$ is neglected, then

$$\frac{(a_k + (b_k + 2\varepsilon_k))^2}{2(b_k + 2\varepsilon_k)} \approx \frac{(a_k + b_k)^2}{2 b_k},$$

and from (4.12) we get

$$f(x_{k+1}) \le f(x_k) + \rho a_k - \frac{(a_k + b_k)^2}{2 b_k} \le f(x_k).$$

Therefore, neglecting the contribution of $\varepsilon_k$, we still get an improvement of the function values.

Now, in order to get the algorithm, we have to determine a way to compute $b_k$. For this, at the point $z = x_k + \alpha_k d_k$ we have

$$f(z) = f(x_k + \alpha_k d_k) = f(x_k) + \alpha_k g_k^T d_k + \tfrac{1}{2} \alpha_k^2 d_k^T \nabla^2 f(\bar{x}_k) d_k,$$

where $\bar{x}_k$ is a point on the line segment connecting $x_k$ and $z$. On the other hand, at the point $x_k = z - \alpha_k d_k$ we have

$$f(x_k) = f(z - \alpha_k d_k) = f(z) - \alpha_k g_z^T d_k + \tfrac{1}{2} \alpha_k^2 d_k^T \nabla^2 f(\tilde{x}_k) d_k,$$

where $g_z = \nabla f(z)$ and $\tilde{x}_k$ is a point on the line segment connecting $x_k$ and $z$. Having in view the local character of the search and that the distance between $x_k$ and $z$ is small enough, we can consider $\bar{x}_k = \tilde{x}_k = x_k$. So, adding the above equalities we get

$$b_k = -\alpha_k\, \bar{y}_k^T d_k,$$ (4.13)

where $\bar{y}_k = g_k - g_z$.

Observe that if $|a_k| > b_k$, then $\gamma_k > 1$. In this case $\gamma_k \alpha_k > \alpha_k$, and it is also possible that $\gamma_k \alpha_k \le 1$ or $\gamma_k \alpha_k > 1$; hence the steplength $\gamma_k \alpha_k$ can be greater than 1. On the other hand, if $|a_k| \le b_k$, then $\gamma_k \le 1$; in this case $\gamma_k \alpha_k \le \alpha_k$, so the steplength $\gamma_k \alpha_k$ is reduced. Therefore, if $|a_k| \ne b_k$, then $\gamma_k \ne 1$ and the steplength $\alpha_k$ computed by the Wolfe conditions will be modified, by increasing or by reducing it through the factor $\gamma_k$.

Neglecting $\varepsilon_k$ in (4.10), we see that $\Psi_k(1) = 0$, and if $|a_k| \le b_k/2$, then $\Psi_k(0) = -a_k - b_k/2 \le 0$ and $\gamma_m < 1$. Therefore, for any $\gamma \in [0,1]$, $\Psi_k(\gamma) \le 0$. As a consequence, for any $\gamma \in (0,1)$ it follows that $f(x_k + \gamma \alpha_k d_k) \le f(x_k + \alpha_k d_k)$. In this case, for any $\gamma \in [0,1]$, $\gamma \alpha_k \le \alpha_k$. However, in our algorithm we selected $\gamma_k = \gamma_m$ as the point achieving the minimum value of $\Psi_k(\gamma)$.

5. AMDY and AMDYC Algorithms

Considering the above definitions of $g_k$, $s_k$, $y_k$ and $\bar{y}_k$, we present the following conjugate gradient algorithms, which are accelerated, modified versions of the Dai and Yuan algorithm with Newton direction (AMDY) or with conjugacy condition (AMDYC).

AMDY and AMDYC Algorithms

Step 1. Initialization. Select $x_0 \in R^n$ and the parameters $0 < \rho < \sigma < 1$. Compute $f(x_0)$ and $g_0$. Consider $d_0 = -g_0$ and $\alpha_0 = 1/\|g_0\|$. Set $k = 0$.

Step 2. Test for continuation of iterations. If $\|g_k\|_\infty \le 10^{-6}$, then stop.

Step 3. Line search. Compute $\alpha_k$ satisfying the Wolfe line search conditions (12) and (13).

Step 4. Compute $z = x_k + \alpha_k d_k$, $g_z = \nabla f(z)$ and $\bar{y}_k = g_k - g_z$.

Step 5. Compute $a_k = \alpha_k g_k^T d_k$ and $b_k = -\alpha_k \bar{y}_k^T d_k$.

Step 6. If $b_k > 0$, then compute $\gamma_k = -a_k/b_k$ and update the variables as $x_{k+1} = x_k + \gamma_k \alpha_k d_k$; otherwise update the variables as $x_{k+1} = x_k + \alpha_k d_k$. Compute $f_{k+1}$ and $g_{k+1}$. Compute $y_k = g_{k+1} - g_k$ and $s_k = x_{k+1} - x_k$.

Step 7. $\theta_{k+1}$ computation. For the algorithm AMDY, $\theta_{k+1}$ is computed as in (10). For the algorithm AMDYC, $\theta_{k+1}$ is computed as in (11). If $\theta_{k+1} < 1/4$, then set $\theta_{k+1} = 1$.

Step 8. Direction computation. Compute $d = -\theta_{k+1} g_{k+1} + \beta_k s_k$, where $\beta_k$ is computed as in (4). If

$$g_{k+1}^T d \le -10^{-3} \|d\|_2\, \|g_{k+1}\|_2,$$ (5.1)

then define $d_{k+1} = d$, otherwise set $d_{k+1} = -g_{k+1}$. Compute the initial guess $\alpha_k = \alpha_{k-1} \|d_{k-1}\| / \|d_k\|$, set $k = k+1$ and continue with Step 2.

It is well known that if $f$ is bounded along the direction $d_k$, then there exists a stepsize $\alpha_k$ satisfying the Wolfe line search conditions (12) and (13). In our algorithm, when the angle between $d$ and $-g_{k+1}$ is not acute enough, we restart the algorithm with the negative gradient $-g_{k+1}$. More sophisticated reasons for restarting the algorithms have been proposed in the literature, but we are interested in the performance of a conjugate gradient algorithm that uses this restart criterion, associated with a direction satisfying the sufficient descent condition. Under reasonable assumptions, conditions (12), (13) and (5.1) are sufficient to prove the global convergence of the algorithm. The initial selection of the step length crucially affects the practical behaviour of the algorithm. At every iteration the starting guess for the step $\alpha_k$ in the line search is computed as $\alpha_{k-1} \|d_{k-1}\| / \|d_k\|$. This selection was considered for the first time by Shanno and Phua in CONMIN [43]; it is also used in the packages SCG by Birgin and Martínez [13] and SCALCG by Andrei [2-5].
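To show how Steps 1-8 fit together, here is a compact Python sketch of the whole iteration. It is only an illustration of the structure of the algorithm, not the author's Fortran implementation: SciPy's `line_search` is used as a stand-in for the Wolfe line search of Step 3, the usual safeguards (e.g. against $y_k^T s_k \approx 0$ or a failed line search) are reduced to a minimum, and all names are my own.

```python
import numpy as np
from scipy.optimize import line_search    # Wolfe line search, used here as a stand-in

def accelerated_modified_dy(f, grad, x0, variant="AMDY",
                            eps=1e-6, max_iter=10000, rho=1e-4, sigma=0.9):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                 # Step 1
    alpha = 1.0 / np.linalg.norm(g)
    for _ in range(max_iter):
        if np.max(np.abs(g)) <= eps:       # Step 2: ||g_k||_inf <= 1e-6
            break
        ls = line_search(f, grad, x, d, gfk=g, c1=rho, c2=sigma)   # Step 3
        if ls[0] is not None:
            alpha = ls[0]                  # otherwise keep the previous value
        z = x + alpha * d                  # Step 4
        gz = grad(z)
        a = alpha * (g @ d)                # Step 5: a_k = alpha_k g_k^T d_k
        b = -alpha * ((g - gz) @ d)        #         b_k = -alpha_k (g_k - g_z)^T d_k
        gamma = -a / b if b > 0 else 1.0   # Step 6: acceleration factor
        x_new = x + gamma * alpha * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        ys, yg = y @ s, y @ g_new
        gg, sg = g_new @ g_new, s @ g_new
        theta = (gg - sg * gg / ys) / yg   # Step 7: (11); AMDY adds the extra term of (10)
        if variant == "AMDY":
            theta += sg / yg
        if theta < 0.25:
            theta = 1.0
        beta = gg / ys - sg * gg / ys**2   # beta_k of (4)
        d_new = -theta * g_new + beta * s  # Step 8
        if g_new @ d_new > -1e-3 * np.linalg.norm(d_new) * np.linalg.norm(g_new):
            d_new = -g_new                 # restart when the test (5.1) fails
        alpha *= np.linalg.norm(d) / np.linalg.norm(d_new)  # Step 8 guess (SciPy ignores it)
        x, g, d = x_new, g_new, d_new
    return x
```

For example, `accelerated_modified_dy(f, grad, x0, variant="AMDYC")` runs the conjugacy-based variant; a production code would of course add the missing safeguards.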

Proposition 5.1. Suppose that $f$ is a uniformly convex function on the level set $S = \{x : f(x) \le f(x_0)\}$ and that the direction $d_k$ satisfies the sufficient descent condition $g_k^T d_k \le -c_1 \|g_k\|^2$, where $c_1 > 0$, as well as $\|d_k\| \le c_2 \|g_k\|$, where $c_2 > 0$. Then the sequence generated by AMDY or AMDYC converges linearly to $x^*$, the solution of problem (1).

Proof. From (4.12) we have $f(x_{k+1}) \le f(x_k)$ for all $k$. Since $f$ is bounded below, it follows that $\lim_{k \to \infty} (f(x_k) - f(x_{k+1})) = 0$. Now, since $f$ is uniformly convex, there exist positive constants $m$ and $M$ such that $m I \preceq \nabla^2 f(x) \preceq M I$ on $S$. Suppose that $x_k + \alpha d_k \in S$ and $x_k + \gamma_m \alpha d_k \in S$ for all $\alpha > 0$. We have

$$f(x_k + \gamma_m \alpha d_k) \le f(x_k + \alpha d_k) - \frac{(a_k + b_k)^2}{2 b_k}.$$

But, from uniform convexity we have the following quadratic upper bound on $f(x_k + \alpha d_k)$:

$$f(x_k + \alpha d_k) \le f(x_k) + \alpha g_k^T d_k + \tfrac{1}{2} M \alpha^2 \|d_k\|^2.$$

Therefore,

$$f(x_k + \alpha d_k) \le f(x_k) - \alpha c_1 \|g_k\|^2 + \tfrac{1}{2} M c_2^2 \alpha^2 \|g_k\|^2 = f(x_k) + \left(-c_1 \alpha + \tfrac{1}{2} M c_2^2 \alpha^2\right) \|g_k\|^2.$$

Observe that for $0 \le \alpha \le c_1/(M c_2^2)$ we have $-c_1 \alpha + \tfrac{1}{2} M c_2^2 \alpha^2 \le -\tfrac{1}{2} c_1 \alpha$, which follows from the convexity of $-c_1 \alpha + (M c_2^2/2)\alpha^2$. Using this result we get

$$f(x_k + \alpha d_k) \le f(x_k) - \tfrac{1}{2} c_1 \alpha \|g_k\|^2 \le f(x_k) - \rho c_1 \alpha \|g_k\|^2,$$

since $\rho < 1/2$. From Proposition 4.1 the Wolfe line search terminates with a value $\alpha_k \ge \omega > 0$. Therefore, for $0 \le \alpha_k \le c_1/(M c_2^2)$, this provides a lower bound on the decrease of the function $f$, i.e.

$$f(x_k + \alpha_k d_k) \le f(x_k) - \rho c_1 \omega \|g_k\|^2.$$ (5.2)

On the other hand,

$$\frac{(a_k + b_k)^2}{2 b_k} \ge \frac{(\alpha_k^2 M c_2^2 - \alpha_k c_1)^2 \|g_k\|^4}{2 M \alpha_k^2 c_2^2 \|g_k\|^2} = \frac{(\alpha_k M c_2^2 - c_1)^2}{2 M c_2^2} \|g_k\|^2 \ge \frac{(\omega M c_2^2 - c_1)^2}{2 M c_2^2} \|g_k\|^2.$$ (5.3)

Considering (5.2) and (5.3) we get

$$f(x_k + \gamma_m \alpha_k d_k) \le f(x_k) - \left( \rho c_1 \omega + \frac{(\omega M c_2^2 - c_1)^2}{2 M c_2^2} \right) \|g_k\|^2.$$ (5.4)

Therefore,

$$f(x_k) - f(x_k + \gamma_m \alpha_k d_k) \ge \left( \rho c_1 \omega + \frac{(\omega M c_2^2 - c_1)^2}{2 M c_2^2} \right) \|g_k\|^2.$$

But $f(x_k) - f(x_{k+1}) \to 0$ and, as a consequence, $\|g_k\|$ goes to zero, i.e. $x_k$ converges to $x^*$. Having in view that $f(x_k)$ is a nonincreasing sequence, it follows that $f(x_k)$ converges to $f^* = f(x^*)$. From (5.4) we see that

$$f(x_{k+1}) \le f(x_k) - \left( \rho c_1 \omega + \frac{(\omega M c_2^2 - c_1)^2}{2 M c_2^2} \right) \|g_k\|^2.$$

Combining this with $\|g_k\|^2 \ge 2m (f(x_k) - f^*)$ and subtracting $f^*$ from both sides, we conclude that

$$f(x_{k+1}) - f^* \le \bar{c}\, (f(x_k) - f^*),$$ (5.5)

where

$$\bar{c} = 1 - 2m \left( \rho c_1 \omega + \frac{(\omega M c_2^2 - c_1)^2}{2 M c_2^2} \right) < 1.$$

Therefore, $f(x_k)$ converges to $f^*$ at least as fast as a geometric series with a factor that depends on the parameter $\rho$ in the first Wolfe condition and on the bounds $m$ and $M$ of the Hessian. So, the convergence of the accelerated algorithm is at least linear. ∎

6. Numerical results and comparisons

In this section we present the computational performance of a Fortran implementation of the AMDY and AMDYC algorithms on a set of 750 unconstrained optimization test problems. The test problems are the unconstrained problems in the CUTE [14] library, along with other large-scale optimization problems presented in [12]. We selected 75 large-scale unconstrained optimization problems in extended or generalized form; for each function we considered ten numerical experiments with an increasing number of variables, n = 1000, 2000, ..., 10000. All algorithms implement the Wolfe line search conditions with $\rho = 0.0001$ and $\sigma = 0.9$, and the same stopping criterion $\|g_k\|_\infty \le 10^{-6}$, where $\|\cdot\|_\infty$ is the maximum absolute component of a vector.

The comparisons of the algorithms are given in the following context. Let $f_i^{ALG1}$ and $f_i^{ALG2}$ be the optimal values found by ALG1 and ALG2 for problem $i = 1, \ldots, 750$, respectively. We say that, in the particular problem $i$, the performance of ALG1 was better than the performance of ALG2 if

$$|f_i^{ALG1} - f_i^{ALG2}| < 10^{-3}$$ (6.1)

and the number of iterations, or the number of function-gradient evaluations, or the CPU time of ALG1 was less than the number of iterations, the number of function-gradient evaluations, or the CPU time corresponding to ALG2, respectively. All codes are written in double precision Fortran and compiled with f77 (default compiler settings) on an Intel Pentium 4, 1.8 GHz workstation. All these codes are authored by Andrei.

In the first set of numerical experiments we compare AMDY versus AMDYC. Table 1 reports the number of problems solved by these two algorithms with a minimum number of iterations (#iter), a minimum number of function and gradient evaluations (#fg) and a minimum CPU time.

Table 1. Performance of AMDY versus AMDYC (750 problems): for each of the metrics #iter, #fg and CPU time, the number of problems in which AMDY was better, AMDYC was better, or the two algorithms tied.

Both algorithms have similar performances; however, subject to the CPU time metric, AMDY proves to be slightly better. In the second set of numerical experiments we compare the AMDY and AMDYC algorithms with the Dai and Yuan (DY) algorithm. Figures 1 and 2 present the Dolan-Moré performance profiles of these algorithms subject to the CPU time metric. We see that both AMDY and AMDYC are top performers, being more successful and more robust than the Dai and Yuan algorithm. When comparing AMDY with the Dai and Yuan algorithm (Figure 1), subject to the number of iterations, we see that AMDY was better in 69 problems (i.e. it achieved the minimum number of iterations in 69 problems), DY was better in 7 problems, and they achieved the same number of iterations in 60 problems, etc. Out of 750 problems, only for 706 of them does the criterion (6.1) hold.
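The comparison rule of (6.1) amounts to a simple counting procedure. The sketch below is illustrative (the function name and record keys are assumptions of mine, not the author's code): a problem contributes to the counts only when the two solvers reach essentially the same optimal value.

```python
def compare(results_a, results_b, tol=1e-3):
    """Per-metric win counts in the sense of (6.1): a problem is compared only
    when the two optimal values agree to within tol, and a win means a strictly
    smaller metric value (#iter, #fg or CPU time)."""
    wins_a = {"iter": 0, "fg": 0, "cpu": 0}
    wins_b = {"iter": 0, "fg": 0, "cpu": 0}
    ties = {"iter": 0, "fg": 0, "cpu": 0}
    for ra, rb in zip(results_a, results_b):
        if abs(ra["fopt"] - rb["fopt"]) >= tol:
            continue                       # criterion (6.1) not satisfied
        for m in wins_a:
            if ra[m] < rb[m]:
                wins_a[m] += 1
            elif rb[m] < ra[m]:
                wins_b[m] += 1
            else:
                ties[m] += 1
    return wins_a, wins_b, ties
```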

Fig. 1. Performance profile of AMDY versus DY.

Fig. 2. Performance profile of AMDYC versus DY.

Dai and Yuan [21] studied hybrid conjugate gradient algorithms and proposed the following two hybrid methods:

$$\beta_k^{hDY} = \max\left\{ -\frac{1-\sigma}{1+\sigma}\, \beta_k^{DY},\; \min\{\beta_k^{HS}, \beta_k^{DY}\} \right\},$$ (6.2)

$$\beta_k^{hDYz} = \max\left\{ 0,\; \min\{\beta_k^{HS}, \beta_k^{DY}\} \right\},$$ (6.3)

where $\beta_k^{HS} = g_{k+1}^T y_k / (y_k^T s_k)$, showing their global convergence when the Lipschitz assumption holds and the standard Wolfe line search is used. The numerical experiments of Dai and Ni [19] proved that the second hybrid method (hDYz) is the better one, outperforming the Polak-Ribière [41] and Polyak [42] method. In the third set of numerical experiments we compare the Dolan-Moré performance profiles of AMDY and AMDYC versus the Dai-Yuan hybrid conjugate gradient $\beta_k^{hDY}$, subject to the CPU time metric, as in Figures 3 and 4.

Fig. 3. Performance profile of AMDY versus hDY.

Fig. 4. Performance profile of AMDYC versus hDY.
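For reference, the two hybrid choices (6.2)-(6.3) can be written as a small clipping rule. The sketch below is illustrative (the function name and flag are mine; the inputs are NumPy vectors):

```python
def beta_hybrid_dy(g_next, s, y, sigma=0.9, zero_floor=False):
    """Dai-Yuan hybrids: (6.2) clips beta_HS to the interval
    [-(1-sigma)/(1+sigma)*beta_DY, beta_DY]; zero_floor=True replaces the
    lower bound by 0, which gives the hDYz variant (6.3)."""
    ys = y @ s
    beta_dy = (g_next @ g_next) / ys       # Dai-Yuan
    beta_hs = (g_next @ y) / ys            # Hestenes-Stiefel
    lower = 0.0 if zero_floor else -(1.0 - sigma) / (1.0 + sigma) * beta_dy
    return max(lower, min(beta_hs, beta_dy))
```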

In the fourth set of numerical experiments, in Figures 5 and 6, we compare the Dolan-Moré performance profiles of AMDY and AMDYC versus the Dai-Yuan hybrid conjugate gradient $\beta_k^{hDYz}$, subject to the CPU time metric.

Fig. 5. Performance profile of AMDY versus hDYz.

Fig. 6. Performance profile of AMDYC versus hDYz.

7. Conclusion

We have presented a new conjugate gradient algorithm for solving large-scale unconstrained optimization problems. The parameter $\beta_k$ is a modification of the Dai and Yuan computational scheme in such a manner that the direction $d_k$ generated by the algorithm satisfies the sufficient descent condition, independently of the line search. Under the strong Wolfe line search conditions we proved the global convergence of the algorithm. We presented computational evidence that the performance of our algorithms AMDY and AMDYC was higher than that of the Dai and Yuan conjugate gradient algorithm and its hybrid variants, for a set of 750 problems.

References

[1] Andrei, N.: Conjugate gradient algorithms for large scale unconstrained optimization. ICI Technical Report, January 2005.
[2] Andrei, N.: Scaled conjugate gradient algorithms for unconstrained optimization. Computational Optimization and Applications, 38 (2007).
[3] Andrei, N.: Scaled memoryless BFGS preconditioned conjugate gradient algorithm for unconstrained optimization. Optimization Methods and Software, 22 (2007).
[4] Andrei, N.: A scaled BFGS preconditioned conjugate gradient algorithm for unconstrained optimization. Applied Mathematics Letters, 20 (2007).
[5] Andrei, N.: A scaled nonlinear conjugate gradient algorithm for unconstrained optimization. Optimization, 57 (2008).
[6] Andrei, N.: An acceleration of gradient descent algorithm with backtracking for unconstrained optimization. Numerical Algorithms, 42 (2006), pp. 63-73.
[7] Andrei, N.: 40 conjugate gradient algorithms for unconstrained optimization. A survey on their definition. ICI Technical Report No. 13/08, March 2008.
[8] Andrei, N.: Numerical comparison of conjugate gradient algorithms for unconstrained optimization. Studies in Informatics and Control, 16 (2007).
[9] Andrei, N.: A Dai-Yuan conjugate gradient algorithm with sufficient descent and conjugacy conditions for unconstrained optimization. Applied Mathematics Letters, 21 (2008), pp. 165-171.
[10] Andrei, N.: Another hybrid conjugate gradient algorithm for unconstrained optimization. Numerical Algorithms, 47 (2008), pp. 143-156.
[11] Andrei, N.: Performance profiles of conjugate gradient algorithms for unconstrained optimization. In: C.A. Floudas and P.M. Pardalos (Eds.), Encyclopedia of Optimization, 2nd edition, 2008, entry 76, in press.
[12] Andrei, N.: An unconstrained optimization test functions collection. Advanced Modeling and Optimization. An Electronic International Journal, 10 (2008), pp. 147-161.
[13] Birgin, E., Martínez, J.M.: A spectral conjugate gradient method for unconstrained optimization. Applied Mathematics and Optimization, 43 (2001), pp. 117-128.
[14] Bongartz, I., Conn, A.R., Gould, N.I.M., Toint, Ph.L.: CUTE: constrained and unconstrained testing environments. ACM Transactions on Mathematical Software, 21 (1995), pp. 123-160.
[15] Concus, P., Golub, G.H., O'Leary, D.P.: A generalized conjugate gradient method for the numerical solution of elliptic partial differential equations. In: J.R. Bunch and D.J. Rose (Eds.), Sparse Matrix Computations, Academic Press, New York, 1976.
[16] Concus, P., Golub, G.H.: A generalized conjugate gradient method for nonsymmetric systems of linear equations. In: R. Glowinski and J.L. Lions (Eds.), Computing Methods in Applied Sciences and Engineering (2nd International Symposium, Versailles 1975), Part I, Springer Lecture Notes in Economics and Mathematical Systems 134, New York, 1976, pp. 56-65.
[17] Dai, Y.H.: New properties of a nonlinear conjugate gradient method. Numerische Mathematik, 89 (2001).

[18] Dai, Y.H., Liao, L.Z.: New conjugacy conditions and related nonlinear conjugate gradient methods. Applied Mathematics and Optimization, 43 (2001), pp. 87-101.
[19] Dai, Y.H., Ni, Q.: Testing different conjugate gradient methods for large-scale unconstrained optimization. Journal of Computational Mathematics, 21 (2003), pp. 311-320.
[20] Dai, Y.H., Yuan, Y.: A nonlinear conjugate gradient method with a strong global convergence property. SIAM Journal on Optimization, 10 (1999), pp. 177-182.
[21] Dai, Y.H., Yuan, Y.: An efficient hybrid conjugate gradient method for unconstrained optimization. Annals of Operations Research, 103 (2001), pp. 33-47.
[22] Dai, Y.H., Han, J.Y., Liu, G.H., Sun, D.F., Yin, X., Yuan, Y.: Convergence properties of nonlinear conjugate gradient methods. SIAM Journal on Optimization, 10 (1999).
[23] Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Mathematical Programming, 91 (2002), pp. 201-213.
[24] Goldstein, A.A.: On steepest descent. SIAM Journal on Control, 3 (1965), pp. 147-151.
[25] Fletcher, R., Reeves, C.M.: Function minimization by conjugate gradients. Computer Journal, 7 (1964), pp. 149-154.
[26] Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization, 2 (1992), pp. 21-42.
[27] Golub, G.H., Kahan, W.: Calculating the singular values and pseudo-inverse of a matrix. SIAM Journal on Numerical Analysis, 2 (Series B) (1965), pp. 205-224.
[28] Golub, G.H., O'Leary, D.P.: Some history of the conjugate gradient and Lanczos algorithms: 1948-1976. SIAM Review, 31 (1989), pp. 50-102.
[29] Golub, G.H., Underwood, R., Wilkinson, J.H.: The Lanczos algorithm for the symmetric Ax = λBx problem. Technical Report, Stanford University Computer Science Department, Stanford, California, 1972.
[30] Hager, W.W., Zhang, H.: A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM Journal on Optimization, 16 (2005), pp. 170-192.
[31] Hager, W.W., Zhang, H.: A survey of nonlinear conjugate gradient methods. Pacific Journal of Optimization, 2 (2006), pp. 35-58.
[32] Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, Sec. B, 48 (1952), pp. 409-436.
[33] Hu, Y.F., Storey, C.: Global convergence result for conjugate gradient methods. Journal of Optimization Theory and Applications, 71 (1991).
[34] Lanczos, C.: An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. Journal of Research of the National Bureau of Standards, 45 (1950), pp. 255-282.
[35] Lanczos, C.: Solution of systems of linear equations by minimized iterations. Journal of Research of the National Bureau of Standards, 49 (1952), pp. 33-53.
[36] Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45 (1989), pp. 503-528.
[37] Liu, Y., Storey, C.: Efficient generalized conjugate gradient algorithms, Part 1: Theory. Journal of Optimization Theory and Applications, 69 (1991), pp. 129-137.
[38] O'Leary, D.P.: Conjugate gradients and related KMP algorithms: the beginnings. In: L. Adams and J.L. Nazareth (Eds.), Linear and Nonlinear Conjugate Gradient-Related Methods, SIAM, Philadelphia, 1996, pp. 1-8.
[39] Nocedal, J.: Conjugate gradient methods and nonlinear optimization. In: L. Adams and J.L. Nazareth (Eds.), Linear and Nonlinear Conjugate Gradient-Related Methods, SIAM, Philadelphia, 1996, pp. 9-23.
[40] Perry, J.M.: A class of conjugate gradient algorithms with a two-step variable-metric memory. Discussion Paper 269, Center for Mathematical Studies in Economics and Management Science, Northwestern University, Evanston, Illinois, 1977.
[41] Polak, E., Ribière, G.: Note sur la convergence de méthodes de directions conjuguées. Revue Française d'Informatique et de Recherche Opérationnelle, 3e Année, 16 (1969), pp. 35-43.
[42] Polyak, B.T.: The conjugate gradient method in extreme problems. USSR Computational Mathematics and Mathematical Physics, 9 (1969), pp. 94-112.

[43] Shanno, D.F., Phua, K.H.: Algorithm 500: Minimization of unconstrained multivariate functions. ACM Transactions on Mathematical Software, 2 (1976), pp. 87-94.
[44] Shanno, D.F.: On the convergence of a new conjugate gradient algorithm. SIAM Journal on Numerical Analysis, 15 (1978), pp. 1247-1257.
[45] Shanno, D.F.: Conjugate gradient methods with inexact searches. Mathematics of Operations Research, 3 (1978), pp. 244-256.
[46] Touati-Ahmed, D., Storey, C.: Efficient hybrid conjugate gradient techniques. Journal of Optimization Theory and Applications, 64 (1990), pp. 379-397.

June 8, 2008


More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

Numerical Optimization of Partial Differential Equations

Numerical Optimization of Partial Differential Equations Numerical Optimization of Partial Differential Equations Part I: basic optimization concepts in R n Bartosz Protas Department of Mathematics & Statistics McMaster University, Hamilton, Ontario, Canada

More information

FALL 2018 MATH 4211/6211 Optimization Homework 4

FALL 2018 MATH 4211/6211 Optimization Homework 4 FALL 2018 MATH 4211/6211 Optimization Homework 4 This homework assignment is open to textbook, reference books, slides, and online resources, excluding any direct solution to the problem (such as solution

More information

and P RP k = gt k (g k? g k? ) kg k? k ; (.5) where kk is the Euclidean norm. This paper deals with another conjugate gradient method, the method of s

and P RP k = gt k (g k? g k? ) kg k? k ; (.5) where kk is the Euclidean norm. This paper deals with another conjugate gradient method, the method of s Global Convergence of the Method of Shortest Residuals Yu-hong Dai and Ya-xiang Yuan State Key Laboratory of Scientic and Engineering Computing, Institute of Computational Mathematics and Scientic/Engineering

More information

Gradient method based on epsilon algorithm for large-scale nonlinearoptimization

Gradient method based on epsilon algorithm for large-scale nonlinearoptimization ISSN 1746-7233, England, UK World Journal of Modelling and Simulation Vol. 4 (2008) No. 1, pp. 64-68 Gradient method based on epsilon algorithm for large-scale nonlinearoptimization Jianliang Li, Lian

More information

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Coralia Cartis, University of Oxford INFOMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems Methods

More information

CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS

CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS Igor V. Konnov Department of Applied Mathematics, Kazan University Kazan 420008, Russia Preprint, March 2002 ISBN 951-42-6687-0 AMS classification:

More information

Chapter 10 Conjugate Direction Methods

Chapter 10 Conjugate Direction Methods Chapter 10 Conjugate Direction Methods An Introduction to Optimization Spring, 2012 1 Wei-Ta Chu 2012/4/13 Introduction Conjugate direction methods can be viewed as being intermediate between the method

More information

Extra-Updates Criterion for the Limited Memory BFGS Algorithm for Large Scale Nonlinear Optimization 1

Extra-Updates Criterion for the Limited Memory BFGS Algorithm for Large Scale Nonlinear Optimization 1 journal of complexity 18, 557 572 (2002) doi:10.1006/jcom.2001.0623 Extra-Updates Criterion for the Limited Memory BFGS Algorithm for Large Scale Nonlinear Optimization 1 M. Al-Baali Department of Mathematics

More information

MS&E 318 (CME 338) Large-Scale Numerical Optimization

MS&E 318 (CME 338) Large-Scale Numerical Optimization Stanford University, Management Science & Engineering (and ICME) MS&E 318 (CME 338) Large-Scale Numerical Optimization 1 Origins Instructor: Michael Saunders Spring 2015 Notes 9: Augmented Lagrangian Methods

More information

Notes on Some Methods for Solving Linear Systems

Notes on Some Methods for Solving Linear Systems Notes on Some Methods for Solving Linear Systems Dianne P. O Leary, 1983 and 1999 and 2007 September 25, 2007 When the matrix A is symmetric and positive definite, we have a whole new class of algorithms

More information

Algorithms for Nonsmooth Optimization

Algorithms for Nonsmooth Optimization Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization

More information

HT14 A COMPARISON OF DIFFERENT PARAMETER ESTIMATION TECHNIQUES FOR THE IDENTIFICATION OF THERMAL CONDUCTIVITY COMPONENTS OF ORTHOTROPIC SOLIDS

HT14 A COMPARISON OF DIFFERENT PARAMETER ESTIMATION TECHNIQUES FOR THE IDENTIFICATION OF THERMAL CONDUCTIVITY COMPONENTS OF ORTHOTROPIC SOLIDS Inverse Problems in Engineering: heory and Practice rd Int. Conference on Inverse Problems in Engineering June -8, 999, Port Ludlow, WA, USA H4 A COMPARISON OF DIFFEREN PARAMEER ESIMAION ECHNIQUES FOR

More information

Lec10p1, ORF363/COS323

Lec10p1, ORF363/COS323 Lec10 Page 1 Lec10p1, ORF363/COS323 This lecture: Conjugate direction methods Conjugate directions Conjugate Gram-Schmidt The conjugate gradient (CG) algorithm Solving linear systems Leontief input-output

More information

A line-search algorithm inspired by the adaptive cubic regularization framework with a worst-case complexity O(ɛ 3/2 )

A line-search algorithm inspired by the adaptive cubic regularization framework with a worst-case complexity O(ɛ 3/2 ) A line-search algorithm inspired by the adaptive cubic regularization framewor with a worst-case complexity Oɛ 3/ E. Bergou Y. Diouane S. Gratton December 4, 017 Abstract Adaptive regularized framewor

More information

The semi-convergence of GSI method for singular saddle point problems

The semi-convergence of GSI method for singular saddle point problems Bull. Math. Soc. Sci. Math. Roumanie Tome 57(05 No., 04, 93 00 The semi-convergence of GSI method for singular saddle point problems by Shu-Xin Miao Abstract Recently, Miao Wang considered the GSI method

More information

RESIDUAL SMOOTHING AND PEAK/PLATEAU BEHAVIOR IN KRYLOV SUBSPACE METHODS

RESIDUAL SMOOTHING AND PEAK/PLATEAU BEHAVIOR IN KRYLOV SUBSPACE METHODS RESIDUAL SMOOTHING AND PEAK/PLATEAU BEHAVIOR IN KRYLOV SUBSPACE METHODS HOMER F. WALKER Abstract. Recent results on residual smoothing are reviewed, and it is observed that certain of these are equivalent

More information

Matrix Derivatives and Descent Optimization Methods

Matrix Derivatives and Descent Optimization Methods Matrix Derivatives and Descent Optimization Methods 1 Qiang Ning Department of Electrical and Computer Engineering Beckman Institute for Advanced Science and Techonology University of Illinois at Urbana-Champaign

More information

A Regularized Directional Derivative-Based Newton Method for Inverse Singular Value Problems

A Regularized Directional Derivative-Based Newton Method for Inverse Singular Value Problems A Regularized Directional Derivative-Based Newton Method for Inverse Singular Value Problems Wei Ma Zheng-Jian Bai September 18, 2012 Abstract In this paper, we give a regularized directional derivative-based

More information

Conjugate Gradient (CG) Method

Conjugate Gradient (CG) Method Conjugate Gradient (CG) Method by K. Ozawa 1 Introduction In the series of this lecture, I will introduce the conjugate gradient method, which solves efficiently large scale sparse linear simultaneous

More information

Modification of the Wolfe line search rules to satisfy the descent condition in the Polak-Ribière-Polyak conjugate gradient method

Modification of the Wolfe line search rules to satisfy the descent condition in the Polak-Ribière-Polyak conjugate gradient method Laboratoire d Arithmétique, Calcul formel et d Optimisation UMR CNRS 6090 Modification of the Wolfe line search rules to satisfy the descent condition in the Polak-Ribière-Polyak conjugate gradient method

More information

A new initial search direction for nonlinear conjugate gradient method

A new initial search direction for nonlinear conjugate gradient method International Journal of Mathematis Researh. ISSN 0976-5840 Volume 6, Number 2 (2014), pp. 183 190 International Researh Publiation House http://www.irphouse.om A new initial searh diretion for nonlinear

More information

Introduction to Nonlinear Optimization Paul J. Atzberger

Introduction to Nonlinear Optimization Paul J. Atzberger Introduction to Nonlinear Optimization Paul J. Atzberger Comments should be sent to: atzberg@math.ucsb.edu Introduction We shall discuss in these notes a brief introduction to nonlinear optimization concepts,

More information

A class of Smoothing Method for Linear Second-Order Cone Programming

A class of Smoothing Method for Linear Second-Order Cone Programming Columbia International Publishing Journal of Advanced Computing (13) 1: 9-4 doi:1776/jac1313 Research Article A class of Smoothing Method for Linear Second-Order Cone Programming Zhuqing Gui *, Zhibin

More information

A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES

A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES IJMMS 25:6 2001) 397 409 PII. S0161171201002290 http://ijmms.hindawi.com Hindawi Publishing Corp. A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES

More information

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen

More information

2010 Mathematics Subject Classification: 90C30. , (1.2) where t. 0 is a step size, received from the line search, and the directions d

2010 Mathematics Subject Classification: 90C30. , (1.2) where t. 0 is a step size, received from the line search, and the directions d Journal of Applie Mathematics an Computation (JAMC), 8, (9), 366-378 http://wwwhillpublisheror/journal/jamc ISSN Online:576-645 ISSN Print:576-653 New Hbri Conjuate Graient Metho as A Convex Combination

More information

On spectral properties of steepest descent methods

On spectral properties of steepest descent methods ON SPECTRAL PROPERTIES OF STEEPEST DESCENT METHODS of 20 On spectral properties of steepest descent methods ROBERTA DE ASMUNDIS Department of Statistical Sciences, University of Rome La Sapienza, Piazzale

More information

A COMBINED CLASS OF SELF-SCALING AND MODIFIED QUASI-NEWTON METHODS

A COMBINED CLASS OF SELF-SCALING AND MODIFIED QUASI-NEWTON METHODS A COMBINED CLASS OF SELF-SCALING AND MODIFIED QUASI-NEWTON METHODS MEHIDDIN AL-BAALI AND HUMAID KHALFAN Abstract. Techniques for obtaining safely positive definite Hessian approximations with selfscaling

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of

More information

Nonmonotonic back-tracking trust region interior point algorithm for linear constrained optimization

Nonmonotonic back-tracking trust region interior point algorithm for linear constrained optimization Journal of Computational and Applied Mathematics 155 (2003) 285 305 www.elsevier.com/locate/cam Nonmonotonic bac-tracing trust region interior point algorithm for linear constrained optimization Detong

More information

Cubic regularization of Newton s method for convex problems with constraints

Cubic regularization of Newton s method for convex problems with constraints CORE DISCUSSION PAPER 006/39 Cubic regularization of Newton s method for convex problems with constraints Yu. Nesterov March 31, 006 Abstract In this paper we derive efficiency estimates of the regularized

More information

Optimisation non convexe avec garanties de complexité via Newton+gradient conjugué

Optimisation non convexe avec garanties de complexité via Newton+gradient conjugué Optimisation non convexe avec garanties de complexité via Newton+gradient conjugué Clément Royer (Université du Wisconsin-Madison, États-Unis) Toulouse, 8 janvier 2019 Nonconvex optimization via Newton-CG

More information