
Global Convergence of the Method of Shortest Residuals

Yu-hong Dai and Ya-xiang Yuan
State Key Laboratory of Scientific and Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Chinese Academy of Sciences, P. O. Box 2719, Beijing 100080, P. R. China. Email addresses: dyh@indigo4.cc.ac.cn, yyx@lsec.cc.ac.cn.

Abstract. The method of shortest residuals (SR) was presented by Hestenes and studied by Pytlak. If the function is quadratic, and if the line search is exact, the SR method reduces to the linear conjugate gradient method. In this paper, we put forward the formulation of the SR method when the line search is inexact. We prove that, if the stepsizes satisfy the strong Wolfe conditions, both the Fletcher-Reeves and Polak-Ribiere-Polyak versions of the SR method converge globally. When the Wolfe conditions are used, the two versions are also convergent provided that the stepsizes are uniformly bounded; if the stepsizes are not bounded, an example is constructed to show that they need not converge. Numerical results show that the SR method is a promising alternative to the standard conjugate gradient method.

Mathematics Subject Classification: 65K, 90C.

1. Introduction

The conjugate gradient method is particularly useful for minimizing functions of many variables,

    min_{x \in R^n} f(x),    (1.1)

because it does not require the storage of any matrices. It is of the form

    d_k = -g_k, for k = 1;  d_k = -g_k + \beta_k d_{k-1}, for k \ge 2,    (1.2)
    x_{k+1} = x_k + \alpha_k d_k,    (1.3)

where g_k = \nabla f(x_k), \beta_k is a scalar, and \alpha_k is a stepsize obtained by a line search. Well-known formulae for \beta_k are the Fletcher-Reeves (FR) and Polak-Ribiere-Polyak (PRP) formulae (see [1, 8, 9]). They are given by

    \beta_k^{FR} = \|g_k\|^2 / \|g_{k-1}\|^2    (1.4)

This work was supported by the National Science Foundation of China (Grant 19525101).

and

    \beta_k^{PRP} = g_k^T (g_k - g_{k-1}) / \|g_{k-1}\|^2,    (1.5)

where \|·\| is the Euclidean norm. This paper deals with another conjugate gradient method, the method of shortest residuals (SR). The SR method was presented by Hestenes in his monograph [2] on conjugate direction methods. In the SR method the search direction d_k is taken as the shortest vector of the form

    d_k = (-g_k + \beta_k d_{k-1}) / (1 + \beta_k),  0 < \beta_k < \infty,    (1.6)

where d_1 = -g_1. It follows that d_k is orthogonal to g_k + d_{k-1}, and hence its parameter satisfies the relation

    (-g_k + \beta_k d_{k-1})^T (g_k + d_{k-1}) = 0.    (1.7)

Assuming that the line search is exact, Hestenes obtains the solution of (1.7):

    \beta_k = \|g_k\|^2 / \|d_{k-1}\|^2.    (1.8)

If the function is quadratic, and if the line search is exact, the vector d_{k+1} can be proved to be also the shortest residual in the k-simplex whose vertices are -g_1, ..., -g_{k+1} (see [2]). The SR method can be viewed as a special case of the conjugate subgradient method developed by Wolfe [14] and Lemarechal [4] for minimizing a convex function which may be nondifferentiable. Based on this fact, Pytlak [7] also called the above the Wolfe-Lemarechal method. To investigate the equivalent form of the SR method for general functions and to construct other conjugate gradient methods, Pytlak [7] introduced the family of methods

    d_k = -Nr{g_k, -\beta_k d_{k-1}},    (1.9)

where Nr{a, b} is defined as the point from the line segment spanned by the vectors a and b which has the smallest norm, namely,

    \|Nr{a, b}\| = min{ \|\lambda a + (1 - \lambda) b\| : 0 \le \lambda \le 1 }.    (1.10)

If g_k^T d_{k-1} = 0, one can solve (1.9) analytically and obtain

    d_k = -(1 - \lambda_k) g_k + \lambda_k \beta_k d_{k-1}    (1.11)

and

    \lambda_k = \|g_k\|^2 / (\|g_k\|^2 + \beta_k^2 \|d_{k-1}\|^2).    (1.12)

In this case, it is easy to see that the direction d_k defined by (1.6) and (1.8) corresponds to (1.11)-(1.12) with

    \beta_k \equiv 1.    (1.13)

Pytlak points out in [7] that if the line search is exact, and if \beta_k \equiv 1, the SR method is equivalent to the FR method, and that the corresponding formula of \beta_k for the PRP method is

    \beta_k = \|g_k\|^2 / |g_k^T (g_k - g_{k-1})|.    (1.14)
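As a concrete illustration of the formulae (1.4)-(1.5) and of the operator Nr{a, b} in (1.9)-(1.10), here is a minimal sketch (our own illustrative rendering in pure Python, not code from the paper):

```python
# Sketch of the CG scalars (1.4)-(1.5) and the SR direction
# d_k = -Nr{g_k, -beta_k d_{k-1}} of (1.9)-(1.10).  Illustrative only.

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def beta_fr(g, g_prev):
    return dot(g, g) / dot(g_prev, g_prev)                       # (1.4)

def beta_prp(g, g_prev):
    y = [a - b for a, b in zip(g, g_prev)]                       # g_k - g_{k-1}
    return dot(g, y) / dot(g_prev, g_prev)                       # (1.5)

def nr(a, b):
    """Nr{a, b}: the smallest-norm point on the segment joining a and b,
    i.e. the minimizer of ||lam*a + (1 - lam)*b|| over 0 <= lam <= 1 (1.10)."""
    diff = [x - y for x, y in zip(a, b)]                         # a - b
    dd = dot(diff, diff)
    if dd == 0.0:
        return list(a)
    lam = max(0.0, min(1.0, -dot(b, diff) / dd))                 # clipped minimizer
    return [y + lam * x for y, x in zip(b, diff)]

def sr_direction(g, d_prev, beta):
    """SR search direction (1.9)."""
    return [-x for x in nr(g, [-beta * di for di in d_prev])]
```

When g_k is orthogonal to d_{k-1}, one can check numerically that 1/\|d_k\|^2 = 1/\|g_k\|^2 + 1/(\beta_k^2 \|d_{k-1}\|^2), which is (1.11)-(1.12) in disguise, and that the direction satisfies the descent identity g_k^T d_k = -\|d_k\|^2 used later in the paper.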

One may argue that the corresponding formula of \beta_k for the PRP method should be

    \beta_k = \|g_k\|^2 / (g_k^T (g_k - g_{k-1})).    (1.15)

Note that if the line search is exact, the method reduced by (1.15) produces the same iterations as the PRP method. However, Powell's examples [11] can also be used to show that this method may cycle without approaching any solution point even if the line search is exact. Thus we take (1.14) instead of (1.15). For clarity, we now call the method (1.3), (1.11) and (1.12) the SR method provided that the scalar \beta_k is such that the reduced method is the linear conjugate gradient method when the function is quadratic and the line search is exact. At the same time, we call the formulae (1.13) and (1.14) for \beta_k the FR and PRP versions of the SR method, and abbreviate them by FRSR and PRPSR respectively. In this paper, we investigate the convergence properties of the FRSR and PRPSR methods with the stepsize satisfying the Wolfe conditions:

    f(x_k) - f(x_k + \alpha_k d_k) \ge -\sigma_1 \alpha_k g_k^T d_k,    (1.16)
    g(x_k + \alpha_k d_k)^T d_k \ge \sigma_2 g_k^T d_k,    (1.17)

or the strong Wolfe conditions, namely, (1.16) and

    |g(x_k + \alpha_k d_k)^T d_k| \le \sigma_2 |g_k^T d_k|,    (1.18)

where 0 < \sigma_1 < \sigma_2 < 1. In [7], Pytlak suggests the following line search conditions:

    f(x_k) - f(x_k + \alpha_k d_k) \ge \mu_1 \alpha_k \|d_k\|^2,    (1.19)
    g(x_k + \alpha_k d_k)^T d_k \ge -\mu_2 \|d_k\|^2,    (1.20)

where 0 < \mu_1 < \mu_2 < 1, and he concludes that there exists a procedure which finds \alpha_k > 0 satisfying (1.19)-(1.20) in a finite number of operations (see the lemma in [7]). However, since it is possible that \hat\alpha = 0 in his proof, the statement is not true. Consider

    f(x) = \gamma x,  x \in R,    (1.21)

where \gamma > 0 is a constant. Suppose that at the k-th iteration x_k = 1 and d_k = -1 are obtained. Then for any \alpha > 0,

    f(x_k) - f(x_k + \alpha d_k) = \gamma - \gamma(1 - \alpha) = \gamma\alpha.    (1.22)

The above implies that (1.19) does not hold for any \alpha > 0, provided that \gamma < \mu_1. It is worth noting that, for the SR method, we often have g_k^T d_k = -\|d_k\|^2, which implies that (1.19)-(1.20) are equivalent to (1.16)-(1.17). This paper is organized as follows.
In the next section, we give the formula of the SR method under inexact line searches and provide an algorithm in which some safeguards are employed. In Section 3, we prove that the FRSR and PRPSR algorithms converge with the strong Wolfe conditions (1.16) and (1.18). If the Wolfe conditions (1.16)-(1.17) are used, convergence is also guaranteed provided that the stepsizes are uniformly bounded. One direct corollary is that the FRSR and PRPSR algorithms with (1.16)-(1.17) converge for strictly convex functions. In Section 4, a function is constructed to show that, if we do not restrict {\alpha_k}, the Wolfe conditions cannot guarantee that either algorithm converges. The

example applies to both the FRSR and PRPSR algorithms. Numerical results are reported in Section 5; they show that the SR method is a promising alternative to the standard conjugate gradient method.

2. Algorithm

Expression (1.12) is deduced in the case of the exact line search. If the line search is inexact, it is not difficult to deduce from (1.9) that the scalar \lambda_k in (1.11) is given by

    \lambda_k = (\|g_k\|^2 + \beta_k g_k^T d_{k-1}) / \|g_k + \beta_k d_{k-1}\|^2,    (2.1)

if we do not restrict \lambda_k \in [0, 1]. By direct calculation, we can obtain

    g_k^T d_k = -\|d_k\|^2    (2.2)

and

    \|d_k\|^2 = \beta_k^2 [\|g_k\|^2 \|d_{k-1}\|^2 - (g_k^T d_{k-1})^2] / \|g_k + \beta_k d_{k-1}\|^2.    (2.3)

It follows from (2.2) that d_k is a descent direction unless d_k = 0. On the other hand, from (2.3) we see that d_k = 0 if and only if g_k and d_{k-1} are collinear. In the case when g_k and d_{k-1} are collinear, assuming that \beta_k d_{k-1} = t g_k, we can solve for d_k from (1.9) and (1.10) as follows:

    d_k = -g_k, if t < -1;  d_k = t g_k, if -1 \le t \le 0;  d_k = 0, if t > 0.    (2.4)

The direction d_k vanishes if t > 0. The following example shows this possibility. Consider

    f(x, y) = ((1 + \epsilon)/2) (x^2 + y^2).    (2.5)

Given the starting point x_1 = (1, 1)^T, one can verify that the unit stepsize satisfies (1.16) and (1.18) if the parameters \sigma_1, \sigma_2 and \epsilon are such that 2\sigma_1 \le 1 - \epsilon and \epsilon \le \sigma_2. It follows that x_2 = (-\epsilon, -\epsilon)^T, and hence that t = 1/\epsilon for the FRSR method and t = 1/(1 + \epsilon) for the PRPSR method. Thus we have t > 0 for both methods. In practice, to avoid numerical overflows we restart the algorithm with

    d_k = -g_k    (2.6)

if the following condition is violated:

    |g_k^T d_{k-1}| < b_1 \|g_k\| \|d_{k-1}\|,    (2.7)

where b_1 is a positive number. For the PRPSR method, it is obvious that the denominator of \beta_k may vanish. Thus, to establish convergence results for the PRPSR method we must assume that |g_k^T (g_k - g_{k-1})| > 0. In practice, we test the following condition at every iteration:

    |g_k^T (g_k - g_{k-1})| > b_2 \|g_k\|^2,    (2.8)

where 0 \le b_2 < 1. Now we give a general algorithm as follows.

Algorithm. Given a starting point x. Choose I f0; g, and numbers 0 < b, 0 b <, 0. Step. Compute kg k. If kg k, stop; otherwise, let k =. Step. Compute d k =?g k. Step 3. Find k > 0 satisfying certain line search conditions. Step 4. Compute x k+ by (.3) and g k+. Let k = k +. Step 5. Compute kg k k. If kg k k, stop. Step 6. Test (.7): if (.7) does not hold, go to step (). If I =, go to Step 8. Step 7. Compute k by (.3), go to Step 9. Step 8. Test (.8): if (.8) does not hold, go to Step ; compute k by (.4). Step 9. Compute k by (.) and d k by (.), go to Step 3. We call the above with I = 0 and I = as the FRSR and PRPSR algorithms respectively. 3. Global Convergence Throughout this section we make the following assumption. Assumption 3. () f is bounded below in < n and is continuously dierentiable in a neighbor N of the level set L = fx < n : f(x) f(x )g; () The gradient rf(x) is Lipschitz continuous in N, namely, there exists a constant L > 0 such that krf(x)? rf(y)k Lkx? yk; for any x; y N : (3.) For some references, we formulate the following assumption. Assumption 3. The level set L = fx < n : f(x) f(x )g is bounded. Note that Assumptions 3. and 3. imply that there is a constant such that kg(x)k ; for all x L: (3.) We also formulate the following assumption. Assumption 3.3 The function f(x) is twice continuously dierentiable in N, and there are numbers 0 < such that kyk y T H(x)y kyk ; for all x N and y < n ; (3.3) where H(x) denotes the Hessian matrix of f at x. Under Assumption 3., we state a useful result which was essentially proved by Zoutendijk [5] and Wolfe [, 3]. 5

Theorem 3.4. Let x_1 be a starting point for which Assumption 3.1 is satisfied. Consider any iteration of the form (1.3), where d_k is a descent direction and \alpha_k satisfies the Wolfe conditions (1.16)-(1.17). Then

    \sum_k (g_k^T d_k)^2 / \|d_k\|^2 < \infty.    (3.4)

The above gives the following result.

Corollary 3.5. Let x_1 be a starting point for which Assumption 3.1 is satisfied. Consider Algorithm 2.1, where b_1 = 1, b_2 = 0, \epsilon = 0, and where the line search conditions are (1.16)-(1.17). Then

    \sum_k \|d_k\|^2 < \infty.    (3.5)

Proof. If the algorithm restarts, we still have (2.2) due to (2.6). So Theorem 3.4 and relation (2.2) give the corollary.

If Algorithm 2.1 restarts at the k-th iteration, we have \|g_k\| = \|d_k\|. Thus (3.5) implies liminf_{k\to\infty} \|g_k\| = 0 if Algorithm 2.1 restarts infinitely many times. Thus we suppose without loss of generality that (2.7) and (2.8) always hold in Algorithm 2.1. In addition, we also suppose that g_k \ne 0 for all k, since otherwise a stationary point has been found. First, we have the following theorem for the FRSR method.

Theorem 3.6. Let x_1 be a starting point for which Assumption 3.1 is satisfied. Consider Algorithm 2.1 with I = 0, b_1 = 1 and \epsilon = 0. Then we have

    liminf_{k\to\infty} \|g_k\| = 0    (3.6)

if the line search satisfies one of the following conditions: (i) g_k^T d_{k-1} = 0; (ii) the strong Wolfe conditions (1.16) and (1.18); (iii) the Wolfe conditions (1.16)-(1.17), and there exists M < \infty such that

    \alpha_k \le M, for all k.    (3.7)

Proof. We proceed by contradiction and assume that

    liminf_{k\to\infty} \|g_k\| \ne 0.    (3.8)

Then there exists a constant \gamma > 0 such that

    \|g_k\| \ge \gamma, for all k \ge 1.    (3.9)

From (1.13) and (2.3), we obtain

    1/\|d_k\|^2 = (1/\|d_{k-1}\|^2)(1 + r_k),    (3.10)

where

    r_k = [\|d_{k-1}\|^2 + 2 g_k^T d_{k-1} + (g_k^T d_{k-1})^2/\|d_{k-1}\|^2] / [\|g_k\|^2 - (g_k^T d_{k-1})^2/\|d_{k-1}\|^2].    (3.11)

Using (3.10) recursively, we get

    1/\|d_k\|^2 = (1/\|d_1\|^2) \prod_{i=2}^{k} (1 + r_i).    (3.12)

Noting that r_k \ge 0, we deduce from (3.5) and (3.12) that

    \sum_k r_k = \infty,    (3.13)

because otherwise 1/\|d_k\|^2 converges, which contradicts (3.5). For (i), since g_k^T d_{k-1} = 0, r_k can be rewritten as

    r_k = \|d_{k-1}\|^2 / \|g_k\|^2.    (3.14)

From this and (3.9), we obtain

    \sum_k r_k \le \gamma^{-2} \sum_k \|d_{k-1}\|^2 < \infty,    (3.15)

which contradicts (3.13). For (ii) and (iii), we conclude that there exists a positive number c_1 such that for all k,

    |g_k^T d_{k-1}| \le c_1 \|d_{k-1}\|^2.    (3.16)

In fact, for (ii), we see from (1.18) and (2.2) that (3.16) holds with c_1 = \sigma_2. For (iii), we have from (3.1), (2.2) and (3.7) that

    |g_{k+1}^T d_k| = |(g_{k+1} - g_k)^T d_k + g_k^T d_k| \le \|g_{k+1} - g_k\| \|d_k\| + |g_k^T d_k| \le \alpha_k L \|d_k\|^2 + \|d_k\|^2 \le (1 + LM) \|d_k\|^2.    (3.17)

So (3.16) holds with c_1 = 1 + LM. Due to (3.5), we see that

    \|d_k\| \to 0, as k \to \infty,    (3.18)

which implies that there exists an integer k_1 such that

    \|d_k\| \le \gamma/(\sqrt{2} c_1), for all k \ge k_1.    (3.19)

Applying (3.16) and this in (3.11), we obtain for all k > k_1

    r_k \le c_2 \|d_{k-1}\|^2,    (3.20)

where c_2 = 2(1 + c_1)^2/\gamma^2. Therefore we also have \sum_k r_k < \infty, which contradicts (3.13). The contradiction shows the truth of (3.6).
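The recursion (3.10), with r_k as in (3.11), follows from (2.3) with \beta_k \equiv 1 by direct algebra; a compact verification (our own, with the shorthand a = \|g_k\|^2, b = g_k^T d_{k-1}, c = \|d_{k-1}\|^2):

```latex
% From (2.3) with \beta_k = 1:  \|d_k\|^2 = (ac - b^2)/\|g_k + d_{k-1}\|^2, so
\[
\frac{1}{\|d_k\|^2}
  = \frac{a + 2b + c}{ac - b^2}
  = \frac{1}{c}\cdot\frac{ca + 2bc + c^2}{ac - b^2}
  = \frac{1}{c}\left(1 + \frac{b^2 + 2bc + c^2}{ac - b^2}\right)
  = \frac{1}{c}\left(1 + \frac{(b + c)^2}{ac - b^2}\right),
\]
% since ca + 2bc + c^2 = (ac - b^2) + (b^2 + 2bc + c^2).  Dividing the numerator
% and denominator of the last fraction by c gives exactly r_k in (3.11), and the
% Cauchy--Schwarz inequality gives ac - b^2 > 0 whenever g_k and d_{k-1} are not
% collinear, whence r_k \ge 0.
```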

Corollary 3.7. Let x_1 be a starting point for which Assumption 3.3 is satisfied. Consider Algorithm 2.1 with I = 0, b_1 = 1 and \epsilon = 0. Then, if the line search satisfies (1.16)-(1.17), we have (3.6).

Proof. By (iii) of the above theorem, it suffices to show (3.7). By Taylor's theorem, we get

    f(x_k + \alpha d_k) = f(x_k) + \alpha g_k^T d_k + (\alpha^2/2) d_k^T H(\xi_k) d_k    (3.21)

for some \xi_k. This, (1.16), (3.3) and (2.2) imply that (3.7) holds with M = 2(1 - \sigma_1)/m, which completes the proof.

For the PRPSR method, we first have the following theorem.

Theorem 3.8. Let x_1 be a starting point for which Assumption 3.1 is satisfied. Consider Algorithm 2.1 with I = 1, b_1 = 1, b_2 = 0 and \epsilon = 0. Then we have (3.6) if the line search satisfies (iii).

Proof. We again proceed by contradiction and assume that (3.9) holds. It is obvious that if the line search satisfies (iii), (3.16) still holds. From (1.14), (3.1), (3.7) and (3.9), we obtain

    \beta_k \|d_{k-1}\| = \|g_k\|^2 \|d_{k-1}\| / |g_k^T (g_k - g_{k-1})| \ge \|g_k\|^2 \|d_{k-1}\| / (\|g_k\| L \alpha_{k-1} \|d_{k-1}\|) \ge \gamma/(LM) =: c_3.    (3.22)

Using this and (3.9) in (2.3), we have the following estimate of 1/\|d_k\|^2 for all k:

    1/\|d_k\|^2 \le 2(1/(\beta_k^2 \|d_{k-1}\|^2) + 1/\|g_k\|^2)(1 - (g_k^T d_{k-1})^2/(\|g_k\|^2 \|d_{k-1}\|^2))^{-1} \le 2(c_3^{-2} + \gamma^{-2})(1 - (g_k^T d_{k-1})^2/(\|g_k\|^2 \|d_{k-1}\|^2))^{-1}.    (3.23)

So

    \sum_k (1 - (g_k^T d_{k-1})^2/(\|g_k\|^2 \|d_{k-1}\|^2)) \le 2(c_3^{-2} + \gamma^{-2}) \sum_k \|d_k\|^2 < \infty,    (3.24)

which implies

    lim_{k\to\infty} (g_k^T d_{k-1})^2/(\|g_k\|^2 \|d_{k-1}\|^2) = 1.    (3.25)

Note that

    g_{k-1} = g_k - y_{k-1},    (3.26)

where y_{k-1} = g_k - g_{k-1}, and

    lim_{k\to\infty} \|y_{k-1}\| \le lim_{k\to\infty} L \alpha_{k-1} \|d_{k-1}\| = 0.    (3.27)

(3.25), (3.26) and (3.27) imply that

    limsup_{k\to\infty} (g_{k-1}^T d_{k-1})^2/(\|g_{k-1}\|^2 \|d_{k-1}\|^2) = 1.    (3.28)

On the other hand, we have from (2.2), (3.9) and (3.18) that

    lim_{k\to\infty} (g_{k-1}^T d_{k-1})^2/(\|g_{k-1}\|^2 \|d_{k-1}\|^2) = lim_{k\to\infty} \|d_{k-1}\|^2/\|g_{k-1}\|^2 = 0.    (3.29)

(3.28) and (3.29) contradict each other. The contradiction shows the truth of (3.6).

From Theorem 3.8, one can see that the PRPSR method also converges for strictly convex functions provided that the stepsizes satisfy (1.16)-(1.17).

Corollary 3.9. Let x_1 be a starting point for which Assumption 3.3 is satisfied. Consider Algorithm 2.1 with I = 1, b_1 = 1, b_2 = 0 and \epsilon = 0. Then, if the line search satisfies (1.16)-(1.17), we have (3.6).

Theorem 3.10. Let x_1 be a starting point for which Assumptions 3.1 and 3.2 are satisfied. Consider Algorithm 2.1 with I = 1, b_1 = 1, b_2 = 0 and \epsilon = 0. Then we have (3.6) if the line search satisfies (1.16) and (1.18).

Proof. We proceed by contradiction and assume that (3.9) holds. First, we claim that there must exist a number c_4 such that

    \beta_k \|d_{k-1}\| \le c_4, for all k \ge 1.    (3.30)

This is because otherwise there exists a subsequence {k_i} and a constant, say again c_3, such that

    \beta_{k_i} \|d_{k_i - 1}\| \ge c_3, for all i \ge 1.    (3.31)

Then we can prove the truth of (3.23) for the subsequence {k_i}, which together with (3.5) implies that (3.25) still holds. Then, similarly to the proof of Theorem 3.8, we can obtain (3.28) and (3.29), which contradict each other. Therefore (3.30) holds. From (2.3), (3.9), (3.18) and (3.30), we get for all large k

    \|d_k\|^2 \ge \beta_k^2 \|g_k\|^2 \|d_{k-1}\|^2 / (2\|g_k + \beta_k d_{k-1}\|^2) \ge \beta_k^2 \|g_k\|^2 \|d_{k-1}\|^2 / (4(\|g_k\|^2 + \beta_k^2 \|d_{k-1}\|^2)) \ge c_5 \beta_k^2 \|d_{k-1}\|^2,    (3.32)

where c_5 = 1/(4(1 + c_4^2 \gamma^{-2})). Taking the inner product of both sides of (1.11) with d_k and applying (2.1) and (2.2), we get

    \beta_k d_{k-1}^T d_k = \|d_k\|^2.    (3.33)

Define u_k = d_k/\|d_k\|. Then from (3.33), \lambda_k \ge 0, (2.3) and (3.32), we obtain

    \|u_k - u_{k-1}\|^2 = 2 - 2 d_{k-1}^T d_k/(\|d_{k-1}\| \|d_k\|) = 2 - 2\|d_k\|/(\beta_k \|d_{k-1}\|) \le 2(\beta_k \|d_{k-1}\|^2 + g_k^T d_{k-1})^2/(\|d_{k-1}\|^2 \|g_k + \beta_k d_{k-1}\|^2) \le 4(c_5^{-1} \|d_k\|^2 + \sigma_2^2 \|d_{k-1}\|^2)/\|g_k + \beta_k d_{k-1}\|^2.    (3.34)

Besides, from (1.18), (2.2) and (3.30),

    \|g_k + \beta_k d_{k-1}\|^2 \ge \|g_k\|^2 + 2\beta_k g_k^T d_{k-1} \ge \|g_k\|^2 - 2\sigma_2 \beta_k \|d_{k-1}\|^2 \ge \|g_k\|^2 - 2\sigma_2 c_4 \|d_{k-1}\|.    (3.35)

Noting (3.5) and (3.9), we can deduce from (3.34) and (3.35) that

    \sum_k \|u_k - u_{k-1}\|^2 < \infty.    (3.36)

Let |K^\lambda_{k,\Delta}| denote the number of elements of

    K^\lambda_{k,\Delta} = { i : k \le i \le k + \Delta - 1, \|s_{i-1}\| = \|x_i - x_{i-1}\| > \lambda }.    (3.37)

Using (3.36), we conclude that for any \lambda > 0 there exist \Delta \in N and k_0 such that

    |K^\lambda_{k,\Delta}| \le \Delta/2, for any k \ge k_0,    (3.38)

for otherwise, by the proof of Theorem 4.3 in [3], a contradiction can be similarly obtained. We choose b = \gamma/(\gamma + \bar\gamma) and \lambda = c_5 b \gamma/L. Then we have from (3.2) and (3.9) that

    \beta_k \ge \|g_k\|^2/(\|g_k\|(\|g_k\| + \|g_{k-1}\|)) \ge \gamma/(\gamma + \bar\gamma) = b,    (3.39)

and, when \|s_{k-1}\| \le \lambda, we have from (3.1) that

    \beta_k \ge \|g_k\|^2/(\|g_k\| L \|s_{k-1}\|) \ge \gamma/(L\lambda) = 1/(c_5 b).    (3.40)

For this \lambda, let \Delta, k_0 be so given that (3.38) holds. Denote q = [(k - k_3 + 1)/2] and t = k - k_3 + 1 - 2q, where k_3 := max{\bar k, k_0} and \bar k is such that (3.32) holds for all k \ge \bar k; it is obvious that k - k_3 + 1 = 2q + t and 0 \le t < 2. Thus, for any k \ge k_3, we get from (3.32) and (3.38)-(3.40) that

    \|d_k\|^2 \ge \|d_{k_3}\|^2 \prod_{i=k_3+1}^{k} (c_5 \beta_i^2) \ge \|d_{k_3}\|^2 (c_5 b^2)^{q+t} (c_5 b^2)^{-q} = \|d_{k_3}\|^2 (c_5 b^2)^t \ge \|d_{k_3}\|^2 min{c_5 b^2, (c_5 b^2)^2},    (3.41)

which contradicts (3.18). The contradiction shows the truth of (3.6).

4. A counter-example

In the above section, we proved that the FRSR and PRPSR algorithms converge with the Wolfe conditions (1.16)-(1.17) provided that {\alpha_k} is uniformly bounded. However, if we do not restrict the size of \alpha_k, they need not converge. This is shown by the following example, which applies to both algorithms.

Example 4.1. Consider the function

    f(x, y) = 0, if (x, y) = (0, 0); (1/4)\psi(x/\sqrt{x^2+y^2})(x+y)^2 + (1/4)\psi(y/\sqrt{x^2+y^2})(x-y)^2, otherwise,    (4.1)

where \psi is defined by

    \psi(t) = 1, for |t| > \sqrt{3}/2;  1 + \sin(b_1 |t| + b_2), for \sqrt{2}/2 \le |t| \le \sqrt{3}/2;  0, for |t| < \sqrt{2}/2.    (4.2)

In (4.2), b_1 = (\sqrt{3} + \sqrt{2})\pi and b_2 = -(5/2 + \sqrt{6})\pi. It is easy to see that the function f defined by (4.1) and (4.2) satisfies Assumption 3.1. We construct an example such that for all k \ge 1:

    x_k = \|x_k\| (\cos\theta_{k-1}, \sin\theta_{k-1})^T,    (4.3)
    \|x_k\| = \|x_{k-1}\| \tan(\pi/4 - \theta_{k-1}),  \theta_{k-1} \in (0, \pi/8),    (4.4)
    g_k = \|x_k\| (\cos(\theta_{k-1} + \pi/4), \sin(\theta_{k-1} + \pi/4))^T,    (4.5)
    d_k = 2\|g_k\| \sin\theta_k (\cos(\theta_{k-1} + \pi/4 + \theta_k), \sin(\theta_{k-1} + \pi/4 + \theta_k))^T.    (4.6)

Using, for example, spline fitting, we can choose the starting point so that (4.3)-(4.6) hold for k = 1. Suppose that (4.3)-(4.6) hold for some k. From (4.3), (4.6) and the definition of f, we can choose \alpha_k > 0 such that

    x_{k+1} = \|x_{k+1}\| (\cos\theta_k, \sin\theta_k)^T,    (4.7)
    \|x_{k+1}\| = \|x_k\| \tan(\pi/4 - \theta_k).    (4.8)

Thus we have

    g_{k+1} = \|x_{k+1}\| (\cos(\theta_k + \pi/4), \sin(\theta_k + \pi/4))^T.    (4.9)

Since g_{k+1} is orthogonal to g_k, we have g_{k+1}^T g_k = 0 and hence \beta_{k+1} = 1 for both the FRSR and PRPSR algorithms. Defining \bar\lambda_k = \lambda_k/(1 - \lambda_k), where \lambda_k is given by (2.1), direct calculations show that

    g_{k+1}^T d_k = \|g_{k+1}\| \|d_k\| \cos\theta_k,    (4.10)
    \|g_{k+1}\| = \|g_k\| \tan(\pi/4 - \theta_k),    (4.11)
    \|d_k\| = 2\|g_k\| \sin\theta_k,    (4.12)

and consequently

    \bar\lambda_{k+1} = (\|g_{k+1}\|^2 + g_{k+1}^T d_k)/(\|d_k\|^2 + g_{k+1}^T d_k) = [\tan^2(\pi/4-\theta_k) + \cos\theta_k \tan(\pi/4-\theta_k) \sin 2\theta_k]/[4\sin^2\theta_k + \cos\theta_k \tan(\pi/4-\theta_k) \sin 2\theta_k] = (\tan(\pi/4-\theta_k)/(2\sin\theta_k)) \mu_k,    (4.13)

where

    \mu_k = [\tan(\pi/4-\theta_k) + \sin 2\theta_k] / [2\sin\theta_k + \cos\theta_k \tan(\pi/4-\theta_k)].    (4.14)

Therefore

    d_{k+1} = (1 - \lambda_{k+1})(-g_{k+1} + \bar\lambda_{k+1} d_k) = \omega_k [-(\cos(\theta_k + \pi/4), \sin(\theta_k + \pi/4))^T + \mu_k (\cos(\theta_{k-1} + \pi/4 + \theta_k), \sin(\theta_{k-1} + \pi/4 + \theta_k))^T],    (4.15)

where \omega_k = (1 - \lambda_{k+1}) \tan(\pi/4 - \theta_k) \|g_k\|. Noting the range of \mu_k for \theta_k \in (0, \pi/8), the above relation indicates that

    d_{k+1}/\|d_{k+1}\| = (\cos(\theta_k + \pi/4 + \theta_{k+1}), \sin(\theta_k + \pi/4 + \theta_{k+1}))^T    (4.16)

with

    -\pi/4 + \theta_k \le \theta_{k+1} \le \theta_k.    (4.17)

Because d_{k+1}^T g_{k+1} < 0, we have

    0 < \theta_{k+1} \le \theta_k.    (4.18)

Again we have \theta_{k+1} \in (0, \pi/8). Because d_{k+1}^T g_{k+1} = -\|d_{k+1}\|^2, we have

    d_{k+1} = \|d_{k+1}\| (\cos(\theta_k + \pi/4 + \theta_{k+1}), \sin(\theta_k + \pi/4 + \theta_{k+1}))^T.    (4.19)

By induction, (4.3)-(4.6) hold for all k \ge 1.

Now we test whether (1.16)-(1.17) hold. First, since g_{k+1}^T d_k > 0, (1.17) obviously holds. Further, from (2.2), (4.3) and (4.5), we have

    -\alpha_k g_k^T d_k = \alpha_k \|d_k\|^2 = \|d_k\| \|x_{k+1} - x_k\| = \|g_k\| \|d_k\| / (\cos\theta_k + \sin\theta_k).    (4.20)

By the definition of f, (4.1) and (4.2),

    f_k - f_{k+1} = (1/2)(\|g_k\|^2 - \|g_{k+1}\|^2) = (1/2)\|g_k\|^2 [1 - \tan^2(\pi/4 - \theta_k)] = 2\cos\theta_k \sin\theta_k \|g_k\|^2/(\cos\theta_k + \sin\theta_k)^2 = \cos\theta_k \|g_k\| \|d_k\|/(\cos\theta_k + \sin\theta_k)^2 = -\bar\sigma_k \alpha_k g_k^T d_k,    (4.21)

where

    \bar\sigma_k = \cos\theta_k/(\cos\theta_k + \sin\theta_k).    (4.22)

Note that (4.18) implies that

    \theta_k \to 0, as k \to \infty.    (4.23)

The above two relations indicate that, for any positive number \sigma_1 < 1, we have \bar\sigma_k > \sigma_1 for all large k. Therefore, if we choose a suitable starting point, (1.16) and (1.17) hold for any 0 < \sigma_1 < \sigma_2 < 1.
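The trigonometric step behind (4.21)-(4.22) can be verified directly; a short check (our own):

```latex
% Verification of the simplification used in (4.21):
\[
1 - \tan^2\!\Big(\frac{\pi}{4}-\theta\Big)
 = 1 - \frac{(1-\tan\theta)^2}{(1+\tan\theta)^2}
 = \frac{4\tan\theta}{(1+\tan\theta)^2}
 = \frac{4\sin\theta\cos\theta}{(\cos\theta+\sin\theta)^2},
\]
% so that f_k - f_{k+1} = \tfrac12\|g_k\|^2\,[1-\tan^2(\pi/4-\theta_k)]
% = 2\sin\theta_k\cos\theta_k\,\|g_k\|^2/(\cos\theta_k+\sin\theta_k)^2;
% combined with \|d_k\| = 2\|g_k\|\sin\theta_k from (4.12) and with (4.20),
% this yields \bar\sigma_k = \cos\theta_k/(\cos\theta_k+\sin\theta_k) in (4.22).
```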

However, from (4.4) and (4.8), one can prove that

    \|x_k\| = \|x_1\| \prod_{i=1}^{k-1} \tan(\pi/4 - \theta_i) \to c_6 \|x_1\|, as k \to \infty,    (4.24)

where c_6 = \prod_{i=1}^{\infty} \tan(\pi/4 - \theta_i) > 0. Thus any cluster point of {x_k} is a non-stationary point. In this example, we have from (4.20) that

    \alpha_k = \|g_k\| / ((\cos\theta_k + \sin\theta_k)\|d_k\|) = 1/(2(\cos\theta_k + \sin\theta_k)\sin\theta_k),    (4.25)

which, together with (4.23), implies that

    \alpha_k \to \infty, as k \to \infty.    (4.26)

The above example shows that if we only impose (1.16)-(1.17) on every line search, it is possible that {\alpha_k} is unbounded, and the FRSR and PRPSR algorithms fail.

5. Numerical results

We tested the FRSR and PRPSR algorithms on SGI Indigo workstations. Our line search subroutine computes \alpha_k such that (1.16) and (1.18) hold for \sigma_1 = 0.01 and \sigma_2 = 0.1. The initial value of \alpha_k is always set to 1. Although in this case the convergence results, i.e., Theorems 3.6 and 3.10, hold for any numbers 0 < b_1 \le 1 and 0 \le b_2 < 1, we choose b_1 = 0.9 and b_2 = 0.1 to avoid numerical overflows. We compared the numerical results of our algorithms with those of the Fletcher-Reeves and Polak-Ribiere-Polyak methods. For the PRP algorithm, we restart by setting d_k = -g_k whenever a downhill search direction is not produced. We tested the algorithms on the 18 examples given by More, Garbow and Hillstrom [6]. The results are reported in Table 4.1. The column "P" gives the problem number, and "N" the number of variables. The numerical results are given in the form I/F/G, where I, F, G are the numbers of iterations, function evaluations, and gradient evaluations respectively. The stopping condition is

    \|g_k\| \le 10^{-6}.    (5.1)

The algorithms are also terminated if the number of function evaluations exceeds 5000. We also terminate the calculation if the function value improvement is too small; more exactly, algorithms are terminated whenever

    [f(x_k) - f(x_{k+1})]/[1 + |f(x_k)|] \le 10^{-16}.    (5.2)
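As a complement to the experimental setup above, here is a compact runnable sketch of Algorithm 2.1 (our own simplification, not the authors' code): the restart test (2.7) is omitted, the PRPSR safeguard (2.8) uses b_2 = 0.1 as above, FRSR uses \beta_k = 1 per (1.13), and the strong Wolfe search is a crude bisection rather than the authors' subroutine.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def nr(a, b):
    # Nr{a, b} of (1.10): smallest-norm point on the segment joining a and b
    diff = [x - y for x, y in zip(a, b)]
    dd = dot(diff, diff)
    lam = 0.0 if dd == 0.0 else max(0.0, min(1.0, -dot(b, diff) / dd))
    return [y + lam * x for y, x in zip(b, diff)]

def strong_wolfe(f, grad, x, d, s1=0.01, s2=0.1):
    # crude bracketing/bisection search for (1.16) and (1.18); alpha starts at 1
    gd = dot(grad(x), d)
    lo, hi, a = 0.0, math.inf, 1.0
    for _ in range(80):
        xa = [xi + a * di for xi, di in zip(x, d)]
        ga = dot(grad(xa), d)
        if f(x) - f(xa) < -s1 * a * gd:      # (1.16) fails: step too long
            hi = a
        elif ga < s2 * gd:                    # slope still too negative: too short
            lo = a
            a = 2.0 * a if hi == math.inf else 0.5 * (lo + hi)
            continue
        elif ga > -s2 * gd:                   # (1.18) fails by overshoot
            hi = a
        else:
            return a
        a = 0.5 * (lo + hi)
    return a

def sr_method(f, grad, x, prp=False, tol=1e-8, maxit=500, b2=0.1):
    g = grad(x)
    d = [-gi for gi in g]
    for _ in range(maxit):
        if math.sqrt(dot(g, g)) <= tol:
            break
        a = strong_wolfe(f, grad, x, d)
        x = [xi + a * di for xi, di in zip(x, d)]
        g_old, g = g, grad(x)
        gg = dot(g, g)
        if prp:
            den = abs(gg - dot(g, g_old))     # |g_k^T (g_k - g_{k-1})|
            beta = gg / den if den > b2 * gg else None   # safeguard (2.8)
        else:
            beta = 1.0                        # FRSR, (1.13)
        if beta is None:
            d = [-gi for gi in g]             # restart with (2.6)
        else:
            d = [-z for z in nr(g, [-beta * di for di in d])]   # (1.9)
    return x
```

On a strictly convex quadratic, both variants drive the gradient below the tolerance; this is only a toy reproduction and makes no attempt to match the I/F/G counts of Table 4.1.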

Table 4.1

P   N   FRSR          PRPSR         FR            PRP
1   3   90/66/6       53/68/78      06/358/33     6/4/93
2   6   30/768/743    58/395/375    37/86/434     6/65/05
3   3   3/7/5         3/7/5         3/7/5         3/7/5
4       >5000         Failed        >5000         Failed
5   3   7/7/63        9/3/6         >5000         3/39/9
6   6   3/6/7         3/6/7         3/6/7         3/5/9
7   9   66/077/899    >5000         44/43/456     >5000
8   8   30//3         30/30/8       8/8/56        Failed
9   3   9/69/46       /9/9          5/40/4        0/8/8
10      Failed        4/56/         Failed        Failed
11  4   36/67/5       Failed        Failed        3/5/5
12  3   Failed        84/78/30      >5000         579/687/965
13  0   >5000         6/5/5         >5000         53/06/0
14  4   63/3/6        44/53/9       5/583/40      /04/59
15  6   449/30/899    4/9/65        >5000         87/569/47
16      /59/40        9/53/36       3/9/45        9/8/6
17  4   >5000         49/69/86      36/4737/306   40/589/65
18  8   78/64/09      47/5/7        63/04/78      8/8/39

In the table, a superscript "*" indicates that the algorithm terminated due to (5.2) while (5.1) was not satisfied, and "Failed" means that \|d_k\| is so small that a numerical overflow occurs while the algorithm tries to compute f(x_k + \alpha d_k). From Table 4.1, we find that the FRSR and PRPSR algorithms perform better than the Fletcher-Reeves and Polak-Ribiere-Polyak algorithms respectively. Therefore, the SR method is a promising alternative to the standard conjugate gradient method.

It is known that if a very small step is produced by the FR method, this behavior may continue for a very large number of iterations unless the method is restarted. This behavior was observed and explained by Powell [10] and by Gilbert and Nocedal [3]. In fact, the statement is also true for the FRSR method. Let \omega_k denote the angle between -g_k and d_k. It follows from the definition of \omega_k and (2.2) that

    \cos\omega_k = -g_k^T d_k/(\|g_k\| \|d_k\|) = \|d_k\|/\|g_k\|.    (5.3)

Suppose that at the k-th iteration a "bad" search direction is generated, such that \cos\omega_k \approx 0, and that x_{k+1} \approx x_k. Then \|g_{k+1}\| \approx \|g_k\|, and by (5.3),

    \|d_k\| \ll \|g_k\| \approx \|g_{k+1}\|.    (5.4)

From (2.1), (1.13), (1.18) and this, we see that

    \lambda_{k+1} \approx 1,    (5.5)

which with (1.11) implies that

    d_{k+1} \approx d_k.    (5.6)

Hence, by this and (5.4),

    \|d_{k+1}\| \ll \|g_{k+1}\|,    (5.7)

which together with (5.3) shows that \cos\omega_{k+1} \approx 0. The argument can therefore start all over again.
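The mechanism in (5.3)-(5.6) is easy to reproduce numerically; a minimal sketch (our own, with arbitrary test data: a gradient of order one and a much shorter direction), using \lambda_{k+1} from (2.1) with \beta = 1:

```python
# Illustration of (5.3)-(5.6): when ||d_k|| << ||g_{k+1}||, the scalar
# lambda_{k+1} of (2.1) (with beta = 1) is close to 1, the new direction
# d_{k+1} stays tiny, and cos(omega) = ||d||/||g|| stays near 0.
# g_next and d_k below are arbitrary illustrative data, not from the paper.

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

g_next = [1.0, 0.5]                     # g_{k+1}, with ||g_{k+1}|| >> ||d_k||
d_k = [1e-4, -1e-4]                     # a "bad", very short direction
w = [a + b for a, b in zip(g_next, d_k)]                       # g_{k+1} + d_k
lam = (dot(g_next, g_next) + dot(g_next, d_k)) / dot(w, w)     # (2.1), beta = 1
d_next = [-(1.0 - lam) * a + lam * b for a, b in zip(g_next, d_k)]  # (1.11)

norm_g = dot(g_next, g_next) ** 0.5
norm_d = dot(d_next, d_next) ** 0.5
cos_omega = -dot(g_next, d_next) / (norm_g * norm_d)           # (5.3)
```

Here lam comes out just below 1 and cos_omega is of order 1e-4, so the next angle condition is again nearly violated, matching the discussion above.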

Acknowledgements. The authors are very grateful to the referees for their valuable comments and suggestions.

References
[1] Fletcher, R. and Reeves, C. (1964), Function minimization by conjugate gradients, Comput. J. 7, 149-154.
[2] Hestenes, M. R. (1980), Conjugate Direction Methods in Optimization, Springer-Verlag, New York-Heidelberg-Berlin.
[3] Gilbert, J. C. and Nocedal, J. (1992), Global convergence properties of conjugate gradient methods for optimization, SIAM J. Optimization 2(1), 21-42.
[4] Lemarechal, C. (1975), An extension of Davidon methods to nondifferentiable problems, Mathematical Programming Study 3, 95-109.
[5] McCormick, G. P. and Ritter, K. (1975), Alternative proofs of the convergence properties of the conjugate-gradient method, J. Optim. Theory Appl. 13(5), 497-518.
[6] More, J. J., Garbow, B. S. and Hillstrom, K. E. (1981), Testing unconstrained optimization software, ACM Transactions on Mathematical Software 7, 17-41.
[7] Pytlak, R. (1994), On the convergence of conjugate gradient algorithms, IMA Journal of Numerical Analysis 14, 443-460.
[8] Polak, E. and Ribiere, G. (1969), Note sur la convergence de methodes de directions conjuguees, Rev. Francaise Informat. Recherche Operationnelle, 3e Annee 16, 35-43.
[9] Polyak, B. T. (1969), The conjugate gradient method in extreme problems, USSR Comput. Math. and Math. Phys. 9, 94-112.
[10] Powell, M. J. D. (1977), Restart procedures for the conjugate gradient method, Math. Programming 12, 241-254.
[11] Powell, M. J. D. (1984), Nonconvex minimization calculations and the conjugate gradient method, in: Lecture Notes in Mathematics, vol. 1066, Springer-Verlag, Berlin, pp. 122-141.
[12] Wolfe, P. (1969), Convergence conditions for ascent methods, SIAM Review 11, 226-235.
[13] Wolfe, P. (1971), Convergence conditions for ascent methods. II: Some corrections, SIAM Review 13, 185-188.
[14] Wolfe, P. (1975), A method of conjugate subgradients for minimizing nondifferentiable functions, Math. Programming Study 3, 145-173.
[15] Zoutendijk, G.
(1970), Nonlinear programming, computational methods, in: Integer and Nonlinear Programming (J. Abadie, ed.), North-Holland, Amsterdam, 37-86.