Global Convergence of the Method of Shortest Residuals

Yu-hong Dai and Ya-xiang Yuan

State Key Laboratory of Scientific and Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Chinese Academy of Sciences, P. O. Box 2719, Beijing 100080, P. R. China. (This work was supported by the National Science Foundation of China, grant 19525101.)

Abstract. The method of shortest residuals (SR) was presented by Hestenes and studied by Pytlak. If the function is quadratic and the line search is exact, the SR method reduces to the linear conjugate gradient method. In this paper, we give the formulation of the SR method when the line search is inexact. We prove that, if the stepsizes satisfy the strong Wolfe conditions, both the Fletcher-Reeves and Polak-Ribiere-Polyak versions of the SR method converge globally. When the Wolfe conditions are used, the two versions are also convergent provided that the stepsizes are uniformly bounded; if the stepsizes are not bounded, an example is constructed to show that they need not converge. Numerical results show that the SR method is a promising alternative to the standard conjugate gradient method.

Mathematics Subject Classification: 65K, 90C.

1. Introduction

The conjugate gradient method is particularly useful for minimizing functions of many variables,

$\min_{x \in \Re^n} f(x)$,  (1.1)

because it does not require the storage of any matrices. It has the form

$d_k = -g_k$ for $k = 1$; $d_k = -g_k + \beta_k d_{k-1}$ for $k \ge 2$,  (1.2)

$x_{k+1} = x_k + \alpha_k d_k$,  (1.3)

where $g_k = \nabla f(x_k)$, $\beta_k$ is a scalar, and $\alpha_k$ is a stepsize obtained by a line search. Well-known formulas for $\beta_k$ are the Fletcher-Reeves (FR) and Polak-Ribiere-Polyak (PRP) formulas (see [1, 8, 9]). They are given by

$\beta_k^{FR} = \|g_k\|^2 / \|g_{k-1}\|^2$  (1.4)

and

$\beta_k^{PRP} = g_k^T (g_k - g_{k-1}) / \|g_{k-1}\|^2$,  (1.5)

where $\|\cdot\|$ is the Euclidean norm.
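For readers who prefer code, the following is a minimal NumPy sketch of the search-direction update (1.2) with the FR and PRP choices (1.4)-(1.5) of $\beta_k$; the function and variable names are ours, not the paper's.

```python
import numpy as np

def beta_fr(g_new, g_old):
    # Fletcher-Reeves formula (1.4): ||g_k||^2 / ||g_{k-1}||^2
    return np.dot(g_new, g_new) / np.dot(g_old, g_old)

def beta_prp(g_new, g_old):
    # Polak-Ribiere-Polyak formula (1.5): g_k^T (g_k - g_{k-1}) / ||g_{k-1}||^2
    return np.dot(g_new, g_new - g_old) / np.dot(g_old, g_old)

def cg_direction(g_new, g_old=None, d_old=None, beta_rule=beta_fr):
    # Search direction (1.2): steepest descent at the first iteration,
    # otherwise d_k = -g_k + beta_k * d_{k-1}.
    if g_old is None or d_old is None:
        return -g_new
    return -g_new + beta_rule(g_new, g_old) * d_old
```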

This paper deals with another conjugate gradient method, the method of shortest residuals (SR). The SR method was presented by Hestenes in his monograph [2] on conjugate direction methods. In the SR method the search direction $d_k$ is taken as the shortest vector of the form

$d_k = \dfrac{-g_k + \gamma_k d_{k-1}}{1 + \gamma_k}$, $0 < \gamma_k < \infty$,  (1.6)

where $d_1 = -g_1$. It follows that $d_k$ is orthogonal to $g_k + d_{k-1}$, and hence its parameter satisfies the relation

$(-g_k + \gamma_k d_{k-1})^T (g_k + d_{k-1}) = 0$.  (1.7)

Assuming that the line search is exact, Hestenes obtains the solution of (1.7):

$\gamma_k = \|g_k\|^2 / \|d_{k-1}\|^2$.  (1.8)

If the function is quadratic, and if the line search is exact, the vector $d_{k+1}$ can be proved to be also the shortest residual in the $k$-simplex whose vertices are $-g_1, \ldots, -g_{k+1}$ (see [2]). The SR method can be viewed as a special case of the conjugate subgradient method developed by Wolfe [14] and Lemarechal [4] for minimizing a convex function which may be nondifferentiable. Based on this fact, Pytlak [7] also called the above method the Wolfe-Lemarechal method. To investigate the equivalent form of the SR method for general functions and to construct other conjugate gradient methods, Pytlak [7] introduces the family of methods

$d_k = -\mathrm{Nr}\{g_k, -\beta_k d_{k-1}\}$,  (1.9)

where $\mathrm{Nr}\{a, b\}$ is defined as the point of the line segment spanned by the vectors $a$ and $b$ which has the smallest norm, namely,

$\|\mathrm{Nr}\{a, b\}\| = \min\{\|\lambda a + (1-\lambda) b\| : 0 \le \lambda \le 1\}$.  (1.10)

If $g_k^T d_{k-1} = 0$, one can solve (1.9) analytically and obtain

$d_k = -(1-\lambda_k) g_k + \lambda_k \beta_k d_{k-1}$  (1.11)

and

$\lambda_k = \dfrac{\|g_k\|^2}{\|g_k\|^2 + \beta_k^2 \|d_{k-1}\|^2}$.  (1.12)

In this case, it is easy to see that the direction $d_k$ defined by (1.6) and (1.8) corresponds to (1.11)-(1.12) with

$\beta_k = 1$.  (1.13)

Pytlak points out in [7] that if the line search is exact, and if $\beta_k \equiv 1$, the SR method is equivalent to the FR method, and the corresponding formula of $\beta_k$ for the PRP method is

$\beta_k = \dfrac{\|g_k\|^2}{|g_k^T (g_k - g_{k-1})|}$.  (1.14)
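The operator $\mathrm{Nr}\{a,b\}$ in (1.9)-(1.10) admits a simple closed form: the minimizing $\lambda$ is obtained by projecting the origin onto the segment between $a$ and $b$. The following sketch is ours and purely illustrative.

```python
import numpy as np

def nr(a, b):
    # Nr{a, b} from (1.10): the point of smallest Euclidean norm on the
    # segment {lam*a + (1-lam)*b : 0 <= lam <= 1}.
    diff = a - b
    denom = np.dot(diff, diff)
    if denom == 0.0:                      # a == b: the segment is a single point
        return a.copy()
    lam = np.clip(np.dot(b, b - a) / denom, 0.0, 1.0)
    return lam * a + (1.0 - lam) * b

def sr_direction(g_k, d_prev, beta_k):
    # SR search direction from (1.9): d_k = -Nr{g_k, -beta_k * d_{k-1}}.
    return -nr(g_k, -beta_k * d_prev)
```

With an exact line search ($g_k^T d_{k-1} = 0$), `sr_direction` reproduces the analytic solution (1.11)-(1.12).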

One may argue that the corresponding formula of $\beta_k$ for the PRP method should be

$\beta_k = \dfrac{\|g_k\|^2}{g_k^T (g_k - g_{k-1})}$.  (1.15)

Note that if the line search is exact, the method obtained with (1.15) produces the same iterations as the PRP method. Powell [11]'s examples can then also be used to show that the method may cycle without approaching any solution point even if the line search is exact. Thus we take (1.14) instead of (1.15). For clarity, we now call the method (1.3), (1.11) and (1.12) the SR method provided that the scalar $\beta_k$ is such that the reduced method is the linear conjugate gradient method when the function is quadratic and the line search is exact. At the same time, we call formulae (1.13) and (1.14) for $\beta_k$ the FR and PRP versions of the SR method, and abbreviate them by FRSR and PRPSR respectively.

In this paper, we investigate the convergence properties of the FRSR and PRPSR methods with the stepsize satisfying the Wolfe conditions:

$f(x_k) - f(x_k + \alpha_k d_k) \ge -\delta \alpha_k g_k^T d_k$,  (1.16)

$g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k$,  (1.17)

or the strong Wolfe conditions, namely, (1.16) and

$|g(x_k + \alpha_k d_k)^T d_k| \le \sigma |g_k^T d_k|$,  (1.18)

where $0 < \delta < \sigma < 1$. In [7], Pytlak suggests the following line search conditions:

$f(x_k) - f(x_k + \alpha_k d_k) \ge \delta \alpha_k \|d_k\|^2$,  (1.19)

$g(x_k + \alpha_k d_k)^T d_k \ge -\sigma \|d_k\|^2$,  (1.20)

where $0 < \delta < \sigma < 1$, and he concludes that there exists a procedure which finds $\alpha_k > 0$ satisfying (1.19)-(1.20) in a finite number of operations (see Lemma 1 in [7]). However, since it is possible that $\hat\alpha = 0$ in his proof, the statement is not true. Consider

$f(x) = \tau x$, $x \in \Re^1$,  (1.21)

where $\tau > 0$ is a constant. Suppose that at the $k$-th iteration $x_k = 1$ and $d_k = -1$ are obtained. Then for any $\alpha > 0$,

$f(x_k) - f(x_k + \alpha d_k) = \tau - \tau(1 - \alpha) = \tau\alpha$.  (1.22)

The above implies that (1.19) does not hold for any $\alpha > 0$, provided that $\tau < \delta$. It is worth noting that, for the SR method, we often have $g_k^T d_k = -\|d_k\|^2$, which implies that (1.19)-(1.20) are equivalent to (1.16)-(1.17).

This paper is organized as follows. In the next section, we give the formula of the SR method under inexact line searches and provide an algorithm in which some safeguards are employed. In Section 3, we prove that the FRSR and PRPSR algorithms converge with the strong Wolfe conditions (1.16) and (1.18). If the Wolfe conditions (1.16)-(1.17) are used, convergence is also guaranteed provided that the stepsizes are uniformly bounded. One direct corollary is that the FRSR and PRPSR algorithms with (1.16)-(1.17) converge for strictly convex functions. In Section 4, a function is constructed to show that, if we do not restrict $\{\alpha_k\}$, the Wolfe conditions can guarantee neither algorithm to converge.
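As an illustration of the line search requirements, here is a sketch (ours) of routines that test the Wolfe conditions (1.16)-(1.17) and the strong Wolfe conditions (1.16) and (1.18) for a trial stepsize; the default values 0.01 and 0.1 mirror the parameters used later in the experiments of Section 5.

```python
import numpy as np

def wolfe_ok(f, grad, x, d, alpha, delta=0.01, sigma=0.1):
    # Standard Wolfe conditions (1.16)-(1.17) with 0 < delta < sigma < 1.
    gTd = np.dot(grad(x), d)
    x_new = x + alpha * d
    sufficient_decrease = f(x) - f(x_new) >= -delta * alpha * gTd   # (1.16)
    curvature = np.dot(grad(x_new), d) >= sigma * gTd               # (1.17)
    return sufficient_decrease and curvature

def strong_wolfe_ok(f, grad, x, d, alpha, delta=0.01, sigma=0.1):
    # Strong Wolfe conditions: (1.16) together with (1.18).
    gTd = np.dot(grad(x), d)
    x_new = x + alpha * d
    sufficient_decrease = f(x) - f(x_new) >= -delta * alpha * gTd   # (1.16)
    curvature = abs(np.dot(grad(x_new), d)) <= sigma * abs(gTd)     # (1.18)
    return sufficient_decrease and curvature
```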

The example applies to both the FRSR and PRPSR algorithms. Numerical results are reported in Section 5, which show that the SR method is a promising alternative to the standard conjugate gradient method.

2. Algorithm

Expression (1.12) is deduced in the case of the exact line search. If the line search is inexact, it is not difficult to deduce from (1.9) that the scalar $\lambda_k$ in (1.11) is given by

$\lambda_k = \dfrac{\|g_k\|^2 + \beta_k g_k^T d_{k-1}}{\|g_k + \beta_k d_{k-1}\|^2}$,  (2.1)

if we do not restrict $\lambda_k \in [0, 1]$. By direct calculations, we can obtain

$g_k^T d_k = -\|d_k\|^2$  (2.2)

and

$\|d_k\|^2 = \dfrac{\beta_k^2 \left( \|g_k\|^2 \|d_{k-1}\|^2 - (g_k^T d_{k-1})^2 \right)}{\|g_k + \beta_k d_{k-1}\|^2}$.  (2.3)

It follows from (2.2) that $d_k$ is a descent direction whenever $d_k \ne 0$. On the other hand, from (2.3) we see that $d_k = 0$ if and only if $g_k$ and $d_{k-1}$ are collinear. In the case when $g_k$ and $d_{k-1}$ are collinear, assuming that $\beta_k d_{k-1} = t g_k$, we can solve $d_k$ from (1.9) and (1.10) as follows:

$d_k = -g_k$ if $t < -1$; $d_k = t g_k$ if $-1 \le t \le 0$; $d_k = 0$ if $t > 0$.  (2.4)

The direction $d_k$ vanishes if $t > 0$. The following example shows this possibility. Consider

$f(x, y) = \dfrac{1+\tau}{2} (x^2 + y^2)$.  (2.5)

Given the starting point $x_1 = (1, 1)^T$, one can verify that the unit stepsize satisfies (1.16) and (1.18) if the parameters $\delta$, $\sigma$ and $\tau$ are such that $\tau \le \sigma$ and $(1+\tau)(1-\tau) \ge 2\delta(1+\tau)$. It follows that $x_2 = (-\tau, -\tau)^T$, and hence that $t = 1/\tau$ for the FRSR method and $t = 1/(1+\tau)$ for the PRPSR method. Thus we have $t > 0$ for both methods. In practice, to avoid numerical overflows we restart the algorithm with

$d_k = -g_k$  (2.6)

whenever the following condition is violated:

$|g_k^T d_{k-1}| \le b_1 \|g_k\| \|d_{k-1}\|$,  (2.7)

where $b_1$ is a positive number. For the PRPSR method, it is obvious that the denominator of $\beta_k$ may vanish. Thus, to establish convergence results for the PRPSR method, we must assume that $|g_k^T (g_k - g_{k-1})| > 0$. In practice, we test the following condition at every iteration:

$|g_k^T (g_k - g_{k-1})| > b_2 \|g_k\|^2$,  (2.8)

where $0 \le b_2 < 1$.
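The formulas (2.1), (1.11) and the collinear case (2.4) can be combined into a single direction routine. The sketch below is ours and only illustrative; the tolerance used to detect collinearity is an assumption, not a quantity from the paper.

```python
import numpy as np

def sr_direction_inexact(g_k, d_prev, beta_k, tol=1e-12):
    # lambda_k from (2.1) and d_k from (1.11); when g_k and d_{k-1} are
    # (numerically) collinear, fall back to the explicit solution (2.4).
    v = beta_k * d_prev
    gg, vv, gv = np.dot(g_k, g_k), np.dot(v, v), np.dot(g_k, v)
    if gg * vv - gv ** 2 <= tol * gg * vv:      # collinearity test (assumed tolerance)
        t = gv / gg                             # beta_k * d_{k-1} = t * g_k
        if t < -1.0:
            return -g_k
        if t <= 0.0:
            return t * g_k
        return np.zeros_like(g_k)
    s = g_k + v
    lam = (gg + gv) / np.dot(s, s)              # (2.1)
    return -(1.0 - lam) * g_k + lam * v         # (1.11)
```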

Now we give a general algorithm as follows.

Algorithm 2.1. Given a starting point $x_1$. Choose $I \in \{0, 1\}$, and numbers $0 < b_1 \le 1$, $0 \le b_2 < 1$, $\epsilon \ge 0$.
Step 1. Compute $\|g_1\|$. If $\|g_1\| \le \epsilon$, stop; otherwise, let $k = 1$.
Step 2. Compute $d_k = -g_k$.
Step 3. Find $\alpha_k > 0$ satisfying certain line search conditions.
Step 4. Compute $x_{k+1}$ by (1.3) and $g_{k+1}$. Let $k = k + 1$.
Step 5. Compute $\|g_k\|$. If $\|g_k\| \le \epsilon$, stop.
Step 6. Test (2.7): if (2.7) does not hold, go to Step 2. If $I = 1$, go to Step 8.
Step 7. Compute $\beta_k$ by (1.13), go to Step 9.
Step 8. Test (2.8): if (2.8) does not hold, go to Step 2; otherwise compute $\beta_k$ by (1.14).
Step 9. Compute $\lambda_k$ by (2.1) and $d_k$ by (1.11), go to Step 3.

We call the above algorithm with $I = 0$ and with $I = 1$ the FRSR and the PRPSR algorithm, respectively.
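A possible realization of Algorithm 2.1 in Python is sketched below. It is our reading of the algorithm, not the authors' code: `line_search` stands for any routine returning a stepsize that fulfils the chosen line search conditions, and the default values of `b1`, `b2`, `eps` and `max_iter` are illustrative assumptions.

```python
import numpy as np

def sr_algorithm(f, grad, x1, line_search, I=0, b1=0.9, b2=0.1,
                 eps=1e-6, max_iter=10000):
    # Sketch of Algorithm 2.1: I = 0 gives the FRSR algorithm, I = 1 the PRPSR
    # algorithm.  line_search(f, grad, x, d) must return alpha_k > 0 satisfying
    # the chosen line search conditions (e.g. the strong Wolfe conditions).
    x = np.asarray(x1, dtype=float)
    g = grad(x)
    if np.linalg.norm(g) <= eps:                          # Step 1
        return x
    d = -g                                                # Step 2 (also used for restarts)
    for _ in range(max_iter):
        alpha = line_search(f, grad, x, d)                # Step 3
        x = x + alpha * d                                 # Step 4, cf. (1.3)
        g_old, g = g, grad(x)
        if np.linalg.norm(g) <= eps:                      # Step 5
            return x
        d_old = d
        # Step 6: restart with d = -g if the safeguard (2.7) is violated
        if abs(np.dot(g, d_old)) > b1 * np.linalg.norm(g) * np.linalg.norm(d_old):
            d = -g
            continue
        if I == 0:
            beta = 1.0                                    # Step 7: FR version, (1.13)
        else:
            denom = abs(np.dot(g, g - g_old))
            if denom <= b2 * np.dot(g, g):                # Step 8: test (2.8), restart on failure
                d = -g
                continue
            beta = np.dot(g, g) / denom                   # PRP version, (1.14)
        v = beta * d_old
        lam = (np.dot(g, g) + np.dot(g, v)) / np.dot(g + v, g + v)   # Step 9, (2.1)
        d = -(1.0 - lam) * g + lam * v                    # (1.11)
    return x
```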

3. Global Convergence

Throughout this section we make the following assumption.

Assumption 3.1 (1) $f$ is bounded below in $\Re^n$ and is continuously differentiable in a neighborhood $\mathcal{N}$ of the level set $\mathcal{L} = \{x \in \Re^n : f(x) \le f(x_1)\}$; (2) the gradient $\nabla f(x)$ is Lipschitz continuous in $\mathcal{N}$, namely, there exists a constant $L > 0$ such that

$\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$, for any $x, y \in \mathcal{N}$.  (3.1)

For some of our results, we also formulate the following assumption.

Assumption 3.2 The level set $\mathcal{L} = \{x \in \Re^n : f(x) \le f(x_1)\}$ is bounded.

Note that Assumptions 3.1 and 3.2 imply that there is a constant $\bar\gamma$ such that

$\|g(x)\| \le \bar\gamma$, for all $x \in \mathcal{L}$.  (3.2)

We also formulate the following assumption.

Assumption 3.3 The function $f(x)$ is twice continuously differentiable in $\mathcal{N}$, and there are numbers $0 < \mu_1 \le \mu_2$ such that

$\mu_1 \|y\|^2 \le y^T H(x) y \le \mu_2 \|y\|^2$, for all $x \in \mathcal{N}$ and $y \in \Re^n$,  (3.3)

where $H(x)$ denotes the Hessian matrix of $f$ at $x$.

Under Assumption 3.1, we state a useful result which was essentially proved by Zoutendijk [15] and Wolfe [12, 13].

Theorem 3.4 Let $x_1$ be a starting point for which Assumption 3.1 is satisfied. Consider any iteration of the form (1.3), where $d_k$ is a descent direction and $\alpha_k$ satisfies the Wolfe conditions (1.16)-(1.17). Then

$\sum_{k \ge 1} \dfrac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty$.  (3.4)

The above theorem gives the following result.

Corollary 3.5 Let $x_1$ be a starting point for which Assumption 3.1 is satisfied. Consider Algorithm 2.1 where $b_1 = 1$, $b_2 = 0$, $\epsilon = 0$, and where the line search conditions are (1.16)-(1.17). Then

$\sum_{k \ge 1} \|d_k\|^2 < \infty$.  (3.5)

Proof. If the algorithm restarts, we also have (2.2) due to (2.6). So Theorem 3.4 and relation (2.2) give the corollary.

If Algorithm 2.1 restarts at the $k$-th iteration, we have $\|g_k\|^2 = \|d_k\|^2$. Thus (3.5) implies $\liminf_{k\to\infty} \|g_k\| = 0$ if Algorithm 2.1 restarts infinitely many times. Thus we suppose without loss of generality that (2.7) and (2.8) always hold in Algorithm 2.1. In addition, we also suppose that $g_k \ne 0$ for all $k$, since otherwise a stationary point has been found. First, we have the following theorem for the FRSR method.

Theorem 3.6 Let $x_1$ be a starting point for which Assumption 3.1 is satisfied. Consider Algorithm 2.1 with $I = 0$, $b_1 = 1$ and $\epsilon = 0$. Then we have that

$\liminf_{k\to\infty} \|g_k\| = 0$,  (3.6)

if the line search satisfies one of the following conditions: (i) $g_k^T d_{k-1} = 0$; (ii) the strong Wolfe conditions (1.16) and (1.18); (iii) the Wolfe conditions (1.16)-(1.17), and there exists $M < \infty$ such that

$\alpha_k \le M$, for all $k$.  (3.7)

Proof. We proceed by contradiction and assume that

$\liminf_{k\to\infty} \|g_k\| \ne 0$.  (3.8)

Then there exists a constant $\gamma > 0$ such that

$\|g_k\| \ge \gamma$, for all $k \ge 1$.  (3.9)

From (1.13) and (2.3), we obtain

$\dfrac{1}{\|d_k\|^2} = \dfrac{1}{\|d_{k-1}\|^2} (1 + r_k)$,  (3.10)

where

$r_k = \dfrac{\|d_{k-1}\|^2 + 2 g_k^T d_{k-1} + (g_k^T d_{k-1})^2/\|d_{k-1}\|^2}{\|g_k\|^2 - (g_k^T d_{k-1})^2/\|d_{k-1}\|^2}$.  (3.11)

Using (3.10) recursively, we get that

$\dfrac{1}{\|d_k\|^2} = \dfrac{1}{\|d_1\|^2} \prod_{i=2}^{k} (1 + r_i)$.  (3.12)

Noting that $r_k > 0$, we deduce from (3.5) and (3.12) that

$\sum_{k \ge 2} r_k = \infty$,  (3.13)

because otherwise $1/\|d_k\|^2$ would converge, which contradicts (3.5). For (i), since $g_k^T d_{k-1} = 0$, $r_k$ can be rewritten as

$r_k = \dfrac{\|d_{k-1}\|^2}{\|g_k\|^2}$.  (3.14)

From this and (3.9), we obtain

$\sum_{k \ge 2} r_k \le \gamma^{-2} \sum_{k \ge 2} \|d_{k-1}\|^2 < \infty$,  (3.15)

which contradicts (3.13). For (ii) and (iii), we conclude that there exists a positive number $c_1$ such that for all $k \ge 2$,

$|g_k^T d_{k-1}| \le c_1 \|d_{k-1}\|^2$.  (3.16)

In fact, for (ii), we see from (1.18) and (2.2) that (3.16) holds with $c_1 = \sigma$. For (iii), we have from (3.1), (2.2) and (3.7) that

$|g_{k+1}^T d_k| = |(g_{k+1} - g_k)^T d_k + g_k^T d_k| \le \|g_{k+1} - g_k\| \|d_k\| + |g_k^T d_k| \le \alpha_k L \|d_k\|^2 + \|d_k\|^2 \le (1 + LM) \|d_k\|^2$.  (3.17)

So (3.16) holds with $c_1 = 1 + LM$. Due to (3.5), we see that

$\|d_k\| \to 0$, as $k \to \infty$,  (3.18)

which implies that there exists an integer $k_1$ such that

$\|d_k\| \le \dfrac{\gamma}{\sqrt{2}\, c_1}$, for all $k \ge k_1$.  (3.19)

Applying (3.16) and this in (3.11), we obtain for all $k > k_1$

$r_k \le c_2 \|d_{k-1}\|^2$,  (3.20)

where $c_2 = 2(1 + c_1)^2 / \gamma^2$. Therefore we also have that $\sum_{k} r_k < \infty$, which contradicts (3.13). The contradiction shows the truth of (3.6).

Corollary 3.7 Let $x_1$ be a starting point for which Assumption 3.3 is satisfied. Consider Algorithm 2.1 with $I = 0$, $b_1 = 1$ and $\epsilon = 0$. Then if the line search satisfies (1.16)-(1.17), we have (3.6).

Proof. From (iii) of the above theorem, it suffices to show (3.7). By Taylor's theorem, we get that

$f(x_k + \alpha_k d_k) = f(x_k) + \alpha_k g_k^T d_k + \tfrac{1}{2}\alpha_k^2 d_k^T H(\xi_k) d_k$,  (3.21)

for some $\xi_k$. This, (1.16), (3.3) and (2.2) imply that (3.7) holds with $M = 2(1-\delta)/\mu_1$, which completes the proof.

For the PRPSR method, we first have the following theorem.

Theorem 3.8 Let $x_1$ be a starting point for which Assumption 3.1 is satisfied. Consider Algorithm 2.1 with $I = 1$, $b_1 = 1$, $b_2 = 0$ and $\epsilon = 0$. Then we have (3.6) if the line search satisfies condition (iii).

Proof. We still proceed by contradiction and assume that (3.9) holds. It is obvious that if the line search satisfies (iii), then (3.16) still holds. From (1.14), (3.1), (3.7) and (3.9), we obtain

$\beta_k \|d_{k-1}\| = \dfrac{\|g_k\|^2 \|d_{k-1}\|}{|g_k^T (g_k - g_{k-1})|} \ge \dfrac{\|g_k\|^2 \|d_{k-1}\|}{\|g_k\| L \alpha_{k-1} \|d_{k-1}\|} \ge c_3$,  (3.22)

where $c_3 = \gamma/(LM)$. Using this and (3.9) in (2.3), we have the following estimate of $1/\|d_k\|^2$ for all $k \ge 2$:

$\dfrac{1}{\|d_k\|^2} = \left[ \dfrac{1}{\beta_k^2\|d_{k-1}\|^2} + \dfrac{2 g_k^T d_{k-1}}{\beta_k \|g_k\|^2 \|d_{k-1}\|^2} + \dfrac{1}{\|g_k\|^2} \right] \left( 1 - \dfrac{(g_k^T d_{k-1})^2}{\|g_k\|^2\|d_{k-1}\|^2} \right)^{-1} \le (c_3^{-1} + \gamma^{-1})^2 \left( 1 - \dfrac{(g_k^T d_{k-1})^2}{\|g_k\|^2 \|d_{k-1}\|^2} \right)^{-1}$.  (3.23)

So

$\sum_{k \ge 2} \left( 1 - \dfrac{(g_k^T d_{k-1})^2}{\|g_k\|^2 \|d_{k-1}\|^2} \right) < \infty$,  (3.24)

which implies

$\lim_{k\to\infty} \dfrac{(g_k^T d_{k-1})^2}{\|g_k\|^2 \|d_{k-1}\|^2} = 1$.  (3.25)

Note that

$g_{k-1} = g_k - y_{k-1}$, where $y_{k-1} = g_k - g_{k-1}$,  (3.26)

and

$\liminf_{k\to\infty} \|y_{k-1}\| \le L \liminf_{k\to\infty} \alpha_{k-1} \|d_{k-1}\| = 0$.  (3.27)

(3.25), (3.26) and (3.27) imply that

$\liminf_{k\to\infty} \dfrac{(g_{k-1}^T d_{k-1})^2}{\|g_{k-1}\|^2 \|d_{k-1}\|^2} = 1$.  (3.28)

On the other hand, we have from (2.2), (3.9) and (3.18) that

$\liminf_{k\to\infty} \dfrac{(g_{k-1}^T d_{k-1})^2}{\|g_{k-1}\|^2 \|d_{k-1}\|^2} = \liminf_{k\to\infty} \dfrac{\|d_{k-1}\|^2}{\|g_{k-1}\|^2} = 0$.  (3.29)

(3.28) and (3.29) give a contradiction. The contradiction shows the truth of (3.6).

From Theorem 3.8, one can see that the PRPSR method also converges for strictly convex functions provided that the stepsizes satisfy (1.16)-(1.17).

Corollary 3.9 Let $x_1$ be a starting point for which Assumption 3.3 is satisfied. Consider Algorithm 2.1 with $I = 1$, $b_1 = 1$, $b_2 = 0$ and $\epsilon = 0$. Then if the line search satisfies (1.16)-(1.17), we have (3.6).

Theorem 3.10 Let $x_1$ be a starting point for which Assumptions 3.1 and 3.2 are satisfied. Consider Algorithm 2.1 with $I = 1$, $b_1 = 1$, $b_2 = 0$ and $\epsilon = 0$. Then we have (3.6) if the line search satisfies (1.16) and (1.18).

Proof. We proceed by contradiction and assume that (3.9) holds. At first, we conclude that there must exist a number $c_4$ such that

$\beta_k \|d_{k-1}\| \le c_4$, for all $k \ge 2$.  (3.30)

This is because otherwise there exist a subsequence $\{k_i\}$ and a constant, say also $c_3$, such that

$\beta_{k_i} \|d_{k_i-1}\| \ge c_3$, for all $i \ge 1$.  (3.31)

Then we can prove that (3.23) holds for the subsequence $\{k_i\}$, which together with (3.5) implies that (3.25) still holds. Then, similarly to the proof of Theorem 3.8, we can obtain (3.28) and (3.29), which contradict each other. Therefore (3.30) holds. Since, by (1.18) and (2.2), $(g_k^T d_{k-1})^2 \le \sigma^2 \|d_{k-1}\|^4 \le \tfrac{1}{2}\|g_k\|^2\|d_{k-1}\|^2$ for all $k$ larger than some index $k_1$, we can get from (2.3), (3.9), (3.2) and (3.30) that for all $k \ge k_1$

$\|d_k\|^2 \ge \dfrac{\beta_k^2 \|g_k\|^2 \|d_{k-1}\|^2}{2\,\|g_k + \beta_k d_{k-1}\|^2} \ge \dfrac{\beta_k^2 \|g_k\|^2 \|d_{k-1}\|^2}{4(\|g_k\|^2 + \beta_k^2 \|d_{k-1}\|^2)} \ge c_5 \beta_k^2 \|d_{k-1}\|^2$,  (3.32)

where $c_5 = \gamma^2 / \left( 4(\bar\gamma^2 + c_4^2) \right)$. Taking the inner product of both sides of (1.11) with $d_k$ and applying (2.2), we can get

$\beta_k d_{k-1}^T d_k = \|d_k\|^2$.  (3.33)

Define $u_k = d_k/\|d_k\|$. Then from (3.33), $\beta_k > 0$, (2.3), (3.16) and (3.32), we obtain

$\|u_k - u_{k-1}\|^2 = 2 - \dfrac{2 d_{k-1}^T d_k}{\|d_{k-1}\| \|d_k\|} = 2 - \dfrac{2\|d_k\|}{\beta_k \|d_{k-1}\|} \le \dfrac{2(\beta_k^2\|d_{k-1}\|^2 - \|d_k\|^2)}{\beta_k^2 \|d_{k-1}\|^2} = \dfrac{2(\beta_k \|d_{k-1}\|^2 + g_k^T d_{k-1})^2}{\|d_{k-1}\|^2 \|g_k + \beta_k d_{k-1}\|^2} \le \dfrac{4\left(\beta_k^2 \|d_{k-1}\|^4 + (g_k^T d_{k-1})^2\right)}{\|d_{k-1}\|^2 \|g_k + \beta_k d_{k-1}\|^2} \le \dfrac{4\left(c_5^{-1} \|d_k\|^2 + c_1^2 \|d_{k-1}\|^2\right)}{\|g_k + \beta_k d_{k-1}\|^2}$.  (3.34)

Besides, from (1.18), (2.2) and (3.30),

$\|g_k + \beta_k d_{k-1}\|^2 \ge \|g_k\|^2 + 2\beta_k g_k^T d_{k-1} \ge \|g_k\|^2 - 2\sigma \beta_k \|d_{k-1}\|^2 \ge \|g_k\|^2 - 2\sigma c_4 \|d_{k-1}\|$.  (3.35)

Noting (3.5) and (3.9), we can deduce from (3.34) and (3.35) that

$\sum_{k \ge 2} \|u_k - u_{k-1}\|^2 < \infty$.  (3.36)

Let $|\mathcal{K}_{k,\Delta}|$ denote the number of elements of

$\mathcal{K}_{k,\Delta} = \{ i : k \le i \le k + \Delta - 1,\ \|s_{i-1}\| = \|x_i - x_{i-1}\| > \bar\lambda \}$.  (3.37)

Using (3.36), we conclude that for any $\bar\lambda > 0$ there exist $\Delta \in \mathcal{N}$ and $k_0$ such that

$|\mathcal{K}_{k,\Delta}| \le \dfrac{\Delta}{2}$, for any $k \ge k_0$,  (3.38)

for otherwise a contradiction can be obtained similarly to the proof of Theorem 4.3 in [3]. We choose $\bar b = \gamma/(2\bar\gamma)$ and $\bar\lambda = \gamma c_5 \bar b / L$. Then we have from (3.2) and (3.9) that

$\beta_k \ge \dfrac{\|g_k\|}{\|g_k\| + \|g_{k-1}\|} \ge \dfrac{\gamma}{2\bar\gamma} = \bar b$,  (3.39)

and when $\|s_{k-1}\| \le \bar\lambda$, we have from (3.1) that

$\beta_k \ge \dfrac{\|g_k\|}{L \|s_{k-1}\|} \ge \dfrac{\gamma}{L\bar\lambda} = \dfrac{1}{c_5 \bar b}$.  (3.40)

For this $\bar\lambda$, let $\Delta$ and $k_0$ be so given that (3.38) holds. Denote $q = [(k - \bar k + 1)/2]$ and $t = k - \bar k + 1 - 2q$. It is obvious that $k - \bar k + 1 = 2q + t$ and $0 \le t < 2$. Thus for any $k \ge \bar k = \max\{k_1, k_0\}$, we get from (3.32) and (3.38)-(3.40) that

$\|d_k\|^2 \ge \|d_{\bar k}\|^2 \prod_{i=\bar k+1}^{k} (c_5 \beta_i^2) \ge \|d_{\bar k}\|^2 (c_5 \bar b^2)^{q+t} (c_5 \bar b^2)^{-q} = \|d_{\bar k}\|^2 (c_5 \bar b^2)^{t} \ge \|d_{\bar k}\|^2 \min\{c_5 \bar b^2, (c_5 \bar b^2)^2\}$,  (3.41)

which contradicts (3.18). The contradiction shows the truth of (3.6).

4. A counter-example

In the preceding section, we proved that the FRSR and PRPSR algorithms converge with the Wolfe conditions (1.16)-(1.17) provided that $\{\alpha_k\}$ is uniformly bounded. However, if we do not restrict the size of $\alpha_k$, they need not converge. This will be shown in the following example. The example applies to both algorithms.

Example 4.1 Consider the function

$f(x, y) = 0$ if $(x, y) = (0, 0)$, and otherwise $f(x, y) = \dfrac{1}{4}\varphi\!\left(\dfrac{x}{\sqrt{x^2+y^2}}\right)(x+y)^2 + \dfrac{1}{4}\varphi\!\left(\dfrac{y}{\sqrt{x^2+y^2}}\right)(x-y)^2$,  (4.1)

where $\varphi$ is defined by

$\varphi(t) = 1$ for $|t| > \dfrac{\sqrt{3}}{2}$; $\varphi(t) = \dfrac{1}{2} + \dfrac{1}{2}\sin(b_1 |t| + b_2)$ for $\dfrac{\sqrt{2}}{2} \le |t| \le \dfrac{\sqrt{3}}{2}$; $\varphi(t) = 0$ for $|t| < \dfrac{\sqrt{2}}{2}$.  (4.2)

In (4.2), $b_1 = 2\pi(\sqrt{3} + \sqrt{2})$ and $b_2 = -(5/2 + \sqrt{6})\pi$. It is easy to see that the function $f$ defined by (4.1) and (4.2) satisfies Assumption 3.1. We construct an example such that for all $k \ge 1$,

$x_k = \|x_k\| \left( \cos\theta_{k-1},\ \sin\theta_{k-1} \right)^T$,  (4.3)

$\|x_k\| = \|x_{k-1}\| \tan\!\left(\dfrac{\pi}{4} - \theta_{k-1}\right)$, $\theta_{k-1} \in \left(0, \dfrac{\pi}{8}\right)$,  (4.4)

$g_k = \|x_k\| \left( \cos\!\left(\theta_{k-1} + \dfrac{\pi}{4}\right),\ \sin\!\left(\theta_{k-1} + \dfrac{\pi}{4}\right) \right)^T$,  (4.5)

$d_k = \|g_k\| \sin\theta_k \left( \cos\!\left(\theta_{k-1} + \dfrac{\pi}{4} + \theta_k\right),\ \sin\!\left(\theta_{k-1} + \dfrac{\pi}{4} + \theta_k\right) \right)^T$.  (4.6)

Because we can use, for example, spline fitting, we can choose the starting point and assume that (4.3)-(4.6) hold for $k = 1$. Suppose that (4.3)-(4.6) hold for $k$. From (4.3), (4.6) and the definition of $f$, we can choose $\alpha_k > 0$ such that

$x_{k+1} = \|x_{k+1}\| \left( \cos\theta_k,\ \sin\theta_k \right)^T$,  (4.7)

$\|x_{k+1}\| = \|x_k\| \tan\!\left(\dfrac{\pi}{4} - \theta_k\right)$.  (4.8)

Thus we have that

$g_{k+1} = \|x_{k+1}\| \left( \cos\!\left(\theta_k + \dfrac{\pi}{4}\right),\ \sin\!\left(\theta_k + \dfrac{\pi}{4}\right) \right)^T$.  (4.9)

Since $g_{k+1}$ is orthogonal to $g_k$, we have $g_{k+1}^T g_k = 0$ and hence $\beta_{k+1} = 1$ for both the FRSR and PRPSR algorithms. Defining $\mu_{k+1} = \lambda_{k+1}/(1 - \lambda_{k+1})$, where $\lambda_{k+1}$ is given by (2.1), direct calculations show that

$g_{k+1}^T d_k = \|g_{k+1}\| \|d_k\| \cos\theta_k$,  (4.10)

$\|g_{k+1}\| = \|g_k\| \tan\!\left(\dfrac{\pi}{4} - \theta_k\right)$,  (4.11)

$\|d_k\| = \|g_k\| \sin\theta_k$,  (4.12)

and consequently

$\mu_{k+1} = \dfrac{\|g_{k+1}\|^2 + g_{k+1}^T d_k}{\|d_k\|^2 + g_{k+1}^T d_k} = \dfrac{\tan^2(\frac{\pi}{4}-\theta_k) + \cos\theta_k \tan(\frac{\pi}{4}-\theta_k)\sin\theta_k}{\cos\theta_k \tan(\frac{\pi}{4}-\theta_k)\sin\theta_k + \sin^2\theta_k} = \dfrac{\tan(\frac{\pi}{4}-\theta_k)}{\sin\theta_k}\,\nu_k$,  (4.13)

where

$\nu_k = \cos\theta_k - \sin^3\theta_k + \cos\theta_k \sin^2\theta_k$.  (4.14)

Therefore

$d_{k+1} = (1 - \lambda_{k+1})(-g_{k+1} + \mu_{k+1} d_k) = \eta_k \left[ -\left( \cos\!\left(\theta_k + \tfrac{\pi}{4}\right),\ \sin\!\left(\theta_k + \tfrac{\pi}{4}\right) \right)^T + \nu_k \left( \cos\!\left(\theta_{k-1} + \tfrac{\pi}{4} + \theta_k\right),\ \sin\!\left(\theta_{k-1} + \tfrac{\pi}{4} + \theta_k\right) \right)^T \right]$,  (4.15)

where $\eta_k = (1 - \lambda_{k+1}) \tan(\frac{\pi}{4}-\theta_k)\|g_k\|$. Noting that $\nu_k \ge 1$ for all $\theta_k \in (0, \frac{\pi}{8})$, the above relation indicates that

$\dfrac{d_{k+1}}{\|d_{k+1}\|} = \left( \cos\!\left(\theta_k + \dfrac{\pi}{4} + \theta_{k+1}\right),\ \sin\!\left(\theta_k + \dfrac{\pi}{4} + \theta_{k+1}\right) \right)^T$  (4.16)

with

$-\dfrac{\pi}{4} + \theta_k \le \theta_{k+1} \le \theta_k$.  (4.17)

Because $d_{k+1}^T g_{k+1} < 0$, we have that

$0 < \theta_{k+1} \le \theta_k$.  (4.18)

Again we have $\theta_{k+1} \in (0, \frac{\pi}{8})$. Because $d_{k+1}^T g_{k+1} = -\|d_{k+1}\|^2$, we have

$d_{k+1} = \|g_{k+1}\| \sin\theta_{k+1} \left( \cos\!\left(\theta_k + \dfrac{\pi}{4} + \theta_{k+1}\right),\ \sin\!\left(\theta_k + \dfrac{\pi}{4} + \theta_{k+1}\right) \right)^T$.  (4.19)

By induction, (4.3)-(4.6) hold for all $k \ge 1$. Now we test whether (1.16)-(1.17) hold. First, since $g_{k+1}^T d_k > 0$, (1.17) obviously holds. Further, from (2.2), (4.3) and (4.5), we have

$-\alpha_k g_k^T d_k = \alpha_k \|d_k\|^2 = \|d_k\| \|x_{k+1} - x_k\| = \dfrac{\|g_k\| \|d_k\|}{\cos\theta_k + \sin\theta_k}$.  (4.20)

By the definition of $f$, (4.11), (4.12) and (4.20),

$f_k - f_{k+1} = \dfrac{1}{2}\left(\|g_k\|^2 - \|g_{k+1}\|^2\right) = \dfrac{1}{2}\|g_k\|^2 \left[ 1 - \tan^2\!\left(\dfrac{\pi}{4}-\theta_k\right) \right] = \dfrac{2\cos\theta_k \sin\theta_k}{(\cos\theta_k + \sin\theta_k)^2}\,\|g_k\|^2 = \dfrac{2\cos\theta_k\,\|g_k\|\|d_k\|}{(\cos\theta_k + \sin\theta_k)^2} = -\omega_k \alpha_k g_k^T d_k$,  (4.21)

where

$\omega_k = \dfrac{2\cos\theta_k}{\cos\theta_k + \sin\theta_k}$.  (4.22)

Note that (4.18) implies that

$\theta_k \to 0$, as $k \to \infty$.  (4.23)

The above two relations indicate that for any positive number $\delta < 1$, $\omega_k > \delta$ for large $k$. Therefore, if we choose a suitable starting point, (1.16) and (1.17) hold for any $0 < \delta < \sigma < 1$.

However, from (4.4) and (4.8), one can prove that

$\|x_k\| = \|x_1\| \prod_{i=1}^{k-1} \tan\!\left(\dfrac{\pi}{4} - \theta_i\right) \to c_6 \|x_1\|$, as $k \to \infty$,  (4.24)

where $c_6 = \prod_{i=1}^{\infty} \tan(\frac{\pi}{4} - \theta_i) > 0$. Thus any cluster point of $\{x_k\}$ is a non-stationary point. In this example, we have from (4.20) that

$\alpha_k = \dfrac{\|g_k\|}{(\cos\theta_k + \sin\theta_k)\|d_k\|} = \dfrac{1}{(\cos\theta_k + \sin\theta_k)\sin\theta_k}$,  (4.25)

which, together with (4.23), implies that

$\alpha_k \to \infty$, as $k \to \infty$.  (4.26)

The above example shows that if we only impose (1.16)-(1.17) on every line search, it is possible that $\{\alpha_k\}$ is not bounded, and the FRSR and PRPSR algorithms fail.

5. Numerical results

We tested the FRSR and PRPSR algorithms on SGI Indigo workstations. Our line search subroutine computes $\alpha_k$ such that (1.16) and (1.18) hold for $\delta = 0.01$ and $\sigma = 0.1$. The initial value of $\alpha_k$ is always set to 1. Although in this case the convergence results, i.e., Theorems 3.6 and 3.10, hold for any numbers $0 < b_1 \le 1$ and $0 \le b_2 < 1$, we choose $b_1 = 0.9$ and a small positive $b_2$ to avoid numerical overflows. We compared the numerical results of our algorithms with the Fletcher-Reeves method and the Polak-Ribiere-Polyak method. For the PRP algorithm, we restart it by setting $d_k = -g_k$ whenever a downhill search direction is not produced. We tested the algorithms on the 18 examples given by More, Garbow and Hillstrom [6]. The results are reported in Table 4.1. The column "P" denotes the number of the problem, and "N" the number of variables. The numerical results are given in the form I/F/G, where I, F, G are the numbers of iterations, function evaluations, and gradient evaluations respectively. The stopping condition is

$\|g_k\| \le 10^{-6}$.  (5.1)

The algorithms are also terminated if the number of function evaluations exceeds a prescribed limit. We also terminate the calculation if the function value improvement is too small. More exactly, the algorithms are terminated whenever

$\dfrac{f(x_k) - f(x_{k+1})}{1 + |f(x_k)|} \le 10^{-6}$.  (5.2)
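The termination rules of this section can be collected in a single test, as in the sketch below; it is ours, and the evaluation budget and the tolerance defaults are assumptions for illustration only.

```python
def should_stop(g_norm, f_old, f_new, n_feval,
                g_tol=1e-6, f_tol=1e-6, max_feval=5000):
    # Termination tests as we read Section 5: the gradient test (5.1),
    # a cap on function evaluations (assumed value), and the relative
    # improvement test (5.2).
    if g_norm <= g_tol:                                   # (5.1)
        return True
    if n_feval > max_feval:                               # evaluation budget (assumption)
        return True
    if (f_old - f_new) / (1.0 + abs(f_old)) <= f_tol:     # (5.2)
        return True
    return False
```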

Table 4.1

P N FRSR PRPSR FR PRP
3 90/66/6 53/68/78 06/358/33 6/4/ /768/743 58/395/375 37/86/434 6/65/ /7/5 3/7/5 3/7/5 3/7/5 4 >5000 Failed >5000 Failed 5 3 7/7/63 9/3/6 >5000 3/39/ /6/7 3/6/7 3/6/7 3/5/ /077/899 > /43/456 > //3 30/30/8 8/8/56 Failed 9 3 9/69/46 /9/9 5/40/4 0/8/8 0 Failed 4/56/ Failed Failed 4 36/67/5 Failed Failed 3/5/5 3 Failed 84/78/30 > /687/ >5000 6/5/5 > /06/ /3/6 44/53/9 5/583/40 /04/ /30/899 4/9/65 > /569/47 6 /59/40 9/53/36 3/9/45 9/8/6 7 4 > /69/86 36/4737/306 40/589/ /64/09 47/5/7 63/04/78 8/8/39

In the table, a superscript "*" indicates that the algorithm terminated because (5.2) was satisfied while (5.1) was not, and "Failed" means that $\|d_k\|$ is so small that a numerical overflow occurs while the algorithm tries to compute $f(x_k + \alpha d_k)$. From Table 4.1, we found that the FRSR and PRPSR algorithms perform better than the Fletcher-Reeves and Polak-Ribiere-Polyak algorithms, respectively. Therefore, the SR method is a promising alternative to the standard conjugate gradient method.

It is known that if a very small step is produced by the FR method, then this behavior may continue for a very large number of iterations unless the method is restarted. This behavior was observed and explained by Powell [10] and by Gilbert and Nocedal [3]. In fact, the statement is also true for the FRSR method. Let $\theta_k$ denote the angle between $-g_k$ and $d_k$. It follows from the definition of $\theta_k$ and (2.2) that

$\cos\theta_k = \dfrac{-g_k^T d_k}{\|g_k\| \|d_k\|} = \dfrac{\|d_k\|}{\|g_k\|}$.  (5.3)

Suppose that at the $k$-th iteration a "bad" search direction is generated, such that $\cos\theta_k \approx 0$, and that $x_{k+1} \approx x_k$. Thus $\|g_{k+1}\| \approx \|g_k\|$, and by (5.3),

$\|d_k\| \ll \|g_k\| \approx \|g_{k+1}\|$.  (5.4)

From (2.1), (1.13), (1.18) and this, we see that

$\lambda_{k+1} \approx 1$,  (5.5)

which with (1.11) implies that

$d_{k+1} \approx d_k$.  (5.6)

Hence, by this and (5.4),

$\|d_{k+1}\| \ll \|g_{k+1}\|$,  (5.7)

which together with (5.3) shows that $\cos\theta_{k+1} \approx 0$. The argument can therefore start all over again.

Acknowledgements. The authors are very grateful to the referees for their valuable comments and suggestions.

References

[1] Fletcher, R. and Reeves, C. (1964), Function minimization by conjugate gradients, Comput. J. 7.
[2] Hestenes, M. R. (1980), Conjugate Direction Methods in Optimization, Springer-Verlag, New York, Heidelberg, Berlin.
[3] Gilbert, J. C. and Nocedal, J. (1992), Global convergence properties of conjugate gradient methods for optimization, SIAM J. Optimization 2(1), 21-42.
[4] Lemarechal, C. (1975), An extension of Davidon methods to nondifferentiable problems, Mathematical Programming Study 3.
[5] McCormick, G. P. and Ritter, K. (1975), Alternative proofs of the convergence properties of the conjugate-gradient method, JOTA, Vol. 13, No. 5.
[6] More, J. J., Garbow, B. S. and Hillstrom, K. E. (1981), Testing unconstrained optimization software, ACM Transactions on Mathematical Software 7, 17-41.
[7] Pytlak, R. (1989), On the convergence of conjugate gradient algorithms, IMA Journal of Numerical Analysis 14.
[8] Polak, E. and Ribiere, G. (1969), Note sur la convergence de methodes de directions conjuguees, Rev. Francaise Informat. Recherche Operationnelle, 3e Annee 16.
[9] Polyak, B. T. (1969), The conjugate gradient method in extremal problems, USSR Comp. Math. and Math. Phys. 9, 94-112.
[10] Powell, M. J. D. (1977), Restart procedures for the conjugate gradient method, Math. Programming 12.
[11] Powell, M. J. D. (1984), Nonconvex minimization calculations and the conjugate gradient method, in: Lecture Notes in Mathematics, Vol. 1066, Springer-Verlag, Berlin, pp. 122-141.
[12] Wolfe, P. (1969), Convergence conditions for ascent methods, SIAM Review 11.
[13] Wolfe, P. (1971), Convergence conditions for ascent methods. II: Some corrections, SIAM Review 13.
[14] Wolfe, P. (1975), A method of conjugate subgradients for minimizing nondifferentiable functions, Mathematical Programming Study 3.
[15] Zoutendijk, G. (1970), Nonlinear programming, computational methods, in: Integer and Nonlinear Programming (J. Abadie, ed.), North-Holland, Amsterdam.


More information

Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms: A Comparative Study

Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms: A Comparative Study International Journal of Mathematics And Its Applications Vol.2 No.4 (2014), pp.47-56. ISSN: 2347-1557(online) Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms:

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method Lecture 5, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The notion of complexity (per iteration)

More information

QUADRATICALLY AND SUPERLINEARLY CONVERGENT ALGORITHMS FOR THE SOLUTION OF INEQUALITY CONSTRAINED MINIMIZATION PROBLEMS 1

QUADRATICALLY AND SUPERLINEARLY CONVERGENT ALGORITHMS FOR THE SOLUTION OF INEQUALITY CONSTRAINED MINIMIZATION PROBLEMS 1 QUADRATICALLY AND SUPERLINEARLY CONVERGENT ALGORITHMS FOR THE SOLUTION OF INEQUALITY CONSTRAINED MINIMIZATION PROBLEMS 1 F. FACCHINEI 2 AND S. LUCIDI 3 Communicated by L.C.W. Dixon 1 This research was

More information

N.G.Bean, D.A.Green and P.G.Taylor. University of Adelaide. Adelaide. Abstract. process of an MMPP/M/1 queue is not a MAP unless the queue is a

N.G.Bean, D.A.Green and P.G.Taylor. University of Adelaide. Adelaide. Abstract. process of an MMPP/M/1 queue is not a MAP unless the queue is a WHEN IS A MAP POISSON N.G.Bean, D.A.Green and P.G.Taylor Department of Applied Mathematics University of Adelaide Adelaide 55 Abstract In a recent paper, Olivier and Walrand (994) claimed that the departure

More information

Stochastic Optimization Algorithms Beyond SG

Stochastic Optimization Algorithms Beyond SG Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods

More information

5 Overview of algorithms for unconstrained optimization

5 Overview of algorithms for unconstrained optimization IOE 59: NLP, Winter 22 c Marina A. Epelman 9 5 Overview of algorithms for unconstrained optimization 5. General optimization algorithm Recall: we are attempting to solve the problem (P) min f(x) s.t. x

More information

Optimization Methods. Lecture 18: Optimality Conditions and. Gradient Methods. for Unconstrained Optimization

Optimization Methods. Lecture 18: Optimality Conditions and. Gradient Methods. for Unconstrained Optimization 5.93 Optimization Methods Lecture 8: Optimality Conditions and Gradient Methods for Unconstrained Optimization Outline. Necessary and sucient optimality conditions Slide. Gradient m e t h o d s 3. The

More information

FINE TUNING NESTEROV S STEEPEST DESCENT ALGORITHM FOR DIFFERENTIABLE CONVEX PROGRAMMING. 1. Introduction. We study the nonlinear programming problem

FINE TUNING NESTEROV S STEEPEST DESCENT ALGORITHM FOR DIFFERENTIABLE CONVEX PROGRAMMING. 1. Introduction. We study the nonlinear programming problem FINE TUNING NESTEROV S STEEPEST DESCENT ALGORITHM FOR DIFFERENTIABLE CONVEX PROGRAMMING CLÓVIS C. GONZAGA AND ELIZABETH W. KARAS Abstract. We modify the first order algorithm for convex programming described

More information

Spectral gradient projection method for solving nonlinear monotone equations

Spectral gradient projection method for solving nonlinear monotone equations Journal of Computational and Applied Mathematics 196 (2006) 478 484 www.elsevier.com/locate/cam Spectral gradient projection method for solving nonlinear monotone equations Li Zhang, Weijun Zhou Department

More information