Journal of Optimization Theory and Applications, Vol. 127, No. 2, pp. 425-446, November 2005 (© 2005)

New Inexact Line Search Method for Unconstrained Optimization 1,2

Z. J. Shi 3 and J. Shen 4

Communicated by F. Zirilli

Abstract. We propose a new inexact line-search rule and analyze the global convergence and convergence rate of related descent methods. The new line-search rule is similar to the Armijo line-search rule and contains it as a special case. We can choose a larger stepsize in each line-search procedure and maintain the global convergence of the related line-search methods. This idea allows us to design new line-search methods in a wider sense. In some special cases, the new descent method reduces to the Barzilai and Borwein method. Numerical results show that the new line-search methods are efficient for solving unconstrained optimization problems.

Key Words. Unconstrained optimization, inexact line search, global convergence, convergence rate.

1 The work was supported by NSF of China Grant , Postdoctoral Fund of China, and K. C. Wong Postdoctoral Fund of CAS Grant .
2 The authors thank the anonymous referees for constructive comments and suggestions that greatly improved the paper.
3 Professor, College of Operations Research and Management, Qufu Normal University, Rizhao, Shandong, PRC; Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, PRC.
4 Assistant Professor, Department of Computer and Information Science, University of Michigan, Dearborn, Michigan.

1. Introduction

Let R^n be an n-dimensional Euclidean space and let f: R^n → R^1 be a continuously differentiable function. Line-search methods for solving the unconstrained minimization problem

    min f(x),   x ∈ R^n,                                        (1)

have the form defined by the equation

    x_{k+1} = x_k + α_k d_k,   k = 1, 2, 3, ...,                (2)

where x_1 ∈ R^n is an initial point, d_k is a descent direction of f(x) at x_k, and α_k is the stepsize. Let x_k be the current iterate, k = 1, 2, 3, ..., and let x* be a stationary point, i.e., a point satisfying ∇f(x*) = 0. We denote the gradient ∇f(x_k) by g_k, the function value f(x_k) by f_k, and the function value f(x*) by f*.

Choosing the search direction d_k and determining the stepsize α_k along the search direction at each iteration are the main tasks in line-search methods. The search direction d_k is generally required to satisfy

    g_k^T d_k < 0,                                              (3)

which guarantees that d_k is a descent direction of f(x) at x_k. In order to guarantee global convergence, we sometimes require that d_k satisfy the sufficient descent condition

    g_k^T d_k ≤ −c ‖g_k‖²,                                      (4)

where c > 0 is a constant. Moreover, we may need to choose d_k to satisfy the angle property

    cos⟨−g_k, d_k⟩ = −g_k^T d_k/(‖g_k‖ ‖d_k‖) ≥ η_0,            (5)

where η_0 ∈ (0, 1] is a constant and ⟨−g_k, d_k⟩ denotes the angle between the vectors −g_k and d_k. The commonly used line-search rules are as follows.

(a) Minimization Rule. At each iteration, α_k is selected so that

    f(x_k + α_k d_k) = min_{α>0} f(x_k + α d_k).                (6)

(b) Approximate Minimization Rule. At each iteration, α_k is selected so that

    α_k = min {α | g(x_k + α d_k)^T d_k = 0, α > 0}.            (7)

(c) Armijo Rule. Set scalars s_k, β, L > 0, and σ with s_k = −g_k^T d_k/(L ‖d_k‖²), β ∈ (0, 1), and σ ∈ (0, 1/2). Let α_k be the largest α in {s_k, βs_k, β²s_k, ...} such that

    f_k − f(x_k + α d_k) ≥ −σ α g_k^T d_k.                      (8)

(d) Limited Minimization Rule. Set s_k = −g_k^T d_k/(L ‖d_k‖²), where L > 0 is a constant; α_k is defined by

    f(x_k + α_k d_k) = min_{α ∈ [0, s_k]} f(x_k + α d_k).       (9)

(e) Goldstein Rule. A fixed scalar σ ∈ (0, 1/2) is selected and α_k is chosen to satisfy

    σ ≤ [f(x_k + α_k d_k) − f_k]/(α_k g_k^T d_k) ≤ 1 − σ.       (10)

It is possible to show that, if f is bounded below, there exists an interval of stepsizes α_k for which the relation above is satisfied; there are fairly simple algorithms for finding such a stepsize through a finite number of arithmetic operations.

(f) Strong Wolfe Rule. α_k is chosen to satisfy simultaneously

    f_k − f(x_k + α_k d_k) ≥ −σ α_k g_k^T d_k,                  (11)
    |g(x_k + α_k d_k)^T d_k| ≤ −β g_k^T d_k,                    (12)

where σ and β are scalars with σ ∈ (0, 1/2) and β ∈ (σ, 1).

(g) Wolfe Rule. α_k is chosen to satisfy (11) and

    g(x_k + α_k d_k)^T d_k ≥ β g_k^T d_k.                       (13)
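For concreteness, the Armijo rule (c) above is implemented by backtracking from the trial stepsize s_k. The following Python sketch is only illustrative (the helper name and default parameter values are ours, not the authors'; the paper's experiments were coded in C++):

    import numpy as np

    def armijo_step(f, x, g, d, L=1.0, beta=0.5, sigma=0.38, max_backtracks=50):
        """Armijo rule (c): backtrack from s_k = -g^T d / (L ||d||^2)."""
        gTd = float(g @ d)                   # g_k^T d_k, negative for a descent direction
        alpha = -gTd / (L * float(d @ d))    # initial trial stepsize s_k
        fx = f(x)
        for _ in range(max_backtracks):
            # Largest alpha in {s_k, beta*s_k, beta^2*s_k, ...} with
            # f(x_k) - f(x_k + alpha*d_k) >= -sigma*alpha*g_k^T d_k.
            if fx - f(x + alpha * d) >= -sigma * alpha * gTd:
                return alpha
            alpha *= beta
        return alpha                          # fallback: smallest stepsize tried

The trial value s_k ties the first guess to (an estimate of) the Lipschitz constant L of the gradient, which is exactly the quantity that the new rule of Section 2 re-estimates at every iteration.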

Some important global convergence results for various methods using the above-mentioned line-search procedures have been given; see, e.g., Refs. 1-7. In fact, the above-mentioned line-search methods are monotone descent methods for unconstrained optimization (Refs. 8-12). Nonmonotone line-search methods have also been investigated by many authors; see, for example, Refs. 13-15. In particular, the Barzilai-Borwein method (see Refs. 16-19) is a nonmonotone descent method which is an efficient algorithm for solving some special problems.

In this paper, we extend the Armijo line-search rule and analyze the global convergence of the corresponding descent methods. The new line-search rule is similar to the Armijo line-search rule and contains it as a special case. The new rule enables us to choose a larger stepsize at each iteration and to reduce the number of function evaluations at each step. This idea allows us to design new line-search methods in a wider sense and to find some new global convergence properties. In some special cases, the new descent method reduces to the Barzilai and Borwein method (Refs. 16-19), which is regarded as an efficient algorithm for unconstrained optimization.

Numerical results show that these new line-search methods are efficient for solving unconstrained optimization problems.

The paper is organized as follows. In Section 2, we describe the new line-search rule. In Sections 3 and 4, we analyze its convergence and convergence rate. In Section 5, we propose some ways to estimate the parameters used in the new line-search rule and report some numerical results. Conclusions are given in Section 6.

2. Inexact Line-Search Rule

Throughout the paper, we make the following assumptions.

(H1) The function f(x) has a lower bound on the level set L_0 = {x ∈ R^n | f(x) ≤ f(x_0)}, where x_0 is given.

(H2) The gradient g(x) of f(x) is Lipschitz continuous in an open convex set B that contains L_0; i.e., there exists L such that

    ‖g(x) − g(y)‖ ≤ L‖x − y‖,   ∀ x, y ∈ B.

We first describe the algorithm model.

Algorithm Model A.

Step 0. Given some parameters and the initial point x_1, set k := 1.
Step 1. If ‖g_k‖ = 0, then stop; else go to Step 2.
Step 2. Set x_{k+1} = x_k + α_k d_k, where d_k is a descent direction of f(x) at x_k and α_k is selected by some line-search rule.
Step 3. Set k := k + 1; go to Step 1.

In this section, we do not discuss how to choose d_k at each iteration, but investigate how to choose the stepsize α_k. Some useful line-search rules were mentioned in the previous section. We describe here a new inexact line-search rule which contains the Armijo line-search rule as a special case. We will find that the stepsize defined by the new line-search rule is larger than that defined by the original Armijo line-search rule; in other words, an acceptable stepsize is easier to find under the new rule than under the original Armijo rule. In particular, the stepsize defined by the modified Armijo line-search rule is no smaller than the stepsize defined by the original Armijo line-search rule.

(c′) Modified Armijo Line-Search Rule. Set scalars s_k, β, L_k, µ, and σ with s_k = −g_k^T d_k/(L_k ‖d_k‖²), β ∈ (0, 1), L_k > 0, µ ∈ [0, 2), and σ ∈ (0, 1/2). Let α_k be the largest α in {s_k, βs_k, β²s_k, ...} such that

    f(x_k + α d_k) − f_k ≤ σ α [g_k^T d_k + (1/2) α µ L_k ‖d_k‖²].     (14)
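A sketch of rule (c′) in the same style as the Armijo sketch above; it differs only in the acceptance test (14) and in the use of the iteration-dependent constant L_k (again an illustrative helper, not the authors' code):

    import numpy as np

    def modified_armijo_step(f, x, g, d, Lk, beta=0.87, sigma=0.38, mu=1.0,
                             max_backtracks=50):
        """Modified Armijo rule (c'): the quadratic term relaxes the
        acceptance test (cf. Remark 2.1)."""
        gTd = float(g @ d)
        dd = float(d @ d)
        alpha = -gTd / (Lk * dd)              # trial stepsize s_k
        fx = f(x)
        for _ in range(max_backtracks):
            # Accept if f(x+alpha*d) - f(x) <= sigma*alpha*(g^T d + 0.5*alpha*mu*Lk*||d||^2).
            if f(x + alpha * d) - fx <= sigma * alpha * (gTd + 0.5 * alpha * mu * Lk * dd):
                return alpha
            alpha *= beta
        return alpha

With mu = 0 (and Lk = L) the test reduces to the classical Armijo condition (8), as noted in Remark 2.1 below.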

Remark 2.1. Suppose that α_k is defined by the line-search rule (c) and that α′_k is defined by the line-search rule (c′); then α′_k ≥ α_k. In other words, if A_c denotes the set of Armijo stepsizes and A_{c′} the set of new stepsizes, then A_c ⊆ A_{c′}. In fact, if L_k ≤ L and there exists α_k satisfying (8), then α_k is certain to satisfy (14). Moreover, if µ = 0 (and L_k = L), then the line-search rule (c′) reduces to the Armijo line-search rule (c).

In Algorithm Model A, the corresponding algorithm with line-search rule (c′) is denoted by Algorithm (c′). In what follows, we analyze the global convergence of the new line-search method.

3. Global Convergence Analysis

Theorem 3.1. Assume that (H1) and (H2) hold, that the search direction d_k satisfies (3), that α_k is determined by the modified Armijo line-search rule (c′), that Algorithm (c′) generates an infinite sequence {x_k}, and that

    0 < L_k ≤ m_k L,                                            (15)

where m_k is a positive integer and m_k ≤ M_0 < +∞, with M_0 a large positive constant. Then,

    Σ_{k=1}^{∞} (g_k^T d_k/‖d_k‖)² < +∞.                        (16)

Proof. Let K_1 = {k | α_k = s_k} and K_2 = {k | α_k < s_k}. If k ∈ K_1, then

    f(x_k + α_k d_k) − f_k ≤ σ α_k [g_k^T d_k + (1/2) α_k µ L_k ‖d_k‖²]
        = σ [−g_k^T d_k/(L_k ‖d_k‖²)] [g_k^T d_k − (1/2) µ g_k^T d_k]
        = −[σ(1 − (1/2)µ)/L_k] (g_k^T d_k/‖d_k‖)²;

thus,

    f(x_k + α_k d_k) − f_k ≤ −[σ(1 − (1/2)µ)/L_k] (g_k^T d_k/‖d_k‖)²,   k ∈ K_1.    (17)

Let η′_k = σ(1 − (1/2)µ)/L_k, k ∈ K_1; by (15), we have

    η′_k = σ(1 − (1/2)µ)/L_k ≥ σ(1 − (1/2)µ)/(m_k L) ≥ σ(1 − (1/2)µ)/(M_0 L) > 0.

Let

    η′ = σ(1 − (1/2)µ)/(M_0 L).

This and (17) imply that η′_k ≥ η′ and

    f_{k+1} ≤ f_k − η′ (g_k^T d_k/‖d_k‖)²,   k ∈ K_1.           (18)

If k ∈ K_2, then α_k < s_k. This shows that s_k does not satisfy (14) and thus α_k ≤ β s_k. By the modified Armijo line-search rule (c′), we assert that α = α_k/β cannot satisfy (14), and thus

    f(x_k + (α_k/β) d_k) − f_k > σ(α_k/β)[g_k^T d_k + (1/2)(α_k/β) µ L_k ‖d_k‖²].

Using the mean-value theorem on the left-hand side of the above inequality, we see that there exists θ_k ∈ [0, 1] such that

    (α_k/β) g(x_k + θ_k α_k d_k/β)^T d_k > σ(α_k/β)[g_k^T d_k + (1/2)(α_k/β) µ L_k ‖d_k‖²];

therefore,

    g(x_k + θ_k α_k d_k/β)^T d_k > σ[g_k^T d_k + (1/2)(α_k/β) µ L_k ‖d_k‖²].        (19)

By (H2), the Cauchy-Schwarz inequality, and (19), we obtain

    (α_k/β) L ‖d_k‖² ≥ ‖g(x_k + θ_k α_k d_k/β) − g_k‖ ‖d_k‖
        ≥ [g(x_k + θ_k α_k d_k/β) − g_k]^T d_k
        > −(1 − σ) g_k^T d_k + (1/2) σ µ (α_k/β) L_k ‖d_k‖².

Therefore,

    (α_k/β) L ‖d_k‖² > −(1 − σ) g_k^T d_k,

which implies that

    α_k ≥ −[β(1 − σ)/L] g_k^T d_k/‖d_k‖²,   k ∈ K_2.            (20)

Letting

    s′_k = −[β(1 − σ)/L] g_k^T d_k/‖d_k‖²,   k ∈ K_2,

we have

    s_k > α_k ≥ s′_k,   k ∈ K_2.                                (21)

By (14) and (21), we have

    f_{k+1} − f_k ≤ σ α_k [g_k^T d_k + (1/2) α_k µ L_k ‖d_k‖²]
        ≤ σ max_{s′_k ≤ α ≤ s_k} {α [g_k^T d_k + (1/2) α µ L_k ‖d_k‖²]}
        ≤ σ max_{s′_k ≤ α ≤ s_k} {α [g_k^T d_k + (1/2) s_k µ L_k ‖d_k‖²]}
        = σ s′_k (1 − (1/2)µ) g_k^T d_k
        = −[σ β (1 − σ)(1 − (1/2)µ)/L] (g_k^T d_k/‖d_k‖)².

Letting

    η″ = σ β (1 − σ)(1 − (1/2)µ)/L,                             (22)

we have

    f_{k+1} ≤ f_k − η″ (g_k^T d_k)²/‖d_k‖²,   k ∈ K_2.          (23)

Let η_1 = min(η′, η″); by (18) and (23), we have

    f_{k+1} ≤ f_k − η_1 (g_k^T d_k/‖d_k‖)²,   ∀ k.              (24)

By (H1) and (24), {f_k} is a decreasing sequence with a lower bound, so {f_k} has a limit. Summing (24) over k and using (H1), we conclude that (16) holds.

Corollary 3.1. If the conditions of Theorem 3.1 hold, then

    lim_{k→∞} (g_k^T d_k/‖d_k‖)² = 0.                           (25)

In fact, Assumption (H2) can be replaced by the following weaker assumption.

(H2′) The gradient g(x) of f(x) is uniformly continuous on an open convex set B that contains L_0.

Theorem 3.2. Assume that (H1) and (H2′) hold, that the search direction d_k satisfies (3), that α_k is determined by the modified Armijo line-search rule (c′), that Algorithm (c′) generates an infinite sequence {x_k}, and that

    0 < L_k ≤ M_0,                                              (26)

where M_0 is a large positive constant. Then,

    lim_{k→∞} (g_k^T d_k/‖d_k‖) = 0.                            (27)

Proof. Arguing as in the proof of Theorem 3.1, if k ∈ K_1, then by (18) we can prove that

    lim_{k∈K_1, k→∞} (g_k^T d_k/‖d_k‖) = 0.                     (28)

In the case k ∈ K_2, by (14), we have

    f(x_k + α_k d_k) − f_k ≤ σ α_k [g_k^T d_k + (1/2) α_k µ L_k ‖d_k‖²]
        ≤ σ α_k [g_k^T d_k + (1/2) s_k µ L_k ‖d_k‖²]
        = σ α_k (1 − (1/2)µ) g_k^T d_k.

By (H1), we have

    lim_{k∈K_2, k→∞} (α_k g_k^T d_k) = 0.                       (29)

Suppose that there exist ε_0 > 0 and an infinite subset K_3 ⊆ K_2 such that

    −g_k^T d_k/‖d_k‖ ≥ ε_0,   k ∈ K_3;                          (30)

then, by (29) and (30), we have

    lim_{k∈K_3, k→∞} α_k ‖d_k‖ = 0.                             (31)

By (19), we have

    g(x_k + θ_k α_k d_k/β)^T d_k ≥ σ g_k^T d_k,   k ∈ K_3,      (32)

where θ_k ∈ [0, 1] is defined as in the proof of Theorem 3.1. By the Cauchy-Schwarz inequality and (32), we have

    ‖g(x_k + θ_k α_k d_k/β) − g_k‖ = ‖g(x_k + θ_k α_k d_k/β) − g_k‖ ‖d_k‖/‖d_k‖
        ≥ [g(x_k + θ_k α_k d_k/β) − g_k]^T d_k/‖d_k‖
        ≥ −(1 − σ) g_k^T d_k/‖d_k‖,   k ∈ K_3.

By (31), ‖θ_k α_k d_k/β‖ → 0 as k → ∞ in K_3; hence, by (H2′), the left-hand side above tends to zero and we obtain

    lim_{k∈K_3, k→∞} (g_k^T d_k/‖d_k‖) = 0,

which contradicts (30). This shows that

    lim_{k∈K_2, k→∞} (g_k^T d_k/‖d_k‖) = 0.                     (33)

By (28), (33), and noting that K_1 ∪ K_2 = {1, 2, 3, ...}, we assert that (27) holds.

Since (H2) implies (H2′), Theorem 3.1 is essentially a corollary of Theorem 3.2.

4. Linear Convergence Rate

In order to analyze the convergence rate, we assume that the sequence {x_k} generated by the new algorithm converges to x*. We further make the following assumption.

(H3) ∇²f(x*) is a symmetric positive-definite matrix and f(x) is twice continuously differentiable on a neighborhood N_0(x*, ε_0) of x*.

Lemma 4.1. Assume that (H3) holds. Then, there exist ε > 0 and 0 < m ≤ M such that (H1) and (H2) hold for x_0 ∈ N(x*, ε) ⊆ N_0(x*, ε_0) and

    m‖y‖² ≤ y^T ∇²f(x) y ≤ M‖y‖²,   ∀ x, y ∈ N(x*, ε).          (34)

Thus,

    (1/2) m ‖x − x*‖² ≤ f(x) − f(x*) ≤ (1/2) M ‖x − x*‖²,   ∀ x ∈ N(x*, ε),   (35)
    m ‖x − y‖² ≤ (g(x) − g(y))^T (x − y) ≤ M ‖x − y‖²,   ∀ x, y ∈ N(x*, ε),   (36)
    m ‖x − x*‖² ≤ g(x)^T (x − x*) ≤ M ‖x − x*‖²,   ∀ x ∈ N(x*, ε).            (37)

By (37) and (36), we also obtain from the Cauchy-Schwarz inequality that

    m ‖x − x*‖ ≤ ‖g(x)‖ ≤ M ‖x − x*‖,   ∀ x ∈ N(x*, ε),         (38)

and

    ‖g(x) − g(y)‖ ≤ M ‖x − y‖,   ∀ x, y ∈ N(x*, ε).             (39)

The proof can be found in, e.g., Refs. 7-8 or Refs. 21-22.

Lemma 4.2. If (H1) and (H2) hold, the search direction d_k satisfies the angle property (5) at each iteration, and Algorithm (c′) generates an infinite sequence {x_k}, then there exists η > 0 such that

    f_k − f_{k+1} ≥ η ‖g_k‖²,   ∀ k.                            (40)

Proof. By Theorem 3.1, (5), and (24), we have

    f_{k+1} ≤ f_k − η_1 (g_k^T d_k/‖d_k‖)²
        = f_k − η_1 (g_k^T d_k/(‖g_k‖ ‖d_k‖))² ‖g_k‖²
        ≤ f_k − η_1 η_0² ‖g_k‖².

Letting η = η_1 η_0², we obtain that (40) holds.

Theorem 4.1. Suppose that (H3) holds, the search direction d_k satisfies the angle property (5) at each iteration, Algorithm (c′) generates an infinite sequence {x_k}, and x_k ∈ N(x*, ε) for sufficiently large k. Then, {x_k} converges to x* at least linearly.

Proof. By (H3), Lemma 4.1, Lemma 4.2, and (38), it follows that

    lim_{k→∞} x_k = x*.

By Lemma 4.2, (38), and (35), we obtain

    f_k − f_{k+1} ≥ η ‖g_k‖² ≥ η m² ‖x_k − x*‖² ≥ (2η m²/M)(f_k − f*).

Setting

    θ = m √(2η/M),

we can prove that θ < 1. In fact, by the definition of η and noting that m ≤ M ≤ L, we obtain

    θ² = 2m²η/M = 2m² η_1 η_0²/M ≤ 2m² η_1/M ≤ 2m² η′/M
       = [σ(1 − (1/2)µ)/(M_0 L)](2m²/M)
       = [2σ(1 − (1/2)µ)/M_0] m²/(ML)
       ≤ [2σ(1 − (1/2)µ)/M_0] m²/M²
       ≤ 2σ/M_0 < 1.

Set ω = √(1 − θ²); obviously, ω < 1. We obtain from the above inequality that

    f_{k+1} − f* ≤ (1 − θ²)(f_k − f*) = ω²(f_k − f*) ≤ ω^{2k}(f_1 − f*).

By Lemma 4.1 and the above inequality, we have

    ‖x_{k+1} − x*‖² ≤ (2/m)(f_{k+1} − f*) ≤ ω^{2k}[2(f_1 − f*)/m];

thus,

    ‖x_{k+1} − x*‖ ≤ ω^k √(2(f_1 − f*)/m);

therefore,

    R_1{x_k} = lim_{k→+∞} ‖x_k − x*‖^{1/k}
             ≤ lim_{k→+∞} [ω^{k−1} √(2(f_1 − f*)/m)]^{1/k}
             = ω lim_{k→+∞} [2(f_1 − f*)/m]^{1/(2k)}
             = ω < 1,

which shows that {x_k} converges to x* at least linearly.

5. Numerical Results

In this section, we discuss the implementation of the new algorithm. The technique for choosing the parameters is reasonable and effective for solving practical problems, both theoretically and numerically.

5.1. Parameter Estimation. In the modified Armijo line-search rule, there is a parameter L_k which must be estimated. As we know, L_k should approximate the Lipschitz constant M of the gradient g(x) of the objective function f(x). If M is given, we should certainly take L_k = M. However, M is not known a priori in many situations, so L_k needs to be estimated.

First of all, let

    δ_{k−1} = x_k − x_{k−1},   y_{k−1} = g_k − g_{k−1},   k = 2, 3, 4, ...,

and estimate

    L_k = ‖y_{k−1}‖/‖δ_{k−1}‖                                   (41)

or

    L_k = max {‖y_{k−i}‖/‖δ_{k−i}‖ : i = 1, 2, ..., M},         (42)

whenever k ≥ M + 1, where M is a positive integer.
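A minimal sketch of the estimates (41) and (42), assuming the recent iterates and gradients are kept in Python lists of NumPy arrays (the helper name and the fallback value 1.0 are ours):

    import numpy as np

    def lipschitz_estimate(x_hist, g_hist, memory=5):
        """Estimate L_k as in (41)/(42): the largest ratio ||y_{k-i}|| / ||delta_{k-i}||
        over the last `memory` steps (memory=1 gives exactly (41))."""
        ratios = []
        for i in range(1, min(memory, len(x_hist) - 1) + 1):
            delta = x_hist[-i] - x_hist[-i - 1]    # delta_{k-i} = x_{k-i+1} - x_{k-i}
            y = g_hist[-i] - g_hist[-i - 1]        # y_{k-i} = g_{k-i+1} - g_{k-i}
            nd = np.linalg.norm(delta)
            if nd > 0:
                ratios.append(np.linalg.norm(y) / nd)
        return max(ratios) if ratios else 1.0      # no history yet: keep L_1 = 1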

Next, the BB method (Refs. 16-19) also motivates a way of estimating M; here, BB stands for Barzilai and Borwein. Solving the minimization problem

    min_{L_k} ‖L_k δ_{k−1} − y_{k−1}‖,

we obtain

    L_k = δ_{k−1}^T y_{k−1}/‖δ_{k−1}‖².                         (43)

Obviously, if k ≥ M + 1, we can also take

    L_k = max {δ_{k−i}^T y_{k−i}/‖δ_{k−i}‖² : i = 1, 2, ..., M}.   (44)

On the other hand, we can take

    L_k = ‖y_{k−1}‖²/(δ_{k−1}^T y_{k−1})                        (45)

or

    L_k = max {‖y_{k−i}‖²/(δ_{k−i}^T y_{k−i}) : i = 1, 2, ..., M},   (46)

whenever k ≥ M + 1. There are many other techniques for estimating the Lipschitz constant M; see Ref. 8. We will use (41)-(46) to estimate M, and the corresponding algorithms are denoted as Algorithms (41)-(46), respectively.

5.2. Numerical Results. In what follows, we discuss the numerical performance of the new line-search method. The test problems are chosen from Ref. 20 and the implementable algorithm is stated as follows.

Algorithm A.

Step 0. Given parameters σ ∈ (0, 1/2), β ∈ (0, 1), µ ∈ [0, 2), and L_1 = 1, let x_1 ∈ R^n and set k := 1.
Step 1. If ‖g_k‖ = 0, then stop; else, go to Step 2.
Step 2. Choose d_k to satisfy the angle property (5); for example, choose d_k = −g_k.
Step 3. Set x_{k+1} = x_k + α_k d_k, where α_k is defined by the modified Armijo line-search rule (c′).
Step 4. Set δ_k = x_{k+1} − x_k and y_k = g_{k+1} − g_k, and determine L_{k+1} by one of (41)-(46).
Step 5. Set k := k + 1; go to Step 1.
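Putting the pieces together, the Python sketch below assembles Algorithm A with d_k = −g_k and the BB-type estimate (43) for L_{k+1}. The stopping tolerance, the iteration cap, the backtracking safeguard, and the positivity guard on δ_k^T y_k are our own assumptions, not taken from the paper:

    import numpy as np

    def algorithm_a(f, grad, x1, sigma=0.38, beta=0.87, mu=1.0,
                    tol=1e-5, max_iter=10000):
        """Algorithm A with d_k = -g_k, the modified Armijo rule (c'),
        and L_{k+1} estimated by (43): L = delta^T y / ||delta||^2."""
        x, L = np.asarray(x1, dtype=float), 1.0       # Step 0: L_1 = 1
        g = grad(x)
        for _ in range(max_iter):
            if np.linalg.norm(g) <= tol:              # Step 1: stopping test
                break
            d = -g                                    # Step 2: steepest descent direction
            # Step 3: backtrack from s_k = -g^T d / (L ||d||^2) until (14) holds.
            gTd, dd = float(g @ d), float(d @ d)
            alpha, fx = -gTd / (L * dd), f(x)
            while (f(x + alpha * d) - fx >
                   sigma * alpha * (gTd + 0.5 * alpha * mu * L * dd)) and alpha > 1e-16:
                alpha *= beta
            x_new = x + alpha * d
            g_new = grad(x_new)
            delta, y = x_new - x, g_new - g           # Step 4: update L via (43)
            dty = float(delta @ y)
            if dty > 0:                               # guard: keep previous L otherwise
                L = dty / float(delta @ delta)
            x, g = x_new, g_new                       # Step 5: next iteration
        return x

Replacing the two lines that compute dty and L with the ratio ||y||/||delta|| or ||y||²/(delta^T y) gives the variants New (41) and New (45) used in the tables below.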

Table 1. Iterations and function evaluations (In, N_f), µ = 1 (– = illegible entry).

    P     n    Armijo     New (41)    New (43)    New (45)
    P5    2    8, 12      6, 9        7, 11       7, 8
    –     –    –, 38      18, 22      16, 25      17, 26
    –     –    –, 50      28, 34      26, 33      30, 42
    –     –    –, 72      16, 63      12, 58      11, 56
    –     –    –, 17      12, 13      12, 13      11, 11
    –     –    –, 21      16, 23      12, 14      11, 15
    –     –    –, 30      17, 22      16, 25      15, 20
    –     –    –, 42      28, 34      26, 36      26, 38
    –     –    –, 58      30, 34      28, 32      30, 52
    –     –    –, 87      43, 67      48, 72      42, 61
    –     –    –, 67      45, 59      38, 38      47, 52
    –     –    –, –       –, 32       16, 78      9, 83
    –     –    –, 30      14, 19      12, 16      15, 18
    –     –    –, 22      11, 18      10, 19      12, 19

In the above algorithm, we set σ = 0.38, β = 0.87, µ = 1, and we set the same parameter values in the original Armijo line-search method, with L = 1. We will find that the stepsize in the new line-search method is easier to find than in the original one; in other words, the new line-search method needs fewer evaluations of gradients and objective functions at each iteration. We tested the new line-search methods and the original Armijo line-search method in double precision on a portable computer; the codes were written in Visual C++. The test problems and the initial points used are drawn from Ref. 20. For each problem, a limit is placed on the number of function evaluations, and the stopping condition is

    ‖g_k‖ ≤ .                                                   (47)

Our numerical results are shown in Tables 1-4, where Armijo, New (41), New (43), and New (45) stand for the Armijo line-search method and the new line-search methods with L_k given by (41), (43), and (45), respectively. The symbols n, In, and N_f denote the dimension of the problem, the number of iterations, and the number of function evaluations, respectively. The unconstrained optimization problems are numbered in the same way as in Ref. 20; for example, P5 means Problem 5 in Ref. 20.
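As an illustration of this experimental setup (reusing the hypothetical algorithm_a sketch above, whose tolerance and evaluation limit are assumed rather than taken from the paper), problem P5 can be coded and minimized as follows:

    import numpy as np

    def beale(x):
        """P5: f(x) = sum_i (y_i - x_1(1 - x_2^i))^2 with y = (1.5, 2.25, 2.625)."""
        y = np.array([1.5, 2.25, 2.625])
        i = np.array([1.0, 2.0, 3.0])
        return float(np.sum((y - x[0] * (1.0 - x[1] ** i)) ** 2))

    def beale_grad(x):
        y = np.array([1.5, 2.25, 2.625])
        i = np.array([1.0, 2.0, 3.0])
        r = y - x[0] * (1.0 - x[1] ** i)               # residuals f_i(x)
        return np.array([-2.0 * np.sum(r * (1.0 - x[1] ** i)),
                         2.0 * np.sum(r * x[0] * i * x[1] ** (i - 1.0))])

    # Standard starting point x_1 = (1, 1)^T; the minimizer is x* = (3, 0.5)^T.
    x_star = algorithm_a(beale, beale_grad, np.array([1.0, 1.0]),
                         sigma=0.38, beta=0.87, mu=1.0)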

Table 2. Iterations and function evaluations (In, N_f), µ = 1.5 (– = illegible entry).

    P     n    Armijo     New (41)    New (43)    New (45)
    P5    2    8, 12      6, 7        7, 10       6, 8
    –     –    –, 38      17, 20      16, 21      15, 23
    –     –    –, 50      23, 28      24, 31      25, 36
    –     –    –, 72      16, 43      12, 49      11, 38
    –     –    –, 17      12, 13      12, 13      11, 11
    –     –    –, 21      16, 21      11, 13      11, 14
    –     –    –, 30      16, 18      16, 22      15, 18
    –     –    –, 42      25, 33      26, 32      23, 32
    –     –    –, 58      28, 33      26, 28      30, 47
    –     –    –, 87      40, 58      42, 61      38, 56
    –     –    –, 67      43, 48      36, 38      42, 48
    –     –    –, –       –, 28       16, 43      9, 67
    –     –    –, 30      14, 19      12, 16      15, 18
    –     –    –, 22      11, 16      10, 16      12, 17

Table 3. Iterations and function evaluations (In, N_f), µ = 1 (– = illegible entry).

    P     n    Armijo     New (41)    New (43)    New (45)
    –     –    –, –       –, –        –, –        –, 213
    –     –    –, –       –, –        –, –        –, 288
    –     –    –, –       –, –        –, –        –, 512
    –     –    –, –       –, –        –, –        –, 847
    –     –    –, –       –, –        –, –        –, 1628
    –     –    –, –       –, –        –, –        –, 2694
    –     –    –, –       –, –        –, –        –, 3269
    –     –    –, –       –, –        –, –        –, 1163
    –     –    –, –       –, –        –, –        –, 581

Table 4. Iterations and function evaluations (In, N_f), µ = 1.5 (– = illegible entry).

    P     n    Armijo     New (41)    New (43)    New (45)
    –     –    –, –       –, –        –, –        –, 162
    –     –    –, –       –, –        –, –        –, 242
    –     –    –, –       –, –        –, –        –, 468
    –     –    –, –       –, –        –, –        –, 687
    –     –    –, –       –, –        –, –        –, 1263
    –     –    –, –       –, –        –, –        –, 2476
    –     –    –, –       –, –        –, –        –, 2129
    –     –    –, –       –, –        –, –        –, 982
    –     –    –, –       –, –        –, –        –, 283

(P5) Beale Function. Here,

    f(x) = Σ_{i=1}^{3} f_i(x)²,
    f_i(x) = y_i − x_1(1 − x_2^i),   i = 1, 2, 3,
    y_1 = 1.5,   y_2 = 2.25,   y_3 = 2.625,
    x_1 = (1, 1)^T,   x* = (3, 0.5)^T,   f* = 0.

(P13) Powell Singular Function. Here,

    f(x) = Σ_{i=1}^{4} f_i(x)²,
    f_1(x) = x_1 + 10x_2,   f_2(x) = 5^{1/2}(x_3 − x_4),
    f_3(x) = (x_2 − 2x_3)²,   f_4(x) = 10^{1/2}(x_1 − x_4)²,
    x_1 = (3, −1, 0, 1)^T,   x* = (0, 0, 0, 0)^T,   f* = 0.

(P14) Wood Function. Here,

    f(x) = Σ_{i=1}^{6} f_i(x)²,
    f_1(x) = 10(x_2 − x_1²),   f_2(x) = 1 − x_1,
    f_3(x) = 90^{1/2}(x_4 − x_3²),   f_4(x) = 1 − x_3,
    f_5(x) = 10^{1/2}(x_2 + x_4 − 2),   f_6(x) = 10^{−1/2}(x_2 − x_4),
    x_1 = (−3, −1, −3, −1)^T,   x* = (1, 1, 1, 1)^T,   f* = 0.

(P16) Brown and Dennis Function. Here,

    f(x) = Σ_{i=1}^{20} f_i(x)²,
    f_i(x) = [x_1 + t_i x_2 − exp(t_i)]² + [x_3 + x_4 sin(t_i) − cos(t_i)]²,
    t_i = i/5,   i = 1, 2, ..., 20,
    x_1 = (25, 5, −5, −1)^T.

(P20) Watson Function. Here,

    f(x) = Σ_{i=1}^{31} f_i(x)²,
    f_i(x) = Σ_{j=2}^{n} (j − 1) x_j t_i^{j−2} − (Σ_{j=1}^{n} x_j t_i^{j−1})² − 1,   t_i = i/29,   1 ≤ i ≤ 29,
    f_30(x) = x_1,   f_31(x) = x_2 − x_1² − 1,
    x_1 = (0, ..., 0)^T.

(P21) Extended Rosenbrock Function. Here,

    f(x) = Σ_{i=1}^{n} f_i(x)²,
    f_{2i−1}(x) = 10(x_{2i} − x_{2i−1}²),   f_{2i}(x) = 1 − x_{2i−1},
    x_1 = (ξ_j),   ξ_{2j−1} = −1.2,   ξ_{2j} = 1,
    x* = (1, ..., 1)^T,   f* = 0.

(P23) Penalty Function I. Here,

    f(x) = Σ_{i=1}^{n+1} f_i(x)²,
    f_i(x) = a^{1/2}(x_i − 1),   1 ≤ i ≤ n,
    f_{n+1}(x) = (Σ_{j=1}^{n} x_j²) − 1/4,   a = 10^{−5},
    x_1 = (ξ_j),   ξ_j = j.
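For illustration, the problem just defined can be coded as a sum-of-squares objective with an analytic gradient and passed to the algorithm_a sketch given earlier (the helper names and the dimension n = 10 are ours):

    import numpy as np

    A = 1e-5    # a = 10^{-5}

    def penalty_1(x):
        """P23: f(x) = sum_i a*(x_i - 1)^2 + (sum_j x_j^2 - 1/4)^2."""
        return float(A * np.sum((x - 1.0) ** 2) + (np.sum(x ** 2) - 0.25) ** 2)

    def penalty_1_grad(x):
        # d/dx_k [a*(x_k - 1)^2] = 2a(x_k - 1); the coupling term gives 4 x_k (S - 1/4).
        return 2.0 * A * (x - 1.0) + 4.0 * x * (np.sum(x ** 2) - 0.25)

    # Standard starting point xi_j = j, here with n = 10.
    x0 = np.arange(1.0, 11.0)
    x_star = algorithm_a(penalty_1, penalty_1_grad, x0)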

(P24) Penalty Function II. Here,

    f(x) = Σ_{i=1}^{2n} f_i(x)²,
    f_1(x) = x_1 − 0.2,
    f_i(x) = a^{1/2}[exp(x_i/10) + exp(x_{i−1}/10) − y_i],   2 ≤ i ≤ n,
    f_i(x) = a^{1/2}[exp(x_{i−n+1}/10) − exp(−1/10)],   n < i < 2n,
    f_{2n}(x) = (Σ_{j=1}^{n} (n − j + 1) x_j²) − 1,
    a = 10^{−5},   y_i = exp(i/10) + exp[(i − 1)/10],
    x_1 = (1/2, ..., 1/2)^T.

(P25) Variably-Dimensioned Function. Here,

    f(x) = Σ_{i=1}^{n+2} f_i(x)²,
    f_i(x) = x_i − 1,   i = 1, ..., n,
    f_{n+1}(x) = Σ_{j=1}^{n} j(x_j − 1),
    f_{n+2}(x) = (Σ_{j=1}^{n} j(x_j − 1))²,
    x_1 = (ξ_j),   ξ_j = 1 − (j/n),
    x* = (1, ..., 1)^T,   f* = 0.

(P26) Trigonometric Function. Here,

    f(x) = Σ_{i=1}^{n} f_i(x)²,
    f_i(x) = n − Σ_{j=1}^{n} cos x_j + i(1 − cos x_i) − sin x_i,   i = 1, 2, ..., n,
    x_1 = (1/n, ..., 1/n)^T,   f* = 0.

(P30) Broyden Tridiagonal Function. Here,

    f(x) = Σ_{i=1}^{n} f_i(x)²,
    f_i(x) = (3 − 2x_i)x_i − x_{i−1} − 2x_{i+1} + 1,   i = 1, ..., n,   x_0 = x_{n+1} = 0,
    x_1 = (−1, ..., −1)^T,   f* = 0.

We set d_k = −g_k at each iteration. In this case, if α_k = s_k at each iteration, then New (43) and New (45) reduce to the BB methods (Refs. 16-19), because

    α_k = s_k = −g_k^T d_k/(L_k ‖d_k‖²) = 1/L_k = ‖δ_{k−1}‖²/(δ_{k−1}^T y_{k−1}),

corresponding to (43), or

    α_k = δ_{k−1}^T y_{k−1}/‖y_{k−1}‖²,

corresponding to (45). This shows that the new line-search method contains the BB methods and has global convergence, while the BB method is not globally convergent in some cases. Thus, the new method is promising and will challenge the BB method in some sense.

In Table 1, each entry is a pair (In, N_f): the first number denotes the number of iterations and the second the number of function evaluations needed to reach the same precision (47). It can be seen that, for some problems, the three new algorithms need a smaller number N_f of function evaluations than the original Armijo line-search method; on the other hand, for some problems, the Armijo line-search method performs as well as the new line-search methods. Overall, our numerical results indicate that the new line-search methods are superior to the original Armijo line-search method in many situations. In particular, the new method needs fewer function evaluations than the original Armijo line-search method when reaching the same precision; i.e., in many cases, α_k takes the value s_k in the new line-search rule. Moreover, the estimation of L_k, and thus of s_k, is very important in the new algorithm.

In the numerical experiments, we take d_k = −g_k, which is a very special case; we could take other descent directions as d_k at each step. In the new line-search method, there are parameters σ ∈ (0, 1/2), β ∈ (0, 1), and µ ∈ [0, 2) which need to be set in concrete algorithms.

In practical computation, µ is a key parameter: the new method performs better as µ increases in [0, 2). The numerical results in Table 2 illustrate this fact; they are clearly better than those in Table 1. The reason is that, when µ increases, s_k is more easily accepted as α_k, so the number of function evaluations per iteration decreases. The results of Tables 1-2 also show that the new algorithm works better than the original Armijo algorithm when the dimension n of the problem increases.

We can increase the dimension n of some problems and conduct many more numerical experiments. It can be seen from Tables 3 and 4 that the three new algorithms have better numerical performance than the original Armijo algorithm. When the dimension n of the problems increases, the new method works better than the original Armijo line-search method. We can also see that the greater µ ∈ [0, 2) is, the better the new method performs.

6. Conclusions

A new line-search rule is proposed and the related descent methods are investigated. We can choose a larger stepsize in each line-search procedure and maintain the global convergence of the related line-search methods. This idea enables us to design new line-search methods in a wider sense. In particular, the new method reduces to the BB method (Refs. 16-19) in some special cases.

As we have seen, the Lipschitz constant M of the gradient g(x) of the objective function f(x) needs to be estimated at each step, and we have discussed some techniques for choosing L_k. In the numerical experiments, we take d_k = −g_k at each step; indeed, other descent directions could be taken as d_k. For further research, similar line-search rules could be established for the Goldstein rule and the Wolfe rule, with the aim of reducing the number of gradient and objective function evaluations at each iteration. Since the new line-search rule needs an estimate of L_k, other ways of estimating L_k and of choosing the parameters σ, µ, β could be explored, so as to find suitable parameters for solving special unconstrained optimization problems.

References

1. Dennis, J. E., and Schnabel, R. B., Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, New Jersey, 1983.

2. Nocedal, J., Theory of Algorithms for Unconstrained Optimization, Acta Numerica, Vol. 1.
3. Powell, M. J. D., Direct Search Algorithms for Optimization Calculations, Acta Numerica, Vol. 7.
4. Wolfe, P., Convergence Conditions for Ascent Methods, SIAM Review, Vol. 11.
5. Wolfe, P., Convergence Conditions for Ascent Methods, II: Some Corrections, SIAM Review, Vol. 13.
6. Shi, Z. J., Convergence of Line-Search Methods for Unconstrained Optimization, Applied Mathematics and Computation, Vol. 157.
7. Cohen, A. I., Stepsize Analysis for Descent Methods, Journal of Optimization Theory and Applications, Vol. 33.
8. Yuan, Y., Numerical Methods for Nonlinear Programming, Shanghai Scientific and Technical Publishers.
9. Todd, M. J., On Convergence Properties of Algorithms for Unconstrained Minimization, IMA Journal of Numerical Analysis, Vol. 9.
10. Shi, Z. J., A Supermemory Gradient Method for Unconstrained Optimization Problems, Chinese Journal of Engineering Mathematics, Vol. 17, 2000 (in Chinese).
11. Wei, Z., Qi, L., and Jiang, H., Some Convergence Properties of Descent Methods, Journal of Optimization Theory and Applications, Vol. 95.
12. Vrahatis, M. N., Androulakis, G. S., and Manoussakis, G. E., A New Unconstrained Optimization Method for Imprecise Function and Gradient Values, Journal of Mathematical Analysis and Applications, Vol. 197.
13. Grippo, L., Lampariello, F., and Lucidi, S., A Class of Nonmonotone Stability Methods in Unconstrained Optimization, Numerische Mathematik, Vol. 62.
14. Dai, Y. H., On the Nonmonotone Line Search, Journal of Optimization Theory and Applications, Vol. 112.
15. Sun, W. Y., Han, J. Y., and Sun, J., Global Convergence of Nonmonotone Descent Methods for Unconstrained Optimization Problems, Journal of Computational and Applied Mathematics, Vol. 146.
16. Barzilai, J., and Borwein, J. M., Two-Point Stepsize Gradient Methods, IMA Journal of Numerical Analysis, Vol. 8.
17. Dai, Y. H., and Liao, L. Z., R-Linear Convergence of the Barzilai and Borwein Gradient Method, IMA Journal of Numerical Analysis, Vol. 22, pp. 1-10.
18. Raydan, M., The Barzilai and Borwein Method for the Large-Scale Unconstrained Minimization Problem, SIAM Journal on Optimization, Vol. 7.
19. Raydan, M., On the Barzilai-Borwein Gradient Choice of Steplength for the Gradient Method, IMA Journal of Numerical Analysis, Vol. 13.

20. Moré, J. J., Garbow, B. S., and Hillstrom, K. E., Testing Unconstrained Optimization Software, ACM Transactions on Mathematical Software, Vol. 7.
21. Nocedal, J., and Wright, S. J., Numerical Optimization, Springer Verlag, New York, NY.
22. Polak, E., Optimization: Algorithms and Consistent Approximations, Springer Verlag, New York, NY, 1997.


More information

NONSMOOTH VARIANTS OF POWELL S BFGS CONVERGENCE THEOREM

NONSMOOTH VARIANTS OF POWELL S BFGS CONVERGENCE THEOREM NONSMOOTH VARIANTS OF POWELL S BFGS CONVERGENCE THEOREM JIAYI GUO AND A.S. LEWIS Abstract. The popular BFGS quasi-newton minimization algorithm under reasonable conditions converges globally on smooth

More information

Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming

Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming Zhaosong Lu Lin Xiao June 25, 2013 Abstract In this paper we propose a randomized block coordinate non-monotone

More information

Global convergence of trust-region algorithms for constrained minimization without derivatives

Global convergence of trust-region algorithms for constrained minimization without derivatives Global convergence of trust-region algorithms for constrained minimization without derivatives P.D. Conejo E.W. Karas A.A. Ribeiro L.G. Pedroso M. Sachine September 27, 2012 Abstract In this work we propose

More information

On the acceleration of augmented Lagrangian method for linearly constrained optimization

On the acceleration of augmented Lagrangian method for linearly constrained optimization On the acceleration of augmented Lagrangian method for linearly constrained optimization Bingsheng He and Xiaoming Yuan October, 2 Abstract. The classical augmented Lagrangian method (ALM plays a fundamental

More information

A Trust Region Algorithm Model With Radius Bounded Below for Minimization of Locally Lipschitzian Functions

A Trust Region Algorithm Model With Radius Bounded Below for Minimization of Locally Lipschitzian Functions The First International Symposium on Optimization and Systems Biology (OSB 07) Beijing, China, August 8 10, 2007 Copyright 2007 ORSC & APORC pp. 405 411 A Trust Region Algorithm Model With Radius Bounded

More information

A decomposition algorithm for unconstrained optimization problems with partial derivative information

A decomposition algorithm for unconstrained optimization problems with partial derivative information Optim Lett (2012) 6:437 450 DOI 10.1007/s11590-010-0270-2 ORIGINAL PAPER A decomposition algorithm for unconstrained optimization problems with partial derivative information G. Liuzzi A. Risi Received:

More information

Higher-Order Methods

Higher-Order Methods Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth

More information

An Inexact Newton Method for Optimization

An Inexact Newton Method for Optimization New York University Brown Applied Mathematics Seminar, February 10, 2009 Brief biography New York State College of William and Mary (B.S.) Northwestern University (M.S. & Ph.D.) Courant Institute (Postdoc)

More information

A new class of Conjugate Gradient Methods with extended Nonmonotone Line Search

A new class of Conjugate Gradient Methods with extended Nonmonotone Line Search Appl. Math. Inf. Sci. 6, No. 1S, 147S-154S (2012) 147 Applied Mathematics & Information Sciences An International Journal A new class of Conjugate Gradient Methods with extended Nonmonotone Line Search

More information

Inexact Spectral Projected Gradient Methods on Convex Sets

Inexact Spectral Projected Gradient Methods on Convex Sets Inexact Spectral Projected Gradient Methods on Convex Sets Ernesto G. Birgin José Mario Martínez Marcos Raydan March 26, 2003 Abstract A new method is introduced for large scale convex constrained optimization.

More information

Unconstrained optimization

Unconstrained optimization Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout

More information

New hybrid conjugate gradient algorithms for unconstrained optimization

New hybrid conjugate gradient algorithms for unconstrained optimization ew hybrid conjugate gradient algorithms for unconstrained optimization eculai Andrei Research Institute for Informatics, Center for Advanced Modeling and Optimization, 8-0, Averescu Avenue, Bucharest,

More information

1 Numerical optimization

1 Numerical optimization Contents 1 Numerical optimization 5 1.1 Optimization of single-variable functions............ 5 1.1.1 Golden Section Search................... 6 1.1. Fibonacci Search...................... 8 1. Algorithms

More information

Lecture 3: Linesearch methods (continued). Steepest descent methods

Lecture 3: Linesearch methods (continued). Steepest descent methods Lecture 3: Linesearch methods (continued). Steepest descent methods Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lecture 3: Linesearch methods (continued).

More information

Gradient methods exploiting spectral properties

Gradient methods exploiting spectral properties Noname manuscript No. (will be inserted by the editor) Gradient methods exploiting spectral properties Yaui Huang Yu-Hong Dai Xin-Wei Liu Received: date / Accepted: date Abstract A new stepsize is derived

More information

Barzilai-Borwein Step Size for Stochastic Gradient Descent

Barzilai-Borwein Step Size for Stochastic Gradient Descent Barzilai-Borwein Step Size for Stochastic Gradient Descent Conghui Tan Shiqian Ma Yu-Hong Dai Yuqiu Qian May 16, 2016 Abstract One of the major issues in stochastic gradient descent (SGD) methods is how

More information

First Published on: 11 October 2006 To link to this article: DOI: / URL:

First Published on: 11 October 2006 To link to this article: DOI: / URL: his article was downloaded by:[universitetsbiblioteet i Bergen] [Universitetsbiblioteet i Bergen] On: 12 March 2007 Access Details: [subscription number 768372013] Publisher: aylor & Francis Informa Ltd

More information

Chapter 1. Optimality Conditions: Unconstrained Optimization. 1.1 Differentiable Problems

Chapter 1. Optimality Conditions: Unconstrained Optimization. 1.1 Differentiable Problems Chapter 1 Optimality Conditions: Unconstrained Optimization 1.1 Differentiable Problems Consider the problem of minimizing the function f : R n R where f is twice continuously differentiable on R n : P

More information

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming Zhaosong Lu Lin Xiao March 9, 2015 (Revised: May 13, 2016; December 30, 2016) Abstract We propose

More information

A modified quadratic hybridization of Polak-Ribière-Polyak and Fletcher-Reeves conjugate gradient method for unconstrained optimization problems

A modified quadratic hybridization of Polak-Ribière-Polyak and Fletcher-Reeves conjugate gradient method for unconstrained optimization problems An International Journal of Optimization Control: Theories & Applications ISSN:246-0957 eissn:246-5703 Vol.7, No.2, pp.77-85 (207) http://doi.org/0.2/ijocta.0.207.00339 RESEARCH ARTICLE A modified quadratic

More information