
Bull. Iranian Math. Soc. Vol. 40 (2014), No. 1, pp. 235-242
Online ISSN: 1735-8515

AN EIGENVALUE STUDY ON THE SUFFICIENT DESCENT PROPERTY OF A MODIFIED POLAK-RIBIÈRE-POLYAK CONJUGATE GRADIENT METHOD

S. BABAIE-KAFAKI

(Communicated by Abbas Salemi)

Abstract. Based on an eigenvalue analysis, a new proof of the sufficient descent property of the modified Polak-Ribière-Polyak conjugate gradient method proposed by Yu et al. is presented.

Keywords: Unconstrained optimization, conjugate gradient algorithm, sufficient descent condition, eigenvalue.
MSC(2010): Primary: 65K05; Secondary: 90C53, 49M37, 15A18.

Article electronically published on February 25, 2014.
Received: 24 August 2012, Accepted: 12 January 2013.
© 2014 Iranian Mathematical Society

1. Introduction

Conjugate gradient (CG) methods comprise a class of unconstrained optimization algorithms characterized by low memory requirements and strong global convergence properties [7]. Although CG methods are not the fastest or most robust optimization algorithms available today, they remain very popular with engineers and mathematicians engaged in solving large-scale problems of the form
$$\min_{x \in \mathbb{R}^n} f(x),$$
where $f: \mathbb{R}^n \to \mathbb{R}$ is a smooth nonlinear function whose gradient is available. The iterative formula of a CG method is given by
$$(1.1)\qquad x_0 \in \mathbb{R}^n, \quad x_{k+1} = x_k + s_k, \quad s_k = \alpha_k d_k, \quad k = 0, 1, \ldots,$$

in which $\alpha_k$ is a steplength to be computed by a line search procedure, and $d_k$ is the search direction defined by
$$(1.2)\qquad d_0 = -g_0, \quad d_{k+1} = -g_{k+1} + \beta_k d_k, \quad k = 0, 1, \ldots,$$
where $g_k = \nabla f(x_k)$, and $\beta_k$ is a scalar called the CG (update) parameter. Different choices of the CG parameter lead to different CG methods (see [11] and the references therein). One of the efficient CG methods has been proposed by Polak and Ribière [12] and Polyak [13] (PRP), with the following CG parameter:
$$\beta_k^{PRP} = \frac{g_{k+1}^T y_k}{\|g_k\|^2},$$
where $y_k = g_{k+1} - g_k$, and $\|\cdot\|$ stands for the Euclidean norm. The numerical efficiency of the PRP method is related to an automatic restart feature which avoids jamming [14], i.e., the generation of many short steps without making significant progress toward the minimum. More precisely, when the step $s_k$ is small, the factor $y_k$ in the numerator of $\beta_k^{PRP}$ tends to zero. Therefore, $\beta_k^{PRP}$ becomes small and the new search direction is approximately the steepest descent direction.

Next, a brief discussion of the development of convergence results for the PRP method is provided. In this context, the following definition is needed.

Definition 1.1. We say that the search direction $d_k$ is a descent direction (or equivalently, satisfies the descent condition) iff $g_k^T d_k < 0$. Also, we say that the search directions $\{d_k\}_{k \geq 0}$ satisfy the sufficient descent condition iff
$$(1.3)\qquad g_k^T d_k \leq -\tau \|g_k\|^2, \quad \forall k \geq 0,$$
where $\tau$ is a positive constant.

Over the past decades, efforts have been made to study the global convergence of the PRP method. For example, Polak and Ribière [12] showed that the PRP method is globally convergent when the objective function is uniformly convex and the line search is exact. Powell [15] established that for a general nonlinear function, if $s_k$ tends to zero, the line search is exact, and $\nabla f$ is Lipschitz continuous, then the PRP method is globally convergent. On the other hand, he constructed a three-dimensional counterexample showing that the PRP method with an exact line search may cycle infinitely without converging to a solution [15]. Hence, the assumption $s_k \to 0$ is necessary in Powell's convergence result.
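
To make the recursion (1.1)-(1.2) concrete, here is a minimal Python sketch (not part of the original paper) of the PRP method applied to a smooth test function; the Rosenbrock objective, the backtracking Armijo parameters, and the restart safeguard are illustrative assumptions rather than the line search strategies analyzed in the convergence results cited above.

```python
import numpy as np

def rosenbrock(x):
    """Classical smooth test function (illustrative choice) and its gradient."""
    f = 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2
    g = np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                  200.0 * (x[1] - x[0]**2)])
    return f, g

def prp_cg(fun, x0, max_iter=5000, tol=1e-8):
    """CG iteration (1.1)-(1.2) with the PRP parameter and a simple
    backtracking Armijo line search (an assumed strategy, not the exact or
    Wolfe-type line searches discussed in the paper)."""
    x = np.asarray(x0, dtype=float)
    f, g = fun(x)
    d = -g                                    # d_0 = -g_0
    k = 0
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g.dot(d) >= 0.0:                   # safeguard restart if d_k is not a descent direction
            d = -g
        alpha, c1 = 1.0, 1e-4                 # backtracking Armijo search for the steplength alpha_k
        while fun(x + alpha * d)[0] > f + c1 * alpha * g.dot(d):
            alpha *= 0.5
        x_new = x + alpha * d                 # x_{k+1} = x_k + s_k, s_k = alpha_k d_k
        f_new, g_new = fun(x_new)
        y = g_new - g                         # y_k = g_{k+1} - g_k
        beta = g_new.dot(y) / g.dot(g)        # beta_k^{PRP} = g_{k+1}^T y_k / ||g_k||^2
        d = -g_new + beta * d                 # d_{k+1} = -g_{k+1} + beta_k d_k
        x, f, g = x_new, f_new, g_new
    return x, f, k

if __name__ == "__main__":
    x_star, f_star, iters = prp_cg(rosenbrock, np.array([-1.2, 1.0]))
    print(iters, f_star, x_star)
```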

Yuan [19] proved that if the search directions of the PRP method are descent directions, then, under the Wolfe line search conditions [17], the method is globally convergent for uniformly convex functions. However, Dai [5] showed that even for uniformly convex functions, the PRP method may fail to generate a descent direction. So, the PRP method lacks global convergence in certain circumstances.

In a recent effort to modify the PRP method so as to achieve the sufficient descent condition, Yu et al. [18] proposed the following modified form of $\beta_k^{PRP}$:
$$(1.4)\qquad \beta_k^{DPRP} = \beta_k^{PRP} - C \frac{\|y_k\|^2}{\|g_k\|^4} g_{k+1}^T d_k,$$
where $C$ is a constant satisfying
$$(1.5)\qquad C > \frac{1}{4}.$$
Note that if the line search is exact, then $g_{k+1}^T d_k = 0$, and consequently, the CG parameter $\beta_k^{DPRP}$ reduces to $\beta_k^{PRP}$. An interesting feature of the DPRP method is that its search directions satisfy the sufficient descent condition (1.3) with $\tau = 1 - \frac{1}{4C}$, independent of the line search and of the convexity of the objective function. Also, the DPRP method is globally convergent under certain line search strategies [18]. Furthermore, the numerical results of [18] showed that the DPRP method is numerically efficient.
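
As an informal numerical illustration (not part of the original paper), the following Python snippet builds DPRP directions via (1.2) and (1.4) from randomly generated gradient vectors and checks the sufficient descent condition (1.3) with $\tau = 1 - \frac{1}{4C}$; the random data and the choice $C = 0.5$ are assumptions made only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n, C = 10, 0.5                      # C > 1/4, as required by (1.5)
tau = 1.0 - 1.0 / (4.0 * C)

g = rng.standard_normal(n)
d = -g                              # d_0 = -g_0
for k in range(1000):
    g_new = rng.standard_normal(n)  # stand-in for the gradient g_{k+1}
    y = g_new - g                   # y_k = g_{k+1} - g_k
    beta_prp = g_new.dot(y) / g.dot(g)
    # Modified parameter (1.4): beta^DPRP = beta^PRP - C ||y_k||^2 / ||g_k||^4 * g_{k+1}^T d_k
    beta_dprp = beta_prp - C * y.dot(y) / g.dot(g)**2 * g_new.dot(d)
    d = -g_new + beta_dprp * d      # search direction update (1.2)
    # Sufficient descent condition (1.3) with tau = 1 - 1/(4C):
    assert g_new.dot(d) <= -tau * g_new.dot(g_new) + 1e-10
    g = g_new
print("sufficient descent condition (1.3) held on all sampled iterations")
```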

2. On the sufficient descent property of the DPRP method

Although the descent condition is often adequate [7], the sufficient descent condition may be crucial in the convergence analysis of CG methods [1-4, 8, 9]. Also, satisfying the sufficient descent condition is considered a strength of a CG method [6, 10]. Here, based on an eigenvalue study, a new proof of the sufficient descent property of the DPRP method is presented. The proof nicely demonstrates the importance of the condition (1.5).

It is remarkable that, from (1.2) and (1.4), the search directions of the DPRP method can be written as
$$(2.1)\qquad d_{k+1} = -Q_{k+1} g_{k+1}, \quad k = 0, 1, \ldots,$$
where $Q_{k+1} \in \mathbb{R}^{n \times n}$ is defined by
$$(2.2)\qquad Q_{k+1} = I - \frac{d_k y_k^T}{\|g_k\|^2} + C \frac{\|y_k\|^2}{\|g_k\|^4} d_k d_k^T.$$
Since $Q_{k+1}$ is determined by a rank-two update, its determinant can be computed as
$$(2.3)\qquad \det(Q_{k+1}) = C \frac{\|y_k\|^2 \|d_k\|^2}{\|g_k\|^4} - \frac{d_k^T y_k}{\|g_k\|^2} + 1 \overset{\mathrm{def}}{=} \xi_k.$$
(See equality (1.2.70) of [16].) The following proposition is now immediate.

Proposition 2.1. If the inequality (1.5) holds, then
$$(2.4)\qquad 4\xi_k > \frac{\|y_k\|^2 \|d_k\|^2 - (d_k^T y_k)^2}{\|g_k\|^4} \overset{\mathrm{def}}{=} \gamma_k.$$

Proof. Since $C > \frac{1}{4}$, we can write
$$\xi_k - \frac{1}{4}\gamma_k = \left(C - \frac{1}{4}\right) \frac{\|y_k\|^2 \|d_k\|^2}{\|g_k\|^4} + \frac{1}{4}\frac{(d_k^T y_k)^2}{\|g_k\|^4} - \frac{d_k^T y_k}{\|g_k\|^2} + 1 = \left(C - \frac{1}{4}\right) \frac{\|y_k\|^2 \|d_k\|^2}{\|g_k\|^4} + \left(\frac{1}{2}\frac{d_k^T y_k}{\|g_k\|^2} - 1\right)^2 > 0,$$
which completes the proof.

By the Cauchy-Schwarz inequality, $\gamma_k$ defined in (2.4) is nonnegative, and so the inequality (2.4) ensures that $\xi_k > 0$. Thus, from (2.3), the matrix $Q_{k+1}$ defined by (2.2) is nonsingular.
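
The determinant formula (2.3) and Proposition 2.1 are easy to probe numerically. The short Python snippet below (an illustrative check, not part of the original text) forms $Q_{k+1}$ from random vectors and compares numpy.linalg.det with $\xi_k$; the random data and the value $C = 0.3$ are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, C = 8, 0.3                                   # any C > 1/4 satisfies (1.5)
g, d, y = rng.standard_normal((3, n))           # stand-ins for g_k, d_k, y_k
gg = g.dot(g)                                   # ||g_k||^2

# Matrix (2.2): Q_{k+1} = I - d_k y_k^T / ||g_k||^2 + C ||y_k||^2 / ||g_k||^4 d_k d_k^T
Q = np.eye(n) - np.outer(d, y) / gg + C * y.dot(y) / gg**2 * np.outer(d, d)

# Scalar (2.3): xi_k = C ||y_k||^2 ||d_k||^2 / ||g_k||^4 - d_k^T y_k / ||g_k||^2 + 1
xi = C * y.dot(y) * d.dot(d) / gg**2 - d.dot(y) / gg + 1.0
# Scalar (2.4): gamma_k = (||y_k||^2 ||d_k||^2 - (d_k^T y_k)^2) / ||g_k||^4
gamma = (y.dot(y) * d.dot(d) - d.dot(y)**2) / gg**2

print(np.isclose(np.linalg.det(Q), xi))         # det(Q_{k+1}) == xi_k, as in (2.3)
print(4.0 * xi > gamma)                         # Proposition 2.1: 4 xi_k > gamma_k
```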

The following theorem establishes the sufficient descent property of the DPRP method.

Theorem 2.2. For a CG method in the form of (1.1)-(1.2) with the CG parameter $\beta_k^{DPRP}$ defined by (1.4), the following sufficient descent condition holds:
$$(2.5)\qquad g_k^T d_k \leq -\left(1 - \frac{1}{4C}\right) \|g_k\|^2, \quad \forall k \geq 0.$$

Proof. Since $d_0 = -g_0$, the sufficient descent condition (2.5) holds for $k = 0$. For the subsequent indices, from (2.1) we can write
$$(2.6)\qquad d_{k+1}^T g_{k+1} = -g_{k+1}^T Q_{k+1}^T g_{k+1} = -g_{k+1}^T \frac{Q_{k+1}^T + Q_{k+1}}{2} g_{k+1}.$$
So, in our proof we need to find the eigenvalues of the following symmetric matrix:
$$A_{k+1} = \frac{Q_{k+1}^T + Q_{k+1}}{2} = I + C \frac{\|y_k\|^2}{\|g_k\|^4} d_k d_k^T - \frac{d_k y_k^T + y_k d_k^T}{2\|g_k\|^2}.$$
Note that if $y_k = 0$, then $A_{k+1} = I$, and consequently, all the eigenvalues of $A_{k+1}$ are equal to 1. Otherwise, since $d_k \neq 0$, there exists a set of mutually orthogonal vectors $\{u_k^i\}_{i=1}^{n-2}$ such that
$$d_k^T u_k^i = y_k^T u_k^i = 0, \quad \|u_k^i\| = 1, \quad i = 1, \ldots, n-2,$$
which leads to
$$A_{k+1} u_k^i = u_k^i, \quad i = 1, \ldots, n-2.$$
That is, the vectors $u_k^i$, $i = 1, \ldots, n-2$, are eigenvectors of $A_{k+1}$ corresponding to the eigenvalue 1. Now, we find the two remaining eigenvalues of $A_{k+1}$, namely $\lambda_k^-$ and $\lambda_k^+$. Since the trace of a square matrix is equal to the sum of its eigenvalues, we have
$$\mathrm{tr}(A_{k+1}) = n + \xi_k - 1 = \underbrace{1 + \cdots + 1}_{(n-2)\ \mathrm{times}} + \lambda_k^- + \lambda_k^+,$$
which yields
$$(2.7)\qquad \lambda_k^- + \lambda_k^+ = \xi_k + 1,$$
with $\xi_k$ defined in (2.3). On the other hand, from the properties of the Frobenius norm, we have
$$\|A_{k+1}\|_F^2 = \mathrm{tr}(A_{k+1}^T A_{k+1}) = \mathrm{tr}(A_{k+1}^2) = \underbrace{1 + \cdots + 1}_{(n-2)\ \mathrm{times}} + \lambda_k^{-2} + \lambda_k^{+2},$$
and after some algebraic manipulations we get
$$\lambda_k^{-2} + \lambda_k^{+2} = \left(\frac{C\|d_k\|^2\|y_k\|^2}{\|g_k\|^4} + 1\right)^2 - \frac{2C\|d_k\|^2\|y_k\|^2 (d_k^T y_k)}{\|g_k\|^6} + \frac{\|d_k\|^2\|y_k\|^2 + (d_k^T y_k)^2}{2\|g_k\|^4} - \frac{2 d_k^T y_k}{\|g_k\|^2} + 1,$$
which together with (2.7) yields
$$(2.8)\qquad \lambda_k^- \lambda_k^+ = \xi_k - \frac{1}{4}\gamma_k,$$
with $\gamma_k$ defined in (2.4).
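
The spectral relations (2.7) and (2.8) can likewise be verified numerically. In the following Python sketch (again an illustrative check, not part of the original text), $A_{k+1}$ is assembled from random vectors and its eigenvalues are compared with $\xi_k$ and $\gamma_k$; the data and the value $C = 1.0$ are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, C = 7, 1.0                                   # C > 1/4
g, d, y = rng.standard_normal((3, n))           # stand-ins for g_k, d_k, y_k
gg = g.dot(g)                                   # ||g_k||^2

# Symmetric part of Q_{k+1}: A_{k+1} = I + C ||y||^2/||g||^4 d d^T - (d y^T + y d^T)/(2||g||^2)
A = (np.eye(n) + C * y.dot(y) / gg**2 * np.outer(d, d)
     - (np.outer(d, y) + np.outer(y, d)) / (2.0 * gg))

xi = C * y.dot(y) * d.dot(d) / gg**2 - d.dot(y) / gg + 1.0     # (2.3)
gamma = (y.dot(y) * d.dot(d) - d.dot(y)**2) / gg**2            # (2.4)

eig = np.linalg.eigvalsh(A)                     # eigenvalues in ascending order
print(np.sum(np.isclose(eig, 1.0)) >= n - 2)    # n-2 eigenvalues equal to 1
lam_minus, lam_plus = eig[0], eig[-1]           # the two remaining extreme eigenvalues
print(np.isclose(lam_minus + lam_plus, xi + 1.0))              # relation (2.7)
print(np.isclose(lam_minus * lam_plus, xi - gamma / 4.0))      # relation (2.8)
print(lam_minus >= 1.0 - 1.0 / (4.0 * C) - 1e-12)              # bound used in Theorem 2.2
```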

Thus, from (2.7) and (2.8), $\lambda_k^-$ and $\lambda_k^+$ can be computed as the solutions of the following quadratic equation:
$$\lambda^2 - (\xi_k + 1)\lambda + \xi_k - \frac{1}{4}\gamma_k = 0.$$
More precisely,
$$\lambda_k^{\pm} = \frac{\xi_k + 1 \pm \sqrt{(\xi_k - 1)^2 + \gamma_k}}{2}.$$
Since $C > \frac{1}{4}$, from Proposition 2.1 we have $\xi_k > \frac{1}{4}\gamma_k$, and consequently, $\lambda_k^- > 0$. Also, $\lambda_k^+ \geq \lambda_k^-$, and so $\lambda_k^+ > 0$. Therefore, the symmetric matrix $A_{k+1}$ is positive definite. Hence, considering (2.6), to complete the proof it is enough to show that $\lambda_k^- \geq 1 - \frac{1}{4C}$. In this context, we define the following function:
$$h(z) = \frac{z + 1 - \sqrt{(z-1)^2 + \gamma_k}}{2},$$
which is nondecreasing. Since $C > \frac{1}{4}$, it can be shown that
$$\xi_k \geq C\gamma_k + 1 - \frac{1}{4C}.$$
Thus,
$$h(\xi_k) = \lambda_k^- \geq h\left(C\gamma_k + 1 - \frac{1}{4C}\right) = 1 - \frac{1}{4C},$$
which completes the proof.
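
For the reader's convenience, the algebra behind the claim $\xi_k \geq C\gamma_k + 1 - \frac{1}{4C}$, which the proof leaves to the reader, can be written out as a completion of squares (this derivation is supplied here and does not appear in the original text):
$$\begin{aligned}
\xi_k - C\gamma_k - 1 + \frac{1}{4C}
  &= C\frac{\|y_k\|^2\|d_k\|^2}{\|g_k\|^4} - \frac{d_k^T y_k}{\|g_k\|^2} + 1
     - C\,\frac{\|y_k\|^2\|d_k\|^2 - (d_k^T y_k)^2}{\|g_k\|^4} - 1 + \frac{1}{4C} \\
  &= C\frac{(d_k^T y_k)^2}{\|g_k\|^4} - \frac{d_k^T y_k}{\|g_k\|^2} + \frac{1}{4C}
   = \left(\sqrt{C}\,\frac{d_k^T y_k}{\|g_k\|^2} - \frac{1}{2\sqrt{C}}\right)^2 \geq 0.
\end{aligned}$$
Moreover, with $z = C\gamma_k + 1 - \frac{1}{4C}$ one has $(z-1)^2 + \gamma_k = \left(C\gamma_k - \frac{1}{4C}\right)^2 + \gamma_k = \left(C\gamma_k + \frac{1}{4C}\right)^2$, so that $h(z) = 1 - \frac{1}{4C}$; the bound on $\lambda_k^-$ then follows from the monotonicity of $h$.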

Acknowledgments

This research was in part supported by a grant from IPM (No. 90900023), and in part by the Research Council of Semnan University.

References

[1] S. Babaie-Kafaki, Erratum to: Scaled memoryless BFGS preconditioned conjugate gradient algorithm for unconstrained optimization, Optim. Methods Softw. 28 (2013), no. 1, 217-219.
[2] S. Babaie-Kafaki, A note on the global convergence theorem of the scaled conjugate gradient algorithms proposed by Andrei, Comput. Optim. Appl. 52 (2012), no. 2, 409-414.
[3] S. Babaie-Kafaki, On the sufficient descent property of the Shanno's conjugate gradient method, Optim. Lett. 7 (2013), no. 4, 831-837.
[4] S. Babaie-Kafaki, A new proof for the sufficient descent condition of Andrei's scaled conjugate gradient algorithms, Pac. J. Optim., to appear.
[5] Y. H. Dai, Analyses of Conjugate Gradient Methods, Ph.D. Thesis, Mathematics and Scientific/Engineering Computing, Chinese Academy of Sciences, 1997.
[6] Y. H. Dai, New properties of a nonlinear conjugate gradient method, Numer. Math. 89 (2001), no. 1, 83-98.
[7] Y. H. Dai, J. Y. Han, G. H. Liu, D. F. Sun, H. X. Yin and Y. X. Yuan, Convergence properties of nonlinear conjugate gradient methods, SIAM J. Optim. 10 (2000) 348-358.
[8] Y. H. Dai and L. Z. Liao, New conjugacy conditions and related nonlinear conjugate gradient methods, Appl. Math. Optim. 43 (2001), no. 1, 87-101.
[9] J. C. Gilbert and J. Nocedal, Global convergence properties of conjugate gradient methods for optimization, SIAM J. Optim. 2 (1992), no. 1, 21-42.
[10] W. W. Hager and H. Zhang, A new conjugate gradient method with guaranteed descent and an efficient line search, SIAM J. Optim. 16 (2005), no. 1, 170-192.
[11] W. W. Hager and H. Zhang, A survey of nonlinear conjugate gradient methods, Pac. J. Optim. 2 (2006), no. 1, 35-58.
[12] E. Polak and G. Ribière, Note sur la convergence de méthodes de directions conjuguées, Rev. Française Informat. Recherche Opérationnelle 3 (1969) 35-43.
[13] B. T. Polyak, The conjugate gradient method in extreme problems, USSR Comp. Math. Math. Phys. 9 (1969) 94-112.
[14] M. J. D. Powell, Restart procedures for the conjugate gradient method, Math. Programming 12 (1977), no. 2, 241-254.
[15] M. J. D. Powell, Nonconvex minimization calculations and the conjugate gradient method, 122-141, Numerical Analysis (Dundee, 1983), Lecture Notes in Mathematics, 1066, Springer, Berlin, 1984.
[16] W. Sun and Y. X. Yuan, Optimization Theory and Methods: Nonlinear Programming, Springer, New York, 2006.
[17] P. Wolfe, Convergence conditions for ascent methods, SIAM Rev. 11 (1969) 226-235.
[18] G. Yu, L. Guan and G. Li, Global convergence of modified Polak-Ribière-Polyak conjugate gradient methods with sufficient descent property, J. Ind. Manag. Optim. 4 (2008) 565-579.
[19] Y. X. Yuan, Analysis on the conjugate gradient method, Optim. Methods Softw. 2 (1993) 19-29.

(Saman Babaie-Kafaki) Department of Mathematics, Faculty of Mathematics, Statistics and Computer Sciences, Semnan University, P.O. Box 35195-363, Semnan, Iran, and School of Mathematics, Institute for Research in Fundamental Sciences (IPM), P.O. Box 19395-5746, Tehran, Iran
E-mail address: sb@semnan.ac.ir