Quasi-Newton updates with weighted secant equations

by S. Gratton, V. Malmedy and Ph. L. Toint

Report NAXYS-09-2013                                6 October 2013

ENSEEIHT-IRIT, 2, rue Camichel, 31000 Toulouse (France)
LMS SAMTECH, A Siemens Business, 15-16, Lower Park Row, BS1 5BN Bristol (UK)
University of Namur, 61, rue de Bruxelles, B-5000 Namur (Belgium)
http://www.fundp.ac.be/sciences/naxys

Quasi-Newton updates with weighted secant equations

S. Gratton, V. Malmedy and Ph. L. Toint

6 October 2013

S. Gratton: ENSEEIHT-IRIT, 2, rue Camichel, 31000 Toulouse, France. Email: serge.gratton@enseeiht.fr
V. Malmedy: LMS SAMTECH, A Siemens Business, 15-16, Lower Park Row, BS1 5BN Bristol, UK. Email: vincent.malmedy@gmail.com
Ph. L. Toint: Namur Center for Complex Systems (naXys) and Department of Mathematics, University of Namur, 61, rue de Bruxelles, B-5000 Namur, Belgium. Email: philippe.toint@unamur.be

Abstract

We provide a formula for variational quasi-Newton updates with multiple weighted secant equations. The derivation of the formula leads to a Sylvester equation in the correction matrix. Examples are given.

1 Introduction

Quasi-Newton methods have long been at the core of nonlinear optimization, in both the constrained and unconstrained cases. Such methods are characterized by their use of an approximation of the Hessian of the underlying nonlinear functions, which is typically updated recursively by low-rank modifications of an initial estimate. The corresponding updates are derived variationally, to enforce minimal correction in a suitable norm while guaranteeing the preservation of available secant equations, which provide information about local curvature. By far the most common case is when a single such secant equation is considered for updating the Hessian approximation, and several famous formulae have been derived in this context, such as the DFP (Davidon, 1959, Fletcher and Powell, 1963), BFGS (Broyden, 1970, Fletcher, 1970, Goldfarb, 1970, Shanno, 1970) or PSB (Powell, 1970) updates. More unusual is the proposal by Schnabel (1983) (see also Byrd, Nocedal and Schnabel, 1994) to consider the simultaneous enforcement of several secant equations. The purpose of this small note, whose content follows up from Malmedy (2010), is to pursue this line of thought, but in a slightly relaxed context where the deviation from the multiple secant equations is penalized rather than strictly forced to zero. Our development is primarily motivated by an attempt to explain the influence of the order in which secant pairs are considered in limited-memory BFGS, as described in Malmedy (2010), where a weighted combination of secant updates is used to conduct this investigation. This motivation was recently reinforced by a presentation of a stochastic context in which secant equations are enforced on average (Nocedal, 2013).

This short paper is organized as follows. The new penalized multiple secant update is derived in Section 2, while a few examples are considered in Section 3. Conclusions are drawn in Section 4.

2 Penalized multisecant quasi-Newton updates

We consider deriving a new quasi-Newton approximation $B_+$ for the Hessian of a nonlinear function from $\mathbb{R}^n$ into $\mathbb{R}$ from an existing one $B$, while at the same time attempting to satisfy $m$ secant equations. More specifically, we assume that $m$ secant pairs of the form $(s_i, y_i)$ ($i = 1, \ldots, m$) are at our disposal. These pairs are supposed to contain useful curvature information, and are typically derived by computing differences $y_i$ in the gradient of the underlying nonlinear function corresponding to steps $s_i$ in the variable space $\mathbb{R}^n$. The matrix $E$ holds the correction $B_+ - B$. The vectors $s_i$ and $y_i$ form the columns of the $n \times m$ matrices $S$ and $Y$, respectively.
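Before entering the derivation, the following minimal Python/NumPy sketch (an illustration added for this transcription, not code from the paper) shows how such secant pairs arise: for a quadratic with Hessian $A$, the pairs satisfy $A s_i = y_i$ exactly, so $S$ and $Y$ carry exact curvature information. The quadratic test function and all variable names are assumptions of the example.

```python
import numpy as np

# Illustrative setup (not from the paper): build m secant pairs for a
# quadratic f(x) = 1/2 x^T A x, whose gradient is g(x) = A x, so that
# y_i = g(x + s_i) - g(x) = A s_i holds exactly.
rng = np.random.default_rng(0)
n, m = 10, 3
A = rng.standard_normal((n, n))
A = A @ A.T + n * np.eye(n)          # true Hessian (SPD)

S = rng.standard_normal((n, m))      # steps s_i as columns
Y = A @ S                            # gradient differences y_i = A s_i

B = np.eye(n)                        # initial Hessian approximation
R = Y - B @ S                        # residuals r_i = y_i - B s_i
print(np.linalg.norm(A @ S - Y))     # 0.0: the pairs carry exact curvature
```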

As is well known (see Dennis and Schnabel, 1983, for instance), most secant updates may be derived by a variational approach. To define multisecant versions of the classical formulae, it would therefore make sense to solve the constrained variational problem

\[ \min_E \; \tfrac{1}{2}\, \| W^{-T} E W^{-1} \|_F^2 \tag{2.1a} \]

\[ \text{subject to} \quad (B+E)\, s_i = y_i \;\; \text{for } i = 1, \ldots, m, \quad \text{and} \quad E = E^T, \tag{2.1b} \]

using some particular nonsingular weighting matrix $W$, whose choice determines the formula that is obtained. For instance, choosing $W = I$ yields the multisecant PSB update, while any matrix $W$ such that $W^T W S = Y$ yields the multisecant DFP update. The multisecant BFGS may then be obtained by duality from its DFP counterpart. This is the approach followed by Schnabel (1983) (see also Byrd et al., 1994) for a multisecant analog of the BFGS formula. We propose here to consider a slightly different framework: instead of solving (2.1), we aim to solve the penalized problem

\[ \min_E \; \tfrac{1}{2}\, \| W^{-T} E W^{-1} \|_F^2 + \tfrac{1}{2} \sum_{i=1}^m \omega_i\, \| (B+E)\, s_i - y_i \|_{\hat W^{-1}}^2 \tag{2.2a} \]

\[ \text{subject to} \quad E = E^T, \tag{2.2b} \]

with $\hat W = W^T W$, in effect relaxing the secant equations by penalizing their violation with independent nonnegative penalty parameters $\omega_i$ for $i = 1, \ldots, m$.
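The penalized problem (2.2) can be checked numerically by brute force before any formula is derived. The sketch below (an added illustration, assuming the PSB weighting $W = I$ and arbitrary random data) parameterizes a symmetric correction $E$ by its upper triangle and minimizes (2.2a) with a general-purpose optimizer; the optimal $E$ violates the secant equations slightly, more so for the smaller penalty parameters.

```python
import numpy as np
from scipy.optimize import minimize

# Brute-force sanity check of the penalized problem (2.2) on tiny random
# data (an illustration, not the paper's algorithm).
rng = np.random.default_rng(3)
n, m = 4, 2
S, Y = rng.standard_normal((n, m)), rng.standard_normal((n, m))
B = np.eye(n)
omega = np.array([5.0, 50.0])
W_hat = np.eye(n)               # PSB weighting: W = I, so \hat W = I
iu = np.triu_indices(n)

def unpack(v):                  # symmetric E from its upper triangle
    E = np.zeros((n, n)); E[iu] = v
    return E + np.triu(E, 1).T

def phi(v):                     # objective (2.2a) with W = I
    E = unpack(v)
    resid = (B + E) @ S - Y     # columns (B+E)s_i - y_i
    pen = sum(w * r @ np.linalg.solve(W_hat, r) for w, r in zip(omega, resid.T))
    return 0.5 * np.linalg.norm(E, 'fro')**2 + 0.5 * pen

v = minimize(phi, np.zeros(iu[0].size), method='BFGS', tol=1e-12).x
E_opt = unpack(v)
print(np.linalg.norm((B + E_opt) @ S - Y, axis=0))  # small, nonzero violations
```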

2.1 Solving the penalized variational problem

Zeroing the derivative of the Lagrangian function of problem (2.2) with respect to $E$ in the matrix direction $K$, we obtain that

\[ \mathrm{tr}\!\left( (W^{-T} E W^{-1})^T\, W^{-T} K W^{-1} \right) + \mathrm{tr}\!\left( K^T (M - M^T) \right) + \sum_{i=1}^m \omega_i\, s_i^T K^T \hat W^{-1} \left( (B+E)\, s_i - y_i \right) = 0, \tag{2.3} \]

where $M$ is the Lagrange multiplier corresponding to the symmetry constraint. Using the vectorization operator $\mathrm{vec}$, this equation may be rewritten as

\[ \left\langle \mathrm{vec}\!\left( \hat W^{-1} E \hat W^{-1} + M - M^T \right), \mathrm{vec}(K) \right\rangle + \sum_{i=1}^m \omega_i \left\langle s_i \otimes \hat W^{-1} E s_i - s_i \otimes \hat W^{-1} r_i,\; \mathrm{vec}(K) \right\rangle = 0, \tag{2.4} \]

where $r_i = y_i - B s_i$ is the residual of the $i$-th secant equation at the current Hessian approximation. As equation (2.4) must hold for every matrix $K$, we thus have that

\[ \mathrm{vec}\!\left( \hat W^{-1} E \hat W^{-1} + M - M^T + \sum_{i=1}^m \omega_i\, \hat W^{-1} (E s_i - r_i)\, s_i^T \right) = 0, \]

which is equivalent to

\[ \hat W^{-1} E \hat W^{-1} + M - M^T + \sum_{i=1}^m \omega_i\, \hat W^{-1} (E s_i - r_i)\, s_i^T = 0. \]

By summing this equation with its transpose and then pre- and post-multiplying by $\hat W$, we eliminate the Lagrange multiplier $M$ and obtain that

\[ E \left( I + \sum_{i=1}^m \omega_i\, s_i s_i^T \hat W \right) + \left( I + \sum_{i=1}^m \omega_i\, \hat W s_i s_i^T \right) E = \sum_{i=1}^m \omega_i \left( \hat W s_i r_i^T + r_i s_i^T \hat W \right). \]

If we define the $n \times m$ matrices $R = Y - BS$, $\tilde R = R\, \Omega^{1/2}$, $\tilde S = S\, \Omega^{1/2}$ and $Z = \hat W \tilde S$, where $\Omega = \mathrm{diag}(\omega_i)$, we thus have to solve the Lyapunov equation

\[ A E + E A^T = C, \tag{2.5} \]

where $A = I + Z \tilde S^T$ and $C = \tilde R Z^T + Z \tilde R^T$. We know that equation (2.5) has a unique solution whenever $A$ does not have two opposite eigenvalues (see p. 414 of Lancaster and Tismenetsky, 1985). But $A = I + Z \tilde S^T = I + \hat W \tilde S \tilde S^T$, and from the relation

\[ \mathrm{sp}(A) \cup \{1\} = \mathrm{sp}\!\left( I_m + \tilde S^T \hat W \tilde S \right) \cup \{1\}, \]

we may deduce that $A$ has only positive eigenvalues. Thus (2.5) has a unique solution. Transposing the left- and right-hand sides of (2.5), and using the symmetry of $C$, we furthermore see that $E^T$ also satisfies (2.5). Therefore, since the solution of this equation is unique, $E$ is symmetric.

We now show that $E$ can be written in the form

\[ E = (\tilde R \;\; Z) \begin{pmatrix} X_1 & X_2 \\ X_2^T & X_3 \end{pmatrix} (\tilde R \;\; Z)^T, \tag{2.6} \]

where $X_1$ and $X_3$ are symmetric. Substituting this expression into (2.5) and using

\[ \tilde R Z^T + Z \tilde R^T = (\tilde R \;\; Z) \begin{pmatrix} 0 & I_m \\ I_m & 0 \end{pmatrix} (\tilde R \;\; Z)^T \]

yields

\[ (\tilde R \;\; Z) \left[ 2 \begin{pmatrix} X_1 & X_2 \\ X_2^T & X_3 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ \tilde S^T (\tilde R X_1 + Z X_2^T) & \tilde S^T (\tilde R X_2 + Z X_3) \end{pmatrix} + \begin{pmatrix} 0 & (X_1 \tilde R^T + X_2 Z^T)\, \tilde S \\ 0 & (X_2^T \tilde R^T + X_3 Z^T)\, \tilde S \end{pmatrix} - \begin{pmatrix} 0 & I_m \\ I_m & 0 \end{pmatrix} \right] (\tilde R \;\; Z)^T = 0. \]

A solution of (2.5), and thus the unique one, can therefore be obtained by setting

\[ X_1 = 0, \qquad X_2 = \left( 2 I_m + \tilde S^T Z \right)^{-1} \tag{2.7} \]

and choosing $X_3$ to be the solution of the Lyapunov equation

\[ \left( I_m + \tilde S^T Z \right) X_3 + X_3 \left( I_m + \tilde S^T Z \right) = - \left( \tilde S^T \tilde R X_2 + X_2 \tilde R^T \tilde S \right). \tag{2.8} \]

But the matrix

\[ I_m + \tilde S^T Z = I_m + \tilde S^T \hat W \tilde S \tag{2.9} \]

is symmetric and positive definite. Hence equation (2.8) also has a unique solution, which can directly be computed using the eigendecomposition of the $m \times m$ symmetric matrix $I_m + \tilde S^T Z$. Therefore, using (2.6), (2.7) and the fact that (2.9) implies the symmetry of $X_2$, we obtain that the solution of (2.3) can be written as

\[ E = \tilde R \left( 2 I_m + \tilde S^T Z \right)^{-1} Z^T + Z \left( 2 I_m + \tilde S^T Z \right)^{-1} \tilde R^T + Z X_3 Z^T, \tag{2.10} \]

where $X_3$ solves (2.8). Note that the core of the computation needed to obtain $E$ (the determination of $X_2$ and $X_3$) only involves forming $m \times m$ matrices from $n \times m$ factors and then working in this smaller space, resulting in an $O(m^2 n + m^3)$ computational complexity.
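Formulae (2.7)-(2.10) translate directly into an $O(m^2 n + m^3)$ procedure: only $m \times m$ quantities are inverted or passed to a Lyapunov solver. The following sketch (an added illustration on random data; SciPy's `solve_sylvester` is used as the small Lyapunov solver in place of the eigendecomposition mentioned above) computes $E$ via (2.10) and verifies it against a dense $n \times n$ solve of (2.5).

```python
import numpy as np
from scipy.linalg import solve_sylvester

# Structured solution (2.7)-(2.10): only m x m systems are solved,
# instead of the n x n Sylvester equation (2.5). Random test data.
rng = np.random.default_rng(2)
n, m = 8, 3
S, Y = rng.standard_normal((n, m)), rng.standard_normal((n, m))
B = np.eye(n)
W_hat = np.eye(n)                             # PSB weighting for simplicity
omega = np.array([1.0, 10.0, 100.0])

R = Y - B @ S
St, Rt = S * np.sqrt(omega), R * np.sqrt(omega)   # \tilde S, \tilde R
Z = W_hat @ St                                    # Z = \hat W \tilde S

G = St.T @ Z                                  # \tilde S^T Z, symmetric m x m
X2 = np.linalg.inv(2 * np.eye(m) + G)         # (2.7); m is small
rhs = -(St.T @ Rt @ X2 + X2 @ Rt.T @ St)      # right-hand side of (2.8)
X3 = solve_sylvester(np.eye(m) + G, np.eye(m) + G, rhs)   # (2.8)
E = Rt @ X2 @ Z.T + Z @ X2 @ Rt.T + Z @ X3 @ Z.T          # (2.10)

# Agreement with the full n x n solve of (2.5):
A = np.eye(n) + Z @ St.T
E_full = solve_sylvester(A, A.T, Rt @ Z.T + Z @ Rt.T)
print(np.linalg.norm(E - E_full))             # ~0
```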

An important particular case is the one where we have only one secant equation, i.e. $m = 1$. In this case, the Lyapunov equation (2.8) is a scalar equation and, defining $r = y - Bs$, $\tilde r = \sqrt{\omega}\, r$, $\tilde s = \sqrt{\omega}\, s$ and $z = \hat W \tilde s$, we obtain that

\[ E = \frac{\tilde r z^T + z \tilde r^T}{2 + z^T \tilde s} - \frac{\tilde s^T \tilde r}{(2 + z^T \tilde s)(1 + z^T \tilde s)}\, z z^T = \frac{r s^T \hat W + \hat W s r^T}{2 \omega^{-1} + s^T \hat W s} - \frac{s^T r}{(2 \omega^{-1} + s^T \hat W s)(\omega^{-1} + s^T \hat W s)}\, \hat W s s^T \hat W. \tag{2.11} \]

As the scalar $\omega$ goes to infinity, we get the update

\[ E = \frac{r s^T \hat W + \hat W s r^T}{s^T \hat W s} - \frac{s^T r}{(s^T \hat W s)^2}\, \hat W s s^T \hat W. \]

3 Examples of penalized updates

3.1 The penalized PSB update

If we choose $\hat W = I$, we obtain from (2.10) and (2.8) that

\[ B_+ = B + R \left( 2 \Omega^{-1} + S^T S \right)^{-1} S^T + S \left( 2 \Omega^{-1} + S^T S \right)^{-1} R^T + S\, \Omega^{1/2} X_3\, \Omega^{1/2} S^T, \tag{3.1} \]

where $X_3$ solves

\[ \left( I_m + \Omega^{1/2} S^T S\, \Omega^{1/2} \right) X_3 + X_3 \left( I_m + \Omega^{1/2} S^T S\, \Omega^{1/2} \right) = - \left( \Omega^{1/2} S^T R\, \Omega^{1/2} X_2 + X_2\, \Omega^{1/2} R^T S\, \Omega^{1/2} \right) \tag{3.2} \]

with $X_2 = \left( 2 I_m + \Omega^{1/2} S^T S\, \Omega^{1/2} \right)^{-1}$. If we consider the single-secant case ($m = 1$), we derive from (2.11) that

\[ B_+ = B + \frac{r s^T + s r^T}{2 \omega^{-1} + \|s\|^2} - \frac{\langle s, r \rangle}{(2 \omega^{-1} + \|s\|^2)(\omega^{-1} + \|s\|^2)}\, s s^T. \tag{3.3} \]
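For the single-secant PSB case, formula (3.3) is cheap to implement, and its $\omega \to \infty$ limit should recover the classical PSB update. The sketch below (an added illustration on random data, not code from the paper) confirms both points numerically.

```python
import numpy as np

# Single-secant penalized PSB update (3.3) and its classical limit:
# omega -> infinity recovers the standard PSB formula.
rng = np.random.default_rng(4)
n = 6
B = np.eye(n)
s, y = rng.standard_normal(n), rng.standard_normal(n)
r = y - B @ s

def psb_penalized(B, s, r, omega):
    d1 = 2.0 / omega + s @ s            # 2 omega^{-1} + ||s||^2
    d2 = 1.0 / omega + s @ s            #   omega^{-1} + ||s||^2
    return (B + (np.outer(r, s) + np.outer(s, r)) / d1
              - (s @ r) / (d1 * d2) * np.outer(s, s))

B_pen = psb_penalized(B, s, r, omega=1e12)
B_psb = (B + (np.outer(r, s) + np.outer(s, r)) / (s @ s)
           - (s @ r) / (s @ s)**2 * np.outer(s, s))   # classical PSB
print(np.linalg.norm(B_pen - B_psb))                  # ~0 for large omega
print(np.linalg.norm(B_pen @ s - y))                  # secant residual nearly zero
```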

3.2 The penalized DFP update

If we now assume that the matrices $B$ and $S^T Y$ are symmetric, and that $S^T Y$ is positive definite, we may choose $\hat W$ such that $\hat W S = Y$ and deduce from (2.10) that

\[ B_+ = B + R \left( 2 \Omega^{-1} + S^T Y \right)^{-1} Y^T + Y \left( 2 \Omega^{-1} + S^T Y \right)^{-1} R^T + Y\, \Omega^{1/2} X_3\, \Omega^{1/2} Y^T, \tag{3.4} \]

where $X_3$ solves

\[ \left( I_m + \Omega^{1/2} S^T Y\, \Omega^{1/2} \right) X_3 + X_3 \left( I_m + \Omega^{1/2} S^T Y\, \Omega^{1/2} \right) = - \left( \Omega^{1/2} S^T R\, \Omega^{1/2} X_2 + X_2\, \Omega^{1/2} R^T S\, \Omega^{1/2} \right) \tag{3.5} \]

with $X_2 = \left( 2 I_m + \Omega^{1/2} S^T Y\, \Omega^{1/2} \right)^{-1}$. Note also that a direct computation (using the symmetry of $S^T R = S^T Y - S^T B S$) shows that

\[ \left( I_m + \Omega^{1/2} S^T Y\, \Omega^{1/2} \right) X_2\, \Omega^{1/2} S^T R\, \Omega^{1/2} X_2 + X_2\, \Omega^{1/2} S^T R\, \Omega^{1/2} X_2 \left( I_m + \Omega^{1/2} S^T Y\, \Omega^{1/2} \right) = \Omega^{1/2} S^T R\, \Omega^{1/2} X_2 + X_2\, \Omega^{1/2} R^T S\, \Omega^{1/2} - 2 X_2\, \Omega^{1/2} S^T R\, \Omega^{1/2} X_2. \tag{3.6} \]

To simplify and better understand the relation between the penalized and a standard DFP formula in block form, we set

\[ X_4 = X_3 + X_2\, \Omega^{1/2} S^T R\, \Omega^{1/2} X_2, \tag{3.7} \]

sum up equations (3.6) and (3.5), and obtain that

\[ \left( I_m + \Omega^{1/2} S^T Y\, \Omega^{1/2} \right) X_4 + X_4 \left( I_m + \Omega^{1/2} S^T Y\, \Omega^{1/2} \right) = - 2 X_2\, \Omega^{1/2} S^T R\, \Omega^{1/2} X_2. \tag{3.8} \]

We also deduce from (3.7) that

\[ \Omega^{1/2} X_3\, \Omega^{1/2} = - \left( 2 \Omega^{-1} + S^T Y \right)^{-1} S^T R \left( 2 \Omega^{-1} + S^T Y \right)^{-1} + \Omega^{1/2} X_4\, \Omega^{1/2}, \]

and inserting this expression into the penalized DFP update (3.4), we obtain that

\[ B_+ = \left( I - Y \left( 2 \Omega^{-1} + S^T Y \right)^{-1} S^T \right) B \left( I - S \left( 2 \Omega^{-1} + S^T Y \right)^{-1} Y^T \right) + Y \left( 2 \left( 2 \Omega^{-1} + S^T Y \right)^{-1} - \left( 2 \Omega^{-1} + S^T Y \right)^{-1} S^T Y \left( 2 \Omega^{-1} + S^T Y \right)^{-1} + X_5 \right) Y^T, \tag{3.9} \]

where, in view of (3.8), $X_5 = \Omega^{1/2} X_4\, \Omega^{1/2}$ solves

\[ \left( I + \Omega\, S^T Y \right) X_5 + X_5 \left( I + S^T Y\, \Omega \right) = - 2 \left( 2 \Omega^{-1} + S^T Y \right)^{-1} S^T R \left( 2 \Omega^{-1} + S^T Y \right)^{-1}. \tag{3.10} \]

But, using the symmetry of $S^T Y$,

\[ 2 \left( 2 \Omega^{-1} + S^T Y \right)^{-1} - \left( 2 \Omega^{-1} + S^T Y \right)^{-1} S^T Y \left( 2 \Omega^{-1} + S^T Y \right)^{-1} = \left( 2 \Omega^{-1} + S^T Y \right)^{-1} \left[ 2 \left( 2 \Omega^{-1} + S^T Y \right) - S^T Y \right] \left( 2 \Omega^{-1} + S^T Y \right)^{-1} = \left( 2 \Omega^{-1} + S^T Y \right)^{-1} \left( 4 \Omega^{-1} + S^T Y \right) \left( 2 \Omega^{-1} + S^T Y \right)^{-1}, \]

and hence (3.9) finally becomes

\[ B_+ = \left( I - Y \left( 2 \Omega^{-1} + S^T Y \right)^{-1} S^T \right) B \left( I - S \left( 2 \Omega^{-1} + S^T Y \right)^{-1} Y^T \right) + Y \left( \left( 2 \Omega^{-1} + S^T Y \right)^{-1} \left( 4 \Omega^{-1} + S^T Y \right) \left( 2 \Omega^{-1} + S^T Y \right)^{-1} + X_5 \right) Y^T, \tag{3.11} \]

where $X_5$ solves (3.10). If we set $\Omega = \omega I$ and assume that $S^T Y$ is nonsingular, we obtain from (3.10) that $X_5 = O(\omega^{-1})$, and we see from (3.11) that the penalized update converges to the DFP update in block form when $\omega$ goes to infinity. If $m = 1$, (2.11) gives that

\[ B_+ = B + \frac{r y^T + y r^T}{2 \omega^{-1} + \langle s, y \rangle} - \frac{\langle s, r \rangle}{(2 \omega^{-1} + \langle s, y \rangle)(\omega^{-1} + \langle s, y \rangle)}\, y y^T. \tag{3.12} \]

3.3 The penalized BFGS update

Let us assume again that $S^T Y$ is symmetric and positive definite, and set $P = S - HY$, with $H$ being an approximation of the inverse Hessian. Exchanging the roles of $S$ and $Y$ in the penalized DFP formula, we derive the penalized BFGS update defined by

\[ H_+ = \left( I - S \left( 2 \Omega^{-1} + Y^T S \right)^{-1} Y^T \right) H \left( I - Y \left( 2 \Omega^{-1} + Y^T S \right)^{-1} S^T \right) + S \left( \left( 2 \Omega^{-1} + Y^T S \right)^{-1} \left( 4 \Omega^{-1} + Y^T S \right) \left( 2 \Omega^{-1} + Y^T S \right)^{-1} + X_6 \right) S^T, \tag{3.13} \]

where $X_6$ solves

\[ \left( I + \Omega\, Y^T S \right) X_6 + X_6 \left( I + Y^T S\, \Omega \right) = - 2 \left( 2 \Omega^{-1} + Y^T S \right)^{-1} Y^T P \left( 2 \Omega^{-1} + Y^T S \right)^{-1}. \tag{3.14} \]

If we set $\Omega = \omega I$ and consider large values of $\omega$, we see that the penalized update again converges to the BFGS formula. We also conclude that, for $\omega$ sufficiently large, the penalized update will be positive definite. If $S^T Y$ is not symmetric and positive definite, an approximate update could be computed by replacing the matrix $\left( 2 \Omega^{-1} + Y^T S \right)^{-1} \left( 4 \Omega^{-1} + Y^T S \right) \left( 2 \Omega^{-1} + Y^T S \right)^{-1} + X_6$ between $S$ and $S^T$ in expression (3.13) by any symmetric and positive definite approximation.

In the case of a single secant pair ($m = 1$), we obtain that

\[ H_+ = H + \frac{p s^T + s p^T}{2 \omega^{-1} + \langle s, y \rangle} - \frac{\langle p, y \rangle}{(2 \omega^{-1} + \langle s, y \rangle)(\omega^{-1} + \langle s, y \rangle)}\, s s^T \tag{3.15} \]

\[ \phantom{H_+} = \left( I - \frac{\rho}{1 + 2 \rho \omega^{-1}}\, s y^T \right) H \left( I - \frac{\rho}{1 + 2 \rho \omega^{-1}}\, y s^T \right) + \theta \rho\, s s^T, \tag{3.16} \]

with $\rho = \langle s, y \rangle^{-1}$, $p = s - H y$ and

\[ \theta = \left( 1 + \rho \omega^{-1} \right)^{-1} \left[ 1 + \frac{\rho^2 \omega^{-1} \langle y, H y \rangle}{\left( 1 + 2 \rho \omega^{-1} \right)^2} \right]. \]
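The two single-secant forms (3.15) and (3.16) of the penalized BFGS update can be checked against each other numerically. The sketch below (an added illustration with a random symmetric positive definite $H$ and a positive-curvature pair; not code from the paper) also makes clear that every quantity involved costs only a few inner products.

```python
import numpy as np

# Check that forms (3.15) and (3.16) of the single-secant penalized
# BFGS update agree (random SPD data; a numerical sketch only).
rng = np.random.default_rng(5)
n, omega = 6, 7.0
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)             # current inverse-Hessian approximation
s, y = rng.standard_normal(n), rng.standard_normal(n)
if s @ y < 0:                           # enforce positive curvature <s,y> > 0
    y = -y
p = s - H @ y
rho = 1.0 / (s @ y)

# Form (3.15):
d1, d2 = 2.0 / omega + s @ y, 1.0 / omega + s @ y
H1 = (H + (np.outer(p, s) + np.outer(s, p)) / d1
        - (p @ y) / (d1 * d2) * np.outer(s, s))

# Form (3.16):
a = rho / omega
sigma = rho / (1.0 + 2.0 * a)
theta = (1.0 + a) ** -1 * (1.0 + a * rho * (y @ H @ y) / (1.0 + 2.0 * a) ** 2)
V = np.eye(n) - sigma * np.outer(s, y)
H2 = V @ H @ V.T + theta * rho * np.outer(s, s)

print(np.linalg.norm(H1 - H2))          # ~0: the two forms coincide
```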

Gratton, Malmedy, oint: Quasi-Newton updates with weighted secant equations 6 In the case of a single secant pair m =, we obtain that 2ω + s,y + H + = H + ps + sp = I ρsy + 2ρω with ρ = s,y, p = s Hy and θ = + ρ ω 4 Conclusions H ω + s,y 2 2ω + s,y I ρys + 2ρω p,y s,y ss 3.26 + θρss, 3.27 [ ρ y,hy + 2ρ 2 ω + + ρ ω 2 + ρ ] ω. We have used variational techniques to derive a new algorithm for updating a quasi-newton Hessian approximation using multiple secant equations. he cost of the associated updates has been discussed and remains reasonable in the sense that it does not involve terms in the cube of the problem dimension. It is hoped that the new algorithm may prove useful beyond its initial application in Malmedy 200. References C. G. Broyden. he convergence of a class of double-rank minimization algorithms. Journal of the Institute of Mathematics and its Applications, 6, 76 90, 970. R. H. Byrd, J. Nocedal, and R. B. chnabel. Representation of quasi-newton matrices and their use in limited memory methods. Mathematical Programming, 63, 29 56, 994. W. C. Davidon. Variable metric method for minimization. Report ANL-5990Rev., Argonne National Laboratory, Research and Development, 959. Republished in the IAM Journal on Optimization, vol., pp. 7, 99. J. E. Dennis and R. B. chnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs, NJ, UA, 983. Reprinted as Classics in Applied Mathematics 6, IAM, Philadelphia, UA, 996. R. Fletcher. A new approach to variable metric algorithms. Computer Journal, 3, 37 322, 970. R. Fletcher and M. J. D. Powell. A rapidly convergent descent method for minimization. Computer Journal, 6, 63 68, 963. D. Goldfarb. A family of variable metric methods derived by variational means. Mathematics of Computation, 24, 23 26, 970. P. Lancaster and M. ismenetsky. he heory of Matrices. Academic Press, London, second edn, 985. V. Malmedy. Hessian approximation in multilevel nonlinear optimization. PhD thesis, Department of Mathematics, University of Namur, Namur, Belgium, 200. J. Nocedal. private communication, oulouse, 203. M. J. D. Powell. A new algorithm for unconstrained optimization. in J. B. Rosen, O. L. Mangasarian and K. Ritter, eds, Nonlinear Programming, pp. 3 65, London, 970. Academic Press. R. B. chnabel. Quasi-newton methods using multiple secant equations. echnical Report CU-C-247-83, Departement of Computer cience, University of Colorado at Boulder, Boulder, UA, 983. D. F. hanno. Conditioning of quasi-newton methods for function minimization. Mathematics of Computation, 24, 647 657, 970.