Quasi-Newton methods for minimization


Lectures for the PhD course on Numerical Optimization
Enrico Bertolazzi, DIMS, Università di Trento
November 21 to December 14, 2011

Outline

1 Quasi-Newton method
2 The symmetric rank one update
3 The Powell-symmetric-Broyden update
4 The Davidon-Fletcher-Powell rank 2 update
5 The Broyden-Fletcher-Goldfarb-Shanno (BFGS) update
6 The Broyden class

Quasi-Newton method

Algorithm (General quasi-Newton algorithm)

    k ← 0; x_0 assigned; g_0 ← ∇f(x_0)^T; H_0 ← ∇²f(x_0)^{-1};
    while ‖g_k‖ > ε do
        compute search direction d_k ← −H_k g_k;
        approximate argmin_{α>0} f(x_k + α d_k) by line search;
        perform step x_{k+1} ← x_k + α_k d_k;
        g_{k+1} ← ∇f(x_{k+1})^T;
        update H_{k+1} ← some algorithm(H_k, x_k, x_{k+1}, g_k, g_{k+1});
        k ← k + 1;
    end while
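As a concrete companion to this pseudocode, here is a minimal NumPy sketch of the generic driver. The names quasi_newton, backtracking and update are illustrative, not part of the slides; the Armijo backtracking is a crude stand-in for the Wolfe line search discussed later, and H_0 is taken as the identity instead of the true inverse Hessian.

    import numpy as np

    def backtracking(f, grad, x, d, a=1.0, rho=0.5, c=1e-4, max_back=60):
        # Armijo backtracking: a simple stand-in for a Wolfe line search.
        gd = grad(x) @ d
        while f(x + a * d) > f(x) + c * a * gd and max_back > 0:
            a *= rho
            max_back -= 1
        return a

    def quasi_newton(f, grad, x0, update, eps=1e-8, max_iter=500):
        # Generic quasi-Newton loop: only the update of H changes
        # between SR1, PSB, DFP, BFGS and the Broyden class.
        x = np.asarray(x0, dtype=float)
        g = grad(x)
        H = np.eye(x.size)              # H_0 = I (cheap substitute for the inverse Hessian)
        for _ in range(max_iter):
            if np.linalg.norm(g) <= eps:
                break
            d = -H @ g                  # search direction d_k = -H_k g_k
            a = backtracking(f, grad, x, d)
            x_new = x + a * d           # step x_{k+1} = x_k + alpha_k d_k
            g_new = grad(x_new)
            H = update(H, x_new - x, g_new - g)   # H_{k+1} from (H_k, s_k, y_k)
            x, g = x_new, g_new
        return x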

The symmetric rank one update

Let $B_k$ be an approximation of the Hessian of $f(x)$, and let $x_k$, $x_{k+1}$, $g_k$, $g_{k+1}$ be the points and gradients at the $k$-th and $(k+1)$-th iterates. Using the Broyden update formula to force the secant condition on $B_{k+1}$ we obtain

$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k}$$

where $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$. Using the Sherman-Morrison formula and setting $H_k = B_k^{-1}$ we obtain the update

$$H_{k+1} = H_k - \frac{(H_k y_k - s_k)\, s_k^T H_k}{s_k^T H_k y_k}$$

This update does not maintain symmetry: even if $H_k$ is symmetric, $H_{k+1}$ is in general not symmetric.

To avoid the loss of symmetry we can consider an update of the form

$$H_{k+1} = H_k + u u^T$$

Imposing the secant condition (on the inverse) $H_{k+1} y_k = s_k$ gives

$$H_k y_k + u\,(u^T y_k) = s_k$$

so $u$ must be parallel to $s_k - H_k y_k$. Multiplying the previous equality by $y_k^T$,

$$y_k^T H_k y_k + (y_k^T u)^2 = y_k^T s_k \quad\Longrightarrow\quad y_k^T u = \left( y_k^T s_k - y_k^T H_k y_k \right)^{1/2}$$

and therefore

$$u = \frac{s_k - H_k y_k}{u^T y_k} = \frac{s_k - H_k y_k}{\left( y_k^T s_k - y_k^T H_k y_k \right)^{1/2}}$$

Substituting the expression of $u$,

$$u = \frac{s_k - H_k y_k}{\left( y_k^T s_k - y_k^T H_k y_k \right)^{1/2}},$$

into the update formula, we obtain

$$H_{k+1} = H_k + \frac{w_k w_k^T}{w_k^T y_k}, \qquad w_k = s_k - H_k y_k$$

This is the symmetric rank one (SR1) formula. To be well defined it requires $w_k^T y_k \neq 0$. Moreover, if $w_k^T y_k < 0$ and $H_k$ is positive definite, then $H_{k+1}$ may lose positive definiteness. Having $H_k$ symmetric and positive definite is important for global convergence.

The following lemma is used in the theorems below.

Lemma. Let $q(x) = \frac{1}{2} x^T A x - b^T x + c$ with $A \in \mathbb{R}^{n\times n}$ symmetric and positive definite. Then

$$y_k = g_{k+1} - g_k = A x_{k+1} - b - A x_k + b = A s_k$$

where $g_k = \nabla q(x_k)^T$.

Theorem (quadratic termination of the SR1 update). Let $q(x) = \frac{1}{2} x^T A x - b^T x + c$ with $A \in \mathbb{R}^{n\times n}$ symmetric and positive definite. Let $x_0$ and $H_0$ be assigned, and let $x_k$ and $H_k$ be produced by

1. $x_{k+1} = x_k + s_k$;
2. $H_{k+1} = H_k + \dfrac{w_k w_k^T}{w_k^T y_k}$, $\quad w_k = s_k - H_k y_k$ (the SR1 formula).

If $s_0, s_1, \ldots, s_{n-1}$ are linearly independent, then $H_n = A^{-1}$.

Proof (1/2). We prove by induction the hereditary property $H_i y_j = s_j$ for $j < i$.

Base: for $i = 1$ this is exactly the secant condition of the update.

Induction: suppose the relation holds up to some $k > 0$; we prove it for $k + 1$. From the update formula,

$$H_{k+1} y_j = H_k y_j + \frac{w_k^T y_j}{w_k^T y_k}\, w_k, \qquad w_k = s_k - H_k y_k.$$

By the induction hypothesis, for $j < k$, and using the lemma above ($y_j = A s_j$, $y_k = A s_k$),

$$w_k^T y_j = s_k^T y_j - y_k^T H_k y_j = s_k^T y_j - y_k^T s_j = s_k^T A s_j - s_k^T A s_j = 0,$$

so that $H_{k+1} y_j = H_k y_j = s_j$ for $j = 0, 1, \ldots, k-1$. For $j = k$ we have $H_{k+1} y_k = s_k$ trivially by construction of the SR1 formula.

Proof (2/2). To prove that $H_n = A^{-1}$ notice that

$$H_n y_j = s_j, \qquad A s_j = y_j, \qquad j = 0, 1, \ldots, n-1,$$

and combining the two equalities,

$$H_n A s_j = s_j, \qquad j = 0, 1, \ldots, n-1.$$

By the linear independence of the $s_j$ we have $H_n A = I$, i.e. $H_n = A^{-1}$.
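The finite-termination property is easy to check numerically. A small sketch, assuming random steps (which are linearly independent, and give $w_k^T y_k \neq 0$, with probability one):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)        # SPD matrix playing the role of the Hessian
    H = np.eye(n)                      # arbitrary symmetric H_0

    for _ in range(n):
        s = rng.standard_normal(n)     # random steps: linearly independent a.s.
        y = A @ s                      # lemma: y_k = A s_k on a quadratic
        w = s - H @ y
        H = H + np.outer(w, w) / (w @ y)   # SR1 update
    print(np.allclose(H, np.linalg.inv(A)))  # True: H_n = A^{-1}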

Properties of the SR1 update (1/2)

1. The SR1 update possesses the natural quadratic termination property (like CG).
2. SR1 satisfies the hereditary property $H_k y_j = s_j$ for $j < k$.
3. SR1 maintains positive definiteness of $H_k$ if and only if $w_k^T y_k > 0$; however, this condition is difficult to guarantee.
4. Sometimes $w_k^T y_k$ becomes very small or zero. This results in serious numerical difficulty (roundoff), or the algorithm even breaks down. We can avoid this breakdown with the following strategy.

Breakdown workaround for the SR1 update (see the sketch below):
1. If $|w_k^T y_k| \geq \epsilon\, \|w_k\|\, \|y_k\|$ (i.e. the angle between $w_k$ and $y_k$ is far from 90 degrees), then update with the SR1 formula.
2. Otherwise set $H_{k+1} = H_k$.
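A sketch of the SR1 update with this safeguard; the function name and the tolerance value are illustrative, not prescribed by the slides:

    import numpy as np

    def sr1_update(H, s, y, eps=1e-8):
        # SR1 update of the inverse-Hessian approximation with the
        # breakdown workaround: skip the update when w is nearly
        # orthogonal to y, since dividing by w^T y would be unstable.
        w = s - H @ y
        wy = w @ y
        if abs(wy) >= eps * np.linalg.norm(w) * np.linalg.norm(y):
            return H + np.outer(w, w) / wy
        return H                        # otherwise keep H_{k+1} = H_k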

Properties of the SR1 update (2/2)

Theorem (convergence of the nonlinear SR1 update). Let $f(x)$ satisfy standard assumptions and let $\{x_k\}$ be a sequence of iterates such that $\lim_{k\to\infty} x_k = x^\star$. Suppose we use the breakdown workaround for the SR1 update and that the steps $\{s_k\}$ are uniformly linearly independent. Then

$$\lim_{k\to\infty} \left\| H_k - \nabla^2 f(x^\star)^{-1} \right\| = 0.$$

A. R. Conn, N. I. M. Gould and Ph. L. Toint, Convergence of quasi-Newton matrices generated by the symmetric rank one update, Mathematics of Computation, 50, 399–430, 1988.

The Powell-symmetric-Broyden update

The SR1 update, although symmetric, does not have a minimum property like the one the Broyden update enjoys in the nonsymmetric case. The Broyden update

$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k}$$

solves the minimization problem

$$\|B_{k+1} - B_k\|_F \leq \|B - B_k\|_F \qquad \text{for all } B \text{ such that } B s_k = y_k.$$

If we solve a similar problem in the class of symmetric matrices we obtain the Powell-symmetric-Broyden (PSB) update.

Lemma (Powell-symmetric-Broyden update). Let $A \in \mathbb{R}^{n\times n}$ be symmetric and $s, y \in \mathbb{R}^n$ with $s \neq 0$. Consider the set

$$\mathcal{B} = \left\{ B \in \mathbb{R}^{n\times n} \;\middle|\; B s = y,\; B = B^T \right\}.$$

If $s^T y \neq 0$ (which is true if a Wolfe line search is performed), then there exists a unique matrix $B \in \mathcal{B}$ such that

$$\|A - B\|_F \leq \|A - C\|_F \qquad \text{for all } C \in \mathcal{B};$$

moreover $B$ has the following form:

$$B = A + \frac{\omega s^T + s \omega^T}{s^T s} - (\omega^T s)\, \frac{s s^T}{(s^T s)^2}, \qquad \omega = y - A s.$$

Thus $B$ is a rank two perturbation of the matrix $A$.

Proof (1/11). First of all notice that $\mathcal{B}$ is not empty; in fact

$$\left[ \frac{1}{s^T y}\, y y^T \right] s = y, \qquad \frac{1}{s^T y}\, y y^T \text{ is symmetric},$$

so the problem is well posed. Next we reformulate it as a constrained minimum problem:

$$\arg\min_{B \in \mathbb{R}^{n\times n}} \frac{1}{2} \sum_{i,j=1}^n (A_{ij} - B_{ij})^2 \qquad \text{subject to } B s = y \text{ and } B = B^T.$$

The solution is a stationary point of the Lagrangian

$$g(B, \lambda, M) = \frac{1}{2}\, \|A - B\|_F^2 + \lambda^T (B s - y) + \sum_{i<j} \mu_{ij} (B_{ij} - B_{ji}).$$

Proof (2/11). Taking the gradient (with the sign convention absorbed into the multipliers) we have

$$\frac{\partial g}{\partial B_{ij}}(B, \lambda, M) = A_{ij} - B_{ij} + \lambda_i s_j + M_{ij} = 0,$$

where

$$M_{ij} = \begin{cases} \mu_{ij} & \text{if } i < j, \\ -\mu_{ji} & \text{if } i > j, \\ 0 & \text{if } i = j. \end{cases}$$

The previous equality can be written in matrix form as $B = A + \lambda s^T + M$ with $M$ antisymmetric.

Proof (3/11). Imposing symmetry for $B$:

$$A + \lambda s^T + M = A^T + s \lambda^T + M^T = A + s \lambda^T - M.$$

Solving for $M$ we have

$$M = \frac{s \lambda^T - \lambda s^T}{2}$$

and substituting in $B$,

$$B = A + \frac{s \lambda^T + \lambda s^T}{2}.$$

Proof (4/11). Imposing $s^T B s = s^T y$:

$$s^T A s + \frac{(s^T s)(\lambda^T s) + (s^T \lambda)(s^T s)}{2} = s^T y \quad\Longrightarrow\quad \lambda^T s = \frac{s^T \omega}{s^T s}, \qquad \omega = y - A s.$$

Imposing $B s = y$:

$$A s + \frac{s\, (\lambda^T s) + \lambda\, (s^T s)}{2} = y \quad\Longrightarrow\quad \lambda = \frac{2 \omega}{s^T s} - \frac{(s^T \omega)\, s}{(s^T s)^2}.$$

Next we compute the explicit form of $B$.

Proof (5/11). Substituting

$$\lambda = \frac{2 \omega}{s^T s} - \frac{(s^T \omega)\, s}{(s^T s)^2}$$

in

$$B = A + \frac{s \lambda^T + \lambda s^T}{2}$$

we obtain

$$B = A + \frac{\omega s^T + s \omega^T}{s^T s} - (\omega^T s)\, \frac{s s^T}{(s^T s)^2}, \qquad \omega = y - A s.$$

Next we prove that $B$ is the unique minimum.

Proof (6/11). To show that $B$ is a minimum we first compute

$$\|B - A\|_F = \left\| \frac{\omega s^T + s \omega^T}{s^T s} - (\omega^T s)\, \frac{s s^T}{(s^T s)^2} \right\|_F.$$

We use the following property of the Frobenius norm:

$$\|M - N\|_F^2 = \|M\|_F^2 + \|N\|_F^2 - 2\, M \bullet N, \qquad M \bullet N = \sum_{ij} M_{ij} N_{ij},$$

setting

$$M = \frac{\omega s^T + s \omega^T}{s^T s}, \qquad N = (\omega^T s)\, \frac{s s^T}{(s^T s)^2}.$$

Now we compute $\|M\|_F$, $\|N\|_F$ and $M \bullet N$.

Proof (7/11).

$$M \bullet N = \frac{\omega^T s}{(s^T s)^3} \sum_{ij} (\omega_i s_j + \omega_j s_i)\, s_i s_j = \frac{\omega^T s}{(s^T s)^3} \sum_{ij} \left[ (\omega_i s_i)\, s_j^2 + (\omega_j s_j)\, s_i^2 \right] = \frac{\omega^T s}{(s^T s)^3} \left[ (\omega^T s)(s^T s) + (\omega^T s)(s^T s) \right] = \frac{2\, (\omega^T s)^2}{(s^T s)^2}.$$

Proof (8/11). To compute $\|N\|_F^2$ and $\|M\|_F^2$ we need the following properties of the Frobenius norm:

$$\|u v^T\|_F^2 = (u^T u)(v^T v), \qquad \|u v^T + v u^T\|_F^2 = 2\, (u^T u)(v^T v) + 2\, (u^T v)^2.$$

Then we have

$$\|N\|_F^2 = \frac{(\omega^T s)^2}{(s^T s)^4}\, \|s s^T\|_F^2 = \frac{(\omega^T s)^2}{(s^T s)^4}\, (s^T s)^2 = \frac{(\omega^T s)^2}{(s^T s)^2},$$

$$\|M\|_F^2 = \left\| \frac{\omega s^T + s \omega^T}{s^T s} \right\|_F^2 = \frac{2\, (\omega^T \omega)(s^T s) + 2\, (s^T \omega)^2}{(s^T s)^2}.$$

Proof (9/11). Putting all together,

$$\|M - N\|_F^2 = \frac{2\, (\omega^T \omega)(s^T s) + 2\, (s^T \omega)^2}{(s^T s)^2} + \frac{(\omega^T s)^2}{(s^T s)^2} - \frac{4\, (\omega^T s)^2}{(s^T s)^2} = \frac{2\, (\omega^T \omega)(s^T s) - (\omega^T s)^2}{(s^T s)^2}.$$

This is the explicit value of $\|B - A\|_F^2$. Using $\omega = y - A s$ and noticing that $y = C s$ for all $C \in \mathcal{B}$, we have

$$\omega = y - A s = C s - A s = (C - A)\, s.$$

Proof (10/11). To compare $B$ with a generic $C \in \mathcal{B}$, set $E = C - A$ (so that $E s = \omega$ and $E = E^T$) and let $P = \dfrac{s s^T}{s^T s}$ be the orthogonal projector onto $\mathrm{span}\{s\}$. A direct computation using $E s = \omega$ shows

$$B - A = E P + P E - P E P, \qquad C - B = (I - P)\, E\, (I - P).$$

Since $P (I - P) = 0$, the two right-hand sides are orthogonal in the Frobenius inner product, $(B - A) \bullet (C - B) = 0$, hence

$$\|C - A\|_F^2 = \|B - A\|_F^2 + \|C - B\|_F^2 \geq \|B - A\|_F^2,$$

i.e. $\|A - B\|_F \leq \|A - C\|_F$ for all $C \in \mathcal{B}$.

Proof (11/11). For uniqueness, let $B$ and $B'$ be two different minima. Then $\frac{1}{2}(B + B') \in \mathcal{B}$ and

$$\left\| A - \tfrac{1}{2}(B + B') \right\|_F \leq \tfrac{1}{2}\, \|A - B\|_F + \tfrac{1}{2}\, \|A - B'\|_F.$$

If the inequality is strict we have a contradiction (the midpoint would be closer to $A$ than the minima). From the Cauchy-Schwarz inequality we have equality only when $A - B = \lambda (A - B')$, so that

$$B - \lambda B' = (1 - \lambda)\, A$$

and

$$B s - \lambda B' s = (1 - \lambda)\, A s \quad\Longrightarrow\quad (1 - \lambda)\, y = (1 - \lambda)\, A s,$$

which holds only when $\lambda = 1$, i.e. $B = B'$.

Algorithm (PSB quasi-Newton algorithm)

    k ← 0; x assigned; g ← ∇f(x)^T; B ← ∇²f(x);
    while ‖g‖ > ε do
        compute search direction d ← −B^{-1} g;  [solve the linear system B d = −g]
        approximate argmin_{α>0} f(x + α d) by line search;
        perform step x ← x + α d;
        update B:
            ω ← ∇f(x)^T + (α − 1) g;   [ω = y − B s, since B s = −α g]
            g ← ∇f(x)^T;
            β ← (α d^T d)^{-1};  γ ← β² α d^T ω;
            B ← B + β (d ω^T + ω d^T) − γ d d^T;
        k ← k + 1;
    end while
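In terms of $(B, s, y)$ rather than the $(d, \alpha, \omega)$ bookkeeping of the algorithm, the PSB update is exactly the formula of the lemma. A minimal sketch (the function name is illustrative):

    import numpy as np

    def psb_update(B, s, y):
        # PSB update: the Frobenius-closest symmetric matrix to B
        # satisfying the secant condition B_{k+1} s = y.
        w = y - B @ s                   # omega = y - B s
        ss = s @ s
        return (B + (np.outer(w, s) + np.outer(s, w)) / ss
                  - (w @ s) * np.outer(s, s) / ss**2)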

The Davidon-Fletcher-Powell rank 2 update

The SR1 and PSB updates maintain symmetry but do not maintain positive definiteness of the matrix $H_{k+1}$. To recover this further property we can try an update of the form

$$H_{k+1} = H_k + \alpha\, u u^T + \beta\, v v^T.$$

Imposing the secant condition (on the inverse) $H_{k+1} y_k = s_k$:

$$H_k y_k + \alpha (u^T y_k)\, u + \beta (v^T y_k)\, v = s_k \quad\Longrightarrow\quad \alpha (u^T y_k)\, u + \beta (v^T y_k)\, v = s_k - H_k y_k.$$

Clearly this equation does not have a unique solution. A natural choice for $u$ and $v$ is

$$u = s_k, \qquad v = H_k y_k.$$

Solving the equation for $\alpha$ and $\beta$,

$$\alpha (s_k^T y_k)\, s_k + \beta (y_k^T H_k y_k)\, H_k y_k = s_k - H_k y_k \quad\Longrightarrow\quad \alpha = \frac{1}{s_k^T y_k}, \qquad \beta = -\frac{1}{y_k^T H_k y_k},$$

and substituting in the updating formula we obtain the Davidon-Fletcher-Powell (DFP) rank 2 update formula

$$H_{k+1} = H_k + \frac{s_k s_k^T}{s_k^T y_k} - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k}.$$

Obviously this is only one of the possible choices; other solutions give different update formulas. Next we prove that, under a suitable condition, the DFP update formula maintains positive definiteness.

Positive definiteness of the DFP update

Theorem (positive definiteness of the DFP update). Given $H_k$ symmetric and positive definite, the DFP update

$$H_{k+1} = H_k + \frac{s_k s_k^T}{s_k^T y_k} - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k}$$

produces $H_{k+1}$ positive definite if and only if $s_k^T y_k > 0$.

Remark (Wolfe ⇒ DFP update is SPD). Expanding $s_k^T y_k > 0$ we have $\nabla f(x_{k+1})\, s_k > \nabla f(x_k)\, s_k$. Remember that in a minimum search algorithm $s_k = \alpha_k p_k$ with $\alpha_k > 0$, and the second Wolfe condition for the line search is

$$\nabla f(x_k + \alpha_k p_k)\, p_k \geq c_2\, \nabla f(x_k)\, p_k, \qquad 0 < c_2 < 1.$$

This implies

$$\nabla f(x_{k+1})\, s_k \geq c_2\, \nabla f(x_k)\, s_k > \nabla f(x_k)\, s_k \quad\Longrightarrow\quad s_k^T y_k > 0.$$

Proof (1/2). Let $s_k^T y_k > 0$ and consider $z \neq 0$. Then

$$z^T H_{k+1} z = z^T \left( H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} \right) z + \frac{z^T s_k s_k^T z}{s_k^T y_k} = z^T H_k z - \frac{(z^T H_k y_k)(y_k^T H_k z)}{y_k^T H_k y_k} + \frac{(z^T s_k)^2}{s_k^T y_k}.$$

$H_k$ is SPD, so there exists the Cholesky decomposition $L L^T = H_k$. Defining $a = L^T z$ and $b = L^T y_k$ we can write

$$z^T H_{k+1} z = \frac{(a^T a)(b^T b) - (a^T b)^2}{b^T b} + \frac{(z^T s_k)^2}{s_k^T y_k}.$$

From the Cauchy-Schwarz inequality we have $(a^T a)(b^T b) \geq (a^T b)^2$, so that $z^T H_{k+1} z \geq 0$.

Proof (2/2). To prove strict inequality, remember from the Cauchy-Schwarz inequality that $(a^T a)(b^T b) = (a^T b)^2$ if and only if $a = \lambda b$, i.e.

$$L^T z = \lambda L^T y_k \quad\Longleftrightarrow\quad z = \lambda y_k,$$

but in this case

$$\frac{(z^T s_k)^2}{s_k^T y_k} = \lambda^2\, \frac{(y_k^T s_k)^2}{s_k^T y_k} = \lambda^2\, (y_k^T s_k) > 0,$$

so that $z^T H_{k+1} z > 0$.

Algorithm (DFP quasi-Newton algorithm)

    k ← 0; x assigned; g ← ∇f(x)^T; H ← ∇²f(x)^{-1};
    while ‖g‖ > ε do
        compute search direction d ← −H g;
        approximate argmin_{α>0} f(x + α d) by line search;
        perform step x ← x + α d;
        update H:
            y ← ∇f(x)^T − g;  z ← H y;
            g ← ∇f(x)^T;
            H ← H + α (d d^T)/(d^T y) − (z z^T)/(y^T z);
        k ← k + 1;
    end while
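The corresponding update routine, to be plugged into the generic driver sketched earlier; it assumes the curvature condition $s^T y > 0$ (e.g. enforced by a Wolfe line search), and the function name is illustrative:

    import numpy as np

    def dfp_update(H, s, y):
        # DFP rank-2 update of the inverse-Hessian approximation.
        # Positive definiteness is preserved only if s^T y > 0.
        z = H @ y
        return H + np.outer(s, s) / (s @ y) - np.outer(z, z) / (y @ z)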

Theorem (properties of the DFP update). Let $q(x) = \frac{1}{2}(x - x^\star)^T A (x - x^\star) + c$ with $A \in \mathbb{R}^{n\times n}$ symmetric and positive definite. Let $x_0$ and $H_0$ be assigned and let $\{x_k\}$ and $\{H_k\}$ be produced by the sequence $\{s_k\}$:

1. $x_{k+1} = x_k + s_k$;
2. $H_{k+1} = H_k + \dfrac{s_k s_k^T}{s_k^T y_k} - \dfrac{H_k y_k y_k^T H_k}{y_k^T H_k y_k}$;

where $s_k = \alpha_k p_k$ and $\alpha_k$ is obtained by exact line search. Then for $j < k$ we have:

1. $g_k^T s_j = 0$;  [orthogonality property]
2. $H_k y_j = s_j$;  [hereditary property]
3. $s_k^T A s_j = 0$;  [conjugate direction property]
4. the method terminates (i.e. $\nabla f(x_m) = 0$) at $x_m = x^\star$ with $m \leq n$; if $m = n$ then $H_n = A^{-1}$.

Proof (1/4). Points (1), (2) and (3) are proved by induction. The base of the induction is obvious; suppose the statements hold up to some $k > 0$. Due to exact line search we have $g_{k+1}^T s_k = 0$; moreover, by induction, for $j < k$ we have $g_{k+1}^T s_j = 0$. In fact,

$$g_{k+1}^T s_j = g_{j+1}^T s_j + \sum_{i=j+1}^{k} (g_{i+1} - g_i)^T s_j = 0 + \sum_{i=j+1}^{k} \left( A(x_{i+1} - x^\star) - A(x_i - x^\star) \right)^T s_j = \sum_{i=j+1}^{k} \left( A(x_{i+1} - x_i) \right)^T s_j = \sum_{i=j+1}^{k} s_i^T A s_j = 0. \quad \text{[exact line search + conjugacy property]}$$

Proof (2/4). Using $s_{k+1} = -\alpha_{k+1} H_{k+1} g_{k+1}$ we have $s_{k+1}^T A s_j = 0$. In fact,

$$s_{k+1}^T A s_j = -\alpha_{k+1}\, g_{k+1}^T H_{k+1} (A x_{j+1} - A x_j) = -\alpha_{k+1}\, g_{k+1}^T H_{k+1} \left( A(x_{j+1} - x^\star) - A(x_j - x^\star) \right) = -\alpha_{k+1}\, g_{k+1}^T H_{k+1} (g_{j+1} - g_j) = -\alpha_{k+1}\, g_{k+1}^T H_{k+1} y_j = -\alpha_{k+1}\, g_{k+1}^T s_j = 0. \quad \text{[induction + hereditary and orthogonality properties]}$$

Notice that we have used $A s_j = y_j$.

Proof (3/4). By the DFP construction we have $H_{k+1} y_k = s_k$. By the inductive hypothesis, for $j < k$ we have $s_k^T y_j = s_k^T A s_j = 0$, hence from the DFP formula

$$H_{k+1} y_j = H_k y_j + \frac{s_k\, (s_k^T y_j)}{s_k^T y_k} - \frac{H_k y_k\, (y_k^T H_k y_j)}{y_k^T H_k y_k} = s_j + \frac{s_k \cdot 0}{s_k^T y_k} - \frac{H_k y_k\, (y_k^T s_j)}{y_k^T H_k y_k} \quad [H_k y_j = s_j]$$

$$= s_j - \frac{H_k y_k\, (g_{k+1} - g_k)^T s_j}{y_k^T H_k y_k} \quad [y_k = g_{k+1} - g_k] \qquad = s_j. \quad \text{[induction + orthogonality property]}$$

Proof (4/4). Finally, if $m = n$, the steps $s_j$, $j = 0, 1, \ldots, n-1$, are conjugate and linearly independent. From the hereditary property and the lemma of the SR1 section,

$$H_n A s_k = H_n y_k = s_k, \qquad k = 0, 1, \ldots, n-1,$$

and by the linear independence of $\{s_k\}$ it follows that $H_n = A^{-1}$.

The Broyden-Fletcher-Goldfarb-Shanno (BFGS) update

Another update which maintains symmetry and positive definiteness is the Broyden-Fletcher-Goldfarb-Shanno (BFGS, 1970) rank 2 update. This update was independently discovered by the four authors. A convenient way to introduce BFGS is through the concept of duality. Consider an update for the Hessian, say

$$B_{k+1} = U(B_k, s_k, y_k),$$

which satisfies $B_{k+1} s_k = y_k$ (the secant condition on the Hessian). Then, by exchanging $B_k \leftrightarrow H_k$ and $s_k \leftrightarrow y_k$, we obtain the dual update for the inverse of the Hessian, i.e.

$$H_{k+1} = U(H_k, y_k, s_k),$$

which satisfies $H_{k+1} y_k = s_k$ (the secant condition on the inverse of the Hessian).

Starting from the Davidon-Fletcher-Powell (DFP) rank 2 update formula

$$H_{k+1} = H_k + \frac{s_k s_k^T}{s_k^T y_k} - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k},$$

by duality we obtain the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update formula

$$B_{k+1} = B_k + \frac{y_k y_k^T}{y_k^T s_k} - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k}.$$

The BFGS formula written in this way is not useful for large problems: we need an equivalent formula for the inverse of the approximate Hessian. This can be obtained with a generalization of the Sherman-Morrison formula.

Sherman-Morrison-Woodbury formula (1/2)

The Sherman-Morrison-Woodbury formula permits writing explicitly the inverse of a matrix changed by a rank $k$ perturbation.

Proposition (Sherman-Morrison-Woodbury formula)

$$(A + U V^T)^{-1} = A^{-1} - A^{-1} U C^{-1} V^T A^{-1}, \qquad C = I + V^T A^{-1} U,$$

where $U = [u_1, u_2, \ldots, u_k]$ and $V = [v_1, v_2, \ldots, v_k]$. The Sherman-Morrison-Woodbury formula can be checked by a direct calculation.

Sherman-Morrison-Woodbury formula (2/2)

Remark. The previous formula can be written as

$$\left( A + \sum_{i=1}^k u_i v_i^T \right)^{-1} = A^{-1} - A^{-1} U C^{-1} V^T A^{-1}$$

where

$$C_{ij} = \delta_{ij} + v_i^T A^{-1} u_j, \qquad i, j = 1, 2, \ldots, k.$$
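The "direct calculation" can also be done numerically. The following sketch checks the formula on random data (the shifted test matrix is almost surely nonsingular and well conditioned):

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 5, 2
    A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
    U = rng.standard_normal((n, k))
    V = rng.standard_normal((n, k))

    Ainv = np.linalg.inv(A)
    C = np.eye(k) + V.T @ Ainv @ U                    # C = I + V^T A^{-1} U
    lhs = np.linalg.inv(A + U @ V.T)
    rhs = Ainv - Ainv @ U @ np.linalg.inv(C) @ V.T @ Ainv
    print(np.allclose(lhs, rhs))                      # True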

The BFGS update for H

Proposition. Using the Sherman-Morrison-Woodbury formula, the BFGS update for $H$ becomes

$$H_{k+1} = H_k - \frac{H_k y_k s_k^T + s_k y_k^T H_k}{s_k^T y_k} + \frac{s_k s_k^T}{s_k^T y_k} \left( 1 + \frac{y_k^T H_k y_k}{s_k^T y_k} \right) \tag{A}$$

or, equivalently,

$$H_{k+1} = \left( I - \frac{s_k y_k^T}{s_k^T y_k} \right) H_k \left( I - \frac{y_k s_k^T}{s_k^T y_k} \right) + \frac{s_k s_k^T}{s_k^T y_k}. \tag{B}$$
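Form (B) translates directly into code. A minimal sketch; the skip of the update when the curvature condition fails is a pragmatic safeguard added here, not part of the slides:

    import numpy as np

    def bfgs_update(H, s, y):
        # BFGS update of the inverse-Hessian approximation, form (B):
        # H <- (I - s y^T/(s^T y)) H (I - y s^T/(s^T y)) + s s^T/(s^T y).
        sy = s @ y
        if sy <= 1e-12 * np.linalg.norm(s) * np.linalg.norm(y):
            return H                    # skip update when s^T y > 0 fails
        V = np.eye(len(s)) - np.outer(s, y) / sy
        return V @ H @ V.T + np.outer(s, s) / sy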

Proof (1/3). Consider the Sherman-Morrison-Woodbury formula with $k = 2$ and

$$u_1 = v_1 = \frac{y_k}{(s_k^T y_k)^{1/2}}, \qquad u_2 = -v_2 = -\frac{B_k s_k}{(s_k^T B_k s_k)^{1/2}},$$

so that $B_{k+1} = B_k + u_1 v_1^T + u_2 v_2^T$. In this way (setting $H_k = B_k^{-1}$) we have

$$C_{11} = 1 + v_1^T B_k^{-1} u_1 = 1 + \frac{y_k^T H_k y_k}{s_k^T y_k}, \qquad C_{22} = 1 + v_2^T B_k^{-1} u_2 = 1 - \frac{s_k^T B_k B_k^{-1} B_k s_k}{s_k^T B_k s_k} = 1 - 1 = 0,$$

$$C_{12} = v_1^T B_k^{-1} u_2 = -\frac{y_k^T H_k B_k s_k}{(s_k^T y_k)^{1/2} (s_k^T B_k s_k)^{1/2}} = -\frac{(s_k^T y_k)^{1/2}}{(s_k^T B_k s_k)^{1/2}}, \qquad C_{21} = v_2^T B_k^{-1} u_1 = +\frac{(s_k^T y_k)^{1/2}}{(s_k^T B_k s_k)^{1/2}}.$$

Proof (2/3). In this way the matrix $C$ has the form

$$C = \begin{pmatrix} \beta & -\alpha \\ \alpha & 0 \end{pmatrix}, \qquad C^{-1} = \frac{1}{\alpha^2} \begin{pmatrix} 0 & \alpha \\ -\alpha & \beta \end{pmatrix},$$

where

$$\beta = 1 + \frac{y_k^T H_k y_k}{s_k^T y_k}, \qquad \alpha = \frac{(s_k^T y_k)^{1/2}}{(s_k^T B_k s_k)^{1/2}}.$$

Setting $\tilde U = H_k U$ and $\tilde V = H_k V$, i.e. $\tilde u_i = H_k u_i$ and $\tilde v_i = H_k v_i$ for $i = 1, 2$, we have

$$H_{k+1} = H_k - H_k U C^{-1} V^T H_k = H_k - \tilde U C^{-1} \tilde V^T.$$

Proof (3/3). Notice that (the matrix product is $\mathbb{R}^{n\times 2} \cdot \mathbb{R}^{2\times 2} \cdot \mathbb{R}^{2\times n}$)

$$\tilde U C^{-1} \tilde V^T = \frac{1}{\alpha^2} \begin{pmatrix} \tilde u_1 & \tilde u_2 \end{pmatrix} \begin{pmatrix} 0 & \alpha \\ -\alpha & \beta \end{pmatrix} \begin{pmatrix} \tilde v_1^T \\ \tilde v_2^T \end{pmatrix} = \frac{1}{\alpha} \left( \tilde u_1 \tilde v_2^T - \tilde u_2 \tilde v_1^T \right) + \frac{\beta}{\alpha^2}\, \tilde u_2 \tilde v_2^T.$$

Substituting the values of $\alpha$, $\beta$, $\tilde u_1 = \tilde v_1 = H_k y_k / (s_k^T y_k)^{1/2}$ and $\tilde u_2 = -\tilde v_2 = -s_k / (s_k^T B_k s_k)^{1/2}$, we obtain

$$H_{k+1} = H_k - \frac{H_k y_k s_k^T + s_k y_k^T H_k}{s_k^T y_k} + \frac{s_k s_k^T}{s_k^T y_k} \left( 1 + \frac{y_k^T H_k y_k}{s_k^T y_k} \right).$$

At this point the update formula (B) follows by a straightforward calculation.

Positive definiteness of the BFGS update

Theorem (positive definiteness of the BFGS update). Given $H_k$ symmetric and positive definite, the BFGS update

$$H_{k+1} = \left( I - \frac{s_k y_k^T}{s_k^T y_k} \right) H_k \left( I - \frac{y_k s_k^T}{s_k^T y_k} \right) + \frac{s_k s_k^T}{s_k^T y_k}$$

produces $H_{k+1}$ positive definite if and only if $s_k^T y_k > 0$.

Remark (Wolfe ⇒ BFGS update is SPD). Expanding $s_k^T y_k > 0$ we have $\nabla f(x_{k+1})\, s_k > \nabla f(x_k)\, s_k$. Remember that in a minimum search algorithm $s_k = \alpha_k p_k$ with $\alpha_k > 0$, and the second Wolfe condition for the line search is

$$\nabla f(x_k + \alpha_k p_k)\, p_k \geq c_2\, \nabla f(x_k)\, p_k, \qquad 0 < c_2 < 1.$$

This implies

$$\nabla f(x_{k+1})\, s_k \geq c_2\, \nabla f(x_k)\, s_k > \nabla f(x_k)\, s_k \quad\Longrightarrow\quad s_k^T y_k > 0.$$

Proof. Let $s_k^T y_k > 0$ and consider $z \neq 0$. Then

$$z^T H_{k+1} z = w^T H_k w + \frac{(z^T s_k)^2}{s_k^T y_k}, \qquad w = z - \frac{y_k\, (s_k^T z)}{s_k^T y_k}.$$

In order to have $z^T H_{k+1} z = 0$ we must have both $w = 0$ and $z^T s_k = 0$. But $z^T s_k = 0$ implies $w = z$, and then $w = 0$ implies $z = 0$, a contradiction; hence $z^T H_{k+1} z > 0$.

Conversely, let $z^T H_{k+1} z > 0$ for all $z \neq 0$. Choosing $z = y_k$ (so that $w = 0$) we have

$$0 < y_k^T H_{k+1} y_k = \frac{(s_k^T y_k)^2}{s_k^T y_k} = s_k^T y_k,$$

and thus $s_k^T y_k > 0$.

Algorithm (BFGS quasi-Newton algorithm)

    k ← 0; x assigned; g ← ∇f(x)^T; H ← ∇²f(x)^{-1};
    while ‖g‖ > ε do
        compute search direction d ← −H g;
        approximate argmin_{α>0} f(x + α d) by line search;
        perform step x ← x + α d;
        update H:
            y ← ∇f(x)^T − g;  z ← H y;
            g ← ∇f(x)^T;
            H ← H − (z d^T + d z^T)/(d^T y) + (α + (y^T z)/(d^T y)) (d d^T)/(d^T y);
        k ← k + 1;
    end while
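Putting the pieces together: with the generic driver and the bfgs_update routine sketched above, a standard smoke test (not from the slides) is the Rosenbrock function. Note that plain Armijo backtracking does not guarantee $s^T y > 0$, which is why the sketched bfgs_update skips unsafe updates; a robust code would use a Wolfe line search instead.

    import numpy as np

    def rosenbrock(x):
        return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

    def rosenbrock_grad(x):
        return np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                         200 * (x[1] - x[0]**2)])

    x_star = quasi_newton(rosenbrock, rosenbrock_grad,
                          x0=[-1.2, 1.0], update=bfgs_update)
    print(x_star)   # expected: close to the minimizer [1, 1]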

Theorem (properties of the BFGS update). Let $q(x) = \frac{1}{2}(x - x^\star)^T A (x - x^\star) + c$ with $A \in \mathbb{R}^{n\times n}$ symmetric and positive definite. Let $x_0$ and $H_0$ be assigned and let $\{x_k\}$ and $\{H_k\}$ be produced by the sequence $\{s_k\}$:

1. $x_{k+1} = x_k + s_k$;
2. $H_{k+1} = \left( I - \dfrac{s_k y_k^T}{s_k^T y_k} \right) H_k \left( I - \dfrac{y_k s_k^T}{s_k^T y_k} \right) + \dfrac{s_k s_k^T}{s_k^T y_k}$;

where $s_k = \alpha_k p_k$ and $\alpha_k$ is obtained by exact line search. Then for $j < k$ we have:

1. $g_k^T s_j = 0$;  [orthogonality property]
2. $H_k y_j = s_j$;  [hereditary property]
3. $s_k^T A s_j = 0$;  [conjugate direction property]
4. the method terminates (i.e. $\nabla f(x_m) = 0$) at $x_m = x^\star$ with $m \leq n$; if $m = n$ then $H_n = A^{-1}$.

Proof (1/4) and (2/4). The orthogonality property $g_{k+1}^T s_j = 0$ and the conjugate direction property $s_{k+1}^T A s_j = 0$ are proved exactly as in the corresponding steps of the DFP proof above, which use only the exact line search, the hereditary property and $A s_j = y_j$.

Proof (3/4). By the BFGS construction we have $H_{k+1} y_k = s_k$. By the inductive hypothesis, for $j < k$ we have $s_k^T y_j = s_k^T A s_j = 0$ and $y_k^T s_j = 0$, hence

$$H_{k+1} y_j = \left( I - \frac{s_k y_k^T}{s_k^T y_k} \right) H_k \left( y_j - \frac{y_k\, (s_k^T y_j)}{s_k^T y_k} \right) + \frac{s_k\, (s_k^T y_j)}{s_k^T y_k} = \left( I - \frac{s_k y_k^T}{s_k^T y_k} \right) H_k y_j + \frac{s_k \cdot 0}{s_k^T y_k}$$

$$= s_j - \frac{y_k^T s_j}{s_k^T y_k}\, s_k \quad [H_k y_j = s_j] \qquad = s_j. \quad [y_k^T s_j = 0]$$

Proof (4/4). As in the DFP case: if $m = n$, the steps $s_j$ are conjugate and linearly independent, so the hereditary property and the lemma of the SR1 section give $H_n A s_k = H_n y_k = s_k$ for $k = 0, 1, \ldots, n-1$, hence $H_n = A^{-1}$.

The Broyden class

The DFP update

$$H_{k+1}^{DFP} = H_k + \frac{s_k s_k^T}{s_k^T y_k} - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k}$$

and the BFGS update

$$H_{k+1}^{BFGS} = H_k - \frac{H_k y_k s_k^T + s_k y_k^T H_k}{s_k^T y_k} + \frac{s_k s_k^T}{s_k^T y_k} \left( 1 + \frac{y_k^T H_k y_k}{s_k^T y_k} \right)$$

both maintain symmetry and positive definiteness. The following update (sketched in code below)

$$H_{k+1}^{\theta} = (1 - \theta)\, H_{k+1}^{DFP} + \theta\, H_{k+1}^{BFGS}$$

maintains symmetry for any $\theta$, and for $\theta \in [0, 1]$ also positive definiteness.
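A sketch of the Broyden class update, reusing the dfp_update and bfgs_update routines defined in the earlier sketches:

    def broyden_class_update(H, s, y, theta):
        # Convex combination of the DFP and BFGS updates:
        # theta = 0 gives DFP, theta = 1 gives BFGS; for theta in [0, 1]
        # positive definiteness is preserved whenever s^T y > 0.
        return (1 - theta) * dfp_update(H, s, y) + theta * bfgs_update(H, s, y)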

Positive definiteness of the Broyden class update

Theorem (positive definiteness of the Broyden class update). Given $H_k$ symmetric and positive definite, the Broyden class update

$$H_{k+1}^{\theta} = (1 - \theta)\, H_{k+1}^{DFP} + \theta\, H_{k+1}^{BFGS}$$

produces $H_{k+1}^{\theta}$ positive definite for any $\theta \in [0, 1]$ if and only if $s_k^T y_k > 0$.

Theorem (properties of the Broyden class update). Let $q(x) = \frac{1}{2}(x - x^\star)^T A (x - x^\star) + c$ with $A \in \mathbb{R}^{n\times n}$ symmetric and positive definite. Let $x_0$ and $H_0$ be assigned and let $\{x_k\}$ and $\{H_k\}$ be produced by the sequence $\{s_k\}$:

1. $x_{k+1} = x_k + s_k$;
2. $H_{k+1}^{\theta} = (1 - \theta)\, H_{k+1}^{DFP} + \theta\, H_{k+1}^{BFGS}$;

where $s_k = \alpha_k p_k$ and $\alpha_k$ is obtained by exact line search. Then for $j < k$ we have:

1. $g_k^T s_j = 0$;  [orthogonality property]
2. $H_k y_j = s_j$;  [hereditary property]
3. $s_k^T A s_j = 0$;  [conjugate direction property]
4. the method terminates (i.e. $\nabla f(x_m) = 0$) at $x_m = x^\star$ with $m \leq n$; if $m = n$ then $H_n = A^{-1}$.

The Broyden class update can also be written as

$$H_{k+1}^{\theta} = H_{k+1}^{DFP} + \theta\, w_k w_k^T = H_{k+1}^{BFGS} + (\theta - 1)\, w_k w_k^T$$

where

$$w_k = \left( y_k^T H_k y_k \right)^{1/2} \left[ \frac{s_k}{s_k^T y_k} - \frac{H_k y_k}{y_k^T H_k y_k} \right].$$

For particular values of $\theta$ we obtain:

1. $\theta = 0$: the DFP update;
2. $\theta = 1$: the BFGS update;
3. $\theta = s_k^T y_k / (s_k - H_k y_k)^T y_k$: the SR1 update;
4. $\theta = \left( 1 + y_k^T H_k y_k / s_k^T y_k \right)^{-1}$: the Hoshino update.

References

J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, Springer-Verlag, Texts in Applied Mathematics, 12, 2002.

J. E. Dennis, Jr. and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, SIAM, Classics in Applied Mathematics, 16, 1996.