Quasi-Newton methods for minimization


Lectures for the PhD course on Numerical Optimization
Enrico Bertolazzi, DIMS, Università di Trento
November 21 to December 14, 2011

Outline

1 Quasi-Newton method
2 The symmetric rank one update
3 The Powell-symmetric-Broyden update
4 The Davidon-Fletcher-Powell rank 2 update
5 The Broyden-Fletcher-Goldfarb-Shanno (BFGS) update
6 The Broyden class

Quasi-Newton method

Algorithm (General quasi-Newton algorithm)

    k ← 0; x_0 assigned; g_0 ← ∇f(x_0)^T; H_0 ← ∇²f(x_0)^{-1};
    while ‖g_k‖ > ε do
        compute search direction d_k ← −H_k g_k;
        approximate argmin_{α>0} f(x_k + α d_k) by line search;
        perform step x_{k+1} ← x_k + α_k d_k;
        g_{k+1} ← ∇f(x_{k+1})^T;
        update H_{k+1} ← some algorithm(H_k, x_k, x_{k+1}, g_k, g_{k+1});
        k ← k + 1;
    end while
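As a concrete companion to this pseudocode, here is a minimal NumPy sketch of the generic driver. The names quasi_newton, backtracking and update are illustrative, not part of the slides; the Armijo backtracking is a crude stand-in for the Wolfe line search discussed later, and H_0 is taken as the identity instead of the true inverse Hessian.

    import numpy as np

    def backtracking(f, grad, x, d, a=1.0, rho=0.5, c=1e-4, max_back=60):
        # Armijo backtracking: a simple stand-in for a Wolfe line search.
        gd = grad(x) @ d
        while f(x + a * d) > f(x) + c * a * gd and max_back > 0:
            a *= rho
            max_back -= 1
        return a

    def quasi_newton(f, grad, x0, update, eps=1e-8, max_iter=500):
        # Generic quasi-Newton loop: only the update of H changes
        # between SR1, PSB, DFP, BFGS and the Broyden class.
        x = np.asarray(x0, dtype=float)
        g = grad(x)
        H = np.eye(x.size)              # H_0 = I (cheap substitute for the inverse Hessian)
        for _ in range(max_iter):
            if np.linalg.norm(g) <= eps:
                break
            d = -H @ g                  # search direction d_k = -H_k g_k
            a = backtracking(f, grad, x, d)
            x_new = x + a * d           # step x_{k+1} = x_k + alpha_k d_k
            g_new = grad(x_new)
            H = update(H, x_new - x, g_new - g)   # H_{k+1} from (H_k, s_k, y_k)
            x, g = x_new, g_new
        return x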

The symmetric rank one update

Let $B_k$ be an approximation of the Hessian of $f(x)$, and let $x_k$, $x_{k+1}$, $g_k$, $g_{k+1}$ be the points and gradients at the $k$-th and $(k+1)$-th iterates. Using the Broyden update formula to force the secant condition on $B_{k+1}$ we obtain

$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k}$$

where $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$. Using the Sherman-Morrison formula and setting $H_k = B_k^{-1}$ we obtain the update

$$H_{k+1} = H_k - \frac{(H_k y_k - s_k)\, s_k^T H_k}{s_k^T H_k y_k}$$

This update does not maintain symmetry: even if $H_k$ is symmetric, $H_{k+1}$ is in general not symmetric.

To avoid the loss of symmetry we can consider an update of the form

$$H_{k+1} = H_k + u u^T$$

Imposing the secant condition (on the inverse) $H_{k+1} y_k = s_k$ gives

$$H_k y_k + u\,(u^T y_k) = s_k$$

so $u$ must be parallel to $s_k - H_k y_k$. Multiplying the previous equality by $y_k^T$,

$$y_k^T H_k y_k + (y_k^T u)^2 = y_k^T s_k \quad\Longrightarrow\quad y_k^T u = \left( y_k^T s_k - y_k^T H_k y_k \right)^{1/2}$$

and therefore

$$u = \frac{s_k - H_k y_k}{u^T y_k} = \frac{s_k - H_k y_k}{\left( y_k^T s_k - y_k^T H_k y_k \right)^{1/2}}$$

Substituting the expression of $u$,

$$u = \frac{s_k - H_k y_k}{\left( y_k^T s_k - y_k^T H_k y_k \right)^{1/2}},$$

into the update formula, we obtain

$$H_{k+1} = H_k + \frac{w_k w_k^T}{w_k^T y_k}, \qquad w_k = s_k - H_k y_k$$

This is the symmetric rank one (SR1) formula. To be well defined it requires $w_k^T y_k \neq 0$. Moreover, if $w_k^T y_k < 0$ and $H_k$ is positive definite, then $H_{k+1}$ may lose positive definiteness. Having $H_k$ symmetric and positive definite is important for global convergence.

The following lemma is used in the theorems below.

Lemma. Let $q(x) = \frac{1}{2} x^T A x - b^T x + c$ with $A \in \mathbb{R}^{n\times n}$ symmetric and positive definite. Then

$$y_k = g_{k+1} - g_k = A x_{k+1} - b - A x_k + b = A s_k$$

where $g_k = \nabla q(x_k)^T$.

Theorem (quadratic termination of the SR1 update). Let $q(x) = \frac{1}{2} x^T A x - b^T x + c$ with $A \in \mathbb{R}^{n\times n}$ symmetric and positive definite. Let $x_0$ and $H_0$ be assigned, and let $x_k$ and $H_k$ be produced by

1. $x_{k+1} = x_k + s_k$;
2. $H_{k+1} = H_k + \dfrac{w_k w_k^T}{w_k^T y_k}$, $\quad w_k = s_k - H_k y_k$ (the SR1 formula).

If $s_0, s_1, \ldots, s_{n-1}$ are linearly independent, then $H_n = A^{-1}$.

Proof (1/2). We prove by induction the hereditary property $H_i y_j = s_j$ for $j < i$.

Base: for $i = 1$ this is exactly the secant condition of the update.

Induction: suppose the relation holds up to some $k > 0$; we prove it for $k + 1$. From the update formula,

$$H_{k+1} y_j = H_k y_j + \frac{w_k^T y_j}{w_k^T y_k}\, w_k, \qquad w_k = s_k - H_k y_k.$$

By the induction hypothesis, for $j < k$, and using the lemma above ($y_j = A s_j$, $y_k = A s_k$),

$$w_k^T y_j = s_k^T y_j - y_k^T H_k y_j = s_k^T y_j - y_k^T s_j = s_k^T A s_j - s_k^T A s_j = 0,$$

so that $H_{k+1} y_j = H_k y_j = s_j$ for $j = 0, 1, \ldots, k-1$. For $j = k$ we have $H_{k+1} y_k = s_k$ trivially by construction of the SR1 formula.

Proof (2/2). To prove that $H_n = A^{-1}$ notice that

$$H_n y_j = s_j, \qquad A s_j = y_j, \qquad j = 0, 1, \ldots, n-1,$$

and combining the two equalities,

$$H_n A s_j = s_j, \qquad j = 0, 1, \ldots, n-1.$$

By the linear independence of the $s_j$ we have $H_n A = I$, i.e. $H_n = A^{-1}$.
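The finite-termination property is easy to check numerically. A small sketch, assuming random steps (which are linearly independent, and give $w_k^T y_k \neq 0$, with probability one):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)        # SPD matrix playing the role of the Hessian
    H = np.eye(n)                      # arbitrary symmetric H_0

    for _ in range(n):
        s = rng.standard_normal(n)     # random steps: linearly independent a.s.
        y = A @ s                      # lemma: y_k = A s_k on a quadratic
        w = s - H @ y
        H = H + np.outer(w, w) / (w @ y)   # SR1 update
    print(np.allclose(H, np.linalg.inv(A)))  # True: H_n = A^{-1}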

Properties of the SR1 update (1/2)

1. The SR1 update possesses the natural quadratic termination property (like CG).
2. SR1 satisfies the hereditary property $H_k y_j = s_j$ for $j < k$.
3. SR1 maintains positive definiteness of $H_k$ if and only if $w_k^T y_k > 0$; however, this condition is difficult to guarantee.
4. Sometimes $w_k^T y_k$ becomes very small or zero. This results in serious numerical difficulty (roundoff), or the algorithm even breaks down. We can avoid this breakdown with the following strategy.

Breakdown workaround for the SR1 update (see the sketch below):
1. If $|w_k^T y_k| \geq \epsilon\, \|w_k\|\, \|y_k\|$ (i.e. the angle between $w_k$ and $y_k$ is far from 90 degrees), then update with the SR1 formula.
2. Otherwise set $H_{k+1} = H_k$.
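A sketch of the SR1 update with this safeguard; the function name and the tolerance value are illustrative, not prescribed by the slides:

    import numpy as np

    def sr1_update(H, s, y, eps=1e-8):
        # SR1 update of the inverse-Hessian approximation with the
        # breakdown workaround: skip the update when w is nearly
        # orthogonal to y, since dividing by w^T y would be unstable.
        w = s - H @ y
        wy = w @ y
        if abs(wy) >= eps * np.linalg.norm(w) * np.linalg.norm(y):
            return H + np.outer(w, w) / wy
        return H                        # otherwise keep H_{k+1} = H_k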

Properties of the SR1 update (2/2)

Theorem (convergence of the nonlinear SR1 update). Let $f(x)$ satisfy standard assumptions and let $\{x_k\}$ be a sequence of iterates such that $\lim_{k\to\infty} x_k = x^\star$. Suppose we use the breakdown workaround for the SR1 update and that the steps $\{s_k\}$ are uniformly linearly independent. Then

$$\lim_{k\to\infty} \left\| H_k - \nabla^2 f(x^\star)^{-1} \right\| = 0.$$

A. R. Conn, N. I. M. Gould and Ph. L. Toint, Convergence of quasi-Newton matrices generated by the symmetric rank one update, Mathematics of Computation, 50, 399–430, 1988.

The Powell-symmetric-Broyden update

The SR1 update, although symmetric, does not have a minimum property like the one the Broyden update enjoys in the nonsymmetric case. The Broyden update

$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k}$$

solves the minimization problem

$$\|B_{k+1} - B_k\|_F \leq \|B - B_k\|_F \qquad \text{for all } B \text{ such that } B s_k = y_k.$$

If we solve a similar problem in the class of symmetric matrices we obtain the Powell-symmetric-Broyden (PSB) update.

Lemma (Powell-symmetric-Broyden update). Let $A \in \mathbb{R}^{n\times n}$ be symmetric and $s, y \in \mathbb{R}^n$ with $s \neq 0$. Consider the set

$$\mathcal{B} = \left\{ B \in \mathbb{R}^{n\times n} \;\middle|\; B s = y,\; B = B^T \right\}.$$

If $s^T y \neq 0$ (which is true if a Wolfe line search is performed), then there exists a unique matrix $B \in \mathcal{B}$ such that

$$\|A - B\|_F \leq \|A - C\|_F \qquad \text{for all } C \in \mathcal{B};$$

moreover $B$ has the following form:

$$B = A + \frac{\omega s^T + s \omega^T}{s^T s} - (\omega^T s)\, \frac{s s^T}{(s^T s)^2}, \qquad \omega = y - A s.$$

Thus $B$ is a rank two perturbation of the matrix $A$.

Proof (1/11). First of all notice that $\mathcal{B}$ is not empty; in fact

$$\left[ \frac{1}{s^T y}\, y y^T \right] s = y, \qquad \frac{1}{s^T y}\, y y^T \text{ is symmetric},$$

so the problem is well posed. Next we reformulate it as a constrained minimum problem:

$$\arg\min_{B \in \mathbb{R}^{n\times n}} \frac{1}{2} \sum_{i,j=1}^n (A_{ij} - B_{ij})^2 \qquad \text{subject to } B s = y \text{ and } B = B^T.$$

The solution is a stationary point of the Lagrangian

$$g(B, \lambda, M) = \frac{1}{2}\, \|A - B\|_F^2 + \lambda^T (B s - y) + \sum_{i<j} \mu_{ij} (B_{ij} - B_{ji}).$$

Proof (2/11). Taking the gradient (with the sign convention absorbed into the multipliers) we have

$$\frac{\partial g}{\partial B_{ij}}(B, \lambda, M) = A_{ij} - B_{ij} + \lambda_i s_j + M_{ij} = 0,$$

where

$$M_{ij} = \begin{cases} \mu_{ij} & \text{if } i < j, \\ -\mu_{ji} & \text{if } i > j, \\ 0 & \text{if } i = j. \end{cases}$$

The previous equality can be written in matrix form as $B = A + \lambda s^T + M$ with $M$ antisymmetric.

Proof (3/11). Imposing symmetry for $B$:

$$A + \lambda s^T + M = A^T + s \lambda^T + M^T = A + s \lambda^T - M.$$

Solving for $M$ we have

$$M = \frac{s \lambda^T - \lambda s^T}{2}$$

and substituting in $B$,

$$B = A + \frac{s \lambda^T + \lambda s^T}{2}.$$

Proof (4/11). Imposing $s^T B s = s^T y$:

$$s^T A s + \frac{(s^T s)(\lambda^T s) + (s^T \lambda)(s^T s)}{2} = s^T y \quad\Longrightarrow\quad \lambda^T s = \frac{s^T \omega}{s^T s}, \qquad \omega = y - A s.$$

Imposing $B s = y$:

$$A s + \frac{s\, (\lambda^T s) + \lambda\, (s^T s)}{2} = y \quad\Longrightarrow\quad \lambda = \frac{2 \omega}{s^T s} - \frac{(s^T \omega)\, s}{(s^T s)^2}.$$

Next we compute the explicit form of $B$.

Proof (5/11). Substituting

$$\lambda = \frac{2 \omega}{s^T s} - \frac{(s^T \omega)\, s}{(s^T s)^2}$$

in

$$B = A + \frac{s \lambda^T + \lambda s^T}{2}$$

we obtain

$$B = A + \frac{\omega s^T + s \omega^T}{s^T s} - (\omega^T s)\, \frac{s s^T}{(s^T s)^2}, \qquad \omega = y - A s.$$

Next we prove that $B$ is the unique minimum.

Proof (6/11). To show that $B$ is a minimum we first compute

$$\|B - A\|_F = \left\| \frac{\omega s^T + s \omega^T}{s^T s} - (\omega^T s)\, \frac{s s^T}{(s^T s)^2} \right\|_F.$$

We use the following property of the Frobenius norm:

$$\|M - N\|_F^2 = \|M\|_F^2 + \|N\|_F^2 - 2\, M \bullet N, \qquad M \bullet N = \sum_{ij} M_{ij} N_{ij},$$

setting

$$M = \frac{\omega s^T + s \omega^T}{s^T s}, \qquad N = (\omega^T s)\, \frac{s s^T}{(s^T s)^2}.$$

Now we compute $\|M\|_F$, $\|N\|_F$ and $M \bullet N$.

Proof (7/11).

$$M \bullet N = \frac{\omega^T s}{(s^T s)^3} \sum_{ij} (\omega_i s_j + \omega_j s_i)\, s_i s_j = \frac{\omega^T s}{(s^T s)^3} \sum_{ij} \left[ (\omega_i s_i)\, s_j^2 + (\omega_j s_j)\, s_i^2 \right] = \frac{\omega^T s}{(s^T s)^3} \left[ (\omega^T s)(s^T s) + (\omega^T s)(s^T s) \right] = \frac{2\, (\omega^T s)^2}{(s^T s)^2}.$$

Proof (8/11). To compute $\|N\|_F^2$ and $\|M\|_F^2$ we need the following properties of the Frobenius norm:

$$\|u v^T\|_F^2 = (u^T u)(v^T v), \qquad \|u v^T + v u^T\|_F^2 = 2\, (u^T u)(v^T v) + 2\, (u^T v)^2.$$

Then we have

$$\|N\|_F^2 = \frac{(\omega^T s)^2}{(s^T s)^4}\, \|s s^T\|_F^2 = \frac{(\omega^T s)^2}{(s^T s)^4}\, (s^T s)^2 = \frac{(\omega^T s)^2}{(s^T s)^2},$$

$$\|M\|_F^2 = \left\| \frac{\omega s^T + s \omega^T}{s^T s} \right\|_F^2 = \frac{2\, (\omega^T \omega)(s^T s) + 2\, (s^T \omega)^2}{(s^T s)^2}.$$

Proof (9/11). Putting all together,

$$\|M - N\|_F^2 = \frac{2\, (\omega^T \omega)(s^T s) + 2\, (s^T \omega)^2}{(s^T s)^2} + \frac{(\omega^T s)^2}{(s^T s)^2} - \frac{4\, (\omega^T s)^2}{(s^T s)^2} = \frac{2\, (\omega^T \omega)(s^T s) - (\omega^T s)^2}{(s^T s)^2}.$$

This is the explicit value of $\|B - A\|_F^2$. Using $\omega = y - A s$ and noticing that $y = C s$ for all $C \in \mathcal{B}$, we have

$$\omega = y - A s = C s - A s = (C - A)\, s.$$

Proof (10/11). To compare $B$ with a generic $C \in \mathcal{B}$, set $E = C - A$ (so that $E s = \omega$ and $E = E^T$) and let $P = \dfrac{s s^T}{s^T s}$ be the orthogonal projector onto $\mathrm{span}\{s\}$. A direct computation using $E s = \omega$ shows

$$B - A = E P + P E - P E P, \qquad C - B = (I - P)\, E\, (I - P).$$

Since $P (I - P) = 0$, the two right-hand sides are orthogonal in the Frobenius inner product, $(B - A) \bullet (C - B) = 0$, hence

$$\|C - A\|_F^2 = \|B - A\|_F^2 + \|C - B\|_F^2 \geq \|B - A\|_F^2,$$

i.e. $\|A - B\|_F \leq \|A - C\|_F$ for all $C \in \mathcal{B}$.

Proof (11/11). For uniqueness, let $B$ and $B'$ be two different minima. Then $\frac{1}{2}(B + B') \in \mathcal{B}$ and

$$\left\| A - \tfrac{1}{2}(B + B') \right\|_F \leq \tfrac{1}{2}\, \|A - B\|_F + \tfrac{1}{2}\, \|A - B'\|_F.$$

If the inequality is strict we have a contradiction (the midpoint would be closer to $A$ than the minima). From the Cauchy-Schwarz inequality we have equality only when $A - B = \lambda (A - B')$, so that

$$B - \lambda B' = (1 - \lambda)\, A$$

and

$$B s - \lambda B' s = (1 - \lambda)\, A s \quad\Longrightarrow\quad (1 - \lambda)\, y = (1 - \lambda)\, A s,$$

which holds only when $\lambda = 1$, i.e. $B = B'$.

Algorithm (PSB quasi-Newton algorithm)

    k ← 0; x assigned; g ← ∇f(x)^T; B ← ∇²f(x);
    while ‖g‖ > ε do
        compute search direction d ← −B^{-1} g;  [solve the linear system B d = −g]
        approximate argmin_{α>0} f(x + α d) by line search;
        perform step x ← x + α d;
        update B:
            ω ← ∇f(x)^T + (α − 1) g;   [ω = y − B s, since B s = −α g]
            g ← ∇f(x)^T;
            β ← (α d^T d)^{-1};  γ ← β² α d^T ω;
            B ← B + β (d ω^T + ω d^T) − γ d d^T;
        k ← k + 1;
    end while
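In terms of $(B, s, y)$ rather than the $(d, \alpha, \omega)$ bookkeeping of the algorithm, the PSB update is exactly the formula of the lemma. A minimal sketch (the function name is illustrative):

    import numpy as np

    def psb_update(B, s, y):
        # PSB update: the Frobenius-closest symmetric matrix to B
        # satisfying the secant condition B_{k+1} s = y.
        w = y - B @ s                   # omega = y - B s
        ss = s @ s
        return (B + (np.outer(w, s) + np.outer(s, w)) / ss
                  - (w @ s) * np.outer(s, s) / ss**2)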

The Davidon-Fletcher-Powell rank 2 update

The SR1 and PSB updates maintain symmetry but do not maintain positive definiteness of the matrix $H_{k+1}$. To recover this further property we can try an update of the form

$$H_{k+1} = H_k + \alpha\, u u^T + \beta\, v v^T.$$

Imposing the secant condition (on the inverse) $H_{k+1} y_k = s_k$:

$$H_k y_k + \alpha (u^T y_k)\, u + \beta (v^T y_k)\, v = s_k \quad\Longrightarrow\quad \alpha (u^T y_k)\, u + \beta (v^T y_k)\, v = s_k - H_k y_k.$$

Clearly this equation does not have a unique solution. A natural choice for $u$ and $v$ is

$$u = s_k, \qquad v = H_k y_k.$$

Solving the equation for $\alpha$ and $\beta$,

$$\alpha (s_k^T y_k)\, s_k + \beta (y_k^T H_k y_k)\, H_k y_k = s_k - H_k y_k \quad\Longrightarrow\quad \alpha = \frac{1}{s_k^T y_k}, \qquad \beta = -\frac{1}{y_k^T H_k y_k},$$

and substituting in the updating formula we obtain the Davidon-Fletcher-Powell (DFP) rank 2 update formula

$$H_{k+1} = H_k + \frac{s_k s_k^T}{s_k^T y_k} - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k}.$$

Obviously this is only one of the possible choices; other solutions give different update formulas. Next we prove that, under a suitable condition, the DFP update formula maintains positive definiteness.

Positive definiteness of the DFP update

Theorem (positive definiteness of the DFP update). Given $H_k$ symmetric and positive definite, the DFP update

$$H_{k+1} = H_k + \frac{s_k s_k^T}{s_k^T y_k} - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k}$$

produces $H_{k+1}$ positive definite if and only if $s_k^T y_k > 0$.

Remark (Wolfe ⇒ DFP update is SPD). Expanding $s_k^T y_k > 0$ we have $\nabla f(x_{k+1})\, s_k > \nabla f(x_k)\, s_k$. Remember that in a minimum search algorithm $s_k = \alpha_k p_k$ with $\alpha_k > 0$, and the second Wolfe condition for the line search is

$$\nabla f(x_k + \alpha_k p_k)\, p_k \geq c_2\, \nabla f(x_k)\, p_k, \qquad 0 < c_2 < 1.$$

This implies

$$\nabla f(x_{k+1})\, s_k \geq c_2\, \nabla f(x_k)\, s_k > \nabla f(x_k)\, s_k \quad\Longrightarrow\quad s_k^T y_k > 0.$$

Proof (1/2). Let $s_k^T y_k > 0$ and consider $z \neq 0$. Then

$$z^T H_{k+1} z = z^T \left( H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} \right) z + \frac{z^T s_k s_k^T z}{s_k^T y_k} = z^T H_k z - \frac{(z^T H_k y_k)(y_k^T H_k z)}{y_k^T H_k y_k} + \frac{(z^T s_k)^2}{s_k^T y_k}.$$

$H_k$ is SPD, so there exists the Cholesky decomposition $L L^T = H_k$. Defining $a = L^T z$ and $b = L^T y_k$ we can write

$$z^T H_{k+1} z = \frac{(a^T a)(b^T b) - (a^T b)^2}{b^T b} + \frac{(z^T s_k)^2}{s_k^T y_k}.$$

From the Cauchy-Schwarz inequality we have $(a^T a)(b^T b) \geq (a^T b)^2$, so that $z^T H_{k+1} z \geq 0$.

Proof (2/2). To prove strict inequality, remember from the Cauchy-Schwarz inequality that $(a^T a)(b^T b) = (a^T b)^2$ if and only if $a = \lambda b$, i.e.

$$L^T z = \lambda L^T y_k \quad\Longleftrightarrow\quad z = \lambda y_k,$$

but in this case

$$\frac{(z^T s_k)^2}{s_k^T y_k} = \lambda^2\, \frac{(y_k^T s_k)^2}{s_k^T y_k} = \lambda^2\, (y_k^T s_k) > 0,$$

so that $z^T H_{k+1} z > 0$.

Algorithm (DFP quasi-Newton algorithm)

    k ← 0; x assigned; g ← ∇f(x)^T; H ← ∇²f(x)^{-1};
    while ‖g‖ > ε do
        compute search direction d ← −H g;
        approximate argmin_{α>0} f(x + α d) by line search;
        perform step x ← x + α d;
        update H:
            y ← ∇f(x)^T − g;  z ← H y;
            g ← ∇f(x)^T;
            H ← H + α (d d^T)/(d^T y) − (z z^T)/(y^T z);
        k ← k + 1;
    end while
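The corresponding update routine, to be plugged into the generic driver sketched earlier; it assumes the curvature condition $s^T y > 0$ (e.g. enforced by a Wolfe line search), and the function name is illustrative:

    import numpy as np

    def dfp_update(H, s, y):
        # DFP rank-2 update of the inverse-Hessian approximation.
        # Positive definiteness is preserved only if s^T y > 0.
        z = H @ y
        return H + np.outer(s, s) / (s @ y) - np.outer(z, z) / (y @ z)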

Theorem (properties of the DFP update). Let $q(x) = \frac{1}{2}(x - x^\star)^T A (x - x^\star) + c$ with $A \in \mathbb{R}^{n\times n}$ symmetric and positive definite. Let $x_0$ and $H_0$ be assigned and let $\{x_k\}$ and $\{H_k\}$ be produced by the sequence $\{s_k\}$:

1. $x_{k+1} = x_k + s_k$;
2. $H_{k+1} = H_k + \dfrac{s_k s_k^T}{s_k^T y_k} - \dfrac{H_k y_k y_k^T H_k}{y_k^T H_k y_k}$;

where $s_k = \alpha_k p_k$ and $\alpha_k$ is obtained by exact line search. Then for $j < k$ we have:

1. $g_k^T s_j = 0$;  [orthogonality property]
2. $H_k y_j = s_j$;  [hereditary property]
3. $s_k^T A s_j = 0$;  [conjugate direction property]
4. the method terminates (i.e. $\nabla f(x_m) = 0$) at $x_m = x^\star$ with $m \leq n$; if $m = n$ then $H_n = A^{-1}$.

Proof (1/4). Points (1), (2) and (3) are proved by induction. The base of the induction is obvious; suppose the statements hold up to some $k > 0$. Due to exact line search we have $g_{k+1}^T s_k = 0$; moreover, by induction, for $j < k$ we have $g_{k+1}^T s_j = 0$. In fact,

$$g_{k+1}^T s_j = g_{j+1}^T s_j + \sum_{i=j+1}^{k} (g_{i+1} - g_i)^T s_j = 0 + \sum_{i=j+1}^{k} \left( A(x_{i+1} - x^\star) - A(x_i - x^\star) \right)^T s_j = \sum_{i=j+1}^{k} \left( A(x_{i+1} - x_i) \right)^T s_j = \sum_{i=j+1}^{k} s_i^T A s_j = 0. \quad \text{[exact line search + conjugacy property]}$$

Proof (2/4). Using $s_{k+1} = -\alpha_{k+1} H_{k+1} g_{k+1}$ we have $s_{k+1}^T A s_j = 0$. In fact,

$$s_{k+1}^T A s_j = -\alpha_{k+1}\, g_{k+1}^T H_{k+1} (A x_{j+1} - A x_j) = -\alpha_{k+1}\, g_{k+1}^T H_{k+1} \left( A(x_{j+1} - x^\star) - A(x_j - x^\star) \right) = -\alpha_{k+1}\, g_{k+1}^T H_{k+1} (g_{j+1} - g_j) = -\alpha_{k+1}\, g_{k+1}^T H_{k+1} y_j = -\alpha_{k+1}\, g_{k+1}^T s_j = 0. \quad \text{[induction + hereditary and orthogonality properties]}$$

Notice that we have used $A s_j = y_j$.

Proof (3/4). By the DFP construction we have $H_{k+1} y_k = s_k$. By the inductive hypothesis, for $j < k$ we have $s_k^T y_j = s_k^T A s_j = 0$, hence from the DFP formula

$$H_{k+1} y_j = H_k y_j + \frac{s_k\, (s_k^T y_j)}{s_k^T y_k} - \frac{H_k y_k\, (y_k^T H_k y_j)}{y_k^T H_k y_k} = s_j + \frac{s_k \cdot 0}{s_k^T y_k} - \frac{H_k y_k\, (y_k^T s_j)}{y_k^T H_k y_k} \quad [H_k y_j = s_j]$$

$$= s_j - \frac{H_k y_k\, (g_{k+1} - g_k)^T s_j}{y_k^T H_k y_k} \quad [y_k = g_{k+1} - g_k] \qquad = s_j. \quad \text{[induction + orthogonality property]}$$

Proof (4/4). Finally, if $m = n$, the steps $s_j$, $j = 0, 1, \ldots, n-1$, are conjugate and linearly independent. From the hereditary property and the lemma of the SR1 section,

$$H_n A s_k = H_n y_k = s_k, \qquad k = 0, 1, \ldots, n-1,$$

and by the linear independence of $\{s_k\}$ it follows that $H_n = A^{-1}$.

The Broyden-Fletcher-Goldfarb-Shanno (BFGS) update

Another update which maintains symmetry and positive definiteness is the Broyden-Fletcher-Goldfarb-Shanno (BFGS, 1970) rank 2 update. This update was independently discovered by the four authors. A convenient way to introduce BFGS is through the concept of duality. Consider an update for the Hessian, say

$$B_{k+1} = U(B_k, s_k, y_k),$$

which satisfies $B_{k+1} s_k = y_k$ (the secant condition on the Hessian). Then, by exchanging $B_k \leftrightarrow H_k$ and $s_k \leftrightarrow y_k$, we obtain the dual update for the inverse of the Hessian, i.e.

$$H_{k+1} = U(H_k, y_k, s_k),$$

which satisfies $H_{k+1} y_k = s_k$ (the secant condition on the inverse of the Hessian).

Starting from the Davidon-Fletcher-Powell (DFP) rank 2 update formula

$$H_{k+1} = H_k + \frac{s_k s_k^T}{s_k^T y_k} - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k},$$

by duality we obtain the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update formula

$$B_{k+1} = B_k + \frac{y_k y_k^T}{y_k^T s_k} - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k}.$$

The BFGS formula written in this way is not useful for large problems: we need an equivalent formula for the inverse of the approximate Hessian. This can be obtained with a generalization of the Sherman-Morrison formula.

Sherman-Morrison-Woodbury formula (1/2)

The Sherman-Morrison-Woodbury formula permits writing explicitly the inverse of a matrix changed by a rank $k$ perturbation.

Proposition (Sherman-Morrison-Woodbury formula)

$$(A + U V^T)^{-1} = A^{-1} - A^{-1} U C^{-1} V^T A^{-1}, \qquad C = I + V^T A^{-1} U,$$

where $U = [u_1, u_2, \ldots, u_k]$ and $V = [v_1, v_2, \ldots, v_k]$. The Sherman-Morrison-Woodbury formula can be checked by a direct calculation.

Sherman-Morrison-Woodbury formula (2/2)

Remark. The previous formula can be written as

$$\left( A + \sum_{i=1}^k u_i v_i^T \right)^{-1} = A^{-1} - A^{-1} U C^{-1} V^T A^{-1}$$

where

$$C_{ij} = \delta_{ij} + v_i^T A^{-1} u_j, \qquad i, j = 1, 2, \ldots, k.$$
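The "direct calculation" can also be done numerically. The following sketch checks the formula on random data (the shifted test matrix is almost surely nonsingular and well conditioned):

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 5, 2
    A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
    U = rng.standard_normal((n, k))
    V = rng.standard_normal((n, k))

    Ainv = np.linalg.inv(A)
    C = np.eye(k) + V.T @ Ainv @ U                    # C = I + V^T A^{-1} U
    lhs = np.linalg.inv(A + U @ V.T)
    rhs = Ainv - Ainv @ U @ np.linalg.inv(C) @ V.T @ Ainv
    print(np.allclose(lhs, rhs))                      # True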

The BFGS update for H

Proposition. Using the Sherman-Morrison-Woodbury formula, the BFGS update for $H$ becomes

$$H_{k+1} = H_k - \frac{H_k y_k s_k^T + s_k y_k^T H_k}{s_k^T y_k} + \frac{s_k s_k^T}{s_k^T y_k} \left( 1 + \frac{y_k^T H_k y_k}{s_k^T y_k} \right) \tag{A}$$

or, equivalently,

$$H_{k+1} = \left( I - \frac{s_k y_k^T}{s_k^T y_k} \right) H_k \left( I - \frac{y_k s_k^T}{s_k^T y_k} \right) + \frac{s_k s_k^T}{s_k^T y_k}. \tag{B}$$
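Form (B) translates directly into code. A minimal sketch; the skip of the update when the curvature condition fails is a pragmatic safeguard added here, not part of the slides:

    import numpy as np

    def bfgs_update(H, s, y):
        # BFGS update of the inverse-Hessian approximation, form (B):
        # H <- (I - s y^T/(s^T y)) H (I - y s^T/(s^T y)) + s s^T/(s^T y).
        sy = s @ y
        if sy <= 1e-12 * np.linalg.norm(s) * np.linalg.norm(y):
            return H                    # skip update when s^T y > 0 fails
        V = np.eye(len(s)) - np.outer(s, y) / sy
        return V @ H @ V.T + np.outer(s, s) / sy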

Proof (1/3). Consider the Sherman-Morrison-Woodbury formula with $k = 2$ and

$$u_1 = v_1 = \frac{y_k}{(s_k^T y_k)^{1/2}}, \qquad u_2 = -v_2 = -\frac{B_k s_k}{(s_k^T B_k s_k)^{1/2}},$$

so that $B_{k+1} = B_k + u_1 v_1^T + u_2 v_2^T$. In this way (setting $H_k = B_k^{-1}$) we have

$$C_{11} = 1 + v_1^T B_k^{-1} u_1 = 1 + \frac{y_k^T H_k y_k}{s_k^T y_k}, \qquad C_{22} = 1 + v_2^T B_k^{-1} u_2 = 1 - \frac{s_k^T B_k B_k^{-1} B_k s_k}{s_k^T B_k s_k} = 1 - 1 = 0,$$

$$C_{12} = v_1^T B_k^{-1} u_2 = -\frac{y_k^T H_k B_k s_k}{(s_k^T y_k)^{1/2} (s_k^T B_k s_k)^{1/2}} = -\frac{(s_k^T y_k)^{1/2}}{(s_k^T B_k s_k)^{1/2}}, \qquad C_{21} = v_2^T B_k^{-1} u_1 = +\frac{(s_k^T y_k)^{1/2}}{(s_k^T B_k s_k)^{1/2}}.$$

Proof (2/3). In this way the matrix $C$ has the form

$$C = \begin{pmatrix} \beta & -\alpha \\ \alpha & 0 \end{pmatrix}, \qquad C^{-1} = \frac{1}{\alpha^2} \begin{pmatrix} 0 & \alpha \\ -\alpha & \beta \end{pmatrix},$$

where

$$\beta = 1 + \frac{y_k^T H_k y_k}{s_k^T y_k}, \qquad \alpha = \frac{(s_k^T y_k)^{1/2}}{(s_k^T B_k s_k)^{1/2}}.$$

Setting $\tilde U = H_k U$ and $\tilde V = H_k V$, i.e. $\tilde u_i = H_k u_i$ and $\tilde v_i = H_k v_i$ for $i = 1, 2$, we have

$$H_{k+1} = H_k - H_k U C^{-1} V^T H_k = H_k - \tilde U C^{-1} \tilde V^T.$$

Proof (3/3). Notice that (the matrix product is $\mathbb{R}^{n\times 2} \cdot \mathbb{R}^{2\times 2} \cdot \mathbb{R}^{2\times n}$)

$$\tilde U C^{-1} \tilde V^T = \frac{1}{\alpha^2} \begin{pmatrix} \tilde u_1 & \tilde u_2 \end{pmatrix} \begin{pmatrix} 0 & \alpha \\ -\alpha & \beta \end{pmatrix} \begin{pmatrix} \tilde v_1^T \\ \tilde v_2^T \end{pmatrix} = \frac{1}{\alpha} \left( \tilde u_1 \tilde v_2^T - \tilde u_2 \tilde v_1^T \right) + \frac{\beta}{\alpha^2}\, \tilde u_2 \tilde v_2^T.$$

Substituting the values of $\alpha$, $\beta$, $\tilde u_1 = \tilde v_1 = H_k y_k / (s_k^T y_k)^{1/2}$ and $\tilde u_2 = -\tilde v_2 = -s_k / (s_k^T B_k s_k)^{1/2}$, we obtain

$$H_{k+1} = H_k - \frac{H_k y_k s_k^T + s_k y_k^T H_k}{s_k^T y_k} + \frac{s_k s_k^T}{s_k^T y_k} \left( 1 + \frac{y_k^T H_k y_k}{s_k^T y_k} \right).$$

At this point the update formula (B) follows by a straightforward calculation.

Positive definiteness of the BFGS update

Theorem (positive definiteness of the BFGS update). Given $H_k$ symmetric and positive definite, the BFGS update

$$H_{k+1} = \left( I - \frac{s_k y_k^T}{s_k^T y_k} \right) H_k \left( I - \frac{y_k s_k^T}{s_k^T y_k} \right) + \frac{s_k s_k^T}{s_k^T y_k}$$

produces $H_{k+1}$ positive definite if and only if $s_k^T y_k > 0$.

Remark (Wolfe ⇒ BFGS update is SPD). Expanding $s_k^T y_k > 0$ we have $\nabla f(x_{k+1})\, s_k > \nabla f(x_k)\, s_k$. Remember that in a minimum search algorithm $s_k = \alpha_k p_k$ with $\alpha_k > 0$, and the second Wolfe condition for the line search is

$$\nabla f(x_k + \alpha_k p_k)\, p_k \geq c_2\, \nabla f(x_k)\, p_k, \qquad 0 < c_2 < 1.$$

This implies

$$\nabla f(x_{k+1})\, s_k \geq c_2\, \nabla f(x_k)\, s_k > \nabla f(x_k)\, s_k \quad\Longrightarrow\quad s_k^T y_k > 0.$$

Proof. Let $s_k^T y_k > 0$ and consider $z \neq 0$. Then

$$z^T H_{k+1} z = w^T H_k w + \frac{(z^T s_k)^2}{s_k^T y_k}, \qquad w = z - \frac{y_k\, (s_k^T z)}{s_k^T y_k}.$$

In order to have $z^T H_{k+1} z = 0$ we must have both $w = 0$ and $z^T s_k = 0$. But $z^T s_k = 0$ implies $w = z$, and then $w = 0$ implies $z = 0$, a contradiction; hence $z^T H_{k+1} z > 0$.

Conversely, let $z^T H_{k+1} z > 0$ for all $z \neq 0$. Choosing $z = y_k$ (so that $w = 0$) we have

$$0 < y_k^T H_{k+1} y_k = \frac{(s_k^T y_k)^2}{s_k^T y_k} = s_k^T y_k,$$

and thus $s_k^T y_k > 0$.

Algorithm (BFGS quasi-Newton algorithm)

    k ← 0; x assigned; g ← ∇f(x)^T; H ← ∇²f(x)^{-1};
    while ‖g‖ > ε do
        compute search direction d ← −H g;
        approximate argmin_{α>0} f(x + α d) by line search;
        perform step x ← x + α d;
        update H:
            y ← ∇f(x)^T − g;  z ← H y;
            g ← ∇f(x)^T;
            H ← H − (z d^T + d z^T)/(d^T y) + (α + (y^T z)/(d^T y)) (d d^T)/(d^T y);
        k ← k + 1;
    end while
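Putting the pieces together: with the generic driver and the bfgs_update routine sketched above, a standard smoke test (not from the slides) is the Rosenbrock function. Note that plain Armijo backtracking does not guarantee $s^T y > 0$, which is why the sketched bfgs_update skips unsafe updates; a robust code would use a Wolfe line search instead.

    import numpy as np

    def rosenbrock(x):
        return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

    def rosenbrock_grad(x):
        return np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                         200 * (x[1] - x[0]**2)])

    x_star = quasi_newton(rosenbrock, rosenbrock_grad,
                          x0=[-1.2, 1.0], update=bfgs_update)
    print(x_star)   # expected: close to the minimizer [1, 1]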

Theorem (properties of the BFGS update). Let $q(x) = \frac{1}{2}(x - x^\star)^T A (x - x^\star) + c$ with $A \in \mathbb{R}^{n\times n}$ symmetric and positive definite. Let $x_0$ and $H_0$ be assigned and let $\{x_k\}$ and $\{H_k\}$ be produced by the sequence $\{s_k\}$:

1. $x_{k+1} = x_k + s_k$;
2. $H_{k+1} = \left( I - \dfrac{s_k y_k^T}{s_k^T y_k} \right) H_k \left( I - \dfrac{y_k s_k^T}{s_k^T y_k} \right) + \dfrac{s_k s_k^T}{s_k^T y_k}$;

where $s_k = \alpha_k p_k$ and $\alpha_k$ is obtained by exact line search. Then for $j < k$ we have:

1. $g_k^T s_j = 0$;  [orthogonality property]
2. $H_k y_j = s_j$;  [hereditary property]
3. $s_k^T A s_j = 0$;  [conjugate direction property]
4. the method terminates (i.e. $\nabla f(x_m) = 0$) at $x_m = x^\star$ with $m \leq n$; if $m = n$ then $H_n = A^{-1}$.

Proof (1/4) and (2/4). The orthogonality property $g_{k+1}^T s_j = 0$ and the conjugate direction property $s_{k+1}^T A s_j = 0$ are proved exactly as in the corresponding steps of the DFP proof above, which use only the exact line search, the hereditary property and $A s_j = y_j$.

Proof (3/4). By the BFGS construction we have $H_{k+1} y_k = s_k$. By the inductive hypothesis, for $j < k$ we have $s_k^T y_j = s_k^T A s_j = 0$ and $y_k^T s_j = 0$, hence

$$H_{k+1} y_j = \left( I - \frac{s_k y_k^T}{s_k^T y_k} \right) H_k \left( y_j - \frac{y_k\, (s_k^T y_j)}{s_k^T y_k} \right) + \frac{s_k\, (s_k^T y_j)}{s_k^T y_k} = \left( I - \frac{s_k y_k^T}{s_k^T y_k} \right) H_k y_j + \frac{s_k \cdot 0}{s_k^T y_k}$$

$$= s_j - \frac{y_k^T s_j}{s_k^T y_k}\, s_k \quad [H_k y_j = s_j] \qquad = s_j. \quad [y_k^T s_j = 0]$$

Proof (4/4). As in the DFP case: if $m = n$, the steps $s_j$ are conjugate and linearly independent, so the hereditary property and the lemma of the SR1 section give $H_n A s_k = H_n y_k = s_k$ for $k = 0, 1, \ldots, n-1$, hence $H_n = A^{-1}$.

The Broyden class

The DFP update

$$H_{k+1}^{DFP} = H_k + \frac{s_k s_k^T}{s_k^T y_k} - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k}$$

and the BFGS update

$$H_{k+1}^{BFGS} = H_k - \frac{H_k y_k s_k^T + s_k y_k^T H_k}{s_k^T y_k} + \frac{s_k s_k^T}{s_k^T y_k} \left( 1 + \frac{y_k^T H_k y_k}{s_k^T y_k} \right)$$

both maintain symmetry and positive definiteness. The following update (sketched in code below)

$$H_{k+1}^{\theta} = (1 - \theta)\, H_{k+1}^{DFP} + \theta\, H_{k+1}^{BFGS}$$

maintains symmetry for any $\theta$, and for $\theta \in [0, 1]$ also positive definiteness.
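A sketch of the Broyden class update, reusing the dfp_update and bfgs_update routines defined in the earlier sketches:

    def broyden_class_update(H, s, y, theta):
        # Convex combination of the DFP and BFGS updates:
        # theta = 0 gives DFP, theta = 1 gives BFGS; for theta in [0, 1]
        # positive definiteness is preserved whenever s^T y > 0.
        return (1 - theta) * dfp_update(H, s, y) + theta * bfgs_update(H, s, y)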

Positive definiteness of the Broyden class update

Theorem (positive definiteness of the Broyden class update). Given $H_k$ symmetric and positive definite, the Broyden class update

$$H_{k+1}^{\theta} = (1 - \theta)\, H_{k+1}^{DFP} + \theta\, H_{k+1}^{BFGS}$$

produces $H_{k+1}^{\theta}$ positive definite for any $\theta \in [0, 1]$ if and only if $s_k^T y_k > 0$.

Theorem (properties of the Broyden class update). Let $q(x) = \frac{1}{2}(x - x^\star)^T A (x - x^\star) + c$ with $A \in \mathbb{R}^{n\times n}$ symmetric and positive definite. Let $x_0$ and $H_0$ be assigned and let $\{x_k\}$ and $\{H_k\}$ be produced by the sequence $\{s_k\}$:

1. $x_{k+1} = x_k + s_k$;
2. $H_{k+1}^{\theta} = (1 - \theta)\, H_{k+1}^{DFP} + \theta\, H_{k+1}^{BFGS}$;

where $s_k = \alpha_k p_k$ and $\alpha_k$ is obtained by exact line search. Then for $j < k$ we have:

1. $g_k^T s_j = 0$;  [orthogonality property]
2. $H_k y_j = s_j$;  [hereditary property]
3. $s_k^T A s_j = 0$;  [conjugate direction property]
4. the method terminates (i.e. $\nabla f(x_m) = 0$) at $x_m = x^\star$ with $m \leq n$; if $m = n$ then $H_n = A^{-1}$.

The Broyden class update can also be written as

$$H_{k+1}^{\theta} = H_{k+1}^{DFP} + \theta\, w_k w_k^T = H_{k+1}^{BFGS} + (\theta - 1)\, w_k w_k^T$$

where

$$w_k = \left( y_k^T H_k y_k \right)^{1/2} \left[ \frac{s_k}{s_k^T y_k} - \frac{H_k y_k}{y_k^T H_k y_k} \right].$$

For particular values of $\theta$ we obtain:

1. $\theta = 0$: the DFP update;
2. $\theta = 1$: the BFGS update;
3. $\theta = s_k^T y_k / (s_k - H_k y_k)^T y_k$: the SR1 update;
4. $\theta = \left( 1 + y_k^T H_k y_k / s_k^T y_k \right)^{-1}$: the Hoshino update.

References

J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, Springer-Verlag, Texts in Applied Mathematics, 12, 2002.

J. E. Dennis, Jr. and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, SIAM, Classics in Applied Mathematics, 16, 1996.