Math 408A: Non-Linear Optimization

Size: px

Start display at page:

Download "Math 408A: Non-Linear Optimization"

Hector Magnus Wells
6 years ago
Views:

1 February 12

3 Broyden Updates Given g : R n R n solve g(x) = 0. Algorithm: Broyden s Method Initialization: x 0 R n, B 0 R n n Having (x k, B k ) compute (x k+1, B x+1 ) as follows: Solve B k s k = g(x k ) for s k and set x k+1 : = x k + s k y k : = g(x k ) g(x k+1 ) B k+1 : = B k + (y k B k s k )s k T. s k T s k

4 Broyden Updates Algorithm: Broyden s Method (Inverse Updating) Initialization: x 0 R n, B 0 R n n Having (x k, B k ) compute (x k+1, B x+1 ) as follows: Solve s k = J k g(x k ) for s k and set x k+1 : = x k + s k y k : = g(x k ) g(x k+1 ) J k+1 i = J k + (sk J k y k )s k T J k. s k T J k y

5 MSE for Optimization P : minimize x R n f (x), where f : R n R is twice continuously differentiable.

6 MSE for Optimization P : minimize x R n f (x), where f : R n R is twice continuously differentiable. Solve f (x) = 0 using the Newton-Like iteration x k+1 = x k M 1 k f (x k ).

7 MSE for Optimization P : minimize x R n f (x), where f : R n R is twice continuously differentiable. Solve f (x) = 0 using the Newton-Like iteration x k+1 = x k M 1 k f (x k ). The MSE becomes M k+1 s k = y k, where s k := x k+1 x k and y k := f (x k+1 ) f (x k ).

8 MSE for Optimization The Broyden update gives M k+1 = M k + (y k M k s k )s k T. s k T s k

9 MSE for Optimization The Broyden update gives M k+1 = M k + (y k M k s k )s k T. s k T s k This is unsatisfactory for two reasons: 1. Since M k approximates 2 f (x k ) it must be symmetric. 2. Since we are minimizing, then M k must be positive definite to insure that s k = M 1 k f (x k ) is a direction of descent for f at x k.

10 SR1 Update The Broyden class of updates is M k+1 = M k + (y k M k s k )v T v T s k.

11 SR1 Update The Broyden class of updates is M k+1 = M k + (y k M k s k )v T v T s k. The only symmetric member of this class is obtained by taking giving v = (y k M k s k ) M k+1 = M k + (y k M k s k )(y k M k s k ) T (y k M k s k ) T s k. The corresponding inverse update (by SMW) is H k+1 = H k + (sk H k y k )(s k H k y k ) T (s k H k y k ) T y k.

12 SR1 Update The Broyden class of updates is M k+1 = M k + (y k M k s k )v T v T s k. The only symmetric member of this class is obtained by taking giving v = (y k M k s k ) M k+1 = M k + (y k M k s k )(y k M k s k ) T (y k M k s k ) T s k. The corresponding inverse update (by SMW) is H k+1 = H k + (sk H k y k )(s k H k y k ) T (s k H k y k ) T y k. But this update may not be positive definite.

13 Positive Definite Updating Basic Problem: Let M R n n be a given symmetric positive definite (spd) matrix. Given s, y R n find an spd matrix M + such that M + s = y and M + is close to M.

14 Positive Definite Updating Basic Problem: Let M R n n be a given symmetric positive definite (spd) matrix. Given s, y R n find an spd matrix M + such that M + s = y and M + is close to M. We know that close cannot mean a rank one modification since the SR1 may not preserve positive definiteness.

15 Positive Definite Updating Basic Problem: Let M R n n be a given symmetric positive definite (spd) matrix. Given s, y R n find an spd matrix M + such that M + s = y and M + is close to M. We know that close cannot mean a rank one modification since the SR1 may not preserve positive definiteness. Let s try another approach that combines the Broyden update with yet another important property of symmetric matrices.

16 Positive Definite Updating Every spd matrix M can be written in the form M = LL T for some non-singular matrix L.

17 Positive Definite Updating Every spd matrix M can be written in the form M = LL T for some non-singular matrix L. Indeed, the LU factorization can be used to give the so called Cholesky factorization M = LL T where L is lower triangular.

18 Positive Definite Updating Every spd matrix M can be written in the form M = LL T for some non-singular matrix L. Indeed, the LU factorization can be used to give the so called Cholesky factorization M = LL T where L is lower triangular. Similarly, if M + exists, then there must be a nonsingular matrix J such that M + = JJ T.

19 Positive Definite Updating So M = LL T and M + = JJ T with M + s = y. Therefore, if we set v = J T s, then Jv = y.

20 Positive Definite Updating So M = LL T and M + = JJ T with M + s = y. Therefore, if we set v = J T s, then Jv = y. We apply the Broyden update to Jv = y and L to get T (y Lv)v J = L +. v T v

21 Positive Definite Updating So M = LL T and M + = JJ T with M + s = y. Therefore, if we set v = J T s, then Jv = y. We apply the Broyden update to Jv = y and L to get T (y Lv)v J = L +. v T v Using v = J T s, we get v = J T s = L T s + v(y Lv)T s. v T v

22 Positive Definite Updating So M = LL T and M + = JJ T with M + s = y. Therefore, if we set v = J T s, then Jv = y. We apply the Broyden update to Jv = y and L to get T (y Lv)v J = L +. v T v Using v = J T s, we get v = J T s = L T s + v(y Lv)T s. v T v But then v = αl T s for some α R.

23 Positive Definite Updating Hence with v = αl T s. v = J T s = L T s + v(y Lv)T s v T v

24 Positive Definite Updating Hence with v = αl T s. Plugging in we get v = J T s = L T s + v(y Lv)T s v T v αl T s = L T s + αlt s(y αll T s) T s α 2. s T LL T s

25 Positive Definite Updating Hence with v = αl T s. Plugging in we get v = J T s = L T s + v(y Lv)T s v T v αl T s = L T s + αlt s(y αll T s) T s α 2. s T LL T s Hence α 2 = [ s T ] y. s T Ms

26 Broyden-Fletcher-Goldfarb-Shanno (BFGS) Updating Therefore, this update only exists if s T y > 0 in which case with yielding J = L + (y αms)s T L, αs T Ms α = M + = M + yy T [ s T ] y 1/2, s T Ms y T s Mss T M s T Ms. This is called the BFGS update. It is currently viewed as the best MSE update available for optimization due to its outstanding performace in practise.

27 BFGS Updating In addition, we note that the M + can be obtained directly from the matrices J. If the QR factorization of J T is J T = QR, we can set L + = R yielding M + = JJ T = R T Q T QR = L + L T +.

28 Inverse BFGS Updating Sherman-Morrison-Woodbury formula again gives the inverse update where H = M 1. H + = H + (s + Hy)T yss T (s T y) 2 HysT + sy T H s T y,

29 How to insure s T y > 0 The BFGS update preserves both symmetry and positive definiteness when s k T y k > 0. But is this condition reasonable or even implementable?

30 How to insure s T y > 0 The BFGS update preserves both symmetry and positive definiteness when s k T y k > 0. But is this condition reasonable or even implementable? Recall that y = y k = f (x k+1 ) f (x k ) and where s k = x k+1 x k = t k d k, d k = t k H k f (x k ) is the search direction and t k > 0 is the stepsize.

31 How to insure s T y > 0 The BFGS update preserves both symmetry and positive definiteness when s k T y k > 0. But is this condition reasonable or even implementable? Recall that y = y k = f (x k+1 ) f (x k ) and where s k = x k+1 x k = t k d k, d k = t k H k f (x k ) is the search direction and t k > 0 is the stepsize. Hence y k T s k = f (x k+1 ) T s k f (x k ) T s k = t k ( f (x k + t k d k ) T d k f (x k ) T d k ),

32 How to insure s T y > 0 y k T s k = f (x k+1 ) T s k f (x k ) T s k = t k ( f (x k + t k d k ) T d k f (x k ) T d k ),

33 How to insure s T y > 0 y k T s k = f (x k+1 ) T s k f (x k ) T s k = t k ( f (x k + t k d k ) T d k f (x k ) T d k ), Since H k is sdp, f (x k ) T d k < 0 so t k > 0.

34 How to insure s T y > 0 y k T s k = f (x k+1 ) T s k f (x k ) T s k = t k ( f (x k + t k d k ) T d k f (x k ) T d k ), Since H k is sdp, f (x k ) T d k < 0 so t k > 0. Therefore, to get s k T y k > 0 we must show t k > 0 can be choosen so that f (x k + t k d k ) T d k β f (x k ) T d k for some β (0, 1) since in this case f (x k + t k d k ) T d k f (x k ) T d k (β 1) f (x k ) T d k > 0.

35 How to insure s T y > 0 y k T s k = f (x k+1 ) T s k f (x k ) T s k = t k ( f (x k + t k d k ) T d k f (x k ) T d k ), Since H k is sdp, f (x k ) T d k < 0 so t k > 0. Therefore, to get s k T y k > 0 we must show t k > 0 can be choosen so that f (x k + t k d k ) T d k β f (x k ) T d k for some β (0, 1) since in this case f (x k + t k d k ) T d k f (x k ) T d k (β 1) f (x k ) T d k > 0. But this precisely the second condition in the weak Wolfe conditions with β = c 2. Hence a successful BFGS update can always be obtained.

Search Directions for Unconstrained Optimization

8 CHAPTER 8 Search Directions for Unconstrained Optimization In this chapter we study the choice of search directions used in our basic updating scheme x +1 = x + t d. for solving P min f(x). x R n All