Matrix Secant Methods


Matthew Fletcher
5 years ago

Equation Solving: $g(x) = 0$

Newton-Like Iterations

$$x_{k+1} := x_k - J_k\, g(x_k), \qquad \text{where } J_k \approx g'(x_k)^{-1}.$$

Thousands of years ago the Babylonians used the secant method in 1D:

$$J_k = \frac{x_k - x_{k-1}}{g(x_k) - g(x_{k-1})}.$$

This gives the error

$$g'(x_k)^{-1} - J_k = \frac{g(x_{k-1}) - \left[\, g(x_k) + g'(x_k)(x_{k-1} - x_k) \,\right]}{g'(x_k)\left[\, g(x_{k-1}) - g(x_k) \,\right]}.$$

If $g'(\bar{x}) \neq 0$ at the solution $\bar{x}$, then there exists $\alpha > 0$ such that

$$\left| g'(x_k)^{-1} - J_k \right| \le \frac{(L/2)\,|x_{k-1} - x_k|^2}{\alpha\, |g'(x_k)|\, |x_{k-1} - x_k|} \le K\, |x_{k-1} - x_k|$$

by the Quadratic Bound Lemma, so $x_k \to \bar{x}$ at a 2-step quadratic rate.
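The 1D secant iteration can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `secant`, the tolerance, the iteration cap, and the test function $g(x) = x^2 - 2$ are all assumptions chosen for the example.

```python
def secant(g, x_prev, x_curr, tol=1e-12, max_iter=50):
    """1D secant iteration x_{k+1} = x_k - J_k g(x_k), where
    J_k = (x_k - x_{k-1}) / (g(x_k) - g(x_{k-1}))."""
    for _ in range(max_iter):
        g_prev, g_curr = g(x_prev), g(x_curr)
        if abs(g_curr) < tol:          # residual small enough: stop
            break
        denom = g_curr - g_prev
        if denom == 0.0:               # flat secant line: J_k undefined
            raise ZeroDivisionError("g(x_k) == g(x_{k-1})")
        J = (x_curr - x_prev) / denom  # secant approximation to 1/g'(x_k)
        x_prev, x_curr = x_curr, x_curr - J * g_curr
    return x_curr


# Illustrative test function (an assumption): g(x) = x^2 - 2, root sqrt(2).
root = secant(lambda x: x * x - 2.0, 1.0, 2.0)
```

Only function values are needed; no derivative of `g` is ever evaluated, which is the point of replacing $g'(x_k)^{-1}$ by $J_k$.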

Can we apply the secant method in dimensions higher than 1?

$$J_k = \frac{x_k - x_{k-1}}{g(x_k) - g(x_{k-1})}$$

Multiplying on the right by $g(x_k) - g(x_{k-1})$ gives

$$J_k \left( g(x_k) - g(x_{k-1}) \right) = x_k - x_{k-1}.$$

This is called the matrix secant equation (MSE), or quasi-Newton equation. Here the matrix $J_k$ is unknown ($n^2$ unknowns) and satisfies the $n$ linear equations of the MSE. There are not enough equations to nail down all of the entries of $J_k$, so further conditions are required. What should they be?
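To make the counting concrete, here is the case $n = 2$ written out. The shorthand $\Delta x := x_k - x_{k-1}$ and $\Delta g := g(x_k) - g(x_{k-1})$ is introduced here only for this illustration. The matrix secant equation $J_k \Delta g = \Delta x$ reads

$$\begin{aligned}
J_{11}\,\Delta g_1 + J_{12}\,\Delta g_2 &= \Delta x_1, \\
J_{21}\,\Delta g_1 + J_{22}\,\Delta g_2 &= \Delta x_2,
\end{aligned}$$

which is two linear equations in the four unknowns $J_{11}, J_{12}, J_{21}, J_{22}$. Each row of $J_k$ is pinned down only up to adding a vector orthogonal to $\Delta g$, so infinitely many matrices satisfy the MSE.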

To better understand what further conditions on $J_k$ are sensible, we revert to discussing the matrices $B_k = J_k^{-1}$, so the MSE becomes

$$B_k (x_k - x_{k-1}) = g(x_k) - g(x_{k-1}).$$

Key Idea: Think of the matrices $B_k = J_k^{-1}$ as part of an iteration scheme $\{x_k, B_k\}$.

As the iteration proceeds, it is hoped that $B_k$ becomes a better approximation to $g'(x_k)$. With this in mind, we assume that, in the long run, $B_k$ is already close to $g'(x_k)$, so $B_{k+1}$ should not differ from $B_k$ by too much, i.e. $B_k$ should be close to $B_{k+1}$.

What do we mean by close in this context? Should $\|B_k - B_{k+1}\|$ be small?

We take close to mean algebraically close, in the sense that $B_{k+1}$ should be easy to compute from $B_k$ and yet satisfy the MSE.

Let's start by assuming that $B_{k+1}$ is a rank one update to $B_k$. That is, there exist $u, v \in \mathbb{R}^n$ such that

$$B_{k+1} = B_k + u v^T.$$

Set $s_k := x_{k+1} - x_k$ and $y_k := g(x_{k+1}) - g(x_k)$. The MSE becomes

$$B_{k+1} s_k = y_k.$$

Multiplying the rank one update $B_{k+1} = B_k + u v^T$ on the right by $s_k$ gives

$$y_k = B_{k+1} s_k = B_k s_k + u v^T s_k.$$

Hence, if $v^T s_k \neq 0$, we obtain

$$u = \frac{y_k - B_k s_k}{v^T s_k} \qquad \text{and} \qquad B_{k+1} = B_k + \frac{(y_k - B_k s_k)\, v^T}{v^T s_k}.$$

$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)\, v^T}{v^T s_k}$$

This equation determines a whole class of rank one updates that satisfy the MSE by choosing $v \in \mathbb{R}^n$ so that $v^T s_k \neq 0$.

One choice is $v = s_k$, giving

$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k}.$$

This is known as Broyden's update. It turns out that the Broyden update is also analytically close to $B_k$.
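A small numerical check of Broyden's update may help. The sketch below uses plain Python lists; the helper names `matvec` and `broyden_update` and the sample data `B`, `s`, `y`, `z` are assumptions made up for the illustration. It verifies that the updated matrix satisfies the MSE, and that it acts exactly like the old matrix on a vector orthogonal to $s_k$.

```python
def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def broyden_update(B, s, y):
    """Return B + (y - B s) s^T / (s^T s)."""
    sTs = sum(x * x for x in s)
    if sTs == 0.0:
        raise ValueError("s must be nonzero")
    Bs = matvec(B, s)
    r = [(yi - bi) / sTs for yi, bi in zip(y, Bs)]   # (y - B s) / (s^T s)
    return [[B[i][j] + r[i] * s[j] for j in range(len(s))]
            for i in range(len(s))]


# Made-up sample data for illustration only.
B = [[2.0, 0.0], [0.0, 1.0]]
s = [1.0, 2.0]
y = [3.0, -1.0]
B_new = broyden_update(B, s, y)

B_new_s = matvec(B_new, s)   # should reproduce y (the MSE)
z = [2.0, -1.0]              # z is orthogonal to s
B_z = matvec(B, z)
B_new_z = matvec(B_new, z)   # should equal B z: only the action on s changes
```

The second check is one way to read "close": the update changes $B_k$ only in the direction $s_k$ and leaves it untouched on the orthogonal complement of $s_k$.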

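Let $A \in \mathbb{R}^{n \times n}$, $s, y \in \mathbb{R}^n$, $s \neq 0$. Then for any matrix norms $\|\cdot\|$ and $\|\cdot\|_e$ such that

$$\|AB\| \le \|A\|\, \|B\|_e \qquad \text{and} \qquad \left\| \frac{v v^T}{v^T v} \right\|_e \le 1,$$

the solution to

$$\min_{B \in \mathbb{R}^{n \times n}} \{\, \|B - A\| : Bs = y \,\}$$

is

$$A_+ = A + \frac{(y - As)\, s^T}{s^T s}.$$

In particular, $A_+$ solves the given optimization problem when $\|\cdot\|$ is the $\ell_2$ matrix norm, and $A_+$ is the unique solution when $\|\cdot\|$ is the Frobenius norm.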

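Proof: Let $B \in \{ B \in \mathbb{R}^{n \times n} : Bs = y \}$. Since $Bs = y$, we have $y - As = (B - A)s$, and so

$$\|A_+ - A\| = \left\| \frac{(y - As)\, s^T}{s^T s} \right\| = \left\| (B - A) \frac{s s^T}{s^T s} \right\| \le \|B - A\| \left\| \frac{s s^T}{s^T s} \right\|_e \le \|B - A\|.$$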

Broyden's Method

Algorithm: Broyden's Method

Initialization: $x_0 \in \mathbb{R}^n$, $B_0 \in \mathbb{R}^{n \times n}$

Having $(x_k, B_k)$, compute $(x_{k+1}, B_{k+1})$ as follows:

    Solve $B_k s_k = -g(x_k)$ for $s_k$ and set
    $x_{k+1} := x_k + s_k$
    $y_k := g(x_{k+1}) - g(x_k)$
    $B_{k+1} := B_k + \dfrac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k}$
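The algorithm above can be sketched in plain Python as follows. Everything beyond the update formulas is an assumption made for the illustration: the names `solve` and `broyden`, the stopping rule on $\max_i |g_i(x_k)|$, the iteration cap, and the small test system with its starting point and initial matrix.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if M[p][c] == 0.0:
            raise ValueError("singular matrix")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                     # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def broyden(g, x0, B0, tol=1e-10, max_iter=100):
    """Broyden's method: solve B_k s_k = -g(x_k), step, then update B_k."""
    x, B = list(x0), [row[:] for row in B0]
    n = len(x)
    gx = g(x)
    for _ in range(max_iter):
        if max(abs(v) for v in gx) < tol:
            break
        s = solve(B, [-v for v in gx])
        x = [x[i] + s[i] for i in range(n)]
        g_new = g(x)
        y = [g_new[i] - gx[i] for i in range(n)]
        Bs = [sum(B[i][j] * s[j] for j in range(n)) for i in range(n)]
        sTs = sum(v * v for v in s)
        B = [[B[i][j] + (y[i] - Bs[i]) * s[j] / sTs for j in range(n)]
             for i in range(n)]
        gx = g_new
    return x


# Illustrative system (an assumption): x1 + x2 = 3, x1^2 + x2^2 = 9.
def g(x):
    return [x[0] + x[1] - 3.0, x[0] ** 2 + x[1] ** 2 - 9.0]


# B0 is taken to be the Jacobian of g at the starting point (1, 5).
x_star = broyden(g, [1.0, 5.0], [[1.0, 1.0], [2.0, 10.0]])
```

With this starting data the iterates should approach the root $(0, 3)$. Each iteration costs one evaluation of `g` and one linear solve; no Jacobian is formed after the initial `B0`.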

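Sherman-Morrison-Woodbury Formula for Matrix Inversion

Suppose $A \in \mathbb{R}^{n \times n}$, $U \in \mathbb{R}^{n \times k}$, $V \in \mathbb{R}^{n \times k}$ are such that both $A^{-1}$ and $(I + V^T A^{-1} U)^{-1}$ exist. Then

$$(A + U V^T)^{-1} = A^{-1} - A^{-1} U (I + V^T A^{-1} U)^{-1} V^T A^{-1}.$$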

Inverse Broyden Update

If $B_k^{-1} = J_k$ exists and $s_k^T J_k y_k = s_k^T B_k^{-1} y_k \neq 0$, then

$$J_{k+1} = \left[ B_k + \frac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k} \right]^{-1} = B_k^{-1} + \frac{(s_k - B_k^{-1} y_k)\, s_k^T B_k^{-1}}{s_k^T B_k^{-1} y_k} = J_k + \frac{(s_k - J_k y_k)\, s_k^T J_k}{s_k^T J_k y_k}.$$

Under suitable hypotheses $x_k \to \bar{x}$ superlinearly.
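The inverse formula can be checked from the rank one case of the Sherman-Morrison-Woodbury formula. A sketch, writing $u := (y_k - B_k s_k)/(s_k^T s_k)$ and $v := s_k$ (names introduced here only for the derivation), so that $B_{k+1} = B_k + u v^T$ and $J_k = B_k^{-1}$:

$$J_k u = \frac{J_k y_k - s_k}{s_k^T s_k}, \qquad 1 + v^T J_k u = 1 + \frac{s_k^T J_k y_k - s_k^T s_k}{s_k^T s_k} = \frac{s_k^T J_k y_k}{s_k^T s_k},$$

$$(B_k + u v^T)^{-1} = J_k - \frac{J_k u\, v^T J_k}{1 + v^T J_k u} = J_k - \frac{(J_k y_k - s_k)\, s_k^T J_k}{s_k^T J_k y_k} = J_k + \frac{(s_k - J_k y_k)\, s_k^T J_k}{s_k^T J_k y_k}.$$

The hypothesis $s_k^T J_k y_k \neq 0$ is exactly the condition that the scalar $1 + v^T J_k u$ is nonzero, which is what the formula needs in the rank one case.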

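Broyden Updates

Algorithm: Broyden's Method (Inverse Updating)

Initialization: $x_0 \in \mathbb{R}^n$, $J_0 = B_0^{-1} \in \mathbb{R}^{n \times n}$

Having $(x_k, J_k)$, compute $(x_{k+1}, J_{k+1})$ as follows:

    $s_k := -J_k\, g(x_k)$
    $x_{k+1} := x_k + s_k$
    $y_k := g(x_{k+1}) - g(x_k)$
    $J_{k+1} := J_k + \dfrac{(s_k - J_k y_k)\, s_k^T J_k}{s_k^T J_k y_k}$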
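A minimal Python sketch of the inverse-updating variant follows. The names `matvec` and `broyden_inverse`, the stopping rule, the iteration cap, and the test system are assumptions for illustration; the initial `J0` is the inverse of the Jacobian of the test system at the starting point.

```python
def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def broyden_inverse(g, x0, J0, tol=1e-10, max_iter=100):
    """Broyden's method with inverse updating: no linear solves needed."""
    x, J = list(x0), [row[:] for row in J0]
    n = len(x)
    gx = g(x)
    for _ in range(max_iter):
        if max(abs(v) for v in gx) < tol:
            break
        s = [-v for v in matvec(J, gx)]               # s_k = -J_k g(x_k)
        x = [xi + si for xi, si in zip(x, s)]
        g_new = g(x)
        y = [a - b for a, b in zip(g_new, gx)]
        Jy = matvec(J, y)
        # Row vector s^T J.
        sTJ = [sum(s[i] * J[i][j] for i in range(n)) for j in range(n)]
        denom = sum(si * v for si, v in zip(s, Jy))   # s^T J y
        if denom == 0.0:
            raise ZeroDivisionError("s^T J y == 0: inverse update undefined")
        J = [[J[i][j] + (s[i] - Jy[i]) * sTJ[j] / denom for j in range(n)]
             for i in range(n)]
        gx = g_new
    return x


# Illustrative system (an assumption): x1 + x2 = 3, x1^2 + x2^2 = 9.
def g(x):
    return [x[0] + x[1] - 3.0, x[0] ** 2 + x[1] ** 2 - 9.0]


# J0 is the inverse of the Jacobian [[1, 1], [2, 10]] at the start (1, 5).
x_star = broyden_inverse(g, [1.0, 5.0], [[1.25, -0.125], [-0.25, 0.125]])
```

Compared with updating $B_k$ directly, each iteration here replaces the linear solve by matrix-vector products, so the per-iteration cost is $O(n^2)$ arithmetic plus one evaluation of `g`. With this starting data the iterates should approach the root $(0, 3)$.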

Search Directions for Unconstrained Optimization

CHAPTER 8: Search Directions for Unconstrained Optimization

In this chapter we study the choice of search directions used in our basic updating scheme $x_{k+1} = x_k + t_k d_k$ for solving $\mathcal{P}: \min_{x \in \mathbb{R}^n} f(x)$. All