1. Search Directions

In this chapter we again focus on the unconstrained optimization problem

    $\mathcal{P}: \quad \min_{x \in \mathbb{R}^n} f(x),$

where $f : \mathbb{R}^n \to \mathbb{R}$ is assumed to be twice continuously differentiable, and consider the selection of search directions. The algorithms we consider are descent methods having an iteration scheme of the form

    $x^{k+1} = x^k + t_k d^k.$

Line search methods for choosing a suitable stepsize $t_k$ were the focus of our work in the last chapter. We now devote our attention to choices for the search directions $d^k$. All of the search directions considered in this chapter can be classified as Newton-like, since they are all of the form $d^k = -H_k \nabla f(x^k)$ for some matrix $H_k$. If $H_k = I$ for all $k$, we recover the method of steepest descent. However, in general we wish to choose $H_k$ to be an approximation to $\nabla^2 f(x^k)^{-1}$, which gives Newton's method for optimization. The significance of Newton's method is that when it converges, it converges very fast, typically doubling the accuracy of the solution at each iteration. In order to understand this behavior, we must first learn a bit about rates of convergence.

1.1 Rate of Convergence

In this section we focus on notions of quotient convergence, or Q-convergence. There are corresponding notions of root convergence, or R-convergence; these notions are derived from the quotient and root tests for the convergence of power series. In all instances the discussion given below refers only to Q-convergence.

Let $\{x^\nu\} \subset \mathbb{R}^n$ and $\bar{x} \in \mathbb{R}^n$ be such that $x^\nu \to \bar{x}$. We say that $x^\nu \to \bar{x}$ at a linear rate if

    $\limsup_{\nu \to \infty} \dfrac{\|x^{\nu+1} - \bar{x}\|}{\|x^\nu - \bar{x}\|} < 1.$

The convergence is said to be superlinear if this limsup is $0$. The convergence is said to be quadratic if

    $\limsup_{\nu \to \infty} \dfrac{\|x^{\nu+1} - \bar{x}\|}{\|x^\nu - \bar{x}\|^2} < \infty.$

For example, given $\gamma \in (0,1)$, the sequence $\{\gamma^\nu\}$ converges linearly to zero, but not superlinearly. The sequence $\{\gamma^{\nu^2}\}$ converges superlinearly to $0$, but not quadratically. Finally, the sequence $\{\gamma^{2^\nu}\}$ converges quadratically to zero. Superlinear convergence is much faster than linear convergence, but quadratic convergence is much, much faster than superlinear convergence.
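To see the three rates side by side, here is a small illustrative Python sketch (the value $\gamma = 0.9$ and the range of $\nu$ are arbitrary choices, not from the notes) that prints the quotients appearing in the definitions for the three model sequences above.

```python
import numpy as np

gamma = 0.9
nu = np.arange(1, 9)

# The three model sequences from the text, all converging to 0.
linear    = gamma ** nu            # gamma^nu     : linear rate
superlin  = gamma ** (nu ** 2)     # gamma^(nu^2) : superlinear rate
quadratic = gamma ** (2.0 ** nu)   # gamma^(2^nu) : quadratic rate

for name, x in [("linear", linear), ("superlinear", superlin), ("quadratic", quadratic)]:
    q1 = x[1:] / x[:-1]        # |x^{nu+1}| / |x^nu|      (stays below 1 for linear)
    q2 = x[1:] / x[:-1] ** 2   # |x^{nu+1}| / |x^nu|^2    (stays bounded for quadratic)
    print(f"{name:12s} Q1: {np.array2string(q1, precision=3)}")
    print(f"{'':12s} Q2: {np.array2string(q2, precision=3)}")
```

The first quotient is constant for the linear sequence, tends to zero for the superlinear one, and the second quotient stays bounded only for the quadratic one.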

1.2 Newton's Method for Solving Equations

Consider the following problem:

    $\mathcal{E}: \quad$ Given $g : \mathbb{R}^n \to \mathbb{R}^n$, find $x \in \mathbb{R}^n$ for which $g(x) = 0.$

A great variety of problems can be posed in this format. It is a problem of great practical importance, and various versions of it continue to be the focus of much ongoing scientific, mathematical, and numerical research. It is important in the context of optimization because of the first-order necessary condition for optimality, $\nabla f(x) = 0$. A standard approach to solving optimization problems is to locate all critical points, or, equivalently, the zero set of the gradient. Newton's method for equation solving (or just Newton's method) is designed for this purpose. When one applies Newton's method to solve for the critical points of a function, it is referred to as Newton's method for optimization. Before moving on to optimization, though, we first consider the standard Newton's method for equations.

Assume that the function $g$ in $\mathcal{E}$ is continuously differentiable and that we have an approximate solution $x^0 \in \mathbb{R}^n$ to $\mathcal{E}$. We now wish to improve on this approximation. If $\bar{x}$ is a solution to $\mathcal{E}$, then

    $0 = g(\bar{x}) = g(x^0) + g'(x^0)(\bar{x} - x^0) + o(\|\bar{x} - x^0\|).$

Thus, if $x^0$ is close to $\bar{x}$, it is reasonable to suppose that the solution to the linearized system

    (1.1)    $0 = g(x^0) + g'(x^0)(x - x^0)$

is even closer. This procedure is known as Newton's method for finding the roots of the equation $g(x) = 0$. It has one obvious pitfall: equation (1.1) may not be consistent, that is, there may not exist an $x$ solving (1.1). In general, the set of solutions to (1.1) is either (1) the empty set, (2) an infinite set, or (3) a single point. For the sake of the present argument, we assume that (3) holds, i.e., that $g'(x^0)^{-1}$ exists. Under this assumption, (1.1) defines the iteration scheme

    (1.2)    $x^{k+1} := x^k - [g'(x^k)]^{-1} g(x^k),$

called the Newton iteration. The associated direction

    (1.3)    $d^k := -[g'(x^k)]^{-1} g(x^k)$

is called the Newton direction. We analyze the convergence behavior of this scheme under the additional assumption that only an approximation to $g'(x^k)^{-1}$ is available. We denote this approximation by $J_k$. The resulting iteration scheme is

    (1.4)    $x^{k+1} := x^k - J_k g(x^k).$

Methods of this type are called Newton-like methods.
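Before stating the convergence theorem, here is a minimal Python sketch of the Newton iteration (1.2) on a small two-dimensional system; the test system $g$ and the starting point are illustrative choices, not from the notes. Replacing the Jacobian solve with an approximate inverse $J_k$ turns this into the Newton-like iteration (1.4).

```python
import numpy as np

def g(x):
    # An illustrative test system with a root at (1, 1).
    return np.array([x[0]**2 + x[1]**2 - 2.0,
                     np.exp(x[0] - 1.0) - x[1]])

def g_prime(x):
    # Jacobian g'(x) of the test system.
    return np.array([[2.0 * x[0],         2.0 * x[1]],
                     [np.exp(x[0] - 1.0), -1.0      ]])

x = np.array([2.0, 0.5])          # x^0
for k in range(8):
    gx = g(x)
    print(f"k = {k}:  ||g(x^k)|| = {np.linalg.norm(gx):.3e}")
    # Newton iteration (1.2): x^{k+1} = x^k - g'(x^k)^{-1} g(x^k).
    # Solving the linear system is preferred to forming the inverse explicitly.
    x = x - np.linalg.solve(g_prime(x), gx)
```

The printed residuals shrink roughly quadratically once the iterates are near the root, which is the behavior quantified in the theorem below.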

Theorem 1.1. Let $g : \mathbb{R}^n \to \mathbb{R}^n$ be differentiable, $x^0 \in \mathbb{R}^n$, and $J_0 \in \mathbb{R}^{n \times n}$. Suppose that there exist $\bar{x} \in \mathbb{R}^n$ and $\epsilon > 0$ with $\|x^0 - \bar{x}\| < \epsilon$ such that
(1) $g(\bar{x}) = 0$,
(2) $g'(x)^{-1}$ exists for $x \in B(\bar{x}; \epsilon) := \{x \in \mathbb{R}^n : \|x - \bar{x}\| < \epsilon\}$, with $\sup\{\|g'(x)^{-1}\| : x \in B(\bar{x}; \epsilon)\} \le M_1$,
(3) $g'$ is Lipschitz continuous on $\mathrm{cl}\,B(\bar{x}; \epsilon)$ with Lipschitz constant $L$, and
(4) $\theta_0 := \frac{L M_1}{2}\|x^0 - \bar{x}\| + M_0 K < 1$, where $K \ge \|(g'(x^0)^{-1} - J_0)y^0\|$, $y^0 := g(x^0)/\|g(x^0)\|$, and $M_0 = \max\{\|g'(x)\| : x \in B(\bar{x}; \epsilon)\}$.
Further suppose that the iteration (1.4) is initiated at $x^0$, where the $J_k$'s are chosen to satisfy one of the following conditions:
(i) $\|(g'(x^k)^{-1} - J_k)y^k\| \le K$,
(ii) $\|(g'(x^k)^{-1} - J_k)y^k\| \le \theta_1^k K$ for some $\theta_1 \in (0,1)$,
(iii) $\|(g'(x^k)^{-1} - J_k)y^k\| \le \min\{M_2\|x^k - x^{k-1}\|, K\}$ for some $M_2 > 0$, or
(iv) $\|(g'(x^k)^{-1} - J_k)y^k\| \le \min\{M_3\|g(x^k)\|, K\}$ for some $M_3 > 0$,
where for each $k = 1, 2, \dots$, $y^k := g(x^k)/\|g(x^k)\|$. These hypotheses on the accuracy of the approximations $J_k$ yield the following conclusions about the rate of convergence of the iterates $x^k$.
(a) If (i) holds, then $x^k \to \bar{x}$ linearly.
(b) If (ii) holds, then $x^k \to \bar{x}$ superlinearly.
(c) If (iii) holds, then $x^k \to \bar{x}$ two-step quadratically.
(d) If (iv) holds, then $x^k \to \bar{x}$ quadratically.

Proof. We begin by establishing the basic inequalities

    (1.5)    $\|x^{k+1} - \bar{x}\| \le \frac{L M_1}{2}\|x^k - \bar{x}\|^2 + \|(g'(x^k)^{-1} - J_k)g(x^k)\|$

and

    (1.6)    $\|x^{k+1} - \bar{x}\| \le \theta_0 \|x^k - \bar{x}\|,$

together with the inclusion

    (1.7)    $x^{k+1} \in B(\bar{x}; \epsilon),$

by induction on $k$. For $k = 0$ we have

    $x^1 - \bar{x} = x^0 - \bar{x} - g'(x^0)^{-1}g(x^0) + [g'(x^0)^{-1} - J_0]g(x^0)$
    $\qquad\;\;\, = g'(x^0)^{-1}\left[g(\bar{x}) - \left(g(x^0) + g'(x^0)(\bar{x} - x^0)\right)\right] + [g'(x^0)^{-1} - J_0]g(x^0),$

since $g'(x^0)^{-1}$ exists by the hypotheses. Consequently, hypotheses (1)-(4) plus the quadratic bound lemma imply that

    $\|x^1 - \bar{x}\| \le \|g'(x^0)^{-1}\|\,\|g(\bar{x}) - (g(x^0) + g'(x^0)(\bar{x} - x^0))\| + \|(g'(x^0)^{-1} - J_0)g(x^0)\|$
    $\qquad\qquad \le \frac{M_1 L}{2}\|x^0 - \bar{x}\|^2 + K\|g(x^0) - g(\bar{x})\|$
    $\qquad\qquad \le \frac{M_1 L}{2}\|x^0 - \bar{x}\|^2 + M_0 K\|x^0 - \bar{x}\|$
    $\qquad\qquad \le \theta_0\|x^0 - \bar{x}\| < \epsilon,$

whereby (1.5)-(1.7) are established for $k = 0$. Next suppose that (1.5)-(1.7) hold for $k = 0, 1, \dots, s-1$. We show that they hold at $k = s$. Since $x^s \in B(\bar{x}; \epsilon)$, hypotheses (2)-(4) hold at $x^s$, and one can proceed exactly as in the case $k = 0$ to obtain (1.5).

Now if any one of (i)-(iv) holds, then (i) holds. Thus, by (1.5), we find that

    $\|x^{s+1} - \bar{x}\| \le \frac{M_1 L}{2}\|x^s - \bar{x}\|^2 + \|(g'(x^s)^{-1} - J_s)g(x^s)\|$
    $\qquad\qquad \le \left[\frac{M_1 L}{2}\theta_0^s\|x^0 - \bar{x}\| + M_0 K\right]\|x^s - \bar{x}\|$
    $\qquad\qquad \le \left[\frac{M_1 L}{2}\|x^0 - \bar{x}\| + M_0 K\right]\|x^s - \bar{x}\|$
    $\qquad\qquad = \theta_0\|x^s - \bar{x}\|.$

Hence $\|x^{s+1} - \bar{x}\| \le \theta_0\|x^s - \bar{x}\| \le \theta_0\epsilon < \epsilon$, and so $x^{s+1} \in B(\bar{x}; \epsilon)$. We now proceed to establish (a)-(d).

(a) This clearly holds, since the induction above established that $\|x^{k+1} - \bar{x}\| \le \theta_0\|x^k - \bar{x}\|$ for all $k$.

(b) From (1.5), we have

    $\|x^{k+1} - \bar{x}\| \le \frac{L M_1}{2}\|x^k - \bar{x}\|^2 + \|(g'(x^k)^{-1} - J_k)g(x^k)\|$
    $\qquad\qquad \le \frac{L M_1}{2}\|x^k - \bar{x}\|^2 + \theta_1^k K\|g(x^k)\|$
    $\qquad\qquad \le \left[\frac{L M_1}{2}\theta_0^k\|x^0 - \bar{x}\| + \theta_1^k M_0 K\right]\|x^k - \bar{x}\|.$

Hence $x^k \to \bar{x}$ superlinearly.

(c) From (1.5) and the fact that $x^k \to \bar{x}$, we eventually have

    $\|x^{k+1} - \bar{x}\| \le \frac{L M_1}{2}\|x^k - \bar{x}\|^2 + \|(g'(x^k)^{-1} - J_k)g(x^k)\|$
    $\qquad\qquad \le \frac{L M_1}{2}\|x^k - \bar{x}\|^2 + M_2\|x^k - x^{k-1}\|\,\|g(x^k)\|$
    $\qquad\qquad \le \left[\frac{L M_1}{2}\|x^k - \bar{x}\| + M_0 M_2\left(\|x^{k-1} - \bar{x}\| + \|x^k - \bar{x}\|\right)\right]\|x^k - \bar{x}\|$
    $\qquad\qquad \le \left[\frac{L M_1}{2}\theta_0\|x^{k-1} - \bar{x}\| + M_0 M_2(1 + \theta_0)\|x^{k-1} - \bar{x}\|\right]\theta_0\|x^{k-1} - \bar{x}\|$
    $\qquad\qquad = \left[\frac{L M_1}{2}\theta_0 + M_0 M_2(1 + \theta_0)\right]\theta_0\|x^{k-1} - \bar{x}\|^2.$

Hence $x^k \to \bar{x}$ two-step quadratically.

(d) Again by (1.5) and the fact that $x^k \to \bar{x}$, we eventually have

    $\|x^{k+1} - \bar{x}\| \le \frac{L M_1}{2}\|x^k - \bar{x}\|^2 + \|(g'(x^k)^{-1} - J_k)g(x^k)\|$
    $\qquad\qquad \le \frac{L M_1}{2}\|x^k - \bar{x}\|^2 + M_3\|g(x^k)\|^2$
    $\qquad\qquad \le \left[\frac{L M_1}{2} + M_3 M_0^2\right]\|x^k - \bar{x}\|^2.$

Hence $x^k \to \bar{x}$ quadratically.

Note that the conditions required of the approximations to the inverse Jacobian matrices $g'(x^k)^{-1}$ given in (i)-(iv) do not imply that $J_k \to g'(\bar{x})^{-1}$. The stronger conditions
(i') $\|g'(x^k)^{-1} - J_k\| \le \|g'(x^0)^{-1} - J_0\|$,
(ii') $\|g'(x^{k+1})^{-1} - J_{k+1}\| \le \theta_1\|g'(x^k)^{-1} - J_k\|$ for some $\theta_1 \in (0,1)$,
(iii') $\|g'(x^k)^{-1} - J_k\| \le \min\{M_2\|x^{k+1} - x^k\|,\ \|g'(x^0)^{-1} - J_0\|\}$ for some $M_2 > 0$, or
(iv') $g'(x^k)^{-1} = J_k$,
which imply conditions (i) through (iv) of Theorem 1.1 respectively, all imply the convergence of the inverse Jacobian approximations to $g'(\bar{x})^{-1}$. Clearly the conditions (i')-(iv') are less desirable, since they require a great deal more expense and care in the construction of the inverse Jacobian approximations.

1.3 Newton's Method for Minimization

In this section we translate the results of the previous section into the context of minimization. Here the underlying problem is

    $\mathcal{P}: \quad \min_{x \in \mathbb{R}^n} f(x).$

The Newton-like iterations are of the form

    $x^{k+1} = x^k - H_k \nabla f(x^k),$

where $H_k$ is an approximation to the inverse of the Hessian matrix $\nabla^2 f(x^k)$.

Theorem 1.2. Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice differentiable, $x^0 \in \mathbb{R}^n$, and $H_0 \in \mathbb{R}^{n \times n}$. Suppose that
(1) there exist $\bar{x} \in \mathbb{R}^n$ and $\epsilon > \|x^0 - \bar{x}\|$ such that $f(\bar{x}) \le f(x)$ whenever $\|x - \bar{x}\| \le \epsilon$,
(2) there is a $\delta > 0$ such that $\delta\|z\|_2^2 \le z^T \nabla^2 f(x) z$ for all $x \in B(\bar{x}; \epsilon)$,
(3) $\nabla^2 f$ is Lipschitz continuous on $\mathrm{cl}\,B(\bar{x}; \epsilon)$ with Lipschitz constant $L$, and
(4) $\theta_0 := \frac{L}{2\delta}\|x^0 - \bar{x}\| + M_0 K < 1$, where $M_0 > 0$ satisfies $z^T \nabla^2 f(x) z \le M_0\|z\|_2^2$ for all $x \in B(\bar{x}; \epsilon)$, and $K \ge \|(\nabla^2 f(x^0)^{-1} - H_0)y^0\|$ with $y^0 = \nabla f(x^0)/\|\nabla f(x^0)\|$.
Further suppose that the iteration

    (1.8)    $x^{k+1} := x^k - H_k \nabla f(x^k)$

is initiated at $x^0$, where the $H_k$'s are chosen to satisfy one of the following conditions:

(i) $\|(\nabla^2 f(x^k)^{-1} - H_k)y^k\| \le K$,
(ii) $\|(\nabla^2 f(x^k)^{-1} - H_k)y^k\| \le \theta_1^k K$ for some $\theta_1 \in (0,1)$,
(iii) $\|(\nabla^2 f(x^k)^{-1} - H_k)y^k\| \le \min\{M_2\|x^k - x^{k-1}\|, K\}$ for some $M_2 > 0$, or
(iv) $\|(\nabla^2 f(x^k)^{-1} - H_k)y^k\| \le \min\{M_3\|\nabla f(x^k)\|, K\}$ for some $M_3 > 0$,
where for each $k = 1, 2, \dots$, $y^k := \nabla f(x^k)/\|\nabla f(x^k)\|$. These hypotheses on the accuracy of the approximations $H_k$ yield the following conclusions about the rate of convergence of the iterates $x^k$.
(a) If (i) holds, then $x^k \to \bar{x}$ linearly.
(b) If (ii) holds, then $x^k \to \bar{x}$ superlinearly.
(c) If (iii) holds, then $x^k \to \bar{x}$ two-step quadratically.
(d) If (iv) holds, then $x^k \to \bar{x}$ quadratically.

In order to understand more fully the convergence behavior described in the above result, a careful study of the roles of the controlling parameters $L$, $M_0$, and $M_1$ needs to be made. Although we do not attempt this study, we do make a few observations. First observe that since $L$ is a Lipschitz constant for $\nabla^2 f$, it represents a bound on the third-order behavior of $f$; thus the assumptions for convergence make implicit demands on the third derivative. Next, the constant $\delta$ in the context of minimization represents a local uniform lower bound on the eigenvalues of $\nabla^2 f$; that is, $f$ behaves locally as if it were a strongly convex function (see the exercises) with modulus $\delta$. Finally, $M_0$ can be interpreted as a local Lipschitz constant for $\nabla f$, and it only plays a role when $\nabla^2 f$ is approximated inexactly by the $H_k$'s.

We now consider the performance differences between the method of steepest descent and Newton's method on a simple one-dimensional problem. For this we consider the function $f(x) = x^2 + e^x$. Clearly, $f$ is a strongly convex function, with

    $f(x) = x^2 + e^x,\qquad f'(x) = 2x + e^x,\qquad f''(x) = 2 + e^x > 2,\qquad f'''(x) = e^x.$

If we apply the steepest descent algorithm with backtracking ($c = 0.001$), initiated at $x^0 = 1$, we obtain a table of iterates recording $k$, $x^k$, $f(x^k)$, $f'(x^k)$, and the step taken at each iteration.
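A sketch of how such a table can be generated is given below; the backtracking shrink factor $\gamma = 1/2$ and the initial trial step $t = 1$ are assumptions made here for illustration, so the printed numbers need not match the notes' table exactly.

```python
import numpy as np

f  = lambda x: x**2 + np.exp(x)
df = lambda x: 2.0 * x + np.exp(x)

# Assumed backtracking parameters: shrink factor gamma, sufficient-decrease
# constant c, initial trial step t = 1.
gamma, c = 0.5, 0.001

x = 1.0                      # x^0
for k in range(10):
    d = -df(x)               # steepest descent direction
    t = 1.0
    # Backtracking: shrink t until the sufficient decrease condition holds.
    while f(x + t * d) > f(x) + c * t * df(x) * d:
        t *= gamma
    print(f"k = {k:2d}   x = {x: .6f}   f(x) = {f(x): .6f}   f'(x) = {df(x): .2e}   t = {t}")
    x = x + t * d
```

The derivative values shrink only slowly from one iteration to the next, which is the linear behavior the comparison with Newton's method below is meant to highlight.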

Let us now apply Newton's method from the same starting point, taking a unit step at each iteration. This time the iterates close in on the solution after only a few steps, and one more iteration drives $|f'(x^5)|$ down to roughly the level of machine precision. This is a stunning improvement in performance and shows why one always uses Newton's method (or an approximation to it) whenever possible.

Our next objective is to develop numerically viable methods for approximating Jacobians and Hessians in Newton-like methods. We begin with a brief excursion into numerical linear algebra.

1.4 Numerical Linear Algebra

1.4.1 The LU Factorization. Recall from linear algebra that Gaussian elimination is a method for solving linear systems of the form $Ax = b$, where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathrm{Ran}(A)$. In this method one first forms the augmented system $[A \mid b]$ and then uses the three elementary row operations to put this system into row echelon form (or upper triangular form). A solution $x$ is then obtained by back substitution, or back solving, starting with the component $x_n$. We now show how the process of bringing a matrix to upper triangular form can be performed by left matrix multiplication.

The key step in Gaussian elimination is to transform a vector of the form

    $\begin{pmatrix} a \\ \alpha \\ b \end{pmatrix}$, where $a \in \mathbb{R}^k$, $0 \ne \alpha \in \mathbb{R}$, and $b \in \mathbb{R}^{n-k-1}$,

into one of the form

    $\begin{pmatrix} a \\ \alpha \\ 0 \end{pmatrix}.$

This can be accomplished by left matrix multiplication as follows:

    $\begin{pmatrix} I_{k \times k} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -\alpha^{-1} b & I_{(n-k-1) \times (n-k-1)} \end{pmatrix}\begin{pmatrix} a \\ \alpha \\ b \end{pmatrix} = \begin{pmatrix} a \\ \alpha \\ 0 \end{pmatrix}.$

The matrix

    $\begin{pmatrix} I_{k \times k} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -\alpha^{-1} b & I_{(n-k-1) \times (n-k-1)} \end{pmatrix}$

is called a Gaussian elimination matrix. This matrix is invertible, with inverse

    $\begin{pmatrix} I_{k \times k} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & \alpha^{-1} b & I_{(n-k-1) \times (n-k-1)} \end{pmatrix}.$

We now use this basic idea to show how a matrix can be put into upper triangular form. Suppose

    $A = \begin{pmatrix} a_1 & v_1^T \\ u_1 & \tilde{A}_1 \end{pmatrix} \in \mathbb{C}^{m \times n},$

with $0 \ne a_1 \in \mathbb{C}$, $u_1 \in \mathbb{C}^{m-1}$, $v_1 \in \mathbb{C}^{n-1}$, and $\tilde{A}_1 \in \mathbb{C}^{(m-1) \times (n-1)}$. Then using the first row to zero out $u_1$ amounts to left multiplication of the matrix $A$ by the matrix

    $\begin{pmatrix} 1 & 0 \\ -\frac{u_1}{a_1} & I \end{pmatrix}$

to get

    $(\ast)\qquad \begin{pmatrix} 1 & 0 \\ -\frac{u_1}{a_1} & I \end{pmatrix}\begin{pmatrix} a_1 & v_1^T \\ u_1 & \tilde{A}_1 \end{pmatrix} = \begin{pmatrix} a_1 & v_1^T \\ 0 & A_1 \end{pmatrix} \in \mathbb{C}^{m \times n},$

where $A_1 = \tilde{A}_1 - u_1 v_1^T/a_1$. Define

    $L_1 = \begin{pmatrix} 1 & 0 \\ \frac{u_1}{a_1} & I \end{pmatrix} \in \mathbb{C}^{m \times m}$ and $U_1 = \begin{pmatrix} a_1 & v_1^T \\ 0 & A_1 \end{pmatrix} \in \mathbb{C}^{m \times n},$

and observe that

    $L_1^{-1} = \begin{pmatrix} 1 & 0 \\ -\frac{u_1}{a_1} & I \end{pmatrix}.$

Hence $(\ast)$ becomes $L_1^{-1} A = U_1$, or equivalently, $A = L_1 U_1$. Note that $L_1$ is unit lower triangular (ones on the main diagonal) and $U_1$ is block upper triangular, with one $1 \times 1$ block and one $(m-1) \times (n-1)$ block on the block diagonal. The multipliers are usually denoted $u_1/a_1 = [\mu_{21}, \mu_{31}, \dots, \mu_{m1}]^T$.

If the $(1,1)$ entry of $A_1$ is not $0$, we can apply the same procedure to $A_1$: writing

    $A_1 = \begin{pmatrix} a_2 & v_2^T \\ u_2 & \tilde{A}_2 \end{pmatrix} \in \mathbb{C}^{(m-1) \times (n-1)}$

with $a_2 \ne 0$, letting

    $\tilde{L}_2 = \begin{pmatrix} 1 & 0 \\ \frac{u_2}{a_2} & I \end{pmatrix} \in \mathbb{C}^{(m-1) \times (m-1)},$

and forming

    $\tilde{L}_2^{-1} A_1 = \begin{pmatrix} 1 & 0 \\ -\frac{u_2}{a_2} & I \end{pmatrix}\begin{pmatrix} a_2 & v_2^T \\ u_2 & \tilde{A}_2 \end{pmatrix} = \begin{pmatrix} a_2 & v_2^T \\ 0 & A_2 \end{pmatrix} =: \tilde{U}_2 \in \mathbb{C}^{(m-1) \times (n-1)},$

where $A_2 \in \mathbb{C}^{(m-2) \times (n-2)}$. This process amounts to using the second row to zero out the elements of the second column below the diagonal. Setting

    $L_2 = \begin{pmatrix} 1 & 0 \\ 0 & \tilde{L}_2 \end{pmatrix}$ and $U_2 = \begin{pmatrix} a_1 & v_1^T \\ 0 & \tilde{U}_2 \end{pmatrix},$

we have

    $L_2^{-1} L_1^{-1} A = \begin{pmatrix} 1 & 0 \\ 0 & \tilde{L}_2^{-1} \end{pmatrix}\begin{pmatrix} a_1 & v_1^T \\ 0 & A_1 \end{pmatrix} = \begin{pmatrix} a_1 & v_1^T \\ 0 & \tilde{U}_2 \end{pmatrix} = U_2,$

or equivalently, $A = L_1 L_2 U_2$. Here $U_2$ is block upper triangular with two $1 \times 1$ blocks and one $(m-2) \times (n-2)$ block on the block diagonal, and again $L_2$ is unit lower triangular.

We can continue in this fashion at most $\bar{m} - 1$ times, where $\bar{m} = \min\{m, n\}$. If we can proceed $\bar{m} - 1$ times, then

    $L_{\bar{m}-1}^{-1} \cdots L_2^{-1} L_1^{-1} A = U_{\bar{m}-1} = U$

is upper triangular, provided that along the way the $(1,1)$ entries of $A, A_1, A_2, \dots, A_{\bar{m}-2}$ are nonzero so that the process can continue. Define

    $L = (L_{\bar{m}-1}^{-1} \cdots L_1^{-1})^{-1} = L_1 L_2 \cdots L_{\bar{m}-1}.$

The matrix $L$ is square unit lower triangular, and so is invertible. Moreover, $A = LU$, where the matrix $U$ is the so-called row echelon form of $A$. In general, a matrix $T \in \mathbb{C}^{m \times n}$ is said to be in row echelon form if for each $i = 1, \dots, m-1$ the first nonzero entry in the $(i+1)$st row lies to the right of the first nonzero entry in the $i$th row.

Let us now suppose that $m = n$ and $A \in \mathbb{C}^{n \times n}$ is invertible. Writing $A = LU$ as a product of a unit lower triangular matrix $L \in \mathbb{C}^{n \times n}$ (necessarily invertible) and an upper triangular matrix $U \in \mathbb{C}^{n \times n}$ (also necessarily invertible in this case) is called the LU factorization of $A$.
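The elimination recursion just described is short to implement. The following sketch performs no pivoting, so it assumes every pivot encountered is nonzero; the $3 \times 3$ test matrix is an arbitrary choice. It returns a unit lower triangular $L$ and an upper triangular $U$ with $A = LU$.

```python
import numpy as np

def lu_no_pivot(A):
    """LU factorization A = L U by Gaussian elimination, assuming all
    pivots encountered are nonzero (no row exchanges are performed)."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    U = A.copy()
    for j in range(n - 1):
        if U[j, j] == 0.0:
            raise ValueError("zero pivot encountered: a PLU factorization is needed")
        # Multipliers mu_{ij} = u_{ij} / u_{jj} for i > j.
        L[j+1:, j] = U[j+1:, j] / U[j, j]
        # Eliminate the entries below the pivot: row_i <- row_i - mu_{ij} * row_j.
        U[j+1:, :] -= np.outer(L[j+1:, j], U[j, :])
    return L, U

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
L, U = lu_no_pivot(A)
print(np.allclose(L @ U, A))   # True
```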

Remarks.
(1) If $A \in \mathbb{C}^{n \times n}$ is invertible and has an LU factorization, it is unique.
(2) One can show that $A \in \mathbb{C}^{n \times n}$ has an LU factorization if and only if for each $1 \le j \le n$ the upper left $j \times j$ principal submatrix

    $\begin{pmatrix} a_{11} & \cdots & a_{1j} \\ \vdots & & \vdots \\ a_{j1} & \cdots & a_{jj} \end{pmatrix}$

is invertible.
(3) Not every invertible $A \in \mathbb{C}^{n \times n}$ has an LU factorization; for example, $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. Typically, one must permute the rows of $A$ to move nonzero entries to the appropriate spot for the elimination to proceed. Recall that a permutation matrix $P \in \mathbb{C}^{n \times n}$ is the identity $I$ with its rows (or columns) permuted, so $P \in \mathbb{R}^{n \times n}$ is orthogonal and $P^{-1} = P^T$. Permuting the rows of $A$ amounts to left multiplication by a permutation matrix $P^T$; then $P^T A$ has an LU factorization, so $A = PLU$ (called the PLU factorization of $A$).
(4) Fact: every invertible $A \in \mathbb{C}^{n \times n}$ has a (not necessarily unique) PLU factorization.
(5) The LU factorization can be used to solve linear systems $Ax = b$ (where $A = LU \in \mathbb{C}^{n \times n}$ is invertible). The system $Ly = b$ can be solved by forward substitution (the 1st equation gives $y_1$, etc.), and then $Ux = y$ can be solved by back substitution (the $n$th equation gives $x_n$, etc.), giving the solution to $Ax = LUx = b$.

Example. We now use the procedure outlined above to compute the LU factorization of a particular matrix $A$: the first elimination step produces $L_1^{-1}A$, the second produces $L_2^{-1}L_1^{-1}A = U$, and the multipliers are assembled into $L = L_1 L_2$, so that $A = LU$.

1.4.2 The Cholesky Factorization. We now consider the application of the LU factorization to a symmetric positive definite matrix, but with a twist. Suppose the $n \times n$ matrix $H$ is symmetric and we have performed the first step in the procedure for computing the LU factorization of $H$, so that $L_1^{-1}H = U_1$. Clearly, $U_1$ is no longer symmetric (assuming $L_1$ is not the identity matrix). To recover symmetry we could multiply $U_1$ on the right by the upper triangular matrix $L_1^{-T}$, so that

    $L_1^{-1} H L_1^{-T} = U_1 L_1^{-T} = H_1.$

We claim that $H_1$ necessarily has the form

    $H_1 = \begin{pmatrix} h_{(1,1)} & 0 \\ 0 & \hat{H}_1 \end{pmatrix},$

where $h_{(1,1)}$ is the $(1,1)$ element of $H$ and $\hat{H}_1$ is an $(n-1) \times (n-1)$ symmetric matrix. For example, computing $L_1^{-1} H L_1^{-T}$ for a particular symmetric matrix $H$ produces exactly this block diagonal pattern. If we now continue this process, with the added feature of multiplying on the right by $L_j^{-T}$ as we proceed, we obtain $L^{-1} H L^{-T} = D$, or equivalently,

    $H = L D L^T,$

where $L$ is a unit lower triangular matrix and $D$ is a diagonal matrix. Note that the entries on the diagonal of $D$ are not necessarily the eigenvalues of $H$, since the transformation $L^{-1} H L^{-T}$ is not a similarity transformation. Observe that if it is further assumed that $H$ is positive definite, then the diagonal entries of $D$ are necessarily all positive and the factorization $H = LDL^T$ can always be obtained, i.e., no zero pivots can arise in computing the LU factorization (see the exercises).

Let us apply this approach by continuing the computation of the LU factorization for the matrix $H$ given above. After the second step one obtains

    $L_2^{-1} L_1^{-1} H L_1^{-T} L_2^{-T} = D,$

giving the desired factorization $H = LDL^T$. In this particular example one of the diagonal entries of $D$ is negative, which implies that the matrix $H$ is not positive definite.

We make one final comment on the positive definite case. When $H$ is symmetric and positive definite, an LU factorization always exists, and we can use it to obtain a factorization of the form $H = LDL^T$, where $L$ is unit lower triangular and $D$ is diagonal with positive diagonal entries. If $D = \mathrm{diag}(d_1, d_2, \dots, d_n)$ with each $d_i > 0$, then $D^{1/2} = \mathrm{diag}(\sqrt{d_1}, \sqrt{d_2}, \dots, \sqrt{d_n})$. Hence we can write

    $H = L D L^T = L D^{1/2} D^{1/2} L^T = \hat{L}\hat{L}^T,$

where $\hat{L} = L D^{1/2}$ is a nonsingular lower triangular matrix. The factorization $H = \hat{L}\hat{L}^T$, where $\hat{L}$ is a nonsingular lower triangular matrix, is called the Cholesky factorization of $H$. In this regard, the process of computing a Cholesky factorization is an effective means for determining whether a symmetric matrix is positive definite.
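This last observation is easy to exploit in code. The sketch below uses numpy's built-in Cholesky routine (rather than the LDL^T route described above) to test whether a symmetric matrix is positive definite; the two test matrices are arbitrary choices.

```python
import numpy as np

def is_positive_definite(H):
    """Attempt a Cholesky factorization H = L L^T; success certifies that
    the symmetric matrix H is positive definite."""
    try:
        L = np.linalg.cholesky(H)
        return True, L
    except np.linalg.LinAlgError:
        return False, None

H1 = np.array([[4.0, 2.0], [2.0, 3.0]])   # positive definite
H2 = np.array([[1.0, 2.0], [2.0, 1.0]])   # indefinite (eigenvalues 3 and -1)

for H in (H1, H2):
    ok, L = is_positive_definite(H)
    print("positive definite:", ok)
    if ok:
        print("Cholesky factor L =\n", L)
```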

1.4.3 The QR Factorization. Recall the Gram-Schmidt orthogonalization process for a sequence of linearly independent vectors $a_1, \dots, a_n \in \mathbb{R}^n$. Define $q_1, \dots, q_n$ inductively as follows: set

    $p_1 = a_1, \qquad q_1 = p_1/\|p_1\|,$
    $p_j = a_j - \sum_{i=1}^{j-1} \langle a_j, q_i \rangle q_i \quad \text{for } 2 \le j \le n, \qquad q_j = p_j/\|p_j\|.$

For $1 \le j \le n$ we have $q_1, \dots, q_{j-1} \in \mathrm{Span}\{a_1, \dots, a_{j-1}\}$, so each $p_j \ne 0$ by the linear independence of $\{a_1, \dots, a_n\}$. Thus each $q_j$ is well defined, and $\{q_1, \dots, q_n\}$ is an orthonormal basis for $\mathrm{Span}\{a_1, \dots, a_n\}$. Also $a_k \in \mathrm{Span}\{q_1, \dots, q_k\}$ for $1 \le k \le n$, so $\{q_1, \dots, q_k\}$ is an orthonormal basis of $\mathrm{Span}\{a_1, \dots, a_k\}$. Defining $r_{jj} = \|p_j\|$ and $r_{ij} = \langle a_j, q_i \rangle$ for $1 \le i < j \le n$, we have

    $a_1 = r_{11} q_1,$
    $a_2 = r_{12} q_1 + r_{22} q_2,$
    $a_3 = r_{13} q_1 + r_{23} q_2 + r_{33} q_3,$
    $\ \vdots$
    $a_n = \sum_{i=1}^{n} r_{in} q_i.$

Set $A = [a_1\ a_2\ \cdots\ a_n]$, $R = [r_{ij}]$ with $r_{ij} = 0$ for $i > j$, and $Q = [q_1\ q_2\ \cdots\ q_n]$. Then $A = QR$, where $Q$ is unitary and $R$ is upper triangular.

Remarks.
(1) If $a_1, a_2, \dots$ is a linearly independent sequence, apply Gram-Schmidt to obtain an orthonormal sequence $q_1, q_2, \dots$ such that $\{q_1, \dots, q_k\}$ is an orthonormal basis for $\mathrm{Span}\{a_1, \dots, a_k\}$ for each $k \ge 1$.
(2) If the $a_j$'s are linearly dependent, then for some value(s) of $k$, $a_k \in \mathrm{Span}\{a_1, \dots, a_{k-1}\}$, so $p_k = 0$. The process can be modified by setting $r_{kk} = 0$, not defining a new $q_k$ for this iteration, and proceeding. We end up with orthonormal $q_j$'s, and for $k \ge 1$ the vectors $\{q_1, \dots, q_k\}$ form an orthonormal basis for $\mathrm{Span}\{a_1, \dots, a_{l+k}\}$, where $l$ is the number of $r_{jj} = 0$. Again we obtain $A = QR$, but now $Q$ may not be square.

(3) The classical Gram-Schmidt algorithm as described does not behave well computationally. This is due to the accumulation of round-off error: the computed $q_j$'s are not orthogonal. The inner product $\langle q_j, q_k \rangle$ is small for $j \ne k$ with $j$ near $k$, but not so small for $j \ll k$ or $j \gg k$. An alternate version, Modified Gram-Schmidt, is equivalent in exact arithmetic but behaves better numerically. In the following pseudo-code, $p$ denotes a temporary storage vector used to accumulate the sums defining the $p_j$'s.

    Classical Gram-Schmidt              Modified Gram-Schmidt
    For j = 1, ..., n do                For j = 1, ..., n do
        p := a_j                            p := a_j
        For i = 1, ..., j-1 do              For i = 1, ..., j-1 do
            r_ij := <a_j, q_i>                  r_ij := <p, q_i>
            p := p - r_ij q_i                   p := p - r_ij q_i
        r_jj := ||p||                       r_jj := ||p||
        q_j := p / r_jj                     q_j := p / r_jj

The only difference is in the computation of $r_{ij}$: in Modified Gram-Schmidt, we orthogonalize the accumulated partial sum for $p_j$ against each $q_i$ successively.
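The following sketch implements both variants of the pseudo-code above and compares their loss of orthogonality on a classical nearly dependent test matrix (the value of eps is an arbitrary small perturbation); the modified version typically retains orthogonality to near machine precision while the classical version does not.

```python
import numpy as np

def gram_schmidt(A, modified=True):
    """Classical (modified=False) or Modified (modified=True) Gram-Schmidt.
    Returns Q with (approximately) orthonormal columns and upper triangular
    R with A = Q R, following the pseudo-code above."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        p = A[:, j].copy()
        for i in range(j):
            # Classical GS orthogonalizes against the original column a_j;
            # Modified GS orthogonalizes the running partial sum p.
            R[i, j] = np.dot(p if modified else A[:, j], Q[:, i])
            p -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(p)
        Q[:, j] = p / R[j, j]
    return Q, R

# A classical test case with nearly dependent columns.
eps = 1.0e-8
A = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])
for modified in (False, True):
    Q, _ = gram_schmidt(A, modified)
    err = np.linalg.norm(Q.T @ Q - np.eye(3))
    print("modified" if modified else "classical", "||Q^T Q - I|| =", err)
```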

Theorem 1.3. Suppose $A \in \mathbb{R}^{m \times n}$ with $m \ge n$. Then there exist a unitary $Q \in \mathbb{R}^{m \times m}$ and an upper triangular $R \in \mathbb{R}^{m \times n}$ for which $A = QR$. If $\tilde{Q} \in \mathbb{R}^{m \times n}$ denotes the first $n$ columns of $Q$ and $\tilde{R} \in \mathbb{R}^{n \times n}$ denotes the first $n$ rows of $R$, then

    $A = QR = Q\begin{pmatrix} \tilde{R} \\ 0 \end{pmatrix} = \tilde{Q}\tilde{R}.$

Moreover,
(a) we may choose $R$ to have nonnegative diagonal entries;
(b) if $A$ is of full rank, then we can choose $R$ with positive diagonal entries, in which case the condensed factorization $A = \tilde{Q}\tilde{R}$ is unique (and thus, if $m = n$, the factorization $A = QR$ is unique, since then $Q = \tilde{Q}$ and $R = \tilde{R}$).

Proof. If $A$ has full rank, apply the Gram-Schmidt process. Define $\tilde{Q} = [q_1, \dots, q_n] \in \mathbb{R}^{m \times n}$ and $\tilde{R} = [r_{ij}] \in \mathbb{R}^{n \times n}$ as above, so $A = \tilde{Q}\tilde{R}$. Extend $\{q_1, \dots, q_n\}$ to an orthonormal basis $\{q_1, \dots, q_m\}$ of $\mathbb{R}^m$, and set

    $Q = [q_1, \dots, q_m]$ and $R = \begin{pmatrix} \tilde{R} \\ 0 \end{pmatrix} \in \mathbb{R}^{m \times n},$

so $A = QR$. As $r_{jj} > 0$ in the Gram-Schmidt process, we have (b). Uniqueness follows by induction, passing through the Gram-Schmidt process again and noting that at each step we have no choice.

Remarks.
(1) In practice, there are more efficient and better computationally behaved ways of calculating the $Q$ and $R$ factors. The idea is to create zeros below the diagonal (successively in columns $1, 2, \dots$) as in Gaussian elimination, except that we now use Householder transformations (which are unitary) instead of the unit lower triangular matrices $L_j$.
(2) A QR factorization is also possible when $m < n$: $A = Q[R_1\ R_2]$, where $Q \in \mathbb{C}^{m \times m}$ is unitary and $R_1 \in \mathbb{C}^{m \times m}$ is upper triangular. In fact, every $A \in \mathbb{C}^{m \times n}$ has a QR factorization, even when $m < n$. Indeed, if $\mathrm{rank}(A) = k$, there always exist $Q \in \mathbb{C}^{m \times k}$ with orthonormal columns, a full rank upper triangular $R \in \mathbb{C}^{k \times n}$, and a permutation matrix $P \in \mathbb{R}^{n \times n}$ such that

    $AP = QR.$

Moreover, if $A$ has rank $n$ (so $m \ge n$), then $R \in \mathbb{C}^{n \times n}$ is nonsingular. On the other hand, if $m < n$, then $R = [R_1\ R_2]$, where $R_1 \in \mathbb{C}^{k \times k}$ is nonsingular. Finally, if $A \in \mathbb{R}^{m \times n}$, then the same facts hold, but now both $Q$ and $R$ can be chosen to be real matrices.

QR Factorization and Orthogonal Projections. Let $A \in \mathbb{R}^{m \times n}$ have condensed QR factorization $A = \tilde{Q}\tilde{R}$. Then by construction the columns of $\tilde{Q}$ form an orthonormal basis for the range of $A$. Hence $P = \tilde{Q}\tilde{Q}^T$ is the orthogonal projector onto the range of $A$. Similarly, if the condensed QR factorization of $A^T$ is $A^T = \tilde{Q}_1\tilde{R}_1$, then $P_1 = \tilde{Q}_1\tilde{Q}_1^T$ is the orthogonal projector onto $\mathrm{Ran}(A^T) = \ker(A)^\perp$, and so $I - \tilde{Q}_1\tilde{Q}_1^T$ is the orthogonal projector onto $\ker(A)$.

The QR factorization can be computed using either Givens rotations or Householder reflections. Although the approach via rotations is arguably more stable numerically, it is more difficult to describe, so we only illustrate the approach using Householder reflections.

QR using Householder Reflections. Given $w \in \mathbb{R}^n$, we can associate the matrix

    $U = I - 2\dfrac{w w^T}{w^T w},$

which reflects $\mathbb{R}^n$ across the hyperplane $\mathrm{Span}\{w\}^\perp$. The matrix $U$ is called the Householder reflection across this hyperplane. Given a pair of vectors $x$ and $y$ with $\|x\|_2 = \|y\|_2$ and $x \ne y$, there is a Householder reflection such that $y = Ux$, namely

    $U = I - 2\dfrac{(x-y)(x-y)^T}{(x-y)^T(x-y)}.$

Proof. We compute

    $Ux = x - 2(x-y)\dfrac{\|x\|^2 - y^T x}{\|x\|^2 - 2y^T x + \|y\|^2} = x - 2(x-y)\dfrac{\|x\|^2 - y^T x}{2(\|x\|^2 - y^T x)} = x - (x - y) = y,$

since $\|x\| = \|y\|$.
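A minimal sketch of the reflection just constructed: given $x$, it builds $U$ with $w = x - y$ for $y = \|x\|_2 e_1$ (the choice used in the deflation step described next) and checks that $Ux = y$ and that $U$ is orthogonal; the test vector is an arbitrary choice.

```python
import numpy as np

def householder_reflection(x, y):
    """Return U = I - 2 w w^T / (w^T w) with w = x - y, so that U x = y
    whenever ||x||_2 = ||y||_2 and x != y."""
    w = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.eye(len(w)) - 2.0 * np.outer(w, w) / np.dot(w, w)

x = np.array([3.0, 4.0, 0.0])
y = np.linalg.norm(x) * np.array([1.0, 0.0, 0.0])   # y = nu * e_1, with ||y|| = ||x||
U = householder_reflection(x, y)
print(np.allclose(U @ x, y))             # True: U maps x to nu * e_1
print(np.allclose(U.T @ U, np.eye(3)))   # True: U is orthogonal (a reflection)
```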

We now describe the basic deflation step in the QR factorization. Write

    $A_0 = \begin{pmatrix} \alpha_0 & a_0^T \\ b_0 & \tilde{A}_0 \end{pmatrix},$

and set $\nu_0 = \left\|\begin{pmatrix} \alpha_0 \\ b_0 \end{pmatrix}\right\|_2$. Let $H_0$ be the Householder transformation that maps $\begin{pmatrix} \alpha_0 \\ b_0 \end{pmatrix}$ to $\nu_0 e_1$:

    $H_0 = I - 2\dfrac{w w^T}{w^T w}, \quad \text{where } w = \begin{pmatrix} \alpha_0 \\ b_0 \end{pmatrix} - \nu_0 e_1 = \begin{pmatrix} \alpha_0 - \nu_0 \\ b_0 \end{pmatrix}.$

Thus,

    $H_0 A_0 = \begin{pmatrix} \nu_0 & a_1^T \\ 0 & A_1 \end{pmatrix}.$

A problem occurs if $\nu_0 = 0$, that is, if $\begin{pmatrix} \alpha_0 \\ b_0 \end{pmatrix} = 0$. In this case, permute the offending column to the right, bringing in the column of greatest magnitude. Now repeat with $A_1$. If the above method is implemented by always permuting the column of greatest magnitude into the current pivot column, then $AP = QR$ gives a QR factorization with the diagonal entries of $R$ nonnegative and listed in order of descending magnitude.

1.5 Matrix Secant Methods

Let us return to the problem of finding $x \in \mathbb{R}^n$ such that $g(x) = 0$, where $g : \mathbb{R}^n \to \mathbb{R}^n$ is $C^1$. In this section we consider Newton-like methods of a special type. Recall that in a Newton-like method the iteration scheme takes the form

    (1.9)    $x^{k+1} := x^k - J_k g(x^k),$

where $J_k$ is meant to approximate the inverse of $g'(x^k)$. In the one-dimensional case, a method proposed by the Babylonians 3700 years ago is of particular significance. Today we call it the secant method:

    (1.10)    $J_k = \dfrac{x^{k-1} - x^k}{g(x^{k-1}) - g(x^k)}.$
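Here is a sketch of the one-dimensional secant iteration (1.9)-(1.10) applied to $g = f'$ for $f(x) = x^2 + e^x$, the function used in the comparison above; the starting points $x^{-1} = 0$ and $x^0 = 1$ are illustrative choices (exercise (5) asks for a similar experiment).

```python
import numpy as np

g = lambda x: 2.0 * x + np.exp(x)   # g = f' for f(x) = x^2 + e^x

# Secant iteration: x^{k+1} = x^k - J_k g(x^k) with
# J_k = (x^{k-1} - x^k) / (g(x^{k-1}) - g(x^k)).
# Starting points are arbitrary choices for illustration.
x_prev, x = 0.0, 1.0
for k in range(10):
    J = (x_prev - x) / (g(x_prev) - g(x))
    x_prev, x = x, x - J * g(x)
    print(f"k = {k}:  x = {x:.12f}   g(x) = {g(x): .3e}")
    if abs(g(x)) < 1e-12:
        break
```

Only previously computed function values are reused at each step, which is the "no extra function evaluations" advantage noted below.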

With this approximation one has

    $g'(x^k)^{-1} - J_k = \dfrac{g(x^{k-1}) - \left[g(x^k) + g'(x^k)(x^{k-1} - x^k)\right]}{g'(x^k)\left[g(x^{k-1}) - g(x^k)\right]}.$

Also, near a point $\bar{x}$ at which $g'(\bar{x}) \ne 0$ there exists an $\alpha > 0$ such that $\alpha|x - y| \le |g(x) - g(y)|$. Consequently, by the quadratic bound lemma,

    $|g'(x^k)^{-1} - J_k| \le \dfrac{\frac{L}{2}|x^{k-1} - x^k|^2}{\alpha\,|g'(x^k)|\,|x^{k-1} - x^k|} \le K|x^{k-1} - x^k|$

for some constant $K > 0$ whenever $x^k$ and $x^{k-1}$ are sufficiently close to $\bar{x}$. Therefore, by Theorem 1.1, the secant method is locally two-step quadratically convergent to a nonsingular solution of the equation $g(x) = 0$. An additional advantage of this approach is that no extra function evaluations are required to obtain the approximation $J_k$.

1.5.1 Matrix Secant Methods for Equations. Unfortunately, the secant approximation (1.10) is meaningless if the dimension $n$ is greater than $1$, since division by vectors is undefined. But this can be rectified by simply writing

    (1.11)    $J_k\left(g(x^{k-1}) - g(x^k)\right) = x^{k-1} - x^k.$

Equation (1.11) is called the quasi-Newton equation (QNE), or matrix secant equation (MSE), at $x^k$, and it determines $J_k$ only up to an affine manifold in $\mathbb{R}^{n \times n}$: (1.11) is $n$ linear equations in the $n^2$ unknown entries of $J_k$, so it is not enough to determine $J_k$ uniquely. To nail down a specific $J_k$, further conditions on the update $J_k$ must be given. In order to determine sensible conditions, let us consider an overall iteration scheme based on (1.9). At every iteration we have $(x^k, J_k)$ and compute $x^{k+1}$ by (1.9). Then $J_{k+1}$ is constructed to satisfy the MSE. If $B_k = J_k^{-1}$ is close to $g'(x^k)$ and $x^{k+1}$ is close to $x^k$, then $J_{k+1}$ should be chosen not only to satisfy the MSE but also to be as close to $J_k$ as possible. In what sense should we mean "close" here? In order to facilitate the computations, it is reasonable to mean algebraically close, in the sense that $J_{k+1}$ (or $B_{k+1} = J_{k+1}^{-1}$) is only a rank-one modification of $J_k$ (respectively, $B_k = J_k^{-1}$).

Since it may happen that $g'(x^k)$ is not invertible, we first see what happens when we use this idea to approximate $B_{k+1}$. Since we are assuming that $B_{k+1}$ is a rank-one modification of $B_k$, there are vectors $u, v \in \mathbb{R}^n$ such that

    (1.12)    $B_{k+1} = B_k + u v^T.$

We now use the matrix secant equation (MSE) to derive conditions on the choice of $u$ and $v$. In this setting, the MSE becomes

    $B_{k+1} s^k = y^k, \quad \text{where } s^k := x^{k+1} - x^k \text{ and } y^k := g(x^{k+1}) - g(x^k).$

Multiplying (1.12) by $s^k$ gives

    $y^k = B_{k+1} s^k = B_k s^k + u v^T s^k.$

Hence, if $v^T s^k \ne 0$, we obtain

    $u = \dfrac{y^k - B_k s^k}{v^T s^k}$

and

    (1.13)    $B_{k+1} = B_k + \dfrac{(y^k - B_k s^k) v^T}{v^T s^k}.$

Equation (1.13) determines a whole class of rank-one updates that satisfy the MSE, where one is allowed to choose any $v \in \mathbb{R}^n$ as long as $v^T s^k \ne 0$.

If $s^k \ne 0$, then an obvious choice for $v$ is $s^k$, yielding the update

    (1.14)    $B_{k+1} = B_k + \dfrac{(y^k - B_k s^k)s^{kT}}{s^{kT}s^k}.$

This is known as Broyden's update. It turns out that the Broyden update is also analytically close.

Theorem 1.4. Let $A \in \mathbb{R}^{n \times n}$ and $s, y \in \mathbb{R}^n$ with $s \ne 0$. Then for any matrix norms $\|\cdot\|$ and $\|\cdot\|'$ such that $\|AB\| \le \|A\|\,\|B\|'$ and $\left\|\frac{v v^T}{v^T v}\right\|' \le 1$, a solution to

    (1.15)    $\min\{\|B - A\| : Bs = y\}$

is

    (1.16)    $A_+ = A + \dfrac{(y - As)s^T}{s^T s}.$

In particular, (1.16) solves (1.15) when $\|\cdot\|$ is the $\ell_2$ matrix norm, and (1.16) solves (1.15) uniquely when $\|\cdot\|$ is the Frobenius norm.

Proof. Let $B \in \{B \in \mathbb{R}^{n \times n} : Bs = y\}$. Then

    $\|A_+ - A\| = \left\|\dfrac{(y - As)s^T}{s^T s}\right\| = \left\|(B - A)\dfrac{s s^T}{s^T s}\right\| \le \|B - A\|\left\|\dfrac{s s^T}{s^T s}\right\|' \le \|B - A\|.$

Note that if $\|\cdot\|' = \|\cdot\|_2$, then

    $\left\|\dfrac{v v^T}{v^T v}\right\|_2 = \sup\left\{\dfrac{\|v v^T x\|_2}{v^T v} : \|x\|_2 = 1\right\} = \sup\left\{\dfrac{|v^T x|}{\|v\|_2} : \|x\|_2 = 1\right\} = 1,$

so that the conclusion of the result is not vacuous. For uniqueness, observe that the Frobenius norm is strictly convex and $\|AB\|_F \le \|A\|_F\|B\|_2$.

Therefore, the Broyden update (1.14) is both algebraically and analytically close to $B_k$. These properties indicate that it should perform well in practice, and indeed it does.

Algorithm: Broyden's Method
Initialization: $x^0 \in \mathbb{R}^n$, $B_0 \in \mathbb{R}^{n \times n}$.
Having $(x^k, B_k)$, compute $(x^{k+1}, B_{k+1})$ as follows: solve $B_k s^k = -g(x^k)$ for $s^k$ and set

    $x^{k+1} := x^k + s^k,$
    $y^k := g(x^{k+1}) - g(x^k),$
    $B_{k+1} := B_k + \dfrac{(y^k - B_k s^k)s^{kT}}{s^{kT}s^k}.$

Let us now see if we can compute $J_k = B_k^{-1}$ directly, so that we can write the step computation as $s^k = -J_k g(x^k)$, avoiding the need to solve an equation. For this we employ the following important lemma.

Lemma 1.1 (Sherman-Morrison-Woodbury). Suppose $A \in \mathbb{R}^{n \times n}$ and $U, V \in \mathbb{R}^{n \times k}$ are such that both $A^{-1}$ and $(I + V^T A^{-1} U)^{-1}$ exist. Then

    $(A + UV^T)^{-1} = A^{-1} - A^{-1}U(I + V^T A^{-1} U)^{-1} V^T A^{-1}.$

The above lemma verifies that if $B_k^{-1} = J_k$ exists and $s^{kT}J_k y^k = s^{kT}B_k^{-1}y^k \ne 0$, then

    (1.17)    $J_{k+1} = \left[B_k + \dfrac{(y^k - B_k s^k)s^{kT}}{s^{kT}s^k}\right]^{-1} = B_k^{-1} + \dfrac{(s^k - B_k^{-1}y^k)s^{kT}B_k^{-1}}{s^{kT}B_k^{-1}y^k} = J_k + \dfrac{(s^k - J_k y^k)s^{kT}J_k}{s^{kT}J_k y^k}.$

In this case, it is possible to update the inverses $J_k$ directly. But this process can become numerically unstable if $s^{kT}J_k y^k$ is small. Therefore, in practice, the Broyden update is usually implemented in the forward mode described by the algorithm for Broyden's method above.

Although we do not pause to establish the convergence rates here, we do give the following result due to Dennis and Moré (1974).

Theorem 1.5. Let $g : \mathbb{R}^n \to \mathbb{R}^n$ be continuously differentiable in an open convex set $D \subset \mathbb{R}^n$. Assume that there exist $x^* \in \mathbb{R}^n$ and $r, \beta > 0$ such that $x^* + r\mathbb{B} \subset D$ (where $\mathbb{B}$ is the closed unit ball), $g(x^*) = 0$, $g'(x^*)^{-1}$ exists with $\|g'(x^*)^{-1}\| \le \beta$, and $g'$ is Lipschitz continuous on $x^* + r\mathbb{B}$ with Lipschitz constant $\gamma > 0$. Then there exist positive constants $\epsilon$ and $\delta$ such that if $\|x^0 - x^*\|_2 \le \epsilon$ and $\|B_0 - g'(x^0)\| \le \delta$, then the sequence $\{x^k\}$ generated by the iteration

    $x^{k+1} := x^k + s^k, \quad \text{where } s^k \text{ solves } 0 = g(x^k) + B_k s,$
    $B_{k+1} := B_k + \dfrac{(y^k - B_k s^k)s^{kT}}{s^{kT}s^k}, \quad \text{where } y^k = g(x^{k+1}) - g(x^k),$

is well defined and converges to $x^*$ with $x^k \to x^*$ superlinearly.
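A minimal sketch of Broyden's method as stated in the algorithm above, run on the same illustrative test system used earlier; initializing $B_0$ by finite differences is an assumption made here so that $B_0$ is close to $g'(x^0)$, as Theorem 1.5 requires.

```python
import numpy as np

def g(x):
    # The same illustrative test system as before (root at (1, 1)).
    return np.array([x[0]**2 + x[1]**2 - 2.0,
                     np.exp(x[0] - 1.0) - x[1]])

def fd_jacobian(g, x, h=1.0e-6):
    # Forward-difference approximation to g'(x), used only to initialize B_0.
    n = len(x)
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = h
        J[:, j] = (g(x + e) - g(x)) / h
    return J

x = np.array([1.5, 0.7])          # x^0, chosen near the root
B = fd_jacobian(g, x)             # B_0, close to g'(x^0)
for k in range(20):
    gx = g(x)
    print(f"k = {k:2d}   ||g(x^k)|| = {np.linalg.norm(gx):.3e}")
    if np.linalg.norm(gx) < 1e-12:
        break
    s = np.linalg.solve(B, -gx)                       # solve B_k s^k = -g(x^k)
    x = x + s
    y = g(x) - gx                                     # y^k = g(x^{k+1}) - g(x^k)
    B = B + np.outer(y - B @ s, s) / np.dot(s, s)     # Broyden update (1.14)
```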

1.5.2 Matrix Secant Methods for Minimization. Let us now see how we can mimic these matrix secant ideas in the context of optimization, where the underlying problem is one of minimization:

    $\mathcal{P}: \quad \min_{x \in \mathbb{R}^n} f(x),$

where $f : \mathbb{R}^n \to \mathbb{R}$ is $C^2$. How can we modify and/or extend the matrix secant methods for equations to the setting of minimization, where one wishes to solve the equation $\nabla f(x) = 0$? In this context the matrix secant equation becomes

    $s^k = H_{k+1} y^k,$

where $s^k := x^{k+1} - x^k$ and $y^k := \nabla f(x^{k+1}) - \nabla f(x^k)$, and the matrix $H_k$ is intended to be an approximation to the inverse of the Hessian matrix $\nabla^2 f(x^k)$. Writing $M_k = H_k^{-1}$, a straightforward application of Broyden's method gives the update

    $M_{k+1} = M_k + \dfrac{(y^k - M_k s^k)s^{kT}}{s^{kT}s^k}.$

However, this is unsatisfactory for two reasons:
(1) since $M_k$ approximates $\nabla^2 f(x^k)$, it must be symmetric, and
(2) since we are minimizing, $M_k$ must be positive definite to ensure that $s^k = -M_k^{-1}\nabla f(x^k)$ is a direction of descent for $f$ at $x^k$.

To address problem (1) above, one could return to equation (1.13) and find an update that preserves symmetry. Such an update is uniquely obtained by setting $v = y^k - M_k s^k$. This is called the symmetric rank-one update, or SR1. Although this update can on occasion exhibit problems with numerical stability, it has recently received a great deal of renewed interest. The stability problems occur whenever

    (1.18)    $v^T s^k = -\nabla f(x^{k+1})^T M_k^{-1} \nabla f(x^k)$

has small magnitude.

We now approach the question of how to update $M_k$ in a way that addresses both the issue of symmetry and that of positive definiteness, while still using the Broyden updating ideas. Given a symmetric positive definite matrix $M$ and two vectors $s$ and $y$, our goal is to find a symmetric positive definite matrix $\bar{M}$ such that $\bar{M}s = y$. Since $M$ is symmetric and positive definite, there is a nonsingular $n \times n$ matrix $L$ such that $M = LL^T$; indeed, $L$ can be chosen to be the lower triangular Cholesky factor of $M$. If $\bar{M}$ is also symmetric and positive definite, then there is a matrix $J \in \mathbb{R}^{n \times n}$ such that $\bar{M} = JJ^T$. The MSE $\bar{M}s = y$ implies that if

    (1.19)    $J^T s = v,$

then

    (1.20)    $Jv = y.$

Let us apply the Broyden update technique to (1.20), $J$, and $L$. That is, suppose that

    (1.21)    $J = L + \dfrac{(y - Lv)v^T}{v^T v}.$

Then by (1.19),

    (1.22)    $v = J^T s = L^T s + \dfrac{v(y - Lv)^T s}{v^T v}.$

This expression implies that $v$ must have the form $v = \alpha L^T s$ for some $\alpha \in \mathbb{R}$.

Substituting this back into (1.22), we get

    $\alpha L^T s = L^T s + \dfrac{\alpha L^T s\,(y - \alpha L L^T s)^T s}{\alpha^2 s^T L L^T s}.$

Hence

    (1.23)    $\alpha^2 = \dfrac{s^T y}{s^T M s}.$

Consequently, such a matrix $J$ satisfying (1.22) exists only if $s^T y > 0$, in which case

    $J = L + \dfrac{(y - \alpha M s)s^T L}{\alpha\, s^T M s}, \quad \text{with} \quad \alpha = \left[\dfrac{s^T y}{s^T M s}\right]^{1/2},$

yielding

    (1.24)    $\bar{M} = M + \dfrac{y y^T}{y^T s} - \dfrac{M s s^T M}{s^T M s}.$

Moreover, the Cholesky factorization of $\bar{M}$ can be obtained directly from the matrix $J$: specifically, if the QR factorization of $J^T$ is $J^T = QR$, we can set $\bar{L} = R^T$, yielding

    $\bar{M} = JJ^T = R^T Q^T Q R = \bar{L}\bar{L}^T.$

We have shown that, beginning with a symmetric positive definite matrix $M_k$, we can obtain a symmetric and positive definite update $M_{k+1}$ that satisfies the MSE $M_{k+1}s^k = y^k$ by applying formula (1.24) whenever $s^{kT}y^k > 0$. We must now address the question of how to choose $x^{k+1}$ so that $s^{kT}y^k > 0$. Recall that

    $y^k = \nabla f(x^{k+1}) - \nabla f(x^k)$ and $s^k = -t_k M_k^{-1}\nabla f(x^k),$

where $t_k$ is the stepsize. Hence

    $y^{kT}s^k = \nabla f(x^{k+1})^T s^k - \nabla f(x^k)^T s^k = t_k\left(\nabla f(x^k + t_k d^k)^T d^k - \nabla f(x^k)^T d^k\right),$

where $d^k := -M_k^{-1}\nabla f(x^k)$. Now since $M_k$ is positive definite, the direction $d^k$ is a descent direction for $f$ at $x^k$, and so $t_k > 0$. Therefore, to ensure that $s^{kT}y^k > 0$ we need only show that $t_k > 0$ can be chosen so that

    (1.25)    $\nabla f(x^k + t_k d^k)^T d^k \ge \beta\,\nabla f(x^k)^T d^k$

for some $\beta \in (0,1)$, since in this case

    $\nabla f(x^k + t_k d^k)^T d^k - \nabla f(x^k)^T d^k \ge (\beta - 1)\nabla f(x^k)^T d^k > 0.$

But this is precisely the second condition in the weak Wolfe conditions, with $\beta = c_2$. The update (1.24) is called the BFGS (Broyden-Fletcher-Goldfarb-Shanno) update and is currently considered the best available matrix secant type update for minimization.

Observe from (1.24) that if both $M$ and $\bar{M}$ are positive definite, then they are both invertible. The Sherman-Morrison-Woodbury formula shows that the inverse is given by

    (1.26)    $\bar{M}^{-1} = M^{-1} + \dfrac{(s - M^{-1}y)s^T + s(s - M^{-1}y)^T}{y^T s} - \dfrac{\left[(s - M^{-1}y)^T y\right] s s^T}{(y^T s)^2}.$

Thus the corresponding inverse updating scheme for the BFGS update is

    $H_{k+1} = H_k + \dfrac{(s^k - H_k y^k)s^{kT} + s^k(s^k - H_k y^k)^T}{y^{kT}s^k} - \dfrac{\left[(s^k - H_k y^k)^T y^k\right] s^k s^{kT}}{(y^{kT}s^k)^2}.$

This is the form of the update that is most commonly used, as it avoids the need to solve the equation $M_k d^k = -\nabla f(x^k)$ for the search direction $d^k$. Instead, the search direction is obtained directly with a matrix multiply, $d^k = -H_k\nabla f(x^k)$. However, one still needs to be careful to avoid numerical instability. This is typically achieved by taking care in the formation of $H_{k+1}$ from $H_k$. The advised process is as follows.

BFGS Updating
    $\sigma := \left(s^{kT}y^k\right)^{1/2}$
    $\hat{s}^k := s^k/\sigma$
    $\hat{y}^k := y^k/\sigma$
    $H_{k+1} := H_k + (\hat{s}^k - H_k\hat{y}^k)(\hat{s}^k)^T + \hat{s}^k(\hat{s}^k - H_k\hat{y}^k)^T - \left((\hat{s}^k - H_k\hat{y}^k)^T\hat{y}^k\right)\hat{s}^k(\hat{s}^k)^T$
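A compact sketch of a BFGS iteration built around the inverse update above. The test function, the Armijo-only backtracking line search, and $H_0 = I$ are illustrative assumptions; because the test function is strongly convex, the curvature condition $s^{kT}y^k > 0$ needed by the update holds automatically for any accepted step.

```python
import numpy as np

f  = lambda x: x[0]**2 + 5.0 * x[1]**2 + np.exp(x[0] + x[1])
df = lambda x: np.array([2.0 * x[0] + np.exp(x[0] + x[1]),
                         10.0 * x[1] + np.exp(x[0] + x[1])])

x = np.array([1.0, 1.0])
H = np.eye(2)                      # H_0: initial inverse-Hessian approximation (assumed)
g_old = df(x)
for k in range(50):
    if np.linalg.norm(g_old) < 1e-10:
        break
    d = -H @ g_old                 # search direction d^k = -H_k grad f(x^k)
    # Backtracking line search (Armijo condition only; strong convexity of f
    # guarantees s^T y > 0 for any accepted step).
    t, c = 1.0, 1e-4
    while f(x + t * d) > f(x) + c * t * g_old @ d:
        t *= 0.5
    s = t * d
    x = x + s
    g_new = df(x)
    y = g_new - g_old
    # Stabilized inverse BFGS update, as in the boxed scheme above.
    sigma = np.sqrt(s @ y)
    s_hat, y_hat = s / sigma, y / sigma
    u = s_hat - H @ y_hat
    H = H + np.outer(u, s_hat) + np.outer(s_hat, u) - (u @ y_hat) * np.outer(s_hat, s_hat)
    g_old = g_new
    print(f"k = {k:2d}   f(x) = {f(x):.10f}   ||grad f|| = {np.linalg.norm(g_new):.2e}")
```

In practice one would replace the Armijo-only search with a weak Wolfe line search so that the curvature condition (1.25) is enforced explicitly for nonconvex problems.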

Exercises

(1) Let $\gamma \in (0,1)$.
    (a) Show that the sequence $\{\gamma^\nu\}$ converges linearly to zero, but not superlinearly.
    (b) Show that the sequence $\{\gamma^{\nu^2}\}$ converges superlinearly to $0$, but not quadratically.
    (c) Finally, show that the sequence $\{\gamma^{2^\nu}\}$ converges quadratically to zero.
(2) Apply Newton's method to the function $f(x) = x^2 + \sin x$ with initial point $x^0 = 2\pi$.
(3) If $H \in \mathbb{R}^{n \times n}$ is symmetric and positive definite, show that there is a diagonal matrix $D$ having positive entries and a unit lower triangular matrix $L$ such that $H = LDL^T$.
(4) Compute the Cholesky factorization of the matrix.
(5) Apply the secant method to the function $f(x) = x^2 + e^x$ with starting points $x^{-1}$ and $x^0 = 1$. How many iterations does it take to obtain a point for which $|f'(x^k)| \le 10^{-8}$? Compare this performance with that of Newton's method and the method of steepest descent.
(6) Verify formulas (1.18) and (1.23).
(7) Verify formulas (1.17), (1.24), and (1.26).
(8) Prove the Sherman-Morrison-Woodbury lemma (Lemma 1.1).
(9) Show that every rank-one matrix $W$ in $\mathbb{R}^{n \times n}$ can be written in the form $W = uv^T$ for some choice of vectors $u, v \in \mathbb{R}^n$.
(10) Show that every $n \times n$ positive semi-definite matrix $M$ can be written in the form $M = VV^T$ for some $n \times n$ matrix $V$ having the same rank as $M$. Also, show that $V$ can be chosen to be an $n \times k$ matrix, where $k = \mathrm{rank}(M)$.
(11) The function $f : \mathbb{R}^n \to \mathbb{R}$ is said to be strongly convex if there is a $\delta > 0$ such that

    $f(x + \lambda(y - x)) \le f(x) + \lambda[f(y) - f(x)] - \frac{\delta}{2}\lambda(1 - \lambda)\|y - x\|_2^2$

for every $x, y \in \mathbb{R}^n$ and $\lambda \in [0,1]$. The parameter $\delta$ is called the modulus of strong convexity. Prove the following characterization theorem for strongly convex functions.

Theorem 1.6 (Characterizations of strongly convex functions). Let $f : \mathbb{R}^n \to \mathbb{R}$ be differentiable. Then the following statements are equivalent:
(a) $f$ is strongly convex with modulus $\delta$;
(b) $f(y) \ge f(x) + \nabla f(x)^T(y - x) + \frac{\delta}{2}\|y - x\|^2$ for all $x, y \in \mathbb{R}^n$;
(c) $(\nabla f(x) - \nabla f(y))^T(x - y) \ge \delta\|y - x\|^2$ for all $x, y \in \mathbb{R}^n$.
If it is further assumed that $f$ is twice continuously differentiable on $\mathbb{R}^n$, then the above conditions are also equivalent to the following statement:

    $\delta\|u\|_2^2 \le u^T \nabla^2 f(x) u$ for all $u, x \in \mathbb{R}^n.$

That is, the spectrum of the Hessian of $f$ is uniformly bounded below by $\delta$ on $\mathbb{R}^n$.


Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

17 Solution of Nonlinear Systems

17 Solution of Nonlinear Systems 17 Solution of Nonlinear Systems We now discuss the solution of systems of nonlinear equations. An important ingredient will be the multivariate Taylor theorem. Theorem 17.1 Let D = {x 1, x 2,..., x m

More information

Stochastic Quasi-Newton Methods

Stochastic Quasi-Newton Methods Stochastic Quasi-Newton Methods Donald Goldfarb Department of IEOR Columbia University UCLA Distinguished Lecture Series May 17-19, 2016 1 / 35 Outline Stochastic Approximation Stochastic Gradient Descent

More information

Programming, numerics and optimization

Programming, numerics and optimization Programming, numerics and optimization Lecture C-3: Unconstrained optimization II Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428

More information

Numerical Analysis Lecture Notes

Numerical Analysis Lecture Notes Numerical Analysis Lecture Notes Peter J Olver 8 Numerical Computation of Eigenvalues In this part, we discuss some practical methods for computing eigenvalues and eigenvectors of matrices Needless to

More information

Math 411 Preliminaries

Math 411 Preliminaries Math 411 Preliminaries Provide a list of preliminary vocabulary and concepts Preliminary Basic Netwon s method, Taylor series expansion (for single and multiple variables), Eigenvalue, Eigenvector, Vector

More information

Householder reflectors are matrices of the form. P = I 2ww T, where w is a unit vector (a vector of 2-norm unity)

Householder reflectors are matrices of the form. P = I 2ww T, where w is a unit vector (a vector of 2-norm unity) Householder QR Householder reflectors are matrices of the form P = I 2ww T, where w is a unit vector (a vector of 2-norm unity) w Px x Geometrically, P x represents a mirror image of x with respect to

More information

Dense LU factorization and its error analysis

Dense LU factorization and its error analysis Dense LU factorization and its error analysis Laura Grigori INRIA and LJLL, UPMC February 2016 Plan Basis of floating point arithmetic and stability analysis Notation, results, proofs taken from [N.J.Higham,

More information

3 QR factorization revisited

3 QR factorization revisited LINEAR ALGEBRA: NUMERICAL METHODS. Version: August 2, 2000 30 3 QR factorization revisited Now we can explain why A = QR factorization is much better when using it to solve Ax = b than the A = LU factorization

More information

Quasi-Newton Methods. Zico Kolter (notes by Ryan Tibshirani, Javier Peña, Zico Kolter) Convex Optimization

Quasi-Newton Methods. Zico Kolter (notes by Ryan Tibshirani, Javier Peña, Zico Kolter) Convex Optimization Quasi-Newton Methods Zico Kolter (notes by Ryan Tibshirani, Javier Peña, Zico Kolter) Convex Optimization 10-725 Last time: primal-dual interior-point methods Given the problem min x f(x) subject to h(x)

More information

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0 Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =

More information

Lecture 18: November Review on Primal-dual interior-poit methods

Lecture 18: November Review on Primal-dual interior-poit methods 10-725/36-725: Convex Optimization Fall 2016 Lecturer: Lecturer: Javier Pena Lecture 18: November 2 Scribes: Scribes: Yizhu Lin, Pan Liu Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 3: Positive-Definite Systems; Cholesky Factorization Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 11 Symmetric

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 2 Systems of Linear Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction

More information

Conditional Gradient (Frank-Wolfe) Method

Conditional Gradient (Frank-Wolfe) Method Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

More information

lecture 3 and 4: algorithms for linear algebra

lecture 3 and 4: algorithms for linear algebra lecture 3 and 4: algorithms for linear algebra STAT 545: Introduction to computational statistics Vinayak Rao Department of Statistics, Purdue University August 30, 2016 Solving a system of linear equations

More information

Quasi-Newton Methods

Quasi-Newton Methods Newton s Method Pros and Cons Quasi-Newton Methods MA 348 Kurt Bryan Newton s method has some very nice properties: It s extremely fast, at least once it gets near the minimum, and with the simple modifications

More information

Applied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic

Applied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic Applied Mathematics 205 Unit II: Numerical Linear Algebra Lecturer: Dr. David Knezevic Unit II: Numerical Linear Algebra Chapter II.3: QR Factorization, SVD 2 / 66 QR Factorization 3 / 66 QR Factorization

More information

Linear Algebra Primer

Linear Algebra Primer Linear Algebra Primer David Doria daviddoria@gmail.com Wednesday 3 rd December, 2008 Contents Why is it called Linear Algebra? 4 2 What is a Matrix? 4 2. Input and Output.....................................

More information

Direct Methods for Solving Linear Systems. Matrix Factorization

Direct Methods for Solving Linear Systems. Matrix Factorization Direct Methods for Solving Linear Systems Matrix Factorization Numerical Analysis (9th Edition) R L Burden & J D Faires Beamer Presentation Slides prepared by John Carroll Dublin City University c 2011

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 1 x 2. x n 8 (4) 3 4 2

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 1 x 2. x n 8 (4) 3 4 2 MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS SYSTEMS OF EQUATIONS AND MATRICES Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 12: Gaussian Elimination and LU Factorization Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 10 Gaussian Elimination

More information

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction

More information

1 GSW Sets of Systems

1 GSW Sets of Systems 1 Often, we have to solve a whole series of sets of simultaneous equations of the form y Ax, all of which have the same matrix A, but each of which has a different known vector y, and a different unknown

More information

Chapter 4. Unconstrained optimization

Chapter 4. Unconstrained optimization Chapter 4. Unconstrained optimization Version: 28-10-2012 Material: (for details see) Chapter 11 in [FKS] (pp.251-276) A reference e.g. L.11.2 refers to the corresponding Lemma in the book [FKS] PDF-file

More information

LU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b

LU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b AM 205: lecture 7 Last time: LU factorization Today s lecture: Cholesky factorization, timing, QR factorization Reminder: assignment 1 due at 5 PM on Friday September 22 LU Factorization LU factorization

More information

AMS 209, Fall 2015 Final Project Type A Numerical Linear Algebra: Gaussian Elimination with Pivoting for Solving Linear Systems

AMS 209, Fall 2015 Final Project Type A Numerical Linear Algebra: Gaussian Elimination with Pivoting for Solving Linear Systems AMS 209, Fall 205 Final Project Type A Numerical Linear Algebra: Gaussian Elimination with Pivoting for Solving Linear Systems. Overview We are interested in solving a well-defined linear system given

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9 STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9 1. qr and complete orthogonal factorization poor man s svd can solve many problems on the svd list using either of these factorizations but they

More information

Lecture 12 (Tue, Mar 5) Gaussian elimination and LU factorization (II)

Lecture 12 (Tue, Mar 5) Gaussian elimination and LU factorization (II) Math 59 Lecture 2 (Tue Mar 5) Gaussian elimination and LU factorization (II) 2 Gaussian elimination - LU factorization For a general n n matrix A the Gaussian elimination produces an LU factorization if

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra Decompositions, numerical aspects Gerard Sleijpen and Martin van Gijzen September 27, 2017 1 Delft University of Technology Program Lecture 2 LU-decomposition Basic algorithm Cost

More information

Program Lecture 2. Numerical Linear Algebra. Gaussian elimination (2) Gaussian elimination. Decompositions, numerical aspects

Program Lecture 2. Numerical Linear Algebra. Gaussian elimination (2) Gaussian elimination. Decompositions, numerical aspects Numerical Linear Algebra Decompositions, numerical aspects Program Lecture 2 LU-decomposition Basic algorithm Cost Stability Pivoting Cholesky decomposition Sparse matrices and reorderings Gerard Sleijpen

More information