1. Search Directions In this chapter we again focus on the unconstrained optimization problem. lim sup ν
|
|
- Percival Peters
- 5 years ago
- Views:
Transcription
1 1 Search Directions In this chapter we again focus on the unconstrained optimization problem P min f(x), x R n where f : R n R is assumed to be twice continuously differentiable, and consider the selection of search directions The algorithms we consider are descent methods having an iteration scheme of the form x k+1 = x k + t k d k Line search methods for choosing a suitable stepsize t k were the focus of our work in last chapter We now devote our attention to choices for the search directions d k All of the search directions considered in this chapter can be classified as Newton-like since they are all of the form d k = H k f(x k ) for some matrix H k If H k = I for all k, we recover the method of steepest descent However, in general we wish to choose H k to be an approximation to 2 f(x k ) 1 which gives Newton s method for optimization The significance of Newton s method is that when it converges, it converges very fast, typically doubling the accuracy of the solution at each iteration In order to understand this behavior, we must first learn a bit about rate of convergence 11 Rate of Convergence In this section we focus on notions of quotient convergence, or Q-convergence There are corresponding notions of root convergence, or R-convergence These notions are derived from the quotient or root test for the convergence of power series In all instances the discussion given below only refers to Q-convergence Let {x ν } R n and x R n be such that x ν x We say that x ν x at a linear rate if lim sup ν x ν+1 x x ν x < 1 The convergence is said to be superlinear if this limsup is 0 The convergence is said to be quadratic if lim sup ν x ν+1 x x ν x 2 < For example, given γ (0, 1) the sequence {γ ν } converges linearly to zero, but not superlinearly The sequence {γ ν2 } converges superlinearly to 0, but not quadratically Finally, the sequence {γ 2ν } converges quadratically to zero Superlinear convergence is much faster than linear convergences, but quadratic convergence is much, much faster than superlinear convergence 12 Newton s Method for Solving Equations Consider the following problem: E : Given g : R n R n, find x R n for which g(x) = 0 A great variety of problems can be posed in this format It is a problem of great practical importance and various versions of this problem continue to be the focus of much ongoing scientific, mathematical, and numerical research It is important in the context of optimization because of the first order necessary conditions for optimality, f(x) = 0 A standard 1
2 2 approach to solving optimization problems is to locate all critical points, or, equivalently, the zero set of the gradient Newton s method for equation solving (or just, Newton s method) is designed for this purpose When one applies Newton s method to solve for the critical points of a function, it is refered to Newton s method for optimization Before moving on to optimization though, we first consider the standard Newton s method for equations Assume that the function g in E is continuously differentiable and that we have an approximate solution x 0 R n to E We now wish to improve on this approximation If x is a solution to E, then 0 = g(x) = g(x 0 ) + g (x 0 )(x x 0 ) + o x x 0 Thus, if x 0 is close to x, it is reasonable to suppose that the solution to the linearized system (11) 0 = g(x 0 ) + g (x 0 )(x x 0 ) is even closer This proceedure is known as Newton s method for finding the roots of the equation g(x) = 0 It has one obvious pitfall Equation (11) may not be consistent That is, there may not exist an x solving (11) In general, the set of solutions to (11) is either (1) the empty set, (2) an infinite set, or (3) a single point For the sake of the present argument, we assume that (3) holds, ie g (x 0 ) 1 exists Under this assumption (11) defines the iteration scheme, (12) x k+1 := x k [g (x i )] 1 g(x k ), called the Newton iteration The associated direction (13) d k := [g (x k )] 1 g(x k ) is called the Newton direction We analyze the convergence behavior of this scheme under the additional assumption that only an approximation to g (x k ) 1 is available We denote this approximation by J k The resulting iteration scheme is (14) x k+1 := x k J k g(x k ) Methods of this type are called Newton-Like methods Theorem 11 Let g : R n R n be differentiable, x 0 R n, and J 0 R n n Suppose that there exists x, x 0 R n, and ɛ > 0 with x 0 x < ɛ such that (1) g(x) = 0, (2) g (x) 1 exists for x B(x; ɛ) := {x R n : x x < ɛ} with sup{ g (x) 1 : x B(x; ɛ)] M 1 (3) g is Lipschitz continuous on clb(x; ɛ) with Lipschitz constant L, and (4) θ 0 := LM 1 2 x0 x + M 0 K < 1 where K (g (x 0 ) 1 J 0 )y 0, y 0 := g(x 0 )/ g(x 0 ), and M 0 = max{ g (x) : x B(x; ɛ)} Further suppose that iteration (14) is initiated at x 0 where the J k s are chosen to satisfy one of the following conditions;
3 (i) (g (x k ) 1 J k )y k K, (ii) (g (x k ) 1 J k )y k θ k 1K for some θ 1 (0, 1), (iii) (g (x k ) 1 J k )y k min{m 2 x k x k 1, K}, for some M 2 > 0, or (iv) (g (x k ) 1 J k )y k min{m 2 g(x k ), K}, for some M 3 > 0, where for each k = 1, 2,, y k := g(x k )/ g(x k ) These hypotheses on the accuracy of the approximations J k yield the following conclusions about the rate of convergence of the iterates x k (a) If (i) holds, then x k x linearly (b) If (ii) holds, then x k x superlinearly (c) If (iii) holds, then x k x two step quadratically (d) If (iv) holds, then x k x quadratically Proof We begin by establishing the basic inequalities (15) x k+1 x LM 1 2 xk x 2 + (g (x k ) 1 J k )g(x k ), and (16) x k+1 x θ 0 x k x and the inclusion (17) x k+1 B( x; ɛ) by induction on k For k = 0 we have x 1 x = x 0 x g (x 0 ) 1 g(x 0 ) + [g (x 0 ) 1 J 0 ]g(x 0 ) = g (x 0 ) 1 [g(x) (g(x 0 ) + g (x 0 )(x x 0 ))] +[g (x 0 ) 1 J 0 ]g(x 0 ), since g (x 0 ) 1 exists by the hypotheses Consequently, the hypothese (1) (4) plus the quadratic bound lemma imply that x k+1 x g (x 0 ) 1 g(x) (g(x 0 ) + g (x 0 )(x x 0 )) + (g (x 0 ) 1 J 0 )g(x 0 ) M 1L 2 x0 x 2 + K g(x 0 ) g(x) M 1L 2 x0 x 2 + M 0 K x 0 x θ 0 x 0 x < ɛ, whereby (15) (16) are established for k = 0 Next suppose that (15) (16) hold for k = 0, 1,, s 1 We show that (15) (16) hold at k = s Since x s B(x, ɛ), hypotheses (2) (4) hold at x s, one can proceed exactly as in the case k = 0 to obtain (15) Now if any one of (i) (iv) holds, then (i) holds Thus, by 3
4 4 (15), we find that x s+1 x M 1L 2 xs x 2 + (g (x s ) 1 J s )g(x s ) [ M 1L 2 θs 0 x 0 x + M 0 K] x s x [ M 1L 2 x0 x + M 0 K] x s x = θ 0 x s x Hence x s+1 x θ 0 x s x θ 0 ɛ < ɛ and so x s+1 B(x, ɛ) We now proceed to establish (a) (d) (a) This clearly holds since the induction above established that (b) From (15), we have x k+1 x θ 0 x k x x k+1 x LM 1 2 xk x 2 + (g (x k ) 1 J k )g(x k ) LM 1 2 xk x 2 + θ k 1K g(x k ) [ LM 1 2 θk 0 x 0 x + θ k 1M 0 K] x k x Hence x k x superlinearly (c) From (15) and the fact that x k x, we eventually have x k+1 x LM 1 2 xk x 2 + (g (x k ) 1 J k )g(x k ) LM 1 2 xk x 2 + M 2 x k x k 1 g(x k ) [ LM 1 2 xk x + M 0 M 2 [ x k 1 x + x k x ]] x k x [ LM 1 2 θ 0 x k 1 x + M 0 M 2 (1 + θ 0 ) x k 1 x ] θ 0 x k 1 x Hence x k x two step quadratically = [ LM 1 2 θ 0 + M 0 M 2 (1 + θ 0 )]θ 0 x k 1 x 2
5 5 (d) Again by (15) and the fact that x k x, we eventually have x k+1 x LM 1 2 xk x 2 + (g (x k ) 1 J k )g(x k ) LM 1 2 xk x 2 + M 2 g(x k ) 2 [ LM M 2 M 2 0 ] x k x 2 Note that the conditions required for the approximations to the Jacobian matrices g (x k ) 1 given in (i) (ii) do not imply that J k g ( x) 1 The stronger conditions (i) g (x k ) 1 J k g (x 0 ) 1 J 0, (ii) g (x k+1 ) 1 J k+1 θ 1 g (x k ) 1 J k for some θ 1 (0, 1), (iii) g (x k ) 1 J k min{m 2 x k+1 x k, g (x 0 ) 1 J 0 } for some M 2 > 0, or (iv) g (x k ) 1 = J k, which imply the conditions (i) through (iv) of Theorem 11 respectively, all imply the convergence of the inverse Jacobian approximates to g ( x) 1 Clearly the conditions (i) (iv) are not as desirable since they require a great deal more expense and care in the construction of the inverse Jacobian approximates 13 Newton s Method for Minimization In this section we translate the results of previous section in the context of minimization Here the underlying problem is The Newton-like iterations are of the form P min f(x) x R n x k+1 = x k H k f(x k ), where H k is an approximation to the inverse of the Hessian matrix 2 f(x k ) Theorem 12 Let f : R n R be twice differentiable, x 0 R n, and H 0 R n n Suppose that (1) there exists x R n and ɛ > x 0 x such that f(x) f(x) whenever x x ɛ, (2) there is a δ > 0 such that δ z 2 2 z T 2 f(x)z for all x B(x, ɛ), (3) 2 f is Lipschitz continuous on clb(x; ɛ) with Lipschitz constant L, and (4) θ 0 := L 2δ x0 x + M 0 K < 1 where M 0 > 0 satisfies z T 2 f(x)z M 0 z 2 2 for all x B(x, ɛ) and K ( 2 f(x 0 ) 1 H 0 )y 0 with y 0 = f(x 0 )/norm f(x 0 ) Further, suppose that the iteration (18) x k+1 := x k H k f(x k ) is initiated at x 0 where the H k s are chosen to satisfy one of the following conditions: (i) ( 2 f(x k ) 1 H k )y k K, (ii) ( 2 f(x k ) 1 H k )y k θ k 1K for some θ 1 (0, 1), (iii) ( 2 f(x k ) 1 H k )y k min{m 2 x k x k 1, K}, for some M 2 > 0, or (iv) ( 2 f(x k ) 1 H k )y k min{m 2 f(x k ), K}, for some M 3 > 0,
6 6 where for each k = 1, 2, y k := f(x k )/ f(x k ) These hypotheses on the accuracy of the approximations H k yield the following conclusions about the rate of convergence of the iterates x k (a) If (i) holds, then x k x linearly (b) If (ii) holds, then x k x superlinearly (c) If (iii) holds, then x ɛ x two step quadratically (d) If (iv) holds, then x k k quadradically In order to more fully understand the convergence behavior described in the above result a careful study of the role of the controling parameters L, M 0, and M 1 needs to be made Although we do not attempt this study, we do make a few observations First observe that since L is a Lipschitz constant for 2 f it represents a bound on the third order behavior of f Thus the assumptions for convergence make implicit demands on the third derivative Next, the constant δ in the context of minimization represents a local uniform lower bound on the eigenvalues of 2 f That is, f behaves locally as if it were a strongly convex function (see exercises) with modulus δ Finally, M 0 can be interpreted as a local Lipschitz constant for f and only plays a role when 2 f is approximated inexactly by H k s We now consider the performance differences between the method of steepest descent and Newton s method on a simple one dimensional problem For this we consider the function f(x) = x 2 + e x Clearly, f is a strongly convex function with f(x) = x 2 + e x f (x) = 2x + e x f (x) = 2 + e x > 2 f (x) = e x If we apply the steepest descent algorithm with backtracking (γ = α, c = 001) initiated at x 0 = 1, we get the following table k x k f(x k ) f (x k ) s
7 Let us now apply Newton s method from the same starting point taking a unit step at each iteration This time we get x f (x) / and one more iteration give f (x 5 ) This is a stunning improvement in performance and shows why one always uses Newton s method (or an approximation to it) whenever possible Our next objective is to develop numerically viable methods for approximating Jacobians and Hessians in Newton-like methods We begin with a brief excursion into numerical linear algebra 14 Numerical Linear Algebra 141 The LU Factorization Recall from linear algebra that Gaussian elimination is a method for solving linear systems of the form Ax = b, where A R m n and bran(a) In this method one first forms the augmented system [A b] and then uses the three elementary row operations to put this system into row echelon form (or upper triangular form) A solution x is then obtained by back substitution, or back solving, starting with the component x n We now show how the process of bringing a matrix to upper triangular form can be performed by left matrix multiplication The key step in Gaussian elimination is to transform a vector of the form 7 a α b, where a R k, 0 α R, and b R n k 1, into one of the form a α 0 This can be accomplished by left matrix multiplication as follows: I k k a α = a α 0 α 1 b I (n k 1) (n k 1) b 0
8 8 The matrix I k k α 1 b I (n k 1) (n k 1) is called a Gaussian elimination matrix This matrix is invertible with inverse I k k α 1 b I (n k 1) (n k 1) We now use this basic idea to show how a matrix can be put into upper triangular form Suppose [ ] a1 v1 T A = C n m, u 1 Ã 1 with 0 a 1 C, u 1 C m 1, v 1 C n 1, and Ã1 C (m 1) (n 1) Then using the first row to zero out u 1 amounts to left multiplication of the matrix A by the matrix [ ] 1 0 to get (*) where Define and observe that u 1 a 1 u 1 a 1 [ ] [ ] 1 0 a1 v1 T C I n m = u 1 Ã 1 u 1 a 1 A 1 = Ã1 u 1 v T 1 /a 1 [ ] 1 0 L 1 = C I m m and U 1 = I [ 1 0 L 1 1 = I u 1 a 1 ] [ ] a1 v1 T 0 A 1 [ ] a1 v1 T C 0 A m n 1 Hence (*) becomes L 1 1 A = U 1, or equivalently, A = L 1 U 1 Note that L 1 is unit lower triangular (ones on the mail diagonal) and U 1 is block uppertriangular with one 1 1 block and one (m 1) (n 1) block on the block diagonal The multipliers are usually denoted u/a = [µ 21, µ 31,, µ m1 ] T If the (1, 1) entry of A 1 is not 0, we can apply the same procedure to A 1 : if [ ] a2 v2 T A 1 = C (m 1) (n 1) u 2 Ã 2 with a 2 0, letting [ ] I 0 L 2 = C I (m 1) (m 1), u 2 a 2,
9 and forming L 1 2 A 1 = [ ] [ ] 1 0 a2 v2 T = I u 2 Ã 1 u 1 a 2 [ ] a2 v2 T 0 A Ũ2 C (m 1) (n 1), 2 where A 2 C (m 2) (n 2) This process amounts to using the second row to zero out elements of the second column below the diagonal Setting [ ] [ ] 1 0 a v T L 2 = 0 L and U 2 =, 2 we have [ 1 0 L 1 2 L 1 1 A = 0 L Ũ2 ] [ ] a v T = U 0 A 2, 1 or equivalently, A = L 2 L 1 U 2 Here U 2 is block upper triangular with two 1 1 blocks and one (m 2) (n 2) block on the diagonal, and again L 2 is unit lower triangular We can continue in this fashion at most m 1 times, where m = min{m, n} If we can proceed m 1 times, then L 1 m 1 L 1 2 L 1 1 A = U m 1 = U is upper triangular provided that along the way that the (1, 1) entries of A, A 1, A 2,, A m 2 are nonzero so the process can continue Define L = (L 1 m 1 L 1 1 ) 1 = L 1 L 2 L m 1 The matrix L is square unit lower triangular, and so is invertable Moreover, A = LU, where the matrix U is the so called row echelon form of A In general, a matrix T C m n is said to be in row echelon form if for each i = 1,, m 1 the first non-zero entry in the (i + 1) st row lies to the right of the first non-zero row in the i th row Let us now suppose that m = n and A C n n is invertible Writing A = LU as a product of a unit lower triangular matrix L C n n (necessarily invertible) and an upper triangular matrix U C n n (also nessecarily invertible in this case) is called the LU factorization of A Remarks (1) If A C n n is invertible and has an LU factorization, it is unique (2) One can show that A C n n has an LU factorization iff for 1 j n, the upper left j j principal submatrix a 11 a ij is invertible a j1 a jj 9
10 10 (3) Not every [ invertible ] A C n n has an LU-factorization 0 1 Example: 1 0 Typically, one must permute the rows of A to move nonzero entries to the appropriate spot for the elimination to proceed Recall that a permutation matrix P C n n is the identity I with its rows (or columns) permuted: so P R n n is orthogonal, and P 1 = P T Permuting the rows of A amounts to left multiplication by a permutation matrix P T ; then P T A has an LU factorization, so A = P LU (called the PLU factorization of A) (4) Fact: Every invertible A C n n has a (not necessarily unique) PLU factorization (5) The LU factorization can be used to solve linear systems Ax = b (where A = LU C n n is invertible) The system Ly = b can be solved by forward substitution (1 st equation gives x 1, etc), and Ux = y can be solved by back-substitution (n th equation gives x n, etc), giving the solution to? Ax = LUx = b Example: We now use the procedure outlined above to compute the LU factorization of the matrix A = L 1 1 A = We now have and L = L 1 L 2 = = L 1 2 L 1 1 A = = U = , =
11 142 The Cholesky Factorization We now consider the application of the LU factorization to a symmetric positive definite matrix, but with a twist Suppose the n n matrix H is symmetric and we have performed the first step in the procedure for computing the LU factorization of H so that L 1 1 H = U 1 Clearly, U 1 is no-longer symmetric (assuming L 1 is not the identity matrix) To recover symmetry we could multiply U 1 on the right by the upper triangular matrix L T 1 so that L 1 1 HL T 1 = U 1 L T 1 = H 1 11 We claim that H 1 necessarily has the form [ h(1,1) 0 H 1 = 0 Ĥ 1 ], where h (1,1) is the (1, 1) element of H and Ĥ1 is an (n 1) (n 1) symmetric matrix For example, consider the matrix H = In this case, we get L 1 1 HL T 1 = = If we now continue this process with the added feature of multiplying on the right by L T j as we proceed, we obtain L 1 HL T = D, or equivalently, H = LDL T, where L is a unit lower triangular matrix and D is a diagonal matrix Note that the entries on the diagonal of D are not necessarily the eigenvalues of H since the transformation L 1 HL T is not a similarity transformation Observe that if it is further assumed that H is positive definite, then the diagonal entries of D are necessarity all positive and the factorization H = LDL T can always be obtained, ie no zero pivots can arise in computing the LU factorization (see exercises)
12 12 Let us apply this approach by continuing the computation of the LU factorization for the matrix given above Thus far we have L 1 1 HL T 1 = = Next L 1 2 L 1 1 HL T 1 L T 2 = = , giving the desired factorization H = LDL T = Note that this implies that the matrix H is not positive definite We make one final comment on the positive definite case When H is symmetric and positive definite, an LU factorization always exists and we can use it to obtain a factorization of the form H = LDL T, where L is unit lower triangular and D is diagonal with positive diagonal entries If D = diag(d 1, d 2,, d n ) with each d i > 0, the D 1/2 = diag( d 1, d 2,, d n ) Hence we can write H = LDL T = LD 1/2 D 1/2 L T = ˆLˆL T, where ˆL = LD 1/2 is a non-singular lower triangular matrix The factorization H = ˆLˆL T where ˆL is a non-singular lower triangula matrix is called the Cholesky factorization of H In this regard, the process of computing a Chlesky factorization is an effective means for determining is a symmetric matrix is positive definite 143 The QR Factorization Recall the Gram-Schmidt orthogonalization process for a sequence of linearly independent vectors a 1,, a n R n Define q 1,, q n inductively, as
13 13 follows: set p 1 = a 1, q 1 = p 1 / p 1, j 1 p j = a j a j, q j q i for 2 j n, and i=1 q j = p j / p j For 1 j n, q j Span{a 1,, a j }, so each p j 0 by the lin indep of {a 1,, a n } Thus each q j is well-defined We have {q 1,, q n } is an orthonormal basis for Span{a 1,, a n } Also a k Span{q 1,, q k } 1 k n, so {q 1,, q k } is an orthonormal basis of Span{a 1,, a k } Define r jj = p j and r ij = a j, q i for 1 i < j n, we have: a 1 = r 11 q 1, a 2 = r 12 q 1 + r 22 q 2, a 3 = r 13 q 1 + r 23 q 2 + r 33 q 3, a n = n r in q i i=1 Set A = [a 1 a 2 a n ], R = [r ij ], and Q = [q 1 q 2 q n ], where r ij = 0, i > j Then A = QR, where Q is unitary and R is upper triangular Remarks (1) If a 1, a 2, is a linearly independent sequence, apply Gram-Schmidt to obtain an orthonormal sequence q 1, q 2, such that {q 1,, q k } is an orthonormal basis for Span{a 1,, a k }, k 1 (2) If the a j s are linearly dependent, for some value(s) of k, a k Span{a 1,, a k 1 }, so p k = 0 The process can be modified by setting r kk = 0, not defining a new q k for this iteration and proceeding We end up with orthogonal q j s Then for k 1, the vectors {q 1,, q k } form an orthonormal basis for Span{a 1,, a l+k } where l is the number of r jj = 0 Again we obtain A = QR, but now Q may not be square
14 14 (3) The classical Gram-Schmidt algorithm as described does not behave well computationally This is due to the accumulation of round-off error The computed q j s are not orthogonal: q j, q k is small for j k with j near k, but not so small for j k or j k An alternate version, Modified Gram-Schmidt, is equivalent in exact arithmetic, but behaves better numerically In the following pseudo-codes, p denotes a temporary storage vector used to accumulate the sums defining the p j s Classic Gram-Schmidt Modified Gram-Schmidt For j = 1,, n do For j = 1,, n do p := a j p := aj For i = 1,, j 1 do For i = 1,, j 1 do r ij = a j, q i r ij = p, q i p := p r ij q i p := p r ij q i r jj := p r jj = p q j := p/r jj q j := p/r jj The only difference is in the computation of r ij : in Modified Gram-Schmidt, we orthogonalize the accumulated partial sum for p j against each q i successively Theorem 13 Suppose A R m n with m n Then for which unitary Q R m m upper triangular R R m n A = QR If Q R m n denotes the first n columns of Q and R R n n denotes the first n rows of R, then [ ] R A = QR = [ Q ] = Q R 0 Moreover (a) We may choose an R to have nonnegative diagonal entries (b) If A is of full rank, then we can choose R with positive diagonal entries, in which case the condensed factorization A = Q R is unique (and thus if m = n, the factorization A = QR is unique since then Q = Q and R = R) Proof If A has full rank, apply the Gram-Schmidt Define Q = [q 1,, q n ] R m n and R = [rij ] R n n as above, so A = Q R
15 Extend {q 1,, q n } to an orthonormal basis {q 1,, q m } of R m, and set [ ] R Q = [q 1,, q m ] and R = C m n, so A = QR 0 As r jj > 0 in the G-S process, we have (b) Uniqueness follows by induction passing through the G-S process again, noting that at each step we have no choice Remarks (1) In practice, there are more efficient and better computationally behaved ways of calculating the Q and R factors The idea is to create zeros below the diagonal (successively in columns 1, 2, ) as in Gaussian Elimination, except we now use Householder transformations (which are unitary) instead of the unit lower triangular matrices L j (2) A QR factorization is also possible when m < n A = Q[R 1 R 2 ], where Q C m m is unitary and R 1 C m m is upper triangular Every A R m n has a QR-factorization, even when m < n Indeed, if rank(a) = k, there always exist Q R m k with orthonormal columns, R R k n full rank upper triangular, and a permutation matrix P R n n such that ( ) AP = QR Moreover, if A has rank n (so m n), then R R n n is nonsingular On the other hand, if m < n, then R = [R 1 R 2 ], where R 1 R k k is nonsingular Finally, if A R m n, then the same facts hold, but now both Q and R can be chosen to be real matrices QR-Factorization and Orthogonal Projections Let A R m n have condensed QR-factorization A = Q R Then by construction the columns of Q form an orthonormal basis for the range of A Hence P = Q Q T is the orthogonal projector onto the range of A Similarly, if the condensed QR-factorization of A T is A T = Q 1 R1, then P 1 = Q 1 QT 1 is the orthogonal projector onto Ran(A T ) = ker(a), and so I Q 1 QT 1 15
16 16 is the orthogonal projector onto ker(a) The QR-factorization can be computed using either Givens rotations or Householder reflections Although, the approach via rotations is arguably more stable numerically, it is more difficult to describe so we only illustrate the approach using Householder reflections QR using Householder Reflections Given w R n we can associate the matrix U = I 2 wwt w T w which reflects R n across the hyperplane Span{w} The matrix U is call the Householder reflection across this hyperplane Given a pair of vectors x and y with x 2 = y 2, and x y, there is a Householder reflection such that y = Ux: Proof (x y)(x y)t U = I 2 (x y) T (x y) x 2 y T x Ux = x 2(x y) x 2 2y T x + y 2 = x 2(x y) x 2 y T x 2( x 2 y T x) = y since x = y QR using Householder Reflections We now describe the basic deflation step in the QR-factorization [ ] α0 a A 0 = T 0 b 0 A 0 Set ν 0 = ( α0 b 0 ) 2
17 17 Let H 0 be the Householder transformation that maps ( ) α0 ν 0 e 1 : b T 0 H 0 = I 2 wwt w T w where w = ( α0 ) ν b 0 e 1 = 0 ( ) α0 ν 0 b 0 Thus, A problem occurs if ν 0 = 0 or H 0 A = ( α0 b 0 ) [ ] ν0 a T 1 0 A 1 = 0 In this case, permute the offending column to the right bringing in the column of greatest magnitude Now repeat with A 1 If the above method if implemented by always permuting the column of greatest magnitude into the current pivot column, then AP = QR gives a QR-factorization with the diagonal entries of R nonnegative and listed in the order of descending magnitude 15 Matrix Secant Methods Let us return to the problem of finding x R n such that g(x) = 0 where g : R n R n is C 1 In this section we consider Newton-Like methods of a special type Recall that in a Newton-Like method the iteration scheme takes the form (19) x k+1 := x k J k g(x k ), where J k is meant to approximate the inverse of g (x k ) In the one dimensional case, a method proposed by the Babylonians 3700 years ago is of particular significance Today we call it the secant method: x k 1 x k (110) J k = g(x k 1 ) g(x k ) With this approximation one has g (x k ) 1 J k = g(xk 1 ) [g(x k ) + g (x k )(x k 1 x k )] g (x k )[g(x k 1 ) g(x k )] Also, near a point x at which g (x ) 0 there exists an α > 0 such that α x y g(x) g(y) Consequently, by the Quadratic Bound Lemma, g (x k ) 1 J k L 2 xk 1 x k 2 α g (x k ) x k 1 x k K xk 1 x k
18 18 for some constant K > 0 whenever x k and x k 1 are sufficiently close to x Therefore, by Theorem 11, the secant method is locally two step quadratically convergent to a non singular solution of the equation g(x) = 0 An additional advantage of this approach is that no extra function evaluations are required to obtain the approximation J k 151 Matrix Secant Methods for Equations Unfortunately, the secant approximation (110) is meaningless if the dimension n is greater than 1 since division by vectors is undefined But this can be rectified by simply writing (111) J k (g(x k 1 ) g(x k )) = x k 1 x k Equation (111) is called the Quasi-Newton equation (QNE), or matrix secant equation (MSE), at x k and it determines J k along an n dimensional manifold in R n n Equation (111) is not enough to uniquely determine J k since (111) is n linear equations in n 2 unknowns To nail down a specific J k further conditions on the update J k must be given In order to determine sensible conditions, let us consider an overall iteration scheme based on (19) At every iteration we have (x k, J k ) and compute x k+1 by (19) Then J k+1 is constructed to satisfy (111) If B k = J 1 k is close to g (x k ) and x k+1 is close to x k, then J k+1 should be chosen not only to satisfy (111) but also to be as close to J k as possible In what sense should we mean close here? In order to facilitate the computations it is reasonable to mean algebraically close in the sense that J k+1 (or B k+1 = J 1 k+1 ) is only a rank 1 modification of J k (respectively, B k = J 1 k ) Since it may happen that g (x k ) is not invertible, we first see what happens when we use this idea to approximate B k+1 Since we are assuming that B k+1 is a rank 1 modification to B k, there are vectors u, v R n such that (112) B k+1 = B k + uv T We now use the matrix secant equation (MSE) to derive conditions on the choice of u and v In this setting, the MSE becomes where Multiplying (112) by s k gives Hence, if v T s k 0, we obtain and B k+1 s k = y k, s k := x k+1 x k and y k := g(x k 1 ) g(x k ) y k = B k+1 s k = B k s k + uv T s k u = yk B k s k v T s k (113) B k+1 = B k + (yk B k s k )v T v T s k Equation (113) determines a whole class of rank one updates that satisfy the MSE where one is allowed to choose v R n as long as v T s k 0 If s k 0, then an obvious choice for v
19 19 is s k yielding the update (114) B k+1 = B k = (yk B k s k )s kt s kt s k This is known as Broyden s update It turns out that the Broyden update is also analytically close Theorem 14 Let A R n n, s, y R n, s 0 Then for any matrix norms and such that AB A B and the solution to vvt 1, v T v (115) min{ B A : Bs = y} is (y As)sT (116) A + = A + s T s In particular, (116) solves (115) when is the l 2 matrix norm, and (116) solves (115) uniquely when is the Frobenius norm Proof Let B {B R n n : Bs = y}, then Note that if = 2, then (y As)sT A + A = = (B A) sst s T s s T s B A sst B A s T s vvt v T v 2 = sup{ vvt v T v x 2 : x 2 = 1} (v = sup{ T x) 2 : x v 2 2 = 1} = 1, so that the conclusion of the result is not vacuous For uniqueness observe that the Frobenius norm is strictly convex and A B F A F B 2 Therefore, the Broyden update (114) is both algebraically and analytically close to B k These properties indicate that it should perform well in practice and indeed it does Algorithm: Broyden s Method Initialization: x 0 R n, B 0 R n n Having (x k, B k ) compute (x k+1, B x+1 ) as follows:
20 20 Solve B k s k = g(x k ) for s k and set x k+1 : = x k + s k y k : = g(x k ) g(x k+1 ) B k+1 : = B k + (yk B k s k )s kt s kt s k Let us now see if we can compute J k = B 1 k so that we can write the step computation as s k = J k g(x k ) avoiding the need to solve an equation For this we employ the following important lemma Lemma 11 (Sherman-Morrison-Woodbury) Suppose A R n n, U R n k, V R n k are such that both A 1 and (I + V T A 1 U) 1 exist, then (A + UV T ) 1 = A 1 A 1 U(I + V T A 1 U) 1 V T A 1 The above lemma verifies that if B 1 k (117) J k+1 = [B k + (yk B k s k )s kt s kt s k ] 1 = B 1 k = J k exists and s kt J k y k = s kt B 1 k yk 0, then + (sk B 1 k yk )s kt B 1 k s kt B 1 k y = J k + (sk J k yk )skt s kt J k y In this case, it is possible to directly update the inverses J k But this process can become numerically unstable if s kt J k y k is small Therefore, in practise, the Broyden update is usually implemented in forward mode described by the algorithm for Broyden s method above Although we do not pause to establish the convergence rates here, we do give the following result due to Dennis and Moré (1974) Theorem 15 Let g : R n R n be continuously differentiable in an open convex set D R n Assume that there exists x R n and r, β > 0 such that x + rb D, g(x ) = 0, g (x ) 1 exists with g (x ) 1 β, and g is Lipschitz continuous on x + rb with Lipschitz constant γ > 0 Then there exist positive constants ɛ and δ such that if x 0 x 2 ɛ and B 0 g (x 0 ) δ, then the sequence {x k } generated by the iteration [ x k+1 := x k + s k where s k solves 0 = g(x k ) + B k s B k+1 := B k + (yk B k s k )s T k s T k sk where y k = g(x k+1 ) g(x k ) is well-defined with x k x superlinearly 152 Matrix Secant Methods for Minimization Let us now see how we can mimic these matrix secant ideas in the context of optimization where the underlying problem is one of minimization: P : minimize f(x) x R n where f : R n R is C 2 How can we modify and/or extend the matrix secant methods for equations to the setting of minimization where one wishes to solve the equation f(x) = 0 In this context the MSE (matrix secant equation) becomes s k = H k+1 y k J k
21 where s k := x k+1 x k and y k := f(x k+1 ) f(x k ) In this context the matrix H k is intended to be an approximation to the inverse of the Hessian matrix 2 f(x k ) Writing M k = H 1 k, a straightforward application of Broyden s method gives the update M k+1 = M k + (yk M k s k )s kt s kt s k However, this is unsatisfactory for two reasons: (1) Since M k approximates 2 f(x k ) it must be symmetric (2) Since we are minimizing, then M k must be positive definite to insure that s k = M 1 k f(xk ) is a direction of descent for f at x k To address problem 1 above, one could return to equation (113) an find an update that preserves symmetry Such an update is uniquely obtained by setting v = (y k M k s k ) This is called the symmetric rank 1 update or SR1 Although this update can on occasion exhibit problems with numerical stability, it has recently received a great deal of renewed interest The stability problems occur whenever (118) v T s k = f(x k+1 ) T M 1 k f(xk ) has small magnitude We now approach the question of how to update M k in a way that addresses both the issue of symmetry and positive definiteness while still using the Broyden updating ideas Given a symmetric positive definite matrix M and two vectors s and y, our goal is to find a symmetric positive definite matrix M such that Ms = y Since M is symmertic and positive definite, there is a non-singular n n matrix L such that M = LL T Indeed, L can be chosent to be lower triangular Cholesky factor of M If M is also symmetric and positive definite then there is a matrix J R n n such that M = JJ T The MSE implies that if (119) J T s = v then (120) Jv = y Let us apply the Broyden update technique to (120), J, and L That is, suppose that (121) J = L + Then by (119) (y Lv)vT v T v (122) v = J T s = L T s + v(y Lv)T s v T v This expression implies that v must have the form v = αl T s 21
22 22 for some α R Substituting this back into (122) we get Hence (123) α 2 = αl T s = L T s + αlt s(y αll T s) T s α 2 s T LL T s [ ] st y s T Ms Consequently, such a matrix J satisfying (122) exists only if s T y > 0 in which case with yielding J = L + (y αms)st L, αs T Ms α = [ ] 1/2 st y, s T Ms (124) M = M + yyt y T s MssT M s T Ms Moreover, the Cholesky factorization for M can be obtained directly from the matrices J Specifically, if the QR factorization of J T is J T = QR, we can set L = R yielding M = JJ T = R T Q T QR = LL T We have shown that beginning with a symmetric positive definite matrix M k we can obtain a symmetric and positive definite update M k+1 that satisfies the MSE M k+1 s k = y k by applying the formula (124) whenever s kt y k > 0 We must now address the question of how to choose x k+1 so that s kt y k > 0 Recall that and where t k is the stepsize Hence y = y k = f(x k+1 ) f(x k ) s k = t k M 1 k f(xk ), y kt s k = f(x k+1 ) T s k f(x k ) T s k = t k ( f(x k + t k d k ) T d k f(x k ) T d k ), where d k := M 1 k f(xk ) Now since M k is positive definite the direction d k is a descent direction for f at x k and so t k > 0 Therefore, to insure that s kt y k > 0 we need only show that t k > 0 can be choosen so that (125) f(x k + t k d k ) T d k β f(x k ) T d k for some β (0, 1) since in this case f(x k + t k d k ) T d k f(x k ) T d k (β 1) f(x k ) T d k > 0 But this precisely the second condition in the weak Wolfe conditions with β = c 2 The update (124) is called the BFGS (Broyden-Fletcher-Goldfarb- Shanno) update and is currently considered the best available matrix secant type update
23 for minimization Observe in (124) that if both M and M are positive definite, then they are both invertible The Sherman-Morrison-Woodbury formula shows that the inverse is given by (126) M 1 = M 1 + (s M 1 y)s T + s(s M 1 y) T (s M 1 y) T yss T y T s (y T s) 2 Thus the corresponding inverse updating scheme for the BFGS update is H k+1 = H k + (sk H k y k )s kt + s k (s k H k y k ) T y kt s k (sk H k y k ) T y k s k s kt (y kt s k ) 2 This is the form of the update that is most commonly used as it avoids the need to solve the equation M k d k = f(x k ) for the search direction d k Instead, the search direction is obtained directly with a matrix multiply, d k = H k f(x k ) However, one still needs to be careful to avoid numerical instability This is typically avoided by being careful to the formation of H k+1 from H k The advised process is as follows BFGS Updating σ := s kt y k ŝ k := s k /σ ŷ k := y k /σ H k+1 := H k + (ŝ k H k ŷ k )(ŝ k ) T + ŝ k (ŝ k H k ŷ k ) T (ŝ k H k ŷ k ) T ŷ k ŝ k (ŝ k ) T 23
24 24 Exercises (1) Let γ (0, 1) (a) Show that the sequence {γ ν } converges linearly to zero, but not superlinearly (b) Show that the sequence {γ ν2 } converges superlinearly to 0, but not quadratically (c) Finally, show that the sequence {γ 2ν } converges quadratically to zero (2) Apply Newton s method to the function f(x) = x 2 + sin x with initial point x 0 = 2π (3) If H R n n is symmetric and positive definite show that there is a diagonal matrix D having positive entries and a unit lower triangular matrix L such that H = LDL T (4) Compute the Cholesky factorization of the matrix (5) Apply the secant method to the function f(x) = x 2 + e x with starting points x 1 = and x 0 = 1 How many ierations does it take to obtain a point for which f (x k ) 10 8? Compare this performance with that of Newton s method and the method of steepest descent (6) Verify formulae (118) and (123) (7) Verify formulae (117), (124), and (126) (8) Prove the Sherman-Morrison-Woodbury Lemma (Lemma 11) (9) Show that every rank 1 matrix W in R n n can be written in the form W = uv T for some choice of vectors u, v R n (10) Show that every n n positive semi-definite matrix M can be written in the form M = V V T for some n n matrix V having the same rank as M Also, show that V can be chosen to be an n k matrix where k = rank(m) (11) The function f : R n R is said to be strongly convex if there is a δ > 0 such that f(x + λ(y x)) f(x) + λ[f(y) f(x)] δ λ(1 λ) y x 2 2 for every x, y R n and λ [0, 1] The parameter δ is called the modulus of strong convexity Prove the following characterization theorem for strongly convex functions Theorem 16 (Characterizations of strongly convex functions) Let f : R n R be differentiable Then the following statements are equivalent (a) f is strongly convex with modulus δ (b) f(y) f(x) + f(x) T (y x) + δ 2 y x 2 for all x, R n (c) ( f(x) f(y)) T (x y) δ y x 2 for all x, y R n If it is further assumed that f is twice continuously differentiable on R n, then the above conditions are also equivalent to the following statement: δ u 2 2 u T 2 f(x)u u R n That is, the spectrum of the Hessian of f is uniformly bounded below by δ on R n
Search Directions for Unconstrained Optimization
8 CHAPTER 8 Search Directions for Unconstrained Optimization In this chapter we study the choice of search directions used in our basic updating scheme x +1 = x + t d. for solving P min f(x). x R n All
More informationThis can be accomplished by left matrix multiplication as follows: I
1 Numerical Linear Algebra 11 The LU Factorization Recall from linear algebra that Gaussian elimination is a method for solving linear systems of the form Ax = b, where A R m n and bran(a) In this method
More informationLinear Analysis Lecture 16
Linear Analysis Lecture 16 The QR Factorization Recall the Gram-Schmidt orthogonalization process. Let V be an inner product space, and suppose a 1,..., a n V are linearly independent. Define q 1,...,
More informationMath 408A: Non-Linear Optimization
February 12 Broyden Updates Given g : R n R n solve g(x) = 0. Algorithm: Broyden s Method Initialization: x 0 R n, B 0 R n n Having (x k, B k ) compute (x k+1, B x+1 ) as follows: Solve B k s k = g(x
More informationImportant Matrix Factorizations
LU Factorization Choleski Factorization The QR Factorization LU Factorization: Gaussian Elimination Matrices Gaussian elimination transforms vectors of the form a α, b where a R k, 0 α R, and b R n k 1,
More informationMatrix Secant Methods
Equation Solving g(x) = 0 Newton-Lie Iterations: x +1 := x J g(x ), where J g (x ). Newton-Lie Iterations: x +1 := x J g(x ), where J g (x ). 3700 years ago the Babylonians used the secant method in 1D:
More information5 Quasi-Newton Methods
Unconstrained Convex Optimization 26 5 Quasi-Newton Methods If the Hessian is unavailable... Notation: H = Hessian matrix. B is the approximation of H. C is the approximation of H 1. Problem: Solve min
More informationMath 407: Linear Optimization
Math 407: Linear Optimization Lecture 16: The Linear Least Squares Problem II Math Dept, University of Washington February 28, 2018 Lecture 16: The Linear Least Squares Problem II (Math Dept, University
More informationSolving linear equations with Gaussian Elimination (I)
Term Projects Solving linear equations with Gaussian Elimination The QR Algorithm for Symmetric Eigenvalue Problem The QR Algorithm for The SVD Quasi-Newton Methods Solving linear equations with Gaussian
More informationQuasi-Newton methods for minimization
Quasi-Newton methods for minimization Lectures for PHD course on Numerical optimization Enrico Bertolazzi DIMS Universitá di Trento November 21 December 14, 2011 Quasi-Newton methods for minimization 1
More information2. Quasi-Newton methods
L. Vandenberghe EE236C (Spring 2016) 2. Quasi-Newton methods variable metric methods quasi-newton methods BFGS update limited-memory quasi-newton methods 2-1 Newton method for unconstrained minimization
More informationSTAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 13
STAT 309: MATHEMATICAL COMPUTATIONS I FALL 208 LECTURE 3 need for pivoting we saw that under proper circumstances, we can write A LU where 0 0 0 u u 2 u n l 2 0 0 0 u 22 u 2n L l 3 l 32, U 0 0 0 l n l
More informationSECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS
SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss
More informationReview Questions REVIEW QUESTIONS 71
REVIEW QUESTIONS 71 MATLAB, is [42]. For a comprehensive treatment of error analysis and perturbation theory for linear systems and many other problems in linear algebra, see [126, 241]. An overview of
More informationMarch 8, 2010 MATH 408 FINAL EXAM SAMPLE
March 8, 200 MATH 408 FINAL EXAM SAMPLE EXAM OUTLINE The final exam for this course takes place in the regular course classroom (MEB 238) on Monday, March 2, 8:30-0:20 am. You may bring two-sided 8 page
More informationMethods that avoid calculating the Hessian. Nonlinear Optimization; Steepest Descent, Quasi-Newton. Steepest Descent
Nonlinear Optimization Steepest Descent and Niclas Börlin Department of Computing Science Umeå University niclas.borlin@cs.umu.se A disadvantage with the Newton method is that the Hessian has to be derived
More informationforms Christopher Engström November 14, 2014 MAA704: Matrix factorization and canonical forms Matrix properties Matrix factorization Canonical forms
Christopher Engström November 14, 2014 Hermitian LU QR echelon Contents of todays lecture Some interesting / useful / important of matrices Hermitian LU QR echelon Rewriting a as a product of several matrices.
More informationUnconstrained optimization
Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout
More informationNumerical Methods - Numerical Linear Algebra
Numerical Methods - Numerical Linear Algebra Y. K. Goh Universiti Tunku Abdul Rahman 2013 Y. K. Goh (UTAR) Numerical Methods - Numerical Linear Algebra I 2013 1 / 62 Outline 1 Motivation 2 Solving Linear
More informationOrthogonal Transformations
Orthogonal Transformations Tom Lyche University of Oslo Norway Orthogonal Transformations p. 1/3 Applications of Qx with Q T Q = I 1. solving least squares problems (today) 2. solving linear equations
More informationNumerical Methods I Solving Square Linear Systems: GEM and LU factorization
Numerical Methods I Solving Square Linear Systems: GEM and LU factorization Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 18th,
More informationThe Linear Least Squares Problem
CHAPTER 3 The Linear Least Squares Problem In this chapter we we study the linear least squares problem introduced in (4) Since this is such a huge and important topic, we will only be able to briefly
More informationQuasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno February 6, / 25 (BFG. Limited memory BFGS (L-BFGS)
Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno (BFGS) Limited memory BFGS (L-BFGS) February 6, 2014 Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb
More informationMatrix decompositions
Matrix decompositions Zdeněk Dvořák May 19, 2015 Lemma 1 (Schur decomposition). If A is a symmetric real matrix, then there exists an orthogonal matrix Q and a diagonal matrix D such that A = QDQ T. The
More informationQuasi-Newton Methods. Javier Peña Convex Optimization /36-725
Quasi-Newton Methods Javier Peña Convex Optimization 10-725/36-725 Last time: primal-dual interior-point methods Consider the problem min x subject to f(x) Ax = b h(x) 0 Assume f, h 1,..., h m are convex
More informationMatrix Factorization and Analysis
Chapter 7 Matrix Factorization and Analysis Matrix factorizations are an important part of the practice and analysis of signal processing. They are at the heart of many signal-processing algorithms. Their
More informationMath 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination
Math 0, Winter 07 Final Exam Review Chapter. Matrices and Gaussian Elimination { x + x =,. Different forms of a system of linear equations. Example: The x + 4x = 4. [ ] [ ] [ ] vector form (or the column
More information5.6. PSEUDOINVERSES 101. A H w.
5.6. PSEUDOINVERSES 0 Corollary 5.6.4. If A is a matrix such that A H A is invertible, then the least-squares solution to Av = w is v = A H A ) A H w. The matrix A H A ) A H is the left inverse of A and
More informationOrthonormal Transformations and Least Squares
Orthonormal Transformations and Least Squares Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo October 30, 2009 Applications of Qx with Q T Q = I 1. solving
More informationShiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 3. Gradient Method
Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 3 Gradient Method Shiqian Ma, MAT-258A: Numerical Optimization 2 3.1. Gradient method Classical gradient method: to minimize a differentiable convex
More informationECE133A Applied Numerical Computing Additional Lecture Notes
Winter Quarter 2018 ECE133A Applied Numerical Computing Additional Lecture Notes L. Vandenberghe ii Contents 1 LU factorization 1 1.1 Definition................................. 1 1.2 Nonsingular sets
More informationCME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 6
CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 6 GENE H GOLUB Issues with Floating-point Arithmetic We conclude our discussion of floating-point arithmetic by highlighting two issues that frequently
More informationGaussian Elimination and Back Substitution
Jim Lambers MAT 610 Summer Session 2009-10 Lecture 4 Notes These notes correspond to Sections 31 and 32 in the text Gaussian Elimination and Back Substitution The basic idea behind methods for solving
More information14.2 QR Factorization with Column Pivoting
page 531 Chapter 14 Special Topics Background Material Needed Vector and Matrix Norms (Section 25) Rounding Errors in Basic Floating Point Operations (Section 33 37) Forward Elimination and Back Substitution
More informationThe Conjugate Gradient Method
The Conjugate Gradient Method Lecture 5, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The notion of complexity (per iteration)
More informationOptimization and Root Finding. Kurt Hornik
Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding
More informationLecture 3: QR-Factorization
Lecture 3: QR-Factorization This lecture introduces the Gram Schmidt orthonormalization process and the associated QR-factorization of matrices It also outlines some applications of this factorization
More informationNumerical Linear Algebra
Numerical Linear Algebra Direct Methods Philippe B. Laval KSU Fall 2017 Philippe B. Laval (KSU) Linear Systems: Direct Solution Methods Fall 2017 1 / 14 Introduction The solution of linear systems is one
More informationReview of Matrices and Block Structures
CHAPTER 2 Review of Matrices and Block Structures Numerical linear algebra lies at the heart of modern scientific computing and computational science. Today it is not uncommon to perform numerical computations
More informationThe Solution of Linear Systems AX = B
Chapter 2 The Solution of Linear Systems AX = B 21 Upper-triangular Linear Systems We will now develop the back-substitution algorithm, which is useful for solving a linear system of equations that has
More informationMAT 610: Numerical Linear Algebra. James V. Lambers
MAT 610: Numerical Linear Algebra James V Lambers January 16, 2017 2 Contents 1 Matrix Multiplication Problems 7 11 Introduction 7 111 Systems of Linear Equations 7 112 The Eigenvalue Problem 8 12 Basic
More informationNumerical Linear Algebra And Its Applications
Numerical Linear Algebra And Its Applications Xiao-Qing JIN 1 Yi-Min WEI 2 August 29, 2008 1 Department of Mathematics, University of Macau, Macau, P. R. China. 2 Department of Mathematics, Fudan University,
More informationNonlinear Programming
Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week
More informationScientific Computing
Scientific Computing Direct solution methods Martin van Gijzen Delft University of Technology October 3, 2018 1 Program October 3 Matrix norms LU decomposition Basic algorithm Cost Stability Pivoting Pivoting
More informationMATH 4211/6211 Optimization Quasi-Newton Method
MATH 4211/6211 Optimization Quasi-Newton Method Xiaojing Ye Department of Mathematics & Statistics Georgia State University Xiaojing Ye, Math & Stat, Georgia State University 0 Quasi-Newton Method Motivation:
More informationNumerical Linear Algebra
Chapter 3 Numerical Linear Algebra We review some techniques used to solve Ax = b where A is an n n matrix, and x and b are n 1 vectors (column vectors). We then review eigenvalues and eigenvectors and
More informationMethods for Unconstrained Optimization Numerical Optimization Lectures 1-2
Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Coralia Cartis, University of Oxford INFOMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems Methods
More informationlecture 2 and 3: algorithms for linear algebra
lecture 2 and 3: algorithms for linear algebra STAT 545: Introduction to computational statistics Vinayak Rao Department of Statistics, Purdue University August 27, 2018 Solving a system of linear equations
More informationMarch 5, 2012 MATH 408 FINAL EXAM SAMPLE
March 5, 202 MATH 408 FINAL EXAM SAMPLE Partial Solutions to Sample Questions (in progress) See the sample questions for the midterm exam, but also consider the following questions. Obviously, a final
More informationMathematical Optimisation, Chpt 2: Linear Equations and inequalities
Mathematical Optimisation, Chpt 2: Linear Equations and inequalities Peter J.C. Dickinson p.j.c.dickinson@utwente.nl http://dickinson.website version: 12/02/18 Monday 5th February 2018 Peter J.C. Dickinson
More informationLecture notes: Applied linear algebra Part 1. Version 2
Lecture notes: Applied linear algebra Part 1. Version 2 Michael Karow Berlin University of Technology karow@math.tu-berlin.de October 2, 2008 1 Notation, basic notions and facts 1.1 Subspaces, range and
More informationDirect Methods for Solving Linear Systems. Simon Fraser University Surrey Campus MACM 316 Spring 2005 Instructor: Ha Le
Direct Methods for Solving Linear Systems Simon Fraser University Surrey Campus MACM 316 Spring 2005 Instructor: Ha Le 1 Overview General Linear Systems Gaussian Elimination Triangular Systems The LU Factorization
More informationConvex Optimization. Problem set 2. Due Monday April 26th
Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining
More informationLECTURE NOTES ELEMENTARY NUMERICAL METHODS. Eusebius Doedel
LECTURE NOTES on ELEMENTARY NUMERICAL METHODS Eusebius Doedel TABLE OF CONTENTS Vector and Matrix Norms 1 Banach Lemma 20 The Numerical Solution of Linear Systems 25 Gauss Elimination 25 Operation Count
More information2.1 Gaussian Elimination
2. Gaussian Elimination A common problem encountered in numerical models is the one in which there are n equations and n unknowns. The following is a description of the Gaussian elimination method for
More informationPreliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012
Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.
More informationAlgebra C Numerical Linear Algebra Sample Exam Problems
Algebra C Numerical Linear Algebra Sample Exam Problems Notation. Denote by V a finite-dimensional Hilbert space with inner product (, ) and corresponding norm. The abbreviation SPD is used for symmetric
More informationMTH 464: Computational Linear Algebra
MTH 464: Computational Linear Algebra Lecture Outlines Exam 2 Material Prof. M. Beauregard Department of Mathematics & Statistics Stephen F. Austin State University February 6, 2018 Linear Algebra (MTH
More informationComputational Methods. Least Squares Approximation/Optimization
Computational Methods Least Squares Approximation/Optimization Manfred Huber 2011 1 Least Squares Least squares methods are aimed at finding approximate solutions when no precise solution exists Find the
More information8 Numerical methods for unconstrained problems
8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields
More informationMath 273a: Optimization Netwon s methods
Math 273a: Optimization Netwon s methods Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 some material taken from Chong-Zak, 4th Ed. Main features of Newton s method Uses both first derivatives
More informationMidterm for Introduction to Numerical Analysis I, AMSC/CMSC 466, on 10/29/2015
Midterm for Introduction to Numerical Analysis I, AMSC/CMSC 466, on 10/29/2015 The test lasts 1 hour and 15 minutes. No documents are allowed. The use of a calculator, cell phone or other equivalent electronic
More informationOrthonormal Transformations
Orthonormal Transformations Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo October 25, 2010 Applications of transformation Q : R m R m, with Q T Q = I 1.
More informationLecture 4 Orthonormal vectors and QR factorization
Orthonormal vectors and QR factorization 4 1 Lecture 4 Orthonormal vectors and QR factorization EE263 Autumn 2004 orthonormal vectors Gram-Schmidt procedure, QR factorization orthogonal decomposition induced
More informationChapter 2. Solving Systems of Equations. 2.1 Gaussian elimination
Chapter 2 Solving Systems of Equations A large number of real life applications which are resolved through mathematical modeling will end up taking the form of the following very simple looking matrix
More informationSymmetric Matrices and Eigendecomposition
Symmetric Matrices and Eigendecomposition Robert M. Freund January, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 2 1 Symmetric Matrices and Convexity of Quadratic Functions
More informationLinear Algebra Massoud Malek
CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product
More informationNumerical solutions of nonlinear systems of equations
Numerical solutions of nonlinear systems of equations Tsung-Ming Huang Department of Mathematics National Taiwan Normal University, Taiwan E-mail: min@math.ntnu.edu.tw August 28, 2011 Outline 1 Fixed points
More informationORIE 6326: Convex Optimization. Quasi-Newton Methods
ORIE 6326: Convex Optimization Quasi-Newton Methods Professor Udell Operations Research and Information Engineering Cornell April 10, 2017 Slides on steepest descent and analysis of Newton s method adapted
More informationLecture 2: Linear Algebra Review
EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1
More information17 Solution of Nonlinear Systems
17 Solution of Nonlinear Systems We now discuss the solution of systems of nonlinear equations. An important ingredient will be the multivariate Taylor theorem. Theorem 17.1 Let D = {x 1, x 2,..., x m
More informationStochastic Quasi-Newton Methods
Stochastic Quasi-Newton Methods Donald Goldfarb Department of IEOR Columbia University UCLA Distinguished Lecture Series May 17-19, 2016 1 / 35 Outline Stochastic Approximation Stochastic Gradient Descent
More informationProgramming, numerics and optimization
Programming, numerics and optimization Lecture C-3: Unconstrained optimization II Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428
More informationNumerical Analysis Lecture Notes
Numerical Analysis Lecture Notes Peter J Olver 8 Numerical Computation of Eigenvalues In this part, we discuss some practical methods for computing eigenvalues and eigenvectors of matrices Needless to
More informationMath 411 Preliminaries
Math 411 Preliminaries Provide a list of preliminary vocabulary and concepts Preliminary Basic Netwon s method, Taylor series expansion (for single and multiple variables), Eigenvalue, Eigenvector, Vector
More informationHouseholder reflectors are matrices of the form. P = I 2ww T, where w is a unit vector (a vector of 2-norm unity)
Householder QR Householder reflectors are matrices of the form P = I 2ww T, where w is a unit vector (a vector of 2-norm unity) w Px x Geometrically, P x represents a mirror image of x with respect to
More informationDense LU factorization and its error analysis
Dense LU factorization and its error analysis Laura Grigori INRIA and LJLL, UPMC February 2016 Plan Basis of floating point arithmetic and stability analysis Notation, results, proofs taken from [N.J.Higham,
More information3 QR factorization revisited
LINEAR ALGEBRA: NUMERICAL METHODS. Version: August 2, 2000 30 3 QR factorization revisited Now we can explain why A = QR factorization is much better when using it to solve Ax = b than the A = LU factorization
More informationQuasi-Newton Methods. Zico Kolter (notes by Ryan Tibshirani, Javier Peña, Zico Kolter) Convex Optimization
Quasi-Newton Methods Zico Kolter (notes by Ryan Tibshirani, Javier Peña, Zico Kolter) Convex Optimization 10-725 Last time: primal-dual interior-point methods Given the problem min x f(x) subject to h(x)
More information1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0
Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =
More informationLecture 18: November Review on Primal-dual interior-poit methods
10-725/36-725: Convex Optimization Fall 2016 Lecturer: Lecturer: Javier Pena Lecture 18: November 2 Scribes: Scribes: Yizhu Lin, Pan Liu Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra)
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 3: Positive-Definite Systems; Cholesky Factorization Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 11 Symmetric
More informationScientific Computing: An Introductory Survey
Scientific Computing: An Introductory Survey Chapter 2 Systems of Linear Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More informationlecture 3 and 4: algorithms for linear algebra
lecture 3 and 4: algorithms for linear algebra STAT 545: Introduction to computational statistics Vinayak Rao Department of Statistics, Purdue University August 30, 2016 Solving a system of linear equations
More informationQuasi-Newton Methods
Newton s Method Pros and Cons Quasi-Newton Methods MA 348 Kurt Bryan Newton s method has some very nice properties: It s extremely fast, at least once it gets near the minimum, and with the simple modifications
More informationApplied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic
Applied Mathematics 205 Unit II: Numerical Linear Algebra Lecturer: Dr. David Knezevic Unit II: Numerical Linear Algebra Chapter II.3: QR Factorization, SVD 2 / 66 QR Factorization 3 / 66 QR Factorization
More informationLinear Algebra Primer
Linear Algebra Primer David Doria daviddoria@gmail.com Wednesday 3 rd December, 2008 Contents Why is it called Linear Algebra? 4 2 What is a Matrix? 4 2. Input and Output.....................................
More informationDirect Methods for Solving Linear Systems. Matrix Factorization
Direct Methods for Solving Linear Systems Matrix Factorization Numerical Analysis (9th Edition) R L Burden & J D Faires Beamer Presentation Slides prepared by John Carroll Dublin City University c 2011
More informationMATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 1 x 2. x n 8 (4) 3 4 2
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS SYSTEMS OF EQUATIONS AND MATRICES Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra)
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 12: Gaussian Elimination and LU Factorization Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 10 Gaussian Elimination
More informationUNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems
UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction
More information1 GSW Sets of Systems
1 Often, we have to solve a whole series of sets of simultaneous equations of the form y Ax, all of which have the same matrix A, but each of which has a different known vector y, and a different unknown
More informationChapter 4. Unconstrained optimization
Chapter 4. Unconstrained optimization Version: 28-10-2012 Material: (for details see) Chapter 11 in [FKS] (pp.251-276) A reference e.g. L.11.2 refers to the corresponding Lemma in the book [FKS] PDF-file
More informationLU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b
AM 205: lecture 7 Last time: LU factorization Today s lecture: Cholesky factorization, timing, QR factorization Reminder: assignment 1 due at 5 PM on Friday September 22 LU Factorization LU factorization
More informationAMS 209, Fall 2015 Final Project Type A Numerical Linear Algebra: Gaussian Elimination with Pivoting for Solving Linear Systems
AMS 209, Fall 205 Final Project Type A Numerical Linear Algebra: Gaussian Elimination with Pivoting for Solving Linear Systems. Overview We are interested in solving a well-defined linear system given
More informationSTAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9
STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9 1. qr and complete orthogonal factorization poor man s svd can solve many problems on the svd list using either of these factorizations but they
More informationLecture 12 (Tue, Mar 5) Gaussian elimination and LU factorization (II)
Math 59 Lecture 2 (Tue Mar 5) Gaussian elimination and LU factorization (II) 2 Gaussian elimination - LU factorization For a general n n matrix A the Gaussian elimination produces an LU factorization if
More informationNumerical Linear Algebra
Numerical Linear Algebra Decompositions, numerical aspects Gerard Sleijpen and Martin van Gijzen September 27, 2017 1 Delft University of Technology Program Lecture 2 LU-decomposition Basic algorithm Cost
More informationProgram Lecture 2. Numerical Linear Algebra. Gaussian elimination (2) Gaussian elimination. Decompositions, numerical aspects
Numerical Linear Algebra Decompositions, numerical aspects Program Lecture 2 LU-decomposition Basic algorithm Cost Stability Pivoting Cholesky decomposition Sparse matrices and reorderings Gerard Sleijpen
More information