Search Directions for Unconstrained Optimization


In this chapter we study the choice of search directions used in our basic updating scheme

$x^{k+1} = x^k + t_k d^k$

for solving

$\mathcal{P}:\quad \min_{x \in \mathbb{R}^n} f(x).$

All of the search directions considered can be classified as Newton-like since they are all of the form $d^k = -H_k \nabla f(x^k)$, where $H_k$ is a symmetric $n \times n$ matrix. If $H_k = \mu_k I$ for all $k$, the resulting search directions are scaled steepest descent directions with scale factors $\mu_k$. More generally, we choose $H_k$ to approximate $\nabla^2 f(x^k)^{-1}$ in order to approximate Newton's method for optimization. Newton's method is important since it possesses rapid local convergence properties and can be shown to be scale independent. We precede our discussion of search directions by making precise a useful notion of speed, or rate, of convergence.

1. Rate of Convergence

We focus on notions of quotient rates of convergence, or Q-convergence rates. Let $\{x^\nu\} \subset \mathbb{R}^n$ and $\bar x \in \mathbb{R}^n$ be such that $x^\nu \to \bar x$. We say that $x^\nu \to \bar x$ at a linear rate if

$\limsup_{\nu \to \infty} \dfrac{\|x^{\nu+1} - \bar x\|}{\|x^\nu - \bar x\|} < 1.$

The convergence is said to be superlinear if this limsup is $0$. The convergence is said to be quadratic if

$\limsup_{\nu \to \infty} \dfrac{\|x^{\nu+1} - \bar x\|}{\|x^\nu - \bar x\|^2} < \infty.$

For example, given $\gamma \in (0,1)$, the sequence $\{\gamma^\nu\}$ converges linearly to zero, but not superlinearly. The sequence $\{\gamma^{\nu^2}\}$ converges superlinearly to $0$, but not quadratically. Finally, the sequence $\{\gamma^{2^\nu}\}$ converges quadratically to zero. Superlinear convergence is much faster than linear convergence, but quadratic convergence is much, much faster than superlinear convergence.

2. Newton's Method for Solving Equations

Newton's method is an iterative scheme designed to solve nonlinear equations of the form

(89)    $g(x) = 0,$

where $g : \mathbb{R}^n \to \mathbb{R}^n$ is assumed to be continuously differentiable. Many problems of importance can be posed in this way. In the context of the optimization problem $\mathcal{P}$, we wish to locate critical points, that is, points at which $\nabla f(x) = 0$. We begin our discussion of Newton's method in the usual context of equation solving.

Assume that the function $g$ in (89) is continuously differentiable and that we have an approximate solution $x^0 \in \mathbb{R}^n$. We now wish to improve on this approximation. If $\bar x$ is a solution to (89), then

$0 = g(\bar x) = g(x^0) + g'(x^0)(\bar x - x^0) + o(\|\bar x - x^0\|).$

Thus, if $x^0$ is close to $\bar x$, it is reasonable to suppose that the solution to the linearized system

(90)    $0 = g(x^0) + g'(x^0)(x - x^0)$

is even closer. This is Newton's method for finding the roots of the equation $g(x) = 0$. It has one obvious pitfall: equation (90) may not be consistent. That is, there may not exist a solution to (90).
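
The three model sequences above are easy to check numerically. The short sketch below prints them side by side; the choice $\gamma = 1/2$ is arbitrary.

```python
# Illustration of linear, superlinear, and quadratic convergence to 0
# using the three model sequences gamma^nu, gamma^(nu^2), gamma^(2^nu).
gamma = 0.5
for nu in range(1, 6):
    lin = gamma ** nu            # linear:      ratio x_{nu+1}/x_nu stays at gamma
    sup = gamma ** (nu ** 2)     # superlinear: ratio x_{nu+1}/x_nu -> 0
    quad = gamma ** (2 ** nu)    # quadratic:   x_{nu+1}/x_nu^2 stays bounded (= 1 here)
    print(f"nu={nu}:  linear={lin:.3e}  superlinear={sup:.3e}  quadratic={quad:.3e}")
```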

For the sake of the present argument, we assume that $g'(x^0)^{-1}$ exists. Under this assumption, (90) defines the iteration scheme

(91)    $x^{k+1} := x^k - [g'(x^k)]^{-1} g(x^k),$

called the Newton iteration. The associated direction

(92)    $d^k := -[g'(x^k)]^{-1} g(x^k)$

is called the Newton direction. We analyze the convergence behavior of this scheme under the additional assumption that only an approximation to $g'(x^k)^{-1}$ is available. We denote this approximation by $J_k$. The resulting iteration scheme is

(93)    $x^{k+1} := x^k - J_k g(x^k).$

Methods of this type are called Newton-like methods.

Theorem 2.1. Let $g : \mathbb{R}^n \to \mathbb{R}^n$ be differentiable, $x^0 \in \mathbb{R}^n$, and $J_0 \in \mathbb{R}^{n \times n}$. Suppose that there exist $\bar x \in \mathbb{R}^n$ and $\epsilon > 0$ with $\|x^0 - \bar x\| < \epsilon$ such that
(1) $g(\bar x) = 0$,
(2) $g'(x)^{-1}$ exists for $x \in B(\bar x; \epsilon) := \{x \in \mathbb{R}^n : \|x - \bar x\| < \epsilon\}$ with $\sup\{\|g'(x)^{-1}\| : x \in B(\bar x; \epsilon)\} \le M_1$,
(3) $g'$ is Lipschitz continuous on $\operatorname{cl} B(\bar x; \epsilon)$ with Lipschitz constant $L$, and
(4) $\theta_0 := \frac{L M_1}{2}\|x^0 - \bar x\| + M_0 K < 1$, where $K \ge \|(g'(x^0)^{-1} - J_0)y^0\|$, $y^0 := g(x^0)/\|g(x^0)\|$, and $M_0 = \max\{\|g'(x)\| : x \in B(\bar x; \epsilon)\}$.
Further suppose that iteration (93) is initiated at $x^0$, where the $J_k$'s are chosen to satisfy one of the following conditions:
(i) $\|(g'(x^k)^{-1} - J_k)y^k\| \le K$,
(ii) $\|(g'(x^k)^{-1} - J_k)y^k\| \le \theta_1^k K$ for some $\theta_1 \in (0,1)$,
(iii) $\|(g'(x^k)^{-1} - J_k)y^k\| \le \min\{M_2\|x^k - x^{k-1}\|, K\}$ for some $M_2 > 0$, or
(iv) $\|(g'(x^k)^{-1} - J_k)y^k\| \le \min\{M_3\|g(x^k)\|, K\}$ for some $M_3 > 0$,
where for each $k = 1, 2, \dots$, $y^k := g(x^k)/\|g(x^k)\|$. These hypotheses on the accuracy of the approximations $J_k$ yield the following conclusions about the rate of convergence of the iterates $x^k$.
(a) If (i) holds, then $x^k \to \bar x$ linearly.
(b) If (ii) holds, then $x^k \to \bar x$ superlinearly.
(c) If (iii) holds, then $x^k \to \bar x$ two-step quadratically.
(d) If (iv) holds, then $x^k \to \bar x$ quadratically.

Proof. We begin by inductively establishing the basic inequalities

(94)    $\|x^{k+1} - \bar x\| \le \frac{L M_1}{2}\|x^k - \bar x\|^2 + \|(g'(x^k)^{-1} - J_k)g(x^k)\|$

and

(95)    $\|x^{k+1} - \bar x\| \le \theta_0\|x^k - \bar x\|,$

as well as the inclusion

(96)    $x^{k+1} \in B(\bar x; \epsilon)$

for $k = 0, 1, 2, \dots$. For $k = 0$ we have

$x^1 - \bar x = x^0 - \bar x - g'(x^0)^{-1}g(x^0) + \left[g'(x^0)^{-1} - J_0\right]g(x^0) = g'(x^0)^{-1}\left[g(\bar x) - \left(g(x^0) + g'(x^0)(\bar x - x^0)\right)\right] + \left[g'(x^0)^{-1} - J_0\right]g(x^0),$
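
The Newton iteration (91) transcribes directly into code. The sketch below solves the linear system $g'(x^k)d = -g(x^k)$ rather than forming the inverse; the small test system and its Jacobian are illustrative choices, not examples from the text.

```python
import numpy as np

def newton(g, Jg, x0, tol=1e-10, max_iter=50):
    """Newton iteration (91): x_{k+1} = x_k - g'(x_k)^{-1} g(x_k),
    implemented by solving g'(x_k) d = -g(x_k) for the Newton direction (92)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        gx = g(x)
        if np.linalg.norm(gx) <= tol:
            break
        d = np.linalg.solve(Jg(x), -gx)   # Newton direction
        x = x + d
    return x

# Hypothetical test system: g(x) = (x0^2 + x1^2 - 1, x0 - x1).
g  = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
Jg = lambda x: np.array([[2*x[0], 2*x[1]], [1.0, -1.0]])
print(newton(g, Jg, x0=[1.0, 0.5]))       # converges to (sqrt(1/2), sqrt(1/2))
```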

since $g'(x^0)^{-1}$ exists by the hypotheses. Consequently, hypotheses (1)–(4) plus the quadratic bound lemma imply that

$\|x^1 - \bar x\| \le \|g'(x^0)^{-1}\|\,\|g(\bar x) - (g(x^0) + g'(x^0)(\bar x - x^0))\| + \|(g'(x^0)^{-1} - J_0)g(x^0)\| \le \frac{M_1 L}{2}\|x^0 - \bar x\|^2 + K\|g(x^0) - g(\bar x)\| \le \frac{M_1 L}{2}\|x^0 - \bar x\|^2 + M_0 K\|x^0 - \bar x\| \le \theta_0\|x^0 - \bar x\| < \epsilon,$

whereby (94)–(96) are established for $k = 0$.

Next suppose that (94)–(96) hold for $k = 0, 1, \dots, s-1$. We show that (94)–(96) hold at $k = s$. Since $x^s \in B(\bar x; \epsilon)$, hypotheses (2)–(4) hold at $x^s$, and one can proceed exactly as in the case $k = 0$ to obtain (94). Now if any one of (i)–(iv) holds, then (i) holds. Thus, by (94), we find that

$\|x^{s+1} - \bar x\| \le \frac{M_1 L}{2}\|x^s - \bar x\|^2 + \|(g'(x^s)^{-1} - J_s)g(x^s)\| \le \left[\frac{M_1 L}{2}\theta_0^s\|x^0 - \bar x\| + M_0 K\right]\|x^s - \bar x\| \le \left[\frac{M_1 L}{2}\|x^0 - \bar x\| + M_0 K\right]\|x^s - \bar x\| = \theta_0\|x^s - \bar x\|.$

Hence $\|x^{s+1} - \bar x\| \le \theta_0\|x^s - \bar x\| \le \theta_0\epsilon < \epsilon$, and so $x^{s+1} \in B(\bar x; \epsilon)$.

We now proceed to establish (a)–(d).

(a) This clearly holds since the induction above established that $\|x^{k+1} - \bar x\| \le \theta_0\|x^k - \bar x\|$.

(b) From (94), we have

$\|x^{k+1} - \bar x\| \le \frac{L M_1}{2}\|x^k - \bar x\|^2 + \|(g'(x^k)^{-1} - J_k)g(x^k)\| \le \frac{L M_1}{2}\|x^k - \bar x\|^2 + \theta_1^k K\|g(x^k)\| \le \left[\frac{L M_1}{2}\theta_0^k\|x^0 - \bar x\| + \theta_1^k M_0 K\right]\|x^k - \bar x\|.$

Hence $x^k \to \bar x$ superlinearly.

(c) From (94) and the fact that $x^k \to \bar x$, we eventually have

$\|x^{k+1} - \bar x\| \le \frac{L M_1}{2}\|x^k - \bar x\|^2 + \|(g'(x^k)^{-1} - J_k)g(x^k)\| \le \frac{L M_1}{2}\|x^k - \bar x\|^2 + M_2\|x^k - x^{k-1}\|\,\|g(x^k)\| \le \left[\frac{L M_1}{2}\|x^k - \bar x\| + M_0 M_2\left(\|x^{k-1} - \bar x\| + \|x^k - \bar x\|\right)\right]\|x^k - \bar x\| \le \left[\frac{L M_1}{2}\theta_0\|x^{k-1} - \bar x\| + M_0 M_2(1 + \theta_0)\|x^{k-1} - \bar x\|\right]\theta_0\|x^{k-1} - \bar x\| = \theta_0\left[\frac{L M_1}{2}\theta_0 + M_0 M_2(1 + \theta_0)\right]\|x^{k-1} - \bar x\|^2.$

Hence $x^k \to \bar x$ two-step quadratically.

(d) Again by (94) and the fact that $x^k \to \bar x$, we eventually have

$\|x^{k+1} - \bar x\| \le \frac{L M_1}{2}\|x^k - \bar x\|^2 + \|(g'(x^k)^{-1} - J_k)g(x^k)\| \le \frac{L M_1}{2}\|x^k - \bar x\|^2 + M_3\|g(x^k)\|^2 \le \left[\frac{L M_1}{2} + M_3 M_0^2\right]\|x^k - \bar x\|^2.$

Note that the conditions required of the approximations to the inverse Jacobian matrices $g'(x^k)^{-1}$ given in (i)–(iv) do not imply that $J_k \to g'(\bar x)^{-1}$. The stronger conditions
(i)' $\|g'(x^k)^{-1} - J_k\| \le \|g'(x^0)^{-1} - J_0\|$,
(ii)' $\|g'(x^{k+1})^{-1} - J_{k+1}\| \le \theta_1\|g'(x^k)^{-1} - J_k\|$ for some $\theta_1 \in (0,1)$,
(iii)' $\|g'(x^k)^{-1} - J_k\| \le \min\{M_2\|x^{k+1} - x^k\|, \|g'(x^0)^{-1} - J_0\|\}$ for some $M_2 > 0$, or
(iv)' $g'(x^k)^{-1} = J_k$,
which imply conditions (i) through (iv) of Theorem 2.1, respectively, all imply the convergence of the inverse Jacobian approximates to $g'(\bar x)^{-1}$. The conditions (i)'–(iv)' are less desirable since they require greater expense and care in the construction of the inverse Jacobian approximates.

3. Newton's Method for Minimization

We now translate the results of the previous section to the optimization setting. The underlying problem is

$\mathcal{P}:\quad \min_{x \in \mathbb{R}^n} f(x).$

The Newton-like iterations take the form

$x^{k+1} = x^k - H_k\nabla f(x^k),$

where $H_k$ is an approximation to the inverse of the Hessian matrix $\nabla^2 f(x^k)$.

Theorem 3.1. Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice continuously differentiable, $x^0 \in \mathbb{R}^n$, and $H_0 \in \mathbb{R}^{n \times n}$. Suppose that
(1) there exists $\bar x \in \mathbb{R}^n$ and $\epsilon > \|x^0 - \bar x\|$ such that $f(\bar x) \le f(x)$ whenever $\|x - \bar x\| \le \epsilon$,
(2) there is a $\delta > 0$ such that $\delta\|z\|^2 \le z^T\nabla^2 f(x)z$ for all $x \in B(\bar x; \epsilon)$ and $z \in \mathbb{R}^n$,
(3) $\nabla^2 f$ is Lipschitz continuous on $\operatorname{cl} B(\bar x; \epsilon)$ with Lipschitz constant $L$, and
(4) $\theta_0 := \frac{L}{2\delta}\|x^0 - \bar x\| + M_0 K < 1$, where $M_0 > 0$ satisfies $z^T\nabla^2 f(x)z \le M_0\|z\|^2$ for all $x \in B(\bar x; \epsilon)$ and $K \ge \|(\nabla^2 f(x^0)^{-1} - H_0)y^0\|$ with $y^0 = \nabla f(x^0)/\|\nabla f(x^0)\|$.
Further, suppose that the iteration

(97)    $x^{k+1} := x^k - H_k\nabla f(x^k)$

is initiated at $x^0$, where the $H_k$'s are chosen to satisfy one of the following conditions:
(i) $\|(\nabla^2 f(x^k)^{-1} - H_k)y^k\| \le K$,
(ii) $\|(\nabla^2 f(x^k)^{-1} - H_k)y^k\| \le \theta_1^k K$ for some $\theta_1 \in (0,1)$,
(iii) $\|(\nabla^2 f(x^k)^{-1} - H_k)y^k\| \le \min\{M_2\|x^k - x^{k-1}\|, K\}$ for some $M_2 > 0$, or
(iv) $\|(\nabla^2 f(x^k)^{-1} - H_k)y^k\| \le \min\{M_3\|\nabla f(x^k)\|, K\}$ for some $M_3 > 0$,
where for each $k = 1, 2, \dots$, $y^k := \nabla f(x^k)/\|\nabla f(x^k)\|$. These hypotheses on the accuracy of the approximations $H_k$ yield the following conclusions about the rate of convergence of the iterates $x^k$.
(a) If (i) holds, then $x^k \to \bar x$ linearly.
(b) If (ii) holds, then $x^k \to \bar x$ superlinearly.
(c) If (iii) holds, then $x^k \to \bar x$ two-step quadratically.
(d) If (iv) holds, then $x^k \to \bar x$ quadratically.
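
A minimal sketch of the pure Newton case of (97), in which $H_k = \nabla^2 f(x^k)^{-1}$, is given below; the strongly convex test function is an illustrative assumption, not one prescribed by the theorem.

```python
import numpy as np

def newton_min(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton's method for minimization, iteration (97) with H_k the inverse Hessian:
    x_{k+1} = x_k - hess(x_k)^{-1} grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        gx = grad(x)
        if np.linalg.norm(gx) <= tol:
            break
        x = x - np.linalg.solve(hess(x), gx)
    return x

# Hypothetical strongly convex test function: f(x) = x0^2 + x1^2 + exp(x0 + x1).
grad = lambda x: np.array([2*x[0] + np.exp(x[0] + x[1]),
                           2*x[1] + np.exp(x[0] + x[1])])
hess = lambda x: np.array([[2 + np.exp(x[0] + x[1]),     np.exp(x[0] + x[1])],
                           [    np.exp(x[0] + x[1]), 2 + np.exp(x[0] + x[1])]])
print(newton_min(grad, hess, x0=[1.0, 1.0]))   # the unique minimizer has x0 = x1 < 0
```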

To more fully understand the convergence behavior described in this theorem, let us examine the nature of the controlling parameters $L$, $M_0$, and $M_1$. Since $L$ is a Lipschitz constant for $\nabla^2 f$, it loosely corresponds to a bound on the third-order behavior of $f$. Thus the assumptions for convergence make implicit demands on the third derivative. The constant $\delta$ is a local lower bound on the eigenvalues of $\nabla^2 f$ near $\bar x$. That is, $f$ behaves locally as if it were a strongly convex function (see exercises) with modulus $\delta$. Finally, $M_0$ can be interpreted as a local Lipschitz constant for $\nabla f$ and only plays a role when $\nabla^2 f(x^k)^{-1}$ is approximated inexactly by the $H_k$'s.

We now illustrate the performance differences between the method of steepest descent and Newton's method on a simple one-dimensional problem. Let $f(x) = x^2 + e^x$. Clearly, $f$ is a strongly convex function with

$f'(x) = 2x + e^x, \qquad f''(x) = 2 + e^x > 2, \qquad f'''(x) = e^x.$

If we apply the steepest descent algorithm with backtracking ($\gamma = 1/2$, $c = 0.01$) initiated at $x^0 = 1$, we get a table of iterates recording $x^k$, $f(x^k)$, $f'(x^k)$, and the stepsize $s_k$, in which $|f'(x^k)|$ decreases only gradually. If we apply Newton's method from the same starting point and take a unit step at each iteration, we obtain a dramatically different table: the iterates drive $|f'(x^k)|$ to the level of machine precision within five or six iterations. This is a stunning improvement in performance and shows why one always uses Newton's method (or an approximation to it) whenever possible.

Our next objective is to develop numerically viable methods for approximating Jacobians and Hessians in Newton-like methods.

4. Matrix Secant Methods

Let us return to the problem of finding $\bar x \in \mathbb{R}^n$ such that $g(\bar x) = 0$, where $g : \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable. In this section we consider Newton-like methods of a special type. Recall that in a Newton-like method the iteration scheme takes the form

(98)    $x^{k+1} := x^k - J_k g(x^k),$

where $J_k$ is meant to approximate the inverse of $g'(x^k)$. In the one-dimensional case, a method proposed by the Babylonians 3700 years ago is of particular significance. Today we call it the secant method:

(99)    $J_k = \dfrac{x^k - x^{k-1}}{g(x^k) - g(x^{k-1})}.$
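
A short script of this comparison is easy to write. The sketch below runs steepest descent with backtracking ($\gamma = 1/2$, $c = 0.01$), Newton's method with unit steps, and, as an illustration of the secant approximation (99) just introduced, the one-dimensional secant iteration, all applied to $f'(x) = 2x + e^x = 0$ from $x^0 = 1$; the extra starting point $0.9$ for the secant run is an arbitrary choice.

```python
import math

f   = lambda x: x**2 + math.exp(x)
df  = lambda x: 2*x + math.exp(x)
d2f = lambda x: 2 + math.exp(x)

# Steepest descent with backtracking (gamma = 1/2, c = 0.01), started at x0 = 1.
x = 1.0
for k in range(10):
    d, t = -df(x), 1.0
    while f(x + t*d) > f(x) + 0.01*t*df(x)*d:   # backtrack until the Armijo condition holds
        t *= 0.5
    x += t*d
    print(f"steepest descent k={k+1}: x={x: .8f}  |f'(x)|={abs(df(x)):.1e}  t={t}")

# Newton's method with unit steps from the same starting point.
x = 1.0
for k in range(6):
    x -= df(x)/d2f(x)
    print(f"Newton           k={k+1}: x={x: .8f}  |f'(x)|={abs(df(x)):.1e}")

# The 1-D secant method (99) applied to the critical-point equation f'(x) = 0.
x_old, x = 1.0, 0.9
for k in range(6):
    if df(x) == df(x_old):                      # stop once the iterates coincide numerically
        break
    J = (x - x_old) / (df(x) - df(x_old))       # secant approximation to 1/f''(x)
    x_old, x = x, x - J*df(x)
    print(f"secant           k={k+1}: x={x: .8f}  |f'(x)|={abs(df(x)):.1e}")
```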

With this approximation one has

$g'(x^k)^{-1} - J_k = \dfrac{g(x^{k-1}) - \left[g(x^k) + g'(x^k)(x^{k-1} - x^k)\right]}{g'(x^k)\left[g(x^{k-1}) - g(x^k)\right]}.$

Near a point $\bar x$ at which $g'(\bar x) \ne 0$, one can use the MVT to show that there exists an $\alpha > 0$ such that $\alpha|x - y| \le |g(x) - g(y)|$. Consequently, by the quadratic bound lemma,

$|g'(x^k)^{-1} - J_k| \le \dfrac{(L/2)|x^{k-1} - x^k|^2}{\alpha\,|g'(x^k)|\,|x^{k-1} - x^k|} \le K|x^{k-1} - x^k|$

for some constant $K > 0$ whenever $x^k$ and $x^{k-1}$ are sufficiently close to $\bar x$. Therefore, by our convergence theorem for Newton-like methods, the secant method is locally two-step quadratically convergent to a nonsingular solution of the equation $g(x) = 0$. An additional advantage of this approach is that no extra function evaluations are required to obtain the approximation $J_k$.

4.1. Matrix Secant Methods for Equations. Unfortunately, the secant approximation (99) is meaningless if the dimension $n$ is greater than 1, since division by vectors is undefined. But this can be rectified by multiplying (99) on the right by $(g(x^k) - g(x^{k-1}))$ and writing

(100)    $J_k\left(g(x^k) - g(x^{k-1})\right) = x^k - x^{k-1}.$

Equation (100) is called the quasi-Newton equation (QNE), or matrix secant equation (MSE), at $x^k$. Here the matrix $J_k$ is unknown, but it is required to satisfy the $n$ linear equations of the MSE. These equations determine an affine manifold of dimension $n^2 - n$ in $\mathbb{R}^{n \times n}$. Since $J_k$ contains $n^2$ unknowns, the $n$ linear equations in (100) are not sufficient to uniquely determine $J_k$. To nail down a specific $J_k$, further conditions on the update $J_k$ must be given. What should these conditions be?

To develop sensible conditions on $J_k$, let us consider an overall iteration scheme based on (98). For convenience, let us denote $J_k^{-1}$ by $B_k$ (i.e. $B_k = J_k^{-1}$). Using the $B_k$'s, the MSE (100) becomes

(101)    $B_k(x^k - x^{k-1}) = g(x^k) - g(x^{k-1}).$

At every iteration we have $(x^k, B_k)$ and compute $x^{k+1}$ by (98). Then $B_{k+1}$ is constructed to satisfy (101). If $B_k$ is close to $g'(x^k)$ and $x^{k+1}$ is close to $x^k$, then $B_{k+1}$ should be chosen not only to satisfy (101) but also to be as close to $B_k$ as possible. With this in mind, we must now decide what we mean by "close". From a computational perspective, we prefer "close" to mean "easy to compute". That is, $B_{k+1}$ should be algebraically close to $B_k$ in the sense that $B_{k+1}$ is only a rank-1 modification of $B_k$. Since we are assuming that $B_{k+1}$ is a rank-1 modification of $B_k$, there are vectors $u, v \in \mathbb{R}^n$ such that

(102)    $B_{k+1} = B_k + uv^T.$

We now use the matrix secant equation (101) to derive conditions on the choice of $u$ and $v$. In this setting, the MSE becomes $B_{k+1}s^k = y^k$, where $s^k := x^{k+1} - x^k$ and $y^k := g(x^{k+1}) - g(x^k)$. Multiplying (102) by $s^k$ gives

$y^k = B_{k+1}s^k = B_k s^k + uv^Ts^k.$

Hence, if $v^Ts^k \ne 0$, we obtain

$u = \dfrac{y^k - B_k s^k}{v^Ts^k}$

and

(103)    $B_{k+1} = B_k + \dfrac{(y^k - B_k s^k)v^T}{v^Ts^k}.$

Equation (103) determines a whole class of rank-one updates that satisfy the MSE, where one is allowed to choose $v \in \mathbb{R}^n$ as long as $v^Ts^k \ne 0$. If $s^k \ne 0$, then an obvious choice for $v$ is $s^k$, yielding the update

(104)    $B_{k+1} = B_k + \dfrac{(y^k - B_k s^k)(s^k)^T}{(s^k)^Ts^k}.$
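
The update (104) transcribes directly; a sketch with randomly generated data, used only to verify that the secant equation holds after the update:

```python
import numpy as np

def broyden_update(B, s, y):
    """Broyden update (104): the rank-one modification of B satisfying
    the matrix secant equation  B_new @ s = y,  obtained by choosing v = s."""
    s = np.asarray(s, dtype=float)
    y = np.asarray(y, dtype=float)
    return B + np.outer(y - B @ s, s) / (s @ s)

# Quick check that the MSE holds after the update.
rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
s, y = rng.standard_normal(3), rng.standard_normal(3)
B_new = broyden_update(B, s, y)
print(np.allclose(B_new @ s, y))   # True
```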

This is known as Broyden's update. It turns out that the Broyden update is also analytically close.

Theorem 4.1. Let $A \in \mathbb{R}^{n \times n}$ and $s, y \in \mathbb{R}^n$ with $s \ne 0$. Then for any matrix norms $\|\cdot\|$ and $\|\cdot\|'$ such that

$\|AB\| \le \|A\|\,\|B\|' \quad\text{and}\quad \left\|\dfrac{vv^T}{v^Tv}\right\|' \le 1,$

the solution to

(105)    $\min\{\|B - A\| : Bs = y\}$

is

(106)    $A_+ = A + \dfrac{(y - As)s^T}{s^Ts}.$

In particular, (106) solves (105) when $\|\cdot\|$ is the $\ell_2$ matrix norm, and (106) solves (105) uniquely when $\|\cdot\|$ is the Frobenius norm.

Proof. Let $B \in \{B \in \mathbb{R}^{n \times n} : Bs = y\}$. Then

$\|A_+ - A\| = \left\|\dfrac{(y - As)s^T}{s^Ts}\right\| = \left\|(B - A)\dfrac{ss^T}{s^Ts}\right\| \le \|B - A\|\,\left\|\dfrac{ss^T}{s^Ts}\right\|' \le \|B - A\|.$

Note that if $\|\cdot\|' = \|\cdot\|_2$, then

$\left\|\dfrac{vv^T}{v^Tv}\right\|_2 = \sup\left\{\left\|\dfrac{vv^T}{v^Tv}x\right\|_2 : \|x\|_2 = 1\right\} = \sup\left\{\dfrac{|v^Tx|\,\|v\|_2}{v^Tv} : \|x\|_2 = 1\right\} = 1,$

so that the conclusion of the result is not vacuous. For uniqueness, observe that the Frobenius norm is strictly convex and $\|AB\|_F \le \|A\|_F\|B\|_2$.

Therefore, the Broyden update (104) is both algebraically and analytically close to $B_k$. These properties indicate that it should perform well in practice, and indeed it does.

Algorithm: Broyden's Method
Initialization: $x^0 \in \mathbb{R}^n$, $B_0 \in \mathbb{R}^{n \times n}$.
Having $(x^k, B_k)$, compute $(x^{k+1}, B_{k+1})$ as follows:
Solve $B_k s^k = -g(x^k)$ for $s^k$ and set
$x^{k+1} := x^k + s^k$
$y^k := g(x^{k+1}) - g(x^k)$
$B_{k+1} := B_k + \dfrac{(y^k - B_k s^k)(s^k)^T}{(s^k)^Ts^k}.$

We would prefer to write the Broyden update in terms of the matrices $J_k = B_k^{-1}$ so that we can write the step computation as $s^k = -J_k g(x^k)$, avoiding the need to solve an equation. To obtain the formula for $J_{k+1}$ we use the following important lemma for matrix inversion.

Lemma 4.1 (Sherman-Morrison-Woodbury). Suppose $A \in \mathbb{R}^{n \times n}$ and $U, V \in \mathbb{R}^{n \times k}$ are such that both $A^{-1}$ and $(I + V^TA^{-1}U)^{-1}$ exist. Then

$(A + UV^T)^{-1} = A^{-1} - A^{-1}U(I + V^TA^{-1}U)^{-1}V^TA^{-1}.$
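
The boxed algorithm translates line for line into the sketch below. The test system is the same illustrative one used earlier, and $B_0$ is taken to be the exact Jacobian at $x^0$ (a finite-difference estimate would serve as well).

```python
import numpy as np

def broydens_method(g, x0, B0, tol=1e-10, max_iter=100):
    """Broyden's method: solve B_k s_k = -g(x_k), step, then apply the update (104)."""
    x, B = np.asarray(x0, dtype=float), np.asarray(B0, dtype=float)
    for _ in range(max_iter):
        gx = g(x)
        if np.linalg.norm(gx) <= tol:
            break
        s = np.linalg.solve(B, -gx)
        x_new = x + s
        y = g(x_new) - gx
        B = B + np.outer(y - B @ s, s) / (s @ s)   # Broyden update (104)
        x = x_new
    return x

# Hypothetical test system: the circle intersected with the line x0 = x1.
g  = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
B0 = np.array([[2.0, 1.0], [1.0, -1.0]])           # exact Jacobian at x0 = (1, 0.5)
print(broydens_method(g, x0=[1.0, 0.5], B0=B0))    # approaches (sqrt(1/2), sqrt(1/2))
```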

The above lemma verifies that if $B_k^{-1} = J_k$ exists and $(s^k)^TJ_ky^k = (s^k)^TB_k^{-1}y^k \ne 0$, then

(107)    $J_{k+1} = \left[B_k + \dfrac{(y^k - B_ks^k)(s^k)^T}{(s^k)^Ts^k}\right]^{-1} = B_k^{-1} + \dfrac{(s^k - B_k^{-1}y^k)(s^k)^TB_k^{-1}}{(s^k)^TB_k^{-1}y^k} = J_k + \dfrac{(s^k - J_ky^k)(s^k)^TJ_k}{(s^k)^TJ_ky^k}.$

In this case, it is possible to directly update the inverses $J_k$. It should be cautioned, though, that this process can become numerically unstable if $(s^k)^TJ_ky^k$ is small. Therefore, in practice, the value $(s^k)^TJ_ky^k$ must be monitored to avoid numerical instability.

Although we do not pause to establish the convergence rates here, we do give the following result due to Dennis and Moré (1974).

Theorem 4.2. Let $g : \mathbb{R}^n \to \mathbb{R}^n$ be continuously differentiable in an open convex set $D \subset \mathbb{R}^n$. Assume that there exist $\bar x \in \mathbb{R}^n$ and $r, \beta > 0$ such that $\bar x + r\mathbb{B} \subset D$, $g(\bar x) = 0$, $g'(\bar x)^{-1}$ exists with $\|g'(\bar x)^{-1}\| \le \beta$, and $g'$ is Lipschitz continuous on $\bar x + r\mathbb{B}$ with Lipschitz constant $\gamma > 0$. Then there exist positive constants $\epsilon$ and $\delta$ such that if $\|x^0 - \bar x\| \le \epsilon$ and $\|B_0 - g'(x^0)\| \le \delta$, then the sequence $\{x^k\}$ generated by the iteration

$x^{k+1} := x^k + s^k$, where $s^k$ solves $0 = g(x^k) + B_ks^k$,
$B_{k+1} := B_k + \dfrac{(y^k - B_ks^k)(s^k)^T}{(s^k)^Ts^k}$, where $y^k = g(x^{k+1}) - g(x^k)$,

is well defined with $x^k \to \bar x$ superlinearly.

4.2. Matrix Secant Methods for Minimization. We now extend these matrix secant ideas to optimization, specifically minimization. The underlying problem we consider is

$\mathcal{P}:\quad \min_{x \in \mathbb{R}^n} f(x),$

where $f : \mathbb{R}^n \to \mathbb{R}$ is assumed to be twice continuously differentiable. In this setting, we wish to solve the equation $\nabla f(x) = 0$, and the MSE (101) becomes

(108)    $H_{k+1}y^k = s^k,$

where $s^k := x^{k+1} - x^k$ and $y^k := \nabla f(x^{k+1}) - \nabla f(x^k)$. Here the matrix $H_k$ is intended to be an approximation to the inverse of the Hessian matrix $\nabla^2 f(x^k)$. Writing $M_k = H_k^{-1}$, a straightforward application of Broyden's method gives the update

$M_{k+1} = M_k + \dfrac{(y^k - M_ks^k)(s^k)^T}{(s^k)^Ts^k}.$

However, this is unsatisfactory for two reasons:
(1) Since $M_k$ approximates $\nabla^2 f(x^k)$, it must be symmetric.
(2) Since we are minimizing, $M_k$ must be positive definite to ensure that $s^k = -M_k^{-1}\nabla f(x^k)$ is a direction of descent for $f$ at $x^k$.

To address problem (1) above, one could return to equation (103) and find an update that preserves symmetry. Such an update is uniquely obtained by setting $v = (y^k - M_ks^k)$. This is called the symmetric rank-1 update, or SR1. Although this update can on occasion exhibit problems with numerical stability, it has recently received a great deal of renewed interest. The stability problems occur whenever

(109)    $v^Ts^k = (y^k - M_ks^k)^Ts^k$

has small magnitude. The inverse SR1 update is given by

$H_{k+1} = H_k + \dfrac{(s^k - H_ky^k)(s^k - H_ky^k)^T}{(s^k - H_ky^k)^Ty^k},$

which exists whenever $(s^k - H_ky^k)^Ty^k \ne 0$.

We now approach the question of how to update $M_k$ in a way that addresses both the issue of symmetry and positive definiteness while still using the Broyden updating ideas. Given a symmetric positive definite matrix $M$ and two vectors $s$ and $y$, our goal is to find a symmetric positive definite matrix $\bar M$ such that $\bar M s = y$. Since $M$ is symmetric and positive definite, there is a nonsingular $n \times n$ matrix $L$ such that $M = LL^T$. Indeed, $L$ can be
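
Formula (107) can be checked numerically against a direct inversion of the Broyden-updated $B$; the matrices and vectors below are made-up data chosen so that $s^TJy$ is not small.

```python
import numpy as np

def inverse_broyden_update(J, s, y):
    """Inverse Broyden update (107): J_new = J + (s - J y) s^T J / (s^T J y).
    Valid (and numerically safe) only when s^T J y is not too small."""
    Jy = J @ y
    return J + np.outer(s - Jy, s @ J) / (s @ Jy)

B = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])                      # a well-conditioned B_k
J = np.linalg.inv(B)                                 # J_k = B_k^{-1}
s = np.array([1.0, 0.0, 1.0])
y = np.array([1.0, 0.0, 2.0])
B_new = B + np.outer(y - B @ s, s) / (s @ s)         # Broyden update (104)
print(np.allclose(inverse_broyden_update(J, s, y), np.linalg.inv(B_new)))  # True
```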

chosen to be the lower triangular Cholesky factor of $M$. If $\bar M$ is also symmetric and positive definite, then there is a matrix $J \in \mathbb{R}^{n \times n}$ such that $\bar M = JJ^T$. The matrix secant equation $\bar M s = y$ implies that if

(110)    $J^Ts = v,$

then

(111)    $Jv = y.$

Let us apply the Broyden update technique to (111), $J$, and $L$. That is, suppose that

(112)    $J = L + \dfrac{(y - Lv)v^T}{v^Tv}.$

Then by (110),

(113)    $v = J^Ts = L^Ts + \dfrac{v(y - Lv)^Ts}{v^Tv}.$

This expression implies that $v$ must have the form $v = \alpha L^Ts$ for some $\alpha \in \mathbb{R}$. Substituting this back into (113), we get

$\alpha L^Ts = L^Ts + \dfrac{\alpha L^Ts\,(y - \alpha LL^Ts)^Ts}{\alpha^2 s^TLL^Ts}.$

Hence

(114)    $\alpha^2 = \dfrac{s^Ty}{s^TMs}.$

Consequently, such a matrix $J$ satisfying (113) exists only if $s^Ty > 0$, in which case

$J = L + \dfrac{(y - \alpha Ms)s^TL}{\alpha s^TMs}, \quad\text{with}\quad \alpha = \left[\dfrac{s^Ty}{s^TMs}\right]^{1/2},$

yielding

(115)    $\bar M = M + \dfrac{yy^T}{y^Ts} - \dfrac{Mss^TM}{s^TMs}.$

Moreover, the Cholesky factorization of $\bar M$ can be obtained directly from the matrix $J$. Specifically, if the QR factorization of $J^T$ is $J^T = QR$, then we can set $\bar L = R^T$, yielding

$\bar M = JJ^T = R^TQ^TQR = \bar L\bar L^T.$

The formula for updating the inverses is again given by applying the Sherman-Morrison-Woodbury formula to obtain

(116)    $\bar H = H + \dfrac{\left[(s + Hy)^Ty\right]ss^T}{(s^Ty)^2} - \dfrac{Hys^T + sy^TH}{s^Ty},$

where $H = M^{-1}$. The update (115) is called the BFGS update and (116) the inverse BFGS update. The letters BFGS stand for Broyden, Fletcher, Goldfarb, and Shanno.

We have shown that, beginning with a symmetric positive definite matrix $M_k$, we can obtain a symmetric and positive definite update $M_{k+1}$ that satisfies the MSE $M_{k+1}s^k = y^k$ by applying formula (115) whenever $(s^k)^Ty^k > 0$. We must now address the question of how to choose $x^{k+1}$ so that $(s^k)^Ty^k > 0$. Recall that

$y^k = \nabla f(x^{k+1}) - \nabla f(x^k)$

and

$s^k = x^{k+1} - x^k = t_kd^k,$

where

$d^k = -H_k\nabla f(x^k)$
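
Formulas (115) and (116) are easy to transcribe and to cross-check against one another; the data below are made up, with $M$ symmetric positive definite and $s^Ty > 0$.

```python
import numpy as np

def bfgs_update(M, s, y):
    """BFGS update (115); requires s^T y > 0 to preserve positive definiteness."""
    Ms = M @ s
    return M + np.outer(y, y) / (y @ s) - np.outer(Ms, Ms) / (s @ Ms)

def inverse_bfgs_update(H, s, y):
    """Inverse BFGS update (116), obtained from (115) via Sherman-Morrison-Woodbury."""
    Hy, sy = H @ y, s @ y
    return (H + ((s + Hy) @ y) * np.outer(s, s) / sy**2
              - (np.outer(Hy, s) + np.outer(s, Hy)) / sy)

# Made-up data with M symmetric positive definite and s^T y = 1 > 0.
M = np.array([[3.0, 1.0], [1.0, 2.0]])
s = np.array([1.0, -1.0])
y = np.array([2.0, 1.0])
M_new = bfgs_update(M, s, y)
H_new = inverse_bfgs_update(np.linalg.inv(M), s, y)
print(np.allclose(M_new @ s, y))                   # the secant equation holds
print(np.allclose(H_new, np.linalg.inv(M_new)))    # (116) inverts (115)
```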

is the matrix secant search direction and $t_k$ is the stepsize. Hence

$(y^k)^Ts^k = \nabla f(x^{k+1})^Ts^k - \nabla f(x^k)^Ts^k = t_k\left(\nabla f(x^k + t_kd^k)^Td^k - \nabla f(x^k)^Td^k\right),$

where $d^k := -H_k\nabla f(x^k)$. Since $H_k$ is positive definite, the direction $d^k$ is a descent direction for $f$ at $x^k$, and so $t_k > 0$. Therefore, to ensure that $(s^k)^Ty^k > 0$, we need only show that $t_k > 0$ can be chosen so that

(117)    $\nabla f(x^k + t_kd^k)^Td^k \ge \beta\,\nabla f(x^k)^Td^k$

for some $\beta \in (0,1)$, since in this case

$\nabla f(x^k + t_kd^k)^Td^k - \nabla f(x^k)^Td^k \ge (\beta - 1)\nabla f(x^k)^Td^k > 0.$

But this is precisely the second condition in the weak Wolfe conditions with $\beta = c_2$. Hence a successful BFGS update can always be obtained. The BFGS update is currently considered the best matrix secant update for minimization.

BFGS Updating
$\sigma_k := \sqrt{(s^k)^Ty^k}$
$\hat s^k := s^k/\sigma_k$
$\hat y^k := y^k/\sigma_k$
$H_{k+1} := H_k + (\hat s^k - H_k\hat y^k)(\hat s^k)^T + \hat s^k(\hat s^k - H_k\hat y^k)^T - \left[(\hat s^k - H_k\hat y^k)^T\hat y^k\right]\hat s^k(\hat s^k)^T$
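
Putting the pieces together, here is a sketch of a full BFGS iteration that maintains $H_k$ directly via the boxed update and chooses $t_k$ by a simple bisection search for the weak Wolfe conditions. The constants c1 and c2, the initialization $H_0 = I$, and the Rosenbrock-type test function are all illustrative assumptions rather than prescriptions from the text.

```python
import numpy as np

def weak_wolfe(f, grad, x, d, c1=1e-4, c2=0.9, max_bisect=50):
    """Bisection search for a stepsize t satisfying the weak Wolfe conditions:
    f(x+t d) <= f(x) + c1 t grad(x)^T d   and   grad(x+t d)^T d >= c2 grad(x)^T d."""
    t, lo, hi = 1.0, 0.0, np.inf
    fx, gxd = f(x), grad(x) @ d
    for _ in range(max_bisect):
        if f(x + t * d) > fx + c1 * t * gxd:
            hi = t                                  # sufficient decrease fails: shrink
        elif grad(x + t * d) @ d < c2 * gxd:
            lo = t                                  # curvature condition fails: grow
        else:
            return t
        t = 0.5 * (lo + hi) if hi < np.inf else 2.0 * lo
    return t

def bfgs(f, grad, x0, tol=1e-8, max_iter=200):
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)                              # H_0: initial inverse Hessian approximation
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        d = -H @ g                                  # matrix secant search direction
        t = weak_wolfe(f, grad, x, d)
        x_new = x + t * d
        s, y = x_new - x, grad(x_new) - g
        if s @ y > 0:                               # guaranteed by the curvature condition when
            sigma = np.sqrt(s @ y)                  # the line search succeeds; skip otherwise
            sh, yh = s / sigma, y / sigma           # normalized so that sh^T yh = 1
            w = sh - H @ yh
            H = H + np.outer(w, sh) + np.outer(sh, w) - (w @ yh) * np.outer(sh, sh)
        x = x_new
    return x

# Illustrative test problem (not from the notes): a 2-D Rosenbrock-type function.
f    = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                           200*(x[1] - x[0]**2)])
print(bfgs(f, grad, x0=[-1.2, 1.0]))                # converges to the minimizer (1, 1)
```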
