Search Directions for Unconstrained Optimization
- Kristopher May
CHAPTER 8

Search Directions for Unconstrained Optimization

In this chapter we study the choice of search directions used in our basic updating scheme

    x_{k+1} = x_k + t_k d_k

for solving

    P : min_{x ∈ R^n} f(x).

All of the search directions considered can be classified as Newton-like since they are all of the form d_k = −H_k ∇f(x_k), where H_k is a symmetric n × n matrix. If H_k = μ_k I for all k, the resulting search directions are scaled steepest descent directions with scale factors μ_k. More generally, we choose H_k to approximate ∇²f(x_k)⁻¹ in order to approximate Newton's method for optimization. Newton's method is important since it possesses rapid local convergence properties, and it can be shown to be scale independent. We precede our discussion of search directions by making precise a useful notion of speed, or rate, of convergence.

1. Rate of Convergence

We focus on notions of quotient rates of convergence, or Q-convergence rates. Let {x^ν} ⊂ R^n and x̄ ∈ R^n be such that x^ν → x̄. We say that x^ν → x̄ at a linear rate if

    limsup_{ν→∞} ‖x^{ν+1} − x̄‖ / ‖x^ν − x̄‖ < 1.

The convergence is said to be superlinear if this limsup is 0. The convergence is said to be quadratic if

    limsup_{ν→∞} ‖x^{ν+1} − x̄‖ / ‖x^ν − x̄‖² < ∞.

For example, given γ ∈ (0, 1), the sequence {γ^ν} converges linearly to zero, but not superlinearly; the sequence {γ^{ν²}} converges superlinearly to 0, but not quadratically; and the sequence {γ^{2^ν}} converges quadratically to zero. Superlinear convergence is much faster than linear convergence, but quadratic convergence is much, much faster than superlinear convergence.

2. Newton's Method for Solving Equations

Newton's method is an iterative scheme designed to solve nonlinear equations of the form

    (89) g(x) = 0,

where g : R^n → R^n is assumed to be continuously differentiable. Many problems of importance can be posed in this way. In the context of the optimization problem P, we wish to locate critical points, that is, points at which ∇f(x) = 0. We begin our discussion of Newton's method in the usual context of equation solving.
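These three model rates can be checked numerically. The following sketch prints the quotients appearing in the definitions above; the value γ = 1/2 is an arbitrary choice.

```python
# Numerical check of the three model sequences from the text: for
# gamma in (0,1), gamma^nu converges linearly, gamma^(nu^2)
# superlinearly, and gamma^(2^nu) quadratically, all to 0.

gamma = 0.5

linear      = lambda nu: gamma ** nu
superlinear = lambda nu: gamma ** (nu ** 2)
quadratic   = lambda nu: gamma ** (2 ** nu)

# Linear rate: the quotient |x_{nu+1}| / |x_nu| is the constant gamma < 1.
lin_ratios  = [linear(nu + 1) / linear(nu) for nu in range(1, 6)]

# Superlinear rate: the same quotient tends to 0.
sup_ratios  = [superlinear(nu + 1) / superlinear(nu) for nu in range(1, 6)]

# Quadratic rate: the quotient |x_{nu+1}| / |x_nu|^2 stays bounded.
quad_ratios = [quadratic(nu + 1) / quadratic(nu) ** 2 for nu in range(1, 6)]

print(lin_ratios)   # constant 0.5
print(sup_ratios)   # 0.125, 0.03125, ... -> 0
print(quad_ratios)  # constant 1.0
```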
Assume that the function g in (89) is continuously differentiable and that we have an approximate solution x_0 ∈ R^n. We now wish to improve on this approximation. If x̄ is a solution to (89), then

    0 = g(x̄) = g(x_0) + g′(x_0)(x̄ − x_0) + o(‖x̄ − x_0‖).

Thus, if x_0 is close to x̄, it is reasonable to suppose that the solution to the linearized system

    (90) 0 = g(x_0) + g′(x_0)(x − x_0)

is even closer. This is Newton's method for finding the roots of the equation g(x) = 0. It has one obvious pitfall: equation (90) may not be consistent. That is, there may not exist a solution to (90).
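In one dimension, repeatedly solving the linearization gives the iteration x_{k+1} = x_k − g(x_k)/g′(x_k), which can be sketched as follows. The test equation g(x) = x² − 2, with root √2, is a hypothetical example, not taken from the text.

```python
# Newton's method in one dimension: repeatedly solve the linearization
# 0 = g(x_k) + g'(x_k)(x - x_k), i.e. x_{k+1} = x_k - g(x_k)/g'(x_k).

def newton(g, dg, x0, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        if abs(g(x)) <= tol:
            break
        x = x - g(x) / dg(x)   # Newton step
    return x

root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)  # approx 1.4142135623730951
```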
For the sake of the present argument, we assume that g′(x_0)⁻¹ exists. Under this assumption (90) defines the iteration scheme

    (91) x_{k+1} := x_k − [g′(x_k)]⁻¹ g(x_k),

called the Newton iteration. The associated direction

    (92) d_k := −[g′(x_k)]⁻¹ g(x_k)

is called the Newton direction. We analyze the convergence behavior of this scheme under the additional assumption that only an approximation to g′(x_k)⁻¹ is available. We denote this approximation by J_k. The resulting iteration scheme is

    (93) x_{k+1} := x_k − J_k g(x_k).

Methods of this type are called Newton-like methods.

Theorem 2.1. Let g : R^n → R^n be differentiable, x_0 ∈ R^n, and J_0 ∈ R^{n×n}. Suppose that there exist x̄ ∈ R^n and ε > 0 with ‖x_0 − x̄‖ < ε such that
(1) g(x̄) = 0,
(2) g′(x)⁻¹ exists for x ∈ B(x̄; ε) := {x ∈ R^n : ‖x − x̄‖ < ε} with sup{‖g′(x)⁻¹‖ : x ∈ B(x̄; ε)} ≤ M_1,
(3) g′ is Lipschitz continuous on cl B(x̄; ε) with Lipschitz constant L, and
(4) θ_0 := (L M_1 / 2) ‖x_0 − x̄‖ + M_0 K < 1, where K ≥ ‖(g′(x_0)⁻¹ − J_0) y_0‖, y_0 := g(x_0)/‖g(x_0)‖, and M_0 = max{‖g′(x)‖ : x ∈ B(x̄; ε)}.
Further suppose that iteration (93) is initiated at x_0, where the J_k's are chosen to satisfy one of the following conditions:
(i) ‖(g′(x_k)⁻¹ − J_k) y_k‖ ≤ K,
(ii) ‖(g′(x_k)⁻¹ − J_k) y_k‖ ≤ θ_1^k K for some θ_1 ∈ (0, 1),
(iii) ‖(g′(x_k)⁻¹ − J_k) y_k‖ ≤ min{M_2 ‖x_k − x_{k−1}‖, K} for some M_2 > 0, or
(iv) ‖(g′(x_k)⁻¹ − J_k) y_k‖ ≤ min{M_3 ‖g(x_k)‖, K} for some M_3 > 0,
where for each k = 1, 2, ..., y_k := g(x_k)/‖g(x_k)‖. These hypotheses on the accuracy of the approximations J_k yield the following conclusions about the rate of convergence of the iterates x_k.
(a) If (i) holds, then x_k → x̄ linearly.
(b) If (ii) holds, then x_k → x̄ superlinearly.
(c) If (iii) holds, then x_k → x̄ two-step quadratically.
(d) If (iv) holds, then x_k → x̄ quadratically.

Proof. We begin by inductively establishing the basic inequalities

    (94) ‖x_{k+1} − x̄‖ ≤ (L M_1 / 2) ‖x_k − x̄‖² + ‖(g′(x_k)⁻¹ − J_k) g(x_k)‖

and

    (95) ‖x_{k+1} − x̄‖ ≤ θ_0 ‖x_k − x̄‖,

as well as the inclusion

    (96) x_{k+1} ∈ B(x̄; ε)

for k = 0, 1, 2, ....
For k = 0 we have

    x_1 − x̄ = x_0 − x̄ − g′(x_0)⁻¹ g(x_0) + [g′(x_0)⁻¹ − J_0] g(x_0)
            = g′(x_0)⁻¹ [g(x̄) − (g(x_0) + g′(x_0)(x̄ − x_0))] + [g′(x_0)⁻¹ − J_0] g(x_0),
since g′(x_0)⁻¹ exists by the hypotheses. Consequently, hypotheses (1)–(4) plus the quadratic bound lemma imply that

    ‖x_1 − x̄‖ ≤ ‖g′(x_0)⁻¹‖ ‖g(x̄) − g(x_0) − g′(x_0)(x̄ − x_0)‖ + ‖(g′(x_0)⁻¹ − J_0) g(x_0)‖
              ≤ (M_1 L / 2) ‖x_0 − x̄‖² + K ‖g(x_0) − g(x̄)‖
              ≤ [(M_1 L / 2) ‖x_0 − x̄‖ + M_0 K] ‖x_0 − x̄‖
              = θ_0 ‖x_0 − x̄‖ < ε,

whereby (94)–(96) are established for k = 0.

Next suppose that (94)–(96) hold for k = 0, 1, ..., s − 1. We show that (94)–(96) hold at k = s. Since x_s ∈ B(x̄; ε), hypotheses (2)–(4) hold at x_s, and one can proceed exactly as in the case k = 0 to obtain (94). Now if any one of (i)–(iv) holds, then (i) holds. Thus, by (94), we find that

    ‖x_{s+1} − x̄‖ ≤ (M_1 L / 2) ‖x_s − x̄‖² + ‖(g′(x_s)⁻¹ − J_s) g(x_s)‖
                  ≤ [(M_1 L / 2) ‖x_s − x̄‖ + M_0 K] ‖x_s − x̄‖
                  ≤ [(M_1 L / 2) θ_0^s ‖x_0 − x̄‖ + M_0 K] ‖x_s − x̄‖
                  ≤ θ_0 ‖x_s − x̄‖.

Hence ‖x_{s+1} − x̄‖ ≤ θ_0 ‖x_s − x̄‖ ≤ θ_0 ε < ε, and so x_{s+1} ∈ B(x̄; ε).

We now proceed to establish (a)–(d).
(a) This clearly holds since the induction above established that ‖x_{k+1} − x̄‖ ≤ θ_0 ‖x_k − x̄‖.
(b) From (94), we have

    ‖x_{k+1} − x̄‖ ≤ (L M_1 / 2) ‖x_k − x̄‖² + ‖(g′(x_k)⁻¹ − J_k) g(x_k)‖
                  ≤ (L M_1 / 2) ‖x_k − x̄‖² + θ_1^k K ‖g(x_k)‖
                  ≤ [(L M_1 / 2) θ_0^k ‖x_0 − x̄‖ + θ_1^k M_0 K] ‖x_k − x̄‖.

Since θ_0, θ_1 ∈ (0, 1), the bracketed term tends to 0, and hence x_k → x̄ superlinearly.
(c) From (94) and the fact that x_k → x̄, we eventually have

    ‖x_{k+1} − x̄‖ ≤ (L M_1 / 2) ‖x_k − x̄‖² + ‖(g′(x_k)⁻¹ − J_k) g(x_k)‖
                  ≤ (L M_1 / 2) ‖x_k − x̄‖² + M_2 ‖x_k − x_{k−1}‖ ‖g(x_k)‖
                  ≤ [(L M_1 / 2) ‖x_k − x̄‖ + M_0 M_2 (‖x_k − x̄‖ + ‖x̄ − x_{k−1}‖)] ‖x_k − x̄‖
                  ≤ [(L M_1 / 2) θ_0 ‖x_{k−1} − x̄‖ + M_0 M_2 (1 + θ_0) ‖x_{k−1} − x̄‖] θ_0 ‖x_{k−1} − x̄‖
                  = θ_0 [(L M_1 / 2) θ_0 + M_0 M_2 (1 + θ_0)] ‖x_{k−1} − x̄‖².

Hence x_k → x̄ two-step quadratically.
(d) Again by (94) and the fact that x_k → x̄, we eventually have

    ‖x_{k+1} − x̄‖ ≤ (L M_1 / 2) ‖x_k − x̄‖² + ‖(g′(x_k)⁻¹ − J_k) g(x_k)‖
                  ≤ (L M_1 / 2) ‖x_k − x̄‖² + M_3 ‖g(x_k)‖²
                  ≤ [(L M_1 / 2) + M_3 M_0²] ‖x_k − x̄‖².

Note that the conditions required of the approximations to the inverse Jacobian matrices g′(x_k)⁻¹ given in (i)–(iv) do not imply that J_k → g′(x̄)⁻¹. The stronger conditions
(i)′ ‖g′(x_k)⁻¹ − J_k‖ ≤ ‖g′(x_0)⁻¹ − J_0‖,
(ii)′ ‖g′(x_{k+1})⁻¹ − J_{k+1}‖ ≤ θ_1 ‖g′(x_k)⁻¹ − J_k‖ for some θ_1 ∈ (0, 1),
(iii)′ ‖g′(x_k)⁻¹ − J_k‖ ≤ min{M_2 ‖x_k − x_{k−1}‖, ‖g′(x_0)⁻¹ − J_0‖} for some M_2 > 0, or
(iv)′ g′(x_k)⁻¹ = J_k,
which imply conditions (i) through (iv) of Theorem 2.1 respectively, all imply the convergence of the inverse Jacobian approximates to g′(x̄)⁻¹. The conditions (i)′–(iv)′ are less desirable since they require greater expense and care in the construction of the inverse Jacobian approximates.

3. Newton's Method for Minimization

We now translate the results of the previous section to the optimization setting. The underlying problem is

    P : min_{x ∈ R^n} f(x).

The Newton-like iterations take the form

    x_{k+1} = x_k − H_k ∇f(x_k),

where H_k is an approximation to the inverse of the Hessian matrix ∇²f(x_k).

Theorem 3.1. Let f : R^n → R be twice continuously differentiable, x_0 ∈ R^n, and H_0 ∈ R^{n×n}. Suppose that
(1) there exist x̄ ∈ R^n and ε > ‖x_0 − x̄‖ such that f(x̄) ≤ f(x) whenever ‖x − x̄‖ ≤ ε,
(2) there is a δ > 0 such that δ ‖z‖² ≤ z^T ∇²f(x) z for all x ∈ B(x̄; ε),
(3) ∇²f is Lipschitz continuous on cl B(x̄; ε) with Lipschitz constant L, and
(4) θ_0 := (L / 2δ) ‖x_0 − x̄‖ + M_0 K < 1, where M_0 > 0 satisfies z^T ∇²f(x) z ≤ M_0 ‖z‖² for all x ∈ B(x̄; ε) and K ≥ ‖(∇²f(x_0)⁻¹ − H_0) y_0‖ with y_0 = ∇f(x_0)/‖∇f(x_0)‖.
Further, suppose that the iteration

    (97) x_{k+1} := x_k − H_k ∇f(x_k)

is initiated at x_0, where the H_k's are chosen to satisfy one of the following conditions:
(i) ‖(∇²f(x_k)⁻¹ − H_k) y_k‖ ≤ K,
(ii) ‖(∇²f(x_k)⁻¹ − H_k) y_k‖ ≤ θ_1^k K for some θ_1 ∈ (0, 1),
(iii) ‖(∇²f(x_k)⁻¹ − H_k) y_k‖ ≤ min{M_2 ‖x_k − x_{k−1}‖, K} for some M_2 > 0, or
(iv) ‖(∇²f(x_k)⁻¹ − H_k) y_k‖ ≤ min{M_3 ‖∇f(x_k)‖, K} for some M_3 > 0,
where for each k = 1, 2, ..., y_k := ∇f(x_k)/‖∇f(x_k)‖.
These hypotheses on the accuracy of the approximations H_k yield the following conclusions about the rate of convergence of the iterates x_k.
(a) If (i) holds, then x_k → x̄ linearly.
(b) If (ii) holds, then x_k → x̄ superlinearly.
(c) If (iii) holds, then x_k → x̄ two-step quadratically.
(d) If (iv) holds, then x_k → x̄ quadratically.
To more fully understand the convergence behavior described in this theorem, let us examine the nature of the controlling parameters L, δ, and M_0. Since L is a Lipschitz constant for ∇²f, it loosely corresponds to a bound on the third-order behavior of f. Thus the assumptions for convergence make implicit demands on the third derivative. The constant δ is a local lower bound on the eigenvalues of ∇²f near x̄. That is, f behaves locally as if it were a strongly convex function (see exercises) with modulus δ. Finally, M_0 can be interpreted as a local Lipschitz constant for ∇f and only plays a role when ∇²f(x_k)⁻¹ is approximated inexactly by the H_k's.

We now illustrate the performance differences between the method of steepest descent and Newton's method on a simple one-dimensional problem. Let f(x) = x² + eˣ. Clearly, f is a strongly convex function with

    f′(x) = 2x + eˣ,  f″(x) = 2 + eˣ > 2,  f‴(x) = eˣ.

If we apply the steepest descent algorithm with backtracking (γ = 1/2, c = 0.01) initiated at x_0 = 1, the gradient values |f′(x_k)| decrease only slowly. [Table of iterates x_k, f(x_k), f′(x_k), s_k omitted.] If we apply Newton's method from the same starting point and take a unit step at each iteration, we obtain a dramatically different table: the values |f′(x_k)| collapse toward zero after only a few iterations. [Table of iterates x_k, |f′(x_k)| omitted.] In addition, one more iteration gives |f′(x_5)| smaller still. This is a stunning improvement in performance and shows why one always uses Newton's method (or an approximation to it) whenever possible. Our next objective is to develop numerically viable methods for approximating Jacobians and Hessians in Newton-like methods.

4. Matrix Secant Methods

Let us return to the problem of finding x̄ ∈ R^n such that g(x̄) = 0, where g : R^n → R^n is continuously differentiable. In this section we consider Newton-like methods of a special type. Recall that in a Newton-like method the iteration scheme takes the form

    (98) x_{k+1} := x_k − J_k g(x_k),

where J_k is meant to approximate the inverse of g′(x_k).
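The one-dimensional comparison above can be reproduced with a short script. This is a sketch only: the backtracking constants γ = 1/2 and c = 0.01 match the text, but the iteration counts are arbitrary choices.

```python
import math

# Steepest descent with backtracking vs. pure Newton on f(x) = x^2 + e^x,
# both started at x0 = 1, as in the one-dimensional example of Section 3.

f   = lambda x: x * x + math.exp(x)
df  = lambda x: 2.0 * x + math.exp(x)    # f'(x)
d2f = lambda x: 2.0 + math.exp(x)        # f''(x) > 2

def steepest_descent(x, iters=10, gamma=0.5, c=0.01):
    for _ in range(iters):
        d = -df(x)                                      # steepest descent direction
        t = 1.0
        while f(x + t * d) > f(x) + c * t * df(x) * d:  # backtracking (Armijo)
            t *= gamma
        x += t * d
    return x

def newton(x, iters=5):
    for _ in range(iters):
        x -= df(x) / d2f(x)                             # unit-step Newton
    return x

x_sd, x_nt = steepest_descent(1.0), newton(1.0)
print(abs(df(x_sd)), abs(df(x_nt)))  # Newton's gradient is dramatically smaller
```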
In the one-dimensional case, a method proposed by the Babylonians 3700 years ago is of particular significance. Today we call it the secant method:

    (99) J_k = (x_k − x_{k−1}) / (g(x_k) − g(x_{k−1})).
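A minimal sketch of this scalar secant iteration follows; the derivative in Newton's method is replaced by the difference quotient (99). The test equation g(x) = x² − 2, with root √2, is a hypothetical example.

```python
# The scalar secant method: x_{k+1} = x_k - J_k g(x_k) with
# J_k = (x_k - x_{k-1}) / (g(x_k) - g(x_{k-1})) approximating 1/g'(x_k).

def secant(g, x_prev, x, tol=1e-12, max_iter=60):
    for _ in range(max_iter):
        if abs(g(x)) <= tol:
            break
        J = (x - x_prev) / (g(x) - g(x_prev))  # secant approximation to 1/g'(x_k)
        x_prev, x = x, x - J * g(x)            # x_{k+1} = x_k - J_k g(x_k)
    return x

root = secant(lambda x: x * x - 2.0, x_prev=1.0, x=2.0)
print(root)  # approx 1.4142135623730951
```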
With this approximation one has

    g′(x_k)⁻¹ − J_k = [g(x_{k−1}) − (g(x_k) + g′(x_k)(x_{k−1} − x_k))] / [g′(x_k)(g(x_{k−1}) − g(x_k))].

Near a point x̄ at which g′(x̄) ≠ 0, one can use the mean value theorem to show that there exists an α > 0 such that α|x − y| ≤ |g(x) − g(y)| for x and y near x̄. Consequently, by the quadratic bound lemma,

    |g′(x_k)⁻¹ − J_k| ≤ (L/2)|x_{k−1} − x_k|² / (α |g′(x_k)| |x_{k−1} − x_k|) ≤ K |x_{k−1} − x_k|

for some constant K > 0 whenever x_k and x_{k−1} are sufficiently close to x̄. Therefore, by our convergence theorem for Newton-like methods, the secant method is locally two-step quadratically convergent to a nonsingular solution of the equation g(x) = 0. An additional advantage of this approach is that no extra function evaluations are required to obtain the approximation J_k.

4.1. Matrix Secant Methods for Equations. Unfortunately, the secant approximation (99) is meaningless if the dimension n is greater than 1, since division by vectors is undefined. But this can be rectified by multiplying (99) on the right by (g(x_k) − g(x_{k−1})) and writing

    (100) J_k (g(x_k) − g(x_{k−1})) = x_k − x_{k−1}.

Equation (100) is called the quasi-Newton equation (QNE), or matrix secant equation (MSE), at x_k. Here the matrix J_k is unknown, but it is required to satisfy the n linear equations of the MSE. Since J_k contains n² unknowns, the n linear equations in (100) are not sufficient to uniquely determine J_k; they determine an affine manifold of dimension n² − n in R^{n×n}. To nail down a specific J_k, further conditions on the update J_k must be given. What conditions should these be?

To develop sensible conditions on J_k, let us consider an overall iteration scheme based on (98). For convenience, let us denote J_k⁻¹ by B_k (i.e., B_k = J_k⁻¹). Using the B_k's, the MSE (100) becomes

    (101) B_k (x_k − x_{k−1}) = g(x_k) − g(x_{k−1}).

At every iteration we have (x_k, B_k) and compute x_{k+1} by (98). Then B_{k+1} is constructed to satisfy (101). If B_k is close to g′(x_k) and x_{k+1} is close to x_k, then B_{k+1} should be chosen not only to satisfy (101) but also to be as close to B_k as possible.
With this in mind, we must now decide what we mean by close. From a computational perspective, we prefer close to mean easy to compute. That is, B_{k+1} should be algebraically close to B_k in the sense that B_{k+1} is only a rank-one modification of B_k. Since we are assuming that B_{k+1} is a rank-one modification of B_k, there are vectors u, v ∈ R^n such that

    (102) B_{k+1} = B_k + u v^T.

We now use the matrix secant equation (101) to derive conditions on the choice of u and v. In this setting, the MSE becomes B_{k+1} s_k = y_k, where s_k := x_{k+1} − x_k and y_k := g(x_{k+1}) − g(x_k). Multiplying (102) on the right by s_k gives

    y_k = B_{k+1} s_k = B_k s_k + u v^T s_k.

Hence, if v^T s_k ≠ 0, we obtain u = (y_k − B_k s_k)/(v^T s_k) and

    (103) B_{k+1} = B_k + (y_k − B_k s_k) v^T / (v^T s_k).

Equation (103) determines a whole class of rank-one updates that satisfy the MSE, where one is allowed to choose v ∈ R^n as long as v^T s_k ≠ 0. If s_k ≠ 0, then an obvious choice for v is s_k, yielding the update

    (104) B_{k+1} = B_k + (y_k − B_k s_k) s_k^T / (s_k^T s_k).
This is known as Broyden's update. It turns out that the Broyden update is also analytically close.

Theorem 4.1. Let A ∈ R^{n×n} and s, y ∈ R^n with s ≠ 0. Then for any pair of matrix norms ‖·‖ and ‖·‖′ such that

    ‖AB‖ ≤ ‖A‖ ‖B‖′  and  ‖v v^T / (v^T v)‖′ ≤ 1,

the solution to

    (105) min{‖B − A‖ : Bs = y}

is

    (106) A₊ = A + (y − As) s^T / (s^T s).

In particular, (106) solves (105) when ‖·‖ is the ℓ₂ matrix norm, and (106) solves (105) uniquely when ‖·‖ is the Frobenius norm.

Proof. Let B ∈ {B ∈ R^{n×n} : Bs = y}. Then

    ‖A₊ − A‖ = ‖(y − As) s^T / (s^T s)‖ = ‖(B − A) s s^T / (s^T s)‖ ≤ ‖B − A‖ ‖s s^T / (s^T s)‖′ ≤ ‖B − A‖.

Note that if ‖·‖′ is the ℓ₂ matrix norm, then

    ‖v v^T / (v^T v)‖′ = sup{‖v (v^T x)‖₂ / (v^T v) : ‖x‖₂ = 1} = sup{|v^T x| ‖v‖₂ / (v^T v) : ‖x‖₂ = 1} = 1,

so that the conclusion of the result is not vacuous. For uniqueness, observe that the Frobenius norm is strictly convex and the constraint set {B : Bs = y} is convex, so the minimizer in (105) is unique.

Therefore, the Broyden update (104) is both algebraically and analytically close to B_k. These properties indicate that it should perform well in practice, and indeed it does.

Algorithm: Broyden's Method
Initialization: x_0 ∈ R^n, B_0 ∈ R^{n×n}.
Having (x_k, B_k), compute (x_{k+1}, B_{k+1}) as follows:
    Solve B_k s_k = −g(x_k) for s_k and set
        x_{k+1} := x_k + s_k
        y_k := g(x_{k+1}) − g(x_k)
        B_{k+1} := B_k + (y_k − B_k s_k) s_k^T / (s_k^T s_k).

We would prefer to write the Broyden update in terms of the matrices J_k = B_k⁻¹ so that we can write the step computation as s_k = −J_k g(x_k), avoiding the need to solve an equation. To obtain the formula for J_{k+1} we use the following important lemma on matrix inversion.

Lemma 4.1 (Sherman–Morrison–Woodbury). Suppose A ∈ R^{n×n} and U, V ∈ R^{n×k} are such that both A⁻¹ and (I + V^T A⁻¹ U)⁻¹ exist. Then

    (A + U V^T)⁻¹ = A⁻¹ − A⁻¹ U (I + V^T A⁻¹ U)⁻¹ V^T A⁻¹.
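The algorithm boxed above can be sketched with NumPy. The 2×2 test system (a circle of radius 2 meeting the line x₁ = x₂ at (√2, √2)) is a hypothetical example, and B₀ is initialized to the true Jacobian at x₀, consistent with the requirement that B₀ be close to g′(x₀).

```python
import numpy as np

# Broyden's method: solve B_k s_k = -g(x_k), step to x_{k+1} = x_k + s_k,
# then apply the rank-one update B_{k+1} = B_k + (y_k - B_k s_k) s_k^T / (s_k^T s_k).

def g(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]])

def jac(x):  # used only to initialize B_0 close to g'(x_0)
    return np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

def broyden(g, x0, B0, tol=1e-10, max_iter=50):
    x, B = np.asarray(x0, dtype=float), np.asarray(B0, dtype=float)
    for _ in range(max_iter):
        gx = g(x)
        if np.linalg.norm(gx) <= tol:
            break
        s = np.linalg.solve(B, -gx)               # B_k s_k = -g(x_k)
        y = g(x + s) - gx                         # y_k = g(x_{k+1}) - g(x_k)
        B = B + np.outer(y - B @ s, s) / (s @ s)  # Broyden update
        x = x + s
    return x

x0 = np.array([1.0, 2.0])
x_star = broyden(g, x0, B0=jac(x0))
print(x_star)  # approx [1.41421356, 1.41421356]
```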
The above lemma verifies that if B_k⁻¹ = J_k exists and s_k^T J_k y_k = s_k^T B_k⁻¹ y_k ≠ 0, then

    (107) J_{k+1} = [B_k + (y_k − B_k s_k) s_k^T / (s_k^T s_k)]⁻¹
                  = B_k⁻¹ + (s_k − B_k⁻¹ y_k) s_k^T B_k⁻¹ / (s_k^T B_k⁻¹ y_k)
                  = J_k + (s_k − J_k y_k) s_k^T J_k / (s_k^T J_k y_k).

In this case, it is possible to directly update the inverses J_k. It should be cautioned, though, that this process can become numerically unstable if |s_k^T J_k y_k| is small. Therefore, in practice, the value s_k^T J_k y_k must be monitored to avoid numerical instability. Although we do not pause to establish the convergence rates here, we do give the following result due to Dennis and Moré (1974).

Theorem 4.2. Let g : R^n → R^n be continuously differentiable in an open convex set D ⊂ R^n. Assume that there exist x̄ ∈ R^n and r, β > 0 such that x̄ + rB ⊂ D (B the closed unit ball), g(x̄) = 0, g′(x̄)⁻¹ exists with ‖g′(x̄)⁻¹‖ ≤ β, and g′ is Lipschitz continuous on x̄ + rB with Lipschitz constant γ > 0. Then there exist positive constants ε and δ such that if ‖x_0 − x̄‖ ≤ ε and ‖B_0 − g′(x_0)‖ ≤ δ, then the sequence {x_k} generated by the iteration

    x_{k+1} := x_k + s_k, where s_k solves 0 = g(x_k) + B_k s_k,
    B_{k+1} := B_k + (y_k − B_k s_k) s_k^T / (s_k^T s_k), where y_k = g(x_{k+1}) − g(x_k),

is well defined with x_k → x̄ superlinearly.

4.2. Matrix Secant Methods for Minimization. We now extend these matrix secant ideas to optimization, specifically minimization. The underlying problem we consider is

    P : minimize_{x ∈ R^n} f(x),

where f : R^n → R is assumed to be twice continuously differentiable. In this setting, we wish to solve the equation ∇f(x) = 0, and the MSE (101) becomes

    (108) H_{k+1} y_k = s_k,

where s_k := x_{k+1} − x_k and y_k := ∇f(x_{k+1}) − ∇f(x_k). Here the matrix H_k is intended to be an approximation to the inverse of the Hessian matrix ∇²f(x_k). Writing M_k = H_k⁻¹, a straightforward application of Broyden's method gives the update

    M_{k+1} = M_k + (y_k − M_k s_k) s_k^T / (s_k^T s_k).

However, this is unsatisfactory for two reasons:
(1) Since M_k approximates ∇²f(x_k), it must be symmetric.
(2) Since we are minimizing, M_k must be positive definite to ensure that s_k = −M_k⁻¹ ∇f(x_k) is a direction of descent for f at x_k.

To address problem (1) above, one could return to equation (103) and find an update that preserves symmetry. Such an update is uniquely obtained by setting v = (y_k − M_k s_k), giving

    M_{k+1} = M_k + (y_k − M_k s_k)(y_k − M_k s_k)^T / ((y_k − M_k s_k)^T s_k).

This is called the symmetric rank-one update, or SR1. Although this update can on occasion exhibit problems with numerical stability, it has recently received a great deal of renewed interest. The stability problems occur whenever

    (109) v^T s_k = (y_k − M_k s_k)^T s_k

has small magnitude. The inverse SR1 update is given by

    H_{k+1} = H_k + (s_k − H_k y_k)(s_k − H_k y_k)^T / ((s_k − H_k y_k)^T y_k),

which exists whenever (s_k − H_k y_k)^T y_k ≠ 0.

We now approach the question of how to update M_k in a way that addresses both the issue of symmetry and positive definiteness while still using the Broyden updating ideas. Given a symmetric positive definite matrix M and two vectors s and y, our goal is to find a symmetric positive definite matrix M̄ such that M̄ s = y. Since M is symmetric and positive definite, there is a nonsingular n × n matrix L such that M = L L^T. Indeed, L can be
chosen to be the lower triangular Cholesky factor of M. If M̄ is also symmetric and positive definite, then there is a matrix J ∈ R^{n×n} such that M̄ = J J^T. The MSE M̄ s = y implies that if

    (110) J^T s = v,

then

    (111) J v = y.

Let us apply the Broyden update technique to (111), J, and L. That is, suppose that

    (112) J = L + (y − L v) v^T / (v^T v).

Then by (110),

    (113) v = J^T s = L^T s + v (y − L v)^T s / (v^T v).

This expression implies that v must have the form v = α L^T s for some α ∈ R. Substituting this back into (113), we get

    α L^T s = L^T s + α L^T s (y − α L L^T s)^T s / (α² s^T L L^T s).

Hence

    (114) α = [s^T y / (s^T M s)]^{1/2}.

Consequently, such a matrix J satisfying (113) exists only if s^T y > 0, in which case

    J = L + (y − α M s) s^T L / (α s^T M s), with α = [s^T y / (s^T M s)]^{1/2},

yielding

    (115) M̄ = M + y y^T / (y^T s) − M s s^T M / (s^T M s).

Moreover, the Cholesky factorization for M̄ can be obtained directly from the matrix J. Specifically, if the QR factorization of J^T is J^T = Q R, we can set L̄ = R^T, yielding M̄ = J J^T = R^T Q^T Q R = L̄ L̄^T. The formula for updating the inverses is again given by applying the Sherman–Morrison–Woodbury formula to obtain

    (116) H̄ = H + [(s + H y)^T y] s s^T / (s^T y)² − (H y s^T + s y^T H) / (s^T y),

where H = M⁻¹. The update (115) is called the BFGS update and (116) the inverse BFGS update. The letters BFGS stand for Broyden, Fletcher, Goldfarb, and Shanno.

We have shown that, beginning with a symmetric positive definite matrix M_k, we can obtain a symmetric positive definite update M_{k+1} satisfying the MSE M_{k+1} s_k = y_k by applying formula (115) whenever s_k^T y_k > 0. We must now address the question of how to choose x_{k+1} so that s_k^T y_k > 0. Recall that

    y_k = ∇f(x_{k+1}) − ∇f(x_k)  and  s_k = x_{k+1} − x_k = t_k d_k,

where d_k = −H_k ∇f(x_k)
is the matrix secant search direction and t_k is the stepsize. Hence

    y_k^T s_k = ∇f(x_{k+1})^T s_k − ∇f(x_k)^T s_k = t_k (∇f(x_k + t_k d_k)^T d_k − ∇f(x_k)^T d_k),

where d_k := −H_k ∇f(x_k). Since H_k is positive definite, the direction d_k is a descent direction for f at x_k, and so t_k > 0. Therefore, to ensure that s_k^T y_k > 0, we need only show that t_k > 0 can be chosen so that

    (117) ∇f(x_k + t_k d_k)^T d_k ≥ β ∇f(x_k)^T d_k

for some β ∈ (0, 1), since in this case

    ∇f(x_k + t_k d_k)^T d_k − ∇f(x_k)^T d_k ≥ (β − 1) ∇f(x_k)^T d_k > 0.

But this is precisely the second condition in the weak Wolfe conditions with β = c₂. Hence a successful BFGS update can always be obtained. The BFGS update is currently considered the best matrix secant update for minimization.

BFGS Updating
    σ_k := (s_k^T y_k)^{1/2}
    ŝ_k := s_k / σ_k
    ŷ_k := y_k / σ_k
    H_{k+1} := H_k + (ŝ_k − H_k ŷ_k)(ŝ_k)^T + ŝ_k (ŝ_k − H_k ŷ_k)^T − [(ŝ_k − H_k ŷ_k)^T ŷ_k] ŝ_k (ŝ_k)^T
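The inverse BFGS update (116) can be written as a standalone function and checked numerically: the updated matrix satisfies the matrix secant equation H_{k+1} y = s, remains symmetric, and remains positive definite when s^T y > 0. The vectors s, y and the diagonal model Hessian A below are hypothetical test data, not from the text.

```python
import numpy as np

# Inverse BFGS update (116):
# H_{k+1} = H + ((s + H y)^T y) s s^T / (s^T y)^2 - (H y s^T + s y^T H) / (s^T y).

def inverse_bfgs_update(H, s, y):
    sy = s @ y
    if sy <= 0.0:
        raise ValueError("curvature condition s^T y > 0 violated")
    Hy = H @ y
    return (H + ((s + Hy) @ y) * np.outer(s, s) / sy ** 2
              - (np.outer(Hy, s) + np.outer(s, Hy)) / sy)

rng = np.random.default_rng(0)
H = np.eye(3)                    # current inverse-Hessian approximation
s = rng.standard_normal(3)       # step s_k
A = np.diag([1.0, 2.0, 5.0])     # model Hessian (positive definite)
y = A @ s                        # then s^T y = s^T A s > 0

H_new = inverse_bfgs_update(H, s, y)
print(np.allclose(H_new @ y, s))              # MSE (108): H_{k+1} y = s
print(np.allclose(H_new, H_new.T))            # symmetry preserved
print(np.all(np.linalg.eigvalsh(H_new) > 0))  # positive definiteness preserved
```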
More informationAlgorithms for Constrained Optimization
1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic
More informationE5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization
E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained
More informationUnconstrained Multivariate Optimization
Unconstrained Multivariate Optimization Multivariate optimization means optimization of a scalar function of a several variables: and has the general form: y = () min ( ) where () is a nonlinear scalar-valued
More informationOn fast trust region methods for quadratic models with linear constraints. M.J.D. Powell
DAMTP 2014/NA02 On fast trust region methods for quadratic models with linear constraints M.J.D. Powell Abstract: Quadratic models Q k (x), x R n, of the objective function F (x), x R n, are used by many
More informationLine Search Methods for Unconstrained Optimisation
Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic
More informationA derivative-free nonmonotone line search and its application to the spectral residual method
IMA Journal of Numerical Analysis (2009) 29, 814 825 doi:10.1093/imanum/drn019 Advance Access publication on November 14, 2008 A derivative-free nonmonotone line search and its application to the spectral
More informationNewton s Method. Ryan Tibshirani Convex Optimization /36-725
Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x
More informationStatistics 580 Optimization Methods
Statistics 580 Optimization Methods Introduction Let fx be a given real-valued function on R p. The general optimization problem is to find an x ɛ R p at which fx attain a maximum or a minimum. It is of
More informationJim Lambers MAT 419/519 Summer Session Lecture 11 Notes
Jim Lambers MAT 49/59 Summer Session 20-2 Lecture Notes These notes correspond to Section 34 in the text Broyden s Method One of the drawbacks of using Newton s Method to solve a system of nonlinear equations
More informationBindel, Spring 2016 Numerical Analysis (CS 4220) Notes for
Life beyond Newton Notes for 2016-04-08 Newton s method has many attractive properties, particularly when we combine it with a globalization strategy. Unfortunately, Newton steps are not cheap. At each
More informationCubic regularization in symmetric rank-1 quasi-newton methods
Math. Prog. Comp. (2018) 10:457 486 https://doi.org/10.1007/s12532-018-0136-7 FULL LENGTH PAPER Cubic regularization in symmetric rank-1 quasi-newton methods Hande Y. Benson 1 David F. Shanno 2 Received:
More informationLecture 14: October 17
1-725/36-725: Convex Optimization Fall 218 Lecture 14: October 17 Lecturer: Lecturer: Ryan Tibshirani Scribes: Pengsheng Guo, Xian Zhou Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:
More information1. Introduction. We analyze a trust region version of Newton s method for the optimization problem
SIAM J. OPTIM. Vol. 9, No. 4, pp. 1100 1127 c 1999 Society for Industrial and Applied Mathematics NEWTON S METHOD FOR LARGE BOUND-CONSTRAINED OPTIMIZATION PROBLEMS CHIH-JEN LIN AND JORGE J. MORÉ To John
More informationRandomized Quasi-Newton Updates are Linearly Convergent Matrix Inversion Algorithms
Randomized Quasi-Newton Updates are Linearly Convergent Matrix Inversion Algorithms Robert M. Gower and Peter Richtárik School of Mathematics University of Edinburgh United Kingdom ebruary 4, 06 Abstract
More informationarxiv: v3 [math.na] 23 Mar 2016
Randomized Quasi-Newton Updates are Linearly Convergent Matrix Inversion Algorithms Robert M. Gower and Peter Richtárik arxiv:602.0768v3 [math.na] 23 Mar 206 School of Mathematics University of Edinburgh
More informationHYBRID RUNGE-KUTTA AND QUASI-NEWTON METHODS FOR UNCONSTRAINED NONLINEAR OPTIMIZATION. Darin Griffin Mohr. An Abstract
HYBRID RUNGE-KUTTA AND QUASI-NEWTON METHODS FOR UNCONSTRAINED NONLINEAR OPTIMIZATION by Darin Griffin Mohr An Abstract Of a thesis submitted in partial fulfillment of the requirements for the Doctor of
More informationNewton s Method. Javier Peña Convex Optimization /36-725
Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and
More informationPreconditioned conjugate gradient algorithms with column scaling
Proceedings of the 47th IEEE Conference on Decision and Control Cancun, Mexico, Dec. 9-11, 28 Preconditioned conjugate gradient algorithms with column scaling R. Pytla Institute of Automatic Control and
More information1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by:
Newton s Method Suppose we want to solve: (P:) min f (x) At x = x, f (x) can be approximated by: n x R. f (x) h(x) := f ( x)+ f ( x) T (x x)+ (x x) t H ( x)(x x), 2 which is the quadratic Taylor expansion
More informationLecture 7 Unconstrained nonlinear programming
Lecture 7 Unconstrained nonlinear programming Weinan E 1,2 and Tiejun Li 2 1 Department of Mathematics, Princeton University, weinan@princeton.edu 2 School of Mathematical Sciences, Peking University,
More informationA COMBINED CLASS OF SELF-SCALING AND MODIFIED QUASI-NEWTON METHODS
A COMBINED CLASS OF SELF-SCALING AND MODIFIED QUASI-NEWTON METHODS MEHIDDIN AL-BAALI AND HUMAID KHALFAN Abstract. Techniques for obtaining safely positive definite Hessian approximations with selfscaling
More informationScientific Computing: An Introductory Survey
Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted
More informationScientific Computing: An Introductory Survey
Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted
More informationAM 205: lecture 18. Last time: optimization methods Today: conditions for optimality
AM 205: lecture 18 Last time: optimization methods Today: conditions for optimality Existence of Global Minimum For example: f (x, y) = x 2 + y 2 is coercive on R 2 (global min. at (0, 0)) f (x) = x 3
More informationUnconstrained minimization of smooth functions
Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and
More informationMath and Numerical Methods Review
Math and Numerical Methods Review Michael Caracotsios, Ph.D. Clinical Associate Professor Chemical Engineering Department University of Illinois at Chicago Introduction In the study of chemical engineering
More informationProgramming, numerics and optimization
Programming, numerics and optimization Lecture C-3: Unconstrained optimization II Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428
More informationMATH 3795 Lecture 13. Numerical Solution of Nonlinear Equations in R N.
MATH 3795 Lecture 13. Numerical Solution of Nonlinear Equations in R N. Dmitriy Leykekhman Fall 2008 Goals Learn about different methods for the solution of F (x) = 0, their advantages and disadvantages.
More informationHandling Nonpositive Curvature in a Limited Memory Steepest Descent Method
Handling Nonpositive Curvature in a Limited Memory Steepest Descent Method Fran E. Curtis and Wei Guo Department of Industrial and Systems Engineering, Lehigh University, USA COR@L Technical Report 14T-011-R1
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More informationWritten Examination
Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes
More informationStochastic Quasi-Newton Methods
Stochastic Quasi-Newton Methods Donald Goldfarb Department of IEOR Columbia University UCLA Distinguished Lecture Series May 17-19, 2016 1 / 35 Outline Stochastic Approximation Stochastic Gradient Descent
More informationMath 273a: Optimization Netwon s methods
Math 273a: Optimization Netwon s methods Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 some material taken from Chong-Zak, 4th Ed. Main features of Newton s method Uses both first derivatives
More informationOptimization 2. CS5240 Theoretical Foundations in Multimedia. Leow Wee Kheng
Optimization 2 CS5240 Theoretical Foundations in Multimedia Leow Wee Kheng Department of Computer Science School of Computing National University of Singapore Leow Wee Kheng (NUS) Optimization 2 1 / 38
More informationNONSMOOTH VARIANTS OF POWELL S BFGS CONVERGENCE THEOREM
NONSMOOTH VARIANTS OF POWELL S BFGS CONVERGENCE THEOREM JIAYI GUO AND A.S. LEWIS Abstract. The popular BFGS quasi-newton minimization algorithm under reasonable conditions converges globally on smooth
More informationFIXED POINT ITERATIONS
FIXED POINT ITERATIONS MARKUS GRASMAIR 1. Fixed Point Iteration for Non-linear Equations Our goal is the solution of an equation (1) F (x) = 0, where F : R n R n is a continuous vector valued mapping in
More informationA new ane scaling interior point algorithm for nonlinear optimization subject to linear equality and inequality constraints
Journal of Computational and Applied Mathematics 161 (003) 1 5 www.elsevier.com/locate/cam A new ane scaling interior point algorithm for nonlinear optimization subject to linear equality and inequality
More informationMath 411 Preliminaries
Math 411 Preliminaries Provide a list of preliminary vocabulary and concepts Preliminary Basic Netwon s method, Taylor series expansion (for single and multiple variables), Eigenvalue, Eigenvector, Vector
More informationNonlinear Optimization for Optimal Control
Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]
More informationImproved Damped Quasi-Newton Methods for Unconstrained Optimization
Improved Damped Quasi-Newton Methods for Unconstrained Optimization Mehiddin Al-Baali and Lucio Grandinetti August 2015 Abstract Recently, Al-Baali (2014) has extended the damped-technique in the modified
More informationOptimization. Charles J. Geyer School of Statistics University of Minnesota. Stat 8054 Lecture Notes
Optimization Charles J. Geyer School of Statistics University of Minnesota Stat 8054 Lecture Notes 1 One-Dimensional Optimization Look at a graph. Grid search. 2 One-Dimensional Zero Finding Zero finding
More informationOptimization II: Unconstrained Multivariable
Optimization II: Unconstrained Multivariable CS 205A: Mathematical Methods for Robotics, Vision, and Graphics Justin Solomon CS 205A: Mathematical Methods Optimization II: Unconstrained Multivariable 1
More informationA Distributed Newton Method for Network Utility Maximization, II: Convergence
A Distributed Newton Method for Network Utility Maximization, II: Convergence Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract The existing distributed algorithms for Network Utility
More informationNumerical Algorithms as Dynamical Systems
A Study on Numerical Algorithms as Dynamical Systems Moody Chu North Carolina State University What This Study Is About? To recast many numerical algorithms as special dynamical systems, whence to derive
More informationOn the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method
Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical
More informationResearch Article A Two-Step Matrix-Free Secant Method for Solving Large-Scale Systems of Nonlinear Equations
Applied Mathematics Volume 2012, Article ID 348654, 9 pages doi:10.1155/2012/348654 Research Article A Two-Step Matrix-Free Secant Method for Solving Large-Scale Systems of Nonlinear Equations M. Y. Waziri,
More informationOutline. Scientific Computing: An Introductory Survey. Optimization. Optimization Problems. Examples: Optimization Problems
Outline Scientific Computing: An Introductory Survey Chapter 6 Optimization 1 Prof. Michael. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction
More informationON THE CONNECTION BETWEEN THE CONJUGATE GRADIENT METHOD AND QUASI-NEWTON METHODS ON QUADRATIC PROBLEMS
ON THE CONNECTION BETWEEN THE CONJUGATE GRADIENT METHOD AND QUASI-NEWTON METHODS ON QUADRATIC PROBLEMS Anders FORSGREN Tove ODLAND Technical Report TRITA-MAT-203-OS-03 Department of Mathematics KTH Royal
More informationSub-Sampled Newton Methods I: Globally Convergent Algorithms
Sub-Sampled Newton Methods I: Globally Convergent Algorithms arxiv:1601.04737v3 [math.oc] 26 Feb 2016 Farbod Roosta-Khorasani February 29, 2016 Abstract Michael W. Mahoney Large scale optimization problems
More informationc 2007 Society for Industrial and Applied Mathematics
SIAM J. OPTIM. Vol. 18, No. 1, pp. 106 13 c 007 Society for Industrial and Applied Mathematics APPROXIMATE GAUSS NEWTON METHODS FOR NONLINEAR LEAST SQUARES PROBLEMS S. GRATTON, A. S. LAWLESS, AND N. K.
More informationOptimization Tutorial 1. Basic Gradient Descent
E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.
More information