UNIVERSITY OF CALIFORNIA, SAN DIEGO

Reduced Hessian Quasi-Newton Methods for Optimization

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Mathematics

by

Michael Wallace Leonard

Committee in charge:

    Professor Philip E. Gill, Chair
    Professor Randolph E. Bank
    Professor James R. Bunch
    Professor Scott B. Baden
    Professor Pao C. Chau

1995

Copyright © 1995 Michael Wallace Leonard. All rights reserved.

The dissertation of Michael Wallace Leonard is approved, and it is acceptable in quality and form for publication on microfilm:

    Professor Philip E. Gill, Chair

University of California, San Diego

1995

This dissertation is dedicated to my mother and father.

Contents

Signature Page
Dedication
Table of Contents
List of Tables
Preface
Acknowledgements
Curriculum Vita
Abstract

1 Introduction to Unconstrained Optimization
    1.1 Newton's method
    1.2 Quasi-Newton methods
        1.2.1 Minimizing strictly convex quadratic functions
        1.2.2 Minimizing convex objective functions
    1.3 Computation of the search direction
        1.3.1 Notation
        1.3.2 Using Cholesky factors
        1.3.3 Using conjugate-direction matrices
    1.4 Transformed and reduced Hessians

2 Reduced-Hessian Methods for Unconstrained Optimization
    2.1 Fenelon's reduced-Hessian BFGS method
        2.1.1 The Gram-Schmidt process
        2.1.2 The BFGS update to R_Z
    2.2 Reduced inverse Hessian methods
    2.3 An extension of Fenelon's method
    2.4 The effective approximate Hessian
    2.5 Lingering on a subspace
        2.5.1 Updating Z when p = p_r
        2.5.2 Calculating s_Z and y_Z^ɛ
        2.5.3 The form of R_Z when using the BFGS update
        2.5.4 Updating R_Z after the computation of p
        2.5.5 The Broyden update to R_Z

        2.5.6 A reduced-Hessian algorithm with lingering

3 Rescaling Reduced Hessians
    Self-scaling variable metric methods
    Rescaling conjugate-direction matrices
        Definition of p
        Rescaling V
        The conjugate-direction rescaling algorithm
        Convergence properties
    Extending Algorithm RH
        Reinitializing the approximate curvature
        Numerical results
    Rescaling combined with lingering
        Numerical results
        Algorithm RHRL applied to a quadratic

4 Equivalence of Reduced-Hessian and Conjugate-Direction Rescaling
    A search-direction basis for range(V_1)
    A transformed Hessian associated with B
    How rescaling V affects Ū^T B Ū
    The proof of equivalence

5 Reduced-Hessian Methods for Large-Scale Unconstrained Optimization
    Large-scale quasi-Newton methods
    Extending Algorithm RH to large problems
        Imposing a storage limit
        The deletion procedure
        The computation of T
        The updates to ḡ_Z and R_Z
    Gradient-based reduced-Hessian algorithms
        Quadratic termination
        Replacing g with p
        Numerical results
        Algorithm RHR-L-P applied to quadratics

6 Reduced-Hessian Methods for Linearly-Constrained Problems
    Linearly constrained optimization
    A dynamic null-space method for LEP
    Numerical results

Bibliography

List of Tables

2.1  Alternate methods for computing Z
     Alternate values for σ
     Test Problems from Moré et al.
     Results for Algorithm RHR using R1, R4 and R
     Results for Algorithm RHRL on problems
     Results for Algorithm RHRL on problems
     Comparing p from CG and Algorithm RH-L-G on quadratics
     Iterations/Functions for RHR-L-G (m = 5)
     Iterations/Functions for RHR-L-P (m = 5)
     Results for RHR-L-P using R3–R5 (m = 5) on Set #
     Results for RHR-L-P using R3–R5 (m = 5) on Set #
     RHR-L-P using different m with R
     RHR-L-P (R4) for m ranging from 2 to n
     Results for RHR-L-P and L-BFGS-B (m = 5) on Set #
     Results for RHR-L-P and L-BFGS-B (m = 5) on Set #
     Results for LEPs (m_L = 5, δ = 10^{−10}, ‖N^T g‖ ≤ 10^{−6})
     Results for LEPs (m_L = 8, δ = 10^{−10}, ‖N^T g‖ ≤ 10^{−6})

Preface

This thesis consists of seven chapters and a bibliography. Each chapter starts with a review of the literature and proceeds to new material developed by the author under the direction of the Chair of the dissertation committee. All lemmas, theorems, corollaries and algorithms are those of the author unless otherwise stated.

Problems from all areas of science and engineering can be posed as optimization problems. An optimization problem involves a set of independent variables, and often includes constraints or restrictions that define acceptable values of the variables. The solution of an optimization problem is a set of allowed values of the variables for which some objective function achieves its maximum or minimum value. The class of model-based methods forms quadratic approximations of optimization problems using first and sometimes second derivatives of the objective and constraint functions.

If no constraints are present, an optimization problem is said to be unconstrained. The formulation of effective methods for the unconstrained case is the first step towards defining methods for constrained optimization. The unconstrained optimization problem is considered in Chapters 1–5. Methods for problems with linear equality constraints are considered in Chapter 6.

Chapter 1 opens with a discussion of Newton's method for unconstrained optimization. Newton's method is a model-based method that requires both first and second derivatives. In Section 1.2 we move on to quasi-Newton methods, which are intended for the situation when the provision of analytic second derivatives is inconvenient or impossible. Quasi-Newton methods use only first derivatives to build up an approximate Hessian over a number of iterations.

At each iteration of a quasi-Newton method, the approximate Hessian is altered to incorporate new curvature information. This process, which is known as an update, involves the addition of a low-rank matrix (usually of rank one or rank two). This thesis will be concerned with a class of rank-two updates known as the Broyden class. The most important member of this class is the so-called Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula.

In Chapter 2 we consider quasi-Newton methods from a completely different point of view. Quasi-Newton methods that employ updates from the Broyden class are known to accumulate approximate curvature in a sequence of expanding subspaces. It follows that the search direction can be defined using matrices of smaller dimension than the approximate Hessian. In exact arithmetic these so-called reduced Hessians generate the same iterates as the standard quasi-Newton methods. This result is the basis for all of the new algorithms defined in this thesis. Reduced-Hessian and reduced inverse Hessian methods are considered in Sections 2.1 and 2.2 respectively. In Section 2.3 we propose Algorithm RH, which is the template algorithm for this thesis. In Section 2.5 this algorithm is generalized to include a lingering scheme (Algorithm RHL) that allows the iterates to be restricted to certain low-dimensional manifolds.

In practice, the choice of initial approximate Hessian can greatly influence the performance of quasi-Newton methods. In the absence of exact second-derivative information, the approximate Hessian is often initialized to the identity matrix. Several authors have observed that a poor choice of initial approximate Hessian can lead to inefficiencies, especially if the Hessian itself is ill-conditioned. These inefficiencies can lead to a large number of function evaluations in some cases.

Rescaling techniques are intended to address this difficulty and are the subject of Chapter 3. The rescaling methods of Oren and Luenberger [39], Siegel [45] and Lalee and Nocedal [27] are discussed. In particular, the conjugate-direction rescaling method of Siegel (Algorithm CDR), which is also a variant of the BFGS method, is described in some detail. Algorithm CDR (page 48) has been shown to be effective in solving ill-conditioned problems. Algorithm CDR has notable similarities to reduced-Hessian methods, and two new rescaling algorithms follow naturally from the interpretation of Algorithm CDR as a reduced-Hessian method. These algorithms are derived in Sections 3.3 and 3.4. The first (Algorithm RHR) is a modification of Algorithm RH; the second (Algorithm RHRL) is derived from Algorithm RHL. Numerical results are given for both algorithms. Moreover, under certain conditions Algorithm RHRL is shown to converge in a finite number of iterations when applied to a class of quadratic problems. This property, often termed quadratic termination, can be numerically beneficial for quasi-Newton methods.

In Chapter 4, it is shown that if Algorithm RHRL is used in conjunction with a particular rescaling technique of Siegel [45], then it is equivalent to Algorithm CDR in exact arithmetic. Chapter 4 is mostly technical in nature and may be skipped without loss of continuity. However, the convergence results given in Section 4.4 should be reviewed before passing to Chapter 5.

If the problem has many independent variables, it may not be practical to store the Hessian matrix or an approximate Hessian. In Chapter 5, methods for solving large unconstrained problems are reviewed. Conjugate-gradient (CG) methods require storage for only a few vectors and can be used in the large-scale case. However, CG methods can require a large number of iterations relative to the problem size and can be prohibitively expensive in terms of function evaluations.

In an effort to accelerate CG methods, several authors have proposed limited-memory and reduced-Hessian quasi-Newton methods. The limited-memory algorithm of Nocedal [35], the successive affine reduction method of Nazareth [34], the reduced-Hessian method of Fenelon [14] and reduced inverse-Hessian methods due to Siegel [46] are reviewed.

In Chapter 5, new reduced-Hessian rescaling algorithms are derived as extensions of Algorithms RH and RHR. These algorithms (Algorithms RHR-L-G and RHR-L-P) employ the rescaling method of Algorithm RHR. Algorithm RHR-L-P shares features of the methods of Fenelon, Nazareth and Siegel. However, the inclusion of rescaling is demonstrated numerically to be essential for efficiency. Moreover, Algorithm RHR-L-P is shown to enjoy the property of quadratic termination, which is shown to be beneficial when the algorithm is applied to general functions.

Chapter 6 considers the minimization of a function subject to linear equality constraints. Two algorithms (Algorithms RH-LEP and RHR-LEP) extend reduced-Hessian methods to problems with linear constraints. Numerical results are given comparing Algorithm RHR-LEP with a standard method for solving linearly constrained problems.

In summary, a total of seven new reduced-Hessian algorithms are proposed.

Algorithm RH (p. 28)  The algorithm template.
Algorithm RHL (p. 41)  Uses a lingering scheme that constrains the iterates to remain on a manifold.
Algorithm RHR (p. 52)  Rescales when approximate curvature is obtained in a new subspace.

Algorithm RHRL (p. 56)  Exploits the special form of the reduced Hessian resulting from the lingering strategy. This special form allows rescaling on larger subspaces.
Algorithm RHR-L-G (p. 95)  A gradient-based method with rescaling for large-scale optimization.
Algorithm RHR-L-P (p. 95)  A direction-based method with rescaling for large-scale optimization. This algorithm converges in a finite number of iterations when applied to a quadratic function.
Algorithm RHR-LEP (p. 123)  A reduced-Hessian rescaling method for linear equality-constrained problems.

Acknowledgements

I am pleased to acknowledge my advisor, Professor Philip E. Gill. I became interested in doing research while I was a student in the Master of Arts program, but writing a dissertation seemed an unlikely task. However, Professor Gill thought that I had the right stuff. He has helped me hurdle many obstacles, not the least of which was transferring into the Ph.D. program. He introduced me to a very interesting and rewarding problem in numerical optimization. He also supported me as a Research Assistant for several summers and during my last quarter as a graduate student.

I would like to express my gratitude to Professors James R. Bunch, Randolph E. Bank, Scott B. Baden and Pao C. Chao, all of whom served on my thesis committee. My thanks also to Professors Maria E. Ong and Donald R. Smith from whom I learned much in my capacity as a teaching assistant. My special thanks to Professor Carl H. Fitzgerald. His training inspired in me a much deeper appreciation of mathematics and is the basis of my technical knowledge.

My family has always prompted me towards further education. I want to thank my mother and father, my stepmother Maggie and my brother Clif for their encouragement and support while I have been a graduate student. I also want to express my appreciation to all of my friends who have been supportive while I worked on this thesis. My climbing friends Scott Marshall, Michael Smith, Fred Weening and Jeff Gee listened to my ranting and raving and always encouraged me. My friends in the department, Jerome Braunstein, Scott Crass, Sam Eldersveld, Ricardo Fierro, Richard LeBorne, Ned Lucia, Joe Shinnerl, Mark Stankus, Tuan Nguyen and others were all inspirational, informative and helpful.

Vita

1982    Appointed U.C. Regents Scholar. University of California, Santa Barbara
1985    B.S., Mathematical Sciences, Highest Honors. University of California, Santa Barbara
1985    B.S., Mechanical Engineering, Highest Honors. University of California, Santa Barbara
        Associate Engineering Scientist. McDonnell-Douglas Astronautics Corporation
        High School Mathematics Teacher. Vista Unified School District
1988    Mathematics Single Subject Teaching Credential. University of California, San Diego
1991    M.A., Applied Mathematics. University of California, San Diego
        Adjunct Mathematics Instructor. Mesa Community College
        Teaching Assistant. Department of Mathematics, University of California, San Diego
1993    C.Phil., Mathematics. University of California, San Diego
1995    Research Assistant. Department of Mathematics, University of California, San Diego
1995    Ph.D., Mathematics. University of California, San Diego

Major Fields of Study

Major Field: Mathematics

Studies in Numerical Optimization. Professor Philip E. Gill
Studies in Numerical Analysis. Professors Randolph E. Bank, James R. Bunch, Philip E. Gill and Donald R. Smith
Studies in Complex Analysis. Professor Carl H. Fitzgerald
Studies in Applied Algebra. Professors Jeffrey B. Remmel and Adriano M. Garsia

Abstract of the Dissertation

Reduced Hessian Quasi-Newton Methods for Optimization

by

Michael Wallace Leonard

Doctor of Philosophy in Mathematics

University of California, San Diego, 1995

Professor Philip E. Gill, Chair

Many methods for optimization are variants of Newton's method, which requires the specification of the Hessian matrix of second derivatives. Quasi-Newton methods are intended for the situation where the Hessian is expensive or difficult to calculate. Quasi-Newton methods use only first derivatives to build an approximate Hessian over a number of iterations. This approximation is updated each iteration by a matrix of low rank. This thesis is concerned with the Broyden class of updates, with emphasis on the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update.

Updates from the Broyden class accumulate approximate curvature in a sequence of expanding subspaces. This allows the approximate Hessians to be represented in compact form using smaller reduced approximate Hessians. These reduced matrices offer computational advantages when the objective function is highly nonlinear or the number of variables is large.

Although the initial approximate Hessian is arbitrary, some choices may cause quasi-Newton methods to fail on highly nonlinear functions. In this case, rescaling can be used to decrease inefficiencies resulting from a poor initial approximate Hessian. Reduced-Hessian methods facilitate a trivial rescaling that implicitly changes the initial curvature as iterations proceed. Methods of this type are shown to have global and superlinear convergence.

Moreover, numerical results indicate that this rescaling is effective in practice.

In the large-scale case, so-called limited-storage reduced-Hessian methods offer advantages over conjugate-gradient methods, with only slightly increased memory requirements. We propose two limited-storage methods that utilize rescaling, one of which can be shown to terminate on quadratics. Numerical results suggest that the method is effective compared with other state-of-the-art limited-storage methods.

Finally, we extend reduced-Hessian methods to problems with linear equality constraints. These methods are the first step towards reduced-Hessian methods for the important class of nonlinearly constrained problems.

Chapter 1

Introduction to Unconstrained Optimization

Problems from all areas of science and engineering can be posed as optimization problems. An optimization problem involves a set of independent variables, and often includes constraints or restrictions that define acceptable values of the variables. The solution of an optimization problem is a set of allowed values of the variables for which some objective function achieves its maximum or minimum value. The class of model-based methods forms quadratic approximations of optimization problems using first and sometimes second derivatives of the objective and constraint functions.

Consider the unconstrained optimization problem

    minimize_{x ∈ IR^n}  f(x),    (1.1)

where f : IR^n → IR is twice-continuously differentiable. Since maximizing f can be achieved by minimizing −f, it suffices to consider only minimization. When no constraints are present, the problem of minimizing f is often called unconstrained optimization. When linear constraints are present, the minimization problem is called linearly-constrained optimization. The unconstrained optimization problem is introduced in the next section.

Linearly constrained optimization is introduced in Chapter 6. Nonlinearly constrained optimization is not considered. However, much of the work given here applies to solving subproblems that might arise in the course of solving nonlinearly constrained problems.

1.1 Newton's method

A local minimizer x* of (1.1) satisfies f(x*) ≤ f(x) for all x in some open neighborhood of x*. The necessary optimality conditions at x* are ∇f(x*) = 0 and ∇²f(x*) ≥ 0, where ∇²f(x*) ≥ 0 means that the Hessian of f at x* is positive semi-definite. Sufficient conditions for a point x* to be a local minimizer are ∇f(x*) = 0 and ∇²f(x*) > 0, where ∇²f(x*) > 0 means that the Hessian of f at x* is positive definite. Since ∇f(x*) = 0, many methods for solving (1.1) attempt to drive the gradient to zero. The methods considered here are iterative and generate search directions by minimizing quadratic approximations to f. In what follows, let x_k denote the kth iterate and p_k the kth search direction.

Newton's method for solving (1.1) minimizes a quadratic model of f each iteration. The function q_k^N(x) given by

    q_k^N(x) = f(x_k) + ∇f(x_k)^T (x − x_k) + ½ (x − x_k)^T ∇²f(x_k) (x − x_k),    (1.2)

is a second-order Taylor-series approximation to f at the point x_k. If ∇²f(x_k) > 0, then q_k^N(x) has a unique minimizer, corresponding to the point at which ∇q_k^N(x) vanishes.

This point is taken as the new estimate x_{k+1} of x*. If the substitution p = x − x_k is made in (1.2), then the resulting quadratic model

    q_k^N(p) = f(x_k) + ∇f(x_k)^T p + ½ p^T ∇²f(x_k) p    (1.3)

can be minimized with respect to p for a search direction p_k. If ∇²f(x_k) > 0, then the vector p_k such that

    ∇q_k^N(p_k) = ∇²f(x_k) p_k + ∇f(x_k) = 0

minimizes q_k^N(p). The new iterate is defined as x_{k+1} = x_k + p_k. This leads to the definition of Newton's method given below.

Algorithm 1.1. Newton's method
    Initialize k = 0 and choose x_0.
    while not converged do
        Solve ∇²f(x_k) p = −∇f(x_k) for p_k.
        x_{k+1} = x_k + p_k.
        k ← k + 1
    end do

We now summarize the convergence properties of Newton's method. It is important to note that the method seeks points at which the gradient vanishes and has no particular affinity for minimizers. In the following theorem we will let x̄ denote a point such that ∇f(x̄) = 0.

Theorem 1.1 Let f : IR^n → IR be a twice-continuously differentiable mapping defined in an open set D, and assume that ∇f(x̄) = 0 for some x̄ ∈ D and that ∇²f(x̄) is nonsingular. Then there is an open set S such that for any x_0 ∈ S the Newton iterates are well defined, remain in S, and converge to x̄.

Proof. See Moré and Sorenson [30].
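To make the iteration concrete, the following Python sketch implements the Newton iteration of Algorithm 1.1 under the assumption that callables returning the gradient and Hessian are available. The convergence test, tolerances and the small quadratic test problem are illustrative placeholders, not part of the thesis.

    import numpy as np

    def newton(grad, hess, x0, tol=1e-8, max_iter=100):
        # Pure Newton iteration (Algorithm 1.1): unit steps, no line search.
        # grad(x) and hess(x) are caller-supplied; tol stands in for the
        # unspecified "not converged" test.
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) <= tol:
                break
            p = np.linalg.solve(hess(x), -g)   # solve grad^2 f(x_k) p = -grad f(x_k)
            x = x + p                          # x_{k+1} = x_k + p_k
        return x

    # On a strictly convex quadratic, a single Newton step reaches the minimizer.
    H = np.array([[3.0, 1.0], [1.0, 2.0]])
    c = np.array([1.0, 1.0])
    x_min = newton(lambda x: H @ x + c, lambda x: H, x0=np.zeros(2))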

The rate or order of convergence of a sequence of iterates is as important as its convergence. If a sequence {x_k} converges to x̄ and

    ‖x_{k+1} − x̄‖ ≤ C ‖x_k − x̄‖^p    (1.4)

for some positive constant C, then {x_k} is said to converge with order p. The special cases of p = 1 and p = 2 correspond to linear and quadratic convergence respectively. In the case of linear convergence, the constant C must satisfy C ∈ (0, 1). Note that if C is close to 1, linear convergence can be unsatisfactory. For example, if C = 0.9 and ‖x_k − x̄‖ = 0.1, then roughly 21 iterations may be required to attain ‖x_k − x̄‖ = 0.01.

A sequence {x_k} that converges to x̄ and satisfies

    ‖x_{k+1} − x̄‖ ≤ β_k ‖x_k − x̄‖,

for some sequence {β_k} that converges to zero, is said to converge superlinearly. Note that a sequence that converges superlinearly also converges linearly. Moreover, a sequence that converges quadratically converges superlinearly. In this sense, superlinear convergence can be considered a middle ground between linear and quadratic convergence.

We now state order of convergence results for Newton's method (for proofs of these results, see Moré and Sorenson [30]). If f satisfies the conditions of Theorem 1.1, the iterates converge to x̄ superlinearly. Moreover, if the Hessian is Lipschitz continuous at x̄, i.e.,

    ‖∇²f(x) − ∇²f(x̄)‖ ≤ κ ‖x − x̄‖    (κ > 0),    (1.5)

then {x_k} converges quadratically. These asymptotic rates of convergence of Newton's method are the benchmark for all other methods that use only first and second derivatives of f.

Note that since x* satisfies ∇f(x*) = 0, these results hold also for minimizers.

If x_0 is far from x*, Newton's method can have several deficiencies. Consider first when ∇²f(x_k) is positive definite. In this case, p_k is a descent direction satisfying ∇f(x_k)^T p_k < 0. However, since the quadratic model q_k^N is only a local approximation of f, it is possible that f(x_k + p_k) > f(x_k). This problem is alleviated by redefining x_{k+1} = x_k + α_k p_k, where α_k is a positive step length. If p_k^T ∇f(x_k) < 0, then the existence of ᾱ > 0 such that α_k ∈ (0, ᾱ) implies f(x_{k+1}) < f(x_k) is guaranteed (see Fletcher [15]). The specific value of α_k is computed using a line search algorithm that approximately minimizes the univariate function f(x_k + αp_k). As a result of the line search, the iterates satisfy f(x_{k+1}) < f(x_k) for all k, which is the defining property associated with all descent methods. This thesis is concerned mainly with descent methods that use a line search.

Another problem with Algorithm 1.1 arises when ∇²f(x_k) is indefinite or singular. In this case, p_k may be undefined, non-uniquely defined, or a non-descent direction. This drawback has been successfully overcome by both modified Newton methods and trust-region methods. Modified Newton methods replace ∇²f(x_k) with a positive-definite approximation whenever the former is indefinite or singular (see Gill et al. [22] for details). Trust-region methods minimize the quadratic model (1.3) in some small region surrounding x_k (see Moré and Sorenson [13] for further details).

Any Newton method requires the definition of O(n²) second derivatives associated with the Hessian. In some cases, for example when f is the solution to a differential or integral equation, it may be inconvenient or expensive to define the Hessian.
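As a simple illustration of the line-search idea discussed above, the sketch below implements Armijo backtracking, a basic scheme that shrinks the step until a sufficient-decrease condition holds. It is not the line search used in the thesis, which imposes the Wolfe conditions introduced in the next section; the function names and constants are illustrative only.

    import numpy as np

    def backtracking_step(f, grad, x, p, alpha0=1.0, nu=1e-4, shrink=0.5):
        # Shrink alpha until f(x + alpha*p) <= f(x) + nu*alpha*grad(x)^T p.
        # p must be a descent direction, i.e. grad(x)^T p < 0.
        fx, gTp = f(x), grad(x) @ p
        assert gTp < 0, "p is not a descent direction"
        alpha = alpha0
        while f(x + alpha * p) > fx + nu * alpha * gTp:
            alpha *= shrink
        return alpha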

In the next section, quasi-Newton methods are introduced that solve the unconstrained problem (1.1) using only gradient information.

1.2 Quasi-Newton methods

The idea of approximating the Hessian with a symmetric positive-definite matrix was first introduced in Davidon's 1959 paper, Variable metric methods for minimization [9]. If B_k denotes an approximate Hessian, then the quadratic model q_k^N is replaced by

    q_k(x) = f(x_k) + ∇f(x_k)^T (x − x_k) + ½ (x − x_k)^T B_k (x − x_k).    (1.6)

In this case, p_k is the solution of the subproblem

    minimize_{p ∈ IR^n}  f(x_k) + ∇f(x_k)^T p + ½ p^T B_k p.    (1.7)

Since B_k is positive definite, p_k satisfies

    B_k p_k = −∇f(x_k)    (1.8)

and p_k is guaranteed to be a descent direction. Approximate second-derivative information obtained in moving from x_k to x_{k+1} is incorporated into B_{k+1} using an update to B_k. Hence, a general quasi-Newton method takes the form given in Algorithm 1.2 below.

Algorithm 1.2. Quasi-Newton method
    Initialize k = 0; choose x_0 and B_0;
    while not converged do
        Solve B_k p_k = −∇f(x_k);
        Compute α_k, and set x_{k+1} = x_k + α_k p_k;

        Compute B_{k+1} by applying an update to B_k;
        k ← k + 1;
    end do

It remains to discuss the form of the update to B_k and the choice of α_k. Define s_k = x_{k+1} − x_k, g_k = ∇f(x_k) and y_k = g_{k+1} − g_k. The definition of x_{k+1} implies that s_k satisfies

    s_k = α_k p_k.    (1.9)

This relationship will be used throughout this thesis.

The curvature of f along s_k at a point x_k is defined as s_k^T ∇²f(x_k) s_k. The gradient of f can be expanded about x_k to give

    g_{k+1} = ∇f(x_k + s_k) = g_k + ( ∫_0^1 ∇²f(x_k + ξ s_k) dξ ) s_k.

It follows from the definition of y_k that

    s_k^T ∇²f(x_k) s_k ≈ s_k^T y_k.    (1.10)

The quantity s_k^T y_k is called the approximate curvature of f at x_k along s_k. Next, we present a class of low-rank changes to B_k that ensure

    s_k^T B_{k+1} s_k = s_k^T y_k,    (1.11)

so that B_{k+1} incorporates the correct approximate curvature. The well-known Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula defined by

    B_{k+1} = B_k − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k) + (y_k y_k^T)/(s_k^T y_k)    (1.12)

is easily shown to satisfy (1.11). An implementation of Algorithm 1.2 using the BFGS update will be called a BFGS method.
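For reference, here is a minimal NumPy sketch of the BFGS formula (1.12), followed by a quick check that the updated matrix satisfies the curvature condition (1.11). The data and names are illustrative only.

    import numpy as np

    def bfgs_update(B, s, y):
        # B_{k+1} = B - (B s s^T B)/(s^T B s) + (y y^T)/(s^T y); see (1.12).
        Bs = B @ s
        return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)

    # The updated matrix reproduces the approximate curvature, as in (1.11).
    B = np.eye(3)
    s, y = np.array([1.0, 0.0, 2.0]), np.array([0.5, 0.1, 1.0])
    assert np.isclose(s @ bfgs_update(B, s, y) @ s, s @ y)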

The Davidon-Fletcher-Powell (DFP) formula is defined by

    B_{k+1} = B_k + (1 + (s_k^T B_k s_k)/(s_k^T y_k)) (y_k y_k^T)/(s_k^T y_k) − (y_k s_k^T B_k + B_k s_k y_k^T)/(s_k^T y_k).    (1.13)

An implementation of Algorithm 1.2 using the DFP update will be called a DFP method.

The approximate Hessians of the so-called Broyden class are defined by the formulae

    B_{k+1} = B_k − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k) + (y_k y_k^T)/(s_k^T y_k) + φ_k (s_k^T B_k s_k) w_k w_k^T,    (1.14)

where

    w_k = y_k/(s_k^T y_k) − (B_k s_k)/(s_k^T B_k s_k),

and φ_k is a scalar parameter. Note that the BFGS and DFP formulae correspond to the choices φ_k = 0 and φ_k = 1. The convex class of updates is a subclass of the Broyden updates for which φ_k ∈ [0, 1] for all k. The updates from the convex class satisfy (1.11) since they are all elements of the Broyden class.

Several results follow immediately from the definition of the updates in the Broyden class. First, formulae in the Broyden class apply at most rank-two updates to B_k. Second, updates in the Broyden class are such that B_{k+1} is symmetric as long as B_k is symmetric. Third, if B_k is positive definite and φ_k is properly chosen (e.g., any φ_k ≥ 0 is acceptable (see Fletcher [16])), then B_{k+1} is positive definite if and only if s_k^T y_k > 0.

In unconstrained optimization, the value of α_k can ensure that s_k^T y_k > 0. In particular, s_k^T y_k is positive if α_k satisfies the Wolfe [48] conditions

    f(x_k + α_k p_k) ≤ f(x_k) + ν α_k g_k^T p_k   and   g_{k+1}^T p_k ≥ η g_k^T p_k,    (1.15)

where 0 < ν < 1/2 and ν ≤ η < 1. The existence of such α_k is guaranteed if, for example, f is bounded below.
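The Broyden-class formula (1.14) is easy to express directly in code. The sketch below, written for this summary rather than taken from the thesis, gives the update as a function of the parameter φ_k and checks that every member of the class satisfies the curvature condition (1.11), which follows because w_k^T s_k = 0.

    import numpy as np

    def broyden_update(B, s, y, phi):
        # Broyden-class update (1.14): phi = 0 gives BFGS, phi = 1 gives DFP.
        # Assumes s^T y > 0 so that the update is well defined.
        Bs = B @ s
        sBs, sy = s @ Bs, s @ y
        w = y / sy - Bs / sBs
        return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / sy + phi * sBs * np.outer(w, w)

    B = np.eye(2)
    s, y = np.array([1.0, 2.0]), np.array([0.3, 1.0])
    for phi in (0.0, 0.5, 1.0):
        assert np.isclose(s @ broyden_update(B, s, y, phi) @ s, s @ y)   # (1.11)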

In a practical line search, it is often convenient to require α_k to satisfy the modified Wolfe conditions

    f(x_k + α_k p_k) ≤ f(x_k) + ν α_k g_k^T p_k   and   |g_{k+1}^T p_k| ≤ η |g_k^T p_k|.    (1.16)

The existence of an α_k satisfying these conditions can also be guaranteed theoretically. (See Fletcher [15] for the existence results and further details.)

For theoretical discussion, α_k is sometimes considered to be an exact minimizer of the univariate function Ψ(α) defined by Ψ(α) = f(x_k + αp_k). This choice ensures a positive-definite update since, for such an α_k, g_{k+1}^T p_k = 0, which implies s_k^T y_k > 0. Properties of Algorithm 1.2 when it is applied to a convex quadratic objective function using such an exact line search are given in the next section.

1.2.1 Minimizing strictly convex quadratic functions

Consider the quadratic function

    q(x) = d + c^T x + ½ x^T H x,   where c ∈ IR^n, d ∈ IR, H ∈ IR^{n×n},    (1.17)

and H is symmetric positive definite and independent of x. This quadratic has a unique minimizer x* that satisfies Hx* = −c. If Algorithm 1.2 is used with an exact line search and an update from the Broyden class, then the following properties hold at the kth (0 < k ≤ n) iteration:

    B_k s_i = H s_i,         (1.18)
    s_i^T H s_k = 0,  and    (1.19)
    s_i^T g_k = 0,           (1.20)

for all i < k. Multiplying (1.18) by s_i^T gives s_i^T B_k s_i = s_i^T H s_i, which implies that the curvature of the quadratic model (1.6) along s_i (i < k) is exact.

Define S_k = ( s_0  s_1  ⋯  s_{k−1} ) and assume that s_i ≠ 0 (0 ≤ i ≤ n−1). Under this assumption, note that (1.19) implies that the set {s_i : i ≤ n−1} is linearly independent. At the start of the nth iteration, (1.18) implies that B_n S_n = H S_n, and B_n = H since S_n is nonsingular. It can be shown that x_k minimizes q(x) on the manifold defined by x_0 and range(S_k) (see Fletcher [15]). It follows that x_n minimizes q(x). This implies that Algorithm 1.2 with an exact line search finds the minimizer of the quadratic (1.17) in at most n steps, a property often referred to as quadratic termination.

Further properties of Algorithm 1.2 follow from its well-known equivalence to the conjugate-gradient method when used to minimize convex quadratic functions using an exact line search. If B_0 = I and the updates are from the Broyden class, then for all k ≥ 1 and 0 ≤ i < k,

    g_i^T g_k = 0   and    (1.21)
    p_k = −g_k + β_{k−1} p_{k−1},    (1.22)

where β_{k−1} = ‖g_k‖²/‖g_{k−1}‖² (see Fletcher [15, p. 65] for further details).
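The following short Python experiment, written for this summary with arbitrary test data, checks the quadratic-termination property numerically: with B_0 = I, the BFGS update and an exact line search, the iterates of Algorithm 1.2 minimize a strictly convex quadratic in at most n steps and B_n = H.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    A = rng.standard_normal((n, n))
    H = A @ A.T + n * np.eye(n)                     # symmetric positive-definite H
    c = rng.standard_normal(n)
    grad = lambda x: H @ x + c                      # gradient of q(x) = d + c^T x + 1/2 x^T H x

    x, B = np.zeros(n), np.eye(n)                   # x_0 and B_0 = I
    for k in range(n):
        g = grad(x)
        p = np.linalg.solve(B, -g)                  # B_k p_k = -g_k
        alpha = -(g @ p) / (p @ H @ p)              # exact line search on the quadratic
        s = alpha * p
        y = grad(x + s) - g                         # equals H s_k on a quadratic
        B = B - np.outer(B @ s, B @ s) / (s @ B @ s) + np.outer(y, y) / (s @ y)
        x = x + s

    assert np.allclose(B, H)                        # B_n = H
    assert np.allclose(grad(x), 0, atol=1e-8)       # x_n minimizes q(x)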

1.2.2 Minimizing convex objective functions

Much of the convergence theory for quasi-Newton methods involves convex functions. The theory focuses on two properties of the sequence of iterates. First, given an arbitrary starting point x_0, will the sequence of iterates converge to x*? If so, then the method is said to be globally convergent. Second, what is the order of convergence of the sequence of iterates? In the next two sections, we present some of the results from the literature regarding the convergence properties of quasi-Newton methods.

Global convergence of quasi-Newton methods

Consider the application of Algorithm 1.2 to a convex function. Powell has shown that in this case, the BFGS method with a Wolfe line search is globally convergent with lim inf ‖g_k‖ = 0 (see Powell [40]). Byrd, Nocedal and Yuan have extended Powell's result to a quasi-Newton method using any update from the convex class except the DFP update (see Byrd et al. [6]).

Uniformly convex functions are an important subclass of the set of convex functions. The Hessians of these functions satisfy

    m ‖z‖² ≤ z^T ∇²f(x) z ≤ M ‖z‖²,    (1.23)

for all x and z in IR^n. It follows that a function in this class has a unique minimizer x*. Although the DFP method is on the boundary of the convex class, it has not been shown to be globally convergent, even on uniformly convex functions (see Nocedal [36]).

Order of convergence of quasi-Newton methods

The order of convergence of a sequence has been defined in Section 1.1. The method of steepest descent, which sets p_k = −g_k for all k, is known to converge linearly from any starting point (see, for example, Gill et al. [22, p. 103]). This poor rate of convergence occurs because steepest descent uses no second-derivative information (the method implicitly chooses B_k = I for all k). On the other hand, Newton's method can be shown to converge quadratically for x_0 sufficiently close to x* if ∇²f(x*) is nonsingular and satisfies the Lipschitz condition (1.5) at x*. Since quasi-Newton methods use an approximation to the Hessian,

they might be expected to converge at a rate between linear and quadratic. This is indeed the case.

The following order of convergence results apply to the general quasi-Newton method given in Algorithm 1.2. It has been shown that {x_k} converges superlinearly to x* if and only if

    lim_{k→∞} ‖(B_k − ∇²f(x*)) s_k‖ / ‖s_k‖ = 0    (1.24)

(see Dennis and Moré [11]). Hence, the approximate curvature must converge to the curvature in f along the unit directions s_k/‖s_k‖. In a quasi-Newton method using a Wolfe line search, it has been shown that if the search direction approaches the Newton direction asymptotically, the step length α_k = 1 is acceptable for large enough k (see Dennis and Moré [12]).

Suppose now that a quasi-Newton method using updates from the convex class converges to a point x* such that ∇²f(x*) is nonsingular. In this case, if f is convex, Powell has shown that the BFGS method with a Wolfe line search converges superlinearly as long as the unit step length is taken whenever possible (see [40]). This result has been extended to every member of the convex class of Broyden updates except the DFP update (see Byrd et al. [6]). The DFP method has not been shown to be superlinearly convergent when using a Wolfe line search. However, there are convergence results concerning the application of the DFP method using an exact line search (see Nocedal [36] for further discussion).

In Section 1.2.1, it was noted that if Algorithm 1.2 with an exact line search is applied to a strictly convex quadratic function, and the steps s_k (0 ≤ k ≤ n−1) are nonzero, then B_n = H. When applied to general functions, it should be noted that B_k need not converge to ∇²f(x*) even when {x_k} converges to x* (see Dennis and Moré [11]).

The global and superlinear convergence of Algorithm 1.2 when applied to general f using a Wolfe line search remains an open question.

1.3 Computation of the search direction

Various methods for solving the system B_k p_k = −g_k in a practical implementation of Algorithm 1.2 are discussed in this section.

1.3.1 Notation

For simplicity, the subscript k is suppressed in much of what follows. Bars, tildes and cups are used to define updated quantities obtained during the kth iteration. Underlines are sometimes used to denote quantities associated with x_{k−1}. The use of the subscript will be retained in the definition of sets that contain a sequence of quantities belonging to different iterations, e.g., {g_0, g_1, ..., g_k}. Also, for clarity, the use of subscripts will be retained in the statement of results.

Throughout the thesis, I_j denotes the j × j identity matrix, where j satisfies 1 ≤ j < n. The matrix I is reserved for the n × n identity matrix. The vector e_i denotes the ith column of an identity matrix whose order depends on the context. If u ∈ IR^n and v ∈ IR^m, then (u, v)^T denotes the column vector of order n + m whose components are the components of u and v.

1.3.2 Using Cholesky factors

The equations Bp = −g can be solved if an upper-triangular matrix R is known such that B = R^T R. If B̄ is obtained from B using a Broyden update, then an upper-triangular matrix R̄ satisfying B̄ = R̄^T R̄ can be obtained from a rank-one update to R (see Goldfarb [24], Dennis and Schnabel [10]).

In particular, the BFGS update can be written as

    R̄ = S(R + u(w − R^T u)^T),   where u = Rs/‖Rs‖,  w = y/(y^T s)^{1/2},    (1.25)

and S is an orthogonal matrix that transforms R + u(w − R^T u)^T to upper-triangular form. Since many choices of S yield an upper-triangular R̄, we now describe the particular choice used throughout the paper.

The matrix S is of the form S = S_2 S_1, where S_1 and S_2 are products of Givens matrices. The matrix S_1 is defined by S_1 = P_{n,1} ⋯ P_{n,n−2} P_{n,n−1}, where P_{n,j} (1 ≤ j ≤ n−1) is a Givens matrix in the (j, n) plane designed to annihilate the jth element of P_{n,j+1} ⋯ P_{n,n−1} u. The product S_1 R is upper triangular except for the presence of a row spike in the nth row. Since S_1 u = ±e_n, the matrix S_1(R + u(w − R^T u)^T) is also upper triangular except for a row spike in the nth row. This matrix is restored to upper-triangular form using a second product of Givens matrices. In particular, S_2 = P_{n−1,n} P_{n−2,n} ⋯ P_{1,n}, where P_{i,n} (1 ≤ i ≤ n−1) is a Givens matrix in the (i, n) plane defined to annihilate the (n, i) element of P_{i−1,n} ⋯ P_{1,n} S_1(R + u(w − R^T u)^T).

For simplicity, the BFGS update (1.25) and the Broyden update to R will be written

    R̄ = BFGS(R, s, y)   and   R̄ = Broyden(R, s, y).    (1.26)

The form of S will be as described in the last paragraph. Another choice of S that implies S_1(R + u(w − R^T u)^T) is upper Hessenberg is described by Gill, Golub, Murray and Saunders [17]. Goldfarb prefers to write the update as a product of R and a rank-one modification of the identity. This form of the update is also easily restored to upper-triangular form (see Goldfarb [24]).
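To see that (1.25) really reproduces the BFGS update of B, the following sketch forms the rank-one-modified factor and re-triangularizes it with a dense QR factorization. This is only an illustration with arbitrary test data: the thesis restores triangular form with the two sweeps of Givens rotations described above, which costs O(n²) rather than the O(n³) of a full QR factorization.

    import numpy as np

    def bfgs_update_cholesky(R, s, y):
        # Given B = R^T R, return an upper-triangular Rbar with Bbar = Rbar^T Rbar,
        # where Bbar is the BFGS update (1.12) of B.  Any orthogonal S in (1.25)
        # yields the same Bbar, so a QR factorization is used here for brevity.
        Rs = R @ s
        u = Rs / np.linalg.norm(Rs)               # u = Rs/||Rs||
        w = y / np.sqrt(y @ s)                    # w = y/(y^T s)^{1/2}
        _, Rbar = np.linalg.qr(R + np.outer(u, w - R.T @ u))
        return Rbar

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))
    B = A @ A.T + 4 * np.eye(4)
    R = np.linalg.cholesky(B).T                   # upper-triangular factor, B = R^T R
    s = rng.standard_normal(4)
    y = B @ s + 0.5 * s                           # any y with s^T y > 0
    Rbar = bfgs_update_cholesky(R, s, y)
    Bs = B @ s
    Bbar = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)
    assert np.allclose(Rbar.T @ Rbar, Bbar)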

Some authors reserve the term Cholesky factor of a positive definite matrix B to mean the triangular factor with positive diagonals satisfying B = R^T R. However, throughout this thesis, the diagonal components of R are not restricted in sign, but R will be called the Cholesky factor of B.

1.3.3 Using conjugate-direction matrices

Since B is symmetric positive definite, there exists a nonsingular matrix V such that V^T B V = I. The columns of V are said to be conjugate with respect to B. In terms of V, the approximate Hessian satisfies

    B^{−1} = V V^T,    (1.27)

which implies that the solution of (1.7) may be written as

    p = −V V^T g.    (1.28)

If B̄ is defined by the BFGS formula (1.12), then a formula for V̄ satisfying V̄^T B̄ V̄ = I can be obtained from the product form of the BFGS update (see Brodlie, Gourlay, and Greenstadt [3]). The formula is given by

    V̄ = (I − s u^T) V Ω,   where u = (Bs)/((s^T y)^{1/2} (s^T B s)^{1/2}) + y/(s^T y),    (1.29)

and Ω is an orthogonal matrix.

Powell has proposed that Ω be defined as follows. Let Ṽ denote the product V Ω. The matrix Ω is chosen as a lower-Hessenberg matrix such that the first column of Ṽ is parallel to s (see Powell [42]). Let g_V be defined as

    g_V = V^T g,    (1.30)

and define Ω such that Ω^T = P_{12} P_{23} ⋯ P_{n−1,n}, where P_{i,i+1} is a rotation in the (i, i+1) plane chosen to annihilate the (i+1)th component of P_{i+1,i+2} ⋯ P_{n−1,n} g_V.

Then, Ω is an orthogonal lower-Hessenberg matrix such that Ω^T g_V = ‖g_V‖ e_1. Furthermore, (1.28) and the relation s = αp give

    Ṽ e_1 = −(1/(α ‖g_V‖)) s.    (1.31)

Hence, the first column of Ṽ is parallel to s. With this choice of Ω, Powell shows that the columns of V̄ satisfy

    v̄_i = { s/(s^T y)^{1/2},                  if i = 1;
            ṽ_i − ((ṽ_i^T y)/(s^T y)) s,      otherwise.     (1.32)

Note that the matrix B in the update (1.29) has been eliminated in the formulae (1.32). Formulae have also been derived for matrices V̄ that satisfy V̄^T B̄ V̄ = I, where B̄ is any Broyden update to B (see Siegel [47]).

1.4 Transformed and reduced Hessians

Let Q denote an n × n orthogonal matrix and let B denote a positive-definite approximation to ∇²f(x). The matrix Q^T B Q is called the transformed approximate Hessian. If Q is partitioned as Q = ( Z  W ), the transformed Hessian has a corresponding partition

    Q^T B Q = ( Z^T B Z   Z^T B W )
              ( W^T B Z   W^T B W ).

The positive-definite submatrices Z^T B Z and W^T B W are called reduced approximate Hessians. Transformed Hessians are often used in the solution of constrained optimization problems (see, for example, Gill et al. [21]). In the next chapter, a particular choice of Q will be seen to give block-diagonal structure to the approximate Hessians associated with quasi-Newton methods for unconstrained optimization.
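A small NumPy illustration of this partitioning, with arbitrary data and for orientation only: the leading block of the transformed matrix Q^T B Q is exactly the reduced approximate Hessian Z^T B Z.

    import numpy as np

    rng = np.random.default_rng(4)
    n, r = 6, 2
    A = rng.standard_normal((n, n))
    B = A @ A.T + n * np.eye(n)                       # any positive-definite B
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # an orthogonal Q = ( Z  W )
    Z, W = Q[:, :r], Q[:, r:]
    QtBQ = Q.T @ B @ Q
    assert np.allclose(QtBQ[:r, :r], Z.T @ B @ Z)     # leading block = reduced Hessian
    assert np.allclose(QtBQ[r:, r:], W.T @ B @ W)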

This simplification leads to another technique for solving Bp = −g that involves a reduced Hessian. Reduced-Hessian quasi-Newton methods using this technique are the subject of Chapter 2.

Chapter 2

Reduced-Hessian Methods for Unconstrained Optimization

In her dissertation, Fenelon [14] has shown that the BFGS method accumulates approximate curvature information in a sequence of expanding subspaces. This feature is used to show that the BFGS search direction can often be generated with matrices of smaller dimension than the approximate Hessian. Use of these reduced approximate Hessians leads to a variant of the BFGS method that can be used to solve problems whose Hessians may be too large to store.

In this chapter, reduced-Hessian methods are reviewed from Fenelon's point of view. A reduced inverse Hessian method, due to Siegel [46], is reviewed in Section 2.2. Fenelon's and Siegel's work is extended in Sections 2.3–2.5, giving new reduced-Hessian methods that utilize the Broyden class of updates.

2.1 Fenelon's reduced-Hessian BFGS method

Using the equations B_i p_i = −g_i and s_i = α_i p_i for 0 ≤ i ≤ k, the BFGS updates from B_0 to B_k can be telescoped to give

    B_k = B_0 + Σ_{i=0}^{k−1} ( (g_i g_i^T)/(g_i^T p_i) + (y_i y_i^T)/(s_i^T y_i) ).    (2.1)

If B_0 = σI (σ > 0), then (2.1) can be used to show that the solution of B_k p_k = −g_k is given by

    p_k = −(1/σ) g_k − (1/σ) Σ_{i=0}^{k−1} ( ((g_i^T p_k)/(g_i^T p_i)) g_i + ((y_i^T p_k)/(s_i^T y_i)) y_i ).    (2.2)

Hence, if G_k denotes the set of vectors

    G_k = {g_0, g_1, ..., g_k},    (2.3)

then (2.2) implies that p_k ∈ span(G_k). The following lemma summarizes this result.

Lemma 2.1 (Fenelon) If the BFGS method is used to solve the unconstrained minimization problem (1.1) with B_0 = σI (σ > 0), then p_k ∈ span(G_k) for all k.

Using this result, Fenelon has shown that if Z_k is a full-rank matrix such that range(Z_k) = span(G_k), then p_k = Z_k p_Z, where

    p_Z = −(Z_k^T B_k Z_k)^{−1} Z_k^T g_k.    (2.4)

This form of the search direction implies a reduced-Hessian implementation of the BFGS method employing Z_k and an upper-triangular matrix R_Z such that R_Z^T R_Z = Z_k^T B_k Z_k.
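As a quick sanity check of the telescoped form (2.1), the sketch below applies the BFGS formula (1.12) to synthetic data satisfying B_i p_i = −g_i and s_i = α_i p_i and compares the result with the explicit sum. The data are artificial stand-ins for the quantities a BFGS method would generate, not output of the thesis's algorithms.

    import numpy as np

    rng = np.random.default_rng(2)
    n, sigma, k = 5, 1.5, 3
    B = sigma * np.eye(n)                            # B_0 = sigma I
    B_sum = B.copy()                                 # accumulates the sum in (2.1)
    g = rng.standard_normal(n)
    for i in range(k):
        p = np.linalg.solve(B, -g)                   # B_i p_i = -g_i
        alpha = 0.5 + rng.random()                   # any positive step length
        s = alpha * p                                # s_i = alpha_i p_i
        y = s + 0.01 * rng.standard_normal(n)        # any y_i with s_i^T y_i > 0
        assert s @ y > 0
        Bs = B @ s
        B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)       # (1.12)
        B_sum = B_sum + np.outer(g, g) / (g @ p) + np.outer(y, y) / (s @ y)  # terms of (2.1)
        g = rng.standard_normal(n)                   # next gradient
    assert np.allclose(B, B_sum)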

2.1.1 The Gram-Schmidt process

The matrix Z_k is obtained from G_k using the Gram-Schmidt process. This process gives an orthonormal basis for G_k. The choice of orthonormal basis is motivated by the result cond(Z_k^T B_k Z_k) ≤ cond(B_k) if Z_k^T Z_k = I_{r_k} (see Gill et al. [22, p. 162]). To simplify the description of this process we drop the subscript k, as discussed in Section 1.3.1.

At the start of the first iteration, Z is initialized to g_0/‖g_0‖. During the kth iteration, assume that the columns of Z approximate an orthonormal basis for span(G). The matrix Z̄ is defined so that range(Z̄) = span(G ∪ {ḡ}) as follows. The vector ḡ can be uniquely written as ḡ = ḡ_R + ḡ_N, where ḡ_R ∈ range(Z) and ḡ_N ∈ null(Z^T). The vector ḡ_R satisfies ḡ_R = ZZ^T ḡ, which implies that the component of ḡ orthogonal to range(Z) satisfies ḡ_N = ḡ − ZZ^T ḡ = (I − ZZ^T)ḡ. Let z_ḡ denote the normalized component of ḡ orthogonal to range(Z). If we define ρ_ḡ = ‖ḡ_N‖, then z_ḡ = ḡ_N/ρ_ḡ. Note that if ρ_ḡ = 0, then ḡ ∈ range(Z). In this case, we will define Z̄ = Z.

To summarize, if r denotes the column dimension of Z, we define

    r̄ = { r,       if ρ_ḡ = 0;
          r + 1,   otherwise.     (2.5)

Using r̄, z_ḡ and Z̄ satisfy

    z_ḡ = { 0,                      if r̄ = r;
            (1/ρ_ḡ) (I − ZZ^T) ḡ,   otherwise,     (2.6)

and

    Z̄ = { Z,            if r̄ = r;
          ( Z   z_ḡ ),   otherwise.     (2.7)

It is well known that the Gram-Schmidt process is unstable in the presence of computer round-off error (see Golub and Van Loan [25, p. 218]). Several methods have been proposed to stabilize the process. These methods are given in Table 2.1. The advantages and disadvantages of each method are also given in the table. Note that a flop is defined as a multiplication and an addition. The flop counts given in the table are only approximations of the actual counts. The value of 3.2nr flops for the reorthogonalization process is an average that results if 3 reorthogonalizations are performed every 5 iterations.

Table 2.1: Alternate methods for computing Z̄

    Method                                  Advantage             Disadvantage
    Gram-Schmidt                            Simple; 2nr flops     Unstable
    Modified Gram-Schmidt                   More stable than GS   Z must be recomputed each iteration
    Gram-Schmidt with reorthogonalization   Stable                Expensive, e.g., 3.2nr flops
      (Daniel et al. [76], Fenelon [81])
    Implicitly (Siegel [92])                nr + O(r²) flops      Expensive if r is large

Another technique for stabilizing the process, suggested by Daniel et al. [8] (and used by Siegel [46]), is to ignore the component of ḡ orthogonal to range(Z) if it is small (but possibly nonzero) relative to ‖ḡ‖. In this case, the definition of r̄ satisfies

    r̄ = { r,       if ρ_ḡ ≤ ɛ‖ḡ‖;
          r + 1,   otherwise,     (2.8)

where ɛ ≥ 0 is a preassigned constant.

The matrix Z̄ that results when this definition of r̄ is used has properties that depend on the choice of ɛ. If ɛ = 0, then in exact arithmetic the columns of Z̄ form an orthonormal basis for span(G). Moreover, for any ɛ (ɛ ≥ 0), the columns of Z̄ form an orthonormal basis for the span of a subset of G. If K_ɛ = {k_1, k_2, ..., k_r} denotes the set of indices for which ρ_g > ɛ‖g‖ and G_ɛ = ( g_{k_1}  g_{k_2}  ⋯  g_{k_r} ) is the matrix of corresponding gradients, then the columns of Z̄ form an orthonormal basis for range(G_ɛ). Gradients satisfying ρ_g > ɛ‖g‖ are said to be accepted; otherwise, they are said to be rejected. Hence, G_ɛ is the matrix of accepted gradients associated with a particular choice of ɛ. Note that the dimension of Z is nondecreasing with k.

During iteration k + 1, the vector ḡ_Z̄ (ḡ_Z̄ = Z̄^T ḡ) is needed to compute the next search direction p̄. Since

    ḡ_Z̄ = { Z^T ḡ,             if r̄ = r;
            (Z^T ḡ, ρ_ḡ)^T,     otherwise,     (2.9)

this quantity is a by-product of the computation of Z̄. If r̄, ḡ_Z̄ and Z̄ satisfy (2.8), (2.9) and (2.7), then we will write

    (Z̄, ḡ_Z̄, r̄) = GS(Z, ḡ, r, ɛ).    (2.10)
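A compact Python sketch of the acceptance step (2.8)–(2.10) is given below; it either rejects the new gradient or appends its normalized orthogonal component to Z. The interface and names are illustrative, and classical Gram-Schmidt is used here without the reorthogonalization safeguards listed in Table 2.1.

    import numpy as np

    def gs_accept(Z, g_bar, eps):
        # One step of the procedure GS(Z, g_bar, r, eps) of (2.10).
        # Z has orthonormal columns (pass None at the first iteration).
        # Returns (Z_bar, gZ_bar, accepted).
        if Z is None:                                    # Z initialized to g_0/||g_0||
            norm = np.linalg.norm(g_bar)
            return g_bar[:, None] / norm, np.array([norm]), True
        gZ = Z.T @ g_bar                                 # component in range(Z)
        g_perp = g_bar - Z @ gZ                          # (I - Z Z^T) g_bar
        rho = np.linalg.norm(g_perp)
        if rho <= eps * np.linalg.norm(g_bar):           # reject: r_bar = r, see (2.8)
            return Z, gZ, False
        z_new = g_perp / rho                             # z_gbar of (2.6)
        return np.hstack([Z, z_new[:, None]]), np.append(gZ, rho), True   # (2.7), (2.9)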

2.1.2 The BFGS update to R_Z

If Z, g_Z and R_Z are known during the kth iteration of a reduced-Hessian method, then p is computed using (2.4). Following the calculation of x̄ in the line search, ḡ is either rejected or added to the basis defined by Z. It remains to define a matrix R̄_Z̄ satisfying Z̄^T B̄ Z̄ = R̄_Z̄^T R̄_Z̄, where B̄ is obtained from B using the BFGS update.

Let y_Z denote the quantity Z^T y. If ḡ is rejected, Fenelon employs the method of Gill et al. [17] to obtain R̄_Z from R_Z via two rank-one updates involving g_Z and y_Z. If ḡ is accepted, R̄_Z̄ can be partitioned as

    R̄_Z̄ = ( R̄_Z   R_ḡ )
           ( 0     φ   ),

where φ is a scalar. The matrix R̄_Z is obtained from R_Z using g_Z and y_Z. The following lemma is used to define R_ḡ and φ.

Lemma 2.2 (Fenelon) If z_ḡ denotes the normalized component of g_{k+1} orthogonal to span(G_k), then

    Z_k^T B_{k+1} z_ḡ = ((y^T z_ḡ)/(s^T y)) y_Z   and   z_ḡ^T B_{k+1} z_ḡ = σ + (z_ḡ^T y)²/(s_k^T y_k).    (2.11)

(Although the relation z_ḡ^T g = 0 is used in the proof of Lemma 2.2, it was not used to simplify (2.11).) The solution of an upper-triangular system involving R̄_Z and ((y^T z_ḡ)/(s^T y)) y_Z is used to define R_ḡ. The value φ is then obtained from R_ḡ and z_ḡ^T B̄ z_ḡ.

2.2 Reduced inverse Hessian methods

Many quasi-Newton algorithms are defined in terms of the inverse approximate Hessian H_k = B_k^{−1}. The Broyden update to H_k is

    H_{k+1} = M_k H_k M_k^T + (s_k s_k^T)/(s_k^T y_k),   where
    M_k = I − (s_k y_k^T)/(s_k^T y_k) − ψ_k (y_k^T H_k y_k) r_k r_k^T,   and
    r_k = (H_k y_k)/(y_k^T H_k y_k) − s_k/(s_k^T y_k).    (2.12)

The parameter φ_k is related to ψ_k by the equation

    φ_k (ψ_k − 1)(y_k^T H_k y_k)(s_k^T B_k s_k) = ψ_k (φ_k − 1)(s_k^T y_k)².

Note that the values ψ_k = 0 and ψ_k = 1 correspond to the BFGS and the DFP updates respectively.

Siegel [46] gives a more general result than Lemma 2.1 that applies to the entire Broyden class. The result is stated below without proof.

Lemma 2.3 (Siegel) If Algorithm 1.2 is used to solve the unconstrained minimization problem (1.1) with B_0 = σI (σ > 0) and a Broyden update, then p_k ∈ span(G_k) for all k. Moreover, if z ∈ span(G_k) and w ⊥ span(G_k), then B_k z ∈ span(G_k), H_k z ∈ span(G_k), B_k w = σw and H_k w = σ^{−1} w.

Let G_k denote the matrix of the first k + 1 gradients. For simplicity, assume that these gradients are linearly independent and that k is less than n. Since G_k has full column rank, it has a QR factorization of the form

    G_k = Q_k ( T_k )
              ( 0  ),    (2.13)

where Q_k^T Q_k = I and T_k is nonsingular and upper triangular. Define r_k = dim(span(G_k)), and partition Q_k = ( Z_k  W_k ), where Z_k ∈ IR^{n×r_k}. Note that the product G_k = Z_k T_k defines a skinny QR factorization of G_k (see Golub and Van Loan [25, p. 217]). The columns of Z_k form an orthonormal basis for range(G_k) and the columns of W_k form an orthonormal basis for null(G_k^T). If the first k+1 gradients are not linearly independent, Q_k is defined as in (2.13), except that G_k^0 is used in place of G_k. Hence, the first r_k columns of Q_k are still an orthonormal basis for G_k.

Consider the transformed inverse Hessian Q_k^T H_k Q_k. Lemma 2.3 implies that if H_0 = σ^{−1} I, then Q_k^T H_k Q_k is block diagonal and satisfies

    Q_k^T H_k Q_k = ( Z_k^T H_k Z_k         0
                      0            σ^{−1} I_{n−r_k} ).    (2.14)

As the equation for the search direction in terms of H_k satisfies p_k = −H_k g_k, we have Q_k^T p_k = −(Q_k^T H_k Q_k) Q_k^T g_k. It follows that p_k = −Z_k (Z_k^T H_k Z_k) Z_k^T g_k.
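A small numerical experiment, written for this summary with arbitrary test data, illustrates Lemma 2.3 and (2.14): after a few BFGS iterations started from H_0 = σ^{−1} I, the transformed inverse Hessian is block diagonal and the search direction can be recovered from the reduced matrix Z_k^T H_k Z_k alone.

    import numpy as np

    rng = np.random.default_rng(3)
    n, sigma = 6, 2.0
    A = rng.standard_normal((n, n))
    Hess = A @ A.T + n * np.eye(n)                   # true Hessian of a quadratic
    c = rng.standard_normal(n)
    grad = lambda x: Hess @ x + c

    x = np.zeros(n)
    H = np.eye(n) / sigma                            # H_0 = sigma^{-1} I
    grads = [grad(x)]
    for k in range(3):                               # a few BFGS iterations
        g = grads[-1]
        p = -H @ g
        alpha = -(g @ p) / (p @ Hess @ p)            # exact line search on the quadratic
        s = alpha * p
        x = x + s
        y = grad(x) - g
        rho = 1.0 / (s @ y)                          # BFGS update of the inverse Hessian
        V = np.eye(n) - rho * np.outer(s, y)
        H = V @ H @ V.T + rho * np.outer(s, s)
        grads.append(grad(x))

    G = np.column_stack(grads)                       # matrix of the first k+1 gradients
    r = G.shape[1]
    Q, _ = np.linalg.qr(G, mode='complete')
    Z, W = Q[:, :r], Q[:, r:]
    QHQ = Q.T @ H @ Q
    assert np.allclose(QHQ[r:, r:], np.eye(n - r) / sigma)        # lower block of (2.14)
    assert np.allclose(QHQ[:r, r:], 0)                            # off-diagonal blocks vanish
    g = grads[-1]
    assert np.allclose(-H @ g, -Z @ (Z.T @ H @ Z) @ (Z.T @ g))    # reduced search direction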


MATH 4211/6211 Optimization Quasi-Newton Method

MATH 4211/6211 Optimization Quasi-Newton Method MATH 4211/6211 Optimization Quasi-Newton Method Xiaojing Ye Department of Mathematics & Statistics Georgia State University Xiaojing Ye, Math & Stat, Georgia State University 0 Quasi-Newton Method Motivation:

More information

ALGORITHM XXX: SC-SR1: MATLAB SOFTWARE FOR SOLVING SHAPE-CHANGING L-SR1 TRUST-REGION SUBPROBLEMS

ALGORITHM XXX: SC-SR1: MATLAB SOFTWARE FOR SOLVING SHAPE-CHANGING L-SR1 TRUST-REGION SUBPROBLEMS ALGORITHM XXX: SC-SR1: MATLAB SOFTWARE FOR SOLVING SHAPE-CHANGING L-SR1 TRUST-REGION SUBPROBLEMS JOHANNES BRUST, OLEG BURDAKOV, JENNIFER B. ERWAY, ROUMMEL F. MARCIA, AND YA-XIANG YUAN Abstract. We present

More information

Improving the Convergence of Back-Propogation Learning with Second Order Methods

Improving the Convergence of Back-Propogation Learning with Second Order Methods the of Back-Propogation Learning with Second Order Methods Sue Becker and Yann le Cun, Sept 1988 Kasey Bray, October 2017 Table of Contents 1 with Back-Propagation 2 the of BP 3 A Computationally Feasible

More information

Improving L-BFGS Initialization for Trust-Region Methods in Deep Learning

Improving L-BFGS Initialization for Trust-Region Methods in Deep Learning Improving L-BFGS Initialization for Trust-Region Methods in Deep Learning Jacob Rafati http://rafati.net jrafatiheravi@ucmerced.edu Ph.D. Candidate, Electrical Engineering and Computer Science University

More information

HYBRID RUNGE-KUTTA AND QUASI-NEWTON METHODS FOR UNCONSTRAINED NONLINEAR OPTIMIZATION. Darin Griffin Mohr. An Abstract

HYBRID RUNGE-KUTTA AND QUASI-NEWTON METHODS FOR UNCONSTRAINED NONLINEAR OPTIMIZATION. Darin Griffin Mohr. An Abstract HYBRID RUNGE-KUTTA AND QUASI-NEWTON METHODS FOR UNCONSTRAINED NONLINEAR OPTIMIZATION by Darin Griffin Mohr An Abstract Of a thesis submitted in partial fulfillment of the requirements for the Doctor of

More information

Optimization Methods

Optimization Methods Optimization Methods Decision making Examples: determining which ingredients and in what quantities to add to a mixture being made so that it will meet specifications on its composition allocating available

More information

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods Quasi-Newton Methods General form of quasi-newton methods: x k+1 = x k α

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method Lecture 5, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The notion of complexity (per iteration)

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

More information

MS&E 318 (CME 338) Large-Scale Numerical Optimization

MS&E 318 (CME 338) Large-Scale Numerical Optimization Stanford University, Management Science & Engineering (and ICME) MS&E 318 (CME 338) Large-Scale Numerical Optimization 1 Origins Instructor: Michael Saunders Spring 2015 Notes 9: Augmented Lagrangian Methods

More information

ORIE 6326: Convex Optimization. Quasi-Newton Methods

ORIE 6326: Convex Optimization. Quasi-Newton Methods ORIE 6326: Convex Optimization Quasi-Newton Methods Professor Udell Operations Research and Information Engineering Cornell April 10, 2017 Slides on steepest descent and analysis of Newton s method adapted

More information

Comparative study of Optimization methods for Unconstrained Multivariable Nonlinear Programming Problems

Comparative study of Optimization methods for Unconstrained Multivariable Nonlinear Programming Problems International Journal of Scientific and Research Publications, Volume 3, Issue 10, October 013 1 ISSN 50-3153 Comparative study of Optimization methods for Unconstrained Multivariable Nonlinear Programming

More information

Math 273a: Optimization Netwon s methods

Math 273a: Optimization Netwon s methods Math 273a: Optimization Netwon s methods Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 some material taken from Chong-Zak, 4th Ed. Main features of Newton s method Uses both first derivatives

More information

Lecture 18: November Review on Primal-dual interior-poit methods

Lecture 18: November Review on Primal-dual interior-poit methods 10-725/36-725: Convex Optimization Fall 2016 Lecturer: Lecturer: Javier Pena Lecture 18: November 2 Scribes: Scribes: Yizhu Lin, Pan Liu Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality

More information

Optimization II: Unconstrained Multivariable

Optimization II: Unconstrained Multivariable Optimization II: Unconstrained Multivariable CS 205A: Mathematical Methods for Robotics, Vision, and Graphics Justin Solomon CS 205A: Mathematical Methods Optimization II: Unconstrained Multivariable 1

More information

Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL)

Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL) Part 4: Active-set methods for linearly constrained optimization Nick Gould RAL fx subject to Ax b Part C course on continuoue optimization LINEARLY CONSTRAINED MINIMIZATION fx subject to Ax { } b where

More information

Handling nonpositive curvature in a limited memory steepest descent method

Handling nonpositive curvature in a limited memory steepest descent method IMA Journal of Numerical Analysis (2016) 36, 717 742 doi:10.1093/imanum/drv034 Advance Access publication on July 8, 2015 Handling nonpositive curvature in a limited memory steepest descent method Frank

More information

1. Search Directions In this chapter we again focus on the unconstrained optimization problem. lim sup ν

1. Search Directions In this chapter we again focus on the unconstrained optimization problem. lim sup ν 1 Search Directions In this chapter we again focus on the unconstrained optimization problem P min f(x), x R n where f : R n R is assumed to be twice continuously differentiable, and consider the selection

More information

Quasi-Newton Methods. Zico Kolter (notes by Ryan Tibshirani, Javier Peña, Zico Kolter) Convex Optimization

Quasi-Newton Methods. Zico Kolter (notes by Ryan Tibshirani, Javier Peña, Zico Kolter) Convex Optimization Quasi-Newton Methods Zico Kolter (notes by Ryan Tibshirani, Javier Peña, Zico Kolter) Convex Optimization 10-725 Last time: primal-dual interior-point methods Given the problem min x f(x) subject to h(x)

More information

Statistics 580 Optimization Methods

Statistics 580 Optimization Methods Statistics 580 Optimization Methods Introduction Let fx be a given real-valued function on R p. The general optimization problem is to find an x ɛ R p at which fx attain a maximum or a minimum. It is of

More information

ENSIEEHT-IRIT, 2, rue Camichel, Toulouse (France) LMS SAMTECH, A Siemens Business,15-16, Lower Park Row, BS1 5BN Bristol (UK)

ENSIEEHT-IRIT, 2, rue Camichel, Toulouse (France) LMS SAMTECH, A Siemens Business,15-16, Lower Park Row, BS1 5BN Bristol (UK) Quasi-Newton updates with weighted secant equations by. Gratton, V. Malmedy and Ph. L. oint Report NAXY-09-203 6 October 203 0.5 0 0.5 0.5 0 0.5 ENIEEH-IRI, 2, rue Camichel, 3000 oulouse France LM AMECH,

More information

8 Numerical methods for unconstrained problems

8 Numerical methods for unconstrained problems 8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields

More information

A COMBINED CLASS OF SELF-SCALING AND MODIFIED QUASI-NEWTON METHODS

A COMBINED CLASS OF SELF-SCALING AND MODIFIED QUASI-NEWTON METHODS A COMBINED CLASS OF SELF-SCALING AND MODIFIED QUASI-NEWTON METHODS MEHIDDIN AL-BAALI AND HUMAID KHALFAN Abstract. Techniques for obtaining safely positive definite Hessian approximations with selfscaling

More information

1 Numerical optimization

1 Numerical optimization Contents 1 Numerical optimization 5 1.1 Optimization of single-variable functions............ 5 1.1.1 Golden Section Search................... 6 1.1. Fibonacci Search...................... 8 1. Algorithms

More information

Optimization II: Unconstrained Multivariable

Optimization II: Unconstrained Multivariable Optimization II: Unconstrained Multivariable CS 205A: Mathematical Methods for Robotics, Vision, and Graphics Doug James (and Justin Solomon) CS 205A: Mathematical Methods Optimization II: Unconstrained

More information

Arc Search Algorithms

Arc Search Algorithms Arc Search Algorithms Nick Henderson and Walter Murray Stanford University Institute for Computational and Mathematical Engineering November 10, 2011 Unconstrained Optimization minimize x D F (x) where

More information

Conjugate Gradient (CG) Method

Conjugate Gradient (CG) Method Conjugate Gradient (CG) Method by K. Ozawa 1 Introduction In the series of this lecture, I will introduce the conjugate gradient method, which solves efficiently large scale sparse linear simultaneous

More information

Optimization 2. CS5240 Theoretical Foundations in Multimedia. Leow Wee Kheng

Optimization 2. CS5240 Theoretical Foundations in Multimedia. Leow Wee Kheng Optimization 2 CS5240 Theoretical Foundations in Multimedia Leow Wee Kheng Department of Computer Science School of Computing National University of Singapore Leow Wee Kheng (NUS) Optimization 2 1 / 38

More information

Outline. Scientific Computing: An Introductory Survey. Optimization. Optimization Problems. Examples: Optimization Problems

Outline. Scientific Computing: An Introductory Survey. Optimization. Optimization Problems. Examples: Optimization Problems Outline Scientific Computing: An Introductory Survey Chapter 6 Optimization 1 Prof. Michael. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction

More information

NONSMOOTH VARIANTS OF POWELL S BFGS CONVERGENCE THEOREM

NONSMOOTH VARIANTS OF POWELL S BFGS CONVERGENCE THEOREM NONSMOOTH VARIANTS OF POWELL S BFGS CONVERGENCE THEOREM JIAYI GUO AND A.S. LEWIS Abstract. The popular BFGS quasi-newton minimization algorithm under reasonable conditions converges globally on smooth

More information

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf.

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf. Maria Cameron 1. Trust Region Methods At every iteration the trust region methods generate a model m k (p), choose a trust region, and solve the constraint optimization problem of finding the minimum of

More information

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications Weijun Zhou 28 October 20 Abstract A hybrid HS and PRP type conjugate gradient method for smooth

More information

Numerical Methods in Matrix Computations

Numerical Methods in Matrix Computations Ake Bjorck Numerical Methods in Matrix Computations Springer Contents 1 Direct Methods for Linear Systems 1 1.1 Elements of Matrix Theory 1 1.1.1 Matrix Algebra 2 1.1.2 Vector Spaces 6 1.1.3 Submatrices

More information

4 Newton Method. Unconstrained Convex Optimization 21. H(x)p = f(x). Newton direction. Why? Recall second-order staylor series expansion:

4 Newton Method. Unconstrained Convex Optimization 21. H(x)p = f(x). Newton direction. Why? Recall second-order staylor series expansion: Unconstrained Convex Optimization 21 4 Newton Method H(x)p = f(x). Newton direction. Why? Recall second-order staylor series expansion: f(x + p) f(x)+p T f(x)+ 1 2 pt H(x)p ˆf(p) In general, ˆf(p) won

More information

EECS260 Optimization Lecture notes

EECS260 Optimization Lecture notes EECS260 Optimization Lecture notes Based on Numerical Optimization (Nocedal & Wright, Springer, 2nd ed., 2006) Miguel Á. Carreira-Perpiñán EECS, University of California, Merced May 2, 2010 1 Introduction

More information

1. Introduction. We analyze a trust region version of Newton s method for the optimization problem

1. Introduction. We analyze a trust region version of Newton s method for the optimization problem SIAM J. OPTIM. Vol. 9, No. 4, pp. 1100 1127 c 1999 Society for Industrial and Applied Mathematics NEWTON S METHOD FOR LARGE BOUND-CONSTRAINED OPTIMIZATION PROBLEMS CHIH-JEN LIN AND JORGE J. MORÉ To John

More information

Line Search Methods for Unconstrained Optimisation

Line Search Methods for Unconstrained Optimisation Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic

More information

Cubic regularization in symmetric rank-1 quasi-newton methods

Cubic regularization in symmetric rank-1 quasi-newton methods Math. Prog. Comp. (2018) 10:457 486 https://doi.org/10.1007/s12532-018-0136-7 FULL LENGTH PAPER Cubic regularization in symmetric rank-1 quasi-newton methods Hande Y. Benson 1 David F. Shanno 2 Received:

More information

The speed of Shor s R-algorithm

The speed of Shor s R-algorithm IMA Journal of Numerical Analysis 2008) 28, 711 720 doi:10.1093/imanum/drn008 Advance Access publication on September 12, 2008 The speed of Shor s R-algorithm J. V. BURKE Department of Mathematics, University

More information

Conjugate-Gradient. Learn about the Conjugate-Gradient Algorithm and its Uses. Descent Algorithms and the Conjugate-Gradient Method. Qx = b.

Conjugate-Gradient. Learn about the Conjugate-Gradient Algorithm and its Uses. Descent Algorithms and the Conjugate-Gradient Method. Qx = b. Lab 1 Conjugate-Gradient Lab Objective: Learn about the Conjugate-Gradient Algorithm and its Uses Descent Algorithms and the Conjugate-Gradient Method There are many possibilities for solving a linear

More information

Lecture 10: September 26

Lecture 10: September 26 0-725: Optimization Fall 202 Lecture 0: September 26 Lecturer: Barnabas Poczos/Ryan Tibshirani Scribes: Yipei Wang, Zhiguang Huo Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These

More information

Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations

Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations Oleg Burdakov a,, Ahmad Kamandi b a Department of Mathematics, Linköping University,

More information

A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES

A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES IJMMS 25:6 2001) 397 409 PII. S0161171201002290 http://ijmms.hindawi.com Hindawi Publishing Corp. A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES

More information

ECE133A Applied Numerical Computing Additional Lecture Notes

ECE133A Applied Numerical Computing Additional Lecture Notes Winter Quarter 2018 ECE133A Applied Numerical Computing Additional Lecture Notes L. Vandenberghe ii Contents 1 LU factorization 1 1.1 Definition................................. 1 1.2 Nonsingular sets

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Maximum Likelihood Estimation Prof. C. F. Jeff Wu ISyE 8813 Section 1 Motivation What is parameter estimation? A modeler proposes a model M(θ) for explaining some observed phenomenon θ are the parameters

More information

MATRIX AND LINEAR ALGEBR A Aided with MATLAB

MATRIX AND LINEAR ALGEBR A Aided with MATLAB Second Edition (Revised) MATRIX AND LINEAR ALGEBR A Aided with MATLAB Kanti Bhushan Datta Matrix and Linear Algebra Aided with MATLAB Second Edition KANTI BHUSHAN DATTA Former Professor Department of Electrical

More information

1 Numerical optimization

1 Numerical optimization Contents Numerical optimization 5. Optimization of single-variable functions.............................. 5.. Golden Section Search..................................... 6.. Fibonacci Search........................................

More information

Lecture Notes: Geometric Considerations in Unconstrained Optimization

Lecture Notes: Geometric Considerations in Unconstrained Optimization Lecture Notes: Geometric Considerations in Unconstrained Optimization James T. Allison February 15, 2006 The primary objectives of this lecture on unconstrained optimization are to: Establish connections

More information

17 Solution of Nonlinear Systems

17 Solution of Nonlinear Systems 17 Solution of Nonlinear Systems We now discuss the solution of systems of nonlinear equations. An important ingredient will be the multivariate Taylor theorem. Theorem 17.1 Let D = {x 1, x 2,..., x m

More information

Line search methods with variable sample size. Nataša Krklec Jerinkić. - PhD thesis -

Line search methods with variable sample size. Nataša Krklec Jerinkić. - PhD thesis - UNIVERSITY OF NOVI SAD FACULTY OF SCIENCES DEPARTMENT OF MATHEMATICS AND INFORMATICS Nataša Krklec Jerinkić Line search methods with variable sample size - PhD thesis - Novi Sad, 2013 2. 3 Introduction

More information

Trust Regions. Charles J. Geyer. March 27, 2013

Trust Regions. Charles J. Geyer. March 27, 2013 Trust Regions Charles J. Geyer March 27, 2013 1 Trust Region Theory We follow Nocedal and Wright (1999, Chapter 4), using their notation. Fletcher (1987, Section 5.1) discusses the same algorithm, but

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information

Matrix Derivatives and Descent Optimization Methods

Matrix Derivatives and Descent Optimization Methods Matrix Derivatives and Descent Optimization Methods 1 Qiang Ning Department of Electrical and Computer Engineering Beckman Institute for Advanced Science and Techonology University of Illinois at Urbana-Champaign

More information

A New Low Rank Quasi-Newton Update Scheme for Nonlinear Programming

A New Low Rank Quasi-Newton Update Scheme for Nonlinear Programming A New Low Rank Quasi-Newton Update Scheme for Nonlinear Programming Roger Fletcher Numerical Analysis Report NA/223, August 2005 Abstract A new quasi-newton scheme for updating a low rank positive semi-definite

More information

An Inexact Newton Method for Optimization

An Inexact Newton Method for Optimization New York University Brown Applied Mathematics Seminar, February 10, 2009 Brief biography New York State College of William and Mary (B.S.) Northwestern University (M.S. & Ph.D.) Courant Institute (Postdoc)

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Stochastic Quasi-Newton Methods

Stochastic Quasi-Newton Methods Stochastic Quasi-Newton Methods Donald Goldfarb Department of IEOR Columbia University UCLA Distinguished Lecture Series May 17-19, 2016 1 / 35 Outline Stochastic Approximation Stochastic Gradient Descent

More information

DENSE INITIALIZATIONS FOR LIMITED-MEMORY QUASI-NEWTON METHODS

DENSE INITIALIZATIONS FOR LIMITED-MEMORY QUASI-NEWTON METHODS DENSE INITIALIZATIONS FOR LIMITED-MEMORY QUASI-NEWTON METHODS by Johannes Brust, Oleg Burdaov, Jennifer B. Erway, and Roummel F. Marcia Technical Report 07-, Department of Mathematics and Statistics, Wae

More information

Newton s Method. Javier Peña Convex Optimization /36-725

Newton s Method. Javier Peña Convex Optimization /36-725 Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and

More information

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R

More information

A projected Hessian for full waveform inversion

A projected Hessian for full waveform inversion CWP-679 A projected Hessian for full waveform inversion Yong Ma & Dave Hale Center for Wave Phenomena, Colorado School of Mines, Golden, CO 80401, USA (c) Figure 1. Update directions for one iteration

More information

Optimization: Nonlinear Optimization without Constraints. Nonlinear Optimization without Constraints 1 / 23

Optimization: Nonlinear Optimization without Constraints. Nonlinear Optimization without Constraints 1 / 23 Optimization: Nonlinear Optimization without Constraints Nonlinear Optimization without Constraints 1 / 23 Nonlinear optimization without constraints Unconstrained minimization min x f(x) where f(x) is

More information

Geometry optimization

Geometry optimization Geometry optimization Trygve Helgaker Centre for Theoretical and Computational Chemistry Department of Chemistry, University of Oslo, Norway European Summer School in Quantum Chemistry (ESQC) 211 Torre

More information

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725 Consider Last time: proximal Newton method min x g(x) + h(x) where g, h convex, g twice differentiable, and h simple. Proximal

More information