MS&E 318 (CME 338) Large-Scale Numerical Optimization

Stanford University, Management Science & Engineering (and ICME)

MS&E 318 (CME 338) Large-Scale Numerical Optimization
Instructor: Michael Saunders   Spring 2015

Notes 11: NPSOL and SNOPT SQP Methods

1 Overview

Here we consider sequential quadratic programming methods (SQP methods) for the general optimization problem

    NCB     minimize_{x ∈ Rⁿ}   φ(x)
            subject to          c(x) = 0,    l ≤ x ≤ u.

As with the BCL and LCL approaches, SQP methods use a sequence of subproblems (each representing a major iteration) to find estimates (x_k, y_k) that reduce certain augmented Lagrangians. An important function of the subproblems is to estimate which inequalities in l ≤ x ≤ u are active. If an active-set method is used, it will perform some number of minor iterations on each subproblem.

In BCL and LCL methods, the subproblem objective is the current augmented Lagrangian, which must be evaluated at each minor iteration. In MINOS, for example, the reduced-gradient method that solves the LCL subproblems requires an average of 2 or 3 evaluations of the functions and gradients per minor iteration. This is acceptable if the function and gradient evaluations are cheap (as they are in the GAMS and AMPL environments) but may be prohibitive in other applications. For example, shape optimization in aerospace design requires the solution of a partial differential equation (PDE) each time φ(x) and c(x) are evaluated, and an adjoint PDE to obtain gradients.

SQP methods economize by using quadratic programming (QP) subproblems to estimate the set of active constraints. Many minor iterations may be needed to solve the subproblems, but the functions and gradients are not evaluated during that process. Instead, each subproblem solution is used to generate a search direction (∆x, ∆y) for a linesearch on the augmented Lagrangian (regarded as a function of both x and y). Typically only 1–5 function and gradient values are needed in each linesearch, regardless of problem size. However, SQP methods are likely to need more major and minor iterations than LCL methods (i.e., more linear algebra).

2 An NPSOL application

NPSOL [9] is a successful dense SQP implementation. It has found wide use within the NAG Library and has proved remarkably robust and efficient on highly nonlinear optimization problems, particularly within the aerospace industry. In the 1990s, the largest application we encountered was a trajectory optimization at McDonnell Douglas Space Systems (Huntington Beach) involving about 2000 constraints and variables, computing controls for the last breath-taking seconds of flight of the DC-Y, a proposed SSTO (Single-Stage-To-Orbit) manned spacecraft. The aim was to minimize the fuel needed as the rocket, descending nose-first to Earth, started rotating at an altitude of 2800 feet and slowed down using rocket thrust just in time to land on its tail. NPSOL's solution showed that the landing maneuver could be accomplished using half the fuel estimated by classical techniques.

A second optimization was used to determine when the rotation should begin. With an extra constraint limiting acceleration to 3g (since a human crew would be aboard!), the optimal rotation altitude proved to be only 1400 feet, the height of the Empire State Building. Such touchdowns would be far more abrupt and astonishing than the Apollo moon landings. (More like the Lunar Excursion Module's take-off from the moon played in reverse!)
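To make the major/minor-iteration structure of Section 1 concrete, here is a minimal Python sketch of a generic SQP outer loop. It is illustrative only (not NPSOL or SNOPT code): the callables solve_qp and linesearch are hypothetical stand-ins for the QP subproblem solver and the augmented-Lagrangian linesearch, and the optimality test ignores the bounds for brevity.

```python
import numpy as np

def sqp_outer_loop(grad, c, jac, solve_qp, linesearch, x, y,
                   max_major=50, tol=1e-6):
    """Generic SQP major-iteration loop (a sketch, not NPSOL/SNOPT code).
    The caller supplies:
      solve_qp(g, J, ck, x, y)   -> subproblem solution (x_hat, y_hat)
      linesearch(x, y, dx, dy)   -> steplength alpha in (0, 1]
    Functions and gradients are evaluated once per major iteration plus a few
    times inside the linesearch, never during the QP's minor iterations."""
    for k in range(max_major):
        ck, Jk, gk = c(x), jac(x), grad(x)
        # Crude optimality test (ignores the bounds l <= x <= u for brevity).
        if max(np.linalg.norm(ck, np.inf),
               np.linalg.norm(gk - Jk.T @ y, np.inf)) <= tol:
            break
        x_hat, y_hat = solve_qp(gk, Jk, ck, x, y)   # minor iterations happen here
        dx, dy = x_hat - x, y_hat - y               # search direction (dx, dy)
        alpha = linesearch(x, y, dx, dy)            # typically 1-5 evaluations
        x, y = x + alpha * dx, y + alpha * dy
    return x, y
```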

3 Quadratic approximations in a subspace

Linearized constraints are a key component of SQP methods, just as they are for LCL methods. They define a subspace in which it is reasonable to form a quadratic approximation to the augmented Lagrangian. Let (x_k, y_k) be the current solution estimate and ρ_k be the current penalty parameter. Also define J_k ≡ J(x_k). The following functions are needed:

    Linear approximation to c(x):     c̄_k(x) = c(x_k) + J_k (x − x_k)
    Departure from linearity:         d_k(x) = c(x) − c̄_k(x)
    Modified augmented Lagrangian:    M_k(x) = φ(x) − y_k^T d_k(x) + ½ ρ_k ‖d_k(x)‖²

                                     ∇M_k(x) = g_0(x) − (J(x) − J_k)^T ỹ_k(x)
                                    ∇²M_k(x) = H_0(x) − Σ_i (ỹ_k(x))_i H_i(x) + ρ_k (J(x) − J_k)^T (J(x) − J_k)
                                      ỹ_k(x) ≡ y_k − ρ_k d_k(x).

The last four expressions simplify greatly when x = x_k. In particular, d_k(x_k) = 0 and ỹ_k(x_k) = y_k, and the terms involving J(x) − J_k and ρ_k vanish:

    M_k(x_k) = φ(x_k),     ∇M_k(x_k) = g_0(x_k),     ∇²M_k(x_k) = H_0(x_k) − Σ_i (y_k)_i H_i(x_k).

With ∆x ≡ x − x_k, the quadratic approximation to M_k(x) at x_k on the subspace c̄_k(x) = 0 is therefore

    Q_k(x) = φ(x_k) + g_0(x_k)^T ∆x + ½ ∆x^T ∇²M_k(x_k) ∆x,

where the quadratic term may or may not be available.

4 NPSOL

Recall that the MINOS subproblem takes the form

    LC_k    minimize_{x ∈ Rⁿ}   M_k(x) = φ(x) − y_k^T d_k(x) + ½ ρ_k ‖d_k(x)‖²
            subject to          c̄_k(x) = 0,    l ≤ x ≤ u.

The stabilized LCL method adapts early subproblems judiciously to ensure global convergence, but ultimately the subproblems become LC_k and their solutions (x_k, y_k) converge rapidly, as predicted by Robinson (1972). The SQP methods in NPSOL and SNOPT are based on the same subproblem, but with the objective M_k(x) replaced by a quadratic approximation Q_k(x):

    QP_k    minimize_{x ∈ Rⁿ}   Q_k(x) = φ(x_k) + g_0(x_k)^T ∆x + ½ ∆x^T H_k ∆x
            subject to          c̄_k(x) = 0,    l ≤ x ≤ u,

where H_k is a quasi-Newton approximation to ∇²M_k(x) (perhaps with ρ_k = 0). To achieve global convergence, QP_k is regarded as providing a search direction (∆x, ∆y) along which the augmented Lagrangian M_k(x) (with ρ_k ≥ 0) may be reduced as a function of both primal and dual variables. Note that the true Hessian of M_k(x) is zero in rows and columns associated with linear variables. In NPSOL, slack variables are therefore excluded from H_k, and in SNOPT both slacks and other linear variables in x are excluded.
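The following short Python sketch assembles the Section 3 quantities for given callables phi, grad, c, jac (illustrative only, not MINOS or SNOPT code). It is handy for checking the simplifications at x = x_k numerically.

```python
import numpy as np

def make_modified_lagrangian(phi, grad, c, jac, xk, yk, rho):
    """Build the Section 3 quantities (an illustrative sketch): the linearization
    cbar_k, the departure from linearity d_k, and the modified augmented
    Lagrangian M_k with its gradient."""
    ck, Jk = c(xk), jac(xk)

    def cbar_k(x):                 # cbar_k(x) = c(xk) + J_k (x - xk)
        return ck + Jk @ (x - xk)

    def d_k(x):                    # d_k(x) = c(x) - cbar_k(x);  d_k(xk) = 0
        return c(x) - cbar_k(x)

    def M_k(x):                    # M_k(x) = phi(x) - yk' d_k(x) + (rho/2) ||d_k(x)||^2
        d = d_k(x)
        return phi(x) - yk @ d + 0.5 * rho * (d @ d)

    def grad_M_k(x):               # grad M_k(x) = g0(x) - (J(x) - J_k)' (yk - rho d_k(x))
        ytilde = yk - rho * d_k(x)
        return grad(x) - (jac(x) - Jk).T @ ytilde

    return cbar_k, d_k, M_k, grad_M_k

# At x = xk:  d_k(xk) = 0,  M_k(xk) = phi(xk),  grad_M_k(xk) = g0(xk),
# matching the simplifications noted above.
```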

The quasi-Newton Hessian approximations are therefore of the form

    H_k = ( H̄_k   0 )
          (  0    0 ),

where H̄_k is positive definite. The positive-definiteness provides an intrinsic bound on the relevant parts of ∆x, improving the validity of the quadratic model.

4.1 NPSOL's merit function

Let (x̂_k, ŷ_k) be the primal and dual solution of QP_k. NPSOL forms the search direction

    (∆x, ∆y) = (x̂_k − x_k, ŷ_k − y_k)

and seeks a steplength α such that moving to the point (x, y) ≡ (x_k + α ∆x, y_k + α ∆y) (where 0 < α ≤ 1) produces a sufficient decrease in the augmented Lagrangian

    L(x, y, ρ_k) = φ(x) − y^T c(x) + ½ ρ_k ‖c(x)‖²

relative to L(x_k, y_k, ρ_k). Initially ρ_k = 0. Later, ρ_k is increased if necessary to ensure descent for the merit function. It may also be decreased a limited number of times, because a finite (unknown) value is always adequate for well-defined problems, and because ρ_k = 0 becomes acceptable within some neighborhood of a solution to NCB. The global convergence properties of NPSOL (and the strategy for adjusting ρ_k) are described in Gill, Murray, Saunders, and Wright [11]. The QP subproblems are assumed to be feasible and to be solved accurately.

4.2 Hessian updates

A dense triangular factorization H_k = R_k^T R_k is maintained in NPSOL. After the linesearch, a BFGS update of the form

    R_{k+1} = Q_k R_k (I + r_k s_k^T),     Q_k orthogonal,

is made to improve the quality of the Hessian approximation within some subspace. Various choices are possible for the vectors r_k and s_k. They reflect the change in x and the change in gradient of some updated Lagrangian L_k(x, y_{k+1}, ρ_{k+1}) defined with the latest estimate of y (but with ρ_{k+1} = 0 where possible).

NPSOL uses the least-squares solver LSSOL [5] for the subproblems QP_k. LSSOL deals directly with the triangular factor R_k. It is unique in avoiding the squared condition number that other solvers fall prey to with Hessians of the form R_k^T R_k.

5 SNOPT

The reliability of NPSOL and LSSOL allowed them to be applied to increasingly large problems (particularly at McDonnell Douglas Space Systems, Huntington Beach, CA during the 1990s). However, LSSOL uses orthogonal transformations of J(x_k) to solve the QP subproblems, and as problem size increases, the total cost becomes dominated by the associated linear algebra. To a large extent, SNOPT is a large-scale SQP implementation that combines the virtues of MINOS and NPSOL. Sparse Jacobians are handled as in the reduced-gradient method (using LUSOL to maintain a sparse basis factorization), and the same merit function as in NPSOL ensures global convergence.
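As a rough illustration of the effect of the update in Section 4.2, here is a plain BFGS update applied to H_k itself. This is only a sketch: NPSOL works with the triangular factor R_k (so positive definiteness is preserved by construction), and its rules for choosing the update vectors and for skipping updates differ from the simple safeguard shown here.

```python
import numpy as np

def bfgs_update(H, s, q, skip_tol=1e-8):
    """Standard BFGS update of a dense positive-definite approximation H, with
    s = change in x over the accepted step and q = corresponding change in the
    gradient of the (modified) Lagrangian.  A sketch only; NPSOL updates the
    factor R_k of H_k = R_k' R_k and uses its own update/skip rules."""
    Hs = H @ s
    sHs = s @ Hs
    sq = s @ q
    if sq <= skip_tol * np.sqrt(sHs * (q @ q)):   # simple curvature safeguard,
        return H                                   # not NPSOL's rule: skip update
    return H - np.outer(Hs, Hs) / sHs + np.outer(q, q) / sq
```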

The primary innovation is a method for treating the QP Hessian in sparse form. Also, infeasible problems NCB are dealt with more methodically by including an ℓ_1 penalty term within the problem definition (where e is a vector of ones):

    NCB(γ)    minimize_{x, v, w}   φ(x) + γ e^T (v + w)
              subject to           c(x) + v − w = 0,    l ≤ x ≤ u,    v, w ≥ 0.

Initially γ = 0, but it is set to γ = 10^4 (1 + ‖g_0(x_k)‖) if one of the QP subproblems QP_k(0) (see below) proves to be infeasible, or if the norm of the dual variables for QP_k(0) becomes large. SNOPT is then in elastic mode for subsequent major iterations. Optimization continues with that value of γ until an optimal solution of NCB(γ) has been obtained. If the nonlinear constraints are not satisfied (‖c(x)‖ not sufficiently small), γ is increased and optimization continues. If γ reaches its allowable maximum value and the nonlinear constraints are still violated significantly, the problem is declared infeasible. At this stage, SNOPT has essentially minimized ‖c(x)‖_1.

5.1 The QP subproblem

If SNOPT is in elastic mode, the SNOPT subproblems include the ℓ_1 penalty terms verbatim:

    QP_k(γ)   minimize_{x, v, w}   φ(x_k) + g_0(x_k)^T ∆x + ½ ∆x^T H_k ∆x + γ e^T (v + w)
              subject to           c̄_k(x) + v − w = 0,    l ≤ x ≤ u,    v, w ≥ 0,

with γ > 0. But before elastic mode is activated, γ and v, w are zero.

5.2 SQOPT

SNOPT includes a sparse QP solver SQOPT, which employs a two-phase active-set algorithm to achieve feasibility and optimality. SQOPT implements elastic programming implicitly. Any specified bounds on the variables or constraints may be elastic. The associated variables or slacks are given a piecewise-linear objective function that penalizes values outside the real bounds. If γ > 0 in NCB(γ) (perhaps because previous QP subproblems were infeasible), SNOPT defines elastic bounds on the slacks associated with linearized nonlinear constraints. SQOPT then solves each QP_k(γ) with v and w implemented implicitly.

Separately, SNOPT can use SQOPT to find a point close to a user-provided starting point x_0. One option is to set l_j = u_j = (x_0)_j for the nonlinear variables and specify that those bounds are elastic. (Linear variables retain their true bounds.) SQOPT will then solve the proximal point problem

    PP1    minimize_{x ∈ Rⁿ}   ‖x − x_0‖_1

subject to the linear constraints and (modified) bounds. Alternatively, SNOPT's PP2 option asks SQOPT to minimize ½ ‖x − x_0‖²_2 subject to the linear constraints and true bounds. For further details of SNOPT and SQOPT, see [6, 7, 8].

5.3 Limited-memory Hessian updates

For moderate-sized problems, SNOPT maintains a dense Hessian approximation similar to that in NPSOL. For large problems with thousands of nonlinear variables, a limited-memory procedure is used to represent H_k. At certain iterations k = r, say, a positive-definite diagonal estimate H_r = R_r² is used. In particular, R_0 is a multiple of the identity. Up to ℓ quasi-Newton updates (default ℓ = 10) are accumulated to represent the Hessian as

    H_k = R_k^T R_k,       R_k = R_r ∏_{j=r}^{k−1} (I + r_j s_j^T).

SQOPT accesses H_k by requesting products of the form H_k v.
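The product form above makes H_k v cheap to apply without ever forming H_k. A minimal Python sketch follows, assuming R_r is stored as a vector of diagonals and the update pairs (r_j, s_j) are kept in a list (illustrative only, not SNOPT/SQOPT code).

```python
import numpy as np

def limited_memory_Hv(diag_Rr, pairs, v):
    """Apply H_k = R_k' R_k to a vector, where
       R_k = R_r (I + r_r s_r')(I + r_{r+1} s_{r+1}') ... (I + r_{k-1} s_{k-1}'),
    R_r = diag(diag_Rr) and pairs = [(r_r, s_r), ..., (r_{k-1}, s_{k-1})].
    Work and storage grow linearly with the number of stored pairs."""
    t = v.copy()
    for r_j, s_j in reversed(pairs):      # rightmost factor of R_k acts first
        t = t + r_j * (s_j @ t)
    t = diag_Rr * t                       # t = R_k v
    w = diag_Rr * t                       # begin w = R_k' t
    for r_j, s_j in pairs:                # transposed factors (I + s_j r_j')
        w = w + s_j * (r_j @ w)
    return w                              # w = H_k v
```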

The work and storage required grow linearly with k − r. When k = r + ℓ, the diagonals of R_k are computed and saved to form the next R_r. Storage is then reset by discarding the vectors {r_j, s_j}. More elaborate limited-memory methods are known, notably those of Nocedal et al. [16, 2]. The simple form above has an advantage for problems with purely linear constraints: the reduced Hessian required by the reduced-gradient implementation in SQOPT can be updated between major iterations. This is important on large problems with many superbasic variables. As the SQP iterations converge, very few minor iterations are needed to solve QP_k and the cost becomes dominated by the formation of Z^T H_k Z and its factors, but when the constraints are linear, this expense is avoided. (The LU factors of the basis can also be retained.)

The thesis research of Andrew Bradley [1] led to a new limited-memory quasi-Newton implementation for SNOPT7. A circular buffer is used to hold the latest pairs of update vectors {s_j, y_j} (recording the recent changes in x and the gradient of the Lagrangian), and the initial diagonal Hessian approximation H_0 is continually updated. The resulting LMCBD procedure (limited-memory circular buffer with diagonal update) produces a robust Hessian approximation that on average outperforms SNOPT's previous limited-memory implementation.

5.4 Quasi-Newton and CG

To avoid forming and factorizing the reduced Hessian at the start of each major iteration, SNOPT can use a quasi-Newton option in SQOPT. It maintains dense triangular factors R^T R ≈ Z^T H_k Z analogous to the RG method in MINOS. A further option is to use the conjugate-gradient method to solve the reduced-Hessian system Z^T H_k Z ∆x_S = −Z^T g. This often converges in remarkably few iterations even when there are many thousands of superbasic variables. The diagonal plus low-rank structure of H_k plus the large identity matrix in Z must be largely responsible, though it would be easier to explain if the diagonal part H_r = R_r² were always the identity.
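For the CG option just described, the reduced-Hessian system can be solved with a textbook conjugate-gradient iteration that touches H_k only through matrix-vector products (for example the limited-memory product sketched earlier). This is an illustrative sketch with Z held as an explicit dense matrix for simplicity; SQOPT applies Z and Z^T operationally and includes further safeguards.

```python
import numpy as np

def reduced_hessian_cg(Hv, Z, g, tol=1e-8, maxiter=200):
    """Solve (Z' H_k Z) dxS = -Z' g by conjugate gradients, accessing H_k only
    through products Hv(v) = H_k v.  Illustrative sketch only."""
    rhs = -(Z.T @ g)
    dxS = np.zeros_like(rhs)
    r = rhs.copy()                        # residual for the zero starting point
    p = r.copy()
    rr = r @ r
    rhs_norm = np.linalg.norm(rhs)
    for _ in range(maxiter):
        if np.sqrt(rr) <= tol * max(rhs_norm, 1.0):
            break
        Hp = Z.T @ Hv(Z @ p)              # (Z' H_k Z) p without forming the matrix
        alpha = rr / (p @ Hp)
        dxS += alpha * p
        r -= alpha * Hp
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return dxS
```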
5.5 QP solvers for many degrees of freedom

Hanh Huynh's thesis research [15] included the development of a convex QP solver QPBLU that periodically factorizes the current KKT matrix

    K_0 = ( H_0   W^T )
          (  W     0  ),        H_0 diagonal,

and uses block-LU updates to accommodate up to about 100 active-set changes to W as well as a few BFGS updates to H_0. The aim was to handle many degrees of freedom with a direct method rather than the CG option above (allowing W to be large, with many more columns than rows). QPBLU was implemented in Fortran 95 and designed to use any available sparse solver to factorize the current K_0. It included interfaces to LUSOL [10], MA57 [4], PARDISO [18], SuperLU [3], and UMFPACK [19]. Such solvers are assumed to give a factorization K_0 = LU and to permit separate solves with L, L^T, U, and U^T. For example, MA57's symmetric factorization K_0 = LDL^T was regarded as K_0 = L(DL^T). At the time, PARDISO did not separate its factors, so QPBLU used L = I and U = K_0.
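The sketch below illustrates the block-LU (Schur-complement) idea behind such updates: after K_0 is factorized once, a bordered system that appends columns V (arising from subsequent active-set changes) can be solved by reusing the K_0 factors and factorizing only a small Schur complement. The routine names, the zero border block, and the use of SciPy's dense LU are illustrative assumptions, not QPBLU's actual interface or data structures.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def factor_K0(H0_diag, W):
    """Factorize K0 = [diag(H0) W'; W 0] once (dense LU here for illustration;
    QPBLU would call a sparse solver such as LUSOL or MA57)."""
    m = W.shape[0]
    K0 = np.block([[np.diag(H0_diag), W.T],
                   [W, np.zeros((m, m))]])
    return lu_factor(K0)

def solve_bordered(K0_fact, V, b, f):
    """Solve [K0 V; V' 0] [x; z] = [b; f] by block elimination, reusing the
    factors of K0 for every solve with the enlarged system."""
    Y = lu_solve(K0_fact, V)                 # K0 Y = V
    C = -(V.T @ Y)                           # small Schur complement of K0
    w = lu_solve(K0_fact, b)                 # K0 w = b
    z = np.linalg.solve(C, f - V.T @ w)      # dense solve with the Schur complement
    x = w - Y @ z
    return x, z
```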

This work was continued by Chris Maes for his thesis research [17]. A new QP solver QPBLUR was developed with modules from QPBLU serving as a starting point. QPBLUR used a BCL method to solve a sequence of regularized convex QP subproblems (in order to solve the given convex QP). It was incorporated into SNOPT7 as an alternative to SQOPT. The aim was to allow use of the most advanced (possibly parallel) linear system solvers within an active-set QP solver and hence within an SQP algorithm. In May 2010, Chris installed a more recent version of PARDISO that allowed separate solves with L and U and optionally used multiple CPUs during its Factor phase. The intention was for SNOPT to use SQOPT while the number of superbasic variables remains below 2000 (say), then switch to QPBLUR if necessary. With its warm-start capability, QPBLUR bridged the gap between null-space active-set methods and full-space interior methods. Comprehensive numerical comparisons are given in [17].

Meanwhile, for her thesis research [20], Elizabeth Wong developed a convex and nonconvex QP solver icqp, based again on block-LU factors of KKT systems (but without regularization). More recently, Gill and Wong have developed SQIC [14, 13, 12], a further block-LU/KKT-based solver for convex and nonconvex QP intended for use within SNOPT9 (a Fortran 2003 implementation of SNOPT). SQIC manages the transition from a reduced-Hessian method to a block-LU method for updating the active set.

References

[1] A. M. Bradley. Algorithms for the Equilibration of Matrices and Their Application to Limited-Memory Quasi-Newton Methods. PhD thesis, ICME, Stanford University, May.
[2] R. H. Byrd, J. Nocedal, and R. B. Schnabel. Representations of quasi-Newton matrices and their use in limited-memory methods. Math. Program., 63.
[3] J. W. Demmel, S. C. Eisenstat, J. R. Gilbert, X. S. Li, and J. W. H. Liu. A supernodal approach to sparse partial pivoting. SIAM J. Matrix Anal. Appl., 20(3).
[4] I. S. Duff. MA57: a Fortran code for the solution of sparse symmetric definite and indefinite systems. ACM Trans. Math. Software, 30(2). See ldl in Matlab.
[5] P. E. Gill, S. J. Hammarling, W. Murray, M. A. Saunders, and M. H. Wright. User's guide for LSSOL (Version 1.0): a Fortran package for constrained linear least-squares and convex quadratic programming. Report SOL 86-1, Department of Operations Research, Stanford University, CA.
[6] P. E. Gill, W. Murray, and M. A. Saunders. SNOPT: An SQP algorithm for large-scale constrained optimization. SIAM Review, 47(1):99–131. SIGEST article.
[7] P. E. Gill, W. Murray, and M. A. Saunders. User's guide for SNOPT 7: Software for large-scale nonlinear programming. (See Software.)
[8] P. E. Gill, W. Murray, and M. A. Saunders. User's guide for SQOPT 7: Software for large-scale linear and quadratic programming. (See Software.)
[9] P. E. Gill, W. Murray, M. A. Saunders, and M. H. Wright. User's guide for NPSOL (Version 4.0): a Fortran package for nonlinear programming. Report SOL 86-2, Department of Operations Research, Stanford University, Stanford, CA.
[10] P. E. Gill, W. Murray, M. A. Saunders, and M. H. Wright. Maintaining LU factors of a general sparse matrix. Linear Algebra and its Applications, 88/89.
[11] P. E. Gill, W. Murray, M. A. Saunders, and M. H. Wright. Some theoretical properties of an augmented Lagrangian merit function. In P. M. Pardalos, editor, Advances in Optimization and Parallel Computing. North-Holland.
[12] P. E. Gill and E. Wong. Numerical results for SQIC: Software for large-scale quadratic programming. Report CCoM 14-4, Center for Computational Mathematics, University of California, San Diego, La Jolla, CA.
[13] P. E. Gill and E. Wong. User's guide for SQIC: Software for large-scale quadratic programming. Report CCoM 14-2, Center for Computational Mathematics, University of California, San Diego, La Jolla, CA.
[14] P. E. Gill and E. Wong. Methods for convex and general quadratic programming. Math. Prog. Comp., 7:71–112.
[15] H. M. Huynh. A Large-scale Quadratic Programming Solver Based On Block-LU Updates of the KKT System. PhD thesis, SCCM, Stanford University, Sep.
[16] D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Math. Program., 45.
[17] C. M. Maes. A Regularized Active-Set Method for Sparse Convex Quadratic Programming. PhD thesis, ICME, Stanford University, Nov.
[18] PARDISO parallel sparse solver.
[19] UMFPACK solver for sparse Ax = b.
[20] E. Wong. Active-Set Methods for Quadratic Programming. PhD thesis, Department of Mathematics, University of California, San Diego, June 2011.
