Stanford University, Management Science & Engineering (and ICME)
MS&E 318 (CME 338) Large-Scale Numerical Optimization
Instructor: Michael Saunders, Spring 2017

Notes 9: PDCO: Primal-Dual Interior Methods

1 Interior methods for linear optimization

First we consider vanilla LO problems of the form

   LO:   minimize_x  c^T x   subject to  Ax = b : y,   x ≥ 0 : z,

where A ∈ R^{m×n} and y, z are the dual variables for the general constraints and bounds. We assume m < n and rank(A) = m. The dual problem is

   LD:   maximize_{y,z}  b^T y   subject to  A^T y + z = c : w,   z ≥ 0 : x,

where w = x, so w will not appear further. We assume that there exists a point (x, y, z) that is primal-dual feasible:

   Ax = b,  x ≥ 0,   A^T y + z = c,  z ≥ 0.

We further assume that the interior-point condition is satisfied: that there exists a primal-dual feasible point (x, y, z) that is strictly interior to the bounds:

   Ax = b,  x > 0,   A^T y + z = c,  z > 0.

Some important results follow from the feasibility assumption. First note that if x is primal feasible and (y, z) is dual feasible, then

   c^T x = x^T (A^T y + z) = b^T y + x^T z ≥ b^T y,

so that x^T z (the duality gap) is an important quantity. Indeed it is zero at an optimal solution.

Strong Duality: If (x, y, z) is an optimal primal-dual feasible point, c^T x = b^T y and x^T z = 0.

Strict Complementarity (Goldman and Tucker [7]): There exists a primal-dual feasible point (x, y, z) such that x^T z = 0 and x + z > 0.

Interior methods (often called interior-point methods or IPMs) differ from primal or dual simplex methods in their handling of the bounds on x and z and their treatment of the complementarity condition x^T z = 0. First note that the optimality conditions for LO and LD may be stated as

   Ax = b,           (1)
   A^T y + z = c,    (2)
   Xz = 0,           (3)
   x, z ≥ 0,         (4)

where X = diag(x_j) and constraints (3)-(4) are a nonlinear way of imposing the complementarity condition. They are a more direct statement of the requirement that at least one of each pair (x_j, z_j) be zero, j = 1:n. We could replace (3) by the single equation x^T z = 0, or we could replace (3)-(4) by the Matlab-type vector expression min(x, z) = 0.
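In Matlab the optimality measures behind (1)-(4) are cheap to evaluate. The following fragment is a minimal sketch of our own (the helper name kktresid is not part of PDCO):

   function [r1,r2,comp] = kktresid(A,b,c,x,y,z)
   % Residuals of the LO optimality conditions (1)-(3).
   % The bounds (4), x, z >= 0, are assumed to be maintained separately.
     r1   = b - A*x;         % primal residual, equation (1)
     r2   = c - A'*y - z;    % dual residual, equation (2)
     comp = x.*z;            % complementarity Xz of equation (3)
   end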

However, interior methods advanced dramatically following Megiddo's 1986 proposal to work with a perturbed form of Xz = 0 [14].

Simplex methods satisfy the complementarity condition at all times. Primal Simplex satisfies (1)-(3) and x ≥ 0 while iterating until z ≥ 0. Dual Simplex satisfies (1)-(3) and z ≥ 0 while iterating until x ≥ 0. In contrast, primal-dual interior methods satisfy x > 0 and z > 0 throughout while iterating to satisfy (1)-(3). A key concept is to parameterize the complementarity equation and work with the system of nonlinear equations

   Ax = b,   A^T y + z = c,   Xz = µe,        (5)

where e is a vector of 1s and µ > 0. The conditions x > 0, z > 0 are understood to hold throughout. The interior-point condition ensures that a solution exists for at least some µ > 0. In fact, (5) gives the unique solution of the convex problem

   CO(µ):   minimize  c^T x − µ Σ_j ln(x_j)   subject to  Ax = b,  x > 0,

which is well defined for all µ > 0. The infinite sequence of solutions {x(µ), y(µ), z(µ)} for µ > 0 is called the central path for LO and LD. Primal-dual IPMs apply Newton's method for nonlinear equations to system (5) with µ decreasing toward zero in discrete stages. A vital concept is the proximity of the current estimate (x, y, z) to the central path. The current value of µ is retained for each Newton step until some measure of proximity is suitably small; for example, max(Xz)/min(Xz) ≤ 1000. Then µ is reduced in some way (e.g., to (1 − α)µ for some α ∈ (0, 1)) and Newton's method continues.

Much research has occurred on interior methods since the mid 1980s (a revival after much earlier work on penalty and barrier methods). The monograph by Peng, Roos and Terlaky [22] summarizes much of the theory behind modern IPMs for LO and other problems, and gives a novel approach to measuring proximity. See also Wright [30], Nocedal and Wright [17]. A finite termination strategy proposed by Ye [33] makes use of the intriguing Goldman-Tucker theorem to guess the sets B and N for which x_B > 0 and z_N > 0, based on the condition x_j ≥ z_j ⟺ j ∈ B; see Mehrotra and Ye [15], Wright [30, pp. 146-149].
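The whole path-following loop fits in a few lines of Matlab. The following schematic sketch (not PDCO) solves the full Newton system (6) of the next subsection with backslash on a dense toy problem; the parameter choices (reduce µ by a factor of 10 when the proximity test passes, step 99.5% of the way to the boundary) are illustrative assumptions only:

   function [x,y,z] = lo_ipm_sketch(A,b,c)
   % Schematic primal-dual interior method for min c'x st Ax = b, x >= 0.
   % Applies Newton's method to (5) while reducing mu in stages.
     [m,n] = size(A);
     x = ones(n,1);  z = ones(n,1);  y = zeros(m,1);  % strictly interior start
     mu = (x'*z)/n;
     for k = 1:100
       r1 = b - A*x;  r2 = c - A'*y - z;  r3 = mu - x.*z;
       K  = [A           zeros(m,m)  zeros(m,n)
             zeros(n,n)  A'          eye(n)
             diag(z)     zeros(n,m)  diag(x)];        % system (6)
       d  = K \ [r1; r2; r3];
       dx = d(1:n);  dy = d(n+1:n+m);  dz = d(n+m+1:end);
       ax = 1/max([1; -dx./(0.995*x)]);               % keep x + ax*dx > 0
       az = 1/max([1; -dz./(0.995*z)]);               % keep z + az*dz > 0
       x  = x + ax*dx;  y = y + az*dy;  z = z + az*dz;
       if max(x.*z)/min(x.*z) <= 1000                 % proximity test passed:
         mu = 0.1*mu;                                 % reduce mu
       end
       if x'*z <= 1e-8*n, break, end                  % duality gap small: stop
     end
   end

Practical implementations differ mainly in how they solve the Newton system, which is the subject of the remainder of these notes.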

1.1 The Newton system

Linearizing (5) at the current estimate (x, y, z) gives the following equation for the Newton direction (∆x, ∆y, ∆z):

   [ A    0    0 ] [ ∆x ]   [ r1 ]        r1 = b − Ax,
   [ 0    A^T  I ] [ ∆y ] = [ r2 ],       r2 = c − A^T y − z,        (6)
   [ Z    0    X ] [ ∆z ]   [ r3 ]        r3 = µe − Xz,

where Z = diag(z_j). We can expect A to be a very large sparse matrix. In some applications, A may be an operator for which products Av and A^T w can be computed for arbitrary v, w.

Note that X and Z are positive-definite diagonal matrices with no large elements but increasingly many small elements as µ → 0 (since x_j z_j ≈ µ as Newton's method converges for any given µ). Thus, X and Z both become increasingly ill-conditioned. This need not imply that system (6) is ill-conditioned (although it may be). If the iterates stay reasonably near the central path, we know from strict complementarity that either x_j = O(1), z_j = O(µ) or vice versa. Scaling the jth rows of Z and X in (6) by 1/max{x_j, z_j} should hence keep the condition of the 3×3 block matrix similar to the condition of A.

We can make (6) structurally symmetric by interchanging the first two rows:

   [ 0    A^T  I ] [ ∆x ]   [ r2 ]
   [ A    0    0 ] [ ∆y ] = [ r1 ].        (7)
   [ Z    0    X ] [ ∆z ]   [ r3 ]

This is equivalent to the symmetric system

   [ 0        A^T  Z^{1/2} ] [ ∆x         ]   [ r2          ]
   [ A        0    0       ] [ ∆y         ] = [ r1          ],        (8)
   [ Z^{1/2}  0    X       ] [ Z^{−1/2}∆z ]   [ Z^{−1/2} r3 ]

which may be helpful for some sparse-matrix solvers of the future. Indeed, Greif, Moulding, and Orban [9] analyze the eigenvalues of a similar system for convex quadratic optimization and advocate working directly with that system rather than eliminating variables.

Nevertheless, systems (6)-(8) are large. Much research has been devoted to finding efficient and reliable numerical methods for solving such systems by eliminating blocks of variables to reduce their size. First, it seems reasonable to regard I as a safe block pivot to eliminate ∆z from (6). Permuting I to the top left gives

   [ I    0    A^T ] [ ∆z ]   [ r2 ]
   [ X    Z    0   ] [ ∆x ] = [ r3 ].        (9)
   [ 0    A    0   ] [ ∆y ]   [ r1 ]

Subtracting X times the first equation from the second gives

   [ Z   −XA^T ] [ ∆x ]   [ r3 − Xr2 ]
   [ A    0    ] [ ∆y ] = [ r1       ]      and   ∆z = r2 − A^T ∆y.        (10)

Alternatively we can permute X to the top left:

   [ X    Z    0   ] [ ∆z ]   [ r3 ]
   [ I    0    A^T ] [ ∆x ] = [ r2 ].        (11)
   [ 0    A    0   ] [ ∆y ]   [ r1 ]

Using X as a (dangerous!) block pivot and defining r4 = r2 − X^{−1} r3 gives

   [ −X^{−1}Z   A^T ] [ ∆x ]   [ r4 ]
   [  A         0   ] [ ∆y ] = [ r1 ]      and   X∆z = r3 − Z∆x.        (12)

The hard part is solving the block 2×2 systems in (10) or (12). Those who like to live dangerously (or don't know any better!) use Z in (10) or X^{−1}Z in (12) as a block pivot to eliminate ∆x, giving

   A Z^{−1}X A^T ∆y = r1 + A Z^{−1}(Xr2 − r3),   Z∆x = r3 − Xr2 + XA^T ∆y.

Defining D^2 = XZ^{−1} gives

   A D^2 A^T ∆y = r1 + A D^2 r4,   ∆x = D^2 (A^T ∆y − r4).        (13)

Often r1 reaches zero before the other residuals (if a full step ∆x is taken). System (13) is then the normal equations for the least-squares problem min_{∆y} ||D r4 − D A^T ∆y||. Although this casts immediate doubt, system (13) has better numerical properties than one might think, as long as the iterates stay near the central path (r3 small); see Wright [29, 30]. Many implementations are based on (13). The main advantage of the normal-equations approach is that standard sparse Cholesky factorizations may be applied to A D^2 A^T. A single call to the Analyze (ordering) procedure suffices because only D changes. Most of the work for solving an LO problem goes into factorizing only 20 to 50 matrices with constant sparsity pattern. Since parallel Cholesky factorizations exist (MUMPS [16], PARDISO [20], POOCLAPACK [24, 10], WSMP [32]), it becomes clear that interior methods are much easier to parallelize than simplex methods.
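In Matlab the normal-equations step (13) might look as follows (a sketch assuming a sparse explicit A and current vectors x, z, r1, r4; the one-time fill-reducing ordering p plays the role of the Analyze call, since the sparsity pattern of A D^2 A^T never changes):

   % One Newton step via the normal equations (13).
   d2 = x./z;                           % diagonal of D^2 = X Z^{-1}
   M  = A*spdiags(d2,0,n,n)*A';         % A D^2 A', constant sparsity pattern
   p  = amd(M);                         % Analyze: reusable for all iterations
   R  = chol(M(p,p));                   % sparse Cholesky, M(p,p) = R'*R
   rhs = r1 + A*(d2.*r4);
   dy  = zeros(m,1);
   dy(p) = R\(R'\rhs(p));
   dx  = d2.*(A'*dy - r4);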

1.2 Augmented systems

A mechanical difficulty with (13) is that if A contains one or more rather dense columns, the matrix A D^2 A^T will itself be very dense (let alone its factorization). Various devices have been proposed to alleviate this difficulty, but each tends to cast further numerical doubt on the normal-equations approach.

Returning to the system (10), we may symmetrize it in various ways. For example, a similarity transformation involving x̄ = X^{−1/2}∆x preserves the eigenvalues (but unfortunately not the singular values!):

   [ −Z         X^{1/2}A^T ] [ x̄  ]   [ X^{1/2}r2 − X^{−1/2}r3 ]
   [  AX^{1/2}  0          ] [ ∆y ] = [ r1                     ].        (14)

Alternatively, with ∆x = βD x̂, the equivalent system

   [ −βI   DA^T ] [ x̂  ]   [ D r4  ]
   [  AD   0    ] [ ∆y ] = [ r1/β ]        (15)

has good numerical properties if β is judiciously small and ||AD|| is not too large. Fourer and Mehrotra [4] obtained good performance applying their own symmetric indefinite LBL^T factorizer (involving block-diagonal B with blocks of order 1 or 2). System (12) can also be solved using an LBL^T factorization. Wright [31] has analyzed this approach for both non-degenerate and degenerate LO problems and shown it to be more reliable than expected (from the block-pivot on X).

1.3 Quasi-definite systems

Sparse Cholesky-type factorizations of augmented systems became viable with the concept of symmetric quasi-definite matrices (Vanderbei [27]) and the advent of LOQO [13, 28]. By judicious formulation of the LO problem itself, Vanderbei ensures that the augmented systems have the form

   M = [ −E   A^T ]
       [  A   F   ],        (16)

where E and F are positive definite. Factorizations PMP^T = LDL^T exist for arbitrary symmetric permutations P, with D diagonal but indefinite. Such factorizations are easier to justify with the help of certain perturbations to the LO problem, as discussed next.
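The quasi-definite property is easy to see in Matlab. The following throwaway sketch (ours, purely illustrative) factorizes a random matrix of the form (16) with a static, sparsity-based ordering and no numerical pivoting:

   [m,n] = size(A);                 % any sparse A
   E = spdiags(rand(n,1)+1,0,n,n);  % positive-definite diagonal blocks
   F = spdiags(rand(m,1)+1,0,m,m);
   M = [-E A'; A F];                % SQD matrix as in (16)
   q = symamd(M);                   % an arbitrary symmetric permutation
   [L,D,P] = ldl(M(q,q),0);         % thresh 0: rely on the static ordering only
   assert(isdiag(D))                % all pivots 1x1, as the theory predicts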

2 Regularized linear optimization

To improve the reliability of Newton's method, and to generate quasi-definite formulations with guaranteed stability, we consider perturbations to problem LO. We define a regularized LO problem to be

   LO(γ, δ):  minimize_{x,r}  c^T x + 1/2 ||γx||^2 + 1/2 ||r||^2
              subject to  Ax + δr = b,   x ≥ 0,

where γ and δ are typically 10^{−3} or 10^{−4} on machines with today's normal 15-16 digit floating-point arithmetic. We assume that the data A, b, c have been scaled to be of order 1. With positive perturbations, LO(γ, δ) is really a strictly convex quadratic problem with a unique, bounded, optimal solution (x, r, y, z). Small values of γ and δ help keep this unique solution near a solution of the unperturbed LO. For least-squares applications we set δ = 1. Such problems tend to be easier to solve.

2.1 The barrier approach

As before, we replace the non-negativity constraints by the log barrier function to obtain a sequence of convex subproblems with decreasing values of µ:

   CO(γ, δ, µ):  minimize_{x,r}  c^T x + 1/2 ||γx||^2 + 1/2 ||r||^2 − µ Σ_j ln x_j
                 subject to  Ax + δr = b,

where µ > 0 and x > 0 are understood. The first-order optimality conditions state that the gradient of the subproblem objective should be a linear combination of the gradients of the primal constraint. Thus,

   Ax + δr = b,   A^T y = c + γ^2 x − µX^{−1}e,   δy = r,

where X = diag(x_j), y is a vector of dual variables, and e is a vector of 1s. Defining z = µX^{−1}e and immediately converting to the equivalent condition Xz = µe, we obtain a system of nonlinear equations that has a unique solution for each µ:

   Ax + δ^2 y = b,
   A^T y + z = c + γ^2 x,        (17)
   Xz = µe,

where we have eliminated r = δy. These are the parameterized nonlinear equations for LO(γ, δ) corresponding to (5) for the vanilla LO problem. Since the perturbations appear as γ^2 and δ^2, they tend to be negligible on well behaved problems with ||x|| and ||y|| of order 1. Otherwise they help prevent those norms from becoming large. We now apply Newton's method for nonlinear equations, with a steplength restriction to ensure that the estimates of x and z remain strictly positive.

2.2 The Newton system

Linearizing (17) at the current estimate (x, y, z) gives the system analogous to (6):

   [ A        δ^2 I   0 ] [ ∆x ]   [ r1 ]        r1 = b − Ax − δ^2 y,
   [ −γ^2 I   A^T     I ] [ ∆y ] = [ r2 ],       r2 = c + γ^2 x − A^T y − z,        (18)
   [ Z        0       X ] [ ∆z ]   [ r3 ]        r3 = µe − Xz,

where Z = diag(z_j). The analogue of (12) is

   [ −(X^{−1}Z + γ^2 I)   A^T    ] [ ∆x ]   [ r4 ]
   [  A                   δ^2 I  ] [ ∆y ] = [ r1 ]        (19)

with r4 = r2 − X^{−1} r3 again, and ∆z = r2 + γ^2 ∆x − A^T ∆y. A friendlier version analogous to (14) is

   [ −(Z + γ^2 X)   X^{1/2}A^T ] [ x̄  ]   [ X^{1/2} r4 ]
   [  AX^{1/2}      δ^2 I      ] [ ∆y ] = [ r1         ]        (20)

with x̄ = X^{−1/2}∆x. Defining D^2 = (X^{−1}Z + γ^2 I)^{−1}, we have

   (A D^2 A^T + δ^2 I) ∆y = r1 + A D^2 r4,   ∆x = D^2 (A^T ∆y − r4),        (21)

analogous to (13), with ∆x = D^2 (A^T ∆y − r4) as before. Regularization reduces the condition of both D and A D^2 A^T + δ^2 I, thereby helping the normal-equations approach.
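As a sketch (assuming sparse A and scalars gamma, delta with current x, z, r1, r2, r4), one regularized Newton step via (21) could read:

   % One step of the regularized method: solve (21), then recover dz.
   d2 = 1./(z./x + gamma^2);            % diag of D^2 = (X^{-1}Z + gamma^2 I)^{-1}
   M  = A*spdiags(d2,0,n,n)*A' + delta^2*speye(m);
   R  = chol(M);                        % positive definite for any delta > 0
   dy = R\(R'\(r1 + A*(d2.*r4)));
   dx = d2.*(A'*dy - r4);
   dz = r2 + gamma^2*dx - A'*dy;        % from the second row of (18)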

2.3 Quasi-definite systems

With γ and δ positive, we recognize system (19) to be symmetric quasi-definite (SQD). Thus, an indefinite Cholesky-type factorization exists for any symmetric permutation. The question is, under what conditions are the SQD factors stable?

First note that if M in (16) is SQD, then with Ī = diag(−I, I) the matrix M̄ = MĪ is positive definite:

   M̄ = MĪ = [ E    A^T ],   and for any nonzero v = (x, y),   v^T M̄ v = x^T E x + y^T F y > 0.
             [ −A   F   ]

More generally, we find that for any permutation P, PMP^T = LDL^T if and only if PM̄P^T = L D̄ Ū, where Ĩ = PĪP^T, D̄ = DĨ, and Ū = Ĩ L^T Ĩ. Both L and Ū have unit diagonals, and D and D̄ are nonsingular diagonal matrices (D is indefinite). Thus, Golub and Van Loan's analysis of LDU factorization of unsymmetric positive-definite systems without permutations [8] provides a similar analysis of LDL^T factors of SQD matrices. This observation was exploited by Gill et al. [6] to show that PMP^T = LDL^T is stable for every permutation P if ||A|| is not too large compared to ||E|| and ||F||, and diag(E, F) is not too ill-conditioned.

Regarding system (19) as M∆v = r, we have

   M = [ −E   A^T   ],   E = D^{−2} = X^{−1}Z + γ^2 I,
       [  A   δ^2 I ]

and we find that the effective condition number of M is Econd(M) ≈ 1/min{γ^2, δ^2}. Hence the LDL^T approach should be stable until (x, y, z) approaches a solution, with cond(M) becoming increasingly large. A more uniform bound was obtained later by Saunders [25] by writing (19) as the system

   M̂ v̂ = r̂:   [ −δI   DA^T ] [ x̂  ]   [ D r4 ],        (22)
               [  AD   δI   ] [ ∆y ] = [ r1/δ ]

where ∆x = δD x̂. This is still an SQD system, and the same theory shows that

   cond(M̂) ≈ ||AD||/δ ≤ 1/(γδ)   and   Econd(M̂) ≤ 1/(γδ),   assuming ||A|| ≈ 1.

Hence, indefinite Cholesky factorization should be stable for all primal-dual iterations as long as γ^2 δ^2 ≥ ε, where ε is the floating-point precision (typically 2.2 × 10^{−16} on today's machines). Thus we need γδ ≥ 10^{−8}. For least-squares problems with δ = 1, this is readily arranged. For regularized LO problems, γ = δ = 10^{−3} is safe.

Such factorizations have been implemented successfully within IBM's OSL (Optimization Subroutine Library). The sparse Cholesky solver in WSMP [32] is applied to either M or A D^2 A^T + δ^2 I, whichever is more sparse. A major benefit is that any dense columns in A are handled sensibly without special effort.
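The uniform bound is easy to probe numerically. The following throwaway script (an illustration of ours, not from the notes) builds M̂ for random data mimicking a nearly converged iterate, where many ratios z_j/x_j or x_j/z_j are extreme, and compares cond(M̂) with 1/(γδ):

   m = 20;  n = 50;  gamma = 1e-3;  delta = 1e-3;
   A = randn(m,n);  A = A/norm(A);          % ||A|| = 1
   x = rand(n,1);   z = rand(n,1);
   z(1:20)  = 1e-12;                        % near-active pattern: tiny z_j ...
   x(21:40) = 1e-12;                        % ... or tiny x_j
   d = 1./sqrt(z./x + gamma^2);             % diagonal of D
   D = diag(d);
   Mhat = [-delta*eye(n)  D*A'
            A*D           delta*eye(m)];    % the SQD matrix of (22)
   fprintf('cond(Mhat) = %8.1e  <=  1/(gamma*delta) = %8.1e\n', ...
           cond(Mhat), 1/(gamma*delta))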

2.4 Least-squares formulation

With δ > 0, we have an alternative to SQD systems and normal equations. We may write (21) as a least-squares problem even when r1 ≠ 0:

   min_{∆y}  || [ DA^T ] ∆y  −  [ D r4 ] ||
             || [ δI   ]        [ r1/δ ] ||.        (23)

This may be solved inexactly by a conjugate-gradient-type iterative solver such as LSQR [18, 19] or LSMR [2]. It is especially useful when A is an operator, but is also applicable when A is explicit. Clearly δ should not be too small, and δ = 1 is ideal. This approach was used by PDSCO in the Basis Pursuit DeNoising (BPDN) signal decomposition software Atomizer [1]. PDSCO has been superseded by PDCO.
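A sketch of solving (23) with Matlab's lsqr, treating A as an operator (the stacked matrix B = [DA^T; δI] has n+m rows and m columns; Bop is our own helper name):

   d    = 1./sqrt(z./x + gamma^2);          % diagonal of D, as in (21)
   rhs  = [d.*r4; r1/delta];
   Bfun = @(v,flag) Bop(v,flag,A,d,delta);
   [dy,~] = lsqr(Bfun,rhs,1e-8,200);
   dx   = (d.^2).*(A'*dy - r4);             % recover dx from (21)

   function w = Bop(v,flag,A,d,delta)
   % Products with B = [diag(d)*A'; delta*I] and its transpose.
     n = numel(d);
     if strcmp(flag,'notransp')             % v has length m
       w = [d.*(A'*v); delta*v];
     else                                   % v has length n+m
       w = A*(d.*v(1:n)) + delta*v(n+1:end);
     end
   end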

2.5 Exact regularization for linear and quadratic optimization

To ensure numerical stability in solving the above regularized systems, the parameters γ and δ cannot be too small. Values as large as 10^{−3} or 10^{−4} in problem LO(γ, δ) sometimes perturb the underlying LO problem more than a user may wish for. To retain the linear-algebra benefits of regularization while eliminating the perturbation to the solution, Friedlander and Orban [5] solve a sequence of regularized problems of the form

   QO(ρ, δ):  minimize_{x,r}  c^T x + 1/2 x^T Q x + 1/2 ||ρ(x − x_k)||^2 + 1/2 ||δ(r + y_k)||^2
              subject to  Ax + δr = b,   x ≥ 0,

where ρ, δ > 0 and (x_k, y_k) are current estimates of the optimal variables (x, y). The positive semidefinite matrix Q generalizes the class of problems.

3 Convex optimization with linear constraints

PDCO (Primal-Dual Method for Convex Optimization) [21] is a Matlab solver for optimization problems that are nominally of the form

   CO:  minimize_x  φ(x)   subject to  Ax = b,   l ≤ x ≤ u,

where φ(x) is a convex function with known gradient g(x) and Hessian H(x), and A ∈ R^{m×n}. The format of CO is suitable for any linear constraints. For example, a double-sided constraint α ≤ a^T x ≤ β (α < β) should be entered as

   a^T x − ξ = 0,   α ≤ ξ ≤ β,

where the row a^T and the new variable ξ become part of A and x. To allow for constrained least-squares problems, and to ensure unique primal and dual solutions and improved stability, PDCO really solves the regularized problem

   CO2:  minimize_{x,r}  φ(x) + 1/2 ||D1 x||^2 + 1/2 ||r||^2
         subject to  Ax + D2 r = b,   l ≤ x ≤ u,

where D1, D2 are specified positive-definite diagonal matrices. The diagonals of D1 are typically small (10^{−3} or 10^{−4}). Similarly for D2 if the constraints in CO should be satisfied reasonably accurately. For least-squares applications, some diagonals of D2 will be 1. Note that some elements of l and u may be −∞ and +∞ respectively, but we expect no large numbers in A, b, D1, D2. If D2 is small, we would expect A to be under-determined (m < n). If D2 = I, A may have any shape.

3.1 The barrier approach

First we introduce slack variables x1, x2 to convert the bounds to non-negativity constraints:

   CO3:  minimize_{x,r,x1,x2}  φ(x) + 1/2 ||D1 x||^2 + 1/2 ||r||^2
         subject to  Ax + D2 r = b,
                     x − x1 = l,
                     x + x2 = u,
                     x1, x2 ≥ 0.

Then we replace the non-negativity constraints by the log barrier function, obtaining a sequence of convex subproblems with decreasing values of µ (µ > 0):

   CO(µ):  minimize_{x,r,x1,x2}  φ(x) + 1/2 ||D1 x||^2 + 1/2 ||r||^2 − µ Σ_j ln([x1]_j [x2]_j)
           subject to  Ax + D2 r = b   : y,
                       x − x1 = l      : z1,
                       x + x2 = u      : z2,

where y, z1, z2 denote dual variables for the associated constraints. With µ > 0, most variables are strictly positive: x1, x2, z1, z2 > 0. (Exceptions: If l_j = −∞ or u_j = +∞, the corresponding equation is omitted and the jth element of x1 or x2 doesn't exist.)

The KKT conditions for the barrier subproblem involve the three primal equations of CO(µ), along with four dual equations stating that the gradient of the subproblem objective should be a linear combination of the gradients of the primal constraints:

   Ax + D2 r = b,
   x − x1 = l,
   x + x2 = u,
   A^T y + z1 − z2 = g(x) + D1^2 x   : x,
   D2 y = r                          : r,
   X1 z1 = µe                        : x1,
   X2 z2 = µe                        : x2,

where X1 = diag(x1), X2 = diag(x2), and similarly for Z1, Z2 later. The last two equations are commonly called the perturbed complementarity conditions. Initially they are in a different form. The dual equation for x1 is really z1 = µ∇_{x1} Σ_j ln[x1]_j = µX1^{−1}e, where e is a vector of 1s. Thus, x1 > 0 implies z1 > 0, and multiplying by X1 gives the equivalent equation X1 z1 = µe as stated.
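In Matlab these residuals are cheap to evaluate. A small sketch of our own (not PDCO's internals; d1, d2 hold the diagonals of D1, D2, g is the gradient of φ at x, and all bounds are assumed finite):

   % Residuals of the CO(mu) optimality conditions (with r = D2*y eliminated).
   r1 = b - A*x - (d2.^2).*y;            % Ax + D2^2 y = b
   rl = l - x + x1;   ru = u - x - x2;   % bound residuals, cf. (24)
   r2 = g + (d1.^2).*x - A'*y - z1 + z2; % dual residual, cf. (26)
   cl = mu - x1.*z1;  cu = mu - x2.*z2;  % perturbed complementarity, cf. (25)

These are exactly the quantities that drive the Newton system of the next subsection.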

3.2 Newton's method

We now eliminate r = D2 y and apply Newton's method to the remaining equations:

   A(x + ∆x) + D2^2 (y + ∆y) = b,
   (x + ∆x) − (x1 + ∆x1) = l,
   (x + ∆x) + (x2 + ∆x2) = u,
   A^T(y + ∆y) + (z1 + ∆z1) − (z2 + ∆z2) = g + H∆x + D1^2 (x + ∆x),
   X1 z1 + X1 ∆z1 + Z1 ∆x1 = µe,
   X2 z2 + X2 ∆z2 + Z2 ∆x2 = µe,

where g and H are the current objective gradient and Hessian. To solve this Newton system, we work with three sets of residuals:

   ∆x − ∆x1 = rl ≡ l − x + x1,
   ∆x + ∆x2 = ru ≡ u − x − x2,        (24)

   X1 ∆z1 + Z1 ∆x1 = cl ≡ µe − X1 z1,
   X2 ∆z2 + Z2 ∆x2 = cu ≡ µe − X2 z2,        (25)

   A∆x + D2^2 ∆y = r1 ≡ b − Ax − D2^2 y,
   −H1 ∆x + A^T ∆y + ∆z1 − ∆z2 = r2 ≡ g + D1^2 x − A^T y − z1 + z2,        (26)

where H1 = H + D1^2. We use (24) and (25) to replace two sets of vectors in (26). With

   ∆x1 = ∆x − rl,   ∆z1 = X1^{−1}(cl − Z1 ∆x1),
   ∆x2 = ru − ∆x,   ∆z2 = X2^{−1}(cu − Z2 ∆x2),        (27)

we find that

   H2 = H + D1^2 + X1^{−1}Z1 + X2^{−1}Z2,
   w  = r2 − X1^{−1}(cl + Z1 rl) + X2^{−1}(cu − Z2 ru),        (28)

   [ −H2   A^T   ] [ ∆x ]   [ w  ]
   [  A    D2^2  ] [ ∆y ] = [ r1 ].        (29)

3.3 Keeping x safe

If the objective is the entropy function φ(x) = Σ x_j ln x_j, for example, it is essential to keep x > 0. However, problems CO3 and CO(µ) treat x as having no bounds except when x − x1 = l and x + x2 = u are satisfied in the limit. Thus, since April 2010, PDCO safeguards x at every iteration by setting

   x_j ← [x1]_j     if l_j = 0 and l_j < u_j,
   x_j ← −[x2]_j    if u_j = 0 and l_j < u_j.

3.4 Solving for (∆x, ∆y)

If φ(x) is a general convex function with known Hessian H, (29) may be treated by direct or iterative solvers. Since it is an SQD system, sparse LDL^T factors should be sufficiently stable under the same conditions as for regularized LO: ||A|| ≈ 1, ||H|| ≈ 1, and γδ ≥ 10^{−8}, where γ and δ are the minimum values of the diagonals of D1 and D2. If K represents the sparse matrix in (29), PDCO with Method 1 uses the following lines of code to achieve reasonable efficiency:

   s      = symamd(K);          % SQD ordering (first iteration only)
   rhs    = [w; r1];
   thresh = eps;                % eps ~ 2e-16 suppresses partial pivoting
   [L,U,p] = lu(K(s,s),thresh,'vector');    % expect p = 1:n+m
   sqdsoln = U\(L\rhs(s));
   sqdsoln(s) = sqdsoln;
   dx = sqdsoln(1:n);   dy = sqdsoln(n+1:n+m);

The symamd ordering can be reused for all iterations, and with row interchanges effectively suppressed by thresh = eps, we expect the original sparse LU factorization in Matlab to do no further permutations (p = 1:n+m). With Method 2, PDCO solves (29) directly using MA57:

   thresh = 0;    % tells MA57 to keep its sparsity-preserving ordering
   [L,D,P,S] = ldl(K,thresh);
   if nnz(D) ~= m+n
      error('[L,D,P,S] = ldl(K,0) gave non-diagonal D')
   end
   sqdsoln = S*(P*(L'\(D\(L\(P'*(S*rhs))))));
   dx = ...;   dy = ...;
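For completeness, here is how the matrix K and right-hand side used by both methods might be assembled (a sketch with our own variable names; PDCO's internals may differ). All the extra terms in (28) are diagonal, so H2 differs from the Hessian H only on its diagonal:

   % Assemble the SQD matrix of (29) from sparse H and diagonals d1, d2.
   H2  = H + spdiags(d1.^2 + z1./x1 + z2./x2, 0, n, n);   % (28)
   K   = [ -H2              A'
            A    spdiags(d2.^2, 0, m, m) ];
   w   = r2 - (cl + z1.*rl)./x1 + (cu - z2.*ru)./x2;      % (28)
   rhs = [w; r1];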

Alternatively, a sparse Cholesky factorization H2 = LL^T may be practical, where H2 = H + diagonal terms (28), and L is a nonsingular permuted triangle. This is trivial if φ(x) is a separable function, since H and H2 in (28) are then diagonal. System (29) may then be solved by eliminating either ∆y or ∆x:

   (H2 + A^T D2^{−2} A) ∆x = A^T D2^{−2} r1 − w,   D2^2 ∆y = r1 − A∆x,        (30)

or

   (A H2^{−1} A^T + D2^2) ∆y = r1 + A H2^{−1} w,   H2 ∆x = A^T ∆y − w.        (31)

Sparse Cholesky factorization may again be applicable, but if an iterative solver must be used it is preferable to regard them as least-squares problems suitable for LSQR or LSMR:

   min_{∆x}  || [ D2^{−1}A ] ∆x  −  [ D2^{−1} r1 ] ||,   with  D2 ∆y = D2^{−1} r1 − D2^{−1} A∆x,        (32)
             || [ L^T      ]        [ −L^{−1} w  ] ||

or

   min_{∆y}  || [ L^{−1}A^T ] ∆y  −  [ L^{−1} w   ] ||,   with  L^T ∆x = L^{−1} A^T ∆y − L^{−1} w.        (33)
             || [ D2        ]        [ D2^{−1} r1 ] ||

The right-most vectors in (32)-(33) are part of the residual vectors for the least-squares problems and may be by-products from the least-squares solver.

3.5 Success

PDCO has been applied to some large web-traffic network problems with the entropy function Σ x_j ln x_j as objective [26]. Search directions were obtained by applying LSQR to (33) with diagonal H = X^{−1} and L = H2^{1/2}, and D1 = 0, D2 = 10^{−3} I. A problem with 50,000 constraints and 660,000 variables (an explicit sparse A) solves in about 3 minutes on a 2GHz PC, requiring less than 100 total LSQR iterations. At the time (2003), this was unexpectedly remarkable performance. Both the entropy function and the network matrix A are evidently amenable to interior methods.

4 Interior methods for general NLO

Reference [3] gives an overview of the theory of interior methods for general nonlinear optimization. With exact second derivatives increasingly available (in particular within GAMS and AMPL), general-purpose software for large-scale nonconvex optimization with nonlinear constraints is becoming increasingly powerful and popular. Some commercial examples are IPOPT, KNITRO, LOQO, and PENOPT [11, 12, 13, 23]. To allow for general non-diagonal Hessians, they all use sparse direct methods on indefinite systems analogous to (29). To allow for nonconvex problems, the matrix H must be modified in some way (often by the addition of multiples of I).

References

[1] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Review, 43(1):129-159, 2001. SIGEST article.

[2] D. C.-L. Fong and M. A. Saunders. LSMR: An iterative algorithm for sparse least-squares problems. SIAM J. Sci. Comput., 33(5):2950-2971, 2011. http://stanford.edu/group/sol/software.html.

[3] A. Forsgren, P. E. Gill, and M. H. Wright. Interior methods for nonlinear optimization. SIAM Review, 44(4):525-597, 2002.

[4] R. Fourer and S. Mehrotra. Performance of an augmented system approach for solving least-squares problems in an interior-point method for linear programming. Math. Program., 19:26-31, August 1991.

[5] M. P. Friedlander and D. Orban. A primal-dual regularized interior-point method for convex quadratic programs. Math. Prog. Comp., 4(1):71-107, 2012.

[6] P. E. Gill, M. A. Saunders, and J. R. Shinnerl. On the stability of Cholesky factorization for symmetric quasi-definite systems. SIAM J. Matrix Anal. Appl., 17:35-46, 1996.

[7] A. J. Goldman and A. W. Tucker. Theory of linear programming. In H. W. Kuhn and A. W. Tucker, editors, Linear Inequalities and Related Systems, Annals of Mathematical Studies 38, pages 63-97. Princeton University Press, Princeton, NJ, 1956.

[8] G. H. Golub and C. F. Van Loan. Unsymmetric positive definite linear systems. Linear Algebra and its Applications, 28:85-98, 1979.

[9] C. Greif, E. Moulding, and D. Orban. Bounds on eigenvalues of matrices arising from interior-point methods. GERAD Technical Report G-2012-42, École Polytechnique, Montréal, QC, Canada, 2012.

[10] B. C. Gunter, W. C. Reiley, and R. A. van de Geijn. Implementation of out-of-core Cholesky and QR factorizations with POOCLAPACK.

[11] IPOPT open source NLP solver. http://www.coin-or.org/projects/ipopt.xml.

[12] KNITRO optimization software. http://www.ziena.com.

[13] LOQO optimization software. http://orfe.princeton.edu/~loqo.

[14] N. Megiddo. Pathways to the optimal set in linear programming. In N. Megiddo, editor, Progress in Mathematical Programming: Interior Point and Related Methods, pages 131-158. Springer Verlag, New York, NY, 1989. Identical version in: Proceedings of the 6th Mathematical Programming Symposium of Japan, Nagoya, Japan, pages 1-35, 1986.

[15] S. Mehrotra and Y. Ye. Finding an interior point in the optimal face of linear programs. Math. Program., 62:497-515, 1993.

[16] MUMPS: a multifrontal massively parallel sparse direct solver. http://mumps.enseeiht.fr/.

[17] J. Nocedal and S. J. Wright. Numerical Optimization. Springer Series in Operations Research. Springer Verlag, New York, second edition, 2006.

[18] C. C. Paige and M. A. Saunders. LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software, 8(1):43-71, 1982.

[19] C. C. Paige and M. A. Saunders. Algorithm 583; LSQR: Sparse linear equations and least-squares problems. ACM Trans. Math. Software, 8(2):195-209, 1982.

[20] PARDISO parallel sparse solver. http://www.pardiso-project.org.

[21] PDCO convex optimization software. http://stanford.edu/group/sol/software.html.

[22] J. Peng, C. Roos, and T. Terlaky. Self-Regularity: A New Paradigm for Primal-Dual Interior-Point Algorithms. Princeton University Press, Princeton, NJ, 2002.

[23] PENOPT optimization systems for nonlinear programming, bilinear matrix inequalities, and linear semidefinite programming. http://www.penopt.com.

[24] W. C. Reiley and R. A. van de Geijn. POOCLAPACK: Parallel out-of-core linear algebra package. Technical Report CS-TR-99-33, citeseer.ist.psu.edu/article/reiley99pooclapack.html, 1999.

[25] M. A. Saunders. Cholesky-based methods for sparse least squares: The benefits of regularization. In L. Adams and J. L. Nazareth, editors, Linear and Nonlinear Conjugate Gradient-Related Methods, pages 92-100. SIAM, Philadelphia, 1996. Proceedings of AMS-IMS-SIAM Joint Summer Research Conference, University of Washington, Seattle, WA, July 9-13, 1995.

[26] M. A. Saunders and J. A. Tomlin. Interior-point solution of large-scale entropy maximization problems. Presented at the 18th International Symposium on Mathematical Programming, Copenhagen, Denmark, Aug 18-22, 2003. http://stanford.edu/group/sol/talks.html.

[27] R. J. Vanderbei. Symmetric quasi-definite matrices. SIAM J. Optim., 5:100-113, 1995.

[28] R. J. Vanderbei. Linear Programming: Foundations and Extensions. Kluwer, Boston, London and Dordrecht, second edition, 2001.

[29] S. J. Wright. Stability of linear equations solvers in interior-point methods. SIAM J. Matrix Anal. Appl., 16:1287-1307, 1995.

[30] S. J. Wright. Primal-Dual Interior-Point Methods. SIAM, Philadelphia, 1997.

[31] S. J. Wright. Stability of augmented system factorizations in interior-point methods. SIAM J. Matrix Anal. Appl., 18:191-222, 1997.

[32] WSMP: Watson Sparse Matrix Package. http://www-users.cs.umn.edu/~agupta/wsmp.html.

[33] Y. Ye. On the finite convergence of interior-point algorithms for linear programming. Math. Program., 57:325-336, 1992.