Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL)

Part C course on continuous optimization

LINEARLY CONSTRAINED MINIMIZATION

  minimize f(x) subject to Ax ≥ b

where the objective function f : R^n → R
- assume that f ∈ C^1 (sometimes C^2) and Lipschitz
- often in practice this assumption is violated, but it is not essential here
- important special cases:
  - linear programming: f(x) = g^T x
  - quadratic programming: f(x) = g^T x + ½ x^T H x

We concentrate here on quadratic programming.

QUADRATIC PROGRAMMING

QP: minimize q(x) = g^T x + ½ x^T H x subject to Ax ≥ b

where
- H is n by n, real symmetric, and g ∈ R^n
- A = ( a_1^T ; ... ; a_m^T ) is m by n real, and b = ( [b]_1 ; ... ; [b]_m )

In general, constraints may
- have upper bounds: b_l ≤ Ax ≤ b_u
- include equalities: A_e x = b_e
- involve simple bounds: x_l ≤ x ≤ x_u
- include network constraints ...

PROBLEM TYPES

Convex problems
- H is positive semi-definite ⟺ x^T H x ≥ 0 for all x
- any local minimizer is global
- important special case H = 0: linear programming

Strictly convex problems
- H is positive definite ⟺ x^T H x > 0 for all x ≠ 0
- unique minimizer (if any)
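As a small illustration (not part of the original slides), the following NumPy sketch classifies a QP by the eigenvalues of its Hessian; the helper name classify_qp_hessian is made up here, and the data is the convex example from the next slide rewritten, as an assumption, in Ax ≥ b form.

```python
import numpy as np

def classify_qp_hessian(H, tol=1e-10):
    """Classify a symmetric Hessian H as strictly convex, convex, or non-convex."""
    eigvals = np.linalg.eigvalsh(H)          # eigenvalues of the symmetric matrix
    if np.all(eigvals > tol):
        return "strictly convex (H positive definite)"
    if np.all(eigvals >= -tol):
        return "convex (H positive semi-definite)"
    return "non-convex (H indefinite)"

# Data for q(x) = g^T x + 1/2 x^T H x subject to A x >= b
# (the convex example below, with <= constraints negated into >= form)
H = np.array([[2.0, 0.0], [0.0, 2.0]])
g = np.array([-2.0, -1.0])
A = np.array([[-1.0, -1.0], [-3.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([-1.0, -1.5, 0.0, 0.0])

print(classify_qp_hessian(H))    # strictly convex
```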

CONVEX EXAMPLE

  minimize (x_1 − 1)^2 + (x_2 − 0.5)^2
  subject to x_1 + x_2 ≤ 1, 3x_1 + x_2 ≤ 1.5, x_1 ≥ 0, x_2 ≥ 0

[Figure: contours of the objective function over the feasible region]

PROBLEM TYPES II

General non-convex problems
- H may be indefinite ⟺ x^T H x < 0 for some x
- there may be many local minimizers
- may have to be content with a local minimizer
- the problem may be unbounded from below

NON-CONVEX EXAMPLE

  minimize −2(x_1 − 0.25)^2 + 2(x_2 − 0.5)^2
  subject to x_1 + x_2 ≤ 1, 3x_1 + x_2 ≤ 1.5, x_1 ≥ 0, x_2 ≥ 0

[Figure: contours of the objective function over the feasible region]

PROBLEM TYPES III

Small problems
- values/structure of the matrix data H and A irrelevant
- currently min(m, n) = O(10^2)

Large problems
- values/structure of the matrix data H and A important
- currently min(m, n) = O(10^3)

Huge problems
- factorizations involving H and A are unrealistic
- currently min(m, n) = O(10^5)

WHY IS QP SO IMPORTANT?

- many applications: portfolio analysis, structural analysis, VLSI design, discrete-time stabilization, optimal and fuzzy control, finite impulse response design, optimal power flow, economic dispatch, ... (many application papers)
- prototypical nonlinear programming problem
- basic subproblem in constrained optimization:
    minimize f(x) subject to c(x) ≥ 0
  is replaced locally by
    minimize f + g^T s + ½ s^T H s subject to As + c ≥ 0
  ⟹ SQP methods (Course Part 7)

OPTIMALITY CONDITIONS

Recall: the importance of optimality conditions is
- to be able to recognise a solution if found by accident or design
- to guide the development of algorithms

FIRST-ORDER OPTIMALITY

QP: minimize q(x) = g^T x + ½ x^T H x subject to Ax ≥ b

Any point x* that satisfies the conditions

  Ax* ≥ b                                  (primal feasibility)
  Hx* + g = A^T y*  and  y* ≥ 0            (dual feasibility)
  [Ax* − b]_i · [y*]_i = 0 for all i       (complementary slackness)

for some vector of Lagrange multipliers y* is a first-order critical (or Karush-Kuhn-Tucker) point.
If [Ax* − b]_i + [y*]_i > 0 for all i, the solution is strictly complementary.

SECOND-ORDER OPTIMALITY

QP: minimize q(x) = g^T x + ½ x^T H x subject to Ax ≥ b

Let

  N_+ = { s : a_i^T s = 0 for all i such that a_i^T x* = [b]_i and [y*]_i > 0, and
              a_i^T s ≥ 0 for all i such that a_i^T x* = [b]_i and [y*]_i = 0 }

Any first-order critical point x* for which additionally

  s^T H s ≥ 0 (resp. > 0) for all s ∈ N_+, s ≠ 0

is a second-order (resp. strong second-order) critical point.

Theorem 4.1: x* is an isolated local minimizer of QP ⟺ x* is strong second-order critical.
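The first-order conditions above can be checked mechanically. A minimal NumPy sketch follows (not from the slides): the function name is made up, and the candidate solution x* = (0.4, 0.3) with multiplier 0.4 on the second constraint was worked out here for the convex example, rewritten in Ax ≥ b form, purely for illustration.

```python
import numpy as np

def is_kkt_point(H, g, A, b, x, y, tol=1e-8):
    """Check the first-order (KKT) conditions for
    minimize g^T x + 1/2 x^T H x subject to A x >= b."""
    residual = A @ x - b
    primal_feasible = np.all(residual >= -tol)             # Ax >= b
    dual_feasible = (np.all(y >= -tol) and                  # y >= 0 and
                     np.allclose(H @ x + g, A.T @ y, atol=tol))   # Hx + g = A^T y
    complementary = np.all(np.abs(residual * y) <= tol)     # (Ax - b)_i y_i = 0
    return primal_feasible and dual_feasible and complementary

# Convex example in A x >= b form; the minimizer lies on 3 x1 + x2 <= 1.5
H = np.array([[2.0, 0.0], [0.0, 2.0]]); g = np.array([-2.0, -1.0])
A = np.array([[-1.0, -1.0], [-3.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([-1.0, -1.5, 0.0, 0.0])
x_star = np.array([0.4, 0.3]); y_star = np.array([0.0, 0.4, 0.0, 0.0])
print(is_kkt_point(H, g, A, b, x_star, y_star))   # True
```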

WEAK SECOND-ORDER OPTIMALITY

QP: minimize q(x) = g^T x + ½ x^T H x subject to Ax ≥ b

Let N = { s : a_i^T s = 0 for all i such that a_i^T x* = [b]_i }

Any first-order critical point x* for which additionally s^T H s ≥ 0 for all s ∈ N is a weak second-order critical point.

- Note that a weak second-order critical point may be a maximizer!
- checking for weak second-order criticality is easy; strong is hard

NON-CONVEX EXAMPLE

  minimize x_1^2 + x_2^2 − 6 x_1 x_2
  subject to x_1 + x_2 ≤ 1, 3x_1 + x_2 ≤ 1.5, x_1 ≥ 0, x_2 ≥ 0

[Figure: contours of the objective function; note that escaping from the origin may be difficult!]

[ DUALITY

QP: minimize q(x) = g^T x + ½ x^T H x subject to Ax ≥ b

- If QP is convex, any first-order critical point is a global minimizer.
- If QP is strictly convex (H positive definite), the problem

    maximize over y ∈ R^m, y ≥ 0:  −½ g^T H^{-1} g + (A H^{-1} g + b)^T y − ½ y^T A H^{-1} A^T y

  is known as the dual of QP (QP itself is the primal)
- primal and dual have the same KKT conditions
- if the primal is feasible, optimal value of primal = optimal value of dual (checked numerically in the sketch below)
- can be generalized for the simply convex case ]

ALGORITHMS

Essentially two classes of methods (slight simplification):

active set methods:
- primal active set methods aim for dual feasibility while maintaining primal feasibility and complementary slackness
- dual active set methods aim for primal feasibility while maintaining dual feasibility and complementary slackness

interior-point methods:
- aim for complementary slackness while maintaining primal and dual feasibility (Course Part 6)
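Returning to the DUALITY slide above: as a small numerical check (not from the slides, and assuming SciPy is available), one can maximize the dual of the strictly convex example with a bound-constrained solver and recover the primal solution from x = H^{-1}(A^T y − g); the data and the recovered values are illustrative assumptions only.

```python
import numpy as np
from scipy.optimize import minimize

# Convex example in A x >= b form; strictly convex since H is positive definite
H = np.array([[2.0, 0.0], [0.0, 2.0]]); g = np.array([-2.0, -1.0])
A = np.array([[-1.0, -1.0], [-3.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([-1.0, -1.5, 0.0, 0.0])
Hinv = np.linalg.inv(H)

def neg_dual(y):
    # dual objective: -1/2 g^T H^{-1} g + (A H^{-1} g + b)^T y - 1/2 y^T A H^{-1} A^T y
    d = (-0.5 * g @ Hinv @ g + (A @ Hinv @ g + b) @ y
         - 0.5 * y @ A @ Hinv @ A.T @ y)
    return -d                                    # minimize the negative of the dual

res = minimize(neg_dual, x0=np.zeros(4), bounds=[(0, None)] * 4)   # L-BFGS-B
y_star = res.x
x_star = Hinv @ (A.T @ y_star - g)               # recover the primal solution
primal = g @ x_star + 0.5 * x_star @ H @ x_star
print(x_star, primal, -res.fun)                  # primal and dual values agree (about -0.85)
```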

EQUALITY CONSTRAINED QP

The basic subproblem in all of the methods we will consider is

  EQP: minimize q(x) = g^T x + ½ x^T H x subject to Ax = 0

N.B. Assume A is m by n and of full rank (preprocess if necessary).

First-order optimality (Lagrange multipliers y):

  [ H  A^T ] [  x ]   [ −g ]
  [ A   0  ] [ −y ] = [  0 ]

Second-order necessary optimality: s^T H s ≥ 0 for all s for which As = 0
Second-order sufficient optimality: s^T H s > 0 for all s ≠ 0 for which As = 0

EQUALITY CONSTRAINED QP II

EQP: minimize q(x) = g^T x + ½ x^T H x subject to Ax = 0

Four possibilities:

(i)   [ H  A^T ] [  x ]   [ −g ]
      [ A   0  ] [ −y ] = [  0 ]   holds and H is second-order sufficient
      ⟹ unique minimizer x

(ii)  (i) holds, H is second-order necessary, but there is s ≠ 0 such that Hs = 0 and As = 0
      ⟹ family of weak minimizers x + αs for any α ∈ R

(iii) there is s for which As = 0, Hs = 0 and g^T s < 0
      ⟹ q unbounded along the direction of linear infinite descent s

(iv)  there is s for which As = 0 and s^T H s < 0
      ⟹ q unbounded along the direction of negative curvature s
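In case (i) the EQP can be solved by assembling the full KKT system directly. A minimal NumPy sketch (the helper name and the tiny example data are assumptions made for this note):

```python
import numpy as np

def solve_eqp(H, g, A):
    """Solve  minimize g^T x + 1/2 x^T H x  subject to  A x = 0
    via the full (KKT) system  [H A^T; A 0][x; -y] = [-g; 0]."""
    n, m = H.shape[0], A.shape[0]
    K = np.block([[H, A.T],
                  [A, np.zeros((m, m))]])
    rhs = np.concatenate([-g, np.zeros(m)])
    sol = np.linalg.solve(K, rhs)            # assumes K is non-singular
    x, minus_y = sol[:n], sol[n:]
    return x, -minus_y                        # return x and the multipliers y

# Tiny example: n = 3, one equality constraint
H = np.diag([2.0, 4.0, 2.0])
g = np.array([-1.0, 0.0, 1.0])
A = np.array([[1.0, 1.0, 1.0]])
x, y = solve_eqp(H, g, A)
print(x, y, A @ x)                            # A x should be (numerically) zero
```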

CLASSIFICATION OF EQP METHODS

Aim to solve

  [ H  A^T ] [  x ]   [ −g ]
  [ A   0  ] [ −y ] = [  0 ]

Three basic approaches:
- full-space approach
- range-space approach
- null-space approach

For each of these one can use
- a direct (factorization) method
- an iterative (conjugate-gradient) method

FULL-SPACE / KKT / AUGMENTED SYSTEM APPROACH

KKT matrix

  K = [ H  A^T ]
      [ A   0  ]

- K is symmetric and indefinite
- use a Bunch-Parlett type factorization K = P L B L^T P^T
  (P permutation, L unit lower-triangular, B block diagonal with 1x1 and 2x2 blocks)
- LAPACK for small problems, MA27/MA57 for large ones

Theorem 4.2: H is second-order sufficient ⟺ K is non-singular and has precisely m negative eigenvalues.
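Theorem 4.2 can be tested numerically. The sketch below (an assumption of mine, not the slides' code) uses SciPy's symmetric indefinite LDL^T routine, which wraps a Bunch-Kaufman rather than Bunch-Parlett factorization, but the inertia count is the same; it needs a reasonably recent SciPy for scipy.linalg.ldl.

```python
import numpy as np
from scipy.linalg import ldl

def kkt_inertia(H, A):
    """Inertia test of the KKT matrix K = [H A^T; A 0] via a symmetric
    indefinite LDL^T factorization, in the spirit of Theorem 4.2."""
    m = A.shape[0]
    K = np.block([[H, A.T], [A, np.zeros((m, m))]])
    _, D, _ = ldl(K)                           # K = P L D L^T P^T, D block diagonal
    eigs = np.linalg.eigvalsh(D)               # cheap: D has only 1x1 and 2x2 blocks
    n_neg = int(np.sum(eigs < 0))
    n_zero = int(np.sum(np.isclose(eigs, 0.0)))
    return n_neg, n_zero

H = np.diag([2.0, 4.0, 2.0])
A = np.array([[1.0, 1.0, 1.0]])
n_neg, n_zero = kkt_inertia(H, A)
# H second-order sufficient  <=>  K non-singular with exactly m negative eigenvalues
print(n_neg == A.shape[0] and n_zero == 0)     # True here
```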

RANGE-SPACE APPROACH

  [ H  A^T ] [  x ]   [ −g ]
  [ A   0  ] [ −y ] = [  0 ]

For non-singular H, eliminate x using the first block equation:

  A H^{-1} A^T y = A H^{-1} g    followed by    H x = −g + A^T y

- strictly convex case: H and A H^{-1} A^T positive definite ⟹ Cholesky factorizations

Theorem 4.3: H is second-order sufficient ⟺ H and A H^{-1} A^T have the same number of negative eigenvalues.

- A H^{-1} A^T is usually dense ⟹ factorization only practical for small m

NULL-SPACE APPROACH

  [ H  A^T ] [  x ]   [ −g ]
  [ A   0  ] [ −y ] = [  0 ]

- let the n by (n − m) matrix S be a basis for the null-space of A ⟹ AS = 0
- second block ⟹ x = S x_N
- premultiply the first block by S^T ⟹ S^T H S x_N = −S^T g

Theorem 4.4: H is second-order sufficient ⟺ S^T H S is positive definite ⟹ Cholesky factorization.

- S^T H S is usually dense ⟹ factorization only practical for small n − m
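The range-space recipe above is easy to code for the strictly convex case using Cholesky factorizations. A minimal SciPy sketch, reusing the tiny EQP data from earlier (function name and data are assumptions for illustration):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def eqp_range_space(H, g, A):
    """Range-space solution of  min g^T x + 1/2 x^T H x  s.t.  A x = 0,
    valid when H is positive definite (strictly convex case)."""
    cH = cho_factor(H)                               # Cholesky factors of H
    HinvAT = cho_solve(cH, A.T)                      # H^{-1} A^T
    Hinvg = cho_solve(cH, g)                         # H^{-1} g
    S = A @ HinvAT                                   # A H^{-1} A^T (usually dense)
    y = cho_solve(cho_factor(S), A @ Hinvg)          # A H^{-1} A^T y = A H^{-1} g
    x = cho_solve(cH, A.T @ y - g)                   # H x = -g + A^T y
    return x, y

H = np.diag([2.0, 4.0, 2.0]); g = np.array([-1.0, 0.0, 1.0])
A = np.array([[1.0, 1.0, 1.0]])
x, y = eqp_range_space(H, g, A)
print(x, y, A @ x)                                   # A x is (numerically) zero
```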

NULL-SPACE BASIS

Require an n by (n − m) null-space basis S for A: AS = 0.

Non-orthogonal basis:
- let A = ( A_1  A_2 ) P, with P a permutation and A_1 non-singular
- then S = P^T [ −A_1^{-1} A_2 ]
               [       I       ]
- generally suitable for large problems. Best choice of A_1?

Orthogonal basis:
- let A = ( L  0 ) Q, with L non-singular (e.g., triangular) and Q = [ Q_1 ; Q_2 ] orthonormal
- then S = Q_2^T
- more stable, but ... generally unsuitable for large problems

[ ITERATIVE METHODS FOR SYMMETRIC LINEAR SYSTEMS

  Bx = b

Best methods are based on finding solutions from the Krylov space

  K = { r, Br, B(Br), ... },   r = b − Bx_0

- B indefinite: use the MINRES method
- B positive definite: use the conjugate gradient method
- usually satisfactory to find an approximation rather than the exact solution
- usually try to precondition the system, i.e., solve C^{-1} B x = C^{-1} b where C^{-1} B ≈ I ]
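Using the orthogonal basis described above, a null-space solve of the small EQP can be sketched with a QR factorization of A^T (the trailing columns of Q span the null-space of A). This is a dense, illustrative sketch under my own naming, not a recommendation for large problems:

```python
import numpy as np

def eqp_null_space(H, g, A):
    """Null-space solution of  min g^T x + 1/2 x^T H x  s.t.  A x = 0,
    with an orthonormal null-space basis from a QR factorization of A^T."""
    m = A.shape[0]
    Q, _ = np.linalg.qr(A.T, mode='complete')    # last n-m columns of Q span null(A)
    S = Q[:, m:]                                 # orthonormal basis, A S = 0
    xN = np.linalg.solve(S.T @ H @ S, -S.T @ g)  # reduced system S^T H S x_N = -S^T g
    x = S @ xN
    # recover multipliers from the first block:  A^T y = H x + g
    y, *_ = np.linalg.lstsq(A.T, H @ x + g, rcond=None)
    return x, y

H = np.diag([2.0, 4.0, 2.0]); g = np.array([-1.0, 0.0, 1.0])
A = np.array([[1.0, 1.0, 1.0]])
x, y = eqp_null_space(H, g, A)
print(x, A @ x)                                  # A x is (numerically) zero
```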

[ ITERATIVE RANGE-SPACE APPROACH

  A H^{-1} A^T y = A H^{-1} g    followed by    H x = −g + A^T y

For the strictly convex case (H and A H^{-1} A^T positive definite):

H^{-1} available (directly or via factors):
- use conjugate gradients to solve A H^{-1} A^T y = A H^{-1} g
- matrix-vector product A H^{-1} A^T v = A ( H^{-1} ( A^T v ) )
- preconditioning? need to approximate the (likely dense) A H^{-1} A^T

H^{-1} not available:
- use a composite conjugate gradient method (Uzawa's method), iterating on solutions to
  A H^{-1} A^T y = A H^{-1} g and H x = −g + A^T y at the same time ⟹ may not converge ]

[ ITERATIVE NULL-SPACE APPROACH

  S^T H S x_N = −S^T g    followed by    x = S x_N

- use the conjugate gradient method
- matrix-vector product S^T H S v_N = S^T ( H ( S v_N ) )
- preconditioning? need to approximate the (likely dense) S^T H S
- if we encounter s_N such that s_N^T S^T H S s_N < 0, then s = S s_N is a direction of
  negative curvature, since As = 0 and s^T H s < 0
- Advantage: Ax = 0 is maintained throughout ]
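The matrix-free idea behind the iterative null-space approach is that S^T H S is never formed, only applied. The SciPy sketch below illustrates this with a LinearOperator and conjugate gradients; the dense QR-based basis keeps the example self-contained, so this is only a toy (names and data are my own).

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def eqp_null_space_cg(H, g, A):
    """Iterative null-space solve of  min g^T x + 1/2 x^T H x  s.t.  A x = 0:
    conjugate gradients on S^T H S x_N = -S^T g, applying S^T H S only as
    matrix-vector products S^T (H (S v))."""
    n, m = H.shape[0], A.shape[0]
    Q, _ = np.linalg.qr(A.T, mode='complete')
    S = Q[:, m:]                                         # null-space basis, A S = 0

    def reduced_matvec(v):
        return S.T @ (H @ (S @ v))                       # never form S^T H S explicitly

    Hred = LinearOperator((n - m, n - m), matvec=reduced_matvec)
    xN, info = cg(Hred, -S.T @ g)                        # requires S^T H S positive definite
    return S @ xN, info

H = np.diag([2.0, 4.0, 2.0]); g = np.array([-1.0, 0.0, 1.0])
A = np.array([[1.0, 1.0, 1.0]])
x, info = eqp_null_space_cg(H, g, A)
print(x, info, A @ x)                                    # info == 0 on success
```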

[ ITERATIVE FULL-SPACE APPROACH

  [ H  A^T ] [  x ]   [ −g ]
  [ A   0  ] [ −y ] = [  0 ]

- use MINRES with a block-diagonal preconditioner [ M  0 ; 0  A N^{-1} A^T ], where M ≈ H and N ≈ H.
  Disadvantage: Ax = 0 only approximately
- use conjugate gradients with the constraint preconditioner [ M  A^T ; A  0 ], where M ≈ H.
  Advantage: Ax = 0 ]

ACTIVE SET ALGORITHMS

QP: minimize q(x) = g^T x + ½ x^T H x subject to Ax ≥ b

The active set A(x) at x is

  A(x) = { i : a_i^T x = [b]_i }

If x* solves QP, we have

  x* = arg min q(x) subject to Ax ≥ b
     = arg min q(x) subject to a_i^T x = [b]_i for all i ∈ A(x*)

A working set W(x) at x is a subset of the active set for which the vectors { a_i }, i ∈ W(x), are linearly independent.

BASICS OF ACTIVE SET ALGORITHMS

Basic idea: pick a subset W_k of {1, ..., m} and find

  x_{k+1} = arg min q(x) subject to a_i^T x = [b]_i for all i ∈ W_k

If x_{k+1} does not solve QP, adjust W_k to form W_{k+1} and repeat.

Important issues are:
- how do we know if x_{k+1} solves QP?
- if x_{k+1} does not solve QP, how do we pick the next working set W_{k+1}?

Notation: the rows of A_k are those of A indexed by W_k; the components of b_k are those of b indexed by W_k.

PRIMAL ACTIVE SET ALGORITHMS

Important feature: ensure all iterates are feasible, i.e., A x_k ≥ b.

If W_k ⊆ A(x_k) then A_k x_k = b_k, and requiring A_k x_{k+1} = b_k gives x_{k+1} = x_k + s_k, where

  s_k = arg min (EQP_k) = arg min q(x_k + s) subject to A_k s = 0
                          (an equality constrained problem)

Need an initial feasible point x_0.

PRIMAL ACTIVE SET ALGORITHMS: ADDING CONSTRAINTS

  s_k = arg min q(x_k + s) subject to A_k s = 0

What if x_k + s_k is not feasible?
- a currently inactive constraint j must become active at x_k + α_k s_k for some α_k < 1
- pick the smallest such α_k
- move instead to x_{k+1} = x_k + α_k s_k and set W_{k+1} = W_k + {j}

PRIMAL ACTIVE SET ALGORITHMS: DELETING CONSTRAINTS

What if x_{k+1} = x_k + s_k is feasible? Then

  x_{k+1} = arg min q(x) subject to a_i^T x = [b]_i for all i ∈ W_k

with Lagrange multipliers y_{k+1} such that

  [ H    A_k^T ] [  x_{k+1} ]   [ −g  ]
  [ A_k    0   ] [ −y_{k+1} ] = [ b_k ]

Three possibilities:
- y_{k+1} ≥ 0 ⟹ x_{k+1} is a first-order critical point of QP
- [y_{k+1}]_i < 0 for some i ⟹ q(x) may be improved by considering W_{k+1} = W_k \ {j}, where j is the i-th member of W_k
- the subproblem is unbounded below (possible in the non-strictly-convex case only)
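Putting the adding and deleting rules together, here is a compact, illustrative sketch of the primal active-set loop for a strictly convex QP with constraints Ax ≥ b (not the slides' algorithm verbatim). It assumes NumPy, a feasible starting point with a consistent working set, uses dense full-space KKT solves recomputed from scratch, and has no anti-cycling safeguards: a teaching sketch, not a robust solver.

```python
import numpy as np

def primal_active_set_qp(H, g, A, b, x0, W0, max_iter=100, tol=1e-10):
    """Illustrative primal active-set method for a strictly convex QP
         minimize g^T x + 1/2 x^T H x  subject to  A x >= b,
    started from a feasible x0 whose working set W0 indexes constraints active at x0."""
    x = np.asarray(x0, dtype=float)
    W = list(W0)
    n, m = H.shape[0], A.shape[0]
    for _ in range(max_iter):
        # EQP_k: minimize q(x_k + s) subject to A_k s = 0 (full-space KKT solve)
        if W:
            Ak = A[W]
            K = np.block([[H, Ak.T], [Ak, np.zeros((len(W), len(W)))]])
            rhs = np.concatenate([-(H @ x + g), np.zeros(len(W))])
            sol = np.linalg.solve(K, rhs)
            s, y = sol[:n], -sol[n:]
        else:
            s, y = np.linalg.solve(H, -(H @ x + g)), np.zeros(0)
        if np.linalg.norm(s) <= tol:
            if y.size == 0 or np.all(y >= -tol):
                return x, W                        # first-order critical point of QP
            j = W[int(np.argmin(y))]               # most negative multiplier
            W.remove(j)                            # delete constraint j, try again
            continue
        # Ratio test: largest alpha in (0, 1] keeping A(x + alpha*s) >= b
        alpha, blocking = 1.0, None
        for i in range(m):
            if i in W:
                continue
            rate = A[i] @ s
            if rate < -tol:                        # constraint value decreases along s
                step = (b[i] - A[i] @ x) / rate
                if step < alpha:
                    alpha, blocking = step, i
        x = x + alpha * s
        if blocking is not None:
            W.append(blocking)                     # add the blocking constraint
    return x, W

# The convex example in A x >= b form, started from the vertex (0, 0)
H = np.array([[2.0, 0.0], [0.0, 2.0]]); g = np.array([-2.0, -1.0])
A = np.array([[-1.0, -1.0], [-3.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([-1.0, -1.5, 0.0, 0.0])
x_star, W_star = primal_active_set_qp(H, g, A, b, x0=[0.0, 0.0], W0=[2, 3])
print(x_star, W_star)    # roughly (0.4, 0.3), with constraint 1 (3x1 + x2 <= 1.5) active
```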

ACTIVE-SET APPROACH

[Figure: progress of the primal active-set method on an example with constraints A and B]
- starting point; unconstrained minimizer is infeasible
- encounter constraint A; minimizer on constraint A
- encounter constraint B, move off constraint A
- minimizer on constraint B = required solution

LINEAR ALGEBRA

Need to solve a sequence of EQP_k's in which either

  W_{k+1} = W_k + {j}  ⟹  A_{k+1} = [ A_k   ]
                                     [ a_j^T ]
or
  W_{k+1} = W_k \ {j}  ⟹  A_k = [ A_{k+1} ]
                                 [ a_j^T   ]

Since working sets change gradually, aim to update factorizations rather than compute afresh.

RANGE-SPACE APPROACH: MATRIX UPDATES

Need factors L_{k+1} L_{k+1}^T = A_{k+1} H^{-1} A_{k+1}^T given L_k L_k^T = A_k H^{-1} A_k^T.

When A_{k+1} = [ A_k ; a_j^T ],

  A_{k+1} H^{-1} A_{k+1}^T = [ A_k H^{-1} A_k^T      A_k H^{-1} a_j   ]  = L_{k+1} L_{k+1}^T
                             [ a_j^T H^{-1} A_k^T    a_j^T H^{-1} a_j ]

where

  L_{k+1} = [ L_k  0 ]
            [ l^T  λ ]   with   L_k l = A_k H^{-1} a_j   and   λ^2 = a_j^T H^{-1} a_j − l^T l

Essentially reverse this to remove a constraint.

NULL-SPACE APPROACH: MATRIX UPDATES

Need factors A_{k+1} = ( L_{k+1}  0 ) Q_{k+1} given A_k = ( L_k  0 ) Q_k, where Q_k = [ Q_{1k} ; Q_{2k} ].

To add a constraint (to remove one is similar):

  A_{k+1} = [ A_k   ] = [ L_k              0               ] Q_k
            [ a_j^T ]   [ a_j^T Q_{1k}^T   a_j^T Q_{2k}^T  ]

Choose a Householder matrix U that reduces Q_{2k} a_j to σ e_1; then a_j^T Q_{2k}^T = σ e_1^T U, so

  A_{k+1} = [ L_k              0        ] [ I  0 ] Q_k  =  ( L_{k+1}  0 ) Q_{k+1}
            [ a_j^T Q_{1k}^T   σ e_1^T  ] [ 0  U ]

with L_{k+1} = [ L_k  0 ; a_j^T Q_{1k}^T  σ ] and Q_{k+1} = [ I  0 ; 0  U ] Q_k.
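The range-space update above amounts to one forward substitution and a square root. A small NumPy/SciPy sketch (helper name and test data are mine; it assumes H is positive definite and the appended row is independent of the existing ones):

```python
import numpy as np
from scipy.linalg import solve_triangular, cho_factor, cho_solve

def cholesky_add_row(Lk, Ak, aj, H):
    """Update the Cholesky factor of A_k H^{-1} A_k^T when a row a_j^T is appended:
    L_{k+1} = [L_k 0; l^T lam] with L_k l = A_k H^{-1} a_j and
    lam^2 = a_j^T H^{-1} a_j - l^T l."""
    cH = cho_factor(H)
    Hinv_aj = cho_solve(cH, aj)                          # H^{-1} a_j
    l = solve_triangular(Lk, Ak @ Hinv_aj, lower=True)   # forward substitution
    lam = np.sqrt(aj @ Hinv_aj - l @ l)                  # positive if the new row is independent
    m = Lk.shape[0]
    Lnew = np.zeros((m + 1, m + 1))
    Lnew[:m, :m] = Lk
    Lnew[m, :m] = l
    Lnew[m, m] = lam
    return Lnew

# Check against a factorization computed from scratch
H = np.diag([2.0, 4.0, 2.0])
Ak = np.array([[1.0, 1.0, 1.0]])
aj = np.array([1.0, 0.0, -1.0])
Lk = np.linalg.cholesky(Ak @ np.linalg.solve(H, Ak.T))
Lnew = cholesky_add_row(Lk, Ak, aj, H)
Afull = np.vstack([Ak, aj])
print(np.allclose(Lnew @ Lnew.T, Afull @ np.linalg.solve(H, Afull.T)))   # True
```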

[ FULL-SPACE APPROACH: MATRIX UPDATES

Suppose the working set W_k becomes W_l. Writing C for the constraints common to both working
sets, A for those deleted and D for those added,

  A_k = [ A_C ]    becomes    A_l = [ A_C ]
        [ A_A ]                     [ A_D ]

and we must solve

  [ H    A_l^T ] [  s_l ]   [ −g_l ]
  [ A_l    0   ] [ −y_l ] = [   0  ]    with   y_l = [ y_C ; y_D ]  ...  ]

[ FULL-SPACE APPROACH: MATRIX UPDATES (CONT.)

... this is equivalent to the bordered system

  [ H      A_C^T   A_A^T   A_D^T   0 ] [  s_l ]   [ −g_l ]
  [ A_C      0       0       0     0 ] [ −y_C ]   [   0  ]
  [ A_A      0       0       0     I ] [ −y_A ] = [   0  ]
  [ A_D      0       0       0     0 ] [ −y_D ]   [   0  ]
  [ 0        0       I       0     0 ] [  u_l ]   [   0  ]

(the extra rows force y_A = 0 and let u_l absorb any violation of the deleted constraints),
whose leading block is K_k = [ H  A_k^T ; A_k  0 ]. It can therefore be solved using the
existing factors of K_k together with those of the small Schur complement of K_k in the
bordered matrix, S_l = −B K_k^{-1} B^T, where B collects the border rows. ]

[ SCHUR COMPLEMENT UPDATING

- each major iteration starts with a factorization of K_k = [ H  A_k^T ; A_k  0 ]
- as W_k changes to W_l, the factorization of the Schur complement S_l is updated, not recomputed
- once dim S_l exceeds a given threshold, or it becomes cheaper to factorize/use K_l than to
  maintain/use K_k and S_l, start the next major iteration ]

PHASE-1

To find an initial feasible point x_0 such that A x_0 ≥ b:
- use a traditional simplex phase-1, or
- let r = max( b − A x_guess, 0 ) (componentwise) and solve

    minimize ξ over (x, ξ) ∈ R^n × R   subject to   Ax + ξ r ≥ b and ξ ≥ 0,

  starting from [x, ξ] = [x_guess, 1]

Alternatively, use a single-phase method:
- Big-M: for some sufficiently large M,
    minimize q(x) + M ξ   subject to   Ax + ξ r ≥ b and ξ ≥ 0
- ℓ_1 QP: for ρ > 0,
    minimize q(x) + ρ ‖ max( b − Ax, 0 ) ‖_1
  (may be reformulated as a QP)
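The phase-1 problem above is a linear program and can be handed to any LP solver. A minimal SciPy sketch (my own reading of the recipe, with r = max(b − A x_guess, 0) so that (x_guess, 1) is feasible; the function name and data are assumptions):

```python
import numpy as np
from scipy.optimize import linprog

def phase1_feasible_point(A, b, x_guess):
    """Phase-1: find x with A x >= b by solving
         minimize xi  subject to  A x + xi * r >= b,  xi >= 0,
    where r = max(b - A x_guess, 0) makes (x_guess, 1) feasible."""
    m, n = A.shape
    r = np.maximum(b - A @ x_guess, 0.0)
    # variables z = (x, xi); linprog expects A_ub z <= b_ub, so negate the constraints
    c = np.zeros(n + 1); c[-1] = 1.0
    A_ub = -np.hstack([A, r.reshape(-1, 1)])
    b_ub = -b
    bounds = [(None, None)] * n + [(0.0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    x = res.x[:n]
    feasible = res.success and res.x[-1] <= 1e-8           # xi driven to zero?
    return x, feasible

A = np.array([[-1.0, -1.0], [-3.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([-1.0, -1.5, 0.0, 0.0])
x0, ok = phase1_feasible_point(A, b, x_guess=np.array([2.0, 2.0]))
print(x0, ok, np.all(A @ x0 >= b - 1e-8))
```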

CONVEX EXAMPLE

  minimize (x_1 − 1)^2 + (x_2 − 0.5)^2 subject to x_1 + x_2 ≤ 1, 3x_1 + x_2 ≤ 1.5, x_1, x_2 ≥ 0

[Figure: contours of the penalty function q(x) + ρ ‖ max( b − Ax, 0 ) ‖_1 with ρ = 2]

NON-CONVEX EXAMPLE

  minimize −2(x_1 − 0.25)^2 + 2(x_2 − 0.5)^2 subject to x_1 + x_2 ≤ 1, 3x_1 + x_2 ≤ 1.5, x_1, x_2 ≥ 0

[Figure: contours of the penalty function q(x) + ρ ‖ max( b − Ax, 0 ) ‖_1 with ρ = 3]

TERMINATION, DEGENERACY & ANTI-CYCLING

So long as α_k > 0, these methods are finite:
- finite number of steps to find an EQP with a feasible solution
- finite number of EQPs with feasible solutions

If x_k is degenerate (the active constraints are linearly dependent), it is possible that α_k = 0.
If this happens infinitely often, the method may make no progress (a cycle) and the algorithm may stall.

Various anti-cycling rules:
- Wolfe's and lexicographic perturbations
- least-index (Bland's) rule
- Fletcher's robust method

NON-CONVEXITY

Non-convexity causes little extra difficulty so long as suitable factorizations are possible.

Inertia-controlling methods tolerate at most one negative eigenvalue in the reduced Hessian. The idea is:
1. start from a working set on which the problem is strictly convex (e.g., a vertex)
2. if a negative eigenvalue appears, do not drop any further constraints until 1. is restored
3. a direction of negative curvature is easy to obtain in case 2.

The latest methods are not inertia controlling ⟹ more flexible.

COMPLEXITY

When the problem is convex, there are algorithms that will solve QP in a polynomial number of iterations:
- some interior-point algorithms are polynomial
- there is no known polynomial active-set algorithm

When the problem is non-convex, it is unlikely that there are polynomial algorithms:
- the problem is NP-complete
- even verifying that a proposed solution is locally optimal is NP-hard

NON-QUADRATIC OBJECTIVE

When f(x) is non-quadratic, H = H_k changes. The active-set subproblem is

  x_{k+1} = arg min f(x) subject to a_i^T x = [b]_i for all i ∈ W_k

- an (inner) iteration is now required, but each step satisfies A_k s = 0 ⟹ linear algebra as before
- usually solve the subproblem inaccurately: when to stop? which Lagrange multipliers in this case?
- need to avoid zig-zagging, in which working sets repeat