Part 4: Active-set methods for linearly constrained optimization
Nick Gould (RAL)

minimize f(x) subject to Ax ≥ b

Part C course on continuous optimization

LINEARLY CONSTRAINED MINIMIZATION

minimize f(x) subject to Ax ≥ b

where the objective function f: Rⁿ → R
- assume that f ∈ C¹ (sometimes C²) and Lipschitz
- often in practice this assumption is violated, but it is not crucial
- important special cases:
  - linear programming: f(x) = gᵀx
  - quadratic programming: f(x) = gᵀx + ½ xᵀHx
- concentrate here on quadratic programming
QUADRATIC PROGRAMMING

QP: minimize q(x) = gᵀx + ½ xᵀHx subject to Ax ≥ b

- H is n by n, real symmetric; g ∈ Rⁿ
- A is m by n, real, with rows a_1ᵀ, ..., a_mᵀ; b has components [b]_1, ..., [b]_m
- in general, constraints may
  - have upper bounds: b_l ≤ Ax ≤ b_u
  - include equalities: A_e x = b_e
  - involve simple bounds: x_l ≤ x ≤ x_u
  - include network constraints ...

PROBLEM TYPES

Convex problems
- H is positive semi-definite: xᵀHx ≥ 0 for all x
- any local minimizer is global
- important special case: H = 0 ⟹ linear programming

Strictly convex problems
- H is positive definite: xᵀHx > 0 for all x ≠ 0
- unique minimizer (if any)
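The convexity classes above can be checked numerically from the eigenvalues of H. A minimal sketch (the 2x2 matrices here are hypothetical illustrations, not data from the slides):

```python
import numpy as np

# Classify a QP's curvature by the eigenvalues of its (symmetric) Hessian H.
H_convex = np.array([[2.0, 0.0], [0.0, 2.0]])    # positive definite
H_indef  = np.array([[2.0, -6.0], [-6.0, 2.0]])  # indefinite: eigenvalues 8, -4

def curvature_class(H, tol=1e-10):
    """Return 'strictly convex', 'convex' or 'non-convex' from eigenvalues of H."""
    ev = np.linalg.eigvalsh(H)       # H symmetric, so eigenvalues are real
    if ev.min() > tol:
        return "strictly convex"
    if ev.min() >= -tol:
        return "convex"
    return "non-convex"

kind1 = curvature_class(H_convex)
kind2 = curvature_class(H_indef)
```

For large structured H one would instead attempt a Cholesky (or LDLᵀ) factorization, which succeeds precisely when H is (semi)definite.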
CONVEX EXAMPLE

minimize (x_1 − 1)² + (x_2 − 0.5)²
subject to x_1 + x_2 ≤ 1, 3x_1 + x_2 ≤ 1.5, x_1, x_2 ≥ 0

[Figure: contours of the objective function over the feasible region]

PROBLEM TYPES II

General (non-convex) problems
- H may be indefinite: xᵀHx < 0 for some x
- may be many local minimizers
- may have to be content with a local minimizer
- problem may be unbounded from below
NON-CONVEX EXAMPLE

minimize −2(x_1 − 0.25)² + 2(x_2 − 0.5)²
subject to x_1 + x_2 ≤ 1, 3x_1 + x_2 ≤ 1.5, x_1, x_2 ≥ 0

[Figure: contours of the objective function over the feasible region]

PROBLEM TYPES III

Small
- values/structure of matrix data H and A irrelevant
- currently min(m, n) = O(10²)

Large
- values/structure of matrix data H and A important
- currently min(m, n) = O(10³)

Huge
- factorizations involving H and A are unrealistic
- currently min(m, n) ≥ O(10⁵)
WHY IS QP SO IMPORTANT?

- many applications: portfolio analysis, structural analysis, VLSI design, discrete-time stabilization, optimal and fuzzy control, finite impulse response design, optimal power flow, economic dispatch, ... (500+ application papers)
- prototypical nonlinear programming problem
- basic subproblem in constrained optimization:

    minimize f(x) subject to c(x) ≥ 0
    ⟹ minimize f + gᵀs + ½ sᵀHs subject to As + c ≥ 0

  ⟹ SQP methods (Course Part 7)

OPTIMALITY CONDITIONS

Recall: the importance of optimality conditions is
- to be able to recognise a solution if found by accident or design
- to guide the development of algorithms
FIRST-ORDER OPTIMALITY

QP: minimize q(x) = gᵀx + ½ xᵀHx subject to Ax ≥ b

Any point x* that satisfies the conditions

  Ax* ≥ b                             (primal feasibility)
  Hx* + g = Aᵀy*  and  y* ≥ 0         (dual feasibility)
  [Ax* − b]_i [y*]_i = 0 for all i    (complementary slackness)

for some vector of Lagrange multipliers y* is a first-order critical (or Karush-Kuhn-Tucker) point

If [Ax* − b]_i + [y*]_i > 0 for all i, the solution is strictly complementary

SECOND-ORDER OPTIMALITY

QP: minimize q(x) = gᵀx + ½ xᵀHx subject to Ax ≥ b

Let

  N₊ = { s : a_iᵀs = 0 for all i such that a_iᵀx* = [b]_i and [y*]_i > 0, and
             a_iᵀs ≥ 0 for all i such that a_iᵀx* = [b]_i and [y*]_i = 0 }

Any first-order critical point x* for which additionally

  sᵀHs ≥ 0 (resp. > 0 for s ≠ 0) for all s ∈ N₊

is a second-order (resp. strong second-order) critical point

Theorem 4.1: x* is an isolated local minimizer of QP ⟺ x* is strong second-order critical
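The three first-order conditions can be verified mechanically. A sketch for the convex example of these slides, rewritten in the form Ax ≥ b (the ≤ rows are negated); the candidate point x* = (0.4, 0.3) and multiplier y* are values I derived by hand for illustration, not quoted from the slides:

```python
import numpy as np

# Convex example: minimize (x1-1)^2 + (x2-0.5)^2
#                 s.t. x1+x2 <= 1, 3x1+x2 <= 1.5, x >= 0, as Ax >= b.
H = 2.0 * np.eye(2)
g = np.array([-2.0, -1.0])               # q(x) = g'x + x'Hx/2 (+ constant)
A = np.array([[-1.0, -1.0],
              [-3.0, -1.0],
              [ 1.0,  0.0],
              [ 0.0,  1.0]])
b = np.array([-1.0, -1.5, 0.0, 0.0])

x_star = np.array([0.4, 0.3])            # candidate solution (3x1+x2 = 1.5 active)
y_star = np.array([0.0, 0.4, 0.0, 0.0])  # candidate multipliers

primal_feasible = bool(np.all(A @ x_star - b >= -1e-12))
dual_feasible = (np.allclose(H @ x_star + g, A.T @ y_star)
                 and bool(np.all(y_star >= 0)))
complementary = bool(np.allclose((A @ x_star - b) * y_star, 0.0))
is_kkt = primal_feasible and dual_feasible and complementary
```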
WEAK SECOND-ORDER OPTIMALITY

QP: minimize q(x) = gᵀx + ½ xᵀHx subject to Ax ≥ b

Let

  N = { s : a_iᵀs = 0 for all i such that a_iᵀx* = [b]_i }

Any first-order critical point x* for which additionally sᵀHs ≥ 0 for all s ∈ N is a weak second-order critical point

Note that a weak second-order critical point may be a maximizer!
- checking for weak second-order criticality is easy; strong is hard

NON-CONVEX EXAMPLE

minimize x_1² + x_2² − 6x_1x_2
subject to x_1 + x_2 ≤ 1, 3x_1 + x_2 ≤ 1.5, x_1, x_2 ≥ 0

[Figure: contours of the objective function; note that escaping from the origin may be difficult!]
[ DUALITY

QP: minimize q(x) = gᵀx + ½ xᵀHx subject to Ax ≥ b

If QP is convex, any first-order critical point is a global minimizer

If H is positive definite (the strictly convex case), the problem

  maximize over y ∈ Rᵐ, y ≥ 0:
    −½ gᵀH⁻¹g + (AH⁻¹g + b)ᵀy − ½ yᵀAH⁻¹Aᵀy

is known as the dual of QP (QP itself is the primal)
- primal and dual have the same KKT conditions
- if the primal is feasible, optimal value of primal = optimal value of dual
- can be generalized for the simply convex case ]

ALGORITHMS

Essentially two classes of methods (slight simplification):

active-set methods:
- primal active-set methods aim for dual feasibility while maintaining primal feasibility and complementary slackness
- dual active-set methods aim for primal feasibility while maintaining dual feasibility and complementary slackness

interior-point methods:
- aim for complementary slackness while maintaining primal and dual feasibility (Course Part 6)
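The dual objective stated above can be evaluated directly and compared with the primal value. A sketch on the convex example, with the optimal primal/dual pair derived by hand (an assumption of this illustration, not printed in the slides); at optimality the duality gap should vanish:

```python
import numpy as np

# Strictly convex example data (Ax >= b form, <= rows negated).
H = 2.0 * np.eye(2)
g = np.array([-2.0, -1.0])
A = np.array([[-1.0, -1.0], [-3.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([-1.0, -1.5, 0.0, 0.0])

Hinv = np.linalg.inv(H)

def q(x):
    """Primal objective q(x) = g'x + x'Hx/2."""
    return g @ x + 0.5 * x @ H @ x

def dual(y):
    """Dual objective -g'H^{-1}g/2 + (A H^{-1} g + b)'y - y'(A H^{-1} A')y/2."""
    return (-0.5 * g @ Hinv @ g + (A @ Hinv @ g + b) @ y
            - 0.5 * y @ A @ Hinv @ A.T @ y)

x_star = np.array([0.4, 0.3])                # primal solution
y_star = np.array([0.0, 0.4, 0.0, 0.0])      # dual solution (y >= 0)

gap = q(x_star) - dual(y_star)               # zero at optimality
```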
EQUALITY CONSTRAINED QP

The basic subproblem in all of the methods we will consider is

EQP: minimize q(x) = gᵀx + ½ xᵀHx subject to Ax = 0

N.B. assume A is m by n and of full rank (preprocess if necessary)

First-order optimality (with Lagrange multipliers y):

  ( H  Aᵀ ) (  x )   ( −g )
  ( A  0  ) ( −y ) = (  0 )

Second-order necessary optimality: sᵀHs ≥ 0 for all s for which As = 0
Second-order sufficient optimality: sᵀHs > 0 for all s ≠ 0 for which As = 0

EQUALITY CONSTRAINED QP II

Four possibilities:
(i) the system above is solvable and H is second-order sufficient ⟹ unique minimizer x
(ii) the system is solvable, H is second-order necessary, but there is s ≠ 0 such that Hs = 0 and As = 0 ⟹ family of weak minimizers x + αs for any α ∈ R
(iii) there is s for which As = 0, Hs = 0 and gᵀs < 0 ⟹ q unbounded along the direction of linear infinite descent s
(iv) there is s for which As = 0 and sᵀHs < 0 ⟹ q unbounded along the direction of negative curvature s
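In case (i) the EQP reduces to one symmetric linear solve. A minimal sketch with small hypothetical data (3 variables, 1 constraint, H positive definite):

```python
import numpy as np

# EQP: minimize g'x + x'Hx/2 subject to Ax = 0,
# solved via the first-order system  [H A'; A 0] [x; -y] = [-g; 0].
H = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 2.0]])
g = np.array([1.0, -1.0, 2.0])
A = np.array([[1.0, 1.0, 1.0]])     # m = 1, full rank
n, m = 3, 1

K = np.block([[H, A.T], [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([-g, np.zeros(m)]))
x, y = sol[:n], -sol[n:]            # unpack primal x and multipliers y

residual_stationarity = np.linalg.norm(H @ x + g - A.T @ y)
residual_feasibility = np.linalg.norm(A @ x)
```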
CLASSIFICATION OF EQP METHODS

Aim to solve

  ( H  Aᵀ ) (  x )   ( −g )
  ( A  0  ) ( −y ) = (  0 )

Three basic approaches:
- full-space approach
- range-space approach
- null-space approach

For each of these can use
- a direct (factorization) method
- an iterative (conjugate-gradient) method

FULL-SPACE/KKT/AUGMENTED SYSTEM APPROACH

The KKT matrix

  K = ( H  Aᵀ )
      ( A  0  )

is symmetric and indefinite
- use a Bunch-Parlett type factorization K = P L B Lᵀ Pᵀ
  - P permutation, L unit lower-triangular, B block diagonal with 1x1 and 2x2 blocks
- LAPACK for small problems, MA27/MA57 for large ones

Theorem 4.2: H is second-order sufficient ⟺ K is non-singular and has precisely m negative eigenvalues
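Theorem 4.2 is easy to confirm on a small instance. A sketch with hypothetical data (H positive definite, so certainly second-order sufficient; K should be non-singular with exactly m = 1 negative eigenvalue):

```python
import numpy as np

# Inertia check for the KKT matrix K = [H A'; A 0].
H = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 2.0]])
A = np.array([[1.0, 1.0, 1.0]])
m = A.shape[0]

K = np.block([[H, A.T], [A, np.zeros((m, m))]])
eigs = np.linalg.eigvalsh(K)                 # K symmetric: real spectrum
n_negative = int(np.sum(eigs < 0))
nonsingular = bool(np.min(np.abs(eigs)) > 1e-10)
```

In practice the inertia is read off from the 1x1 and 2x2 blocks of B in the Bunch-Parlett factorization rather than from eigenvalues.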
RANGE-SPACE APPROACH

  ( H  Aᵀ ) (  x )   ( −g )
  ( A  0  ) ( −y ) = (  0 )

For non-singular H, eliminate x using the first block:

  AH⁻¹Aᵀ y = AH⁻¹g,  followed by  Hx = −g + Aᵀy

- strictly convex case: H and AH⁻¹Aᵀ positive definite ⟹ Cholesky factorization

Theorem 4.3: H is second-order sufficient ⟺ H and AH⁻¹Aᵀ have the same number of negative eigenvalues

- AH⁻¹Aᵀ usually dense ⟹ factorization only for small m

NULL-SPACE APPROACH

  ( H  Aᵀ ) (  x )   ( −g )
  ( A  0  ) ( −y ) = (  0 )

- let the n by (n − m) matrix S be a basis for the null-space of A: AS = 0
- second block ⟹ x = S x_N
- premultiply the first block by Sᵀ:

  SᵀHS x_N = −Sᵀg

Theorem 4.4: H is second-order sufficient ⟺ SᵀHS is positive definite ⟹ Cholesky factorization

- SᵀHS usually dense ⟹ factorization only for small n − m
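Both eliminations solve the same KKT system, so on a strictly convex instance they must agree. A sketch with small hypothetical data (an SVD supplies an orthonormal null-space basis for convenience):

```python
import numpy as np

# Solve one EQP by the range-space and null-space routes and compare.
H = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 2.0]])
g = np.array([1.0, -1.0, 2.0])
A = np.array([[1.0, 1.0, 1.0]])

# Range-space:  (A H^{-1} A') y = A H^{-1} g,  then  H x = -g + A' y.
Hinv = np.linalg.inv(H)
y = np.linalg.solve(A @ Hinv @ A.T, A @ Hinv @ g)
x_range = np.linalg.solve(H, -g + A.T @ y)

# Null-space: columns of S span null(A); solve (S'HS) x_N = -S'g, x = S x_N.
_, _, Vt = np.linalg.svd(A)
S = Vt[1:].T                       # orthonormal null-space basis, n x (n-m)
x_N = np.linalg.solve(S.T @ H @ S, -S.T @ g)
x_null = S @ x_N

agree = bool(np.allclose(x_range, x_null))
feasible = bool(np.allclose(A @ x_range, 0.0))
```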
NULL-SPACE BASIS

Require an n by (n − m) null-space basis S for A: AS = 0

Non-orthogonal basis:
- let A = ( A_1  A_2 ) P, P permutation, A_1 non-singular

  S = Pᵀ ( −A_1⁻¹A_2 )
         (     I     )

- generally suitable for large problems. Best A_1?

Orthogonal basis:
- let A = ( L  0 ) Q, L non-singular (e.g., triangular), Q = ( Q_1 ) orthonormal
                                                            ( Q_2 )
  S = Q_2ᵀ

- more stable, but ... generally unsuitable for large problems

[ ITERATIVE METHODS FOR SYMMETRIC LINEAR SYSTEMS Bx = b

Best methods are based on finding solutions from the Krylov space

  K = { r, Br, B²r, ... },  r = b − Bx_0

- B indefinite: use the MINRES method
- B positive definite: use the conjugate-gradient method
- usually satisfactory to find an approximation rather than the exact solution
- usually try to precondition the system, i.e., solve C⁻¹Bx = C⁻¹b where C⁻¹B ≈ I ]
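The two bases can be constructed side by side on a tiny hypothetical A (m = 1, n = 3, with A_1 its first column and P = I):

```python
import numpy as np

# Non-orthogonal basis S = P'[-A1^{-1}A2; I] vs. an orthonormal basis
# obtained from a full QR factorization of A'.
A = np.array([[2.0, 1.0, -1.0]])
m, n = A.shape

# Non-orthogonal: A = (A1 A2) with A1 = first m columns, nonsingular here.
A1, A2 = A[:, :m], A[:, m:]
S_nonorth = np.vstack([-np.linalg.solve(A1, A2), np.eye(n - m)])

# Orthogonal: A' = Q R (complete QR), so the trailing n-m columns of Q
# span null(A).
Q, _ = np.linalg.qr(A.T, mode="complete")    # Q is n x n orthogonal
S_orth = Q[:, m:]

ok = (np.allclose(A @ S_nonorth, 0.0)
      and np.allclose(A @ S_orth, 0.0)
      and np.allclose(S_orth.T @ S_orth, np.eye(n - m)))
```

For large sparse problems the non-orthogonal form is preferred because A_1 can be chosen by a sparse LU factorization, whereas Q is typically dense.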
[ ITERATIVE RANGE-SPACE APPROACH

  AH⁻¹Aᵀ y = AH⁻¹g,  followed by  Hx = −g + Aᵀy

For the strictly convex case (H and AH⁻¹Aᵀ positive definite):
- H⁻¹ available, directly or via factors: use conjugate gradients to solve AH⁻¹Aᵀ y = AH⁻¹g
  - matrix-vector product AH⁻¹Aᵀ v = A( H⁻¹( Aᵀv ) )
  - preconditioning? need to approximate the likely dense AH⁻¹Aᵀ
- H⁻¹ not available: use a composite conjugate-gradient method (Uzawa's method), iterating on solutions to both AH⁻¹Aᵀ y = AH⁻¹g and Hx = −g + Aᵀy at the same time; may not converge ]

[ ITERATIVE NULL-SPACE APPROACH

  SᵀHS x_N = −Sᵀg,  followed by  x = S x_N

- use the conjugate-gradient method
  - matrix-vector product SᵀHS v_N = Sᵀ( H( S v_N ) )
  - preconditioning? need to approximate the likely dense SᵀHS
- if we encounter s_N such that s_NᵀSᵀHS s_N < 0, then s = S s_N is a direction of negative curvature, since As = 0 and sᵀHs < 0
- advantage: Ax = 0 even for approximate solutions ]
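The null-space iteration above only ever touches SᵀHS through products Sᵀ(H(Sv)). A minimal conjugate-gradient sketch on hypothetical data (plain CG, no preconditioner):

```python
import numpy as np

# CG on the reduced system (S'HS) x_N = -S'g, using only mat-vec products.
H = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 2.0]])
g = np.array([1.0, -1.0, 2.0])
A = np.array([[1.0, 1.0, 1.0]])

_, _, Vt = np.linalg.svd(A)
S = Vt[1:].T                          # orthonormal null-space basis

def reduced_product(v):
    return S.T @ (H @ (S @ v))        # never form S'HS explicitly

def cg(matvec, rhs, tol=1e-12, maxit=100):
    """Textbook conjugate gradients for a symmetric positive definite operator."""
    x = np.zeros_like(rhs)
    r = rhs - matvec(x)
    p = r.copy()
    for _ in range(maxit):
        if np.linalg.norm(r) < tol:
            break
        Hp = matvec(p)
        alpha = (r @ r) / (p @ Hp)
        x = x + alpha * p
        r_new = r - alpha * Hp
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return x

x_N = cg(reduced_product, -S.T @ g)
x = S @ x_N                           # Ax = 0 by construction
residual = np.linalg.norm(reduced_product(x_N) + S.T @ g)
```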
[ ITERATIVE FULL-SPACE APPROACH

  ( H  Aᵀ ) (  x )   ( −g )
  ( A  0  ) ( −y ) = (  0 )

- use MINRES with the block-diagonal preconditioner

    ( M         )
    (   AN⁻¹Aᵀ )

  where M ≈ H and N ≈ H. Disadvantage: Ax = 0 only holds approximately
- use conjugate gradients with the constraint preconditioner

    ( M  Aᵀ )
    ( A  0  )

  where M ≈ H. Advantage: Ax = 0 even for approximate solutions ]

ACTIVE SET ALGORITHMS

QP: minimize q(x) = gᵀx + ½ xᵀHx subject to Ax ≥ b

The active set A(x) at x is

  A(x) = { i : a_iᵀx = [b]_i }

If x* solves QP, then

  x* = arg min q(x) subject to Ax ≥ b
     = arg min q(x) subject to a_iᵀx = [b]_i for all i ∈ A(x*)

A working set W(x) at x is a subset of the active set for which the vectors { a_i }, i ∈ W(x), are linearly independent
BASICS OF ACTIVE SET ALGORITHMS

Basic idea: pick a subset W_k of {1, ..., m} and find

  x_{k+1} = arg min q(x) subject to a_iᵀx = [b]_i for all i ∈ W_k

If x_{k+1} does not solve QP, adjust W_k to form W_{k+1} and repeat

Important issues:
- how do we know if x_{k+1} solves QP?
- if x_{k+1} does not solve QP, how do we pick the next working set W_{k+1}?

Notation: rows of A_k are those of A indexed by W_k; components of b_k are those of b indexed by W_k

PRIMAL ACTIVE SET ALGORITHMS

Important feature: ensure all iterates are feasible, i.e., Ax_k ≥ b

If W_k ⊆ A(x_k), then A_k x_k = b_k; require A_k x_{k+1} = b_k, so

  x_{k+1} = x_k + s_k,  where
  s_k = arg min q(x_k + s) subject to A_k s = 0    (EQP_k: an equality constrained problem)

Need an initial feasible point x_0
PRIMAL ACTIVE SET ALGORITHMS: ADDING CONSTRAINTS

  s_k = arg min q(x_k + s) subject to A_k s = 0

What if x_k + s_k is not feasible?
- a currently inactive constraint j must become active at x_k + α_k s_k for some α_k < 1
- pick the smallest such α_k
- move instead to x_{k+1} = x_k + α_k s_k and set W_{k+1} = W_k + {j}

PRIMAL ACTIVE SET ALGORITHMS: DELETING CONSTRAINTS

What if x_{k+1} = x_k + s_k is feasible? Then

  x_{k+1} = arg min q(x) subject to a_iᵀx = [b]_i for all i ∈ W_k

with Lagrange multipliers y_{k+1} satisfying

  ( H    A_kᵀ ) (  x_{k+1} )   ( −g  )
  ( A_k  0    ) ( −y_{k+1} ) = ( b_k )

Three possibilities:
- q(x_{k+1}) = −∞ (non-strictly-convex case only)
- y_{k+1} ≥ 0 ⟹ x_{k+1} is a first-order critical point of QP
- [y_{k+1}]_i < 0 for some i ⟹ q(x) may be improved by considering W_{k+1} = W_k \ {j}, where j is the i-th member of W_k
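The "pick the smallest such α_k" step is the classical ratio test. A sketch with hypothetical data in the slides' form Ax ≥ b (a feasible x_k and a trial EQP step s_k chosen for illustration):

```python
import numpy as np

# Ratio test: largest alpha_k <= 1 with A(x_k + alpha s_k) >= b,
# together with the index of the blocking constraint (if any).
A = np.array([[-1.0, -1.0], [-3.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([-1.0, -1.5, 0.0, 0.0])
x_k = np.array([0.0, 0.0])          # feasible point
s_k = np.array([1.0, 0.5])          # candidate step

alpha_k, blocking = 1.0, None
for i in range(A.shape[0]):
    slope = A[i] @ s_k
    if slope < 0:                   # constraint value decreases along s_k
        alpha_i = (b[i] - A[i] @ x_k) / slope
        if alpha_i < alpha_k:
            alpha_k, blocking = alpha_i, i
```

Here constraint 1 (that is, 3x_1 + x_2 ≤ 1.5) blocks first, at α_k = 1.5/3.5; it would then join the working set.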
ACTIVE-SET APPROACH

[Figure: iterates of a primal active-set method on the convex example, with two constraints labelled A and B]

- starting point; step toward the unconstrained minimizer
- encounter constraint A
- minimizer on constraint A
- encounter constraint B; move off constraint A
- minimizer on constraint B: the required solution

LINEAR ALGEBRA

Need to solve a sequence of EQP_k in which either

  W_{k+1} = W_k + {j}:  A_{k+1} = ( A_k  )
                                  ( a_jᵀ )
or

  W_{k+1} = W_k \ {j}:  A_k = ( A_{k+1} ), i.e. A_{k+1} is A_k with the row a_jᵀ removed
                              (   a_jᵀ  )

Since working sets change gradually, aim to update factorizations rather than compute them afresh
RANGE-SPACE APPROACH: MATRIX UPDATES

Need factors L_{k+1} L_{k+1}ᵀ = A_{k+1} H⁻¹ A_{k+1}ᵀ given L_k L_kᵀ = A_k H⁻¹ A_kᵀ

When A_{k+1} = ( A_k; a_jᵀ ):

  A_{k+1} H⁻¹ A_{k+1}ᵀ = ( A_k H⁻¹ A_kᵀ    A_k H⁻¹ a_j  )
                         ( a_jᵀ H⁻¹ A_kᵀ   a_jᵀ H⁻¹ a_j )

  L_{k+1} = ( L_k  0 )
            ( lᵀ   λ )

where L_k l = A_k H⁻¹ a_j and λ² = a_jᵀ H⁻¹ a_j − lᵀl

Essentially reverse this to remove a constraint

NULL-SPACE APPROACH: MATRIX UPDATES

Need factors A_{k+1} = ( L_{k+1}  0 ) Q_{k+1} given A_k = ( L_k  0 ) Q_k, Q_k = ( Q1_k )
                                                                                ( Q2_k )

To add a constraint (to remove is similar):

  A_{k+1} = ( A_k  ) = ( L_k          0           ) Q_k
            ( a_jᵀ )   ( a_jᵀQ1_kᵀ   a_jᵀQ2_kᵀ )

Choose a Householder matrix U that reduces Q2_k a_j to σe_1, so that a_jᵀQ2_kᵀUᵀ = σe_1ᵀ:

  A_{k+1} = ( L_k          0   0 ) ( Q1_k   )
            ( a_jᵀQ1_kᵀ   σ   0 ) ( U Q2_k )
          = ( L_{k+1}  0 ) Q_{k+1}
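The range-space (Cholesky bordering) update can be checked against a factorization from scratch. A sketch with small hypothetical H, A_k and a_j:

```python
import numpy as np

# Bordering update: given L_k with L_k L_k' = A_k H^{-1} A_k', append a_j via
#   L_k l = A_k H^{-1} a_j,   lambda^2 = a_j' H^{-1} a_j - l'l.
H = np.diag([2.0, 3.0, 4.0])
A_k = np.array([[1.0, 0.0, 1.0]])
a_j = np.array([0.0, 1.0, 1.0])
Hinv = np.linalg.inv(H)

L_k = np.linalg.cholesky(A_k @ Hinv @ A_k.T)
l = np.linalg.solve(L_k, A_k @ Hinv @ a_j)          # forward solve
lam = np.sqrt(a_j @ Hinv @ a_j - l @ l)             # new diagonal entry
L_next = np.block([[L_k, np.zeros((1, 1))],
                   [l[None, :], np.array([[lam]])]])

# Compare against a fresh Cholesky factorization of the bordered matrix.
A_next = np.vstack([A_k, a_j])
L_fresh = np.linalg.cholesky(A_next @ Hinv @ A_next.T)
match = bool(np.allclose(L_next, L_fresh))
```

The update costs one triangular solve plus O(m) work, versus O(m³) for refactorizing.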
[ FULL-SPACE APPROACH: MATRIX UPDATES

W_k becomes W_l: constraints C are retained, D are deleted and A are added, so

  A_k = ( A_C )   becomes   A_l = ( A_C )
        ( A_D )                   ( A_A )

The system

  ( H    A_Cᵀ  A_Aᵀ ) (  s_l )   ( −g_l )
  ( A_C   0     0   ) ( −y_C ) = (   0  )
  ( A_A   0     0   ) ( −y_A )   (   0  )

may be embedded in the bordered system

  ( H    A_Cᵀ  A_Dᵀ  A_Aᵀ   0 ) (  s_l )   ( −g_l )
  ( A_C   0     0     0     0 ) ( −y_C )   (   0  )
  ( A_D   0     0     0     I ) ( −y_D ) = (   0  )
  ( A_A   0     0     0     0 ) ( −y_A )   (   0  )
  (  0    0     I     0     0 ) (  u_l )   (   0  )

whose last block row forces y_D = 0, while the slack u_l = −A_D s_l frees the deleted constraints ... ]

[ FULL-SPACE APPROACH: MATRIX UPDATES (CONT.)

... the bordered system can be solved using the existing factors of

  K_k = ( H    A_kᵀ )
        ( A_k   0   )

together with those of the (small) Schur complement

  S_l = − ( A_A  0 ) K_k⁻¹ ( A_Aᵀ  0 )
          (  0   I )       (  0    I )

where the identity blocks select the A_D multiplier components ]
[ SCHUR COMPLEMENT UPDATING

- each major iteration starts with a factorization of

    K_k = ( H    A_kᵀ )
          ( A_k   0   )

- as W_k changes to W_l, a factorization of the Schur complement S_l is updated, not recomputed
- once dim S_l exceeds a given threshold, or it becomes cheaper to factorize/use K_l than to maintain/use K_k and S_l, start the next major iteration ]

PHASE-1

To find an initial feasible point x_0 such that Ax_0 ≥ b:
- use a traditional simplex phase-1, or
- let r = max(b − Ax_guess, 0) and solve

    minimize ξ over (x, ξ) ∈ Rⁿ × R, subject to Ax + ξr ≥ b and ξ ≥ 0

  starting from the feasible point (x, ξ) = (x_guess, 1)

Alternatively, use a single-phase method:
- big-M: for some sufficiently large M,

    minimize q(x) + Mξ subject to Ax + ξr ≥ b and ξ ≥ 0

- ℓ₁QP: for some sufficiently large ρ > 0,

    minimize q(x) + ρ ‖ max(b − Ax, 0) ‖₁

  (may be reformulated as a QP)
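The phase-1 construction is easy to sanity-check: with the shift vector r chosen as max(b − Ax_guess, 0) (one consistent choice that makes the claim below hold; an assumption of this sketch), the point (x_guess, 1) is feasible for the auxiliary problem, and any (x, 0) feasible for it satisfies the original constraints:

```python
import numpy as np

# Phase-1 sketch: minimize xi s.t. Ax + xi*r >= b, xi >= 0,
# with r = max(b - A x_guess, 0) covering exactly the violated rows.
A = np.array([[-1.0, -1.0], [-3.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([-1.0, -1.5, 0.0, 0.0])
x_guess = np.array([1.0, 1.0])              # violates the first two rows

r = np.maximum(b - A @ x_guess, 0.0)        # zero on satisfied rows
xi = 1.0
start_feasible = bool(np.all(A @ x_guess + xi * r >= b - 1e-12))
```

Driving ξ to 0 then certifies feasibility of Ax ≥ b itself.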
CONVEX EXAMPLE

minimize (x_1 − 1)² + (x_2 − 0.5)²
subject to x_1 + x_2 ≤ 1, 3x_1 + x_2 ≤ 1.5, x_1, x_2 ≥ 0

[Figure: contours of the penalty function q(x) + ρ‖max(b − Ax, 0)‖₁ with ρ = 2]

NON-CONVEX EXAMPLE

minimize −2(x_1 − 0.25)² + 2(x_2 − 0.5)²
subject to x_1 + x_2 ≤ 1, 3x_1 + x_2 ≤ 1.5, x_1, x_2 ≥ 0

[Figure: contours of the penalty function q(x) + ρ‖max(b − Ax, 0)‖₁ with ρ = 3]
TERMINATION, DEGENERACY & ANTI-CYCLING

So long as each α_k > 0, these methods are finite:
- finite number of steps to find an EQP with a feasible solution
- finite number of EQPs with feasible solutions

If x_k is degenerate (active constraints are dependent), it is possible that α_k = 0. If this happens infinitely often:
- may make no progress (a cycle); the algorithm may stall

Various anti-cycling rules:
- Wolfe's and lexicographic perturbations
- least-index (Bland's) rule
- Fletcher's robust method

NON-CONVEXITY

Causes little extra difficulty so long as suitable factorizations are possible

Inertia-controlling methods tolerate at most one negative eigenvalue in the reduced Hessian. The idea:
1. start from a working set on which the problem is strictly convex (e.g., a vertex)
2. if a negative eigenvalue appears, do not drop any further constraints until 1. is restored
3. a direction of negative curvature is easy to obtain in 2.

Latest methods are not inertia-controlling (more flexible)
COMPLEXITY

When the problem is convex, there are algorithms that will solve QP in a polynomial number of iterations
- some interior-point algorithms are polynomial
- no known polynomial active-set algorithm

When the problem is non-convex, it is unlikely that there are polynomial algorithms
- the problem is NP-complete
- even verifying that a proposed solution is locally optimal is NP-hard

NON-QUADRATIC OBJECTIVE

When f(x) is non-quadratic, H = H_k changes. The active-set subproblem becomes

  x_{k+1} = arg min f(x) subject to a_iᵀx = [b]_i for all i ∈ W_k

- iteration is now required, but each step satisfies A_k s = 0 ⟹ linear algebra as before
- usually solve the subproblem inaccurately: when to stop? which Lagrange multipliers in this case?
- need to avoid zig-zagging, in which working sets repeat