Lecture 15: SQP methods for equality constrained optimization
Coralia Cartis, Mathematical Institute, University of Oxford
C6.2/B2: Continuous Optimization
p. 1/22
Nonlinear equality-constrained problems again!

\[
  \min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad c(x) = 0, \tag{ecp}
\]
where $f : \mathbb{R}^n \to \mathbb{R}$ and $c = (c_1, \ldots, c_m) : \mathbb{R}^n \to \mathbb{R}^m$ are $C^1$ or $C^2$ (when needed), with $m \leq n$.

- easily generalized to inequality constraints ... but it may be better to use interior-point methods for these.

(ecp): attempt to find local solutions, or at least KKT points:
\[
  \nabla_x L(x,y) = \nabla f(x) - J(x)^T y = 0 \quad \text{and} \quad c(x) = 0,
\]
a nonlinear, square system in $x$ and $y$ (linear in $y$)
$\Longrightarrow$ use Newton's method to find a correction $(s, w)$ to $(x, y)$:
\[
  \begin{pmatrix} \nabla_{xx} L(x,y) & -J(x)^T \\ J(x) & 0 \end{pmatrix}
  \begin{pmatrix} s \\ w \end{pmatrix}
  = -\begin{pmatrix} \nabla_x L(x,y) \\ c(x) \end{pmatrix}
\]
Alternative formulations

Recall: the Lagrangian function is $L(x,y) = f(x) - y^T c(x)$.

unsymmetric:
\[
  \begin{pmatrix} \nabla_{xx} L(x,y) & -J(x)^T \\ J(x) & 0 \end{pmatrix}
  \begin{pmatrix} s \\ w \end{pmatrix}
  = -\begin{pmatrix} \nabla_x L(x,y) \\ c(x) \end{pmatrix}
\]
or symmetric:
\[
  \begin{pmatrix} \nabla_{xx} L(x,y) & J(x)^T \\ J(x) & 0 \end{pmatrix}
  \begin{pmatrix} s \\ -w \end{pmatrix}
  = -\begin{pmatrix} \nabla_x L(x,y) \\ c(x) \end{pmatrix}
\]
or (with $y^+ = y + w$) symmetric:
\[
  \begin{pmatrix} \nabla_{xx} L(x,y) & J(x)^T \\ J(x) & 0 \end{pmatrix}
  \begin{pmatrix} s \\ -y^+ \end{pmatrix}
  = -\begin{pmatrix} \nabla f(x) \\ c(x) \end{pmatrix}
\]
[unsymmetric also possible]
Some more details

- often approximate with a symmetric $B \approx \nabla_{xx} L(x,y)$, e.g. solve
\[
  \begin{pmatrix} B & J(x)^T \\ J(x) & 0 \end{pmatrix}
  \begin{pmatrix} s \\ -y^+ \end{pmatrix}
  = -\begin{pmatrix} \nabla f(x) \\ c(x) \end{pmatrix}
\]
- solve the system using:
  - an unsymmetric (LU) factorization of $\begin{pmatrix} B & J(x)^T \\ J(x) & 0 \end{pmatrix}$
  - a symmetric (indefinite) factorization of $\begin{pmatrix} B & J(x)^T \\ J(x) & 0 \end{pmatrix}$
  - symmetric factorizations of $B$ and the Schur complement $J(x) B^{-1} J(x)^T$
  - an iterative method (GMRES(k), MINRES, CG within $N(J)$, etc.)
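As a concrete illustration of the first option, the symmetric system can be assembled as a dense block matrix and solved with an LU factorization. This is a minimal sketch using NumPy; the function name `kkt_step` and the small test data are illustrative choices, not from the lecture.

```python
import numpy as np

def kkt_step(g, B, J, c):
    """Solve  [B  J^T; J  0] [s; -y] = -[g; c]  for the SQP step s and
    multiplier estimates y (dense LU via numpy.linalg.solve)."""
    n, m = B.shape[0], J.shape[0]
    K = np.block([[B, J.T], [J, np.zeros((m, m))]])   # KKT matrix
    sol = np.linalg.solve(K, -np.concatenate([g, c]))
    return sol[:n], -sol[n:]                          # s, y^+

# Illustrative data (two variables, one constraint):
g = np.array([1.0, 1.0])        # gradient of the objective
B = np.eye(2)                   # (approximate) Lagrangian Hessian
J = np.array([[-2.4, -1.6]])    # constraint Jacobian
c = np.array([0.08])            # constraint value
s, y_plus = kkt_step(g, B, J, c)
# By construction, s satisfies the linearized constraint J s = -c.
```

For large sparse problems one would of course use a sparse symmetric indefinite factorization or an iterative method instead, as the slide notes.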
An alternative interpretation

\[
  \text{(QP)}: \quad \min_{s \in \mathbb{R}^n} \nabla f(x)^T s + \tfrac{1}{2} s^T B s
  \quad \text{subject to} \quad J(x)s = -c(x)
\]

- QP = quadratic program
- first-order model of the constraints $c(x+s)$
- second-order model of the objective $f(x+s)$ ... but $B$ includes curvature of the constraints

The solution to (QP) satisfies
\[
  \begin{pmatrix} B & J(x)^T \\ J(x) & 0 \end{pmatrix}
  \begin{pmatrix} s \\ -y^+ \end{pmatrix}
  = -\begin{pmatrix} \nabla f(x) \\ c(x) \end{pmatrix}
\]
Sequential quadratic programming - SQP

... or successive quadratic programming, or recursive quadratic programming (RQP)

A basic SQP method:
Given $(x_0, y_0)$, set $k = 0$. Until convergence, iterate:
- compute a suitable symmetric $B_k$ using $(x_k, y_k)$
- find the solution of (QP$_k$),
\[
  s_k = \arg\min_{s \in \mathbb{R}^n} \nabla f(x_k)^T s + \tfrac{1}{2} s^T B_k s
  \quad \text{subject to} \quad J(x_k)s = -c(x_k),
\]
  along with associated Lagrange multiplier estimates $y_{k+1}$
- set $x_{k+1} = x_k + s_k$ and let $k := k + 1$.
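The basic iteration above can be sketched in a few lines with $B_k = \nabla_{xx} L(x_k, y_k)$. The test problem (minimize $x_1 + x_2$ subject to $x_1^2 + x_2^2 = 2$, with solution $x^* = (-1,-1)$, $y^* = -1/2$), the starting point, and the tolerances are assumptions of this sketch, not from the slides.

```python
import numpy as np

def sqp(x, y, tol=1e-10, max_iter=50):
    """Basic SQP with exact Lagrangian Hessian for
    min x1 + x2  s.t.  x1^2 + x2^2 - 2 = 0."""
    for _ in range(max_iter):
        g = np.array([1.0, 1.0])                  # grad f(x_k)
        J = np.array([[2.0 * x[0], 2.0 * x[1]]])  # J(x_k)
        c = np.array([x[0]**2 + x[1]**2 - 2.0])   # c(x_k)
        if np.linalg.norm(g - y * J.ravel()) < tol and abs(c[0]) < tol:
            break                                 # KKT point reached
        B = -2.0 * y * np.eye(2)                  # Hessian of L(x, y)
        K = np.block([[B, J.T], [J, np.zeros((1, 1))]])
        sol = np.linalg.solve(K, -np.concatenate([g, c]))
        x = x + sol[:2]                           # x_{k+1} = x_k + s_k
        y = -sol[2]                               # y_{k+1}
    return x, y

x_fin, y_fin = sqp(np.array([-1.2, -0.8]), -0.5)
```

Starting close enough to the solution, the iterates converge rapidly to $(-1,-1)$ with multiplier $-1/2$, matching the quadratic convergence claim on the next slide.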
Sequential quadratic programming - SQP ...

Advantages of SQP:
- simple
- fast:
  - quadratically convergent with $B_k = \nabla_{xx} L(x_k, y_k)$
  - superlinearly convergent with good $B_k \approx \nabla_{xx} L(x_k, y_k)$
  - don't actually need $B_k \to \nabla_{xx} L(x_k, y_k)$

Issues with pure SQP:
- how to choose $B_k$? [similar to Newton's method for unconstrained optimization and nonlinear systems]
- what if (QP$_k$) is unbounded from below? and when?
- how do we globalize the SQP iteration?
Sequential quadratic programming - SQP ...

The (QP$_k$) subproblem:
\[
  \min_{s \in \mathbb{R}^n} \nabla f(x_k)^T s + \tfrac{1}{2} s^T B_k s
  \quad \text{subject to} \quad J(x_k)s = -c(x_k)
\]
- need the constraints to be consistent: OK if $J(x_k)$ is full rank
- need $B_k$ to be positive (semi-)definite when $J(x_k)s = 0$
  $\Longleftrightarrow$ $N_k^T B_k N_k$ positive (semi-)definite, where the columns of $N_k$ form a basis for the null space of $J(x_k)$
  $\Longleftrightarrow$ $\begin{pmatrix} B_k & J(x_k)^T \\ J(x_k) & 0 \end{pmatrix}$ (is non-singular and) has $m$ negative eigenvalues
Linesearch SQP methods

\[
  s_k = \arg\min_{s \in \mathbb{R}^n} \nabla f(x_k)^T s + \tfrac{1}{2} s^T B_k s
  \quad \text{subject to} \quad J(x_k)s = -c(x_k)
\]

Linesearch SQP: set $x_{k+1} = x_k + \alpha_k s_k$, where $\alpha_k$ is chosen so that
\[
  \Phi(x_k + \alpha_k s_k, \sigma_k) < \Phi(x_k, \sigma_k),
\]
where $\Phi(x, \sigma)$ is a suitable merit function and the $\sigma_k$ are parameters.

Recall the unconstrained GLM (generic linesearch method): it is crucial that $s_k$ be a descent direction for $\Phi(x, \sigma_k)$ at $x_k$; normally we require that $B_k$ is positive definite.
Suitable merit functions for SQP

Recall the quadratic penalty function:
\[
  \Phi(x, \sigma) = f(x) + \frac{1}{2\sigma} \|c(x)\|^2
\]

Theorem 30. Suppose that $B_k$ is positive definite, and that $(s_k, y_{k+1})$ are the SQP search direction and its associated Lagrange multiplier estimates for the (ecp) problem at $x_k$. Then if $x_k$ is not a KKT point of (ecp), $s_k$ is a descent direction for the quadratic penalty function $\Phi(x, \sigma_k)$ at $x_k$ whenever
\[
  \sigma_k \leq \frac{\|c(x_k)\|}{\|y_{k+1}\|}.
\]
Suitable merit functions for SQP ...

Proof of Theorem 30. The SQP direction $s_k$ and its associated multiplier estimates $y_{k+1}$ satisfy
\[
  B_k s_k - J(x_k)^T y_{k+1} = -\nabla f(x_k) \tag{*}
\]
\[
  J(x_k) s_k = -c(x_k) \tag{**}
\]
(*) $\Longrightarrow$ $(s_k)^T \nabla f(x_k) = -(s_k)^T B_k s_k + (s_k)^T J(x_k)^T y_{k+1}$
(**) $\Longrightarrow$ $c(x_k)^T J(x_k) s_k = -\|c(x_k)\|^2$, hence
\[
  (s_k)^T \nabla f(x_k) = -(s_k)^T B_k s_k - c(x_k)^T y_{k+1}.
\]
These, the positive definiteness of $B_k$, the Cauchy-Schwarz inequality, the required bound on $\sigma_k$, and $s_k \neq 0$ if $x_k$ is not critical $\Longrightarrow$ (continued)
Suitable merit functions for SQP ...

Proof of Theorem 30 (continued):
\[
  (s_k)^T \nabla_x \Phi(x_k, \sigma_k)
  = (s_k)^T \left( \nabla f(x_k) + \frac{1}{\sigma_k} J(x_k)^T c(x_k) \right)
  = -(s_k)^T B_k s_k - c(x_k)^T y_{k+1} - \frac{\|c(x_k)\|^2}{\sigma_k}
\]
\[
  \leq -(s_k)^T B_k s_k + \|c(x_k)\| \left( \|y_{k+1}\| - \frac{\|c(x_k)\|}{\sigma_k} \right) < 0,
\]
and so $s_k$ is descent for $\Phi(x, \sigma_k)$ at $x_k$. $\Box$

Other suitable merit functions: the non-differentiable exact penalty function [widely used]
\[
  \Psi(x, \rho) = f(x) + \rho \|c(x)\|,
\]
where $\|\cdot\|$ can be any norm and $\rho > 0$.
- recall that minimizers of (ecp) correspond to those of $\Psi(x, \rho)$ for $\rho$ sufficiently large (and finite! $\rho > \|y^*\|_D$, the dual norm of the optimal multipliers); the equivalent of Theorem 30 holds ($\rho_k \geq \|y_{k+1}\|_D$).
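Theorem 30 is easy to check numerically. The sketch below builds a random positive definite $B_k$, solves the SQP system, and verifies that the slope of the quadratic penalty function along $s_k$ is negative at the largest $\sigma_k$ the theorem allows; all data here are synthetic, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 2
A = rng.standard_normal((n, n))
B = A @ A.T + n * np.eye(n)           # positive definite B_k
J = rng.standard_normal((m, n))       # J(x_k), full rank with probability 1
g = rng.standard_normal(n)            # grad f(x_k)
c = rng.standard_normal(m)            # c(x_k), nonzero

# Solve the SQP system for (s_k, y_{k+1}).
K = np.block([[B, J.T], [J, np.zeros((m, m))]])
sol = np.linalg.solve(K, -np.concatenate([g, c]))
s, y_next = sol[:n], -sol[n:]

# Largest sigma_k allowed by Theorem 30, and the directional derivative of
# Phi(x, sigma) = f(x) + ||c(x)||^2 / (2 sigma) along s_k at x_k.
sigma = np.linalg.norm(c) / np.linalg.norm(y_next)
slope = s @ (g + J.T @ c / sigma)     # (s_k)^T grad_x Phi(x_k, sigma_k)
```

The assertion `slope < 0` reproduces exactly the chain of inequalities in the proof.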
The Maratos effect

The merit function may prevent acceptance of the full SQP step (so $\alpha_k \neq 1$) arbitrarily close to $x^*$ $\Longrightarrow$ slow convergence.

[Figure: contours near the unit circle showing $x_k$, the rejected full step $x_k + s_k$, and $x^*$.]

$f(x) = 2(x_1^2 + x_2^2 - 1) - x_1$ and $c(x) = x_1^2 + x_2^2 - 1$; solution: $x^* = (1,0)$, $y^* = 3/2$.

Here: the $\ell_1$ non-differentiable exact penalty function ($\rho = 1$), but other merit functions have similar behaviour.
Avoiding the Maratos effect

The Maratos effect occurs because the curvature of the constraints is not adequately represented by linearization in the SQP model:
\[
  \|c(x_k + s_k)\| = O(\|s_k\|^2)
\]
$\Longrightarrow$ need to correct for this curvature.

Use a second-order correction from $x_k + s_k$:
\[
  \|c(x_k + s_k + s_k^C)\| = o(\|s_k\|^2).
\]
Also, we do not want to destroy the potential for fast convergence $\Longrightarrow$ require $s_k^C = o(s_k)$.
Popular second-order corrections

- minimum norm solution to $c(x_k + s_k) + J(x_k + s_k) s_k^C = 0$:
\[
  \begin{pmatrix} I & J(x_k + s_k)^T \\ J(x_k + s_k) & 0 \end{pmatrix}
  \begin{pmatrix} s_k^C \\ -y_{k+1}^C \end{pmatrix}
  = -\begin{pmatrix} 0 \\ c(x_k + s_k) \end{pmatrix}
\]
- minimum norm solution to $c(x_k + s_k) + J(x_k) s_k^C = 0$:
\[
  \begin{pmatrix} I & J(x_k)^T \\ J(x_k) & 0 \end{pmatrix}
  \begin{pmatrix} s_k^C \\ -y_{k+1}^C \end{pmatrix}
  = -\begin{pmatrix} 0 \\ c(x_k + s_k) \end{pmatrix}
\]
- another SQP step from $x_k + s_k$:
\[
  \begin{pmatrix} \nabla_{xx} L(x_k + s_k, y_{k+1}) & J(x_k + s_k)^T \\ J(x_k + s_k) & 0 \end{pmatrix}
  \begin{pmatrix} s_k^C \\ -y_{k+1}^C \end{pmatrix}
  = -\begin{pmatrix} \nabla_x L(x_k + s_k, y_{k+1}) \\ c(x_k + s_k) \end{pmatrix}
\]
... and so on ...
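For the first correction above, the minimum-norm solution has the closed form $s_k^C = -J^T (J J^T)^{-1} c$, with $J$ and $c$ evaluated at $x_k + s_k$; this is equivalent to solving the bordered system on the slide. A sketch on the circle constraint from the Maratos example, where the landing point $x_k + s_k$ is an illustrative value of my own choosing:

```python
import numpy as np

def min_norm_correction(J_new, c_new):
    """Minimum-norm s^C with J_new s^C = -c_new (J_new full row rank);
    closed form of the bordered system with identity (1,1) block."""
    return -J_new.T @ np.linalg.solve(J_new @ J_new.T, c_new)

# Circle constraint c(x) = x1^2 + x2^2 - 1; suppose the SQP step lands here:
x_plus = np.array([1.0, 0.1])                          # illustrative x_k + s_k
c_new = np.array([x_plus @ x_plus - 1.0])              # violation 0.01
J_new = np.array([[2.0 * x_plus[0], 2.0 * x_plus[1]]])  # J(x_k + s_k)
s_C = min_norm_correction(J_new, c_new)
# The corrected point x_k + s_k + s^C has a much smaller violation,
# of the order of the square of the previous one.
```

Here the violation drops from $10^{-2}$ to roughly $2.5 \times 10^{-5}$, exactly the curvature correction the Maratos discussion calls for.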
Second-order corrections in action

[Figure: the same example, now showing $x_k$, $x_k + s_k$, the corrected point $x_k + s_k + s_k^C$, and $x^*$.]

$f(x) = 2(x_1^2 + x_2^2 - 1) - x_1$ and $c(x) = x_1^2 + x_2^2 - 1$; solution: $x^* = (1,0)$, $y^* = 3/2$.

Here: the $\ell_1$ non-differentiable exact penalty function ($\rho = 1$), but other merit functions have similar behaviour.

- fast convergence
- $x_k + s_k + s_k^C$ reduces $\Phi$ $\Longrightarrow$ global convergence
Trust-region SQP methods

The obvious trust-region approach:
\[
  s_k = \arg\min_{s \in \mathbb{R}^n} \nabla f(x_k)^T s + \tfrac{1}{2} s^T B_k s
  \quad \text{subject to} \quad J(x_k)s = -c(x_k) \ \text{and} \ \|s\| \leq \Delta_k
\]
- do not require that $B_k$ be positive definite $\Longrightarrow$ can use $B_k = \nabla_{xx} L(x_k, y_k)$

But if $\Delta_k < \Delta_{\mathrm{CRIT}}$, where
\[
  \Delta_{\mathrm{CRIT}} \stackrel{\mathrm{def}}{=} \min \|s\| \quad \text{subject to} \quad J(x_k)s = -c(x_k),
\]
$\Longrightarrow$ no solution to the trust-region subproblem
$\Longrightarrow$ the simple trust-region approach to SQP is flawed if $c(x_k) \neq 0$
$\Longrightarrow$ need to consider alternatives
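When $J(x_k)$ is full rank, $\Delta_{\mathrm{CRIT}}$ has a closed form: the shortest $s$ with $J s = -c$ is the minimum-norm solution, so $\Delta_{\mathrm{CRIT}} = \|J^T (J J^T)^{-1} c\|$. A tiny numerical illustration (the Jacobian and residual values are made up for this sketch):

```python
import numpy as np

J = np.array([[2.0, 0.0]])    # illustrative J(x_k), full row rank
c = np.array([0.5])           # illustrative constraint violation c(x_k)

# Shortest step satisfying the linearized constraint J s = -c:
s_min = -J.T @ np.linalg.solve(J @ J.T, c)
delta_crit = np.linalg.norm(s_min)
# Here delta_crit = 0.25: with any trust-region radius Delta_k < 0.25,
# the constraints J s = -c and ||s|| <= Delta_k are inconsistent and
# the naive subproblem has an empty feasible set.
```

This is precisely the infeasibility the next slide's figure illustrates, and the motivation for the composite-step methods that follow.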
Infeasibility of the SQP step

[Figure: around $x_k$, the trust region $\|s\| \leq \Delta_k$ does not intersect the linearized constraint $c(x_k) + J(x_k)s = 0$, even though the nonlinear constraint $c(x) = 0$ passes nearby.]
An alternative: composite-step methods

Aim: find a composite step
\[
  s_k = n_k + t_k,
\]
where
- the normal step $n_k$ moves towards feasibility of the linearized constraints (within the trust region):
\[
  \|J(x_k) n_k + c(x_k)\| < \|c(x_k)\|
\]
  (the model objective may get worse)
- the tangential step $t_k$ reduces the model objective function (within the trust region) without sacrificing the feasibility obtained from $n_k$:
\[
  J(x_k)(n_k + t_k) = J(x_k) n_k \quad \Longleftrightarrow \quad J(x_k) t_k = 0
\]
Normal and tangential steps

[Figure: the trust region around $x_k$, the linearized constraint, the normal step $n_k$ to (close to) the nearest point on the linearized constraint, and a dotted line of candidate tangential steps. Points on the dotted line are all potential tangential steps.]
Constraint reduction [Byrd-Omojokun]

normal step: replace $J(x_k)s = -c(x_k)$ and $\|s\| \leq \Delta_k$ by
\[
  \text{approximately minimize } \|J(x_k)n + c(x_k)\| \quad \text{subject to} \quad \|n\| \leq \Delta_k
\]
tangential step: approximately solve
\[
  \arg\min_{t \in \mathbb{R}^n} (\nabla f(x_k) + B_k n_k)^T t + \tfrac{1}{2} t^T B_k t
  \quad \text{subject to} \quad J(x_k)t = 0 \ \text{and} \ \|n_k + t\| \leq \Delta_k
\]
- use conjugate gradients to solve both subproblems $\Longrightarrow$ Cauchy conditions satisfied in both cases
- globally convergent
- basis of the successful KNITRO package
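A simplified sketch of the two subproblems follows. The real Byrd-Omojokun method solves each approximately by truncated conjugate gradients; to keep the sketch short, each is solved exactly via factorizations and then scaled back into the trust region (the 0.8 safety factor for the normal step is a common but illustrative choice, and $B_k$ is assumed positive definite on the null space of $J$).

```python
import numpy as np

def byrd_omojokun(g, B, J, c, delta):
    """Composite step s = n + t for the trust-region SQP subproblem (sketch)."""
    # Normal step: minimum-norm reducer of ||J n + c||, kept inside 0.8*delta.
    n = -J.T @ np.linalg.solve(J @ J.T, c)
    if np.linalg.norm(n) > 0.8 * delta:
        n *= 0.8 * delta / np.linalg.norm(n)
    # Tangential step: minimize (g + B n)^T t + 0.5 t^T B t over null(J),
    # using an orthonormal null-space basis N from the SVD of J.
    N = np.linalg.svd(J)[2][J.shape[0]:].T
    t = N @ np.linalg.solve(N.T @ B @ N, -N.T @ (g + B @ n))
    # n lies in range(J^T) and t in null(J), so they are orthogonal:
    # shrink t (if needed) so that ||n + t|| <= delta.
    room = delta**2 - n @ n
    if t @ t > room:
        t *= np.sqrt(max(room, 0.0) / (t @ t))
    return n, t

# Illustrative data (same shapes as the earlier sketches):
g = np.array([1.0, 1.0])
B = np.eye(2)
J = np.array([[-2.4, -1.6]])
c = np.array([0.08])
n, t = byrd_omojokun(g, B, J, c, delta=1.0)
```

When the trust region is inactive, as here, $n + t$ coincides with the plain equality-constrained QP step; the value of the construction is that both pieces remain well defined when $\Delta_k$ is small and the naive subproblem is infeasible.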
Other alternatives

Composite-step SQP methods:
- constraint relaxation (Vardi) - not addressed here
- constraint reduction (Byrd-Omojokun) - covered
- constraint lumping (Celis-Dennis-Tapia) - not addressed here

Also:
- the $S\ell_1$QP method of Fletcher - not addressed here
- the filter-SQP approach of Fletcher and Leyffer - not addressed here

An important class of methods not addressed: active-set methods for linearly-constrained nonlinear problems (i.e., the generalization of simplex methods from the LP to the nonlinear case).