SF2822 Applied Nonlinear Optimization, KTH — Lecture 9, 2017/2018

Lecture 9: Sequential quadratic programming
Anders Forsgren

Preparatory question: try to solve theory question 6.
Optimality conditions for nonlinear programs

Consider an equality-constrained nonlinear programming problem

(P=)   minimize $f(x)$   subject to $g(x) = 0$,

where $f, g \in C^2$, $g : \mathbb{R}^n \to \mathbb{R}^m$. If the Lagrangian function is defined as $L(x,\lambda) = f(x) - \lambda^T g(x)$, the first-order optimality conditions are $\nabla L(x,\lambda) = 0$. We write them as
$$\nabla L(x,\lambda) = \begin{pmatrix} \nabla_x L(x,\lambda) \\ \nabla_\lambda L(x,\lambda) \end{pmatrix} = \begin{pmatrix} \nabla f(x) - A(x)^T \lambda \\ -g(x) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$
where $A(x)^T = \big(\nabla g_1(x)\ \ \nabla g_2(x)\ \ \cdots\ \ \nabla g_m(x)\big)$.

Newton's method for solving a nonlinear equation

Consider solving the nonlinear equation $\nabla f(u) = 0$, where $f : \mathbb{R}^n \to \mathbb{R}$, $f \in C^2$. Then
$$\nabla f(u+p) = \nabla f(u) + \nabla^2 f(u)\,p + o(\|p\|).$$
The linearization is given by $\nabla f(u) + \nabla^2 f(u)\,p$. Choose $p$ so that $\nabla f(u) + \nabla^2 f(u)\,p = 0$, i.e., solve $\nabla^2 f(u)\,p = -\nabla f(u)$.

A Newton iteration takes the following form for a given $u$:
1. $p$ solves $\nabla^2 f(u)\,p = -\nabla f(u)$.
2. $u \leftarrow u + p$.

(The nonlinear equation need not be a gradient.)
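The two-step iteration above can be sketched in a few lines of Python; the function and variable names below are illustrative, not from the lecture:

```python
import numpy as np

def newton(grad, hess, u, tol=1e-12, max_iter=50):
    """Newton's method for grad(u) = 0: solve hess(u) p = -grad(u), set u <- u + p."""
    for _ in range(max_iter):
        p = np.linalg.solve(hess(u), -grad(u))
        u = u + p
        if np.linalg.norm(p) < tol:
            break
    return u

# Solve u^3 - 1 = 0 (here the "nonlinear equation" is scalar and not a gradient).
u_star = newton(lambda u: np.array([u[0] ** 3 - 1.0]),
                lambda u: np.array([[3.0 * u[0] ** 2]]),
                np.array([2.0]))
```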
Speed of convergence for Newton's method

Theorem. Assume that $f \in C^3$ and that $\nabla f(u^*) = 0$ with $\nabla^2 f(u^*)$ nonsingular. Then, if Newton's method (with steplength one) is started at a point sufficiently close to $u^*$, it is well defined and converges to $u^*$ with convergence rate at least two, i.e., there is a constant $C$ such that
$$\|u_{k+1} - u^*\| \le C \|u_k - u^*\|^2.$$

The proof can be given by studying a Taylor-series expansion,
$$u_{k+1} - u^* = u_k - \nabla^2 f(u_k)^{-1}\nabla f(u_k) - u^* = \nabla^2 f(u_k)^{-1}\Big(\nabla f(u^*) - \nabla f(u_k) - \nabla^2 f(u_k)(u^* - u_k)\Big).$$
For $u_k$ sufficiently close to $u^*$,
$$\|\nabla f(u^*) - \nabla f(u_k) - \nabla^2 f(u_k)(u^* - u_k)\| \le C \|u_k - u^*\|^2.$$

One-dimensional example for Newton's method

For a positive number $d$, consider computing $1/d$ by minimizing $f(u) = du - \ln u$. Then $f'(u) = d - 1/u$ and $f''(u) = 1/u^2$. We see that $u^* = 1/d$. The Newton iteration is
$$u_{k+1} = u_k - \frac{f'(u_k)}{f''(u_k)} = u_k - \Big(d - \frac{1}{u_k}\Big)u_k^2 = 2u_k - d\,u_k^2.$$
Then
$$u_{k+1} - \frac{1}{d} = 2u_k - d\,u_k^2 - \frac{1}{d} = -d\Big(u_k - \frac{1}{d}\Big)^2.$$
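The quadratic error recursion $u_{k+1} - 1/d = -d\,(u_k - 1/d)^2$ is easy to observe numerically; a small sketch (starting point chosen for illustration):

```python
d = 2.0
u = 0.4                       # start in (0, 2/d) so the iteration converges to 1/d
errors = []
for _ in range(5):
    u = 2.0 * u - d * u * u   # Newton step: u_{k+1} = 2 u_k - d u_k^2
    errors.append(abs(u - 1.0 / d))
# each error equals d times the square of the previous one (up to rounding)
```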
One-dimensional example for Newton's method, cont.

[Figure: graphical picture of the Newton iteration for $d = 2$.]

First-order optimality conditions as a system of equations

The first-order necessary optimality conditions may be viewed as a system of $n+m$ nonlinear equations with $n+m$ unknowns, $x$ and $\lambda$, according to
$$\begin{pmatrix} \nabla f(x) - A(x)^T \lambda \\ g(x) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
A Newton iteration takes the form
$$\begin{pmatrix} x^+ \\ \lambda^+ \end{pmatrix} = \begin{pmatrix} x \\ \lambda \end{pmatrix} + \begin{pmatrix} p \\ \nu \end{pmatrix},
\quad\text{where}\quad
\begin{pmatrix} \nabla^2_{xx} L(x,\lambda) & -A(x)^T \\ A(x) & 0 \end{pmatrix}
\begin{pmatrix} p \\ \nu \end{pmatrix}
= -\begin{pmatrix} \nabla f(x) - A(x)^T \lambda \\ g(x) \end{pmatrix},$$
for $L(x,\lambda) = f(x) - \lambda^T g(x)$.
First-order optimality conditions as a system of equations, cont.

The resulting Newton system may equivalently be written with $\lambda^+ = \lambda + \nu$ as unknown,
$$\begin{pmatrix} \nabla^2_{xx} L(x,\lambda) & -A(x)^T \\ A(x) & 0 \end{pmatrix}
\begin{pmatrix} p \\ \lambda^+ \end{pmatrix}
= \begin{pmatrix} -\nabla f(x) \\ -g(x) \end{pmatrix},$$
alternatively, in component form,
$$\nabla^2_{xx} L(x,\lambda)\,p - A(x)^T \lambda^+ = -\nabla f(x), \qquad A(x)\,p = -g(x).$$
We prefer the form with $\lambda^+$, since it can be directly generalized to problems with inequality constraints.

Quadratic programming with equality constraints

Compare with an equality-constrained quadratic programming problem

(EQP)   minimize $\tfrac{1}{2} p^T H p + c^T p$   subject to $Ap = b$, $p \in \mathbb{R}^n$,

where the unique optimal solution $p$ and multiplier vector $\lambda^+$ are given by
$$\begin{pmatrix} H & -A^T \\ A & 0 \end{pmatrix}
\begin{pmatrix} p \\ \lambda^+ \end{pmatrix}
= \begin{pmatrix} -c \\ b \end{pmatrix},$$
if $Z^T H Z \succ 0$ and $A$ has full row rank, where $Z$ is a matrix whose columns form a basis for $\mathrm{null}(A)$.
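The (EQP) optimality system above is a single linear system and can be solved directly; a minimal NumPy sketch, using the sign convention $L = f - \lambda^T g$ from the slides (the helper name `solve_eqp` is made up):

```python
import numpy as np

def solve_eqp(H, c, A, b):
    """min 0.5 p'Hp + c'p s.t. Ap = b, via [H -A'; A 0][p; lam] = [-c; b]."""
    n, m = H.shape[0], A.shape[0]
    K = np.block([[H, -A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-c, b]))
    return sol[:n], sol[n:]

# Example: min 0.5 (p1^2 + p2^2) + p1 subject to p1 + p2 = 1.
p, lam = solve_eqp(np.eye(2), np.array([1.0, 0.0]),
                   np.array([[1.0, 1.0]]), np.array([1.0]))
```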
Newton iteration and equality-constrained quadratic program

Compare
$$\begin{pmatrix} \nabla^2_{xx} L(x,\lambda) & -A(x)^T \\ A(x) & 0 \end{pmatrix}
\begin{pmatrix} p \\ \lambda^+ \end{pmatrix}
= \begin{pmatrix} -\nabla f(x) \\ -g(x) \end{pmatrix}
\quad\text{with}\quad
\begin{pmatrix} H & -A^T \\ A & 0 \end{pmatrix}
\begin{pmatrix} p \\ \lambda^+ \end{pmatrix}
= \begin{pmatrix} -c \\ b \end{pmatrix}.$$
Identify: $\nabla^2_{xx} L(x,\lambda) \leftrightarrow H$, $\nabla f(x) \leftrightarrow c$, $A(x) \leftrightarrow A$, $-g(x) \leftrightarrow b$.

Newton iteration as a QP problem

A Newton iteration for solving the first-order necessary optimality conditions of (P=) may be viewed as solving the QP problem

(QP=)   minimize $\tfrac{1}{2} p^T \nabla^2_{xx} L(x,\lambda)\,p + \nabla f(x)^T p$   subject to $A(x)\,p = -g(x)$, $p \in \mathbb{R}^n$,

and letting $x^+ = x + p$, where $\lambda^+$ is given by the multipliers of (QP=).

Problem (QP=) is well defined with unique optimal solution $p$ and multiplier vector $\lambda^+$ if $Z(x)^T \nabla^2_{xx} L(x,\lambda) Z(x) \succ 0$ and $A(x)$ has full row rank, where $Z(x)$ is a matrix whose columns form a basis for $\mathrm{null}(A(x))$.
An SQP iteration for problems with equality constraints

Given $x$, $\lambda$ such that $Z(x)^T \nabla^2_{xx} L(x,\lambda) Z(x) \succ 0$ and $A(x)$ has full row rank, a Newton iteration takes the following form:
1. Compute the optimal solution $p$ and multiplier vector $\lambda^+$ of

   (QP=)   minimize $\tfrac{1}{2} p^T \nabla^2_{xx} L(x,\lambda)\,p + \nabla f(x)^T p$   subject to $A(x)\,p = -g(x)$, $p \in \mathbb{R}^n$.

2. $x \leftarrow x + p$, $\lambda \leftarrow \lambda^+$.

We call this method sequential quadratic programming (SQP).
Note! (QP=) is solved by solving a system of linear equations.
Note! $x$ and $\lambda$ have given numerical values in (QP=).

Speed of convergence for the SQP method for equality-constrained problems

Theorem. Assume that $f \in C^3$, $g \in C^3$ on $\mathbb{R}^n$ and that $x^*$ is a regular point that satisfies the second-order sufficient optimality conditions of (P=). If the SQP method (with steplength one) is started at a point sufficiently close to $(x^*, \lambda^*)$, then it is well defined and converges to $(x^*, \lambda^*)$ with convergence rate at least two.

Proof. In a neighborhood of $(x^*, \lambda^*)$ it holds that $Z(x)^T \nabla^2_{xx} L(x,\lambda) Z(x) \succ 0$ and that
$$\begin{pmatrix} \nabla^2_{xx} L(x,\lambda) & -A(x)^T \\ A(x) & 0 \end{pmatrix}$$
is nonsingular. The subproblem (QP=) is hence well defined and the result follows from Newton's method.

Note! The iterates are normally not feasible to (P=).
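For concreteness, the iteration can be coded directly on a toy equality-constrained problem; the instance below is an assumption for illustration, not from the slides: minimize $x_1 + x_2$ subject to $x_1^2 + x_2^2 = 2$, with solution $x^* = (-1,-1)$, $\lambda^* = -1/2$.

```python
import numpy as np

grad_f = lambda x: np.array([1.0, 1.0])
g      = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 2.0])
A      = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]]])   # constraint Jacobian
hess_L = lambda x, lam: (0.0 - 2.0 * lam[0]) * np.eye(2)  # hess f - lam * hess g

x, lam = np.array([-1.5, -0.5]), np.array([-1.0])         # start near the solution
for _ in range(10):
    K = np.block([[hess_L(x, lam), -A(x).T],
                  [A(x), np.zeros((1, 1))]])
    sol = np.linalg.solve(K, np.concatenate([-grad_f(x), -g(x)]))
    x, lam = x + sol[:2], sol[2:]                         # x <- x + p, lam <- lam+
```

Each pass solves one (QP=) via its KKT system, exactly as in step 1 above.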
SQP method for equality-constrained problems

So far we have discussed SQP for (P=) in an ideal case. Comments:
- If $Z(x)^T \nabla^2_{xx} L(x,\lambda) Z(x) \not\succ 0$, we may replace $\nabla^2_{xx} L(x,\lambda)$ by $B$ in (QP=), where $B$ is a symmetric approximation of $\nabla^2_{xx} L(x,\lambda)$ that satisfies $Z(x)^T B Z(x) \succ 0$. A quasi-Newton approximation $B$ of $\nabla^2_{xx} L(x,\lambda)$ may be used.
- If $A(x)$ does not have full row rank, $A(x)\,p = -g(x)$ may lack a solution. This may be overcome by introducing elastic variables. This is not covered here.
- We have shown local convergence properties. To obtain convergence from an arbitrary initial point we may utilize a merit function and use linesearch.

Enforcing convergence by a linesearch strategy

1. Compute the optimal solution $p$ and multiplier vector $\lambda^+$ of

   (QP=)   minimize $\tfrac{1}{2} p^T \nabla^2_{xx} L(x,\lambda)\,p + \nabla f(x)^T p$   subject to $A(x)\,p = -g(x)$, $p \in \mathbb{R}^n$.

2. $x \leftarrow x + \alpha p$, where $\alpha$ is determined in a linesearch to give sufficient decrease of a merit function.

(Ideally, $\alpha = 1$ eventually.)
Example of merit function for SQP on (P=)

A merit function typically consists of a weighting of optimality and feasibility. An example is the augmented Lagrangian merit function
$$M_\mu(x) = f(x) - \lambda(x)^T g(x) + \frac{1}{2\mu}\, g(x)^T g(x),$$
where $\mu$ is a positive parameter and $\lambda(x) = \big(A(x)A(x)^T\big)^{-1} A(x) \nabla f(x)$. (The vector $\lambda(x)$ is here the least-squares solution of $A(x)^T \lambda = \nabla f(x)$.)

Then the SQP solution $p$ is a descent direction for $M_\mu$ at $x$ if $\mu$ is sufficiently close to zero and $Z(x)^T B Z(x) \succ 0$; see Nocedal and Wright, pp. 545–546. We may then carry out a linesearch on $M_\mu$ along $p$. Ideally the step length is chosen as $\alpha = 1$. We consider the pure method, where $\alpha = 1$ and $\lambda^+$ is given by (QP=).

SQP for inequality-constrained problems

In the SQP subproblem (QP=), the constraints are approximated by a linearization around $x$, i.e., the requirement on $p$ is
$$g_i(x) + \nabla g_i(x)^T p = 0, \quad i = 1, \ldots, m.$$
For an inequality constraint $g_i(x) \ge 0$ this requirement may be generalized to
$$g_i(x) + \nabla g_i(x)^T p \ge 0.$$
In each iteration, an SQP method gives a prediction of the active constraints in (P) by the constraints that are active in the SQP subproblem. The QP subproblem gives nonnegative multipliers for the inequality constraints.
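Evaluating $M_\mu$ at a point is a short computation; a sketch using NumPy's least-squares solver, on assumed toy data (the instance $f(x) = x_1 + x_2$, $g(x) = x_1^2 + x_2^2 - 2$ is illustrative, not from the slides):

```python
import numpy as np

def merit(fval, gval, Amat, gradf, mu):
    """Augmented Lagrangian merit: f - lam(x)'g + g'g / (2 mu),
    where lam(x) solves A(x)' lam = grad f(x) in the least-squares sense."""
    lam = np.linalg.lstsq(Amat.T, gradf, rcond=None)[0]
    return fval - lam @ gval + gval @ gval / (2.0 * mu)

# Assumed data, evaluated at x = (-1.5, -0.5).
x = np.array([-1.5, -0.5])
M = merit(x.sum(), np.array([x @ x - 2.0]),
          np.array([[2.0 * x[0], 2.0 * x[1]]]), np.array([1.0, 1.0]), mu=0.1)
```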
The SQP subproblem for a nonlinear program

The problem

(P)   minimize $f(x)$   subject to $g_i(x) \ge 0$, $i \in \mathcal{I}$,   $g_i(x) = 0$, $i \in \mathcal{E}$,   $x \in \mathbb{R}^n$,

where $f, g \in C^2$, $g : \mathbb{R}^n \to \mathbb{R}^m$, has, at a certain point $(x, \lambda)$, the SQP subproblem

(QP)   minimize $\tfrac{1}{2} p^T \nabla^2_{xx} L(x,\lambda)\,p + \nabla f(x)^T p$
       subject to $\nabla g_i(x)^T p \ge -g_i(x)$, $i \in \mathcal{I}$,   $\nabla g_i(x)^T p = -g_i(x)$, $i \in \mathcal{E}$,   $p \in \mathbb{R}^n$,

which has optimal solution $p$ and Lagrange multiplier vector $\lambda^+$.

An SQP iteration for a nonlinear optimization problem

Given $x$, $\lambda$ such that $\nabla^2_{xx} L(x,\lambda) \succ 0$, an SQP iteration for (P) takes the following form:
1. Compute the optimal solution $p$ and multiplier vector $\lambda^+$ of

   (QP)   minimize $\tfrac{1}{2} p^T \nabla^2_{xx} L(x,\lambda)\,p + \nabla f(x)^T p$
          subject to $\nabla g_i(x)^T p \ge -g_i(x)$, $i \in \mathcal{I}$,   $\nabla g_i(x)^T p = -g_i(x)$, $i \in \mathcal{E}$,   $p \in \mathbb{R}^n$.

2. $x \leftarrow x + p$, $\lambda \leftarrow \lambda^+$.

Note that $\lambda_i \ge 0$, $i \in \mathcal{I}$, is maintained since $\lambda^+$ are Lagrange multipliers of (QP).
Speed of convergence for the SQP method

Theorem. Assume that $f \in C^3$, $g \in C^3$ on $\mathbb{R}^n$ and that $x^*$ is a regular point that satisfies the second-order sufficient optimality conditions of (P) with strict complementarity. If the SQP method (with steplength one) is started at a point sufficiently close to $(x^*, \lambda^*)$, then it is well defined, has the same active constraints as (P) has at $x^*$, and converges to $(x^*, \lambda^*)$ with convergence rate at least two.

Proof. If the constraints that are not active at $x^*$ are ignored, the SQP method converges according to the discussion from the equality-constrained case. Hence, $p$ tends to zero quadratically. For constraint $i$,
$$g_i(x) + \nabla g_i(x)^T p \ge g_i(x) - \|\nabla g_i(x)\|\,\|p\|.$$
For $x$ sufficiently close to $x^*$, if $g_i(x^*) > 0$, then $g_i(x) > 0$, and for $p$ small enough,
$$g_i(x) + \nabla g_i(x)^T p \ge g_i(x) - \|\nabla g_i(x)\|\,\|p\| > 0.$$

SQP method for nonlinear optimization

We have discussed the ideal case. Comments:
- If $\nabla^2_{xx} L(x,\lambda) \not\succ 0$, we may replace $\nabla^2_{xx} L(x,\lambda)$ by $B$ in (QP), where $B$ is a symmetric approximation of $\nabla^2_{xx} L(x,\lambda)$ that satisfies $B \succ 0$. A quasi-Newton approximation $B$ of $\nabla^2_{xx} L(x,\lambda)$ may be used. (Example SQP quasi-Newton solver: SNOPT.)
- The QP subproblem may lack feasible solutions. This may be overcome by introducing elastic variables. This is not covered here.
- We have shown local convergence properties. To obtain convergence from an arbitrary initial point we may utilize a merit function and use a linesearch or trust-region strategy.
Example problem

Consider the small example problem

(P)   minimize $\tfrac{1}{2}(x_1 + 1)^2 + \tfrac{1}{2}(x_2 + 2)^2$
      subject to $-3(x_1 + x_2 - 2)^2 - (x_1 - x_2)^2 + 6 = 0$,   $x_1 \ge 0$, $x_2 \ge 0$,   $x \in \mathbb{R}^2$.

Optimal solution $x^* \approx (0.5767\ \ 0.043)^T$, $\lambda^* \approx 0.2185$.

Graphical illustration of example problem

[Figure: graphical illustration; optimal solution $x^* \approx (0.5767\ \ 0.043)^T$, $\lambda^* \approx 0.2185$.]
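The stated point can be checked against the first-order conditions numerically; a quick sketch, with the equality constraint written out explicitly and the multiplier recovered from stationarity $\nabla f(x^*) = \lambda^* \nabla g(x^*)$ (the bound constraints are inactive at this point, so their multipliers are zero):

```python
import numpy as np

x = np.array([0.5767, 0.043])                 # optimal point from the slide
grad_f = np.array([x[0] + 1.0, x[1] + 2.0])   # gradient of the objective
gval = -3.0 * (x[0] + x[1] - 2.0) ** 2 - (x[0] - x[1]) ** 2 + 6.0
grad_g = np.array([-6.0 * (x[0] + x[1] - 2.0) - 2.0 * (x[0] - x[1]),
                   -6.0 * (x[0] + x[1] - 2.0) + 2.0 * (x[0] - x[1])])
lam = grad_f[0] / grad_g[0]                   # stationarity, first component
# feasibility holds and the second component gives the same multiplier,
# to the precision of the printed point
```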