Constrained Nonlinear Optimization Algorithms

Department of Industrial Engineering and Management Sciences, Northwestern University
waechter@iems.northwestern.edu
Institute for Mathematics and its Applications, University of Minnesota
August 4, 2016

Constrained Nonlinear Optimization Problems

$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad c_E(x) = 0, \quad c_I(x) \geq 0 \tag{NLP}$$

where $f : \mathbb{R}^n \to \mathbb{R}$, $c_E : \mathbb{R}^n \to \mathbb{R}^{m_E}$, $c_I : \mathbb{R}^n \to \mathbb{R}^{m_I}$.

We assume that all functions are twice continuously differentiable. No convexity is required. Most algorithms for (NLP) have theoretical convergence guarantees only to stationary points; they include ingredients that steer the iterates towards local minimizers.

Table of Contents Applications Equality-Constrained Quadratic Programming Active-Set Quadratic Programming Solvers SQP for Equality-Constrained NLPs SQP for Inequality-Constrained NLPs Interior Point Methods Software

Design and Operation of a Chemical Plant

Objective: minimize costs / maximize profit.
Variables: physical quantities.
Constraints: physical relationships (conservation laws, thermodynamic relations) and limits (physical and operational).
$< 10^5$ variables; few degrees of freedom.
The constraint Jacobian $\nabla c(x)^T$ is sparse and structured.

Design Under Uncertainty

Scenario parameters (given, defining scenarios $l = 1, \ldots, L$): $F^l_{\text{in}}, x^l_{\text{in}}, T^l_{\text{env}}, p^l_{\text{env}}, \ldots$
Design variables: $V_{\text{react}}, D_{\text{dist}}, h_{\text{tank}}, \ldots$
Control variables: $Q^l_{\text{heat}}, r^l_{\text{refl}}, v^l_{\text{valve}}, \ldots$
State variables: $x^l_{\text{str}}, F^l_{\text{str}}, L^l_{\text{tank}}, T^l, p^l, \ldots$

The constraint Jacobian $\nabla c(x)^T$ has block structure (figure omitted).

Optimal Control / Dynamic Optimization

$$\min_{z,y,u,p} \; f(z(t_f), y(t_f), u(t_f), p)$$
$$\text{s.t.} \quad F(\dot{z}(t), z(t), y(t), u(t), p) = 0, \quad G(z(t), y(t), u(t), p) = 0, \quad z(0) = z_{\text{init}}, \quad \text{bound constraints}$$

Here $u : [0, t_f] \to \mathbb{R}^{n_u}$ are the control variables, $z : [0, t_f] \to \mathbb{R}^{n_z}$ the differentiable state variables, $y : [0, t_f] \to \mathbb{R}^{n_y}$ the algebraic state variables, $p \in \mathbb{R}^{n_p}$ time-independent parameters, $z_{\text{init}}$ the initial conditions, and $t_f$ the final time. Large-scale NLPs arise from discretization.

Circuit Tuning

The model consists of a network of gates. Gate delays are computed by simulation (expensive, noisy). The model has many variables (up to 1,000,000). Implemented in IBM's circuit tuning tool EinsTuner.

Hyperthermia Treatment Planning

$$\min_{T(x),u} \; \frac{1}{2}\int_\Omega (T(x) - T_{\text{target}}(x))^2 \, dx$$
$$\text{s.t.} \quad -\Delta T(x) - w(T(x) - T_{\text{blood}}) = u^* M(x) u \quad \text{in } \Omega,$$
$$\frac{\partial T(x)}{\partial n} = T_{\text{exterior}} - T(x) \quad \text{on } \partial\Omega,$$
$$T(x) \leq T_{\text{max}} \quad \text{in } \Omega \setminus \Omega_{\text{Tumor}}$$

Heat tumors with microwaves (to support chemo- and radio-therapy). The model is a PDE with controls (the application $u$ of the microwave antennas) and states (the temperature $T(x)$ defined over the domain $\Omega$). A finite-dimensional problem is obtained by discretization, e.g., finite differences or finite elements. The resulting NLP is usually very large.

Table of Contents Applications Equality-Constrained Quadratic Programming Active-Set Quadratic Programming Solvers SQP for Equality-Constrained NLPs SQP for Inequality-Constrained NLPs Interior Point Methods Software

Quadratic Programming

$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2}x^T Q x + g^T x \quad \text{s.t.} \quad A_E x + b_E = 0, \quad A_I x + b_I \geq 0 \tag{QP}$$

with $Q \in \mathbb{R}^{n \times n}$ symmetric, $A_E \in \mathbb{R}^{m_E \times n}$, $b_E \in \mathbb{R}^{m_E}$, $A_I \in \mathbb{R}^{m_I \times n}$, $b_I \in \mathbb{R}^{m_I}$.

Many applications (e.g., portfolio optimization, optimal control).
Important building block for methods for general NLP.
Algorithms: active-set methods and interior-point methods.
Let's first consider the equality-constrained case. Assume: all rows of $A_E$ are linearly independent.

Equality-Constrained QP

$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2}x^T Q x + g^T x \quad \text{s.t.} \quad Ax + b = 0 \tag{EQP}$$

First-order optimality conditions:
$$Qx + g + A^T\lambda = 0, \qquad Ax + b = 0$$

Find a stationary point $(x^*, \lambda^*)$ by solving the linear system
$$\begin{bmatrix} Q & A^T \\ A & 0 \end{bmatrix} \begin{pmatrix} x^* \\ \lambda^* \end{pmatrix} = -\begin{pmatrix} g \\ b \end{pmatrix}.$$
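To make the linear-algebra recipe concrete, here is a minimal numerical sketch (Python/NumPy, with made-up example data for Q, g, A, b) of solving (EQP) by forming and solving the KKT system directly:

```python
import numpy as np

# Example data (assumptions for illustration): Q symmetric positive definite,
# one equality constraint x1 + x2 - 1 = 0.
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([-1.0, -2.0])
A = np.array([[1.0, 1.0]])
b = np.array([-1.0])

n, m = Q.shape[0], A.shape[0]
K = np.block([[Q, A.T], [A, np.zeros((m, m))]])        # KKT matrix
sol = np.linalg.solve(K, -np.concatenate([g, b]))      # K [x; lam] = -[g; b]
x_star, lam_star = sol[:n], sol[n:]

# Check the first-order optimality conditions:
assert np.allclose(Q @ x_star + g + A.T @ lam_star, 0)   # stationarity
assert np.allclose(A @ x_star + b, 0)                    # feasibility
```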

KKT System of QP

$$\begin{bmatrix} Q & A^T \\ A & 0 \end{bmatrix} \begin{pmatrix} x^* \\ \lambda^* \end{pmatrix} = -\begin{pmatrix} g \\ b \end{pmatrix}$$

When is $(x^*, \lambda^*)$ indeed a solution of (EQP)? Recall the second-order optimality conditions: let the columns of $Z \in \mathbb{R}^{n \times (n-m)}$ be a basis of the null space of A, so that $AZ = 0$ ("null-space matrix"). Then $x^*$ is a strict local minimizer of (EQP) if $Z^T Q Z \succ 0$.

On the other hand: if $Z^T Q Z$ has a negative eigenvalue, then (EQP) is unbounded below.

There are different ways to solve the KKT system; the best choice depends on the particular problem.

Direct Solution of the KKT System

$$\underbrace{\begin{bmatrix} Q & A^T \\ A & 0 \end{bmatrix}}_{=:K} \begin{pmatrix} x^* \\ \lambda^* \end{pmatrix} = -\begin{pmatrix} g \\ b \end{pmatrix}$$

How can we verify that $x^*$ is a local minimizer without computing Z?

Definition. Let $n_+$, $n_-$, $n_0$ be the number of positive, negative, and zero eigenvalues of a matrix M. Then $\text{In}(M) = (n_+, n_-, n_0)$ is the inertia of M.

Theorem. Suppose that A has full rank. Then $\text{In}(K) = \text{In}(Z^T Q Z) + (m, m, 0)$.

Corollary. If $\text{In}(K) = (n, m, 0)$, then $x^*$ is the unique global minimizer.

Computing the Inertia

Symmetric indefinite factorization:
$$P K P^T = L B L^T$$
P: permutation matrix; L: unit lower triangular matrix; B: block-diagonal matrix with $1 \times 1$ and $2 \times 2$ diagonal blocks.

Can be computed efficiently; exploits sparsity.
The inertia is obtained simply by counting the eigenvalues of the blocks in B.
The factorization is also used to solve the linear system.
This will be important later, when we need to convexify QPs ($Q \leftarrow Q + \gamma I$).
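As a sketch of this procedure, the following snippet uses scipy.linalg.ldl (one available symmetric indefinite factorization) and counts the eigenvalues of the 1×1 and 2×2 blocks of B to obtain the inertia; the tolerance is an illustrative assumption:

```python
import numpy as np
from scipy.linalg import ldl

def inertia(K, tol=1e-12):
    """Inertia (n+, n-, n0) of symmetric K from an LBL^T factorization."""
    _, D, _ = ldl(K)                  # P K P^T = L D L^T, D block diagonal
    n_pos = n_neg = n_zero = 0
    i, n = 0, D.shape[0]
    while i < n:
        if i + 1 < n and abs(D[i + 1, i]) > tol:        # 2x2 block
            ev = np.linalg.eigvalsh(D[i:i + 2, i:i + 2])
            pos, neg = int(np.sum(ev > tol)), int(np.sum(ev < -tol))
            n_pos += pos; n_neg += neg; n_zero += 2 - pos - neg
            i += 2
        else:                                           # 1x1 block
            d = D[i, i]
            n_pos += int(d > tol); n_neg += int(d < -tol)
            n_zero += int(abs(d) <= tol)
            i += 1
    return n_pos, n_neg, n_zero
```

By the corollary above, inertia(K) == (n, m, 0) certifies that $x^*$ is the unique global minimizer of (EQP).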

Schur-Complement Method

$$Qx + g + A^T\lambda = 0, \qquad Ax + b = 0$$

Assume Q is positive definite. Then $AQ^{-1}A^T$ is nonsingular. Pre-multiply the first equation by $AQ^{-1}$ and use the second to obtain
$$[AQ^{-1}A^T]\,\lambda^* = b - AQ^{-1}g,$$
then recover $x^*$ from
$$Qx^* = -g - A^T\lambda^*.$$

Requirements:
Solves with Q can be done efficiently.
Need to compute $AQ^{-1}A^T$ and solve a linear system with it.
Works best if m is small.
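A small sketch of the Schur-complement method, assuming Q is positive definite so that a Cholesky factorization can be reused for all solves with Q:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def eqp_schur(Q, g, A, b):
    """Solve (EQP) via the Schur complement; Q must be positive definite."""
    cQ = cho_factor(Q)                   # factorize Q once, reuse below
    S = A @ cho_solve(cQ, A.T)           # Schur complement A Q^{-1} A^T (m x m)
    lam = np.linalg.solve(S, b - A @ cho_solve(cQ, g))
    x = -cho_solve(cQ, g + A.T @ lam)    # Q x = -g - A^T lam
    return x, lam
```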

Step Decomposition

[Figure: decomposition of the solution $x^*$ into components $x_Y$ and $x_Z$.]

Decompose $x^* = x_Y + x_Z$ into two steps:
range-space step $x_Y$: step onto the constraints;
null-space step $x_Z$: optimize within the null space.
Write $x_Y = Y p_Y$ and $x_Z = Z p_Z$, where $[Y\; Z]$ is a basis of $\mathbb{R}^n$ and $Z$ is a null-space matrix for $A$.
The decomposition depends on the choice of Y and Z.

Step Decomposition (continued)

$$Qx + g + A^T\lambda = 0, \qquad Ax + b = 0$$

$x_Y = Y p_Y$ is a step onto the constraints:
$$0 = Ax^* + b = AY p_Y + AZ p_Z + b = AY p_Y + b \quad\Longrightarrow\quad p_Y = -[AY]^{-1} b$$
$x_Z = Z p_Z$ optimizes in the null space:
$$p_Z = -[Z^T Q Z]^{-1} Z^T(g + QY p_Y)$$
This solves $\min_{p_Z} \frac{1}{2}p_Z^T [Z^T Q Z] p_Z + (g + QY p_Y)^T Z p_Z$ (the "reduced QP").
Finally, the multipliers are
$$\lambda^* = -[AY]^{-T} Y^T(Qx^* + g).$$
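The following sketch implements the null-space method with one common (assumed) choice of basis: an orthonormal $[Y\;Z]$ from a full QR factorization of $A^T$, so that $AZ = 0$:

```python
import numpy as np
from scipy.linalg import qr

def eqp_nullspace(Q, g, A, b):
    """Solve (EQP) by step decomposition; A must have full row rank."""
    m, n = A.shape
    Qfull, _ = qr(A.T)                    # orthonormal basis of R^n
    Y, Z = Qfull[:, :m], Qfull[:, m:]     # range(Y) = range(A^T), A Z = 0
    p_Y = np.linalg.solve(A @ Y, -b)      # range-space step
    p_Z = np.linalg.solve(Z.T @ Q @ Z,    # null-space (reduced QP) step
                          -Z.T @ (g + Q @ (Y @ p_Y)))
    x = Y @ p_Y + Z @ p_Z
    lam = np.linalg.solve((A @ Y).T, -Y.T @ (Q @ x + g))
    return x, lam
```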

Example: PDE-Constrained Optimization

$$\min_{T,u} \; \frac{1}{2}\int_\Omega (T(z) - \hat{T}(z))^2 \, dz + \frac{\alpha}{2}\|u\|^2 \quad \text{s.t.} \quad -\Delta T(z) = \sum_{i=1}^{n_u} k_i(z)\, u_i \;\text{ on } \Omega, \qquad T(z) = b(z) \;\text{ on } \partial\Omega$$

Given the (independent) control variables u:
the (dependent) state T is the solution of the PDE;
well-established solution techniques can be used;
we have only $n_u$ degrees of freedom.

Discretized PDE-Constrained Problem

$$\min_{t,u} \; \sum_{i=1}^{N} (t_i - \hat{t}_i)^2 + \alpha \sum_{i=1}^{n_u} u_i^2 \quad \text{s.t.} \quad D\,t + Ku + b = 0$$

Discretized state variables $t \in \mathbb{R}^N$; discretized non-singular(!) differential operator $D \in \mathbb{R}^{N \times N}$.
Given the controls u, the state variables can be computed from $t = -D^{-1}(Ku + b)$.
We could just eliminate t and solve a lower-dimensional problem in u.

Generalization: Elimination of Variables

$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2}\begin{pmatrix} x_B \\ x_N \end{pmatrix}^T Q \begin{pmatrix} x_B \\ x_N \end{pmatrix} + g_B^T x_B + g_N^T x_N \quad \text{s.t.} \quad B x_B + N x_N + b = 0$$

Choose
$$Y = \begin{bmatrix} I \\ 0 \end{bmatrix}, \qquad Z = \begin{bmatrix} -B^{-1}N \\ I \end{bmatrix}.$$
Then
$$p_Y = -[AY]^{-1} b = -B^{-1} b, \qquad p_Z = -[Z^T Q Z]^{-1} Z^T(g + QY p_Y), \qquad \lambda^* = -[AY]^{-T} Y^T(Qx^* + g) = -B^{-T} Y^T(Qx^* + g).$$

We can use existing implementations of the operator $B^{-1}$:
to compute Z and $p_Z$ (assuming N has few columns);
to compute $\lambda^*$ (assuming an implementation of $B^{-T}$ is also available).
Tailored implementations from simulation often already exist. Exploit problem structure!
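A sketch of this elimination, with an LU factorization of B standing in for the "existing implementation" of $B^{-1}$ and $B^{-T}$ (function and variable names are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def eqp_eliminate(Q, g_B, g_N, B, N, b):
    """Solve the partitioned (EQP) by eliminating x_B = -B^{-1}(N x_N + b)."""
    luB = lu_factor(B)                               # stands in for a tailored solver
    n_B, n_N = B.shape[0], N.shape[1]
    Z = np.vstack([-lu_solve(luB, N), np.eye(n_N)])  # Z = [-B^{-1} N; I]
    p_Y = np.concatenate([-lu_solve(luB, b), np.zeros(n_N)])   # Y p_Y = [-B^{-1} b; 0]
    g = np.concatenate([g_B, g_N])
    p_Z = np.linalg.solve(Z.T @ Q @ Z, -Z.T @ (g + Q @ p_Y))
    x = p_Y + Z @ p_Z
    lam = -lu_solve(luB, (Q @ x + g)[:n_B], trans=1)  # lam = -B^{-T} Y^T (Q x + g)
    return x, lam
```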

Solution of EQP: Summary

Direct method:
Factorize the KKT matrix.
If the $LBL^T$ factorization is used, we can determine whether $x^*$ is indeed a minimizer.
Easy general-purpose option.

Schur-complement method:
Requires that Q is positive definite and easy to solve with (e.g., diagonal).
The number of constraints m should not be large.

Null-space method:
Step decomposition into range-space step and null-space step.
Permits exploitation of constraint matrix structure.
The number of degrees of freedom $(n - m)$ should not be large.

Table of Contents Applications Equality-Constrained Quadratic Programming Active-Set Quadratic Programming Solvers SQP for Equality-Constrained NLPs SQP for Inequality-Constrained NLPs Interior Point Methods Software

Inequality-Constrained QPs

$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2}x^T Q x + g^T x \quad \text{s.t.} \quad a_i^T x + b_i = 0 \;(i \in E), \quad a_i^T x + b_i \geq 0 \;(i \in I)$$

Assume: Q is positive definite; $\{a_i\}_{i \in E}$ are linearly independent.

Optimality conditions:
$$Qx + g - \sum_{i \in E \cup I} a_i\lambda_i = 0$$
$$a_i^T x + b_i = 0 \;(i \in E), \qquad a_i^T x + b_i \geq 0 \;(i \in I)$$
$$\lambda_i \geq 0 \;(i \in I), \qquad (a_i^T x + b_i)\lambda_i = 0 \;(i \in I)$$

Difficulty: decide which inequality constraints are active. We know how to solve equality-constrained QPs. Can we use that here?

Working Set

Choose a working set $W \subseteq I$ and solve
$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2}x^T Q x + g^T x \quad \text{s.t.} \quad a_i^T x + b_i = 0 \;\text{ for } i \in E \cup W$$
with optimality conditions
$$Qx + g - \sum_{i \in E \cup W} a_i\lambda_i = 0, \qquad a_i^T x + b_i = 0 \;\text{ for } i \in E \cup W.$$

Set the missing multipliers $\lambda_i = 0$ for $i \in I \setminus W$ and verify
$$a_i^T x + b_i \overset{?}{\geq} 0 \;\text{ for } i \in I \setminus W, \qquad \lambda_i \overset{?}{\geq} 0 \;\text{ for } i \in I.$$

If satisfied, $(x, \lambda)$ is the (unique) optimal solution. Otherwise, let's try a different working set.

Example QP

$$\min \; (x_1 - 1)^2 + (x_2 - 2.5)^2$$
$$\text{s.t.} \quad x_1 - 2x_2 + 2 \geq 0 \;(1), \quad -x_1 - 2x_2 + 6 \geq 0 \;(2), \quad -x_1 + 2x_2 + 2 \geq 0 \;(3), \quad x_1 \geq 0 \;(4), \quad x_2 \geq 0 \;(5)$$

[Figure: feasible region bounded by constraints (1)-(5).]

Primal Active-Set QP Solver, Iteration 1

$W = \{3, 5\}$, $x = (2, 0)$.

Initialization: choose a feasible starting iterate x and a working set $W \subseteq I$ such that $i \in W \Rightarrow a_i^T x + b_i = 0$ and $\{a_i\}_{i \in E \cup W}$ are linearly independent. (The algorithm will maintain these properties.)

Solve (EQP): $x_{EQP} = (2, 0)$, $\lambda_3 = -2$, $\lambda_5 = -1$.
Status: the current iterate is optimal for (EQP).
Release constraint: pick a constraint i with $\lambda_i < 0$ and remove it from the working set: $W \leftarrow W \setminus \{3\} = \{5\}$. Keep the iterate $x = (2, 0)$.

Primal Active-Set QP Solver, Iteration 2

$W = \{5\}$, $x = (2, 0)$.

Solve (EQP): $x_{EQP} = (1, 0)$, $\lambda_5 = -5$.
Status: the current iterate is not optimal for (EQP).
Take step ($x_{EQP}$ is feasible): update the iterate $x \leftarrow x_{EQP}$. Keep W.

Primal Active-Set QP Solver, Iteration 3

$W = \{5\}$, $x = (1, 0)$.

Solve (EQP): $x_{EQP} = (1, 0)$, $\lambda_5 = -5$.
Status: the current iterate is optimal for (EQP).
Release constraint: pick a constraint i with $\lambda_i < 0$ and remove it from the working set: $W \leftarrow W \setminus \{5\} = \emptyset$. Keep the iterate $x = (1, 0)$.

Primal Active-Set QP Solver, Iteration 4

$W = \emptyset$, $x = (1, 0)$.

Solve (EQP): $x_{EQP} = (1, 2.5)$.
Status: the current iterate is not optimal for (EQP).
Take step ($x_{EQP}$ is not feasible): find the largest $\alpha \in [0, 1]$ such that $x + \alpha(x_{EQP} - x)$ is feasible and update the iterate $x \leftarrow x + \alpha(x_{EQP} - x) = (1, 1.5)$. Update $W \leftarrow W \cup \{1\} = \{1\}$, where constraint $i = 1$ is blocking.

Primal Active-Set QP Solver, Iteration 5

$W = \{1\}$, $x = (1, 1.5)$.

Solve (EQP): $x_{EQP} = (1.4, 1.7)$, $\lambda_1 = 0.8$.
Status: the current iterate is not optimal for (EQP).
Take step ($x_{EQP}$ is feasible): update the iterate $x \leftarrow x_{EQP}$. Keep W.

Primal Active-Set QP Solver, Iteration 6

$W = \{1\}$, $x = (1.4, 1.7)$.

Solve (EQP): $x_{EQP} = (1.4, 1.7)$, $\lambda_1 = 0.8$.
Status: the current iterate is optimal for (EQP), and $\lambda_i \geq 0$ for all $i \in W$. Declare optimality!

Primal Active-Set QP Method

1. Select a feasible x and $W \subseteq I \cap A(x)$.
2. Solve (EQP) to get $x_{EQP}$ and $\lambda_{EQP}$.
3. If $x = x_{EQP}$:
   If $\lambda_{EQP} \geq 0$: Done!
   Otherwise, select $\lambda_i^{EQP} < 0$ and set $W \leftarrow W \setminus \{i\}$.
4. If $x \neq x_{EQP}$:
   Compute the step $p = x_{EQP} - x$.
   Compute $\alpha = \max\{\alpha \in [0, 1] : x + \alpha p \text{ is feasible}\}$.
   If $\alpha < 1$, pick $i \in I \setminus W$ with $a_i^T p < 0$ and $a_i^T(x + \alpha p) + b_i = 0$, and set $W \leftarrow W \cup \{i\}$.
   Update $x \leftarrow x + \alpha p$.
5. Go to step 2.

A code sketch of this method follows below.
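The method translates almost line by line into code. Below is a minimal, dense Python sketch (no degeneracy handling; Q positive definite and a feasible start are assumed), followed by the example QP from the earlier slides. The multiplier sign matches the optimality conditions above ($\lambda = -\mu$ from the symmetric KKT solve):

```python
import numpy as np

def solve_eqp(Q, g, A, b):
    """min 1/2 x'Qx + g'x  s.t.  Ax + b = 0, via the symmetric KKT system."""
    n, m = Q.shape[0], A.shape[0]
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, -np.concatenate([g, b]))
    # The solve yields mu with Qx + g + A'mu = 0; lambda = -mu matches the
    # convention above (lambda_i >= 0 at optimality for active inequalities).
    return sol[:n], -sol[n:]

def active_set_qp(Q, g, a, b, x, W, max_iter=100):
    """Constraints a[i] @ x + b[i] >= 0; x feasible, W initial working set."""
    for _ in range(max_iter):
        x_eqp, lam = solve_eqp(Q, g, a[W], b[W])
        if np.allclose(x, x_eqp):
            if np.all(lam >= 0):
                return x, W                      # optimal
            j = W[int(np.argmin(lam))]           # release most negative multiplier
            W = [i for i in W if i != j]
        else:
            p = x_eqp - x
            alpha, block = 1.0, None
            for i in range(len(b)):              # ratio test over i not in W
                if i not in W and a[i] @ p < 0:
                    step = -(a[i] @ x + b[i]) / (a[i] @ p)
                    if step < alpha:
                        alpha, block = step, i
            x = x + alpha * p
            if block is not None:
                W = W + [block]                  # add blocking constraint
    return x, W

# The example QP above: start at x = (2, 0) with W = {3, 5} (0-based: [2, 4]).
Q = 2 * np.eye(2); g = np.array([-2.0, -5.0])
a = np.array([[1.0, -2.0], [-1.0, -2.0], [-1.0, 2.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, 6.0, 2.0, 0.0, 0.0])
x, W = active_set_qp(Q, g, a, b, np.array([2.0, 0.0]), [2, 4])
print(x, W)   # approximately (1.4, 1.7) with working set {1} (0-based: [0])
```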

Active-Set QP Algorithms

Primal active-set method:
Keeps all iterates feasible.
Changes W by at most one constraint per iteration.
$\{a_i\}_{i \in E \cup W}$ remain linearly independent.

Convergence:
Finite convergence: finitely many options for W; the objective decreases with every step, as long as $\alpha > 0$!
Special handling of degeneracy is necessary ($\alpha = 0$ steps).

Efficient solution of (EQP):
Update the factorization when W changes.

Complexity:
Fast convergence if a good estimate of the optimal working set is given.
Worst-case exponential complexity.
Alternative: interior-point QP solvers (polynomial complexity).

There are variants that allow Q to be indefinite.

Table of Contents Applications Equality-Constrained Quadratic Programming Active-Set Quadratic Programming Solvers SQP for Equality-Constrained NLPs SQP for Inequality-Constrained NLPs Interior Point Methods Software

Equality-Constrained Nonlinear Problems

$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad c(x) = 0$$

First-order optimality conditions:
$$\nabla f(x) + \nabla c(x)\lambda = 0, \qquad c(x) = 0$$

This is a system of nonlinear equations in $(x, \lambda) \in \mathbb{R}^n \times \mathbb{R}^m$. Apply Newton's method: fast local convergence!

Issues:
This guarantees only (fast) local convergence.
We would like to find local minima and not just any kind of stationary point.

Need:
A globalization scheme (for convergence from any starting point).
Mechanisms that encourage convergence to local minima.

Newton's Method

$$\nabla f(x) + \nabla c(x)\lambda = 0, \qquad c(x) = 0$$

At an iterate $(x_k, \lambda_k)$, compute the step $(p_k, p_k^\lambda)$ from
$$\begin{bmatrix} H_k & \nabla c_k \\ \nabla c_k^T & 0 \end{bmatrix} \begin{pmatrix} p_k \\ p_k^\lambda \end{pmatrix} = -\begin{pmatrix} \nabla f_k + \nabla c_k\lambda_k \\ c_k \end{pmatrix}$$
where $\nabla f_k := \nabla f(x_k)$, $c_k := c(x_k)$, $\nabla c_k := \nabla c(x_k)$, and
$$L(x, \lambda) := f(x) + \sum_{j=1}^m c_j(x)\lambda_j, \qquad H_k := \nabla^2_{xx} L(x_k, \lambda_k).$$

Update the iterate: $(x_{k+1}, \lambda_{k+1}) = (x_k, \lambda_k) + (p_k, p_k^\lambda)$.

Sequential Quadratic Programming

Substituting $\lambda_{k+1} = \lambda_k + p_k^\lambda$, the Newton system becomes
$$\begin{bmatrix} H_k & \nabla c_k \\ \nabla c_k^T & 0 \end{bmatrix} \begin{pmatrix} p_k \\ \lambda_{k+1} \end{pmatrix} = -\begin{pmatrix} \nabla f_k \\ c_k \end{pmatrix}$$

These are the optimality conditions of
$$\min_{p \in \mathbb{R}^n} \; \frac{1}{2}p^T H_k p + \nabla f_k^T p + f_k \quad \text{s.t.} \quad \nabla c_k^T p + c_k = 0$$
with multipliers $\lambda_{k+1}$.

The Newton step can be interpreted as the solution of a local QP model!

Local QP Model

[Figure: iterate $x_k$, step p, and the constraint manifold $c(x) = 0$ with solution $x^*$.]

Original problem:
$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad c(x) = 0$$
Local QP model:
$$\min_{p \in \mathbb{R}^n} \; f_k + \nabla f_k^T p + \frac{1}{2}p^T H_k p \quad \text{s.t.} \quad c_k + \nabla c_k^T p = 0 \tag{QP$_k$}$$

Exact Penalty Function

We need a tool to facilitate convergence from any starting point. Here, we have two (usually competing) goals:
Optimality: $\min f(x)$.
Feasibility: $\min \|c(x)\|$.
These are combined in the (non-differentiable) exact penalty function
$$\phi_\rho(x) = f(x) + \rho\|c(x)\|_1 \qquad (\rho > 0).$$

Lemma. Suppose $x^*$ is a local minimizer of (NLP) with multipliers $\lambda^*$ and LICQ holds. Then $x^*$ is a local minimizer of $\phi_\rho$ if $\rho > \|\lambda^*\|_\infty$.

We can use decrease in $\phi_\rho$ as a measure of progress towards a local minimizer of (NLP).

Line Search

$$\phi_\rho(x) = f(x) + \rho\|c(x)\|_1$$

Backtracking line search: try $\alpha_k \in \{1, \frac{1}{2}, \frac{1}{4}, \ldots\}$ until
$$\phi_\rho(x_k + \alpha_k p_k) \leq \phi_\rho(x_k) + \eta\,\alpha_k\, D\phi_\rho(x_k; p_k) \qquad (\eta \in (0, 1))$$
where $D\phi_\rho(x_k; p_k)$ is the directional derivative of $\phi_\rho$ at $x_k$ in direction $p_k$.

Lemma. Let $p_k$ be an optimal solution of (QP$_k$). Then
$$D\phi_\rho(x_k; p_k) \leq -p_k^T H_k p_k - (\rho - \|\lambda_{k+1}\|_\infty)\|c_k\|_1.$$

So $p_k$ is a descent direction for $\phi_\rho$ if $H_k \succ 0$ and $\rho > \|\lambda_{k+1}\|_\infty$.

Basic SQP Algorithm

1. Choose $x_1$, $\lambda_1$, $\rho_0 > 0$. Set $k \leftarrow 1$.
2. Solve (QP$_k$) to get $p_k$ and the QP multipliers $\hat\lambda_{k+1}$.
3. Update the penalty parameter ($\beta > 0$):
$$\rho_k = \begin{cases} \rho_{k-1} & \text{if } \rho_{k-1} \geq \|\hat\lambda_{k+1}\|_\infty + \beta \\ \|\hat\lambda_{k+1}\|_\infty + 2\beta & \text{otherwise.} \end{cases}$$
4. Perform a backtracking line search: find the largest $\alpha_k \in \{1, \frac{1}{2}, \frac{1}{4}, \ldots\}$ with
$$\phi_{\rho_k}(x_k + \alpha_k p_k) \leq \phi_{\rho_k}(x_k) + \eta\,\alpha_k\, D\phi_{\rho_k}(x_k; p_k).$$
5. Update the iterate: $x_{k+1} = x_k + \alpha_k p_k$ and $\lambda_{k+1} = \hat\lambda_{k+1}$.
6. Set $k \leftarrow k + 1$ and go to step 2.
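A compact Python sketch of this algorithm for equality-constrained problems, with caller-supplied derivatives. It assumes each $H_k$ is positive definite on the constraint null space so the QP step is a descent direction (otherwise the regularization discussed below is needed); no further safeguards are included:

```python
import numpy as np

def basic_sqp(f, grad_f, c, jac_c, hess_L, x, lam,
              rho=1.0, beta=0.1, eta=1e-4, tol=1e-8, max_iter=100):
    """jac_c returns the m x n Jacobian with rows (grad c_j)^T."""
    for _ in range(max_iter):
        g, J, ck = grad_f(x), jac_c(x), c(x)
        if np.linalg.norm(np.concatenate([g + J.T @ lam, ck])) < tol:
            return x, lam                           # KKT point found
        H, m = hess_L(x, lam), ck.size
        K = np.block([[H, J.T], [J, np.zeros((m, m))]])
        sol = np.linalg.solve(K, -np.concatenate([g, ck]))
        p, lam_qp = sol[:x.size], sol[x.size:]      # step and QP multipliers
        lam_inf = np.linalg.norm(lam_qp, np.inf)
        if rho < lam_inf + beta:                    # penalty update (step 3)
            rho = lam_inf + 2 * beta
        phi = lambda z: f(z) + rho * np.linalg.norm(c(z), 1)
        # For the QP step, grad(c_k)^T p = -c_k, so the directional
        # derivative is D phi_rho(x; p) = grad(f)^T p - rho ||c_k||_1.
        D = g @ p - rho * np.linalg.norm(ck, 1)
        alpha = 1.0
        while phi(x + alpha * p) > phi(x) + eta * alpha * D:
            alpha *= 0.5                            # backtracking (step 4)
        x, lam = x + alpha * p, lam_qp              # step 5
    return x, lam
```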

Convergence Result for the Basic SQP Algorithm

Assumptions:
f and c are twice continuously differentiable.
The matrices $H_k$ are bounded and uniformly positive definite.
The smallest singular value of $\nabla c_k$ is uniformly bounded away from zero.

Theorem. Under these assumptions,
$$\lim_{k \to \infty} \left\|\begin{pmatrix} \nabla f_k + \nabla c_k\lambda_{k+1} \\ c_k \end{pmatrix}\right\| = 0.$$

In other words, each limit point of $\{x_k\}$ is a stationary point for (NLP).

Choice of Hessian $H_k$

$$\min_{p \in \mathbb{R}^n} \; \frac{1}{2}p^T H_k p + \nabla f_k^T p \quad \text{s.t.} \quad \nabla c_k^T p + c_k = 0 \tag{QP$_k$}$$

For fast local convergence, we want to choose $H_k = \nabla^2_{xx} L_k$.
$\nabla^2_{xx} L_k$ is positive definite if f and c are convex.
In general, $\nabla^2_{xx} L_k$ might be indefinite:
(QP$_k$) might be unbounded;
$p_k$ might not be a descent direction for $\phi_\rho$.
We would like $Z_k^T H_k Z_k \succ 0$ ($Z_k$ a null-space matrix for $\nabla c_k^T$).
Option: $H_k$ = BFGS approximation of $\nabla^2_{xx} L(x_k, \lambda_k)$.
Potentially slow local convergence, since $\nabla^2_{xx} L(x^*, \lambda^*)$ may be indefinite.

Regularization

Recall the optimality conditions. Choose $\gamma \geq 0$ and solve the regularized problem
$$\min_{p \in \mathbb{R}^n} \; \frac{1}{2}p^T (H_k + \gamma I) p + \nabla f_k^T p \quad \text{s.t.} \quad \nabla c_k^T p + c_k = 0$$
$$\underbrace{\begin{bmatrix} H_k + \gamma I & \nabla c_k \\ \nabla c_k^T & 0 \end{bmatrix}}_{=:K_k} \begin{pmatrix} p_k \\ \lambda_{k+1} \end{pmatrix} = -\begin{pmatrix} \nabla f_k \\ c_k \end{pmatrix}$$

Choose $\gamma \geq 0$ so that $K_k$ has inertia $(n, m, 0)$, e.g., by trial and error, computing the inertia via the factorization. Then $Z_k^T (H_k + \gamma I) Z_k$ is positive definite.
No regularization is necessary close to a second-order sufficient solution.
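A sketch of the trial-and-error loop, here using a dense eigenvalue computation to read off the inertia (in practice the $LBL^T$ factorization sketched earlier would be used; the initial value and growth factor for $\gamma$ are illustrative assumptions):

```python
import numpy as np

def regularize_hessian(H, J, gamma0=1e-4, tol=1e-10, max_tries=30):
    """Return gamma >= 0 such that the KKT matrix has inertia (n, m, 0)."""
    n, m = H.shape[0], J.shape[0]
    gamma = 0.0
    for _ in range(max_tries):
        K = np.block([[H + gamma * np.eye(n), J.T], [J, np.zeros((m, m))]])
        ev = np.linalg.eigvalsh(K)
        if int(np.sum(ev > tol)) == n and int(np.sum(ev < -tol)) == m:
            return gamma
        gamma = gamma0 if gamma == 0.0 else 10.0 * gamma   # trial and error
    raise RuntimeError("inertia correction failed")
```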

Step Decomposition Revisited

$$\min_{p \in \mathbb{R}^n} \; \frac{1}{2}p^T H_k p + \nabla f_k^T p \quad \text{s.t.} \quad \nabla c_k^T p + c_k = 0 \tag{QP$_k$}$$

Decomposition $p_k = Y_k p_{Y,k} + Z_k p_{Z,k}$.
Range-space step: $p_{Y,k} = -[\nabla c_k^T Y_k]^{-1} c_k$.
Reduced-space QP:
$$\min_{p_Z} \; \frac{1}{2}p_Z^T [Z_k^T H_k Z_k] p_Z + (\nabla f_k + H_k Y_k p_{Y,k})^T Z_k p_Z$$
Make sure $Z_k^T H_k Z_k$ is positive definite (e.g., BFGS).

Trust-Region SQP Method

$$\min_{p \in \mathbb{R}^n} \; \frac{1}{2}p^T H_k p + \nabla f_k^T p \quad \text{s.t.} \quad \nabla c_k^T p + c_k = 0, \quad \|p\| \leq \Delta_k \tag{QP$_k$}$$

Trust-region radius $\Delta_k$, updated throughout the iterations.
No positive-definiteness requirements for $H_k$.
The step $p_k$ is accepted if $\frac{\text{ared}_k}{\text{pred}_k} \geq \eta$, with ($\eta \in (0, 1)$)
$$\text{pred}_k = m_k(0) - m_k(p_k), \qquad \text{ared}_k = \phi_\rho(x_k) - \phi_\rho(x_k + p_k)$$
where $m_k$ is a piecewise-quadratic model of $\phi_\rho(x) = f(x) + \rho\|c(x)\|_1$:
$$m_k(p) = f_k + \nabla f_k^T p + \frac{1}{2}p^T H_k p + \rho\|c_k + \nabla c_k^T p\|_1$$
Otherwise, decrease $\Delta_k$.
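A small sketch of the acceptance test and radius update; phi_fn and model_fn stand in for $\phi_\rho$ and $m_k$, and the update constants are illustrative assumptions:

```python
import numpy as np

def tr_step(phi_fn, model_fn, x, p, delta, eta=0.1):
    """Accept or reject a trust-region step via the ared/pred ratio."""
    pred = model_fn(np.zeros_like(p)) - model_fn(p)   # predicted reduction
    ared = phi_fn(x) - phi_fn(x + p)                  # actual reduction
    if pred > 0 and ared / pred >= eta:
        return x + p, max(delta, 2.0 * np.linalg.norm(p))   # accept, maybe enlarge
    return x, 0.5 * delta                                   # reject, shrink
```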

Inconsistent QPs

[Figure: infeasible iterate $x_k$, trust region, and the constraint manifold $c(x) = 0$ with solution $x^*$.]

If $x_k$ is not feasible and $\Delta_k$ is small, (QP$_k$) might not be feasible. One remedy: penalize the constraint violation:
$$\min_{p \in \mathbb{R}^n;\, s,t \in \mathbb{R}^m} \; \frac{1}{2}p^T H_k p + \nabla f_k^T p + \rho\sum_{j=1}^m (s_j + t_j)$$
$$\text{s.t.} \quad \nabla c_k^T p + c_k = s - t, \quad \|p\| \leq \Delta_k, \quad s, t \geq 0$$

Fletcher's Sl$_1$QP

The penalized subproblem
$$\min_{p \in \mathbb{R}^n;\, s,t \in \mathbb{R}^m} \; \frac{1}{2}p^T H_k p + \nabla f_k^T p + \rho\sum_{j=1}^m (s_j + t_j) \quad \text{s.t.} \quad \nabla c_k^T p + c_k = s - t, \quad \|p\| \leq \Delta_k, \quad s, t \geq 0$$
is equivalent to
$$\min_{p \in \mathbb{R}^n} \; m_k(p) = f_k + \nabla f_k^T p + \frac{1}{2}p^T H_k p + \rho\|c_k + \nabla c_k^T p\|_1 \quad \text{s.t.} \quad \|p\| \leq \Delta_k.$$

A natural algorithm for minimizing $\phi_\rho(x)$.
Difficulty: selecting a sufficiently large value of $\rho$. This motivated the invention of filter methods.

Byrd-Omojokun Trust-Region Algorithm

[Figure: decomposition of the step $p_k$ into a normal component $n_k$ and a tangential component $t_k$.]

Decompose the step $p_k = n_k + t_k$.

Normal component $n_k$, towards feasibility:
$$\min_n \; \|\nabla c_k^T n + c_k\|_2^2 \quad \text{s.t.} \quad \|n\|_2 \leq 0.8\,\Delta_k$$

Tangential component $t_k$, towards optimality:
$$\min_t \; \frac{1}{2}(n_k + t)^T H_k (n_k + t) + \nabla f_k^T (n_k + t) \quad \text{s.t.} \quad \nabla c_k^T t = 0, \quad \|t\|_2^2 \leq \Delta_k^2 - \|n_k\|_2^2$$

The subproblems can be solved inexactly; the normal problem, for example, with the dogleg method.
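A sketch of a dogleg solution of the normal subproblem $\min_n \|\nabla c_k^T n + c_k\|_2^2$ s.t. $\|n\|_2 \leq r$ (with $r = 0.8\,\Delta_k$), combining the Cauchy point with the minimum-norm Gauss-Newton point; it assumes $\nabla c_k^T$ has full row rank:

```python
import numpy as np

def dogleg_normal(Jc, c, r):
    """Jc is the m x n Jacobian grad(c_k)^T; returns n with ||n|| <= r."""
    g = Jc.T @ c                                   # gradient of 1/2||Jc n + c||^2 at n = 0
    n_gn = -Jc.T @ np.linalg.solve(Jc @ Jc.T, c)   # minimum-norm Gauss-Newton point
    if np.linalg.norm(n_gn) <= r:
        return n_gn
    t = (g @ g) / np.linalg.norm(Jc @ g) ** 2      # exact line search along -g
    n_c = -t * g                                   # Cauchy point
    if np.linalg.norm(n_c) >= r:
        return -(r / np.linalg.norm(g)) * g        # truncated steepest descent
    d = n_gn - n_c                                 # dogleg leg; find ||n_c + tau d|| = r
    aa, bb, cc = d @ d, 2.0 * (n_c @ d), n_c @ n_c - r * r
    tau = (-bb + np.sqrt(bb * bb - 4.0 * aa * cc)) / (2.0 * aa)
    return n_c + tau * d
```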