Constrained Nonlinear Optimization Algorithms

Department of Industrial Engineering and Management Sciences, Northwestern University
waechter@iems.northwestern.edu
Institute for Mathematics and its Applications, University of Minnesota
August 4, 2016

Constrained Nonlinear Optimization Problems

$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad c_E(x) = 0, \quad c_I(x) \geq 0 \tag{NLP}$$

where $f : \mathbb{R}^n \to \mathbb{R}$, $c_E : \mathbb{R}^n \to \mathbb{R}^{m_E}$, $c_I : \mathbb{R}^n \to \mathbb{R}^{m_I}$.

We assume that all functions are twice continuously differentiable. No convexity is required. Most algorithms for (NLP) have theoretical convergence guarantees only to stationary points; they include ingredients that steer the iterates towards local minimizers.

Table of Contents Applications Equality-Constrained Quadratic Programming Active-Set Quadratic Programming Solvers SQP for Equality-Constrained NLPs SQP for Inequality-Constrained NLPs Interior Point Methods Software

Design and Operation of a Chemical Plant

Objective: minimize costs / maximize profit.
Variables: physical quantities.
Constraints: physical relationships (conservation laws, thermodynamic relations) and limits (physical and operational).
$< 10^5$ variables; few degrees of freedom.
The constraint Jacobian $\nabla c(x)^T$ is sparse and structured.

Design Under Uncertainty

Scenario parameters (given, defining scenarios $l = 1, \ldots, L$): $F^l_{\text{in}}, x^l_{\text{in}}, T^l_{\text{env}}, p^l_{\text{env}}, \ldots$
Design variables: $V_{\text{react}}, D_{\text{dist}}, h_{\text{tank}}, \ldots$
Control variables: $Q^l_{\text{heat}}, r^l_{\text{refl}}, v^l_{\text{valve}}, \ldots$
State variables: $x^l_{\text{str}}, F^l_{\text{str}}, L^l_{\text{tank}}, T^l, p^l, \ldots$

The constraint Jacobian $\nabla c(x)^T$ has block structure (figure omitted).

Optimal Control / Dynamic Optimization

$$\min_{z,y,u,p} \; f(z(t_f), y(t_f), u(t_f), p)$$
$$\text{s.t.} \quad F(\dot{z}(t), z(t), y(t), u(t), p) = 0, \quad G(z(t), y(t), u(t), p) = 0, \quad z(0) = z_{\text{init}}, \quad \text{bound constraints}$$

Here $u : [0, t_f] \to \mathbb{R}^{n_u}$ are the control variables, $z : [0, t_f] \to \mathbb{R}^{n_z}$ the differentiable state variables, $y : [0, t_f] \to \mathbb{R}^{n_y}$ the algebraic state variables, $p \in \mathbb{R}^{n_p}$ time-independent parameters, $z_{\text{init}}$ the initial conditions, and $t_f$ the final time. Large-scale NLPs arise from discretization.

Circuit Tuning

The model consists of a network of gates. Gate delays are computed by simulation (expensive, noisy). The model has many variables (up to 1,000,000). Implemented in IBM's circuit tuning tool EinsTuner.

Hyperthermia Treatment Planning

$$\min_{T(x),u} \; \frac{1}{2}\int_\Omega (T(x) - T_{\text{target}}(x))^2 \, dx$$
$$\text{s.t.} \quad -\Delta T(x) - w(T(x) - T_{\text{blood}}) = u^* M(x) u \quad \text{in } \Omega,$$
$$\frac{\partial T(x)}{\partial n} = T_{\text{exterior}} - T(x) \quad \text{on } \partial\Omega,$$
$$T(x) \leq T_{\text{max}} \quad \text{in } \Omega \setminus \Omega_{\text{Tumor}}$$

Heat tumors with microwaves (to support chemo- and radio-therapy). The model is a PDE with controls (the application $u$ of the microwave antennas) and states (the temperature $T(x)$ defined over the domain $\Omega$). A finite-dimensional problem is obtained by discretization, e.g., finite differences or finite elements. The resulting NLP is usually very large.

Table of Contents Applications Equality-Constrained Quadratic Programming Active-Set Quadratic Programming Solvers SQP for Equality-Constrained NLPs SQP for Inequality-Constrained NLPs Interior Point Methods Software

Quadratic Programming

$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2}x^T Q x + g^T x \quad \text{s.t.} \quad A_E x + b_E = 0, \quad A_I x + b_I \geq 0 \tag{QP}$$

with $Q \in \mathbb{R}^{n \times n}$ symmetric, $A_E \in \mathbb{R}^{m_E \times n}$, $b_E \in \mathbb{R}^{m_E}$, $A_I \in \mathbb{R}^{m_I \times n}$, $b_I \in \mathbb{R}^{m_I}$.

Many applications (e.g., portfolio optimization, optimal control).
Important building block for methods for general NLP.
Algorithms: active-set methods and interior-point methods.
Let's first consider the equality-constrained case. Assume: all rows of $A_E$ are linearly independent.

Equality-Constrained QP

$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2}x^T Q x + g^T x \quad \text{s.t.} \quad Ax + b = 0 \tag{EQP}$$

First-order optimality conditions:
$$Qx + g + A^T\lambda = 0, \qquad Ax + b = 0$$

Find a stationary point $(x^*, \lambda^*)$ by solving the linear system
$$\begin{bmatrix} Q & A^T \\ A & 0 \end{bmatrix} \begin{pmatrix} x^* \\ \lambda^* \end{pmatrix} = -\begin{pmatrix} g \\ b \end{pmatrix}.$$
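To make the linear-algebra recipe concrete, here is a minimal numerical sketch (Python/NumPy, with made-up example data for Q, g, A, b) of solving (EQP) by forming and solving the KKT system directly:

```python
import numpy as np

# Example data (assumptions for illustration): Q symmetric positive definite,
# one equality constraint x1 + x2 - 1 = 0.
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([-1.0, -2.0])
A = np.array([[1.0, 1.0]])
b = np.array([-1.0])

n, m = Q.shape[0], A.shape[0]
K = np.block([[Q, A.T], [A, np.zeros((m, m))]])        # KKT matrix
sol = np.linalg.solve(K, -np.concatenate([g, b]))      # K [x; lam] = -[g; b]
x_star, lam_star = sol[:n], sol[n:]

# Check the first-order optimality conditions:
assert np.allclose(Q @ x_star + g + A.T @ lam_star, 0)   # stationarity
assert np.allclose(A @ x_star + b, 0)                    # feasibility
```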

KKT System of QP

$$\begin{bmatrix} Q & A^T \\ A & 0 \end{bmatrix} \begin{pmatrix} x^* \\ \lambda^* \end{pmatrix} = -\begin{pmatrix} g \\ b \end{pmatrix}$$

When is $(x^*, \lambda^*)$ indeed a solution of (EQP)? Recall the second-order optimality conditions: let the columns of $Z \in \mathbb{R}^{n \times (n-m)}$ be a basis of the null space of A, so that $AZ = 0$ ("null-space matrix"). Then $x^*$ is a strict local minimizer of (EQP) if $Z^T Q Z \succ 0$.

On the other hand: if $Z^T Q Z$ has a negative eigenvalue, then (EQP) is unbounded below.

There are different ways to solve the KKT system; the best choice depends on the particular problem.

Direct Solution of the KKT System

$$\underbrace{\begin{bmatrix} Q & A^T \\ A & 0 \end{bmatrix}}_{=:K} \begin{pmatrix} x^* \\ \lambda^* \end{pmatrix} = -\begin{pmatrix} g \\ b \end{pmatrix}$$

How can we verify that $x^*$ is a local minimizer without computing Z?

Definition. Let $n_+$, $n_-$, $n_0$ be the number of positive, negative, and zero eigenvalues of a matrix M. Then $\text{In}(M) = (n_+, n_-, n_0)$ is the inertia of M.

Theorem. Suppose that A has full rank. Then $\text{In}(K) = \text{In}(Z^T Q Z) + (m, m, 0)$.

Corollary. If $\text{In}(K) = (n, m, 0)$, then $x^*$ is the unique global minimizer.

Computing the Inertia

Symmetric indefinite factorization:
$$P K P^T = L B L^T$$
P: permutation matrix; L: unit lower triangular matrix; B: block-diagonal matrix with $1 \times 1$ and $2 \times 2$ diagonal blocks.

Can be computed efficiently; exploits sparsity.
The inertia is obtained simply by counting the eigenvalues of the blocks in B.
The factorization is also used to solve the linear system.
This will be important later, when we need to convexify QPs ($Q \leftarrow Q + \gamma I$).
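As a sketch of this procedure, the following snippet uses scipy.linalg.ldl (one available symmetric indefinite factorization) and counts the eigenvalues of the 1×1 and 2×2 blocks of B to obtain the inertia; the tolerance is an illustrative assumption:

```python
import numpy as np
from scipy.linalg import ldl

def inertia(K, tol=1e-12):
    """Inertia (n+, n-, n0) of symmetric K from an LBL^T factorization."""
    _, D, _ = ldl(K)                  # P K P^T = L D L^T, D block diagonal
    n_pos = n_neg = n_zero = 0
    i, n = 0, D.shape[0]
    while i < n:
        if i + 1 < n and abs(D[i + 1, i]) > tol:        # 2x2 block
            ev = np.linalg.eigvalsh(D[i:i + 2, i:i + 2])
            pos, neg = int(np.sum(ev > tol)), int(np.sum(ev < -tol))
            n_pos += pos; n_neg += neg; n_zero += 2 - pos - neg
            i += 2
        else:                                           # 1x1 block
            d = D[i, i]
            n_pos += int(d > tol); n_neg += int(d < -tol)
            n_zero += int(abs(d) <= tol)
            i += 1
    return n_pos, n_neg, n_zero
```

By the corollary above, inertia(K) == (n, m, 0) certifies that $x^*$ is the unique global minimizer of (EQP).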

Schur-Complement Method

$$Qx + g + A^T\lambda = 0, \qquad Ax + b = 0$$

Assume Q is positive definite. Then $AQ^{-1}A^T$ is nonsingular. Pre-multiply the first equation by $AQ^{-1}$ and use the second to obtain
$$[AQ^{-1}A^T]\,\lambda^* = b - AQ^{-1}g,$$
then recover $x^*$ from
$$Qx^* = -g - A^T\lambda^*.$$

Requirements:
Solves with Q can be done efficiently.
Need to compute $AQ^{-1}A^T$ and solve a linear system with it.
Works best if m is small.
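A small sketch of the Schur-complement method, assuming Q is positive definite so that a Cholesky factorization can be reused for all solves with Q:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def eqp_schur(Q, g, A, b):
    """Solve (EQP) via the Schur complement; Q must be positive definite."""
    cQ = cho_factor(Q)                   # factorize Q once, reuse below
    S = A @ cho_solve(cQ, A.T)           # Schur complement A Q^{-1} A^T (m x m)
    lam = np.linalg.solve(S, b - A @ cho_solve(cQ, g))
    x = -cho_solve(cQ, g + A.T @ lam)    # Q x = -g - A^T lam
    return x, lam
```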

Step Decomposition

[Figure: decomposition of the solution $x^*$ into components $x_Y$ and $x_Z$.]

Decompose $x^* = x_Y + x_Z$ into two steps:
range-space step $x_Y$: step onto the constraints;
null-space step $x_Z$: optimize within the null space.
Write $x_Y = Y p_Y$ and $x_Z = Z p_Z$, where $[Y\; Z]$ is a basis of $\mathbb{R}^n$ and $Z$ is a null-space matrix for $A$.
The decomposition depends on the choice of Y and Z.

Step Decomposition (continued)

$$Qx + g + A^T\lambda = 0, \qquad Ax + b = 0$$

$x_Y = Y p_Y$ is a step onto the constraints:
$$0 = Ax^* + b = AY p_Y + AZ p_Z + b = AY p_Y + b \quad\Longrightarrow\quad p_Y = -[AY]^{-1} b$$
$x_Z = Z p_Z$ optimizes in the null space:
$$p_Z = -[Z^T Q Z]^{-1} Z^T(g + QY p_Y)$$
This solves $\min_{p_Z} \frac{1}{2}p_Z^T [Z^T Q Z] p_Z + (g + QY p_Y)^T Z p_Z$ (the "reduced QP").
Finally, the multipliers are
$$\lambda^* = -[AY]^{-T} Y^T(Qx^* + g).$$
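The following sketch implements the null-space method with one common (assumed) choice of basis: an orthonormal $[Y\;Z]$ from a full QR factorization of $A^T$, so that $AZ = 0$:

```python
import numpy as np
from scipy.linalg import qr

def eqp_nullspace(Q, g, A, b):
    """Solve (EQP) by step decomposition; A must have full row rank."""
    m, n = A.shape
    Qfull, _ = qr(A.T)                    # orthonormal basis of R^n
    Y, Z = Qfull[:, :m], Qfull[:, m:]     # range(Y) = range(A^T), A Z = 0
    p_Y = np.linalg.solve(A @ Y, -b)      # range-space step
    p_Z = np.linalg.solve(Z.T @ Q @ Z,    # null-space (reduced QP) step
                          -Z.T @ (g + Q @ (Y @ p_Y)))
    x = Y @ p_Y + Z @ p_Z
    lam = np.linalg.solve((A @ Y).T, -Y.T @ (Q @ x + g))
    return x, lam
```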

Example: PDE-Constrained Optimization

$$\min_{T,u} \; \frac{1}{2}\int_\Omega (T(z) - \hat{T}(z))^2 \, dz + \frac{\alpha}{2}\|u\|^2 \quad \text{s.t.} \quad -\Delta T(z) = \sum_{i=1}^{n_u} k_i(z)\, u_i \;\text{ on } \Omega, \qquad T(z) = b(z) \;\text{ on } \partial\Omega$$

Given the (independent) control variables u:
the (dependent) state T is the solution of the PDE;
well-established solution techniques can be used;
we have only $n_u$ degrees of freedom.

Discretized PDE-Constrained Problem

$$\min_{t,u} \; \sum_{i=1}^{N} (t_i - \hat{t}_i)^2 + \alpha \sum_{i=1}^{n_u} u_i^2 \quad \text{s.t.} \quad D\,t + Ku + b = 0$$

Discretized state variables $t \in \mathbb{R}^N$; discretized non-singular(!) differential operator $D \in \mathbb{R}^{N \times N}$.
Given the controls u, the state variables can be computed from $t = -D^{-1}(Ku + b)$.
We could just eliminate t and solve a lower-dimensional problem in u.

Generalization: Elimination of Variables

$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2}\begin{pmatrix} x_B \\ x_N \end{pmatrix}^T Q \begin{pmatrix} x_B \\ x_N \end{pmatrix} + g_B^T x_B + g_N^T x_N \quad \text{s.t.} \quad B x_B + N x_N + b = 0$$

Choose
$$Y = \begin{bmatrix} I \\ 0 \end{bmatrix}, \qquad Z = \begin{bmatrix} -B^{-1}N \\ I \end{bmatrix}.$$
Then
$$p_Y = -[AY]^{-1} b = -B^{-1} b, \qquad p_Z = -[Z^T Q Z]^{-1} Z^T(g + QY p_Y), \qquad \lambda^* = -[AY]^{-T} Y^T(Qx^* + g) = -B^{-T} Y^T(Qx^* + g).$$

We can use existing implementations of the operator $B^{-1}$:
to compute Z and $p_Z$ (assuming N has few columns);
to compute $\lambda^*$ (assuming an implementation of $B^{-T}$ is also available).
Tailored implementations from simulation often already exist. Exploit problem structure!
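A sketch of this elimination, with an LU factorization of B standing in for the "existing implementation" of $B^{-1}$ and $B^{-T}$ (function and variable names are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def eqp_eliminate(Q, g_B, g_N, B, N, b):
    """Solve the partitioned (EQP) by eliminating x_B = -B^{-1}(N x_N + b)."""
    luB = lu_factor(B)                               # stands in for a tailored solver
    n_B, n_N = B.shape[0], N.shape[1]
    Z = np.vstack([-lu_solve(luB, N), np.eye(n_N)])  # Z = [-B^{-1} N; I]
    p_Y = np.concatenate([-lu_solve(luB, b), np.zeros(n_N)])   # Y p_Y = [-B^{-1} b; 0]
    g = np.concatenate([g_B, g_N])
    p_Z = np.linalg.solve(Z.T @ Q @ Z, -Z.T @ (g + Q @ p_Y))
    x = p_Y + Z @ p_Z
    lam = -lu_solve(luB, (Q @ x + g)[:n_B], trans=1)  # lam = -B^{-T} Y^T (Q x + g)
    return x, lam
```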

Solution of EQP: Summary

Direct method:
Factorize the KKT matrix.
If the $LBL^T$ factorization is used, we can determine whether $x^*$ is indeed a minimizer.
Easy general-purpose option.

Schur-complement method:
Requires that Q is positive definite and easy to solve with (e.g., diagonal).
The number of constraints m should not be large.

Null-space method:
Step decomposition into range-space step and null-space step.
Permits exploitation of constraint matrix structure.
The number of degrees of freedom $(n - m)$ should not be large.

Table of Contents Applications Equality-Constrained Quadratic Programming Active-Set Quadratic Programming Solvers SQP for Equality-Constrained NLPs SQP for Inequality-Constrained NLPs Interior Point Methods Software

Inequality-Constrained QPs

$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2}x^T Q x + g^T x \quad \text{s.t.} \quad a_i^T x + b_i = 0 \;(i \in E), \quad a_i^T x + b_i \geq 0 \;(i \in I)$$

Assume: Q is positive definite; $\{a_i\}_{i \in E}$ are linearly independent.

Optimality conditions:
$$Qx + g - \sum_{i \in E \cup I} a_i\lambda_i = 0$$
$$a_i^T x + b_i = 0 \;(i \in E), \qquad a_i^T x + b_i \geq 0 \;(i \in I)$$
$$\lambda_i \geq 0 \;(i \in I), \qquad (a_i^T x + b_i)\lambda_i = 0 \;(i \in I)$$

Difficulty: decide which inequality constraints are active. We know how to solve equality-constrained QPs. Can we use that here?

Working Set

Choose a working set $W \subseteq I$ and solve
$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2}x^T Q x + g^T x \quad \text{s.t.} \quad a_i^T x + b_i = 0 \;\text{ for } i \in E \cup W$$
with optimality conditions
$$Qx + g - \sum_{i \in E \cup W} a_i\lambda_i = 0, \qquad a_i^T x + b_i = 0 \;\text{ for } i \in E \cup W.$$

Set the missing multipliers $\lambda_i = 0$ for $i \in I \setminus W$ and verify
$$a_i^T x + b_i \overset{?}{\geq} 0 \;\text{ for } i \in I \setminus W, \qquad \lambda_i \overset{?}{\geq} 0 \;\text{ for } i \in I.$$

If satisfied, $(x, \lambda)$ is the (unique) optimal solution. Otherwise, let's try a different working set.

Example QP

$$\min \; (x_1 - 1)^2 + (x_2 - 2.5)^2$$
$$\text{s.t.} \quad x_1 - 2x_2 + 2 \geq 0 \;(1), \quad -x_1 - 2x_2 + 6 \geq 0 \;(2), \quad -x_1 + 2x_2 + 2 \geq 0 \;(3), \quad x_1 \geq 0 \;(4), \quad x_2 \geq 0 \;(5)$$

[Figure: feasible region bounded by constraints (1)-(5).]

Primal Active-Set QP Solver, Iteration 1

$W = \{3, 5\}$, $x = (2, 0)$.

Initialization: choose a feasible starting iterate x and a working set $W \subseteq I$ such that $i \in W \Rightarrow a_i^T x + b_i = 0$ and $\{a_i\}_{i \in E \cup W}$ are linearly independent. (The algorithm will maintain these properties.)

Solve (EQP): $x_{EQP} = (2, 0)$, $\lambda_3 = -2$, $\lambda_5 = -1$.
Status: the current iterate is optimal for (EQP).
Release constraint: pick a constraint i with $\lambda_i < 0$ and remove it from the working set: $W \leftarrow W \setminus \{3\} = \{5\}$. Keep the iterate $x = (2, 0)$.

Primal Active-Set QP Solver, Iteration 2

$W = \{5\}$, $x = (2, 0)$.

Solve (EQP): $x_{EQP} = (1, 0)$, $\lambda_5 = -5$.
Status: the current iterate is not optimal for (EQP).
Take step ($x_{EQP}$ is feasible): update the iterate $x \leftarrow x_{EQP}$. Keep W.

Primal Active-Set QP Solver, Iteration 3

$W = \{5\}$, $x = (1, 0)$.

Solve (EQP): $x_{EQP} = (1, 0)$, $\lambda_5 = -5$.
Status: the current iterate is optimal for (EQP).
Release constraint: pick a constraint i with $\lambda_i < 0$ and remove it from the working set: $W \leftarrow W \setminus \{5\} = \emptyset$. Keep the iterate $x = (1, 0)$.

Primal Active-Set QP Solver, Iteration 4

$W = \emptyset$, $x = (1, 0)$.

Solve (EQP): $x_{EQP} = (1, 2.5)$.
Status: the current iterate is not optimal for (EQP).
Take step ($x_{EQP}$ is not feasible): find the largest $\alpha \in [0, 1]$ such that $x + \alpha(x_{EQP} - x)$ is feasible and update the iterate $x \leftarrow x + \alpha(x_{EQP} - x) = (1, 1.5)$. Update $W \leftarrow W \cup \{1\} = \{1\}$, where constraint $i = 1$ is blocking.

Primal Active-Set QP Solver, Iteration 5

$W = \{1\}$, $x = (1, 1.5)$.

Solve (EQP): $x_{EQP} = (1.4, 1.7)$, $\lambda_1 = 0.8$.
Status: the current iterate is not optimal for (EQP).
Take step ($x_{EQP}$ is feasible): update the iterate $x \leftarrow x_{EQP}$. Keep W.

Primal Active-Set QP Solver, Iteration 6

$W = \{1\}$, $x = (1.4, 1.7)$.

Solve (EQP): $x_{EQP} = (1.4, 1.7)$, $\lambda_1 = 0.8$.
Status: the current iterate is optimal for (EQP), and $\lambda_i \geq 0$ for all $i \in W$. Declare optimality!

Primal Active-Set QP Method

1. Select a feasible x and $W \subseteq I \cap A(x)$.
2. Solve (EQP) to get $x_{EQP}$ and $\lambda_{EQP}$.
3. If $x = x_{EQP}$:
   If $\lambda_{EQP} \geq 0$: Done!
   Otherwise, select $\lambda_i^{EQP} < 0$ and set $W \leftarrow W \setminus \{i\}$.
4. If $x \neq x_{EQP}$:
   Compute the step $p = x_{EQP} - x$.
   Compute $\alpha = \max\{\alpha \in [0, 1] : x + \alpha p \text{ is feasible}\}$.
   If $\alpha < 1$, pick $i \in I \setminus W$ with $a_i^T p < 0$ and $a_i^T(x + \alpha p) + b_i = 0$, and set $W \leftarrow W \cup \{i\}$.
   Update $x \leftarrow x + \alpha p$.
5. Go to step 2.

A code sketch of this method follows below.
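The method translates almost line by line into code. Below is a minimal, dense Python sketch (no degeneracy handling; Q positive definite and a feasible start are assumed), followed by the example QP from the earlier slides. The multiplier sign matches the optimality conditions above ($\lambda = -\mu$ from the symmetric KKT solve):

```python
import numpy as np

def solve_eqp(Q, g, A, b):
    """min 1/2 x'Qx + g'x  s.t.  Ax + b = 0, via the symmetric KKT system."""
    n, m = Q.shape[0], A.shape[0]
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, -np.concatenate([g, b]))
    # The solve yields mu with Qx + g + A'mu = 0; lambda = -mu matches the
    # convention above (lambda_i >= 0 at optimality for active inequalities).
    return sol[:n], -sol[n:]

def active_set_qp(Q, g, a, b, x, W, max_iter=100):
    """Constraints a[i] @ x + b[i] >= 0; x feasible, W initial working set."""
    for _ in range(max_iter):
        x_eqp, lam = solve_eqp(Q, g, a[W], b[W])
        if np.allclose(x, x_eqp):
            if np.all(lam >= 0):
                return x, W                      # optimal
            j = W[int(np.argmin(lam))]           # release most negative multiplier
            W = [i for i in W if i != j]
        else:
            p = x_eqp - x
            alpha, block = 1.0, None
            for i in range(len(b)):              # ratio test over i not in W
                if i not in W and a[i] @ p < 0:
                    step = -(a[i] @ x + b[i]) / (a[i] @ p)
                    if step < alpha:
                        alpha, block = step, i
            x = x + alpha * p
            if block is not None:
                W = W + [block]                  # add blocking constraint
    return x, W

# The example QP above: start at x = (2, 0) with W = {3, 5} (0-based: [2, 4]).
Q = 2 * np.eye(2); g = np.array([-2.0, -5.0])
a = np.array([[1.0, -2.0], [-1.0, -2.0], [-1.0, 2.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, 6.0, 2.0, 0.0, 0.0])
x, W = active_set_qp(Q, g, a, b, np.array([2.0, 0.0]), [2, 4])
print(x, W)   # approximately (1.4, 1.7) with working set {1} (0-based: [0])
```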

Active-Set QP Algorithms

Primal active-set method:
Keeps all iterates feasible.
Changes W by at most one constraint per iteration.
$\{a_i\}_{i \in E \cup W}$ remain linearly independent.

Convergence:
Finite convergence: finitely many options for W; the objective decreases with every step, as long as $\alpha > 0$!
Special handling of degeneracy is necessary ($\alpha = 0$ steps).

Efficient solution of (EQP):
Update the factorization when W changes.

Complexity:
Fast convergence if a good estimate of the optimal working set is given.
Worst-case exponential complexity.
Alternative: interior-point QP solvers (polynomial complexity).

There are variants that allow Q to be indefinite.

Table of Contents Applications Equality-Constrained Quadratic Programming Active-Set Quadratic Programming Solvers SQP for Equality-Constrained NLPs SQP for Inequality-Constrained NLPs Interior Point Methods Software

Equality-Constrained Nonlinear Problems

$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad c(x) = 0$$

First-order optimality conditions:
$$\nabla f(x) + \nabla c(x)\lambda = 0, \qquad c(x) = 0$$

This is a system of nonlinear equations in $(x, \lambda) \in \mathbb{R}^n \times \mathbb{R}^m$. Apply Newton's method: fast local convergence!

Issues:
This guarantees only (fast) local convergence.
We would like to find local minima and not just any kind of stationary point.

Need:
A globalization scheme (for convergence from any starting point).
Mechanisms that encourage convergence to local minima.

Newton's Method

$$\nabla f(x) + \nabla c(x)\lambda = 0, \qquad c(x) = 0$$

At an iterate $(x_k, \lambda_k)$, compute the step $(p_k, p_k^\lambda)$ from
$$\begin{bmatrix} H_k & \nabla c_k \\ \nabla c_k^T & 0 \end{bmatrix} \begin{pmatrix} p_k \\ p_k^\lambda \end{pmatrix} = -\begin{pmatrix} \nabla f_k + \nabla c_k\lambda_k \\ c_k \end{pmatrix}$$
where $\nabla f_k := \nabla f(x_k)$, $c_k := c(x_k)$, $\nabla c_k := \nabla c(x_k)$, and
$$L(x, \lambda) := f(x) + \sum_{j=1}^m c_j(x)\lambda_j, \qquad H_k := \nabla^2_{xx} L(x_k, \lambda_k).$$

Update the iterate: $(x_{k+1}, \lambda_{k+1}) = (x_k, \lambda_k) + (p_k, p_k^\lambda)$.

Sequential Quadratic Programming

Substituting $\lambda_{k+1} = \lambda_k + p_k^\lambda$, the Newton system becomes
$$\begin{bmatrix} H_k & \nabla c_k \\ \nabla c_k^T & 0 \end{bmatrix} \begin{pmatrix} p_k \\ \lambda_{k+1} \end{pmatrix} = -\begin{pmatrix} \nabla f_k \\ c_k \end{pmatrix}$$

These are the optimality conditions of
$$\min_{p \in \mathbb{R}^n} \; \frac{1}{2}p^T H_k p + \nabla f_k^T p + f_k \quad \text{s.t.} \quad \nabla c_k^T p + c_k = 0$$
with multipliers $\lambda_{k+1}$.

The Newton step can be interpreted as the solution of a local QP model!

Local QP Model

[Figure: iterate $x_k$, step p, and the constraint manifold $c(x) = 0$ with solution $x^*$.]

Original problem:
$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad c(x) = 0$$
Local QP model:
$$\min_{p \in \mathbb{R}^n} \; f_k + \nabla f_k^T p + \frac{1}{2}p^T H_k p \quad \text{s.t.} \quad c_k + \nabla c_k^T p = 0 \tag{QP$_k$}$$

Exact Penalty Function

We need a tool to facilitate convergence from any starting point. Here, we have two (usually competing) goals:
Optimality: $\min f(x)$.
Feasibility: $\min \|c(x)\|$.
These are combined in the (non-differentiable) exact penalty function
$$\phi_\rho(x) = f(x) + \rho\|c(x)\|_1 \qquad (\rho > 0).$$

Lemma. Suppose $x^*$ is a local minimizer of (NLP) with multipliers $\lambda^*$ and LICQ holds. Then $x^*$ is a local minimizer of $\phi_\rho$ if $\rho > \|\lambda^*\|_\infty$.

We can use decrease in $\phi_\rho$ as a measure of progress towards a local minimizer of (NLP).

Line Search

$$\phi_\rho(x) = f(x) + \rho\|c(x)\|_1$$

Backtracking line search: try $\alpha_k \in \{1, \frac{1}{2}, \frac{1}{4}, \ldots\}$ until
$$\phi_\rho(x_k + \alpha_k p_k) \leq \phi_\rho(x_k) + \eta\,\alpha_k\, D\phi_\rho(x_k; p_k) \qquad (\eta \in (0, 1))$$
where $D\phi_\rho(x_k; p_k)$ is the directional derivative of $\phi_\rho$ at $x_k$ in direction $p_k$.

Lemma. Let $p_k$ be an optimal solution of (QP$_k$). Then
$$D\phi_\rho(x_k; p_k) \leq -p_k^T H_k p_k - (\rho - \|\lambda_{k+1}\|_\infty)\|c_k\|_1.$$

So $p_k$ is a descent direction for $\phi_\rho$ if $H_k \succ 0$ and $\rho > \|\lambda_{k+1}\|_\infty$.

Basic SQP Algorithm

1. Choose $x_1$, $\lambda_1$, $\rho_0 > 0$. Set $k \leftarrow 1$.
2. Solve (QP$_k$) to get $p_k$ and the QP multipliers $\hat\lambda_{k+1}$.
3. Update the penalty parameter ($\beta > 0$):
$$\rho_k = \begin{cases} \rho_{k-1} & \text{if } \rho_{k-1} \geq \|\hat\lambda_{k+1}\|_\infty + \beta \\ \|\hat\lambda_{k+1}\|_\infty + 2\beta & \text{otherwise.} \end{cases}$$
4. Perform a backtracking line search: find the largest $\alpha_k \in \{1, \frac{1}{2}, \frac{1}{4}, \ldots\}$ with
$$\phi_{\rho_k}(x_k + \alpha_k p_k) \leq \phi_{\rho_k}(x_k) + \eta\,\alpha_k\, D\phi_{\rho_k}(x_k; p_k).$$
5. Update the iterate: $x_{k+1} = x_k + \alpha_k p_k$ and $\lambda_{k+1} = \hat\lambda_{k+1}$.
6. Set $k \leftarrow k + 1$ and go to step 2.
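A compact Python sketch of this algorithm for equality-constrained problems, with caller-supplied derivatives. It assumes each $H_k$ is positive definite on the constraint null space so the QP step is a descent direction (otherwise the regularization discussed below is needed); no further safeguards are included:

```python
import numpy as np

def basic_sqp(f, grad_f, c, jac_c, hess_L, x, lam,
              rho=1.0, beta=0.1, eta=1e-4, tol=1e-8, max_iter=100):
    """jac_c returns the m x n Jacobian with rows (grad c_j)^T."""
    for _ in range(max_iter):
        g, J, ck = grad_f(x), jac_c(x), c(x)
        if np.linalg.norm(np.concatenate([g + J.T @ lam, ck])) < tol:
            return x, lam                           # KKT point found
        H, m = hess_L(x, lam), ck.size
        K = np.block([[H, J.T], [J, np.zeros((m, m))]])
        sol = np.linalg.solve(K, -np.concatenate([g, ck]))
        p, lam_qp = sol[:x.size], sol[x.size:]      # step and QP multipliers
        lam_inf = np.linalg.norm(lam_qp, np.inf)
        if rho < lam_inf + beta:                    # penalty update (step 3)
            rho = lam_inf + 2 * beta
        phi = lambda z: f(z) + rho * np.linalg.norm(c(z), 1)
        # For the QP step, grad(c_k)^T p = -c_k, so the directional
        # derivative is D phi_rho(x; p) = grad(f)^T p - rho ||c_k||_1.
        D = g @ p - rho * np.linalg.norm(ck, 1)
        alpha = 1.0
        while phi(x + alpha * p) > phi(x) + eta * alpha * D:
            alpha *= 0.5                            # backtracking (step 4)
        x, lam = x + alpha * p, lam_qp              # step 5
    return x, lam
```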

Convergence Result for the Basic SQP Algorithm

Assumptions:
f and c are twice continuously differentiable.
The matrices $H_k$ are bounded and uniformly positive definite.
The smallest singular value of $\nabla c_k$ is uniformly bounded away from zero.

Theorem. Under these assumptions,
$$\lim_{k \to \infty} \left\|\begin{pmatrix} \nabla f_k + \nabla c_k\lambda_{k+1} \\ c_k \end{pmatrix}\right\| = 0.$$

In other words, each limit point of $\{x_k\}$ is a stationary point for (NLP).

Choice of Hessian $H_k$

$$\min_{p \in \mathbb{R}^n} \; \frac{1}{2}p^T H_k p + \nabla f_k^T p \quad \text{s.t.} \quad \nabla c_k^T p + c_k = 0 \tag{QP$_k$}$$

For fast local convergence, we want to choose $H_k = \nabla^2_{xx} L_k$.
$\nabla^2_{xx} L_k$ is positive definite if f and c are convex.
In general, $\nabla^2_{xx} L_k$ might be indefinite:
(QP$_k$) might be unbounded;
$p_k$ might not be a descent direction for $\phi_\rho$.
We would like $Z_k^T H_k Z_k \succ 0$ ($Z_k$ a null-space matrix for $\nabla c_k^T$).
Option: $H_k$ = BFGS approximation of $\nabla^2_{xx} L(x_k, \lambda_k)$.
Potentially slow local convergence, since $\nabla^2_{xx} L(x^*, \lambda^*)$ may be indefinite.

Regularization

Recall the optimality conditions. Choose $\gamma \geq 0$ and solve the regularized problem
$$\min_{p \in \mathbb{R}^n} \; \frac{1}{2}p^T (H_k + \gamma I) p + \nabla f_k^T p \quad \text{s.t.} \quad \nabla c_k^T p + c_k = 0$$
$$\underbrace{\begin{bmatrix} H_k + \gamma I & \nabla c_k \\ \nabla c_k^T & 0 \end{bmatrix}}_{=:K_k} \begin{pmatrix} p_k \\ \lambda_{k+1} \end{pmatrix} = -\begin{pmatrix} \nabla f_k \\ c_k \end{pmatrix}$$

Choose $\gamma \geq 0$ so that $K_k$ has inertia $(n, m, 0)$, e.g., by trial and error, computing the inertia via the factorization. Then $Z_k^T (H_k + \gamma I) Z_k$ is positive definite.
No regularization is necessary close to a second-order sufficient solution.
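A sketch of the trial-and-error loop, here using a dense eigenvalue computation to read off the inertia (in practice the $LBL^T$ factorization sketched earlier would be used; the initial value and growth factor for $\gamma$ are illustrative assumptions):

```python
import numpy as np

def regularize_hessian(H, J, gamma0=1e-4, tol=1e-10, max_tries=30):
    """Return gamma >= 0 such that the KKT matrix has inertia (n, m, 0)."""
    n, m = H.shape[0], J.shape[0]
    gamma = 0.0
    for _ in range(max_tries):
        K = np.block([[H + gamma * np.eye(n), J.T], [J, np.zeros((m, m))]])
        ev = np.linalg.eigvalsh(K)
        if int(np.sum(ev > tol)) == n and int(np.sum(ev < -tol)) == m:
            return gamma
        gamma = gamma0 if gamma == 0.0 else 10.0 * gamma   # trial and error
    raise RuntimeError("inertia correction failed")
```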

Step Decomposition Revisited

$$\min_{p \in \mathbb{R}^n} \; \frac{1}{2}p^T H_k p + \nabla f_k^T p \quad \text{s.t.} \quad \nabla c_k^T p + c_k = 0 \tag{QP$_k$}$$

Decomposition $p_k = Y_k p_{Y,k} + Z_k p_{Z,k}$.
Range-space step: $p_{Y,k} = -[\nabla c_k^T Y_k]^{-1} c_k$.
Reduced-space QP:
$$\min_{p_Z} \; \frac{1}{2}p_Z^T [Z_k^T H_k Z_k] p_Z + (\nabla f_k + H_k Y_k p_{Y,k})^T Z_k p_Z$$
Make sure $Z_k^T H_k Z_k$ is positive definite (e.g., BFGS).

Trust-Region SQP Method

$$\min_{p \in \mathbb{R}^n} \; \frac{1}{2}p^T H_k p + \nabla f_k^T p \quad \text{s.t.} \quad \nabla c_k^T p + c_k = 0, \quad \|p\| \leq \Delta_k \tag{QP$_k$}$$

Trust-region radius $\Delta_k$, updated throughout the iterations.
No positive-definiteness requirements for $H_k$.
The step $p_k$ is accepted if $\frac{\text{ared}_k}{\text{pred}_k} \geq \eta$, with ($\eta \in (0, 1)$)
$$\text{pred}_k = m_k(0) - m_k(p_k), \qquad \text{ared}_k = \phi_\rho(x_k) - \phi_\rho(x_k + p_k)$$
where $m_k$ is a piecewise-quadratic model of $\phi_\rho(x) = f(x) + \rho\|c(x)\|_1$:
$$m_k(p) = f_k + \nabla f_k^T p + \frac{1}{2}p^T H_k p + \rho\|c_k + \nabla c_k^T p\|_1$$
Otherwise, decrease $\Delta_k$.
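A small sketch of the acceptance test and radius update; phi_fn and model_fn stand in for $\phi_\rho$ and $m_k$, and the update constants are illustrative assumptions:

```python
import numpy as np

def tr_step(phi_fn, model_fn, x, p, delta, eta=0.1):
    """Accept or reject a trust-region step via the ared/pred ratio."""
    pred = model_fn(np.zeros_like(p)) - model_fn(p)   # predicted reduction
    ared = phi_fn(x) - phi_fn(x + p)                  # actual reduction
    if pred > 0 and ared / pred >= eta:
        return x + p, max(delta, 2.0 * np.linalg.norm(p))   # accept, maybe enlarge
    return x, 0.5 * delta                                   # reject, shrink
```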

Inconsistent QPs

[Figure: infeasible iterate $x_k$, trust region, and the constraint manifold $c(x) = 0$ with solution $x^*$.]

If $x_k$ is not feasible and $\Delta_k$ is small, (QP$_k$) might not be feasible. One remedy: penalize the constraint violation:
$$\min_{p \in \mathbb{R}^n;\, s,t \in \mathbb{R}^m} \; \frac{1}{2}p^T H_k p + \nabla f_k^T p + \rho\sum_{j=1}^m (s_j + t_j)$$
$$\text{s.t.} \quad \nabla c_k^T p + c_k = s - t, \quad \|p\| \leq \Delta_k, \quad s, t \geq 0$$

Fletcher's Sl$_1$QP

The penalized subproblem
$$\min_{p \in \mathbb{R}^n;\, s,t \in \mathbb{R}^m} \; \frac{1}{2}p^T H_k p + \nabla f_k^T p + \rho\sum_{j=1}^m (s_j + t_j) \quad \text{s.t.} \quad \nabla c_k^T p + c_k = s - t, \quad \|p\| \leq \Delta_k, \quad s, t \geq 0$$
is equivalent to
$$\min_{p \in \mathbb{R}^n} \; m_k(p) = f_k + \nabla f_k^T p + \frac{1}{2}p^T H_k p + \rho\|c_k + \nabla c_k^T p\|_1 \quad \text{s.t.} \quad \|p\| \leq \Delta_k.$$

A natural algorithm for minimizing $\phi_\rho(x)$.
Difficulty: selecting a sufficiently large value of $\rho$. This motivated the invention of filter methods.

Byrd-Omojokun Trust-Region Algorithm

[Figure: decomposition of the step $p_k$ into a normal component $n_k$ and a tangential component $t_k$.]

Decompose the step $p_k = n_k + t_k$.

Normal component $n_k$, towards feasibility:
$$\min_n \; \|\nabla c_k^T n + c_k\|_2^2 \quad \text{s.t.} \quad \|n\|_2 \leq 0.8\,\Delta_k$$

Tangential component $t_k$, towards optimality:
$$\min_t \; \frac{1}{2}(n_k + t)^T H_k (n_k + t) + \nabla f_k^T (n_k + t) \quad \text{s.t.} \quad \nabla c_k^T t = 0, \quad \|t\|_2^2 \leq \Delta_k^2 - \|n_k\|_2^2$$

The subproblems can be solved inexactly; the normal problem, for example, with the dogleg method.
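A sketch of a dogleg solution of the normal subproblem $\min_n \|\nabla c_k^T n + c_k\|_2^2$ s.t. $\|n\|_2 \leq r$ (with $r = 0.8\,\Delta_k$), combining the Cauchy point with the minimum-norm Gauss-Newton point; it assumes $\nabla c_k^T$ has full row rank:

```python
import numpy as np

def dogleg_normal(Jc, c, r):
    """Jc is the m x n Jacobian grad(c_k)^T; returns n with ||n|| <= r."""
    g = Jc.T @ c                                   # gradient of 1/2||Jc n + c||^2 at n = 0
    n_gn = -Jc.T @ np.linalg.solve(Jc @ Jc.T, c)   # minimum-norm Gauss-Newton point
    if np.linalg.norm(n_gn) <= r:
        return n_gn
    t = (g @ g) / np.linalg.norm(Jc @ g) ** 2      # exact line search along -g
    n_c = -t * g                                   # Cauchy point
    if np.linalg.norm(n_c) >= r:
        return -(r / np.linalg.norm(g)) * g        # truncated steepest descent
    d = n_gn - n_c                                 # dogleg leg; find ||n_c + tau d|| = r
    aa, bb, cc = d @ d, 2.0 * (n_c @ d), n_c @ n_c - r * r
    tau = (-bb + np.sqrt(bb * bb - 4.0 * aa * cc)) / (2.0 * aa)
    return n_c + tau * d
```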