Constrained Nonlinear Optimization Algorithms


1 Department of Industrial Engineering and Management Sciences, Northwestern University
Institute for Mathematics and its Applications, University of Minnesota
August 4, 2016

5 Constrained Nonlinear Optimization Problems

$\min_{x \in \mathbb{R}^n} f(x)$ s.t. $c_E(x) = 0$, $c_I(x) \le 0$ (NLP)

$f : \mathbb{R}^n \to \mathbb{R}$, $c_E : \mathbb{R}^n \to \mathbb{R}^{m_E}$, $c_I : \mathbb{R}^n \to \mathbb{R}^{m_I}$

We assume that all functions are twice continuously differentiable. No convexity is required.
Most algorithms for NLP have theoretical convergence guarantees only to stationary points, but include ingredients that steer towards local minimizers.

7 Table of Contents
Applications
Equality-Constrained Quadratic Programming
Active-Set Quadratic Programming Solvers
SQP for Equality-Constrained NLPs
SQP for Inequality-Constrained NLPs
Interior Point Methods
Software

10 Design and Operation of Chemical Plant
Objective: minimize costs / maximize profit.
Variables: physical quantities.
Constraints: physical relationships (conservation laws; thermodyn. rel.); limits (physical and operational).
$< 10^5$ variables; few degrees of freedom.
Constraint Jacobian $\nabla c(x)^T$ is sparse and structured. [figure: Jacobian sparsity pattern]

12 Design Under Uncertainty
Scenario parameters (given; defining scenarios $l = 1, \dots, L$): $F_{in}^l$, $x_{in}^l$, $T_{env}^l$, $p_{env}^l$, ...
Design variables: $V_{react}$, $D_{dist}$, $h_{tank}$, ...
Control variables: $Q_{heat}^l$, $r_{refl}^l$, $v_{valve}^l$, ...
State variables: $x_{str}^l$, $F_{str}^l$, $L_{tank}^l$, $T^l$, $p^l$, ...
[figure: block-structured constraint Jacobian $\nabla c(x)^T$]

13 Optimal Control / Dynamic Optimization

$\min_{z,y,u,p} f(z(t_f), y(t_f), u(t_f), p)$
s.t. $F(\dot z(t), z(t), y(t), u(t), p) = 0$
     $G(z(t), y(t), u(t), p) = 0$
     $z(0) = z_{init}$
     bound constraints

$u : [0, t_f] \to \mathbb{R}^{n_u}$  control variables
$z : [0, t_f] \to \mathbb{R}^{n_z}$  differential state variables
$y : [0, t_f] \to \mathbb{R}^{n_y}$  algebraic state variables
$p \in \mathbb{R}^{n_p}$  time-independent parameters
$z_{init}$  initial conditions;  $t_f$  final time

Large-scale NLPs arise from discretization.

14 Circuit Tuning
Model consists of a network of gates.
Gate delays computed by simulation (expensive, noisy).
Model has many variables (up to 1,000,000).
Implemented in IBM's circuit tuning tool EinsTuner.

15 Hyperthermia Treatment Planning

$\min_{T(x),u} \frac{1}{2} \int_\Omega (T(x) - T_{target}(x))^2 \, dx$
s.t. $-\Delta T(x) - w(T(x) - T_{blood}) = u^* M(x) u$ in $\Omega$
     $\partial T(x)/\partial n = T_{exterior} - T(x)$ on $\partial\Omega$
     $T(x) \le T_{max}$ in $\Omega \setminus \Omega_{Tumor}$

Heat tumors with microwaves (supports chemo- and radio-therapy). Model is a PDE with:
controls: application $u$ of the microwave antennas;
states: temperature $T(x)$ defined over the domain $\Omega$.
Finite-dimensional problem obtained by discretization, e.g., finite differences or finite elements. The resulting NLP is usually very large.

16 Table of Contents
Applications
Equality-Constrained Quadratic Programming
Active-Set Quadratic Programming Solvers
SQP for Equality-Constrained NLPs
SQP for Inequality-Constrained NLPs
Interior Point Methods
Software

21 Quadratic Programming

$\min_{x \in \mathbb{R}^n} \tfrac{1}{2} x^T Q x + g^T x$ s.t. $A_E x + b_E = 0$, $A_I x + b_I \le 0$ (QP)

$Q \in \mathbb{R}^{n \times n}$ symmetric; $A_E \in \mathbb{R}^{m_E \times n}$, $b_E \in \mathbb{R}^{m_E}$; $A_I \in \mathbb{R}^{m_I \times n}$, $b_I \in \mathbb{R}^{m_I}$

Many applications (e.g., portfolio optimization, optimal control).
Important building block for methods for general NLP.
Algorithms: active-set methods; interior-point methods.
Let's first consider the equality-constrained case. Assume: all rows of $A_E$ are linearly independent.

24 Equality-Constrained QP

$\min_{x \in \mathbb{R}^n} \tfrac{1}{2} x^T Q x + g^T x$ s.t. $Ax + b = 0$ (EQP)

First-order optimality conditions:
$Q x^* + g + A^T \lambda^* = 0$
$A x^* + b = 0$

Find the stationary point $(x^*, \lambda^*)$ by solving the linear system
$\begin{bmatrix} Q & A^T \\ A & 0 \end{bmatrix} \begin{pmatrix} x^* \\ \lambda^* \end{pmatrix} = -\begin{pmatrix} g \\ b \end{pmatrix}$.
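
A minimal NumPy sketch of this KKT solve (an editorial illustration; the data Q, g, A, b below are made up):

```python
import numpy as np

# Made-up example data: n = 3 variables, m = 1 equality constraint
Q = np.array([[4., 1., 0.], [1., 3., 0.], [0., 0., 2.]])
g = np.array([1., -2., 0.5])
A = np.array([[1., 1., 1.]])
b = np.array([-1.])

n, m = Q.shape[0], A.shape[0]
K = np.block([[Q, A.T], [A, np.zeros((m, m))]])    # KKT matrix
sol = np.linalg.solve(K, -np.concatenate([g, b]))  # right-hand side -(g, b)
x_star, lam_star = sol[:n], sol[n:]
print(x_star, lam_star)
```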

28 KKT System of QP

$\begin{bmatrix} Q & A^T \\ A & 0 \end{bmatrix} \begin{pmatrix} x^* \\ \lambda^* \end{pmatrix} = -\begin{pmatrix} g \\ b \end{pmatrix}$

When is $(x^*, \lambda^*)$ indeed a solution of (EQP)?
Recall the second-order optimality conditions: let the columns of $Z \in \mathbb{R}^{n \times (n-m)}$ be a basis of the null space of $A$, so $AZ = 0$ ("null-space matrix"). Then $x^*$ is a strict local minimizer of (EQP) if $Z^T Q Z \succ 0$.
On the other hand: if $Z^T Q Z$ has a negative eigenvalue, then (EQP) is unbounded below.
There are different ways to solve the KKT system; the best choice depends on the particular problem.

32 Direct Solution of the KKT System

$\underbrace{\begin{bmatrix} Q & A^T \\ A & 0 \end{bmatrix}}_{=:K} \begin{pmatrix} x^* \\ \lambda^* \end{pmatrix} = -\begin{pmatrix} g \\ b \end{pmatrix}$

How can we verify that $x^*$ is a local minimizer without computing $Z$?

Definition. Let $n_+$, $n_-$, $n_0$ be the number of positive, negative, and zero eigenvalues of a matrix $M$. Then $\mathrm{In}(M) = (n_+, n_-, n_0)$ is the inertia of $M$.

Theorem. Suppose that $A$ has full rank. Then $\mathrm{In}(K) = \mathrm{In}(Z^T Q Z) + (m, m, 0)$.

Corollary. If $\mathrm{In}(K) = (n, m, 0)$, then $x^*$ is the unique global minimizer.

37 Computing the Inertia

$\underbrace{\begin{bmatrix} Q & A^T \\ A & 0 \end{bmatrix}}_{=:K} \begin{pmatrix} x^* \\ \lambda^* \end{pmatrix} = -\begin{pmatrix} g \\ b \end{pmatrix}$

Symmetric indefinite factorization $P K P^T = L B L^T$:
$P$: permutation matrix;
$L$: unit lower triangular matrix;
$B$: block diagonal matrix with $1 \times 1$ and $2 \times 2$ diagonal blocks.

Can be computed efficiently, exploits sparsity.
Obtain the inertia simply by counting the eigenvalues of the blocks in $B$.
Used also to solve the linear system.
Will be important later when we need to convexify QPs ($Q \leftarrow Q + \gamma I$).
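
A sketch of this inertia computation using SciPy's symmetric indefinite factorization (the example matrices are made up):

```python
import numpy as np
from scipy.linalg import ldl

def inertia(K, tol=1e-10):
    """Inertia (n+, n-, n0) of symmetric K via P K P^T = L B L^T.
    B is block diagonal with 1x1 and 2x2 blocks; by Sylvester's law of
    inertia, In(K) = In(B), so only the eigenvalues of B are counted."""
    _, B, _ = ldl(K)
    ev = np.linalg.eigvalsh(B)   # cheap: B is block diagonal
    return (int((ev > tol).sum()), int((ev < -tol).sum()),
            int((np.abs(ev) <= tol).sum()))

Q = np.array([[4., 1.], [1., 3.]])
A = np.array([[1., 1.]])
K = np.block([[Q, A.T], [A, np.zeros((1, 1))]])
print(inertia(K))   # (2, 1, 0) = (n, m, 0): unique global minimizer
```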

44 Schur-Complement Method

$Q x^* + g + A^T \lambda^* = 0$
$A x^* + b = 0$

Assume $Q$ is positive definite. Then $A Q^{-1} A^T$ is nonsingular.
Pre-multiply the first equation by $A Q^{-1}$. Then solve
$[A Q^{-1} A^T] \lambda^* = b - A Q^{-1} g$
$Q x^* = -g - A^T \lambda^*$

Requirements:
Solves with $Q$ can be done efficiently.
Need to compute $[A Q^{-1} A^T]$ and solve a linear system with it.
Works best if $m$ is small.
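
A sketch of the Schur-complement solve (made-up data; here Q is diagonal, the favorable case where solves with Q are trivial):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

Q = np.diag([2., 3., 4.])                 # easy-to-solve positive definite Q
g = np.array([1., -1., 0.])
A = np.array([[1., 1., 0.], [0., 1., 1.]])
b = np.array([-1., 2.])

cQ = cho_factor(Q)                        # factorize Q once
S = A @ cho_solve(cQ, A.T)                # Schur complement A Q^{-1} A^T
lam = np.linalg.solve(S, b - A @ cho_solve(cQ, g))  # [A Q^-1 A^T] lam = b - A Q^-1 g
x = cho_solve(cQ, -(g + A.T @ lam))       # Q x = -g - A^T lam
print(x, lam)
```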

49 Step Decomposition

[figure: decomposition of $x^*$ into $x_Y$ and $x_Z$ relative to the constraint set]

Decompose $x^* = x_Y + x_Z$ into two steps:
range-space step $x_Y$: step into the constraints;
null-space step $x_Z$: optimize within the null space.
$x_Y = Y p_Y$ and $x_Z = Z p_Z$, where $[Y\ Z]$ is a basis of $\mathbb{R}^n$ and $Z$ is a null-space matrix for $A$.
The decomposition depends on the choice of $Y$ and $Z$.

54 Step Decomposition

$Q x^* + g + A^T \lambda^* = 0$
$A x^* + b = 0$

$x_Y = Y p_Y$ is a step into the constraints:
$0 = A x^* + b = A Y p_Y + A Z p_Z + b \;\Longrightarrow\; p_Y = -[A Y]^{-1} b$

$x_Z = Z p_Z$ optimizes in the null space:
$p_Z = -[Z^T Q Z]^{-1} Z^T (g + Q Y p_Y)$
This solves $\min_{p_Z} \tfrac{1}{2} p_Z^T [Z^T Q Z] p_Z + (g + Q Y p_Y)^T Z p_Z$ ("reduced QP").

Multipliers: $\lambda^* = -[A Y]^{-T} Y^T (Q x^* + g)$
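
A sketch of the null-space method in NumPy/SciPy (made-up data; Y = A^T is one valid choice of range-space basis, since AA^T is nonsingular when A has full row rank):

```python
import numpy as np
from scipy.linalg import null_space

Q = np.array([[4., 1., 0.], [1., 3., 0.], [0., 0., 2.]])
g = np.array([1., -2., 0.5])
A = np.array([[1., 1., 1.]])
b = np.array([-1.])

Y = A.T                                  # range-space basis (one possible choice)
Z = null_space(A)                        # null-space basis: A Z = 0
p_Y = np.linalg.solve(A @ Y, -b)         # range-space step: A Y p_Y = -b
x_Y = Y @ p_Y
p_Z = np.linalg.solve(Z.T @ Q @ Z, -Z.T @ (g + Q @ x_Y))  # reduced QP solution
x = x_Y + Z @ p_Z
lam = np.linalg.solve((A @ Y).T, -Y.T @ (Q @ x + g))      # multipliers
print(x, lam)
```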

57 Example: PDE-Constrained Optimization

$\min_{T,u} \frac{1}{2} \int (T(z) - \hat T(z))^2 \, dz + \frac{\alpha}{2} \|u\|^2$
s.t. $-\Delta T(z) = \sum_{i=1}^{n_u} k_i(z) u_i$ on $\Omega$
     $T(z) = b(z)$ on $\partial\Omega$

Given the (independent) control variables $u$:
the (dependent) state $T$ is the solution of the PDE;
can use well-established solution techniques.
We have only $n_u$ degrees of freedom.

60 Discretized PDE-Constrained Problem

$\min_{t,u} \sum_{i=1}^N (t_i - \hat t_i)^2 + \frac{\alpha}{2} \sum_{i=1}^{n_u} u_i^2$
s.t. $Dt + Ku + b = 0$

Discretized state variables $t \in \mathbb{R}^N$.
Discretized non-singular(!) differential operator $D \in \mathbb{R}^{N \times N}$.
Given controls $u$, the state variables can be computed from $t = -D^{-1}(Ku + b)$.
We could just eliminate $t$ and solve a lower-dimensional problem in $u$.

64 Generalization: Elimination of Variables

$\min_{x \in \mathbb{R}^n} \tfrac{1}{2} (x_B^T\ x_N^T)\, Q \begin{pmatrix} x_B \\ x_N \end{pmatrix} + g_B^T x_B + g_N^T x_N$ s.t. $B x_B + N x_N + b = 0$

$Y = \begin{bmatrix} I \\ 0 \end{bmatrix}$, $Z = \begin{bmatrix} -B^{-1} N \\ I \end{bmatrix}$

$p_Y = -[A Y]^{-1} b = -B^{-1} b$
$p_Z = -[Z^T Q Z]^{-1} Z^T (g + Q Y p_Y)$
$\lambda^* = -[A Y]^{-T} Y^T (Q x^* + g) = -B^{-T} Y^T (Q x^* + g)$

Can use existing implementations of the operator $B^{-1}$:
compute $Z$ and $p_Z$ (assuming $N$ has few columns);
compute $\lambda^*$ (assuming that we have an implementation for $B^{-T}$).
Tailored implementations for simulation often already exist. Exploit problem structure!

67 Solution of EQP: Summary

Direct method:
Factorize the KKT matrix.
If the $LBL^T$ factorization is used, we can determine whether $x^*$ is indeed a minimizer.
Easy general-purpose option.

Schur-complement method:
Requires that $Q$ is positive definite and easy to solve with (e.g., diagonal).
Number of constraints $m$ should not be large.

Null-space method:
Step decomposition into range-space step and null-space step.
Permits exploitation of constraint matrix structure.
Number of degrees of freedom ($n - m$) should not be large.

68 Table of Contents
Applications
Equality-Constrained Quadratic Programming
Active-Set Quadratic Programming Solvers
SQP for Equality-Constrained NLPs
SQP for Inequality-Constrained NLPs
Interior Point Methods
Software

73 Inequality-Constrained QPs

$\min_{x \in \mathbb{R}^n} \tfrac{1}{2} x^T Q x + g^T x$ s.t. $a_i^T x + b_i = 0$ for $i \in E$; $a_i^T x + b_i \le 0$ for $i \in I$

Assume: $Q$ is positive definite; $\{a_i\}_{i \in E}$ are linearly independent.

Optimality conditions:
$Q x^* + g + \sum_{i \in E \cup I} a_i \lambda_i = 0$
$a_i^T x^* + b_i = 0$ for $i \in E$
$a_i^T x^* + b_i \le 0$ for $i \in I$
$\lambda_i \ge 0$ for $i \in I$
$(a_i^T x^* + b_i)\lambda_i = 0$ for $i \in I$

Difficulty: decide which inequality constraints are active.
We know how to solve equality-constrained QPs. Can we use that here?

78 Working Set

Choose a working set $W \subseteq I$ and solve
$\min_{x \in \mathbb{R}^n} \tfrac{1}{2} x^T Q x + g^T x$ s.t. $a_i^T x + b_i = 0$ for $i \in E$; $a_i^T x + b_i = 0$ for $i \in W$

Optimality conditions:
$Q x + g + \sum_{i \in E \cup W} a_i \lambda_i = 0$
$a_i^T x + b_i = 0$ for $i \in E$
$a_i^T x + b_i = 0$ for $i \in W$

Set the missing multipliers $\lambda_i = 0$ for $i \in I \setminus W$ and verify
$a_i^T x + b_i \overset{?}{\le} 0$ for $i \in I \setminus W$
$\lambda_i \overset{?}{\ge} 0$ for $i \in I$

If satisfied, $(x, \lambda)$ is the (unique) optimal solution.
Otherwise, let's try a different working set.

79 Example QP

[figure: feasible region with constraints (1)-(5)]

$\min\; (x_1 - 1)^2 + (x_2 - 2.5)^2$
s.t. $-x_1 + 2x_2 - 2 \le 0$ (1)
     $x_1 + 2x_2 - 6 \le 0$ (2)
     $x_1 - 2x_2 - 2 \le 0$ (3)
     $-x_1 \le 0$ (4)
     $-x_2 \le 0$ (5)

86 Primal Active-Set QP Solver: Iteration 1

Initialization: choose a feasible starting iterate $x$ and a working set $W \subseteq I$ with: $i \in W \Rightarrow a_i^T x + b_i = 0$, and $\{a_i\}_{i \in E \cup W}$ linearly independent. (The algorithm will maintain these properties.)

$W = \{3, 5\}$, $x = (2, 0)$. Solve (EQP): $x^{EQP} = (2, 0)$, $\lambda_3 = -2$, $\lambda_5 = -1$.

Status: current iterate is optimal for (EQP).
Release constraint: pick a constraint $i$ with $\lambda_i < 0$.
Remove it from the working set: $W \leftarrow W \setminus \{3\} = \{5\}$. Keep the iterate $x = (2, 0)$.

90 Primal Active-Set QP Solver: Iteration 2

$W = \{5\}$, $x = (2, 0)$. Solve (EQP): $x^{EQP} = (1, 0)$, $\lambda_5 = -5$.

Status: current iterate is not optimal for (EQP).
Take step ($x^{EQP}$ is feasible): update the iterate $x \leftarrow x^{EQP}$. Keep $W$.

95 Primal Active-Set QP Solver: Iteration 3

$W = \{5\}$, $x = (1, 0)$. Solve (EQP): $x^{EQP} = (1, 0)$, $\lambda_5 = -5$.

Status: current iterate is optimal for (EQP).
Release constraint: pick a constraint $i$ with $\lambda_i < 0$.
Remove it from the working set: $W \leftarrow W \setminus \{5\} = \emptyset$. Keep the iterate $x = (1, 0)$.

101 Primal Active-Set QP Solver: Iteration 4

$W = \emptyset$, $x = (1, 0)$. Solve (EQP): $x^{EQP} = (1, 2.5)$.

Status: current iterate is not optimal for (EQP).
Take step ($x^{EQP}$ is not feasible): find the largest $\alpha \in [0, 1]$ such that $x + \alpha(x^{EQP} - x)$ is feasible.
Update the iterate $x \leftarrow x + \alpha(x^{EQP} - x)$.
Update $W \leftarrow W \cup \{i\} = \{1\}$, where constraint $i = 1$ is blocking.

104 Primal Active-Set QP Solver: Iteration 5

$W = \{1\}$, $x = (1, 1.5)$. Solve (EQP): $x^{EQP} = (1.4, 1.7)$, $\lambda_1 = 0.8$.

Status: current iterate is not optimal for (EQP).
Take step ($x^{EQP}$ is feasible): update the iterate $x \leftarrow x^{EQP}$. Keep $W$.

108 Primal Active-Set QP Solver: Iteration 6

$W = \{1\}$, $x = (1.4, 1.7)$. Solve (EQP): $x^{EQP} = (1.4, 1.7)$, $\lambda_1 = 0.8$.

Status: current iterate is optimal for (EQP) and $\lambda_i \ge 0$ for all $i \in W$. Declare optimality!

118 Primal Active-Set QP Method

1. Select a feasible $x$ and $W \subseteq I \cap A(x)$.
2. Solve (EQP) to get $x^{EQP}$ and $\lambda^{EQP}$.
3. If $x = x^{EQP}$:
   If $\lambda^{EQP} \ge 0$: Done!
   Otherwise, select $\lambda_i^{EQP} < 0$ and set $W \leftarrow W \setminus \{i\}$.
4. If $x \ne x^{EQP}$:
   Compute the step $p = x^{EQP} - x$.
   Compute $\alpha = \max\{\alpha \in [0, 1] : x + \alpha p \text{ is feasible}\}$.
   If $\alpha < 1$, pick $i \in I \setminus W$ with $a_i^T p > 0$ and $a_i^T (x + \alpha p) + b_i = 0$, and set $W \leftarrow W \cup \{i\}$.
   Update $x \leftarrow x + \alpha p$.
5. Go to step 2.
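
A runnable sketch of this method (an editorial addition: dense linear algebra, inequality constraints only, $Q$ positive definite), applied to the example QP above; the objective $(x_1-1)^2 + (x_2-2.5)^2$ corresponds to $Q = 2I$, $g = (-2, -5)$ up to a constant:

```python
import numpy as np

def solve_eqp(Q, g, A, b):
    """EQP subproblem min 1/2 x^T Q x + g^T x s.t. A x + b = 0, via the KKT system."""
    n, m = Q.shape[0], A.shape[0]
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, -np.concatenate([g, b]))
    return sol[:n], sol[n:]

def active_set_qp(Q, g, A, b, x, W, max_iter=100):
    """Primal active-set method for min 1/2 x^T Q x + g^T x s.t. A x + b <= 0,
    started from a feasible x and working set W (list of row indices of A)."""
    W = list(W)
    for _ in range(max_iter):
        if W:
            x_eqp, lam = solve_eqp(Q, g, A[W], b[W])
        else:
            x_eqp, lam = np.linalg.solve(Q, -g), np.array([])
        if np.allclose(x_eqp, x):
            if lam.size == 0 or lam.min() >= 0:
                return x, W                        # KKT conditions hold: done
            W.pop(int(np.argmin(lam)))             # release a constraint with lam_i < 0
        else:
            p = x_eqp - x
            alpha, blocking = 1.0, None
            for i in range(len(b)):                # ratio test over inactive constraints
                if i in W or A[i] @ p <= 1e-12:
                    continue
                step = -(A[i] @ x + b[i]) / (A[i] @ p)
                if step < alpha:
                    alpha, blocking = step, i
            x = x + alpha * p
            if blocking is not None:
                W.append(blocking)                 # add the blocking constraint
    return x, W

# Example QP from the slides; constraints (1)-(5) in "A x + b <= 0" form
Q = 2 * np.eye(2); g = np.array([-2., -5.])
A = np.array([[-1., 2.], [1., 2.], [1., -2.], [-1., 0.], [0., -1.]])
b = np.array([-2., -6., -2., 0., 0.])
x, W = active_set_qp(Q, g, A, b, x=np.array([2., 0.]), W=[2, 4])  # (3), (5) active
print(x, [i + 1 for i in W])   # -> [1.4 1.7], final working set {1}
```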

124 Active-Set QP Algorithms

Primal active-set method:
Keeps all iterates feasible.
Changes $W$ by at most one constraint per iteration.
$\{a_i\}_{i \in E \cup W}$ remain linearly independent.

Convergence:
Finite convergence: finitely many options for $W$; the objective decreases with every step, as long as $\alpha > 0$!
Special handling of degeneracy is necessary ($\alpha = 0$ steps).

Efficient solution of (EQP): update the factorization when $W$ changes.

Complexity:
Fast convergence if a good estimate of the optimal working set is given.
Worst-case exponential complexity.
Alternative: interior-point QP solvers (polynomial complexity).

There are variants that allow $Q$ to be indefinite.

125 Table of Contents
Applications
Equality-Constrained Quadratic Programming
Active-Set Quadratic Programming Solvers
SQP for Equality-Constrained NLPs
SQP for Inequality-Constrained NLPs
Interior Point Methods
Software

132 Equality-Constrained Nonlinear Problems

$\min_{x \in \mathbb{R}^n} f(x)$ s.t. $c(x) = 0$

First-order optimality conditions:
$\nabla f(x) + \nabla c(x)\lambda = 0$
$c(x) = 0$

System of nonlinear equations in $(x, \lambda) \in \mathbb{R}^n \times \mathbb{R}^m$.
Apply Newton's method: fast local convergence!

Issues:
Guarantees only (fast) local convergence.
We would like to find local minima and not just any kind of stationary point.

Need:
Globalization scheme (for convergence from any starting point).
Mechanisms that encourage convergence to local minima.

136 Newton's Method

$\nabla f(x) + \nabla c(x)\lambda = 0$
$c(x) = 0$

At iterate $(x_k, \lambda_k)$, compute the step $(p_k, p_k^\lambda)$ from
$\begin{bmatrix} H_k & \nabla c_k \\ \nabla c_k^T & 0 \end{bmatrix} \begin{pmatrix} p_k \\ p_k^\lambda \end{pmatrix} = -\begin{pmatrix} \nabla f_k + \nabla c_k \lambda_k \\ c_k \end{pmatrix}$

$\nabla f_k := \nabla f(x_k)$, $c_k := c(x_k)$, $\nabla c_k := \nabla c(x_k)$
$L(x, \lambda) := f(x) + \sum_{j=1}^m c_j(x)\lambda_j$, $H_k := \nabla^2_{xx} L(x_k, \lambda_k)$

Update the iterate: $(x_{k+1}, \lambda_{k+1}) = (x_k, \lambda_k) + (p_k, p_k^\lambda)$.

141 Sequential Quadratic Programming

$\begin{bmatrix} H_k & \nabla c_k \\ \nabla c_k^T & 0 \end{bmatrix} \begin{pmatrix} p_k \\ p_k^\lambda \end{pmatrix} = -\begin{pmatrix} \nabla f_k + \nabla c_k \lambda_k \\ c_k \end{pmatrix}$

With $\lambda_{k+1} = \lambda_k + p_k^\lambda$, this is equivalent to
$\begin{bmatrix} H_k & \nabla c_k \\ \nabla c_k^T & 0 \end{bmatrix} \begin{pmatrix} p_k \\ \lambda_{k+1} \end{pmatrix} = -\begin{pmatrix} \nabla f_k \\ c_k \end{pmatrix}$

These are the optimality conditions of
$\min_{p \in \mathbb{R}^n} \tfrac{1}{2} p^T H_k p + \nabla f_k^T p + f_k$ s.t. $\nabla c_k^T p + c_k = 0$
with multipliers $\lambda_{k+1} = \lambda_k + p_k^\lambda$.

The Newton step can be interpreted as the solution of a local QP model!

143 Local QP Model

[figure: iterate $x_k$, step $p$, solution $x^*$, and the constraint manifold $c(x) = 0$]

Original problem: $\min_{x \in \mathbb{R}^n} f(x)$ s.t. $c(x) = 0$

Local QP model:
$\min_{p \in \mathbb{R}^n} f_k + \nabla f_k^T p + \tfrac{1}{2} p^T H_k p$ s.t. $c_k + \nabla c_k^T p = 0$ (QP$_k$)

149 Exact Penalty Function

Need a tool to facilitate convergence from any starting point. Here, we have two (usually competing) goals:
Optimality: $\min f(x)$
Feasibility: $\min \|c(x)\|$

Combined in the (non-differentiable) exact penalty function:
$\phi_\rho(x) = f(x) + \rho \|c(x)\|_1$  ($\rho > 0$)

Lemma. Suppose $x^*$ is a local minimizer of (NLP) with multipliers $\lambda^*$ and LICQ holds. Then $x^*$ is a local minimizer of $\phi_\rho$ if $\rho > \|\lambda^*\|_\infty$.

We can use decrease in $\phi_\rho$ as a measure of progress towards a local minimizer of (NLP).

156 Line Search

$\phi_\rho(x) = f(x) + \rho \|c(x)\|_1$

Backtracking line search: try $\alpha_k \in \{1, \tfrac12, \tfrac14, \dots\}$ until
$\phi_\rho(x_k + \alpha_k p_k) \le \phi_\rho(x_k) + \eta \alpha_k D\phi_\rho(x_k; p_k)$  ($\eta \in (0, 1)$)

$D\phi_\rho(x_k; p_k)$: directional derivative of $\phi_\rho$ at $x_k$ in direction $p_k$.

Lemma. Let $p_k$ be an optimal solution of (QP$_k$). Then
$D\phi_\rho(x_k; p_k) \le -p_k^T H_k p_k - (\rho - \|\lambda_{k+1}\|_\infty)\|c_k\|_1$.

So, $p_k$ is a descent direction for $\phi_\rho$ if $H_k \succ 0$ and $\rho > \|\lambda_{k+1}\|_\infty$.

162 Basic SQP Algorithm

1. Choose $x_1$, $\lambda_1$, $\rho_0 > 0$. Set $k \leftarrow 1$.
2. Solve (QP$_k$) to get $p_k$ and $\lambda_{k+1}$.
3. Update the penalty parameter ($\beta > 0$):
   $\rho_k = \rho_{k-1}$ if $\rho_{k-1} \ge \|\lambda_{k+1}\|_\infty + \beta$; otherwise $\rho_k = \|\lambda_{k+1}\|_\infty + 2\beta$.
4. Perform a backtracking line search: find the largest $\alpha_k \in \{1, \tfrac12, \tfrac14, \dots\}$ with
   $\phi_{\rho_k}(x_k + \alpha_k p_k) \le \phi_{\rho_k}(x_k) + \eta \alpha_k D\phi_{\rho_k}(x_k; p_k)$.
5. Update the iterate: $x_{k+1} = x_k + \alpha_k p_k$, keeping $\lambda_{k+1}$ from the QP solution.
6. Set $k \leftarrow k + 1$ and go to step 2.
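
A sketch of this algorithm in NumPy (an editorial illustration; it assumes the supplied Hessian is positive definite on the constraint null space, and the small test problem at the bottom is made up):

```python
import numpy as np

def basic_sqp(f, grad, c, jac, hess_lag, x, lam, beta=1.0, eta=1e-4, tol=1e-8):
    """Basic line-search SQP for min f(x) s.t. c(x) = 0."""
    rho = 1.0
    for _ in range(100):
        g, J, cx = grad(x), jac(x), c(x)          # J = grad c(x)^T, shape (m, n)
        if np.linalg.norm(np.concatenate([g + J.T @ lam, cx])) < tol:
            return x, lam                         # KKT conditions satisfied
        n, m = x.size, cx.size
        K = np.block([[hess_lag(x, lam), J.T], [J, np.zeros((m, m))]])
        sol = np.linalg.solve(K, -np.concatenate([g, cx]))
        p, lam_new = sol[:n], sol[n:]             # SQP step and new multipliers
        if rho < np.abs(lam_new).max() + beta:    # penalty parameter update
            rho = np.abs(lam_new).max() + 2 * beta
        phi = lambda z: f(z) + rho * np.linalg.norm(c(z), 1)   # exact penalty
        D = g @ p - rho * np.linalg.norm(cx, 1)   # directional derivative along p
        alpha = 1.0
        for _ in range(30):                       # backtracking line search
            if phi(x + alpha * p) <= phi(x) + eta * alpha * D:
                break
            alpha *= 0.5
        x, lam = x + alpha * p, lam_new
    return x, lam

# Made-up test: min x1 + x2 s.t. x1^2 + x2^2 = 2; solution (-1, -1), lam = 0.5
x, lam = basic_sqp(
    f=lambda x: x[0] + x[1],
    grad=lambda x: np.array([1., 1.]),
    c=lambda x: np.array([x[0]**2 + x[1]**2 - 2.]),
    jac=lambda x: np.array([[2 * x[0], 2 * x[1]]]),
    hess_lag=lambda x, lam: 2 * lam[0] * np.eye(2),
    x=np.array([-1.5, -0.5]), lam=np.array([1.]))
print(x, lam)
```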

167 Convergence Result for Basic SQP Algorithm

Assumptions:
$f$ and $c$ are twice continuously differentiable.
The matrices $H_k$ are bounded and uniformly positive definite.
The smallest singular value of $\nabla c_k$ is uniformly bounded away from zero.

Theorem. Under these assumptions, we have
$\lim_{k \to \infty} \left\| \begin{pmatrix} \nabla f_k + \nabla c_k \lambda_{k+1} \\ c_k \end{pmatrix} \right\| = 0$.

In other words, each limit point of $\{x_k\}$ is a stationary point for (NLP).

175 Choice of Hessian $H_k$

$\min_{p \in \mathbb{R}^n} \tfrac{1}{2} p^T H_k p + \nabla f_k^T p$ s.t. $\nabla c_k^T p + c_k = 0$ (QP$_k$)

For fast local convergence, we want to choose $H_k = \nabla^2_{xx} L_k$.
$\nabla^2_{xx} L_k$ is positive definite if $f$ and $c$ are convex.
In general, $\nabla^2_{xx} L_k$ might be indefinite:
(QP$_k$) might be unbounded;
$p_k$ might not be a descent direction for $\phi_\rho$.
Would like $Z_k^T H_k Z_k \succ 0$ ($Z_k$: null-space matrix for $\nabla c_k^T$).

Alternative: $H_k$ = BFGS approximation of $\nabla^2_{xx} L(x_k, \lambda_k)$.
Potentially slow local convergence, since $\nabla^2_{xx} L(x^*, \lambda^*)$ may be indefinite.

180 Regularization

Recall the optimality conditions of the regularized QP with $\gamma \ge 0$:

$\min_{p \in \mathbb{R}^n} \tfrac{1}{2} p^T (H_k + \gamma I) p + \nabla f_k^T p$ s.t. $\nabla c_k^T p + c_k = 0$

$\underbrace{\begin{bmatrix} H_k + \gamma I & \nabla c_k \\ \nabla c_k^T & 0 \end{bmatrix}}_{=:K_k} \begin{pmatrix} p_k \\ \lambda_{k+1} \end{pmatrix} = -\begin{pmatrix} \nabla f_k \\ c_k \end{pmatrix}$

Choose $\gamma \ge 0$ so that $K_k$ has inertia $(n, m, 0)$, e.g., by trial and error, computing the inertia via the factorization. Then $Z_k^T (H_k + \gamma I) Z_k$ is positive definite.
No regularization is necessary close to a second-order sufficient solution.
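
A sketch of the trial-and-error inertia correction (for brevity this computes eigenvalues directly; in an actual solver the inertia is read off the $LBL^T$ factorization, which is also reused for the step computation):

```python
import numpy as np

def regularized_kkt(H, A, gamma_min=1e-4, max_tries=30):
    """Increase gamma until K = [[H + gamma I, A^T], [A, 0]] has inertia (n, m, 0)."""
    n, m = H.shape[0], A.shape[0]
    gamma = 0.0
    for _ in range(max_tries):
        K = np.block([[H + gamma * np.eye(n), A.T], [A, np.zeros((m, m))]])
        ev = np.linalg.eigvalsh(K)
        if (ev > 1e-10).sum() == n and (ev < -1e-10).sum() == m:
            return K, gamma
        gamma = gamma_min if gamma == 0.0 else 10.0 * gamma
    raise RuntimeError("inertia correction failed")

# Indefinite H: here Z^T H Z = -3, so some gamma > 3 is required
H = np.array([[1., 0.], [0., -3.]]); A = np.array([[1., 0.]])
K, gamma = regularized_kkt(H, A)
print(gamma)
```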

185 Step Decomposition Revisited

$\min_{p \in \mathbb{R}^n} \tfrac{1}{2} p^T H_k p + \nabla f_k^T p$ s.t. $\nabla c_k^T p + c_k = 0$ (QP$_k$)

Decomposition: $p_k = Y_k p_{Y,k} + Z_k p_{Z,k}$.
Range-space step: $p_{Y,k} = -[\nabla c_k^T Y_k]^{-1} c_k$.
Reduced-space QP: $\min_{p_Z} \tfrac{1}{2} p_Z^T [Z_k^T H_k Z_k] p_Z + (\nabla f_k + H_k Y_k p_{Y,k})^T Z_k p_Z$.
Make sure $Z_k^T H_k Z_k$ is positive definite (e.g., BFGS).

190 Trust-Region SQP Method

$\min_{p \in \mathbb{R}^n} \tfrac{1}{2} p^T H_k p + \nabla f_k^T p$ s.t. $\nabla c_k^T p + c_k = 0$, $\|p\| \le \Delta_k$ (QP$_k$)

Trust-region radius $\Delta_k$, updated throughout the iterations.
No positive-definiteness requirements for $H_k$.
The step $p_k$ is accepted if $ared_k / pred_k \ge \eta$ ($\eta \in (0, 1)$), with
$pred_k = m_k(0) - m_k(p_k)$, $ared_k = \phi_\rho(x_k) - \phi_\rho(x_k + p_k)$,
where $m_k$ is a piecewise-quadratic model of $\phi_\rho(x) = f(x) + \rho \|c(x)\|_1$:
$m_k(p) = f_k + \nabla f_k^T p + \tfrac{1}{2} p^T H_k p + \rho \|c_k + \nabla c_k^T p\|_1$
Otherwise, decrease $\Delta_k$.

193 Inconsistent QPs

[figure: trust region around $x_k$ too small to reach the constraint manifold $c(x) = 0$]

If $x_k$ is not feasible and $\Delta_k$ is small, (QP$_k$) might not be feasible.
One remedy: penalize the constraint violation:
$\min_{p \in \mathbb{R}^n;\, s, t \in \mathbb{R}^m} \tfrac{1}{2} p^T H_k p + \nabla f_k^T p + \rho \sum_{j=1}^m (s_j + t_j)$
s.t. $\nabla c_k^T p + c_k = s - t$, $\|p\| \le \Delta_k$, $s, t \ge 0$

198 Fletcher's Sl$_1$QP

$\min_{p \in \mathbb{R}^n;\, s, t \in \mathbb{R}^m} \tfrac{1}{2} p^T H_k p + \nabla f_k^T p + \rho \sum_{j=1}^m (s_j + t_j)$
s.t. $\nabla c_k^T p + c_k = s - t$, $\|p\| \le \Delta_k$, $s, t \ge 0$

is equivalent to

$\min_{p \in \mathbb{R}^n} m_k(p) = f_k + \nabla f_k^T p + \tfrac{1}{2} p^T H_k p + \rho \|c_k + \nabla c_k^T p\|_1$

A natural algorithm for minimizing $\phi_\rho(x)$.
Difficulty: selecting a sufficiently large value of $\rho$. This motivated the invention of filter methods.
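
A small numerical check of this equivalence (randomly generated data): for any fixed $p$, the optimal elastic variables split the residual $r = c_k + \nabla c_k^T p$ into its positive and negative parts, so the penalty term sums to $\|r\|_1$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5
J, c, p = rng.normal(size=(m, n)), rng.normal(size=m), rng.normal(size=n)

r = J @ p + c                                 # linearized constraint residual
s, t = np.maximum(r, 0), np.maximum(-r, 0)    # optimal elastic variables
assert np.allclose(r, s - t)                  # elastic constraint holds
assert np.isclose((s + t).sum(), np.abs(r).sum())  # penalty term equals ||r||_1
```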

205 Byrd-Omojokun Trust-Region Algorithm

[figure: normal step $n_k$ towards the constraints and tangential step $t_k$ within the trust region]

Decompose the step $p_k = n_k + t_k$:
normal component $n_k$ towards feasibility;
tangential component $t_k$ towards optimality.

Normal problem: $\min_n \|\nabla c_k^T n + c_k\|_2^2$ s.t. $\|n\| \le \Delta_k$
Tangential problem: $\min_t \tfrac{1}{2}(n_k + t)^T H_k (n_k + t) + \nabla f_k^T (n_k + t)$ s.t. $\nabla c_k^T t = 0$, $\|t\|_2^2 \le \Delta_k^2 - \|n_k\|_2^2$

Subproblems can be solved inexactly.
Normal problem: dogleg method.


More information

What s New in Active-Set Methods for Nonlinear Optimization?

What s New in Active-Set Methods for Nonlinear Optimization? What s New in Active-Set Methods for Nonlinear Optimization? Philip E. Gill Advances in Numerical Computation, Manchester University, July 5, 2011 A Workshop in Honor of Sven Hammarling UCSD Center for

More information

Nonlinear Optimization Solvers

Nonlinear Optimization Solvers ICE05 Argonne, July 19, 2005 Nonlinear Optimization Solvers Sven Leyffer leyffer@mcs.anl.gov Mathematics & Computer Science Division, Argonne National Laboratory Nonlinear Optimization Methods Optimization

More information

5.5 Quadratic programming

5.5 Quadratic programming 5.5 Quadratic programming Minimize a quadratic function subject to linear constraints: 1 min x t Qx + c t x 2 s.t. a t i x b i i I (P a t i x = b i i E x R n, where Q is an n n matrix, I and E are the

More information

Nonlinear Optimization: What s important?

Nonlinear Optimization: What s important? Nonlinear Optimization: What s important? Julian Hall 10th May 2012 Convexity: convex problems A local minimizer is a global minimizer A solution of f (x) = 0 (stationary point) is a minimizer A global

More information

Algorithms for nonlinear programming problems II

Algorithms for nonlinear programming problems II Algorithms for nonlinear programming problems II Martin Branda Charles University in Prague Faculty of Mathematics and Physics Department of Probability and Mathematical Statistics Computational Aspects

More information

Trust-Region SQP Methods with Inexact Linear System Solves for Large-Scale Optimization

Trust-Region SQP Methods with Inexact Linear System Solves for Large-Scale Optimization Trust-Region SQP Methods with Inexact Linear System Solves for Large-Scale Optimization Denis Ridzal Department of Computational and Applied Mathematics Rice University, Houston, Texas dridzal@caam.rice.edu

More information

A Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity

A Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity A Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity Mohammadreza Samadi, Lehigh University joint work with Frank E. Curtis (stand-in presenter), Lehigh University

More information

Survey of NLP Algorithms. L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA

Survey of NLP Algorithms. L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA Survey of NLP Algorithms L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA NLP Algorithms - Outline Problem and Goals KKT Conditions and Variable Classification Handling

More information

Algorithms for Constrained Optimization

Algorithms for Constrained Optimization 1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic

More information

Optimization Problems with Constraints - introduction to theory, numerical Methods and applications

Optimization Problems with Constraints - introduction to theory, numerical Methods and applications Optimization Problems with Constraints - introduction to theory, numerical Methods and applications Dr. Abebe Geletu Ilmenau University of Technology Department of Simulation and Optimal Processes (SOP)

More information

Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms: A Comparative Study

Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms: A Comparative Study International Journal of Mathematics And Its Applications Vol.2 No.4 (2014), pp.47-56. ISSN: 2347-1557(online) Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms:

More information

Recent Adaptive Methods for Nonlinear Optimization

Recent Adaptive Methods for Nonlinear Optimization Recent Adaptive Methods for Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with James V. Burke (U. of Washington), Richard H. Byrd (U. of Colorado), Nicholas I. M. Gould

More information

2.3 Linear Programming

2.3 Linear Programming 2.3 Linear Programming Linear Programming (LP) is the term used to define a wide range of optimization problems in which the objective function is linear in the unknown variables and the constraints are

More information

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction

More information

A GLOBALLY CONVERGENT STABILIZED SQP METHOD

A GLOBALLY CONVERGENT STABILIZED SQP METHOD A GLOBALLY CONVERGENT STABILIZED SQP METHOD Philip E. Gill Daniel P. Robinson July 6, 2013 Abstract Sequential quadratic programming SQP methods are a popular class of methods for nonlinearly constrained

More information

POWER SYSTEMS in general are currently operating

POWER SYSTEMS in general are currently operating TO APPEAR IN IEEE TRANSACTIONS ON POWER SYSTEMS 1 Robust Optimal Power Flow Solution Using Trust Region and Interior-Point Methods Andréa A. Sousa, Geraldo L. Torres, Member IEEE, Claudio A. Cañizares,

More information

Nonlinear optimization

Nonlinear optimization Nonlinear optimization Anders Forsgren Optimization and Systems Theory Department of Mathematics Royal Institute of Technology (KTH) Stockholm, Sweden evita Winter School 2009 Geilo, Norway January 11

More information

Linear Programming: Simplex

Linear Programming: Simplex Linear Programming: Simplex Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. IMA, August 2016 Stephen Wright (UW-Madison) Linear Programming: Simplex IMA, August 2016

More information

A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES

A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES IJMMS 25:6 2001) 397 409 PII. S0161171201002290 http://ijmms.hindawi.com Hindawi Publishing Corp. A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES

More information

ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints

ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints Instructor: Prof. Kevin Ross Scribe: Nitish John October 18, 2011 1 The Basic Goal The main idea is to transform a given constrained

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

Algorithms for nonlinear programming problems II

Algorithms for nonlinear programming problems II Algorithms for nonlinear programming problems II Martin Branda Charles University Faculty of Mathematics and Physics Department of Probability and Mathematical Statistics Computational Aspects of Optimization

More information

Some new facts about sequential quadratic programming methods employing second derivatives

Some new facts about sequential quadratic programming methods employing second derivatives To appear in Optimization Methods and Software Vol. 00, No. 00, Month 20XX, 1 24 Some new facts about sequential quadratic programming methods employing second derivatives A.F. Izmailov a and M.V. Solodov

More information

A Primal-Dual Augmented Lagrangian Penalty-Interior-Point Filter Line Search Algorithm

A Primal-Dual Augmented Lagrangian Penalty-Interior-Point Filter Line Search Algorithm Journal name manuscript No. (will be inserted by the editor) A Primal-Dual Augmented Lagrangian Penalty-Interior-Point Filter Line Search Algorithm Rene Kuhlmann Christof Büsens Received: date / Accepted:

More information

Optimization. Yuh-Jye Lee. March 21, Data Science and Machine Intelligence Lab National Chiao Tung University 1 / 29

Optimization. Yuh-Jye Lee. March 21, Data Science and Machine Intelligence Lab National Chiao Tung University 1 / 29 Optimization Yuh-Jye Lee Data Science and Machine Intelligence Lab National Chiao Tung University March 21, 2017 1 / 29 You Have Learned (Unconstrained) Optimization in Your High School Let f (x) = ax

More information

On the use of piecewise linear models in nonlinear programming

On the use of piecewise linear models in nonlinear programming Math. Program., Ser. A (2013) 137:289 324 DOI 10.1007/s10107-011-0492-9 FULL LENGTH PAPER On the use of piecewise linear models in nonlinear programming Richard H. Byrd Jorge Nocedal Richard A. Waltz Yuchen

More information

An Inexact Newton Method for Nonconvex Equality Constrained Optimization

An Inexact Newton Method for Nonconvex Equality Constrained Optimization Noname manuscript No. (will be inserted by the editor) An Inexact Newton Method for Nonconvex Equality Constrained Optimization Richard H. Byrd Frank E. Curtis Jorge Nocedal Received: / Accepted: Abstract

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

5.6 Penalty method and augmented Lagrangian method

5.6 Penalty method and augmented Lagrangian method 5.6 Penalty method and augmented Lagrangian method Consider a generic NLP problem min f (x) s.t. c i (x) 0 i I c i (x) = 0 i E (1) x R n where f and the c i s are of class C 1 or C 2, and I and E are the

More information

minimize x subject to (x 2)(x 4) u,

minimize x subject to (x 2)(x 4) u, Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for

More information

Chapter 3 Numerical Methods

Chapter 3 Numerical Methods Chapter 3 Numerical Methods Part 2 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization 1 Outline 3.2 Systems of Equations 3.3 Nonlinear and Constrained Optimization Summary 2 Outline 3.2

More information

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Optimization Escuela de Ingeniería Informática de Oviedo (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Unconstrained optimization Outline 1 Unconstrained optimization 2 Constrained

More information

Constrained Optimization

Constrained Optimization 1 / 22 Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 30, 2015 2 / 22 1. Equality constraints only 1.1 Reduced gradient 1.2 Lagrange

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

On nonlinear optimization since M.J.D. Powell

On nonlinear optimization since M.J.D. Powell On nonlinear optimization since 1959 1 M.J.D. Powell Abstract: This view of the development of algorithms for nonlinear optimization is based on the research that has been of particular interest to the

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method. Ryan Tibshirani Convex Optimization /36-725 Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

More information

Lecture 15 Newton Method and Self-Concordance. October 23, 2008

Lecture 15 Newton Method and Self-Concordance. October 23, 2008 Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications

More information

Nonlinear Programming

Nonlinear Programming Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week

More information

CONSTRAINED NONLINEAR PROGRAMMING

CONSTRAINED NONLINEAR PROGRAMMING 149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach

More information

Line Search Methods for Unconstrained Optimisation

Line Search Methods for Unconstrained Optimisation Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic

More information

1. Introduction. We consider nonlinear optimization problems of the form. f(x) ce (x) = 0 c I (x) 0,

1. Introduction. We consider nonlinear optimization problems of the form. f(x) ce (x) = 0 c I (x) 0, AN INTERIOR-POINT ALGORITHM FOR LARGE-SCALE NONLINEAR OPTIMIZATION WITH INEXACT STEP COMPUTATIONS FRANK E. CURTIS, OLAF SCHENK, AND ANDREAS WÄCHTER Abstract. We present a line-search algorithm for large-scale

More information

10 Numerical methods for constrained problems

10 Numerical methods for constrained problems 10 Numerical methods for constrained problems min s.t. f(x) h(x) = 0 (l), g(x) 0 (m), x X The algorithms can be roughly divided the following way: ˆ primal methods: find descent direction keeping inside

More information

A Trust-region-based Sequential Quadratic Programming Algorithm

A Trust-region-based Sequential Quadratic Programming Algorithm Downloaded from orbit.dtu.dk on: Oct 19, 2018 A Trust-region-based Sequential Quadratic Programming Algorithm Henriksen, Lars Christian; Poulsen, Niels Kjølstad Publication date: 2010 Document Version

More information

Numerical Optimization: Basic Concepts and Algorithms

Numerical Optimization: Basic Concepts and Algorithms May 27th 2015 Numerical Optimization: Basic Concepts and Algorithms R. Duvigneau R. Duvigneau - Numerical Optimization: Basic Concepts and Algorithms 1 Outline Some basic concepts in optimization Some

More information

Higher-Order Methods

Higher-Order Methods Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth

More information

Arc Search Algorithms

Arc Search Algorithms Arc Search Algorithms Nick Henderson and Walter Murray Stanford University Institute for Computational and Mathematical Engineering November 10, 2011 Unconstrained Optimization minimize x D F (x) where

More information

March 8, 2010 MATH 408 FINAL EXAM SAMPLE

March 8, 2010 MATH 408 FINAL EXAM SAMPLE March 8, 200 MATH 408 FINAL EXAM SAMPLE EXAM OUTLINE The final exam for this course takes place in the regular course classroom (MEB 238) on Monday, March 2, 8:30-0:20 am. You may bring two-sided 8 page

More information

LINEAR AND NONLINEAR PROGRAMMING

LINEAR AND NONLINEAR PROGRAMMING LINEAR AND NONLINEAR PROGRAMMING Stephen G. Nash and Ariela Sofer George Mason University The McGraw-Hill Companies, Inc. New York St. Louis San Francisco Auckland Bogota Caracas Lisbon London Madrid Mexico

More information

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical

More information

10-725/ Optimization Midterm Exam

10-725/ Optimization Midterm Exam 10-725/36-725 Optimization Midterm Exam November 6, 2012 NAME: ANDREW ID: Instructions: This exam is 1hr 20mins long Except for a single two-sided sheet of notes, no other material or discussion is permitted

More information

Quadratic Programming

Quadratic Programming Quadratic Programming Outline Linearly constrained minimization Linear equality constraints Linear inequality constraints Quadratic objective function 2 SideBar: Matrix Spaces Four fundamental subspaces

More information

A null-space primal-dual interior-point algorithm for nonlinear optimization with nice convergence properties

A null-space primal-dual interior-point algorithm for nonlinear optimization with nice convergence properties A null-space primal-dual interior-point algorithm for nonlinear optimization with nice convergence properties Xinwei Liu and Yaxiang Yuan Abstract. We present a null-space primal-dual interior-point algorithm

More information

A Regularized Interior-Point Method for Constrained Nonlinear Least Squares

A Regularized Interior-Point Method for Constrained Nonlinear Least Squares A Regularized Interior-Point Method for Constrained Nonlinear Least Squares XII Brazilian Workshop on Continuous Optimization Abel Soares Siqueira Federal University of Paraná - Curitiba/PR - Brazil Dominique

More information

Constrained optimization: direct methods (cont.)

Constrained optimization: direct methods (cont.) Constrained optimization: direct methods (cont.) Jussi Hakanen Post-doctoral researcher jussi.hakanen@jyu.fi Direct methods Also known as methods of feasible directions Idea in a point x h, generate a

More information

Additional Exercises for Introduction to Nonlinear Optimization Amir Beck March 16, 2017

Additional Exercises for Introduction to Nonlinear Optimization Amir Beck March 16, 2017 Additional Exercises for Introduction to Nonlinear Optimization Amir Beck March 16, 2017 Chapter 1 - Mathematical Preliminaries 1.1 Let S R n. (a) Suppose that T is an open set satisfying T S. Prove that

More information

CHAPTER 2: QUADRATIC PROGRAMMING

CHAPTER 2: QUADRATIC PROGRAMMING CHAPTER 2: QUADRATIC PROGRAMMING Overview Quadratic programming (QP) problems are characterized by objective functions that are quadratic in the design variables, and linear constraints. In this sense,

More information

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno February 6, / 25 (BFG. Limited memory BFGS (L-BFGS)

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno February 6, / 25 (BFG. Limited memory BFGS (L-BFGS) Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno (BFGS) Limited memory BFGS (L-BFGS) February 6, 2014 Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb

More information

Modern Optimal Control

Modern Optimal Control Modern Optimal Control Matthew M. Peet Arizona State University Lecture 19: Stabilization via LMIs Optimization Optimization can be posed in functional form: min x F objective function : inequality constraints

More information

Newton s Method. Javier Peña Convex Optimization /36-725

Newton s Method. Javier Peña Convex Optimization /36-725 Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and

More information

Interior Point Methods. We ll discuss linear programming first, followed by three nonlinear problems. Algorithms for Linear Programming Problems

Interior Point Methods. We ll discuss linear programming first, followed by three nonlinear problems. Algorithms for Linear Programming Problems AMSC 607 / CMSC 764 Advanced Numerical Optimization Fall 2008 UNIT 3: Constrained Optimization PART 4: Introduction to Interior Point Methods Dianne P. O Leary c 2008 Interior Point Methods We ll discuss

More information

Nonlinear Programming, Elastic Mode, SQP, MPEC, MPCC, complementarity

Nonlinear Programming, Elastic Mode, SQP, MPEC, MPCC, complementarity Preprint ANL/MCS-P864-1200 ON USING THE ELASTIC MODE IN NONLINEAR PROGRAMMING APPROACHES TO MATHEMATICAL PROGRAMS WITH COMPLEMENTARITY CONSTRAINTS MIHAI ANITESCU Abstract. We investigate the possibility

More information

Examination paper for TMA4180 Optimization I

Examination paper for TMA4180 Optimization I Department of Mathematical Sciences Examination paper for TMA4180 Optimization I Academic contact during examination: Phone: Examination date: 26th May 2016 Examination time (from to): 09:00 13:00 Permitted

More information

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Coralia Cartis, University of Oxford INFOMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems Methods

More information

AM 205: lecture 18. Last time: optimization methods Today: conditions for optimality

AM 205: lecture 18. Last time: optimization methods Today: conditions for optimality AM 205: lecture 18 Last time: optimization methods Today: conditions for optimality Existence of Global Minimum For example: f (x, y) = x 2 + y 2 is coercive on R 2 (global min. at (0, 0)) f (x) = x 3

More information

1. Introduction. In this paper we discuss an algorithm for equality constrained optimization problems of the form. f(x) s.t.

1. Introduction. In this paper we discuss an algorithm for equality constrained optimization problems of the form. f(x) s.t. AN INEXACT SQP METHOD FOR EQUALITY CONSTRAINED OPTIMIZATION RICHARD H. BYRD, FRANK E. CURTIS, AND JORGE NOCEDAL Abstract. We present an algorithm for large-scale equality constrained optimization. The

More information

Interior Point Methods in Mathematical Programming

Interior Point Methods in Mathematical Programming Interior Point Methods in Mathematical Programming Clóvis C. Gonzaga Federal University of Santa Catarina, Brazil Journées en l honneur de Pierre Huard Paris, novembre 2008 01 00 11 00 000 000 000 000

More information

Constrained Optimization Theory

Constrained Optimization Theory Constrained Optimization Theory Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. IMA, August 2016 Stephen Wright (UW-Madison) Constrained Optimization Theory IMA, August

More information

CONVEXIFICATION SCHEMES FOR SQP METHODS

CONVEXIFICATION SCHEMES FOR SQP METHODS CONVEXIFICAION SCHEMES FOR SQP MEHODS Philip E. Gill Elizabeth Wong UCSD Center for Computational Mathematics echnical Report CCoM-14-06 July 18, 2014 Abstract Sequential quadratic programming (SQP) methods

More information

Interior Point Methods for Mathematical Programming

Interior Point Methods for Mathematical Programming Interior Point Methods for Mathematical Programming Clóvis C. Gonzaga Federal University of Santa Catarina, Florianópolis, Brazil EURO - 2013 Roma Our heroes Cauchy Newton Lagrange Early results Unconstrained

More information

Bindel, Spring 2017 Numerical Analysis (CS 4220) Notes for So far, we have considered unconstrained optimization problems.

Bindel, Spring 2017 Numerical Analysis (CS 4220) Notes for So far, we have considered unconstrained optimization problems. Consider constraints Notes for 2017-04-24 So far, we have considered unconstrained optimization problems. The constrained problem is minimize φ(x) s.t. x Ω where Ω R n. We usually define x in terms of

More information