Algorithms for nonlinear programming problems II Martin Branda Charles University Faculty of Mathematics and Physics Department of Probability and Mathematical Statistics Computational Aspects of Optimization Martin Branda (KPMS MFF UK) 2017-05-15 1 / 34
Algorithm classification
Order of derivatives¹: derivative-free, first-order (gradient), second-order (Newton)
Feasibility of the constructed points: interior and exterior point methods
Deterministic / randomized
Local / global
¹ If possible, deliver the derivatives explicitly.
Contents
1 Interior point method and barrier functions
2 Sequential Quadratic Programming
3 Other topics
Barrier method
Let f : Rⁿ → R, g_j : Rⁿ → R,
min_x f(x) s.t. g_j(x) ≤ 0, j = 1, …, m.
(Extension including equality constraints is straightforward.)
Assumption: the strict interior {x : g_j(x) < 0, j = 1, …, m} is nonempty.
Barrier method
Barrier functions: continuous functions B with the following properties:
B(y) ≥ 0 for y < 0, lim_{y→0⁻} B(y) = ∞.
Examples: −1/y, −log min{1, −y}.
Maybe the most popular barrier function:
B(y) = −log(−y).
Set
𝓑(x) = Σ_{j=1}^m B(g_j(x))
and solve
min_x f(x) + μ 𝓑(x), (1)
where μ > 0 is a parameter.
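As a concrete illustration of (1), not taken from the slides: for min x s.t. 1 − x ≤ 0 with B(y) = −log(−y), the barrier minimizer is x(μ) = 1 + μ, which tends to the constrained optimum x* = 1 as μ → 0. A minimal Python sketch; the golden-section search is just a stand-in for any unconstrained minimizer, and all names are mine:

```python
import math

def barrier_obj(x, mu):
    # f(x) = x subject to g(x) = 1 - x <= 0, barrier B(g(x)) = -log(x - 1)
    return x + mu * (-math.log(x - 1.0))

def minimize_barrier(mu, lo=1.0 + 1e-12, hi=10.0, iters=200):
    # golden-section search on (lo, hi); the exact minimizer is x = 1 + mu
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if barrier_obj(c, mu) < barrier_obj(d, mu):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

for mu in (1.0, 0.1, 0.01):
    print(mu, minimize_barrier(mu))  # x(mu) ≈ 1 + mu, approaching x* = 1
```

Decreasing μ traces out the central path toward the constrained solution, which is exactly the scheme formalized in the interior point algorithm below.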
Interior point methods
Let f : Rⁿ → R, g_j : Rⁿ → R,
min_x f(x) s.t. g_j(x) ≤ 0, j = 1, …, m.
(Extension including equality constraints is straightforward.)
Introduce new slack decision variables s ∈ Rᵐ:
min_{x,s} f(x) s.t. g_j(x) + s_j = 0, s_j ≥ 0, j = 1, …, m. (2)
Interior point methods
Barrier problem
min_{x,s} f(x) − μ Σ_{j=1}^m log s_j s.t. g(x) + s = 0. (3)
The barrier term prevents the components of s from becoming too close to zero.
Lagrangian function
L(x, s, z) = f(x) − μ Σ_{j=1}^m log s_j − Σ_{j=1}^m z_j (g_j(x) + s_j).
Interior point methods
KKT conditions for the barrier problem (matrix notation):
∇f(x) − ∇g(x)ᵀ z = 0,
−μ S⁻¹ e − z = 0,
g(x) + s = 0,
where S = diag{s₁, …, s_m}, Z = diag{z₁, …, z_m}, e = (1, …, 1)ᵀ, and ∇g(x) is the Jacobian matrix of g (the j-th row is ∇g_j(x)ᵀ).
Multiply the second equality by S:
∇f(x) − ∇g(x)ᵀ z = 0,
−SZe − μe = 0,
g(x) + s = 0,
⇒ a nonlinear system of equations ⇒ Newton's method.
Newton's method
Linearize:
∇f(x + v) ≈ ∇f(x) + ∇²f(x) v = 0,
with the solution (provided ∇²f(x) ≻ 0)
v = −(∇²f(x))⁻¹ ∇f(x).
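A minimal sketch of this iteration in Python, applied to the unconstrained objective that reappears in the example below; the function names and stopping rule are my own choices, not from the slides:

```python
import numpy as np

def newton(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton's method for the stationarity condition grad(x) = 0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # Newton step: v = -(hess(x))^{-1} grad(x)
        v = np.linalg.solve(hess(x), -g)
        x = x + v
    return x

# f(x) = (x1 - 2)^4 + (x1 - 2*x2)^2, the slide example without its constraint
grad = lambda x: np.array([4*(x[0]-2)**3 + 2*(x[0]-2*x[1]),
                           -4*(x[0]-2*x[1])])
hess = lambda x: np.array([[12*(x[0]-2)**2 + 2, -4.0],
                           [-4.0, 8.0]])
x_star = newton(grad, hess, [0.0, 0.0])
print(x_star)  # ≈ (2, 1), the unconstrained minimizer
```

Because the quartic term makes the Hessian singular at the solution, convergence here is only linear; on problems with positive definite Hessians at the solution Newton's method converges quadratically.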
Interior point methods
Use Newton's method to obtain a step (Δx, Δs, Δz):
[ H(x,z)   0    −∇g(x)ᵀ ] [ Δx ]   [ −∇f(x) + ∇g(x)ᵀ z ]
[   0     −Z      −S    ] [ Δs ] = [      SZe + μe      ]
[ ∇g(x)    I       0    ] [ Δz ]   [     −g(x) − s      ]
H(x,z) = ∇²f(x) − Σ_{j=1}^m z_j ∇²g_j(x),
where ∇² denotes the Hessian matrix.
Interior point methods
Stopping criterion:
E = max{ ‖∇f(x) − ∇g(x)ᵀ z‖, ‖SZe + μe‖, ‖g(x) + s‖ } ≤ ε,
with ε > 0 small.
Interior point methods
ALGORITHM:
0. Choose x⁰ and s⁰ > 0, and compute initial values for the multipliers z⁰ > 0. Select an initial barrier parameter μ₀ > 0 and a reduction parameter σ ∈ (0, 1); set k = 1.
1. Repeat until a stopping test for the original nonlinear program is satisfied:
Solve the nonlinear system of equations using Newton's method and obtain (x^k, s^k, z^k).
Decrease the barrier parameter, μ_{k+1} = σ μ_k, and set k = k + 1.
Interior point methods
Convergence of the method (Nocedal and Wright 2006, Theorem 19.1): if f and g_j are continuously differentiable and LICQ holds at every limit point, then the limit points are stationary points of the original problem.
Interior point method: Example
min_x (x₁ − 2)⁴ + (x₁ − 2x₂)² s.t. x₁² − x₂ ≤ 0. (4)
min_{x,s} (x₁ − 2)⁴ + (x₁ − 2x₂)² s.t. x₁² − x₂ + s = 0, s ≥ 0. (5)
min_{x,s} (x₁ − 2)⁴ + (x₁ − 2x₂)² − μ log s s.t. x₁² − x₂ + s = 0. (6)
Interior point method: Example
Lagrange function
L(x₁, x₂, s, z) = (x₁ − 2)⁴ + (x₁ − 2x₂)² − μ log s − z(x₁² − x₂ + s).
Optimality conditions together with feasibility:
∂L/∂x₁ = 4(x₁ − 2)³ + 2(x₁ − 2x₂) − 2zx₁ = 0,
∂L/∂x₂ = −4(x₁ − 2x₂) + z = 0,
∂L/∂s = −μ/s − z = 0,
x₁² − x₂ + s = 0. (7)
We have obtained 4 equations in 4 variables...
Interior point method: Example
Slight modification (multiply the third condition by s):
4(x₁ − 2)³ + 2(x₁ − 2x₂) − 2zx₁ = 0, (8)
−4(x₁ − 2x₂) + z = 0, (9)
−sz − μ = 0, (10)
x₁² − x₂ + s = 0. (11)
Necessary derivatives:
H(x₁, x₂, z) = [ 12(x₁ − 2)² + 2 − 2z   −4 ]
               [ −4                      8 ] (12)
∇g(x) = ( 2x₁  −1 ) (13)
Interior point method: Example
System of linear equations for the Newton step:
[ 12(x₁−2)² + 2 − 2z   −4    0   −2x₁ ] [ Δx₁ ]   [ −4(x₁−2)³ − 2(x₁−2x₂) + 2zx₁ ]
[ −4                    8    0    1   ] [ Δx₂ ] = [        4(x₁−2x₂) − z          ]
[  0                    0   −z   −s   ] [ Δs  ]   [           sz + μ              ]
[  2x₁                 −1    1    0   ] [ Δz  ]   [       −x₁² + x₂ − s           ]
Interior point method: Example
Starting point x⁰ = (0, 1), z⁰ = 1, s⁰ = 1, μ > 0; then the step solves
[ 48   −4    0    0 ] [ Δx₁ ]   [  36   ]
[ −4    8    0    1 ] [ Δx₂ ] = [  −9   ]
[  0    0   −1   −1 ] [ Δs  ]   [ 1 + μ ]
[  0   −1    1    0 ] [ Δz  ]   [   0   ]
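The system above can be checked numerically. The sketch below (my own code, not from the slides) assembles the residuals of equations (8)–(11) and their Jacobian, and computes the first Newton step from the slide's starting point with μ = 1:

```python
import numpy as np

def F(w, mu):
    # residuals of equations (8)-(11); w = (x1, x2, s, z)
    x1, x2, s, z = w
    return np.array([4*(x1-2)**3 + 2*(x1-2*x2) - 2*z*x1,
                     -4*(x1-2*x2) + z,
                     -s*z - mu,
                     x1**2 - x2 + s])

def J(w):
    # Jacobian of F (it does not depend on mu)
    x1, x2, s, z = w
    return np.array([[12*(x1-2)**2 + 2 - 2*z, -4.0, 0.0, -2*x1],
                     [-4.0,                    8.0, 0.0,   1.0],
                     [ 0.0,                    0.0,  -z,    -s],
                     [ 2*x1,                  -1.0, 1.0,   0.0]])

mu = 1.0
w0 = np.array([0.0, 1.0, 1.0, 1.0])          # (x1, x2, s, z) from the slide
step = np.linalg.solve(J(w0), -F(w0, mu))
print(step)  # -> [ 0.7 -0.6 -0.6 -1.4 ]
```

The right-hand side −F(w0, μ) reproduces the slide's vector (36, −9, 1 + μ, 0), and the resulting step keeps s = 1 − 0.6 = 0.4 strictly positive.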
Interior point method: Example
[Figure: contour plot of the objective with the feasible region and the iterates, x₁ ∈ [0, 1.4], x₂ ∈ [0, 2].]
Interior point method for LP
Nocedal and Wright (2006), Chapter 14: Let A ∈ R^{m×n} have full row rank,
(P) min cᵀx s.t. Ax = b, x ≥ 0,
(D) max bᵀλ s.t. Aᵀλ + s = c, s ≥ 0. (14)
Lagrangian function
L(x, λ, s) = cᵀx − λᵀ(Ax − b) − sᵀx.
KKT optimality conditions
Aᵀλ + s = c,
Ax = b,
sᵀx = 0,
x ≥ 0, s ≥ 0. (15)
Interior point method for LP
Define the mapping
F(x, λ, s) = [ Aᵀλ + s − c ; Ax − b ; XSe ] = 0, (16)
under x ≥ 0, s ≥ 0, where S = diag{s₁, …, s_n}, X = diag{x₁, …, x_n}.
To obtain a step, solve
∇F(x, λ, s) [Δx ; Δλ ; Δs] = −F(x, λ, s), (17)
under x ≥ 0, s ≥ 0, leading to
[ 0   Aᵀ   I ] [ Δx ]   [ −Aᵀλ − s + c ]
[ A    0   0 ] [ Δλ ] = [   −Ax + b    ]
[ S    0   X ] [ Δs ]   [    −XSe      ] (18)
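A sketch of assembling system (18) and computing the Newton direction for a tiny LP; the problem instance and the starting point are my own, chosen for illustration, and a full solver would still add a step length keeping (x, s) positive:

```python
import numpy as np

# Assumed example: min x1 + 2 x2  s.t.  x1 + x2 = 1, x >= 0 (optimum (1, 0))
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])

# A strictly feasible starting point for the primal, dual chosen consistently
x = np.array([0.5, 0.5])
lam = np.array([0.0])
s = c - A.T @ lam                 # makes A^T lam + s - c = 0, s = (1, 2) > 0

def newton_direction(x, lam, s):
    n, m = x.size, b.size
    X, S = np.diag(x), np.diag(s)
    e = np.ones(n)
    # block KKT matrix of system (18)
    K = np.block([
        [np.zeros((n, n)), A.T,              np.eye(n)],
        [A,                np.zeros((m, m)), np.zeros((m, n))],
        [S,                np.zeros((n, m)), X],
    ])
    rhs = np.concatenate([-(A.T @ lam + s - c), -(A @ x - b), -X @ S @ e])
    return np.linalg.solve(K, rhs)

d = newton_direction(x, lam, s)
dx, dlam, ds = d[:2], d[2:3], d[3:]
print(dx, dlam, ds)  # dx ≈ [1/6, -1/6]: moves x2 toward its bound 0
```

The direction pushes x₂ toward zero and x₁ up, i.e. toward the vertex solution of the LP.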
Interior point method for LP
Barrier problem
min cᵀx − μ Σ_{j=1}^n log x_j s.t. Ax = b. (19)
KKT conditions: c − μX⁻¹e − Aᵀλ = 0 and Ax = b; setting s = μX⁻¹e > 0 recovers the system (16) with the complementarity condition relaxed to XSe = μe.
Contents
1 Interior point method and barrier functions
2 Sequential Quadratic Programming
3 Other topics
Sequential Quadratic Programming
Nocedal and Wright (2006): Let f, h_i : Rⁿ → R be smooth functions,
min_x f(x) s.t. h_i(x) = 0, i = 1, …, l. (20)
Lagrange function
L(x, v) = f(x) + Σ_{i=1}^l v_i h_i(x)
and KKT optimality conditions
∇_x L(x, v) = ∇f(x) + A(x)ᵀv = 0,
h(x) = 0, (21)
where h(x)ᵀ = (h₁(x), …, h_l(x)) and A(x)ᵀ = [∇h₁(x), ∇h₂(x), …, ∇h_l(x)], i.e. A(x) denotes the Jacobian matrix of h.
Sequential Quadratic Programming
We have a system of n + l equations in the n + l unknowns x and v:
F(x, v) = [ ∇f(x) + A(x)ᵀv ; h(x) ] = 0. (22)
ASS. (LICQ): The Jacobian matrix A(x) has full row rank.
The Jacobian of F is given by
∇F(x, v) = [ ∇²_xx L(x, v)   A(x)ᵀ ]
           [ A(x)            0     ] (23)
We can use Newton's method.
Sequential Quadratic Programming
Setting ∇f_k = ∇f(x^k), ∇²_xx L_k = ∇²_xx L(x^k, v^k), A_k = A(x^k), h_k = h(x^k), we obtain the Newton step by solving the system
[ ∇²_xx L_k   A_kᵀ ] [ p_x ]   [ −∇f_k − A_kᵀ v_k ]
[ A_k          0   ] [ p_v ] = [       −h_k        ] (24)
Then we set x^{k+1} = x^k + p_x and v^{k+1} = v^k + p_v.
Sequential Quadratic Programming
ASS. (SOSC): For all d ∈ {d ≠ 0 : A(x)d = 0}, it holds that dᵀ ∇²_xx L(x, v) d > 0.
Sequential Quadratic Programming
An important alternative way to see the Newton iterations: consider the quadratic program
min_p f_k + pᵀ∇f_k + ½ pᵀ ∇²_xx L_k p s.t. h_k + A_k p = 0. (25)
KKT optimality conditions:
∇²_xx L_k p + ∇f_k + A_kᵀ ṽ = 0,
h_k + A_k p = 0, (26)
i.e.
[ ∇²_xx L_k   A_kᵀ ] [ p ]   [ −∇f_k ]
[ A_k          0   ] [ ṽ ] = [  −h_k  ] (27)
which is the same as the Newton system after the substitution ṽ = v_k + p_v, i.e. after adding A_kᵀ v_k to the first equation. Then we set x^{k+1} = x^k + p and v^{k+1} = ṽ.
Sequential Quadratic Programming
Algorithm: Start with an initial solution (x⁰, v⁰) and iterate until a convergence criterion is met:
1. Evaluate ∇f_k = ∇f(x^k), h_k = h(x^k), A_k = A(x^k), ∇²_xx L_k = ∇²_xx L(x^k, v^k).
2. Solve the Newton system OR the quadratic program to obtain the new iterate (x^{k+1}, v^{k+1}).
If possible, deliver explicit formulas for the first- and second-order derivatives.
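A compact sketch of these two steps in Python, on a small assumed problem min x₁² + x₂² s.t. x₁² + x₂ − 1 = 0; the instance, the names, and the fixed iteration count are mine, and no globalization (line search, trust region) is included:

```python
import numpy as np

# Assumed example: min x1^2 + x2^2  s.t.  h(x) = x1^2 + x2 - 1 = 0
f_grad    = lambda x: np.array([2*x[0], 2*x[1]])
h         = lambda x: np.array([x[0]**2 + x[1] - 1])
A         = lambda x: np.array([[2*x[0], 1.0]])        # Jacobian of h
lagr_hess = lambda x, v: np.array([[2 + 2*v[0], 0.0],  # Hessian of L = f + v*h
                                   [0.0,        2.0]])

def sqp(x, v, iters=20):
    for _ in range(iters):
        Lxx, Ak = lagr_hess(x, v), A(x)
        n, l = x.size, v.size
        # Newton/KKT system (24)
        K = np.block([[Lxx, Ak.T], [Ak, np.zeros((l, l))]])
        rhs = -np.concatenate([f_grad(x) + Ak.T @ v, h(x)])
        p = np.linalg.solve(K, rhs)
        x, v = x + p[:n], v + p[n:]
    return x, v

x, v = sqp(np.array([1.0, 0.5]), np.array([0.0]))
print(x, v)  # x ≈ (0.7071, 0.5), v ≈ -1
```

From this starting point the iterates converge to the constrained minimizer (1/√2, 1/2) with multiplier v = −1; the feasibility residual h(x) and the stationarity residual both vanish at the limit.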
Sequential Quadratic Programming
Let f, g_j, h_i : Rⁿ → R be smooth. We use a quadratic approximation of the objective function and linearize the constraints, p ∈ Rⁿ:
min_p f(x^k) + pᵀ∇_x f(x^k) + ½ pᵀ ∇²_xx L(x^k, u^k, v^k) p
s.t. g_j(x^k) + pᵀ∇_x g_j(x^k) ≤ 0, j = 1, …, m,
h_i(x^k) + pᵀ∇_x h_i(x^k) = 0, i = 1, …, l. (28)
Use an algorithm for quadratic programming to solve this problem and set x^{k+1} = x^k + p^k, where u^{k+1}, v^{k+1} are the Lagrange multipliers of the quadratic problem, which are used to compute the new ∇²_xx L.
Convergence: Nocedal and Wright (2006), Theorem 18.1.
Contents
1 Interior point method and barrier functions
2 Sequential Quadratic Programming
3 Other topics
Other topics
Conjugate gradient method
Nocedal and Wright (2006), Chapter 5: Consider the (unconstrained) quadratic programming problem
min ½ xᵀAx − bᵀx.
Let A ∈ R^{n×n} be a symmetric positive definite matrix. We say that vectors p¹, …, pⁿ are conjugate with respect to A if
(pⁱ)ᵀ A pʲ = 0 for all i ≠ j.
If we set x^{k+1} = x^k + α_k p^k, where
r^k = Ax^k − b, α_k = −(r^k)ᵀp^k / ((p^k)ᵀA p^k), (29)
then x^{n+1} is an optimal solution.
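A sketch of the iteration in Python. The slide does not show how the conjugate directions are generated; the update below uses the standard residual-based formula from Nocedal and Wright, Chapter 5, starting from the steepest-descent direction:

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10):
    """CG for min 1/2 x^T A x - b^T x with A symmetric positive definite."""
    n = b.size
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    r = A @ x - b                        # r^k = A x^k - b (the gradient)
    p = -r                               # first direction: steepest descent
    for _ in range(n):                   # exact solution in at most n steps
        if np.linalg.norm(r) < tol:
            break
        alpha = -(r @ p) / (p @ A @ p)   # step length alpha_k from (29)
        x = x + alpha * p
        r_new = A @ x - b
        beta = (r_new @ r_new) / (r @ r) # keeps the directions A-conjugate
        p = -r_new + beta * p
        r = r_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
print(x)  # solves A x = b; here x ≈ (1/11, 7/11)
```

Since minimizing ½xᵀAx − bᵀx is equivalent to solving Ax = b, the result can be checked against a direct linear solve; for this 2×2 system CG terminates after two steps.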
Other topics
Mathematica Solver Decision Tree
Literature
Bazaraa, M.S., Sherali, H.D., Shetty, C.M. (2006). Nonlinear Programming: Theory and Algorithms. Wiley, Singapore, 3rd edition.
Boyd, S., Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press, Cambridge.
Kall, P., Mayer, J. (2005). Stochastic Linear Programming: Models, Theory, and Computation. Springer.
Nocedal, J., Wright, S.J. (2006). Numerical Optimization. Springer, New York, 2nd edition.