A polynomial time interior point method for problems with nonconvex constraints
Oliver Hinder, Yinyu Ye
Department of Management Science and Engineering, Stanford University
June 28, 2018
The problem

- Consider the problem
      min f(x)  s.t.  a(x) ≤ 0,  x ∈ R^d,
  where f : R^d → R and a : R^d → R^m have Lipschitz first and second derivatives.
- Captures a huge variety of practical problems: drinking water networks, electricity networks, trajectory optimization, engineering design, etc.
- In the worst case, finding a global optimum takes an exponential amount of time.
- Instead we want to find an approximate local optimum, more precisely a Fritz John point.
µ-approximate Fritz John point

      a(x) < 0                                               (primal feasibility)
      ‖∇_x f(x) + ∇a(x)^T y‖_2 ≤ µ √(‖y‖_1 + 1),  y > 0      (dual feasibility)
      −y_i a_i(x)/µ ∈ [1/2, 3/2]  for all i ∈ {1, …, m}      (complementarity)

- Fritz John is a necessary condition for local optimality.
- Under the MFCQ constraint qualification, i.e., when the dual multipliers are bounded, a Fritz John point is a KKT point.

[Figure: two examples plotting the objective against the dual multipliers, contrasting Fritz John points and KKT points; local minimizers, Fritz John points, and KKT points are marked.]
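Concretely, the three conditions can be checked numerically. Below is a minimal sketch (the helper name and the toy instance min x² s.t. x − 1 ≤ 0 are ours, not the paper's code) that tests a candidate primal-dual pair:

```python
import numpy as np

def is_approx_fritz_john(grad_f, jac_a, a_val, y, mu):
    """Check the mu-approximate Fritz John conditions (hypothetical helper).

    grad_f: gradient of f at x, shape (d,)
    jac_a:  Jacobian of a at x, shape (m, d)
    a_val:  constraint values a(x), shape (m,)
    y:      dual multiplier estimates, shape (m,)
    """
    primal = np.all(a_val < 0)
    dual = (np.linalg.norm(grad_f + jac_a.T @ y)
            <= mu * np.sqrt(np.linalg.norm(y, 1) + 1)) and np.all(y > 0)
    ratio = -y * a_val / mu
    comp = np.all((0.5 <= ratio) & (ratio <= 1.5))
    return bool(primal and dual and comp)

# Toy instance: min x^2 s.t. x - 1 <= 0.  At x = 0 with y = mu the pair
# satisfies all three conditions (grad f = 0, a(x) = -1, -y*a/mu = 1).
mu = 1e-2
print(is_approx_fritz_john(np.array([0.0]), np.array([[1.0]]),
                           np.array([-1.0]), np.array([mu]), mu))  # True
```

The check mirrors the displayed conditions term by term; in a real solver the gradients and Jacobians would come from the model's oracle rather than being passed in by hand.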
How do we solve this problem?

- One popular approach: move the constraints into a log barrier,
      ψ_µ(x) := f(x) − µ Σ_{i=1}^m log(−a_i(x)),
  and apply (modified) Newton's method to find an approximate local optimum.
- Used for real systems, e.g., drinking water networks, electricity networks, trajectory control, etc.
- Examples of practical codes using this method include IPOPT, KNITRO, LOQO, One-Phase (my code), etc.
- Local superlinear convergence is known.
- Global convergence: most results show convergence in the limit but provide no explicit runtime bound; implicitly, the runtime bound is exponential in 1/µ.
- Goal: a runtime polynomial in 1/µ for finding a µ-approximate Fritz John point.
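The barrier is easy to state in code. The sketch below (our illustration; the example f and a are ours, not the paper's) shows how ψ_µ equals f plus the log-barrier term inside the strict interior and is +∞ outside it:

```python
import numpy as np

# Log barrier for min f(x) s.t. a(x) <= 0 (a minimal sketch with an
# illustrative objective and a single linear constraint).
def f(x):
    return x[0]**2 + x[1]**2

def a(x):
    return np.array([x[0] + x[1] - 1.0])   # constraint: x0 + x1 - 1 <= 0

def psi(x, mu):
    ax = a(x)
    if np.any(ax >= 0):
        return np.inf                       # +inf outside the strict interior
    return f(x) - mu * np.sum(np.log(-ax))

print(psi(np.array([0.1, 0.1]), 0.1))       # ≈ 0.0423: strictly feasible point
print(psi(np.array([1.0, 1.0]), 0.1))       # inf: infeasible point
```

As µ → 0 the barrier term's influence shrinks, but its derivatives blow up near the boundary, which is exactly why unconstrained complexity results do not apply directly.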
Brief history of IPM theory

- Linear programming: birth of interior point methods [Karmarkar, 1984]; log barrier + Newton [Renegar, 1988]. Complexity: O(√m log(1/ε)). Successful implementations: MOSEK, GUROBI, … Good theoretical understanding.
- Conic optimization: self-concordant barriers [Nesterov, Nemirovskii, 1994]. Complexity: O(√ν log(1/ε)). Successful implementations: SEDUMI, MOSEK, GUROBI, …
- General objective/constraints: successful implementations: LOQO, KNITRO, IPOPT, … Less theoretical understanding.
Literature review: nonconvex optimization

- Unconstrained. Goal: find a point with ‖∇f(x)‖_2 ≤ ε.
  - Gradient descent: O(ε^{-2}).
  - Cubic regularization: O(ε^{-3/2}) [Nesterov, Polyak, 2006].
  - These are the best achievable runtimes (of any method) with ∇f and ∇²f Lipschitz, respectively [Carmon, Duchi, Hinder, Sidford, 2017].
- Nonconvex objective, linear inequality constraints. Affine scaling IPM [Ye (1998); Bian, Chen, and Ye (2015); Haeser, Liu, and Ye (2017)] has O(ε^{-3/2}) runtime to find KKT points.
- Nonconvex objective and constraints.
  - Cubic regularization applied to a quadratic penalty function [Birgin et al., 2016] has O(ε^{-2}) runtime.
  - Sequential linear programming [Cartis et al., 2015] has O(ε^{-2}) runtime.
  - Both of these methods plug directly into unconstrained optimization machinery, e.g., they need their penalty functions to have Lipschitz derivatives.
  - The log barrier does not have Lipschitz derivatives!
Our IPM for nonconvex optimization

[Figure: one iteration of the IPM on the graph of the barrier ψ_µ(x) over the feasible region: a trust region of radius r around the current point x, the step d_x, and the new point.]

Reminder: ψ_µ(x) := f(x) − µ Σ log(−a_i(x)).

In each iteration of our IPM we:
 1. Pick trust region size r ∝ µ^{3/4} (1 + ‖y‖_1)^{-1/2}.
 2. Compute direction d_x by solving the trust region problem
        d_x ∈ argmin_{u : ‖u‖_2 ≤ r}  (1/2) u^T ∇²ψ_µ(x) u + u^T ∇ψ_µ(x).
 3. Pick step size α ∈ (0, 1] to ensure x_new = x + α d_x satisfies a(x_new) < 0.
 4. Is the new point a Fritz John point? If not, return to step one.

Theoretical properties?
Nonconvex result

Theorem. Assume:
- Strictly feasible initial point x^(0).
- ∇f, ∇a, ∇²f, ∇²a are Lipschitz.
Then our IPM takes at most
      O( (ψ_µ(x^(0)) − inf_z ψ_µ(z)) µ^{-7/4} )
trust region steps to terminate with a µ-approximate second-order Fritz John point (x⁺, y⁺).

algorithm            | # iterations | primitive        | evaluates
---------------------|--------------|------------------|----------
Birgin et al., 2016  | O(µ^{-3})    | vector operation | ∇
Cartis et al., 2011  | O(µ^{-2})    | linear program   | ∇
Birgin et al., 2016  | O(µ^{-2})    | KKT of NCQP      | ∇, ∇²
IPM (this paper)     | O(µ^{-7/4})  | linear system    | ∇, ∇²
Convex result

Modify our IPM to use a decreasing sequence of barrier parameters, i.e., µ^(j) with µ^(j) → 0.

Theorem. Assume:
- Strictly feasible initial point x^(0), and the feasible region is bounded.
- f, ∇f, ∇a, ∇²f, ∇²a are Lipschitz.
- Slater's condition holds.
Then our IPM takes at most Õ(m^{1/3} ε^{-2/3}) trust region steps to terminate with an ε-optimal solution.
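A minimal sketch of the decreasing-µ scheme (our toy instance min x s.t. −x ≤ 0, whose optimum is x* = 0; the halving schedule and damped Newton inner solver are illustrative, not the paper's exact method): each barrier problem is approximately solved, then µ is reduced.

```python
import math

# For min x s.t. -x <= 0 the barrier x - mu*log(x) has minimizer x = mu,
# so driving mu -> 0 drives the iterate toward the optimum x* = 0.
def newton_on_barrier(x, mu, iters=20):
    for _ in range(iters):
        g = 1.0 - mu / x            # psi_mu'(x)
        h = mu / x**2               # psi_mu''(x)
        step = -g / h
        while x + step <= 0:        # damp to keep x strictly feasible
            step *= 0.5
        x += step
    return x

x, mu = 1.0, 1.0
for _ in range(10):                 # mu^(j) -> 0 by halving
    x = newton_on_barrier(x, mu)
    mu /= 2.0

print(x)  # tracks the barrier minimizers: close to the optimum x* = 0
```

Each stage warm-starts from the previous barrier minimizer, which is the standard path-following idea; the theorem quantifies how many trust region steps the whole schedule needs.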
Questions?
One-phase results