A polynomial time interior point method for problems with nonconvex constraints
Oliver Hinder, Yinyu Ye
Department of Management Science and Engineering, Stanford University
June 28, 2018
The problem

- Consider the problem
      min f(x)  s.t.  a(x) ≤ 0,  x ∈ R^d,
  where f : R^d → R and a : R^d → R^m have Lipschitz first and second derivatives.
- Captures a huge variety of practical problems: drinking water networks, electricity networks, trajectory optimization, engineering design, etc.
- In the worst case, finding a global optimum takes an exponential amount of time.
- Instead we want to find an approximate local optimum, more precisely a Fritz John point.
µ-approximate Fritz John point

      a(x) < 0                                               (primal feasibility)
      ‖∇_x f(x) + ∇a(x)^T y‖_2 ≤ µ √(‖y‖_1 + 1),  y > 0      (dual feasibility)
      −y_i a_i(x)/µ ∈ [1/2, 3/2]  for all i ∈ {1, …, m}      (complementarity)

- Fritz John is a necessary condition for local optimality.
- Under the MFCQ constraint qualification, i.e., when the dual multipliers are bounded, a Fritz John point is a KKT point.

[Figure: two examples plotting the objective against the dual multipliers, contrasting Fritz John points and KKT points; local minimizers, Fritz John points, and KKT points are marked.]
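Concretely, the three conditions can be checked numerically. Below is a minimal sketch (the helper name and the toy instance min x² s.t. x − 1 ≤ 0 are ours, not the paper's code) that tests a candidate primal-dual pair:

```python
import numpy as np

def is_approx_fritz_john(grad_f, jac_a, a_val, y, mu):
    """Check the mu-approximate Fritz John conditions (hypothetical helper).

    grad_f: gradient of f at x, shape (d,)
    jac_a:  Jacobian of a at x, shape (m, d)
    a_val:  constraint values a(x), shape (m,)
    y:      dual multiplier estimates, shape (m,)
    """
    primal = np.all(a_val < 0)
    dual = (np.linalg.norm(grad_f + jac_a.T @ y)
            <= mu * np.sqrt(np.linalg.norm(y, 1) + 1)) and np.all(y > 0)
    ratio = -y * a_val / mu
    comp = np.all((0.5 <= ratio) & (ratio <= 1.5))
    return bool(primal and dual and comp)

# Toy instance: min x^2 s.t. x - 1 <= 0.  At x = 0 with y = mu the pair
# satisfies all three conditions (grad f = 0, a(x) = -1, -y*a/mu = 1).
mu = 1e-2
print(is_approx_fritz_john(np.array([0.0]), np.array([[1.0]]),
                           np.array([-1.0]), np.array([mu]), mu))  # True
```

The check mirrors the displayed conditions term by term; in a real solver the gradients and Jacobians would come from the model's oracle rather than being passed in by hand.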
How do we solve this problem?

- One popular approach: move the constraints into a log barrier,
      ψ_µ(x) := f(x) − µ Σ_{i=1}^m log(−a_i(x)),
  and apply (modified) Newton's method to find an approximate local optimum.
- Used for real systems, e.g., drinking water networks, electricity networks, trajectory control, etc.
- Examples of practical codes using this method include IPOPT, KNITRO, LOQO, One-Phase (my code), etc.
- Local superlinear convergence is known.
- Global convergence: most results show convergence in the limit but provide no explicit runtime bound; implicitly, the runtime bound is exponential in 1/µ.
- Goal: a runtime polynomial in 1/µ for finding a µ-approximate Fritz John point.
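The barrier is easy to state in code. The sketch below (our illustration; the example f and a are ours, not the paper's) shows how ψ_µ equals f plus the log-barrier term inside the strict interior and is +∞ outside it:

```python
import numpy as np

# Log barrier for min f(x) s.t. a(x) <= 0 (a minimal sketch with an
# illustrative objective and a single linear constraint).
def f(x):
    return x[0]**2 + x[1]**2

def a(x):
    return np.array([x[0] + x[1] - 1.0])   # constraint: x0 + x1 - 1 <= 0

def psi(x, mu):
    ax = a(x)
    if np.any(ax >= 0):
        return np.inf                       # +inf outside the strict interior
    return f(x) - mu * np.sum(np.log(-ax))

print(psi(np.array([0.1, 0.1]), 0.1))       # ≈ 0.0423: strictly feasible point
print(psi(np.array([1.0, 1.0]), 0.1))       # inf: infeasible point
```

As µ → 0 the barrier term's influence shrinks, but its derivatives blow up near the boundary, which is exactly why unconstrained complexity results do not apply directly.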
Brief history of IPM theory

- Linear programming: birth of interior point methods [Karmarkar, 1984]; log barrier + Newton [Renegar, 1988]. Complexity: O(√m log(1/ε)). Successful implementations: MOSEK, GUROBI, … Good theoretical understanding.
- Conic optimization: self-concordant barriers [Nesterov, Nemirovskii, 1994]. Complexity: O(√ν log(1/ε)). Successful implementations: SEDUMI, MOSEK, GUROBI, …
- General objective/constraints: successful implementations: LOQO, KNITRO, IPOPT, … Less theoretical understanding.
Literature review: nonconvex optimization

- Unconstrained. Goal: find a point with ‖∇f(x)‖_2 ≤ ε.
  - Gradient descent: O(ε^{-2}).
  - Cubic regularization: O(ε^{-3/2}) [Nesterov, Polyak, 2006].
  - These are the best achievable runtimes (of any method) with ∇f and ∇²f Lipschitz, respectively [Carmon, Duchi, Hinder, Sidford, 2017].
- Nonconvex objective, linear inequality constraints. Affine scaling IPM [Ye (1998); Bian, Chen, and Ye (2015); Haeser, Liu, and Ye (2017)] has O(ε^{-3/2}) runtime to find KKT points.
- Nonconvex objective and constraints.
  - Cubic regularization applied to a quadratic penalty function [Birgin et al., 2016] has O(ε^{-2}) runtime.
  - Sequential linear programming [Cartis et al., 2015] has O(ε^{-2}) runtime.
  - Both of these methods plug directly into unconstrained optimization machinery, e.g., they need their penalty functions to have Lipschitz derivatives.
  - The log barrier does not have Lipschitz derivatives!
Our IPM for nonconvex optimization

[Figure: one iteration of the IPM on the graph of the barrier ψ_µ(x) over the feasible region: a trust region of radius r around the current point x, the step d_x, and the new point.]

Reminder: ψ_µ(x) := f(x) − µ Σ log(−a_i(x)).

In each iteration of our IPM we:
 1. Pick trust region size r ∝ µ^{3/4} (1 + ‖y‖_1)^{-1/2}.
 2. Compute direction d_x by solving the trust region problem
        d_x ∈ argmin_{u : ‖u‖_2 ≤ r}  (1/2) u^T ∇²ψ_µ(x) u + u^T ∇ψ_µ(x).
 3. Pick step size α ∈ (0, 1] to ensure x_new = x + α d_x satisfies a(x_new) < 0.
 4. Is the new point a Fritz John point? If not, return to step one.

Theoretical properties?
Nonconvex result

Theorem. Assume:
- Strictly feasible initial point x^(0).
- ∇f, ∇a, ∇²f, ∇²a are Lipschitz.
Then our IPM takes at most
      O( (ψ_µ(x^(0)) − inf_z ψ_µ(z)) µ^{-7/4} )
trust region steps to terminate with a µ-approximate second-order Fritz John point (x⁺, y⁺).

algorithm            | # iterations | primitive        | evaluates
---------------------|--------------|------------------|----------
Birgin et al., 2016  | O(µ^{-3})    | vector operation | ∇
Cartis et al., 2011  | O(µ^{-2})    | linear program   | ∇
Birgin et al., 2016  | O(µ^{-2})    | KKT of NCQP      | ∇, ∇²
IPM (this paper)     | O(µ^{-7/4})  | linear system    | ∇, ∇²
Convex result

Modify our IPM to use a decreasing sequence of barrier parameters, i.e., µ^(j) with µ^(j) → 0.

Theorem. Assume:
- Strictly feasible initial point x^(0), and the feasible region is bounded.
- f, ∇f, ∇a, ∇²f, ∇²a are Lipschitz.
- Slater's condition holds.
Then our IPM takes at most Õ(m^{1/3} ε^{-2/3}) trust region steps to terminate with an ε-optimal solution.
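A minimal sketch of the decreasing-µ scheme (our toy instance min x s.t. −x ≤ 0, whose optimum is x* = 0; the halving schedule and damped Newton inner solver are illustrative, not the paper's exact method): each barrier problem is approximately solved, then µ is reduced.

```python
import math

# For min x s.t. -x <= 0 the barrier x - mu*log(x) has minimizer x = mu,
# so driving mu -> 0 drives the iterate toward the optimum x* = 0.
def newton_on_barrier(x, mu, iters=20):
    for _ in range(iters):
        g = 1.0 - mu / x            # psi_mu'(x)
        h = mu / x**2               # psi_mu''(x)
        step = -g / h
        while x + step <= 0:        # damp to keep x strictly feasible
            step *= 0.5
        x += step
    return x

x, mu = 1.0, 1.0
for _ in range(10):                 # mu^(j) -> 0 by halving
    x = newton_on_barrier(x, mu)
    mu /= 2.0

print(x)  # tracks the barrier minimizers: close to the optimum x* = 0
```

Each stage warm-starts from the previous barrier minimizer, which is the standard path-following idea; the theorem quantifies how many trust region steps the whole schedule needs.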
Questions?
One-phase results