A polynomial time interior point method for problems with nonconvex constraints

A polynomial time interior point method for problems with nonconvex constraints
Oliver Hinder, Yinyu Ye
Department of Management Science and Engineering, Stanford University
June 28, 2018

The problem

- Consider the problem
      min_{x ∈ ℝ^d} f(x)   s.t.   a(x) ≤ 0,
  where f : ℝ^d → ℝ and a : ℝ^d → ℝ^m have Lipschitz first and second derivatives.
- Captures a huge variety of practical problems: drinking water networks, electricity networks, trajectory optimization, engineering design, etc.
- In the worst case, finding a global optimum takes an exponential amount of time.
- Instead we want to find an approximate local optimum, more precisely a Fritz John point.

µ-approximate Fritz John point

  a(x) < 0                                                       (Primal feasibility)
  ‖∇_x f(x) + ∇a(x)ᵀ y‖₂ ≤ µ √(‖y‖₁ + 1),   y > 0                (Dual feasibility)
  −yᵢ aᵢ(x) / µ ∈ [1/2, 3/2]   for all i ∈ {1, ..., m}           (Complementarity)

- Fritz John is a necessary condition for local optimality.
- Under the MFCQ constraint qualification (i.e., the dual multipliers are bounded), a Fritz John point is a KKT point.

[Figure: plots of objective value vs. dual multipliers illustrating that local minimizers are KKT points and KKT points are Fritz John points]
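To make the three conditions concrete, the following is a minimal NumPy sketch (not the authors' code) of a check that a pair (x, y) is a µ-approximate Fritz John point. The callables f_grad, a_val, a_jac, the helper name, and the toy two-constraint instance are illustrative assumptions, and the sign conventions follow the reconstruction above.

```python
import numpy as np

def is_approx_fritz_john(x, y, mu, f_grad, a_val, a_jac):
    """Check the mu-approximate Fritz John conditions (as reconstructed above).

    f_grad(x) -> gradient of f, shape (d,)
    a_val(x)  -> constraint values a(x), shape (m,)
    a_jac(x)  -> Jacobian of a, shape (m, d)
    """
    a = a_val(x)
    # Primal feasibility: strictly inside the feasible region, positive multipliers.
    if not (np.all(a < 0) and np.all(y > 0)):
        return False
    # Dual feasibility: ||grad f(x) + grad a(x)^T y||_2 <= mu * sqrt(||y||_1 + 1).
    grad_lag = f_grad(x) + a_jac(x).T @ y
    if np.linalg.norm(grad_lag) > mu * np.sqrt(np.sum(np.abs(y)) + 1.0):
        return False
    # Complementarity: -y_i a_i(x) / mu in [1/2, 3/2] for every constraint.
    ratio = -y * a / mu
    return bool(np.all((ratio >= 0.5) & (ratio <= 1.5)))

# Illustrative toy instance: min x1^2 + x2  s.t.  x1 + x2 - 1 <= 0,  -x2 <= 0.
f_grad = lambda x: np.array([2.0 * x[0], 1.0])
a_val  = lambda x: np.array([x[0] + x[1] - 1.0, -x[1]])
a_jac  = lambda x: np.array([[1.0, 1.0], [0.0, -1.0]])

mu = 1e-2
x = np.array([-mu / 2.0, mu])   # close to the minimizer of the log barrier for this mu
y = -mu / a_val(x)              # the multipliers the log barrier implies
print(is_approx_fritz_john(x, y, mu, f_grad, a_val, a_jac))  # expected: True
```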

How do we solve this problem?

- One popular approach: move the constraints into a log barrier,
      ψ_µ(x) := f(x) − µ Σᵢ log(−aᵢ(x)),
  and apply (modified) Newton's method to find an approximate local optimum.
- Used for real systems, e.g., drinking water networks, electricity networks, trajectory control, etc.
- Examples of practical codes using this method include IPOPT, KNITRO, LOQO, One-Phase (my code), etc.
- Local superlinear convergence is known.
- Global convergence: most results show convergence in the limit but provide no explicit runtime bound. Implicitly, the runtime bound is exponential in 1/µ.
- Goal: a runtime polynomial in 1/µ to find a µ-approximate Fritz John point.
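For reference, here is a minimal NumPy sketch (an illustration, not code from the slides or from any of the solvers named above) of the barrier value, its gradient, and the multipliers yᵢ = −µ/aᵢ(x) that the barrier implies. The function names and the tiny example are assumptions.

```python
import numpy as np

def log_barrier(x, mu, f, f_grad, a_val, a_jac):
    """Value, gradient, and implied multipliers of psi_mu(x) = f(x) - mu * sum_i log(-a_i(x)).

    Only defined for strictly feasible x, i.e. a(x) < 0 componentwise.
    """
    a = a_val(x)
    assert np.all(a < 0), "the log barrier needs a strictly feasible point"
    y = -mu / a                            # barrier multipliers, y > 0
    value = f(x) - mu * np.sum(np.log(-a))
    grad = f_grad(x) + a_jac(x).T @ y      # since d/dx [-mu*log(-a_i(x))] = y_i * grad a_i(x)
    return value, grad, y

# Example: f(x) = 0.5*||x||^2 with a single linear constraint sum(x) - 1 <= 0.
value, grad, y = log_barrier(
    x=np.array([0.1, 0.2]), mu=0.1,
    f=lambda x: 0.5 * x @ x,
    f_grad=lambda x: x,
    a_val=lambda x: np.array([np.sum(x) - 1.0]),
    a_jac=lambda x: np.ones((1, 2)),
)
print(value, grad, y)
```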

Brief history of IPM theory

- Linear programming: birth of interior point methods [Karmarkar, 1984]; log barrier + Newton [Renegar, 1988]; O(√m log(1/ε)) iterations. Successful implementations: MOSEK, GUROBI, ... Good theoretical understanding.
- Conic optimization: self-concordant barriers [Nesterov, Nemirovskii, 1994]; O(√ν log(1/ε)) iterations. Successful implementations: SEDUMI, MOSEK, GUROBI, ... Good theoretical understanding.
- General objective/constraints: successful implementations: LOQO, KNITRO, IPOPT, ... Less theoretical understanding.

Literature review: nonconvex optimization

- Unconstrained. Goal: find a point with ‖∇f(x)‖₂ ≤ ε.
  - Gradient descent: O(ε^{-2}).
  - Cubic regularization: O(ε^{-3/2}) [Nesterov, Polyak, 2006].
  - These are the best achievable runtimes (of any method) when ∇f and ∇²f are Lipschitz, respectively [Carmon, Duchi, Hinder, Sidford, 2017].
- Nonconvex objective, linear inequality constraints: affine scaling IPM [Ye (1998); Bian, Chen, and Ye (2015); Haeser, Liu, and Ye (2017)]. Has O(ε^{-3/2}) runtime to find KKT points.
- Nonconvex objective and constraints:
  - Apply cubic regularization to a quadratic penalty function [Birgin et al., 2016]. Has O(ε^{-2}) runtime.
  - Sequential linear programming [Cartis et al., 2015] has O(ε^{-2}) runtime.
  - Both of these methods plug and play directly with unconstrained optimization methods, e.g., they need their penalty functions to have Lipschitz derivatives.
  - The log barrier does not have Lipschitz derivatives!

Our IPM for nonconvex optimization

[Figure: the log barrier ψ_µ(x) over the feasible region, with the trust region of radius r around the current iterate x and the step d_x]

Reminder: ψ_µ(x) := f(x) − µ Σᵢ log(−aᵢ(x)).

Each iteration of our IPM:
1. Pick the trust region size r ∝ µ^{3/4} (1 + ‖y‖₁)^{-1/2}.
2. Compute the direction d_x by solving the trust region problem
       d_x ∈ argmin_{‖u‖₂ ≤ r}  ∇ψ_µ(x)ᵀ u + (1/2) uᵀ ∇²ψ_µ(x) u.
3. Pick a step size α ∈ (0, 1] to ensure x_new = x + α d_x satisfies a(x_new) < 0.
4. Is the new point a Fritz John point? If not, return to step one.

Theoretical properties?
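A minimal NumPy sketch of what steps 1-3 might look like is below. It is an illustration under stated assumptions, not the paper's algorithm: psi_grad and psi_hess are assumed callables for the barrier derivatives, the constant in the trust region radius is taken to be 1, the trust-region solver ignores the so-called hard case, and step 4 (the termination test) is omitted.

```python
import numpy as np

def solve_trust_region(H, g, r):
    """Approximately solve  min_{||u||_2 <= r}  g^T u + 0.5 u^T H u.

    Eigendecomposition plus bisection on the multiplier; the 'hard case' is
    ignored, which is acceptable for a sketch.
    """
    w, V = np.linalg.eigh(H)
    gt = V.T @ g
    step = lambda lam: -V @ (gt / (w + lam))
    lam = max(0.0, -w.min()) + 1e-12          # smallest shift making H + lam*I positive definite
    if np.linalg.norm(step(lam)) <= r:
        return step(lam)                      # regularized Newton step already fits
    lo, hi = lam, lam + 1.0
    while np.linalg.norm(step(hi)) > r:       # bracket the multiplier
        hi *= 2.0
    for _ in range(60):                       # bisection on ||step(lam)|| = r
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if np.linalg.norm(step(mid)) > r else (lo, mid)
    return step(hi)

def ipm_iteration(x, mu, psi_grad, psi_hess, a_val):
    """One trust-region step on the log barrier (a sketch of steps 1-3 only).

    Assumes the current x is strictly feasible, i.e. a_val(x) < 0.
    """
    y = -mu / a_val(x)                            # barrier multipliers at the current point
    r = mu ** 0.75 / np.sqrt(1.0 + np.sum(y))     # step 1: r ~ mu^(3/4) * (1 + ||y||_1)^(-1/2)
    d = solve_trust_region(psi_hess(x), psi_grad(x), r)   # step 2
    alpha = 1.0
    while np.any(a_val(x + alpha * d) >= 0):      # step 3: backtrack to stay strictly feasible
        alpha *= 0.5
    return x + alpha * d
```

The paper's method also updates the dual variables and handles negative curvature and termination more carefully; this sketch only mirrors the shape of a single step.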

Nonconvex result

Theorem. Assume:
- a strictly feasible initial point x^(0);
- ∇f, ∇a, ∇²f, ∇²a are Lipschitz.
Then our IPM takes at most
    O( (ψ_µ(x^(0)) − inf_z ψ_µ(z)) µ^{-7/4} )
trust region steps to terminate with a µ-approximate second-order Fritz John point (x⁺, y⁺).

algorithm            | # iterations | primitive        | evaluates
Birgin et al., 2016  | O(µ^{-3})    | vector operation | ∇
Cartis et al., 2011  | O(µ^{-2})    | linear program   | ∇
Birgin et al., 2016  | O(µ^{-2})    | KKT of NCQP      | ∇, ∇²
IPM (this paper)     | O(µ^{-7/4})  | linear system    | ∇, ∇²

Convex result

Modify our IPM to use a decreasing sequence of barrier parameters, i.e., µ^(j) with µ^(j) → 0.

Theorem. Assume:
- a strictly feasible initial point x^(0), and the feasible region is bounded;
- f, ∇f, ∇a, ∇²f, ∇²a are Lipschitz;
- Slater's condition holds.
Then our IPM takes at most
    Õ( m^{1/3} ε^{-2/3} )
trust region steps to terminate with an ε-optimal solution.
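The convex result relies on an outer loop that shrinks µ. A minimal sketch of such a driver is below; ipm_inner is a hypothetical stand-in for the trust-region method of the previous slides, assumed to return a µ-approximate point, and the geometric schedule and stopping rule are illustrative rather than the paper's.

```python
def barrier_path(x0, mu0, eps, ipm_inner, shrink=0.5):
    """Outer loop for the convex case: drive mu -> 0 along a decreasing schedule.

    `ipm_inner(x, mu)` is a hypothetical stand-in for the trust-region IPM sketched
    earlier; the geometric factor and stopping rule here are illustrative only.
    """
    x, mu = x0, mu0
    while mu > eps:
        x = ipm_inner(x, mu)   # warm start each barrier subproblem from the last solution
        mu *= shrink           # mu^(j) -> 0 geometrically
    return x
```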

Questions?

One-phase results