Inexact Newton Methods and Nonlinear Constrained Optimization

Inexact Newton Methods and Nonlinear Constrained Optimization
Frank E. Curtis
EPSRC Symposium Capstone Conference, Warwick Mathematics Institute, July 2, 2009

Outline: PDE-Constrained Optimization, Newton's method, Inexactness, Experimental results, Conclusion and final remarks

Hyperthermia treatment
Regional hyperthermia is a cancer therapy that aims at heating large and deeply seated tumors by means of radio wave absorption. This kills tumor cells and makes them more susceptible to accompanying therapies, e.g., chemotherapy.

Hyperthermia treatment planning
Computer modeling can be used to help plan the therapy for each patient, and it opens the door for numerical optimization. The goal is to heat the tumor to a target temperature of 43°C while minimizing damage to nearby cells.

PDE-constrained optimization
  min f(x)  s.t.  c_E(x) = 0,  c_I(x) ≥ 0
The problem is infinite-dimensional. Controls and states: x = (u, y). Solution methods integrate: numerical simulation, problem structure, optimization algorithms.

Algorithmic frameworks
We hear the phrases "discretize-then-optimize" and "optimize-then-discretize". I prefer:
  Discretize the optimization problem:   min f(x) s.t. c(x) = 0   →   min f_h(x) s.t. c_h(x) = 0
  Discretize the optimality conditions:  min f(x) s.t. c(x) = 0   →   [∇f + A^T λ; c] = 0   →   [(∇f + A^T λ)_h; c_h] = 0
  Discretize the search direction computation

Algorithms
Nonlinear elimination:  min_{u,y} f(u, y) s.t. c(u, y) = 0   →   min_u f(u, y(u)),  with  ∇_u f + (∇_u y)^T ∇_y f = 0
Reduced-space methods:  d_y: toward satisfying the constraints;  λ: Lagrange multiplier estimates;  d_u: toward optimality
Full-space methods:
  [ H_u   0     A_u^T ] [ d_u ]       [ ∇_u f + A_u^T λ ]
  [ 0     H_y   A_y^T ] [ d_y ]  =  − [ ∇_y f + A_y^T λ ]
  [ A_u   A_y   0     ] [ δ   ]       [ c               ]
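
For concreteness, a minimal Python/SciPy sketch of assembling and solving the full-space system above, assuming the blocks Hu, Hy, Au, Ay and the residual quantities are already available (all names are illustrative, and a sparse direct solve stands in for whatever solver is appropriate):

# Sketch: assemble and solve the full-space (KKT) system
# [ Hu   0    Au^T ] [du]       [ grad_u f + Au^T lam ]
# [ 0    Hy   Ay^T ] [dy]  =  − [ grad_y f + Ay^T lam ]
# [ Au   Ay   0    ] [dl]       [ c                   ]
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def full_space_step(Hu, Hy, Au, Ay, gu, gy, lam, c):
    nu, ny = Hu.shape[0], Hy.shape[0]
    K = sp.bmat([[Hu, None, Au.T],
                 [None, Hy, Ay.T],
                 [Au, Ay, None]], format="csc")
    rhs = -np.concatenate([gu + Au.T @ lam, gy + Ay.T @ lam, c])
    sol = spla.spsolve(K, rhs)
    du, dy, dlam = sol[:nu], sol[nu:nu + ny], sol[nu + ny:]
    return du, dy, dlam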

Outline: PDE-Constrained Optimization, Newton's method, Inexactness, Experimental results, Conclusion and final remarks

Nonlinear equations
Newton's method for F(x) = 0:
  ∇F(x_k) d_k = −F(x_k)
Judge progress by the merit function φ(x) := ½‖F(x)‖². The direction is one of descent, since
  ∇φ(x_k)^T d_k = F(x_k)^T ∇F(x_k) d_k = −‖F(x_k)‖² < 0
(Note the consistency between the step computation and the merit function!)
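
A minimal Python sketch of this globalized Newton iteration, assuming callables F and J (the Jacobian) are supplied; tolerances and constants are illustrative:

# Newton's method for F(x) = 0 with a backtracking line search on the
# merit function phi(x) = 0.5 * ||F(x)||^2.
import numpy as np

def newton(F, J, x, tol=1e-8, eta=1e-4, max_iter=100):
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) <= tol:
            break
        d = np.linalg.solve(J(x), -Fx)           # Newton step: J(x) d = -F(x)
        phi, dphi = 0.5 * Fx @ Fx, -(Fx @ Fx)    # phi(x_k) and grad phi(x_k)^T d = -||F(x_k)||^2
        alpha = 1.0
        while alpha > 1e-12 and \
              0.5 * np.dot(F(x + alpha * d), F(x + alpha * d)) > phi + eta * alpha * dphi:
            alpha *= 0.5                          # backtrack until the Armijo condition holds
        x = x + alpha * d
    return x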

Equality constrained optimization
Consider
  min_{x ∈ R^n} f(x)  s.t.  c(x) = 0
The Lagrangian is
  L(x, λ) := f(x) + λ^T c(x)
so the first-order optimality conditions are
  ∇L(x, λ) = F(x, λ) = [ ∇f(x) + ∇c(x)λ ;  c(x) ] = 0

Merit function
Simply minimizing
  ϕ(x, λ) = ½‖F(x, λ)‖² = ½‖[ ∇f(x) + ∇c(x)λ ;  c(x) ]‖²
is generally inappropriate for constrained optimization. We use the merit function
  φ(x; π) := f(x) + π‖c(x)‖
where π is a penalty parameter.

Minimizing a penalty function
Consider the penalty function for  min (x − 1)²  s.t.  x = 0,  i.e.
  φ(x; π) = (x − 1)² + π|x|
for different values of the penalty parameter π.
Figure: π = 1.  Figure: π = 2.
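
A quick numerical check of this example (an illustrative SciPy script): for π = 1 the minimizer of φ(·; π) sits near x = 0.5, away from the feasible point, while for π = 2 the penalty is exact and the minimizer is the constrained solution x = 0.

# Locate the minimizer of phi(x; pi) = (x-1)^2 + pi*|x| for two penalty values,
# illustrating that the penalty becomes exact once pi is large enough.
from scipy.optimize import minimize_scalar

for pi in (1.0, 2.0):
    res = minimize_scalar(lambda x: (x - 1.0)**2 + pi * abs(x),
                          bounds=(-2.0, 2.0), method="bounded")
    print(f"pi = {pi}: minimizer approx {res.x:.4f}")   # expect ~0.5 for pi=1, ~0.0 for pi=2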

Algorithm 0: Newton method for optimization
(Assume the problem is sufficiently convex and regular)
for k = 0, 1, 2, ...
  Solve the primal-dual (Newton) equations
    [ H(x_k, λ_k)   ∇c(x_k) ] [ d_k ]       [ ∇f(x_k) + ∇c(x_k)λ_k ]
    [ ∇c(x_k)^T     0       ] [ δ_k ]  =  − [ c(x_k)               ]
  Increase π, if necessary, so that Dφ_k(d_k; π_k) < 0 (e.g., π_k ≥ ‖λ_k + δ_k‖)
  Backtrack from α_k ← 1 to satisfy the Armijo condition
    φ(x_k + α_k d_k; π_k) ≤ φ(x_k; π_k) + ηα_k Dφ_k(d_k; π_k)
  Update iterate  (x_{k+1}, λ_{k+1}) ← (x_k, λ_k) + α_k(d_k, δ_k)
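
A minimal Python sketch of Algorithm 0, assuming callables f, grad_f, c, Jc (Jacobian of c), and hess_L (Hessian of the Lagrangian) are supplied; the dense linear solve, constants, and penalty safeguard are illustrative:

# Exact Newton/SQP iteration with an l1 penalty line search.
import numpy as np

def newton_eq_opt(f, grad_f, c, Jc, hess_L, x, lam, pi=1.0, eta=1e-4, iters=50):
    for _ in range(iters):
        g, ck, A = grad_f(x), c(x), Jc(x)        # A = Jacobian of c, shape m-by-n
        if np.linalg.norm(g + A.T @ lam) + np.linalg.norm(ck) <= 1e-8:
            break
        H = hess_L(x, lam)
        n, m = x.size, ck.size
        K = np.block([[H, A.T], [A, np.zeros((m, m))]])
        rhs = -np.concatenate([g + A.T @ lam, ck])
        sol = np.linalg.solve(K, rhs)
        d, delta = sol[:n], sol[n:]
        pi = max(pi, np.linalg.norm(lam + delta, np.inf) + 1e-4)  # ensure descent for phi(.; pi)
        Dphi = g @ d - pi * np.linalg.norm(ck, 1)                 # directional derivative of the merit function
        phi = f(x) + pi * np.linalg.norm(ck, 1)
        alpha = 1.0
        while alpha > 1e-12 and \
              f(x + alpha * d) + pi * np.linalg.norm(c(x + alpha * d), 1) > phi + eta * alpha * Dphi:
            alpha *= 0.5                                           # Armijo backtracking
        x, lam = x + alpha * d, lam + alpha * delta
    return x, lam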

Convergence of Algorithm 0
Assumption: The sequence {(x_k, λ_k)} is contained in a convex set Ω over which f, c, and their first derivatives are bounded and Lipschitz continuous. Also,
  (Regularity) ∇c(x_k)^T has full row rank with singular values bounded below by a positive constant
  (Convexity) u^T H(x_k, λ_k) u ≥ µ‖u‖² for some µ > 0, for all u ∈ R^n satisfying u ≠ 0 and ∇c(x_k)^T u = 0
Theorem (Han (1977)): The sequence {(x_k, λ_k)} yields the limit
  lim_{k→∞} ‖[ ∇f(x_k) + ∇c(x_k)λ_k ;  c(x_k) ]‖ = 0

Outline: PDE-Constrained Optimization, Newton's method, Inexactness, Experimental results, Conclusion and final remarks

Large-scale primal-dual algorithms
Computational issues: large matrices to be stored; large matrices to be factored.
Algorithmic issues: the problem may be nonconvex; the problem may be ill-conditioned.
Computational/algorithmic issues: with no matrix factorizations available, these difficulties become harder to detect and to handle.

Nonlinear equations
Compute
  ∇F(x_k) d_k = −F(x_k) + r_k
requiring (Dembo, Eisenstat, Steihaug (1982))
  ‖r_k‖ ≤ κ‖F(x_k)‖,  κ ∈ (0, 1)
Progress is judged by the merit function φ(x) := ½‖F(x)‖². Again, note the consistency...
  ∇φ(x_k)^T d_k = F(x_k)^T ∇F(x_k) d_k = −‖F(x_k)‖² + F(x_k)^T r_k ≤ (κ − 1)‖F(x_k)‖² < 0
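
A minimal Python sketch of this inexact Newton iteration, using SciPy's GMRES as the iterative linear solver with the forcing tolerance κ (the globalizing line search from the earlier slide is omitted for brevity; names are illustrative):

# Inexact Newton for F(x) = 0: stop the Krylov solver once
# ||J(x_k) d + F(x_k)|| <= kappa * ||F(x_k)|| (Dembo-Eisenstat-Steihaug condition).
import numpy as np
from scipy.sparse.linalg import gmres

def inexact_newton(F, J, x, kappa=0.1, tol=1e-8, max_iter=100):
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) <= tol:
            break
        # rtol is the relative residual tolerance (named tol in older SciPy releases)
        d, info = gmres(J(x), -Fx, rtol=kappa, atol=0.0)
        x = x + d
    return x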

Optimization
Compute
  [ H(x_k, λ_k)   ∇c(x_k) ] [ d_k ]       [ ∇f(x_k) + ∇c(x_k)λ_k ]   [ ρ_k ]
  [ ∇c(x_k)^T     0       ] [ δ_k ]  =  − [ c(x_k)               ] + [ r_k ]
satisfying
  ‖[ ρ_k ; r_k ]‖ ≤ κ‖[ ∇f(x_k) + ∇c(x_k)λ_k ;  c(x_k) ]‖,  κ ∈ (0, 1)
If κ is not sufficiently small (e.g., 10^-3 vs. 10^-12), then d_k may be an ascent direction for our merit function; i.e., Dφ_k(d_k; π_k) > 0 for all π_k ≥ π_{k−1}.
Our work begins here... inexact Newton methods for optimization. We cover the convex case, nonconvexity, irregularity, and inequality constraints.

Model reductions
Define the model of φ(x; π):
  m(d; π) := f(x) + ∇f(x)^T d + π‖c(x) + ∇c(x)^T d‖
d_k is acceptable if
  Δm(d_k; π_k) := m(0; π_k) − m(d_k; π_k) = −∇f(x_k)^T d_k + π_k(‖c(x_k)‖ − ‖c(x_k) + ∇c(x_k)^T d_k‖) ≥ 0
This ensures Dφ_k(d_k; π_k) ≤ 0 (and more).
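
As a small illustration, the model reduction and the acceptability test can be written as follows, assuming g = ∇f(x_k), ck = c(x_k), A = the Jacobian of c at x_k, and a candidate step d (names are illustrative):

import numpy as np

def model_reduction(g, ck, A, d, pi):
    # Delta m(d; pi) = -g^T d + pi * ( ||c|| - ||c + A d|| )
    return -(g @ d) + pi * (np.linalg.norm(ck) - np.linalg.norm(ck + A @ d))

def step_acceptable(g, ck, A, d, pi):
    return model_reduction(g, ck, A, d, pi) >= 0.0   # ensures Dphi(d; pi) <= 0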

Termination test 1
The search direction (d_k, δ_k) is acceptable if
  ‖[ ρ_k ; r_k ]‖ ≤ κ‖[ ∇f(x_k) + ∇c(x_k)λ_k ;  c(x_k) ]‖,  κ ∈ (0, 1)
and if, for π_k = π_{k−1} and some σ ∈ (0, 1), we have
  Δm(d_k; π_k) ≥ max{½ d_k^T H(x_k, λ_k) d_k, 0} + σπ_k max{‖c(x_k)‖, ‖r_k‖ − ‖c(x_k)‖}
(the first max term is ≥ 0 for any d).

Termination test 2
The search direction (d_k, δ_k) is acceptable if
  ‖ρ_k‖ ≤ β‖c(x_k)‖,  β > 0,   and   ‖r_k‖ ≤ ɛ‖c(x_k)‖,  ɛ ∈ (0, 1)
Increasing the penalty parameter π then yields
  Δm(d_k; π_k) ≥ max{½ d_k^T H(x_k, λ_k) d_k, 0} + σπ_k‖c(x_k)‖
(the first max term is ≥ 0 for any d).

Algorithm 1: Inexact Newton for optimization
(Byrd, Curtis, Nocedal (2008))
for k = 0, 1, 2, ...
  Iteratively solve
    [ H(x_k, λ_k)   ∇c(x_k) ] [ d_k ]       [ ∇f(x_k) + ∇c(x_k)λ_k ]
    [ ∇c(x_k)^T     0       ] [ δ_k ]  =  − [ c(x_k)               ]
  until termination test 1 or 2 is satisfied
  If only termination test 2 is satisfied, increase π so that
    π_k ≥ max{ π_{k−1},  (∇f(x_k)^T d_k + max{½ d_k^T H(x_k, λ_k) d_k, 0}) / ((1 − τ)(‖c(x_k)‖ − ‖r_k‖)) }
  Backtrack from α_k ← 1 to satisfy
    φ(x_k + α_k d_k; π_k) ≤ φ(x_k; π_k) − ηα_k Δm(d_k; π_k)
  Update iterate  (x_{k+1}, λ_{k+1}) ← (x_k, λ_k) + α_k(d_k, δ_k)
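
The penalty update in Algorithm 1 might be sketched as follows (illustrative Python; it assumes termination test 2 holds, so ‖c(x_k)‖ > ‖r_k‖ and the denominator is positive):

import numpy as np

def update_penalty(pi_prev, g, d, H, ck, r, tau=0.1):
    # g, d, H, ck, r: gradient of f, step, Lagrangian Hessian, constraint value,
    # and linear-system residual at the current iterate (illustrative names).
    num = g @ d + max(0.5 * d @ (H @ d), 0.0)
    den = (1.0 - tau) * (np.linalg.norm(ck) - np.linalg.norm(r))   # positive under test 2
    return max(pi_prev, num / den)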

Convergence of Algorithm 1
Assumption: The sequence {(x_k, λ_k)} is contained in a convex set Ω over which f, c, and their first derivatives are bounded and Lipschitz continuous. Also,
  (Regularity) ∇c(x_k)^T has full row rank with singular values bounded below by a positive constant
  (Convexity) u^T H(x_k, λ_k) u ≥ µ‖u‖² for some µ > 0, for all u ∈ R^n satisfying u ≠ 0 and ∇c(x_k)^T u = 0
Theorem (Byrd, Curtis, Nocedal (2008)): The sequence {(x_k, λ_k)} yields the limit
  lim_{k→∞} ‖[ ∇f(x_k) + ∇c(x_k)λ_k ;  c(x_k) ]‖ = 0

Handling nonconvexity and rank deficiency
There are two assumptions we aim to drop:
  (Regularity) ∇c(x_k)^T has full row rank with singular values bounded below by a positive constant
  (Convexity) u^T H(x_k, λ_k) u ≥ µ‖u‖² for some µ > 0, for all u ∈ R^n satisfying u ≠ 0 and ∇c(x_k)^T u = 0
e.g., the problem is not regular if it is infeasible, and it is not convex if there are maximizers and/or saddle points. Without these assumptions, Algorithm 1 may stall or may not be well-defined.

No factorizations means no clue
We might not store or factor
  [ H(x_k, λ_k)   ∇c(x_k) ]
  [ ∇c(x_k)^T     0       ]
so we might not know whether the problem is nonconvex or ill-conditioned. Common practice is to perturb the matrix to
  [ H(x_k, λ_k) + ξ_1 I   ∇c(x_k) ]
  [ ∇c(x_k)^T             −ξ_2 I  ]
where ξ_1 convexifies the model and ξ_2 regularizes the constraints. Poor choices of ξ_1 and ξ_2 can have terrible consequences in the algorithm.

Our approach for global convergence
Decompose the direction d_k into a normal component v_k (toward satisfying the constraints) and a tangential component u_k (toward optimality). We impose a specific type of trust-region constraint on the normal step v_k in case the constraint Jacobian is (nearly) rank deficient.

Handling nonconvexity
In the computation of d_k = v_k + u_k, convexify the Hessian as in
  [ H(x_k, λ_k) + ξ_1 I   ∇c(x_k) ]
  [ ∇c(x_k)^T             0       ]
by monitoring the iterates. Hessian modification strategy: increase ξ_1 whenever
  ‖u_k‖² > ψ‖v_k‖²,  ψ > 0   and   ½ u_k^T (H(x_k, λ_k) + ξ_1 I) u_k < θ‖u_k‖²,  θ > 0
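
A sketch of this modification test in Python (ψ, θ, and the update factor mentioned in the comment are illustrative):

import numpy as np

def needs_convexification(H, xi1, u, v, psi=10.0, theta=1e-4):
    # Increase xi1 when the tangential component u dominates the normal
    # component v but the modified Hessian provides too little curvature.
    return (u @ u > psi * (v @ v)) and (
        0.5 * u @ (H @ u) + 0.5 * xi1 * (u @ u) < theta * (u @ u)
    )

# Typical use inside the step loop: while the test fires, enlarge xi1
# (e.g., xi1 = max(2 * xi1, xi1_min)) and re-solve the modified system.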

Algorithm 2: Inexact Newton (Regularized)
(Curtis, Nocedal, Wächter (2009))
for k = 0, 1, 2, ...
  Approximately solve
    min ½‖c(x_k) + ∇c(x_k)^T v‖²,  s.t.  ‖v‖ ≤ ω‖∇c(x_k)c(x_k)‖
  to compute v_k satisfying Cauchy decrease
  Iteratively solve
    [ H(x_k, λ_k) + ξ_1 I   ∇c(x_k) ] [ d_k ]     [ −(∇f(x_k) + ∇c(x_k)λ_k) ]
    [ ∇c(x_k)^T             0       ] [ δ_k ]  =  [ ∇c(x_k)^T v_k           ]
  until termination test 1 or 2 is satisfied, increasing ξ_1 as described
  If only termination test 2 is satisfied, increase π so that
    π_k ≥ max{ π_{k−1},  (∇f(x_k)^T d_k + max{½ u_k^T (H(x_k, λ_k) + ξ_1 I) u_k, θ‖u_k‖²}) / ((1 − τ)(‖c(x_k)‖ − ‖c(x_k) + ∇c(x_k)^T d_k‖)) }
  Backtrack from α_k ← 1 to satisfy
    φ(x_k + α_k d_k; π_k) ≤ φ(x_k; π_k) − ηα_k Δm(d_k; π_k)
  Update iterate  (x_{k+1}, λ_{k+1}) ← (x_k, λ_k) + α_k(d_k, δ_k)
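
One inexpensive way to satisfy the Cauchy-decrease requirement on the normal step is a steepest-descent (Cauchy) step for the normal subproblem; a minimal Python sketch, with illustrative names (J is the constraint Jacobian at x_k, ck = c(x_k)), is:

import numpy as np

def normal_cauchy_step(J, ck, omega=100.0):
    # Cauchy step for: min_v 0.5*||c + J v||^2  s.t.  ||v|| <= omega * ||J^T c||
    g = J.T @ ck                                  # gradient of the objective at v = 0
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros(J.shape[1])
    radius = omega * gnorm
    Jg = J @ g
    t_unc = (g @ g) / (Jg @ Jg) if Jg @ Jg > 0 else np.inf   # unconstrained minimizer along -g
    t_max = radius / gnorm                                    # step length to the trust-region boundary
    return -min(t_unc, t_max) * g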

Convergence of Algorithm 2
Assumption: The sequence {(x_k, λ_k)} is contained in a convex set Ω over which f, c, and their first derivatives are bounded and Lipschitz continuous.
Theorem (Curtis, Nocedal, Wächter (2009)): If all limit points of {∇c(x_k)^T} have full row rank, then the sequence {(x_k, λ_k)} yields the limit
  lim_{k→∞} ‖[ ∇f(x_k) + ∇c(x_k)λ_k ;  c(x_k) ]‖ = 0.
Otherwise, if {π_k} is bounded, then
  lim_{k→∞} ‖∇c(x_k)c(x_k)‖ = 0   and   lim_{k→∞} ‖∇f(x_k) + ∇c(x_k)λ_k‖ = 0

Handling inequalities
Interior-point methods are attractive for large applications. Line-search interior-point methods that enforce c(x_k) + ∇c(x_k)^T d_k = 0 may fail to converge globally (Wächter, Biegler (2000)). Fortunately, the trust-region subproblem we use to regularize the constraints also saves us from this type of failure!

Algorithm 2 (Interior-point version)
Apply Algorithm 2 to the logarithmic-barrier subproblem for µ → 0:
  min f(x) − µ Σ_{i=1}^{q} ln s_i,  s.t.  c_E(x) = 0,  c_I(x) − s = 0
The primal-dual step (d_k^x, d_k^s, δ_{E,k}, δ_{I,k}) is computed from the Newton system with coefficient matrix
  [ H(x_k, λ_{E,k}, λ_{I,k})   0      ∇c_E(x_k)   ∇c_I(x_k) ] [ d_k^x   ]
  [ 0                          µI     0           −S_k      ] [ d_k^s   ]
  [ ∇c_E(x_k)^T                0      0           0         ] [ δ_{E,k} ]
  [ ∇c_I(x_k)^T                −S_k   0           0         ] [ δ_{I,k} ]
so that the iterate update has
  [ x_{k+1} ; s_{k+1} ] ← [ x_k ; s_k ] + α_k [ d_k^x ; S_k d_k^s ]
Incorporate a fraction-to-the-boundary rule in the line search and a slack reset in the algorithm to maintain s ≥ max{0, c_I(x)}.
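
A minimal Python sketch of the fraction-to-the-boundary rule and the slack reset mentioned above (tau_ftb and the scaled slack step ds_scaled are illustrative names):

import numpy as np

def fraction_to_boundary(s, ds_scaled, tau_ftb=0.995):
    # Slack update is s_new = s + alpha * S * ds_scaled; keep s_new >= (1 - tau_ftb) * s.
    step = s * ds_scaled                       # componentwise S_k * d_k^s
    mask = step < 0.0
    if not np.any(mask):
        return 1.0
    return min(1.0, np.min(-tau_ftb * s[mask] / step[mask]))

def slack_reset(s, cI):
    return np.maximum(s, np.maximum(0.0, cI))  # maintain s >= max{0, c_I(x)}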

Convergence of Algorithm 2 (Interior-point)
Assumption: The sequence {(x_k, λ_{E,k}, λ_{I,k})} is contained in a convex set Ω over which f, c_E, c_I, and their first derivatives are bounded and Lipschitz continuous.
Theorem (Curtis, Schenk, Wächter (2009)): For a given µ, Algorithm 2 yields the same limits as in the equality constrained case. If Algorithm 2 yields a sufficiently accurate solution to the barrier subproblem for each element of a sequence {µ_j} → 0, and if the linear independence constraint qualification (LICQ) holds at a limit point x* of {x_j}, then there exist Lagrange multipliers λ* such that the first-order optimality conditions of the nonlinear program are satisfied.

Outline: PDE-Constrained Optimization, Newton's method, Inexactness, Experimental results, Conclusion and final remarks

Implementation details
Incorporated in the IPOPT software package (Wächter) via the option "inexact algorithm yes". Linear systems are solved with PARDISO (Schenk) using SQMR (Freund (1994)). Preconditioning in PARDISO: incomplete multilevel factorization with inverse-based pivoting, stabilized by symmetric weighted matchings. Optimality tolerance: 1e-8.

CUTEr and COPS collections
745 problems written in AMPL; 645 solved successfully; 42 real failures. Robustness between 87% and 94%. Original IPOPT: 93%.

Helmholtz
  N     n        p        q       # iter   CPU sec (per iter)
  32    14724    13824    1800    37       807.823 (21.833)
  64    56860    53016    7688    25       3741.42 (149.66)
  128   227940   212064   31752   20       54581.8 (2729.1)

Helmholtz, not taking nonconvexity into account: (figure)

Boundary control
  min  ½ ∫_Ω (y(x) − y_t(x))² dx
  s.t.  −∇·(e^{y(x)} ∇y(x)) = 20  in Ω
        y(x) = u(x)  on ∂Ω
        2.5 ≤ u(x) ≤ 3.5  on ∂Ω
where y_t(x) = 3 + 10 x_1 (x_1 − 1) x_2 (x_2 − 1) sin(2πx_3)
  N    n        p        q       # iter   CPU sec (per iter)
  16   4096     2744     2704    13       2.8144 (0.2165)
  32   32768    27000    11536   13       103.65 (7.9731)
  64   262144   238328   47632   14       5332.3 (380.88)
Original IPOPT with N = 32 requires 238 seconds per iteration.

Hyperthermia Treatment Planning
  min  ½ ∫_Ω (y(x) − y_t(x))² dx
  s.t.  −Δy(x) − 10(y(x) − 37) = u* M(x) u  in Ω
        37.0 ≤ y(x) ≤ 37.5  on ∂Ω
        42.0 ≤ y(x) ≤ 44.0  in Ω_0
where u_j = a_j e^{iφ_j},  M_jk(x) = ⟨E_j(x), E_k(x)⟩,  E_j = sin(j x_1 x_2 x_3 π)
  N    n       p       q       # iter   CPU sec (per iter)
  16   4116    2744    2994    68       22.893 (0.3367)
  32   32788   27000   13034   51       3055.9 (59.920)
Original IPOPT with N = 32 requires 408 seconds per iteration.

Groundwater modeling
  min  ½ ∫_Ω (y(x) − y_t(x))² dx + ½ α ∫_Ω [β(u(x) − u_t(x))² + ‖∇(u(x) − u_t(x))‖²] dx
  s.t.  −∇·(e^{u(x)} ∇y_i(x)) = q_i(x)  in Ω,  i = 1, ..., 6
        ∂y_i(x)/∂n = 0  on ∂Ω
        ∫_Ω y_i(x) dx = 0,  i = 1, ..., 6
        1 ≤ u(x) ≤ 2  in Ω
where q_i = 100 sin(2πx_1) sin(2πx_2) sin(2πx_3)
  N    n         p         q        # iter   CPU sec (per iter)
  16   28672     24576     8192     18       206.416 (11.4676)
  32   229376    196608    65536    20       1963.64 (98.1820)
  64   1835008   1572864   524288   21       134418. (6400.85)
Original IPOPT with N = 32 requires approx. 20 hours for the first iteration.

Outline: PDE-Constrained Optimization, Newton's method, Inexactness, Experimental results, Conclusion and final remarks

Conclusion and final remarks
PDE-constrained optimization is an active and exciting area. We have presented an inexact Newton method with a theoretical foundation. Convergence guarantees are as good as those of exact methods, and sometimes better. Numerical experiments are promising so far, with more to come.