Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL)


Part C course on continuous optimization

CONSTRAINED MINIMIZATION

    minimize_{x in R^n} f(x)   subject to   c(x) = 0

where the objective function f : R^n -> R and the constraints c : R^n -> R^m
- assume that f, c in C^1 (sometimes C^2) and Lipschitz continuous
- often in practice this assumption is violated, but it is not essential

CONSTRAINTS AND MERIT FUNCTIONS

Two conflicting goals:
- minimize the objective function f(x)
- satisfy the constraints

Overcome this by minimizing a composite merit function Φ(x, p), for which p are parameters, chosen so that
- (some) minimizers of Φ(x, p) with respect to x approach those of f(x) subject to the constraints as p approaches some set P
- only unconstrained minimization methods are used

AN EXAMPLE FOR EQUALITY CONSTRAINTS

    minimize_{x in R^n} f(x)   subject to   c(x) = 0

Merit function (quadratic penalty function):

    Φ(x, µ) = f(x) + (1/2µ) ‖c(x)‖_2^2

- gives the required solution as µ approaches 0 from above
- may have other useless stationary points
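To make the trade-off concrete, the sketch below (an illustration, assuming the running example min x_1^2 + x_2^2 subject to x_1 + x_2^2 = 1 from the contour plots, whose unconstrained minimizer is (0, 0) and whose constrained minimizer is (1/2, 1/√2)) evaluates Φ at an infeasible and at a feasible point: as µ shrinks, infeasibility is penalized ever more heavily, so minimizers of Φ are pushed toward the constraint.

```python
# Quadratic penalty merit function for the (assumed) running example
#   minimize x1^2 + x2^2   subject to   c(x) = x1 + x2^2 - 1 = 0
def f(x):
    return x[0]**2 + x[1]**2

def c(x):
    return x[0] + x[1]**2 - 1.0

def phi(x, mu):
    """Phi(x, mu) = f(x) + c(x)^2 / (2 mu)."""
    return f(x) + c(x)**2 / (2.0*mu)

x_infeas = (0.0, 0.0)            # unconstrained minimizer of f, infeasible
x_feas = (0.5, 0.5**0.5)         # the constrained minimizer, c(x_feas) = 0
for mu in (1.0, 0.1, 0.01):
    # as mu decreases, the infeasible point is penalized more and more heavily,
    # while the feasible point pays no penalty at all
    print(mu, phi(x_infeas, mu), phi(x_feas, mu))
```

With µ = 0.01 the infeasible point already costs 50 while the constrained minimizer still costs f(x_feas) = 0.75, so the penalty minimizer must drift toward feasibility.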

CONTOURS OF THE PENALTY FUNCTION

[Contour plots of the quadratic penalty function for min x_1^2 + x_2^2 subject to x_1 + x_2^2 = 1, for two relatively large values of µ.]

CONTOURS OF THE PENALTY FUNCTION (cont.)

[Contour plots for two smaller values of µ: the minimizer of Φ(x, µ) approaches the constrained solution as µ decreases.]

BASIC QUADRATIC PENALTY FUNCTION ALGORITHM

Given µ_0 > 0, set k = 0. Until convergence, iterate:
- Starting from x_k^S, use an unconstrained minimization algorithm to find an approximate minimizer x_k of Φ(x, µ_k)
- Compute µ_{k+1} > 0 smaller than µ_k such that lim_{k→∞} µ_{k+1} = 0, and increase k by 1

- often choose µ_{k+1} = 0.1 µ_k or even µ_{k+1} = µ_k^2
- might choose x_{k+1}^S = x_k

MAIN CONVERGENCE RESULT

Theorem 5.1. Suppose that f, c in C^2, that

    y_k := -c(x_k)/µ_k,

that

    ‖∇_x Φ(x_k, µ_k)‖_2 ≤ ɛ_k,

where ɛ_k converges to zero as k → ∞, and that x_k converges to x* for which A(x*) is full rank. Then x* satisfies the first-order necessary optimality conditions for the problem

    minimize_{x in R^n} f(x)   subject to   c(x) = 0

and {y_k} converge to the associated Lagrange multipliers y*.
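The outer loop above can be sketched end to end. The code below is an illustrative sketch, not Gould's implementation: it applies the algorithm to the assumed running example min x_1^2 + x_2^2 subject to x_1 + x_2^2 = 1, using Newton's method with a steepest-descent fallback and backtracking as the inner unconstrained minimizer; the starting point (0.5, 0.7), the schedule µ_{k+1} = 0.1 µ_k, and the inner tolerance ɛ_k = µ_k are all assumptions.

```python
import math

def c(x):                        # equality constraint c(x) = x1 + x2^2 - 1
    return x[0] + x[1]*x[1] - 1.0

def phi(x, mu):                  # Phi(x, mu) = f(x) + c(x)^2 / (2 mu)
    return x[0]*x[0] + x[1]*x[1] + c(x)**2/(2.0*mu)

def grad_phi(x, mu):             # grad Phi = g(x) - A(x)^T y(x), y(x) = -c(x)/mu
    y = -c(x)/mu
    return (2.0*x[0] - y, 2.0*x[1] - 2.0*x[1]*y)

def hess_phi(x, mu):             # H(x, y(x)) + A^T(x) A(x)/mu as (a, b; b, d)
    y = -c(x)/mu
    return (2.0 + 1.0/mu, 2.0*x[1]/mu, 2.0 - 2.0*y + 4.0*x[1]*x[1]/mu)

def inner_minimize(x, mu, eps, max_it=100):
    """Approximate minimizer of Phi(., mu): Newton when the Hessian is
    positive definite, steepest descent otherwise, with backtracking."""
    for _ in range(max_it):
        g1, g2 = grad_phi(x, mu)
        if math.hypot(g1, g2) <= eps:
            break
        a, b, d = hess_phi(x, mu)
        det = a*d - b*b
        if det > 0.0 and a > 0.0:              # positive definite: Newton step
            s = (-(d*g1 - b*g2)/det, -(a*g2 - b*g1)/det)
        else:                                  # fallback: steepest descent
            s = (-g1, -g2)
        t = 1.0                                # backtrack until Phi decreases
        while phi((x[0] + t*s[0], x[1] + t*s[1]), mu) >= phi(x, mu) and t > 1e-10:
            t *= 0.5
        x = (x[0] + t*s[0], x[1] + t*s[1])
    return x

x, mu = (0.5, 0.7), 0.1                        # assumed x_0^S and mu_0
while mu > 1e-7:
    x = inner_minimize(x, mu, eps=mu)          # warm start from the previous x_k
    y = -c(x)/mu                               # multiplier estimate y_k
    mu *= 0.1                                  # mu_{k+1} = 0.1 mu_k
print(x, y)
```

As Theorem 5.1 predicts, the iterates approach x* = (1/2, 1/√2) and the estimates y_k approach y* = 1, while the constraint violation c(x_k) = -µ_k y_k shrinks with µ_k.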

PROOF OF THEOREM 5.1

The generalized inverse A^+(x) := (A(x) A^T(x))^{-1} A(x) is bounded near x*. Define

    y_k := -c(x_k)/µ_k   and   y* := A^+(x*) g(x*).    (1)

The inner-iteration termination rule

    ‖g(x_k) - A^T(x_k) y_k‖_2 ≤ ɛ_k    (2)

implies that

    ‖A^+(x_k) g(x_k) - y_k‖_2 = ‖A^+(x_k) (g(x_k) - A^T(x_k) y_k)‖_2 ≤ 2 ‖A^+(x*)‖_2 ɛ_k

and hence that

    ‖y_k - y*‖_2 ≤ ‖A^+(x*) g(x*) - A^+(x_k) g(x_k)‖_2 + ‖A^+(x_k) g(x_k) - y_k‖_2 → 0,

so {y_k} → y*. Continuity of the gradients and (2) give g(x*) - A^T(x*) y* = 0. (1) implies c(x_k) = -µ_k y_k, and continuity of the constraints then gives c(x*) = 0. Hence (x*, y*) satisfies the first-order optimality conditions.

ALGORITHMS TO MINIMIZE Φ(x, µ)

Can use
- linesearch methods: might use a specialized linesearch to cope with the large quadratic term ‖c(x)‖_2^2 / 2µ
- trust-region methods: (ideally) need to shape the trust region to cope with the contours of the ‖c(x)‖_2^2 / 2µ term

DERIVATIVES OF THE QUADRATIC PENALTY FUNCTION

    ∇_x Φ(x, µ) = g(x, y(x))
    ∇_xx Φ(x, µ) = H(x, y(x)) + (1/µ) A^T(x) A(x)

where
- Lagrange multiplier estimates: y(x) = -c(x)/µ
- g(x, y(x)) = g(x) - A^T(x) y(x): gradient of the Lagrangian
- H(x, y(x)) = H(x) - sum_{i=1}^m y_i(x) H_i(x): Lagrangian Hessian

GENERIC QUADRATIC PENALTY NEWTON SYSTEM

The Newton correction s from x for the quadratic penalty function satisfies

    ( H(x, y(x)) + (1/µ) A^T(x) A(x) ) s = -g(x, y(x))

LIMITING DERIVATIVES OF Φ

For small µ, roughly

    ∇_x Φ(x, µ) = g(x) - A^T(x) y(x)   [moderate]
    ∇_xx Φ(x, µ) = H(x, y(x)) [moderate] + (1/µ) A^T(x) A(x) [large] ≈ (1/µ) A^T(x) A(x)

POTENTIAL DIFFICULTY

Ill-conditioning of the Hessian of the penalty function: roughly speaking (in the non-degenerate case)
- m eigenvalues ≈ λ_i[ A^T(x) A(x) ] / µ_k
- n - m eigenvalues ≈ λ_i[ S^T(x) H(x, y*) S(x) ]

where S(x) is an orthogonal basis for the null-space of A(x)

⟹ the condition number of ∇_xx Φ(x_k, µ_k) is O(1/µ_k)
⟹ may not be able to find the minimizer easily

THE ILL-CONDITIONING IS BENIGN

Newton system:

    ( H(x, y(x)) + (1/µ) A^T(x) A(x) ) s = -( g(x) + (1/µ) A^T(x) c(x) )

Define the auxiliary variable w = (A(x) s + c(x)) / µ. Then

    [ H(x, y(x))   A^T(x) ] [ s ]     [ g(x) ]
    [ A(x)         -µI    ] [ w ]  = -[ c(x) ]

- essentially independent of µ for small µ ⟹ no inherent ill-conditioning
- thus can solve the Newton equations accurately
- more sophisticated analysis ⟹ the original system is OK
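The eigenvalue split can be checked by hand on the running example. At the (assumed) solution x* = (1/2, 1/√2) with y* = 1 we have H(x*, y*) = diag(2, 0) and A(x*) = (1, √2), so A^T A has nonzero eigenvalue 3, and S^T H S = 4/3 for the null-space basis S = (√2, -1)/√3. A closed-form 2x2 eigenvalue computation then shows one eigenvalue of ∇_xx Φ behaving like 3/µ while the other approaches 4/3:

```python
import math

def eigs_2x2(a, b, d):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]]."""
    t, det = a + d, a*d - b*b
    r = math.sqrt(t*t - 4.0*det)
    return (t - r)/2.0, (t + r)/2.0

# Hessian of the penalty function at x* = (1/2, 1/sqrt 2), y* = 1:
#   H + A^T A / mu = [[2 + 1/mu, sqrt(2)/mu], [sqrt(2)/mu, 2/mu]]
for mu in (1e-2, 1e-4, 1e-6):
    lo, hi = eigs_2x2(2.0 + 1.0/mu, math.sqrt(2.0)/mu, 2.0/mu)
    # lo -> S^T H S = 4/3, hi -> lambda[A^T A]/mu = 3/mu, cond = O(1/mu)
    print(mu, lo, hi, hi/lo)
```

The small eigenvalue settles at 4/3 while the large one scales like 3/µ, exactly the m vs n - m split described above.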

PERTURBED OPTIMALITY CONDITIONS

The first-order optimality conditions for

    minimize_{x in R^n} f(x)   subject to   c(x) = 0

are:

    g(x) - A^T(x) y = 0    dual feasibility
    c(x) = 0               primal feasibility

Consider the perturbed problem

    g(x) - A^T(x) y = 0    dual feasibility
    c(x) + µ y = 0         perturbed primal feasibility

where µ > 0.

PRIMAL-DUAL PATH-FOLLOWING METHODS

Track the roots of g(x) - A^T(x) y = 0 and c(x) + µ y = 0 as 0 < µ → 0
- nonlinear system ⟹ use Newton's method

The Newton correction (s, v) to (x, y) satisfies

    [ H(x, y)   -A^T(x) ] [ s ]     [ g(x) - A^T(x) y ]
    [ A(x)       µI     ] [ v ]  = -[ c(x) + µ y      ]

Eliminating v gives

    ( H(x, y) + (1/µ) A^T(x) A(x) ) s = -( g(x) + (1/µ) A^T(x) c(x) )

c.f. the Newton method for quadratic penalty function minimization!
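The elimination can be verified numerically. The sketch below (an assumed trial point of the running example min x_1^2 + x_2^2 subject to x_1 + x_2^2 = 1, with y taken as the penalty estimate y(x) = -c(x)/µ) solves the well-behaved condensed system, recovers v from the second block row, and checks that the first block row is satisfied to rounding error, so no 1/µ ill-conditioning need be confronted:

```python
# Assumed trial point and barrier-like parameter for the running example
x1, x2, mu = 0.4, 0.6, 1e-3
c = x1 + x2*x2 - 1.0
y = -c/mu                              # penalty multiplier estimate y(x)
g1, g2 = 2.0*x1, 2.0*x2                # g(x)
a1, a2 = 1.0, 2.0*x2                   # A(x)
h11, h22 = 2.0, 2.0 - 2.0*y           # H(x, y): Hessian of the Lagrangian

# condensed system: (H + A^T A/mu) s = -(g + A^T c/mu)
m11 = h11 + a1*a1/mu
m12 = a1*a2/mu
m22 = h22 + a2*a2/mu
r1, r2 = -(g1 + a1*c/mu), -(g2 + a2*c/mu)
det = m11*m22 - m12*m12
s1 = (m22*r1 - m12*r2)/det
s2 = (m11*r2 - m12*r1)/det

# block system rows: H s - A^T v = -(g - A^T y),  A s + mu v = -(c + mu y)
v = -(c + mu*y + a1*s1 + a2*s2)/mu     # v from the second block row
res1 = h11*s1 - a1*v + (g1 - a1*y)     # residuals of the first block row
res2 = h22*s2 - a2*v + (g2 - a2*y)
print(res1, res2)                      # both negligible
```

The same s therefore solves both formulations, which is the sense in which the ill-conditioning of the condensed matrix is benign.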

PRIMAL VS. PRIMAL-DUAL

Primal:

    ( H(x, y(x)) + (1/µ) A^T(x) A(x) ) s_P = -g(x, y(x))

Primal-dual:

    ( H(x, y) + (1/µ) A^T(x) A(x) ) s_PD = -g(x, y(x))

where y(x) = -c(x)/µ. What is the difference? The freedom to choose y in H(x, y) for the primal-dual method ... vital!

ANOTHER EXAMPLE FOR EQUALITY CONSTRAINTS

    minimize_{x in R^n} f(x)   subject to   c(x) = 0

Merit function (augmented Lagrangian function):

    Φ(x, u, µ) = f(x) - u^T c(x) + (1/2µ) ‖c(x)‖_2^2

where u and µ are auxiliary parameters. Two interpretations:
- shifted quadratic penalty function
- convexification of the Lagrangian function

Aim: adjust µ and u to encourage convergence.
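The point of the shift is easy to check numerically on the running example (assuming, as before, x* = (1/2, 1/√2) and y* = 1): with u = y*, the augmented Lagrangian is stationary at x* for every µ > 0, whereas the quadratic penalty gradient at x* does not vanish for any fixed µ. This is why adjusting u can remove the need to drive µ to zero.

```python
import math

x1, x2 = 0.5, 2.0**-0.5            # assumed solution x* of the running example
cval = x1 + x2*x2 - 1.0            # = 0 up to rounding: x* is feasible
g = (2.0*x1, 2.0*x2)               # g(x*)
a = (1.0, 2.0*x2)                  # A(x*)

for mu in (1.0, 0.1, 0.01):
    yF = 1.0 - cval/mu             # y^F(x*) = u - c(x*)/mu with u = y* = 1
    grad_al = (g[0] - a[0]*yF, g[1] - a[1]*yF)     # grad of augmented Lagrangian
    ypen = -cval/mu                # penalty estimate y(x*), essentially 0
    grad_pen = (g[0] - a[0]*ypen, g[1] - a[1]*ypen)  # grad of quadratic penalty
    print(mu, grad_al, grad_pen)   # grad_al vanishes for every mu
```

The penalty gradient stays close to g(x*), of norm about √3, for every fixed µ, so its minimizer can only reach x* in the limit µ → 0; the shifted version needs no such limit once u = y*.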

DERIVATIVES OF THE AUGMENTED LAGRANGIAN FUNCTION

    ∇_x Φ(x, u, µ) = g(x, y^F(x))
    ∇_xx Φ(x, u, µ) = H(x, y^F(x)) + (1/µ) A^T(x) A(x)

where
- first-order Lagrange multiplier estimates: y^F(x) = u - c(x)/µ
- g(x, y^F(x)) = g(x) - A^T(x) y^F(x): gradient of the Lagrangian
- H(x, y^F(x)) = H(x) - sum_{i=1}^m y_i^F(x) H_i(x): Lagrangian Hessian

AUGMENTED LAGRANGIAN CONVERGENCE

Theorem 5.2. Suppose that f, c in C^2, that

    y_k := u_k - c(x_k)/µ_k

for given {u_k}, that ‖∇_x Φ(x_k, u_k, µ_k)‖_2 ≤ ɛ_k, where ɛ_k converges to zero as k → ∞, and that x_k converges to x* for which A(x*) is full rank. Then {y_k} converge to some y* for which g(x*) = A^T(x*) y*. If additionally either µ_k converges to zero for bounded u_k, or u_k converges to y* for bounded µ_k, then x* and y* satisfy the first-order necessary optimality conditions for the problem

    minimize_{x in R^n} f(x)   subject to   c(x) = 0

PROOF OF THEOREM 5.2

Convergence of y_k to y* := A^+(x*) g(x*), for which g(x*) = A^T(x*) y*, is exactly as for Theorem 5.1. The definition of y_k gives

    c(x_k) = µ_k (u_k - y_k)   ⟹   ‖c(x_k)‖ ≤ µ_k ‖y_k - y*‖ + µ_k ‖u_k - y*‖,

and so c(x*) = 0 follows from the assumptions. Hence (x*, y*) satisfies the first-order optimality conditions.

CONTOURS OF THE AUGMENTED LAGRANGIAN FUNCTION

[Contour plots of the augmented Lagrangian function for min x_1^2 + x_2^2 subject to x_1 + x_2^2 = 1 with µ fixed, for u = 0.5 and u = 0.9.]

CONTOURS OF THE AUGMENTED LAGRANGIAN FUNCTION (cont.)

[Contour plots for u = 0.99 and u = y*, with µ fixed: as u approaches y*, the minimizer of the augmented Lagrangian approaches the constrained solution.]

CONVERGENCE OF AUGMENTED LAGRANGIAN METHODS

- convergence is guaranteed if u_k is fixed and µ_k → 0: y_k → y* and c(x_k) → 0
- check whether ‖c(x_k)‖ ≤ η_k, where {η_k} → 0
  - if so, set u_{k+1} = y_k and µ_{k+1} = µ_k
  - if not, set u_{k+1} = u_k and µ_{k+1} ≤ τ µ_k for some τ in (0, 1)
- reasonable: η_k = µ_k^{0.1 + 0.9 j}, where j is the number of iterations since µ_k was last changed
- under such rules, can ensure that µ_k is eventually unchanged, and obtain (fast) linear convergence, under modest assumptions
- need also to ensure that µ_k is sufficiently small that ∇_xx Φ(x_k, u_k, µ_k) is positive (semi-)definite

BASIC AUGMENTED LAGRANGIAN ALGORITHM

Given µ_0 > 0 and u_0, set k = 0. Until convergence, iterate:
- Starting from x_k^S, use an unconstrained minimization algorithm to find an approximate minimizer x_k of Φ(x, u_k, µ_k) for which ‖∇_x Φ(x_k, u_k, µ_k)‖ ≤ ɛ_k
- If ‖c(x_k)‖ ≤ η_k, set u_{k+1} = y_k and µ_{k+1} = µ_k; otherwise set u_{k+1} = u_k and µ_{k+1} = τ µ_k
- Set suitable ɛ_{k+1} and η_{k+1}, and increase k by 1

- often choose τ = min(0.1, µ_k)
- might choose x_{k+1}^S = x_k
- reasonable: ɛ_k = µ_k^{j+1}, where j is the number of iterations since µ_k was last changed
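The algorithm above can be sketched end to end on the running example min x_1^2 + x_2^2 subject to x_1 + x_2^2 = 1. Everything below is an illustrative assumption rather than a reference implementation: the inner solver is a safeguarded Newton iteration, µ_0 = 0.1, u_0 = 0, τ = 0.1, x_0^S = (0.5, 0.7), and the forcing sequences ɛ_k, η_k follow the rules above.

```python
import math

def c(x):
    return x[0] + x[1]*x[1] - 1.0

def phi(x, u, mu):                  # f(x) - u c(x) + c(x)^2 / (2 mu)
    return x[0]*x[0] + x[1]*x[1] - u*c(x) + c(x)**2/(2.0*mu)

def inner(x, u, mu, eps, max_it=100):
    """Approximately minimize Phi(., u, mu): Newton + steepest-descent fallback."""
    for _ in range(max_it):
        yF = u - c(x)/mu            # first-order multiplier estimate y^F(x)
        g = (2.0*x[0] - yF, 2.0*x[1] - 2.0*x[1]*yF)   # g(x) - A(x)^T y^F(x)
        if math.hypot(g[0], g[1]) <= eps:
            break
        a = 2.0 + 1.0/mu            # Hessian H(x, y^F) + A^T A/mu, entries a, b, d
        b = 2.0*x[1]/mu
        d = 2.0 - 2.0*yF + 4.0*x[1]*x[1]/mu
        det = a*d - b*b
        if det > 0.0 and a > 0.0:                   # positive definite: Newton
            s = (-(d*g[0] - b*g[1])/det, -(a*g[1] - b*g[0])/det)
        else:                                       # fallback: steepest descent
            s = (-g[0], -g[1])
        t = 1.0                                     # backtrack until decrease
        while phi((x[0]+t*s[0], x[1]+t*s[1]), u, mu) >= phi(x, u, mu) and t > 1e-10:
            t *= 0.5
        x = (x[0]+t*s[0], x[1]+t*s[1])
    return x

x, u, mu, tau = (0.5, 0.7), 0.0, 0.1, 0.1
j = 0                               # iterations since mu was last changed
for k in range(8):
    eps, eta = mu**(j + 1), mu**(0.1 + 0.9*j)      # assumed forcing sequences
    x = inner(x, u, mu, eps)
    y = u - c(x)/mu                 # multiplier estimate y_k
    if abs(c(x)) <= eta:
        u, j = y, j + 1             # accept: update multipliers, keep mu
    else:
        mu, j = tau*mu, 0           # reject: decrease the penalty parameter
print(x, u)
```

On this example the test ‖c(x_k)‖ ≤ η_k succeeds at every outer iteration, so µ_k stays at 0.1 throughout while u_k converges to y* = 1, illustrating the claim that µ_k can remain bounded away from zero.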