Lecture 11 and 12: Penalty methods and augmented Lagrangian methods for nonlinear programming


Coralia Cartis, Mathematical Institute, University of Oxford
C6.2/B2: Continuous Optimization

Penalty methods for nonlinear programming

Nonlinear equality-constrained problems

$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad c(x) = 0, \qquad \text{(ecp)}$$

where $f : \mathbb{R}^n \to \mathbb{R}$ and $c = (c_1, \dots, c_m) : \mathbb{R}^n \to \mathbb{R}^m$ are smooth.

- We attempt to find local solutions (at least KKT points).
- Constrained optimization faces a conflict of requirements: objective minimization versus feasibility of the solution.
- It is easier to generate feasible iterates for linear equality and general inequality constrained problems; it is very hard, in general even impossible, when general equality constraints are present.

$\Longrightarrow$ form a single, parametrized, unconstrained objective whose minimizers approach the solutions of the initial problem as the parameters vary.

A penalty function for (ecp)

$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad c(x) = 0. \qquad \text{(ecp)}$$

The quadratic penalty function:

$$\min_{x \in \mathbb{R}^n} \Phi_\sigma(x) = f(x) + \frac{1}{2\sigma} \|c(x)\|^2, \qquad (\mathrm{ecp}_\sigma)$$

where $\sigma > 0$ is the penalty parameter.

- $1/\sigma$ weights the penalty on infeasibility; $\sigma \to 0$ forces the constraint to be satisfied while achieving optimality for $f$.
- $\Phi_\sigma$ may have other stationary points that are not solutions of (ecp); e.g., when $c(x) = 0$ is inconsistent.

Contours of the penalty function $\Phi_\sigma$ - an example

[Figure: contour plots of the quadratic penalty function for $\min x_1^2 + x_2^2$ subject to $x_1 + x_2^2 = 1$, with $\sigma = 100$ (left) and $\sigma = 1$ (right).]

Contours of the penalty function $\Phi_\sigma$ - an example...

[Figure: contour plots of the quadratic penalty function for $\min x_1^2 + x_2^2$ subject to $x_1 + x_2^2 = 1$, with $\sigma = 0.1$ (left) and $\sigma = 0.01$ (right).]

A quadratic penalty method

Given $\sigma^0 > 0$, let $k = 0$. Until convergence do:
- Choose $0 < \sigma^{k+1} < \sigma^k$.
- Starting from $x^k_0$ (possibly $x^k_0 := x^k$), use an unconstrained minimization algorithm to find an approximate minimizer $x^{k+1}$ of $\Phi_{\sigma^{k+1}}$.
- Let $k := k + 1$.

Must have $\sigma^k \to 0$ as $k \to \infty$; e.g., $\sigma^{k+1} := 0.1\,\sigma^k$, or $\sigma^{k+1} := (\sigma^k)^2$, etc.

Algorithms for minimizing $\Phi_\sigma$: linesearch and trust-region methods. For small $\sigma$, $\Phi_\sigma$ is very steep in the direction of the constraint gradients, so steps in such directions cause rapid changes in $\Phi_\sigma$; this has implications for the shape of the trust region.
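As a concrete illustration, the method above can be sketched in a few lines. The following toy implementation is our own (the damped-Newton inner solver, the starting point, and the schedule $\sigma^{k+1} = 0.1\,\sigma^k$ are choices, not part of the notes); it runs the quadratic penalty method on the example from the contour plots, $\min x_1^2 + x_2^2$ subject to $x_1 + x_2^2 = 1$:

```python
import numpy as np

# Quadratic penalty method on the slides' running example (our own sketch):
#   minimize x1^2 + x2^2  subject to  x1 + x2^2 = 1.

def c(x):       return x[0] + x[1]**2 - 1.0           # single constraint
def grad_c(x):  return np.array([1.0, 2.0 * x[1]])
def hess_c(x):  return np.array([[0.0, 0.0], [0.0, 2.0]])

def phi(x, sigma):          # Phi_sigma(x) = f(x) + ||c(x)||^2 / (2 sigma)
    return x[0]**2 + x[1]**2 + c(x)**2 / (2.0 * sigma)

def phi_grad(x, sigma):     # grad f + (1/sigma) J^T c
    return np.array([2.0 * x[0], 2.0 * x[1]]) + (c(x) / sigma) * grad_c(x)

def phi_hess(x, sigma):     # hess f + (c/sigma) hess c + (1/sigma) J^T J
    g = grad_c(x)
    return (np.diag([2.0, 2.0]) + (c(x) / sigma) * hess_c(x)
            + np.outer(g, g) / sigma)

def newton_min(x, sigma, tol=1e-9, max_iter=100):
    """Damped Newton on Phi_sigma: the unconstrained inner solver."""
    for _ in range(max_iter):
        g = phi_grad(x, sigma)
        if np.linalg.norm(g) <= tol:
            break
        dx = np.linalg.solve(phi_hess(x, sigma), -g)
        t = 1.0
        while phi(x + t * dx, sigma) > phi(x, sigma) and t > 1e-8:
            t *= 0.5                        # backtrack until Phi decreases
        x = x + t * dx
    return x

# Outer loop: drive sigma -> 0, warm-starting each minimization.
x = np.array([0.5, 0.7])
for sigma in [0.1, 1e-2, 1e-3, 1e-4, 1e-5]:
    x = newton_min(x, sigma)
    y = -c(x) / sigma                       # multiplier estimate y(sigma)
    print(f"sigma={sigma:7.0e}  x=({x[0]:.5f}, {x[1]:.5f})  "
          f"y={y:.5f}  c(x)={c(x):+.1e}")
# The iterates approach x* = (1/2, 1/sqrt(2)) with multiplier y* = 1,
# while c(x) ~ -sigma: each penalty minimizer is slightly infeasible.
```

Note that the iterates are feasible only in the limit; the multiplier estimates $y(\sigma) = -c(x)/\sigma$ converge alongside them, exactly as Theorem 25 below predicts.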

A convergence result for the penalty method

Theorem 25 (Global convergence of the penalty method). Apply the basic quadratic penalty method to (ecp). Assume that $f, c \in C^1$, that $y^k_i = -c_i(x^k)/\sigma^k$, $i = 1, \dots, m$, and that $\|\nabla \Phi_{\sigma^k}(x^k)\| \le \epsilon^k$, where $\epsilon^k \to 0$ and $\sigma^k \to 0$ as $k \to \infty$. Moreover, assume that $x^k \to x^*$, where $\nabla c_i(x^*)$, $i = 1, \dots, m$, are linearly independent. Then $x^*$ is a KKT point of (ecp) and $y^k \to y^*$, where $y^*$ is the vector of Lagrange multipliers of the (ecp) constraints.

- $\nabla c_i(x^*)$, $i = 1, \dots, m$, linearly independent $\iff$ the Jacobian matrix $J(x^*)$ of the constraints has full row rank, and so $m \le n$.
- If $J(x^*)$ is not of full rank, then $x^*$ (locally) minimizes the infeasibility $\|c(x)\|$. [Consider $\|y^k\| \to \infty$ in $(*)$ on the next slide.]

A convergence result for the penalty method...

Proof of Theorem 25. $J(x^*)$ full rank $\implies$ $J(x^*)^+ = (J(x^*) J(x^*)^T)^{-1} J(x^*)$ is the pseudo-inverse. As $x^k \to x^*$ and $J$ is continuous, $J(x^k)^+$ is well defined, continuous and bounded above for all sufficiently large $k$. Let $y^k = -c(x^k)/\sigma^k$ and $y^* = J(x^*)^+ \nabla f(x^*)$. Then

$$\|\nabla \Phi_{\sigma^k}(x^k)\| = \|\nabla f(x^k) - J(x^k)^T y^k\| \le \epsilon^k, \qquad (*)$$

and hence

$$\|J(x^k)^+ \nabla f(x^k) - y^k\| = \|J(x^k)^+ (\nabla f(x^k) - J(x^k)^T y^k)\| \le \|J(x^k)^+\| \, \epsilon^k \le \{\|J(x^k)^+ - J(x^*)^+\| + \|J(x^*)^+\|\}\,\epsilon^k \le 2\,\|J(x^*)^+\|\,\epsilon^k, \qquad (**)$$

where in the last inequality we used $x^k \to x^*$ and the continuity of $J^+$. The triangle inequality (add and subtract $J(x^k)^+ \nabla f(x^k)$) and the definition of $y^*$ give

$$\|y^k - y^*\| \le \|J(x^k)^+ \nabla f(x^k) - y^k\| + \|J(x^k)^+ \nabla f(x^k) - J(x^*)^+ \nabla f(x^*)\|.$$

Thus $y^k \to y^*$, since $x^k \to x^*$, $J^+$ and $\nabla f$ are continuous, $(**)$ holds and $\epsilon^k \to 0$. Using all of this again in $(*)$ as $k \to \infty$: $\nabla f(x^*) - J(x^*)^T y^* = 0$. As $c(x^k) = -\sigma^k y^k$, $\sigma^k \to 0$ and $y^k \to y^*$, we get $c(x^*) = 0$. Thus $x^*$ is a KKT point. $\square$

Derivatives of the penalty function

Let $y(\sigma) := -c(x)/\sigma$: estimates of the Lagrange multipliers. Let $L$ be the Lagrangian function of (ecp),

$$L(x, y) := f(x) - y^T c(x),$$

and recall $\Phi_\sigma(x) = f(x) + \frac{1}{2\sigma} \|c(x)\|^2$. Then

$$\nabla \Phi_\sigma(x) = \nabla f(x) + \frac{1}{\sigma} J(x)^T c(x) = \nabla_x L(x, y(\sigma)),$$

where $J(x)$ is the $m \times n$ Jacobian matrix of the constraints $c(x)$, and

$$\nabla^2 \Phi_\sigma(x) = \nabla^2 f(x) + \frac{1}{\sigma} \sum_{i=1}^m c_i(x) \nabla^2 c_i(x) + \frac{1}{\sigma} J(x)^T J(x) = \nabla^2_{xx} L(x, y(\sigma)) + \frac{1}{\sigma} J(x)^T J(x).$$

- As $\sigma \to 0$: generally, $c_i(x) \to 0$ at the same rate as $\sigma$ for all $i$; thus usually $\nabla^2_{xx} L(x, y(\sigma))$ is well-behaved.
- As $\sigma \to 0$: $J(x)^T J(x)/\sigma \approx J(x^*)^T J(x^*)/0 = \infty$.
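The identity $\nabla \Phi_\sigma(x) = \nabla_x L(x, y(\sigma))$ is easy to sanity-check numerically. A small sketch of ours compares the analytic formula against central finite differences on the running example (the point and $\sigma$ are arbitrary):

```python
import numpy as np

# Finite-difference check of grad Phi_sigma(x) = grad_x L(x, y(sigma))
# on the running example f(x) = x1^2 + x2^2, c(x) = x1 + x2^2 - 1.
sigma = 0.5
x = np.array([0.3, 0.8])

def c(v):   return v[0] + v[1]**2 - 1.0
def phi(v): return v[0]**2 + v[1]**2 + c(v)**2 / (2.0 * sigma)

y = -c(x) / sigma                              # multiplier estimate y(sigma)
grad_L = np.array([2*x[0], 2*x[1]]) - y * np.array([1.0, 2*x[1]])  # grad_x L

h = 1e-6                                       # central differences on Phi
fd = np.array([(phi(x + h*e) - phi(x - h*e)) / (2*h) for e in np.eye(2)])
print(np.max(np.abs(fd - grad_L)))             # tiny: the formulas agree
```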

Ill-conditioning of the penalty's Hessian...

Fact [cf. Th 5.2, Gould ref.]: $m$ eigenvalues of $\nabla^2 \Phi_{\sigma^k}(x^k)$ are $O(1/\sigma^k)$ and hence tend to infinity as $k \to \infty$ (i.e., $\sigma^k \to 0$); the remaining $n - m$ are $O(1)$ in the limit. Hence the condition number of $\nabla^2 \Phi_{\sigma^k}(x^k)$ is $O(1/\sigma^k)$: it blows up as $k \to \infty$.

$\Longrightarrow$ We may worry that we cannot compute changes to $x^k$ accurately. Namely, whether using linesearch or trust-region methods, asymptotically we want to minimize $\Phi_{\sigma^{k+1}}(x)$ by taking Newton steps, i.e., solving the system

$$\nabla^2 \Phi_\sigma(x) \, dx = -\nabla \Phi_\sigma(x), \qquad (*)$$

for $dx$ from some current $x = x^{k,i}$ and $\sigma = \sigma^{k+1}$.

Despite the ill-conditioning present, we can still solve for $dx$ accurately!

Solving accurately for the Newton direction

By the computed formulas for the derivatives, $(*)$ is equivalent to

$$\left( \nabla^2_{xx} L(x, y(\sigma)) + \frac{1}{\sigma} J(x)^T J(x) \right) dx = -\left( \nabla f(x) + \frac{1}{\sigma} J(x)^T c(x) \right),$$

where $y(\sigma) = -c(x)/\sigma$. Define the auxiliary variable $w$,

$$w = \frac{1}{\sigma} \left( J(x)\,dx + c(x) \right).$$

Then the Newton system $(*)$ can be rewritten as

$$\begin{pmatrix} \nabla^2_{xx} L(x, y(\sigma)) & J(x)^T \\ J(x) & -\sigma I \end{pmatrix} \begin{pmatrix} dx \\ w \end{pmatrix} = -\begin{pmatrix} \nabla f(x) \\ c(x) \end{pmatrix}.$$

This system is essentially independent of $\sigma$ for small $\sigma$, so it cannot suffer from ill-conditioning due to $\sigma \to 0$.

We still need to be careful about minimizing $\Phi_\sigma$ for small $\sigma$. E.g., when using trust-region methods, use $\|dx\|_B$ for the trust-region constraint, where $B$ takes into account the ill-conditioned terms of the Hessian so as to encourage equal model decrease in all directions.
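To see the effect, the following small demo of ours (the evaluation point and $\sigma$ are arbitrary choices) forms both the condensed and the bordered system for the running example and compares their condition numbers and solutions:

```python
import numpy as np

# Condensed vs. bordered Newton system for the quadratic penalty, on the
# running example f(x) = x1^2 + x2^2, c(x) = x1 + x2^2 - 1 (our own demo).
sigma = 1e-5
x = np.array([0.5, 0.7071])                    # near the minimizer of Phi_sigma

cval   = x[0] + x[1]**2 - 1.0
J      = np.array([[1.0, 2.0 * x[1]]])         # 1 x 2 Jacobian of c
grad_f = np.array([2.0 * x[0], 2.0 * x[1]])
y      = -cval / sigma                         # multiplier estimate y(sigma)
H_L    = np.diag([2.0, 2.0]) - y * np.array([[0.0, 0.0], [0.0, 2.0]])

# Condensed system: (H_L + J^T J / sigma) dx = -(grad_f + J^T c / sigma)
A = H_L + J.T @ J / sigma
b = -(grad_f + (cval / sigma) * J.ravel())
dx_condensed = np.linalg.solve(A, b)

# Bordered system: [[H_L, J^T], [J, -sigma I]] (dx, w) = -(grad_f, c)
K   = np.block([[H_L, J.T], [J, -sigma * np.eye(1)]])
sol = np.linalg.solve(K, -np.concatenate([grad_f, [cval]]))
dx_bordered = sol[:2]

print("cond(condensed) =", np.linalg.cond(A))  # O(1/sigma): blows up
print("cond(bordered)  =", np.linalg.cond(K))  # O(1): stays modest
print("max |dx difference| =", np.max(np.abs(dx_condensed - dx_bordered)))
```

Both systems yield the same step; only the bordered form keeps a condition number bounded independently of $\sigma$.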

Perturbed optimality conditions

$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad c(x) = 0. \qquad \text{(ecp)}$$

(ecp) satisfies the KKT conditions

$$\text{(dual feasibility)} \quad \nabla f(x) = J(x)^T y, \qquad \text{(primal feasibility)} \quad c(x) = 0.$$

Consider the perturbed problem

$$\begin{aligned} \nabla f(x) - J(x)^T y &= 0 \\ c(x) + \sigma y &= 0 \end{aligned} \qquad (\mathrm{ecp}_p)$$

Find roots of the nonlinear system $(\mathrm{ecp}_p)$ as $\sigma \to 0$ (with $\sigma > 0$); use Newton's method for root finding.

Perturbed optimality conditions...

Newton's method for the system $(\mathrm{ecp}_p)$ computes the change $(dx, dy)$ to $(x, y)$ from

$$\begin{pmatrix} \nabla^2_{xx} L(x,y) & -J(x)^T \\ J(x) & \sigma I \end{pmatrix} \begin{pmatrix} dx \\ dy \end{pmatrix} = -\begin{pmatrix} \nabla f(x) - J(x)^T y \\ c(x) + \sigma y \end{pmatrix}.$$

Eliminating $dy$ gives

$$\left( \nabla^2_{xx} L(x,y) + \frac{1}{\sigma} J(x)^T J(x) \right) dx = -\left( \nabla f(x) + \frac{1}{\sigma} J(x)^T c(x) \right)$$

$\Longrightarrow$ the same as Newton's method for the quadratic penalty! So what is different?

Perturbed optimality conditions...

Primal:
$$\left( \nabla^2_{xx} L(x, y(\sigma)) + \frac{1}{\sigma} J(x)^T J(x) \right) dx_p = -\left( \nabla f(x) + \frac{1}{\sigma} J(x)^T c(x) \right),$$
where $y(\sigma) = -c(x)/\sigma$.

Primal-dual:
$$\left( \nabla^2_{xx} L(x, y) + \frac{1}{\sigma} J(x)^T J(x) \right) dx_{pd} = -\left( \nabla f(x) + \frac{1}{\sigma} J(x)^T c(x) \right).$$

The difference is the freedom to choose $y$ in $\nabla^2_{xx} L(x, y)$ in primal-dual methods - it makes a big difference computationally.

Other penalty functions

Consider the general constrained problem

$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad c_E(x) = 0, \;\; c_I(x) \ge 0. \qquad \text{(CP)}$$

Exact penalty function: $\Phi(x, \sigma)$ is exact if there is $\bar\sigma > 0$ such that whenever $\sigma < \bar\sigma$, any local solution of (CP) is a local minimizer of $\Phi(x, \sigma)$. (The quadratic penalty is inexact.)

Examples, with $[z]^- := \min\{z, 0\}$:

- $\ell_2$-penalty function: $\Phi(x, \sigma) = f(x) + \frac{1}{\sigma} \|c_E(x)\|$.
- $\ell_1$-penalty function: $\Phi(x, \sigma) = f(x) + \frac{1}{\sigma} \sum_{i \in E} |c_i(x)| + \frac{1}{\sigma} \sum_{i \in I} \big| [c_i(x)]^- \big|$.
- Extension of the quadratic penalty to (CP): $\Phi(x, \sigma) = f(x) + \frac{1}{2\sigma} \|c_E(x)\|^2 + \frac{1}{2\sigma} \sum_{i \in I} \left( [c_i(x)]^- \right)^2$ (may no longer be sufficiently smooth; it is inexact).
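The exact/inexact distinction is easy to see on a one-variable toy problem (our own illustration, not from the notes): minimize $x$ subject to $x \ge 1$, with solution $x^* = 1$ and multiplier $y^* = 1$. The $\ell_1$ penalty recovers $x^*$ exactly once $1/\sigma > y^*$, while the quadratic penalty's minimizer is always slightly infeasible:

```python
import numpy as np

# Exact vs. inexact penalties on a 1-D toy problem (our own illustration):
#   minimize x  subject to  x >= 1,   with solution x* = 1.
sigma = 0.1
xs = np.linspace(0.0, 2.0, 200001)             # fine grid for the minimizers

viol = np.maximum(0.0, 1.0 - xs)               # |[c(x)]^-| with c(x) = x - 1
l1_pen   = xs + viol / sigma                   # l1 penalty (exact)
quad_pen = xs + viol**2 / (2.0 * sigma)        # quadratic penalty (inexact)

x_l1, x_quad = xs[np.argmin(l1_pen)], xs[np.argmin(quad_pen)]
print(x_l1)    # the l1 minimizer sits at x* = 1 (to grid accuracy)
print(x_quad)  # the quadratic minimizer sits at 1 - sigma: infeasible
```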

Augmented Lagrangian methods for nonlinear programming

Nonlinear equality-constrained problems

$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad c(x) = 0, \qquad \text{(ecp)}$$

where $f : \mathbb{R}^n \to \mathbb{R}$ and $c = (c_1, \dots, c_m) : \mathbb{R}^n \to \mathbb{R}^m$ are smooth.

Another example of a merit function and method for (ecp): the augmented Lagrangian function

$$\Phi(x, u, \sigma) = f(x) - u^T c(x) + \frac{1}{2\sigma} \|c(x)\|^2,$$

where $u \in \mathbb{R}^m$ and $\sigma > 0$ are auxiliary parameters.

Two interpretations:
- a shifted quadratic penalty function;
- a convexification of the Lagrangian function.

Aim: adjust $u$ and $\sigma$ to encourage convergence.

Derivatives of the augmented Lagrangian function

Let $J(x)$ be the Jacobian of the constraints $c(x) = (c_1(x), \dots, c_m(x))$.

$$\nabla_x \Phi(x, u, \sigma) = \nabla f(x) - J(x)^T u + \frac{1}{\sigma} J(x)^T c(x) = \nabla f(x) - J(x)^T y(x) = \nabla_x L(x, y(x)),$$

where

$$y(x) = u - \frac{c(x)}{\sigma} \quad \text{(Lagrange multiplier estimates)}.$$

$$\nabla^2_{xx} \Phi(x, u, \sigma) = \nabla^2 f(x) - \sum_{i=1}^m u_i \nabla^2 c_i(x) + \frac{1}{\sigma} \sum_{i=1}^m c_i(x) \nabla^2 c_i(x) + \frac{1}{\sigma} J(x)^T J(x)$$
$$= \nabla^2 f(x) - \sum_{i=1}^m y_i(x) \nabla^2 c_i(x) + \frac{1}{\sigma} J(x)^T J(x) = \nabla^2_{xx} L(x, y(x)) + \frac{1}{\sigma} J(x)^T J(x).$$

Recall the Lagrangian: $L(x, y) = f(x) - y^T c(x)$.

A convergence result for the augmented Lagrangian

Theorem 26 (Global convergence of the augmented Lagrangian method). Assume that $f, c \in C^1$ in (ecp), let

$$y^k = u^k - \frac{c(x^k)}{\sigma^k}$$

for given $u^k \in \mathbb{R}^m$, and assume that $\|\nabla_x \Phi(x^k, u^k, \sigma^k)\| \le \epsilon^k$, where $\epsilon^k \to 0$ as $k \to \infty$. Moreover, assume that $x^k \to x^*$, where $\nabla c_i(x^*)$, $i = 1, \dots, m$, are linearly independent. Then $y^k \to y^*$ as $k \to \infty$, with $y^*$ satisfying $\nabla f(x^*) - J(x^*)^T y^* = 0$.

If additionally either $\sigma^k \to 0$ with $u^k$ bounded, or $u^k \to y^*$ with $\sigma^k$ bounded, then $x^*$ is a KKT point of (ecp) with associated Lagrange multipliers $y^*$.

A convergence result for the augmented Lagrangian...

Proof of Theorem 26. The first part of Theorem 26, namely the convergence of $y^k$ to $y^* = J(x^*)^+ \nabla f(x^*)$, follows exactly as in the proof of Theorem 25 (penalty method convergence). (Note that the assumption $\sigma^k \to 0$ is not needed for that part of the proof of Theorem 25.)

It remains to show that, under the additional assumptions on $u^k$ and $\sigma^k$, $x^*$ is feasible for the constraints. To see this, use the definition of $y^k$ to deduce $c(x^k) = \sigma^k (u^k - y^k)$, and so

$$\|c(x^k)\| = \sigma^k \|u^k - y^k\| \le \sigma^k \|y^k - y^*\| + \sigma^k \|u^k - y^*\|.$$

Thus $\|c(x^k)\| \to 0$ as $k \to \infty$, due to $y^k \to y^*$ (cf. the first part of the theorem) and the additional assumptions on $u^k$ and $\sigma^k$. As $x^k \to x^*$ and $c$ is continuous, we deduce that $c(x^*) = 0$. $\square$

Note that the augmented Lagrangian method may converge to KKT points without $\sigma^k \to 0$, which limits the ill-conditioning.

Contours of the augmented Lagrangian - an example

[Figure: contour plots of the augmented Lagrangian function for $\min x_1^2 + x_2^2$ subject to $x_1 + x_2^2 = 1$, with fixed $\sigma = 1$, for $u = 0.5$ (left) and $u = 0.9$ (right).]

Contours of the augmented Lagrangian - an example...

[Figure: contour plots of the augmented Lagrangian function for $\min x_1^2 + x_2^2$ subject to $x_1 + x_2^2 = 1$, with fixed $\sigma = 1$, for $u = 0.99$ (left) and $u = y^* = 1$ (right).]

Augmented Lagrangian methods

Theorem 26 $\implies$ convergence is guaranteed if $u^k$ is fixed and $\sigma^k \to 0$ [similar to quadratic penalty methods]; then $y^k \to y^*$ and $c(x^k) \to 0$.

- Check whether $\|c(x^k)\| \le \eta^k$, where $\eta^k \to 0$.
- If so, set $u^{k+1} = y^k$ and $\sigma^{k+1} = \sigma^k$ [recall the expression for $y^k$ in Theorem 26].
- If not, set $u^{k+1} = u^k$ and $\sigma^{k+1} = \tau \sigma^k$ for some $\tau \in (0, 1)$.
- Reasonable choice: $\eta^k = (\sigma^k)^{0.1 + 0.9 j}$, where $j$ counts the iterations since $\sigma^k$ was last changed.

Under such rules, one can ensure, under modest assumptions, that $\sigma^k$ is eventually unchanged, and obtain (fast) linear convergence.

We also need to ensure that $\sigma^k$ is sufficiently small that the Hessian $\nabla^2_{xx} \Phi(x^k, u^k, \sigma^k)$ is positive (semi-)definite.

A basic augmented Lagrangian method

Given $\sigma^0 > 0$ and $u^0$, let $k = 0$. Until convergence do:
- Set $\eta^k$ and $\epsilon^{k+1}$.
- If $\|c(x^k)\| \le \eta^k$, set $u^{k+1} = y^k$ and $\sigma^{k+1} = \sigma^k$. Otherwise, set $u^{k+1} = u^k$ and $\sigma^{k+1} = \tau \sigma^k$.
- Starting from $x^k_0$ (possibly $x^k_0 := x^k$), use an unconstrained minimization algorithm to find an approximate minimizer $x^{k+1}$ of $\Phi(\cdot, u^{k+1}, \sigma^{k+1})$ for which $\|\nabla_x \Phi(x^{k+1}, u^{k+1}, \sigma^{k+1})\| \le \epsilon^{k+1}$.
- Let $k := k + 1$.

Often one chooses $\tau = \min(0.1, \sqrt{\sigma^k})$. Reasonable choice: $\epsilon^k = (\sigma^k)^{j+1}$, where $j$ counts the iterations since $\sigma^k$ was last changed.
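A toy implementation of this method on the running example shows the behaviour promised by Theorem 26: $u^k$ converges to $y^* = 1$ while $\sigma^k$ never needs to shrink. This is our own sketch; the damped-Newton inner solver, the starting point, and the simplified $\eta^k$/$\epsilon^k$ schedules are choices, not part of the notes:

```python
import numpy as np

# A minimal sketch of the basic augmented Lagrangian method on the running
# example:  minimize x1^2 + x2^2  subject to  c(x) = x1 + x2^2 - 1 = 0.

def c(x):       return x[0] + x[1]**2 - 1.0
def grad_c(x):  return np.array([1.0, 2.0 * x[1]])

def al_value(x, u, sigma):        # Phi(x, u, sigma) = f - u c + c^2/(2 sigma)
    return x[0]**2 + x[1]**2 - u * c(x) + c(x)**2 / (2.0 * sigma)

def al_grad(x, u, sigma):         # grad f - y(x) grad c,  y(x) = u - c/sigma
    y = u - c(x) / sigma
    return np.array([2.0 * x[0], 2.0 * x[1]]) - y * grad_c(x)

def al_hess(x, u, sigma):         # hess_xx L(x, y(x)) + (1/sigma) J^T J
    y = u - c(x) / sigma
    g = grad_c(x)
    return (np.diag([2.0, 2.0]) - y * np.array([[0.0, 0.0], [0.0, 2.0]])
            + np.outer(g, g) / sigma)

def inner_newton(x, u, sigma, eps, max_iter=100):
    """Damped Newton on Phi(., u, sigma) until ||grad|| <= eps."""
    for _ in range(max_iter):
        g = al_grad(x, u, sigma)
        if np.linalg.norm(g) <= eps:
            break
        dx = np.linalg.solve(al_hess(x, u, sigma), -g)
        t = 1.0
        while al_value(x + t * dx, u, sigma) > al_value(x, u, sigma) and t > 1e-8:
            t *= 0.5
        x = x + t * dx
    return x

x, u, sigma, tau = np.array([0.5, 0.7]), 0.0, 0.1, 0.1
j = 0                                    # iterations since sigma last changed
for k in range(10):
    eta = sigma ** (0.1 + 0.9 * j)       # forcing sequence from the slides
    x = inner_newton(x, u, sigma, eps=min(1e-8, sigma ** (j + 1)))
    y = u - c(x) / sigma                 # multiplier estimate
    if abs(c(x)) <= eta:
        u, j = y, j + 1                  # accept: update the multipliers
    else:
        sigma, j = tau * sigma, 0        # reject: tighten the penalty
    print(f"k={k}  sigma={sigma:.2e}  u={u:+.6f}  |c(x)|={abs(c(x)):.1e}")
# u converges to y* = 1 and x to (1/2, 1/sqrt(2)) while sigma stays at 0.1:
# convergence without sigma -> 0, hence without blow-up of the conditioning.
```

Contrast this with the quadratic penalty sketch earlier: there, feasibility is bought only by driving $\sigma \to 0$; here, updating $u$ does the work and $\sigma$ stays fixed.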