MMA and GCMMA – two methods for nonlinear optimization

Krister Svanberg
Optimization and Systems Theory, KTH, Stockholm, Sweden. krille@math.kth.se

This note describes the algorithms used in the author's 2007 implementations of MMA and GCMMA in Matlab. The first versions of these methods were published in [1] and [2].

1. Considered optimization problem

Throughout this note, optimization problems of the following form are considered, where the variables are $x = (x_1,\ldots,x_n)^T \in \mathbb{R}^n$, $y = (y_1,\ldots,y_m)^T \in \mathbb{R}^m$ and $z \in \mathbb{R}$:

$$
\begin{aligned}
\text{minimize }\ & f_0(x) + a_0 z + \sum_{i=1}^{m}\left(c_i y_i + \tfrac{1}{2}d_i y_i^2\right) \\
\text{subject to }\ & f_i(x) - a_i z - y_i \le 0, \qquad i = 1,\ldots,m \\
& x \in X, \quad y \ge 0, \quad z \ge 0.
\end{aligned}
\tag{1.1}
$$

Here, $X = \{x \in \mathbb{R}^n \mid x_j^{\min} \le x_j \le x_j^{\max},\ j = 1,\ldots,n\}$, where $x_j^{\min}$ and $x_j^{\max}$ are given real numbers which satisfy $x_j^{\min} < x_j^{\max}$ for all $j$; $f_0, f_1, \ldots, f_m$ are given, continuously differentiable, real-valued functions on $X$; and $a_0$, $a_i$, $c_i$ and $d_i$ are given real numbers which satisfy $a_0 > 0$, $a_i \ge 0$, $c_i \ge 0$, $d_i \ge 0$ and $c_i + d_i > 0$ for all $i$, and also $a_i c_i > a_0$ for all $i$ with $a_i > 0$.

In (1.1), the natural optimization variables are $x_1,\ldots,x_n$, while $y_1,\ldots,y_m$ and $z$ are "artificial" optimization variables which should make it easier for the user to formulate and solve certain subclasses of problems, like least squares problems and minimax problems.

As a first example, assume that the user wants to solve a problem on the following standard form for nonlinear programming:

$$
\begin{aligned}
\text{minimize }\ & f_0(x) \\
\text{subject to }\ & f_i(x) \le 0, \qquad i = 1,\ldots,m \\
& x \in X,
\end{aligned}
\tag{1.2}
$$

where $f_0, f_1, \ldots, f_m$ are given differentiable functions and $X$ is as above. To make problem (1.1) (almost) equivalent to this problem (1.2), first let $a_0 = 1$ and $a_i = 0$ for all $i > 0$. Then $z = 0$ in any optimal solution of (1.1). Further, for each $i$, let $d_i = 1$ and $c_i$ = "a large number", so that the variables $y_i$ become "expensive". Then typically $y = 0$ in any optimal solution of (1.1), and the corresponding $x$ is an optimal solution of (1.2).
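The parameter choice just described is easy to set up in practice. The following is a minimal Matlab sketch (not taken from the note itself; all variable names and the value 1000 are illustrative assumptions):

    % Parameters of (1.1) that make it (almost) equivalent to the
    % standard NLP form (1.2):
    m  = 5;                % number of constraints f_i(x) <= 0 in (1.2)
    a0 = 1;                % coefficient of z in the objective of (1.1)
    a  = zeros(m,1);       % a_i = 0 for i > 0, so z = 0 at any optimum
    d  = ones(m,1);        % d_i = 1
    c  = 1000*ones(m,1);   % c_i = "a large number"; the y_i become expensive
    % Typically y = 0 at the optimum of (1.1) and the corresponding x
    % solves (1.2); if some y_i > 0 remains, increase c_i and re-solve.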

It should be noted that problem (1.1) always has feasible solutions, and in fact also at least one optimal solution. This holds even if the user's problem (1.2) does not have any feasible solutions, in which case some $y_i > 0$ in the optimal solution of (1.1).

As a second example, assume that the user wants to solve a min-max problem on the form

$$
\begin{aligned}
\text{minimize }\ & \max_{i=1,\ldots,p} h_i(x) \\
\text{subject to }\ & g_i(x) \le 0, \qquad i = 1,\ldots,q \\
& x \in X,
\end{aligned}
\tag{1.3}
$$

where $h_i$ and $g_i$ are given differentiable functions and $X$ is as above. It is assumed that the functions $h_i$ are non-negative on $X$ (possibly after adding a sufficiently large constant $C$ to each of them, the same $C$ for all $h_i$). For each given $x \in X$, the value of the objective function in problem (1.3) is the largest of the $p$ real numbers $h_1(x),\ldots,h_p(x)$. Problem (1.3) may equivalently be written on the following form with variables $x \in \mathbb{R}^n$ and $z \in \mathbb{R}$:

$$
\begin{aligned}
\text{minimize }\ & z \\
\text{subject to }\ & z \ge h_i(x), \qquad i = 1,\ldots,p \\
& g_i(x) \le 0, \qquad i = 1,\ldots,q \\
& x \in X, \quad z \ge 0.
\end{aligned}
\tag{1.4}
$$

To make problem (1.1) (almost) equivalent to this problem (1.4), let $m = p + q$, $f_0(x) = 0$, $f_i(x) = h_i(x)$ for $i = 1,\ldots,p$, $f_{p+i}(x) = g_i(x)$ for $i = 1,\ldots,q$, $a_0 = a_1 = \cdots = a_p = 1$, $a_{p+1} = \cdots = a_{p+q} = 0$, $d_1 = \cdots = d_m = 1$, and $c_1 = \cdots = c_m$ = "a large number".

As a third example, assume that the user wants to solve a constrained least squares problem on the form

$$
\begin{aligned}
\text{minimize }\ & \tfrac{1}{2}\sum_{i=1}^{p} (h_i(x))^2 \\
\text{subject to }\ & g_i(x) \le 0, \qquad i = 1,\ldots,q \\
& x \in X,
\end{aligned}
\tag{1.5}
$$

where $h_i$ and $g_i$ are given differentiable functions and $X$ is as above. Problem (1.5) may equivalently be written on the following form with variables $x \in \mathbb{R}^n$ and $y_1,\ldots,y_{2p} \in \mathbb{R}$:

$$
\begin{aligned}
\text{minimize }\ & \tfrac{1}{2}\sum_{i=1}^{p} \left(y_i^2 + y_{p+i}^2\right) \\
\text{subject to }\ & y_i \ge h_i(x), \qquad i = 1,\ldots,p \\
& y_{p+i} \ge -h_i(x), \qquad i = 1,\ldots,p \\
& g_i(x) \le 0, \qquad i = 1,\ldots,q \\
& y_i \ge 0 \text{ and } y_{p+i} \ge 0, \qquad i = 1,\ldots,p \\
& x \in X.
\end{aligned}
\tag{1.6}
$$

To make problem (1.1) (almost) equivalent to this problem (1.6), let $m = 2p + q$, $f_0(x) = 0$, $f_i(x) = h_i(x)$ for $i = 1,\ldots,p$, $f_{p+i}(x) = -h_i(x)$ for $i = 1,\ldots,p$, $f_{2p+i}(x) = g_i(x)$ for $i = 1,\ldots,q$, $a_0 = 1$, $a_1 = \cdots = a_m = 0$, $d_1 = \cdots = d_m = 1$, $c_1 = \cdots = c_{2p} = 0$, and $c_{2p+1} = \cdots = c_{2p+q}$ = "a large number".
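As a concrete illustration of the third example, the following Matlab sketch sets up the data of (1.1) for the least squares case; the dimensions and the value 1000 are invented for illustration, not prescribed by the note:

    % Data casting the constrained least squares problem (1.5), with
    % p residuals h_i and q constraints g_i, as an instance of (1.1):
    p = 4; q = 2; m = 2*p + q;
    a0 = 1;
    a  = zeros(m,1);                      % no z-terms in the constraints
    d  = ones(m,1);                       % quadratic penalty on all y_i
    c  = [zeros(2*p,1); 1000*ones(q,1)];  % "large number" only on the g-part
    % with f_i(x) = h_i(x) and f_{p+i}(x) = -h_i(x) for i = 1,...,p,
    % and f_{2p+i}(x) = g_i(x) for i = 1,...,q.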

2. Some practical considerations

In many applications, the constraints are on the form $g_i(x) \le g_i^{\max}$, where $g_i(x)$ stands for e.g. a certain stress, while $g_i^{\max}$ is the largest permitted value of this stress. This means that $f_i(x) = g_i(x) - g_i^{\max}$ in (1.1) as well as in (1.2). The user should then preferably scale the constraints in such a way that $1 \le g_i^{\max} \le 100$ for each $i$ (and not $g_i^{\max} = 10^{10}$). The objective function $f_0(x)$ should preferably be scaled such that $1 \le f_0(x) \le 100$ for reasonable values of the variables. The variables $x_j$ should preferably be scaled such that $0.1 \le x_j^{\max} - x_j^{\min} \le 100$ for all $j$.

Concerning the "large numbers" for the coefficients $c_i$ (mentioned above), the user should for numerical reasons try to avoid extremely large values of these coefficients (like $10^{10}$). It is better to start with reasonably large values and then, if it turns out that not all $y_i = 0$ in the optimal solution of (1.1), increase the corresponding values of $c_i$ by e.g. a factor 100 and solve the problem again, etc. If the functions and the variables have been scaled as described above, then reasonably large values of the parameters $c_i$ could be, say, $c_i = 1000$ or $10000$.

Finally, concerning the simple bound constraints $x_j^{\min} \le x_j \le x_j^{\max}$, it may sometimes be the case that some variables $x_j$ do not have any prescribed upper and/or lower bounds. In that case, it is in practice always possible to choose artificial bounds $x_j^{\min}$ and $x_j^{\max}$ such that every realistic solution $x$ satisfies the corresponding bound constraints. The user should then preferably avoid choosing $x_j^{\max} - x_j^{\min}$ unnecessarily large. It is better to try some reasonable bounds and then, if it turns out that some variable $x_j$ becomes equal to such an artificial bound in the optimal solution of (1.1), change this bound and solve the problem again (starting from the recently obtained solution), etc.
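One simple way to follow the scaling advice above is to divide each constraint by its limit so that the permitted value is of order one. A minimal Matlab sketch (the function g and its limits are invented for illustration and assume positive limits):

    % Scaling constraints g_i(x) <= gmax_i so the limits are of order 1:
    g    = @(x) [1e5*(x'*x); 1e3*sum(x)];   % raw constraint values (dummy)
    gmax = [2e6; 5e4];                      % raw limits g_i^max
    fi   = @(x) g(x)./gmax - 1;             % scaled constraints f_i(x) <= 0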

3. The ordinary MMA

MMA is a method for solving problems on the form (1.1), using the following approach: In each iteration, the current iteration point $(x^{(k)}, y^{(k)}, z^{(k)})$ is given. Then an approximating subproblem, in which the functions $f_i(x)$ are replaced by certain convex functions $\tilde{f}_i^{(k)}(x)$, is generated. The choice of these approximating functions is based mainly on gradient information at the current iteration point, but also on some parameters $u_j^{(k)}$ and $l_j^{(k)}$ ("moving asymptotes") which are updated in each iteration based on information from previous iteration points. The subproblem is solved, and the unique optimal solution becomes the next iteration point $(x^{(k+1)}, y^{(k+1)}, z^{(k+1)})$. Then a new subproblem is generated, etc.

The MMA subproblem looks as follows:

$$
\begin{aligned}
\text{minimize }\ & \tilde{f}_0^{(k)}(x) + a_0 z + \sum_{i=1}^{m}\left(c_i y_i + \tfrac{1}{2}d_i y_i^2\right) \\
\text{subject to }\ & \tilde{f}_i^{(k)}(x) - a_i z - y_i \le 0, \qquad i = 1,\ldots,m \\
& \alpha_j^{(k)} \le x_j \le \beta_j^{(k)}, \qquad j = 1,\ldots,n \\
& y_i \ge 0, \quad i = 1,\ldots,m, \qquad z \ge 0.
\end{aligned}
\tag{3.1}
$$

In this subproblem (3.1), the approximating functions $\tilde{f}_i^{(k)}(x)$ are chosen as

$$
\tilde{f}_i^{(k)}(x) = \sum_{j=1}^{n}\left(\frac{p_{ij}^{(k)}}{u_j^{(k)} - x_j} + \frac{q_{ij}^{(k)}}{x_j - l_j^{(k)}}\right) + r_i^{(k)}, \qquad i = 0,1,\ldots,m,
\tag{3.2}
$$

where

$$
p_{ij}^{(k)} = (u_j^{(k)} - x_j^{(k)})^2 \left( 1.001\left(\frac{\partial f_i}{\partial x_j}(x^{(k)})\right)^{\!+} + 0.001\left(\frac{\partial f_i}{\partial x_j}(x^{(k)})\right)^{\!-} + \frac{10^{-5}}{x_j^{\max} - x_j^{\min}} \right),
\tag{3.3}
$$

$$
q_{ij}^{(k)} = (x_j^{(k)} - l_j^{(k)})^2 \left( 0.001\left(\frac{\partial f_i}{\partial x_j}(x^{(k)})\right)^{\!+} + 1.001\left(\frac{\partial f_i}{\partial x_j}(x^{(k)})\right)^{\!-} + \frac{10^{-5}}{x_j^{\max} - x_j^{\min}} \right),
\tag{3.4}
$$

$$
r_i^{(k)} = f_i(x^{(k)}) - \sum_{j=1}^{n}\left(\frac{p_{ij}^{(k)}}{u_j^{(k)} - x_j^{(k)}} + \frac{q_{ij}^{(k)}}{x_j^{(k)} - l_j^{(k)}}\right).
\tag{3.5}
$$

Here, $\left(\frac{\partial f_i}{\partial x_j}(x^{(k)})\right)^{+}$ denotes the largest of the two numbers $\frac{\partial f_i}{\partial x_j}(x^{(k)})$ and $0$, while $\left(\frac{\partial f_i}{\partial x_j}(x^{(k)})\right)^{-}$ denotes the largest of the two numbers $-\frac{\partial f_i}{\partial x_j}(x^{(k)})$ and $0$.

The bounds $\alpha_j^{(k)}$ and $\beta_j^{(k)}$ in (3.1) and (5.1) are chosen as

$$
\alpha_j^{(k)} = \max\{\, x_j^{\min},\ l_j^{(k)} + 0.1(x_j^{(k)} - l_j^{(k)}),\ x_j^{(k)} - 0.5(x_j^{\max} - x_j^{\min}) \,\},
\tag{3.6}
$$

$$
\beta_j^{(k)} = \min\{\, x_j^{\max},\ u_j^{(k)} - 0.1(u_j^{(k)} - x_j^{(k)}),\ x_j^{(k)} + 0.5(x_j^{\max} - x_j^{\min}) \,\},
\tag{3.7}
$$

which means that the constraints $\alpha_j^{(k)} \le x_j \le \beta_j^{(k)}$ are equivalent to the following three sets of constraints:

$$
x_j^{\min} \le x_j \le x_j^{\max},
\tag{3.8}
$$

$$
-0.9(x_j^{(k)} - l_j^{(k)}) \le x_j - x_j^{(k)} \le 0.9(u_j^{(k)} - x_j^{(k)}),
\tag{3.9}
$$

$$
-0.5(x_j^{\max} - x_j^{\min}) \le x_j - x_j^{(k)} \le 0.5(x_j^{\max} - x_j^{\min}).
\tag{3.10}
$$
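The following Matlab sketch evaluates (3.3)-(3.7) for one function $f_i$; the variable names and the dummy data are assumptions for illustration, with the defaults raa0 = 1e-5, albefa = 0.1 and move = 0.5 written out as literals:

    n = 3; xmin = zeros(n,1); xmax = ones(n,1);     % dummy box data
    xk  = 0.5*ones(n,1); low = xk - 0.5; upp = xk + 0.5;
    df  = [1; -2; 0.5]; fixk = 1.0;                 % dummy grad and f_i(xk)
    dfp = max(df,0);  dfm = max(-df,0);             % (df_i/dx_j)^+ and ^-
    pk  = (upp-xk).^2 .* (1.001*dfp + 0.001*dfm + 1e-5./(xmax-xmin)); % (3.3)
    qk  = (xk-low).^2 .* (0.001*dfp + 1.001*dfm + 1e-5./(xmax-xmin)); % (3.4)
    rk  = fixk - sum(pk./(upp-xk) + qk./(xk-low));                    % (3.5)
    alfa = max([xmin, low+0.1*(xk-low), xk-0.5*(xmax-xmin)], [], 2);  % (3.6)
    beta = min([xmax, upp-0.1*(upp-xk), xk+0.5*(xmax-xmin)], [], 2);  % (3.7)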

The default rules for updating the lower asymptotes $l_j^{(k)}$ and the upper asymptotes $u_j^{(k)}$ are as follows. In the first two iterations, when $k = 1$ and $k = 2$,

$$
l_j^{(k)} = x_j^{(k)} - 0.5(x_j^{\max} - x_j^{\min}), \qquad u_j^{(k)} = x_j^{(k)} + 0.5(x_j^{\max} - x_j^{\min}).
\tag{3.11}
$$

In later iterations, when $k \ge 3$,

$$
l_j^{(k)} = x_j^{(k)} - \gamma_j^{(k)}(x_j^{(k-1)} - l_j^{(k-1)}), \qquad u_j^{(k)} = x_j^{(k)} + \gamma_j^{(k)}(u_j^{(k-1)} - x_j^{(k-1)}),
\tag{3.12}
$$

where

$$
\gamma_j^{(k)} =
\begin{cases}
0.7 & \text{if } (x_j^{(k)} - x_j^{(k-1)})(x_j^{(k-1)} - x_j^{(k-2)}) < 0, \\
1.2 & \text{if } (x_j^{(k)} - x_j^{(k-1)})(x_j^{(k-1)} - x_j^{(k-2)}) > 0, \\
1 & \text{if } (x_j^{(k)} - x_j^{(k-1)})(x_j^{(k-1)} - x_j^{(k-2)}) = 0,
\end{cases}
\tag{3.13}
$$

provided that this leads to values that satisfy

$$
l_j^{(k)} \le x_j^{(k)} - 0.01(x_j^{\max} - x_j^{\min}), \qquad l_j^{(k)} \ge x_j^{(k)} - 10(x_j^{\max} - x_j^{\min}),
$$
$$
u_j^{(k)} \ge x_j^{(k)} + 0.01(x_j^{\max} - x_j^{\min}), \qquad u_j^{(k)} \le x_j^{(k)} + 10(x_j^{\max} - x_j^{\min}).
\tag{3.14}
$$

If any of these bounds is violated, the corresponding $l_j^{(k)}$ or $u_j^{(k)}$ is put to the right hand side of the violated inequality.

Note that most of the explicit numbers in the above expressions are just default values of different parameters in the Fortran code. More precisely: the number $10^{-5}$ in (3.3) and (3.4) is the default value of the parameter raa0; the number 0.1 in (3.6) and (3.7) is the default value of the parameter albefa; the number 0.5 in (3.6) and (3.7) is the default value of the parameter move; the number 0.5 in (3.11) is the default value of the parameter asyinit; the number 0.7 in (3.13) is the default value of the parameter asydecr; and the number 1.2 in (3.13) is the default value of the parameter asyincr. All these values can be carefully changed by the user. As an example, a more conservative method is obtained by decreasing move and/or asyinit and/or asyincr.
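A minimal Matlab sketch of the update (3.11)-(3.14); the names are assumptions, with xold1 and xold2 denoting $x^{(k-1)}$ and $x^{(k-2)}$ and lowold/uppold the previous asymptotes:

    if k <= 2
        low = xk - 0.5*(xmax - xmin);                   % (3.11)
        upp = xk + 0.5*(xmax - xmin);
    else
        s = (xk - xold1).*(xold1 - xold2);
        gamma = ones(n,1);
        gamma(s < 0) = 0.7;        % oscillation: shrink (asydecr)
        gamma(s > 0) = 1.2;        % steady progress: expand (asyincr)
        low = xk - gamma.*(xold1 - lowold);             % (3.12)
        upp = xk + gamma.*(uppold - xold1);
        low = min(low, xk - 0.01*(xmax - xmin));        % bounds (3.14)
        low = max(low, xk - 10.0*(xmax - xmin));
        upp = max(upp, xk + 0.01*(xmax - xmin));
        upp = min(upp, xk + 10.0*(xmax - xmin));
    end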

4. GCMMA – the globally convergent version of MMA

The globally convergent version of MMA, from now on called GCMMA, for solving problems of the form (1.1), consists of "outer" and "inner" iterations. The index $k$ is used to denote the outer iteration number, while the index $\nu$ is used to denote the inner iteration number. Within each outer iteration, there may be zero, one, or several inner iterations. The double index $(k,\nu)$ is used to denote the $\nu$:th inner iteration within the $k$:th outer iteration.

The first iteration point is obtained by first choosing $x^{(1)} \in X$ and then choosing $y^{(1)}$ and $z^{(1)}$ such that $(x^{(1)}, y^{(1)}, z^{(1)})$ becomes a feasible solution of (1.1). This is easy.

An outer iteration of the method, going from the $k$:th iteration point $(x^{(k)}, y^{(k)}, z^{(k)})$ to the $(k+1)$:th iteration point $(x^{(k+1)}, y^{(k+1)}, z^{(k+1)})$, can be described as follows. Given $(x^{(k)}, y^{(k)}, z^{(k)})$, an approximating subproblem is generated and solved. In this subproblem, the functions $f_i(x)$ are replaced by certain convex functions $\tilde{f}_i^{(k,0)}(x)$. The optimal solution of this subproblem is denoted $(\hat{x}^{(k,0)}, \hat{y}^{(k,0)}, \hat{z}^{(k,0)})$. If $\tilde{f}_i^{(k,0)}(\hat{x}^{(k,0)}) \ge f_i(\hat{x}^{(k,0)})$ for all $i = 0,1,\ldots,m$, the next iteration point becomes $(x^{(k+1)}, y^{(k+1)}, z^{(k+1)}) = (\hat{x}^{(k,0)}, \hat{y}^{(k,0)}, \hat{z}^{(k,0)})$, and the outer iteration is completed (without any inner iterations needed).

Otherwise, an inner iteration is made, which means that a new subproblem is generated and solved at $x^{(k)}$, with new approximating functions $\tilde{f}_i^{(k,1)}(x)$ which are more conservative than $\tilde{f}_i^{(k,0)}(x)$ for those indices $i$ for which the above inequality was violated. (By this we mean that $\tilde{f}_i^{(k,1)}(x) > \tilde{f}_i^{(k,0)}(x)$ for all $x$, except for $x = x^{(k)}$, where $\tilde{f}_i^{(k,1)}(x^{(k)}) = \tilde{f}_i^{(k,0)}(x^{(k)})$.) The optimal solution of this new subproblem is denoted $(\hat{x}^{(k,1)}, \hat{y}^{(k,1)}, \hat{z}^{(k,1)})$. If $\tilde{f}_i^{(k,1)}(\hat{x}^{(k,1)}) \ge f_i(\hat{x}^{(k,1)})$ for all $i = 0,1,\ldots,m$, the next iteration point becomes $(x^{(k+1)}, y^{(k+1)}, z^{(k+1)}) = (\hat{x}^{(k,1)}, \hat{y}^{(k,1)}, \hat{z}^{(k,1)})$, and the outer iteration is completed (with one inner iteration needed). Otherwise, another inner iteration is made, which means that a new subproblem is generated and solved at $x^{(k)}$, with new approximating functions $\tilde{f}_i^{(k,2)}(x)$, etc. These inner iterations are repeated until $\tilde{f}_i^{(k,\nu)}(\hat{x}^{(k,\nu)}) \ge f_i(\hat{x}^{(k,\nu)})$ for all $i = 0,1,\ldots,m$, which always happens after a finite (usually small) number of inner iterations. Then the next iteration point becomes $(x^{(k+1)}, y^{(k+1)}, z^{(k+1)}) = (\hat{x}^{(k,\nu)}, \hat{y}^{(k,\nu)}, \hat{z}^{(k,\nu)})$, and the outer iteration is completed (with $\nu$ inner iterations needed).

It should be noted that in each inner iteration there is no need to recalculate the gradients $\nabla f_i(x^{(k)})$, since $x^{(k)}$ has not changed. Gradients of the original functions $f_i$ are calculated only once in each outer iteration. This is an important point, since the calculation of gradients is typically the most time consuming part in structural optimization.
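The overall flow can be summarized in the following Matlab sketch; evalfdf, buildapprox, solvesub, evalf and makeconservative are placeholder names for illustration, not the author's code:

    for k = 1:maxouter
        [fval, dfdx] = evalfdf(xk);          % gradients: once per outer iter
        approx = buildapprox(xk, fval, dfdx);            % functions ftilde_i
        while true
            [xhat, ftilde_at_xhat] = solvesub(approx);   % subproblem solution
            if all(evalf(xhat) <= ftilde_at_xhat)        % ftilde_i conservative
                break                                    %   at xhat: accept
            end
            approx = makeconservative(approx, xhat);     % inner iteration: no
        end                                              %   new gradients used
        xk = xhat;                           % next outer iteration point
    end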

The GCMMA subproblem looks as follows, for $k \in \{1,2,3,\ldots\}$ and $\nu \in \{0,1,2,\ldots\}$:

$$
\begin{aligned}
\text{minimize }\ & \tilde{f}_0^{(k,\nu)}(x) + a_0 z + \sum_{i=1}^{m}\left(c_i y_i + \tfrac{1}{2}d_i y_i^2\right) \\
\text{subject to }\ & \tilde{f}_i^{(k,\nu)}(x) - a_i z - y_i \le 0, \qquad i = 1,\ldots,m \\
& \alpha_j^{(k)} \le x_j \le \beta_j^{(k)}, \qquad j = 1,\ldots,n \\
& y_i \ge 0, \quad i = 1,\ldots,m, \qquad z \ge 0.
\end{aligned}
\tag{4.1}
$$

In this subproblem (4.1), the approximating functions $\tilde{f}_i^{(k,\nu)}(x)$ are chosen as

$$
\tilde{f}_i^{(k,\nu)}(x) = \sum_{j=1}^{n}\left(\frac{p_{ij}^{(k,\nu)}}{u_j^{(k)} - x_j} + \frac{q_{ij}^{(k,\nu)}}{x_j - l_j^{(k)}}\right) + r_i^{(k,\nu)}, \qquad i = 0,1,\ldots,m,
\tag{4.2}
$$

where

$$
p_{ij}^{(k,\nu)} = (u_j^{(k)} - x_j^{(k)})^2 \left( 1.001\left(\frac{\partial f_i}{\partial x_j}(x^{(k)})\right)^{\!+} + 0.001\left(\frac{\partial f_i}{\partial x_j}(x^{(k)})\right)^{\!-} + \frac{\rho_i^{(k,\nu)}}{x_j^{\max} - x_j^{\min}} \right),
\tag{4.3}
$$

$$
q_{ij}^{(k,\nu)} = (x_j^{(k)} - l_j^{(k)})^2 \left( 0.001\left(\frac{\partial f_i}{\partial x_j}(x^{(k)})\right)^{\!+} + 1.001\left(\frac{\partial f_i}{\partial x_j}(x^{(k)})\right)^{\!-} + \frac{\rho_i^{(k,\nu)}}{x_j^{\max} - x_j^{\min}} \right),
\tag{4.4}
$$

$$
r_i^{(k,\nu)} = f_i(x^{(k)}) - \sum_{j=1}^{n}\left(\frac{p_{ij}^{(k,\nu)}}{u_j^{(k)} - x_j^{(k)}} + \frac{q_{ij}^{(k,\nu)}}{x_j^{(k)} - l_j^{(k)}}\right).
\tag{4.5}
$$

Note: If $\rho_i^{(k,\nu+1)} > \rho_i^{(k,\nu)}$, then $\tilde{f}_i^{(k,\nu+1)}(x)$ is a more conservative approximation than $\tilde{f}_i^{(k,\nu)}(x)$.

Between the outer iterations, the bounds $\alpha_j^{(k)}$ and $\beta_j^{(k)}$ and the asymptotes $l_j^{(k)}$ and $u_j^{(k)}$ are updated as in the ordinary MMA; the formulas (3.6)-(3.14) still hold. The parameters $\rho_i^{(k,\nu)}$ in (4.3) and (4.4) are strictly positive and updated as described below. Within a given outer iteration $k$, the only differences between two inner iterations are the values of some of these parameters. In the beginning of each outer iteration, when $\nu = 0$, the following default values are used:

$$
\rho_i^{(k,0)} = \frac{0.1}{n} \sum_{j=1}^{n} \left|\frac{\partial f_i}{\partial x_j}(x^{(k)})\right| (x_j^{\max} - x_j^{\min}), \qquad i = 0,1,\ldots,m.
\tag{4.6}
$$

If any of the right hand sides in (4.6) is $< 10^{-6}$, then the corresponding $\rho_i^{(k,0)}$ is set to $10^{-6}$.
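A minimal Matlab sketch of the default initial values (4.6), where dfdx is assumed to be an (m+1) x n matrix whose rows are the gradients of $f_0,\ldots,f_m$ at $x^{(k)}$:

    rho0 = (0.1/n) * abs(dfdx) * (xmax - xmin);   % one value per function i
    rho0 = max(rho0, 1e-6);                       % floor from the note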

In each new inner iteration, the updating of $\rho_i^{(k,\nu)}$ is based on the solution of the most recent subproblem. Note that $\tilde{f}_i^{(k,\nu)}(x)$ may be written on the form

$$
\tilde{f}_i^{(k,\nu)}(x) = h_i^{(k)}(x) + \rho_i^{(k,\nu)} d^{(k)}(x),
$$

where $h_i^{(k)}(x)$ and $d^{(k)}(x)$ do not depend on $\rho_i^{(k,\nu)}$. Some calculations give that

$$
d^{(k)}(x) = \sum_{j=1}^{n} \frac{(u_j^{(k)} - l_j^{(k)})\,(x_j - x_j^{(k)})^2}{(u_j^{(k)} - x_j)(x_j - l_j^{(k)})(x_j^{\max} - x_j^{\min})}.
\tag{4.7}
$$

Now, let

$$
\delta_i^{(k,\nu)} = \frac{f_i(\hat{x}^{(k,\nu)}) - \tilde{f}_i^{(k,\nu)}(\hat{x}^{(k,\nu)})}{d^{(k)}(\hat{x}^{(k,\nu)})}.
\tag{4.8}
$$

Then $h_i^{(k)}(\hat{x}^{(k,\nu)}) + (\rho_i^{(k,\nu)} + \delta_i^{(k,\nu)})\, d^{(k)}(\hat{x}^{(k,\nu)}) = f_i(\hat{x}^{(k,\nu)})$, which shows that $\rho_i^{(k,\nu)} + \delta_i^{(k,\nu)}$ might be a natural value of $\rho_i^{(k,\nu+1)}$. In order to get a globally convergent method, this natural value is modified as follows:

$$
\rho_i^{(k,\nu+1)} =
\begin{cases}
\min\{\, 1.1\,(\rho_i^{(k,\nu)} + \delta_i^{(k,\nu)}),\ 10\,\rho_i^{(k,\nu)} \,\} & \text{if } \delta_i^{(k,\nu)} > 0, \\
\rho_i^{(k,\nu)} & \text{if } \delta_i^{(k,\nu)} \le 0.
\end{cases}
\tag{4.9}
$$
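A minimal Matlab sketch of the update (4.7)-(4.9) for one function $i$ after an inner iteration; the names are assumptions, with fhat and ftildehat the values $f_i(\hat{x})$ and $\tilde{f}_i(\hat{x})$:

    dk = sum( ((upp-low).*(xhat-xk).^2) ./ ...
              ((upp-xhat).*(xhat-low).*(xmax-xmin)) );   % d^(k)(xhat), (4.7)
    delta = (fhat - ftildehat)/dk;                       % (4.8)
    if delta > 0
        rho = min(1.1*(rho + delta), 10*rho);            % (4.9)
    end                                                  % else: rho unchanged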

It follows from the formulas (4.2)-(4.5) that the functions $\tilde{f}_i^{(k,\nu)}$ are always first order approximations of the original functions $f_i$ at the current iteration point, i.e.

$$
\tilde{f}_i^{(k,\nu)}(x^{(k)}) = f_i(x^{(k)}) \quad \text{and} \quad \frac{\partial \tilde{f}_i^{(k,\nu)}}{\partial x_j}(x^{(k)}) = \frac{\partial f_i}{\partial x_j}(x^{(k)}).
\tag{4.10}
$$

Since the parameters $\rho_i^{(k,\nu)}$ are always strictly positive, the functions $\tilde{f}_i^{(k,\nu)}$ are strictly convex. This implies that there is always a unique optimal solution of the GCMMA subproblem.

There are at least two approaches for solving the subproblems in MMA and in GCMMA: the dual approach and the primal-dual interior-point approach. In the primal-dual interior-point approach, a sequence of relaxed KKT conditions is solved by Newton's method. We have implemented this approach in Matlab, since all the required calculations are most naturally carried out on a matrix and vector level. This approach for solving the subproblems is described next.

5. A primal-dual method for solving the subproblems in MMA and GCMMA

To simplify the notation, the iteration indices $k$ and $\nu$ are now removed in the subproblem. Further, we let $b_i = -r_i^{(k,\nu)}$, and drop the constant $r_0^{(k,\nu)}$ from the objective function. Then the MMA/GCMMA subproblem becomes

$$
\begin{aligned}
\text{minimize }\ & g_0(x) + a_0 z + \sum_{i=1}^{m}\left(c_i y_i + \tfrac{1}{2}d_i y_i^2\right) \\
\text{subject to }\ & g_i(x) - a_i z - y_i \le b_i, \qquad i = 1,\ldots,m \\
& \alpha_j \le x_j \le \beta_j, \qquad j = 1,\ldots,n \\
& y_i \ge 0, \quad i = 1,\ldots,m, \qquad z \ge 0,
\end{aligned}
\tag{5.1}
$$

where

$$
g_i(x) = \sum_{j=1}^{n}\left(\frac{p_{ij}}{u_j - x_j} + \frac{q_{ij}}{x_j - l_j}\right), \qquad i = 0,1,\ldots,m,
\tag{5.2}
$$

and $l_j < \alpha_j < \beta_j < u_j$ for all $j$.

5.1. Optimality conditions for the MMA/GCMMA subproblem

Since the subproblem (5.1) is a regular convex problem, the KKT optimality conditions are both necessary and sufficient for an optimal solution of (5.1). In order to state these conditions, we first form the Lagrange function corresponding to (5.1):

$$
L = g_0(x) + a_0 z + \sum_{i=1}^{m}\left(c_i y_i + \tfrac{1}{2}d_i y_i^2\right) + \sum_{i=1}^{m}\lambda_i\,(g_i(x) - a_i z - y_i - b_i) + \sum_{j=1}^{n}\left(\xi_j(\alpha_j - x_j) + \eta_j(x_j - \beta_j)\right) - \sum_{i=1}^{m}\mu_i y_i - \zeta z,
\tag{5.3}
$$

where $\lambda = (\lambda_1,\ldots,\lambda_m)^T$, $\xi = (\xi_1,\ldots,\xi_n)^T$, $\eta = (\eta_1,\ldots,\eta_n)^T$, $\mu = (\mu_1,\ldots,\mu_m)^T$ and $\zeta$ are non-negative Lagrange multipliers for the different constraints in (5.1). Let

$$
\psi(x,\lambda) = g_0(x) + \sum_{i=1}^{m}\lambda_i g_i(x) = \sum_{j=1}^{n}\left(\frac{p_j(\lambda)}{u_j - x_j} + \frac{q_j(\lambda)}{x_j - l_j}\right),
\tag{5.4}
$$

where

$$
p_j(\lambda) = p_{0j} + \sum_{i=1}^{m}\lambda_i p_{ij} \quad \text{and} \quad q_j(\lambda) = q_{0j} + \sum_{i=1}^{m}\lambda_i q_{ij}.
\tag{5.5}
$$

Then the Lagrange function can be written

$$
L = \psi(x,\lambda) + (a_0 - \zeta - \lambda^T a)\,z + \sum_{i=1}^{m}\left(c_i y_i + \tfrac{1}{2}d_i y_i^2 - \lambda_i y_i - \mu_i y_i\right) + \sum_{j=1}^{n}\left(\xi_j(\alpha_j - x_j) + \eta_j(x_j - \beta_j)\right) - \sum_{i=1}^{m}\lambda_i b_i,
\tag{5.6}
$$

and then the KKT optimality conditions for the subproblem (5.1) become as follows:

$$
\begin{aligned}
& \frac{\partial\psi}{\partial x_j} - \xi_j + \eta_j = 0, && j = 1,\ldots,n && (\partial L/\partial x_j = 0) && (5.7\text{a}) \\
& c_i + d_i y_i - \lambda_i - \mu_i = 0, && i = 1,\ldots,m && (\partial L/\partial y_i = 0) && (5.7\text{b}) \\
& a_0 - \zeta - \lambda^T a = 0, && && (\partial L/\partial z = 0) && (5.7\text{c}) \\
& g_i(x) - a_i z - y_i - b_i \le 0, && i = 1,\ldots,m && \text{(primal feasibility)} && (5.7\text{d}) \\
& \lambda_i\,(g_i(x) - a_i z - y_i - b_i) = 0, && i = 1,\ldots,m && \text{(compl. slackness)} && (5.7\text{e}) \\
& \xi_j\,(\alpha_j - x_j) = 0, && j = 1,\ldots,n && \text{(compl. slackness)} && (5.7\text{f}) \\
& \eta_j\,(x_j - \beta_j) = 0, && j = 1,\ldots,n && \text{(compl. slackness)} && (5.7\text{g}) \\
& \mu_i y_i = 0, && i = 1,\ldots,m && \text{(compl. slackness)} && (5.7\text{h}) \\
& \zeta z = 0, && && \text{(compl. slackness)} && (5.7\text{i}) \\
& \alpha_j \le x_j \le \beta_j, && j = 1,\ldots,n && \text{(primal feasibility)} && (5.7\text{j}) \\
& z \ge 0 \text{ and } y_i \ge 0, && i = 1,\ldots,m && \text{(primal feasibility)} && (5.7\text{k}) \\
& \xi_j \ge 0 \text{ and } \eta_j \ge 0, && j = 1,\ldots,n && \text{(dual feasibility)} && (5.7\text{l}) \\
& \zeta \ge 0 \text{ and } \mu_i \ge 0, && i = 1,\ldots,m && \text{(dual feasibility)} && (5.7\text{m}) \\
& \lambda_i \ge 0, && i = 1,\ldots,m && \text{(dual feasibility)} && (5.7\text{n})
\end{aligned}
$$

where

$$
\frac{\partial\psi}{\partial x_j} = \frac{p_j(\lambda)}{(u_j - x_j)^2} - \frac{q_j(\lambda)}{(x_j - l_j)^2} \quad \text{and} \quad \lambda^T a = \sum_{i=1}^{m}\lambda_i a_i.
\tag{5.8}
$$

5.2. The relaxed optimality conditions for the MMA/GCMMA subproblem

When a primal-dual interior-point method is used for solving the subproblem (5.1), the zeros in the right hand sides of the complementary slackness conditions (5.7e)-(5.7i) are replaced by the negative of a small parameter $\varepsilon > 0$. Further, slack variables $s_i$ are introduced for the constraints (5.7d). The relaxed optimality conditions then become as follows:

$$
\begin{aligned}
& \frac{\partial\psi}{\partial x_j} - \xi_j + \eta_j = 0, && j = 1,\ldots,n && (5.9\text{a}) \\
& c_i + d_i y_i - \lambda_i - \mu_i = 0, && i = 1,\ldots,m && (5.9\text{b}) \\
& a_0 - \zeta - \lambda^T a = 0, && && (5.9\text{c}) \\
& g_i(x) - a_i z - y_i + s_i - b_i = 0, && i = 1,\ldots,m && (5.9\text{d}) \\
& \xi_j\,(x_j - \alpha_j) - \varepsilon = 0, && j = 1,\ldots,n && (5.9\text{e}) \\
& \eta_j\,(\beta_j - x_j) - \varepsilon = 0, && j = 1,\ldots,n && (5.9\text{f}) \\
& \mu_i y_i - \varepsilon = 0, && i = 1,\ldots,m && (5.9\text{g}) \\
& \zeta z - \varepsilon = 0, && && (5.9\text{h}) \\
& \lambda_i s_i - \varepsilon = 0, && i = 1,\ldots,m && (5.9\text{i}) \\
& x_j - \alpha_j > 0 \text{ and } \xi_j > 0, && j = 1,\ldots,n && (5.9\text{j}) \\
& \beta_j - x_j > 0 \text{ and } \eta_j > 0, && j = 1,\ldots,n && (5.9\text{k}) \\
& y_i > 0 \text{ and } \mu_i > 0, && i = 1,\ldots,m && (5.9\text{l}) \\
& z > 0 \text{ and } \zeta > 0, && && (5.9\text{m}) \\
& s_i > 0 \text{ and } \lambda_i > 0, && i = 1,\ldots,m && (5.9\text{n})
\end{aligned}
$$

For each fixed $\varepsilon > 0$, there exists a unique solution $(x, y, z, \lambda, \xi, \eta, \mu, \zeta, s)$ of these conditions. This follows because (5.9a)-(5.9n) are mathematically (but not numerically) equivalent to the KKT conditions of the following strictly convex problem in the variables $x$, $y$, $z$ and $s$:

$$
\begin{aligned}
\text{minimize }\ & g_0(x) + a_0 z + \sum_{i=1}^{m}\left(c_i y_i + \tfrac{1}{2}d_i y_i^2\right) - \varepsilon\sum_{j=1}^{n}\log(x_j - \alpha_j) - \varepsilon\sum_{j=1}^{n}\log(\beta_j - x_j) \\
& \quad - \varepsilon\sum_{i=1}^{m}\log y_i - \varepsilon\sum_{i=1}^{m}\log s_i - \varepsilon\log z \\
\text{subject to }\ & g_i(x) - a_i z - y_i + s_i = b_i, \qquad i = 1,\ldots,m \\
& (\alpha_j < x_j < \beta_j, \quad y_i > 0, \quad z > 0, \quad s_i > 0),
\end{aligned}
\tag{5.10}
$$

where the strict inequalities within parentheses will be satisfied automatically because of the logarithm terms in the objective function.

5.3. A Newton direction for the relaxed optimality conditions

Assume that a point $w = (x, y, z, \lambda, \xi, \eta, \mu, \zeta, s)$ which satisfies (5.9j)-(5.9n) is given, and assume that, starting from this point, Newton's method should be applied to (5.9a)-(5.9i). Then the following system of linear equations in $\Delta w = (\Delta x, \Delta y, \Delta z, \Delta\lambda, \Delta\xi, \Delta\eta, \Delta\mu, \Delta\zeta, \Delta s)$ should be generated and solved:

$$
\begin{bmatrix}
\Psi & 0 & 0 & G^T & -I & I & 0 & 0 & 0 \\
0 & d & 0 & -I & 0 & 0 & -I & 0 & 0 \\
0 & 0 & 0 & -a^T & 0 & 0 & 0 & -1 & 0 \\
G & -I & -a & 0 & 0 & 0 & 0 & 0 & I \\
\xi & 0 & 0 & 0 & x-\alpha & 0 & 0 & 0 & 0 \\
-\eta & 0 & 0 & 0 & 0 & \beta-x & 0 & 0 & 0 \\
0 & \mu & 0 & 0 & 0 & 0 & y & 0 & 0 \\
0 & 0 & \zeta & 0 & 0 & 0 & 0 & z & 0 \\
0 & 0 & 0 & s & 0 & 0 & 0 & 0 & \lambda
\end{bmatrix}
\begin{bmatrix}
\Delta x \\ \Delta y \\ \Delta z \\ \Delta\lambda \\ \Delta\xi \\ \Delta\eta \\ \Delta\mu \\ \Delta\zeta \\ \Delta s
\end{bmatrix}
= -
\begin{bmatrix}
\delta_x \\ \delta_y \\ \delta_z \\ \delta_\lambda \\ \delta_\xi \\ \delta_\eta \\ \delta_\mu \\ \delta_\zeta \\ \delta_s
\end{bmatrix}
$$

where $\delta_x,\ldots,\delta_s$ are defined by the left hand sides in (5.9a)-(5.9i), $\Psi$ is an $n \times n$ diagonal matrix with

$$
(\Psi)_{jj} = \frac{\partial^2\psi}{\partial x_j^2} = \frac{2\,p_j(\lambda)}{(u_j - x_j)^3} + \frac{2\,q_j(\lambda)}{(x_j - l_j)^3},
\tag{5.11}
$$

$G$ is an $m \times n$ matrix with

$$
(G)_{ij} = \frac{\partial g_i}{\partial x_j} = \frac{p_{ij}}{(u_j - x_j)^2} - \frac{q_{ij}}{(x_j - l_j)^2},
\tag{5.12}
$$

$d$ is a diagonal matrix with the vector $d = (d_1,\ldots,d_m)^T$ on the diagonal, $x-\alpha$ is a diagonal matrix with the vector $x-\alpha$ on the diagonal, etc., $e$ is a vector $(1,\ldots,1)^T$, and $I$ is a unit matrix, in each case with dimension apparent from the context.
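A minimal Matlab sketch of the quantities (5.8), (5.11) and (5.12); the storage layout is an assumption, with P and Q taken as (m+1) x n matrices of the coefficients $p_{ij}$ and $q_{ij}$ whose first row corresponds to $i = 0$, and lam the current m-vector of multipliers:

    plam = P(1,:)' + P(2:end,:)'*lam;                 % p_j(lambda), see (5.5)
    qlam = Q(1,:)' + Q(2:end,:)'*lam;                 % q_j(lambda)
    dpsi = plam./(upp-x).^2 - qlam./(x-low).^2;       % dpsi/dx_j, (5.8)
    Psi  = 2*plam./(upp-x).^3 + 2*qlam./(x-low).^3;   % diagonal of Psi, (5.11)
    G    = P(2:end,:)./((upp-x).^2).' - Q(2:end,:)./((x-low).^2).';   % (5.12)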

In the above Newton system, $\Delta\xi$, $\Delta\eta$, $\Delta\mu$, $\Delta\zeta$ and $\Delta s$ can be eliminated through

$$
\begin{aligned}
\Delta\xi &= -(x-\alpha)^{-1}\xi\,\Delta x - \xi + \varepsilon(x-\alpha)^{-1}e, && (5.13\text{a}) \\
\Delta\eta &= (\beta-x)^{-1}\eta\,\Delta x - \eta + \varepsilon(\beta-x)^{-1}e, && (5.13\text{b}) \\
\Delta\mu &= -y^{-1}\mu\,\Delta y - \mu + \varepsilon y^{-1}e, && (5.13\text{c}) \\
\Delta\zeta &= -(\zeta/z)\,\Delta z - \zeta + \varepsilon/z, && (5.13\text{d}) \\
\Delta s &= -\lambda^{-1}s\,\Delta\lambda - s + \varepsilon\lambda^{-1}e. && (5.13\text{e})
\end{aligned}
$$

Then the following system in $(\Delta x, \Delta y, \Delta z, \Delta\lambda)$ is obtained:

$$
\begin{bmatrix}
D_x & 0 & 0 & G^T \\
0 & D_y & 0 & -I \\
0 & 0 & \zeta/z & -a^T \\
G & -I & -a & -D_\lambda
\end{bmatrix}
\begin{bmatrix}
\Delta x \\ \Delta y \\ \Delta z \\ \Delta\lambda
\end{bmatrix}
= -
\begin{bmatrix}
\delta_x \\ \delta_y \\ \delta_z \\ \delta_\lambda
\end{bmatrix}
\tag{5.14}
$$

where

$$
\begin{aligned}
D_x &= \Psi + (x-\alpha)^{-1}\xi + (\beta-x)^{-1}\eta, && \text{(a diagonal matrix)} && (5.15\text{a}) \\
D_y &= d + y^{-1}\mu, && \text{(a diagonal matrix)} && (5.15\text{b}) \\
D_\lambda &= \lambda^{-1}s, && \text{(a diagonal matrix)} && (5.15\text{c}) \\
\delta_x &= \frac{\partial\psi}{\partial x} - \varepsilon(x-\alpha)^{-1}e + \varepsilon(\beta-x)^{-1}e, && && (5.15\text{d}) \\
\delta_y &= c + dy - \lambda - \varepsilon y^{-1}e, && && (5.15\text{e}) \\
\delta_z &= a_0 - \lambda^T a - \varepsilon/z, && \text{(a scalar)} && (5.15\text{f}) \\
\delta_\lambda &= g(x) - az - y - b + \varepsilon\lambda^{-1}e. && && (5.15\text{g})
\end{aligned}
$$

In (5.15d), $\frac{\partial\psi}{\partial x}$ is a vector with the $n$ components

$$
\frac{\partial\psi}{\partial x_j} = \frac{p_j(\lambda)}{(u_j - x_j)^2} - \frac{q_j(\lambda)}{(x_j - l_j)^2}.
$$

Next, $\Delta y$ can be eliminated from the system (5.14) through

$$
\Delta y = D_y^{-1}\Delta\lambda - D_y^{-1}\delta_y.
\tag{5.16}
$$

Then the following system in $(\Delta x, \Delta z, \Delta\lambda)$ is obtained:

$$
\begin{bmatrix}
D_x & 0 & G^T \\
0 & \zeta/z & -a^T \\
G & -a & -D_{\lambda y}
\end{bmatrix}
\begin{bmatrix}
\Delta x \\ \Delta z \\ \Delta\lambda
\end{bmatrix}
= -
\begin{bmatrix}
\delta_x \\ \delta_z \\ \delta_{\lambda y}
\end{bmatrix}
\tag{5.17}
$$

where

$$
\begin{aligned}
D_{\lambda y} &= D_\lambda + D_y^{-1}, && \text{(a diagonal matrix)} && (5.18\text{a}) \\
\delta_{\lambda y} &= \delta_\lambda + D_y^{-1}\delta_y. && && (5.18\text{b})
\end{aligned}
$$

In this system (5.17), either $\Delta x$ or $\Delta\lambda$ can be eliminated. If $\Delta x$ is eliminated through

$$
\Delta x = -D_x^{-1}G^T\Delta\lambda - D_x^{-1}\delta_x,
\tag{5.19}
$$

the following system in $(\Delta\lambda, \Delta z)$ is obtained:

$$
\begin{bmatrix}
D_{\lambda y} + G D_x^{-1} G^T & a \\
a^T & -\zeta/z
\end{bmatrix}
\begin{bmatrix}
\Delta\lambda \\ \Delta z
\end{bmatrix}
=
\begin{bmatrix}
\delta_{\lambda y} - G D_x^{-1}\delta_x \\ \delta_z
\end{bmatrix}
\tag{5.20}
$$

If, instead, $\Delta\lambda$ is eliminated from the system (5.17) through

$$
\Delta\lambda = D_{\lambda y}^{-1} G\,\Delta x - D_{\lambda y}^{-1} a\,\Delta z + D_{\lambda y}^{-1}\delta_{\lambda y},
\tag{5.21}
$$

the following system in $(\Delta x, \Delta z)$ is obtained:

$$
\begin{bmatrix}
D_x + G^T D_{\lambda y}^{-1} G & -G^T D_{\lambda y}^{-1} a \\
-a^T D_{\lambda y}^{-1} G & \zeta/z + a^T D_{\lambda y}^{-1} a
\end{bmatrix}
\begin{bmatrix}
\Delta x \\ \Delta z
\end{bmatrix}
=
\begin{bmatrix}
-\delta_x - G^T D_{\lambda y}^{-1}\delta_{\lambda y} \\ -\delta_z + a^T D_{\lambda y}^{-1}\delta_{\lambda y}
\end{bmatrix}
\tag{5.22}
$$

Note that the size of the system (5.20) is $(m+1) \times (m+1)$, while the size of the system (5.22) is $(n+1) \times (n+1)$. Therefore, typically, the system (5.20) should be preferred if $n > m$, while the system (5.22) should be preferred if $m > n$. In both the systems (5.20) and (5.22), it is of course possible to eliminate also the variable $\Delta z$ (and reduce the systems to $m \times m$ and $n \times n$, respectively), but for numerical reasons it is not wise to do that. As an example, if $a$ is dense (many non-zeros), then the system obtained after eliminating $\Delta z$ will be dense, even if the original system (5.20) or (5.22) is sparse.
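A minimal Matlab sketch of this size-based choice; all names are assumptions, with Dx and Dlamy the diagonals of $D_x$ and $D_{\lambda y}$ stored as column vectors, G the m x n matrix (5.12), zet the multiplier $\zeta$, and delx, delz, dellamy the right hand side quantities from (5.15) and (5.18):

    if n > m    % solve the (m+1) x (m+1) system (5.20) in (dlam, dz)
        AA  = [diag(Dlamy) + G*diag(1./Dx)*G',  a;
               a'                            , -zet/z];
        bb  = [dellamy - G*(delx./Dx); delz];
        sol = AA \ bb;
        dlam = sol(1:m);  dz = sol(m+1);
        dx   = -(G'*dlam + delx)./Dx;              % back-substitution (5.19)
    else        % solve the (n+1) x (n+1) system (5.22) in (dx, dz)
        BB  = [diag(Dx) + G'*diag(1./Dlamy)*G, -G'*(a./Dlamy);
               -(a./Dlamy)'*G                , zet/z + a'*(a./Dlamy)];
        bb  = [-delx - G'*(dellamy./Dlamy); -delz + a'*(dellamy./Dlamy)];
        sol = BB \ bb;
        dx  = sol(1:n);   dz = sol(n+1);
        dlam = (G*dx - a*dz + dellamy)./Dlamy;     % back-substitution (5.21)
    end
    % dy then follows from (5.16), and the remaining multiplier steps
    % from (5.13a)-(5.13e).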

5.4. Line search in the Newton direction

When the Newton direction $\Delta w = (\Delta x, \Delta y, \Delta z, \Delta\lambda, \Delta\xi, \Delta\eta, \Delta\mu, \Delta\zeta, \Delta s)$ has been calculated at the given point $w = (x, y, z, \lambda, \xi, \eta, \mu, \zeta, s)$, a step should be taken in that direction. A pure Newton step would be to take a step equal to the calculated direction, but such a step might lead to a point where some of the inequalities (5.9j)-(5.9n) are violated. Therefore, we first let $t$ be the largest number such that $t \le 1$, $x_j + t\Delta x_j - \alpha_j \ge 0.01(x_j - \alpha_j)$ for all $j$, $\beta_j - (x_j + t\Delta x_j) \ge 0.01(\beta_j - x_j)$ for all $j$, and $(y, z, \lambda, \xi, \eta, \mu, \zeta, s) + t\,(\Delta y, \Delta z, \Delta\lambda, \Delta\xi, \Delta\eta, \Delta\mu, \Delta\zeta, \Delta s) \ge 0.01\,(y, z, \lambda, \xi, \eta, \mu, \zeta, s)$. Further, the new point should also be in some sense better than the previous one. Therefore, we let $\tau$ be the largest of the numbers $t$, $t/2$, $t/4$, $t/8,\ldots$ such that $\|\delta(w + \tau\Delta w)\| < \|\delta(w)\|$, where $\delta(w)$ is the residual vector defined by the left hand sides in the relaxed KKT conditions (5.9a)-(5.9i), and $\|\cdot\|$ is the ordinary Euclidean norm. Such a $\tau$ always exists, since the Newton direction is a descent direction for $\|\delta(w)\|$.

5.5. The complete primal-dual algorithm

First of all, $\varepsilon^{(1)}$ and a starting point $w^{(1)} = (x^{(1)}, y^{(1)}, z^{(1)}, \lambda^{(1)}, \xi^{(1)}, \eta^{(1)}, \mu^{(1)}, \zeta^{(1)}, s^{(1)})$ which satisfies (5.9j)-(5.9n) are chosen. The following is a simple but reasonable choice:

$$
\varepsilon^{(1)} = 1, \quad x^{(1)} = \tfrac{1}{2}(\alpha + \beta), \quad y_i^{(1)} = 1, \quad z^{(1)} = 1, \quad \zeta^{(1)} = 1, \quad \lambda_i^{(1)} = 1, \quad s_i^{(1)} = 1,
$$
$$
\xi_j^{(1)} = \max\{\,1,\ 1/(x_j^{(1)} - \alpha_j)\,\}, \quad \eta_j^{(1)} = \max\{\,1,\ 1/(\beta_j - x_j^{(1)})\,\}, \quad \mu_i^{(1)} = \max\{\,1,\ c_i/2\,\}.
$$

A typical iteration, leading from the $l$:th iteration point $w^{(l)}$ to the $(l+1)$:th iteration point $w^{(l+1)}$, consists of the following steps.

Step 1: For given $\varepsilon^{(l)}$ and $w^{(l)}$ which satisfy (5.9j)-(5.9n), calculate $\Delta w^{(l)}$ as described in section 5.3 above.
Step 2: Calculate a step length $\tau^{(l)}$ as described in section 5.4 above.
Step 3: Let $w^{(l+1)} = w^{(l)} + \tau^{(l)}\Delta w^{(l)}$.
Step 4: If $\|\delta(w^{(l+1)})\| < 0.9\,\varepsilon^{(l)}$, let $\varepsilon^{(l+1)} = 0.1\,\varepsilon^{(l)}$. Otherwise, let $\varepsilon^{(l+1)} = \varepsilon^{(l)}$. Increase $l$ by 1 and go to Step 1.

The algorithm is terminated when $\varepsilon^{(l)}$ has become sufficiently small, say $\varepsilon^{(l)} \le 10^{-7}$, and $\|\delta(w^{(l+1)})\| < 0.9\,\varepsilon^{(l)}$.

6. References

[1] K. Svanberg, The method of moving asymptotes – a new method for structural optimization, International Journal for Numerical Methods in Engineering, 1987, 24, 359-373.

[2] K. Svanberg, A class of globally convergent optimization methods based on conservative convex separable approximations, SIAM Journal on Optimization, 2002, 12, 555-573.