A Primal-Dual Augmented Lagrangian Penalty-Interior-Point Filter Line Search Algorithm


Rene Kuhlmann · Christof Büskens

Received: date / Accepted: date

Abstract Interior-point methods have been shown to be very efficient for large-scale nonlinear programming. Their combination with penalty methods increases their robustness due to the regularization of the constraints caused by the penalty term. In this paper a primal-dual penalty-interior-point algorithm is proposed that is based on an augmented Lagrangian approach with an l2-exact penalty function. Global convergence is maintained by a combination of a merit function and a filter approach. Unlike other filter methods, no separate feasibility restoration phase is required. The algorithm has been implemented within the solver WORHP to study different penalty and line search options and to compare its numerical performance to two other state-of-the-art nonlinear programming algorithms, the interior-point method IPOPT and the sequential quadratic programming method of WORHP.

Keywords Nonlinear Programming · Constrained Optimization · Augmented Lagrangian · Penalty-Interior-Point Algorithm · Primal-Dual Method

Mathematics Subject Classification (2000) 49M05 · 49M15 · 49M29 · 49M37 · 90C06 · 90C26 · 90C30 · 90C51

Rene Kuhlmann, Optimization and Optimal Control, Center for Industrial Mathematics (ZeTeM), Bibliothekstr. 5, Bremen, Germany, rene.kuhlmann@math.uni-bremen.de
Christof Büskens, bueskens@math.uni-bremen.de

1 Introduction

In this paper we consider the nonlinear optimization problem

\[ \min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad c(x) = 0, \quad x \geq 0 \tag{1.1} \]

with twice continuously differentiable functions f: R^n → R and c: R^n → R^m, but the methods can easily be extended to the general case with l ≤ x ≤ g and c(x) ≤ 0 (cf. [45]). The widely used and very efficient interior-point strategy (cf. [6, 21, 34]) handles the inequality constraints by adding a barrier term to the objective function f(x) and solving a sequence of barrier problems

\[ \min_{x \in \mathbb{R}^n} \varphi_\mu(x) := f(x) - \mu \sum_{i=1}^{n} \ln x^{(i)} \quad \text{s.t.} \quad c(x) = 0 \tag{1.2} \]

with a decreasing barrier parameter µ > 0. In this paper, we consider an algorithm that penalizes both the inequality box constraints and the nonlinear equality constraints c(x), by a log-barrier term and an augmented Lagrangian term, respectively. However, unlike other augmented Lagrangian methods we do not use a quadratic l2-norm as measure of the constraint violation, but an exact l2-penalty-interior-point approach (see Chen and Goldfarb [10, 11, 12]). The resulting unconstrained reformulation is

\[ \min_{x} \Phi_{\mu,\lambda,\rho,\tau}(x) := \rho \Big( f(x) - \mu \sum_{i=1}^{n} \ln x^{(i)} + \lambda^\top c(x) \Big) + \tau \lVert c(x) \rVert_2 \tag{1.3} \]

with penalty parameters ρ ≥ 0 and τ > 0, a barrier parameter µ ≥ 0 and Lagrangian multipliers λ ∈ R^m. For improved readability the dependences on ρ, τ and λ are omitted when clear from the context and we write Φ_µ(x) := Φ_{µ,λ,ρ,τ}(x). The penalty parameter τ controls the size of the multipliers and will be updated until a certain threshold value is reached. The penalty parameter ρ balances the optimization of the Lagrangian function and the constraint violation of problem (1.2). In particular, the algorithm solves a sequence of problems (1.3) with a decreasing penalty parameter ρ until a first-order optimal point of (1.2) is found. However, unlike penalty-interior-point algorithms with a quadratic penalty function (e.g. Armand et al.
[1], Armand and Omheni [2, 3] or Yamashita and Yabe [47]) the penalty parameter ρ does not have to converge to zero. A first-order optimal point of (1.2) satisfying the Mangasarian-Fromovitz constraint qualification (MFCQ) is a stationary point of the merit function Φ_µ(x) if ρ is smaller than a certain threshold value or the duals of (1.3) equal ρλ. Using two penalty parameters is mainly motivated by a better accuracy of the implemented algorithm.
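The merit function (1.3) can be sketched in a few lines. The toy problem below (linear objective, one linear equality constraint) is a hypothetical illustration, not taken from the paper:

```python
import numpy as np

def merit(x, f, c, mu, lam, rho, tau):
    # Phi_{mu,lambda,rho,tau}(x) = rho * (f(x) - mu * sum_i ln x_i + lam^T c(x))
    #                              + tau * ||c(x)||_2,  assuming x > 0 componentwise
    cx = c(x)
    return rho * (f(x) - mu * np.sum(np.log(x)) + lam @ cx) + tau * np.linalg.norm(cx)

# Hypothetical toy problem: f(x) = x1 + x2, c(x) = x1 + x2 - 1 (feasible at x = (0.5, 0.5))
f = lambda x: x[0] + x[1]
c = lambda x: np.array([x[0] + x[1] - 1.0])
x = np.array([0.5, 0.5])
val = merit(x, f, c, mu=0.1, lam=np.array([0.0]), rho=1.0, tau=10.0)
```

At a feasible point the l2 term vanishes and only the scaled barrier objective remains, which is what Proposition 2.1 below exploits.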

It is an important feature of optimization algorithms to detect infeasibility of the given problem. In such a case a first-order optimal point of (1.2) does not exist and the penalty parameter ρ will converge to zero, resulting in the optimization of

\[ \min_{x \geq 0} \lVert c(x) \rVert_2. \tag{1.4} \]

The solution of (1.4) that is infeasible for (1.1) serves as a certificate of infeasibility. The presented algorithm follows the idea of Fletcher [18] and Byrd et al. [8] to place the penalty parameter in front of the objective function or the Lagrangian function, respectively, instead of in front of the measure of constraint violation, for better solver performance on infeasible problems. The proposed algorithm shares the following properties with other primal-dual penalty-interior-point algorithms (e.g. [1, 10, 15]): the step is a guaranteed descent direction for the merit function Φ_µ(x), and a rank-deficient Jacobian of the constraints at infeasible non-stationary points can be handled without modification of the Newton system. The latter avoids failure of global convergence, for example for the optimization problem in Wächter and Biegler [44]. An extension of the pure (quadratic) l2-penalty function are the augmented Lagrangian methods (e.g. [14, 35]). Recently, primal-dual augmented Lagrangian methods have enjoyed an increased popularity. They have been studied by Armand and Omheni [2, 3], Forsgren and Gill [20], Gertz and Gill [22], Gill and Robinson [23] and Goldfarb et al. [25]. These methods can remove the perturbation of the KKT system caused by the penalty term by an appropriate update of the Lagrangian multipliers λ. This makes it unnecessary to calculate a further unperturbed step per iteration like in Chen and Goldfarb [11, 12], and naturally leads to a quadratic rate of convergence to first-order optimal points of (1.2) and a superlinear rate in case of the nonlinear program (1.1).
Our update of the Lagrangian multipliers λ differs from other augmented Lagrangian based algorithms (e.g. [2, 3, 13]), as it does not rely on a criterion that measures the reduction of the constraint violation. Instead, it is based on the dual information and is designed to be applied as often as possible when approaching the optimal solution. For step acceptance, instead of following recent research trends to avoid penalties and a filter like in Liu and Yuan [33] or Gould and Toint [30], we combine the two, the merit function and the filter mechanism, as line search criteria, of which at least one has to indicate progress for a trial iterate. Comparable combinations have been proposed by Chen and Goldfarb [12] and Gould et al. [26, 27]. The filter, originally introduced by Fletcher and Leyffer [19], significantly increases the flexibility of the step acceptance and, thus, is widely used by nonlinear programming solvers (e.g. [4, 9, 19, 40, 45]). Global convergence has been proved for several filter methods and usually depends on a further algorithm phase: the feasibility restoration. Due to the combination with the merit function, a feasibility restoration phase, which we believe to be a drawback of the filter approach, is not necessary for global convergence.

A further advantage is that our filter entries do not depend on parameter choices, e.g. the barrier parameter µ. Other penalty-interior-point algorithms consider an l1-penalty, see e.g. Benson et al. [5], Boman [7], Curtis [15], Fletcher [18], Tits et al. [39], Gould et al. [29], Yamashita [46]. Many l1-penalty-interior-point algorithms reformulate the problem into a smooth one using additional elastic variables. However, for large-scale nonlinear programming this can be a disadvantage. Closely related are also the stabilized sequential quadratic programming methods like the works of Gill and Robinson [24] or Shen et al. [38]. The aim of this paper is to study the convergence properties of the proposed algorithm and its numerical performance. Therefore, we implemented the algorithm within the large-scale nonlinear programming solver WORHP. The paper is organized as follows. In Section 2 we describe the algorithm including the general approach of primal-dual penalty-interior-point algorithms, the step calculation and the line search. The global and local convergence of the presented algorithm are shown in Section 3 and Section 4, respectively. Finally, in Section 5 we perform numerical experiments using the CUTEst test set [28] to show the efficiency of the proposed algorithm and compare it to other solvers, in particular the interior-point method IPOPT [45] and the sequential quadratic programming algorithm of WORHP [9].

Notation. Matrices are written in uppercase and vectors in lowercase. The i-th component of a vector x is denoted by x^{(i)}. A diagonal matrix with the entries of a vector x on its diagonal has the same name in uppercase, i.e. X := diag(x). The vector e stands for a vector of all ones with appropriate dimension. The norm ‖·‖ is the Euclidean norm ‖·‖_2 unless stated differently, e.g. ‖·‖_∞ is the maximum norm.
The notation In(X) = (λ_+, λ_−, λ_0) stands for the inertia of a matrix X; in particular, λ_+, λ_− and λ_0 are the numbers of positive, negative and zero eigenvalues, respectively. We will denote the gradient of a function h_1: R^n → R at a point x_0 by ∇h_1(x_0) ∈ R^n, the Jacobian of a function h_2: R^n → R^m by ∇h_2(x_0) ∈ R^{n×m} and the subdifferential of h_1(x) at x_0 by ∂h_1(x_0).

2 Algorithm Description

2.1 The Primal-Dual Penalty-Interior-Point Approach

The first-order optimality conditions of problem (1.2) are

\[ \nabla f(x) + \nabla c(x) \lambda - \nu = 0 \tag{2.1a} \]
\[ c(x) = 0 \tag{2.1b} \]
\[ X \nu - \mu e = 0, \tag{2.1c} \]

where λ ∈ R^m and ν ∈ R^n correspond to the Lagrangian multipliers of the nonlinear equality constraints and the inequality bound constraints, respectively. In the case of µ = 0, the conditions (2.1) are the optimality conditions

of (1.1) if x ≥ 0, ν ≥ 0 are added. It is well-established to consider (2.1) as a homotopy method with µ → 0 for finding an optimal solution of (1.1). To derive the first-order optimality conditions for problem (1.3), we consider the generic function Φ(x) := ρϕ(x) + h(c(x)) with c as above, the smooth function ϕ: R^n → R and the non-smooth but convex function h: R^m → R. The first-order necessary optimality condition is that if a point x minimizes Φ(x), then there exists y ∈ ∂h such that ρ∇ϕ(x) + ∇c(x)y = 0, see Fletcher [17]. In case of problem (1.3), we have h(c) = ρλ^⊤c + τ‖c‖ and, thus,

\[ \partial h(c) = \rho \lambda + \tau \begin{cases} \dfrac{c}{\lVert c \rVert} & \text{if } \lVert c \rVert > 0 \\ \{ g \in \mathbb{R}^m \mid \lVert g \rVert \leq 1 \} & \text{if } \lVert c \rVert = 0. \end{cases} \tag{2.2} \]

We can transform the condition y ∈ ∂h to c(x) − τ^{−1}‖c(x)‖(y − ρλ) = 0 together with ‖y − ρλ‖ ≤ τ and retrieve the first-order optimality conditions for problem (1.3):

\[ \rho \nabla f(x) + \nabla c(x) y - z = 0 \tag{2.3a} \]
\[ c(x) - \tau^{-1} \lVert c(x) \rVert (y - \rho \lambda) = 0 \tag{2.3b} \]
\[ \lVert y - \rho \lambda \rVert \leq \tau \tag{2.3c} \]
\[ X z - \rho \mu e = 0 \tag{2.3d} \]

The optimality conditions (2.3) can be interpreted as scaled and perturbed optimality conditions (2.1) of the barrier problem (1.2) with a scaling factor ρ and a perturbation τ^{−1}‖c(x)‖(y − ρλ) of size smaller than or equal to ‖c(x)‖. This perturbation vanishes if either the constraint violation ‖c(x)‖ is zero or the duals are chosen to be y = ρλ. This leads to the following propositions that formally state the relation of first-order optimal points of problems (1.2) and (1.3), similarly to [15].

Proposition 2.1 Let τ > 0, ρ > 0, µ > 0 and let (x̄, ȳ, z̄) be a first-order optimal point of problem (1.3), i.e. equations (2.3) hold. If c(x̄) = 0 and (λ̄, ν̄) = (ȳ/ρ, z̄/ρ), then (x̄, λ̄, ν̄) is first-order optimal for problem (1.2).

Proposition 2.2 Let µ > 0. If (x̄, λ̄, ν̄) is a first-order optimal point of (1.2), then for all penalty parameters ρ > 0 and τ > 0, the point (x̄, ȳ, z̄) with (ȳ, z̄) = (ρλ̄, ρν̄) is first-order optimal for (1.3).
Propositions 2.1 and 2.2 validate our approach of optimizing problem (1.3), with appropriate choices of the penalty parameters ρ and τ and Lagrangian multipliers λ, for finding a first-order optimal point of the barrier problem (1.2). This in turn yields a solution of (1.1) for µ → 0. A difference

to other optimization algorithms is that our primal-dual algorithm works with the possibly scaled multipliers y instead of the multipliers λ of problem (1.2). For penalization of the constraints there are two options with different properties: increasing τ or decreasing ρ. While ρ scales the multipliers y themselves, τ scales their distance to the multipliers λ, see (2.3a) and (2.3c). If a huge penalization of the constraints is needed, i.e. τ being very big or ρ being very small, both approaches have a disadvantage. On the one hand, if the problem (1.1) is infeasible, letting τ tend to infinity would lead to divergence of the multipliers y (cf. Chen and Goldfarb [10]), which can be harmful in practical implementations. On the other hand, a very small ρ can cause difficulties in finding the optimal solution with respect to a given tolerance due to the scaling (cf. Curtis [15]). That is why we propose the combination of both. All together, the parameters ρ, τ and λ form a dual trust-region algorithm.

2.2 Step Computation

Instead of applying Newton's method directly to the optimality conditions (2.3) at an iterate (x_k, y_k), we first rewrite the feasibility condition (2.3b) to c(x_k) + σ_k(ρ_kλ_k − y_k) = 0 and fix σ_k = ‖c(x_k)‖/τ throughout the iteration. The dual trust-region condition (2.3c) is omitted for the step computation. For a given penalty parameter ρ_k, the Newton iteration then yields the linear system

\[ \begin{bmatrix} H_k & \nabla c(x_k) & -I \\ \nabla c(x_k)^\top & -\sigma_k I & 0 \\ Z_k & 0 & X_k \end{bmatrix} \begin{bmatrix} \Delta x_k \\ \Delta y_k \\ \Delta z_k \end{bmatrix} = - \begin{bmatrix} \rho_k \nabla f(x_k) + \nabla c(x_k) y_k - z_k \\ c(x_k) + \sigma_k (\rho_k \lambda_k - y_k) \\ X_k z_k - \mu \rho_k e \end{bmatrix}, \tag{2.4} \]

where H_k := ρ_k ∇²_{xx} f(x_k) + Σ_{i=1}^m y_k^{(i)} ∇²_{xx} c^{(i)}(x_k) is the Hessian of the Lagrangian function with respect to the multipliers y_k, or an approximation to it. In case of ‖c(x_k)‖ > 0 this linear equation system is equivalent to the one of a primal-dual augmented Lagrangian method with a quadratic l2-penalty function (cf. Armand and Omheni [3]) and penalty parameter adaptively set to σ_k = ‖c(x_k)‖/τ.
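In dense form, the step computation can be sketched directly: eliminate Δz from the third block row of (2.4) (the strict positivity of x keeps X invertible) and solve the remaining symmetric system in (Δx, Δy). This is an illustrative NumPy sketch of the linear algebra only; the actual implementation inside WORHP factorizes a sparse symmetric system instead.

```python
import numpy as np

def newton_step(x, y, z, grad_f, jac_c, cx, H, mu, lam, rho, tau):
    """One primal-dual step: solve the reduced system in (dx, dy), recover dz.

    jac_c is the n-by-m matrix "nabla c(x)"; H approximates
    rho * hess(f) + sum_i y_i * hess(c_i). All quantities at the current iterate."""
    n, m = x.size, cx.size
    sigma = np.linalg.norm(cx) / tau          # adaptive dual regularization
    W = H + np.diag(z / x)                    # H + X^{-1} Z
    M = np.block([[W, jac_c], [jac_c.T, -sigma * np.eye(m)]])
    r1 = rho * grad_f + jac_c @ y - mu * rho / x
    r2 = cx + sigma * (rho * lam - y)
    sol = np.linalg.solve(M, -np.concatenate([r1, r2]))
    dx, dy = sol[:n], sol[n:]
    dz = mu * rho / x - z - (z / x) * dx      # eliminated third block row
    return dx, dy, dz

# tiny hypothetical data: n = 2, m = 1
x = np.array([1.0, 1.0]); z = np.array([1.0, 1.0]); y = np.array([0.0])
lam = np.array([0.0]); cx = np.array([0.5])
jac_c = np.array([[1.0], [1.0]]); grad_f = np.array([1.0, 1.0]); H = np.eye(2)
dx, dy, dz = newton_step(x, y, z, grad_f, jac_c, cx, H, mu=0.1, lam=lam, rho=1.0, tau=10.0)
```

The computed (dx, dy, dz) satisfies all three block rows of (2.4), which can be checked by substituting back.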
Because the iterates x_k are kept strictly feasible throughout the optimization, we can eliminate the last equation of the Newton system (2.4) and solve the smaller linear equation system

\[ M_k \begin{bmatrix} \Delta x_k \\ \Delta y_k \end{bmatrix} = - \begin{bmatrix} \rho_k \nabla f(x_k) + \nabla c(x_k) y_k - \mu \rho_k X_k^{-1} e \\ c(x_k) + \sigma_k (\rho_k \lambda_k - y_k) \end{bmatrix} \tag{2.5a} \]

\[ M_k := \begin{bmatrix} H_k + X_k^{-1} Z_k & \nabla c(x_k) \\ \nabla c(x_k)^\top & -\sigma_k I \end{bmatrix} \tag{2.5b} \]

\[ \Delta z_k = \mu \rho_k X_k^{-1} e - z_k - X_k^{-1} Z_k \Delta x_k. \tag{2.5c} \]

Greif et al. [31] investigate eigenvalue bounds for the two matrices of (2.4) and (2.5) if the Hessian H_k is regularized or σ_k is constantly zero and conclude

that (2.4) is better conditioned when µ becomes very small. However, in the context of practical implementations for large-scale programming the system (2.5) is generally preferred (cf. [41, 45]). The next result shows that, provided the Hessian matrix is modified appropriately, the step computed by (2.5) will always yield a descent direction for the merit function Φ_µ.

Proposition 2.3 Let τ > 0, ρ_k ≥ 0, µ ≥ 0 and let (Δx_k, Δy_k) be a solution of the linear system (2.5). Then,

\[ \begin{aligned} \nabla \Phi_\mu(x_k)^\top \Delta x_k &= \rho_k \nabla \varphi_\mu(x_k)^\top \Delta x_k - \rho_k \lambda_k^\top c(x_k) - \tau \lVert c(x_k) \rVert + (c(x_k) + \sigma_k \rho_k \lambda_k)^\top (y_k + \Delta y_k - \rho_k \lambda_k) \\ &= \begin{cases} -\Delta x_k^\top (H_k + X_k^{-1} Z_k) \Delta x_k, & \text{if } \lVert c(x_k) \rVert = 0 \\ -\Delta x_k^\top \big( H_k + X_k^{-1} Z_k + \sigma_k^{-1} \nabla c(x_k) \nabla c(x_k)^\top \big) \Delta x_k, & \text{if } \lVert c(x_k) \rVert > 0 \end{cases} \end{aligned} \]

Furthermore, if the inertia of (2.5) satisfies In(M_k) = (n, m, 0) and the optimality conditions (2.3) are not satisfied at the current iterate (x_k, y_k, z_k), then (Δx_k, Δy_k) is a descent direction for the merit function Φ_µ at x_k, i.e. ∇Φ_µ(x_k)^⊤Δx_k < 0.

Proof The proof is similar to the one of Lemma 3.2 in [10] but is extended here for the case of λ_k not being constantly zero. We split the proof into the two cases ‖c(x_k)‖ > 0 and ‖c(x_k)‖ = 0.

Case ‖c(x_k)‖ > 0: Then we have

\[ \begin{aligned} \nabla \Phi_\mu(x_k)^\top \Delta x_k &= \rho_k \big( \nabla f(x_k) - \mu X_k^{-1} e + \nabla c(x_k) \lambda_k \big)^\top \Delta x_k + \sigma_k^{-1} c(x_k)^\top \nabla c(x_k)^\top \Delta x_k \\ &= \rho_k \nabla \varphi_\mu(x_k)^\top \Delta x_k + \sigma_k^{-1} (\sigma_k \rho_k \lambda_k + c(x_k))^\top \nabla c(x_k)^\top \Delta x_k \\ &= \rho_k \nabla \varphi_\mu(x_k)^\top \Delta x_k + \sigma_k^{-1} (\sigma_k \rho_k \lambda_k + c(x_k))^\top \big( \sigma_k (y_k + \Delta y_k - \rho_k \lambda_k) \big) - \sigma_k^{-1} (\sigma_k \rho_k \lambda_k + c(x_k))^\top c(x_k) \\ &= \rho_k \nabla \varphi_\mu(x_k)^\top \Delta x_k + (c(x_k) + \sigma_k \rho_k \lambda_k)^\top (y_k + \Delta y_k - \rho_k \lambda_k) - \rho_k \lambda_k^\top c(x_k) - \tau \lVert c(x_k) \rVert, \end{aligned} \]

where the third equality follows by applying the second equation of (2.5a). This proves the first equation of the Proposition. In addition, we also have

\[ \begin{aligned} \nabla \Phi_\mu(x_k)^\top \Delta x_k &= \rho_k \big( \nabla f(x_k) - \mu X_k^{-1} e + \nabla c(x_k) \lambda_k \big)^\top \Delta x_k + \sigma_k^{-1} c(x_k)^\top \nabla c(x_k)^\top \Delta x_k \\ &= -\Delta x_k^\top (H_k + X_k^{-1} Z_k) \Delta x_k - \big( y_k + \Delta y_k - \rho_k \lambda_k - \sigma_k^{-1} c(x_k) \big)^\top \nabla c(x_k)^\top \Delta x_k \\ &= -\Delta x_k^\top (H_k + X_k^{-1} Z_k) \Delta x_k - \sigma_k^{-1} \Delta x_k^\top \nabla c(x_k) \nabla c(x_k)^\top \Delta x_k, \end{aligned} \]

where the second equality follows from the first and the third equality from the second equation of (2.5a).

Case ‖c(x_k)‖ = 0: Then we have σ_k = 0 and ∇c(x_k)^⊤Δx_k = 0 from the second equation of (2.5a) and, thus,

\[ \lim_{t \downarrow 0} \frac{\lVert c(x_k + t \Delta x_k) \rVert - \lVert c(x_k) \rVert}{t} = \lim_{t \downarrow 0} \left( \sum_{i=1}^{m} \left( \frac{c^{(i)}(x_k + t \Delta x_k) - c^{(i)}(x_k)}{t} \right)^2 \right)^{\frac{1}{2}} = \lVert \nabla c(x_k)^\top \Delta x_k \rVert = 0. \]

Using this together with the definition of the directional derivative and the fact that c(x_k) = 0, σ_k = 0 and, again, ∇c(x_k)^⊤Δx_k = 0 yields

\[ \begin{aligned} \nabla \Phi_\mu(x_k)^\top \Delta x_k &= \lim_{t \downarrow 0} \left( \rho_k \frac{\varphi_\mu(x_k + t \Delta x_k) - \varphi_\mu(x_k)}{t} + \rho_k \lambda_k^\top \frac{c(x_k + t \Delta x_k) - c(x_k)}{t} + \tau \frac{\lVert c(x_k + t \Delta x_k) \rVert - \lVert c(x_k) \rVert}{t} \right) \\ &= \rho_k \nabla \varphi_\mu(x_k)^\top \Delta x_k + \rho_k \lambda_k^\top \nabla c(x_k)^\top \Delta x_k = \rho_k \nabla \varphi_\mu(x_k)^\top \Delta x_k \\ &= \rho_k \nabla \varphi_\mu(x_k)^\top \Delta x_k - \rho_k \lambda_k^\top c(x_k) - \tau \lVert c(x_k) \rVert + (c(x_k) + \sigma_k \rho_k \lambda_k)^\top (y_k + \Delta y_k - \rho_k \lambda_k). \end{aligned} \]

This proves the first equation of the Proposition. Furthermore, using the first equation of (2.5a) we get

\[ \nabla \Phi_\mu(x_k)^\top \Delta x_k = \rho_k \nabla \varphi_\mu(x_k)^\top \Delta x_k = -\Delta x_k^\top (H_k + X_k^{-1} Z_k) \Delta x_k - (y_k + \Delta y_k)^\top \nabla c(x_k)^\top \Delta x_k = -\Delta x_k^\top (H_k + X_k^{-1} Z_k) \Delta x_k. \]

Combining the two cases, the two equations of the Proposition have been proven. Using Lemma 3.1 of [10], In(M_k) = (n, m, 0) yields the positive definiteness of H_k + X_k^{-1}Z_k or H_k + X_k^{-1}Z_k + σ_k^{-1}∇c(x_k)∇c(x_k)^⊤ for ‖c(x_k)‖ = 0 or ‖c(x_k)‖ > 0, respectively. If the optimality conditions (2.3) are not satisfied, the step (Δx_k, Δy_k) is not zero. Thus, we have ∇Φ_µ(x_k)^⊤Δx_k < 0.

Proposition 2.3 states that the step calculated by (2.5) is a descent direction if the inertia of (2.5) satisfies In(M_k) = (n, m, 0). If this is not the case, it can be achieved by regularizing the Hessian H_k + X_k^{-1}Z_k, i.e. adding a multiple of the identity to H_k + X_k^{-1}Z_k until In(M_k) = (n, m, 0) holds (cf. [10, 42, 45]).
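The inertia-correction loop just described can be sketched as follows. Real implementations read the inertia off an LDL^T factorization of the sparse system rather than computing eigenvalues, so this dense version is only illustrative, with assumed parameter values:

```python
import numpy as np

def inertia(M):
    # numbers of positive, negative and zero eigenvalues of a symmetric matrix
    w = np.linalg.eigvalsh(M)
    tol = 1e-10 * max(1.0, np.abs(w).max())
    return (int((w > tol).sum()), int((w < -tol).sum()), int((np.abs(w) <= tol).sum()))

def regularize(W, jac_c, sigma, delta0=1e-4, growth=10.0, max_tries=60):
    """Add delta * I to the (1,1)-block W = H + X^{-1}Z until In(M) = (n, m, 0)."""
    n, m = W.shape[0], jac_c.shape[1]
    delta = 0.0
    for _ in range(max_tries):
        M = np.block([[W + delta * np.eye(n), jac_c],
                      [jac_c.T, -sigma * np.eye(m)]])
        if inertia(M) == (n, m, 0):
            return M, delta
        delta = delta0 if delta == 0.0 else growth * delta
    raise RuntimeError("inertia correction failed")

# an indefinite (1,1)-block whose second variable is uncoupled from the
# constraint forces a nonzero regularization
W = np.diag([-2.0, -1.0]); jac_c = np.array([[1.0], [0.0]])
M, delta = regularize(W, jac_c, sigma=0.1)
```

The geometric growth of delta keeps the number of trial factorizations small, mirroring the strategy cited from [10, 42, 45].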

Adding a further vI with a small v > 0 then guarantees the conditions (cf. [10, Lemma 3.1])

\[ \Delta x_k^\top (H_k + X_k^{-1} Z_k) \Delta x_k \geq v \lVert \Delta x_k \rVert^2, \quad \text{or} \tag{2.7a} \]
\[ \Delta x_k^\top \big( H_k + X_k^{-1} Z_k + \sigma_k^{-1} \nabla c(x_k) \nabla c(x_k)^\top \big) \Delta x_k \geq v \lVert \Delta x_k \rVert^2 \tag{2.7b} \]

in case of ‖c(x_k)‖ = 0 or ‖c(x_k)‖ > 0, respectively. This strategy can also be interpreted as a proximal algorithm or primal trust-region algorithm (cf. Parikh and Boyd [37]). It can only fail for this algorithm if the current iterate is feasible and the MFCQ fails to hold. In other words, this algorithm has the important property of handling problems with rank-deficient Jacobians ∇c(x_k) at infeasible non-stationary points due to the automatic dual regularization if σ_k > 0, i.e. the term −σ_k I in the (2,2)-block of M_k. The linear equation system (2.5) reveals another important feature of penalty-interior-point algorithms that rely on an augmented Lagrangian approach: choosing λ_k = y_k/ρ_k reduces the system (2.5) to a regularized Newton method applied to the optimality conditions of the barrier problem (1.2), which is relevant for the fast local convergence.

2.3 Computation of the Step Sizes

After the step computation a step size α^x_k ∈ (0, α^x_max] with α^x_max ∈ (0, 1] has to be determined to update the primal iterates by

\[ x_{k+1} \leftarrow x_k + \alpha^x_k \Delta x_k. \tag{2.8} \]

The step size has to guarantee that the iterate x_{k+1} remains strictly positive, which is done by a fraction-to-the-boundary rule with a parameter sequence {η_k} with η_k ∈ (0, 1) and η_k → 1:

\[ \alpha^x_{\max} := \max \{ \alpha \in (0, 1] \mid x_k + \alpha \Delta x_k \geq (1 - \eta_k) x_k \} \tag{2.9} \]

To measure progress towards the optimal solution, two different approaches are combined: a filter and a merit function. Checking a reduction in the merit function Φ_µ is a straightforward criterion for penalty-interior-point algorithms. In particular, for a trial iterate x_k + α^x Δx_k we check the Armijo condition

\[ \Phi_\mu(x_k + \alpha^x \Delta x_k) \leq \Phi_\mu(x_k) + \omega \alpha^x \nabla \Phi_\mu(x_k)^\top \Delta x_k \tag{2.10} \]

with ω ∈ (0, 1/2).
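The fraction-to-the-boundary rule (2.9) followed by a backtracking realization of the Armijo test (2.10) can be sketched as follows; the quadratic merit function at the end is a hypothetical stand-in for Φ_µ, used only to exercise the routine:

```python
import numpy as np

def max_step_to_boundary(x, dx, eta):
    # rule (2.9): largest alpha in (0,1] with x + alpha*dx >= (1 - eta)*x, for x > 0
    neg = dx < 0
    return 1.0 if not neg.any() else min(1.0, float((-eta * x[neg] / dx[neg]).min()))

def armijo_backtracking(phi, dphi_dx, x, dx, alpha_max, omega=1e-4, beta=0.5, max_iter=50):
    """Shrink alpha by beta until the Armijo condition (2.10) holds.
    dphi_dx is the directional derivative of the merit function along dx (negative)."""
    alpha = alpha_max
    for _ in range(max_iter):
        if phi(x + alpha * dx) <= phi(x) + omega * alpha * dphi_dx:
            return alpha
        alpha *= beta
    raise RuntimeError("line search failed")

# hypothetical merit: phi(x) = ||x||^2 with descent step dx = -x, dphi = -2||x||^2
x = np.array([1.0, 0.5])
dx = -x
alpha = armijo_backtracking(lambda v: v @ v, -2.0 * (x @ x), x, dx,
                            max_step_to_boundary(x, dx, eta=0.95))
```

Here the boundary rule caps the initial trial at 0.95, which already satisfies the Armijo condition for this toy merit function, so no backtracking occurs.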
Since Proposition 2.3 guarantees that the step Δx_k is a descent direction for the merit function Φ_µ(x_k) with an appropriate Hessian regularization, a step size α^x ∈ (0, α^x_max) that satisfies (2.10) exists. However, as pointed out by Fletcher and Leyffer [19], a bad choice of the penalty parameter can slow down the performance of the optimization method.

For example, if the penalty parameter τ is too large or ρ too small, respectively, the influence of the objective function may be damped out. Analogously this may occur to the constraint violation if τ is too small or ρ too large. To avoid this and to increase the flexibility of the step acceptance, a step is said to be acceptable if one of the two measures, the constraint violation θ(x) = ‖c(x)‖ or the objective function, improves. This is the basic idea of the filter method, which can be interpreted as checking a reduction in the merit function Φ_µ for either ρ = 0 or τ = 0. The filter is defined as the prohibited region in the two-dimensional space of constraint violation θ and objective function value f and is initially set to

\[ F_0 \leftarrow \{ (\theta, f) \in \mathbb{R}^2 \mid \theta \geq \theta_{\max} \} \tag{2.11} \]

with a maximum allowed constraint violation θ_max > 0. Unlike other filter methods we do not use ϕ_µ(x) as objective function, but the original f(x), to avoid the dependence of the filter data on specific parameter choices, e.g. µ or λ. A trial iterate x_k + α^x Δx_k is accepted by the filter if it produces a sufficient reduction of constraint violation or objective function with respect to a filter envelope δ_k > 0, regarding the current iterate x_k, i.e. if

\[ \theta(x_k + \alpha^x \Delta x_k) + \gamma_\theta \delta_k \leq \theta(x_k), \quad \text{or} \tag{2.12a} \]
\[ f(x_k + \alpha^x \Delta x_k) + \gamma_f \delta_k \leq f(x_k) \tag{2.12b} \]

holds, where γ_θ, γ_f > 0, and regarding the current filter:

\[ \big( \theta(x_k + \alpha^x \Delta x_k) + \gamma_\theta \delta_k, \; f(x_k + \alpha^x \Delta x_k) + \gamma_f \delta_k \big) \notin F_k \tag{2.13} \]

The filter envelope δ_k measures the error in the optimality conditions of problem (1.2) and is defined as follows:

\[ \delta_k := \left\lVert \big( \rho_k \nabla f(x_k) + \nabla c(x_k) y_k - z_k, \; c(x_k), \; z_k - \rho_k \mu X_k^{-1} e \big) \right\rVert \tag{2.14} \]

A switching condition, usually a mandatory feature of filter methods, is not required due to the choice of the filter envelope and the combination with the merit function. If the filter accepts the trial iterate, it is augmented by

\[ F_{k+1} \leftarrow F_k \cup \{ (\theta, f) \in \mathbb{R}^2 \mid \theta \geq \theta(x_k) \text{ and } f \geq f(x_k) \} \tag{2.15} \]

to avoid cycling of the iterates.
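The filter tests (2.12)-(2.13) and the augmentation (2.15) can be sketched with the filter stored as a list of corner entries (θ(x_k), f(x_k)); a trial pair lies in the prohibited region if some stored entry dominates it. The list representation and the parameter values are assumptions for illustration:

```python
def acceptable_to_filter(theta_new, f_new, theta_k, f_k, filt, delta,
                         gamma_theta=1e-5, gamma_f=1e-5):
    # (2.12): sufficient reduction in constraint violation or objective
    sufficient = (theta_new + gamma_theta * delta <= theta_k or
                  f_new + gamma_f * delta <= f_k)
    # (2.13): the enveloped trial pair must avoid the prohibited region
    prohibited = any(theta_new + gamma_theta * delta >= th and
                     f_new + gamma_f * delta >= fv for th, fv in filt)
    return sufficient and not prohibited

def augment_filter(filt, theta_k, f_k):
    # (2.15): prohibit the region dominated by the current iterate
    filt.append((theta_k, f_k))

filt = []                      # empty filter (ignoring the theta_max border of (2.11))
augment_filter(filt, 1.0, 2.0)
ok = acceptable_to_filter(0.5, 3.0, 0.8, 2.5, filt, delta=0.0)   # theta improved, not dominated
bad = acceptable_to_filter(1.2, 2.5, 2.0, 3.0, filt, delta=0.0)  # dominated by (1.0, 2.0)
```

Note that a trial point can be acceptable with a worse objective as long as the constraint violation improves sufficiently, which is exactly the flexibility the merit function alone does not offer.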
If neither of the two criteria, the filter or the merit function, accepts the maximum step size α^x_max, similarly to [12] we try to refine the step by a second-order-correction step

\[ M_k \begin{bmatrix} \Delta \hat{x}_k \\ \Delta \hat{y}_k \end{bmatrix} = - \begin{bmatrix} \rho_k \nabla f(x_k) + \nabla c(x_k) y_k - \mu \rho_k X_k^{-1} e \\ c(x_k + \alpha^x_{\max} \Delta x_k) - \alpha^x_{\max} \nabla c(x_k)^\top \Delta x_k \end{bmatrix}, \tag{2.16} \]

and apply a fraction-to-the-boundary rule for Δx̂_k, i.e.

\[ \hat{\alpha}^x_{\max} := \max \{ \alpha \in (0, 1] \mid x_k + \alpha \Delta \hat{x}_k \geq (1 - \eta_k) x_k \}. \tag{2.17} \]

The filter conditions for the second-order-correction that correspond to (2.12) and (2.13) are

\[ \theta(x_k + \hat{\alpha}^x_{\max} \Delta \hat{x}_k) + \gamma_\theta \delta_k \leq \theta(x_k), \quad \text{or} \tag{2.18a} \]
\[ f(x_k + \hat{\alpha}^x_{\max} \Delta \hat{x}_k) + \gamma_f \delta_k \leq f(x_k), \quad \text{and} \tag{2.18b} \]
\[ \big( \theta(x_k + \hat{\alpha}^x_{\max} \Delta \hat{x}_k) + \gamma_\theta \delta_k, \; f(x_k + \hat{\alpha}^x_{\max} \Delta \hat{x}_k) + \gamma_f \delta_k \big) \notin F_k, \tag{2.18c} \]

and the Armijo condition corresponding to (2.10) is

\[ \Phi_\mu(x_k + \hat{\alpha}^x_{\max} \Delta \hat{x}_k) \leq \Phi_\mu(x_k) + \omega \hat{\alpha}^x_{\max} \nabla \Phi_\mu(x_k)^\top \Delta x_k. \tag{2.19} \]

If one of the two accepts the step, we update x_{k+1} by x_{k+1} ← x_k + α̂^x_max Δx̂_k instead of (2.8). Since this is a strategy to improve local convergence, it is only applied if λ_k = y_k/ρ_k holds in the current iteration. Otherwise, or if the second-order-correction step does not help, the step Δx̂_k will be rejected and a backtracking line search α^x ← βα^x with β ∈ (0, 1) is applied, and the filter and Armijo conditions are checked for Δx_k and the updated step size again. The dual iterates (y_k, z_k) are updated by

\[ y_{k+1} \leftarrow y_k + \Delta y_k \tag{2.20a} \]
\[ z_{k+1} \leftarrow z_k + \alpha^z \Delta z_k \tag{2.20b} \]

with a fraction-to-the-boundary rule

\[ \alpha^z := \max \{ \alpha \in (0, 1] \mid z_k + \alpha \Delta z_k \geq (1 - \eta_k) z_k \} \tag{2.21} \]

and a further projection of z^{(i)}_{k+1} into the interval [ρ_kµ/(κ_z x^{(i)}_{k+1}), κ_z ρ_kµ/x^{(i)}_{k+1}] by

\[ z^{(i)}_{k+1} \leftarrow \max \left\{ \min \left\{ z^{(i)}_{k+1}, \, \frac{\kappa_z \rho_k \mu}{x^{(i)}_{k+1}} \right\}, \, \frac{\rho_k \mu}{\kappa_z x^{(i)}_{k+1}} \right\}, \quad i = 1, \ldots, n, \tag{2.22} \]

with κ_z > 1. While the fraction-to-the-boundary rule again guarantees the strict positivity of the iterate z_{k+1}, the projection avoids that z^{(i)}_{k+1} deviates too much from ρ_kµ/x^{(i)}_{k+1}, which has to be satisfied in an optimal solution of (1.2).
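The dual fraction-to-the-boundary rule (2.21) and the projection (2.22) are straightforward to write down; this sketch assumes strictly positive vectors x_{k+1} and z, and an illustrative value for κ_z:

```python
import numpy as np

def dual_step_size(z, dz, eta):
    # (2.21): largest alpha in (0,1] with z + alpha*dz >= (1 - eta)*z
    neg = dz < 0
    return 1.0 if not neg.any() else min(1.0, float((-eta * z[neg] / dz[neg]).min()))

def project_duals(z, x, mu, rho, kappa_z=1e10):
    # (2.22): keep each z_i inside [rho*mu/(kappa_z*x_i), kappa_z*rho*mu/x_i],
    # i.e. z_i may not deviate too much from rho*mu/x_i
    return np.clip(z, rho * mu / (kappa_z * x), kappa_z * rho * mu / x)

z = np.array([4.0, 1.0]); dz = np.array([-8.0, 1.0])
alpha_z = dual_step_size(z, dz, eta=0.5)
z_new = project_duals(np.array([1e6, 1e-6]), np.array([1.0, 1.0]),
                      mu=1.0, rho=1.0, kappa_z=100.0)
```

Only components with a negative search direction constrain the step size; components moving away from the boundary impose no limit.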

2.4 Update of Lagrangian Multipliers and Penalty Parameters

For the update of the Lagrangian multipliers λ and the penalty parameters ρ and τ we check if problem (1.3) has been solved to a certain accuracy, i.e. we check

\[ \lVert \rho_{k-1} \nabla f(x_k) + \nabla c(x_k) y_k - z_k \rVert \leq \varepsilon_k \tag{2.23a} \]
\[ \lVert c(x_k) - \tau_{k-1}^{-1} \lVert c(x_k) \rVert (y_k - \rho_{k-1} \lambda_{k-1}) \rVert \leq \varepsilon_k \tag{2.23b} \]
\[ \lVert X_k z_k - \rho_{k-1} \mu e \rVert \leq \varepsilon_k \tag{2.23c} \]

for a tolerance ε_k > 0 that converges to zero if (2.23) is satisfied infinitely many times. In the following, we define the maximum of the left-hand sides as E_{µ,ρ,τ,λ}(x_k, y_k, z_k). If the left-hand side of (2.23) is equal to zero, we have 0 ∈ ∂Φ_µ(x_k), meaning that x_k is a stationary point of the merit function Φ_µ. However, this condition is not sufficient for a first-order optimal solution of (1.2). In particular, Proposition 2.1 further requires the constraint violation ‖c(x_k)‖ to be zero. This can be checked by the omitted dual trust-region condition ‖y_k − ρ_{k−1}λ_{k−1}‖ ≤ τ_{k−1}, see (2.3c). From (2.2) we can conclude that if ‖y_k − ρ_{k−1}λ_{k−1}‖ < τ_{k−1} holds, the case ‖c(x_k)‖ = 0 in (2.2) will be true, and if ‖y_k − ρ_{k−1}λ_{k−1}‖ = τ_{k−1} we have ‖c(x_k)‖ ≥ 0. In the former case Proposition 2.1 suggests to update the Lagrangian multipliers λ, and in the latter case to update the penalty parameter ρ or τ, trying to avoid the case of ‖c(x_k)‖ > 0. In particular, we update the penalty parameter ρ or τ if

\[ \lVert y_k - \rho_{k-1} \lambda_{k-1} \rVert > \kappa_y \tau_{k-1} \tag{2.24} \]

with κ_y ∈ (0, 1) holds in addition to (2.23). Otherwise, if

\[ \lVert y_k - \rho_{k-1} \lambda_{k-1} \rVert \leq \kappa_y \tau_{k-1} \tag{2.25} \]

is satisfied, an update of the Lagrangian multipliers

\[ (\lambda_k, \nu_k) \leftarrow \rho_{k-1}^{-1} (y_k, z_k) \tag{2.26} \]

is applied. This update strategy of the multipliers λ differs from other augmented Lagrangian based algorithms, as it does not rely on a further criterion that measures a reduction in the constraint violation (see for example Armand and Omheni [2, 3] or Conn et al. [13]). The penalty parameter update is defined as

\[ \rho_k \leftarrow \chi_\rho \rho_{k-1}, \quad \text{or} \tag{2.27a} \]
\[ \tau_k \leftarrow \min\{ \tau_{\max}, \chi_\tau \tau_{k-1} \} \tag{2.27b} \]

with χ_ρ ∈ (0, 1), χ_τ > 1 and τ_max > 0. As mentioned at the end of Section 2.1, the updates of ρ and τ both may have a disadvantage.
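Once the subproblem test (2.23) holds, the decision between the multiplier update (2.26) and the penalty updates (2.27) reduces to the dual trust-region test (2.24)/(2.25). A sketch with illustrative parameter values, folding in the tau-before-rho switching preference discussed next as one possible strategy:

```python
import numpy as np

def update_after_subproblem(y, z, lam, nu, rho, tau, kappa_y=0.5,
                            chi_rho=0.1, chi_tau=10.0, tau_max=1e8):
    """Apply (2.25)/(2.26) or (2.24)/(2.27) after (2.23) has been verified."""
    if np.linalg.norm(y - rho * lam) <= kappa_y * tau:
        return y / rho, z / rho, rho, tau                 # multiplier update (2.26)
    if tau < tau_max:
        return lam, nu, rho, min(tau_max, chi_tau * tau)  # (2.27b): raise tau first
    return lam, nu, chi_rho * rho, tau                    # (2.27a): then lower rho

# duals close to rho*lam: the multipliers are accepted, penalties unchanged
lam, nu, rho, tau = update_after_subproblem(
    np.array([2.0]), np.array([3.0]), np.array([1.9]), np.array([2.9]),
    rho=1.0, tau=10.0)
# duals far from rho*lam with tau below its cap: tau is increased instead
lam2, nu2, rho2, tau2 = update_after_subproblem(
    np.array([100.0]), np.array([0.0]), np.array([0.0]), np.array([0.0]),
    rho=1.0, tau=10.0)
```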
We prefer to update τ using (2.27b) if problem (1.1) is feasible and to update ρ using (2.27a) otherwise. In the feasible case, this would avoid the scaling of the optimality conditions of (1.1) and (1.2), which can decrease the accuracy of the solver.

And in the infeasible case, this would enable the solver to satisfy the optimality conditions of the feasibility problem (1.4) to a better accuracy. Because it is unknown in advance whether (1.1) is feasible or not, we propose the following strategy for a penalty update: if τ_{k−1} < τ_max holds, then we update τ using (2.27b) and, otherwise, we update ρ using (2.27a). This way the algorithm switches to the update of ρ as it becomes more likely that problem (1.1) is infeasible. With the threshold τ_max it is possible to adjust the timing of the switch.

2.5 The Algorithm

In this section we formally state our penalty-interior-point algorithm. We split the presentation into Algorithm A (inner algorithm), which solves problem (1.2) for a given barrier parameter µ, and Algorithm B (outer algorithm), which repeatedly calls Algorithm A with decreasing µ → 0. After an initialization in Step A-1, Algorithm A enters the optimization loop, which starts with the optimality and infeasibility checks at Step A-2 and Step A-3, respectively. Subsequently, the optimality of the l2-exact augmented Lagrangian subproblem is evaluated at Step A-4 and Step A-5 to update the penalty parameters ρ or τ, or the Lagrangian multipliers (λ_k, ν_k). The search direction is calculated in Step A-7 and its step size in Step A-8. What remains is the update of the iterate in Step A-9 and the finalization of the iteration in Step A-11. The most expensive part of Algorithm A is the factorization of the linear equation system in Step A-7. Note that this factorization can be reused for the calculation of the second-order-correction in Step A-8.4. Algorithm B starts with an initialization in Step B-1. Within the optimization loop, the optimality check is performed first in Step B-2. If it is not satisfied, Algorithm A is called in Step B-3, which solves (1.2) up to a tolerance ε_{µ_j} = δ µ_j with δ ∈ (0, min{n, δ_µ}) and δ_µ > 0.
It determines how to update the iterate in Step B-4. For the barrier update in Step B-5 we apply the well-established update rules

\[ \mu_{j+1} \leftarrow \min \{ \chi_\mu \mu_j, \, \mu_j^{\kappa_\mu} \} \tag{2.28} \]
\[ \eta_{j+1} \leftarrow \max \{ \eta_{\min}, \, 1 - \mu_j^{\kappa_\eta} \} \tag{2.29} \]

with χ_µ ∈ (0, 1), κ_µ ∈ (1, 2), η_min ∈ (0, 1) and κ_η > κ_µ − 1 to eventually update µ at a superlinear rate, see [12]. Step B-6 finalizes the iteration of Algorithm B.

3 Global Convergence

This algorithm can be seen as an extension or modification of the l2-penalty-interior-point algorithm in [10, 12]. Therefore, the proof of global convergence is based on these works and we focus on the differences that arise due to the augmented Lagrangian approach, the different penalty parameter and its updates. Throughout this section we make the following assumptions.

Algorithm A Inner Algorithm

A-1: (Initialization) Set k ← 0. Choose a starting point (x_0, y_0, z_0) with x_0 > 0 and initial penalty parameters ρ_0 ∈ (0, 1] and τ_0 > 0. Select a tolerance ε_tol > 0, a sequence ζ_k → 0 monotonically with ζ_k > 0 and an l > 0. Choose a barrier parameter µ > 0 and a switching parameter τ_max > 0. Set θ_max > 0 and initialize the filter by (2.11). Choose a sequence η_k → 1 with η_k ∈ (0, 1). Furthermore, select ω ∈ (0, 1/2), κ_y ∈ (0, 1), κ_z > 1, κ_H > 0, κ_ε ∈ (0, 1), χ_ρ ∈ (0, 1), χ_τ > 1, γ_f > 0, γ_θ > 0 and β ∈ (0, 1).

A-2: (Optimality check) If (2.1) is satisfied to a tolerance ε_tol, then STOP; x_k is a first-order optimal point of (1.2).

A-3: (Infeasibility check) If (2.3) with ρ = 0 is satisfied to a tolerance ε_tol, then STOP; x_k is a first-order optimal point of (1.4) that is infeasible for (1.1) and (1.2).

A-4: (Penalty update) If (2.23), (2.24) and τ_{k−1} < τ_max are satisfied, update τ_k ← min{τ_max, χ_τ τ_{k−1}}, set ρ_k ← ρ_{k−1} and reduce the tolerance ε_k. Otherwise, if (2.23), (2.24) and τ_{k−1} ≥ τ_max are satisfied, set τ_k ← τ_{k−1}, update ρ_k ← χ_ρ ρ_{k−1} and update the tolerance ε_k. Otherwise, set τ_k ← τ_{k−1} and ρ_k ← ρ_{k−1}.

A-5: (Multiplier update) If (2.23) and (2.25) are satisfied, update the Lagrangian multipliers by (λ_k, ν_k) ← ρ_{k−1}^{−1}(y_k, z_k) and update the tolerance ε_k. Otherwise set (λ_k, ν_k) ← (λ_{k−1}, ν_{k−1}).

A-6: (Hessian regularization) Modify the Hessian H_k by adding a multiple of the identity until M_k has the correct inertia In(M_k) = (n, m, 0). If necessary, add a further κ_H I such that (2.7) holds. If the Hessian regularization fails, then STOP; x_k is feasible and the MFCQ fails to hold.

A-7: (Search direction) Compute a search direction (Δx_k, Δy_k, Δz_k) from (2.5).

A-8: (Line search)
A-8.1: Apply the fraction-to-the-boundary rule (2.9) to get α^x_max. Set α^x ← α^x_max.
A-8.2: (Filter) If (2.12) and (2.13) are satisfied, accept the current step size α^x, augment the filter by (2.15) and go to Step A-9.
A-8.3: (Merit function) If the Armijo condition (2.10) is satisfied, go to Step A-9.
A-8.4: (Second-order-correction) If λ_k ≠ y_k/ρ_k or α^x ≠ α^x_max, go to Step A-8.8. Otherwise, calculate the second-order-correction step Δx̂_k from (2.16) and α̂^x_max from (2.17).
A-8.5: (Second-order-correction / Filter) If (2.18) is satisfied, augment the filter by (2.15) and go to Step A-8.7.
A-8.6: (Second-order-correction / Merit) If the Armijo condition (2.19) is satisfied, go to Step A-8.7. Otherwise, reject Δx̂_k and go to Step A-8.8.
A-8.7: (Second-order-correction / Primal update) Update the primal iterate by x_{k+1} ← x_k + α̂^x_max Δx̂_k and go to Step A-10.
A-8.8: Reduce the step size by setting α^x ← βα^x and go back to Step A-8.2.
A-9: (Primal update) Update the primal iterate by x_{k+1} ← x_k + α^x Δx_k.
A-10: (Dual update) Use the fraction-to-the-boundary rule (2.21) to get α^z. Update the dual iterates by y_{k+1} ← y_k + Δy_k and z_{k+1} ← z_k + α^z Δz_k. Apply the dual projection (2.22).
A-11: (k increment) Set k ← k + 1 and go to Step A-2.

Assumptions G
G1. The functions f and c are real valued and twice continuously differentiable.
G2. The primal iterates {x_k} are bounded.
G3. The modified Hessians {H_k} are bounded.

We will use the definition of a Fritz-John point within the global convergence analysis. The reader is referred to [10, Definition ] for details.

Algorithm B (Outer Algorithm)

B-1: (Initialization) Set j ← 0. Choose a starting point (x_0, y_0, z_0) with x_0 > 0 and an initial barrier parameter µ_0 > 0. Select a tolerance ε_tol. Choose initial penalty parameters ρ_0 = 1 and τ_0 > 0. Furthermore, select χ_µ ∈ (0, 1), κ_µ ∈ (1, 2), δ_µ > 0, η_min ∈ (0, 1) and κ_η > κ_µ − 1.
B-2: (Optimality check) If (2.1) with µ = 0 is satisfied to a tolerance ε_tol, then STOP; x_j is a first-order optimal point of (1.1).
B-3: (Inner algorithm) Call Algorithm A with the barrier parameter µ_j, the tolerance ε_{µ_j}, the fraction-to-the-boundary parameter η_j, the initial guesses (x_j, ρ_j λ_j, ρ_j ν_j) for the iterates and ρ_j and τ_j for the penalty parameters. Algorithm A returns the solution x̄_j, multipliers (λ̄_j, ν̄_j) and penalty parameters ρ̄_j and τ̄_j.
B-4: (Iterate update) Update the iterate (x_{j+1}, λ_{j+1}, ν_{j+1}) ← (x̄_j, λ̄_j, ν̄_j) and set ρ_{j+1} ← ρ̄_j and τ_{j+1} ← τ̄_j.
B-5: (Barrier update) Update the barrier parameter µ_{j+1} by (2.28) and the fraction-to-the-boundary parameter η_{j+1} by (2.29).
B-6: (j increment) Set j ← j + 1 and go to Step B-2.

3.1 Global Convergence of the Inner Algorithm A

For the global convergence analysis we assume that Algorithm A does not terminate, i.e. it produces an infinite sequence of iterates. We write k − 1 to refer to the iteration right before iteration k of the algorithm. In the first part of the global convergence analysis we study the convergence in case the penalty parameter ρ_k is updated infinitely many times, i.e. it tends to zero. Note that we do not have to consider an infinite number of updates of the penalty parameter τ_k, as it is only updated up to the threshold τ_max. We therefore assume without loss of generality that τ_k = τ̄ ≤ τ_max for all k. We begin with a preliminary result stating that in the case of infeasible problems the multipliers (λ_k, ν_k) are updated only finitely many times.

Lemma 3.1 Suppose Assumptions G hold.
Let K be an index set such that the necessary conditions (2.23) for an update of the penalty parameter ρ_k or the multipliers (λ_k, ν_k) are satisfied. Further assume that {x_k}_K converges to a point x̄ with ‖c(x̄)‖ > 0. Then the multipliers (λ_k, ν_k) are updated finitely many times.

Proof Because x_k → x̄ for k ∈ K with ‖c(x̄)‖ > 0, we have ‖c(x_k)‖ > 0 for large k ∈ K. With (2.23b) we can conclude that then
‖ τ̄ c(x_k)/‖c(x_k)‖ − y_k + ρ_k λ_k ‖ ≤ ε_k.
It follows that y_k − ρ_k λ_k → τ̄ c(x̄)/‖c(x̄)‖ for k ∈ K, since c(x_k) is bounded by Assumptions G1 and G2 and ε_k tends to zero. Subsequently, we have ‖y_k − ρ_k λ_k‖ → τ̄. For an index k_0 ∈ K large enough we have ‖y_k − ρ_k λ_k‖ ∈ (κ_y τ̄, τ̄] for all k ∈ K with k ≥ k_0. It follows that for k ≥ k_0 condition (2.25) is violated and no update of the multipliers (λ_k, ν_k) can be performed. □
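The interplay of the update tests in Steps A-4 and A-5, which the preceding lemma exploits, can be sketched as follows. This is an illustrative sketch only: the booleans `cond_23`, `cond_24` and `cond_25` stand in for conditions (2.23)-(2.25), which are evaluated elsewhere, and the choice of which ρ enters the multiplier update is an assumption of this sketch.

```python
# Hedged sketch of the Step A-4 / A-5 parameter-update logic (illustration
# only; cond_23, cond_24, cond_25 are placeholders for conditions (2.23),
# (2.24) and (2.25)).

def update_parameters(rho, tau, lam, nu, y, z,
                      cond_23, cond_24, cond_25,
                      tau_max, chi_tau, chi_rho):
    if cond_23 and cond_24:
        if tau < tau_max:
            # Step A-4, first branch: increase tau up to the threshold
            tau = min(tau_max, chi_tau * tau)
        else:
            # Step A-4, second branch: once tau has reached tau_max,
            # start decreasing rho instead
            rho = chi_rho * rho
    if cond_23 and cond_25:
        # Step A-5: multiplier update (lam, nu) <- (y, z) / rho
        lam = [yi / rho for yi in y]
        nu = [zi / rho for zi in z]
    return rho, tau, lam, nu
```

The sketch makes the switching behavior visible: τ is driven to τ_max first, and only afterwards does ρ begin its monotone decrease, matching the two branches of Step A-4.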

Lemma 3.1 can be used to study the possible outcomes if the penalty parameter ρ_k is updated infinitely many times.

Lemma 3.2 Suppose Assumptions G hold. If the penalty parameter ρ_k is decreased infinitely many times, then there exists an index set K such that one of the following holds:
1. The sequence (λ_k, ν_k) is updated finitely many times and {(x_k, z_k/τ̄)}_K converges to a KKT point (x̄, z̄/τ̄) of the feasibility problem (1.4) that is infeasible for (1.1). The sequence {y_k}_K converges to τ̄ c(x̄)/‖c(x̄)‖.
2. The sequence {x_k}_K converges to a Fritz-John point of problem (1.1) at which the MFCQ fails to hold.

Proof Let K be an index set such that (2.23) holds, i.e. the conditions that are necessary for updates of the penalty parameter ρ_k and the multipliers λ_k. The index set K is infinite because ρ_k is updated infinitely many times by assumption. This implies ρ_k → 0 for k ∈ K since χ_ρ ∈ (0, 1). With Assumption G2 there exists an index set K̄ ⊆ K such that x_k → x̄ ≥ 0 for k ∈ K̄. We have to distinguish between two cases:

Case ‖c(x̄)‖ > 0: Then, by Lemma 3.1 there exists an index k_0 such that λ_k = λ̄ for all k ≥ k_0. With
‖ y_k − τ̄ c(x_k)/‖c(x_k)‖ ‖ = ‖ y_k − ρ_k λ̄ + ρ_k λ̄ − τ̄ c(x_k)/‖c(x_k)‖ ‖ ≤ ‖ τ̄ c(x_k)/‖c(x_k)‖ − y_k + ρ_k λ̄ ‖ + ρ_k ‖λ̄‖ ≤ ε_k + ρ_k ‖λ̄‖
it follows that y_k → τ̄ ‖c(x̄)‖^{−1} c(x̄). Now, letting k → ∞ for k ∈ K̄ in (2.23a) and (2.23c) yields ‖c(x̄)‖^{−1} ∇c(x̄) c(x̄) − τ̄^{−1} z̄ = 0 and τ̄^{−1} X̄ Z̄ e = 0, respectively. Since x̄ ≥ 0 and z̄ ≥ 0, it follows that (x̄, z̄/τ̄) is a KKT point of the feasibility problem min_{x ≥ 0} ‖c(x)‖ that is infeasible for problem (1.1).

Case ‖c(x̄)‖ = 0: Let K̂ ⊆ K̄ be the index set of iterations at which the penalty parameter ρ_k is updated. For all k ∈ K̂ condition (2.24) is satisfied and thus ‖(y_k − ρ_{k−1} λ_{k−1}, z_k)‖ > 0. Then there exists (ȳ, z̄) such that ‖(y_k − ρ_{k−1} λ_{k−1}, z_k)‖^{−1} (y_k, z_k) → (ȳ, z̄) with ‖(ȳ, z̄)‖ = 1. Dividing (2.23a) and (2.23c) by ‖(y_k − ρ_{k−1} λ_{k−1}, z_k)‖ and letting k → ∞ for k ∈ K̂ yields ∇c(x̄) ȳ − z̄ = 0 and X̄ Z̄ e = 0, since ρ_k and ε_k converge to zero for k ∈ K̂. Because of ‖(ȳ, z̄)‖ = 1 and c(x̄) = 0, it follows that (x̄, ȳ, z̄) is a Fritz-John point of problem (1.1) at which the MFCQ fails to hold. □
We next analyze the convergence if the penalty parameter ρ_k is bounded away from zero but the duals (λ_k, ν_k) are updated infinitely many times.

Lemma 3.3 Suppose Assumptions G hold. If ρ_k is updated finitely many times and (λ_k, ν_k) is updated infinitely many times, then there exists an index set K such that one of the following holds:
1. The sequence {(y_k, z_k)}_K is bounded and {(x_k, λ_k, ν_k)}_K converges to a first-order optimal point of problem (1.2).
2. The sequence {(y_k, z_k)}_K is unbounded and {x_k}_K converges to a Fritz-John point of problem (1.1) at which the MFCQ fails to hold.

Proof Let K be an index set such that (λ_k, ν_k) = (y_k, z_k)/ρ_k for all k ∈ K. The index set K is infinite because (λ_k, ν_k) is updated infinitely many times by assumption. This implies that for all k ∈ K condition (2.23) must be satisfied. With Assumption G2 and the assumption that ρ_k is updated finitely many times, there exists an index set K̄ ⊆ K such that x_k → x̄ ≥ 0 and ρ_k = ρ̄ for k ∈ K̄. Furthermore, we have c(x̄) = 0, since otherwise Lemma 3.1 would be contradicted. We now have to distinguish between two cases:

Case {(y_k, z_k)}_K̄ is bounded: Then there exists an index set K̂ ⊆ K̄ such that (y_k, z_k) → (ȳ, z̄) for k ∈ K̂. This implies λ_k → λ̄ := ȳ/ρ̄ and ν_k → ν̄ := z̄/ρ̄ for k ∈ K̂. Dividing (2.23a) and (2.23c) by ρ̄ and letting k → ∞ yields
∇f(x̄) + ∇c(x̄) λ̄ − ν̄ = 0 and X̄ ν̄ − µe = 0.
Together with c(x̄) = 0 it follows that (x̄, λ̄, ν̄) is a first-order optimal point of (1.2).

Case {(y_k, z_k)}_K̄ is unbounded: Then there exists an infinite index set K̂ ⊆ K̄ such that ‖(y_k, z_k)‖ > 0 and (y_k, z_k)/‖(y_k, z_k)‖ → (ȳ, z̄) for k ∈ K̂ with ‖(ȳ, z̄)‖ = 1. Dividing (2.23a) and (2.23c) by ‖(y_k, z_k)‖ and letting k → ∞ for k ∈ K̂ yields ∇c(x̄) ȳ − z̄ = 0 and X̄ Z̄ e = 0. Because of ‖(ȳ, z̄)‖ = 1 and c(x̄) = 0, it follows that (x̄, ȳ, z̄) is a Fritz-John point of problem (1.1) at which the MFCQ fails to hold. □

Next, we analyze the situation in which the filter accepts the trial iterates infinitely many times.

Lemma 3.4 Suppose Assumptions G hold. Let the penalty parameter ρ_k be updated finitely many times and let all feasible limit points satisfy the MFCQ.
If the filter is augmented infinitely many times, then the multipliers {(λ_k, ν_k)} are updated infinitely many times.

Proof Let K be an index set of iterations at which the filter accepts the trial iterate, i.e. at which it is augmented, and for which x_k → x̄ and ρ_k = ρ̄ hold for k ∈ K. The latter exists due to Assumption G2 and the assumption that ρ_k is updated only finitely many times.

We next show by contradiction that δ_k of (2.14) converges to zero. This part of the proof is similar to [36, Theorem 5.1]. Assume that there exists an infinite K̄ ⊆ K such that δ_k ≥ ε > 0 for k ∈ K̄. Due to Assumptions G1 and G2 the sequences {f(x_k)}_K̄ and {‖c(x_k)‖}_K̄ are bounded. This implies that the area in which the filter entries (f(x_k), ‖c(x_k)‖) for k ∈ K̄ are located is bounded by F := {(f, θ) : f_L ≤ f ≤ f_U, 0 ≤ θ ≤ θ_U}, where f_L and f_U are the lower and upper bounds of {f(x_k)}_K̄, respectively, and θ_U is the upper bound of {‖c(x_k)‖}_K̄. With every filter augmentation an area F_{k+1} \ F_k is added to the filter, which is at least of size δ_k² ≥ ε². Because of the monotonicity F_k ⊆ F_{k+1}, this is a contradiction to the boundedness of F. It follows that δ_k → 0 and thus
ρ̄ ∇f(x_k) + ∇c(x_k) y_k − z_k → 0, (3.1a)
c(x_k) → 0, (3.1b)
z_k − ρ̄ µ X_k^{−1} e → 0. (3.1c)
We now have to distinguish between two cases:

Case {(y_k, z_k)}_K is bounded: Then, dividing (3.1a) by ρ̄, multiplying (3.1c) by ρ̄^{−1} X_k, which is bounded, and letting k → ∞ for k ∈ K in (3.1) yields that (2.23) will eventually be satisfied for every ε_k > 0. Since ρ_k is updated only finitely many times by assumption, (2.24) must be violated and (2.25) holds. This implies that {(λ_k, ν_k)}_K is updated infinitely many times.

Case {(y_k, z_k)}_K is unbounded: Then there exists an infinite index set K̄ ⊆ K such that ‖(y_k, z_k)‖ > 0 and (y_k, z_k)/‖(y_k, z_k)‖ → (ȳ, z̄) for k ∈ K̄ with ‖(ȳ, z̄)‖ = 1. Dividing (3.1a) by ‖(y_k, z_k)‖, multiplying (3.1c) by ‖(y_k, z_k)‖^{−1} X_k and letting k → ∞ for k ∈ K̄ in (3.1) yields ∇c(x̄) ȳ − z̄ = 0, c(x̄) = 0 and X̄ z̄ = 0. Because of ‖(ȳ, z̄)‖ = 1, it follows that (x̄, ȳ, z̄) is a Fritz-John point of problem (1.1) at which the MFCQ fails to hold, a contradiction. □

In the remainder of the global convergence analysis we study the case in which both the penalty parameter ρ_k and the filter are updated only finitely many times.

Lemma 3.5 Suppose Assumptions G hold. Further assume that the penalty parameter ρ_k and the duals (λ_k, ν_k) are updated finitely many times and that the filter is augmented finitely many times.
Then {x_k} is bounded away from zero and {z_k} is bounded.

Proof The proof is by contradiction and similar to [10, Lemma 3.7], since λ_k is assumed to be bounded. Let K be an infinite index set of iterations at which the trial step is accepted by the Armijo condition (2.10). Now assume that there exists an index j ∈ {1,..., n} such that x_k^{(j)} → 0 for k ∈ K.

There exist an index k_0 and constants ρ̄ and λ̄ such that ρ_k = ρ̄ and λ_k = λ̄ hold for k ∈ K with k ≥ k_0. Then, on the one hand, for k ≥ k_0 it follows from the Armijo condition (2.10) that
Φ_{µ,λ_k,ρ_k,τ̄}(x_k) = Φ_{µ,λ̄,ρ̄,τ̄}(x_k) ≤ Φ_{µ,λ̄,ρ̄,τ̄}(x_{k_0}). (3.2)
On the other hand, however, as {f(x_k)}_K and {‖c(x_k)‖}_K are bounded due to Assumptions G1 and G2, it follows that {Φ_{µ,λ̄,ρ̄,τ̄}(x_k)}_K → ∞ due to the barrier term, a contradiction to (3.2). Thus, {x_k} is bounded away from zero, and together with (2.22) it follows that {z_k} is bounded. □

Lemma 3.6 Suppose Assumptions G hold, the penalty parameter ρ_k and the duals (λ_k, ν_k) are updated finitely many times and the filter is augmented finitely many times. Further assume that all feasible limit points satisfy the MFCQ. Then the sequences {(x_k, y_k, z_k)}, {(Δx_k, Δy_k, Δz_k)} and {(Δx̄_k, Δȳ_k)} are bounded.

Proof Consider the linear equation system
ρ_k^{−1} H_k Δx_k + ∇c(x_k) λ − ν = −∇f(x_k), (3.3a)
∇c(x_k)ᵀ Δx_k − ρ_k σ_k λ = −c(x_k) − ρ_k σ_k λ_k, (3.3b)
ν = µ X_k^{−1} e − ρ_k^{−1} X_k^{−1} Z_k Δx_k, (3.3c)
which is equal to (2.5) and to the linear equation system in [12, Section 1.1] perturbed by (0_{1×n}, σ_k ρ_k λ_k) and scaled with ρ_k. Since ρ_k is bounded away from zero and {H_k} is bounded by Assumption G3, the matrix ρ_k^{−1}(H_k + X_k^{−1} Z_k) is bounded by Lemma 3.5. Furthermore, due to the regularization strategy in Step A-6, condition (2.7) holds. Then, since the right-hand side of (3.3) is bounded due to Assumptions G1 and G2 and the assumption that λ_k is updated finitely many times, we can apply the proof of Lemma 3.3 of [12], which shows the boundedness of the sequences {(x_k, y_k, z_k)}, {(Δx_k, Δy_k, Δz_k)} and {(Δx̄_k, Δȳ_k)}. □

Lemma 3.7 Suppose Assumptions G hold. Let the penalty parameter ρ_k be updated finitely many times and let all feasible limit points satisfy the MFCQ. If the filter is augmented finitely many times, then the multipliers {(λ_k, ν_k)} are updated infinitely many times.

Proof The proof is by contradiction. Assume that (λ_k, ν_k) are updated finitely many times.
By Lemma 3.5, Lemma 3.6, Assumption G2 and the assumption that ρ_k is updated finitely many times, there exists an index set K such that x_k → x̄ > 0, Δx_k → Δx̂ and ρ_k = ρ̄ for k ∈ K. Then, by the same proof as in [12, Lemma 3.8], it can be shown that Δx̂ = 0. This holds analogously in case the second-order-correction step Δx̄_k is used. Due to the boundedness of all components in (2.5) from Assumptions G1 and G2 and Lemmas 3.5 and 3.6, this implies that the left-hand sides of (2.23) converge to zero, i.e. for every ε_k > 0 there is an iteration k ∈ K such that (2.23) holds.

Since ρ_k = ρ̄ for all k ∈ K, condition (2.24) must be violated and thus (2.25) must be satisfied. It follows that (λ_k, ν_k) is updated infinitely many times in K, contradicting the assumption. □

Combining the results of Lemma 3.4 with Lemma 3.7 yields: if the penalty parameter ρ_k does not tend to zero, the iterates either converge to a Fritz-John point that fails to satisfy the MFCQ, or the algorithm updates the Lagrangian multipliers (λ_k, ν_k) infinitely many times. Together with Lemma 3.3, the latter establishes convergence to a first-order optimal point. However, if ρ_k tends to zero but the constraint violation does not, then by Lemma 3.2 the algorithm converges to a first-order optimal point of the feasibility problem (1.4). The global convergence result of Algorithm A for a fixed barrier parameter µ is formally stated in the following theorem.

Theorem 3.8 Suppose Assumptions G hold and Algorithm A generates an infinite sequence of iterates. Then there exists an index set K such that one of the following holds:
1. The penalty parameter {ρ_k}_K tends to zero, the multipliers {(λ_k, ν_k)}_K are updated finitely many times and {(x_k, z_k/τ̄)}_K converges to a KKT point (x̄, z̄/τ̄) of the feasibility problem (1.4) that is infeasible for (1.1). The sequence {y_k}_K converges to τ̄ c(x̄)/‖c(x̄)‖.
2. The penalty parameter {ρ_k}_K is bounded away from zero, the multipliers {(λ_k, ν_k)}_K are updated infinitely many times and {(x_k, λ_k, ν_k)}_K converges to a first-order optimal point of problem (1.2).
3. The sequence {x_k}_K converges to a Fritz-John point of problem (1.1) at which the MFCQ fails to hold.

3.2 Global Convergence of the Outer Algorithm B

The global convergence result of Algorithm B is the same as that of Theorem 3.13 in [10].

Theorem 3.9 Suppose Algorithm B generates an infinite sequence of iterates. Further suppose that Assumption G2 holds with the same bound for every µ_j and that Assumptions G1 and G3 hold for every µ_j.
Let {µ_j} be a sequence with µ_j → 0 and assume that Algorithm A terminates successfully for every µ_j. Then there exists an index set K for which one of the following holds:
1. The sequence {(λ_j, ν_j)}_K is bounded and {(x_j, λ_j, ν_j)}_K converges to a first-order optimal point of (1.1).
2. The sequence {(λ_j, ν_j)}_K is unbounded and {x_j}_K converges to a Fritz-John point of problem (1.1) at which the MFCQ fails to hold.

In summary, Algorithm B either terminates successfully by finding an optimal solution of (1.1), or the iterates converge within Algorithm A to an optimal solution of (1.4) that is infeasible for (1.1), or to a Fritz-John point at which the MFCQ fails to hold. While the second outcome serves as a certificate of local infeasibility, the last outcome indicates that locally there may be no feasible first-order optimal point.
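The interplay of the two algorithms summarized above can be sketched as a simple driver loop. This is an illustrative sketch only: `inner_solve` is a hypothetical placeholder for Algorithm A, and the update rules (2.28) and (2.29) are replaced by simple stand-ins (a fixed reduction factor χ_µ and a lower-bounded fraction-to-the-boundary parameter).

```python
# Hedged sketch of the outer Algorithm B (illustration only). inner_solve
# stands for Algorithm A; the barrier and fraction-to-the-boundary updates
# (2.28)/(2.29) are replaced by simple placeholder rules.

def outer_loop(w0, inner_solve, mu0=0.1, eps_tol=1e-8, chi_mu=0.2,
               eta_min=0.9, max_outer=100):
    w, mu, eta = w0, mu0, eta_min
    for _ in range(max_outer):
        w, status = inner_solve(w, mu, eta)      # Step B-3
        if status == "optimal" and mu <= eps_tol:
            return w, "optimal"                  # Step B-2 (mu ~ 0)
        if status in ("infeasible", "fritz-john"):
            return w, status                     # certificates from Algorithm A
        mu = chi_mu * mu                         # placeholder for (2.28)
        eta = max(eta_min, 1.0 - mu)             # placeholder for (2.29)
    return w, "max-iterations"
```

The three possible return values mirror the three outcomes of the summary above: an optimal solution of (1.1), a certificate of local infeasibility, or a Fritz-John point at which the MFCQ fails.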

4 Fast Local Convergence

The local convergence of the proposed algorithm resembles that of Chen and Goldfarb [12], since both algorithms eventually switch to a regularized Newton method (in [12] by switching explicitly to that system, here by an update of the Lagrangian multipliers λ). Therefore, we briefly present the two extensions of the local convergence analysis of [12] that are required to cover the presented penalty-interior-point algorithm based on the augmented Lagrangian function: first, the acceptance of the full second-order-correction step by the Armijo condition (2.19), and second, the update of the Lagrangian multipliers (λ, ν) in every iteration, both near the optimal solution. Let (x̄, λ̄, ν̄) be a first-order optimal point of problem (1.1). We make the following assumptions for the local convergence analysis:

Assumptions L
L1. The functions f and c are real valued and twice continuously differentiable, and the Hessian matrices ∇²f(x) and ∇²c^{(i)}(x) for i = 1,..., m are locally Lipschitz continuous at x̄.
L2. The linear independence constraint qualification (LICQ) holds: the gradients of the active constraints, i.e. e_i for i ∈ B := {j = 1,..., n : x̄^{(j)} = 0} and ∇c^{(i)}(x̄) for i = 1,..., m, are linearly independent.
L3. The second-order sufficient conditions (SOSC) hold: there exists v > 0 such that
dᵀ( ∇²_{xx} f(x̄) + Σ_{i=1}^{m} ȳ^{(i)} ∇²_{xx} c^{(i)}(x̄) ) d ≥ v ‖d‖²
for all d ∈ Rⁿ with d^{(i)} = 0 for all i ∈ B and ∇c(x̄)ᵀ d = 0.
L4. Strict complementarity holds: x̄ + z̄ > 0.

4.1 Local Convergence of Algorithm A

In the case of convergence to a first-order optimal point, we know from Theorem 3.8 that the penalty parameter ρ_k is bounded away from zero. For simplicity we assume w.l.o.g. that ρ_k = 1 and τ_k = τ̄ for all k large enough. This also implies that λ_k = y_k whenever the multipliers are updated in an iteration.
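The quadratic local rate at the heart of this section can be observed on a one-dimensional toy barrier subproblem. This is our own illustrative example, not taken from the paper: Newton's method applied to φ_µ(x) = x − µ ln x, whose minimizer is x(µ) = µ; a short calculation gives the error recursion e_{+} = e²/µ, i.e. local quadratic convergence as in Lemma 4.1.

```python
# Toy illustration (our own example): Newton's method on the barrier
# function phi(x) = x - mu*ln(x), minimized at x(mu) = mu. One Newton step
# maps x to 2x - x^2/mu, so the error satisfies |x_new - mu| = (x - mu)^2/mu:
# quadratic convergence near the solution.

def newton_barrier(x, mu, iters):
    """Run `iters` Newton steps on phi_mu; return final x and error history."""
    errs = []
    for _ in range(iters):
        grad = 1.0 - mu / x        # phi'(x)
        hess = mu / x ** 2         # phi''(x) > 0 for x > 0
        x = x - grad / hess        # Newton step
        errs.append(abs(x - mu))
    return x, errs
```

Starting at x = 0.6 with µ = 0.5, the errors shrink roughly as 0.02, 8e-4, 1.3e-6, 3e-12: each error is the square of the previous one divided by µ.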
In this section we consider µ to be sufficiently small and w(µ) := (x(µ), y(µ), z(µ)) to be an optimal solution of (1.2) in a neighborhood of w̄ that converges to w̄ for µ → 0. Due to the LICQ of Assumption L2 the multipliers of w(µ) are unique. For improved readability, we introduce the notation w_k := (x_k, y_k, z_k) for the iterate and, for the steps, Δw_k := (Δx_k, Δy_k, Δz_k) and Δw̄_k := (Δx̄_k, Δȳ_k, Δz̄_k). The first result shows the quadratic convergence rate of the steps w_k + Δw_k and w_k + Δw̄_k towards w(µ). It is proven in [12, Theorem 5.1]¹. Let N(w̄) be

¹ Note that if λ_k = y_k/ρ_k and ρ_k = 1, the step (Δx_k, Δy_k, Δz_k) equals the step (Δx_k, Δλ_k, Δy_k) of [12].

a neighborhood of w̄ such that ‖M_k^{−1}‖ ≤ M for all w_k ∈ N(w̄) and some M ∈ R, see [12].

Lemma 4.1 Suppose Assumptions L hold. If w_k ∈ N(w̄) and λ_k = y_k/ρ_k, then the following is satisfied:
1. ‖w_k + Δw_k − w(µ)‖ = O(‖w_k − w(µ)‖²)
2. ‖w_k + Δw̄_k − w(µ)‖ = O(‖w_k − w(µ)‖²)
3. ‖Δw_k‖ = Ω(‖w_k − w(µ)‖)
4. ‖Δw̄_k‖ = Ω(‖w_k − w(µ)‖)

Eventually a full second-order-correction step is accepted by the Armijo condition, which is formally stated in the following.

Lemma 4.2 Suppose Assumptions L hold. Let ω ∈ (0, 1/2). If µ is sufficiently small, ‖w_k − w(µ)‖ = o(µ) and λ_k = y_k/ρ_k, then
Φ_µ(x_k + Δx̄_k) − Φ_µ(x_k) ≤ ω ∇Φ_µ(x_k)ᵀ Δx_k.

Proof Because µ is sufficiently small and ‖w_k − w(µ)‖ = o(µ), it holds that w_k ∈ N(w̄). The proof of this lemma is similar to [12, Theorem 5.4] and we can use its result for estimating the change in the barrier function [12, Theorem 5.4, equations (5.19)-(5.37)]. It states that for ‖c(x_k)‖ > 0
ϕ_µ(x_k + Δx̄_k) − ϕ_µ(x_k) ≤ (1/2 − ω̂) ∇ϕ_µ(x_k)ᵀ Δx_k − c(x_k)ᵀ(y_k + Δy_k) + ω̂ τ̄ ‖c(x_k)‖ + ω̂ y_kᵀ c(x_k) + o(‖c(x_k)‖) + o(‖Δx_k‖²) (4.1)
and similarly for ‖c(x_k)‖ = 0
ϕ_µ(x_k + Δx̄_k) − ϕ_µ(x_k) ≤ (1/2 − ω̂) ∇ϕ_µ(x_k)ᵀ Δx_k + o(‖Δx_k‖²), (4.2)
where ω̂ is an arbitrary constant satisfying ω̂ ∈ (0, 1/2 − ω). Furthermore, it is shown that
‖c(x_k + Δx̄_k)‖ = o(‖c(x_k)‖) + o(‖Δx_k‖²), (4.3)
which is a consequence of Lemma 4.1. Now, for ‖c(x_k)‖ > 0, by combining (4.1) and (4.3) with Proposition 2.3 and using λ_k = y_k and ‖Δw_k‖ = o(µ), which is a consequence of Lemma 4.1 and


Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Roger Behling a, Clovis Gonzaga b and Gabriel Haeser c March 21, 2013 a Department

More information

A SECOND DERIVATIVE SQP METHOD : LOCAL CONVERGENCE

A SECOND DERIVATIVE SQP METHOD : LOCAL CONVERGENCE A SECOND DERIVATIVE SQP METHOD : LOCAL CONVERGENCE Nic I. M. Gould Daniel P. Robinson Oxford University Computing Laboratory Numerical Analysis Group Technical Report 08-21 December 2008 Abstract In [19],

More information

1. Introduction. We consider the general smooth constrained optimization problem:

1. Introduction. We consider the general smooth constrained optimization problem: OPTIMIZATION TECHNICAL REPORT 02-05, AUGUST 2002, COMPUTER SCIENCES DEPT, UNIV. OF WISCONSIN TEXAS-WISCONSIN MODELING AND CONTROL CONSORTIUM REPORT TWMCC-2002-01 REVISED SEPTEMBER 2003. A FEASIBLE TRUST-REGION

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

More information

ON AUGMENTED LAGRANGIAN METHODS WITH GENERAL LOWER-LEVEL CONSTRAINTS. 1. Introduction. Many practical optimization problems have the form (1.

ON AUGMENTED LAGRANGIAN METHODS WITH GENERAL LOWER-LEVEL CONSTRAINTS. 1. Introduction. Many practical optimization problems have the form (1. ON AUGMENTED LAGRANGIAN METHODS WITH GENERAL LOWER-LEVEL CONSTRAINTS R. ANDREANI, E. G. BIRGIN, J. M. MARTíNEZ, AND M. L. SCHUVERDT Abstract. Augmented Lagrangian methods with general lower-level constraints

More information

Evaluation complexity for nonlinear constrained optimization using unscaled KKT conditions and high-order models by E. G. Birgin, J. L. Gardenghi, J. M. Martínez, S. A. Santos and Ph. L. Toint Report NAXYS-08-2015

More information

HYBRID FILTER METHODS FOR NONLINEAR OPTIMIZATION. Yueling Loh

HYBRID FILTER METHODS FOR NONLINEAR OPTIMIZATION. Yueling Loh HYBRID FILTER METHODS FOR NONLINEAR OPTIMIZATION by Yueling Loh A dissertation submitted to The Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy. Baltimore,

More information

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications Weijun Zhou 28 October 20 Abstract A hybrid HS and PRP type conjugate gradient method for smooth

More information

Interior-Point Methods for Linear Optimization

Interior-Point Methods for Linear Optimization Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

REGULARIZED SEQUENTIAL QUADRATIC PROGRAMMING METHODS

REGULARIZED SEQUENTIAL QUADRATIC PROGRAMMING METHODS REGULARIZED SEQUENTIAL QUADRATIC PROGRAMMING METHODS Philip E. Gill Daniel P. Robinson UCSD Department of Mathematics Technical Report NA-11-02 October 2011 Abstract We present the formulation and analysis

More information

Constrained Nonlinear Optimization Algorithms

Constrained Nonlinear Optimization Algorithms Department of Industrial Engineering and Management Sciences Northwestern University waechter@iems.northwestern.edu Institute for Mathematics and its Applications University of Minnesota August 4, 2016

More information

A Continuation Method for the Solution of Monotone Variational Inequality Problems

A Continuation Method for the Solution of Monotone Variational Inequality Problems A Continuation Method for the Solution of Monotone Variational Inequality Problems Christian Kanzow Institute of Applied Mathematics University of Hamburg Bundesstrasse 55 D 20146 Hamburg Germany e-mail:

More information

Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL)

Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL) Part 4: Active-set methods for linearly constrained optimization Nick Gould RAL fx subject to Ax b Part C course on continuoue optimization LINEARLY CONSTRAINED MINIMIZATION fx subject to Ax { } b where

More information

Affine covariant Semi-smooth Newton in function space

Affine covariant Semi-smooth Newton in function space Affine covariant Semi-smooth Newton in function space Anton Schiela March 14, 2018 These are lecture notes of my talks given for the Winter School Modern Methods in Nonsmooth Optimization that was held

More information

Survey of NLP Algorithms. L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA

Survey of NLP Algorithms. L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA Survey of NLP Algorithms L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA NLP Algorithms - Outline Problem and Goals KKT Conditions and Variable Classification Handling

More information

10 Numerical methods for constrained problems

10 Numerical methods for constrained problems 10 Numerical methods for constrained problems min s.t. f(x) h(x) = 0 (l), g(x) 0 (m), x X The algorithms can be roughly divided the following way: ˆ primal methods: find descent direction keeping inside

More information

An augmented Lagrangian method for equality constrained optimization with rapid infeasibility detection capabilities

An augmented Lagrangian method for equality constrained optimization with rapid infeasibility detection capabilities An augmented Lagrangian method for equality constrained optimization with rapid infeasibility detection capabilities Paul Armand, Ngoc Nguyen Tran To cite this version: Paul Armand, Ngoc Nguyen Tran. An

More information

Interior Point Methods. We ll discuss linear programming first, followed by three nonlinear problems. Algorithms for Linear Programming Problems

Interior Point Methods. We ll discuss linear programming first, followed by three nonlinear problems. Algorithms for Linear Programming Problems AMSC 607 / CMSC 764 Advanced Numerical Optimization Fall 2008 UNIT 3: Constrained Optimization PART 4: Introduction to Interior Point Methods Dianne P. O Leary c 2008 Interior Point Methods We ll discuss

More information

Nonlinear Optimization Solvers

Nonlinear Optimization Solvers ICE05 Argonne, July 19, 2005 Nonlinear Optimization Solvers Sven Leyffer leyffer@mcs.anl.gov Mathematics & Computer Science Division, Argonne National Laboratory Nonlinear Optimization Methods Optimization

More information

A stabilized SQP method: superlinear convergence

A stabilized SQP method: superlinear convergence Math. Program., Ser. A (2017) 163:369 410 DOI 10.1007/s10107-016-1066-7 FULL LENGTH PAPER A stabilized SQP method: superlinear convergence Philip E. Gill 1 Vyacheslav Kungurtsev 2 Daniel P. Robinson 3

More information

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell DAMTP 2014/NA02 On fast trust region methods for quadratic models with linear constraints M.J.D. Powell Abstract: Quadratic models Q k (x), x R n, of the objective function F (x), x R n, are used by many

More information

Lecture V. Numerical Optimization

Lecture V. Numerical Optimization Lecture V Numerical Optimization Gianluca Violante New York University Quantitative Macroeconomics G. Violante, Numerical Optimization p. 1 /19 Isomorphism I We describe minimization problems: to maximize

More information

A trust region method based on interior point techniques for nonlinear programming

A trust region method based on interior point techniques for nonlinear programming Math. Program., Ser. A 89: 149 185 2000 Digital Object Identifier DOI 10.1007/s101070000189 Richard H. Byrd Jean Charles Gilbert Jorge Nocedal A trust region method based on interior point techniques for

More information

A Quasi-Newton Algorithm for Nonconvex, Nonsmooth Optimization with Global Convergence Guarantees

A Quasi-Newton Algorithm for Nonconvex, Nonsmooth Optimization with Global Convergence Guarantees Noname manuscript No. (will be inserted by the editor) A Quasi-Newton Algorithm for Nonconvex, Nonsmooth Optimization with Global Convergence Guarantees Frank E. Curtis Xiaocun Que May 26, 2014 Abstract

More information

An Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints

An Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints An Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints Klaus Schittkowski Department of Computer Science, University of Bayreuth 95440 Bayreuth, Germany e-mail:

More information

Lecture 11 and 12: Penalty methods and augmented Lagrangian methods for nonlinear programming

Lecture 11 and 12: Penalty methods and augmented Lagrangian methods for nonlinear programming Lecture 11 and 12: Penalty methods and augmented Lagrangian methods for nonlinear programming Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lecture 11 and

More information

PDE-Constrained and Nonsmooth Optimization

PDE-Constrained and Nonsmooth Optimization Frank E. Curtis October 1, 2009 Outline PDE-Constrained Optimization Introduction Newton s method Inexactness Results Summary and future work Nonsmooth Optimization Sequential quadratic programming (SQP)

More information

Infeasibility Detection in Nonlinear Optimization

Infeasibility Detection in Nonlinear Optimization Infeasibility Detection in Nonlinear Optimization Frank E. Curtis, Lehigh University Hao Wang, Lehigh University SIAM Conference on Optimization 2011 May 16, 2011 (See also Infeasibility Detection in Nonlinear

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

Computational Optimization. Augmented Lagrangian NW 17.3

Computational Optimization. Augmented Lagrangian NW 17.3 Computational Optimization Augmented Lagrangian NW 17.3 Upcoming Schedule No class April 18 Friday, April 25, in class presentations. Projects due unless you present April 25 (free extension until Monday

More information

A globally convergent Levenberg Marquardt method for equality-constrained optimization

A globally convergent Levenberg Marquardt method for equality-constrained optimization Computational Optimization and Applications manuscript No. (will be inserted by the editor) A globally convergent Levenberg Marquardt method for equality-constrained optimization A. F. Izmailov M. V. Solodov

More information

On the complexity of an Inexact Restoration method for constrained optimization

On the complexity of an Inexact Restoration method for constrained optimization On the complexity of an Inexact Restoration method for constrained optimization L. F. Bueno J. M. Martínez September 18, 2018 Abstract Recent papers indicate that some algorithms for constrained optimization

More information

Preprint ANL/MCS-P , Dec 2002 (Revised Nov 2003, Mar 2004) Mathematics and Computer Science Division Argonne National Laboratory

Preprint ANL/MCS-P , Dec 2002 (Revised Nov 2003, Mar 2004) Mathematics and Computer Science Division Argonne National Laboratory Preprint ANL/MCS-P1015-1202, Dec 2002 (Revised Nov 2003, Mar 2004) Mathematics and Computer Science Division Argonne National Laboratory A GLOBALLY CONVERGENT LINEARLY CONSTRAINED LAGRANGIAN METHOD FOR

More information

Feasible Interior Methods Using Slacks for Nonlinear Optimization

Feasible Interior Methods Using Slacks for Nonlinear Optimization Feasible Interior Methods Using Slacks for Nonlinear Optimization Richard H. Byrd Jorge Nocedal Richard A. Waltz February 28, 2005 Abstract A slack-based feasible interior point method is described which

More information

Derivative-free methods for nonlinear programming with general lower-level constraints*

Derivative-free methods for nonlinear programming with general lower-level constraints* Volume 30, N. 1, pp. 19 52, 2011 Copyright 2011 SBMAC ISSN 0101-8205 www.scielo.br/cam Derivative-free methods for nonlinear programming with general lower-level constraints* M.A. DINIZ-EHRHARDT 1, J.M.

More information

Sequential Quadratic Programming Method for Nonlinear Second-Order Cone Programming Problems. Hirokazu KATO

Sequential Quadratic Programming Method for Nonlinear Second-Order Cone Programming Problems. Hirokazu KATO Sequential Quadratic Programming Method for Nonlinear Second-Order Cone Programming Problems Guidance Professor Masao FUKUSHIMA Hirokazu KATO 2004 Graduate Course in Department of Applied Mathematics and

More information

Lecture 13: Constrained optimization

Lecture 13: Constrained optimization 2010-12-03 Basic ideas A nonlinearly constrained problem must somehow be converted relaxed into a problem which we can solve (a linear/quadratic or unconstrained problem) We solve a sequence of such problems

More information

Barrier Method. Javier Peña Convex Optimization /36-725

Barrier Method. Javier Peña Convex Optimization /36-725 Barrier Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: Newton s method For root-finding F (x) = 0 x + = x F (x) 1 F (x) For optimization x f(x) x + = x 2 f(x) 1 f(x) Assume f strongly

More information

A New Penalty-SQP Method

A New Penalty-SQP Method Background and Motivation Illustration of Numerical Results Final Remarks Frank E. Curtis Informs Annual Meeting, October 2008 Background and Motivation Illustration of Numerical Results Final Remarks

More information

Pacific Journal of Optimization (Vol. 2, No. 3, September 2006) ABSTRACT

Pacific Journal of Optimization (Vol. 2, No. 3, September 2006) ABSTRACT Pacific Journal of Optimization Vol., No. 3, September 006) PRIMAL ERROR BOUNDS BASED ON THE AUGMENTED LAGRANGIAN AND LAGRANGIAN RELAXATION ALGORITHMS A. F. Izmailov and M. V. Solodov ABSTRACT For a given

More information

LAGRANGIAN TRANSFORMATION IN CONVEX OPTIMIZATION

LAGRANGIAN TRANSFORMATION IN CONVEX OPTIMIZATION LAGRANGIAN TRANSFORMATION IN CONVEX OPTIMIZATION ROMAN A. POLYAK Abstract. We introduce the Lagrangian Transformation(LT) and develop a general LT method for convex optimization problems. A class Ψ of

More information

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods Quasi-Newton Methods General form of quasi-newton methods: x k+1 = x k α

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

Numerical optimization

Numerical optimization Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Numerical Optimization

Numerical Optimization Constrained Optimization Computer Science and Automation Indian Institute of Science Bangalore 560 012, India. NPTEL Course on Constrained Optimization Constrained Optimization Problem: min h j (x) 0,

More information

Computational Finance

Computational Finance Department of Mathematics at University of California, San Diego Computational Finance Optimization Techniques [Lecture 2] Michael Holst January 9, 2017 Contents 1 Optimization Techniques 3 1.1 Examples

More information

MODIFYING SQP FOR DEGENERATE PROBLEMS

MODIFYING SQP FOR DEGENERATE PROBLEMS PREPRINT ANL/MCS-P699-1097, OCTOBER, 1997, (REVISED JUNE, 2000; MARCH, 2002), MATHEMATICS AND COMPUTER SCIENCE DIVISION, ARGONNE NATIONAL LABORATORY MODIFYING SQP FOR DEGENERATE PROBLEMS STEPHEN J. WRIGHT

More information

1. Introduction. We consider mathematical programs with equilibrium constraints in the form of complementarity constraints:

1. Introduction. We consider mathematical programs with equilibrium constraints in the form of complementarity constraints: SOME PROPERTIES OF REGULARIZATION AND PENALIZATION SCHEMES FOR MPECS DANIEL RALPH AND STEPHEN J. WRIGHT Abstract. Some properties of regularized and penalized nonlinear programming formulations of mathematical

More information

Sequential equality-constrained optimization for nonlinear programming

Sequential equality-constrained optimization for nonlinear programming Sequential equality-constrained optimization for nonlinear programming E. G. Birgin L. F. Bueno J. M. Martínez September 9, 2015 Abstract A novel idea is proposed for solving optimization problems with

More information

Apolynomialtimeinteriorpointmethodforproblemswith nonconvex constraints

Apolynomialtimeinteriorpointmethodforproblemswith nonconvex constraints Apolynomialtimeinteriorpointmethodforproblemswith nonconvex constraints Oliver Hinder, Yinyu Ye Department of Management Science and Engineering Stanford University June 28, 2018 The problem I Consider

More information

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems 1 Numerical optimization Alexander & Michael Bronstein, 2006-2009 Michael Bronstein, 2010 tosca.cs.technion.ac.il/book Numerical optimization 048921 Advanced topics in vision Processing and Analysis of

More information

On Augmented Lagrangian methods with general lower-level constraints

On Augmented Lagrangian methods with general lower-level constraints On Augmented Lagrangian methods with general lower-level constraints R. Andreani E. G. Birgin J. M. Martínez M. L. Schuverdt October 13, 2005 Abstract Augmented Lagrangian methods with general lower-level

More information

A PRIMAL-DUAL AUGMENTED LAGRANGIAN

A PRIMAL-DUAL AUGMENTED LAGRANGIAN A PRIMAL-DUAL AUGMENTED LAGRANGIAN Philip E. GILL Daniel P. ROBINSON UCSD Department of Mathematics Technical Report NA-08-02 April 2008 Abstract Nonlinearly constrained optimization problems can be solved

More information

Affine scaling interior Levenberg-Marquardt method for KKT systems. C S:Levenberg-Marquardt{)KKTXÚ

Affine scaling interior Levenberg-Marquardt method for KKT systems. C S:Levenberg-Marquardt{)KKTXÚ 2013c6 $ Ê Æ Æ 117ò 12Ï June, 2013 Operations Research Transactions Vol.17 No.2 Affine scaling interior Levenberg-Marquardt method for KKT systems WANG Yunjuan 1, ZHU Detong 2 Abstract We develop and analyze

More information

Optimisation in Higher Dimensions

Optimisation in Higher Dimensions CHAPTER 6 Optimisation in Higher Dimensions Beyond optimisation in 1D, we will study two directions. First, the equivalent in nth dimension, x R n such that f(x ) f(x) for all x R n. Second, constrained

More information

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality

More information

Flexible Penalty Functions for Nonlinear Constrained Optimization

Flexible Penalty Functions for Nonlinear Constrained Optimization IMA Journal of Numerical Analysis (2005) Page 1 of 19 doi: 10.1093/imanum/dri017 Flexible Penalty Functions for Nonlinear Constrained Optimization FRANK E. CURTIS Department of Industrial Engineering and

More information