Nonmonotone Trust Region Methods for Nonlinear Equality Constrained Optimization without a Penalty Function


Michael Ulbrich and Stefan Ulbrich
Zentrum Mathematik, Technische Universität München, München, Germany

Technical Report, December 2000


NON-MONOTONE TRUST REGION METHODS FOR NONLINEAR EQUALITY CONSTRAINED OPTIMIZATION WITHOUT A PENALTY FUNCTION

MICHAEL ULBRICH AND STEFAN ULBRICH

Abstract. We propose and analyze a class of penalty-function-free nonmonotone trust-region methods for nonlinear equality constrained optimization problems. The algorithmic framework yields global convergence without using a merit function and allows nonmonotonicity independently for both the constraint violation and the value of the Lagrangian function. Similar to the Byrd-Omojokun class of algorithms, each step is composed of a quasi-normal and a tangential step. Both steps are required to satisfy a decrease condition for their respective trust-region subproblems. The proposed mechanism for accepting steps combines nonmonotone decrease conditions on the constraint violation and/or the Lagrangian function, which leads to a flexibility and acceptance behavior comparable to filter-based methods. We establish the global convergence of the method. Furthermore, transition to quadratic local convergence is proved. Numerical tests are presented that confirm the robustness and efficiency of the approach.

Key words. nonmonotone trust-region methods, sequential quadratic programming, penalty function, global convergence, equality constraints, local convergence, large-scale optimization

AMS subject classifications. 65K05, 90C30

1. Introduction. We consider the nonlinear equality constrained optimization problem

$\min f(x)$ subject to $c(x) = 0$   (1.1)

with continuously differentiable functions $f : \mathbb{R}^n \to \mathbb{R}$ and $c : \mathbb{R}^n \to \mathbb{R}^m$. For the solution of (1.1) we propose a method that is inspired by the class of trust-region algorithms introduced by Byrd [2], Omojokun [23], and Dennis, El-Alem, and Maciel [9], but with the important difference that our algorithm does not use a penalty or augmented Lagrange function to test the acceptability of steps. Hereby, we are motivated by the impressive efficiency of sequential quadratic programming (SQP) filter methods, which were recently introduced by Fletcher and Leyffer [17]. The algorithm that we investigate here does not use the concept of a filter. Rather, it applies nonmonotone trust-region techniques independently to the quasi-normal subproblem and the tangential subproblem. This strategy admits a flexibility in accepting steps that is comparable with filter methods. Besides global convergence, our approach has two favorable properties that appear to be new for algorithms without a penalty function: (a) The method does not require a restoration procedure. (b) We prove that the algorithm converges locally q-quadratically, even without an additional second order correction that is needed by many algorithms to avoid the Maratos effect. For SQP filter methods global convergence has been established in Fletcher, Gould, Leyffer, and Toint [16], whereas a local convergence theory is not yet available. Recently, a globally convergent primal-dual interior-point filter method was introduced by Ulbrich, Ulbrich, and Vicente [31]. Except for the method presented in this paper, filter methods and their predecessor, the tolerance-tube approach by Zoppke-Donaldson [34], are the only algorithms for NLP we are aware of that do not require a penalty function.

We will use [2], [23], and [9] as our main references on trust-region methods for equality constrained nonlinear programming. However, there are several related approaches and recent extensions that should be mentioned. Regarding related work, we refer to Byrd, Schnabel, and Shultz [5], Celis, Dennis, and Tapia [6], El-Alem [13, 14], Powell and Yuan [26], and Vardi [32].
Lehrstuhl für Angewandte Mathematik und Mathematische Statistik, Zentrum Mathematik, Technische Universität München, München, Germany (mulbrich@ma.tum.de).
Lehrstuhl für Angewandte Mathematik und Mathematische Statistik, Zentrum Mathematik, Technische Universität München, München, Germany (sulbrich@ma.tum.de).

Recent contributions to the analysis of trust-region methods for equality constrained problems include Dennis and Vicente [12], El-Alem [15], and Lalee, Nocedal, and Plantenga [20]. Several extensions to problems involving inequality constraints have been proposed. Here we mention only those methods that extend the ideas of Byrd [2], Omojokun [23], and Dennis, El-Alem, and Maciel [9]. Some of these algorithms are based on trust-region methods for box-constrained problems, see, e.g., Coleman and Li [7], Conn, Gould, and Toint [8], Dennis and Vicente [11], Lin and Moré [22], Ulbrich, Ulbrich, and Heinkenschloss [30], and combine them with the above approaches to handle additional equality constraints. Algorithms of this type were investigated by Dennis, Heinkenschloss, and Vicente [10], Plantenga [24], and Vicente [33]. Byrd, Gilbert, and Nocedal [3] and Byrd, Hribar, and Nocedal [4] take a different approach by solving a sequence of equality constrained barrier problems. The above references underline the important role of methods for nonlinear equality constrained optimization problems, both as stand-alone methods and as solvers for subproblems.

This paper is organized as follows. In section 2 the algorithm is developed. We introduce the quasi-normal and the tangential trust-region subproblem and describe the model decrease conditions for the respective trial steps. The nonmonotone decrease conditions for constraint violation and Lagrangian function, respectively, which are the key ingredients of the new algorithm, are developed in sections 2.1 and 2.2. The full algorithm is formulated in section 2.3. In section 3 the global convergence of the algorithm is established. We first state the main result in section 3.1. The global convergence analysis starts in section 3.2 with the proof of well definedness. Section 3.3 is devoted to the development of nonmonotone decrease results. Convergence to feasible points is proved in section 3.4, convergence to stationary points in 3.5. In section 4 we show that with a Newton-type step computation the algorithm converges locally quadratically. Numerical results for problems from the CUTE collection [1] are presented in section 5.

Notations. Throughout the paper, $\|\cdot\|$ denotes the Euclidean norm $\|\cdot\|_2$. The gradient of $f$ is denoted by $\nabla f$ and $\nabla c$ denotes the transposed Jacobian of $c$. We use the abbreviations $g = \nabla f$ and $A = \nabla c$.

2. Development of the algorithm. We denote the gradient of the objective function $f$ by $g$ and write $A$ for the transposed Jacobian of $c$:

$g(x) = \nabla f(x) \in \mathbb{R}^n$, $A(x) = \nabla c(x) \in \mathbb{R}^{n \times m}$.

Following Byrd [2], Omojokun [23], and Dennis, El-Alem, and Maciel [9], we obtain the trial step $s_k = s^t_k + s^n_k$ at the current iterate $x_k$ by computing a quasi-normal step $s^n_k$ and a tangential step $s^t_k$. The purpose of the quasi-normal step $s^n_k$ is to improve feasibility. It is obtained as an approximate solution of the trust-region subproblem

$\min \|c(x_k) + A(x_k)^T s^n\|^2$ subject to $\|s^n\| \le \Delta_k$,   (2.1)

where $\Delta_k > 0$ denotes the trust-region radius. Our requirements on the steps $s^n_k$ are that there exist constants $K_1, K_2 > 0$, independent of $k$, such that $s^n_k$ admits the upper bound

$\|s^n_k\| \le \min\{K_1 \|c_k\|, \Delta_k\}$,   (2.2)

and satisfies the decrease condition

$\|c_k\|^2 - \|c_k + A_k^T s^n_k\|^2 \ge K_2 \|c_k\| \min\{\|c_k\|, \Delta_k\}$.   (2.3)

As, e.g., in [9], we will assume that the matrices $A(x_k)^T A(x_k)$ are nonsingular with uniformly bounded inverses for all $k$. Then it is well known that the Cauchy point, which is the solution of (2.1) along the direction of steepest descent at $s^n = 0$, satisfies the conditions (2.2) and (2.3) for appropriate constants $K_1$ and $K_2$.
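For illustration only, the following is a minimal sketch (hypothetical NumPy code, not taken from the paper) of such a Cauchy step for the quasi-normal subproblem (2.1): it minimizes $\|c_k + A_k^T s\|^2$ along the steepest-descent direction $-A_k c_k$ and truncates the step at the trust-region boundary.

```python
import numpy as np

def quasi_normal_cauchy_step(A, c, delta):
    """Cauchy step for  min ||c + A^T s||^2  s.t. ||s|| <= delta  (subproblem (2.1)).

    A : (n, m) transposed constraint Jacobian at x_k
    c : (m,)   constraint values c(x_k)
    delta : trust-region radius Delta_k
    """
    d = -A @ c                        # steepest-descent direction of ||c + A^T s||^2 at s = 0
    if np.linalg.norm(d) == 0.0:      # already stationary for the feasibility model
        return np.zeros(A.shape[0])
    Atd = A.T @ d                     # change of the linearized residual along d
    # unconstrained minimizer of t -> ||c + t * A^T d||^2
    t = -(Atd @ c) / (Atd @ Atd)
    # truncate at the trust-region boundary ||t * d|| <= delta
    t = min(t, delta / np.linalg.norm(d))
    return t * d
```

Under the boundedness assumptions stated below, a step of this form provides the required fraction of Cauchy decrease for the feasibility model.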

The assumptions stated below ensure the existence of constants $K_1$ and $K_2$ that are suitable for all iterations $k$. Therefore, the conditions (2.2) and (2.3) can be implemented by a fraction of Cauchy decrease condition.

To improve optimality we seek $s^t_k$ in the tangent space of the linearized constraints in such a way that it provides sufficient decrease for a quadratic model of the Lagrange function

$l(x, y) = f(x) + y^T c(x)$, $y \in \mathbb{R}^m$,

under a trust-region constraint. To this end, we define a quadratic model

$q_k(s) = (g(x_k) + A(x_k) y_k)^T s + \tfrac{1}{2} s^T H_k s$

about the current point $(x_k, y_k)$ that approximates $l(x_k + s, y_k) - l(x_k, y_k)$. Here, $H_k$ is a symmetric approximation of $\nabla^2_x l(x_k, y_k)$. Based on this model, the tangential step $s^t_k$ is computed as an approximate solution of the trust-region subproblem

$\min q_k(s^n_k + s^t)$ subject to $A(x_k)^T s^t = 0$, $\|s^t\| \le \Delta_k$,   (2.4)

satisfying the decrease condition

$q_k(s^n_k) - q_k(s^n_k + s^t_k) \ge K_3 \|W_k^T \nabla q_k(s^n_k)\| \min\{\|W_k^T \nabla q_k(s^n_k)\|, \Delta_k\}$   (2.5)

with a constant $K_3 > 0$ independent of $k$, and the feasibility condition

$A(x_k)^T s^t_k = 0$, $\|s^t_k\| \le \Delta_k$.   (2.6)

Hereby, $W_k = W(x_k)$, where $W(x)$ denotes a matrix whose columns form a basis of the null space of $A(x)^T$. Note that $W_k^T \nabla q_k(s^t)$ is the reduced gradient of $q_k$ in terms of the representation $s^t = W_k d$ of the tangential step:

$\nabla_d \big( q_k(W_k d) \big) = W_k^T \nabla q_k(W_k d) = W_k^T \nabla q_k(s^t)$.

Therefore, (2.5) can be realized by a fraction of Cauchy decrease condition for the reduced function $d \mapsto q_k(s^n_k + W_k d)$ subject to the constraint $\|W_k d\| \le \Delta_k$.

To simplify notation we will use the abbreviations $f_k = f(x_k)$, $c_k = c(x_k)$, $l_k = l(x_k, y_k)$, etc. Moreover, it will be convenient to introduce the reduced gradient $\hat g(x) = W(x)^T g(x)$. Then the first order necessary optimality conditions (Karush-Kuhn-Tucker or KKT conditions) at a local solution $\bar x \in \mathbb{R}^n$ of (1.1) can be written as

$c(\bar x) = 0$, $\hat g(\bar x) = 0$.

The algorithm is based on a combination of nonmonotone decrease criteria for the quasi-normal and tangential steps. Non-monotone trust-region methods were investigated by Toint [28] and Ulbrich [29]. We follow [29] and compare the predicted decrease promised by the trust-region model with a relaxation of the actual decrease to decide whether a step is acceptable or not. Before we give a precise description of the algorithm, we introduce our assumptions for the global convergence analysis.
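A fraction of Cauchy decrease for the reduced function can be enforced, for instance, by the following sketch (again hypothetical illustration code, assuming a dense null-space basis $W_k$ is available): it takes the steepest-descent step in the reduced variable $d$ and truncates it at the trust-region boundary $\|W_k d\| \le \Delta_k$.

```python
import numpy as np

def tangential_cauchy_step(W, H, g, A, y, s_n, delta):
    """Reduced Cauchy step for the tangential subproblem (2.4).

    Minimizes the model d -> q_k(s_n + W d) along the negative reduced gradient,
    truncated at ||W d|| <= delta, so that (2.5) holds for some K_3 > 0.
    W : (n, n-m) null-space basis of A^T,  H : (n, n) Hessian model,
    g, y, s_n : gradient, multiplier estimate, quasi-normal step,  A : (n, m).
    """
    grad_q = g + A @ y + H @ s_n         # gradient of q_k at s_n
    r = W.T @ grad_q                     # reduced gradient W_k^T grad q_k(s_n)
    if np.linalg.norm(r) == 0.0:
        return np.zeros(W.shape[0])
    d = -r                               # steepest descent in the reduced variable
    Wd = W @ d
    curv = d @ (W.T @ (H @ Wd))          # curvature of the reduced model along d
    t_bd = delta / np.linalg.norm(Wd)    # step length to the trust-region boundary
    if curv > 0:
        t = min((r @ r) / curv, t_bd)    # unconstrained minimizer, truncated
    else:
        t = t_bd                         # nonpositive curvature: go to the boundary
    return t * Wd                        # tangential step s^t = W d
```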

Assumptions: There exist an open convex set $\Omega \subset \mathbb{R}^n$ and a closed convex set $\hat\Omega \subset \Omega$ with $\mathrm{dist}(\hat\Omega, \mathbb{R}^n \setminus \Omega) > 0$ such that:

(A1) The functions $f : \Omega \to \mathbb{R}$ and $c : \Omega \to \mathbb{R}^m$ are continuously differentiable.

(A2) The matrix $A(x) = \nabla c(x)$ has full rank for all $x \in \Omega$.

(A3) The functions $f$, $g = \nabla f$, $c$, $A = \nabla c$, $(A^T A)^{-1}$, $W$, and $(W^T W)^{-1}$ are uniformly bounded on $\Omega$. Hereby, $W(x)$ denotes a matrix whose columns form a basis for the null space of $A(x)^T$.

(A4) For all $k$, $x_k$ is in $\hat\Omega$, and $x_k + s^n_k$ as well as $x_k + s_k$ are in $\Omega$.

(A5) The matrices $H_k$ and the multiplier estimates $y_k$ are uniformly bounded for all $k$.

(A6) The derivatives $g = \nabla f$ and $A = \nabla c$ are Lipschitz continuous on $\Omega$.

In section 4 we will moreover require the following assumption in order to show transition to locally quadratic convergence in a neighborhood of a stationary point $\bar x$ satisfying second order sufficient conditions.

(A7) The functions $f : \Omega \to \mathbb{R}$ and $c : \Omega \to \mathbb{R}^m$ are twice continuously differentiable. Furthermore, there exists a neighborhood $N$ of $\bar x$ on which $\nabla^2 f$ and $\nabla^2 c$ are Lipschitz continuous and $H_k = \nabla^2_x l(x_k, y_k)$ for all $x_k \in N$.

2.1. A nonmonotone decrease condition for the constraint violation. The decrease condition (2.3) for the quasi-normal step guarantees that the predicted reduction for the feasibility violation $\|c\|^2$,

$\mathrm{pred}^c_k := \|c_k\|^2 - \|c_k + A_k^T s_k\|^2$,

admits the estimate

$\mathrm{pred}^c_k \ge K_2 \|c_k\| \min\{\|c_k\|, \Delta_k\}$.   (2.7)

Note hereby that $A_k^T s_k = A_k^T s^n_k$. Clearly, the requirement that the actual reduction

$\mathrm{ared}^c_k := \|c_k\|^2 - \|c(x_k + s_k)\|^2$

should be a fraction of the predicted reduction is too restrictive, since it could impose severe restrictions on the tangential step if the feasibility is too good in comparison to the norm of the reduced gradient. In order to relax the feasibility requirement in this case and to allow nonmonotonicity, we accept the step for the constraints if

$\mathrm{rared}^c_k \ge \rho_1 \mathrm{pred}^c_k$, $\rho_1 \in (0, 1)$ fixed,

with the relaxed actual reduction

$\mathrm{rared}^c_k := \max\Big\{ R_k,\ \sum_{r=0}^{\nu^c_k - 1} \lambda^c_{kr} \|c_{k-r}\|^2 \Big\} - \|c(x_k + s_k)\|^2$.

Hereby, we require that with fixed parameters $\nu^c \in \mathbb{N}$ and $\lambda \in (0, 1/\nu^c)$ holds

$\nu^c_k = \min\{k + 1, \nu^c\}$, $\lambda^c_{kr} \ge \lambda > 0$, $\sum_{r=0}^{\nu^c_k - 1} \lambda^c_{kr} = 1$, $R_k \ge \|c_k\|^2$, usually $R_k = \|c_k\|^2$.
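As a small illustration (a hypothetical helper, not from the paper), the relaxed actual reduction $\mathrm{rared}^c_k$ can be computed from the recent history of squared constraint violations as follows; the weights and the reference value $R_k$ are supplied by the caller.

```python
import numpy as np

def relaxed_actual_reduction_c(c_hist_sq, c_new_sq, R_k, lam_weights):
    """Relaxed actual reduction rared^c_k for the constraint violation.

    c_hist_sq   : ||c_{k-r}||^2 for r = 0, ..., nu^c_k - 1 (most recent first)
    c_new_sq    : ||c(x_k + s_k)||^2 at the trial point
    R_k         : reference value from Algorithm R (R_k >= ||c_k||^2)
    lam_weights : convex weights lambda^c_{kr}, each >= lambda, summing to 1
    """
    weighted_past = float(np.dot(lam_weights, c_hist_sq))
    return max(R_k, weighted_past) - c_new_sq
```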

Before we discuss the choice of $R_k$, we notice that a maximum of nonmonotonicity is achieved by selecting an index $\bar r$, $0 \le \bar r \le \nu^c_k - 1$, such that

$\|c_{k - \bar r}\| = \max_{0 \le r < \nu^c_k} \|c_{k-r}\|$,

and setting $\lambda^c_{k \bar r} = 1 - (\nu^c_k - 1)\lambda$ and $\lambda^c_{kr} = \lambda$ for $r \ne \bar r$.

The choice of $R_k$ is an important issue in the design of the method. It is done in such a way that the feasibility requirement is relaxed if the feasibility is much better than the stationarity, i.e., if $\|c_k\| \ll \|\hat g_k\|$. If this situation is detected then instead of $R_k = \|c_k\|^2$ a larger value is chosen. In order to keep a minimum of control over the constraint violation we choose $R_k$ not larger than some upper bound $a_{j_k}^2$. Hereby, $(a_j)$ is a slowly decreasing sequence tending to zero and $j$ is only increased, i.e., $j_{k+1} = j_k + 1$, if $R_k$ yields the maximum in the first term of $\mathrm{rared}^c_k$. Thus, let $(a_j)$ be a sequence with

$a_j > 0$, $0 < \alpha_0 \le a_{j+1}/a_j < 1$, $\lim_{j \to \infty} a_j = 0$, and $\sum_{j=0}^{\infty} a_j^{\eta} = \infty$,   (2.8)

where $\eta > 4/3$ is a fixed constant. The following algorithm describes how $R_k$ is updated:

Algorithm R (Update of $R_k$): Let $0 < \alpha, \beta < 1/2$.
If $\|c_k\| < \min\{\alpha a_{j_k}, \beta \|\hat g_k\|\}$ then
  Set $R_k := \min\{a_{j_k}^2, \|\hat g_k\|^2\}$.
  If $R_k \ge \sum_{r=0}^{\nu^c_k - 1} \lambda^c_{kr} \|c_{k-r}\|^2$ then set $j_{k+1} := j_k + 1$, else set $j_{k+1} := j_k$.
Otherwise, set $R_k := \|c_k\|^2$ and $j_{k+1} := j_k$.

2.2. A nonmonotone decrease condition for the Lagrangian function. To evaluate the descent properties of the step for the objective function we use the predicted tangential reduction of the Lagrangian $l$

$\mathrm{pred}^t_k := q_k(s^n_k) - q_k(s_k)$,

the predicted reduction of $l$ for the whole step

$\mathrm{pred}^l_k := -q_k(s_k)$,

and the relaxed actual reduction of $l$

$\mathrm{rared}^l_k := \max\Big\{ l_k,\ \sum_{r=0}^{\nu^l_k - 1} \lambda^l_{kr} l_{k-r} \Big\} - l(x_k + s_k, y_k)$,

where as above with fixed $\nu^l \in \mathbb{N}$ and $\lambda \in (0, 1/\nu^l)$

$\nu^l_k = \min\{k + 1, \nu^l\}$, $\lambda^l_{kr} \ge \lambda > 0$, $\sum_{r=0}^{\nu^l_k - 1} \lambda^l_{kr} = 1$.
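For illustration, the following hypothetical Python sketch implements Algorithm R together with the "maximum nonmonotonicity" weight choice described above; the names (c_norm, ghat_norm, a_seq, ...) are ours, not the paper's, and the relation used to increase $j$ is taken as "$\ge$" as reconstructed above.

```python
import numpy as np

def max_nonmonotone_weights(c_hist_norms, lam):
    """Weights lambda^c_{kr}: lam everywhere, remaining mass on the worst recent ||c_{k-r}||."""
    nu = len(c_hist_norms)
    w = np.full(nu, lam)
    w[int(np.argmax(c_hist_norms))] = 1.0 - (nu - 1) * lam   # weights sum to 1 by construction
    return w

def algorithm_R(c_norm, ghat_norm, c_hist_sq, weights, a_seq, j_k, alpha, beta):
    """Update of the reference value R_k (Algorithm R).  Returns (R_k, j_{k+1})."""
    a_jk = a_seq(j_k)                                    # slowly decreasing sequence a_j -> 0
    if c_norm < min(alpha * a_jk, beta * ghat_norm):     # feasibility much better than stationarity
        R_k = min(a_jk**2, ghat_norm**2)
        if R_k >= float(np.dot(weights, c_hist_sq)):     # R_k will dominate the first term of rared^c_k
            return R_k, j_k + 1
        return R_k, j_k
    return c_norm**2, j_k                                # default choice R_k = ||c_k||^2
```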

REMARK 2.1. Another natural choice would be to use $l(x_k + s_k, y_{k+1})$ in the definition of $\mathrm{rared}^l_k$ with a new multiplier estimate $y_{k+1}$ and to add the term $(y_k - y_{k+1})^T (c_k + A_k^T s_k)$ in the definition of $\mathrm{pred}^t_k$ and $\mathrm{pred}^l_k$. Our convergence analysis can easily be adapted to handle this case as well and even simplifies. On the other hand, we prefer to work only with $y_k$ until an acceptable step is found, since the computation of new multipliers $y_{k+1}$ requires the usually costly evaluation of $A(x_k + s_k)$.

The computation of $s_k$ ensures that the tangential step $s^t_k$ provides decrease for the quadratic model $q_k(s^n_k + s^t)$, since $\mathrm{pred}^t_k$ satisfies (2.5). However, this descent can be destroyed by the normal step $s^n_k$ if $s^n_k$ is too large compared with $\|\hat g_k\|$. This motivates the following admissibility criterion: If $\mathrm{pred}^l_k$ promises sufficient decrease for the whole step, more precisely, if

$\mathrm{pred}^t_k \ge \max\{\mathrm{pred}^c_k, (\mathrm{pred}^c_k)^{\mu}\}$ and $\mathrm{pred}^l_k \ge \gamma\, \mathrm{pred}^t_k$, $\mu \in (2/3, 1)$, $\gamma \in (0, 1)$,

then we require $\mathrm{rared}^l_k \ge \rho_1 \mathrm{pred}^l_k$. This leads to the following evaluation of trial steps.

Evaluation of steps: Let $\mu \in (2/3, 1)$, $\gamma \in (0, 1)$, $\rho_1 \in (0, 1)$. Accept the trial step $s_k = s^n_k + s^t_k$ if $s_k$ is acceptable for the constraints, i.e.,

$\mathrm{rared}^c_k \ge \rho_1 \mathrm{pred}^c_k$,

and if $s_k$ is acceptable for the objective function: If $\mathrm{pred}^t_k \ge \max\{\mathrm{pred}^c_k, (\mathrm{pred}^c_k)^{\mu}\}$ and $\mathrm{pred}^l_k \ge \gamma\, \mathrm{pred}^t_k$, then $\mathrm{rared}^l_k \ge \rho_1 \mathrm{pred}^l_k$ holds. If the step is not acceptable then the trust-region radius $\Delta_k$ is reduced and the step $s_k$ is recomputed.
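A hypothetical sketch of this acceptance test follows; the parameter values $\rho_1 = 0.1$, $\gamma = 0.5$, $\mu = 0.75$ are illustrative defaults chosen by us, not values prescribed by the paper.

```python
def step_is_acceptable(pred_c, rared_c, pred_t, pred_l, rared_l,
                       rho1=0.1, gamma=0.5, mu=0.75):
    """Evaluation of a trial step s_k.

    The step must be acceptable for the constraints and, whenever the model
    promises sufficient decrease of the Lagrangian for the whole step, also
    for the objective function.
    """
    # acceptability for the constraints
    if rared_c < rho1 * pred_c:
        return False
    # acceptability for the objective function (tested only in the admissible case)
    if pred_t >= max(pred_c, pred_c**mu) and pred_l >= gamma * pred_t:
        return rared_l >= rho1 * pred_l
    return True
```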

2.3. The Algorithm. We now give a complete statement of the algorithm.

Algorithm A: Let $0 < \rho_1 < \rho_2 < 1$, $0 < \gamma_1 < 1 < \gamma_2$, $0 < \alpha, \beta < 1/2$, $2/3 < \mu < 1$, and $0 < \gamma < 1$. Fix $\alpha_0 \in (0, 1)$ and choose a sequence $(a_j)$ satisfying the conditions in (2.8). Choose an initial point $x_0$ and an initial trust-region radius $\Delta_0 \ge \Delta_{\min} > 0$. Set $\nu := 1$, $k := 0$, and $j_0 := 0$.

1. (Evaluate functions at $x_k$) Compute $c_k$, $A_k$, $W_k$, $f_k$, $g_k$, $\hat g_k := W_k^T g_k$, and a Lagrange multiplier estimate $y_k$.
2. (Check for termination) If $\|c_k\| + \|\hat g_k\| = 0$: STOP.
3. (Update $R_k$) Choose the weights $\lambda^{c/l}_{kr}$ for $\mathrm{rared}^{c/l}_k$. Update $R_k$ by calling Algorithm R.
4. (Compute trial steps) Compute a quasi-normal step $s^n_k$ satisfying (2.2), (2.3), and a tangential step $s^t_k$ satisfying (2.5), (2.6). Set $s_k := s^n_k + s^t_k$.
5. (Test if $s_k$ is acceptable) If $\mathrm{pred}^l_k \ge \gamma\, \mathrm{pred}^t_k$ and $\mathrm{pred}^t_k \ge \max\{\mathrm{pred}^c_k, (\mathrm{pred}^c_k)^{\mu}\}$ then go to Step 5.1, else go to Step 5.2.
5.1. If $\mathrm{rared}^c_k < \rho_1 \mathrm{pred}^c_k$ or $\mathrm{rared}^l_k < \rho_1 \mathrm{pred}^l_k$ then set $\Delta_k := \gamma_1 \Delta_k$ and go to Step 4. Else choose $\Delta_{k+1} \in [\max\{\Delta_{\min}, \Delta_k\}, \max\{\Delta_{\min}, \gamma_2 \Delta_k\}]$, set $x_{k+1} := x_k + s_k$, $k := k + 1$, and go to Step 1.
5.2. If $\mathrm{rared}^c_k < \rho_1 \mathrm{pred}^c_k$ then set $\Delta_k := \gamma_1 \Delta_k$ and go to Step 4. Else choose $\Delta_{k+1} \in [\max\{\Delta_{\min}, \Delta_k\}, \max\{\Delta_{\min}, \gamma_2 \Delta_k\}]$, set $x_{k+1} := x_k + s_k$, $k := k + 1$, and go to Step 1.

In our formulation of the algorithm we have avoided a further index that distinguishes between different instances of trial steps at iteration level $k$. To prevent possible ambiguities, we use the following

Notation: We say that the step $s_k$ is accepted (or successful) if it is used in Step 5.1 or 5.2 to compute the new iterate, i.e., $x_{k+1} = x_k + s_k$. If it is necessary to reference the accepted, i.e., final, values of $s_k$, $s^n_k$, $s^t_k$, $\Delta_k$, $\mathrm{pred}^c_k$, $\mathrm{pred}^l_k$, $\mathrm{pred}^t_k$, $\mathrm{rared}^c_k$, and $\mathrm{rared}^l_k$ at iteration level $k$, we denote them by $s_{k,a}$, $s^n_{k,a}$, $s^t_{k,a}$, $\Delta_{k,a}$, $\mathrm{pred}^c_{k,a}$, $\mathrm{pred}^l_{k,a}$, $\mathrm{pred}^t_{k,a}$, $\mathrm{rared}^c_{k,a}$, and $\mathrm{rared}^l_{k,a}$, respectively.

3. Global convergence analysis.

3.1. Statement of the global convergence result. The following theorem states the global convergence properties of Algorithm A.

THEOREM 3.1.
(i) Under assumptions (A1)-(A3), Algorithm A is well defined as long as $x_k$ stays in $\hat\Omega$. Moreover, if Algorithm A does not terminate finitely, then the following holds:
(ii) If assumptions (A1)-(A5) are satisfied, then $\lim_{k \to \infty} \|c_k\| = 0$.
(iii) If assumptions (A1)-(A6) are satisfied, then in addition $\liminf_{k \to \infty} \|\hat g_k\| = 0$.

The proof requires several steps and is carried out in the remainder of this section. In particular, part (i) is proved in section 3.2, Lemma 3.2, part (ii) in section 3.4, Lemma 3.7, and part (iii) in section 3.5, Lemma 3.9. A local convergence analysis showing the transition to fast local convergence under suitable conditions on the step computation will be given in section 4. Throughout the remainder of this section, we will not consider the case where Algorithm A terminates successfully in Step 2, since in this situation the global convergence is trivial.

For the convergence analysis it will be convenient to introduce also the actual reduction of the Lagrangian

$\mathrm{ared}^l_k := l_k - l(x_k + s_k, y_k)$.

The following estimates, obtained by the mean value theorem, will be used several times in this section. We recall that $s_k = s^n_k + s^t_k$, $A_k^T s_k = A_k^T s^n_k$, and $\max\{\|s^n_k\|, \|s^t_k\|\} \le \Delta_k$. Assume that $x_k$ and $x_k + s_k$ are contained in $\Omega$. Denoting by $\tau \in [0, 1]$ an appropriate generic constant that is adjusted from case to case, and writing $x^{\tau}_k = x_k + \tau s_k$, we find that under assumption (A1) holds

$|\mathrm{ared}^c_k - \mathrm{pred}^c_k| \le \|A_k A_k^T\| \Delta_k^2 + 4 \|A(x^{\tau}_k) c(x^{\tau}_k) - A_k c_k\| \Delta_k$,   (3.1)

$|\mathrm{ared}^l_k - \mathrm{pred}^l_k| \le 2 \big( \|g(x^{\tau}_k) - g_k\| + \|(A(x^{\tau}_k) - A_k) y_k\| \big) \Delta_k + 2 \|H_k\| \Delta_k^2$.   (3.2)
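To connect the pieces before the analysis begins, here is a hypothetical high-level sketch of the outer loop of Algorithm A; the routines evaluate_at, update_R, quasi_normal_step, tangential_step, and step_is_acceptable on the problem object are placeholders for the components described above (e.g., the Cauchy-type steps and the acceptance test sketched earlier), not an implementation from the paper, and the radius update simply takes the upper end of the admissible interval.

```python
def algorithm_A_outer_loop(x0, delta0, delta_min, problem, max_iter=100,
                           gamma1=0.5, gamma2=2.0, tol=1e-8):
    """Hedged sketch of Algorithm A's outer loop (illustrative placeholders only)."""
    x, delta = x0, max(delta0, delta_min)
    for k in range(max_iter):
        data = problem.evaluate_at(x)                    # Step 1: c_k, A_k, W_k, g_k, ghat_k, y_k
        if data.c_norm + data.ghat_norm <= tol:          # Step 2: termination (tol replaces "= 0")
            return x
        R_k = problem.update_R(data)                     # Step 3: Algorithm R
        while True:                                      # inner loop over trial radii
            s_n = problem.quasi_normal_step(data, delta)         # Step 4
            s_t = problem.tangential_step(data, s_n, delta)
            s = s_n + s_t
            if problem.step_is_acceptable(data, s, R_k, delta):  # Steps 5, 5.1, 5.2
                break
            delta *= gamma1                              # shrink the trust region and retry
        x = x + s                                        # accept the step
        delta = max(delta_min, gamma2 * delta)           # enlarge the radius for the next iteration
    return x
```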

10 8 M ULBRICH AND S ULBRICH 32 Well definedness We start by showing that the algorithm is well defined, and thus establish part (i) of Theorem 31 We first note that under assumptions (A1) (A3) normal steps sk n satisfying (22), (23), and tangential steps st k satisfying (25), (26) can be obtained by enforcing a fraction of Cauchy decrease condition Hereby, (A3) ensures that the constants K 1, K 2, K 3 in (22), (23), and (25) can be chosen independently of k as long as x k We refer to [9] and, for details on the practical computation of steps providing a fraction of Cauchy decrease, to [20] LEMMA 32 Let the assumptions (A1) (A3) hold Then for x k with c k + ĝ k > 2ε > 0 there exists δ > 0 such that the step s k is accepted in Step 5 of Algorithm A whenever k δ Hence, the algorithm is well defined as long as x k Further, if also (A4) and (A5) hold and if g and A are uniformly continuous on then δ can be chosen depending only on max{min{ε, c k }, min{ε, a jk }},, ˆ, and the bounds in assumption (A3) and (A5) Proof We start with a δ (0,ε] such that the closed δ-ball about x k lies in If x k ˆ, we always achieve this by choosing δ = min { ε, 1 2 dist( ˆ, R n \ ) } Further adjustment of δ will be performed as the proof proceeds Case 1: c k ε Since δ ε, (27) implies that for any k δ we have pred c k K 2ε k Now reduce δ such that for all s R n, s 2δ, holds A k A T k δ + 4 A(x k + s)c(x k + s) A k c k (1 ρ 1 )K 2 ε, (33) which is possible by assumption (A1) This together with (31) implies rared c k predc k aredc k predc k ρ 1pred c k + (1 ρ 1)(pred c k K 2ε k ) ρ 1 pred c k If pred l k γ predt k and predt k max{predc k,(predc k )µ } then we have to satisfy the additional condition rared l k ρ 1pred l k (see Step 51) To achieve this, we note that in this case holds pred l k γ predc k γ K 2ε k, where we have used (27) We now reduce δ > 0 such that for all s R n, s 2δ, holds H k δ + g(x k + s) g k + (A(x k + s) A k )y k γ 2 (1 ρ 1)K 2 ε, (34) which can be done by assumption (A1) Then (32) ensures that the test in Step 51 is passed for all k δ Hence, the step is accepted if k δ If (A4) and (A5) hold and if g, A are uniformly continuous on then our mechanism of reducing δ can be done depending only on ε = min{ε, c k },, ˆ and on the bounds in the assumptions Case 2: ĝ k > ε By assumption (A3) there exists K 7 > 0 with W T k q k(s n k ) ĝ k K 7 k Hence, if we reduce δ such that δ 2K 1 7 ε then for all k δ holds k 2K 1 7 ĝ k and therefore by (25) pred t k K 3 4 ε min {ε, k} = K 3 4 ε k (35) Case 21: pred t k < predc k Then pred c k K 3 4 ε k for k δ We reduce δ until for all s R n with s 2δ holds A k A T k δ + 4 A(x k + s)c(x k + s) A k c k (1 ρ 1 ) K 3 4 ε

11 NONMONOTONE TRUST-REGION METHODS WITHOUT PENALTY FUNCTION 9 This is possible by (A1) Invoking (31), we obtain rared c k aredc k ρ 1pred c k whenever k δ and thus the trial step is accepted in Step 52 If (A4) and (A5) hold and g, A are uniformly continuous, δ can be chosen depending only on ε and on the bounds in the assumptions Case 22: pred t k predc k For the acceptance of the step we have to make sure that rared c k ρ 1pred c k The additional condition rared l k ρ 1pred l k is only required if also predl k γ predt k In this case, the latter requirement is met by noting that (A1) allows to reduce δ such that for all s R n, s 2δ, holds H k δ + g(x k + s) g k + (A(x k + s) A k )y k γ 2 (1 ρ 1) K 3 4 ε Then, by (32) and (35), rared l k aredl k ρ 1pred l k if k δ If (A4) and (A5) hold and if g and A are uniformly continuous then δ can be chosen depending only on ε The first requirement rared c k ρ 1pred c k can be achieved by reducing δ further according to the following cases: Case 221: c k > min { αa jk,βε } def = ε jk Reduce δ such that δ c k and that for all s R n, s 2δ, holds A k A T k δ + 4 A(x k + s)c(x k + s) A k c k (1 ρ 1 )K 2 c k This is again possible by (A1) By (27) we have pred c k K 2 c k k if k δ Hence, rared c k aredc k ρ 1pred c k by (31) whenever k δ and therefore the step is accepted in 51 If (A4) and (A5) hold and if g and A are uniformly continuous then δ can be chosen depending only on min{ε, c k } > min { αa jk,βε } and the bounds in the assumptions Case 222: c k ε jk Then R k min{a 2 j k,ε 2 } ε 2 j k / max{α 2,β 2 } 4ε 2 j k and with suitable τ [0, 1] and x τ k = x k + τ s k holds rared c k R k c k 2 (A(x τ k )c(xτ k ))T s k Since (A k c k ) T s k = (A k c k ) T s n k 0, this gives rared c k 3ε2 j k A(x τ k )c(xτ k ) A kc k k Moreover, pred c k c k 2 ε 2 j k Now reduce δ such that for all s R n, s 2δ, holds 3ε 2 j k A(x k + s)c(x k + s) A k c k δ ρ 1 ε 2 j k This is possible by (A1) Then rared c k ρ 1pred c k and, hence, the trial step is accepted in Step 52 for all k δ If (A4) and (A5) hold and if g and A are uniformly continuous then δ can be chosen depending only on min{αa jk,βε} = ε jk min{ε, c k } and the bounds in the assumptions 33 A nonmonotone decrease Lemma The following crucial decrease Lemma is a slight modification of [29, Lem 43] LEMMA 33 Suppose that there exists K 0 such that for all iteration levels k K holds rared c k,a = max { c k 2, ν c k 1 r=0 λ c kr c k r 2 } c k+1 2

12 10 M ULBRICH AND S ULBRICH Then for all k K c k+1 2 max R K νk c <l K l ρ 1 k λ min{k r,νc} predr,a c (36) Proof Set M K = max K ν c K <l K R l and ared c k,a = c k 2 c k+1 2 The proof is by induction Since R l c l 2, 0 l K, we have for k = K r=k M K c k+1 2 rared c k,a ρ 1pred c k,a Now let k K If rared c k+1,a = aredc k+1,a we get by (36) k c k+2 2 = c k+1 2 rared c k+1,a M K ρ 1 k λ min{k r,νc} predr,a c ρ 1pred c k+1,a, which implies (36) k+1, since 0 < λ < 1 Now consider the case where rared c k+1,a aredc k+1,a Then we obtain with q = νc k+1 1 by using (36) and the fact that c k+1 p 2 M K for K ν c K < k + 1 p K, c k+2 2 = q λ c k+1,p c k+1 p 2 rared c k+1,a p=0 q p=0 λ c k+1,p k q M K ρ 1 r=k k p M K ρ 1 λ min{k p r,νc} predr,a c ρ 1 pred c k+1,a r=k ρ 1 pred c k+1,a r=k λ min{k r,νc} pred c r,a ρ 1λ k r=max{k,k q+1} λ min{k r,νc} pred c r,a Now r k q + 1 yields k r q 1 ν c 2 and therefore 1 + min {k r,ν c } = min {k + 1 r,ν c } Thus, we see from the last chain of inequalities that c k+2 2 M K ρ 1 k λ min{k+1 r,νc} predr,a c ρ 1pred c k+1,a r=k k+1 = M K ρ 1 r=k λ min{k+1 r,νc }pred c r,a which concludes the proof To show the convergence towards stationary points we will moreover use the following decrease Lemma, which is very similar to the previous Lemma 33 LEMMA 34 Suppose that there exists K 0 such that for all iteration levels k K holds Then for all k K l k+1 rared l k,a + l(x k+1, y k ) l k+1 ρ 1 2 predl k,a max l l ρ 1 K νk l <l K 2 k λ min{ k r,ν l} predr,a l (37) r=k

13 NONMONOTONE TRUST-REGION METHODS WITHOUT PENALTY FUNCTION 11 Proof We only note that by the definition of rared l k,a holds { ν rared l k,a + l(x k l 1 k+1, y k ) l k+1 = max l k, r=0 λ l kr l k r } l k+1 Now (37) follows exactly by the same arguments as in the proof of Lemma (33) 34 Convergence to feasible points The following auxiliary result will be useful: LEMMA 35 Let the assumptions (A1) (A4) hold If j is increased by Algorithm R in iteration k, ie, j k+1 = j k + 1, then c k 1 λ a jk for all k k In particular, for all iterations k with j k 1 holds c k a j k λα0, (38) where α 0 is the constant in (28) Proof If j k+1 = j k + 1 then we must have c k < min{αa jk,β ĝ k } αa jk and a 2 j k R k ν c k 1 r=0 λ c kr c k r 2 Thus, using λ c kr λ, we obtain c k 2 1 λ a2 j k, k = k + 1 ν c k,,k We now show by induction c k 1 λ a jk for all k k + 1 ν c k (39) For k = k + 1 νk c,, k this is already shown Now let the assertion hold for the iterations k + 1 νk c,,k k Since by Lemma 32 the k -th iteration will eventually be successful, we obtain in particular that 0 ρ 1 pred c k,a raredc k,a = max {R k, ν c k 1 r=0 λ c k r c k r 2 } c k +1 2 { } } Since R k max c k 2, a 2 j max { c k k 2, a 2, we have by the induction hypothesis jk { c k +1 2 max a 2 j k, c k 2,, c k +1 ν c k 2} 1 λ a2 j k This proves the first assertion

14 12 M ULBRICH AND S ULBRICH Now c k a jk / λ follows immediately if j is increased by Algorithm R in iteration k, ie, j k+1 = j k + 1 For all subsequent iterations k satisfying j k = j k+1 we have by our previous result and by (28) c k a j λ k = a j k 1 a j k λ α 0 λ Therefore, (38) holds for all k with j k 1 The next Lemma shows that c k must converge to zero if the assumption of the decrease Lemma 33 does not hold LEMMA 36 Let the assumptions (A1) (A4) hold If for infinitely many iterations holds rared c k,a max { c k 2, ν c k 1 r=0 λ c kr c k r 2 } c k+1 2 then j k and c k 0 Proof Under the assumptions of the Lemma, there exists an infinite subsequence of iterations k for which holds ν { c k 1 R k = min a 2 j, ĝ k k 2} > c k 2 and R k > λ c k r c k r 2 Hence, j k +1 = j k + 1 in each iteration k and thus we must have j k But now Lemma 35 yields lim c a jk k lim λα0 = 0 k k r=0 Combining Lemmas 33 and 36, we can establish convergence to feasible points, which proves part (ii) of Theorem 31 LEMMA 37 If Algorithm A does not terminate finitely then lim c k = 0 k Proof Assume that c k does not tend to zero Then Lemma 36 yields possibly after increasing K rared c k,a = max { c k 2, ν c k 1 r=0 λ c kr c k r 2 } c k+1 2 for all k K (310) Thus Lemma 33 is applicable and we obtain that for all k K holds c k+1 2 M K ρ 1 λ νc We first show that k r=k pred c k,a, where M K = max K ν c K <l K R l (311) lim inf k c k = 0 (312)

15 NONMONOTONE TRUST-REGION METHODS WITHOUT PENALTY FUNCTION 13 If this is wrong then possibly after increasing K there exists ε > 0 with c k ε for all k K Now by (23) which, together with (311), shows that pred c k,a K 2ε min { ε, k,a }, (313) k,a < (314) k=k Thus, (x k ) ˆ is a Cauchy sequence and converges to some x ˆ The continuity of c, A, and g and the boundedness of (H k ) and (y k ), see (A1) and (A5), implies the existence of 0 < δ ε and δ > 0 such that for all x k with x k x δ and all s with s 2δ the inequalities (33) and (34) are satisfied In the proof of Lemma 32, Case 1, it was shown (note c k ε) that the step is accepted if (33), (34) hold for all s with s 2δ and if in addition k δ Since for all sufficiently large k K we have x k x δ, the mechanism of updating k would thus ensure that the step is accepted with k,a min { min,γ 1 δ} This contradicts (314) and (312) is proven Now assume that (312) holds, but c k does not converge to zero Then there is ε > 0 with cˆk 2ε for a subsequence (ˆk) By (312), we can associate with each ˆk some k ˆk with c k+1 < ε, c k ε, k = ˆk,, k As a consequence we have by (23) pred c k,a K 2ε min { } ε, k,a, k = ˆk,, k (315) Since moreover (311) holds, we must have k k=ˆk and thus by (315) and s k,a 2 k,a pred c k,a 0 for ˆk x k+1 xˆk 2 k k=ˆk k,a 0 for ˆk Since c is Lipschitz continuous on by (A3), we conclude that ε = 2ε ε c k+1 cˆk c k+1 cˆk 0 for ˆk which is a contradiction Hence, our assumption was wrong and the proof is complete 35 Convergence to stationary points As a next step we consider the convergence behavior of the reduced gradient ĝ k = W T k g k The following Lemma gives an important lower bound for acceptable trust-region radii Hereby, we establish two variants of the result One holds in the general setting of assumptions (A1) (A6) The second result is stronger and holds under assumption (A7) It is used in section 4 to achieve locally quadratic convergence LEMMA 38 Let the assumptions (A1) (A6) be satisfied Then the following holds

16 14 M ULBRICH AND S ULBRICH (i) There exists a constant κ 1 > 0 independent of k such that rared c k ρ 1pred c k is satisfied whenever { max{ s k n, st k } δ k = def κ 1 min 1, max{min{a jk, ĝ k }, c k } 2/3} (316) (ii) There exists a constant κ 2 > 0 independent of k such that the step s k is accepted whenever max{ s n k, st k } min { δ k,κ 2 max{ ĝ k, c k } } (317) with δ k as in (i) (iii) If in addition (A7) holds then there exists θ > 0 and κ 1 > 0 in (i) can be chosen such that the step s k is accepted whenever x k x < θ and max{ s n k, st k } min { δ k,κ 1 max{ ĝ k, c k } µ} (318) Proof Set σ k = max{ s n k, st k } and note that σ k k (i): Taylor expansion yields with x τ k = x k + τ s k and appropriate τ [0, 1] ared c k predc k 4 A(xτ k )A(xτ k )T A k A T k σ 2 k c(x τ k ) c(xτ k ) σ 2 k Using (A3), (A6) we conclude that there exists K 5 > 0 with We now consider two cases ared c k predc k K 5σ 2 k (σ k + c k ) (319) Case 1: c k min{αa jk,β ĝ k } Since rared c k aredc k, the decrease condition (23) and (319) ensure that raredc k ρ 1pred c k holds if K 5 σ 2 k (σ k + c k ) (1 ρ 1 )K 2 c k min{ c k,σ k } (320) If c k σ k, (320) holds if 2K 5 σ 3 k (1 ρ 1)K 2 c k 2, ie, if In the case c k > σ k, (320) is satisfied if σ k C 1/3 1 c k 2/3, C 1 def = (1 ρ 1)K 2 2K 5 2K 5 σ 2 k c k (1 ρ 1 )K 2 c k σ k, ie, if σ k C 1 Since c k min{αa jk,β ĝ k } the assertion (i) thus holds with κ 1 = C 2 def = min{c 1, C 1/3 1 min{α,β} 2/3 } Case 2: c k < min{αa jk,β ĝ k } Then R k = min{a 2 j k, ĝ k 2 } according to Algorithm R We obtain rared c k R k c k+1 2 = R k c k 2 + pred c k + (aredc k predc k )

17 NONMONOTONE TRUST-REGION METHODS WITHOUT PENALTY FUNCTION 15 Using (319) we get rared c k predc k if R k c k 2 K 5 σ 2 k (σ k + c k ) (321) Case 21: R k = ĝ k 2 Then R k c k 2 (1 β 2 ) ĝ k 2 since c k β ĝ k, and (321) is satisfied if If ĝ k > σ k, (322) holds if (1 β 2 ) ĝ k 2 K 5 σ 2 k (σ k + β ĝ k ) (322) (1 β 2 ) ĝ k 2 K 5 (1 + β) ĝ k 2 σ k, ie, if σ k C 3 def = 1 β K 5 If ĝ k σ k, (322) holds if (1 β 2 ) ĝ k 2 K 5 (1 + β)σ 3 k, ie, if σ k C 1/3 3 ĝ k 2/3 Therefore, since c k β ĝ k, (322) holds if { σ k min C 3, C 1/3 3 max { ĝ k, c k } } 2/3 Case 22: R k = a 2 j k Then R k c k 2 (1 α 2 )a 2 j k and c k αa jk As in Case 21, (321) holds if which yields with c k αa jk { } σ k min C 4, C 1/3 4 a 2/3 def j k, C 4 = 1 α, K 5 σ k min { C 4, C 1/3 4 max { a jk, c k } 2/3 } { } Thus, the assertion (i) is proven with κ 1 = min C 2, C 3, C 1/3 3, C 4, C 1/3 4 (ii): If pred t k < max { pred c k,(predc k )µ} or pred l k < γ predt k then no further acceptance criteria are required and we are done Otherwise, we have pred t k max{ pred c k,(predc k )µ} and pred l k γ predt k, (323) and get a further restriction on σ k by the requirement rared l k ρ 1pred l k Now (32) and (A6) yield a constant K 6 > 0 with ared l k predl k K 6σ 2 k (324) We consider first the case that c k ĝ k We know that in the present case holds pred l k γ predt k γ predc k γ K 2 c k min { c k,σ k } Since rared l k aredl k, we conclude from (324) that raredl k ρ 1pred l k is ensured if K 6 σ 2 k γ(1 ρ 1)K 2 c k min { c k,σ k }

18 16 M ULBRICH AND S ULBRICH This is satisfied if σ k C 5 c k, { ( ) } γ(1 ρ 1 )K 2 γ(1 ρ1 )K 1/2 2 C 5 = min, K 6 K 6 Hence, in the case c k ĝ k the step is accepted if (317) holds with κ 2 = C 5 We now consider the case ĝ k c k By (A3) and (A5) there exists a constant K 7 > 0 with W T k q k(s n k ) ĝ k K 7 s n k ĝ k K 7 σ k, (325) W T k q k(s n k ) ĝ k + K 7 s n k ĝ k + K 1 K 7 c k, (326) where we have used (22) in the last inequality Thus, we obtain for σ k 1 2K 7 ĝ k by (25) and (325) pred l k γ predt k γ K 3 4 ĝ k min { ĝ k,σ k } (327) Hence, by (324) we have rared l k ρ 1pred l k whenever σ k 1 2K 7 ĝ k and K 6 σ 2 k γ(1 ρ 1) K 3 4 ĝ k min { ĝ k,σ k } All this is satisfied whenever { 1 σ k C 5 ĝ k, C 5 = min, γ(1 ρ ( ) } 1)K 3 γ(1 ρ1 )K 1/2 3, 2K 7 4K 6 4K 6 Hence, the step is accepted if (317) holds with κ 2 = min{c 5, C 5 }, which completes the proof of (ii) (iii): Now let in addition (A7) hold We choose θ > 0 such that the 2θ-neighborhood of x is contained in N In the rest of the proof we show that after a possible further reduction of κ 1 the step is accepted whenever x k x < θ and (318) is satisfied Therefore, let x k satisfy x k x < θ As already in (ii), we have only to consider the case (323) in which we have to ensure that rared l k ρ 1pred l k To this end, let 0 < κ 1 θ/2 be such that (i) holds for κ 1 = κ 1 (and thus for all 0 < κ 1 κ 1 ) and consider steps that satisfy (316) for κ 1 = κ 1 In particular, we then have σ k θ/2 and thus [x k, x k + s k ] N Using (A7), Taylor expansion yields ared l k predl k = 1 2 st k ( 2 x l(xτ k, y k) 2 x l(x k, y k ))s k for appropriate x τ k = x k + τ s k, τ [0, 1] Thus, (A7) yields a constant K 8 > 0 with ared l k predl k K 8σ 3 k (328) Now we conclude from (22) and the first inequality in (325) that We consider first the case W T k q k(s n k ) ĝ k K 1 K 7 c k (329) c k 1 max{ ĝ k, Wk T 2K 1 K q k(sk n ) } (330) 7

19 Then by (329) NONMONOTONE TRUST-REGION METHODS WITHOUT PENALTY FUNCTION 17 W T k q k(s n k ) 1 2 ĝ k and thus (327) holds by (25) Hence, (327), (328) guarantee rared l k ρ 1pred l k whenever This is satisfied for σ k C 6 min{ ĝ k 2/3, ĝ k 1/2 }, Moreover, we have in the case (330) K 8 σ 3 k γ(1 ρ 1) K 3 4 ĝ k min { ĝ k,σ k } { (γ(1 ) ρ1 )K 1/3 ( ) } 3 γ(1 ρ1 )K 1/2 3 C 6 = min, 4K 8 4K 8 c k 1 K 1 K 7 ĝ k (331) In fact, either (331) follows directly from (330) or we have W T k q k(s n k ) 2K 1K 7 c k and thus (326) yields (331) We now choose κ 1 = min{κ 1, C 6 min{1,(k 1 K 7 ) µ }} Then (318) implies σ k δ k κ 1 and (note that µ > 2/3) σ k κ 1 min{1, max{ ĝ k, c k } µ } κ 1 min{1, ĝ k µ max{1,(k 1 K 7 ) µ }} C 6 min{1, ĝ k µ } C 6 min{ ĝ k 2/3, ĝ k 1/2 } Hence, the proof of (iii) is complete in the case (330) It remains to consider the case c k > 1 max{ ĝ k, Wk T 2K 1 K q k(sk n ) } (332) 7 Now, since s t k = W kd k with d k = (W T k W k) 1 W T k st k, (A3) and (A5) yield a constant K 9 > 0 such that pred t k W T k q k(s n k ) (W T k W k) 1 W T k st k H k σ 2 k K 9 σ k ( W T k q k(s n k ) + σ k) Since pred t k max{ pred c k,(predc k )µ}, we deduce from (23) which yields Introducing the constant C 7 = K 9 σ k ( W T k q k(s n k ) + σ k) K µ 2 c k µ min{ c k µ,σ µ k } K 9 σ k (2K 1 K 7 c k + σ k ) K µ 2 c k µ min{ c k µ,σ µ k } K µ 2 (1+2K 1 K 7 )K 9, this implies σ 2 k C 7 c k 2µ, if c k σ k, (333) σ k c k C 7 c k µ σ µ k, if c k > σ k (334)

20 18 M ULBRICH AND S ULBRICH In the case (333) we obtain σ k C 1/2 7 c k µ Since c k is bounded by (A3) and 2/3 < µ < 1, we see that in the case (334) there exists a constant C 8 > 0 with σ k C 8 We conclude that in the situation (323), (332) always holds σ k min{c 1/2 7 c k µ, C 8 } Since c k 1 2K 1 K 7 ĝ k, we find that with C 9 = min{c 1/2 7 min{1,(2k 1 K 7 ) µ }, C 8 } holds σ k C 9 min{1, max{ c k, ĝ k } µ } Choosing κ 1 = min{κ 1, C 9}, this concludes the proof of (iii) also for the case (332) The following Lemma establishes part (iii) of Theorem 31 LEMMA 39 Let (A1) (A6) hold If the algorithm does not terminate finitely then lim inf k ĝ k = 0 Proof Assume that the algorithm runs infinitely and that there are K 0 and ε (0, 1] with ĝ k > 2ε for all k K We first show that after a possible increase of K holds pred l k,a γ predt k,a, predt k,a max{ pred c k,a,(predc k,a )µ}, for all k K As in the proof of Lemma 38, see (329), there exists a constant K 7 > 0 with W T k q k(s n k,a ) ĝ k K 7 s n k,a 2ε K 1K 7 c k for all k K, where we have used (22) Since c k 0 by Lemma 37, we can increase K such that Wk T q k(sk,a n ) ε for all k K, and thus by (25) pred t k,a K 3ε min { ε, k,a } for all k K (335) On the other hand holds pred c k,a 2 A kc k s n k,a + A k A T k sn k,a 2, and, hence, by (22) and (A3) find a constant K 10 > 0 with pred c k,a K 10 c k min { c k, k,a } (336) Since c k 0 by Lemma 37 and ĝ k ε, we obtain from Lemma 38, (i) (ii), and the mechanism of updating k that after a possible increase of K holds k,a γ 1 κ 1 c k 2/3 c k for all k K, (337) and thus we have by (335) (337) that, for sufficiently large K, pred c k,a K 10 c k 2, pred t k,a K 3εγ 1 κ 1 c k 2/3 for all k K (338) Hence, using µ > 2/3 and c k 0, we see from (338) that, possibly after increasing K, pred t k,a > max{ pred c k,a,(predc k,a )µ} for all k K (339)

21 NONMONOTONE TRUST-REGION METHODS WITHOUT PENALTY FUNCTION 19 Moreover, we note that by (A3), (A5), and (22) there is K 11 > 0 with Hence, K can by (338) be enlarged such that pred l k,a predt k,a = q k(s n k,a ) K 11 c k pred l k,a γ predt k,a for all k K (340) Therefore, for all k K the accepted step s k,a satisfies (339) and (340) Thus, for all k K, the acceptance of the step takes place in Step 51 In particular, the accepted steps satisfy rared l k,a ρ 1pred l k,a γρ 1K 3 ε min { ε, k,a } for all k K, (341) where we have used (335) By our assumption holds ĝ k 2ε for all k K Thus Lemma 38, (i) (ii) and the mechanism of updating k yields a constant C 1 > 0 with k,a C 1 min { 1, max{min{a jk, 2ε}, c k } 2/3} C 1 min { } ε, a 2/3 j k (342) Hence, we obtain from (341) some C 2 > 0 with { } pred l k,a C 2ε min ε, a 2/3 j k for all k K (343) By (341) holds rared l k,a ρ 1pred l k,a for all k K We want to apply the decrease Lemma 34 and since rared l k,a uses l(x k+1, y k ), we show that l k+1 l(x k+1, y k ) becomes small compared to pred l k,a In fact, l(x k+1, y k+1 ) l(x k+1, y k ) y k+1 y k c k+1 (344) Using pred c k,a 0, we have with (319) c k+1 2 = c k 2 ared c k,a c k 2 + ared c k,a predc k,a c k 2 + K 5 2 k,a ( k,a + c k ) This together with (337) yields a constant K 12 > 0 such that c k+1 2 K 12 3 k,a for all k K (345) Using (A5), (341), (344), and (345) we obtain possibly after increasing K l(x k+1, y k+1 ) l(x k+1, y k ) ρ 1 2 predl k,a for all k K In fact, by (341), (344), (345) this is clear for k,a C 3, C 3 > 0 small enough After increasing K this holds also for all k,a C 3 > 0, since by c k+1 0 according to Lemma 37 the left term tends by (344) to zero whereas the right hand side has by (341) a positive lower bound Hence, we get with (341) rared l k,a + l(x k+1, y k ) l k+1 ρ 1 2 predl k,a for all k K This yields by Lemma 34 with M l K = max K ν l K <l K l l l k+1 M l K ρ 1 2 λνl k r=k pred l k,a

22 20 M ULBRICH AND S ULBRICH for all k K Since l k is bounded from below by (A3) and (A5) this gives with (343) k=k C 2 min{1, a 2/3 j k } pred l k,a < But the left hand side is not summable because a η j is not summable and η > 2/3 Hence, we have derived a contradiction and the proof is complete 4 Transition to fast local convergence Throughout this section we assume that assumptions (A1) (A7) hold We now show that the proposed Algorithm A converges with local quadratic rate towards a point satisfying the second order sufficient condition Hereby, we work with an SQP-Newton-type step computation These steps are shown to be accepted by our algorithm in a neighborhood of a stationary pair ( x, ȳ) ˆ R m satisfying the following standard sufficient second order condition: (O2) k=k ( ) c( x) = 0, W( x) T x 2 l( x, ȳ)w( x) positive definite, A( x) has full column rank ĝ( x) The last condition ensures that the Lagrange multiplier ȳ is unique 41 Requirement on the step computation To achieve fast local convergence we have to ensure that close to the solution SQP-Newton steps are taken This requires an appropriate splitting of these steps in their quasi-normal and tangential part and a careful choice of the Lagrange multiplier update rule For the derivation of these concepts, we begin by collecting some facts about the local convergence behavior of SQP methods Hereby, we choose an informal style of presentation since these results are by now well-known The SQP-Lagrange-Newton system is given by ( 2 x l(x k, y k ) A k Ak T 0 ) ( ) sn,k = z N,k ( ) x l(x k, y k ) c k Under the assumptions (A7) and (O2) it is well known that for all ζ (0, 1) there exist neighborhoods U N of x and V N of ȳ such that for x k U N, y k V N the steps s N,k and z N,k are well defined with x k + s N,k U N, y k + z N,k V N, x k + s N,k x + y k + z N,k ȳ ζ ( x k x + y k ȳ ), (41) and that for (x k, y k ) ( x, ȳ) holds x k + s N,k x + y k + z N,k ȳ = O( x k x 2 + y k ȳ 2 ), (42) l(x k + s N,k, y k + z N,k ) + c(x k + s N,k ) = O( l k 2 + c k 2 ) (43) Furthermore, if s n N,k and st N,k satisfy A T k sn N,k = c k, s t N,k = W kdn,k t, where (W k T 2 x l kw k )dn,k t = (ĝ k + Wk T 2 x l ks n N,k ), (44) then we have s N,k = s n N,k + st N,k Hereby, we have used the identity ĝ k = Wk T xl k Note that s n N,k solves the unconstrained quasi-normal problem min c k + A T k sn 2,

23 NONMONOTONE TRUST-REGION METHODS WITHOUT PENALTY FUNCTION 21 and that s t N,k is the corresponding solution of the unconstrained tangential problem min q k (s n N,k + st ) subject to A T k st = 0 with H k = 2 x l(x k, y k ) We would like to achieve quadratic convergence of (x k ) rather than (x k, y k ) To this end, similar as in [19], let Y : U N R m be a consistent update rule for the Lagrange multiplier, which is Lipschitz continuous at x, ie, Y( x) = ȳ, Y(x) Y( x) L y x x for all x U N (45) By a possible reduction of U N and V N we achieve that (41) holds for ζ 1/(2+2L y ), and a further reduction of U N yields Y(U N ) V N Thus, for all x k U N holds y k = Y(x k ) V N and x k + s N,k x ζ ( x k x + y k ȳ ) 1 2 x k x (46) x k + s N,k x = O( x k x 2 + y k ȳ 2 ) = O( x k x 2 ), (47) where we have used (41), (42), and (45) Therefore, the iteration x k x k+1 = x k + s N,k with y k = Y(x k ) converges q-quadratically to x In the sequel we restrict ourselves to the following class of update rules: Let B : U N R n m be continuously differentiable such that A T B is uniformly bounded invertible on U N We introduce the multiplier update Y(x) = (B(x) T A(x)) 1 B(x) T g(x), (48) which is obviously consistent and continuously differentiable Therefore, after reducing U N if necessary, B is bounded on U N and Y satisfies the Lipschitz condition In particular, if we choose B = A, we obtain the well-known least-squares multiplier update Furthermore, the adjoint update, which is widely used in optimal control, also fits in this framework: Let x be partitioned in the form x = (z T, u T ) T R m R n m such that z c(x) is invertible on U with uniformly bounded inverse In an optimal control context the standard choice for z is the state, and for u the control The adjoint update for this splitting now corresponds to B T = (Bz T, BT u ) = (I, 0) Among the many possible solutions s n N,k of the quasi-normal problem we select the one contained in span(b k ), ie, By construction holds for y k = Y(x k ) s n N,k = B k(a T k B k) 1 c k (49) l(x k, y k ) T s n N,k = 0 (410) Further, there exist constants K 1, K 13 > 0 such that s n N,k K 1 c k, s t N,k K 13( ĝ k + c k ), (411) where the first inequality follows from (49) and for the derivation of the second inequality we use (44) to obtain s t N,k = W k(w T k 2 x l kw k ) 1 (ĝ k + W T k 2 x l ks n N,k )
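For illustration, the following hypothetical sketch (dense NumPy linear algebra, names chosen by us) computes the Newton-type quasi-normal step, tangential step, and multiplier update in the spirit of (4.4), (4.8), and (4.9). The minus signs are our reconstruction: with the convention $l(x, y) = f(x) + y^T c(x)$ they give $A_k^T s^n_{N,k} = -c_k$ and $B_k^T \nabla_x l(x_k, y_k) = 0$, which is what the orthogonality relation (4.10) requires; in practice the reduced system would typically be solved by (projected) CG rather than a direct factorization.

```python
import numpy as np

def sqp_newton_steps(A, W, B, g, c, hess_lag):
    """Newton-type quasi-normal/tangential steps and multiplier update (sketch).

    A : (n, m) transposed constraint Jacobian, W : (n, n-m) null-space basis of A^T,
    B : (n, m) matrix defining the multiplier update (B = A gives least squares),
    g : (n,) objective gradient, c : (m,) constraints, hess_lag : (n, n) Hessian of l.
    """
    # multiplier update  y = -(B^T A)^{-1} B^T g   (least-squares multipliers for B = A)
    y = -np.linalg.solve(B.T @ A, B.T @ g)
    # quasi-normal step  s^n = -B (A^T B)^{-1} c   (satisfies A^T s^n = -c, s^n in span(B))
    s_n = -B @ np.linalg.solve(A.T @ B, c)
    # reduced Newton system for the tangential step:
    #   (W^T H W) d = -(ghat + W^T H s^n),   s^t = W d
    ghat = W.T @ g
    reduced_H = W.T @ hess_lag @ W
    d = np.linalg.solve(reduced_H, -(ghat + W.T @ (hess_lag @ s_n)))
    return s_n, W @ d, y
```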

24 22 M ULBRICH AND S ULBRICH Furthermore, the uniformly bounded invertibility of B T A and the fact that A T W = 0 yield the uniformly bounded invertibility of (B W) and thus ( ( ) ) 0 x l k = O( (B k W k ) T x l k ) = O = O( ĝ k ) Hence, using that ĝ(x) = W(x) T x l(x, y) for all y R m, we obtain ĝ(x k + s N,k ) + c(x k + s N,k ) = O( x l(x k + s N,k, y k + z N,k ) ) + c(x k + s N,k ) = O( x l k 2 + c k 2 ) = O( ĝ k 2 + c k 2 ) (412) Collecting the results obtained so far, we have PROPOSITION 41 Let (A7) hold and assume that x ˆ satisfies the second order sufficient condition (O2) Let B(x) R n m be continuously differentiable in a neighborhood U N of x such that A T B is uniformly bounded invertible on U N Then for x k U N sufficiently close to x, y k = Y(x k ), s n N,k, and st N,k as given in (48), (49), and (44) are well defined and satisfy (410), (411) Furthermore, (46), (47), and (412) hold The following assumption states our requirements on the step computation that we need to prove fast local convergence Assumption: (A8) x ˆ satisfies the second order sufficient condition (O2), and (x k ) converges to x Moreover, there exists a neighborhood U N of x such that for all x k U N holds: (i) The Lagrange multiplier estimates are computed by y k = Y(x k ) with Y given by (48), where B(x) R n m is continuously differentiable on U N and A T B is uniformly bounded invertible on U N (ii) The step sk n = sn N,k with sn N,k as in (49) is chosen whenever sn N,k k (iii) If the reduced Hessian Wk T 2 x l kw k is positive definite, then s t N,k is computed according to (44) and sk t = st N,k is chosen whenever st N,k k REMARK 42 A possible implementation of (iii) is obtained by applying Steihaug s conjugate gradient method to (24) in the reduced variables d, where s t = W k d (or in its projected form [18]) If the reduced Hessian is positive definite then the CG-path either leaves the trust-region (in this case holds s t N,k > k), or it stays in the trust-region and converges to dn,k t If the reduced Hessian is not positive definite, the Steihaug method either detects negative curvature or stops since the path leaves the trust-region As is well known, one can allow inexactness without destroying the rate of convergence Due to space limitations this issue is not discussed here 42 Quadratic local convergence The next result shows that with the rule (A8) for the step computation Algorithm A eventually takes Newton steps THEOREM 43 Let (A1) (A8) hold Then the trial steps according to (44), (49) are eventually taken by Algorithm A and thus (x k ) converges q-quadratically to x The proof of this result requires some work We start with the following auxiliary result LEMMA 44 Let (A1) (A8) hold and let τ satisfy { 2 3 < τ < min µ, 2 3µ, η } (413) 2 Then there is K > 0 such that the following is true: If for some iteration k K holds ĝ k a τ j k > ĝ k or c k τ > ĝ k, (414)

25 NONMONOTONE TRUST-REGION METHODS WITHOUT PENALTY FUNCTION 23 then for all k k Algorithm A takes Newton steps, ie sk,a n = sn N,k and st k,a = st N,k Proof We first note that by the assumptions on µ and η the condition (413) can be satisfied Since x k x by (A8) and c( x) = 0, ĝ( x) = 0, we find K > 0 with c k 1, ĝ k 1, x k x < min{δ N,θ} and x k U N for all k K, where θ is as in Lemma 38, (iii) In particular, (411) holds for all k K Hence, we can increase K such that s n N,k, st N,k min for all k K Since the steps s n N,k and st N,k satisfy the decrease conditions (23) and (25), respectively, part (iii) of Lemma 38 yields by the mechanism of updating k that for k K the Newton step s k = s N,k is accepted whenever δ k s n N,k, st N,k γ 1δ k, where (415) { def = κ 1 min max{min{a jk, ĝ k }, c k } 2/3, max{ ĝ k, c k } µ} (416) In fact, iteration level k is entered with k min and in each subiteration k is reduced by at most the factor γ 1 From c k 0 and (411) we obtain s n N,k K 1 c k γ 1 κ 1 c k µ γ 1 δ k for all k K after a possible increase of K Thus, by (A8) the quasi-normal step satisfies sk,a n = sn N,k for all k K Now we consider the step s t N,k for k K If aτ j k > ĝ k then, using ĝ k, c k 1, max{min{a jk, ĝ k }, c k } max{ ĝ k 1/τ, c k } max{ ĝ k, c k } 1/τ Similarly, if c k τ > ĝ k, we obtain max{min{a jk, ĝ k }, c k } c k max{ ĝ k 1/τ, c k } max{ ĝ k, c k } 1/τ In both situations we conclude that, since µ < 2 3τ, for k K, K sufficiently large, holds γ 1 δ k γ 1κ 1 max{ ĝ k, c k } 2 3τ > K13 ( ĝ k + c k ) s t N,k, where we have used 3τ 2 < 1 and (411) Therefore, we have proved: If K is sufficiently large and for k K holds (414), then s k,a = s N,k (417) In the case j k, k, the sequence a jk is bounded away from zero and we see that (414) holds for all k K if K is chosen sufficiently large Therefore, (417) completes the proof in this case Now consider the case j k as k Then for K so large that j K 1, Lemma 35 yields c k 1 λα0 a jk def = K 14 a jk for all k K (418) In the case a τ j k > ĝ k we get with (418), using a j 0, ĝ k + c k < a τ j + K k 14 a jk 2a τ j 2 k α0 τ a τ j k +1 for k K, K large enough In the case c k τ > ĝ k we have ĝ k + c k < c k τ + c k 2 c k τ 2K14 τ aτ j 2K 14 τ k α0 τ a τ j k +1


An Inexact Newton Method for Optimization New York University Brown Applied Mathematics Seminar, February 10, 2009 Brief biography New York State College of William and Mary (B.S.) Northwestern University (M.S. & Ph.D.) Courant Institute (Postdoc)

More information

Inexact Newton Methods and Nonlinear Constrained Optimization

Inexact Newton Methods and Nonlinear Constrained Optimization Inexact Newton Methods and Nonlinear Constrained Optimization Frank E. Curtis EPSRC Symposium Capstone Conference Warwick Mathematics Institute July 2, 2009 Outline PDE-Constrained Optimization Newton

More information

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical

More information

Global convergence of trust-region algorithms for constrained minimization without derivatives

Global convergence of trust-region algorithms for constrained minimization without derivatives Global convergence of trust-region algorithms for constrained minimization without derivatives P.D. Conejo E.W. Karas A.A. Ribeiro L.G. Pedroso M. Sachine September 27, 2012 Abstract In this work we propose

More information

Sequential Quadratic Programming Methods

Sequential Quadratic Programming Methods Sequential Quadratic Programming Methods Klaus Schittkowski Ya-xiang Yuan June 30, 2010 Abstract We present a brief review on one of the most powerful methods for solving smooth constrained nonlinear optimization

More information

Unconstrained optimization

Unconstrained optimization Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout

More information

Some new facts about sequential quadratic programming methods employing second derivatives

Some new facts about sequential quadratic programming methods employing second derivatives To appear in Optimization Methods and Software Vol. 00, No. 00, Month 20XX, 1 24 Some new facts about sequential quadratic programming methods employing second derivatives A.F. Izmailov a and M.V. Solodov

More information

A SHIFTED PRIMAL-DUAL INTERIOR METHOD FOR NONLINEAR OPTIMIZATION

A SHIFTED PRIMAL-DUAL INTERIOR METHOD FOR NONLINEAR OPTIMIZATION A SHIFTED RIMAL-DUAL INTERIOR METHOD FOR NONLINEAR OTIMIZATION hilip E. Gill Vyacheslav Kungurtsev Daniel. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-18-1 February 1, 2018

More information

A globally and quadratically convergent primal dual augmented Lagrangian algorithm for equality constrained optimization

A globally and quadratically convergent primal dual augmented Lagrangian algorithm for equality constrained optimization Optimization Methods and Software ISSN: 1055-6788 (Print) 1029-4937 (Online) Journal homepage: http://www.tandfonline.com/loi/goms20 A globally and quadratically convergent primal dual augmented Lagrangian

More information

A Primal-Dual Augmented Lagrangian Penalty-Interior-Point Filter Line Search Algorithm

A Primal-Dual Augmented Lagrangian Penalty-Interior-Point Filter Line Search Algorithm Journal name manuscript No. (will be inserted by the editor) A Primal-Dual Augmented Lagrangian Penalty-Interior-Point Filter Line Search Algorithm Rene Kuhlmann Christof Büsens Received: date / Accepted:

More information

SF2822 Applied Nonlinear Optimization. Preparatory question. Lecture 9: Sequential quadratic programming. Anders Forsgren

SF2822 Applied Nonlinear Optimization. Preparatory question. Lecture 9: Sequential quadratic programming. Anders Forsgren SF2822 Applied Nonlinear Optimization Lecture 9: Sequential quadratic programming Anders Forsgren SF2822 Applied Nonlinear Optimization, KTH / 24 Lecture 9, 207/208 Preparatory question. Try to solve theory

More information

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf.

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf. Maria Cameron 1. Trust Region Methods At every iteration the trust region methods generate a model m k (p), choose a trust region, and solve the constraint optimization problem of finding the minimum of

More information

CONSTRAINED NONLINEAR PROGRAMMING

CONSTRAINED NONLINEAR PROGRAMMING 149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach

More information

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL)

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL) Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization Nick Gould (RAL) x IR n f(x) subject to c(x) = Part C course on continuoue optimization CONSTRAINED MINIMIZATION x

More information

ON AUGMENTED LAGRANGIAN METHODS WITH GENERAL LOWER-LEVEL CONSTRAINTS. 1. Introduction. Many practical optimization problems have the form (1.

ON AUGMENTED LAGRANGIAN METHODS WITH GENERAL LOWER-LEVEL CONSTRAINTS. 1. Introduction. Many practical optimization problems have the form (1. ON AUGMENTED LAGRANGIAN METHODS WITH GENERAL LOWER-LEVEL CONSTRAINTS R. ANDREANI, E. G. BIRGIN, J. M. MARTíNEZ, AND M. L. SCHUVERDT Abstract. Augmented Lagrangian methods with general lower-level constraints

More information

Constrained Optimization

Constrained Optimization 1 / 22 Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 30, 2015 2 / 22 1. Equality constraints only 1.1 Reduced gradient 1.2 Lagrange

More information

1. Introduction. In this paper we discuss an algorithm for equality constrained optimization problems of the form. f(x) s.t.

1. Introduction. In this paper we discuss an algorithm for equality constrained optimization problems of the form. f(x) s.t. AN INEXACT SQP METHOD FOR EQUALITY CONSTRAINED OPTIMIZATION RICHARD H. BYRD, FRANK E. CURTIS, AND JORGE NOCEDAL Abstract. We present an algorithm for large-scale equality constrained optimization. The

More information

Higher-Order Methods

Higher-Order Methods Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth

More information

2.3 Linear Programming

2.3 Linear Programming 2.3 Linear Programming Linear Programming (LP) is the term used to define a wide range of optimization problems in which the objective function is linear in the unknown variables and the constraints are

More information

An Inexact Newton Method for Nonconvex Equality Constrained Optimization

An Inexact Newton Method for Nonconvex Equality Constrained Optimization Noname manuscript No. (will be inserted by the editor) An Inexact Newton Method for Nonconvex Equality Constrained Optimization Richard H. Byrd Frank E. Curtis Jorge Nocedal Received: / Accepted: Abstract

More information

Nonlinear Programming

Nonlinear Programming Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week

More information

PDE-Constrained and Nonsmooth Optimization

PDE-Constrained and Nonsmooth Optimization Frank E. Curtis October 1, 2009 Outline PDE-Constrained Optimization Introduction Newton s method Inexactness Results Summary and future work Nonsmooth Optimization Sequential quadratic programming (SQP)

More information

A GLOBALLY CONVERGENT STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE

A GLOBALLY CONVERGENT STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE A GLOBALLY CONVERGENT STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE Philip E. Gill Vyacheslav Kungurtsev Daniel P. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-14-1 June 30,

More information

A STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE

A STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE A STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE Philip E. Gill Vyacheslav Kungurtsev Daniel P. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-14-1 June 30, 2014 Abstract Regularized

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Feasible Interior Methods Using Slacks for Nonlinear Optimization

Feasible Interior Methods Using Slacks for Nonlinear Optimization Feasible Interior Methods Using Slacks for Nonlinear Optimization Richard H. Byrd Jorge Nocedal Richard A. Waltz February 28, 2005 Abstract A slack-based feasible interior point method is described which

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

POWER SYSTEMS in general are currently operating

POWER SYSTEMS in general are currently operating TO APPEAR IN IEEE TRANSACTIONS ON POWER SYSTEMS 1 Robust Optimal Power Flow Solution Using Trust Region and Interior-Point Methods Andréa A. Sousa, Geraldo L. Torres, Member IEEE, Claudio A. Cañizares,

More information

A PRIMAL-DUAL TRUST REGION ALGORITHM FOR NONLINEAR OPTIMIZATION

A PRIMAL-DUAL TRUST REGION ALGORITHM FOR NONLINEAR OPTIMIZATION Optimization Technical Report 02-09, October 2002, UW-Madison Computer Sciences Department. E. Michael Gertz 1 Philip E. Gill 2 A PRIMAL-DUAL TRUST REGION ALGORITHM FOR NONLINEAR OPTIMIZATION 7 October

More information

Numerical Methods for PDE-Constrained Optimization

Numerical Methods for PDE-Constrained Optimization Numerical Methods for PDE-Constrained Optimization Richard H. Byrd 1 Frank E. Curtis 2 Jorge Nocedal 2 1 University of Colorado at Boulder 2 Northwestern University Courant Institute of Mathematical Sciences,

More information

1. Introduction. We analyze a trust region version of Newton s method for the optimization problem

1. Introduction. We analyze a trust region version of Newton s method for the optimization problem SIAM J. OPTIM. Vol. 9, No. 4, pp. 1100 1127 c 1999 Society for Industrial and Applied Mathematics NEWTON S METHOD FOR LARGE BOUND-CONSTRAINED OPTIMIZATION PROBLEMS CHIH-JEN LIN AND JORGE J. MORÉ To John

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

More information

ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints

ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints Instructor: Prof. Kevin Ross Scribe: Nitish John October 18, 2011 1 The Basic Goal The main idea is to transform a given constrained

More information

Large-Scale Nonlinear Optimization with Inexact Step Computations

Large-Scale Nonlinear Optimization with Inexact Step Computations Large-Scale Nonlinear Optimization with Inexact Step Computations Andreas Wächter IBM T.J. Watson Research Center Yorktown Heights, New York andreasw@us.ibm.com IPAM Workshop on Numerical Methods for Continuous

More information

1 Computing with constraints

1 Computing with constraints Notes for 2017-04-26 1 Computing with constraints Recall that our basic problem is minimize φ(x) s.t. x Ω where the feasible set Ω is defined by equality and inequality conditions Ω = {x R n : c i (x)

More information

This manuscript is for review purposes only.

This manuscript is for review purposes only. 1 2 3 4 5 6 7 8 9 10 11 12 THE USE OF QUADRATIC REGULARIZATION WITH A CUBIC DESCENT CONDITION FOR UNCONSTRAINED OPTIMIZATION E. G. BIRGIN AND J. M. MARTíNEZ Abstract. Cubic-regularization and trust-region

More information

A GLOBALLY CONVERGENT STABILIZED SQP METHOD

A GLOBALLY CONVERGENT STABILIZED SQP METHOD A GLOBALLY CONVERGENT STABILIZED SQP METHOD Philip E. Gill Daniel P. Robinson July 6, 2013 Abstract Sequential quadratic programming SQP methods are a popular class of methods for nonlinearly constrained

More information

Optimization and Root Finding. Kurt Hornik

Optimization and Root Finding. Kurt Hornik Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding

More information

Preprint ANL/MCS-P , Dec 2002 (Revised Nov 2003, Mar 2004) Mathematics and Computer Science Division Argonne National Laboratory

Preprint ANL/MCS-P , Dec 2002 (Revised Nov 2003, Mar 2004) Mathematics and Computer Science Division Argonne National Laboratory Preprint ANL/MCS-P1015-1202, Dec 2002 (Revised Nov 2003, Mar 2004) Mathematics and Computer Science Division Argonne National Laboratory A GLOBALLY CONVERGENT LINEARLY CONSTRAINED LAGRANGIAN METHOD FOR

More information

minimize x subject to (x 2)(x 4) u,

minimize x subject to (x 2)(x 4) u, Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for

More information

On the use of piecewise linear models in nonlinear programming

On the use of piecewise linear models in nonlinear programming Math. Program., Ser. A (2013) 137:289 324 DOI 10.1007/s10107-011-0492-9 FULL LENGTH PAPER On the use of piecewise linear models in nonlinear programming Richard H. Byrd Jorge Nocedal Richard A. Waltz Yuchen

More information

Interior-Point Methods for Linear Optimization

Interior-Point Methods for Linear Optimization Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function

More information

IBM Research Report. Line Search Filter Methods for Nonlinear Programming: Motivation and Global Convergence

IBM Research Report. Line Search Filter Methods for Nonlinear Programming: Motivation and Global Convergence RC23036 (W0304-181) April 21, 2003 Computer Science IBM Research Report Line Search Filter Methods for Nonlinear Programming: Motivation and Global Convergence Andreas Wächter, Lorenz T. Biegler IBM Research

More information

A Trust-region-based Sequential Quadratic Programming Algorithm

A Trust-region-based Sequential Quadratic Programming Algorithm Downloaded from orbit.dtu.dk on: Oct 19, 2018 A Trust-region-based Sequential Quadratic Programming Algorithm Henriksen, Lars Christian; Poulsen, Niels Kjølstad Publication date: 2010 Document Version

More information

Implementation of an Interior Point Multidimensional Filter Line Search Method for Constrained Optimization

Implementation of an Interior Point Multidimensional Filter Line Search Method for Constrained Optimization Proceedings of the 5th WSEAS Int. Conf. on System Science and Simulation in Engineering, Tenerife, Canary Islands, Spain, December 16-18, 2006 391 Implementation of an Interior Point Multidimensional Filter

More information

Numerical Optimization of Partial Differential Equations

Numerical Optimization of Partial Differential Equations Numerical Optimization of Partial Differential Equations Part I: basic optimization concepts in R n Bartosz Protas Department of Mathematics & Statistics McMaster University, Hamilton, Ontario, Canada

More information

A new ane scaling interior point algorithm for nonlinear optimization subject to linear equality and inequality constraints

A new ane scaling interior point algorithm for nonlinear optimization subject to linear equality and inequality constraints Journal of Computational and Applied Mathematics 161 (003) 1 5 www.elsevier.com/locate/cam A new ane scaling interior point algorithm for nonlinear optimization subject to linear equality and inequality

More information

Lecture 13: Constrained optimization

Lecture 13: Constrained optimization 2010-12-03 Basic ideas A nonlinearly constrained problem must somehow be converted relaxed into a problem which we can solve (a linear/quadratic or unconstrained problem) We solve a sequence of such problems

More information

On nonlinear optimization since M.J.D. Powell

On nonlinear optimization since M.J.D. Powell On nonlinear optimization since 1959 1 M.J.D. Powell Abstract: This view of the development of algorithms for nonlinear optimization is based on the research that has been of particular interest to the

More information

Survey of NLP Algorithms. L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA

Survey of NLP Algorithms. L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA Survey of NLP Algorithms L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA NLP Algorithms - Outline Problem and Goals KKT Conditions and Variable Classification Handling

More information

Cubic regularization of Newton s method for convex problems with constraints

Cubic regularization of Newton s method for convex problems with constraints CORE DISCUSSION PAPER 006/39 Cubic regularization of Newton s method for convex problems with constraints Yu. Nesterov March 31, 006 Abstract In this paper we derive efficiency estimates of the regularized

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

Numerical optimization

Numerical optimization Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal

More information

Generalization to inequality constrained problem. Maximize

Generalization to inequality constrained problem. Maximize Lecture 11. 26 September 2006 Review of Lecture #10: Second order optimality conditions necessary condition, sufficient condition. If the necessary condition is violated the point cannot be a local minimum

More information

Evaluation complexity for nonlinear constrained optimization using unscaled KKT conditions and high-order models by E. G. Birgin, J. L. Gardenghi, J. M. Martínez, S. A. Santos and Ph. L. Toint Report NAXYS-08-2015

More information

Steering Exact Penalty Methods for Nonlinear Programming

Steering Exact Penalty Methods for Nonlinear Programming Steering Exact Penalty Methods for Nonlinear Programming Richard H. Byrd Jorge Nocedal Richard A. Waltz April 10, 2007 Technical Report Optimization Technology Center Northwestern University Evanston,

More information

On Lagrange multipliers of trust-region subproblems

On Lagrange multipliers of trust-region subproblems On Lagrange multipliers of trust-region subproblems Ladislav Lukšan, Ctirad Matonoha, Jan Vlček Institute of Computer Science AS CR, Prague Programy a algoritmy numerické matematiky 14 1.- 6. června 2008

More information

A globally convergent Levenberg Marquardt method for equality-constrained optimization

A globally convergent Levenberg Marquardt method for equality-constrained optimization Computational Optimization and Applications manuscript No. (will be inserted by the editor) A globally convergent Levenberg Marquardt method for equality-constrained optimization A. F. Izmailov M. V. Solodov

More information

Key words. minimization, nonlinear optimization, large-scale optimization, constrained optimization, trust region methods, quasi-newton methods

Key words. minimization, nonlinear optimization, large-scale optimization, constrained optimization, trust region methods, quasi-newton methods SIAM J. OPTIM. c 1998 Society for Industrial and Applied Mathematics Vol. 8, No. 3, pp. 682 706, August 1998 004 ON THE IMPLEMENTATION OF AN ALGORITHM FOR LARGE-SCALE EQUALITY CONSTRAINED OPTIMIZATION

More information

Chapter 2. Optimization. Gradients, convexity, and ALS

Chapter 2. Optimization. Gradients, convexity, and ALS Chapter 2 Optimization Gradients, convexity, and ALS Contents Background Gradient descent Stochastic gradient descent Newton s method Alternating least squares KKT conditions 2 Motivation We can solve

More information

Computational Optimization. Augmented Lagrangian NW 17.3

Computational Optimization. Augmented Lagrangian NW 17.3 Computational Optimization Augmented Lagrangian NW 17.3 Upcoming Schedule No class April 18 Friday, April 25, in class presentations. Projects due unless you present April 25 (free extension until Monday

More information

A SEQUENTIAL QUADRATIC PROGRAMMING ALGORITHM THAT COMBINES MERIT FUNCTION AND FILTER IDEAS

A SEQUENTIAL QUADRATIC PROGRAMMING ALGORITHM THAT COMBINES MERIT FUNCTION AND FILTER IDEAS A SEQUENTIAL QUADRATIC PROGRAMMING ALGORITHM THAT COMBINES MERIT FUNCTION AND FILTER IDEAS FRANCISCO A. M. GOMES Abstract. A sequential quadratic programming algorithm for solving nonlinear programming

More information

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems 1 Numerical optimization Alexander & Michael Bronstein, 2006-2009 Michael Bronstein, 2010 tosca.cs.technion.ac.il/book Numerical optimization 048921 Advanced topics in vision Processing and Analysis of

More information

Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL)

Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL) Part 4: Active-set methods for linearly constrained optimization Nick Gould RAL fx subject to Ax b Part C course on continuoue optimization LINEARLY CONSTRAINED MINIMIZATION fx subject to Ax { } b where

More information

A Trust-Funnel Algorithm for Nonlinear Programming

A Trust-Funnel Algorithm for Nonlinear Programming for Nonlinear Programming Daniel P. Robinson Johns Hopins University Department of Applied Mathematics and Statistics Collaborators: Fran E. Curtis (Lehigh University) Nic I. M. Gould (Rutherford Appleton

More information

Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms: A Comparative Study

Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms: A Comparative Study International Journal of Mathematics And Its Applications Vol.2 No.4 (2014), pp.47-56. ISSN: 2347-1557(online) Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms:

More information

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Roger Behling a, Clovis Gonzaga b and Gabriel Haeser c March 21, 2013 a Department

More information

4TE3/6TE3. Algorithms for. Continuous Optimization

4TE3/6TE3. Algorithms for. Continuous Optimization 4TE3/6TE3 Algorithms for Continuous Optimization (Algorithms for Constrained Nonlinear Optimization Problems) Tamás TERLAKY Computing and Software McMaster University Hamilton, November 2005 terlaky@mcmaster.ca

More information

5.6 Penalty method and augmented Lagrangian method

5.6 Penalty method and augmented Lagrangian method 5.6 Penalty method and augmented Lagrangian method Consider a generic NLP problem min f (x) s.t. c i (x) 0 i I c i (x) = 0 i E (1) x R n where f and the c i s are of class C 1 or C 2, and I and E are the

More information

MODIFYING SQP FOR DEGENERATE PROBLEMS

MODIFYING SQP FOR DEGENERATE PROBLEMS PREPRINT ANL/MCS-P699-1097, OCTOBER, 1997, (REVISED JUNE, 2000; MARCH, 2002), MATHEMATICS AND COMPUTER SCIENCE DIVISION, ARGONNE NATIONAL LABORATORY MODIFYING SQP FOR DEGENERATE PROBLEMS STEPHEN J. WRIGHT

More information

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications Weijun Zhou 28 October 20 Abstract A hybrid HS and PRP type conjugate gradient method for smooth

More information

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen

More information

LAGRANGIAN TRANSFORMATION IN CONVEX OPTIMIZATION

LAGRANGIAN TRANSFORMATION IN CONVEX OPTIMIZATION LAGRANGIAN TRANSFORMATION IN CONVEX OPTIMIZATION ROMAN A. POLYAK Abstract. We introduce the Lagrangian Transformation(LT) and develop a general LT method for convex optimization problems. A class Ψ of

More information

Recent Adaptive Methods for Nonlinear Optimization

Recent Adaptive Methods for Nonlinear Optimization Recent Adaptive Methods for Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with James V. Burke (U. of Washington), Richard H. Byrd (U. of Colorado), Nicholas I. M. Gould

More information

Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method

Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Robert M. Freund March, 2004 2004 Massachusetts Institute of Technology. The Problem The logarithmic barrier approach

More information