QUADRATICALLY AND SUPERLINEARLY CONVERGENT ALGORITHMS FOR THE SOLUTION OF INEQUALITY CONSTRAINED MINIMIZATION PROBLEMS 1


F. FACCHINEI 2 AND S. LUCIDI 3

Communicated by L.C.W. Dixon

1 This research was supported by the National Research Program "Metodi di ottimizzazione per le decisioni", MURST, Roma, Italy.
2 Associate Professor, Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Via Buonarroti 12, Roma, Italy.
3 Associate Professor, Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Via Buonarroti 12, Roma, Italy.

Abstract. In this paper some Newton and quasi-Newton algorithms for the solution of inequality constrained minimization problems are considered. All the algorithms described produce sequences $\{x^k\}$ converging q-superlinearly to the solution. Furthermore, under mild assumptions, a q-quadratic convergence rate in $x$ is also attained. Other features of these algorithms are that only the solution of linear systems of equations is required at each iteration and that the strict complementarity assumption is never invoked. First, the superlinear or quadratic convergence rate of a Newton-like algorithm is proved. Then, a simpler version of this algorithm is studied, and it is shown to be superlinearly convergent. Finally, quasi-Newton versions of the previous algorithms are considered and, provided the sequence defined by the algorithms converges, a characterization of superlinear convergence extending the result of Boggs, Tolle, and Wang is given.

Key Words. Inequality constrained optimization, Newton algorithms, quasi-Newton algorithms, superlinear convergence, quadratic convergence, multiplier functions, strict complementarity.

1 Introduction

We are interested in the solution of the inequality constrained optimization problem

$$\mathrm{(P)} \qquad \min f(x) \quad \text{s.t.} \quad g(x) \le 0,$$

where $f : \mathbb{R}^n \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}^m$ are assumed to be twice continuously differentiable with Lipschitz continuous second order derivatives. In the sequel, $(\bar x, \bar\lambda) \in \mathbb{R}^n \times \mathbb{R}^m$ will always indicate a KKT pair for Problem (P), i.e., a pair which satisfies the following conditions:

$$\nabla L(\bar x, \bar\lambda) = 0, \qquad g(\bar x) \le 0, \qquad \bar\lambda \ge 0, \qquad \bar\lambda' g(\bar x) = 0,$$

where $L(x, \lambda) : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ is the Lagrangian function

$$L(x, \lambda) = f(x) + \sum_{i=1}^m \lambda_i g_i(x).$$

We shall denote by $I_0$ the index set of active constraints at $\bar x$,

$$I_0 := \{ i : g_i(\bar x) = 0 \},$$

and by $I_+$ the index set of strongly active constraints, i.e., the index set of active constraints with positive multiplier,

$$I_+ := \{ i \in I_0 : \bar\lambda_i > 0 \}.$$

If $I_0 = I_+$, strict complementarity is said to hold at $\bar x$. We shall always assume that the following two assumptions hold.

Assumption A1 [Linear Independence] The gradients of the active constraints are linearly independent at $\bar x$.

Assumption A2 [Strong Second Order Sufficient Condition] It holds that

$$w' \nabla^2 L(\bar x, \bar\lambda) w > 0, \qquad \forall w \ne 0 : \nabla g_{I_+}(\bar x)' w = 0.$$

This second assumption is slightly stronger than the usual KKT second order sufficient condition in that it requires the Hessian of the Lagrangian to be positive definite on a larger region; note, however, that the two conditions coincide if strict complementarity holds. Assumption A2 has already been used by some authors (e.g., Refs. 1-3) to establish superlinear convergence of local algorithms without assuming strict complementarity.

Many local algorithms have been proposed for approximating a solution $\bar x$ of Problem (P). In this paper we are concerned only with algorithms which are based on variations of Newton's method and on their quasi-Newton counterparts. To motivate and put in perspective our contribution, we give a brief historical account of the development of the convergence theory of these algorithms, with an emphasis on those contributions which consider specifically inequality constraints. In fact, it is important to note that many results in the literature only deal with equality constrained problems. Regarding inequality constrained problems, if one assumes that strict complementarity holds at the solution, it can often be safely assumed that, locally, the correct active set is identified and the inequality constrained problem reduces to an equality constrained one. However, if the strict complementarity assumption does not hold, this is not true anymore, and inequality constrained problems are considerably more difficult than equality constrained ones.

The development of a local solution algorithm for Problem (P) that could be thought of as an extension of the classical Newton method for unconstrained minimization has been the object of extensive study since the end of the sixties, and it is still an active research area. The seminal work is that of Wilson (Ref. 4), where a local algorithm for convex, constrained problems is proposed which has the form $x^{k+1} = x^k + d^k$, where $d^k$ solves the quadratic program

$$\mathrm{(QP}_k\mathrm{)} \qquad \min \tfrac{1}{2}\, d' \nabla^2 L(x^k, \lambda^k) d + \nabla f(x^k)' d \quad \text{s.t.} \quad g(x^k) + \nabla g(x^k)' d \le 0,$$

with $\nabla^2 L(x^k, \lambda^k) = \nabla^2 f(x^k) + \sum_{i=1}^m \lambda_i^k \nabla^2 g_i(x^k)$, and where $\lambda_i^k$ is the Lagrange multiplier of the $i$-th linear constraint of Problem $\mathrm{QP}_{k-1}$. The local behavior of this algorithm was investigated in more detail in Ref. 5, where it is established that, if $(\bar x, \bar\lambda)$ is a KKT pair for Problem (P) at which (i) the KKT second order sufficient conditions for optimality hold, (ii) the strict complementarity condition holds, and (iii) the gradients of the active constraints are linearly independent at $\bar x$, then the algorithm of Wilson is locally r-quadratically convergent in $(x, \lambda)$. Some years later, in the framework of generalized equations, Robinson, on the basis of some results of Josephy (see Ref. 1), established that the same result can be obtained without requiring strict complementarity but assuming that Assumptions A1 and A2 hold. Han (Ref. 6), under assumptions (i), (ii), and (iii), was able to prove a stronger result: a q-superlinear convergence rate in $(x, \lambda)$ (see also Ref. 7). Note that, as a rule, we are mainly interested in the behavior of the algorithm with respect to the variable $x$ alone; however, in general, a q-rate in $(x, \lambda)$ implies no more than the corresponding r-rate in $x$, which, from the numerical point of view, is a much less interesting result.

Very recently, Bonnans (Ref. 8), studying the local behavior of Newton algorithms for variational inequalities, established a q-superlinear convergence rate in $x$ alone for a Newton method of the Wilson type (see also Ref. 9). His assumptions are weaker than (i), (ii), and (iii); more precisely, he only requires (i) and the uniqueness of the Lagrange multipliers.

Note that, in general, even close to a KKT pair satisfying conditions (i), (ii), and (iii), $\nabla^2 L(x^k, \lambda^k)$ need not be positive definite. For this reason, in all the works mentioned so far, it is necessary to specify which of the possibly multiple KKT pairs of $\mathrm{(QP}_k\mathrm{)}$ has to be selected, and this may result in a computationally very expensive (if at all implementable) algorithm. For example, in Refs. 5-6 $(x^{k+1}, \lambda^{k+1})$ is the closest (in the Euclidean norm) KKT pair of $\mathrm{(QP}_{k+1}\mathrm{)}$ to $(x^k, \lambda^k)$; in Ref. 7 $(x^{k+1}, \lambda^{k+1})$ is the closest KKT pair of $\mathrm{(QP}_{k+1}\mathrm{)}$ to $(\bar x, \bar\lambda)$; while in Ref. 8 $(x^{k+1}, \lambda^{k+1})$ is required to be "sufficiently close" to $(x^k, \lambda^k)$. Also note that establishing a q-superlinear convergence rate in $x$ for inequality constrained problems without assuming strict complementarity is not at all easy. This can be seen, for example, from the large time gap between a similar result for equality constrained problems (Refs. 10-11) and the work of Bonnans (Ref. 8).

Another line of research, also stemming from the work of Wilson, has dealt with the study of quasi-Newton algorithms, i.e., with the possibility of using in Problem $\mathrm{(QP}_k\mathrm{)}$ a matrix $B^k$ which tries to approximate $\nabla^2 L(x^k, \lambda^k)$ by using first order derivatives only. In fact, this was immediately recognized as an important topic, and many works have been devoted to the study of this problem, even if the more interesting results refer to equality constrained problems. Early papers include Refs. 6, 11-13, and 10, where sufficient conditions for r-superlinear and q-superlinear convergence rates are established for the pair $(x^k, \lambda^k)$. Passing to rate results in $x$ alone, Powell proved in Ref. 14 an r-superlinear convergence rate for a BFGS update. But what is probably the most important result in this field was obtained by Boggs, Tolle, and Wang (Ref. 15): assuming that the convergence of $\{x^k\}$ to $\bar x$ is q-linear, they gave a complete characterization of when the convergence will be q-superlinear. This characterization result is an extension to equality constrained optimization of the Dennis-Moré (Ref. 16) characterization for unconstrained problems. Subsequently, Fontecilla, Steihaug, and Tapia (Ref. 17) derived the Boggs-Tolle-Wang characterization without the q-linear convergence assumption, simply requiring convergence of $\{x^k\}$ to $\bar x$ (see also Nocedal and Overton (Ref. 18) and Stoer and Tapia (Ref. 19)). It is important to remark that all the characterization results described consider equality constrained problems; the only exception is recent and due to Bonnans (Ref. 8) (see also Ref. 9), who gives a characterization of superlinear convergence for quasi-Newton methods for inequality constrained problems. This characterization only requires that the sequence $\{x^k\}$ generated by the algorithm converges to $\bar x$ and that the KKT second order sufficient conditions for optimality hold at $\bar x$, along with the linear independence of the gradients of the active constraints.

The results so far described are quite interesting and form the basis for what is nowadays probably the most popular class of algorithms for the solution of Problem (P): recursive quadratic programming. Nevertheless, they all share a common drawback: the necessity of solving, at each iteration, an inequality constrained quadratic programming problem. The question which naturally arises, then, is: is it possible to develop a different extension of the Newton method from the unconstrained to the (inequality) constrained case which only requires the solution of linear systems? The answer is positive, and these methods are part of the folklore of constrained optimization. Nevertheless, sound theoretical results in this direction have been established in few cases, and often the local aspects are strongly linked to global results. The most remarkable proposals are those of Biggs (Refs. 20-21), Bertsekas (Ref. 7), Di Pillo and Grippo (Ref. 22), and Kleinmichel, Richter, and Schönefeld (Ref. 23).

The works of Biggs gave ground for the development of efficient numerical codes. In these works both Newton and quasi-Newton methods are considered, and q-superlinear convergence rates in $x$ are reported; note, however, that these results depend on assumptions that are quite strong. Bertsekas considers in Ref. 7, under assumptions (i), (ii), and (iii), two different Newton schemes. In the first one, a q-quadratic convergence rate is established in $(x, \lambda)$. In the second scheme, instead, a q-superlinear convergence rate in $x$ is proved; but while $x^{k+1}$ is basically determined by solving a linear system, at each iteration a constrained quadratic subproblem (with possibly "few" constraints) has to be solved in order to get an approximation of the Lagrange multipliers. Di Pillo and Grippo (Ref. 22) (see also Ref. 24) improve on the second approach of Bertsekas. Under assumptions (i), (ii), and (iii), they prove the q-superlinear convergence rate in $x$ of an algorithm which only requires the solution of two linear systems per iteration. In particular, one of the two linear systems is used to obtain an approximation of the Lagrange multipliers, thus avoiding the need to solve a quadratic subproblem. The last paper we mentioned is the survey Ref. 23. In this work some local algorithms which only require the solution of linear systems are described. The common idea is that of transforming the KKT conditions of Problem (P) into a system of equations. Following this approach, it is possible to establish both q-superlinear and q-quadratic convergence in $(x, \lambda)$; see also Schönefeld (Ref. 2), Kanzow (Ref. 25), and Kanzow and Kleinmichel (Ref. 26). These works are interesting, but we remark that either the strict complementarity assumption is needed, or the algorithms considered are not "true" local algorithms, in the sense that they cannot be applied, even if the starting point $(x^0, \lambda^0)$ is very close to $(\bar x, \bar\lambda)$, unless they are embedded in specific, globally convergent algorithms.

We complete this review by recalling a recent paper of Pang (Ref. 27), where a hybrid approach is adopted: at each iteration a quadratic subproblem is solved, but not all the constraints are linearized and, furthermore, some of the linear constraints in the subproblem are treated as equalities. He can prove a q-quadratic convergence rate in $(x, \lambda)$ without assuming strict complementarity, supposing that some regularity conditions, implied by Assumptions A1 and A2, hold.

In this paper we describe some local Newton-type algorithms for the solution of Problem (P). These algorithms, by solving linear systems only, generate two sequences, $\{x^k\}$ and $\{\lambda^k\}$, converging to $\bar x$ and $\bar\lambda$ respectively. In particular, we show that:

- if one is willing to solve, at each iteration, two linear systems, then q-quadratic convergence of the sequence $\{x^k\}$ to $\bar x$ can be proved;

- if one wants to reduce the computational burden per iteration and solve only one linear system at each iteration, then q-superlinear convergence of the sequence $\{x^k\}$ to $\bar x$ can be proved, along with the q-quadratic convergence of the sequence $\{(x^k, \lambda^k)\}$ to $(\bar x, \bar\lambda)$.

In both cases we also consider quasi-Newton versions of the algorithms proposed and give a characterization of the q-superlinear convergence of the sequence $\{x^k\}$ extending the result of Boggs, Tolle, and Wang (Ref. 15) and paralleling that of Bonnans (Ref. 8). The algorithms proposed in this paper are related to those proposed in Refs. 7, 22, 24; however, our proof techniques are completely different and allow us to obtain stronger results under weaker assumptions. Actually, our approach is strongly based on the methods of analysis developed, in the equality constrained case, in Ref. 28. We remark on the following points:

- this is the first paper, to our knowledge, where a q-quadratic convergence rate in $x$ is proved;

- at each iteration only the solution of linear systems is required;

- strict complementarity at the solution is not assumed.

Furthermore, we note that all the local algorithms described in this paper can be globalized in a very natural way without destroying their good properties. This is the subject of the companion paper (Ref. 29).

The algorithms we consider produce a sequence $\{x^k\}$ defined by the iteration

$$x^{k+1} = x^k + d^k, \qquad (1)$$

where $d^k$ is obtained by solving the following system:

8 " r2 L(x k ; k ) rg A k(x k ) rg A k(x k ) 0 0 # " # " d k rf(xk ) =? z k g A k(x k ) # : (2) A k is a set of indices contained in f1; : : : ; mg which is supposed to approximate the set of active constraints I 0, while k is an estimate of the multiplier. These algorithms can be seen as a blend of RQP and active set algorithms. In fact, in RQP algorithms we calculate d k by solving the inequality constrained quadratic subproblem (QP k ). However, if we knew the active set I 0, locally, we could obtain d k by solving the following equality constrained quadratic subproblem: min 1 2 dk r 2 L(x k ; k )d k + rf(x k ) 0 d k g I0 (x k ) + rg I0 (x k ) 0 d k = 0; whose KKT conditions are: " r2 L(x k ; k ) rg I0 (x k ) rg I0 (x k ) 0 0 # " # " d k rf(xk ) =? z k g I0 (x k ) Since we do not know the active set I 0 we resort to the estimate A k, thus obtaining system (2). In the next section, we propose a simple, and yet powerful technique to dene the estimate A k. This identication technique is based on a particular multiplier approximation k, given by a multiplier function (x k ), and which allows us to identify, in a neighborhood of x, the set I +. Then, in Section 3 we study the behavior of algorithm (1), (2), when k is obtained by solving an auxiliary m m linear system; while, in the subsequent section, we consider the case in which k, in system (2), is taken to be equal to z k?1. In the last section, instead, we analyze quasi-newton versions of the algorithms described in the previous two sections. Regarding the notation, in the sequel, we shall only consider q-rates of convergence, hence we shall write, for example, superlinearly convergent instead of q-superlinearly convergent. We shall indicate by k k the euclidean norm or the induced matrix norm according to the argument and by I m the identity matrix of order m. We dene a spherical neighborhood of the point x as the set fx 2 IR n : kx? xk rg, where r > 0 is the radius of the neighborhood. A superscript k is used to denote iteration numbers, while, if v is a vector v i denotes its i-th component. Finally, if v is an n-vector and J is an index set such that J f1; : : : ; ng we denote by v J the subvector with components v j, j 2 J. # : 6

2 Identifying the active constraints

In this section we introduce a technique for identifying the set of active constraints at $\bar x$ which is of great importance in our approach. To begin with, we need to introduce the concept of multiplier function. This is done in the next definition.

Definition 2.1 Let $\Omega \subseteq \mathbb{R}^n$ be a neighborhood of $\bar x$. A function $\lambda : \Omega \to \mathbb{R}^m$ is a multiplier function for Problem (P) at $\bar x$ if

(a) $\lambda(\cdot)$ is continuous at $\bar x$;

(b) $\lambda(\bar x) = \bar\lambda$.

Based on a multiplier function, we can introduce the following "guess" of the set $I_0$:

$$A(x) := \{ i : g_i(x) \ge -\delta \lambda_i(x) \}, \qquad (3)$$

where $\delta$ is a fixed positive parameter. If the strict complementarity assumption holds at $\bar x$, it is easy to see that, in a neighborhood of $\bar x$, $A(x) = I_0$. When strict complementarity does not hold, the following refinement of this result can be proved.

Theorem 2.1 Let $\bar x$ be a stationary point of Problem (P). Then there exists a neighborhood of $\bar x$ such that, for each $x$ in this neighborhood,

$$I_+ \subseteq A(x) \subseteq I_0. \qquad (4)$$

Proof. If $i$ belongs to $I_+$ then, by definition of $I_+$, $g_i(\bar x) = 0$ and $\lambda_i(\bar x) > 0$. Then, since both $g_i$ and $\lambda_i$ are continuous at $\bar x$, we have the first inclusion. To conclude the proof, we show that if $i$ does not belong to $I_0$ then it cannot belong to $A(x)$. In fact, if $i \notin I_0$, we have, by the complementarity conditions, $g_i(\bar x) < 0$ and $\lambda_i(\bar x) = 0$, so that, by continuity at $\bar x$ of $g_i$ and $\lambda_i$, the second inclusion follows. □

Theorem 2.1 says that, in a neighborhood of $\bar x$, the identification technique (3) correctly identifies the strongly active constraints, while those constraints that are active but with a null multiplier can be classified either as active or as non active. In the next sections we shall show that, notwithstanding the impossibility of identifying all the active constraints, it is still possible to define superlinearly and quadratically convergent algorithms which only require the solution of linear systems at each iteration.

Remark 2.1 It is easy to see that Theorem 2.1 still holds if we define $A(x)$ to be

$$A(x) := \{ i : g_i(x) \ge -\rho(x) \lambda_i(x) \},$$

where $\rho(x)$ is a positive function defined in a neighborhood of $\bar x$, bounded from above and away from 0.
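To make the test (3) concrete, the following sketch computes the estimate $A(x)$ from the constraint values and the multiplier function values. This is our own illustration, not code from the paper; the function names and the NumPy dependency are assumptions, and $\delta$ is the fixed positive parameter of (3).

```python
import numpy as np

def active_set_estimate(g_vals, lam_vals, delta=1.0):
    """Active set estimate (3): A(x) = { i : g_i(x) >= -delta * lambda_i(x) }.

    g_vals   -- constraint values g(x); feasibility means g(x) <= 0
    lam_vals -- values lambda(x) of a multiplier function
    delta    -- fixed positive parameter of (3)
    """
    g_vals = np.asarray(g_vals, dtype=float)
    lam_vals = np.asarray(lam_vals, dtype=float)
    return np.flatnonzero(g_vals >= -delta * lam_vals)
```

By Theorem 2.1, for $x$ close enough to $\bar x$ the returned index set is squeezed between $I_+$ and $I_0$, whatever the value of $\delta > 0$.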

Many multiplier functions have been proposed for equality constrained problems; see, e.g., Refs. 10, 30. However, the definition of a multiplier function for inequality constrained problems is a much more difficult task. One possibility, implicitly used in recursive quadratic programming methods, is to use the multipliers of the linear constraints of a quadratic subproblem approximating Problem (P). This choice is obviously very costly; furthermore, it is difficult, if at all possible, to prove that these multiplier functions enjoy some further properties beyond continuity that could be useful (see the next section). Another possibility is that proposed by Glad and Polak in Ref. 31, where an analytic expression for a continuously differentiable multiplier function for inequality constrained problems is given; each evaluation of this multiplier function requires the solution of an $m \times m$ linear system.

Here we recall the class of multiplier functions proposed in Ref. 32; this class includes the Glad and Polak multiplier function and has similar properties. Let $\gamma_1$ and $\gamma_2$ be a positive and a nonnegative constant, respectively. We define a multiplier function by

$$\lambda(x) = -N^{-1}(x) \nabla g(x)' \nabla f(x), \qquad (5)$$

where $N(x)$ is the $m \times m$ matrix defined by:

$$N(x) = \nabla g(x)' \nabla g(x) + \gamma_1 \,\mathrm{diag}\!\left[ g_i^2(x) \right] + \gamma_2 \sum_{i=1}^m \max\left[0, g_i(x)\right]^3 I_m.$$

The following theorem holds.

Theorem 2.2 (see Ref. 32) There exists a neighborhood $\Omega$ of $\bar x$ such that

(a) $\lambda(x)$ is well defined in $\Omega$ and $\lambda(\bar x) = \bar\lambda$;

(b) $\lambda(x)$ is continuously differentiable in $\Omega$.

Note that if $\gamma_2 = 0$ we obtain the Glad and Polak multiplier function; this function is defined at any point where the gradients of the active constraints are linearly independent. On the other hand, if $\gamma_2 > 0$ the corresponding multiplier function is defined at any point where either at least one constraint is violated or the gradients of the active constraints are linearly independent. This can be easily seen by noting that if the point is feasible the multiplier function (5) is equal to the multiplier function of Glad and Polak, while if the point is not feasible the matrix $N(x)$ entering the definition of (5) is positive definite because its third term is positive definite (see Ref. 32 for a more detailed discussion). Locally this brings no advantages over the case $\gamma_2 = 0$, but if one wants to define globalizing techniques this may be an important additional feature of the multiplier function, which allows us to define globally convergent algorithms under weaker assumptions; see Ref. 29.
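As an illustration of the cost involved, evaluating (5) amounts to assembling the $m \times m$ matrix $N(x)$ and solving one linear system. The sketch below is a hypothetical NumPy transcription of (5); the names are ours, and $\gamma_1 > 0$, $\gamma_2 \ge 0$ are left as parameters.

```python
import numpy as np

def multiplier_function(grad_f, jac_g, g_vals, gamma1=1.0, gamma2=1.0):
    """Multiplier function (5): lambda(x) = -N(x)^{-1} grad g(x)' grad f(x).

    grad_f -- gradient of f at x, shape (n,)
    jac_g  -- matrix grad g(x) whose i-th column is grad g_i(x), shape (n, m)
    g_vals -- constraint values g(x), shape (m,)
    """
    m = g_vals.size
    # N(x) = grad g' grad g + gamma1 diag(g_i^2) + gamma2 (sum_i max[0, g_i]^3) I_m
    N = (jac_g.T @ jac_g
         + gamma1 * np.diag(g_vals ** 2)
         + gamma2 * np.sum(np.maximum(0.0, g_vals) ** 3) * np.eye(m))
    return -np.linalg.solve(N, jac_g.T @ grad_f)
```

With $\gamma_2 = 0$ this reduces to the Glad and Polak multiplier function, in line with the discussion above.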

3 Newton-like methods

In this section we study the local behavior of the sequence $\{x^k\}$ produced by the iteration

$$x^{k+1} = x^k + d^k, \qquad (6)$$

where $d^k$ is obtained by solving the following system:

$$\begin{bmatrix} \nabla^2 L(x^k, \lambda(x^k)) & \nabla g_{A^k}(x^k) \\ \nabla g_{A^k}(x^k)' & 0 \end{bmatrix} \begin{bmatrix} d^k \\ z^k \end{bmatrix} = - \begin{bmatrix} \nabla f(x^k) \\ g_{A^k}(x^k) \end{bmatrix}; \qquad (7)$$

here $\lambda(x^k)$ is a multiplier function according to the definition given in the previous section, and $A^k$ is a short notation for $A(x^k)$ defined by (3). We shall establish that the convergence rate is superlinear, and that it is at least quadratic if the multiplier function is Lipschitz continuous at $\bar x$. To prove these results we need the following proposition.

Proposition 3.1 Let $(\bar x, \bar\lambda)$ be a KKT pair for Problem (P) which satisfies Assumptions A1 and A2. Then there exist a convex neighborhood $\hat\Omega$ of $\bar x$ and a positive $\sigma$ such that, for all $x \in \hat\Omega$, the matrix

$$M(x) := \begin{bmatrix} \nabla^2 L(x, \lambda(x)) & \nabla g_{A(x)}(x) \\ \nabla g_{A(x)}(x)' & 0 \end{bmatrix} \qquad (8)$$

is nonsingular and $\|M(x)^{-1}\| \le \sigma$.

Proof. Assumption A2 and well known properties of quadratic forms (see, e.g., Refs. 33-34) imply that there exists a constant $\rho > 0$ such that the matrix $\nabla^2 L(\bar x, \lambda(\bar x)) + \rho \nabla g_{I_+}(\bar x) \nabla g_{I_+}(\bar x)'$ is positive definite. By continuity there exists a neighborhood $\Omega_1$ of $\bar x$ such that the matrix $\nabla^2 L(x, \lambda(x)) + \rho \nabla g_{I_+}(x) \nabla g_{I_+}(x)'$ is positive definite for all $x \in \Omega_1$. This implies (see, e.g., Ref. 34) that, for all $x \in \Omega_1$:

$$z' \nabla^2 L(x, \lambda(x)) z > 0, \qquad \text{for all } z \ne 0 : \nabla g_{I_+}(x)' z = 0. \qquad (9)$$

Recalling Theorem 2.1, we can find a neighborhood $\Omega_2 \subseteq \Omega_1$ such that $I_+ \subseteq A(x) \subseteq I_0$ for all $x \in \Omega_2$. Therefore (9) and Assumption A1 imply that, for all $x \in \Omega_2$:

$$z' \nabla^2 L(x, \lambda(x)) z > 0, \qquad \text{for all } z \ne 0 : \nabla g_{A(x)}(x)' z = 0, \qquad (10)$$

and

$$\nabla g_{A(x)}(x)\, y = 0 \quad \text{implies} \quad y = 0. \qquad (11)$$

Now, using (10) and (11), it is easy to show that the matrix $M(x)$ is nonsingular for all $x \in \Omega_2$ and, hence, the result follows by setting $\hat\Omega = \Omega_2$ and by the continuity of $M(x)$. □

Now we can prove the main result of this section.

Theorem 3.1 Let $f$ and $g_i$, $i = 1, \ldots, m$, be twice continuously differentiable with Lipschitz continuous Hessian matrices $\nabla^2 f$ and $\nabla^2 g_i$, $i = 1, \ldots, m$. Let $(\bar x, \bar\lambda)$ be a KKT pair for Problem (P) which satisfies Assumptions A1 and A2. Then there exists a neighborhood $\Omega$ of $\bar x$ such that, if $x^0 \in \Omega$, the system (7) is well defined, and the sequence $\{x^k\}$ produced by (6) satisfies $x^k \in \Omega$ for all $k$, converges to $\bar x$, and the rate of convergence is superlinear. Furthermore, if the multiplier function $\lambda(x)$ is Lipschitz continuous at $\bar x$, then the sequence $\{x^k\}$ converges quadratically to $\bar x$.

Proof. First we derive some bounds that will be employed later on. Using the Mean Value Theorem we have:

$$\nabla f(\bar x) = \nabla f(x) - \nabla^2 f(x)(x - \bar x) - \int_0^1 \left[ \nabla^2 f(x + t(\bar x - x)) - \nabla^2 f(x) \right] (x - \bar x)\, dt, \qquad (12)$$

$$\nabla g_i(\bar x) = \nabla g_i(x) - \nabla^2 g_i(x)(x - \bar x) - \int_0^1 \left[ \nabla^2 g_i(x + t(\bar x - x)) - \nabla^2 g_i(x) \right] (x - \bar x)\, dt, \qquad (13)$$

$$g_i(\bar x) = g_i(x) - \nabla g_i(x)'(x - \bar x) + \tfrac{1}{2} (x - \bar x)' \nabla^2 g_i(x + s_i(\bar x - x))(x - \bar x), \qquad (14)$$

where $s_i \in (0, 1)$. If $L_f$ and $L_i$, $i = 1, \ldots, m$, are the Lipschitz constants of the Hessians of $f$ and $g_i$, $i = 1, \ldots, m$, respectively, we can write, taking into account (12) and (13):

$$\| \nabla f(\bar x) - \nabla f(x) + \nabla^2 f(x)(x - \bar x) \| \le \frac{L_f}{2} \|x - \bar x\|^2, \qquad (15)$$

$$\| \nabla g_i(\bar x) - \nabla g_i(x) + \nabla^2 g_i(x)(x - \bar x) \| \le \frac{L_i}{2} \|x - \bar x\|^2. \qquad (16)$$

Now, if we denote by $\Omega_1$ the neighborhood of $\bar x$ described in Theorem 2.1, we can define the following constants $M_i$, $i = 1, \ldots, m$:

$$M_i = \sup_{x \in \Omega_1} \| \nabla^2 g_i(x) \|.$$

By using (14) we obtain, for all $x \in \Omega_1$:

$$| g_i(\bar x) - g_i(x) + \nabla g_i(x)'(x - \bar x) | \le \frac{M_i}{2} \|x - \bar x\|^2. \qquad (17)$$

Now, recalling (7), we can consider the following system:

$$\begin{bmatrix} \nabla^2 L(x^k, \lambda(x^k)) & \nabla g_{A^k}(x^k) \\ \nabla g_{A^k}(x^k)' & 0 \end{bmatrix} \begin{bmatrix} x^k + d^k - \bar x \\ z^k - \bar\lambda_{A^k} \end{bmatrix} = \begin{bmatrix} -\nabla f(x^k) + \nabla^2 L(x^k, \lambda(x^k))(x^k - \bar x) - \nabla g_{A^k}(x^k) \bar\lambda_{A^k} \\ -g_{A^k}(x^k) + \nabla g_{A^k}(x^k)'(x^k - \bar x) \end{bmatrix}. \qquad (18)$$

Recalling that $\nabla L(\bar x, \bar\lambda) = 0$ and that, by equation (4), $\bar\lambda_i = 0$ if $i \notin A^k$, we have:

$$\begin{aligned}
-\nabla f(x^k) &+ \nabla^2 L(x^k, \lambda(x^k))(x^k - \bar x) - \nabla g_{A^k}(x^k) \bar\lambda_{A^k} \\
&= -\nabla f(x^k) + \nabla^2 f(x^k)(x^k - \bar x) - \sum_{i=1}^m \bar\lambda_i \left[ \nabla g_i(x^k) - \nabla^2 g_i(x^k)(x^k - \bar x) \right] + \sum_{i=1}^m \left( \lambda_i(x^k) - \bar\lambda_i \right) \nabla^2 g_i(x^k)(x^k - \bar x) \\
&= \nabla f(\bar x) - \nabla f(x^k) + \nabla^2 f(x^k)(x^k - \bar x) + \sum_{i=1}^m \bar\lambda_i \left[ \nabla g_i(\bar x) - \nabla g_i(x^k) + \nabla^2 g_i(x^k)(x^k - \bar x) \right] + \sum_{i=1}^m \left( \lambda_i(x^k) - \bar\lambda_i \right) \nabla^2 g_i(x^k)(x^k - \bar x), \qquad (19)
\end{aligned}$$

from which, by using (15) and (16), we obtain:

$$\| -\nabla f(x^k) + \nabla^2 L(x^k, \lambda(x^k))(x^k - \bar x) - \nabla g_{A^k}(x^k) \bar\lambda_{A^k} \| \le L \|x^k - \bar x\|^2 + M \|\lambda(x^k) - \bar\lambda\| \, \|x^k - \bar x\|, \qquad (20)$$

where $L = L_f + \sum_{i=1}^m \bar\lambda_i L_i$ and $M = \sum_{i=1}^m M_i$. Similarly, by using (17) and (4) (which implies that $g_{A^k}(\bar x) = 0$), we have:

$$\| -g_{A^k}(x^k) + \nabla g_{A^k}(x^k)'(x^k - \bar x) \| = \| g_{A^k}(\bar x) - g_{A^k}(x^k) + \nabla g_{A^k}(x^k)'(x^k - \bar x) \| \le \sum_{i=1}^m \frac{M_i}{2} \|x^k - \bar x\|^2 \le M \|x^k - \bar x\|^2. \qquad (21)$$

Proposition 3.1 ensures that there exist a positive constant $\sigma$ and a neighborhood $\Omega_2 \subseteq \Omega_1$ of $\bar x$ such that

$$\| M(x)^{-1} \| \le \sigma, \qquad \text{for all } x \in \Omega_2, \qquad (22)$$

where the matrix $M(x)$ is defined by (8). At this point, (18), (20), (21), and (22) imply that, for all $x^k \in \Omega_2$:

$$\| x^k + d^k - \bar x \| \le \max\left[ \| x^k + d^k - \bar x \|, \| z^k - \bar\lambda_{A^k} \| \right] \le \beta(x^k) \| x^k - \bar x \|, \qquad (23)$$

where

$$\beta(x) := \sigma \left[ (L + M) \|x - \bar x\| + M \|\lambda(x) - \bar\lambda\| \right]. \qquad (24)$$

By continuity we have that

$$\beta(x) \to 0 \quad \text{if } x \to \bar x, \qquad (25)$$

and that there exists a spherical neighborhood $\Omega \subseteq \Omega_2$ of $\bar x$ such that

$$\beta(x) < 1, \qquad \forall x \in \Omega. \qquad (26)$$

Now, if $x^0 \in \Omega$, (23) and (26) imply that the sequence $\{x^k\}$ satisfies $x^k \in \Omega$ for all $k$, is well defined by Proposition 3.1, and converges to $\bar x$; while (23) and (25) ensure that the rate of convergence is superlinear. If the multiplier function $\lambda(x)$ is Lipschitz continuous at $\bar x$, and $L_\lambda$ is its Lipschitz constant, we can assume, without loss of generality, that, for all $x \in \Omega$:

$$\beta(x) \le \tilde\beta \|x - \bar x\|, \qquad (27)$$

where

$$\tilde\beta := \sigma (L + M + M L_\lambda).$$

Therefore, if $x^0 \in \Omega$, (23) and (27) show that $\|x^k + d^k - \bar x\| \le \tilde\beta \|x^k - \bar x\|^2$, which implies that the sequence converges quadratically to $\bar x$. □

Remark 3.1 We note that a multiplier function which is also Lipschitz continuous at $\bar x$, as required by Theorem 3.1, is that defined by (5) (see Theorem 2.2). This function requires the solution of an $m \times m$ linear system. Multiplier functions other than (5) may be used; however, to the best of our knowledge, any such multiplier function will require the solution of an additional linear system. Obviously, if Problem (P) has some particular structure, it may well be possible to define multiplier functions which are easier to compute (see, e.g., Ref. 35).
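Operationally, one iteration of the Newton-like method (6)-(7) evaluates $\lambda(x^k)$, forms $A^k$ by (3), and solves a single KKT-type linear system. The following sketch is a hypothetical assembly of that step (names ours; it presumes the quantities $\nabla f(x^k)$, $\nabla g(x^k)$, $g(x^k)$, and $\nabla^2 L(x^k, \lambda(x^k))$ have already been computed, e.g., with the two helpers sketched earlier).

```python
import numpy as np

def newton_like_step(x, grad_f, jac_g, g_vals, hess_L, active_idx):
    """One step of (6)-(7): solve the linear system

        [ hess_L   G_A ] [ d ]     [ grad_f ]
        [ G_A'      0  ] [ z ] = - [ g_A    ]

    where G_A gathers the columns of jac_g indexed by active_idx, and
    return the new point x + d together with the multiplier estimate z."""
    n = x.size
    G_A = jac_g[:, active_idx]   # gradients of the estimated active constraints
    g_A = g_vals[active_idx]
    a = g_A.size
    K = np.block([[hess_L, G_A],
                  [G_A.T, np.zeros((a, a))]])
    rhs = -np.concatenate([grad_f, g_A])
    sol = np.linalg.solve(K, rhs)
    return x + sol[:n], sol[n:]
```

Near $\bar x$, Proposition 3.1 guarantees that the coefficient matrix is nonsingular, so the plain solve is justified; far from the solution a globalization such as that of Ref. 29 would be needed.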

4 Low cost Newton-like methods

In this section we investigate the possibility of reducing the computational cost of the algorithm described in the previous section. The basic idea is to follow a more classical approach, namely to use the vector $z^{k-1}$ to obtain an approximation of $\bar\lambda$, thus avoiding the necessity of using a multiplier function. However, it should be intuitively clear that using the old information $z^{k-1}$ instead of $\lambda(x^k)$ will cause a deterioration of the convergence rate. Hence we consider a local algorithm defined by the iteration

$$x^{k+1} = x^k + d^k, \qquad (28)$$

where $d^k$ is obtained by solving the following system:

$$\begin{bmatrix} \nabla^2 L(x^k, \tilde\lambda^k) & \nabla g_{\tilde A^k}(x^k) \\ \nabla g_{\tilde A^k}(x^k)' & 0 \end{bmatrix} \begin{bmatrix} d^k \\ z^k \end{bmatrix} = - \begin{bmatrix} \nabla f(x^k) \\ g_{\tilde A^k}(x^k) \end{bmatrix}, \qquad (29)$$

and where

$$\tilde\lambda_i^k = z_i^{k-1} \ \text{if } i \in \tilde A^{k-1}, \qquad \tilde\lambda_i^k = 0 \ \text{if } i \notin \tilde A^{k-1},$$

while

$$\tilde A^k := \{ i : g_i(x^k) \ge -\delta \tilde\lambda_i^k \}.$$

Note that to start this algorithm we need a starting point $x^0$ and an initial estimate $\tilde\lambda^0$ of the Lagrange multiplier.

Theorem 4.1 Let $f$ and $g_i$, $i = 1, \ldots, m$, be twice continuously differentiable with Lipschitz continuous Hessian matrices $\nabla^2 f$ and $\nabla^2 g_i$, $i = 1, \ldots, m$. Let $(\bar x, \bar\lambda)$ be a KKT pair for Problem (P) which satisfies Assumptions A1 and A2. Then there exist neighborhoods $\Omega$ of $\bar x$ and $\tilde\Omega$ of $\bar\lambda$ such that, if $x^0 \in \Omega$ and $\tilde\lambda^0 \in \tilde\Omega$, the systems (29) are well defined for every $k$, and the sequence $\{x^k\}$ produced by (28) satisfies $x^k \in \Omega$ for all $k$, converges to $\bar x$, and the rate of convergence is superlinear. Furthermore, the sequence $\{\tilde\lambda^k\}$ remains in $\tilde\Omega$ and converges to $\bar\lambda$, and the sequence $\{(x^k, \tilde\lambda^k)\}$ converges quadratically to $(\bar x, \bar\lambda)$.

Proof. We first note that it is easy to see, by repeating the proofs of Theorem 2.1 and Proposition 3.1, that there exist neighborhoods $\Omega_1$ of $\bar x$ and $\tilde\Omega_1$ of $\bar\lambda$ such that, if $x^k \in \Omega_1$ and $\tilde\lambda^k \in \tilde\Omega_1$, then

$$I_+ \subseteq \tilde A^k \subseteq I_0, \qquad (30)$$

and the matrix

$$\tilde M(x^k) := \begin{bmatrix} \nabla^2 L(x^k, \tilde\lambda^k) & \nabla g_{\tilde A^k}(x^k) \\ \nabla g_{\tilde A^k}(x^k)' & 0 \end{bmatrix}$$

is nonsingular with $\| \tilde M(x^k)^{-1} \| \le \tilde\sigma$ for some positive $\tilde\sigma$. Now, by simply changing $\lambda(x^k)$ into $\tilde\lambda^k$ and $A^k$ into $\tilde A^k$, we can repeat verbatim the first part of the proof of Theorem 3.1 and obtain the following relations, analogous to equations (23) and (24):

$$\| x^k + d^k - \bar x \| \le \max\left[ \| x^k + d^k - \bar x \|, \| z^k - \bar\lambda_{\tilde A^k} \| \right] \le \tilde\beta(x^k, \tilde\lambda^k) \| x^k - \bar x \|, \qquad (31)$$

where

$$\tilde\beta(x^k, \tilde\lambda^k) := \tilde\sigma \left[ (L + M) \| x^k - \bar x \| + M \| \tilde\lambda^k - \bar\lambda \| \right]. \qquad (32)$$

We now note that $\bar\lambda_i = 0$ if $i \notin I_+$, (30), and the definition of $\tilde\lambda^{k+1}$ imply

$$\| z^k - \bar\lambda_{\tilde A^k} \| = \| \tilde\lambda^{k+1} - \bar\lambda \|,$$

since $\tilde\lambda^{k+1}$ and $\bar\lambda$ are obtained from $z^k$ and $\bar\lambda_{\tilde A^k}$, respectively, by "filling" them with zeros up to dimension $m$. Then equation (31) gives

$$\| x^{k+1} - \bar x \| \le \max\left[ \| x^{k+1} - \bar x \|, \| z^k - \bar\lambda_{\tilde A^k} \| \right] = \max\left[ \| x^{k+1} - \bar x \|, \| \tilde\lambda^{k+1} - \bar\lambda \| \right] \le \tilde\beta(x^k, \tilde\lambda^k) \| x^k - \bar x \|. \qquad (33)$$

By continuity we have that

$$\tilde\beta(x^k, \tilde\lambda^k) \to 0 \quad \text{if } x^k \to \bar x \text{ and } \tilde\lambda^k \to \bar\lambda, \qquad (34)$$

and that there exist spherical neighborhoods $\Omega \subseteq \Omega_1$ of $\bar x$ and $\tilde\Omega \subseteq \tilde\Omega_1$ of $\bar\lambda$ such that

$$\tilde\beta(x^k, \tilde\lambda^k) < 1, \quad \forall x^k \in \Omega, \ \forall \tilde\lambda^k \in \tilde\Omega, \quad \text{and} \quad \mathrm{radius}(\Omega) \le \mathrm{radius}(\tilde\Omega). \qquad (35)$$

Now, if $x^0 \in \Omega$ and $\tilde\lambda^0 \in \tilde\Omega$, we have that (33) and (35) imply that the sequence $\{(x^k, \tilde\lambda^k)\}$ satisfies $x^k \in \Omega$ and $\tilde\lambda^k \in \tilde\Omega$ for all $k$, is well defined, and converges to $(\bar x, \bar\lambda)$; while (33) and (34) ensure that the rate of convergence of the sequence $\{x^k\}$ to $\bar x$ is superlinear. To complete the proof, we now show that the sequence $\{(x^k, \tilde\lambda^k)\}$ converges quadratically to $(\bar x, \bar\lambda)$. From (32) and (33) we have, for some positive $R$, the following chain of inequalities:

$$\left\| \begin{pmatrix} x^{k+1} - \bar x \\ \tilde\lambda^{k+1} - \bar\lambda \end{pmatrix} \right\| \le R \max\left[ \| x^{k+1} - \bar x \|, \| \tilde\lambda^{k+1} - \bar\lambda \| \right] \le R\, \tilde\beta(x^k, \tilde\lambda^k) \| x^k - \bar x \| = R\, \tilde\sigma \left[ (L + M) \| x^k - \bar x \| + M \| \tilde\lambda^k - \bar\lambda \| \right] \| x^k - \bar x \| \le 2 R\, \tilde\sigma (L + M) \left\| \begin{pmatrix} x^k - \bar x \\ \tilde\lambda^k - \bar\lambda \end{pmatrix} \right\|^2,$$

which completes the proof. □

Remark 4.1 We note that the method described in this section, employing $\tilde\lambda^k$ instead of a multiplier function $\lambda(x)$, is less computationally demanding than those described in the previous section, since the evaluation of a multiplier function $\lambda(x)$ requires, in general, the solution of an additional linear system. On the other hand, algorithm (28)-(29) is only superlinearly convergent in $x$ and requires an initial estimate $\tilde\lambda^0$ of the Lagrange multiplier. In particular, we cannot prove a quadratic convergence rate for algorithm (28)-(29) because, when using $\tilde\lambda^k$ instead of $\lambda(x^k)$, it does not seem possible to show that $\tilde\beta(x^k, \tilde\lambda^k)$ (see (32)) goes to zero at least as fast as $\| x^k - \bar x \|$. This is instead possible when using $\lambda(x)$; see (27).
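In an implementation, the only information carried between iterations of (28)-(29) is the pair $(\tilde A^k, z^k)$: the multiplier estimate is rebuilt by the zero-filling used in the proof above, and the new active set estimate is recomputed from it. A hypothetical sketch (names ours):

```python
import numpy as np

def fill_with_zeros(z, active_idx, m):
    """Multiplier update of (29): lambda~_i^{k+1} = z_i^k for i in A~^k,
    and lambda~_i^{k+1} = 0 otherwise ("filling with zeros")."""
    lam = np.zeros(m)
    lam[active_idx] = z
    return lam

def active_set_estimate_tilde(g_vals, lam_tilde, delta=1.0):
    """A~^{k+1} = { i : g_i(x^{k+1}) >= -delta * lambda~_i^{k+1} }."""
    return np.flatnonzero(np.asarray(g_vals) >= -delta * np.asarray(lam_tilde))
```

No auxiliary $m \times m$ system is solved here, which is precisely the saving over the method of Section 3.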

5 Quasi-Newton methods

In this section we consider the case in which we do not want to use second order derivatives. We analyze quasi-Newton versions of both the Newton-like method of Section 3 and the low cost Newton-like method of Section 4.

We start with the quasi-Newton counterpart of the algorithm of Section 3. Therefore, we study the local behavior of the sequence $\{x^k\}$ produced by the iteration (6) when $d^k$ is obtained by solving the following system:

$$\begin{bmatrix} B^k & \nabla g_{A^k}(x^k) \\ \nabla g_{A^k}(x^k)' & 0 \end{bmatrix} \begin{bmatrix} d^k \\ z^k \end{bmatrix} = - \begin{bmatrix} \nabla f(x^k) \\ g_{A^k}(x^k) \end{bmatrix}, \qquad (36)$$

where the $B^k$ are $n \times n$ matrices intended to approximate second order information and where, we recall, $A^k$ is a short notation for $A(x^k)$ (see (3)). We assume that the following assumption is satisfied.

Assumption A3 The sequence $\{B^k\}$ is such that

$$w' B^k w > 0, \qquad \forall w \ne 0 : \nabla g_{A^k}(x^k)' w = 0.$$

It is easily seen (reasoning along the same lines adopted in the proof of Proposition 3.1) that this assumption, along with Assumption A1, implies that, locally, system (36) has a unique solution, so that iteration (6) is well defined. On the other hand, this assumption seems reasonable, since Assumption A2 is supposed to hold.
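Assumption A3 is easy to test numerically: writing $w = Zv$, where the columns of $Z$ span the null space of $\nabla g_{A^k}(x^k)'$, the assumption holds at iteration $k$ if and only if $Z' B^k Z$ is positive definite. The sketch below is our own illustration (it assumes a symmetric $B^k$ and uses a QR factorization to build $Z$; neither choice is prescribed by the paper).

```python
import numpy as np

def satisfies_A3(B, G_A, tol=1e-12):
    """Test Assumption A3: w'Bw > 0 for all w != 0 with G_A' w = 0.

    B   -- symmetric n x n quasi-Newton matrix B^k
    G_A -- n x a matrix whose columns are the estimated active
           constraint gradients grad g_{A^k}(x^k) (full column rank)
    """
    n, a = G_A.shape
    if a == 0:
        Z = np.eye(n)             # no constraints: test B on all of R^n
    else:
        Q, _ = np.linalg.qr(G_A, mode='complete')
        Z = Q[:, a:]              # orthonormal basis of the null space of G_A'
    if Z.shape[1] == 0:
        return True               # the null space is trivial
    return bool(np.linalg.eigvalsh(Z.T @ B @ Z).min() > tol)
```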

Our aim is to find the minimal assumptions on the matrices $B^k$ which guarantee that the sequence $\{x^k\}$ produced by (6)-(36) is superlinearly convergent. We shall obtain a necessary and sufficient condition similar to the characterization given by Boggs, Tolle, and Wang (Ref. 15) in the equality constrained case. To this end, we shall extend the proof technique introduced by Stoer and Tapia (Ref. 19) to establish the Boggs-Tolle-Wang characterization. Stoer and Tapia (Ref. 19) basically show that a quasi-Newton direction for an equality constrained minimization problem can be viewed as a quasi-Newton direction for a system of $n$ nonlinear equations, and then apply the Dennis-Moré result (Ref. 16) on the superlinear convergence of quasi-Newton methods for systems of equations. We add one more step. Roughly speaking, first we use the fact that the direction $d^k$ produced by (36) can be viewed as a quasi-Newton direction for the equality constrained problem

$$\min f(x) \quad \text{s.t.} \quad g_{A^k}(x) = 0;$$

then we apply the Stoer-Tapia technique. However, we have to deal with the following problem. Since the strict complementarity assumption does not necessarily hold at $\bar x$, we can only ensure, by Theorem 2.1, that locally $I_+ \subseteq A^k \subseteq I_0$, so that $A^k$ does not necessarily settle down. This means that the direction $d^k$ is the quasi-Newton direction for an equality constrained problem, but that this problem can possibly differ from iteration to iteration. To cope with this "inconvenience" and to be able to extend the techniques of Ref. 19, we then need a generalization of the classical result of Dennis and Moré (Ref. 16) on the superlinear convergence of quasi-Newton methods for systems of equations. More precisely, we consider the case in which the quasi-Newton iterations are applied to a finite number of possibly different systems of equations.

To state this result we introduce some more notation. Suppose that we have $r$ different functions $F_j : \mathbb{R}^n \to \mathbb{R}^n$. We shall consider a function $j : \mathbb{N} \to \{1, \ldots, r\}$, which is used to associate to each iteration $k$ one of the functions $F_j$. Then we can state the following theorem, which coincides with the Dennis-Moré result when $r = 1$.

Theorem 5.1 Let $F_j : \mathbb{R}^n \to \mathbb{R}^n$, $j = 1, \ldots, r$, be continuously differentiable in an open convex neighborhood $D$ of $\bar x$, and suppose that, for every $j = 1, \ldots, r$, $F_j(\bar x) = 0$ and $\nabla F_j(\bar x)$ is nonsingular. Let $\{H^k\}$ be a sequence of $n \times n$ nonsingular matrices and let a function $j(k)$, $j : \mathbb{N} \to \{1, \ldots, r\}$, be given. Suppose that for $x^0 \in D$ the sequence

$$x^{k+1} = x^k - (H^k)^{-1} F_{j(k)}(x^k)$$

is contained in $D$ and converges to $\bar x$. Then $\{x^k\}$ converges q-superlinearly to $\bar x$ if and only if

$$\lim_{k \to \infty} \frac{\| (H^k - \nabla F_{j(k)}(\bar x)') (x^{k+1} - x^k) \|}{\| x^{k+1} - x^k \|} = 0.$$

Proof. The proof is an almost verbatim repetition of that of the Dennis-Moré characterization theorem (Ref. 16) and is left to the reader. □

Using this result we can now determine when the sequence $\{x^k\}$ produced by (6)-(36) is superlinearly convergent. We consider, for each iteration $k$, the projection matrix

$$P(x, A^k) := I - \nabla g_{A^k}(x) \left( \nabla g_{A^k}(x)' \nabla g_{A^k}(x) \right)^{-1} \nabla g_{A^k}(x)',$$

depending on $x$ and $A^k$; $P(x, A^k)$ projects onto the null space of $\nabla g_{A^k}(x)'$. Note that, by Theorem 2.1 and Assumption A1, $P(x, A^k)$ is well defined if both $x^k$ and $x$ belong to a sufficiently small neighborhood of $\bar x$. We can now prove the main theorem of this section.

Theorem 5.2 Let $(\bar x, \bar\lambda)$ be a KKT pair for Problem (P) which satisfies Assumptions A1 and A2. Suppose that $f$ and $g_i$, $i = 1, \ldots, m$, are twice continuously differentiable, that $\{x^k\}$, generated according to (6) and (36), converges to $\bar x$, and that Assumption A3 is satisfied. Then the sequence $\{x^k\}$ converges q-superlinearly to $\bar x$ if and only if

$$\lim_{k \to \infty} \frac{\| P(x^k, A^k) [B^k - \nabla^2 L(\bar x, \bar\lambda)] d^k \|}{\| d^k \|} = 0. \qquad (37)$$

Proof. We suppose, without loss of generality, that $k$ is large enough, so that $I_+ \subseteq A^k \subseteq I_0$, system (36) is well defined and admits a unique solution, and the projection operators $P(x, A^k)$ are well defined. By Ref. 19 (Section 2), determining $d^k$ from (36) is equivalent to determining $d^k$ from the linear system

$$\left[ P(x^k, A^k) B^k + \nabla g_{A^k}(x^k) \nabla g_{A^k}(x^k)' \right] d^k = - \left[ P(x^k, A^k) \nabla f(x^k) + \nabla g_{A^k}(x^k) g_{A^k}(x^k) \right]. \qquad (38)$$

For each iteration we consider the function

$$F(x, A^k) := P(x, A^k) \nabla f(x) + \nabla g_{A^k}(x) g_{A^k}(x).$$

Note that the functions $F(x, A^k)$ may differ at each iteration because the set $A^k$ may change. However, the maximum number of different sets $A^k$ is finite and, by Theorem 2.1, equal to the number of index sets which contain $I_+$ and are contained in $I_0$. Let $r$ denote this maximum number. Then there are, at most, $r$ different functions $F(x, A^k)$. Now, in order to apply Theorem 5.1 and to be consistent with the notation used there, we number the index sets which contain $I_+$ and are contained in $I_0$. If $j$ is the number corresponding to the set $J$, with $I_+ \subseteq J \subseteq I_0$, we set

$$F_j(x) = F(x, J).$$

Then we can define a function $j(k)$ which associates to the iteration index $k$ the number of the set $A^k$, and write

$$F_{j(k)}(x) = F(x, A^k).$$

By Theorem 2.1 we have, for $k$ sufficiently large, that $g_{A^k}(\bar x) = 0$, so that, following again Ref. 19 (Section 3), we can write

$$F_{j(k)}(\bar x) = 0, \qquad \nabla F_{j(k)}(\bar x)' = P(\bar x, A^k) \nabla^2 L(\bar x, \bar\lambda) + \nabla g_{A^k}(\bar x) \nabla g_{A^k}(\bar x)',$$

where the first equation follows from the KKT conditions for Problem (P) and where $\nabla F_{j(k)}(\bar x)'$ is nonsingular by Assumptions A1 and A2. We see then, from (38), that we can consider the $k$-th iteration of the iterative process defined by (6) and (36) as a quasi-Newton step applied to the nonlinear system $F_{j(k)}(x) = 0$, with

$$H^k = P(x^k, A^k) B^k + \nabla g_{A^k}(x^k) \nabla g_{A^k}(x^k)',$$

nonsingular by Assumptions A1 and A3. Then, applying Theorem 5.1, we obtain that $\{x^k\}$ converges to $\bar x$ q-superlinearly if and only if

$$\lim_{k \to \infty} \frac{\left\| \left[ P(x^k, A^k) B^k + \nabla g_{A^k}(x^k) \nabla g_{A^k}(x^k)' - \left( P(\bar x, A^k) \nabla^2 L(\bar x, \bar\lambda) + \nabla g_{A^k}(\bar x) \nabla g_{A^k}(\bar x)' \right) \right] d^k \right\|}{\| d^k \|} = 0. \qquad (39)$$

If we now add and subtract $P(x^k, A^k) \nabla^2 L(\bar x, \bar\lambda)$ to the term in square brackets and rearrange terms, we see, by simple continuity arguments, that (39) is satisfied if and only if

$$\lim_{k \to \infty} \frac{\| P(x^k, A^k) [B^k - \nabla^2 L(\bar x, \bar\lambda)] d^k \|}{\| d^k \|} = 0,$$

which is what we wanted to prove. □

To our knowledge, this is the first characterization of superlinear convergence in the variable $x$ alone for an algorithm (for inequality constrained problems) which only solves linear systems at each iteration. For algorithms of RQP type, the only similar characterization was given by Bonnans in Ref. 8. Note that both characterizations do not require the strict complementarity assumption.

It may be of interest to restate Theorem 5.2 in a slightly different way.

Corollary 5.1 Let $(\bar x, \bar\lambda)$ be a KKT pair for Problem (P) which satisfies Assumptions A1 and A2. Suppose that $f$ and $g_i$, $i = 1, \ldots, m$, are twice continuously differentiable, that $\{x^k\}$, generated according to (6) and (36), converges to $\bar x$, and that Assumption A3 is satisfied. Then the sequence $\{x^k\}$ converges q-superlinearly to $\bar x$ if and only if

$$\lim_{k \to \infty} \frac{\| P(x^k, A^k) [B^k - \nabla^2 L(x^k, \lambda(x^k))] d^k \|}{\| d^k \|} = 0. \qquad (40)$$

Proof. The corollary follows from Theorem 5.2 by adding and subtracting $\nabla^2 L(x^k, \lambda(x^k))$ in (37) and noting that $\nabla^2 L(x^k, \lambda(x^k))$ tends to $\nabla^2 L(\bar x, \bar\lambda)$. □

Condition (40) is interesting because it depends only on quantities which are computable at $x^k$. Note that this property is not shared by the similar condition given in Theorem 6.2 of Ref. 8; in fact, the projection matrix used there requires the knowledge of the sets $I_0$ and $I_+$.

There is no difficulty in extending the previous analysis to the low cost Newton-like methods of Section 4. By simply repeating the proofs of Theorem 5.2 and Corollary 5.1, we can prove the following theorem.

Theorem 5.3 Let $(\bar x, \bar\lambda)$ be a KKT pair for Problem (P) which satisfies Assumptions A1 and A2. Suppose that $f$ and $g_i$, $i = 1, \ldots, m$, are twice continuously differentiable, that $\{x^k\}$, generated according to (28) and (29) with $B^k$ replacing $\nabla^2 L(x^k, \tilde\lambda^k)$ in (29), converges to $\bar x$, and that Assumption A3 is satisfied. Then the sequence $\{x^k\}$ converges q-superlinearly to $\bar x$ if and only if

$$\lim_{k \to \infty} \frac{\| P(x^k, \tilde A^k) [B^k - \nabla^2 L(\bar x, \bar\lambda)] d^k \|}{\| d^k \|} = 0, \qquad (41)$$

or, equivalently,

$$\lim_{k \to \infty} \frac{\| P(x^k, \tilde A^k) [B^k - \nabla^2 L(x^k, \tilde\lambda^k)] d^k \|}{\| d^k \|} = 0. \qquad (42)$$
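A practical appeal of Corollary 5.1 and Theorem 5.3 is that conditions (40) and (42) can be monitored along the iterates, since every quantity in them is available at the current point. A hypothetical sketch of that computation (names ours):

```python
import numpy as np

def projection_matrix(G_A):
    """P(x, A^k) = I - G (G'G)^{-1} G', with G = grad g_{A^k}(x):
    the orthogonal projector onto the null space of G'."""
    n = G_A.shape[0]
    if G_A.shape[1] == 0:
        return np.eye(n)
    return np.eye(n) - G_A @ np.linalg.solve(G_A.T @ G_A, G_A.T)

def characterization_ratio(P, B, hess_L_est, d):
    """Quantity ||P (B - hess_L_est) d|| / ||d|| of conditions (40)/(42),
    with hess_L_est a computable Hessian estimate such as
    hess of L(x^k, lambda(x^k)); q-superlinear convergence holds iff
    this ratio tends to zero along the iterates."""
    return np.linalg.norm(P @ (B - hess_L_est) @ d) / np.linalg.norm(d)
```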

6 Conclusions

In this paper we have proposed some Newton and quasi-Newton algorithms for the solution of inequality constrained minimization problems. All the algorithms proposed produce sequences $\{x^k\}$ converging q-superlinearly to the solution. Furthermore, under fairly mild assumptions, a q-quadratic convergence rate in $x$ is also attained. Other features of these algorithms are that only the solution of linear systems of equations is required at each iteration and that the strict complementarity assumption is never invoked.

These methods should be compared to methods which solve at each step an inequality constrained quadratic subproblem and, in our opinion, should re-open the question of the merits of using inequality constrained quadratic subproblems (IQP) or equality constrained quadratic subproblems (EQP) in sequential quadratic programming (SQP) methods for nonlinear programming. The reason which led to a much wider use of IQP subproblems in SQP methods is probably to be found in the possibility of obtaining global convergence results more easily than when using EQP subproblems. However, in the companion paper Ref. 29 we prove that all the Newton and quasi-Newton directions described in Sections 3, 4, and 5 are "good" descent directions for a large class of exact penalty functions, thus allowing us to easily globalize the local algorithms described in this work without losing their good features. We remark that, near a solution at which the strict complementarity assumption holds, most current SQP codes using IQP subproblems will often employ only a single iteration to solve the IQP subproblem, thus requiring a computational effort similar to that of the algorithms described here. This depends on the fact that, if strict complementarity holds, the active set for the IQP subproblem will be the same as for the previous iteration. However, if strict complementarity does not hold, this is no longer true, and SQP methods using IQP subproblems could be much more costly, and the search direction could be remarkably different.

The results of this paper and of Ref. 29 suggest that SQP methods using EQP subproblems could be a valid alternative to those using IQP subproblems, since they are much less computationally expensive and enjoy better local properties (see the Introduction). However, one should also take into account that solving an IQP subproblem (implicitly) gives a different estimate of the active set which, in principle, could be better than that given by (3). Then, it is obvious that only extensive computational experience could assess the merits of SQP methods using inequality or equality constrained subproblems. This will be the subject of future research. To this end, it may be interesting to note that, in this paper, we have always assumed $A^k$ to be defined by (3). However, this was done only for the sake of concreteness. It is easy to check that all the results of Sections 3, 4, and 5 still hold if we use any estimate $A^k$ which locally satisfies (4).

References

1. ROBINSON, S.M., Generalized Equations, Mathematical Programming: The State of the Art, Edited by A. Bachem, M. Grötschel, and B. Korte, Springer-Verlag, Berlin, pp. 346-367, 1983.

2. SCHÖNEFELD, K., A Superlinearly and Globally Convergent Optimization Method Independent of Strict Complementarity Slackness, Technische Universität Dresden, Sektion Mathematik, Report.

3. GABAY, D., Reduced Quasi-Newton Methods with Feasibility Improvement for Nonlinearly Constrained Optimization, Mathematical Programming Study, Vol. 16, pp. 18-44, 1982.

4. WILSON, R.B., A Simplicial Algorithm for Concave Programming, PhD Thesis, Graduate School of Business Administration, Harvard University, 1963.

5. ROBINSON, S.M., Perturbed Kuhn-Tucker Points and Rates of Convergence for a Class of Nonlinear-Programming Algorithms, Mathematical Programming, Vol. 7, pp. 1-16, 1974.

6. HAN, S.P., Superlinearly Convergent Variable Metric Algorithms for General Nonlinear Programming Problems, Mathematical Programming, Vol. 11, pp. 263-282, 1976.

7. BERTSEKAS, D.P., Constrained Optimization and Lagrange Multiplier Methods, Academic Press, New York, 1982.

8. BONNANS, J.F., Rates of Convergence of Newton Type Methods for Variational Inequalities and Nonlinear Programming, Applied Mathematics and Optimization, Vol. 29, pp. 161-186, 1994.

9. BONNANS, J.F., Local Study of Newton Type Algorithms for Constrained Problems, Optimization, Edited by S. Dolecki, Springer-Verlag, Berlin, pp. 13-24, 1989.

10. TAPIA, R.A., Diagonalized Multiplier Methods and Quasi-Newton Methods for Constrained Optimization, Journal of Optimization Theory and Applications, Vol. 22, pp. 135-194, 1977.

11. GLAD, T., Properties of Updating Methods for the Multipliers in Augmented Lagrangians, Journal of Optimization Theory and Applications, Vol. 28, pp. 135-156, 1979.

12. GARCIA PALOMARES, U.M., and MANGASARIAN, O.L., Superlinearly Convergent Quasi-Newton Algorithms for Nonlinearly Constrained Optimization Problems, Mathematical Programming, Vol. 11, pp. 1-13, 1976.

13. HAN, S.P., Dual Variable Metric Algorithms for Constrained Optimization, SIAM Journal on Control and Optimization, Vol. 15, pp. 546-565, 1977.

14. POWELL, M.J.D., The Convergence of Variable Metric Methods for Nonlinearly Constrained Optimization Calculations, Nonlinear Programming 3, Edited by O.L. Mangasarian, R.R. Meyer, and S.M. Robinson, Academic Press, New York, pp. 27-63, 1978.

15. BOGGS, P.T., TOLLE, J.W., and WANG, P., On the Local Convergence of Quasi-Newton Methods for Constrained Optimization, SIAM Journal on Control and Optimization, Vol. 20, pp. 161-171, 1982.

16. DENNIS, J.E., and MORÉ, J.J., A Characterization of Superlinear Convergence and Its Application to Quasi-Newton Methods, Mathematics of Computation, Vol. 28, pp. 549-560, 1974.

17. FONTECILLA, R., STEIHAUG, T., and TAPIA, R.A., A Convergence Theory for a Class of Quasi-Newton Methods for Constrained Optimization, SIAM Journal on Numerical Analysis, Vol. 24, pp. 1133-1151, 1987.

18. NOCEDAL, J., and OVERTON, M., Projected Hessian Updating Algorithms for Nonlinearly Constrained Optimization, SIAM Journal on Numerical Analysis, Vol. 22, pp. 821-850, 1985.

19. STOER, J., and TAPIA, R.A., On the Characterization of Q-Superlinear Convergence of Quasi-Newton Methods for Constrained Optimization, Mathematics of Computation, Vol. 49, pp. 581-584, 1987.

20. BIGGS, M.C., On the Convergence of Some Constrained Minimization Algorithms Based on Recursive Quadratic Programming, Journal of the Institute of Mathematics and Its Applications, Vol. 21, pp. 67-82, 1978.

21. BARTHOLOMEW-BIGGS, M.C., Recursive Quadratic Programming Based on the Augmented Lagrangian, Mathematical Programming Study, Vol. 31, pp. 21-41, 1987.

22. DI PILLO, G., and GRIPPO, L., A Class of Continuously Differentiable Exact Penalty Function Algorithms for Nonlinear Programming Problems, System Modelling and Optimization, Edited by P. Thoft-Christensen, Springer-Verlag, Berlin, pp. 246-256, 1984.

23. KLEINMICHEL, H., RICHTER, C., and SCHÖNEFELD, K., On a Class of Hybrid Methods for Smooth Constrained Optimization, Journal of Optimization Theory and Applications, Vol. 73, pp. 465-499, 1992.

24. DI PILLO, G., GRIPPO, L., and LUCIDI, S., Globally Convergent Exact Penalty Algorithms for Constrained Optimization, System Modelling and Optimization, Edited by A. Prékopa, J. Szelezsán, and B. Strazicky, Springer-Verlag, Berlin, pp. 694-703, 1986.

25. KANZOW, C., Newton-Type Methods for Nonlinearly Constrained Optimization, Universität Hamburg, Institut für Angewandte Mathematik, Report No. 62.

26. KANZOW, C., and KLEINMICHEL, H., A Class of Newton-Type Methods for Equality and Inequality Constrained Optimization, Universität Hamburg, Institut für Angewandte Mathematik, Report No. 61.

27. PANG, J.S., A B-Differentiable Equation-Based, Globally and Locally Quadratically Convergent Algorithm for Nonlinear Programs, Complementarity and Variational Inequality Problems, Mathematical Programming, Vol. 51, pp. 101-131, 1991.

28. FACCHINEI, F., and LUCIDI, S., Local Properties of a Newton-Like Direction for Equality Constrained Minimization Problems, Optimization Methods and Software, Vol. 3, pp. 13-26, 1994.

29. FACCHINEI, F., and LUCIDI, S., A Globalization Technique for Constrained Newton-Like Algorithms, Università di Roma "La Sapienza", Report (forthcoming).

30. FLETCHER, R., A Class of Methods for Nonlinear Programming with Termination and Convergence Properties, Integer and Nonlinear Programming, Edited by J. Abadie, North-Holland, Amsterdam, pp. 157-173, 1970.

31. GLAD, T., and POLAK, E., A Multiplier Method with Automatic Limitation of Penalty Growth, Mathematical Programming, Vol. 17, pp. 140-155, 1979.

32. LUCIDI, S., New Results on a Continuously Differentiable Exact Penalty Function, SIAM Journal on Optimization, Vol. 2, pp. 558-574, 1992.

33. HESTENES, M.R., Optimization Theory: The Finite Dimensional Case, Wiley, New York, 1975.

34. BELLMAN, R., Introduction to Matrix Analysis, McGraw-Hill, New York, 1970.

35. FACCHINEI, F., and LUCIDI, S., A Class of Penalty Functions for Optimization Problems with Bound Constraints, Optimization, Vol. 26, pp. 239-259, 1992.


More information

The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization 1

The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization 1 October 2003 The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization 1 by Asuman E. Ozdaglar and Dimitri P. Bertsekas 2 Abstract We consider optimization problems with equality,

More information

Stationary Points of Bound Constrained Minimization Reformulations of Complementarity Problems1,2

Stationary Points of Bound Constrained Minimization Reformulations of Complementarity Problems1,2 JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS: Vol. 94, No. 2, pp. 449-467, AUGUST 1997 Stationary Points of Bound Constrained Minimization Reformulations of Complementarity Problems1,2 M. V. SOLODOV3

More information

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS MATHEMATICS OF OPERATIONS RESEARCH Vol. 28, No. 4, November 2003, pp. 677 692 Printed in U.S.A. ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS ALEXANDER SHAPIRO We discuss in this paper a class of nonsmooth

More information

SF2822 Applied Nonlinear Optimization. Preparatory question. Lecture 9: Sequential quadratic programming. Anders Forsgren

SF2822 Applied Nonlinear Optimization. Preparatory question. Lecture 9: Sequential quadratic programming. Anders Forsgren SF2822 Applied Nonlinear Optimization Lecture 9: Sequential quadratic programming Anders Forsgren SF2822 Applied Nonlinear Optimization, KTH / 24 Lecture 9, 207/208 Preparatory question. Try to solve theory

More information

Algorithms for constrained local optimization

Algorithms for constrained local optimization Algorithms for constrained local optimization Fabio Schoen 2008 http://gol.dsi.unifi.it/users/schoen Algorithms for constrained local optimization p. Feasible direction methods Algorithms for constrained

More information

1 Computing with constraints

1 Computing with constraints Notes for 2017-04-26 1 Computing with constraints Recall that our basic problem is minimize φ(x) s.t. x Ω where the feasible set Ω is defined by equality and inequality conditions Ω = {x R n : c i (x)

More information

Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms: A Comparative Study

Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms: A Comparative Study International Journal of Mathematics And Its Applications Vol.2 No.4 (2014), pp.47-56. ISSN: 2347-1557(online) Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms:

More information

Sequential Quadratic Programming Methods

Sequential Quadratic Programming Methods Sequential Quadratic Programming Methods Klaus Schittkowski Ya-xiang Yuan June 30, 2010 Abstract We present a brief review on one of the most powerful methods for solving smooth constrained nonlinear optimization

More information

An interior point type QP-free algorithm with superlinear convergence for inequality constrained optimization

An interior point type QP-free algorithm with superlinear convergence for inequality constrained optimization Applied Mathematical Modelling 31 (2007) 1201 1212 www.elsevier.com/locate/apm An interior point type QP-free algorithm with superlinear convergence for inequality constrained optimization Zhibin Zhu *

More information

Optimization Problems with Constraints - introduction to theory, numerical Methods and applications

Optimization Problems with Constraints - introduction to theory, numerical Methods and applications Optimization Problems with Constraints - introduction to theory, numerical Methods and applications Dr. Abebe Geletu Ilmenau University of Technology Department of Simulation and Optimal Processes (SOP)

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

More information

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen

More information

Using exact penalties to derive a new equation reformulation of KKT systems associated to variational inequalities

Using exact penalties to derive a new equation reformulation of KKT systems associated to variational inequalities Using exact penalties to derive a new equation reformulation of KKT systems associated to variational inequalities Thiago A. de André Paulo J. S. Silva March 24, 2007 Abstract In this paper, we present

More information

MODIFYING SQP FOR DEGENERATE PROBLEMS

MODIFYING SQP FOR DEGENERATE PROBLEMS PREPRINT ANL/MCS-P699-1097, OCTOBER, 1997, (REVISED JUNE, 2000; MARCH, 2002), MATHEMATICS AND COMPUTER SCIENCE DIVISION, ARGONNE NATIONAL LABORATORY MODIFYING SQP FOR DEGENERATE PROBLEMS STEPHEN J. WRIGHT

More information

Projected Gradient Methods for NCP 57. Complementarity Problems via Normal Maps

Projected Gradient Methods for NCP 57. Complementarity Problems via Normal Maps Projected Gradient Methods for NCP 57 Recent Advances in Nonsmooth Optimization, pp. 57-86 Eds..-Z. u, L. Qi and R.S. Womersley c1995 World Scientic Publishers Projected Gradient Methods for Nonlinear

More information

A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES

A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES IJMMS 25:6 2001) 397 409 PII. S0161171201002290 http://ijmms.hindawi.com Hindawi Publishing Corp. A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES

More information

2 Chapter 1 rely on approximating (x) by using progressively ner discretizations of [0; 1] (see, e.g. [5, 7, 8, 16, 18, 19, 20, 23]). Specically, such

2 Chapter 1 rely on approximating (x) by using progressively ner discretizations of [0; 1] (see, e.g. [5, 7, 8, 16, 18, 19, 20, 23]). Specically, such 1 FEASIBLE SEQUENTIAL QUADRATIC PROGRAMMING FOR FINELY DISCRETIZED PROBLEMS FROM SIP Craig T. Lawrence and Andre L. Tits ABSTRACT Department of Electrical Engineering and Institute for Systems Research

More information

8 Barrier Methods for Constrained Optimization

8 Barrier Methods for Constrained Optimization IOE 519: NL, Winter 2012 c Marina A. Epelman 55 8 Barrier Methods for Constrained Optimization In this subsection, we will restrict our attention to instances of constrained problem () that have inequality

More information

Extra-Updates Criterion for the Limited Memory BFGS Algorithm for Large Scale Nonlinear Optimization M. Al-Baali y December 7, 2000 Abstract This pape

Extra-Updates Criterion for the Limited Memory BFGS Algorithm for Large Scale Nonlinear Optimization M. Al-Baali y December 7, 2000 Abstract This pape SULTAN QABOOS UNIVERSITY Department of Mathematics and Statistics Extra-Updates Criterion for the Limited Memory BFGS Algorithm for Large Scale Nonlinear Optimization by M. Al-Baali December 2000 Extra-Updates

More information

Smoothed Fischer-Burmeister Equation Methods for the. Houyuan Jiang. CSIRO Mathematical and Information Sciences

Smoothed Fischer-Burmeister Equation Methods for the. Houyuan Jiang. CSIRO Mathematical and Information Sciences Smoothed Fischer-Burmeister Equation Methods for the Complementarity Problem 1 Houyuan Jiang CSIRO Mathematical and Information Sciences GPO Box 664, Canberra, ACT 2601, Australia Email: Houyuan.Jiang@cmis.csiro.au

More information

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality

More information

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods Quasi-Newton Methods General form of quasi-newton methods: x k+1 = x k α

More information

1. Introduction Let the least value of an objective function F (x), x2r n, be required, where F (x) can be calculated for any vector of variables x2r

1. Introduction Let the least value of an objective function F (x), x2r n, be required, where F (x) can be calculated for any vector of variables x2r DAMTP 2002/NA08 Least Frobenius norm updating of quadratic models that satisfy interpolation conditions 1 M.J.D. Powell Abstract: Quadratic models of objective functions are highly useful in many optimization

More information

Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations

Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations Oleg Burdakov a,, Ahmad Kamandi b a Department of Mathematics, Linköping University,

More information

Sequential Quadratic Programming Method for Nonlinear Second-Order Cone Programming Problems. Hirokazu KATO

Sequential Quadratic Programming Method for Nonlinear Second-Order Cone Programming Problems. Hirokazu KATO Sequential Quadratic Programming Method for Nonlinear Second-Order Cone Programming Problems Guidance Professor Masao FUKUSHIMA Hirokazu KATO 2004 Graduate Course in Department of Applied Mathematics and

More information

Differentiable exact penalty functions for nonlinear optimization with easy constraints. Takuma NISHIMURA

Differentiable exact penalty functions for nonlinear optimization with easy constraints. Takuma NISHIMURA Master s Thesis Differentiable exact penalty functions for nonlinear optimization with easy constraints Guidance Assistant Professor Ellen Hidemi FUKUDA Takuma NISHIMURA Department of Applied Mathematics

More information

An approach to constrained global optimization based on exact penalty functions

An approach to constrained global optimization based on exact penalty functions DOI 10.1007/s10898-010-9582-0 An approach to constrained global optimization based on exact penalty functions G. Di Pillo S. Lucidi F. Rinaldi Received: 22 June 2010 / Accepted: 29 June 2010 Springer Science+Business

More information

Solving generalized semi-infinite programs by reduction to simpler problems.

Solving generalized semi-infinite programs by reduction to simpler problems. Solving generalized semi-infinite programs by reduction to simpler problems. G. Still, University of Twente January 20, 2004 Abstract. The paper intends to give a unifying treatment of different approaches

More information

Seminal papers in nonlinear optimization

Seminal papers in nonlinear optimization Seminal papers in nonlinear optimization Nick Gould, CSED, RAL, Chilton, OX11 0QX, England (n.gould@rl.ac.uk) December 7, 2006 The following papers are classics in the field. Although many of them cover

More information

Constrained optimization: direct methods (cont.)

Constrained optimization: direct methods (cont.) Constrained optimization: direct methods (cont.) Jussi Hakanen Post-doctoral researcher jussi.hakanen@jyu.fi Direct methods Also known as methods of feasible directions Idea in a point x h, generate a

More information

Constrained Optimization Theory

Constrained Optimization Theory Constrained Optimization Theory Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. IMA, August 2016 Stephen Wright (UW-Madison) Constrained Optimization Theory IMA, August

More information

Lecture 13: Constrained optimization

Lecture 13: Constrained optimization 2010-12-03 Basic ideas A nonlinearly constrained problem must somehow be converted relaxed into a problem which we can solve (a linear/quadratic or unconstrained problem) We solve a sequence of such problems

More information

A Robust Implementation of a Sequential Quadratic Programming Algorithm with Successive Error Restoration

A Robust Implementation of a Sequential Quadratic Programming Algorithm with Successive Error Restoration A Robust Implementation of a Sequential Quadratic Programming Algorithm with Successive Error Restoration Address: Prof. K. Schittkowski Department of Computer Science University of Bayreuth D - 95440

More information

58 Appendix 1 fundamental inconsistent equation (1) can be obtained as a linear combination of the two equations in (2). This clearly implies that the

58 Appendix 1 fundamental inconsistent equation (1) can be obtained as a linear combination of the two equations in (2). This clearly implies that the Appendix PRELIMINARIES 1. THEOREMS OF ALTERNATIVES FOR SYSTEMS OF LINEAR CONSTRAINTS Here we consider systems of linear constraints, consisting of equations or inequalities or both. A feasible solution

More information

and P RP k = gt k (g k? g k? ) kg k? k ; (.5) where kk is the Euclidean norm. This paper deals with another conjugate gradient method, the method of s

and P RP k = gt k (g k? g k? ) kg k? k ; (.5) where kk is the Euclidean norm. This paper deals with another conjugate gradient method, the method of s Global Convergence of the Method of Shortest Residuals Yu-hong Dai and Ya-xiang Yuan State Key Laboratory of Scientic and Engineering Computing, Institute of Computational Mathematics and Scientic/Engineering

More information

1. Introduction. We consider the general smooth constrained optimization problem:

1. Introduction. We consider the general smooth constrained optimization problem: OPTIMIZATION TECHNICAL REPORT 02-05, AUGUST 2002, COMPUTER SCIENCES DEPT, UNIV. OF WISCONSIN TEXAS-WISCONSIN MODELING AND CONTROL CONSORTIUM REPORT TWMCC-2002-01 REVISED SEPTEMBER 2003. A FEASIBLE TRUST-REGION

More information

Optimization Tutorial 1. Basic Gradient Descent

Optimization Tutorial 1. Basic Gradient Descent E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

More information

arxiv: v1 [math.oc] 10 Apr 2017

arxiv: v1 [math.oc] 10 Apr 2017 A Method to Guarantee Local Convergence for Sequential Quadratic Programming with Poor Hessian Approximation Tuan T. Nguyen, Mircea Lazar and Hans Butler arxiv:1704.03064v1 math.oc] 10 Apr 2017 Abstract

More information

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL)

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL) Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization Nick Gould (RAL) x IR n f(x) subject to c(x) = Part C course on continuoue optimization CONSTRAINED MINIMIZATION x

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

Priority Programme 1962

Priority Programme 1962 Priority Programme 1962 An Example Comparing the Standard and Modified Augmented Lagrangian Methods Christian Kanzow, Daniel Steck Non-smooth and Complementarity-based Distributed Parameter Systems: Simulation

More information

Infeasibility Detection and an Inexact Active-Set Method for Large-Scale Nonlinear Optimization

Infeasibility Detection and an Inexact Active-Set Method for Large-Scale Nonlinear Optimization Infeasibility Detection and an Inexact Active-Set Method for Large-Scale Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with James V. Burke, University of Washington Daniel

More information

1 Introduction Let F : < n! < n be a continuously dierentiable mapping and S be a nonempty closed convex set in < n. The variational inequality proble

1 Introduction Let F : < n! < n be a continuously dierentiable mapping and S be a nonempty closed convex set in < n. The variational inequality proble A New Unconstrained Dierentiable Merit Function for Box Constrained Variational Inequality Problems and a Damped Gauss-Newton Method Defeng Sun y and Robert S. Womersley z School of Mathematics University

More information

Unconstrained optimization

Unconstrained optimization Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout

More information

154 ADVANCES IN NONLINEAR PROGRAMMING Abstract: We propose an algorithm for nonlinear optimization that employs both trust region techniques and line

154 ADVANCES IN NONLINEAR PROGRAMMING Abstract: We propose an algorithm for nonlinear optimization that employs both trust region techniques and line 7 COMBINING TRUST REGION AND LINE SEARCH TECHNIQUES Jorge Nocedal Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208-3118, USA. Ya-xiang Yuan State Key Laboratory

More information

The Squared Slacks Transformation in Nonlinear Programming

The Squared Slacks Transformation in Nonlinear Programming Technical Report No. n + P. Armand D. Orban The Squared Slacks Transformation in Nonlinear Programming August 29, 2007 Abstract. We recall the use of squared slacks used to transform inequality constraints

More information

FIRST- AND SECOND-ORDER OPTIMALITY CONDITIONS FOR MATHEMATICAL PROGRAMS WITH VANISHING CONSTRAINTS 1. Tim Hoheisel and Christian Kanzow

FIRST- AND SECOND-ORDER OPTIMALITY CONDITIONS FOR MATHEMATICAL PROGRAMS WITH VANISHING CONSTRAINTS 1. Tim Hoheisel and Christian Kanzow FIRST- AND SECOND-ORDER OPTIMALITY CONDITIONS FOR MATHEMATICAL PROGRAMS WITH VANISHING CONSTRAINTS 1 Tim Hoheisel and Christian Kanzow Dedicated to Jiří Outrata on the occasion of his 60th birthday Preprint

More information

Newton-type Methods for Solving the Nonsmooth Equations with Finitely Many Maximum Functions

Newton-type Methods for Solving the Nonsmooth Equations with Finitely Many Maximum Functions 260 Journal of Advances in Applied Mathematics, Vol. 1, No. 4, October 2016 https://dx.doi.org/10.22606/jaam.2016.14006 Newton-type Methods for Solving the Nonsmooth Equations with Finitely Many Maximum

More information

118 Cores, D. and Tapia, R. A. A Robust Choice of the Lagrange Multiplier... For example: chemical equilibrium and process control oil extraction, ble

118 Cores, D. and Tapia, R. A. A Robust Choice of the Lagrange Multiplier... For example: chemical equilibrium and process control oil extraction, ble A Robust Choice of the Lagrange Multiplier in the SQP Newton Method Debora Cores 1 Richard A. Tapia 2 1 INTEVEP, S.A. (Research Center of the Venezuelan Oil Company) Apartado 76343, Caracas 1070-A, Venezuela

More information

GLOBAL CONVERGENCE OF CONJUGATE GRADIENT METHODS WITHOUT LINE SEARCH

GLOBAL CONVERGENCE OF CONJUGATE GRADIENT METHODS WITHOUT LINE SEARCH GLOBAL CONVERGENCE OF CONJUGATE GRADIENT METHODS WITHOUT LINE SEARCH Jie Sun 1 Department of Decision Sciences National University of Singapore, Republic of Singapore Jiapu Zhang 2 Department of Mathematics

More information

IE 5531: Engineering Optimization I

IE 5531: Engineering Optimization I IE 5531: Engineering Optimization I Lecture 12: Nonlinear optimization, continued Prof. John Gunnar Carlsson October 20, 2010 Prof. John Gunnar Carlsson IE 5531: Engineering Optimization I October 20,

More information

Exact Penalty Functions for Nonlinear Integer Programming Problems

Exact Penalty Functions for Nonlinear Integer Programming Problems Exact Penalty Functions for Nonlinear Integer Programming Problems S. Lucidi, F. Rinaldi Dipartimento di Informatica e Sistemistica Sapienza Università di Roma Via Ariosto, 25-00185 Roma - Italy e-mail:

More information

A QP-FREE CONSTRAINED NEWTON-TYPE METHOD FOR VARIATIONAL INEQUALITY PROBLEMS. Christian Kanzow 1 and Hou-Duo Qi 2

A QP-FREE CONSTRAINED NEWTON-TYPE METHOD FOR VARIATIONAL INEQUALITY PROBLEMS. Christian Kanzow 1 and Hou-Duo Qi 2 A QP-FREE CONSTRAINED NEWTON-TYPE METHOD FOR VARIATIONAL INEQUALITY PROBLEMS Christian Kanzow 1 and Hou-Duo Qi 2 1 University of Hamburg Institute of Applied Mathematics Bundesstrasse 55, D-20146 Hamburg,

More information

Introduction. New Nonsmooth Trust Region Method for Unconstraint Locally Lipschitz Optimization Problems

Introduction. New Nonsmooth Trust Region Method for Unconstraint Locally Lipschitz Optimization Problems New Nonsmooth Trust Region Method for Unconstraint Locally Lipschitz Optimization Problems Z. Akbari 1, R. Yousefpour 2, M. R. Peyghami 3 1 Department of Mathematics, K.N. Toosi University of Technology,

More information

Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms

Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms Carlos Humes Jr. a, Benar F. Svaiter b, Paulo J. S. Silva a, a Dept. of Computer Science, University of São Paulo, Brazil Email: {humes,rsilva}@ime.usp.br

More information

Numerical Optimization

Numerical Optimization Constrained Optimization Computer Science and Automation Indian Institute of Science Bangalore 560 012, India. NPTEL Course on Constrained Optimization Constrained Optimization Problem: min h j (x) 0,

More information

230 L. HEI if ρ k is satisfactory enough, and to reduce it by a constant fraction (say, ahalf): k+1 = fi 2 k (0 <fi 2 < 1); (1.7) in the case ρ k is n

230 L. HEI if ρ k is satisfactory enough, and to reduce it by a constant fraction (say, ahalf): k+1 = fi 2 k (0 <fi 2 < 1); (1.7) in the case ρ k is n Journal of Computational Mathematics, Vol.21, No.2, 2003, 229 236. A SELF-ADAPTIVE TRUST REGION ALGORITHM Λ1) Long Hei y (Institute of Computational Mathematics and Scientific/Engineering Computing, Academy

More information

Technische Universität Dresden Herausgeber: Der Rektor

Technische Universität Dresden Herausgeber: Der Rektor Als Manuskript gedruckt Technische Universität Dresden Herausgeber: Der Rektor The Gradient of the Squared Residual as Error Bound an Application to Karush-Kuhn-Tucker Systems Andreas Fischer MATH-NM-13-2002

More information

Enhanced Fritz John Optimality Conditions and Sensitivity Analysis

Enhanced Fritz John Optimality Conditions and Sensitivity Analysis Enhanced Fritz John Optimality Conditions and Sensitivity Analysis Dimitri P. Bertsekas Laboratory for Information and Decision Systems Massachusetts Institute of Technology March 2016 1 / 27 Constrained

More information

What s New in Active-Set Methods for Nonlinear Optimization?

What s New in Active-Set Methods for Nonlinear Optimization? What s New in Active-Set Methods for Nonlinear Optimization? Philip E. Gill Advances in Numerical Computation, Manchester University, July 5, 2011 A Workshop in Honor of Sven Hammarling UCSD Center for

More information

Structural Dynamics and Materials Conference, Lake Tahoe, Nevada, May 2{

Structural Dynamics and Materials Conference, Lake Tahoe, Nevada, May 2{ [15] Jaroslaw Sobieszczanski-Sobieski, Benjamin James, and Augustine Dovi. Structural optimization by multilevel decomposition. In Proceedings of the AIAA/ASME/ASCE/AHS 24th Structures, Structural Dynamics

More information

A REDUCED HESSIAN METHOD FOR LARGE-SCALE CONSTRAINED OPTIMIZATION by Lorenz T. Biegler, Jorge Nocedal, and Claudia Schmid ABSTRACT We propose a quasi-

A REDUCED HESSIAN METHOD FOR LARGE-SCALE CONSTRAINED OPTIMIZATION by Lorenz T. Biegler, Jorge Nocedal, and Claudia Schmid ABSTRACT We propose a quasi- A REDUCED HESSIAN METHOD FOR LARGE-SCALE CONSTRAINED OPTIMIZATION by Lorenz T. Biegler, 1 Jorge Nocedal, 2 and Claudia Schmid 1 1 Chemical Engineering Department, Carnegie Mellon University, Pittsburgh,

More information

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

More information

ARE202A, Fall Contents

ARE202A, Fall Contents ARE202A, Fall 2005 LECTURE #2: WED, NOV 6, 2005 PRINT DATE: NOVEMBER 2, 2005 (NPP2) Contents 5. Nonlinear Programming Problems and the Kuhn Tucker conditions (cont) 5.2. Necessary and sucient conditions

More information

Interior-Point Methods for Linear Optimization

Interior-Point Methods for Linear Optimization Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function

More information

2 B. CHEN, X. CHEN AND C. KANZOW Abstract: We introduce a new NCP-function that reformulates a nonlinear complementarity problem as a system of semism

2 B. CHEN, X. CHEN AND C. KANZOW Abstract: We introduce a new NCP-function that reformulates a nonlinear complementarity problem as a system of semism A PENALIZED FISCHER-BURMEISTER NCP-FUNCTION: THEORETICAL INVESTIGATION AND NUMERICAL RESULTS 1 Bintong Chen 2, Xiaojun Chen 3 and Christian Kanzow 4 2 Department of Management and Systems Washington State

More information

Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method. Ryan Tibshirani Convex Optimization /36-725 Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

More information

QUADRATIC APPROXIMATION METHODS FOR CONSTRAINED PROBLEMS

QUADRATIC APPROXIMATION METHODS FOR CONSTRAINED PROBLEMS 10 QUADRATIC APPROXIMATION METHODS FOR CONSTRAINED PROBLEMS In the preceding two chapters we considered a number of alternative strategies for exploiting linear approximations to nonlinear problem functions.

More information

A sensitivity result for quadratic semidefinite programs with an application to a sequential quadratic semidefinite programming algorithm

A sensitivity result for quadratic semidefinite programs with an application to a sequential quadratic semidefinite programming algorithm Volume 31, N. 1, pp. 205 218, 2012 Copyright 2012 SBMAC ISSN 0101-8205 / ISSN 1807-0302 (Online) www.scielo.br/cam A sensitivity result for quadratic semidefinite programs with an application to a sequential

More information

INDEFINITE TRUST REGION SUBPROBLEMS AND NONSYMMETRIC EIGENVALUE PERTURBATIONS. Ronald J. Stern. Concordia University

INDEFINITE TRUST REGION SUBPROBLEMS AND NONSYMMETRIC EIGENVALUE PERTURBATIONS. Ronald J. Stern. Concordia University INDEFINITE TRUST REGION SUBPROBLEMS AND NONSYMMETRIC EIGENVALUE PERTURBATIONS Ronald J. Stern Concordia University Department of Mathematics and Statistics Montreal, Quebec H4B 1R6, Canada and Henry Wolkowicz

More information

ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints

ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints Instructor: Prof. Kevin Ross Scribe: Nitish John October 18, 2011 1 The Basic Goal The main idea is to transform a given constrained

More information

You should be able to...

You should be able to... Lecture Outline Gradient Projection Algorithm Constant Step Length, Varying Step Length, Diminishing Step Length Complexity Issues Gradient Projection With Exploration Projection Solving QPs: active set

More information

A null-space primal-dual interior-point algorithm for nonlinear optimization with nice convergence properties

A null-space primal-dual interior-point algorithm for nonlinear optimization with nice convergence properties A null-space primal-dual interior-point algorithm for nonlinear optimization with nice convergence properties Xinwei Liu and Yaxiang Yuan Abstract. We present a null-space primal-dual interior-point algorithm

More information