Printed as manuscript [Als Manuskript gedruckt]
Technische Universität Dresden
Publisher: The Rector [Herausgeber: Der Rektor]

The Gradient of the Squared Residual as Error Bound and an Application to Karush-Kuhn-Tucker Systems

Andreas Fischer

MATH-NM-13-2002
August 2002
The Gradient of the Squared Residual as Error Bound and an Application to Karush-Kuhn-Tucker Systems

Andreas Fischer
Department of Mathematics, University of Dresden, 01062 Dresden, Germany
fischer@math.tu-dresden.de

August 2002

Abstract. A general relationship between the natural residual of a system of equations and the necessary optimality conditions associated to the corresponding least squares problem is presented. Based on this, an error bound result for the gradient of a least squares reformulation of Karush-Kuhn-Tucker systems will be derived.

1 Introduction

Error bounds have turned out to be an essential tool for analyzing both the global and the local convergence behavior of algorithms for solving equations or more general problems. Besides their use for theoretical purposes, appropriate error bounds play a central role in several techniques for globalizing locally convergent algorithms. More recently, computable error bounds have been successfully employed for achieving or improving certain local properties of algorithms in cases where classical regularity conditions are violated. Let us first mention techniques for the locally accurate identification of constraints which are active at a solution of a Karush-Kuhn-Tucker system. Secondly, several modifications of Newton-type methods have been developed that guarantee superlinear convergence properties even in cases where no isolated solution exists. For such modifications the existence of an appropriate error bound is one of the key assumptions for superlinear convergence.

Let us consider the problem of solving the equation

    H(z) = 0,                                                        (1)

where H : ℝ^{l1} → ℝ^{l2} is a given map. Its solution set is assumed to be nonempty and
is denoted by Σ, i.e.,

    Σ := {z ∈ ℝ^{l1} | H(z) = 0}.                                    (2)

In [7] and [3] Levenberg-Marquardt type algorithms have been suggested for solving (1) provided that H is continuously differentiable. The local Q-quadratic rate of convergence in [7] relies, besides further assumptions, on an error bound condition like

    µ d[z, Σ] ≤ ‖H(z)‖  for all z ∈ Σ_0 + εB                         (3)

for some ε, µ > 0, where Σ_0 denotes a nonempty and closed subset of Σ and d[z, Σ] the distance of z to Σ. In [3] a general approach for solving generalized equations with nonisolated solutions has been developed. It exploits the upper Lipschitz-continuity of the solution set map belonging to a perturbed generalized equation. In the case of problem (1) this assumption is equivalent to the error bound condition above. As an application of the approach just mentioned, a prox-regularized Newton-type method for solving the necessary optimality conditions

    Q(z) := ∇H(z)H(z) = 0                                            (4)

associated to the least squares problem

    Ψ(z) := ½ H(z)ᵀH(z) → min

is suggested in [3, Section 5.4]. Therefore, instead of (3), an error bound condition for problem (4) is of importance. This condition reads

    µ̄ d[z, Σ] ≤ ‖Q(z)‖  for all z ∈ Σ_0 + εB                         (5)

for some ε, µ̄ > 0. Note that

    d[z, S] ≤ d[z, Σ]  for all z ∈ ℝ^{l1},

where S := {z ∈ ℝ^{l1} | Q(z) = 0} ⊇ Σ denotes the solution set of problem (4). Obviously, if ‖∇H(z)‖ is bounded in a neighborhood of Σ, the error bound condition (5) implies condition (3) with µ > 0 suitably chosen. However, since (3) is a natural error bound condition for problem (1), the question arises under which assumptions (3) implies (5). An answer to this basic question will be given by Theorem 1 in Section 2. This theorem is not only applicable to continuously differentiable maps but also to a certain class of nondifferentiable maps H. In particular, maps Φ can be dealt with that are frequently used to reformulate Karush-Kuhn-Tucker (KKT) systems as systems of nondifferentiable equations.
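The interplay between the two error bound conditions can be checked numerically on a toy instance. The following sketch is an assumed example, not taken from the paper: H(z) = z1·z2 has the union of the two coordinate axes as its (nonisolated) solution set Σ, and near Σ_0 := {(1, 0)} both (3) and (5) hold with the sampled constants µ and µ̄.

```python
import math
import random

# Toy instance (assumed for illustration): H(z) = z1*z2 with nonisolated
# solution set Sigma = {z : z1 = 0 or z2 = 0}, the union of the axes.
H = lambda z: z[0] * z[1]
gradH = lambda z: (z[1], z[0])                 # gradient of H
dist = lambda z: min(abs(z[0]), abs(z[1]))    # d[z, Sigma]

random.seed(0)
mu, mu_bar = 0.5, 0.25                        # sampled constants for (3) and (5)
for _ in range(1000):
    # sample z in Sigma_0 + eps*B with Sigma_0 = {(1, 0)} and eps = 0.1
    z = (1.0 + random.uniform(-0.1, 0.1), random.uniform(-0.1, 0.1))
    normQ = abs(H(z)) * math.hypot(*gradH(z))  # ||Q(z)|| = ||gradH(z) H(z)||
    assert mu * dist(z) <= abs(H(z))           # condition (3)
    assert mu_bar * dist(z) <= normQ           # condition (5)
```

Away from Σ_0, e.g. near the origin where the two solution branches cross, condition (3) itself fails for this H; this is one reason both conditions are stated only on Σ_0 + εB.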
To this end, we show in Section 3 that if the KKT system satisfies certain assumptions then the map Φ belongs to the class of nondifferentiable maps covered by Theorem 1. In Section 4, it is shown that there is a reasonably large class of KKT systems that satisfy this assumption. In particular, affine KKT systems do so. If (3) implies (5) then this can be exploited for the analysis of the local behavior of algorithms. Suppose that a line search algorithm makes use of search directions

    d := −M(z)∇H(z)H(z) = −M(z)Q(z)
with a certain matrix M(z) ∈ ℝ^{l1×l1}. Then, (3) and (5) can be helpful for estimating the local progress, in particular, if the step length is computed by decreasing a merit function like Ψ. A corresponding example is provided in the forthcoming paper [4]. There, a class of algorithms for solving KKT systems with nonisolated solutions is investigated with respect to the limiting behavior of the iterates and of active set estimates. Another advantage of having (5) is that any solution of Q(z) = 0 in a neighborhood of Σ_0 also solves H(z) = 0 without any further condition.

Notation: Throughout the paper ‖·‖ denotes the Euclidean vector norm or the induced matrix norm. The unit ball (of appropriate dimension) is always denoted by B. For a set S ⊆ ℝ^l and a point z ∈ ℝ^l the distance of z to S is defined as d[z, S] := inf{‖z − s‖ | s ∈ S} if S ≠ ∅ and d[z, S] := ∞ otherwise. Moreover, by

    Π(z, S) := {s ∈ S | ‖z − s‖ = d[z, S]}

we denote the set of all points in S that have minimal distance to z. If S is nonempty and closed then Π(z, S) is nonempty for any z ∈ ℝ^l. Let G : ℝ^{d1} → ℝ^{d2} be a locally Lipschitz-continuous function. Then, Clarke's generalized Jacobian of G at z ∈ ℝ^{d1} exists and is denoted by ∂G(z). If G is continuously differentiable at z then ∂G(z) = {∇G(z)ᵀ}. Definition and further properties of ∂G(z) can be found in [1].

2 The Gradient of the Squared Residual as Error Bound

Let us first discuss the question whether assumption (3) implies (5) in the classical situation where H is continuously differentiable and ∇H(z*) is nonsingular at a solution z* of (1). Then, with the continuity of ∇H, Taylor's formula shows that z* is an isolated solution. Moreover, in a neighborhood of z*, ∇H(z)⁻¹ exists and its norm is bounded above. Therefore, setting Σ_0 := {z*}, condition (3) for ε, µ > 0 sufficiently small implies (5) with µ̄ > 0 suitably chosen. If, however, z* is a nonisolated solution of (1), then ∇H(z*) must be singular as long as ∇H is continuous. Moreover, even if ∇H(z)⁻¹ exists for z close to z*, its norm cannot be bounded.
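As a minimal illustration of such a line search (a generic steepest-descent sketch under assumed data, not the method of [4]): with M(z) taken as the identity, the direction d = −Q(z) is combined with Armijo backtracking on the merit function Ψ, here for the toy map H(z) = (z1·z2) whose solution set is nonisolated.

```python
def armijo_step(z, H, gradPsi, beta=0.5, sigma=1e-4):
    """One step z -> z + t*d with d = -gradPsi(z) and Armijo backtracking on Psi."""
    Psi = lambda w: 0.5 * sum(h * h for h in H(w))
    d = [-g for g in gradPsi(z)]
    slope = -sum(g * g for g in gradPsi(z))   # directional derivative of Psi along d
    t, base = 1.0, Psi(z)
    while Psi([zi + t * di for zi, di in zip(z, d)]) > base + sigma * t * slope:
        t *= beta
    return [zi + t * di for zi, di in zip(z, d)]

# toy map H(z) = (z1*z2,): Q(z) = gradH(z) H(z) = (z1*z2^2, z1^2*z2)
H = lambda z: (z[0] * z[1],)
gradPsi = lambda z: (z[0] * z[1] ** 2, z[0] ** 2 * z[1])
z = [1.0, 0.5]
for _ in range(50):
    z = armijo_step(z, H, gradPsi)
assert 0.5 * (z[0] * z[1]) ** 2 < 1e-10       # Psi(z) driven (essentially) to zero
```

The iterates converge to some point of the solution set, not to a prescribed one; this is exactly the situation in which error bounds such as (3) and (5) replace nonsingularity assumptions in the local analysis.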
Nevertheless, (3) implies (5). This follows from Theorem 9 in [3]. This theorem is applicable not only to continuously differentiable maps but also to a certain class of nondifferentiable maps H. We will now present a corresponding theorem for a larger class of nondifferentiable maps H. To this end, let H : ℝ^{l1} → ℝ^{l2} denote a locally Lipschitz-continuous map and consider problem (1), i.e., H(z) = 0. According to (2), this problem is assumed to have a nonempty solution set denoted by Σ. The necessary optimality condition for minimizing the squared residual

    Ψ(z) = ½ H(z)ᵀH(z)

reads

    0 ∈ ∂Ψ(z),
where ∂Ψ(z) = ∂H(z)ᵀH(z) holds according to an appropriate chain rule [1]. We will now derive conditions under which the norm of elements in ∂Ψ(z) can serve as an error bound for d[z, Σ], i.e., for the distance of z to the solution set of (1).

Theorem 1 Let Σ_0 ⊆ Σ be nonempty and closed. Assume that there are ε, µ > 0 and σ ≥ 1 so that, for any z ∈ Σ_0 + εB, there are ẑ ∈ z + σ d[z, Σ]B and V ∈ ∂H(z) so that

    ‖H(z) + V(ẑ − z)‖ ≤ ½ µ d[z, Σ]                                  (6)

and

    µ d[z, Σ] ≤ ‖H(z)‖.                                              (7)

Then, with µ̄ := µ²(2σ)⁻¹,

    µ̄ d[z, Σ] ≤ ‖VᵀH(z)‖                                             (8)

is valid for all z ∈ Σ_0 + εB.

Proof. Choose any z ∈ Σ_0 + εB and any V ∈ ∂H(z). If z ∈ Σ, then inequality (8) is obviously valid. Otherwise, if z ∈ (Σ_0 + εB) \ Σ, multiply the vector within the norm in (6) by H(z)ᵀ. With (6), this yields

    ‖H(z)‖² − |H(z)ᵀV(ẑ − z)| ≤ H(z)ᵀH(z) + H(z)ᵀV(ẑ − z) ≤ ½ µ ‖H(z)‖ d[z, Σ].

From ẑ ∈ z + σ d[z, Σ]B it follows that

    ‖H(z)‖² ≤ ½ µ ‖H(z)‖ d[z, Σ] + σ d[z, Σ] ‖VᵀH(z)‖.

Dividing this by d[z, Σ] and taking into account (7), we obtain

    ½ µ ‖H(z)‖ ≤ σ ‖VᵀH(z)‖.

Using (7) once more, µ² d[z, Σ] ≤ 2σ ‖VᵀH(z)‖ follows, which is (8).

The previous theorem refines [3, Theorem 9]. In particular, (6) is a weaker assumption than the corresponding condition in [3]. Due to this refinement, it will be possible to apply Theorem 1 for the map H := Φ with Φ defined below.

3 Application to Karush-Kuhn-Tucker Systems

In this section the case is dealt with that H(z) = 0 reformulates the KKT system in a particular but frequently used manner. Assumptions will be provided which ensure that (6) is satisfied and, thus, that Theorem 1 is applicable to this case.
We first need to describe the KKT system and its reformulation as a system of equations in more detail. Let F : ℝⁿ → ℝⁿ be a continuously differentiable function, g : ℝⁿ → ℝ^m and h : ℝⁿ → ℝ^p twice continuously differentiable functions and consider the system

    L(x, u, v) = 0,
    h(x) = 0,                                                        (9)
    g(x) ≥ 0,  u ≥ 0,  uᵀg(x) = 0

with the Lagrangian L : ℝ^{n+m+p} → ℝⁿ given by

    L(x, u, v) := F(x) + ∇h(x)v − ∇g(x)u.

System (9) is well known as the Karush-Kuhn-Tucker (KKT) system belonging to the variational inequality problem

    Find x ∈ G so that F(x)ᵀ(ξ − x) ≥ 0 for all ξ ∈ G,               (10)

where G := {x ∈ ℝⁿ | h(x) = 0, g(x) ≥ 0}. If x is a solution of (10) and if a certain constraint qualification is satisfied at x, then (u, v) exists so that (x, u, v) solves (9). Moreover, under a certain constraint qualification, the system (9), with F := ∇f for f : ℝⁿ → ℝ sufficiently smooth, states necessary optimality conditions associated to the programming problem

    f(x) → min  s.t.  x ∈ G.

Therefore, a basic approach for solving such programs or the variational inequality problem (10) is to determine a solution of the KKT system (9). To this end, (9) is often reformulated as a system of equations. A frequently used approach is based on the function ϕ : ℝ² → ℝ given by

    ϕ(a, b) := √(a² + b²) − a − b.

Since ϕ(a, b) equals zero if and only if a ≥ 0, b ≥ 0, ab = 0, it is easily verified that (9) is equivalent to Φ(z) = 0, where z := (x, u, v) and

    Φ(z) := ( L(z), h(x), φ(z) )  with  φ(z) := (ϕ(g_1(x), u_1), ..., ϕ(g_m(x), u_m))ᵀ.

Now, if we set

    H := Φ,                                                          (11)

the same question as in Sections 1 and 2 becomes of interest, namely, which assumptions ensure that the error bound condition (3) implies (5). However, the function Q as defined in (4) and employed in (5) is not well defined now. This is due to the fact that ϕ is nondifferentiable at (0, 0) so that H is not necessarily everywhere differentiable. Nevertheless, the merit function

    Ψ(z) = ½ Φ(z)ᵀΦ(z)
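The defining property of ϕ, namely that ϕ(a, b) = 0 holds if and only if a ≥ 0, b ≥ 0, and ab = 0, is easy to check numerically; a small sketch:

```python
import math

def phi(a, b):
    """Fischer-Burmeister function phi(a, b) = sqrt(a^2 + b^2) - a - b."""
    return math.sqrt(a * a + b * b) - a - b

# phi vanishes exactly on the complementarity set {a >= 0, b >= 0, ab = 0}
assert abs(phi(0.0, 3.0)) < 1e-12 and abs(phi(2.0, 0.0)) < 1e-12
assert phi(1.0, 1.0) < 0.0    # a and b both positive: phi negative
assert phi(-1.0, 0.0) > 0.0   # a negative: phi positive
```

Applying ϕ componentwise to the pairs (g_i(x), u_i) gives exactly the map φ used in Φ.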
is continuously differentiable [2, 5] and the function Q can be defined by

    Q(z) := ∇Ψ(z).

It holds that

    ∇Ψ(z) = VᵀΦ(z)  for all V ∈ ∂Φ(z).                               (12)

Any matrix V ∈ ∂Φ(z) can be written as

    V = ( ∇_x L(x, u, v)ᵀ       −∇g(x)          ∇h(x)
          ∇h(x)ᵀ                 0               0                   (13)
          D_a(g(x), u)∇g(x)ᵀ     D_b(g(x), u)    0 )

with diagonal matrices D_a(g(x), u) and D_b(g(x), u). Their i-th diagonal entries a(g_i(x), u_i) and b(g_i(x), u_i), respectively, are given by

    a(g_i(x), u_i) = ∂_a ϕ(g_i(x), u_i),  b(g_i(x), u_i) = ∂_b ϕ(g_i(x), u_i)    (14)

if (g_i(x), u_i) ≠ (0, 0), where

    ∂_a ϕ(a, b) = a/√(a² + b²) − 1,  ∂_b ϕ(a, b) = b/√(a² + b²) − 1              (15)

for (a, b) ≠ (0, 0). Otherwise, if (g_i(x), u_i) = (0, 0), there are α_i, β_i ∈ ℝ so that

    a(g_i(x), u_i) = α_i − 1,  b(g_i(x), u_i) = β_i − 1  with  α_i² + β_i² ≤ 1.  (16)

To answer the main question under which assumptions (3) implies (5) we would like to apply Theorem 1. Therefore, besides the error bound condition (7), which corresponds to (3), condition (6) needs to be satisfied. To achieve this in the case that H = Φ the subsequent assumption plays a key role. Its formulation and the analysis thereafter require some index sets. With I := {1, ..., m} let

    I_C(z) := {i ∈ I | g_i(x) = u_i = 0}

denote the set of all indices that are complementary at z ∈ ℝ^{n+m+p}. Moreover, for z ∈ ℝ^{n+m+p} and t ≥ 0 define

    I(z, t) := {i ∈ I | max{u_i, g_i(x)} ≤ t}.

Note that I_C(z) ⊆ I(z, t) for any z ∈ Σ and any t ≥ 0.

Assumption 1 Let Σ_0 ⊆ Σ be nonempty and closed and ε_1 > 0 be given. For any N ≥ 1 there are σ > 0 and τ > 0 so that, for any y ∈ (Σ_0 + ε_1 B) ∩ Σ and any t ∈ [0, τ], there is y_t ∈ Σ with

    ‖y − y_t‖ ≤ σt  and  I(y_t, N max{‖y − y_t‖, t}) = I_C(y_t).
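The formulas (14)-(16) for the diagonal entries can be checked numerically. The sketch below (with assumed helper names) evaluates (15) away from the origin, verifies it against finite differences, and confirms the bound implicit in (16) that both entries always lie in [−2, 0].

```python
import math

phi = lambda a, b: math.hypot(a, b) - a - b

def d_phi(a, b):
    """Partial derivatives (15) of phi at a point (a, b) != (0, 0)."""
    r = math.hypot(a, b)
    return (a / r - 1.0, b / r - 1.0)

a, b, h = 1.5, -0.7, 1e-6
da, db = d_phi(a, b)
# finite-difference check of (15) at a point of differentiability
assert abs((phi(a + h, b) - phi(a - h, b)) / (2 * h) - da) < 1e-6
assert abs((phi(a, b + h) - phi(a, b - h)) / (2 * h) - db) < 1e-6
# both diagonal entries lie in [-2, 0], consistent with (16)
assert -2.0 <= da <= 0.0 and -2.0 <= db <= 0.0
```

Since a/√(a² + b²) and b/√(a² + b²) range over [−1, 1], the entries of D_a and D_b are uniformly bounded, which is what makes the componentwise estimates in the proof of Lemma 2 below possible.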
Roughly speaking, this assumption requires that for any point in the set (Σ_0 + ε_1 B) ∩ Σ there is a point in Σ so that both points are not too far away from each other and that the complementary indices of the latter point are stable in a certain sense. Before exploiting Assumption 1 let us refer to Section 4. There, an error bound condition is presented under which Assumption 1 can be fulfilled. Besides the smoothness conditions stated at the beginning of this section we will make use of the following additional Lipschitz-continuity conditions.

Assumption 2 There is L ≥ 1 so that, for all z, z′ ∈ Σ_0 + B,

a) |g_i(x) − g_i(x′)| ≤ L ‖x − x′‖ for all i ∈ I,

b) ‖∇g_i(x) − ∇g_i(x′)‖ ≤ L ‖x − x′‖ for all i ∈ I, ‖∇h(x) − ∇h(x′)‖ ≤ L ‖x − x′‖, and ‖∇L(z) − ∇L(z′)‖ ≤ L ‖z − z′‖.

If Σ_0 is bounded then Assumption 2 except the third inequality in part b) is satisfied. If, in addition, ∇F, ∇²g, and ∇²h are locally Lipschitz-continuous then the latter inequality holds as well.

Lemma 1 Suppose that Assumption 1 is satisfied. Then, for any N ≥ 1, there is ρ̂ > 0 so that, for any z ∈ Σ_0 + ρ̂B, a vector

    ẑ ∈ Σ                                                            (17)

exists with

    ‖z − ẑ‖ ≤ (σ + 1) d[z, Σ]                                        (18)

and

    g_i(x̂) + û_i > ½ N max{‖z − ẑ‖, d[z, Σ]}  for all i ∈ I \ I_C(ẑ).    (19)

If, in addition, Assumption 2 a) holds and N ≥ 4L, then

    I_C(z) ⊆ I_C(ẑ).                                                 (20)

Proof. Let N ≥ 1 be arbitrary but fixed. Then, with ε_1, σ > 0, and τ > 0 existing due to Assumption 1, define

    ρ̂ := min{1, τ, ½ ε_1, 1/(σ + 2)}.                                (21)

Now, choose any z ∈ Σ_0 + ρ̂B. Then, there is y ∈ Π(z, Σ). From (21) it follows that

    d[y, Σ_0] ≤ ‖z − y‖ + d[z, Σ_0] ≤ 2 d[z, Σ_0] ≤ 2ρ̂ ≤ ε_1.

Thus,

    y ∈ (Σ_0 + ε_1 B) ∩ Σ.                                           (22)

To apply Assumption 1, define

    t := d[z, Σ].                                                    (23)

Due to (21), this implies

    t ≤ d[z, Σ_0] ≤ ρ̂ ≤ τ.                                           (24)
Assumption 1 together with (22)-(24) ensures that ẑ := y_t ∈ Σ exists with

    ‖y − ẑ‖ ≤ σt = σ d[z, Σ]

and

    g_i(x̂) + û_i > N max{‖y − ẑ‖, t}  for all i ∈ I \ I_C(ẑ).

We therefore obtain (18) by

    ‖z − ẑ‖ ≤ ‖z − y‖ + ‖y − ẑ‖ ≤ (1 + σ) d[z, Σ]

and (19) since

    g_i(x̂) + û_i > N max{‖y − ẑ‖, t} ≥ N max{‖z − ẑ‖ − ‖y − z‖, t} = N max{‖z − ẑ‖ − t, t} ≥ ½ N max{‖z − ẑ‖, t}

for all i ∈ I \ I_C(ẑ). To verify (20) we first note that, by (21) and (18),

    z ∈ Σ_0 + ρ̂B ⊆ Σ_0 + B,  ẑ ∈ z + (σ + 1) d[z, Σ]B ⊆ Σ_0 + ρ̂B + (σ + 1)ρ̂B ⊆ Σ_0 + B.    (25)

Hence, Assumption 2 a), N ≥ 4L, and L ≥ 1 provide

    g_i(x̂) + û_i ≤ |g_i(x̂) − g_i(x)| + |û_i − u_i| ≤ L ‖x̂ − x‖ + ‖û − u‖ ≤ ½ N ‖ẑ − z‖

for any i ∈ I_C(z). Therefore, (19) yields i ∈ I_C(ẑ).

Lemma 2 Suppose that Assumptions 1 and 2 are satisfied and that µ > 0 is given. Then, there is ρ > 0 so that, for any z ∈ Σ_0 + ρB, a vector ẑ ∈ Σ exists with

    ‖z − ẑ‖ ≤ (σ + 1) d[z, Σ]                                        (26)

and

    ‖Φ(z) + V(ẑ − z)‖ ≤ ½ µ d[z, Σ]  for all V ∈ ∂Φ(z).              (27)

Proof. To apply Lemma 1 choose N so that

    N ≥ 4L.                                                          (28)

Then, according to Lemma 1, ρ̂ > 0 exists so that, for any z ∈ Σ_0 + ρ̂B, there is ẑ so that (17)-(20) are satisfied. Based on this we will show that (z, ẑ) also satisfies (26) and (27) if z ∈ Σ_0 + ρB, where ρ is given by

    ρ := min{ρ̂, µ/(8√m (σ + 1)² L)}.                                 (29)

Obviously, (26) directly follows from (18). To prove (27), the term

    R(z, ẑ) := Φ(z) + V(ẑ − z)                                       (30)
will be investigated componentwise for V ∈ ∂Φ(z) arbitrary but fixed. The first n + p components of R(z, ẑ) read as follows:

    R_{1..n}(z, ẑ) = L(z) + ∇L(z)ᵀ(ẑ − z),
    R_{n+1..n+p}(z, ẑ) = h(x) + ∇h(x)ᵀ(x̂ − x).

Taylor's formula, L(ẑ) = 0 due to (17), and Assumption 2 b) with z, ẑ ∈ Σ_0 + B (see (25)) yield

    R_{1..n}(z, ẑ) = −∫₀¹ (∇L(z + s(ẑ − z)) − ∇L(z))ᵀ(ẑ − z) ds,  ‖R_{1..n}(z, ẑ)‖ ≤ L ‖ẑ − z‖².

Taking into account (26) and (29) we further get

    ‖R_{1..n}(z, ẑ)‖ ≤ L(σ + 1)² d[z, Σ]² ≤ (µ/8) d[z, Σ].           (31)

In the same way one can show that

    ‖R_{n+1..n+p}(z, ẑ)‖ ≤ (µ/8) d[z, Σ].                            (32)

We now consider the last m components of (30), i.e.,

    R_{n+p+i}(z, ẑ) = φ_i(z) + v_i(ẑ − z),  i = 1, ..., m,

where v_i is the (n + p + i)-th row of V, thus v_i ∈ ∂φ_i(z). Taylor's formula yields

    ∇g_i(x)ᵀ(x̂ − x) = g_i(x̂) − g_i(x) − r_i(x, x̂)                    (33)

with

    r_i(x, x̂) := ∫₀¹ (∇g_i(x + s(x̂ − x)) − ∇g_i(x))ᵀ(x̂ − x) ds.

Similar to showing (31), we get

    |r_i(x, x̂)| ≤ µ/(8√m) d[z, Σ].                                   (34)

Now, for any i ∈ I two cases are distinguished:

a) i ∈ I_C(z). Due to (20) in Lemma 1, i ∈ I_C(ẑ) follows. Thus,

    i ∈ I_C(z) ∩ I_C(ẑ)  and  g_i(x) = g_i(x̂) = u_i = û_i = φ_i(z) = 0.    (35)

Using the representation (13) of matrices V contained in ∂Φ(z) together with (16), we get

    R_{n+p+i}(z, ẑ) = φ_i(z) + v_i(ẑ − z) = (α_i − 1)∇g_i(x)ᵀ(x̂ − x) + (β_i − 1)(û_i − u_i) = (α_i − 1)∇g_i(x)ᵀ(x̂ − x).

Therefore, (33), (35), (34), and |α_i − 1| ≤ 2 (by (16)) imply

    |R_{n+p+i}(z, ẑ)| ≤ µ/(4√m) d[z, Σ].
b) i ∈ I \ I_C(z). Then, (g_i(x), u_i) ≠ (0, 0) so that φ_i is continuously differentiable at z = (x, u, v). With (13) and (14), we obtain

    ∇φ_i(z)ᵀ(ẑ − z) = ∂_a ϕ(g_i(x), u_i) ∇g_i(x)ᵀ(x̂ − x) + ∂_b ϕ(g_i(x), u_i)(û_i − u_i).

Having (15) in mind and setting ∆ := √(g_i(x)² + u_i²), we further get

    ∇φ_i(z)ᵀ(ẑ − z) = (g_i(x)/∆ − 1) ∇g_i(x)ᵀ(x̂ − x) + (u_i/∆ − 1)(û_i − u_i).

Together with (33), we have

    R_{n+p+i}(z, ẑ) = φ_i(z) + ∇φ_i(z)ᵀ(ẑ − z)
      = ∆ − g_i(x) − u_i + (g_i(x)/∆ − 1)(g_i(x̂) − g_i(x) − r_i(x, x̂)) + (u_i/∆ − 1)(û_i − u_i)
      = ∆ − g_i(x)²/∆ − u_i²/∆ + (g_i(x)/∆ − 1)(g_i(x̂) − r_i(x, x̂)) + (u_i/∆ − 1)û_i

and, by the definition of ∆,

    R_{n+p+i}(z, ẑ) = (g_i(x)/∆ − 1)(g_i(x̂) − r_i(x, x̂)) + (u_i/∆ − 1)û_i.    (36)

Now, three subcases of case b) are considered.

b1) i ∈ I_C(ẑ). Then, g_i(x̂) = û_i = 0. From (36) and (34),

    |R_{n+p+i}(z, ẑ)| = |g_i(x)/∆ − 1| |r_i(x, x̂)| ≤ µ/(4√m) d[z, Σ]

follows.

b2) i ∈ I \ I_C(ẑ) and g_i(x̂) > 0. Then, by (17), û_i = 0. Moreover, (19) can be exploited. Therefore, having Assumption 2 a) and (28) in mind, we get

    g_i(x) ≥ g_i(x̂) − L‖x − x̂‖ > ½ g_i(x̂) + ¼ N max{‖z − ẑ‖, d[z, Σ]} − L‖x − x̂‖ ≥ ½ g_i(x̂) > 0    (37)

and

    g_i(x) ≥ ½ g_i(x̂) > ¼ N max{‖z − ẑ‖, d[z, Σ]}.                   (38)

This, Assumption 2 a), and (28) yield

    g_i(x′) ≥ g_i(x) − L‖x − x′‖ ≥ g_i(x) − L d[z, Σ] > 0
for z′ = (x′, u′, v′) ∈ Π(z, Σ). Since z′ ∈ Σ, this implies u′_i = 0 and

    u_i = u_i − u′_i ≤ ‖z − z′‖ = d[z, Σ].                           (39)

By an appropriate Taylor expansion we have

    √(a² + b²) − a ≤ b²/(2a)  for all (a, b) ∈ (0, ∞) × ℝ.

Setting a := g_i(x) and b := u_i, it follows with (37) and (39) that

    |g_i(x)/∆ − 1| = (∆ − g_i(x))/∆ ≤ u_i²/(2 g_i(x) ∆) ≤ d[z, Σ]²/(2 g_i(x)²).

Therefore, with (37) and (34), we further get

    |R_{n+p+i}(z, ẑ)| = |g_i(x)/∆ − 1| |g_i(x̂) − r_i(x, x̂)| ≤ (d[z, Σ]²/(2 g_i(x)²)) (2 g_i(x) + µ/(8√m) d[z, Σ]).

This and (38) lead to

    |R_{n+p+i}(z, ẑ)| ≤ (4/N) d[z, Σ] + (µ/(N²√m)) d[z, Σ].

Obviously, for N sufficiently large,

    |R_{n+p+i}(z, ẑ)| ≤ µ/(4√m) d[z, Σ]

follows.

b3) i ∈ I \ I_C(ẑ) and û_i > 0. In a very similar way the same estimate as in case b2) can be obtained by carefully interchanging certain terms (g_i(x̂) with û_i or g_i(x′) with u′_i, for instance).

The results for the cases a) and b1)-b3) together with (31) and (32) show that

    ‖Φ(z) + V(ẑ − z)‖ = ‖R(z, ẑ)‖ ≤ ½ µ d[z, Σ]

holds for all V ∈ ∂Φ(z).

Theorem 2 Suppose that Assumptions 1 and 2 are satisfied. Moreover, assume that there are ε, µ > 0 so that

    µ d[z, Σ] ≤ ‖Φ(z)‖  for all z ∈ Σ_0 + εB.                        (40)

Then, there are ε̄, µ̄ > 0 so that

    µ̄ d[z, Σ] ≤ ‖∇Ψ(z)‖  for all z ∈ Σ_0 + ε̄B.

Proof. Apply Theorem 1 for H := Φ and with σ + 1 instead of σ, where (12) and Lemma 2 with µ > 0 from (40) have to be taken into account.
4 Pleasant Karush-Kuhn-Tucker Systems

In this section conditions are provided under which Assumption 1 is satisfied. To proceed let us first define and investigate the activity pattern belonging to any z ∈ Σ.

Definition 1 For any z ∈ Σ the activity pattern p(z) := (g(z), u(z)) is defined by

    g(z) := {i ∈ I | g_i(x) = 0},  u(z) := {i ∈ I | u_i = 0}.

All activity patterns of points z ∈ Σ are collected in the set

    P(Σ) := {p(z) | z ∈ Σ}.

In addition, let

    P := {p = (g, u) | g, u ⊆ {1, ..., m}, g ∪ u = {1, ..., m}}.

Elements p_1 = (g_1, u_1) and p_2 = (g_2, u_2) of P are said to be related, p_1 ≼ p_2 for short, if g_1 ⊆ g_2 and u_1 ⊆ u_2. In addition, p_1 ≺ p_2 is used to denote that p_1 ≼ p_2 and p_1 ≠ p_2. Any element p ∈ P(Σ) is called maximal if no q ∈ P(Σ) exists with p ≺ q. The set of all maximal elements in P(Σ) is denoted by P_max(Σ). Finally, let

    P_a(Σ) := {p ∈ P | p ≼ q for some q ∈ P(Σ)}.

Obviously, P collects all those activity patterns that are potentially possible but need not occur in the solution set of a particular KKT system. The only requirement an element (g, u) of P has to satisfy is the complementarity condition, i.e., that each index i ∈ I is contained in at least one of the sets g and u. Moreover, the inclusions

    P(Σ) ⊆ P_a(Σ) ⊆ P

can easily be verified. For any p = (g, u) ∈ P, let the map F_p and the cone K_p be defined by

    F_p(z) := ( L(z), h(x), g_g(x), u_u, g(x), u ),  K_p := {0}^{n+p+|g|+|u|} × ℝ^{m+m}_+,

where g_g(x) := (g_i(x))_{i∈g} and u_u := (u_i)_{i∈u}. Then, for any p = (g, u) ∈ P, the set

    Σ_p := {z ∈ ℝ^{n+m+p} | F_p(z) ∈ K_p}

is possibly empty and contained in Σ. For affine KKT systems any set Σ_p with p ∈ P is a closed polyhedron.
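Definition 1 is purely combinatorial and can be mirrored in a few lines of code. The following sketch works with an assumed, hypothetical pattern family P(Σ) for m = 2; it computes the relation ≼, the maximal elements, and P_a(Σ):

```python
from itertools import combinations

def powerset(s):
    """All subsets of s as frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def related(p1, p2):
    """p1 ≼ p2 iff g1 ⊆ g2 and u1 ⊆ u2 (componentwise inclusion)."""
    return p1[0] <= p2[0] and p1[1] <= p2[1]

I = frozenset({1, 2})
# P: all patterns (g, u) with g ∪ u = I, i.e. each index in at least one set
P = [(g, u) for g in powerset(I) for u in powerset(I) if g | u == I]

# hypothetical family P(Sigma) of patterns occurring in some solution set
P_Sigma = [(frozenset({1}), frozenset({2})), (frozenset({1, 2}), frozenset({2}))]
P_max = [p for p in P_Sigma if not any(related(p, q) and p != q for q in P_Sigma)]
P_a = [p for p in P if any(related(p, q) for q in P_Sigma)]

assert P_max == [(frozenset({1, 2}), frozenset({2}))]
assert all(p in P_a for p in P_Sigma)   # P(Sigma) ⊆ P_a(Sigma) ⊆ P
```

Since P is finite, both P_max(Σ) and P_a(Σ) are finite as well; this finiteness is what later allows uniform constants δ and ω in Assumption 3.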
Lemma 3 a) If p_1, p_2 ∈ P, then p_1 ≼ p_2 implies Σ_{p_2} ⊆ Σ_{p_1}. b) For any p ∈ P_a(Σ), there is p_max ∈ P_max(Σ) so that p ≼ p_max. c) The set Σ_p is nonempty if and only if p ∈ P_a(Σ).

Proof. Obvious.

Lemma 4 Let Σ_2 ⊆ Σ be nonempty and compact. Then, there is κ_0 ∈ (0, 1] so that

    g_i(x) + u_i ≥ κ_0  for all i ∈ I \ I_C(z)

for any z ∈ Σ_2 with p(z) = (g(z), u(z)) ∈ P_max(Σ).

Proof. Assume the contrary. Then, a sequence {z^ν} ⊂ Σ_2, a point z* ∈ Σ_2, p ∈ P_max(Σ), and i ∈ I must exist so that

    (i, i) ∉ p(z^ν) = p ∈ P_max(Σ)  for all ν ∈ ℕ,  lim_ν (g_i(x^ν) + u_i^ν) = 0,  lim_ν z^ν = z*.

With the continuity of g it follows that p ≺ p(z*). Thus, p cannot be maximal.

Lemma 5 Let Σ_2 ⊆ Σ be nonempty and compact. Then, for any η > 0, there is κ ∈ (0, κ_0] so that, for all z ∈ Σ_2 and all i ∈ I,

    g_i(x) + u_i ≤ κ                                                 (41)

implies

    p(z) ∪ (i, i) ∈ P_a(Σ)                                           (42)

and

    inf{‖z − s‖ | s ∈ Σ_{p(z)∪(i,i)}} ≤ η.                           (43)

Proof. Fix η > 0. Assume first that there is no κ ∈ (0, κ_0] so that (41) implies (42) for all z ∈ Σ_2 and all i ∈ I. Then, sequences {κ_ν} ⊂ (0, κ_0] and {z^ν} ⊂ Σ_2, j ∈ I, and ẑ = (x̂, û, v̂) ∈ Σ_2 must exist with

    lim_ν κ_ν = 0,  lim_ν z^ν = ẑ,                                   (44)

    g_j(x^ν) + u_j^ν ≤ κ_ν  for all ν ∈ ℕ,                           (45)

    p(z^ν) ∪ (j, j) ∈ P \ P_a(Σ)  for all ν ∈ ℕ.                     (46)

Taking suitable subsequences if necessary we have that, without loss of generality,

    p(z^ν) = p̂  for all ν ∈ ℕ                                        (47)

for some fixed p̂ = (ĝ, û) ∈ P(Σ). Now consider any i ∈ I. If i ∈ ĝ, we have from (44) and by the continuity of g_i that

    lim_ν g_i(x^ν) = g_i(x̂) = 0,  lim_ν u_i^ν = û_i ≥ 0.
If i ∈ û, we get

    lim_ν g_i(x^ν) = g_i(x̂) ≥ 0,  lim_ν u_i^ν = û_i = 0.

Hence, with ẑ ∈ Σ_2 ⊆ Σ,

    p̂ ≼ p(ẑ) ∈ P(Σ)                                                  (48)

follows. Since (45) implies

    lim_ν (g_j(x^ν) + u_j^ν) = g_j(x̂) + û_j = 0,

we further get (j, j) ∈ p(ẑ). This together with (48) yields

    p̂ ≼ p̂ ∪ (j, j) ≼ p(ẑ) ∈ P(Σ).                                    (49)

Thus, p̂ ∪ (j, j) ∈ P_a(Σ), which contradicts (46) and (47). Therefore, (42) is valid for all z ∈ Σ_2.

To show that (43) is implied by (41) first note that, due to (42) and Lemma 3 c), Σ_{p(z)∪(i,i)} ≠ ∅ for all z ∈ Σ_2 and all i ∈ I satisfying (41). Thus, the left term in (43) is well defined. Let us assume that (43) does not hold. Then, we can repeat all steps of the previous part of the proof until formula (49) with the only modification that (46) is replaced by

    inf{‖z^ν − y‖ | y ∈ Σ_{p(z^ν)∪(j,j)}} > η  for all ν ∈ ℕ.        (50)

From (49), Lemma 3 a), and (47) we have

    Σ_{p(ẑ)} ⊆ Σ_{p̂∪(j,j)} = Σ_{p(z^ν)∪(j,j)}  for all ν ∈ ℕ.

Therefore, since ẑ ∈ Σ_{p(ẑ)},

    inf{‖z^ν − y‖ | y ∈ Σ_{p(z^ν)∪(j,j)}} ≤ ‖z^ν − ẑ‖

follows. By (44), this contradicts (50) for ν ∈ ℕ sufficiently large.

Assumption 3 (Pleasant KKT System) There are ω ∈ (0, 1] and δ > 0 so that, for any p ∈ P_a(Σ),

    ω d[z, Σ_p] ≤ inf{‖F_p(z) − f‖ | f ∈ K_p}  for all z ∈ Σ_p + δB.

Theorem 3 Let Assumption 3 be satisfied and suppose that Σ_0 ⊆ Σ is nonempty and compact. Moreover, let ε_1 > 0 be given. Then, Assumption 1 is satisfied.

Proof. Let N ≥ 1 be arbitrary but fixed and define σ and the sets Σ_1, Σ_2 ⊆ Σ by

    σ := (1 + ω⁻¹N)^m,  Σ_1 := (Σ_0 + ε_1 B) ∩ Σ,  Σ_2 := (Σ_1 + σB) ∩ Σ.
Since Σ_0 is compact by assumption, the same holds for Σ_1 and Σ_2. Therefore, with η := δ (δ > 0 from Assumption 3), Lemma 5 provides some κ ∈ (0, κ_0] ⊆ (0, 1] and we can define

    τ := κσ⁻¹ < 1.                                                   (51)

Now, choose any (y, t) ∈ Σ_1 × [0, τ] and define vectors z⁰, ..., z^m and numbers σ_0, ..., σ_m recursively as follows. First, let z⁰ := y. To define z^{k+1} from z^k for k ∈ {0, ..., m−1} choose

    i_k ∈ argmin{g_i(x^k) + u_i^k | i ∈ I \ I_C(z^k)}.

If

    g_{i_k}(x^k) + u_{i_k}^k ≤ N max{‖y − z^k‖, t},                  (52)

then set

    p^k := p(z^k) ∪ (i_k, i_k)                                       (53)

and choose

    z^{k+1} ∈ argmin{‖z − z^k‖ | z ∈ Σ_{p^k}}.                       (54)

Otherwise, set z^{k+1} := z^k. Finally, let

    σ_k := (1 + ω⁻¹N)^k                                              (55)

for k = 0, ..., m. We now show by induction that z⁰, ..., z^m are well defined and that

    ‖y − z^k‖ ≤ σ_k t  and  z^k ∈ Σ_2                                (56)

holds for k = 0, 1, ..., m. For k := 0 we get z⁰ = y and σ_0 = 1 so that (56) is obviously satisfied. Now, let (56) be valid for some k ∈ {0, ..., m−1}. If (52) is violated, then z^{k+1} = z^k and (56) must hold. Therefore, we only need to consider the case that (52) is satisfied. In view of σ_k ≥ 1, t ∈ [0, τ], (56), (55), and (51), this implies

    g_{i_k}(x^k) + u_{i_k}^k ≤ Nσ_k t ≤ N(1 + ω⁻¹N)^k τ ≤ σ_{k+1}τ ≤ στ = κ.    (57)

Therefore, and since z^k ∈ Σ_2, we can apply Lemma 5. Together with (53),

    p(z^k) ≼ p^k = p(z^k) ∪ (i_k, i_k) ∈ P_a(Σ)                      (58)

follows. Thus, by Lemma 3 c), the closed set Σ_{p^k} is nonempty so that z^{k+1} is well defined by (54). Moreover, (43) in Lemma 5 (with η := δ) gives

    inf{‖z^k − s‖ | s ∈ Σ_{p^k}} = ‖z^k − z^{k+1}‖ ≤ η = δ.

Hence, since z^{k+1} ∈ Σ_{p^k}, Assumption 3 can be exploited for z := z^k and p := p^k and leads to

    ‖z^k − z^{k+1}‖ = d[z^k, Σ_{p^k}] ≤ ω⁻¹ inf{‖F_{p^k}(z^k) − f‖ | f ∈ K_{p^k}}.
Due to z^k ∈ Σ_{p(z^k)} and (53), the definition of F_{p^k} yields

    inf{‖F_{p^k}(z^k) − f‖ | f ∈ K_{p^k}} = g_{i_k}(x^k) + u_{i_k}^k

so that, with (57),

    ‖z^k − z^{k+1}‖ ≤ ω⁻¹Nσ_k t

follows. Therefore, by (56), t ∈ [0, τ], τ < 1 from (51), and (55), we have

    ‖y − z^{k+1}‖ ≤ ‖y − z^k‖ + ‖z^k − z^{k+1}‖ ≤ (1 + ω⁻¹N)σ_k t = σ_{k+1} t ≤ σ.

Thus, (56) is true for k + 1 instead of k and so for all k ∈ {0, ..., m}.

From (58) for k = 0, ..., m−1 it follows that

    p(z^k) ≼ p^k ≼ p(z^{k+1})  for all k ∈ {0, ..., m−1}.

Together with I ⊆ g(z⁰) ∪ u(z⁰), for p(z⁰) = (g(z⁰), u(z⁰)), we have that there is k_0 ∈ {0, ..., m} so that

    p(z^k) ∈ P_max(Σ)  for all k ∈ {k_0, ..., m}.

According to Lemma 4 this means that

    g_i(x^k) + u_i^k ≥ κ_0  for all i ∈ I \ I_C(z^k) and all k ∈ {k_0, ..., m}.

Therefore, (52) is violated for all k ∈ {k_0, ..., m}. In particular, together with (56) for k = m, it follows that

    ‖y − z^m‖ ≤ σt  and  g_i(x^m) + u_i^m > N max{‖y − z^m‖, t}  for all i ∈ I \ I_C(z^m).

Hence, y_t := z^m has exactly the properties required in Assumption 1.

Corollary 1 Suppose that Σ_0 ⊆ Σ is nonempty and compact. Moreover, let Assumptions 2 and 3 be satisfied. If there are ε, µ > 0 so that

    µ d[z, Σ] ≤ ‖Φ(z)‖  for all z ∈ Σ_0 + εB,

then there are ε̄, µ̄ > 0 so that

    µ̄ d[z, Σ] ≤ ‖∇Ψ(z)‖  for all z ∈ Σ_0 + ε̄B.

Proof. The assertion directly follows from Theorem 2 and Theorem 3.

For affine KKT systems, i.e., if F, g, and h are affine functions, Corollary 1 can be simplified as follows.

Corollary 2 Suppose that Σ_0 ⊆ Σ is nonempty and compact. If the KKT system is affine then there are ε̄, µ̄ > 0 so that

    µ̄ d[z, Σ] ≤ ‖∇Ψ(z)‖  for all z ∈ Σ_0 + ε̄B.
Proof. Since F, g, and h are affine, the function F_p is affine for any p ∈ P. Thus, Assumption 2 is satisfied. Moreover, by Lemma 3 c), Σ_p is nonempty for all p ∈ P_a(Σ). Therefore, Hoffman's error bound [6] for affine systems of inequalities ensures that, for any p ∈ P_a(Σ), there are δ_p, ω_p > 0 so that

    ω_p d[z, Σ_p] ≤ inf{‖F_p(z) − f‖ | f ∈ K_p}  for all z ∈ Σ_p + δ_p B.

Since P_a(Σ) is a finite set, Assumption 3 is satisfied with δ := min{δ_p | p ∈ P_a(Σ)} and ω := min{ω_p | p ∈ P_a(Σ)}. Hence, by Theorem 3, Assumption 1 holds for any ε_1 > 0. Altogether, Theorem 2 provides the desired result.

Affine KKT systems are of particular interest for future research. We think that the compactness of Σ_0 ⊆ Σ as often assumed in this paper can be removed for affine KKT systems. Another improvement of the results in this section might be obtained by using a more local version of Assumption 3 so that not all Σ_p with p ∈ P_a(Σ) occur.

References

[1] Clarke, F.H. (1983): Optimization and Nonsmooth Analysis. John Wiley & Sons, New York

[2] Facchinei, F., Soares, J. (1997): A new merit function for nonlinear complementarity problems and a related algorithm. SIAM Journal on Optimization, 7, 225-247

[3] Fischer, A. (2001): Local behavior of an iterative framework for generalized equations with nonisolated solutions. Applied Mathematics Report 203, Department of Mathematics, University of Dortmund, Dortmund (revised 2002)

[4] Fischer, A. (2002): Limiting behavior of an algorithmic framework for Karush-Kuhn-Tucker systems. Forthcoming.

[5] Geiger, C., Kanzow, C. (1996): On the resolution of monotone complementarity problems. Computational Optimization and Applications, 5, 155-173

[6] Hoffman, A.J. (1952): On approximate solutions of systems of linear inequalities. Journal of Research of the National Bureau of Standards, 49, 263-265

[7] Yamashita, N., Fukushima, M. (2001): On the rate of convergence of the Levenberg-Marquardt method. Computing, 15, 239-249