
Printed as manuscript (Als Manuskript gedruckt)
Technische Universität Dresden
Publisher: The Rector (Herausgeber: Der Rektor)

The Gradient of the Squared Residual as Error Bound - an Application to Karush-Kuhn-Tucker Systems

Andreas Fischer
MATH-NM-13-2002
August 2002

The Gradient of the Squared Residual as Error Bound - an Application to Karush-Kuhn-Tucker Systems

Andreas Fischer
Department of Mathematics, University of Dresden, 01062 Dresden, Germany
fischer@math.tu-dresden.de

August 2002

Abstract. A general relationship between the natural residual of a system of equations and the necessary optimality conditions associated to the corresponding least squares problem is presented. Based on this, an error bound result for the gradient of a least squares reformulation of Karush-Kuhn-Tucker systems will be derived.

1 Introduction

Error bounds have turned out to be an essential tool for analyzing both the global and the local convergence behavior of algorithms for solving equations or more general problems. Besides their use for theoretical purposes, appropriate error bounds play a central role in several techniques for globalizing locally convergent algorithms. More recently, computable error bounds have been successfully employed for achieving or improving certain local properties of algorithms in cases where classical regularity conditions are violated. Let us first mention techniques for the locally accurate identification of constraints which are active at a solution of a Karush-Kuhn-Tucker system. Secondly, several modifications of Newton-type methods have been developed that guarantee superlinear convergence properties even in cases where no isolated solution exists. For such modifications the existence of an appropriate error bound is one of the key assumptions for superlinear convergence.

Let us consider the problem of solving the equation

$$H(z) = 0, \tag{1}$$

where $H:\mathbb{R}^{l_1}\to\mathbb{R}^{l_2}$ is a given map. Its solution set is assumed to be nonempty and is denoted by $\Sigma$, i.e.,

$$\Sigma := \{z\in\mathbb{R}^{l_1}\mid H(z)=0\}. \tag{2}$$

In [7] and [3], Levenberg-Marquardt type algorithms have been suggested for solving (1), provided that $H$ is continuously differentiable. The local Q-quadratic rate of convergence in [7] relies, besides further assumptions, on an error bound condition like

$$\mu\, d[z,\Sigma] \le \|H(z)\| \quad \forall z\in\Sigma_0+\epsilon B \tag{3}$$

for some $\epsilon,\mu>0$, where $\Sigma_0$ denotes a nonempty closed subset of $\Sigma$ and $d[z,\Sigma]$ the distance of $z$ to $\Sigma$. In [3] a general approach for solving generalized equations with nonisolated solutions has been developed. It exploits the upper Lipschitz-continuity of the solution set map belonging to a perturbed generalized equation. In the case of problem (1) this assumption is equivalent to the error bound condition above. As an application of the approach just mentioned, a prox-regularized Newton-type method for solving the necessary optimality conditions

$$Q(z) := \nabla H(z)H(z) = 0 \tag{4}$$

associated to the least squares problem

$$\Psi(z) := \tfrac12 H(z)^TH(z) \to \min$$

is suggested in [3, Section 5.4]. Therefore, instead of (3), an error bound condition for problem (4) is of importance. This condition reads

$$\bar\mu\, d[z,\Sigma] \le \|Q(z)\| \quad \forall z\in\Sigma_0+\bar\epsilon B \tag{5}$$

for some $\bar\epsilon,\bar\mu>0$. Note that

$$d[z,S] \le d[z,\Sigma] \quad \forall z\in\mathbb{R}^{l_1},$$

where $S := \{z\in\mathbb{R}^{l_1}\mid Q(z)=0\}\supseteq\Sigma$ denotes the solution set of problem (4). Obviously, if $\|\nabla H(z)\|$ is bounded in a neighborhood of $\Sigma$, the error bound condition (5) implies condition (3) with $\mu>0$ suitably chosen. However, since (3) is a natural error bound condition for problem (1), the question arises under which assumptions (3) implies (5). An answer to this basic question will be given by Theorem 1 in Section 2. This theorem is not only applicable to continuously differentiable maps but also to a certain class of nondifferentiable maps $H$. In particular, maps $\Phi$ can be dealt with that are frequently used to reformulate Karush-Kuhn-Tucker (KKT) systems as systems of nondifferentiable equations. To this end, we show in Section 3 that if the KKT system satisfies certain assumptions, then the map $\Phi$ belongs to the class of nondifferentiable maps covered by Theorem 1. In Section 4, it is shown that there is a reasonably large class of KKT systems that satisfy this assumption. In particular, affine KKT systems do so.
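To make the two error bound conditions concrete, the following small numerical sketch (our own toy example, not taken from the paper) samples points near a solution set consisting entirely of nonisolated solutions and estimates the constants $\mu$ in (3) and $\bar\mu$ in (5). The map $H$, the set $\Sigma$ (the unit circle), and the tube radius are illustrative choices.

```python
import numpy as np

# Toy illustration (not from the paper): H : R^2 -> R with
# H(z) = z_1^2 + z_2^2 - 1, so Sigma is the unit circle and every
# solution is nonisolated. We estimate mu in (3) and bar mu in (5).

def H(z):
    return np.array([z[0]**2 + z[1]**2 - 1.0])

def Q(z):
    # Q(z) = nabla H(z) H(z), the gradient of Psi(z) = 0.5 * ||H(z)||^2
    jac = np.array([[2.0 * z[0], 2.0 * z[1]]])  # Jacobian H'(z)
    return jac.T @ H(z)

def dist(z):
    # distance of z to the unit circle Sigma
    return abs(np.linalg.norm(z) - 1.0)

rng = np.random.default_rng(0)
mus, mu_bars = [], []
for _ in range(1000):
    angle = rng.uniform(0.0, 2.0 * np.pi)
    z = (1.0 + rng.uniform(-0.1, 0.1)) * np.array([np.cos(angle), np.sin(angle)])
    if dist(z) > 1e-12:
        mus.append(np.linalg.norm(H(z)) / dist(z))      # candidate mu in (3)
        mu_bars.append(np.linalg.norm(Q(z)) / dist(z))  # candidate bar mu in (5)

print("empirical mu     ~", min(mus))      # bounded away from zero near Sigma
print("empirical bar mu ~", min(mu_bars))  # bounded away from zero near Sigma
```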

If (3) implies (5) then this can be exploited for the analysis of the local behavior of algorithms. Suppose that a line search algorithm makes use of search directions

$$d := M(z)\nabla H(z)H(z) = M(z)Q(z)$$

with a certain matrix $M(z)\in\mathbb{R}^{l_1\times l_1}$. Then, (3) and (5) can be helpful for estimating the local progress, in particular, if the step length is computed by decreasing a merit function like $\Psi$. A corresponding example is provided in the forthcoming paper [4]. There, a class of algorithms for solving KKT systems with nonisolated solutions is investigated with respect to the limiting behavior of the iterates and of active set estimates. Another advantage of having (5) is that any solution of $Q(z)=0$ in a neighborhood of $\Sigma_0$ also solves $H(z)=0$ without any further condition.

Notation: Throughout the paper $\|\cdot\|$ denotes the Euclidean vector norm or the induced matrix norm. The unit ball (of appropriate dimension) is always denoted by $B$. For a set $S\subseteq\mathbb{R}^l$ and a point $z\in\mathbb{R}^l$ the distance of $z$ to $S$ is defined as $d[z,S] := \inf\{\|z-s\|\mid s\in S\}$ if $S\neq\emptyset$ and $d[z,S] := \infty$ otherwise. Moreover, by $\Pi(z,S) := \{s\in S\mid \|z-s\| = d[z,S]\}$ we denote the set of all points in $S$ that have minimal distance to $z$. If $S$ is nonempty and closed then $\Pi(z,S)$ is nonempty for any $z\in\mathbb{R}^l$. Let $G:\mathbb{R}^{d_1}\to\mathbb{R}^{d_2}$ be a locally Lipschitz-continuous function. Then, Clarke's generalized Jacobian of $G$ at $z\in\mathbb{R}^{d_1}$ exists and is denoted by $\partial G(z)$. If $G$ is continuously differentiable at $z$ then $\partial G(z) = \{\nabla G(z)^T\}$. The definition and further properties of $\partial G(z)$ can be found in [1].

2 The Gradient of the Squared Residual as Error Bound

Let us first discuss the question whether assumption (3) implies (5) in the classical situation where $H$ is continuously differentiable and $\nabla H(z^*)$ is nonsingular at a solution $z^*$ of (1). Then, with the continuity of $\nabla H$, Taylor's formula shows that $z^*$ is an isolated solution. Moreover, in a neighborhood of $z^*$, $\nabla H(z)^{-1}$ exists and its norm is bounded above. Therefore, setting $\Sigma_0 := \{z^*\}$, condition (3) for $\epsilon,\mu>0$ sufficiently small implies (5) with $\bar\mu>0$ suitably chosen. If, however, $z^*$ is a nonisolated solution of (1), then $\nabla H(z^*)$ must be singular as long as $\nabla H$ is continuous. Moreover, even if $\nabla H(z)^{-1}$ exists for $z$ close to $z^*$, its norm cannot be bounded. Nevertheless, (3) implies (5). This follows from Theorem 9 in [3]. This theorem is applicable not only to continuously differentiable maps but also to a certain class of nondifferentiable maps $H$.

We will now present a corresponding theorem for a larger class of nondifferentiable maps $H$. To this end, let $H:\mathbb{R}^{l_1}\to\mathbb{R}^{l_2}$ denote a locally Lipschitz-continuous map and consider problem (1), i.e., $H(z)=0$. According to (2), this problem is assumed to have a nonempty solution set denoted by $\Sigma$.

The necessary optimality condition for minimizing the squared residual $\Psi(z) = \tfrac12 H(z)^TH(z)$ reads

$$0\in\partial\Psi(z),$$

where $\partial\Psi(z) = \partial H(z)^TH(z)$ holds according to an appropriate chain rule [1]. We will now derive conditions under which the norm of elements in $\partial\Psi(z)$ can serve as an error bound for $d[z,\Sigma]$, i.e., for the distance of $z$ to the solution set of (1).

Theorem 1 Let $\Sigma_0\subseteq\Sigma$ be nonempty and closed. Assume that there are $\epsilon,\mu>0$ and $\sigma\ge1$ so that, for any $z\in\Sigma_0+\epsilon B$, there are $\hat z\in z+\sigma d[z,\Sigma]B$ and $V\in\partial H(z)$ with

$$\|H(z)+V(\hat z-z)\|\le\tfrac12\mu\,d[z,\Sigma], \tag{6}$$

and assume that

$$\mu\,d[z,\Sigma]\le\|H(z)\| \tag{7}$$

is valid for all $z\in\Sigma_0+\epsilon B$. Then, with $\bar\mu := \mu^2(2\sigma)^{-1}$,

$$\bar\mu\,d[z,\Sigma]\le\|V^TH(z)\| \tag{8}$$

holds for all $z\in\Sigma_0+\epsilon B$ with $V$ as above.

Proof. Choose any $z\in\Sigma_0+\epsilon B$ and the corresponding $V\in\partial H(z)$. If $z\in\Sigma$, then inequality (8) is obviously valid. Otherwise, if $z\in(\Sigma_0+\epsilon B)\setminus\Sigma$, multiply the vector within the norm in (6) by $H(z)^T$. With (6), this yields

$$\|H(z)\|^2-|H(z)^TV(\hat z-z)|\le|H(z)^TH(z)+H(z)^TV(\hat z-z)|\le\tfrac12\mu\,\|H(z)\|\,d[z,\Sigma].$$

From $\hat z\in z+\sigma d[z,\Sigma]B$ it follows that

$$\|H(z)\|^2-\tfrac12\mu\,\|H(z)\|\,d[z,\Sigma]\le\sigma d[z,\Sigma]\,\|V^TH(z)\|.$$

Dividing this by $d[z,\Sigma]$ and taking into account (7), we obtain

$$\tfrac12\mu\,\|H(z)\|\le\sigma\|V^TH(z)\|.$$

Using (7) once more, the assertion follows. $\Box$

The previous theorem refines [3, Theorem 9]. In particular, (6) is a weaker assumption than the corresponding condition in [3]. Due to this refinement, it will be possible to apply Theorem 1 to the map $H := \Phi$ with $\Phi$ defined below.
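The mechanics of Theorem 1 can be traced on the simplest nonsmooth case. The sketch below is our own illustration (the map $H(z)=|z|$ is not discussed in the paper): here $\Sigma_0=\Sigma=\{0\}$, (6) holds with $\sigma=1$ (the left-hand side is even zero), (7) holds with $\mu=1$, and the conclusion (8) is then checked with $\bar\mu=\mu^2(2\sigma)^{-1}$.

```python
import numpy as np

# Own toy check of Theorem 1: H(z) = |z| on R, Sigma = Sigma_0 = {0}.
# For z != 0, V = sign(z) is the Clarke Jacobian element and z_hat = 0
# lies in z + sigma*d[z,Sigma]*B with sigma = 1.

mu, sigma = 1.0, 1.0
rng = np.random.default_rng(1)
for z in rng.uniform(-1.0, 1.0, size=20):
    if abs(z) < 1e-12:
        continue
    d = abs(z)                                # d[z, Sigma]
    V, z_hat = np.sign(z), 0.0
    assert abs(abs(z) + V * (z_hat - z)) <= 0.5 * mu * d + 1e-12   # (6)
    assert mu * d <= abs(z) + 1e-12                                # (7)
    assert (mu**2 / (2.0 * sigma)) * d <= abs(V * abs(z)) + 1e-12  # (8)
print("(6), (7) and the conclusion (8) hold on all samples")
```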

3 Application to Karush-Kuhn-Tucker Systems

In this section the case is dealt with that $H(z)=0$ reformulates the KKT system in a particular but frequently used manner. Assumptions will be provided which ensure that (6) is satisfied and, thus, that Theorem 1 is applicable to this case.

We first need to describe the KKT system and its reformulation as a system of equations in more detail. Let $F:\mathbb{R}^n\to\mathbb{R}^n$ be a continuously differentiable function, let $g:\mathbb{R}^n\to\mathbb{R}^m$ and $h:\mathbb{R}^n\to\mathbb{R}^p$ be twice continuously differentiable functions, and consider the system

$$L(x,u,v)=0,\quad h(x)=0,\quad g(x)\ge0,\quad u\ge0,\quad u^Tg(x)=0 \tag{9}$$

with the Lagrangian $L:\mathbb{R}^{n+m+p}\to\mathbb{R}^n$ given by

$$L(x,u,v) := F(x)+\nabla h(x)v-\nabla g(x)u.$$

System (9) is well known as the Karush-Kuhn-Tucker (KKT) system belonging to the variational inequality problem

Find $x\in G$ so that $F(x)^T(\xi-x)\ge0$ for all $\xi\in G$, (10)

where $G := \{x\in\mathbb{R}^n\mid h(x)=0,\ g(x)\ge0\}$. If $x$ is a solution of (10) and if a certain constraint qualification is satisfied at $x$, then $(u,v)$ exists so that $(x,u,v)$ solves (9). Moreover, under a certain constraint qualification, the system (9), with $F := \nabla f$ for $f:\mathbb{R}^n\to\mathbb{R}$ sufficiently smooth, states necessary optimality conditions associated to the programming problem

$$f(x)\to\min\quad\text{s.t.}\quad x\in G.$$

Therefore, a basic approach for solving such programs or the variational inequality problem (10) is to determine a solution of the KKT system (9). To this end, (9) is often reformulated as a system of equations. A frequently used approach is based on the function $\varphi:\mathbb{R}^2\to\mathbb{R}$ given by

$$\varphi(a,b) := \sqrt{a^2+b^2}-a-b.$$

Since $\varphi$ equals zero if and only if $a\ge0$, $b\ge0$, and $ab=0$, it is easily verified that (9) is equivalent to $\Phi(z)=0$, where $z := (x,u,v)$ and

$$\Phi(z) := \begin{pmatrix}L(z)\\ h(x)\\ \phi(z)\end{pmatrix}\quad\text{with}\quad \phi(z) := (\varphi(g_1(x),u_1),\ldots,\varphi(g_m(x),u_m))^T.$$

Now, if we set

$$H := \Phi, \tag{11}$$

the same question as in Sections 1 and 2 becomes of interest. Namely, which assumptions ensure that the error bound condition (3) implies (5)? However, the function $Q$ as defined in (4) and employed in (5) is not well defined now. This is due to the fact that $\varphi$ is nondifferentiable at $(0,0)$, so that $H$ is not necessarily everywhere differentiable.
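A minimal sketch of the function $\varphi$ (the Fischer-Burmeister function) and of the stacked residual $\Phi$ follows; the helper names and the calling convention are ours, not the paper's.

```python
import numpy as np

# Sketch: phi(a, b) = sqrt(a^2 + b^2) - a - b vanishes exactly on the
# complementarity conditions a >= 0, b >= 0, a*b = 0.

def phi(a, b):
    return np.hypot(a, b) - a - b

for a, b in [(0.0, 0.0), (2.0, 0.0), (0.0, 3.5)]:    # complementary pairs
    assert abs(phi(a, b)) < 1e-12
for a, b in [(1.0, 1.0), (-1.0, 0.0), (0.0, -2.0)]:  # violated complementarity
    assert abs(phi(a, b)) > 1e-8

def Phi(L_val, h_val, g_val, u):
    # stacked residual Phi(z) = (L(z), h(x), phi(g_i(x), u_i)_i); the caller
    # supplies the values L(z), h(x), g(x) and the multiplier vector u
    return np.concatenate([L_val, h_val, phi(g_val, u)])
```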

Nevertheless, the merit function $\Psi(z) = \tfrac12\Phi(z)^T\Phi(z)$ is continuously differentiable [2, 5] and the function $Q$ can be defined by

$$Q(z) := \nabla\Psi(z).$$

It holds that

$$\nabla\Psi(z) = V^T\Phi(z)\quad\text{for all }V\in\partial\Phi(z). \tag{12}$$

Any matrix $V\in\partial\Phi(z)$ can be written as

$$V = \begin{pmatrix}\nabla_xL(x,u,v) & \nabla h(x) & \nabla g(x)D_a(g(x),u)\\ -\nabla g(x)^T & 0 & D_b(g(x),u)\\ \nabla h(x)^T & 0 & 0\end{pmatrix}^T \tag{13}$$

with diagonal matrices $D_a(g(x),u)$ and $D_b(g(x),u)$. Their $i$-th diagonal entries $a(g_i(x),u_i)$ and $b(g_i(x),u_i)$, respectively, are given by

$$a(g_i(x),u_i) = \partial_a\varphi(g_i(x),u_i),\qquad b(g_i(x),u_i) = \partial_b\varphi(g_i(x),u_i) \tag{14}$$

if $(g_i(x),u_i)\neq(0,0)$, where

$$\partial_a\varphi(a,b) = \frac{a}{\sqrt{a^2+b^2}}-1,\qquad \partial_b\varphi(a,b) = \frac{b}{\sqrt{a^2+b^2}}-1 \tag{15}$$

for $(a,b)\neq(0,0)$. Otherwise, if $(g_i(x),u_i)=(0,0)$, there are $\alpha_i,\beta_i\in\mathbb{R}$ so that

$$a(g_i(x),u_i) = \alpha_i-1,\qquad b(g_i(x),u_i) = \beta_i-1\quad\text{with}\quad\alpha_i^2+\beta_i^2\le1. \tag{16}$$

To answer the main question under which assumptions (3) implies (5), we would like to apply Theorem 1. Therefore, besides the error bound condition (7), which corresponds to (3), condition (6) needs to be satisfied. To achieve this in the case $H = \Phi$ the subsequent assumption plays a key role. Its formulation and the analysis thereafter require some index sets. With $I := \{1,\ldots,m\}$ let

$$I_C(z) := \{i\in I\mid g_i(x) = u_i = 0\}$$

denote the set of all indices that are complementary at $z\in\mathbb{R}^{n+m+p}$. Moreover, for $z\in\mathbb{R}^{n+m+p}$ and $t\ge0$ define

$$I(z,t) := \{i\in I\mid\max\{u_i,g_i(x)\}\le t\}.$$

Note that $I_C(z)\subseteq I(z,t)$ for any $z\in\Sigma$ and any $t\ge0$.
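The following sketch builds one element $V$ of $\partial\Phi(z)$ according to (13)-(16) and confirms (12) against central finite differences. The instance is our own toy data ($n=m=1$, $p=0$, $F(x)=x-1$, $g(x)=x$, hence $z=(x,u)$), and at $(0,0)$ the choice $\alpha=\beta=0$ is just one admissible selection in (16).

```python
import numpy as np

# Own toy instance: Phi(z) = (x - 1 - u, phi(x, u)), Psi = 0.5*Phi^T Phi.

def phi(a, b):
    return np.hypot(a, b) - a - b

def Phi(z):
    x, u = z
    return np.array([x - 1.0 - u, phi(x, u)])

def Psi(z):
    r = Phi(z)
    return 0.5 * r @ r

def element_of_dPhi(z, alpha=0.0, beta=0.0):
    x, u = z
    if np.hypot(x, u) > 1e-14:            # smooth point: use (14)-(15)
        a = x / np.hypot(x, u) - 1.0
        b = u / np.hypot(x, u) - 1.0
    else:                                 # (0,0): any alpha^2 + beta^2 <= 1, (16)
        a, b = alpha - 1.0, beta - 1.0
    # rows: L-component, phi-component; columns: x, u (cf. (13) with p = 0)
    return np.array([[1.0, -1.0],
                     [a,    b ]])

def fd_grad(f, z, h=1e-6):
    g = np.zeros_like(z)
    for i in range(len(z)):
        e = np.zeros_like(z); e[i] = h
        g[i] = (f(z + e) - f(z - e)) / (2.0 * h)
    return g

for z in [np.array([0.5, 0.2]), np.array([0.0, 0.0]), np.array([2.0, 1.0])]:
    V = element_of_dPhi(z)
    assert np.allclose(V.T @ Phi(z), fd_grad(Psi, z), atol=1e-5)
print("grad Psi(z) = V^T Phi(z) confirmed numerically")
```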

Assumption 1 Let $\Sigma_0\subseteq\Sigma$ be nonempty and closed and let $\epsilon_1>0$ be given. For any $N\ge1$ there are $\sigma>0$ and $\tau>0$ so that, for any $y\in(\Sigma_0+\epsilon_1B)\cap\Sigma$ and any $t\in[0,\tau]$, there is $y_t\in\Sigma$ with

$$\|y-y_t\|\le\sigma t\quad\text{and}\quad I(y_t,\,N\max\{\|y-y_t\|,t\}) = I_C(y_t).$$

Roughly speaking, this assumption requires that for any point in the set $(\Sigma_0+\epsilon_1B)\cap\Sigma$ there is a point in $\Sigma$ so that both points are not too far away from each other and so that the complementary indices of the latter point are stable in a certain sense. Before exploiting Assumption 1 let us refer to Section 4. There, an error bound condition is presented under which Assumption 1 can be fulfilled.

Besides the smoothness conditions stated at the beginning of this section we will make use of the following additional Lipschitz-continuity conditions.

Assumption 2 There is $L\ge1$ so that, for all $z,z'\in\Sigma_0+B$,

a) $|g_i(x)-g_i(x')|\le L\|x-x'\|$ for all $i\in I$,

b) $\|\nabla g_i(x)-\nabla g_i(x')\|\le L\|x-x'\|$ for all $i\in I$, $\|\nabla h(x)-\nabla h(x')\|\le L\|x-x'\|$, and $\|\nabla L(z)-\nabla L(z')\|\le L\|z-z'\|$.

If $\Sigma_0$ is bounded then Assumption 2 except the third inequality in part b) is satisfied. If, in addition, $\nabla F$, $\nabla^2g$, and $\nabla^2h$ are locally Lipschitz-continuous then the latter inequality holds as well.
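Assumption 2 only asks for Lipschitz constants on the bounded region $\Sigma_0+B$, which can be probed numerically. A rough sampling-based sketch follows (our own helper, not from the paper); it only yields a lower estimate of the true Lipschitz modulus.

```python
import numpy as np

# Own helper: empirically estimate the constant L of Assumption 2 for a
# given map f by sampling pairs of points in Sigma_0 + B.

def lipschitz_estimate(f, sampler, n_pairs=2000, seed=0):
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(n_pairs):
        z1, z2 = sampler(rng), sampler(rng)
        denom = np.linalg.norm(z1 - z2)
        if denom > 1e-12:
            best = max(best, np.linalg.norm(f(z1) - f(z2)) / denom)
    return best

# example: g(x) = x^2 on Sigma_0 + B = [-1, 1] (with Sigma_0 = {0}, say)
est = lipschitz_estimate(lambda x: x**2, lambda rng: rng.uniform(-1.0, 1.0, 1))
print("estimated L for g(x) = x^2 on [-1, 1]:", est)  # close to 2
```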

Lemma 1 Suppose that Assumption 1 is satisfied. Then, for any $N\ge1$, there is $\hat\rho>0$ so that, for any $z\in\Sigma_0+\hat\rho B$, a vector

$$\hat z\in\Sigma \tag{17}$$

exists with

$$\|z-\hat z\|\le(\sigma+1)d[z,\Sigma] \tag{18}$$

and

$$g_i(\hat x)+\hat u_i > \tfrac12 N\max\{\|z-\hat z\|,d[z,\Sigma]\}\quad\text{for all }i\in I\setminus I_C(\hat z). \tag{19}$$

If, in addition, Assumption 2 a) holds and $N\ge4L$, then

$$I_C(z)\subseteq I_C(\hat z). \tag{20}$$

Proof. Let $N\ge1$ be arbitrary but fixed. Then, with $\epsilon_1>0$ given and $\sigma>0$, $\tau>0$ existing due to Assumption 1, define

$$\hat\rho := \min\Big\{1,\,\tau,\,\tfrac12\epsilon_1,\,\frac{1}{\sigma+2}\Big\}. \tag{21}$$

Now, choose any $z\in\Sigma_0+\hat\rho B$. Then, there is $y\in\Pi(z,\Sigma)$. From (21) it follows that

$$d[y,\Sigma_0]\le\|z-y\|+d[z,\Sigma_0]\le2d[z,\Sigma_0]\le2\hat\rho\le\epsilon_1.$$

Thus,

$$y\in(\Sigma_0+\epsilon_1B)\cap\Sigma. \tag{22}$$

To apply Assumption 1, define

$$t := d[z,\Sigma]. \tag{23}$$

Due to (21), this implies

$$t\le d[z,\Sigma_0]\le\hat\rho\le\tau. \tag{24}$$

Assumption 1 together with (22) and (24) ensures that $\hat z := y_t\in\Sigma$ exists with

$$\|y-\hat z\|\le\sigma t = \sigma d[z,\Sigma]\quad\text{and}\quad g_i(\hat x)+\hat u_i > N\max\{\|y-\hat z\|,t\}\quad\text{for all }i\in I\setminus I_C(\hat z).$$

We therefore obtain (18) by

$$\|z-\hat z\|\le\|z-y\|+\|y-\hat z\|\le(1+\sigma)d[z,\Sigma]$$

and (19) since

$$g_i(\hat x)+\hat u_i > N\max\{\|y-\hat z\|,t\}\ge N\max\{\|z-\hat z\|-\|y-z\|,t\} = N\max\{\|z-\hat z\|-t,t\}\ge\tfrac12 N\max\{\|z-\hat z\|,t\}$$

for all $i\in I\setminus I_C(\hat z)$. To verify (20) we first note that, by (21) and (18),

$$z\in\Sigma_0+\hat\rho B\subseteq\Sigma_0+B,\qquad \hat z\in z+(\sigma+1)d[z,\Sigma]B\subseteq\Sigma_0+\hat\rho B+(\sigma+1)\hat\rho B\subseteq\Sigma_0+B. \tag{25}$$

Hence, Assumption 2 a), $N\ge4L$, and $L\ge1$ provide

$$g_i(\hat x)+\hat u_i\le|g_i(\hat x)-g_i(x)|+|\hat u_i-u_i|\le L\|\hat x-x\|+\|\hat u-u\|\le\tfrac12 N\|\hat z-z\|$$

for any $i\in I_C(z)$. Therefore, (19) yields $i\in I_C(\hat z)$. $\Box$

Lemma 2 Suppose that Assumptions 1 and 2 are satisfied and that $\mu>0$ is given. Then, there is $\rho>0$ so that, for any $z\in\Sigma_0+\rho B$, a vector $\hat z\in\Sigma$ exists with

$$\|z-\hat z\|\le(\sigma+1)d[z,\Sigma] \tag{26}$$

and

$$\|\Phi(z)+V(\hat z-z)\|\le\tfrac12\mu\,d[z,\Sigma]\quad\text{for all }V\in\partial\Phi(z). \tag{27}$$

Proof. To apply Lemma 1 choose $N$ so that

$$N\ge4L. \tag{28}$$

Then, according to Lemma 1, $\hat\rho>0$ exists so that, for any $z\in\Sigma_0+\hat\rho B$, there is $\hat z$ so that (17)-(20) are satisfied. Based on this we will show that $(z,\hat z)$ also satisfies (26) and (27) if $z\in\Sigma_0+\rho B$, where $\rho$ is given by

$$\rho := \min\Big\{\hat\rho,\,\frac{\mu}{8\sqrt m\,(\sigma+1)^2L}\Big\}. \tag{29}$$

Obviously, (26) directly follows from (18). To prove (27), the term

$$R(z,\hat z) := \Phi(z)+V(\hat z-z) \tag{30}$$

will be investigated componentwise for $V\in\partial\Phi(z)$ arbitrary but fixed. The first $n+p$ components of $R(z,\hat z)$ read as follows:

$$R_{1..n}(z,\hat z) = L(z)+\nabla L(z)^T(\hat z-z),\qquad R_{n+1..n+p}(z,\hat z) = h(x)+\nabla h(x)^T(\hat x-x).$$

Taylor's formula, $L(\hat z)=0$ due to (17), and Assumption 2 b) with $z,\hat z\in\Sigma_0+B$ (see (25)) yield

$$R_{1..n}(z,\hat z) = -\int_0^1(\nabla L(z+s(\hat z-z))-\nabla L(z))^T(\hat z-z)\,ds,\quad\text{hence}\quad\|R_{1..n}(z,\hat z)\|\le L\|\hat z-z\|^2.$$

Taking into account (26) and (29) we further get

$$\|R_{1..n}(z,\hat z)\|\le L(\sigma+1)^2d[z,\Sigma]^2\le\frac\mu8 d[z,\Sigma]. \tag{31}$$

In the same way one can show that

$$\|R_{n+1..n+p}(z,\hat z)\|\le\frac\mu8 d[z,\Sigma]. \tag{32}$$

We now consider the last $m$ components of (30), i.e.,

$$R_{n+p+i}(z,\hat z) = \phi_i(z)+v_i(\hat z-z),\quad i=1,\ldots,m,$$

where $v_i$ is the $(n+p+i)$-th row of $V$ and thus $v_i\in\partial\phi_i(z)$. Taylor's formula yields

$$\nabla g_i(x)^T(\hat x-x) = g_i(\hat x)-g_i(x)-r_i(x,\hat x) \tag{33}$$

with

$$r_i(x,\hat x) := \int_0^1(\nabla g_i(x+s(\hat x-x))-\nabla g_i(x))^T(\hat x-x)\,ds.$$

Similar to showing (31), we get

$$|r_i(x,\hat x)|\le\frac{\mu}{8\sqrt m}d[z,\Sigma]. \tag{34}$$

Now, for any $i\in I$ two cases are distinguished:

a) $i\in I_C(z)$. Due to (20) in Lemma 1, $i\in I_C(\hat z)$ follows. Thus, $i\in I_C(z)\cap I_C(\hat z)$ and

$$g_i(x) = g_i(\hat x) = u_i = \hat u_i = \phi_i(z) = 0. \tag{35}$$

Using the representation (13) of the matrices $V$ contained in $\partial\Phi(z)$ together with (16), we get

$$R_{n+p+i}(z,\hat z) = \phi_i(z)+v_i(\hat z-z) = (\alpha_i-1)\nabla g_i(x)^T(\hat x-x)+(\beta_i-1)(\hat u_i-u_i) = (\alpha_i-1)\nabla g_i(x)^T(\hat x-x).$$

Therefore, (33), (35), (34), and $|\alpha_i-1|\le2$ (by (16)) imply

$$|R_{n+p+i}(z,\hat z)|\le\frac{\mu}{4\sqrt m}d[z,\Sigma].$$

b) $i\in I\setminus I_C(z)$. Then, $(g_i(x),u_i)\neq(0,0)$ so that $\phi_i$ is continuously differentiable at $z=(x,u,v)$. With (13) and (14), we obtain

$$\nabla\phi_i(z)^T(\hat z-z) = \partial_a\varphi(g_i(x),u_i)\,\nabla g_i(x)^T(\hat x-x)+\partial_b\varphi(g_i(x),u_i)\,(\hat u_i-u_i).$$

Having (15) in mind and setting $\Delta := \sqrt{g_i(x)^2+u_i^2}$, we further get

$$\nabla\phi_i(z)^T(\hat z-z) = \Big(\frac{g_i(x)}{\Delta}-1\Big)\nabla g_i(x)^T(\hat x-x)+\Big(\frac{u_i}{\Delta}-1\Big)(\hat u_i-u_i).$$

Together with (33), we have

$$R_{n+p+i}(z,\hat z) = \phi_i(z)+\nabla\phi_i(z)^T(\hat z-z) = \Delta-g_i(x)-u_i+\Big(\frac{g_i(x)}{\Delta}-1\Big)\big(g_i(\hat x)-g_i(x)-r_i(x,\hat x)\big)+\Big(\frac{u_i}{\Delta}-1\Big)(\hat u_i-u_i)$$

and, expanding and using the definition of $\Delta$,

$$R_{n+p+i}(z,\hat z) = \Big(\frac{g_i(x)}{\Delta}-1\Big)\big(g_i(\hat x)-r_i(x,\hat x)\big)+\Big(\frac{u_i}{\Delta}-1\Big)\hat u_i. \tag{36}$$

Now, three subcases of case b) are considered.

b1) $i\in I_C(\hat z)$. Then, $g_i(\hat x)=\hat u_i=0$. From (36) and (34),

$$|R_{n+p+i}(z,\hat z)| = \Big|\frac{g_i(x)}{\Delta}-1\Big|\,|r_i(x,\hat x)|\le\frac{\mu}{4\sqrt m}d[z,\Sigma]$$

follows.

b2) $i\in I\setminus I_C(\hat z)$ and $g_i(\hat x)>0$. Then, by (17), $\hat u_i=0$. Moreover, (19) can be exploited. Therefore, having Assumption 2 a) and (28) in mind, we get

$$g_i(x)\ge g_i(\hat x)-L\|x-\hat x\| > \tfrac12 g_i(\hat x)+\tfrac14 N\max\{\|z-\hat z\|,d[z,\Sigma]\}-L\|x-\hat x\|\ge\tfrac12 g_i(\hat x)>0 \tag{37}$$

and

$$g_i(x)\ge\tfrac12 g_i(\hat x) > \tfrac14 N\max\{\|z-\hat z\|,d[z,\Sigma]\}. \tag{38}$$

This, Assumption 2 a), and (28) yield

$$g_i(x^*)\ge g_i(x)-L\|x-x^*\|\ge g_i(x)-L\,d[z,\Sigma] > 0$$

for $z^*=(x^*,u^*,v^*)\in\Pi(z,\Sigma)$. Since $z^*\in\Sigma$, this implies $u_i^*=0$ and

$$u_i = |u_i-u_i^*|\le\|z-z^*\| = d[z,\Sigma]. \tag{39}$$

By an appropriate Taylor expansion we have

$$\sqrt{a^2+b^2}-a\le\frac{b^2}{2a}\quad\text{for all }(a,b)\in(0,\infty)\times\mathbb{R}.$$

Setting $a := g_i(x)$ and $b := u_i$, it follows with (37) and (39) that

$$1-\frac{g_i(x)}{\Delta} = \frac{\Delta-g_i(x)}{\Delta}\le\frac{u_i^2}{2g_i(x)\Delta}\le\frac{d[z,\Sigma]^2}{2g_i(x)^2}.$$

Therefore, with (37) and (34), we further get

$$|R_{n+p+i}(z,\hat z)| = \Big|\frac{g_i(x)}{\Delta}-1\Big|\,|g_i(\hat x)-r_i(x,\hat x)|\le\frac{d[z,\Sigma]^2}{2g_i(x)^2}\Big(2g_i(x)+\frac{\mu}{8\sqrt m}d[z,\Sigma]\Big).$$

This and (38) lead to

$$|R_{n+p+i}(z,\hat z)|\le\frac4N d[z,\Sigma]+\frac{\mu}{4N\sqrt m}d[z,\Sigma].$$

Obviously, for $N$ sufficiently large,

$$|R_{n+p+i}(z,\hat z)|\le\frac{\mu}{4\sqrt m}d[z,\Sigma]$$

follows.

b3) $i\in I\setminus I_C(\hat z)$ and $\hat u_i>0$. In a very similar way the same estimate as in case b2) can be obtained by carefully interchanging certain terms ($g_i(\hat x)$ with $\hat u_i$ or $g_i(x^*)$ with $u_i^*$, for instance).

The results for the cases a) and b1)-b3) together with (31) and (32) show that

$$\|\Phi(z)+V(\hat z-z)\| = \|R(z,\hat z)\|\le\tfrac12\mu\,d[z,\Sigma]$$

holds for all $V\in\partial\Phi(z)$. $\Box$

Theorem 2 Suppose that Assumptions 1 and 2 are satisfied. Moreover, assume that there are $\epsilon,\mu>0$ so that

$$\mu\,d[z,\Sigma]\le\|\Phi(z)\|\quad\forall z\in\Sigma_0+\epsilon B. \tag{40}$$

Then, there are $\bar\epsilon,\bar\mu>0$ so that

$$\bar\mu\,d[z,\Sigma]\le\|\nabla\Psi(z)\|\quad\forall z\in\Sigma_0+\bar\epsilon B.$$

Proof. Apply Theorem 1 for $H := \Phi$ and with $\sigma+1$ instead of $\sigma$, where (12) and Lemma 2 with $\mu>0$ from (40) have to be taken into account. $\Box$
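The following sketch illustrates the conclusion of Theorem 2 numerically on an instance of our own (not from the paper): $F\equiv0$ and $G=\{x\in\mathbb{R}\mid x\ge0\}$, i.e. $n=m=1$, $p=0$ and $g(x)=x$, so that $\Sigma=\{(x,0)\mid x\ge0\}$ is a set of nonisolated solutions containing the degenerate point $(0,0)$. Since the instance is affine, it also matches Corollary 2 in Section 4. The sample box and tolerances are arbitrary choices.

```python
import numpy as np

# Own affine toy instance: z = (x, u), Phi(z) = (-u, phi(x, u)),
# Sigma = {(x, 0): x >= 0}. We check that ||grad Psi(z)|| / d[z, Sigma]
# stays bounded away from zero near Sigma_0 = [0, 1] x {0}.

def phi(a, b):
    return np.hypot(a, b) - a - b

def Phi(z):
    x, u = z
    return np.array([-u, phi(x, u)])

def grad_Psi(z):
    x, u = z
    r = np.hypot(x, u)
    if r > 1e-14:
        a, b = x / r - 1.0, u / r - 1.0
    else:
        a, b = -1.0, -1.0           # one admissible choice in (16)
    V = np.array([[0.0, -1.0],      # row of the L-component
                  [a,    b ]])      # row of the phi-component, cf. (13)
    return V.T @ Phi(z)

def dist(z):                        # distance to Sigma = {(x, 0): x >= 0}
    x, u = z
    return abs(u) if x >= 0.0 else np.hypot(x, u)

rng = np.random.default_rng(2)
ratios = []
for _ in range(5000):
    z = np.array([rng.uniform(-0.2, 1.2), rng.uniform(-0.2, 0.2)])
    d = dist(z)
    if d > 1e-10:
        ratios.append(np.linalg.norm(grad_Psi(z)) / d)
print("empirical bar mu ~", min(ratios))  # bounded away from zero
```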

4 Pleasant Karush-Kuhn-Tucker Systems

In this section conditions are provided under which Assumption 1 is satisfied. To proceed, let us first define and investigate the activity pattern belonging to any $z\in\Sigma$.

Definition 1 For any $z\in\Sigma$ the activity pattern $p(z) := (g(z),u(z))$ is defined by

$$g(z) := \{i\in I\mid g_i(x)=0\},\qquad u(z) := \{i\in I\mid u_i=0\}.$$

All activity patterns of points $z\in\Sigma$ are collected in the set

$$P(\Sigma) := \{p(z)\mid z\in\Sigma\}.$$

In addition, let

$$P := \{p=(g,u)\mid g,u\subseteq\{1,\ldots,m\},\ g\cup u = \{1,\ldots,m\}\}.$$

Elements $p_1=(g_1,u_1)$ and $p_2=(g_2,u_2)$ of $P$ are said to be related, $p_1\preceq p_2$ for short, if $g_1\subseteq g_2$ and $u_1\subseteq u_2$. In addition, $p_1\prec p_2$ is used to denote that $p_1\preceq p_2$ and $p_1\neq p_2$. Any element $p\in P(\Sigma)$ is called maximal if no $q\in P(\Sigma)$ exists with $p\prec q$. The set of all maximal elements in $P(\Sigma)$ is denoted by $P_{\max}(\Sigma)$. Finally, let

$$P_a(\Sigma) := \{p\in P\mid p\preceq q\ \text{for some}\ q\in P(\Sigma)\}.$$

Obviously, $P$ collects all those activity patterns that are potentially possible but need not occur in the solution set of a particular KKT system. The only requirement an element $(g,u)$ of $P$ has to satisfy is the complementarity condition, i.e., that each index $i\in I$ is contained in at least one of the sets $g$ and $u$. Moreover, the inclusions

$$P(\Sigma)\subseteq P_a(\Sigma)\subseteq P$$

can easily be verified. For any $p=(g,u)\in P$, let the map $F_p$ and the cone $K_p$ be defined by

$$F_p(z) := \begin{pmatrix}L(z)\\ h(x)\\ g_g(x)\\ u_u\\ g(x)\\ u\end{pmatrix},\qquad K_p := \{0\}^{n+p+|g|+|u|}\times\mathbb{R}^{m+m}_+,$$

where $g_g := (g_i)_{i\in g}^T$ and $u_u := (u_i)_{i\in u}^T$. Then, for any $p=(g,u)\in P$, the set

$$\Sigma_p := \{z\in\mathbb{R}^{n+m+p}\mid F_p(z)\in K_p\}$$

is possibly empty and contained in $\Sigma$. For affine KKT systems any set $\Sigma_p$ with $p\in P$ is a closed polyhedron.
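The combinatorial objects of Definition 1 are easy to enumerate for small $m$. The sketch below is our own helper code (not from the paper); patterns are stored as pairs of frozensets over $I=\{0,\ldots,m-1\}$, and $P(\Sigma)$ is supplied as a finite set of solution patterns.

```python
from itertools import product

# Own helpers for Definition 1's pattern sets.

def P_all(m):
    """All potentially possible patterns: g union u = I."""
    I = range(m)
    pats = set()
    for choice in product((0, 1, 2), repeat=m):  # 0: only g, 1: only u, 2: both
        g = frozenset(i for i in I if choice[i] in (0, 2))
        u = frozenset(i for i in I if choice[i] in (1, 2))
        pats.add((g, u))
    return pats

def related(p1, p2):
    """p1 <= p2 in the partial order of Definition 1."""
    return p1[0] <= p2[0] and p1[1] <= p2[1]

def maximal(P_Sigma):
    """P_max(Sigma): patterns not strictly below another one in P(Sigma)."""
    return {p for p in P_Sigma
            if not any(related(p, q) and p != q for q in P_Sigma)}

def P_a(P_Sigma, m):
    """P_a(Sigma): patterns below some pattern occurring in P(Sigma)."""
    return {p for p in P_all(m) if any(related(p, q) for q in P_Sigma)}

# toy solution-set patterns for m = 2
P_Sigma = {(frozenset({0}), frozenset({1})), (frozenset({0, 1}), frozenset({1}))}
assert maximal(P_Sigma) == {(frozenset({0, 1}), frozenset({1}))}
assert P_Sigma <= P_a(P_Sigma, 2) <= P_all(2)   # P(Sigma) in P_a(Sigma) in P
```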

Lemma 3
a) If $p_1,p_2\in P$, then $p_1\preceq p_2$ implies $\Sigma_{p_2}\subseteq\Sigma_{p_1}$.
b) For any $p\in P_a(\Sigma)$, there is $p_{\max}\in P_{\max}(\Sigma)$ so that $p\preceq p_{\max}$.
c) The set $\Sigma_p$ is nonempty if and only if $p\in P_a(\Sigma)$.

Proof. Obvious. $\Box$

Lemma 4 Let $\Sigma_2\subseteq\Sigma$ be nonempty and compact. Then, there is $\kappa_0\in(0,1]$ so that

$$g_i(x)+u_i\ge\kappa_0\quad\text{for all }i\in I\setminus I_C(z)$$

and for any $z\in\Sigma_2$ with $p(z)=(g(z),u(z))\in P_{\max}(\Sigma)$.

Proof. Assume the contrary. Then, a sequence $\{z^\nu\}\subseteq\Sigma_2$, a point $z^*\in\Sigma_2$, a pattern $p\in P_{\max}(\Sigma)$, and an index $i\in I$ must exist so that, writing $(i,i)$ for the pattern $(\{i\},\{i\})$,

$$(i,i)\not\preceq p(z^\nu) = p\in P_{\max}(\Sigma)\ \forall\nu\in\mathbb N,\qquad \lim_{\nu\to\infty}\big(g_i(x^\nu)+u_i^\nu\big) = 0,\qquad \lim_{\nu\to\infty}z^\nu = z^*.$$

With the continuity of $g$ it follows that $p\prec p(z^*)$. Thus, $p$ cannot be maximal. $\Box$

Lemma 5 Let $\Sigma_2\subseteq\Sigma$ be nonempty and compact. Then, for any $\eta>0$, there is $\kappa\in(0,\kappa_0]$ so that, for all $z\in\Sigma_2$ and all $i\in I$,

$$g_i(x)+u_i\le\kappa \tag{41}$$

implies

$$p(z)\cup(i,i)\in P_a(\Sigma) \tag{42}$$

and

$$\inf\{\|z-s\|\mid s\in\Sigma_{p(z)\cup(i,i)}\}\le\eta, \tag{43}$$

where the union of patterns is taken componentwise, i.e., $p(z)\cup(i,i) = (g(z)\cup\{i\},\,u(z)\cup\{i\})$.

Proof. Fix $\eta>0$. Assume first that there is no $\kappa\in(0,\kappa_0]$ so that (41) implies (42) for all $z\in\Sigma_2$ and all $i\in I$. Then, sequences $\{\kappa^\nu\}\subseteq(0,\kappa_0]$ and $\{z^\nu\}\subseteq\Sigma_2$, an index $j\in I$, and a point $\hat z=(\hat x,\hat u,\hat v)\in\Sigma_2$ must exist with

$$\lim_{\nu\to\infty}\kappa^\nu = 0,\qquad \lim_{\nu\to\infty}z^\nu = \hat z, \tag{44}$$

$$g_j(x^\nu)+u_j^\nu\le\kappa^\nu\quad\forall\nu\in\mathbb N, \tag{45}$$

$$p(z^\nu)\cup(j,j)\in P\setminus P_a(\Sigma)\quad\forall\nu\in\mathbb N. \tag{46}$$

Taking suitable subsequences if necessary we have that, without loss of generality,

$$p(z^\nu) = \hat p\quad\forall\nu\in\mathbb N \tag{47}$$

for some fixed $\hat p=(\hat g,\hat u)\in P(\Sigma)$. Now consider any $i\in I$. If $i\in\hat g$, we have from (44) and by the continuity of $g_i$ that

$$\lim_{\nu\to\infty}g_i(x^\nu) = g_i(\hat x) = 0\quad\text{and}\quad\lim_{\nu\to\infty}u_i^\nu = \hat u_i\ge0.$$

If $i\in\hat u$, we get

$$\lim_{\nu\to\infty}g_i(x^\nu) = g_i(\hat x)\ge0\quad\text{and}\quad\lim_{\nu\to\infty}u_i^\nu = \hat u_i = 0.$$

Hence, with $\hat z\in\Sigma_2\subseteq\Sigma$,

$$\hat p\preceq p(\hat z)\in P(\Sigma) \tag{48}$$

follows. Since (45) implies

$$\lim_{\nu\to\infty}\big(g_j(x^\nu)+u_j^\nu\big) = g_j(\hat x)+\hat u_j = 0,$$

we further get $(j,j)\preceq p(\hat z)$. This together with (48) yields

$$\hat p\preceq\hat p\cup(j,j)\preceq p(\hat z)\in P(\Sigma). \tag{49}$$

Thus, $\hat p\cup(j,j)\in P_a(\Sigma)$, which contradicts (46) and (47). Therefore, (42) is valid for all $z\in\Sigma_2$ and all $i\in I$ satisfying (41).

To show that (43) is implied by (41), first note that, due to (42) and Lemma 3 c), $\Sigma_{p(z)\cup(i,i)}\neq\emptyset$ for all $z\in\Sigma_2$ and all $i\in I$ satisfying (41). Thus, the left term in (43) is well defined. Let us assume that (43) does not hold. Then, we can repeat all steps of the previous part of the proof until formula (49) with the only modification that (46) is replaced by

$$\inf\{\|z^\nu-y\|\mid y\in\Sigma_{p(z^\nu)\cup(j,j)}\} > \eta\quad\forall\nu\in\mathbb N. \tag{50}$$

From (49), Lemma 3 a), and (47) we have

$$\Sigma_{p(\hat z)}\subseteq\Sigma_{\hat p\cup(j,j)} = \Sigma_{p(z^\nu)\cup(j,j)}\quad\forall\nu\in\mathbb N.$$

Therefore, since $\hat z\in\Sigma_{p(\hat z)}$,

$$\inf\{\|z^\nu-y\|\mid y\in\Sigma_{p(z^\nu)\cup(j,j)}\}\le\|z^\nu-\hat z\|$$

follows. By (44), this contradicts (50) for $\nu\in\mathbb N$ sufficiently large. $\Box$

Assumption 3 (Pleasant KKT System) There are $\omega\in(0,1]$ and $\delta>0$ so that, for any $p\in P_a(\Sigma)$,

$$\omega\,d[z,\Sigma_p]\le\inf_{f\in K_p}\|F_p(z)-f\|\quad\forall z\in\Sigma_p+\delta B.$$
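For the affine toy instance used after Theorem 2 ($F\equiv0$, $g(x)=x$, $n=m=1$, $p=0$), the residual $\inf_{f\in K_p}\|F_p(z)-f\|$ in Assumption 3 has a closed form: the equality components enter with their absolute value and the $\mathbb{R}_+$-components with their negative part. The sketch below (our own code) estimates $\omega$ empirically over all three patterns of this instance.

```python
import numpy as np

# Own sketch of Assumption 3's quantities for the affine instance with
# z = (x, u) and L(z) = -u. Patterns are pairs of frozensets over I = {0}.

def residual(pat, z):
    """inf_{f in K_p} ||F_p(z) - f|| in closed form."""
    g_set, u_set = pat
    x, u = z
    comps = [-u]                               # L(z) = 0 required
    comps += [x] if 0 in g_set else []         # g_g(x) = 0
    comps += [u] if 0 in u_set else []         # u_u = 0
    comps += [min(x, 0.0), min(u, 0.0)]        # g(x) >= 0, u >= 0
    return np.linalg.norm(comps)

def dist_Sigma_p(pat, z):
    x, u = z
    if pat == (frozenset(), frozenset({0})):   # Sigma_p = {(x, 0): x >= 0}
        return abs(u) if x >= 0.0 else np.hypot(x, u)
    return np.hypot(x, u)                      # otherwise Sigma_p = {(0, 0)}

pats = [(frozenset({0}), frozenset()), (frozenset(), frozenset({0})),
        (frozenset({0}), frozenset({0}))]
rng = np.random.default_rng(3)
omega = min(residual(p, z) / dist_Sigma_p(p, z)
            for p in pats
            for z in rng.uniform(-0.5, 0.5, size=(2000, 2))
            if dist_Sigma_p(p, z) > 1e-10)
print("empirical omega ~", omega)  # positive, as Assumption 3 requires
```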

Theorem 3 Let Assumption 3 be satisfied and suppose that $\Sigma_0\subseteq\Sigma$ is nonempty and compact. Moreover, let $\epsilon_1>0$ be given. Then, Assumption 1 is satisfied.

Proof. Let $N\ge1$ be arbitrary but fixed and define $\sigma$ and the sets $\Sigma_1,\Sigma_2\subseteq\Sigma$ by

$$\sigma := (1+\omega^{-1}N)^m,\qquad \Sigma_1 := (\Sigma_0+\epsilon_1B)\cap\Sigma,\qquad \Sigma_2 := (\Sigma_1+\sigma B)\cap\Sigma.$$

Since $\Sigma_0$ is compact by assumption, the same holds for $\Sigma_1$ and $\Sigma_2$. Therefore, with $\eta := \delta$ ($\delta>0$ from Assumption 3), Lemma 5 provides some $\kappa\in(0,\kappa_0]\subseteq(0,1]$ and we can define

$$\tau := \kappa\sigma^{-1} < 1. \tag{51}$$

Now, choose any $(y,t)\in\Sigma_1\times[0,\tau]$ and define vectors $z^0,\ldots,z^m$ and numbers $\sigma_0,\ldots,\sigma_m$ recursively as follows. First, let $z^0 := y$. To define $z^{k+1}$ from $z^k$ for $k\in\{0,\ldots,m-1\}$ choose

$$i_k\in\operatorname{argmin}\{g_i(x^k)+u_i^k\mid i\in I\setminus I_C(z^k)\}.$$

If

$$g_{i_k}(x^k)+u_{i_k}^k\le N\max\{\|y-z^k\|,t\}, \tag{52}$$

then set

$$p^k := p(z^k)\cup(i_k,i_k) \tag{53}$$

and choose

$$z^{k+1}\in\operatorname{argmin}\{\|z-z^k\|\mid z\in\Sigma_{p^k}\}. \tag{54}$$

Otherwise, set $z^{k+1} := z^k$. Finally, let

$$\sigma_k := (1+\omega^{-1}N)^k \tag{55}$$

for $k=0,\ldots,m$. We now show by induction that $z^0,\ldots,z^m$ are well defined and that

$$\|y-z^k\|\le\sigma_kt\quad\text{and}\quad z^k\in\Sigma_2 \tag{56}$$

holds for $k=0,1,\ldots,m$. For $k := 0$ we have $z^0 = y$ and $\sigma_0 = 1$ so that (56) is obviously satisfied. Now, let (56) be valid for some $k\in\{0,\ldots,m-1\}$. If (52) is violated, then $z^{k+1} = z^k$ and (56) must hold for $k+1$ as well. Therefore, we only need to consider the case that (52) is satisfied. In view of $\sigma_k\ge1$, $t\in[0,\tau]$, (56), (55), and (51), this implies

$$g_{i_k}(x^k)+u_{i_k}^k\le N\sigma_kt\le N(1+\omega^{-1}N)^k\tau\le\sigma_{k+1}\tau\le\sigma\tau = \kappa. \tag{57}$$

Therefore, and since $z^k\in\Sigma_2$, we can apply Lemma 5. Together with (53),

$$p(z^k)\prec p^k = p(z^k)\cup(i_k,i_k)\in P_a(\Sigma) \tag{58}$$

follows. Thus, by Lemma 3 c), the closed set $\Sigma_{p^k}$ is nonempty so that $z^{k+1}$ is well defined by (54). Moreover, (43) in Lemma 5 (with $\eta := \delta$) gives

$$\inf\{\|z^k-s\|\mid s\in\Sigma_{p^k}\} = \|z^k-z^{k+1}\|\le\eta = \delta.$$

Hence, since $z^{k+1}\in\Sigma_{p^k}$, Assumption 3 can be exploited for $z := z^k$ and $p := p^k$ and leads to

$$\|z^k-z^{k+1}\| = d[z^k,\Sigma_{p^k}]\le\omega^{-1}\inf_{f\in K_{p^k}}\|F_{p^k}(z^k)-f\|.$$

Due to $z^k\in\Sigma_{p(z^k)}$ and (53), the definition of $F_{p^k}$ yields

$$\inf_{f\in K_{p^k}}\|F_{p^k}(z^k)-f\|\le g_{i_k}(x^k)+u_{i_k}^k$$

so that, with (57),

$$\|z^k-z^{k+1}\|\le\omega^{-1}N\sigma_kt$$

follows. Therefore, by (56), $t\in[0,\tau]$, $\tau<1$ from (51), and (55), we have

$$\|y-z^{k+1}\|\le\|y-z^k\|+\|z^k-z^{k+1}\|\le(1+\omega^{-1}N)\sigma_kt = \sigma_{k+1}t\le\sigma.$$

Thus, (56) is true for $k+1$ instead of $k$ and so for all $k\in\{0,\ldots,m\}$.

From (58) for $k=0,\ldots,m-1$ it follows that

$$p(z^k)\prec p^k\preceq p(z^{k+1})$$

for all $k\in\{0,\ldots,m-1\}$ for which (52) is satisfied. Together with $I = g(z^0)\cup u(z^0)$ for $p(z^0) = (g(z^0),u(z^0))$, we have that there is $k_0\in\{0,\ldots,m\}$ so that

$$p(z^k)\in P_{\max}(\Sigma)\quad\forall k\in\{k_0,\ldots,m\}.$$

According to Lemma 4 this means that

$$g_i(x^k)+u_i^k\ge\kappa_0\quad\text{for all }i\in I\setminus I_C(z^k)\ \text{and all }k\in\{k_0,\ldots,m\}.$$

Therefore, (52) is violated for all $k\in\{k_0,\ldots,m\}$. In particular, together with (56) for $k=m$, it follows that

$$\|y-z^m\|\le\sigma t\quad\text{and}\quad g_i(x^m)+u_i^m > N\max\{\|y-z^m\|,t\}\quad\text{for all }i\in I\setminus I_C(z^m).$$

Hence, $y_t := z^m$ has exactly the properties required in Assumption 1. $\Box$

Corollary 1 Suppose that $\Sigma_0\subseteq\Sigma$ is nonempty and compact. Moreover, let Assumptions 2 and 3 be satisfied. If there are $\epsilon,\mu>0$ so that

$$\mu\,d[z,\Sigma]\le\|\Phi(z)\|\quad\forall z\in\Sigma_0+\epsilon B,$$

then there are $\bar\epsilon,\bar\mu>0$ so that

$$\bar\mu\,d[z,\Sigma]\le\|\nabla\Psi(z)\|\quad\forall z\in\Sigma_0+\bar\epsilon B.$$

Proof. The assertion directly follows from Theorem 2 and Theorem 3. $\Box$

For affine KKT systems, i.e., if $F$, $g$, and $h$ are affine functions, Corollary 1 can be simplified as follows.

Corollary 2 Suppose that $\Sigma_0\subseteq\Sigma$ is nonempty and compact. If the KKT system is affine, then there are $\bar\epsilon,\bar\mu>0$ so that

$$\bar\mu\,d[z,\Sigma]\le\|\nabla\Psi(z)\|\quad\forall z\in\Sigma_0+\bar\epsilon B.$$

Proof. Since $F$, $g$, and $h$ are affine, the function $F_p$ is affine for any $p\in P$. Thus, Assumption 2 is satisfied. Moreover, by Lemma 3 c), $\Sigma_p$ is nonempty for all $p\in P_a(\Sigma)$. Therefore, Hoffman's error bound [6] for affine systems of inequalities ensures that, for any $p\in P_a(\Sigma)$, there are $\delta_p,\omega_p>0$ so that

$$\omega_p\,d[z,\Sigma_p]\le\inf_{f\in K_p}\|F_p(z)-f\|\quad\forall z\in\Sigma_p+\delta_pB.$$

Since $P_a(\Sigma)$ is a finite set, Assumption 3 is satisfied with $\delta := \min\{\delta_p\mid p\in P_a(\Sigma)\}$ and $\omega := \min\{\omega_p\mid p\in P_a(\Sigma)\}$. Hence, by Theorem 3, Assumption 1 holds for any $\epsilon_1>0$. Altogether, Theorem 2 provides the desired result. $\Box$

Affine KKT systems are of particular interest for future research. We think that the compactness of $\Sigma_0\subseteq\Sigma$, as often assumed in this paper, can be removed for affine KKT systems. Another improvement of the results in this section might be obtained by using a more local version of Assumption 3 so that not all $\Sigma_p$ with $p\in P_a(\Sigma)$ occur.

References

[1] Clarke, F.H. (1983): Optimization and Nonsmooth Analysis. John Wiley & Sons, New York

[2] Facchinei, F., Soares, J. (1997): A new merit function for nonlinear complementarity problems and a related algorithm. SIAM Journal on Optimization, 7, 225-247

[3] Fischer, A. (2001): Local behavior of an iterative framework for generalized equations with nonisolated solutions. Applied Mathematics Report 203, Department of Mathematics, University of Dortmund, Dortmund (revised 2002)

[4] Fischer, A. (2002): Limiting behavior of an algorithmic framework for Karush-Kuhn-Tucker systems. Forthcoming

[5] Geiger, C., Kanzow, C. (1996): On the resolution of monotone complementarity problems. Computational Optimization and Applications, 5, 155-173

[6] Hoffman, A.J. (1952): On approximate solutions of systems of linear inequalities. Journal of Research of the National Bureau of Standards, 49, 263-265

[7] Yamashita, N., Fukushima, M. (2001): On the rate of convergence of the Levenberg-Marquardt method. Computing, 15, 239-249