A Constraint-Reduced Variant of Mehrotra's Predictor-Corrector Algorithm

Luke B. Winternitz, Stacey O. Nicholls, André L. Tits, Dianne P. O'Leary

September 24, 2007

Abstract

Consider linear programs in dual standard form with n constraints and m variables. When typical interior-point algorithms are used for the solution of such problems, updating the iterates, using direct methods for solving the linear systems and assuming a dense constraint matrix A, requires O(nm^2) operations. When n ≫ m it is often the case that at each iteration most of the constraints are not very relevant for the construction of a good update and could be ignored to achieve computational savings. This idea was considered in the 1990s by Dantzig and Ye, Tone, Kaliski and Ye, den Hertog et al. and others. More recently, Tits et al. proposed a simple constraint-reduction scheme and proved global and local quadratic convergence for a dual-feasible primal-dual affine-scaling method modified according to that scheme. In the present work, similar convergence results are proved for a dual-feasible constraint-reduced variant of Mehrotra's predictor-corrector algorithm. Some promising numerical results are reported.

(This work was supported by NSF grant DMI0422931 and DoE grant DEFG0204ER25655. The work of the first author was supported by NASA under the Goddard Space Flight Center Study Fellowship Program. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation, those of the US Department of Energy, or those of NASA. Affiliations: Department of Electrical and Computer Engineering and the Institute for Systems Research, University of Maryland College Park (lukewinternitz@gmail.com, andre@umd.edu); Applied Mathematics and Scientific Computing Program, University of Maryland College Park (son@math.umd.edu); Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland College Park (oleary@cs.umd.edu).)

1 Introduction

Consider the primal and dual standard forms of linear programming (LP):

    min c^T x  s.t.  Ax = b, x ≥ 0,      and      max b^T y  s.t.  A^T y ≤ c,                    (1)

where A is an m × n matrix with n ≫ m; that is, the dual problem has many more inequality constraints than variables. We assume b ≠ 0. (This assumption is benign, since if b = 0 the problem at hand is readily solved: any dual-feasible point y^0 (assumed available for the algorithm analyzed in this paper) is dual optimal and x = 0 is primal optimal.) The dual problem can alternatively be written in the form (with slack variable s)

    max b^T y  s.t.  A^T y + s = c, s ≥ 0.                    (2)
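To fix ideas, the following small Python script (ours, not part of the paper) builds a random instance of the dual standard form (2) with n ≫ m, together with a strictly dual-feasible starting point of the kind assumed throughout the paper. The specific dimensions and random construction are illustrative only.

    import numpy as np

    # Illustrative setup: a dual-standard-form LP  max b^T y  s.t.  A^T y <= c,
    # with many more constraints (n) than variables (m).
    rng = np.random.default_rng(0)
    m, n = 20, 5000
    A = rng.standard_normal((m, n))          # constraint normals a_i are the columns of A
    y0 = np.zeros(m)                         # candidate dual starting point
    c = A.T @ y0 + rng.uniform(1.0, 2.0, n)  # choose c so that y0 is strictly dual feasible
    b = A @ rng.uniform(0.5, 1.5, n)         # b = A x0 for some x0 > 0, so the primal is feasible too

    s0 = c - A.T @ y0                        # slack vector; s0 > 0 by construction
    assert np.all(s0 > 0)
    print("strictly dual-feasible start, min slack =", s0.min())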

Some of the most effective algorithms for solving LPs are the primal-dual interior-point methods (PDIPMs), which apply Newton's method, or variations thereof, to the perturbed Karush-Kuhn-Tucker (KKT) optimality conditions for the primal-dual pair (1):

    A^T y + s − c = 0,
    Ax − b = 0,                    (3)
    Xs − τe = 0,
    (x, s) ≥ 0,

with X = diag(x), S = diag(s), e the vector of all ones, and τ a positive parameter. As τ ranges over (0, ∞), the unique (if it exists) solution (x, y, s) to this system traces out the primal-dual central path. (System (3) has a unique solution for each τ > 0, equivalently for some τ > 0, if there exists (x, y, s) with Ax = b, A^T y + s = c and (x, s) > 0 [Wri97, Thm. 2.8, p. 39]. This is the so-called Slater or interior-point condition.) Newton-type steps for system (3) are obtained by solving one or more linear systems of the form

    [ 0  A^T  I ] [ Δx ]   [ f ]
    [ A  0    0 ] [ Δy ] = [ g ],                    (4)
    [ S  0    X ] [ Δs ]   [ h ]

where f, g, and h are certain vectors of appropriate dimension. System (4) is often solved by first eliminating Δs, giving the augmented system

    [ A   0         ] [ Δx ]   [ g      ]
    [ S  −X A^T     ] [ Δy ] = [ h − Xf ],        Δs = f − A^T Δy,                    (5)

or by further eliminating Δx, giving the normal system

    A S^{-1} X A^T Δy = g − A S^{-1}(h − Xf),
    Δs = f − A^T Δy,                    (6)
    Δx = S^{-1}(h − X Δs).

[Figure 1: A view of the y space when m = 2 and n = 12. The arrow indicates the direction of vector b. The two active constraints are critical and define the solution, while the others are redundant or perhaps not very relevant for the formation of good search directions. Constraints in the figure are labeled "active", "redundant", and "irrelevant?".]

When n ≫ m, a drawback of most interior-point methods is that the computational cost of determining a step is rather high. For example, in the context of PDIPMs, if we choose to solve (6) by a direct method and A is dense, the most expensive computation is forming the normal matrix A S^{-1} X A^T, which costs O(nm^2) operations. This computation involves forming the sum

    A S^{-1} X A^T = Σ_{i=1}^{n} (x_i / s_i) a_i a_i^T,                    (7)

where a_i is the ith column of A and x_i and s_i are the ith components of x and s respectively, so that each term of the sum corresponds to a particular constraint in the dual problem. Note, however, that we expect most of the n constraints to be redundant or not very relevant for the formation of a good search direction (see Figure 1). In (7), if we were to select a small set of q < n important constraints and compute only the corresponding partial sum, then the work would be reduced to O(qm^2) operations.
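The cost comparison is easy to see in code. The following sketch (ours; the choice of working set is a placeholder) forms the full normal matrix (7) and a reduced version built from only q selected columns, which is the O(qm^2) computation referred to above.

    import numpy as np

    rng = np.random.default_rng(1)
    m, n, q = 20, 5000, 60
    A = rng.standard_normal((m, n))
    x = rng.uniform(0.5, 1.5, n)          # current primal iterate, x > 0
    s = rng.uniform(0.5, 1.5, n)          # current dual slacks, s > 0

    # Full normal matrix: A S^{-1} X A^T = sum_i (x_i/s_i) a_i a_i^T, O(n m^2) work.
    d = x / s
    M_full = (A * d) @ A.T

    # Constraint-reduced version: keep only a working set of q columns
    # (here, simply the q smallest slacks), O(q m^2) work.
    Q = np.argsort(s)[:q]
    M_red = (A[:, Q] * d[Q]) @ A[:, Q].T

    print(M_full.shape, M_red.shape)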

Similar possibilities arise in other interior-point methods: by somehow ignoring most of the constraints, we may hope that a good step can still be computed, at significantly reduced cost. (Such a step may even be better: see [DNPT06] for evidence of potential harm caused by redundant constraints.) This observation is the basis of the present paper. In the sequel we refer to methods that attempt such computations as constraint-reduced.

Prior work investigating this question started at least as far back as Dantzig and Ye [DY91], who proposed a build-up variant of a dual affine-scaling algorithm. In their scheme, at each iteration, starting with a small working set of constraints, a dual affine-scaling step is computed. If this step is feasible with respect to the full constraint set, then it is taken. Otherwise, more constraints are added to the working set and the process is repeated. Convergence of this method was shown to follow from prior convergence results on the dual affine-scaling algorithm. At about the same time, Tone [Ton93] developed an active-set version of Ye's dual potential-reduction (DPR) algorithm [Ye91]. There, starting with a small working set of constraints, a DPR-type search direction is computed. If a step along this direction gives a sufficient decrease of the potential function, then it is accepted. Otherwise, more constraints are added to the working set and the process is repeated. Convergence and complexity results are essentially inherited from the properties of the DPR algorithm. Kaliski and Ye [KY93] considered a variant of Tone's algorithm and investigated applying the method to large-scale transportation problems. Remarkable computational results were obtained. A different approach was used by den Hertog et al. [dHRT94], who proposed a build-up and down path-following algorithm based on a dual logarithmic barrier method. Starting from an interior dual-feasible point, the central path corresponding to a small set of working constraints is followed until it becomes infeasible with respect to the full constraint set, whereupon the working set is appropriately updated and the process restarts from the previous iterate. The authors proved an O(√q log(1/ε)) iteration complexity bound for this algorithm, where q is the maximum size of the working constraint set during the iteration. Notably, this suggests that both the computational cost per iteration and the iteration complexity may be reduced. However, it appears that in this algorithm the only sure upper bound on q is n.

A common component of [DY91, Ton93, dHRT94] is the backtracking that adds constraints and tries again when the step generated using the working constraint set fails to pass certain acceptability tests. In contrast, no such backtracking is used in [TAW06], where the authors considered constraint reduction for primal-dual algorithms. In particular, they proposed constraint-reduced versions of a primal-dual affine-scaling algorithm (rPDAS) and of Mehrotra's Predictor-Corrector algorithm (rMPC).
As in [DY91, Ton93, dHRT94], at each iteration, rPDAS and rMPC use a small working set of constraints to generate a step, but this step is not subjected to acceptability tests; it is simply taken. This has the advantage that the cost per iteration can be guaranteed to be cheaper than when the full constraint set is used; however, it may preclude polynomial complexity results such as were obtained in [Ton93] and [dHRT94]. Global and local quadratic convergence of rPDAS was proved in [TAW06] (under nondegeneracy assumptions) using a nonlinear-programming-inspired line of argument [Her82, PTH88], and promising numerical results were reported.

To our knowledge, aside from the analysis of rPDAS in [TAW06], no attempts have been made to date at analyzing constraint-reduced versions of PDIPMs, the leading class of interior-point methods over the past decade. This observation applies in particular to the current champion, Mehrotra's Predictor-Corrector algorithm (MPC, [Meh92]), which combines an adaptive choice of the perturbation parameter τ in (3), a second-order correction to the Newton direction, and several ingenious heuristics that together have proven to be extremely effective. Investigations of the convergence properties of variants of MPC are reported in [Meh92, ZZ95, ZZ96, SPT05, ST05, Car04].

In the present paper, we follow the line of analysis of [Her82, PTH88, TAW06] to analyze a proposed dual-feasible constraint-reduced version of MPC, inspired by rMPC of [TAW06], which we term rMPC*. The main contribution is this analysis, and also a somewhat different, and perhaps more natural, perspective on the notion of constraint reduction than was put forth in [TAW06] (see Remark 2.1 below). We prove global and local quadratic convergence of rMPC* under certain nondegeneracy assumptions. We also report on numerical experiments investigating the performance of rMPC* on randomly generated LPs and on LPs arising from the discretization of a class of semi-infinite linear programming problems. Various rules for choosing the working constraint set are proposed and investigated numerically.

The notation in this paper is mostly standard. We use ‖·‖ to denote the 2-norm or its induced operator norm. Given a vector x ∈ R^n, we let X = diag(x) denote the diagonal n × n matrix with x on its main diagonal. We define n := {1, 2, ..., n} and, given any index set Q ⊆ n, we use A_Q to denote the m × |Q| (where |Q| is the cardinality of Q) matrix obtained from A by deleting all columns a_i with i ∉ Q. Similarly, we use x_Q and s_Q to denote the vectors of size |Q| obtained from x and s by deleting all entries x_i and s_i with i ∉ Q. We define e to be the column vector of ones, with length determined by context. For a vector v, [v]^- is defined by ([v]^-)_i := min{v_i, 0}. Further, we define the dual feasible, dual strictly feasible, and dual solution sets, respectively, as

    F := {y ∈ R^m | A^T y ≤ c},    F^o := {y ∈ R^m | A^T y < c},    F* := {y ∈ F | b^T y ≥ b^T w for all w ∈ F}.

The active set at y ∈ F is

    I(y) := {i ∈ n | a_i^T y = c_i}.

We term a vector y ∈ R^m stationary if A^T y ≤ c and there exists x ∈ R^n with Ax = b and X(c − A^T y) = 0. Such an x is called a multiplier associated to the stationary point y. (A stationary vector y belongs to F* if and only if x ≥ 0.) Lowercase k always indicates an iteration count, and limits of the form y^k → y* are meant as k → ∞. Uppercase K generally refers to an infinite index set and the qualification "on K" is synonymous with "for k ∈ K". In particular, y^k → y* on K means y^k → y* as k → ∞, k ∈ K. Finally, we define

    J(G, u, v) := [ G        0     0       ]
                  [ 0        G^T   I       ]                    (8)
                  [ diag(v)  0     diag(u) ]

and

    J_a(G, u, v) := [ G        0               ]
                    [ diag(v)  −diag(u) G^T    ]                    (9)

for any matrix G and vectors u and v of compatible dimensions (cf. systems (4) and (5)).

The rest of the paper is structured as follows. Section 2 contains the definition and discussion of algorithm rMPC*. Sections 3 and 4 contain the global and local convergence analyses, respectively. Some numerical results are presented in section 5, and conclusions are drawn in section 6.
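The notation above maps onto simple array operations. The following Python helpers (ours, purely illustrative) mirror A_Q, [v]^-, and I(y); the tolerance in the active-set test is our own addition, since exact equality is not meaningful in floating point.

    import numpy as np

    def A_Q(A, Q):
        """Columns of A indexed by the working set Q (the paper's A_Q)."""
        return A[:, Q]

    def neg_part(v):
        """[v]^- : componentwise min(v_i, 0)."""
        return np.minimum(v, 0.0)

    def active_set(A, y, c, tol=1e-9):
        """I(y): indices i with a_i^T y = c_i (within a tolerance)."""
        return np.flatnonzero(np.abs(c - A.T @ y) <= tol)

    A = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
    c = np.array([1.0, 1.0, 2.0])
    y = np.array([1.0, 0.0])
    print(active_set(A, y, c))                  # -> [0]: only the first constraint is active
    print(neg_part(np.array([0.3, -0.2, 0.0]))) # -> [ 0.  -0.2  0. ]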

2 A Constraint-Reduced MPC Algorithm

2.1 A convergent variant of MPC

Our proposed algorithm, rMPC*, is based on the implementation of MPC discussed in [Wri97, Ch. 10], which we state here for ease of reference.

Iteration MPC [Meh92, Wri97].

Parameter. β ∈ (0, 1).

Data. y ∈ R^m; s ∈ R^n with s > 0; x ∈ R^n with x > 0; µ := x^T s / n.

Step 1. Compute the affine-scaling direction, i.e., solve

    [ 0  A^T  I ] [ Δx^a ]   [ c − A^T y − s ]
    [ A  0    0 ] [ Δy^a ] = [ b − Ax        ]                    (10)
    [ S  0    X ] [ Δs^a ]   [ −Xs           ]

for (Δx^a, Δy^a, Δs^a) and set

    t_p^a := arg max{t ∈ [0, 1] | x + tΔx^a ≥ 0},                    (11)
    t_d^a := arg max{t ∈ [0, 1] | s + tΔs^a ≥ 0}.                    (12)

Step 2. Compute the centering parameter

    σ := (µ^a/µ)^3,                    (13)

where µ^a := (x + t_p^a Δx^a)^T (s + t_d^a Δs^a)/n.

Step 3. Compute the centering/corrector direction, i.e., solve

    [ 0  A^T  I ] [ Δx^c ]   [ 0                  ]
    [ A  0    0 ] [ Δy^c ] = [ 0                  ]                    (14)
    [ S  0    X ] [ Δs^c ]   [ σµe − ΔX^a Δs^a    ]

for (Δx^c, Δy^c, Δs^c).

Step 4. Form the total search direction

    (Δx^m, Δy^m, Δs^m) := (Δx^a, Δy^a, Δs^a) + (Δx^c, Δy^c, Δs^c),                    (15)

and set

    t̄_p^m := arg max{t ∈ [0, 1] | x + tΔx^m ≥ 0},                    (16)
    t̄_d^m := arg max{t ∈ [0, 1] | s + tΔs^m ≥ 0}.                    (17)

Step 5. Update the variables: set

    t_p^m := β t̄_p^m,    t_d^m := β t̄_d^m,                    (18)

set

    (x^+, y^+, s^+) := (x, y, s) + (t_p^m Δx^m, t_d^m Δy^m, t_d^m Δs^m),                    (19)

and compute

    µ^+ := (x^+)^T (s^+) / n.                    (20)
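For concreteness, here is a compact Python sketch (ours, not the paper's code) of one iteration of the MPC variant stated above, computing the Newton steps through the dense normal-equations form (6). The damping parameter value is a placeholder.

    import numpy as np

    def max_step(v, dv):
        """Largest t in [0,1] with v + t*dv >= 0 (ratio test, cf. (11)-(12), (16)-(17))."""
        neg = dv < 0
        return min(1.0, (-v[neg] / dv[neg]).min()) if neg.any() else 1.0

    def mpc_iteration(A, b, c, x, y, s, beta=0.995):
        """One MPC iteration as stated above (a sketch; dense normal equations)."""
        m, n = A.shape
        mu = x @ s / n
        d = x / s                                # diagonal of S^{-1} X
        M = (A * d) @ A.T                        # normal matrix A S^{-1} X A^T

        def solve_newton(f, g, h):
            # Normal-equations form (6) of the Newton system (4).
            dy = np.linalg.solve(M, g - A @ ((h - x * f) / s))
            ds = f - A.T @ dy
            dx = (h - x * ds) / s
            return dx, dy, ds

        # Step 1: affine-scaling (predictor) direction.
        f, g, h = c - A.T @ y - s, b - A @ x, -x * s
        dxa, dya, dsa = solve_newton(f, g, h)
        tpa, tda = max_step(x, dxa), max_step(s, dsa)

        # Step 2: centering parameter sigma = (mu_a / mu)^3.
        mu_a = (x + tpa * dxa) @ (s + tda * dsa) / n
        sigma = (mu_a / mu) ** 3

        # Step 3: centering/corrector direction.
        dxc, dyc, dsc = solve_newton(np.zeros(n), np.zeros(m), sigma * mu - dxa * dsa)

        # Steps 4-5: combined direction, damped step lengths, update.
        dx, dy, ds = dxa + dxc, dya + dyc, dsa + dsc
        tp, td = beta * max_step(x, dx), beta * max_step(s, ds)
        return x + tp * dx, y + td * dy, s + td * ds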

Algorithm MPC is of the infeasible type, in that it does not require the availability of a feasible initial point. (In MPC and rMPC*, and most PDIPMs for that matter, dual (primal) feasibility of the initial iterate implies dual (primal) feasibility of all subsequent iterates.) In contrast, the global convergence analysis for Algorithm rMPC* (see section 3 below) critically relies on the monotonic increase of the dual objective b^T y from iteration to iteration, and for this we do need a dual-feasible initial point.

As stated, Iteration MPC has no known convergence guarantees. Previous approaches to providing such guarantees involve introducing certain safeguards or modifications [Meh92, ZZ95, ZZ96, SPT05, ST05, Car04]. We do this here as well. Specifically, aside from the constraint-reduction mechanism (to be discussed in section 2.2), Iteration rMPC* proposed below has four differences from Iteration MPC, all motivated by the structure of the convergence analysis adapted from [Her82, PTH88, TZ94, TAW06]. These differences, which occur in Steps 2, 4, and 5, are discussed next. Numerical experience suggests that they do not negatively affect the performance of the algorithm.

The first difference, in Step 2, is the formula for the centering parameter σ. Instead of using (13), we set

    σ := (1 − t^a)^λ,

where t^a := min{t_p^a, t_d^a} and λ ≥ 2 is a scalar algorithm parameter. This formula agrees with (13) when λ = 3, (x, y, s) is primal and dual feasible, and t^a = t_p^a = t_d^a. In general, both formulas result in similar empirical performance, while the new formula simplifies our analysis.

The second difference is in Step 4, where we introduce a mixing parameter γ ∈ (0, 1] and replace (15) with

    (Δx^m, Δy^m, Δs^m) := (Δx^a, Δy^a, Δs^a) + γ(Δx^c, Δy^c, Δs^c).                    (21)

Nominally we want γ = 1, but we reduce γ as needed to enforce three properties of our algorithm that are essential for the analysis. The first such property is the monotonic increase of b^T y mentioned previously. While, given dual feasibility, it is readily verified that Δy^a is an ascent direction for b^T y (i.e., b^T Δy^a > 0), this may not be the case for Δy^m as defined in (15). To enforce monotonicity we choose γ ≤ γ_1, where γ_1 is the largest number in [0, 1] such that b^T(Δy^a + γ_1 Δy^c) ≥ θ b^T Δy^a, with θ ∈ (0, 1) an algorithm parameter. It is easily verified that γ_1 is given by

    γ_1 = 1                                              if b^T Δy^c ≥ 0,
    γ_1 = min{ 1, (1 − θ) b^T Δy^a / (−b^T Δy^c) }       if b^T Δy^c < 0.                    (22)

The second essential property addressed via the mixing parameter is that

    ‖Δy^a‖ small implies ‖Δy^m‖ and γσµ are also small.

This smallness property is enforced (along with the first property) by requiring γ ≤ γ_0, where

    γ_0 := min{ γ_1, ψ ‖Δy^a‖/‖Δy^c‖, ψ ‖Δy^a‖/(σµ) },                    (23)

and ψ > 0 is another algorithm parameter. The final property enforced by γ is that

    t_d^m ≥ ζ t_d^a,                    (24)

where ζ ∈ (0, 1) is a third algorithm parameter and t_d^m depends on γ via (17) and (21). We could choose γ to be the largest number in [0, γ_0] such that (24) holds, but this would seem to require a potentially expensive iterative procedure. Instead, rMPC* sets

    γ := γ_0                                                                            if t̄_{d,0}^m ≥ ζ t_d^a,
    γ := γ_0 (1 − ζ) t̄_{d,0}^m / ( (1 − ζ) t̄_{d,0}^m + (ζ t_d^a − t̄_{d,0}^m) )          if t̄_{d,0}^m < ζ t_d^a,                    (25)

where

    t̄_{d,0}^m := arg max{t ∈ [0, 1] | s + t(Δs^a + γ_0 Δs^c) ≥ 0}.                    (26)

Geometrically, if t̄_{d,0}^m ≥ ζ t_d^a then γ = γ_0, but otherwise γ ∈ [0, γ_0) and is selected in such a way that the search direction Δs^m = Δs^a + γΔs^c goes through the intersection of the line segment connecting s + ζ t_d^a Δs^a and s + ζ t_d^a (Δs^a + γ_0 Δs^c) with the feasible line segment connecting s + t_d^a Δs^a and s + t̄_{d,0}^m (Δs^a + γ_0 Δs^c). See Figure 2. Since the intersection point s + ζ t_d^a (Δs^a + γ Δs^c) is feasible, (24) will hold. In spite of these three requirements on γ, it is typical that γ = 1 in practice (with appropriate choice of algorithm parameters, as in section 5), except when aggressive constraint reduction is used (i.e., very few constraints are retained at each iteration).

[Figure 2: Enforcing t_d^m ≥ ζ t_d^a with γ. The positive orthant represents the feasible region s ≥ 0 in two-dimensional slack space. The top arrow shows the step taken from some s > 0 along the affine-scaling direction Δs^a. The bottom arrow is the step along the MPC step with mixing parameter γ_0. In this picture, the damping factor t̄_{d,0}^m is less than ζ t_d^a, so we do not choose γ = γ_0. Rather, we take a step along the direction from s that passes through the intersection of two lines: the line consisting of points of the form s + ζ t_d^a (Δs^a + γ Δs^c) with γ ∈ [0, γ_0], and the feasible line connecting s + t_d^a Δs^a and s + t̄_{d,0}^m (Δs^a + γ_0 Δs^c). The maximum feasible step along this direction has length t_d^m ≥ ζ t_d^a.]

The remaining two differences between rMPC* and MPC (aside from constraint reduction) are in Step 5. First, (18) is replaced by

    t_p^m := max{ β t̄_p^m, t̄_p^m − ‖Δy^a‖ }                    (27)

and similarly for t_d^m, to allow for local quadratic convergence. Second, the primal update is replaced by a componentwise clipped (from above and below) version of the primal update in (19). Namely, defining x̂ := x + t_p^m Δx^m and x̃^a := x + Δx^a, for all i ∈ n we update x_i to

    x_i^+ := min{ max{ x̂_i, min{ ξ_max, ‖Δy^a‖^ν + ‖[x̃^a]^-‖^ν } }, ξ̄ },                    (28)

where ν ≥ 2, ξ_max (small), and ξ̄ (large) are positive algorithm parameters. Formula (28) is adapted from [TAW06]. The lower bound, min{ξ_max, ‖Δy^a‖^ν + ‖[x̃^a]^-‖^ν}, ensures that the components of x remain bounded away from zero away from KKT points (which is crucial to the global convergence analysis), while allowing for local quadratic convergence. Parameter ξ_max, the maximum value of the lower bound, is not needed in our convergence analysis, but is important in practice; if ξ_max is set sufficiently small, then normally x^+ = x̂ and the resulting iteration emulates the behavior of Iteration MPC. The upper-bound parameter ξ̄ ensures boundedness of the primal sequence, which is needed in the analysis; in practice, performance is unaffected even if we set ξ̄ = +∞.
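The rules (22), (23), and (25)-(26) for the mixing parameter translate directly into code. The following Python sketch (ours) computes γ from the two directions; the parameter values for θ, ψ, ζ are placeholders, not the paper's recommendations.

    import numpy as np

    def max_step(v, dv):
        neg = dv < 0
        return min(1.0, (-v[neg] / dv[neg]).min()) if neg.any() else 1.0

    def mixing_parameter(b, dya, dyc, dsa, dsc, s, sigma, mu, t_a_d,
                         theta=0.1, psi=1e5, zeta=0.1):
        """Mixing parameter gamma of Step 4, per (22), (23) and (25)-(26) (a sketch)."""
        bya, byc = b @ dya, b @ dyc
        # (22): keep b^T dy^m >= theta * b^T dy^a (monotone dual objective).
        gamma1 = 1.0 if byc >= 0 else min(1.0, (1 - theta) * bya / (-byc))
        # (23): keep ||dy^m|| and gamma*sigma*mu small whenever ||dy^a|| is small.
        nrm = np.linalg.norm
        gamma0 = min(gamma1,
                     psi * nrm(dya) / nrm(dyc) if nrm(dyc) > 0 else gamma1,
                     psi * nrm(dya) / (sigma * mu) if sigma * mu > 0 else gamma1)
        # (25)-(26): enforce t^m_d >= zeta * t^a_d without an iterative search.
        t_md0 = max_step(s, dsa + gamma0 * dsc)
        if t_md0 >= zeta * t_a_d:
            return gamma0
        return gamma0 * (1 - zeta) * t_md0 / ((1 - zeta) * t_md0 + (zeta * t_a_d - t_md0))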

2.2 A constraint reduction mechanism

Given a working set Q ⊆ n of constraints and a dual-feasible point (x, y, s), we compute an MPC-type direction for the reduced primal-dual pair

    min c_Q^T x_Q  s.t.  A_Q x_Q = b, x_Q ≥ 0,      and      max b^T y  s.t.  A_Q^T y + s_Q = c_Q, s_Q ≥ 0.                    (29)

To that effect, we first compute the reduced affine-scaling direction by solving

    [ 0    A_Q^T  I   ] [ Δx_Q^a ]   [ 0            ]
    [ A_Q  0      0   ] [ Δy^a   ] = [ b − A_Q x_Q  ]                    (30)
    [ S_Q  0      X_Q ] [ Δs_Q^a ]   [ −X_Q s_Q     ]

and then the reduced centering/corrector direction by solving

    [ 0    A_Q^T  I   ] [ Δx_Q^c ]   [ 0                        ]
    [ A_Q  0      0   ] [ Δy^c   ] = [ 0                        ],                    (31)
    [ S_Q  0      X_Q ] [ Δs_Q^c ]   [ σµ_Q e − ΔX_Q^a Δs_Q^a   ]

where µ_Q := (x_Q)^T (s_Q)/|Q|. As discussed above, we combine these components using the mixing parameter γ to get our primal and dual search directions:

    (Δx_Q^m, Δy^m, Δs_Q^m) := (Δx_Q^a, Δy^a, Δs_Q^a) + γ(Δx_Q^c, Δy^c, Δs_Q^c).                    (32)
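A minimal Python sketch (ours) of how the reduced affine-scaling direction can be obtained through the reduced normal matrix A_Q S_Q^{-1} X_Q A_Q^T (this is the normal-equations route written out as (62) further below); the slack direction is extended to all of n via Δs = −A^T Δy, which is what dual feasibility requires of the off-working-set components.

    import numpy as np

    def reduced_affine_direction(A, b, x, s, Q):
        """Reduced affine-scaling direction of (30), via its normal-equations form
        (a sketch; assumes a dual-feasible iterate, i.e. s = c - A^T y > 0)."""
        AQ = A[:, Q]
        dQ = x[Q] / s[Q]
        dy = np.linalg.solve((AQ * dQ) @ AQ.T, b)   # A_Q S_Q^{-1} X_Q A_Q^T dy = b
        ds = -A.T @ dy                              # ds_Q = -A_Q^T dy; off Q this keeps
                                                    # A^T y^+ + s^+ = c along the step
        dxQ = -x[Q] - dQ * ds[Q]                    # dx_Q = -x_Q - S_Q^{-1} X_Q ds_Q
        return dxQ, dy, ds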

This leaves the search direction in the n \ Q components of Δx^m and Δs^m unspecified. However, using an update of the form (19) and requiring dual feasibility from iteration to iteration requires that we set Δs_{n\Q}^a := −A_{n\Q}^T Δy^a and Δs_{n\Q}^c := −A_{n\Q}^T Δy^c. Thus, we augment (32) accordingly, yielding the search direction for x, y, and s,

    (Δx^m, Δy^m, Δs^m) = (Δx^a, Δy^a, Δs^a) + γ(Δx^c, Δy^c, Δs^c).                    (33)

However, we do not update x_{n\Q} by taking a step along a computed direction. Rather, inspired by an idea in [Ton93], we first consider the update

    x_i^+ := µ^+ / s_i^+,    i ∈ n \ Q,    with µ^+ := (x_Q^+)^T (s_Q^+)/|Q|.

This makes (x_{n\Q}^+, s_{n\Q}^+) perfectly centered. Indeed,

    (x^+)^T (s^+)/n = ( (x_Q^+)^T (s_Q^+) + (x_{n\Q}^+)^T (s_{n\Q}^+) )/n = ( |Q| µ^+ + Σ_{i∈n\Q} x_i^+ s_i^+ )/n = ( |Q| µ^+ + |n \ Q| µ^+ )/n = µ^+,

and hence x_i^+ s_i^+ = µ^+ = (x^+)^T (s^+)/n for all i ∈ n \ Q. However, since the analysis requires that the primal iterates remain bounded, we use instead, for i ∈ n \ Q,

    x̂_i := µ^+ / s_i^+,    x_i^+ := min{ x̂_i, ξ̄ }.                    (34)

Like the upper bound in (28), the bound ξ̄ in (34) was never active in numerical tests (when chosen appropriately large).

Remark 2.1. A somewhat different approach to constraint reduction, where the motivating idea of ignoring irrelevant constraints is less prominent, is used in [TAW06]. There, instead of the reduced systems (30)-(31), full systems of equations of the form (4) are solved via the corresponding normal systems (6), only with the normal matrix A S^{-1} X A^T replaced by the reduced normal matrix A_Q S_Q^{-1} X_Q A_Q^T. Possible benefits of the approach taken here in rMPC* are: 1) the [TAW06] approach is essentially tied to the normal equations, whereas our approach is not; 2) if we do solve the normal equations (62) (below), there is a (mild) computational savings over algorithm rMPC of [TAW06]; and 3) initial computational experiments suggest that rMPC* is at least as efficient as rMPC in practice.

Before formally defining Iteration rMPC*, we define the set of admissible working sets Q. Here we follow [TAW06] in requiring that Q contain m most nearly active constraints. (Of course, nearness to activity can be measured in different ways. Here, by "most active" constraints we mean those having the smallest slack values. When the columns of A are normalized to unit 2-norm, the slack in a constraint is just the Euclidean distance to the constraint boundary. Also see Remark 2.3 below on invariance under scaling.) However, we depart from [TAW06] in that we also require Q to satisfy rank(A_Q) = m. (In [TAW06] this rank condition is still needed, but is enforced differently, through a rather strong assumption on A.) Specifically, we require that Q be selected from the set

    Q(y) := { Q ⊆ n | rank(A_Q) = m, and Q ⊇ Q' for some Q' with |Q'| = m and (c − A^T y)_i ≤ (c − A^T y)_j for all i ∈ Q', j ∉ Q' }.                    (35)

In words, A_Q must have full rank and Q must contain m most nearly active constraints.
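One simple way to produce a member of Q(y) is sketched below (ours, not the paper's rule): sort constraints by slack, keep at least the q most active, and enlarge the set until A_Q reaches full row rank. Tie-breaking and the rank test can of course be handled in other ways.

    import numpy as np

    def choose_working_set(A, y, c, q):
        """Return a working set consistent with rule (35): it contains the most
        nearly active constraints (smallest slacks) and A_Q has full row rank
        (a sketch; the set is grown if the smallest-slack columns are rank deficient)."""
        m, n = A.shape
        order = np.argsort(c - A.T @ y)      # constraint indices sorted by slack
        k = max(q, m)
        while k <= n:
            Q = order[:k]
            if np.linalg.matrix_rank(A[:, Q]) == m:
                return np.sort(Q)
            k += 1
        raise ValueError("A does not have full row rank")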

We are now ready to state Iteration rMPC*.

Iteration rMPC*.

Parameters. β ∈ (0, 1), θ ∈ (0, 1), ψ > 0, ζ ∈ (0, 1), λ ≥ 2, ν ≥ 2, ξ_max > 0, and ξ̄ > 0.

Data. y ∈ R^m and s ∈ R^n such that A^T y + s = c and s > 0; x ∈ R^n such that x > 0; Q ∈ Q(y); µ_Q := (x_Q)^T (s_Q)/|Q|.

Step 1. Compute the affine-scaling direction, i.e., solve (30) for (Δx_Q^a, Δy^a, Δs_Q^a), set Δs_{n\Q}^a := −A_{n\Q}^T Δy^a, and set

    t_p^a := arg max{t ∈ [0, 1] | x_Q + tΔx_Q^a ≥ 0},                    (36)
    t_d^a := arg max{t ∈ [0, 1] | s + tΔs^a ≥ 0},                    (37)
    t^a := min{t_p^a, t_d^a}.                    (38)

Step 2. Compute the centering parameter

    σ := (1 − t^a)^λ.                    (39)

Step 3. Compute the centering/corrector direction, i.e., solve (31) for (Δx_Q^c, Δy^c, Δs_Q^c) and set Δs_{n\Q}^c := −A_{n\Q}^T Δy^c.

Step 4. Form the total search direction

    (Δx_Q^m, Δy^m, Δs^m) := (Δx_Q^a, Δy^a, Δs^a) + γ(Δx_Q^c, Δy^c, Δs^c),                    (40)

where γ is as in (25), with µ (in (23)) replaced by µ_Q. Set

    t̄_p^m := arg max{t ∈ [0, 1] | x_Q + tΔx_Q^m ≥ 0},                    (41)
    t̄_d^m := arg max{t ∈ [0, 1] | s + tΔs^m ≥ 0}.                    (42)

Step 5. Update the variables: set

    t_p^m := max{ β t̄_p^m, t̄_p^m − ‖Δy^a‖ },                    (43)
    t_d^m := max{ β t̄_d^m, t̄_d^m − ‖Δy^a‖ },                    (44)

and set

    (x̂_Q, y^+, s^+) := (x_Q, y, s) + (t_p^m Δx_Q^m, t_d^m Δy^m, t_d^m Δs^m).                    (45)

For each i ∈ Q, set

    x_i^+ := min{ max{ x̂_i, min{ ξ_max, ‖Δy^a‖^ν + ‖[x̃^a]^-‖^ν } }, ξ̄ },                    (46)

where x̃^a is defined by

    x̃_i^a := x_i + Δx_i^a,  i ∈ Q;    x̃_i^a := 0,  i ∈ n \ Q.                    (47)

Set

    µ^+ := (x_Q^+)^T (s_Q^+)/|Q|                    (48)

and, for each i ∈ n \ Q, set

    x̂_i := µ^+ / s_i^+,                    (49)
    x_i^+ := min{ x̂_i, ξ̄ }.                    (50)

Step 6. Choose a new set of constraints Q^+ ∈ Q(y^+) and compute

    µ_{Q^+}^+ := (x_{Q^+}^+)^T (s_{Q^+}^+)/|Q^+|.                    (51)
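The primal update of Step 5 combines the clipped rule (46) on the working set with the centered-then-clipped rule (49)-(50) off the working set. The following Python sketch (ours) puts the two together; the parameter values are placeholders, not the paper's recommendations.

    import numpy as np

    def primal_update(x, xhat_Q, s_plus, Q, dya, xtilde_aQ,
                      nu=2.0, xi_max=1e-10, xi_bar=1e10):
        """Primal update of Step 5, eqs. (46) and (49)-(50) (a sketch)."""
        n = x.size
        x_plus = np.empty(n)
        # i in Q: clipped version of xhat_Q = x_Q + t_p^m * dx^m_Q, cf. (46).
        lower = min(xi_max,
                    np.linalg.norm(dya) ** nu
                    + np.linalg.norm(np.minimum(xtilde_aQ, 0)) ** nu)
        x_plus[Q] = np.minimum(np.maximum(xhat_Q, lower), xi_bar)
        # i not in Q: re-center so that x_i^+ s_i^+ = mu^+, then clip, cf. (49)-(50).
        mu_plus = x_plus[Q] @ s_plus[Q] / len(Q)
        mask = np.ones(n, dtype=bool)
        mask[Q] = False
        x_plus[mask] = np.minimum(mu_plus / s_plus[mask], xi_bar)
        return x_plus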

In the convergence analysis, we will also make use of the quantities x̃^m, s̃^a, and s̃^m defined by the following expressions:

    x̃_i^m := x_i + Δx_i^m,  i ∈ Q;    x̃_i^m := 0,  i ∈ n \ Q,                    (52)
    s̃^a := s + Δs^a,                    (53)
    s̃^m := s + Δs^m.                    (54)

Remark 2.2. As in Iteration MPC, rMPC* uses separate step sizes for the primal and dual variables. In practice, this has been observed by many to be preferable to using a common step size. However, often in convergence analysis of MPC-type algorithms a common step size is assumed. We found that using separate step sizes simplified our analysis and was, in fact, necessary for proving a critical result (Proposition 4.4).

Remark 2.3. While rMPC* as stated fails to retain the remarkable invariance properties of MPC, invariance under diagonal scaling in the primal space and under Euclidean transformations and uniform diagonal scaling in the dual space can be readily recovered (without affecting the theoretical properties of the algorithm) by modifying Iteration rMPC* along lines similar to those discussed in section 5 of [TAW06].

In closing this section, we note a few immediate results to be used in the remainder of the paper. First, the following identities are valid for j ∈ {a, m}:

    t_p^j = min{ 1, min_i { x_i / (−Δx_i^j) | Δx_i^j < 0 } },                    (55)
    t_d^j = min{ 1, min_i { s_i / (−Δs_i^j) | Δs_i^j < 0 } }.                    (56)

Next, the following are direct consequences of equations (30)-(31) and Steps 1 and 3 of Iteration rMPC*:

    Δs^j = −A^T Δy^j    for j ∈ {a, c, m},                    (57)

and, for i ∈ Q,

    s_i Δx_i^a + x_i Δs_i^a = −x_i s_i,                    (58)
    x̃_i^a = −(x_i/s_i) Δs_i^a    and    s̃_i^a = −(s_i/x_i) Δx_i^a,                    (59)
    s_i Δx_i^m + x_i Δs_i^m = −x_i s_i + γ(σµ_Q − Δx_i^a Δs_i^a).                    (60)

Further, system (30) can alternatively be solved in augmented-system form

    [ A_Q   0           ] [ Δx_Q^a ]   [ b − A_Q x_Q ]
    [ S_Q  −X_Q A_Q^T   ] [ Δy^a   ] = [ −X_Q s_Q    ],        Δs_Q^a = −A_Q^T Δy^a,                    (61)

or in normal-equations form

    A_Q S_Q^{-1} X_Q A_Q^T Δy^a = b,                    (62a)
    Δs_Q^a = −A_Q^T Δy^a,                    (62b)
    Δx_Q^a = −x_Q − S_Q^{-1} X_Q Δs_Q^a.                    (62c)

Similarly, (31) can be solved in augmented-system form

    [ A_Q   0           ] [ Δx_Q^c ]   [ 0                        ]
    [ S_Q  −X_Q A_Q^T   ] [ Δy^c   ] = [ σµ_Q e − ΔX_Q^a Δs_Q^a   ],        Δs_Q^c = −A_Q^T Δy^c,                    (63)

or in normal-equations form

    A_Q S_Q^{-1} X_Q A_Q^T Δy^c = −A_Q S_Q^{-1} (σµ_Q e − ΔX_Q^a Δs_Q^a),                    (64a)
    Δs_Q^c = −A_Q^T Δy^c,                    (64b)
    Δx_Q^c = −S_Q^{-1} X_Q Δs_Q^c + S_Q^{-1} (σµ_Q e − ΔX_Q^a Δs_Q^a).                    (64c)

Finally, as an immediate consequence of the definition (21) of the rMPC* search direction in Step 4 of Iteration rMPC* and the expressions (23) and (25) (in particular (23)), we have

    γ ‖Δy^c‖ ≤ ψ ‖Δy^a‖,        γ σ µ_Q ≤ ψ ‖Δy^a‖.                    (65)

3 Global Convergence Analysis

The analysis given here follows the line of argument used in [TAW06] for the rPDAS algorithm. We first list the assumptions we use in the global convergence analysis. Each result to follow will state explicitly which, if any, of these assumptions it relies on.

Assumption 1. A has full row rank.

Assumption 2. The dual solution set F* is nonempty and bounded.

Assumption 3. For all y ∈ F, {a_i : i ∈ I(y)} is a linearly independent set.

Note that the first assumption ensures that Q(y) is always nonempty. The next two lemmas are taken (almost) verbatim from [TAW06, Lemmas 1 and 2].

Lemma 3.1. J_a(A, x, s) is nonsingular if and only if J(A, x, s) is. Further, suppose x ≥ 0 and s ≥ 0. Then J(A, x, s) is nonsingular if and only if (i) x_i + s_i > 0 for all i, (ii) {a_i : s_i = 0} is linearly independent, and (iii) {a_i : x_i ≠ 0} spans R^m.

Lemma 3.2. Let x > 0, s > 0, and Q ∈ Q(y) for some y ∈ R^m. Then A_Q X_Q S_Q^{-1} A_Q^T is positive definite.

The following proposition, which builds on [TAW06, Prop. 3], shows that Iteration rMPC* is well defined and that the dual objective strictly increases.

Proposition 3.3. Let x > 0, s > 0, and Q ∈ Q(y) for some y ∈ R^m. Then the following hold: (i) b^T Δy^a > 0, (ii) b^T Δy^m ≥ θ b^T Δy^a, and (iii) t_p^m > 0, t_d^m > 0, y^+ ∈ F^o, s^+ = c − A^T y^+ > 0, and x^+ > 0.

Proof. Claim (i) follows directly from Lemma 3.2, (62), and b ≠ 0, which imply

    b^T Δy^a = b^T (A_Q S_Q^{-1} X_Q A_Q^T)^{-1} b > 0.

For claim (ii), if b^T Δy^c ≥ 0 then, by claim (i),

    b^T Δy^m = b^T Δy^a + γ b^T Δy^c ≥ b^T Δy^a ≥ θ b^T Δy^a,

and, from Step 4 of Iteration rMPC*, if b^T Δy^c < 0 then, using claim (i) and since γ ≤ γ_1,

    b^T Δy^m ≥ b^T Δy^a + γ_1 b^T Δy^c ≥ b^T Δy^a + (1 − θ) (b^T Δy^a / (−b^T Δy^c)) b^T Δy^c = b^T Δy^a − (1 − θ) b^T Δy^a = θ b^T Δy^a.

Finally, claim (iii) follows from Steps 4-5 of Iteration rMPC*.

Hence, under Assumption 1 (which ensures that Q(y) is always nonempty), Iteration rMPC* can be executed repeatedly to generate an infinite sequence of iterates. From here on we attach an iteration index k to the quantities generated. In view of Proposition 3.3, the sequence of dual objective values {b^T y^k} is monotonically increasing. As in [TAW06, Lemma 5], under Assumption 2 and in view of Proposition 3.3, this implies boundedness of {y^k}.

Lemma 3.4. Suppose Assumptions 1 and 2 hold. Then {y^k} is bounded.

Our global convergence analysis critically relies on the fact (see Lemma 3.6 below) that if ‖Δy^{a,k}‖ is small then y^k is close to a stationary point and x̃^{a,k} is close to the corresponding multiplier. An essential property of our working-set selection strategy, to be used repeatedly in the global and local convergence analyses, is that, under Assumptions 1 and 3, Q_k eventually includes the indices of all critical constraints. Specifically, the following holds.

Lemma 3.5. Suppose Assumptions 1 and 3 hold and, for some y* ∈ R^m, y^k → y* on an infinite index set K. Then I(y*) ⊆ Q_k for all sufficiently large k ∈ K.

Proof. Assumption 3 implies |I(y*)| ≤ m and, in view of the definition (35) of Q(y^k), the claim follows by convergence of {y^k}_{k∈K} to y*.

The next step in the global convergence analysis is to establish that, if {‖Δy^{a,k}‖} goes to zero on a subsequence, then, on the same subsequence, {y^k} tends to a stationary point, and both {x̃^{a,k}} and {x̃^{m,k}} converge to the associated multiplier (cf. [TAW06, Lemma 6]).

Lemma 3.6. Suppose Assumptions 1 and 3 hold and, for some y* ∈ R^m, y^k → y* on an infinite index set K. If Δy^{a,k} → 0 on K, then y* is stationary and both {x̃^{a,k}}_{k∈K} and {x̃^{m,k}}_{k∈K} converge to x*, where x* is the multiplier associated with y*.

Proof. The second block equation in (30) yields, for all k,

    A_{Q_k} x̃_{Q_k}^{a,k} − b = 0,                    (66)

and (58) yields, for all k and all i ∈ Q_k,

    s_i^k x̃_i^{a,k} + x_i^k Δs_i^{a,k} = 0.                    (67)

Now, let s* := c − A^T y*, and suppose Δy^{a,k} → 0 on K. Lemma 3.5 implies I(y*) ⊆ Q_k for k ∈ K sufficiently large. We claim that, for all i ∈ n \ I(y*),

    x̃_i^{a,k} → 0 on K.                    (68)

Suppose it is not the case. Then, for some i ∈ n \ I(y*) and an infinite index set K' ⊆ K, inf_{k∈K'} |x̃_i^{a,k}| > 0. This implies that i ∈ Q_k for all k ∈ K', since x̃_i^{a,k} := 0 for all i ∈ n \ Q_k. Thus, in view of (67), since Δs_i^{a,k} → 0 on K (by (57)) and {x^k} is bounded (by construction), we must have s_i^k → 0 on K', so s_i* = 0, but this contradicts i ∈ n \ I(y*).

From (66) and (68) it follows that

    A_{I(y*)} x̃_{I(y*)}^{a,k} − b → 0 on K.

Next, Assumption 3 implies that the columns of A_{I(y*)} are linearly independent which, in view of (68), implies that x̃^{a,k} → x* on K, for some x*. Taking limits in (66) and (67) as k → ∞, k ∈ K, gives Ax* = A_{I(y*)} x*_{I(y*)} = b and X* s* = 0; i.e., y* is stationary with multiplier x*.

Finally, we turn to {x̃^{m,k}}_{k∈K}. The second block equations of (30) and (31), together with (40), yield, for all k,

    A_{Q_k} x̃_{Q_k}^{m,k} − b = 0,                    (69)

and (60) yields, for all k and all i ∈ Q_k,

    s_i^k x̃_i^{m,k} + x_i^k Δs_i^{m,k} = γ^k (σ^k µ_{Q_k}^k − Δx_i^{a,k} Δs_i^{a,k}).                    (70)

Convergence of {x̃^{a,k}}_{k∈K} and boundedness of {x^k} imply boundedness of {Δx_{Q_k}^{a,k}}_{k∈K}, since Δx_{Q_k}^{a,k} = x̃_{Q_k}^{a,k} − x_{Q_k}^k. In addition, Δy^{a,k} → 0 on K and relations (65) give γ^k σ^k µ_{Q_k}^k → 0 on K and γ^k Δy^{c,k} → 0 on K which, in turn, implies γ^k Δs^{c,k} = −γ^k A^T Δy^{c,k} → 0 on K by (57). So, in view of (40), (57), and (65), we see that the subsequence {Δs^{m,k}}_{k∈K} and the entire right-hand side of (70) converge to zero on K. With these facts, the same argument as above yields the remaining portion of the claim.

Hence, when ‖Δy^{a,k}‖ becomes small on a subsequence, a stationary point is approached, as desired. Could it be, however, that ‖Δy^{a,k}‖ does not become small, while y^k converges on a subsequence to some limit point? The next result, which will be used in Lemma 3.9 below as part of a contradiction argument, shows that a KKT point must then be approached on the previous subsequence (cf. [TAW06, Lemma 7]).

Lemma 3.7. Suppose Assumptions 1, 2 and 3 hold. If K is an infinite index set such that

    inf{ ‖Δy^{a,k-1}‖^ν + ‖[x̃^{a,k-1}]^-‖^ν | k ∈ K } > 0,                    (71)

then Δy^{a,k} → 0 on K.

Proof. We proceed by contradiction. Suppose the claim does not hold; that is, suppose (71) holds but Δy^{a,k} does not converge to 0 on K. Then, for some infinite index set K' ⊆ K, inf{ ‖Δy^{a,k}‖ | k ∈ K' } > 0. Since {y^k} (see Lemma 3.4) and {x^k} (by construction) are bounded, assume without loss of generality that they converge on K' to y* and x* respectively. Since Q_k is selected from a finite set, we may also assume without loss of generality that, for some fixed Q ⊆ n, Q_k = Q for all k ∈ K'. In view of Lemma 3.5, we may further assume that I(y*) ⊆ Q. Also, note that (71) and (46) imply that, for each i ∈ Q, {x_i^k}_{k∈K'} is bounded away from zero and positive, and hence x*_Q > 0.

The idea in the remainder of the proof is to show that, under the contradiction hypothesis, the dual objective increases by a constant amount infinitely many times, i.e., there exists δ > 0 such that b^T y^{k+1} > b^T y^k + δ for all k ∈ K'. This implies by monotonicity (Proposition 3.3) that {b^T y^k} is unbounded, which contradicts Lemma 3.4.

First, by dual feasibility, s^k = c − A^T y^k for all k, hence {s^k} converges to s* := c − A^T y* on K'. The three conditions of Lemma 3.1 are satisfied at (A_Q, x*_Q, s*_Q), since (i) x*_Q > 0 and s*_Q ≥ 0, (ii) {a_i | i ∈ I(y*)} is a linearly independent set by Assumption 3, and (iii) {a_i | i ∈ Q, x*_i ≠ 0} = {a_i | i ∈ Q} spans R^m, since Q ∈ Q(y^k) implies rank(A_Q) = m by (35). Therefore J_a(A_Q, x*_Q, s*_Q) is nonsingular. Since, from (9) and (61),

    J_a(A_Q, x_Q^k, s_Q^k) [ Δx_Q^{a,k} ; Δy^{a,k} ] = [ b − A_Q x_Q^k ; −X_Q^k s_Q^k ]                    (72)

and the right-hand side converges on K', it follows that the steps converge on K': Δy^{a,k} → Δy^{a,*} and Δx_Q^{a,k} → Δx_Q^{a,*}, for some Δy^{a,*} and Δx_Q^{a,*}. Also, since inf_{k∈K'} ‖Δy^{a,k}‖ > 0, we have Δy^{a,*} ≠ 0, which implies Δs_Q^{a,*} := −A_Q^T Δy^{a,*} ≠ 0 since, again, Q ∈ Q(y^k) implies rank(A_Q) = m by (35).

Next, we have by Step 5 of rMPC* and Proposition 3.3 (ii) that, for all k,

    b^T y^{k+1} = b^T (y^k + t_d^{m,k} Δy^{m,k}) ≥ b^T y^k + t_d^{m,k} θ b^T Δy^{a,k}.                    (73)

Also, from (62a) and using Δs_Q^{a,k} = −A_Q^T Δy^{a,k} (by (57)), we have, for all k,

    b^T Δy^{a,k} = (Δy^{a,k})^T A_Q (S_Q^k)^{-1} X_Q^k A_Q^T Δy^{a,k} = (Δs_Q^{a,k})^T (S_Q^k)^{-1} X_Q^k Δs_Q^{a,k} = Σ_{i∈Q} (x_i^k / s_i^k) (Δs_i^{a,k})^2.

The terms of this sum are all nonnegative and, as noted above, for at least one i ∈ Q, Δs_i^{a,k} tends to a nonzero limit Δs_i^{a,*} on K'. Therefore, since x_Q^k → x*_Q > 0 on K' and {s_i^k}_{k∈K'} is bounded (since it converges), we conclude that b^T Δy^{a,*} > 0. So there exists a δ > 0 such that, for all k ∈ K' with k large enough, b^T Δy^{a,k} > δ > 0. In view of (73), establishing a positive lower bound on t_d^{m,k} for k ∈ K' will complete the proof.

By (44) and since Step 4 of Iteration rMPC* ensures (24), we have t_d^{m,k} ≥ β t̄_d^{m,k} ≥ βζ t_d^{a,k}. Therefore, it suffices to bound t_d^{a,k} away from zero. From (37), either t_d^{a,k} = 1 or, for some ℓ such that Δs_ℓ^{a,k} < 0, we have

    t_d^{a,k} = s_ℓ^k / (−Δs_ℓ^{a,k}).                    (74)

If ℓ ∈ n \ Q (⊆ n \ I(y*), since we assumed, without loss of generality, that I(y*) ⊆ Q), then s_ℓ^k is bounded away from zero on K' (since it converges to s*_ℓ > 0) and, since Δs_ℓ^{a,k} = −a_ℓ^T Δy^{a,k} is bounded on K', we do get a positive lower bound for t_d^{a,k}. On the other hand, if ℓ ∈ Q, using (74) and (59) we obtain t_d^{a,k} = x_ℓ^k / x̃_ℓ^{a,k}, which is positive and bounded away from zero on K', since x_ℓ^k is bounded away from zero on K' and x̃_ℓ^{a,k} is bounded on K'. This completes the proof.

As in [TAW06, Lemmas 8 and 9] (with an identical proof using the appropriately modified lemmas), convergence to a stationary point readily follows. We include the next two results, with proof, for completeness and ease of reference.

Lemma 3.8. Suppose Assumptions 1, 2 and 3 hold, and suppose there exists an infinite index set K on which y^k is bounded away from F*. Then Δy^{a,k} → 0 on K.

Proof. Suppose the claim does not hold. Then Lemma 3.7 implies that {‖Δy^{a,k-1}‖}_{k∈K} and {‖[x̃^{a,k-1}]^-‖}_{k∈K} both converge to zero. Since {y^k} is bounded (Lemma 3.4), there exists a vector y* and an infinite index set K' ⊆ K such that {y^{k-1}}_{k∈K'} → y*. Lemma 3.6 implies that y* is stationary and that {x̃^{a,k-1}}_{k∈K'} → x*, where x* is the associated multiplier. However, {[x̃^{a,k-1}]^-}_{k∈K'} → 0 implies x* ≥ 0, and so y* ∈ F*, a contradiction.

Lemma 3.9. Suppose Assumptions 1, 2 and 3 hold. Then {y^k} converges to the set of stationary points of (1).

Proof. Suppose the claim does not hold. By boundedness of {y^k}, there exists an infinite index set K on which y^k → y*, with y* non-stationary, hence {y^k}_{k∈K} is bounded away from F*. Lemma 3.8 then implies Δy^{a,k} → 0 on K, which contradicts Lemma 3.6.

This result is then used, together with a technical lemma proved in [TAW06, Lemmas 10 and 11], to show (using a modification of the proof of [TAW06, Theorem 12]) that in fact {y^k} must converge to the dual solution set F* (see Theorem 3.11 below).

Lemma 3.10. Suppose Assumptions 1, 2 and 3 hold. If {y^k} is bounded away from F*, then all limit (stationary) points of {y^k} have the same multiplier.

Theorem 3.11. Suppose Assumptions 1, 2 and 3 hold. Then {y^k} converges to F*, the dual solution set.

Proof. Suppose the claim does not hold. Monotonicity of {b^T y^k} implies that all limit points of {y^k} have the same objective value. Therefore, since {y^k} converges to its set of limit points (by boundedness), it must be bounded away from F*. Lemma 3.8 then implies that Δy^{a,k} → 0. Let x* be the unique (by Lemma 3.10) multiplier associated with all limit points of {y^k}. We claim that x̃^{a,k} → x* and x̃^{m,k} → x*. We use another (nested) contradiction argument to prove this claim. Thus, suppose x̃^{a,k} does not converge to x*, and let K be an infinite index set such that x̃^{a,k} is bounded away from x* on K. Let ŷ be such that y^k → ŷ on some infinite index set K' ⊆ K. Since Δy^{a,k} → 0 (on K' in particular), Lemma 3.6 implies x̃^{a,k} → x* on K', a contradiction. The same argument with x̃^{m,k} in place of x̃^{a,k} gives x̃^{m,k} → x*.

Now let y* be an arbitrary limit point of {y^k}, and let K be an infinite index set such that y^k → y* on K, so that also s^k := c − A^T y^k → c − A^T y* =: s* on K. Lemma 3.9 and the contradiction assumption imply that y* is a non-KKT stationary point and hence, for at least one i ∈ n, say i = ℓ, we have x*_ℓ < 0. This implies that x̃_ℓ^{a,k} < 0 and x̃_ℓ^{m,k} < 0 for all k ∈ K sufficiently large. Complementarity implies s*_ℓ = 0, i.e., ℓ ∈ I(y*), so that, by Lemma 3.5, ℓ ∈ Q_k for all sufficiently large k ∈ K. Hence, for all large enough k ∈ K, by (58),

    s_ℓ^k x̃_ℓ^{a,k} + x_ℓ^k Δs_ℓ^{a,k} = 0,                    (75)

which, since x̃_ℓ^{a,k} < 0 and x_ℓ^k > 0, implies that Δs_ℓ^{a,k} > 0; moreover, Δx_ℓ^{a,k} < 0 (since x̃_ℓ^{a,k} < 0 < x_ℓ^k), so that Δx_ℓ^{a,k} Δs_ℓ^{a,k} < 0. Therefore, since σ^k µ_{Q_k}^k ≥ 0 and γ^k ≥ 0, using (60) we get

    s_ℓ^k x̃_ℓ^{m,k} + x_ℓ^k Δs_ℓ^{m,k} = γ^k (σ^k µ_{Q_k}^k − Δx_ℓ^{a,k} Δs_ℓ^{a,k}) ≥ 0,                    (76)

which, since x̃_ℓ^{m,k} < 0, implies that Δs_ℓ^{m,k} > 0 for all k large enough, say k ≥ k_0. Thus s_ℓ^k ≥ s_ℓ^{k_0} > 0, which contradicts the fact that s*_ℓ = 0.

4 Local Convergence Analysis

In this section we show, under additional Assumptions 4 and 5 (see below), that the iteration sequence z^k := (x^k, y^k) converges q-quadratically to the solution z* := (x*, y*). We will first show that the iteration sequence converges to the solution, viz. z^k → z* (Proposition 4.4), and then that it does so with a q-quadratic rate (Theorem 4.12). The following assumption supersedes Assumption 2.

Assumption 4. The dual solution set is a singleton, F* = {y*}.

Of course, under this assumption, Theorem 3.11 implies y^k → y* and s^k := c − A^T y^k → c − A^T y* =: s*. Let x* be the (unique, by Assumption 3) multiplier associated to the stationary point y*. The following result is a slight extension of [TAW06, Lemma 13].

Lemma 4.1. Under Assumptions 1, 3, and 4, the sequence {y^k} generated by Iteration rMPC* converges to y*, the unique dual solution, and (x*, s*) satisfy strict complementary slackness, i.e., x* + s* > 0. Further, for any x ∈ R^n, x ≥ 0, such that x + s* > 0 (in particular for x = x*), and any Q such that I(y*) ⊆ Q, J(A_Q, x_Q, s*_Q) and J_a(A_Q, x_Q, s*_Q) are nonsingular.

Proof. Assumption 4 and the Goldman-Tucker theorem (e.g., see [Wri97, p. 28]) imply strict complementary slackness for the pair (x*, s*). Assumption 4 also implies that {a_i | i ∈ I(y*)} = {a_i | x*_i ≠ 0} consists of exactly m linearly independent vectors. Hence, the three conditions of Lemma 3.1 are satisfied, and the nonsingularity claim follows.

The following technical lemma, which relates quantities generated by rMPC*, is called upon below (in Lemmas 4.3 and 4.10) to show that the damping coefficients t_p^m and t_d^m converge to one and, moreover, that the convergence is fast enough for quadratic convergence of {z^k} to take place.

Lemma 4.2. Suppose Assumptions 1, 3, and 4 hold. Let (x, y, s) satisfy A^T y + s = c, s > 0, and x > 0, and let Q ∈ Q(y), with I(y*) ⊆ Q. Let Δx_Q^a, Δs^a, x̃^a, s̃^a, x̃^m, and s̃^m be generated by Iteration rMPC*. If x̃_i^m > 0 for all i ∈ I(y*) and s̃_i^m > 0 for all i ∈ n \ I(y*), then

    t̄_p^m ≥ min{ 1, min_{i ∈ Q \ I(y*)} { s_i/s̃_i^a, s_i/s̃_i^m, s̃_i^a/s̃_i^m } },                    (77)
    t̄_d^m ≥ min{ 1, min_{i ∈ I(y*)} { x_i/x̃_i^a, x_i/x̃_i^m, x̃_i^a/x̃_i^m } }.                    (78)

Proof. First consider (77). With reference to (55), we see that either t̄_p^m = 1, in which case (77) is verified, or, for some ℓ with Δx_ℓ^m < 0, we have

    t̄_p^m = x_ℓ / (−Δx_ℓ^m) < 1.                    (79)

Suppose ℓ ∈ I(y*) (⊆ Q). Since Δx_ℓ^m < 0 and x_ℓ > 0, and in view of the definition (52) of x̃^m, the inequality x̃_ℓ^m > 0, which holds by assumption, implies x_ℓ/(−Δx_ℓ^m) ≥ 1, contradicting (79). Thus we must have ℓ ∈ Q \ I(y*). To complete the proof of (77), we consider two possibilities. If −Δx_ℓ^a ≥ −Δx_ℓ^m then, using (59), we have

    t̄_p^m = x_ℓ/(−Δx_ℓ^m) ≥ x_ℓ/(−Δx_ℓ^a) = s_ℓ/s̃_ℓ^a,                    (80)

and (77) is again verified. Alternately, if −Δx_ℓ^a < −Δx_ℓ^m, then using (60) and rearranging terms, we get (see below for explanation of the inequalities)

    t̄_p^m = x_ℓ/(−Δx_ℓ^m) = s_ℓ/s̃_ℓ^m + γσµ_Q/((−Δx_ℓ^m) s̃_ℓ^m) + γ(−Δx_ℓ^a)Δs_ℓ^a/((−Δx_ℓ^m) s̃_ℓ^m)
                         ≥ s_ℓ/s̃_ℓ^m + γ(−Δx_ℓ^a)Δs_ℓ^a/((−Δx_ℓ^m) s̃_ℓ^m)
                         ≥ s_ℓ/s̃_ℓ^m + Δs_ℓ^a/s̃_ℓ^m = s̃_ℓ^a/s̃_ℓ^m,

where γ, σ, and µ are as generated by Iteration rmpc. The first inequality follows because the second term is nonnegative: the numerator is nonegative, x m > 0 by assumption, and s m > 0 also by assumption. The second inequality follows since x a < x m and γ 1. So, once again, (77) is verified. Finally, inequality (78) is proved by a very similar argument that flips the roles of x and s. Since by construction x k i ξ for all i and k, where ξ is set arbitrarily by the user, the sequence {x k} cannot be expected in general to converge to x. The best we can hope for is that it converges to x #, where x # i := min{x i,ξ}. (81) This, together with appropriate convergence of other quantities, is established in Proposition 4.4 below, whose proof makes use of the following lemma. Lemma 4.3. Suppose Assumptions 1, 3, and 4 hold. If there exists an infinite index set K on which y a,k 0, then ˆx k x and x k+1 x #, both as k, k K. Proof. Since y k y, in view of Lemma 3.5, we may assume without loss of generality that I(y ) k for all k K. Now, since y a,k 0 on K and y k y, (57) implies that s a,k 0 on K, and Lemma 3.6 implies that x a,k x on K and x m,k x on K, in particular, that [ x a,k ] 0 on K. Further, by (65), and (40), y m,k 0 on K which implies, again by (57), that s m,k 0 on K. We first show that ˆx k x 0 on K. 5 We have for all k, using the triangle inequality, (52), and k k (45), ˆx k k x k ˆxk k xm,k + x m,k x k k k 1 tm,k p x m,k k + x m,k x. (82) Since x m,k x on K and { x m,k } k k K is bounded (since {x k } and { x m,k } k K are both bounded), we need only show t m,k p 1 on K. Now, since I(y ) k, x m,k x on K, and s m,k s on K (by y k y, (57), (40), (65), and (54)) all hold, strict complementarity (Lemma 4.1) implies that, for all k K large enough, x m,k i > 0 for i I(y ) and s m,k i > 0 for i n \ I(y ). Thus, without loss of generality, we assume it holds for all k K. Therefore, the hypothesis of Lemma 4.2 is verified for all k K, and in view of (77), since s a,k 0 on K and {s k } k K, { s a,k } k K and { s m,k } k K all converge to s, we have t m,k p 1 on K (since s i > 0 for all i n \ I(y )). Further, by (43) and since y m,k 0 on K, we also have t m,k p 1 on K. So indeed, ˆx k x 0 on k k K. Next, we show that x k+1 x # 0 on K. Let i I(y ) ( k for all k K). We have already k k established that y a,k ν + [ x a,k ] ν 0 on K and ˆx k i x i > 0 on K (positivity is by strict complementary slackness). This implies, by (46), that for sufficiently large k K we have x k+1 i = min{ˆx k i,ξ}, so that x k+1 i x # i on K. Now consider i n \ I(y ), where x i = 0, and consider the set K i K defined by K i := {k K i k }. If K i is finite, then this i is irrelevant to the limit of x k+1 x #. If it is infinite k k however, then since y a,k ν + [ x a,k ] ν 0 on K i and ˆx k i x i = 0 on K i, we have from (46) that x k+1 i 0 = x # i = x i on K i. Thus we have shown that x k+1 x # 0 on K. This fact, taken together k k with the complementarity of (x #,s ) (which follows from complementarity of (s,x ) and the definition (81) of x # ), then implies that µ k+1 0 on K. k Let K be the subset of K on which n \ k is nonempty. If K is finite, then the proof of the lemma is already complete. Otherwise, to complete the proof, we show that ˆx k x = ˆx k 0 on n\ k n\ k n\ k K and x k+1 x # = x k+1 0 on K. For this, we consider i n \ I(y ) and the set K n\ k n\ k n\ k i K defined by K i := {k K i n \ k }. As before, if K i is finite then this index i is irrelevant to the limits we are interested in. 
If it is infinite, then by (49) we get ˆx k i = µk+1 /s k+1 k bounded away from zero (since i n \ I(y ) ) and µ k+1 k i on K i, and since {s k+1 i } is 0 on K (as shown in the previous paragraph), we have ˆx k i x i = 0 on K i. In view of (50), this implies x k+1 i x # i = x i = 0 on K i. Thus, the proof is complete. 5 Note that the dimension of ˆx k k x k, i.e., k, may vary with k. 16

Proposition 4.4. Suppose Assumptions 1, 3 and 4 hold. Then we have (i) Δy^{a,k} → 0 and Δy^{m,k} → 0, (ii) x̃^{a,k} → x* and x̃^{m,k} → x*, (iii) x̂^k → x* and x^k → x^#, and (iv) if x*_i ≤ ξ̄ for all i ∈ n, then x^k → x*, Δx_{Q_k}^{a,k} → 0 and Δx_{Q_k}^{m,k} → 0.

Proof. First we show that Δy^{a,k} → 0. Supposing it is not so, take an infinite index set K with inf_{k∈K} ‖Δy^{a,k}‖ > 0. Lemma 3.7 then implies that there exists an infinite index set K' ⊆ K on which {‖Δy^{a,k-1}‖}_{k∈K'} and {‖[x̃^{a,k-1}]^-‖}_{k∈K'} converge to zero. We assume without loss of generality that Q_k = Q, a constant set, for all k ∈ K' (since Q_k is selected from a finite set). Lemma 4.3 implies x^k → x^# on K' and, since s^k → s* (on K' in particular), we have J(A_Q, x_Q^k, s_Q^k) → J(A_Q, x^#_Q, s*_Q) on K'. Further, by strict complementarity of (x^#, s*) (which follows from strict complementarity of (s*, x*) and the definition (81) of x^#) and Assumption 4, and since I(y*) ⊆ Q, Lemma 4.1 implies that J(A_Q, x^#_Q, s*_Q) is nonsingular. Using these facts and noting that (30) and the inclusion I(y*) ⊆ Q imply

    J(A_Q, x_Q^k, s_Q^k) [ x̃_Q^{a,k} ; Δy^{a,k} ; Δs_Q^{a,k} ] = [ b ; 0 ; 0 ] on K'    and    J(A_Q, x^#_Q, s*_Q) [ x*_Q ; 0 ; 0 ] = [ b ; 0 ; 0 ],                    (83)

we see that Δy^{a,k} → 0 on K'. This gives the desired contradiction and proves that the entire sequence {‖Δy^{a,k}‖} converges to zero. In view of (65) and the definition (40) of Δy^m, the proof of claim (i) is complete. In view of Lemma 3.6 and Lemma 4.3, claims (ii) and (iii) are immediate consequences of claim (i). Claim (iv) follows directly from claims (ii) and (iii) and the definition (81) of x^#.

The dual sequence {y^k} was shown to converge to the dual optimal set F* under Assumptions 1, 2, and 3. Adding Assumption 4 allowed us to show that the surrogate primal sequence {x̂^k} converges to the optimal multiplier x*. Under the further assumption that x*_i ≤ ξ̄ for all i ∈ n, the primal sequence itself was shown to converge to the optimal multiplier. An ever-so-slightly strengthened version of this latter assumption, which we state next, guarantees local quadratic convergence of the primal-dual sequence {z^k} = {(x^k, y^k)}.

Assumption 5. x*_i < ξ̄ for all i ∈ n.

From here forward, we focus on the {z^k} sequence. To prove quadratic convergence, we show that there exist constants c ≥ 0 and ρ > 0 (independent of z = (x, y) and Q) such that, for all z ∈ B(z*, ρ) ∩ G^o and all Q ∈ Q(y),

    ‖z^+(z, Q) − z*‖ ≤ c ‖z − z*‖^2.                    (84)

Here

    B(z*, ρ) := {z ∈ R^{m+n} | ‖z − z*‖ ≤ ρ},    G^o := {(x, y) ∈ R^n × R^m | x > 0, y ∈ F^o},

and z^+(z, Q) is the update to z, with the dependence of z^+ on z and Q made explicit. We will use this explicit notation for all quantities that depend on (z, Q) from now on, e.g., Δz^a(z, Q), x̃^m(z, Q), etc. Notice that the set of (z, Q) such that z ∈ B(z*, ρ) ∩ G^o and Q ∈ Q(y) is precisely the domain of definition of the mappings defined by Iteration rMPC*: z^+(·,·), Δz^a(·,·), etc. We also introduce the (somewhat abusive) notation z := (x, y), Δz := (Δx, Δy). The following lemma gives a neighborhood of z* on which we will prove that the quadratic-rate inequality (84) holds. In particular, several useful bounds that simplify the remaining analysis are proven on this neighborhood. We first define a quantity which is guaranteed to be positive when strict complementarity holds:

    ε* := min{ 1, min_{i∈n} (s*_i + x*_i) }.                    (85)

Lemma 4.5. Suppose Assumptions 1, 3, 4 and 5 hold and let β > 0 and ξ > 0. Then there exists ρ > 0 and R > 0 such that for all z B(z,ρ) G o and (y) the following hold: (i) I(y ) and J a (A,x,s ) 1 R, (86) (ii) max{ z(z,), a z m (z,), s a (z,), s m (z,) } < ε /2, (87) (iii) min{x i, x a i(z,), x m i (z,)} > ε /2, i I(y ), (88) max{s i, s a i (z,), s m i (z,)} < ε /2, i I(y ), (89) max{x i, x a i(z,), x m i (z,)} < ε /2, i n \ I(y ), (90) min{s i, s a i(z,), s m i (z,)} > ε /2, i n \ I(y ), (91) (iv) β t m p (z,) < t m p (z,) y a (z,), (92) β t m d (z,) < t m d (z,) y a (z,) (93) (v) ˆx i (z,) < ξ, i. (94) Proof. Let s := c A T y. (Note that, through y, s varies with z.) Consider the (finite) set := { n I(y ) }. We first note that for all y sufficiently close to y, we must have I(y ) (y), and hence (y). Indeed, Assumption 3 implies that, for i I(y ), s i (y ) = 0 is among the m smallest slack values at y, and thus that, for i I(y ), s i (y) is among the m smallest slack values at y, for all y close enough to y ; rule (35) for selecting then implies the claim. (In fact, under Assumptions 3 and 4, (y) = for all y close enough to y.) To prove the lemma, it suffices to show that we can find ρ > 0 and R > 0 to establish claims (i)-(v) for any fixed and all z B(z,ρ ). Indeed, given this, in view of the finiteness of, the claims are easily seen to hold for all and z B(z,ρ 0 ) (e.g. take ρ 0 := min ρ and R := max R ). Then by the argument of the first paragraph, we can find a sufficiently small ρ (0,ρ 0 ] so that (y) for all z = (x,y) B(z,ρ) and the proof would be complete. Thus, we now fix and seek appropriate ρ and R. For claim (i), since, we have I(y ) and so Lemma 4.1 implies that J a (A,x,s ) is nonsingular. Since J a (A,x,s ) depends continuously on z, we can find ρ > 0 and R > 0 such that J a (A,x,s ) 1 R, for all z B(z,ρ ). For claim (ii), since the right hand sides of (30) and (31) vanish at z and are continuous in z, and since J a (A,x,s ) 1 is bounded on B(z,ρ ), by further tightening ρ if needed, we can also make z(z,), a z(z,), c and hence z m (z,) as small as desired on B(z,ρ ) G o. Equation (57) implies that s a (z,) and s m (z,) can also be made small by controlling z a (z,) and z m (z,) respectively. Consider now claim (iii). By complementarity and the definition (85) of ε, it is clear that x i > ε /2, s i < ε /2 for i I(y ) and x i < ε /2, s i > ε /2 for i n \ I(y ) can be made to hold for all z B(z,ρ ) G o by further reducing ρ if needed. Further, since x n\ = 0 (since I(y ) ) and in view of (47), x a n\(z,) = 0 also, using the triangle inequality we have x a i(z,) x i x a (z,) x x a (z,) + x x, (95) for all i n and for all z B(z,ρ ) G o. Thus, in view of (87) (note the inequality in there is strict), by further tightening ρ if needed (since x x can thus be made arbitrarily small), the right hand side of (95) can be made less than ε /2 for all z B(z,ρ ) G o. Hence, for i n \ I(y ) (since x i = 0), x a i (z,) < ε /2 and for i I(y ) (since x i ε ), x a i (z,) > ε /2. Similar arguments apply for x m i (z,), s a i (z,), and sm i (z,). 18