A NOTE ON A GLOBALLY CONVERGENT NEWTON METHOD FOR SOLVING MONOTONE VARIATIONAL INEQUALITIES

Patrice MARCOTTE        Jean-Pierre DUSSAULT

Résumé. It is well known that Newton's method, when applied to a strongly monotone variational inequality problem, converges locally to the solution of the inequality, and that the order of convergence is quadratic. In this article we show that the Newton direction is a descent direction for a nondifferentiable, nonconvex objective, even in the absence of the strong monotonicity assumption. This result makes it possible to modify the method so as to render it globally convergent. Moreover, under the strong monotonicity assumption, the two methods are locally equivalent: it follows that the modified method inherits the convergence properties of Newton's method, namely implicit identification of the constraints active at the solution (under the strict complementarity assumption) and a quadratic order of convergence.

A NOTE ON A GLOBALLY CONVERGENT NEWTON METHOD FOR SOLVING MONOTONE VARIATIONAL INEQUALITIES

Patrice MARCOTTE (1)        Jean-Pierre DUSSAULT (2)

(1) Collège Militaire Royal de Saint-Jean, Saint-Jean-sur-Richelieu, Québec, Canada J0J 1R0, and GERAD, École des Hautes Études Commerciales, Montréal, Québec, Canada H3T 1V6
(2) Département de Mathématiques et Informatique, Université de Sherbrooke, Sherbrooke, Québec, Canada J1K 2R1

Abstract. It is well known (see Pang and Chan [7]) that Newton's method, applied to strongly monotone variational inequalities, is locally and quadratically convergent. In this paper we show that Newton's method yields a descent direction for a nonconvex, nondifferentiable merit function, even in the absence of strong monotonicity. This result is then used to modify Newton's method into a globally convergent algorithm by introducing a linesearch strategy. Furthermore, under strong monotonicity, (i) the optimal face is attained after a finite number of iterations and (ii) the stepsize is eventually fixed at the value one, resulting in the usual Newton step. Computational results are presented.

Keywords. Mathematical Programming. Variational Inequalities. Newton's Method.

Research supported by NSERC grants A5789 and A5491.

1. Problem formulation and basic definitions. Let $\Omega$ be a nonempty, convex and compact subset of $\mathbb{R}^n$. Consider the variational inequality problem consisting in finding $x^*$ in $\Omega$ such that:

$$(x^* - x)^t F(x^*) \le 0 \qquad \forall x \in \Omega \qquad (VIP)$$

where $F$ is a continuously differentiable, monotone mapping from $\Omega$ into $\mathbb{R}^n$:

$$(x - y)^t \left( F(x) - F(y) \right) \ge 0 \qquad \forall x, y \in \Omega \qquad (1)$$

and the compactness assumption ensures that the variational inequality possesses at least one solution. To solve VIP, Newton's method generates a sequence $\{x^k\}$, where $x^1$ is any feasible point in $\Omega$ and $x^{k+1}$ is a solution to the variational inequality problem obtained by linearizing $F$ around the previous iterate $x^k$, i.e.:

$$(x^{k+1} - x)^t \left( F(x^k) + F'(x^k)(x^{k+1} - x^k) \right) \le 0 \qquad \forall x \in \Omega \qquad (LVIP(x^k))$$

In the above expression, $F'(x^k)$ denotes the (not necessarily symmetric) Jacobian matrix of $F$ evaluated at $x^k$. In order that Newton's method be efficient, it is clear that the linearized problem $LVIP(x^k)$ must be easier to solve than the original VIP. This might be the case if $\Omega$ possesses some simple (e.g. polyhedral) structure for which a finitely convergent algorithm is available.

In the remainder of the paper, the following characterizations of ordinary, strict and strong monotonicity will be used:

Monotonicity: $F$ is monotone on $\Omega$ if:
$$(x - y)^t \left( F(x) - F(y) \right) \ge 0 \qquad \forall x, y \in \Omega \qquad (2)$$

Strict monotonicity: $F$ is strictly monotone on $\Omega$ if:
$$(x - y)^t \left( F(x) - F(y) \right) > 0 \qquad \forall x, y \in \Omega, \; x \ne y \qquad (3)$$

Strong monotonicity: $F$ is strongly monotone on $\Omega$ if there exists a positive constant $\alpha$ such that:
$$(x - y)^t \left( F(x) - F(y) \right) \ge \alpha \|x - y\|^2 \qquad \forall x, y \in \Omega \qquad (4)$$

Finally, let us define some quantities associated with VIP:

Definition 1. $\Gamma(x) \overset{\text{def}}{=} \arg\min_{y \in \Omega} \, y^t F(x)$

Definition 2. The gap function associated with VIP is defined as:
$$g(x) \overset{\text{def}}{=} \max_{y \in \Omega} (x - y)^t F(x) = (x - y)^t F(x) \quad \text{for any } y \in \Gamma(x)$$
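When $\Omega$ is polyhedral, Definition 2 reduces the evaluation of the gap function to a single linear program, since $g(x) = x^t F(x) - \min_{y \in \Omega} y^t F(x)$. The sketch below shows one way this could be computed; it assumes SciPy's LP solver and a constraint description $A_{ub}\, y \le b_{ub}$, $A_{eq}\, y = b_{eq}$, both of which are illustrative choices and not part of the paper.

```python
# Minimal sketch: evaluate g(x) = max_{y in Omega} (x - y)^t F(x) for polyhedral Omega
# via the decomposition g(x) = x^t F(x) - min_{y in Omega} y^t F(x), i.e. one LP.
import numpy as np
from scipy.optimize import linprog

def gap(x, F, A_ub=None, b_ub=None, A_eq=None, b_eq=None, bounds=None):
    # Note: when bounds is None, linprog assumes y >= 0 componentwise.
    Fx = np.asarray(F(x), dtype=float)
    lp = linprog(c=Fx, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                 bounds=bounds, method="highs")
    if not lp.success:
        raise RuntimeError("LP defining the gap function could not be solved")
    return float(x @ Fx - lp.fun)
```

For instance, with `A_eq = np.ones((1, n))`, `b_eq = [1.0]` and the default non-negativity bounds, the feasible set is the unit simplex used in the numerical experiments of section 4.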

It is clear that $x^*$ is a solution to VIP if and only if $g(x^*) = 0$. The gap function, though in general nondifferentiable and nonconvex, can be driven to zero in a monotone fashion by specialized algorithms (Marcotte [5], Marcotte and Dussault [6]). The term "gap function" has been used by Hearn [3] to denote the same function, although in an optimization framework, i.e. when $F$ is the gradient of some convex function $f$.

Definition 3. If $\Omega$ is polyhedral and the solution set is a singleton, we say that strict complementarity holds at the solution $x^*$ if $(x^* - x)^t F(x^*) = 0$ implies that $x$ lies in the optimal face $T^*$, i.e. the minimal face of $\Omega$ containing $x^*$.

2. A globally convergent Newton algorithm. In this section we present algorithm GNEW, obtained by incorporating a linesearch into the basic method.

ALGORITHM GNEW
Let $x^1 \in \Omega$, $k \leftarrow 1$
while convergence criterion not met do
  1. FIND DESCENT DIRECTION
     Let $\bar{x} \in \Omega$ satisfy $LVIP(x^k)$, i.e.:
     $$(\bar{x} - x)^t \left( F(x^k) + F'(x^k)(\bar{x} - x^k) \right) \le 0 \qquad \forall x \in \Omega \qquad (5)$$
     Set $d^k \leftarrow \bar{x} - x^k$
  2. LINE SEARCH
     if $g(x^k + d^k) \le .5\, g(x^k)$ then $\theta_k \leftarrow 1$
     else let $\theta_k \in \arg\min_{\theta \in [0,1]} g(x^k + \theta d^k)$
     endif
     Set $x^{k+1} \leftarrow x^k + \theta_k d^k$
     $k \leftarrow k + 1$
endwhile

Remark. At step 2 (linesearch) of algorithm GNEW, the constant .5 could be replaced by any positive number strictly less than 1. Also, inexact linesearch techniques such as Armijo-Goldstein can be implemented.

Lemma. If $x^k$ is not a solution to VIP, then the direction $d^k$ generated by GNEW is a feasible descent direction for $g$ at $x^k$.

Proof. We have, by Danskin's rule of differentiation of max-functions (see Danskin [2]):
$$g'(x; d) = \max_{y \in \Gamma(x)} d^t \nabla_x \left\{ (x - y)^t F(x) \right\}$$

Therefore:
$$\begin{aligned}
g'(x^k; d^k) &= \max_{y \in \Gamma(x^k)} (\bar{x} - x^k)^t \left( F(x^k) + F'^{\,t}(x^k)(x^k - y) \right) \\
&= (\bar{x} - x^k)^t F(x^k) + \max_{y \in \Gamma(x^k)} (x^k - y)^t F'(x^k)(\bar{x} - x^k) \\
&= (\bar{x} - x^k)^t F'(x^k)(x^k - \bar{x}) + (\bar{x} - x^k)^t \left( F'(x^k)\bar{x} + F(x^k) - F'(x^k)x^k \right) \\
&\qquad + \max_{y \in \Gamma(x^k)} \left[ (y - x^k)^t F(x^k) + (x^k - y)^t \left( F'(x^k)\bar{x} + F(x^k) - F'(x^k)x^k \right) \right] \\
&< \max_{y \in \Gamma(x^k)} (\bar{x} - y)^t \left( F(x^k) + F'(x^k)(\bar{x} - x^k) \right)
\end{aligned}$$
since the first term is nonpositive by monotonicity of $F$, and the third term is strictly negative, since $x^k$ is not a solution of the variational inequality. The term on the last line has been obtained by adding the second and fourth terms. Hence:
$$g'(x^k; d^k) < \max_{y \in \Gamma(x^k)} (\bar{x} - y)^t \left( F(x^k) + F'(x^k)(\bar{x} - x^k) \right) \le 0$$
since $\bar{x}$ is a solution to the linearized problem. QED

Theorem 1. (GLOBAL CONVERGENCE) Let $\{x^k\}$ be a sequence generated by algorithm GNEW. Then $\lim_{k \to \infty} g(x^k) = 0$ and the limit point of any convergent subsequence is a solution to VIP.

Proof. Following Luenberger ([4], section 6.5), global convergence will be obtained if the point-to-set mapping
$$x^k \to D(x^k) = \left\{ d^k = \bar{x} - x^k \;\text{with}\; \bar{x} \;\text{a solution to}\; LVIP(x^k) \right\}$$
is a closed mapping. Let $\{u^n\}$ be a convergent sequence of points in $\Omega$ and $u$ its limit point. Let also $\{d^n\}$ be a sequence converging to $d$ and satisfying $d^n \in D(u^n)$. Write $d^n = v^n - u^n$ and $d = v - u$, where $v^n$ is a solution to $LVIP(u^n)$, i.e.:
$$(v^n - x)^t \left( F(u^n) + F'(u^n)(v^n - u^n) \right) \le 0 \qquad \forall x \in \Omega. \qquad (8)$$
Taking limits on both sides of (8), and from the continuity of $F$ and $F'$, we obtain:
$$(v - x)^t \left( F(u) + F'(u)(v - u) \right) \le 0 \qquad \forall x \in \Omega \qquad (9)$$
i.e. $v$ is a solution to $LVIP(u)$. Thus $d \in D(u)$ and the mapping $D$ is closed. The continuity of $g$ then ensures that the limit of any convergent subsequence (by compactness of $\Omega$ there exists at least one such subsequence) is a solution to VIP. QED
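For illustration only, here is a minimal sketch of the outer loop of algorithm GNEW, assuming a user-supplied routine `solve_lvip` that returns a solution of $LVIP(x^k)$ (e.g. Lemke's method when $\Omega$ is polyhedral) and a gap-function evaluator `gap_fn` such as the one sketched in section 1. The step-halving loop is a crude stand-in for the exact minimization of $g$ over $[0,1]$ prescribed at step 2, not the linesearch actually used in the paper.

```python
# Hedged sketch of algorithm GNEW; solve_lvip and gap_fn are assumed callables.
def gnew(x0, solve_lvip, gap_fn, tol=1e-8, max_iter=100):
    x = x0.copy()
    for _ in range(max_iter):
        g_x = gap_fn(x)
        if g_x <= tol:                     # convergence criterion on the gap
            break
        x_bar = solve_lvip(x)              # step 1: solve the linearized VI at x
        d = x_bar - x                      # descent direction for g (Lemma)
        if gap_fn(x + d) <= 0.5 * g_x:     # step 2: accept the full Newton step
            theta = 1.0
        else:
            theta = 1.0                    # otherwise shrink until g decreases
            while theta > 1e-12 and gap_fn(x + theta * d) >= g_x:
                theta *= 0.5
        x = x + theta * d
    return x
```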

3. Local convergence results. The next three results make precise the behaviour of the iterates generated in a neighborhood of the solution $x^*$, when it is unique.

Theorem 2. If (i) $F$ is strongly monotone on $\Omega$ with Lipschitz continuous Jacobian $F'$, (ii) $\Omega$ is a polyhedron, and (iii) strict complementarity holds at the (unique) solution $x^*$, then there exists an index $K$ such that $\theta_k = 1$ for all $k \ge K$.

Proof. From the proof of proposition 8 in Marcotte and Dussault [6] we have that $\|x - x^*\| = O(g(x))$. From the quadratic convergence of Newton's method it then follows that $g(x^{k+1}) \le c\,[g(x^k)]^2$ for some constant $c$, whenever $x^k$ lies in some neighborhood $B(x^*, \epsilon)$ of $x^*$. If $K$ is chosen such that $x^k \in B(x^*, \epsilon)$ and $g(x^k) \le 1/(2c)$ for all $k \ge K$, then $\theta_k$ will be set to 1 at step 2 of algorithm GNEW. QED

Corollary. Under the assumptions of theorem 2, algorithm GNEW is quadratically convergent.

Proof. For $k \ge K$, GNEW and Newton's method are equivalent. QED

Theorem 3. (IDENTIFICATION OF THE OPTIMAL FACE) If $\Omega$ is polyhedral, the solution $x^*$ is unique, and strict complementarity holds at the solution, then there exists an index $L$ such that $x^k$ lies in the optimal face $T^*$ whenever $k \ge L$.

Proof. Assume the result does not hold. Then there exist a subsequence $\{x^k\}_{k \in I}$ and an extreme point $y$ of the optimal face $T^k$ associated with $LVIP(x^{k-1})$ such that $y \notin T^*$; in other words:
$$(x^k - y)^t \left( F(x^{k-1}) + F'(x^{k-1})(x^k - x^{k-1}) \right) = 0 \qquad \forall k \in I \qquad (10)$$
but:
$$(x^* - y)^t F(x^*) < 0 \qquad (11)$$
Passing to the limit in (10) yields:
$$(x^* - y)^t F(x^*) = 0$$
in contradiction with (11). QED

Remark. It is known (see Robinson [8]) that Newton's method does not require strong monotonicity for quadratic convergence. However, to get second-order convergence, invertibility of the Jacobian matrix restricted to the optimal face $T^*$ is usually assumed; since monotonicity of $F$ on $\Omega$ is required for global convergence, this implies that the restriction of $F$ to $T^*$ has to be strongly monotone in a neighborhood of $x^*$ (relative to $T^*$). Since $T^*$ is not known a priori, the strong

monotonicity condition on all of $\Omega$ cannot be substantially weakened. Furthermore, in the proof of theorem 2, the statement $\|x - x^*\| = O(g(x))$ may fail to be valid if $F$ is not strongly monotone in a neighborhood of the solution.

4. Numerical results. The algorithm has been developed while investigating efficient methods for solving large-scale network equilibrium problems, using the restriction strategy of von Hohenbalken [9]. At each iteration, a variational inequality problem on the unit simplex is solved. For the restricted variational inequality, the gap function can be evaluated by inspecting its value at the extreme points defining the current restriction, rather than by solving a linear program over $\Omega$. In our implementation, the linearized subproblems have been solved using Lemke's complementary pivot algorithm, while the linesearch followed Armijo's rule.¹

The pseudo-random cost functions generated assumed the general form:
$$F(x) = \lambda (A - A^t)x + B^t B x + \gamma\, C(x) + b$$
The entries of matrices $A$ and $B$ are randomly generated from uniform variates; $C(x)$ is a nonlinear diagonal mapping whose $i$'th component has the form:
$$C_i(x) = \arctan(x_i)$$
The constants $\lambda$ and $\gamma$ can be used to vary the asymmetry level of the cost mapping and to increase the importance of the nonlinear term, making the problem more difficult to solve for Newton's method. The parameter $b$ has been adjusted in order that the optimal solution be a priori known.

Five test problems of sizes 4, 5, 5, 15 and 25 have been solved. Data for the smallest problems are given in tables 1 and 4. Numerical results are given in the remaining tables. As can be readily observed, a high value of parameter $\gamma$ can force Newton's method into an erratic behaviour (see table 7), while small corrections (see table 8) are sufficient to make the method convergent. In one instance (see tables 2 and 3) a single stepsize of length less than one resulted in a reduction of the number of iterations from ten to five. Finally, it has been observed that both algorithms seem to be rather insensitive to variations in the asymmetry level of the cost mapping, which is monitored by the relative importance of parameters $\lambda$ and $\gamma$.

¹ Armijo's stepsize rule usually requires that the objective be continuously differentiable. However, in our case it can be proven that the directional derivative along Newton's direction varies continuously with $x$, due to the differentiability of the cost mapping $F$. Hence it is unnecessary to use a more sophisticated linesearch strategy, such as the one proposed by Mifflin [10]. On our test examples, Armijo's rule performed very satisfactorily.
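A possible way of generating such pseudo-random test mappings is sketched below; the parameter names `lam` and `gamma` mirror the constants $\lambda$ and $\gamma$ above, and the whole construction (uniform variates, NumPy) is an assumption for illustration only, since the paper does not describe its implementation.

```python
# Illustrative generator for a test mapping of the form
# F(x) = lam*(A - A^t) x + B^t B x + gamma*arctan(x) + b, and its Jacobian.
import numpy as np

def make_test_mapping(n, lam=1.0, gamma=1.0, b=None, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.uniform(size=(n, n))
    B = rng.uniform(size=(n, n))
    skew = A - A.T          # antisymmetric part: controls the asymmetry level
    sym = B.T @ B           # symmetric positive semidefinite part
    b = np.zeros(n) if b is None else np.asarray(b, dtype=float)

    def F(x):
        return lam * skew @ x + sym @ x + gamma * np.arctan(x) + b

    def J(x):               # Jacobian of F; d/dx_i arctan(x_i) = 1/(1 + x_i^2)
        return lam * skew + sym + gamma * np.diag(1.0 / (1.0 + x**2))

    return F, J
```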

References

[1] A. Auslender, Optimisation: méthodes numériques, Masson, Paris (1976).
[2] J.M. Danskin, "The theory of max-min, with applications", SIAM Journal on Applied Mathematics 14, 641-664 (1966).
[3] D.W. Hearn, "The gap function of a convex program", Operations Research Letters 1, 67-71 (1981).
[4] D.G. Luenberger, Introduction to Linear and Nonlinear Programming, Addison-Wesley, Reading, Mass. (1973).
[5] P. Marcotte, "A new algorithm for solving variational inequalities, with application to the traffic assignment problem", Mathematical Programming 33, 339-351 (1985).
[6] P. Marcotte and J.-P. Dussault, "A modified Newton method for solving variational inequalities", Proceedings of the 24th IEEE Conference on Decision and Control, Fort Lauderdale, December 11-13 (1985).
[7] J.S. Pang and D. Chan, "Iterative methods for variational and complementarity problems", Mathematical Programming 24, 284-313 (1982).
[8] S.M. Robinson, "Generalized equations", in Mathematical Programming: The State of the Art, Bachem, Grötschel, Korte eds., Springer-Verlag, Berlin (1983).
[9] B. von Hohenbalken, "A finite algorithm to maximize certain pseudo-concave functions", Mathematical Programming 8, 189-196 (1975).
[10] R. Mifflin, "A superlinearly convergent algorithm for one-dimensional minimization with convex functions", Mathematics of Operations Research 8, 189-206 (1983).

Acknowledgments. We want to thank René Ferland for his careful programming of the test problems, and a referee for suggestions that helped improve the paper.