A Global Regularization Method for Solving the Finite Min-Max Problem. O. Barrientos, LAO, June 1996
LABORATOIRE APPROXIMATION ET OPTIMISATION
Université Paul Sabatier, 118 route de Narbonne, Toulouse cedex, France
lao@cict.fr
Abstract. In this paper, we present a method for solving the finite nonlinear min-max problem. Using quasi-Newton methods, we approximately solve a sequence of differentiable subproblems where, for each subproblem, the cost function to minimize is a global regularization underestimating the finite maximum function. We show that every cluster point of the sequence generated is a stationary point of the min-max problem and therefore, in the convex case, a solution of the problem. Moreover, we give numerical results for a large set of test problems which show that the method is efficient in practice.

Key words: min-max, regularization technique, nondifferentiable optimization, quasi-Newton methods.

1. Introduction

The so-called finite min-max problem can be stated as follows:

    \min_{x \in \mathbb{R}^n} \varphi(x),    (1)

where

    \varphi(x) = \max_{j \in \{1,\dots,m\}} f_j(x),    (2)

and for each $j \in \{1,\dots,m\}$ the function $f_j$ is defined on $\mathbb{R}^n$. In this work, we suppose that all the functions $f_j$ are continuously differentiable on $\mathbb{R}^n$. It is a well-known fact that the function $\varphi$ is generally nondifferentiable at points where the maximum in (2) is attained by more than one function $f_j$.

In this paper, we present a method to solve problem (1) by generating differentiable subproblems whose solutions cluster to stationary points of the original problem. These subproblems are obtained by a global regularization of the function $\varphi$, while the approximate solutions of the subproblems are obtained with a quasi-Newton method. This regularization technique was introduced by Gigola & Gomez in [1]; we have added one parameter which will allow us to show, for example, that the sequence generated by the method is bounded.
Moreover, in the convex case, we establish that this global regularization of the function $\varphi$ is essentially the Moreau-Yosida regularization of the support function of the unit simplex in $\mathbb{R}^m$. We also extend to the nonconvex case the results shown in [1] in a clear, simple and direct fashion.

On the other hand, some other methods have been proposed in the literature to solve the min-max problem. Most of them use the function $\varphi$ as a merit function. In order to find a descent direction, Charalambous & Conn [2] and Demyanov & Malozemov [3] use the $\varepsilon$-feasibility concept, while Han [4] and Murray & Overton [5] solve quadratic programming subproblems. Zang [6] uses a local regularization-type method to approximate $\varphi$ by smooth functions.

The paper is organised as follows. In the second section, we define the regularized function $\varphi_\varepsilon$ of the function $\varphi$; this function will be defined on $\mathbb{R}^m \times \mathbb{R}^n$ and we say that $\mathbb{R}^m$ is the space of the dual variables. We also show some properties of the regularized function. Next, in Section 3, we analyse two different theoretical methods for solving the finite min-max problem. The first is a primal method, in the sense that we consider the dual variables as parameters; it consists in the approximate minimization of differentiable subproblems defined via the regularized function $\varphi_\varepsilon$. We show that every cluster point of the sequence of approximate solutions of these subproblems is a stationary point of problem (1) (see Theorem 3.1 in Section 3). The second is a primal-dual method based on the proximal method for saddle functions; in this case, the method is an application of the proximal point theory developed by Martinet in [7] and Rockafellar in [8]. We show that the sequence generated by this method converges to a solution point of the min-max problem (see Proposition 3.4 in Section 3). Unfortunately, we could not show that the conditions for obtaining this result are attainable in practice. In Section 4, we first explain the algorithm proposed to solve the min-max problem and then describe in detail the method as it is implemented.

Footnote 1: This work was accomplished while the author was visiting the Laboratoire Approximation et Optimisation at Université Paul Sabatier, Toulouse, France. The author is indebted to Professor J.-B. Hiriart-Urruty for bringing to his attention the work of Gigola & Gomez referenced in this paper and for supervising this work; his lucid comments substantially improved this article.

Footnote 2: Depto. Ingeniería Matemática, Universidad de Chile, Casilla 170/3 Correo 3, Santiago, Chile.
Finally, in the last section, we show the performance of the proposed method by testing first the well-known MAXQUAD problem and then a large set of test problems where the cost function is a finite maximum of random (convex and nonconvex) quadratic functions.

2. Regularization Technique

In this part, we present the regularization technique that we use to obtain the regularized function underestimating the function $\varphi$. For simplicity, we consider $m$ convex functions $f_j : \mathbb{R}^n \to \mathbb{R}$, $j \in \{1,\dots,m\}$, but it is easy to see that this property is not necessary to obtain the results we are going to derive. Moreover, if we define the unit simplex of $\mathbb{R}^m$,

    U = \{ u \in \mathbb{R}^m : \sum_{j=1}^m u_j = 1,\ u_j \ge 0 \text{ for all } j \in \{1,\dots,m\} \},    (3)
it is straightforward to see that the function $\varphi$ defined in (2) verifies the relation

    \varphi(x) = \max_{u \in U} \sum_{j=1}^m u_j f_j(x).    (4)

We now define the function $\theta : \mathbb{R}^m \to \mathbb{R}$ by

    \theta(y) = \max_{j \in \{1,\dots,m\}} y_j \quad \text{for all } y = (y_1,\dots,y_m)^T \in \mathbb{R}^m,    (5)

and we therefore have the equality $\varphi(x) = \theta(f_1(x),\dots,f_m(x))$ for all $x \in \mathbb{R}^n$. On the other hand, for each $\varepsilon > 0$ and for each $v \in \mathbb{R}^m$, we consider the function $\theta_\varepsilon(v,\cdot) : \mathbb{R}^m \to \mathbb{R}$ defined by

    \theta_\varepsilon(v, y) = \max_{u \in U} \{ \langle u, y \rangle - \tfrac{\varepsilon}{2} \| u - v \|^2 \} \quad \text{for all } y \in \mathbb{R}^m.    (6)

We will subsequently see that the function $\theta_\varepsilon(v,\cdot)$ is a global regularization of the function $\theta$. This regularization technique was introduced, in the case $\varepsilon = 1$, by Gigola and Gomez in [1]. We will also see that the introduction of the parameter $\varepsilon$ has interesting consequences from the algorithmic viewpoint (see for example Proposition 3.1, Corollary 3.1 or Theorem 3.1 in Section 3).

We can then write the equality

    \theta_\varepsilon(v, y) = \sup_{u \in \mathbb{R}^m} \big\{ \langle u, y \rangle - \big[ \tfrac{\varepsilon}{2} \| u - v \|^2 + \delta_U(u) \big] \big\},

where $\delta_U$ denotes the indicator function of the set $U$, that is, $\delta_U(u) = 0$ if $u \in U$ and $\delta_U(u) = +\infty$ otherwise. We now consider the perturbed function $\Psi_\varepsilon$ of the function $\delta_U$ defined by

    (v, u) \in \mathbb{R}^m \times \mathbb{R}^m \longmapsto \Psi_\varepsilon(v, u) = \tfrac{\varepsilon}{2} \| u - v \|^2 + \delta_U(u), \qquad \varepsilon > 0;

we then have the equalities

    \theta_\varepsilon(v, y) = [\Psi_\varepsilon(v,\cdot)]^*(y)
        = \big[ \sigma_U \mathbin{\square} \big( \langle v, \cdot \rangle + \tfrac{1}{2\varepsilon} \| \cdot \|^2 \big) \big](y)
        = -\tfrac{\varepsilon}{2} \| v \|^2 + \inf_{u \in \mathbb{R}^m} \big\{ \sigma_U(u) + \tfrac{1}{2\varepsilon} \| (\varepsilon v + y) - u \|^2 \big\},    (7)

where $[\Psi_\varepsilon(v,\cdot)]^*$ denotes the Legendre-Fenchel conjugate of the function $\Psi_\varepsilon(v,\cdot)$, $\sigma_U$ is the support function of the set $U$, that is, $\sigma_U(y) = \max_{u \in U} \langle u, y \rangle$ for all $y \in \mathbb{R}^m$, and $\square$ denotes the infimal convolution operation.
Thus, if $T_\varepsilon$ denotes the Moreau-Yosida regularization of the function $\sigma_U$, we can then write the equality

    \theta_\varepsilon(v, y) = -\tfrac{\varepsilon}{2} \| v \|^2 + T_\varepsilon(\varepsilon v + y) \quad \text{for all } y \in \mathbb{R}^m;    (8)

it tells us that the function $\theta_\varepsilon(v,\cdot)$ is essentially the Moreau-Yosida regularization of the function $\sigma_U$.

We now show some properties of the functions $\theta$ and $\theta_\varepsilon$. In particular, we prove that the function $\theta_\varepsilon(v,\cdot)$ is a global regularization of the function $\theta$.

Proposition 2.1
i) For all $\varepsilon > 0$ and for all $v \in \mathbb{R}^m$, the function $\theta_\varepsilon(v,\cdot)$ is convex differentiable and its gradient mapping is Lipschitz on $\mathbb{R}^m$.
ii) For all $\varepsilon > 0$, we have the equality $\theta(y) = \max_{v \in U} \theta_\varepsilon(v, y)$ for all $y \in \mathbb{R}^m$.
iii) For all $\varepsilon > 0$ and for all $v \in U$, we have the inequality

    \theta(y) - \tfrac{\varepsilon}{2} \big[ 1 + \| v \|^2 - 2 \min_{j \in \{1,\dots,m\}} v_j \big] \le \theta_\varepsilon(v, y) \le \theta(y) \quad \text{for all } y \in \mathbb{R}^m.    (9)

Proof.
i) The convexity of the function $\theta_\varepsilon(v,\cdot)$ follows from the first equality in (7), whereas its regularity follows from the equality (8).
ii) It follows from the definition of the function $\theta_\varepsilon(v,\cdot)$ in (6).
iii) The inequality on the right of (9) is straightforward from the definition (6), whereas the inequality on the left follows from the inequality

    y_j - \tfrac{\varepsilon}{2} \big[ \sum_{i \ne j} v_i^2 + (1 - v_j)^2 \big] \le \theta_\varepsilon(v, y) \quad \text{for all } j \in \{1,\dots,m\},

obtained by taking $u = e_j$ (the $j$-th unit vector) in (6).

From now on, for each $\varepsilon > 0$, we consider the regularized function $\varphi_\varepsilon$ of the function $\varphi$ defined on $\mathbb{R}^m \times \mathbb{R}^n$ as

    \varphi_\varepsilon(v, x) = \theta_\varepsilon(v, f(x)) = \max_{u \in U} \{ \langle u, f(x) \rangle - \tfrac{\varepsilon}{2} \| u - v \|^2 \},    (10)

where $f(x) = (f_1(x),\dots,f_m(x))^T \in \mathbb{R}^m$ for all $x \in \mathbb{R}^n$. For simplicity, for each $\varepsilon > 0$, each $x \in \mathbb{R}^n$ and each $v \in \mathbb{R}^m$, we denote by $u(v, x)$ the element of $U$ verifying the equality

    \varphi_\varepsilon(v, x) = \langle u(v, x), f(x) \rangle - \tfrac{\varepsilon}{2} \| u(v, x) - v \|^2    (11)

(for calculation details, see Section 4.2).
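The underestimation property of Proposition 2.1 iii) can be illustrated numerically. The following Python sketch is ours, not from the paper: for $m = 2$ we parametrize $u = (t, 1-t)$, so the maximand in (6) is concave in $t$ and the maximizer is the clamp of the unconstrained critical point $t^* = v_1 + (y_1 - y_2)/(2\varepsilon)$.

```python
import random

# Numerical illustration (a sketch of ours, not from the paper) of the
# underestimation property theta(y) - eps <= theta_eps(v, y) <= theta(y)
# for v in the unit simplex U, in the case m = 2.

def theta(y):
    return max(y)

def theta_eps(v1, y, eps):
    # maximizer over u = (t, 1-t): clamp of t* = v1 + (y1 - y2) / (2 eps)
    t = min(max(v1 + (y[0] - y[1]) / (2.0 * eps), 0.0), 1.0)
    # <u, y> - (eps/2)||u - v||^2, where ||u - v||^2 = 2 (t - v1)^2
    return t * y[0] + (1.0 - t) * y[1] - eps * (t - v1) ** 2

random.seed(0)
for _ in range(1000):
    y = [random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0)]
    v1 = random.random()
    eps = random.uniform(0.01, 10.0)
    val = theta_eps(v1, y, eps)
    assert theta(y) - eps - 1e-12 <= val <= theta(y) + 1e-12
print("sandwich inequality verified on 1000 random samples")
```

The loose bound $\theta(y) - \varepsilon$ used in the assertions follows from (9), since $1 + \|v\|^2 - 2\min_j v_j \le 2$ for $v \in U$.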
Moreover, from the inequality (9), it is easy to deduce the inequality

    \varphi(x) - \varepsilon \le \varphi_\varepsilon(v, x) \le \varphi(x) \quad \text{for all } (v, x) \in U \times \mathbb{R}^n.    (12)

We next show some properties of the function $\varphi_\varepsilon$. In particular, we prove that the function $\varphi_\varepsilon$ is indeed a global regularization of the function $\varphi$.

Proposition 2.2 Let us suppose that all the functions $f_j$ are differentiable on $\mathbb{R}^n$. Then,
i) For all $\varepsilon > 0$ and for all $v \in \mathbb{R}^m$, the function $\varphi_\varepsilon(v,\cdot)$ is differentiable on $\mathbb{R}^n$ and $\nabla_x \varphi_\varepsilon(v, x) = \sum_{j=1}^m u_j(v, x) \nabla f_j(x)$, where $u(v, x)$ is defined in (11).
ii) For all $\varepsilon > 0$ and for all $x \in \mathbb{R}^n$, the function $\varphi_\varepsilon(\cdot, x)$ is concave differentiable on $\mathbb{R}^m$ and $\nabla_v \varphi_\varepsilon(v, x) = \varepsilon [u(v, x) - v]$, where $u(v, x)$ is defined in (11).
iii) For all $\varepsilon > 0$, for all $x \in \mathbb{R}^n$, for all $v_1, v_2 \in \mathbb{R}^m$ and for all $\lambda \in [0, 1]$, we have the inequality

    \varphi_\varepsilon(\lambda v_1 + (1-\lambda) v_2, x) \ge \lambda \varphi_\varepsilon(v_1, x) + (1-\lambda) \varphi_\varepsilon(v_2, x)
        + \tfrac{\lambda(1-\lambda)}{2\varepsilon} \| \nabla_v \varphi_\varepsilon(v_1, x) - \nabla_v \varphi_\varepsilon(v_2, x) \|^2.    (13)

Proof.
i) The differentiability of the function $\varphi_\varepsilon(v,\cdot)$ follows from the differentiability of the functions $\theta_\varepsilon(v,\cdot)$ and $f_j$ for all $j \in \{1,\dots,m\}$.
ii) The differentiability of the function $\varphi_\varepsilon(\cdot, x)$ follows from the definition (10), whereas its concavity is a consequence of the inequality (13).
iii) Given $\varepsilon > 0$, $x \in \mathbb{R}^n$, $v_1, v_2 \in \mathbb{R}^m$ and $\lambda \in [0, 1]$, from the definition (10) we have the inequality

    \varphi_\varepsilon(\lambda v_1 + (1-\lambda) v_2, x) \ge \langle u, f(x) \rangle - \tfrac{\varepsilon}{2} \| \lambda v_1 + (1-\lambda) v_2 - u \|^2 \quad \text{for all } u \in U.

We now set $u = \lambda u_1 + (1-\lambda) u_2$ with $u_1, u_2 \in U$, obtaining the equality

    \langle u, f(x) \rangle - \tfrac{\varepsilon}{2} \| \lambda v_1 + (1-\lambda) v_2 - u \|^2
        = \lambda \big[ \langle u_1, f(x) \rangle - \tfrac{\varepsilon}{2} \| v_1 - u_1 \|^2 \big]
        + (1-\lambda) \big[ \langle u_2, f(x) \rangle - \tfrac{\varepsilon}{2} \| v_2 - u_2 \|^2 \big]
        + \tfrac{\varepsilon}{2} \lambda (1-\lambda) \| (v_1 - u_1) - (v_2 - u_2) \|^2,

and we end the proof by taking $u_k = u(v_k, x)$, $k = 1, 2$, where $u(v_k, x)$ is defined in (11).
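The gradient formulas of Proposition 2.2 can be checked by finite differences. The toy instance below is ours, not from the paper: $m = 2$ on $\mathbb{R}$, with $f_1(x) = x^2$ and $f_2(x) = 2x + 3$, where $u(v, x) = (t, 1-t)$ has the closed form $t = \mathrm{clamp}(v_1 + (f_1(x) - f_2(x))/(2\varepsilon), 0, 1)$.

```python
# Finite-difference check (our own toy instance) of Proposition 2.2:
# i)  grad_x phi_eps(v, x) = sum_j u_j(v, x) grad f_j(x),
# ii) grad_v phi_eps(v, x) = eps [u(v, x) - v]  (reduced to the v1 variable).

def clamp01(t):
    return min(max(t, 0.0), 1.0)

def u1_of(v1, x, eps):
    return clamp01(v1 + (x * x - (2.0 * x + 3.0)) / (2.0 * eps))

def phi_eps(v1, x, eps):
    y1, y2 = x * x, 2.0 * x + 3.0
    t = u1_of(v1, x, eps)
    return t * y1 + (1.0 - t) * y2 - eps * (t - v1) ** 2  # ||u-v||^2 = 2(t-v1)^2

v1, x, eps, h = 0.6, 2.0, 5.0, 1e-6
t = u1_of(v1, x, eps)                       # = 0.3, an interior maximizer

gx = t * (2.0 * x) + (1.0 - t) * 2.0        # formula i)
fd_x = (phi_eps(v1, x + h, eps) - phi_eps(v1, x - h, eps)) / (2.0 * h)

gv = 2.0 * eps * (t - v1)                   # formula ii), reduced to v1
fd_v = (phi_eps(v1 + h, x, eps) - phi_eps(v1 - h, x, eps)) / (2.0 * h)

print(abs(gx - fd_x), abs(gv - fd_v))       # both differences are tiny
```

The reduction to a single dual variable uses $v = (v_1, 1 - v_1)$, so the chain rule turns $\varepsilon[u - v]$ into $2\varepsilon(t - v_1)$.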
3. Convergence Results

In this part, we are first interested in knowing whether there exists a relation between the set of minimum points of the functions $\varphi_\varepsilon(v,\cdot)$ and the set of minimum points of the function $\varphi$. Next, we also study whether there exists a relation between the set of saddle points of the problem $\max_{v \in U} \min_{x \in \mathbb{R}^n} \varphi_\varepsilon(v, x)$ and the set of saddle points of the problem $\max_{v \in U} \min_{x \in \mathbb{R}^n} \langle v, f(x) \rangle$.

For simplicity, we suppose that the $m$ functions $f_j$ are continuous on $\mathbb{R}^n$ and that there exists a function $f_{j_0}$ coercive on $\mathbb{R}^n$ for some $j_0 \in \{1,\dots,m\}$, that is, $\lim_{\|x\| \to +\infty} f_{j_0}(x) = +\infty$. Then, for each $\varepsilon > 0$ and for each $v \in U$, the sets of minimum points of the functions $\varphi_\varepsilon(v,\cdot)$ and $\varphi$ are nonempty.

Proposition 3.1 Let us consider the three following sequences: $\{\varepsilon_k\}$ a sequence of positive real numbers converging to zero, $\{v_k\}$ a sequence in $U$, and $\{f_k\}$ the sequence of functions defined on $\mathbb{R}^n$ by

    f_k(x) = \varphi_{\varepsilon_k}(v_k, x) \quad \text{for all } x \in \mathbb{R}^n.    (14)

If we define the sequence $\{x_k\} \subset \mathbb{R}^n$ as verifying

    f_k(x_k) = \min_{x \in \mathbb{R}^n} f_k(x),

then every cluster point of the sequence $\{x_k\}$ is a solution point of problem (1).

Proof. To prove the proposition we use the following result due to Attouch & Wets:

Proposition. (See Ref. [9] or Ref. [10]) If $x_k$ is a minimum point of the function $f_k$ and if the sequence $\{f_k\}$ epiconverges to $f$, then every cluster point of the sequence $\{x_k\}$ is contained in the set of minimum points of the function $f$.

Therefore, we only have to show that the sequence $\{f_k = \varphi_{\varepsilon_k}(v_k,\cdot)\}$ epiconverges to the function $\varphi$; this follows from the inequality (12) and the continuity of the functions $\varphi$ and $\varphi_\varepsilon$.

We next give a sufficient condition ensuring that the sequence $\{x_k\}$ is bounded. Note that the second condition below is verified if the point $x_{k+1}$ is simply obtained by using any descent method on the function $\varphi_{\varepsilon_k}(v_k,\cdot)$ starting from the point $x_k$.
Proposition 3.2 Let us consider the three following sequences: $\{\varepsilon_k\}$ a sequence of positive real numbers, $\{v_k\}$ a sequence in $U$, and $\{x_k\}$ a sequence in $\mathbb{R}^n$ verifying the following properties:
i) $\sum_{k \in \mathbb{N}} \varepsilon_k < +\infty$;
ii) $\varphi_{\varepsilon_k}(v_k, x_{k+1}) \le \varepsilon_k + \varphi_{\varepsilon_k}(v_k, x_k)$ for all $k \in \mathbb{N}$.

Then, the sequence $\{x_k\}$ is contained in the set $\{ x \in \mathbb{R}^n : \varphi(x) \le \varphi(x_0) + 2 \sum_{k \in \mathbb{N}} \varepsilon_k \}$.

Proof. From the inequality (12) and condition ii), we obtain the inequality

    \varphi_{\varepsilon_k}(v_k, x_{k+1}) \le \varphi(x_k) + \varepsilon_k \quad \text{for all } k \in \mathbb{N}.

By using again the inequality (12), we can write $\varphi(x_{k+1}) \le \varphi(x_k) + 2\varepsilon_k$ for all $k \in \mathbb{N}$, and we therefore conclude

    \varphi(x_i) \le \varphi(x_0) + 2 \sum_{k \in \mathbb{N}} \varepsilon_k \quad \text{for all } i \in \mathbb{N}.

Corollary 3.1 If, in addition to the hypotheses in Proposition 3.2, we suppose that there exists a function $f_{j_0}$ coercive on $\mathbb{R}^n$ for some $j_0 \in \{1,\dots,m\}$, then the sequence $\{x_k\}$ is bounded.

We now present an interesting property of the sequence $\{v_k\} \subset U$ when $v_k$ is `optimally' chosen: the cluster points of this sequence are always optimal points for the limits of the primal sequence $\{x_k\}$. Moreover, we characterize the stationarity condition of a point $x^* \in \mathbb{R}^n$. More precisely, we have:

Lemma 3.1
i) Given a sequence $\{\varepsilon_k\}$ converging to zero and a sequence $\{x_k\} \subset \mathbb{R}^n$, let us consider the sequence $\{v_k\} \subset U$ recursively defined by $v_0 \in U$ and $v_{k+1} = u(v_k, x_{k+1})$, where $u(v_k, x_{k+1})$ is defined in (11). Then, every cluster point $(v^*, x^*) \in U \times \mathbb{R}^n$ of the sequence $\{(v_k, x_k)\}$ satisfies the relationship $\varphi(x^*) = \langle v^*, f(x^*) \rangle$.
ii) If, in addition to the above equality, the point $(v^*, x^*) \in U \times \mathbb{R}^n$ verifies $\sum_{j=1}^m v_j^* \nabla f_j(x^*) = 0$, then $0 \in \partial\varphi(x^*)$, where $\partial\varphi(x^*)$ denotes the Clarke subdifferential of the function $\varphi$ at the point $x^*$ (see Clarke [11]).
Proof.
i) From the definitions (10) and (11), for every $u \in U$ we have

    \langle v_{k+1}, f(x_{k+1}) \rangle - \tfrac{\varepsilon_k}{2} \| v_{k+1} - v_k \|^2 \ge \langle u, f(x_{k+1}) \rangle - \tfrac{\varepsilon_k}{2} \| u - v_k \|^2.

Now, let us suppose that the sequence $\{(v_{k+1}, x_{k+1})\}$ converges to the point $(v^*, x^*)$; then, by taking limits, we obtain $\langle v^*, f(x^*) \rangle \ge \langle u, f(x^*) \rangle$ for all $u \in U$, that is, $\varphi(x^*) = \langle v^*, f(x^*) \rangle$.
ii) It is straightforward by recalling that the first-order directional derivative $\varphi'(x^*; d)$ of the function $\varphi$ at the point $x^* \in \mathbb{R}^n$ in the direction $d \in \mathbb{R}^n$ verifies the formula

    \varphi'(x^*; d) = \max_{v \in I(x^*)} \sum_{j=1}^m v_j \langle \nabla f_j(x^*), d \rangle,

where $I(x^*) = \{ v \in U : \varphi(x^*) = \langle v, f(x^*) \rangle \}$.

In Proposition 3.1, we have shown that the minimum points of the functions $f_k$ (defined in (14)) cluster to minimum points of the function $\varphi$. From a practical viewpoint, this result is not interesting because the cost of obtaining one minimum point of the function $f_k$ may be as expensive as solving problem (1). Consequently, we next show a more (computationally) implementable version of this proposition, which will be the basis for the design of our algorithm for solving problem (1) in Section 4.

Theorem 3.1 Let us suppose that all the functions $f_j$ are continuously differentiable on $\mathbb{R}^n$. Given two sequences $\{\delta_k\}$, $\{\varepsilon_k\}$ converging to zero and a point $(v_0, x_0) \in U \times \mathbb{R}^n$, let us consider the sequence $\{(v_k, x_k)\} \subset U \times \mathbb{R}^n$ recursively defined as follows:
a) starting from the point $x_k$, the function $\varphi_{\varepsilon_k}(v_k,\cdot)$ is minimized until finding a point $x_{k+1} \in \mathbb{R}^n$ verifying

    \| \nabla_x \varphi_{\varepsilon_k}(v_k, x_{k+1}) \| \le \delta_k;    (15)

b) $v_{k+1} = u(v_k, x_{k+1})$, where $u(v_k, x_{k+1})$ is defined in (11).

Then, every cluster point $x^* \in \mathbb{R}^n$ of the sequence $\{x_k\}$ is a stationary point of problem (1), that is, the point $x^*$ verifies $0 \in \partial\varphi(x^*)$, where $\partial\varphi(x^*)$ denotes the Clarke subdifferential of the function $\varphi$ at the point $x^*$ (see Clarke [11]).
Proof. Let $\{x_{k'}\}$ be a subsequence of $\{x_k\}$ converging to $x^*$. From the compactness of the set $U$, we can suppose that the corresponding subsequence $\{v_{k'}\}$ of $\{v_k\}$ converges to $v^* \in U$. From Proposition 2.2 i) and (15), we have the inequality

    \| \nabla_x \varphi_{\varepsilon_{k'-1}}(v_{k'-1}, x_{k'}) \| = \Big\| \sum_{j=1}^m v_j^{k'} \nabla f_j(x_{k'}) \Big\| \le \delta_{k'-1},

and by taking limits we obtain $\sum_{j=1}^m v_j^* \nabla f_j(x^*) = 0$. On the other hand, from Lemma 3.1, we know that the point $(v^*, x^*)$ verifies the relationship $\varphi(x^*) = \langle v^*, f(x^*) \rangle$. We can therefore conclude that

    0 = \sum_{j=1}^m v_j^* \nabla f_j(x^*) \in \partial\varphi(x^*).

Corollary 3.2 If, in addition to the hypotheses of the above theorem, we suppose that the sequence $\{\varepsilon_k\}$ verifies $\sum_{k \in \mathbb{N}} \varepsilon_k < +\infty$, that all the functions $f_j$ are convex on $\mathbb{R}^n$, and that there exists $f_{j_0}$ coercive on $\mathbb{R}^n$ for some $j_0 \in \{1,\dots,m\}$, then every cluster point of the sequence $\{x_k\}$ is a solution point of problem (1). Moreover, we have the equalities

    \lim_{k \to +\infty} \varphi_{\varepsilon_k}(v_k, x_{k+1}) = \lim_{k \to +\infty} \varphi(x_{k+1}) = \min_{x \in \mathbb{R}^n} \varphi(x).

Proof. Firstly, from Corollary 3.1, we know that the sequence $\{x_k\}$ is bounded. Now, given $x_1^*, x_2^* \in \mathbb{R}^n$, $x_1^* \ne x_2^*$, two cluster points of the sequence $\{x_k\}$, from the above theorem we have that $0 \in \partial\varphi(x_1^*) \cap \partial\varphi(x_2^*)$. Since $\varphi$ is a convex function, this means that both points are solutions of problem (1), that is,

    \bar\alpha := \varphi(x_1^*) = \varphi(x_2^*) = \min_{x \in \mathbb{R}^n} \varphi(x).

From the above reasoning, we also deduce that the sequence $\{\varphi(x_{k+1})\}$ is bounded and has $\bar\alpha$ as its unique cluster point; it therefore converges to $\bar\alpha$. Finally, by using (12), the sequence $\{\varphi_{\varepsilon_k}(v_k, x_{k+1})\}$ also converges to $\bar\alpha$.

Remark. In Theorem 3.1, if we suppose that the sequence $\{x_k\}$ is bounded (from Corollary 3.1, this holds if $\sum_{k \in \mathbb{N}} \varepsilon_k < +\infty$) and that the function $\varphi$ has only one stationary point $x^* \in \mathbb{R}^n$, then the sequence $\{x_k\}$ converges to $x^*$.
So far, we have used $v \in U$ as a parameter; we are now interested in knowing what the consequences are when we consider $v$ as an independent variable. In this direction, a well-known relation exists between the solution points of problem (1) and the saddle points of the problem $\max_{v \in U} \min_{x \in \mathbb{R}^n} \langle v, f(x) \rangle$. It is then interesting to know what the saddle points of the problem $\max_{v \in U} \min_{x \in \mathbb{R}^n} \varphi_\varepsilon(v, x)$ are, and whether there exists any relation with those of the problem $\max_{v \in U} \min_{x \in \mathbb{R}^n} \langle v, f(x) \rangle$. We next completely characterise the saddle points of these problems.

Proposition 3.3 Let us suppose that all the functions $f_j$ are convex differentiable on $\mathbb{R}^n$. Given $\varepsilon > 0$, let us consider the two following problems:

    (i)  \max_{v \in U} \min_{x \in \mathbb{R}^n} \langle v, f(x) \rangle;
    (ii) \max_{v \in U} \min_{x \in \mathbb{R}^n} \varphi_\varepsilon(v, x).

Then, the following statements are equivalent:
a) $(v^*, x^*) \in U \times \mathbb{R}^n$ is a saddle point of problem (i).
b) $(v^*, x^*) \in U \times \mathbb{R}^n$ is a saddle point of problem (ii).
c) $(v^*, x^*) \in U \times \mathbb{R}^n$ is a point verifying the equalities $\sum_{j=1}^m v_j^* \nabla f_j(x^*) = 0$ and $\langle v^*, f(x^*) \rangle = \varphi(x^*)$.

Proof. a) $\Rightarrow$ b). Let $(v^*, x^*) \in U \times \mathbb{R}^n$ be a point verifying a), that is,

    \langle u, f(x^*) \rangle \le \langle v^*, f(x^*) \rangle \le \langle v^*, f(x) \rangle \quad \text{for all } u \in U,\ x \in \mathbb{R}^n.

From the inequality on the left, we have

    \langle u, f(x^*) \rangle - \tfrac{\varepsilon}{2} \| u - v \|^2 \le \langle u, f(x^*) \rangle \le \langle v^*, f(x^*) \rangle \quad \text{for all } u, v \in U,

and therefore, by taking the supremum over $u \in U$, we obtain

    \varphi_\varepsilon(v, x^*) \le \langle v^*, f(x^*) \rangle \quad \text{for all } v \in U.

On the other hand, from the definition (10), we have the inequality

    \langle v^*, f(x) \rangle = \langle v^*, f(x) \rangle - \tfrac{\varepsilon}{2} \| v^* - v^* \|^2 \le \varphi_\varepsilon(v^*, x) \quad \text{for all } x \in \mathbb{R}^n.

Now, it is straightforward to deduce that $\langle v^*, f(x^*) \rangle = \varphi_\varepsilon(v^*, x^*)$, and therefore that the point $(v^*, x^*)$ satisfies b).
b) $\Rightarrow$ c). Let $(v^*, x^*) \in U \times \mathbb{R}^n$ be a point verifying b), that is,

    \varphi_\varepsilon(v, x^*) \le \varphi_\varepsilon(v^*, x^*) \quad \text{for all } v \in U,    (16)
    \varphi_\varepsilon(v^*, x^*) \le \varphi_\varepsilon(v^*, x) \quad \text{for all } x \in \mathbb{R}^n.    (17)

First, we write the first-order optimality condition at $v^*$ in (16), that is,

    \langle \nabla_v \varphi_\varepsilon(v^*, x^*), v - v^* \rangle = \varepsilon \langle u(v^*, x^*) - v^*, v - v^* \rangle \le 0 \quad \text{for all } v \in U.

Taking $v = u(v^*, x^*) \in U$, we conclude that $u(v^*, x^*) = v^*$, hence $\varphi_\varepsilon(v^*, x^*) = \langle v^*, f(x^*) \rangle$ and, from the inequality (17), $0 = \nabla_x \varphi_\varepsilon(v^*, x^*) = \sum_{j=1}^m v_j^* \nabla f_j(x^*)$. Now, from the inequality (16) and Proposition 2.1 ii), we also see that

    \varphi(x^*) = \max_{v \in U} \varphi_\varepsilon(v, x^*) = \varphi_\varepsilon(v^*, x^*) = \langle v^*, f(x^*) \rangle.

Therefore, we have shown that the point $(v^*, x^*) \in U \times \mathbb{R}^n$ satisfies c).

To finish the proof, we show that c) $\Rightarrow$ a). Let $(v^*, x^*) \in U \times \mathbb{R}^n$ be a point verifying statement c), and let us consider the function $g$ defined by $g(x) = \langle v^*, f(x) \rangle$ for all $x \in \mathbb{R}^n$. Since the function $g$ is convex differentiable on $\mathbb{R}^n$, we have the equivalence

    \nabla g(x^*) = \sum_{j=1}^m v_j^* \nabla f_j(x^*) = 0 \iff \langle v^*, f(x^*) \rangle \le \langle v^*, f(x) \rangle \text{ for all } x \in \mathbb{R}^n.

On the other hand, from (4), it is easy to see that we have the inequality

    \langle v^*, f(x^*) \rangle = \varphi(x^*) = \max_{u \in U} \langle u, f(x^*) \rangle \ge \langle v, f(x^*) \rangle \quad \text{for all } v \in U.

Therefore, the point $(v^*, x^*) \in U \times \mathbb{R}^n$ satisfies a).

Remark. Let $(v^*, x^*) \in U \times \mathbb{R}^n$ be a saddle point of problem (ii); we have shown while proving b) $\Rightarrow$ c) that $\nabla_v \varphi_\varepsilon(v^*, x^*) = 0$, that is, the point $v^*$ is a maximum of the concave function $\varphi_\varepsilon(\cdot, x^*)$ on $\mathbb{R}^m$. Now, since $\nabla_v \varphi_\varepsilon(v, x) = 0$ implies $v \in U$, we can conclude that the set of saddle points of problem (ii) is not modified if, in the definition of this problem, we replace the set $U$ by $\mathbb{R}^m$. On the other hand, from Lemma 3.1, we see that if the point $(v^*, x^*) \in U \times \mathbb{R}^n$ verifies statement c) of the above proposition, then $x^*$ is a solution point of problem (1).
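The characterization c) can be observed on a concrete convex instance. The toy problem below is ours, not from the paper: $m = 2$ on $\mathbb{R}$ with $f_1(x) = (x-1)^2$ and $f_2(x) = (x+1)^2$, whose saddle point is $(v^*, x^*) = ((1/2, 1/2), 0)$.

```python
# Tiny numerical illustration (our toy) of statement c) in Proposition 3.3:
# at the saddle point, the v*-weighted gradients cancel and <v*, f(x*)>
# equals phi(x*) = max_j f_j(x*).

def f(x):
    return ((x - 1.0) ** 2, (x + 1.0) ** 2)

def grad_f(x):
    return (2.0 * (x - 1.0), 2.0 * (x + 1.0))

x_star, v_star = 0.0, (0.5, 0.5)

weighted_grad = sum(vj * gj for vj, gj in zip(v_star, grad_f(x_star)))
dual_value = sum(vj * fj for vj, fj in zip(v_star, f(x_star)))

print(weighted_grad, dual_value, max(f(x_star)))  # 0.0 1.0 1.0
```

As the remark above notes, $x^* = 0$ is then a solution point of the min-max problem for this instance.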
To end this section, we present a result showing that the sequence generated by the proximal method developed by Rockafellar in [8] converges to a saddle point of problem (i) or (ii) in Proposition 3.3 above. Unfortunately, the method has only a theoretical interest, because the conditions (18) and (19) below cannot be simultaneously verified in practice.

Proposition 3.4 Let us suppose that all the functions $f_j$ are convex differentiable on $\mathbb{R}^n$, that there exists a function $f_{j_0}$ coercive on $\mathbb{R}^n$ for some $j_0 \in \{1,\dots,m\}$, that $\{c_k\}$ is a sequence of positive real numbers bounded away from zero, that is, there exists $\bar c > 0$ such that $c_k \ge \bar c$ for all $k \in \mathbb{N}$, and that $\{\delta_k\}$ is a sequence of positive real numbers such that $\sum_{k \in \mathbb{N}} \delta_k < +\infty$. Let us consider the sequence $\{(v^k, x^k)\} \subset U \times \mathbb{R}^n$ recursively generated as $(v^0, x^0) \in U \times \mathbb{R}^n$ and $(v^{k+1}, x^{k+1}) \in U \times \mathbb{R}^n$ such that

    \| -\nabla_v \varphi_1(v^{k+1}, x^{k+1}) + c_k^{-1} (v^{k+1} - v^k) \| \le \delta_k / c_k,    (18)
    \| \nabla_x \varphi_1(v^{k+1}, x^{k+1}) + c_k^{-1} (x^{k+1} - x^k) \| \le \delta_k / c_k.    (19)

Then, the sequence $\{(v^k, x^k)\}$ converges to a point $(v^*, x^*) \in U \times \mathbb{R}^n$ verifying statement b) in Proposition 3.3 (with $\varepsilon = 1$).

Proof. To prove the proposition we use the following result due to Rockafellar:

Theorem. (See Ref. [8]) Let $\{z^k\}$ be any sequence generated by the proximal point algorithm under the criterion

    \mathrm{dist}\big( 0, S_k(z^{k+1}) \big) \le \delta_k / c_k \quad \text{with } \sum_{k \in \mathbb{N}} \delta_k < +\infty,    (20)

where $S_k(z) = T(z) + c_k^{-1} (z - z^k)$, $T$ is a maximal monotone operator and $\{c_k\}$ is a sequence bounded away from zero. Suppose $\{z^k\}$ is bounded; under the preceding assumptions, this holds if and only if there exists at least one solution to $0 \in T(z)$. Then, $\{z^k\}$ converges weakly to a point $z^\infty$ satisfying $0 \in T(z^\infty)$, and $\lim_{k} \| z^{k+1} - z^k \| = 0$.

For applying the above result, we define the function $L$ on $\mathbb{R}^m \times \mathbb{R}^n$ by

    L(v, x) = \varphi_1(v, x) - \delta_U(v), \qquad (v, x) \in \mathbb{R}^m \times \mathbb{R}^n,

where $\delta_U$ is the indicator function of the set $U$.
It is easy to see that the function $L$ is a saddle function in the sense of [12], Section 33, and from the compactness of the set $U$ we have the equalities

    \sup_{v \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^n} L(v, x) = \sup_{v \in U} \inf_{x \in \mathbb{R}^n} \varphi_1(v, x)
        = \inf_{x \in \mathbb{R}^n} \sup_{v \in U} \varphi_1(v, x)
        = \inf_{x \in \mathbb{R}^n} \sup_{v \in \mathbb{R}^m} L(v, x) = \inf_{x \in \mathbb{R}^n} \varphi(x).    (21)

We now consider the operator $T$ defined on $\mathbb{R}^m \times \mathbb{R}^n$ by

    T(v, x) = \{ (u, y) \in \mathbb{R}^m \times \mathbb{R}^n : u \in \partial_v [-L(\cdot, x)](v),\ y \in \partial_x [L(v, \cdot)](x) \},

where $\partial$ denotes the convex subdifferential (see Rockafellar [12], Section 23). It is not difficult to show that $T$ is a maximal monotone operator and that, moreover, we have the properties

    \mathrm{dom}\, T = \{ (v, x) \in \mathbb{R}^m \times \mathbb{R}^n : T(v, x) \ne \emptyset \} \subset \mathrm{dom}\, L = U \times \mathbb{R}^n,
    (0, 0) \in T(v^*, x^*) \iff (v^*, x^*) \text{ is a saddle point of the problems (21)}.

Finally, we finish the proof by noting that the conditions (18) and (19) are sufficient to obtain the criterion (20).

4. The Method in Practice

In line with the theoretical results of the previous section (Theorem 3.1 principally), we present the method designed for solving problem (1).

4.1 The Algorithm

At each iteration we minimize the regularized function $\varphi_{\varepsilon_k}(v_k,\cdot)$ ($\varepsilon_k > 0$, $v_k \in U$ fixed), where the starting point for this minimization is the point $x_k$ obtained for the last regularized function $\varphi_{\varepsilon_{k-1}}(v_{k-1},\cdot)$; that is, we find $x_{k+1} \in \mathbb{R}^n$ such that

    \varphi_{\varepsilon_k}(v_k, x_{k+1}) \le \varphi_{\varepsilon_k}(v_k, x_k) \quad \text{and} \quad \| \nabla_x \varphi_{\varepsilon_k}(v_k, x_{k+1}) \| \le \delta_k,    (22)

where $\{\delta_k\}$ is a sequence converging to zero. We then update (if necessary) the value of the parameter $v_k \in U$: if $\| v_k - u(v_k, x_{k+1}) \|$ is large, where $u(v_k, x_{k+1})$ is defined in (11), we set $v_{k+1} \leftarrow u(v_k, x_{k+1})$. Consequently, we can describe precisely the algorithm designed for solving the finite min-max problem (1):

Algorithm
0. Let $(v_0, x_0) \in U \times \mathbb{R}^n$ be an initial point, let $\eta > 0$ be a tolerance threshold, let $\{\varepsilon_k\}$ be a sequence of positive real numbers such that $\sum_{k \in \mathbb{N}} \varepsilon_k < +\infty$, and let
$0 < \gamma < 1$ be a parameter. We define the first element of the sequence $\{G_k\}$ by $G_0 = \| \nabla_x \varphi_{\varepsilon_0}(v_0, x_0) \|$, and set $\varepsilon = \varepsilon_0$ and $j = 0$. At the $k$-th iteration we have a point $(v_k, x_k) \in U \times \mathbb{R}^n$ and two real numbers $G_k$ and $\varepsilon$.
1. If $\| \nabla_x \varphi_\varepsilon(v_k, x_k) \| \le \eta$ and $\| u(v_k, x_k) - v_k \| \le \eta$, STOP, because we have found a point $(v_k, x_k)$ satisfying the EXIT test. Otherwise go to 2.
2. If $\| \nabla_x \varphi_\varepsilon(v_k, x_k) \| > \eta$, starting from the point $x_k$, we minimize the function $\varphi_\varepsilon(v_k,\cdot)$ until finding a point $x' \in \mathbb{R}^n$ verifying

    \| \nabla_x \varphi_\varepsilon(v_k, x') \| \le \gamma G_k.    (23)

We update $G_{k+1} \leftarrow \| \nabla_x \varphi_\varepsilon(v_k, x') \|$ and $x_k \leftarrow x'$. Go to 3.
3. If $\| u(v_k, x_k) - v_k \| > \eta$, we update the parameters $v_k \leftarrow u(v_k, x_k)$, $j \leftarrow j + 1$ and $\varepsilon \leftarrow \varepsilon_j$. Go to 4.
4. We update $x_{k+1} \leftarrow x_k$, $v_{k+1} \leftarrow v_k$ and $k \leftarrow k + 1$. Go to 1.

Remark. We have introduced the counting number $j$ because, from the proof of Proposition 3.2, the parameter $\varepsilon$ must only be updated when we update the parameter $v$.

4.2 The Implementation

First, we have chosen the values $\varepsilon_k = [1 + (k-1) 10^{-4}]^{-2}$, $k \in \mathbb{N}$, and $\gamma = 0.99$ for the parameters. Moreover, computational experiments have shown that these values are not of vital importance for the algorithm to work well. Next, for a given point $(v, x) \in U \times \mathbb{R}^n$ and $\varepsilon > 0$, we must solve the minimization problem in step 2 above; that is, for each regularized function $\varphi_\varepsilon(v,\cdot)$, we must find a point satisfying the condition (23). For this, we apply a quasi-Newton method; more precisely, we generate a sequence of points $\{x_i\} \subset \mathbb{R}^n$ with $x_1 = x_k$, using descent directions as follows: $x_{i+1} = x_i + \alpha_i d_i$, where $d_i$ is a descent direction for the function $\varphi_\varepsilon(v,\cdot)$ at the point $x_i$. We take $d_i = -H_i \nabla_x \varphi_\varepsilon(v, x_i)$, where $H_i$ is an approximation (a positive definite matrix) of the inverse of the Hessian of the function $\varphi_\varepsilon(v,\cdot)$ at the point $x_i$. To update the Hessian we use the Broyden-Fletcher-Goldfarb-Shanno (BFGS) updating formula with Powell's strategy to preserve positive definiteness (see Minoux [13], Vol. I, Chapter 4).
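The outer/inner structure of steps 0 to 4 can be sketched in code. The sketch below is ours, not the paper's implementation: the BFGS inner solver is replaced by gradient descent with Armijo backtracking, the $\varepsilon$ and tolerance schedules are illustrative, and the toy instance is $m = 2$ on $\mathbb{R}$ with $f_1(x) = (x-1)^2$, $f_2(x) = (x+1)^2$, whose min-max point is $x^* = 0$ with $\varphi(x^*) = 1$.

```python
# Hedged sketch (ours) of the algorithm's outer/inner structure.  For
# m = 2, u(v, x) = (t, 1 - t) with t = clamp(v1 + (f1 - f2)/(2 eps), 0, 1).

def f(x):
    return ((x - 1.0) ** 2, (x + 1.0) ** 2)

def u1_of(v1, x, eps):
    y1, y2 = f(x)
    return min(max(v1 + (y1 - y2) / (2.0 * eps), 0.0), 1.0)

def phi_eps(v1, x, eps):
    y1, y2 = f(x)
    t = u1_of(v1, x, eps)
    return t * y1 + (1.0 - t) * y2 - eps * (t - v1) ** 2

def grad_x(v1, x, eps):
    t = u1_of(v1, x, eps)                     # Proposition 2.2 i)
    return t * 2.0 * (x - 1.0) + (1.0 - t) * 2.0 * (x + 1.0)

def solve(x=3.0, v1=1.0, outer=40):
    for k in range(outer):
        eps = 1.0 / (1.0 + k)                 # illustrative schedule, not the paper's
        tol = 1.0 / (1.0 + k) ** 2            # inner tolerance delta_k
        for _ in range(500):                  # inner minimization of phi_eps(v, .)
            g = grad_x(v1, x, eps)
            if abs(g) <= tol:
                break
            s = 1.0                           # Armijo backtracking line search
            while phi_eps(v1, x - s * g, eps) > phi_eps(v1, x, eps) - 1e-4 * s * g * g:
                s *= 0.5
            x -= s * g
        v1 = u1_of(v1, x, eps)                # dual update v <- u(v, x)
    return x

x_star = solve()
print(x_star, max(f(x_star)))  # approximately 0 and 1
```

The dual update drives $v$ to $(1/2, 1/2)$, after which the smoothed gradient vanishes only near $x = 0$.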
The step-length $\alpha_i$ is found using Wolfe's method for the one-dimensional search; in this case, the parameters are set to 0.1 and 0.7 (see Hiriart-Urruty & Lemarechal [14], Chapter II). On the other hand, at every point $x \in \mathbb{R}^n$ we must evaluate the objective function $\varphi_\varepsilon(v, x)$, so we need to compute the vector $u(v, x) \in U$ verifying the equality (11), that is, the $u(v, x) \in U$ for which the following maximum is attained:

    \varphi_\varepsilon(v, x) = \max_{u \in U} \{ \langle u, f(x) \rangle - \tfrac{\varepsilon}{2} \| u - v \|^2 \}.
To solve this problem, we use the same technique as the one described in Gigola and Gomez [1]; that is, we apply a Lagrange multiplier technique to the above problem. Thus, we obtain the following formula for the solution:

    u_j(v, x) = \begin{cases} \dfrac{f_j(x) + \varepsilon v_j - \mu(v, x)}{\varepsilon} & \text{if } f_j(x) + \varepsilon v_j - \mu(v, x) > 0, \\[1ex] 0 & \text{otherwise}, \end{cases}    (24)

where the multiplier $\mu(v, x) \in \mathbb{R}$ satisfies $\sum_{j=1}^m \max\big\{ (f_j(x) + \varepsilon v_j - \mu(v, x)) / \varepsilon,\ 0 \big\} = 1$. This last equation can be solved very easily because it is piecewise linear in the variable $\mu$. In practice, we compute $\mu(v, x)$ using the following procedure:

i) generate the vector $\lambda \in \mathbb{R}^m$ by ordering in decreasing order the elements $\lambda_j = f_j(x) + \varepsilon v_j$, $j \in \{1,\dots,m\}$;
ii) find $j_0$ as the first index in the set $\{1,\dots,m-1\}$ such that $\sum_{j \le j_0} \lambda_j \ge j_0 \lambda_{j_0+1} + \varepsilon$; otherwise set $j_0 = m$. Then, we define $\mu(v, x) = \big( \sum_{j \le j_0} \lambda_j - \varepsilon \big) / j_0$.

Therefore, we can suppose that we have obtained $x' \in \mathbb{R}^n$ verifying the condition (23); we then proceed (if necessary) to change the regularized function, that is, we take new values of the parameters $v$ and $\varepsilon$: we set $v = u(v, x')$ and $\varepsilon = \varepsilon_{j+1}$, where $u(v, x') \in U$ is obtained by formula (24).

Finally, in order to justify our updating of the parameter $v$, we consider the vector $v(t) \in \mathbb{R}^m$ defined by

    v(t) = v + t [u(v, x) - v], \qquad t \ge 0.

Since $\nabla_v \varphi_\varepsilon(v, x) = \varepsilon [u(v, x) - v]$, we see that $v(t)$ moves in the direction of maximum increase of the function $\varphi_\varepsilon(\cdot, x)$. Moreover, it is easy to show that if $\nabla_v \varphi_\varepsilon(v, x) \ne 0$, then the largest step-length $t_{\max}$ ($\ge 1$) such that $v(t) \in U$ for all $t \in [0, t_{\max}]$ is given by

    t_{\max} = \min \Big\{ \frac{v_j}{v_j - u_j} : 0 \le u_j < v_j \Big\}.

We next estimate the increase of the function $t \in [0, t_{\max}] \mapsto \varphi_\varepsilon(v(t), x)$; from this estimate, it is straightforward to see that taking $t = 1$ is the best a priori estimate of the solution of the problem $\max_{t \in [0, t_{\max}]} \varphi_\varepsilon(v(t), x)$.
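The procedure i)-ii) can be sketched directly in code. The Python sketch below is ours: maximizing $\langle u, y \rangle - \tfrac{\varepsilon}{2}\|u - v\|^2$ over $U$ amounts to the Euclidean projection of $v + y/\varepsilon$ onto the simplex, and the threshold computed below is the paper's Lagrange multiplier $\mu(v, x)$ (the variable names are our own).

```python
# Sketch (ours) of formula (24) and the sort-and-threshold procedure i)-ii).

def u_of(y, v, eps):
    m = len(y)
    lam = [y[j] + eps * v[j] for j in range(m)]   # lambda_j = f_j(x) + eps v_j
    srt = sorted(lam, reverse=True)               # step i): decreasing order
    j0, mu, csum = m, (sum(lam) - eps) / m, 0.0
    for j in range(1, m):                         # step ii): first j0 with
        csum += srt[j - 1]                        #   sum_{i<=j0} lam_i >= j0 lam_{j0+1} + eps
        if csum >= j * srt[j] + eps:
            j0, mu = j, (csum - eps) / j
            break
    return [max((lam[j] - mu) / eps, 0.0) for j in range(m)]

u = u_of(y=[4.0, 0.0, -1.0], v=[1.0, 0.0, 0.0], eps=1.0)
print(u)  # a point of U: nonnegative coordinates summing to 1
```

For $m = 2$, this reproduces the closed-form clamp $u_1 = \mathrm{clamp}(v_1 + (y_1 - y_2)/(2\varepsilon), 0, 1)$.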
Proposition 4.1 For all $t \in [0, t_{\max}]$, we have the inequality

    \varphi_\varepsilon(v, x) + \tfrac{t(2-t)}{2}\, \varepsilon \| u(v, x) - v \|^2 \le \varphi_\varepsilon(v(t), x) \le \varphi_\varepsilon(v, x) + t\, \varepsilon \| u(v, x) - v \|^2.    (25)

Proof. From the definition (10), for all $t \in [0, t_{\max}]$ we have the equality

    \varphi_\varepsilon(v(t), x) = -\tfrac{\varepsilon t^2}{2} \| u(v, x) - v \|^2 - \varepsilon t \langle v, u(v, x) - v \rangle
        + \sup_{\xi \in U} \big\{ \langle \xi, f(x) + \varepsilon t [u(v, x) - v] \rangle - \tfrac{\varepsilon}{2} \| \xi - v \|^2 \big\};

by setting $\xi = u(v, x)$, we obtain the inequality on the left of (25). On the other hand, from the concavity of the function $\varphi_\varepsilon(\cdot, x)$, we have the inequality

    \varphi_\varepsilon(v(t), x) \le \varphi_\varepsilon(v, x) + \langle \nabla_v \varphi_\varepsilon(v, x), v(t) - v \rangle,

and since $\nabla_v \varphi_\varepsilon(v, x) = \varepsilon [u(v, x) - v]$ we obtain the inequality on the right of (25).

5. Numerical Results

The computational results presented below were all obtained on a CD 9460 computer (Silicon Graphics). The CD 9460 is a multiprocessor computer, in this case with 4 MIPS R4400 MC 64-bit processors; the R4400 is a third-generation RISC processor with a memory of 1 Mb. The CD 9460 has a SIMM memory of 512 Mb and a CPU speed of 150 MHz.

Test Problems. For every test problem, we consider $m$ quadratic functions $f_j$ defined on $\mathbb{R}^n$ as

    f_j(x) = \langle Q_j x, x \rangle - \langle c_j, x \rangle + d_j,    (26)

where $Q_j = (q^j_{ik})$ is a symmetric $n \times n$ matrix, $c_j = (c^j_i)$ is a vector in $\mathbb{R}^n$ and $d_j$ is a real number.

Test Problem 5.1 This is the well-known MAXQUAD problem; that is, the data in (26) are given by the fancy formulae

    q^j_{ik} = \exp(i/k) \cos(ik) \sin(j), \quad i < k; \qquad c^j_i = \exp(i/j) \sin(ij); \qquad d_j = 0,

and the diagonals are $q^j_{ii} = (i/n) |\sin(j)| + \sum_{k \ne i} |q^j_{ik}|$. See Ref. [15] for more details on this test problem.
We solved the above problem for dimensions n = 10 and m = 5, starting from the two different points $x^0 = (0,\dots,0)^T \in \mathbb{R}^{10}$ and $x^1 = (1,\dots,1)^T \in \mathbb{R}^{10}$; for both, the starting dual point was $v^0 = (1, 0,\dots,0)^T \in \mathbb{R}^5$. Table 5.1 below summarizes the computational results with tolerance threshold $\eta = 5 \cdot 10^{-9}$.

Table 5.1 (columns: func, jac, mult, iter, time, norm; rows: starting points $x^0$, $x^1$)

func = number of function evaluations          jac = number of Jacobian evaluations
mult = number of quadratic pbs. solved         iter = number of algorithm iterations
time = CPU time (in seconds)                   norm = starting gradient norm

Test Problem 5.2 In this case, the data in (26), $q^j_{ik}, c^j_i, d_j \in [-10, 10]$, $i < k$, are randomly generated real numbers and the diagonals are $q^j_{ii} = 1 + \sum_{k \ne i} |q^j_{ik}|$. Thus, $Q_j$ is a symmetric diagonally dominant matrix. For every problem we have considered as starting points $x^0 = (1,\dots,1)^T \in \mathbb{R}^n$ and $v^0 = (1, 0,\dots,0)^T \in U$.

Test Problem 5.2.1 We solved 70 problems (10 for each dimension considered), the smallest of dimension 50 (10 functions to maximize) and the largest of dimension 500 (10 functions to maximize). The average CPU time ranged from 5.7 secs. for n = 50 and m = 10 to … secs. for n = 500 and m = 10. The minimum starting gradient norm was norm = 3638. Table 5.2.1 below summarizes the computational results with tolerance threshold $\eta = 5 \cdot 10^{-4}$.

Test Problem 5.2.2 We solved 70 problems (10 for each dimension considered), the smallest of dimension 50 (10 functions to maximize) and the largest of dimension 100 (70 functions to maximize). The average CPU time ranged from 8.1 secs. for n = 50 and m = 10 to … secs. for n = 100 and m = 70. The minimum starting gradient norm was norm = 3554. Table 5.2.2 below summarizes the computational results with tolerance threshold $\eta = 5 \cdot 10^{-6}$.

Test Problem 5.3 In this case, in formula (26), $Q_1$ is a symmetric diagonally dominant matrix generated as in Test Problem 5.2, and for each $j \in \{2,\dots,m\}$ the data $q^j_{ik}, c^j_i, d_j \in [-10, 10]$, $i \le k$, are randomly generated real numbers.
Note that for all $j \in \{2,\dots,m\}$ the symmetric matrix $Q^j$ is not necessarily positive semidefinite. For every problem we considered as starting points $x^0 = (1,\dots,1)^T \in \mathrm{IR}^n$ and $v^0 = (1,0,\dots,0)^T \in U$. We solved 70 problems (10 for each dimension considered), the smallest of dimension 50 (10 functions to maximize) and the largest of dimension 300 (10 functions to maximize). The average CPU time was 9.6 secs. for n = 50 and m = 10.
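The random data of Test Problems 5.2 and 5.3 can be generated along the following lines. This is a sketch under our own naming (`random_minmax_problem` is not the author's code, and the paper's actual random number streams are of course not reproducible); the `indefinite` flag switches from the construction of Test Problem 5.2, where every $Q^j$ is made diagonally dominant, to that of Test Problem 5.3, where only $Q^1$ is:

```python
import numpy as np

def random_minmax_problem(n, m, indefinite=False, rng=None):
    """Random data for (26) in the style of Test Problems 5.2 and 5.3.
    With indefinite=True, only Q^1 is made diagonally dominant (Test
    Problem 5.3); the remaining Q^j are merely symmetric, hence
    possibly indefinite."""
    rng = np.random.default_rng(rng)
    Q = np.zeros((m, n, n))
    for j in range(m):
        A = rng.uniform(-10.0, 10.0, (n, n))
        Q[j] = np.triu(A, 1) + np.triu(A, 1).T  # symmetric, zero diagonal so far
        if (not indefinite) or j == 0:
            # dominant diagonal: q_ii = 1 + sum_{k != i} |q_ik|
            Q[j][np.diag_indices(n)] = 1.0 + np.abs(Q[j]).sum(axis=1)
        else:
            # Test Problem 5.3: diagonal entries random too (i <= k random)
            Q[j][np.diag_indices(n)] = rng.uniform(-10.0, 10.0, n)
    c = rng.uniform(-10.0, 10.0, (m, n))
    d = rng.uniform(-10.0, 10.0, m)
    return Q, c, d
```

For example, `random_minmax_problem(50, 10)` produces an instance of the size used for the smallest problems of Test Problem 5.2.1.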
Table 5.3 below summarizes the computational results with tolerance threshold $5 \times 10^{-6}$.

Tables 5.2.1, 5.2.2 and 5.3 each report the following quantities (columns n, m, func, jac, mult, iter, time, min, max, norm):

n    = number of variables                      m    = number of functions
func = average number of function evaluations   jac  = average number of Jacobian evaluations
mult = average number of quadratic pbs. solved  iter = average number of algorithm iterations
time = average CPU time (in seconds)            min  = minimum CPU time (in seconds)
max  = maximum CPU time (in seconds)            norm = average starting gradient norm

6. Conclusions

We have presented a method for solving the finite min-max problem, based on the idea of regularizing in order to obtain a sequence of differentiable subproblems. This method differs from the others in the literature principally because, firstly, we globally regularize the nondifferentiable finite maximum function (in contrast to [6]) and, secondly, we do not use the $\varepsilon$-feasibility concept (see [2] or [3]). We have shown in a clear and direct fashion that every cluster point of the sequence of approximate solutions of these subproblems is a stationary point of the min-max problem and therefore, in the convex case, a solution of this problem. The implementation is very simple because, to solve (approximately) the differentiable subproblems, we use a quasi-Newton method; thus we do not need to solve quadratic programming problems to obtain the descent directions for these subproblems. Finally, from the numerical results we conclude that the proposed method is probably one of the most robust and efficient methods for finite min-max optimization, even in the nonconvex case. We therefore think that the present method is of practical interest.

References

[1] GIGOLA, C. & GÓMEZ, S., A regularization method for solving the finite convex min-max problem, SIAM J. Numer. Anal., Vol. 27.
[2] CHARALAMBOUS, C. & CONN, A.R., An efficient method to solve the min-max problem directly, SIAM J. Numer. Anal., Vol. 15.
[3] DEMYANOV, V.F. & MALOZEMOV, V.N., On the theory of non-linear min-max problems, Russian Math. Surveys, Vol. 26.
[4] HAN, S.P., Variable metric methods for minimizing a class of non-differentiable functions, Math. Programming, Vol. 20, pp. 1-13.
[5] MURRAY, W. & OVERTON, M.L., A projected Lagrangian algorithm for nonlinear min-max optimization, SIAM J. Sci. Statist. Comput., Vol. 1.
[6] ZANG, I., A smoothing-out technique for min-max optimization, Math. Programming, Vol. 19.
[7] MARTINET, B., Régularisation d'inéquations variationnelles par approximations successives, Rev. Française Inf. Rech. Opér., Vol. R-3.
[8] ROCKAFELLAR, R.T., Monotone operators and the proximal point algorithm, SIAM J. Control and Optimization, Vol. 14.
[9] ATTOUCH, H. & WETS, R.J.B., Approximation and convergence in nonlinear optimization, in Nonlinear Programming, Vol. 4, Edited by O. Mangasarian, R.R. Meyer and S.M. Robinson, Academic Press, New York.
[10] HIGLE, J.L. & SEN, S., Epigraphical nesting: a unifying theory for the convergence of algorithms, J. of Optimization Theory and Applications, Vol. 84.
[11] CLARKE, F.H., Generalized gradients and applications, Trans. Amer. Math. Soc., Vol. 205.
[12] ROCKAFELLAR, R.T., Convex Analysis, Princeton University Press, Princeton, New Jersey.
[13] MINOUX, M., Programmation mathématique: théorie et algorithmes, Dunod, Paris, France.
[14] HIRIART-URRUTY, J.B. & LEMARÉCHAL, C., Convex Analysis and Minimization Algorithms, Grundlehren der mathematischen Wissenschaften 305, Springer-Verlag, New York.
[15] LEMARÉCHAL, C. & MIFFLIN, R., Editors, Nonsmooth Optimization: Proceedings of a IIASA Workshop, Laxenburg, 1977, IIASA Proceedings Series, Vol. 3, Pergamon Press, Oxford, England.
More informationEnhanced Fritz John Optimality Conditions and Sensitivity Analysis
Enhanced Fritz John Optimality Conditions and Sensitivity Analysis Dimitri P. Bertsekas Laboratory for Information and Decision Systems Massachusetts Institute of Technology March 2016 1 / 27 Constrained
More informationFLUX IDENTIFICATION IN CONSERVATION LAWS 2 It is quite natural to formulate this problem more or less like an optimal control problem: for any functio
CONVERGENCE RESULTS FOR THE FLUX IDENTIFICATION IN A SCALAR CONSERVATION LAW FRANCOIS JAMES y AND MAURICIO SEP ULVEDA z Abstract. Here we study an inverse problem for a quasilinear hyperbolic equation.
More information