Solving generalized semi-infinite programs by reduction to simpler problems

G. Still, University of Twente

January 20, 2004

Abstract. The paper intends to give a unifying treatment of different approaches for solving generalized semi-infinite programs by transformation to simpler problems. In particular, dual, penalty, discretization, reduction and KKT methods are applied to obtain equivalent problems or relaxations of a simpler structure. The relaxations are viewed as a perturbation P_ε of the original problem P, depending on a perturbation parameter ε > 0, and are analyzed by using parametric programming techniques. We give convergence results and results on the rate of convergence for the minimal values and the optimal solutions of P_ε as ε tends to 0. We review earlier studies and present new ones.

Keywords: Semi-infinite programming with variable index sets, penalty methods, parametric programming, discretization.

Mathematical Subject Classification 2000: 90C34, 90C31

1 Introduction

We consider semi-infinite problems with variable index sets, also called generalized semi-infinite problems,

GSIP:  min f(x)  subject to  x ∈ F := {x ∈ ℝⁿ | g(x,y) ≤ 0 ∀ y ∈ Y(x)},

where the (variable) index set is given by Y(x) := {y ∈ ℝᵐ | v(x,y) ≤ 0}. Throughout the paper we assume that the functions f: ℝⁿ → ℝ, g: ℝⁿ × ℝᵐ → ℝ, v: ℝⁿ × ℝᵐ → ℝ^q are (at least) continuously differentiable. For the special case that the index set Y = Y(x) does not depend on the variable x, the program is a (common) semi-infinite problem (SIP). Note that in the case of SIP the component functions v_i, i = 1,…,q, do not depend on x.

Since the 1960s, through more than 2000 publications, the theory and practice of SIP have become well developed (see e.g. [6], [10] for a survey). GSIP has only been studied since about 1985; see e.g. [5], [7] and [13]. It is meanwhile known that the structure of GSIP is (possibly) more complicated than that of SIP. The structure depends strongly on the behavior of the
set-valued mapping Y(x). If, for example, Y(x) is not continuous, then the feasible set F need not be closed. To avoid this structural problem we make some assumptions.

A1: The feasible set F of GSIP is non-empty and is contained in a compact set X. (This condition can always be forced by adding constraints such as −M ≤ x_ν ≤ M, ν = 1,…,n, to the feasibility condition g ≤ 0.)

A2: The set-valued mapping Y: X ⇉ ℝᵐ is continuous.

A3: There is a compact set Y₀ ⊂ ℝᵐ such that Y(x) ⊂ Y₀ for all x ∈ X.

Under A3 we can write Y(x) = {y ∈ Y₀ | v_i(x,y) ≤ 0, i ∈ I := {1,…,q}}.

These assumptions in particular imply that the feasible set F is non-empty and compact, and so an optimal solution of GSIP always exists.

Remark 1. We emphasize that some of the assumptions in this paper are related. For example, the compactness condition A3 for Y(x) and the continuity of v imply the upper semicontinuity of the mapping Y(x), and the MFCQ condition in A5 ii leads to the lower semicontinuity of Y(x) (i.e., both conditions together entail the continuity of Y in A2).

In all approaches for solving GSIP numerically, one tries to relax GSIP locally or globally to a simpler problem, e.g., to a SIP or a finite program (FP). This paper intends to give a unifying treatment of such approaches, comprising earlier results and providing new ones.

Remark. GSIP includes the min-max problem, as treated in [12]:

min_x max_{y∈Y(x)} g(x,y).

Obviously, this problem can be formulated as the GSIP (in the variables (x,z))

min_{x,z} z  s.t.  g(x,y) − z ≤ 0 ∀ y ∈ Y(x).

In the present paper, to keep the exposition as clear as possible, we omit the case of additional equality constraints in the description of Y(x) and in the feasibility condition, as well as the occurrence of several semi-infinite constraints g_ν(x,y) ≤ 0 ∀ y ∈ Y_ν(x), ν = 1,…,ℓ (as done in [9]). We emphasize that these generalizations are straightforward and only lead to technical complications which we wish to avoid.
2 Different approaches to relax GSIP

In this section we briefly describe different possibilities for transforming GSIP into simpler problems: the dual, the Karush-Kuhn-Tucker (KKT), the reduction, and the penalty method.
We first give a useful equivalent formulation of GSIP. To do so, let us consider the finite program

Q(x):  max_y g(x,y)  s.t.  v(x,y) ≤ 0,  (1)

depending on the parameter x. Q(x) is often called the lower level problem. Let φ(x) denote the optimal value function of Q(x) and S(x) the corresponding set of optimal solutions. Obviously, via the value function φ(x) we can write F = {x ∈ X | φ(x) ≤ 0} and GSIP becomes

P:  min_{x∈X} f(x)  s.t.  φ(x) ≤ 0.  (2)

Let in the sequel ν denote the minimal value of P and S the set of solutions. Note that, as a maximum function, φ is not smooth in general. We assume throughout that at each x̄ ∈ S the activity condition φ(x̄) = 0 holds. Otherwise, near x̄ the problem would simply reduce to an unconstrained optimization problem.

2.1 Dual and KKT approach

Under the following convexity assumptions the problem Q(x) can be replaced by its Lagrangian dual:

AD1: For any fixed x, let the functions −g(x,y) and v_i(x,y), i ∈ I, be convex in y, i.e., Q(x) is a convex (maximization) problem.

AD2: (Slater condition.) For any x ∈ X there exists y ∈ Y₀ such that v(x,y) < 0.

We introduce the Lagrange function L(x,y,γ) = g(x,y) − γᵀv(x,y) and the Lagrangian primal and dual problems

Q_P(x):  φ(x) := max_y min_{γ≥0} L(x,y,γ),
Q_D(x):  ψ(x) := min_{γ≥0} max_{y∈Y₀} L(x,y,γ).

It is well-known that Q_P(x) coincides with Q(x) and we have, in addition,

Lemma 1. Under AD1 and AD2, for all x ∈ X the problem Q_D(x) is solvable with φ(x) = ψ(x).

Dual approach (under assumptions AD1 and AD2): The idea (due to [8], [9]) is based on the following observation. The condition ψ(x) ≤ 0 (or equivalently φ(x) ≤ 0) can be expressed as: there is some γ ≥ 0 such that L(x,y,γ) ≤ 0 ∀ y ∈ Y₀.
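The reformulation (2) via the lower level value function can be illustrated numerically. The following is a minimal brute-force sketch with illustrative data (not from the paper): g(x,y) = y − x and v(x,y) = (−y, y − x²), so that Y(x) = [0, x²] and φ(x) = x² − x for x ≥ 0.

```python
import numpy as np

# Toy GSIP data (illustrative choices, not from the paper):
#   g(x, y) = y - x,  v_1(x, y) = -y,  v_2(x, y) = y - x**2,
# so Y(x) = [0, x**2] and phi(x) = max_{y in Y(x)} (y - x) = x**2 - x  (x >= 0).
def g(x, y): return y - x
def v(x, y): return np.array([-y, y - x**2])

Y0 = np.linspace(0.0, 4.0, 100001)          # fine grid standing in for Y0

def phi(x):
    """Brute-force value of the lower level problem Q(x) over the grid."""
    feas = Y0[(v(x, Y0) <= 1e-12).all(axis=0)]   # points with v(x,y) <= 0
    return -np.inf if feas.size == 0 else g(x, feas).max()

# x is feasible for P iff phi(x) <= 0; here phi(x) = x**2 - x <= 0 iff x in [0,1].
assert abs(phi(0.5) - (-0.25)) < 1e-3       # phi(0.5) = 0.25 - 0.5 = -0.25
assert phi(1.5) > 0.5                        # infeasible: phi(1.5) = 0.75
```

Of course, in the methods discussed below φ(x) is never evaluated by such enumeration; the point of the paper is precisely to replace (2) by tractable relaxations.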
So our problem P can equivalently be written as the SIP in the variables (x,γ):

min_{x, γ≥0} f(x)  s.t.  L(x,y,γ) ≤ 0 ∀ y ∈ Y₀.

KKT approach (under assumptions AD1 and AD2): If in addition we assume g, v ∈ C¹, then the following approach leads to a finite problem. Under our assumptions AD1 and AD2, the point y is an optimal solution of Q(x) if and only if the KKT conditions are fulfilled with some multiplier γ:

∇_y L(x,y,γ) = 0,  γᵀv(x,y) = 0,  γ ≥ 0,  v(x,y) ≤ 0.

So the generalized semi-infinite problem P can globally be reduced to the finite program with complementarity constraints:

min_{x,y,γ} f(x)  s.t.  g(x,y) ≤ 0,
  ∇_y g(x,y) − Σ_{i∈I} γ_i ∇_y v_i(x,y) = 0,
  γᵀv(x,y) = 0,  γ ≥ 0,  v(x,y) ≤ 0.

In [14] GSIP has been solved numerically by using this idea and applying an interior point method to handle the complementarity condition.

2.2 Local reduction to a finite problem

This approach has been described in [5] and [15]; we may therefore omit technical details here. The approach is based on the parametric program Q(x) (see (1)) and we assume that all functions f, g, v are C²-functions. For a feasible x ∈ F we define the set of active indices

Y_a(x) = {y ∈ Y(x) | g(x,y) = 0}.

Note that, by the activity assumption, any point y from Y_a(x) is a (global) maximizer of Q(x). Given x ∈ ℝⁿ, the Mangasarian-Fromovitz Constraint Qualification (MFCQ) is said to hold for Q(x) at y ∈ Y(x) if there is some ξ ∈ ℝᵐ such that

ξᵀ∇_y v_i(x,y) < 0 ∀ i ∈ I₀(x,y) := {i ∈ I | v_i(x,y) = 0}.

We say that the Linear Independence Constraint Qualification (LICQ) is satisfied if the gradients ∇_y v_i(x,y), i ∈ I₀(x,y), are linearly independent. We now assume at x̄ ∈ F:

A1_red(x̄): For all y ∈ Y_a(x̄) (i.e., maximizers of Q(x̄)) the following strong regularity conditions hold: the linear independence constraint qualification (LICQ), strict complementary slackness (SC), and the strong second order optimality conditions (see [15] for details).
A1_red(x̄) implies that Y_a(x̄) = {y¹,…,y^p} is finite and that for each y^j ∈ Y_a(x̄), with (unique) Lagrange multipliers γ_{j,i} ≥ 0, i ∈ I₀(x̄,y^j), the Karush-Kuhn-Tucker (KKT) condition

∇_y L_{y^j}(x̄, y^j, γ^j) = ∇_y g(x̄, y^j) − Σ_{i∈I₀(x̄,y^j)} γ_{j,i} ∇_y v_i(x̄, y^j) = 0  (3)

is valid, with the Lagrange function L_{y^j} with respect to the maximizer y^j of Q(x̄). Moreover, there are C¹-functions y^j(x), j = 1,…,p, with y^j(x̄) = y^j, defined on a neighborhood U_{x̄} of x̄ such that

φ(x) = max_{1≤j≤p} G_j(x),  where G_j(x) := g(x, y^j(x)).  (4)

So x̄ is a (local) solution of P if and only if x̄ is a local solution of the so-called reduced finite problem

P_red(x̄):  min_{x∈U_{x̄}} f(x)  s.t.  G_j(x) ≤ 0, j = 1,…,p.

One can show the relation ∇G_j(x̄) = ∇_x L_{y^j}(x̄, y^j, γ^j). So locally near x̄ the GSIP problem P is equivalent to P_red(x̄), which is a common finite optimization problem. Let now x̄ be some (local) solution of P (and thus of P_red(x̄)). We moreover assume

A2_red(x̄) (MFCQ for P): There exists ξ ∈ ℝⁿ such that ξᵀ∇G_j(x̄) < 0, j = 1,…,p.

Then with some Lagrange multipliers μ_j ≥ 0 necessarily the KKT condition holds:

∇f(x̄) + Σ_{j=1}^p μ_j ∇G_j(x̄) = 0.  (5)

Moreover, it is well-known that under MFCQ the set of multipliers

K(x̄) = {μ ≥ 0 such that (5) holds}

is compact. Thus, the number max_{μ∈K(x̄)} ‖μ‖₁ is well-defined. To solve problem P via P_red(x̄) we consider, locally near a point x̂, the exact penalty problem (with ρ > 0):

P_ρ(x̂):  min_x f_ρ(x) := f(x) + ρ max_{1≤j≤p(x̂)} G_j⁺(x),

where p(x̂) denotes the number of functions y^j(x) implicitly defined locally near x̂ (see (4)) and G_j⁺(x) := max{0, G_j(x)}. Based on this problem we obtain the SQP method with merit function f_ρ (conceptual form):

Step k: Given x_k, the (locally near x_k defined) functions G_j(x), j = 1,…,p (p = p(x_k)), and a positive definite matrix L_k (ρ fixed, large enough), compute a solution d_k, with corresponding multipliers μ_{k,j} ≥ 0, of

Q_k:  min_d ∇f(x_k)ᵀd + ½ dᵀL_k d  s.t.  G_j(x_k) + ∇G_j(x_k)ᵀd ≤ 0, j = 1,…,p.
Compute a solution t_k of min_{t>0} f_ρ(x_k + t d_k) and put x_{k+1} = x_k + t_k d_k.

We refer to [6] for a discussion of this method for solving common SIP. The following theorem is the basic result for the SQP method for GSIP.

Theorem 1.
a. Let x̄ be a local solution of P (i.e., of P_red(x̄)). Let A1_red(x̄) and A2_red(x̄) hold. Then for any ρ > ρ₀ := max_{μ∈K(x̄)} ‖μ‖₁, the point x̄ is also a local minimizer of P_ρ(x̄).
b. Let the condition A1_red(x_k) be satisfied at x_k (at all solutions of Q(x_k)). Then for each ρ > Σ_j μ_{k,j}, a solution d_k ≠ 0 of Q_k is a descent direction for f_ρ at x_k.

Proof. a. By our assumptions the function η(u) := min_x f(x) s.t. G_j(x) ≤ u_j ∀ j satisfies the Lipschitz condition |η(u) − η(0)| ≤ B‖u‖ (for any small ‖u‖). This is seen by applying Lemma 6b (Appendix) (with u ↔ z, x ↔ y, f ↔ g, G_j ↔ v_i). The value for B is obtained from formula (19). So the result follows from Lemma 8.
b. See e.g. [4, Th. 12.12] for such a standard result. □

Remark. The SQP method leads (under certain regularity conditions) to a superlinearly convergent algorithm.

3 Penalty approaches

In this section we describe two approaches that use penalty methods in order to solve the lower level problem Q(x): the exact penalty method as described in [12], and the L₂ penalty method given in [9]. These approaches transform GSIP into a common SIP. For real-valued h define h⁺(y) = max{0, h(y)}, and for h: ℝˢ → ℝ^q define

‖h⁺(z)‖_∞ = max_{1≤j≤q} h_j⁺(z)  and  ‖h⁺(z)‖₂² = Σ_{j=1}^q (h_j⁺(z))².

Consider the exact, respectively the L₂, penalty method for Q(x):

Q_γ(x):  φ_γ(x) := max_{y∈Y₀} {g(x,y) − γ‖v⁺(x,y)‖_∞},
Q̃_γ(x):  φ̃_γ(x) := max_{y∈Y₀} {g(x,y) − γ‖v⁺(x,y)‖₂²},

and the corresponding relaxations of P (cf. (2)):

P_γ:  min_{x∈X} f(x)  s.t.  φ_γ(x) ≤ 0,
P̃_γ:  min_{x∈X} f(x)  s.t.  φ̃_γ(x) ≤ 0.

Note that both problems are common SIPs (but P_γ is in general non-smooth). It is shown below that for γ large enough the problems P and P_γ are equivalent.

Remark. We comment on the smoothness of the problems. Clearly the max-functions
φ, φ_γ and φ̃_γ are not differentiable (in general). The function g(x,y) − γ‖v⁺(x,y)‖₂² is C¹, whereas the function g(x,y) − γ‖v⁺(x,y)‖_∞ is not. So P̃_γ represents a SIP of C¹ type, whereas the problem P_γ is only a Lipschitz SIP. Note that P_γ can be written as a disjunctive semi-infinite problem (see [12]):

P_γ:  min f(x)  s.t.  min{g(x,y), g(x,y) − γ v_i(x,y), i ∈ I} ≤ 0 ∀ y ∈ Y₀.

A4: For all x ∈ X the condition MFCQ is satisfied (for Q(x)) at all points y from the set S(x) of solutions of Q(x).

The next corollary gives a result of [12], but under weaker assumptions (MFCQ here, instead of LICQ in [12]).

Corollary 1. Let A1-A4 hold. Then there exists some γ̄ such that φ_γ(x) = φ(x) for all x ∈ X, γ ≥ γ̄, i.e., on X the problems P and P_γ are equivalent. In particular, x̄ is a (local) solution of P if and only if x̄ is a (local) solution of P_γ.

Proof. Let us define the parametric problem

φ(x,u) := max_y g(x,y)  s.t.  v(x,y) ≤ u,

with feasible set Y(x,u). We first apply Lemma 6b with z = (x,u) for given x and parameter u, ‖u‖ small. This leads to the existence of ε = ε(x) > 0 and B = B(x) such that

φ(x,u) − φ(x,0) ≤ B(x)‖u‖ for ‖u‖ ≤ ε(x),

with B(x) ≤ b₁, uniformly bounded on X (cf. (17), (18) in the proof of Lemma 6b). By compactness of X there exists ε > 0 such that

φ(x,u) − φ(x,0) ≤ b₁‖u‖ for all x ∈ X, ‖u‖ ≤ ε.

Now define R := max_{x∈X, y∈Y₀} ‖v⁺(x,y)‖. Note that φ(x,u) depends continuously on x, u for all x, u with Y(x,u) ≠ ∅. (By definition φ(x,u) = −∞ if Y(x,u) = ∅.) So the value

m := max_{x∈X, ‖u‖≤R} {φ(x,u) − φ(x,0)}

is well-defined. By construction, for all R ≥ ‖u‖ > ε the bound φ(x,u) − φ(x,0) ≤ m‖u‖/ε holds. So with L := max{b₁, m/ε} the assumptions of Lemma 8 are satisfied and, for all γ ≥ L, it follows that φ_γ(x) = φ(x) for all x ∈ X. □

4 General perturbations

The functions φ_γ, φ̃_γ introduced in Section 3 can be considered as perturbations of the function φ. In the next section we will relax the problems even further by smoothing
and discretization. All these relaxations can be seen as a perturbation P_ε of the original problem P. Therefore, in this section we discuss some general perturbation results used later on. Consider the perturbation of the original problem P (cf. (2)),

P_ε:  min_{x∈X} f(x)  s.t.  φ_ε(x) ≤ 0,  (6)

depending on the parameter ε ≥ 0, such that P₀ = P. Let ν_ε, F_ε and S_ε denote respectively the optimal value, the feasible set and the optimal solution set of P_ε. In particular F = F₀, S = S₀, ν = ν₀. We wish to know under which conditions the following holds, whichever solution x_ε of P_ε we take:

If ε → 0 then ν_ε → ν and d(x_ε, S) → 0.

Here, d(x,S) = min{‖x − s‖ | s ∈ S} denotes the distance between x and S. We firstly impose the (minimal) assumption which guarantees φ_ε → φ, and secondly a constraint qualification.

AP1: There exist t₁ > 0 and a function ψ: [0,t₁] → ℝ₊ such that ψ(ε) → 0 for ε → 0 and for all x ∈ X

|φ_ε(x) − φ(x)| ≤ ψ(ε) for ε ∈ [0,t₁].

AP2: For (at least one) x̄ ∈ S (i.e., φ(x̄) = 0) there exist ξ ∈ ℝⁿ, t₂ > 0 and a function β: [0,t₂] → ℝ₊ such that β is strongly monotonically decreasing (for decreasing t) with β(0) = 0 and

φ(x̄ + tξ) ≤ −β(t) for all t ∈ [0,t₂].

Lemma 2. Let A1-A3 and AP1, AP2 be satisfied. Then for ε → 0 and x_ε being a solution of P_ε (provided they exist) it follows that ν_ε → ν and d(x_ε, S) → 0.

Proof. Assume to the contrary that d(x_ε, S) ↛ 0, i.e., for some subsequence x_ε ∈ X we have d(x_ε, S) ≥ ε̄ > 0. By compactness of X, there exists a convergent subsequence x_ε → x̂. We now show x̂ ∈ S, yielding a contradiction. By AP1 we find

φ(x_ε) ≤ φ_ε(x_ε) + ψ(ε) ≤ ψ(ε) → 0.

Since φ is continuous (cf. Lemma 6 with z = x) we obtain φ(x_ε) → φ(x̂) ≤ 0. So x̂ ∈ F. With the point x̄ ∈ S in AP2 we deduce, for ε sufficiently small,

φ_ε(x̄ + t_ε ξ) ≤ φ(x̄ + t_ε ξ) + ψ(ε) ≤ −β(t_ε) + ψ(ε) ≤ 0  (7)
if t_ε is chosen such that β(t_ε) = ψ(ε) (i.e., t_ε = β⁻¹(ψ(ε))). Note that t_ε → 0 for ε → 0. By (7), x̄ + t_ε ξ ∈ F_ε and

f(x̄) ≤ f(x̂) = f(x_ε) + (f(x̂) − f(x_ε)) ≤ f(x̄ + t_ε ξ) + (f(x̂) − f(x_ε)) → f(x̄) for ε → 0,

where the second inequality uses that x_ε solves P_ε and x̄ + t_ε ξ ∈ F_ε. This proves x̂ ∈ S, the desired contradiction. The condition d(x_ε, S) → 0, together with the continuity of f, directly implies f(x_ε) → ν. □

In many cases the following perturbation concept can be applied, which requires weaker assumptions than AP2 above. Consider for ε ≥ 0 the special perturbation

P^ε:  min f(x)  s.t.  x ∈ F^ε := {x ∈ X | φ(x) ≤ ε},  (8)

with optimal value ν^ε, and define the lower level set f_ε := {x ∈ X | f(x) ≤ ν + ε}. Note that since F ⊂ F^ε the relation ν^ε ≤ ν holds and, by assumption A1, the set F^ε is non-empty.

AP3: There exists ε̄ > 0 such that φ is continuous on F^ε̄. (Since X is compact, a solution of P^ε exists for ε ≤ ε̄.)

Remark. A1-A3 imply the condition AP3 (see the proof of Lemma 6a). The following holds (see also [9, Prop. 5.1]).

Lemma 3. Let AP3 be satisfied and consider ε → 0. Then ν^ε → ν, and if x_ε ∈ F^ε ∩ f_ε it follows that d(x_ε, S) → 0.

Proof. Suppose that for some subsequence x_ε ∈ F^ε ∩ f_ε we have d(x_ε, S) ≥ ε̄ > 0. But (taking a subsequence, if necessary) x_ε → x̄, and the continuity of f and φ yields x̄ ∈ S, which constitutes a contradiction. Let now x_ε be an optimal solution of P^ε. By F ⊂ F^ε, clearly ν^ε = f(x_ε) ≤ ν must hold, implying x_ε ∈ F^ε ∩ f_ε. So d(x_ε, S) → 0 and we can choose x̄_ε ∈ S such that ‖x_ε − x̄_ε‖ → 0. Since f is uniformly continuous on the compact set F^ε̄ we deduce

0 ≤ ν − ν^ε = f(x̄_ε) − f(x_ε) → 0 for ε → 0. □

The results in Lemma 2 and Lemma 3 can be given in a quantitative form. To do so we have to strengthen the assumptions.

APS1: AP1 holds with ψ(ε) = cε for some positive c.

APS2: (Constraint qualification near all x̄ ∈ S.) There exist ε̄, t̄, σ, c₀ > 0 such that for each x̄ ∈ S there exists ξ = ξ(x̄) ∈ ℝⁿ satisfying ‖ξ(x̄)‖ ≤ c₀ and

φ(x + tξ) ≤ φ(x) − σt for ‖x − x̄‖ < ε̄, t ∈ [0, t̄].
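The relaxed problem P^ε of (8) can be observed numerically on the toy data used earlier (illustrative choices, not from the paper): with f(x) = −x and φ(x) = x² − x on X = [0,2], we have F = [0,1], ν = −1, and ν − ν^ε = O(ε), in the spirit of Lemma 3 and the rate statement (9). A minimal sketch:

```python
import numpy as np

# Toy data (not from the paper): f(x) = -x, phi(x) = x**2 - x on X = [0, 2].
# F^eps = {x in X : phi(x) <= eps} enlarges F = [0, 1]; nu = -1 at x* = 1.
X = np.linspace(0.0, 2.0, 200001)
f = -X
phi = X**2 - X

nu = f[phi <= 0].min()                      # optimal value nu = -1
assert abs(nu + 1.0) < 1e-6

for eps in [0.1, 0.01, 0.001]:
    nu_eps = f[phi <= eps].min()            # optimal value nu^eps of P^eps
    assert nu_eps <= nu + 1e-9              # F subset of F^eps gives nu^eps <= nu
    assert nu - nu_eps <= 1.5 * eps         # and nu - nu^eps = O(eps), cf. (9)
```

Here the right endpoint of F^ε is (1 + √(1+4ε))/2 = 1 + ε + o(ε), so the gap ν − ν^ε is indeed linear in ε; the minimizer is of order r = 1 in the sense of AG below.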
Remark. Let A1_red(x̄) and A2_red(x̄) hold and let x̄ be the unique optimal solution of P (S = {x̄}). Then (near x̄) it follows that φ(x) = max_j G_j(x) (see (4)). So in view of A2_red(x̄), and using G_j ∈ C¹, we find

φ(x + tξ) = max_j G_j(x + tξ) = max_j {G_j(x) + t ξᵀ∇G_j(x) + o(t)} ≤ φ(x) + t max_j ξᵀ∇G_j(x) + o(t) ≤ φ(x) − σt

with some σ > 0 if ‖x − x̄‖ and t are small. Consequently APS2 holds.

AG: (Growth condition) With the minimum value ν and r = 1 or r = 2, for some positive κ:

f(x) ≥ ν + κ d(x,S)^r for all x ∈ F.

If S = {x̄} is a singleton then x̄ is called a minimizer of order r.

Lemma 4. Let A1-A3, APS1, APS2 be satisfied and let ε → 0. Then |ν_ε − ν| = O(ε). If moreover AG is satisfied, then with the solutions x_ε of P_ε (provided they exist) it follows that d(x_ε, S) = O(ε^{1/r}).

Proof. By Lemma 2, and taking ε̄ as in APS2, there exist ε₀ > 0 and points x̄_ε ∈ S such that ‖x_ε − x̄_ε‖ ≤ ε̄ for ε ≤ ε₀. Choose x̂_ε := x_ε + τ_ε ξ(x̄_ε) (see APS2). Then, for ε small,

φ(x̂_ε) ≤ φ(x_ε) − στ_ε ≤ φ_ε(x_ε) + cε − στ_ε ≤ 0

for τ_ε := cε/σ. Moreover, for any x̄ ∈ S (by APS1 and APS2):

φ_ε(x̄ + τ_ε ξ) ≤ φ(x̄ + τ_ε ξ) + cε ≤ φ(x̄) − στ_ε + cε ≤ 0.

So x̄ + τ_ε ξ ∈ F_ε and also x̂_ε ∈ F. In view of f ∈ C¹ and ‖ξ‖ ≤ c₀ (τ_ε = O(ε), see APS2) we deduce

f(x̄) ≤ f(x̂_ε) = f(x_ε) + O(ε) ≤ f(x̄ + τ_ε ξ) + O(ε) ≤ f(x̄) + O(ε) = ν + O(ε).

Thus ν = f(x̄) ≤ f(x_ε) + O(ε) = ν_ε + O(ε) and ν_ε = f(x_ε) ≤ f(x̄) + O(ε) = ν + O(ε), showing |ν_ε − ν| = O(ε). Under AG we moreover have

κ d(x̂_ε, S)^r ≤ f(x̂_ε) − f(x̄) ≤ O(ε), i.e., d(x̂_ε, S) ≤ O(ε^{1/r}),

and thus d(x_ε, S) ≤ d(x̂_ε, S) + ‖x̂_ε − x_ε‖ ≤ O(ε^{1/r}). □

With similar (but easier) arguments we also obtain the following result for the perturbation P^ε (see Lemma 3): Let AP3, APS2 and AG hold. Then for ε → 0, with solutions x_ε of P^ε,

0 ≤ ν − ν^ε = O(ε),  d(x_ε, S) = O(ε^{1/r}).  (9)
5 Relaxation by smoothing and discretization

In this section we convert the (non-smooth) semi-infinite problem P_γ by a simple smoothing approach into a smooth SIP, and apply a discretization procedure to relax P_γ and P̃_γ to a finite program (see [12], [9]). By applying the perturbation theory of Section 4, new convergence results are obtained. The idea is to view all relaxations as perturbations of the original GSIP problem P. In what follows, for any d > 0, Y_d ⊂ Y₀ will denote a finite discretization of the index set Y₀ such that in the Hausdorff distance

d(Y_d, Y₀) = max_{y∈Y₀} min_{y_d∈Y_d} ‖y − y_d‖

the bound d(Y_d, Y₀) ≤ d holds. Recall (cf. Corollary 1) that for γ ≥ γ̄ the function φ_γ(x) = max_{y∈Y₀} φ_γ(x,y), with φ_γ(x,y) = g(x,y) − γ‖v⁺(x,y)‖_∞, coincides with φ(x) on X. As done in [12, Section 3], we write φ_γ(x,y) in the equivalent form

φ_γ(x,y) = min{g(x,y), g(x,y) − γ v_i(x,y), i = 1,…,q}.

Then, by setting

h₀(x,y) := g(x,y),  h_i(x,y) := g(x,y) − γ v_i(x,y), i = 1,…,q,

the nonsmooth function φ_γ(x,y) is replaced by the smooth approximation (see Lemma 9)

φ_{γ,p}(x,y) = −(1/p) ln( Σ_{i=0}^q e^{−p h_i(x,y)} ),  1 < p < ∞.  (10)

We define two relaxations of φ_γ(x),

φ_{γ,p}(x) := max_{y∈Y₀} φ_{γ,p}(x,y)  and  φ_{γ,p,d}(x) := max_{y∈Y_d} φ_{γ,p}(x,y),  (11)

where the second is obtained simply by discretization of Y₀. This leads to the following relaxations of the original problem:

P_{γ,p}:  min_{x∈X} f(x)  s.t.  φ_{γ,p}(x) ≤ 0,
P_{γ,p,d}:  min_{x∈X} f(x)  s.t.  φ_{γ,p,d}(x) ≤ 0.

P_{γ,p} is a (smooth) SIP whereas P_{γ,p,d} represents a (smooth) FP. Let ν_{γ,p}, ν_{γ,p,d} and F_{γ,p}, F_{γ,p,d} denote the optimal values and the feasible sets of P_{γ,p}, P_{γ,p,d}. It is not difficult to show that, by our assumptions, the function φ_{γ,p}(x,y) is Lipschitz continuous with respect to y ∈ Y₀, uniformly for x ∈ X: for given 0 < γ̄ < γ̂ there exists L₀ such that

|φ_{γ,p}(x,y₁) − φ_{γ,p}(x,y₂)| ≤ L₀‖y₁ − y₂‖ ∀ y₁, y₂ ∈ Y₀, x ∈ X, p ≥ 1, γ ∈ [γ̄, γ̂].  (12)

The next lemma provides the basic result for the convergence analysis of the relaxations P_{γ,p}, P_{γ,p,d}.
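Before stating it, the smoothing bound behind (10) (i.e., Lemma 9 in the Appendix) is easy to check numerically. A minimal sketch, with illustrative values h_i not taken from the paper:

```python
import numpy as np

# Smooth approximation (10) of the non-smooth minimum: for h = (h_0, ..., h_q),
#   phi_p = -(1/p) * log( sum_i exp(-p*h_i) )
# satisfies  min(h) - ln(q+1)/p <= phi_p <= min(h)   (Lemma 9).
def smooth_min(h, p):
    h = np.asarray(h, dtype=float)
    m = h.min()                        # shift for numerical stability (log-sum-exp trick)
    return m - np.log(np.exp(-p * (h - m)).sum()) / p

h = [0.3, -0.1, 0.8]                   # illustrative values h_i(x,y); here q + 1 = 3
for p in [1.0, 10.0, 100.0]:
    s = smooth_min(h, p)
    assert min(h) - np.log(len(h)) / p <= s <= min(h) + 1e-12
```

The approximation always lies below the true minimum, and the gap closes at rate ln(q+1)/p; this one-sided error is exactly what yields the inclusions F ⊂ F_{γ,p} ⊂ F_{γ,p,d} in the lemma below it is used for.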
Lemma 5. Let L₀ and γ̄ < γ̂ be as in (12), with γ̄ as in Corollary 1, and let A1-A4 hold. Then for all x ∈ X, p ≥ 1, d ≥ 0 and γ ∈ [γ̄, γ̂]:

0 ≤ φ(x) − φ_{γ,p}(x) ≤ ln(q+1)/p,
0 ≤ φ_{γ,p}(x) − φ_{γ,p,d}(x) ≤ L₀ d,
0 ≤ φ(x) − φ_{γ,p,d}(x) ≤ ln(q+1)/p + L₀ d,

and

ν_{γ,p,d} ≤ ν_{γ,p} ≤ ν_γ = ν,  F = F_γ ⊂ F_{γ,p} ⊂ F_{γ,p,d}.

Proof. In view of Corollary 1, for γ ≥ γ̄ we have ν_γ = ν and φ_γ(x) = φ(x) for all x ∈ X. From Lemma 9 we obtain (see (10))

0 ≤ φ_γ(x,y) − φ_{γ,p}(x,y) ≤ ln(q+1)/p ∀ x ∈ X, y ∈ Y₀.

Taking the maximum with respect to y ∈ Y₀ yields (cf. (11))

0 ≤ φ_γ(x) − φ_{γ,p}(x) ≤ ln(q+1)/p ∀ x ∈ X.

The inequality φ_{γ,p}(x) ≤ φ_γ(x) implies F = F_γ ⊂ F_{γ,p} and ν_{γ,p} ≤ ν_γ = ν. Using Lemma 11 with the Lipschitz constant L₀ in (12) we find

0 ≤ φ_{γ,p}(x) − φ_{γ,p,d}(x) ≤ L₀ d ∀ x ∈ X.

Consequently F_{γ,p} ⊂ F_{γ,p,d} and ν_{γ,p,d} ≤ ν_{γ,p}. The inequalities so far combine to the last relation

0 ≤ φ(x) − φ_{γ,p,d}(x) ≤ ln(q+1)/p + L₀ d. □

Applying Lemma 3, the preceding lemma leads to a convergence result for the relaxations P_{γ,p} and P_{γ,p,d}. Note that, by using A1 and the relations for the feasible sets in Lemma 5, we see that the solutions of the relaxations exist.

Theorem 2. Let the assumptions of Lemma 5 hold and suppose (p,d) → (∞,0). Then with solutions x_{γ,p}, x_{γ,p,d} of P_{γ,p} and P_{γ,p,d}, the following holds for all γ ∈ [γ̄, γ̂]:

d(x_{γ,p}, S) → 0,  d(x_{γ,p,d}, S) → 0  and  ν_{γ,p} → ν,  ν_{γ,p,d} → ν.

Proof. We show the statement for x_{γ,p,d} (the result for x_{γ,p} follows by choosing Y_d = Y₀, i.e., d = 0). We set ε := ln(q+1)/p + L₀ d and note that (p,d) → (∞,0) implies ε → 0. By using Lemma 5 we deduce f(x_{γ,p,d}) = ν_{γ,p,d} ≤ ν. Since x_{γ,p,d} is feasible for P_{γ,p,d}, it follows that

φ(x_{γ,p,d}) = φ_γ(x_{γ,p,d}) ≤ φ_{γ,p,d}(x_{γ,p,d}) + ln(q+1)/p + L₀ d ≤ ε.
So x_{γ,p,d} ∈ F^ε ∩ f_ε, i.e., the assumptions of Lemma 3 are fulfilled and the result follows. □

In [12] an algorithm is described to solve P (approximately) by computing solutions x_k = x_{γ_k,p_k,d_k} of P_{γ_k,p_k,d_k} for a sequence of parameters (γ_k, p_k, d_k), with p_k → ∞, d_k → 0, and γ_k iteratively controlled (kept bounded) by an appropriate test function. It is shown that (under certain assumptions) the solutions x_k converge to a critical point of P. The convergence result of the preceding theorem is new.

We now consider the L₂ penalty approach P̃_γ of Section 3 (see also [9]):

P̃_γ:  min_{x∈X} f(x)  s.t.  φ̃_γ(x) ≤ 0,  φ̃_γ(x) := max_{y∈Y₀} {g(x,y) − γ‖v⁺(x,y)‖₂²}.

We need some assumptions.

A5: Let A1-A3 be satisfied. Let the following hold with a constant L_g > 0:
i. |g(x,y₁) − g(x,y₂)| ≤ L_g ‖y₁ − y₂‖ for all y₁, y₂ ∈ Y₀, x ∈ X.
ii. For all x ∈ X the MFCQ holds at all points y ∈ Y(x).

Note that under this assumption, Lemma 7 implies a relation (with some κ > 0)

d(y, Y(x)) ≤ κ‖v⁺(x,y)‖,  y ∈ Y₀, x ∈ X.

Theorem 3. Let A5 be fulfilled. Then
a. For all γ > 0, x ∈ X the inequality 0 ≤ φ̃_γ(x) − φ(x) ≤ (L_g κ)²/γ holds, implying F̃_γ ⊂ F and ν ≤ ν̃_γ.
b. Let in addition AP2 hold and let x_γ be solutions of P̃_γ for γ → ∞. Then d(x_γ, S) → 0 and ν̃_γ → ν.

Proof. a. Apply Lemma 10 to φ̃_γ = φ̃_γ(x).
b. Since by a., 0 ≤ φ̃_γ(x) − φ(x) ≤ (L_g κ)² γ⁻¹ for all x ∈ X, the result is proven by applying Lemma 2 (with ε = 1/γ, the assumption AP1 is satisfied). □

Remark. In contrast to the smoothed L_∞ approach (φ_{γ,p}, see Lemma 5), in this L₂ penalty method P̃_γ the converse inequality φ̃_γ(x) ≥ φ(x) holds. Consequently, a solution x_γ of P̃_γ does not automatically fulfill f(x_γ) ≤ ν and x_γ ∈ f_{1/γ}. So the perturbation approach of Lemma 3 is not directly applicable. Therefore Levitin [9] introduced a further artificial perturbation of φ̃_γ(x) (with an additional parameter) to fit the theory and obtained a convergence
result as in Theorem 3 (see [9, Th. 5.2]). This leads to an artificial numerical approach which depends on certain unknown parameters. In this paper, instead, we obtained the convergence result of Theorem 3 directly via Lemma 2.

We now apply an additional discretization to P̃_γ and consider the finite program

P̃_{γ,d}:  min_{x∈X} f(x)  s.t.  φ̃_{γ,d}(x) ≤ 0,  φ̃_{γ,d}(x) := max_{y∈Y_d} {g(x,y) − γ‖v⁺(x,y)‖₂²},

where Y_d ⊂ Y₀ is again a discretization of mesh size d, with ν̃_{γ,d} the optimal value and F̃_{γ,d} the feasible set. We cannot directly apply Lemma 11, since the function φ̃_γ(x,y) := g(x,y) − γ‖v⁺(x,y)‖₂² does not satisfy a Lipschitz condition uniformly for all γ ≥ 1. However, we obtain a result similar to Theorem 2. Note first that, by our assumption v ∈ C¹, there exists some L₁ such that

| ‖v⁺(x,y₁)‖₂² − ‖v⁺(x,y₂)‖₂² | ≤ L₁‖y₁ − y₂‖ ∀ y₁, y₂ ∈ Y₀, x ∈ X.  (13)

Theorem 4. Let A5 be fulfilled. Then
a. With L₁ in (13), for all γ > 0, x ∈ X, the relations

0 ≤ φ̃_γ(x) − φ̃_{γ,d}(x) ≤ L_g d + γ L₁ d,
−(L_g κ)² γ⁻¹ ≤ φ(x) − φ̃_{γ,d}(x) ≤ L_g d + γ L₁ d

hold, implying F̃_γ ⊂ F̃_{γ,d} and ν̃_{γ,d} ≤ ν̃_γ.
b. Let in addition AP2 hold. Suppose γ → ∞, d → 0 in such a way that L_g d + γ L₁ d → 0 (e.g. d = 1/γ²). Then with solutions x_{γ,d} of P̃_{γ,d}:

ν̃_{γ,d} → ν  and  d(x_{γ,d}, S) → 0.

Proof. a. (Modification of the proof of Lemma 11.) For fixed x ∈ X let y_γ ∈ Y₀ be a maximizer of φ̃_γ. Since d(Y_d, Y₀) ≤ d, there is some y_d ∈ Y_d such that ‖y_d − y_γ‖ ≤ d. Then

0 ≤ φ̃_γ(x) − φ̃_{γ,d}(x) ≤ φ̃_γ(x,y_γ) − φ̃_γ(x,y_d)
≤ |g(x,y_γ) − g(x,y_d)| + γ ( ‖v⁺(x,y_d)‖₂² − ‖v⁺(x,y_γ)‖₂² )
≤ L_g ‖y_γ − y_d‖ + γ L₁ ‖y_γ − y_d‖ ≤ L_g d + γ L₁ d.

In particular, F̃_γ ⊂ F̃_{γ,d} and ν̃_{γ,d} ≤ ν̃_γ. By combining this inequality with the bound in Theorem 3a we obtain the last inequality.
b. By a., |φ(x) − φ̃_{γ,d}(x)| ≤ max{(L_g κ)²/γ, L_g d + γ L₁ d} → 0, i.e., AP1 is satisfied, and the result follows by Lemma 2. □

Remark. As a direct approach we could also consider a double penalization such as

P^r_{γ,p,d}:  min_{x∈X} f(x) + r φ⁺_{γ,p,d}(x).

Note that φ⁺_{γ,p,d}(x) is a Lipschitz function, so that this problem represents a non-smooth unconstrained finite problem which (for small dimensions) could be solved approximately by an appropriate algorithm (e.g. Nelder/Mead). Theoretically, to this approach we can apply Lemma 8 by considering the perturbation

η(δ):  min_{x∈X} f(x)  s.t.  φ_{γ,p,d}(x) ≤ δ.

Under the assumption that η(δ) is Lipschitz continuous, for any r large enough, the problems P_{γ,p,d} and P^r_{γ,p,d} are equivalent.

We emphasize that under the additional assumptions APS2 and AG (see Section 4, e.g., Lemma 4) we obtain a rate of convergence in the results of Theorems 2-4. As an illustration we only state the result for the relaxations in Theorem 2. The proof follows directly from the discussion at the end of Section 4 (see (9)), using the arguments in the proof of Theorem 2.

Corollary 2. Let, in addition to the assumptions of Theorem 2, the conditions APS2 and AG hold. Then with solutions x_{γ,p}, x_{γ,p,d} of P_{γ,p} and P_{γ,p,d}, for all γ ∈ [γ̄, γ̂]:

0 ≤ ν − ν_{γ,p} ≤ O(1/p),  0 ≤ ν − ν_{γ,p,d} ≤ O(1/p) + O(d),

and d(x_{γ,p}, S) ≤ (O(1/p))^{1/r}, d(x_{γ,p,d}, S) ≤ (O(1/p) + O(d))^{1/r}.

Remark. Under the stronger assumptions A1_red(x̄) and A2_red(x̄) we may apply a special type of discretization as described in [16]. This would lead to a faster rate of convergence in the discretization parameter d; namely, in the results above, d would then be replaced by d².

6 Appendix

This section intends to provide a brief survey of results from parametric optimization (cf. [1] and [2] for details) and penalty methods used to prove the results of the paper. Let there be given the finite program

Q:  φ := max_y g(y)  s.t.  y ∈ Y := {y ∈ ℝᵐ | v_i(y) ≤ 0, i ∈ I},  (14)
I := {1,…,q}, with feasible set Y ⊂ Y₀, Y₀ ⊂ ℝᵐ compact, and S the set of optimal solutions. Consider now the parametric version,

Q(z):  φ(z) := max_y g(z,y)  s.t.  y ∈ Y(z) = {y ∈ ℝᵐ | v_i(z,y) ≤ 0, i ∈ I},

for z ∈ Z ⊂ ℝ^N, Z a compact parameter set. Let S(z) denote the set of optimal solutions of Q(z). We assume that all the involved functions are C²-functions and

Aa1: There is a compact set Y₀ ⊂ ℝᵐ such that Y(z) ⊂ Y₀ for all z ∈ Z.

Given z ∈ Z, the MFCQ is said to hold for Q(z) at y ∈ Y(z) if there is some ξ ∈ ℝᵐ such that

ξᵀ∇_y v_i(z,y) < 0 ∀ i ∈ I₀(z,y) := {i ∈ I | v_i(z,y) = 0}.

It is well-known that under MFCQ each (local) maximizer y of Q(z) must satisfy the KKT condition

∇_y L(z,y,λ) := ∇_y g(z,y) − Σ_{i∈I₀(z,y)} λ_i ∇_y v_i(z,y) = 0  (15)

with multipliers λ_i ≥ 0. Let K(z,y) denote the set of all multiplier vectors λ such that (15) holds. We begin with a result on the behavior of the value function φ(z).

Lemma 6. Let Aa1 hold.
a. If the mapping Y(z) is continuous (on Z) (or if for all z ∈ Z the condition MFCQ is satisfied at (at least) one point y ∈ S(z)), then the function φ(z) is continuous on Z (and S(z) is a closed mapping).
b. Let MFCQ be satisfied for all z ∈ Z and all solutions y ∈ S(z). Then φ is Lipschitz continuous, i.e., with some B:

|φ(z) − φ(z′)| ≤ B‖z − z′‖ ∀ z, z′ ∈ Z.

Proof. a. Under Aa1 and the continuity of Y, the continuity of φ(z) can be proved with straightforward arguments. For the proof of the result under the MFCQ assumption we refer e.g. to [2, Prop. 4.4]. Note that in our situation MFCQ is equivalent to the so-called Robinson Constraint Qualification (see [2, Sect. 2.3.4]) and this qualification implies assumption iv in [2, Prop. 4.4].
b. This result can be proven elementarily by making use of the properties of MFCQ vectors. It can also be deduced from the bounds (see e.g.
[11] for a proof)

inf_{y∈S(z)} min_{λ∈K(z,y)} ∇_z L(z,y,λ)ᵀd ≤ φ′₋(z;d) ≤ φ′₊(z;d) ≤ inf_{y∈S(z)} max_{λ∈K(z,y)} ∇_z L(z,y,λ)ᵀd  (16)

for the Hadamard lower and upper derivatives

φ′₋(z;d) := liminf_{d′→d, t↓0} [φ(z + td′) − φ(z)]/t,  φ′₊(z;d) := limsup_{d′→d, t↓0} [φ(z + td′) − φ(z)]/t.
To do so, we firstly note that it is not difficult to show that under our assumptions (MFCQ, Z compact etc.) the multiplier set K₀ := ∪_{z∈Z, y∈S(z)} K(z,y) is bounded. Consequently, also a bound

inf_{y∈S(z)} max_{λ∈K(z,y)} ‖∇_z L(z,y,λ)‖₁ ≤ b₁ ∀ z ∈ Z  (17)

is valid. In view of (16), by using |∇_z L(z,y,λ)ᵀd| ≤ ‖∇_z L(z,y,λ)‖₁ ‖d‖_∞, we find

|φ′₋(z;d)|, |φ′₊(z;d)| ≤ b₁ for all z ∈ Z, d ∈ ℝ^N, ‖d‖ = 1.

This implies that for any z ∈ Z, d ∈ ℝ^N, ‖d‖ = 1, there exists ε > 0 such that

|φ(z + td) − φ(z)| ≤ 2b₁ t ∀ t < ε.  (18)

The compactness of Z yields the uniform Lipschitz condition of the lemma. □

Remark A1. For the special case φ(z) := max_y g(y) s.t. v_i(y) − z_i ≤ 0, i ∈ I, we find the formula

max_{λ∈K(z,y)} ∇_z L(z,y,λ)ᵀd = max_{λ∈K(z,y)} λᵀd ≤ max_{λ∈K(z,y)} ‖λ‖₁ ‖d‖_∞,

and the Lipschitz constant B in Lemma 6 is governed by the number

B := sup_{z∈Z} inf_{y∈S(z)} max_{λ∈K(z,y)} ‖λ‖₁.  (19)

The next lemma involves the so-called metric regularity of the mapping Y(z) (see e.g. [2, Th. 2.87] for a proof).

Lemma 7. Let Aa1 hold and let MFCQ be satisfied for all z ∈ Z and all y ∈ Y(z). Then Y(z) is uniformly metrically regular, i.e., there exists κ > 0 such that

d(y, Y(z)) ≤ κ‖v⁺(z,y)‖ ∀ y ∈ Y₀, z ∈ Z,

where d(y,Y) denotes the distance d(y,Y) = min_{ŷ∈Y} ‖y − ŷ‖.

We briefly survey some results from penalty methods. Consider the problems

Q(u):  φ(u) := max_y g(y)  s.t.  v_i(y) ≤ u_i, i ∈ I,
Q_γ:  φ_γ := max_{y∈Y₀} {g(y) − γ‖v⁺(y)‖_∞}.

Aa2: For all u ∈ ℝ^q such that ‖u‖ ≤ max_{y∈Y₀} ‖v⁺(y)‖, the following Lipschitz condition holds with some L:

φ(u) − φ(0) ≤ L‖u‖.

The next lemma presents a standard result on penalty methods (see [3]).
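The exactness phenomenon captured by this result (and used in Corollary 1) can be observed numerically. A minimal sketch with illustrative data (not from the paper): g(x,y) = y − x, v(x,y) = (−y, y − x²), for which |∂g/∂y| = 1, so any penalty weight γ above 1 already reproduces the true value function.

```python
import numpy as np

# Illustrative data (not from the paper): g(x,y) = y - x, v(x,y) = (-y, y - x**2),
# so Y(x) = [0, x**2] and phi(x) = x**2 - x.
def g(x, y): return y - x
def v(x, y): return np.array([-y, y - x**2])

Y0 = np.linspace(0.0, 4.0, 100001)

def phi(x):                                      # exact lower level value
    feas = Y0[(v(x, Y0) <= 1e-12).all(axis=0)]
    return g(x, feas).max()

def phi_gamma(x, gamma):                         # exact penalty value function
    pen = np.maximum(v(x, Y0), 0.0).max(axis=0)  # ||v^+(x,y)||_inf
    return (g(x, Y0) - gamma * pen).max()

x = 0.7
# phi_gamma decreases in gamma toward phi and coincides with it once gamma
# exceeds the Lipschitz threshold L (here L = 1 suffices, cf. the lemma below).
assert phi_gamma(x, 0.1) >= phi_gamma(x, 2.0) >= phi(x) - 1e-9
assert abs(phi_gamma(x, 2.0) - phi(x)) < 1e-3
```

For γ = 0.1 the penalty is too weak and the maximizer escapes the feasible set Y(x); for γ = 2 the two value functions agree to grid accuracy.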
Lemma 8. Let assumption Aa2 hold. Then for any γ ≥ L it follows that φ_γ = φ, and each solution y of Q_γ is also a solution of Q.

For the smoothing procedure in Section 5 we make use of the following auxiliary result.

Lemma 9. Let η₀ ≤ η₁ ≤ … ≤ η_q, p > 0. Then

0 ≤ η₀ − [ −(1/p) ln( Σ_{j=0}^q e^{−p η_j} ) ] ≤ ln(q+1)/p.

Proof. The inequality is a consequence of the relation

(1/p) ln( Σ_{j=0}^q e^{−p η_j} ) = (1/p) ln( e^{−p η₀} [ 1 + Σ_{j=1}^q e^{−p(η_j − η₀)} ] ) = −η₀ + (1/p) ln( 1 + Σ_{j=1}^q e^{−p(η_j − η₀)} ). □

We also give a basic result for the L₂ penalty approach for solving Q:

Q̃_γ:  φ̃_γ := max_{y∈Y₀} {g(y) − γ‖v⁺(y)‖₂²}.

Aa3: There exist constants κ, L > 0 such that:

|g(y₁) − g(y₂)| ≤ L‖y₁ − y₂‖ ∀ y₁, y₂ ∈ Y₀,
d(y, Y) ≤ κ‖v⁺(y)‖ ∀ y ∈ Y₀.

Aa4: The point ȳ ∈ Y is the unique solution of Q of order r = 1 or r = 2, i.e., with some κ̄:

g(ȳ) − g(y) ≥ κ̄‖y − ȳ‖^r ∀ y ∈ Y.

Lemma 10. The inequality φ̃_γ ≥ φ holds, and if Aa3 is fulfilled then for the solutions y_γ of Q̃_γ we have

0 ≤ φ̃_γ − φ ≤ (Lκ)² γ⁻¹,  d(y_γ, Y) ≤ κ²L γ⁻¹.

If in addition Aa4 is satisfied, then with some c,

‖y_γ − ȳ‖ ≤ c γ^{−1/r},  γ > 0.
Proof. (For completeness we give the proof, which is partially contained in [9].) For any y ∈ Y (Y the feasible set of Q) the relation g(y) = g(y) − γ‖v⁺(y)‖₂² is true, which in view of Y ⊂ Y₀ yields φ̃_γ ≥ φ. Let now y_γ be a solution of Q̃_γ. Then the inequality φ̃_γ = g(y_γ) − γ‖v⁺(y_γ)‖₂² ≥ φ implies

γ‖v⁺(y_γ)‖₂² ≤ g(y_γ) − φ.  (20)

By using max_i (v_i⁺(y_γ))² ≤ ‖v⁺(y_γ)‖₂² and the second condition in Aa3 we find

(1/κ²) d²(y_γ, Y) ≤ ‖v⁺(y_γ)‖² ≤ ‖v⁺(y_γ)‖₂² ≤ (g(y_γ) − φ)/γ.  (21)

Let ỹ_γ ∈ Y be such that ‖ỹ_γ − y_γ‖ = d(y_γ, Y). By Aa3 we then obtain

g(y_γ) ≤ g(ỹ_γ) + L‖ỹ_γ − y_γ‖ = g(ỹ_γ) + L d(y_γ, Y) ≤ φ + L d(y_γ, Y),

and using (21) also

(γ/κ²) d²(y_γ, Y) ≤ L d(y_γ, Y),  or  d(y_γ, Y) ≤ κ²L γ⁻¹.

Moreover, it follows that

φ̃_γ = g(y_γ) − γ‖v⁺(y_γ)‖₂² ≤ g(y_γ) ≤ φ + L d(y_γ, Y) ≤ φ + κ²L² γ⁻¹,

which proves the first inequalities of the lemma. Under Aa4 we deduce, with ỹ_γ as defined above,

κ̄‖ỹ_γ − ȳ‖^r ≤ g(ȳ) − g(ỹ_γ) ≤ g(y_γ) − g(ỹ_γ) + (φ − g(y_γ)) + (g(ȳ) − φ) ≤ L d(y_γ, Y) ≤ κ²L² γ⁻¹,

(using g(ȳ) = φ and g(y_γ) ≥ φ) and thus

‖y_γ − ȳ‖ ≤ ‖y_γ − ỹ_γ‖ + ‖ỹ_γ − ȳ‖ ≤ κ²L γ⁻¹ + (κ²L²/κ̄)^{1/r} γ^{−1/r} ≤ c γ^{−1/r}. □

Finally, we describe the effect of a discretization step and compare, for a grid Y_d ⊂ Y₀ of mesh size d(Y_d, Y₀) = d, the problems

φ := max_{y∈Y₀} k(y),  φ_d := max_{y∈Y_d} k(y),

where k: Y₀ → ℝ is a given function.

Lemma 11. Let k satisfy a Lipschitz condition on Y₀ with Lipschitz constant B₁. Then

0 ≤ φ − φ_d ≤ B₁ d.

Proof. In view of Y_d ⊂ Y₀ the relation φ_d ≤ φ is immediate. To any solution ȳ of φ we can choose a point y_d ∈ Y_d with ‖ȳ − y_d‖ ≤ d. Then, using φ_d ≥ k(y_d), we deduce

φ − φ_d ≤ k(ȳ) − k(y_d) ≤ B₁ d. □

Acknowledgments. The author is indebted to the referees for their valuable comments.
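The discretization error bound of Lemma 11 is easy to observe numerically. A minimal sketch with an illustrative Lipschitz function k(y) = sin(y) on Y₀ = [0, 2π] (Lipschitz constant B₁ = 1); a very fine grid stands in for the exact maximum over Y₀, so the bound is checked up to a small tolerance:

```python
import numpy as np

# Check 0 <= phi - phi_d <= B1 * d for k(y) = sin(y), B1 = 1, Y0 = [0, 2*pi].
k = np.sin
fine = np.linspace(0.0, 2 * np.pi, 1_000_001)   # stand-in for the exact maximum
phi = k(fine).max()                              # ~ 1 = max sin on [0, 2*pi]

for n in [10, 100, 1000]:
    Yd = np.linspace(0.0, 2 * np.pi, n + 1)      # grid with spacing 2*pi/n,
    d = np.pi / n                                # so mesh size d(Y_d, Y0) = pi/n
    phi_d = k(Yd).max()
    assert -1e-9 <= phi - phi_d <= 1.0 * d + 1e-9
```

In fact, near a smooth interior maximizer the error behaves like d², which is the mechanism behind the improved rate of [16] mentioned at the end of Section 5.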
References

[1] Bank B., Guddat J., Klatte D., Kummer B., Tammer K., Non-Linear Parametric Optimization, Birkhäuser Verlag, Basel, (1983).
[2] Bonnans J.F., Shapiro A., Perturbation Analysis of Optimization Problems, Springer, New York, (2000).
[3] Burke J.V., Calmness and exact penalization, SIAM J. Control and Optimization, Vol. 29, No. 2, pp. 493-497, (1991).
[4] Faigle U., Kern W., Still G., Algorithmic Principles of Mathematical Programming, Kluwer, Dordrecht, (2002).
[5] Hettich R., Still G., Second order optimality conditions for generalized semi-infinite programming problems, Optimization, Vol. 34, 195-211, (1995).
[6] Hettich R., Kortanek K., Semi-infinite programming: Theory, methods and applications, SIAM Review, Vol. 35, No. 3, 380-429, (1993).
[7] Jongen H.Th., Rückmann J.-J., Stein O., Generalized semi-infinite optimization: A first order optimality condition and examples, Mathematical Programming 83, 145-158, (1998).
[8] Levitin E., Tichatschke R., A branch and bound approach for solving a class of generalized semi-infinite programming problems, J. of Global Optimization 13, 299-315, (1998).
[9] Levitin E., Reduction of generalized semi-infinite programming problems to semi-infinite or piece-wise smooth programming problems, preprint, to appear.
[10] Reemtsen R., Rückmann J.-J. (eds.), Semi-Infinite Programming, Kluwer, Boston, (1998).
[11] Rockafellar R.T., Directional differentiability of the optimal value function in a nonlinear programming problem, Math. Programming Study 21, 213-226, (1984).
[12] Royset J.O., Polak E., Der Kiureghian A., Adaptive approximations and exact penalization for the solution of generalized semi-infinite min-max problems, preprint, to appear.
[13] Rückmann J.-J., Shapiro A., First order optimality conditions in generalized semi-infinite programming, JOTA, Vol. 101, No. 2, (1999).
[14] Stein O., Still G., Solving semi-infinite optimization problems with interior point techniques, SIAM J. Control Optim. 42, No. 3, 769-788, (2003).
[15] Still G., Generalized semi-infinite programming: Numerical aspects, Optimization 49, No. 3, 223-242, (2001).
[16] Still G., Discretization in semi-infinite programming: the rate of convergence, Math. Program. 91, No. 1, Ser. A, 53-69, (2001).