On Second-order Properties of the Moreau-Yosida Regularization for Constrained Nonsmooth Convex Programs

Fanwen Meng 1,2,* and Gongyun Zhao 1

1 Department of Mathematics, National University of Singapore, 2 Science Drive 2, Singapore 117543, Republic of Singapore
2 School of Mathematics, University of Southampton, Highfield, Southampton SO17 1BJ, Great Britain
* Corresponding author.

Abstract. In this paper, we investigate a class of constrained nonsmooth convex optimization problems, namely, piecewise C² convex objectives with smooth convex inequality constraints. By using the Moreau-Yosida regularization, we convert these problems into unconstrained smooth convex programs. We then investigate the second-order properties of the Moreau-Yosida regularization η. By introducing the qualification (GAIPCQ), we show that the gradient of the regularized function η is piecewise smooth, and thereby semismooth.

Keywords. Moreau-Yosida Regularization, Piecewise C^k Functions, Semismooth Functions.

AMS Subject Classification (1991): 90C25, 65K10, 52A41.

1. Introduction

In this paper, we consider the following constrained convex program:

    min f_0(x)  s.t.  f_i(x) ≤ 0,  i ∈ I = {1, 2, ..., m},    (1)

where f_0 and f_i, i ∈ I, are finite and convex on R^n, and f_0 is possibly nonsmooth. For nonsmooth programs, many approaches have been presented so far, and they are often restricted to the convex unconstrained case. In general, the various approaches
are based on combinations of the following three methods: (i) subgradient methods (e.g., Ermoliev, Kryazhimskii, and Ruszczyński (1997), Poljak (1967)); (ii) bundle techniques (e.g., Kiwiel (1990), Lemaréchal, Nemirovskii, and Nesterov (1995), Lemaréchal and Strodiot (1981)); (iii) Moreau-Yosida regularization (e.g., Fukushima and Qi (1996), Lemaréchal and Sagastizábal (1997a, 1997b), Qi and Chen (1997)). In this paper, we investigate some properties of the Moreau-Yosida regularization.

For any proper, closed convex function f : R^n → R ∪ {+∞}, the Moreau (1965)-Yosida (1964) regularization of f, η : R^n → R, is defined by

    η(x) = min_{y ∈ R^n} { f(y) + ½ ‖y − x‖²_M },    (2)

where M is a symmetric positive definite n × n matrix and ‖z‖²_M = zᵀMz for any z ∈ R^n. Let p(x) denote the unique solution of (2). The motivation for studying the Moreau-Yosida regularization is the fact that

    min { f(x) | x ∈ R^n }    (3)

is equivalent to

    min { η(x) | x ∈ R^n }.    (4)

More precisely, the following statements are equivalent: (i) x minimizes f; (ii) p(x) = x; (iii) ∇η(x) = 0; (iv) x minimizes η; (v) f(p(x)) = f(x); (vi) η(x) = f(x).

It is easy to see that η is finite everywhere and convex. A nice property of the Moreau-Yosida regularization is that η is continuously differentiable and its gradient

    g(x) ≡ ∇η(x) = M(x − p(x))    (5)

is Lipschitz continuous. For more details about the Moreau-Yosida regularization, the reader is referred to Hiriart-Urruty and Lemaréchal (1993) and Rockafellar (1970).

In many situations, besides the first-order properties mentioned above, we are also interested in the second-order properties of the Moreau-Yosida regularization, e.g., when we wish to use Newton's method to minimize η(x). In most circumstances, due to the nondifferentiability of f, the second-order derivative of η does not exist. Thus we need to investigate a weaker second-order property, namely, the semismoothness of g (= ∇η). Under the condition of semismoothness, Qi and Sun (1993) derived the superlinear (quadratic)
convergence of a nonsmooth version of Newton's method for solving nonsmooth equations. Also, by virtue of the semismoothness of g, Fukushima and Qi (1996) showed that superlinear convergence can be guaranteed by using approximate solutions of problem (2) to construct search directions for minimizing η. In addition, the concept of semismoothness plays a central role in the development of optimality conditions in nonsmooth analysis as well, e.g., Chaney (1988, 1989, 1990). Recently, several authors have investigated the semismoothness of g for certain classes of f; for example, see Lemaréchal and Sagastizábal (1994, 1997), Meng and Hao (2001), Mifflin, Qi and Sun (1999), Qi (1995), Rockafellar (1985), Sun and Han (1997), Zhao and Meng (2000). Qi (1995) conjectured that g is semismooth under a regularity condition if f is the maximum of several twice continuously differentiable convex functions. Later, Sun and Han (1997) gave a proof of this conjecture under a constant rank constraint qualification (CRCQ) for the case where M = (1/λ)I and λ is a positive constant. In Meng and Hao (2001), the authors investigated the unconstrained case of problem (1), where f_0 is assumed to be piecewise C². By introducing the qualification (SCRCQ), which is weaker than (CRCQ), they derived the same result for g as Sun and Han (1997). In addition, Mifflin, Qi and Sun (1999) investigated the same problem as in Meng and Hao (2001). By introducing the qualification (AIPCQ), which is weaker than (CRCQ) and (SCRCQ), they derived the same result for g as Sun and Han (1997). Recently, Zhao and Meng (2000) studied the Lagrangian-dual function of certain problems. Specifically, two cases are under consideration: (i) linear objectives with either affine inequality or strictly convex inequality constraints; (ii) strictly convex objectives with general convex inequality and affine equality constraints.
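The basic objects η, p and g in (2)-(5) can be made concrete on a one-dimensional example. The following sketch is not from the paper; it assumes f(x) = |x| and M = (1/λ)I, in which case the proximal point p(x) has the well-known soft-thresholding closed form, and it illustrates that g is Lipschitz continuous even though f is nonsmooth.

```python
# Numerical sketch (illustration only): Moreau-Yosida regularization of
# f(x) = |x| on R with M = (1/lam) * I, so that
#   eta(x) = min_y { |y| + (1/(2*lam)) * (y - x)**2 }.
# The minimizer p(x) is the soft-thresholding operator, and
# g(x) = eta'(x) = (x - p(x)) / lam is Lipschitz although f is nonsmooth.

def prox_abs(x, lam):
    """Proximal point p(x) of f(y) = |y| with M = (1/lam) I (soft threshold)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def eta(x, lam):
    """Moreau-Yosida envelope eta(x) = f(p(x)) + (1/(2*lam)) * (p(x) - x)**2."""
    p = prox_abs(x, lam)
    return abs(p) + (p - x) ** 2 / (2.0 * lam)

def g(x, lam):
    """Gradient g(x) = (x - p(x)) / lam; equals sign(x) outside [-lam, lam]."""
    return (x - prox_abs(x, lam)) / lam

# p(x) = x iff x minimizes f, and then eta(x) = f(x) and g(x) = 0.
```

In particular, x = 0 minimizes f, and indeed p(0) = 0, η(0) = f(0) and g(0) = 0, matching the list of equivalent statements (i)-(vi).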
After deriving the piecewise C²-smoothness of the Lagrangian-dual function, they showed that the gradient of the Moreau-Yosida regularization of this function is piecewise smooth, and thereby semismooth, under the (AIPCQ) qualification.

Notice that all papers mentioned above mainly study the Moreau-Yosida regularization of a continuous (possibly nondifferentiable) function on R^n, and the results derived are applicable to unconstrained optimization problems. In this paper, we consider the constrained problem (1) instead. We should point out that the smooth problem

    min η_0(x)  s.t.  f_i(x) ≤ 0,  i ∈ I,    (6)

where η_0 is the Moreau-Yosida regularization of f_0, is not equivalent to (1), because η_0 coincides with f_0 only on the set of their unconstrained minima, which need not contain the solutions of the constrained minimization problem. Therefore, although the results derived in Mifflin, Qi, and Sun (1999) can be easily extended to (6), they are not applicable to (1). Due to this observation, we consider the following generalized function:

    f(x) = f_0(x) + δ(x | D),    (7)

where D := {x ∈ R^n : f_i(x) ≤ 0, i ∈ I} and δ(x | D) is the usual indicator function of the
convex set D (see Rockafellar (1970)). Then, problem (1) can be written as the following unconstrained problem:

    min { f(x) | x ∈ R^n }.    (8)

Problem (8) is then equivalent to

    min { η(x) | x ∈ R^n },    (9)

where

    η(x) = min_{y ∈ R^n} { f(y) + ½ ‖y − x‖²_M }.    (10)

In this paper, we assume that for problem (1), f_0 is piecewise C² and the f_i are twice continuously differentiable. We aim to investigate the second-order properties of the regularized function η in (10). The analysis presented in this paper is related to the results in Mifflin, Qi and Sun (1999); however, there are several fundamental distinctions. Firstly, f in (7) is a generalized function (indeed, f is not even finite or continuous everywhere on R^n), whereas in Mifflin, Qi and Sun (1999) it was required to be finite, convex and continuous on the whole space. Actually, the latter condition is a fundamental assumption in many publications regarding the Moreau-Yosida regularization, e.g., see Lemaréchal and Sagastizábal (1997), Mifflin, Qi and Sun (1999), Qi (1995), Zhao and Meng (2000). Secondly, although the feasible region D is defined by smooth inequalities, the geometry of D is clearly nonsmooth, and it is on this set that the indicator function is defined. In this paper, we attempt to cope with these two kinds of nonsmoothness simultaneously. To this end, we introduce a new qualification, called (GAIPCQ). Under this condition, we show that the gradient of the Moreau-Yosida regularization of f is piecewise smooth, and thereby that the gradient of η is semismooth.

The rest of this paper is organized as follows. In Section 2, some basic notions and properties are collected. Section 3 studies the piecewise smoothness and semismoothness of the gradient g. Section 4 concludes.

2. Definitions and Basic Properties

In this section, we recall some concepts, such as semismoothness and piecewise C^k-ness, which are basic elements of this paper.
In addition, we introduce a new constraint qualification, (GAIPCQ), which will be used in the next section.

It is well known that η is a continuously differentiable convex function defined on R^n, not only for the specific function f defined in (7), but for general proper, closed, nondifferentiable convex functions as well. In order to use Newton's method or modified Newton methods, it is important to study the Hessian of η, i.e., the Jacobian of g. It is also known that g is globally Lipschitz continuous with modulus ‖M‖. However, g may not be differentiable. To extend the definition of the Jacobian to certain classes
of nonsmooth functions, Qi and Sun (1993) introduced the following definition of semismoothness for vector-valued functions:

Definition 2.1. Suppose F : R^n → R^m is a locally Lipschitzian vector-valued function. We say that F is semismooth at x if

    lim_{V ∈ ∂F(x + th′), h′ → h, t ↓ 0} { V h′ }    (11)

exists for any h ∈ R^n, where ∂F(y) denotes the generalized Jacobian of F at y defined by Clarke (1983).

The semismoothness of the gradient g is a key condition in the analysis of the superlinear convergence of an approximate Newton method proposed by Fukushima and Qi (1996). In fact, semismoothness is a local property of nonsmooth functions, and a direct verification of semismoothness is difficult. Fortunately, according to Qi and Sun (1993), it is known that piecewise smooth functions are semismooth. Hence, one needs to study the piecewise smoothness of g for certain classes of convex problems. We now define piecewise smoothness for vector-valued functions; see Pang and Ralph (1996).

Definition 2.2. A continuous function ψ : R^n → R^m is said to be piecewise C^k on a set W ⊆ R^n if there exist a finite number of functions ψ_1, ψ_2, ..., ψ_q such that each ψ_j ∈ C^k(U) for some open set U containing W, and ψ(x) ∈ {ψ_1(x), ..., ψ_q(x)} for any x ∈ W. We refer to {ψ_j}_{j∈I_q} as a representation of ψ on W, where I_q := {1, 2, ..., q}.

Remark 2.1. If m = 1, then Definition 2.2 reduces to the notion of piecewise C^k-ness for real-valued functions. In addition, the domain W in the above definition is a general set in R^n. In particular, when W is open, all pieces ψ_j above can simply be required to be C^k on W = U. Actually, in what follows, we mainly pay attention to the case where ψ is piecewise C^k on some open set in R^n.

In the following, f_0 is always assumed to be convex. Let f_0 be piecewise C¹ on W ⊆ R^n with a representation {h_j}_{j∈Ĵ}. For any j ∈ Ĵ, define D_j := {x ∈ W : h_j(x) = f_0(x)}. We define the active index set at x by

    J(x) = { j ∈ Ĵ : x ∈ D_j },  x ∈ W.    (12)

And, let

    K(x) = { j ∈ Ĵ : x ∈ cl(int D_j) },  x ∈ W.    (13)

We then obtain the following result regarding the subdifferential of f_0.
Proposition 2.1. (Pang and Ralph (1996)) Let f_0 be a convex real-valued function on R^n. Suppose that f_0 is piecewise C¹ on an open set W ⊆ R^n with a representation {h_j}_{j∈Ĵ}. Then, for every x ∈ W,

    ∂f_0(x) = conv { ∇h_j(x) : j ∈ K(x) },    (14)

where conv S denotes the convex hull of the set S.

For the piecewise C¹ function f_0 above, it is easy to see that K(x) ⊆ J(x) for any x ∈ W. Hence,

    ∂f_0(x) = conv { ∇h_j(x) : j ∈ K(x) } ⊆ conv { ∇h_j(x) : j ∈ J(x) }.    (15)

Moreover, if f_0 above is piecewise C² on W, then by virtue of Lemma 2 in Mifflin, Qi and Sun (1999), we have

    ∇²h_j(x) ⪰ 0,  j ∈ K(x),  x ∈ W,    (16)

where A ⪰ 0 means that the matrix A is positive semidefinite.

Now, let us define the set of indexes of active constraints at any point x ∈ D by

    I(x) := { i ∈ I | f_i(x) = 0 }.    (17)

In order to study the piecewise smoothness of g in (5), we introduce the following constraint qualification.

Definition 2.3. The Generalized Affine Independence Preserving Constraint Qualification (GAIPCQ) is said to hold for f_0 and f_i, i ∈ I, at x if, for every subset J ∪ K ⊆ J(x) ∪ I(x) for which there exists a sequence {x^k} with {x^k} → x, J ∪ K ⊆ J(x^k) ∪ I(x^k), and the vectors

    { (∇h_j(x^k)ᵀ, 1)ᵀ, (∇f_i(x^k)ᵀ, 0)ᵀ : j ∈ J, i ∈ K }    (18)

linearly independent, it follows that the vectors

    { (∇h_j(x)ᵀ, 1)ᵀ, (∇f_i(x)ᵀ, 0)ᵀ : j ∈ J, i ∈ K }    (19)

are linearly independent.

Remark 2.2. For problem (8) when f_0 is piecewise C², in order to investigate the smoothness of g, we obviously need information about the constraints f_i together with the pieces h_j of f_0. Thus, (GAIPCQ) above involves both the pieces h_j (i.e., f_0) and the f_i. On the other hand, if problem (1) is unconstrained and f_0 is piecewise C² on R^n, then
f = f_0, which implies that f is piecewise C² on the whole of R^n as well. Evidently, in this case problem (8) reduces to problem (1). This is exactly the situation studied in Mifflin, Qi and Sun (1999). In this case, the terms concerning I and I(x) in Definition 2.3 vanish automatically, and (GAIPCQ) reduces to the qualification (AIPCQ) introduced in Mifflin, Qi and Sun (1999).

3. Smoothness of g for Piecewise C² Objectives

In this section, we investigate the piecewise smoothness of the gradient of the Moreau-Yosida regularized function η under the (GAIPCQ) qualification. Then, according to Qi and Sun (1993), we can derive the semismoothness of g (= ∇η). In the rest of this paper, we make the following assumptions:

Assumption 3.1. ∅ ≠ D ⊆ R^n.

Assumption 3.2. f_0 is convex and piecewise C² on R^n with a representation {h_j}_{j∈Ĵ}.

Assumption 3.3. f_i is convex and twice continuously differentiable on R^n for all i ∈ I.

For the generalized function f in (7), we can easily see that its effective domain is exactly the feasible region D. Then, for any x ∈ R^n, due to the closedness of D, the proximal solution p(x) lies in D, and hence η(x) is finite. Therefore, the effective domain of η is the whole n-dimensional space. Now, for x ∈ R^n, since p(x) minimizes f(y) + ½‖y − x‖²_M on R^n, or equivalently, p(x) minimizes f_0(y) + ½‖y − x‖²_M over D, there exists a multiplier (λ(x), μ(x)) ∈ R^{|J(p(x))|} × R^{|I(p(x))|} such that

    g(x) = M(x − p(x)) = Σ_{j∈J(p(x))} λ_j(x) ∇h_j(p(x)) + Σ_{i∈I(p(x))} μ_i(x) ∇f_i(p(x)),
    λ_j(x) ≥ 0, j ∈ J(p(x)),  Σ_{j∈J(p(x))} λ_j(x) = 1,    (20)
    μ_i(x) ≥ 0, i ∈ I(p(x)).

For a nonnegative vector a ∈ R^m, we let supp(a), called the support of a, be the subset of {1, ..., m} consisting of the indexes i for which a_i > 0. We define the following sets:

    M(x) = { (λ(x), μ(x)) ∈ R^{|J(p(x))|} × R^{|I(p(x))|} : (λ(x), μ(x)) satisfies (20) },    (21)
    B(x) = { J ∪ K ⊆ J(p(x)) ∪ I(p(x)) : there exists (λ(x), μ(x)) ∈ M(x) such that
            supp(λ(x)) ⊆ J ⊆ J(p(x)), supp(μ(x)) ⊆ K ⊆ I(p(x)), K ≠ ∅, and the vectors
            { (∇h_j(p(x))ᵀ, 1)ᵀ, (∇f_i(p(x))ᵀ, 0)ᵀ : j ∈ J, i ∈ K } are linearly independent },    (22)

    B̂(x) = { J ∪ K ∈ B(x) : J ⊆ K(p(x)) }.    (23)

If I(p(x)) = ∅, i.e., |I(p(x))| = 0, then all terms with respect to I(p(x)), μ(x) and K in (20)-(23) vanish automatically. If I(p(x)) ≠ ∅, namely, the optimal solution p(x) lies on the boundary of D, then concerning the corresponding Lagrange multipliers μ(x) in (20), we make the following nondegeneracy assumption:

Assumption 3.4. For any (λ(x), μ(x)) ∈ M(x), μ(x) ≠ 0.

Lemma 3.1. Let B(x) be defined in (22). Then B(x) ≠ ∅ for any x ∈ R^n.

Proof. It is easy to verify that M(x) in (21) is a convex set. The result is obviously true if |I(p(x))| = 0. Next, we consider the case where I(p(x)) ≠ ∅. By the convexity of M(x) and Assumption 3.4, we can easily see that there exists a vertex of M(x), say α(x) = (α_λ(x), α_μ(x)), such that α_μ(x) ≠ 0. Let J = supp(α_λ(x)) and K = supp(α_μ(x)); then J ⊆ J(p(x)) and K ⊆ I(p(x)). Without loss of generality, we assume J = {1, 2, ..., s} and K = {1, ..., t}. In what follows we need to show that the vectors

    { (∇h_1(p(x))ᵀ, 1)ᵀ, ..., (∇h_s(p(x))ᵀ, 1)ᵀ, (∇f_1(p(x))ᵀ, 0)ᵀ, ..., (∇f_t(p(x))ᵀ, 0)ᵀ }    (24)

are linearly independent. If this were not true, then there would exist scalars k_1, k_2, ..., k_s and l_1, l_2, ..., l_t, not all zero, satisfying

    Σ_{j=1}^{s} k_j (∇h_j(p(x))ᵀ, 1)ᵀ + Σ_{i=1}^{t} l_i (∇f_i(p(x))ᵀ, 0)ᵀ = 0,    (25)

namely,

    Σ_{j=1}^{s} k_j ∇h_j(p(x)) + Σ_{i=1}^{t} l_i ∇f_i(p(x)) = 0,  k_1 + k_2 + ... + k_s = 0.    (26)
Let β = (β_λ, β_μ) with β_λ = (k_1, ..., k_s, 0, ..., 0) ∈ R^{|J(p(x))|} and β_μ = (l_1, ..., l_t, 0, ..., 0) ∈ R^{|I(p(x))|}. Then for any sufficiently small ε > 0, e.g., taking

    ε = min { α_{λj}(x)/|k_j|, α_{μi}(x)/|l_i| : k_j ≠ 0, l_i ≠ 0, j ∈ J, i ∈ K },

we have

    α(x) ± εβ ≥ 0,    (27)

and

    α_k(x) ± εβ_k = 0,  k ∈ (J(p(x)) \ J) ∪ (I(p(x)) \ K).    (28)

Hence, it follows from (26) that

    Σ_{j∈J(p(x))} (α_j(x) ± εβ_j) = 1 ± ε Σ_{j∈J} β_j = 1 ± ε(k_1 + ... + k_s) = 1 ± 0 = 1.    (29)

In addition, by (21) and (26), we have

    Σ_{j∈J(p(x))} (α_j(x) ± εβ_j) ∇h_j(p(x)) + Σ_{i∈I(p(x))} (α_i(x) ± εβ_i) ∇f_i(p(x))
    = g(x) ± ε ( Σ_{j=1}^{s} k_j ∇h_j(p(x)) + Σ_{i=1}^{t} l_i ∇f_i(p(x)) ) = g(x).    (30)

Therefore, by (27)-(30), it follows that α(x) ± εβ ∈ M(x). Furthermore, we have α(x) = (α(x) + εβ)/2 + (α(x) − εβ)/2, which contradicts the fact that α(x) is a vertex of M(x). Hence, the vectors in (24) are linearly independent, and thereby J ∪ K ∈ B(x). Thus, B(x) ≠ ∅.

Now, we establish the following lemma, which is very important in the analysis.

Lemma 3.2. Let B(x) be defined in (22). Suppose that (GAIPCQ) is satisfied at p(x). Then, there exists an open neighborhood N(x) such that B(y) ⊆ B(x) for any y ∈ N(x).

Proof. In the following, we consider two cases: |I(p(x))| = 0 and I(p(x)) ≠ ∅, respectively.

Case (i): |I(p(x))| = 0. In this case, it is easy to see that p(x) lies in the relative interior of D. By Theorem XV.4.1.4 of Hiriart-Urruty and Lemaréchal (1993), p(·) is continuous on R^n. So it follows that p(y) ∈ ri D for y close to x, and thereby I(p(y)) = ∅. Thus, for any y close to x, the terms regarding the index set K in B(y) are automatically vacuous. Hence, we only need to verify that the desired result concerning J is true. However, in this case, according to Remark 2.2, (GAIPCQ) reduces to (AIPCQ). So the desired result is valid with the help of Lemma 3 of Mifflin, Qi and Sun (1999).

Case (ii): I(p(x)) ≠ ∅. By the continuity of p(·) and (12), for any y close to x, we have

    J(p(y)) ⊆ J(p(x)).    (31)
Since f_i(p(x)) < 0 for any i ∈ I \ I(p(x)), by the continuity of p(·) again, there exists an open neighborhood N(x) around x such that for any y ∈ N(x), f_i(p(y)) < 0, i ∈ I \ I(p(x)). Hence, I \ I(p(x)) ⊆ I \ I(p(y)). So it follows that

    I(p(y)) ⊆ I(p(x))    (32)

for any y ∈ N(x). Now, suppose on the contrary that the proposed result is not true. Then, noticing that J(p(x)) and I(p(x)) are finite index sets, there exists a sequence {x^k} tending to x such that for all k, there is an index set J ∪ K ∈ B(x^k) \ B(x). So, for each k, the vectors

    { (∇h_j(p(x^k))ᵀ, 1)ᵀ, (∇f_i(p(x^k))ᵀ, 0)ᵀ : j ∈ J, i ∈ K }    (33)

are linearly independent and there exists (λ(x^k), μ(x^k)) ∈ M(x^k) such that supp(λ(x^k)) ⊆ J ⊆ J(p(x^k)), supp(μ(x^k)) ⊆ K ⊆ I(p(x^k)), but J ∪ K ∉ B(x). According to (GAIPCQ), the vectors

    { (∇h_j(p(x))ᵀ, 1)ᵀ, (∇f_i(p(x))ᵀ, 0)ᵀ : j ∈ J, i ∈ K }    (34)

must be linearly independent. In addition, by (31) and (32), it follows that J ∪ K ⊆ J(p(x)) ∪ I(p(x)). Then, we get the following statement:

    There does not exist (λ(x), μ(x)) ∈ M(x) such that supp(λ(x)) ⊆ J ⊆ J(p(x)) and supp(μ(x)) ⊆ K ⊆ I(p(x)).    (35)

Due to the boundedness of λ_J(x^k), there exists a convergent subsequence of {λ_J(x^k)}_{k=1}^∞. For convenience of description, we still use the same notation and assume that {λ_J(x^k)} tends to λ̄_J. Then, regarding this convergent subsequence, it follows from (20) that

    g(x^k) − Σ_{j∈J} λ_j(x^k) ∇h_j(p(x^k)) = Σ_{i∈K} μ_i(x^k) ∇f_i(p(x^k)).    (36)

According to (33), we can easily see that ∇f_K(p(x^k)) has full column rank. So it follows from (36) that

    μ_K(x^k) = [ (∇f_K(p(x^k)))ᵀ ∇f_K(p(x^k)) ]⁻¹ (∇f_K(p(x^k)))ᵀ [ g(x^k) − Σ_{j∈J} λ_j(x^k) ∇h_j(p(x^k)) ].    (37)

By the continuity of p(·) and ∇f_i(·), as k → ∞, we have (∇f_K(p(x^k)))ᵀ ∇f_K(p(x^k)) → (∇f_K(p(x)))ᵀ ∇f_K(p(x)). In addition, from (34), ∇f_K(p(x)) has full column rank, and thereby the inverse of (∇f_K(p(x)))ᵀ ∇f_K(p(x)) exists.
Hence, as k, the right hand side of (37) must tend to [( f K (p(x))) T f K (p(x))] ( f K (p(x))) T [g(x) j J λ j h j (p(x))], (38) 0
which we denote by μ̄_K. So we have

    g(x) − Σ_{j∈J} λ̄_j ∇h_j(p(x)) = Σ_{i∈K} μ̄_i ∇f_i(p(x)).    (39)

Now, by setting λ_i(x) = λ̄_i if i ∈ J, λ_i(x) = 0 for i ∈ J(p(x)) \ J, and μ_i(x) = μ̄_i if i ∈ K, μ_i(x) = 0 for i ∈ I(p(x)) \ K, we obtain (λ(x), μ(x)) ∈ M(x) satisfying supp(λ(x)) ⊆ J and supp(μ(x)) ⊆ K, which contradicts (35). Therefore, by Cases (i) and (ii), we have shown the proposed conclusion.

Corollary 3.1. Let B̂(x) be defined in (23). Suppose that (GAIPCQ) is satisfied at p(x). Then, there exists an open neighborhood N̂(x) such that B̂(y) ⊆ B̂(x) for any y ∈ N̂(x).

Proof. By virtue of Proposition 2.1, Lemma 3.1 and (15), it is clear that B̂(x) ≠ ∅ for any x ∈ R^n. Noticing the continuity of p(·), it follows that K(p(y)) ⊆ K(p(x)) for y close to x. Hence, the result is valid by virtue of Lemma 3.2.

Now, for x ∈ R^n and any J ∪ K ∈ B̂(x), choose some k ∈ J and define J′ = J \ {k}. We consider the following mapping:

    H_{J∪K}(y, z, q, w) = ( (h_j(y) − h_k(y))_{j∈J′},
                            (f_i(y))_{i∈K},
                            M⁻¹ ( Σ_{j∈J′} z_j ∇h_j(y) + (1 − Σ_{j∈J′} z_j) ∇h_k(y) + Σ_{i∈K} q_i ∇f_i(y) ) + y − w ),    (40)

where w ∈ R^n and (y, z, q) ∈ R^n × R^{|J′|} × R^{|K|}. Let (y⁰, z⁰, q⁰, w⁰) = (p(x), λ_{J′}(x), μ_K(x), x); then we have H_{J∪K}(y⁰, z⁰, q⁰, w⁰) = 0. For convenience, set h̄_j(y) = h_j(y) − h_k(y), j ∈ J′. Then, the matrix of partial derivatives of H_{J∪K}(y, z, q, w) with respect to (y, z, q) is

    J_{(y,z,q)} H_{J∪K}(y, z, q, w) = [ ∇h̄_{J′}(y)ᵀ       0                0
                                        ∇f_K(y)ᵀ          0                0
                                        B_{J∪K}(y, z, q)  M⁻¹ ∇h̄_{J′}(y)  M⁻¹ ∇f_K(y) ],    (41)

where, for i ∈ K, ∇f_i(y) is the i-th column of ∇f_K(y), ∇h̄_j(y) is the j-th column of ∇h̄_{J′}(y) for any j ∈ J′, and

    B_{J∪K}(y, z, q) = I + M⁻¹ ( Σ_{j∈J′} z_j ∇²h_j(y) + (1 − Σ_{j∈J′} z_j) ∇²h_k(y) + Σ_{i∈K} q_i ∇²f_i(y) ).    (42)
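The mapping H_{J∪K} can be made concrete on a toy instance (our own illustration, with assumed data not taken from the paper): f_0(y) = |y| with active piece h_1(y) = y, so J = {1}, J′ = ∅ and z is vacuous, one active constraint f_1(y) = 1 − y, and M = I. The system H_{J∪K} = 0 then reduces to two scalar equations with a closed-form, smooth solution in w, which is exactly what the Implicit Function Theorem delivers in general:

```python
# Toy instance (illustration only) of H_{J∪K}(y, z, q, w) = 0:
# f0(y) = |y| with active piece h1(y) = y (J = {1}, J' empty, z vacuous),
# active constraint f1(y) = 1 - y, and M = I.  The system reads
#   f1(y) = 1 - y = 0,
#   h1'(y) + q * f1'(y) + y - w = 1 + q * (-1) + y - w = 0,
# so y(w) = 1 and q(w) = 2 - w, both smooth functions of w.

def solve_H(w):
    """Closed-form solution (y(w), q(w)) of the reduced system."""
    y = 1.0              # from f1(y) = 1 - y = 0
    q = 1.0 + y - w      # from 1 - q + y - w = 0
    return y, q

def H(y, q, w):
    """Residual of the reduced system; zero at the solution."""
    return (1.0 - y, 1.0 - q + y - w)
```

On this piece the corresponding smooth selection of the gradient is then g_{J∪K}(w) = M(w − y(w)) = w − 1.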
When J′ = ∅, the variable z and the terms concerning it in the above formulae are vacuous. Similarly, all terms regarding K above vanish when |I(p(x))| = 0. In these two cases, the mapping H in (40) and its corresponding Jacobian matrix can be described in simpler forms in lower-dimensional spaces, and the analysis becomes much easier. Now, we get the following result:

Lemma 3.3. For any J ∪ K ∈ B̂(x), there exist an open neighborhood S_{J∪K} of w⁰ (= x) and an open neighborhood Q_{J∪K} of (y⁰, z⁰, q⁰) (= (p(x), λ_{J′}(x), μ_K(x))) such that when w ∈ cl S_{J∪K}, the equations H_{J∪K}(y, z, q, w) = 0 have a unique solution (y(w), z(w), q(w))_{J∪K} ∈ cl Q_{J∪K}, where z_{J∪K}(w) or q_{J∪K}(w) is vacuous if J′ = ∅ or |K| = 0, respectively. Furthermore, (y(w), z(w), q(w))_{J∪K} is continuously differentiable on S_{J∪K}.

Proof. When J′ = ∅ or |I(p(x))| = 0, it is easy to see that the desired results are true. Next, we consider the case where |J| > 1 and I(p(x)) ≠ ∅. It follows from (16) that ∇²h_j(p(x)) ⪰ 0 for any j ∈ J. Also, noticing that q⁰ ≥ 0, 0 ≤ z⁰_j ≤ 1 for any j ∈ J′ and 0 ≤ Σ_{j∈J′} z⁰_j ≤ 1, there exists an open neighborhood of (y⁰, z⁰, q⁰, w⁰), say N_{J∪K}(y⁰, z⁰, q⁰, w⁰), such that B_{J∪K}(y, z, q) is positive definite on this neighborhood. On the other hand, by the definition of B̂(x), we can easily see that the vectors

    { ∇h̄_j(y⁰), ∇f_i(y⁰) : j ∈ J′, i ∈ K }    (43)

are linearly independent. So, by the continuity of ∇h̄_j and ∇f_i, j ∈ J′, i ∈ K, there exists an open neighborhood of (y⁰, z⁰, q⁰, w⁰), say N_{J∪K}(y⁰, z⁰, q⁰, w⁰) again for convenience, such that for any (y, z, q, w) ∈ N_{J∪K}(y⁰, z⁰, q⁰, w⁰), the vectors

    { ∇h̄_j(y), ∇f_i(y) : j ∈ J′, i ∈ K }    (44)

are linearly independent. It then follows that the matrix (∇h̄_{J′}(y), ∇f_K(y)) has full column rank. Hence, J_{(y,z,q)} H_{J∪K}(y, z, q, w) is nonsingular on N_{J∪K}(y⁰, z⁰, q⁰, w⁰). Noticing the fact that H_{J∪K}(y⁰, z⁰, q⁰, w⁰) = 0 and by virtue of the Implicit Function Theorem, we derive the desired conclusion.
Since J(p(x)) and I(p(x)) are both finite index sets, according to Lemma 3.3 we obtain the following finitely many functions g_{J∪K} : S_{J∪K} → R^n, defined by

    g_{J∪K}(w) = M(w − y_{J∪K}(w)),  J ∪ K ∈ B̂(x),    (45)

where S_{J∪K} and y_{J∪K}(w) are defined in Lemma 3.3. Obviously, g_{J∪K}(w) is continuously differentiable on S_{J∪K}. By using these functions and the (GAIPCQ) qualification, we derive the piecewise smoothness of g.
Proposition 3.1. Suppose that (GAIPCQ) holds at p(x). Then, there exists an open neighborhood N(x) around x such that g is piecewise smooth on N(x). In particular, g is semismooth at x.

Proof. From (20) and Corollary 3.1, for any w ∈ N̂(x), it follows that there exists J ∪ K ∈ B̂(w) ⊆ B̂(x) such that

    g(w) = M(w − p(w)) = Σ_{j∈J} λ_j(w) ∇h_j(p(w)) + Σ_{i∈K} μ_i(w) ∇f_i(p(w)),
    λ_j(w) ≥ 0, j ∈ J(p(w)),  Σ_{j∈J(p(w))} λ_j(w) = 1,    (46)
    μ_i(w) ≥ 0, i ∈ I(p(w)).

Let

    A(p(w)) = [ ∇h_J(p(w))  ∇f_K(p(w))
                1_{|J|}      0_{|K|}    ],    (47)

where 1_{|J|} and 0_{|K|} denote the |J|-dimensional row vector with all components equal to one and the |K|-dimensional zero row vector, respectively. According to the definition of B̂(x), A(p(w)) has full column rank. Then, from (46), we get

    (λ_J(w)ᵀ, μ_K(w)ᵀ)ᵀ = ( A(p(w))ᵀ A(p(w)) )⁻¹ A(p(w))ᵀ (g(w)ᵀ, 1)ᵀ,  w ∈ N̂(x).    (48)

Since p(·), ∇h_j(·) and ∇f_i(·) are continuous, we have λ_J(w) → λ_J(x) and μ_K(w) → μ_K(x) as w → x. Thus, we can choose N(x) ⊆ ( ∩_{J∪K∈B̂(x)} S_{J∪K} ) ∩ N̂(x) to be an open neighborhood of w⁰ (= x) such that for any w ∈ N(x) and J ∪ K ∈ B̂(w), (p(w), λ_{J′}(w), μ_K(w)) ∈ cl Q_{J∪K}, where λ_{J′}(w) or μ_K(w) is vacuous when J′ = ∅ or |I(p(x))| = 0, respectively. For any w ∈ N(x), there exists J ∪ K ∈ B̂(x) such that

    H_{J∪K}(p(w), λ_{J′}(w), μ_K(w), w) = 0,    (49)

and (p(w), λ_{J′}(w), μ_K(w)) ∈ cl Q_{J∪K}. So, by virtue of the Implicit Function Theorem, we have (p(w), λ_{J′}(w), μ_K(w)) = (y_{J∪K}(w), z_{J′}(w), q_K(w)). Therefore, it follows that

    g(w) = M(w − p(w)) = M(w − y_{J∪K}(w)) = g_{J∪K}(w),    (50)

where g_{J∪K}(w) is defined in (45). Thus, for any w ∈ N(x), there exists at least one continuously differentiable function g_{J∪K} : S_{J∪K} → R^n such that g(w) = g_{J∪K}(w) with J ∪ K ∈ B̂(x). Noticing that B̂(x) is a finite set, we conclude that g is piecewise smooth on N(x) by Definition 2.2. Moreover, g is semismooth at x according to Qi and Sun (1993).
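The conclusion can be sanity-checked numerically on a small instance (our own toy computation, with data we assume, not taken from the paper): f_0(y) = |y|, one constraint f_1(y) = 1 − y ≤ 0 (so D = [1, +∞)) and M = I. Here p(x) = max(1, x − 1), so g(x) = x − p(x) consists of the two smooth pieces x − 1 (for x ≤ 2, where the constraint is active) and 1 (for x ≥ 2), and at x = 0 the system (20) holds with λ_1 = 1 and μ_1 = 2 ≥ 0:

```python
# Sanity check (toy computation, not from the paper): f0(y) = |y|,
# constraint f1(y) = 1 - y <= 0, i.e. D = [1, +inf), M = I.
# Minimizing |y| + 0.5 * (y - x)**2 over y >= 1 gives p(x) = max(1, x - 1),
# hence g(x) = x - p(x) has two smooth pieces: x - 1 (x <= 2) and 1 (x >= 2).

def p(x):
    """Proximal point over D = [1, +inf) for f0(y) = |y| with M = I."""
    return max(1.0, x - 1.0)

def p_bruteforce(x, step=1e-4, ymax=6.0):
    """Grid minimization of |y| + 0.5*(y - x)**2 over [1, ymax] as a check."""
    n = int((ymax - 1.0) / step)
    return min((abs(1.0 + i * step) + 0.5 * (1.0 + i * step - x) ** 2,
                1.0 + i * step) for i in range(n + 1))[1]

def g(x):
    """Gradient of the Moreau-Yosida regularization, g(x) = M (x - p(x))."""
    return x - p(x)

# Multipliers in (20) at x = 0: only the piece h1(y) = y is active at
# p(0) = 1, so lam1 = 1, and g(0) = lam1 * 1 + mu1 * (-1) gives mu1 = 2 >= 0.
mu1 = 1.0 - g(0.0)
```

Note that g is not twice differentiable at x = 2, where the two pieces meet, yet it is piecewise smooth (hence semismooth) there, consistent with Proposition 3.1.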
Remark 3.1. In this section, we have shown that g is piecewise smooth around x, and thereby semismooth at x, for problems with C² inequality constraints. Actually, if the constraint functions f_i in (1) are piecewise C² and convex on R^n, we can analogously derive the same results without difficulty by following the ideas of this paper, although the description in that case would be considerably more complicated. However, it is not straightforward to extend these results to general constrained problems with both equality and inequality constraints. In fact, the analysis would become quite difficult and complex in this situation, since the methods used in this paper may no longer be valid. One of the main obstacles is that, at the KKT points, the Lagrange multipliers corresponding to the equality constraints are very hard to deal with.

4. Conclusion

The Moreau-Yosida regularization is a powerful tool for smoothing nondifferentiable functions, and it is widely used in nonsmooth optimization. A significant feature is the semismoothness of the gradient of the Moreau-Yosida regularization. In this paper, we investigate the semismoothness of the gradient g of the Moreau-Yosida regularization of f, which plays a key role in the superlinear convergence analysis of approximate Newton methods. The function f is closely related to a constrained program with a convex, piecewise C² objective f_0 and inequality constraints. By introducing a new qualification, we derive the piecewise smoothness of the gradient g. We believe that the results established in this paper can enrich the theory of nonsmooth optimization. Actually, there are still many questions left open. For instance, for general convex problems with both equality and inequality constraints, we do not yet have a confirmed result for the semismoothness of the gradient g. It is a great challenge to answer these questions.
Acknowledgment

The research of the first author was supported by Grant R-146-000-036-592 of IHPC-CIM, National University of Singapore.

References

[1] Chaney, R.W. Second-order Necessary Conditions in Semismooth Optimization. Math. Programming. 1988, 40 (1), 95-109.

[2] Chaney, R.W. Optimality Conditions for Piecewise C² Nonlinear Programming. J. Optim. Theory Appl. 1989, 61 (2), 179-202.
[3] Chaney, R.W. Piecewise C^k Functions in Nonsmooth Analysis. Nonlinear Anal. Theory, Methods & Appl. 1990, 15 (7), 649-660.

[4] Clarke, F. H. Optimization and Nonsmooth Analysis. John Wiley and Sons: New York, 1983.

[5] Ermoliev, Y. M.; Kryazhimskii, A. V.; Ruszczyński, A. Constraint Aggregation Principle in Convex Optimization. Math. Programming. 1997, 76 (3), 353-372.

[6] Fukushima, M.; Qi, L. A Globally and Superlinearly Convergent Algorithm for Nonsmooth Convex Minimization. SIAM J. Optim. 1996, 6 (4), 1106-1120.

[7] Hiriart-Urruty, J. B.; Lemaréchal, C. Convex Analysis and Minimization Algorithms. Springer-Verlag: Berlin, 1993.

[8] Kiwiel, K. Proximity Control in Bundle Methods for Convex Nondifferentiable Minimization. Math. Programming. 1990, 46 (1), 105-122.

[9] Lemaréchal, C.; Nemirovskii, A.; Nesterov, Y. New Variants of Bundle Methods. Math. Programming. 1995, 69 (1), 111-147.

[10] Lemaréchal, C.; Sagastizábal, C. An Approach to Variable Metric Methods. In Systems Modelling and Optimization, Lecture Notes in Control and Information Sciences 197; Henry, J., Yvon, J.-P., Eds.; Springer-Verlag: Berlin, 1994; 144-162.

[11] Lemaréchal, C.; Sagastizábal, C. Variable Metric Bundle Methods: From Conceptual to Implementable Forms. Math. Programming. 1997a, 76 (3), 393-410.

[12] Lemaréchal, C.; Sagastizábal, C. Practical Aspects of the Moreau-Yosida Regularization I: Theoretical Preliminaries. SIAM J. Optim. 1997b, 7 (2), 367-385.

[13] Lemaréchal, C.; Strodiot, J. J.; Bihain, A. On a Bundle Method for Nonsmooth Optimization. In Nonlinear Programming 4; Mangasarian, O. L.; Meyer, R. R.; Robinson, S. M., Eds.; Academic Press: New York, 1981; 245-282.

[14] Meng, F. W.; Hao, Y. The Property of Piecewise Smoothness of Moreau-Yosida Approximation for a Piecewise C² Convex Function. Adv. Math. (Chinese). 2001, 30 (4), 354-358.

[15] Mifflin, R.; Qi, L.; Sun, D. Properties of the Moreau-Yosida Regularization of a Piecewise C² Convex Function. Math. Programming. 1999, 84 (2), 269-281.

[16] Moreau, J. J.
Proximité et Dualité dans un Espace Hilbertien. Bulletin de la Société Mathématique de France (French). 1965, 93, 273-299.
[17] Pang, J. S.; Ralph, D. Piecewise Smoothness, Local Invertibility, and Parametric Analysis of Normal Maps. Math. Oper. Res. 1996, 21 (2), 401-426.

[18] Poljak, B. T. A General Method of Solving Extremum Problems. Dokl. Akad. Nauk SSSR. 1967, 174, 33-36.

[19] Qi, L. Second-order Analysis of the Moreau-Yosida Regularization of a Convex Function. Preprint, revised version, School of Mathematics, The University of New South Wales, Sydney, Australia, 1995.

[20] Qi, L.; Chen, X. A Preconditioning Proximal Newton Method for Nondifferentiable Convex Optimization. Math. Programming. 1997, 76 (3), 411-429.

[21] Qi, L.; Sun, J. A Nonsmooth Version of Newton's Method. Math. Programming. 1993, 58 (3), 353-367.

[22] Rockafellar, R. T. Convex Analysis. Princeton University Press: Princeton, New Jersey, 1970.

[23] Rockafellar, R. T. Maximal Monotone Relations and the Second Derivatives of Nonsmooth Functions. Ann. Inst. H. Poincaré Anal. Non Linéaire. 1985, 2 (3), 167-184.

[24] Sun, D.; Han, J. On a Conjecture in Moreau-Yosida Regularization of a Nonsmooth Convex Function. Chinese Sci. Bull. 1997, 42 (7), 40-43.

[25] Yosida, K. Functional Analysis. Springer-Verlag: Berlin, 1964.

[26] Zhao, G.; Meng, F. On Piecewise Smoothness of a Lagrangian-dual Function and Its Moreau-Yosida Regularization. Research Report, DAMTP, University of Cambridge, 2000.