
On Second-order Properties of the Moreau-Yosida Regularization for Constrained Nonsmooth Convex Programs

Fanwen Meng^{1,2,*} and Gongyun Zhao^1

^1 Department of Mathematics, National University of Singapore, 2 Science Drive 2, Singapore 117543, Republic of Singapore
^2 School of Mathematics, University of Southampton, Highfield, Southampton SO17 1BJ, Great Britain
* Corresponding author.

Abstract. In this paper we investigate a class of constrained nonsmooth convex optimization problems, namely problems with piecewise $C^2$ convex objectives and smooth convex inequality constraints. Using the Moreau-Yosida regularization, we convert these problems into unconstrained smooth convex programs. We then investigate the second-order properties of the Moreau-Yosida regularization $\eta$. By introducing the (GAIPCQ) qualification, we show that the gradient of the regularized function $\eta$ is piecewise smooth, and hence semismooth.

Keywords. Moreau-Yosida Regularization, Piecewise $C^k$ Functions, Semismooth Functions.

AMS Subject Classification (1991): 90C25, 65K10, 52A41.

1. Introduction

In this paper we consider the following constrained convex program:

$\min f_0(x)$ s.t. $f_i(x) \le 0,\ i \in I = \{1, 2, \ldots, m\},$  (1)

where $f_0$ and $f_i$, $i \in I$, are finite convex functions on $\mathbb{R}^n$ and $f_0$ is possibly nonsmooth. Many approaches to nonsmooth programs have been proposed, and they are mostly restricted to the unconstrained convex case. In general, the various approaches

are based on combinations of the following three methods: (i) subgradient methods (e.g., Ermoliev, Kryazhimskii, and Ruszczyński (1997), Poljak (1967)); (ii) bundle techniques (e.g., Kiwiel (1990), Lemaréchal, Nemirovskii, and Nesterov (1995), Lemaréchal and Strodiot (1981)); (iii) the Moreau-Yosida regularization (e.g., Fukushima and Qi (1996), Lemaréchal and Sagastizábal (1997a, 1997b), Qi and Chen (1997)). In this paper we investigate some properties of the Moreau-Yosida regularization.

For any proper, closed convex function $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$, the Moreau (1965)-Yosida (1964) regularization of $f$, $\eta: \mathbb{R}^n \to \mathbb{R}$, is defined by

$\eta(x) = \min_{y \in \mathbb{R}^n} \{ f(y) + \tfrac{1}{2} \|y - x\|_M^2 \},$  (2)

where $M$ is a symmetric positive definite $n \times n$ matrix and $\|z\|_M^2 = z^T M z$ for any $z \in \mathbb{R}^n$. Let $p(x)$ denote the unique solution of (2). The motivation for studying the Moreau-Yosida regularization is that

$\min \{ f(x) \mid x \in \mathbb{R}^n \}$  (3)

is equivalent to

$\min \{ \eta(x) \mid x \in \mathbb{R}^n \}.$  (4)

More precisely, the following statements are equivalent: (i) $x$ minimizes $f$; (ii) $p(x) = x$; (iii) $\nabla \eta(x) = 0$; (iv) $x$ minimizes $\eta$; (v) $f(p(x)) = f(x)$; (vi) $\eta(x) = f(x)$.

It is easy to see that $\eta$ is finite everywhere and convex. A nice property of the Moreau-Yosida regularization is that $\eta$ is continuously differentiable and its gradient

$g(x) \equiv \nabla \eta(x) = M(x - p(x))$  (5)

is Lipschitz continuous. For more details about the Moreau-Yosida regularization, the reader is referred to Hiriart-Urruty and Lemaréchal (1993) and Rockafellar (1970).
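To make (2)-(5) concrete, the following minimal numerical sketch (ours, not part of the paper) computes $\eta$ and $p$ for the simple choice $f(y) = |y|$ on $\mathbb{R}$ with $M = 1$, for which $p(x)$ is the classical soft-thresholding operator; it then checks the gradient formula (5) against a central finite difference. The helper name `moreau` is illustrative only.

```python
# A minimal numerical sketch (illustration only, not from the paper):
# the Moreau-Yosida regularization of f(y) = |y| on R with M = 1.
import numpy as np
from scipy.optimize import minimize_scalar

def moreau(x, f=abs, M=1.0):
    """Return (eta(x), p(x)) for eta(x) = min_y f(y) + (M/2)(y - x)^2."""
    res = minimize_scalar(lambda y: f(y) + 0.5 * M * (y - x) ** 2)
    return res.fun, res.x

x = 1.7
val, p = moreau(x)
# closed form: p(x) = sign(x) * max(|x| - 1, 0)  (soft-thresholding)
assert abs(p - np.sign(x) * max(abs(x) - 1.0, 0.0)) < 1e-5

g = 1.0 * (x - p)                          # gradient formula (5), M = 1
h = 1e-6
g_fd = (moreau(x + h)[0] - moreau(x - h)[0]) / (2 * h)
print(val, p, g, g_fd)                     # g and g_fd agree closely

# at the minimizer x = 0 of f:  p(0) = 0 and eta(0) = f(0),
# matching the equivalences (ii) and (vi) above
```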

In many situations, besides the first-order properties mentioned above, we are also interested in second-order properties of the Moreau-Yosida regularization, e.g., when we wish to minimize $\eta(x)$ by Newton's method. In most circumstances, due to the nondifferentiability of $f$, the second-order derivative of $\eta$ does not exist. Thus we need to investigate a weaker second-order property, namely, the semismoothness of $g(= \nabla\eta)$. Under the condition of semismoothness, Qi and Sun (1993) derived the superlinear (quadratic) convergence of a nonsmooth version of Newton's method for solving nonsmooth equations. Also, by virtue of the semismoothness of $g$, Fukushima and Qi (1996) showed that superlinear convergence can be guaranteed by using approximate solutions of problem (2) to construct search directions for minimizing $\eta$. In addition, the concept of semismoothness plays a central role in the development of optimality conditions in nonsmooth analysis as well, e.g., Chaney (1988, 1989, 1990).

Recently, several authors have investigated the semismoothness of $g$ for certain classes of $f$; see, for example, Lemaréchal and Sagastizábal (1994, 1997), Meng and Hao (2001), Mifflin, Qi and Sun (1999), Qi (1995), Rockafellar (1985), Sun and Han (1997), Zhao and Meng (2000). Qi (1995) conjectured that $g$ is semismooth under a regularity condition if $f$ is the maximum of several twice continuously differentiable convex functions. Later, Sun and Han (1997) proved this conjecture under a constant rank constraint qualification (CRCQ) for the case where $M = (1/\lambda)I$ and $\lambda$ is a positive constant. In Meng and Hao (2001), the authors investigated the unconstrained case of problem (1), where $f_0$ is assumed to be piecewise $C^2$; by introducing the (SCRCQ) qualification, which is weaker than (CRCQ), they derived the same result for $g$ as Sun and Han (1997). In addition, Mifflin, Qi and Sun (1999) investigated the same problem as in Meng and Hao (2001); by introducing the (AIPCQ) qualification, which is weaker than both (CRCQ) and (SCRCQ), they derived the same result for $g$ as Sun and Han (1997). Recently, Zhao and Meng (2000) studied the Lagrangian-dual function of certain problems, considering two cases: (i) linear objectives with either affine inequality or strictly convex inequality constraints; (ii) strictly convex objectives with general convex inequality and affine equality constraints. After deriving the piecewise $C^2$-smoothness of the Lagrangian-dual function, they showed that the gradient of the Moreau-Yosida regularization of this function is piecewise smooth, and hence semismooth, under the (AIPCQ) qualification.

Notice that all the papers mentioned above mainly study the Moreau-Yosida regularization of a continuous (possibly nondifferentiable) function on $\mathbb{R}^n$, and the results derived there apply to unconstrained optimization problems. In this paper, we consider the constrained problem (1) instead. We should point out that the smooth problem

$\min \eta_0(x)$ s.t. $f_i(x) \le 0,\ i \in I,$  (6)

where $\eta_0$ is the Moreau-Yosida regularization of $f_0$, is not equivalent to (1), because $\eta_0$ coincides with $f_0$ only on the set of their unconstrained minima, which need not contain solutions of the constrained minimization problem. Therefore, although the results derived in Mifflin, Qi, and Sun (1999) can easily be extended to (6), they are not applicable to (1). In view of this observation, we consider the following generalized function:

$f(x) = f_0(x) + \delta(x \mid D),$  (7)

where $D := \{ x \in \mathbb{R}^n : f_i(x) \le 0,\ i \in I \}$ and $\delta(\cdot \mid D)$ is the usual indicator function of the

convex set $D$ (see Rockafellar (1970)). Then problem (1) can be written as the following unconstrained problem:

$\min \{ f(x) \mid x \in \mathbb{R}^n \}.$  (8)

Problem (8) in turn is equivalent to

$\min \{ \eta(x) \mid x \in \mathbb{R}^n \},$  (9)

where

$\eta(x) = \min_{y \in \mathbb{R}^n} \{ f(y) + \tfrac{1}{2} \|y - x\|_M^2 \}.$  (10)

In this paper we assume that, in problem (1), $f_0$ is piecewise $C^2$ and the $f_i$ are twice continuously differentiable. We aim to investigate the second-order properties of the regularized function $\eta$ in (10). The analysis presented in this paper is related to the results in Mifflin, Qi and Sun (1999); however, there are several fundamental distinctions. Firstly, $f$ in (7) is a generalized function (indeed, $f$ is neither finite nor continuous everywhere on $\mathbb{R}^n$), whereas in Mifflin, Qi and Sun (1999) it was required to be finite everywhere, convex and continuous on the whole space. The latter condition is in fact a fundamental assumption in many publications on the Moreau-Yosida regularization, e.g., Lemaréchal and Sagastizábal (1997), Mifflin, Qi and Sun (1999), Qi (1995), Zhao and Meng (2000). Secondly, although the feasible region $D$ is defined by smooth inequalities, the geometry of $D$ is clearly nonsmooth, and it is on this set that the indicator function is defined. In this paper, we attempt to cope with these two kinds of nonsmoothness simultaneously. To this end, we introduce a new qualification, called (GAIPCQ). Under this condition, we show that the gradient of the Moreau-Yosida regularization of $f$ is piecewise smooth, and hence semismooth.

The rest of this paper is organized as follows. In Section 2, some basic notions and properties are collected. Section 3 studies the piecewise smoothness and semismoothness of the gradient $g$. Section 4 concludes.

2. Definitions and Basic Properties

In this section, we recall some concepts, such as semismoothness and piecewise $C^k$-ness, which are basic elements of this paper. In addition, we introduce a new constraint qualification, (GAIPCQ), which will be used in the next section.

It is well known that $\eta$ is a continuously differentiable convex function on $\mathbb{R}^n$, not only for the specific function $f$ defined in (7) but for general proper, closed, nondifferentiable convex functions as well. In order to use Newton's method or modified Newton methods, it is important to study the Hessian of $\eta$, i.e., the Jacobian of $g$. It is also known that $g$ is globally Lipschitz continuous with modulus $\|M\|$. However, $g$ may not be differentiable.

To extend the definition of the Jacobian to certain classes of nonsmooth functions, Qi and Sun (1993) introduced the following definition of semismoothness for vector-valued functions.

Definition 2.1. Suppose $F: \mathbb{R}^n \to \mathbb{R}^m$ is a locally Lipschitzian vector-valued function. We say that $F$ is semismooth at $x$ if

$\lim_{V \in \partial F(x + t h'),\ h' \to h,\ t \downarrow 0} \{ V h' \}$  (11)

exists for any $h \in \mathbb{R}^n$, where $\partial F(y)$ denotes the generalized Jacobian of $F$ at $y$ as defined by Clarke (1983).

The semismoothness of the gradient $g$ is a key condition in the analysis of the superlinear convergence of the approximate Newton method proposed by Fukushima and Qi (1996). Semismoothness is a local property of nonsmooth functions, and a direct verification of semismoothness is difficult. Fortunately, according to Qi and Sun (1993), piecewise smooth functions are semismooth. Hence, one is led to study the piecewise smoothness of $g$ for certain classes of convex problems. We now define piecewise smoothness for vector-valued functions; see Pang and Ralph (1996).

Definition 2.2. A continuous function $\psi: \mathbb{R}^n \to \mathbb{R}^m$ is said to be piecewise $C^k$ on a set $W \subseteq \mathbb{R}^n$ if there exist finitely many functions $\psi_1, \psi_2, \ldots, \psi_q$ such that each $\psi_j \in C^k(U)$ for some open set $U$ containing $W$, and $\psi(x) \in \{ \psi_1(x), \ldots, \psi_q(x) \}$ for any $x \in W$. We refer to $\{\psi_j\}_{j \in I_q}$ as a representation of $\psi$ on $W$, where $I_q := \{1, 2, \ldots, q\}$.

Remark 2.1. If $m = 1$, Definition 2.2 reduces to the notion of piecewise $C^k$-ness for real-valued functions. In addition, the domain $W$ in the above definition is a general subset of $\mathbb{R}^n$. In particular, when $W$ is open, all pieces $\psi_j$ may simply be required to be $C^k$ on $W = U$. In what follows, we mainly consider the case where $\psi$ is piecewise $C^k$ on some open subset of $\mathbb{R}^n$.

In the remainder of the paper, $f_0$ is always assumed to be convex. Let $f_0$ be piecewise $C^1$ on $W \subseteq \mathbb{R}^n$ with a representation $\{h_j\}_{j \in \hat J}$. For any $j \in \hat J$, define $D_j := \{ x \in W : h_j(x) = f_0(x) \}$. We define the active index set at $x$ by

$J(x) = \{ j \in \hat J : x \in D_j \},\quad x \in W,$  (12)

and let

$K(x) = \{ j \in \hat J : x \in \operatorname{cl}(\operatorname{int} D_j) \},\quad x \in W.$  (13)
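As a toy illustration of Definition 2.2 and the index sets (12)-(13) (ours, not from the paper), consider $f_0(y) = |y|$ on $W = \mathbb{R}$ with the deliberately redundant representation $h_1(y) = y$, $h_2(y) = -y$, $h_3(y) = 0$: at $y = 0$ all three pieces are active, but $D_3 = \{0\}$ has empty interior, so $J(0) = \{1, 2, 3\}$ while $K(0) = \{1, 2\}$.

```python
# Toy illustration (ours) of the index sets (12)-(13) for
# f0(y) = |y| = max(h1, h2, h3) with h1(y) = y, h2(y) = -y, h3(y) = 0.
pieces = {1: lambda y: y, 2: lambda y: -y, 3: lambda y: 0.0}

def f0(y):
    return max(h(y) for h in pieces.values())

def J(y, tol=1e-12):
    """Active index set (12): pieces attaining the max at y."""
    return sorted(j for j, h in pieces.items() if abs(h(y) - f0(y)) < tol)

print(J(0.0))   # [1, 2, 3]; but D3 = {0} has empty interior, so K(0) = {1, 2}
print(J(0.5))   # [1]
# By Proposition 2.1 below, subdiff f0(0) = conv{h1'(0), h2'(0)} = [-1, 1].
```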

We then obtain the following result regarding the subdifferential of $f_0$.

Proposition 2.1 (Pang and Ralph (1996)). Let $f_0$ be a convex real-valued function on $\mathbb{R}^n$. Suppose that $f_0$ is piecewise $C^1$ on an open set $W \subseteq \mathbb{R}^n$ with a representation $\{h_j\}_{j \in \hat J}$. Then, for every $x \in W$,

$\partial f_0(x) = \operatorname{conv}\{ \nabla h_j(x) : j \in K(x) \},$  (14)

where $\operatorname{conv} S$ denotes the convex hull of the set $S$.

For the piecewise $C^1$ function $f_0$ above, it is easy to see that $K(x) \subseteq J(x)$ for any $x \in W$. Hence,

$\partial f_0(x) = \operatorname{conv}\{ \nabla h_j(x) : j \in K(x) \} \subseteq \operatorname{conv}\{ \nabla h_j(x) : j \in J(x) \}.$  (15)

Moreover, if $f_0$ above is piecewise $C^2$ on $W$, then by virtue of Lemma 2 in Mifflin, Qi and Sun (1999) we have

$\nabla^2 h_j(x) \succeq 0,\quad j \in K(x),\ x \in W,$  (16)

where $A \succeq 0$ means that the matrix $A$ is positive semidefinite. Now, define the set of indices of active constraints at any point $x \in D$ by

$I(x) := \{ i \in I \mid f_i(x) = 0 \}.$  (17)

In order to study the piecewise smoothness of $g$ in (5), we introduce the following constraint qualification.

Definition 2.3. The Generalized Affine Independence Preserving Constraint Qualification (GAIPCQ) is said to hold for $f_0$ and $f_i$, $i \in I$, at $x$ if, for every subset $J \cup K \subseteq J(x) \cup I(x)$ for which there exists a sequence $\{x^k\}$ with $x^k \to x$, $J \cup K \subseteq J(x^k) \cup I(x^k)$, and the vectors

$\left\{ \begin{pmatrix} \nabla h_j(x^k) \\ 1 \end{pmatrix}, \begin{pmatrix} \nabla f_i(x^k) \\ 0 \end{pmatrix} : j \in J,\ i \in K \right\}$  (18)

linearly independent, it follows that the vectors

$\left\{ \begin{pmatrix} \nabla h_j(x) \\ 1 \end{pmatrix}, \begin{pmatrix} \nabla f_i(x) \\ 0 \end{pmatrix} : j \in J,\ i \in K \right\}$  (19)

are linearly independent.
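Numerically, the linear-independence condition in (18)-(19) is a full-column-rank condition on the matrix whose columns are the augmented vectors. A small sketch (ours, with hypothetical inputs) makes this explicit; note that parallel gradients can still yield independent augmented vectors, because the $h_j$-columns carry a trailing 1 while the $f_i$-columns carry a trailing 0.

```python
# Sketch (ours) of the linear-independence test behind (18)-(19):
# stack (grad h_j; 1) and (grad f_i; 0) as columns and test column rank.
import numpy as np

def augmented_vectors_independent(grad_hs, grad_fs):
    """True iff {(grad h_j; 1) : j in J} U {(grad f_i; 0) : i in K} is independent."""
    cols = ([np.append(gh, 1.0) for gh in grad_hs]
            + [np.append(gf, 0.0) for gf in grad_fs])
    A = np.column_stack(cols)
    return np.linalg.matrix_rank(A) == A.shape[1]

# parallel gradients, yet independent augmented vectors:
print(augmented_vectors_independent([np.array([1.0, 0.0])], [np.array([2.0, 0.0])]))  # True
# two identical piece-gradients are dependent:
print(augmented_vectors_independent([np.array([1.0, 0.0]), np.array([1.0, 0.0])], []))  # False
```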

Remark 2.2. For problem (8) with $f_0$ piecewise $C^2$, in order to investigate the smoothness of $g$ we obviously need information about the constraints $f_i$ together with the pieces $h_j$ of $f_0$; thus (GAIPCQ) involves both the pieces $h_j$ (i.e., $f_0$) and the $f_i$. On the other hand, if problem (1) is unconstrained and $f_0$ is piecewise $C^2$ on $\mathbb{R}^n$, then $f = f_0$, which implies that $f$ is piecewise $C^2$ on the whole of $\mathbb{R}^n$ as well. Evidently, in this case problem (8) becomes problem (1). This is exactly the situation studied in Mifflin, Qi and Sun (1999). In this case, the terms concerning $I$ and $I(x)$ in Definition 2.3 vanish automatically, and (GAIPCQ) reduces to the qualification (AIPCQ) introduced in Mifflin, Qi and Sun (1999).

3. Smoothness of $g$ for Piecewise $C^2$ Objectives

In this section, we investigate the piecewise smoothness of the gradient of the Moreau-Yosida regularized function $\eta$ under the (GAIPCQ) qualification. Then, according to Qi and Sun (1993), the semismoothness of $g(= \nabla\eta)$ follows. In the rest of this paper, we make the following assumptions.

Assumption 3.1. $\emptyset \ne D \subseteq \mathbb{R}^n$.

Assumption 3.2. $f_0$ is convex and piecewise $C^2$ on $\mathbb{R}^n$ with a representation $\{h_j\}_{j \in \hat J}$.

Assumption 3.3. $f_i$ is convex and twice continuously differentiable on $\mathbb{R}^n$ for all $i \in I$.

For the generalized function $f$ in (7), we can easily see that its effective domain is exactly the feasible region $D$. For any $x \in \mathbb{R}^n$, since $D$ is closed, the proximal solution $p(x)$ lies in $D$, and hence $\eta(x)$ is finite. Therefore, the effective domain of $\eta$ is the whole $n$-dimensional space. Now, for $x \in \mathbb{R}^n$, since $p(x)$ minimizes $f(y) + \tfrac{1}{2}\|y - x\|_M^2$ over $\mathbb{R}^n$, or equivalently $p(x)$ minimizes $f_0(y) + \tfrac{1}{2}\|y - x\|_M^2$ over $D$, there exists a multiplier pair $(\lambda(x), \mu(x)) \in \mathbb{R}^{|J(p(x))|} \times \mathbb{R}^{|I(p(x))|}$ such that

$g(x) = M(x - p(x)) = \sum_{j \in J(p(x))} \lambda_j(x) \nabla h_j(p(x)) + \sum_{i \in I(p(x))} \mu_i(x) \nabla f_i(p(x)),$
$\lambda_j(x) \ge 0,\ j \in J(p(x)),\quad \sum_{j \in J(p(x))} \lambda_j(x) = 1,$  (20)
$\mu_i(x) \ge 0,\ i \in I(p(x)).$

For a nonnegative vector $a \in \mathbb{R}^m$, we let $\operatorname{supp}(a)$, called the support of $a$, be the subset of $\{1, \ldots, m\}$ consisting of the indices $i$ for which $a_i > 0$. We define the multiplier set

$\mathcal{M}(x) = \{ (\lambda(x), \mu(x)) \in \mathbb{R}^{|J(p(x))|} \times \mathbb{R}^{|I(p(x))|} \mid (\lambda(x), \mu(x)) \text{ satisfies } (20) \}.$  (21)
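To illustrate (20) and the set $\mathcal{M}(x)$, here is a small numerical sketch (our toy instance, not from the paper) with $n = 2$, $M = I$, two quadratic pieces $h_1(y) = (y_1 - 1)^2 + y_2^2$, $h_2(y) = y_1^2 + (y_2 - 1)^2$, and one constraint $f_1(y) = y_1 + y_2 - 1 \le 0$. The proximal subproblem is solved in smooth epigraph form, and $(\lambda, \mu)$ is then recovered from the linear system (20); this is exactly the linear system that reappears as formula (48) below.

```python
# Toy instance (ours) illustrating the KKT system (20) with n = 2, M = I.
import numpy as np
from scipy.optimize import minimize

h = [lambda y: (y[0] - 1.0) ** 2 + y[1] ** 2,
     lambda y: y[0] ** 2 + (y[1] - 1.0) ** 2]
dh = [lambda y: np.array([2.0 * (y[0] - 1.0), 2.0 * y[1]]),
      lambda y: np.array([2.0 * y[0], 2.0 * (y[1] - 1.0)])]
f1 = lambda y: y[0] + y[1] - 1.0
df1 = lambda y: np.array([1.0, 1.0])
x = np.array([2.0, 2.0])

# proximal subproblem in smooth epigraph form:
#   min_{y,t} t + 0.5*||y - x||^2   s.t.  h_j(y) <= t,  f1(y) <= 0
obj = lambda v: v[2] + 0.5 * np.sum((v[:2] - x) ** 2)
cons = ([{'type': 'ineq', 'fun': lambda v, hj=hj: v[2] - hj(v[:2])} for hj in h]
        + [{'type': 'ineq', 'fun': lambda v: -f1(v[:2])}])
res = minimize(obj, np.array([0.0, 0.0, 1.0]), method='SLSQP', constraints=cons)
p = res.x[:2]                                # proximal point p(x)
assert np.allclose(p, [0.5, 0.5], atol=1e-4)

# recover (lambda, mu) from (20): columns (grad h_j; 1) and (grad f_1; 0),
# right-hand side (g(x); 1) with g(x) = x - p(x)
A = np.column_stack([np.append(dh[0](p), 1.0),
                     np.append(dh[1](p), 1.0),
                     np.append(df1(p), 0.0)])
lam1, lam2, mu = np.linalg.solve(A, np.append(x - p, 1.0))
print(p, lam1, lam2, mu)                     # -> (0.5, 0.5), 0.5, 0.5, 1.5
```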

We further define

$\mathcal{B}(x) = \big\{\, J \cup K \subseteq J(p(x)) \cup I(p(x)) \ \big|\ $ there exists $(\lambda(x), \mu(x)) \in \mathcal{M}(x)$ such that $\operatorname{supp}(\lambda(x)) \subseteq J \subseteq J(p(x))$, $\operatorname{supp}(\mu(x)) \subseteq K \subseteq I(p(x))$, and the vectors $\left\{ \begin{pmatrix} \nabla h_j(p(x)) \\ 1 \end{pmatrix}, \begin{pmatrix} \nabla f_i(p(x)) \\ 0 \end{pmatrix} : j \in J,\ i \in K \right\}$ are linearly independent$\,\big\},$  (22)

$\hat{\mathcal{B}}(x) = \{\, J \cup K \in \mathcal{B}(x) \mid J \subseteq K(p(x)) \,\}.$  (23)

If $I(p(x)) = \emptyset$, i.e., $|I(p(x))| = 0$, then all terms with respect to $I(p(x))$, $\mu(x)$ and $K$ in (20)-(23) vanish automatically. If $I(p(x)) \ne \emptyset$, namely, the optimal solution $p(x)$ lies on the boundary of $D$, we impose the following non-degeneracy assumption on the corresponding Lagrange multipliers $\mu(x)$ in (20).

Assumption 3.4. For any $(\lambda(x), \mu(x)) \in \mathcal{M}(x)$, $\mu(x) \ne 0$.

Lemma 3.1. Let $\mathcal{B}(x)$ be defined in (22). Then $\mathcal{B}(x) \ne \emptyset$ for any $x \in \mathbb{R}^n$.

Proof. It is easy to verify that $\mathcal{M}(x)$ in (21) is a convex set. The result is obvious if $|I(p(x))| = 0$. Next, we consider the case where $I(p(x)) \ne \emptyset$. By the convexity of $\mathcal{M}(x)$ and Assumption 3.4, there exists a vertex of $\mathcal{M}(x)$, say $\alpha(x) = (\alpha_\lambda(x), \alpha_\mu(x))$, such that $\alpha_\mu(x) \ne 0$. Let $J = \operatorname{supp}(\alpha_\lambda(x))$ and $K = \operatorname{supp}(\alpha_\mu(x))$; then $J \subseteq J(p(x))$ and $K \subseteq I(p(x))$. Without loss of generality, assume $J = \{1, 2, \ldots, s\}$ and $K = \{1, \ldots, t\}$. In what follows we show that the vectors

$\left\{ \begin{pmatrix} \nabla h_1(p(x)) \\ 1 \end{pmatrix}, \ldots, \begin{pmatrix} \nabla h_s(p(x)) \\ 1 \end{pmatrix}, \begin{pmatrix} \nabla f_1(p(x)) \\ 0 \end{pmatrix}, \ldots, \begin{pmatrix} \nabla f_t(p(x)) \\ 0 \end{pmatrix} \right\}$  (24)

are linearly independent. If this were not true, there would exist scalars $k_1, k_2, \ldots, k_s$ and $l_1, l_2, \ldots, l_t$, not all zero, satisfying

$\sum_{j=1}^{s} k_j \begin{pmatrix} \nabla h_j(p(x)) \\ 1 \end{pmatrix} + \sum_{i=1}^{t} l_i \begin{pmatrix} \nabla f_i(p(x)) \\ 0 \end{pmatrix} = 0,$  (25)

namely,

$\sum_{j=1}^{s} k_j \nabla h_j(p(x)) + \sum_{i=1}^{t} l_i \nabla f_i(p(x)) = 0,\quad k_1 + k_2 + \cdots + k_s = 0.$  (26)

Let $\beta = (\beta_\lambda, \beta_\mu)$ with $\beta_\lambda = (k_1, \ldots, k_s, 0, \ldots, 0) \in \mathbb{R}^{|J(p(x))|}$ and $\beta_\mu = (l_1, \ldots, l_t, 0, \ldots, 0) \in \mathbb{R}^{|I(p(x))|}$. Then for any sufficiently small $\varepsilon > 0$, e.g., taking $\varepsilon = \min\{ \alpha_{\lambda_j}(x)/|k_j|,\ \alpha_{\mu_i}(x)/|l_i| : k_j \ne 0,\ l_i \ne 0,\ j \in J,\ i \in K \}$, we have

$\alpha(x) \pm \varepsilon\beta \ge 0,$  (27)

and

$\alpha_k(x) \pm \varepsilon\beta_k = 0,\quad k \in (J(p(x)) \setminus J) \cup (I(p(x)) \setminus K).$  (28)

Hence, it follows from (26) that

$\sum_{j \in J(p(x))} (\alpha_j(x) \pm \varepsilon\beta_j) = 1 \pm \varepsilon \sum_{j \in J(p(x))} \beta_j = 1 \pm \varepsilon \sum_{j \in J} \beta_j = 1 \pm \varepsilon (k_1 + \cdots + k_s) = 1 \pm 0 = 1.$  (29)

In addition, by (21) and (26), we have

$\sum_{j \in J(p(x))} (\alpha_j(x) \pm \varepsilon\beta_j) \nabla h_j(p(x)) + \sum_{i \in I(p(x))} (\alpha_i(x) \pm \varepsilon\beta_i) \nabla f_i(p(x)) = g(x) \pm \varepsilon \Big( \sum_{j=1}^{s} k_j \nabla h_j(p(x)) + \sum_{i=1}^{t} l_i \nabla f_i(p(x)) \Big) = g(x).$  (30)

Therefore, by (27)-(30), it follows that $\alpha(x) \pm \varepsilon\beta \in \mathcal{M}(x)$. Furthermore, $\alpha(x) = (\alpha(x) + \varepsilon\beta)/2 + (\alpha(x) - \varepsilon\beta)/2$, which contradicts the fact that $\alpha(x)$ is a vertex of $\mathcal{M}(x)$. Hence the vectors in (24) are linearly independent, so $J \cup K \in \mathcal{B}(x)$. Thus $\mathcal{B}(x) \ne \emptyset$.

We now establish the following lemma, which is central to the analysis.

Lemma 3.2. Let $\mathcal{B}(x)$ be defined in (22), and suppose that (GAIPCQ) is satisfied at $p(x)$. Then there exists an open neighborhood $\mathcal{N}(x)$ such that $\mathcal{B}(y) \subseteq \mathcal{B}(x)$ for any $y \in \mathcal{N}(x)$.

Proof. We consider the two cases $|I(p(x))| = 0$ and $I(p(x)) \ne \emptyset$ separately.

Case i: $|I(p(x))| = 0$. In this case, $p(x)$ lies in the relative interior of $D$. By Theorem XV.4.1.4 of Hiriart-Urruty and Lemaréchal (1993), $p(\cdot)$ is continuous on $\mathbb{R}^n$. It follows that $p(y) \in \operatorname{ri} D$ for $y$ close to $x$, and hence $I(p(y)) = \emptyset$. Thus, for any $y$ close to $x$, the terms involving the index set $K$ in $\mathcal{B}(y)$ are automatically vacuous, and we only need to verify the claim concerning $J$. In this case, by Remark 2.2, (GAIPCQ) reduces to (AIPCQ), so the desired result follows from Lemma 3 of Mifflin, Qi and Sun (1999).

Case ii: $I(p(x)) \ne \emptyset$. By the continuity of $p(\cdot)$ and (12), for any $y$ close to $x$ we have

$J(p(y)) \subseteq J(p(x)).$  (31)

Since $f_i(p(x)) < 0$ for any $i \in I \setminus I(p(x))$, by the continuity of $p(\cdot)$ again there exists an open neighborhood $\mathcal{N}(x)$ of $x$ such that for any $y \in \mathcal{N}(x)$, $f_i(p(y)) < 0$, $i \in I \setminus I(p(x))$. Hence $I \setminus I(p(x)) \subseteq I \setminus I(p(y))$, and it follows that

$I(p(y)) \subseteq I(p(x))$  (32)

for any $y \in \mathcal{N}(x)$. Now, suppose to the contrary that the claimed result is not true. Since $J(p(x))$ and $I(p(x))$ are finite index sets, there then exists a sequence $\{x^k\}$ tending to $x$ such that for all $k$ there is an index set $J \cup K \in \mathcal{B}(x^k) \setminus \mathcal{B}(x)$. So, for each $k$, the vectors

$\left\{ \begin{pmatrix} \nabla h_j(p(x^k)) \\ 1 \end{pmatrix}, \begin{pmatrix} \nabla f_i(p(x^k)) \\ 0 \end{pmatrix} : j \in J,\ i \in K \right\}$  (33)

are linearly independent, and there exists $(\lambda(x^k), \mu(x^k)) \in \mathcal{M}(x^k)$ such that $\operatorname{supp}(\lambda(x^k)) \subseteq J \subseteq J(p(x^k))$ and $\operatorname{supp}(\mu(x^k)) \subseteq K \subseteq I(p(x^k))$, but $J \cup K \notin \mathcal{B}(x)$. According to (GAIPCQ), the vectors

$\left\{ \begin{pmatrix} \nabla h_j(p(x)) \\ 1 \end{pmatrix}, \begin{pmatrix} \nabla f_i(p(x)) \\ 0 \end{pmatrix} : j \in J,\ i \in K \right\}$  (34)

must be linearly independent. In addition, by (31) and (32), it follows that $J \cup K \subseteq J(p(x)) \cup I(p(x))$. We then arrive at the following statement: there does not exist $(\lambda(x), \mu(x)) \in \mathcal{M}(x)$ such that

$\operatorname{supp}(\lambda(x)) \subseteq J \subseteq J(p(x)) \quad\text{and}\quad \operatorname{supp}(\mu(x)) \subseteq K \subseteq I(p(x)).$  (35)

By the boundedness of $\lambda_J(x^k)$, there exists a convergent subsequence of $\{\lambda_J(x^k)\}_{k=1}^{\infty}$; for convenience in description we keep the same notation and assume $\lambda_J(x^k) \to \bar\lambda_J$. Along this convergent subsequence, it follows from (20) that

$g(x^k) - \sum_{j \in J} \lambda_j(x^k) \nabla h_j(p(x^k)) = \sum_{i \in K} \mu_i(x^k) \nabla f_i(p(x^k)).$  (36)

According to (33), $\nabla f_K(p(x^k))$ has full column rank. So it follows from (36) that

$\mu_K(x^k) = \big[ (\nabla f_K(p(x^k)))^T \nabla f_K(p(x^k)) \big]^{-1} (\nabla f_K(p(x^k)))^T \Big[ g(x^k) - \sum_{j \in J} \lambda_j(x^k) \nabla h_j(p(x^k)) \Big].$  (37)

By the continuity of $p(\cdot)$ and $\nabla f_i(\cdot)$, as $k \to \infty$ we have $(\nabla f_K(p(x^k)))^T \nabla f_K(p(x^k)) \to (\nabla f_K(p(x)))^T \nabla f_K(p(x))$. In addition, from (34), $\nabla f_K(p(x))$ has full column rank, so the inverse of $(\nabla f_K(p(x)))^T \nabla f_K(p(x))$ exists. Hence, as $k \to \infty$, the right-hand side of (37) tends to

$\big[ (\nabla f_K(p(x)))^T \nabla f_K(p(x)) \big]^{-1} (\nabla f_K(p(x)))^T \Big[ g(x) - \sum_{j \in J} \bar\lambda_j \nabla h_j(p(x)) \Big],$  (38)

which we denote by $\bar\mu_K$. So we have

$g(x) - \sum_{j \in J} \bar\lambda_j \nabla h_j(p(x)) = \sum_{i \in K} \bar\mu_i \nabla f_i(p(x)).$  (39)

Now, by setting $\lambda_j(x) = \bar\lambda_j$ if $j \in J$, $\lambda_j(x) = 0$ for $j \in J(p(x)) \setminus J$, and $\mu_i(x) = \bar\mu_i$ if $i \in K$, $\mu_i(x) = 0$ for $i \in I(p(x)) \setminus K$, we obtain $(\lambda(x), \mu(x)) \in \mathcal{M}(x)$ satisfying $\operatorname{supp}(\lambda(x)) \subseteq J$ and $\operatorname{supp}(\mu(x)) \subseteq K$, which contradicts (35). Therefore, by Cases i and ii, the proposed conclusion holds.

Corollary 3.1. Let $\hat{\mathcal{B}}(x)$ be defined in (23), and suppose that (GAIPCQ) is satisfied at $p(x)$. Then there exists an open neighborhood $\hat{\mathcal{N}}(x)$ such that $\hat{\mathcal{B}}(y) \subseteq \hat{\mathcal{B}}(x)$ for any $y \in \hat{\mathcal{N}}(x)$.

Proof. By virtue of Proposition 2.1, Lemma 3.1 and (15), it is clear that $\hat{\mathcal{B}}(x) \ne \emptyset$ for any $x \in \mathbb{R}^n$. By the continuity of $p(\cdot)$, it follows that $K(p(y)) \subseteq K(p(x))$ for $y$ close to $x$. Hence the result follows from Lemma 3.2.

Now, for $x \in \mathbb{R}^n$ and any $J \cup K \in \hat{\mathcal{B}}(x)$, choose some $k \in J$ and define $\bar J = J \setminus \{k\}$. We consider the following mapping:

$H_{J \cup K}(y, z, q, w) = \begin{pmatrix} (h_j(y) - h_k(y))_{j \in \bar J} \\ (f_i(y))_{i \in K} \\ M^{-1}\big( \sum_{j \in \bar J} z_j \nabla h_j(y) + (1 - \sum_{j \in \bar J} z_j) \nabla h_k(y) + \sum_{i \in K} q_i \nabla f_i(y) \big) + y - w \end{pmatrix},$  (40)

where $w \in \mathbb{R}^n$ and $(y, z, q) \in \mathbb{R}^n \times \mathbb{R}^{|\bar J|} \times \mathbb{R}^{|K|}$. Let $(y^0, z^0, q^0, w^0) = (p(x), \lambda_{\bar J}(x), \mu_K(x), x)$; then $H_{J \cup K}(y^0, z^0, q^0, w^0) = 0$. For convenience, set $\tilde h_j(y) = h_j(y) - h_k(y)$, $j \in \bar J$. Then the matrix of partial derivatives of $H_{J \cup K}(y, z, q, w)$ with respect to $(y, z, q)$ is

$\nabla_{(y,z,q)} H_{J \cup K}(y, z, q, w) = \begin{pmatrix} \nabla \tilde h_{\bar J}(y)^T & 0 & 0 \\ \nabla f_K(y)^T & 0 & 0 \\ B_{J \cup K}(y, z, q) & M^{-1} \nabla \tilde h_{\bar J}(y) & M^{-1} \nabla f_K(y) \end{pmatrix},$  (41)

where, for $i \in K$, $\nabla f_i(y)$ is the $i$-th column of $\nabla f_K(y)$, $\nabla \tilde h_j(y)$ is the $j$-th column of $\nabla \tilde h_{\bar J}(y)$ for any $j \in \bar J$, and

$B_{J \cup K}(y, z, q) = I + M^{-1} \Big( \sum_{j \in \bar J} z_j \nabla^2 h_j(y) + \big(1 - \sum_{j \in \bar J} z_j\big) \nabla^2 h_k(y) + \sum_{i \in K} q_i \nabla^2 f_i(y) \Big).$  (42)
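To see the mapping (40) in action, the following self-contained sketch (ours) sets up $H_{J \cup K} = 0$ for the toy instance used in the sketch after (21), with $J = \{1, 2\}$, $k = 2$ (so $\bar J = \{1\}$), $K = \{1\}$, and $M = I$, and solves it with a generic root finder. As Lemma 3.3 below shows, the solution $(y(w), z(w), q(w))$ depends smoothly on $w$, and $g_{J \cup K}(w) = M(w - y_{J \cup K}(w))$ as in (45) below.

```python
# Sketch (ours) of the system H_{J u K}(y, z, q, w) = 0 in (40) for the toy
# instance: h1(y) = (y1-1)^2 + y2^2, h2(y) = y1^2 + (y2-1)^2,
# f1(y) = y1 + y2 - 1, with J = {1, 2}, k = 2 (Jbar = {1}), K = {1}, M = I.
import numpy as np
from scipy.optimize import fsolve

h1 = lambda y: (y[0] - 1.0) ** 2 + y[1] ** 2
h2 = lambda y: y[0] ** 2 + (y[1] - 1.0) ** 2
f1 = lambda y: y[0] + y[1] - 1.0
dh1 = lambda y: np.array([2.0 * (y[0] - 1.0), 2.0 * y[1]])
dh2 = lambda y: np.array([2.0 * y[0], 2.0 * (y[1] - 1.0)])
df1 = lambda y: np.array([1.0, 1.0])

def H(v, w):
    y, z, q = v[:2], v[2], v[3]
    # third block of (40): z*dh1 + (1-z)*dh2 + q*df1 + y - w  (with M = I)
    grad = z * dh1(y) + (1.0 - z) * dh2(y) + q * df1(y) + y - w
    return [h1(y) - h2(y), f1(y), grad[0], grad[1]]

for w in (np.array([2.0, 2.0]), np.array([2.1, 1.95])):
    y1, y2, z, q = fsolve(H, x0=[0.5, 0.5, 0.5, 1.5], args=(w,))
    # last printed entry is g_{J u K}(w) = w - y(w), cf. (45) below
    print(w, (y1, y2), z, q, w - np.array([y1, y2]))
```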

When $|J| = 1$, the set $\bar J$, the variable $z$, and the terms concerning them in the above formulae are vacuous. Similarly, all terms regarding $K$ vanish when $|I(p(x))| = 0$. In these two cases, the mapping $H$ in (40) and its corresponding Jacobian take simpler forms in lower-dimensional spaces, and the analysis becomes much easier. We now obtain the following result.

Lemma 3.3. For any $J \cup K \in \hat{\mathcal{B}}(x)$, there exist an open neighborhood $S_{J \cup K}$ of $w^0 (= x)$ and an open neighborhood $Q_{J \cup K}$ of $(y^0, z^0, q^0)$ $(= (p(x), \lambda_{\bar J}(x), \mu_K(x)))$ such that for $w \in \operatorname{cl} S_{J \cup K}$ the equations $H_{J \cup K}(y, z, q, w) = 0$ have a unique solution $(y(w), z(w), q(w))_{J \cup K} \in \operatorname{cl} Q_{J \cup K}$, where $z_{J \cup K}(w)$ or $q_{J \cup K}(w)$ is vacuous if $|J| = 1$ or $|K| = 0$, respectively. Furthermore, $(y(w), z(w), q(w))_{J \cup K}$ is continuously differentiable on $S_{J \cup K}$.

Proof. When $|J| = 1$ or $|I(p(x))| = 0$, it is easy to see that the desired results are true. Next, we consider the case where $|J| > 1$ and $I(p(x)) \ne \emptyset$. It follows from (16) that $\nabla^2 h_j(p(x)) \succeq 0$ for any $j \in J$. Also, noticing that $q^0 \ge 0$, $0 \le z_j^0 \le 1$ for any $j \in \bar J$, and $0 \le \sum_{j \in \bar J} z_j^0 \le 1$, there exists an open neighborhood of $(y^0, z^0, q^0, w^0)$, say $N_{J \cup K}(y^0, z^0, q^0, w^0)$, such that $B_{J \cup K}(y, z, q)$ is positive definite on this neighborhood. On the other hand, by the definition of $\hat{\mathcal{B}}(x)$, we can easily see that the vectors

$\{ \nabla \tilde h_j(y^0), \nabla f_i(y^0) : j \in \bar J,\ i \in K \}$  (43)

are linearly independent. So, by the continuity of $\nabla \tilde h_j$ and $\nabla f_i$, $j \in \bar J$, $i \in K$, there exists an open neighborhood of $(y^0, z^0, q^0, w^0)$, which for convenience we again denote by $N_{J \cup K}(y^0, z^0, q^0, w^0)$, such that for any $(y, z, q, w) \in N_{J \cup K}(y^0, z^0, q^0, w^0)$ the vectors

$\{ \nabla \tilde h_j(y), \nabla f_i(y) : j \in \bar J,\ i \in K \}$  (44)

are linearly independent. It then follows that the matrix $(\nabla \tilde h_{\bar J}(y), \nabla f_K(y))$ has full column rank. Hence $\nabla_{(y,z,q)} H_{J \cup K}(y, z, q, w)$ is nonsingular on $N_{J \cup K}(y^0, z^0, q^0, w^0)$. Noticing that $H_{J \cup K}(y^0, z^0, q^0, w^0) = 0$ and applying the Implicit Function Theorem, we derive the desired conclusion.

Since $J(p(x))$ and $I(p(x))$ are both finite index sets, according to Lemma 3.3 we obtain the following finitely many functions $g_{J \cup K}: S_{J \cup K} \to \mathbb{R}^n$ defined by

$g_{J \cup K}(w) = M(w - y_{J \cup K}(w)),\quad J \cup K \in \hat{\mathcal{B}}(x),$  (45)

where $S_{J \cup K}$ and $y_{J \cup K}(w)$ are defined in Lemma 3.3. Obviously, $g_{J \cup K}(w)$ is continuously differentiable on $S_{J \cup K}$. Using these functions and the (GAIPCQ) qualification, we derive the piecewise smoothness of $g$.

Proposition 3.1. Suppose that (GAIPCQ) holds at $p(x)$. Then there exists an open neighborhood $N(x)$ of $x$ such that $g$ is piecewise smooth on $N(x)$. In particular, $g$ is semismooth at $x$.

Proof. From (20) and Corollary 3.1, for any $w \in \hat{\mathcal{N}}(x)$ there exists $J \cup K \in \hat{\mathcal{B}}(w) \subseteq \hat{\mathcal{B}}(x)$ such that

$g(w) = M(w - p(w)) = \sum_{j \in J} \lambda_j(w) \nabla h_j(p(w)) + \sum_{i \in K} \mu_i(w) \nabla f_i(p(w)),$
$\lambda_j(w) \ge 0,\ j \in J(p(w)),\quad \sum_{j \in J(p(w))} \lambda_j(w) = 1,$  (46)
$\mu_i(w) \ge 0,\ i \in I(p(w)).$

Let

$A(p(w)) = \begin{pmatrix} \nabla h_J(p(w)) & \nabla f_K(p(w)) \\ \mathbf{1}_{|J|} & \mathbf{0}_{|K|} \end{pmatrix},$  (47)

where $\mathbf{1}_{|J|}$ and $\mathbf{0}_{|K|}$ denote the $|J|$-dimensional row vector with all components equal to one and the $|K|$-dimensional zero row vector, respectively. According to the definition of $\hat{\mathcal{B}}(x)$, $A(p(w))$ has full column rank. Then, from (46), we get

$\begin{pmatrix} \lambda_J(w) \\ \mu_K(w) \end{pmatrix} = \big( A(p(w))^T A(p(w)) \big)^{-1} A(p(w))^T \begin{pmatrix} g(w) \\ 1 \end{pmatrix},\quad w \in \hat{\mathcal{N}}(x).$  (48)

Since $p(\cdot)$, $\nabla h_j(\cdot)$ and $\nabla f_i(\cdot)$ are continuous, we have $\lambda_J(w) \to \lambda_J(x)$ and $\mu_K(w) \to \mu_K(x)$ as $w \to x$. Thus we can choose $N(x) \subseteq \big( \bigcap_{J \cup K \in \hat{\mathcal{B}}(x)} S_{J \cup K} \big) \cap \hat{\mathcal{N}}(x)$ to be an open neighborhood of $w^0 (= x)$ such that for any $w \in N(x)$ and $J \cup K \in \hat{\mathcal{B}}(w)$, $(p(w), \lambda_{\bar J}(w), \mu_K(w)) \in \operatorname{cl} Q_{J \cup K}$, where $\lambda_{\bar J}(w)$ or $\mu_K(w)$ is vacuous when $|J| = 1$ or $|I(p(x))| = 0$, respectively. For any $w \in N(x)$, there exists $J \cup K \in \hat{\mathcal{B}}(x)$ such that

$H_{J \cup K}(p(w), \lambda_{\bar J}(w), \mu_K(w), w) = 0,$  (49)

and $(p(w), \lambda_{\bar J}(w), \mu_K(w)) \in \operatorname{cl} Q_{J \cup K}$. So, by virtue of the Implicit Function Theorem, we have $(p(w), \lambda_{\bar J}(w), \mu_K(w)) = (y_{J \cup K}(w), z_{\bar J}(w), q_K(w))$. Therefore, it follows that

$g(w) = M(w - p(w)) = M(w - y_{J \cup K}(w)) = g_{J \cup K}(w),$  (50)

where $g_{J \cup K}(w)$ is defined in (45). Thus, for any $w \in N(x)$, there exists at least one continuously differentiable function $g_{J \cup K}: S_{J \cup K} \to \mathbb{R}^n$ with $J \cup K \in \hat{\mathcal{B}}(x)$ such that $g(w) = g_{J \cup K}(w)$. Since $\hat{\mathcal{B}}(x)$ is a finite set, $g$ is piecewise smooth on $N(x)$ by Definition 2.2. Moreover, $g$ is semismooth at $x$ according to Qi and Sun (1993).

Remark 3.1. In this section we have shown that $g$ is piecewise smooth around $x$, and hence semismooth at $x$, for problems with $C^2$ inequality constraints. Actually, if the constraint functions $f_i$ in (1) are piecewise $C^2$ and convex on $\mathbb{R}^n$, the same results can be derived analogously, without difficulty, by following the ideas of this paper, although the description in that case would be considerably more complicated. However, it is not straightforward to extend these results to general constrained problems with both equality and inequality constraints. In fact, the analysis would become quite difficult and complex in that situation, since the methods used in this paper may no longer be valid. One of the main obstacles is that, at KKT points, the Lagrange multipliers corresponding to the equality constraints are very hard to deal with.

4. Conclusion

The Moreau-Yosida regularization is a powerful tool for smoothing nondifferentiable functions and is widely used in nonsmooth optimization. A significant feature is the semismoothness of the gradient of the Moreau-Yosida regularization. In this paper, we investigated the semismoothness of the gradient $g$ of the Moreau-Yosida regularization of $f$, which plays a key role in the superlinear convergence analysis of approximate Newton methods. The function $f$ arises from a constrained program with a convex, piecewise $C^2$ objective $f_0$ and smooth inequality constraints. By introducing a new qualification, we derived the piecewise smoothness of the gradient $g$. We believe that the results established in this paper can enrich the theory of nonsmooth optimization. Many questions remain open; for instance, for general convex problems with both equality and inequality constraints, no affirmative result on the semismoothness of the gradient $g$ is available so far. Answering such questions remains a considerable challenge.

Acknowledgment

The research of the first author was supported by Grant R-146-000-036-592 of IHPC-CIM, National University of Singapore.

References

[1] Chaney, R.W. Second-order Necessary Conditions in Semismooth Optimization. Math. Programming 1988, 40 (1), 95–109.

[2] Chaney, R.W. Optimality Conditions for Piecewise C^2 Nonlinear Programming. J. Optim. Theory Appl. 1989, 61 (2), 179–202.

[3] Chaney, R.W. Piecewise C^k Functions in Nonsmooth Analysis. Nonlinear Anal. Theory Methods Appl. 1990, 15 (7), 649–660.

[4] Clarke, F.H. Optimization and Nonsmooth Analysis. John Wiley and Sons: New York, 1983.

[5] Ermoliev, Y.M.; Kryazhimskii, A.V.; Ruszczyński, A. Constraint Aggregation Principle in Convex Optimization. Math. Programming 1997, 76 (3), 353–372.

[6] Fukushima, M.; Qi, L. A Globally and Superlinearly Convergent Algorithm for Nonsmooth Convex Minimization. SIAM J. Optim. 1996, 6 (4), 1106–1120.

[7] Hiriart-Urruty, J.B.; Lemaréchal, C. Convex Analysis and Minimization Algorithms. Springer-Verlag: Berlin, 1993.

[8] Kiwiel, K. Proximity Control in Bundle Methods for Convex Nondifferentiable Minimization. Math. Programming 1990, 46 (1), 105–122.

[9] Lemaréchal, C.; Nemirovskii, A.; Nesterov, Y. New Variants of Bundle Methods. Math. Programming 1995, 69 (1), 111–147.

[10] Lemaréchal, C.; Sagastizábal, C. An Approach to Variable Metric Methods. In Systems Modelling and Optimization, Lecture Notes in Control and Information Sciences 197; Henry, J., Yvon, J.-P., Eds.; Springer-Verlag: Berlin, 1994; 144–162.

[11] Lemaréchal, C.; Sagastizábal, C. Variable Metric Bundle Methods: From Conceptual to Implementable Forms. Math. Programming 1997a, 76 (3), 393–410.

[12] Lemaréchal, C.; Sagastizábal, C. Practical Aspects of the Moreau-Yosida Regularization I: Theoretical Preliminaries. SIAM J. Optim. 1997b, 7 (2), 367–385.

[13] Lemaréchal, C.; Strodiot, J.J.; Bihain, A. On a Bundle Method for Nonsmooth Optimization. In Nonlinear Programming 4; Mangasarian, O.L., Meyer, R.R., Robinson, S.M., Eds.; Academic Press: New York, 1981; 245–282.

[14] Meng, F.W.; Hao, Y. The Property of Piecewise Smoothness of Moreau-Yosida Approximation for a Piecewise C^2 Convex Function. Adv. Math. (Chinese) 2001, 30 (4), 354–358.

[15] Mifflin, R.; Qi, L.; Sun, D. Properties of the Moreau-Yosida Regularization of a Piecewise C^2 Convex Function. Math. Programming 1999, 84 (2), 269–281.

[16] Moreau, J.J. Proximité et Dualité dans un Espace Hilbertien. Bulletin de la Société Mathématique de France 1965, 93, 273–299.

[17] Pang, J.S.; Ralph, D. Piecewise Smoothness, Local Invertibility, and Parametric Analysis of Normal Maps. Math. Oper. Res. 1996, 21 (2), 401–426.

[18] Poljak, B.T. A General Method of Solving Extremum Problems. Dokl. Akad. Nauk SSSR 1967, 174, 33–36.

[19] Qi, L. Second-order Analysis of the Moreau-Yosida Regularization of a Convex Function. Preprint, revised version; School of Mathematics, University of New South Wales: Sydney, Australia, 1995.

[20] Qi, L.; Chen, X. A Preconditioning Proximal Newton Method for Nondifferentiable Convex Optimization. Math. Programming 1997, 76 (3), 411–429.

[21] Qi, L.; Sun, J. A Nonsmooth Version of Newton's Method. Math. Programming 1993, 58 (3), 353–367.

[22] Rockafellar, R.T. Convex Analysis. Princeton University Press: Princeton, New Jersey, 1970.

[23] Rockafellar, R.T. Maximal Monotone Relations and the Second Derivatives of Nonsmooth Functions. Ann. Inst. H. Poincaré Anal. Non Linéaire 1985, 2 (3), 167–184.

[24] Sun, D.; Han, J. On a Conjecture in Moreau-Yosida Regularization of a Nonsmooth Convex Function. Chinese Sci. Bull. 1997, 42 (17), 1410–1413.

[25] Yosida, K. Functional Analysis. Springer-Verlag: Berlin, 1964.

[26] Zhao, G.; Meng, F. On Piecewise Smoothness of a Lagrangian-dual Function and Its Moreau-Yosida Regularization. Research Report; DAMTP, University of Cambridge, 2000.