Subgradient Projectors: Extensions, Theory, and Characterizations

Heinz H. Bauschke, Caifang Wang, Xianfu Wang, and Jia Xu

April 13, 2017

Abstract

Subgradient projectors play an important role in optimization and for solving convex feasibility problems. For every locally Lipschitz function, we can define a subgradient projector via generalized subgradients even if the function is not convex. The paper consists of three parts. In the first part, we study basic properties of subgradient projectors and give characterizations of when a subgradient projector is a cutter, a local cutter, or a quasi-nonexpansive mapping. We present global and local convergence analyses of subgradient projectors. Many examples are provided to illustrate the theory. In the second part, we investigate the relationship between the subgradient projector of a prox-regular function and the subgradient projector of its Moreau envelope. We also characterize when a mapping is the subgradient projector of a convex function. In the third part, we focus on linearity properties of subgradient projectors. We show that, under appropriate conditions, a linear operator is a subgradient projector of a convex function if and only if it is a convex combination of the identity operator and a projection operator onto a subspace. In general, neither a convex combination nor a composition of subgradient projectors of convex functions is a subgradient projector of a convex function.

Mathematics Subject Classification: Primary 49J52; Secondary 49J53, 47H04, 47H05, 47H09.

Keywords: Approximately convex function, averaged mapping, cutter, essentially strictly differentiable function, fixed point, limiting subgradient, local cutter, local quasi-firmly nonexpansive mapping, local quasi-nonexpansive mapping, locally Lipschitz function, linear cutter, linear firmly nonexpansive mapping, linear subgradient projection operator, Moreau envelope, projection, prox-bounded, proximal mapping, prox-regular function, quasi-firmly nonexpansive mapping, quasi-nonexpansive mapping, (C, ε)-firmly nonexpansive mapping, subdifferentiable function, subgradient projection operator.

1 Introduction

Studies of optimization problems and convex feasibility problems have led in recent years to the development of a theory of subgradient projectors; a subgradient projector is a projection onto a certain half-space.

Affiliations: Mathematics, University of British Columbia, Kelowna, B.C. V1V 1V7, Canada (heinz.bauschke@ubc.ca, shawn.wang@ubc.ca, jia.xu@ubc.ca); Department of Mathematics, Shanghai Maritime University, Shanghai, China (cfwang@shmtu.edu.cn).

Rather than finding projections onto level sets of the original functions, the iterative algorithms find projections onto half-spaces that contain the 0-level set of the function. Polyak developed the subgradient projector iteration for convex functions [46, 47, 48]; it was further developed by Censor, Combettes, Fukushima, Kiwiel, Yamada and others, and applied to many kinds of optimization problems [22, 21, 9, 24, 25, 29, 34, 35, 44, 19, 55]. In [12], we gave a systematic study of subgradient projectors of convex functions.

Convexity is often too strong an assumption for the needs of applications. In a recent work [43], Pang studied finitely convergent algorithms for nonconvex inequality problems involving approximately convex functions. The subgradient projector used by Pang is based on the Clarke subdifferential instead of the Mordukhovich limiting subdifferential. To this day, however, a systematic theory of the subgradient projector for possibly nonconvex functions is lacking. The goal of this paper is to develop the basic theory of subgradient projectors for possibly nonconvex functions on a finite-dimensional space, aimed ultimately at applications to diverse problems of nonconvex optimization. Nondifferentiable and nonconvex functions arise in many optimization problems. As far as nonconvex functions are concerned, the cutter theory (or T-class) developed by Cegielski [20], Bauschke, Borwein and Combettes [8], and Bauschke, Wang, Wang and Xu [13] furnishes a new approach to subgradient projectors, without appealing to the existence theory of subgradient projectors for convex functions. Our study shows that subgradient projectors of nonconvex functions have many attractive analytical properties. Among the results presented here, we discover that while cutters and quasi-nonexpansive mappings on $\mathbb{R}^n$ are global notions, cutters and quasi-nonexpansive mappings on a neighborhood are more useful for functions that are locally convex around the desired point, say a critical point or a feasible point. This paper not only includes some results from [54], but also many refinements and new advances. Since definitions and proofs are much simpler in finite-dimensional spaces, and many technical complications do not even appear, we shall work in finite-dimensional spaces only.

For the convenience of readers, our main results are presented in three parts. In the first part, we study extensions and theory of subgradient projectors. In the second part, we consider subgradient projectors of Moreau envelopes and conditions under which a mapping is the subgradient projector of a convex function. The third part is devoted to linear subgradient projectors.

The remainder of this paper is organized as follows. Part I consists of Sections 2-6. Section 2 provides an extension of subgradient projectors from convex functions to possibly nonconvex functions; Section 3 is devoted to calculus of subgradient projectors; Section 4 deals with whether one can recover a function from its subgradient projector, and with the fixed-point closed property of a subgradient projector; conditions on functions under which their subgradient projectors are cutters or local cutters are presented in Section 5. Section 6 is devoted to convergence analysis of subgradient projectors using the theory of cutters, local cutters, quasi-nonexpansive mappings, and local quasi-nonexpansive mappings.
Under appropriate assumptions, we show that subgradient projectors are $(C, \varepsilon)$-firmly nonexpansive, a very useful concept introduced by Hesse and Luke for studying local linear convergence of a variety of algorithms. Part II consists of Sections 7-8. For prox-bounded and prox-regular functions, their Moreau envelopes are differentiable. Section 7 studies the subgradient projectors of Moreau envelopes of prox-bounded and prox-regular functions, and their connections to subgradient projectors of the original functions. We show that if $f$ is proper, lsc, prox-bounded, and prox-regular on $\mathbb{R}^n$, then $f$ is a difference of convex functions (Corollary 7.9); and that if $f$ is $C^2$, $\min f = 0$, and $\nabla f(x) \neq 0$ for every $x \in \mathbb{R}^n \setminus \operatorname{argmin} f$, then the subgradient projector $G_f$ of $f$ is a cutter if and only if the subgradient projector $G_{e_\lambda f}$ of the envelope $e_\lambda f$ is a cutter for every $\lambda > 0$ (Propositions 7.22 and 7.24). Section 8 characterizes when a mapping is actually a subgradient projector of a convex function. Part III consists of Sections 9-11. It is interesting to ask when a subgradient projector is linear, and what special properties a linear subgradient projector possesses. To the best of our knowledge, this question has not been explored in the literature. Section 9 studies linear subgradient projectors and their distinguished features. In particular, we give a nonlinear cutter which is nonexpansive but not firmly nonexpansive; the example is much simpler than the one given by Cegielski [20]. In Section 10, using results from Section 9, we show that in general neither a convex combination nor a composition of subgradient projectors of convex functions is a subgradient projector of a convex function. Finally, in Section 11, we completely characterize linear subgradient projectors on $\mathbb{R}^2$, and give explicit formulae for the corresponding functions.

The notation that we employ is for the most part standard; however, a partial list is provided for the reader's convenience. Throughout this paper, $\mathbb{R}^n$ is the $n$-dimensional Euclidean space with inner product $\langle \cdot, \cdot \rangle$ and induced norm $\|\cdot\|$, i.e., $(\forall x \in \mathbb{R}^n)\ \|x\| := \sqrt{\langle x, x \rangle}$. The identity operator on $\mathbb{R}^n$ is $\operatorname{Id}$. For a mapping $T : \mathbb{R}^n \to \mathbb{R}^n$, its fixed point set is denoted by $\operatorname{Fix} T := \{ x \in \mathbb{R}^n \mid Tx = x \}$; its kernel is $\ker T := \{ x \in \mathbb{R}^n \mid Tx = 0 \}$; its range is $\operatorname{ran} T := \{ y \in \mathbb{R}^n \mid y = Tx \text{ for some } x \in \mathbb{R}^n \}$. For a function $f : \mathbb{R}^n \to (-\infty, +\infty]$, its $\alpha$-level set is denoted by $\operatorname{lev}_{\leq \alpha} f := \{ x \in \mathbb{R}^n \mid f(x) \leq \alpha \}$; its effective domain is $\operatorname{dom} f := \{ x \in \mathbb{R}^n \mid f(x) < +\infty \}$. For a set-valued mapping $F : \mathbb{R}^n \rightrightarrows \mathbb{R}^m$, the domain, range and fixed point set of $F$ are given by $\operatorname{dom} F := \{ x \in \mathbb{R}^n \mid F(x) \neq \varnothing \}$, $\operatorname{ran} F := \bigcup \{ F(x) \mid x \in \mathbb{R}^n \}$, and $\operatorname{Fix} F := \{ x \in \mathbb{R}^n \mid x \in F(x) \}$, respectively. We use $B(x, \delta)$ for the closed ball centered at $x \in \mathbb{R}^n$ with radius $\delta > 0$. $\mathbb{R}_+$ denotes the set of nonnegative real numbers, and $\mathbb{N}$ denotes the set of nonnegative integers $\{0, 1, 2, \ldots\}$. For a set $C \subseteq \mathbb{R}^n$, its distance function is

$$d_C : \mathbb{R}^n \to [0, +\infty) : x \mapsto \inf \{ \|x - y\| \mid y \in C \},$$

and the projection operator onto $C$ is

$$P_C : \mathbb{R}^n \rightrightarrows C : x \mapsto \{ p \in C \mid \|x - p\| = d_C(x) \}.$$

The indicator function $\iota_C : \mathbb{R}^n \to (-\infty, +\infty]$ of $C$ is defined by $\iota_C(x) := 0$ if $x \in C$ and $\iota_C(x) := +\infty$ if $x \notin C$. We write $\operatorname{int} C$ for the interior of $C$, and $\operatorname{bdry}(C) := \overline{C} \setminus \operatorname{int} C$ for the boundary of $C$. For a subspace $L \subseteq \mathbb{R}^n$, its orthogonal complement is $L^{\perp} := \{ y \in \mathbb{R}^n \mid \langle y, x \rangle = 0 \ \forall x \in L \}$. For $x, y \in \mathbb{R}^n$, the line segment between $x$ and $y$ is $[x, y] := \{ (1-\lambda)x + \lambda y \mid 0 \leq \lambda \leq 1 \}$.

Part I

Extensions to possibly nonconvex functions and basic theory

2 An extension of subgradient projectors via limiting subgradients

To introduce subgradient projectors for possibly nonconvex functions, we need the following generalized subgradients [51, 40, 39, 31].

Definition 2.1 Consider a function $f : \mathbb{R}^n \to (-\infty, +\infty]$ and a point $\bar{x} \in \mathbb{R}^n$ with $f(\bar{x})$ finite. For a vector $v \in \mathbb{R}^n$, one says that

(i) $v$ is a regular (or Fréchet) subgradient of $f$ at $\bar{x}$, written $v \in \hat{\partial} f(\bar{x})$, if

$$f(x) \geq f(\bar{x}) + \langle v, x - \bar{x} \rangle + o(\|x - \bar{x}\|);$$

(ii) $v$ is a limiting (or Mordukhovich) subgradient of $f$ at $\bar{x}$, written $v \in \partial f(\bar{x})$, if there are sequences $x^{\nu} \to \bar{x}$ with $f(x^{\nu}) \to f(\bar{x})$, and $v^{\nu} \in \hat{\partial} f(x^{\nu})$ with $v^{\nu} \to v$.

A locally Lipschitz function is subdifferentially regular at $\bar{x}$ with $f(\bar{x})$ finite if $\partial f(\bar{x}) = \hat{\partial} f(\bar{x})$; see [51, Corollary 8.11], [39]. It is well known that when $f$ is locally Lipschitz, $\partial f$ is nonempty-valued everywhere; when $f$ is lower semicontinuous (lsc), the set of points at which $\partial f$ is nonempty-valued is at least dense in the domain of $f$ [51, Corollary 8.10]. Furthermore, $\partial f$ is the usual Fenchel subdifferential when $f$ is convex. All of these results can be found in [18, 17, 40, 51].

A function $f : \mathbb{R}^n \to \mathbb{R}$ is called subdifferentiable if $\partial f(x) \neq \varnothing$ for every $x \in \mathbb{R}^n$. While every locally Lipschitz function on $\mathbb{R}^n$ is a subdifferentiable function, a subdifferentiable function might not be locally Lipschitz, e.g.,

$$f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} 1 & \text{if } x \leq 0, \\ 1 - \sqrt{x} & \text{if } x > 0; \end{cases}$$

see also [51, page 359].

The key concept we shall study is the subgradient projection operator.

Definition 2.2 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $s : \mathbb{R}^n \to \mathbb{R}^n$ be a selection of $\partial f$. The subgradient projector of $f$ is defined by

$$(1) \qquad G_{f,s} : \mathbb{R}^n \to \mathbb{R}^n : x \mapsto \begin{cases} x - \dfrac{f(x)}{\|s(x)\|^2}\, s(x) & \text{if } f(x) > 0 \text{ and } 0 \notin \partial f(x), \\ x & \text{otherwise.} \end{cases}$$

When it is not necessary to emphasize the selection $s$, we will write $G_f$. It is also convenient to introduce the set-valued mapping associated with $f$ by

$$(2) \qquad \mathcal{G}_f : \mathbb{R}^n \rightrightarrows \mathbb{R}^n : x \mapsto \{ G_{f,s}(x) \mid s \text{ is a selection of } \partial f \}$$

with $G_{f,s}$ given in (1).
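To make Definition 2.2 concrete, here is a minimal numerical sketch in Python (ours, not from the paper). It assumes $f$ is differentiable at the queried points, so the gradient is the only selection of $\partial f$, and it treats a numerically zero subgradient as the case $0 \in \partial f(x)$.

```python
import numpy as np

def subgradient_projector(f, s, x, tol=1e-12):
    """A sketch of G_{f,s} from equation (1) for a differentiable f.

    f: callable returning f(x); s: callable returning a subgradient s(x).
    """
    fx, sx = f(x), np.asarray(s(x), dtype=float)
    # Return x unchanged when f(x) <= 0 or s(x) is (numerically) zero.
    if fx <= 0 or np.linalg.norm(sx) <= tol:
        return np.asarray(x, dtype=float)
    return x - (fx / np.dot(sx, sx)) * sx

# Demo with f(x) = ||x||^2 - 1; one application moves x toward lev_{<=0} f.
f = lambda x: np.dot(x, x) - 1.0
s = lambda x: 2.0 * x
print(subgradient_projector(f, s, np.array([2.0, 0.0])))  # [1.25, 0.]
```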

Although subgradient projectors have been well studied for convex functions [12, 21, 24, 20, 44, 46, 48, 55, 42], the extension to possibly nonconvex functions is new. When $f$ is convex and $\inf_{\mathbb{R}^n} f \leq 0$, $G_{f,s}$ reduces to

$$G_{f,s} : \mathbb{R}^n \to \mathbb{R}^n : x \mapsto \begin{cases} x - \dfrac{f(x)}{\|s(x)\|^2}\, s(x) & \text{if } f(x) > 0, \\ x & \text{otherwise,} \end{cases}$$

where $s : \mathbb{R}^n \to \mathbb{R}^n$ is a selection of $\partial f$ with $s(x) \in \partial f(x)$. When $f$ is continuously differentiable on $\mathbb{R}^n \setminus \operatorname{lev}_{\leq 0} f$, $G_f$ reduces to

$$G_f : \mathbb{R}^n \to \mathbb{R}^n : x \mapsto \begin{cases} x - \dfrac{f(x)}{\|\nabla f(x)\|^2}\, \nabla f(x) & \text{if } f(x) > 0 \text{ and } \nabla f(x) \neq 0, \\ x & \text{otherwise.} \end{cases}$$

The geometric interpretation and motivation of the subgradient projector come from the following:

Proposition 2.3 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $s$ be a selection of $\partial f$.

(i) Whenever $f(x) > 0$ and $0 \notin \partial f(x)$, we have $G_{f,s}(x) = P_{H_{\leq}(s(x),x)}(x)$, where the half-space is

$$(3) \qquad H_{\leq}(s(x), x) := \{ z \in \mathbb{R}^n \mid f(x) + \langle s(x), z - x \rangle \leq 0 \}.$$

(ii) The fixed point set of $G_{f,s}$ is $\operatorname{Fix} G_{f,s} = \{ x \in \mathbb{R}^n \mid 0 \in \partial f(x) \} \cup \operatorname{lev}_{\leq 0} f = \operatorname{Fix} \mathcal{G}_f$. If $f$ is locally Lipschitz, then $\operatorname{Fix} G_{f,s}$ is closed.

(iii) If $f$ is convex and $\inf_{\mathbb{R}^n} f \leq 0$, then $\operatorname{Fix} G_{f,s} = \operatorname{lev}_{\leq 0} f$.

Proof. (i). According to [7] or [20, page 133], for the half-space $H_{\leq}(a, \beta) := \{ z \in \mathbb{R}^n \mid \langle a, z \rangle \leq \beta \}$, where $a \in \mathbb{R}^n$, $a \neq 0$ and $\beta \in \mathbb{R}$, the metric projection is given by

$$(4) \qquad P_{H_{\leq}(a,\beta)}x = \begin{cases} x - \dfrac{\langle a, x \rangle - \beta}{\|a\|^2}\, a & \text{if } \langle a, x \rangle > \beta, \\ x & \text{if } \langle a, x \rangle \leq \beta. \end{cases}$$

Apply (4) with $a := s(x)$ and $\beta := \beta(x) = \langle s(x), x \rangle - f(x)$.

(ii). This follows from the definition of $G_f$. When $f$ is locally Lipschitz, $\partial f$ is upper semicontinuous [51, Proposition 8.7], so $\{ x \in \mathbb{R}^n \mid 0 \in \partial f(x) \}$ is closed. Being a union of two closed sets, $\operatorname{Fix} G_{f,s}$ is closed.

(iii). When $f$ is convex, $0 \in \partial f(x)$ gives $f(x) = \min_{\mathbb{R}^n} f$, so $f(x) \leq 0$. Then $\{ x \in \mathbb{R}^n \mid 0 \in \partial f(x) \} \subseteq \operatorname{lev}_{\leq 0} f$. Thus (iii) follows from (ii).
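The projection identity in Proposition 2.3(i) can be checked numerically; the following sketch (ours, assuming a smooth convex $f$ with its gradient as the selection) compares $G_{f,s}(x)$ with a direct implementation of the half-space projection formula (4).

```python
import numpy as np

def project_halfspace(a, beta, x):
    """Project x onto {z : <a, z> <= beta} via formula (4); requires a != 0."""
    gap = np.dot(a, x) - beta
    return x - (gap / np.dot(a, a)) * a if gap > 0 else x

f = lambda x: np.dot(x, x) - 1.0     # f(x) = ||x||^2 - 1
s = lambda x: 2.0 * x                # gradient selection
x = np.array([2.0, 1.0])
via_G = x - (f(x) / np.dot(s(x), s(x))) * s(x)
# Apply (4) with a := s(x) and beta := <s(x), x> - f(x), as in the proof.
via_P = project_halfspace(s(x), np.dot(s(x), x) - f(x), x)
print(np.allclose(via_G, via_P))     # True
```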

Remark 2.4 (i) Proposition 2.3(i) uses the Euclidean distance. Following [36, 8, 11, 10], one may define Bregman subgradient projectors for lsc and subdifferentiable functions. This will be explored in future work.

(ii) Proposition 2.3(ii) shows that for subdifferentiable functions, a fixed point of $G_f$ gives $x \in \mathbb{R}^n$ such that $0 \in \partial f(x)$ or $f(x) \leq 0$. Proposition 2.3(iii) shows that for convex functions, a fixed point of $G_f$ gives $x \in \mathbb{R}^n$ such that $f(x) \leq 0$.

We give two simple examples to illustrate the difference of subgradient projectors between convex and nonconvex functions.

Example 2.5 Consider $f : \mathbb{R}^n \to \mathbb{R} : x \mapsto \sqrt[k]{\|x\|} = \|x\|^{1/k}$ where $k > 0$. Then

(i) When $k \leq 1$, $f$ is convex, and $G_f = (1-k)\operatorname{Id}$ is firmly nonexpansive.

(ii) When $k > 1$, $f$ is not convex, and $G_f = (1-k)\operatorname{Id}$ is not monotone and need not be nonexpansive, e.g., $k = 3$.

Proof. For $x \neq 0$, we have

$$\nabla f(x) = \frac{1}{k}\,\|x\|^{1/k - 1}\,\frac{x}{\|x\|},$$

and the result follows from the definition of $G_f$.

Let $B$ denote the closed unit ball of $\mathbb{R}^n$. According to [51, Exercise 8.14], for a nonempty $C \subseteq \mathbb{R}^n$ the normal cone and regular normal cone mappings are respectively defined by $N_C := \partial \iota_C$ and $\hat{N}_C := \hat{\partial} \iota_C$. Recall

Fact 2.6 ([51, Example 8.53]) For $f := d_C$ in the case of a closed set $C \neq \varnothing$ in $\mathbb{R}^n$, one has at any point $\bar{x} \in C$ that

$$\partial f(\bar{x}) = N_C(\bar{x}) \cap B, \qquad \hat{\partial} f(\bar{x}) = \hat{N}_C(\bar{x}) \cap B.$$

On the other hand, for any $\bar{x} \notin C$, one has

$$\partial f(\bar{x}) = \frac{\bar{x} - P_C(\bar{x})}{d_C(\bar{x})}, \qquad \hat{\partial} f(\bar{x}) = \begin{cases} \left\{ \dfrac{\bar{x} - \tilde{x}}{d_C(\bar{x})} \right\} & \text{if } P_C(\bar{x}) = \{\tilde{x}\}, \\ \varnothing & \text{otherwise.} \end{cases}$$

Example 2.7 (subgradient projectors of distance functions) Let $C \neq \varnothing$ be a closed set in $\mathbb{R}^n$. Then $\mathcal{G}_{d_C} = P_C$.

Proof. Let $s$ be a selection of $\partial d_C$. By Fact 2.6,

$$(\forall x \notin C) \qquad s(x) = \frac{x - p(x)}{d_C(x)},$$

where $p(x) \in P_C(x)$. We show $G_{d_C,s} = p$.

When $x \in C$, we have $P_C(x) = \{x\}$ and $p(x) = x$. Because $d_C(x) = 0$ for $x \in C$, the definition of $G_{d_C,s}$ gives $G_{d_C,s}(x) = x$. Thus $G_{d_C,s}(x) = x = p(x)$ for $x \in C$.

When $x \notin C$, $d_C(x) > 0$ and $0 \notin \partial d_C(x)$ because every $v \in \partial d_C(x)$ has $\|v\| = 1$ by Fact 2.6. Then for $x \notin C$,

$$G_{d_C,s}(x) = x - d_C(x)\,\frac{x - p(x)}{d_C(x)} = p(x).$$

Altogether, $G_{d_C,s}(x) = p(x)$ for every $x \in \mathbb{R}^n$.

When $C$ is nonempty, closed and convex, the projection mapping $P_C$ is single-valued, and $d_C$ is continuously differentiable on $\mathbb{R}^n \setminus C$. Example 2.7 implies:

Fact 2.8 ([11], [25]) Let $C \subseteq \mathbb{R}^n$ be nonempty, closed and convex. Then $G_{d_C} = P_C$.

What happens if we take the subgradient projector of a distance function to a set where the distance is taken with respect to another norm? The following example illustrates that using the Euclidean norm for $d_C$ in Example 2.7 is essential.

Example 2.9 Define $f : \mathbb{R}^2 \to \mathbb{R} : (x_1, x_2) \mapsto |x_1| + |x_2|$, the distance function to $C := \{(0,0)\}$ in the $\ell^1$-norm. When $x_1 > 0$, $x_2 > 0$, $x_1 \neq x_2$, we have

$$G_f(x_1, x_2) = \left( \frac{x_1 - x_2}{2}, \frac{x_2 - x_1}{2} \right) \neq (0,0) = P_C(x_1, x_2).$$

Even using the dual norm of $\|\cdot\|_1$ for $s(x_1, x_2)$, we have $G_f(x_1, x_2) = (-x_2, -x_1) \neq (0,0) = P_C(x_1, x_2)$.

Remark 2.10 Example 2.7 might lead the reader to believe that $G_f$ is a monotone operator. This holds for any twice differentiable convex function $f : \mathbb{R} \to \mathbb{R}$; see [12, Proposition 8.2]. However, this fails for $f : \mathbb{R}^2 \to \mathbb{R} : (x_1, x_2) \mapsto |x_1|^p + |x_2|^p$ when $1 < p < 2$; see [12, Proposition 10.1(iii)].

The following example shows that the assumption that $f$ is subdifferentiable is important in Definition 2.2.

Example 2.11 The function defined by

$$f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} 1/|x| & \text{if } x \neq 0, \\ 0 & \text{if } x = 0, \end{cases}$$

has $G_f = 2\operatorname{Id}$ on $\mathbb{R}$, so that $\operatorname{Fix} G_f = \operatorname{Fix} 2\operatorname{Id} = \{0\}$. However, the function defined by

$$g : \mathbb{R} \to (-\infty, +\infty] : x \mapsto \begin{cases} 1/|x| & \text{if } x \neq 0, \\ +\infty & \text{if } x = 0, \end{cases}$$

has $G_g = 2\operatorname{Id}$ on $\mathbb{R} \setminus \{0\}$. Because $G_g$ is not defined at $x = 0$, we have $\operatorname{Fix} G_g = \varnothing$ but $\operatorname{Fix} 2\operatorname{Id} = \{0\}$.

3 Calculus for subgradient projectors

In this section we obtain calculus results for subgradient projectors defined in Section 2 related to representations of subgradient projectors for max functions, compositions of functions with a linear operator, and positive powers of nonnegative functions. Subdifferential calculus is the main tool for proving these results.
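Before turning to the calculus rules, here is a small numerical check (ours) of Example 2.7 with a nonconvex two-point set $C$, using the unit subgradient of $d_C$ from Fact 2.6.

```python
import numpy as np

# Example 2.7 check: for a (nonconvex) finite set C, G_{d_C} = P_C.
C = np.array([[0.0, 0.0], [3.0, 0.0]])           # two points in R^2

def proj_C(x):
    return C[np.argmin(np.linalg.norm(C - x, axis=1))]

def G_dC(x):
    p = proj_C(x)
    d = np.linalg.norm(x - p)
    if d == 0.0:
        return x
    s = (x - p) / d                              # Fact 2.6: the unit subgradient
    return x - d * s                             # = p, since ||s|| = 1

x = np.array([2.0, 1.0])
print(G_dC(x), proj_C(x))                        # identical outputs
```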

A mapping $\Phi : \mathbb{R}^n \to \mathbb{R}^k$ is called strictly differentiable at $\bar{x}$ if the Fréchet derivative $\nabla\Phi(\bar{x})$ exists and

$$\lim_{\substack{x, y \to \bar{x} \\ x \neq y}} \frac{\Phi(x) - \Phi(y) - \nabla\Phi(\bar{x})(x - y)}{\|x - y\|} = 0.$$

The following facts on subdifferentials are crucial for studying the calculus of subgradient projectors.

Fact 3.1 ([39, Theorem 6.5], [40, Theorem 1.110(ii)]) Assume that $F : \mathbb{R}^n \to \mathbb{R}^k$ is locally Lipschitz at $\bar{x} \in \mathbb{R}^n$, and $g : \mathbb{R}^k \to \mathbb{R}$ is strictly differentiable at $F(\bar{x})$. Then for $f(x) = g(F(x))$, one has

$$\partial f(\bar{x}) = \partial \langle \nabla g(\bar{y}), F \rangle(\bar{x}) \quad \text{with } \bar{y} = F(\bar{x}).$$

For a matrix $A : \mathbb{R}^n \to \mathbb{R}^n$, let $A^{\top}$ denote its transpose.

Fact 3.2 ([39, Theorem 6.7(i)], [40, Proposition 1.112(i)], or [51, Exercise 10.7]) Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be strictly differentiable at $\bar{x}$ with $\nabla F(\bar{x})$ invertible, and suppose that $f(x) = g(F(x))$ with $g : \mathbb{R}^n \to (-\infty, +\infty]$ lsc around $\bar{y} = F(\bar{x})$ and $f$ finite at $\bar{x}$. Then

$$\partial f(\bar{x}) = \left( \nabla F(\bar{x}) \right)^{\top} \partial g(\bar{y}) \quad \text{with } \bar{y} = F(\bar{x}).$$

Fact 3.3 ([39, Theorem 7.5(ii)], [40, Theorem 3.46(ii)]) Let $f_1, f_2 : \mathbb{R}^n \to \mathbb{R}$ be locally Lipschitz at $\bar{x}$ and $J(\bar{x}) := \{ j \mid f_j(\bar{x}) = \max\{f_1, f_2\}(\bar{x}) \}$. Then

$$\partial \max\{f_1, f_2\}(\bar{x}) \subseteq \operatorname{conv}\{ \partial f_j(\bar{x}) \mid j \in J(\bar{x}) \},$$

where the equality holds and $\max\{f_1, f_2\}$ is subdifferentially regular at $\bar{x}$ if the function $f_j$ is subdifferentially regular at $\bar{x}$ for $j \in J(\bar{x})$.

Proposition 3.4 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable.

(i) If $k > 0$ then $\mathcal{G}_{kf} = \mathcal{G}_f$.

(ii) Let $\alpha \in \mathbb{R}$ and $s$ be a selection of $\partial f$. Define

$$G_{f,\alpha} : \mathbb{R}^n \to \mathbb{R}^n : x \mapsto \begin{cases} x - \dfrac{f(x) - \alpha}{\|s(x)\|^2}\, s(x) & \text{if } f(x) > \alpha \text{ and } 0 \notin \partial f(x), \\ x & \text{otherwise.} \end{cases}$$

Then $G_{f,\alpha} = G_{f-\alpha,s}$.

(iii) Let $\alpha > 0$ and $s$ be a selection of $\partial f$. Then

$$G_{f-\alpha,s}(x) = \begin{cases} G_{f,s}(x) + \dfrac{\alpha}{\|s(x)\|^2}\, s(x) & \text{if } f(x) > \alpha \text{ and } 0 \notin \partial f(x), \\ x & \text{otherwise.} \end{cases}$$

Proof. (i). By Fact 3.1, $\partial(kf) = k\,\partial f$. Note that $kf(x) > 0$ if and only if $f(x) > 0$, and $0 \in \partial(kf)(x)$ if and only if $0 \in \partial f(x)$. When $kf(x) > 0$ and $0 \notin \partial(kf)(x)$, for $s(x) \in \partial f(x)$ we have $ks(x) \in \partial(kf)(x)$, so that

$$G_{kf,ks}(x) = x - \frac{kf(x)}{\|ks(x)\|^2}\, ks(x) = x - \frac{f(x)}{\|s(x)\|^2}\, s(x) = G_{f,s}(x).$$

When $kf(x) \leq 0$ or $0 \in \partial(kf)(x)$, we have $f(x) \leq 0$ or $0 \in \partial f(x)$, so $G_{kf,ks}(x) = x = G_{f,s}(x)$.

(ii). It suffices to note that $\partial(f - \alpha) = \partial f$.

(iii). When $f(x) > \alpha$ and $0 \notin \partial f(x)$, we have $f(x) > 0$ and $0 \notin \partial f(x)$. Then

$$G_{f-\alpha,s}(x) = x - \frac{f(x) - \alpha}{\|s(x)\|^2}\, s(x) = x - \frac{f(x)}{\|s(x)\|^2}\, s(x) + \frac{\alpha}{\|s(x)\|^2}\, s(x) = G_{f,s}(x) + \frac{\alpha}{\|s(x)\|^2}\, s(x).$$

When $f(x) \leq \alpha$ or $0 \in \partial f(x)$, $G_{f-\alpha,s}(x) = x$ by the definition.

Proposition 3.5 Assume that $f_1, f_2 : \mathbb{R}^n \to \mathbb{R}$ are locally Lipschitz and subdifferentially regular. For the maximum function $g := \max\{f_1, f_2\}$, one has

$$\mathcal{G}_g(x) = \begin{cases} \mathcal{G}_{f_1}(x) & \text{if } g(x) > \max\{f_2(x), 0\} \text{ and } 0 \notin \partial f_1(x), \\ \mathcal{G}_{f_2}(x) & \text{if } g(x) > \max\{f_1(x), 0\} \text{ and } 0 \notin \partial f_2(x), \\ V(x) & \text{if } g(x) = f_1(x) = f_2(x) > 0 \text{ and } 0 \notin \operatorname{conv}\left( \partial f_1(x) \cup \partial f_2(x) \right), \\ x & \text{if } g(x) \leq 0, \text{ or } 0 \in \operatorname{conv}\left( \partial f_1(x) \cup \partial f_2(x) \right), \end{cases}$$

where

$$V(x) := \left\{ x - \frac{f_i(x)}{\|s(x)\|^2}\, s(x) \;\middle|\; s(x) \in \operatorname{conv}\left( \partial f_1(x) \cup \partial f_2(x) \right) \right\}.$$

Proof. When $g(x) > 0$, we consider three cases: (i) $g(x) > f_2(x)$; (ii) $g(x) > f_1(x)$; (iii) $g(x) \leq \min\{f_2(x), f_1(x)\}$, which means $g(x) = f_1(x) = f_2(x)$. Also note that $\partial g(x) = \operatorname{conv}\left( \partial f_1(x) \cup \partial f_2(x) \right)$ when $f_1(x) = f_2(x)$, by Fact 3.3.

Proposition 3.6 Assume that $f : \mathbb{R}^n \to \mathbb{R}$ is lsc and subdifferentiable, and that $g(x) := f(kx)$ with $0 \neq k \in \mathbb{R}$. Then

$$\mathcal{G}_g(x) = \frac{1}{k}\, \mathcal{G}_f(kx)$$

for every $x \in \mathbb{R}^n$. Moreover, $\operatorname{Fix} \mathcal{G}_g = \frac{1}{k} \operatorname{Fix} \mathcal{G}_f$.

Proof. By Fact 3.2, $\partial g(x) = k\,\partial f(y)$ where $y = kx$, so $0 \in \partial g(x)$ if and only if $0 \in \partial f(y)$ with $y = kx$. Let $s$ be a selection of $\partial f$. When $g(x) > 0$ and $0 \notin \partial g(x)$, we have $f(kx) > 0$ and $0 \notin \partial f(kx)$; therefore

$$(5)\text{-}(6) \qquad G_{g,ks(k\cdot)}(x) = x - \frac{f(kx)}{\|ks(y)\|^2}\, ks(y) = x - \frac{1}{k}\,\frac{f(kx)}{\|s(y)\|^2}\, s(y) = \frac{1}{k}\left( kx - \frac{f(kx)}{\|s(y)\|^2}\, s(y) \right) = \frac{1}{k}\, G_{f,s}(kx),$$

where $s(y) \in \partial f(y)$ with $y = kx$.

When $g(x) \leq 0$ or $0 \in \partial g(x)$, we have $f(y) \leq 0$ or $0 \in \partial f(y)$ with $y = kx$; thus $G_{g,ks(k\cdot)}(x) = x = \frac{1}{k}\, kx = \frac{1}{k}\, G_{f,s}(kx)$. This establishes the result.
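The scaling rules above are easy to test numerically. The sketch below (ours, for a smooth $f$ with its gradient as the selection) verifies Proposition 3.4(i) and Proposition 3.6 at a sample point.

```python
import numpy as np

def G(f, s, x):
    fx, sx = f(x), s(x)
    return x if fx <= 0 or not np.any(sx) else x - (fx / np.dot(sx, sx)) * sx

f = lambda x: np.dot(x, x) - 1.0
s = lambda x: 2.0 * x
x = np.array([1.5, 0.5])
k = -2.5

# Proposition 3.4(i): G_{kf} = G_f (here with the positive scalar 5).
print(np.allclose(G(lambda y: 5 * f(y), lambda y: 5 * s(y), x), G(f, s, x)))

# Proposition 3.6: for g(x) = f(kx), G_g(x) = G_f(kx)/k.
g, sg = lambda y: f(k * y), lambda y: k * s(k * y)
print(np.allclose(G(g, sg, x), G(f, s, k * x) / k))    # True
```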

Proposition 3.7 Let $A : \mathbb{R}^n \to \mathbb{R}^n$ be unitary and $b \in \mathbb{R}^n$, and let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable. Define $g : \mathbb{R}^n \to \mathbb{R} : x \mapsto f(Ax + b)$. Then

$$(7) \qquad \mathcal{G}_g(x) = A^{\top}\left( \mathcal{G}_f(Ax + b) - b \right)$$

for every $x \in \mathbb{R}^n$. Furthermore,

$$(8) \qquad \operatorname{Fix} \mathcal{G}_g = A^{\top}\left( \operatorname{Fix} \mathcal{G}_f - b \right).$$

Proof. Let $s$ be a selection of $\partial f$. By Fact 3.2, $\partial g(x) = A^{\top}\partial f(y)$ where $y = Ax + b$. As $A$ is unitary, $\|A^{\top}s(y)\| = \|s(y)\|$ for every $s(y) \in \partial f(y)$. When $g(x) > 0$ and $0 \notin \partial g(x)$, we have $f(Ax+b) > 0$ and $0 \notin \partial f(y)$ with $y = Ax + b$; therefore

$$(9)\text{-}(10) \qquad G_{g,A^{\top}s(A\cdot+b)}(x) = x - \frac{f(Ax+b)}{\|A^{\top}s(y)\|^2}\, A^{\top}s(y) = A^{\top}\left( Ax + b - \frac{f(Ax+b)}{\|s(y)\|^2}\, s(y) - b \right) = A^{\top}\left( G_{f,s}(Ax+b) - b \right).$$

When $g(x) \leq 0$ or $0 \in \partial g(x)$, we have $f(y) \leq 0$ or $0 \in \partial f(y)$ with $y = Ax + b$; thus $G_{g,A^{\top}s(A\cdot+b)}(x) = x = A^{\top}(Ax + b - b) = A^{\top}(G_{f,s}(Ax+b) - b)$. Hence (7) holds. Finally, (8) follows from (7).

Corollary 3.8 Let $a \in \mathbb{R}^n$, let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $g(x) := f(x - a)$. Then $\mathcal{G}_g(x) = \mathcal{G}_f(x - a) + a$ for every $x \in \mathbb{R}^n$. Moreover, $\operatorname{Fix} \mathcal{G}_g = a + \operatorname{Fix} \mathcal{G}_f$.

Theorem 3.9 Assume that $f : \mathbb{R}^n \to \mathbb{R}_+$ is locally Lipschitz, and $g := f^k$ with $k > 0$. Then

$$\mathcal{G}_g = \left( 1 - \frac{1}{k} \right)\operatorname{Id} + \frac{1}{k}\, \mathcal{G}_f.$$

Proof. By Fact 3.1, $\partial g(x) = k f(x)^{k-1}\,\partial f(x)$ when $f(x) > 0$. Let $s$ be a selection of $\partial f$. When $g(x) > 0$ and $0 \notin \partial g(x)$, we have $f(x) > 0$ and $0 \notin \partial f(x)$; therefore

$$(11)\text{-}(12) \qquad (\operatorname{Id} - G_{g,kf^{k-1}s})(x) = \frac{f(x)^k}{\|kf(x)^{k-1}s(x)\|^2}\, kf(x)^{k-1}s(x) = \frac{1}{k}\,\frac{f(x)}{\|s(x)\|^2}\, s(x) = \frac{1}{k}\,(\operatorname{Id} - G_{f,s})(x).$$

When $g(x) = 0$ or $0 \in \partial g(x)$, we have $f(x) = 0$ or $0 \in \partial f(x)$; thus $(\operatorname{Id} - G_{g,kf^{k-1}s})(x) = 0 = (\operatorname{Id} - G_{f,s})(x)$. Therefore, $\operatorname{Id} - G_{g,kf^{k-1}s} = \frac{1}{k}(\operatorname{Id} - G_{f,s})$, which gives

$$G_{g,kf^{k-1}s} = \left( 1 - \frac{1}{k} \right)\operatorname{Id} + \frac{1}{k}\, G_{f,s}.$$

Remark 3.10 While Theorem 3.9 says that this convex combination of $\operatorname{Id}$ and $G_{f,s}$ is a subgradient projector, the set of subgradient projectors is not a convex set; see Theorems 10.1 and 10.3 in Section 10. Note that if each $U_i : \mathcal{H} \to \mathcal{H}$ is a cutter (see [20] or Definition 5.1) for $i \in I := \{1, 2, \ldots, m\}$, with a common fixed point, and $w$ is an appropriate weight function, then the operator $U := \sum_{i \in I} w_i U_i$ is a cutter; cf. [20].
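A quick numerical sanity check of Theorem 3.9 (ours, with $f = \|\cdot\|$ and its gradient as the selection away from the origin):

```python
import numpy as np

def G(f, s, x):
    fx, sx = f(x), s(x)
    return x if fx <= 0 or not np.any(sx) else x - (fx / np.dot(sx, sx)) * sx

f = lambda x: np.linalg.norm(x)            # f >= 0
s = lambda x: x / np.linalg.norm(x)        # gradient for x != 0
k = 4.0
g  = lambda x: f(x) ** k
sg = lambda x: k * f(x) ** (k - 1) * s(x)  # chain-rule selection for g = f^k

x = np.array([3.0, -1.0])
lhs = G(g, sg, x)
rhs = (1 - 1 / k) * x + (1 / k) * G(f, s, x)
print(np.allclose(lhs, rhs))               # Theorem 3.9: True
```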

Corollary 3.11 For $f := d_C^2$ in the case of a closed set $C \neq \varnothing$ in $\mathbb{R}^n$, one has

$$\mathcal{G}_f = \frac{\operatorname{Id} + P_C}{2}.$$

Proof. Combine Theorem 3.9 and Example 2.7.

Example 3.12 (penalty function) Assume that $g : \mathbb{R}^n \to \mathbb{R}$ is locally Lipschitz. In optimization, for a direct constraint given by $C := \{ x \in \mathbb{R}^n \mid g(x) \leq 0 \}$, one can define penalty substitutes. Two popular penalty functions associated with $g$ are the linear penalty $\theta_1(t) := t_+$ and the quadratic penalty $\theta_2(t) := t_+^2$, where $t_+ := \max\{0, t\}$; cf. [51, page 4]. We have

$$(\forall x \in \mathbb{R}^n) \qquad \mathcal{G}_{\theta_1 \circ g}(x) = \mathcal{G}_g(x).$$

Because $\theta_2 \circ g = (\theta_1 \circ g)^2$, by Theorem 3.9 we obtain

$$\mathcal{G}_{\theta_2 \circ g} = \frac{\operatorname{Id} + \mathcal{G}_{\theta_1 \circ g}}{2}.$$

The following is immediate from the definition of subgradient projectors.

Proposition 3.13 Let $f, g : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable such that $f \equiv g$ on an open set $O \subseteq \mathbb{R}^n$. Then $\mathcal{G}_f = \mathcal{G}_g$ on $O$.

Remark 3.14 For calculus of subgradient projectors of convex functions, see [12, 44].

4 Basic properties of subgradient projectors

In this section, under appropriate conditions, we show that the subgradient projector determines a function uniquely up to multiplication by a positive scalar, that the subgradient projector enjoys the fixed point closedness property, and that the subgradient projector is continuous if the function is strictly differentiable. We start with some elementary properties of subgradient projectors.

Theorem 4.1 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $G_{f,s}$ be given by (1). Then the following hold:

(i) We have

$$(13) \qquad \|x - G_{f,s}(x)\| = \frac{f(x)}{\|s(x)\|}$$

and

$$(14) \qquad \frac{x - G_{f,s}(x)}{\|x - G_{f,s}(x)\|^2} = \frac{s(x)}{f(x)}$$

for every $x$ satisfying $f(x) > 0$ and $0 \notin \partial f(x)$. In particular, when $f$ is locally Lipschitz, one has

$$(15) \qquad \frac{x - G_{f,s}(x)}{\|x - G_{f,s}(x)\|^2} = \frac{s(x)}{f(x)} \in \partial(\ln f)(x);$$

when $f$ is continuously differentiable, one has

$$(16) \qquad \frac{x - G_f(x)}{\|x - G_f(x)\|^2} = \nabla(\ln f)(x).$$

(ii) Set $g := \ln f$ when $f > 0$. Then whenever $f(x) > 0$ and $0 \notin \partial f(x)$ we have

$$(17) \qquad G_{f,s}(x) = x - \frac{c(x)}{\|c(x)\|^2}, \quad \text{where } c(x) = \frac{s(x)}{f(x)} \in \partial g(x).$$

If $f$ is continuously differentiable on $\mathbb{R}^n \setminus \operatorname{lev}_{\leq 0} f$, then

$$(18) \qquad G_f(x) = x - \frac{\nabla g(x)}{\|\nabla g(x)\|^2}$$

whenever $f(x) > 0$ and $\nabla f(x) \neq 0$.

Proof. (i). By the definition of $G_{f,s}$, when $f(x) > 0$ and $0 \notin \partial f(x)$,

$$x - G_{f,s}(x) = \frac{f(x)}{\|s(x)\|^2}\, s(x).$$

Therefore, $\|x - G_{f,s}(x)\| = \frac{f(x)}{\|s(x)\|}$. It follows that

$$(19) \qquad x - G_{f,s}(x) = \frac{f(x)^2}{\|s(x)\|^2}\,\frac{s(x)}{f(x)} = \|x - G_{f,s}(x)\|^2\,\frac{s(x)}{f(x)},$$

equivalently,

$$\frac{x - G_{f,s}(x)}{\|x - G_{f,s}(x)\|^2} = \frac{s(x)}{f(x)}.$$

When $f$ is locally Lipschitz, (15) holds because Fact 3.1 gives $\partial(\ln f)(x) = \frac{\partial f(x)}{f(x)}$ when $f(x) > 0$. When $f$ is continuously differentiable, $s(x) = \nabla f(x)$, hence (16) follows from $\nabla(\ln f)(x) = \nabla f(x)/f(x)$ when $f(x) > 0$.

(ii). By Fact 3.1, we have $\partial g(x) = \frac{\partial f(x)}{f(x)}$ when $f(x) > 0$. Then $c(x) = \frac{s(x)}{f(x)} \in \partial g(x)$ where $s(x) \in \partial f(x)$, and (17) follows since $\frac{1}{\|c(x)\|^2} = \frac{f(x)^2}{\|s(x)\|^2}$ when $f(x) > 0$ and $0 \notin \partial f(x)$. When $f$ is continuously differentiable on $\mathbb{R}^n \setminus \operatorname{lev}_{\leq 0} f$, the same holds for $g$, so (18) follows.

4.1 When is a mapping T a subgradient projector?

Theorem 4.2 Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a mapping. The following are equivalent:

(i) $T$ is the subgradient projector of a locally Lipschitz function.

(ii) There exists a locally Lipschitz function $f : \mathbb{R}^n \to \mathbb{R}$ such that

$$(20) \qquad \begin{cases} \dfrac{x - T(x)}{\|x - T(x)\|^2} \in \partial(\ln f)(x) & \text{whenever } f(x) > 0 \text{ and } 0 \notin \partial f(x), \\ Tx = x & \text{whenever } f(x) \leq 0 \text{ or } 0 \in \partial f(x). \end{cases}$$

Proof. (i) $\Rightarrow$ (ii). Suppose that $T = G_{f,s}$ with $f$ locally Lipschitz. Apply Theorem 4.1(i) to obtain (20).

(ii) $\Rightarrow$ (i). Assume that (20) holds. By Fact 3.1, $\partial(\ln f) = \frac{\partial f}{f}$ when $f > 0$. When $f(x) > 0$ and $0 \notin \partial f(x)$, (20) gives

$$\frac{1}{\|x - Tx\|} = \frac{\|s(x)\|}{f(x)}, \quad \text{i.e.,} \quad \|x - Tx\| = \frac{f(x)}{\|s(x)\|},$$

where $s(x) \in \partial f(x)$. Then using (20) again we obtain $x - Tx = \|x - Tx\|^2\,\frac{s(x)}{f(x)}$, so that

$$Tx = x - \left( \frac{f(x)}{\|s(x)\|} \right)^2 \frac{s(x)}{f(x)} = x - \frac{f(x)}{\|s(x)\|^2}\, s(x),$$

as required.

Can the functions in Theorem 4.2(i) and (ii) be different? This is answered in the next subsection.

4.2 Recovering f from its subgradient projector G_f

Can one determine the function $f$ if $G_f$ is known? To this end, we recall the concept of essentially strictly differentiable functions introduced by Borwein and Moors [15, Section 4].

Definition 4.3 A locally Lipschitz function $f : \mathbb{R}^n \to \mathbb{R}$ is called essentially strictly differentiable on an open set $O \subseteq \mathbb{R}^n$ if $f$ is strictly differentiable everywhere on $O$ except possibly on a Lebesgue null set.

This class of functions has been extensively studied by Borwein and Moors [15]. It includes finite-valued convex functions, Clarke regular locally Lipschitz functions, semismooth locally Lipschitz functions, $C^1$ functions and others; see [15]. If a locally Lipschitz function $f$ is essentially strictly differentiable, then $\partial f$ is single-valued almost everywhere. Moreover, the Clarke subdifferential $\partial_c f$, which can be written as $\operatorname{conv} \partial f$ (the convex hull of $\partial f$) when $f$ is locally Lipschitz [40, Theorem 3.57], can be recovered from every densely defined selection $s$ of $\partial f$; see, e.g., [15]. We refer the reader to [23] and [51] for details on the Clarke subdifferential.

Fact 4.4 Let $f, g$ be locally Lipschitz on a polygonally connected and open subset $O$ of $\mathbb{R}^n$. If $\nabla f = \nabla g$ almost everywhere on $O$, then $h := f - g$ is constant on $O$.

Proof. We prove this by contradiction. Rademacher's Theorem says that a locally Lipschitz function is differentiable almost everywhere; see, e.g., [28, page 81]. By the assumption, $h$ is locally Lipschitz, so $\nabla h = 0$ almost everywhere. Suppose that $x, y \in O$ and $h(x) \neq h(y)$. As $O$ is polygonally connected, there exists $z \in O$ such that either $[x, z] \subseteq O$ with $h(x) \neq h(z)$ or $[z, y] \subseteq O$ with $h(z) \neq h(y)$. Without loss of generality, assume $[z, y] \subseteq O$ and $h(z) \neq h(y)$. As $h$ is differentiable almost everywhere, by Fubini's Theorem [49, Theorem 6.2.2, page 110], we can choose $\tilde{z}$ near $z$ and $\tilde{y}$ near $y$ so that $h$ is differentiable and $\nabla h = 0$ almost everywhere on $[\tilde{z}, \tilde{y}] \subseteq O$, and $h(\tilde{z}) \neq h(\tilde{y})$. Then

$$h(\tilde{y}) - h(\tilde{z}) = \int_0^1 \langle \nabla h(\tilde{z} + t(\tilde{y} - \tilde{z})), \tilde{y} - \tilde{z} \rangle \, dt = \int_0^1 0 \, dt = 0,$$

which contradicts $h(\tilde{z}) \neq h(\tilde{y})$.

Theorem 4.5 Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a subgradient projector. Suppose that there exist two essentially strictly differentiable functions $f, f_1 : \mathbb{R}^n \to \mathbb{R}$ such that $G_{f,s} = T = G_{f_1,s_1}$, with $s$ a selection of $\partial f$ and $s_1$ a selection of $\partial f_1$. Then on each polygonally connected component of $\mathbb{R}^n \setminus \operatorname{Fix} T$ there exists $k > 0$ such that $f = k f_1$.

Proof. Assume that there exist two essentially strictly differentiable and locally Lipschitz functions $f, f_1$ such that $T = G_{f,s} = G_{f_1,s_1}$. Since $T$ has full domain, we have $\operatorname{dom} f = \operatorname{dom} f_1 = \mathbb{R}^n$. By Theorem 4.2, we have

$$\frac{x - T(x)}{\|x - T(x)\|^2} \in \partial(\ln f)(x) \quad \text{and} \quad \frac{x - T(x)}{\|x - T(x)\|^2} \in \partial(\ln f_1)(x) \quad \text{whenever } x \in \mathbb{R}^n \setminus \operatorname{Fix} T.$$

As $f, f_1$ are locally Lipschitz, both $\ln f$ and $\ln f_1$ are locally Lipschitz on $\mathbb{R}^n \setminus \operatorname{Fix} T$. Then $\partial \ln f = \frac{1}{f}\partial f$ and $\partial \ln f_1 = \frac{1}{f_1}\partial f_1$ by Fact 3.1 or [23, Theorem 2.3.9(ii)]. Because $f, f_1$ are essentially strictly differentiable and locally Lipschitz, $\partial f$ and $\partial f_1$ are single-valued almost everywhere [15]; thus

$$\nabla(\ln f_1)(x) = \frac{x - T(x)}{\|x - T(x)\|^2} = \nabla(\ln f)(x) \quad \text{almost everywhere on } \mathbb{R}^n \setminus \operatorname{Fix} T.$$

By Fact 4.4, on each polygonally connected component of $\mathbb{R}^n \setminus \operatorname{Fix} T$, there exists $c \in \mathbb{R}$ such that $\ln f - \ln f_1 = c$, which implies that $f = k f_1$ for $k = e^c > 0$.

Example 4.6 Define

$$f_1 : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} 2x & \text{if } x > 0, \\ 0 & \text{if } -1 \leq x \leq 0, \\ -3(x+1) & \text{if } x < -1, \end{cases} \qquad f_2 : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} x & \text{if } x > 0, \\ 0 & \text{if } -1 \leq x \leq 0, \\ -(x+1) & \text{if } x < -1. \end{cases}$$

Then

$$(\forall x \in \mathbb{R}) \qquad G_{f_1}(x) = G_{f_2}(x) = \begin{cases} 0 & \text{if } x > 0, \\ x & \text{if } -1 \leq x \leq 0, \\ -1 & \text{if } x < -1. \end{cases}$$

The set $\mathbb{R} \setminus [-1, 0]$ has two connected components, $(-\infty, -1)$ and $(0, +\infty)$. We have $f_1 = 3 f_2$ on $(-\infty, -1)$, and $f_1 = 2 f_2$ on $(0, +\infty)$.

The following example shows that Theorem 4.5 fails if one removes the assumption of essential strict differentiability.

Example 4.7 In [16], Borwein, Moors and Wang showed that generically nonexpansive Lipschitz functions have their limiting subdifferentials identically equal to the unit ball; see also [53]. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a locally Lipschitz function such that $\partial f(x) = B$ for every $x \in \mathbb{R}^n$. Since $0 \in \partial f(x)$ for every $x \in \mathbb{R}^n$, in view of Definition 2.2 we have $G_f = \operatorname{Id}$. As such, generically nonexpansive Lipschitz functions have a subgradient projector equal to the identity mapping.
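The non-uniqueness pattern in Theorem 4.5 and Example 4.6 can be observed directly; the sketch below (ours) evaluates both subgradient projectors of Example 4.6 at sample points in each region.

```python
import numpy as np

# Example 4.6 check: f1 and f2 share the same subgradient projector even though
# f1 = 3*f2 on (-inf, -1) and f1 = 2*f2 on (0, +inf).
def G(f, s, x):
    fx, sx = f(x), s(x)
    return x if fx <= 0 or sx == 0 else x - fx / sx   # scalar case of (1)

f1 = lambda x: 2 * x if x > 0 else (0.0 if x >= -1 else -3 * (x + 1))
s1 = lambda x: 2.0 if x > 0 else (0.0 if x >= -1 else -3.0)
f2 = lambda x: x if x > 0 else (0.0 if x >= -1 else -(x + 1))
s2 = lambda x: 1.0 if x > 0 else (0.0 if x >= -1 else -1.0)

for x in [-2.0, -0.5, 0.5, 3.0]:
    print(G(f1, s1, x), G(f2, s2, x))   # pairs agree: -1, -0.5, 0, 0
```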

4.3 Fixed point closed property and continuity

Definition 4.8 We say that an operator $T : D \to \mathbb{R}^n$ is fixed point closed at $x \in D$ if for every sequence $x_k \to x$ with $x_k - Tx_k \to 0$ one has $x = Tx$. If this holds for every $x \in D$, we say that $T$ has the fixed point closed property on $D$.

In [20], Cegielski refers to the fixed point closed property of $T$ as $\operatorname{Id} - T$ being closed at $0$.

Theorem 4.9 (fixed point closed property) Let $f : \mathbb{R}^n \to \mathbb{R}$ be locally Lipschitz and $G_{f,s}$ be given by Definition 2.2. Then $G_{f,s}$ is fixed point closed at every $x \in \mathbb{R}^n$, i.e.,

$$(21) \qquad y - G_{f,s}(y) \to 0 \text{ and } y \to x \quad \Rightarrow \quad x = G_{f,s}(x).$$

Proof. Assume that a sequence $(y_n)_{n \in \mathbb{N}}$ in $\mathbb{R}^n$ satisfies

$$(22) \qquad y_n - G_{f,s}(y_n) \to 0 \text{ and } y_n \to x.$$

Consider three cases.

Case 1. There exist infinitely many $y_n$'s, say $(y_{n_k})_{k \in \mathbb{N}}$, such that $0 \in \partial f(y_{n_k})$. Since $\partial f$ is upper semicontinuous, taking the limit as $k \to \infty$ gives $0 \in \partial f(x)$. Hence $x = G_f(x)$.

Case 2. There exist infinitely many $y_n$'s, say $(y_{n_k})_{k \in \mathbb{N}}$, such that $f(y_{n_k}) \leq 0$. Taking the limit as $k \to \infty$ and using the continuity of $f$ at $x$ gives

$$f(x) = \lim_{k \to \infty} f(y_{n_k}) \leq 0.$$

Hence $x = G_f(x)$.

Case 3. There exists $N \in \mathbb{N}$ such that $f(y_n) > 0$ and $0 \notin \partial f(y_n)$ when $n > N$. Then by (13),

$$(23) \qquad f(y_n) = \|y_n - G_{f,s}(y_n)\|\,\|s(y_n)\|.$$

As $f$ is continuous at $x$, $f$ is locally Lipschitz around $x$, so $\partial f$ is locally bounded around $x$. Therefore,

$$f(x) = \lim_{n \to \infty} f(y_n) = \lim_{n \to \infty} \left( \|y_n - G_{f,s}(y_n)\|\,\|s(y_n)\| \right) = 0,$$

since $y_n - G_{f,s}(y_n) \to 0$. Hence $x = G_{f,s}(x)$.

Altogether, $x \in \operatorname{Fix} G_{f,s}$. This establishes (21) because $(y_n)_{n \in \mathbb{N}}$ was an arbitrary sequence satisfying (22).

The following result generalizes [12, Theorem 5.6].

Theorem 4.10 Let $f : \mathbb{R}^n \to \mathbb{R}$ be locally Lipschitz and essentially strictly differentiable, and let $G_{f,s}$ be given by Definition 2.2. Suppose that $x \in \mathbb{R}^n \setminus \operatorname{Fix} G_{f,s}$. Then the following statements are equivalent:

(i) $G_{f,s}$ is continuous at $x$.

(ii) $f$ is strictly differentiable at $x$.

Consequently, $G_{f,s}$ is continuous on $\mathbb{R}^n \setminus \operatorname{Fix} G_{f,s}$ if and only if $f$ is continuously differentiable on $\mathbb{R}^n \setminus \operatorname{Fix} G_{f,s}$.

Proof. (ii) $\Rightarrow$ (i). Assume that $f$ is strictly differentiable at $x \in \mathbb{R}^n \setminus \operatorname{Fix} G_{f,s}$. Under this assumption, $s : \mathbb{R}^n \to \mathbb{R}^n$ is continuous at $x$ and $s(x) \neq 0$. The result follows from the definition

$$G_{f,s} : y \mapsto y - \frac{f(y)}{\|s(y)\|^2}\, s(y).$$

(i) $\Rightarrow$ (ii). Assume that $G_{f,s}$ is continuous at $x \in \mathbb{R}^n \setminus \operatorname{Fix} G_{f,s}$. By (14),

$$s(y) = f(y)\,\frac{y - G_{f,s}(y)}{\|y - G_{f,s}(y)\|^2},$$

so $s$ is continuous at $x$. Because $s$ is a selection of $\partial f$ and $f$ is essentially strictly differentiable, we conclude that $f$ is strictly differentiable at $x$.

Note that $\operatorname{Fix} G_{f,s}$ is closed by Proposition 2.3(ii). The remaining result follows from the fact that, on an open set on which a function is finite, the function is continuously differentiable if and only if it is strictly differentiable; cf. [51, Corollary 9.19].

We illustrate Theorem 4.10 by two examples.

Example 4.11 Define

$$f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} |x| & \text{if } x \leq 1, \\ 2x - 1 & \text{if } x > 1. \end{cases}$$

Then

$$G_{f,s}(x) = \begin{cases} 0 & \text{if } x < 1, \\ 1 - \dfrac{1}{s(1)}, \text{ where } s(1) \in [1,2], & \text{if } x = 1, \\ 1/2 & \text{if } x > 1, \end{cases}$$

is discontinuous at $x = 1$, because $f$ is not differentiable at $x = 1$.

Proof. When $x < 0$, $G_{f,s}(x) = x - \frac{-x}{(-1)^2}(-1) = 0$. When $x = 0$, $f(0) = 0$, so $G_{f,s}(0) = 0$. When $0 < x < 1$, $G_{f,s}(x) = x - \frac{x}{1^2}(1) = 0$. When $x > 1$, $G_{f,s}(x) = x - \frac{2x-1}{2} = 1/2$. When $x = 1$, $\partial f(1) = [1,2]$, so

$$G_{f,s}(1) = 1 - \frac{1}{s(1)^2}\, s(1) = 1 - \frac{1}{s(1)},$$

where $s(1) \in [1,2]$.

The next example gives a function that is differentiable but not strictly differentiable at $0$, and whose subgradient projector is not continuous at $0$.

Example 4.12 Define

$$f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} x^2\sin\frac{1}{x} + x + 1 & \text{if } x \neq 0, \\ 1 & \text{if } x = 0. \end{cases}$$

Then $f$ is differentiable everywhere, but not strictly differentiable at $0$. The subgradient projector

$$G_f(x) = \begin{cases} x - \dfrac{x^2\sin(1/x) + x + 1}{2x\sin(1/x) - \cos(1/x) + 1} & \text{if } f(x) > 0 \text{ and } f'(x) \neq 0, \\ x & \text{otherwise,} \end{cases}$$

is not continuous at $0$.

Proof. At $x = 0$, $f(0) = 1$ and $f'(0) = 1$. The function $f$ is not strictly differentiable at $0$ because $f'$ is not continuous at $0$. Since $\lim_{x \to 0} G_f(x)$ does not exist, the subgradient projector is not continuous at $0$.

How about the continuity of $G_{f,s}$ on $\operatorname{Fix} G_{f,s}$? Since $G_{f,s} = \operatorname{Id}$ on $\operatorname{Fix} G_{f,s}$, it is always continuous at $x \in \operatorname{int}(\operatorname{Fix} G_{f,s})$. The following result deals with the case $x \in \operatorname{bdry}(\operatorname{Fix} G_{f,s})$.

Theorem 4.13 Let $f : \mathbb{R}^n \to \mathbb{R}$ be locally Lipschitz, let $G_{f,s}$ be given by Definition 2.2, and let $x \in \operatorname{bdry}(\operatorname{Fix} G_{f,s})$.

(i) Assume that $f(x) > 0$ and $0 \in \partial f(x)$. Then $G_{f,s}$ is discontinuous at $x$.

(ii) Assume that $f(x) \leq 0$. Suppose that one of the following holds:

(a) There exists $\alpha > 0$ such that

$$(24) \qquad (\forall y : f(y) > 0,\ 0 \notin \partial f(y)) \qquad \alpha f(y) + \langle s(y), x - y \rangle \leq 0;$$

in particular, this is true when $f$ is convex.

(b)

$$(25) \qquad 0 \notin \partial f(x).$$

(c)

$$(26) \qquad \liminf_{\substack{y \to x \\ f(y)>0,\ 0 \notin \partial f(y)}} \|s(y)\| > 0.$$

Then $G_{f,s}$ is continuous at $x$.

Proof. (i). As $x \in \operatorname{bdry}(\operatorname{Fix} G_{f,s})$, there exists a sequence $(y_k)_{k \in \mathbb{N}}$ such that $y_k \to x$, $f(y_k) > 0$ and $0 \notin \partial f(y_k)$. Because $f$ is locally Lipschitz and $s(y_k) \in \partial f(y_k)$, the sequence $(s(y_k))_{k \in \mathbb{N}}$ is bounded. By taking a subsequence if necessary, we can assume that $\|s(y_k)\| \to l \in \mathbb{R}_+$. Taking the limit as $k \to \infty$ yields

$$\|y_k - G_{f,s}(y_k)\| = \frac{f(y_k)}{\|s(y_k)\|} \to \frac{f(x)}{l},$$

which is $+\infty$ if $l = 0$, or a positive number if $l > 0$. Because $G_{f,s}(x) = x$, this shows that $G_{f,s}$ is not continuous at $x$.

(ii). To show that $G_{f,s}$ is continuous at $x$, it suffices to show that

$$(27) \qquad \lim_{\substack{y \to x \\ f(y)>0,\ 0 \notin \partial f(y)}} \frac{f(y)}{\|s(y)\|} = 0.$$

Indeed, by Theorem 4.1, when $f(y) > 0$ and $0 \notin \partial f(y)$, we have $\|y - G_{f,s}(y)\| = \frac{f(y)}{\|s(y)\|}$. Then (27) gives $\lim_{y \to x} G_{f,s}(y) = \lim_{y \to x}\left( G_{f,s}(y) - y \right) + \lim_{y \to x} y = x$. When $y \in \operatorname{Fix} G_{f,s}$, $G_{f,s}(y) = y$, so clearly $\lim_{y \to x} G_{f,s}(y) = x$. Hence $G_{f,s}$ is continuous at $x$.

Now (24) gives

$$\frac{f(y)}{\|s(y)\|} \leq \frac{\langle s(y), y - x \rangle}{\alpha\,\|s(y)\|} \leq \frac{\|y - x\|\,\|s(y)\|}{\alpha\,\|s(y)\|} = \frac{\|y - x\|}{\alpha},$$

which implies (27).

Next, we show that (25) implies (26). Note that (25) gives $d_{\partial f(x)}(0) > 0$, since $\partial f(x)$ is closed by [51, Theorem 8.6]. Because $f$ is locally Lipschitz, in view of [51, Proposition 8.7], we have $\limsup_{y \to x} \partial f(y) \subseteq \partial f(x)$; hence $0 \notin \partial f(y)$ for $y$ sufficiently near $x$. Invoking [51, Corollary 4.7(b)], we obtain $\liminf_{y \to x} d_{\partial f(y)}(0) \geq d_{\partial f(x)}(0)$, from which it follows that

$$(28)\text{-}(29) \qquad \liminf_{\substack{y \to x \\ f(y)>0,\ 0 \notin \partial f(y)}} \|s(y)\| \geq \liminf_{\substack{y \to x \\ f(y)>0,\ 0 \notin \partial f(y)}} d_{\partial f(y)}(0) \geq \liminf_{y \to x} d_{\partial f(y)}(0) \geq d_{\partial f(x)}(0) > 0,$$

and this gives (26).

Finally, (26) gives (27) because $\lim_{y \to x,\ f(y)>0} f(y) = 0$ and

$$0 \leq \liminf_{\substack{y \to x \\ f(y)>0,\ 0 \notin \partial f(y)}} \frac{f(y)}{\|s(y)\|} \leq \limsup_{\substack{y \to x \\ f(y)>0,\ 0 \notin \partial f(y)}} \frac{f(y)}{\|s(y)\|} \leq \frac{\displaystyle\lim_{y \to x,\ f(y)>0} f(y)}{\displaystyle\liminf_{\substack{y \to x \\ f(y)>0,\ 0 \notin \partial f(y)}} \|s(y)\|} = 0.$$

Here is an example illustrating Theorem 4.13(i).

Example 4.14 (1). Define $f : \mathbb{R} \to \mathbb{R} : x \mapsto x^3 + 1$. Then

$$G_f(x) = \begin{cases} \dfrac{2x}{3} - \dfrac{1}{3x^2} & \text{if } x \neq 0 \text{ and } x > -1, \\ x & \text{if } x = 0 \text{ or } x \leq -1, \end{cases}$$

has $\operatorname{Fix} G_f = (-\infty, -1] \cup \{0\}$, and $G_f$ is not continuous at $x = 0$.

(2). Define

$$f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} x + 1 & \text{if } x \leq 0, \\ 1 & \text{if } x \geq 0. \end{cases}$$

Then

$$G_f(x) = \begin{cases} -1 & \text{if } -1 < x < 0, \\ x & \text{if } x \leq -1 \text{ or } x \geq 0, \end{cases}$$

has $\operatorname{Fix} G_f = (-\infty, -1] \cup [0, +\infty)$, and $G_f$ is not continuous at $x = 0$.
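A numerical look at Example 4.14(1) (ours): the formula for $G_f$ blows up as $x \downarrow 0$, confirming the discontinuity at $0$ asserted by Theorem 4.13(i).

```python
# Example 4.14(1): f(x) = x^3 + 1 gives G_f(x) = 2x/3 - 1/(3x^2) for
# -1 < x, x != 0, which has no limit as x -> 0 although G_f(0) = 0.
f, df = lambda x: x ** 3 + 1, lambda x: 3 * x ** 2
G = lambda x: x - f(x) / df(x)
for x in [0.1, 0.01, 0.001]:
    print(G(x))          # about -33.3, -3333.3, -333333.3
```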

4.4 The family of subgradient projectors

Theorem 4.15 Let $f : \mathbb{R}^n \to \mathbb{R}$ be locally Lipschitz. Then the following are equivalent:

(i) $\mathcal{G}_f$ is single-valued.

(ii) $f$ is strictly differentiable on $\mathbb{R}^n \setminus \operatorname{Fix} \mathcal{G}_f$.

Proof. (i) $\Rightarrow$ (ii). By (14) in Theorem 4.1, when $x \in \mathbb{R}^n \setminus \operatorname{Fix} \mathcal{G}_f$, we have

$$s(x) = f(x)\,\frac{x - G_{f,s}(x)}{\|x - G_{f,s}(x)\|^2},$$

where $s(x) \in \partial f(x)$. By the assumption, $\mathcal{G}_f = T$ for an everywhere single-valued $T : \mathbb{R}^n \to \mathbb{R}^n$, so

$$s(x) = f(x)\,\frac{x - Tx}{\|x - Tx\|^2}.$$

It follows that $\partial f(x)$ is a singleton, so $f$ is strictly differentiable at $x$ by [51, Theorem 9.18]. Therefore, $f$ is strictly differentiable on $\mathbb{R}^n \setminus \operatorname{Fix} \mathcal{G}_f$.

(ii) $\Rightarrow$ (i). Clear.

Theorem 4.16 Let $C \subseteq \mathbb{R}^n$ be a nonempty closed set. Then the following are equivalent:

(i) $\mathcal{G}_{d_C}$ is single-valued.

(ii) $C$ is convex.

Proof. According to Fact 2.6, we have $\operatorname{Fix} \mathcal{G}_{d_C} = C$.

(i) $\Rightarrow$ (ii). By Theorem 4.15, $d_C$ is strictly differentiable on $\mathbb{R}^n \setminus C$. Fact 2.6 shows that $P_C$ is single-valued at every $x \in \mathbb{R}^n \setminus C$. Hence, $C$ is convex; cf. [27, Theorem 12.7].

(ii) $\Rightarrow$ (i). Apply Fact 2.8.

5 When is the subgradient projector G_f a cutter or local cutter?

In this section we provide conditions for a subgradient projector to be a cutter or local cutter, and an explicit nonconvex function with a cutter subgradient projector. Along the way, some calculus for cutter subgradient projectors is also developed.

5.1 Cutters, quasi-firmly nonexpansive mappings, and local cutters

Recall the following well-known algorithmic operators.

Definition 5.1 ([20, page 53]) Let $D$ be a nonempty subset of $\mathbb{R}^n$ and $T : D \to \mathbb{R}^n$. We say that $T$ is a cutter if $\operatorname{Fix} T \neq \varnothing$ and

$$(30) \qquad (\forall x \in D)(\forall u \in \operatorname{Fix} T) \qquad \langle x - Tx, u - Tx \rangle \leq 0.$$

Definition 5.2 ([20, page 56]) Let $D$ be a nonempty subset of $\mathbb{R}^n$ and $T : D \to \mathbb{R}^n$. We say that $T$ is quasi-firmly nonexpansive (quasi-fne) if $\operatorname{Fix} T \neq \varnothing$ and

$$(\forall x \in D)(\forall u \in \operatorname{Fix} T) \qquad \|Tx - u\|^2 + \|x - Tx\|^2 \leq \|x - u\|^2.$$

In [20, page 56], quasi-fne mappings are called strongly quasinonexpansive mappings. The following fact says that a cutter is strongly Fejér monotone with respect to the set of its fixed points, and that cutters and quasi-fne mappings are the same; see [20, page 108].

Fact 5.3 ([20]) (i) A mapping $T : D \to \mathbb{R}^n$ is a cutter if and only if $T$ is quasi-fne.

(ii) Let $T : D \to \mathbb{R}^n$ be a cutter. Then $T$ is always continuous on $\operatorname{Fix} T$.

(iii) Let $T : D \to \mathbb{R}^n$ be a cutter. Then $\operatorname{Fix} T$ is closed and convex.

Definitions 5.1 and 5.2 require that $T$ satisfy the inequalities for all $x \in D$ and $u \in \operatorname{Fix} T$. In practice, the sets $D$ and $\operatorname{Fix} T$ might be too large to verify those inequalities. We now introduce local cutters and locally quasi-firmly nonexpansive mappings.

Definition 5.4 A mapping $T : D \to \mathbb{R}^n$ is a local cutter at $\bar{x} \in \operatorname{Fix} T$ if $\operatorname{Fix} T \neq \varnothing$ and there exists $\delta > 0$ such that

$$(31) \qquad (\forall x \in B(\bar{x}, \delta) \cap D)(\forall u \in B(\bar{x}, \delta) \cap \operatorname{Fix} T) \qquad \langle x - Tx, u - Tx \rangle \leq 0.$$

Definition 5.5 A mapping $T : D \to \mathbb{R}^n$ is locally quasi-firmly nonexpansive (locally quasi-fne) at $\bar{x} \in \operatorname{Fix} T$ if there exists $\delta > 0$ such that

$$(\forall x \in B(\bar{x}, \delta) \cap D)(\forall u \in B(\bar{x}, \delta) \cap \operatorname{Fix} T) \qquad \|Tx - u\|^2 + \|Tx - x\|^2 \leq \|x - u\|^2.$$

A localized version of Fact 5.3(i) comes next.

Proposition 5.6 A mapping $T : D \to \mathbb{R}^n$ is a local cutter at $\bar{x} \in \operatorname{Fix} T$ if and only if $T$ is locally quasi-fne at $\bar{x} \in \operatorname{Fix} T$.

Proof. This follows from

$$\|x - u\|^2 = \|Tx - u\|^2 + \|x - Tx\|^2 + 2\langle x - Tx, Tx - u \rangle.$$

Proposition 5.7 Assume that $T : \mathbb{R} \to \mathbb{R}$ and $\operatorname{Fix} T \neq \varnothing$. Then $T$ is a cutter on $\mathbb{R}$ if and only if

$$(32) \qquad (\forall x \in \mathbb{R}) \qquad Tx \in [x, P_{\operatorname{Fix} T}\, x].$$

Proof. The sufficiency is clear. Conversely, when $x \in \operatorname{Fix} T$, (32) clearly holds. Assume $x \notin \operatorname{Fix} T$ and $c \in \operatorname{Fix} T$. Because $T$ maps $\mathbb{R}$ to $\mathbb{R}$, there exists $\lambda \in \mathbb{R}$ such that

$$Tx = (1 - \lambda)x + \lambda c.$$

As $T$ is a cutter, we have

$$(x - Tx)(c - Tx) = -\lambda(1 - \lambda)(x - c)^2 \leq 0,$$

which gives $0 \leq \lambda \leq 1$, so that $Tx \in [x, c]$. Since $c \in \operatorname{Fix} T$ was arbitrary, it follows that $Tx \in [x, P_{\operatorname{Fix} T}\, x]$.

Remark 5.8 Compare Proposition 5.7 to Corollary 8.4, which characterizes the subgradient projector of a convex function on $\mathbb{R}$.

For a nonempty convex set $C \subseteq \mathbb{R}^n$, the recession cone of $C$ is $\operatorname{rec} C := \{ x \in \mathbb{R}^n \mid x + C \subseteq C \}$. The negative polar of $K \subseteq \mathbb{R}^n$ is $K^{\ominus} := \{ y \in \mathbb{R}^n \mid \langle y, x \rangle \leq 0 \ \forall x \in K \}$.

Proposition 5.9 Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a cutter. Then

$$\operatorname{ran}(\operatorname{Id} - T) \subseteq \left( \operatorname{rec}(\operatorname{Fix} T) \right)^{\ominus}.$$

Consequently, when $\operatorname{Fix} T$ is a linear subspace, $\operatorname{ran}(\operatorname{Id} - T) \subseteq (\operatorname{Fix} T)^{\perp}$; in other words, $\operatorname{ran}(\operatorname{Id} - T) \subseteq \left( \ker(\operatorname{Id} - T) \right)^{\perp}$.

Proof. Let $x - Tx \in \operatorname{ran}(\operatorname{Id} - T)$ and $v \in \operatorname{rec}(\operatorname{Fix} T)$. Then for every $k > 0$ and $u \in \operatorname{Fix} T$, we have $u + kv \in \operatorname{Fix} T$. The assumption that $T$ is a cutter implies

$$\langle x - Tx, u + kv - Tx \rangle \leq 0 \quad \Rightarrow \quad \langle x - Tx, u/k + v - Tx/k \rangle \leq 0.$$

Letting $k \to \infty$ gives $\langle x - Tx, v \rangle \leq 0$. Since $v \in \operatorname{rec}(\operatorname{Fix} T)$ was arbitrary, we have $x - Tx \in (\operatorname{rec}(\operatorname{Fix} T))^{\ominus}$. When $\operatorname{Fix} T$ is a linear subspace, $\operatorname{Fix} T = \operatorname{rec}(\operatorname{Fix} T)$ and $(\operatorname{rec}(\operatorname{Fix} T))^{\ominus} = (\operatorname{rec}(\operatorname{Fix} T))^{\perp}$.

5.2 Characterizations of G_f being a cutter or local cutter

Our first result characterizes the class of functions $f$ for which $G_{f,s}$ is a cutter.

Lemma 5.10 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $G_{f,s}$ be given by Definition 2.2. Suppose that $f(x) > 0$ and $0 \notin \partial f(x)$. Then

$$\langle x - G_{f,s}(x), u - G_{f,s}(x) \rangle = \frac{f(x)}{\|s(x)\|^2}\left( f(x) + \langle s(x), u - x \rangle \right).$$

Proof. Let $f(x) > 0$ and $0 \notin \partial f(x)$. The definition of $G_{f,s}$ gives

$$\langle x - G_{f,s}(x), u - G_{f,s}(x) \rangle = \left\langle \frac{f(x)\,s(x)}{\|s(x)\|^2},\, u - x + \frac{f(x)}{\|s(x)\|^2}\, s(x) \right\rangle = \frac{f(x)}{\|s(x)\|^2}\,\langle s(x), u - x \rangle + \frac{f^2(x)}{\|s(x)\|^2} = \frac{f(x)}{\|s(x)\|^2}\left( f(x) + \langle s(x), u - x \rangle \right).$$

Theorem 5.11 (level sets of tangent planes including the target set) Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, let $G_{f,s}$ be given by Definition 2.2, and let

$$S := \{ u \in \mathbb{R}^n \mid f(u) \leq 0 \text{ or } 0 \in \partial f(u) \}.$$

Then the following hold:

(i) $G_{f,s}$ is a cutter if and only if whenever $x \notin S$ and $u \in S$ one has $f(x) + \langle s(x), u - x \rangle \leq 0$.

(ii) Let $\bar{x} \in S$ and $\delta > 0$. $G_{f,s}$ is a cutter on $B(\bar{x}, \delta)$ if and only if for all $x \in B(\bar{x}, \delta) \setminus S$ and $u \in S \cap B(\bar{x}, \delta)$ one has $f(x) + \langle s(x), u - x \rangle \leq 0$.

Proof. (i). When $f(x) \leq 0$ or $0 \in \partial f(x)$, $x = G_{f,s}(x)$, so (30) holds for $T = G_{f,s}$. Assume that $f(x) > 0$, $0 \notin \partial f(x)$ and $s(x) \in \partial f(x)$. By Lemma 5.10,

$$\langle x - G_{f,s}(x), u - G_{f,s}(x) \rangle = \frac{f(x)}{\|s(x)\|^2}\left( f(x) + \langle s(x), u - x \rangle \right).$$

Since $f(x) > 0$, we deduce that

$$\langle x - G_{f,s}(x), u - G_{f,s}(x) \rangle \leq 0 \quad \Leftrightarrow \quad f(x) + \langle s(x), u - x \rangle \leq 0.$$

Hence, the result follows from Definition 5.1.

(ii). Apply the same arguments as above with $x \in B(\bar{x}, \delta)$ and $u \in S \cap B(\bar{x}, \delta)$.

One immediately obtains the following:

Fact 5.12 ([20, page 146]) Let $f : \mathbb{R}^n \to \mathbb{R}$ be convex with $\operatorname{lev}_{\leq 0} f \neq \varnothing$, and let $G_{f,s}$ be given by Definition 2.2. Then $G_{f,s}$ is a cutter. Consequently, $G_{f,s}$ is continuous at every $x \in \operatorname{lev}_{\leq 0} f$.

Proof. As $\operatorname{lev}_{\leq 0} f \neq \varnothing$, $\operatorname{Fix} G_{f,s} = \operatorname{lev}_{\leq 0} f$. Assume that $f(x) > 0$. For $u \in \operatorname{Fix} G_{f,s}$, $f(u) \leq 0$. By the convexity of $f$ we have

$$f(x) + \langle s(x), u - x \rangle \leq f(u) \leq 0.$$

Theorem 5.11 shows that $G_{f,s}$ is a cutter. The remaining result follows from Fact 5.3(ii).

In Fact 5.12, $\operatorname{lev}_{\leq 0} f \neq \varnothing$ is required, as the following example shows.

Example 5.13 (1). Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) := \exp|x|$. Then $\operatorname{lev}_{\leq 0} f = \varnothing$ and

$$(\forall x \in \mathbb{R}) \qquad G_f(x) = \begin{cases} x - 1 & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ x + 1 & \text{if } x < 0. \end{cases}$$

In particular, this $G_f$ is discontinuous at $x = 0$ and not a cutter. Moreover, $G_f$ is not monotone.

(2). Consider $f : \mathbb{R}^n \to \mathbb{R} : x \mapsto \exp(-\|x\|^2/2)$. We have $\operatorname{lev}_{\leq 0} f = \varnothing$ and

$$G_f(x) = \begin{cases} x + \dfrac{x}{\|x\|^2} & \text{if } x \neq 0, \\ 0 & \text{if } x = 0. \end{cases}$$

In particular, $G_f$ is not continuous at $0$, so it is not a cutter.
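A direct computation (ours) illustrates the blow-up of $G_f$ near $0$ in Example 5.13(2):

```python
import numpy as np

# Example 5.13(2): f(x) = exp(-||x||^2/2) gives G_f(x) = x + x/||x||^2,
# whose norm blows up as x -> 0, so G_f cannot be continuous at 0.
for t in [1.0, 0.1, 0.01]:
    x = np.array([t, 0.0])
    g = -x * np.exp(-np.dot(x, x) / 2)                     # gradient of f
    Gx = x - (np.exp(-np.dot(x, x) / 2) / np.dot(g, g)) * g
    print(Gx)            # (2, 0), (10.1, 0), (100.01, 0)
```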

Example 5.14 The nonconvex function

$$f : \mathbb{R}^n \to \mathbb{R} : x \mapsto \begin{cases} \|x\|^2 & \text{if } \|x\| \leq 1, \\ 1 & \text{if } \|x\| > 1, \end{cases}$$

has $G_f$ being a cutter on a neighborhood of $0$, but not a cutter on $\mathbb{R}^n$.

It is instructive to consider $d_C$ where $C \subseteq \mathbb{R}^n$ is closed and nonempty.

Proposition 5.15 Let $C \subseteq \mathbb{R}^n$ be closed and nonempty, and let $s$ be a selection of $\partial d_C$. Then $G_{d_C,s}$ is a cutter if and only if the set $C$ is convex.

Proof. By Fact 2.6, $0 \notin \partial d_C(\bar{x})$ whenever $\bar{x} \notin C$, because $\|v\| = 1$ for every $v \in \partial d_C(\bar{x})$. This implies that $\operatorname{Fix} G_{d_C,s} = C$. Assume that $G_{d_C,s}$ is a cutter. Then $\operatorname{Fix} G_{d_C,s} = C$ is convex by Fact 5.3(iii). Conversely, assume that $C$ is convex. Then $d_C$ is convex; consequently, $G_{d_C,s}$ is a cutter by Fact 5.12.

Theorem 5.16 Let $k \geq 1$, let $f : \mathbb{R}^n \to [0, +\infty)$ be locally Lipschitz, and let $G_{f,s}$ be given by Definition 2.2. Suppose that $G_{f,s}$ is a cutter and $\operatorname{Fix} G_{f,s} \neq \varnothing$. If $g = f^k$, then $G_{g,kf^{k-1}s}$ is a cutter.

Proof. By Theorem 3.9, $G_{g,kf^{k-1}s} = (1 - 1/k)\operatorname{Id} + (1/k)\,G_{f,s}$. As $\operatorname{Id}$ and $G_{f,s}$ are both cutters with $\operatorname{Fix} G_{f,s} \cap \operatorname{Fix}\operatorname{Id} = \operatorname{Fix} G_{f,s} \neq \varnothing$, being a convex combination of cutters, $G_{g,kf^{k-1}s}$ is a cutter by [20, page 62].

In Corollary 11.6, we give an example showing that even though $G_{f^2,2fs}$ is a cutter, $G_{f,s}$ might not be a cutter; so the converse of Theorem 5.16 is not true.

Theorem 5.17 Let $A : \mathbb{R}^n \to \mathbb{R}^n$ be unitary, $b \in \mathbb{R}^n$, let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $G_{f,s}$ be given by Definition 2.2. Define $g : \mathbb{R}^n \to \mathbb{R} : x \mapsto f(Ax + b)$. If $G_{f,s}$ is a cutter, then $G_{g,A^{\top}s(A\cdot+b)}$ is a cutter.

Proof. Let $x \in \mathbb{R}^n$ and $u \in \operatorname{Fix} G_{g,A^{\top}s(A\cdot+b)}$. Proposition 3.7 gives

$$G_{g,A^{\top}s(A\cdot+b)}(x) = A^{\top}\left( G_{f,s}(Ax+b) - b \right), \qquad Au + b \in \operatorname{Fix} G_{f,s}.$$

Since $A$ is unitary and $G_{f,s}$ is a cutter, we have

$$(33)\text{-}(37) \qquad \|x - G_{g,A^{\top}s(A\cdot+b)}(x)\|^2 = \|x - A^{\top}(G_{f,s}(Ax+b) - b)\|^2 = \|Ax + b - G_{f,s}(Ax+b)\|^2 \leq \|Ax + b - (Au+b)\|^2 - \|G_{f,s}(Ax+b) - (Au+b)\|^2 = \|x - u\|^2 - \|A^{\top}(G_{f,s}(Ax+b) - b) - u\|^2 = \|x - u\|^2 - \|G_{g,A^{\top}s(A\cdot+b)}(x) - u\|^2.$$

Hence $G_{g,A^{\top}s(A\cdot+b)}$ is a cutter by Fact 5.3(i).

Corollary 5.18 Let $B$ be an $n \times n$ symmetric matrix. Define

$$f : \mathbb{R}^n \to \mathbb{R} : x \mapsto \tfrac{1}{2}\, x^{\top}Bx.$$

Then

$$(38) \qquad G_f(x) = \begin{cases} x - \dfrac{x^{\top}Bx}{2\,\|Bx\|^2}\, Bx & \text{if } x^{\top}Bx > 0 \text{ and } Bx \neq 0, \\ x & \text{otherwise.} \end{cases}$$

Moreover, the following are equivalent:

(i) $G_f$ is a cutter.

(ii) $B$ is positive semidefinite or negative semidefinite.

Proof. (38) follows from Definition 2.2. Because $B$ is symmetric, there exists an orthogonal matrix $Q$ such that $Q^{\top}BQ = D$, where $D$ is an $n \times n$ diagonal matrix whose diagonal entries are the eigenvalues of $B$. Using $x = Qy$, Theorem 5.17 shows that $G_f$ is a cutter if and only if $G_g$ is a cutter, where $g : \mathbb{R}^n \to \mathbb{R} : y \mapsto f(Qy) = \tfrac{1}{2}\, y^{\top}Dy$.

(i) $\Rightarrow$ (ii). (i) implies that $G_g$ is a cutter. This means that

$$(39) \qquad (\forall y \in \mathbb{R}^n : y^{\top}Dy > 0)(\forall u \in \mathbb{R}^n : u^{\top}Du \leq 0) \qquad y^{\top}Du \leq \tfrac{1}{2}\, y^{\top}Dy.$$

We will show that all nonzero diagonal entries of $D$ have the same sign. Suppose to the contrary that there exist diagonal entries of $D$ with $\lambda_i > 0$ and $\lambda_j < 0$. Put $y_k = 0$, $u_k = 0$ for $k = 1, \ldots, n$, $k \neq i, j$. Then (39) reduces to: whenever $\lambda_i y_i^2 + \lambda_j y_j^2 > 0$ and

$$(40) \qquad \lambda_i u_i^2 + \lambda_j u_j^2 \leq 0,$$

we have

$$(41) \qquad \lambda_i y_i u_i + \lambda_j y_j u_j \leq \tfrac{1}{2}\left( \lambda_i y_i^2 + \lambda_j y_j^2 \right).$$

Fix $(y_i, y_j)$ such that $\lambda_i y_i^2 + \lambda_j y_j^2 > 0$ and $y_j < 0$. When $u_i = 0$ and $u_j \to +\infty$, (40) is verified but (41) fails to hold. This contradicts that $G_g$ is a cutter. Hence all nonzero diagonal entries of $D$ must have the same sign, which implies that $B$ is positive semidefinite if the sign is positive, and negative semidefinite if the sign is negative.

(ii) $\Rightarrow$ (i). When $B$ is positive semidefinite, $f$ is convex, and we apply Fact 5.12. When $B$ is negative semidefinite, $G_g = \operatorname{Id}$ is a cutter.

Theorem 5.19 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $G_{f,s}$ be given by Definition 2.2. Assume that $\mathbb{R} \ni k \neq 0$ and $g(x) = f(kx)$. If $G_{f,s}$ is a cutter, then $G_{g,ks(k\cdot)}$ is a cutter.

Proof. Proposition 3.6 gives $G_{g,ks(k\cdot)}(x) = \frac{1}{k}\, G_{f,s}(kx)$ and $\operatorname{Fix} G_{g,ks(k\cdot)} = \frac{1}{k}\operatorname{Fix} G_{f,s}$. Let $x \in \mathbb{R}^n$ and $u \in \operatorname{Fix} G_{g,ks(k\cdot)}$. We have

$$(42)\text{-}(43) \qquad \langle x - G_{g,ks(k\cdot)}(x), u - G_{g,ks(k\cdot)}(x) \rangle = \langle x - \tfrac{1}{k}G_{f,s}(kx), u - \tfrac{1}{k}G_{f,s}(kx) \rangle = \tfrac{1}{k^2}\,\langle kx - G_{f,s}(kx), ku - G_{f,s}(kx) \rangle \leq 0,$$

since $G_{f,s}$ is a cutter. Therefore, $G_{g,ks(k\cdot)}$ is a cutter.

One might ask: if each function $f_i : \mathbb{R}^n \to \mathbb{R}$ has $G_{f_i}$ being a cutter, must the maximum $g := \max\{f_1, f_2\}$ have $G_g$ being a cutter? The answer is negative, as the following example shows.

Example 5.20 Let $f_1, f_2 : \mathbb{R} \to \mathbb{R}$ be defined by $f_1(x) := 1 + x$ and $f_2(x) := 1 - x$. Each $G_{f_i}$ is a cutter by Fact 5.12. The function $g(x) := \max\{f_1(x), f_2(x)\}$ has

$$G_g(x) = \begin{cases} -1 & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ 1 & \text{if } x < 0, \end{cases}$$

which is not continuous at $x = 0$, so $G_g$ is not a cutter.

5.3 A nonconvex function whose G_f is a cutter

Example 5.21 If $f$ is not convex, $G_{f,s}$ need not be a cutter. Consider

$$f : \mathbb{R} \to \mathbb{R} : x \mapsto 1 - \exp(-x^2).$$

Then the subgradient projector of $f$ is

$$G_{f,s}(x) = \begin{cases} x - \dfrac{\exp(x^2) - 1}{2x} & \text{if } x \neq 0, \\ 0 & \text{if } x = 0, \end{cases}$$

and $\operatorname{Fix} G_f = \{0\}$. However, $G_f$ is not a cutter. Indeed, using $\exp(x^2) > 1 + x^2 + x^4/2$, when $x > \sqrt{2}$ we have

$$f(x) + s(x)(0 - x) = 1 - \exp(-x^2) + \left( 2x\exp(-x^2) \right)(0 - x) = \frac{\exp(x^2) - (1 + 2x^2)}{\exp(x^2)} > \frac{x^2 + \frac{x^4}{2} - 2x^2}{\exp(x^2)} = \frac{x^2(x^2 - 2)}{2\exp(x^2)} > 0.$$

By Theorem 5.11, $G_f$ is not a cutter.

Example 5.22 Even though $f$ is not convex, $G_{f,s}$ may still be a cutter. Define

$$f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} 0 & \text{if } x \leq 0, \\ x & \text{if } 0 \leq x \leq 20/7, \\ 8(x - 2.5) & \text{if } 20/7 \leq x \leq 3, \\ 2(x - 1) & \text{if } x \geq 3. \end{cases}$$

Then $f$ is not convex, since $\partial f(x)$ is not monotone on $[20/7, +\infty)$. However, its subgradient projector

$$G_{f,s}(x) = \begin{cases} x & \text{if } x \leq 0, \\ 0 & \text{if } 0 < x < 20/7, \\ \frac{20}{7}\left( 1 - \frac{1}{s(20/7)} \right), \text{ where } s(20/7) \in [1, 8], & \text{if } x = 20/7, \\ 2.5 & \text{if } 20/7 < x < 3, \\ 3 - \frac{4}{s(3)}, \text{ where } s(3) \in \{2, 8\}, & \text{if } x = 3, \\ 1 & \text{if } x > 3, \end{cases}$$

is a cutter. To see this, by Theorem 5.11, it suffices to consider zero level sets of tangent planes. Indeed, let $f(u) \leq 0$, i.e., $u \leq 0$. When $x_0 > 3$,

$$f(x_0) + s(x_0)(u - x_0) = 2(u - 1) \leq 0;$$

when $x_0 = 3$,

$$f(x_0) + s(x_0)(u - x_0) = 4 + s(3)(u - 3) \leq 4 + 2(u - 3) \leq 0,$$

where $2 \leq s(3) \leq 8$; when $20/7 < x_0 < 3$,

$$f(x_0) + s(x_0)(u - x_0) = 8(u - 2.5) \leq 0;$$

when $x_0 = 20/7$,

$$f(x_0) + s(x_0)(u - x_0) = \frac{20}{7} + s(20/7)\left( u - \frac{20}{7} \right) \leq u \leq 0,$$

where $1 \leq s(20/7) \leq 8$; when $0 < x_0 < 20/7$,

$$f(x_0) + s(x_0)(u - x_0) = u \leq 0.$$

See Corollary 11.6(ii) for an example on $\mathbb{R}^2$.

Note that even if $G_f$ is continuous, it does not mean that $G_f$ is a cutter; see, e.g., Example 2.5(ii). In [20], Cegielski developed a systematic theory for cutters. The theory of cutters can be used to study the class of functions (Theorem 5.11) whose subgradient projectors are cutters.

One might also ask: if $f : \mathbb{R}^n \to \mathbb{R}$ has $G_{f,s}$ being a cutter, does $g := f + r$ have $G_{g,s}$ being a cutter for every $r \in \mathbb{R}$? In general, the answer is negative. When $f$ is convex and $\operatorname{lev}_{\leq 0} f \neq \varnothing$, it follows from Fact 5.12 that $G_{f-r,s}$ is a cutter whenever $r > 0$. This might fail for $r < 0$, as the following example shows.

Example 5.23 Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) := e^{|x|} - 1$. Then

$$G_{f,s}(x) = \begin{cases} x - 1 + e^{-x} & \text{if } x > 0, \\ x + 1 - e^{x} & \text{if } x < 0, \\ 0 & \text{if } x = 0, \end{cases}$$

is a cutter by Fact 5.12. However, for $g : \mathbb{R} \to \mathbb{R} : x \mapsto e^{|x|}$, we have $g = f + 1$, and $G_{g,s}$ is not a cutter by Example 5.13(1).

For a nonconvex function, although $G_{f,s}$ is a cutter, $G_{f-r,s}$ might not be a cutter even when $r > 0$.

Example 5.24 Let $f$ be given by Example 5.22, and let $g := f - 20/7$. Then

$$G_{g,s}(x) = \begin{cases} x & \text{if } x \leq 20/7, \\ 20/7 & \text{if } 20/7 < x < 3, \\ G_{g,s}(3) \in \{17/7, 20/7\} & \text{if } x = 3, \\ 17/7 & \text{if } x > 3. \end{cases}$$

As shown in Example 5.22, $G_{f,s}$ is a cutter. However, $G_{g,s}$ is not a cutter, by Proposition 5.7 or by direct calculation using Definition 5.1.
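To close this subsection, here is a numerical check (ours) of the cutter violation in Example 5.21: the tangent-plane criterion of Theorem 5.11 fails at $u = 0$ once $x > \sqrt{2}$.

```python
import numpy as np

# Example 5.21: f(x) = 1 - exp(-x^2). Theorem 5.11 requires
# f(x) + s(x)*(u - x) <= 0 for u in S = Fix G_f = {0}; it fails for x > sqrt(2).
f = lambda x: 1.0 - np.exp(-x ** 2)
s = lambda x: 2.0 * x * np.exp(-x ** 2)     # f'(x)

x, u = 2.0, 0.0
print(f(x) + s(x) * (u - x))                # about 0.835 > 0: not a cutter
```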

6 Convergence analysis of subgradient projectors

In this section, we study the convergence of sequences generated by subgradient projectors. When the function is convex, the convergence analysis is fairly well known; see, e.g., [47, Section 5.3], [46], [9], and [20]. For nonconvex functions, we demonstrate that the convergence results on cutters, local cutters, quasi-ne mappings, and local quasi-ne mappings can be effectively used. It turns out that local cutters and local quasi-ne mappings are more appropriate for nonconvex functions. In addition to cutters and local cutters (see Definitions 5.1 and 5.4), quasi-nonexpansive mappings and local quasi-nonexpansive mappings are also useful for the convergence analysis.

6.1 Quasi-nonexpansive mappings and local quasi-nonexpansive mappings

According to [7, page 59] and [20, page 47], we define:

Definition 6.1 Let $D$ be a nonempty subset of $\mathbb{R}^n$ and $T : D \to \mathbb{R}^n$. We say that

(i) $T$ is quasinonexpansive (quasi-ne) if

$$(\forall x \in D)(\forall y \in \operatorname{Fix} T) \qquad \|Tx - y\| \leq \|x - y\|.$$

(ii) A mapping $T : D \to D$ is asymptotically regular at $x \in D$ if $T^{k+1}x - T^k x \to 0$ as $k \to \infty$; it is asymptotically regular on $D$ if it is so at every $x \in D$.

Definition 6.1(i) requires that $T$ satisfy the inequalities for all $x \in D$ and $y \in \operatorname{Fix} T$. In practice, the sets $D$ and $\operatorname{Fix} T$ might be too large to verify those inequalities. We now introduce locally quasinonexpansive mappings.

Definition 6.2 A mapping $T : D \to \mathbb{R}^n$ is locally quasinonexpansive (locally quasi-ne) at $\bar{x} \in \operatorname{Fix} T$ if there exists $\delta > 0$ such that

$$(\forall x \in B(\bar{x}, \delta) \cap D)(\forall y \in B(\bar{x}, \delta) \cap \operatorname{Fix} T) \qquad \|Tx - y\| \leq \|x - y\|.$$

The connection between quasi-ne mappings and quasi-fne mappings is given by the following fact.

Fact 6.3 ([9, Proposition 2.3(v)-(vi)], [20]) Let $D$ be a nonempty subset of $\mathbb{R}^n$, and let $T : D \to \mathbb{R}^n$ with $\operatorname{Fix} T \neq \varnothing$. Then the following are equivalent:

(i) $T$ is quasi-fne.

(ii) $2T - \operatorname{Id}$ is quasi-ne.

The following result says that quasi-ne, nonexpansive, and locally quasi-ne are the same for linear mappings. Although the equivalence of quasi-ne and nonexpansive for linear mappings has been given in [7, Exercise 4.4], the equivalence to locally quasi-ne is new.

Proposition 6.4 Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear operator. Then the following are equivalent:

(i) $T$ is quasi-ne.

(ii) $T$ is nonexpansive.

(iii) There exist $\delta > 0$ and $\bar{x} \in \operatorname{Fix} T$ such that $T$ is quasi-ne on $B(\bar{x}, \delta)$.

Proof. (i) $\Rightarrow$ (ii). Since $0 \in \operatorname{Fix} T$, we have $\|Tx\| \leq \|x\|$ for every $x \in \mathbb{R}^n$. Hence $T$ is nonexpansive. (ii) $\Rightarrow$ (i). Clear. (ii) $\Rightarrow$ (iii). Clear. (iii) $\Rightarrow$ (ii). The assumption means that there exist $\bar{x} \in \operatorname{Fix} T$ and $\delta > 0$ such that

$$(44) \qquad (\forall x \in B(\bar{x}, \delta))(\forall y \in B(\bar{x}, \delta) \cap \operatorname{Fix} T) \qquad \|Tx - y\| \leq \|x - y\|.$$

Let $v \in B(0, \delta)$. Using $T\bar{x} = \bar{x}$, $y = \bar{x}$, and the linearity of $T$, from (44) we obtain

$$\|Tv\| = \|T(\bar{x} + v) - T\bar{x}\| = \|T(\bar{x} + v) - \bar{x}\| \leq \|(\bar{x} + v) - \bar{x}\| = \|v\|.$$

Since $v \in B(0, \delta)$ was arbitrary and $T$ is linear, we have $\|Tv\| \leq \|v\|$ for every $v \in \mathbb{R}^n$. Hence $T$ is nonexpansive.

Remark 6.5 Fact 6.3 and Proposition 6.4 hold in Hilbert spaces. We formulate them only in $\mathbb{R}^n$.

The following example illustrates that for nonlinear $T$, quasinonexpansiveness and nonexpansiveness are different.

Example 6.6 Define

$$T : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} \dfrac{x}{2}\sin\dfrac{1}{x} & \text{if } x \neq 0, \\ 0 & \text{if } x = 0. \end{cases}$$

Then $T$ is quasi-ne but not nonexpansive.

Proof. $T$ is quasi-ne because $\operatorname{Fix} T = \{0\}$ and

$$(\forall x \in \mathbb{R}) \qquad |T(x) - 0| = \left| \frac{x}{2}\sin\frac{1}{x} \right| \leq \frac{|x|}{2} \leq |x|.$$

$T$ is not nonexpansive because, for $x > 0$,

$$T'(x) = \frac{1}{2}\sin\frac{1}{x} - \frac{1}{2x}\cos\frac{1}{x}$$

and $|T'(1/(2n\pi))| = n\pi > 1$.

For analogous results on linear cutters, see Proposition 9.1 in Section 9. Although we have developed calculus for $G_f$ being a cutter in Section 5, most results also hold for quasi-ne mappings. We single out the two most important ones.

Theorem 6.7 Let $k \geq 1$ and let $f : \mathbb{R}^n \to [0, +\infty)$ be locally Lipschitz, with $G_{f,s}$ given by Definition 2.2. Suppose that $G_{f,s}$ is quasi-ne and $\operatorname{Fix} G_{f,s} \neq \varnothing$. If $g = f^k$, then $G_{g,kf^{k-1}s}$ is quasi-ne.

Proof. Apply Theorem 3.9 and [7, Exercise 4.11].

Theorem 6.8 Let $A : \mathbb{R}^n \to \mathbb{R}^n$ be unitary and $b \in \mathbb{R}^n$, let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $G_{f,s}$ be given by Definition 2.2. Define $g : \mathbb{R}^n \to \mathbb{R} : x \mapsto f(Ax + b)$. If $G_{f,s}$ is quasi-ne, then $G_{g,A^{\top}s(A\cdot+b)}$ is quasi-ne.

Proof. Apply Proposition 3.7 and Definition 6.1.

Corollary 6.9 Let $f : \mathbb{R}^n \to [0, +\infty)$ be locally Lipschitz, with $G_{f,s}$ given by Definition 2.2. Suppose that $g = f^2$ and $\operatorname{Fix} G_{f,s} \neq \varnothing$. Then $G_{f,s}$ is quasi-ne if and only if $G_{g,2fs}$ is quasi-fne.

Proof. By Theorem 3.9, $G_{g,2fs} = \frac{G_{f,s} + \operatorname{Id}}{2}$. The result then follows from Fact 6.3.

6.2 Convergence of cutters, local cutters, quasi-ne mappings, and local quasi-ne mappings

Proposition 6.10 (convergence of iterates of a cutter) Let $D \subseteq \mathbb{R}^n$ be a nonempty closed convex set, let $T : D \to D$ be an operator with a fixed point, and assume that $T$ has the fixed point closed property on $D$. If $T$ is a cutter, then for every $x \in D$, the sequence $(T^k x)_{k \in \mathbb{N}}$ converges to a point $z \in \operatorname{Fix} T$.

Proof. Since $T$ is a cutter, $T$ is quasi-fne by Fact 5.3(i), hence quasi-ne. Moreover, $T$ is asymptotically regular by [20, Theorem 3.4.3]. The result now follows from [20, Theorem 3.5.2].

Proposition 6.11 (convergence of iterates of a locally quasi-fne mapping) Let $D$ be a nonempty closed convex subset of $\mathbb{R}^n$, let $T : D \to D$, and assume $\operatorname{Fix} T \neq \varnothing$. Assume that

(i) there exist $\bar{x} \in \operatorname{Fix} T$ and $\delta > 0$ such that $T$ is locally quasi-fne at $\bar{x}$ (see Definition 5.5);

(ii) $T$ has the fixed point closed property.

Let $x_0 \in D \cap B(\bar{x}, \delta)$ and set $(\forall k \in \mathbb{N})\ x_{k+1} = Tx_k$. Then $(x_k)_{k \in \mathbb{N}}$ converges to a point $z \in B(\bar{x}, \delta) \cap \operatorname{Fix} T$.

Proof. By assumption (i),

$$(45) \qquad (\forall x \in B(\bar{x}, \delta) \cap D)(\forall y \in B(\bar{x}, \delta) \cap \operatorname{Fix} T) \qquad \|Tx - y\|^2 + \|Tx - x\|^2 \leq \|x - y\|^2.$$

With $x_0 \in D \cap B(\bar{x}, \delta)$, equation (45) gives

$$(\forall y \in B(\bar{x}, \delta) \cap \operatorname{Fix} T) \qquad \|Tx_0 - y\|^2 + \|Tx_0 - x_0\|^2 \leq \|x_0 - y\|^2,$$

so $\|x_1 - \bar{x}\| \leq \|x_0 - \bar{x}\| \leq \delta$. By induction, we have that

$$(46) \qquad (x_k)_{k \in \mathbb{N}} \text{ is a sequence in } B(\bar{x}, \delta).$$

Moreover, equation (45) gives

$$(\forall k \in \mathbb{N})(\forall y \in B(\bar{x}, \delta) \cap \operatorname{Fix} T) \qquad \|x_{k+1} - y\|^2 + \|x_{k+1} - x_k\|^2 \leq \|x_k - y\|^2,$$

so $(x_k)_{k \in \mathbb{N}}$ is Fejér monotone with respect to $C := B(\bar{x}, \delta) \cap \operatorname{Fix} T$, and $x_{k+1} - x_k \to 0$ as $k \to \infty$. Let $x$ be a cluster point of $(x_k)_{k \in \mathbb{N}}$, say $x_{k_l} \to x$. Since $Tx_{k_l} - x_{k_l} \to 0$ and $T$ is fixed point closed, we have $Tx - x = 0$. Moreover, $\|x - \bar{x}\| \leq \delta$ because of (46). Thus $x \in C$. Applying [7, Theorem 5.5], we conclude that $x_k \to z \in C$.
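The following sketch (ours) illustrates Proposition 6.10 on a convex example: for $f(x) = \|x\|^2 - 1$, $G_f$ is a cutter by Fact 5.12, and its iterates converge to a point of $\operatorname{lev}_{\leq 0} f$.

```python
import numpy as np

# Iterating a cutter (Proposition 6.10): for the convex f(x) = ||x||^2 - 1,
# the iterates of G_f reach lev_{<=0} f, the closed unit ball.
def G(x):
    fx = np.dot(x, x) - 1.0
    return x if fx <= 0 else x - (fx / (4.0 * np.dot(x, x))) * (2.0 * x)

x = np.array([4.0, 3.0])
for k in range(60):
    x = G(x)
print(np.linalg.norm(x))     # ~1.0: the limit lies in Fix G_f
```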

30 Proposition 6.12 (convergence of iterates of a locally quasi-ne mapping) Let D be nonempty closed convex subset of R n, let T : D D and int(fix T) =. Assume that (i) There exists x Fix T and δ > 0 such that T is locally quasi-ne (see Definition 6.2); (ii) int(b( x, δ) Fix T) = ; (iii) T has the fixed-point closed property. Let x 0 D B( x, δ). Set ( k N) x k+1 = Tx k. Then (x k ) k N converges to a point z B( x, δ) Fix T. Proof. By assumption (i), there exists δ > 0 such that (47) ( x B( x, δ) D) ( y B( x, δ) Fix T) Tx y x y ; With x 0 D B( x, δ), equation (47) gives ( y B( x, δ) Fix T) Tx 0 y x 0 y, so x 1 x x 0 x δ. By induction, we have that (48) (x k ) k N is a sequence in B( x, δ). Moreover, equation (47) gives ( k N)( y B( x, δ) Fix T) Tx k y x k y, so (x k ) k N is Fejér monotone with respect to C := B( x, δ) Fix T. As int C =, we have that x k z R n by [7, Proposition 5.10]. This implies that Tx k x k = x k+1 x k 0 and x k z as k. Since T is fixed-point closed, we have Tz z = 0. Moreover, z x δ because of (48). Hence x k z B( x, δ) Fix T. 6.3 Applications to subgradient projectors Theorem 6.13 Let f : R n R be locally Lipschitz, G f,s be given by Definition 2.2, and S := x R n 0 f (x)} lev 0 f =. If the subgradient projector G f,s is a cutter, then for every x R n, the sequence (G f,s k x) k N converges to a point z such that either 0 f (z) or f (z) 0. Proof. Combine Theorem 4.9 and Proposition To proceed, it will be convenient to single out: Lemma 6.14 Let f : R n R be lsc and subdifferentiable, and G f,s be given by Definition 2.2. When f (x) > 0, 0 f (x), and y R n, we have (49) G f,s (x) y 2 = x y 2 + f (x) s(x) 2 ( f (x) + 2 y x, s(x) ). 30

31 Proof. This follows from G f,s (x) y 2 = x y f (x) s(x) 2 s(x) = x y 2 + f 2 (x) s(x) x y, f (x) s(x) 2 s(x). Theorem 6.15 Let f : R n R be locally Lipschitz, G f,s be given by Definition 2.2, and Then the following hold: (i) G f,s is quasi-ne if and only if S := x R n 0 f (x)} lev 0 f. (50) ( x S) ( y S) f (x) + 2 y x, s(x) 0. (ii) Assume that int S =, and (50) holds. Then for every x R n, the sequence (G f,s k x) k N converges to a point z S. Proof. (i). By Lemma 6.14, when x S, and y S, we have (51) G f,s (x) y 2 = x y 2 + f (x) s(x) 2 ( f (x) + 2 y x, s(x) ). In view of Definition 6.1, assumption (50) is equivalent to G f,s being quasi-ne. (ii). By (i), the sequence (G f,s k x) k N is Fejér monotone with respect to S. Since int S =, by [7, Proposition 5.10], the sequence (G f,s k x) k N converges to a point z R n. Write x k = G f,s k x. Then ( k N) x k+1 = G f,s (x k ). As f is locally Lipschitz at z and x k z, the sequence (s(x k )) k N is bounded. Since x k+1 x k = f (x k) s(x k ) 2 s(x k) and lim k x k = z, we have Hence z S. f (z) = lim k f (x k ) = lim k x k+1 x k s(x k ) = 0. Theorem 6.16 Let f : R n R be locally Lipschitz, and G f,s be given by Definition 2.2, and S := x R n 0 f (x)} lev 0 f =. Assume that the subgradient projector G f,s is locally quasi-fne at x S, i.e., there exists δ > 0 such that (52) ( x B( x, δ) \ S) ( y B( x, δ) S) f (x) + y x, s(x) 0. Then for every x 0 B( x, δ), the sequence (x k ) k N defined by ( k N) x k+1 = G f,s (x k ) converges to a point z B( x, δ) S. 31

32 Proof. (52) guarantees that G f,s is locally quasi-fne at x S. Indeed, when x S and y S B( x, δ), using Lemma 6.14, (52) and (13), we have (53) (54) (55) (56) G f,s (x) y 2 = x y 2 + f (x) ( ) f (x) + 2 y x, s(x) s(x) 2 = x y 2 + f 2 (x) 2( f (x) + y x, s(x) ) f (x) s(x) 2 f (x) ( ) f (x) x y 2 + G f (x) x 2 f (x) x y 2 G f,s (x) x 2. In view of Theorem 4.9, it suffices to apply Proposition Theorem 6.17 Let f : R n R be locally Lipschitz, G f,s be given by Definition 2.2, and S := x R n 0 f (x)} lev 0 f with int S =. Assume that there exist x S and δ > 0 such that (57) ( x B( x, δ) \ S) ( y B( x, δ) S) f (x) + 2 y x, s(x) 0. Assume further that int(b( x, δ) S) =. Then for every x 0 B( x, δ), the sequence (x k ) k N defined by converges to a point z B( x, δ) S. ( k N) x k+1 = G f,s (x k ) Proof. (57) guarantees that G f,s is locally quasi-ne at x. Indeed, for every x B( x, δ) \ S and y B( x, δ) S, using Lemma 6.14 and (57) we have (58) (59) G f,s (x) y 2 = x y 2 + x y 2. f (x) ( ) f (x) + 2 y x, s(x) s(x) 2 By Theorem 4.9, G f,s has the fixed point closed property. Therefore, Proposition 6.12 applies. Example 6.18 (1). Define f : R R : x 1 x if x 1, 0 otherwise. Because Fix G f,s = x R x 1 } is not convex, we have that G f,s is not a cutter. However, f satisfies the assumptions of both Theorems 6.16 and 6.17 so that the local convergence theory applies. (2). Define 0 if x 0, x if 0 x 1, f : R R : x 1 if 1 x 2, x 1 if x 2. As Fix G f,s = (, 0] [1, 2], G f,s is not a cutter. However, both Theorems 6.16 and 6.17 apply. 32

33 6.4 Finite convergence and (C, ε)-firmly nonexpansiveness Finite termination algorithms for subgradient projectors of convex functions have been studied in [46, 29, 13]. Recently, in [43] Pang studied finite convergent algorithms of subgradient projectors of locally Lipschitz functions defined in terms of the Clarke subdifferential. Naturally, one asks what his result implies about the subgradient projector defined by us. To this end, let us recall lower-c k functions defined by Rockafellar and Wets [51, Definition 10.29], and approximate convex functions by Nghai, Luc, and Théra [41], respectively. Definition 6.19 A function f : O R, where O is an open subset in R n, is said to be lower C k on O if on some neighborhood V of each x O there is a representation f (x) := max t T f t (x) in which f t is of class C k on V and the index set T is compact such that f t (x) and all its partial derivatives through order k depend continuously not just on x V but jointly on (t, x) T V. Definition 6.20 A function f : R n R is approximately convex at x R n if for every ε > 0 there exists δ > 0 such that ( x, y B( x, δ))( λ (0, 1)) f (λx + (1 λ)y) λ f (x) + (1 λ) f (y) + ελ(1 λ) x y. Fact 6.21 (See [2, Theorem 4.5], [26, Corollary 3]) Let f : R n R be locally Lipschitz at x. Then the following are equivalent: (i) f is lower-c 1 around x. (ii) f is approximately convex at x. (iii) for every ε > 0 there exists δ > 0 such that ( x, y B( x, δ))(x c f (x)) f (y) f (x) + x, y x ε x y. Theorem 6.22 (finite convergence for accelerated subgradient projectors) Let f : R n R be locally Lipschitz, and let x R n satisfy (i) f ( x) = 0; (ii) 0 f ( x); (iii) f is lower-c 1 around x. Suppose that the strictly decreasing sequence (ε k ) k N converges to 0 at a sublinear rate. Then there exist δ > 0 and ε > 0 such that for every x 0 B( x, δ) and ε 0 < ε, the sequence (x k ) k N defined by (60) ( k N) x k+1 = x k ε k + f (x k ) s k 2 s k, where s k f (x k ), converges in finitely many iterations, i.e., f (x k ) 0 for some k N. Proof. Since f is lower-c 1 around x, f is Clarke regular around x; see [51, Theorem 10.31]. Thus, the Clarke subdifferential and the limiting subdifferential of f are the same around x. Because f is upper semicontinuous, when δ is sufficiently small, (ii) guarantees that 0 f (x) for every x B( x, δ), which implies that (60) is well defined. By Fact 6.21, (iii) is equivalent to f being approximately convex at x. The result then follows from [43, Theorem 3]. 33

34 Remark 6.23 See [2, 41, 52] for more characterizations on lower-c 1 functions and approximately convex functions. Let ε 0 and C R n. In [30], Hesse and Luke studied (C, ε)-firmly nonexpansive mappings; see also [37]. Definition 6.24 Let C, D be nonempty subsets of R n and T : D R n. T is called (C, ε)-firmly nonexpansive if ( x D)( y C) Tx Ty 2 + (x Tx) (y Ty) 2 (1 + ε) x y 2. Theorem 6.25 ((C, ε)-firmly nonexpansivness of G f,s ) Let f : R n R be locally Lipschitz, G f,s be given by Definition 2.2, and S := x R n 0 f (x)} lev 0 f. Suppose that x R n satisfies (i) f ( x) = 0; (ii) 0 f ( x); (iii) f is lower-c 1 around x. Then for every ε > 0 there exists δ > 0 such that on B( x, δ) the subgradient projector G f,s is (S B( x, δ), ε)-firmly nonexpansive, in which ε = 1 + 8Lε/d f ( x) (0) 2 and L being the Lipschitz modulus of f around x. Proof. Let α := d f ( x) (0)/2. Then α > 0 by (ii). For every ε > 0, we can find δ > 0 such that (61) f (y) f (x) + x, y x ε x y, when x, y B( x, δ), x f (x). This follows from (iii) and Fact s(x) α whenever s(x) f (x) and x B( x, δ). This is because that f ( x) is compact, f is upper semicontinuous, and (ii). f (x) f (y) L x y whenever x, y B( x, δ). This is possible because f is locally Lipschitz around x. Since 0 f (y) for y B( x, δ), we must have f (y) 0 if y S B( x, δ)). Thus, (61) gives (62) ( x B( x, δ))( y S B( x, δ))( x f (x)) f (x) + x, y x ε x y. Put C := S B( x, δ). When f (x) > 0, 0 f (x), and y S B( x, δ), using Lemma 6.14, (62), and (13), we have (63) (64) G f,s (x) y 2 = x y 2 + f (x) ( ) f (x) + 2 y x, s(x) s(x) 2 = x y 2 + f 2 (x) 2( f (x) + y x, s(x) ) f (x) s(x) 2 f (x) 34

35 (65) (66) (67) (68) (69) This completes the proof. x y f (x) ( ) f (x) s(x) 2 ε y x + G f,s(x) x 2 f (x) x y 2 2( f (x) f (y)) + α 2 ε x y G f,s (x) x 2 x y 2 2L x y + α 2 ε x y G f,s (x) x 2 = x y 2 + 2Lε α 2 x y 2 G f,s (x) x 2 (1 + ε) x y 2 G f,s (x) x 2. Remark 6.26 Observe that both Theorems 6.22 and 6.25 aim for solving nonconvex inequality problems, e.g., finding a point x such that f (x) 0 with f (x) = e x2 + 1/2 and f satisfying the assumptions at x = ln 2. However, they do not apply to f = d C. This completes Part I. In Part II, we will study subgradient projectors of Moreau envelopes, and their connections to subgradient projectors of original functions. Part II Subgradient projectors of Moreau envelopes and characterizations 7 Subgradient projectors of Moreau envelopes When f : R n (, + ] is lsc, f (x) might be empty for some x R n. However, e λ f has much better properties when f is prox-bounded, see, e.g., Fact 7.5 below; and this is the motivation for us to study subgradient projectors of Moreau envelopes below. To do this, we need to study the relationship between G eλ f and G f,s. Recall that for a proper, lsc function f : R n (, + ] and parameter value λ > 0, the Moreau envelope e λ f and proximal mapping P λ f are defined respectively by e λ f : R n (, + ] : x inf f (w) + 1 } x w 2, and w 2λ P λ f : R n R n : x argmin w f (w) + 1 x w 2 2λ When f is proper, lsc, and convex, we refer the reader to [7, Chapter 12] and [50] for the properties e λ f and P λ f. When f is a proper and lsc function, not necessarily convex, in [45] Poliquin and Rockafellar coined the notions of prox-boundedness and prox-regularity of functions; see also [51, page 610]. Definition 7.1 (i) A function f : R n (, + ] is prox-bounded if there exists λ > 0 such that e λ f (x) > for some x R n. The supremum of the set of all such λ is the threshold λ f of prox-boundedness for f. 35 }.

36 (ii) A function f : R n (, + ] is prox-regular at x for v if f is finite and locally lsc at x with v f ( x), and there exists ε > 0 and ρ 0 such that f (x ) f (x) + v, x x ρ 2 x x 2 for all x x ε when v f (x), v v < ε, x x < ε, f (x) < f ( x) + ε. When this holds for all v f ( x), f is said to be prox-regular at x. We give a simple example to illustrate the concepts of prox-regularity and prox-bounded of functions. Example 7.2 (1). The function f : R R : x x is prox-bounded with λ f = +. However, f is not prox-regular at x = 0. (2). The function f : R R : x x 3 is prox-regular on R. However, f is not prox-bounded. (3). The function f : R R : x x 2 /2 is prox-regular on R, and prox-bounded with λ f = 1. (4). The function f : R R : x x 3 x not prox-regular at x = 0, and not prox-bounded. In the sequel, we shall also need the following key concepts. Definition 7.3 ([51, page 614]) Let O be a nonempty open subset of R n. We say that f : O R is C 1+ if f is differentiable with f Lipschitz continuous. Set q : R n R : x x 2 /2. Definition 7.4 ([51, page 567]) A proper, lsc function f : R n (, + ] is µ-hypoconvex for some µ > 0 if f + µ 1 q is convex. 7.1 Fine properties of prox-regular functions Two major facts about the Moreau envelopes of prox-bounded functions and prox-regular functions are: Fact 7.5 ([51, Example 10.32]) Let f : R n (, + ] be proper, lsc, and prox-bounded with threshold λ f. Then for every λ (0, λ f ), the function e λ f is lower C 2, hence semidifferentiable, locally Lipschitz, Clarke regular, and [ e λ f ](x) = λ 1 [conv P λ f (x) x], = [e λ f ](x) λ 1 [x P λ f (x)]. Fact 7.6 ([5, Proposition 5.3], [51, Proposition 13.37]) Let f : R n (, + ] be lsc, proper, and prox-bounded with threshold λ f. Suppose that f is prox-regular at x for v f ( x). Then for all λ (0, λ f ) there is a neighborhood U λ of x + λ v for which the following equivalent properties hold: (i) e λ f is C 1+ on U λ. (ii) P λ f is nonempty, single-valued, monotone and Lipschitz continuous on U λ. Further, e λ f = (Id P λ f )/λ on U λ. 36

37 Proposition 7.7 Let f : R n (, + ] be proper, lsc, and prox-bounded with threshold λ f. Then for every λ (0, λ f ), one has dom P λ f = R n. Consequently, ran(id +λ f ) = R n. Proof. As 0 < λ < λ f, we have dom P λ f = R n. To complete the proof, it suffices to apply [51, Example 10.2]: P λ f (Id +λ f ) 1. Proposition 7.8 (global prox-regularity implies hypoconvexity) Let f : R n (, + ] be proper, lsc, and prox-bounded with threshold λ f. Suppose that f is prox-regular on R n. Then for every λ (0, λ f ), the following hold: (i) The function f + λ 1 q is convex. (ii) P λ f = (Id +λ f ) 1 is single-valued and Lipschitz continuous on R n. (iii) e λ f = (Id P λ f )/λ. Proof. When λ (0, λ f ), we have dom P λ f = R n. Since f is prox-regular, by Fact 7.6, for v f (x) there exists an open neighborhood U λ of x + λv such that P λ f is single-valued and locally Lipschitz. Proposition 7.7 implies that P λ f is single-valued and locally Lipschitz on R n. As P λ f is always monotone, cf. [51, Proposition 12.19], P λ f is maximally monotone by [51, Example 12.7]. Then (i) and (ii) follow from [51, Proposition 12.19]. To obtain (iii), one can apply Fact 7.6(ii).. Proposition 7.8(i) immediately implies: Corollary 7.9 Let f : R n (, + ] be proper, lsc, and prox-bounded. Suppose that f is prox-regular on R n. Then the function f is a difference of two convex functions. Characterizations of prox-regularity on an open subset is given by Fact 7.10 ([51, Theorem 10.33], [51, Proposition 13.33]) Let f : O R, where O is a nonempty open set in R n. The following are equivalent: (i) The function f is lower C 2 on O. (ii) Relative to some neighborhood of each point of O, there is an expression f = g ρ q in which g is finite, convex function, and ρ > 0. (iii) f is prox-regular and locally Lipschitz on O. Corollary 7.11 Let f : R n (, + ] and let O be a nonempty open subset of R n. Suppose that f is prox-regular and locally Lipschitz on O. Then for every compact convex subset S of O, there exists ρ > 0 such that f + ρ q is convex on S. Proof. Let x S. By Fact 7.10, there exists an open ball B(x, δ x ) O and ρ x such that f + ρ x q is convex on B(x, δ x ). Select from the covering of S by various balls B(x, δ x ) a finite covering, say B(x i, δ xi ) with i = 1,..., m. Let ρ := maxρ x1,..., ρ xm }. As f + ρ q is convex on each B(x i, δ xi ), and S B(x i, δ xi ), we obtain that f + ρ q is convex on S. 37

38 7.2 Relationship among ( e λ f ) 1 (0), Fix P λ f and ( f ) 1 (0) Proposition 7.12 Let f : R n (, + ] be proper, lsc, and prox-bounded with threshold λ f. Then for every λ (0, λ f ), the following hold: (i) For every α R, the level set lev α f = if and only if lev α (e λ f ) =. Moreover, lev α (e λ f ) lev α f. (ii) 0 e λ f (x) x P λ f (x) 0 f (x). (iii) If, in addition, f is prox-regular at x for v f ( x), then on a neighborhood U λ of x + λ v one has Proof. When f is prox-regular on R n, one has 0 = e λ f (x) x = P λ f (x). ( x R n ) 0 = e λ f (x) x = P λ f (x) 0 f (x). (i). Since inf f = inf e λ f and argmin f = argmin e λ f by [51, Example 1.46], lev α f = if and only if lev α (e λ f ) = for every α R. The inclusion follows from e λ f f. (ii). By Fact 7.5, we have e λ f (x) λ 1 [x P λ f (x)]. This gives the first implication. By [51, Example 10.2], P λ f (x) (Id +λ f ) 1 (x) for all x R n. The second implication follows. (iii). By Fact 7.6 or [45, Theorem 4.4], the Moreau envelope e λ f is C 1+ on a neighborhood U λ of x + λ v with e λ f = λ 1 [Id P λ f ] on U λ. When f is prox-regular on R n, one has e λ f = λ 1 [Id P λ f ], and P λ f = (Id +λ f ) 1 is single-valued on R n by Proposition 7.8. Fact 7.13 [51, Proposition 12.19] For a proper, lsc function f : R n (, + ], assume that f is µ-hypoconvex for some µ > 0. Then P µ f = (Id +µ f ) 1, and for all λ (0, µ) the mapping P λ f = (Id +λ f ) 1 is Lipschitz continuous with constant µ/[µ λ]. Under the assumption of f being µ-hypoconvex for some µ > 0, when λ > 0 is sufficiently small e λ f gives rise to a smooth regularization of f. Proposition 7.14 For a proper, lsc function f : R n (, + ], assume that f is µ-hypoconvex for some µ > 0. Then for every λ (0, µ), the following hold: (i) e λ f is C 1+ and e λ f = λ 1 (Id P λ f ) on R n. (ii) e λ f (x) = 0 0 f (x). Proof. As f is µ-hypoconvex, f is prox-regular and prox-bounded. By Fact 7.13 and Fact 7.6, e λ f = λ 1 [Id P λ f ] = λ 1 [Id (Id +λ f ) 1 ]. Remark 7.15 Proposition 7.14(ii) can also been obtained from [33, Theorem 4.4], in which the authors study the Bregman envelope and proximal mapping of proper, lsc, and prox-bounded functions. 38

39 Proposition 7.16 For a proper, lsc function f : R n (, + ], assume that f := max f 1,..., f m } with f i being C 2 and that f is prox-bounded below. Then for every λ > 0 sufficiently small, one has 0 = e λ f (x) x = P λ f (x) 0 f (x). Proof. By [51, Proposition 13.33] or [45, Example 2.9], f is prox-regular everywhere on R n. By Proposition 7.8, P λ f = (Id +λ f ) 1. By Fact 7.6, we have e λ f = λ 1 [Id P λ f ]. It remains to apply Proposition 7.12(iii), and P λ f = (Id +λ f ) 1 being single-valued. For a sequence C k } k N of subsets of R n, its limit and outer limit are denoted respectively by lim k C k and lim sup k C k ; see [51, page 109]. Proposition 7.17 Let f : R n (, + ] be proper, lsc, and prox-bounded with threshold λ f > 0. Assume that C R n is nonempty and closed. Then for every α R one has lim λ 0 lev α (e λ f + ι C ) = lev α ( f + ι C ). Proof. In view of [51, Theorem 7.4(d)], e λ f + ι C converges epigraphically to f + ι C when λ 0. By [51, Proposition 7.7], for every α R there exists α λ α such that lim λ 0 lev αλ (e λ f + ι C ) = lev α ( f + ι C ). Since lev α ( f + ι C ) lev α (e λ f + ι C ) lev αλ (e λ f + ι C ), we obtain lim λ 0 lev α (e λ f + ι C ) = lev α ( f + ι C ). 7.3 The subgradient projector of e λ f The following result extends [12, Proposition 3.1(viii)] and [11, Example 4.9(ii)] from convex functions to possibly nonconvex functions. Theorem 7.18 (subgradient projector of Moreau envelopes of a prox-regular function) Suppose that f : R n (, + ] is proper, lsc, and prox-bounded with threshold λ f, and that f is prox-regular. Then for every λ (0, λ f ), the subgradient projector of e λ f is given by x e λ λ f (x) G eλ f : R n R n (x P : x x P λ f (x) 2 λ f (x)) x if e λ f (x) > 0 and x = P λ f (x), otherwise, and Fix G eλ f = lev 0 (e λ f ) x R n x = P λ f (x)}. When x = P λ f (x), we have 0 f (x). Moreover, lim λ 0 lev 0 (e λ f ) = lev 0 f. Proof. Apply Propositions 7.12, 7.8, and 7.17 with C = R n and α = 0. The restriction of G f to a subset D R n is denoted by G f D and is the operator defined by G f D : D R n, G f D (x) = G f (x) for every x D. Theorem 7.19 (functions being prox-regular at the critical point) Suppose that f : R n (, + ] is proper, lsc, and prox-bounded with threshold λ f, and that f is prox-regular at x for 0 f ( x). Then for every λ (0, λ f ), there exists a closed neighborhood U λ of x for which x e λ λ f (x) (x P (70) ( x U λ ) G eλ f Uλ (x) = x P λ f (x) 2 λ f (x)) x if e λ f (x) > 0 and x = P λ f (x), otherwise, 39

40 (71) Fix G eλ f Uλ = ( lev 0 (e λ f ) x R n x = P λ f (x)} ) U λ, and (72) x = P λ f (x) 0 f (x). Moreover, lim sup λ 0 ( lev0 (e λ f ) U λ ) lev0 f. Proof. Apply Fact 7.6 and Proposition 7.12(iii) to obtain (70) (72). Since ( lev0 (e λ f ) U λ ) lev0 (e λ f ), and lim lev 0 (e λ f ) = lev 0 (e 1/k f ) λ 0 k 1 by [51, Exercise 4.3(b)], it suffices to use Proposition 7.17 with C = R n and α = 0. Theorem 7.20 (subgradient projector of Moreau envelopes of a hypoconvex function) Suppose that f : R n (, + ] is proper and lsc, and that f is µ-hypoconvex for some µ > 0. Then for every λ (0, µ), the subgradient projector of e λ f is given by x e λ λ f (x) G eλ f : R n R n (x P : x x P λ f (x) 2 λ f (x)) x if e λ f (x) > 0 and 0 f (x), otherwise, Moreover, lim λ 0 lev 0 (e λ f ) = lev 0 f. Fix G eλ f = lev 0 (e λ f ) x R n 0 f (x)}, and x R n 0 = e λ f (x)} = x R n 0 f (x)}. Proof. Apply Propositions 7.14 and 7.17 with C = R n and α = 0. Theorems 7.18 and 7.20 imply that if one can solve x λ Fix(G eλ f ), then either 0 f (x λ ) for some λ > 0 or the subsequential limits of (x λ ) will lie in lev 0 f when λ 0. Remark 7.21 Moreau envelopes of nonconvex functions in infinite dimensional spaces have been intensively studied; see, e.g., [5, 6, 32, 3]. Thus, it is possible to have analogues of Theorems 7.18, 7.19, 7.20 in infinite dimensional spaces. However, this is beyond the scope of this paper. Cutters are important for studying convergence of iterative methods; see, e.g., [9, 20, 13]. It is natural to ask whether G eλ f is a cutter in the case that G f is a cutter. Although we cannot answer this in general, the following special case is true. Proposition 7.22 Let f : R n (, + ] be proper, lsc, and prox-regular. Suppose that min f = 0, f is strictly differentiable at every x argmin f, and that 0 f (x) for every x R n \ argmin f. Then for every λ > 0 the following hold: (i) Fix G eλ f = Fix G f. (ii) If G f,s is a cutter for every selection s of f, then G eλ f is a cutter. 40

41 Proof. As min f >, the function f is prox-bounded with threshold r f = +. (i). Note that min f = min e λ f, argmin f = argmin e λ f. The assumption min f = 0 implies lev 0 f = lev 0 e λ f = argmin f. Because f is prox-regular on R n and r f = +, for every λ > 0 we have e λ f = λ 1 (Id Prox λ f ) and Prox λ f = (Id +λ f ) 1 being single-valued by Proposition 7.8. This gives x R n e λ f (x) = 0 } = x R n 0 f (x) }. Then Fix G eλ f = lev 0 e λ f x R n e λ f (x) = 0 } = lev 0 f x R n 0 f (x) } = Fix G f. (ii). Assume that e λ f (x) > 0 and 0 = e λ f (x). Since f is prox-regular, e λ f (x) = λ 1 (x Prox λ f (x)) = 0. By the definition of Prox λ f, we have (73) 0 = λ 1 (x Prox λ f (x)) f (Prox λ f (x)), which implies that Prox λ f (x) argmin f. Indeed, if Prox λ f (x) argmin f, the assumption gives f (Prox λ f (x)) = 0} which contradicts (73). Thus, f (Prox λ f (x)) > 0. Because Prox λ f (x) argmin f, the assumption also gives 0 f (Prox λ f (x)). These arguments, (i), Theorem 5.11, and G f,s being a cutter altogether imply that (74) f (Prox λ f (x)) + λ 1 (x Prox λ f (x)), u Prox λ f (x) 0 if u Fix G f,s, e λ f (x) > 0 and e λ f (x) = 0. Now we show that G eλ f is a cutter. Let u Fix G eλ f, e λ f (x) > 0, and e λ f (x) = 0. In view of (74) and (i), we calculate e λ f (x) + λ 1 (x Prox λ f (x)), u x = e λ f (x) + λ 1 (x Prox λ f (x)), u Prox λ f (x) + λ 1 (x Prox λ f (x)), Prox λ f (x) x = f (Prox λ f (x)) + 1 2λ x Prox λ f (x) 2 + λ 1 (x Prox λ f (x)), u Prox λ f (x) λ 1 x Prox λ f (x)) 2 = f (Prox λ f (x)) + λ 1 (x Prox λ f (x)), u Prox λ f (x) 1 2λ x Prox λ f (x)) 2 1 2λ x Prox λ f (x)) 2 0. Theorem 5.11(i) concludes the proof. A local version of Theorem 7.22 comes as follows. Proposition 7.23 Let f : R n (, + ] be proper, lsc, and prox-regular at x for v = 0, and let S := x R n 0 f (x) } lev 0 f. Suppose that min f = 0, and there exists δ > 0 such that (i) For every selection s of f, G f,s is a cutter on B( x, δ), i.e., ( x B( x, δ) \ S)( u S B( x, δ)) f (x) + s(x), u x 0. 41

42 (ii) f is strictly differentiable at every u argmin f B( x, δ), and that 0 f (x) for every x (R n \ argmin f ) B( x, δ). Then for every λ > 0 there is a neighborhood of x on which G eλ f is a cutter. Proof. Because min f >, the function f is prox-bounded with threshold r f = +. Since min f = min e λ f and argmin f = argmin e λ f, the assumption min f = 0 implies lev 0 f = lev 0 e λ f = argmin f. Because f is prox-regular at x for v = 0, and r f = +, by Proposition 7.6 for every λ > 0 there exists δ > δ 1 > 0 such that on B( x, δ 1 ) the proximal mappings (75) P λ f is Lipschitz continuous, P λ f ( x) = x, and (76) e λ f = λ 1 (Id Prox λ f ). By (75) there exists δ 1 > δ 2 > 0 such that (77) P λ f (x) B( x, δ 1 ) when x B( x, δ 2 ). Claim 1. For every u Fix G f,s B( x, δ 2 ) we have (78) f (Prox λ f (x)) + λ 1 (x Prox λ f (x)), u Prox λ f (x) 0 if e λ f (x) > 0, e λ f (x) = 0 and x B( x, δ 2 ). Indeed, let e λ f (x) > 0 and 0 = e λ f (x) and x B( x, δ 2 ). In view of (76), e λ f (x) = λ 1 (x Prox λ f (x)) = 0. By the definition of Prox λ f or [45, Proposition 4.3(b)], we have (79) 0 = λ 1 (x Prox λ f (x)) f (Prox λ f (x)). This implies that Prox λ f (x) argmin f. Suppose to the contrary that Prox λ f (x) argmin f. Then the assumption (ii) and (77) give f (Prox λ f (x)) = 0} which contradicts (166). Thus, (80) f (Prox λ f (x)) > 0. Because Prox λ f (x) argmin f and (77), the assumption (ii) also ensures (81) 0 f (Prox λ f (x)). Therefore, (78) follows from assumptions (i) and (ii). Claim 2. G eλ f is a cutter on B( x, δ 2 ). To this end, let u Fix G eλ f B( x, δ 2 ), x B( x, δ 2 ), e λ f (x) > 0, and e λ f (x) = 0. Then u Fix G f B( x, δ 2 ), f (Prox λ f (x)) > 0, 0 f (Prox λ f (x)) by (80), (81). Using (78) we calculate e λ f (x) + λ 1 (x Prox λ f (x)), u x = e λ f (x) + λ 1 (x Prox λ f (x)), u Prox λ f (x) + λ 1 (x Prox λ f (x)), Prox λ f (x) x = f (Prox λ f (x)) + 1 2λ x Prox λ f (x) 2 + λ 1 (x Prox λ f (x)), u Prox λ f (x) 42

43 λ 1 x Prox λ f (x)) 2 = f (Prox λ f (x)) + λ 1 (x Prox λ f (x)), u Prox λ f (x) 1 2λ x Prox λ f (x)) 2 0. Hence, G eλ f is a cutter on B( x, δ 2 ) by Theorem 5.11(ii). 1 2λ x Prox λ f (x)) 2 Is it possible that G eλ f is a cutter for every λ > 0 but G f is not a cutter? This is partially answered by the following result. Proposition 7.24 Let f : R n R be C 2 and prox-bounded below. If G eλ f is a cutter for all sufficiently small λ > 0, then G f is a cutter. Proof. Let λ > 0 be sufficiently small. Proposition 7.8 yields that e λ f is C 1+. Write S := x R n 0 f (x) } lev 0 f, S λ := x R n e λ f (x) = 0 } lev 0 e λ f. Using that e λ f (x) = 0 0 f (x) by Proposition 7.16 and that e λ f f, we have S S λ. Since G eλ f is a cutter, by Theorem 5.11 we obtain S λ u R n eλ f (x) + e λ f (x), u x 0 } whenever e λ f (x) > 0 and e λ f (x) = 0. It follows that (82) S u R n e λ f (x) + e λ f (x), u x 0 } whenever e λ f (x) > 0 and e λ f (x) = 0. By [3, Theorem 3.10] or [32, Theorem 5.1], (83) f (x) = lim sup e λm f (x m ) m in which x m x, e λm f (x m ) f (x), λ m 0. Whenever f (x) > 0, f (x) = 0, (83) implies that for sufficiently large m, it holds that e λm f (x m ) > 0 and e λm f (x m ) = 0. Then by (82), Passing to the limit when m, we have ( u S) e λm f (x m ) + e λm f (x m ), u x m 0. ( u S) f (x) + f (x), u x 0. Hence, G f is a cutter by using Theorem 5.11 again. 7.4 The subgradient projector of d C when C is prox-regular at a point In this subsection, instead of functions we shall consider sets which are prox-regular at some points. Recall that a set C R n is prox-regular at x C for v N C ( x) when ι C is prox-regular at x for v; see [51, Exercise 13.31]. Example 7.25 Let C R n be closed and x C. If C is prox-regular at x for v = 0, then there exists a neighborhood U of x on which 43

44 (i) P C is single-valued and Lipschitz; (ii) P C = (Id +T) 1 for some localization T of N C around ( x, 0); (iii) d C is strictly differentiable on U \ C with d C = Id P C d C ; (iv) G dc = P C ; (v) G d 2 C = Id +P C 2. Proof. (i), (ii), and (iii) are given in [51, page 618]. To see (iv), let x U \ C. Since d C (x) > 0 and d C (x) = x P C(x) d C (x) = 0, we have G dc (x) = x d C(x) d C (x) 2 d C(x) = x (x P C (x)) = P C (x). When x U C, G dc (x) = x = P C (x). (v) follows from (iv) and Theorem 3.9. Remark 7.26 Sets which satisfy the assumption on C in Theorem 7.25 include convex sets, strongly amenable sets, etc; see, e.g., [51, page 442]. See also [1] for recent advances on proxregular sets and uniformly prox-regular sets. According to Example 7.25(iv), when C is prox-regular at x for v = 0, we have G dc = P C around a neighborhood of x. What happens if, in addition, G dc is a cutter or quasi nonexpansive on the neighborhood? Proposition 7.27 Let C R n be closed and x C, and let C be prox-regular at x for v = 0. Suppose that there exists δ > 0 such that one of the following holds: (i) P C is a cutter on B( x, δ), i.e., (84) ( x B( x, δ))( u C B( x, δ)) x P C (x), u P C (x) 0. (ii) P C is quasi-ne on B( x, δ), i.e., (85) ( x B( x, δ))( u C B( x, δ)) P C (x) u x u. Then C B( x, δ) is convex. Proof. By Example 7.25, there exists δ > 0 such that P C is single-valued and Lipschitz on the closed ball B( x, δ). (i). Assume that (84) holds. On the one hand, (84) gives that C B( x, δ) x B( x,δ) u B( x, δ) x P C (x), u P C (x) 0 }. 44

45 On the other hand, let y x B( x,δ) u B( x, δ) x P C (x), u P C (x) 0 }. Then y B( x, δ) and x P C (x), y P C (x) 0 for every x B( x, δ). Taking x = y we have y P C (y), y P C (y) 0, which implies y = P C (y), so y C. Therefore, y B( x, δ) C. Hence C B( x, δ) = u B( x, δ) x P C (x), u P C (x) 0 }, x B( x,δ) and consequently C B( x, δ) is a convex set. (ii). Similar arguments as (i) show that C B( x, δ) = u B( x, δ) P C x u x u }. x B( x,δ) To finish the proof, it suffices to observe that in the Euclidean space R n, for every x, y R n the set u R n y u x u } is a half space when x = y, and the whole space R n if x = y. Proposition 7.28 Let C R n be closed and x C. If there exists δ > 0 such that C B( x, δ) is convex, then there exists δ 1 > 0 such that P C is a cutter on B( x, δ 1 ). Consequently, P C is a quasi-ne on B( x, δ 1 ). Proof. Observe that for p P C (x), p x p x + x x 2 x x. Thus, we can choose 0 < δ 1 < δ sufficiently small, e.g., δ 1 < δ/2, such that x x < δ 1 implies ( p P C (x)) p x < δ. This implies that whenever x B( x, δ 1 ), we have P C (x) C B( x, δ). Then (86) ( x B( x, δ 1 )) P C (x) = P C B( x,δ) (x). Because C B( x, δ) is closed and convex, P C B( x,δ) is firmly nonexpansive on R n. From (86), we have that P C is firmly nonexpansive on B( x, δ 1 ), that is, It follows that ( x, y B( x, δ 1 )) P C (x) P C (y) 2 + (Id P C )(x) (Id P C )(y) 2 x y 2. ( x B( x, δ 1 ))( y C B( x, δ 1 )) P C (x) y 2 + x P C (x) 2 x y 2. Hence P C is a cutter on B( x, δ 1 ). 8 Characterization of subgradient projectors of convex functions Subgradient projectors of convex functions are quasi-fne, so algorithms developed in [20] or [7] can be applied; see also Theorem Therefore, in practice, it is useful to have available some results on whether a mapping is a subgradient projector of a convex function. This is the goal of this section. The results in this section provide some checkable conditions for convergence of iterated subgradient projectors in Section 6. The following result is of independent interest. 45

46 Proposition 8.1 Let C R n be closed and convex. Assume that the function f : R n \ C R satisfies (i) f 0 on R n \ C; (ii) f is convex on every convex subsets of R n \ C; (iii) Whenever x bdry(r n \ C), one has lim y x f (y) = 0. That is, lim i f (y i ) = 0 whenever y R n \C (y i ) i N is a sequence in R n \ C converging to a boundary point x of R n \ C. Define Then g is convex on R n. g : R n R : x f (x) if x C, 0 if x C. Proof. Let x, y R n, 0 λ 1. We need to show (87) g(λx + (1 λ)y) λg(x) + (1 λ)g(y). We consider three cases. (i). If [x, y] R n \ C, g = f is convex on [x, y] by the assumption. (ii). If λx + (1 λ)y C, then since g(x), g(y) 0. g(λx + (1 λ)y) = 0 λg(x) + (1 λ)g(y) (iii). λx + (1 λ)y C and [x, y] C =. In particular, x, y cannot both be in C. We consider two subcases. Subcase 1. x C and y C. As y C, there exists z bdry(c) such that Because and f is convex on [z, y], we have λx + (1 λ)y [z, y] X \ C and f (z) = 0. λx + (1 λ)y = αz + (1 α)y for some 0 α 1, (88) f (λx + (1 λ)y) = f (αz + (1 α)y) α f (z) + (1 α) f (y) = (1 α) f (y). Now z = βx + (1 β)y for some 0 β 1, and λx + (1 λ)y = αz + (1 α)y = α(βx + (1 β)y) + (1 α)y = (αβ)x + (1 αβ)y give λ = αβ. Therefore, by (88), g(x) = 0 and g(y) = f (y) 0, (89) (90) g(λx + (1 λ)y) = f (λx + (1 λ)y) (1 αβ) f (y) = (1 λ)g(y) + λg(x), 46

47 which is (87). Subcase 2. x C and y C. By the assumption, there exists z bdry(c) such that λx + (1 λ)y [z, y] or λx + (1 λ)y [x, z], say λx + (1 λ)y [z, y]. Then λx + (1 λ)y = αz + (1 α)y for some 0 α 1. As f is convex on [z, y], f (z) = 0, (91) g(λx + (1 λ)y) = f (αz + (1 α)y) α f (z) + (1 α) f (y) = (1 α) f (y). Now z = βx + (1 β)y for some 0 β 1, and λx + (1 λ)y = αz + (1 α)y = α(βx + (1 β)y) + (1 α)y = (αβ)x + (1 αβ)y give λ = αβ. Then by (91), using g(x) = f (x) 0, g(y) = f (y) 0, we obtain (92) (93) (94) g(λx + (1 λ)y) (1 αβ) f (y) = (1 λ) f (y) (1 λ) f (y) + λ f (x) = (1 λ)g(y) + λg(x), which is (87). Combining (i) (iii), we conclude that g is convex on R n. Theorem 8.2 Let T : R n R n and C := x R n Tx = x } be closed convex. Then T is a subgradient projector of a convex function f : R n R with lev 0 f = C if and only if there exists g : R n [, + ) such that g : R n \ C R is locally Lipschitz, g(x) = for every x C, and (i) for every x R n \ C, x Tx x Tx 2 g(x); (ii) the function defined by f (x) := exp(g(x)) if x C, 0 if x C, is convex. In this case, T = G f. Proof. : Assume that T is a subgradient projector, say T = G f1 with f 1 : R n R being convex and lev 0 f 1 = C. Then f = max0, f 1 } is convex and G f = G f1. Put g = ln f and C = lev 0 f. Since f is locally Lipschitz, g is locally Lipschitz on R n \ C. Note that g(x) = ( f (x))/ f (x) when f (x) > 0. Apply Theorem 4.1(i) to obtain (i). : Assume that (i), (ii) hold. When x C, (i) and (ii) give x Tx = 1 c(x), f (x) g(x) = f (x) 47

48 where c(x) g(x). Using (i) again, we have (95) Tx = x x Tx 2 c(x) = x c(x) c(x) 2 = G f (x) by Theorem 4.1(ii). Moreover, when x C, Tx = x = G f (x). Hence T = G f. For an n n symmetric matrix A, by A 0 we mean that A is positive semidefinite. Theorem 8.3 Let T : R n R n and C := x R n Tx = x }. Suppose that C is closed and convex, and T is continuously differentiable on R n \ C. Define T 1 : R n \ C R n : x x Tx x Tx 2. Then T is a subgradient projector of a convex function f : R n R with lev 0 f = C and being differentiable on R n \ C if and only if (i) For every x R n \ C, the matrix T 1 (x)(t 1 (x)) + T 1 (x) 0; (ii) There exists a function g : R n [, + ) such that ( x bdry(c)) ( x R n \ C) g(x) = T 1 (x), lim y x g(y) =, and ( x C) g(x) =. y R n \C Proof. : Assume that T = G f with f being convex and lev 0 f = C. Theorem 4.10 shows that f is continuously differentiable on R n \ C. By Theorem 4.1(i), we can put g = ln f to obtain (ii). Moreover, as f = exp(g), thanks to (16) in Theorem 4.1, for every x C we have f (x) = e g(x) g(x) = e g(x) T 1 (x), 2 f (x) = e g(x) T 1 (x)(t 1 (x)) + e g(x) T 1 (x) = e g(x)( T 1 (x)(t 1 (x)) + T 1 (x) ). Since f is convex, 2 f (x) 0, and this is equivalent to which is (i). T 1 (x)(t 1 (x)) + T 1 (x) 0 : Assume that (i) and (ii) hold. Put f = exp(g). Then lev 0 f = C, and for x R n \ C, f (x) = e g(x) g(x) = e g(x) T 1 (x), 2 f (x) = e g(x) T 1 (x)(t 1 (x)) + e g(x) T 1 (x) = e g(x)( T 1 (x)(t 1 (x)) + T 1 (x) ). (i) and (ii) imply that f is differentiable and convex on convex subsets of R n \ C, and f 0 on C. By Proposition 8.1, f is convex on R n. Moreover, when x = Tx we have (96) (97) G f (x) = x = x ( ( ) 2 f (x) f (x) = x T 1(x) f (x) f (x) T 1 (x) 2 x Tx x Tx 2 1 x Tx ) 2 = x (x Tx) = Tx. 48

49 Corollary 8.4 Let T : R R and C := x R Tx = x }. Suppose that C is a closed interval, and T is continuously differentiable on R \ C. Then T is a subgradient projector of a convex function f : R R with lev 0 f = C and being differentiable on R \ C if and only if (i) T is monotonically increasing on convex subsets of R \ C; (ii) The function g(x) = x a 1 s Ts ds satisfies lim x sup(c) g(x) = for some a > sup(c); and lim x inf(c) g(x) = for some a < inf(c). Proof. Define n : R R : x x Tx. Then for every x C, T 1 (x) = 1 equivalent to 1 n 2 (x) n (x) n 2 (x) 0. This is the same as n (x) 1, which transpires to T (x) 0. n(x). Theorem 8.3(i) is Remark 8.5 Let f : R n R be continuously differentiable, lev 0 f = Fix T, and G f = T. Can one use T to decide whether f is convex? The proof of Theorem 8.3 implies that f = f T 1 where If T 1 : R n \ lev 0 f R n : x x Tx x Tx 2. (98) f T 1 is monotone on convex subsets of R n \ lev 0 f, then f is convex on convex subsets of R n \ lev 0 f. Using Proposition 8.1, we conclude that max0, f } is convex on R n. When T is continuously differentiable, (98) is equivalent to (99) ( x R n \ C) T 1 (x)(t 1 (x)) + T 1 (x) 0. On R, (99) is equivalent to (100) T is monotonically increasing on convex subsets of R \ C. Corollary 8.6 Let T : R R and C := x R Tx = x }. Let C be a closed interval, and T be continuously differentiable on R \ C. Define Suppose that (i) N is nonexpansive; (ii) The function N : R R : x x Tx. g(x) = x a 1 s Ts ds satisfies lim x sup(c) g(x) = for some a > sup(c); and lim x inf(c) g(x) = for some a < inf(c). 49

50 Then T is a subgradient projector of a convex function f : R R with lev 0 f = C and being differentiable on R \ C. In particular, the assumption (i) holds when T is firmly nonexpansive. Proof. It suffices to observe that T = Id N. Since N is nonexpansive, T is monotone. Also note that T is firmly nonexpansive if and only if N is. We illustrate Corollary 8.4 with three examples. They demonstrate that both conditions (i) and (ii) in Corollary 8.4 are needed. More precisely, (i) is for the convexity of f ; (ii) is for lev 0 f = C. Example 8.7 Define T : R R by T(x) := x x + xe 2 x if x > 0, 0 if x 0. Then T is a subgradient projector of the nonconvex function e 2 x 1 if x > 0, f : R R : x 0 if x 0. In this case, T fails to be monotone, but T verifies condition (ii) of Corollary 8.4. Proof. When x > 0, f (x) = e 2 x x 1/2, so that f (x) = e2 x (1 1/(2 x)). x Since f (x) < 0 when x < 1/4, f is not convex on R. Now we show that (i). T fails to be monotone. This is equivalent to verify that for some x we have N (x) > 1 where N(x) = x Tx. Indeed, L Hospital s rule gives ( x x N (x) = e 2 x ) = 1 e 2 x 1 2 e 2 x x + x e 2. e 2 x 1 lim x 0 + e 2 x x = 2, so lim x 0 + N (x) = 2. Therefore, T is not monotone. (ii). T satisfies condition (ii) of Corollary 8.4. For x > 0, With a > 0, we have g(x) = x a 1 N(s) ds = x a N(x) = e2 x 1 e 2 x x. 1/2 e 2 x x 1/2 e 2 x 1 dx = x ln(e2 1) ln(e 2 a 1). Clearly, lim x 0 + g(x) =. Hence (ii) holds. 50

51 Example 8.8 Define T : R R : x x 1 2x if x = 0, 0 if x = 0. Then T = G f where f : R R : x e x2. However, lev 0 f = but Fix(T) = 0}. In this case, in Corollary 8.4 condition (i) holds but condition (ii) fails. Proof. We have N(x) = x T(x) = 1 2x and N (x) = 1 2x 2. Therefore, T is monotone on (0, + ) and (, 0). This says that condition (i) of Corollary 8.4 holds. However, when a > 0, for x > 0 we have g(x) = x a 1 x N(x) dx = 2xdx = x 2 a 2. a Then lim x 0 + g(x) = a 2, so condition (ii) of Corollary 8.4 fails. Example 8.9 Define T : R R by Then T = G f where the nonconvex function x x if x > 0, x 0 if x = 0, x x if x < 0. f : R R : x e 2 x if x 0, e 2 x if x < 0. However, lev 0 f = but Fix T = 0}. In this case, both conditions (i) and (ii) in Corollary 8.4 fail. Proof. The function f (x) = e 2 x is nonconvex on [0, + ), see Example 8.7. G f = T follows by direct calculations. Condition (i) of Corollary 8.4 fails: T is not monotonically increasing on [0, + ) since T (x) = < 0 when x > 0 is sufficiently near 0. x Condition (ii) of Corollary 8.4 fails. Indeed, N(x) = x T(x) = x when x 0. When a > 0, for x > 0 we have x 1 g(x) = ds = 2 x 2 a, s so that lim x 0 + g(x) = 2 a. a For further properties of subgradient projectors of convex functions, we refer the reader to [44, 54, 12]. This completes Part II. We will investigate conditions under which a subgradient projector is linear in part III. 51

52 Part III Linear subgradient projectors 9 Characterizations of G f,s when G f,s is linear We shall see in this section that under appropriate conditions a linear operator is a subgradient projector of a convex function if and only if it is a convex combination of the identity operator and a projection operator on a subspace (Theorems 9.6 and 9.11). For subgradient projectors of convex functions, see [12, 44, 9, 46, 47, 48]. We begin with 9.1 Linear cutters are precisely linear firmly nonexpansive mappings Proposition 9.1 Let H be a Hilbert space, and T : H H be a linear operator. Then the following are equivalent: (i) T is a cutter, i.e., quasi-firmly nonexpansive. (ii) T is firmly nonexpansive. (iii) There exists δ > 0 and x Fix T such that T is a cutter on B( x, δ), i.e., a local cutter. Proof. (i) (ii). Assume that T is a cutter. Then for every x X and u Fix T, x Tx, u Tx = Tx x, Tx u 0. Put u = 0. We have Tx x, Tx 0 0 Tx 2 x, Tx. Hence T is firmly nonexpansive, see [7, Corollary 4.3]. (ii) (i). Assume that T is firmly nonexpansive. Let u Fix T. Then Tu = u and (101) (102) (103) (104) Tx x, Tx u = Tx x, Tx Tu = Tx Tu + Tu x, Tx Tu = Tx Tu 2 + Tu x, Tx Tu = Tx Tu 2 x u, Tx Tu 0. Hence T is a cutter. (iii) (ii). By the assumption ( x B( x, δ))( u B( x, δ) Fix T) x Tx, u Tx 0. As T x = x, and T is linear, for x = x + v with v δ, we have 0 x Tx, x Tx = Tx x, Tx x = T(x x) (x x), T(x x) = Tv 2 v, Tv. Since T is linear, we have Tx 2 x, Tx for every x X, so T is firmly nonexpansive, see [7, Corollary 4.3]. 52

53 Since (i) (ii), and (i) implies (iii), the proof is done. The following example says that Proposition 9.1 fails if T is not linear. Example 9.2 Define the continuous nonlinear mapping x/2 if 2 x 2, 3 x if 2 x 3, T : R R : x (3 + x) if 3 x 2, 0 otherwise. Then T is a cutter, nonexpansive, but not firmly nonexpansive as T is not monotone; cf. [7, Proposition 4.2(iv)]. Indeed, Fix T = 0}. This means that T is a cutter if and only if (T(x)) 2 xt(x). When 2 x 2, we have (T(x)) 2 = x2 4 x x 2 = xt(x); When 2 x 3, (T(x)) 2 = (3 x) 2 = (3 x)(3 x) x(3 x); when 3 x 2, when x > 3, (T(x)) 2 = [ (x + 3)] 2 = [ (x + 3)][ (x + 3)] x[ (3 + x)]; (T(x)) 2 = 0 = xt(x). Hence T is a cutter. Clearly, T is nonexpansive. As T is not monotone, we conclude that T is not firmly nonexpansive. Remark 9.3 Observe that Example 9.2 is much simpler than the example on R 2 constructed by Cegielski [20, Example 2.2.8, page 68]. 9.2 Subgradient projector of powers of a quadratic function It is natural to investigate subgradient projectors of quadratic functions or their variants first. In the following result, we assume B = 0 because that B = 0 gives G f = Id with f 0. Theorem 9.4 Let a > 0 and B = 0 being an n n symmetric and positive semidefinite matrix. Consider the function f : R n R : x (x Bx) 1/(2a). Then the following hold: (i) lev 0 f = x R n Bx = 0 }. (ii) We have G f (x) = x a x Bx Bx 2 Bx if Bx = 0, x if Bx = 0. 53

54 (iii) G f is linear if and only if B = λp L where λ > 0 and L X is a subspace. In this case ker B = L, f (x) = λ 1/(2a)( d L (x) ) 1/a and G f = Id ap L = (1 a) Id +ap L. (iv) Assume that G f is linear. Then G f is a cutter if and only if 0 < a 1. Proof. (i). Since B is symmetric and positive semidefinite, there exists a matrix A such that B = A A; see, e.g., [38, page 558]. Then Ax = 0 Bx = 0. The result follows because f (x) = Ax 1/a. (ii). G f follows from direct calculations. (iii). : Assume that G f is linear. The mapping x T 1 (x) := a 1( x G f (x) ) = x Bx Bx 2 Bx if Bx = 0, 0 if Bx = 0, is linear. Let λ 1, λ 2 > 0 be any two eigenvalues of B. We show that λ 1 = λ 2. Suppose that λ 1 = λ 2. Take unit length eigenvector v i associated with λ i. Note that v 1, v 2 = 0, Bv i = 0 and B(v 1 + v 2 ) = λ 1 v 1 + λ 2 v 2 = 0. As T 1 is linear, we have T 1 (v 1 + v 2 ) = T 1 v 1 + T 1 v 2. Now (105) (106) (107) (108) (109) (110) T 1 (v 1 + v 2 ) = (v 1 + v 2 ) B(v 1 + v 2 ) B(v 1 + v 2 ) 2 B(v 1 + v 2 ) = (v 1 + v 2 ) (λ 1 v 1 + λ 2 v 2 ) λ 1 v 1 + λ 2 v 2 2 (λ 1 v 1 + λ 2 v 2 ) = λ 1 + λ 2 λ (λ 1 v 1 + λ 2 v 2 ), λ2 2 T 1 v 1 + T 1 v 2 = v 1 Bv 1 Bv 1 2 Bv 1 + v 2 Bv 2 Bv 2 2 Bv 2 = λ 1 v 1 2 λ 1 v 1 2 λ 1v 1 + λ 2 v 2 2 λ 2 v 2 2 λ 2v 2 = v 1 + v 2. As v 1, v 2 } are linearly independent, the above gives λ 1 = λ 2 which contradicts λ 1 = λ 2. Therefore, all positive eigenvalues of B have to be equal. Hence, we have ( ) B = λu Id 0 U 0 0 where U is an orthogonal matrix, λ > 0, Id is an m m identity matrix with m = rank B. The matrix ( ) U Id 0 U 0 0 is idempotent and symmetric, so it is a matrix associated with an orthogonal projection onto a closed subspace, say P L, [38, page 430, page 433]. Hence B = λp L 54

55 which implies that Bx = 0 if and only if P L x = 0, i.e., ker B = L. Then when P L x = 0, T 1 (x) = x Bx Bx 2 Bx = λx P L x x λp L λp L x λp Lx = λx P L x λ 2 x P L x λp Lx = P L x; when P L x = 0, T 1 x = 0 = P L x. Hence T 1 = P L. It follows that G f = Id at 1 = Id ap L = (1 a) Id +a(id P L ) = (1 a) Id +ap L. We proceed to find the expression for f (x): (111) (112) (113) (114) f (x) = (x Bx) 1/(2a) = (x λp L x) 1/(2a) = λ 1/(2a) (x P L P L x) 1/(2a) = λ 1/(2a) ( P L x 2 ) 1/(2a) = λ 1/(2a) ( x P L x 2 ) 1/(2a) = λ 1/(2a) (d L (x) 2 ) 1/(2a) = λ 1/(2a)( d L (x) ) 1/a. : Assume that B = λp L for λ > 0 and some subspace L R n. The assumption gives f (x) = λ 1/(2a)( d L (x) ) 1/a. By Proposition 3.4(i), G f = G( ) 1/a. By Theorem 3.9, G f = (1 a) Id +ag dl. By Fact 2.8, d L Hence G f is linear. G f = (1 a) Id +ap L. (iv). : Assume that G f is linear and a cutter. By Fact 9.1, G f is firmly nonexpansive, so is Id G f. By (ii) Id G f = ap L, ap L has to be nonexpansive. Because B = 0, we have L = 0}. Take 0 = x L. The nonexpansiveness requires so that a 1. ap L x ap L 0 = ax x : Assume that 0 < a 1. Since x (x Bx) 1/2 is convex, and the function [0, + ) t t 1/a is convex and increasing when 0 < a 1, we have that x f (x) = ( (x Bx) 1/2) 1/a is convex. Then G f is a cutter by Fact We illustrate Theorem 9.4(iv) with the following example. Example 9.5 Let a > 1. Consider f : R n R : x (x x) 1/(2a) = x 1/a. Then f is not convex, and G f (x) = (1 a)x for every x R n. Although G f is linear, it is not a cutter since it is not monotone; see, e.g., Proposition

56 9.3 Symmetric and linear subgradient projectors The following result completely characterizes symmetric and linear subgradient projectors. Theorem 9.6 Assume that T : R n R n is linear and symmetric. Then the following are equivalent: (i) T is a subgradient projector of a convex function f : R n R with lev 0 f =. (ii) T = G f where f : R n R is given by (115) f (x) = K(x P L x) 1/(2λ) = K ( d L (x) ) 1/λ where 0 < λ 1, K > 0, and L R n is a subspace such that L G f = (1 λ) Id +λp L. = Fix T. In this case, Proof. (i) (ii). Assume that T = G f for some convex function. Since T is linear and a cutter, T is firmly nonexpansive by Proposition 9.1. Then T 1 = Id T is firmly nonexpansive by [7, Proposition 4.2]. We consider two cases. Case 1. int lev 0 f =. We have T 1 0 on an open set B(x 0, ε) lev 0 f, i.e., T 1 (x 0 + b) = 0 for every b < ε. As T 1 is linear, T 1 (b) = T 1 (x 0 + b) T 1 (x 0 ) = 0 0 = 0 when b < ε, so T 1 0 on R n. Thus, T = Id on R n. Then T = G f with f 0. This means that (ii) holds with L = 0}, λ = 1 and K > 0. Case 2. int lev 0 f =. Since lev 0 f is a proper subspace, it is an intersection of a finite collection of hyper-planes [50, Corollary 1.4.1], so R n \ lev 0 f is union of a finite collection of open half spaces. As T 1 is continuous, we only need to consider Then T 1 (x) = T 1 (x) = f (x) f (x) when f (x) > 0. f (x) 2 f (x) f (x) and f (x) f (x) = T 1(x) T 1 (x) 2. Since T is symmetric, T 1 is symmetric, so there exists an orthogonal matrix Q such that Q T 1 Q = D where D is an diagonal matrix and Q denotes the transpose of Q. Put g = ln f and x = Qy. When y Q (Fix T), we have ( g)(qy) = T 1Qy T 1 Qy 2. Multiplying both sides by Q and using Q being an isometry (i.e., Q z = z for every z R n ) give Q ( g)(qy) = Q T 1 Qy T 1 Qy 2 = Dy Q T 1 Qy 2 = Dy Dy 2. If we put h = g Q, then h(y) = Q g(qy) for every y R n \ (Q Fix T), so ( y R n \ (Q Fix T)) h(y) = Dy Dy 2. 56

57 Moreover, R n \ (Q Fix T) is a finite union of open half spaces, because Q Fix T is a proper subspace of R n. Write λ λ 2 0 D = λ n When λ 1 = = λ n = 0, this is covered in Case 1. We thus assume that T 1 0. As T 1 is monotone, we can and do assume that λ 1,, λ m > 0 and λ m+1 = = λ n = 0. Then ( ) λ h(y) = 1 y 1 λ m y m,,, 0,, 0. m k=1 λ2 k y2 k m k=1 λ2 k y2 k Since h has continuous second order derivatives on the nonempty open R n \ (Q Fix T), it must hold that 2 h = 2 h y i y j y j y i which gives (116) 2λ j λ 2 i y iy j m k=1 λ2 k y2 k = 2λ iλ 2 j y iy j m k=1 λ2 k y2 k when 1 i, j m, i = j. As int lev 0 f = int Fix T =, (116) holds on the nonempty open R n \ (Q Fix T), so we have λ i = λ j. Because 1 i, j m were arbitrary, we obtain that λ 1 = = λ m. Hence T 1 = Q ( ) λ Idm 0 Q = λq 0 0 ( ) Idm 0 Q = λp 0 0 L where L R n is a linear subspace; see [38, page 430]. More precisely, T 1 is a positive multiple of an orthogonal projector with (117) Fix T = ker T 1 = L. Now T 1 is firmly nonexpansive and T 1 = T 1, this implies that T 1 + T 1 2 T 1 T 1 = T 1 T 2 1 = Q ( (λ λ 2 ) Id 0 0 is positive semidefinite, so 0 λ 1. Because T 1 = 0 in this case, we obtain 0 < λ 1. Therefore, when x Fix T, Note that P L = P L, P 2 L = P L, ln f (x) = T 1x T 1 x 2 = λp Lx λp L x 2 = 1 λ ln P L x = 1 P L x P Lx = 1 P L x P L ) P L x P L x 2. P L x P L x = Q P Lx P L x 2. It follows that ln f (x) = 1 λ ln P Lx = ln P L x 1/λ. 57

58 On each connected and open component of R n \ Fix T, this is equivalent to ln f (x) = ln P L x 1/λ + c for some constant c R. Taking exp both sides gives (118) f (x) = K P L x 1/λ = K( P L x 2 ) 1/(2λ) = K(x P L x) 1/(2λ) where K = exp(c) > 0. As P L = Id P L, we obtain Moreover, f (x) = K x P x 1/λ = K(d L (x)) 1/λ. T = G f = Id T 1 = Id λp L = Id λ(id P L ) = (1 λ) Id +λp L where L = Fix T by (117). One can apply the same argument on each connected and open component of R n \ Fix T, while one might have different constant K s in (118), but λ will be the same. Indeed, suppose that there exist 0 < λ, λ 1 1, λ = λ 1 such that (1 λ) Id +λp Fix T = (1 λ 1 ) Id +λ 1 P Fix T. Then P Fix T = Id so that Fix T = R n, which contradicts that int Fix T =. Using the same K > 0 for all connected and open component of R n \ Fix T, one obtains (115). (ii) (i). Clear. Theorem 9.6 is proved under the assumption that the linear subgradient projector of a convex function is symmetric. We think that the assumption of symmetry is superfluous; cf. Theorem Conjecture 9.7 If f : R n R is convex and its subgradient projector G f,s is linear, then G f,s must be symmetric. Note that when f is not convex, G f,s can be nonsymmetric; see Corollary 11.6(ii). 9.4 Characterization of linear subgradient projectors In subsection 9.3, we assume that the linear operator is symmetric. What happens if the linear operator is not symmetric? For this purpose we need the following result. Proposition 9.8 Let M : R n R n be linear, monotone and ( x R n \ ker M) h(x) = Mx Mx 2 where the function h : R n \ ker M R. If dim ran M = 2, then M is symmetric. Proof. If dim ran M = 0, then M = 0, so it is symmetric. Let us assume that dim ran M > 0 and dim ran M = 2. Since h has continuous mixed second order derivatives at x whenever Mx = 0, the Hessian matrix 2 h(x) is symmetric. As 2 h(x) = Mx 2 M Mx( Mx 2 ) Mx 4 = Mx 2 M 2Mxx M M Mx 4, 58

59 the symmetric property means that Mx 2 M 2Mxx M M = ( Mx 2 M 2Mxx M M) = M Mx 2 2M Mxx M whenever Mx = 0. Put y = Mx. The above is simplified to M M 2 = yy yy M M y 2 y 2. Denote the projection operator on the line spanned by y}, span(y), by P y := yy y 2. We have (119) ( y ran M) M M 2 = P y M M P y. Since M is monotone, ran M = ran M ; see, e.g., [14, Theorem 3.2]. Let e i i = 1,..., m } be an orthonormal basis of ran M. Then (120) P ran M = Note that m e i ei. i=1 (121) M P ran M = M P ran M = M (P ran M + P (ran M ) ) = M because M P (ran M ) = 0. To see this, let y (ran M ). For every z R n, M y, z = y, Mz = 0 because Mz ran M = ran M. Because z R n was arbitrary, we must have M y = 0. Since (122) M M 2 = P ei M M P ei by (119), summing up (122) from from i = 1 to i = m, followed by using (120) and (121), we obtain m 2 (M M ) = ( m i=1 P ei )M M ( m i=1 P ei ) = P ran M M M P ran M = M M, that is, ( m 2 1)(M M ) = 0. Hence M M = 0 because m = 2, and so M is symmetric. The proof of Proposition 9.8 requiring dim ran M = 2 seems bizarre. However, the following examples show that Proposition 9.8 fails when dim ran M = 2. Example 9.9 When dim ran M = 2, although M : R 2 R 2 is linear, monotone and one cannot guarantee that M is symmetric. To see this, let x = (x, y) R 2. ( x R n \ ker M) h(x) = Mx Mx 2, 59

60 (1). Define M := Then M is linear, monotone, dim ran M = 2 and arctan(y/x) = whenever x = 0. However, M is not symmetric. (2). Define M := Then M is linear, firmly nonexpansive and ( ) ( ) y x x 2 + y 2 = ( ) 1/2 1/2. 1/2 1/2 ( ln(x 2 + y 2 ) ) + arctan(y/x) = 2 Mx Mx 2 ( ) x y y + x x 2 + y 2 = Mx Mx 2 whenever x = 0. However, dim ran M = 2 and M is not symmetric. Conjecture 9.10 Let M : R n R n be linear, monotone and ( x R n \ ker M) h(x) = Mx Mx 2 where the function h : R n \ ker M R. If dim ran M = 2 and exp(h) is convex on convex subsets of R n \ ker M, then M is symmetric. Combining Theorem 9.6 and Proposition 9.8, we obtain the following characterization of linear subgradient projectors. Theorem 9.11 Assume that T : R n R n is linear and dim ran(id T) = 2. Then the following are equivalent: (i) T is a subgradient projector of a convex function f : R n R with lev 0 f =. (ii) T = G f where f : R n R is given by f (x) := K(x P L x) 1/(2λ) = K ( d L (x) ) 1/λ where 0 < λ 1, K > 0, and L R n is a subspace such that L G f = (1 λ) Id +λp L. = Fix T. In this case, Proof. (i) (ii). Assume that T = G f for some convex function f : R n R. Then T is a cutter by Fact As T is linear, in view of Proposition 9.1, T is firmly nonexpansive, so M := Id T is firmly nonexpansive, in particular, monotone. By Theorem 4.1(i), (123) h(x) = Mx Mx 2 60

61 where h(x) = ln f (x), f (x) > 0. Since Fix T = lev 0 f = ker M, (123) is equivalent to h(x) = Mx Mx 2 when Mx = 0. Proposition 9.8 shows that M is symmetric, so is T = Id M. It suffices to apply Theorem 9.6 to obtain (ii). (ii) (i). Clear. 10 Subgradient projectors of convex functions are not closed under convex combinations and compositions A convex combination of cutters is a cutter, see [20, Corollary ] or [7, Proposition 4.34]. Convex combinations of a finite family of cutters with a common fixed point are effectively used in simultaneous cutter methods; see [20, Section 5.8], [7, Corollary 5.18]. A question that naturally arises is whether the set of subgradient projectors of convex functions is convex. Theorem 9.6 allows us to show that the answer is negative. While Theorem 10.1 works only in R 2, Theorem 10.3 works in R n with n 2. Theorem 10.1 In R 2, a convex combination of subgradient projectors of convex functions need not be a subgradient projector of a convex function. Proof. Let L := 0} R R 2 and M := x = (x 1, x 2 ) R 2 x 1 + x 2 = 0 }. Both L, M are proper linear subspaces of R 2. Define f, g : R 2 R by (124) ( x R 2 ) f (x) := K 1 ( dl (x) ) 1/λ 1, g(x) := K 2 ( dm (x) ) 1/λ 2 where 0 < λ 1 = λ 2 < 1, K 1, K 2 > 0. By Theorem 9.6, we have (125) G f = (1 λ 1 ) Id +λ 1 P L, and G g = (1 λ 2 ) Id +λ 2 P M. Now consider λ 3 G f + (1 λ 3 )G g where 0 < λ 3 < 1. Then (126) (127) (128) λ 3 G f + (1 λ 3 )G g ( =λ 3 (1 λ1 ) Id +λ 1 P L ) + (1 λ3 ) ( ) (1 λ 2 ) Id +λ 2 P M =(1 λ 2 + λ 2 λ 3 λ 1 λ 3 ) Id +λ 1 λ 3 P L + λ 2 (1 λ 3 )P M. We show that λ 3 G f + (1 λ 3 )G g is not a subgradient projector of a convex function by contradiction. Suppose that λ 3 G f + (1 λ 3 )G g is a subgradient projector of a convex function. By Theorem 9.6, there are 0 < λ < 1 and S which is a subspace of R 2 such that (129) λ 3 G f + (1 λ 3 )G g = (1 λ) Id +λp S. Therefore, we have (130) (1 λ 2 + λ 2 λ 3 λ 1 λ 3 ) Id +λ 1 λ 3 P L + λ 2 (1 λ 3 )P M = (1 λ) Id +λp S. 61

62 Naturally, the set of fixed points of left-hand side is equal to the set of fixed points of right-hand side. Thus we have (131) (132) Fix ((1 λ) Id +λp S ) = Fix ((1 λ 2 + λ 2 λ 3 λ 1 λ 3 ) Id +λ 1 λ 3 P L + λ 2 (1 λ 3 )P M ). By [7, Proposition 4.34], we have (133) Fix ((1 λ 2 + λ 2 λ 3 λ 1 λ 3 ) Id +λ 1 λ 3 P L + λ 2 (1 λ 3 )P M ) = L M. Also, (134) Fix ((1 λ) Id +λp S ) = S. Hence, using definitions of L, M, and (131)-(134), it follows that (135) (136) (137) (138) (0, 0)} = L M = Fix ((1 λ 2 + λ 2 λ 3 λ 1 λ 3 ) Id +λ 1 λ 3 P L + λ 2 (1 λ 3 )P M ) = Fix ((1 λ) Id +λp S ) = S. Therefore S = (0, 0)}, which implies S = R 2. In terms of matrices, we have ( ) ( ) ( ) 1 0 1/2 1/2 0 0 (139) P L =, P 0 0 M =, and P S =. 1/2 1/2 0 0 In particular, P L, P S are diagonal matrices, but P M is not. Hence, equation (130) is not true. Therefore, λ 3 G f + (1 λ 3 )G g is not a subgradient projector of a convex function. Our next result needs averaged mappings. Definition 10.2 (See [4], [7, Definition 4.23]) Let λ (0, 1). An operator T : R n R n is λ-averaged if there exists a nonexpansive operator N : R n R n such that T = (1 λ) Id +λn. Theorem 10.3 Let n 2, 0 < λ 1 < 1, 0 < λ 2 < 1, 0 < λ < 1. Suppose that L, M are linear subspaces of R n satisfying L = M, M = L, and that both L and M are proper linear subspaces of R n. Define f : R n R : x (d L (x)) 1/λ 1, and g : R n R : x (d M (x)) 1/λ 2. If 1 λ λ = λ 2 λ 1, then (1 λ)g f + λg g is not a subgradient projector of a convex function. Proof. By Theorem 9.6, we have (140) G f = (1 λ 1 ) Id +λ 1 P L, and G g = (1 λ 2 ) Id +λ 2 P M. Then (141) (142) (1 λ)g f + λg g =(1 λ) ((1 λ 1 ) Id +λ 1 P L ) + λ ((1 λ 2 ) Id +λ 2 P M ) 62

Our next result needs averaged mappings.

Definition 10.2 (See [4], [7, Definition 4.23]) Let $\lambda \in (0,1)$. An operator $T : \mathbb{R}^n \to \mathbb{R}^n$ is $\lambda$-averaged if there exists a nonexpansive operator $N : \mathbb{R}^n \to \mathbb{R}^n$ such that $T = (1-\lambda)\operatorname{Id} + \lambda N$.

Theorem 10.3 Let $n \geq 2$, $0 < \lambda_1 < 1$, $0 < \lambda_2 < 1$, $0 < \lambda < 1$. Suppose that $L, M$ are linear subspaces of $\mathbb{R}^n$ satisfying $L^{\perp} = M$, $M^{\perp} = L$, and that both $L$ and $M$ are proper linear subspaces of $\mathbb{R}^n$. Define $f : \mathbb{R}^n \to \mathbb{R} : x \mapsto (d_L(x))^{1/\lambda_1}$ and $g : \mathbb{R}^n \to \mathbb{R} : x \mapsto (d_M(x))^{1/\lambda_2}$. If $\frac{1-\lambda}{\lambda} \neq \frac{\lambda_2}{\lambda_1}$, then $(1-\lambda)G_f + \lambda G_g$ is not a subgradient projector of a convex function.

Proof. By Theorem 9.6, we have

(140) $G_f = (1-\lambda_1)\operatorname{Id} + \lambda_1 P_L \quad\text{and}\quad G_g = (1-\lambda_2)\operatorname{Id} + \lambda_2 P_M.$

Then

(141)-(144) $(1-\lambda)G_f + \lambda G_g = (1-\lambda)\big((1-\lambda_1)\operatorname{Id} + \lambda_1 P_L\big) + \lambda\big((1-\lambda_2)\operatorname{Id} + \lambda_2 P_M\big) = \big[(1-\lambda)(1-\lambda_1) + \lambda(1-\lambda_2)\big]\operatorname{Id} + \lambda_1(1-\lambda)P_L + \lambda\lambda_2 P_M = \beta\operatorname{Id} + (1-\beta)\big(\gamma P_L + (1-\gamma)P_M\big),$

where $\beta := (1-\lambda)(1-\lambda_1) + \lambda(1-\lambda_2)$ and $\gamma := \frac{\lambda_1(1-\lambda)}{1-(1-\lambda)(1-\lambda_1)-\lambda(1-\lambda_2)}$. We observe that $0 < \beta < 1$ and $\gamma \neq \frac{1}{2}$. Indeed,

(145) $0 < \beta = (1-\lambda)(1-\lambda_1) + \lambda(1-\lambda_2) < (1-\lambda) + \lambda = 1.$

Also,

(146)-(149) $\gamma = \frac{1}{2} \;\Leftrightarrow\; \frac{\lambda_1(1-\lambda)}{1-(1-\lambda)(1-\lambda_1)-\lambda(1-\lambda_2)} = \frac{\lambda\lambda_2}{1-(1-\lambda)(1-\lambda_1)-\lambda(1-\lambda_2)} \;\Leftrightarrow\; \lambda_1(1-\lambda) = \lambda\lambda_2 \;\Leftrightarrow\; \frac{1-\lambda}{\lambda} = \frac{\lambda_2}{\lambda_1}.$

By the assumption $\frac{1-\lambda}{\lambda} \neq \frac{\lambda_2}{\lambda_1}$, so $\gamma \neq \frac{1}{2}$.

Since $G_f$ and $G_g$ are linear and symmetric, so is $(1-\lambda)G_f + \lambda G_g$. We show that $(1-\lambda)G_f + \lambda G_g$ is not a subgradient projector of a convex function by contradiction. If $(1-\lambda)G_f + \lambda G_g$ is a subgradient projector of a convex function, by Theorem 9.6 we have

(150) $(1-\lambda)G_f + \lambda G_g = (1-\alpha)\operatorname{Id} + \alpha P_S,$

where $0 < \alpha < 1$ and $S$ is a subspace of $\mathbb{R}^n$. Note that $G_f$ and $G_g$ are averaged mappings, and so is $(1-\lambda)G_f + \lambda G_g$. Because

(151) $\operatorname{Fix} G_f = L \quad\text{and}\quad \operatorname{Fix} G_g = M,$

by [7, Proposition 4.34] we obtain

(152) $\operatorname{Fix}\big((1-\lambda)G_f + \lambda G_g\big) = \operatorname{Fix} G_f \cap \operatorname{Fix} G_g = L \cap M = \{0\}.$

Because $\operatorname{Fix}\big((1-\alpha)\operatorname{Id} + \alpha P_S\big) = S$, using (150) and (152) we obtain $S = \{0\}$. Therefore, in view of equation (150), we have

(153) $(1-\lambda)G_f + \lambda G_g = (1-\alpha)\operatorname{Id}.$

Combining (144) and (153) gives

(154) $\beta\operatorname{Id} + (1-\beta)\big(\gamma P_L + (1-\gamma)P_M\big) = (1-\alpha)\operatorname{Id}.$

We proceed to analyze $\alpha$ and $\beta$. Take $x \in M^{\perp} \setminus \{0\}$, which is possible since $M^{\perp} \neq \{0\}$. Then $P_M x = 0$ and $P_L x = P_{M^{\perp}} x = x$. Equation (154) gives

(155) $\beta x + (1-\beta)\gamma x = (1-\alpha)x,$

which implies

(156) $\beta + (1-\beta)\gamma = 1-\alpha.$

Take $x \in L^{\perp} \setminus \{0\}$, which is possible since $L^{\perp} \neq \{0\}$. Then $P_L x = 0$ and $P_M x = P_{L^{\perp}} x = x$. Equation (154) gives

(157) $\beta x + (1-\beta)(1-\gamma)x = (1-\alpha)x,$

which implies

(158) $\beta + (1-\beta)(1-\gamma) = 1-\alpha.$

Subtracting equation (156) from equation (158), we have

(159) $(1-\beta)(1-2\gamma) = 0,$

which implies $\beta = 1$ or $\gamma = \frac{1}{2}$. This contradicts the choices of $\lambda, \lambda_1, \lambda_2$.
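A concrete instance is easy to inspect numerically. The sketch below (ours; $n = 3$, $L = \operatorname{span}\{e_1\}$, $M = L^{\perp}$, and the parameter values are arbitrary samples) verifies $\beta$ and $\gamma$ from (141)-(144) and confirms that the two eigenvalues $\beta + (1-\beta)\gamma$ and $\beta + (1-\beta)(1-\gamma)$ appearing in (156) and (158) differ whenever $\gamma \neq \frac{1}{2}$, so the representation (153) is impossible.

```python
import numpy as np

n = 3
e1 = np.eye(n)[:, [0]]
P_L = e1 @ e1.T                            # projector onto L = span{e1}
P_M = np.eye(n) - P_L                      # projector onto M = L^perp
I = np.eye(n)

lam1, lam2, lam = 0.25, 0.5, 0.4           # (1-lam)/lam = 1.5 != lam2/lam1 = 2
G_f = (1 - lam1) * I + lam1 * P_L          # (140)
G_g = (1 - lam2) * I + lam2 * P_M          # (140)
C = (1 - lam) * G_f + lam * G_g            # (141)-(144)

beta = (1 - lam) * (1 - lam1) + lam * (1 - lam2)
gamma = lam1 * (1 - lam) / (1 - beta)
# C acts as multiplication by beta + (1-beta)*gamma on L and by
# beta + (1-beta)*(1-gamma) on M; cf. (155)-(158).
mu_L = beta + (1 - beta) * gamma
mu_M = beta + (1 - beta) * (1 - gamma)
assert np.allclose(C @ e1, mu_L * e1)
print("gamma =", gamma, "; eigenvalues on L and M:", mu_L, mu_M)
# Since Fix(C) = {0}, a convex subgradient projector equal to C would be
# (1 - alpha) * Id, as in (153); but mu_L != mu_M because gamma != 1/2.
assert abs(mu_L - mu_M) > 1e-12
```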

If two nearest point projectors onto subspaces commute, then their composition is the projection onto the intersection of the subspaces; see [27, Lemma 9.2]. One referee asks whether there is an analogue when two linear subgradient projectors commute. The answer is negative. To this end, we need an auxiliary result.

Lemma 10.4 Let $L, M \subseteq \mathbb{R}^n$ be two subspaces, and let $\lambda_i \in [0,1)$ for $i = 1, 2$. Then the following are equivalent:

(i) $\big(\lambda_1\operatorname{Id} + (1-\lambda_1)P_L\big)\big(\lambda_2\operatorname{Id} + (1-\lambda_2)P_M\big) = \big(\lambda_2\operatorname{Id} + (1-\lambda_2)P_M\big)\big(\lambda_1\operatorname{Id} + (1-\lambda_1)P_L\big)$.

(ii) $P_L P_M = P_M P_L$.

(iii) $P_{L^{\perp}} P_{M^{\perp}} = P_{M^{\perp}} P_{L^{\perp}}$.

Proof. (i)$\Leftrightarrow$(ii): This follows from

(160)-(161) $\big(\lambda_1\operatorname{Id} + (1-\lambda_1)P_L\big)\big(\lambda_2\operatorname{Id} + (1-\lambda_2)P_M\big) = \lambda_1\lambda_2\operatorname{Id} + \lambda_1(1-\lambda_2)P_M + (1-\lambda_1)\lambda_2 P_L + (1-\lambda_1)(1-\lambda_2)P_L P_M$

and

(162)-(163) $\big(\lambda_2\operatorname{Id} + (1-\lambda_2)P_M\big)\big(\lambda_1\operatorname{Id} + (1-\lambda_1)P_L\big) = \lambda_1\lambda_2\operatorname{Id} + \lambda_2(1-\lambda_1)P_L + (1-\lambda_2)\lambda_1 P_M + (1-\lambda_1)(1-\lambda_2)P_M P_L,$

together with $(1-\lambda_1)(1-\lambda_2) \neq 0$.

(ii)$\Leftrightarrow$(iii): Since $P_{L^{\perp}} = \operatorname{Id} - P_L$ and $P_{M^{\perp}} = \operatorname{Id} - P_M$, (ii) is equivalent to $(\operatorname{Id} - P_L)(\operatorname{Id} - P_M) = (\operatorname{Id} - P_M)(\operatorname{Id} - P_L)$, which is (iii) after simplifications.

Theorem 10.5 In $\mathbb{R}^2$, even though two linear subgradient projectors of convex functions commute, their composition need not be a subgradient projector of a convex function.

Proof. Let $0 < \lambda_1 < \lambda_2 < 1$. Because

$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = P_{\mathbb{R}\times\{0\}} \quad\text{and}\quad \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = P_{\{0\}\times\mathbb{R}},$

by Theorem 9.6 there exist two convex functions $f, g : \mathbb{R}^2 \to \mathbb{R}$ such that

$G_f = \lambda_1\operatorname{Id} + (1-\lambda_1)\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & \lambda_1 \end{pmatrix} \quad\text{and}\quad G_g = \lambda_2\operatorname{Id} + (1-\lambda_2)\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} \lambda_2 & 0 \\ 0 & 1 \end{pmatrix}.$

These two subgradient projectors commute by Lemma 10.4, or by a direct calculation:

$G_f G_g = G_g G_f = \begin{pmatrix} \lambda_2 & 0 \\ 0 & \lambda_1 \end{pmatrix}.$

We claim that $T := G_f G_g$ is not a subgradient projector of a convex function. We prove this by contradiction. Suppose that $T$ is a subgradient projector. Since $T$ is symmetric and linear, by Theorem 9.6 there exists $0 \leq \lambda \leq 1$ such that

(164) $T = \lambda\operatorname{Id} + (1-\lambda)P,$

where $P$ is a projector onto a subspace of $\mathbb{R}^2$. We consider five cases.

Case 1. $\lambda = 0$. This gives $P = T$. Because $P$ is a projector, its eigenvalues are $0$ or $1$. This is impossible, since $0 < \lambda_i < 1$ and $\lambda_1 \neq \lambda_2$.

Case 2. $\lambda = 1$. This gives $T = \operatorname{Id}$. This is impossible, since $\lambda_i < 1$.

Cases 1 and 2 imply that $0 < \lambda < 1$. This gives

$P = \begin{pmatrix} \frac{\lambda_2-\lambda}{1-\lambda} & 0 \\ 0 & \frac{\lambda_1-\lambda}{1-\lambda} \end{pmatrix}.$

Case 3. $\lambda > \lambda_1$. Then $\frac{\lambda_1-\lambda}{1-\lambda} < 0$. This is impossible, since the eigenvalues of $P$ have to be nonnegative.

Case 4. $\lambda = \lambda_1$. Since $\frac{\lambda_2-\lambda_1}{1-\lambda_1} > 0$ and $P$ has eigenvalues only $0$ or $1$, we have $\frac{\lambda_2-\lambda_1}{1-\lambda_1} = 1$. It follows that $\lambda_2 = 1$, which is impossible.

Case 5. $0 < \lambda < \lambda_1$. Then $\frac{\lambda_1-\lambda}{1-\lambda} > 0$ and $\frac{\lambda_2-\lambda}{1-\lambda} > 0$. Since $P$ has eigenvalues only $0$ or $1$, we must have $\frac{\lambda_2-\lambda}{1-\lambda} = \frac{\lambda_1-\lambda}{1-\lambda} = 1$, from which $\lambda_1 = \lambda_2$. This is impossible.

Altogether, (164) does not hold. Using Theorem 9.6 again, we conclude that $T$ is not a subgradient projector of a convex function.
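The five cases can be mimicked numerically. The sketch below (ours; $\lambda_1, \lambda_2$ are arbitrary samples) forms $T = G_f G_g$, confirms commutativity, and scans a grid of $\lambda \in [0,1)$ for a representation (164) with $P$ a projection. The grid search is only illustrative: the case analysis above rules out every $\lambda$ exactly.

```python
import numpy as np

lam1, lam2 = 0.3, 0.8                      # 0 < lam1 < lam2 < 1
I = np.eye(2)
G_f = np.diag([1.0, lam1])                 # lam1*Id + (1-lam1)*P_{R x {0}}
G_g = np.diag([lam2, 1.0])                 # lam2*Id + (1-lam2)*P_{{0} x R}

T = G_f @ G_g
assert np.allclose(T, G_g @ G_f)           # the two projectors commute
assert np.allclose(T, np.diag([lam2, lam1]))

# Search for T = lam*Id + (1-lam)*P with P an orthogonal projection, cf. (164).
# P would be diag((lam2-lam)/(1-lam), (lam1-lam)/(1-lam)) with entries in {0,1}.
for lam in np.linspace(0.0, 0.999, 1000):
    P = (T - lam * I) / (1 - lam)
    eigs = np.diag(P)
    if np.all(np.isclose(eigs, 0) | np.isclose(eigs, 1)):
        print("projection found at lam =", lam)
        break
else:
    print("no lam in [0,1) makes (T - lam*Id)/(1 - lam) a projection")
```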

11 A complete analysis of linear subgradient projectors on $\mathbb{R}^2$

In this section we turn our attention to linear operators on $\mathbb{R}^2$. One nice feature is that we are able not only to characterize when a linear operator is a subgradient projector but also to give explicit formulae for the corresponding functions.

Is every linear mapping from $\mathbb{R}^2$ to $\mathbb{R}^2$ a subgradient projector of an essentially strictly differentiable function (convex or nonconvex) on $\mathbb{R}^2$? The answer is no by Theorem 11.2 below. Theorem 11.2(iii) also shows that Theorem 9.11 fails if the assumption $\dim\operatorname{ran}(\operatorname{Id}-T) \neq 2$ is removed.

We start with a simple result about essentially strictly differentiable functions; see Definition 4.3.

Lemma 11.1 Let $O \subseteq \mathbb{R}^n$ be a nonempty open set and let $f : O \to \mathbb{R}$ be an essentially strictly differentiable function. If there exists a continuous selection $s : O \to \mathbb{R}^n$ with $s(x) \in \partial f(x)$ for every $x \in O$, then $f$ is strictly differentiable on $O$. Consequently, $f$ is continuously differentiable on $O$.

Proof. By [15, Theorem 2.4, Corollary 4.2], $f$ has a minimal Clarke subdifferential $\partial_c f$, and $\partial_c f$ can be recovered from every dense selection of $\partial_c f$. Since $s(x) \in \partial f(x) \subseteq \partial_c f(x)$, and $s$ is continuous on $O$, we have $\partial_c f(x) = \partial f(x) = \{s(x)\}$ for every $x \in O$, which implies that $f$ is strictly differentiable at $x$; see, e.g., [51, page 362, Theorem 9.18] or [40, Theorem 3.54]. Hence $f$ is strictly differentiable on $O$.

We consider the linear operator $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by

(165) $T := \begin{pmatrix} 1-a & -b \\ -c & 1-d \end{pmatrix},$

where

(166) $a^2 + b^2 + c^2 + d^2 \neq 0$ (i.e., $(a,b,c,d) \neq (0,0,0,0)$).

Note that when $a = b = c = d = 0$, we have $T = \operatorname{Id} = G_f$ with $f \equiv 0$.

Theorem 11.2 Let $T$ be given by (165). Then $T$ is a subgradient projector of an essentially strictly differentiable function on $\mathbb{R}^2 \setminus \operatorname{Fix} T$ if and only if one of the following holds:

(i) $a = b = c = 0$, $d \neq 0$: $T = G_f$ where $f(x_1,x_2) := K|x_2|^{1/d}$ for some $K > 0$; or $b = c = d = 0$, $a \neq 0$: $T = G_f$ where $f(x_1,x_2) := K|x_1|^{1/a}$ for some $K > 0$.

(ii) $a \neq 0$, $d \neq 0$, $b = c \neq 0$, $ad = c^2$: $T = G_f$ where $f(x_1,x_2) := K|ax_1 + cx_2|^{a/(a^2+c^2)}$ for some $K > 0$.

(iii) $a = d$, $b = -c$, and $a^2 + c^2 \neq 0$: $T = G_f$ where

(167) $f(x_1,x_2) := \begin{cases} K(x_1^2+x_2^2)^{\frac{a}{2(a^2+c^2)}} \exp\Big(-\frac{c}{a^2+c^2}\arctan\frac{x_1}{x_2}\Big) & \text{if } x_2 \neq 0, \\ 0 & \text{if } (x_1,x_2) = (0,0), \\ K|x_1|^{\frac{a}{a^2+c^2}} \exp\Big(-\frac{c}{a^2+c^2}\cdot\frac{\pi}{2}\Big) & \text{if } x_1 \neq 0,\ x_2 = 0, \end{cases}$

for some $K > 0$, and $f$ is lsc. In particular, when $c \neq 0$, $f$ is not convex.
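Before turning to the proof, here is a quick numerical sanity check of case (iii) (our sketch, not from the paper; the sample values $a = c = 1$, $K = 1$ give $T = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, cf. Corollary 11.6(i) below): away from the $x_1$-axis, $f$ is differentiable and the subgradient projector step $x - \frac{f(x)}{\|\nabla f(x)\|^2}\nabla f(x)$ agrees with $Tx$.

```python
import numpy as np

a, c, K = 1.0, 1.0, 1.0                    # sample parameters for case (iii)
T = np.array([[1 - a, c], [-c, 1 - a]])    # (165) with b = -c and d = a

def f(x1, x2):
    # (167) on the set {x2 != 0}
    s = a**2 + c**2
    return K * (x1**2 + x2**2)**(a / (2 * s)) * np.exp(-(c / s) * np.arctan(x1 / x2))

def grad_f(x1, x2, h=1e-6):
    # central finite differences; valid away from the x1-axis
    return np.array([(f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h),
                     (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)])

rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.normal(size=2)
    x[1] = 1.0 + abs(x[1])                 # keep x2 well away from 0
    g = grad_f(x[0], x[1])
    Gf_x = x - (f(x[0], x[1]) / (g @ g)) * g   # subgradient projector step
    assert np.allclose(Gf_x, T @ x, atol=1e-4)
print("G_f agrees with T at sample points off the x1-axis")
```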

Proof. Observe that (166) implies that $\operatorname{Fix} T$ is a proper subspace of $\mathbb{R}^2$. Assume that $T$ is a subgradient projector. By Theorem 4.1 and Lemma 11.1, we can find a differentiable function $g : \mathbb{R}^2 \setminus \operatorname{Fix} T \to \mathbb{R}$ such that for every $x \in \mathbb{R}^2 \setminus \operatorname{Fix} T$,

$\frac{x - Tx}{\|x - Tx\|^2} = \nabla g(x).$

Because

$x - Tx = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} ax_1 + bx_2 \\ cx_1 + dx_2 \end{pmatrix},$

we have

$\frac{\partial g}{\partial x_1} = \frac{ax_1 + bx_2}{(ax_1+bx_2)^2 + (cx_1+dx_2)^2}, \qquad \frac{\partial g}{\partial x_2} = \frac{cx_1 + dx_2}{(ax_1+bx_2)^2 + (cx_1+dx_2)^2}.$

Since

$\frac{\partial^2 g}{\partial x_1 \partial x_2}(x_1,x_2) = -\frac{(a^2b - bc^2 + 2acd)x_1^2 + (b^3 + bd^2)x_2^2 + 2(ab^2 + ad^2)x_1x_2}{\big((ax_1+bx_2)^2 + (cx_1+dx_2)^2\big)^2},$

$\frac{\partial^2 g}{\partial x_2 \partial x_1}(x_1,x_2) = -\frac{(a^2c + c^3)x_1^2 + (cd^2 - b^2c + 2abd)x_2^2 + 2(c^2d + a^2d)x_1x_2}{\big((ax_1+bx_2)^2 + (cx_1+dx_2)^2\big)^2},$

on the nonempty open set $\mathbb{R}^2 \setminus \operatorname{Fix} T$ we have $\frac{\partial^2 g}{\partial x_1\partial x_2}(x_1,x_2) = \frac{\partial^2 g}{\partial x_2\partial x_1}(x_1,x_2)$. This leads to

(1) $a^2b - bc^2 + 2acd = a^2c + c^3$,

(2) $b^3 + bd^2 = cd^2 - b^2c + 2abd$,

(3) $ab^2 + ad^2 = c^2d + a^2d$.

Now multiplying (2) by $a$, followed by subtracting it with (3) multiplied by $b$, gives

$(ad - bc)(ab + cd) = 0.$

It suffices to consider two cases.

Case $ad = bc$. (1) implies $(b-c)(a^2+c^2) = 0$. Observe that $b \neq c$ is impossible: $b \neq c$ forces $a^2 + c^2 = 0$, i.e., $a = c = 0$, and then (2) gives $b(b^2 + d^2) = 0$, so $b = 0 = c$, a contradiction. Then the following two subcases could happen.

i. $b = c = 0$. Then (3) gives $ad(a-d) = 0$; since also $ad = bc = 0$, this means

(168) $a = b = c = 0$, $d \neq 0$,

or

(169) $b = c = d = 0$, $a \neq 0$.

ii. $b = c \neq 0$, which implies $a \neq 0$, $d \neq 0$, and $ad = c^2$.

Case $ab + cd = 0$. (1) implies $(b+c)(a^2+c^2) = 0$. When $b = -c = 0$, (3) gives $ad(a-d) = 0$, which leads to (168), (169), or $a = d \neq 0$ (a special case of Case 3 below with $c = 0$). It remains to consider the case $b = -c \neq 0$. Then (2) and (3) imply $a = d$. Moreover, we can and do assume $a^2 + c^2 \neq 0$, since $a = c = 0$ gives (168) by (2).

In summary, we only have the following three cases.

Case 1. $a = b = c = 0$, $d \neq 0$. Then we get $g(x_1,x_2) = \frac{\ln|x_2|}{d} + C_1$ if $x_2 \neq 0$. Or $b = c = d = 0$, $a \neq 0$. Then we get $g(x_1,x_2) = \frac{\ln|x_1|}{a} + C_1$ if $x_1 \neq 0$.

Case 2. $a \neq 0$, $d \neq 0$, $b = c \neq 0$, $ad = c^2$. Then we get $g(x_1,x_2) = \frac{a}{a^2+c^2}\ln|ax_1 + cx_2| + C_2$ if $ax_1 + cx_2 \neq 0$.

Case 3. $a = d$, $b = -c$, and $a^2 + c^2 \neq 0$. Then we get $g(x_1,x_2) = \frac{a}{2(a^2+c^2)}\ln(x_1^2+x_2^2) - \frac{c}{a^2+c^2}\arctan\Big(\frac{x_1}{x_2}\Big) + C_3$ if $x_2 \neq 0$.

Since $g = \ln f$, we obtain $f = \exp(g)$; Cases 1 and 2 yield the formulas in (i) and (ii). For Case 3, we obtain

$f(x_1,x_2) = K(x_1^2+x_2^2)^{\frac{a}{2(a^2+c^2)}}\exp\Big(-\frac{c}{a^2+c^2}\arctan\frac{x_1}{x_2}\Big) \quad\text{if } x_2 \neq 0,$

for some $K > 0$. However, when $c \neq 0$, $f$ is not continuous at $(\bar x_1, 0)$ with $\bar x_1 \neq 0$, since the two one-sided limits

$\lim_{x_1\to\bar x_1,\ x_2\to 0^{\pm}} \arctan\frac{x_1}{x_2} = \pm\frac{\pi}{2}\operatorname{sign}(\bar x_1)$

differ. The function given by (167) is lsc but not continuous at every such $(\bar x_1, 0)$. Moreover, $f$ is not convex on $\mathbb{R}^2$, since a finite-valued convex function on a finite dimensional space is continuous; see, e.g., [7, Corollary 8.31].

It is interesting to ask for which selection $s \in \partial f$ we have $G_f = T$ on $\mathbb{R}^2$. On $\mathbb{R}^2 \setminus \{(x_1,x_2) \mid x_2 = 0\}$, one clearly chooses $s = \nabla f$. It remains to determine the subgradient of $f$ at $(\bar x_1, 0)$. Indeed, when $x_2 \neq 0$, $f(x_1,x_2) = \exp(g(x_1,x_2))$, so that $\nabla f(x_1,x_2) = f(x_1,x_2)\nabla g(x_1,x_2)$, i.e.,

$\nabla f(x_1,x_2) = K(x_1^2+x_2^2)^{\frac{a}{2(a^2+c^2)}}\exp\Big(-\frac{c}{a^2+c^2}\arctan\frac{x_1}{x_2}\Big)\,\frac{1}{a^2+c^2}\,\frac{(ax_1 - cx_2,\ ax_2 + cx_1)}{x_1^2+x_2^2}.$

When $(x_1,x_2) \to (\bar x_1, 0)$ with $cx_1/x_2 > 0$, we have

$f(x_1,x_2) \to K|\bar x_1|^{\frac{a}{a^2+c^2}}\exp\Big(-\frac{c}{a^2+c^2}\cdot\frac{\pi}{2}\Big) = f(\bar x_1, 0)$

and

$\nabla f(x_1,x_2) \to K|\bar x_1|^{\frac{a}{a^2+c^2}}\exp\Big(-\frac{c}{a^2+c^2}\cdot\frac{\pi}{2}\Big)\,\frac{1}{a^2+c^2}\Big(\frac{a}{\bar x_1},\ \frac{c}{\bar x_1}\Big).$

Therefore, by the definition of limiting subdifferentials (see Definition 2.1),

(170) $K|\bar x_1|^{\frac{a}{a^2+c^2}}\exp\Big(-\frac{c}{a^2+c^2}\cdot\frac{\pi}{2}\Big)\,\frac{1}{a^2+c^2}\Big(\frac{a}{\bar x_1},\ \frac{c}{\bar x_1}\Big) \in \partial f(\bar x_1, 0).$

Hence, we can choose $s(\bar x_1, 0)$ to be the limiting subgradient given by (170).
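The mixed-partials computation that produces (1)-(3) is mechanical and can be reproduced symbolically. The following sketch (ours, using sympy) expands the numerator of $\frac{\partial^2 g}{\partial x_1\partial x_2} - \frac{\partial^2 g}{\partial x_2\partial x_1}$ and prints the coefficients of $x_1^2$, $x_2^2$, and $x_1x_2$; their vanishing is exactly (1)-(3).

```python
import sympy as sp

x1, x2, a, b, c, d = sp.symbols('x1 x2 a b c d', real=True)
N = (a*x1 + b*x2)**2 + (c*x1 + d*x2)**2
g1 = (a*x1 + b*x2) / N                    # dg/dx1, as in the proof
g2 = (c*x1 + d*x2) / N                    # dg/dx2, as in the proof

# g1 and g2 are the partials of a common g only if the mixed partials agree.
num = sp.expand(sp.numer(sp.together(sp.diff(g1, x2) - sp.diff(g2, x1))))
poly = sp.Poly(num, x1, x2)
for monom, coeff in zip(poly.monoms(), poly.coeffs()):
    print(monom, sp.factor(coeff))        # vanishing coefficients <=> (1)-(3)
```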

Remark 11.3 Note that $\partial f(\bar x_1, 0)$ is not a singleton when $\bar x_1 \neq 0$ and $c \neq 0$ in Theorem 11.2(iii). Thus, in Theorem 11.2(iii), when $c \neq 0$ we only have $T = G_f$ for a suitable selection of $\partial f$. In order to make $f$ continuous on $\mathbb{R}^2$, we need $c = 0$, in which case (167) reduces to $f(x_1,x_2) = K\|(x_1,x_2)\|^{1/a}$ and $G_f = (1-a)\operatorname{Id}$. Clearly, $f$ is not convex when $a > 1$. This has been discussed in Example 2.5.

Corollary 11.4 Let $T$ be given by (165). Suppose that one of the following holds:

(i) $b \neq \pm c$.

(ii) $b = c = 0$, $a \neq 0$, $d \neq 0$, and $a \neq d$.

Then there exists no essentially strictly differentiable function $f : \mathbb{R}^2 \to \mathbb{R}$ such that $T = G_f$.

Corollary 11.5 Let $T$ be given by (165). Suppose that $b = c = 0$, $0 < a < 1$, $0 < d < 1$, and $a \neq d$. Then $T$ is firmly nonexpansive, and there exists no essentially strictly differentiable function $f : \mathbb{R}^2 \to \mathbb{R}$ such that $T = G_f$.

Corollary 11.6 (i) The skew linear mapping $T := \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ is not firmly nonexpansive, so not a cutter. However, $T$ is a subgradient projector of a nonconvex, discontinuous but lsc function $f_1$ given by

(171) $f_1(x,y) := \begin{cases} (x^2+y^2)^{1/4}\exp\big(-(1/2)\arctan(x/y)\big) & \text{if } y \neq 0, \\ 0 & \text{if } (x,y) = (0,0), \\ |x|^{1/2}\exp(-\pi/4) & \text{if } x \neq 0,\ y = 0. \end{cases}$

(ii) The linear mapping $T := \begin{pmatrix} 1/2 & 1/2 \\ -1/2 & 1/2 \end{pmatrix}$ is firmly nonexpansive and a cutter. However, $T$ is a subgradient projector of a nonconvex, discontinuous but lsc function $f_2$ given by

(172) $f_2(x,y) := \begin{cases} (x^2+y^2)^{1/2}\exp\big(-\arctan(x/y)\big) & \text{if } y \neq 0, \\ 0 & \text{if } (x,y) = (0,0), \\ |x|\exp(-\pi/2) & \text{if } x \neq 0,\ y = 0. \end{cases}$
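A numerical double-check of Corollary 11.6 (our sketch, with the matrices as displayed above): a linear $T$ is firmly nonexpansive if and only if $\|Tx\|^2 \leq \langle x, Tx\rangle$ for all $x$, which fails for the skew mapping in (i) and holds for the averaged mapping in (ii); the identity $f_2 = f_1^2$ (see the note following the figures below) is also verified at sample points.

```python
import numpy as np

T1 = np.array([[0.0, 1.0], [-1.0, 0.0]])   # Corollary 11.6(i), skew
T2 = 0.5 * (np.eye(2) + T1)                # Corollary 11.6(ii)

def firmly_nonexpansive(T, trials=1000, seed=1):
    # For linear T: firmly nonexpansive iff ||Tx||^2 <= <x, Tx> for all x.
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x = rng.normal(size=2)
        if (T @ x) @ (T @ x) > x @ (T @ x) + 1e-12:
            return False
    return True

print("T1 firmly nonexpansive:", firmly_nonexpansive(T1))   # False
print("T2 firmly nonexpansive:", firmly_nonexpansive(T2))   # True

def f1(x, y):                              # (171), with lsc boundary values
    if x == 0 and y == 0:
        return 0.0
    if y == 0:
        return abs(x) ** 0.5 * np.exp(-np.pi / 4)
    return (x * x + y * y) ** 0.25 * np.exp(-0.5 * np.arctan(x / y))

def f2(x, y):                              # (172), with lsc boundary values
    if x == 0 and y == 0:
        return 0.0
    if y == 0:
        return abs(x) * np.exp(-np.pi / 2)
    return (x * x + y * y) ** 0.5 * np.exp(-np.arctan(x / y))

for p in [(1.0, 2.0), (-0.3, 0.7), (2.0, 0.0)]:
    assert np.isclose(f1(*p) ** 2, f2(*p))  # f2 = f1^2
print("f2 = f1^2 at sample points")
```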

[Figure 1: Plot of the function given by (171).]

[Figure 2: Plot of the function given by (172).]

Note that $f_2 = f_1^2$ in Corollary 11.6 and $G_{f_2} = (\operatorname{Id} + G_{f_1})/2$.

Remark 11.7 Corollary 11.5 and Corollary 11.6 together show that although the set of cutters and the set of subgradient projectors have a nonempty intersection, they are different because neither one contains the other. By Theorem 4.5, there exists no continuous convex function $f$ such that $G_f = T$ in either case of Corollary 11.6. Corollary 11.6 says that $T = G_f$ being linear and firmly nonexpansive does not imply that $f$ is convex.

A key point below is that if $T = G_f$ is linear and $f$ is convex on $\mathbb{R}^2$, then Theorem 11.2 implies that $T$ has to be firmly nonexpansive and symmetric.

Corollary 11.8 Let $T$ be given by (165). Then $T$ is a subgradient projector of a convex function if and only if one of the following holds:

(i) $a = b = c = 0$, $0 < d \leq 1$: $T = G_f$ where $f(x_1,x_2) = K|x_2|^{1/d}$ for some $K > 0$; or $b = c = d = 0$, $0 < a \leq 1$: $T = G_f$ where $f(x_1,x_2) = K|x_1|^{1/a}$ for some $K > 0$.

(ii) $a \neq 0$, $d \neq 0$, $b = c \neq 0$, $ad = c^2$, $a \geq a^2 + c^2$: $T = G_f$ where $f(x_1,x_2) = K|ax_1 + cx_2|^{a/(a^2+c^2)}$ for some $K > 0$.

(iii) $a = d$, $b = c = 0$, $0 < a \leq 1$: $T = G_f$ where $f(x_1,x_2) = K(x_1^2 + x_2^2)^{\frac{1}{2a}}$ for some $K > 0$.

Acknowledgments

The authors thank two anonymous referees for careful reading and constructive suggestions on the paper. HHB was partially supported by a Discovery Grant of the Natural Sciences and Engineering Research Council of Canada (NSERC) and by the Canada Research Chair Program. CW was partially supported by the National Natural Science Foundation of China. XW was partially supported by a Discovery Grant of the Natural Sciences and Engineering Research Council of Canada (NSERC). JX was supported by NSERC grants of HHB and XW.
