Subgradient Projectors: Extensions, Theory, and Characterizations

Heinz H. Bauschke, Caifang Wang, Xianfu Wang, and Jia Xu

April 13, 2017

Abstract

Subgradient projectors play an important role in optimization and for solving convex feasibility problems. For every locally Lipschitz function, we can define a subgradient projector via generalized subgradients even if the function is not convex. The paper consists of three parts. In the first part, we study basic properties of subgradient projectors and give characterizations of when a subgradient projector is a cutter, a local cutter, or a quasi-nonexpansive mapping. We present global and local convergence analyses of subgradient projectors. Many examples are provided to illustrate the theory. In the second part, we investigate the relationship between the subgradient projector of a prox-regular function and the subgradient projector of its Moreau envelope. We also characterize when a mapping is the subgradient projector of a convex function. In the third part, we focus on linearity properties of subgradient projectors. We show that, under appropriate conditions, a linear operator is a subgradient projector of a convex function if and only if it is a convex combination of the identity operator and a projection operator onto a subspace. In general, neither a convex combination nor a composition of subgradient projectors of convex functions is a subgradient projector of a convex function.

Mathematics Subject Classification: Primary 49J52; Secondary 49J53, 47H04, 47H05, 47H09.

Keywords: Approximately convex function, averaged mapping, cutter, essentially strictly differentiable function, fixed point, limiting subgradient, local cutter, local quasi-firmly nonexpansive mapping, local quasi-nonexpansive mapping, locally Lipschitz function, linear cutter, linear firmly nonexpansive mapping, linear subgradient projection operator, Moreau envelope, projection, prox-bounded, proximal mapping, prox-regular function, quasi-firmly nonexpansive mapping, quasi-nonexpansive mapping, (C, ε)-firmly nonexpansive mapping, subdifferentiable function, subgradient projection operator.

1 Introduction

Studies of optimization problems and convex feasibility problems have led in recent years to the development of a theory of subgradient projectors; a subgradient projector is a projection onto a certain half-space.

Affiliations: Mathematics, University of British Columbia, Kelowna, B.C. V1V 1V7, Canada (heinz.bauschke@ubc.ca, shawn.wang@ubc.ca, jia.xu@ubc.ca); Department of Mathematics, Shanghai Maritime University, Shanghai, China (cfwang@shmtu.edu.cn).

Rather than finding projections onto level sets of the original functions, the iterative algorithms find projections onto half-spaces that contain the 0-level set of the function. Polyak developed the subgradient projector iteration for convex functions [46, 47, 48]; it was further developed by Censor, Combettes, Fukushima, Kiwiel, Yamada and others, and applied to many kinds of optimization problems [22, 21, 9, 24, 25, 29, 34, 35, 44, 19, 55]. In [12], we gave a systematic study of subgradient projectors of convex functions.

Convexity is often too strong an assumption for the needs of applications. In a recent work [43], Pang studied finitely convergent algorithms for nonconvex inequality problems involving approximately convex functions. The subgradient projector used by Pang is based on the Clarke subdifferential instead of the Mordukhovich limiting subdifferential. To this day, however, a systematic theory of the subgradient projector for possibly nonconvex functions is lacking. The goal of this paper is to develop the basic theory of subgradient projectors for possibly nonconvex functions on a finite-dimensional space, aimed ultimately at applications to diverse problems of nonconvex optimization. Nondifferentiable and nonconvex functions arise in many optimization problems. As far as nonconvex functions are concerned, the cutter theory (or T-class) developed by Cegielski [20], Bauschke, Borwein and Combettes [8], and Bauschke, Wang, Wang and Xu [13] furnishes a new approach to subgradient projectors, without appealing to the existence theory of subgradient projectors for convex functions. Our study shows that subgradient projectors of nonconvex functions have many attractive analytical properties. Among the results presented here, we discover that while cutters and quasi-nonexpansive mappings on $\mathbb{R}^n$ are global notions, cutters and quasi-nonexpansive mappings on a neighborhood are more useful for functions that are locally convex around the desired point, say a critical point or a feasible point. This paper not only includes some results from [54], but also many refinements and new advances. Since definitions and proofs are much simpler in finite-dimensional spaces, and many technical complications do not even appear, we shall work in finite-dimensional spaces only.

For the convenience of readers, our main results are presented in three parts. In the first part, we study extensions and theory of subgradient projectors. In the second part, we consider subgradient projectors of Moreau envelopes and conditions under which a mapping is the subgradient projector of a convex function. The third part is devoted to linear subgradient projectors.

The remainder of this paper is organized as follows. Part I consists of Sections 2-6. Section 2 provides an extension of subgradient projectors from convex functions to possibly nonconvex functions; Section 3 is devoted to calculus of subgradient projectors; Section 4 deals with whether one can recover a function from its subgradient projector, and with the fixed-point closed property of a subgradient projector; conditions on functions under which their subgradient projectors are cutters or local cutters are presented in Section 5. Section 6 is devoted to convergence analysis of subgradient projectors using the theory of cutters, local cutters, quasi-nonexpansive mappings, and local quasi-nonexpansive mappings.
Under appropriate assumptions, we show that subgradient projectors are $(C, \varepsilon)$-firmly nonexpansive, a very useful concept introduced by Hesse and Luke for studying local linear convergence of a variety of algorithms. Part II consists of Sections 7-8. For prox-bounded and prox-regular functions, their Moreau envelopes are differentiable. Section 7 studies the subgradient projectors of Moreau envelopes of prox-bounded and prox-regular functions, and their connections to subgradient projectors of the original functions. We show that if $f$ is proper, lsc, prox-bounded, and prox-regular on $\mathbb{R}^n$, then $f$ is a difference of convex functions (Corollary 7.9); and that if $f$ is $C^2$, $\min f = 0$, and $\nabla f(x) \neq 0$ for every $x \in \mathbb{R}^n \setminus \operatorname{argmin} f$, then the subgradient projector $G_f$ of $f$ is a cutter if and only if the subgradient projector $G_{e_\lambda f}$ of the envelope $e_\lambda f$ is a cutter for every $\lambda > 0$ (Propositions 7.22 and 7.24). Section 8 characterizes when a mapping is actually a subgradient projector of a convex function. Part III consists of Sections 9-11. It is interesting to ask when a subgradient projector is linear, and what special properties a linear subgradient projector possesses. To the best of our knowledge, this question has not been explored in the literature. Section 9 studies linear subgradient projectors and their distinguished features. In particular, we give a nonlinear cutter which is nonexpansive but not firmly nonexpansive; the example is much simpler than the one given by Cegielski [20]. In Section 10, using results from Section 9, we show that in general neither a convex combination nor a composition of subgradient projectors of convex functions is a subgradient projector of a convex function. Finally, in Section 11, we completely characterize linear subgradient projectors on $\mathbb{R}^2$, and give explicit formulae for the corresponding functions.

The notation that we employ is for the most part standard; however, a partial list is provided for the reader's convenience. Throughout this paper, $\mathbb{R}^n$ is the $n$-dimensional Euclidean space with inner product $\langle \cdot, \cdot \rangle$ and induced norm $\|\cdot\|$, i.e., $(\forall x \in \mathbb{R}^n)\ \|x\| := \sqrt{\langle x, x \rangle}$. The identity operator on $\mathbb{R}^n$ is $\operatorname{Id}$. For a mapping $T : \mathbb{R}^n \to \mathbb{R}^n$, its fixed point set is denoted by $\operatorname{Fix} T := \{ x \in \mathbb{R}^n \mid Tx = x \}$; its kernel is $\ker T := \{ x \in \mathbb{R}^n \mid Tx = 0 \}$; its range is $\operatorname{ran} T := \{ y \in \mathbb{R}^n \mid y = Tx \text{ for some } x \in \mathbb{R}^n \}$. For a function $f : \mathbb{R}^n \to (-\infty, +\infty]$, its $\alpha$-level set is denoted by $\operatorname{lev}_{\leq \alpha} f := \{ x \in \mathbb{R}^n \mid f(x) \leq \alpha \}$; its effective domain is $\operatorname{dom} f := \{ x \in \mathbb{R}^n \mid f(x) < +\infty \}$. For a set-valued mapping $F : \mathbb{R}^n \rightrightarrows \mathbb{R}^m$, the domain, range and fixed point set of $F$ are given by $\operatorname{dom} F := \{ x \in \mathbb{R}^n \mid F(x) \neq \varnothing \}$, $\operatorname{ran} F := \bigcup \{ F(x) \mid x \in \mathbb{R}^n \}$, and $\operatorname{Fix} F := \{ x \in \mathbb{R}^n \mid x \in F(x) \}$, respectively. We use $B(x, \delta)$ for the closed ball centered at $x \in \mathbb{R}^n$ with radius $\delta > 0$. $\mathbb{R}_+$ denotes the set of nonnegative real numbers, and $\mathbb{N}$ denotes the set of nonnegative integers $\{0, 1, 2, \ldots\}$. For a set $C \subseteq \mathbb{R}^n$, its distance function is

$$d_C : \mathbb{R}^n \to [0, +\infty) : x \mapsto \inf \{ \|x - y\| \mid y \in C \},$$

and the projection operator onto $C$ is

$$P_C : \mathbb{R}^n \rightrightarrows C : x \mapsto \{ p \in C \mid \|x - p\| = d_C(x) \}.$$

The indicator function $\iota_C : \mathbb{R}^n \to (-\infty, +\infty]$ of $C$ is defined by $\iota_C(x) := 0$ if $x \in C$ and $\iota_C(x) := +\infty$ if $x \notin C$. We write $\operatorname{int} C$ for the interior of $C$, and $\operatorname{bdry}(C) := \overline{C} \setminus \operatorname{int} C$ for the boundary of $C$. For a subspace $L \subseteq \mathbb{R}^n$, its orthogonal complement is $L^{\perp} := \{ y \in \mathbb{R}^n \mid \langle y, x \rangle = 0 \ \forall x \in L \}$. For $x, y \in \mathbb{R}^n$, the line segment between $x$ and $y$ is $[x, y] := \{ (1-\lambda)x + \lambda y \mid 0 \leq \lambda \leq 1 \}$.

Part I

Extensions to possibly nonconvex functions and basic theory

2 An extension of subgradient projectors via limiting subgradients

To introduce subgradient projectors for possibly nonconvex functions, we need the following generalized subgradients [51, 40, 39, 31].

Definition 2.1 Consider a function $f : \mathbb{R}^n \to (-\infty, +\infty]$ and a point $\bar{x} \in \mathbb{R}^n$ with $f(\bar{x})$ finite. For a vector $v \in \mathbb{R}^n$, one says that

(i) $v$ is a regular (or Fréchet) subgradient of $f$ at $\bar{x}$, written $v \in \hat{\partial} f(\bar{x})$, if

$$f(x) \geq f(\bar{x}) + \langle v, x - \bar{x} \rangle + o(\|x - \bar{x}\|);$$

(ii) $v$ is a limiting (or Mordukhovich) subgradient of $f$ at $\bar{x}$, written $v \in \partial f(\bar{x})$, if there are sequences $x^{\nu} \to \bar{x}$ with $f(x^{\nu}) \to f(\bar{x})$, and $v^{\nu} \in \hat{\partial} f(x^{\nu})$ with $v^{\nu} \to v$.

A locally Lipschitz function is subdifferentially regular at $\bar{x}$ with $f(\bar{x})$ finite if $\partial f(\bar{x}) = \hat{\partial} f(\bar{x})$; see [51, Corollary 8.11], [39]. It is well known that when $f$ is locally Lipschitz, $\partial f$ is nonempty-valued everywhere; when $f$ is lower semicontinuous (lsc), the set of points at which $\partial f$ is nonempty-valued is at least dense in the domain of $f$ [51, Corollary 8.10]. Furthermore, $\partial f$ is the usual Fenchel subdifferential when $f$ is convex. All of these results can be found in [18, 17, 40, 51].

A function $f : \mathbb{R}^n \to \mathbb{R}$ is called subdifferentiable if $\partial f(x) \neq \varnothing$ for every $x \in \mathbb{R}^n$. While every locally Lipschitz function on $\mathbb{R}^n$ is a subdifferentiable function, a subdifferentiable function might not be locally Lipschitz, e.g.,

$$f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} 1 & \text{if } x \leq 0, \\ 1 - \sqrt{x} & \text{if } x > 0; \end{cases}$$

see also [51, page 359].

The key concept we shall study is the subgradient projection operator.

Definition 2.2 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $s : \mathbb{R}^n \to \mathbb{R}^n$ be a selection of $\partial f$. The subgradient projector of $f$ is defined by

$$(1) \qquad G_{f,s} : \mathbb{R}^n \to \mathbb{R}^n : x \mapsto \begin{cases} x - \dfrac{f(x)}{\|s(x)\|^2}\, s(x) & \text{if } f(x) > 0 \text{ and } 0 \notin \partial f(x), \\ x & \text{otherwise.} \end{cases}$$

When it is not necessary to emphasize the selection $s$, we will write $G_f$. It is also convenient to introduce the set-valued mapping associated with $f$ by

$$(2) \qquad \mathcal{G}_f : \mathbb{R}^n \rightrightarrows \mathbb{R}^n : x \mapsto \{ G_{f,s}(x) \mid s \text{ is a selection of } \partial f \}$$

with $G_{f,s}$ given in (1).
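To make Definition 2.2 concrete, here is a minimal numerical sketch in Python (ours, not from the paper). It assumes $f$ is differentiable at the queried points, so the gradient is the only selection of $\partial f$, and it treats a numerically zero subgradient as the case $0 \in \partial f(x)$.

```python
import numpy as np

def subgradient_projector(f, s, x, tol=1e-12):
    """A sketch of G_{f,s} from equation (1) for a differentiable f.

    f: callable returning f(x); s: callable returning a subgradient s(x).
    """
    fx, sx = f(x), np.asarray(s(x), dtype=float)
    # Return x unchanged when f(x) <= 0 or s(x) is (numerically) zero.
    if fx <= 0 or np.linalg.norm(sx) <= tol:
        return np.asarray(x, dtype=float)
    return x - (fx / np.dot(sx, sx)) * sx

# Demo with f(x) = ||x||^2 - 1; one application moves x toward lev_{<=0} f.
f = lambda x: np.dot(x, x) - 1.0
s = lambda x: 2.0 * x
print(subgradient_projector(f, s, np.array([2.0, 0.0])))  # [1.25, 0.]
```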

Although subgradient projectors have been well studied for convex functions [12, 21, 24, 20, 44, 46, 48, 55, 42], the extension to possibly nonconvex functions is new. When $f$ is convex and $\inf_{\mathbb{R}^n} f \leq 0$, $G_{f,s}$ reduces to

$$G_{f,s} : \mathbb{R}^n \to \mathbb{R}^n : x \mapsto \begin{cases} x - \dfrac{f(x)}{\|s(x)\|^2}\, s(x) & \text{if } f(x) > 0, \\ x & \text{otherwise,} \end{cases}$$

where $s : \mathbb{R}^n \to \mathbb{R}^n$ is a selection of $\partial f$ with $s(x) \in \partial f(x)$. When $f$ is continuously differentiable on $\mathbb{R}^n \setminus \operatorname{lev}_{\leq 0} f$, $G_f$ reduces to

$$G_f : \mathbb{R}^n \to \mathbb{R}^n : x \mapsto \begin{cases} x - \dfrac{f(x)}{\|\nabla f(x)\|^2}\, \nabla f(x) & \text{if } f(x) > 0 \text{ and } \nabla f(x) \neq 0, \\ x & \text{otherwise.} \end{cases}$$

The geometric interpretation and motivation of the subgradient projector come from the following:

Proposition 2.3 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $s$ be a selection of $\partial f$.

(i) Whenever $f(x) > 0$ and $0 \notin \partial f(x)$, we have $G_{f,s}(x) = P_{H_{\leq}(s(x),x)}(x)$, where the half-space is

$$(3) \qquad H_{\leq}(s(x), x) := \{ z \in \mathbb{R}^n \mid f(x) + \langle s(x), z - x \rangle \leq 0 \}.$$

(ii) The fixed point set of $G_{f,s}$ is $\operatorname{Fix} G_{f,s} = \{ x \in \mathbb{R}^n \mid 0 \in \partial f(x) \} \cup \operatorname{lev}_{\leq 0} f = \operatorname{Fix} \mathcal{G}_f$. If $f$ is locally Lipschitz, then $\operatorname{Fix} G_{f,s}$ is closed.

(iii) If $f$ is convex and $\inf_{\mathbb{R}^n} f \leq 0$, then $\operatorname{Fix} G_{f,s} = \operatorname{lev}_{\leq 0} f$.

Proof. (i). According to [7] or [20, page 133], for the half-space $H_{\leq}(a, \beta) := \{ z \in \mathbb{R}^n \mid \langle a, z \rangle \leq \beta \}$, where $a \in \mathbb{R}^n$, $a \neq 0$ and $\beta \in \mathbb{R}$, the metric projection is given by

$$(4) \qquad P_{H_{\leq}(a,\beta)}x = \begin{cases} x - \dfrac{\langle a, x \rangle - \beta}{\|a\|^2}\, a & \text{if } \langle a, x \rangle > \beta, \\ x & \text{if } \langle a, x \rangle \leq \beta. \end{cases}$$

Apply (4) with $a := s(x)$ and $\beta := \beta(x) = \langle s(x), x \rangle - f(x)$.

(ii). This follows from the definition of $G_f$. When $f$ is locally Lipschitz, $\partial f$ is upper semicontinuous [51, Proposition 8.7], so $\{ x \in \mathbb{R}^n \mid 0 \in \partial f(x) \}$ is closed. Being a union of two closed sets, $\operatorname{Fix} G_{f,s}$ is closed.

(iii). When $f$ is convex, $0 \in \partial f(x)$ gives $f(x) = \min_{\mathbb{R}^n} f$, so $f(x) \leq 0$. Then $\{ x \in \mathbb{R}^n \mid 0 \in \partial f(x) \} \subseteq \operatorname{lev}_{\leq 0} f$. Thus (iii) follows from (ii).
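The projection identity in Proposition 2.3(i) can be checked numerically; the following sketch (ours, assuming a smooth convex $f$ with its gradient as the selection) compares $G_{f,s}(x)$ with a direct implementation of the half-space projection formula (4).

```python
import numpy as np

def project_halfspace(a, beta, x):
    """Project x onto {z : <a, z> <= beta} via formula (4); requires a != 0."""
    gap = np.dot(a, x) - beta
    return x - (gap / np.dot(a, a)) * a if gap > 0 else x

f = lambda x: np.dot(x, x) - 1.0     # f(x) = ||x||^2 - 1
s = lambda x: 2.0 * x                # gradient selection
x = np.array([2.0, 1.0])
via_G = x - (f(x) / np.dot(s(x), s(x))) * s(x)
# Apply (4) with a := s(x) and beta := <s(x), x> - f(x), as in the proof.
via_P = project_halfspace(s(x), np.dot(s(x), x) - f(x), x)
print(np.allclose(via_G, via_P))     # True
```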

Remark 2.4 (i) Proposition 2.3(i) uses the Euclidean distance. Following [36, 8, 11, 10], one may define Bregman subgradient projectors for lsc and subdifferentiable functions. This will be explored in future work.

(ii) Proposition 2.3(ii) shows that for subdifferentiable functions, a fixed point of $G_f$ gives $x \in \mathbb{R}^n$ such that $0 \in \partial f(x)$ or $f(x) \leq 0$. Proposition 2.3(iii) shows that for convex functions, a fixed point of $G_f$ gives $x \in \mathbb{R}^n$ such that $f(x) \leq 0$.

We give two simple examples to illustrate the difference of subgradient projectors between convex and nonconvex functions.

Example 2.5 Consider $f : \mathbb{R}^n \to \mathbb{R} : x \mapsto \sqrt[k]{\|x\|} = \|x\|^{1/k}$ where $k > 0$. Then

(i) When $k \leq 1$, $f$ is convex, and $G_f = (1-k)\operatorname{Id}$ is firmly nonexpansive.

(ii) When $k > 1$, $f$ is not convex, and $G_f = (1-k)\operatorname{Id}$ is not monotone and need not be nonexpansive, e.g., $k = 3$.

Proof. For $x \neq 0$, we have

$$\nabla f(x) = \frac{1}{k}\,\|x\|^{1/k - 1}\,\frac{x}{\|x\|},$$

and the result follows from the definition of $G_f$.

Let $B$ denote the closed unit ball of $\mathbb{R}^n$. According to [51, Exercise 8.14], for a nonempty $C \subseteq \mathbb{R}^n$ the normal cone and regular normal cone mappings are respectively defined by $N_C := \partial \iota_C$ and $\hat{N}_C := \hat{\partial} \iota_C$. Recall

Fact 2.6 ([51, Example 8.53]) For $f := d_C$ in the case of a closed set $C \neq \varnothing$ in $\mathbb{R}^n$, one has at any point $\bar{x} \in C$ that

$$\partial f(\bar{x}) = N_C(\bar{x}) \cap B, \qquad \hat{\partial} f(\bar{x}) = \hat{N}_C(\bar{x}) \cap B.$$

On the other hand, for any $\bar{x} \notin C$, one has

$$\partial f(\bar{x}) = \frac{\bar{x} - P_C(\bar{x})}{d_C(\bar{x})}, \qquad \hat{\partial} f(\bar{x}) = \begin{cases} \left\{ \dfrac{\bar{x} - \tilde{x}}{d_C(\bar{x})} \right\} & \text{if } P_C(\bar{x}) = \{\tilde{x}\}, \\ \varnothing & \text{otherwise.} \end{cases}$$

Example 2.7 (subgradient projectors of distance functions) Let $C \neq \varnothing$ be a closed set in $\mathbb{R}^n$. Then $\mathcal{G}_{d_C} = P_C$.

Proof. Let $s$ be a selection of $\partial d_C$. By Fact 2.6,

$$(\forall x \notin C) \qquad s(x) = \frac{x - p(x)}{d_C(x)},$$

where $p(x) \in P_C(x)$. We show $G_{d_C,s} = p$.

When $x \in C$, we have $P_C(x) = \{x\}$ and $p(x) = x$. Because $d_C(x) = 0$ for $x \in C$, the definition of $G_{d_C,s}$ gives $G_{d_C,s}(x) = x$. Thus $G_{d_C,s}(x) = x = p(x)$ for $x \in C$.

When $x \notin C$, $d_C(x) > 0$ and $0 \notin \partial d_C(x)$ because every $v \in \partial d_C(x)$ has $\|v\| = 1$ by Fact 2.6. Then for $x \notin C$,

$$G_{d_C,s}(x) = x - d_C(x)\,\frac{x - p(x)}{d_C(x)} = p(x).$$

Altogether, $G_{d_C,s}(x) = p(x)$ for every $x \in \mathbb{R}^n$.

When $C$ is nonempty, closed and convex, the projection mapping $P_C$ is single-valued, and $d_C$ is continuously differentiable on $\mathbb{R}^n \setminus C$. Example 2.7 implies:

Fact 2.8 ([11], [25]) Let $C \subseteq \mathbb{R}^n$ be nonempty, closed and convex. Then $G_{d_C} = P_C$.

What happens if we take the subgradient projector of a distance function to a set where the distance is taken with respect to another norm? The following example illustrates that using the Euclidean norm for $d_C$ in Example 2.7 is essential.

Example 2.9 Define $f : \mathbb{R}^2 \to \mathbb{R} : (x_1, x_2) \mapsto |x_1| + |x_2|$, the distance function to $C := \{(0,0)\}$ in the $\ell^1$-norm. When $x_1 > 0$, $x_2 > 0$, $x_1 \neq x_2$, we have

$$G_f(x_1, x_2) = \left( \frac{x_1 - x_2}{2}, \frac{x_2 - x_1}{2} \right) \neq (0,0) = P_C(x_1, x_2).$$

Even using the dual norm of $\|\cdot\|_1$ for $s(x_1, x_2)$, we have $G_f(x_1, x_2) = (-x_2, -x_1) \neq (0,0) = P_C(x_1, x_2)$.

Remark 2.10 Example 2.7 might lead the reader to believe that $G_f$ is a monotone operator. This holds for any twice differentiable convex function $f : \mathbb{R} \to \mathbb{R}$; see [12, Proposition 8.2]. However, this fails for $f : \mathbb{R}^2 \to \mathbb{R} : (x_1, x_2) \mapsto |x_1|^p + |x_2|^p$ when $1 < p < 2$; see [12, Proposition 10.1(iii)].

The following example shows that the assumption that $f$ is subdifferentiable is important in Definition 2.2.

Example 2.11 The function defined by

$$f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} 1/|x| & \text{if } x \neq 0, \\ 0 & \text{if } x = 0, \end{cases}$$

has $G_f = 2\operatorname{Id}$ on $\mathbb{R}$, so that $\operatorname{Fix} G_f = \operatorname{Fix} 2\operatorname{Id} = \{0\}$. However, the function defined by

$$g : \mathbb{R} \to (-\infty, +\infty] : x \mapsto \begin{cases} 1/|x| & \text{if } x \neq 0, \\ +\infty & \text{if } x = 0, \end{cases}$$

has $G_g = 2\operatorname{Id}$ on $\mathbb{R} \setminus \{0\}$. Because $G_g$ is not defined at $x = 0$, we have $\operatorname{Fix} G_g = \varnothing$ but $\operatorname{Fix} 2\operatorname{Id} = \{0\}$.

3 Calculus for subgradient projectors

In this section we obtain calculus results for subgradient projectors defined in Section 2 related to representations of subgradient projectors for max functions, compositions of functions with a linear operator, and positive powers of nonnegative functions. Subdifferential calculus is the main tool for proving these results.
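Before turning to the calculus rules, here is a small numerical check (ours) of Example 2.7 with a nonconvex two-point set $C$, using the unit subgradient of $d_C$ from Fact 2.6.

```python
import numpy as np

# Example 2.7 check: for a (nonconvex) finite set C, G_{d_C} = P_C.
C = np.array([[0.0, 0.0], [3.0, 0.0]])           # two points in R^2

def proj_C(x):
    return C[np.argmin(np.linalg.norm(C - x, axis=1))]

def G_dC(x):
    p = proj_C(x)
    d = np.linalg.norm(x - p)
    if d == 0.0:
        return x
    s = (x - p) / d                              # Fact 2.6: the unit subgradient
    return x - d * s                             # = p, since ||s|| = 1

x = np.array([2.0, 1.0])
print(G_dC(x), proj_C(x))                        # identical outputs
```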

A mapping $\Phi : \mathbb{R}^n \to \mathbb{R}^k$ is called strictly differentiable at $\bar{x}$ if the Fréchet derivative $\nabla\Phi(\bar{x})$ exists and

$$\lim_{\substack{x, y \to \bar{x} \\ x \neq y}} \frac{\Phi(x) - \Phi(y) - \nabla\Phi(\bar{x})(x - y)}{\|x - y\|} = 0.$$

The following facts on subdifferentials are crucial for studying the calculus of subgradient projectors.

Fact 3.1 ([39, Theorem 6.5], [40, Theorem 1.110(ii)]) Assume that $F : \mathbb{R}^n \to \mathbb{R}^k$ is locally Lipschitz at $\bar{x} \in \mathbb{R}^n$, and $g : \mathbb{R}^k \to \mathbb{R}$ is strictly differentiable at $F(\bar{x})$. Then for $f(x) = g(F(x))$, one has

$$\partial f(\bar{x}) = \partial \langle \nabla g(\bar{y}), F \rangle(\bar{x}) \quad \text{with } \bar{y} = F(\bar{x}).$$

For a matrix $A : \mathbb{R}^n \to \mathbb{R}^n$, let $A^{\top}$ denote its transpose.

Fact 3.2 ([39, Theorem 6.7(i)], [40, Proposition 1.112(i)], or [51, Exercise 10.7]) Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be strictly differentiable at $\bar{x}$ with $\nabla F(\bar{x})$ invertible, and suppose that $f(x) = g(F(x))$ with $g : \mathbb{R}^n \to (-\infty, +\infty]$ lsc around $\bar{y} = F(\bar{x})$ and $f$ finite at $\bar{x}$. Then

$$\partial f(\bar{x}) = \left( \nabla F(\bar{x}) \right)^{\top} \partial g(\bar{y}) \quad \text{with } \bar{y} = F(\bar{x}).$$

Fact 3.3 ([39, Theorem 7.5(ii)], [40, Theorem 3.46(ii)]) Let $f_1, f_2 : \mathbb{R}^n \to \mathbb{R}$ be locally Lipschitz at $\bar{x}$ and $J(\bar{x}) := \{ j \mid f_j(\bar{x}) = \max\{f_1, f_2\}(\bar{x}) \}$. Then

$$\partial \max\{f_1, f_2\}(\bar{x}) \subseteq \operatorname{conv}\{ \partial f_j(\bar{x}) \mid j \in J(\bar{x}) \},$$

where the equality holds and $\max\{f_1, f_2\}$ is subdifferentially regular at $\bar{x}$ if the function $f_j$ is subdifferentially regular at $\bar{x}$ for $j \in J(\bar{x})$.

Proposition 3.4 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable.

(i) If $k > 0$ then $\mathcal{G}_{kf} = \mathcal{G}_f$.

(ii) Let $\alpha \in \mathbb{R}$ and $s$ be a selection of $\partial f$. Define

$$G_{f,\alpha} : \mathbb{R}^n \to \mathbb{R}^n : x \mapsto \begin{cases} x - \dfrac{f(x) - \alpha}{\|s(x)\|^2}\, s(x) & \text{if } f(x) > \alpha \text{ and } 0 \notin \partial f(x), \\ x & \text{otherwise.} \end{cases}$$

Then $G_{f,\alpha} = G_{f-\alpha,s}$.

(iii) Let $\alpha > 0$ and $s$ be a selection of $\partial f$. Then

$$G_{f-\alpha,s}(x) = \begin{cases} G_{f,s}(x) + \dfrac{\alpha}{\|s(x)\|^2}\, s(x) & \text{if } f(x) > \alpha \text{ and } 0 \notin \partial f(x), \\ x & \text{otherwise.} \end{cases}$$

Proof. (i). By Fact 3.1, $\partial(kf) = k\,\partial f$. Note that $kf(x) > 0$ if and only if $f(x) > 0$, and $0 \in \partial(kf)(x)$ if and only if $0 \in \partial f(x)$. When $kf(x) > 0$ and $0 \notin \partial(kf)(x)$, for $s(x) \in \partial f(x)$ we have $ks(x) \in \partial(kf)(x)$, so that

$$G_{kf,ks}(x) = x - \frac{kf(x)}{\|ks(x)\|^2}\, ks(x) = x - \frac{f(x)}{\|s(x)\|^2}\, s(x) = G_{f,s}(x).$$

When $kf(x) \leq 0$ or $0 \in \partial(kf)(x)$, we have $f(x) \leq 0$ or $0 \in \partial f(x)$, so $G_{kf,ks}(x) = x = G_{f,s}(x)$.

(ii). It suffices to note that $\partial(f - \alpha) = \partial f$.

(iii). When $f(x) > \alpha$ and $0 \notin \partial f(x)$, we have $f(x) > 0$ and $0 \notin \partial f(x)$. Then

$$G_{f-\alpha,s}(x) = x - \frac{f(x) - \alpha}{\|s(x)\|^2}\, s(x) = x - \frac{f(x)}{\|s(x)\|^2}\, s(x) + \frac{\alpha}{\|s(x)\|^2}\, s(x) = G_{f,s}(x) + \frac{\alpha}{\|s(x)\|^2}\, s(x).$$

When $f(x) \leq \alpha$ or $0 \in \partial f(x)$, $G_{f-\alpha,s}(x) = x$ by the definition.

Proposition 3.5 Assume that $f_1, f_2 : \mathbb{R}^n \to \mathbb{R}$ are locally Lipschitz and subdifferentially regular. For the maximum function $g := \max\{f_1, f_2\}$, one has

$$\mathcal{G}_g(x) = \begin{cases} \mathcal{G}_{f_1}(x) & \text{if } g(x) > \max\{f_2(x), 0\} \text{ and } 0 \notin \partial f_1(x), \\ \mathcal{G}_{f_2}(x) & \text{if } g(x) > \max\{f_1(x), 0\} \text{ and } 0 \notin \partial f_2(x), \\ V(x) & \text{if } g(x) = f_1(x) = f_2(x) > 0 \text{ and } 0 \notin \operatorname{conv}\left( \partial f_1(x) \cup \partial f_2(x) \right), \\ x & \text{if } g(x) \leq 0, \text{ or } 0 \in \operatorname{conv}\left( \partial f_1(x) \cup \partial f_2(x) \right), \end{cases}$$

where

$$V(x) := \left\{ x - \frac{f_i(x)}{\|s(x)\|^2}\, s(x) \;\middle|\; s(x) \in \operatorname{conv}\left( \partial f_1(x) \cup \partial f_2(x) \right) \right\}.$$

Proof. When $g(x) > 0$, we consider three cases: (i) $g(x) > f_2(x)$; (ii) $g(x) > f_1(x)$; (iii) $g(x) \leq \min\{f_2(x), f_1(x)\}$, which means $g(x) = f_1(x) = f_2(x)$. Also note that $\partial g(x) = \operatorname{conv}\left( \partial f_1(x) \cup \partial f_2(x) \right)$ when $f_1(x) = f_2(x)$, by Fact 3.3.

Proposition 3.6 Assume that $f : \mathbb{R}^n \to \mathbb{R}$ is lsc and subdifferentiable, and that $g(x) := f(kx)$ with $0 \neq k \in \mathbb{R}$. Then

$$\mathcal{G}_g(x) = \frac{1}{k}\, \mathcal{G}_f(kx)$$

for every $x \in \mathbb{R}^n$. Moreover, $\operatorname{Fix} \mathcal{G}_g = \frac{1}{k} \operatorname{Fix} \mathcal{G}_f$.

Proof. By Fact 3.2, $\partial g(x) = k\,\partial f(y)$ where $y = kx$, so $0 \in \partial g(x)$ if and only if $0 \in \partial f(y)$ with $y = kx$. Let $s$ be a selection of $\partial f$. When $g(x) > 0$ and $0 \notin \partial g(x)$, we have $f(kx) > 0$ and $0 \notin \partial f(kx)$; therefore

$$(5)\text{-}(6) \qquad G_{g,ks(k\cdot)}(x) = x - \frac{f(kx)}{\|ks(y)\|^2}\, ks(y) = x - \frac{1}{k}\,\frac{f(kx)}{\|s(y)\|^2}\, s(y) = \frac{1}{k}\left( kx - \frac{f(kx)}{\|s(y)\|^2}\, s(y) \right) = \frac{1}{k}\, G_{f,s}(kx),$$

where $s(y) \in \partial f(y)$ with $y = kx$.

When $g(x) \leq 0$ or $0 \in \partial g(x)$, we have $f(y) \leq 0$ or $0 \in \partial f(y)$ with $y = kx$; thus $G_{g,ks(k\cdot)}(x) = x = \frac{1}{k}\, kx = \frac{1}{k}\, G_{f,s}(kx)$. This establishes the result.
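The scaling rules above are easy to test numerically. The sketch below (ours, for a smooth $f$ with its gradient as the selection) verifies Proposition 3.4(i) and Proposition 3.6 at a sample point.

```python
import numpy as np

def G(f, s, x):
    fx, sx = f(x), s(x)
    return x if fx <= 0 or not np.any(sx) else x - (fx / np.dot(sx, sx)) * sx

f = lambda x: np.dot(x, x) - 1.0
s = lambda x: 2.0 * x
x = np.array([1.5, 0.5])
k = -2.5

# Proposition 3.4(i): G_{kf} = G_f (here with the positive scalar 5).
print(np.allclose(G(lambda y: 5 * f(y), lambda y: 5 * s(y), x), G(f, s, x)))

# Proposition 3.6: for g(x) = f(kx), G_g(x) = G_f(kx)/k.
g, sg = lambda y: f(k * y), lambda y: k * s(k * y)
print(np.allclose(G(g, sg, x), G(f, s, k * x) / k))    # True
```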

Proposition 3.7 Let $A : \mathbb{R}^n \to \mathbb{R}^n$ be unitary and $b \in \mathbb{R}^n$, and let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable. Define $g : \mathbb{R}^n \to \mathbb{R} : x \mapsto f(Ax + b)$. Then

$$(7) \qquad \mathcal{G}_g(x) = A^{\top}\left( \mathcal{G}_f(Ax + b) - b \right)$$

for every $x \in \mathbb{R}^n$. Furthermore,

$$(8) \qquad \operatorname{Fix} \mathcal{G}_g = A^{\top}\left( \operatorname{Fix} \mathcal{G}_f - b \right).$$

Proof. Let $s$ be a selection of $\partial f$. By Fact 3.2, $\partial g(x) = A^{\top}\partial f(y)$ where $y = Ax + b$. As $A$ is unitary, $\|A^{\top}s(y)\| = \|s(y)\|$ for every $s(y) \in \partial f(y)$. When $g(x) > 0$ and $0 \notin \partial g(x)$, we have $f(Ax+b) > 0$ and $0 \notin \partial f(y)$ with $y = Ax + b$; therefore

$$(9)\text{-}(10) \qquad G_{g,A^{\top}s(A\cdot+b)}(x) = x - \frac{f(Ax+b)}{\|A^{\top}s(y)\|^2}\, A^{\top}s(y) = A^{\top}\left( Ax + b - \frac{f(Ax+b)}{\|s(y)\|^2}\, s(y) - b \right) = A^{\top}\left( G_{f,s}(Ax+b) - b \right).$$

When $g(x) \leq 0$ or $0 \in \partial g(x)$, we have $f(y) \leq 0$ or $0 \in \partial f(y)$ with $y = Ax + b$; thus $G_{g,A^{\top}s(A\cdot+b)}(x) = x = A^{\top}(Ax + b - b) = A^{\top}(G_{f,s}(Ax+b) - b)$. Hence (7) holds. Finally, (8) follows from (7).

Corollary 3.8 Let $a \in \mathbb{R}^n$, let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $g(x) := f(x - a)$. Then $\mathcal{G}_g(x) = \mathcal{G}_f(x - a) + a$ for every $x \in \mathbb{R}^n$. Moreover, $\operatorname{Fix} \mathcal{G}_g = a + \operatorname{Fix} \mathcal{G}_f$.

Theorem 3.9 Assume that $f : \mathbb{R}^n \to \mathbb{R}_+$ is locally Lipschitz, and $g := f^k$ with $k > 0$. Then

$$\mathcal{G}_g = \left( 1 - \frac{1}{k} \right)\operatorname{Id} + \frac{1}{k}\, \mathcal{G}_f.$$

Proof. By Fact 3.1, $\partial g(x) = k f(x)^{k-1}\,\partial f(x)$ when $f(x) > 0$. Let $s$ be a selection of $\partial f$. When $g(x) > 0$ and $0 \notin \partial g(x)$, we have $f(x) > 0$ and $0 \notin \partial f(x)$; therefore

$$(11)\text{-}(12) \qquad (\operatorname{Id} - G_{g,kf^{k-1}s})(x) = \frac{f(x)^k}{\|kf(x)^{k-1}s(x)\|^2}\, kf(x)^{k-1}s(x) = \frac{1}{k}\,\frac{f(x)}{\|s(x)\|^2}\, s(x) = \frac{1}{k}\,(\operatorname{Id} - G_{f,s})(x).$$

When $g(x) = 0$ or $0 \in \partial g(x)$, we have $f(x) = 0$ or $0 \in \partial f(x)$; thus $(\operatorname{Id} - G_{g,kf^{k-1}s})(x) = 0 = (\operatorname{Id} - G_{f,s})(x)$. Therefore, $\operatorname{Id} - G_{g,kf^{k-1}s} = \frac{1}{k}(\operatorname{Id} - G_{f,s})$, which gives

$$G_{g,kf^{k-1}s} = \left( 1 - \frac{1}{k} \right)\operatorname{Id} + \frac{1}{k}\, G_{f,s}.$$

Remark 3.10 While Theorem 3.9 says that this convex combination of $\operatorname{Id}$ and $G_{f,s}$ is a subgradient projector, the set of subgradient projectors is not a convex set; see Theorems 10.1 and 10.3 in Section 10. Note that if each $U_i : \mathcal{H} \to \mathcal{H}$ is a cutter (see [20] or Definition 5.1) for $i \in I := \{1, 2, \ldots, m\}$, with a common fixed point, and $w$ is an appropriate weight function, then the operator $U := \sum_{i \in I} w_i U_i$ is a cutter; cf. [20].
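A quick numerical sanity check of Theorem 3.9 (ours, with $f = \|\cdot\|$ and its gradient as the selection away from the origin):

```python
import numpy as np

def G(f, s, x):
    fx, sx = f(x), s(x)
    return x if fx <= 0 or not np.any(sx) else x - (fx / np.dot(sx, sx)) * sx

f = lambda x: np.linalg.norm(x)            # f >= 0
s = lambda x: x / np.linalg.norm(x)        # gradient for x != 0
k = 4.0
g  = lambda x: f(x) ** k
sg = lambda x: k * f(x) ** (k - 1) * s(x)  # chain-rule selection for g = f^k

x = np.array([3.0, -1.0])
lhs = G(g, sg, x)
rhs = (1 - 1 / k) * x + (1 / k) * G(f, s, x)
print(np.allclose(lhs, rhs))               # Theorem 3.9: True
```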

Corollary 3.11 For $f := d_C^2$ in the case of a closed set $C \neq \varnothing$ in $\mathbb{R}^n$, one has

$$\mathcal{G}_f = \frac{\operatorname{Id} + P_C}{2}.$$

Proof. Combine Theorem 3.9 and Example 2.7.

Example 3.12 (penalty function) Assume that $g : \mathbb{R}^n \to \mathbb{R}$ is locally Lipschitz. In optimization, for a direct constraint given by $C := \{ x \in \mathbb{R}^n \mid g(x) \leq 0 \}$, one can define penalty substitutes. Two popular penalty functions associated with $g$ are the linear penalty $\theta_1(t) := t_+$ and the quadratic penalty $\theta_2(t) := t_+^2$, where $t_+ := \max\{0, t\}$; cf. [51, page 4]. We have

$$(\forall x \in \mathbb{R}^n) \qquad \mathcal{G}_{\theta_1 \circ g}(x) = \mathcal{G}_g(x).$$

Because $\theta_2 \circ g = (\theta_1 \circ g)^2$, by Theorem 3.9 we obtain

$$\mathcal{G}_{\theta_2 \circ g} = \frac{\operatorname{Id} + \mathcal{G}_{\theta_1 \circ g}}{2}.$$

The following is immediate from the definition of subgradient projectors.

Proposition 3.13 Let $f, g : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable such that $f \equiv g$ on an open set $O \subseteq \mathbb{R}^n$. Then $\mathcal{G}_f = \mathcal{G}_g$ on $O$.

Remark 3.14 For calculus of subgradient projectors of convex functions, see [12, 44].

4 Basic properties of subgradient projectors

In this section, under appropriate conditions, we show that the subgradient projector determines a function uniquely up to multiplication by a positive scalar, that the subgradient projector enjoys the fixed point closedness property, and that the subgradient projector is continuous if the function is strictly differentiable. We start with some elementary properties of subgradient projectors.

Theorem 4.1 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $G_{f,s}$ be given by (1). Then the following hold:

(i) We have

$$(13) \qquad \|x - G_{f,s}(x)\| = \frac{f(x)}{\|s(x)\|}$$

and

$$(14) \qquad \frac{x - G_{f,s}(x)}{\|x - G_{f,s}(x)\|^2} = \frac{s(x)}{f(x)}$$

for every $x$ satisfying $f(x) > 0$ and $0 \notin \partial f(x)$. In particular, when $f$ is locally Lipschitz, one has

$$(15) \qquad \frac{x - G_{f,s}(x)}{\|x - G_{f,s}(x)\|^2} = \frac{s(x)}{f(x)} \in \partial(\ln f)(x);$$

when $f$ is continuously differentiable, one has

$$(16) \qquad \frac{x - G_f(x)}{\|x - G_f(x)\|^2} = \nabla(\ln f)(x).$$

(ii) Set $g := \ln f$ when $f > 0$. Then whenever $f(x) > 0$ and $0 \notin \partial f(x)$ we have

$$(17) \qquad G_{f,s}(x) = x - \frac{c(x)}{\|c(x)\|^2}, \quad \text{where } c(x) = \frac{s(x)}{f(x)} \in \partial g(x).$$

If $f$ is continuously differentiable on $\mathbb{R}^n \setminus \operatorname{lev}_{\leq 0} f$, then

$$(18) \qquad G_f(x) = x - \frac{\nabla g(x)}{\|\nabla g(x)\|^2}$$

whenever $f(x) > 0$ and $\nabla f(x) \neq 0$.

Proof. (i). By the definition of $G_{f,s}$, when $f(x) > 0$ and $0 \notin \partial f(x)$,

$$x - G_{f,s}(x) = \frac{f(x)}{\|s(x)\|^2}\, s(x).$$

Therefore, $\|x - G_{f,s}(x)\| = \frac{f(x)}{\|s(x)\|}$. It follows that

$$(19) \qquad x - G_{f,s}(x) = \frac{f(x)^2}{\|s(x)\|^2}\,\frac{s(x)}{f(x)} = \|x - G_{f,s}(x)\|^2\,\frac{s(x)}{f(x)},$$

equivalently,

$$\frac{x - G_{f,s}(x)}{\|x - G_{f,s}(x)\|^2} = \frac{s(x)}{f(x)}.$$

When $f$ is locally Lipschitz, (15) holds because Fact 3.1 gives $\partial(\ln f)(x) = \frac{\partial f(x)}{f(x)}$ when $f(x) > 0$. When $f$ is continuously differentiable, $s(x) = \nabla f(x)$, hence (16) follows from $\nabla(\ln f)(x) = \nabla f(x)/f(x)$ when $f(x) > 0$.

(ii). By Fact 3.1, we have $\partial g(x) = \frac{\partial f(x)}{f(x)}$ when $f(x) > 0$. Then $c(x) = \frac{s(x)}{f(x)} \in \partial g(x)$ where $s(x) \in \partial f(x)$, and (17) follows since $\frac{1}{\|c(x)\|^2} = \frac{f(x)^2}{\|s(x)\|^2}$ when $f(x) > 0$ and $0 \notin \partial f(x)$. When $f$ is continuously differentiable on $\mathbb{R}^n \setminus \operatorname{lev}_{\leq 0} f$, the same holds for $g$, so (18) follows.

4.1 When is a mapping T a subgradient projector?

Theorem 4.2 Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a mapping. The following are equivalent:

(i) $T$ is the subgradient projector of a locally Lipschitz function.

(ii) There exists a locally Lipschitz function $f : \mathbb{R}^n \to \mathbb{R}$ such that

$$(20) \qquad \begin{cases} \dfrac{x - T(x)}{\|x - T(x)\|^2} \in \partial(\ln f)(x) & \text{whenever } f(x) > 0 \text{ and } 0 \notin \partial f(x), \\ Tx = x & \text{whenever } f(x) \leq 0 \text{ or } 0 \in \partial f(x). \end{cases}$$

Proof. (i) $\Rightarrow$ (ii). Suppose that $T = G_{f,s}$ with $f$ locally Lipschitz. Apply Theorem 4.1(i) to obtain (20).

(ii) $\Rightarrow$ (i). Assume that (20) holds. By Fact 3.1, $\partial(\ln f) = \frac{\partial f}{f}$ when $f > 0$. When $f(x) > 0$ and $0 \notin \partial f(x)$, (20) gives

$$\frac{1}{\|x - Tx\|} = \frac{\|s(x)\|}{f(x)}, \quad \text{i.e.,} \quad \|x - Tx\| = \frac{f(x)}{\|s(x)\|},$$

where $s(x) \in \partial f(x)$. Then using (20) again we obtain $x - Tx = \|x - Tx\|^2\,\frac{s(x)}{f(x)}$, so that

$$Tx = x - \left( \frac{f(x)}{\|s(x)\|} \right)^2 \frac{s(x)}{f(x)} = x - \frac{f(x)}{\|s(x)\|^2}\, s(x),$$

as required.

Can the functions in Theorem 4.2(i) and (ii) be different? This is answered in the next subsection.

4.2 Recovering f from its subgradient projector G_f

Can one determine the function $f$ if $G_f$ is known? To this end, we recall the concept of essentially strictly differentiable functions introduced by Borwein and Moors [15, Section 4].

Definition 4.3 A locally Lipschitz function $f : \mathbb{R}^n \to \mathbb{R}$ is called essentially strictly differentiable on an open set $O \subseteq \mathbb{R}^n$ if $f$ is strictly differentiable everywhere on $O$ except possibly on a Lebesgue null set.

This class of functions has been extensively studied by Borwein and Moors [15]. It includes finite-valued convex functions, Clarke regular locally Lipschitz functions, semismooth locally Lipschitz functions, $C^1$ functions and others; see [15]. If a locally Lipschitz function $f$ is essentially strictly differentiable, then $\partial f$ is single-valued almost everywhere. Moreover, the Clarke subdifferential $\partial_c f$, which can be written as $\operatorname{conv} \partial f$ (the convex hull of $\partial f$) when $f$ is locally Lipschitz [40, Theorem 3.57], can be recovered from every densely defined selection $s$ of $\partial f$; see, e.g., [15]. We refer the reader to [23] and [51] for details on the Clarke subdifferential.

Fact 4.4 Let $f, g$ be locally Lipschitz on a polygonally connected and open subset $O$ of $\mathbb{R}^n$. If $\nabla f = \nabla g$ almost everywhere on $O$, then $h := f - g$ is constant on $O$.

Proof. We prove this by contradiction. Rademacher's Theorem says that a locally Lipschitz function is differentiable almost everywhere; see, e.g., [28, page 81]. By the assumption, $h$ is locally Lipschitz, so $\nabla h = 0$ almost everywhere. Suppose that $x, y \in O$ and $h(x) \neq h(y)$. As $O$ is polygonally connected, there exists $z \in O$ such that either $[x, z] \subseteq O$ with $h(x) \neq h(z)$ or $[z, y] \subseteq O$ with $h(z) \neq h(y)$. Without loss of generality, assume $[z, y] \subseteq O$ and $h(z) \neq h(y)$. As $h$ is differentiable almost everywhere, by Fubini's Theorem [49, Theorem 6.2.2, page 110], we can choose $\tilde{z}$ near $z$ and $\tilde{y}$ near $y$ so that $h$ is differentiable and $\nabla h = 0$ almost everywhere on $[\tilde{z}, \tilde{y}] \subseteq O$, and $h(\tilde{z}) \neq h(\tilde{y})$. Then

$$h(\tilde{y}) - h(\tilde{z}) = \int_0^1 \langle \nabla h(\tilde{z} + t(\tilde{y} - \tilde{z})), \tilde{y} - \tilde{z} \rangle \, dt = \int_0^1 0 \, dt = 0,$$

which contradicts $h(\tilde{z}) \neq h(\tilde{y})$.

Theorem 4.5 Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a subgradient projector. Suppose that there exist two essentially strictly differentiable functions $f, f_1 : \mathbb{R}^n \to \mathbb{R}$ such that $G_{f,s} = T = G_{f_1,s_1}$, with $s$ a selection of $\partial f$ and $s_1$ a selection of $\partial f_1$. Then on each polygonally connected component of $\mathbb{R}^n \setminus \operatorname{Fix} T$ there exists $k > 0$ such that $f = k f_1$.

Proof. Assume that there exist two essentially strictly differentiable and locally Lipschitz functions $f, f_1$ such that $T = G_{f,s} = G_{f_1,s_1}$. Since $T$ has full domain, we have $\operatorname{dom} f = \operatorname{dom} f_1 = \mathbb{R}^n$. By Theorem 4.2, we have

$$\frac{x - T(x)}{\|x - T(x)\|^2} \in \partial(\ln f)(x) \quad \text{and} \quad \frac{x - T(x)}{\|x - T(x)\|^2} \in \partial(\ln f_1)(x) \quad \text{whenever } x \in \mathbb{R}^n \setminus \operatorname{Fix} T.$$

As $f, f_1$ are locally Lipschitz, both $\ln f$ and $\ln f_1$ are locally Lipschitz on $\mathbb{R}^n \setminus \operatorname{Fix} T$. Then $\partial \ln f = \frac{1}{f}\partial f$ and $\partial \ln f_1 = \frac{1}{f_1}\partial f_1$ by Fact 3.1 or [23, Theorem 2.3.9(ii)]. Because $f, f_1$ are essentially strictly differentiable and locally Lipschitz, $\partial f$ and $\partial f_1$ are single-valued almost everywhere [15]; thus

$$\nabla(\ln f_1)(x) = \frac{x - T(x)}{\|x - T(x)\|^2} = \nabla(\ln f)(x) \quad \text{almost everywhere on } \mathbb{R}^n \setminus \operatorname{Fix} T.$$

By Fact 4.4, on each polygonally connected component of $\mathbb{R}^n \setminus \operatorname{Fix} T$, there exists $c \in \mathbb{R}$ such that $\ln f - \ln f_1 = c$, which implies that $f = k f_1$ for $k = e^c > 0$.

Example 4.6 Define

$$f_1 : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} 2x & \text{if } x > 0, \\ 0 & \text{if } -1 \leq x \leq 0, \\ -3(x+1) & \text{if } x < -1, \end{cases} \qquad f_2 : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} x & \text{if } x > 0, \\ 0 & \text{if } -1 \leq x \leq 0, \\ -(x+1) & \text{if } x < -1. \end{cases}$$

Then

$$(\forall x \in \mathbb{R}) \qquad G_{f_1}(x) = G_{f_2}(x) = \begin{cases} 0 & \text{if } x > 0, \\ x & \text{if } -1 \leq x \leq 0, \\ -1 & \text{if } x < -1. \end{cases}$$

The set $\mathbb{R} \setminus [-1, 0]$ has two connected components, $(-\infty, -1)$ and $(0, +\infty)$. We have $f_1 = 3 f_2$ on $(-\infty, -1)$, and $f_1 = 2 f_2$ on $(0, +\infty)$.

The following example shows that Theorem 4.5 fails if one removes the assumption of essential strict differentiability.

Example 4.7 In [16], Borwein, Moors and Wang showed that generically nonexpansive Lipschitz functions have their limiting subdifferentials identically equal to the unit ball; see also [53]. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a locally Lipschitz function such that $\partial f(x) = B$ for every $x \in \mathbb{R}^n$. Since $0 \in \partial f(x)$ for every $x \in \mathbb{R}^n$, in view of Definition 2.2 we have $G_f = \operatorname{Id}$. As such, generically nonexpansive Lipschitz functions have a subgradient projector equal to the identity mapping.
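The non-uniqueness pattern in Theorem 4.5 and Example 4.6 can be observed directly; the sketch below (ours) evaluates both subgradient projectors of Example 4.6 at sample points in each region.

```python
import numpy as np

# Example 4.6 check: f1 and f2 share the same subgradient projector even though
# f1 = 3*f2 on (-inf, -1) and f1 = 2*f2 on (0, +inf).
def G(f, s, x):
    fx, sx = f(x), s(x)
    return x if fx <= 0 or sx == 0 else x - fx / sx   # scalar case of (1)

f1 = lambda x: 2 * x if x > 0 else (0.0 if x >= -1 else -3 * (x + 1))
s1 = lambda x: 2.0 if x > 0 else (0.0 if x >= -1 else -3.0)
f2 = lambda x: x if x > 0 else (0.0 if x >= -1 else -(x + 1))
s2 = lambda x: 1.0 if x > 0 else (0.0 if x >= -1 else -1.0)

for x in [-2.0, -0.5, 0.5, 3.0]:
    print(G(f1, s1, x), G(f2, s2, x))   # pairs agree: -1, -0.5, 0, 0
```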

4.3 Fixed point closed property and continuity

Definition 4.8 We say that an operator $T : D \to \mathbb{R}^n$ is fixed point closed at $x \in D$ if for every sequence $x_k \to x$ with $x_k - Tx_k \to 0$ one has $x = Tx$. If this holds for every $x \in D$, we say that $T$ has the fixed point closed property on $D$.

In [20], Cegielski refers to the fixed point closed property of $T$ as $\operatorname{Id} - T$ being closed at $0$.

Theorem 4.9 (fixed point closed property) Let $f : \mathbb{R}^n \to \mathbb{R}$ be locally Lipschitz and $G_{f,s}$ be given by Definition 2.2. Then $G_{f,s}$ is fixed point closed at every $x \in \mathbb{R}^n$, i.e.,

$$(21) \qquad y - G_{f,s}(y) \to 0 \text{ and } y \to x \quad \Rightarrow \quad x = G_{f,s}(x).$$

Proof. Assume that a sequence $(y_n)_{n \in \mathbb{N}}$ in $\mathbb{R}^n$ satisfies

$$(22) \qquad y_n - G_{f,s}(y_n) \to 0 \text{ and } y_n \to x.$$

Consider three cases.

Case 1. There exist infinitely many $y_n$'s, say $(y_{n_k})_{k \in \mathbb{N}}$, such that $0 \in \partial f(y_{n_k})$. Since $\partial f$ is upper semicontinuous, taking the limit as $k \to \infty$ gives $0 \in \partial f(x)$. Hence $x = G_f(x)$.

Case 2. There exist infinitely many $y_n$'s, say $(y_{n_k})_{k \in \mathbb{N}}$, such that $f(y_{n_k}) \leq 0$. Taking the limit as $k \to \infty$ and using the continuity of $f$ at $x$ gives

$$f(x) = \lim_{k \to \infty} f(y_{n_k}) \leq 0.$$

Hence $x = G_f(x)$.

Case 3. There exists $N \in \mathbb{N}$ such that $f(y_n) > 0$ and $0 \notin \partial f(y_n)$ when $n > N$. Then by (13),

$$(23) \qquad f(y_n) = \|y_n - G_{f,s}(y_n)\|\,\|s(y_n)\|.$$

As $f$ is continuous at $x$, $f$ is locally Lipschitz around $x$, so $\partial f$ is locally bounded around $x$. Therefore,

$$f(x) = \lim_{n \to \infty} f(y_n) = \lim_{n \to \infty} \left( \|y_n - G_{f,s}(y_n)\|\,\|s(y_n)\| \right) = 0,$$

since $y_n - G_{f,s}(y_n) \to 0$. Hence $x = G_{f,s}(x)$.

Altogether, $x \in \operatorname{Fix} G_{f,s}$. This establishes (21) because $(y_n)_{n \in \mathbb{N}}$ was an arbitrary sequence satisfying (22).

The following result generalizes [12, Theorem 5.6].

Theorem 4.10 Let $f : \mathbb{R}^n \to \mathbb{R}$ be locally Lipschitz and essentially strictly differentiable, and let $G_{f,s}$ be given by Definition 2.2. Suppose that $x \in \mathbb{R}^n \setminus \operatorname{Fix} G_{f,s}$. Then the following statements are equivalent:

(i) $G_{f,s}$ is continuous at $x$.

(ii) $f$ is strictly differentiable at $x$.

Consequently, $G_{f,s}$ is continuous on $\mathbb{R}^n \setminus \operatorname{Fix} G_{f,s}$ if and only if $f$ is continuously differentiable on $\mathbb{R}^n \setminus \operatorname{Fix} G_{f,s}$.

Proof. (ii) $\Rightarrow$ (i). Assume that $f$ is strictly differentiable at $x \in \mathbb{R}^n \setminus \operatorname{Fix} G_{f,s}$. Under this assumption, $s : \mathbb{R}^n \to \mathbb{R}^n$ is continuous at $x$ and $s(x) \neq 0$. The result follows from the definition

$$G_{f,s} : y \mapsto y - \frac{f(y)}{\|s(y)\|^2}\, s(y).$$

(i) $\Rightarrow$ (ii). Assume that $G_{f,s}$ is continuous at $x \in \mathbb{R}^n \setminus \operatorname{Fix} G_{f,s}$. By (14),

$$s(y) = f(y)\,\frac{y - G_{f,s}(y)}{\|y - G_{f,s}(y)\|^2},$$

so $s$ is continuous at $x$. Because $s$ is a selection of $\partial f$ and $f$ is essentially strictly differentiable, we conclude that $f$ is strictly differentiable at $x$.

Note that $\operatorname{Fix} G_{f,s}$ is closed by Proposition 2.3(ii). The remaining result follows from the fact that, on an open set on which a function is finite, the function is continuously differentiable if and only if it is strictly differentiable; cf. [51, Corollary 9.19].

We illustrate Theorem 4.10 by two examples.

Example 4.11 Define

$$f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} |x| & \text{if } x \leq 1, \\ 2x - 1 & \text{if } x > 1. \end{cases}$$

Then

$$G_{f,s}(x) = \begin{cases} 0 & \text{if } x < 1, \\ 1 - \dfrac{1}{s(1)}, \text{ where } s(1) \in [1,2], & \text{if } x = 1, \\ 1/2 & \text{if } x > 1, \end{cases}$$

is discontinuous at $x = 1$, because $f$ is not differentiable at $x = 1$.

Proof. When $x < 0$, $G_{f,s}(x) = x - \frac{-x}{(-1)^2}(-1) = 0$. When $x = 0$, $f(0) = 0$, so $G_{f,s}(0) = 0$. When $0 < x < 1$, $G_{f,s}(x) = x - \frac{x}{1^2}(1) = 0$. When $x > 1$, $G_{f,s}(x) = x - \frac{2x-1}{2} = 1/2$. When $x = 1$, $\partial f(1) = [1,2]$, so

$$G_{f,s}(1) = 1 - \frac{1}{s(1)^2}\, s(1) = 1 - \frac{1}{s(1)},$$

where $s(1) \in [1,2]$.

The next example gives a function that is differentiable but not strictly differentiable at $0$, and whose subgradient projector is not continuous at $0$.

Example 4.12 Define

$$f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} x^2\sin\frac{1}{x} + x + 1 & \text{if } x \neq 0, \\ 1 & \text{if } x = 0. \end{cases}$$

Then $f$ is differentiable everywhere, but not strictly differentiable at $0$. The subgradient projector

$$G_f(x) = \begin{cases} x - \dfrac{x^2\sin(1/x) + x + 1}{2x\sin(1/x) - \cos(1/x) + 1} & \text{if } f(x) > 0 \text{ and } f'(x) \neq 0, \\ x & \text{otherwise,} \end{cases}$$

is not continuous at $0$.

Proof. At $x = 0$, $f(0) = 1$ and $f'(0) = 1$. The function $f$ is not strictly differentiable at $0$ because $f'$ is not continuous at $0$. Since $\lim_{x \to 0} G_f(x)$ does not exist, the subgradient projector is not continuous at $0$.

How about the continuity of $G_{f,s}$ on $\operatorname{Fix} G_{f,s}$? Since $G_{f,s} = \operatorname{Id}$ on $\operatorname{Fix} G_{f,s}$, it is always continuous at $x \in \operatorname{int}(\operatorname{Fix} G_{f,s})$. The following result deals with the case $x \in \operatorname{bdry}(\operatorname{Fix} G_{f,s})$.

Theorem 4.13 Let $f : \mathbb{R}^n \to \mathbb{R}$ be locally Lipschitz, let $G_{f,s}$ be given by Definition 2.2, and let $x \in \operatorname{bdry}(\operatorname{Fix} G_{f,s})$.

(i) Assume that $f(x) > 0$ and $0 \in \partial f(x)$. Then $G_{f,s}$ is discontinuous at $x$.

(ii) Assume that $f(x) \leq 0$. Suppose that one of the following holds:

(a) There exists $\alpha > 0$ such that

$$(24) \qquad (\forall y : f(y) > 0,\ 0 \notin \partial f(y)) \qquad \alpha f(y) + \langle s(y), x - y \rangle \leq 0;$$

in particular, this is true when $f$ is convex.

(b)

$$(25) \qquad 0 \notin \partial f(x).$$

(c)

$$(26) \qquad \liminf_{\substack{y \to x \\ f(y)>0,\ 0 \notin \partial f(y)}} \|s(y)\| > 0.$$

Then $G_{f,s}$ is continuous at $x$.

Proof. (i). As $x \in \operatorname{bdry}(\operatorname{Fix} G_{f,s})$, there exists a sequence $(y_k)_{k \in \mathbb{N}}$ such that $y_k \to x$, $f(y_k) > 0$ and $0 \notin \partial f(y_k)$. Because $f$ is locally Lipschitz and $s(y_k) \in \partial f(y_k)$, the sequence $(s(y_k))_{k \in \mathbb{N}}$ is bounded. By taking a subsequence if necessary, we can assume that $\|s(y_k)\| \to l \in \mathbb{R}_+$. Taking the limit as $k \to \infty$ yields

$$\|y_k - G_{f,s}(y_k)\| = \frac{f(y_k)}{\|s(y_k)\|} \to \frac{f(x)}{l},$$

which is $+\infty$ if $l = 0$, or a positive number if $l > 0$. Because $G_{f,s}(x) = x$, this shows that $G_{f,s}$ is not continuous at $x$.

(ii). To show that $G_{f,s}$ is continuous at $x$, it suffices to show that

$$(27) \qquad \lim_{\substack{y \to x \\ f(y)>0,\ 0 \notin \partial f(y)}} \frac{f(y)}{\|s(y)\|} = 0.$$

Indeed, by Theorem 4.1, when $f(y) > 0$ and $0 \notin \partial f(y)$, we have $\|y - G_{f,s}(y)\| = \frac{f(y)}{\|s(y)\|}$. Then (27) gives $\lim_{y \to x} G_{f,s}(y) = \lim_{y \to x}\left( G_{f,s}(y) - y \right) + \lim_{y \to x} y = x$. When $y \in \operatorname{Fix} G_{f,s}$, $G_{f,s}(y) = y$, so clearly $\lim_{y \to x} G_{f,s}(y) = x$. Hence $G_{f,s}$ is continuous at $x$.

Now (24) gives

$$\frac{f(y)}{\|s(y)\|} \leq \frac{\langle s(y), y - x \rangle}{\alpha\,\|s(y)\|} \leq \frac{\|y - x\|\,\|s(y)\|}{\alpha\,\|s(y)\|} = \frac{\|y - x\|}{\alpha},$$

which implies (27).

Next, we show that (25) implies (26). Note that (25) gives $d_{\partial f(x)}(0) > 0$, since $\partial f(x)$ is closed by [51, Theorem 8.6]. Because $f$ is locally Lipschitz, in view of [51, Proposition 8.7], we have $\limsup_{y \to x} \partial f(y) \subseteq \partial f(x)$; hence $0 \notin \partial f(y)$ for $y$ sufficiently near $x$. Invoking [51, Corollary 4.7(b)], we obtain $\liminf_{y \to x} d_{\partial f(y)}(0) \geq d_{\partial f(x)}(0)$, from which it follows that

$$(28)\text{-}(29) \qquad \liminf_{\substack{y \to x \\ f(y)>0,\ 0 \notin \partial f(y)}} \|s(y)\| \geq \liminf_{\substack{y \to x \\ f(y)>0,\ 0 \notin \partial f(y)}} d_{\partial f(y)}(0) \geq \liminf_{y \to x} d_{\partial f(y)}(0) \geq d_{\partial f(x)}(0) > 0,$$

and this gives (26).

Finally, (26) gives (27) because $\lim_{y \to x,\ f(y)>0} f(y) = 0$ and

$$0 \leq \liminf_{\substack{y \to x \\ f(y)>0,\ 0 \notin \partial f(y)}} \frac{f(y)}{\|s(y)\|} \leq \limsup_{\substack{y \to x \\ f(y)>0,\ 0 \notin \partial f(y)}} \frac{f(y)}{\|s(y)\|} \leq \frac{\displaystyle\lim_{y \to x,\ f(y)>0} f(y)}{\displaystyle\liminf_{\substack{y \to x \\ f(y)>0,\ 0 \notin \partial f(y)}} \|s(y)\|} = 0.$$

Here is an example illustrating Theorem 4.13(i).

Example 4.14 (1). Define $f : \mathbb{R} \to \mathbb{R} : x \mapsto x^3 + 1$. Then

$$G_f(x) = \begin{cases} \dfrac{2x}{3} - \dfrac{1}{3x^2} & \text{if } x \neq 0 \text{ and } x > -1, \\ x & \text{if } x = 0 \text{ or } x \leq -1, \end{cases}$$

has $\operatorname{Fix} G_f = (-\infty, -1] \cup \{0\}$, and $G_f$ is not continuous at $x = 0$.

(2). Define

$$f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} x + 1 & \text{if } x \leq 0, \\ 1 & \text{if } x \geq 0. \end{cases}$$

Then

$$G_f(x) = \begin{cases} -1 & \text{if } -1 < x < 0, \\ x & \text{if } x \leq -1 \text{ or } x \geq 0, \end{cases}$$

has $\operatorname{Fix} G_f = (-\infty, -1] \cup [0, +\infty)$, and $G_f$ is not continuous at $x = 0$.
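A numerical look at Example 4.14(1) (ours): the formula for $G_f$ blows up as $x \downarrow 0$, confirming the discontinuity at $0$ asserted by Theorem 4.13(i).

```python
# Example 4.14(1): f(x) = x^3 + 1 gives G_f(x) = 2x/3 - 1/(3x^2) for
# -1 < x, x != 0, which has no limit as x -> 0 although G_f(0) = 0.
f, df = lambda x: x ** 3 + 1, lambda x: 3 * x ** 2
G = lambda x: x - f(x) / df(x)
for x in [0.1, 0.01, 0.001]:
    print(G(x))          # about -33.3, -3333.3, -333333.3
```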

4.4 The family of subgradient projectors

Theorem 4.15 Let $f : \mathbb{R}^n \to \mathbb{R}$ be locally Lipschitz. Then the following are equivalent:

(i) $\mathcal{G}_f$ is single-valued.

(ii) $f$ is strictly differentiable on $\mathbb{R}^n \setminus \operatorname{Fix} \mathcal{G}_f$.

Proof. (i) $\Rightarrow$ (ii). By (14) in Theorem 4.1, when $x \in \mathbb{R}^n \setminus \operatorname{Fix} \mathcal{G}_f$, we have

$$s(x) = f(x)\,\frac{x - G_{f,s}(x)}{\|x - G_{f,s}(x)\|^2},$$

where $s(x) \in \partial f(x)$. By the assumption, $\mathcal{G}_f = T$ for an everywhere single-valued $T : \mathbb{R}^n \to \mathbb{R}^n$, so

$$s(x) = f(x)\,\frac{x - Tx}{\|x - Tx\|^2}.$$

It follows that $\partial f(x)$ is a singleton, so $f$ is strictly differentiable at $x$ by [51, Theorem 9.18]. Therefore, $f$ is strictly differentiable on $\mathbb{R}^n \setminus \operatorname{Fix} \mathcal{G}_f$.

(ii) $\Rightarrow$ (i). Clear.

Theorem 4.16 Let $C \subseteq \mathbb{R}^n$ be a nonempty closed set. Then the following are equivalent:

(i) $\mathcal{G}_{d_C}$ is single-valued.

(ii) $C$ is convex.

Proof. According to Fact 2.6, we have $\operatorname{Fix} \mathcal{G}_{d_C} = C$.

(i) $\Rightarrow$ (ii). By Theorem 4.15, $d_C$ is strictly differentiable on $\mathbb{R}^n \setminus C$. Fact 2.6 shows that $P_C$ is single-valued at every $x \in \mathbb{R}^n \setminus C$. Hence, $C$ is convex; cf. [27, Theorem 12.7].

(ii) $\Rightarrow$ (i). Apply Fact 2.8.

5 When is the subgradient projector G_f a cutter or local cutter?

In this section we provide conditions for a subgradient projector to be a cutter or local cutter, and an explicit nonconvex function with a cutter subgradient projector. Along the way, some calculus for cutter subgradient projectors is also developed.

5.1 Cutters, quasi-firmly nonexpansive mappings, and local cutters

Recall the following well-known algorithmic operators.

Definition 5.1 ([20, page 53]) Let $D$ be a nonempty subset of $\mathbb{R}^n$ and $T : D \to \mathbb{R}^n$. We say that $T$ is a cutter if $\operatorname{Fix} T \neq \varnothing$ and

$$(30) \qquad (\forall x \in D)(\forall u \in \operatorname{Fix} T) \qquad \langle x - Tx, u - Tx \rangle \leq 0.$$

Definition 5.2 ([20, page 56]) Let $D$ be a nonempty subset of $\mathbb{R}^n$ and $T : D \to \mathbb{R}^n$. We say that $T$ is quasi-firmly nonexpansive (quasi-fne) if $\operatorname{Fix} T \neq \varnothing$ and

$$(\forall x \in D)(\forall u \in \operatorname{Fix} T) \qquad \|Tx - u\|^2 + \|x - Tx\|^2 \leq \|x - u\|^2.$$

In [20, page 56], quasi-fne mappings are called strongly quasinonexpansive mappings. The following fact says that a cutter is strongly Fejér monotone with respect to the set of its fixed points, and that cutters and quasi-fne mappings are the same; see [20, page 108].

Fact 5.3 ([20]) (i) A mapping $T : D \to \mathbb{R}^n$ is a cutter if and only if $T$ is quasi-fne.

(ii) Let $T : D \to \mathbb{R}^n$ be a cutter. Then $T$ is always continuous on $\operatorname{Fix} T$.

(iii) Let $T : D \to \mathbb{R}^n$ be a cutter. Then $\operatorname{Fix} T$ is closed and convex.

Definitions 5.1 and 5.2 require that $T$ satisfy the inequalities for all $x \in D$ and $u \in \operatorname{Fix} T$. In practice, the sets $D$ and $\operatorname{Fix} T$ might be too large to verify those inequalities. We now introduce local cutters and locally quasi-firmly nonexpansive mappings.

Definition 5.4 A mapping $T : D \to \mathbb{R}^n$ is a local cutter at $\bar{x} \in \operatorname{Fix} T$ if $\operatorname{Fix} T \neq \varnothing$ and there exists $\delta > 0$ such that

$$(31) \qquad (\forall x \in B(\bar{x}, \delta) \cap D)(\forall u \in B(\bar{x}, \delta) \cap \operatorname{Fix} T) \qquad \langle x - Tx, u - Tx \rangle \leq 0.$$

Definition 5.5 A mapping $T : D \to \mathbb{R}^n$ is locally quasi-firmly nonexpansive (locally quasi-fne) at $\bar{x} \in \operatorname{Fix} T$ if there exists $\delta > 0$ such that

$$(\forall x \in B(\bar{x}, \delta) \cap D)(\forall u \in B(\bar{x}, \delta) \cap \operatorname{Fix} T) \qquad \|Tx - u\|^2 + \|Tx - x\|^2 \leq \|x - u\|^2.$$

A localized version of Fact 5.3(i) comes next.

Proposition 5.6 A mapping $T : D \to \mathbb{R}^n$ is a local cutter at $\bar{x} \in \operatorname{Fix} T$ if and only if $T$ is locally quasi-fne at $\bar{x} \in \operatorname{Fix} T$.

Proof. This follows from

$$\|x - u\|^2 = \|Tx - u\|^2 + \|x - Tx\|^2 + 2\langle x - Tx, Tx - u \rangle.$$

Proposition 5.7 Assume that $T : \mathbb{R} \to \mathbb{R}$ and $\operatorname{Fix} T \neq \varnothing$. Then $T$ is a cutter on $\mathbb{R}$ if and only if

$$(32) \qquad (\forall x \in \mathbb{R}) \qquad Tx \in [x, P_{\operatorname{Fix} T}\, x].$$

Proof. The sufficiency is clear. Conversely, when $x \in \operatorname{Fix} T$, (32) clearly holds. Assume $x \notin \operatorname{Fix} T$ and $c \in \operatorname{Fix} T$. Because $T$ maps $\mathbb{R}$ to $\mathbb{R}$, there exists $\lambda \in \mathbb{R}$ such that

$$Tx = (1 - \lambda)x + \lambda c.$$

As $T$ is a cutter, we have

$$(x - Tx)(c - Tx) = -\lambda(1 - \lambda)(x - c)^2 \leq 0,$$

which gives $0 \leq \lambda \leq 1$, so that $Tx \in [x, c]$. Since $c \in \operatorname{Fix} T$ was arbitrary, it follows that $Tx \in [x, P_{\operatorname{Fix} T}\, x]$.

Remark 5.8 Compare Proposition 5.7 to Corollary 8.4, which characterizes the subgradient projector of a convex function on $\mathbb{R}$.

For a nonempty convex set $C \subseteq \mathbb{R}^n$, the recession cone of $C$ is $\operatorname{rec} C := \{ x \in \mathbb{R}^n \mid x + C \subseteq C \}$. The negative polar of $K \subseteq \mathbb{R}^n$ is $K^{\ominus} := \{ y \in \mathbb{R}^n \mid \langle y, x \rangle \leq 0 \ \forall x \in K \}$.

Proposition 5.9 Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a cutter. Then

$$\operatorname{ran}(\operatorname{Id} - T) \subseteq \left( \operatorname{rec}(\operatorname{Fix} T) \right)^{\ominus}.$$

Consequently, when $\operatorname{Fix} T$ is a linear subspace, $\operatorname{ran}(\operatorname{Id} - T) \subseteq (\operatorname{Fix} T)^{\perp}$; in other words, $\operatorname{ran}(\operatorname{Id} - T) \subseteq \left( \ker(\operatorname{Id} - T) \right)^{\perp}$.

Proof. Let $x - Tx \in \operatorname{ran}(\operatorname{Id} - T)$ and $v \in \operatorname{rec}(\operatorname{Fix} T)$. Then for every $k > 0$ and $u \in \operatorname{Fix} T$, we have $u + kv \in \operatorname{Fix} T$. The assumption that $T$ is a cutter implies

$$\langle x - Tx, u + kv - Tx \rangle \leq 0 \quad \Rightarrow \quad \langle x - Tx, u/k + v - Tx/k \rangle \leq 0.$$

Letting $k \to \infty$ gives $\langle x - Tx, v \rangle \leq 0$. Since $v \in \operatorname{rec}(\operatorname{Fix} T)$ was arbitrary, we have $x - Tx \in (\operatorname{rec}(\operatorname{Fix} T))^{\ominus}$. When $\operatorname{Fix} T$ is a linear subspace, $\operatorname{Fix} T = \operatorname{rec}(\operatorname{Fix} T)$ and $(\operatorname{rec}(\operatorname{Fix} T))^{\ominus} = (\operatorname{rec}(\operatorname{Fix} T))^{\perp}$.

5.2 Characterizations of G_f being a cutter or local cutter

Our first result characterizes the class of functions $f$ for which $G_{f,s}$ is a cutter.

Lemma 5.10 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $G_{f,s}$ be given by Definition 2.2. Suppose that $f(x) > 0$ and $0 \notin \partial f(x)$. Then

$$\langle x - G_{f,s}(x), u - G_{f,s}(x) \rangle = \frac{f(x)}{\|s(x)\|^2}\left( f(x) + \langle s(x), u - x \rangle \right).$$

Proof. Let $f(x) > 0$ and $0 \notin \partial f(x)$. The definition of $G_{f,s}$ gives

$$\langle x - G_{f,s}(x), u - G_{f,s}(x) \rangle = \left\langle \frac{f(x)\,s(x)}{\|s(x)\|^2},\, u - x + \frac{f(x)}{\|s(x)\|^2}\, s(x) \right\rangle = \frac{f(x)}{\|s(x)\|^2}\,\langle s(x), u - x \rangle + \frac{f^2(x)}{\|s(x)\|^2} = \frac{f(x)}{\|s(x)\|^2}\left( f(x) + \langle s(x), u - x \rangle \right).$$

Theorem 5.11 (level sets of tangent planes including the target set) Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, let $G_{f,s}$ be given by Definition 2.2, and let

$$S := \{ u \in \mathbb{R}^n \mid f(u) \leq 0 \text{ or } 0 \in \partial f(u) \}.$$

Then the following hold:

(i) $G_{f,s}$ is a cutter if and only if whenever $x \notin S$ and $u \in S$ one has $f(x) + \langle s(x), u - x \rangle \leq 0$.

(ii) Let $\bar{x} \in S$ and $\delta > 0$. $G_{f,s}$ is a cutter on $B(\bar{x}, \delta)$ if and only if for all $x \in B(\bar{x}, \delta) \setminus S$ and $u \in S \cap B(\bar{x}, \delta)$ one has $f(x) + \langle s(x), u - x \rangle \leq 0$.

Proof. (i). When $f(x) \leq 0$ or $0 \in \partial f(x)$, $x = G_{f,s}(x)$, so (30) holds for $T = G_{f,s}$. Assume that $f(x) > 0$, $0 \notin \partial f(x)$ and $s(x) \in \partial f(x)$. By Lemma 5.10,

$$\langle x - G_{f,s}(x), u - G_{f,s}(x) \rangle = \frac{f(x)}{\|s(x)\|^2}\left( f(x) + \langle s(x), u - x \rangle \right).$$

Since $f(x) > 0$, we deduce that

$$\langle x - G_{f,s}(x), u - G_{f,s}(x) \rangle \leq 0 \quad \Leftrightarrow \quad f(x) + \langle s(x), u - x \rangle \leq 0.$$

Hence, the result follows from Definition 5.1.

(ii). Apply the same arguments as above with $x \in B(\bar{x}, \delta)$ and $u \in S \cap B(\bar{x}, \delta)$.

One immediately obtains the following:

Fact 5.12 ([20, page 146]) Let $f : \mathbb{R}^n \to \mathbb{R}$ be convex with $\operatorname{lev}_{\leq 0} f \neq \varnothing$, and let $G_{f,s}$ be given by Definition 2.2. Then $G_{f,s}$ is a cutter. Consequently, $G_{f,s}$ is continuous at every $x \in \operatorname{lev}_{\leq 0} f$.

Proof. As $\operatorname{lev}_{\leq 0} f \neq \varnothing$, $\operatorname{Fix} G_{f,s} = \operatorname{lev}_{\leq 0} f$. Assume that $f(x) > 0$. For $u \in \operatorname{Fix} G_{f,s}$, $f(u) \leq 0$. By the convexity of $f$ we have

$$f(x) + \langle s(x), u - x \rangle \leq f(u) \leq 0.$$

Theorem 5.11 shows that $G_{f,s}$ is a cutter. The remaining result follows from Fact 5.3(ii).

In Fact 5.12, $\operatorname{lev}_{\leq 0} f \neq \varnothing$ is required, as the following example shows.

Example 5.13 (1). Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) := \exp|x|$. Then $\operatorname{lev}_{\leq 0} f = \varnothing$ and

$$(\forall x \in \mathbb{R}) \qquad G_f(x) = \begin{cases} x - 1 & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ x + 1 & \text{if } x < 0. \end{cases}$$

In particular, this $G_f$ is discontinuous at $x = 0$ and not a cutter. Moreover, $G_f$ is not monotone.

(2). Consider $f : \mathbb{R}^n \to \mathbb{R} : x \mapsto \exp(-\|x\|^2/2)$. We have $\operatorname{lev}_{\leq 0} f = \varnothing$ and

$$G_f(x) = \begin{cases} x + \dfrac{x}{\|x\|^2} & \text{if } x \neq 0, \\ 0 & \text{if } x = 0. \end{cases}$$

In particular, $G_f$ is not continuous at $0$, so it is not a cutter.
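A direct computation (ours) illustrates the blow-up of $G_f$ near $0$ in Example 5.13(2):

```python
import numpy as np

# Example 5.13(2): f(x) = exp(-||x||^2/2) gives G_f(x) = x + x/||x||^2,
# whose norm blows up as x -> 0, so G_f cannot be continuous at 0.
for t in [1.0, 0.1, 0.01]:
    x = np.array([t, 0.0])
    g = -x * np.exp(-np.dot(x, x) / 2)                     # gradient of f
    Gx = x - (np.exp(-np.dot(x, x) / 2) / np.dot(g, g)) * g
    print(Gx)            # (2, 0), (10.1, 0), (100.01, 0)
```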

Example 5.14 The nonconvex function

$$f : \mathbb{R}^n \to \mathbb{R} : x \mapsto \begin{cases} \|x\|^2 & \text{if } \|x\| \leq 1, \\ 1 & \text{if } \|x\| > 1, \end{cases}$$

has $G_f$ being a cutter on a neighborhood of $0$, but not a cutter on $\mathbb{R}^n$.

It is instructive to consider $d_C$ where $C \subseteq \mathbb{R}^n$ is closed and nonempty.

Proposition 5.15 Let $C \subseteq \mathbb{R}^n$ be closed and nonempty, and let $s$ be a selection of $\partial d_C$. Then $G_{d_C,s}$ is a cutter if and only if the set $C$ is convex.

Proof. By Fact 2.6, $0 \notin \partial d_C(\bar{x})$ whenever $\bar{x} \notin C$, because $\|v\| = 1$ for every $v \in \partial d_C(\bar{x})$. This implies that $\operatorname{Fix} G_{d_C,s} = C$. Assume that $G_{d_C,s}$ is a cutter. Then $\operatorname{Fix} G_{d_C,s} = C$ is convex by Fact 5.3(iii). Conversely, assume that $C$ is convex. Then $d_C$ is convex; consequently, $G_{d_C,s}$ is a cutter by Fact 5.12.

Theorem 5.16 Let $k \geq 1$, let $f : \mathbb{R}^n \to [0, +\infty)$ be locally Lipschitz, and let $G_{f,s}$ be given by Definition 2.2. Suppose that $G_{f,s}$ is a cutter and $\operatorname{Fix} G_{f,s} \neq \varnothing$. If $g = f^k$, then $G_{g,kf^{k-1}s}$ is a cutter.

Proof. By Theorem 3.9, $G_{g,kf^{k-1}s} = (1 - 1/k)\operatorname{Id} + (1/k)\,G_{f,s}$. As $\operatorname{Id}$ and $G_{f,s}$ are both cutters with $\operatorname{Fix} G_{f,s} \cap \operatorname{Fix}\operatorname{Id} = \operatorname{Fix} G_{f,s} \neq \varnothing$, being a convex combination of cutters, $G_{g,kf^{k-1}s}$ is a cutter by [20, page 62].

In Corollary 11.6, we give an example showing that even though $G_{f^2,2fs}$ is a cutter, $G_{f,s}$ might not be a cutter; so the converse of Theorem 5.16 is not true.

Theorem 5.17 Let $A : \mathbb{R}^n \to \mathbb{R}^n$ be unitary, $b \in \mathbb{R}^n$, let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $G_{f,s}$ be given by Definition 2.2. Define $g : \mathbb{R}^n \to \mathbb{R} : x \mapsto f(Ax + b)$. If $G_{f,s}$ is a cutter, then $G_{g,A^{\top}s(A\cdot+b)}$ is a cutter.

Proof. Let $x \in \mathbb{R}^n$ and $u \in \operatorname{Fix} G_{g,A^{\top}s(A\cdot+b)}$. Proposition 3.7 gives

$$G_{g,A^{\top}s(A\cdot+b)}(x) = A^{\top}\left( G_{f,s}(Ax+b) - b \right), \qquad Au + b \in \operatorname{Fix} G_{f,s}.$$

Since $A$ is unitary and $G_{f,s}$ is a cutter, we have

$$(33)\text{-}(37) \qquad \|x - G_{g,A^{\top}s(A\cdot+b)}(x)\|^2 = \|x - A^{\top}(G_{f,s}(Ax+b) - b)\|^2 = \|Ax + b - G_{f,s}(Ax+b)\|^2 \leq \|Ax + b - (Au+b)\|^2 - \|G_{f,s}(Ax+b) - (Au+b)\|^2 = \|x - u\|^2 - \|A^{\top}(G_{f,s}(Ax+b) - b) - u\|^2 = \|x - u\|^2 - \|G_{g,A^{\top}s(A\cdot+b)}(x) - u\|^2.$$

Hence $G_{g,A^{\top}s(A\cdot+b)}$ is a cutter by Fact 5.3(i).

Corollary 5.18 Let $B$ be an $n \times n$ symmetric matrix. Define

$$f : \mathbb{R}^n \to \mathbb{R} : x \mapsto \tfrac{1}{2}\, x^{\top}Bx.$$

Then

$$(38) \qquad G_f(x) = \begin{cases} x - \dfrac{x^{\top}Bx}{2\,\|Bx\|^2}\, Bx & \text{if } x^{\top}Bx > 0 \text{ and } Bx \neq 0, \\ x & \text{otherwise.} \end{cases}$$

Moreover, the following are equivalent:

(i) $G_f$ is a cutter.

(ii) $B$ is positive semidefinite or negative semidefinite.

Proof. (38) follows from Definition 2.2. Because $B$ is symmetric, there exists an orthogonal matrix $Q$ such that $Q^{\top}BQ = D$, where $D$ is an $n \times n$ diagonal matrix whose diagonal entries are the eigenvalues of $B$. Using $x = Qy$, Theorem 5.17 shows that $G_f$ is a cutter if and only if $G_g$ is a cutter, where $g : \mathbb{R}^n \to \mathbb{R} : y \mapsto f(Qy) = \tfrac{1}{2}\, y^{\top}Dy$.

(i) $\Rightarrow$ (ii). (i) implies that $G_g$ is a cutter. This means that

$$(39) \qquad (\forall y \in \mathbb{R}^n : y^{\top}Dy > 0)(\forall u \in \mathbb{R}^n : u^{\top}Du \leq 0) \qquad y^{\top}Du \leq \tfrac{1}{2}\, y^{\top}Dy.$$

We will show that all nonzero diagonal entries of $D$ have the same sign. Suppose to the contrary that there exist diagonal entries of $D$ with $\lambda_i > 0$ and $\lambda_j < 0$. Put $y_k = 0$, $u_k = 0$ for $k = 1, \ldots, n$, $k \neq i, j$. Then (39) reduces to: whenever $\lambda_i y_i^2 + \lambda_j y_j^2 > 0$ and

$$(40) \qquad \lambda_i u_i^2 + \lambda_j u_j^2 \leq 0,$$

we have

$$(41) \qquad \lambda_i y_i u_i + \lambda_j y_j u_j \leq \tfrac{1}{2}\left( \lambda_i y_i^2 + \lambda_j y_j^2 \right).$$

Fix $(y_i, y_j)$ such that $\lambda_i y_i^2 + \lambda_j y_j^2 > 0$ and $y_j < 0$. When $u_i = 0$ and $u_j \to +\infty$, (40) is verified but (41) fails to hold. This contradicts that $G_g$ is a cutter. Hence all nonzero diagonal entries of $D$ must have the same sign, which implies that $B$ is positive semidefinite if the sign is positive, and negative semidefinite if the sign is negative.

(ii) $\Rightarrow$ (i). When $B$ is positive semidefinite, $f$ is convex, and we apply Fact 5.12. When $B$ is negative semidefinite, $G_g = \operatorname{Id}$ is a cutter.

Theorem 5.19 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $G_{f,s}$ be given by Definition 2.2. Assume that $\mathbb{R} \ni k \neq 0$ and $g(x) = f(kx)$. If $G_{f,s}$ is a cutter, then $G_{g,ks(k\cdot)}$ is a cutter.

Proof. Proposition 3.6 gives $G_{g,ks(k\cdot)}(x) = \frac{1}{k}\, G_{f,s}(kx)$ and $\operatorname{Fix} G_{g,ks(k\cdot)} = \frac{1}{k}\operatorname{Fix} G_{f,s}$. Let $x \in \mathbb{R}^n$ and $u \in \operatorname{Fix} G_{g,ks(k\cdot)}$. We have

$$(42)\text{-}(43) \qquad \langle x - G_{g,ks(k\cdot)}(x), u - G_{g,ks(k\cdot)}(x) \rangle = \langle x - \tfrac{1}{k}G_{f,s}(kx), u - \tfrac{1}{k}G_{f,s}(kx) \rangle = \tfrac{1}{k^2}\,\langle kx - G_{f,s}(kx), ku - G_{f,s}(kx) \rangle \leq 0,$$

since $G_{f,s}$ is a cutter. Therefore, $G_{g,ks(k\cdot)}$ is a cutter.

One might ask: if each function $f_i : \mathbb{R}^n \to \mathbb{R}$ has $G_{f_i}$ being a cutter, must the maximum $g := \max\{f_1, f_2\}$ have $G_g$ being a cutter? The answer is negative, as the following example shows.

Example 5.20 Let $f_1, f_2 : \mathbb{R} \to \mathbb{R}$ be defined by $f_1(x) := 1 + x$ and $f_2(x) := 1 - x$. Each $G_{f_i}$ is a cutter by Fact 5.12. The function $g(x) := \max\{f_1(x), f_2(x)\}$ has

$$G_g(x) = \begin{cases} -1 & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ 1 & \text{if } x < 0, \end{cases}$$

which is not continuous at $x = 0$, so $G_g$ is not a cutter.

5.3 A nonconvex function whose G_f is a cutter

Example 5.21 If $f$ is not convex, $G_{f,s}$ need not be a cutter. Consider

$$f : \mathbb{R} \to \mathbb{R} : x \mapsto 1 - \exp(-x^2).$$

Then the subgradient projector of $f$ is

$$G_{f,s}(x) = \begin{cases} x - \dfrac{\exp(x^2) - 1}{2x} & \text{if } x \neq 0, \\ 0 & \text{if } x = 0, \end{cases}$$

and $\operatorname{Fix} G_f = \{0\}$. However, $G_f$ is not a cutter. Indeed, using $\exp(x^2) > 1 + x^2 + x^4/2$, when $x > \sqrt{2}$ we have

$$f(x) + s(x)(0 - x) = 1 - \exp(-x^2) + \left( 2x\exp(-x^2) \right)(0 - x) = \frac{\exp(x^2) - (1 + 2x^2)}{\exp(x^2)} > \frac{x^2 + \frac{x^4}{2} - 2x^2}{\exp(x^2)} = \frac{x^2(x^2 - 2)}{2\exp(x^2)} > 0.$$

By Theorem 5.11, $G_f$ is not a cutter.

Example 5.22 Even though $f$ is not convex, $G_{f,s}$ may still be a cutter. Define

$$f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} 0 & \text{if } x \leq 0, \\ x & \text{if } 0 \leq x \leq 20/7, \\ 8(x - 2.5) & \text{if } 20/7 \leq x \leq 3, \\ 2(x - 1) & \text{if } x \geq 3. \end{cases}$$

Then $f$ is not convex, since $\partial f(x)$ is not monotone on $[20/7, +\infty)$. However, its subgradient projector

$$G_{f,s}(x) = \begin{cases} x & \text{if } x \leq 0, \\ 0 & \text{if } 0 < x < 20/7, \\ \frac{20}{7}\left( 1 - \frac{1}{s(20/7)} \right), \text{ where } s(20/7) \in [1, 8], & \text{if } x = 20/7, \\ 2.5 & \text{if } 20/7 < x < 3, \\ 3 - \frac{4}{s(3)}, \text{ where } s(3) \in \{2, 8\}, & \text{if } x = 3, \\ 1 & \text{if } x > 3, \end{cases}$$

is a cutter. To see this, by Theorem 5.11, it suffices to consider zero level sets of tangent planes. Indeed, let $f(u) \leq 0$, i.e., $u \leq 0$. When $x_0 > 3$,

$$f(x_0) + s(x_0)(u - x_0) = 2(u - 1) \leq 0;$$

when $x_0 = 3$,

$$f(x_0) + s(x_0)(u - x_0) = 4 + s(3)(u - 3) \leq 4 + 2(u - 3) \leq 0,$$

where $2 \leq s(3) \leq 8$; when $20/7 < x_0 < 3$,

$$f(x_0) + s(x_0)(u - x_0) = 8(u - 2.5) \leq 0;$$

when $x_0 = 20/7$,

$$f(x_0) + s(x_0)(u - x_0) = \frac{20}{7} + s(20/7)\left( u - \frac{20}{7} \right) \leq u \leq 0,$$

where $1 \leq s(20/7) \leq 8$; when $0 < x_0 < 20/7$,

$$f(x_0) + s(x_0)(u - x_0) = u \leq 0.$$

See Corollary 11.6(ii) for an example on $\mathbb{R}^2$.

Note that even if $G_f$ is continuous, it does not mean that $G_f$ is a cutter; see, e.g., Example 2.5(ii). In [20], Cegielski developed a systematic theory for cutters. The theory of cutters can be used to study the class of functions (Theorem 5.11) whose subgradient projectors are cutters.

One might also ask: if $f : \mathbb{R}^n \to \mathbb{R}$ has $G_{f,s}$ being a cutter, does $g := f + r$ have $G_{g,s}$ being a cutter for every $r \in \mathbb{R}$? In general, the answer is negative. When $f$ is convex and $\operatorname{lev}_{\leq 0} f \neq \varnothing$, it follows from Fact 5.12 that $G_{f-r,s}$ is a cutter whenever $r > 0$. This might fail for $r < 0$, as the following example shows.

Example 5.23 Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) := e^{|x|} - 1$. Then

$$G_{f,s}(x) = \begin{cases} x - 1 + e^{-x} & \text{if } x > 0, \\ x + 1 - e^{x} & \text{if } x < 0, \\ 0 & \text{if } x = 0, \end{cases}$$

is a cutter by Fact 5.12. However, for $g : \mathbb{R} \to \mathbb{R} : x \mapsto e^{|x|}$, we have $g = f + 1$, and $G_{g,s}$ is not a cutter by Example 5.13(1).

For a nonconvex function, although $G_{f,s}$ is a cutter, $G_{f-r,s}$ might not be a cutter even when $r > 0$.

Example 5.24 Let $f$ be given by Example 5.22, and let $g := f - 20/7$. Then

$$G_{g,s}(x) = \begin{cases} x & \text{if } x \leq 20/7, \\ 20/7 & \text{if } 20/7 < x < 3, \\ G_{g,s}(3) \in \{17/7, 20/7\} & \text{if } x = 3, \\ 17/7 & \text{if } x > 3. \end{cases}$$

As shown in Example 5.22, $G_{f,s}$ is a cutter. However, $G_{g,s}$ is not a cutter, by Proposition 5.7 or by direct calculation using Definition 5.1.
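To close this subsection, here is a numerical check (ours) of the cutter violation in Example 5.21: the tangent-plane criterion of Theorem 5.11 fails at $u = 0$ once $x > \sqrt{2}$.

```python
import numpy as np

# Example 5.21: f(x) = 1 - exp(-x^2). Theorem 5.11 requires
# f(x) + s(x)*(u - x) <= 0 for u in S = Fix G_f = {0}; it fails for x > sqrt(2).
f = lambda x: 1.0 - np.exp(-x ** 2)
s = lambda x: 2.0 * x * np.exp(-x ** 2)     # f'(x)

x, u = 2.0, 0.0
print(f(x) + s(x) * (u - x))                # about 0.835 > 0: not a cutter
```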

6 Convergence analysis of subgradient projectors

In this section, we study the convergence of sequences generated by subgradient projectors. When the function is convex, the convergence analysis is fairly well known; see, e.g., [47, Section 5.3], [46], [9], and [20]. For nonconvex functions, we demonstrate that the convergence results on cutters, local cutters, quasi-ne mappings, and local quasi-ne mappings can be effectively used. It turns out that local cutters and local quasi-ne mappings are more appropriate for nonconvex functions. In addition to cutters and local cutters (see Definitions 5.1 and 5.4), quasi-nonexpansive mappings and local quasi-nonexpansive mappings are also useful for the convergence analysis.

6.1 Quasi-nonexpansive mappings and local quasi-nonexpansive mappings

According to [7, page 59] and [20, page 47], we define:

Definition 6.1 Let $D$ be a nonempty subset of $\mathbb{R}^n$ and $T : D \to \mathbb{R}^n$. We say that

(i) $T$ is quasinonexpansive (quasi-ne) if

$$(\forall x \in D)(\forall y \in \operatorname{Fix} T) \qquad \|Tx - y\| \leq \|x - y\|.$$

(ii) A mapping $T : D \to D$ is asymptotically regular at $x \in D$ if $T^{k+1}x - T^k x \to 0$ as $k \to \infty$; it is asymptotically regular on $D$ if it is so at every $x \in D$.

Definition 6.1(i) requires that $T$ satisfy the inequalities for all $x \in D$ and $y \in \operatorname{Fix} T$. In practice, the sets $D$ and $\operatorname{Fix} T$ might be too large to verify those inequalities. We now introduce locally quasinonexpansive mappings.

Definition 6.2 A mapping $T : D \to \mathbb{R}^n$ is locally quasinonexpansive (locally quasi-ne) at $\bar{x} \in \operatorname{Fix} T$ if there exists $\delta > 0$ such that

$$(\forall x \in B(\bar{x}, \delta) \cap D)(\forall y \in B(\bar{x}, \delta) \cap \operatorname{Fix} T) \qquad \|Tx - y\| \leq \|x - y\|.$$

The connection between quasi-ne mappings and quasi-fne mappings is given by the following fact.

Fact 6.3 ([9, Proposition 2.3(v)-(vi)], [20]) Let $D$ be a nonempty subset of $\mathbb{R}^n$, and let $T : D \to \mathbb{R}^n$ with $\operatorname{Fix} T \neq \varnothing$. Then the following are equivalent:

(i) $T$ is quasi-fne.

(ii) $2T - \operatorname{Id}$ is quasi-ne.

The following result says that quasi-ne, nonexpansive, and locally quasi-ne are the same for linear mappings. Although the equivalence of quasi-ne and nonexpansive for linear mappings has been given in [7, Exercise 4.4], the equivalence to locally quasi-ne is new.

Proposition 6.4 Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear operator. Then the following are equivalent:

(i) $T$ is quasi-ne.

(ii) $T$ is nonexpansive.

(iii) There exist $\delta > 0$ and $\bar{x} \in \operatorname{Fix} T$ such that $T$ is quasi-ne on $B(\bar{x}, \delta)$.

Proof. (i) $\Rightarrow$ (ii). Since $0 \in \operatorname{Fix} T$, we have $\|Tx\| \leq \|x\|$ for every $x \in \mathbb{R}^n$. Hence $T$ is nonexpansive. (ii) $\Rightarrow$ (i). Clear. (ii) $\Rightarrow$ (iii). Clear. (iii) $\Rightarrow$ (ii). The assumption means that there exist $\bar{x} \in \operatorname{Fix} T$ and $\delta > 0$ such that

$$(44) \qquad (\forall x \in B(\bar{x}, \delta))(\forall y \in B(\bar{x}, \delta) \cap \operatorname{Fix} T) \qquad \|Tx - y\| \leq \|x - y\|.$$

Let $v \in B(0, \delta)$. Using $T\bar{x} = \bar{x}$, $y = \bar{x}$, and the linearity of $T$, from (44) we obtain

$$\|Tv\| = \|T(\bar{x} + v) - T\bar{x}\| = \|T(\bar{x} + v) - \bar{x}\| \leq \|(\bar{x} + v) - \bar{x}\| = \|v\|.$$

Since $v \in B(0, \delta)$ was arbitrary and $T$ is linear, we have $\|Tv\| \leq \|v\|$ for every $v \in \mathbb{R}^n$. Hence $T$ is nonexpansive.

Remark 6.5 Fact 6.3 and Proposition 6.4 hold in Hilbert spaces. We formulate them only in $\mathbb{R}^n$.

The following example illustrates that for nonlinear $T$, quasinonexpansiveness and nonexpansiveness are different.

Example 6.6 Define

$$T : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} \dfrac{x}{2}\sin\dfrac{1}{x} & \text{if } x \neq 0, \\ 0 & \text{if } x = 0. \end{cases}$$

Then $T$ is quasi-ne but not nonexpansive.

Proof. $T$ is quasi-ne because $\operatorname{Fix} T = \{0\}$ and

$$(\forall x \in \mathbb{R}) \qquad |T(x) - 0| = \left| \frac{x}{2}\sin\frac{1}{x} \right| \leq \frac{|x|}{2} \leq |x|.$$

$T$ is not nonexpansive because, for $x > 0$,

$$T'(x) = \frac{1}{2}\sin\frac{1}{x} - \frac{1}{2x}\cos\frac{1}{x}$$

and $|T'(1/(2n\pi))| = n\pi > 1$.

For analogous results on linear cutters, see Proposition 9.1 in Section 9. Although we have developed calculus for $G_f$ being a cutter in Section 5, most results also hold for quasi-ne mappings. We single out the two most important ones.

Theorem 6.7 Let $k \geq 1$ and let $f : \mathbb{R}^n \to [0, +\infty)$ be locally Lipschitz, with $G_{f,s}$ given by Definition 2.2. Suppose that $G_{f,s}$ is quasi-ne and $\operatorname{Fix} G_{f,s} \neq \varnothing$. If $g = f^k$, then $G_{g,kf^{k-1}s}$ is quasi-ne.

Proof. Apply Theorem 3.9 and [7, Exercise 4.11].

Theorem 6.8 Let $A : \mathbb{R}^n \to \mathbb{R}^n$ be unitary and $b \in \mathbb{R}^n$, let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $G_{f,s}$ be given by Definition 2.2. Define $g : \mathbb{R}^n \to \mathbb{R} : x \mapsto f(Ax + b)$. If $G_{f,s}$ is quasi-ne, then $G_{g,A^{\top}s(A\cdot+b)}$ is quasi-ne.

Proof. Apply Proposition 3.7 and Definition 6.1.

Corollary 6.9 Let $f : \mathbb{R}^n \to [0, +\infty)$ be locally Lipschitz, with $G_{f,s}$ given by Definition 2.2. Suppose that $g = f^2$ and $\operatorname{Fix} G_{f,s} \neq \varnothing$. Then $G_{f,s}$ is quasi-ne if and only if $G_{g,2fs}$ is quasi-fne.

Proof. By Theorem 3.9, $G_{g,2fs} = \frac{G_{f,s} + \operatorname{Id}}{2}$. The result then follows from Fact 6.3.

6.2 Convergence of cutters, local cutters, quasi-ne mappings, and local quasi-ne mappings

Proposition 6.10 (convergence of iterates of a cutter) Let $D \subseteq \mathbb{R}^n$ be a nonempty closed convex set, let $T : D \to D$ be an operator with a fixed point, and assume that $T$ has the fixed point closed property on $D$. If $T$ is a cutter, then for every $x \in D$, the sequence $(T^k x)_{k \in \mathbb{N}}$ converges to a point $z \in \operatorname{Fix} T$.

Proof. Since $T$ is a cutter, $T$ is quasi-fne by Fact 5.3(i), hence quasi-ne. Moreover, $T$ is asymptotically regular by [20, Theorem 3.4.3]. The result now follows from [20, Theorem 3.5.2].

Proposition 6.11 (convergence of iterates of a locally quasi-fne mapping) Let $D$ be a nonempty closed convex subset of $\mathbb{R}^n$, let $T : D \to D$, and assume $\operatorname{Fix} T \neq \varnothing$. Assume that

(i) there exist $\bar{x} \in \operatorname{Fix} T$ and $\delta > 0$ such that $T$ is locally quasi-fne at $\bar{x}$ (see Definition 5.5);

(ii) $T$ has the fixed point closed property.

Let $x_0 \in D \cap B(\bar{x}, \delta)$ and set $(\forall k \in \mathbb{N})\ x_{k+1} = Tx_k$. Then $(x_k)_{k \in \mathbb{N}}$ converges to a point $z \in B(\bar{x}, \delta) \cap \operatorname{Fix} T$.

Proof. By assumption (i),

$$(45) \qquad (\forall x \in B(\bar{x}, \delta) \cap D)(\forall y \in B(\bar{x}, \delta) \cap \operatorname{Fix} T) \qquad \|Tx - y\|^2 + \|Tx - x\|^2 \leq \|x - y\|^2.$$

With $x_0 \in D \cap B(\bar{x}, \delta)$, equation (45) gives

$$(\forall y \in B(\bar{x}, \delta) \cap \operatorname{Fix} T) \qquad \|Tx_0 - y\|^2 + \|Tx_0 - x_0\|^2 \leq \|x_0 - y\|^2,$$

so $\|x_1 - \bar{x}\| \leq \|x_0 - \bar{x}\| \leq \delta$. By induction, we have that

$$(46) \qquad (x_k)_{k \in \mathbb{N}} \text{ is a sequence in } B(\bar{x}, \delta).$$

Moreover, equation (45) gives

$$(\forall k \in \mathbb{N})(\forall y \in B(\bar{x}, \delta) \cap \operatorname{Fix} T) \qquad \|x_{k+1} - y\|^2 + \|x_{k+1} - x_k\|^2 \leq \|x_k - y\|^2,$$

so $(x_k)_{k \in \mathbb{N}}$ is Fejér monotone with respect to $C := B(\bar{x}, \delta) \cap \operatorname{Fix} T$, and $x_{k+1} - x_k \to 0$ as $k \to \infty$. Let $x$ be a cluster point of $(x_k)_{k \in \mathbb{N}}$, say $x_{k_l} \to x$. Since $Tx_{k_l} - x_{k_l} \to 0$ and $T$ is fixed point closed, we have $Tx - x = 0$. Moreover, $\|x - \bar{x}\| \leq \delta$ because of (46). Thus $x \in C$. Applying [7, Theorem 5.5], we conclude that $x_k \to z \in C$.
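The following sketch (ours) illustrates Proposition 6.10 on a convex example: for $f(x) = \|x\|^2 - 1$, $G_f$ is a cutter by Fact 5.12, and its iterates converge to a point of $\operatorname{lev}_{\leq 0} f$.

```python
import numpy as np

# Iterating a cutter (Proposition 6.10): for the convex f(x) = ||x||^2 - 1,
# the iterates of G_f reach lev_{<=0} f, the closed unit ball.
def G(x):
    fx = np.dot(x, x) - 1.0
    return x if fx <= 0 else x - (fx / (4.0 * np.dot(x, x))) * (2.0 * x)

x = np.array([4.0, 3.0])
for k in range(60):
    x = G(x)
print(np.linalg.norm(x))     # ~1.0: the limit lies in Fix G_f
```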

30 Proposition 6.12 (convergence of iterates of a locally quasi-ne mapping) Let D be nonempty closed convex subset of R n, let T : D D and int(fix T) =. Assume that (i) There exists x Fix T and δ > 0 such that T is locally quasi-ne (see Definition 6.2); (ii) int(b( x, δ) Fix T) = ; (iii) T has the fixed-point closed property. Let x 0 D B( x, δ). Set ( k N) x k+1 = Tx k. Then (x k ) k N converges to a point z B( x, δ) Fix T. Proof. By assumption (i), there exists δ > 0 such that (47) ( x B( x, δ) D) ( y B( x, δ) Fix T) Tx y x y ; With x 0 D B( x, δ), equation (47) gives ( y B( x, δ) Fix T) Tx 0 y x 0 y, so x 1 x x 0 x δ. By induction, we have that (48) (x k ) k N is a sequence in B( x, δ). Moreover, equation (47) gives ( k N)( y B( x, δ) Fix T) Tx k y x k y, so (x k ) k N is Fejér monotone with respect to C := B( x, δ) Fix T. As int C =, we have that x k z R n by [7, Proposition 5.10]. This implies that Tx k x k = x k+1 x k 0 and x k z as k. Since T is fixed-point closed, we have Tz z = 0. Moreover, z x δ because of (48). Hence x k z B( x, δ) Fix T. 6.3 Applications to subgradient projectors Theorem 6.13 Let f : R n R be locally Lipschitz, G f,s be given by Definition 2.2, and S := x R n 0 f (x)} lev 0 f =. If the subgradient projector G f,s is a cutter, then for every x R n, the sequence (G f,s k x) k N converges to a point z such that either 0 f (z) or f (z) 0. Proof. Combine Theorem 4.9 and Proposition To proceed, it will be convenient to single out: Lemma 6.14 Let f : R n R be lsc and subdifferentiable, and G f,s be given by Definition 2.2. When f (x) > 0, 0 f (x), and y R n, we have (49) G f,s (x) y 2 = x y 2 + f (x) s(x) 2 ( f (x) + 2 y x, s(x) ). 30

31 Proof. This follows from G f,s (x) y 2 = x y f (x) s(x) 2 s(x) = x y 2 + f 2 (x) s(x) x y, f (x) s(x) 2 s(x). Theorem 6.15 Let f : R n R be locally Lipschitz, G f,s be given by Definition 2.2, and Then the following hold: (i) G f,s is quasi-ne if and only if S := x R n 0 f (x)} lev 0 f. (50) ( x S) ( y S) f (x) + 2 y x, s(x) 0. (ii) Assume that int S =, and (50) holds. Then for every x R n, the sequence (G f,s k x) k N converges to a point z S. Proof. (i). By Lemma 6.14, when x S, and y S, we have (51) G f,s (x) y 2 = x y 2 + f (x) s(x) 2 ( f (x) + 2 y x, s(x) ). In view of Definition 6.1, assumption (50) is equivalent to G f,s being quasi-ne. (ii). By (i), the sequence (G f,s k x) k N is Fejér monotone with respect to S. Since int S =, by [7, Proposition 5.10], the sequence (G f,s k x) k N converges to a point z R n. Write x k = G f,s k x. Then ( k N) x k+1 = G f,s (x k ). As f is locally Lipschitz at z and x k z, the sequence (s(x k )) k N is bounded. Since x k+1 x k = f (x k) s(x k ) 2 s(x k) and lim k x k = z, we have Hence z S. f (z) = lim k f (x k ) = lim k x k+1 x k s(x k ) = 0. Theorem 6.16 Let f : R n R be locally Lipschitz, and G f,s be given by Definition 2.2, and S := x R n 0 f (x)} lev 0 f =. Assume that the subgradient projector G f,s is locally quasi-fne at x S, i.e., there exists δ > 0 such that (52) ( x B( x, δ) \ S) ( y B( x, δ) S) f (x) + y x, s(x) 0. Then for every x 0 B( x, δ), the sequence (x k ) k N defined by ( k N) x k+1 = G f,s (x k ) converges to a point z B( x, δ) S. 31

32 Proof. (52) guarantees that G f,s is locally quasi-fne at x S. Indeed, when x S and y S B( x, δ), using Lemma 6.14, (52) and (13), we have (53) (54) (55) (56) G f,s (x) y 2 = x y 2 + f (x) ( ) f (x) + 2 y x, s(x) s(x) 2 = x y 2 + f 2 (x) 2( f (x) + y x, s(x) ) f (x) s(x) 2 f (x) ( ) f (x) x y 2 + G f (x) x 2 f (x) x y 2 G f,s (x) x 2. In view of Theorem 4.9, it suffices to apply Proposition Theorem 6.17 Let f : R n R be locally Lipschitz, G f,s be given by Definition 2.2, and S := x R n 0 f (x)} lev 0 f with int S =. Assume that there exist x S and δ > 0 such that (57) ( x B( x, δ) \ S) ( y B( x, δ) S) f (x) + 2 y x, s(x) 0. Assume further that int(b( x, δ) S) =. Then for every x 0 B( x, δ), the sequence (x k ) k N defined by converges to a point z B( x, δ) S. ( k N) x k+1 = G f,s (x k ) Proof. (57) guarantees that G f,s is locally quasi-ne at x. Indeed, for every x B( x, δ) \ S and y B( x, δ) S, using Lemma 6.14 and (57) we have (58) (59) G f,s (x) y 2 = x y 2 + x y 2. f (x) ( ) f (x) + 2 y x, s(x) s(x) 2 By Theorem 4.9, G f,s has the fixed point closed property. Therefore, Proposition 6.12 applies. Example 6.18 (1). Define f : R R : x 1 x if x 1, 0 otherwise. Because Fix G f,s = x R x 1 } is not convex, we have that G f,s is not a cutter. However, f satisfies the assumptions of both Theorems 6.16 and 6.17 so that the local convergence theory applies. (2). Define 0 if x 0, x if 0 x 1, f : R R : x 1 if 1 x 2, x 1 if x 2. As Fix G f,s = (, 0] [1, 2], G f,s is not a cutter. However, both Theorems 6.16 and 6.17 apply. 32

33 6.4 Finite convergence and (C, ε)-firmly nonexpansiveness Finite termination algorithms for subgradient projectors of convex functions have been studied in [46, 29, 13]. Recently, in [43] Pang studied finite convergent algorithms of subgradient projectors of locally Lipschitz functions defined in terms of the Clarke subdifferential. Naturally, one asks what his result implies about the subgradient projector defined by us. To this end, let us recall lower-c k functions defined by Rockafellar and Wets [51, Definition 10.29], and approximate convex functions by Nghai, Luc, and Théra [41], respectively. Definition 6.19 A function f : O R, where O is an open subset in R n, is said to be lower C k on O if on some neighborhood V of each x O there is a representation f (x) := max t T f t (x) in which f t is of class C k on V and the index set T is compact such that f t (x) and all its partial derivatives through order k depend continuously not just on x V but jointly on (t, x) T V. Definition 6.20 A function f : R n R is approximately convex at x R n if for every ε > 0 there exists δ > 0 such that ( x, y B( x, δ))( λ (0, 1)) f (λx + (1 λ)y) λ f (x) + (1 λ) f (y) + ελ(1 λ) x y. Fact 6.21 (See [2, Theorem 4.5], [26, Corollary 3]) Let f : R n R be locally Lipschitz at x. Then the following are equivalent: (i) f is lower-c 1 around x. (ii) f is approximately convex at x. (iii) for every ε > 0 there exists δ > 0 such that ( x, y B( x, δ))(x c f (x)) f (y) f (x) + x, y x ε x y. Theorem 6.22 (finite convergence for accelerated subgradient projectors) Let f : R n R be locally Lipschitz, and let x R n satisfy (i) f ( x) = 0; (ii) 0 f ( x); (iii) f is lower-c 1 around x. Suppose that the strictly decreasing sequence (ε k ) k N converges to 0 at a sublinear rate. Then there exist δ > 0 and ε > 0 such that for every x 0 B( x, δ) and ε 0 < ε, the sequence (x k ) k N defined by (60) ( k N) x k+1 = x k ε k + f (x k ) s k 2 s k, where s k f (x k ), converges in finitely many iterations, i.e., f (x k ) 0 for some k N. Proof. Since f is lower-c 1 around x, f is Clarke regular around x; see [51, Theorem 10.31]. Thus, the Clarke subdifferential and the limiting subdifferential of f are the same around x. Because f is upper semicontinuous, when δ is sufficiently small, (ii) guarantees that 0 f (x) for every x B( x, δ), which implies that (60) is well defined. By Fact 6.21, (iii) is equivalent to f being approximately convex at x. The result then follows from [43, Theorem 3]. 33

34 Remark 6.23 See [2, 41, 52] for more characterizations on lower-c 1 functions and approximately convex functions. Let ε 0 and C R n. In [30], Hesse and Luke studied (C, ε)-firmly nonexpansive mappings; see also [37]. Definition 6.24 Let C, D be nonempty subsets of R n and T : D R n. T is called (C, ε)-firmly nonexpansive if ( x D)( y C) Tx Ty 2 + (x Tx) (y Ty) 2 (1 + ε) x y 2. Theorem 6.25 ((C, ε)-firmly nonexpansivness of G f,s ) Let f : R n R be locally Lipschitz, G f,s be given by Definition 2.2, and S := x R n 0 f (x)} lev 0 f. Suppose that x R n satisfies (i) f ( x) = 0; (ii) 0 f ( x); (iii) f is lower-c 1 around x. Then for every ε > 0 there exists δ > 0 such that on B( x, δ) the subgradient projector G f,s is (S B( x, δ), ε)-firmly nonexpansive, in which ε = 1 + 8Lε/d f ( x) (0) 2 and L being the Lipschitz modulus of f around x. Proof. Let α := d f ( x) (0)/2. Then α > 0 by (ii). For every ε > 0, we can find δ > 0 such that (61) f (y) f (x) + x, y x ε x y, when x, y B( x, δ), x f (x). This follows from (iii) and Fact s(x) α whenever s(x) f (x) and x B( x, δ). This is because that f ( x) is compact, f is upper semicontinuous, and (ii). f (x) f (y) L x y whenever x, y B( x, δ). This is possible because f is locally Lipschitz around x. Since 0 f (y) for y B( x, δ), we must have f (y) 0 if y S B( x, δ)). Thus, (61) gives (62) ( x B( x, δ))( y S B( x, δ))( x f (x)) f (x) + x, y x ε x y. Put C := S B( x, δ). When f (x) > 0, 0 f (x), and y S B( x, δ), using Lemma 6.14, (62), and (13), we have (63) (64) G f,s (x) y 2 = x y 2 + f (x) ( ) f (x) + 2 y x, s(x) s(x) 2 = x y 2 + f 2 (x) 2( f (x) + y x, s(x) ) f (x) s(x) 2 f (x) 34

35 (65) (66) (67) (68) (69) This completes the proof. x y f (x) ( ) f (x) s(x) 2 ε y x + G f,s(x) x 2 f (x) x y 2 2( f (x) f (y)) + α 2 ε x y G f,s (x) x 2 x y 2 2L x y + α 2 ε x y G f,s (x) x 2 = x y 2 + 2Lε α 2 x y 2 G f,s (x) x 2 (1 + ε) x y 2 G f,s (x) x 2. Remark 6.26 Observe that both Theorems 6.22 and 6.25 aim for solving nonconvex inequality problems, e.g., finding a point x such that f (x) 0 with f (x) = e x2 + 1/2 and f satisfying the assumptions at x = ln 2. However, they do not apply to f = d C. This completes Part I. In Part II, we will study subgradient projectors of Moreau envelopes, and their connections to subgradient projectors of original functions. Part II Subgradient projectors of Moreau envelopes and characterizations 7 Subgradient projectors of Moreau envelopes When f : R n (, + ] is lsc, f (x) might be empty for some x R n. However, e λ f has much better properties when f is prox-bounded, see, e.g., Fact 7.5 below; and this is the motivation for us to study subgradient projectors of Moreau envelopes below. To do this, we need to study the relationship between G eλ f and G f,s. Recall that for a proper, lsc function f : R n (, + ] and parameter value λ > 0, the Moreau envelope e λ f and proximal mapping P λ f are defined respectively by e λ f : R n (, + ] : x inf f (w) + 1 } x w 2, and w 2λ P λ f : R n R n : x argmin w f (w) + 1 x w 2 2λ When f is proper, lsc, and convex, we refer the reader to [7, Chapter 12] and [50] for the properties e λ f and P λ f. When f is a proper and lsc function, not necessarily convex, in [45] Poliquin and Rockafellar coined the notions of prox-boundedness and prox-regularity of functions; see also [51, page 610]. Definition 7.1 (i) A function f : R n (, + ] is prox-bounded if there exists λ > 0 such that e λ f (x) > for some x R n. The supremum of the set of all such λ is the threshold λ f of prox-boundedness for f. 35 }.

36 (ii) A function f : R n (, + ] is prox-regular at x for v if f is finite and locally lsc at x with v f ( x), and there exists ε > 0 and ρ 0 such that f (x ) f (x) + v, x x ρ 2 x x 2 for all x x ε when v f (x), v v < ε, x x < ε, f (x) < f ( x) + ε. When this holds for all v f ( x), f is said to be prox-regular at x. We give a simple example to illustrate the concepts of prox-regularity and prox-bounded of functions. Example 7.2 (1). The function f : R R : x x is prox-bounded with λ f = +. However, f is not prox-regular at x = 0. (2). The function f : R R : x x 3 is prox-regular on R. However, f is not prox-bounded. (3). The function f : R R : x x 2 /2 is prox-regular on R, and prox-bounded with λ f = 1. (4). The function f : R R : x x 3 x not prox-regular at x = 0, and not prox-bounded. In the sequel, we shall also need the following key concepts. Definition 7.3 ([51, page 614]) Let O be a nonempty open subset of R n. We say that f : O R is C 1+ if f is differentiable with f Lipschitz continuous. Set q : R n R : x x 2 /2. Definition 7.4 ([51, page 567]) A proper, lsc function f : R n (, + ] is µ-hypoconvex for some µ > 0 if f + µ 1 q is convex. 7.1 Fine properties of prox-regular functions Two major facts about the Moreau envelopes of prox-bounded functions and prox-regular functions are: Fact 7.5 ([51, Example 10.32]) Let f : R n (, + ] be proper, lsc, and prox-bounded with threshold λ f. Then for every λ (0, λ f ), the function e λ f is lower C 2, hence semidifferentiable, locally Lipschitz, Clarke regular, and [ e λ f ](x) = λ 1 [conv P λ f (x) x], = [e λ f ](x) λ 1 [x P λ f (x)]. Fact 7.6 ([5, Proposition 5.3], [51, Proposition 13.37]) Let f : R n (, + ] be lsc, proper, and prox-bounded with threshold λ f. Suppose that f is prox-regular at x for v f ( x). Then for all λ (0, λ f ) there is a neighborhood U λ of x + λ v for which the following equivalent properties hold: (i) e λ f is C 1+ on U λ. (ii) P λ f is nonempty, single-valued, monotone and Lipschitz continuous on U λ. Further, e λ f = (Id P λ f )/λ on U λ. 36

37 Proposition 7.7 Let f : R n (, + ] be proper, lsc, and prox-bounded with threshold λ f. Then for every λ (0, λ f ), one has dom P λ f = R n. Consequently, ran(id +λ f ) = R n. Proof. As 0 < λ < λ f, we have dom P λ f = R n. To complete the proof, it suffices to apply [51, Example 10.2]: P λ f (Id +λ f ) 1. Proposition 7.8 (global prox-regularity implies hypoconvexity) Let f : R n (, + ] be proper, lsc, and prox-bounded with threshold λ f. Suppose that f is prox-regular on R n. Then for every λ (0, λ f ), the following hold: (i) The function f + λ 1 q is convex. (ii) P λ f = (Id +λ f ) 1 is single-valued and Lipschitz continuous on R n. (iii) e λ f = (Id P λ f )/λ. Proof. When λ (0, λ f ), we have dom P λ f = R n. Since f is prox-regular, by Fact 7.6, for v f (x) there exists an open neighborhood U λ of x + λv such that P λ f is single-valued and locally Lipschitz. Proposition 7.7 implies that P λ f is single-valued and locally Lipschitz on R n. As P λ f is always monotone, cf. [51, Proposition 12.19], P λ f is maximally monotone by [51, Example 12.7]. Then (i) and (ii) follow from [51, Proposition 12.19]. To obtain (iii), one can apply Fact 7.6(ii).. Proposition 7.8(i) immediately implies: Corollary 7.9 Let f : R n (, + ] be proper, lsc, and prox-bounded. Suppose that f is prox-regular on R n. Then the function f is a difference of two convex functions. Characterizations of prox-regularity on an open subset is given by Fact 7.10 ([51, Theorem 10.33], [51, Proposition 13.33]) Let f : O R, where O is a nonempty open set in R n. The following are equivalent: (i) The function f is lower C 2 on O. (ii) Relative to some neighborhood of each point of O, there is an expression f = g ρ q in which g is finite, convex function, and ρ > 0. (iii) f is prox-regular and locally Lipschitz on O. Corollary 7.11 Let f : R n (, + ] and let O be a nonempty open subset of R n. Suppose that f is prox-regular and locally Lipschitz on O. Then for every compact convex subset S of O, there exists ρ > 0 such that f + ρ q is convex on S. Proof. Let x S. By Fact 7.10, there exists an open ball B(x, δ x ) O and ρ x such that f + ρ x q is convex on B(x, δ x ). Select from the covering of S by various balls B(x, δ x ) a finite covering, say B(x i, δ xi ) with i = 1,..., m. Let ρ := maxρ x1,..., ρ xm }. As f + ρ q is convex on each B(x i, δ xi ), and S B(x i, δ xi ), we obtain that f + ρ q is convex on S. 37

38 7.2 Relationship among ( e λ f ) 1 (0), Fix P λ f and ( f ) 1 (0) Proposition 7.12 Let f : R n (, + ] be proper, lsc, and prox-bounded with threshold λ f. Then for every λ (0, λ f ), the following hold: (i) For every α R, the level set lev α f = if and only if lev α (e λ f ) =. Moreover, lev α (e λ f ) lev α f. (ii) 0 e λ f (x) x P λ f (x) 0 f (x). (iii) If, in addition, f is prox-regular at x for v f ( x), then on a neighborhood U λ of x + λ v one has Proof. When f is prox-regular on R n, one has 0 = e λ f (x) x = P λ f (x). ( x R n ) 0 = e λ f (x) x = P λ f (x) 0 f (x). (i). Since inf f = inf e λ f and argmin f = argmin e λ f by [51, Example 1.46], lev α f = if and only if lev α (e λ f ) = for every α R. The inclusion follows from e λ f f. (ii). By Fact 7.5, we have e λ f (x) λ 1 [x P λ f (x)]. This gives the first implication. By [51, Example 10.2], P λ f (x) (Id +λ f ) 1 (x) for all x R n. The second implication follows. (iii). By Fact 7.6 or [45, Theorem 4.4], the Moreau envelope e λ f is C 1+ on a neighborhood U λ of x + λ v with e λ f = λ 1 [Id P λ f ] on U λ. When f is prox-regular on R n, one has e λ f = λ 1 [Id P λ f ], and P λ f = (Id +λ f ) 1 is single-valued on R n by Proposition 7.8. Fact 7.13 [51, Proposition 12.19] For a proper, lsc function f : R n (, + ], assume that f is µ-hypoconvex for some µ > 0. Then P µ f = (Id +µ f ) 1, and for all λ (0, µ) the mapping P λ f = (Id +λ f ) 1 is Lipschitz continuous with constant µ/[µ λ]. Under the assumption of f being µ-hypoconvex for some µ > 0, when λ > 0 is sufficiently small e λ f gives rise to a smooth regularization of f. Proposition 7.14 For a proper, lsc function f : R n (, + ], assume that f is µ-hypoconvex for some µ > 0. Then for every λ (0, µ), the following hold: (i) e λ f is C 1+ and e λ f = λ 1 (Id P λ f ) on R n. (ii) e λ f (x) = 0 0 f (x). Proof. As f is µ-hypoconvex, f is prox-regular and prox-bounded. By Fact 7.13 and Fact 7.6, e λ f = λ 1 [Id P λ f ] = λ 1 [Id (Id +λ f ) 1 ]. Remark 7.15 Proposition 7.14(ii) can also been obtained from [33, Theorem 4.4], in which the authors study the Bregman envelope and proximal mapping of proper, lsc, and prox-bounded functions. 38

39 Proposition 7.16 For a proper, lsc function f : R n (, + ], assume that f := max f 1,..., f m } with f i being C 2 and that f is prox-bounded below. Then for every λ > 0 sufficiently small, one has 0 = e λ f (x) x = P λ f (x) 0 f (x). Proof. By [51, Proposition 13.33] or [45, Example 2.9], f is prox-regular everywhere on R n. By Proposition 7.8, P λ f = (Id +λ f ) 1. By Fact 7.6, we have e λ f = λ 1 [Id P λ f ]. It remains to apply Proposition 7.12(iii), and P λ f = (Id +λ f ) 1 being single-valued. For a sequence C k } k N of subsets of R n, its limit and outer limit are denoted respectively by lim k C k and lim sup k C k ; see [51, page 109]. Proposition 7.17 Let f : R n (, + ] be proper, lsc, and prox-bounded with threshold λ f > 0. Assume that C R n is nonempty and closed. Then for every α R one has lim λ 0 lev α (e λ f + ι C ) = lev α ( f + ι C ). Proof. In view of [51, Theorem 7.4(d)], e λ f + ι C converges epigraphically to f + ι C when λ 0. By [51, Proposition 7.7], for every α R there exists α λ α such that lim λ 0 lev αλ (e λ f + ι C ) = lev α ( f + ι C ). Since lev α ( f + ι C ) lev α (e λ f + ι C ) lev αλ (e λ f + ι C ), we obtain lim λ 0 lev α (e λ f + ι C ) = lev α ( f + ι C ). 7.3 The subgradient projector of e λ f The following result extends [12, Proposition 3.1(viii)] and [11, Example 4.9(ii)] from convex functions to possibly nonconvex functions. Theorem 7.18 (subgradient projector of Moreau envelopes of a prox-regular function) Suppose that f : R n (, + ] is proper, lsc, and prox-bounded with threshold λ f, and that f is prox-regular. Then for every λ (0, λ f ), the subgradient projector of e λ f is given by x e λ λ f (x) G eλ f : R n R n (x P : x x P λ f (x) 2 λ f (x)) x if e λ f (x) > 0 and x = P λ f (x), otherwise, and Fix G eλ f = lev 0 (e λ f ) x R n x = P λ f (x)}. When x = P λ f (x), we have 0 f (x). Moreover, lim λ 0 lev 0 (e λ f ) = lev 0 f. Proof. Apply Propositions 7.12, 7.8, and 7.17 with C = R n and α = 0. The restriction of G f to a subset D R n is denoted by G f D and is the operator defined by G f D : D R n, G f D (x) = G f (x) for every x D. Theorem 7.19 (functions being prox-regular at the critical point) Suppose that f : R n (, + ] is proper, lsc, and prox-bounded with threshold λ f, and that f is prox-regular at x for 0 f ( x). Then for every λ (0, λ f ), there exists a closed neighborhood U λ of x for which x e λ λ f (x) (x P (70) ( x U λ ) G eλ f Uλ (x) = x P λ f (x) 2 λ f (x)) x if e λ f (x) > 0 and x = P λ f (x), otherwise, 39

40 (71) Fix G eλ f Uλ = ( lev 0 (e λ f ) x R n x = P λ f (x)} ) U λ, and (72) x = P λ f (x) 0 f (x). Moreover, lim sup λ 0 ( lev0 (e λ f ) U λ ) lev0 f. Proof. Apply Fact 7.6 and Proposition 7.12(iii) to obtain (70) (72). Since ( lev0 (e λ f ) U λ ) lev0 (e λ f ), and lim lev 0 (e λ f ) = lev 0 (e 1/k f ) λ 0 k 1 by [51, Exercise 4.3(b)], it suffices to use Proposition 7.17 with C = R n and α = 0. Theorem 7.20 (subgradient projector of Moreau envelopes of a hypoconvex function) Suppose that f : R n (, + ] is proper and lsc, and that f is µ-hypoconvex for some µ > 0. Then for every λ (0, µ), the subgradient projector of e λ f is given by x e λ λ f (x) G eλ f : R n R n (x P : x x P λ f (x) 2 λ f (x)) x if e λ f (x) > 0 and 0 f (x), otherwise, Moreover, lim λ 0 lev 0 (e λ f ) = lev 0 f. Fix G eλ f = lev 0 (e λ f ) x R n 0 f (x)}, and x R n 0 = e λ f (x)} = x R n 0 f (x)}. Proof. Apply Propositions 7.14 and 7.17 with C = R n and α = 0. Theorems 7.18 and 7.20 imply that if one can solve x λ Fix(G eλ f ), then either 0 f (x λ ) for some λ > 0 or the subsequential limits of (x λ ) will lie in lev 0 f when λ 0. Remark 7.21 Moreau envelopes of nonconvex functions in infinite dimensional spaces have been intensively studied; see, e.g., [5, 6, 32, 3]. Thus, it is possible to have analogues of Theorems 7.18, 7.19, 7.20 in infinite dimensional spaces. However, this is beyond the scope of this paper. Cutters are important for studying convergence of iterative methods; see, e.g., [9, 20, 13]. It is natural to ask whether G eλ f is a cutter in the case that G f is a cutter. Although we cannot answer this in general, the following special case is true. Proposition 7.22 Let f : R n (, + ] be proper, lsc, and prox-regular. Suppose that min f = 0, f is strictly differentiable at every x argmin f, and that 0 f (x) for every x R n \ argmin f. Then for every λ > 0 the following hold: (i) Fix G eλ f = Fix G f. (ii) If G f,s is a cutter for every selection s of f, then G eλ f is a cutter. 40

41 Proof. As min f >, the function f is prox-bounded with threshold r f = +. (i). Note that min f = min e λ f, argmin f = argmin e λ f. The assumption min f = 0 implies lev 0 f = lev 0 e λ f = argmin f. Because f is prox-regular on R n and r f = +, for every λ > 0 we have e λ f = λ 1 (Id Prox λ f ) and Prox λ f = (Id +λ f ) 1 being single-valued by Proposition 7.8. This gives x R n e λ f (x) = 0 } = x R n 0 f (x) }. Then Fix G eλ f = lev 0 e λ f x R n e λ f (x) = 0 } = lev 0 f x R n 0 f (x) } = Fix G f. (ii). Assume that e λ f (x) > 0 and 0 = e λ f (x). Since f is prox-regular, e λ f (x) = λ 1 (x Prox λ f (x)) = 0. By the definition of Prox λ f, we have (73) 0 = λ 1 (x Prox λ f (x)) f (Prox λ f (x)), which implies that Prox λ f (x) argmin f. Indeed, if Prox λ f (x) argmin f, the assumption gives f (Prox λ f (x)) = 0} which contradicts (73). Thus, f (Prox λ f (x)) > 0. Because Prox λ f (x) argmin f, the assumption also gives 0 f (Prox λ f (x)). These arguments, (i), Theorem 5.11, and G f,s being a cutter altogether imply that (74) f (Prox λ f (x)) + λ 1 (x Prox λ f (x)), u Prox λ f (x) 0 if u Fix G f,s, e λ f (x) > 0 and e λ f (x) = 0. Now we show that G eλ f is a cutter. Let u Fix G eλ f, e λ f (x) > 0, and e λ f (x) = 0. In view of (74) and (i), we calculate e λ f (x) + λ 1 (x Prox λ f (x)), u x = e λ f (x) + λ 1 (x Prox λ f (x)), u Prox λ f (x) + λ 1 (x Prox λ f (x)), Prox λ f (x) x = f (Prox λ f (x)) + 1 2λ x Prox λ f (x) 2 + λ 1 (x Prox λ f (x)), u Prox λ f (x) λ 1 x Prox λ f (x)) 2 = f (Prox λ f (x)) + λ 1 (x Prox λ f (x)), u Prox λ f (x) 1 2λ x Prox λ f (x)) 2 1 2λ x Prox λ f (x)) 2 0. Theorem 5.11(i) concludes the proof. A local version of Theorem 7.22 comes as follows. Proposition 7.23 Let f : R n (, + ] be proper, lsc, and prox-regular at x for v = 0, and let S := x R n 0 f (x) } lev 0 f. Suppose that min f = 0, and there exists δ > 0 such that (i) For every selection s of f, G f,s is a cutter on B( x, δ), i.e., ( x B( x, δ) \ S)( u S B( x, δ)) f (x) + s(x), u x 0. 41

42 (ii) f is strictly differentiable at every u argmin f B( x, δ), and that 0 f (x) for every x (R n \ argmin f ) B( x, δ). Then for every λ > 0 there is a neighborhood of x on which G eλ f is a cutter. Proof. Because min f >, the function f is prox-bounded with threshold r f = +. Since min f = min e λ f and argmin f = argmin e λ f, the assumption min f = 0 implies lev 0 f = lev 0 e λ f = argmin f. Because f is prox-regular at x for v = 0, and r f = +, by Proposition 7.6 for every λ > 0 there exists δ > δ 1 > 0 such that on B( x, δ 1 ) the proximal mappings (75) P λ f is Lipschitz continuous, P λ f ( x) = x, and (76) e λ f = λ 1 (Id Prox λ f ). By (75) there exists δ 1 > δ 2 > 0 such that (77) P λ f (x) B( x, δ 1 ) when x B( x, δ 2 ). Claim 1. For every u Fix G f,s B( x, δ 2 ) we have (78) f (Prox λ f (x)) + λ 1 (x Prox λ f (x)), u Prox λ f (x) 0 if e λ f (x) > 0, e λ f (x) = 0 and x B( x, δ 2 ). Indeed, let e λ f (x) > 0 and 0 = e λ f (x) and x B( x, δ 2 ). In view of (76), e λ f (x) = λ 1 (x Prox λ f (x)) = 0. By the definition of Prox λ f or [45, Proposition 4.3(b)], we have (79) 0 = λ 1 (x Prox λ f (x)) f (Prox λ f (x)). This implies that Prox λ f (x) argmin f. Suppose to the contrary that Prox λ f (x) argmin f. Then the assumption (ii) and (77) give f (Prox λ f (x)) = 0} which contradicts (166). Thus, (80) f (Prox λ f (x)) > 0. Because Prox λ f (x) argmin f and (77), the assumption (ii) also ensures (81) 0 f (Prox λ f (x)). Therefore, (78) follows from assumptions (i) and (ii). Claim 2. G eλ f is a cutter on B( x, δ 2 ). To this end, let u Fix G eλ f B( x, δ 2 ), x B( x, δ 2 ), e λ f (x) > 0, and e λ f (x) = 0. Then u Fix G f B( x, δ 2 ), f (Prox λ f (x)) > 0, 0 f (Prox λ f (x)) by (80), (81). Using (78) we calculate e λ f (x) + λ 1 (x Prox λ f (x)), u x = e λ f (x) + λ 1 (x Prox λ f (x)), u Prox λ f (x) + λ 1 (x Prox λ f (x)), Prox λ f (x) x = f (Prox λ f (x)) + 1 2λ x Prox λ f (x) 2 + λ 1 (x Prox λ f (x)), u Prox λ f (x) 42

43 λ 1 x Prox λ f (x)) 2 = f (Prox λ f (x)) + λ 1 (x Prox λ f (x)), u Prox λ f (x) 1 2λ x Prox λ f (x)) 2 0. Hence, G eλ f is a cutter on B( x, δ 2 ) by Theorem 5.11(ii). 1 2λ x Prox λ f (x)) 2 Is it possible that G eλ f is a cutter for every λ > 0 but G f is not a cutter? This is partially answered by the following result. Proposition 7.24 Let f : R n R be C 2 and prox-bounded below. If G eλ f is a cutter for all sufficiently small λ > 0, then G f is a cutter. Proof. Let λ > 0 be sufficiently small. Proposition 7.8 yields that e λ f is C 1+. Write S := x R n 0 f (x) } lev 0 f, S λ := x R n e λ f (x) = 0 } lev 0 e λ f. Using that e λ f (x) = 0 0 f (x) by Proposition 7.16 and that e λ f f, we have S S λ. Since G eλ f is a cutter, by Theorem 5.11 we obtain S λ u R n eλ f (x) + e λ f (x), u x 0 } whenever e λ f (x) > 0 and e λ f (x) = 0. It follows that (82) S u R n e λ f (x) + e λ f (x), u x 0 } whenever e λ f (x) > 0 and e λ f (x) = 0. By [3, Theorem 3.10] or [32, Theorem 5.1], (83) f (x) = lim sup e λm f (x m ) m in which x m x, e λm f (x m ) f (x), λ m 0. Whenever f (x) > 0, f (x) = 0, (83) implies that for sufficiently large m, it holds that e λm f (x m ) > 0 and e λm f (x m ) = 0. Then by (82), Passing to the limit when m, we have ( u S) e λm f (x m ) + e λm f (x m ), u x m 0. ( u S) f (x) + f (x), u x 0. Hence, G f is a cutter by using Theorem 5.11 again. 7.4 The subgradient projector of d C when C is prox-regular at a point In this subsection, instead of functions we shall consider sets which are prox-regular at some points. Recall that a set C R n is prox-regular at x C for v N C ( x) when ι C is prox-regular at x for v; see [51, Exercise 13.31]. Example 7.25 Let C R n be closed and x C. If C is prox-regular at x for v = 0, then there exists a neighborhood U of x on which 43

44 (i) P C is single-valued and Lipschitz; (ii) P C = (Id +T) 1 for some localization T of N C around ( x, 0); (iii) d C is strictly differentiable on U \ C with d C = Id P C d C ; (iv) G dc = P C ; (v) G d 2 C = Id +P C 2. Proof. (i), (ii), and (iii) are given in [51, page 618]. To see (iv), let x U \ C. Since d C (x) > 0 and d C (x) = x P C(x) d C (x) = 0, we have G dc (x) = x d C(x) d C (x) 2 d C(x) = x (x P C (x)) = P C (x). When x U C, G dc (x) = x = P C (x). (v) follows from (iv) and Theorem 3.9. Remark 7.26 Sets which satisfy the assumption on C in Theorem 7.25 include convex sets, strongly amenable sets, etc; see, e.g., [51, page 442]. See also [1] for recent advances on proxregular sets and uniformly prox-regular sets. According to Example 7.25(iv), when C is prox-regular at x for v = 0, we have G dc = P C around a neighborhood of x. What happens if, in addition, G dc is a cutter or quasi nonexpansive on the neighborhood? Proposition 7.27 Let C R n be closed and x C, and let C be prox-regular at x for v = 0. Suppose that there exists δ > 0 such that one of the following holds: (i) P C is a cutter on B( x, δ), i.e., (84) ( x B( x, δ))( u C B( x, δ)) x P C (x), u P C (x) 0. (ii) P C is quasi-ne on B( x, δ), i.e., (85) ( x B( x, δ))( u C B( x, δ)) P C (x) u x u. Then C B( x, δ) is convex. Proof. By Example 7.25, there exists δ > 0 such that P C is single-valued and Lipschitz on the closed ball B( x, δ). (i). Assume that (84) holds. On the one hand, (84) gives that C B( x, δ) x B( x,δ) u B( x, δ) x P C (x), u P C (x) 0 }. 44

45 On the other hand, let y x B( x,δ) u B( x, δ) x P C (x), u P C (x) 0 }. Then y B( x, δ) and x P C (x), y P C (x) 0 for every x B( x, δ). Taking x = y we have y P C (y), y P C (y) 0, which implies y = P C (y), so y C. Therefore, y B( x, δ) C. Hence C B( x, δ) = u B( x, δ) x P C (x), u P C (x) 0 }, x B( x,δ) and consequently C B( x, δ) is a convex set. (ii). Similar arguments as (i) show that C B( x, δ) = u B( x, δ) P C x u x u }. x B( x,δ) To finish the proof, it suffices to observe that in the Euclidean space R n, for every x, y R n the set u R n y u x u } is a half space when x = y, and the whole space R n if x = y. Proposition 7.28 Let C R n be closed and x C. If there exists δ > 0 such that C B( x, δ) is convex, then there exists δ 1 > 0 such that P C is a cutter on B( x, δ 1 ). Consequently, P C is a quasi-ne on B( x, δ 1 ). Proof. Observe that for p P C (x), p x p x + x x 2 x x. Thus, we can choose 0 < δ 1 < δ sufficiently small, e.g., δ 1 < δ/2, such that x x < δ 1 implies ( p P C (x)) p x < δ. This implies that whenever x B( x, δ 1 ), we have P C (x) C B( x, δ). Then (86) ( x B( x, δ 1 )) P C (x) = P C B( x,δ) (x). Because C B( x, δ) is closed and convex, P C B( x,δ) is firmly nonexpansive on R n. From (86), we have that P C is firmly nonexpansive on B( x, δ 1 ), that is, It follows that ( x, y B( x, δ 1 )) P C (x) P C (y) 2 + (Id P C )(x) (Id P C )(y) 2 x y 2. ( x B( x, δ 1 ))( y C B( x, δ 1 )) P C (x) y 2 + x P C (x) 2 x y 2. Hence P C is a cutter on B( x, δ 1 ). 8 Characterization of subgradient projectors of convex functions Subgradient projectors of convex functions are quasi-fne, so algorithms developed in [20] or [7] can be applied; see also Theorem Therefore, in practice, it is useful to have available some results on whether a mapping is a subgradient projector of a convex function. This is the goal of this section. The results in this section provide some checkable conditions for convergence of iterated subgradient projectors in Section 6. The following result is of independent interest. 45

46 Proposition 8.1 Let C R n be closed and convex. Assume that the function f : R n \ C R satisfies (i) f 0 on R n \ C; (ii) f is convex on every convex subsets of R n \ C; (iii) Whenever x bdry(r n \ C), one has lim y x f (y) = 0. That is, lim i f (y i ) = 0 whenever y R n \C (y i ) i N is a sequence in R n \ C converging to a boundary point x of R n \ C. Define Then g is convex on R n. g : R n R : x f (x) if x C, 0 if x C. Proof. Let x, y R n, 0 λ 1. We need to show (87) g(λx + (1 λ)y) λg(x) + (1 λ)g(y). We consider three cases. (i). If [x, y] R n \ C, g = f is convex on [x, y] by the assumption. (ii). If λx + (1 λ)y C, then since g(x), g(y) 0. g(λx + (1 λ)y) = 0 λg(x) + (1 λ)g(y) (iii). λx + (1 λ)y C and [x, y] C =. In particular, x, y cannot both be in C. We consider two subcases. Subcase 1. x C and y C. As y C, there exists z bdry(c) such that Because and f is convex on [z, y], we have λx + (1 λ)y [z, y] X \ C and f (z) = 0. λx + (1 λ)y = αz + (1 α)y for some 0 α 1, (88) f (λx + (1 λ)y) = f (αz + (1 α)y) α f (z) + (1 α) f (y) = (1 α) f (y). Now z = βx + (1 β)y for some 0 β 1, and λx + (1 λ)y = αz + (1 α)y = α(βx + (1 β)y) + (1 α)y = (αβ)x + (1 αβ)y give λ = αβ. Therefore, by (88), g(x) = 0 and g(y) = f (y) 0, (89) (90) g(λx + (1 λ)y) = f (λx + (1 λ)y) (1 αβ) f (y) = (1 λ)g(y) + λg(x), 46

47 which is (87). Subcase 2. x C and y C. By the assumption, there exists z bdry(c) such that λx + (1 λ)y [z, y] or λx + (1 λ)y [x, z], say λx + (1 λ)y [z, y]. Then λx + (1 λ)y = αz + (1 α)y for some 0 α 1. As f is convex on [z, y], f (z) = 0, (91) g(λx + (1 λ)y) = f (αz + (1 α)y) α f (z) + (1 α) f (y) = (1 α) f (y). Now z = βx + (1 β)y for some 0 β 1, and λx + (1 λ)y = αz + (1 α)y = α(βx + (1 β)y) + (1 α)y = (αβ)x + (1 αβ)y give λ = αβ. Then by (91), using g(x) = f (x) 0, g(y) = f (y) 0, we obtain (92) (93) (94) g(λx + (1 λ)y) (1 αβ) f (y) = (1 λ) f (y) (1 λ) f (y) + λ f (x) = (1 λ)g(y) + λg(x), which is (87). Combining (i) (iii), we conclude that g is convex on R n. Theorem 8.2 Let T : R n R n and C := x R n Tx = x } be closed convex. Then T is a subgradient projector of a convex function f : R n R with lev 0 f = C if and only if there exists g : R n [, + ) such that g : R n \ C R is locally Lipschitz, g(x) = for every x C, and (i) for every x R n \ C, x Tx x Tx 2 g(x); (ii) the function defined by f (x) := exp(g(x)) if x C, 0 if x C, is convex. In this case, T = G f. Proof. : Assume that T is a subgradient projector, say T = G f1 with f 1 : R n R being convex and lev 0 f 1 = C. Then f = max0, f 1 } is convex and G f = G f1. Put g = ln f and C = lev 0 f. Since f is locally Lipschitz, g is locally Lipschitz on R n \ C. Note that g(x) = ( f (x))/ f (x) when f (x) > 0. Apply Theorem 4.1(i) to obtain (i). : Assume that (i), (ii) hold. When x C, (i) and (ii) give x Tx = 1 c(x), f (x) g(x) = f (x) 47

48 where c(x) g(x). Using (i) again, we have (95) Tx = x x Tx 2 c(x) = x c(x) c(x) 2 = G f (x) by Theorem 4.1(ii). Moreover, when x C, Tx = x = G f (x). Hence T = G f. For an n n symmetric matrix A, by A 0 we mean that A is positive semidefinite. Theorem 8.3 Let T : R n R n and C := x R n Tx = x }. Suppose that C is closed and convex, and T is continuously differentiable on R n \ C. Define T 1 : R n \ C R n : x x Tx x Tx 2. Then T is a subgradient projector of a convex function f : R n R with lev 0 f = C and being differentiable on R n \ C if and only if (i) For every x R n \ C, the matrix T 1 (x)(t 1 (x)) + T 1 (x) 0; (ii) There exists a function g : R n [, + ) such that ( x bdry(c)) ( x R n \ C) g(x) = T 1 (x), lim y x g(y) =, and ( x C) g(x) =. y R n \C Proof. : Assume that T = G f with f being convex and lev 0 f = C. Theorem 4.10 shows that f is continuously differentiable on R n \ C. By Theorem 4.1(i), we can put g = ln f to obtain (ii). Moreover, as f = exp(g), thanks to (16) in Theorem 4.1, for every x C we have f (x) = e g(x) g(x) = e g(x) T 1 (x), 2 f (x) = e g(x) T 1 (x)(t 1 (x)) + e g(x) T 1 (x) = e g(x)( T 1 (x)(t 1 (x)) + T 1 (x) ). Since f is convex, 2 f (x) 0, and this is equivalent to which is (i). T 1 (x)(t 1 (x)) + T 1 (x) 0 : Assume that (i) and (ii) hold. Put f = exp(g). Then lev 0 f = C, and for x R n \ C, f (x) = e g(x) g(x) = e g(x) T 1 (x), 2 f (x) = e g(x) T 1 (x)(t 1 (x)) + e g(x) T 1 (x) = e g(x)( T 1 (x)(t 1 (x)) + T 1 (x) ). (i) and (ii) imply that f is differentiable and convex on convex subsets of R n \ C, and f 0 on C. By Proposition 8.1, f is convex on R n. Moreover, when x = Tx we have (96) (97) G f (x) = x = x ( ( ) 2 f (x) f (x) = x T 1(x) f (x) f (x) T 1 (x) 2 x Tx x Tx 2 1 x Tx ) 2 = x (x Tx) = Tx. 48

49 Corollary 8.4 Let T : R R and C := x R Tx = x }. Suppose that C is a closed interval, and T is continuously differentiable on R \ C. Then T is a subgradient projector of a convex function f : R R with lev 0 f = C and being differentiable on R \ C if and only if (i) T is monotonically increasing on convex subsets of R \ C; (ii) The function g(x) = x a 1 s Ts ds satisfies lim x sup(c) g(x) = for some a > sup(c); and lim x inf(c) g(x) = for some a < inf(c). Proof. Define n : R R : x x Tx. Then for every x C, T 1 (x) = 1 equivalent to 1 n 2 (x) n (x) n 2 (x) 0. This is the same as n (x) 1, which transpires to T (x) 0. n(x). Theorem 8.3(i) is Remark 8.5 Let f : R n R be continuously differentiable, lev 0 f = Fix T, and G f = T. Can one use T to decide whether f is convex? The proof of Theorem 8.3 implies that f = f T 1 where If T 1 : R n \ lev 0 f R n : x x Tx x Tx 2. (98) f T 1 is monotone on convex subsets of R n \ lev 0 f, then f is convex on convex subsets of R n \ lev 0 f. Using Proposition 8.1, we conclude that max0, f } is convex on R n. When T is continuously differentiable, (98) is equivalent to (99) ( x R n \ C) T 1 (x)(t 1 (x)) + T 1 (x) 0. On R, (99) is equivalent to (100) T is monotonically increasing on convex subsets of R \ C. Corollary 8.6 Let T : R R and C := x R Tx = x }. Let C be a closed interval, and T be continuously differentiable on R \ C. Define Suppose that (i) N is nonexpansive; (ii) The function N : R R : x x Tx. g(x) = x a 1 s Ts ds satisfies lim x sup(c) g(x) = for some a > sup(c); and lim x inf(c) g(x) = for some a < inf(c). 49

50 Then T is a subgradient projector of a convex function f : R R with lev 0 f = C and being differentiable on R \ C. In particular, the assumption (i) holds when T is firmly nonexpansive. Proof. It suffices to observe that T = Id N. Since N is nonexpansive, T is monotone. Also note that T is firmly nonexpansive if and only if N is. We illustrate Corollary 8.4 with three examples. They demonstrate that both conditions (i) and (ii) in Corollary 8.4 are needed. More precisely, (i) is for the convexity of f ; (ii) is for lev 0 f = C. Example 8.7 Define T : R R by T(x) := x x + xe 2 x if x > 0, 0 if x 0. Then T is a subgradient projector of the nonconvex function e 2 x 1 if x > 0, f : R R : x 0 if x 0. In this case, T fails to be monotone, but T verifies condition (ii) of Corollary 8.4. Proof. When x > 0, f (x) = e 2 x x 1/2, so that f (x) = e2 x (1 1/(2 x)). x Since f (x) < 0 when x < 1/4, f is not convex on R. Now we show that (i). T fails to be monotone. This is equivalent to verify that for some x we have N (x) > 1 where N(x) = x Tx. Indeed, L Hospital s rule gives ( x x N (x) = e 2 x ) = 1 e 2 x 1 2 e 2 x x + x e 2. e 2 x 1 lim x 0 + e 2 x x = 2, so lim x 0 + N (x) = 2. Therefore, T is not monotone. (ii). T satisfies condition (ii) of Corollary 8.4. For x > 0, With a > 0, we have g(x) = x a 1 N(s) ds = x a N(x) = e2 x 1 e 2 x x. 1/2 e 2 x x 1/2 e 2 x 1 dx = x ln(e2 1) ln(e 2 a 1). Clearly, lim x 0 + g(x) =. Hence (ii) holds. 50

51 Example 8.8 Define T : R R : x x 1 2x if x = 0, 0 if x = 0. Then T = G f where f : R R : x e x2. However, lev 0 f = but Fix(T) = 0}. In this case, in Corollary 8.4 condition (i) holds but condition (ii) fails. Proof. We have N(x) = x T(x) = 1 2x and N (x) = 1 2x 2. Therefore, T is monotone on (0, + ) and (, 0). This says that condition (i) of Corollary 8.4 holds. However, when a > 0, for x > 0 we have g(x) = x a 1 x N(x) dx = 2xdx = x 2 a 2. a Then lim x 0 + g(x) = a 2, so condition (ii) of Corollary 8.4 fails. Example 8.9 Define T : R R by Then T = G f where the nonconvex function x x if x > 0, x 0 if x = 0, x x if x < 0. f : R R : x e 2 x if x 0, e 2 x if x < 0. However, lev 0 f = but Fix T = 0}. In this case, both conditions (i) and (ii) in Corollary 8.4 fail. Proof. The function f (x) = e 2 x is nonconvex on [0, + ), see Example 8.7. G f = T follows by direct calculations. Condition (i) of Corollary 8.4 fails: T is not monotonically increasing on [0, + ) since T (x) = < 0 when x > 0 is sufficiently near 0. x Condition (ii) of Corollary 8.4 fails. Indeed, N(x) = x T(x) = x when x 0. When a > 0, for x > 0 we have x 1 g(x) = ds = 2 x 2 a, s so that lim x 0 + g(x) = 2 a. a For further properties of subgradient projectors of convex functions, we refer the reader to [44, 54, 12]. This completes Part II. We will investigate conditions under which a subgradient projector is linear in part III. 51

52 Part III Linear subgradient projectors 9 Characterizations of G f,s when G f,s is linear We shall see in this section that under appropriate conditions a linear operator is a subgradient projector of a convex function if and only if it is a convex combination of the identity operator and a projection operator on a subspace (Theorems 9.6 and 9.11). For subgradient projectors of convex functions, see [12, 44, 9, 46, 47, 48]. We begin with 9.1 Linear cutters are precisely linear firmly nonexpansive mappings Proposition 9.1 Let H be a Hilbert space, and T : H H be a linear operator. Then the following are equivalent: (i) T is a cutter, i.e., quasi-firmly nonexpansive. (ii) T is firmly nonexpansive. (iii) There exists δ > 0 and x Fix T such that T is a cutter on B( x, δ), i.e., a local cutter. Proof. (i) (ii). Assume that T is a cutter. Then for every x X and u Fix T, x Tx, u Tx = Tx x, Tx u 0. Put u = 0. We have Tx x, Tx 0 0 Tx 2 x, Tx. Hence T is firmly nonexpansive, see [7, Corollary 4.3]. (ii) (i). Assume that T is firmly nonexpansive. Let u Fix T. Then Tu = u and (101) (102) (103) (104) Tx x, Tx u = Tx x, Tx Tu = Tx Tu + Tu x, Tx Tu = Tx Tu 2 + Tu x, Tx Tu = Tx Tu 2 x u, Tx Tu 0. Hence T is a cutter. (iii) (ii). By the assumption ( x B( x, δ))( u B( x, δ) Fix T) x Tx, u Tx 0. As T x = x, and T is linear, for x = x + v with v δ, we have 0 x Tx, x Tx = Tx x, Tx x = T(x x) (x x), T(x x) = Tv 2 v, Tv. Since T is linear, we have Tx 2 x, Tx for every x X, so T is firmly nonexpansive, see [7, Corollary 4.3]. 52

53 Since (i) (ii), and (i) implies (iii), the proof is done. The following example says that Proposition 9.1 fails if T is not linear. Example 9.2 Define the continuous nonlinear mapping x/2 if 2 x 2, 3 x if 2 x 3, T : R R : x (3 + x) if 3 x 2, 0 otherwise. Then T is a cutter, nonexpansive, but not firmly nonexpansive as T is not monotone; cf. [7, Proposition 4.2(iv)]. Indeed, Fix T = 0}. This means that T is a cutter if and only if (T(x)) 2 xt(x). When 2 x 2, we have (T(x)) 2 = x2 4 x x 2 = xt(x); When 2 x 3, (T(x)) 2 = (3 x) 2 = (3 x)(3 x) x(3 x); when 3 x 2, when x > 3, (T(x)) 2 = [ (x + 3)] 2 = [ (x + 3)][ (x + 3)] x[ (3 + x)]; (T(x)) 2 = 0 = xt(x). Hence T is a cutter. Clearly, T is nonexpansive. As T is not monotone, we conclude that T is not firmly nonexpansive. Remark 9.3 Observe that Example 9.2 is much simpler than the example on R 2 constructed by Cegielski [20, Example 2.2.8, page 68]. 9.2 Subgradient projector of powers of a quadratic function It is natural to investigate subgradient projectors of quadratic functions or their variants first. In the following result, we assume B = 0 because that B = 0 gives G f = Id with f 0. Theorem 9.4 Let a > 0 and B = 0 being an n n symmetric and positive semidefinite matrix. Consider the function f : R n R : x (x Bx) 1/(2a). Then the following hold: (i) lev 0 f = x R n Bx = 0 }. (ii) We have G f (x) = x a x Bx Bx 2 Bx if Bx = 0, x if Bx = 0. 53

54 (iii) G f is linear if and only if B = λp L where λ > 0 and L X is a subspace. In this case ker B = L, f (x) = λ 1/(2a)( d L (x) ) 1/a and G f = Id ap L = (1 a) Id +ap L. (iv) Assume that G f is linear. Then G f is a cutter if and only if 0 < a 1. Proof. (i). Since B is symmetric and positive semidefinite, there exists a matrix A such that B = A A; see, e.g., [38, page 558]. Then Ax = 0 Bx = 0. The result follows because f (x) = Ax 1/a. (ii). G f follows from direct calculations. (iii). : Assume that G f is linear. The mapping x T 1 (x) := a 1( x G f (x) ) = x Bx Bx 2 Bx if Bx = 0, 0 if Bx = 0, is linear. Let λ 1, λ 2 > 0 be any two eigenvalues of B. We show that λ 1 = λ 2. Suppose that λ 1 = λ 2. Take unit length eigenvector v i associated with λ i. Note that v 1, v 2 = 0, Bv i = 0 and B(v 1 + v 2 ) = λ 1 v 1 + λ 2 v 2 = 0. As T 1 is linear, we have T 1 (v 1 + v 2 ) = T 1 v 1 + T 1 v 2. Now (105) (106) (107) (108) (109) (110) T 1 (v 1 + v 2 ) = (v 1 + v 2 ) B(v 1 + v 2 ) B(v 1 + v 2 ) 2 B(v 1 + v 2 ) = (v 1 + v 2 ) (λ 1 v 1 + λ 2 v 2 ) λ 1 v 1 + λ 2 v 2 2 (λ 1 v 1 + λ 2 v 2 ) = λ 1 + λ 2 λ (λ 1 v 1 + λ 2 v 2 ), λ2 2 T 1 v 1 + T 1 v 2 = v 1 Bv 1 Bv 1 2 Bv 1 + v 2 Bv 2 Bv 2 2 Bv 2 = λ 1 v 1 2 λ 1 v 1 2 λ 1v 1 + λ 2 v 2 2 λ 2 v 2 2 λ 2v 2 = v 1 + v 2. As v 1, v 2 } are linearly independent, the above gives λ 1 = λ 2 which contradicts λ 1 = λ 2. Therefore, all positive eigenvalues of B have to be equal. Hence, we have ( ) B = λu Id 0 U 0 0 where U is an orthogonal matrix, λ > 0, Id is an m m identity matrix with m = rank B. The matrix ( ) U Id 0 U 0 0 is idempotent and symmetric, so it is a matrix associated with an orthogonal projection onto a closed subspace, say P L, [38, page 430, page 433]. Hence B = λp L 54

55 which implies that Bx = 0 if and only if P L x = 0, i.e., ker B = L. Then when P L x = 0, T 1 (x) = x Bx Bx 2 Bx = λx P L x x λp L λp L x λp Lx = λx P L x λ 2 x P L x λp Lx = P L x; when P L x = 0, T 1 x = 0 = P L x. Hence T 1 = P L. It follows that G f = Id at 1 = Id ap L = (1 a) Id +a(id P L ) = (1 a) Id +ap L. We proceed to find the expression for f (x): (111) (112) (113) (114) f (x) = (x Bx) 1/(2a) = (x λp L x) 1/(2a) = λ 1/(2a) (x P L P L x) 1/(2a) = λ 1/(2a) ( P L x 2 ) 1/(2a) = λ 1/(2a) ( x P L x 2 ) 1/(2a) = λ 1/(2a) (d L (x) 2 ) 1/(2a) = λ 1/(2a)( d L (x) ) 1/a. : Assume that B = λp L for λ > 0 and some subspace L R n. The assumption gives f (x) = λ 1/(2a)( d L (x) ) 1/a. By Proposition 3.4(i), G f = G( ) 1/a. By Theorem 3.9, G f = (1 a) Id +ag dl. By Fact 2.8, d L Hence G f is linear. G f = (1 a) Id +ap L. (iv). : Assume that G f is linear and a cutter. By Fact 9.1, G f is firmly nonexpansive, so is Id G f. By (ii) Id G f = ap L, ap L has to be nonexpansive. Because B = 0, we have L = 0}. Take 0 = x L. The nonexpansiveness requires so that a 1. ap L x ap L 0 = ax x : Assume that 0 < a 1. Since x (x Bx) 1/2 is convex, and the function [0, + ) t t 1/a is convex and increasing when 0 < a 1, we have that x f (x) = ( (x Bx) 1/2) 1/a is convex. Then G f is a cutter by Fact We illustrate Theorem 9.4(iv) with the following example. Example 9.5 Let a > 1. Consider f : R n R : x (x x) 1/(2a) = x 1/a. Then f is not convex, and G f (x) = (1 a)x for every x R n. Although G f is linear, it is not a cutter since it is not monotone; see, e.g., Proposition

56 9.3 Symmetric and linear subgradient projectors The following result completely characterizes symmetric and linear subgradient projectors. Theorem 9.6 Assume that T : R n R n is linear and symmetric. Then the following are equivalent: (i) T is a subgradient projector of a convex function f : R n R with lev 0 f =. (ii) T = G f where f : R n R is given by (115) f (x) = K(x P L x) 1/(2λ) = K ( d L (x) ) 1/λ where 0 < λ 1, K > 0, and L R n is a subspace such that L G f = (1 λ) Id +λp L. = Fix T. In this case, Proof. (i) (ii). Assume that T = G f for some convex function. Since T is linear and a cutter, T is firmly nonexpansive by Proposition 9.1. Then T 1 = Id T is firmly nonexpansive by [7, Proposition 4.2]. We consider two cases. Case 1. int lev 0 f =. We have T 1 0 on an open set B(x 0, ε) lev 0 f, i.e., T 1 (x 0 + b) = 0 for every b < ε. As T 1 is linear, T 1 (b) = T 1 (x 0 + b) T 1 (x 0 ) = 0 0 = 0 when b < ε, so T 1 0 on R n. Thus, T = Id on R n. Then T = G f with f 0. This means that (ii) holds with L = 0}, λ = 1 and K > 0. Case 2. int lev 0 f =. Since lev 0 f is a proper subspace, it is an intersection of a finite collection of hyper-planes [50, Corollary 1.4.1], so R n \ lev 0 f is union of a finite collection of open half spaces. As T 1 is continuous, we only need to consider Then T 1 (x) = T 1 (x) = f (x) f (x) when f (x) > 0. f (x) 2 f (x) f (x) and f (x) f (x) = T 1(x) T 1 (x) 2. Since T is symmetric, T 1 is symmetric, so there exists an orthogonal matrix Q such that Q T 1 Q = D where D is an diagonal matrix and Q denotes the transpose of Q. Put g = ln f and x = Qy. When y Q (Fix T), we have ( g)(qy) = T 1Qy T 1 Qy 2. Multiplying both sides by Q and using Q being an isometry (i.e., Q z = z for every z R n ) give Q ( g)(qy) = Q T 1 Qy T 1 Qy 2 = Dy Q T 1 Qy 2 = Dy Dy 2. If we put h = g Q, then h(y) = Q g(qy) for every y R n \ (Q Fix T), so ( y R n \ (Q Fix T)) h(y) = Dy Dy 2. 56

57 Moreover, R n \ (Q Fix T) is a finite union of open half spaces, because Q Fix T is a proper subspace of R n. Write λ λ 2 0 D = λ n When λ 1 = = λ n = 0, this is covered in Case 1. We thus assume that T 1 0. As T 1 is monotone, we can and do assume that λ 1,, λ m > 0 and λ m+1 = = λ n = 0. Then ( ) λ h(y) = 1 y 1 λ m y m,,, 0,, 0. m k=1 λ2 k y2 k m k=1 λ2 k y2 k Since h has continuous second order derivatives on the nonempty open R n \ (Q Fix T), it must hold that 2 h = 2 h y i y j y j y i which gives (116) 2λ j λ 2 i y iy j m k=1 λ2 k y2 k = 2λ iλ 2 j y iy j m k=1 λ2 k y2 k when 1 i, j m, i = j. As int lev 0 f = int Fix T =, (116) holds on the nonempty open R n \ (Q Fix T), so we have λ i = λ j. Because 1 i, j m were arbitrary, we obtain that λ 1 = = λ m. Hence T 1 = Q ( ) λ Idm 0 Q = λq 0 0 ( ) Idm 0 Q = λp 0 0 L where L R n is a linear subspace; see [38, page 430]. More precisely, T 1 is a positive multiple of an orthogonal projector with (117) Fix T = ker T 1 = L. Now T 1 is firmly nonexpansive and T 1 = T 1, this implies that T 1 + T 1 2 T 1 T 1 = T 1 T 2 1 = Q ( (λ λ 2 ) Id 0 0 is positive semidefinite, so 0 λ 1. Because T 1 = 0 in this case, we obtain 0 < λ 1. Therefore, when x Fix T, Note that P L = P L, P 2 L = P L, ln f (x) = T 1x T 1 x 2 = λp Lx λp L x 2 = 1 λ ln P L x = 1 P L x P Lx = 1 P L x P L ) P L x P L x 2. P L x P L x = Q P Lx P L x 2. It follows that ln f (x) = 1 λ ln P Lx = ln P L x 1/λ. 57

58 On each connected and open component of R n \ Fix T, this is equivalent to ln f (x) = ln P L x 1/λ + c for some constant c R. Taking exp both sides gives (118) f (x) = K P L x 1/λ = K( P L x 2 ) 1/(2λ) = K(x P L x) 1/(2λ) where K = exp(c) > 0. As P L = Id P L, we obtain Moreover, f (x) = K x P x 1/λ = K(d L (x)) 1/λ. T = G f = Id T 1 = Id λp L = Id λ(id P L ) = (1 λ) Id +λp L where L = Fix T by (117). One can apply the same argument on each connected and open component of R n \ Fix T, while one might have different constant K s in (118), but λ will be the same. Indeed, suppose that there exist 0 < λ, λ 1 1, λ = λ 1 such that (1 λ) Id +λp Fix T = (1 λ 1 ) Id +λ 1 P Fix T. Then P Fix T = Id so that Fix T = R n, which contradicts that int Fix T =. Using the same K > 0 for all connected and open component of R n \ Fix T, one obtains (115). (ii) (i). Clear. Theorem 9.6 is proved under the assumption that the linear subgradient projector of a convex function is symmetric. We think that the assumption of symmetry is superfluous; cf. Theorem Conjecture 9.7 If f : R n R is convex and its subgradient projector G f,s is linear, then G f,s must be symmetric. Note that when f is not convex, G f,s can be nonsymmetric; see Corollary 11.6(ii). 9.4 Characterization of linear subgradient projectors In subsection 9.3, we assume that the linear operator is symmetric. What happens if the linear operator is not symmetric? For this purpose we need the following result. Proposition 9.8 Let M : R n R n be linear, monotone and ( x R n \ ker M) h(x) = Mx Mx 2 where the function h : R n \ ker M R. If dim ran M = 2, then M is symmetric. Proof. If dim ran M = 0, then M = 0, so it is symmetric. Let us assume that dim ran M > 0 and dim ran M = 2. Since h has continuous mixed second order derivatives at x whenever Mx = 0, the Hessian matrix 2 h(x) is symmetric. As 2 h(x) = Mx 2 M Mx( Mx 2 ) Mx 4 = Mx 2 M 2Mxx M M Mx 4, 58

59 the symmetric property means that Mx 2 M 2Mxx M M = ( Mx 2 M 2Mxx M M) = M Mx 2 2M Mxx M whenever Mx = 0. Put y = Mx. The above is simplified to M M 2 = yy yy M M y 2 y 2. Denote the projection operator on the line spanned by y}, span(y), by P y := yy y 2. We have (119) ( y ran M) M M 2 = P y M M P y. Since M is monotone, ran M = ran M ; see, e.g., [14, Theorem 3.2]. Let e i i = 1,..., m } be an orthonormal basis of ran M. Then (120) P ran M = Note that m e i ei. i=1 (121) M P ran M = M P ran M = M (P ran M + P (ran M ) ) = M because M P (ran M ) = 0. To see this, let y (ran M ). For every z R n, M y, z = y, Mz = 0 because Mz ran M = ran M. Because z R n was arbitrary, we must have M y = 0. Since (122) M M 2 = P ei M M P ei by (119), summing up (122) from from i = 1 to i = m, followed by using (120) and (121), we obtain m 2 (M M ) = ( m i=1 P ei )M M ( m i=1 P ei ) = P ran M M M P ran M = M M, that is, ( m 2 1)(M M ) = 0. Hence M M = 0 because m = 2, and so M is symmetric. The proof of Proposition 9.8 requiring dim ran M = 2 seems bizarre. However, the following examples show that Proposition 9.8 fails when dim ran M = 2. Example 9.9 When dim ran M = 2, although M : R 2 R 2 is linear, monotone and one cannot guarantee that M is symmetric. To see this, let x = (x, y) R 2. ( x R n \ ker M) h(x) = Mx Mx 2, 59

60 (1). Define M := Then M is linear, monotone, dim ran M = 2 and arctan(y/x) = whenever x = 0. However, M is not symmetric. (2). Define M := Then M is linear, firmly nonexpansive and ( ) ( ) y x x 2 + y 2 = ( ) 1/2 1/2. 1/2 1/2 ( ln(x 2 + y 2 ) ) + arctan(y/x) = 2 Mx Mx 2 ( ) x y y + x x 2 + y 2 = Mx Mx 2 whenever x = 0. However, dim ran M = 2 and M is not symmetric. Conjecture 9.10 Let M : R n R n be linear, monotone and ( x R n \ ker M) h(x) = Mx Mx 2 where the function h : R n \ ker M R. If dim ran M = 2 and exp(h) is convex on convex subsets of R n \ ker M, then M is symmetric. Combining Theorem 9.6 and Proposition 9.8, we obtain the following characterization of linear subgradient projectors. Theorem 9.11 Assume that T : R n R n is linear and dim ran(id T) = 2. Then the following are equivalent: (i) T is a subgradient projector of a convex function f : R n R with lev 0 f =. (ii) T = G f where f : R n R is given by f (x) := K(x P L x) 1/(2λ) = K ( d L (x) ) 1/λ where 0 < λ 1, K > 0, and L R n is a subspace such that L G f = (1 λ) Id +λp L. = Fix T. In this case, Proof. (i) (ii). Assume that T = G f for some convex function f : R n R. Then T is a cutter by Fact As T is linear, in view of Proposition 9.1, T is firmly nonexpansive, so M := Id T is firmly nonexpansive, in particular, monotone. By Theorem 4.1(i), (123) h(x) = Mx Mx 2 60

61 where h(x) = ln f (x), f (x) > 0. Since Fix T = lev 0 f = ker M, (123) is equivalent to h(x) = Mx Mx 2 when Mx = 0. Proposition 9.8 shows that M is symmetric, so is T = Id M. It suffices to apply Theorem 9.6 to obtain (ii). (ii) (i). Clear. 10 Subgradient projectors of convex functions are not closed under convex combinations and compositions A convex combination of cutters is a cutter, see [20, Corollary ] or [7, Proposition 4.34]. Convex combinations of a finite family of cutters with a common fixed point are effectively used in simultaneous cutter methods; see [20, Section 5.8], [7, Corollary 5.18]. A question that naturally arises is whether the set of subgradient projectors of convex functions is convex. Theorem 9.6 allows us to show that the answer is negative. While Theorem 10.1 works only in R 2, Theorem 10.3 works in R n with n 2. Theorem 10.1 In R 2, a convex combination of subgradient projectors of convex functions need not be a subgradient projector of a convex function. Proof. Let L := 0} R R 2 and M := x = (x 1, x 2 ) R 2 x 1 + x 2 = 0 }. Both L, M are proper linear subspaces of R 2. Define f, g : R 2 R by (124) ( x R 2 ) f (x) := K 1 ( dl (x) ) 1/λ 1, g(x) := K 2 ( dm (x) ) 1/λ 2 where 0 < λ 1 = λ 2 < 1, K 1, K 2 > 0. By Theorem 9.6, we have (125) G f = (1 λ 1 ) Id +λ 1 P L, and G g = (1 λ 2 ) Id +λ 2 P M. Now consider λ 3 G f + (1 λ 3 )G g where 0 < λ 3 < 1. Then (126) (127) (128) λ 3 G f + (1 λ 3 )G g ( =λ 3 (1 λ1 ) Id +λ 1 P L ) + (1 λ3 ) ( ) (1 λ 2 ) Id +λ 2 P M =(1 λ 2 + λ 2 λ 3 λ 1 λ 3 ) Id +λ 1 λ 3 P L + λ 2 (1 λ 3 )P M. We show that λ 3 G f + (1 λ 3 )G g is not a subgradient projector of a convex function by contradiction. Suppose that λ 3 G f + (1 λ 3 )G g is a subgradient projector of a convex function. By Theorem 9.6, there are 0 < λ < 1 and S which is a subspace of R 2 such that (129) λ 3 G f + (1 λ 3 )G g = (1 λ) Id +λp S. Therefore, we have (130) (1 λ 2 + λ 2 λ 3 λ 1 λ 3 ) Id +λ 1 λ 3 P L + λ 2 (1 λ 3 )P M = (1 λ) Id +λp S. 61

62 Naturally, the set of fixed points of left-hand side is equal to the set of fixed points of right-hand side. Thus we have (131) (132) Fix ((1 λ) Id +λp S ) = Fix ((1 λ 2 + λ 2 λ 3 λ 1 λ 3 ) Id +λ 1 λ 3 P L + λ 2 (1 λ 3 )P M ). By [7, Proposition 4.34], we have (133) Fix ((1 λ 2 + λ 2 λ 3 λ 1 λ 3 ) Id +λ 1 λ 3 P L + λ 2 (1 λ 3 )P M ) = L M. Also, (134) Fix ((1 λ) Id +λp S ) = S. Hence, using definitions of L, M, and (131)-(134), it follows that (135) (136) (137) (138) (0, 0)} = L M = Fix ((1 λ 2 + λ 2 λ 3 λ 1 λ 3 ) Id +λ 1 λ 3 P L + λ 2 (1 λ 3 )P M ) = Fix ((1 λ) Id +λp S ) = S. Therefore S = (0, 0)}, which implies S = R 2. In terms of matrices, we have ( ) ( ) ( ) 1 0 1/2 1/2 0 0 (139) P L =, P 0 0 M =, and P S =. 1/2 1/2 0 0 In particular, P L, P S are diagonal matrices, but P M is not. Hence, equation (130) is not true. Therefore, λ 3 G f + (1 λ 3 )G g is not a subgradient projector of a convex function. Our next result needs averaged mappings. Definition 10.2 (See [4], [7, Definition 4.23]) Let λ (0, 1). An operator T : R n R n is λ-averaged if there exists a nonexpansive operator N : R n R n such that T = (1 λ) Id +λn. Theorem 10.3 Let n 2, 0 < λ 1 < 1, 0 < λ 2 < 1, 0 < λ < 1. Suppose that L, M are linear subspaces of R n satisfying L = M, M = L, and that both L and M are proper linear subspaces of R n. Define f : R n R : x (d L (x)) 1/λ 1, and g : R n R : x (d M (x)) 1/λ 2. If 1 λ λ = λ 2 λ 1, then (1 λ)g f + λg g is not a subgradient projector of a convex function. Proof. By Theorem 9.6, we have (140) G f = (1 λ 1 ) Id +λ 1 P L, and G g = (1 λ 2 ) Id +λ 2 P M. Then (141) (142) (1 λ)g f + λg g =(1 λ) ((1 λ 1 ) Id +λ 1 P L ) + λ ((1 λ 2 ) Id +λ 2 P M ) 62

Our next result needs averaged mappings.

Definition 10.2 (See [4], [7, Definition 4.23]) Let $\lambda \in (0,1)$. An operator $T : \mathbb{R}^n \to \mathbb{R}^n$ is $\lambda$-averaged if there exists a nonexpansive operator $N : \mathbb{R}^n \to \mathbb{R}^n$ such that $T = (1-\lambda)\operatorname{Id} + \lambda N$.

Theorem 10.3 Let $n \geq 2$, $0 < \lambda_1 < 1$, $0 < \lambda_2 < 1$, $0 < \lambda < 1$. Suppose that $L, M$ are linear subspaces of $\mathbb{R}^n$ satisfying $L^{\perp} = M$, $M^{\perp} = L$, and that both $L$ and $M$ are proper linear subspaces of $\mathbb{R}^n$. Define $f : \mathbb{R}^n \to \mathbb{R} : x \mapsto (d_L(x))^{1/\lambda_1}$ and $g : \mathbb{R}^n \to \mathbb{R} : x \mapsto (d_M(x))^{1/\lambda_2}$. If $\frac{1-\lambda}{\lambda} \neq \frac{\lambda_2}{\lambda_1}$, then $(1-\lambda)G_f + \lambda G_g$ is not a subgradient projector of a convex function.

Proof. By Theorem 9.6, we have

(140) $G_f = (1-\lambda_1)\operatorname{Id} + \lambda_1 P_L \quad\text{and}\quad G_g = (1-\lambda_2)\operatorname{Id} + \lambda_2 P_M.$

Then

(141)-(144) $(1-\lambda)G_f + \lambda G_g = (1-\lambda)\big((1-\lambda_1)\operatorname{Id} + \lambda_1 P_L\big) + \lambda\big((1-\lambda_2)\operatorname{Id} + \lambda_2 P_M\big) = \big[(1-\lambda)(1-\lambda_1) + \lambda(1-\lambda_2)\big]\operatorname{Id} + \lambda_1(1-\lambda)P_L + \lambda\lambda_2 P_M = \beta\operatorname{Id} + (1-\beta)\big(\gamma P_L + (1-\gamma)P_M\big),$

where $\beta := (1-\lambda)(1-\lambda_1) + \lambda(1-\lambda_2)$ and $\gamma := \frac{\lambda_1(1-\lambda)}{1-(1-\lambda)(1-\lambda_1)-\lambda(1-\lambda_2)}$. We observe that $0 < \beta < 1$ and $\gamma \neq \frac{1}{2}$. Indeed,

(145) $0 < \beta = (1-\lambda)(1-\lambda_1) + \lambda(1-\lambda_2) < (1-\lambda) + \lambda = 1.$

Also,

(146)-(149) $\gamma = \frac{1}{2} \;\Leftrightarrow\; \frac{\lambda_1(1-\lambda)}{1-(1-\lambda)(1-\lambda_1)-\lambda(1-\lambda_2)} = \frac{\lambda\lambda_2}{1-(1-\lambda)(1-\lambda_1)-\lambda(1-\lambda_2)} \;\Leftrightarrow\; \lambda_1(1-\lambda) = \lambda\lambda_2 \;\Leftrightarrow\; \frac{1-\lambda}{\lambda} = \frac{\lambda_2}{\lambda_1}.$

By the assumption $\frac{1-\lambda}{\lambda} \neq \frac{\lambda_2}{\lambda_1}$, so $\gamma \neq \frac{1}{2}$.

Since $G_f$ and $G_g$ are linear and symmetric, so is $(1-\lambda)G_f + \lambda G_g$. We show that $(1-\lambda)G_f + \lambda G_g$ is not a subgradient projector of a convex function by contradiction. If $(1-\lambda)G_f + \lambda G_g$ is a subgradient projector of a convex function, by Theorem 9.6 we have

(150) $(1-\lambda)G_f + \lambda G_g = (1-\alpha)\operatorname{Id} + \alpha P_S,$

where $0 < \alpha < 1$ and $S$ is a subspace of $\mathbb{R}^n$. Note that $G_f$ and $G_g$ are averaged mappings, and so is $(1-\lambda)G_f + \lambda G_g$. Because

(151) $\operatorname{Fix} G_f = L \quad\text{and}\quad \operatorname{Fix} G_g = M,$

by [7, Proposition 4.34] we obtain

(152) $\operatorname{Fix}\big((1-\lambda)G_f + \lambda G_g\big) = \operatorname{Fix} G_f \cap \operatorname{Fix} G_g = L \cap M = \{0\}.$

Because $\operatorname{Fix}\big((1-\alpha)\operatorname{Id} + \alpha P_S\big) = S$, using (150) and (152) we obtain $S = \{0\}$. Therefore, in view of equation (150), we have

(153) $(1-\lambda)G_f + \lambda G_g = (1-\alpha)\operatorname{Id}.$

Combining (144) and (153) gives

(154) $\beta\operatorname{Id} + (1-\beta)\big(\gamma P_L + (1-\gamma)P_M\big) = (1-\alpha)\operatorname{Id}.$

We proceed to analyze $\alpha$ and $\beta$. Take $x \in M^{\perp} \setminus \{0\}$, which is possible since $M^{\perp} \neq \{0\}$. Then $P_M x = 0$ and $P_L x = P_{M^{\perp}} x = x$. Equation (154) gives

(155) $\beta x + (1-\beta)\gamma x = (1-\alpha)x,$

which implies

(156) $\beta + (1-\beta)\gamma = 1-\alpha.$

Take $x \in L^{\perp} \setminus \{0\}$, which is possible since $L^{\perp} \neq \{0\}$. Then $P_L x = 0$ and $P_M x = P_{L^{\perp}} x = x$. Equation (154) gives

(157) $\beta x + (1-\beta)(1-\gamma)x = (1-\alpha)x,$

which implies

(158) $\beta + (1-\beta)(1-\gamma) = 1-\alpha.$

Subtracting equation (156) from equation (158), we have

(159) $(1-\beta)(1-2\gamma) = 0,$

which implies $\beta = 1$ or $\gamma = \frac{1}{2}$. This contradicts the choices of $\lambda, \lambda_1, \lambda_2$.
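A concrete instance is easy to inspect numerically. The sketch below (ours; $n = 3$, $L = \operatorname{span}\{e_1\}$, $M = L^{\perp}$, and the parameter values are arbitrary samples) verifies $\beta$ and $\gamma$ from (141)-(144) and confirms that the two eigenvalues $\beta + (1-\beta)\gamma$ and $\beta + (1-\beta)(1-\gamma)$ appearing in (156) and (158) differ whenever $\gamma \neq \frac{1}{2}$, so the representation (153) is impossible.

```python
import numpy as np

n = 3
e1 = np.eye(n)[:, [0]]
P_L = e1 @ e1.T                            # projector onto L = span{e1}
P_M = np.eye(n) - P_L                      # projector onto M = L^perp
I = np.eye(n)

lam1, lam2, lam = 0.25, 0.5, 0.4           # (1-lam)/lam = 1.5 != lam2/lam1 = 2
G_f = (1 - lam1) * I + lam1 * P_L          # (140)
G_g = (1 - lam2) * I + lam2 * P_M          # (140)
C = (1 - lam) * G_f + lam * G_g            # (141)-(144)

beta = (1 - lam) * (1 - lam1) + lam * (1 - lam2)
gamma = lam1 * (1 - lam) / (1 - beta)
# C acts as multiplication by beta + (1-beta)*gamma on L and by
# beta + (1-beta)*(1-gamma) on M; cf. (155)-(158).
mu_L = beta + (1 - beta) * gamma
mu_M = beta + (1 - beta) * (1 - gamma)
assert np.allclose(C @ e1, mu_L * e1)
print("gamma =", gamma, "; eigenvalues on L and M:", mu_L, mu_M)
# Since Fix(C) = {0}, a convex subgradient projector equal to C would be
# (1 - alpha) * Id, as in (153); but mu_L != mu_M because gamma != 1/2.
assert abs(mu_L - mu_M) > 1e-12
```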

If two nearest point projectors onto subspaces commute, then their composition is the projection onto the intersection of the subspaces; see [27, Lemma 9.2]. One referee asks whether there is an analogue when two linear subgradient projectors commute. The answer is negative. To this end, we need an auxiliary result.

Lemma 10.4 Let $L, M \subseteq \mathbb{R}^n$ be two subspaces, and let $\lambda_i \in [0,1)$ for $i = 1, 2$. Then the following are equivalent:

(i) $\big(\lambda_1\operatorname{Id} + (1-\lambda_1)P_L\big)\big(\lambda_2\operatorname{Id} + (1-\lambda_2)P_M\big) = \big(\lambda_2\operatorname{Id} + (1-\lambda_2)P_M\big)\big(\lambda_1\operatorname{Id} + (1-\lambda_1)P_L\big)$.

(ii) $P_L P_M = P_M P_L$.

(iii) $P_{L^{\perp}} P_{M^{\perp}} = P_{M^{\perp}} P_{L^{\perp}}$.

Proof. (i)$\Leftrightarrow$(ii): This follows from

(160)-(161) $\big(\lambda_1\operatorname{Id} + (1-\lambda_1)P_L\big)\big(\lambda_2\operatorname{Id} + (1-\lambda_2)P_M\big) = \lambda_1\lambda_2\operatorname{Id} + \lambda_1(1-\lambda_2)P_M + (1-\lambda_1)\lambda_2 P_L + (1-\lambda_1)(1-\lambda_2)P_L P_M$

and

(162)-(163) $\big(\lambda_2\operatorname{Id} + (1-\lambda_2)P_M\big)\big(\lambda_1\operatorname{Id} + (1-\lambda_1)P_L\big) = \lambda_1\lambda_2\operatorname{Id} + \lambda_2(1-\lambda_1)P_L + (1-\lambda_2)\lambda_1 P_M + (1-\lambda_1)(1-\lambda_2)P_M P_L,$

together with $(1-\lambda_1)(1-\lambda_2) \neq 0$.

(ii)$\Leftrightarrow$(iii): Since $P_{L^{\perp}} = \operatorname{Id} - P_L$ and $P_{M^{\perp}} = \operatorname{Id} - P_M$, (ii) is equivalent to $(\operatorname{Id} - P_L)(\operatorname{Id} - P_M) = (\operatorname{Id} - P_M)(\operatorname{Id} - P_L)$, which is (iii) after simplifications.

Theorem 10.5 In $\mathbb{R}^2$, even though two linear subgradient projectors of convex functions commute, their composition need not be a subgradient projector of a convex function.

Proof. Let $0 < \lambda_1 < \lambda_2 < 1$. Because

$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = P_{\mathbb{R}\times\{0\}} \quad\text{and}\quad \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = P_{\{0\}\times\mathbb{R}},$

by Theorem 9.6 there exist two convex functions $f, g : \mathbb{R}^2 \to \mathbb{R}$ such that

$G_f = \lambda_1\operatorname{Id} + (1-\lambda_1)\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & \lambda_1 \end{pmatrix} \quad\text{and}\quad G_g = \lambda_2\operatorname{Id} + (1-\lambda_2)\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} \lambda_2 & 0 \\ 0 & 1 \end{pmatrix}.$

These two subgradient projectors commute by Lemma 10.4, or by a direct calculation:

$G_f G_g = G_g G_f = \begin{pmatrix} \lambda_2 & 0 \\ 0 & \lambda_1 \end{pmatrix}.$

We claim that $T := G_f G_g$ is not a subgradient projector of a convex function. We prove this by contradiction. Suppose that $T$ is a subgradient projector. Since $T$ is symmetric and linear, by Theorem 9.6 there exists $0 \leq \lambda \leq 1$ such that

(164) $T = \lambda\operatorname{Id} + (1-\lambda)P,$

where $P$ is a projector onto a subspace of $\mathbb{R}^2$. We consider five cases.

Case 1. $\lambda = 0$. This gives $P = T$. Because $P$ is a projector, its eigenvalues are $0$ or $1$. This is impossible, since $0 < \lambda_i < 1$ and $\lambda_1 \neq \lambda_2$.

Case 2. $\lambda = 1$. This gives $T = \operatorname{Id}$. This is impossible, since $\lambda_i < 1$.

Cases 1 and 2 imply that $0 < \lambda < 1$. This gives

$P = \begin{pmatrix} \frac{\lambda_2-\lambda}{1-\lambda} & 0 \\ 0 & \frac{\lambda_1-\lambda}{1-\lambda} \end{pmatrix}.$

Case 3. $\lambda > \lambda_1$. Then $\frac{\lambda_1-\lambda}{1-\lambda} < 0$. This is impossible, since the eigenvalues of $P$ have to be nonnegative.

Case 4. $\lambda = \lambda_1$. Since $\frac{\lambda_2-\lambda_1}{1-\lambda_1} > 0$ and $P$ has eigenvalues only $0$ or $1$, we have $\frac{\lambda_2-\lambda_1}{1-\lambda_1} = 1$. It follows that $\lambda_2 = 1$, which is impossible.

Case 5. $0 < \lambda < \lambda_1$. Then $\frac{\lambda_1-\lambda}{1-\lambda} > 0$ and $\frac{\lambda_2-\lambda}{1-\lambda} > 0$. Since $P$ has eigenvalues only $0$ or $1$, we must have $\frac{\lambda_2-\lambda}{1-\lambda} = \frac{\lambda_1-\lambda}{1-\lambda} = 1$, from which $\lambda_1 = \lambda_2$. This is impossible.

Altogether, (164) does not hold. Using Theorem 9.6 again, we conclude that $T$ is not a subgradient projector of a convex function.
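The five cases can be mimicked numerically. The sketch below (ours; $\lambda_1, \lambda_2$ are arbitrary samples) forms $T = G_f G_g$, confirms commutativity, and scans a grid of $\lambda \in [0,1)$ for a representation (164) with $P$ a projection. The grid search is only illustrative: the case analysis above rules out every $\lambda$ exactly.

```python
import numpy as np

lam1, lam2 = 0.3, 0.8                      # 0 < lam1 < lam2 < 1
I = np.eye(2)
G_f = np.diag([1.0, lam1])                 # lam1*Id + (1-lam1)*P_{R x {0}}
G_g = np.diag([lam2, 1.0])                 # lam2*Id + (1-lam2)*P_{{0} x R}

T = G_f @ G_g
assert np.allclose(T, G_g @ G_f)           # the two projectors commute
assert np.allclose(T, np.diag([lam2, lam1]))

# Search for T = lam*Id + (1-lam)*P with P an orthogonal projection, cf. (164).
# P would be diag((lam2-lam)/(1-lam), (lam1-lam)/(1-lam)) with entries in {0,1}.
for lam in np.linspace(0.0, 0.999, 1000):
    P = (T - lam * I) / (1 - lam)
    eigs = np.diag(P)
    if np.all(np.isclose(eigs, 0) | np.isclose(eigs, 1)):
        print("projection found at lam =", lam)
        break
else:
    print("no lam in [0,1) makes (T - lam*Id)/(1 - lam) a projection")
```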

11 A complete analysis of linear subgradient projectors on $\mathbb{R}^2$

In this section we turn our attention to linear operators on $\mathbb{R}^2$. One nice feature is that we are able not only to characterize when a linear operator is a subgradient projector but also to give explicit formulae for the corresponding functions.

Is every linear mapping from $\mathbb{R}^2$ to $\mathbb{R}^2$ a subgradient projector of an essentially strictly differentiable function (convex or nonconvex) on $\mathbb{R}^2$? The answer is no by Theorem 11.2 below. Theorem 11.2(iii) also shows that Theorem 9.11 fails if the assumption $\dim\operatorname{ran}(\operatorname{Id}-T) \neq 2$ is removed.

We start with a simple result about essentially strictly differentiable functions; see Definition 4.3.

Lemma 11.1 Let $O \subseteq \mathbb{R}^n$ be a nonempty open set and let $f : O \to \mathbb{R}$ be an essentially strictly differentiable function. If there exists a continuous selection $s : O \to \mathbb{R}^n$ with $s(x) \in \partial f(x)$ for every $x \in O$, then $f$ is strictly differentiable on $O$. Consequently, $f$ is continuously differentiable on $O$.

Proof. By [15, Theorem 2.4, Corollary 4.2], $f$ has a minimal Clarke subdifferential $\partial_c f$, and $\partial_c f$ can be recovered from every dense selection of $\partial_c f$. Since $s(x) \in \partial f(x) \subseteq \partial_c f(x)$, and $s$ is continuous on $O$, we have $\partial_c f(x) = \partial f(x) = \{s(x)\}$ for every $x \in O$, which implies that $f$ is strictly differentiable at $x$; see, e.g., [51, page 362, Theorem 9.18] or [40, Theorem 3.54]. Hence $f$ is strictly differentiable on $O$.

We consider the linear operator $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by

(165) $T := \begin{pmatrix} 1-a & -b \\ -c & 1-d \end{pmatrix},$

where

(166) $a^2 + b^2 + c^2 + d^2 \neq 0$ (i.e., $(a,b,c,d) \neq (0,0,0,0)$).

Note that when $a = b = c = d = 0$, we have $T = \operatorname{Id} = G_f$ with $f \equiv 0$.

Theorem 11.2 Let $T$ be given by (165). Then $T$ is a subgradient projector of an essentially strictly differentiable function on $\mathbb{R}^2 \setminus \operatorname{Fix} T$ if and only if one of the following holds:

(i) $a = b = c = 0$, $d \neq 0$: $T = G_f$ where $f(x_1,x_2) := K|x_2|^{1/d}$ for some $K > 0$; or $b = c = d = 0$, $a \neq 0$: $T = G_f$ where $f(x_1,x_2) := K|x_1|^{1/a}$ for some $K > 0$.

(ii) $a \neq 0$, $d \neq 0$, $b = c \neq 0$, $ad = c^2$: $T = G_f$ where $f(x_1,x_2) := K|ax_1 + cx_2|^{a/(a^2+c^2)}$ for some $K > 0$.

(iii) $a = d$, $b = -c$, and $a^2 + c^2 \neq 0$: $T = G_f$ where

(167) $f(x_1,x_2) := \begin{cases} K(x_1^2+x_2^2)^{\frac{a}{2(a^2+c^2)}} \exp\Big(-\frac{c}{a^2+c^2}\arctan\frac{x_1}{x_2}\Big) & \text{if } x_2 \neq 0, \\ 0 & \text{if } (x_1,x_2) = (0,0), \\ K|x_1|^{\frac{a}{a^2+c^2}} \exp\Big(-\frac{c}{a^2+c^2}\cdot\frac{\pi}{2}\Big) & \text{if } x_1 \neq 0,\ x_2 = 0, \end{cases}$

for some $K > 0$, and $f$ is lsc. In particular, when $c \neq 0$, $f$ is not convex.
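Before turning to the proof, here is a quick numerical sanity check of case (iii) (our sketch, not from the paper; the sample values $a = c = 1$, $K = 1$ give $T = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, cf. Corollary 11.6(i) below): away from the $x_1$-axis, $f$ is differentiable and the subgradient projector step $x - \frac{f(x)}{\|\nabla f(x)\|^2}\nabla f(x)$ agrees with $Tx$.

```python
import numpy as np

a, c, K = 1.0, 1.0, 1.0                    # sample parameters for case (iii)
T = np.array([[1 - a, c], [-c, 1 - a]])    # (165) with b = -c and d = a

def f(x1, x2):
    # (167) on the set {x2 != 0}
    s = a**2 + c**2
    return K * (x1**2 + x2**2)**(a / (2 * s)) * np.exp(-(c / s) * np.arctan(x1 / x2))

def grad_f(x1, x2, h=1e-6):
    # central finite differences; valid away from the x1-axis
    return np.array([(f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h),
                     (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)])

rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.normal(size=2)
    x[1] = 1.0 + abs(x[1])                 # keep x2 well away from 0
    g = grad_f(x[0], x[1])
    Gf_x = x - (f(x[0], x[1]) / (g @ g)) * g   # subgradient projector step
    assert np.allclose(Gf_x, T @ x, atol=1e-4)
print("G_f agrees with T at sample points off the x1-axis")
```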

Proof. Observe that (166) implies that $\operatorname{Fix} T$ is a proper subspace of $\mathbb{R}^2$. Assume that $T$ is a subgradient projector. By Theorem 4.1 and Lemma 11.1, we can find a differentiable function $g : \mathbb{R}^2 \setminus \operatorname{Fix} T \to \mathbb{R}$ such that for every $x \in \mathbb{R}^2 \setminus \operatorname{Fix} T$,

$\frac{x - Tx}{\|x - Tx\|^2} = \nabla g(x).$

Because

$x - Tx = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} ax_1 + bx_2 \\ cx_1 + dx_2 \end{pmatrix},$

we have

$\frac{\partial g}{\partial x_1} = \frac{ax_1 + bx_2}{(ax_1+bx_2)^2 + (cx_1+dx_2)^2}, \qquad \frac{\partial g}{\partial x_2} = \frac{cx_1 + dx_2}{(ax_1+bx_2)^2 + (cx_1+dx_2)^2}.$

Since

$\frac{\partial^2 g}{\partial x_1 \partial x_2}(x_1,x_2) = -\frac{(a^2b - bc^2 + 2acd)x_1^2 + (b^3 + bd^2)x_2^2 + 2(ab^2 + ad^2)x_1x_2}{\big((ax_1+bx_2)^2 + (cx_1+dx_2)^2\big)^2},$

$\frac{\partial^2 g}{\partial x_2 \partial x_1}(x_1,x_2) = -\frac{(a^2c + c^3)x_1^2 + (cd^2 - b^2c + 2abd)x_2^2 + 2(c^2d + a^2d)x_1x_2}{\big((ax_1+bx_2)^2 + (cx_1+dx_2)^2\big)^2},$

on the nonempty open set $\mathbb{R}^2 \setminus \operatorname{Fix} T$ we have $\frac{\partial^2 g}{\partial x_1\partial x_2}(x_1,x_2) = \frac{\partial^2 g}{\partial x_2\partial x_1}(x_1,x_2)$. This leads to

(1) $a^2b - bc^2 + 2acd = a^2c + c^3$,

(2) $b^3 + bd^2 = cd^2 - b^2c + 2abd$,

(3) $ab^2 + ad^2 = c^2d + a^2d$.

Now multiplying (2) by $a$, followed by subtracting it with (3) multiplied by $b$, gives

$(ad - bc)(ab + cd) = 0.$

It suffices to consider two cases.

Case $ad = bc$. (1) implies $(b-c)(a^2+c^2) = 0$. Observe that $b \neq c$ is impossible: $b \neq c$ forces $a^2 + c^2 = 0$, i.e., $a = c = 0$, and then (2) gives $b(b^2 + d^2) = 0$, so $b = 0 = c$, a contradiction. Then the following two subcases could happen.

i. $b = c = 0$. Then (3) gives $ad(a-d) = 0$; since also $ad = bc = 0$, this means

(168) $a = b = c = 0$, $d \neq 0$,

or

(169) $b = c = d = 0$, $a \neq 0$.

ii. $b = c \neq 0$, which implies $a \neq 0$, $d \neq 0$, and $ad = c^2$.

Case $ab + cd = 0$. (1) implies $(b+c)(a^2+c^2) = 0$. When $b = -c = 0$, (3) gives $ad(a-d) = 0$, which leads to (168), (169), or $a = d \neq 0$ (a special case of Case 3 below with $c = 0$). It remains to consider the case $b = -c \neq 0$. Then (2) and (3) imply $a = d$. Moreover, we can and do assume $a^2 + c^2 \neq 0$, since $a = c = 0$ gives (168) by (2).

In summary, we only have the following three cases.

Case 1. $a = b = c = 0$, $d \neq 0$. Then we get $g(x_1,x_2) = \frac{\ln|x_2|}{d} + C_1$ if $x_2 \neq 0$. Or $b = c = d = 0$, $a \neq 0$. Then we get $g(x_1,x_2) = \frac{\ln|x_1|}{a} + C_1$ if $x_1 \neq 0$.

Case 2. $a \neq 0$, $d \neq 0$, $b = c \neq 0$, $ad = c^2$. Then we get $g(x_1,x_2) = \frac{a}{a^2+c^2}\ln|ax_1 + cx_2| + C_2$ if $ax_1 + cx_2 \neq 0$.

Case 3. $a = d$, $b = -c$, and $a^2 + c^2 \neq 0$. Then we get $g(x_1,x_2) = \frac{a}{2(a^2+c^2)}\ln(x_1^2+x_2^2) - \frac{c}{a^2+c^2}\arctan\Big(\frac{x_1}{x_2}\Big) + C_3$ if $x_2 \neq 0$.

Since $g = \ln f$, we obtain $f = \exp(g)$; Cases 1 and 2 yield the formulas in (i) and (ii). For Case 3, we obtain

$f(x_1,x_2) = K(x_1^2+x_2^2)^{\frac{a}{2(a^2+c^2)}}\exp\Big(-\frac{c}{a^2+c^2}\arctan\frac{x_1}{x_2}\Big) \quad\text{if } x_2 \neq 0,$

for some $K > 0$. However, when $c \neq 0$, $f$ is not continuous at $(\bar x_1, 0)$ with $\bar x_1 \neq 0$, since the two one-sided limits

$\lim_{x_1\to\bar x_1,\ x_2\to 0^{\pm}} \arctan\frac{x_1}{x_2} = \pm\frac{\pi}{2}\operatorname{sign}(\bar x_1)$

differ. The function given by (167) is lsc but not continuous at every such $(\bar x_1, 0)$. Moreover, $f$ is not convex on $\mathbb{R}^2$, since a finite-valued convex function on a finite dimensional space is continuous; see, e.g., [7, Corollary 8.31].

It is interesting to ask for which selection $s \in \partial f$ we have $G_f = T$ on $\mathbb{R}^2$. On $\mathbb{R}^2 \setminus \{(x_1,x_2) \mid x_2 = 0\}$, one clearly chooses $s = \nabla f$. It remains to determine the subgradient of $f$ at $(\bar x_1, 0)$. Indeed, when $x_2 \neq 0$, $f(x_1,x_2) = \exp(g(x_1,x_2))$, so that $\nabla f(x_1,x_2) = f(x_1,x_2)\nabla g(x_1,x_2)$, i.e.,

$\nabla f(x_1,x_2) = K(x_1^2+x_2^2)^{\frac{a}{2(a^2+c^2)}}\exp\Big(-\frac{c}{a^2+c^2}\arctan\frac{x_1}{x_2}\Big)\,\frac{1}{a^2+c^2}\,\frac{(ax_1 - cx_2,\ ax_2 + cx_1)}{x_1^2+x_2^2}.$

When $(x_1,x_2) \to (\bar x_1, 0)$ with $cx_1/x_2 > 0$, we have

$f(x_1,x_2) \to K|\bar x_1|^{\frac{a}{a^2+c^2}}\exp\Big(-\frac{c}{a^2+c^2}\cdot\frac{\pi}{2}\Big) = f(\bar x_1, 0)$

and

$\nabla f(x_1,x_2) \to K|\bar x_1|^{\frac{a}{a^2+c^2}}\exp\Big(-\frac{c}{a^2+c^2}\cdot\frac{\pi}{2}\Big)\,\frac{1}{a^2+c^2}\Big(\frac{a}{\bar x_1},\ \frac{c}{\bar x_1}\Big).$

Therefore, by the definition of limiting subdifferentials (see Definition 2.1),

(170) $K|\bar x_1|^{\frac{a}{a^2+c^2}}\exp\Big(-\frac{c}{a^2+c^2}\cdot\frac{\pi}{2}\Big)\,\frac{1}{a^2+c^2}\Big(\frac{a}{\bar x_1},\ \frac{c}{\bar x_1}\Big) \in \partial f(\bar x_1, 0).$

Hence, we can choose $s(\bar x_1, 0)$ to be the limiting subgradient given by (170).
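The mixed-partials computation that produces (1)-(3) is mechanical and can be reproduced symbolically. The following sketch (ours, using sympy) expands the numerator of $\frac{\partial^2 g}{\partial x_1\partial x_2} - \frac{\partial^2 g}{\partial x_2\partial x_1}$ and prints the coefficients of $x_1^2$, $x_2^2$, and $x_1x_2$; their vanishing is exactly (1)-(3).

```python
import sympy as sp

x1, x2, a, b, c, d = sp.symbols('x1 x2 a b c d', real=True)
N = (a*x1 + b*x2)**2 + (c*x1 + d*x2)**2
g1 = (a*x1 + b*x2) / N                    # dg/dx1, as in the proof
g2 = (c*x1 + d*x2) / N                    # dg/dx2, as in the proof

# g1 and g2 are the partials of a common g only if the mixed partials agree.
num = sp.expand(sp.numer(sp.together(sp.diff(g1, x2) - sp.diff(g2, x1))))
poly = sp.Poly(num, x1, x2)
for monom, coeff in zip(poly.monoms(), poly.coeffs()):
    print(monom, sp.factor(coeff))        # vanishing coefficients <=> (1)-(3)
```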

Remark 11.3 Note that $\partial f(\bar x_1, 0)$ is not a singleton when $\bar x_1 \neq 0$ and $c \neq 0$ in Theorem 11.2(iii). Thus, in Theorem 11.2(iii), when $c \neq 0$ we only have $T = G_f$ for a suitable selection of $\partial f$. In order to make $f$ continuous on $\mathbb{R}^2$, we need $c = 0$, in which case (167) reduces to $f(x_1,x_2) = K\|(x_1,x_2)\|^{1/a}$ and $G_f = (1-a)\operatorname{Id}$. Clearly, $f$ is not convex when $a > 1$. This has been discussed in Example 2.5.

Corollary 11.4 Let $T$ be given by (165). Suppose that one of the following holds:

(i) $b \neq \pm c$.

(ii) $b = c = 0$, $a \neq 0$, $d \neq 0$, and $a \neq d$.

Then there exists no essentially strictly differentiable function $f : \mathbb{R}^2 \to \mathbb{R}$ such that $T = G_f$.

Corollary 11.5 Let $T$ be given by (165). Suppose that $b = c = 0$, $0 < a < 1$, $0 < d < 1$, and $a \neq d$. Then $T$ is firmly nonexpansive, and there exists no essentially strictly differentiable function $f : \mathbb{R}^2 \to \mathbb{R}$ such that $T = G_f$.

Corollary 11.6 (i) The skew linear mapping $T := \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ is not firmly nonexpansive, so not a cutter. However, $T$ is a subgradient projector of a nonconvex, discontinuous but lsc function $f_1$ given by

(171) $f_1(x,y) := \begin{cases} (x^2+y^2)^{1/4}\exp\big(-(1/2)\arctan(x/y)\big) & \text{if } y \neq 0, \\ 0 & \text{if } (x,y) = (0,0), \\ |x|^{1/2}\exp(-\pi/4) & \text{if } x \neq 0,\ y = 0. \end{cases}$

(ii) The linear mapping $T := \begin{pmatrix} 1/2 & 1/2 \\ -1/2 & 1/2 \end{pmatrix}$ is firmly nonexpansive and a cutter. However, $T$ is a subgradient projector of a nonconvex, discontinuous but lsc function $f_2$ given by

(172) $f_2(x,y) := \begin{cases} (x^2+y^2)^{1/2}\exp\big(-\arctan(x/y)\big) & \text{if } y \neq 0, \\ 0 & \text{if } (x,y) = (0,0), \\ |x|\exp(-\pi/2) & \text{if } x \neq 0,\ y = 0. \end{cases}$
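A numerical double-check of Corollary 11.6 (our sketch, with the matrices as displayed above): a linear $T$ is firmly nonexpansive if and only if $\|Tx\|^2 \leq \langle x, Tx\rangle$ for all $x$, which fails for the skew mapping in (i) and holds for the averaged mapping in (ii); the identity $f_2 = f_1^2$ (see the note following the figures below) is also verified at sample points.

```python
import numpy as np

T1 = np.array([[0.0, 1.0], [-1.0, 0.0]])   # Corollary 11.6(i), skew
T2 = 0.5 * (np.eye(2) + T1)                # Corollary 11.6(ii)

def firmly_nonexpansive(T, trials=1000, seed=1):
    # For linear T: firmly nonexpansive iff ||Tx||^2 <= <x, Tx> for all x.
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x = rng.normal(size=2)
        if (T @ x) @ (T @ x) > x @ (T @ x) + 1e-12:
            return False
    return True

print("T1 firmly nonexpansive:", firmly_nonexpansive(T1))   # False
print("T2 firmly nonexpansive:", firmly_nonexpansive(T2))   # True

def f1(x, y):                              # (171), with lsc boundary values
    if x == 0 and y == 0:
        return 0.0
    if y == 0:
        return abs(x) ** 0.5 * np.exp(-np.pi / 4)
    return (x * x + y * y) ** 0.25 * np.exp(-0.5 * np.arctan(x / y))

def f2(x, y):                              # (172), with lsc boundary values
    if x == 0 and y == 0:
        return 0.0
    if y == 0:
        return abs(x) * np.exp(-np.pi / 2)
    return (x * x + y * y) ** 0.5 * np.exp(-np.arctan(x / y))

for p in [(1.0, 2.0), (-0.3, 0.7), (2.0, 0.0)]:
    assert np.isclose(f1(*p) ** 2, f2(*p))  # f2 = f1^2
print("f2 = f1^2 at sample points")
```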

[Figure 1: Plot of the function given by (171).]

[Figure 2: Plot of the function given by (172).]

Note that $f_2 = f_1^2$ in Corollary 11.6 and $G_{f_2} = (\operatorname{Id} + G_{f_1})/2$.

Remark 11.7 Corollary 11.5 and Corollary 11.6 together show that although the set of cutters and the set of subgradient projectors have a nonempty intersection, they are different because neither one contains the other. By Theorem 4.5, there exists no continuous convex function $f$ such that $G_f = T$ in either case of Corollary 11.6. Corollary 11.6 says that $T = G_f$ being linear and firmly nonexpansive does not imply that $f$ is convex.

A key point below is that if $T = G_f$ is linear and $f$ is convex on $\mathbb{R}^2$, then Theorem 11.2 implies that $T$ has to be firmly nonexpansive and symmetric.

Corollary 11.8 Let $T$ be given by (165). Then $T$ is a subgradient projector of a convex function if and only if one of the following holds:

(i) $a = b = c = 0$, $0 < d \leq 1$: $T = G_f$ where $f(x_1,x_2) = K|x_2|^{1/d}$ for some $K > 0$; or $b = c = d = 0$, $0 < a \leq 1$: $T = G_f$ where $f(x_1,x_2) = K|x_1|^{1/a}$ for some $K > 0$.

(ii) $a \neq 0$, $d \neq 0$, $b = c \neq 0$, $ad = c^2$, $a \geq a^2 + c^2$: $T = G_f$ where $f(x_1,x_2) = K|ax_1 + cx_2|^{a/(a^2+c^2)}$ for some $K > 0$.

(iii) $a = d$, $b = c = 0$, $0 < a \leq 1$: $T = G_f$ where $f(x_1,x_2) = K(x_1^2 + x_2^2)^{\frac{1}{2a}}$ for some $K > 0$.

Acknowledgments

The authors thank two anonymous referees for careful reading and constructive suggestions on the paper. HHB was partially supported by a Discovery Grant of the Natural Sciences and Engineering Research Council of Canada (NSERC) and by the Canada Research Chair Program. CW was partially supported by the National Natural Science Foundation of China. XW was partially supported by a Discovery Grant of the Natural Sciences and Engineering Research Council of Canada (NSERC). JX was supported by NSERC grants of HHB and XW.
