Computing proximal points of nonconvex functions

Size: px

Start display at page:

Download "Computing proximal points of nonconvex functions"

Elfreda Oliver
6 years ago
Views:

1 Mathematical Programming manuscript No. (will be inserted by the editor) Warren Hare Claudia Sagastizábal Computing proximal points o nonconvex unctions the date o receipt and acceptance should be inserted later Abstract The proximal point mapping is the basis o many optimization techniques or convex unctions. By means o variational analysis, the concept o proximal mapping was recently extended to nonconvex unctions that are prox-regular and prox-bounded. In such a setting, the proximal point mapping is locally Lipschitz continuous and its set o ixed points coincide with the critical points o the original unction. This suggests that the many uses o proximal points, and their corresponding proximal envelopes (Moreau envelopes), will have a natural extension rom Convex Optimization to Nonconvex Optimization. For example, the inexact proximal point methods or convex optimization might be redesigned to work or nonconvex unctions. In order to begin the practical implementation o proximal points in a nonconvex setting, a irst crucial step would be to design eicient methods o approximating nonconvex proximal points. This would provide a solid oundation on which uture design and analysis or nonconvex proximal point methods could lourish. In this paper we present a methodology based on the computation o proximal points o piecewise aine models o the nonconvex unction. These models can be built with only the knowledge obtained rom a black box providing, or each point, the unction value and one subgradient. Convergence o the method is proved or the class o nonconvex unctions that are prox-bounded and lower-c 2 and encouraging preliminary numerical testing is reported. Keywords Nonconvex Optimization, Nonsmooth Optimization, proximal point, prox-regular, lower-c 2 AMS Subject Classiication Primary: 90C26, 49J52; Secondary: 65K0, 49J53, 49M05. Introduction and motivation Envelope unctions are useul or both theoretical and practical purposes in the analysis and manipulation o lower semicontinuous unctions. This is the case in particular o the Moreau envelope [30], also known as Moreau-Yosida regularization [37] and proximal envelope [35]. For a convex unction, or instance, the unction and its Moreau envelope have the same set o minima. Since the Moreau envelope unction is continuously dierentiable, a gradient-like method can be applied to ind a minimizer o the original unction, possibly nondierentiable. As such, the well known proximal point algorithm, [4], [23], [34], can be interpreted as a preconditioned gradient method or minimizing the Moreau envelope [7]. Many other examples can be ound in convex programming, Warren Hare IMPA, Estrada Dona Castorina 0, Jardim Botânico, Rio de Janeiro RJ , Brazil. Research supported by CNPq Grant No /2004-0, whare@cecm.su.ca Claudia Sagastizábal IMPA, Estrada Dona Castorina 0, Jardim Botânico, Rio de Janeiro RJ , Brazil. On leave rom INRIA, France. Research supported by CNPq Grant No /2004-2, sagastiz@impa.br

2 2 Hare and Sagastizábal where several variants o proximal point methods have been extensively applied to develop dierent algorithms, such as [33], [2], [3], [], [3], [8]. To make use o the Moreau envelope in algorithms requires the computation, at least approximately, o the proximal point mapping. The search direction used in an inexact proximal point algorithm is an approximation o the search direction which would be created i the exact proximal point were known. This direction is generally calculated by running a ew iterations o a sub-algorithm which (i run to ininity) calculates the proximal point o a unction. The proo o algorithmic convergence is then based on the relationships between optimality and proximal points, and the ability to calculate a proximal point or a convex unction. For example, convex variants o bundle methods, [5, vol. II], deine a new serious step when the approximate computation o certain proximal point (at the current serious step) is considered good enough. Bundle null steps are nothing but iterations to better approximate a proximal point. In [] and [9] it is shown that proximal points can be approximated with any desired accuracy via an adequate cutting-planes model o the convex unction. This result is the basis or showing convergence o a bundle method when it generates a last serious step ollowed by an ininite number o null steps. Recent research into proximal points o nonconvex unctions has shown that much o the theoretical power o proximal points is not constrained to the convex case. In the nonconvex case, research on Moreau envelopes and the proximal point mapping has largely hinged upon the idea o prox-regularity; see [3], [32], [5], [6]. First introduced in [32], prox-regularity provides the necessary structure to extend many o the results on proximal points to a nonconvex setting. Speciically, i a unction is prox-regular and prox-bounded, as deined in Section 2 below, the proximal point mapping is Lipschitz continuous near stationary points and its ixed points correspond to the critical points o the original unction [3, Thm. 4.4]. The same relations are shown to hold in Hilbert and Banach spaces in [6] and [5], respectively. The results obtained or prox-regular unctions suggest that a stationary point o such a unction could be ound by applying a nonconvex variant o the proximal point method. For both actual implementation and to create a irm oundation or the theory o nonconvex proximal point algorithms, a irst crucial step would be a method o computing the proximal point or a nonconvex unction. This is the important question we address in this paper. Along the lines o [9, Sec. 4], we ocus on identiying a minimal set o conditions to be satisied along iterations by simple model unctions approximating the (unknown) nonconvex unction. These conditions correspond to implementable models built on the knowledge o the inormation given by a black box, whose output or each point is the unction value and one subgradient o the unction at the given point. This paper is organized as ollows. In Subsection. we review some related previous work and highlight how our approach diers rom past research. Section 2 reviews Variational Analysis deinitions and results. Section 3 lays the background o our methodology, with some initial theoretical results ensuring convergence to the proximal point. A constructive approach to compute proximal points or nonconvex unctions and a derived implementable black box algorithm are given, respectively, in Sections 4 and 5. The algorithm is shown to converge or prox-bounded lower-c 2 unctions in Theorem 4. Section 6 reports some encouraging preliminary numerical results, obtained by running the algorithm on a large collection o randomly generated lower-c 2 unctions. In particular, in Subsection 6.6, comparative runs with two well-known bundle methods show the potential advantages o our approach.. Relation with past research on Nonconvex Nonsmooth Optimization Extending to the nonconvex setting a method designed or convex unctions is not a new idea. In particular, this is the case o bundle methods, [24], [9], [25], [26]. As noted, when minimizing a convex unction, null step-iterations ollowing a serious step can be interpreted as the approximate calculation o the proximal point o the unction at such serious step. For this reason, our work can be related to nonconvex bundle iterations between serious steps. We emphasize that this work ocuses on how to approximate one proximal point, or a given point x 0 ixed. In a ollow-up paper we shall consider how to use our work in a dynamic setting, letting x 0 vary at each serious step and, hence, developing a nonconvex bundle-like algorithm. Keeping these comments in mind, we now examine some eatures o nonconvex bundle methods. Historically, convex bundle methods were developed in the early 80 s in a dual orm [5, Ch. XIV]. Likewise, the irst nonconvex variants o bundle methods were also developed rom a dual point o view.

3 Computing proximal points 3 That is, they are designed to eventually produce an approximate subgradient (playing the role o the ε-subgradient in the convex case) going to zero. In these methods, no attention is paid to the primal view o how to model the objective unction by using a cutting-planes approximation (primal orms o convex bundle methods were mostly developed in the 90 s; see [5, Ch. XV]). As a result, modiications introduced in the convex algorithm to adapt it to the nonconvex setting lead to ixes in the modelling unction, that were not supported by a theoretical background rom the primal point o view. Basically, such ixes consist in redeining the so-called linearization errors; see [5, De. XIV..2.7], which can be negative in the nonconvex case, to make them nonnegative by orce. Along these lines, various possibilities were considered, or example, using either the absolute value o (negative) linearization errors, or a subgradient locality measure depending on a penalty coeicient; we reer to [26] or a general study on this subject. More recent nonconvex bundle methods, such as [7], [20], [2], [22], although developed rom convex proximal bundle methods based on a primal view, still pursue the track o previous dual approaches, by using subgradient locality measures, possibly quadratic, to redeine negative linearization errors. The use o quadratic subgradient locality measures is similar to the idea we present here, but has the weakness that the penalty coeicient is ixed a priori (moreover, primal inormation, corresponding to unction values, is again ignored). Instead, as explained in Remark 9, our penalty coeicient (η n ) will be calculated dynamically during each iteration in a manner which orces linearization errors (o a penalization o the objective unction) to be positive. In this way, we gain both a primal and dual view o the underlying optimization problem; also see Remark 8. Unlike past nonconvex approaches, in our research we begin by returning to the original building blocks o the proximal point algorithms o convex optimization: the proximal point itsel. A strong theoretical base or nonconvex proximal points has already begun in works such as [3], [32], [5], [6]. The next logical step is thereore to create an algorithm capable o computing the proximal point o a nonconvex unction. To the best o our knowledge, the present work is the irst algorithm or computing the proximal point or a nonconvex unction. In Subsection 6.6 numerical tests, comparing the computation o a proximal point via our algorithm and via running previously developed bundle algorithms on the proximal point subproblem, show avourable results which urther support our methodology. Finally, by approaching the problem rom this direction we will, in the long run, not just create working algorithms, but also create a better understanding o why the algorithms work and how they can be improved. For example, the algorithm developed in this paper is not only capable o calculating the proximal point o a nonconvex unction, but it is also capable o inorming the user when the given proximal parameter is insuicient or the desired proximal point to be well deined (a concern relevant in the nonconvex setting only). This result will make it possible to create workable nonconvex algorithms based on the nonconvex VU theory in [27], [28], and the nonconvex partly smooth theory in [4]. Also, a ull nonconvex nonsmooth algorithm based on the building blocks developed in this work will have the distinct advantage over the previous nonconvex bundle methods that it will provide both a primal and dual view o the nonconvex optimization problem, whereas previous methods only provide a dual view o the problem. This is because, in a method created rom a primal-dual oundation, a modelling unction directly related to the objective unction is minimized at each iteration; while in previous approaches, only the subgradients o the minimized unction were related to subgradients o the objective unction. As a result, it is likely that uture research in this direction will be more proitable than previous directions. 2 Basic deinitions and past results In this section we recall some important concepts and results related to Variational Analysis and the proximal point mapping. 2. Some notions rom Variational Analysis We begin by ixing the notation used throughout this paper. Given a sequence o vectors {z k } converging to 0, we write ζ k = o( z k ) i and only i or all ε > 0 there exists k ε > 0 such that ζ k ε z k or all k k ε.

4 4 Hare and Sagastizábal For a set S IR N and a point x S: A vector v is normal to S at x i there are sequences x ν S x and v ν v such that v ν, z x ν o( z x ν ) or z S. A set S is said to be Clarke regular at x S when S is locally closed at x and each normal vector v satisies v, z x o( z x ) or all z S. Let : IR N IR {+ } be a proper lsc unction, so that its epigraph, denoted and deined by epi := {(x, β) IR N IR : β (x)}, is a closed set in IR N+. For a point x IR N where is inite-valued: We use the Mordukhovich subdierential [29] denoted by ( x) in [35, De 8.3, p. 30]. The unction is said to be subdierentially regular at x i epi is a Clarke regular set at the point ( x, ( x)); see [35, De. 7.25, p. 260]. For such unctions, a relaxed subgradient inequality holds: (x) ( x) + ḡ, x x + o( x x ) or each ḡ ( x) and all x IR N. The unction is prox-regular at x or a subgradient ḡ ( x) (with parameter r pr ) i there exists r pr > 0 such that (x ) (x) + g, x x r pr 2 x x 2 whenever x and x are near x with (x) near ( x) and g (x) near ḡ. When this property holds or all subgradients in ( x), the unction is said to be prox-regular at x; see [35, De. 3.27, p. 60]. The unction is prox-bounded i there exists R 0 such that the unction ( ) + R 2 2 is bounded below. The corresponding threshold (o prox-boundedness) is the smallest r pb 0 such that ( ) + R 2 2 is bounded below or all R > r pb. In this case, ( ) + R 2 x 2 is bounded below or any x and R > r pb [35, De..23, p. 20 & Thm..25, p. 2]. The unction is lower-c 2 on an open set V i at any point x in V, appended with a quadratic term is a convex unction on an open neighbourhood V o x; see [35, Thm. 0.33, p. 450]. The unction is said to be prox-regular i it is prox-regular at every x IR N. In this case, the proxregularity parameter may either be pointwise (i.e., dependent on x IR N ) or global (independent o x IR N ). Likewise, is said to be lower-c 2 i the corresponding neighbourhood V is the entire space IR N (note that this does not imply the neighbourhood V in the deinition above is the entire space). Remark (lower-c 2 ) The original deinition o lower-c 2 unctions states: The unction is lower-c 2 on an open set V i or each x V there is a neighbourhood V o x upon which a representation (x) = max t T t (x) holds, where T is a compact set and the unctions t are twice continuously dierentiable jointly in both x and t. The deinition we cite above is proved equivalent to the original one in [35, Thm. 0.33, p. 450]. The next lemma, whose proo is straightorward, will be very useul in dealing with lower-c 2 unctions. Lemma 2 Suppose the unction is lower-c 2 on V and x V. Then there exists ε > 0, K > 0, and ρ > 0 such that (i) or any point x 0 and parameter R ρ the unction + R 2 x0 2 is convex and inite valued on the closed ball Bε ( x), and (ii) the unction is Lipschitz continuous with constant K on the closed ball Bε ( x). Proo. We consider irst the unction + ρ 2 x0 2. Its convexity and the act that ρ can be selected independently o x 0 is clear rom the proo o [35, Thm. 0.33, p. 450]. Finite valuedness then ollows by selecting ε > 0 such that the closed ball B ε ( x) is contained in the neighbourhood V, and applying the original deinition o lower-c 2 unctions in Remark. Finally, since inite valued convex unctions are Lipschitz continuous on compact domains [35, Ex. 9.4, p. 359], we have the existence o the Lipschitz constant K or the unction on the ball Bε ( x).

5 Computing proximal points 5 Clearly, any unction which is bounded below is prox-bounded with threshold r pb = 0. Convex unctions are prox-regular with global prox-regularity parameter zero (r pr = 0) [35, Thm. 3.30, p. 6] while lower-c 2 unctions are prox-regular [35, Prop. 3.33, p. 63]. Finally, i a unction is proxregular at a point, then it is subdierentially regular at that point [35, p. 60]. 2.2 Proximal envelope results Due to its key role in this work, we consider the proximal point mapping in more detail. Given a positive parameter R, the proximal point mapping o the unction at the point x IR N is deined by p R (x) := argmin w IR N {(w) + R 2 } w x 2, () where the minima above can orm a nonsingleton set. Even or prox-bounded unctions, some choices o R (below the prox-boundedness threshold) may result in p R (x) being the empty set. Uniqueness o p R (x) requires prox-boundedness and prox-regularity; see Theorem 4 below. Usually, R and x are called the prox-parameter and prox-center o (). The proximal envelope is the unction that assigns to each x the optimal objective value in equation (). Intuitively, as the prox-parameter increases, the related proximal points should converge towards the given prox-center. We next show this result or Lipschitz continuous unctions and give a simple bound or the distance between the set o proximal points and the prox-center. Lemma 3 (Lipschitz continuous unctions and proximal points) Suppose the unction is Lipschitz continuous with constant K. Then or any R > 0 such that p R (x 0 ) is nonempty the ollowing relation holds: p x 0 < 2K R or all p p R(x 0 ). Proo. { Given a prox-parameter R > 0 such that p R (x 0 ) is nonempty, the inclusion p R (x 0 ) x : (x) + R 2 x x0 2 (x 0 ) } holds by (). The result ollows, using the Lipschitz continuity o : p R (x 0 ) { x : (x) + R 2 x x0 2 (x 0 ) } { x : (x 0 ) K x x 0 + R 2 x x0 2 (x 0 ) } = { } x : x x 0 2K R. In [35, Prop. 3.37, p. 67], the (pointwise) assumptions o prox-regularity and prox-boundedness are used to show that the proximal point mapping is well deined near stationary points. The ollowing proposition extends this result to points that need not be stationary and gives (global) conditions or the result to hold in the whole space. Theorem 4 Suppose that the lsc unction is prox-regular at x or ḡ ( x) with parameter r pr, and that is prox-bounded with threshold r pb. Then or any R > max{r pr, r pb } and all x near x + Rḡ the ollowing hold: the proximal point mapping p R is single-valued and locally Lipschitz continuous, the gradient o the proximal envelope at x is given by R(x p R (x)), and the proximal point is uniquely determined by the relation p = p R (x) R (x p) (p). Suppose instead the lsc unction is prox-bounded with threshold r pb and lower-c 2 on V. Let x V and let ε > 0, K > 0 and ρ > 0 be given by Lemma 2. Then or any R > R x := max{4k/ε, ρ, r pb } and all x B ε/2 ( x) the three items above hold. Thus, x is a stationary point o i and only i x = p R (x ) or any R > R x.

6 6 Hare and Sagastizábal Proo. Validity o the three items when the assumptions hold at x or ḡ ( x) ollows rom applying [3, Thm. 4.4] to the unction h(x) := (x + x) ḡ, x, which is prox-regular at x := 0 or the subgradient 0 = ḡ ḡ h(0) and prox-bounded with the same threshold o prox-boundedness as [35, Ex. 3.35, p. 65 & Ex..24, p. 20]. Consider now the lower-c 2 case. For a given x and the values ε > 0, K > 0 and ρ > 0 provided by Lemma 2 we have + ρ 2 x 2 is convex on B ε ( x) or any x B ε ( x). As such, or any x B ε ( x), we have + ρ 2 x 2 is prox-regular at x with parameter 0. Thereore, or any x B ε ( x), is prox-regular at x with parameter ρ. Since is lower-c 2 there also exists K > 0 such that is Lipschitz continuous with constant K on B ε ( x), by Lemma 2. Letting i Bε( x) denote the indicator unction o the open ball B ε ( x), our assumptions imply that the unction := + i Bε( x) is prox-regular with global parameter r pr. Moreover, i R 4K/ε, then or any x B ε/2 ( x) by Lemma 3 we have that p R (x) Bε/2 (x). Set R x := max{4k/ε, ρ, r pb }, and notice that or any R > R x the unction is globally prox-regular with parameter ρ R, proxbounded with threshold r pb R, and satisies p R (x) = p R (x) or all x Bε/2 ( x). Now select any x B ε/2 ( x) and ix R > R x. Letting p = p R (x), we have that R(x p) (p). The mapping p R satisies the three (pointwise) items in this proposition, with x = p Bε/2 ( x), ḡ = R(x p) (p), and r pr = r pr, near the point p + R (R(x p)) = x. Since p R = p R near x, the proo is complete. 3 Main idea and results In order to compute p R (x 0 ) or a given x 0 IR N, we proceed by splitting the prox-parameter R into two nonnegative terms η and µ satisying R = η + µ. These terms play two distinct roles derived rom the relation p R (x 0 ) = p µ ( + η 2 x0 2 )(x 0 ). In this expression, η deines a convexiied unction which we shall model by a simple unction ϕ η, while µ is used as a prox-parameter or this model. We reer to η and µ as the convexiication parameter and model prox-parameter, respectively. Along the iterative process, η, µ and ϕ η have to be suitably modiied to guarantee convergence to the proximal point, or to detect ailure i R is not suiciently large or the proximal point to be well deined. More ormally, or n = 0,, 2,..., given positive parameters η n and µ n and a model unction ϕ n,ηn + η n 2 x0 2, we let x n+ IR N be deined as ollows: { } x n+ := argmin w ϕ n,ηn (w) + µ n 2 w x0 2. (2) Since x n+ = p µn ϕ n,ηn (x 0 ), we call it an approximal point. To ensure that the sequence o approximal points is well deined and converges to p R (x 0 ), the data in equation (2) must be suitably chosen. In our construction, the amily o parameters and model unctions satisies:

7 Computing proximal points 7 ϕ n,ηn is a convex unction, (3a) ϕ n,ηn (x 0 ) (x 0 ), ϕ n+,ηn+ (w) ϕ n,ηn (x n+ ) + µ n x 0 x n+, w x n+ or all w IR N, ϕ n,ηn (w) (x n ) + 2 η n x n x g n +η n (x n x 0 ), w x n or some g n (3b) (3c) (x n ) and all w IR N,(3d) µ = µ n and η n = η or some positive µ, nonnegative η, and n suiciently large. (3e) The parameter η n will be deined to satisy η n = R µ n, thus in condition (3e) the relation η = R µ holds. Note that or model prox-parameters to be positive (a reasonable request), convexiication parameters should always be smaller than R. In Section 4 we give a black box implementable method ensuring satisaction o conditions (3) above. In this method, the sequence o model prox-parameters {µ n } is nonincreasing, thus the corresponding sequence o convexiication parameters {η n } will be nondecreasing. Global assumptions on the model unctions, i.e., to be veriied or all points in IR N, are sound when is a convex unction; however, in our nonconvex setting, we should rather rely on local conditions. Some o the conditions listed in (3) are similar to the global conditions o [9, Sec. 4], but transormed into local counterparts that are more suitable or a nonconvex unction. In particular, condition (3b) can be thought o as a weakened orm o [9, eq. (4.7)] which states that ϕ n,ηn at all points (instead o at x 0 only). Condition (3c) is exactly equivalent to [9, eq. (4.8)], while condition (3d) is [9, eq. (4.9)], stated or the unction + 2 η n x 0 2 ; see Section 4 below. Although or our results we need satisaction o (3c) at the point x n+ only, it seems more natural to state the relation or any w IR N. The next lemma uses conditions (3a), (3b), (3c), and (3e) to show convergence o the sequence o approximal points. Lemma 5 For n = 0,,..., consider the sequence {x n } deined by (2), where the amily o parameters and model unctions satisies conditions (3a), (3b), (3c), and (3e). The ollowing hold: { } (i) the sequence ϕn,ηn (xn+ ) + µ n 2 xn+ x 0 2 is eventually strictly increasing and convergent. (ii) The sequence {x n } is bounded and x n+ x n 0 as n. Proo. Take n big enough or (3e) to hold. For notational simplicity, consider the unction L n (w) := ϕ n,η (x n+ ) + µ 2 xn+ x µ 2 w xn+ 2 and let s n+ := µ(x 0 x n+ ) ϕ n,η (x n+ ) be the subgradient given by the optimality condition corresponding to (2). Sequence elements in (i) correspond both to L n (x n+ ) and the optimal objective value o problem (2). Since x n+ is the unique minimizer o this quadratic programming problem, L n (x n+ ) < ϕ n,η (x 0 ), and condition (3b) implies that L n (x n+ ) (x 0 ) or all n. (4) Condition (3c) written with w = x n+2 gives the inequality ϕ n+,η (x n+2 ) ϕ n,η (x n+ ) + s n+, x n+2 x n+. By deinition, L n+ (x n+2 ) = ϕ n+,η (x n+2 ) + µ 2 xn+2 x 0 2 ; hence, L n+ (x n+2 ) L n (x n+ ) + s n+, x n+2 x n+ + µ 2 xn+2 x 0 2 µ 2 xn+ x 0 2. Ater expanding squares, the two rightmost terms above satisy the relation µ 2 xn+2 x 0 2 µ 2 xn+ x 0 2 = µ x 0 x n+, x n+2 x n+ + µ 2 xn+2 x n+ 2 = s n+, x n+2 x n+ + µ 2 xn+2 x n+ 2,

8 8 Hare and Sagastizábal so we obtain L n+ (x n+2 ) L n (x n+ ) + µ 2 xn+2 x n+ 2. (5) Thereore, the sequence in (i) is strictly increasing or n big enough. It is also bounded above, by (4), so it must converge. By conditions (3a) and (3b), L n (x 0 ) = ϕ n,η (x n+ ) + s n+, x 0 x n+ ϕ n,η (x 0 ) (x 0 ). Since L n (x n+ ) + µ 2 x0 x n+ 2 = L n (x 0 ), we obtain, using condition (3e), that µ 2 x0 x n+ 2 + L n (x n+ ) (x 0 ). By (i), L n (x n+ ) converges, so the sequence {x n } is bounded. The second part o item (ii) ollows rom equation (5) by noting 0 µ 2 xn+ x n 2 L n (x n+ ) L n (x n ) 0. To show convergence o the sequence o approximal points to p R (x 0 ), we apply a local nonconvex analog o the condition ϕ n,ηn. This is the purpose o condition (6) below, a relation we show holds or lower-c 2 unctions when applying the algorithm CompProx; see Section 5 and Theorem 4 below. Theorem 6 Suppose the lsc unction is prox-bounded and lower-c 2 on V. Let x 0 V, and let ε > 0, K > 0 and ρ > 0 be given by Lemma 2 (written with x = x 0 ). Suppose that R > max{4k/ε, ρ, r pb } and consider the sequence {x n } deined by (2), where the amily o parameters and model unctions satisies (3). Let p be an accumulation point o the sequence {x n }, and let the subsequence {x ni } p as i. I is continuous at p, and η [0, R) is such that ϕ ni,η (w) (w) + η 2 w x0 2 or all w near p and i suiciently large, (6) then p is the proximal point o at x 0 : p = p R (x 0 ). Furthermore, ( lim ϕni,η ni i (xni+ ) + µ ni 2 xni+ x 0 2) = (p R (x 0 )) + R 2 p R(x 0 ) x 0 2. (7) I is continuous and condition (6) holds near all accumulation points o {x n }, then the entire sequence {x n } converges to the proximal point p R (x 0 ) and (7) holds or the entire sequence. Proo. Without loss o generality, we only consider the concluding portion o the sequence {x n }, so assume η n = η and µ n = µ throughout. Recall that in Lemma 5, or the bounded sequence {x n } we showed that x n+ x n 0 as n. Thereore x ni p as i implies x ni+ p as i. For w near p, by equation (6) and the act µ(x 0 x n+ ) is a subgradient o the convex unction ϕ n,η at x n+, we have (w) ϕ ni,η (w) η 2 w x0 2 ϕ ni,η (xni+ ) η 2 w x0 2 + µ x 0 x ni+, w x ni+ = ϕ ni,η (xni+ ) η 2 xni+ x η 2 xni+ x 0 2 η 2 w x0 2 + µ x 0 x ni+, w x ni+ = ϕ ni,η (xni+ ) η 2 xni+ x (µ + η) x 0 x ni+, w x ni+ η 2 w xni+ 2. Condition (3d) written with n = n i or w = x ni+ gives the ollowing inequality, ϕ ni,η (xni+ ) 2 η xni+ x 0 2 (x ni )+ 2 η xni x η xni+ x g ni +η(x ni x 0 ), x ni+ x ni,

9 Computing proximal points 9 or some g ni (x ni ). It ollows that (w) ϕ ni,η (xni+ ) η 2 xni+ x (µ + η) x 0 x ni+, w x ni+ η 2 w xni+ 2 (x ni ) + 2 η xni x η xni+ x g ni + η(x ni x 0 ), x ni+ x ni +R x 0 x ni+, w x ni+ 2 η w xni+ 2. Passing to the limit as i, and using that is continuous at p, yields the relations (w) lim ϕ i ni,η (xni+ ) η 2 p x0 2 + (µ + η) x 0 p, w p η w p 2 2 (p) + R x 0 p, w p η 2 w p 2, or all w near p. Since η 2 w p 2 = o( w p ), the last inequality means that R(x 0 p) (p), an inclusion equivalent to having p = p R (x 0 ), by Theorem 4. Furthermore, evaluating the relations at w = p shows that lim i ϕ n i,η (xni+ ) = (p) + η 2 p x0 2, which is equivalent to equation (7). Convergence o the whole sequence {x n }, when is continuous and (6) holds at all accumulation points ollows rom the act that, or prox-regular unctions, the proximal point is unique. Theorem 6 shows convergence o the sequence o iterates to the proximal point, thereby extending to the nonconvex setting Proposition 4.3 o [9]. It urther shows the corresponding optimal values in (2) to converge to the value o the proximal envelope at x 0. We now proceed to construct adequate models with inormation provided by the black box. 4 Building the approximations In this section we consider a constructive manner to deine along iterations model unctions, ϕ n,ηn, as well as convexiication parameters, η n. Model prox-parameters, µ n, will essentially be deined to ensure that R = η n + µ n. An important dierence with respect to the convex case arises at this stage. Since the model unction approximates + η 2 x0 2 or varying η, we need not only the unctional and subgradient values o, but also those o the quadratic term x 0 2. Accordingly, at iteration n, our bundle o inormation is composed o B n := { x i, i := (x i ), g i (x i ) } i I n, with I n {0,,..., n}. (8) The corresponding quadratic bundle elements, computable rom B n, are denoted and deined by Bq n := { iq := 2 } xi x 0 2, g iq := x i x 0. (9) i I n The ollowing useul relation holds or any pair o indices i, j: j q i q g i q, x j x i = 2 xi x j 2. (0) With the notation introduced in (8) and (9), the model unction used to compute x n+ is given by ϕ n,ηn (w) := max i I n { i + η n i q + g i + η n g i q, w x i }. () Since ϕ n,ηn is a piecewise aine unction, inding x n+ via equation (2) amounts to solving a linearquadratic programming problem on a simplex; see [6], [2] or specially designed solvers.

10 0 Hare and Sagastizábal In order to solve (2), we must also deine the convexiication and model prox- parameters. For the convexiication parameter, in particular, we desire to eventually satisy conditions (3) and (6). Although it is not practical (nor possible with just black box inormation) to check an ininite number o points, we can update η so that condition (6) holds at least or all points in the bundle. The next lemma states a computable lower bound which ensures satisaction o (6), written with η replaced by η n, or all bundle elements; see (3) below. Lemma 7 Consider the bundle deined by (8)-(9) at iteration n with I n containing at least two elements and the value η n := max i j I n j i g j, xj x i 2 xj x i 2. (2) The unction ϕ n,ηn deined by () at iteration n satisies ϕ n,ηn (x j ) j + η n j q or all j I n (3) whenever η n η n. Proo. We begin by noting that equation (3) holds i and only i or η = η n i + η i q + g i + ηg i q, x j x i j + η j q or all i, j I n+. Using (0) we see satisaction o (3) is equivalent to having i j gi, xi x j = i j + gi, xj x i 2 xj x i 2 q j q i gq, i x j x i η or all i j I n when η = η n. Hence or any η n η n equation (3) holds. We now give a computable rule based on the lower bound in Lemma 7 that deines convexiication and model prox- parameters in a monotone relation (nondecreasing or {η n } and nonincreasing or {µ n }). The rule makes use o a convexiication growth parameter Γ, ixed at the beginning o the algorithm. Proposition 8 Fix a convexiication growth parameter Γ >. For n, consider the bundle sets deined by (8)-(9) at iteration n. Suppose the parameters satisy the relation µ n + η n = R with µ n positive, and that the index set I n contains at least two elements. Having the lower bound η n rom (2), deine new parameters according to the rule η n := η n and µ := µ n i η n η n η n := Γ η n and µ n := R Γ η n i η n > η n. (4) Then, either both parameters remain unchanged or they are strictly monotone: η n < η n and µ n > µ n. Moreover, η n max{η n, η n } and η n + µ n = R. Lastly, i η n < R/Γ, then η n < R and µ n > 0 positive. Proo. All o the statements ollow easily rom rule (4).

11 Computing proximal points Remark 9 As mentioned in Subsection., nonnegativity o linearization errors is crucial or nonconvex bundle methods. The i th linearization error or (at x 0 ) is deined by ( ) e i = (x0 ) i + g i, x 0 x i. For the convexiied unction η := + η 2 x0 2, linearization errors have the expression ( ) = (x 0 ) i + ηq i + g i + ηgq, i x 0 x i. e η i From the relation e η i = e i + η 2 xi x 0 2, we see that the value η n calculated in equation (2) is nothing but the smallest value o η ensuring that e η i 0 or all i I n. As a result, or any value o η η n, the corresponding unction η will appear convex, as ar as it can be seen rom the available (bundle) inormation. Our rule (4) makes use o η n in a manner that can be viewed as a primal consistent method o orcing linearization errors to be nonnegative. At this stage, it is interesting to reer to previous work exploring modiications o the cutting-planes model, although restricted to a convex unction. The modiied model unction introduced by [36] and urther developed by [7], is the maximum over i I n o tilted aine unctions, given by i + ( θ) gi, xi i e i > θ κ gi, x0 x i ) κ i + ( κ)(x0 ) + ( θ) g i, ( κ θ xi + ( κ θ )x0 otherwise, where κ ]0, ] and θ [0, κ] are given parameters. In the expression above, note that the slope o the linearization is always tilted by a actor θ. In addition, the linearization is shited and (the slope urther tilted) i the error e i is not suiciently positive, a consideration related to our deinition o η. The original proposal in [36] corresponds to always deining the linearization by the irst line above, or θ [0, /2]. It is motivated by extending a modiied Newton s method or inding multiple roots o a polynomial to the minimization o a strictly convex quadratic unction. This is probably the reason why it is argued in [7] that tilting errors avoids visiting successive corners o the epigraph o and speeds up convergence, by employing second order inormation in the ridge (i.e., in the U-subspace in [27]). However, the resulting model can cut the graph o the (convex) unction, unlike the method we present here. More precisely, when the unction is convex, the numerator in (2) will always be nonpositive, yielding η n 0 or all n. By (4), η n never increases and, when started with η 0 = 0, our model simply reduces to a standard cutting-planes model; see also the comments ollowing Corollary 2 below. An important eect o our parameter update is that, whenever the unction + η n x 0 2 is (locally) convex, the rule leaves the parameters unchanged, as shown in the next lemma. Lemma 0 For n, consider the bundle sets deined by (8)-(9) at iteration n. Suppose the parameters satisy the relation µ n + η n = R with µ n positive, and that the index set I n contains at least two elements. I the unction + ˆη 2 x0 2 is convex on an open ball containing {x i : i I n }, then the lower bound rom (2) satisies η n ˆη. Hence, i the unction + η n 2 x0 2 is convex on an open ball containing {x i : i I n }, rule (2)-(4) sets η n = η n. Proo. Note that or arbitrary i I n, g i + ˆηgi q ( + ˆη 2 x0 2 )(x i ). By the convexity assumption, the corresponding subgradient inequalities yield the relations i + ˆη i q + g i + ˆηg i q, x j x i j + ˆη j q or any i, j I n. Using (0) we see that i j + gi, xj x i 0.5 x j x i 2 ˆη or all i j I n.

12 2 Hare and Sagastizábal Thus, the lower bound rom (2) satisies the inequality η n ˆη, and the inal result ollows. Another important consequence o Proposition 8 is that when the lower bound η n is bigger than the prox-parameter R, the relation in (3) cannot hold with η R, and nothing can be said about the convergence o the approximal sequence. In the algorithm stated in Section 6, we use such lower bound to check i the given prox-parameter R is suiciently large or our methodology to work. The size o quadratic programming problems (2) is directly related to the bundle cardinality, i.e., to the number o indices in I n. For the sake o eiciency, such cardinality should not grow beyond control as n increases. So ar, in rule (2)-(4) the only requirement or index sets is to contain more than one element. In order to ensure satisaction o (3b) and (3d), these elements should at least correspond to the prox-center and to the most recent approximal point ({0, n + } I n+ ). Satisaction o condition (3c) is guaranteed i all active bundle points at x n+ enter the index set I n+ ; this is the purpose o the index set Jn act deined below. Accordingly, or a given iteration n, we deine a amily o parameters and model unctions F := {µ n, η n, ϕ n,ηn } n via the ollowing criteria: (F-i) ϕ n,ηn is given by () with I n ; (F-ii) or n, having parameters satisying the relation µ n + η n = R with µ n positive, η n and µ n are deined by (2)-(4); (F-iii) each index set is chosen to satisy I n {0, n}; and { } (F-iv) I n+ Jn act := i I n : ϕ n,ηn (x n+ ) = i + η nq i + g i + gi q, x n+ x i. We now show how these properties combine to yield satisaction o conditions (3a) to (3d). Proposition Consider a amily o parameters and model unctions F := {µ n, η n, ϕ n,ηn } n deining iterates x n+ by (2). At iteration n the ollowing hold: (i) I the amily satisies (F-i), then {ϕ n,ηn } n satisies condition (3a). (ii) I or a ixed convexiication growth parameter Γ > the amily satisies (F-ii) and (F-iii), and the prox-parameter R is large enough or the lower bound deined in (2) to satisy η n < R/Γ, then η n < R and ϕ n,ηn (x j ) j + η n j q or all j I n. Thereore, {ϕ n,ηn } n satisies condition (3b). (iii) I η n+ = η n, and the amily satisies (F-i) and (F-iv), then {ϕ n,ηn } n satisies condition (3c). (iv) I the amily satisies (F-i) and (F-iii), then {ϕ n,ηn } n satisies condition (3d). Proo. [Item (i)] By (F-i), ϕ n,ηn is deined in () as the pointwise maximum o a inite collection o aine unctions, thereore convex, and item (i) holds. [Item (ii)] I η n < R/Γ, by Proposition 8, R > η n max{η n, η n }. In particular, η n η n, so ϕ n,ηn (x j ) j +η nq j or all j I n, by Lemma 7. Condition (3b) ollows rom applying this inequality at j = 0 (because 0 I n by (F-iii)), recalling that q 0 = 0: ϕ n,ηn (x 0 ) 0 + η nq 0 = 0. [Item (iii)] Suppose that η n+ = η n and the amily satisies (F-i) and (F-iv). Writing the optimality conditions o equation (2), we see there exists an optimal nonnegative multiplier ᾱ IR In such that ᾱ j > 0 or all j Jn act and ᾱ i = i IR In ϕ n,ηn (x n+ ) = ( ᾱ i i + η n q i + g i + η n gq, i x n+ x i ) i J act n s n+ = µ n (x 0 x n+ ) = It ollows that ϕ n,ηn (x n+ ) = i J act n i J act n ᾱ i ( g i + η n g i q). ( ᾱ i i + η n q i ) ᾱ i g i + η n gq, i x i + s n+, x n+. i Jn act By (F-i), using () with n replaced by n +, or all w IR N and each i I n+ ϕ n+,ηn+ (w) i + η n+q i + g i + η n+gq, i w x i. As a result, writing the inequality or the convex sum o indices

13 Computing proximal points 3 j J act n ϕ n+,ηn+ (w) = and using that η n+ = η n we see that i J act n i J act n = i J act n ᾱ i ϕ n+,ηn+ (w) ( ᾱ i i + η n+ q i ) + ᾱ i g i + η n+ gq, i w x i i Jn act ) + ᾱ i ( i + η n i q i J act n ᾱ i g i + η n g i q, w x i = ϕ n,ηn (x n+ ) s n+, x n+ + s n+, w, which is condition (3c). [Item (iv)] Setting i = n in () (n I n because o (F-iii)) yields condition (3d). I the unction + η n 2 x0 2 ever becomes (locally) convex, the application o rule (2)-(4) in (F-ii) leaves the subsequent parameters unchanged. Corollary 2 Consider a amily o parameters and model unctions F = {µ n, η n, ϕ n,ηn } n chosen to satisy (F-i)-(F-iv). I at some iteration n the convexiication parameter η n < R is such that the unction + η n 2 x0 2 is convex on a ball containing {x i : i I n }, then (F-ii) reduces to: (F-ii ) η n+ = η n and µ n+ = R η n = µ n. Proo. This ollows easily rom Lemma 0. When + η n 2 x0 2 is nonconvex, the model unctions may not deine a cutting-planes model or the unction. When is a convex unction, rom Corollary 2 we see that (F-ii) will always set η n = η 0 (= 0) and µ n = µ 0 (= R). Since in this case the quadratic bundle set Bq n is just the zero vector, our method becomes a standard cutting-planes approximation or estimating the proximal point, as in [], [9]. In the case when is only locally convex, the application o Corollary 2 requires some knowledge o the region where the iterates lie. In the next lemma we see that or lower-c 2 unctions, by careully selecting the prox-parameter R, it is possible to keep each new iterate in a ball around the prox-center. Lemma 3 (lower-c 2 and iteration distance) Suppose the unction is lower-c 2 on V and x 0 V. Let ε > 0, K > 0, and ρ > 0 be given by Lemma 2 written with x = x 0. Consider a amily o parameters and model unctions F := {µ n, η n, ϕ n,ηn } n deining iterates x n+ by (2). I at iteration n {x i : i I n } B ε (x 0 ) and R > 2K ε + 3η n, then the next iterate x n+ generated by equation (2) also lies within B ε (x 0 ). In particular, i Γ > is a ixed convexiication growth parameter, η 0 = 0, and R > 2K ε + 3Γ ρ, then all iterates lie within B ε (x 0 ). Proo: Recall that, by deinition (), ϕ n,ηn (w) = max i I n { i + η n i q + g i + η n g i q, w x i }, with x n+ = p µn ϕ n,ηn (x 0 ), by equation (2). In order to obtain a bound or x n+ x 0 via Lemma 3, we start by showing that ϕ n,ηn is Lipschitz continuous on IR N. Since, by assumption, {x i : i I n } B ε (x 0 ), and by Lemma 2, is Lipschitz continuous on B ε (x 0 ), any subgradient g i (xi ) satisies g i K. Together with the bound gi q ε, we obtain or ϕ n,ηn the Lipschitz constant L = max i I n { g i + η n g i q } max i I n { g i } + εη n K + εη n.

14 4 Hare and Sagastizábal Lemma 3 written with = ϕ n,ηn and our assumptions yield the desired result: x n+ x 0 2L µ n 2(K + εη n) R η n < ε. We prove the inal statement o the lemma by induction on n. Given R > 2K/ε+3Γ ρ, the statement clearly holds or n = 0. I {x i : i I n } B ε (x 0 ) and η n Γ ρ, we have just shown that x n+ B ε (x 0 ). Then, by Lemma 0, η n ρ and hence, by rule (4), either η n+ = η n Γ ρ, or η n+ = Γ η n Γ ρ. In both cases, η n+ Γ ρ, and the result ollows. Beore continuing to the an actual implementation o our algorithm, we provide one inal discussion as to its convergence or lower-c 2 unctions. Theorem 4 (lower-c 2 ) Suppose the lsc unction is prox-bounded with threshold r pb. Let the proxcenter x 0 IR N and prox-parameter R > r pb be given. Suppose the unction is lower-c 2 on V and x 0 V. Let ε > 0, K > 0, ρ > 0, and R x 0 be given by Lemma 2 and Theorem 4 written with x = x 0. For n = 0,,..., consider the sequence {x n } deined by (2), where the amily o parameters and model unctions F = {µ n, η n, ϕ n,ηn } n is chosen according to (F-i)-(F-iv) and the convexiication growth parameter Γ > is ixed. Then the ollowing holds: (i) I R > 2K/ε + 3Γ ρ, then (3) holds and the sequence o approximal points converges to some point p IR N. (ii) I at any iteration, η m ρ and R > 4K/ε + 3η m, then or all n m the amily satisies conditions (3) and (6). In this case, the sequence o approximal points converges to p R (x 0 ), and ( ) lim ϕ n n,ηn (x n+ ) + µ n 2 xn+ x 0 2 = (p R (x 0 )) + R 2 p R(x 0 ) x 0 2. Proo. In item (i) we assume R > 2K/ε+3Γ ρ and use Lemma 3 to ensure that the sequence {x n } lies in the open ball B ε (x 0 ). To see item (i), we begin by recalling that, because the unction + ρ 2 x0 2 is convex on B ε (x 0 ), the lower bound rom (2) satisies η n ρ < R (by Lemma 0 written with ˆη = ρ). Hence, by items (i), (ii), and (iv) in Proposition, conditions (3a), (3b), and (3d) hold. To show satisaction o conditions (3c) and (3e), we need only to show that eventually η n becomes constant (condition (3c) would then ollow rom Proposition, item (iii)). Suppose, or contradiction purposes, that {η n } never stabilizes. Then, rom some iteration m on, the convexiication parameters must become positive: η n η m > 0(= η 0 ) or n m. For the sequence not to stabilize, there must be an ininite number o iterations n m during which η n = Γ η n > Γ η n. But this process can only be repeated a inite number o times, because, since Γ >, the update will eventually result in η n ρ. Let m 2 denote the irst o such iterations. Since η m2 ρ, and the unction + η m 2 x0 2 is convex, by Corollary 2, we have that η n = η m2 =: η or all subsequent iterations. Thereore, convexiication parameters stabilize, and (3) holds. Then, by item (ii) in Lemma 5, the sequence o approximal points converges to some point p IR N. In item (ii) we assume R > 4K/ε + 3η m and use again Lemma 3. Since η m ρ and x m+ is in B ε (x 0 ), by Corollary 2 we have that η m+ = η m ρ. By mathematical induction we thereore have x n B ε (x 0 ) and, hence, η n = η m or all n m. Moreover, + η m 2 x0 2 is convex on B ε (x 0 ). An argument similar to the one in item (i) shows satisaction o conditions (3) and (6), with (6) holding or all w B ε (x 0 ). Since lower-c 2 unctions are continuous, and R > max{4k/ε, η m, r pb } max{4k/ε, ρ, r pb }, Theorem 6 completes the proo. 5 Algorithmic Implementation In this section we discuss eective implementation o the algorithm outlined in Section 3 and analyze its convergence. We start with the important question o deining stopping tests.

15 Computing proximal points 5 5. Developing a stopping criterion We return once more to the algorithm s original inspiration, i.e., considering that or some η > 0 the unction + η 2 x0 2 is somehow more convex than the original unction. With this relation in mind, we reexamine the stopping test or computing proximal points o convex unctions. For a convex unction h, cutting-planes model unctions satisy ϕ n h, thus the relation R(x 0 x n+ ) ϕ n (x n+ ) implies that or all w IR N h(w) ϕ n (w) h(x n+ ) + R x 0 x n+, w x n+ (h(x n+ ) ϕ n (x n+ )). In particular, or w = p R h(x 0 ), we obtain h(p R h(x 0 )) h(x n+ ) + R x n+ x 0, x n+ p R h(x 0 ) (h(x n+ ) ϕ n (x n+ )). Likewise, since p R h(x 0 ) is characterized by the inclusion R(x 0 p R h(x 0 )) h(p R h(x 0 )), the associated subgradient inequality written at x n+ gives h(x n+ ) h(p R h(x 0 )) + R x 0 p R h(x 0 ), x n+ p R h(x 0 ). The two inequalities above combined yield 0 R x n+ p R h(x 0 ), x n+ p R h(x 0 ) (h(x n+ ) ϕ n (x n+ )), i.e., R x n+ p R h(x 0 ) 2 h(x n+ ) ϕ n (x n+ ). Thereore, testing i the approximal estimate satisies h(x n+ ) ϕ n (x n+ ) R < tol 2 stop, where tol stop is a stopping tolerance, can be used as a stopping criterion in the convex case. For our nonconvex unction, we seek p R (x 0 ) via p µn ( + η n 2 x0 2 ). I + η n 2 x0 2 ) is convex on a ball containing the points o interest, then the convex stopping criterion, written with the notation o (8)-(9), becomes i n+ + η n n+ q ϕ n,ηn (x n+ ) R η n tol 2 stop then x n+ p R (x 0 ) tol stop. (5) In general, we have no way o knowing when or where + η n 2 x0 2 is convex, so this stopping criterion may lead to alse positive outcomes. However, i is a lower-c 2 unction, there exists some positive η such that + η 2 x0 2 is convex locally near the points o interest. Thus, a reasonable stopping condition might be that (x n+ ) + η 2 xn+ x 0 2 ϕ n,η (x n+ ) tol 2 R η stop, or some big enough η [η n, R). The next proposition shows that this stopping criterion is indeed stronger than the one in equation (5). We also analyze a second test, where ϕ n,η (x n+ ) (an unknown value) is replaced by ϕ n,ηn (x n+ ) (a computable quantity). Theorem 5 (Stopping Criterion) Suppose x n+ is computed as the solution to (2) with model unction given by (). For any positive η consider model unctions ϕ n,η deined by () written with η n replaced by η. For the test unction η T n (η) := (x n+ ) + η 2 xn+ x 0 2 ϕ n,η (x n+ ), the ollowing holds: (i) T n is a positive increasing unction or η η n, where η n is deined in (2). (ii) T n (η) (x n+ ) + η 2 xn+ x 0 2 ϕ n,ηn (x n+ ) or any η η n.

16 6 Hare and Sagastizábal Proo. Following the notation o (8)-(9), since ϕ n,η is deined as in (), using (0) with j = n +, we obtain that T n (η) = n+ + η n+ q ϕ n,η (x n+ ) = n+ + ηq n+ max{ i + ηq i + g i + ηg i i I n q, x n+ x i } { = min n+ i I i g i, x n+ x i + η(q n+ q i g i n q, x n+ x 0 ) } { = min n+ i I i g i, x n+ x i + η } n 2 xn+ x i 2. By Lemma 7 (equation (2) with (i, j) therein replaced by (n +, i)), the act that η η implies that η 2 xn+ x i 2 i n+ g i, x i x n+. Thus, T n (η) is positive and increasing or all η η n, and item (i) ollows. I η η n, then clearly n+ +ηq n+ ϕ n,ηn (x n+ ) n+ +η n q n+ ϕ n,ηn (x n+ ). Utilizing equations () and (0) we see that n+ + ηq n+ ϕ n,ηn (x n+ ) n+ + η n q n+ ϕ n,ηn (x n+ ) n+ + ηq n+ i η q i g i + ηgi q, x n+ x i = min i I n +(η n η)(q n+ q i gq, i x n+ x 0 ) n+ + ηq n+ ϕ n,η (x n+ ) + min{ i I n 2 (η n η) x n+ x i 2 } = T n (η) + min i In { 2 (η η n) x n+ x i 2 }. Since η η n, the quadratic term above is nonnegative, and item (ii) ollows. Our algorithm uses in the stopping test a quotient whose numerator is the right hand side expression in Theorem 5(ii), written with η = R tol µ, or tol µ (0, R] a given tolerance. For the denominator, along the lines o (5), we use R η = tol µ. 5.2 The algorithm We are now in position to ormally deine our algorithm or computing proximal points o nonconvex unctions. It requires the choice o some parameters used or stopping tests, that can be activated i no signiicant progress is observed (parameters min length and max short ), i µ n becomes too small (parameter tol µ ), or i convergence is achieved (parameter tol stop ). Algorithm (CompProx) A prox-center x 0 IR N and a prox-parameter R > 0 are given, as well as a black box procedure evaluating, or each x IR N, the unction value (x) and one subgradient in (x). Step 0 (Initialization) Choose a convexiication growth parameter Γ >, a minimum step length min length 0, a maximum number o allowed short steps max short 0, a µ-tolerance level tol µ (0, R], and a stopping tolerance tol stop 0. Compute the black box inormation 0 := (x 0 ) and g 0 (x0 ). Let q 0 := 0, gq 0 := 0, I 0 := {0}, µ 0 := R and η 0 := 0. Set n = 0 and the short steps counter short = 0.

Roberto s Notes on Differential Calculus Chapter 8: Graphical analysis Section 1. Extreme points

Roberto s Notes on Differential Calculus Chapter 8: Graphical analysis Section 1. Extreme points Roberto s Notes on Dierential Calculus Chapter 8: Graphical analysis Section 1 Extreme points What you need to know already: How to solve basic algebraic and trigonometric equations. All basic techniques