LINE SEARCH ALGORITHMS FOR LOCALLY LIPSCHITZ FUNCTIONS ON RIEMANNIAN MANIFOLDS


Tech. report, INS Preprint No. 1626

LINE SEARCH ALGORITHMS FOR LOCALLY LIPSCHITZ FUNCTIONS ON RIEMANNIAN MANIFOLDS

SOMAYEH HOSSEINI, WEN HUANG, ROHOLLAH YOUSEFPOUR

Abstract. This paper presents line search algorithms for finding extrema of locally Lipschitz functions defined on Riemannian manifolds. To this end we generalize the so-called Wolfe conditions to nonsmooth functions on Riemannian manifolds. Using ε-subgradient-oriented descent directions and the Wolfe conditions, we propose a nonsmooth Riemannian line search algorithm and establish the convergence of our algorithm to a stationary point. Moreover, we extend the classical BFGS algorithm to nonsmooth functions on Riemannian manifolds. Numerical experiments illustrate the effectiveness and efficiency of the proposed algorithm.

Key words and phrases: Riemannian manifolds, Lipschitz functions, descent directions, Clarke subdifferential.
AMS Subject Classifications: 49J52, 65K05, 58C05.
Somayeh Hosseini, Hausdorff Center for Mathematics and Institute for Numerical Simulation, University of Bonn, 53115 Bonn, Germany (hosseini@ins.uni-bonn.de). Wen Huang, Department of Computational and Applied Mathematics, Rice University, Houston, USA (huwst08@gmail.com). Rohollah Yousefpour, Department of Mathematical Sciences, University of Mazandaran, Babolsar, Iran (yousefpour@umz.ac.ir).

1. Introduction

This paper is concerned with the numerical solution of optimization problems defined on Riemannian manifolds where the objective function may be nonsmooth. Such problems arise in a variety of applications, e.g., in computer vision, signal processing, motion and structure estimation and numerical linear algebra; see for instance [1, 2, 9, 28].

In the linear case, it is well known that the line search strategy is one of the basic iterative approaches for finding a local minimum of an objective function. For smooth functions defined on linear spaces, each iteration of a line search method computes a search direction and then decides how far to move along that direction. Let f: R^n → R be a smooth function and let a direction p be given; define φ(α) = f(x + αp). Finding a step size α in the direction p such that φ(α) < φ(0) is the line search problem. If we find α such that the objective function in the direction p is minimized, the line search is called exact; if we choose α such that the objective function has an acceptable descent amount, it is called inexact. Theoretically, an exact line search need not accelerate a line search algorithm, due to, e.g., the hemstitching phenomenon. Practically, exact optimal step sizes generally cannot be found, and it is also expensive to compute nearly exact step sizes. Therefore the inexact line search, with its lower computational load, is highly popular.

A popular inexact line search condition stipulates that α should first of all give a sufficient decrease in the objective function f, as usually measured by the following inequality, named the Armijo condition:

    f(x + αp) − f(x) ≤ c_1 α ⟨grad f(x), p⟩_2,    (1.1)

for some c_1 ∈ (0, 1), where grad f(x) denotes the gradient of f at x and ⟨u, v⟩_2 denotes the Euclidean inner product u^T v. To rule out unacceptably short steps, a second requirement called the curvature condition is used, which requires α to satisfy

    ⟨p, grad f(x + αp)⟩_2 ≥ c_2 ⟨grad f(x), p⟩_2,

for some c_2 ∈ (c_1, 1), where c_1 is the constant in (1.1). If α satisfies the Armijo and curvature conditions, then we say α satisfies the Wolfe conditions.
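As a concrete illustration (ours, not part of the original text), the following Python sketch checks these two inequalities for a smooth Euclidean objective; the constants c1 and c2 below are conventional illustrative defaults, not values taken from the paper.

```python
import numpy as np

def satisfies_wolfe(f, grad_f, x, p, alpha, c1=1e-4, c2=0.9):
    """Check the (smooth, Euclidean) Armijo and curvature conditions for a step alpha.

    f and grad_f are callables returning f(x) and grad f(x) for numpy arrays x.
    """
    gp = np.dot(grad_f(x), p)                      # <grad f(x), p>, negative for a descent direction
    armijo = f(x + alpha * p) - f(x) <= c1 * alpha * gp
    curvature = np.dot(grad_f(x + alpha * p), p) >= c2 * gp
    return armijo and curvature

# toy usage: quadratic f(x) = 0.5 ||x||^2 with the steepest descent direction
f = lambda x: 0.5 * np.dot(x, x)
grad_f = lambda x: x
x0 = np.array([1.0, -2.0])
print(satisfies_wolfe(f, grad_f, x0, -grad_f(x0), alpha=0.5))
```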

In smooth optimization algorithms on linear spaces, in order to choose the search direction p_k at x_k we need the angle θ_k, defined below, to be bounded away from 90°:

    cos θ_k = −⟨grad f(x_k), p_k⟩_2 / (‖grad f(x_k)‖ ‖p_k‖).    (1.2)

The convergence results are obtained from the Armijo condition together with a safeguard against too small step sizes; see [24]. Indeed, classical convergence results establish that accumulation points of the sequence of iterates are stationary points of the objective function f; convergence of the whole sequence to a single limit point is not guaranteed. The question is whether similar results hold for nonsmooth optimization problems. In [32], the authors generalized the aforementioned Wolfe conditions to nonsmooth convex functions, using the Clarke subdifferential instead of the gradient. But to obtain convergence, one must not only have well-chosen step lengths but also well-chosen search directions. In nonsmooth problems the angle condition (1.2) does not yield a proper set of search directions; however, an equivalent condition carried over to nonsmooth problems to obtain convergence results has been introduced in [25].

Euclidean spaces are not the only spaces on which optimization algorithms are used; there are many applications of optimization on Riemannian manifolds. A manifold, in general, does not have a linear structure, hence the usual techniques used to study optimization problems on linear spaces cannot be applied and new techniques need to be developed. The common denominator of optimization methods on manifolds is that, instead of conducting a linear step during the line search procedure, one uses retractions or defines the step along a geodesic via the exponential map.

Contribution. Our main contributions are fourfold. First, we generalize the concept of a subgradient-oriented descent sequence from [25] to Riemannian manifolds, and we define a new notion called an ε-subgradient-oriented descent sequence. We then present a numerical search direction algorithm to find a descent direction for a nonsmooth objective function defined on a Riemannian manifold. In this algorithm, we use a positive definite matrix P in order to define a P-norm equivalent to the norm induced by the inner product on the tangent space. If we use the identity matrix and, therefore, work with the induced norm on the tangent space, then the algorithm reduces to the descent search algorithm presented in [8]. Second, we define a nonsmooth Armijo condition on Riemannian manifolds, which generalizes the nonsmooth Armijo condition presented in [32] to a Riemannian setting. As in Euclidean spaces, we can add a curvature condition to the nonsmooth Armijo condition to obtain a nonsmooth generalization of the Wolfe conditions on Riemannian manifolds. This curvature condition is a Riemannian version of the curvature condition presented in [32]; however, because different tangent spaces are involved, it is not a trivial generalization and a notion of vector transport is needed. We also present numerical line search algorithms for finding a suitable step length satisfying the Wolfe conditions for nonsmooth optimization problems on Riemannian manifolds and study the behavior of these algorithms. The ideas of these algorithms are inspired by similar algorithms from [33].
Third, we combine the search direction algorithm with the line search algorithm to define a minimization algorithm for nonsmooth optimization problems on Riemannian manifolds. To prove convergence of our minimization algorithm, we need a sequence of ε-subgradient-oriented descent directions; hence it is important to update the sequence of positive definite matrices, which define the equivalent norms on the tangent spaces, in such a way that the sequences of their smallest and largest eigenvalues remain bounded. As our last contribution, we present a practical strategy for updating the sequence of matrices that imposes such a condition on the sequences of eigenvalues. This strategy can be seen as a version of the nonsmooth BFGS method on Riemannian manifolds, which is presented in this setting for the first time, and can be considered a generalization of the smooth BFGS method on Riemannian manifolds in [4]. To the best of our knowledge, this version of nonsmooth BFGS has not been presented before even for optimization problems on linear spaces; therefore it is new not only in the Riemannian setting but also in the linear one.

This paper is organized as follows. Section 2 presents the proposed Riemannian optimization method for nonsmooth cost functions. Specifically, Sections 2.1 and 2.2 respectively analyze the line search conditions and the search directions for nonsmooth functions theoretically. Sections 2.3 and 2.4 respectively give practical approaches to compute a search direction and a step size. Section 2.5 combines the search direction with the line search algorithm and gives a minimization algorithm. This algorithm can be combined with the BFGS strategy, and the result is presented in Section 3. Finally, experiments that compare the proposed algorithm with the Riemannian BFGS and Riemannian gradient sampling methods are reported in Section 4.

Previous Work. For smooth optimization on Riemannian manifolds, line search algorithms have been studied in [1, 27, 30, 31]. When considering optimization problems with nonsmooth objective functions on Riemannian manifolds, it is necessary to generalize concepts of nonsmooth analysis to Riemannian manifolds. In the past few years a number of results have been obtained on numerous aspects of nonsmooth analysis on Riemannian manifolds [3, 4, 10, 11, 12, 21]. Papers [8, 9] are among the first on numerical algorithms for the minimization of nonsmooth functions on Riemannian manifolds.

2. Line search algorithms on Riemannian manifolds

In this paper, we use the standard notations and known results of Riemannian manifolds; see, e.g., [8, 29]. Throughout this paper, M is an n-dimensional complete manifold endowed with a Riemannian metric ⟨·,·⟩ on the tangent spaces. We identify the tangent space of M at a point x, denoted by T_xM, with the cotangent space at x (via the Riemannian metric), denoted by T_x^*M. We denote by cl N the closure of a set N. Also, for a nonempty closed subset S of a Riemannian manifold M, we define dist_S: M → R by dist_S(x) := inf{dist(x, s) : s ∈ S}, where dist is the Riemannian distance on M. We use a class of mappings called retractions.

Definition 2.1 (Retraction). A retraction on a manifold M is a smooth map R: TM → M with the following properties, where R_x denotes the restriction of R to T_xM:
- R_x(0_x) = x, where 0_x denotes the zero element of T_xM;
- with the canonical identification T_{0_x}T_xM ≃ T_xM, DR_x(0_x) = id_{T_xM}, where id_{T_xM} denotes the identity map on T_xM.

By the inverse function theorem, R_x is a local diffeomorphism. For example, the exponential map exp: TM → M, v ∈ T_xM ↦ exp_x(v) = γ(1), where γ is the geodesic starting at x with initial tangent vector v, is a retraction; see [1]. We define B_R(x, ε) to be {R_x(η_x) : ‖η_x‖ < ε}. If the retraction R is the exponential map, then B_R(x, ε) is the open ball centered at x with radius ε.

By using retractions, we extend the concepts of nonsmooth analysis to Riemannian manifolds. Let f: M → R be a locally Lipschitz function on a Riemannian manifold. For x ∈ M, we let \hat f_x = f ∘ R_x denote the restriction of the pullback \hat f = f ∘ R to T_xM. Recall that if G is a locally Lipschitz function from a Banach space X to R, the Clarke generalized directional derivative of G at the point x ∈ X in the direction v ∈ X, denoted by G°(x; v), is defined by

    G°(x; v) = limsup_{y → x, t ↓ 0} (G(y + tv) − G(y)) / t,

and the generalized subdifferential of G at x, denoted by ∂G(x), is defined by ∂G(x) := {ξ ∈ X* : ⟨ξ, v⟩ ≤ G°(x; v) for all v ∈ X}. The Clarke generalized directional derivative of f at x in the direction p ∈ T_xM, denoted by f°(x; p), is defined by f°(x; p) = \hat f_x°(0_x; p), where \hat f_x°(0_x; p) denotes the Clarke generalized directional derivative of \hat f_x: T_xM → R at 0_x in the direction p ∈ T_xM. The generalized subdifferential of f at x, denoted by ∂f(x), is then defined by ∂f(x) = ∂\hat f_x(0_x). A point x is a stationary point of f if 0 ∈ ∂f(x). A necessary condition for f to achieve a local minimum at x is that x is a stationary point of f; see [8, 10]. Theorem 2.2 can be proved along the same lines as [10, Theorem 2.9].

Theorem 2.2. Let M be a Riemannian manifold, x ∈ M, and f: M → R be Lipschitz of constant L near x, i.e., |f(x) − f(y)| ≤ L dist(x, y) for all y in a neighborhood of x. Then
(a) ∂f(x) is a nonempty, convex, compact subset of T_xM, and ‖ξ‖ ≤ L for every ξ ∈ ∂f(x);
(b) for every v ∈ T_xM, we have f°(x; v) = max{⟨ξ, v⟩ : ξ ∈ ∂f(x)};
(c) if {x_i} and {ξ_i} are sequences in M and TM such that ξ_i ∈ ∂f(x_i) for each i, and if {x_i} converges to x and ξ is a cluster point of the sequence {ξ_i}, then ξ ∈ ∂f(x);
(d) ∂f is upper semicontinuous at x.
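To make Definition 2.1 concrete, here is a small Python sketch (ours, not the paper's code) of two retractions on the unit sphere S^{n−1} ⊂ R^n: the exponential map and the projective retraction. Both satisfy R_x(0_x) = x and DR_x(0_x) = id_{T_xM}.

```python
import numpy as np

def exp_sphere(x, v):
    """Exponential-map retraction on the unit sphere: R_x(v) = exp_x(v).

    x is a unit vector, v a tangent vector at x (i.e. <x, v> = 0).
    """
    t = np.linalg.norm(v)
    if t < 1e-16:                        # exp_x(0) = x
        return x.copy()
    return np.cos(t) * x + np.sin(t) * (v / t)

def proj_retraction_sphere(x, v):
    """Projective retraction R_x(v) = (x + v) / ||x + v||, another valid retraction."""
    y = x + v
    return y / np.linalg.norm(y)

x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 0.3, -0.1])           # tangent at x since <x, v> = 0
print(exp_sphere(x, v), proj_retraction_sphere(x, v))
```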

In classical optimization on linear spaces, line search methods are used extensively. They update the iterate by finding a direction and then adding a multiple of that direction to the previous iterate. The extension of line search methods to manifolds is made possible by the notion of retraction. We consider algorithms of the general form stated in Algorithm 1.

Algorithm 1 A line search minimization algorithm on a Riemannian manifold
1: Require: A Riemannian manifold M, a function f: M → R.
2: Input: x_0 ∈ M, k = 0.
3: Output: Sequence {x_k}.
4: repeat
5:   Choose a retraction R_{x_k}: T_{x_k}M → M.
6:   Choose a descent direction p_k ∈ T_{x_k}M.
7:   Choose a step length α_k ∈ R.
8:   Set x_{k+1} = R_{x_k}(α_k p_k); k = k + 1.
9: until x_{k+1} sufficiently minimizes f.

Once the retraction R_{x_k} is fixed, the search direction p_k and the step length α_k remain to be chosen. We say that p_k is a descent direction at x_k if there exists ᾱ > 0 such that f(R_{x_k}(tp_k)) − f(x_k) < 0 for every t ∈ (0, ᾱ). It is obvious that if f°(x_k; p_k) < 0, then p_k is a descent direction at x_k. In order to obtain global convergence results, some conditions must be imposed on the descent direction p_k as well as on the step length α_k.

2.1. Step length. The step length α_k has to produce a substantial reduction of the objective function f. The ideal choice would be α_k = argmin_{α>0} f(R_{x_k}(αp_k)), if this exact line search could be carried out efficiently; in general, it is too expensive to find this value. A more practical strategy, the inexact line search, identifies a step length that achieves an adequate reduction of the objective function at minimal cost. A popular inexact line search condition stipulates that the step length α should give a sufficient decrease in the objective function f, measured by the following condition.

Definition 2.3 (Armijo condition). Let f: M → R be a locally Lipschitz function on a Riemannian manifold M with a retraction R, x ∈ M and p ∈ T_xM. If the inequality

    f(R_x(αp)) − f(x) ≤ c_1 α f°(x; p)

holds for a step length α and a fixed constant c_1 ∈ (0, 1), then α satisfies the Armijo condition.

The existence of such a step size is proved later in Theorem 2.8.

Sufficient decrease and backtracking. The Armijo condition alone does not ensure that the algorithm makes reasonable progress. We present here a backtracking line search algorithm, which chooses its candidates appropriately so as to make adequate progress. An adequate step length is found after a finite number of iterations, because α eventually becomes small enough that the Armijo condition holds.

Algorithm 2 A backtracking line search on a Riemannian manifold
1: Require: A Riemannian manifold M, a locally Lipschitz function f: M → R, a retraction R from TM to M, scalars c_1, ρ ∈ (0, 1).
2: Input: α_0 > 0.
3: Output: α.
4: α = α_0.
5: repeat
6:   α = ρα.
7: until f(R_{x_k}(αp_k)) − f(x_k) ≤ c_1 α f°(x_k; p_k).
8: Terminate with α_k = α.
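As an illustration of Algorithm 2 (a sketch under our own assumptions, not the authors' implementation), the routine below takes the retraction and a surrogate value f_circ for the Clarke directional derivative f°(x_k; p_k) as caller-supplied inputs, since in the nonsmooth setting this quantity is only available approximately (see Section 2.4); the lower bound alpha_min is a safeguard added for this sketch.

```python
def backtracking(f, retract, x, p, f_circ, alpha0=1.0, c1=1e-4, rho=0.5, alpha_min=1e-16):
    """Backtracking line search on a manifold (sketch of Algorithm 2).

    retract(x, v) plays the role of R_x(v); f_circ is a (negative) surrogate for the
    Clarke directional derivative f°(x; p) of the locally Lipschitz function f.
    """
    alpha = alpha0
    fx = f(x)
    while True:
        alpha *= rho                                      # shrink the trial step
        if f(retract(x, alpha * p)) - fx <= c1 * alpha * f_circ:
            return alpha                                  # Armijo condition (Definition 2.3) holds
        if alpha < alpha_min:                             # safeguard, not part of the paper's listing
            return alpha
```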

The Wolfe conditions. There are other useful conditions that rule out unacceptably short step lengths; for example, one can use a second requirement called the curvature condition. To state this requirement for nonsmooth functions on nonlinear spaces, some preliminaries are needed. To define the curvature condition on a Riemannian manifold, we have to translate vectors from one tangent space to another.

Definition 2.4 (Vector transport). A vector transport associated to a retraction R is a continuous function T: TM ⊕ TM → TM, (η_x, ξ_x) ↦ T_{η_x}(ξ_x), which for all (η_x, ξ_x) satisfies the following conditions:
(i) T_{η_x}: T_xM → T_{R(η_x)}M is a linear map,
(ii) T_{0_x}(ξ_x) = ξ_x.

In short, if η_x ∈ T_xM and R_x(η_x) = y, then T_{η_x} transports vectors from the tangent space of M at x to the tangent space at y. Two additional properties are needed in this paper. First, the vector transport has to preserve inner products, that is,

    ⟨T_{η_x}(ξ_x), T_{η_x}(ζ_x)⟩ = ⟨ξ_x, ζ_x⟩.    (2.1)

In particular, ξ_x ↦ T_{η_x}(ξ_x) is then an isometry and possesses an isometric inverse. Second, we assume that T satisfies the following condition, called the locking condition in [6], for transporting vectors along their own direction:

    T_{ξ_x}(ξ_x) = β_{ξ_x} T_{R_{ξ_x}}(ξ_x),   β_{ξ_x} = ‖ξ_x‖ / ‖T_{R_{ξ_x}}(ξ_x)‖,    (2.2)

where

    T_{R_{η_x}}(ξ_x) = DR_x(η_x)(ξ_x) = d/dt R_x(η_x + tξ_x) |_{t=0}.

These conditions can be difficult to verify, but they are satisfied in particular for the most natural choices of R and T; for example, the exponential map as a retraction together with parallel transport as a vector transport satisfies them with β_{ξ_x} = 1. For a further discussion, especially on the construction of vector transports satisfying the locking condition, we refer to [6, Sec. 4]. We introduce the more intuitive notation

    T_{x→y}(ξ_x) = T_{η_x}(ξ_x),   T^{-1}_{x→y}(ξ_y) = (T_{η_x})^{-1}(ξ_y)   whenever y = R_x(η_x).

Now we present the nonsmooth curvature condition for locally Lipschitz functions on Riemannian manifolds.

Definition 2.5 (Curvature condition). The step length α satisfies the curvature inequality if

    sup_{ξ ∈ ∂f(R_x(αp))} ⟨ξ, β_{αp}^{-1} T_{x→R_x(αp)}(p)⟩ ≥ c_2 f°(x; p)

holds for a constant c_2 ∈ (c_1, 1), where c_1 is the Armijo constant. Note that if there exists ξ ∈ ∂f(R_x(αp)) such that ⟨ξ, β_{αp}^{-1} T_{x→R_x(αp)}(p)⟩ ≥ c_2 f°(x; p), then the curvature inequality holds. As in the smooth case, we can define a strong curvature condition by

    sup_{ξ ∈ ∂f(R_x(αp))} |⟨ξ, β_{αp}^{-1} T_{x→R_x(αp)}(p)⟩| ≤ c_2 |f°(x; p)|.

The following lemma can be proved using Lemma 3.1 of [22].

Lemma 2.6. Let f: M → R be a locally Lipschitz function on a Riemannian manifold M, and let the function

    W(α) := f(R_x(αp)) − f(x) − c_2 α f°(x; p),    (2.3)

where c_2 ∈ (c_1, 1), x ∈ M and p ∈ T_xM, be increasing on a neighborhood of some α_0. Then α_0 satisfies the curvature condition. Indeed, if W is increasing on a neighborhood of some α_0, then there exists ξ ≥ 0 in

    ∂W(α_0) ⊆ ⟨∂f(R_x(α_0 p)), DR_x(α_0 p)(p)⟩ − c_2 f°(x; p),

and the result is obtained using the locking condition.
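As a concrete instance of Definition 2.4 and the locking condition (2.2), the sketch below (ours) implements parallel transport on the unit sphere along geodesics of the exponential map; as noted above, this pair is isometric and satisfies (2.2) with β = 1.

```python
import numpy as np

def parallel_transport_sphere(x, v, w):
    """Parallel transport on the unit sphere along the geodesic t -> exp_x(t v).

    Transports the tangent vector w from T_x S^{n-1} to T_{exp_x(v)} S^{n-1}.
    With the exponential map as retraction this transport is isometric and
    satisfies the locking condition (2.2) with beta = 1.
    """
    t = np.linalg.norm(v)
    if t < 1e-16:
        return w.copy()                   # T_{0_x} is the identity
    u = v / t
    a = np.dot(u, w)                      # component of w along the geodesic direction
    return w + (np.cos(t) - 1.0) * a * u - np.sin(t) * a * x

# quick isometry check: the norm of w is preserved
x = np.array([0.0, 0.0, 1.0]); v = np.array([0.3, 0.0, 0.0]); w = np.array([0.1, -0.2, 0.0])
print(np.linalg.norm(w), np.linalg.norm(parallel_transport_sphere(x, v, w)))
```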

Definition 2.7 (Wolfe conditions). Let f: M → R be a locally Lipschitz function and p ∈ T_xM. If α satisfies the Armijo and curvature conditions, then we say α satisfies the Wolfe conditions.

In the following theorem, the existence of step lengths satisfying the Wolfe conditions is proved under some assumptions.

Theorem 2.8. Assume that f: M → R is a locally Lipschitz function on a Riemannian manifold M, R_x: T_xM → M is a retraction, p ∈ T_xM is chosen such that f°(x; p) < 0, and f is bounded below on {R_x(αp) : α > 0}. If 0 < c_1 < c_2 < 1, then there exist step lengths satisfying the Wolfe conditions.

Proof. Since φ(α) = f(R_x(αp)) is bounded below for all α > 0 and 0 < c_1 < 1, the line l(α) = f(x) + α c_1 f°(x; p) must intersect the graph of φ at least once. Indeed, if we assume l(α) < φ(α) for all α > 0, then

    f°(x; p) < c_1 f°(x; p) ≤ limsup_{α ↓ 0} (f(R_x(αp)) − f(x)) / α ≤ f°(x; p),

which is a contradiction. Hence there exists ᾱ > 0 such that l(ᾱ) ≥ φ(ᾱ). But since l(α) is not bounded below while φ(α) is bounded below, there exists α̂ > 0 with l(α̂) = φ(α̂). Let α' > 0 be the smallest such intersection value; then

    f(R_x(α'p)) = f(x) + α' c_1 f°(x; p).    (2.4)

It follows that the Armijo condition is satisfied for all step lengths less than α'. Now, by the mean value theorem, there exist ε ∈ (0, 1) and ξ ∈ ∂f(R_x(εα'p)) such that

    f(R_x(α'p)) − f(x) = α' ⟨ξ, DR_x(εα'p)(p)⟩.    (2.5)

Combining (2.4) and (2.5), we obtain ⟨ξ, DR_x(εα'p)(p)⟩ = c_1 f°(x; p) > c_2 f°(x; p), where the last inequality uses f°(x; p) < 0 and c_1 < c_2. Using the locking condition, we conclude that εα' satisfies the curvature condition.

Remark 2.9. There are a number of rules for choosing the step length α for problems on linear spaces; see [23, 32]. We can define their generalizations on Riemannian manifolds using the concepts of nonsmooth analysis on Riemannian manifolds and the notions of retraction and vector transport. For instance, one can use a generalization of the Mifflin condition, proposed first by Mifflin in [23]: the step length α satisfies the Mifflin condition if

    f(R_x(αp)) − f(x) ≤ −c_1 α ‖p‖,   sup_{ξ ∈ ∂f(R_x(αp))} ⟨ξ, β_{αp}^{-1} T_{x→R_x(αp)}(p)⟩ ≥ −c_2 ‖p‖,

for fixed constants c_1 ∈ (0, 1), c_2 ∈ (c_1, 1).

2.2. Descent directions. To obtain global convergence results for a line search method, we must not only have well-chosen step lengths but also well-chosen search directions. The following definition is the equivalent of gradient-orientedness carried over to nonsmooth problems; see [25]. We know that the search direction for a smooth optimization problem often has the form p_k = −P_k grad f(x_k), where P_k is a symmetric and nonsingular linear map. It is therefore natural to use elements of the subdifferential of f at x_k in Definition 2.10 and produce a subgradient-oriented descent sequence in nonsmooth problems.

Definition 2.10 (Subgradient-oriented descent sequence). A sequence {p_k} of descent directions is called subgradient-oriented if there exist a sequence of subgradients {g_k} and a sequence of symmetric linear maps {P_k: T_{x_k}M → T_{x_k}M} satisfying

    0 < λ ≤ λ_min(P_k) ≤ λ_max(P_k) ≤ Λ < ∞

for 0 < λ < Λ < ∞ and all k ∈ N, such that p_k = −P_k g_k, where λ_min(P_k) and λ_max(P_k) denote respectively the smallest and largest eigenvalues of P_k.

In the next definition, we present an approximation of the subdifferential that can be computed in practice. As we aim at transporting subgradients from tangent spaces at points near x ∈ M to the tangent space at x, it is important to define a notion of injectivity radius for R_x. Let

    ι(x) := sup{ε > 0 : R_x: B(0_x, ε) → B_R(x, ε) is injective}.
Then the injectivity radius of M with respect to the retraction R is defined as ι(M) := inf_{x ∈ M} ι(x).

When the exponential map is used as the retraction, this definition coincides with the usual one.

Definition 2.11 (ε-subdifferential). Let f: M → R be a locally Lipschitz function on a Riemannian manifold M and 0 < 2ε < ι(x). We define the ε-subdifferential of f at x, denoted by ∂_ε f(x), as

    ∂_ε f(x) = cl conv{β_η^{-1} T^{-1}_{x→y}(∂f(y)) : y ∈ cl B_R(x, ε)},   where η = R_x^{-1}(y).

Every element of the ε-subdifferential is called an ε-subgradient. (The coefficient 2 in the condition 2ε < ι(x) guarantees that the inverse vector transport is well-defined for y on the boundary of B_R(x, ε).)

Definition 2.12 (ε-subgradient-oriented descent sequence). A sequence {p_k} of descent directions is called ε-subgradient-oriented if there exist a sequence of ε-subgradients {g_k} and a sequence of symmetric linear maps {P_k: T_{x_k}M → T_{x_k}M} satisfying

    0 < λ ≤ λ_min(P_k) ≤ λ_max(P_k) ≤ Λ < ∞

for 0 < λ < Λ < ∞ and all k ∈ N, such that p_k = −P_k g_k, where λ_min(P_k) and λ_max(P_k) denote respectively the smallest and largest eigenvalues of P_k.

From now on, we assume that a basis of T_xM is given for every x ∈ M, and we represent every linear map by its matrix with respect to the given basis. In the following, we use a positive definite matrix P in order to define a P-norm equivalent to the norm induced by the inner product on the tangent space. Indeed, ‖ξ‖_P = ⟨Pξ, ξ⟩^{1/2} and

    λ_min(P)^{1/2} ‖ξ‖ ≤ ‖ξ‖_P ≤ λ_max(P)^{1/2} ‖ξ‖.

Theorem 2.13. Assume that f: M → R is a locally Lipschitz function on a Riemannian manifold M, R_x: T_xM → M is a retraction, 0 ∉ ∂_ε f(x), and g = argmin_{ξ ∈ ∂_ε f(x)} ‖ξ‖_P, where P is a positive definite matrix. Assume that p = −Pg. Then f°_ε(x; p) = −‖g‖²_P and p is a descent direction, where f°_ε(x; p) := sup_{ξ ∈ ∂_ε f(x)} ⟨ξ, −Pg⟩.

Proof. We first prove that f°_ε(x; p) = −‖g‖²_P. It is clear that

    f°_ε(x; p) = sup_{ξ ∈ ∂_ε f(x)} ⟨ξ, −Pg⟩ ≥ ⟨g, −Pg⟩ = −‖g‖²_P.

Now we claim that ‖g‖²_P ≤ ⟨ξ, Pg⟩ for every ξ ∈ ∂_ε f(x), which implies sup_{ξ ∈ ∂_ε f(x)} ⟨ξ, −Pg⟩ ≤ −‖g‖²_P. Proof of the claim: assume to the contrary that there exists ξ ∈ ∂_ε f(x) such that ⟨ξ, Pg⟩ < ‖g‖²_P, and consider w := g + t(ξ − g) ∈ ∂_ε f(x). Then

    ‖g‖²_P − ‖w‖²_P = −t(2⟨ξ − g, Pg⟩ + t⟨ξ − g, P(ξ − g)⟩),

and we can take t > 0 small enough that ‖g‖²_P > ‖w‖²_P, which contradicts the minimality of g; the first part of the theorem is proved.

Now we prove that p is a descent direction. Let ᾱ := ε/‖p‖. Then for every t ∈ (0, ᾱ), by Lebourg's mean value theorem there exist 0 < t_0 < 1 and ξ ∈ ∂f(R_x(t_0 tp)) such that f(R_x(tp)) − f(x) = ⟨ξ, DR_x(t_0 tp)(tp)⟩. Using the locking condition and the isometry of the vector transport, we have

    f(R_x(tp)) − f(x) = ⟨ξ, DR_x(tt_0 p)(tp)⟩ = t ⟨β_{tt_0p}^{-1} T^{-1}_{x→R_x(tt_0p)}(ξ), p⟩.

Since ‖tt_0 p‖ ≤ ε (note that R_x(tt_0 p) ∈ cl B_R(x, ε)), it follows that β_{tt_0p}^{-1} T^{-1}_{x→R_x(tt_0p)}(ξ) ∈ ∂_ε f(x). Therefore, f(R_x(tp)) − f(x) ≤ t f°_ε(x; p) < 0.
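Both Theorem 2.13 and the descent direction algorithm of the next subsection require the minimum P-norm element of the convex hull of finitely many (ε-)subgradients, i.e., a small convex quadratic program over the simplex. The sketch below (ours, not the paper's solver) approximates it with a plain Frank-Wolfe iteration, under the assumption that the working set is stored as the columns of a matrix W.

```python
import numpy as np

def min_norm_in_hull(W, P, iters=500):
    """Approximate argmin_{v in conv(columns of W)} ||v||_P, with ||v||_P^2 = <P v, v>.

    Frank-Wolfe on the simplex: minimize q(lmb) = lmb^T (W^T P W) lmb.
    Returns the point g = W @ lmb and the weights lmb.
    """
    m = W.shape[1]
    Q = W.T @ P @ W                        # Gram matrix in the P-inner product
    lmb = np.full(m, 1.0 / m)              # start at the barycenter
    for k in range(iters):
        grad = 2.0 * Q @ lmb               # gradient of q at lmb
        j = int(np.argmin(grad))           # best vertex e_j of the simplex
        step = 2.0 / (k + 2.0)             # standard Frank-Wolfe step size
        lmb = (1.0 - step) * lmb
        lmb[j] += step
    return W @ lmb, lmb

# toy usage: two subgradients in R^2, P = identity
W = np.array([[1.0, -0.5], [0.2, 0.4]])
g, lmb = min_norm_in_hull(W, np.eye(2))
print(g, lmb)
```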

2.3. A descent direction algorithm. For general nonsmooth optimization problems it may be difficult to give an explicit description of the full ε-subdifferential. Therefore, we need an iterative procedure to approximate it. We start with a subgradient at an arbitrary point near x, transported to the tangent space at x; in every subsequent iteration, the subgradient of a new point near x is computed, transported to the tangent space at x, and added to the working set to improve the approximation of ∂_ε f(x). Indeed, rather than describing the entire ε-subdifferential at each iteration, we approximate ∂_ε f(x) by the convex hull of finitely many of its elements. To this end, let P be a positive definite matrix and W_k := {v_1, ..., v_k} ⊆ ∂_ε f(x), and define

    g_k := argmin_{v ∈ conv W_k} ‖v‖_P.

Now if

    f(R_x((ε/‖p_k‖) p_k)) − f(x) ≤ −cε ‖g_k‖²_P / ‖p_k‖,   c ∈ (0, 1),    (2.6)

where p_k = −P g_k, then we can say that conv W_k is an acceptable approximation of ∂_ε f(x). Otherwise, using the next lemma, we add a new element of ∂_ε f(x) \ conv W_k to W_k.

Lemma 2.14. Let W_k = {v_1, ..., v_k} ⊆ ∂_ε f(x), 0 ∉ conv W_k and g_k = argmin{‖v‖_P : v ∈ conv W_k}. If

    f(R_x((ε/‖p_k‖) p_k)) − f(x) > −cε ‖g_k‖²_P / ‖p_k‖,

where c ∈ (0, 1) and p_k = −P g_k, then there exist θ_0 ∈ (0, ε/‖p_k‖] and v̄_{k+1} ∈ ∂f(R_x(θ_0 p_k)) such that

    ⟨β_{θ_0p_k}^{-1} T^{-1}_{x→R_x(θ_0p_k)}(v̄_{k+1}), p_k⟩ ≥ −c ‖g_k‖²_P

and v_{k+1} := β_{θ_0p_k}^{-1} T^{-1}_{x→R_x(θ_0p_k)}(v̄_{k+1}) ∉ conv W_k.

Proof. We prove this lemma using Lemma 3.1 and Proposition 3.1 in [22]. Define

    h(t) := f(R_x(tp_k)) − f(x) + ct ‖g_k‖²_P,   t ∈ R,    (2.7)

and a new locally Lipschitz function G: B(0_x, ε) ⊆ T_xM → R by G(g) = f(R_x(g)); then h(t) = G(tp_k) − G(0) + ct‖g_k‖²_P. Assume that h(ε/‖p_k‖) > 0. Then by Proposition 3.1 of [22] there exists θ_0 ∈ [0, ε/‖p_k‖] such that h is increasing in a neighborhood of θ_0. Therefore, by Lemma 3.1 of [22], every ξ ∈ ∂h(θ_0) satisfies ξ ≥ 0. By [10, Proposition 3.1],

    ∂h(θ_0) ⊆ ⟨∂f(R_x(θ_0 p_k)), DR_x(θ_0 p_k)(p_k)⟩ + c ‖g_k‖²_P.

If v̄_{k+1} ∈ ∂f(R_x(θ_0 p_k)) is such that ⟨v̄_{k+1}, DR_x(θ_0 p_k)(p_k)⟩ + c‖g_k‖²_P ∈ ∂h(θ_0), then by the locking condition

    ⟨β_{θ_0p_k}^{-1} T^{-1}_{x→R_x(θ_0p_k)}(v̄_{k+1}), p_k⟩ + c ‖g_k‖²_P ≥ 0.

This implies that v_{k+1} := β_{θ_0p_k}^{-1} T^{-1}_{x→R_x(θ_0p_k)}(v̄_{k+1}) ∉ conv W_k (every v ∈ conv W_k satisfies ⟨v, p_k⟩ ≤ −‖g_k‖²_P < −c‖g_k‖²_P by the claim in the proof of Theorem 2.13), which proves our claim.

Now we present Algorithm 3, which finds a vector v_{k+1} ∈ ∂_ε f(x) that can be added to the set W_k in order to improve the approximation of ∂_ε f(x); this algorithm terminates after finitely many iterations, see [8]. We then give Algorithm 4 for finding a descent direction. Theorem 2.15 proves that Algorithm 4 terminates after finitely many iterations.

Theorem 2.15. For the point x_1 ∈ M, let the level set N = {x : f(x) ≤ f(x_1)} be bounded. Then for each x ∈ N, Algorithm 4 terminates after finitely many iterations.

Algorithm 3 An h-increasing point algorithm; (v, t) = Increasing(x, p, g, a, b, P, c).
1: Require: A Riemannian manifold M, a locally Lipschitz function f: M → R, a retraction R from TM to M and a vector transport T.
2: Input: x ∈ M, g, p ∈ T_xM, a, b ∈ R, c ∈ (0, 1) and a positive definite matrix P such that p = −Pg.
3: Let t = b/‖p‖, b = b/‖p‖ and a = a/‖p‖.
4: repeat
5:   select v ∈ ∂f(R_x(tp)) such that ⟨v, β_{tp}^{-1} T_{x→R_x(tp)}(p)⟩ + c‖g‖²_P ∈ ∂h(t), where h is defined in (2.7);
6:   if ⟨v, β_{tp}^{-1} T_{x→R_x(tp)}(p)⟩ + c‖g‖²_P < 0 then
7:     t = (a + b)/2
8:     if h(b) > h(t) then
9:       a = t
10:    else
11:      b = t
12:    end if
13:  end if
14: until ⟨v, β_{tp}^{-1} T_{x→R_x(tp)}(p)⟩ + c‖g‖²_P ≥ 0

Algorithm 4 A descent direction algorithm; (g_k, p_k) = Descent(x, δ, c, ε, P).
1: Require: A Riemannian manifold M, a locally Lipschitz function f: M → R, a retraction R from TM to M, the injectivity radius ι(M) > 0 and a vector transport T.
2: Input: x ∈ M, δ, c ∈ (0, 1), 0 < ε < ι(M) and a positive definite matrix P.
3: Select an arbitrary v ∈ ∂_ε f(x).
4: Set W_1 = {v} and let k = 1.
5: Step 1: (Compute a descent direction)
6: Solve the following minimization problem and let g_k be its solution:

    min_{v ∈ conv W_k} ‖v‖_P.

7: if ‖g_k‖² ≤ δ then Stop.
8: else let p_k = −P g_k.
9: end if
10: Step 2: (Stopping condition)
11: if f(R_x((ε/‖p_k‖) p_k)) − f(x) ≤ −cε ‖g_k‖²_P / ‖p_k‖, then Stop.
12: end if
13: Step 3: (v, t) = Increasing(x, p_k, g_k, 0, ε, P, c).
14: Set v_{k+1} = β_{tp_k}^{-1} T^{-1}_{x→R_x(tp_k)}(v), W_{k+1} = W_k ∪ {v_{k+1}} and k = k + 1. Go to Step 1.
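The control flow of Algorithm 4 can be summarized by the following sketch (ours). It reuses the Frank-Wolfe helper min_norm_in_hull from the earlier sketch and collapses the role of Algorithm 3 into a single caller-supplied callable transported_subgrad, so it illustrates the loop structure rather than reproducing the paper's method.

```python
import numpy as np

def descent_direction(x, f, retract, transported_subgrad, P, eps, delta, c, max_iter=100):
    """Sketch of Algorithm 4: grow W_k ⊆ ∂_ε f(x) until p_k = -P g_k is acceptable.

    transported_subgrad(x, t, p) must return an element of ∂_ε f(x), i.e. a Clarke
    subgradient at R_x(t p) transported back to T_x M and scaled by 1/β (the role of
    Algorithm 3 / Lemma 2.14 is simplified to this single callable).
    """
    W = [transported_subgrad(x, 0.0, np.zeros_like(x))]   # arbitrary initial ε-subgradient
    fx = f(x)
    g, p = None, None
    for _ in range(max_iter):
        Wmat = np.column_stack(W)
        g, _ = min_norm_in_hull(Wmat, P)                   # minimum P-norm element of conv W
        if np.dot(g, g) <= delta:                          # ||g_k||^2 <= δ: signal that ε, δ must shrink
            return g, None
        p = -P @ g
        t = eps / np.linalg.norm(p)
        if f(retract(x, t * p)) - fx <= -c * eps * (g @ P @ g) / np.linalg.norm(p):  # test (2.6)
            return g, p                                    # acceptable descent direction
        W.append(transported_subgrad(x, t, p))             # enrich the working set (Lemma 2.14)
    return g, p
```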

Proof. We claim that either the stopping condition is satisfied after a finite number of iterations, or ‖g_m‖² ≤ δ for some m and the algorithm terminates. If the stopping condition is not satisfied and ‖g_k‖² > δ, then by Lemma 2.14 we find v_{k+1} ∉ conv W_k such that ⟨v_{k+1}, p_k⟩ ≥ −c‖g_k‖²_P. Note that ‖DR_x‖ on cl B(0_x, ε) is bounded by some m_1 > 0, and therefore β_η^{-1} ≤ m_1 for every η ∈ cl B(0_x, ε). Hence, by the isometry of the vector transport and the Lipschitzness of f with constant L, Theorem 2.9 of [10] implies that ‖ξ‖ ≤ m_1 L for every ξ ∈ ∂_ε f(x). Now, by definition, g_{k+1} ∈ conv({v_{k+1}} ∪ W_k) has the minimum P-norm; therefore, for all t ∈ (0, 1),

    ‖g_{k+1}‖²_P ≤ ‖t v_{k+1} + (1 − t) g_k‖²_P
                = ‖g_k‖²_P + 2t ⟨P g_k, v_{k+1} − g_k⟩ + t² ‖v_{k+1} − g_k‖²_P
                ≤ ‖g_k‖²_P − 2t(1 − c) ‖g_k‖²_P + 4t² L² m_1² λ_max(P),

where the last inequality uses ⟨v_{k+1}, p_k⟩ ≥ −c‖g_k‖²_P and ‖v_{k+1} − g_k‖ ≤ 2Lm_1. Taking

    t = (1 − c)(2Lm_1)^{-2} λ_max(P)^{-1} ‖g_k‖²_P ∈ (0, 1),

and using δ^{1/2} ∈ (0, Lm_1) and λ_min(P)^{-1} ‖g_k‖²_P ≥ ‖g_k‖² > δ, we obtain

    ‖g_{k+1}‖²_P ≤ (1 − [(1 − c)(2Lm_1)^{-1} δ^{1/2} λ_min(P)^{1/2} λ_max(P)^{-1/2}]²) ‖g_k‖²_P.

Now, with r = 1 − [(1 − c)(2Lm_1)^{-1} δ^{1/2} λ_min(P)^{1/2} λ_max(P)^{-1/2}]² ∈ (0, 1), it follows that

    ‖g_{k+1}‖²_P ≤ r ‖g_k‖²_P ≤ ... ≤ r^k ‖g_1‖²_P ≤ r^k (Lm_1)² λ_max(P).

Therefore, after a finite number of iterations, ‖g_{k+1}‖²_P ≤ δ λ_min(P), which proves that ‖g_{k+1}‖² ≤ δ.

2.4. Step length selection algorithms. A crucial observation is that verifying the Wolfe conditions of Definition 2.7 can be impractical when no explicit expression for the subdifferential ∂f(x) is available. Using an approximation of the Clarke subdifferential, we overcome this problem. In the last subsection we approximated f°(x; p_k) by −‖g_k‖²_P, where p_k := −P g_k, g_k = argmin{‖v‖_P : v ∈ conv W_k} and conv W_k is an approximation of ∂_ε f(x). Therefore, in our line search algorithm we use this approximation of f°(x; p) to find a suitable step length.

Algorithm 5 A line search algorithm; α = Line(x, p, g, P, c_1, c_2).
1: Require: A Riemannian manifold M, a locally Lipschitz function f: M → R, a retraction R from TM to M, the injectivity radius ι(M) > 0 and a vector transport T.
2: Input: x ∈ M, a descent direction p ∈ T_xM with p = −Pg, where g ∈ ∂_ε f(x) and P is a positive definite matrix, and c_1 ∈ (0, 1), c_2 ∈ (c_1, 1).
3: Set α_0 = 0, α_max < ι(M), α_1 = 1 and i = 1.
4: Repeat
5:   Evaluate A(α_i) := f(R_x(α_i p)) − f(x) + c_1 α_i ‖g‖²_P.    (2.8)
6:   if A(α_i) > 0 then
7:     α must be obtained by Zoom(x, p, g, P, α_{i−1}, α_i, c_1, c_2)
8:     Stop
9:   end if
10:  Compute ξ ∈ ∂f(R_x(α_i p)) such that ⟨ξ, β_{α_ip}^{-1} T_{x→R_x(α_ip)}(p)⟩ + c_2 ‖g‖²_P ∈ ∂W(α_i), where W is defined in (2.3).
11:  if ⟨ξ, β_{α_ip}^{-1} T_{x→R_x(α_ip)}(p)⟩ + c_2 ‖g‖²_P ≥ 0 then α = α_i
12:    Stop
13:  else
14:    Choose α_{i+1} ∈ (α_i, α_max)
15:  end if
16:  i = i + 1.
17: End(Repeat)

The task of a line search algorithm is to find a step size that decreases the objective function along the search path. The Wolfe conditions are used in the line search to enforce a sufficient decrease of the objective function and to exclude unnecessarily small step sizes. Algorithm 5 is a one-dimensional search procedure for the function φ(α) = f(R_x(αp)) that finds a step length satisfying the Armijo and curvature conditions. The procedure is a generalization of the algorithm for the well-known Wolfe conditions for smooth functions; see [24]. The algorithm has two stages. The first stage begins with a trial estimate α_1 and keeps increasing it until it finds either an acceptable step length or an interval that contains the desired step length. The parameter α_max is a user-supplied bound on the maximum step length allowed.

Algorithm 6 α = Zoom(x, p, g, P, a, b, c_1, c_2).
1: Require: A Riemannian manifold M, a locally Lipschitz function f: M → R, a retraction R from TM to M and a vector transport T.
2: Input: x ∈ M, a descent direction p ∈ T_xM with p = −Pg, where g ∈ ∂_ε f(x) and P is a positive definite matrix, c_1 ∈ (0, 1), c_2 ∈ (c_1, 1), a, b ∈ R.
3: i = 1, a_1 = a, b_1 = b.
4: Repeat
5:   α_i = (a_i + b_i)/2
6:   Evaluate A(α_i) := f(R_x(α_i p)) − f(x) + c_1 α_i ‖g‖²_P.
7:   if A(α_i) > 0 then
8:     b_{i+1} = α_i, a_{i+1} = a_i.
9:   else
10:    Compute ξ ∈ ∂f(R_x(α_i p)) such that ⟨ξ, β_{α_ip}^{-1} T_{x→R_x(α_ip)}(p)⟩ + c_2 ‖g‖²_P ∈ ∂W(α_i), where W is defined in (2.3).
11:    if ⟨ξ, β_{α_ip}^{-1} T_{x→R_x(α_ip)}(p)⟩ + c_2 ‖g‖²_P ≥ 0 then α = α_i
12:      Stop.
13:    else a_{i+1} = α_i, b_{i+1} = b_i.
14:    end if
15:  end if
16:  i = i + 1.
17: End(Repeat)

The last step of Algorithm 5 performs extrapolation to find the next trial value α_{i+1}; to implement this step, we can simply set α_{i+1} to some constant multiple of α_i. In the case that Algorithm 5 finds an interval [α_{i−1}, α_i] that contains the desired step length, the second stage is invoked: Algorithm 6, called Zoom, successively decreases the size of this interval.

Remark 2.16. By Lemma 3.1 of [22], if there exists ξ ∈ ∂f(R_x(α_i p)) such that ⟨ξ, DR_x(α_i p)(p)⟩ + c_2‖g‖²_P ∈ ∂W(α_i) and ⟨ξ, DR_x(α_i p)(p)⟩ + c_2‖g‖²_P < 0, where W is defined in (2.3), then W is decreasing on a neighborhood of α_i, which means that η ≤ 0 for every η ∈ ∂W(α_i).
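The bracketing-plus-bisection structure of Algorithms 5 and 6 can be condensed into one routine. The sketch below (ours) takes A(·) and the curvature test as callables supplied by the caller, since in the nonsmooth Riemannian setting both involve subgradients and a vector transport; the fallback on loop exhaustion reflects the practical remark after Proposition 2.18 below.

```python
def wolfe_step(A, curvature_holds, alpha_max, alpha1=1.0, max_iter=60):
    """Condensed sketch of Algorithms 5-6.

    A(alpha) plays the role of f(R_x(alpha p)) - f(x) + c1 * alpha * ||g||_P^2 (see (2.8));
    curvature_holds(alpha) should implement the subgradient test of line 11.
    """
    lo, hi = 0.0, None
    alpha = alpha1
    for _ in range(max_iter):
        if A(alpha) > 0.0:
            hi = alpha                              # overshoot: bracket found, switch to bisection
        elif curvature_holds(alpha):
            return alpha                            # both (approximate) Wolfe conditions hold
        else:
            lo = alpha                              # Armijo-type decrease holds but step too short
        alpha = min(2.0 * alpha, alpha_max) if hi is None else 0.5 * (lo + hi)
    return lo if lo > 0.0 else alpha                # practical fallback: an Armijo point
```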

Proposition 2.17. Assume that f: M → R is a locally Lipschitz function and p is the descent direction obtained by Algorithm 4. Then either Algorithm 6 terminates after finitely many iterations, or it generates a sequence of intervals [a_i, b_i] such that each contains subintervals satisfying the Wolfe conditions, and a_i and b_i converge to a step length a > 0. Moreover, there exist ξ_1, ξ_2, ξ_3 ∈ ∂f(R_x(ap)) such that

    ⟨ξ_1, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ ≤ −c_2 ‖g‖²_P,   ⟨ξ_2, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ ≥ −c_2 ‖g‖²_P,
    ⟨ξ_3, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ ≥ −c_1 ‖g‖²_P.

Proof. Suppose that the algorithm does not terminate after finitely many iterations. Since {a_i} and {b_i} are monotone sequences, they converge to some a and b. As b_i − a_i = (b_1 − a_1)/2^{i−1}, the difference b_i − a_i converges to zero; therefore a = b. We claim that a_i > 0 after finitely many iterations. Since p is a descent direction, there exists ᾱ > 0 such that A(s) ≤ 0 for all s ∈ (0, ᾱ), where A(·) is defined in Algorithm 5. Note that there exists m > 0 such that b_1/2^i ≤ ᾱ for every i ≥ m. If a_{m+1} = 0, then we must have A(α_i) > 0 for all i = 1, ..., m. Hence b_{m+1} = α_m, a_m = a_{m+1} = 0 and α_{m+1} = b_{m+1}/2 = b_1/2^{m+1}; therefore α_{m+1} ≤ ᾱ. This implies A(α_{m+1}) ≤ 0, and then a_{m+2} = α_{m+1} > 0.

Let S be the set of all indices with a_{i+1} = α_i. For every i ∈ S there exists ξ_i ∈ ∂f(R_x(α_i p)) such that ⟨ξ_i, β_{α_ip}^{-1} T_{x→R_x(α_ip)}(p)⟩ + c_2‖g‖²_P < 0. Since ξ_i ∈ ∂f(R_x(α_i p)) and f is locally Lipschitz on a neighborhood of x, by [10, Theorem 2.9] the sequence {ξ_i} contains a convergent subsequence; without loss of generality, we can assume it converges to some ξ_1 ∈ ∂f(R_x(ap)). Therefore ⟨ξ_1, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ + c_2‖g‖²_P ≤ 0.

Since a_i < b_i, A(a_i) ≤ 0 and A(a_i) < A(b_i), the interval [a_i, b_i] contains a step length r_i such that A(·) is increasing on a neighborhood of r_i and A(r_i) ≤ 0. Since c_1 < c_2, W(·) is also increasing in a neighborhood of r_i; therefore the Wolfe conditions are satisfied at r_i. Assume that ⟨κ_i, β_{r_ip}^{-1} T_{x→R_x(r_ip)}(p)⟩ + c_2‖g‖²_P ∈ ∂W(r_i) for some κ_i ∈ ∂f(R_x(r_ip)); then ⟨κ_i, β_{r_ip}^{-1} T_{x→R_x(r_ip)}(p)⟩ + c_2‖g‖²_P ≥ 0. As before, we can suppose without loss of generality that κ_i converges to some ξ_2 ∈ ∂f(R_x(ap)), which implies ⟨ξ_2, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ + c_2‖g‖²_P ≥ 0. Note that A(·) is increasing on a neighborhood of r_i; therefore, for all η_i ∈ ∂f(R_x(r_ip)) with ⟨η_i, β_{r_ip}^{-1} T_{x→R_x(r_ip)}(p)⟩ + c_1‖g‖²_P ∈ ∂A(r_i), we have ⟨η_i, β_{r_ip}^{-1} T_{x→R_x(r_ip)}(p)⟩ + c_1‖g‖²_P ≥ 0. As before, η_i converges (along a subsequence) to some ξ_3 ∈ ∂f(R_x(ap)) with ⟨ξ_3, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ + c_1‖g‖²_P ≥ 0.

In the next proposition, we prove that if Algorithm 6 does not terminate after finitely many iterations and converges to a, then the Wolfe conditions are satisfied at a.

Proposition 2.18. Assume that f: M → R is a locally Lipschitz function and p := −Pg is a descent direction obtained from Algorithm 4. If Algorithm 6 does not terminate after finitely many iterations and converges to a, then there exists ξ ∈ ∂f(R_x(ap)) such that ⟨ξ, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ = −c_2 ‖g‖²_P.

Proof. By Proposition 2.17, there exist ξ_1, ξ_2 ∈ ∂f(R_x(ap)) such that

    ⟨ξ_1, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ ≤ −c_2 ‖g‖²_P,   ⟨ξ_2, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ ≥ −c_2 ‖g‖²_P,

and ⟨ξ_1, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ + c_2‖g‖²_P, ⟨ξ_2, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ + c_2‖g‖²_P ∈ ∂W(a), where W is defined in (2.3). Since ∂W(a) is convex, 0 ∈ ∂W(a), which means that there exists ξ ∈ ∂f(R_x(ap)) such that ⟨ξ, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ + c_2‖g‖²_P = 0.

In finite precision arithmetic, if the length of the interval [a_i, b_i] is too small, then the two function values f(R_x(a_ip)) and f(R_x(b_ip)) are close to each other. Therefore, in practice, Algorithm 6 must be terminated after finitely many iterations; see [24]. If Algorithm 6 does not find a step length satisfying the Wolfe conditions, then we select a step length satisfying the Armijo condition.

2.5. Minimization algorithms. Finally, Algorithm 7 is the minimization algorithm for locally Lipschitz objective functions on Riemannian manifolds.

Theorem 2.19. Let f: M → R be a locally Lipschitz function on a complete Riemannian manifold M, let N = {x : f(x) ≤ f(x_1)} be bounded, and let the sequence of symmetric matrices {P_k^s} satisfy

    0 < λ ≤ λ_min(P_k^s) ≤ λ_max(P_k^s) ≤ Λ < ∞    (2.9)

for 0 < λ < Λ < ∞ and all k, s. Then either Algorithm 7 terminates after a finite number of iterations with g_k^s = 0, or every accumulation point of the sequence {x_k} belongs to the set X = {x ∈ M : 0 ∈ ∂f(x)}.

Algorithm 7 A minimization algorithm; x = Min(f, x_1, θ_ε, θ_δ, ε_1, δ_1, c_1, c_2).
1: Require: A Riemannian manifold M, a locally Lipschitz function f: M → R, a retraction R from TM to M and the injectivity radius ι(M) > 0.
2: Input: A starting point x_1 ∈ M, c_1 ∈ (0, 1), c_2 ∈ (c_1, 1), θ_ε, θ_δ ∈ (0, 1), δ_1 ∈ (0, 1), ε_1 ∈ (0, ι(M)), k = 1 and P_1 = I.
3: Step 1. (Set new parameters) s = 1, x_k^s = x_k, P_k^s = P_k.
4: Step 2. (Descent direction) (g_k^s, p_k^s) = Descent(x_k^s, δ_k, c_1, ε_k, P_k^s).
5: if g_k^s = 0, then Stop.
6: end if
7: if ‖g_k^s‖² ≤ δ_k then set ε_{k+1} = ε_k θ_ε, δ_{k+1} = δ_k θ_δ, x_{k+1} = x_k^s, P_{k+1} = P_k^s, k = k + 1. Go to Step 1.
8: else α = Line(x_k^s, p_k^s, g_k^s, P_k^s, c_1, c_2), construct the next iterate x_k^{s+1} = R_{x_k^s}(α p_k^s) and update P_k^{s+1}. Set s = s + 1 and go to Step 2.
9: end if

Proof (of Theorem 2.19). If the algorithm terminates after a finite number of iterations, then x_k^s is an ε_k-stationary point of f. Suppose that the algorithm does not terminate after finitely many iterations. Since p_k^s is a descent direction and α ≥ ε_k/‖p_k^s‖, we have

    f(x_k^{s+1}) − f(x_k^s) ≤ −c_1 ε_k ‖g_k^s‖²_{P_k^s} / ‖p_k^s‖

for s = 1, 2, ...; therefore f(x_k^{s+1}) < f(x_k^s) for s = 1, 2, .... Since f is Lipschitz and N is bounded, f has a minimum on N. Therefore {f(x_k^s)}_s is a bounded decreasing sequence in R, hence convergent; thus f(x_k^s) − f(x_k^{s+1}) converges to zero and there exists s_1 such that

    f(x_k^s) − f(x_k^{s+1}) ≤ c_1 ε_k δ_k λ / ‖p_k^s‖   for all s ≥ s_1.

Thus

    λ ‖g_k^s‖² ≤ ‖g_k^s‖²_{P_k^s} ≤ (f(x_k^s) − f(x_k^{s+1})) ‖p_k^s‖ / (c_1 ε_k) ≤ δ_k λ,   s ≥ s_1.    (2.10)

Hence, after finitely many iterations, there exists s such that x_{k+1} = x_k^s. Since M is a complete Riemannian manifold and {x_k} ⊆ N is bounded, there exists a subsequence {x_{k_i}} converging to a point x* ∈ M. Since conv W_{k_i}^{s_{k_i}} is a subset of ∂_{ε_{k_i}} f(x_{k_i}^{s_{k_i}}), the minimum-P-norm element ĝ_{k_i} of ∂_{ε_{k_i}} f(x_{k_i}^{s_{k_i}}) satisfies

    ‖ĝ_{k_i}‖²_{P_{k_i}^{s_{k_i}}} := min{‖v‖²_{P_{k_i}^{s_{k_i}}} : v ∈ ∂_{ε_{k_i}} f(x_{k_i}^{s_{k_i}})} ≤ min{‖v‖²_{P_{k_i}^{s_{k_i}}} : v ∈ conv W_{k_i}^{s_{k_i}}} ≤ Λ δ_{k_i}.

Hence lim_{i→∞} ĝ_{k_i} = 0. Note that ĝ_{k_i} ∈ ∂_{ε_{k_i}} f(x_{k_i}^{s_{k_i}}) and ε_{k_i}, δ_{k_i} → 0; hence 0 ∈ ∂f(x*).

3. Nonsmooth BFGS algorithms on Riemannian manifolds

In this section we discuss nonsmooth BFGS methods on Riemannian manifolds. Let f be a smooth function defined on R^n and let P_k be a positive definite matrix approximating the Hessian of f. We know that p_k = −P_k grad f(x_k) is a descent direction. The approximation of the Hessian can be updated by the BFGS method when the computed step length satisfies the Wolfe conditions. Indeed, if s_k = x_{k+1} − x_k, y_k = grad f(x_{k+1}) − grad f(x_k) and α_k satisfies the Wolfe conditions, then we have the so-called secant inequality ⟨y_k, s_k⟩_2 > 0. Therefore, P_k can be updated by the BFGS method as follows:

    P_{k+1} := P_k + (y_k y_k^T) / ⟨s_k, y_k⟩_2 − (P_k s_k s_k^T P_k) / ⟨s_k, P_k s_k⟩_2.
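For reference, the Euclidean update just stated looks as follows in code (a toy sketch, ours); it preserves symmetry and, thanks to the secant inequality, positive definiteness.

```python
import numpy as np

def bfgs_update_euclidean(P, s, y):
    """Classical Euclidean BFGS update of the Hessian approximation P (sketch).

    Requires the secant inequality <y, s> > 0, which holds when the step
    satisfies the Wolfe conditions for a smooth f.
    """
    Ps = P @ s
    return P + np.outer(y, y) / np.dot(y, s) - np.outer(Ps, Ps) / np.dot(s, Ps)

# usage on a toy quadratic f(x) = 0.5 x^T A x, where y = A s
A = np.array([[2.0, 0.3], [0.3, 1.0]])
P = np.eye(2)
s = np.array([0.1, -0.2]); y = A @ s
print(bfgs_update_euclidean(P, s, y))
```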

The structure of the smooth BFGS algorithm on Riemannian manifolds is given in several papers; see [7, 26, 27]. Note that the classical update formulas for the approximation of the Hessian have no meaning on Riemannian manifolds. First,

    s_k := T_{x_k→R_{x_k}(α_kp_k)}(α_k p_k),   y_k := β_{α_kp_k}^{-1} grad f(x_{k+1}) − T_{x_k→R_{x_k}(α_kp_k)}(grad f(x_k))

are vectors in the tangent space T_{x_{k+1}}M. The inner product on tangent spaces is then given by the chosen Riemannian metric. Furthermore, the dyadic product of a vector with the transpose of another vector, which results in a matrix in the Euclidean space, is not a naturally defined operation on a Riemannian manifold. Moreover, while in Euclidean spaces the Hessian can be expressed as a symmetric matrix, on Riemannian manifolds it is defined as a symmetric bilinear form; however, one can define a linear map P_k: T_{x_k}M → T_{x_k}M by D²f(x_k)(η, ξ) := ⟨η, P_k ξ⟩ for η, ξ ∈ T_{x_k}M. Therefore, the approximation of the Hessian can be updated by the BFGS method as follows:

    P_{k+1} := P̃_k + (y_k y_k^♭) / (y_k^♭ s_k) − (P̃_k s_k (P̃_k s_k)^♭) / ((P̃_k s_k)^♭ s_k),

where P̃_k := T_{x_k→R_{x_k}(α_kp_k)} ∘ P_k ∘ T^{-1}_{x_k→R_{x_k}(α_kp_k)} and v^♭ denotes the flat of v, i.e., v^♭ w = ⟨v, w⟩.

Now we assume that f: M → R is a locally Lipschitz function, g_k := argmin_{v ∈ conv W} ‖v‖_{P_k} and p_k = −P_k g_k, where P_k is a positive definite matrix and conv W is an approximation of ∂_ε f(x_k). Let α_k be returned by Algorithm 5 and let ξ_k ∈ ∂f(R_{x_k}(α_kp_k)) be such that ⟨ξ_k, β_{α_kp_k}^{-1} T_{x_k→R_{x_k}(α_kp_k)}(p_k)⟩ + c_2 ‖g_k‖²_{P_k} ≥ 0. Then for all v ∈ conv W,

    ⟨β_{α_kp_k}^{-1} ξ_k − T_{x_k→R_{x_k}(α_kp_k)}(v), T_{x_k→R_{x_k}(α_kp_k)}(p_k)⟩ > 0.

This shows that if we update the approximation of the Hessian by the BFGS method as

    P_{k+1} := P̃_k + (y_k y_k^♭) / (y_k^♭ s_k) − (P̃_k s_k (P̃_k s_k)^♭) / ((P̃_k s_k)^♭ s_k),

where P̃_k := T_{x_k→R_{x_k}(α_kp_k)} ∘ P_k ∘ T^{-1}_{x_k→R_{x_k}(α_kp_k)} and

    s_k := T_{x_k→R_{x_k}(α_kp_k)}(α_k p_k),   y_k := β_{α_kp_k}^{-1} ξ_k − T_{x_k→R_{x_k}(α_kp_k)}(g_k),

provided that ⟨ξ_k, β_{α_kp_k}^{-1} T_{x_k→R_{x_k}(α_kp_k)}(p_k)⟩ + c_2 ‖g_k‖²_{P_k} ≥ 0, then the Hessian approximation P_{k+1} is symmetric positive definite.

It is worthwhile to mention that, to obtain global convergence of the minimization algorithm 7, the sequence of symmetric matrices {P_k^s} must satisfy

    0 < λ ≤ λ_min(P_k^s) ≤ λ_max(P_k^s) ≤ Λ < ∞    (3.1)

for 0 < λ < Λ < ∞ and all k, s. From a theoretical point of view it is difficult to guarantee (3.1); see [24, page 22]. But we can translate the bounds on the spectrum of P_k^s into conditions that only involve s_k and y_k, as follows:

    ⟨s_k, y_k⟩ / ⟨s_k, s_k⟩ ≥ λ,   ⟨y_k, y_k⟩ / ⟨y_k, s_k⟩ ≤ Λ.

This technique is used in [24, Theorem 8.5]; see also the algorithm in [32]. It is worthwhile to mention that, in practice, Algorithm 6 must be terminated after finitely many iterations. We therefore assume that, even if Algorithm 6 does not find a step length satisfying the Wolfe conditions, we can select a step length satisfying the Armijo condition and update P_k^{s+1} in Algorithm 8 by the identity matrix.
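A matrix-level sketch of this update, including the two safeguards that Algorithm 8 below uses to keep the spectrum of P within [λ, Λ], is given next (ours; it assumes orthonormal tangent-space bases, so that the matrix of an isometric transport satisfies T^{-1} = T^T).

```python
import numpy as np

def nonsmooth_bfgs_update(P, T_mat, s, y, lam, Lam):
    """Sketch of the P-update in Algorithm 8 (matrix representation in given tangent bases).

    P      : current matrix P_k^s on T_{x_k^s} M
    T_mat  : matrix of the vector transport from x_k^s to x_k^{s+1} (assumed isometric)
    s, y   : transported step and subgradient difference, already in T_{x_k^{s+1}} M
    lam,Lam: spectrum bounds from (3.1); 1/Lam bounds <s,y>/<y,y>, lam bounds <s,y>/<s,s>
    """
    # enforce <s, y>/<y, y> >= 1/Lam by the correction used in Algorithm 8
    s = s + max(0.0, 1.0 / Lam - np.dot(s, y) / np.dot(y, y)) * y
    if np.dot(s, y) / np.dot(s, s) < lam:
        return np.eye(P.shape[0])                    # reset to the identity
    P_tilde = T_mat @ P @ T_mat.T                    # \tilde P = T o P o T^{-1}; T^{-1} = T^T here
    Ps = P_tilde @ s
    return (P_tilde
            + np.outer(y, y) / np.dot(y, s)
            - np.outer(Ps, Ps) / np.dot(s, Ps))
```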

Algorithm 8 A nonsmooth BFGS algorithm on a Riemannian manifold; x = subRBFGS(f, x_1, θ_ε, θ_δ, ε_1, δ_1, c_1, c_2).
1: Require: A Riemannian manifold M, a locally Lipschitz function f: M → R, a retraction R from TM to M, the injectivity radius ι(M) > 0 and a vector transport T.
2: Input: A starting point x_1 ∈ M, c_1 ∈ (0, 1), c_2 ∈ (c_1, 1), θ_ε, θ_δ ∈ (0, 1), δ_1 ∈ (0, 1), ε_1 ∈ (0, ι(M)), k = 1, P_1 = I, a bound 1/Λ > 0 on ⟨y_k, s_k⟩/⟨y_k, y_k⟩ and a bound λ on ⟨s_k, y_k⟩/⟨s_k, s_k⟩.
3: Step 1. (Set new parameters) s = 1, x_k^s = x_k and P_k^s = P_k.
4: Step 2. (Descent direction) (g_k^s, p_k^s) = Descent(x_k^s, δ_k, c_1, ε_k, P_k^s).
5: if g_k^s = 0 then Stop.
6: end if
7: if ‖g_k^s‖² ≤ δ_k, then set ε_{k+1} = ε_k θ_ε, δ_{k+1} = δ_k θ_δ, x_{k+1} = x_k^s, P_{k+1} = P_k^s, k = k + 1. Go to Step 1.
8: else α = Line(x_k^s, p_k^s, g_k^s, P_k^s, c_1, c_2), construct the next iterate x_k^{s+1} = R_{x_k^s}(α p_k^s) and define

    s_k := T_{x_k^s→x_k^{s+1}}(α p_k^s),   y_k := β_{αp_k^s}^{-1} ξ_k − T_{x_k^s→x_k^{s+1}}(g_k^s),   s_k := s_k + max(0, 1/Λ − ⟨s_k, y_k⟩/⟨y_k, y_k⟩) y_k,

   where ξ_k is the subgradient obtained in the line search.
9: if ⟨s_k, y_k⟩/⟨s_k, s_k⟩ ≥ λ then update

    P_k^{s+1} := P̃_k^s + (y_k y_k^♭) / (y_k^♭ s_k) − (P̃_k^s s_k (P̃_k^s s_k)^♭) / ((P̃_k^s s_k)^♭ s_k).

10: else P_k^{s+1} := I.
11: end if
   Set s = s + 1 and go to Step 2.
12: end if

4. Experiments

The oriented bounding box problem [5], which aims to find a minimum-volume box containing n given points in d-dimensional space, is used to illustrate the performance of Algorithm 8. Suppose the points are given by a matrix E ∈ R^{d×n}, where each column represents the coordinates of a point. A cost function of volume is given by

    f: O_d → R : O ↦ V(OE) = ∏_{i=1}^d (e_{i,max} − e_{i,min}),

where O_d denotes the d-by-d orthogonal group, and e_{i,max} and e_{i,min} denote the maximum and minimum entries, respectively, of the i-th row of OE. If more than one entry of some row attains the maximum or minimum value for a given O, then the cost function f is not differentiable at O. Otherwise, f is differentiable and its gradient is grad f(O) = P_O(T E^T), where T ∈ R^{d×n} has entries, for i = 1, ..., d,

    T_{ij} = w / (e_{i,max} − e_{i,min})   if j is the column of e_{i,max},
    T_{ij} = −w / (e_{i,max} − e_{i,min})  if j is the column of e_{i,min},
    T_{ij} = 0                              otherwise,

with w = f(O), and P_O(M) = M − O(O^T M + M^T O)/2. The qf retraction R_X(η_X) = qf(X + η_X) is used, where qf(M) denotes the Q factor of the QR decomposition with nonnegative elements on the diagonal of R. The vector transport by parallelization [7] is isometric and essentially the identity; we modify it by the approach in [6, Section 4.2] and use the resulting vector transport satisfying the locking condition. To the best of our knowledge, it is unknown how large the injectivity radius of this retraction is. In practice, the vector transport can be represented by a matrix; therefore we always use the inverse of that matrix as the inverse of the vector transport.

Algorithm 8 is compared with the Riemannian gradient sampling method (see [5, Section 7.2] or [3]) and the modified Riemannian BFGS method (see [5, Section 7.3]), which is a Riemannian generalization of [20]. The main difference between the Riemannian gradient sampling (RGS) method, the modified Riemannian BFGS method, and Algorithm 7 is the search direction.
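The cost, its Riemannian gradient at differentiable points, and the qf retraction described above translate almost line by line into the following sketch (ours; it assumes generic data, so that the row-wise maxima and minima are attained at unique columns and the diagonal of R is nonzero).

```python
import numpy as np

def obb_cost(O, E):
    """Volume of the axis-aligned box enclosing the rotated points OE."""
    OE = O @ E
    return np.prod(OE.max(axis=1) - OE.min(axis=1))

def obb_grad(O, E):
    """Riemannian gradient of the cost on the orthogonal group, at a differentiable point."""
    OE = O @ E
    widths = OE.max(axis=1) - OE.min(axis=1)
    w = np.prod(widths)
    T = np.zeros_like(OE)
    rows = np.arange(OE.shape[0])
    T[rows, OE.argmax(axis=1)] = w / widths           # +w/(e_max - e_min) at the max column
    T[rows, OE.argmin(axis=1)] = -w / widths          # -w/(e_max - e_min) at the min column
    G = T @ E.T                                       # Euclidean gradient with respect to O
    return G - O @ (O.T @ G + G.T @ O) / 2.0          # projection P_O onto the tangent space

def qf_retraction(X, eta):
    """qf retraction: Q factor of the QR decomposition of X + eta, diag(R) made nonnegative."""
    Q, R = np.linalg.qr(X + eta)
    return Q * np.sign(np.diag(R))                    # flip column signs; assumes diag(R) != 0
```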

Specifically, the search direction η_k in RGS at x_k is computed as follows: i) randomly generate m points in a small enough neighborhood of x_k; ii) transport the gradients at those m points to the tangent space at x_k; iii) compute the shortest tangent vector in the convex hull of the resulting tangent vectors and the gradient at x_k; and iv) set η_k to be the negative of this shortest vector. Note that the number of points, m, is required to be larger than the dimension of the domain. The modified Riemannian BFGS method makes the assumption that the cost function is differentiable at all iterates; it follows that the search direction is the same as in the Riemannian BFGS method for smooth cost functions [6]. However, the stopping criterion must be modified for nonsmooth cost functions. Specifically, let G_k be defined as follows:

    j_k = 1,          G_k = {g_k}                                    if ‖R_{x_k}^{-1}(x_{k−1})‖ > ε;
    j_k = j_{k−1} + 1, G_k = {g_{k−j_k+1}^{(k)}, ..., g_{k−1}^{(k)}, g_k^{(k)}}   if ‖R_{x_k}^{-1}(x_{k−1})‖ ≤ ε and j_{k−1} < J;
    j_k = J,          G_k = {g_{k−J+1}^{(k)}, ..., g_k^{(k)}}        if ‖R_{x_k}^{-1}(x_{k−1})‖ ≤ ε and j_{k−1} ≥ J;

where g_i^{(j)} = T_{x_i→x_j}(g_i), and ε > 0 and the positive integer J are given parameters. J also needs to be larger than the dimension of the domain. The modified Riemannian BFGS method stops if the length of the shortest vector in the convex hull of G_k is less than δ.

The tested algorithms stop if one of the following conditions is satisfied: the number of iterations reaches 5000; the step size is less than the machine epsilon; ε_k ≤ 10^{-6} and δ_k ≤ 10^{-12}. We say that an algorithm terminates successfully if it is stopped by the last condition. Note that an unsuccessfully terminated algorithm does not imply that the last iterate is far from a stationary point; it may also mean that the stopping criterion is not robust. The following parameters are used for Algorithm 8: ε_1 = 10^{-4}, δ_1 = 10^{-8}, θ_ε = 10^{-2}, θ_δ = 10^{-4}, λ = 10^{-4}, Λ = 10^{4}, c_1 = 10^{-4}, and c_2 ∈ (c_1, 1). The ε and J in the modified Riemannian BFGS method are set to 10^{-6} and 2 dim, respectively, where dim = d(d−1)/2 is the dimension of the domain. Multiple values of the parameter m in RGS are tested and given in the captions of Figures 1 and 2. The initial iterate is obtained by orthonormalizing a matrix whose entries are drawn from a standard normal distribution. The code is written in C++ and is available online. All experiments are performed on a 64-bit Ubuntu platform with a 3.6 GHz CPU (Intel Core i7-4790).

The three algorithms are tested with d = 3, 4, .... For each value of d, we use 100 random runs, and the three algorithms use the same 100 random seeds. Figure 1 reports the percentage of successful runs of each algorithm. The success rate of RGS depends strongly on the parameter m: the larger m is, the higher the success rate. Algorithm 8 always terminates successfully, which means that Algorithm 8 is more robust than all the other methods. The average number of function and gradient evaluations over the successful runs and the average computational time of the successful runs are reported in Figure 2. Among the successful tests, the modified Riemannian BFGS method needs the fewest function and gradient evaluations, due to its simple approach, while Algorithm 8 needs the second fewest. For the bounding box problem, the larger the dimension of the domain, the cheaper the function and gradient evaluations become compared to solving the quadratic programming problem, i.e., finding the shortest vector in the convex hull of a set of vectors. Therefore, as shown in Figure 2, even though the numbers of function and gradient evaluations differ for large d, the computational times are not significantly different.
However, Algorithm 8 is still always slightly faster than all the other methods for all values of d. In conclusion, the experiments show that the proposed method, Algorithm 8, is more robust and faster than RGS and the modified Riemannian BFGS method in terms of success rate and computational time.

5. Acknowledgements

We thank Pierre-Antoine Absil at Université catholique de Louvain for his helpful comments. This paper presents research results of the Belgian Network DYSCO (Dynamical Systems, Control, and Optimization), funded by the Interuniversity Attraction Poles Programme initiated by the Belgian Science Policy Office. This work was supported by FNRS under grant PDR T.


More information

Characterization of lower semicontinuous convex functions on Riemannian manifolds

Characterization of lower semicontinuous convex functions on Riemannian manifolds Wegelerstraße 6 53115 Bonn Germany phone +49 228 73-3427 fax +49 228 73-7527 www.ins.uni-bonn.de S. Hosseini Characterization of lower semicontinuous convex functions on Riemannian manifolds INS Preprint

More information

The nonsmooth Newton method on Riemannian manifolds

The nonsmooth Newton method on Riemannian manifolds The nonsmooth Newton method on Riemannian manifolds C. Lageman, U. Helmke, J.H. Manton 1 Introduction Solving nonlinear equations in Euclidean space is a frequently occurring problem in optimization and

More information

FUNCTIONAL ANALYSIS LECTURE NOTES: COMPACT SETS AND FINITE-DIMENSIONAL SPACES. 1. Compact Sets

FUNCTIONAL ANALYSIS LECTURE NOTES: COMPACT SETS AND FINITE-DIMENSIONAL SPACES. 1. Compact Sets FUNCTIONAL ANALYSIS LECTURE NOTES: COMPACT SETS AND FINITE-DIMENSIONAL SPACES CHRISTOPHER HEIL 1. Compact Sets Definition 1.1 (Compact and Totally Bounded Sets). Let X be a metric space, and let E X be

More information

Subdifferential representation of convex functions: refinements and applications

Subdifferential representation of convex functions: refinements and applications Subdifferential representation of convex functions: refinements and applications Joël Benoist & Aris Daniilidis Abstract Every lower semicontinuous convex function can be represented through its subdifferential

More information

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä New Proximal Bundle Method for Nonsmooth DC Optimization TUCS Technical Report No 1130, February 2015 New Proximal Bundle Method for Nonsmooth

More information

Analysis in weighted spaces : preliminary version

Analysis in weighted spaces : preliminary version Analysis in weighted spaces : preliminary version Frank Pacard To cite this version: Frank Pacard. Analysis in weighted spaces : preliminary version. 3rd cycle. Téhéran (Iran, 2006, pp.75.

More information

Douglas-Rachford splitting for nonconvex feasibility problems

Douglas-Rachford splitting for nonconvex feasibility problems Douglas-Rachford splitting for nonconvex feasibility problems Guoyin Li Ting Kei Pong Jan 3, 015 Abstract We adapt the Douglas-Rachford DR) splitting method to solve nonconvex feasibility problems by studying

More information

Course Summary Math 211

Course Summary Math 211 Course Summary Math 211 table of contents I. Functions of several variables. II. R n. III. Derivatives. IV. Taylor s Theorem. V. Differential Geometry. VI. Applications. 1. Best affine approximations.

More information

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This

More information

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS MATHEMATICS OF OPERATIONS RESEARCH Vol. 28, No. 4, November 2003, pp. 677 692 Printed in U.S.A. ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS ALEXANDER SHAPIRO We discuss in this paper a class of nonsmooth

More information

(convex combination!). Use convexity of f and multiply by the common denominator to get. Interchanging the role of x and y, we obtain that f is ( 2M ε

(convex combination!). Use convexity of f and multiply by the common denominator to get. Interchanging the role of x and y, we obtain that f is ( 2M ε 1. Continuity of convex functions in normed spaces In this chapter, we consider continuity properties of real-valued convex functions defined on open convex sets in normed spaces. Recall that every infinitedimensional

More information

Variational inequalities for set-valued vector fields on Riemannian manifolds

Variational inequalities for set-valued vector fields on Riemannian manifolds Variational inequalities for set-valued vector fields on Riemannian manifolds Chong LI Department of Mathematics Zhejiang University Joint with Jen-Chih YAO Chong LI (Zhejiang University) VI on RM 1 /

More information

A derivative-free nonmonotone line search and its application to the spectral residual method

A derivative-free nonmonotone line search and its application to the spectral residual method IMA Journal of Numerical Analysis (2009) 29, 814 825 doi:10.1093/imanum/drn019 Advance Access publication on November 14, 2008 A derivative-free nonmonotone line search and its application to the spectral

More information

Elements of Linear Algebra, Topology, and Calculus

Elements of Linear Algebra, Topology, and Calculus Appendix A Elements of Linear Algebra, Topology, and Calculus A.1 LINEAR ALGEBRA We follow the usual conventions of matrix computations. R n p is the set of all n p real matrices (m rows and p columns).

More information

Chapter 3. Riemannian Manifolds - I. The subject of this thesis is to extend the combinatorial curve reconstruction approach to curves

Chapter 3. Riemannian Manifolds - I. The subject of this thesis is to extend the combinatorial curve reconstruction approach to curves Chapter 3 Riemannian Manifolds - I The subject of this thesis is to extend the combinatorial curve reconstruction approach to curves embedded in Riemannian manifolds. A Riemannian manifold is an abstraction

More information

Chapter 2 Convex Analysis

Chapter 2 Convex Analysis Chapter 2 Convex Analysis The theory of nonsmooth analysis is based on convex analysis. Thus, we start this chapter by giving basic concepts and results of convexity (for further readings see also [202,

More information

Topological properties

Topological properties CHAPTER 4 Topological properties 1. Connectedness Definitions and examples Basic properties Connected components Connected versus path connected, again 2. Compactness Definition and first examples Topological

More information

Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations

Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations Oleg Burdakov a,, Ahmad Kamandi b a Department of Mathematics, Linköping University,

More information

WE consider an undirected, connected network of n

WE consider an undirected, connected network of n On Nonconvex Decentralized Gradient Descent Jinshan Zeng and Wotao Yin Abstract Consensus optimization has received considerable attention in recent years. A number of decentralized algorithms have been

More information

A projection-type method for generalized variational inequalities with dual solutions

A projection-type method for generalized variational inequalities with dual solutions Available online at www.isr-publications.com/jnsa J. Nonlinear Sci. Appl., 10 (2017), 4812 4821 Research Article Journal Homepage: www.tjnsa.com - www.isr-publications.com/jnsa A projection-type method

More information

ON FRACTAL DIMENSION OF INVARIANT SETS

ON FRACTAL DIMENSION OF INVARIANT SETS ON FRACTAL DIMENSION OF INVARIANT SETS R. MIRZAIE We give an upper bound for the box dimension of an invariant set of a differentiable function f : U M. Here U is an open subset of a Riemannian manifold

More information

THEOREMS, ETC., FOR MATH 515

THEOREMS, ETC., FOR MATH 515 THEOREMS, ETC., FOR MATH 515 Proposition 1 (=comment on page 17). If A is an algebra, then any finite union or finite intersection of sets in A is also in A. Proposition 2 (=Proposition 1.1). For every

More information

An introduction to Mathematical Theory of Control

An introduction to Mathematical Theory of Control An introduction to Mathematical Theory of Control Vasile Staicu University of Aveiro UNICA, May 2018 Vasile Staicu (University of Aveiro) An introduction to Mathematical Theory of Control UNICA, May 2018

More information

Maria Cameron. f(x) = 1 n

Maria Cameron. f(x) = 1 n Maria Cameron 1. Local algorithms for solving nonlinear equations Here we discuss local methods for nonlinear equations r(x) =. These methods are Newton, inexact Newton and quasi-newton. We will show that

More information

LECTURE 16: CONJUGATE AND CUT POINTS

LECTURE 16: CONJUGATE AND CUT POINTS LECTURE 16: CONJUGATE AND CUT POINTS 1. Conjugate Points Let (M, g) be Riemannian and γ : [a, b] M a geodesic. Then by definition, exp p ((t a) γ(a)) = γ(t). We know that exp p is a diffeomorphism near

More information

Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles

Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Arkadi Nemirovski H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology Joint research

More information

Some Background Material

Some Background Material Chapter 1 Some Background Material In the first chapter, we present a quick review of elementary - but important - material as a way of dipping our toes in the water. This chapter also introduces important

More information

ALEXANDER LYTCHAK & VIKTOR SCHROEDER

ALEXANDER LYTCHAK & VIKTOR SCHROEDER AFFINE FUNCTIONS ON CAT (κ)-spaces ALEXANDER LYTCHAK & VIKTOR SCHROEDER Abstract. We describe affine functions on spaces with an upper curvature bound. 1. introduction A map f : X Y between geodesic metric

More information

The Hopf argument. Yves Coudene. IRMAR, Université Rennes 1, campus beaulieu, bat Rennes cedex, France

The Hopf argument. Yves Coudene. IRMAR, Université Rennes 1, campus beaulieu, bat Rennes cedex, France The Hopf argument Yves Coudene IRMAR, Université Rennes, campus beaulieu, bat.23 35042 Rennes cedex, France yves.coudene@univ-rennes.fr slightly updated from the published version in Journal of Modern

More information

fy (X(g)) Y (f)x(g) gy (X(f)) Y (g)x(f)) = fx(y (g)) + gx(y (f)) fy (X(g)) gy (X(f))

fy (X(g)) Y (f)x(g) gy (X(f)) Y (g)x(f)) = fx(y (g)) + gx(y (f)) fy (X(g)) gy (X(f)) 1. Basic algebra of vector fields Let V be a finite dimensional vector space over R. Recall that V = {L : V R} is defined to be the set of all linear maps to R. V is isomorphic to V, but there is no canonical

More information

PARALLEL SUBGRADIENT METHOD FOR NONSMOOTH CONVEX OPTIMIZATION WITH A SIMPLE CONSTRAINT

PARALLEL SUBGRADIENT METHOD FOR NONSMOOTH CONVEX OPTIMIZATION WITH A SIMPLE CONSTRAINT Linear and Nonlinear Analysis Volume 1, Number 1, 2015, 1 PARALLEL SUBGRADIENT METHOD FOR NONSMOOTH CONVEX OPTIMIZATION WITH A SIMPLE CONSTRAINT KAZUHIRO HISHINUMA AND HIDEAKI IIDUKA Abstract. In this

More information

Chapter 2 Metric Spaces

Chapter 2 Metric Spaces Chapter 2 Metric Spaces The purpose of this chapter is to present a summary of some basic properties of metric and topological spaces that play an important role in the main body of the book. 2.1 Metrics

More information

Monotone Point-to-Set Vector Fields

Monotone Point-to-Set Vector Fields Monotone Point-to-Set Vector Fields J.X. da Cruz Neto, O.P.Ferreira and L.R. Lucambio Pérez Dedicated to Prof.Dr. Constantin UDRIŞTE on the occasion of his sixtieth birthday Abstract We introduce the concept

More information

R-Linear Convergence of Limited Memory Steepest Descent

R-Linear Convergence of Limited Memory Steepest Descent R-Linear Convergence of Limited Memory Steepest Descent Fran E. Curtis and Wei Guo Department of Industrial and Systems Engineering, Lehigh University, USA COR@L Technical Report 16T-010 R-Linear Convergence

More information

MATH 51H Section 4. October 16, Recall what it means for a function between metric spaces to be continuous:

MATH 51H Section 4. October 16, Recall what it means for a function between metric spaces to be continuous: MATH 51H Section 4 October 16, 2015 1 Continuity Recall what it means for a function between metric spaces to be continuous: Definition. Let (X, d X ), (Y, d Y ) be metric spaces. A function f : X Y is

More information

Local strong convexity and local Lipschitz continuity of the gradient of convex functions

Local strong convexity and local Lipschitz continuity of the gradient of convex functions Local strong convexity and local Lipschitz continuity of the gradient of convex functions R. Goebel and R.T. Rockafellar May 23, 2007 Abstract. Given a pair of convex conjugate functions f and f, we investigate

More information

A generic property of families of Lagrangian systems

A generic property of families of Lagrangian systems Annals of Mathematics, 167 (2008), 1099 1108 A generic property of families of Lagrangian systems By Patrick Bernard and Gonzalo Contreras * Abstract We prove that a generic Lagrangian has finitely many

More information

Generalized monotonicity and convexity for locally Lipschitz functions on Hadamard manifolds

Generalized monotonicity and convexity for locally Lipschitz functions on Hadamard manifolds Generalized monotonicity and convexity for locally Lipschitz functions on Hadamard manifolds A. Barani 1 2 3 4 5 6 7 Abstract. In this paper we establish connections between some concepts of generalized

More information

Math 6455 Nov 1, Differential Geometry I Fall 2006, Georgia Tech

Math 6455 Nov 1, Differential Geometry I Fall 2006, Georgia Tech Math 6455 Nov 1, 26 1 Differential Geometry I Fall 26, Georgia Tech Lecture Notes 14 Connections Suppose that we have a vector field X on a Riemannian manifold M. How can we measure how much X is changing

More information

THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS

THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS RALPH HOWARD DEPARTMENT OF MATHEMATICS UNIVERSITY OF SOUTH CAROLINA COLUMBIA, S.C. 29208, USA HOWARD@MATH.SC.EDU Abstract. This is an edited version of a

More information

Notes on Numerical Optimization

Notes on Numerical Optimization Notes on Numerical Optimization University of Chicago, 2014 Viva Patel October 18, 2014 1 Contents Contents 2 List of Algorithms 4 I Fundamentals of Optimization 5 1 Overview of Numerical Optimization

More information

On John type ellipsoids

On John type ellipsoids On John type ellipsoids B. Klartag Tel Aviv University Abstract Given an arbitrary convex symmetric body K R n, we construct a natural and non-trivial continuous map u K which associates ellipsoids to

More information

A Distributed Newton Method for Network Utility Maximization, I: Algorithm

A Distributed Newton Method for Network Utility Maximization, I: Algorithm A Distributed Newton Method for Networ Utility Maximization, I: Algorithm Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract Most existing wors use dual decomposition and first-order

More information

The Proximal Gradient Method

The Proximal Gradient Method Chapter 10 The Proximal Gradient Method Underlying Space: In this chapter, with the exception of Section 10.9, E is a Euclidean space, meaning a finite dimensional space endowed with an inner product,

More information

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss

More information

MA651 Topology. Lecture 10. Metric Spaces.

MA651 Topology. Lecture 10. Metric Spaces. MA65 Topology. Lecture 0. Metric Spaces. This text is based on the following books: Topology by James Dugundgji Fundamental concepts of topology by Peter O Neil Linear Algebra and Analysis by Marc Zamansky

More information

for all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true

for all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true 3 ohn Nirenberg inequality, Part I A function ϕ L () belongs to the space BMO() if sup ϕ(s) ϕ I I I < for all subintervals I If the same is true for the dyadic subintervals I D only, we will write ϕ BMO

More information

Convexity in R n. The following lemma will be needed in a while. Lemma 1 Let x E, u R n. If τ I(x, u), τ 0, define. f(x + τu) f(x). τ.

Convexity in R n. The following lemma will be needed in a while. Lemma 1 Let x E, u R n. If τ I(x, u), τ 0, define. f(x + τu) f(x). τ. Convexity in R n Let E be a convex subset of R n. A function f : E (, ] is convex iff f(tx + (1 t)y) (1 t)f(x) + tf(y) x, y E, t [0, 1]. A similar definition holds in any vector space. A topology is needed

More information

Course 212: Academic Year Section 1: Metric Spaces

Course 212: Academic Year Section 1: Metric Spaces Course 212: Academic Year 1991-2 Section 1: Metric Spaces D. R. Wilkins Contents 1 Metric Spaces 3 1.1 Distance Functions and Metric Spaces............. 3 1.2 Convergence and Continuity in Metric Spaces.........

More information

A New Sequential Optimality Condition for Constrained Nonsmooth Optimization

A New Sequential Optimality Condition for Constrained Nonsmooth Optimization A New Sequential Optimality Condition for Constrained Nonsmooth Optimization Elias Salomão Helou Sandra A. Santos Lucas E. A. Simões November 23, 2018 Abstract We introduce a sequential optimality condition

More information

Unconstrained optimization I Gradient-type methods

Unconstrained optimization I Gradient-type methods Unconstrained optimization I Gradient-type methods Antonio Frangioni Department of Computer Science University of Pisa www.di.unipi.it/~frangio frangio@di.unipi.it Computational Mathematics for Learning

More information

Newton-like method with diagonal correction for distributed optimization

Newton-like method with diagonal correction for distributed optimization Newton-lie method with diagonal correction for distributed optimization Dragana Bajović Dušan Jaovetić Nataša Krejić Nataša Krlec Jerinić August 15, 2015 Abstract We consider distributed optimization problems

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Generalized Uniformly Optimal Methods for Nonlinear Programming

Generalized Uniformly Optimal Methods for Nonlinear Programming Generalized Uniformly Optimal Methods for Nonlinear Programming Saeed Ghadimi Guanghui Lan Hongchao Zhang Janumary 14, 2017 Abstract In this paper, we present a generic framewor to extend existing uniformly

More information

A strongly polynomial algorithm for linear systems having a binary solution

A strongly polynomial algorithm for linear systems having a binary solution A strongly polynomial algorithm for linear systems having a binary solution Sergei Chubanov Institute of Information Systems at the University of Siegen, Germany e-mail: sergei.chubanov@uni-siegen.de 7th

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

A Quasi-Newton Algorithm for Nonconvex, Nonsmooth Optimization with Global Convergence Guarantees

A Quasi-Newton Algorithm for Nonconvex, Nonsmooth Optimization with Global Convergence Guarantees Noname manuscript No. (will be inserted by the editor) A Quasi-Newton Algorithm for Nonconvex, Nonsmooth Optimization with Global Convergence Guarantees Frank E. Curtis Xiaocun Que May 26, 2014 Abstract

More information

Lebesgue Measure on R n

Lebesgue Measure on R n 8 CHAPTER 2 Lebesgue Measure on R n Our goal is to construct a notion of the volume, or Lebesgue measure, of rather general subsets of R n that reduces to the usual volume of elementary geometrical sets

More information

Integral Jensen inequality

Integral Jensen inequality Integral Jensen inequality Let us consider a convex set R d, and a convex function f : (, + ]. For any x,..., x n and λ,..., λ n with n λ i =, we have () f( n λ ix i ) n λ if(x i ). For a R d, let δ a

More information

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University February 7, 2007 2 Contents 1 Metric Spaces 1 1.1 Basic definitions...........................

More information

arxiv: v1 [math.na] 6 Jan 2010

arxiv: v1 [math.na] 6 Jan 2010 LEVEL SET METHODS FOR FINDING SADDLE POINTS OF GENERAL MORSE INDEX C.H. JEFFREY PANG arxiv:00.0925v [math.na] 6 Jan 200 Abstract. For a function f : X R, a point is critical if its derivatives are zero,

More information

Division of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45

Division of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45 Division of the Humanities and Social Sciences Supergradients KC Border Fall 2001 1 The supergradient of a concave function There is a useful way to characterize the concavity of differentiable functions.

More information

DEVELOPMENT OF MORSE THEORY

DEVELOPMENT OF MORSE THEORY DEVELOPMENT OF MORSE THEORY MATTHEW STEED Abstract. In this paper, we develop Morse theory, which allows us to determine topological information about manifolds using certain real-valued functions defined

More information

Problem: A class of dynamical systems characterized by a fast divergence of the orbits. A paradigmatic example: the Arnold cat.

Problem: A class of dynamical systems characterized by a fast divergence of the orbits. A paradigmatic example: the Arnold cat. À È Ê ÇÄÁ Ë ËÌ ÅË Problem: A class of dynamical systems characterized by a fast divergence of the orbits A paradigmatic example: the Arnold cat. The closure of a homoclinic orbit. The shadowing lemma.

More information

Iteration-complexity of first-order penalty methods for convex programming

Iteration-complexity of first-order penalty methods for convex programming Iteration-complexity of first-order penalty methods for convex programming Guanghui Lan Renato D.C. Monteiro July 24, 2008 Abstract This paper considers a special but broad class of convex programing CP)

More information

Tangent spaces, normals and extrema

Tangent spaces, normals and extrema Chapter 3 Tangent spaces, normals and extrema If S is a surface in 3-space, with a point a S where S looks smooth, i.e., without any fold or cusp or self-crossing, we can intuitively define the tangent

More information

i=1 β i,i.e. = β 1 x β x β 1 1 xβ d

i=1 β i,i.e. = β 1 x β x β 1 1 xβ d 66 2. Every family of seminorms on a vector space containing a norm induces ahausdorff locally convex topology. 3. Given an open subset Ω of R d with the euclidean topology, the space C(Ω) of real valued

More information

Optimization methods on Riemannian manifolds and their application to shape space

Optimization methods on Riemannian manifolds and their application to shape space Optimization methods on Riemannian manifolds and their application to shape space Wolfgang Ring Benedit Wirth Abstract We extend the scope of analysis for linesearch optimization algorithms on (possibly

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

More information

P-adic Functions - Part 1

P-adic Functions - Part 1 P-adic Functions - Part 1 Nicolae Ciocan 22.11.2011 1 Locally constant functions Motivation: Another big difference between p-adic analysis and real analysis is the existence of nontrivial locally constant

More information

17 Solution of Nonlinear Systems

17 Solution of Nonlinear Systems 17 Solution of Nonlinear Systems We now discuss the solution of systems of nonlinear equations. An important ingredient will be the multivariate Taylor theorem. Theorem 17.1 Let D = {x 1, x 2,..., x m

More information

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005 3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider

More information