LINE SEARCH ALGORITHMS FOR LOCALLY LIPSCHITZ FUNCTIONS ON RIEMANNIAN MANIFOLDS


Tech. report, INS Preprint No. 1626

LINE SEARCH ALGORITHMS FOR LOCALLY LIPSCHITZ FUNCTIONS ON RIEMANNIAN MANIFOLDS

SOMAYEH HOSSEINI, WEN HUANG, ROHOLLAH YOUSEFPOUR

Abstract. This paper presents line search algorithms for finding extrema of locally Lipschitz functions defined on Riemannian manifolds. To this end we generalize the so-called Wolfe conditions to nonsmooth functions on Riemannian manifolds. Using ε-subgradient-oriented descent directions and the Wolfe conditions, we propose a nonsmooth Riemannian line search algorithm and establish the convergence of our algorithm to a stationary point. Moreover, we extend the classical BFGS algorithm to nonsmooth functions on Riemannian manifolds. Numerical experiments illustrate the effectiveness and efficiency of the proposed algorithm.

Key words and phrases: Riemannian manifolds, Lipschitz functions, descent directions, Clarke subdifferential.
AMS Subject Classifications: 49J52, 65K05, 58C05.
Somayeh Hosseini, Hausdorff Center for Mathematics and Institute for Numerical Simulation, University of Bonn, 53115 Bonn, Germany (hosseini@ins.uni-bonn.de). Wen Huang, Department of Computational and Applied Mathematics, Rice University, Houston, USA (huwst08@gmail.com). Rohollah Yousefpour, Department of Mathematical Sciences, University of Mazandaran, Babolsar, Iran (yousefpour@umz.ac.ir).

1. Introduction

This paper is concerned with the numerical solution of optimization problems defined on Riemannian manifolds where the objective function may be nonsmooth. Such problems arise in a variety of applications, e.g., in computer vision, signal processing, motion and structure estimation and numerical linear algebra; see for instance [1, 2, 9, 28].

In the linear case, it is well known that the line search strategy is one of the basic iterative approaches for finding a local minimum of an objective function. For smooth functions defined on linear spaces, each iteration of a line search method computes a search direction and then decides how far to move along that direction. Let f: R^n → R be a smooth function and let a direction p be given; define φ(α) = f(x + αp). Finding a step size α in the direction p such that φ(α) < φ(0) is the line search problem. If we find α such that the objective function in the direction p is minimized, the line search is called exact; if we choose α such that the objective function has an acceptable descent amount, it is called inexact. Theoretically, an exact line search need not accelerate a line search algorithm, due to, e.g., the hemstitching phenomenon. Practically, exact optimal step sizes generally cannot be found, and it is also expensive to compute nearly exact step sizes. Therefore the inexact line search, with its lower computational load, is highly popular.

A popular inexact line search condition stipulates that α should first of all give a sufficient decrease in the objective function f, as usually measured by the following inequality, named the Armijo condition:

    f(x + αp) − f(x) ≤ c_1 α ⟨grad f(x), p⟩_2,    (1.1)

for some c_1 ∈ (0, 1), where grad f(x) denotes the gradient of f at x and ⟨u, v⟩_2 denotes the Euclidean inner product u^T v. To rule out unacceptably short steps, a second requirement called the curvature condition is used, which requires α to satisfy

    ⟨p, grad f(x + αp)⟩_2 ≥ c_2 ⟨grad f(x), p⟩_2,

for some c_2 ∈ (c_1, 1), where c_1 is the constant in (1.1). If α satisfies the Armijo and curvature conditions, then we say α satisfies the Wolfe conditions.
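As a concrete illustration (ours, not part of the original text), the following Python sketch checks these two inequalities for a smooth Euclidean objective; the constants c1 and c2 below are conventional illustrative defaults, not values taken from the paper.

```python
import numpy as np

def satisfies_wolfe(f, grad_f, x, p, alpha, c1=1e-4, c2=0.9):
    """Check the (smooth, Euclidean) Armijo and curvature conditions for a step alpha.

    f and grad_f are callables returning f(x) and grad f(x) for numpy arrays x.
    """
    gp = np.dot(grad_f(x), p)                      # <grad f(x), p>, negative for a descent direction
    armijo = f(x + alpha * p) - f(x) <= c1 * alpha * gp
    curvature = np.dot(grad_f(x + alpha * p), p) >= c2 * gp
    return armijo and curvature

# toy usage: quadratic f(x) = 0.5 ||x||^2 with the steepest descent direction
f = lambda x: 0.5 * np.dot(x, x)
grad_f = lambda x: x
x0 = np.array([1.0, -2.0])
print(satisfies_wolfe(f, grad_f, x0, -grad_f(x0), alpha=0.5))
```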

In smooth optimization algorithms on linear spaces, in order to choose the search direction p_k at x_k we need the angle θ_k, defined below, to be bounded away from 90°:

    cos θ_k = −⟨grad f(x_k), p_k⟩_2 / (‖grad f(x_k)‖ ‖p_k‖).    (1.2)

The convergence results are obtained from the Armijo condition together with a safeguard against too small step sizes; see [24]. Indeed, classical convergence results establish that accumulation points of the sequence of iterates are stationary points of the objective function f; convergence of the whole sequence to a single limit point is not guaranteed. The question is whether similar results hold for nonsmooth optimization problems. In [32], the authors generalized the aforementioned Wolfe conditions to nonsmooth convex functions, using the Clarke subdifferential instead of the gradient. But to obtain convergence, one must not only have well-chosen step lengths but also well-chosen search directions. In nonsmooth problems the angle condition (1.2) does not yield a proper set of search directions; however, an equivalent condition carried over to nonsmooth problems to obtain convergence results has been introduced in [25].

Euclidean spaces are not the only spaces on which optimization algorithms are used; there are many applications of optimization on Riemannian manifolds. A manifold, in general, does not have a linear structure, hence the usual techniques used to study optimization problems on linear spaces cannot be applied and new techniques need to be developed. The common denominator of optimization methods on manifolds is that, instead of conducting a linear step during the line search procedure, one uses retractions or defines the step along a geodesic via the exponential map.

Contribution. Our main contributions are fourfold. First, we generalize the concept of a subgradient-oriented descent sequence from [25] to Riemannian manifolds, and we define a new notion called an ε-subgradient-oriented descent sequence. We then present a numerical search direction algorithm to find a descent direction for a nonsmooth objective function defined on a Riemannian manifold. In this algorithm, we use a positive definite matrix P in order to define a P-norm equivalent to the norm induced by the inner product on the tangent space. If we use the identity matrix and, therefore, work with the induced norm on the tangent space, then the algorithm reduces to the descent search algorithm presented in [8]. Second, we define a nonsmooth Armijo condition on Riemannian manifolds, which generalizes the nonsmooth Armijo condition presented in [32] to a Riemannian setting. As in Euclidean spaces, we can add a curvature condition to the nonsmooth Armijo condition to obtain a nonsmooth generalization of the Wolfe conditions on Riemannian manifolds. This curvature condition is a Riemannian version of the curvature condition presented in [32]; however, because different tangent spaces are involved, it is not a trivial generalization and a notion of vector transport is needed. We also present numerical line search algorithms for finding a suitable step length satisfying the Wolfe conditions for nonsmooth optimization problems on Riemannian manifolds and study the behavior of these algorithms. The ideas of these algorithms are inspired by similar algorithms from [33].
Third, we combine the search direction algorithm with the line search algorithm to define a minimization algorithm for nonsmooth optimization problems on Riemannian manifolds. To prove convergence of our minimization algorithm, we need a sequence of ε-subgradient-oriented descent directions; hence it is important to update the sequence of positive definite matrices, which define the equivalent norms on the tangent spaces, in such a way that the sequences of their smallest and largest eigenvalues remain bounded. As our last contribution, we present a practical strategy for updating the sequence of matrices that imposes such a condition on the sequences of eigenvalues. This strategy can be seen as a version of the nonsmooth BFGS method on Riemannian manifolds, which is presented in this setting for the first time, and can be considered a generalization of the smooth BFGS method on Riemannian manifolds in [4]. To the best of our knowledge, this version of nonsmooth BFGS has not been presented before even for optimization problems on linear spaces; therefore it is new not only in the Riemannian setting but also in the linear one.

This paper is organized as follows. Section 2 presents the proposed Riemannian optimization method for nonsmooth cost functions. Specifically, Sections 2.1 and 2.2 respectively analyze the line search conditions and the search directions for nonsmooth functions theoretically. Sections 2.3 and 2.4 respectively give practical approaches to compute a search direction and a step size. Section 2.5 combines the search direction with the line search algorithm and gives a minimization algorithm. This algorithm can be combined with the BFGS strategy, and the result is presented in Section 3. Finally, experiments that compare the proposed algorithm with the Riemannian BFGS and Riemannian gradient sampling methods are reported in Section 4.

Previous Work. For smooth optimization on Riemannian manifolds, line search algorithms have been studied in [1, 27, 30, 31]. When considering optimization problems with nonsmooth objective functions on Riemannian manifolds, it is necessary to generalize concepts of nonsmooth analysis to Riemannian manifolds. In the past few years a number of results have been obtained on numerous aspects of nonsmooth analysis on Riemannian manifolds [3, 4, 10, 11, 12, 21]. Papers [8, 9] are among the first on numerical algorithms for the minimization of nonsmooth functions on Riemannian manifolds.

2. Line search algorithms on Riemannian manifolds

In this paper, we use the standard notations and known results of Riemannian manifolds; see, e.g., [8, 29]. Throughout this paper, M is an n-dimensional complete manifold endowed with a Riemannian metric ⟨·,·⟩ on the tangent spaces. We identify the tangent space of M at a point x, denoted by T_xM, with the cotangent space at x (via the Riemannian metric), denoted by T_x^*M. We denote by cl N the closure of a set N. Also, for a nonempty closed subset S of a Riemannian manifold M, we define dist_S: M → R by dist_S(x) := inf{dist(x, s) : s ∈ S}, where dist is the Riemannian distance on M. We use a class of mappings called retractions.

Definition 2.1 (Retraction). A retraction on a manifold M is a smooth map R: TM → M with the following properties, where R_x denotes the restriction of R to T_xM:
- R_x(0_x) = x, where 0_x denotes the zero element of T_xM;
- with the canonical identification T_{0_x}T_xM ≃ T_xM, DR_x(0_x) = id_{T_xM}, where id_{T_xM} denotes the identity map on T_xM.

By the inverse function theorem, R_x is a local diffeomorphism. For example, the exponential map exp: TM → M, v ∈ T_xM ↦ exp_x(v) = γ(1), where γ is the geodesic starting at x with initial tangent vector v, is a retraction; see [1]. We define B_R(x, ε) to be {R_x(η_x) : ‖η_x‖ < ε}. If the retraction R is the exponential map, then B_R(x, ε) is the open ball centered at x with radius ε.

By using retractions, we extend the concepts of nonsmooth analysis to Riemannian manifolds. Let f: M → R be a locally Lipschitz function on a Riemannian manifold. For x ∈ M, we let \hat f_x = f ∘ R_x denote the restriction of the pullback \hat f = f ∘ R to T_xM. Recall that if G is a locally Lipschitz function from a Banach space X to R, the Clarke generalized directional derivative of G at the point x ∈ X in the direction v ∈ X, denoted by G°(x; v), is defined by

    G°(x; v) = limsup_{y → x, t ↓ 0} (G(y + tv) − G(y)) / t,

and the generalized subdifferential of G at x, denoted by ∂G(x), is defined by ∂G(x) := {ξ ∈ X* : ⟨ξ, v⟩ ≤ G°(x; v) for all v ∈ X}. The Clarke generalized directional derivative of f at x in the direction p ∈ T_xM, denoted by f°(x; p), is defined by f°(x; p) = \hat f_x°(0_x; p), where \hat f_x°(0_x; p) denotes the Clarke generalized directional derivative of \hat f_x: T_xM → R at 0_x in the direction p ∈ T_xM. The generalized subdifferential of f at x, denoted by ∂f(x), is then defined by ∂f(x) = ∂\hat f_x(0_x). A point x is a stationary point of f if 0 ∈ ∂f(x). A necessary condition for f to achieve a local minimum at x is that x is a stationary point of f; see [8, 10]. Theorem 2.2 can be proved along the same lines as [10, Theorem 2.9].

Theorem 2.2. Let M be a Riemannian manifold, x ∈ M, and f: M → R be Lipschitz of constant L near x, i.e., |f(x) − f(y)| ≤ L dist(x, y) for all y in a neighborhood of x. Then
(a) ∂f(x) is a nonempty, convex, compact subset of T_xM, and ‖ξ‖ ≤ L for every ξ ∈ ∂f(x);
(b) for every v ∈ T_xM, we have f°(x; v) = max{⟨ξ, v⟩ : ξ ∈ ∂f(x)};
(c) if {x_i} and {ξ_i} are sequences in M and TM such that ξ_i ∈ ∂f(x_i) for each i, and if {x_i} converges to x and ξ is a cluster point of the sequence {ξ_i}, then ξ ∈ ∂f(x);
(d) ∂f is upper semicontinuous at x.
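To make Definition 2.1 concrete, here is a small Python sketch (ours, not the paper's code) of two retractions on the unit sphere S^{n−1} ⊂ R^n: the exponential map and the projective retraction. Both satisfy R_x(0_x) = x and DR_x(0_x) = id_{T_xM}.

```python
import numpy as np

def exp_sphere(x, v):
    """Exponential-map retraction on the unit sphere: R_x(v) = exp_x(v).

    x is a unit vector, v a tangent vector at x (i.e. <x, v> = 0).
    """
    t = np.linalg.norm(v)
    if t < 1e-16:                        # exp_x(0) = x
        return x.copy()
    return np.cos(t) * x + np.sin(t) * (v / t)

def proj_retraction_sphere(x, v):
    """Projective retraction R_x(v) = (x + v) / ||x + v||, another valid retraction."""
    y = x + v
    return y / np.linalg.norm(y)

x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 0.3, -0.1])           # tangent at x since <x, v> = 0
print(exp_sphere(x, v), proj_retraction_sphere(x, v))
```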

In classical optimization on linear spaces, line search methods are used extensively. They update the iterate by finding a direction and then adding a multiple of that direction to the previous iterate. The extension of line search methods to manifolds is made possible by the notion of retraction. We consider algorithms of the general form stated in Algorithm 1.

Algorithm 1 A line search minimization algorithm on a Riemannian manifold
1: Require: A Riemannian manifold M, a function f: M → R.
2: Input: x_0 ∈ M, k = 0.
3: Output: Sequence {x_k}.
4: repeat
5:   Choose a retraction R_{x_k}: T_{x_k}M → M.
6:   Choose a descent direction p_k ∈ T_{x_k}M.
7:   Choose a step length α_k ∈ R.
8:   Set x_{k+1} = R_{x_k}(α_k p_k); k = k + 1.
9: until x_{k+1} sufficiently minimizes f.

Once the retraction R_{x_k} is fixed, the search direction p_k and the step length α_k remain to be chosen. We say that p_k is a descent direction at x_k if there exists ᾱ > 0 such that f(R_{x_k}(tp_k)) − f(x_k) < 0 for every t ∈ (0, ᾱ). It is obvious that if f°(x_k; p_k) < 0, then p_k is a descent direction at x_k. In order to obtain global convergence results, some conditions must be imposed on the descent direction p_k as well as on the step length α_k.

2.1. Step length. The step length α_k has to produce a substantial reduction of the objective function f. The ideal choice would be α_k = argmin_{α>0} f(R_{x_k}(αp_k)), if this exact line search could be carried out efficiently; in general, it is too expensive to find this value. A more practical strategy, the inexact line search, identifies a step length that achieves an adequate reduction of the objective function at minimal cost. A popular inexact line search condition stipulates that the step length α should give a sufficient decrease in the objective function f, measured by the following condition.

Definition 2.3 (Armijo condition). Let f: M → R be a locally Lipschitz function on a Riemannian manifold M with a retraction R, x ∈ M and p ∈ T_xM. If the inequality

    f(R_x(αp)) − f(x) ≤ c_1 α f°(x; p)

holds for a step length α and a fixed constant c_1 ∈ (0, 1), then α satisfies the Armijo condition.

The existence of such a step size is proved later in Theorem 2.8.

Sufficient decrease and backtracking. The Armijo condition alone does not ensure that the algorithm makes reasonable progress. We present here a backtracking line search algorithm, which chooses its candidates appropriately so as to make adequate progress. An adequate step length is found after a finite number of iterations, because α eventually becomes small enough that the Armijo condition holds.

Algorithm 2 A backtracking line search on a Riemannian manifold
1: Require: A Riemannian manifold M, a locally Lipschitz function f: M → R, a retraction R from TM to M, scalars c_1, ρ ∈ (0, 1).
2: Input: α_0 > 0.
3: Output: α.
4: α = α_0.
5: repeat
6:   α = ρα.
7: until f(R_{x_k}(αp_k)) − f(x_k) ≤ c_1 α f°(x_k; p_k).
8: Terminate with α_k = α.
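As an illustration of Algorithm 2 (a sketch under our own assumptions, not the authors' implementation), the routine below takes the retraction and a surrogate value f_circ for the Clarke directional derivative f°(x_k; p_k) as caller-supplied inputs, since in the nonsmooth setting this quantity is only available approximately (see Section 2.4); the lower bound alpha_min is a safeguard added for this sketch.

```python
def backtracking(f, retract, x, p, f_circ, alpha0=1.0, c1=1e-4, rho=0.5, alpha_min=1e-16):
    """Backtracking line search on a manifold (sketch of Algorithm 2).

    retract(x, v) plays the role of R_x(v); f_circ is a (negative) surrogate for the
    Clarke directional derivative f°(x; p) of the locally Lipschitz function f.
    """
    alpha = alpha0
    fx = f(x)
    while True:
        alpha *= rho                                      # shrink the trial step
        if f(retract(x, alpha * p)) - fx <= c1 * alpha * f_circ:
            return alpha                                  # Armijo condition (Definition 2.3) holds
        if alpha < alpha_min:                             # safeguard, not part of the paper's listing
            return alpha
```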

The Wolfe conditions. There are other useful conditions that rule out unacceptably short step lengths; for example, one can use a second requirement called the curvature condition. To state this requirement for nonsmooth functions on nonlinear spaces, some preliminaries are needed. To define the curvature condition on a Riemannian manifold, we have to translate vectors from one tangent space to another.

Definition 2.4 (Vector transport). A vector transport associated to a retraction R is a continuous function T: TM ⊕ TM → TM, (η_x, ξ_x) ↦ T_{η_x}(ξ_x), which for all (η_x, ξ_x) satisfies the following conditions:
(i) T_{η_x}: T_xM → T_{R(η_x)}M is a linear map,
(ii) T_{0_x}(ξ_x) = ξ_x.

In short, if η_x ∈ T_xM and R_x(η_x) = y, then T_{η_x} transports vectors from the tangent space of M at x to the tangent space at y. Two additional properties are needed in this paper. First, the vector transport has to preserve inner products, that is,

    ⟨T_{η_x}(ξ_x), T_{η_x}(ζ_x)⟩ = ⟨ξ_x, ζ_x⟩.    (2.1)

In particular, ξ_x ↦ T_{η_x}(ξ_x) is then an isometry and possesses an isometric inverse. Second, we assume that T satisfies the following condition, called the locking condition in [6], for transporting vectors along their own direction:

    T_{ξ_x}(ξ_x) = β_{ξ_x} T_{R_{ξ_x}}(ξ_x),   β_{ξ_x} = ‖ξ_x‖ / ‖T_{R_{ξ_x}}(ξ_x)‖,    (2.2)

where

    T_{R_{η_x}}(ξ_x) = DR_x(η_x)(ξ_x) = d/dt R_x(η_x + tξ_x) |_{t=0}.

These conditions can be difficult to verify, but they are satisfied in particular for the most natural choices of R and T; for example, the exponential map as a retraction together with parallel transport as a vector transport satisfies them with β_{ξ_x} = 1. For a further discussion, especially on the construction of vector transports satisfying the locking condition, we refer to [6, Sec. 4]. We introduce the more intuitive notation

    T_{x→y}(ξ_x) = T_{η_x}(ξ_x),   T^{-1}_{x→y}(ξ_y) = (T_{η_x})^{-1}(ξ_y)   whenever y = R_x(η_x).

Now we present the nonsmooth curvature condition for locally Lipschitz functions on Riemannian manifolds.

Definition 2.5 (Curvature condition). The step length α satisfies the curvature inequality if

    sup_{ξ ∈ ∂f(R_x(αp))} ⟨ξ, β_{αp}^{-1} T_{x→R_x(αp)}(p)⟩ ≥ c_2 f°(x; p)

holds for a constant c_2 ∈ (c_1, 1), where c_1 is the Armijo constant. Note that if there exists ξ ∈ ∂f(R_x(αp)) such that ⟨ξ, β_{αp}^{-1} T_{x→R_x(αp)}(p)⟩ ≥ c_2 f°(x; p), then the curvature inequality holds. As in the smooth case, we can define a strong curvature condition by

    sup_{ξ ∈ ∂f(R_x(αp))} |⟨ξ, β_{αp}^{-1} T_{x→R_x(αp)}(p)⟩| ≤ c_2 |f°(x; p)|.

The following lemma can be proved using Lemma 3.1 of [22].

Lemma 2.6. Let f: M → R be a locally Lipschitz function on a Riemannian manifold M, and let the function

    W(α) := f(R_x(αp)) − f(x) − c_2 α f°(x; p),    (2.3)

where c_2 ∈ (c_1, 1), x ∈ M and p ∈ T_xM, be increasing on a neighborhood of some α_0. Then α_0 satisfies the curvature condition. Indeed, if W is increasing on a neighborhood of some α_0, then there exists ξ ≥ 0 in

    ∂W(α_0) ⊆ ⟨∂f(R_x(α_0 p)), DR_x(α_0 p)(p)⟩ − c_2 f°(x; p),

and the result is obtained using the locking condition.
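As a concrete instance of Definition 2.4 and the locking condition (2.2), the sketch below (ours) implements parallel transport on the unit sphere along geodesics of the exponential map; as noted above, this pair is isometric and satisfies (2.2) with β = 1.

```python
import numpy as np

def parallel_transport_sphere(x, v, w):
    """Parallel transport on the unit sphere along the geodesic t -> exp_x(t v).

    Transports the tangent vector w from T_x S^{n-1} to T_{exp_x(v)} S^{n-1}.
    With the exponential map as retraction this transport is isometric and
    satisfies the locking condition (2.2) with beta = 1.
    """
    t = np.linalg.norm(v)
    if t < 1e-16:
        return w.copy()                   # T_{0_x} is the identity
    u = v / t
    a = np.dot(u, w)                      # component of w along the geodesic direction
    return w + (np.cos(t) - 1.0) * a * u - np.sin(t) * a * x

# quick isometry check: the norm of w is preserved
x = np.array([0.0, 0.0, 1.0]); v = np.array([0.3, 0.0, 0.0]); w = np.array([0.1, -0.2, 0.0])
print(np.linalg.norm(w), np.linalg.norm(parallel_transport_sphere(x, v, w)))
```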

Definition 2.7 (Wolfe conditions). Let f: M → R be a locally Lipschitz function and p ∈ T_xM. If α satisfies the Armijo and curvature conditions, then we say α satisfies the Wolfe conditions.

In the following theorem, the existence of step lengths satisfying the Wolfe conditions is proved under some assumptions.

Theorem 2.8. Assume that f: M → R is a locally Lipschitz function on a Riemannian manifold M, R_x: T_xM → M is a retraction, p ∈ T_xM is chosen such that f°(x; p) < 0, and f is bounded below on {R_x(αp) : α > 0}. If 0 < c_1 < c_2 < 1, then there exist step lengths satisfying the Wolfe conditions.

Proof. Since φ(α) = f(R_x(αp)) is bounded below for all α > 0 and 0 < c_1 < 1, the line l(α) = f(x) + α c_1 f°(x; p) must intersect the graph of φ at least once. Indeed, if we assume l(α) < φ(α) for all α > 0, then

    f°(x; p) < c_1 f°(x; p) ≤ limsup_{α ↓ 0} (f(R_x(αp)) − f(x)) / α ≤ f°(x; p),

which is a contradiction. Hence there exists ᾱ > 0 such that l(ᾱ) ≥ φ(ᾱ). But since l(α) is not bounded below while φ(α) is bounded below, there exists α̂ > 0 with l(α̂) = φ(α̂). Let α' > 0 be the smallest such intersection value; then

    f(R_x(α'p)) = f(x) + α' c_1 f°(x; p).    (2.4)

It follows that the Armijo condition is satisfied for all step lengths less than α'. Now, by the mean value theorem, there exist ε ∈ (0, 1) and ξ ∈ ∂f(R_x(εα'p)) such that

    f(R_x(α'p)) − f(x) = α' ⟨ξ, DR_x(εα'p)(p)⟩.    (2.5)

Combining (2.4) and (2.5), we obtain ⟨ξ, DR_x(εα'p)(p)⟩ = c_1 f°(x; p) > c_2 f°(x; p), where the last inequality uses f°(x; p) < 0 and c_1 < c_2. Using the locking condition, we conclude that εα' satisfies the curvature condition.

Remark 2.9. There are a number of rules for choosing the step length α for problems on linear spaces; see [23, 32]. We can define their generalizations on Riemannian manifolds using the concepts of nonsmooth analysis on Riemannian manifolds and the notions of retraction and vector transport. For instance, one can use a generalization of the Mifflin condition, proposed first by Mifflin in [23]: the step length α satisfies the Mifflin condition if

    f(R_x(αp)) − f(x) ≤ −c_1 α ‖p‖,   sup_{ξ ∈ ∂f(R_x(αp))} ⟨ξ, β_{αp}^{-1} T_{x→R_x(αp)}(p)⟩ ≥ −c_2 ‖p‖,

for fixed constants c_1 ∈ (0, 1), c_2 ∈ (c_1, 1).

2.2. Descent directions. To obtain global convergence results for a line search method, we must not only have well-chosen step lengths but also well-chosen search directions. The following definition is the equivalent of gradient-orientedness carried over to nonsmooth problems; see [25]. We know that the search direction for a smooth optimization problem often has the form p_k = −P_k grad f(x_k), where P_k is a symmetric and nonsingular linear map. It is therefore natural to use elements of the subdifferential of f at x_k in Definition 2.10 and produce a subgradient-oriented descent sequence in nonsmooth problems.

Definition 2.10 (Subgradient-oriented descent sequence). A sequence {p_k} of descent directions is called subgradient-oriented if there exist a sequence of subgradients {g_k} and a sequence of symmetric linear maps {P_k: T_{x_k}M → T_{x_k}M} satisfying

    0 < λ ≤ λ_min(P_k) ≤ λ_max(P_k) ≤ Λ < ∞

for 0 < λ < Λ < ∞ and all k ∈ N, such that p_k = −P_k g_k, where λ_min(P_k) and λ_max(P_k) denote respectively the smallest and largest eigenvalues of P_k.

In the next definition, we present an approximation of the subdifferential that can be computed in practice. As we aim at transporting subgradients from tangent spaces at points near x ∈ M to the tangent space at x, it is important to define a notion of injectivity radius for R_x. Let

    ι(x) := sup{ε > 0 : R_x: B(0_x, ε) → B_R(x, ε) is injective}.
Then the injectivity radius of M with respect to the retraction R is defined as ι(M) := inf_{x ∈ M} ι(x).

When the exponential map is used as the retraction, this definition coincides with the usual one.

Definition 2.11 (ε-subdifferential). Let f: M → R be a locally Lipschitz function on a Riemannian manifold M and 0 < 2ε < ι(x). We define the ε-subdifferential of f at x, denoted by ∂_ε f(x), as

    ∂_ε f(x) = cl conv{β_η^{-1} T^{-1}_{x→y}(∂f(y)) : y ∈ cl B_R(x, ε)},   where η = R_x^{-1}(y).

Every element of the ε-subdifferential is called an ε-subgradient. (The coefficient 2 in the condition 2ε < ι(x) guarantees that the inverse vector transport is well-defined for y on the boundary of B_R(x, ε).)

Definition 2.12 (ε-subgradient-oriented descent sequence). A sequence {p_k} of descent directions is called ε-subgradient-oriented if there exist a sequence of ε-subgradients {g_k} and a sequence of symmetric linear maps {P_k: T_{x_k}M → T_{x_k}M} satisfying

    0 < λ ≤ λ_min(P_k) ≤ λ_max(P_k) ≤ Λ < ∞

for 0 < λ < Λ < ∞ and all k ∈ N, such that p_k = −P_k g_k, where λ_min(P_k) and λ_max(P_k) denote respectively the smallest and largest eigenvalues of P_k.

From now on, we assume that a basis of T_xM is given for every x ∈ M, and we represent every linear map by its matrix with respect to the given basis. In the following, we use a positive definite matrix P in order to define a P-norm equivalent to the norm induced by the inner product on the tangent space. Indeed, ‖ξ‖_P = ⟨Pξ, ξ⟩^{1/2} and

    λ_min(P)^{1/2} ‖ξ‖ ≤ ‖ξ‖_P ≤ λ_max(P)^{1/2} ‖ξ‖.

Theorem 2.13. Assume that f: M → R is a locally Lipschitz function on a Riemannian manifold M, R_x: T_xM → M is a retraction, 0 ∉ ∂_ε f(x), and g = argmin_{ξ ∈ ∂_ε f(x)} ‖ξ‖_P, where P is a positive definite matrix. Assume that p = −Pg. Then f°_ε(x; p) = −‖g‖²_P and p is a descent direction, where f°_ε(x; p) := sup_{ξ ∈ ∂_ε f(x)} ⟨ξ, −Pg⟩.

Proof. We first prove that f°_ε(x; p) = −‖g‖²_P. It is clear that

    f°_ε(x; p) = sup_{ξ ∈ ∂_ε f(x)} ⟨ξ, −Pg⟩ ≥ ⟨g, −Pg⟩ = −‖g‖²_P.

Now we claim that ‖g‖²_P ≤ ⟨ξ, Pg⟩ for every ξ ∈ ∂_ε f(x), which implies sup_{ξ ∈ ∂_ε f(x)} ⟨ξ, −Pg⟩ ≤ −‖g‖²_P. Proof of the claim: assume to the contrary that there exists ξ ∈ ∂_ε f(x) such that ⟨ξ, Pg⟩ < ‖g‖²_P, and consider w := g + t(ξ − g) ∈ ∂_ε f(x). Then

    ‖g‖²_P − ‖w‖²_P = −t(2⟨ξ − g, Pg⟩ + t⟨ξ − g, P(ξ − g)⟩),

and we can take t > 0 small enough that ‖g‖²_P > ‖w‖²_P, which contradicts the minimality of g; the first part of the theorem is proved.

Now we prove that p is a descent direction. Let ᾱ := ε/‖p‖. Then for every t ∈ (0, ᾱ), by Lebourg's mean value theorem there exist 0 < t_0 < 1 and ξ ∈ ∂f(R_x(t_0 tp)) such that f(R_x(tp)) − f(x) = ⟨ξ, DR_x(t_0 tp)(tp)⟩. Using the locking condition and the isometry of the vector transport, we have

    f(R_x(tp)) − f(x) = ⟨ξ, DR_x(tt_0 p)(tp)⟩ = t ⟨β_{tt_0p}^{-1} T^{-1}_{x→R_x(tt_0p)}(ξ), p⟩.

Since ‖tt_0 p‖ ≤ ε (note that R_x(tt_0 p) ∈ cl B_R(x, ε)), it follows that β_{tt_0p}^{-1} T^{-1}_{x→R_x(tt_0p)}(ξ) ∈ ∂_ε f(x). Therefore, f(R_x(tp)) − f(x) ≤ t f°_ε(x; p) < 0.
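Both Theorem 2.13 and the descent direction algorithm of the next subsection require the minimum P-norm element of the convex hull of finitely many (ε-)subgradients, i.e., a small convex quadratic program over the simplex. The sketch below (ours, not the paper's solver) approximates it with a plain Frank-Wolfe iteration, under the assumption that the working set is stored as the columns of a matrix W.

```python
import numpy as np

def min_norm_in_hull(W, P, iters=500):
    """Approximate argmin_{v in conv(columns of W)} ||v||_P, with ||v||_P^2 = <P v, v>.

    Frank-Wolfe on the simplex: minimize q(lmb) = lmb^T (W^T P W) lmb.
    Returns the point g = W @ lmb and the weights lmb.
    """
    m = W.shape[1]
    Q = W.T @ P @ W                        # Gram matrix in the P-inner product
    lmb = np.full(m, 1.0 / m)              # start at the barycenter
    for k in range(iters):
        grad = 2.0 * Q @ lmb               # gradient of q at lmb
        j = int(np.argmin(grad))           # best vertex e_j of the simplex
        step = 2.0 / (k + 2.0)             # standard Frank-Wolfe step size
        lmb = (1.0 - step) * lmb
        lmb[j] += step
    return W @ lmb, lmb

# toy usage: two subgradients in R^2, P = identity
W = np.array([[1.0, -0.5], [0.2, 0.4]])
g, lmb = min_norm_in_hull(W, np.eye(2))
print(g, lmb)
```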

2.3. A descent direction algorithm. For general nonsmooth optimization problems it may be difficult to give an explicit description of the full ε-subdifferential. Therefore, we need an iterative procedure to approximate it. We start with a subgradient at an arbitrary point near x, transported to the tangent space at x; in every subsequent iteration, the subgradient of a new point near x is computed, transported to the tangent space at x, and added to the working set to improve the approximation of ∂_ε f(x). Indeed, rather than describing the entire ε-subdifferential at each iteration, we approximate ∂_ε f(x) by the convex hull of finitely many of its elements. To this end, let P be a positive definite matrix and W_k := {v_1, ..., v_k} ⊆ ∂_ε f(x), and define

    g_k := argmin_{v ∈ conv W_k} ‖v‖_P.

Now if

    f(R_x((ε/‖p_k‖) p_k)) − f(x) ≤ −cε ‖g_k‖²_P / ‖p_k‖,   c ∈ (0, 1),    (2.6)

where p_k = −P g_k, then we can say that conv W_k is an acceptable approximation of ∂_ε f(x). Otherwise, using the next lemma, we add a new element of ∂_ε f(x) \ conv W_k to W_k.

Lemma 2.14. Let W_k = {v_1, ..., v_k} ⊆ ∂_ε f(x), 0 ∉ conv W_k and g_k = argmin{‖v‖_P : v ∈ conv W_k}. If

    f(R_x((ε/‖p_k‖) p_k)) − f(x) > −cε ‖g_k‖²_P / ‖p_k‖,

where c ∈ (0, 1) and p_k = −P g_k, then there exist θ_0 ∈ (0, ε/‖p_k‖] and v̄_{k+1} ∈ ∂f(R_x(θ_0 p_k)) such that

    ⟨β_{θ_0p_k}^{-1} T^{-1}_{x→R_x(θ_0p_k)}(v̄_{k+1}), p_k⟩ ≥ −c ‖g_k‖²_P

and v_{k+1} := β_{θ_0p_k}^{-1} T^{-1}_{x→R_x(θ_0p_k)}(v̄_{k+1}) ∉ conv W_k.

Proof. We prove this lemma using Lemma 3.1 and Proposition 3.1 in [22]. Define

    h(t) := f(R_x(tp_k)) − f(x) + ct ‖g_k‖²_P,   t ∈ R,    (2.7)

and a new locally Lipschitz function G: B(0_x, ε) ⊆ T_xM → R by G(g) = f(R_x(g)); then h(t) = G(tp_k) − G(0) + ct‖g_k‖²_P. Assume that h(ε/‖p_k‖) > 0. Then by Proposition 3.1 of [22] there exists θ_0 ∈ [0, ε/‖p_k‖] such that h is increasing in a neighborhood of θ_0. Therefore, by Lemma 3.1 of [22], every ξ ∈ ∂h(θ_0) satisfies ξ ≥ 0. By [10, Proposition 3.1],

    ∂h(θ_0) ⊆ ⟨∂f(R_x(θ_0 p_k)), DR_x(θ_0 p_k)(p_k)⟩ + c ‖g_k‖²_P.

If v̄_{k+1} ∈ ∂f(R_x(θ_0 p_k)) is such that ⟨v̄_{k+1}, DR_x(θ_0 p_k)(p_k)⟩ + c‖g_k‖²_P ∈ ∂h(θ_0), then by the locking condition

    ⟨β_{θ_0p_k}^{-1} T^{-1}_{x→R_x(θ_0p_k)}(v̄_{k+1}), p_k⟩ + c ‖g_k‖²_P ≥ 0.

This implies that v_{k+1} := β_{θ_0p_k}^{-1} T^{-1}_{x→R_x(θ_0p_k)}(v̄_{k+1}) ∉ conv W_k (every v ∈ conv W_k satisfies ⟨v, p_k⟩ ≤ −‖g_k‖²_P < −c‖g_k‖²_P by the claim in the proof of Theorem 2.13), which proves our claim.

Now we present Algorithm 3, which finds a vector v_{k+1} ∈ ∂_ε f(x) that can be added to the set W_k in order to improve the approximation of ∂_ε f(x); this algorithm terminates after finitely many iterations, see [8]. We then give Algorithm 4 for finding a descent direction. Theorem 2.15 proves that Algorithm 4 terminates after finitely many iterations.

Theorem 2.15. For the point x_1 ∈ M, let the level set N = {x : f(x) ≤ f(x_1)} be bounded. Then for each x ∈ N, Algorithm 4 terminates after finitely many iterations.

Algorithm 3 An h-increasing point algorithm; (v, t) = Increasing(x, p, g, a, b, P, c).
1: Require: A Riemannian manifold M, a locally Lipschitz function f: M → R, a retraction R from TM to M and a vector transport T.
2: Input: x ∈ M, g, p ∈ T_xM, a, b ∈ R, c ∈ (0, 1) and a positive definite matrix P such that p = −Pg.
3: Let t = b/‖p‖, b = b/‖p‖ and a = a/‖p‖.
4: repeat
5:   select v ∈ ∂f(R_x(tp)) such that ⟨v, β_{tp}^{-1} T_{x→R_x(tp)}(p)⟩ + c‖g‖²_P ∈ ∂h(t), where h is defined in (2.7);
6:   if ⟨v, β_{tp}^{-1} T_{x→R_x(tp)}(p)⟩ + c‖g‖²_P < 0 then
7:     t = (a + b)/2
8:     if h(b) > h(t) then
9:       a = t
10:    else
11:      b = t
12:    end if
13:  end if
14: until ⟨v, β_{tp}^{-1} T_{x→R_x(tp)}(p)⟩ + c‖g‖²_P ≥ 0

Algorithm 4 A descent direction algorithm; (g_k, p_k) = Descent(x, δ, c, ε, P).
1: Require: A Riemannian manifold M, a locally Lipschitz function f: M → R, a retraction R from TM to M, the injectivity radius ι(M) > 0 and a vector transport T.
2: Input: x ∈ M, δ, c ∈ (0, 1), 0 < ε < ι(M) and a positive definite matrix P.
3: Select an arbitrary v ∈ ∂_ε f(x).
4: Set W_1 = {v} and let k = 1.
5: Step 1: (Compute a descent direction)
6: Solve the following minimization problem and let g_k be its solution:

    min_{v ∈ conv W_k} ‖v‖_P.

7: if ‖g_k‖² ≤ δ then Stop.
8: else let p_k = −P g_k.
9: end if
10: Step 2: (Stopping condition)
11: if f(R_x((ε/‖p_k‖) p_k)) − f(x) ≤ −cε ‖g_k‖²_P / ‖p_k‖, then Stop.
12: end if
13: Step 3: (v, t) = Increasing(x, p_k, g_k, 0, ε, P, c).
14: Set v_{k+1} = β_{tp_k}^{-1} T^{-1}_{x→R_x(tp_k)}(v), W_{k+1} = W_k ∪ {v_{k+1}} and k = k + 1. Go to Step 1.
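The control flow of Algorithm 4 can be summarized by the following sketch (ours). It reuses the Frank-Wolfe helper min_norm_in_hull from the earlier sketch and collapses the role of Algorithm 3 into a single caller-supplied callable transported_subgrad, so it illustrates the loop structure rather than reproducing the paper's method.

```python
import numpy as np

def descent_direction(x, f, retract, transported_subgrad, P, eps, delta, c, max_iter=100):
    """Sketch of Algorithm 4: grow W_k ⊆ ∂_ε f(x) until p_k = -P g_k is acceptable.

    transported_subgrad(x, t, p) must return an element of ∂_ε f(x), i.e. a Clarke
    subgradient at R_x(t p) transported back to T_x M and scaled by 1/β (the role of
    Algorithm 3 / Lemma 2.14 is simplified to this single callable).
    """
    W = [transported_subgrad(x, 0.0, np.zeros_like(x))]   # arbitrary initial ε-subgradient
    fx = f(x)
    g, p = None, None
    for _ in range(max_iter):
        Wmat = np.column_stack(W)
        g, _ = min_norm_in_hull(Wmat, P)                   # minimum P-norm element of conv W
        if np.dot(g, g) <= delta:                          # ||g_k||^2 <= δ: signal that ε, δ must shrink
            return g, None
        p = -P @ g
        t = eps / np.linalg.norm(p)
        if f(retract(x, t * p)) - fx <= -c * eps * (g @ P @ g) / np.linalg.norm(p):  # test (2.6)
            return g, p                                    # acceptable descent direction
        W.append(transported_subgrad(x, t, p))             # enrich the working set (Lemma 2.14)
    return g, p
```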

Proof. We claim that either the stopping condition is satisfied after a finite number of iterations, or ‖g_m‖² ≤ δ for some m and the algorithm terminates. If the stopping condition is not satisfied and ‖g_k‖² > δ, then by Lemma 2.14 we find v_{k+1} ∉ conv W_k such that ⟨v_{k+1}, p_k⟩ ≥ −c‖g_k‖²_P. Note that ‖DR_x‖ on cl B(0_x, ε) is bounded by some m_1 > 0, and therefore β_η^{-1} ≤ m_1 for every η ∈ cl B(0_x, ε). Hence, by the isometry of the vector transport and the Lipschitzness of f with constant L, Theorem 2.9 of [10] implies that ‖ξ‖ ≤ m_1 L for every ξ ∈ ∂_ε f(x). Now, by definition, g_{k+1} ∈ conv({v_{k+1}} ∪ W_k) has the minimum P-norm; therefore, for all t ∈ (0, 1),

    ‖g_{k+1}‖²_P ≤ ‖t v_{k+1} + (1 − t) g_k‖²_P
                = ‖g_k‖²_P + 2t ⟨P g_k, v_{k+1} − g_k⟩ + t² ‖v_{k+1} − g_k‖²_P
                ≤ ‖g_k‖²_P − 2t(1 − c) ‖g_k‖²_P + 4t² L² m_1² λ_max(P),

where the last inequality uses ⟨v_{k+1}, p_k⟩ ≥ −c‖g_k‖²_P and ‖v_{k+1} − g_k‖ ≤ 2Lm_1. Taking

    t = (1 − c)(2Lm_1)^{-2} λ_max(P)^{-1} ‖g_k‖²_P ∈ (0, 1),

and using δ^{1/2} ∈ (0, Lm_1) and λ_min(P)^{-1} ‖g_k‖²_P ≥ ‖g_k‖² > δ, we obtain

    ‖g_{k+1}‖²_P ≤ (1 − [(1 − c)(2Lm_1)^{-1} δ^{1/2} λ_min(P)^{1/2} λ_max(P)^{-1/2}]²) ‖g_k‖²_P.

Now, with r = 1 − [(1 − c)(2Lm_1)^{-1} δ^{1/2} λ_min(P)^{1/2} λ_max(P)^{-1/2}]² ∈ (0, 1), it follows that

    ‖g_{k+1}‖²_P ≤ r ‖g_k‖²_P ≤ ... ≤ r^k ‖g_1‖²_P ≤ r^k (Lm_1)² λ_max(P).

Therefore, after a finite number of iterations, ‖g_{k+1}‖²_P ≤ δ λ_min(P), which proves that ‖g_{k+1}‖² ≤ δ.

2.4. Step length selection algorithms. A crucial observation is that verifying the Wolfe conditions of Definition 2.7 can be impractical when no explicit expression for the subdifferential ∂f(x) is available. Using an approximation of the Clarke subdifferential, we overcome this problem. In the last subsection we approximated f°(x; p_k) by −‖g_k‖²_P, where p_k := −P g_k, g_k = argmin{‖v‖_P : v ∈ conv W_k} and conv W_k is an approximation of ∂_ε f(x). Therefore, in our line search algorithm we use this approximation of f°(x; p) to find a suitable step length.

Algorithm 5 A line search algorithm; α = Line(x, p, g, P, c_1, c_2).
1: Require: A Riemannian manifold M, a locally Lipschitz function f: M → R, a retraction R from TM to M, the injectivity radius ι(M) > 0 and a vector transport T.
2: Input: x ∈ M, a descent direction p ∈ T_xM with p = −Pg, where g ∈ ∂_ε f(x) and P is a positive definite matrix, and c_1 ∈ (0, 1), c_2 ∈ (c_1, 1).
3: Set α_0 = 0, α_max < ι(M), α_1 = 1 and i = 1.
4: Repeat
5:   Evaluate A(α_i) := f(R_x(α_i p)) − f(x) + c_1 α_i ‖g‖²_P.    (2.8)
6:   if A(α_i) > 0 then
7:     α must be obtained by Zoom(x, p, g, P, α_{i−1}, α_i, c_1, c_2)
8:     Stop
9:   end if
10:  Compute ξ ∈ ∂f(R_x(α_i p)) such that ⟨ξ, β_{α_ip}^{-1} T_{x→R_x(α_ip)}(p)⟩ + c_2 ‖g‖²_P ∈ ∂W(α_i), where W is defined in (2.3).
11:  if ⟨ξ, β_{α_ip}^{-1} T_{x→R_x(α_ip)}(p)⟩ + c_2 ‖g‖²_P ≥ 0 then α = α_i
12:    Stop
13:  else
14:    Choose α_{i+1} ∈ (α_i, α_max)
15:  end if
16:  i = i + 1.
17: End(Repeat)

The task of a line search algorithm is to find a step size that decreases the objective function along the search path. The Wolfe conditions are used in the line search to enforce a sufficient decrease of the objective function and to exclude unnecessarily small step sizes. Algorithm 5 is a one-dimensional search procedure for the function φ(α) = f(R_x(αp)) that finds a step length satisfying the Armijo and curvature conditions. The procedure is a generalization of the algorithm for the well-known Wolfe conditions for smooth functions; see [24]. The algorithm has two stages. The first stage begins with a trial estimate α_1 and keeps increasing it until it finds either an acceptable step length or an interval that contains the desired step length. The parameter α_max is a user-supplied bound on the maximum step length allowed.

Algorithm 6 α = Zoom(x, p, g, P, a, b, c_1, c_2).
1: Require: A Riemannian manifold M, a locally Lipschitz function f: M → R, a retraction R from TM to M and a vector transport T.
2: Input: x ∈ M, a descent direction p ∈ T_xM with p = −Pg, where g ∈ ∂_ε f(x) and P is a positive definite matrix, c_1 ∈ (0, 1), c_2 ∈ (c_1, 1), a, b ∈ R.
3: i = 1, a_1 = a, b_1 = b.
4: Repeat
5:   α_i = (a_i + b_i)/2
6:   Evaluate A(α_i) := f(R_x(α_i p)) − f(x) + c_1 α_i ‖g‖²_P.
7:   if A(α_i) > 0 then
8:     b_{i+1} = α_i, a_{i+1} = a_i.
9:   else
10:    Compute ξ ∈ ∂f(R_x(α_i p)) such that ⟨ξ, β_{α_ip}^{-1} T_{x→R_x(α_ip)}(p)⟩ + c_2 ‖g‖²_P ∈ ∂W(α_i), where W is defined in (2.3).
11:    if ⟨ξ, β_{α_ip}^{-1} T_{x→R_x(α_ip)}(p)⟩ + c_2 ‖g‖²_P ≥ 0 then α = α_i
12:      Stop.
13:    else a_{i+1} = α_i, b_{i+1} = b_i.
14:    end if
15:  end if
16:  i = i + 1.
17: End(Repeat)

The last step of Algorithm 5 performs extrapolation to find the next trial value α_{i+1}; to implement this step, we can simply set α_{i+1} to some constant multiple of α_i. In the case that Algorithm 5 finds an interval [α_{i−1}, α_i] that contains the desired step length, the second stage is invoked: Algorithm 6, called Zoom, successively decreases the size of this interval.

Remark 2.16. By Lemma 3.1 of [22], if there exists ξ ∈ ∂f(R_x(α_i p)) such that ⟨ξ, DR_x(α_i p)(p)⟩ + c_2‖g‖²_P ∈ ∂W(α_i) and ⟨ξ, DR_x(α_i p)(p)⟩ + c_2‖g‖²_P < 0, where W is defined in (2.3), then W is decreasing on a neighborhood of α_i, which means that η ≤ 0 for every η ∈ ∂W(α_i).
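The bracketing-plus-bisection structure of Algorithms 5 and 6 can be condensed into one routine. The sketch below (ours) takes A(·) and the curvature test as callables supplied by the caller, since in the nonsmooth Riemannian setting both involve subgradients and a vector transport; the fallback on loop exhaustion reflects the practical remark after Proposition 2.18 below.

```python
def wolfe_step(A, curvature_holds, alpha_max, alpha1=1.0, max_iter=60):
    """Condensed sketch of Algorithms 5-6.

    A(alpha) plays the role of f(R_x(alpha p)) - f(x) + c1 * alpha * ||g||_P^2 (see (2.8));
    curvature_holds(alpha) should implement the subgradient test of line 11.
    """
    lo, hi = 0.0, None
    alpha = alpha1
    for _ in range(max_iter):
        if A(alpha) > 0.0:
            hi = alpha                              # overshoot: bracket found, switch to bisection
        elif curvature_holds(alpha):
            return alpha                            # both (approximate) Wolfe conditions hold
        else:
            lo = alpha                              # Armijo-type decrease holds but step too short
        alpha = min(2.0 * alpha, alpha_max) if hi is None else 0.5 * (lo + hi)
    return lo if lo > 0.0 else alpha                # practical fallback: an Armijo point
```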

Proposition 2.17. Assume that f: M → R is a locally Lipschitz function and p is the descent direction obtained by Algorithm 4. Then either Algorithm 6 terminates after finitely many iterations, or it generates a sequence of intervals [a_i, b_i] such that each contains subintervals satisfying the Wolfe conditions, and a_i and b_i converge to a step length a > 0. Moreover, there exist ξ_1, ξ_2, ξ_3 ∈ ∂f(R_x(ap)) such that

    ⟨ξ_1, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ ≤ −c_2 ‖g‖²_P,   ⟨ξ_2, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ ≥ −c_2 ‖g‖²_P,
    ⟨ξ_3, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ ≥ −c_1 ‖g‖²_P.

Proof. Suppose that the algorithm does not terminate after finitely many iterations. Since {a_i} and {b_i} are monotone sequences, they converge to some a and b. As b_i − a_i = (b_1 − a_1)/2^{i−1}, the difference b_i − a_i converges to zero; therefore a = b. We claim that a_i > 0 after finitely many iterations. Since p is a descent direction, there exists ᾱ > 0 such that A(s) ≤ 0 for all s ∈ (0, ᾱ), where A(·) is defined in Algorithm 5. Note that there exists m > 0 such that b_1/2^i ≤ ᾱ for every i ≥ m. If a_{m+1} = 0, then we must have A(α_i) > 0 for all i = 1, ..., m. Hence b_{m+1} = α_m, a_m = a_{m+1} = 0 and α_{m+1} = b_{m+1}/2 = b_1/2^{m+1}; therefore α_{m+1} ≤ ᾱ. This implies A(α_{m+1}) ≤ 0, and then a_{m+2} = α_{m+1} > 0.

Let S be the set of all indices with a_{i+1} = α_i. For every i ∈ S there exists ξ_i ∈ ∂f(R_x(α_i p)) such that ⟨ξ_i, β_{α_ip}^{-1} T_{x→R_x(α_ip)}(p)⟩ + c_2‖g‖²_P < 0. Since ξ_i ∈ ∂f(R_x(α_i p)) and f is locally Lipschitz on a neighborhood of x, by [10, Theorem 2.9] the sequence {ξ_i} contains a convergent subsequence; without loss of generality, we can assume it converges to some ξ_1 ∈ ∂f(R_x(ap)). Therefore ⟨ξ_1, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ + c_2‖g‖²_P ≤ 0.

Since a_i < b_i, A(a_i) ≤ 0 and A(a_i) < A(b_i), the interval [a_i, b_i] contains a step length r_i such that A(·) is increasing on a neighborhood of r_i and A(r_i) ≤ 0. Since c_1 < c_2, W(·) is also increasing in a neighborhood of r_i; therefore the Wolfe conditions are satisfied at r_i. Assume that ⟨κ_i, β_{r_ip}^{-1} T_{x→R_x(r_ip)}(p)⟩ + c_2‖g‖²_P ∈ ∂W(r_i) for some κ_i ∈ ∂f(R_x(r_ip)); then ⟨κ_i, β_{r_ip}^{-1} T_{x→R_x(r_ip)}(p)⟩ + c_2‖g‖²_P ≥ 0. As before, we can suppose without loss of generality that κ_i converges to some ξ_2 ∈ ∂f(R_x(ap)), which implies ⟨ξ_2, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ + c_2‖g‖²_P ≥ 0. Note that A(·) is increasing on a neighborhood of r_i; therefore, for all η_i ∈ ∂f(R_x(r_ip)) with ⟨η_i, β_{r_ip}^{-1} T_{x→R_x(r_ip)}(p)⟩ + c_1‖g‖²_P ∈ ∂A(r_i), we have ⟨η_i, β_{r_ip}^{-1} T_{x→R_x(r_ip)}(p)⟩ + c_1‖g‖²_P ≥ 0. As before, η_i converges (along a subsequence) to some ξ_3 ∈ ∂f(R_x(ap)) with ⟨ξ_3, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ + c_1‖g‖²_P ≥ 0.

In the next proposition, we prove that if Algorithm 6 does not terminate after finitely many iterations and converges to a, then the Wolfe conditions are satisfied at a.

Proposition 2.18. Assume that f: M → R is a locally Lipschitz function and p := −Pg is a descent direction obtained from Algorithm 4. If Algorithm 6 does not terminate after finitely many iterations and converges to a, then there exists ξ ∈ ∂f(R_x(ap)) such that ⟨ξ, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ = −c_2 ‖g‖²_P.

Proof. By Proposition 2.17, there exist ξ_1, ξ_2 ∈ ∂f(R_x(ap)) such that

    ⟨ξ_1, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ ≤ −c_2 ‖g‖²_P,   ⟨ξ_2, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ ≥ −c_2 ‖g‖²_P,

and ⟨ξ_1, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ + c_2‖g‖²_P, ⟨ξ_2, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ + c_2‖g‖²_P ∈ ∂W(a), where W is defined in (2.3). Since ∂W(a) is convex, 0 ∈ ∂W(a), which means that there exists ξ ∈ ∂f(R_x(ap)) such that ⟨ξ, β_{ap}^{-1} T_{x→R_x(ap)}(p)⟩ + c_2‖g‖²_P = 0.

In finite precision arithmetic, if the length of the interval [a_i, b_i] is too small, then the two function values f(R_x(a_ip)) and f(R_x(b_ip)) are close to each other. Therefore, in practice, Algorithm 6 must be terminated after finitely many iterations; see [24]. If Algorithm 6 does not find a step length satisfying the Wolfe conditions, then we select a step length satisfying the Armijo condition.

2.5. Minimization algorithms. Finally, Algorithm 7 is the minimization algorithm for locally Lipschitz objective functions on Riemannian manifolds.

Theorem 2.19. Let f: M → R be a locally Lipschitz function on a complete Riemannian manifold M, let N = {x : f(x) ≤ f(x_1)} be bounded, and let the sequence of symmetric matrices {P_k^s} satisfy

    0 < λ ≤ λ_min(P_k^s) ≤ λ_max(P_k^s) ≤ Λ < ∞    (2.9)

for 0 < λ < Λ < ∞ and all k, s. Then either Algorithm 7 terminates after a finite number of iterations with g_k^s = 0, or every accumulation point of the sequence {x_k} belongs to the set X = {x ∈ M : 0 ∈ ∂f(x)}.

Algorithm 7 A minimization algorithm; x = Min(f, x_1, θ_ε, θ_δ, ε_1, δ_1, c_1, c_2).
1: Require: A Riemannian manifold M, a locally Lipschitz function f: M → R, a retraction R from TM to M and the injectivity radius ι(M) > 0.
2: Input: A starting point x_1 ∈ M, c_1 ∈ (0, 1), c_2 ∈ (c_1, 1), θ_ε, θ_δ ∈ (0, 1), δ_1 ∈ (0, 1), ε_1 ∈ (0, ι(M)), k = 1 and P_1 = I.
3: Step 1. (Set new parameters) s = 1, x_k^s = x_k, P_k^s = P_k.
4: Step 2. (Descent direction) (g_k^s, p_k^s) = Descent(x_k^s, δ_k, c_1, ε_k, P_k^s).
5: if g_k^s = 0, then Stop.
6: end if
7: if ‖g_k^s‖² ≤ δ_k then set ε_{k+1} = ε_k θ_ε, δ_{k+1} = δ_k θ_δ, x_{k+1} = x_k^s, P_{k+1} = P_k^s, k = k + 1. Go to Step 1.
8: else α = Line(x_k^s, p_k^s, g_k^s, P_k^s, c_1, c_2), construct the next iterate x_k^{s+1} = R_{x_k^s}(α p_k^s) and update P_k^{s+1}. Set s = s + 1 and go to Step 2.
9: end if

Proof (of Theorem 2.19). If the algorithm terminates after a finite number of iterations, then x_k^s is an ε_k-stationary point of f. Suppose that the algorithm does not terminate after finitely many iterations. Since p_k^s is a descent direction and α ≥ ε_k/‖p_k^s‖, we have

    f(x_k^{s+1}) − f(x_k^s) ≤ −c_1 ε_k ‖g_k^s‖²_{P_k^s} / ‖p_k^s‖

for s = 1, 2, ...; therefore f(x_k^{s+1}) < f(x_k^s) for s = 1, 2, .... Since f is Lipschitz and N is bounded, f has a minimum on N. Therefore {f(x_k^s)}_s is a bounded decreasing sequence in R, hence convergent; thus f(x_k^s) − f(x_k^{s+1}) converges to zero and there exists s_1 such that

    f(x_k^s) − f(x_k^{s+1}) ≤ c_1 ε_k δ_k λ / ‖p_k^s‖   for all s ≥ s_1.

Thus

    λ ‖g_k^s‖² ≤ ‖g_k^s‖²_{P_k^s} ≤ (f(x_k^s) − f(x_k^{s+1})) ‖p_k^s‖ / (c_1 ε_k) ≤ δ_k λ,   s ≥ s_1.    (2.10)

Hence, after finitely many iterations, there exists s such that x_{k+1} = x_k^s. Since M is a complete Riemannian manifold and {x_k} ⊆ N is bounded, there exists a subsequence {x_{k_i}} converging to a point x* ∈ M. Since conv W_{k_i}^{s_{k_i}} is a subset of ∂_{ε_{k_i}} f(x_{k_i}^{s_{k_i}}), the minimum-P-norm element ĝ_{k_i} of ∂_{ε_{k_i}} f(x_{k_i}^{s_{k_i}}) satisfies

    ‖ĝ_{k_i}‖²_{P_{k_i}^{s_{k_i}}} := min{‖v‖²_{P_{k_i}^{s_{k_i}}} : v ∈ ∂_{ε_{k_i}} f(x_{k_i}^{s_{k_i}})} ≤ min{‖v‖²_{P_{k_i}^{s_{k_i}}} : v ∈ conv W_{k_i}^{s_{k_i}}} ≤ Λ δ_{k_i}.

Hence lim_{i→∞} ĝ_{k_i} = 0. Note that ĝ_{k_i} ∈ ∂_{ε_{k_i}} f(x_{k_i}^{s_{k_i}}) and ε_{k_i}, δ_{k_i} → 0; hence 0 ∈ ∂f(x*).

3. Nonsmooth BFGS algorithms on Riemannian manifolds

In this section we discuss nonsmooth BFGS methods on Riemannian manifolds. Let f be a smooth function defined on R^n and let P_k be a positive definite matrix approximating the Hessian of f. We know that p_k = −P_k grad f(x_k) is a descent direction. The approximation of the Hessian can be updated by the BFGS method when the computed step length satisfies the Wolfe conditions. Indeed, if s_k = x_{k+1} − x_k, y_k = grad f(x_{k+1}) − grad f(x_k) and α_k satisfies the Wolfe conditions, then we have the so-called secant inequality ⟨y_k, s_k⟩_2 > 0. Therefore, P_k can be updated by the BFGS method as follows:

    P_{k+1} := P_k + (y_k y_k^T) / ⟨s_k, y_k⟩_2 − (P_k s_k s_k^T P_k) / ⟨s_k, P_k s_k⟩_2.
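For reference, the Euclidean update just stated looks as follows in code (a toy sketch, ours); it preserves symmetry and, thanks to the secant inequality, positive definiteness.

```python
import numpy as np

def bfgs_update_euclidean(P, s, y):
    """Classical Euclidean BFGS update of the Hessian approximation P (sketch).

    Requires the secant inequality <y, s> > 0, which holds when the step
    satisfies the Wolfe conditions for a smooth f.
    """
    Ps = P @ s
    return P + np.outer(y, y) / np.dot(y, s) - np.outer(Ps, Ps) / np.dot(s, Ps)

# usage on a toy quadratic f(x) = 0.5 x^T A x, where y = A s
A = np.array([[2.0, 0.3], [0.3, 1.0]])
P = np.eye(2)
s = np.array([0.1, -0.2]); y = A @ s
print(bfgs_update_euclidean(P, s, y))
```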

The structure of the smooth BFGS algorithm on Riemannian manifolds is given in several papers; see [7, 26, 27]. Note that the classical update formulas for the approximation of the Hessian have no meaning on Riemannian manifolds. First,

    s_k := T_{x_k→R_{x_k}(α_kp_k)}(α_k p_k),   y_k := β_{α_kp_k}^{-1} grad f(x_{k+1}) − T_{x_k→R_{x_k}(α_kp_k)}(grad f(x_k))

are vectors in the tangent space T_{x_{k+1}}M. The inner product on tangent spaces is then given by the chosen Riemannian metric. Furthermore, the dyadic product of a vector with the transpose of another vector, which results in a matrix in the Euclidean space, is not a naturally defined operation on a Riemannian manifold. Moreover, while in Euclidean spaces the Hessian can be expressed as a symmetric matrix, on Riemannian manifolds it is defined as a symmetric bilinear form; however, one can define a linear map P_k: T_{x_k}M → T_{x_k}M by D²f(x_k)(η, ξ) := ⟨η, P_k ξ⟩ for η, ξ ∈ T_{x_k}M. Therefore, the approximation of the Hessian can be updated by the BFGS method as follows:

    P_{k+1} := P̃_k + (y_k y_k^♭) / (y_k^♭ s_k) − (P̃_k s_k (P̃_k s_k)^♭) / ((P̃_k s_k)^♭ s_k),

where P̃_k := T_{x_k→R_{x_k}(α_kp_k)} ∘ P_k ∘ T^{-1}_{x_k→R_{x_k}(α_kp_k)} and v^♭ denotes the flat of v, i.e., v^♭ w = ⟨v, w⟩.

Now we assume that f: M → R is a locally Lipschitz function, g_k := argmin_{v ∈ conv W} ‖v‖_{P_k} and p_k = −P_k g_k, where P_k is a positive definite matrix and conv W is an approximation of ∂_ε f(x_k). Let α_k be returned by Algorithm 5 and let ξ_k ∈ ∂f(R_{x_k}(α_kp_k)) be such that ⟨ξ_k, β_{α_kp_k}^{-1} T_{x_k→R_{x_k}(α_kp_k)}(p_k)⟩ + c_2 ‖g_k‖²_{P_k} ≥ 0. Then for all v ∈ conv W,

    ⟨β_{α_kp_k}^{-1} ξ_k − T_{x_k→R_{x_k}(α_kp_k)}(v), T_{x_k→R_{x_k}(α_kp_k)}(p_k)⟩ > 0.

This shows that if we update the approximation of the Hessian by the BFGS method as

    P_{k+1} := P̃_k + (y_k y_k^♭) / (y_k^♭ s_k) − (P̃_k s_k (P̃_k s_k)^♭) / ((P̃_k s_k)^♭ s_k),

where P̃_k := T_{x_k→R_{x_k}(α_kp_k)} ∘ P_k ∘ T^{-1}_{x_k→R_{x_k}(α_kp_k)} and

    s_k := T_{x_k→R_{x_k}(α_kp_k)}(α_k p_k),   y_k := β_{α_kp_k}^{-1} ξ_k − T_{x_k→R_{x_k}(α_kp_k)}(g_k),

provided that ⟨ξ_k, β_{α_kp_k}^{-1} T_{x_k→R_{x_k}(α_kp_k)}(p_k)⟩ + c_2 ‖g_k‖²_{P_k} ≥ 0, then the Hessian approximation P_{k+1} is symmetric positive definite.

It is worthwhile to mention that, to obtain global convergence of the minimization algorithm 7, the sequence of symmetric matrices {P_k^s} must satisfy

    0 < λ ≤ λ_min(P_k^s) ≤ λ_max(P_k^s) ≤ Λ < ∞    (3.1)

for 0 < λ < Λ < ∞ and all k, s. From a theoretical point of view it is difficult to guarantee (3.1); see [24, page 22]. But we can translate the bounds on the spectrum of P_k^s into conditions that only involve s_k and y_k, as follows:

    ⟨s_k, y_k⟩ / ⟨s_k, s_k⟩ ≥ λ,   ⟨y_k, y_k⟩ / ⟨y_k, s_k⟩ ≤ Λ.

This technique is used in [24, Theorem 8.5]; see also the algorithm in [32]. It is worthwhile to mention that, in practice, Algorithm 6 must be terminated after finitely many iterations. We therefore assume that, even if Algorithm 6 does not find a step length satisfying the Wolfe conditions, we can select a step length satisfying the Armijo condition and update P_k^{s+1} in Algorithm 8 by the identity matrix.
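A matrix-level sketch of this update, including the two safeguards that Algorithm 8 below uses to keep the spectrum of P within [λ, Λ], is given next (ours; it assumes orthonormal tangent-space bases, so that the matrix of an isometric transport satisfies T^{-1} = T^T).

```python
import numpy as np

def nonsmooth_bfgs_update(P, T_mat, s, y, lam, Lam):
    """Sketch of the P-update in Algorithm 8 (matrix representation in given tangent bases).

    P      : current matrix P_k^s on T_{x_k^s} M
    T_mat  : matrix of the vector transport from x_k^s to x_k^{s+1} (assumed isometric)
    s, y   : transported step and subgradient difference, already in T_{x_k^{s+1}} M
    lam,Lam: spectrum bounds from (3.1); 1/Lam bounds <s,y>/<y,y>, lam bounds <s,y>/<s,s>
    """
    # enforce <s, y>/<y, y> >= 1/Lam by the correction used in Algorithm 8
    s = s + max(0.0, 1.0 / Lam - np.dot(s, y) / np.dot(y, y)) * y
    if np.dot(s, y) / np.dot(s, s) < lam:
        return np.eye(P.shape[0])                    # reset to the identity
    P_tilde = T_mat @ P @ T_mat.T                    # \tilde P = T o P o T^{-1}; T^{-1} = T^T here
    Ps = P_tilde @ s
    return (P_tilde
            + np.outer(y, y) / np.dot(y, s)
            - np.outer(Ps, Ps) / np.dot(s, Ps))
```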

Algorithm 8 A nonsmooth BFGS algorithm on a Riemannian manifold; x = subRBFGS(f, x_1, θ_ε, θ_δ, ε_1, δ_1, c_1, c_2).
1: Require: A Riemannian manifold M, a locally Lipschitz function f: M → R, a retraction R from TM to M, the injectivity radius ι(M) > 0 and a vector transport T.
2: Input: A starting point x_1 ∈ M, c_1 ∈ (0, 1), c_2 ∈ (c_1, 1), θ_ε, θ_δ ∈ (0, 1), δ_1 ∈ (0, 1), ε_1 ∈ (0, ι(M)), k = 1, P_1 = I, a bound 1/Λ > 0 on ⟨y_k, s_k⟩/⟨y_k, y_k⟩ and a bound λ on ⟨s_k, y_k⟩/⟨s_k, s_k⟩.
3: Step 1. (Set new parameters) s = 1, x_k^s = x_k and P_k^s = P_k.
4: Step 2. (Descent direction) (g_k^s, p_k^s) = Descent(x_k^s, δ_k, c_1, ε_k, P_k^s).
5: if g_k^s = 0 then Stop.
6: end if
7: if ‖g_k^s‖² ≤ δ_k, then set ε_{k+1} = ε_k θ_ε, δ_{k+1} = δ_k θ_δ, x_{k+1} = x_k^s, P_{k+1} = P_k^s, k = k + 1. Go to Step 1.
8: else α = Line(x_k^s, p_k^s, g_k^s, P_k^s, c_1, c_2), construct the next iterate x_k^{s+1} = R_{x_k^s}(α p_k^s) and define

    s_k := T_{x_k^s→x_k^{s+1}}(α p_k^s),   y_k := β_{αp_k^s}^{-1} ξ_k − T_{x_k^s→x_k^{s+1}}(g_k^s),   s_k := s_k + max(0, 1/Λ − ⟨s_k, y_k⟩/⟨y_k, y_k⟩) y_k,

   where ξ_k is the subgradient obtained in the line search.
9: if ⟨s_k, y_k⟩/⟨s_k, s_k⟩ ≥ λ then update

    P_k^{s+1} := P̃_k^s + (y_k y_k^♭) / (y_k^♭ s_k) − (P̃_k^s s_k (P̃_k^s s_k)^♭) / ((P̃_k^s s_k)^♭ s_k).

10: else P_k^{s+1} := I.
11: end if
   Set s = s + 1 and go to Step 2.
12: end if

4. Experiments

The oriented bounding box problem [5], which aims to find a minimum-volume box containing n given points in d-dimensional space, is used to illustrate the performance of Algorithm 8. Suppose the points are given by a matrix E ∈ R^{d×n}, where each column represents the coordinates of a point. A cost function of volume is given by

    f: O_d → R : O ↦ V(OE) = ∏_{i=1}^d (e_{i,max} − e_{i,min}),

where O_d denotes the d-by-d orthogonal group, and e_{i,max} and e_{i,min} denote the maximum and minimum entries, respectively, of the i-th row of OE. If more than one entry of some row attains the maximum or minimum value for a given O, then the cost function f is not differentiable at O. Otherwise, f is differentiable and its gradient is grad f(O) = P_O(T E^T), where T ∈ R^{d×n} has entries, for i = 1, ..., d,

    T_{ij} = w / (e_{i,max} − e_{i,min})   if j is the column of e_{i,max},
    T_{ij} = −w / (e_{i,max} − e_{i,min})  if j is the column of e_{i,min},
    T_{ij} = 0                              otherwise,

with w = f(O), and P_O(M) = M − O(O^T M + M^T O)/2. The qf retraction R_X(η_X) = qf(X + η_X) is used, where qf(M) denotes the Q factor of the QR decomposition with nonnegative elements on the diagonal of R. The vector transport by parallelization [7] is isometric and essentially the identity; we modify it by the approach in [6, Section 4.2] and use the resulting vector transport satisfying the locking condition. To the best of our knowledge, it is unknown how large the injectivity radius of this retraction is. In practice, the vector transport can be represented by a matrix; therefore we always use the inverse of that matrix as the inverse of the vector transport.

Algorithm 8 is compared with the Riemannian gradient sampling method (see [5, Section 7.2] or [3]) and the modified Riemannian BFGS method (see [5, Section 7.3]), which is a Riemannian generalization of [20]. The main difference between the Riemannian gradient sampling (RGS) method, the modified Riemannian BFGS method, and Algorithm 7 is the search direction.
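The cost, its Riemannian gradient at differentiable points, and the qf retraction described above translate almost line by line into the following sketch (ours; it assumes generic data, so that the row-wise maxima and minima are attained at unique columns and the diagonal of R is nonzero).

```python
import numpy as np

def obb_cost(O, E):
    """Volume of the axis-aligned box enclosing the rotated points OE."""
    OE = O @ E
    return np.prod(OE.max(axis=1) - OE.min(axis=1))

def obb_grad(O, E):
    """Riemannian gradient of the cost on the orthogonal group, at a differentiable point."""
    OE = O @ E
    widths = OE.max(axis=1) - OE.min(axis=1)
    w = np.prod(widths)
    T = np.zeros_like(OE)
    rows = np.arange(OE.shape[0])
    T[rows, OE.argmax(axis=1)] = w / widths           # +w/(e_max - e_min) at the max column
    T[rows, OE.argmin(axis=1)] = -w / widths          # -w/(e_max - e_min) at the min column
    G = T @ E.T                                       # Euclidean gradient with respect to O
    return G - O @ (O.T @ G + G.T @ O) / 2.0          # projection P_O onto the tangent space

def qf_retraction(X, eta):
    """qf retraction: Q factor of the QR decomposition of X + eta, diag(R) made nonnegative."""
    Q, R = np.linalg.qr(X + eta)
    return Q * np.sign(np.diag(R))                    # flip column signs; assumes diag(R) != 0
```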

Specifically, the search direction η_k in RGS at x_k is computed as follows: i) randomly generate m points in a small enough neighborhood of x_k; ii) transport the gradients at those m points to the tangent space at x_k; iii) compute the shortest tangent vector in the convex hull of the resulting tangent vectors and the gradient at x_k; and iv) set η_k to be the negative of this shortest vector. Note that the number of points, m, is required to be larger than the dimension of the domain. The modified Riemannian BFGS method makes the assumption that the cost function is differentiable at all iterates; it follows that the search direction is the same as in the Riemannian BFGS method for smooth cost functions [6]. However, the stopping criterion must be modified for nonsmooth cost functions. Specifically, let G_k be defined as follows:

    j_k = 1,          G_k = {g_k}                                    if ‖R_{x_k}^{-1}(x_{k−1})‖ > ε;
    j_k = j_{k−1} + 1, G_k = {g_{k−j_k+1}^{(k)}, ..., g_{k−1}^{(k)}, g_k^{(k)}}   if ‖R_{x_k}^{-1}(x_{k−1})‖ ≤ ε and j_{k−1} < J;
    j_k = J,          G_k = {g_{k−J+1}^{(k)}, ..., g_k^{(k)}}        if ‖R_{x_k}^{-1}(x_{k−1})‖ ≤ ε and j_{k−1} ≥ J;

where g_i^{(j)} = T_{x_i→x_j}(g_i), and ε > 0 and the positive integer J are given parameters. J also needs to be larger than the dimension of the domain. The modified Riemannian BFGS method stops if the length of the shortest vector in the convex hull of G_k is less than δ.

The tested algorithms stop if one of the following conditions is satisfied: the number of iterations reaches 5000; the step size is less than the machine epsilon; ε_k ≤ 10^{-6} and δ_k ≤ 10^{-12}. We say that an algorithm terminates successfully if it is stopped by the last condition. Note that an unsuccessfully terminated algorithm does not imply that the last iterate is far from a stationary point; it may also mean that the stopping criterion is not robust. The following parameters are used for Algorithm 8: ε_1 = 10^{-4}, δ_1 = 10^{-8}, θ_ε = 10^{-2}, θ_δ = 10^{-4}, λ = 10^{-4}, Λ = 10^{4}, c_1 = 10^{-4}, and c_2 ∈ (c_1, 1). The ε and J in the modified Riemannian BFGS method are set to 10^{-6} and 2 dim, respectively, where dim = d(d−1)/2 is the dimension of the domain. Multiple values of the parameter m in RGS are tested and given in the captions of Figures 1 and 2. The initial iterate is obtained by orthonormalizing a matrix whose entries are drawn from a standard normal distribution. The code is written in C++ and is available online. All experiments are performed on a 64-bit Ubuntu platform with a 3.6 GHz CPU (Intel Core i7-4790).

The three algorithms are tested with d = 3, 4, .... For each value of d, we use 100 random runs, and the three algorithms use the same 100 random seeds. Figure 1 reports the percentage of successful runs of each algorithm. The success rate of RGS depends strongly on the parameter m: the larger m is, the higher the success rate. Algorithm 8 always terminates successfully, which means that Algorithm 8 is more robust than all the other methods. The average number of function and gradient evaluations over the successful runs and the average computational time of the successful runs are reported in Figure 2. Among the successful tests, the modified Riemannian BFGS method needs the fewest function and gradient evaluations, due to its simple approach, while Algorithm 8 needs the second fewest. For the bounding box problem, the larger the dimension of the domain, the cheaper the function and gradient evaluations become compared to solving the quadratic programming problem, i.e., finding the shortest vector in the convex hull of a set of vectors. Therefore, as shown in Figure 2, even though the numbers of function and gradient evaluations differ for large d, the computational times are not significantly different.
However, Algorithm 8 is still always slightly faster than all the other methods for all values of d. In conclusion, the experiments show that the proposed method, Algorithm 8, is more robust and faster than RGS and the modified Riemannian BFGS method in terms of success rate and computational time.

5. Acknowledgements

We thank Pierre-Antoine Absil at Université catholique de Louvain for his helpful comments. This paper presents research results of the Belgian Network DYSCO (Dynamical Systems, Control, and Optimization), funded by the Interuniversity Attraction Poles Programme initiated by the Belgian Science Policy Office. This work was supported by FNRS under grant PDR T.


More information

Characterization of lower semicontinuous convex functions on Riemannian manifolds

Characterization of lower semicontinuous convex functions on Riemannian manifolds Wegelerstraße 6 53115 Bonn Germany phone +49 228 73-3427 fax +49 228 73-7527 www.ins.uni-bonn.de S. Hosseini Characterization of lower semicontinuous convex functions on Riemannian manifolds INS Preprint

More information

The nonsmooth Newton method on Riemannian manifolds

The nonsmooth Newton method on Riemannian manifolds The nonsmooth Newton method on Riemannian manifolds C. Lageman, U. Helmke, J.H. Manton 1 Introduction Solving nonlinear equations in Euclidean space is a frequently occurring problem in optimization and

More information

FUNCTIONAL ANALYSIS LECTURE NOTES: COMPACT SETS AND FINITE-DIMENSIONAL SPACES. 1. Compact Sets

FUNCTIONAL ANALYSIS LECTURE NOTES: COMPACT SETS AND FINITE-DIMENSIONAL SPACES. 1. Compact Sets FUNCTIONAL ANALYSIS LECTURE NOTES: COMPACT SETS AND FINITE-DIMENSIONAL SPACES CHRISTOPHER HEIL 1. Compact Sets Definition 1.1 (Compact and Totally Bounded Sets). Let X be a metric space, and let E X be

More information

Subdifferential representation of convex functions: refinements and applications

Subdifferential representation of convex functions: refinements and applications Subdifferential representation of convex functions: refinements and applications Joël Benoist & Aris Daniilidis Abstract Every lower semicontinuous convex function can be represented through its subdifferential

More information

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä New Proximal Bundle Method for Nonsmooth DC Optimization TUCS Technical Report No 1130, February 2015 New Proximal Bundle Method for Nonsmooth

More information

Analysis in weighted spaces : preliminary version

Analysis in weighted spaces : preliminary version Analysis in weighted spaces : preliminary version Frank Pacard To cite this version: Frank Pacard. Analysis in weighted spaces : preliminary version. 3rd cycle. Téhéran (Iran, 2006, pp.75.

More information

Douglas-Rachford splitting for nonconvex feasibility problems

Douglas-Rachford splitting for nonconvex feasibility problems Douglas-Rachford splitting for nonconvex feasibility problems Guoyin Li Ting Kei Pong Jan 3, 015 Abstract We adapt the Douglas-Rachford DR) splitting method to solve nonconvex feasibility problems by studying

More information

Course Summary Math 211

Course Summary Math 211 Course Summary Math 211 table of contents I. Functions of several variables. II. R n. III. Derivatives. IV. Taylor s Theorem. V. Differential Geometry. VI. Applications. 1. Best affine approximations.

More information

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This

More information

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS MATHEMATICS OF OPERATIONS RESEARCH Vol. 28, No. 4, November 2003, pp. 677 692 Printed in U.S.A. ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS ALEXANDER SHAPIRO We discuss in this paper a class of nonsmooth

More information

(convex combination!). Use convexity of f and multiply by the common denominator to get. Interchanging the role of x and y, we obtain that f is ( 2M ε

(convex combination!). Use convexity of f and multiply by the common denominator to get. Interchanging the role of x and y, we obtain that f is ( 2M ε 1. Continuity of convex functions in normed spaces In this chapter, we consider continuity properties of real-valued convex functions defined on open convex sets in normed spaces. Recall that every infinitedimensional

More information

Variational inequalities for set-valued vector fields on Riemannian manifolds

Variational inequalities for set-valued vector fields on Riemannian manifolds Variational inequalities for set-valued vector fields on Riemannian manifolds Chong LI Department of Mathematics Zhejiang University Joint with Jen-Chih YAO Chong LI (Zhejiang University) VI on RM 1 /

More information

A derivative-free nonmonotone line search and its application to the spectral residual method

A derivative-free nonmonotone line search and its application to the spectral residual method IMA Journal of Numerical Analysis (2009) 29, 814 825 doi:10.1093/imanum/drn019 Advance Access publication on November 14, 2008 A derivative-free nonmonotone line search and its application to the spectral

More information

Elements of Linear Algebra, Topology, and Calculus

Elements of Linear Algebra, Topology, and Calculus Appendix A Elements of Linear Algebra, Topology, and Calculus A.1 LINEAR ALGEBRA We follow the usual conventions of matrix computations. R n p is the set of all n p real matrices (m rows and p columns).

More information

Chapter 3. Riemannian Manifolds - I. The subject of this thesis is to extend the combinatorial curve reconstruction approach to curves

Chapter 3. Riemannian Manifolds - I. The subject of this thesis is to extend the combinatorial curve reconstruction approach to curves Chapter 3 Riemannian Manifolds - I The subject of this thesis is to extend the combinatorial curve reconstruction approach to curves embedded in Riemannian manifolds. A Riemannian manifold is an abstraction

More information

Chapter 2 Convex Analysis

Chapter 2 Convex Analysis Chapter 2 Convex Analysis The theory of nonsmooth analysis is based on convex analysis. Thus, we start this chapter by giving basic concepts and results of convexity (for further readings see also [202,

More information

Topological properties

Topological properties CHAPTER 4 Topological properties 1. Connectedness Definitions and examples Basic properties Connected components Connected versus path connected, again 2. Compactness Definition and first examples Topological

More information

Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations

Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations Oleg Burdakov a,, Ahmad Kamandi b a Department of Mathematics, Linköping University,

More information

WE consider an undirected, connected network of n

WE consider an undirected, connected network of n On Nonconvex Decentralized Gradient Descent Jinshan Zeng and Wotao Yin Abstract Consensus optimization has received considerable attention in recent years. A number of decentralized algorithms have been

More information

A projection-type method for generalized variational inequalities with dual solutions

A projection-type method for generalized variational inequalities with dual solutions Available online at www.isr-publications.com/jnsa J. Nonlinear Sci. Appl., 10 (2017), 4812 4821 Research Article Journal Homepage: www.tjnsa.com - www.isr-publications.com/jnsa A projection-type method

More information

ON FRACTAL DIMENSION OF INVARIANT SETS

ON FRACTAL DIMENSION OF INVARIANT SETS ON FRACTAL DIMENSION OF INVARIANT SETS R. MIRZAIE We give an upper bound for the box dimension of an invariant set of a differentiable function f : U M. Here U is an open subset of a Riemannian manifold

More information

THEOREMS, ETC., FOR MATH 515

THEOREMS, ETC., FOR MATH 515 THEOREMS, ETC., FOR MATH 515 Proposition 1 (=comment on page 17). If A is an algebra, then any finite union or finite intersection of sets in A is also in A. Proposition 2 (=Proposition 1.1). For every

More information

An introduction to Mathematical Theory of Control

An introduction to Mathematical Theory of Control An introduction to Mathematical Theory of Control Vasile Staicu University of Aveiro UNICA, May 2018 Vasile Staicu (University of Aveiro) An introduction to Mathematical Theory of Control UNICA, May 2018

More information

Maria Cameron. f(x) = 1 n

Maria Cameron. f(x) = 1 n Maria Cameron 1. Local algorithms for solving nonlinear equations Here we discuss local methods for nonlinear equations r(x) =. These methods are Newton, inexact Newton and quasi-newton. We will show that

More information

LECTURE 16: CONJUGATE AND CUT POINTS

LECTURE 16: CONJUGATE AND CUT POINTS LECTURE 16: CONJUGATE AND CUT POINTS 1. Conjugate Points Let (M, g) be Riemannian and γ : [a, b] M a geodesic. Then by definition, exp p ((t a) γ(a)) = γ(t). We know that exp p is a diffeomorphism near

More information

Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles

Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Arkadi Nemirovski H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology Joint research

More information

Some Background Material

Some Background Material Chapter 1 Some Background Material In the first chapter, we present a quick review of elementary - but important - material as a way of dipping our toes in the water. This chapter also introduces important

More information

ALEXANDER LYTCHAK & VIKTOR SCHROEDER

ALEXANDER LYTCHAK & VIKTOR SCHROEDER AFFINE FUNCTIONS ON CAT (κ)-spaces ALEXANDER LYTCHAK & VIKTOR SCHROEDER Abstract. We describe affine functions on spaces with an upper curvature bound. 1. introduction A map f : X Y between geodesic metric

More information

The Hopf argument. Yves Coudene. IRMAR, Université Rennes 1, campus beaulieu, bat Rennes cedex, France

The Hopf argument. Yves Coudene. IRMAR, Université Rennes 1, campus beaulieu, bat Rennes cedex, France The Hopf argument Yves Coudene IRMAR, Université Rennes, campus beaulieu, bat.23 35042 Rennes cedex, France yves.coudene@univ-rennes.fr slightly updated from the published version in Journal of Modern

More information

fy (X(g)) Y (f)x(g) gy (X(f)) Y (g)x(f)) = fx(y (g)) + gx(y (f)) fy (X(g)) gy (X(f))

fy (X(g)) Y (f)x(g) gy (X(f)) Y (g)x(f)) = fx(y (g)) + gx(y (f)) fy (X(g)) gy (X(f)) 1. Basic algebra of vector fields Let V be a finite dimensional vector space over R. Recall that V = {L : V R} is defined to be the set of all linear maps to R. V is isomorphic to V, but there is no canonical

More information

PARALLEL SUBGRADIENT METHOD FOR NONSMOOTH CONVEX OPTIMIZATION WITH A SIMPLE CONSTRAINT

PARALLEL SUBGRADIENT METHOD FOR NONSMOOTH CONVEX OPTIMIZATION WITH A SIMPLE CONSTRAINT Linear and Nonlinear Analysis Volume 1, Number 1, 2015, 1 PARALLEL SUBGRADIENT METHOD FOR NONSMOOTH CONVEX OPTIMIZATION WITH A SIMPLE CONSTRAINT KAZUHIRO HISHINUMA AND HIDEAKI IIDUKA Abstract. In this

More information

Chapter 2 Metric Spaces

Chapter 2 Metric Spaces Chapter 2 Metric Spaces The purpose of this chapter is to present a summary of some basic properties of metric and topological spaces that play an important role in the main body of the book. 2.1 Metrics

More information

Monotone Point-to-Set Vector Fields

Monotone Point-to-Set Vector Fields Monotone Point-to-Set Vector Fields J.X. da Cruz Neto, O.P.Ferreira and L.R. Lucambio Pérez Dedicated to Prof.Dr. Constantin UDRIŞTE on the occasion of his sixtieth birthday Abstract We introduce the concept

More information

R-Linear Convergence of Limited Memory Steepest Descent

R-Linear Convergence of Limited Memory Steepest Descent R-Linear Convergence of Limited Memory Steepest Descent Fran E. Curtis and Wei Guo Department of Industrial and Systems Engineering, Lehigh University, USA COR@L Technical Report 16T-010 R-Linear Convergence

More information

MATH 51H Section 4. October 16, Recall what it means for a function between metric spaces to be continuous:

MATH 51H Section 4. October 16, Recall what it means for a function between metric spaces to be continuous: MATH 51H Section 4 October 16, 2015 1 Continuity Recall what it means for a function between metric spaces to be continuous: Definition. Let (X, d X ), (Y, d Y ) be metric spaces. A function f : X Y is

More information

Local strong convexity and local Lipschitz continuity of the gradient of convex functions

Local strong convexity and local Lipschitz continuity of the gradient of convex functions Local strong convexity and local Lipschitz continuity of the gradient of convex functions R. Goebel and R.T. Rockafellar May 23, 2007 Abstract. Given a pair of convex conjugate functions f and f, we investigate

More information

A generic property of families of Lagrangian systems

A generic property of families of Lagrangian systems Annals of Mathematics, 167 (2008), 1099 1108 A generic property of families of Lagrangian systems By Patrick Bernard and Gonzalo Contreras * Abstract We prove that a generic Lagrangian has finitely many

More information

Generalized monotonicity and convexity for locally Lipschitz functions on Hadamard manifolds

Generalized monotonicity and convexity for locally Lipschitz functions on Hadamard manifolds Generalized monotonicity and convexity for locally Lipschitz functions on Hadamard manifolds A. Barani 1 2 3 4 5 6 7 Abstract. In this paper we establish connections between some concepts of generalized

More information

Math 6455 Nov 1, Differential Geometry I Fall 2006, Georgia Tech

Math 6455 Nov 1, Differential Geometry I Fall 2006, Georgia Tech Math 6455 Nov 1, 26 1 Differential Geometry I Fall 26, Georgia Tech Lecture Notes 14 Connections Suppose that we have a vector field X on a Riemannian manifold M. How can we measure how much X is changing

More information

THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS

THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS RALPH HOWARD DEPARTMENT OF MATHEMATICS UNIVERSITY OF SOUTH CAROLINA COLUMBIA, S.C. 29208, USA HOWARD@MATH.SC.EDU Abstract. This is an edited version of a

More information

Notes on Numerical Optimization

Notes on Numerical Optimization Notes on Numerical Optimization University of Chicago, 2014 Viva Patel October 18, 2014 1 Contents Contents 2 List of Algorithms 4 I Fundamentals of Optimization 5 1 Overview of Numerical Optimization

More information

On John type ellipsoids

On John type ellipsoids On John type ellipsoids B. Klartag Tel Aviv University Abstract Given an arbitrary convex symmetric body K R n, we construct a natural and non-trivial continuous map u K which associates ellipsoids to

More information

A Distributed Newton Method for Network Utility Maximization, I: Algorithm

A Distributed Newton Method for Network Utility Maximization, I: Algorithm A Distributed Newton Method for Networ Utility Maximization, I: Algorithm Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract Most existing wors use dual decomposition and first-order

More information

The Proximal Gradient Method

The Proximal Gradient Method Chapter 10 The Proximal Gradient Method Underlying Space: In this chapter, with the exception of Section 10.9, E is a Euclidean space, meaning a finite dimensional space endowed with an inner product,

More information

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss

More information

MA651 Topology. Lecture 10. Metric Spaces.

MA651 Topology. Lecture 10. Metric Spaces. MA65 Topology. Lecture 0. Metric Spaces. This text is based on the following books: Topology by James Dugundgji Fundamental concepts of topology by Peter O Neil Linear Algebra and Analysis by Marc Zamansky

More information

for all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true

for all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true 3 ohn Nirenberg inequality, Part I A function ϕ L () belongs to the space BMO() if sup ϕ(s) ϕ I I I < for all subintervals I If the same is true for the dyadic subintervals I D only, we will write ϕ BMO

More information

Convexity in R n. The following lemma will be needed in a while. Lemma 1 Let x E, u R n. If τ I(x, u), τ 0, define. f(x + τu) f(x). τ.

Convexity in R n. The following lemma will be needed in a while. Lemma 1 Let x E, u R n. If τ I(x, u), τ 0, define. f(x + τu) f(x). τ. Convexity in R n Let E be a convex subset of R n. A function f : E (, ] is convex iff f(tx + (1 t)y) (1 t)f(x) + tf(y) x, y E, t [0, 1]. A similar definition holds in any vector space. A topology is needed

More information

Course 212: Academic Year Section 1: Metric Spaces

Course 212: Academic Year Section 1: Metric Spaces Course 212: Academic Year 1991-2 Section 1: Metric Spaces D. R. Wilkins Contents 1 Metric Spaces 3 1.1 Distance Functions and Metric Spaces............. 3 1.2 Convergence and Continuity in Metric Spaces.........

More information

A New Sequential Optimality Condition for Constrained Nonsmooth Optimization

A New Sequential Optimality Condition for Constrained Nonsmooth Optimization A New Sequential Optimality Condition for Constrained Nonsmooth Optimization Elias Salomão Helou Sandra A. Santos Lucas E. A. Simões November 23, 2018 Abstract We introduce a sequential optimality condition

More information

Unconstrained optimization I Gradient-type methods

Unconstrained optimization I Gradient-type methods Unconstrained optimization I Gradient-type methods Antonio Frangioni Department of Computer Science University of Pisa www.di.unipi.it/~frangio frangio@di.unipi.it Computational Mathematics for Learning

More information

Newton-like method with diagonal correction for distributed optimization

Newton-like method with diagonal correction for distributed optimization Newton-lie method with diagonal correction for distributed optimization Dragana Bajović Dušan Jaovetić Nataša Krejić Nataša Krlec Jerinić August 15, 2015 Abstract We consider distributed optimization problems

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Generalized Uniformly Optimal Methods for Nonlinear Programming

Generalized Uniformly Optimal Methods for Nonlinear Programming Generalized Uniformly Optimal Methods for Nonlinear Programming Saeed Ghadimi Guanghui Lan Hongchao Zhang Janumary 14, 2017 Abstract In this paper, we present a generic framewor to extend existing uniformly

More information

A strongly polynomial algorithm for linear systems having a binary solution

A strongly polynomial algorithm for linear systems having a binary solution A strongly polynomial algorithm for linear systems having a binary solution Sergei Chubanov Institute of Information Systems at the University of Siegen, Germany e-mail: sergei.chubanov@uni-siegen.de 7th

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

A Quasi-Newton Algorithm for Nonconvex, Nonsmooth Optimization with Global Convergence Guarantees

A Quasi-Newton Algorithm for Nonconvex, Nonsmooth Optimization with Global Convergence Guarantees Noname manuscript No. (will be inserted by the editor) A Quasi-Newton Algorithm for Nonconvex, Nonsmooth Optimization with Global Convergence Guarantees Frank E. Curtis Xiaocun Que May 26, 2014 Abstract

More information

Lebesgue Measure on R n

Lebesgue Measure on R n 8 CHAPTER 2 Lebesgue Measure on R n Our goal is to construct a notion of the volume, or Lebesgue measure, of rather general subsets of R n that reduces to the usual volume of elementary geometrical sets

More information

Integral Jensen inequality

Integral Jensen inequality Integral Jensen inequality Let us consider a convex set R d, and a convex function f : (, + ]. For any x,..., x n and λ,..., λ n with n λ i =, we have () f( n λ ix i ) n λ if(x i ). For a R d, let δ a

More information

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University February 7, 2007 2 Contents 1 Metric Spaces 1 1.1 Basic definitions...........................

More information

arxiv: v1 [math.na] 6 Jan 2010

arxiv: v1 [math.na] 6 Jan 2010 LEVEL SET METHODS FOR FINDING SADDLE POINTS OF GENERAL MORSE INDEX C.H. JEFFREY PANG arxiv:00.0925v [math.na] 6 Jan 200 Abstract. For a function f : X R, a point is critical if its derivatives are zero,

More information

Division of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45

Division of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45 Division of the Humanities and Social Sciences Supergradients KC Border Fall 2001 1 The supergradient of a concave function There is a useful way to characterize the concavity of differentiable functions.

More information

DEVELOPMENT OF MORSE THEORY

DEVELOPMENT OF MORSE THEORY DEVELOPMENT OF MORSE THEORY MATTHEW STEED Abstract. In this paper, we develop Morse theory, which allows us to determine topological information about manifolds using certain real-valued functions defined

More information

Problem: A class of dynamical systems characterized by a fast divergence of the orbits. A paradigmatic example: the Arnold cat.

Problem: A class of dynamical systems characterized by a fast divergence of the orbits. A paradigmatic example: the Arnold cat. À È Ê ÇÄÁ Ë ËÌ ÅË Problem: A class of dynamical systems characterized by a fast divergence of the orbits A paradigmatic example: the Arnold cat. The closure of a homoclinic orbit. The shadowing lemma.

More information

Iteration-complexity of first-order penalty methods for convex programming

Iteration-complexity of first-order penalty methods for convex programming Iteration-complexity of first-order penalty methods for convex programming Guanghui Lan Renato D.C. Monteiro July 24, 2008 Abstract This paper considers a special but broad class of convex programing CP)

More information

Tangent spaces, normals and extrema

Tangent spaces, normals and extrema Chapter 3 Tangent spaces, normals and extrema If S is a surface in 3-space, with a point a S where S looks smooth, i.e., without any fold or cusp or self-crossing, we can intuitively define the tangent

More information

i=1 β i,i.e. = β 1 x β x β 1 1 xβ d

i=1 β i,i.e. = β 1 x β x β 1 1 xβ d 66 2. Every family of seminorms on a vector space containing a norm induces ahausdorff locally convex topology. 3. Given an open subset Ω of R d with the euclidean topology, the space C(Ω) of real valued

More information

Optimization methods on Riemannian manifolds and their application to shape space

Optimization methods on Riemannian manifolds and their application to shape space Optimization methods on Riemannian manifolds and their application to shape space Wolfgang Ring Benedit Wirth Abstract We extend the scope of analysis for linesearch optimization algorithms on (possibly

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

More information

P-adic Functions - Part 1

P-adic Functions - Part 1 P-adic Functions - Part 1 Nicolae Ciocan 22.11.2011 1 Locally constant functions Motivation: Another big difference between p-adic analysis and real analysis is the existence of nontrivial locally constant

More information

17 Solution of Nonlinear Systems

17 Solution of Nonlinear Systems 17 Solution of Nonlinear Systems We now discuss the solution of systems of nonlinear equations. An important ingredient will be the multivariate Taylor theorem. Theorem 17.1 Let D = {x 1, x 2,..., x m

More information

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005 3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider

More information