A Riemannian symmetric rank-one trust-region method


Tech. report UCL-INMA v2

Wen Huang, P.-A. Absil, K. A. Gallivan

January 15, 2014

Abstract

The well-known symmetric rank-one trust-region method, in which the Hessian approximation is generated by the symmetric rank-one update, is generalized to the problem of minimizing a real-valued function over a d-dimensional Riemannian manifold. The generalization relies on basic differential-geometric concepts, such as tangent spaces, Riemannian metrics, and the Riemannian gradient, as well as on the more recent notions of (first-order) retraction and vector transport. The new method, called RTR-SR1, is shown to converge globally and (d+1)-step q-superlinearly to stationary points of the objective function. A limited-memory version, referred to as LRTR-SR1, is also introduced. In this context, novel efficient strategies are presented to construct a vector transport on a submanifold of a Euclidean space. Numerical experiments (Rayleigh quotient minimization on the sphere and a joint diagonalization problem on the Stiefel manifold) illustrate the value of the new methods.

Key words: Riemannian optimization; optimization on manifolds; symmetric rank-one update; Rayleigh quotient; joint diagonalization; Stiefel manifold

2010 Mathematics Subject Classification: 65K05, 90C48, 90C53

1 Introduction

We consider the problem

$\min_{x \in \mathcal{M}} f(x)$   (1)

of minimizing a smooth real-valued function $f$ defined on a Riemannian manifold $\mathcal{M}$. Recently investigated application areas include image segmentation [RW12] and recognition [TVSC11], electrostatics and electronic structure calculation [WY12], finance and chemistry [Bor12], multilinear algebra [SL10, IAVD11], low-rank learning [MMBS11, BA11], and blind source separation [KS12, SAGQ12].

This paper presents research results of the Belgian Network DYSCO (Dynamical Systems, Control, and Optimization), funded by the Interuniversity Attraction Poles Programme initiated by the Belgian Science Policy Office. This work was financially supported by the Belgian FRFC (Fonds de la Recherche Fondamentale Collective). This work was performed in part while the third author was a Visiting Professor at the Institut de mathématiques pures et appliquées (MAPA) at Université catholique de Louvain.

Department of Mathematics, 208 Love Building, 1017 Academic Way, Florida State University, Tallahassee, FL, USA

Department of Mathematical Engineering, ICTEAM Institute, Université catholique de Louvain, B-1348 Louvain-la-Neuve, Belgium

Corresponding author. absil@inma.ucl.ac.be.

The wealth of applications has stimulated the development of general-purpose methods for (1); see, e.g., [AMS08, RW12, SI13] and references therein, including the trust-region approach upon which we focus in this work. A well-known technique in optimization [CGT00], the trust-region method was extended to Riemannian manifolds in [ABG07] (or see [AMS08, Ch. 7]), and found applications, e.g., in [JBAS10, VV10, IAVD11, MMBS11, BA11]. Trust-region methods construct a quadratic model $m_k$ of the objective function $f$ around the current iterate $x_k$ and produce a candidate new iterate by (approximately) minimizing the model $m_k$ within a region where it is trusted. Depending on the discrepancy between $f$ and $m_k$ at the candidate new iterate, the size of the trust region is updated and the candidate new iterate is accepted or rejected.

For lack of efficient techniques to produce a second-order term in $m_k$ that is inexact but nevertheless guarantees superlinear convergence, the Riemannian trust-region (RTR) framework loses some of its appeal when the exact second-order term, the Hessian of $f$, is not available. This is in contrast with the Euclidean case, where several strategies exist to build an inexact second-order term that preserves superlinear convergence of the trust-region method. Among these strategies, the symmetric rank-one (SR1) update is favored in view of its simplicity and because it preserves symmetry without unnecessarily enforcing positive definiteness; see, e.g., [NW06, Sec. 6.2] for a more detailed discussion. The (n+1)-step q-superlinear rate of convergence of the SR1 trust-region method was shown by Byrd et al. [BKS96] using a sophisticated analysis that builds on [CGT91, KBS93].

The classical (Euclidean) SR1 trust-region method can also be viewed as a quasi-Newton method, enhanced with a trust-region globalization strategy. The idea of quasi-Newton methods on manifolds is not new [Gab82, Sec. 4.5]; however, most of the literature of which we are aware restricts consideration to generalizing the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton method combined with a line-search strategy. Early work such as [BM06, SL10] used Riemannian BFGS methods for a specific application without an analysis of convergence properties, but more recently, systematic analyses of Riemannian BFGS methods based on the framework of retraction and vector transport developed in [ADM02, AMS08] have been made. Qi [Qi11] analyzed a version of Riemannian BFGS methods with retraction and vector transport restricted to the exponential mapping and parallel translation and showed superlinear convergence using a Riemannian Dennis-Moré condition. Ring and Wirth [RW12] proposed and analyzed a Riemannian BFGS that avoids the restrictions on retraction and vector transport assumed by Qi but that needs to resort to the derivative of the retraction. Seibert et al. [SKH13] discussed the freedom available when generalizing BFGS to Riemannian manifolds and analyzed one generalization of the BFGS method on Riemannian manifolds that are isometric to $\mathbb{R}^n$. Most recently, Huang [Hua13] developed a complete convergence theory that avoids the restrictions of Qi, Ring and Wirth, guarantees superlinear convergence for the Riemannian Broyden family of quasi-Newton methods (including a version of SR1), and facilitates efficient implementation.

In this paper, motivated by the situation described above, we introduce a generalization of the classical (i.e., Euclidean) SR1 trust-region method to the Riemannian setting (1).
Besides making use of basic Riemannian geometric concepts (tangent space, Riemannian metric, gradient), the new method, called RTR-SR1, relies on the notions of retraction and vector transport introduced in [ADM02, AMS08]. A detailed global and local convergence analysis is given. A limited-memory version of RTR-SR1, referred to as LRTR-SR1, is also introduced. Numerical experiments show that the RTR-SR1 method displays the expected convergence properties. When the Hessian of $f$ is not available, RTR-SR1 thus offers an attractive way of tackling (1) by a trust-region approach. Moreover, even when the Hessian of $f$ is available, making use of it can be expensive computationally, and the numerical experiments show that ignoring the Hessian information and resorting instead to the RTR-SR1 approach can be beneficial.

Another contribution of this paper with respect to [BKS96] is an extension of the analysis to allow for inexact solutions of the trust-region subproblem (compare (10) with [BKS96, (2.4)]). This extension makes it possible to resort to inner iterations such as the Steihaug-Toint truncated CG method (see [AMS08, Sec. 7.3.2] for its Riemannian extension) while staying within the assumptions of the convergence analysis.

The paper is organized as follows. The RTR-SR1 method is stated and discussed in Section 2. The convergence analysis is carried out in Section 3. The limited-memory version is introduced in Section 4. Numerical experiments are reported in Section 5. Conclusions are drawn in Section 6.

2 The Riemannian SR1 trust-region method

The proposed Riemannian SR1 trust-region (RTR-SR1) method is described in Algorithm 1. The algorithm statement is commented in Section 2.1, and the important questions of representing tangent vectors and choosing the vector transport are discussed in Sections 2.2 and 2.3.

2.1 A guide to Algorithm 1

Algorithm 1 can be viewed as a Riemannian version of the classical (Euclidean) SR1 trust-region method (see, e.g., [NW06, Algorithm 6.2]). It can also be viewed as an SR1 version of the Riemannian trust-region framework [AMS08, Algorithm 10, p. 142]. Therefore, several pieces of information given in [AMS08, Ch. 7] remain relevant for Algorithm 1. In particular, the algorithm statement makes use of standard Riemannian concepts that are described, e.g., in [O'N83, AMS08], such as the tangent space $T_x\mathcal{M}$ to the manifold $\mathcal{M}$ at a point $x$, a Riemannian metric $g$, and the gradient $\mathrm{grad}\,f$ of a real-valued function $f$ on $\mathcal{M}$.

The algorithm statement also relies on the notion of retraction, introduced in [ADM02] (or see [AMS08, Sec. 4.1]). A retraction $R$ on $\mathcal{M}$ is a smooth map from the tangent bundle $T\mathcal{M}$ (i.e., the set of all tangent vectors to $\mathcal{M}$) onto $\mathcal{M}$ such that, for all $x \in \mathcal{M}$ and all $\xi_x \in T_x\mathcal{M}$, the curve $t \mapsto R(t\xi_x)$ is tangent to $\xi_x$ at $t = 0$. We let $R_x$ denote the restriction of $R$ to $T_x\mathcal{M}$. The domain of $R$ need not be the entire tangent bundle, but this is usually the case in practice, and in this work we assume throughout that $R$ is defined wherever needed. Specific ways of constructing retractions are proposed in [ADM02, AMS08, AM12]; see also [WY12, JD13] for the important case of the Stiefel manifold.

Within the Riemannian trust-region framework, the characterizing aspect of Algorithm 1 lies in the update mechanism for the Hessian approximation $B_k$. The proposed update mechanism, based on formula (3) and on Step 6 of Algorithm 1, is a rather straightforward Riemannian generalization of the classical SR1 update

$B_{k+1} = B_k + \dfrac{(y_k - B_k s_k)(y_k - B_k s_k)^T}{(y_k - B_k s_k)^T s_k}$.

Significantly less straightforward is the Riemannian generalization of the superlinear convergence result, as we will see in Section 3.4. (Observe that the local convergence result in [AMS08, Sec. 7.4] does not apply here because the Hessian approximation condition [AMS08, (7.36)] is not guaranteed to hold.)
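For readers who prefer to see the classical update in computational form, the following is a minimal NumPy sketch of the Euclidean SR1 update together with the standard skip rule that Step 4 of Algorithm 1 generalizes. It is purely illustrative (not part of the paper's algorithmic contribution); the threshold `nu` plays the role of $\nu$ in Algorithm 1.

```python
import numpy as np

def sr1_update(B, s, y, nu=1e-8):
    """Euclidean SR1 update of a symmetric Hessian approximation B.

    B : (n, n) symmetric matrix, current Hessian approximation
    s : (n,) step vector
    y : (n,) gradient difference grad f(x + s) - grad f(x)
    nu: skip threshold, mirroring nu in Algorithm 1
    """
    r = y - B @ s                      # residual y_k - B_k s_k
    denom = r @ s                      # (y_k - B_k s_k)^T s_k
    # Skip the update when the denominator is small relative to ||s|| ||r||,
    # which keeps the update well defined and bounded.
    if abs(denom) < nu * np.linalg.norm(s) * np.linalg.norm(r):
        return B
    return B + np.outer(r, r) / denom  # rank-one correction; symmetry preserved
```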

Algorithm 1 Riemannian trust region with symmetric rank-one update (RTR-SR1)

Input: Riemannian manifold $\mathcal{M}$ with Riemannian metric $g$; retraction $R$; isometric vector transport $\mathcal{T}_S$; differentiable real-valued objective function $f$ on $\mathcal{M}$; initial iterate $x_0 \in \mathcal{M}$; initial Hessian approximation $B_0$, symmetric with respect to $g$.

1: Choose $\Delta_0 > 0$, $\nu \in (0,1)$, $c \in (0, 0.1)$, $\tau_1 \in (0,1)$ and $\tau_2 > 1$; set $k \leftarrow 0$;
2: Obtain $s_k \in T_{x_k}\mathcal{M}$ by (approximately) solving
   $s_k = \arg\min_{s \in T_{x_k}\mathcal{M}} m_k(s) = \arg\min_{s \in T_{x_k}\mathcal{M}} f(x_k) + g(\mathrm{grad}\,f(x_k), s) + \tfrac{1}{2} g(s, B_k s)$, s.t. $\|s\| \le \Delta_k$;   (2)
3: Set $\rho_k = \dfrac{f(x_k) - f(R_{x_k}(s_k))}{m_k(0) - m_k(s_k)}$;
4: Let $y_k = \mathcal{T}_{S_{s_k}}^{-1} \mathrm{grad}\,f(R_{x_k}(s_k)) - \mathrm{grad}\,f(x_k)$; if $|g(s_k, y_k - B_k s_k)| < \nu \|s_k\| \|y_k - B_k s_k\|$, then $\tilde{B}_{k+1} = B_k$; otherwise define the linear operator $\tilde{B}_{k+1} \colon T_{x_k}\mathcal{M} \to T_{x_k}\mathcal{M}$ by
   $\tilde{B}_{k+1} = B_k + \dfrac{(y_k - B_k s_k)(y_k - B_k s_k)^\flat}{g(s_k, y_k - B_k s_k)}$,   (SR1) (3)
   where $a^\flat$ denotes the flat of $a \in T_{x_k}\mathcal{M}$, i.e., $a^\flat \colon T_{x_k}\mathcal{M} \to \mathbb{R} \colon v \mapsto g(a, v)$;
5: if $\rho_k > c$ then
6:   $x_{k+1} \leftarrow R_{x_k}(s_k)$; $B_{k+1} \leftarrow \mathcal{T}_{S_{s_k}} \circ \tilde{B}_{k+1} \circ \mathcal{T}_{S_{s_k}}^{-1}$;
7: else
8:   $x_{k+1} \leftarrow x_k$; $B_{k+1} \leftarrow \tilde{B}_{k+1}$;
9: end if
10: if $\rho_k > \tfrac{3}{4}$ then
11:   if $\|s_k\| \ge 0.8\,\Delta_k$ then
12:     $\Delta_{k+1} = \tau_2 \Delta_k$;
13:   else
14:     $\Delta_{k+1} = \Delta_k$;
15:   end if
16: else if $\rho_k < 0.1$ then
17:   $\Delta_{k+1} = \tau_1 \Delta_k$;
18: else
19:   $\Delta_{k+1} = \Delta_k$;
20: end if
21: $k \leftarrow k + 1$; go to Step 2 until convergence.

Instrumental in the Riemannian SR1 update is the notion of vector transport, introduced in [AMS08, Sec. 8.1] as a generalization of the classical Riemannian concept of parallel translation. A vector transport on a manifold $\mathcal{M}$ on top of a retraction $R$ is a smooth mapping

$T\mathcal{M} \oplus T\mathcal{M} \to T\mathcal{M} \colon (\eta_x, \xi_x) \mapsto \mathcal{T}_{\eta_x}(\xi_x) \in T\mathcal{M}$

satisfying the following properties for all $x \in \mathcal{M}$:

1. (Associated retraction) $\mathcal{T}_{\eta_x}(\xi_x) \in T_{R_x(\eta_x)}\mathcal{M}$ for all $\xi_x \in T_x\mathcal{M}$;

2. (Consistency) $\mathcal{T}_{0_x}(\xi_x) = \xi_x$ for all $\xi_x \in T_x\mathcal{M}$;
3. (Linearity) $\mathcal{T}_{\eta_x}(a\xi_x + b\zeta_x) = a\mathcal{T}_{\eta_x}(\xi_x) + b\mathcal{T}_{\eta_x}(\zeta_x)$.

The Riemannian SR1 update uses tangent vectors at the current iterate to produce a new Hessian approximation at the next iterate, hence the need to perform a vector transport (see Step 6) from the current iterate to the next. In the Input step of Algorithm 1, the requirement that the vector transport $\mathcal{T}_S$ is isometric means that, for all $x \in \mathcal{M}$ and all $\xi_x, \zeta_x, \eta_x \in T_x\mathcal{M}$, the equation

$g(\mathcal{T}_{S_{\eta_x}} \xi_x, \mathcal{T}_{S_{\eta_x}} \zeta_x) = g(\xi_x, \zeta_x)$   (4)

holds. Techniques for constructing an isometric vector transport on submanifolds of Euclidean spaces are described in Section 2.3.

The symmetry requirement on $B_0$ with respect to the Riemannian metric $g$ means that $g(B_0 \xi_{x_0}, \eta_{x_0}) = g(\xi_{x_0}, B_0 \eta_{x_0})$ for all $\xi_{x_0}, \eta_{x_0} \in T_{x_0}\mathcal{M}$. It is readily seen from (3) and Step 6 of Algorithm 1 that $B_k$ is symmetric for all $k$. Note however that $B_k$ is, in general, not positive definite.

A possible stopping criterion for Algorithm 1 is $\|\mathrm{grad}\,f(x_k)\| < \epsilon$ for some specified $\epsilon > 0$, where $\|\cdot\|$, which also appears in the statement of Algorithm 1, denotes the norm induced by the Riemannian metric $g$, i.e.,

$\|\xi\| = \sqrt{g(\xi, \xi)}$.   (5)

In the spirit of [RW12, Remark 4], we point out that it is possible to formulate the SR1 update (3) in the new tangent space $T_{x_{k+1}}\mathcal{M}$; in the present case of SR1, the algorithm remains equivalent since the vector transport is isometric. Otherwise, Algorithm 1 does not call for comments other than those made in [AMS08, Ch. 7]. In particular, we point out that the meaning of "approximately" in Step 2 of Algorithm 1 depends on the desired convergence results. We will see in the convergence analysis (Section 3) that enforcing the Cauchy decrease (9) is enough to ensure global convergence to stationary points, but another condition such as (10) is needed to guarantee superlinear convergence. The truncated CG method, discussed in [AMS08, Sec. 7.3.2] in the Riemannian context, is an inner iteration for Step 2 that returns an $s_k$ satisfying conditions (9) and (10).

2.2 Representation of tangent vectors

Let us now consider the frequently encountered situation where the manifold $\mathcal{M}$ is described as a $d$-dimensional submanifold of an $m$-dimensional Euclidean space $\mathcal{E}$. In particular, this is the case of the sphere and the Stiefel manifold involved in the numerical experiments in Section 5. A tangent vector in $T_x\mathcal{M}$ can be represented either by its $d$-dimensional vector of coordinates in a given basis $B_x$ of $T_x\mathcal{M}$, or else as an $m$-dimensional vector in $\mathcal{E}$, since $T_x\mathcal{M} \subseteq T_x\mathcal{E} \simeq \mathcal{E}$. The latter option may be preferable when the codimension $m - d$ is small (e.g., the sphere) because building, storing and manipulating the basis $B_x$ of $T_x\mathcal{M}$ may be inconvenient. Likewise, since $B_k$ is a linear transformation of $T_{x_k}\mathcal{M}$, it can be represented in the basis $B_{x_k}$ as a $d \times d$ matrix, or as an $m \times m$ matrix restricted to act on $T_{x_k}\mathcal{M}$. Here again, the latter approach may be computationally more efficient when the codimension $m - d$ is small. A related choice has to be made for the representation of the vector transport, since $\mathcal{T}_{\eta_x}$ is a linear map from $T_x\mathcal{M}$ to $T_{R_x(\eta_x)}\mathcal{M}$. This question is addressed in Section 2.3.
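To make the ambient-representation option concrete, the following is a minimal illustrative sketch (it is not the implementation used for the experiments in Section 5) of Steps 3 to 9 of Algorithm 1 for a Riemannian submanifold of $\mathbb{R}^m$ with the induced metric, with tangent vectors and $B_k$ stored as ambient NumPy arrays. The callables `f`, `grad`, `retract` and `transport_matrix` are user-supplied placeholders; `transport_matrix` is assumed to return the matrix of an isometric transport such as the one in Section 2.3.1, whose inverse on tangent vectors is then its transpose.

```python
import numpy as np

def rtr_sr1_step(f, grad, retract, transport_matrix, x, B, s, m_decrease,
                 nu=1e-4, c=1e-3):
    """One pass through Steps 3-9 of Algorithm 1 (illustrative sketch).

    x : current iterate (ambient m-vector on the submanifold)
    B : (m, m) matrix acting on T_x M (ambient representation of B_k)
    s : tangent step from the inner solver for (2)
    m_decrease : model decrease m_k(0) - m_k(s_k) reported by the inner solver
    """
    x_trial = retract(x, s)
    rho = (f(x) - f(x_trial)) / m_decrease               # Step 3: actual/model ratio
    T = transport_matrix(x, s)                           # matrix of T_{S_{s_k}}
    y = T.T @ grad(x_trial) - grad(x)                    # Step 4 (T^{-1} = T^T assumed)
    r = y - B @ s
    denom = s @ r
    if abs(denom) < nu * np.linalg.norm(s) * np.linalg.norm(r):
        B_tilde = B                                      # skip rule
    else:
        B_tilde = B + np.outer(r, r) / denom             # SR1 update (3)
    if rho > c:                                          # Steps 5-6: accept, transport B
        return x_trial, T @ B_tilde @ T.T, rho
    return x, B_tilde, rho                               # Step 8: reject
```

The step $s_k$ and the model decrease are assumed to come from an inner solver for (2), e.g., a truncated CG iteration; the trust-region radius update (Steps 10 to 20) is omitted for brevity.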

2.3 Isometric vector transport

We present two ways of constructing an isometric vector transport on a $d$-dimensional submanifold $\mathcal{M}$ of an $m$-dimensional Euclidean space $\mathcal{E}$.

2.3.1 Vector transport by parallelization

An open subset $\mathcal{U}$ of $\mathcal{M}$ is termed parallelizable if it admits a smooth field of tangent bases, i.e., a smooth function $B \colon \mathcal{U} \to \mathbb{R}^{m \times d} \colon z \mapsto B_z$ where $B_z$ is a basis of $T_z\mathcal{M}$. The whole manifold $\mathcal{M}$ itself may not be parallelizable; in particular, every manifold of nonzero Euler characteristic is not parallelizable [Sti35, Sec. 1.6], and it is also known that the only parallelizable spheres are those of dimension 1, 3, and 7 [Boo03, p. 116]. However, for the global convergence analysis carried out in Section 3.2, the vector transport is not required to be smooth or even continuous, and for the local convergence analysis in Section 3.4, we only need a parallelizable neighborhood $\mathcal{U}$ of the limit point $x^*$. Such a neighborhood always exists (take for example a coordinate neighborhood [AMS08, p. 37]).

Given an orthonormal smooth field of tangent bases $B$, i.e., such that $B_x^T B_x = I$ for all $x$ (where $I$ stands for the identity matrix of adequate size), the proposed isometric vector transport from $T_x\mathcal{M}$ to $T_y\mathcal{M}$ is given by

$\mathcal{T} = B_y B_x^T$.   (6)

The $d \times d$ matrix representation of this vector transport in the pair of bases $(B_x, B_y)$ is simply the identity. This considerably simplifies the implementation of Algorithm 1.

2.3.2 Vector transport by rigging

If $\mathcal{M}$ is described as a $d$-dimensional submanifold of an $m$-dimensional Euclidean space $\mathcal{E}$ and the codimension $m - d$ is much smaller than the dimension $d$, then the vector transport by rigging, introduced next, may be preferable. For generality, we do not assume that $\mathcal{M}$ is a Riemannian submanifold of $\mathcal{E}$; in other words, the Riemannian metric $g$ on $\mathcal{M}$ may not be the one induced by the metric of $\mathcal{E}$. A motivation for this generality is to be able to handle the canonical metric of the Stiefel manifold [EAS98, (2.22)]. For simplicity of the exposition, we work in an orthonormal basis of $\mathcal{E}$ and, for $x \in \mathcal{M}$, we let $G_x$ denote a matrix expression of $g_x$, i.e., $g_x(\xi_x, \eta_x) = \xi_x^T G_x \eta_x$ for all $\xi_x, \eta_x \in T_x\mathcal{M}$.

An open subset $\mathcal{U}$ of $\mathcal{M}$ is termed rigged if it admits a smooth field of normal bases, i.e., a smooth function $N \colon \mathcal{U} \to \mathbb{R}^{m \times (m-d)} \colon z \mapsto N_z$ where $N_z$ is a basis of the normal space $N_z\mathcal{M}$. The whole manifold $\mathcal{M}$ itself may not be rigged, but it is always locally rigged. Given a smooth field of normal bases $N$, the proposed isometric vector transport $\mathcal{T}$ from $T_x\mathcal{M}$ to $T_y\mathcal{M}$ is defined as follows. Compute $(I - N_x(N_x^T N_x)^{-1} N_x^T) N_y$ (i.e., the orthogonal projection of $N_y$ onto $T_x\mathcal{M}$) and observe that its column space is $T_x\mathcal{M} \ominus (T_x\mathcal{M} \cap T_y\mathcal{M})$. Obtain an orthonormal matrix $Q_x$ by Gram-Schmidt orthonormalizing $(I - N_x(N_x^T N_x)^{-1} N_x^T) N_y$. Proceed likewise with $x$ and $y$ interchanged to get $Q_y$. Finally, let

$\mathcal{T} = G_y^{-1/2}(I - Q_x Q_x^T - Q_y Q_x^T) G_x^{1/2}$.   (7)

While it is clear that $\mathcal{T}$ satisfies the three properties of vector transport mentioned in Section 2.1, proving that $\mathcal{T}$ is (locally) smooth remains an open question. Moreover, the column space of $(I - N_x(N_x^T N_x)^{-1} N_x^T) N_y$ gets more sensitive to numerical errors as the distance between $x$ and $y$ decreases. Nevertheless, there is evidence that $\mathcal{T}$ is indeed smooth, and we have observed that using vector transport by rigging in Algorithm 1 is a worthy alternative in large-scale low-codimension problems.
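As an illustration of the rigging construction, the following NumPy sketch computes (7) as reconstructed above, with the Gram-Schmidt step replaced by a QR factorization (whose column-sign convention may differ) and with the induced metric ($G_x = G_y = I$) as the default case; the general-metric branch is a direct transcription of (7) and should be treated with the same caution as the formula itself. The function name and signature are illustrative, not part of the paper.

```python
import numpy as np

def rigging_transport(N_x, N_y, G_sqrt_x=None, G_sqrt_y=None):
    """Sketch of the vector transport by rigging, eq. (7).

    N_x, N_y : m x (m-d) bases of the normal spaces at x and y
               (assumed such that the projected matrices below have full
               column rank, i.e., x and y in "general position").
    G_sqrt_x, G_sqrt_y : optional matrix square roots of G_x and G_y;
               when omitted, the induced metric (G = I) is assumed.
    Returns an m x m matrix whose restriction to T_x M is the transport.
    """
    m = N_x.shape[0]
    P_x = np.eye(m) - N_x @ np.linalg.solve(N_x.T @ N_x, N_x.T)  # projector onto T_x M
    P_y = np.eye(m) - N_y @ np.linalg.solve(N_y.T @ N_y, N_y.T)  # projector onto T_y M
    Q_x, _ = np.linalg.qr(P_x @ N_y)   # orthonormal basis of T_x M minus (T_x M cap T_y M)
    Q_y, _ = np.linalg.qr(P_y @ N_x)   # orthonormal basis of T_y M minus (T_x M cap T_y M)
    U = np.eye(m) - Q_x @ Q_x.T - Q_y @ Q_x.T
    if G_sqrt_x is None:
        return U
    return np.linalg.solve(G_sqrt_y, U) @ G_sqrt_x   # G_y^{-1/2} U G_x^{1/2}
```

For the unit sphere, a globally smooth field of normal bases is simply $N_z = z$ (a single column), so the sketch can be called with `N_x = x.reshape(-1, 1)` and `N_y = y.reshape(-1, 1)`.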

3 Convergence analysis of RTR-SR1

3.1 Notation and standing assumptions

Throughout the convergence analysis, unless otherwise specified, we let $\{x_k\}$, $\{B_k\}$, $\{\tilde{B}_k\}$, $\{s_k\}$, $\{y_k\}$, and $\{\Delta_k\}$ be infinite sequences generated by Algorithm 1, and we make use of the notation introduced in that algorithm. We let $\Omega$ denote the sublevel set of $x_0$, i.e., $\Omega = \{x \in \mathcal{M} : f(x) \le f(x_0)\}$.

The global and local convergence analyses each make standing assumptions at the beginning of their respective sections. The numbered assumptions introduced below are not standing assumptions and will be invoked specifically whenever needed. Note that, apart from Assumption 3.6, all the numbered assumptions are Riemannian generalizations of assumptions made in [BKS96] for the analysis of the Euclidean SR1 trust-region method.

3.2 Global convergence analysis

In some results, we will assume for the retraction $R$ that there exist $\mu > 0$ and $\delta_\mu > 0$ such that

$\|\xi\| \ge \mu\, \mathrm{dist}(x, R_x(\xi))$ for all $x \in \Omega$ and all $\xi \in T_x\mathcal{M}$ with $\|\xi\| \le \delta_\mu$.   (8)

This corresponds to [AMS08, (7.25)] restricted to the sublevel set $\Omega$. Such a condition is instrumental in the global convergence analysis of Riemannian trust-region schemes. Note that, in view of [RW12, Lemma 6], condition (8) can be shown to hold globally under the condition that $R$ has equicontinuous derivatives.

The next assumption corresponds to [BKS96, (A3)].

Assumption 3.1. The sequence of linear operators $\{B_k\}$ is bounded by a constant $M$ such that $\|B_k\| \le M$ for all $k$.

We will often require that the trust-region subproblem (2) is solved accurately enough that, for some positive constants $\sigma_1$ and $\sigma_2$,

$m_k(0) - m_k(s_k) \ge \sigma_1 \|\mathrm{grad}\,f(x_k)\| \min\{\Delta_k, \sigma_2 \|\mathrm{grad}\,f(x_k)\|\}$,   (9)

and that

$B_k s_k = -\mathrm{grad}\,f(x_k) + \delta_k$ with $\|\delta_k\| \le \|\mathrm{grad}\,f(x_k)\|^{1+\theta}$, whenever $\|s_k\| \le 0.8\,\Delta_k$,   (10)

where $\theta > 0$ is a constant. These conditions are generalizations of [BKS96, (2.3)-(2.4)]. Observe that, even if we restrict to the Euclidean case, condition (10) remains weaker than condition [BKS96, (2.4)]. The purpose of introducing $\delta_k$ in (10) is to encompass stopping criteria such as [AMS08, (7.10)] that do not require the computation of an exact solution of the trust-region subproblem.
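As noted below, these conditions hold when the subproblem is solved with a truncated CG inner iteration. For concreteness, the following is a minimal NumPy sketch of such a Steihaug-Toint-type inner solver in a single tangent space, assuming the ambient representation of Section 2.2 and the induced metric; `Hv` applies the current Hessian approximation $B_k$, and the residual-based stopping test mirrors the inexactness that the term $\delta_k$ in (10) is meant to accommodate. This sketch is illustrative and not the implementation analyzed in the paper.

```python
import numpy as np

def truncated_cg(grad, Hv, Delta, theta=1.0, kappa=0.1, max_iter=100):
    """Steihaug-Toint truncated CG sketch for the subproblem (2).

    grad : ambient gradient vector grad f(x_k), tangent to M at x_k
    Hv   : callable applying B_k to a tangent vector (tangent in, tangent out)
    Delta: trust-region radius
    Stops on the boundary, on negative curvature, or once the residual
    satisfies ||r|| <= ||r0|| * min(kappa, ||r0||^theta).
    """
    s = np.zeros_like(grad)
    r = grad.copy()                    # residual of B s + grad = 0 at s = 0
    d = -r
    r0_norm = np.linalg.norm(r)
    for _ in range(max_iter):
        Hd = Hv(d)
        dHd = d @ Hd
        if dHd <= 0:                   # negative curvature: go to the boundary
            return s + _to_boundary(s, d, Delta) * d
        alpha = (r @ r) / dHd
        if np.linalg.norm(s + alpha * d) >= Delta:   # step would leave the region
            return s + _to_boundary(s, d, Delta) * d
        s = s + alpha * d
        r_new = r + alpha * Hd
        if np.linalg.norm(r_new) <= r0_norm * min(kappa, r0_norm**theta):
            return s                   # residual-based stopping test
        beta = (r_new @ r_new) / (r @ r)
        d = -r_new + beta * d
        r = r_new
    return s

def _to_boundary(s, d, Delta):
    """Positive tau such that ||s + tau d|| = Delta."""
    a, b, c = d @ d, 2 * (s @ d), s @ s - Delta**2
    return (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
```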

We point out in particular that (9) and (10) hold if the approximate solution of the trust-region subproblem (2) is obtained from the truncated CG method, described in [AMS08, Sec. 7.3.2] in the Riemannian context.

We can now state and prove the main global convergence results. Point (iii) generalizes [BKS96, Theorem 2.1] while points (i) and (ii) are based on [AMS08, Sec. 7.4.1].

Theorem 3.1 (convergence). (i) If $f \in C^2$ is bounded below on the sublevel set $\Omega$, Assumption 3.1 holds, condition (9) holds, and (8) is satisfied, then $\lim_{k\to\infty} \|\mathrm{grad}\,f(x_k)\| = 0$. (ii) If $f \in C^2$, $\mathcal{M}$ is compact, Assumption 3.1 holds, and (9) holds, then $\lim_{k\to\infty} \|\mathrm{grad}\,f(x_k)\| = 0$, $\{x_k\}$ has at least one limit point, and every limit point of $\{x_k\}$ is a stationary point of $f$. (iii) If $f \in C^2$, the sublevel set $\Omega$ is compact, $f$ has a unique stationary point $x^*$ in $\Omega$, Assumption 3.1 holds, condition (9) holds, and (8) is satisfied, then $\{x_k\}$ converges to $x^*$.

Proof. (i) Observe that the proof of [AMS08, Theorem 7.4.4] still holds when condition [AMS08, (7.25)] is weakened to its restriction (8) to $\Omega$. Indeed, since the trust-region method is a descent iteration, it follows that all iterates are in $\Omega$. The assumptions thus allow us to conclude, by [AMS08, Theorem 7.4.4], that $\lim_{k\to\infty} \|\mathrm{grad}\,f(x_k)\| = 0$. (ii) It follows from [AMS08, Proposition 7.4.5] and [AMS08, Corollary 7.4.6] that all the assumptions of [AMS08, Theorem 7.4.4] hold. Hence $\lim_{k\to\infty} \|\mathrm{grad}\,f(x_k)\| = 0$, and every limit point is thus a stationary point of $f$. Since $\mathcal{M}$ is compact, $\{x_k\}$ is guaranteed to have at least one limit point. (iii) Again by [AMS08, Theorem 7.4.4], we get that $\lim_{k\to\infty} \|\mathrm{grad}\,f(x_k)\| = 0$. Since $\{x_k\}$ belongs to the compact set $\Omega$ and cannot have limit points other than $x^*$, it follows that $\{x_k\}$ converges to $x^*$.

3.3 More notation and standing assumptions

For the purpose of conducting a local convergence analysis, we now assume that $\{x_k\}$ converges to a point $x^*$. Moreover, we assume throughout that $f \in C^2$. We let $U_{\mathrm{trn}}$ be a totally retractive neighborhood of $x^*$, a concept inspired from the notion of totally normal neighborhood (see [dC92, Sec. 3.3]). By this, we mean that there is $\delta_{\mathrm{trn}} > 0$ such that, for each $y \in U_{\mathrm{trn}}$, we have that $U_{\mathrm{trn}} \subseteq R_y(B(0_y, \delta_{\mathrm{trn}}))$ and $R_y(\cdot)$ is a diffeomorphism on $B(0_y, \delta_{\mathrm{trn}})$, where $B(0_y, \delta_{\mathrm{trn}})$ denotes the ball of radius $\delta_{\mathrm{trn}}$ in $T_y\mathcal{M}$ centered at the origin $0_y$. The existence of a totally retractive neighborhood can be shown along the lines of [dC92, Theorem 3.3.7]. We assume without loss of generality that $\{x_k\} \subseteq U_{\mathrm{trn}}$. Whenever we consider an inverse retraction $R_x^{-1}(y)$, we implicitly assume that $x, y \in U_{\mathrm{trn}}$.

3.4 Local convergence analysis

The purpose of this section is to obtain a superlinear convergence result for Algorithm 1, stated in the theorem that concludes this section. The analysis can be viewed as a Riemannian generalization of the local analysis in [BKS96, Sec. 2]. As we proceed, we will point out the main hurdles that had to be overcome in the generalization. The analysis makes use of several preparation lemmas, independent of Algorithm 1, that are of potential interest in the broader context of Riemannian optimization. These preparation lemmas become trivial or well known in the Euclidean context.

The next assumption corresponds to a part of [BKS96, (A1)].

Assumption 3.2. The point $x^*$ is a nondegenerate local minimizer of $f$. In other words, $\mathrm{grad}\,f(x^*) = 0$ and $\mathrm{Hess}\,f(x^*)$ is positive definite.

The next assumption generalizes the assumption, contained in [BKS96, (A1)], that the Hessian of $f$ is Lipschitz continuous near $x^*$. (Recall that $\mathcal{T}_S$ is the vector transport invoked in Algorithm 1.) Note that the assumption holds if $f \in C^3$; see Lemma 3.5.

Assumption 3.3. There exists a constant $c \ge 0$ such that for all $x, y \in U_{\mathrm{trn}}$,

$\|\mathrm{Hess}\,f(y) - \mathcal{T}_{S_\eta} \mathrm{Hess}\,f(x)\, \mathcal{T}_{S_\eta}^{-1}\| \le c\, \mathrm{dist}(x, y)$,

where $\mathrm{Hess}\,f(x)$ is the Riemannian Hessian of $f$ at $x$ (see, e.g., [AMS08, Sec. 5.5]), $\eta = R_x^{-1}(y)$, and $\|\cdot\|$ is also used to denote the operator norm induced by the Riemannian norm (5).

The next assumption is introduced to handle the Riemannian case; in the classical Euclidean setting, Assumption 3.4 follows from Assumption 3.3. Assumption 3.4 is mild since it holds if $f \in C^3$, as shown in Lemma 3.5.

Assumption 3.4. There exists a constant $c_0 \ge 0$ such that for all $x, y \in U_{\mathrm{trn}}$, all $\xi_x \in T_x\mathcal{M}$ with $R_x(\xi_x) \in U_{\mathrm{trn}}$, and all $\xi_y \in T_y\mathcal{M}$ with $R_y(\xi_y) \in U_{\mathrm{trn}}$, the inequality

$\|\mathrm{Hess}\,\hat{f}_y(\xi_y) - \mathcal{T}_{S_\eta} \mathrm{Hess}\,\hat{f}_x(\xi_x)\, \mathcal{T}_{S_\eta}^{-1}\| \le c_0 (\|\xi_y\| + \|\xi_x\| + \|\eta\|)$

holds, where $\eta = R_x^{-1}(y)$, $\hat{f}_x = f \circ R_x$, and $\hat{f}_y = f \circ R_y$.

The next assumption corresponds to [BKS96, (A2)]. It implies that no updates of $B_k$ are skipped. In the Euclidean case, Khalfan et al. [KBS93] show that this is usually the case in practice.

Assumption 3.5. The inequality

$|g(s_k, y_k - B_k s_k)| \ge \nu \|s_k\| \|y_k - B_k s_k\|$

holds.

The next assumption is introduced to handle the Riemannian case. It states that the iterates eventually continuously stay in the totally retractive neighborhood $U_{\mathrm{trn}}$ (the terminology is borrowed from [ATV13, Definition 2.8]). The assumption is needed, in particular, for Lemma 3.6. Note that, whereas in the Euclidean setting the assumption follows from the standing assumption that $\{x_k\}$ converges to $x^*$, this is no longer the case on some Riemannian manifolds, where $\{x_k\}$ may converge to $x^*$ while the connecting segments $\{R_{x_k}(t s_k) : t \in [0,1]\}$ do not. Assumption 3.6 is thus invoked to ensure that we are in a position to carry out a local convergence analysis.

Assumption 3.6. There exists $N$ such that, for all $k \ge N$ and all $t \in [0,1]$, $R_{x_k}(t s_k) \in U_{\mathrm{trn}}$.

The next lemma is proved in [GQA12, Lemma 14.1].

Lemma 3.2. Let $\mathcal{M}$ be a Riemannian manifold, let $\mathcal{U}$ be a compact coordinate neighborhood in $\mathcal{M}$, and let the hat denote coordinate expressions. Then there are $c_2 > c_1 > 0$ such that, for all $x, y \in \mathcal{U}$, we have

$c_1 \|\hat{x} - \hat{y}\|_2 \le \mathrm{dist}(x, y) \le c_2 \|\hat{x} - \hat{y}\|_2$,

where $\|\cdot\|_2$ denotes the Euclidean norm, i.e., $\|\hat{x}\|_2 = \sqrt{\hat{x}^T \hat{x}}$.

Lemma 3.3. Let $\mathcal{M}$ be a Riemannian manifold endowed with a retraction $R$ and let $\bar{x} \in \mathcal{M}$. Then there exist $a_0 > 0$, $a_1 > 0$, and $\delta_{a_0,a_1} > 0$ such that for all $x$ in a sufficiently small neighborhood of $\bar{x}$ and all $\xi, \eta \in T_x\mathcal{M}$ with $\|\xi\| \le \delta_{a_0,a_1}$ and $\|\eta\| \le \delta_{a_0,a_1}$, the inequalities

$a_0 \|\xi - \eta\| \le \mathrm{dist}(R_x(\eta), R_x(\xi)) \le a_1 \|\xi - \eta\|$

hold.

Proof. Since $R$ is smooth, we can choose a neighborhood small enough such that $R$ satisfies the condition of [RW12, Lemma 6], and the result follows from that lemma.

The following lemma follows from Lemma 3.3 by setting $\eta = 0$. We state it separately for convenience as we will frequently invoke it in the analysis.

Lemma 3.4. Let $\mathcal{M}$ be a Riemannian manifold endowed with retraction $R$ and let $\bar{x} \in \mathcal{M}$. Then there exist $a_0 > 0$, $a_1 > 0$, and $\delta_{a_0,a_1} > 0$ such that for all $x$ in a sufficiently small neighborhood of $\bar{x}$ and all $\xi \in T_x\mathcal{M}$ with $\|\xi\| \le \delta_{a_0,a_1}$, the inequalities

$a_0 \|\xi\| \le \mathrm{dist}(x, R_x(\xi)) \le a_1 \|\xi\|$

hold.

Lemma 3.5. If $f \in C^3$, then Assumptions 3.3 and 3.4 hold.

Proof. First, we prove that Assumption 3.3 holds. Define a function $h \colon \mathcal{M} \times \mathcal{M} \times T\mathcal{M} \to T\mathcal{M}$, $(x, y, \xi_y) \mapsto \mathcal{T}_{S_\eta} \mathrm{Hess}\,f(x)\, \mathcal{T}_{S_\eta}^{-1} \xi_y$, where $\eta = R_x^{-1}(y)$. Since $f \in C^3$, we know that $h(x, y, \xi_y)$ is $C^1$. Then there exist constants $b_0$, $b_1$ and $b_2$ such that, for all $x, y \in U_{\mathrm{trn}}$ and all $\xi_y \in T_y\mathcal{M}$ with $\|\xi_y\| = 1$,

$\|(\mathrm{Hess}\,f(y) - \mathcal{T}_{S_\eta} \mathrm{Hess}\,f(x)\, \mathcal{T}_{S_\eta}^{-1}) \xi_y\| = \|h(y, y, \xi_y) - h(x, y, \xi_y)\| \le b_0\, \mathrm{dist}((y, y, \xi_y), (x, y, \xi_y)) \le b_1 \|(\hat{y}, \hat{y}, \hat{\xi}_y) - (\hat{x}, \hat{y}, \hat{\xi}_y)\|_2$   (by Lemma 3.2)
$= b_1 \|\hat{y} - \hat{x}\|_2 \le b_2\, \mathrm{dist}(y, x)$.   (by Lemma 3.2)

Given any linear operator $A$ on $T_y\mathcal{M}$, $\|A\|$ by definition is $\sup_{\|\xi\|=1} \|A\xi\|$. Note that $\{\xi : \|\xi\| = 1\}$ is a compact set. Hence, there exists $\xi$ with $\|\xi\| = 1$ such that $\|A\| = \|A\xi\|$. Therefore, we can choose $\xi_y$ with $\|\xi_y\| = 1$ such that

$\|(\mathrm{Hess}\,f(y) - \mathcal{T}_{S_\eta} \mathrm{Hess}\,f(x)\, \mathcal{T}_{S_\eta}^{-1}) \xi_y\| = \|\mathrm{Hess}\,f(y) - \mathcal{T}_{S_\eta} \mathrm{Hess}\,f(x)\, \mathcal{T}_{S_\eta}^{-1}\|$.

We obtain $\|\mathrm{Hess}\,f(y) - \mathcal{T}_{S_\eta} \mathrm{Hess}\,f(x)\, \mathcal{T}_{S_\eta}^{-1}\| \le b_2\, \mathrm{dist}(y, x)$.

To prove Assumption 3.4, we redefine $h$ as $h(y, x, \xi_x) = \mathcal{T}_{S_\eta} \mathrm{Hess}\,\hat{f}_x(\xi_x)\, \mathcal{T}_{S_\eta}^{-1}$. If we use orthonormal vector fields to obtain the coordinate expression of $h$, denoted by $\hat{h}$, then the manifold norm and the Euclidean norm of coordinate expressions are the same and we have

$\|\mathrm{Hess}\,\hat{f}_y(\xi_y) - \mathcal{T}_{S_\eta} \mathrm{Hess}\,\hat{f}_x(\xi_x)\, \mathcal{T}_{S_\eta}^{-1}\| = \|\mathrm{Hess}\,\hat{f}_y(\hat{\xi}_y) - \hat{\mathcal{T}}_{S_\eta} \mathrm{Hess}\,\hat{f}_x(\hat{\xi}_x)\, \hat{\mathcal{T}}_{S_\eta}^{-1}\|_2$.   (11)

Since $f \in C^3$, we know that $\hat{h}$ is also in $C^1$. Hence there exists a constant $b_3$ such that

$\|\hat{h}(\hat{y}, \hat{y}, \hat{\xi}_y) - \hat{h}(\hat{y}, \hat{x}, \hat{\xi}_x)\|_2 \le b_3 \|(\hat{y}, \hat{y}, \hat{\xi}_y) - (\hat{y}, \hat{x}, \hat{\xi}_x)\|_2$.

Therefore

$\|\mathrm{Hess}\,\hat{f}_y(\hat{\xi}_y) - \hat{\mathcal{T}}_{S_\eta} \mathrm{Hess}\,\hat{f}_x(\hat{\xi}_x)\, \hat{\mathcal{T}}_{S_\eta}^{-1}\|_2 = \|\hat{h}(\hat{y}, \hat{y}, \hat{\xi}_y) - \hat{h}(\hat{y}, \hat{x}, \hat{\xi}_x)\|_2 \le b_3 \|(\hat{y}, \hat{y}, \hat{\xi}_y) - (\hat{y}, \hat{x}, \hat{\xi}_x)\|_2$
$\le b_4 (\|\hat{y} - \hat{x}\|_2 + \|\hat{\xi}_y\|_2 + \|\hat{\xi}_x\|_2) \le b_5 (\mathrm{dist}(x, y) + \|\hat{\xi}_y\|_2 + \|\hat{\xi}_x\|_2)$   (by Lemma 3.2)
$\le b_6 (\|\eta\| + \|\xi_y\| + \|\xi_x\|)$.   (by Lemma 3.4)

This and (11) give us Assumption 3.4.

The next lemma generalizes [BKS96, Lemma 2.2]. The key difference with the Euclidean case is the following: in the Euclidean case, when $s_k$ is accepted, we simply have $s_k = x_{k+1} - x_k$, while in the Riemannian generalization, we invoke Assumption 3.6 and Lemma 3.4 to deduce that $\|s_k\| \le \frac{1}{a_0} \mathrm{dist}(x_{k+1}, x_k)$. Note that Assumption 3.6 cannot be removed. To see this, consider for example the unit sphere with the exponential retraction, where we can have $x_k = x_{k+1}$ with $\|s_k\| = 2\pi$.

Lemma 3.6. Suppose Assumption 3.6 holds. Then either

$\Delta_k \to 0$   (12)

or there exist $K > 0$ and $\bar{\Delta} > 0$ such that

$\Delta_k = \bar{\Delta}$ for all $k > K$.   (13)

In either case $\|s_k\| \to 0$.

Proof. Let $\Delta^* = \liminf_{k\to\infty} \Delta_k$ and suppose first that $\Delta^* > 0$. From line 11 of Algorithm 1, if $\Delta_k$ is increased, then $\|s_k\| \ge 0.8\,\Delta_k$ and $x_{k+1} = R_{x_k}(s_k)$, which implies by Lemma 3.4 and Assumption 3.6 that $\mathrm{dist}(x_k, x_{k+1}) \ge 0.8\, a_0 \Delta_k$. The latter inequality cannot hold for infinitely many values of $k$ since $x_k \to x^*$ and $\liminf_{k\to\infty} \Delta_k > 0$. Hence, there exists $K_0$ such that $\Delta_k$ is not increased for any $k \ge K_0$. Since $\Delta^* > 0$, this implies that $\Delta_k \ge \Delta^*$ for all $k \ge K_0$. In view of the trust-region update mechanism in Algorithm 1 and since $\Delta^* = \liminf_{k\to\infty} \Delta_k$, we also know that, for some $K_1 > K_0$, $\Delta_{K_1} < \Delta^*/\tau_1$. If the trust-region radius were to be decreased at some $k \ge K_1$, we would have $\Delta_{k+1} < \Delta^*$, which we have ruled out. Since neither increase nor decrease can occur, we must have $\Delta_k = \Delta_{K_1} =: \bar{\Delta}$ for all $k \ge K_1$.

Suppose now that $\Delta^* = 0$. Since $x_k \to x^*$, for every $\epsilon > 0$ there exists $K_\epsilon \ge 0$ such that $\mathrm{dist}(x_{k+1}, x_k) < \epsilon$ for all $k \ge K_\epsilon$. Since $\liminf_{k\to\infty} \Delta_k = 0$, there exists $j \ge K_\epsilon$ such that $\Delta_j < \epsilon$.

But since $\Delta_k$ is increased only if $\|s_k\| \ge 0.8\,\Delta_k$, which implies $\Delta_k \le \frac{1}{0.8\, a_0} \mathrm{dist}(x_{k+1}, x_k) < \frac{\epsilon}{0.8\, a_0}$, and the increase factor is $\tau_2$, we have that $\Delta_k < \frac{\tau_2 \epsilon}{0.8\, a_0}$ for all $k \ge j$. Therefore (12) follows.

To show that $\|s_k\| \to 0$, note that if (12) is true, then clearly $\|s_k\| \to 0$. If (13) is true, then for all $k > K$, the step $s_k$ is accepted (a rejected step would trigger a decrease of $\Delta_k$, contradicting (13)) and $\|s_k\| \le \frac{1}{a_0} \mathrm{dist}(x_{k+1}, x_k)$ (by Lemma 3.4), hence $\|s_k\| \to 0$ since $\{x_k\}$ converges.

Lemma 3.7. Let $\mathcal{M}$ be a Riemannian manifold endowed with two vector transports $\mathcal{T}_1$ and $\mathcal{T}_2$, and let $\bar{x} \in \mathcal{M}$. Then there exist a constant $a_4$ and a neighborhood $\mathcal{U}$ of $\bar{x}$ such that for all $x, y \in \mathcal{U}$ and all $\xi \in T_y\mathcal{M}$,

$\|\mathcal{T}_{1_\eta}^{-1} \xi - \mathcal{T}_{2_\eta}^{-1} \xi\| \le a_4 \|\xi\| \|\eta\|$,

where $\eta = R_x^{-1}(y)$.

Proof. We use the hat to denote coordinate expressions. Let $T_1(\hat{x}, \hat{\eta})$ and $T_2(\hat{x}, \hat{\eta})$ denote the coordinate expressions of $\mathcal{T}_{1_\eta}^{-1}$ and $\mathcal{T}_{2_\eta}^{-1}$, respectively. Then

$\|\mathcal{T}_{1_\eta}^{-1} \xi - \mathcal{T}_{2_\eta}^{-1} \xi\| \le b_0 \|(T_1(\hat{x}, \hat{\eta}) - T_2(\hat{x}, \hat{\eta}))\hat{\xi}\|_2 \le b_0 \|\hat{\xi}\|_2 \|T_1(\hat{x}, \hat{\eta}) - T_2(\hat{x}, \hat{\eta})\|_2$
$\le b_1 \|\hat{\xi}\|_2 \|\hat{\eta}\|_2$   (since $T_1(\hat{x}, 0) = T_2(\hat{x}, 0)$ and both $T_1$ and $T_2$ are smooth)
$\le b_2 \|\xi\| \|\eta\|$

for some constants $b_0$, $b_1$, and $b_2$.

The next lemma is proved in [GQA12, Lemma 14.5].

Lemma 3.8. Let $F$ be a $C^1$ vector field on a Riemannian manifold $\mathcal{M}$ and let $\bar{x} \in \mathcal{M}$ be a nondegenerate zero of $F$. Then there exist a neighborhood $\mathcal{U}$ of $\bar{x}$ and $a_5, a_6 > 0$ such that for all $x \in \mathcal{U}$,

$a_5\, \mathrm{dist}(x, \bar{x}) \le \|F(x)\| \le a_6\, \mathrm{dist}(x, \bar{x})$.

In the Euclidean case, the next lemma holds with $\tilde{a}_7 = 0$ and reduces to the Fundamental Theorem of Calculus.

Lemma 3.9. Let $F$ be a $C^1$ vector field on a Riemannian manifold $\mathcal{M}$, let $R$ be a retraction on $\mathcal{M}$, and let $\bar{x} \in \mathcal{M}$. Then there exist a neighborhood $\mathcal{U}$ of $\bar{x}$ and a constant $\tilde{a}_7$ such that for all $x, y \in \mathcal{U}$,

$\left\| P_\gamma^{0 \leftarrow 1} F(y) - F(x) - \left[ \int_0^1 P_\gamma^{0 \leftarrow t}\, \mathrm{D}F(\gamma(t))\, P_\gamma^{t \leftarrow 0}\, dt \right] \eta \right\| \le \tilde{a}_7 \|\eta\|^2$,

where $\eta = R_x^{-1}(y)$ and $P_\gamma$ is the parallel translation along the curve $\gamma$ given by $\gamma(t) = R_x(t\eta)$.

Proof. Define $G \colon [0,1] \to T_x\mathcal{M} \colon t \mapsto G(t) = P_\gamma^{0 \leftarrow t} F(\gamma(t))$. Observe that $G(0) = F(x)$ and $G(1) = P_\gamma^{0 \leftarrow 1} F(y)$. We have

$G'(t) = \frac{d}{d\epsilon} G(t+\epsilon)\big|_{\epsilon=0} = P_\gamma^{0 \leftarrow t} \frac{d}{d\epsilon} P_\gamma^{t \leftarrow t+\epsilon} F(\gamma(t+\epsilon))\big|_{\epsilon=0} = P_\gamma^{0 \leftarrow t}\, \mathrm{D}F(\gamma(t))\Big[\tfrac{d}{d\epsilon}\gamma(t+\epsilon)\big|_{\epsilon=0}\Big] = P_\gamma^{0 \leftarrow t}\, \mathrm{D}F(\gamma(t))[\mathcal{T}_{R(t\eta)}\eta]$,

where we have used an expression of the covariant derivative $\mathrm{D}$ in terms of the parallel translation $P$ (see, e.g., [Cha06, Theorem I.2.1]), and where $\mathcal{T}_{R(t\eta)}\eta = \frac{d}{dt}R(t\eta)$. Since $G(1) - G(0) = \int_0^1 G'(t)\, dt$, we obtain

$\left\| P_\gamma^{0\leftarrow 1} F(y) - F(x) - \int_0^1 P_\gamma^{0\leftarrow t}\, \mathrm{D}F(\gamma(t))\, P_\gamma^{t\leftarrow 0}\, \eta\, dt \right\|
= \left\| \int_0^1 P_\gamma^{0\leftarrow t}\, \mathrm{D}F(\gamma(t)) \big( \mathcal{T}_{R(t\eta)}\eta - P_\gamma^{t\leftarrow 0}\eta \big)\, dt \right\|
\le \int_0^1 \|\mathrm{D}F(\gamma(t))\| \, \big\| P_\gamma^{0\leftarrow t} \mathcal{T}_{R(t\eta)}\eta - \mathcal{T}_{R(t\eta)}^{-1}\mathcal{T}_{R(t\eta)}\eta \big\|\, dt
\le b_0 \|\eta\|^2$,   (by Lemma 3.7)

where $b_0$ is some constant.

Lemma 3.10. Suppose Assumptions 3.2 and 3.3 hold. Then there exist a neighborhood $\mathcal{U}$ of $x^*$ and a constant $a_7$ such that for all $x_1, \bar{x}_1, x_2, \bar{x}_2 \in \mathcal{U}$, we have

$|g(\mathcal{T}_{S_\zeta} \xi_1, y_2) - g(\mathcal{T}_{S_\zeta} y_1, \xi_2)| \le a_7 \max\{\mathrm{dist}(x_1, x^*), \mathrm{dist}(x_2, x^*), \mathrm{dist}(\bar{x}_1, x^*), \mathrm{dist}(\bar{x}_2, x^*)\}\, \|\xi_1\| \|\xi_2\|$,

where $\zeta = R_{x_1}^{-1}(x_2)$, $\xi_1 = R_{x_1}^{-1}(\bar{x}_1)$, $\xi_2 = R_{x_2}^{-1}(\bar{x}_2)$, $y_1 = \mathcal{T}_{S_{\xi_1}}^{-1} \mathrm{grad}\,f(\bar{x}_1) - \mathrm{grad}\,f(x_1)$, and $y_2 = \mathcal{T}_{S_{\xi_2}}^{-1} \mathrm{grad}\,f(\bar{x}_2) - \mathrm{grad}\,f(x_2)$.

Proof. Define $\bar{y}_1 = P_{\gamma_1}^{0\leftarrow 1} \mathrm{grad}\,f(\bar{x}_1) - \mathrm{grad}\,f(x_1)$ and $\bar{y}_2 = P_{\gamma_2}^{0\leftarrow 1} \mathrm{grad}\,f(\bar{x}_2) - \mathrm{grad}\,f(x_2)$, where $P$ is the parallel translation, $\gamma_1(t) = R_{x_1}(t\xi_1)$, and $\gamma_2(t) = R_{x_2}(t\xi_2)$. From Lemma 3.9, we have

$\|\bar{y}_1 - \bar{H}_1(x_1, \bar{x}_1)\xi_1\| \le b_0 \|\xi_1\|^2$ and $\|\bar{y}_2 - \bar{H}_2(x_2, \bar{x}_2)\xi_2\| \le b_0 \|\xi_2\|^2$,   (14)

where $\bar{H}_1(x_1, \bar{x}_1) = \int_0^1 P_{\gamma_1}^{0\leftarrow t} \mathrm{Hess}\,f(\gamma_1(t)) P_{\gamma_1}^{t\leftarrow 0}\, dt$, $\bar{H}_2(x_2, \bar{x}_2) = \int_0^1 P_{\gamma_2}^{0\leftarrow t} \mathrm{Hess}\,f(\gamma_2(t)) P_{\gamma_2}^{t\leftarrow 0}\, dt$, and $b_0$ is a constant. It follows that

$|g(\mathcal{T}_{S_\zeta}\xi_1, y_2) - g(\mathcal{T}_{S_\zeta} y_1, \xi_2)|$
$\le |g(\mathcal{T}_{S_\zeta}\xi_1, \bar{y}_2) - g(\mathcal{T}_{S_\zeta}\bar{y}_1, \xi_2)| + |g(\mathcal{T}_{S_\zeta}\xi_1, y_2 - \bar{y}_2)| + |g(\mathcal{T}_{S_\zeta}(y_1 - \bar{y}_1), \xi_2)|$
$\le |g(\mathcal{T}_{S_\zeta}\xi_1, \bar{H}_2(x_2,\bar{x}_2)\xi_2) - g(\mathcal{T}_{S_\zeta}\bar{H}_1(x_1,\bar{x}_1)\xi_1, \xi_2)| + b_1(\|\xi_1\| + \|\xi_2\|)\|\xi_1\|\|\xi_2\|$   (by (14))
$\quad + |g(\mathcal{T}_{S_\zeta}\xi_1, \mathcal{T}_{S_{\xi_2}}^{-1}\mathrm{grad}\,f(\bar{x}_2) - P_{\gamma_2}^{0\leftarrow 1}\mathrm{grad}\,f(\bar{x}_2))| + |g(\mathcal{T}_{S_\zeta}(\mathcal{T}_{S_{\xi_1}}^{-1}\mathrm{grad}\,f(\bar{x}_1) - P_{\gamma_1}^{0\leftarrow 1}\mathrm{grad}\,f(\bar{x}_1)), \xi_2)|$
$\le |g(\mathcal{T}_{S_\zeta}\xi_1, \bar{H}_2(x_2,\bar{x}_2)\xi_2) - g(\mathcal{T}_{S_\zeta}\bar{H}_1(x_1,\bar{x}_1)\xi_1, \xi_2)| + b_1(\|\xi_1\| + \|\xi_2\|)\|\xi_1\|\|\xi_2\| + b_2\|\xi_1\|\|\xi_2\|\|\mathrm{grad}\,f(\bar{x}_2)\| + b_3\|\xi_1\|\|\xi_2\|\|\mathrm{grad}\,f(\bar{x}_1)\|$   (by Lemma 3.7)
$\le |g(\bar{H}_2(x_2,\bar{x}_2)\mathcal{T}_{S_\zeta}\xi_1, \xi_2) - g(\mathcal{T}_{S_\zeta}\bar{H}_1(x_1,\bar{x}_1)\xi_1, \xi_2)|$   (the average Hessian is also self-adjoint)
$\quad + b_4\|\xi_1\|\|\xi_2\|(\mathrm{dist}(x_1,\bar{x}_1) + \mathrm{dist}(x_2,\bar{x}_2) + \mathrm{dist}(\bar{x}_2,x^*) + \mathrm{dist}(\bar{x}_1,x^*))$   (by Lemmas 3.4 and 3.8)
$\le |g(\bar{H}_2(x_2,\bar{x}_2)\mathcal{T}_{S_\zeta}\xi_1, \xi_2) - g(\mathcal{T}_{S_\zeta}\bar{H}_1(x_1,\bar{x}_1)\xi_1, \xi_2)| + b_5\|\xi_1\|\|\xi_2\|\max\{\mathrm{dist}(x_1,x^*), \mathrm{dist}(x_2,x^*), \mathrm{dist}(\bar{x}_1,x^*), \mathrm{dist}(\bar{x}_2,x^*)\}$,   (by the triangle inequality of the distance)   (15)

where $b_1, b_2, b_3, b_4$ and $b_5$ are some constants. Using the hat to denote coordinate expressions, $T(\hat{x}_1, \hat{x}_2)$ to denote $\mathcal{T}_{S_\zeta}$, and $\hat{G}(\hat{x}_2)$ to denote the matrix expression of the Riemannian metric at $x_2$, we have

$|g(\bar{H}_2(x_2,\bar{x}_2)\mathcal{T}_{S_\zeta}\xi_1, \xi_2) - g(\mathcal{T}_{S_\zeta}\bar{H}_1(x_1,\bar{x}_1)\xi_1, \xi_2)|
= |\hat{\xi}_1^T T(\hat{x}_1,\hat{x}_2)^T \hat{\bar{H}}_2(\hat{x}_2,\hat{\bar{x}}_2)^T \hat{G}(\hat{x}_2) \hat{\xi}_2 - \hat{\xi}_1^T \hat{\bar{H}}_1(\hat{x}_1,\hat{\bar{x}}_1)^T T(\hat{x}_1,\hat{x}_2)^T \hat{G}(\hat{x}_2) \hat{\xi}_2|
\le \|\hat{\xi}_1\|_2\, \|T(\hat{x}_1,\hat{x}_2)^T \hat{\bar{H}}_2(\hat{x}_2,\hat{\bar{x}}_2)^T - \hat{\bar{H}}_1(\hat{x}_1,\hat{\bar{x}}_1)^T T(\hat{x}_1,\hat{x}_2)^T\|_2\, \|\hat{G}(\hat{x}_2)\|_2\, \|\hat{\xi}_2\|_2$,   (16)

where $\|\cdot\|_2$ denotes the Euclidean norm. Define a function $J(\hat{x}_1, \hat{\bar{x}}_1, \hat{x}_2, \hat{\bar{x}}_2) = T(\hat{x}_1,\hat{x}_2)^T \hat{\bar{H}}_2(\hat{x}_2,\hat{\bar{x}}_2)^T - \hat{\bar{H}}_1(\hat{x}_1,\hat{\bar{x}}_1)^T T(\hat{x}_1,\hat{x}_2)^T$. We can see that when $(\hat{x}_1^T, \hat{\bar{x}}_1^T) = (\hat{x}_2^T, \hat{\bar{x}}_2^T)$, $J = 0$. Since, in view of Assumption 3.3, $J$ is Lipschitz continuous, it follows that (16) becomes

$|g(\bar{H}_2 \mathcal{T}_{S_\zeta}\xi_1, \xi_2) - g(\mathcal{T}_{S_\zeta}\bar{H}_1\xi_1, \xi_2)| \le b_6 \|(\hat{x}_1^T, \hat{\bar{x}}_1^T) - (\hat{x}_2^T, \hat{\bar{x}}_2^T)\|_2\, \|\hat{\xi}_1\|_2 \|\hat{\xi}_2\|_2 \le b_7 \|\xi_1\|\|\xi_2\| \max\{\mathrm{dist}(x_1, x_2), \mathrm{dist}(\bar{x}_1, \bar{x}_2)\}$,

where $b_6, b_7$ are some constants. Combining this equation with (15), we obtain

$|g(\mathcal{T}_{S_\zeta}\xi_1, y_2) - g(\mathcal{T}_{S_\zeta}y_1, \xi_2)| \le b_8 \|\xi_1\|\|\xi_2\| \max\{\mathrm{dist}(x_1,x^*), \mathrm{dist}(x_2,x^*), \mathrm{dist}(\bar{x}_1,x^*), \mathrm{dist}(\bar{x}_2,x^*)\}$,

where $b_8$ is a constant.

Lemma 3.11. Let $\mathcal{M}$ be a Riemannian manifold endowed with a vector transport $\mathcal{T}$ with associated retraction $R$, and let $\bar{x} \in \mathcal{M}$. Then there are a neighborhood $\mathcal{U}$ of $\bar{x}$ and a constant $a_8$ such that for all $x, y \in \mathcal{U}$,

$\|\mathrm{id} - \mathcal{T}_\xi^{-1} \mathcal{T}_\eta^{-1} \mathcal{T}_\zeta\| \le a_8 \max(\mathrm{dist}(x, \bar{x}), \mathrm{dist}(y, \bar{x}))$,
$\|\mathrm{id} - \mathcal{T}_\zeta^{-1} \mathcal{T}_\eta \mathcal{T}_\xi\| \le a_8 \max(\mathrm{dist}(x, \bar{x}), \mathrm{dist}(y, \bar{x}))$,

where $\xi = R_{\bar{x}}^{-1}(x)$, $\eta = R_x^{-1}(y)$, $\zeta = R_{\bar{x}}^{-1}(y)$, $\mathrm{id}$ is the identity operator, and $\|\cdot\|$ is the operator norm induced by the Riemannian metric.

Proof. Let the hat denote coordinate expressions, chosen such that the matrix expression of the Riemannian metric at $\bar{x}$ is the identity. Let $L(x, y)$ denote the coordinate expression of $\mathcal{T}_{R_x^{-1}(y)}$. We have $\|\mathrm{id} - \mathcal{T}_\xi^{-1}\mathcal{T}_\eta^{-1}\mathcal{T}_\zeta\| = \|I - L(\bar{x}, x)^{-1} L(x, y)^{-1} L(\bar{x}, y)\|$. Define a function $J(\bar{x}, \xi, \zeta) = I - L(\bar{x}, R_{\bar{x}}(\xi))^{-1} L(R_{\bar{x}}(\xi), R_{\bar{x}}(\zeta))^{-1} L(\bar{x}, R_{\bar{x}}(\zeta))$. Notice that $J$ is a smooth function and $J(\bar{x}, 0_{\bar{x}}, 0_{\bar{x}}) = 0$. So

$\|J(\bar{x}, \xi, \zeta)\| = \|J(\bar{x}, \xi, \zeta) - J(\bar{x}, 0_{\bar{x}}, 0_{\bar{x}})\| = \|\hat{J}(\hat{\bar{x}}, \hat{\xi}, \hat{\zeta}) - \hat{J}(\hat{\bar{x}}, \hat{0}_{\bar{x}}, \hat{0}_{\bar{x}})\|_2 \le b_0(\|\hat{\xi}\|_2 + \|\hat{\zeta}\|_2)$   (smoothness of $J$)
$\le b_1(\mathrm{dist}(x, \bar{x}) + \mathrm{dist}(y, \bar{x}))$   (by Lemma 3.4)
$\le b_2 \max(\mathrm{dist}(x, \bar{x}), \mathrm{dist}(y, \bar{x}))$,

where $b_0, b_1$ and $b_2$ are some constants and $\|\cdot\|_2$ denotes the Euclidean norm. So $\|\mathrm{id} - \mathcal{T}_\xi^{-1}\mathcal{T}_\eta^{-1}\mathcal{T}_\zeta\| \le b_2 \max(\mathrm{dist}(x, \bar{x}), \mathrm{dist}(y, \bar{x}))$. This concludes the first part of the proof. The second part of the result follows from a similar argument.

The next lemma generalizes [CGT91, Lemma 1]. It is instrumental in the proof of Lemma 3.14 below. In the Euclidean setting, it is possible to give an expression for $a_9$ and $a_{10}$ in terms of $c$ of Assumption 3.3 and $\nu$ of Assumption 3.5. In the Riemannian setting, we could not obtain such an expression, in part because the constant $b_2$ that appears in the proof below is no longer zero. However, the existence of $a_9$ and $a_{10}$ can still be shown, under the assumption that $\{x_k\}$ converges to $x^*$, and this is all we need in order to carry on with Lemma 3.14.

Lemma 3.12. Suppose Assumptions 3.1, 3.2, 3.3, and 3.5 hold. Then

$y_j - \tilde{B}_{j+1} s_j = 0$   (17)

for all $j$. Moreover, there exist constants $a_9$ and $a_{10}$ such that for all $j$ and $i \ge j+1$,

$\|y_j - (B_i)_j s_j\| \le a_9 a_{10}^{i-j-2} \epsilon_{i,j} \|s_j\|$,   (18)

where $\epsilon_{i,j} = \max_{j \le k \le i} \mathrm{dist}(x_k, x^*)$ and $(B_i)_j = \mathcal{T}_{S_{\zeta_{j,i}}}^{-1} B_i \mathcal{T}_{S_{\zeta_{j,i}}}$ with $\zeta_{j,i} = R_{x_j}^{-1}(x_i)$.

Proof. From (3), we have

$\tilde{B}_{j+1} s_j = \left( B_j + \frac{(y_j - B_j s_j)(y_j - B_j s_j)^\flat}{g(s_j, y_j - B_j s_j)} \right) s_j = y_j$.

This yields (17), as well as (18) with $i = j+1$. The proof of (18) for $i > j+1$ is by induction. We choose $k \ge j+1$ and assume that (18) holds for all $i = j+1, \ldots, k$. Let $r_k = y_k - B_k s_k$. We have

$|g(r_k, \mathcal{T}_{S_{\zeta_{j,k}}} s_j)| = |g(y_k - B_k s_k, \mathcal{T}_{S_{\zeta_{j,k}}} s_j)|$
$\le |g(y_k, \mathcal{T}_{S_{\zeta_{j,k}}} s_j) - g(s_k, \mathcal{T}_{S_{\zeta_{j,k}}} y_j)| + |g(s_k, \mathcal{T}_{S_{\zeta_{j,k}}}(y_j - (B_k)_j s_j))| + |g(s_k, \mathcal{T}_{S_{\zeta_{j,k}}}((B_k)_j s_j)) - g(B_k s_k, \mathcal{T}_{S_{\zeta_{j,k}}} s_j)|$
$\le |g(y_k, \mathcal{T}_{S_{\zeta_{j,k}}} s_j) - g(s_k, \mathcal{T}_{S_{\zeta_{j,k}}} y_j)| + \|\mathcal{T}_{S_{\zeta_{j,k}}}(y_j - (B_k)_j s_j)\|\|s_k\| + |g(s_k, B_k \mathcal{T}_{S_{\zeta_{j,k}}} s_j) - g(B_k s_k, \mathcal{T}_{S_{\zeta_{j,k}}} s_j)|$
$\le |g(y_k, \mathcal{T}_{S_{\zeta_{j,k}}} s_j) - g(s_k, \mathcal{T}_{S_{\zeta_{j,k}}} y_j)| + b_0 a_9 a_{10}^{k-j-2} \epsilon_{k,j} \|s_j\| \|s_k\|$   ($B_k$ self-adjoint and induction assumption)
$\le b_0 a_9 a_{10}^{k-j-2} \epsilon_{k,j} \|s_j\| \|s_k\| + b_1 \epsilon_{k+1,j} \|s_k\| \|s_j\|$,   (by Lemma 3.10)

where $b_0$ and $b_1$ are some constants. It follows that

$\|y_j - (B_{k+1})_j s_j\| = \|y_j - \mathcal{T}_{S_{\zeta_{j,k+1}}}^{-1} B_{k+1} \mathcal{T}_{S_{\zeta_{j,k+1}}} s_j\| = \|y_j - \mathcal{T}_{S_{\zeta_{j,k+1}}}^{-1} \mathcal{T}_{S_{s_k}} \tilde{B}_{k+1} \mathcal{T}_{S_{s_k}}^{-1} \mathcal{T}_{S_{\zeta_{j,k+1}}} s_j\|$
$\le \|y_j - \mathcal{T}_{S_{\zeta_{j,k}}}^{-1} \tilde{B}_{k+1} \mathcal{T}_{S_{\zeta_{j,k}}} s_j\| + \|\mathcal{T}_{S_{\zeta_{j,k}}}^{-1} \tilde{B}_{k+1} \mathcal{T}_{S_{\zeta_{j,k}}} s_j - \mathcal{T}_{S_{\zeta_{j,k+1}}}^{-1} \mathcal{T}_{S_{s_k}} \tilde{B}_{k+1} \mathcal{T}_{S_{s_k}}^{-1} \mathcal{T}_{S_{\zeta_{j,k+1}}} s_j\|$
$\le \left\| y_j - \left( (B_k)_j + \mathcal{T}_{S_{\zeta_{j,k}}}^{-1} \frac{(r_k)(r_k)^\flat}{g(s_k, r_k)} \mathcal{T}_{S_{\zeta_{j,k}}} \right) s_j \right\| + b_2 \epsilon_{k+1,j} \|s_j\|$   (by Lemma 3.11, Assumption 3.1, and (3))
$\le \|y_j - (B_k)_j s_j\| + b_3 \frac{|g(r_k, \mathcal{T}_{S_{\zeta_{j,k}}} s_j)|}{\|s_k\|} + b_2 \epsilon_{k+1,j} \|s_j\|$   (by Assumption 3.5)
$\le a_9 a_{10}^{k-j-2} \epsilon_{k,j} \|s_j\| + b_3 b_0 a_9 a_{10}^{k-j-2} \epsilon_{k,j} \|s_j\| + b_3 b_1 \epsilon_{k+1,j} \|s_j\| + b_2 \epsilon_{k+1,j} \|s_j\|$
$\le (a_9 a_{10}^{k-j-2} + b_3 b_0 a_9 a_{10}^{k-j-2} + b_3 b_1 + b_2)\, \epsilon_{k+1,j} \|s_j\|$,   (note that $\epsilon_{k,j} \le \epsilon_{k+1,j}$)

where $b_2, b_3$ are some constants. Because $b_0, b_1, b_2$ and $b_3$ are independent of $a_9$ and $a_{10}$, we can choose $a_9$ and $a_{10}$ large enough such that

$(a_9 a_{10}^{k-j-2} + b_3 b_0 a_9 a_{10}^{k-j-2} + b_3 b_1 + b_2) \le a_9 a_{10}^{k+1-j-2}$ for all $j$ and $k \ge j+1$.

Take for example $a_9 > 1$ and $a_{10} > 1 + b_3 b_0 + b_3 b_1 + b_2$. Therefore

$\|y_j - (B_{k+1})_j s_j\| \le a_9 a_{10}^{k+1-j-2} \epsilon_{k+1,j} \|s_j\|$.

This concludes the argument by induction.

Lemma 3.13. If Assumption 3.3 holds, then there exist a neighborhood $\mathcal{U}$ of $x^*$ and a constant $a_{11}$ such that for all $x_1, x_2 \in \mathcal{U}$, the inequality

$\|y - \mathcal{T}_{S_{\zeta_1}} \mathrm{Hess}\,f(x^*)\, \mathcal{T}_{S_{\zeta_1}}^{-1} s\| \le a_{11} \|s\| \max\{\mathrm{dist}(x_1, x^*), \mathrm{dist}(x_2, x^*)\}$

holds, where $\zeta_1 = R_{x^*}^{-1}(x_1)$, $s = R_{x_1}^{-1}(x_2)$, and $y = \mathcal{T}_{S_s}^{-1} \mathrm{grad}\,f(x_2) - \mathrm{grad}\,f(x_1)$.

Proof. Define $\bar{y} = P_\gamma^{0\leftarrow 1} \mathrm{grad}\,f(x_2) - \mathrm{grad}\,f(x_1)$, where $P$ is the parallel translation along the curve $\gamma$ defined by $\gamma(t) = R_{x_1}(ts)$. From Lemma 3.9, we have

$\|\bar{y} - \bar{H}s\| \le b_0 \|s\|^2$,   (19)

where $\bar{H} = \int_0^1 P_\gamma^{0\leftarrow t} \mathrm{Hess}\,f(\gamma(t)) P_\gamma^{t\leftarrow 0}\, dt$ and $b_0$ is a constant. We then have

$\|y - \mathcal{T}_{S_{\zeta_1}} \mathrm{Hess}\,f(x^*)\, \mathcal{T}_{S_{\zeta_1}}^{-1} s\| \le \|y - \bar{y}\| + \|\bar{y} - \bar{H}s\| + \|\bar{H}s - \mathcal{T}_{S_{\zeta_1}} \mathrm{Hess}\,f(x^*)\, \mathcal{T}_{S_{\zeta_1}}^{-1} s\|$
$= \|\mathcal{T}_{S_s}^{-1} \mathrm{grad}\,f(x_2) - P_\gamma^{0\leftarrow 1} \mathrm{grad}\,f(x_2)\| + b_0 \|s\|^2 + \|\bar{H} - \mathcal{T}_{S_{\zeta_1}} \mathrm{Hess}\,f(x^*)\, \mathcal{T}_{S_{\zeta_1}}^{-1}\| \|s\|$
$\le b_1 \|s\| \max\{\mathrm{dist}(x_1, x^*), \mathrm{dist}(x_2, x^*)\} + b_0 \|s\|^2$   (by Lemma 3.7)
$\quad + \left( \left\| \int_0^1 P_\gamma^{0\leftarrow t} \mathrm{Hess}\,f(\gamma(t)) P_\gamma^{t\leftarrow 0}\, dt - \mathrm{Hess}\,f(x_1) \right\| + \|\mathrm{Hess}\,f(x_1) - \mathcal{T}_{S_{\zeta_1}} \mathrm{Hess}\,f(x^*)\, \mathcal{T}_{S_{\zeta_1}}^{-1}\| \right) \|s\|$
$\le b_2 \|s\| \max\{\mathrm{dist}(x_1, x^*), \mathrm{dist}(x_2, x^*)\}$,   (by Assumption 3.3)

where $b_1$ and $b_2$ are some constants.

With these technical lemmas in place, we now start the Riemannian generalization of the sequence of lemmas in [BKS96] that leads to the main result [BKS96, Theorem 2.7], generalized here as the final theorem of this section. For an easier comparison with [BKS96], in the rest of the convergence analysis we let $n$ (instead of $d$) denote the dimension of the manifold $\mathcal{M}$.

The next lemma generalizes [BKS96, Lemma 2.3], itself a slight variation of [KBS93, Lemma 3.2]. The proof of [BKS96, Lemma 2.3] involves considering the span of a few $s_j$'s. In the Riemannian setting, a difficulty arises from the fact that the $s_j$'s are not in the same tangent space. We overcome this difficulty by transporting the $s_j$'s to $T_{x^*}\mathcal{M}$.

Lemma 3.14. Let $s_k$ be such that $R_{x_k}(s_k) \to x^*$. If Assumptions 3.1, 3.2, 3.3, and 3.5 hold, then there exists $K \ge 0$ such that for any set of $n+1$ steps $S = \{s_{k_j} : K \le k_1 < \ldots < k_{n+1}\}$, there exists an index $k_m$ with $m \in \{2, 3, \ldots, n+1\}$ such that

$\frac{\|(B_{k_m} - H_{k_m}) s_{k_m}\|}{\|s_{k_m}\|} < (a_{12} a_{10}^n + \bar{a}_{12})\, \epsilon_S^{1/n}$,

where $\epsilon_S = \max_{1 \le j \le n+1}\{\mathrm{dist}(x_{k_j}, x^*), \mathrm{dist}(R_{x_{k_j}}(s_{k_j}), x^*)\}$, $H_{k_m} = \mathcal{T}_{S_{\zeta_m}} \mathrm{Hess}\,f(x^*)\, \mathcal{T}_{S_{\zeta_m}}^{-1}$, $\zeta_m = R_{x^*}^{-1}(x_{k_m})$, $a_{12}, \bar{a}_{12}$ are some constants, and $n$ is the dimension of the manifold.

Proof. Given $S$, for $j = 1, 2, \ldots, n+1$, define

$S_j = \left[ \frac{\bar{s}_1}{\|\bar{s}_1\|}, \frac{\bar{s}_2}{\|\bar{s}_2\|}, \ldots, \frac{\bar{s}_j}{\|\bar{s}_j\|} \right]$,

where $\bar{s}_i = \mathcal{T}_{S_{\zeta_i}}^{-1} s_{k_i}$ and $\zeta_i = R_{x^*}^{-1}(x_{k_i})$, $i = 1, 2, \ldots, j$.

The proof is organized as follows. We will first obtain in (28) that there exist $m \in [2, n+1]$, $u \in \mathbb{R}^{m-1}$ and $w \in T_{x^*}\mathcal{M}$ such that $\bar{s}_m/\|\bar{s}_m\| = S_{m-1}u - w$, $S_{m-1}$ has full column rank and is well conditioned, and $\|w\|$ is small. We will also obtain in (30) that $\|(\mathcal{T}_{S_{\zeta_m}}^{-1} B_{k_m} \mathcal{T}_{S_{\zeta_m}} - \mathrm{Hess}\,f(x^*))S_{m-1}\|$ is small due to the Hessian-approximating properties of the SR1 update established above. The conclusion follows from these two results.

Let $G$ denote the matrix expression of the inner product on $T_{x^*}\mathcal{M}$ and let $\hat{S}_j$ denote the coordinate expression of $S_j$, for $j \in \{1, \ldots, n\}$. Let $\kappa_j$ be the smallest singular value of $G^{1/2}\hat{S}_j$ and define $\kappa_{n+1} = 0$. We have $1 = \kappa_1 \ge \kappa_2 \ge \ldots \ge \kappa_{n+1} = 0$. Let $m$ be the smallest integer for which

$\frac{\kappa_m}{\kappa_{m-1}} < \epsilon_S^{1/n}$.   (20)

Since $m \le n+1$ and $\kappa_1 = 1$, we have

$\kappa_{m-1} = \kappa_1 \left(\frac{\kappa_2}{\kappa_1}\right) \cdots \left(\frac{\kappa_{m-1}}{\kappa_{m-2}}\right) > \epsilon_S^{(m-2)/n} \ge \epsilon_S^{(n-1)/n}$.   (21)

Since $x_k \to x^*$ and $R_{x_k}(s_k) \to x^*$, we can assume that $\epsilon_S \in (0, (\tfrac{1}{4})^n)$. Now, we choose $z \in \mathbb{R}^m$ such that

$\|G^{1/2}\hat{S}_m z\|_2 = \kappa_m \|z\|_2$   (22)

and $z = \begin{pmatrix} u \\ -1 \end{pmatrix}$, where $u \in \mathbb{R}^{m-1}$. (The last component of $z$ is nonzero because $m$ is the smallest integer such that (20) is true.) Let $w = S_m z$ and let $\hat{w} = \hat{S}_m z$ be its coordinate expression. From the definition of $G^{1/2}\hat{S}_m$ and $z$, we have

$G^{1/2}\hat{S}_{m-1} u - G^{1/2}\hat{w} = \frac{G^{1/2}\hat{\bar{s}}_m}{\|G^{1/2}\hat{\bar{s}}_m\|_2}$,   (23)

where $\hat{\bar{s}}_m$ is the coordinate expression of $\bar{s}_m$. Since $\kappa_{m-1}$ is the smallest singular value of $G^{1/2}\hat{S}_{m-1}$, we have that

$\|u\|_2 \le \frac{1}{\kappa_{m-1}} \|G^{1/2}\hat{S}_{m-1}u\|_2 = \frac{1}{\kappa_{m-1}} \left\| G^{1/2}\hat{w} + \frac{G^{1/2}\hat{\bar{s}}_m}{\|G^{1/2}\hat{\bar{s}}_m\|_2} \right\|_2 \le \frac{\|G^{1/2}\hat{w}\|_2 + 1}{\kappa_{m-1}} = \frac{\|w\| + 1}{\kappa_{m-1}}$   (24)
$< \frac{\|w\| + 1}{\epsilon_S^{(n-1)/n}}$.   (by (21))   (25)

Using (22) and (24), we have that

$\|w\|^2 = \|G^{1/2}\hat{w}\|_2^2 = \|G^{1/2}\hat{S}_m z\|_2^2 = \kappa_m^2 \|z\|_2^2 = \kappa_m^2 (1 + \|u\|_2^2) \le \kappa_m^2 + \left(\frac{\kappa_m}{\kappa_{m-1}}\right)^2 (\|G^{1/2}\hat{w}\|_2 + 1)^2 = \kappa_m^2 + \left(\frac{\kappa_m}{\kappa_{m-1}}\right)^2 (\|w\| + 1)^2$.

Therefore, since (20) implies that $\kappa_m < \epsilon_S^{1/n}$, using (20),

$\|w\|^2 < \epsilon_S^{2/n} + \epsilon_S^{2/n}(\|w\| + 1)^2 < 4\epsilon_S^{2/n}(\|w\| + 1)^2$.   (26)

This implies

$\|w\|(1 - 2\epsilon_S^{1/n}) < 2\epsilon_S^{1/n}$,

and hence $\|w\| < 1$, since $\epsilon_S < (\tfrac{1}{4})^n$. Therefore, (25) and (26) imply that

$\|u\|_2 < \frac{2}{\epsilon_S^{(n-1)/n}}$,   (27)
$\|w\| < 4\epsilon_S^{1/n}$.   (28)

Equation (28) is the announced result that $\|w\|$ is small. The bound (27) will also be invoked below.

Now we show that $\|(\mathcal{T}_{S_{\zeta_j}}^{-1} B_{k_j} \mathcal{T}_{S_{\zeta_j}} - \mathrm{Hess}\,f(x^*)) S_{j-1}\|$ is small for all $j \in [2, n+1]$ (and thus in particular for $j = m$). By Lemma 3.12, we have

$\|y_{k_i} - (B_{k_j})_{k_i} s_{k_i}\| \le a_9 a_{10}^{k_j - k_i - 2} \epsilon_{k_j, k_i} \|s_{k_i}\| \le a_9 a_{10}^n \epsilon_S \|s_{k_i}\|$   (29)

for all $i \in \{1, 2, \ldots, j-1\}$. Therefore,

$\frac{\|(\mathcal{T}_{S_{\zeta_j}}^{-1} B_{k_j} \mathcal{T}_{S_{\zeta_j}} - \mathrm{Hess}\,f(x^*)) \bar{s}_i\|}{\|\bar{s}_i\|}
\le \frac{\|\mathcal{T}_{S_{\zeta_i}}^{-1} y_{k_i} - \mathcal{T}_{S_{\zeta_j}}^{-1} B_{k_j} \mathcal{T}_{S_{\zeta_j}} \bar{s}_i\|}{\|\bar{s}_i\|} + \frac{\|\mathcal{T}_{S_{\zeta_i}}^{-1} y_{k_i} - \mathrm{Hess}\,f(x^*) \bar{s}_i\|}{\|\bar{s}_i\|}$
$\le \frac{\|\mathcal{T}_{S_{\zeta_i}}^{-1}(y_{k_i} - \mathcal{T}_{S_{\zeta_i}} \mathcal{T}_{S_{\zeta_j}}^{-1} B_{k_j} \mathcal{T}_{S_{\zeta_j}} \mathcal{T}_{S_{\zeta_i}}^{-1} s_{k_i})\|}{\|s_{k_i}\|} + b_1 \epsilon_S$   (by Lemma 3.13)
$\le b_2 \frac{\|y_{k_i} - (B_{k_j})_{k_i} s_{k_i}\|}{\|s_{k_i}\|} + b_3 \epsilon_S$   (by Lemma 3.11 and Assumption 3.1)
$\le (b_4 a_{10}^n + b_3)\epsilon_S$,   (by (29))

where $b_1, b_2, b_3$ and $b_4$ are some constants. Therefore, we have that for any $j \in [2, n+1]$,

$\|(\mathcal{T}_{S_{\zeta_j}}^{-1} B_{k_j} \mathcal{T}_{S_{\zeta_j}} - \mathrm{Hess}\,f(x^*)) S_{j-1}\|_{g,2} \le b_5 \epsilon_S$,   (30)

where $b_5 = \sqrt{n}(b_4 a_{10}^n + b_3)$ and $\|\cdot\|_{g,2}$ is the norm induced by the Riemannian metric $g$ and the Euclidean norm, i.e., $\|A\|_{g,2} = \sup_{v \ne 0} \|Av\|/\|v\|_2$ with $\|\cdot\|$ defined in (5).

We can now conclude the proof as follows. Using (23) and (30) with $j = m$, together with (27) and (28), we have

$\frac{\|(\mathcal{T}_{S_{\zeta_m}}^{-1} B_{k_m} \mathcal{T}_{S_{\zeta_m}} - \mathrm{Hess}\,f(x^*))\bar{s}_m\|}{\|\bar{s}_m\|} = \|(\mathcal{T}_{S_{\zeta_m}}^{-1} B_{k_m} \mathcal{T}_{S_{\zeta_m}} - \mathrm{Hess}\,f(x^*))(S_{m-1}u - w)\|$
$\le \|(\mathcal{T}_{S_{\zeta_m}}^{-1} B_{k_m} \mathcal{T}_{S_{\zeta_m}} - \mathrm{Hess}\,f(x^*))S_{m-1}\|_{g,2} \|u\|_2 + \|\mathcal{T}_{S_{\zeta_m}}^{-1} B_{k_m} \mathcal{T}_{S_{\zeta_m}} - \mathrm{Hess}\,f(x^*)\| \|w\|$
$\le b_5 \epsilon_S \frac{2}{\epsilon_S^{(n-1)/n}} + (M + \|\mathrm{Hess}\,f(x^*)\|)\, 4\epsilon_S^{1/n}$   (by Assumption 3.1)
$\le (2b_5 + b_6)\epsilon_S^{1/n}$,

where $b_6$ is some constant. Finally,

$\frac{\|(B_{k_m} - H_{k_m})s_{k_m}\|}{\|s_{k_m}\|} = \frac{\|(B_{k_m} - \mathcal{T}_{S_{\zeta_m}} \mathrm{Hess}\,f(x^*)\, \mathcal{T}_{S_{\zeta_m}}^{-1})s_{k_m}\|}{\|s_{k_m}\|} = \frac{\|(\mathcal{T}_{S_{\zeta_m}}^{-1} B_{k_m} \mathcal{T}_{S_{\zeta_m}} - \mathrm{Hess}\,f(x^*))\bar{s}_m\|}{\|\bar{s}_m\|} \le (2b_5 + b_6)\epsilon_S^{1/n}$.

The next lemma generalizes [BKS96, Lemma 2.4]. Its proof is a translation of the proof of [BKS96, Lemma 2.4], where we invoke two manifold-specific results: the equality of $\mathrm{Hess}\,f(x^*)$ and $\mathrm{Hess}(f \circ R_{x^*})(0_{x^*})$ (which holds in view of [AMS08, Proposition 5.5.6] since $x^*$ is a critical point of $f$), and the bound in Lemma 3.4 on the retraction $R$.

Lemma 3.15. Suppose that Assumptions 3.1, 3.2, 3.3, 3.4, 3.5 and 3.6 hold and the trust-region subproblem (2) is solved accurately enough for (9) to hold. Then there exists $N$ such that for any set of $p > n$ consecutive steps $s_{k+1}, s_{k+2}, \ldots, s_{k+p}$ with $k \ge N$, there exists a set $\mathcal{G}$ of at least $p - n$ indices contained in the set $\{i : k+1 \le i \le k+p\}$ such that for all $j \in \mathcal{G}$,

$\frac{\|(B_j - H_j)s_j\|}{\|s_j\|} < a_{13}\,\epsilon_k^{1/n}$,

where $a_{13} = a_{12}a_{10}^p + \bar{a}_{12}$, $H_j = \mathcal{T}_{S_{\zeta_j}} \mathrm{Hess}\,f(x^*)\, \mathcal{T}_{S_{\zeta_j}}^{-1}$, $\zeta_j = R_{x^*}^{-1}(x_j)$, and

$\epsilon_k = \max_{k+1 \le j \le k+p} \{\mathrm{dist}(x_j, x^*), \mathrm{dist}(R_{x_j}(s_j), x^*)\}$.

Furthermore, for $k$ sufficiently large, if $j \in \mathcal{G}$, then

$\|s_j\| < a_{14}\, \mathrm{dist}(x_j, x^*)$,   (31)

where $a_{14}$ is a constant, and

$\rho_j > \tfrac{3}{4}$.   (32)

Proof. By Lemma 3.6, $\|s_k\| \to 0$. Therefore, by Lemma 3.14, applied to the set

$\{s_{k+1}, s_{k+2}, \ldots, s_{k+p}\}$,   (33)

there exists $N$ such that for any $k \ge N$ there exists an index $l_1$, with $k+1 \le l_1 \le k+p$, satisfying

$\frac{\|(B_{l_1} - H_{l_1})s_{l_1}\|}{\|s_{l_1}\|} < a_{13}\,\epsilon_k^{1/n}$,

where $a_{13} = a_{12}a_{10}^p + \bar{a}_{12}$. Now we can apply Lemma 3.14 to the set $\{s_{k+1}, s_{k+2}, \ldots, s_{k+p}\} \setminus \{s_{l_1}\}$ to get $l_2$. Repeating this $p - n$ times, we get a set of $p - n$ indices $\mathcal{G} = \{l_1, l_2, \ldots, l_{p-n}\}$ such that if $j \in \mathcal{G}$, then

$\|(B_j - H_j)s_j\| < a_{13}\,\epsilon_k^{1/n}\|s_j\|$.   (34)

We show (31) next. Consider $j \in \mathcal{G}$. By (34), we have

$|g(s_j, (H_j - B_j)s_j)| \le \|s_j\| \|(H_j - B_j)s_j\| \le a_{13}\,\epsilon_k^{1/n}\|s_j\|^2$.

Therefore,

$g(s_j, B_j s_j) \ge g(s_j, H_j s_j) - a_{13}\,\epsilon_k^{1/n}\|s_j\|^2 > b_0\|s_j\|^2$   (choosing $k$ large enough)

where $b_0$ is a constant, and we have

$0 \le m_j(0) - m_j(s_j) = -g(\mathrm{grad}\,f(x_j), s_j) - \tfrac{1}{2}g(s_j, B_j s_j) \le \|\mathrm{grad}\,f(x_j)\|\|s_j\| - \tfrac{1}{2}b_0\|s_j\|^2 \le b_1\, \mathrm{dist}(x_j, x^*)\|s_j\| - \tfrac{1}{2}b_0\|s_j\|^2$,   (by Lemma 3.8)

where $b_1$ is some constant. This yields (31).

Finally, we show (32). Let $j \in \mathcal{G}$ and define $\hat{f}_x(\eta) = f(R_x(\eta))$. It follows that

$f(x_j) - f(R_{x_j}(s_j)) - (m_j(0) - m_j(s_j)) = f(x_j) - f(R_{x_j}(s_j)) + g(\mathrm{grad}\,f(x_j), s_j) + \tfrac{1}{2}g(s_j, B_j s_j)$
$= \hat{f}_{x_j}(0_{x_j}) - \hat{f}_{x_j}(s_j) + g(\mathrm{grad}\,f(x_j), s_j) + \tfrac{1}{2}g(s_j, B_j s_j)$
$= \tfrac{1}{2}g(s_j, B_j s_j) - \int_0^1 g(\mathrm{Hess}\,\hat{f}_{x_j}(\tau s_j)[s_j], s_j)(1-\tau)\, d\tau$   (by Taylor's theorem)
$= \tfrac{1}{2}g(s_j, (B_j - H_j)s_j) + \int_0^1 \big(g(s_j, \mathcal{T}_{S_{\zeta_j}} \mathrm{Hess}\,f(x^*)\, \mathcal{T}_{S_{\zeta_j}}^{-1} s_j) - g(\mathrm{Hess}\,\hat{f}_{x_j}(\tau s_j)[s_j], s_j)\big)(1-\tau)\, d\tau$,

so that

$|f(x_j) - f(R_{x_j}(s_j)) - (m_j(0) - m_j(s_j))|
\le \tfrac{1}{2}\|s_j\| \|(B_j - H_j)s_j\| + \|s_j\|^2 \int_0^1 \|\mathcal{T}_{S_{\zeta_j}} \mathrm{Hess}\,\hat{f}_{x^*}(0_{x^*})\, \mathcal{T}_{S_{\zeta_j}}^{-1} - \mathrm{Hess}\,\hat{f}_{x_j}(\tau s_j)\|(1-\tau)\, d\tau$   (by [AMS08, Proposition 5.5.6])
$\le b_2 \|s_j\|^2 \epsilon_k^{1/n} + b_3 \|s_j\|^2 (\mathrm{dist}(x_j, x^*) + \|s_j\|)$   (by (34), Lemma 3.4 and Assumption 3.4)
$\le b_4 \|s_j\|^2 \epsilon_k^{1/n}$,   (by (31) and since $\mathrm{dist}(x_j, x^*)$ is eventually smaller than $\epsilon_k^{1/n}$)

where $b_2, b_3$ and $b_4$ are some constants. In view of (31) and Lemma 3.8, we have $\|s_j\| < b_5\|\mathrm{grad}\,f(x_j)\|$, where $b_5$ is some constant. Combining this with $\|s_j\| \le \Delta_j$, we obtain

$\|s_j\|^2 \le b_5\|\mathrm{grad}\,f(x_j)\| \min\{\Delta_j, b_5\|\mathrm{grad}\,f(x_j)\|\}$.

Noticing (9), we have

$f(x_j) - f(R_{x_j}(s_j)) - (m_j(0) - m_j(s_j)) \ge -b_6\,\epsilon_k^{1/n}(m_j(0) - m_j(s_j))$,

where $b_6$ is a constant. This implies (32).

The next result generalizes [BKS96, Lemma 2.5] in two ways: the Euclidean setting is extended to the Riemannian setting, and inexact solves are allowed by the presence of $\delta_k$. The main hurdle that we had to overcome in the Riemannian generalization is that the Euclidean identity $\mathrm{dist}(x_k + s_k, x^*) = \|s_k - \xi_k\|$, where $\xi_k$ stands for $x^* - x_k$, does not necessarily carry over. As we will see, Lemma 3.3 comes to our rescue.

Lemma 3.16. Suppose Assumptions 3.2 and 3.3 hold. If the quantities $e_k = \mathrm{dist}(x_k, x^*)$ and $\frac{\|(B_k - H_k)s_k\|}{\|s_k\|}$

are sufficiently small and if $B_k s_k = -\mathrm{grad}\,f(x_k) + \delta_k$ with $\|\delta_k\| \le \|\mathrm{grad}\,f(x_k)\|^{1+\theta}$, then

$\mathrm{dist}(R_{x_k}(s_k), x^*) \le a_{15} \frac{\|(B_k - H_k)s_k\|}{\|s_k\|} e_k + a_{16} e_k^{1+\min\{\theta,1\}}$,   (35)

$h(R_{x_k}(s_k)) \le a_{17} \frac{\|(B_k - H_k)s_k\|}{\|s_k\|} h(x_k) + a_{18} h^{1+\min\{\theta,1\}}(x_k)$,   (36)

and

$a_{19} h(x_k) \le e_k \le a_{20} h(x_k)$,   (37)

where $a_{15}, a_{16}, a_{17}, a_{18}, a_{19}$ and $a_{20}$ are some constants and $h(x) = (f(x) - f(x^*))^{1/2}$.

Proof. By definition of $s_k$, we have

$s_k = H_k^{-1}[(H_k - B_k)s_k - \mathrm{grad}\,f(x_k) + \delta_k]$.   (38)

Define $\xi_k = R_{x_k}^{-1}x^*$. Therefore, letting $\gamma$ be the curve defined by $\gamma(t) = R_{x_k}(t\xi_k)$ and $\bar{H}_k = \int_0^1 P_\gamma^{0\leftarrow t}\mathrm{Hess}\,f(\gamma(t))P_\gamma^{t\leftarrow 0}\,dt$, we have

$\|s_k - \xi_k\| = \|H_k^{-1}[(H_k - B_k)s_k - \mathrm{grad}\,f(x_k) + \delta_k - H_k\xi_k]\|$
$\le b_0\big(\|(H_k - B_k)s_k\| + \|\delta_k\| + \|P_\gamma^{0\leftarrow 1}\mathrm{grad}\,f(x^*) - \mathrm{grad}\,f(x_k) - \bar{H}_k\xi_k\| + \|\bar{H}_k\xi_k - \mathrm{Hess}\,f(x_k)\xi_k\| + \|\mathrm{Hess}\,f(x_k)\xi_k - H_k\xi_k\|\big)$
$\le b_0\|(H_k - B_k)s_k\| + b_0 b_1\|\xi_k\|^{1+\min\{\theta,1\}} + b_0\big(\|\bar{H}_k\xi_k - \mathrm{Hess}\,f(x_k)\xi_k\| + \|\mathrm{Hess}\,f(x_k)\xi_k - H_k\xi_k\|\big)$   (by Lemmas 3.8 and 3.9)
$\le b_0\|(H_k - B_k)s_k\| + b_0 b_1\|\xi_k\|^{1+\min\{\theta,1\}} + b_0 b_3\|\xi_k\|^2$   (by Assumption 3.3)
$\le b_0\|(H_k - B_k)s_k\| + b_4\|\xi_k\|^{1+\min\{\theta,1\}}$,   (39)

where $b_0$, $b_1$, $b_2$, $b_3$ and $b_4$ are some constants. From Lemma 3.3, we have

$\mathrm{dist}(R_{x_k}(s_k), x^*) = \mathrm{dist}(R_{x_k}(s_k), R_{x_k}(\xi_k)) \le b_5\|s_k - \xi_k\|$,   (40)

where $b_5$ is a constant. Combining (39) and (40) and using Lemma 3.4, we obtain

$\mathrm{dist}(R_{x_k}(s_k), x^*) \le b_0 b_5\|(H_k - B_k)s_k\| + b_4 b_5 e_k^{1+\min\{\theta,1\}}$.   (41)

From (38), if $\frac{\|(H_k - B_k)s_k\|}{\|s_k\|}$ is small enough that $\|H_k^{-1}(H_k - B_k)s_k\| \le \tfrac{1}{2}\|s_k\|$, we have

$\|s_k\| \le \tfrac{1}{2}\|s_k\| + \|H_k^{-1}\|(\|\mathrm{grad}\,f(x_k)\| + \|\mathrm{grad}\,f(x_k)\|^{1+\theta})$.

Using Lemma 3.8, this yields $\|s_k\| \le b_6\, \mathrm{dist}(x_k, x^*)$,

where $b_6$ is a constant. Using the latter in (41) yields

$\mathrm{dist}(R_{x_k}(s_k), x^*) \le b_0 b_5 b_6 \frac{\|(H_k - B_k)s_k\|}{\|s_k\|}\, \mathrm{dist}(x_k, x^*) + b_4 b_5 e_k^{1+\min\{\theta,1\}}$,

which shows (35).

We show (37) next. Define $\hat{f}_{x^*}(\eta) = f(R_{x^*}(\eta))$ and let $\zeta_k = R_{x^*}^{-1}(x_k)$. We have, for some $t \in (0,1)$,

$\hat{f}_{x^*}(\zeta_k) - \hat{f}_{x^*}(0_{x^*}) = g(\mathrm{grad}\,f(x^*), \zeta_k) + \tfrac{1}{2}g(\mathrm{Hess}\,\hat{f}_{x^*}(t\zeta_k)[\zeta_k], \zeta_k) = \tfrac{1}{2}g(\mathrm{Hess}\,\hat{f}_{x^*}(t\zeta_k)[\zeta_k], \zeta_k)$,

where we have used (Euclidean) Taylor's theorem to get the first equality and the fact that $x^*$ is a critical point of $f$ (Assumption 3.2) for the second one. Therefore, since $\mathrm{Hess}\,\hat{f}_{x^*}(0_{x^*}) = \mathrm{Hess}\,f(x^*)$ is positive definite (in view of [AMS08, Proposition 5.5.6] and Assumption 3.2), there exist $b_7$ and $b_8$ such that

$b_7(\hat{f}_{x^*}(\zeta_k) - \hat{f}_{x^*}(0_{x^*})) \le \|\zeta_k\|^2 \le b_8(\hat{f}_{x^*}(\zeta_k) - \hat{f}_{x^*}(0_{x^*}))$.

Then, using Lemma 3.4, we obtain that there exist $b_9$ and $b_{10}$ such that

$b_9(f(x_k) - f(x^*)) \le \mathrm{dist}(x_k, x^*)^2 \le b_{10}(f(x_k) - f(x^*))$.

In other words,

$b_9 h^2(x_k) \le e_k^2 \le b_{10}h^2(x_k)$,

and we have shown (37). Combining it with (35), we get (36).

With Lemmas 3.15 and 3.16 in place, the rest of the local convergence analysis is essentially a translation of the analysis in [BKS96]. The next lemma generalizes [BKS96, Lemma 2.6].

Lemma 3.17. If Assumptions 3.1, 3.2, 3.3, 3.4, 3.5, and 3.6 hold and the subproblem is solved accurately enough for (9) and (10) to hold, then

$\lim_{k\to\infty} \frac{h_k}{\Delta_k} = 0$,

where $h_k = h(x_k)$.

Proof. Let $p$ be the smallest integer greater than $2n + n(-\ln\tau_1/\ln\tau_2)$, where $\tau_1$ and $\tau_2$ are defined in Algorithm 1. Then

$\tau_1^n \tau_2^{p-2n} \ge 1$.   (42)

Applying Lemma 3.15 with this value of $p$, there exists $N$ such that if $k \ge N$, then there exists a set of at least $p - n$ indices, $\mathcal{G} \subseteq \{j : k+1 \le j \le k+p\}$, such that if $j \in \mathcal{G}$, then

$\frac{\|(B_j - H_j)s_j\|}{\|s_j\|} < a_{13}\,\epsilon_k^{1/n}$.   (43)

We now show that for such steps,

$\frac{h_{j+1}}{\Delta_{j+1}} \le \frac{1}{\tau_2}\, \frac{h_j}{\Delta_j}$.   (44)


Existence and uniqueness of solutions for a continuous-time opinion dynamics model with state-dependent connectivity Existence and uniqueness of solutions for a continuous-time opinion dynamics model with state-dependent connectivity Vincent D. Blondel, Julien M. Hendricx and John N. Tsitsilis July 24, 2009 Abstract

More information

Stochastic Optimization Algorithms Beyond SG

Stochastic Optimization Algorithms Beyond SG Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods

More information

Course 212: Academic Year Section 1: Metric Spaces

Course 212: Academic Year Section 1: Metric Spaces Course 212: Academic Year 1991-2 Section 1: Metric Spaces D. R. Wilkins Contents 1 Metric Spaces 3 1.1 Distance Functions and Metric Spaces............. 3 1.2 Convergence and Continuity in Metric Spaces.........

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

Methods that avoid calculating the Hessian. Nonlinear Optimization; Steepest Descent, Quasi-Newton. Steepest Descent

Methods that avoid calculating the Hessian. Nonlinear Optimization; Steepest Descent, Quasi-Newton. Steepest Descent Nonlinear Optimization Steepest Descent and Niclas Börlin Department of Computing Science Umeå University niclas.borlin@cs.umu.se A disadvantage with the Newton method is that the Hessian has to be derived

More information

Implicit Functions, Curves and Surfaces

Implicit Functions, Curves and Surfaces Chapter 11 Implicit Functions, Curves and Surfaces 11.1 Implicit Function Theorem Motivation. In many problems, objects or quantities of interest can only be described indirectly or implicitly. It is then

More information

Introduction and Preliminaries

Introduction and Preliminaries Chapter 1 Introduction and Preliminaries This chapter serves two purposes. The first purpose is to prepare the readers for the more systematic development in later chapters of methods of real analysis

More information

CALCULUS ON MANIFOLDS. 1. Riemannian manifolds Recall that for any smooth manifold M, dim M = n, the union T M =

CALCULUS ON MANIFOLDS. 1. Riemannian manifolds Recall that for any smooth manifold M, dim M = n, the union T M = CALCULUS ON MANIFOLDS 1. Riemannian manifolds Recall that for any smooth manifold M, dim M = n, the union T M = a M T am, called the tangent bundle, is itself a smooth manifold, dim T M = 2n. Example 1.

More information

Nonlinear Programming

Nonlinear Programming Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week

More information

Cubic regularization of Newton s method for convex problems with constraints

Cubic regularization of Newton s method for convex problems with constraints CORE DISCUSSION PAPER 006/39 Cubic regularization of Newton s method for convex problems with constraints Yu. Nesterov March 31, 006 Abstract In this paper we derive efficiency estimates of the regularized

More information

DEVELOPMENT OF MORSE THEORY

DEVELOPMENT OF MORSE THEORY DEVELOPMENT OF MORSE THEORY MATTHEW STEED Abstract. In this paper, we develop Morse theory, which allows us to determine topological information about manifolds using certain real-valued functions defined

More information

Chapter 4. Inverse Function Theorem. 4.1 The Inverse Function Theorem

Chapter 4. Inverse Function Theorem. 4.1 The Inverse Function Theorem Chapter 4 Inverse Function Theorem d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d dd d d d d This chapter

More information

arxiv: v2 [math.oc] 28 Apr 2018

arxiv: v2 [math.oc] 28 Apr 2018 Global rates of convergence for nonconvex optimization on manifolds arxiv:1605.08101v2 [math.oc] 28 Apr 2018 Nicolas Boumal P.-A. Absil Coralia Cartis May 1, 2018 Abstract We consider the minimization

More information

5 Quasi-Newton Methods

5 Quasi-Newton Methods Unconstrained Convex Optimization 26 5 Quasi-Newton Methods If the Hessian is unavailable... Notation: H = Hessian matrix. B is the approximation of H. C is the approximation of H 1. Problem: Solve min

More information

Quasi-Newton Methods

Quasi-Newton Methods Quasi-Newton Methods Werner C. Rheinboldt These are excerpts of material relating to the boos [OR00 and [Rhe98 and of write-ups prepared for courses held at the University of Pittsburgh. Some further references

More information

Higher-Order Methods

Higher-Order Methods Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth

More information

EXPOSITORY NOTES ON DISTRIBUTION THEORY, FALL 2018

EXPOSITORY NOTES ON DISTRIBUTION THEORY, FALL 2018 EXPOSITORY NOTES ON DISTRIBUTION THEORY, FALL 2018 While these notes are under construction, I expect there will be many typos. The main reference for this is volume 1 of Hörmander, The analysis of liner

More information

LECTURE 15: COMPLETENESS AND CONVEXITY

LECTURE 15: COMPLETENESS AND CONVEXITY LECTURE 15: COMPLETENESS AND CONVEXITY 1. The Hopf-Rinow Theorem Recall that a Riemannian manifold (M, g) is called geodesically complete if the maximal defining interval of any geodesic is R. On the other

More information

Elements of Linear Algebra, Topology, and Calculus

Elements of Linear Algebra, Topology, and Calculus Appendix A Elements of Linear Algebra, Topology, and Calculus A.1 LINEAR ALGEBRA We follow the usual conventions of matrix computations. R n p is the set of all n p real matrices (m rows and p columns).

More information

APPLICATIONS OF DIFFERENTIABILITY IN R n.

APPLICATIONS OF DIFFERENTIABILITY IN R n. APPLICATIONS OF DIFFERENTIABILITY IN R n. MATANIA BEN-ARTZI April 2015 Functions here are defined on a subset T R n and take values in R m, where m can be smaller, equal or greater than n. The (open) ball

More information

Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations

Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations Oleg Burdakov a,, Ahmad Kamandi b a Department of Mathematics, Linköping University,

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Programming, numerics and optimization

Programming, numerics and optimization Programming, numerics and optimization Lecture C-3: Unconstrained optimization II Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428

More information

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Coralia Cartis, University of Oxford INFOMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems Methods

More information

c 2007 Society for Industrial and Applied Mathematics

c 2007 Society for Industrial and Applied Mathematics SIAM J. OPTIM. Vol. 18, No. 1, pp. 106 13 c 007 Society for Industrial and Applied Mathematics APPROXIMATE GAUSS NEWTON METHODS FOR NONLINEAR LEAST SQUARES PROBLEMS S. GRATTON, A. S. LAWLESS, AND N. K.

More information

Maria Cameron. f(x) = 1 n

Maria Cameron. f(x) = 1 n Maria Cameron 1. Local algorithms for solving nonlinear equations Here we discuss local methods for nonlinear equations r(x) =. These methods are Newton, inexact Newton and quasi-newton. We will show that

More information

A COMBINED CLASS OF SELF-SCALING AND MODIFIED QUASI-NEWTON METHODS

A COMBINED CLASS OF SELF-SCALING AND MODIFIED QUASI-NEWTON METHODS A COMBINED CLASS OF SELF-SCALING AND MODIFIED QUASI-NEWTON METHODS MEHIDDIN AL-BAALI AND HUMAID KHALFAN Abstract. Techniques for obtaining safely positive definite Hessian approximations with selfscaling

More information

Introduction to Real Analysis Alternative Chapter 1

Introduction to Real Analysis Alternative Chapter 1 Christopher Heil Introduction to Real Analysis Alternative Chapter 1 A Primer on Norms and Banach Spaces Last Updated: March 10, 2018 c 2018 by Christopher Heil Chapter 1 A Primer on Norms and Banach Spaces

More information

arxiv: v1 [math.oc] 22 May 2018

arxiv: v1 [math.oc] 22 May 2018 On the Connection Between Sequential Quadratic Programming and Riemannian Gradient Methods Yu Bai Song Mei arxiv:1805.08756v1 [math.oc] 22 May 2018 May 23, 2018 Abstract We prove that a simple Sequential

More information

1 Numerical optimization

1 Numerical optimization Contents 1 Numerical optimization 5 1.1 Optimization of single-variable functions............ 5 1.1.1 Golden Section Search................... 6 1.1. Fibonacci Search...................... 8 1. Algorithms

More information

Optimisation on Manifolds

Optimisation on Manifolds Optimisation on Manifolds K. Hüper MPI Tübingen & Univ. Würzburg K. Hüper (MPI Tübingen & Univ. Würzburg) Applications in Computer Vision Grenoble 18/9/08 1 / 29 Contents 2 Examples Essential matrix estimation

More information

Linear Algebra. Session 12

Linear Algebra. Session 12 Linear Algebra. Session 12 Dr. Marco A Roque Sol 08/01/2017 Example 12.1 Find the constant function that is the least squares fit to the following data x 0 1 2 3 f(x) 1 0 1 2 Solution c = 1 c = 0 f (x)

More information

MATH 205C: STATIONARY PHASE LEMMA

MATH 205C: STATIONARY PHASE LEMMA MATH 205C: STATIONARY PHASE LEMMA For ω, consider an integral of the form I(ω) = e iωf(x) u(x) dx, where u Cc (R n ) complex valued, with support in a compact set K, and f C (R n ) real valued. Thus, I(ω)

More information

1 Numerical optimization

1 Numerical optimization Contents Numerical optimization 5. Optimization of single-variable functions.............................. 5.. Golden Section Search..................................... 6.. Fibonacci Search........................................

More information

8 Numerical methods for unconstrained problems

8 Numerical methods for unconstrained problems 8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields

More information

How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization

How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization Frank E. Curtis Department of Industrial and Systems Engineering, Lehigh University Daniel P. Robinson Department

More information

MATH 4030 Differential Geometry Lecture Notes Part 4 last revised on December 4, Elementary tensor calculus

MATH 4030 Differential Geometry Lecture Notes Part 4 last revised on December 4, Elementary tensor calculus MATH 4030 Differential Geometry Lecture Notes Part 4 last revised on December 4, 205 Elementary tensor calculus We will study in this section some basic multilinear algebra and operations on tensors. Let

More information

Numerical Methods for Large-Scale Nonlinear Systems

Numerical Methods for Large-Scale Nonlinear Systems Numerical Methods for Large-Scale Nonlinear Systems Handouts by Ronald H.W. Hoppe following the monograph P. Deuflhard Newton Methods for Nonlinear Problems Springer, Berlin-Heidelberg-New York, 2004 Num.

More information

A Variational Model for Data Fitting on Manifolds by Minimizing the Acceleration of a Bézier Curve

A Variational Model for Data Fitting on Manifolds by Minimizing the Acceleration of a Bézier Curve A Variational Model for Data Fitting on Manifolds by Minimizing the Acceleration of a Bézier Curve Ronny Bergmann a Technische Universität Chemnitz Kolloquium, Department Mathematik, FAU Erlangen-Nürnberg,

More information

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005 3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider

More information

The nonsmooth Newton method on Riemannian manifolds

The nonsmooth Newton method on Riemannian manifolds The nonsmooth Newton method on Riemannian manifolds C. Lageman, U. Helmke, J.H. Manton 1 Introduction Solving nonlinear equations in Euclidean space is a frequently occurring problem in optimization and

More information

Math 6455 Nov 1, Differential Geometry I Fall 2006, Georgia Tech

Math 6455 Nov 1, Differential Geometry I Fall 2006, Georgia Tech Math 6455 Nov 1, 26 1 Differential Geometry I Fall 26, Georgia Tech Lecture Notes 14 Connections Suppose that we have a vector field X on a Riemannian manifold M. How can we measure how much X is changing

More information

Riemannian geometry of surfaces

Riemannian geometry of surfaces Riemannian geometry of surfaces In this note, we will learn how to make sense of the concepts of differential geometry on a surface M, which is not necessarily situated in R 3. This intrinsic approach

More information

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf.

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf. Maria Cameron 1. Trust Region Methods At every iteration the trust region methods generate a model m k (p), choose a trust region, and solve the constraint optimization problem of finding the minimum of

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

A brief introduction to Semi-Riemannian geometry and general relativity. Hans Ringström

A brief introduction to Semi-Riemannian geometry and general relativity. Hans Ringström A brief introduction to Semi-Riemannian geometry and general relativity Hans Ringström May 5, 2015 2 Contents 1 Scalar product spaces 1 1.1 Scalar products...................................... 1 1.2 Orthonormal

More information

Choice of Riemannian Metrics for Rigid Body Kinematics

Choice of Riemannian Metrics for Rigid Body Kinematics Choice of Riemannian Metrics for Rigid Body Kinematics Miloš Žefran1, Vijay Kumar 1 and Christopher Croke 2 1 General Robotics and Active Sensory Perception (GRASP) Laboratory 2 Department of Mathematics

More information

FIXED POINT ITERATIONS

FIXED POINT ITERATIONS FIXED POINT ITERATIONS MARKUS GRASMAIR 1. Fixed Point Iteration for Non-linear Equations Our goal is the solution of an equation (1) F (x) = 0, where F : R n R n is a continuous vector valued mapping in

More information

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality

More information

Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds

Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds Tao Wu Institute for Mathematics and Scientific Computing Karl-Franzens-University of Graz joint work with Prof.

More information

Quasi-Newton methods for minimization

Quasi-Newton methods for minimization Quasi-Newton methods for minimization Lectures for PHD course on Numerical optimization Enrico Bertolazzi DIMS Universitá di Trento November 21 December 14, 2011 Quasi-Newton methods for minimization 1

More information

On Lagrange multipliers of trust-region subproblems

On Lagrange multipliers of trust-region subproblems On Lagrange multipliers of trust-region subproblems Ladislav Lukšan, Ctirad Matonoha, Jan Vlček Institute of Computer Science AS CR, Prague Programy a algoritmy numerické matematiky 14 1.- 6. června 2008

More information

THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS

THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS RALPH HOWARD DEPARTMENT OF MATHEMATICS UNIVERSITY OF SOUTH CAROLINA COLUMBIA, S.C. 29208, USA HOWARD@MATH.SC.EDU Abstract. This is an edited version of a

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method Lecture 5, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The notion of complexity (per iteration)

More information

Topological properties

Topological properties CHAPTER 4 Topological properties 1. Connectedness Definitions and examples Basic properties Connected components Connected versus path connected, again 2. Compactness Definition and first examples Topological

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

Introduction to Nonlinear Optimization Paul J. Atzberger

Introduction to Nonlinear Optimization Paul J. Atzberger Introduction to Nonlinear Optimization Paul J. Atzberger Comments should be sent to: atzberg@math.ucsb.edu Introduction We shall discuss in these notes a brief introduction to nonlinear optimization concepts,

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

CHAPTER 1 PRELIMINARIES

CHAPTER 1 PRELIMINARIES CHAPTER 1 PRELIMINARIES 1.1 Introduction The aim of this chapter is to give basic concepts, preliminary notions and some results which we shall use in the subsequent chapters of the thesis. 1.2 Differentiable

More information

A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties

A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties André L. Tits Andreas Wächter Sasan Bahtiari Thomas J. Urban Craig T. Lawrence ISR Technical

More information

Spectral gradient projection method for solving nonlinear monotone equations

Spectral gradient projection method for solving nonlinear monotone equations Journal of Computational and Applied Mathematics 196 (2006) 478 484 www.elsevier.com/locate/cam Spectral gradient projection method for solving nonlinear monotone equations Li Zhang, Weijun Zhou Department

More information

Riemannian Optimization and its Application to Phase Retrieval Problem

Riemannian Optimization and its Application to Phase Retrieval Problem Riemannian Optimization and its Application to Phase Retrieval Problem 1/46 Introduction Motivations Optimization History Phase Retrieval Problem Summary Joint work with: Pierre-Antoine Absil, Professor

More information

2. Quasi-Newton methods

2. Quasi-Newton methods L. Vandenberghe EE236C (Spring 2016) 2. Quasi-Newton methods variable metric methods quasi-newton methods BFGS update limited-memory quasi-newton methods 2-1 Newton method for unconstrained minimization

More information

ADJOINTS, ABSOLUTE VALUES AND POLAR DECOMPOSITIONS

ADJOINTS, ABSOLUTE VALUES AND POLAR DECOMPOSITIONS J. OPERATOR THEORY 44(2000), 243 254 c Copyright by Theta, 2000 ADJOINTS, ABSOLUTE VALUES AND POLAR DECOMPOSITIONS DOUGLAS BRIDGES, FRED RICHMAN and PETER SCHUSTER Communicated by William B. Arveson Abstract.

More information

Line Search Methods for Unconstrained Optimisation

Line Search Methods for Unconstrained Optimisation Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Inexact Newton Methods Applied to Under Determined Systems. Joseph P. Simonis. A Dissertation. Submitted to the Faculty

Inexact Newton Methods Applied to Under Determined Systems. Joseph P. Simonis. A Dissertation. Submitted to the Faculty Inexact Newton Methods Applied to Under Determined Systems by Joseph P. Simonis A Dissertation Submitted to the Faculty of WORCESTER POLYTECHNIC INSTITUTE in Partial Fulfillment of the Requirements for

More information

LECTURE: KOBORDISMENTHEORIE, WINTER TERM 2011/12; SUMMARY AND LITERATURE

LECTURE: KOBORDISMENTHEORIE, WINTER TERM 2011/12; SUMMARY AND LITERATURE LECTURE: KOBORDISMENTHEORIE, WINTER TERM 2011/12; SUMMARY AND LITERATURE JOHANNES EBERT 1.1. October 11th. 1. Recapitulation from differential topology Definition 1.1. Let M m, N n, be two smooth manifolds

More information

A Convexity Theorem For Isoparametric Submanifolds

A Convexity Theorem For Isoparametric Submanifolds A Convexity Theorem For Isoparametric Submanifolds Marco Zambon January 18, 2001 1 Introduction. The main objective of this paper is to discuss a convexity theorem for a certain class of Riemannian manifolds,

More information

arxiv: v2 [math.oc] 31 Jan 2019

arxiv: v2 [math.oc] 31 Jan 2019 Analysis of Sequential Quadratic Programming through the Lens of Riemannian Optimization Yu Bai Song Mei arxiv:1805.08756v2 [math.oc] 31 Jan 2019 February 1, 2019 Abstract We prove that a first-order Sequential

More information

r=1 r=1 argmin Q Jt (20) After computing the descent direction d Jt 2 dt H t d + P (x + d) d i = 0, i / J

r=1 r=1 argmin Q Jt (20) After computing the descent direction d Jt 2 dt H t d + P (x + d) d i = 0, i / J 7 Appendix 7. Proof of Theorem Proof. There are two main difficulties in proving the convergence of our algorithm, and none of them is addressed in previous works. First, the Hessian matrix H is a block-structured

More information

Algorithms for Constrained Optimization

Algorithms for Constrained Optimization 1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic

More information