A BROYDEN CLASS OF QUASI-NEWTON METHODS FOR RIEMANNIAN OPTIMIZATION


Tech. report UCL-INMA

WEN HUANG, K. A. GALLIVAN, AND P.-A. ABSIL

Department of Mathematical Engineering, ICTEAM Institute, Université catholique de Louvain, B-1348 Louvain-la-Neuve, Belgium. Department of Mathematics, 208 Love Building, 1017 Academic Way, Florida State University, Tallahassee, FL, USA. Corresponding author: huwst08@gmail.com.

Abstract. This paper develops and analyzes a generalization of the Broyden class of quasi-Newton methods to the problem of minimizing a smooth objective function f on a Riemannian manifold. A condition on vector transport and retraction that guarantees convergence and facilitates efficient computation is derived. Experimental evidence is presented demonstrating the value of the extension to the Riemannian Broyden class through superior performance for some problems compared to existing Riemannian BFGS methods, in particular those that depend on differentiated retraction.

Key words. Riemannian optimization; manifold optimization; quasi-Newton methods; Broyden methods; Stiefel manifold

AMS subject classifications. 65K05, 90C48, 90C53

1. Introduction. In the classical Euclidean setting, the Broyden class (see, e.g., [22, Section 6.3]) is a family of quasi-Newton methods that depend on a real parameter φ_k. Its Hessian approximation update formula is B_{k+1} = (1 − φ_k) B_{k+1}^{BFGS} + φ_k B_{k+1}^{DFP}, where B_{k+1}^{BFGS} stands for the update obtained by the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method and B_{k+1}^{DFP} for the update of the Davidon-Fletcher-Powell (DFP) method. Therefore, all members of the Broyden class satisfy the well-known secant equation, central to many quasi-Newton methods. For many years, BFGS, φ_k = 0, was the preferred member of the family, as it tends to perform better in numerical experiments. Analyzing the entire Broyden class was nevertheless a topic of interest in view of the insight that it gives into the properties of quasi-Newton methods; see [11] and the many references therein. Subsequently, it was found that negative values of φ_k are desirable [34, 10], and recent results reported in [19] indicate that a significant improvement can be obtained by exploiting the freedom offered by φ_k.

The problem of minimizing a smooth objective function f on a Riemannian manifold has been a topic of much interest over the past few years due to several important applications. Recently considered applications include matrix completion problems [8, 21, 12, 33], truss optimization [26], finite-element discretization of Cosserat rods [27], matrix mean computation [6, 4], and independent component analysis [31, 30]. Research efforts to develop and analyze optimization methods on manifolds can be traced back to the work of Luenberger [20]; they concern, among others, steepest-descent methods [20], conjugate gradients [32], Newton's method [32, 3], and trust-region methods [1, 5]; see also [2] for an overview. The idea of quasi-Newton methods on manifolds is not new; however, the literature of which we are aware restricts consideration to the BFGS quasi-Newton method. Gabay [15] discussed a version using parallel transport on submanifolds of R^n. Brace and Manton [9] applied a version on the Grassmann manifold to the problem of weighted low-rank approximations.

Qi [25] compared the performance of different vector transports for a version of BFGS on Riemannian manifolds. Savas and Lim [28] proposed BFGS and limited-memory BFGS methods for problems with cost functions defined on a Grassmannian and applied the methods to the best multilinear rank approximation problem. Ring and Wirth [26] systematically analyzed a version of BFGS on Riemannian manifolds which requires differentiated retraction. Seibert et al. [29] discussed the freedom available when generalizing BFGS to Riemannian manifolds and analyzed one generalization of the BFGS method on Riemannian manifolds that are isometric to R^n.

In view of the above considerations, generalizing the Broyden family to manifolds is an appealing endeavor, which we undertake in this paper. For φ_k = 0 (BFGS) the proposed algorithm is quite similar to the BFGS method of Ring and Wirth [26]. Notably, both methods rely on the framework of retraction and vector transport developed in [3, 2]. The BFGS method of [26] is more general in the sense that it also considers infinite-dimensional manifolds. On the other hand, a characteristic of our work is that we strive to resort as little as possible to the derivative of the retraction. Specifically, the definition of y_k (which corresponds to the usual difference of gradients) in [26] involves Df_{R_{x_k}}(s_k), whose Riesz representation is (DR_{x_k}(s_k))^* grad f(R_{x_k}(s_k)); here we use the notation of [26], i.e., R is the retraction, f_{R_x} = f ∘ R_x, s_k = R_{x_k}^{-1}(x_{k+1}), ^* represents the Riemannian adjoint operator, and grad f is the Riemannian gradient. In contrast, our definition of y_k relies on the same isometric vector transport as the one that appears in the Hessian approximation update formula. This can be advantageous in situations where R is defined by means of a constraint restoration procedure that does not admit a closed-form expression. It may also be the case that the chosen R admits a closed-form expression but that its derivative is unknown to the user. The price to pay for using the isometric vector transport in y_k is satisfying a novel locking condition. Fortunately, we show simple procedures that can produce a retraction or an isometric vector transport such that the pair satisfies the locking condition. As a result, efficient and convergent algorithms can be developed. Another contribution with respect to [26] is that we propose a limited-memory version of the quasi-Newton algorithm for large-scale problems.

The paper is organized as follows. The Riemannian Broyden (RBroyden) family of algorithms and the locking condition are defined in Section 2. A convergence analysis is presented in Section 3. Two methods of constructing an isometric vector transport and a method of constructing a retraction related to the locking condition are derived in Section 4. The limited-memory RBFGS is described in Section 5. The inverse Hessian approximation formula of Ring and Wirth's RBFGS is derived in Section 6, and numerical experiments are reported in Section 7 for all versions of the methods. Conclusions are presented and future work suggested in Section 8.

2. RBroyden family of methods. The RBroyden family of methods is stated in Algorithm 1, and the details, in particular the Hessian approximation update formulas in Steps 6 and 7, are explained next. A basic background in differential geometry is assumed; such a background can be found, e.g., in [7, 2]. The concepts of retraction and vector transport can be found in [2] or [25]. Practically, the input statement of Algorithm 1 means the following.
The retraction R is a C^2 mapping (the C^2 assumption is used in Lemma 3.5) from the tangent bundle TM onto M such that (i) R(0_x) = x for all x ∈ M (where 0_x denotes the origin of T_xM) and (ii) (d/dt) R(t ξ_x)|_{t=0} = ξ_x for all ξ_x ∈ T_xM. The restriction of R to T_xM is denoted by R_x.

The domain of R does not need to be the whole tangent bundle, but in practice it often is, and in this paper we make the blanket assumption that R is defined wherever needed.

Algorithm 1 RBroyden family
Input: Riemannian manifold M with Riemannian metric g; retraction R; isometric vector transport T_S, with R as associated retraction, that satisfies (2.8); continuously differentiable real-valued function f on M; initial iterate x_0 ∈ M; initial Hessian approximation B_0, which is a linear transformation of the tangent space T_{x_0}M that is symmetric positive definite with respect to the metric g; convergence tolerance ε > 0; Wolfe condition constants 0 < c_1 < 1/2 < c_2 < 1;
1: k ← 0;
2: while ||grad f(x_k)|| > ε do
3: Obtain η_k ∈ T_{x_k}M by solving B_k η_k = −grad f(x_k);
4: Compute α_k > 0 from a line search procedure so that the Wolfe conditions hold:
(2.1) f(R_{x_k}(α_k η_k)) ≤ f(x_k) + c_1 α_k g(grad f(x_k), η_k),
(2.2) (d/dt) f(R(t η_k))|_{t=α_k} ≥ c_2 (d/dt) f(R(t η_k))|_{t=0};
5: Set x_{k+1} = R_{x_k}(α_k η_k);
6: Define s_k = T_{S_{α_k η_k}} α_k η_k and y_k = β_k^{-1} grad f(x_{k+1}) − T_{S_{α_k η_k}} grad f(x_k), where β_k = ||α_k η_k|| / ||T_{R_{α_k η_k}} α_k η_k|| and T_R is the differentiated retraction of R;
7: Define the linear operator B_{k+1}: T_{x_{k+1}}M → T_{x_{k+1}}M by
B_{k+1} p = B̃_k p − [g(s_k, B̃_k p)/g(s_k, B̃_k s_k)] B̃_k s_k + [g(y_k, p)/g(y_k, s_k)] y_k + φ_k g(s_k, B̃_k s_k) g(v_k, p) v_k, for all p ∈ T_{x_{k+1}}M,
or equivalently
(2.3) B_{k+1} = B̃_k − [B̃_k s_k (B̃_k s_k)^♭]/[(B̃_k s_k)^♭ s_k] + [y_k y_k^♭]/[y_k^♭ s_k] + φ_k g(s_k, B̃_k s_k) v_k v_k^♭,
where v_k = y_k/g(y_k, s_k) − B̃_k s_k/g(s_k, B̃_k s_k), φ_k is any number in the open interval (φ_k^c, ∞), B̃_k = T_{S_{α_k η_k}} ∘ B_k ∘ T_{S_{α_k η_k}}^{-1}, φ_k^c = 1/(1 − u_k), u_k = g(y_k, B̃_k^{-1} y_k) g(s_k, B̃_k s_k)/g(y_k, s_k)^2, g(·,·) denotes the Riemannian metric, and a^♭ represents the flat of a, i.e., a^♭: T_xM → R: v ↦ g(a, v);
8: k ← k + 1;
9: end while

It can be shown using the Cauchy-Schwarz inequality that u_k ≥ 1, and u_k = 1 if and only if there exists a constant κ such that y_k = κ B̃_k s_k.
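As a point of reference for Step 7, the following is a minimal Python (NumPy) sketch of the Broyden-class update (2.3) written in the coordinates of an orthonormal basis of T_{x_{k+1}}M, so that the metric is the identity and the flat a^♭ is the ordinary transpose. The function name and the secant check are illustrative and not part of Algorithm 1.

import numpy as np

def broyden_update(B_tilde, s, y, phi):
    # B_tilde: coordinate matrix of the transported approximation \tilde{B}_k
    #          (symmetric positive definite).
    # s, y:    coordinate vectors of s_k and y_k; requires s @ y > 0.
    # phi:     Broyden parameter, any value larger than phi_k^c = 1/(1 - u_k).
    Bs = B_tilde @ s
    sBs = s @ Bs                      # g(s_k, \tilde{B}_k s_k)
    ys = y @ s                        # g(y_k, s_k)
    v = y / ys - Bs / sBs             # v_k from Step 7
    return (B_tilde
            - np.outer(Bs, Bs) / sBs  # remove old curvature along s_k
            + np.outer(y, y) / ys     # enforce the secant equation
            + phi * sBs * np.outer(v, v))

Because v_k^♭ s_k = 0, every member of the family returns the same vector B_{k+1} s_k = y_k, which is the secant equation mentioned in the introduction; for instance, np.allclose(broyden_update(B, s, y, 0.5) @ s, y) holds for any admissible data.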

A vector transport T: TM ⊕ TM → TM, (η_x, ξ_x) ↦ T_{η_x} ξ_x, with associated retraction R is a smooth mapping such that, for all (x, η_x) in the domain of R and all ξ_x, ζ_x ∈ T_xM, it holds that (i) T_{η_x} ξ_x ∈ T_{R(η_x)}M, (ii) T_{0_x} ξ_x = ξ_x, and (iii) T_{η_x} is a linear map. In Algorithm 1, the vector transport T_S is isometric, i.e., it additionally satisfies (iv)
(2.4) g_{R(η_x)}(T_{S_{η_x}} ξ_x, T_{S_{η_x}} ζ_x) = g_x(ξ_x, ζ_x).
In most practical cases, T_{S_η} exists for all η, and we make this assumption throughout. In fact, we do not require that the isometric vector transport be smooth. The required properties are T_S ∈ C^0 and, for any x̄ ∈ M, there exist a neighborhood U of x̄ and a constant c_0 such that for all x, y ∈ U
(2.5) ||T_{S_η} − T_{R_η}|| ≤ c_0 ||η||,
(2.6) ||T_{S_η}^{-1} − T_{R_η}^{-1}|| ≤ c_0 ||η||,
where η = R_x^{-1}(y), T_R denotes the differentiated retraction, i.e.,
(2.7) T_{R_{η_x}} ξ_x = DR(η_x)[ξ_x] = (d/dt) R_x(η_x + t ξ_x)|_{t=0},
and ||·|| denotes the induced norm of the Riemannian metric g. In the following analysis, we use only these two properties of the isometric vector transport.

In Algorithm 1, we require the isometric vector transport T_S to satisfy the locking condition
(2.8) T_{S_ξ} ξ = β T_{R_ξ} ξ, with β = ||ξ|| / ||T_{R_ξ} ξ||,
for all ξ ∈ T_xM and all x ∈ M. Practical ways of building such a T_S are discussed in Section 4. Observe that, throughout Algorithm 1, the differentiated retraction T_R only appears in the form T_{R_ξ} ξ, which is equal to (d/dt) R(tξ)|_{t=1}. Hence T_{R_{α_k η_k}} α_k η_k is just the velocity vector of the line search curve α ↦ R(α η_k) at time α_k, and we are only required to be able to evaluate the differentiated retraction in the direction transported. The computational efficiency that results is also discussed in Section 4.

The isometry condition (2.4) and the locking condition (2.8) are imposed on T_S notably because, as shown in Lemma 2.1, they ensure that the second Wolfe condition (2.2) implies g(s_k, y_k) > 0. Much as in the Euclidean case, it is essential that g(s_k, y_k) > 0; otherwise the secant condition B_{k+1} s_k = y_k cannot hold with B_{k+1} positive definite, whereas positive definiteness of the B_k's is key to guaranteeing that the search directions η_k are descent directions. It is possible to state Algorithm 1 without imposing the isometry and locking conditions, but then it becomes an open question whether the main convergence results would still hold. Clearly, some intermediate results would fail to hold and, assuming that the main results still hold, a completely different approach would probably be required to prove them.

When φ_k = 0, the updating formula (2.3) reduces to the Riemannian BFGS formula of [25]. However, a crucial difference between Algorithm 1 and the Riemannian BFGS of [25] lies in the definition of y_k. Its definition in [25] corresponds to setting β_k to 1 instead of ||α_k η_k|| / ||T_{R_{α_k η_k}} α_k η_k||. Our choice of β_k allows for a convergence analysis under more general assumptions than those of the convergence analysis of Qi [23]. Indeed, the convergence analysis of the Riemannian BFGS of [25], found in [23], assumes that the retraction R is set to the exponential mapping and that the vector transport T_S is set to the parallel transport. These specific choices remain legitimate in Algorithm 1, hence the convergence analysis given here subsumes the one in [23]; however, several other choices become possible, as discussed in more detail in Section 4.

Lemma 2.1 proves that Algorithm 1 is well-defined for φ_k ∈ (φ_k^c, ∞), where φ_k^c is defined in Step 7 of Algorithm 1.

Lemma 2.1. Algorithm 1 constructs infinite sequences {x_k}, {B_k}, {B̃_k}, {α_k}, {s_k}, and {y_k}, unless the stopping criterion in Step 2 is satisfied at some iteration. For all k, the Hessian approximation B_k is symmetric positive definite with respect to the metric g, η_k ≠ 0, and
(2.9) g(s_k, y_k) ≥ (c_2 − 1) α_k g(grad f(x_k), η_k).

Proof. We first show that (2.9) holds when all the involved quantities exist and η_k ≠ 0. Define m_k(t) = f(R_{x_k}(t η_k / ||η_k||)). We have
g(s_k, y_k) = g(T_{S_{α_k η_k}} α_k η_k, β_k^{-1} grad f(x_{k+1}) − T_{S_{α_k η_k}} grad f(x_k))
= g(T_{S_{α_k η_k}} α_k η_k, β_k^{-1} grad f(x_{k+1})) − g(T_{S_{α_k η_k}} α_k η_k, T_{S_{α_k η_k}} grad f(x_k))
= g(β_k^{-1} T_{S_{α_k η_k}} α_k η_k, grad f(x_{k+1})) − g(α_k η_k, grad f(x_k))   (by isometry)
= g(T_{R_{α_k η_k}} α_k η_k, grad f(x_{k+1})) − g(α_k η_k, grad f(x_k))   (by (2.8))
(2.10) = α_k ||η_k|| [ (dm_k/dt)(α_k ||η_k||) − (dm_k/dt)(0) ].
Note that guaranteeing (2.10), which will be used frequently, is the key reason for imposing the locking condition (2.8). From the second Wolfe condition (2.2), we have
(2.11) (dm_k/dt)(α_k ||η_k||) ≥ c_2 (dm_k/dt)(0).
Therefore, it follows that
(2.12) (dm_k/dt)(α_k ||η_k||) − (dm_k/dt)(0) ≥ (c_2 − 1) (dm_k/dt)(0) = (c_2 − 1) (1/||η_k||) g(grad f(x_k), η_k).
The claim (2.9) follows from (2.10) and (2.12). When B_k is symmetric positive definite, η_k is a descent direction. Observe that the function α ↦ f(R(α η_k)) remains a continuously differentiable function from R to R which is bounded below. Therefore, the classical result in [22, Lemma 3.1] guarantees the existence of a step size α_k that satisfies the Wolfe conditions.

The claims are proved by induction. They hold for k = 0 in view of the assumptions on B_0, of Step 3, and of the results above. Assume now that the claims hold for some k. From (2.9), we have
g(s_k, y_k) ≥ (1 − c_2) α_k g(grad f(x_k), −η_k) = (1 − c_2) α_k g(grad f(x_k), B_k^{-1} grad f(x_k)) > 0.
Recall that, in the Euclidean case, s_k^T y_k > 0 is a necessary and sufficient condition for the existence of a positive-definite secant update (see [14, Lemma 9.2.1]), and that BFGS is such an update [14, (9.2.10)]. From the generalization of these results to the Riemannian case (see [23, Lemmas 2.4.1 and 2.4.2]), it follows that B_{k+1} is symmetric positive definite when φ_k = 0.

Consider the function h(φ_k): R → R^d which gives the eigenvalues of B_{k+1}. Since B_{k+1} is symmetric positive definite when φ_k = 0, we know all entries of h(0) are greater than 0. By calculations similar to those for the Euclidean case [10], we have
det(B_{k+1}) = det(B̃_k) [g(y_k, s_k)/g(s_k, B̃_k s_k)] (1 + φ_k(u_k − 1)),
where φ_k and u_k are defined in Step 7 of Algorithm 1. So det(B_{k+1}) = 0 if and only if φ_k = φ_k^c < 0. In other words, h(φ_k) has one or more 0 entries if and only if φ_k = φ_k^c. In addition, since all entries of h(0) are greater than 0 and h(φ_k) is a continuous function, we have that all entries of h(φ_k) are greater than 0 if and only if φ_k > φ_k^c. Therefore, the operator B_{k+1} is positive definite when φ_k > φ_k^c. Noting that the vector transport is isometric in (2.3), the symmetry of B_{k+1} is easily verified.

3. Global convergence analysis. In this section, global convergence is proven under a generalized convexity assumption and for φ_k ∈ [0, 1 − δ], where δ is any number in (0, 1]. The behavior of the Riemannian Broyden methods with φ_k not necessarily in this interval is explored in the experiments. Note that the result derived in this section also guarantees local convergence to an isolated local minimizer.

3.1. Basic assumptions and definitions. Throughout the convergence analysis, {x_k}, {B_k}, {B̃_k}, {α_k}, {s_k}, {y_k}, and {η_k} are infinite sequences generated by Algorithm 1, Ω denotes the sublevel set {x : f(x) ≤ f(x_0)}, and x* is a local minimizer of f in the level set Ω. The existence of such an x* is guaranteed if Ω is compact, which happens, in particular, whenever the manifold M is compact. Coordinate expressions in a neighborhood and in tangent spaces are used when appropriate. For an element of the manifold, v ∈ M, v̂ ∈ R^d denotes the coordinates defined by a chart ϕ over a neighborhood U, i.e., v̂ = ϕ(v) for v ∈ U. Coordinate expressions, F̂(x), for elements, F(x), of a vector field F on M are written in terms of the canonical basis of the associated tangent space, T_xM, via the coordinate vector fields defined by the chart ϕ.

The convergence analysis depends on the property of (strong) retraction-convexity formalized in Definition 3.1 and the following three additional assumptions.

Definition 3.1. For a function f: M → R: x ↦ f(x) on a Riemannian manifold M with retraction R, define m_{x,η}(t) = f(R_x(tη)) for x ∈ M and η ∈ T_xM. The function f is retraction-convex with respect to the retraction R in a set S if, for all x ∈ S and all η ∈ T_xM with ||η|| = 1, m_{x,η}(t) is convex for all t which satisfy R_x(τη) ∈ S for all τ ∈ [0, t]. Moreover, f is strongly retraction-convex in S if m_{x,η}(t) is strongly convex, i.e., there exist two constants 0 < a_0 < a_1 such that a_0 ≤ (d²m_{x,η}/dt²)(t) ≤ a_1, for all x ∈ S, all ||η|| = 1, and all t such that R_x(τη) ∈ S for all τ ∈ [0, t].

Assumption 3.1. The objective function f is twice continuously differentiable. Let Ω* be a neighborhood of x* and ρ be a positive constant such that, for all y ∈ Ω*, Ω* ⊆ R_y(B(0_y, ρ)) and R_y(·) is a diffeomorphism on B(0_y, ρ). The existence of Ω*, termed a ρ-totally retractive neighborhood of x*, is guaranteed [18, Section 3.3]. Shrinking Ω* if necessary, we further assume that it is an R-star shaped neighborhood of x*, i.e., R_{x*}(t R_{x*}^{-1}(x)) ∈ Ω* for all x ∈ Ω* and t ∈ [0, 1].

The next two assumptions are a Riemannian generalization of a weakened version of [11, Assumption 2.1]: in the Euclidean setting (M is the Euclidean space R^n and the retraction R is the standard one, i.e., R_x(η) = x + η), if [11, Assumption 2.1] holds, then the next two assumptions are satisfied by letting Ω* be the sublevel set Ω.

Assumption 3.2. The iterates x_k stay continuously in Ω*, meaning that R_{x_k}(t η_k) ∈ Ω* for all t ∈ [0, α_k].
Observe that, in view of Lemma 2.1 (B_k remains symmetric positive definite with respect to g), for all K > 0, the sequences {x_k}, {B_k}, {B̃_k}, {α_k}, {s_k}, {y_k}, and {η_k}, for k ≥ K, are still generated by Algorithm 1.

Hence, Assumption 3.2 amounts to requiring that the iterates x_k eventually stay continuously in Ω*. Note that Assumption 3.2 cannot be removed. To see this, consider for example the unit sphere with the exponential retraction, where we can have x_k = x_{k+1} with α_k ||η_k|| = 2π. (A similar comment was made in [18] before Lemma 3.6.)

Assumption 3.3. f is strongly retraction-convex (Definition 3.1) with respect to the retraction R in Ω*.

The definition of retraction-convexity generalizes standard Euclidean and Riemannian concepts. It is easily seen that (strong) retraction-convexity reduces to (strong) convexity when the function is defined on a Euclidean space. It can be shown that when R is the exponential mapping, (strong) retraction-convexity is ordinary (strong) convexity for a C^2 function based on the definiteness of its Hessian. It can also be shown that a set S as in Definition 3.1 always exists around a nondegenerate local minimizer of a C^2 function (Lemma 3.1).

Lemma 3.1. Suppose Assumption 3.1 holds and Hess f(x*) is positive definite. Define m_{x,η}(t) = f(R_x(tη)). Then there exist a ϖ-totally retractive neighborhood N of x* and two constants 0 < ã_0 < ã_1 such that
(3.1) ã_0 ≤ (d²m_{x,η}/dt²)(t) ≤ ã_1,
for all x ∈ N, η ∈ T_xM with ||η|| = 1, and |t| < ϖ.

Proof. By definition, we have
(d/dt) f(R_x(tη)) = g(grad f(R_x(tη)), DR_x(tη)[η])
and
(d²/dt²) f(R_x(tη)) = g(Hess f(R_x(tη))[DR_x(tη)[η]], DR_x(tη)[η]) + g(grad f(R_x(tη)), (D/dt) DR_x(tη)[η]),
where the definition of D/dt can be found in [2, Section 5.4]. Since DR_x(tη)[η]|_{t=0} = η and grad f(R_x(tη))|_{t=0, x=x*} = 0_{x*}, it holds that (d²/dt²) f(R_x(tη))|_{t=0, x=x*} = g(Hess f(x*)[η], η). In addition, since Hess f(x*) is positive definite, there exist two constants 0 < b_0 < b_1 such that the inequalities b_0 < (d²/dt²) f(R_x(tη))|_{t=0, x=x*} < b_1 hold for all η ∈ T_{x*}M with ||η|| = 1. Note that (d²/dt²) f(R_x(tη)) is continuous with respect to (x, t, η). Therefore, there exist a neighborhood U of x* and a neighborhood V of 0 such that b_0/2 < (d²/dt²) f(R_x(tη)) < 2 b_1 for all (x, t) ∈ U × V, η ∈ T_xM and ||η|| = 1. Choose ϖ > 0 such that [−ϖ, ϖ] ⊆ V and let N ⊆ U be a ϖ-totally retractive neighborhood of x*.

3.2. Preliminary lemmas. The lemmas in this section provide the results needed to show the global convergence stated in Theorem 3.1. The strategy generalizes that for the Euclidean case in [11]. Where appropriate, comments are included indicating important adaptations of the reasoning to Riemannian manifolds. The main adaptations originate from the fact that the Euclidean analysis exploits the relationship y_k = Ḡ_k s_k ([11, (2.3)]), where Ḡ_k is an average Hessian of f, and this relationship does not gracefully generalize to Riemannian manifolds (unless the isometric vector transport T_S in Algorithm 1 is chosen as the parallel transport, a choice we often want to avoid in view of its computational cost). This difficulty requires alternative approaches in several proofs. The key idea of the approaches is to make use of the scaled function m_{x,η}(t) rather than f, a well-known strategy in the Riemannian setting.

The first result, Lemma 3.2, is used to prove Lemma 3.4.

Lemma 3.2. If Assumptions 3.1, 3.2 and 3.3 hold, then there exists a constant a_0 > 0 such that
(3.2) (1/2) a_0 ||s_k||^2 ≤ (c_1 − 1) α_k g(grad f(x_k), η_k).
The constant a_0 can be chosen as in Definition 3.1.

Proof. In Euclidean space, Taylor's Theorem is used to characterize a function around a point. A generalization of Taylor's formula to Riemannian manifolds was proposed in [32, Remark 3.2], but it is restricted to the exponential mapping rather than allowing for an arbitrary retraction. This difficulty is overcome by defining a function on a curve on the manifold and applying Taylor's Theorem. Define m_k(t) = f(R_{x_k}(t η_k / ||η_k||)). Since f ∈ C^2 is strongly retraction-convex on Ω* by Assumption 3.3, there exist constants 0 < a_0 < a_1 such that a_0 ≤ (d²m_k/dt²)(t) ≤ a_1 for all t ∈ [0, α_k ||η_k||]. From Taylor's theorem, we know
(3.3) f(x_{k+1}) − f(x_k) = m_k(α_k ||η_k||) − m_k(0)
= (dm_k/dt)(0) α_k ||η_k|| + (1/2) (d²m_k/dt²)(p_k) (α_k ||η_k||)^2
= g(grad f(x_k), α_k η_k) + (1/2) (d²m_k/dt²)(p_k) (α_k ||η_k||)^2
≥ g(grad f(x_k), α_k η_k) + (1/2) a_0 (α_k ||η_k||)^2,
where 0 ≤ p_k ≤ α_k ||η_k||. Using (3.3), the first Wolfe condition (2.1) and that ||s_k|| = α_k ||η_k||, we obtain (c_1 − 1) g(grad f(x_k), α_k η_k) ≥ a_0 ||s_k||^2/2.

Lemma 3.3 generalizes [11, (2.4)].

Lemma 3.3. If Assumptions 3.1, 3.2 and 3.3 hold, then there exist two constants 0 < a_0 ≤ a_1 such that
(3.4) a_0 g(s_k, s_k) ≤ g(s_k, y_k) ≤ a_1 g(s_k, s_k), for all k.
The constants a_0 and a_1 can be chosen as in Definition 3.1.

Proof. In the Euclidean case of [11, (2.4)], the proof follows easily from the convexity of the cost function and the resulting positive definiteness of the Hessian over the entire relevant set. The Euclidean proof exploits the relationship y_k = Ḡ_k s_k, where Ḡ_k is the average Hessian, and that Ḡ_k must be positive definite to bound the inner product s_k^T y_k using the largest and smallest eigenvalues, which can in turn be bounded on the relevant set. We do not have this property on a Riemannian manifold, but the locking condition, retraction-convexity, and replacing the average Hessian with a quantity derived from a function defined on a curve on the manifold allow the generalization.

Define m_k(t) = f(R_{x_k}(t η_k / ||η_k||)). Using the locking condition (2.10) and Taylor's Theorem yields
g(s_k, y_k) = α_k ||η_k|| [ (dm_k/dt)(α_k ||η_k||) − (dm_k/dt)(0) ] = α_k ||η_k|| ∫_0^{α_k ||η_k||} (d²m_k/dt²)(τ) dτ,
and since g(s_k, s_k) = α_k^2 ||η_k||^2, we have
g(s_k, y_k)/g(s_k, s_k) = (1/(α_k ||η_k||)) ∫_0^{α_k ||η_k||} (d²m_k/dt²)(τ) dτ.

By Assumption 3.3, it follows that a_0 ≤ g(s_k, y_k)/g(s_k, s_k) ≤ a_1.

Lemma 3.4 generalizes [11, Lemma 2.1].

Lemma 3.4. Suppose Assumptions 3.1, 3.2 and 3.3 hold. Then there exist two constants 0 < a_2 < a_3 such that
(3.5) a_2 ||grad f(x_k)|| cos θ_k ≤ ||s_k|| ≤ a_3 ||grad f(x_k)|| cos θ_k, for all k,
where cos θ_k = −g(grad f(x_k), η_k)/(||grad f(x_k)|| ||η_k||), i.e., θ_k is the angle between the search direction η_k and the steepest descent direction −grad f(x_k).

Proof. Define m_k(t) = f(R_{x_k}(t η_k/||η_k||)). By (2.9) of Lemma 2.1, we have
g(s_k, y_k) ≥ α_k (c_2 − 1) g(grad f(x_k), η_k) = α_k (1 − c_2) ||grad f(x_k)|| ||η_k|| cos θ_k.
Using (3.4) and noticing α_k ||η_k|| = ||s_k||, we know ||s_k|| ≥ a_2 ||grad f(x_k)|| cos θ_k, where a_2 = (1 − c_2)/a_1, proving the left inequality. By (3.2) of Lemma 3.2, we have (c_1 − 1) g(grad f(x_k), α_k η_k) ≥ a_0 ||s_k||^2/2. Noting ||s_k|| = α_k ||η_k|| and by the definition of cos θ_k, we have ||s_k|| ≤ a_3 ||grad f(x_k)|| cos θ_k, where a_3 = 2(1 − c_1)/a_0.

Lemma 3.5 is needed to prove Lemmas 3.6 and 3.9. Lemma 3.5 gives a Lipschitz-like relationship between two related vector transports applied to the same tangent vector. T_1 below plays the same role as T_S in Algorithm 1.

Lemma 3.5. Let M be a Riemannian manifold endowed with two vector transports T_1 ∈ C^0 and T_2 ∈ C^∞, where T_1 satisfies (2.5) and (2.6) and both transports are associated with the same retraction R. Then for any x̄ ∈ M there exist a constant a_4 > 0 and a neighborhood U of x̄ such that for all x, y ∈ U,
||T_{1_η} ξ − T_{2_η} ξ|| ≤ a_4 ||ξ|| ||η||,
where η = R_x^{-1} y and ξ ∈ T_xM.

Proof. Let L_R(x̂, η̂) and L_2(x̂, η̂) denote the coordinate forms of T_{R_η} and T_{2_η}, respectively. Since all norms are equivalent on a finite dimensional space, there exist 0 < b_1(x) < b_2(x) such that b_1(x) ||ξ_x|| ≤ ||ξ̂_x||_2 ≤ b_2(x) ||ξ_x|| for all x ∈ U, where ||·||_2 denotes the Euclidean norm, i.e., the 2-norm. By choosing U compact, it follows that b_1 ||ξ_x|| ≤ ||ξ̂_x||_2 ≤ b_2 ||ξ_x|| for all x ∈ U, where 0 < b_1 < b_2, b_1 = min_{x∈U} b_1(x) and b_2 = max_{x∈U} b_2(x). It follows that
||T_{1_η} ξ − T_{2_η} ξ|| = ||T_{1_η} ξ − T_{R_η} ξ + T_{R_η} ξ − T_{2_η} ξ||
≤ c_0 ||η|| ||ξ|| + ||T_{R_η} ξ − T_{2_η} ξ||
≤ c_0 ||η|| ||ξ|| + (1/b_0) ||(L_R(x̂, η̂) − L_2(x̂, η̂)) ξ̂||_2
≤ c_0 ||η|| ||ξ|| + (1/b_0) ||ξ̂||_2 ||L_R(x̂, η̂) − L_2(x̂, η̂)||_2
≤ c_0 ||η|| ||ξ|| + b_3 ||ξ̂||_2 ||η̂||_2   (since L_R(x̂, 0) = L_2(x̂, 0), L_2 is smooth, and L_R ∈ C^1 because R ∈ C^2)
≤ a_4 ||ξ|| ||η||,
where b_0 and b_3 are positive constants.

Lemma 3.6 is a consequence of Lemma 3.5.

Lemma 3.6. Let M be a Riemannian manifold endowed with a retraction R whose differentiated retraction is denoted T_R. Let x̄ ∈ M. Then there are a neighborhood U of x̄ and a constant ã_4 > 0 such that for all x, y ∈ U and any ξ ∈ T_xM with ||ξ|| = 1, the effect of the differentiated retraction is bounded with
| ||T_{R_η} ξ|| − 1 | ≤ ã_4 ||η||,
where η = R_x^{-1} y.

Proof. Applying Lemma 3.5 with T_1 = T_R and T_2 isometric, we have ||T_{R_η} ξ − T_{2_η} ξ|| ≤ b_0 ||ξ|| ||η||, where b_0 is a positive constant. Noticing ||ξ|| = 1 and that ||·|| is the induced norm, we have, by an application of the triangle inequality,
b_0 ||η|| ≥ ||T_{R_η} ξ − T_{2_η} ξ|| ≥ ||T_{R_η} ξ|| − ||T_{2_η} ξ|| = ||T_{R_η} ξ|| − 1.
Similarly, we have b_0 ||η|| ≥ ||T_{2_η} ξ − T_{R_η} ξ|| ≥ ||T_{2_η} ξ|| − ||T_{R_η} ξ|| = 1 − ||T_{R_η} ξ||, which completes the proof.

Lemma 3.7 generalizes [11, (2.13)] and implies a generalization of the Zoutendijk condition [22, Theorem 3.2], i.e., if cos θ_k does not approach 0, then according to this lemma, the algorithm is convergent.

Lemma 3.7. Suppose Assumptions 3.1, 3.2 and 3.3 hold. Then there exists a constant a_5 > 0 such that, for all k,
f(x_{k+1}) − f(x*) ≤ (1 − a_5 cos^2 θ_k)(f(x_k) − f(x*)),
where cos θ_k = −g(grad f(x_k), η_k)/(||grad f(x_k)|| ||η_k||).

Proof. The original proof in [11, (2.13)] uses the average Hessian. As when proving Lemma 3.2, this is replaced by considering a function defined on a curve on the manifold. Let z_k = ||R_{x*}^{-1} x_k|| and ζ_k = (R_{x*}^{-1} x_k)/z_k. Define m_k(t) = f(R_{x*}(t ζ_k)). From Taylor's Theorem, we have
(3.6) m_k(0) − m_k(z_k) = (dm_k/dt)(z_k)(0 − z_k) + (1/2)(d²m_k/dt²)(p_k)(0 − z_k)^2,
where p_k is some number between 0 and z_k. Notice that x* is the minimizer, so m_k(0) − m_k(z_k) ≤ 0. Also note that a_0 ≤ (d²m_k/dt²)(p_k) ≤ a_1 by Assumption 3.3 and Definition 3.1. We thus have
(3.7) (dm_k/dt)(z_k) ≥ (1/2) a_0 z_k.
Still using (3.6) and noticing that (d²m_k/dt²)(p_k)(0 − z_k)^2 ≥ 0, we have
(3.8) f(x_k) − f(x*) ≤ (dm_k/dt)(z_k) z_k.
Combining (3.7) and (3.8) and noticing that (dm_k/dt)(z_k) = g(grad f(x_k), T_{R_{z_k ζ_k}} ζ_k), we have
f(x_k) − f(x*) ≤ (2/a_0) g^2(grad f(x_k), T_{R_{z_k ζ_k}} ζ_k) and
(3.9) f(x_k) − f(x*) ≤ (2/a_0) ||grad f(x_k)||^2 ||T_{R_{z_k ζ_k}} ζ_k||^2 ≤ b_0 ||grad f(x_k)||^2,
by Lemma 3.6 and the assumptions on Ω*, where b_0 is a positive constant. Using (3.5), the first Wolfe condition (2.1) and the definition of cos θ_k, we obtain f(x_{k+1}) − f(x_k) ≤ −b_1 ||grad f(x_k)||^2 cos^2 θ_k, where b_1 is some positive constant. Using (3.9), we obtain
f(x_{k+1}) − f(x*) = f(x_{k+1}) − f(x_k) + f(x_k) − f(x*) ≤ −b_1 ||grad f(x_k)||^2 cos^2 θ_k + f(x_k) − f(x*) ≤ −(b_1/b_0) cos^2 θ_k (f(x_k) − f(x*)) + f(x_k) − f(x*) = (1 − a_5 cos^2 θ_k)(f(x_k) − f(x*)),
where a_5 = b_1/b_0 is a positive constant.

Lemma 3.8 generalizes [11, Lemma 2.2].

Lemma 3.8. Suppose Assumptions 3.1, 3.2 and 3.3 hold. Then there exist two constants 0 < a_6 < a_7 such that
(3.10) a_6 g(s_k, B̃_k s_k)/||s_k||^2 ≤ α_k ≤ a_7 g(s_k, B̃_k s_k)/||s_k||^2, for all k.

Proof. We have
(1 − c_2) g(s_k, B̃_k s_k) = (1 − c_2) g(α_k η_k, α_k B_k η_k) = (c_2 − 1) α_k^2 g(η_k, grad f(x_k))   (by Step 3 of Algorithm 1)
≤ α_k g(s_k, y_k)   (by (2.9) of Lemma 2.1)
≤ α_k a_1 ||s_k||^2   (by (3.4)).
Therefore, we have α_k ≥ a_6 g(s_k, B̃_k s_k)/||s_k||^2, where a_6 = (1 − c_2)/a_1, giving the left inequality. By (3.2) of Lemma 3.2, we have (c_1 − 1) α_k g(grad f(x_k), η_k) ≥ (1/2) a_0 ||s_k||^2. It follows that
(c_1 − 1) α_k g(grad f(x_k), η_k) = (1 − c_1) g(B_k η_k, α_k η_k) = [(1 − c_1)/α_k] g(s_k, B̃_k s_k).
Therefore, we have α_k ≤ a_7 g(s_k, B̃_k s_k)/||s_k||^2, where a_7 = 2(1 − c_1)/a_0, giving the right inequality.

Lemma 3.9 generalizes [11, Lemma 3.1, Equation (3.3)].

Lemma 3.9. Suppose Assumption 3.1 holds. Then there exists a constant a_9 > 0 such that, for all k,
(3.11) g(y_k, y_k) ≤ a_9 g(s_k, y_k).

Proof. Define y_k^P = grad f(x_{k+1}) − P_{γ_k}^{1←0} grad f(x_k), where γ_k(t) = R_{x_k}(t α_k η_k), i.e., the retraction line from x_k to x_{k+1}, and P_γ is the parallel transport along γ_k(t). Details about the definition of parallel transport can be found, e.g., in [2, Section 5.4]. From [18, Lemma 8], we have
||P_{γ_k}^{0←1} y_k^P − H̄_k α_k η_k|| ≤ b_0 ||α_k η_k||^2 = b_0 ||s_k||^2,
where H̄_k = ∫_0^1 P_{γ_k}^{0←t} Hess f(γ_k(t)) P_{γ_k}^{t←0} dt and b_0 > 0. It follows that
||y_k|| ≤ ||y_k − y_k^P|| + ||y_k^P||
= ||y_k − y_k^P|| + ||P_{γ_k}^{0←1} y_k^P||
≤ ||y_k − y_k^P|| + ||P_{γ_k}^{0←1} y_k^P − H̄_k α_k η_k|| + ||H̄_k α_k η_k||
≤ ||grad f(x_{k+1})/β_k − T_{S_{α_k η_k}} grad f(x_k) − grad f(x_{k+1}) + P_{γ_k}^{1←0} grad f(x_k)|| + ||H̄_k α_k η_k|| + b_0 ||s_k||^2
≤ ||grad f(x_{k+1})/β_k − grad f(x_{k+1})|| + ||P_{γ_k}^{1←0} grad f(x_k) − T_{S_{α_k η_k}} grad f(x_k)|| + ||H̄_k α_k η_k|| + b_0 ||s_k||^2
≤ b_1 ||s_k|| ||grad f(x_{k+1})|| + b_2 ||s_k|| ||grad f(x_k)|| + ||H̄_k α_k η_k|| + b_0 ||s_k||^2   (by Lemmas 3.5 and 3.6)
≤ b_1 ||s_k|| ||grad f(x_{k+1})|| + b_2 ||s_k|| ||grad f(x_k)|| + b_3 ||s_k|| + b_0 ||s_k||^2   (by Assumption 3.2 and Hess f being bounded above on a compact set)
≤ b_4 ||s_k||   (by Assumption 3.2),
where b_1, b_2, b_3, and b_4 are positive constants.

Therefore, by Lemma 3.3, we have g(y_k, y_k)/g(s_k, y_k) ≤ g(y_k, y_k)/(a_0 g(s_k, s_k)) ≤ b_4^2/a_0, giving the desired result.

Lemma 3.10 generalizes [11, Lemma 3.1] and, as with the earlier lemmas, the proof does not use an average Hessian.

Lemma 3.10. Suppose Assumptions 3.1, 3.2 and 3.3 hold. Then there exist constants a_10 > 0, a_11 > 0, a_12 > 0 such that
(3.12) g(s_k, B̃_k s_k)/g(s_k, y_k) ≤ a_10 α_k,
(3.13) ||B̃_k s_k||^2/g(s_k, B̃_k s_k) ≥ a_11 α_k / cos^2 θ_k,
(3.14) |g(y_k, B̃_k s_k)|/g(y_k, s_k) ≤ a_12 α_k / cos θ_k,
for all k.

Proof. By (2.9) of Lemma 2.1, we have g(s_k, y_k) ≥ (c_2 − 1) g(grad f(x_k), α_k η_k). So by Step 3 of Algorithm 1, we obtain g(s_k, y_k) ≥ [(1 − c_2)/α_k] g(s_k, B̃_k s_k) and therefore g(s_k, B̃_k s_k)/g(s_k, y_k) ≤ a_10 α_k, where a_10 = 1/(1 − c_2), proving (3.12).

Inequality (3.13) follows from
||B̃_k s_k||^2/g(s_k, B̃_k s_k) = α_k^2 ||grad f(x_k)||^2 / (α_k ||s_k|| ||grad f(x_k)|| cos θ_k)   (by Step 3 of Algorithm 1 and the definition of cos θ_k)
= α_k ||grad f(x_k)|| / (||s_k|| cos θ_k)
≥ a_11 α_k / cos^2 θ_k,   (by (3.5))
where a_11 > 0. Finally, inequality (3.14) follows from
|g(y_k, B̃_k s_k)|/g(s_k, y_k) ≤ α_k ||y_k|| ||grad f(x_k)|| / g(s_k, y_k)   (by Step 3 of Algorithm 1)
≤ a_9^{1/2} α_k ||grad f(x_k)|| / g^{1/2}(s_k, y_k)   (by (3.11))
≤ a_9^{1/2} α_k ||grad f(x_k)|| / (a_0^{1/2} ||s_k||)   (by (3.4))
≤ a_12 α_k / cos θ_k,   (by (3.5))
where a_12 is a positive constant.

Lemma 3.11 generalizes [11, Lemma 3.2].

Lemma 3.11. Suppose Assumptions 3.1, 3.2 and 3.3 hold and φ_k ∈ [0, 1]. Then there exists a constant a_13 > 0 such that
(3.15) ∏_{j=1}^{k} α_j ≥ a_13^k,
for all k ≥ 1.

Proof. The major difference between the Euclidean and Riemannian proofs is that in the Riemannian case we have two operators, B_k and B̃_k, as opposed to a single operator in the Euclidean case. Once we have proven that they have the same trace and determinant, the proof unfolds similarly to the Euclidean proof. The details are given for the reader's convenience.

Use a hat to denote the coordinate expressions of the operators B_k and B̃_k in Algorithm 1 and consider trace(B̂) and det(B̂). Note that trace(B̂) and det(B̂) are independent of the chosen basis. Since T_S is an isometric vector transport, T_{S_{α_k η_k}} is invertible for all k, and thus
trace(T̂_{S_{α_k η_k}} B̂_k T̂_{S_{α_k η_k}}^{-1}) = trace(B̂_k) and det(T̂_{S_{α_k η_k}} B̂_k T̂_{S_{α_k η_k}}^{-1}) = det(B̂_k),
so the coordinate expression of B̃_k has the same trace and determinant as B̂_k. From the update formula of B_{k+1} in Algorithm 1, the trace of the update formula is
(3.16) trace(B̂_{k+1}) = trace(B̂_k) + ||y_k||^2/g(y_k, s_k) + φ_k ||y_k||^2 g(s_k, B̃_k s_k)/g(y_k, s_k)^2 − (1 − φ_k) ||B̃_k s_k||^2/g(s_k, B̃_k s_k) − 2 φ_k g(y_k, B̃_k s_k)/g(y_k, s_k).
Recall that φ_k g(s_k, B̃_k s_k) ≥ 0. If we choose a particular basis such that the expression of the metric is the identity, then the Broyden update equation (2.3) is exactly the classical Broyden update equation, except that B_k is replaced by the coordinate expression of B̃_k, where B_k is defined in [11, (1.4)], and by [11, (3.9)] we have
(3.17) det(B̂_{k+1}) ≥ det(B̂_k) g(y_k, s_k)/g(s_k, B̃_k s_k).
Since det and g(·,·) are independent of the basis, it follows that (3.17) holds regardless of the chosen basis.

Using (3.11), (3.12), (3.13) and (3.14) in (3.16), we obtain
(3.18) trace(B̂_{k+1}) ≤ trace(B̂_k) + a_9 + φ_k a_9 a_10 α_k − a_11 (1 − φ_k) α_k / cos^2 θ_k + 2 φ_k a_12 α_k / cos θ_k.
Notice that
(3.19) α_k = α_k ||grad f(x_k)|| ||η_k|| cos θ_k / (−g(grad f(x_k), η_k)) = α_k ||B̃_k s_k|| ||s_k|| cos θ_k / g(s_k, B̃_k s_k) = [||B̃_k s_k|| cos θ_k / ||s_k||] [α_k ||s_k||^2 / g(s_k, B̃_k s_k)] ≤ a_7 ||B̃_k s_k|| cos θ_k / ||s_k||   (by (3.10)).
Since the fourth term in (3.18) is always negative, cos θ_k ≤ 1, φ_k ≥ 0, (3.19) and (3.18) imply that
trace(B̂_{k+1}) ≤ trace(B̂_k) + a_9 + (φ_k a_9 a_10 a_7 + 2 φ_k a_12 a_7) ||B̃_k s_k||/||s_k||.

Since ||B̃_k s_k||/||s_k|| ≤ ||B̃_k|| = σ_1 ≤ Σ_{i=1}^{d} σ_i = trace(B̃_k), where σ_1 ≥ σ_2 ≥ ... ≥ σ_d are the singular values of B̃_k, we have trace(B̂_{k+1}) ≤ a_9 + (1 + φ_k a_9 a_10 a_7 + 2 φ_k a_12 a_7) trace(B̂_k). This inequality implies that there exists a constant b_0 > 0 such that
(3.20) trace(B̂_{k+1}) ≤ b_0^{k+1}.
Using (3.12) and (3.17), we have
(3.21) det(B̂_{k+1}) ≥ [1/(a_10 α_k)] det(B̂_k) ≥ det(B̂_1) ∏_{j=1}^{k} 1/(a_10 α_j).
From the geometric/arithmetic mean inequality (for x_i ≥ 0, (∏_{i=1}^{d} x_i)^{1/d} ≤ Σ_{i=1}^{d} x_i/d) applied to the eigenvalues of B̂_{k+1}, we know det(B̂_{k+1}) ≤ (trace(B̂_{k+1})/d)^d, where d is the dimension of the manifold M. Therefore, by (3.20) and (3.21),
∏_{j=1}^{k} 1/(a_10 α_j) ≤ det(B̂_{k+1})/det(B̂_1) ≤ [1/det(B̂_1)] (trace(B̂_{k+1})/d)^d ≤ [1/det(B̂_1)] d^{−d} b_0^{d(k+1)}.
Thus there exists a constant a_13 > 0 such that ∏_{j=1}^{k} α_j ≥ a_13^k, for all k ≥ 1.

3.3. Main convergence result. With the preliminary lemmas in place, the main convergence result can be stated and proven in a manner that closely follows the Euclidean proof of [11, Theorem 3.1].

Theorem 3.1. Suppose Assumptions 3.1, 3.2 and 3.3 hold and φ_k ∈ [0, 1 − δ]. Then the sequence {x_k} generated by Algorithm 1 converges to a minimizer x* of f.

Proof. Inequality (3.18) can be written as
(3.22) trace(B̂_{k+1}) ≤ trace(B̂_k) + a_9 + t_k α_k,
where t_k = φ_k a_9 a_10 − a_11 (1 − φ_k)/cos^2 θ_k + 2 φ_k a_12 / cos θ_k. The proof is by contradiction. Assume cos θ_k → 0; then t_k → −∞ since φ_k is bounded away from 1, i.e., φ_k ≤ 1 − δ. So there exists a constant K_0 > 0 such that t_k < −2 a_9/a_13 for all k ≥ K_0. Using (3.22) and that B̂_{k+1} is positive definite, we have
(3.23) 0 < trace(B̂_{k+1}) ≤ trace(B̂_{K_0}) + a_9 (k + 1 − K_0) + Σ_{j=K_0}^{k} t_j α_j < trace(B̂_{K_0}) + a_9 (k + 1 − K_0) − (2 a_9/a_13) Σ_{j=K_0}^{k} α_j.
Applying the geometric/arithmetic mean inequality to (3.15), we get Σ_{j=1}^{k} α_j ≥ k a_13 and therefore
(3.24) Σ_{j=K_0}^{k} α_j ≥ k a_13 − Σ_{j=1}^{K_0 − 1} α_j.

Plugging (3.24) into (3.23), we obtain
0 < trace(B̂_{K_0}) + a_9 (k + 1 − K_0) − (2 a_9/a_13) [k a_13 − Σ_{j=1}^{K_0−1} α_j] = trace(B̂_{K_0}) + a_9 (1 − K_0) − a_9 k + (2 a_9/a_13) Σ_{j=1}^{K_0−1} α_j.
For k large enough, the right-hand side of the inequality is negative, which contradicts the assumption that cos θ_k → 0. Therefore there exist a constant ϑ and a subsequence such that cos θ_{k_j} > ϑ > 0 for all j, i.e., there is a subsequence that does not converge to 0. Applying Lemma 3.7 completes the proof.

4. Ensuring the locking condition. In order to apply an algorithm in the RBroyden family, we must specify a retraction R and an isometric vector transport T_S that satisfy the locking condition (2.8). The exponential mapping and parallel transport satisfy condition (2.8) with β = 1. However, for some manifolds, we do not have an analytical form of the exponential mapping and parallel transport. Even if a form is known, its evaluation may be unacceptably expensive. Two methods of constructing an isometric vector transport given a retraction, and a method for constructing a retraction and an isometric vector transport simultaneously, are discussed in this section. In practice, the choice of the pair must also consider whether an efficient implementation is possible.

In Sections 4.1 and 4.2, the differentiated retraction T_{R_η} ξ is only needed for η and ξ in the same direction, where η, ξ ∈ T_xM and x ∈ M. A reduction of complexity can follow from this restricted usage of T_R. We illustrate this point for the Stiefel manifold St(p, n) = {X ∈ R^{n×p} : X^T X = I_p}, as it is used in our experiments. Consider the retraction by polar decomposition [2, (4.8)],
(4.1) R_x(η) = (x + η)(I_p + η^T η)^{-1/2}.
For retraction (4.1), as shown in [17, (10.2.7)], the differentiated retraction is T_{R_η} ξ = z Λ + (I_n − z z^T) ξ (z^T(x + η))^{-1}, where z = R_x(η), Λ is the unique solution of the Sylvester equation
(4.2) z^T ξ − ξ^T z = Λ P + P Λ,
and P = z^T(x + η) = (I_p + η^T η)^{1/2}. Now consider the more specific task of computing T_{R_η} ξ for η and ξ in the same direction. Since T_{R_η} ξ is linear in ξ, we can assume without loss of generality that ξ = η. Then the solution of (4.2) has a closed form, i.e., Λ = P^{-1} x^T η P^{-1}. This can be seen from
z^T η − η^T z = z^T(η + x − x) − (η + x − x)^T z = x^T z − z^T x + z^T(η + x) − (η + x)^T z
= x^T z − z^T x = x^T (x + η)(I_p + η^T η)^{-1/2} − (I_p + η^T η)^{-1/2} (x + η)^T x
= x^T η P^{-1} − P^{-1} η^T x = x^T η P^{-1} + P^{-1} x^T η   (x^T η is a skew-symmetric matrix)
= P (P^{-1} x^T η P^{-1}) + (P^{-1} x^T η P^{-1}) P.
The closed-form solution of Λ yields a form, T_{R_η} η = (I_n − z P^{-1} η^T) η P^{-1}, with lower complexity. We are not aware of such a low-complexity closed-form solution for T_{R_η} ξ when η and ξ are not in the same direction.
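To illustrate the closed form just derived, here is a small Python (NumPy/SciPy) sketch that evaluates the polar retraction (4.1) and T_{R_η} η = (I_n − z P^{-1} η^T) η P^{-1} on St(p, n). The function names are ours, and scipy.linalg.sqrtm is used for the matrix square root P = (I_p + η^T η)^{1/2}.

import numpy as np
from scipy.linalg import sqrtm

def polar_retraction(X, eta):
    # R_X(eta) = (X + eta)(I_p + eta^T eta)^{-1/2}, see (4.1);
    # eta is assumed to be a tangent vector, i.e., X^T eta is skew-symmetric.
    p = X.shape[1]
    P = np.real(sqrtm(np.eye(p) + eta.T @ eta))   # P = (I_p + eta^T eta)^{1/2}
    return (X + eta) @ np.linalg.inv(P), P

def diff_retraction_along_step(X, eta):
    # Closed form of T_{R_eta} eta = (I_n - z P^{-1} eta^T) eta P^{-1},
    # valid when the two arguments of T_R point in the same direction.
    z, P = polar_retraction(X, eta)
    Pinv = np.linalg.inv(P)
    return (eta - z @ (Pinv @ (eta.T @ eta))) @ Pinv

The scalar β = ||η||/||T_{R_η} η|| needed in Step 6 of Algorithm 1 then costs only one additional Frobenius norm; a finite-difference quotient such as (R_X((1 + t)η) − R_X((1 − t)η))/(2t) for small t provides a quick consistency check of the closed form.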

4.1. Method 1: from a retraction and an isometric vector transport. Given a retraction R, if an associated isometric vector transport T_I for which there is an efficient implementation is known, then T_I can be modified so that it satisfies condition (2.8). Consider x ∈ M, η ∈ T_xM, y = R_x(η), and define the tangent vectors ξ_1 = T_{I_η} η and ξ_2 = β T_{R_η} η with the normalizing scalar β = ||η||/||T_{R_η} η||. The desired isometric vector transport is
(4.3) T_{S_η} ξ = [id − 2 ν_2 ν_2^♭/(ν_2^♭ ν_2)] [id − 2 ν_1 ν_1^♭/(ν_1^♭ ν_1)] T_{I_η} ξ,
where ν_1 = ξ_1 − ω and ν_2 = ω − ξ_2. Here ω can be any tangent vector in T_yM which satisfies ||ω|| = ||ξ_1|| = ||ξ_2||. If ω is chosen in the space spanned by {ξ_1, ξ_2}, e.g., ω = −ξ_1 or −ξ_2, then the resulting map is the well-known direct rotation from ξ_1 to ξ_2 in the inner product that defines ||·||. The use of the negative sign avoids numerical cancellation as ξ_1 approaches ξ_2, i.e., near convergence. Two Householder reflectors are used to preserve the orientation, which is sufficient to make T_S satisfy the consistency condition (ii) of vector transport. It can be shown that this T_S satisfies conditions (2.5) and (2.6) [17, Theorem 4.4.1].

4.2. Method 2: from a retraction and bases of tangent spaces. Method 1 modifies a given isometric vector transport T_I. In this section, we show how to construct an isometric vector transport T_I from a field of orthonormal tangent bases. Let d denote the dimension of the manifold M and let the function giving a basis of T_xM be B: x ↦ B(x) = (b_1, b_2, ..., b_d), where b_i, 1 ≤ i ≤ d, form an orthonormal basis of T_xM. Consider x ∈ M, η, ξ ∈ T_xM, y = R_x(η), B_1 = B(x) and B_2 = B(y). Define
B_1^♭: T_xM → R^d: η_x ↦ [g_x(b_1^{(1)}, η_x) ... g_x(b_d^{(1)}, η_x)]^T,
where b_i^{(1)} is the i-th element of B_1, and likewise for B_2^♭. Then T_I in Method 1 can be chosen to be defined by T_{I_η} = B_2 B_1^♭. Simplifying (4.3) yields the desired isometric vector transport
T_{S_η} ξ = B_2 [I − 2 v_2 v_2^T/(v_2^T v_2)] [I − 2 v_1 v_1^T/(v_1^T v_1)] B_1^♭ ξ,
where v_1 = B_1^♭ η − w and v_2 = w − β B_2^♭ T_{R_η} η. Here w can be any vector such that ||w|| = ||B_1^♭ η|| = ||β B_2^♭ T_{R_η} η||, and choosing w = −B_1^♭ η or −β B_2^♭ T_{R_η} η yields a direct rotation.

The problem therefore becomes how to build the function B. Absil et al. [2, p. 37] give an approach based on (U, ϕ), a chart of the manifold M, that yields a smooth basis field defined in the chart domain. E_i, the i-th coordinate vector field of (U, ϕ) on U, is defined by (E_i f)(x) := ∂_i (f ∘ ϕ^{-1})(ϕ(x)) = D(f ∘ ϕ^{-1})(ϕ(x))[e_i]. These coordinate vector fields are smooth and every vector field ξ on M admits the decomposition ξ = Σ_i (ξ ϕ_i) E_i on U, where ϕ_i is the i-th component of ϕ. The function B̃: x ↦ B̃(x) = (E_1, E_2, ..., E_d) is a smooth function that builds a basis on U. Finally, any orthogonalization method, such as the Gram-Schmidt algorithm or a QR decomposition, can be used to get an orthonormal basis, giving the function B: x ↦ B(x) = B̃(x) M(x), where M(x) is an upper triangular matrix with positive diagonal entries.
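The two-reflection construction used in Methods 1 and 2 can be sketched in coordinates as follows; this is a minimal Python (NumPy) illustration under the sign convention adopted above, with the particular choice ω = −ξ_2, and the function names are ours rather than the paper's. The composed map is isometric, acts as the identity on the orthogonal complement of span{ξ_1, ξ_2}, and sends ξ_1 to ξ_2.

import numpy as np

def householder_apply(v, x):
    # Apply the reflector id - 2 v v^T / (v^T v) to x (no-op if v = 0).
    vv = v @ v
    return x if vv == 0.0 else x - 2.0 * (v @ x) / vv * v

def direct_rotation_apply(xi1, xi2, x):
    # Isometric map sending xi1 to xi2 (assumes ||xi1|| = ||xi2||), built from
    # two Householder reflections through omega = -xi2: first xi1 -> omega,
    # then omega -> xi2.  No small Householder vectors arise when xi1 ~ xi2.
    omega = -xi2
    nu1 = xi1 - omega      # reflects xi1 onto omega
    nu2 = omega - xi2      # reflects omega onto xi2
    return householder_apply(nu2, householder_apply(nu1, x))

In Method 1 this map is applied, in T_yM, to T_{I_η} ξ with ξ_1 = T_{I_η} η and ξ_2 = β T_{R_η} η, so that the modified transport satisfies T_{S_η} η = β T_{R_η} η, which is the locking condition (2.8).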

4.3. Method 3: from a transporter. Let L(TM, TM) denote the fiber bundle with base space M × M such that the fiber over (x, y) ∈ M × M is L(T_xM, T_yM), the set of all linear maps from T_xM to T_yM. We define a transporter L on M to be a smooth section of the bundle L(TM, TM), that is, for (x, y) ∈ M × M, L(x, y) ∈ L(T_xM, T_yM) such that, for all x ∈ M, L(x, x) = id. Given a retraction R, it can be shown that T defined by
(4.4) T_{η_x} ξ_x = L(x, R_x(η_x)) ξ_x
is a vector transport with associated retraction R. Moreover, if L(x, y) is isometric from T_xM to T_yM, then the vector transport (4.4) is isometric. (The term transporter was used previously in [24] in the context of embedded submanifolds.)

In this section, it is assumed that an efficient transporter L is given, and we show that the retraction R defined by the differential equation
(4.5) (d/dt) R_x(tη) = L(x, R_x(tη)) η, R_x(0) = x,
and the resulting vector transport T defined by (4.4) satisfy the locking condition (2.8). To this end, let η be an arbitrary vector in T_xM. We have
T_η η = L(x, R_x(η)) η = (d/dt) R_x(tη)|_{t=1} = (d/dτ) R_x(η + τη)|_{τ=0} = T_{R_η} η,
where we have used (4.4), (4.5) and (2.7). That is, R and T satisfy the locking condition (2.8) with β = 1. For some manifolds, there exist transporters such that (4.5) has a closed-form solution, e.g., the Stiefel manifold [17] and the Grassmann manifold [17].

5. Limited-memory RBFGS. In the form of the RBroyden methods discussed above, explicit representations are needed for the operators B_k, B̃_k, T_{S_{α_k η_k}}, and T_{S_{α_k η_k}}^{-1}. These may not be available. Furthermore, even if explicit expressions are known, applying them may be unacceptably expensive computationally, e.g., the matrix multiplication required in the update of B_k. Generalizations of the Euclidean limited-memory BFGS method can solve this problem for RBFGS. The idea of limited-memory RBFGS (LRBFGS) is to store some number of the most recent s_k and y_k and to transport those vectors to the new tangent space rather than the entire matrix H_k. For RBFGS, the inverse update formula is
H_{k+1} = V_k^* H̃_k V_k + ρ_k s_k s_k^♭,
where H̃_k = T_{S_{α_k η_k}} ∘ H_k ∘ T_{S_{α_k η_k}}^{-1}, ρ_k = 1/g(y_k, s_k), V_k = id − ρ_k y_k s_k^♭, and V_k^* denotes the adjoint of V_k. If the l + 1 most recent s_k and y_k are stored, then we have
H_{k+1} = Ṽ_k^* Ṽ_{k−1}^* ⋯ Ṽ_{k−l}^* H_{k+1}^0 Ṽ_{k−l} ⋯ Ṽ_{k−1} Ṽ_k + ρ_{k−l} Ṽ_k^* ⋯ Ṽ_{k−l+1}^* s_{k−l}^{(k+1)} (s_{k−l}^{(k+1)})^♭ Ṽ_{k−l+1} ⋯ Ṽ_k + ⋯ + ρ_k s_k^{(k+1)} (s_k^{(k+1)})^♭,
where Ṽ_i = id − ρ_i y_i^{(k+1)} (s_i^{(k+1)})^♭, H_{k+1}^0 is the initial Hessian approximation for step k + 1, s_i^{(k+1)} represents a tangent vector in T_{x_{k+1}}M given by transporting s_i, and likewise for y_i^{(k+1)}. The details of transporting s_i, y_i to s_i^{(k+1)}, y_i^{(k+1)} are given later. Note that H_{k+1}^0 is not necessarily H̃_{k−l}. It can be any positive definite self-adjoint operator. Similarly to the Euclidean case, we use
(5.1) H_{k+1}^0 = [g(s_k, y_k)/g(y_k, y_k)] id.
It is easily seen that Steps 4 to 13 of Algorithm 2 yield η_{k+1} = −H_{k+1} grad f(x_{k+1}) (generalized from the two-loop recursion, see, e.g., [22, Algorithm 7.4]), and only the action of T_S is needed.

Algorithm 2 LRBFGS
Input: Riemannian manifold M with Riemannian metric g; retraction R; isometric vector transport T_S that satisfies (2.8); smooth function f on M; initial iterate x_0 ∈ M; an integer l > 0.
1: k = 0, ε > 0, 0 < c_1 < 1/2 < c_2 < 1, γ_0 = 1, ℓ = 0;
2: while ||grad f(x_k)|| > ε do
3: H_k^0 = γ_k id. Obtain η_k ∈ T_{x_k}M by the following algorithm:
4: q ← grad f(x_k);
5: for i = k − 1, k − 2, ..., ℓ do
6: ξ_i ← ρ_i g(s_i^{(k)}, q);
7: q ← q − ξ_i y_i^{(k)};
8: end for
9: r ← H_k^0 q;
10: for i = ℓ, ℓ + 1, ..., k − 1 do
11: ω ← ρ_i g(y_i^{(k)}, r);
12: r ← r + s_i^{(k)} (ξ_i − ω);
13: end for
14: set η_k = −r;
15: find α_k that satisfies the Wolfe conditions
f(x_{k+1}) ≤ f(x_k) + c_1 α_k g(grad f(x_k), η_k),
(d/dt) f(R_{x_k}(t η_k))|_{t=α_k} ≥ c_2 (d/dt) f(R_{x_k}(t η_k))|_{t=0};
16: Set x_{k+1} = R_{x_k}(α_k η_k);
17: Define s_k^{(k+1)} = T_{S_{α_k η_k}} α_k η_k, y_k^{(k+1)} = grad f(x_{k+1})/β_k − T_{S_{α_k η_k}} grad f(x_k), ρ_k = 1/g(s_k^{(k+1)}, y_k^{(k+1)}), γ_{k+1} = g(s_k^{(k+1)}, y_k^{(k+1)})/||y_k^{(k+1)}||^2, and β_k = ||α_k η_k||/||T_{R_{α_k η_k}} α_k η_k||;
18: Let ℓ = max{k − l, 0}. Add s_k^{(k+1)}, y_k^{(k+1)} and ρ_k into storage and, if k > l, then discard the vector pair {s_{ℓ−1}^{(k)}, y_{ℓ−1}^{(k)}} and the scalar ρ_{ℓ−1} from storage; transport s_ℓ^{(k)}, s_{ℓ+1}^{(k)}, ..., s_{k−1}^{(k)} and y_ℓ^{(k)}, y_{ℓ+1}^{(k)}, ..., y_{k−1}^{(k)} from T_{x_k}M to T_{x_{k+1}}M by T_S, to get s_ℓ^{(k+1)}, s_{ℓ+1}^{(k+1)}, ..., s_{k−1}^{(k+1)} and y_ℓ^{(k+1)}, y_{ℓ+1}^{(k+1)}, ..., y_{k−1}^{(k+1)};
19: k = k + 1;
20: end while

Algorithm 2 is a limited-memory algorithm based on this idea. Note Step 18 of Algorithm 2. The vector s_m^{(k+1)} is obtained by transporting s_m^{(m+1)} a total of k − m times. If the vector transport is insensitive to finite precision arithmetic, then the approach is acceptable. Otherwise, s_m^{(k+1)} may not be in T_{x_{k+1}}M. Care must be taken to avoid this situation. One possibility is to project s_i^{(k+1)}, y_i^{(k+1)}, i = ℓ, ℓ + 1, ..., k − 1, to the tangent space T_{x_{k+1}}M after every transport.
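For reference, Steps 4 to 14 of Algorithm 2 are the Riemannian analogue of the classical two-loop recursion. The following Python (NumPy) sketch shows the Euclidean-coordinate version, i.e., what the loop computes once all stored s_i^{(k)}, y_i^{(k)} have been transported into T_{x_k}M and expressed in an orthonormal basis; the names are illustrative.

import numpy as np

def lrbfgs_direction(grad, s_list, y_list, rho_list, gamma):
    # s_list, y_list, rho_list: stored pairs s_i^{(k)}, y_i^{(k)} and scalars
    # rho_i = 1/g(s_i, y_i), ordered from oldest to newest.
    # gamma: scaling of the initial approximation H_k^0 = gamma * id, cf. (5.1).
    q = grad.copy()
    xi = [0.0] * len(s_list)
    for i in reversed(range(len(s_list))):   # Steps 5-8: newest to oldest
        xi[i] = rho_list[i] * (s_list[i] @ q)
        q = q - xi[i] * y_list[i]
    r = gamma * q                            # Step 9: r = H_k^0 q
    for i in range(len(s_list)):             # Steps 10-13: oldest to newest
        omega = rho_list[i] * (y_list[i] @ r)
        r = r + s_list[i] * (xi[i] - omega)
    return -r                                # Step 14: eta_k = -r

The returned vector equals −H grad f, where H is the operator obtained from the recursive expansion displayed in Section 5, but the computation needs only inner products and vector updates, which is why only the action of T_S on the stored pairs is required.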

6. Ring and Wirth's RBFGS update formula. In Ring and Wirth's RBFGS [26] for infinite-dimensional Riemannian manifolds, the direction vector η_k is chosen as the solution in T_{x_k}M of the equation B_k^{RW}(η_k, ξ) = −Df_{R_{x_k}}(0)[ξ] = g(−grad f(x_k), ξ) for all ξ ∈ T_{x_k}M, where f_{R_{x_k}} = f ∘ R_{x_k}. The update in [26] is
B_{k+1}^{RW}(T_{S_{α_k η_k}} ζ, T_{S_{α_k η_k}} ξ) = B_k^{RW}(ζ, ξ) − B_k^{RW}(s_k, ζ) B_k^{RW}(s_k, ξ)/B_k^{RW}(s_k, s_k) + y_k^{RW}(ζ) y_k^{RW}(ξ)/y_k^{RW}(s_k),
where s_k = R_{x_k}^{-1}(x_{k+1}) ∈ T_{x_k}M and y_k^{RW} = Df_{R_{x_k}}(s_k) − Df_{R_{x_k}}(0) is a cotangent vector at x_k, i.e., Df_{R_{x_k}}(s_k)[ξ] = g(grad f(R_{x_k}(s_k)), T_{R_{s_k}} ξ) for all ξ ∈ T_{x_k}M. One can equivalently define y_k^{RW} to be y_k^♭, where y_k = T_{R_{s_k}}^* grad f(R_{x_k}(s_k)) − grad f(x_k) is in the tangent space T_{x_k}M. Similarly, in order to ease the comparison by falling back to the conventions of Algorithm 1, define B_k to be the linear transformation of T_{x_k}M given by B_k^{RW}(ζ, ξ) = ζ^♭ B_k ξ. Then the update in [26] becomes
T_{S_{α_k η_k}}^{-1} B_{k+1} T_{S_{α_k η_k}} = B_k − B_k s_k (B_k s_k)^♭/[(B_k s_k)^♭ s_k] + y_k y_k^♭/[y_k^♭ s_k].
Since the vector transport is isometric, the update can be equivalently applied on T_{x_{k+1}}M and yields
B_{k+1} = B̃_k − B̃_k s_k (B̃_k s_k)^♭/[(B̃_k s_k)^♭ s_k] + ỹ_k ỹ_k^♭/[ỹ_k^♭ s_k],
where s_k and B̃_k are defined in Algorithm 1 and
(6.1) ỹ_k = T_{S_{α_k η_k}} T_{R_{s_k}}^* grad f(R_{x_k}(s_k)) − T_{S_{α_k η_k}} grad f(x_k) ∈ T_{x_{k+1}}M.
Comparing the definition of y_k in Algorithm 1 to (6.1), one can see that the difference between the updates in Algorithm 1 and [26] is that y_k uses the scalar 1/β_k while ỹ_k uses T_{S_{α_k η_k}} T_{R_{s_k}}^*.

7. Experiments. To investigate the performance of the RBroyden family of methods, we consider the minimization of the Brockett cost function f: St(p, n) → R: X ↦ trace(X^T A X N), which finds the p smallest eigenvalues and corresponding eigenvectors of A, where N = diag(µ_1, ..., µ_p) with µ_1 > ⋯ > µ_p > 0, A ∈ R^{n×n} and A = A^T. It has been shown that the columns of a global minimizer, X_* e_i, are eigenvectors for the p smallest eigenvalues λ_i, ordered so that λ_1 ≤ ⋯ ≤ λ_p [2, Section 4.8]. The Brockett cost function is not a retraction-convex function on the entire domain. However, for generic choices of A, the cost function is strongly retraction-convex in a sublevel set around any global minimizer. RBroyden algorithms do not require retraction-convexity to be well-defined, and all converge to a global minimizer of the Brockett cost function if started sufficiently close. For our experiments, we view St(p, n) as an embedded submanifold of the Euclidean space R^{n×p} endowed with the Euclidean metric ⟨A, B⟩ = trace(A^T B). The corresponding gradient can be found in [2, p. 80].

7.1. Methods tested. For problems with sizes moderate enough to make dense matrix operations acceptable in the complexity of a single step, we tested algorithms in the RBroyden family using the inverse Hessian approximation update
(7.1) H_{k+1} = H̃_k − H̃_k y_k (H̃_k y_k)^♭/[y_k^♭ H̃_k y_k] + s_k s_k^♭/[y_k^♭ s_k] + φ̂_k g(y_k, H̃_k y_k) u_k u_k^♭,
where u_k = s_k/g(s_k, y_k) − H̃_k y_k/g(y_k, H̃_k y_k).

It can be shown that if H_k = B_k^{-1}, φ̂_k ∈ [0, 1), and
(7.2) φ_k = (1 − φ̂_k) g^2(y_k, s_k) / [(1 − φ̂_k) g^2(y_k, s_k) + φ̂_k g(y_k, H̃_k y_k) g(s_k, B̃_k s_k)] ∈ (0, 1],
then H_{k+1} = B_{k+1}^{-1}. The Euclidean version of the relationship between φ̂_k and φ_k can be found, e.g., in [16, (50)]. In our tests, we set the variable φ̂_k to Davidon's value φ̂_k^D defined in (7.6), or we set φ̂_k to a constant value φ̂ ∈ {1.0, 0.8, 0.6, 0.4, 0.2, 0.1, 0.01, 0}. For these problem sizes, the inverse Hessian approximation update tends to be preferred to the Hessian approximation update since it avoids solving a linear system. Our experiments on problems of moderate size also include the RBFGS in [26] with the inverse Hessian update formula [26, (7)]. Finally, we investigate the performance of the LRBFGS for problems of moderate size and problems with much larger sizes. In the latter cases, since dense matrix operations are too expensive computationally and spatially, LRBFGS is compared to a Riemannian conjugate gradient method, RCG, described in [2]. All experiments were performed in Matlab R2014a on a 64-bit Ubuntu platform with a 3.6 GHz CPU (Intel(R) Core(TM) i7-4790).

7.2. Vector transport and retraction. Sections 2.2 and 2.3 in [18] provide methods for representing a tangent vector and constructing isometric vector transports when the d-dimensional manifold M is a submanifold of R^w. If the codimension w − d is not much smaller than w, then a d-dimensional representation for a tangent vector and an isometric transporter by parallelization, L(x, y) = B_y B_x^♭, are preferred, where B: V → R^{w×d}: z ↦ B_z is a smooth function defined on an open set V of M and the columns of B_z form an orthonormal basis of T_zM. Otherwise, a w-dimensional representation for a tangent vector and an isometric transporter by rigging,
L(x, y) = G_y^{-1/2} (I − Q_x Q_x^T − Q_y Q_x^T) G_x^{1/2},
are preferred, where G_z denotes a matrix expression of g_z, i.e., g_z(η_z, ξ_z) = η_z^T G_z ξ_z, Q_x is given by orthonormalizing (I − N_x (N_x^T N_x)^{-1} N_x^T) N_y, likewise with x and y interchanged for Q_y, and N: V → R^{w×(w−d)}: z ↦ N_z is a smooth function defined on an open set V of M whose columns N_z form an orthonormal basis of the normal space at z. In particular, for St(p, n), w and d are np and np − p(p + 1)/2, respectively. Methods of obtaining smooth functions to build smooth bases of the tangent and normal spaces for St(p, n) are discussed in [18, Section 5].

We use the ideas in Section 4.3 applied to the isometric transporter derived from the parallelization B introduced in [18, Section 5] to define a retraction R. The details, worked out in [17], yield the following result. Let X ∈ St(p, n). We have
(7.3) (R_X(η)  R_X(η)_⊥) = (X  X_⊥) exp( [Ω  −K^T; K  0_{(n−p)×(n−p)}] ),
where Ω = X^T η, K = X_⊥^T η, and, given M ∈ R^{n×p}, M_⊥ ∈ R^{n×(n−p)} denotes a matrix that satisfies M_⊥^T M_⊥ = I_{(n−p)×(n−p)} and M_⊥^T M = 0_{(n−p)×p}. The function
(7.4) Y = R_X(η)
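For concreteness, the following Python (NumPy) sketch evaluates the Brockett cost function and its Riemannian gradient on St(p, n) viewed as an embedded submanifold of R^{n×p} with the Euclidean metric, using the standard tangent-space projection ξ ↦ ξ − X sym(X^T ξ); the function names are ours, and the gradient expression is the one referred to above from [2, p. 80].

import numpy as np

def brockett_cost(X, A, N):
    # f(X) = trace(X^T A X N) on St(p, n).
    return np.trace(X.T @ A @ X @ N)

def brockett_rgrad(X, A, N):
    # Euclidean gradient 2 A X N projected onto T_X St(p, n):
    # grad f(X) = G - X sym(X^T G), with sym(M) = (M + M^T)/2.
    G = 2.0 * A @ X @ N
    XtG = X.T @ G
    return G - X @ (XtG + XtG.T) / 2.0

A typical test instance (an assumption for illustration, not the paper's exact setup) takes A a random symmetric n×n matrix and N = diag(p, p − 1, ..., 1), which satisfies µ_1 > ⋯ > µ_p > 0; at a global minimizer the columns of X span the eigenspace associated with the p smallest eigenvalues of A and the Riemannian gradient vanishes.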


More information

Statistics 580 Optimization Methods

Statistics 580 Optimization Methods Statistics 580 Optimization Methods Introduction Let fx be a given real-valued function on R p. The general optimization problem is to find an x ɛ R p at which fx attain a maximum or a minimum. It is of

More information

Implicit Functions, Curves and Surfaces

Implicit Functions, Curves and Surfaces Chapter 11 Implicit Functions, Curves and Surfaces 11.1 Implicit Function Theorem Motivation. In many problems, objects or quantities of interest can only be described indirectly or implicitly. It is then

More information

A Variational Model for Data Fitting on Manifolds by Minimizing the Acceleration of a Bézier Curve

A Variational Model for Data Fitting on Manifolds by Minimizing the Acceleration of a Bézier Curve A Variational Model for Data Fitting on Manifolds by Minimizing the Acceleration of a Bézier Curve Ronny Bergmann a Technische Universität Chemnitz Kolloquium, Department Mathematik, FAU Erlangen-Nürnberg,

More information

Quasi-Newton methods for minimization

Quasi-Newton methods for minimization Quasi-Newton methods for minimization Lectures for PHD course on Numerical optimization Enrico Bertolazzi DIMS Universitá di Trento November 21 December 14, 2011 Quasi-Newton methods for minimization 1

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

Math Linear Algebra II. 1. Inner Products and Norms

Math Linear Algebra II. 1. Inner Products and Norms Math 342 - Linear Algebra II Notes 1. Inner Products and Norms One knows from a basic introduction to vectors in R n Math 254 at OSU) that the length of a vector x = x 1 x 2... x n ) T R n, denoted x,

More information

R-Linear Convergence of Limited Memory Steepest Descent

R-Linear Convergence of Limited Memory Steepest Descent R-Linear Convergence of Limited Memory Steepest Descent Fran E. Curtis and Wei Guo Department of Industrial and Systems Engineering, Lehigh University, USA COR@L Technical Report 16T-010 R-Linear Convergence

More information

Algorithms for Constrained Optimization

Algorithms for Constrained Optimization 1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic

More information

ON THE CONNECTION BETWEEN THE CONJUGATE GRADIENT METHOD AND QUASI-NEWTON METHODS ON QUADRATIC PROBLEMS

ON THE CONNECTION BETWEEN THE CONJUGATE GRADIENT METHOD AND QUASI-NEWTON METHODS ON QUADRATIC PROBLEMS ON THE CONNECTION BETWEEN THE CONJUGATE GRADIENT METHOD AND QUASI-NEWTON METHODS ON QUADRATIC PROBLEMS Anders FORSGREN Tove ODLAND Technical Report TRITA-MAT-203-OS-03 Department of Mathematics KTH Royal

More information

Nonlinear Programming

Nonlinear Programming Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week

More information

An implementation of the steepest descent method using retractions on riemannian manifolds

An implementation of the steepest descent method using retractions on riemannian manifolds An implementation of the steepest descent method using retractions on riemannian manifolds Ever F. Cruzado Quispe and Erik A. Papa Quiroz Abstract In 2008 Absil et al. 1 published a book with optimization

More information

Optimization II: Unconstrained Multivariable

Optimization II: Unconstrained Multivariable Optimization II: Unconstrained Multivariable CS 205A: Mathematical Methods for Robotics, Vision, and Graphics Justin Solomon CS 205A: Mathematical Methods Optimization II: Unconstrained Multivariable 1

More information

fy (X(g)) Y (f)x(g) gy (X(f)) Y (g)x(f)) = fx(y (g)) + gx(y (f)) fy (X(g)) gy (X(f))

fy (X(g)) Y (f)x(g) gy (X(f)) Y (g)x(f)) = fx(y (g)) + gx(y (f)) fy (X(g)) gy (X(f)) 1. Basic algebra of vector fields Let V be a finite dimensional vector space over R. Recall that V = {L : V R} is defined to be the set of all linear maps to R. V is isomorphic to V, but there is no canonical

More information

2. Review of Linear Algebra

2. Review of Linear Algebra 2. Review of Linear Algebra ECE 83, Spring 217 In this course we will represent signals as vectors and operators (e.g., filters, transforms, etc) as matrices. This lecture reviews basic concepts from linear

More information

Lecture 14: October 17

Lecture 14: October 17 1-725/36-725: Convex Optimization Fall 218 Lecture 14: October 17 Lecturer: Lecturer: Ryan Tibshirani Scribes: Pengsheng Guo, Xian Zhou Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

Convex Optimization. Problem set 2. Due Monday April 26th

Convex Optimization. Problem set 2. Due Monday April 26th Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining

More information

ORIE 6326: Convex Optimization. Quasi-Newton Methods

ORIE 6326: Convex Optimization. Quasi-Newton Methods ORIE 6326: Convex Optimization Quasi-Newton Methods Professor Udell Operations Research and Information Engineering Cornell April 10, 2017 Slides on steepest descent and analysis of Newton s method adapted

More information

Convergence analysis of Riemannian trust-region methods

Convergence analysis of Riemannian trust-region methods Convergence analysis of Riemannian trust-region methods P.-A. Absil C. G. Baker K. A. Gallivan Optimization Online technical report Compiled June 19, 2006 Abstract This document is an expanded version

More information

DEVELOPMENT OF MORSE THEORY

DEVELOPMENT OF MORSE THEORY DEVELOPMENT OF MORSE THEORY MATTHEW STEED Abstract. In this paper, we develop Morse theory, which allows us to determine topological information about manifolds using certain real-valued functions defined

More information

We describe the generalization of Hazan s algorithm for symmetric programming

We describe the generalization of Hazan s algorithm for symmetric programming ON HAZAN S ALGORITHM FOR SYMMETRIC PROGRAMMING PROBLEMS L. FAYBUSOVICH Abstract. problems We describe the generalization of Hazan s algorithm for symmetric programming Key words. Symmetric programming,

More information

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss

More information

Quasi-Newton Methods

Quasi-Newton Methods Newton s Method Pros and Cons Quasi-Newton Methods MA 348 Kurt Bryan Newton s method has some very nice properties: It s extremely fast, at least once it gets near the minimum, and with the simple modifications

More information

4.2. ORTHOGONALITY 161

4.2. ORTHOGONALITY 161 4.2. ORTHOGONALITY 161 Definition 4.2.9 An affine space (E, E ) is a Euclidean affine space iff its underlying vector space E is a Euclidean vector space. Given any two points a, b E, we define the distance

More information

I teach myself... Hilbert spaces

I teach myself... Hilbert spaces I teach myself... Hilbert spaces by F.J.Sayas, for MATH 806 November 4, 2015 This document will be growing with the semester. Every in red is for you to justify. Even if we start with the basic definition

More information

Convex Optimization CMU-10725

Convex Optimization CMU-10725 Convex Optimization CMU-10725 Quasi Newton Methods Barnabás Póczos & Ryan Tibshirani Quasi Newton Methods 2 Outline Modified Newton Method Rank one correction of the inverse Rank two correction of the

More information

Chapter 4. Unconstrained optimization

Chapter 4. Unconstrained optimization Chapter 4. Unconstrained optimization Version: 28-10-2012 Material: (for details see) Chapter 11 in [FKS] (pp.251-276) A reference e.g. L.11.2 refers to the corresponding Lemma in the book [FKS] PDF-file

More information

NONSMOOTH VARIANTS OF POWELL S BFGS CONVERGENCE THEOREM

NONSMOOTH VARIANTS OF POWELL S BFGS CONVERGENCE THEOREM NONSMOOTH VARIANTS OF POWELL S BFGS CONVERGENCE THEOREM JIAYI GUO AND A.S. LEWIS Abstract. The popular BFGS quasi-newton minimization algorithm under reasonable conditions converges globally on smooth

More information

Spectral Processing. Misha Kazhdan

Spectral Processing. Misha Kazhdan Spectral Processing Misha Kazhdan [Taubin, 1995] A Signal Processing Approach to Fair Surface Design [Desbrun, et al., 1999] Implicit Fairing of Arbitrary Meshes [Vallet and Levy, 2008] Spectral Geometry

More information

1 Numerical optimization

1 Numerical optimization Contents 1 Numerical optimization 5 1.1 Optimization of single-variable functions............ 5 1.1.1 Golden Section Search................... 6 1.1. Fibonacci Search...................... 8 1. Algorithms

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

Matrix Manifolds: Second-Order Geometry

Matrix Manifolds: Second-Order Geometry Chapter Five Matrix Manifolds: Second-Order Geometry Many optimization algorithms make use of second-order information about the cost function. The archetypal second-order optimization algorithm is Newton

More information

Curvature homogeneity of type (1, 3) in pseudo-riemannian manifolds

Curvature homogeneity of type (1, 3) in pseudo-riemannian manifolds Curvature homogeneity of type (1, 3) in pseudo-riemannian manifolds Cullen McDonald August, 013 Abstract We construct two new families of pseudo-riemannian manifolds which are curvature homegeneous of

More information

A derivative-free nonmonotone line search and its application to the spectral residual method

A derivative-free nonmonotone line search and its application to the spectral residual method IMA Journal of Numerical Analysis (2009) 29, 814 825 doi:10.1093/imanum/drn019 Advance Access publication on November 14, 2008 A derivative-free nonmonotone line search and its application to the spectral

More information

Lecture 18: November Review on Primal-dual interior-poit methods

Lecture 18: November Review on Primal-dual interior-poit methods 10-725/36-725: Convex Optimization Fall 2016 Lecturer: Lecturer: Javier Pena Lecture 18: November 2 Scribes: Scribes: Yizhu Lin, Pan Liu Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method Lecture 5, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The notion of complexity (per iteration)

More information

A Riemannian conjugate gradient method for optimization on the Stiefel manifold

A Riemannian conjugate gradient method for optimization on the Stiefel manifold A Riemannian conjugate gradient method for optimization on the Stiefel manifold Xiaojing Zhu Abstract In this paper we propose a new Riemannian conjugate gradient method for optimization on the Stiefel

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Static unconstrained optimization

Static unconstrained optimization Static unconstrained optimization 2 In unconstrained optimization an objective function is minimized without any additional restriction on the decision variables, i.e. min f(x) x X ad (2.) with X ad R

More information

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space. Chapter 1 Preliminaries The purpose of this chapter is to provide some basic background information. Linear Space Hilbert Space Basic Principles 1 2 Preliminaries Linear Space The notion of linear space

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

Exercise Solutions to Functional Analysis

Exercise Solutions to Functional Analysis Exercise Solutions to Functional Analysis Note: References refer to M. Schechter, Principles of Functional Analysis Exersize that. Let φ,..., φ n be an orthonormal set in a Hilbert space H. Show n f n

More information

Chapter Three. Matrix Manifolds: First-Order Geometry

Chapter Three. Matrix Manifolds: First-Order Geometry Chapter Three Matrix Manifolds: First-Order Geometry The constraint sets associated with the examples discussed in Chapter 2 have a particularly rich geometric structure that provides the motivation for

More information

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf.

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf. Maria Cameron 1. Trust Region Methods At every iteration the trust region methods generate a model m k (p), choose a trust region, and solve the constraint optimization problem of finding the minimum of

More information

arxiv: v2 [math.oc] 31 Jan 2019

arxiv: v2 [math.oc] 31 Jan 2019 Analysis of Sequential Quadratic Programming through the Lens of Riemannian Optimization Yu Bai Song Mei arxiv:1805.08756v2 [math.oc] 31 Jan 2019 February 1, 2019 Abstract We prove that a first-order Sequential

More information

MATH 4030 Differential Geometry Lecture Notes Part 4 last revised on December 4, Elementary tensor calculus

MATH 4030 Differential Geometry Lecture Notes Part 4 last revised on December 4, Elementary tensor calculus MATH 4030 Differential Geometry Lecture Notes Part 4 last revised on December 4, 205 Elementary tensor calculus We will study in this section some basic multilinear algebra and operations on tensors. Let

More information

Tikhonov Regularization of Large Symmetric Problems

Tikhonov Regularization of Large Symmetric Problems NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 2000; 00:1 11 [Version: 2000/03/22 v1.0] Tihonov Regularization of Large Symmetric Problems D. Calvetti 1, L. Reichel 2 and A. Shuibi

More information

Supplementary Materials for Riemannian Pursuit for Big Matrix Recovery

Supplementary Materials for Riemannian Pursuit for Big Matrix Recovery Supplementary Materials for Riemannian Pursuit for Big Matrix Recovery Mingkui Tan, School of omputer Science, The University of Adelaide, Australia Ivor W. Tsang, IS, University of Technology Sydney,

More information

Notes on Numerical Optimization

Notes on Numerical Optimization Notes on Numerical Optimization University of Chicago, 2014 Viva Patel October 18, 2014 1 Contents Contents 2 List of Algorithms 4 I Fundamentals of Optimization 5 1 Overview of Numerical Optimization

More information

Max-Planck-Institut für Mathematik in den Naturwissenschaften Leipzig

Max-Planck-Institut für Mathematik in den Naturwissenschaften Leipzig Max-Planck-Institut für Mathematik in den Naturwissenschaften Leipzig Convergence analysis of Riemannian GaussNewton methods and its connection with the geometric condition number by Paul Breiding and

More information

1. Subspaces A subset M of Hilbert space H is a subspace of it is closed under the operation of forming linear combinations;i.e.,

1. Subspaces A subset M of Hilbert space H is a subspace of it is closed under the operation of forming linear combinations;i.e., Abstract Hilbert Space Results We have learned a little about the Hilbert spaces L U and and we have at least defined H 1 U and the scale of Hilbert spaces H p U. Now we are going to develop additional

More information

Numerical Optimization

Numerical Optimization Numerical Optimization Unit 2: Multivariable optimization problems Che-Rung Lee Scribe: February 28, 2011 (UNIT 2) Numerical Optimization February 28, 2011 1 / 17 Partial derivative of a two variable function

More information

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces. Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,

More information

INNER PRODUCT SPACE. Definition 1

INNER PRODUCT SPACE. Definition 1 INNER PRODUCT SPACE Definition 1 Suppose u, v and w are all vectors in vector space V and c is any scalar. An inner product space on the vectors space V is a function that associates with each pair of

More information

Optimization and Root Finding. Kurt Hornik

Optimization and Root Finding. Kurt Hornik Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding

More information

11 a 12 a 21 a 11 a 22 a 12 a 21. (C.11) A = The determinant of a product of two matrices is given by AB = A B 1 1 = (C.13) and similarly.

11 a 12 a 21 a 11 a 22 a 12 a 21. (C.11) A = The determinant of a product of two matrices is given by AB = A B 1 1 = (C.13) and similarly. C PROPERTIES OF MATRICES 697 to whether the permutation i 1 i 2 i N is even or odd, respectively Note that I =1 Thus, for a 2 2 matrix, the determinant takes the form A = a 11 a 12 = a a 21 a 11 a 22 a

More information

Numerical Methods for Large-Scale Nonlinear Systems

Numerical Methods for Large-Scale Nonlinear Systems Numerical Methods for Large-Scale Nonlinear Systems Handouts by Ronald H.W. Hoppe following the monograph P. Deuflhard Newton Methods for Nonlinear Problems Springer, Berlin-Heidelberg-New York, 2004 Num.

More information

Differential Topology Final Exam With Solutions

Differential Topology Final Exam With Solutions Differential Topology Final Exam With Solutions Instructor: W. D. Gillam Date: Friday, May 20, 2016, 13:00 (1) Let X be a subset of R n, Y a subset of R m. Give the definitions of... (a) smooth function

More information

Solving linear equations with Gaussian Elimination (I)

Solving linear equations with Gaussian Elimination (I) Term Projects Solving linear equations with Gaussian Elimination The QR Algorithm for Symmetric Eigenvalue Problem The QR Algorithm for The SVD Quasi-Newton Methods Solving linear equations with Gaussian

More information

Chapter 3. Riemannian Manifolds - I. The subject of this thesis is to extend the combinatorial curve reconstruction approach to curves

Chapter 3. Riemannian Manifolds - I. The subject of this thesis is to extend the combinatorial curve reconstruction approach to curves Chapter 3 Riemannian Manifolds - I The subject of this thesis is to extend the combinatorial curve reconstruction approach to curves embedded in Riemannian manifolds. A Riemannian manifold is an abstraction

More information

1 Numerical optimization

1 Numerical optimization Contents Numerical optimization 5. Optimization of single-variable functions.............................. 5.. Golden Section Search..................................... 6.. Fibonacci Search........................................

More information

A line-search algorithm inspired by the adaptive cubic regularization framework with a worst-case complexity O(ɛ 3/2 )

A line-search algorithm inspired by the adaptive cubic regularization framework with a worst-case complexity O(ɛ 3/2 ) A line-search algorithm inspired by the adaptive cubic regularization framewor with a worst-case complexity Oɛ 3/ E. Bergou Y. Diouane S. Gratton December 4, 017 Abstract Adaptive regularized framewor

More information

Optimization Theory. A Concise Introduction. Jiongmin Yong

Optimization Theory. A Concise Introduction. Jiongmin Yong October 11, 017 16:5 ws-book9x6 Book Title Optimization Theory 017-08-Lecture Notes page 1 1 Optimization Theory A Concise Introduction Jiongmin Yong Optimization Theory 017-08-Lecture Notes page Optimization

More information

MATH 205C: STATIONARY PHASE LEMMA

MATH 205C: STATIONARY PHASE LEMMA MATH 205C: STATIONARY PHASE LEMMA For ω, consider an integral of the form I(ω) = e iωf(x) u(x) dx, where u Cc (R n ) complex valued, with support in a compact set K, and f C (R n ) real valued. Thus, I(ω)

More information

Riemannian Optimization and its Application to Phase Retrieval Problem

Riemannian Optimization and its Application to Phase Retrieval Problem Riemannian Optimization and its Application to Phase Retrieval Problem 1/46 Introduction Motivations Optimization History Phase Retrieval Problem Summary Joint work with: Pierre-Antoine Absil, Professor

More information

Intrinsic Representation of Tangent Vectors and Vector Transports on Matrix Manifolds

Intrinsic Representation of Tangent Vectors and Vector Transports on Matrix Manifolds Tech. report UCL-INMA-2016.08.v2 Intrinsic Representation of Tangent Vectors and Vector Transports on Matrix Manifolds Wen Huang P.-A. Absil K. A. Gallivan Received: date / Accepted: date Abstract The

More information

Linear Algebra. Session 12

Linear Algebra. Session 12 Linear Algebra. Session 12 Dr. Marco A Roque Sol 08/01/2017 Example 12.1 Find the constant function that is the least squares fit to the following data x 0 1 2 3 f(x) 1 0 1 2 Solution c = 1 c = 0 f (x)

More information

Prof. M. Saha Professor of Mathematics The University of Burdwan West Bengal, India

Prof. M. Saha Professor of Mathematics The University of Burdwan West Bengal, India CHAPTER 9 BY Prof. M. Saha Professor of Mathematics The University of Burdwan West Bengal, India E-mail : mantusaha.bu@gmail.com Introduction and Objectives In the preceding chapters, we discussed normed

More information

Homework 5. Convex Optimization /36-725

Homework 5. Convex Optimization /36-725 Homework 5 Convex Optimization 10-725/36-725 Due Tuesday November 22 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

LECTURE 15: COMPLETENESS AND CONVEXITY

LECTURE 15: COMPLETENESS AND CONVEXITY LECTURE 15: COMPLETENESS AND CONVEXITY 1. The Hopf-Rinow Theorem Recall that a Riemannian manifold (M, g) is called geodesically complete if the maximal defining interval of any geodesic is R. On the other

More information

October 25, 2013 INNER PRODUCT SPACES

October 25, 2013 INNER PRODUCT SPACES October 25, 2013 INNER PRODUCT SPACES RODICA D. COSTIN Contents 1. Inner product 2 1.1. Inner product 2 1.2. Inner product spaces 4 2. Orthogonal bases 5 2.1. Existence of an orthogonal basis 7 2.2. Orthogonal

More information

HOMEWORK 2 - RIEMANNIAN GEOMETRY. 1. Problems In what follows (M, g) will always denote a Riemannian manifold with a Levi-Civita connection.

HOMEWORK 2 - RIEMANNIAN GEOMETRY. 1. Problems In what follows (M, g) will always denote a Riemannian manifold with a Levi-Civita connection. HOMEWORK 2 - RIEMANNIAN GEOMETRY ANDRÉ NEVES 1. Problems In what follows (M, g will always denote a Riemannian manifold with a Levi-Civita connection. 1 Let X, Y, Z be vector fields on M so that X(p Z(p

More information

THE EULER CHARACTERISTIC OF A LIE GROUP

THE EULER CHARACTERISTIC OF A LIE GROUP THE EULER CHARACTERISTIC OF A LIE GROUP JAY TAYLOR 1 Examples of Lie Groups The following is adapted from [2] We begin with the basic definition and some core examples Definition A Lie group is a smooth

More information