Optimization methods on Riemannian manifolds and their application to shape space


Optimization methods on Riemannian manifolds and their application to shape space

Wolfgang Ring, Benedikt Wirth

Abstract

We extend the scope of analysis for linesearch optimization algorithms on (possibly infinite-dimensional) Riemannian manifolds to the convergence analysis of the BFGS quasi-Newton scheme and the Fletcher–Reeves conjugate gradient iteration. Numerical implementations for exemplary problems in shape spaces show the practical applicability of these methods.

1 Introduction

There are a number of problems that can be expressed as the minimization of a function f : M → R over a smooth Riemannian manifold M. Applications range from linear algebra (e.g. principal component analysis or singular value decomposition [ , ]) to the analysis of shape spaces (e.g. computation of shape geodesics [24]); see also the references in [3]. If the manifold M can be embedded in a higher-dimensional space or if it is defined via equality constraints, then there is the option to employ the very advanced tools of constrained optimization (see [6] for an introduction). Often, however, such an embedding is not at hand, and one has to resort to optimization methods designed for Riemannian manifolds. Even if an embedding is known, one might hope that a Riemannian optimization method performs more efficiently since it exploits the underlying geometric structure of the manifold. For this purpose, various methods have been devised, from simple gradient descent on manifolds [25] to sophisticated trust region methods [5].

The aim of this article is to extend the scope of analysis for these methods, concentrating on linesearch methods. In particular, we will consider the convergence of BFGS quasi-Newton methods and Fletcher–Reeves nonlinear conjugate gradient iterations on (possibly infinite-dimensional) manifolds, thereby filling a gap in the existing analysis. Furthermore, we apply the proposed methods to exemplary problems, showing their applicability to manifolds of practical interest such as shape spaces.

Early attempts to adapt standard optimization methods to problems on manifolds were presented by Gabay [9], who introduced a steepest descent, a Newton, and a quasi-Newton algorithm, also stating their global and local convergence properties (however, without giving details of the analysis for the quasi-Newton case). Udrişte [23] also stated a steepest descent and a Newton algorithm on Riemannian manifolds and proved (linear) convergence of the former under the assumption of exact linesearch. Fairly recently, Yang took up these methods and analysed convergence and convergence rate of steepest descent and Newton's method for Armijo step-size control [25]. In comparison to standard linesearch methods in vector spaces, the above approaches all substitute the linear step in the search direction by a step along a geodesic. However, geodesics may be difficult to obtain. In alternative approaches, the geodesics are thus often replaced by more general paths based on so-called retractions (a retraction R_x is a mapping from the tangent space T_xM of the manifold M at x onto M). For example, Hüper and Trumpf find quadratic convergence for Newton's method without step-size control [ ], even if the Hessian is computed for a different retraction than the one defining the path along which the Newton step is taken. In the sequel, more advanced trust region Newton methods have been developed and analysed in a series of papers [6, , 5].
The analysis requires some type of uniform Lipschitz continuity for the gradient and the Hessian of the concatenations f ∘ R_x, which we will also make use of. The global and local convergence analysis of gradient descent, Newton's method, and trust region methods on manifolds with general retractions is summarized in [2]. In [2], the authors also present a way to implement a quasi-Newton as well as a nonlinear conjugate gradient iteration, however, without analysis. Riemannian BFGS quasi-Newton methods, which generalize Gabay's original approach, have been devised in [8]. Like the above schemes, they do not rely on

geodesics but allow more general retractions. Furthermore, the vector transport between the tangent spaces T_{x_k}M and T_{x_{k+1}}M at two subsequent iterates (which is needed for the BFGS update of the Hessian approximation) is no longer restricted to parallel transport. A similar approach is taken in [7], where a specific, non-parallel vector transport is considered for a linear algebra application. A corresponding global convergence analysis of a quasi-Newton scheme has been performed by Ji [2]. However, to obtain a superlinear convergence rate, specific conditions on the compatibility between the vector transports and the retractions are required (cf. Section 3.1) which are not imposed (at least explicitly) in the above work.

In this paper, we pursue several objectives. First, we extend the convergence analysis of standard Riemannian optimization methods (such as steepest descent and Newton's method) to the case of optimization on infinite-dimensional manifolds. Second, we analyze the convergence rate of the Riemannian BFGS quasi-Newton method as well as the convergence of a Riemannian Fletcher–Reeves conjugate gradient iteration (as two representative higher order gradient-based optimization methods), which to the authors' knowledge has not been attempted before (neither in the finite- nor the infinite-dimensional case). The analysis is performed in the unifying framework of step-size controlled linesearch methods, which allows for rather streamlined proofs. Finally, we demonstrate the feasibility of these Riemannian optimization methods by applying them to problems in state-of-the-art shape spaces.

The outline of this article is as follows. Section 2 summarizes the required basic notions. Section 3 then introduces linesearch optimization methods on Riemannian manifolds, following the standard procedure for finite-dimensional vector spaces [6, 7] and giving the analysis of basic steepest descent and Newton's method as a prerequisite for the ensuing analysis of the BFGS scheme in Section 3.1 and the nonlinear CG iteration in Section 3.2. Finally, numerical examples on shape spaces are provided in Section 4.

2 Notations

Let M denote a geodesically complete (finite- or infinite-dimensional) Riemannian manifold. In particular, we assume M to be locally homeomorphic to some separable Hilbert space H [4], that is, for each x ∈ M there is some neighborhood x ∈ U_x ⊂ M and a homeomorphism φ_x from U_x into some open subset of H. Let C^∞(M) denote the vector space of smooth real functions on M (in the sense that for any f ∈ C^∞(M) and any chart (U_x, φ_x), f ∘ φ_x^{-1} is smooth). For any smooth curve γ : [0,1] → M we define the tangent vector to γ at t_0 ∈ (0,1) as the linear operator γ̇(t_0) : C^∞(M) → R such that

γ̇(t_0)f = (d/dt)(f ∘ γ)(t_0)   for all f ∈ C^∞(M).

The tangent space T_xM to M at x ∈ M is the set of tangent vectors γ̇(t_0) to all smooth curves γ : [0,1] → M with γ(t_0) = x. It is a vector space, and it is equipped with an inner product g_x(·,·) : T_xM × T_xM → R, the so-called Riemannian metric, which smoothly depends on x. The corresponding induced norm will be denoted ‖·‖_x, and we will employ the same notation for the associated dual norm. Let V(M) denote the set of smooth tangent vector fields on M; then we define a connection ∇ : V(M) × V(M) → V(M), (X, Z) ↦ ∇_X Z, as a map such that

∇_{fX+gY} Z = f ∇_X Z + g ∇_Y Z   for all X, Y, Z ∈ V(M) and f, g ∈ C^∞(M),
∇_Z (aX + bY) = a ∇_Z X + b ∇_Z Y   for all X, Y, Z ∈ V(M) and a, b ∈ R,
∇_Z (fX) = (Zf) X + f ∇_Z X   for all X, Z ∈ V(M) and f ∈ C^∞(M).
In particular, we will consider the Levi-Civita connection, which additionally satisfies the properties of absence of torsion and preservation of the metric,

[X, Y] = ∇_X Y − ∇_Y X   for all X, Y ∈ V(M),
X g(Y, Z) = g(∇_X Y, Z) + g(Y, ∇_X Z)   for all X, Y, Z ∈ V(M),

where [X, Y] = XY − YX denotes the Lie bracket and g(X, Y) : M → R, x ↦ g_x(X, Y). Any connection is paired with a notion of parallel transport. Given a smooth curve γ : [0,1] → M, the initial value problem

∇_{γ̇(t)} v = 0,   v(0) = v_0

defines a way of transporting a vector v_0 ∈ T_{γ(0)}M to a vector v(t) ∈ T_{γ(t)}M. For the Levi-Civita connection considered here this implies constancy of the inner product g_{γ(t)}(v(t), w(t)) for any two vectors v(t), w(t) transported parallel along γ. For x = γ(0), y = γ(t), we will denote the parallel transport of v(0) ∈ T_xM to v(t) ∈ T_yM by v(t) = T^{Pγ}_{x,y} v(0) (if there is more than one t with y = γ(t), the correct interpretation will become clear from the context). As detailed further below, T^{Pγ}_{x,y} can be interpreted as the derivative of a specific mapping P_γ : T_xM → M.

We assume that any two points x, y ∈ M can be connected by a shortest curve γ : [0,1] → M with x = γ(0), y = γ(1), where the curve length is measured as L[γ] = ∫_0^1 √(g_{γ(t)}(γ̇(t), γ̇(t))) dt. Such a curve is denoted a geodesic, and the length of geodesics induces a metric distance on M,

dist(x, y) = min_{γ:[0,1]→M, γ(0)=x, γ(1)=y} L[γ].

If the geodesic γ connecting x and y is unique, then γ̇(0) ∈ T_xM is denoted the logarithmic map of y with respect to x, log_x y = γ̇(0). It satisfies dist(x, y) = ‖log_x y‖_x. Geodesics can be computed via the zero acceleration condition ∇_{γ̇(t)} γ̇(t) = 0. The exponential map exp_x : T_xM → M, v ↦ exp_x v, is defined as exp_x v = γ(1), where γ solves the above ordinary differential equation with initial conditions γ(0) = x, γ̇(0) = v. It is a diffeomorphism of a neighborhood of 0 ∈ T_xM into a neighborhood of x ∈ M with inverse log_x. We shall denote the geodesic between x and y by γ[x; y], assuming that it is unique.

The exponential map exp_x obviously provides a (local) parametrization of M via T_xM. We will also consider more general parameterizations, so-called retractions. Given x ∈ M, a retraction is a smooth mapping R_x : T_xM → M with R_x(0) = x and DR_x(0) = id_{T_xM}, where DR_x denotes the derivative of R_x. The inverse function theorem then implies that R_x is a local homeomorphism. Besides exp_x, there are various possibilities to define retractions. For example, consider a geodesic γ : R → M with γ(0) = x, parameterized by arc length, and define the retraction

P_γ : T_xM → M,   v ↦ exp_{p_γ(v)}[T^{Pγ}_{x,p_γ(v)}(v − π_γ(v))],

where p_γ(v) = exp_x(π_γ(v)) and π_γ denotes the orthogonal projection onto span{γ̇(0)}. This retraction corresponds to running along the geodesic γ according to the component of v parallel to γ̇(0) and then following a new geodesic into the direction of the parallel transported component of v orthogonal to γ̇(0).

We will later have to consider the transport of a vector from one tangent space T_xM into another one T_yM, that is, we will consider isomorphisms T_{x,y} : T_xM → T_yM. We are particularly interested in operators T^{R_x}_{x,y} which represent the derivative DR_x(v) of a retraction R_x at v ∈ T_xM with R_x(v) = y (where in case of multiple v with R_x(v) = y it will be clear from the context which v is meant). In that sense, the parallel transport T^{Pγ}_{x,y} along a geodesic γ connecting x and y belongs to the retraction P_γ. Another possible vector transport is defined by the variation of the exponential map, evaluated at the representative of y in T_xM, T^{exp_x}_{x,y} = D exp_x(log_x y), which maps each v_0 ∈ T_xM onto the tangent vector γ̇(0) to the curve γ : t ↦ exp_x(log_x y + t v_0). Also the adjoints (T^{Pγ}_{y,x})^*, (T^{exp_y}_{y,x})^* (defined by g_y(v, T^*_{y,x} w) = g_x(T_{y,x} v, w) for all v ∈ T_yM, w ∈ T_xM) or the inverses (T^{Pγ}_{y,x})^{-1}, (T^{exp_y}_{y,x})^{-1} can be considered for a vector transport T_{x,y}. Note that T^{Pγ}_{x,y} is an isometry with T^{Pγ}_{x,y} = (T^{Pγ̄}_{y,x})^{-1} = (T^{Pγ̄}_{y,x})^*, where γ̄ is the geodesic connecting y with x, γ̄(t) = γ(1 − t).
Furthermore, log_x y is transported onto the same vector T_{x,y} log_x y = γ̇(1) by T^{Pγ}_{x,y}, T^{exp_x}_{x,y}, and their adjoints and inverses. Given a smooth function f : M → R, we define its (Fréchet-)derivative Df(x) at a point x ∈ M as an element of the dual space to T_xM via Df(x)v = vf. The Riesz representation theorem then implies the existence of a ∇f(x) ∈ T_xM such that Df(x)v = g_x(∇f(x), v) for all v ∈ T_xM, which we denote the gradient of f at x. On T_xM, define f_{R_x} = f ∘ R_x

for a retraction R_x. Due to DR_x(0) = id we have Df_{R_x}(0) = Df(x). Furthermore, f_{R_x} = f_{R_y} ∘ R_y^{-1} ∘ R_x where R_y^{-1} exists, and thus for y = R_x(v),

Df_{R_x}(v) = Df_{R_y}(0) DR_x(v) = Df(y) T^{R_x}_{x,y},   ∇f_{R_x}(v) = (T^{R_x}_{x,y})^* ∇f(y).

We define the Hessian D²f(x) of a smooth f at x as the symmetric bilinear form

D²f(x) : T_xM × T_xM → R,   (v, w) ↦ g_x(∇_v ∇f(x), w),

which is equivalent to D²f(x) = D²f_{exp_x}(0). By ∇²f(x) : T_xM → T_xM we denote the linear operator mapping v ∈ T_xM onto the Riesz representation of D²f(x)(v, ·). Note that if M is embedded into a vector space X and f = f̄|_M for a smooth f̄ : X → R, we usually have D²f(x) ≠ D²f̄(x)|_{T_xM × T_xM} (which is easily seen from the example X = R^n, f̄(x) = ‖x‖²_X, M = {x ∈ X : ‖x‖_X = 1}). For a smooth retraction R_x we do not necessarily have D²f_{R_x}(0) = D²f(x); however, this holds at stationary points of f [2].

Let T^n(T_xM) denote the vector space of n-linear forms T : (T_xM)^n → R together with the norm

‖T‖_x = sup_{v_1,...,v_n ∈ T_xM} T(v_1, ..., v_n)/(‖v_1‖_x ··· ‖v_n‖_x).

We call a function g : M → ⋃_{x∈M} T^n(T_xM), x ↦ g(x) ∈ T^n(T_xM), (Lipschitz) continuous at x̂ ∈ M if x ↦ g(x) ∘ (T^{Pγ[x̂;x]}_{x̂,x})^n ∈ T^n(T_{x̂}M) is (Lipschitz) continuous at x̂ (with Lipschitz constant lim sup_{x→x̂} ‖g(x) ∘ (T^{Pγ[x̂;x]}_{x̂,x})^n − g(x̂)‖_{x̂}/dist(x, x̂)). Here, γ[x̂; x] denotes the shortest geodesic, which is unique in the neighborhood of x̂. (Lipschitz) continuity on U ⊂ M means (Lipschitz) continuity at every x ∈ U (with uniform Lipschitz constant). A function f : M → R is called n times (Lipschitz) continuously differentiable if D^l f : M → ⋃_{x∈M} T^l(T_xM) is (Lipschitz) continuous for 0 ≤ l ≤ n.

3 Iterative minimization via geodesic linesearch methods

In classical optimization on vector spaces, linesearch methods are widely used. They are based on updating the iterate by choosing a search direction and then adding a multiple of this direction to the old iterate. Adding a multiple of the search direction obviously requires the structure of a vector space and is not possible on general manifolds. The natural extension to manifolds is to follow the search direction along a path. We will consider iterative algorithms of the following generic form.

Algorithm 1 (Linesearch minimization on manifolds).
Input: f : M → R, x_0 ∈ M, k = 0
repeat
  choose a descent direction p_k ∈ T_{x_k}M
  choose a retraction R_{x_k} : T_{x_k}M → M
  choose a step length α_k ∈ R
  set x_{k+1} = R_{x_k}(α_k p_k), k ← k + 1
until x_{k+1} sufficiently minimizes f

Here, a descent direction denotes a direction p_k with Df(x_k)p_k < 0. This property ensures that the objective function f indeed decreases along the search direction. For the choice of the step length, various approaches are possible. In general, the chosen α_k has to fulfill a certain quality requirement. We will here concentrate on the so-called Wolfe conditions, that is, for a given descent direction p ∈ T_xM, the chosen step length α has to satisfy

f(R_x(αp)) ≤ f(x) + c_1 α Df(x)p,   (1a)
Df(R_x(αp)) T^{R_x}_{x,R_x(αp)} p ≥ c_2 Df(x)p,   (1b)

where 0 < c_1 < c_2 < 1. Note that both conditions can be rewritten as f_{R_x}(αp) ≤ f_{R_x}(0) + c_1 α Df_{R_x}(0)p and Df_{R_x}(αp)p ≥ c_2 Df_{R_x}(0)p, the classical Wolfe conditions for minimization of f_{R_x}. If the second condition is replaced by

|Df(R_x(αp)) T^{R_x}_{x,R_x(αp)} p| ≤ c_2 |Df(x)p|,   (2)

we obtain the so-called strong Wolfe conditions. Most optimization algorithms also work properly if just the so-called Armijo condition (1a) is satisfied. In fact, the stronger Wolfe conditions are only needed for the later analysis of a quasi-Newton scheme. Given a descent direction p, a feasible step length can always be found.
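For concreteness, the following minimal Python sketch implements the generic loop of Algorithm 1 with the steepest-descent direction and a simple backtracking rule for the Armijo condition (1a); a full Wolfe search would additionally enforce (1b). The callables f, grad, retract and inner are assumptions of this sketch (they are not specified in the paper): grad returns the Riesz representative of Df, retract plays the role of R_x, and inner evaluates the metric g_x.

```python
import numpy as np

def linesearch_minimize(f, grad, retract, inner, x0,
                        c1=1e-4, alpha0=1.0, shrink=0.5,
                        tol=1e-6, max_iter=500):
    """Sketch of Algorithm 1 (steepest descent + Armijo backtracking).
    Assumed callables:
      grad(x)        -> Riesz representative of Df(x) in T_x M (ndarray)
      retract(x, v)  -> R_x(v)
      inner(x, v, w) -> g_x(v, w)
    """
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        gnorm = np.sqrt(inner(x, g, g))
        if gnorm < tol:                        # ||Df(x_k)||_{x_k} small: stop
            break
        p = -g                                 # steepest descent direction
        slope = -gnorm**2                      # Df(x_k) p_k < 0
        alpha, fx = alpha0, f(x)
        # backtrack until the Armijo condition (1a) holds
        while f(retract(x, alpha * p)) > fx + c1 * alpha * slope:
            alpha *= shrink
            if alpha < 1e-16:
                break
        x = retract(x, alpha * p)              # x_{k+1} = R_{x_k}(alpha_k p_k)
    return x
```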

Proposition 1 (Feasible step length, e.g. [6, Lem. 3.1]). Let x ∈ M, let p ∈ T_xM be a descent direction, and let f_{R_x} : span{p} → R be continuously differentiable. Then there exists α > 0 satisfying (1) and (2).

Proof. Rewriting the Wolfe conditions as f_{R_x}(αp) ≤ f_{R_x}(0) + c_1 α Df_{R_x}(0)p and Df_{R_x}(αp)p ≥ c_2 Df_{R_x}(0)p (respectively |Df_{R_x}(αp)p| ≤ c_2 |Df_{R_x}(0)p|), the standard argument for Wolfe conditions in vector spaces can be applied.

Besides the quality of the step length, the convergence of linesearch algorithms naturally depends on the quality of the search direction. Let us introduce the angle θ_k between the search direction p_k and the negative gradient −∇f(x_k),

cos θ_k = −Df(x_k)p_k / (‖Df(x_k)‖_{x_k} ‖p_k‖_{x_k}).

There is a classical link between the convergence of an algorithm and the quality of its search direction.

Theorem 2 (Zoutendijk's theorem). Given f : M → R bounded below and differentiable, assume the α_k in Algorithm 1 to satisfy (1). If the f_{R_{x_k}} are Lipschitz continuously differentiable on span{p_k} with uniform Lipschitz constant L, then

∑_{k∈N} cos²θ_k ‖Df(x_k)‖²_{x_k} < ∞.

Proof. The proof for optimization on vector spaces also applies here: Lipschitz continuity and (1b) imply α_k L ‖p_k‖²_{x_k} ≥ (Df_{R_{x_k}}(α_k p_k) − Df_{R_{x_k}}(0))p_k ≥ (c_2 − 1)Df(x_k)p_k, from which we obtain α_k ≥ (c_2 − 1)Df(x_k)p_k/(L‖p_k‖²_{x_k}). Then (1a) implies f(x_{k+1}) ≤ f(x_k) − c_1 (1 − c_2)/L cos²θ_k ‖Df(x_k)‖²_{x_k} so that the result follows from the boundedness of f by summing over all k.

Corollary 3 (Convergence of generalized steepest descent). Let the search direction in Algorithm 1 be the solution p_k to

B_k(p_k, v) = −Df(x_k)v   for all v ∈ T_{x_k}M,

where the B_k are uniformly coercive and bounded bilinear forms on T_{x_k}M (the case B_k(·,·) = g_{x_k}(·,·) yields the steepest descent direction). Under the conditions of Theorem 2, ‖Df(x_k)‖_{x_k} → 0.

Proof. Obviously, cos θ_k = B_k(p_k, p_k)/(‖B_k(p_k, ·)‖_{x_k}‖p_k‖_{x_k}) is uniformly bounded away from zero so that the convergence follows from Zoutendijk's theorem.

For continuous Df, the previous corollary implies that any limit point x* of the sequence x_k is a stationary point of f. Hence, on finite-dimensional manifolds, if {x ∈ M : f(x) ≤ f(x_0)} is bounded, x_k can be decomposed into subsequences each of which converges against a stationary point. On infinite-dimensional manifolds, where only a weak convergence of subsequences can be expected, this is not true in general (which is not surprising given that not even existence of stationary points is granted without stronger conditions on f such as sequential weak lower semi-continuity). Moreover, the limit points may be non-unique. However, in the case of (locally) strictly convex functions we have (local) convergence against the unique minimizer by the following classical estimate.

Proposition 4. Let U ⊂ M be a geodesically star-convex neighborhood around x* ∈ M (i.e. for any x ∈ U the shortest geodesics from x to x* lie inside U), and define V(U) = {v ∈ exp_{x*}^{-1}(U) : ‖v‖_{x*} ≤ ‖w‖_{x*} for all w ∈ exp_{x*}^{-1}(exp_{x*}(v))}. Let f be twice differentiable on U with x* being a stationary point. If D²f_{exp_{x*}} is uniformly coercive on V(U) ⊂ T_{x*}M, then there exists m > 0 such that for all x ∈ U,

dist(x, x*) ≤ (1/m) ‖Df(x)‖_x.

Proof. By hypothesis, there is m > 0 with D²f_{exp_{x*}}(w)(v, v) ≥ m‖v‖²_{x*} for all w ∈ exp_{x*}^{-1}(U), v ∈ T_{x*}M. Thus, for v = log_{x*} x we have

‖Df(x)‖_x ≥ −Df(x) log_x x* / ‖log_x x*‖_x = Df_{exp_{x*}}(v)v / ‖v‖_{x*} = (∫_0^1 D²f_{exp_{x*}}(tv)(v, v) dt) / ‖v‖_{x*} ≥ m‖v‖_{x*} = m dist(x, x*).

(Note that coercivity along γ[x; x*] would actually suffice.)

Hence, if f is smooth with Df(x*) = 0 and D²f(x*) = D²f_{exp_{x*}}(0) coercive (so that there exists a neighborhood U ⊂ M of x* such that D²f_{exp_{x*}} is uniformly coercive on V(U)), then by the above proposition and Corollary 3 we thus have x_k → x* if {x ∈ M : f(x) ≤ f(x_0)} ⊂ U.

Remark 1. Note the difference between steepest descent and an approximation of the so-called gradient flow, the solution to the ODE dx/dt = −∇f(x). While the gradient flow aims at a smooth curve along which the direction at each point is the steepest descent direction, the steepest descent linesearch tries to move into a fixed direction as long as possible and only changes its direction to be the one of steepest descent if the old direction does no longer yield sufficient decrease. Therefore, the steepest descent typically takes longer steps in one direction.

Remark 2. The condition on f_{R_{x_k}} in Proposition 1 and Theorem 2 may be untangled into conditions on f and R_{x_k}. For example, one might require f to be Lipschitz continuously differentiable and T^{R_{x_k}}_{x_k,x_{k+1}}|_{span{p_k}} to be uniformly bounded, which is the case for R_{x_k} = exp_{x_k} or R_{x_k} = P_{γ[x_k;x_{k+1}]}, for example.

As a method with only linear convergence, steepest descent requires many iterations. Improvement can be obtained by choosing in each iteration the Newton direction p_k^N as search direction, that is, the solution to

D²f_{R_{x_k}}(0)(p_k^N, v) = −Df(x_k)v   for all v ∈ T_{x_k}M.   (3)

Note that the Newton direction is obtained with regard to the retraction R_{x_k}. The fact that D²f_{R_{x*}}(0) = D²f(x*) at a stationary point x* and the later result that the Hessian only needs to be approximated (Proposition 8) suggest that D²f(x_k) or a different approximation could be used as well to obtain fast convergence. In contrast to steepest descent, Newton's method is invariant with respect to a rescaling of f and in the limit allows a constant step size and quadratic convergence, as shown in the following sequence of propositions which can be transferred from standard results (e.g. [6, Sec. 3.3]).

Proposition 5 (Newton step length). Let f be twice differentiable and D²f_{R_x}(0) continuous in x. Consider Algorithm 1 and assume the α_k to satisfy (1) with c_1 ≤ 1/2 and the p_k to satisfy

lim_{k→∞} ‖Df(x_k) + D²f_{R_{x_k}}(0)(p_k, ·)‖_{x_k} / ‖p_k‖_{x_k} = 0.   (4)

Furthermore, let the f_{R_{x_k}} be twice Lipschitz continuously differentiable on span{p_k} with uniform Lipschitz constant L. If x* with Df(x*) = 0 and D²f(x*) bounded and coercive is a limit point of x_k so that x_k →_{k∈I} x* for some I ⊂ N, then α_k = 1 would also satisfy (1) (independent of whether α_k = 1 is actually chosen) for k ∈ I sufficiently large.

Proof. Let m > 0 be the lowest eigenvalue belonging to D²f(x*) = D²f_{R_{x*}}(0). The continuity of the second derivative implies the uniform coercivity D²f_{R_{x_k}}(0)(v, v) ≥ (m/2)‖v‖² for all k ∈ I sufficiently large. From D²f_{R_{x_k}}(0)(p_k − p_k^N, ·) = Df(x_k) + D²f_{R_{x_k}}(0)(p_k, ·) we then obtain ‖p_k − p_k^N‖ = o(‖p_k‖_{x_k}). Furthermore, condition (4) implies lim_{k→∞} (Df(x_k)p_k + D²f_{R_{x_k}}(0)(p_k, p_k))/‖p_k‖² = 0 and thus

0 = lim sup_{k∈I} [Df(x_k)p_k/‖p_k‖² + D²f_{R_{x_k}}(0)(p_k, p_k)/‖p_k‖²] ≥ lim sup_{k∈I} Df(x_k)p_k/‖p_k‖² + m/2,

so that

−Df(x_k)p_k/‖p_k‖² ≥ m/4   (5)

for k ∈ I sufficiently large. Due to ‖Df(x_k)‖_{x_k} →_{k∈I} 0 we deduce ‖p_k‖_{x_k} →_{k∈I} 0.

By Taylor's theorem, f_{R_{x_k}}(p_k) = f_{R_{x_k}}(0) + Df_{R_{x_k}}(0)p_k + ½ D²f_{R_{x_k}}(q_k)(p_k, p_k) for some q_k ∈ [0, p_k] so that

f_{R_{x_k}}(p_k) − f(x_k) − ½ Df(x_k)p_k = ½ (Df(x_k)p_k + D²f_{R_{x_k}}(q_k)(p_k, p_k))
= ½ ([Df(x_k)p_k + D²f_{R_{x_k}}(0)(p_k^N, p_k)] + D²f_{R_{x_k}}(0)(p_k − p_k^N, p_k) + [D²f_{R_{x_k}}(q_k) − D²f_{R_{x_k}}(0)](p_k, p_k)) ≤ o(‖p_k‖²),

which, together with (5) and c_1 ≤ 1/2, implies feasibility of α_k = 1 with respect to (1a) for k ∈ I sufficiently large. Also,

Df_{R_{x_k}}(p_k)p_k = [Df(x_k)p_k + D²f_{R_{x_k}}(0)(p_k, p_k)] + ∫_0^1 [D²f_{R_{x_k}}(tp_k) − D²f_{R_{x_k}}(0)](p_k, p_k) dt = o(‖p_k‖²),

which together with (5) implies Df_{R_{x_k}}(p_k)p_k ≥ c_2 Df(x_k)p_k for k ∈ I sufficiently large, that is, (1b) for α_k = 1.

Lemma 6. Let U ⊂ M be open and let the retractions R_x : T_xM → M, x ∈ U, have equicontinuous derivatives at x in the sense

∀ε > 0 ∃δ > 0 ∀x ∈ U : ‖v‖_x < δ ⟹ ‖T^{Pγ[x;R_x(v)]}_{x,R_x(v)} DR_x(0) − DR_x(v)‖ < ε.

Then for any ε > 0 there is an ε̃ > 0 such that for all x ∈ U and v, w ∈ T_xM with ‖v‖_x, ‖w‖_x < ε̃,

(1 − ε)‖w − v‖_x ≤ dist(R_x(v), R_x(w)) ≤ (1 + ε)‖w − v‖_x.

Proof. For ε > 0 there is δ > 0 such that for any x ∈ U, ‖v‖_x < δ implies ‖T^{Pγ[x;R_x(v)]}_{x,R_x(v)} − DR_x(v)‖ < ε. From this we obtain for v, w ∈ T_xM with ‖v‖_x, ‖w‖_x < δ that

dist(R_x(v), R_x(w)) ≤ ∫_0^1 ‖DR_x(v + t(w − v))(w − v)‖_{R_x(v+t(w−v))} dt
≤ ∫_0^1 ‖T^{Pγ[x;R_x(v+t(w−v))]}_{x,R_x(v+t(w−v))}(w − v)‖_{R_x(v+t(w−v))} dt + ∫_0^1 ‖[DR_x(v + t(w − v)) − T^{Pγ[x;R_x(v+t(w−v))]}_{x,R_x(v+t(w−v))}](w − v)‖_{R_x(v+t(w−v))} dt
≤ ‖w − v‖_x + ε‖w − v‖_x = (1 + ε)‖w − v‖_x.

Furthermore, for δ small enough, the shortest geodesic path between R_x(v) and R_x(w) can be expressed as t ↦ R_x(p(t)), where p : [0,1] → T_xM with p(0) = v and p(1) = w. Then, for ‖v‖_x, ‖w‖_x < (1 − ε)δ/2 =: ε̃,

dist(R_x(v), R_x(w)) = ∫_0^1 ‖DR_x(p(t)) dp(t)/dt‖_{R_x(p(t))} dt
≥ ∫_0^1 ‖T^{Pγ[x;R_x(p(t))]}_{x,R_x(p(t))} dp(t)/dt‖_{R_x(p(t))} dt − ∫_0^1 ‖[DR_x(p(t)) − T^{Pγ[x;R_x(p(t))]}_{x,R_x(p(t))}] dp(t)/dt‖_{R_x(p(t))} dt
≥ (1 − ε) ∫_0^1 ‖dp(t)/dt‖_x dt ≥ (1 − ε)‖w − v‖_x,

where we have used ‖p(t)‖_x < δ for all t ∈ [0,1], since otherwise one could apply the above estimate to the segments [0, t_1) and (t_2, 1] with t_1 = inf{t : ‖p(t)‖_x ≥ δ}, t_2 = sup{t : ‖p(t)‖_x ≥ δ}, which yields

dist(R_x(v), R_x(w)) ≥ (1 − ε) ∫_{[0,t_1)∪(t_2,1]} ‖dp(t)/dt‖_x dt ≥ (1 − ε) 2(δ − ε̃) = (1 − ε²)δ > (1 + ε)‖w − v‖_x,

contradicting the first estimate.

Proposition 7 (Convergence of Newton's method). Let f be twice differentiable and D²f_{R_x}(0) continuous in x. Consider Algorithm 1 where p_k = p_k^N as defined in (3), α_k satisfies (1) with c_1 ≤ 1/2, and α_k = 1 whenever possible. Assume x_k has a limit point x* with Df(x*) = 0 and D²f(x*) bounded and coercive. Furthermore, assume that in a neighborhood U of x*, the DR_{x_k} are equicontinuous in the above sense

and that the f_{R_{x_k}} with x_k ∈ U are twice Lipschitz continuously differentiable on R_{x_k}^{-1}(U) with uniform Lipschitz constant L. Then x_k → x* with

lim_{k→∞} dist(x_{k+1}, x*)/dist²(x_k, x*) ≤ C

for some C > 0.

Proof. There is a subsequence (x_k)_{k∈I}, I ⊂ N, with x_k →_{k∈I} x*. By Proposition 5, α_k = 1 for k ∈ I sufficiently large. Furthermore, by the previous lemma there is ε > 0 such that ½‖w − v‖ ≤ dist(R_{x_k}(v), R_{x_k}(w)) ≤ (3/2)‖w − v‖ for all w, v ∈ T_{x_k}M with ‖v‖_{x_k}, ‖w‖_{x_k} < ε. Hence, for k ∈ I large enough such that dist(x_k, x*) < ε/4, there exists R_{x_k}^{-1}(x*), and

D²f_{R_{x_k}}(0)(p_k − R_{x_k}^{-1}(x*), ·) = −Df(x_k) − D²f_{R_{x_k}}(0)(R_{x_k}^{-1}(x*), ·)
= Df_{R_{x_k}}(R_{x_k}^{-1}(x*)) − Df(x_k) − D²f_{R_{x_k}}(0)(R_{x_k}^{-1}(x*), ·)
= ∫_0^1 [D²f_{R_{x_k}}(t R_{x_k}^{-1}(x*)) − D²f_{R_{x_k}}(0)](R_{x_k}^{-1}(x*), ·) dt,

which implies ‖p_k − R_{x_k}^{-1}(x*)‖_{x_k} ≤ (2L/m)‖R_{x_k}^{-1}(x*)‖²_{x_k} for the smallest eigenvalue m of D²f(x*) (using the same argument as in the proof of Proposition 5). The previous lemma then yields the desired convergence rate (note from the proof of Proposition 5 that ‖p_k‖ tends to zero and thus also convergence of the whole sequence).

The Riemannian Newton method was already proposed by Gabay in 1982 [9]. Smith proved quadratic convergence for the more general case of applying Newton's method to find a zero of a one-form on a manifold with the Levi-Civita connection [2] (the method is stated for an unspecified affine connection in [4]), using geodesic steps. Yang rephrased the proof within a broader framework for optimization algorithms, however, restricting to the case of minimizing a function (which corresponds to the method introduced above, only with geodesic retractions) [25]. Hüper and Trumpf show quadratic convergence even if the retraction used for taking the step is different from the retraction used for computing the Newton direction [ ]. A more detailed overview is provided in [2, Sec. 6.6]. While all these approaches were restricted to finite-dimensional manifolds, we here explicitly include the case of infinite-dimensional manifolds.

A superlinear convergence rate can also be achieved if the Hessian in each step is only approximated, which is particularly interesting with regard to our aim of also analyzing a quasi-Newton minimization approach.

Proposition 8 (Convergence of approximated Newton's method). Let the assumptions of the previous proposition hold, but instead of the Newton direction p_k^N consider directions p_k which only satisfy (4). Then we have superlinear convergence,

lim_{k→∞} dist(x_{k+1}, x*)/dist(x_k, x*) = 0.

Proof. From the proof of Proposition 5 we know ‖p_k − p_k^N‖_{x_k} = o(‖p_k‖_{x_k}). Also, the proof of Proposition 7 shows ‖p_k^N − R_{x_k}^{-1}(x*)‖_{x_k} ≤ (2L/m)‖R_{x_k}^{-1}(x*)‖²_{x_k}. Thus we obtain

‖p_k − R_{x_k}^{-1}(x*)‖_{x_k} ≤ ‖p_k − p_k^N‖_{x_k} + ‖p_k^N − R_{x_k}^{-1}(x*)‖_{x_k} ≤ o(‖p_k‖_{x_k}) + (2L/m)‖R_{x_k}^{-1}(x*)‖²_{x_k}.

This inequality first implies ‖p_k‖_{x_k} = O(‖R_{x_k}^{-1}(x*)‖_{x_k}) and then ‖p_k − R_{x_k}^{-1}(x*)‖_{x_k} = o(‖R_{x_k}^{-1}(x*)‖_{x_k}) so that the result follows as in the proof of Proposition 7.

Remark 3. Of course, again the conditions on f_{R_{x_k}} in the previous analysis can be untangled into separate conditions on f and the retractions. For example, one might require f to be twice Lipschitz continuously differentiable and the retractions R_{x_k} to have uniformly bounded second derivatives for x_k in a neighborhood of x* and arguments in a neighborhood of 0 ∈ T_{x_k}M. For example, R_{x_k} = exp_{x_k}

or R_{x_k} = P_{γ[x_k;x_{k+1}]} satisfy these requirements if the manifold M is well-behaved near x*. (On a two-dimensional manifold, the second derivative of the exponential map deviates the stronger from zero, the larger the Gaussian curvature is in absolute value. On higher-dimensional manifolds, to compute the directional second derivative of the exponential map in two given directions, it suffices to consider only the two-dimensional submanifold which is spanned by the two directions, so the value of the directional second derivative depends on the sectional curvature belonging to these directions. If all sectional curvatures of M are uniformly bounded near x*, which is not necessarily the case for infinite-dimensional manifolds, this implies also the boundedness of the second derivatives of the exponential map so that R_{x_k} = exp_{x_k} or R_{x_k} = P_{γ[x_k;x_{k+1}]} indeed satisfy the requirements.)

3.1 BFGS quasi-Newton scheme

A classical way to retain superlinear convergence without computing the Hessian at every iterate consists in the use of quasi-Newton methods, where the objective function Hessian is approximated via the gradient information at the past iterates. The most popular method, which we would like to transfer to the manifold setting here, is the BFGS rank-2-update formula. Here, the search direction p_k is chosen as the solution to

B_k(p_k, ·) = −Df(x_k),   (6)

where the bilinear forms B_k : (T_{x_k}M)² → R are updated according to

s_k = α_k p_k = R_{x_k}^{-1}(x_{k+1}),
y_k = Df_{R_{x_k}}(s_k) − Df_{R_{x_k}}(0),
B_{k+1}(T v, T w) = B_k(v, w) − B_k(s_k, v)B_k(s_k, w)/B_k(s_k, s_k) + (y_k v)(y_k w)/(y_k s_k)   for all v, w ∈ T_{x_k}M.

Here, T ≡ T_{x_k,x_{k+1}} denotes some linear map from T_{x_k}M to T_{x_{k+1}}M, which we obviously require to be invertible to make B_{k+1} well-defined.

Remark 4. There are more possibilities to define the BFGS update, for example, using s_k = −R_{x_{k+1}}^{-1}(x_k), y_k = Df_{R_{x_{k+1}}}(0) − Df_{R_{x_{k+1}}}(−s_k), and a corresponding update formula for B_{k+1} (which looks as above, only with the vector transport at different places). The analysis works analogously, and the above choice only allows the most elegant notation.

Actually, the formulation from the above remark was already introduced by Gabay [9] (for geodesic retractions and parallel transport) and resumed by Absil et al. [2, 8] (for general retractions and vector transport). A slightly different variant, which ignores any kind of vector transport, was provided in [2], together with a proof of convergence. In contrast to these approaches, we also consider infinite-dimensional manifolds and prove convergence as well as superlinear convergence rate.
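The following Python sketch illustrates, in finite dimensions and under stated assumptions, how the BFGS update with vector transport can be realized in the inverse form (7) below: tangent vectors are coordinate arrays in orthonormal bases (so g_x(v, w) = v·w), transport_matrix is assumed to return the matrix of an isometric transport T_k, and a plain Armijo backtracking stands in for the Wolfe search required by the analysis.

```python
import numpy as np

def riemannian_bfgs(f, grad, retract, transport_matrix, x0,
                    max_iter=200, tol=1e-6, c1=1e-4):
    """Illustrative sketch of the Riemannian BFGS iteration of Section 3.1.
    Assumed callables (not part of the paper):
      grad(x)                -> coordinates of the gradient at x
      retract(x, v)          -> R_x(v)
      transport_matrix(x, s) -> matrix of an isometric T_{x, R_x(s)}
    """
    x, g = x0, grad(x0)
    H = np.eye(len(g))                       # inverse Hessian approximation H_0
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                           # BFGS direction, cf. (6)
        alpha, fx, slope = 1.0, f(x), g @ p
        while f(retract(x, alpha * p)) > fx + c1 * alpha * slope and alpha > 1e-16:
            alpha *= 0.5                     # Armijo backtracking on (1a)
        s = alpha * p                        # s_k = alpha_k p_k
        T = transport_matrix(x, s)           # T_k = T_{x_k, x_{k+1}}
        x = retract(x, s)                    # x_{k+1}
        g_new = grad(x)
        y = T.T @ g_new - g                  # Riesz repr. of y_k in T_{x_k}M
        ys = y @ s
        if ys > 1e-12:                       # curvature condition y_k s_k > 0
            J = np.eye(len(g)) - np.outer(s, y) / ys
            H = J @ H @ J.T + np.outer(s, s) / ys     # inverse update at x_k
        H = T @ H @ T.T                      # move H_{k+1} to T_{x_{k+1}}M, cf. (7)
        g = g_new
    return x
```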

Lemma 9. Consider Algorithm 1 with the above BFGS search direction and Wolfe step size control, where the T_k and T_k^{-1} are uniformly bounded and the f_{R_{x_k}} are assumed smooth. If B_0 is bounded and coercive, then y_k s_k > 0 and B_k is bounded and coercive for all k ∈ N.

Proof. Assume B_k to be bounded and coercive. Then p_k is a descent direction, and (1b) implies the curvature condition y_k s_k ≥ (c_2 − 1)α_k Df_{R_{x_k}}(0)p_k > 0 so that B_{k+1} is well-defined. Let us denote by ŷ_k = ∇f_{R_{x_k}}(s_k) − ∇f_{R_{x_k}}(0) the Riesz representation of y_k. Furthermore, by the Lax–Milgram lemma, there is B̂_k : T_{x_k}M → T_{x_k}M with B_k(v, w) = g_{x_k}(v, B̂_k w) for all v, w ∈ T_{x_k}M. Obviously,

B̂_{k+1} = T_k^{-*} [ B̂_k − g_{x_k}(B̂_k s_k, ·) B̂_k s_k / B_k(s_k, s_k) + g_{x_k}(ŷ_k, ·) ŷ_k / (y_k s_k) ] T_k^{-1}.

If H_k = B̂_k^{-1}, then by the Sherman–Morrison formula,

H_{k+1} = T_k [ J_k H_k J_k^* + g_{x_k}(s_k, ·) s_k / (y_k s_k) ] T_k^*,   J_k = id − g_{x_k}(ŷ_k, ·) s_k / (y_k s_k),   (7)

is the inverse of B̂_{k+1}. Its boundedness is obvious. Furthermore, H_{k+1} is coercive. Indeed, let M_k = ‖B̂_k‖; then for any v ∈ T_{x_k}M with ‖v‖_{x_k} = 1 and w = v − g_{x_k}(s_k, v)ŷ_k/(y_k s_k) we have

g_{x_{k+1}}(T_k^{-*} v, H_{k+1} T_k^{-*} v) = g_{x_k}(w, H_k w) + g_{x_k}(s_k, v)²/(y_k s_k) ≥ M_k^{-1}‖w‖²_{x_k} + g_{x_k}(s_k, v)²/(y_k s_k),

which is strictly greater than zero since the second summand being zero implies the first one being greater than or equal to M_k^{-1}. In finite dimensions this already yields the desired coercivity; in infinite dimensions it still remains to show that the right-hand side is uniformly bounded away from zero for all unit vectors v ∈ T_{x_k}M. Indeed,

M_k^{-1}‖w‖²_{x_k} + g_{x_k}(s_k, v)²/(y_k s_k) = M_k^{-1}[1 − 2 g_{x_k}(s_k, v) g_{x_k}(ŷ_k, v)/(y_k s_k) + (g_{x_k}(s_k, v)/(y_k s_k))² ‖ŷ_k‖²_{x_k}] + g_{x_k}(s_k, v)²/(y_k s_k)

is a continuous function in (g_{x_k}(s_k, v), g_{x_k}(ŷ_k, v)) which by the Weierstrass theorem takes its minimum on [−‖s_k‖_{x_k}, ‖s_k‖_{x_k}] × [−‖y_k‖_{x_k}, ‖y_k‖_{x_k}]. This minimum is the smallest eigenvalue of T_k^{-1} H_{k+1} T_k^{-*} and must be greater than zero. Both the boundedness and coercivity of T_k^{-1} H_{k+1} T_k^{-*} imply boundedness and coercivity of B̂_{k+1} and thus of B_{k+1}.

By virtue of the above lemma, the BFGS search direction is well-defined for all iterates. It satisfies the all-important secant condition B_{k+1}(T_k s_k, ·) = y_k ∘ T_k^{-1}, which basically has the interpretation that B_{k+1}(T_k ·, T_k ·) is supposed to be an approximation to D²f_{R_{x_k}}(s_k). Since we also aim at B_{k+1} ≈ D²f_{R_{x_{k+1}}}(0), the choice T_k = T^{R_{x_k}}_{x_k,x_{k+1}} seems natural in view of the fact D²f_{R_x}(R_x^{-1}(x*)) = D²f_{R_{x*}}(0)(T^{R_x}_{x,x*} ·, T^{R_x}_{x,x*} ·) for a stationary point x*. However, we were only able to establish the method's convergence for isometric T_k (see Proposition 10).

For the actual implementation, instead of solving (6) one applies H_k to −∇f(x_k). By (7), this entails computing the transported vectors T_{k-1}^{-1}∇f(x_k), T_{k-2}^{-1}T_{k-1}^{-1}∇f(x_k), ..., T_0^{-1}···T_{k-1}^{-1}∇f(x_k) and their scalar products with the s_i and y_i, i = 0, ..., k−1, as well as the transports of the s_i and y_i.

The convergence analysis of the scheme can be transferred from standard analyses (e.g. [7, Prop. 1.6.7–Thm. 1.6.9] or [6, Thm. 6.5], and [2] for slightly weaker results). Due to their technicality, the corresponding proofs are deferred to the appendix.

Proposition 10 (Convergence of BFGS descent). Consider Algorithm 1 with BFGS search direction and Wolfe step size control, where the T_k are isometries. Assume the f_{R_{x_k}} to be uniformly convex on the f(x_0)-sublevel set of f, that is, there are 0 < m < M < ∞ such that

m‖v‖² ≤ D²f_{R_{x_k}}(p)(v, v) ≤ M‖v‖²   for all v ∈ T_{x_k}M

for p ∈ R_{x_k}^{-1}({x ∈ M : f(x) ≤ f(x_0)}). If B_0 is symmetric, bounded and coercive, then there exists a constant 0 < µ < 1 such that

f(x_k) − f(x*) ≤ µ^{k+1}(f(x_0) − f(x*))

for the minimizer x* ∈ M of f.

Corollary 11. Under the conditions of the previous proposition and if f_{exp_{x*}} is uniformly convex in the sense m‖v‖²_{x*} ≤ D²f_{exp_{x*}}(p)(v, v) ≤ M‖v‖²_{x*} for all v ∈ T_{x*}M if f_{exp_{x*}}(p) ≤ f(x_0), then

dist(x_k, x*) ≤ √(M µ^{k+1}/m) dist(x_0, x*)   for all k ∈ N.

Proof. From f(x) − f(x*) = ½ D²f_{exp_{x*}}(t log_{x*} x)(log_{x*} x, log_{x*} x) for some t ∈ [0,1] we obtain (m/2) dist²(x, x*) ≤ f(x) − f(x*) ≤ (M/2) dist²(x, x*).

Remark 5. The isometry condition on T_k is for example satisfied for the parallel transport from x_k to x_{k+1}. The Riesz representation of y_k can be computed as (T^{R_{x_k}}_{x_k,x_{k+1}})^* ∇f(x_{k+1}) − ∇f(x_k). For R_{x_k} = P_{γ[x_k;x_{k+1}]}, this means simple parallel transport of ∇f(x_{k+1}).

For a superlinear convergence rate, it is well-known that in infinite dimensions one needs particular conditions on the initial Hessian approximation B_0 (it has to be an approximation to the true Hessian in the nuclear norm, see e.g. [9]). On manifolds we furthermore require a certain type of consistency between the vector transports as shown in the following proposition, which modifies the analysis from [6, Thm. 6.6]. For ease of notation, define the averaged Hessian G_k = ∫_0^1 D²f_{R_{x_k}}(t s_k) dt and let the hat in Ĝ_k denote the Lax–Milgram representation of G_k. The proof of the following is given in the appendix.

Proposition 12 (BFGS approximation of Newton direction). Consider Algorithm 1 with BFGS search direction and Wolfe step size control. Let x* be a stationary point of f with bounded and coercive D²f(x*). For k large enough, assume R_{x_k}^{-1}(x*) to be defined and assume the existence of isomorphisms T_{k,*} : T_{x_k}M → T_{x*}M with ‖T_{k,*}‖, ‖T_{k,*}^{-1}‖ < C uniformly for some C > 0 and

‖T^{R_{x_k}}_{x_k,x*} T_{k,*}^{-1} − id‖ →_{k→∞} 0,   ‖T_{k+1,*} T_k T_{k,*}^{-1} − id‖_1 < b β^k for some b > 0, 0 < β < 1,

where ‖·‖_1 denotes the nuclear norm (the sum of all singular values, ‖T‖_1 = tr √(T^*T)). Finally, assume ‖T_{0,*} B̂_0 T_{0,*}^{-1} − ∇²f(x*)‖_1 < ∞ for a symmetric, bounded and coercive B_0 and let

∑_{k=0}^∞ ‖T_{k,*} Ĝ_k T_{k,*}^{-1} − ∇²f(x*)‖_1 < ∞   and   ‖∇²f_{R_{x_k}}(R_{x_k}^{-1}(x*)) − ∇²f_{R_{x_k}}(0)‖ →_{k→∞} 0.

Then

lim_{k→∞} ‖Df(x_k) + D²f_{R_{x_k}}(0)(p_k, ·)‖_{x_k} / ‖p_k‖_{x_k} = 0

so that Proposition 8 can be applied.

Corollary 13 (Convergence rate of BFGS descent). Consider Algorithm 1 with BFGS search direction and Wolfe step size control, where c_1 ≤ 1/2, α_k = 1 whenever possible, and the T_k are isometries. Let x* be a stationary point of f with bounded and coercive D²f(x*), where we assume f_{exp_{x*}} to be uniformly convex on its f(x_0)-sublevel set. Assume that in a neighborhood U of x*, the DR_{x_k} are equicontinuous (in the sense of Lemma 6) and that the f_{R_{x_k}} with x_k ∈ U are twice Lipschitz continuously differentiable on R_{x_k}^{-1}(U) with uniform Lipschitz constant L and uniformly convex on the f(x_0)-sublevel set of f. Furthermore, assume the existence of isomorphisms T_{k,*} : T_{x_k}M → T_{x*}M with ‖T_{k,*}‖ and ‖T_{k,*}^{-1}‖ uniformly bounded and

‖T^{R_{x_k}}_{x_k,x*} T_{k,*}^{-1} − id‖, ‖T_{k+1,*} T_k T_{k,*}^{-1} − id‖_1 ≤ c max{dist(x_k, x*), dist(x_{k+1}, x*)}

for some c > 0. If B̂_0 is bounded and coercive with ‖T_{0,*} B̂_0 T_{0,*}^{-1} − ∇²f(x*)‖_1 < ∞, then x_k → x* with

lim_{k→∞} dist(x_{k+1}, x*)/dist(x_k, x*) = 0.

Proof. The conditions of Corollary 11 are satisfied so that we have dist(x_k, x*) < b β^k for some b > 0, 0 < β < 1. As in the proof of Proposition 7, the equicontinuity of the retraction variations then implies the well-definedness of R_{x_k}^{-1}(x*) for k large enough. Also,

‖T^{R_{x_k}}_{x_k,x*} T_{k,*}^{-1} − id‖, ‖T_{k+1,*} T_k T_{k,*}^{-1} − id‖_1 < c b β^k → 0.

Furthermore, ‖∇²f_{R_{x_k}}(R_{x_k}^{-1}(x*)) − ∇²f_{R_{x_k}}(0)‖ ≤ L‖R_{x_k}^{-1}(x*)‖_{x_k} = O(dist(x_k, x*)) → 0 follows from the uniform Lipschitz continuity of ∇²f_{R_{x_k}} and Lemma 6. Likewise,

‖G_k(T_{k,*}^{-1} ·, T_{k,*}^{-1} ·) − D²f(x*)‖ ≤ ‖D²f_{R_{x_k}}(0)(T_{k,*}^{-1} ·, T_{k,*}^{-1} ·) − D²f(x*)‖ + ‖T_{k,*}^{-1}‖² ‖G_k − D²f_{R_{x_k}}(0)‖.

The second term is bounded by ‖T_{k,*}^{-1}‖² L ‖R_{x_k}^{-1}(x_{k+1})‖_{x_k} and thus by some constant times β^k due to the equivalence between the retractions and the exponential map near x* (Lemma 6). The first term on the right-hand side can be bounded as in the proof of Proposition 12, which also yields a constant times β^k, so that Proposition 12 can be applied. Proposition 8 then implies the result.

On finite-dimensional manifolds, the operator norm is equivalent to the nuclear norm, and the condition on B_0 reduces to B_0 being bounded and coercive. In that case, if the manifold M has bounded sectional curvatures near x*, f_{exp_{x*}} is uniformly convex, and f is twice Lipschitz continuously differentiable, then the above conditions are for instance satisfied with R_{x_k} = exp_{x_k} or R_{x_k} = P_{γ[x_k;x_{k+1}]} and T_k as well as T_{k,*} being standard parallel transport. Note that the nuclear norm bound on ‖T_{k+1,*} T_k T_{k,*}^{-1} − id‖_1 is quite a strong condition, and it is not clear whether it could perhaps be relaxed. If for illustration we imagine T_k and T_{k,*} to be simple parallel transport, then the bound implies that the manifold behaves almost like a hyperplane (except for a finite number of dimensions).

3.2 Fletcher–Reeves nonlinear conjugate gradient scheme

Other gradient-based minimization methods with good convergence properties in practice include nonlinear conjugate gradient schemes. Here, in each step the search direction is chosen conjugate to the old search direction in a certain sense. Such methods often enjoy superlinear convergence rates. For example, Luenberger has shown ‖x_{k+n} − x*‖ = O(‖x_k − x*‖²) for the Fletcher–Reeves nonlinear conjugate gradient iteration on flat n-dimensional manifolds. However, such analyses seem quite special and intricate, and the understanding of these methods seems not yet as advanced as of the quasi-Newton approaches, for example. Hence, we will only briefly transfer the convergence analysis for the standard Fletcher–Reeves approach to the manifold case. Here, the search direction is chosen according to

p_k = −∇f(x_k) + β_k T^{R_{x_{k-1}}}_{x_{k-1},x_k} p_{k-1},   β_k = ‖Df(x_k)‖²_{x_k} / ‖Df(x_{k-1})‖²_{x_{k-1}}

(with β_0 = 0), and the step length α_k satisfies the strong Wolfe conditions.

Remark 6. If (as is typically done) the nonlinear conjugate gradient iteration is restarted every K steps with p_{iK} = −∇f(x_{iK}), i ∈ N, then Theorem 2 directly implies lim inf_{k→∞} ‖Df(x_k)‖_{x_k} = 0. Hence, if f is strictly convex in the sense of Proposition 4, we obtain convergence of x_k against the minimizer.

We immediately have the following classical bound (e.g. [6, Lem. 5.6]).

Lemma 14. Consider Algorithm 1 with Fletcher–Reeves search direction and strong Wolfe step size control with c_2 < 1/2. Then

−1/(1 − c_2) ≤ Df(x_k)p_k / ‖Df(x_k)‖²_{x_k} ≤ (2c_2 − 1)/(1 − c_2)   for all k ∈ N.

Proof. The result is obvious for k = 0. In order to perform an induction, note

Df(x_{k+1})p_{k+1} / ‖Df(x_{k+1})‖²_{x_{k+1}} = −1 + β_{k+1} Df(x_{k+1}) T^{R_{x_k}}_{x_k,x_{k+1}} p_k / ‖Df(x_{k+1})‖²_{x_{k+1}} = −1 + Df_{R_{x_k}}(α_k p_k) p_k / ‖Df(x_k)‖²_{x_k}.

The strong Wolfe condition (2) thus implies

−1 + c_2 Df(x_k)p_k / ‖Df(x_k)‖²_{x_k} ≤ Df(x_{k+1})p_{k+1} / ‖Df(x_{k+1})‖²_{x_{k+1}} ≤ −1 − c_2 Df(x_k)p_k / ‖Df(x_k)‖²_{x_k},

from which the result follows by the induction hypothesis.

This bound allows the application of a standard argument (e.g. [6, Thm. 5.7]) to obtain lim inf_{k→∞} ‖Df(x_k)‖_{x_k} = 0. Thus, as before, if f is strictly convex in the sense of Proposition 4, the sequence x_k converges against the minimizer.

Proposition 15 (Convergence of Fletcher–Reeves CG iteration). Under the conditions of the previous lemma and if ‖T^{R_{x_k}}_{x_k,x_{k+1}} p_k‖_{x_{k+1}} ≤ ‖p_k‖_{x_k} for all k ∈ N, then lim inf_{k→∞} ‖Df(x_k)‖_{x_k} = 0.

Proof. The inequality of the previous lemma can be multiplied by −‖Df(x_k)‖_{x_k}/‖p_k‖_{x_k} to yield

(1 − 2c_2)/(1 − c_2) · ‖Df(x_k)‖_{x_k}/‖p_k‖_{x_k} ≤ cos θ_k ≤ 1/(1 − c_2) · ‖Df(x_k)‖_{x_k}/‖p_k‖_{x_k}.

Theorem 2 then implies

∑_{k=0}^∞ ‖Df(x_k)‖⁴_{x_k} / ‖p_k‖²_{x_k} < ∞.

Furthermore, by (2) and the previous lemma, |Df(x_k) T^{R_{x_{k-1}}}_{x_{k-1},x_k} p_{k-1}| = |Df_{R_{x_{k-1}}}(α_{k-1}p_{k-1})p_{k-1}| ≤ c_2/(1 − c_2) ‖Df(x_{k-1})‖²_{x_{k-1}}, so that

‖p_k‖² = ‖∇f(x_k)‖² − 2β_k Df(x_k) T^{R_{x_{k-1}}}_{x_{k-1},x_k} p_{k-1} + β_k² ‖T^{R_{x_{k-1}}}_{x_{k-1},x_k} p_{k-1}‖²
≤ ‖∇f(x_k)‖² + 2c_2/(1 − c_2) β_k ‖Df(x_{k-1})‖²_{x_{k-1}} + β_k² ‖p_{k-1}‖²
= (1 + c_2)/(1 − c_2) ‖∇f(x_k)‖² + ‖Df(x_k)‖⁴/‖Df(x_{k-1})‖⁴ ‖p_{k-1}‖²
≤ (1 + c_2)/(1 − c_2) ‖Df(x_k)‖⁴ ∑_{j=0}^{k} ‖Df(x_j)‖^{-2}_{x_j},

where the last step follows from induction and p_0 = −∇f(x_0). However, if we now assume ‖Df(x_k)‖_{x_k} ≥ γ for some γ > 0 and all k ∈ N, then this implies ‖p_k‖² ≤ (1 + c_2)/(1 − c_2) ‖Df(x_k)‖⁴ (k + 1)/γ² so that

∑_{k=0}^∞ ‖Df(x_k)‖⁴/‖p_k‖² ≥ (1 − c_2)/(1 + c_2) γ² ∑_{k=0}^∞ 1/(k + 1) = ∞,

contradicting Zoutendijk's theorem.

Remark 7. R_{x_k} = exp_{x_k} and R_{x_k} = P_{γ[x_k;x_{k+1}]} both satisfy the conditions required for convergence (the iterations for both retractions coincide). If the vector transport associated with the retraction increases the norm of the transported vector, convergence can no longer be guaranteed.

Remark 8. If in the iteration we instead use

β_k = Df(x_k)(∇f(x_k) − ∇f_{R_{x_k}}(R_{x_k}^{-1}(x_{k-1})))/‖Df(x_{k-1})‖²_{x_{k-1}}   or   β_{k+1} = Df_{R_{x_k}}(α_k p_k)(∇f_{R_{x_k}}(α_k p_k) − ∇f(x_k))/‖Df(x_k)‖²_{x_k},

we obtain a Polak–Ribière variant of a nonlinear CG iteration. If there are 0 < m < M < ∞ such that m‖v‖² ≤ D²f_{R_{x_k}}(p)(v, v) ≤ M‖v‖² for all v ∈ T_{x_k}M whenever f(R_{x_k}(p)) ≤ f(x_0), if ‖(T^{R_{x_k}}_{x_k,x_{k+1}})^{-1}‖ (respectively ‖T^{R_{x_k}}_{x_k,x_{k+1}}‖) is uniformly bounded, and if α_k is obtained by exact linesearch, then for all k ∈ N one can show

cos θ_k ≥ 1/(1 + (M/m)‖(T^{R_{x_{k-1}}}_{x_{k-1},x_k})^{-1}‖)   (respectively cos θ_{k+1} ≥ 1/(1 + (M/m)‖T^{R_{x_k}}_{x_k,x_{k+1}}‖)) =: ρ,

which implies at least linear convergence,

dist(x_k, x*) ≤ √((2/m)(f(x_0) − f(x*))) [1 − (m/M)ρ²]^{k/2}.

Here, the proofs of 1.5.8 and 1.5.9 from [7] can be directly transferred with λ_i := α_i, h_i := p_i, g_i := −∇f(x_i), H(x_i + sλ_i h_i) := ∇²f_{R_{x_i}}(sα_i p_i), replacing g_{i+1} by ∇f_{R_{x_i}}(α_i p_i) in (9a), ⟨g_{i+1}, h_i⟩ by Df_{R_{x_i}}(α_i p_i)p_i, and ⟨H_i h_i, g_{i+1}⟩ by either ⟨T^{R_{x_i}}_{x_i,x_{i+1}} H_i h_i, g_{i+1}⟩ or ⟨H_i h_i, ∇f_{R_{x_i}}(α_i p_i)⟩ in the denominator of (9d). 1.5.8(b) follows from Theorem 2 similarly to the proof of Proposition 10.
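As an illustration of Section 3.2, the following hedged Python sketch implements the Fletcher–Reeves iteration with vector transport in finite dimensions; grad, retract and transport are assumed user-supplied, tangent vectors are coordinate arrays in orthonormal bases, and a backtracking Armijo rule (with a steepest-descent safeguard and periodic restarts as in Remark 6) replaces the strong Wolfe search assumed by Lemma 14 and Proposition 15.

```python
import numpy as np

def riemannian_fletcher_reeves(f, grad, retract, transport, x0,
                               max_iter=500, tol=1e-6, c1=1e-4, restart=50):
    """Sketch of the Fletcher-Reeves iteration of Section 3.2.
    Assumed callable: transport(x, s, w) -> T^{R_x}_{x, R_x(s)} w.
    """
    x = x0
    g = grad(x)
    p = -g
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g @ p >= 0:                         # safeguard: fall back to steepest descent
            p = -g
        alpha, fx, slope = 1.0, f(x), g @ p
        while f(retract(x, alpha * p)) > fx + c1 * alpha * slope and alpha > 1e-16:
            alpha *= 0.5                       # Armijo backtracking on (1a)
        s = alpha * p
        x_new = retract(x, s)
        g_new = grad(x_new)
        if (k + 1) % restart == 0:
            beta = 0.0                         # periodic restart (Remark 6)
        else:
            beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves beta_{k+1}
        # p_{k+1} = -grad f(x_{k+1}) + beta * transported old direction
        p = -g_new + beta * transport(x, s, p)
        x, g = x_new, g_new
    return x
```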

Figure 1: Minimization of f_1 (left) and f_2 (right) on the torus via the Fletcher–Reeves algorithm. The top row performs the optimization in the parametrization domain, the bottom row shows the result for the Riemannian optimization. For each case the optimization path is shown on the torus and in the parametrization domain (the color-coding from blue to red indicates the function value) as well as the evolution of the function values. Obviously, optimization in the parametrization domain is more suitable for f_1, whereas Riemannian optimization with the torus metric is more suitable for f_2.

4 Numerical examples

In this section we will first consider simple optimization problems on the two-dimensional torus to illustrate the influence of the Riemannian metric on the optimization progress. Afterwards we turn to an active contour model and a simulation of truss shape deformations as exemplary optimization problems to prove the efficiency of Riemannian optimization methods also in the more complex setting of Riemannian shape spaces.

4.1 The rôle of metric and vector transport

The minimum of a functional on a manifold is independent of the manifold metric and the chosen retractions. Consequently, exploiting the metric structure does not necessarily aid the optimization process, and one could certainly impose different metrics on the same manifold of which some are more beneficial for the optimization problem at hand than others. The optimal pair of a metric and a retraction would be such that one single gradient descent step already hits the minimum. However, the design of such pairs requires far more effort than solving the optimization problem in a suboptimal way. Often, a certain metric and retraction fit naturally to the optimization problem at hand.

For illustration, consider the two-dimensional torus, parameterized by

(ϕ, ψ) ↦ y(ϕ, ψ) = ((r_1 + r_2 cos ϕ) cos ψ, (r_1 + r_2 cos ϕ) sin ψ, r_2 sin ϕ),   ϕ, ψ ∈ (−π, π],   r_1 = 2, r_2 = 3/5.

The corresponding metric shall be induced by the Euclidean embedding. As objective functions let us consider the following, both expressed as functions on the parametrization domain,

f_1(ϕ, ψ) = a(1 − cos ϕ) + b((ψ + π/2) mod (2π) − π)²,
f_2(ϕ, ψ) = a(1 − cos ϕ) + b dist(ϕ, Φ_ψ)²,

where we use (a, b) = (1, 40) and, for all ψ ∈ (−π, π], Φ_ψ is a discrete set such that {(ϕ, ψ) ∈ (−π, π]² : ϕ ∈ Φ_ψ} describes a shortest curve winding five times around the torus and passing through (ϕ, ψ) = (0, 0) (compare Figure 1, right). Both functions may be slightly altered so that they are smooth all over the torus. f_1 exhibits a narrow valley that is aligned with geodesics of the parametrization domain, while the valleys of f_2 follow a geodesic path on the torus. Obviously, an optimization based on the (Euclidean) metric and (straight line) geodesic retractions of the parametrization domain is much better in following the valley of f_1 than an optimization based on the actual torus metric (Figure 1). For f_2 the situation is reverse. This phenomenon is also reflected in the iteration numbers until convergence (Table 1). It is not very pronounced for methods which converge after only few iterations, but it is very noticeable especially for gradient descent and nonlinear conjugate gradient iterations.
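A small Python sketch of the torus test problem may help to reproduce the setting; the radii r_1 = 2, r_2 = 3/5, the weights (a, b) = (1, 40) and the form of f_1 are taken from the (partly garbled) text above and should be treated as assumptions, as should the starting point and step size. f_2 is omitted since it requires the winding-curve set Φ_ψ.

```python
import numpy as np

r1, r2, a, b = 2.0, 0.6, 1.0, 40.0      # assumed values as read from the text

def embed(phi, psi):
    """Embedding y(phi, psi) of the torus into R^3."""
    return np.array([(r1 + r2 * np.cos(phi)) * np.cos(psi),
                     (r1 + r2 * np.cos(phi)) * np.sin(psi),
                     r2 * np.sin(phi)])

def f1(phi, psi):
    """Valley aligned with straight lines in the parametrization domain."""
    return a * (1 - np.cos(phi)) + b * ((psi + np.pi / 2) % (2 * np.pi) - np.pi) ** 2

def descend(f, x0, step=5e-3, iters=2000, h=1e-6):
    """Plain gradient descent in the parametrization domain (Euclidean metric,
    straight-line retraction), as in the top row of Figure 1; the gradient is
    approximated by central finite differences."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        g = np.array([(f(x[0] + h, x[1]) - f(x[0] - h, x[1])) / (2 * h),
                      (f(x[0], x[1] + h) - f(x[0], x[1] - h)) / (2 * h)])
        x = x - step * g
    return x

print(descend(f1, (2.0, 2.0)))   # approaches the minimizer (0, pi/2) of f1 as written here
```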

Table 1: Iteration numbers for minimization of f_1 and f_2 with different methods (rows: gradient descent, Newton descent, BFGS quasi-Newton, Fletcher–Reeves NCG; columns: parametrization metric and torus metric, for each of the objectives f_1 and f_2; the numeric entries were lost in this transcription). The iteration is stopped as soon as ‖(∂f_i/∂ϕ, ∂f_i/∂ψ)‖_{l²} < 10^{-3}. As retractions we use the exponential maps with respect to the metric of the parametrization domain and of the torus, respectively. The BFGS and NCG methods employ the corresponding parallel transport.

Figure 2: Evolution of distance d to the global minimizer (ϕ, ψ) = (0, 0) of f_2 for different methods, starting from (ϕ, ψ) = ( , 0). The thick dashed and solid line belongs to Riemannian gradient descent and Riemannian BFGS descent (with T_k being parallel transport), respectively. The thin lines show the evolution for Riemannian BFGS descent, where T_k is parallel transport concatenated with a rotation by ω for ω = 1/100 (solid), ω = d/10 (dashed), ω = (2/5)√d (dotted), and ω = 1/(5 log(−log d)) (dot-dashed).

Of course, the qualitative convergence behavior stays the same (such as linear convergence for gradient descent or superlinear convergence for the BFGS method), independent of the employed retractions. Therefore it sometimes pays off to choose rapidly computable retractions over retractions that minimize the number of optimization steps. We might for example minimize f_1 or f_2 using the torus metric but non-geodesic retractions R_x : T_xM → M, v ↦ y(y^{-1}(x) + (Dy)^{-1}v) (a straight step in the parametrization domain), where y was the parameterization. The resulting optimization will behave similarly to the optimization based on the (Euclidean) metric of the parametrization domain, and indeed, the Fletcher–Reeves algorithm requires 47 minimization steps for f_1 and 83 for f_2.

As a final discussion based on the illustrative torus example, consider the conditions on the vector transport T_k for the BFGS method. It is an open question whether the isometry of T_k is really needed for global convergence in Proposition 10. On compact manifolds such as the torus this condition is not needed since the sequence of iterates will always contain a converging subsequence so that global convergence follows from Proposition 12 instead of Proposition 10. A counterexample, showing the necessity of isometric transport, will likely be difficult to obtain. On the other hand, we can numerically validate the conditions on T_k to obtain superlinear convergence. Figure 2 shows the evolution of the distance between the current iterate x_k and x* = arg min f_2 for the BFGS method with varying T_k. In particular, we take T_k to be parallel transport concatenated with a rotation by some angle ω that depends on dist(x_k, x*). The term ‖T_{k+1,*} T_k T_{k,*}^{-1} − id‖_1 in Proposition 12 and Corollary 13 then scales (at least) like ω. Apparently, the superlinear convergence seems to be retained for ω ∼ dist(x_k, x*) as well as ω ∼ √dist(x_k, x*), which is better than predicted by Corollary 13. (This is not surprising, though, since ‖T_{k+1,*} T_k T_{k,*}^{-1} − id‖_1 ∼ √dist(x_k, x*) combined with superlinear convergence of x_k can still satisfy the conditions of Proposition 12.) However, for constant ω and ω ∼ 1/log(−log(dist(x_k, x*))), convergence is indeed only linear. Nevertheless, convergence is still much faster than for gradient descent.
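To illustrate the perturbed vector transport used in the Figure 2 experiment, the following sketch composes an isometric 2×2 transport matrix with a rotation by ω; the concrete angle choices mirror the caption of Figure 2 as reconstructed above and are therefore assumptions.

```python
import numpy as np

def perturbed_transport(T, d, mode="sqrt"):
    """Compose an isometric 2x2 transport matrix T (orthonormal tangent bases)
    with a rotation whose angle omega depends on the distance d to the
    minimizer, as in the Figure 2 experiment (angle choices assumed)."""
    if mode == "const":
        omega = 1.0 / 100.0
    elif mode == "lin":
        omega = d / 10.0
    elif mode == "sqrt":
        omega = 2.0 / 5.0 * np.sqrt(d)
    else:                                  # "loglog"; only sensible for d well below 1/e
        omega = 1.0 / (5.0 * np.log(-np.log(d)))
    R = np.array([[np.cos(omega), -np.sin(omega)],
                  [np.sin(omega),  np.cos(omega)]])
    return R @ T
```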

4.2 Riemannian optimization in the space of smooth closed curves

Riemannian optimization on the Stiefel manifold has been applied successfully and efficiently to several linear algebra problems, from low rank approximation [7] to eigenvalue computation [8]. The same concepts can be transferred to efficient optimization methods in the space of closed smooth curves, using the shape space and description of curves introduced by Younes et al. [26]. They represent a curve c : [0,1] → C ≅ R² by two functions e, g : [0,1] → R via

c(θ) = c(0) + (1/2) ∫_0^θ (e + ig)² dϑ.

The conditions that the curve be closed, c(1) = c(0), and of unit length, 1 = ∫_0^1 |c'(θ)| dθ, result in the fact that e and g are orthonormal in L²(0,1); thus (e, g) forms an element of the Stiefel manifold

St(2, L²(0,1)) = {(e, g) ∈ L²(0,1)² : ‖e‖_{L²(0,1)} = ‖g‖_{L²(0,1)} = 1, (e, g)_{L²(0,1)} = 0}.

The Riemannian metric of the Stiefel manifold can now be imposed on the space of smooth closed curves with unit length and fixed base point, which was shown to be equivalent to endowing this shape space with a Sobolev-type metric [26]. For general closed curves we follow Sundaramoorthi et al. [22] and represent a curve c by an element (c_0, ρ, (e, g)) of R² × R × St(2, L²(0,1)) via

c(θ) = c_0 + exp(ρ) (1/2) ∫_0^θ (e + ig)² dϑ.

Here c_0 describes the curve base point and exp(ρ) its length. (Note that Sundaramoorthi et al. choose c_0 as the curve centroid. Choosing the base point instead simplifies the notation a little and yields the same qualitative behavior.) Sundaramoorthi et al. have shown the corresponding Riemannian metric to be equivalent to the very natural metric

g_{[c]}(h, k) = h_t · k_t + λ_l h_l k_l + λ_d ∫_{[c]} (dh_d/ds) · (dk_d/ds) ds

on the tangent space of curve variations h, k : [c] → R², where [c] is the image of c : [0,1] → R², s denotes arclength, and λ_l, λ_d > 0. Here, h_t and h_l are the Gâteaux derivatives of the curve centroid (in our case the base point) and of the logarithm of the curve length for curve variation in direction h, and h_d = h − h_t − h_l(c − c_0) (analogously for k). By [0] this yields a geodesically complete shape space in which there is a closed formula for the exponential map [22], lending itself for Riemannian optimization. (Note that simple L²-type metrics in the space of curves can in general not be used since the resulting spaces usually are degenerate [5]: They exhibit paths of arbitrarily small length between any two curves.)

To illustrate the efficiency of Riemannian optimization in this context we consider the task of image segmentation via active contours without edges as proposed by Chan and Vese [8]. For a given gray scale image u : [0,1]² → R we would like to minimize the objective functional

f([c]) = a_1 ( ∫_{int[c]} (u_i − u)² dx + ∫_{ext[c]} (u_e − u)² dx ) + a_2 ∫_{[c]} ds,

where a_1, a_2 > 0, u_i and u_e are given gray values, and int[c] and ext[c] denote the interior and exterior of [c]. The first two terms indicate that [c] should enclose the image region where u is close to u_i and far from u_e, while the third term acts as a regularizer and measures the curve length. We interpret the curve c as an element of the above Riemannian manifold R² × R × St(2, L²(0,1)) and add an additional term to the objective functional that prefers a uniform curve parametrization. The objective functional then reads

f(c_0, ρ, (e, g)) = a_1 ( ∫_{int[(c_0,ρ,(e,g))]} (u_i − u)² dx + ∫_{ext[(c_0,ρ,(e,g))]} (u_e − u)² dx ) + a_2 exp(ρ) + a_3 ∫_0^1 (e² + g²)² dϑ,

Figure 3: Curve evolution during BFGS minimization of f. The curve is depicted at steps 0, 1, 2, 3, 4, 7, and after convergence. Additionally we show the evolution of the function value f(c_k) − min_c f(c).

Table 2: Iteration numbers for minimization of f with different methods (rows: gradient flow, gradient descent, BFGS quasi-Newton, Fletcher–Reeves NCG; columns: non-geodesic retraction, geodesic retraction; the numeric entries were lost in this transcription). The iteration is stopped as soon as the derivative of the discretized functional f has l²-norm less than 10^{-3}. For the gradient flow discretization we employ a stepsize of 0.001, which is roughly the largest stepsize for which the curve stays within the image domain during the whole iteration.

where we choose (a_1, a_2, a_3) = (50, 1, 1). For numerical implementation, e and g are discretized as piecewise constant functions on an equispaced grid over [0,1], and the image u is given as pixel values on a uniform quadrilateral grid, where we interpolate bilinearly between the nodes.

Figure 3 shows the curve evolution for a particular example. Obviously, the natural metric ensures that the correct curve positioning, scaling, and deformation take place quite independently. Corresponding iteration numbers are shown in Table 2. In one case we employed geodesic retractions with parallel transport (simple formulae for which are based on the matrix exponential [22]), in the other case we used

R_{(c_0,ρ,(e,g))}(δc_0, δρ, (δe, δg)) = (c_0 + δc_0, ρ + δρ, Π_{St(2,L²(0,1))}(e + δe, g + δg)),
T_{(c_0^1,ρ^1,(e^1,g^1)),(c_0^2,ρ^2,(e^2,g^2))}(δc_0, δρ, (δe, δg)) = (δc_0, δρ, Π_{T_{(e^2,g^2)}St(2,L²(0,1))}(δe, δg)),

where Π_S denotes the orthogonal projection onto S ⊂ (L²(0,1))². Due to the closed-form solution of the exponential map there is hardly any difference in computational costs. Riemannian BFGS and nonlinear conjugate gradient iteration yield much faster convergence than gradient descent. Gradient descent with step size control in turn is faster than gradient flow (which in numerical implementations corresponds to gradient descent with a fixed small step size, which is the method employed in [22]), so that the use of higher order methods such as BFGS or NCG will yield a substantial gain in computation time.

Figure 4 shows experiments for different weights inside the metric. We vary λ_d and the ratio λ_d/λ_l by the factor 16. Obviously, a larger λ_d ensures a good curve positioning and scaling before starting major deformations. Small λ_d has the reverse effect. The ratio λ_d/λ_l decides whether first the scaling or the positioning is adjusted.

To close this example, Figure 5 shows the active contour segmentation on the widely used cameraman image. Since the image contains rather sharp discontinuities, the derivatives of the objective functional exhibit regions of steep variations. Nevertheless, the NCG and BFGS methods stay superior to gradient descent: While for the top example in Figure 5 the BFGS method needed 46 steps, the NCG iteration needed 2539 and the gradient descent 8325 steps (the iteration was stopped as soon as the derivative of the discretized objective functional reached an l²-norm less than 10^{-2}). The cameraman example also shows the limitations of the above shape space in the context of segmentation problems. Since we only consider closed curves with a well-defined interior and exterior, we can only segment simply connected regions. As soon as the curve self-intersects, we thus have to stop the optimization (compare Figure 5, bottom).
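The following self-contained Python sketch illustrates the curve representation of this section: it projects a discretized pair (e, g) onto St(2, L²(0,1)) and evaluates c(θ) = c_0 + exp(ρ)/2 ∫_0^θ (e + ig)² dϑ; the discretization and all function names are assumptions of this sketch rather than the authors' implementation.

```python
import numpy as np

def project_stiefel(e, g):
    """Project a discretized pair (e, g) onto St(2, L^2(0,1)):
    orthonormalize with respect to the (midpoint-rule) L^2 inner product."""
    N = len(e)
    ip = lambda u, v: np.dot(u, v) / N          # approximates int_0^1 u v dtheta
    e = e / np.sqrt(ip(e, e))
    g = g - ip(e, g) * e
    g = g / np.sqrt(ip(g, g))
    return e, g

def curve_points(c0, rho, e, g):
    """Evaluate c(theta) = c0 + exp(rho)/2 * int_0^theta (e + i g)^2 dtheta."""
    N = len(e)
    z = (e + 1j * g) ** 2                       # integrand (e + i g)^2
    c = c0[0] + 1j * c0[1] + np.exp(rho) / 2 * np.cumsum(z) / N
    return np.column_stack([c.real, c.imag])

# Example: (e, g) chosen so that (e + i g)^2 runs once around the unit circle,
# which yields (up to discretization error) a closed circular curve of length exp(rho).
N = 200
theta = (np.arange(N) + 0.5) / N
e, g = project_stiefel(np.sqrt(2) * np.cos(np.pi * theta),
                       np.sqrt(2) * np.sin(np.pi * theta))
pts = curve_points(np.array([0.5, 0.5]), 0.0, e, g)
print(np.linalg.norm(pts[0] - pts[-1]))         # small -> curve (nearly) closes
```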

Figure 4: Steps 0, 1, 3, 5 of the BFGS optimization, using different weights in the shape space metric (top row: λ_d = 16; bottom row: λ_d = 1; left column: λ_d/λ_l = 16; right column: λ_d/λ_l = 1).

Figure 5: Segmentation of the cameraman image with different parameters (using the BFGS iteration and λ_l = λ_d = 1). Top: (a_1, a_2, a_3) = (50, 3·10^{-1}, 10^{-3}); steps 0, 1, 5, 10, 20, 46 are shown. Middle: (a_1, a_2, a_3) = (50, 8·10^{-2}, 10^{-3}); steps 0, 10, 20, 40, 60, 116 are shown. Bottom: (a_1, a_2, a_3) = (50, 10^{-2}, 10^{-3}); steps 0, 50, 100, 150, 200, 250 are shown. The curves were reparameterized every 70 steps. The bottom iteration was stopped as soon as the curve self-intersected.


Definition 5.1. A vector field v on a manifold M is map M T M such that for all x M, v(x) T x M. 5 Vector fields Last updated: March 12, 2012. 5.1 Definition and general properties We first need to define what a vector field is. Definition 5.1. A vector field v on a manifold M is map M T M such that

More information

Implicit Functions, Curves and Surfaces

Implicit Functions, Curves and Surfaces Chapter 11 Implicit Functions, Curves and Surfaces 11.1 Implicit Function Theorem Motivation. In many problems, objects or quantities of interest can only be described indirectly or implicitly. It is then

More information

Nonlinear equations. Norms for R n. Convergence orders for iterative methods

Nonlinear equations. Norms for R n. Convergence orders for iterative methods Nonlinear equations Norms for R n Assume that X is a vector space. A norm is a mapping X R with x such that for all x, y X, α R x = = x = αx = α x x + y x + y We define the following norms on the vector

More information

Lectures in Discrete Differential Geometry 2 Surfaces

Lectures in Discrete Differential Geometry 2 Surfaces Lectures in Discrete Differential Geometry 2 Surfaces Etienne Vouga February 4, 24 Smooth Surfaces in R 3 In this section we will review some properties of smooth surfaces R 3. We will assume that is parameterized

More information

5 Quasi-Newton Methods

5 Quasi-Newton Methods Unconstrained Convex Optimization 26 5 Quasi-Newton Methods If the Hessian is unavailable... Notation: H = Hessian matrix. B is the approximation of H. C is the approximation of H 1. Problem: Solve min

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

Exercise Solutions to Functional Analysis

Exercise Solutions to Functional Analysis Exercise Solutions to Functional Analysis Note: References refer to M. Schechter, Principles of Functional Analysis Exersize that. Let φ,..., φ n be an orthonormal set in a Hilbert space H. Show n f n

More information

LECTURE 15: COMPLETENESS AND CONVEXITY

LECTURE 15: COMPLETENESS AND CONVEXITY LECTURE 15: COMPLETENESS AND CONVEXITY 1. The Hopf-Rinow Theorem Recall that a Riemannian manifold (M, g) is called geodesically complete if the maximal defining interval of any geodesic is R. On the other

More information

THEODORE VORONOV DIFFERENTIABLE MANIFOLDS. Fall Last updated: November 26, (Under construction.)

THEODORE VORONOV DIFFERENTIABLE MANIFOLDS. Fall Last updated: November 26, (Under construction.) 4 Vector fields Last updated: November 26, 2009. (Under construction.) 4.1 Tangent vectors as derivations After we have introduced topological notions, we can come back to analysis on manifolds. Let M

More information

Introduction to Nonlinear Optimization Paul J. Atzberger

Introduction to Nonlinear Optimization Paul J. Atzberger Introduction to Nonlinear Optimization Paul J. Atzberger Comments should be sent to: atzberg@math.ucsb.edu Introduction We shall discuss in these notes a brief introduction to nonlinear optimization concepts,

More information

4.7 The Levi-Civita connection and parallel transport

4.7 The Levi-Civita connection and parallel transport Classnotes: Geometry & Control of Dynamical Systems, M. Kawski. April 21, 2009 138 4.7 The Levi-Civita connection and parallel transport In the earlier investigation, characterizing the shortest curves

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

Loos Symmetric Cones. Jimmie Lawson Louisiana State University. July, 2018

Loos Symmetric Cones. Jimmie Lawson Louisiana State University. July, 2018 Louisiana State University July, 2018 Dedication I would like to dedicate this talk to Joachim Hilgert, whose 60th birthday we celebrate at this conference and with whom I researched and wrote a big blue

More information

Nonlinear Programming

Nonlinear Programming Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week

More information

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 3. Gradient Method

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 3. Gradient Method Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 3 Gradient Method Shiqian Ma, MAT-258A: Numerical Optimization 2 3.1. Gradient method Classical gradient method: to minimize a differentiable convex

More information

Identifying Active Constraints via Partial Smoothness and Prox-Regularity

Identifying Active Constraints via Partial Smoothness and Prox-Regularity Journal of Convex Analysis Volume 11 (2004), No. 2, 251 266 Identifying Active Constraints via Partial Smoothness and Prox-Regularity W. L. Hare Department of Mathematics, Simon Fraser University, Burnaby,

More information

INVERSE FUNCTION THEOREM and SURFACES IN R n

INVERSE FUNCTION THEOREM and SURFACES IN R n INVERSE FUNCTION THEOREM and SURFACES IN R n Let f C k (U; R n ), with U R n open. Assume df(a) GL(R n ), where a U. The Inverse Function Theorem says there is an open neighborhood V U of a in R n so that

More information

The nonsmooth Newton method on Riemannian manifolds

The nonsmooth Newton method on Riemannian manifolds The nonsmooth Newton method on Riemannian manifolds C. Lageman, U. Helmke, J.H. Manton 1 Introduction Solving nonlinear equations in Euclidean space is a frequently occurring problem in optimization and

More information

(x, y) = d(x, y) = x y.

(x, y) = d(x, y) = x y. 1 Euclidean geometry 1.1 Euclidean space Our story begins with a geometry which will be familiar to all readers, namely the geometry of Euclidean space. In this first chapter we study the Euclidean distance

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Optimal Newton-type methods for nonconvex smooth optimization problems

Optimal Newton-type methods for nonconvex smooth optimization problems Optimal Newton-type methods for nonconvex smooth optimization problems Coralia Cartis, Nicholas I. M. Gould and Philippe L. Toint June 9, 20 Abstract We consider a general class of second-order iterations

More information

8 Numerical methods for unconstrained problems

8 Numerical methods for unconstrained problems 8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields

More information

QUALIFYING EXAMINATION Harvard University Department of Mathematics Tuesday September 21, 2004 (Day 1)

QUALIFYING EXAMINATION Harvard University Department of Mathematics Tuesday September 21, 2004 (Day 1) QUALIFYING EXAMINATION Harvard University Department of Mathematics Tuesday September 21, 2004 (Day 1) Each of the six questions is worth 10 points. 1) Let H be a (real or complex) Hilbert space. We say

More information

Chapter 10 Conjugate Direction Methods

Chapter 10 Conjugate Direction Methods Chapter 10 Conjugate Direction Methods An Introduction to Optimization Spring, 2012 1 Wei-Ta Chu 2012/4/13 Introduction Conjugate direction methods can be viewed as being intermediate between the method

More information

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS MATHEMATICS OF OPERATIONS RESEARCH Vol. 28, No. 4, November 2003, pp. 677 692 Printed in U.S.A. ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS ALEXANDER SHAPIRO We discuss in this paper a class of nonsmooth

More information

Overview of normed linear spaces

Overview of normed linear spaces 20 Chapter 2 Overview of normed linear spaces Starting from this chapter, we begin examining linear spaces with at least one extra structure (topology or geometry). We assume linearity; this is a natural

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

A Riemannian Framework for Denoising Diffusion Tensor Images

A Riemannian Framework for Denoising Diffusion Tensor Images A Riemannian Framework for Denoising Diffusion Tensor Images Manasi Datar No Institute Given Abstract. Diffusion Tensor Imaging (DTI) is a relatively new imaging modality that has been extensively used

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method Lecture 5, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The notion of complexity (per iteration)

More information

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0 Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =

More information

Generalized Newton-Type Method for Energy Formulations in Image Processing

Generalized Newton-Type Method for Energy Formulations in Image Processing Generalized Newton-Type Method for Energy Formulations in Image Processing Leah Bar and Guillermo Sapiro Department of Electrical and Computer Engineering University of Minnesota Outline Optimization in

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

i=1 α i. Given an m-times continuously

i=1 α i. Given an m-times continuously 1 Fundamentals 1.1 Classification and characteristics Let Ω R d, d N, d 2, be an open set and α = (α 1,, α d ) T N d 0, N 0 := N {0}, a multiindex with α := d i=1 α i. Given an m-times continuously differentiable

More information

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Alberto Bressan ) and Khai T. Nguyen ) *) Department of Mathematics, Penn State University **) Department of Mathematics,

More information

1. Geometry of the unit tangent bundle

1. Geometry of the unit tangent bundle 1 1. Geometry of the unit tangent bundle The main reference for this section is [8]. In the following, we consider (M, g) an n-dimensional smooth manifold endowed with a Riemannian metric g. 1.1. Notations

More information

Choice of Riemannian Metrics for Rigid Body Kinematics

Choice of Riemannian Metrics for Rigid Body Kinematics Choice of Riemannian Metrics for Rigid Body Kinematics Miloš Žefran1, Vijay Kumar 1 and Christopher Croke 2 1 General Robotics and Active Sensory Perception (GRASP) Laboratory 2 Department of Mathematics

More information

1 Numerical optimization

1 Numerical optimization Contents 1 Numerical optimization 5 1.1 Optimization of single-variable functions............ 5 1.1.1 Golden Section Search................... 6 1.1. Fibonacci Search...................... 8 1. Algorithms

More information

Convex Optimization. Problem set 2. Due Monday April 26th

Convex Optimization. Problem set 2. Due Monday April 26th Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining

More information

Examination paper for TMA4180 Optimization I

Examination paper for TMA4180 Optimization I Department of Mathematical Sciences Examination paper for TMA4180 Optimization I Academic contact during examination: Phone: Examination date: 26th May 2016 Examination time (from to): 09:00 13:00 Permitted

More information

PICARD S THEOREM STEFAN FRIEDL

PICARD S THEOREM STEFAN FRIEDL PICARD S THEOREM STEFAN FRIEDL Abstract. We give a summary for the proof of Picard s Theorem. The proof is for the most part an excerpt of [F]. 1. Introduction Definition. Let U C be an open subset. A

More information

Line Search Methods for Unconstrained Optimisation

Line Search Methods for Unconstrained Optimisation Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic

More information

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces. Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,

More information

Chapter 4. Unconstrained optimization

Chapter 4. Unconstrained optimization Chapter 4. Unconstrained optimization Version: 28-10-2012 Material: (for details see) Chapter 11 in [FKS] (pp.251-276) A reference e.g. L.11.2 refers to the corresponding Lemma in the book [FKS] PDF-file

More information

Lecture 8. Connections

Lecture 8. Connections Lecture 8. Connections This lecture introduces connections, which are the machinery required to allow differentiation of vector fields. 8.1 Differentiating vector fields. The idea of differentiating vector

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

HILBERT S THEOREM ON THE HYPERBOLIC PLANE

HILBERT S THEOREM ON THE HYPERBOLIC PLANE HILBET S THEOEM ON THE HYPEBOLIC PLANE MATTHEW D. BOWN Abstract. We recount a proof of Hilbert s result that a complete geometric surface of constant negative Gaussian curvature cannot be isometrically

More information

NOTES ON EXISTENCE AND UNIQUENESS THEOREMS FOR ODES

NOTES ON EXISTENCE AND UNIQUENESS THEOREMS FOR ODES NOTES ON EXISTENCE AND UNIQUENESS THEOREMS FOR ODES JONATHAN LUK These notes discuss theorems on the existence, uniqueness and extension of solutions for ODEs. None of these results are original. The proofs

More information

MATH 205C: STATIONARY PHASE LEMMA

MATH 205C: STATIONARY PHASE LEMMA MATH 205C: STATIONARY PHASE LEMMA For ω, consider an integral of the form I(ω) = e iωf(x) u(x) dx, where u Cc (R n ) complex valued, with support in a compact set K, and f C (R n ) real valued. Thus, I(ω)

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

Distances, volumes, and integration

Distances, volumes, and integration Distances, volumes, and integration Scribe: Aric Bartle 1 Local Shape of a Surface A question that we may ask ourselves is what significance does the second fundamental form play in the geometric characteristics

More information

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping.

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. Minimization Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. 1 Minimization A Topological Result. Let S be a topological

More information

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms (February 24, 2017) 08a. Operators on Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2016-17/08a-ops

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

Second Order Elliptic PDE

Second Order Elliptic PDE Second Order Elliptic PDE T. Muthukumar tmk@iitk.ac.in December 16, 2014 Contents 1 A Quick Introduction to PDE 1 2 Classification of Second Order PDE 3 3 Linear Second Order Elliptic Operators 4 4 Periodic

More information

The Steepest Descent Algorithm for Unconstrained Optimization

The Steepest Descent Algorithm for Unconstrained Optimization The Steepest Descent Algorithm for Unconstrained Optimization Robert M. Freund February, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 1 Steepest Descent Algorithm The problem

More information

Pseudo-Poincaré Inequalities and Applications to Sobolev Inequalities

Pseudo-Poincaré Inequalities and Applications to Sobolev Inequalities Pseudo-Poincaré Inequalities and Applications to Sobolev Inequalities Laurent Saloff-Coste Abstract Most smoothing procedures are via averaging. Pseudo-Poincaré inequalities give a basic L p -norm control

More information

Notes on Numerical Optimization

Notes on Numerical Optimization Notes on Numerical Optimization University of Chicago, 2014 Viva Patel October 18, 2014 1 Contents Contents 2 List of Algorithms 4 I Fundamentals of Optimization 5 1 Overview of Numerical Optimization

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

More information

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS Bendikov, A. and Saloff-Coste, L. Osaka J. Math. 4 (5), 677 7 ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS ALEXANDER BENDIKOV and LAURENT SALOFF-COSTE (Received March 4, 4)

More information

X. Linearization and Newton s Method

X. Linearization and Newton s Method 163 X. Linearization and Newton s Method ** linearization ** X, Y nls s, f : G X Y. Given y Y, find z G s.t. fz = y. Since there is no assumption about f being linear, we might as well assume that y =.

More information

Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms

Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms JOTA manuscript No. (will be inserted by the editor) Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms Peter Ochs Jalal Fadili Thomas Brox Received: date / Accepted: date Abstract

More information

Math 6455 Nov 1, Differential Geometry I Fall 2006, Georgia Tech

Math 6455 Nov 1, Differential Geometry I Fall 2006, Georgia Tech Math 6455 Nov 1, 26 1 Differential Geometry I Fall 26, Georgia Tech Lecture Notes 14 Connections Suppose that we have a vector field X on a Riemannian manifold M. How can we measure how much X is changing

More information

Convex Functions and Optimization

Convex Functions and Optimization Chapter 5 Convex Functions and Optimization 5.1 Convex Functions Our next topic is that of convex functions. Again, we will concentrate on the context of a map f : R n R although the situation can be generalized

More information

for all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true

for all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true 3 ohn Nirenberg inequality, Part I A function ϕ L () belongs to the space BMO() if sup ϕ(s) ϕ I I I < for all subintervals I If the same is true for the dyadic subintervals I D only, we will write ϕ BMO

More information

DEVELOPMENT OF MORSE THEORY

DEVELOPMENT OF MORSE THEORY DEVELOPMENT OF MORSE THEORY MATTHEW STEED Abstract. In this paper, we develop Morse theory, which allows us to determine topological information about manifolds using certain real-valued functions defined

More information

Numerical Methods for Large-Scale Nonlinear Systems

Numerical Methods for Large-Scale Nonlinear Systems Numerical Methods for Large-Scale Nonlinear Systems Handouts by Ronald H.W. Hoppe following the monograph P. Deuflhard Newton Methods for Nonlinear Problems Springer, Berlin-Heidelberg-New York, 2004 Num.

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information

Variational Formulations

Variational Formulations Chapter 2 Variational Formulations In this chapter we will derive a variational (or weak) formulation of the elliptic boundary value problem (1.4). We will discuss all fundamental theoretical results that

More information

BERGMAN KERNEL ON COMPACT KÄHLER MANIFOLDS

BERGMAN KERNEL ON COMPACT KÄHLER MANIFOLDS BERGMAN KERNEL ON COMPACT KÄHLER MANIFOLDS SHOO SETO Abstract. These are the notes to an expository talk I plan to give at MGSC on Kähler Geometry aimed for beginning graduate students in hopes to motivate

More information

Open Questions in Quantum Information Geometry

Open Questions in Quantum Information Geometry Open Questions in Quantum Information Geometry M. R. Grasselli Department of Mathematics and Statistics McMaster University Imperial College, June 15, 2005 1. Classical Parametric Information Geometry

More information

Optimization and Root Finding. Kurt Hornik

Optimization and Root Finding. Kurt Hornik Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding

More information

The oblique derivative problem for general elliptic systems in Lipschitz domains

The oblique derivative problem for general elliptic systems in Lipschitz domains M. MITREA The oblique derivative problem for general elliptic systems in Lipschitz domains Let M be a smooth, oriented, connected, compact, boundaryless manifold of real dimension m, and let T M and T

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

Newtonian Mechanics. Chapter Classical space-time

Newtonian Mechanics. Chapter Classical space-time Chapter 1 Newtonian Mechanics In these notes classical mechanics will be viewed as a mathematical model for the description of physical systems consisting of a certain (generally finite) number of particles

More information

Unsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto

Unsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto Unsupervised Learning Techniques 9.520 Class 07, 1 March 2006 Andrea Caponnetto About this class Goal To introduce some methods for unsupervised learning: Gaussian Mixtures, K-Means, ISOMAP, HLLE, Laplacian

More information

Appendix B Convex analysis

Appendix B Convex analysis This version: 28/02/2014 Appendix B Convex analysis In this appendix we review a few basic notions of convexity and related notions that will be important for us at various times. B.1 The Hausdorff distance

More information

Differential Geometry of Surfaces

Differential Geometry of Surfaces Differential Forms Dr. Gye-Seon Lee Differential Geometry of Surfaces Philipp Arras and Ingolf Bischer January 22, 2015 This article is based on [Car94, pp. 77-96]. 1 The Structure Equations of R n Definition

More information

Numerical Algorithms as Dynamical Systems

Numerical Algorithms as Dynamical Systems A Study on Numerical Algorithms as Dynamical Systems Moody Chu North Carolina State University What This Study Is About? To recast many numerical algorithms as special dynamical systems, whence to derive

More information

Exercises in Geometry II University of Bonn, Summer semester 2015 Professor: Prof. Christian Blohmann Assistant: Saskia Voss Sheet 1

Exercises in Geometry II University of Bonn, Summer semester 2015 Professor: Prof. Christian Blohmann Assistant: Saskia Voss Sheet 1 Assistant: Saskia Voss Sheet 1 1. Conformal change of Riemannian metrics [3 points] Let (M, g) be a Riemannian manifold. A conformal change is a nonnegative function λ : M (0, ). Such a function defines

More information

Linear Ordinary Differential Equations

Linear Ordinary Differential Equations MTH.B402; Sect. 1 20180703) 2 Linear Ordinary Differential Equations Preliminaries: Matrix Norms. Denote by M n R) the set of n n matrix with real components, which can be identified the vector space R

More information

Laplace s Equation. Chapter Mean Value Formulas

Laplace s Equation. Chapter Mean Value Formulas Chapter 1 Laplace s Equation Let be an open set in R n. A function u C 2 () is called harmonic in if it satisfies Laplace s equation n (1.1) u := D ii u = 0 in. i=1 A function u C 2 () is called subharmonic

More information

Methods that avoid calculating the Hessian. Nonlinear Optimization; Steepest Descent, Quasi-Newton. Steepest Descent

Methods that avoid calculating the Hessian. Nonlinear Optimization; Steepest Descent, Quasi-Newton. Steepest Descent Nonlinear Optimization Steepest Descent and Niclas Börlin Department of Computing Science Umeå University niclas.borlin@cs.umu.se A disadvantage with the Newton method is that the Hessian has to be derived

More information

The Symmetric Space for SL n (R)

The Symmetric Space for SL n (R) The Symmetric Space for SL n (R) Rich Schwartz November 27, 2013 The purpose of these notes is to discuss the symmetric space X on which SL n (R) acts. Here, as usual, SL n (R) denotes the group of n n

More information

Hilbert spaces. 1. Cauchy-Schwarz-Bunyakowsky inequality

Hilbert spaces. 1. Cauchy-Schwarz-Bunyakowsky inequality (October 29, 2016) Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/fun/notes 2016-17/03 hsp.pdf] Hilbert spaces are

More information