Optimization methods on Riemannian manifolds and their application to shape space


Optimization methods on Riemannian manifolds and their application to shape space

Wolfgang Ring, Benedikt Wirth

Abstract

We extend the scope of analysis for linesearch optimization algorithms on (possibly infinite-dimensional) Riemannian manifolds to the convergence analysis of the BFGS quasi-Newton scheme and the Fletcher–Reeves conjugate gradient iteration. Numerical implementations for exemplary problems in shape spaces show the practical applicability of these methods.

1 Introduction

There are a number of problems that can be expressed as the minimization of a function f : M → R over a smooth Riemannian manifold M. Applications range from linear algebra (e.g. principal component analysis or singular value decomposition [ , ]) to the analysis of shape spaces (e.g. computation of shape geodesics [24]); see also the references in [3]. If the manifold M can be embedded in a higher-dimensional space or if it is defined via equality constraints, then there is the option to employ the very advanced tools of constrained optimization (see [6] for an introduction). Often, however, such an embedding is not at hand, and one has to resort to optimization methods designed for Riemannian manifolds. Even if an embedding is known, one might hope that a Riemannian optimization method performs more efficiently since it exploits the underlying geometric structure of the manifold. For this purpose, various methods have been devised, from simple gradient descent on manifolds [25] to sophisticated trust region methods [5].

The aim of this article is to extend the scope of analysis for these methods, concentrating on linesearch methods. In particular, we will consider the convergence of BFGS quasi-Newton methods and Fletcher–Reeves nonlinear conjugate gradient iterations on (possibly infinite-dimensional) manifolds, thereby filling a gap in the existing analysis. Furthermore, we apply the proposed methods to exemplary problems, showing their applicability to manifolds of practical interest such as shape spaces.

Early attempts to adapt standard optimization methods to problems on manifolds were presented by Gabay [9], who introduced a steepest descent, a Newton, and a quasi-Newton algorithm, also stating their global and local convergence properties (however, without giving details of the analysis for the quasi-Newton case). Udrişte [23] also stated a steepest descent and a Newton algorithm on Riemannian manifolds and proved (linear) convergence of the former under the assumption of exact linesearch. Fairly recently, Yang took up these methods and analysed convergence and convergence rate of steepest descent and Newton's method for Armijo step-size control [25]. In comparison to standard linesearch methods in vector spaces, the above approaches all substitute the linear step in the search direction by a step along a geodesic. However, geodesics may be difficult to obtain. In alternative approaches, the geodesics are thus often replaced by more general paths based on so-called retractions (a retraction R_x is a mapping from the tangent space T_xM of the manifold M at x onto M). For example, Hüper and Trumpf find quadratic convergence for Newton's method without step-size control [ ], even if the Hessian is computed for a different retraction than the one defining the path along which the Newton step is taken. In the sequel, more advanced trust region Newton methods have been developed and analysed in a series of papers [6, , 5].
The analysis requires some type of uniform Lipschitz continuity for the gradient and the Hessian of the concatenations f ∘ R_x, which we will also make use of. The global and local convergence analysis of gradient descent, Newton's method, and trust region methods on manifolds with general retractions is summarized in [2]. In [2], the authors also present a way to implement a quasi-Newton as well as a nonlinear conjugate gradient iteration, however, without analysis. Riemannian BFGS quasi-Newton methods, which generalize Gabay's original approach, have been devised in [8]. Like the above schemes, they do not rely on

geodesics but allow more general retractions. Furthermore, the vector transport between the tangent spaces T_{x_k}M and T_{x_{k+1}}M at two subsequent iterates (which is needed for the BFGS update of the Hessian approximation) is no longer restricted to parallel transport. A similar approach is taken in [7], where a specific, non-parallel vector transport is considered for a linear algebra application. A corresponding global convergence analysis of a quasi-Newton scheme has been performed by Ji [2]. However, to obtain a superlinear convergence rate, specific conditions on the compatibility between the vector transports and the retractions are required (cf. Section 3.1) which are not imposed (at least explicitly) in the above work.

In this paper, we pursue several objectives. First, we extend the convergence analysis of standard Riemannian optimization methods (such as steepest descent and Newton's method) to the case of optimization on infinite-dimensional manifolds. Second, we analyze the convergence rate of the Riemannian BFGS quasi-Newton method as well as the convergence of a Riemannian Fletcher–Reeves conjugate gradient iteration (as two representative higher order gradient-based optimization methods), which to the authors' knowledge has not been attempted before (neither in the finite- nor the infinite-dimensional case). The analysis is performed in the unifying framework of step-size controlled linesearch methods, which allows for rather streamlined proofs. Finally, we demonstrate the feasibility of these Riemannian optimization methods by applying them to problems in state-of-the-art shape spaces.

The outline of this article is as follows. Section 2 summarizes the required basic notions. Section 3 then introduces linesearch optimization methods on Riemannian manifolds, following the standard procedure for finite-dimensional vector spaces [6, 7] and giving the analysis of basic steepest descent and Newton's method as a prerequisite for the ensuing analysis of the BFGS scheme in Section 3.1 and the nonlinear CG iteration in Section 3.2. Finally, numerical examples on shape spaces are provided in Section 4.

2 Notations

Let M denote a geodesically complete (finite- or infinite-dimensional) Riemannian manifold. In particular, we assume M to be locally homeomorphic to some separable Hilbert space H [4], that is, for each x ∈ M there is some neighborhood x ∈ U_x ⊂ M and a homeomorphism φ_x from U_x into some open subset of H. Let C^∞(M) denote the vector space of smooth real functions on M (in the sense that for any f ∈ C^∞(M) and any chart (U_x, φ_x), f ∘ φ_x^{-1} is smooth). For any smooth curve γ : [0,1] → M we define the tangent vector to γ at t_0 ∈ (0,1) as the linear operator γ̇(t_0) : C^∞(M) → R such that

γ̇(t_0)f = (d/dt)(f ∘ γ)(t_0)   for all f ∈ C^∞(M).

The tangent space T_xM to M at x ∈ M is the set of tangent vectors γ̇(t_0) to all smooth curves γ : [0,1] → M with γ(t_0) = x. It is a vector space, and it is equipped with an inner product g_x(·,·) : T_xM × T_xM → R, the so-called Riemannian metric, which smoothly depends on x. The corresponding induced norm will be denoted ‖·‖_x, and we will employ the same notation for the associated dual norm. Let V(M) denote the set of smooth tangent vector fields on M; then we define a connection ∇ : V(M) × V(M) → V(M), (X, Z) ↦ ∇_X Z, as a map such that

∇_{fX+gY} Z = f ∇_X Z + g ∇_Y Z   for all X, Y, Z ∈ V(M) and f, g ∈ C^∞(M),
∇_Z (aX + bY) = a ∇_Z X + b ∇_Z Y   for all X, Y, Z ∈ V(M) and a, b ∈ R,
∇_Z (fX) = (Zf) X + f ∇_Z X   for all X, Z ∈ V(M) and f ∈ C^∞(M).
In particular, we will consider the Levi-Civita connection, which additionally satisfies the properties of absence of torsion and preservation of the metric,

[X, Y] = ∇_X Y − ∇_Y X   for all X, Y ∈ V(M),
X g(Y, Z) = g(∇_X Y, Z) + g(Y, ∇_X Z)   for all X, Y, Z ∈ V(M),

where [X, Y] = XY − YX denotes the Lie bracket and g(X, Y) : M → R, x ↦ g_x(X, Y). Any connection is paired with a notion of parallel transport. Given a smooth curve γ : [0,1] → M, the initial value problem

∇_{γ̇(t)} v = 0,   v(0) = v_0

defines a way of transporting a vector v_0 ∈ T_{γ(0)}M to a vector v(t) ∈ T_{γ(t)}M. For the Levi-Civita connection considered here this implies constancy of the inner product g_{γ(t)}(v(t), w(t)) for any two vectors v(t), w(t) transported parallel along γ. For x = γ(0), y = γ(t), we will denote the parallel transport of v(0) ∈ T_xM to v(t) ∈ T_yM by v(t) = T^{Pγ}_{x,y} v(0) (if there is more than one t with y = γ(t), the correct interpretation will become clear from the context). As detailed further below, T^{Pγ}_{x,y} can be interpreted as the derivative of a specific mapping P_γ : T_xM → M.

We assume that any two points x, y ∈ M can be connected by a shortest curve γ : [0,1] → M with x = γ(0), y = γ(1), where the curve length is measured as L[γ] = ∫_0^1 √(g_{γ(t)}(γ̇(t), γ̇(t))) dt. Such a curve is denoted a geodesic, and the length of geodesics induces a metric distance on M,

dist(x, y) = min_{γ:[0,1]→M, γ(0)=x, γ(1)=y} L[γ].

If the geodesic γ connecting x and y is unique, then γ̇(0) ∈ T_xM is denoted the logarithmic map of y with respect to x, log_x y = γ̇(0). It satisfies dist(x, y) = ‖log_x y‖_x. Geodesics can be computed via the zero acceleration condition ∇_{γ̇(t)} γ̇(t) = 0. The exponential map exp_x : T_xM → M, v ↦ exp_x v, is defined as exp_x v = γ(1), where γ solves the above ordinary differential equation with initial conditions γ(0) = x, γ̇(0) = v. It is a diffeomorphism of a neighborhood of 0 ∈ T_xM into a neighborhood of x ∈ M with inverse log_x. We shall denote the geodesic between x and y by γ[x; y], assuming that it is unique.

The exponential map exp_x obviously provides a (local) parametrization of M via T_xM. We will also consider more general parameterizations, so-called retractions. Given x ∈ M, a retraction is a smooth mapping R_x : T_xM → M with R_x(0) = x and DR_x(0) = id_{T_xM}, where DR_x denotes the derivative of R_x. The inverse function theorem then implies that R_x is a local homeomorphism. Besides exp_x, there are various possibilities to define retractions. For example, consider a geodesic γ : R → M with γ(0) = x, parameterized by arc length, and define the retraction

P_γ : T_xM → M,   v ↦ exp_{p_γ(v)}[T^{Pγ}_{x,p_γ(v)}(v − π_γ(v))],

where p_γ(v) = exp_x(π_γ(v)) and π_γ denotes the orthogonal projection onto span{γ̇(0)}. This retraction corresponds to running along the geodesic γ according to the component of v parallel to γ̇(0) and then following a new geodesic into the direction of the parallel transported component of v orthogonal to γ̇(0).

We will later have to consider the transport of a vector from one tangent space T_xM into another one T_yM, that is, we will consider isomorphisms T_{x,y} : T_xM → T_yM. We are particularly interested in operators T^{R_x}_{x,y} which represent the derivative DR_x(v) of a retraction R_x at v ∈ T_xM with R_x(v) = y (where in case of multiple v with R_x(v) = y it will be clear from the context which v is meant). In that sense, the parallel transport T^{Pγ}_{x,y} along a geodesic γ connecting x and y belongs to the retraction P_γ. Another possible vector transport is defined by the variation of the exponential map, evaluated at the representative of y in T_xM, T^{exp_x}_{x,y} = D exp_x(log_x y), which maps each v_0 ∈ T_xM onto the tangent vector γ̇(0) to the curve γ : t ↦ exp_x(log_x y + t v_0). Also the adjoints (T^{Pγ}_{y,x})^*, (T^{exp_y}_{y,x})^* (defined by g_y(v, T^*_{y,x} w) = g_x(T_{y,x} v, w) for all v ∈ T_yM, w ∈ T_xM) or the inverses (T^{Pγ}_{y,x})^{-1}, (T^{exp_y}_{y,x})^{-1} can be considered for a vector transport T_{x,y}. Note that T^{Pγ}_{x,y} is an isometry with T^{Pγ}_{x,y} = (T^{Pγ̄}_{y,x})^{-1} = (T^{Pγ̄}_{y,x})^*, where γ̄ is the geodesic connecting y with x, γ̄(t) = γ(1 − t).
Furthermore, log_x y is transported onto the same vector T_{x,y} log_x y = γ̇(1) by T^{Pγ}_{x,y}, T^{exp_x}_{x,y}, and their adjoints and inverses. Given a smooth function f : M → R, we define its (Fréchet-)derivative Df(x) at a point x ∈ M as an element of the dual space to T_xM via Df(x)v = vf. The Riesz representation theorem then implies the existence of a ∇f(x) ∈ T_xM such that Df(x)v = g_x(∇f(x), v) for all v ∈ T_xM, which we denote the gradient of f at x. On T_xM, define f_{R_x} = f ∘ R_x

for a retraction R_x. Due to DR_x(0) = id we have Df_{R_x}(0) = Df(x). Furthermore, f_{R_x} = f_{R_y} ∘ R_y^{-1} ∘ R_x where R_y^{-1} exists, and thus for y = R_x(v),

Df_{R_x}(v) = Df_{R_y}(0) DR_x(v) = Df(y) T^{R_x}_{x,y},   ∇f_{R_x}(v) = (T^{R_x}_{x,y})^* ∇f(y).

We define the Hessian D²f(x) of a smooth f at x as the symmetric bilinear form

D²f(x) : T_xM × T_xM → R,   (v, w) ↦ g_x(∇_v ∇f(x), w),

which is equivalent to D²f(x) = D²f_{exp_x}(0). By ∇²f(x) : T_xM → T_xM we denote the linear operator mapping v ∈ T_xM onto the Riesz representation of D²f(x)(v, ·). Note that if M is embedded into a vector space X and f = f̄|_M for a smooth f̄ : X → R, we usually have D²f(x) ≠ D²f̄(x)|_{T_xM × T_xM} (which is easily seen from the example X = R^n, f̄(x) = ‖x‖²_X, M = {x ∈ X : ‖x‖_X = 1}). For a smooth retraction R_x we do not necessarily have D²f_{R_x}(0) = D²f(x); however, this holds at stationary points of f [2].

Let T^n(T_xM) denote the vector space of n-linear forms T : (T_xM)^n → R together with the norm

‖T‖_x = sup_{v_1,...,v_n ∈ T_xM} T(v_1, ..., v_n)/(‖v_1‖_x ··· ‖v_n‖_x).

We call a function g : M → ⋃_{x∈M} T^n(T_xM), x ↦ g(x) ∈ T^n(T_xM), (Lipschitz) continuous at x̂ ∈ M if x ↦ g(x) ∘ (T^{Pγ[x̂;x]}_{x̂,x})^n ∈ T^n(T_{x̂}M) is (Lipschitz) continuous at x̂ (with Lipschitz constant lim sup_{x→x̂} ‖g(x) ∘ (T^{Pγ[x̂;x]}_{x̂,x})^n − g(x̂)‖_{x̂}/dist(x, x̂)). Here, γ[x̂; x] denotes the shortest geodesic, which is unique in the neighborhood of x̂. (Lipschitz) continuity on U ⊂ M means (Lipschitz) continuity at every x ∈ U (with uniform Lipschitz constant). A function f : M → R is called n times (Lipschitz) continuously differentiable if D^l f : M → ⋃_{x∈M} T^l(T_xM) is (Lipschitz) continuous for 0 ≤ l ≤ n.

3 Iterative minimization via geodesic linesearch methods

In classical optimization on vector spaces, linesearch methods are widely used. They are based on updating the iterate by choosing a search direction and then adding a multiple of this direction to the old iterate. Adding a multiple of the search direction obviously requires the structure of a vector space and is not possible on general manifolds. The natural extension to manifolds is to follow the search direction along a path. We will consider iterative algorithms of the following generic form.

Algorithm 1 (Linesearch minimization on manifolds).
Input: f : M → R, x_0 ∈ M, k = 0
repeat
  choose a descent direction p_k ∈ T_{x_k}M
  choose a retraction R_{x_k} : T_{x_k}M → M
  choose a step length α_k ∈ R
  set x_{k+1} = R_{x_k}(α_k p_k), k ← k + 1
until x_{k+1} sufficiently minimizes f

Here, a descent direction denotes a direction p_k with Df(x_k)p_k < 0. This property ensures that the objective function f indeed decreases along the search direction. For the choice of the step length, various approaches are possible. In general, the chosen α_k has to fulfill a certain quality requirement. We will here concentrate on the so-called Wolfe conditions, that is, for a given descent direction p ∈ T_xM, the chosen step length α has to satisfy

f(R_x(αp)) ≤ f(x) + c_1 α Df(x)p,   (1a)
Df(R_x(αp)) T^{R_x}_{x,R_x(αp)} p ≥ c_2 Df(x)p,   (1b)

where 0 < c_1 < c_2 < 1. Note that both conditions can be rewritten as f_{R_x}(αp) ≤ f_{R_x}(0) + c_1 α Df_{R_x}(0)p and Df_{R_x}(αp)p ≥ c_2 Df_{R_x}(0)p, the classical Wolfe conditions for minimization of f_{R_x}. If the second condition is replaced by

|Df(R_x(αp)) T^{R_x}_{x,R_x(αp)} p| ≤ c_2 |Df(x)p|,   (2)

we obtain the so-called strong Wolfe conditions. Most optimization algorithms also work properly if just the so-called Armijo condition (1a) is satisfied. In fact, the stronger Wolfe conditions are only needed for the later analysis of a quasi-Newton scheme. Given a descent direction p, a feasible step length can always be found.
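For concreteness, the following minimal Python sketch implements the generic loop of Algorithm 1 with the steepest-descent direction and a simple backtracking rule for the Armijo condition (1a); a full Wolfe search would additionally enforce (1b). The callables f, grad, retract and inner are assumptions of this sketch (they are not specified in the paper): grad returns the Riesz representative of Df, retract plays the role of R_x, and inner evaluates the metric g_x.

```python
import numpy as np

def linesearch_minimize(f, grad, retract, inner, x0,
                        c1=1e-4, alpha0=1.0, shrink=0.5,
                        tol=1e-6, max_iter=500):
    """Sketch of Algorithm 1 (steepest descent + Armijo backtracking).
    Assumed callables:
      grad(x)        -> Riesz representative of Df(x) in T_x M (ndarray)
      retract(x, v)  -> R_x(v)
      inner(x, v, w) -> g_x(v, w)
    """
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        gnorm = np.sqrt(inner(x, g, g))
        if gnorm < tol:                        # ||Df(x_k)||_{x_k} small: stop
            break
        p = -g                                 # steepest descent direction
        slope = -gnorm**2                      # Df(x_k) p_k < 0
        alpha, fx = alpha0, f(x)
        # backtrack until the Armijo condition (1a) holds
        while f(retract(x, alpha * p)) > fx + c1 * alpha * slope:
            alpha *= shrink
            if alpha < 1e-16:
                break
        x = retract(x, alpha * p)              # x_{k+1} = R_{x_k}(alpha_k p_k)
    return x
```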

Proposition 1 (Feasible step length, e.g. [6, Lem. 3.1]). Let x ∈ M, let p ∈ T_xM be a descent direction, and let f_{R_x} : span{p} → R be continuously differentiable. Then there exists α > 0 satisfying (1) and (2).

Proof. Rewriting the Wolfe conditions as f_{R_x}(αp) ≤ f_{R_x}(0) + c_1 α Df_{R_x}(0)p and Df_{R_x}(αp)p ≥ c_2 Df_{R_x}(0)p (respectively |Df_{R_x}(αp)p| ≤ c_2 |Df_{R_x}(0)p|), the standard argument for Wolfe conditions in vector spaces can be applied.

Besides the quality of the step length, the convergence of linesearch algorithms naturally depends on the quality of the search direction. Let us introduce the angle θ_k between the search direction p_k and the negative gradient −∇f(x_k),

cos θ_k = −Df(x_k)p_k / (‖Df(x_k)‖_{x_k} ‖p_k‖_{x_k}).

There is a classical link between the convergence of an algorithm and the quality of its search direction.

Theorem 2 (Zoutendijk's theorem). Given f : M → R bounded below and differentiable, assume the α_k in Algorithm 1 to satisfy (1). If the f_{R_{x_k}} are Lipschitz continuously differentiable on span{p_k} with uniform Lipschitz constant L, then

∑_{k∈N} cos²θ_k ‖Df(x_k)‖²_{x_k} < ∞.

Proof. The proof for optimization on vector spaces also applies here: Lipschitz continuity and (1b) imply α_k L ‖p_k‖²_{x_k} ≥ (Df_{R_{x_k}}(α_k p_k) − Df_{R_{x_k}}(0))p_k ≥ (c_2 − 1)Df(x_k)p_k, from which we obtain α_k ≥ (c_2 − 1)Df(x_k)p_k/(L‖p_k‖²_{x_k}). Then (1a) implies f(x_{k+1}) ≤ f(x_k) − c_1 (1 − c_2)/L cos²θ_k ‖Df(x_k)‖²_{x_k} so that the result follows from the boundedness of f by summing over all k.

Corollary 3 (Convergence of generalized steepest descent). Let the search direction in Algorithm 1 be the solution p_k to

B_k(p_k, v) = −Df(x_k)v   for all v ∈ T_{x_k}M,

where the B_k are uniformly coercive and bounded bilinear forms on T_{x_k}M (the case B_k(·,·) = g_{x_k}(·,·) yields the steepest descent direction). Under the conditions of Theorem 2, ‖Df(x_k)‖_{x_k} → 0.

Proof. Obviously, cos θ_k = B_k(p_k, p_k)/(‖B_k(p_k, ·)‖_{x_k}‖p_k‖_{x_k}) is uniformly bounded away from zero so that the convergence follows from Zoutendijk's theorem.

For continuous Df, the previous corollary implies that any limit point x* of the sequence x_k is a stationary point of f. Hence, on finite-dimensional manifolds, if {x ∈ M : f(x) ≤ f(x_0)} is bounded, x_k can be decomposed into subsequences each of which converges against a stationary point. On infinite-dimensional manifolds, where only a weak convergence of subsequences can be expected, this is not true in general (which is not surprising given that not even existence of stationary points is granted without stronger conditions on f such as sequential weak lower semi-continuity). Moreover, the limit points may be non-unique. However, in the case of (locally) strictly convex functions we have (local) convergence against the unique minimizer by the following classical estimate.

Proposition 4. Let U ⊂ M be a geodesically star-convex neighborhood around x* ∈ M (i.e. for any x ∈ U the shortest geodesics from x to x* lie inside U), and define V(U) = {v ∈ exp_{x*}^{-1}(U) : ‖v‖_{x*} ≤ ‖w‖_{x*} for all w ∈ exp_{x*}^{-1}(exp_{x*}(v))}. Let f be twice differentiable on U with x* being a stationary point. If D²f_{exp_{x*}} is uniformly coercive on V(U) ⊂ T_{x*}M, then there exists m > 0 such that for all x ∈ U,

dist(x, x*) ≤ (1/m) ‖Df(x)‖_x.

Proof. By hypothesis, there is m > 0 with D²f_{exp_{x*}}(w)(v, v) ≥ m‖v‖²_{x*} for all w ∈ exp_{x*}^{-1}(U), v ∈ T_{x*}M. Thus, for v = log_{x*} x we have

‖Df(x)‖_x ≥ −Df(x) log_x x* / ‖log_x x*‖_x = Df_{exp_{x*}}(v)v / ‖v‖_{x*} = (∫_0^1 D²f_{exp_{x*}}(tv)(v, v) dt) / ‖v‖_{x*} ≥ m‖v‖_{x*} = m dist(x, x*).

(Note that coercivity along γ[x; x*] would actually suffice.)

Hence, if f is smooth with Df(x*) = 0 and D²f(x*) = D²f_{exp_{x*}}(0) coercive (so that there exists a neighborhood U ⊂ M of x* such that D²f_{exp_{x*}} is uniformly coercive on V(U)), then by the above proposition and Corollary 3 we thus have x_k → x* if {x ∈ M : f(x) ≤ f(x_0)} ⊂ U.

Remark 1. Note the difference between steepest descent and an approximation of the so-called gradient flow, the solution to the ODE dx/dt = −∇f(x). While the gradient flow aims at a smooth curve along which the direction at each point is the steepest descent direction, the steepest descent linesearch tries to move into a fixed direction as long as possible and only changes its direction to be the one of steepest descent if the old direction does no longer yield sufficient decrease. Therefore, the steepest descent typically takes longer steps in one direction.

Remark 2. The condition on f_{R_{x_k}} in Proposition 1 and Theorem 2 may be untangled into conditions on f and R_{x_k}. For example, one might require f to be Lipschitz continuously differentiable and T^{R_{x_k}}_{x_k,x_{k+1}}|_{span{p_k}} to be uniformly bounded, which is the case for R_{x_k} = exp_{x_k} or R_{x_k} = P_{γ[x_k;x_{k+1}]}, for example.

As a method with only linear convergence, steepest descent requires many iterations. Improvement can be obtained by choosing in each iteration the Newton direction p_k^N as search direction, that is, the solution to

D²f_{R_{x_k}}(0)(p_k^N, v) = −Df(x_k)v   for all v ∈ T_{x_k}M.   (3)

Note that the Newton direction is obtained with regard to the retraction R_{x_k}. The fact that D²f_{R_{x*}}(0) = D²f(x*) at a stationary point x* and the later result that the Hessian only needs to be approximated (Proposition 8) suggest that D²f(x_k) or a different approximation could be used as well to obtain fast convergence. In contrast to steepest descent, Newton's method is invariant with respect to a rescaling of f and in the limit allows a constant step size and quadratic convergence, as shown in the following sequence of propositions which can be transferred from standard results (e.g. [6, Sec. 3.3]).

Proposition 5 (Newton step length). Let f be twice differentiable and D²f_{R_x}(0) continuous in x. Consider Algorithm 1 and assume the α_k to satisfy (1) with c_1 ≤ 1/2 and the p_k to satisfy

lim_{k→∞} ‖Df(x_k) + D²f_{R_{x_k}}(0)(p_k, ·)‖_{x_k} / ‖p_k‖_{x_k} = 0.   (4)

Furthermore, let the f_{R_{x_k}} be twice Lipschitz continuously differentiable on span{p_k} with uniform Lipschitz constant L. If x* with Df(x*) = 0 and D²f(x*) bounded and coercive is a limit point of x_k so that x_k →_{k∈I} x* for some I ⊂ N, then α_k = 1 would also satisfy (1) (independent of whether α_k = 1 is actually chosen) for k ∈ I sufficiently large.

Proof. Let m > 0 be the lowest eigenvalue belonging to D²f(x*) = D²f_{R_{x*}}(0). The continuity of the second derivative implies the uniform coercivity D²f_{R_{x_k}}(0)(v, v) ≥ (m/2)‖v‖² for all k ∈ I sufficiently large. From D²f_{R_{x_k}}(0)(p_k − p_k^N, ·) = Df(x_k) + D²f_{R_{x_k}}(0)(p_k, ·) we then obtain ‖p_k − p_k^N‖ = o(‖p_k‖_{x_k}). Furthermore, condition (4) implies lim_{k→∞} (Df(x_k)p_k + D²f_{R_{x_k}}(0)(p_k, p_k))/‖p_k‖² = 0 and thus

0 = lim sup_{k∈I} [Df(x_k)p_k/‖p_k‖² + D²f_{R_{x_k}}(0)(p_k, p_k)/‖p_k‖²] ≥ lim sup_{k∈I} Df(x_k)p_k/‖p_k‖² + m/2,

so that

−Df(x_k)p_k/‖p_k‖² ≥ m/4   (5)

for k ∈ I sufficiently large. Due to ‖Df(x_k)‖_{x_k} →_{k∈I} 0 we deduce ‖p_k‖_{x_k} →_{k∈I} 0.

By Taylor's theorem, f_{R_{x_k}}(p_k) = f_{R_{x_k}}(0) + Df_{R_{x_k}}(0)p_k + ½ D²f_{R_{x_k}}(q_k)(p_k, p_k) for some q_k ∈ [0, p_k] so that

f_{R_{x_k}}(p_k) − f(x_k) − ½ Df(x_k)p_k = ½ (Df(x_k)p_k + D²f_{R_{x_k}}(q_k)(p_k, p_k))
= ½ ([Df(x_k)p_k + D²f_{R_{x_k}}(0)(p_k^N, p_k)] + D²f_{R_{x_k}}(0)(p_k − p_k^N, p_k) + [D²f_{R_{x_k}}(q_k) − D²f_{R_{x_k}}(0)](p_k, p_k)) ≤ o(‖p_k‖²),

which, together with (5) and c_1 ≤ 1/2, implies feasibility of α_k = 1 with respect to (1a) for k ∈ I sufficiently large. Also,

Df_{R_{x_k}}(p_k)p_k = [Df(x_k)p_k + D²f_{R_{x_k}}(0)(p_k, p_k)] + ∫_0^1 [D²f_{R_{x_k}}(tp_k) − D²f_{R_{x_k}}(0)](p_k, p_k) dt = o(‖p_k‖²),

which together with (5) implies Df_{R_{x_k}}(p_k)p_k ≥ c_2 Df(x_k)p_k for k ∈ I sufficiently large, that is, (1b) for α_k = 1.

Lemma 6. Let U ⊂ M be open and let the retractions R_x : T_xM → M, x ∈ U, have equicontinuous derivatives at x in the sense

∀ε > 0 ∃δ > 0 ∀x ∈ U : ‖v‖_x < δ ⟹ ‖T^{Pγ[x;R_x(v)]}_{x,R_x(v)} DR_x(0) − DR_x(v)‖ < ε.

Then for any ε > 0 there is an ε̃ > 0 such that for all x ∈ U and v, w ∈ T_xM with ‖v‖_x, ‖w‖_x < ε̃,

(1 − ε)‖w − v‖_x ≤ dist(R_x(v), R_x(w)) ≤ (1 + ε)‖w − v‖_x.

Proof. For ε > 0 there is δ > 0 such that for any x ∈ U, ‖v‖_x < δ implies ‖T^{Pγ[x;R_x(v)]}_{x,R_x(v)} − DR_x(v)‖ < ε. From this we obtain for v, w ∈ T_xM with ‖v‖_x, ‖w‖_x < δ that

dist(R_x(v), R_x(w)) ≤ ∫_0^1 ‖DR_x(v + t(w − v))(w − v)‖_{R_x(v+t(w−v))} dt
≤ ∫_0^1 ‖T^{Pγ[x;R_x(v+t(w−v))]}_{x,R_x(v+t(w−v))}(w − v)‖_{R_x(v+t(w−v))} dt + ∫_0^1 ‖[DR_x(v + t(w − v)) − T^{Pγ[x;R_x(v+t(w−v))]}_{x,R_x(v+t(w−v))}](w − v)‖_{R_x(v+t(w−v))} dt
≤ ‖w − v‖_x + ε‖w − v‖_x = (1 + ε)‖w − v‖_x.

Furthermore, for δ small enough, the shortest geodesic path between R_x(v) and R_x(w) can be expressed as t ↦ R_x(p(t)), where p : [0,1] → T_xM with p(0) = v and p(1) = w. Then, for ‖v‖_x, ‖w‖_x < (1 − ε)δ/2 =: ε̃,

dist(R_x(v), R_x(w)) = ∫_0^1 ‖DR_x(p(t)) dp(t)/dt‖_{R_x(p(t))} dt
≥ ∫_0^1 ‖T^{Pγ[x;R_x(p(t))]}_{x,R_x(p(t))} dp(t)/dt‖_{R_x(p(t))} dt − ∫_0^1 ‖[DR_x(p(t)) − T^{Pγ[x;R_x(p(t))]}_{x,R_x(p(t))}] dp(t)/dt‖_{R_x(p(t))} dt
≥ (1 − ε) ∫_0^1 ‖dp(t)/dt‖_x dt ≥ (1 − ε)‖w − v‖_x,

where we have used ‖p(t)‖_x < δ for all t ∈ [0,1], since otherwise one could apply the above estimate to the segments [0, t_1) and (t_2, 1] with t_1 = inf{t : ‖p(t)‖_x ≥ δ}, t_2 = sup{t : ‖p(t)‖_x ≥ δ}, which yields

dist(R_x(v), R_x(w)) ≥ (1 − ε) ∫_{[0,t_1)∪(t_2,1]} ‖dp(t)/dt‖_x dt ≥ (1 − ε) 2(δ − ε̃) = (1 − ε²)δ > (1 + ε)‖w − v‖_x,

contradicting the first estimate.

Proposition 7 (Convergence of Newton's method). Let f be twice differentiable and D²f_{R_x}(0) continuous in x. Consider Algorithm 1 where p_k = p_k^N as defined in (3), α_k satisfies (1) with c_1 ≤ 1/2, and α_k = 1 whenever possible. Assume x_k has a limit point x* with Df(x*) = 0 and D²f(x*) bounded and coercive. Furthermore, assume that in a neighborhood U of x*, the DR_{x_k} are equicontinuous in the above sense

and that the f_{R_{x_k}} with x_k ∈ U are twice Lipschitz continuously differentiable on R_{x_k}^{-1}(U) with uniform Lipschitz constant L. Then x_k → x* with

lim_{k→∞} dist(x_{k+1}, x*)/dist²(x_k, x*) ≤ C

for some C > 0.

Proof. There is a subsequence (x_k)_{k∈I}, I ⊂ N, with x_k →_{k∈I} x*. By Proposition 5, α_k = 1 for k ∈ I sufficiently large. Furthermore, by the previous lemma there is ε > 0 such that ½‖w − v‖ ≤ dist(R_{x_k}(v), R_{x_k}(w)) ≤ (3/2)‖w − v‖ for all w, v ∈ T_{x_k}M with ‖v‖_{x_k}, ‖w‖_{x_k} < ε. Hence, for k ∈ I large enough such that dist(x_k, x*) < ε/4, there exists R_{x_k}^{-1}(x*), and

D²f_{R_{x_k}}(0)(p_k − R_{x_k}^{-1}(x*), ·) = −Df(x_k) − D²f_{R_{x_k}}(0)(R_{x_k}^{-1}(x*), ·)
= Df_{R_{x_k}}(R_{x_k}^{-1}(x*)) − Df(x_k) − D²f_{R_{x_k}}(0)(R_{x_k}^{-1}(x*), ·)
= ∫_0^1 [D²f_{R_{x_k}}(t R_{x_k}^{-1}(x*)) − D²f_{R_{x_k}}(0)](R_{x_k}^{-1}(x*), ·) dt,

which implies ‖p_k − R_{x_k}^{-1}(x*)‖_{x_k} ≤ (2L/m)‖R_{x_k}^{-1}(x*)‖²_{x_k} for the smallest eigenvalue m of D²f(x*) (using the same argument as in the proof of Proposition 5). The previous lemma then yields the desired convergence rate (note from the proof of Proposition 5 that ‖p_k‖ tends to zero and thus also convergence of the whole sequence).

The Riemannian Newton method was already proposed by Gabay in 1982 [9]. Smith proved quadratic convergence for the more general case of applying Newton's method to find a zero of a one-form on a manifold with the Levi-Civita connection [2] (the method is stated for an unspecified affine connection in [4]), using geodesic steps. Yang rephrased the proof within a broader framework for optimization algorithms, however, restricting to the case of minimizing a function (which corresponds to the method introduced above, only with geodesic retractions) [25]. Hüper and Trumpf show quadratic convergence even if the retraction used for taking the step is different from the retraction used for computing the Newton direction [ ]. A more detailed overview is provided in [2, Sec. 6.6]. While all these approaches were restricted to finite-dimensional manifolds, we here explicitly include the case of infinite-dimensional manifolds.

A superlinear convergence rate can also be achieved if the Hessian in each step is only approximated, which is particularly interesting with regard to our aim of also analyzing a quasi-Newton minimization approach.

Proposition 8 (Convergence of approximated Newton's method). Let the assumptions of the previous proposition hold, but instead of the Newton direction p_k^N consider directions p_k which only satisfy (4). Then we have superlinear convergence,

lim_{k→∞} dist(x_{k+1}, x*)/dist(x_k, x*) = 0.

Proof. From the proof of Proposition 5 we know ‖p_k − p_k^N‖_{x_k} = o(‖p_k‖_{x_k}). Also, the proof of Proposition 7 shows ‖p_k^N − R_{x_k}^{-1}(x*)‖_{x_k} ≤ (2L/m)‖R_{x_k}^{-1}(x*)‖²_{x_k}. Thus we obtain

‖p_k − R_{x_k}^{-1}(x*)‖_{x_k} ≤ ‖p_k − p_k^N‖_{x_k} + ‖p_k^N − R_{x_k}^{-1}(x*)‖_{x_k} ≤ o(‖p_k‖_{x_k}) + (2L/m)‖R_{x_k}^{-1}(x*)‖²_{x_k}.

This inequality first implies ‖p_k‖_{x_k} = O(‖R_{x_k}^{-1}(x*)‖_{x_k}) and then ‖p_k − R_{x_k}^{-1}(x*)‖_{x_k} = o(‖R_{x_k}^{-1}(x*)‖_{x_k}) so that the result follows as in the proof of Proposition 7.

Remark 3. Of course, again the conditions on f_{R_{x_k}} in the previous analysis can be untangled into separate conditions on f and the retractions. For example, one might require f to be twice Lipschitz continuously differentiable and the retractions R_{x_k} to have uniformly bounded second derivatives for x_k in a neighborhood of x* and arguments in a neighborhood of 0 ∈ T_{x_k}M. For example, R_{x_k} = exp_{x_k}

or R_{x_k} = P_{γ[x_k;x_{k+1}]} satisfy these requirements if the manifold M is well-behaved near x*. (On a two-dimensional manifold, the second derivative of the exponential map deviates the stronger from zero, the larger the Gaussian curvature is in absolute value. On higher-dimensional manifolds, to compute the directional second derivative of the exponential map in two given directions, it suffices to consider only the two-dimensional submanifold which is spanned by the two directions, so the value of the directional second derivative depends on the sectional curvature belonging to these directions. If all sectional curvatures of M are uniformly bounded near x*, which is not necessarily the case for infinite-dimensional manifolds, this implies also the boundedness of the second derivatives of the exponential map so that R_{x_k} = exp_{x_k} or R_{x_k} = P_{γ[x_k;x_{k+1}]} indeed satisfy the requirements.)

3.1 BFGS quasi-Newton scheme

A classical way to retain superlinear convergence without computing the Hessian at every iterate consists in the use of quasi-Newton methods, where the objective function Hessian is approximated via the gradient information at the past iterates. The most popular method, which we would like to transfer to the manifold setting here, is the BFGS rank-2-update formula. Here, the search direction p_k is chosen as the solution to

B_k(p_k, ·) = −Df(x_k),   (6)

where the bilinear forms B_k : (T_{x_k}M)² → R are updated according to

s_k = α_k p_k = R_{x_k}^{-1}(x_{k+1}),
y_k = Df_{R_{x_k}}(s_k) − Df_{R_{x_k}}(0),
B_{k+1}(T v, T w) = B_k(v, w) − B_k(s_k, v)B_k(s_k, w)/B_k(s_k, s_k) + (y_k v)(y_k w)/(y_k s_k)   for all v, w ∈ T_{x_k}M.

Here, T ≡ T_{x_k,x_{k+1}} denotes some linear map from T_{x_k}M to T_{x_{k+1}}M, which we obviously require to be invertible to make B_{k+1} well-defined.

Remark 4. There are more possibilities to define the BFGS update, for example, using s_k = −R_{x_{k+1}}^{-1}(x_k), y_k = Df_{R_{x_{k+1}}}(0) − Df_{R_{x_{k+1}}}(−s_k), and a corresponding update formula for B_{k+1} (which looks as above, only with the vector transport at different places). The analysis works analogously, and the above choice only allows the most elegant notation.

Actually, the formulation from the above remark was already introduced by Gabay [9] (for geodesic retractions and parallel transport) and resumed by Absil et al. [2, 8] (for general retractions and vector transport). A slightly different variant, which ignores any kind of vector transport, was provided in [2], together with a proof of convergence. In contrast to these approaches, we also consider infinite-dimensional manifolds and prove convergence as well as superlinear convergence rate.
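The following Python sketch illustrates, in finite dimensions and under stated assumptions, how the BFGS update with vector transport can be realized in the inverse form (7) below: tangent vectors are coordinate arrays in orthonormal bases (so g_x(v, w) = v·w), transport_matrix is assumed to return the matrix of an isometric transport T_k, and a plain Armijo backtracking stands in for the Wolfe search required by the analysis.

```python
import numpy as np

def riemannian_bfgs(f, grad, retract, transport_matrix, x0,
                    max_iter=200, tol=1e-6, c1=1e-4):
    """Illustrative sketch of the Riemannian BFGS iteration of Section 3.1.
    Assumed callables (not part of the paper):
      grad(x)                -> coordinates of the gradient at x
      retract(x, v)          -> R_x(v)
      transport_matrix(x, s) -> matrix of an isometric T_{x, R_x(s)}
    """
    x, g = x0, grad(x0)
    H = np.eye(len(g))                       # inverse Hessian approximation H_0
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                           # BFGS direction, cf. (6)
        alpha, fx, slope = 1.0, f(x), g @ p
        while f(retract(x, alpha * p)) > fx + c1 * alpha * slope and alpha > 1e-16:
            alpha *= 0.5                     # Armijo backtracking on (1a)
        s = alpha * p                        # s_k = alpha_k p_k
        T = transport_matrix(x, s)           # T_k = T_{x_k, x_{k+1}}
        x = retract(x, s)                    # x_{k+1}
        g_new = grad(x)
        y = T.T @ g_new - g                  # Riesz repr. of y_k in T_{x_k}M
        ys = y @ s
        if ys > 1e-12:                       # curvature condition y_k s_k > 0
            J = np.eye(len(g)) - np.outer(s, y) / ys
            H = J @ H @ J.T + np.outer(s, s) / ys     # inverse update at x_k
        H = T @ H @ T.T                      # move H_{k+1} to T_{x_{k+1}}M, cf. (7)
        g = g_new
    return x
```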

Lemma 9. Consider Algorithm 1 with the above BFGS search direction and Wolfe step size control, where the T_k and T_k^{-1} are uniformly bounded and the f_{R_{x_k}} are assumed smooth. If B_0 is bounded and coercive, then y_k s_k > 0 and B_k is bounded and coercive for all k ∈ N.

Proof. Assume B_k to be bounded and coercive. Then p_k is a descent direction, and (1b) implies the curvature condition y_k s_k ≥ (c_2 − 1)α_k Df_{R_{x_k}}(0)p_k > 0 so that B_{k+1} is well-defined. Let us denote by ŷ_k = ∇f_{R_{x_k}}(s_k) − ∇f_{R_{x_k}}(0) the Riesz representation of y_k. Furthermore, by the Lax–Milgram lemma, there is B̂_k : T_{x_k}M → T_{x_k}M with B_k(v, w) = g_{x_k}(v, B̂_k w) for all v, w ∈ T_{x_k}M. Obviously,

B̂_{k+1} = T_k^{-*} [ B̂_k − g_{x_k}(B̂_k s_k, ·) B̂_k s_k / B_k(s_k, s_k) + g_{x_k}(ŷ_k, ·) ŷ_k / (y_k s_k) ] T_k^{-1}.

If H_k = B̂_k^{-1}, then by the Sherman–Morrison formula,

H_{k+1} = T_k [ J_k H_k J_k^* + g_{x_k}(s_k, ·) s_k / (y_k s_k) ] T_k^*,   J_k = id − g_{x_k}(ŷ_k, ·) s_k / (y_k s_k),   (7)

is the inverse of B̂_{k+1}. Its boundedness is obvious. Furthermore, H_{k+1} is coercive. Indeed, let M_k = ‖B̂_k‖; then for any v ∈ T_{x_k}M with ‖v‖_{x_k} = 1 and w = v − g_{x_k}(s_k, v)ŷ_k/(y_k s_k) we have

g_{x_{k+1}}(T_k^{-*} v, H_{k+1} T_k^{-*} v) = g_{x_k}(w, H_k w) + g_{x_k}(s_k, v)²/(y_k s_k) ≥ M_k^{-1}‖w‖²_{x_k} + g_{x_k}(s_k, v)²/(y_k s_k),

which is strictly greater than zero since the second summand being zero implies the first one being greater than or equal to M_k^{-1}. In finite dimensions this already yields the desired coercivity; in infinite dimensions it still remains to show that the right-hand side is uniformly bounded away from zero for all unit vectors v ∈ T_{x_k}M. Indeed,

M_k^{-1}‖w‖²_{x_k} + g_{x_k}(s_k, v)²/(y_k s_k) = M_k^{-1}[1 − 2 g_{x_k}(s_k, v) g_{x_k}(ŷ_k, v)/(y_k s_k) + (g_{x_k}(s_k, v)/(y_k s_k))² ‖ŷ_k‖²_{x_k}] + g_{x_k}(s_k, v)²/(y_k s_k)

is a continuous function in (g_{x_k}(s_k, v), g_{x_k}(ŷ_k, v)) which by the Weierstrass theorem takes its minimum on [−‖s_k‖_{x_k}, ‖s_k‖_{x_k}] × [−‖y_k‖_{x_k}, ‖y_k‖_{x_k}]. This minimum is the smallest eigenvalue of T_k^{-1} H_{k+1} T_k^{-*} and must be greater than zero. Both the boundedness and coercivity of T_k^{-1} H_{k+1} T_k^{-*} imply boundedness and coercivity of B̂_{k+1} and thus of B_{k+1}.

By virtue of the above lemma, the BFGS search direction is well-defined for all iterates. It satisfies the all-important secant condition B_{k+1}(T_k s_k, ·) = y_k ∘ T_k^{-1}, which basically has the interpretation that B_{k+1}(T_k ·, T_k ·) is supposed to be an approximation to D²f_{R_{x_k}}(s_k). Since we also aim at B_{k+1} ≈ D²f_{R_{x_{k+1}}}(0), the choice T_k = T^{R_{x_k}}_{x_k,x_{k+1}} seems natural in view of the fact D²f_{R_x}(R_x^{-1}(x*)) = D²f_{R_{x*}}(0)(T^{R_x}_{x,x*} ·, T^{R_x}_{x,x*} ·) for a stationary point x*. However, we were only able to establish the method's convergence for isometric T_k (see Proposition 10).

For the actual implementation, instead of solving (6) one applies H_k to −∇f(x_k). By (7), this entails computing the transported vectors T_{k-1}^{-1}∇f(x_k), T_{k-2}^{-1}T_{k-1}^{-1}∇f(x_k), ..., T_0^{-1}···T_{k-1}^{-1}∇f(x_k) and their scalar products with the s_i and y_i, i = 0, ..., k−1, as well as the transports of the s_i and y_i.

The convergence analysis of the scheme can be transferred from standard analyses (e.g. [7, Prop. 1.6.7–Thm. 1.6.9] or [6, Thm. 6.5], and [2] for slightly weaker results). Due to their technicality, the corresponding proofs are deferred to the appendix.

Proposition 10 (Convergence of BFGS descent). Consider Algorithm 1 with BFGS search direction and Wolfe step size control, where the T_k are isometries. Assume the f_{R_{x_k}} to be uniformly convex on the f(x_0)-sublevel set of f, that is, there are 0 < m < M < ∞ such that

m‖v‖² ≤ D²f_{R_{x_k}}(p)(v, v) ≤ M‖v‖²   for all v ∈ T_{x_k}M

for p ∈ R_{x_k}^{-1}({x ∈ M : f(x) ≤ f(x_0)}). If B_0 is symmetric, bounded and coercive, then there exists a constant 0 < µ < 1 such that

f(x_k) − f(x*) ≤ µ^{k+1}(f(x_0) − f(x*))

for the minimizer x* ∈ M of f.

Corollary 11. Under the conditions of the previous proposition and if f_{exp_{x*}} is uniformly convex in the sense m‖v‖²_{x*} ≤ D²f_{exp_{x*}}(p)(v, v) ≤ M‖v‖²_{x*} for all v ∈ T_{x*}M if f_{exp_{x*}}(p) ≤ f(x_0), then

dist(x_k, x*) ≤ √(M µ^{k+1}/m) dist(x_0, x*)   for all k ∈ N.

Proof. From f(x) − f(x*) = ½ D²f_{exp_{x*}}(t log_{x*} x)(log_{x*} x, log_{x*} x) for some t ∈ [0,1] we obtain (m/2) dist²(x, x*) ≤ f(x) − f(x*) ≤ (M/2) dist²(x, x*).

Remark 5. The isometry condition on T_k is for example satisfied for the parallel transport from x_k to x_{k+1}. The Riesz representation of y_k can be computed as (T^{R_{x_k}}_{x_k,x_{k+1}})^* ∇f(x_{k+1}) − ∇f(x_k). For R_{x_k} = P_{γ[x_k;x_{k+1}]}, this means simple parallel transport of ∇f(x_{k+1}).

For a superlinear convergence rate, it is well-known that in infinite dimensions one needs particular conditions on the initial Hessian approximation B_0 (it has to be an approximation to the true Hessian in the nuclear norm, see e.g. [9]). On manifolds we furthermore require a certain type of consistency between the vector transports as shown in the following proposition, which modifies the analysis from [6, Thm. 6.6]. For ease of notation, define the averaged Hessian G_k = ∫_0^1 D²f_{R_{x_k}}(t s_k) dt and let the hat in Ĝ_k denote the Lax–Milgram representation of G_k. The proof of the following is given in the appendix.

Proposition 12 (BFGS approximation of Newton direction). Consider Algorithm 1 with BFGS search direction and Wolfe step size control. Let x* be a stationary point of f with bounded and coercive D²f(x*). For k large enough, assume R_{x_k}^{-1}(x*) to be defined and assume the existence of isomorphisms T_{k,*} : T_{x_k}M → T_{x*}M with ‖T_{k,*}‖, ‖T_{k,*}^{-1}‖ < C uniformly for some C > 0 and

‖T^{R_{x_k}}_{x_k,x*} T_{k,*}^{-1} − id‖ →_{k→∞} 0,   ‖T_{k+1,*} T_k T_{k,*}^{-1} − id‖_1 < b β^k for some b > 0, 0 < β < 1,

where ‖·‖_1 denotes the nuclear norm (the sum of all singular values, ‖T‖_1 = tr √(T^*T)). Finally, assume ‖T_{0,*} B̂_0 T_{0,*}^{-1} − ∇²f(x*)‖_1 < ∞ for a symmetric, bounded and coercive B_0 and let

∑_{k=0}^∞ ‖T_{k,*} Ĝ_k T_{k,*}^{-1} − ∇²f(x*)‖_1 < ∞   and   ‖∇²f_{R_{x_k}}(R_{x_k}^{-1}(x*)) − ∇²f_{R_{x_k}}(0)‖ →_{k→∞} 0.

Then

lim_{k→∞} ‖Df(x_k) + D²f_{R_{x_k}}(0)(p_k, ·)‖_{x_k} / ‖p_k‖_{x_k} = 0

so that Proposition 8 can be applied.

Corollary 13 (Convergence rate of BFGS descent). Consider Algorithm 1 with BFGS search direction and Wolfe step size control, where c_1 ≤ 1/2, α_k = 1 whenever possible, and the T_k are isometries. Let x* be a stationary point of f with bounded and coercive D²f(x*), where we assume f_{exp_{x*}} to be uniformly convex on its f(x_0)-sublevel set. Assume that in a neighborhood U of x*, the DR_{x_k} are equicontinuous (in the sense of Lemma 6) and that the f_{R_{x_k}} with x_k ∈ U are twice Lipschitz continuously differentiable on R_{x_k}^{-1}(U) with uniform Lipschitz constant L and uniformly convex on the f(x_0)-sublevel set of f. Furthermore, assume the existence of isomorphisms T_{k,*} : T_{x_k}M → T_{x*}M with ‖T_{k,*}‖ and ‖T_{k,*}^{-1}‖ uniformly bounded and

‖T^{R_{x_k}}_{x_k,x*} T_{k,*}^{-1} − id‖, ‖T_{k+1,*} T_k T_{k,*}^{-1} − id‖_1 ≤ c max{dist(x_k, x*), dist(x_{k+1}, x*)}

for some c > 0. If B̂_0 is bounded and coercive with ‖T_{0,*} B̂_0 T_{0,*}^{-1} − ∇²f(x*)‖_1 < ∞, then x_k → x* with

lim_{k→∞} dist(x_{k+1}, x*)/dist(x_k, x*) = 0.

Proof. The conditions of Corollary 11 are satisfied so that we have dist(x_k, x*) < b β^k for some b > 0, 0 < β < 1. As in the proof of Proposition 7, the equicontinuity of the retraction variations then implies the well-definedness of R_{x_k}^{-1}(x*) for k large enough. Also,

‖T^{R_{x_k}}_{x_k,x*} T_{k,*}^{-1} − id‖, ‖T_{k+1,*} T_k T_{k,*}^{-1} − id‖_1 < c b β^k → 0.

Furthermore, ‖∇²f_{R_{x_k}}(R_{x_k}^{-1}(x*)) − ∇²f_{R_{x_k}}(0)‖ ≤ L‖R_{x_k}^{-1}(x*)‖_{x_k} = O(dist(x_k, x*)) → 0 follows from the uniform Lipschitz continuity of ∇²f_{R_{x_k}} and Lemma 6. Likewise,

‖G_k(T_{k,*}^{-1} ·, T_{k,*}^{-1} ·) − D²f(x*)‖ ≤ ‖D²f_{R_{x_k}}(0)(T_{k,*}^{-1} ·, T_{k,*}^{-1} ·) − D²f(x*)‖ + ‖T_{k,*}^{-1}‖² ‖G_k − D²f_{R_{x_k}}(0)‖.

The second term is bounded by ‖T_{k,*}^{-1}‖² L ‖R_{x_k}^{-1}(x_{k+1})‖_{x_k} and thus by some constant times β^k due to the equivalence between the retractions and the exponential map near x* (Lemma 6). The first term on the right-hand side can be bounded as in the proof of Proposition 12, which also yields a constant times β^k, so that Proposition 12 can be applied. Proposition 8 then implies the result.

On finite-dimensional manifolds, the operator norm is equivalent to the nuclear norm, and the condition on B_0 reduces to B_0 being bounded and coercive. In that case, if the manifold M has bounded sectional curvatures near x*, f_{exp_{x*}} is uniformly convex, and f is twice Lipschitz continuously differentiable, then the above conditions are for instance satisfied with R_{x_k} = exp_{x_k} or R_{x_k} = P_{γ[x_k;x_{k+1}]} and T_k as well as T_{k,*} being standard parallel transport. Note that the nuclear norm bound on ‖T_{k+1,*} T_k T_{k,*}^{-1} − id‖_1 is quite a strong condition, and it is not clear whether it could perhaps be relaxed. If for illustration we imagine T_k and T_{k,*} to be simple parallel transport, then the bound implies that the manifold behaves almost like a hyperplane (except for a finite number of dimensions).

3.2 Fletcher–Reeves nonlinear conjugate gradient scheme

Other gradient-based minimization methods with good convergence properties in practice include nonlinear conjugate gradient schemes. Here, in each step the search direction is chosen conjugate to the old search direction in a certain sense. Such methods often enjoy superlinear convergence rates. For example, Luenberger has shown ‖x_{k+n} − x*‖ = O(‖x_k − x*‖²) for the Fletcher–Reeves nonlinear conjugate gradient iteration on flat n-dimensional manifolds. However, such analyses seem quite special and intricate, and the understanding of these methods seems not yet as advanced as of the quasi-Newton approaches, for example. Hence, we will only briefly transfer the convergence analysis for the standard Fletcher–Reeves approach to the manifold case. Here, the search direction is chosen according to

p_k = −∇f(x_k) + β_k T^{R_{x_{k-1}}}_{x_{k-1},x_k} p_{k-1},   β_k = ‖Df(x_k)‖²_{x_k} / ‖Df(x_{k-1})‖²_{x_{k-1}}

(with β_0 = 0), and the step length α_k satisfies the strong Wolfe conditions.

Remark 6. If (as is typically done) the nonlinear conjugate gradient iteration is restarted every K steps with p_{iK} = −∇f(x_{iK}), i ∈ N, then Theorem 2 directly implies lim inf_{k→∞} ‖Df(x_k)‖_{x_k} = 0. Hence, if f is strictly convex in the sense of Proposition 4, we obtain convergence of x_k against the minimizer.

We immediately have the following classical bound (e.g. [6, Lem. 5.6]).

Lemma 14. Consider Algorithm 1 with Fletcher–Reeves search direction and strong Wolfe step size control with c_2 < 1/2. Then

−1/(1 − c_2) ≤ Df(x_k)p_k / ‖Df(x_k)‖²_{x_k} ≤ (2c_2 − 1)/(1 − c_2)   for all k ∈ N.

Proof. The result is obvious for k = 0. In order to perform an induction, note

Df(x_{k+1})p_{k+1} / ‖Df(x_{k+1})‖²_{x_{k+1}} = −1 + β_{k+1} Df(x_{k+1}) T^{R_{x_k}}_{x_k,x_{k+1}} p_k / ‖Df(x_{k+1})‖²_{x_{k+1}} = −1 + Df_{R_{x_k}}(α_k p_k) p_k / ‖Df(x_k)‖²_{x_k}.

The strong Wolfe condition (2) thus implies

−1 + c_2 Df(x_k)p_k / ‖Df(x_k)‖²_{x_k} ≤ Df(x_{k+1})p_{k+1} / ‖Df(x_{k+1})‖²_{x_{k+1}} ≤ −1 − c_2 Df(x_k)p_k / ‖Df(x_k)‖²_{x_k},

from which the result follows by the induction hypothesis.

This bound allows the application of a standard argument (e.g. [6, Thm. 5.7]) to obtain lim inf_{k→∞} ‖Df(x_k)‖_{x_k} = 0. Thus, as before, if f is strictly convex in the sense of Proposition 4, the sequence x_k converges against the minimizer.

Proposition 15 (Convergence of Fletcher–Reeves CG iteration). Under the conditions of the previous lemma and if ‖T^{R_{x_k}}_{x_k,x_{k+1}} p_k‖_{x_{k+1}} ≤ ‖p_k‖_{x_k} for all k ∈ N, then lim inf_{k→∞} ‖Df(x_k)‖_{x_k} = 0.

Proof. The inequality of the previous lemma can be multiplied by −‖Df(x_k)‖_{x_k}/‖p_k‖_{x_k} to yield

(1 − 2c_2)/(1 − c_2) · ‖Df(x_k)‖_{x_k}/‖p_k‖_{x_k} ≤ cos θ_k ≤ 1/(1 − c_2) · ‖Df(x_k)‖_{x_k}/‖p_k‖_{x_k}.

Theorem 2 then implies

∑_{k=0}^∞ ‖Df(x_k)‖⁴_{x_k} / ‖p_k‖²_{x_k} < ∞.

Furthermore, by (2) and the previous lemma, |Df(x_k) T^{R_{x_{k-1}}}_{x_{k-1},x_k} p_{k-1}| = |Df_{R_{x_{k-1}}}(α_{k-1}p_{k-1})p_{k-1}| ≤ c_2/(1 − c_2) ‖Df(x_{k-1})‖²_{x_{k-1}}, so that

‖p_k‖² = ‖∇f(x_k)‖² − 2β_k Df(x_k) T^{R_{x_{k-1}}}_{x_{k-1},x_k} p_{k-1} + β_k² ‖T^{R_{x_{k-1}}}_{x_{k-1},x_k} p_{k-1}‖²
≤ ‖∇f(x_k)‖² + 2c_2/(1 − c_2) β_k ‖Df(x_{k-1})‖²_{x_{k-1}} + β_k² ‖p_{k-1}‖²
= (1 + c_2)/(1 − c_2) ‖∇f(x_k)‖² + ‖Df(x_k)‖⁴/‖Df(x_{k-1})‖⁴ ‖p_{k-1}‖²
≤ (1 + c_2)/(1 − c_2) ‖Df(x_k)‖⁴ ∑_{j=0}^{k} ‖Df(x_j)‖^{-2}_{x_j},

where the last step follows from induction and p_0 = −∇f(x_0). However, if we now assume ‖Df(x_k)‖_{x_k} ≥ γ for some γ > 0 and all k ∈ N, then this implies ‖p_k‖² ≤ (1 + c_2)/(1 − c_2) ‖Df(x_k)‖⁴ (k + 1)/γ² so that

∑_{k=0}^∞ ‖Df(x_k)‖⁴/‖p_k‖² ≥ (1 − c_2)/(1 + c_2) γ² ∑_{k=0}^∞ 1/(k + 1) = ∞,

contradicting Zoutendijk's theorem.

Remark 7. R_{x_k} = exp_{x_k} and R_{x_k} = P_{γ[x_k;x_{k+1}]} both satisfy the conditions required for convergence (the iterations for both retractions coincide). If the vector transport associated with the retraction increases the norm of the transported vector, convergence can no longer be guaranteed.

Remark 8. If in the iteration we instead use

β_k = Df(x_k)(∇f(x_k) − ∇f_{R_{x_k}}(R_{x_k}^{-1}(x_{k-1})))/‖Df(x_{k-1})‖²_{x_{k-1}}   or   β_{k+1} = Df_{R_{x_k}}(α_k p_k)(∇f_{R_{x_k}}(α_k p_k) − ∇f(x_k))/‖Df(x_k)‖²_{x_k},

we obtain a Polak–Ribière variant of a nonlinear CG iteration. If there are 0 < m < M < ∞ such that m‖v‖² ≤ D²f_{R_{x_k}}(p)(v, v) ≤ M‖v‖² for all v ∈ T_{x_k}M whenever f(R_{x_k}(p)) ≤ f(x_0), if ‖(T^{R_{x_k}}_{x_k,x_{k+1}})^{-1}‖ (respectively ‖T^{R_{x_k}}_{x_k,x_{k+1}}‖) is uniformly bounded, and if α_k is obtained by exact linesearch, then for all k ∈ N one can show

cos θ_k ≥ 1/(1 + (M/m)‖(T^{R_{x_{k-1}}}_{x_{k-1},x_k})^{-1}‖)   (respectively cos θ_{k+1} ≥ 1/(1 + (M/m)‖T^{R_{x_k}}_{x_k,x_{k+1}}‖)) =: ρ,

which implies at least linear convergence,

dist(x_k, x*) ≤ √((2/m)(f(x_0) − f(x*))) [1 − (m/M)ρ²]^{k/2}.

Here, the proofs of 1.5.8 and 1.5.9 from [7] can be directly transferred with λ_i := α_i, h_i := p_i, g_i := −∇f(x_i), H(x_i + sλ_i h_i) := ∇²f_{R_{x_i}}(sα_i p_i), replacing g_{i+1} by ∇f_{R_{x_i}}(α_i p_i) in (9a), ⟨g_{i+1}, h_i⟩ by Df_{R_{x_i}}(α_i p_i)p_i, and ⟨H_i h_i, g_{i+1}⟩ by either ⟨T^{R_{x_i}}_{x_i,x_{i+1}} H_i h_i, g_{i+1}⟩ or ⟨H_i h_i, ∇f_{R_{x_i}}(α_i p_i)⟩ in the denominator of (9d). 1.5.8(b) follows from Theorem 2 similarly to the proof of Proposition 10.
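As an illustration of Section 3.2, the following hedged Python sketch implements the Fletcher–Reeves iteration with vector transport in finite dimensions; grad, retract and transport are assumed user-supplied, tangent vectors are coordinate arrays in orthonormal bases, and a backtracking Armijo rule (with a steepest-descent safeguard and periodic restarts as in Remark 6) replaces the strong Wolfe search assumed by Lemma 14 and Proposition 15.

```python
import numpy as np

def riemannian_fletcher_reeves(f, grad, retract, transport, x0,
                               max_iter=500, tol=1e-6, c1=1e-4, restart=50):
    """Sketch of the Fletcher-Reeves iteration of Section 3.2.
    Assumed callable: transport(x, s, w) -> T^{R_x}_{x, R_x(s)} w.
    """
    x = x0
    g = grad(x)
    p = -g
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g @ p >= 0:                         # safeguard: fall back to steepest descent
            p = -g
        alpha, fx, slope = 1.0, f(x), g @ p
        while f(retract(x, alpha * p)) > fx + c1 * alpha * slope and alpha > 1e-16:
            alpha *= 0.5                       # Armijo backtracking on (1a)
        s = alpha * p
        x_new = retract(x, s)
        g_new = grad(x_new)
        if (k + 1) % restart == 0:
            beta = 0.0                         # periodic restart (Remark 6)
        else:
            beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves beta_{k+1}
        # p_{k+1} = -grad f(x_{k+1}) + beta * transported old direction
        p = -g_new + beta * transport(x, s, p)
        x, g = x_new, g_new
    return x
```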

Figure 1: Minimization of f_1 (left) and f_2 (right) on the torus via the Fletcher–Reeves algorithm. The top row performs the optimization in the parametrization domain, the bottom row shows the result for the Riemannian optimization. For each case the optimization path is shown on the torus and in the parametrization domain (the color-coding from blue to red indicates the function value) as well as the evolution of the function values. Obviously, optimization in the parametrization domain is more suitable for f_1, whereas Riemannian optimization with the torus metric is more suitable for f_2.

4 Numerical examples

In this section we will first consider simple optimization problems on the two-dimensional torus to illustrate the influence of the Riemannian metric on the optimization progress. Afterwards we turn to an active contour model and a simulation of truss shape deformations as exemplary optimization problems to prove the efficiency of Riemannian optimization methods also in the more complex setting of Riemannian shape spaces.

4.1 The rôle of metric and vector transport

The minimum of a functional on a manifold is independent of the manifold metric and the chosen retractions. Consequently, exploiting the metric structure does not necessarily aid the optimization process, and one could certainly impose different metrics on the same manifold of which some are more beneficial for the optimization problem at hand than others. The optimal pair of a metric and a retraction would be such that one single gradient descent step already hits the minimum. However, the design of such pairs requires far more effort than solving the optimization problem in a suboptimal way. Often, a certain metric and retraction fit naturally to the optimization problem at hand.

For illustration, consider the two-dimensional torus, parameterized by

(ϕ, ψ) ↦ y(ϕ, ψ) = ((r_1 + r_2 cos ϕ) cos ψ, (r_1 + r_2 cos ϕ) sin ψ, r_2 sin ϕ),   ϕ, ψ ∈ (−π, π],   r_1 = 2, r_2 = 3/5.

The corresponding metric shall be induced by the Euclidean embedding. As objective functions let us consider the following, both expressed as functions on the parametrization domain,

f_1(ϕ, ψ) = a(1 − cos ϕ) + b((ψ + π/2) mod (2π) − π)²,
f_2(ϕ, ψ) = a(1 − cos ϕ) + b dist(ϕ, Φ_ψ)²,

where we use (a, b) = (1, 40) and, for all ψ ∈ (−π, π], Φ_ψ is a discrete set such that {(ϕ, ψ) ∈ (−π, π]² : ϕ ∈ Φ_ψ} describes a shortest curve winding five times around the torus and passing through (ϕ, ψ) = (0, 0) (compare Figure 1, right). Both functions may be slightly altered so that they are smooth all over the torus. f_1 exhibits a narrow valley that is aligned with geodesics of the parametrization domain, while the valleys of f_2 follow a geodesic path on the torus. Obviously, an optimization based on the (Euclidean) metric and (straight line) geodesic retractions of the parametrization domain is much better in following the valley of f_1 than an optimization based on the actual torus metric (Figure 1). For f_2 the situation is reverse. This phenomenon is also reflected in the iteration numbers until convergence (Table 1). It is not very pronounced for methods which converge after only few iterations, but it is very noticeable especially for gradient descent and nonlinear conjugate gradient iterations.
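A small Python sketch of the torus test problem may help to reproduce the setting; the radii r_1 = 2, r_2 = 3/5, the weights (a, b) = (1, 40) and the form of f_1 are taken from the (partly garbled) text above and should be treated as assumptions, as should the starting point and step size. f_2 is omitted since it requires the winding-curve set Φ_ψ.

```python
import numpy as np

r1, r2, a, b = 2.0, 0.6, 1.0, 40.0      # assumed values as read from the text

def embed(phi, psi):
    """Embedding y(phi, psi) of the torus into R^3."""
    return np.array([(r1 + r2 * np.cos(phi)) * np.cos(psi),
                     (r1 + r2 * np.cos(phi)) * np.sin(psi),
                     r2 * np.sin(phi)])

def f1(phi, psi):
    """Valley aligned with straight lines in the parametrization domain."""
    return a * (1 - np.cos(phi)) + b * ((psi + np.pi / 2) % (2 * np.pi) - np.pi) ** 2

def descend(f, x0, step=5e-3, iters=2000, h=1e-6):
    """Plain gradient descent in the parametrization domain (Euclidean metric,
    straight-line retraction), as in the top row of Figure 1; the gradient is
    approximated by central finite differences."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        g = np.array([(f(x[0] + h, x[1]) - f(x[0] - h, x[1])) / (2 * h),
                      (f(x[0], x[1] + h) - f(x[0], x[1] - h)) / (2 * h)])
        x = x - step * g
    return x

print(descend(f1, (2.0, 2.0)))   # approaches the minimizer (0, pi/2) of f1 as written here
```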

Table 1: Iteration numbers for minimization of f_1 and f_2 with different methods (rows: gradient descent, Newton descent, BFGS quasi-Newton, Fletcher–Reeves NCG; columns: parametrization metric and torus metric, for each of the objectives f_1 and f_2; the numeric entries were lost in this transcription). The iteration is stopped as soon as ‖(∂f_i/∂ϕ, ∂f_i/∂ψ)‖_{l²} < 10^{-3}. As retractions we use the exponential maps with respect to the metric of the parametrization domain and of the torus, respectively. The BFGS and NCG methods employ the corresponding parallel transport.

Figure 2: Evolution of distance d to the global minimizer (ϕ, ψ) = (0, 0) of f_2 for different methods, starting from (ϕ, ψ) = ( , 0). The thick dashed and solid line belongs to Riemannian gradient descent and Riemannian BFGS descent (with T_k being parallel transport), respectively. The thin lines show the evolution for Riemannian BFGS descent, where T_k is parallel transport concatenated with a rotation by ω for ω = 1/100 (solid), ω = d/10 (dashed), ω = (2/5)√d (dotted), and ω = 1/(5 log(−log d)) (dot-dashed).

Of course, the qualitative convergence behavior stays the same (such as linear convergence for gradient descent or superlinear convergence for the BFGS method), independent of the employed retractions. Therefore it sometimes pays off to choose rapidly computable retractions over retractions that minimize the number of optimization steps. We might for example minimize f_1 or f_2 using the torus metric but non-geodesic retractions R_x : T_xM → M, v ↦ y(y^{-1}(x) + (Dy)^{-1}v) (a straight step in the parametrization domain), where y was the parameterization. The resulting optimization will behave similarly to the optimization based on the (Euclidean) metric of the parametrization domain, and indeed, the Fletcher–Reeves algorithm requires 47 minimization steps for f_1 and 83 for f_2.

As a final discussion based on the illustrative torus example, consider the conditions on the vector transport T_k for the BFGS method. It is an open question whether the isometry of T_k is really needed for global convergence in Proposition 10. On compact manifolds such as the torus this condition is not needed since the sequence of iterates will always contain a converging subsequence so that global convergence follows from Proposition 12 instead of Proposition 10. A counterexample, showing the necessity of isometric transport, will likely be difficult to obtain. On the other hand, we can numerically validate the conditions on T_k to obtain superlinear convergence. Figure 2 shows the evolution of the distance between the current iterate x_k and x* = arg min f_2 for the BFGS method with varying T_k. In particular, we take T_k to be parallel transport concatenated with a rotation by some angle ω that depends on dist(x_k, x*). The term ‖T_{k+1,*} T_k T_{k,*}^{-1} − id‖_1 in Proposition 12 and Corollary 13 then scales (at least) like ω. Apparently, the superlinear convergence seems to be retained for ω ∼ dist(x_k, x*) as well as ω ∼ √dist(x_k, x*), which is better than predicted by Corollary 13. (This is not surprising, though, since ‖T_{k+1,*} T_k T_{k,*}^{-1} − id‖_1 ∼ √dist(x_k, x*) combined with superlinear convergence of x_k can still satisfy the conditions of Proposition 12.) However, for constant ω and ω ∼ 1/log(−log(dist(x_k, x*))), convergence is indeed only linear. Nevertheless, convergence is still much faster than for gradient descent.
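To illustrate the perturbed vector transport used in the Figure 2 experiment, the following sketch composes an isometric 2×2 transport matrix with a rotation by ω; the concrete angle choices mirror the caption of Figure 2 as reconstructed above and are therefore assumptions.

```python
import numpy as np

def perturbed_transport(T, d, mode="sqrt"):
    """Compose an isometric 2x2 transport matrix T (orthonormal tangent bases)
    with a rotation whose angle omega depends on the distance d to the
    minimizer, as in the Figure 2 experiment (angle choices assumed)."""
    if mode == "const":
        omega = 1.0 / 100.0
    elif mode == "lin":
        omega = d / 10.0
    elif mode == "sqrt":
        omega = 2.0 / 5.0 * np.sqrt(d)
    else:                                  # "loglog"; only sensible for d well below 1/e
        omega = 1.0 / (5.0 * np.log(-np.log(d)))
    R = np.array([[np.cos(omega), -np.sin(omega)],
                  [np.sin(omega),  np.cos(omega)]])
    return R @ T
```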

4.2 Riemannian optimization in the space of smooth closed curves

Riemannian optimization on the Stiefel manifold has been applied successfully and efficiently to several linear algebra problems, from low rank approximation [7] to eigenvalue computation [8]. The same concepts can be transferred to efficient optimization methods in the space of closed smooth curves, using the shape space and description of curves introduced by Younes et al. [26]. They represent a curve c : [0,1] → C ≅ R² by two functions e, g : [0,1] → R via

c(θ) = c(0) + (1/2) ∫_0^θ (e + ig)² dϑ.

The conditions that the curve be closed, c(1) = c(0), and of unit length, 1 = ∫_0^1 |c'(θ)| dθ, result in the fact that e and g are orthonormal in L²(0,1); thus (e, g) forms an element of the Stiefel manifold

St(2, L²(0,1)) = {(e, g) ∈ L²(0,1)² : ‖e‖_{L²(0,1)} = ‖g‖_{L²(0,1)} = 1, (e, g)_{L²(0,1)} = 0}.

The Riemannian metric of the Stiefel manifold can now be imposed on the space of smooth closed curves with unit length and fixed base point, which was shown to be equivalent to endowing this shape space with a Sobolev-type metric [26]. For general closed curves we follow Sundaramoorthi et al. [22] and represent a curve c by an element (c_0, ρ, (e, g)) of R² × R × St(2, L²(0,1)) via

c(θ) = c_0 + exp(ρ) (1/2) ∫_0^θ (e + ig)² dϑ.

Here c_0 describes the curve base point and exp(ρ) its length. (Note that Sundaramoorthi et al. choose c_0 as the curve centroid. Choosing the base point instead simplifies the notation a little and yields the same qualitative behavior.) Sundaramoorthi et al. have shown the corresponding Riemannian metric to be equivalent to the very natural metric

g_{[c]}(h, k) = h_t · k_t + λ_l h_l k_l + λ_d ∫_{[c]} (dh_d/ds) · (dk_d/ds) ds

on the tangent space of curve variations h, k : [c] → R², where [c] is the image of c : [0,1] → R², s denotes arclength, and λ_l, λ_d > 0. Here, h_t and h_l are the Gâteaux derivatives of the curve centroid (in our case the base point) and of the logarithm of the curve length for curve variation in direction h, and h_d = h − h_t − h_l(c − c_0) (analogously for k). By [0] this yields a geodesically complete shape space in which there is a closed formula for the exponential map [22], lending itself for Riemannian optimization. (Note that simple L²-type metrics in the space of curves can in general not be used since the resulting spaces usually are degenerate [5]: They exhibit paths of arbitrarily small length between any two curves.)

To illustrate the efficiency of Riemannian optimization in this context we consider the task of image segmentation via active contours without edges as proposed by Chan and Vese [8]. For a given gray scale image u : [0,1]² → R we would like to minimize the objective functional

f([c]) = a_1 ( ∫_{int[c]} (u_i − u)² dx + ∫_{ext[c]} (u_e − u)² dx ) + a_2 ∫_{[c]} ds,

where a_1, a_2 > 0, u_i and u_e are given gray values, and int[c] and ext[c] denote the interior and exterior of [c]. The first two terms indicate that [c] should enclose the image region where u is close to u_i and far from u_e, while the third term acts as a regularizer and measures the curve length. We interpret the curve c as an element of the above Riemannian manifold R² × R × St(2, L²(0,1)) and add an additional term to the objective functional that prefers a uniform curve parametrization. The objective functional then reads

f(c_0, ρ, (e, g)) = a_1 ( ∫_{int[(c_0,ρ,(e,g))]} (u_i − u)² dx + ∫_{ext[(c_0,ρ,(e,g))]} (u_e − u)² dx ) + a_2 exp(ρ) + a_3 ∫_0^1 (e² + g²)² dϑ,

Figure 3: Curve evolution during BFGS minimization of f. The curve is depicted at steps 0, 1, 2, 3, 4, 7, and after convergence. Additionally we show the evolution of the function value f(c_k) − min_c f(c).

Table 2: Iteration numbers for minimization of f with different methods (rows: gradient flow, gradient descent, BFGS quasi-Newton, Fletcher–Reeves NCG; columns: non-geodesic retraction, geodesic retraction; the numeric entries were lost in this transcription). The iteration is stopped as soon as the derivative of the discretized functional f has l²-norm less than 10^{-3}. For the gradient flow discretization we employ a stepsize of 0.001, which is roughly the largest stepsize for which the curve stays within the image domain during the whole iteration.

where we choose (a_1, a_2, a_3) = (50, 1, 1). For numerical implementation, e and g are discretized as piecewise constant functions on an equispaced grid over [0,1], and the image u is given as pixel values on a uniform quadrilateral grid, where we interpolate bilinearly between the nodes.

Figure 3 shows the curve evolution for a particular example. Obviously, the natural metric ensures that the correct curve positioning, scaling, and deformation take place quite independently. Corresponding iteration numbers are shown in Table 2. In one case we employed geodesic retractions with parallel transport (simple formulae for which are based on the matrix exponential [22]), in the other case we used

R_{(c_0,ρ,(e,g))}(δc_0, δρ, (δe, δg)) = (c_0 + δc_0, ρ + δρ, Π_{St(2,L²(0,1))}(e + δe, g + δg)),
T_{(c_0^1,ρ^1,(e^1,g^1)),(c_0^2,ρ^2,(e^2,g^2))}(δc_0, δρ, (δe, δg)) = (δc_0, δρ, Π_{T_{(e^2,g^2)}St(2,L²(0,1))}(δe, δg)),

where Π_S denotes the orthogonal projection onto S ⊂ (L²(0,1))². Due to the closed-form solution of the exponential map there is hardly any difference in computational costs. Riemannian BFGS and nonlinear conjugate gradient iteration yield much faster convergence than gradient descent. Gradient descent with step size control in turn is faster than gradient flow (which in numerical implementations corresponds to gradient descent with a fixed small step size, which is the method employed in [22]), so that the use of higher order methods such as BFGS or NCG will yield a substantial gain in computation time.

Figure 4 shows experiments for different weights inside the metric. We vary λ_d and the ratio λ_d/λ_l by the factor 16. Obviously, a larger λ_d ensures a good curve positioning and scaling before starting major deformations. Small λ_d has the reverse effect. The ratio λ_d/λ_l decides whether first the scaling or the positioning is adjusted.

To close this example, Figure 5 shows the active contour segmentation on the widely used cameraman image. Since the image contains rather sharp discontinuities, the derivatives of the objective functional exhibit regions of steep variations. Nevertheless, the NCG and BFGS methods stay superior to gradient descent: While for the top example in Figure 5 the BFGS method needed 46 steps, the NCG iteration needed 2539 and the gradient descent 8325 steps (the iteration was stopped as soon as the derivative of the discretized objective functional reached an l²-norm less than 10^{-2}). The cameraman example also shows the limitations of the above shape space in the context of segmentation problems. Since we only consider closed curves with a well-defined interior and exterior, we can only segment simply connected regions. As soon as the curve self-intersects, we thus have to stop the optimization (compare Figure 5, bottom).
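The following self-contained Python sketch illustrates the curve representation of this section: it projects a discretized pair (e, g) onto St(2, L²(0,1)) and evaluates c(θ) = c_0 + exp(ρ)/2 ∫_0^θ (e + ig)² dϑ; the discretization and all function names are assumptions of this sketch rather than the authors' implementation.

```python
import numpy as np

def project_stiefel(e, g):
    """Project a discretized pair (e, g) onto St(2, L^2(0,1)):
    orthonormalize with respect to the (midpoint-rule) L^2 inner product."""
    N = len(e)
    ip = lambda u, v: np.dot(u, v) / N          # approximates int_0^1 u v dtheta
    e = e / np.sqrt(ip(e, e))
    g = g - ip(e, g) * e
    g = g / np.sqrt(ip(g, g))
    return e, g

def curve_points(c0, rho, e, g):
    """Evaluate c(theta) = c0 + exp(rho)/2 * int_0^theta (e + i g)^2 dtheta."""
    N = len(e)
    z = (e + 1j * g) ** 2                       # integrand (e + i g)^2
    c = c0[0] + 1j * c0[1] + np.exp(rho) / 2 * np.cumsum(z) / N
    return np.column_stack([c.real, c.imag])

# Example: (e, g) chosen so that (e + i g)^2 runs once around the unit circle,
# which yields (up to discretization error) a closed circular curve of length exp(rho).
N = 200
theta = (np.arange(N) + 0.5) / N
e, g = project_stiefel(np.sqrt(2) * np.cos(np.pi * theta),
                       np.sqrt(2) * np.sin(np.pi * theta))
pts = curve_points(np.array([0.5, 0.5]), 0.0, e, g)
print(np.linalg.norm(pts[0] - pts[-1]))         # small -> curve (nearly) closes
```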

Figure 4: Steps 0, 1, 3, 5 of the BFGS optimization, using different weights in the shape space metric (top row: λ_d = 16; bottom row: λ_d = 1; left column: λ_d/λ_l = 16; right column: λ_d/λ_l = 1).

Figure 5: Segmentation of the cameraman image with different parameters (using the BFGS iteration and λ_l = λ_d = 1). Top: (a_1, a_2, a_3) = (50, 3·10^{-1}, 10^{-3}); steps 0, 1, 5, 10, 20, 46 are shown. Middle: (a_1, a_2, a_3) = (50, 8·10^{-2}, 10^{-3}); steps 0, 10, 20, 40, 60, 116 are shown. Bottom: (a_1, a_2, a_3) = (50, 10^{-2}, 10^{-3}); steps 0, 50, 100, 150, 200, 250 are shown. The curves were reparameterized every 70 steps. The bottom iteration was stopped as soon as the curve self-intersected.


Definition 5.1. A vector field v on a manifold M is map M T M such that for all x M, v(x) T x M. 5 Vector fields Last updated: March 12, 2012. 5.1 Definition and general properties We first need to define what a vector field is. Definition 5.1. A vector field v on a manifold M is map M T M such that

More information

Implicit Functions, Curves and Surfaces

Implicit Functions, Curves and Surfaces Chapter 11 Implicit Functions, Curves and Surfaces 11.1 Implicit Function Theorem Motivation. In many problems, objects or quantities of interest can only be described indirectly or implicitly. It is then

More information

Nonlinear equations. Norms for R n. Convergence orders for iterative methods

Nonlinear equations. Norms for R n. Convergence orders for iterative methods Nonlinear equations Norms for R n Assume that X is a vector space. A norm is a mapping X R with x such that for all x, y X, α R x = = x = αx = α x x + y x + y We define the following norms on the vector

More information

Lectures in Discrete Differential Geometry 2 Surfaces

Lectures in Discrete Differential Geometry 2 Surfaces Lectures in Discrete Differential Geometry 2 Surfaces Etienne Vouga February 4, 24 Smooth Surfaces in R 3 In this section we will review some properties of smooth surfaces R 3. We will assume that is parameterized

More information

5 Quasi-Newton Methods

5 Quasi-Newton Methods Unconstrained Convex Optimization 26 5 Quasi-Newton Methods If the Hessian is unavailable... Notation: H = Hessian matrix. B is the approximation of H. C is the approximation of H 1. Problem: Solve min

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

Exercise Solutions to Functional Analysis

Exercise Solutions to Functional Analysis Exercise Solutions to Functional Analysis Note: References refer to M. Schechter, Principles of Functional Analysis Exersize that. Let φ,..., φ n be an orthonormal set in a Hilbert space H. Show n f n

More information

LECTURE 15: COMPLETENESS AND CONVEXITY

LECTURE 15: COMPLETENESS AND CONVEXITY LECTURE 15: COMPLETENESS AND CONVEXITY 1. The Hopf-Rinow Theorem Recall that a Riemannian manifold (M, g) is called geodesically complete if the maximal defining interval of any geodesic is R. On the other

More information

THEODORE VORONOV DIFFERENTIABLE MANIFOLDS. Fall Last updated: November 26, (Under construction.)

THEODORE VORONOV DIFFERENTIABLE MANIFOLDS. Fall Last updated: November 26, (Under construction.) 4 Vector fields Last updated: November 26, 2009. (Under construction.) 4.1 Tangent vectors as derivations After we have introduced topological notions, we can come back to analysis on manifolds. Let M

More information

Introduction to Nonlinear Optimization Paul J. Atzberger

Introduction to Nonlinear Optimization Paul J. Atzberger Introduction to Nonlinear Optimization Paul J. Atzberger Comments should be sent to: atzberg@math.ucsb.edu Introduction We shall discuss in these notes a brief introduction to nonlinear optimization concepts,

More information

4.7 The Levi-Civita connection and parallel transport

4.7 The Levi-Civita connection and parallel transport Classnotes: Geometry & Control of Dynamical Systems, M. Kawski. April 21, 2009 138 4.7 The Levi-Civita connection and parallel transport In the earlier investigation, characterizing the shortest curves

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

Loos Symmetric Cones. Jimmie Lawson Louisiana State University. July, 2018

Loos Symmetric Cones. Jimmie Lawson Louisiana State University. July, 2018 Louisiana State University July, 2018 Dedication I would like to dedicate this talk to Joachim Hilgert, whose 60th birthday we celebrate at this conference and with whom I researched and wrote a big blue

More information

Nonlinear Programming

Nonlinear Programming Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week

More information

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 3. Gradient Method

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 3. Gradient Method Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 3 Gradient Method Shiqian Ma, MAT-258A: Numerical Optimization 2 3.1. Gradient method Classical gradient method: to minimize a differentiable convex

More information

Identifying Active Constraints via Partial Smoothness and Prox-Regularity

Identifying Active Constraints via Partial Smoothness and Prox-Regularity Journal of Convex Analysis Volume 11 (2004), No. 2, 251 266 Identifying Active Constraints via Partial Smoothness and Prox-Regularity W. L. Hare Department of Mathematics, Simon Fraser University, Burnaby,

More information

INVERSE FUNCTION THEOREM and SURFACES IN R n

INVERSE FUNCTION THEOREM and SURFACES IN R n INVERSE FUNCTION THEOREM and SURFACES IN R n Let f C k (U; R n ), with U R n open. Assume df(a) GL(R n ), where a U. The Inverse Function Theorem says there is an open neighborhood V U of a in R n so that

More information

The nonsmooth Newton method on Riemannian manifolds

The nonsmooth Newton method on Riemannian manifolds The nonsmooth Newton method on Riemannian manifolds C. Lageman, U. Helmke, J.H. Manton 1 Introduction Solving nonlinear equations in Euclidean space is a frequently occurring problem in optimization and

More information

(x, y) = d(x, y) = x y.

(x, y) = d(x, y) = x y. 1 Euclidean geometry 1.1 Euclidean space Our story begins with a geometry which will be familiar to all readers, namely the geometry of Euclidean space. In this first chapter we study the Euclidean distance

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Optimal Newton-type methods for nonconvex smooth optimization problems

Optimal Newton-type methods for nonconvex smooth optimization problems Optimal Newton-type methods for nonconvex smooth optimization problems Coralia Cartis, Nicholas I. M. Gould and Philippe L. Toint June 9, 20 Abstract We consider a general class of second-order iterations

More information

8 Numerical methods for unconstrained problems

8 Numerical methods for unconstrained problems 8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields

More information

QUALIFYING EXAMINATION Harvard University Department of Mathematics Tuesday September 21, 2004 (Day 1)

QUALIFYING EXAMINATION Harvard University Department of Mathematics Tuesday September 21, 2004 (Day 1) QUALIFYING EXAMINATION Harvard University Department of Mathematics Tuesday September 21, 2004 (Day 1) Each of the six questions is worth 10 points. 1) Let H be a (real or complex) Hilbert space. We say

More information

Chapter 10 Conjugate Direction Methods

Chapter 10 Conjugate Direction Methods Chapter 10 Conjugate Direction Methods An Introduction to Optimization Spring, 2012 1 Wei-Ta Chu 2012/4/13 Introduction Conjugate direction methods can be viewed as being intermediate between the method

More information

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS MATHEMATICS OF OPERATIONS RESEARCH Vol. 28, No. 4, November 2003, pp. 677 692 Printed in U.S.A. ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS ALEXANDER SHAPIRO We discuss in this paper a class of nonsmooth

More information

Overview of normed linear spaces

Overview of normed linear spaces 20 Chapter 2 Overview of normed linear spaces Starting from this chapter, we begin examining linear spaces with at least one extra structure (topology or geometry). We assume linearity; this is a natural

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

A Riemannian Framework for Denoising Diffusion Tensor Images

A Riemannian Framework for Denoising Diffusion Tensor Images A Riemannian Framework for Denoising Diffusion Tensor Images Manasi Datar No Institute Given Abstract. Diffusion Tensor Imaging (DTI) is a relatively new imaging modality that has been extensively used

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method Lecture 5, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The notion of complexity (per iteration)

More information

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0 Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =

More information

Generalized Newton-Type Method for Energy Formulations in Image Processing

Generalized Newton-Type Method for Energy Formulations in Image Processing Generalized Newton-Type Method for Energy Formulations in Image Processing Leah Bar and Guillermo Sapiro Department of Electrical and Computer Engineering University of Minnesota Outline Optimization in

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

i=1 α i. Given an m-times continuously

i=1 α i. Given an m-times continuously 1 Fundamentals 1.1 Classification and characteristics Let Ω R d, d N, d 2, be an open set and α = (α 1,, α d ) T N d 0, N 0 := N {0}, a multiindex with α := d i=1 α i. Given an m-times continuously differentiable

More information

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Alberto Bressan ) and Khai T. Nguyen ) *) Department of Mathematics, Penn State University **) Department of Mathematics,

More information

1. Geometry of the unit tangent bundle

1. Geometry of the unit tangent bundle 1 1. Geometry of the unit tangent bundle The main reference for this section is [8]. In the following, we consider (M, g) an n-dimensional smooth manifold endowed with a Riemannian metric g. 1.1. Notations

More information

Choice of Riemannian Metrics for Rigid Body Kinematics

Choice of Riemannian Metrics for Rigid Body Kinematics Choice of Riemannian Metrics for Rigid Body Kinematics Miloš Žefran1, Vijay Kumar 1 and Christopher Croke 2 1 General Robotics and Active Sensory Perception (GRASP) Laboratory 2 Department of Mathematics

More information

1 Numerical optimization

1 Numerical optimization Contents 1 Numerical optimization 5 1.1 Optimization of single-variable functions............ 5 1.1.1 Golden Section Search................... 6 1.1. Fibonacci Search...................... 8 1. Algorithms

More information

Convex Optimization. Problem set 2. Due Monday April 26th

Convex Optimization. Problem set 2. Due Monday April 26th Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining

More information

Examination paper for TMA4180 Optimization I

Examination paper for TMA4180 Optimization I Department of Mathematical Sciences Examination paper for TMA4180 Optimization I Academic contact during examination: Phone: Examination date: 26th May 2016 Examination time (from to): 09:00 13:00 Permitted

More information

PICARD S THEOREM STEFAN FRIEDL

PICARD S THEOREM STEFAN FRIEDL PICARD S THEOREM STEFAN FRIEDL Abstract. We give a summary for the proof of Picard s Theorem. The proof is for the most part an excerpt of [F]. 1. Introduction Definition. Let U C be an open subset. A

More information

Line Search Methods for Unconstrained Optimisation

Line Search Methods for Unconstrained Optimisation Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic

More information

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces. Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,

More information

Chapter 4. Unconstrained optimization

Chapter 4. Unconstrained optimization Chapter 4. Unconstrained optimization Version: 28-10-2012 Material: (for details see) Chapter 11 in [FKS] (pp.251-276) A reference e.g. L.11.2 refers to the corresponding Lemma in the book [FKS] PDF-file

More information

Lecture 8. Connections

Lecture 8. Connections Lecture 8. Connections This lecture introduces connections, which are the machinery required to allow differentiation of vector fields. 8.1 Differentiating vector fields. The idea of differentiating vector

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

HILBERT S THEOREM ON THE HYPERBOLIC PLANE

HILBERT S THEOREM ON THE HYPERBOLIC PLANE HILBET S THEOEM ON THE HYPEBOLIC PLANE MATTHEW D. BOWN Abstract. We recount a proof of Hilbert s result that a complete geometric surface of constant negative Gaussian curvature cannot be isometrically

More information

NOTES ON EXISTENCE AND UNIQUENESS THEOREMS FOR ODES

NOTES ON EXISTENCE AND UNIQUENESS THEOREMS FOR ODES NOTES ON EXISTENCE AND UNIQUENESS THEOREMS FOR ODES JONATHAN LUK These notes discuss theorems on the existence, uniqueness and extension of solutions for ODEs. None of these results are original. The proofs

More information

MATH 205C: STATIONARY PHASE LEMMA

MATH 205C: STATIONARY PHASE LEMMA MATH 205C: STATIONARY PHASE LEMMA For ω, consider an integral of the form I(ω) = e iωf(x) u(x) dx, where u Cc (R n ) complex valued, with support in a compact set K, and f C (R n ) real valued. Thus, I(ω)

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

Distances, volumes, and integration

Distances, volumes, and integration Distances, volumes, and integration Scribe: Aric Bartle 1 Local Shape of a Surface A question that we may ask ourselves is what significance does the second fundamental form play in the geometric characteristics

More information

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping.

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. Minimization Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. 1 Minimization A Topological Result. Let S be a topological

More information

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms (February 24, 2017) 08a. Operators on Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2016-17/08a-ops

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

Second Order Elliptic PDE

Second Order Elliptic PDE Second Order Elliptic PDE T. Muthukumar tmk@iitk.ac.in December 16, 2014 Contents 1 A Quick Introduction to PDE 1 2 Classification of Second Order PDE 3 3 Linear Second Order Elliptic Operators 4 4 Periodic

More information

The Steepest Descent Algorithm for Unconstrained Optimization

The Steepest Descent Algorithm for Unconstrained Optimization The Steepest Descent Algorithm for Unconstrained Optimization Robert M. Freund February, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 1 Steepest Descent Algorithm The problem

More information

Pseudo-Poincaré Inequalities and Applications to Sobolev Inequalities

Pseudo-Poincaré Inequalities and Applications to Sobolev Inequalities Pseudo-Poincaré Inequalities and Applications to Sobolev Inequalities Laurent Saloff-Coste Abstract Most smoothing procedures are via averaging. Pseudo-Poincaré inequalities give a basic L p -norm control

More information

Notes on Numerical Optimization

Notes on Numerical Optimization Notes on Numerical Optimization University of Chicago, 2014 Viva Patel October 18, 2014 1 Contents Contents 2 List of Algorithms 4 I Fundamentals of Optimization 5 1 Overview of Numerical Optimization

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

More information

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS Bendikov, A. and Saloff-Coste, L. Osaka J. Math. 4 (5), 677 7 ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS ALEXANDER BENDIKOV and LAURENT SALOFF-COSTE (Received March 4, 4)

More information

X. Linearization and Newton s Method

X. Linearization and Newton s Method 163 X. Linearization and Newton s Method ** linearization ** X, Y nls s, f : G X Y. Given y Y, find z G s.t. fz = y. Since there is no assumption about f being linear, we might as well assume that y =.

More information

Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms

Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms JOTA manuscript No. (will be inserted by the editor) Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms Peter Ochs Jalal Fadili Thomas Brox Received: date / Accepted: date Abstract

More information

Math 6455 Nov 1, Differential Geometry I Fall 2006, Georgia Tech

Math 6455 Nov 1, Differential Geometry I Fall 2006, Georgia Tech Math 6455 Nov 1, 26 1 Differential Geometry I Fall 26, Georgia Tech Lecture Notes 14 Connections Suppose that we have a vector field X on a Riemannian manifold M. How can we measure how much X is changing

More information

Convex Functions and Optimization

Convex Functions and Optimization Chapter 5 Convex Functions and Optimization 5.1 Convex Functions Our next topic is that of convex functions. Again, we will concentrate on the context of a map f : R n R although the situation can be generalized

More information

for all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true

for all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true 3 ohn Nirenberg inequality, Part I A function ϕ L () belongs to the space BMO() if sup ϕ(s) ϕ I I I < for all subintervals I If the same is true for the dyadic subintervals I D only, we will write ϕ BMO

More information

DEVELOPMENT OF MORSE THEORY

DEVELOPMENT OF MORSE THEORY DEVELOPMENT OF MORSE THEORY MATTHEW STEED Abstract. In this paper, we develop Morse theory, which allows us to determine topological information about manifolds using certain real-valued functions defined

More information

Numerical Methods for Large-Scale Nonlinear Systems

Numerical Methods for Large-Scale Nonlinear Systems Numerical Methods for Large-Scale Nonlinear Systems Handouts by Ronald H.W. Hoppe following the monograph P. Deuflhard Newton Methods for Nonlinear Problems Springer, Berlin-Heidelberg-New York, 2004 Num.

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information

Variational Formulations

Variational Formulations Chapter 2 Variational Formulations In this chapter we will derive a variational (or weak) formulation of the elliptic boundary value problem (1.4). We will discuss all fundamental theoretical results that

More information

BERGMAN KERNEL ON COMPACT KÄHLER MANIFOLDS

BERGMAN KERNEL ON COMPACT KÄHLER MANIFOLDS BERGMAN KERNEL ON COMPACT KÄHLER MANIFOLDS SHOO SETO Abstract. These are the notes to an expository talk I plan to give at MGSC on Kähler Geometry aimed for beginning graduate students in hopes to motivate

More information

Open Questions in Quantum Information Geometry

Open Questions in Quantum Information Geometry Open Questions in Quantum Information Geometry M. R. Grasselli Department of Mathematics and Statistics McMaster University Imperial College, June 15, 2005 1. Classical Parametric Information Geometry

More information

Optimization and Root Finding. Kurt Hornik

Optimization and Root Finding. Kurt Hornik Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding

More information

The oblique derivative problem for general elliptic systems in Lipschitz domains

The oblique derivative problem for general elliptic systems in Lipschitz domains M. MITREA The oblique derivative problem for general elliptic systems in Lipschitz domains Let M be a smooth, oriented, connected, compact, boundaryless manifold of real dimension m, and let T M and T

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

Newtonian Mechanics. Chapter Classical space-time

Newtonian Mechanics. Chapter Classical space-time Chapter 1 Newtonian Mechanics In these notes classical mechanics will be viewed as a mathematical model for the description of physical systems consisting of a certain (generally finite) number of particles

More information

Unsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto

Unsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto Unsupervised Learning Techniques 9.520 Class 07, 1 March 2006 Andrea Caponnetto About this class Goal To introduce some methods for unsupervised learning: Gaussian Mixtures, K-Means, ISOMAP, HLLE, Laplacian

More information

Appendix B Convex analysis

Appendix B Convex analysis This version: 28/02/2014 Appendix B Convex analysis In this appendix we review a few basic notions of convexity and related notions that will be important for us at various times. B.1 The Hausdorff distance

More information

Differential Geometry of Surfaces

Differential Geometry of Surfaces Differential Forms Dr. Gye-Seon Lee Differential Geometry of Surfaces Philipp Arras and Ingolf Bischer January 22, 2015 This article is based on [Car94, pp. 77-96]. 1 The Structure Equations of R n Definition

More information

Numerical Algorithms as Dynamical Systems

Numerical Algorithms as Dynamical Systems A Study on Numerical Algorithms as Dynamical Systems Moody Chu North Carolina State University What This Study Is About? To recast many numerical algorithms as special dynamical systems, whence to derive

More information

Exercises in Geometry II University of Bonn, Summer semester 2015 Professor: Prof. Christian Blohmann Assistant: Saskia Voss Sheet 1

Exercises in Geometry II University of Bonn, Summer semester 2015 Professor: Prof. Christian Blohmann Assistant: Saskia Voss Sheet 1 Assistant: Saskia Voss Sheet 1 1. Conformal change of Riemannian metrics [3 points] Let (M, g) be a Riemannian manifold. A conformal change is a nonnegative function λ : M (0, ). Such a function defines

More information

Linear Ordinary Differential Equations

Linear Ordinary Differential Equations MTH.B402; Sect. 1 20180703) 2 Linear Ordinary Differential Equations Preliminaries: Matrix Norms. Denote by M n R) the set of n n matrix with real components, which can be identified the vector space R

More information

Laplace s Equation. Chapter Mean Value Formulas

Laplace s Equation. Chapter Mean Value Formulas Chapter 1 Laplace s Equation Let be an open set in R n. A function u C 2 () is called harmonic in if it satisfies Laplace s equation n (1.1) u := D ii u = 0 in. i=1 A function u C 2 () is called subharmonic

More information

Methods that avoid calculating the Hessian. Nonlinear Optimization; Steepest Descent, Quasi-Newton. Steepest Descent

Methods that avoid calculating the Hessian. Nonlinear Optimization; Steepest Descent, Quasi-Newton. Steepest Descent Nonlinear Optimization Steepest Descent and Niclas Börlin Department of Computing Science Umeå University niclas.borlin@cs.umu.se A disadvantage with the Newton method is that the Hessian has to be derived

More information

The Symmetric Space for SL n (R)

The Symmetric Space for SL n (R) The Symmetric Space for SL n (R) Rich Schwartz November 27, 2013 The purpose of these notes is to discuss the symmetric space X on which SL n (R) acts. Here, as usual, SL n (R) denotes the group of n n

More information

Hilbert spaces. 1. Cauchy-Schwarz-Bunyakowsky inequality

Hilbert spaces. 1. Cauchy-Schwarz-Bunyakowsky inequality (October 29, 2016) Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/fun/notes 2016-17/03 hsp.pdf] Hilbert spaces are

More information