Self-concordant functions for optimization on smooth manifolds


J Glob Optim (2007) 38:437–457
DOI 10.1007/s10898-006-9095-z

ORIGINAL ARTICLE

Self-concordant functions for optimization on smooth manifolds

Danchi Jiang · John B. Moore · Huibo Ji

Received: 8 February 2006 / Accepted: 8 September 2006 / Published online: October 2006
© Springer Science+Business Media B.V. 2006

Abstract This paper discusses self-concordant functions on smooth manifolds. In Euclidean space, such functions are utilized extensively as barrier functions in interior-point methods for polynomial-time optimization algorithms. Here, the self-concordant function is carefully defined on a differentiable manifold in such a way that the properties of self-concordant functions in Euclidean space are preserved. A Newton decrement is defined and analyzed for this class of functions. Based on this, a damped Newton algorithm is proposed for the optimization of self-concordant functions. Under reasonable technical assumptions, such as geodesic completeness of the manifold, this algorithm is guaranteed to fall within any given small neighborhood of the optimal solution in a finite number of steps. The existence and uniqueness of the optimal solution are also proved in this paper; hence, the optimal solution is a global one. Furthermore, the algorithm achieves quadratic convergence within a neighborhood of the minimal point, and this neighborhood can be specified in terms of the Newton decrement. The computational complexity bound of the proposed approach is also given explicitly and is shown to be of the order $O(-\ln(\epsilon))$, where $\epsilon$ is the desired precision. Some interesting optimization problems are given to illustrate the proposed concept and algorithm.

Keywords Self-concordant functions · Polynomial algorithms · Damped Newton method · Geodesics · Covariant differentials

A part of this material was presented at the 2004 Conference on Decision and Control.

D. Jiang (B)
School of Engineering, University of Tasmania, Private Bag 65, Hobart, TAS, Australia
e-mail: Danchi.Jiang@utas.edu.au

J. B. Moore · H. Ji
Dept. of Information Engineering, RSISE, Australian National University, Canberra, ACT, Australia
J. B. Moore e-mail: John.Moore@anu.edu.au
H. Ji e-mail: huibo@rsise.anu.edu.au

1 Introduction

Self-concordant functions play an important role in the powerful interior-point polynomial algorithms for convex programming in Euclidean space. Following the work of Nesterov and Nemirovskii [1], many articles have been published using this type of function to construct barrier functions for interior-point algorithms (see, for example, [2,3]). The idea of interior-point methods is to force the constraints of the optimization problem to be satisfied by adding a barrier penalty function to form a composite cost function. The barrier function is relatively flat in the interior of the feasible region yet approaches infinity near the boundary. As the coefficient of the barrier function in the composite cost converges to zero, the minimal point of the composite cost converges to that of the original minimization problem. Self-concordant functions are a class of functions whose third-order derivative is bounded by the cube of the square root of the second-order derivative, possibly with a constant scaling factor. The main advantage of using such functions as barriers is that the computational complexity of minimizing the resulting composite cost is very low, so that the original minimization problem can be solved in an amount of time that is polynomial in the required precision.

The notion of a self-concordant function has deep roots in geometry. In [4], it is shown that a Riemannian metric can be derived from a self-concordant barrier function. Such a metric gives a good explanation of the optimal direction for optimization algorithms and can therefore provide guidance for the construction of efficient interior-point methods. In this setting, the optimal path follows a geodesic defined by the Riemannian metric; this path is not a straight line in Euclidean space. Indeed, optimization problems in Euclidean space can often be better understood through the structure of appropriately associated Riemannian manifolds (see [5] for many meaningful examples). In fact, many optimization problems can also be better posed on manifolds than in Euclidean space: for example, optimization problems associated with orthogonal matrices, such as algorithms for the computation of eigenvalues and singular values, or Oja's flow in neural-network pattern classification algorithms (see [5–8]), to list a few. On the other hand, optimization methods such as the steepest descent method, the Newton method, and related methods can be extended to Riemannian manifolds (see [5,6,9]). It is natural to ask: what are the self-concordant functions and the associated interior-point methods on manifolds? Such a question can be justified by practical importance as well as by theoretical completeness. The self-concordant concept has been applied in [10], with the analysis restricted to a logarithmic cost function optimized on a manifold; however, polynomial complexity is not available in that case. In this paper, we give a useful definition of a self-concordant function on a manifold and a thorough study of it.

One of the advantages of solving minimization problems on manifolds is, as pointed out in [6], the reduction of the dimension of the problem compared to solving the original problem in its ambient Euclidean space. A typical intrinsic approach to minimization is based on the computation of geodesics and covariant differentials, which might be expensive. However, there are many meaningful cases where the computation is very simple. One example is a real compact semisimple Lie group endowed with its natural Riemannian metric. In such a case the geodesics and parallel transportation can be computed by matrix exponentiation, whose rich

set of nice properties provides convenience in the analysis and development of efficient algorithms (see [6,11]). Many particular classes of minimization problems in this category, such as those on orthogonal groups or Stiefel manifolds, are studied in [5]. Another simple but non-trivial case is the sphere, where geodesics and parallel transportation can be computed via trigonometric functions and vector calculations.

This paper is organized as follows. Notation and assumptions are listed in Sect. 2. In Sect. 3, the self-concordant function is defined so as to preserve as many of the nice properties of the original Euclidean version as possible. To facilitate the analysis and understanding of the proposed damped Newton method, the Newton decrement is defined and analyzed in Sect. 4. Then, the existence and uniqueness of the optimal solution are proved and a damped Newton algorithm is proposed in Sect. 5. It is shown that this algorithm has convergence properties and computational complexity similar to those of the algorithm for self-concordant functions in Euclidean space proposed in [12]. Two interesting examples are included in the last section to illustrate the proposed concept and approach.

2 Assumptions and notation

The notation used in this paper is listed below. Definitions and other details of the concepts and related results are not included; please refer to [11] for more details.

$M$ — a smooth manifold
$T_pM$ — the tangent space of $M$ at the point $p$
$TM$ — the tangent bundle of $M$
$T^*_pM$ — the 1-forms of $M$ at the point $p$
$T^*M$ — the dual bundle of $TM$
$\nabla$ — an affine connection defined on $M$
$\nabla^i_X$ — the $i$th-order covariant differential with respect to the vector field $X$, based on the affine connection $\nabla$, where $i$ is an integer
$\exp_p X$ — the exponential map of the vector field $X$ based on the parallel transform defined using the affine connection $\nabla$
$X_{pq}$ — the tangent vector field of the shortest geodesic connecting the points $p$ and $q$ on $M$, where the geodesic parameter is $0$ at $p$ and $1$ at $q$
$\tau$ — parallel transport on $M$, where the affine connection is determined according to the context

In this paper, our purpose is to show how the self-concordant function can be extended to non-Euclidean spaces. To avoid unnecessary detail, the discussion is not aimed at the most general setting. As such, we make the following assumptions:

Assumption 1
(1) The affine connection $\nabla$ is symmetric, meaning that $\nabla_Y \nabla_X f = \nabla_X \nabla_Y f$ for any vector fields $X$, $Y$ and any function $f$. A very interesting example of a symmetric connection is the Riemannian connection.
(2) The smooth manifold $M$ of interest is geodesically complete. Furthermore, the geodesic between any two points is unique.

3 Self-concordant functions

Let $M$ be a smooth manifold of finite dimension and $\nabla$ a symmetric affine connection defined on it. To focus on the main ideas rather than on the greatest possible generality, assume that $M$ is geodesically complete. Consider a function $f : M \to \mathbb{R}$ defined on $M$ that has an open domain, has a closed map, meaning that $\{(f(p), p) : p \in \mathrm{dom}(f)\}$ is a closed set in the product manifold $\mathbb{R} \times M$, and is at least three times differentiable.

Definition 1 $f$ is a self-concordant function with respect to $\nabla$ if and only if the following condition holds:

  $|\nabla^3_X f(p)| \le M_f\, [\nabla^2_X f(p)]^{3/2}, \quad \forall X \in T_pM,\ p \in M,$  (1)

where $M_f$ is a positive constant associated with $f$.

As noticed in [12], if a function $f$ is self-concordant with the constant $M_f$, then the scaled function $(M_f^2/4) f$ is self-concordant with the constant $2$; this can be checked directly by simple computation. As such, we assume $M_f = 2$ for the rest of this paper. Also notice that the second covariant differential of a self-concordant function is a positive semi-definite mapping, meaning that it is symmetric in its two tangent-vector arguments and its value is always non-negative. For simplicity of the analysis in this paper, we only consider functions that satisfy the following assumption:

Assumption 2 $\nabla^2_X f(p) > 0$ for all $p \in \mathrm{dom}(f)$ and all nonzero $X \in T_pM$.

Then, the second-order covariant differential can be used to define a Dikin-type ellipsoid $W(p; r)$ as follows:

Definition 2 For any $p \in \mathrm{dom}(f)$ and $r > 0$,

  $W(p; r) := \big\{ q \in M : [\nabla^2_{X_{pq}} f(p)]^{1/2} < r \big\},$

where $X_{pq}$ is the vector field defined by the geodesic connecting the points $p$ and $q$.
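In the Euclidean special case the covariant differentials reduce to ordinary derivatives, and (1) reduces to the classical inequality $|f'''(x)| \le M_f (f''(x))^{3/2}$. As a quick illustration (our sketch, not part of the original paper), the following checks this numerically for the one-dimensional barrier $f(x) = -\ln x$, for which the inequality holds with exactly $M_f = 2$, the normalization adopted above:

```python
import numpy as np

# Euclidean special case of Definition 1: f(x) = -ln(x) on (0, inf).
# f''(x) = 1/x^2 and f'''(x) = -2/x^3, hence
# |f'''(x)| / f''(x)^{3/2} = 2 for every x > 0, i.e. M_f = 2.
x = np.linspace(0.1, 10.0, 1000)
f2 = 1.0 / x**2                   # second derivative
f3 = -2.0 / x**3                  # third derivative
ratio = np.abs(f3) / f2**1.5
assert np.allclose(ratio, 2.0)    # the self-concordance constant is exactly 2
print("max |f'''| / f''^{3/2} =", ratio.max())
```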

The definition of a self-concordant function is based on second-order and third-order covariant differentials taken with respect to the same vector field $X$. However, it is equivalent to the version where they are calculated with respect to different vector fields. More specifically, the following property holds:

Property 1 If $f$ is a self-concordant function defined on $M$, then the following inequality holds:

  $|\nabla^3_{X_1 X_2 X_3} f(p)| \le M_f\, [\nabla^2_{X_1} f(p)]^{1/2} [\nabla^2_{X_2} f(p)]^{1/2} [\nabla^2_{X_3} f(p)]^{1/2}, \quad \forall X_1, X_2, X_3 \in T_pM.$  (2)

This property comes from the linearity of the mappings

  $\nabla^2 df : T_pM \times T_pM \to \mathbb{R}$  and  $\nabla^3 df : T_pM \times T_pM \times T_pM \to \mathbb{R},$

defined by $\nabla^2 df(X_1, X_2) := \nabla_{X_1}\nabla_{X_2} f(p)$ and $\nabla^3 df(X_1, X_2, X_3) := \nabla_{X_1}\nabla_{X_2}\nabla_{X_3} f(p)$, respectively, for any $X_1, X_2, X_3 \in T_pM$.

Proof The following proof is inspired by that of Proposition 9.1.1 in [1]. However, it should be pointed out that the trilinearity of the third-order differential is not required in our proof. By assumption, the affine connection is symmetric; therefore, the manifold is torsion free. With a slight misuse of notation, for a given tangent vector $X$ at $p$, we let $X$ also denote the vector field generated by $X$ using parallel transportation in a neighborhood of $p$ in which a geodesic starting from $p$ along the direction $X$ exists uniquely. In this case, it is easy to see that $[X_i, X_j] = 0$ for $i \ne j$. Hence the corresponding second-order covariant differentials are symmetric, meaning

  $\nabla_{X_i}\nabla_{X_j} f(p) = \nabla_{X_j}\nabla_{X_i} f(p), \quad i, j = 1, 2, 3.$

As such, it can be directly checked that

  $\nabla_{X_1}\nabla_{X_2}\nabla_{X_3} f(p) = \nabla_{X_2}\nabla_{X_1}\nabla_{X_3} f(p),$  (3)
  $\nabla_{X_1}\nabla_{X_2}\nabla_{X_3} f(p) = \nabla_{X_1}\nabla_{X_3}\nabla_{X_2} f(p).$

Adopting the reasoning of Proposition 9.1.1 in [1, p. 36], we only need to show that the extremal value of the function

  $W(X_1, X_2, X_3) := \nabla_{X_1}\nabla_{X_2}\nabla_{X_3} f(p), \quad X_i \in T_pM,\ \nabla^2_{X_i} f(p) = 1,\ i = 1, 2, 3,$  (4)

is achieved at a triple of vectors of the form $(Z, Z, Z)$. Assume the contrary, and let $(X, Y, Z)$ be an extremal point of $W$. By Lemma 9.1.1 in [1], there exists a unit vector $X_1$ (unit meaning that the second covariant differential of $f$ in its direction equals $1$) in the direction $X + Y$ such that $(X_1, X_1, Z)$ achieves the extremal value. By the same method as in that lemma, one can show that there exists a unit vector $Y_1$ in the one-dimensional linear space spanned by $X_1 + Z$ such that $(X_1, Y_1, Y_1)$ achieves the extremal value. Using this method iteratively, one constructs a sequence of vector triples in $\mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^n$:

  $\{(X, Y, Z),\ (X_1, X_1, Z),\ (X_1, Y_1, Y_1),\ (X_2, X_2, Y_1),\ (X_2, Y_2, Y_2),\ \ldots\},$

where the middle vector is always in the one-dimensional linear space spanned by the sum of two different vectors of the previous triple, and either the first or the last vector is the same as the middle one. By the assumption that $\nabla^2_X f(p) > 0$ for all tangent vectors $X$, one can show that this sequence converges to a vector triple of the form $(Z, Z, Z)$. Hence the property is proved. □

A self-concordant function also has the following interesting property:

Property 2 For any $p \in \mathrm{dom}(f) \subset M$, $W(p; 1) \subset \mathrm{dom}(f)$.

This property gives a safe bound for line searches along geodesics in optimization problems, so that the search always remains in the admissible domain. We need the following lemma to prove it:

Lemma 1 Given a self-concordant function $f$ defined on a smooth manifold $M$ and a vector field $X$ on $M$, define a function $\varphi(t) : \mathbb{R} \to \mathbb{R}$ as follows:

  $\varphi(t) := \big[\nabla^2_{\tau_{p\,\exp_p(tX)} X} f(\exp_p tX)\big]^{-1/2},$  (5)

where $\tau_{pq}$ is the parallel transportation from the point $p$ to the point $q$ and $\exp_p(X)$ is the exponential map of the vector field $X$ at $p$. Then the following results hold:

(1) $|\varphi'(t)| \le 1$.
(2) If $\varphi(0) > 0$, then $(-\varphi(0), \varphi(0)) \subset \mathrm{dom}(\varphi)$.

Proof It can be calculated that

  $\varphi'(t) = \frac{d}{dt} \big[\nabla^2_{\tau X} f(\exp_p tX)\big]^{-1/2} = -\frac{1}{2}\, \frac{\nabla^3_{\tau X} f(\exp_p tX)}{\big[\nabla^2_{\tau X} f(\exp_p tX)\big]^{3/2}}.$

Claim (1) follows directly from the definition of a self-concordant function.

Assume that claim (2) is not true. Since $\varphi(0) > 0$, the continuity of $\nabla^2_{\tau X} f(\exp_p tX)$ implies that there is a symmetric neighborhood of $0$ inside the domain of definition of $\varphi$. Let $(-\bar t, \bar t)$ denote the largest such symmetric neighborhood. Then at least one of its two end points is not in $\mathrm{dom}(\varphi)$. Without loss of generality, assume $\bar t$ is this point and $\bar t < \varphi(0)$. Because $\varphi(t) \ge \varphi(0) - t$, we have

  $\nabla^2_{\tau X} f(\exp_p tX) \le \frac{1}{(\varphi(0) - t)^2} \le \frac{1}{(\varphi(0) - \bar t)^2} < +\infty, \quad t \in (-\bar t, \bar t).$

Hence,

  $\lim_{t \to \bar t} \nabla^2_{\tau X} f(\exp_p tX) = \nabla^2_{\tau X} f(\exp_p \bar tX) \le [\varphi(0) - \bar t]^{-2} < +\infty.$

The existence of $f(\exp_p \bar tX)$ comes from the assumption that $f$ has a closed map and the fact that

  $f(\exp_p \bar tX) = f(p) + \int_0^{\bar t} \Big( \nabla_X f(p) + \int_0^s \nabla^2_{\tau X} f(\exp_p \nu X)\, d\nu \Big)\, ds < +\infty.$

Therefore $\varphi(\bar t)$ is well defined, which contradicts the assumption we made. As such, (2) holds. □

Now we can prove Property 2.

Proof Notice, from Lemma 1, that for any vector field $X \in TM$,

  $\{\exp_p(tX) : |t| < \varphi(0)\} \subset \mathrm{dom}(f).$

On the other hand,

  $\{\exp_p(tX) : |t| < \varphi(0)\} = \big\{\exp_p(tX) : |t| < [\nabla^2_X f(p)]^{-1/2}\big\} = \big\{\exp_p(tX) : [\nabla^2_{tX} f(p)]^{1/2} < 1\big\} = W(p; 1).$

This completes the proof. □

In the following, two groups of properties are given that reveal the relationship between two different points on a geodesic. They are delicate characteristics of self-concordant functions and are, in fact, the foundation of the polynomial complexity results for self-concordant functions.

Property 3 For any $p, q \in \mathrm{dom}(f)$ such that there is a geodesic contained in the domain of definition of $f$ connecting the points $p$ and $q$, if $f$ is a self-concordant function, the following results hold:

  $[\nabla^2_{X_{pq}(q)} f(q)]^{1/2} \ge \frac{[\nabla^2_{X_{pq}(p)} f(p)]^{1/2}}{1 + [\nabla^2_{X_{pq}(p)} f(p)]^{1/2}},$  (6)

  $\nabla_{X_{pq}(q)} f(q) - \nabla_{X_{pq}(p)} f(p) \ge \frac{\nabla^2_{X_{pq}(p)} f(p)}{1 + [\nabla^2_{X_{pq}(p)} f(p)]^{1/2}},$  (7)

  $f(q) \ge f(p) + \nabla_{X_{pq}(p)} f(p) + [\nabla^2_{X_{pq}(p)} f(p)]^{1/2} - \ln\big(1 + [\nabla^2_{X_{pq}(p)} f(p)]^{1/2}\big).$  (8)

Proof Let $\varphi(t)$ be the function defined in Lemma 1, for which one sees that $\varphi(1) \le \varphi(0) + 1$. This is equivalent to (6), taking into account that $\varphi(0) = [\nabla^2_{X_{pq}(p)} f(p)]^{-1/2}$ and $\varphi(1) = [\nabla^2_{X_{pq}(q)} f(q)]^{-1/2}$. Furthermore,

  $\nabla_{X_{pq}(q)} f(q) - \nabla_{X_{pq}(p)} f(p) = \int_0^1 \nabla^2_{X_{pq}(\exp_p tX_{pq})} f(\exp_p tX_{pq})\, dt = \int_0^1 \frac{1}{t^2}\, \nabla^2_{tX_{pq}(\exp_p tX_{pq})} f(\exp_p tX_{pq})\, dt,$  (9)

which leads to (7) using inequality (6). For inequality (8), notice that

  $f(q) - f(p) - \nabla_{X_{pq}} f(p) = \int_0^1 \big( \nabla_{X_{pq}(\exp_p tX_{pq})} f(\exp_p tX_{pq}) - \nabla_{X_{pq}(p)} f(p) \big)\, dt$
  $= \int_0^1 \frac{1}{t} \big( \nabla_{tX_{pq}(\exp_p tX_{pq})} f(\exp_p tX_{pq}) - \nabla_{tX_{pq}(p)} f(p) \big)\, dt \ge \int_0^1 \frac{1}{t} \cdot \frac{\nabla^2_{tX_{pq}(p)} f(p)}{1 + [\nabla^2_{tX_{pq}(p)} f(p)]^{1/2}}\, dt.$

Let $r = [\nabla^2_{X_{pq}(p)} f(p)]^{1/2}$. The last integral becomes

  $\int_0^1 \frac{t r^2}{1 + t r}\, dt = r - \ln(1 + r),$

which leads to inequality (8) by replacing $r$ with its original form. □
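In the Euclidean special case (straight-line geodesics and ordinary directional derivatives), inequalities (6)–(8) can be spot-checked numerically. The sketch below is ours and assumes $f(x) = -\ln x$ on $(0, \infty)$, so that $\nabla_{X_{pq}} f(p) = f'(p)(q - p)$ and $\nabla^2_{X_{pq}} f(p) = f''(p)(q - p)^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

def check_property3(p, q):
    # Euclidean case, f(x) = -ln(x): f'(x) = -1/x, f''(x) = 1/x^2.
    d = q - p
    r_p = np.sqrt(d * d / p**2)           # [hess at p]^{1/2}
    r_q = np.sqrt(d * d / q**2)           # [hess at q]^{1/2}
    ok6 = r_q >= r_p / (1.0 + r_p) - 1e-12                                  # (6)
    ok7 = (-1/q + 1/p) * d >= r_p**2 / (1.0 + r_p) - 1e-12                  # (7)
    ok8 = (-np.log(q) >=
           -np.log(p) + (-1/p) * d + r_p - np.log(1 + r_p) - 1e-12)         # (8)
    return ok6 and ok7 and ok8

for _ in range(10000):
    p, q = rng.uniform(0.05, 20.0, size=2)
    assert check_property3(p, q)
print("inequalities (6)-(8) hold on all sampled pairs")
```

For this barrier, (6) and (8) hold with equality whenever $q > p$, which makes the check a convenient way to catch sign errors.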

Property 4 For any $p \in \mathrm{dom}(f)$, $q \in W(p; 1)$, and $X \in TM$, the following hold:

  $\big(1 - [\nabla^2_{X_{pq}(p)} f(p)]^{1/2}\big)^2\, \nabla^2_{X(p)} f(p) \le \nabla^2_{X(q)} f(q) \le \frac{\nabla^2_{X(p)} f(p)}{\big(1 - [\nabla^2_{X_{pq}(p)} f(p)]^{1/2}\big)^2},$  (10)

  $\nabla_{X_{pq}(q)} f(q) - \nabla_{X_{pq}(p)} f(p) \le \frac{\nabla^2_{X_{pq}(p)} f(p)}{1 - [\nabla^2_{X_{pq}(p)} f(p)]^{1/2}},$  (11)

  $f(q) \le f(p) + \nabla_{X_{pq}(p)} f(p) - [\nabla^2_{X_{pq}(p)} f(p)]^{1/2} - \ln\big(1 - [\nabla^2_{X_{pq}(p)} f(p)]^{1/2}\big).$  (12)

Proof Let $\psi(t)$ be the function

  $\psi(t) := \nabla^2_{X(\exp_p tX_{pq})} f(\exp_p tX_{pq}).$

Then its derivative satisfies

  $|\psi'(t)| = \big|\nabla_{X_{pq}(\exp_p tX_{pq})} \nabla^2_{X(\exp_p tX_{pq})} f(\exp_p tX_{pq})\big| \le 2\, \big[\nabla^2_{X_{pq}(\exp_p tX_{pq})} f(\exp_p tX_{pq})\big]^{1/2}\, \psi(t)$
  $= \frac{2}{t} \big[\nabla^2_{tX_{pq}(\exp_p tX_{pq})} f(\exp_p tX_{pq})\big]^{1/2}\, \psi(t) \le \frac{2}{t} \cdot \frac{t\, [\nabla^2_{X_{pq}(p)} f(p)]^{1/2}}{1 - t\, [\nabla^2_{X_{pq}(p)} f(p)]^{1/2}}\, \psi(t) = \frac{2\, [\nabla^2_{X_{pq}(p)} f(p)]^{1/2}}{1 - t\, [\nabla^2_{X_{pq}(p)} f(p)]^{1/2}}\, \psi(t).$  (13)

Here the last step is obtained by applying $\varphi(t) \ge \varphi(0) - t$ from Lemma 1. Integrating both sides of inequality (13), one obtains

  $\big(1 - [\nabla^2_{X_{pq}(p)} f(p)]^{1/2}\big)^2 \le \frac{\psi(1)}{\psi(0)} \le \big(1 - [\nabla^2_{X_{pq}(p)} f(p)]^{1/2}\big)^{-2},$

which is equivalent to inequality (10). Combining inequality (10) and formula (9), one obtains

  $\nabla_{X_{pq}(q)} f(q) - \nabla_{X_{pq}(p)} f(p) \le \int_0^1 \frac{\nabla^2_{X_{pq}(p)} f(p)}{\big(1 - t\, [\nabla^2_{X_{pq}(p)} f(p)]^{1/2}\big)^2}\, dt = \frac{\nabla^2_{X_{pq}(p)} f(p)}{1 - [\nabla^2_{X_{pq}(p)} f(p)]^{1/2}},$

which proves inequality (11). Combining this result and using the same technique as in the proof of the last property, there holds

  $f(q) - f(p) - \nabla_{X_{pq}} f(p) = \int_0^1 \big( \nabla_{X_{pq}(\exp_p tX_{pq})} f(\exp_p tX_{pq}) - \nabla_{X_{pq}} f(p) \big)\, dt$
  $= \int_0^1 \frac{1}{t} \big( \nabla_{tX_{pq}(\exp_p tX_{pq})} f(\exp_p tX_{pq}) - \nabla_{tX_{pq}} f(p) \big)\, dt \le \int_0^1 \frac{1}{t} \cdot \frac{\nabla^2_{tX_{pq}} f(p)}{1 - [\nabla^2_{tX_{pq}} f(p)]^{1/2}}\, dt$
  $= -[\nabla^2_{X_{pq}(p)} f(p)]^{1/2} - \ln\big(1 - [\nabla^2_{X_{pq}(p)} f(p)]^{1/2}\big).$

As such, inequality (12) is obtained by a simple transformation of this inequality. □

4 Newton decrement

Consider the following auxiliary quadratic cost defined on $T_pM$:

  $N_{f,p}(X) := f(p) + \nabla_X f(p) + \tfrac{1}{2}\, \nabla^2_X f(p).$  (14)

Definition 3 The Newton decrement $X_N(f, p)$ is defined as the minimal solution to the auxiliary cost function given by (14). More specifically,

  $X_N(f, p) := \arg\min_{X \in T_pM} N_{f,p}(X).$  (15)

Similar to the case in Euclidean space, the Newton decrement can be characterized in many ways. The following theorem summarizes its properties.

Theorem 1 Let $f : M \to \mathbb{R}$ be a self-concordant function, $p$ a given point in $\mathrm{dom}(f) \subset M$, and $X_N$ its Newton decrement defined at $p$. The following results hold:

  $\nabla_{X_N} \nabla_X f(p) = -\nabla_X f(p), \quad \forall X \in T_pM,$  (16)

  $[\nabla^2_{X_N} f(p)]^{1/2} = \max\big\{ \nabla_X f(p) : X \in T_pM,\ \nabla^2_X f(p) \le 1 \big\}.$  (17)

Furthermore, if the quadratic form $\nabla^2\mu : T_pM \to T^*_pM$, defined by $(\nabla^2\mu)(X) := \nabla^2_X \mu$ for a given $\mu \in T^*_pM$, is of full rank, then

  $X_N = -(\nabla^2 df)^{-1}\, df.$  (18)

Proof Since $p$ is a given point on the manifold $M$, the claimed results can be converted into their local representations in Euclidean space. More specifically, consider the following quadratic function:

  $q(x) := \tfrac{1}{2} x^T A x + b^T x + c, \quad \text{where } A \in \mathbb{R}^{n \times n},\ A = A^T,\ b \in \mathbb{R}^n,\ c \in \mathbb{R}.$  (19)

Let $x^*$ denote the optimal point. Then the gradient of $q$ at $x^*$ must be the zero vector, i.e.,

  $y^T (A x^* + b) = 0, \quad \forall y \in \mathbb{R}^n.$

This is the local representation of (16). The case where $\nabla^2\mu$ is an isomorphism corresponds to the case where $A$ is non-degenerate. Then $x^* = -A^{-1} b$, which is the local representation of (18). On the other hand,

  $y^T b = -y^T A x^* = -(A^{1/2} y)^T (A^{1/2} x^*) \le (y^T A y)^{1/2} (x^{*T} A x^*)^{1/2},$

where equality holds if and only if $y$ is a non-negative multiple of $-x^*$. As such,

  $\max\{ y^T b : y \in \mathbb{R}^n,\ y^T A y \le 1 \} = \max_{y \ne 0} \frac{y^T b}{(y^T A y)^{1/2}} = (x^{*T} A x^*)^{1/2}.$

This is the local representation of (17). Therefore, the proof is complete. □
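In local coordinates, Theorem 1 is the familiar statement that the minimizer of the quadratic model (19) is $x^* = -A^{-1}b$ (cf. (18)) and that $x^{*T} A x^* = b^T A^{-1} b$ is the square of the maximum in (17), which is also the square of the index $\lambda_f(p)$ defined in Sect. 5 below. A minimal numerical sketch of ours:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
B = rng.normal(size=(n, n))
A = B @ B.T + n * np.eye(n)        # symmetric positive definite "Hessian"
b = rng.normal(size=n)             # "gradient", local representation of df

x_star = -np.linalg.solve(A, b)    # local representation of (18)
lam_sq = x_star @ A @ x_star       # = b^T A^{-1} b, the squared value in (17)

# (17): lam_sq equals the max over y != 0 of (y^T b)^2 / (y^T A y).
vals = [(y @ b) ** 2 / (y @ A @ y)
        for y in rng.normal(size=(10000, n))]
assert max(vals) <= lam_sq + 1e-9  # sampled values never exceed the maximum
print(lam_sq, max(vals))           # tight as y approaches a multiple of x_star
```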

5 A damped Newton algorithm for self-concordant functions

Consider now the minimization problem for self-concordant functions on a smooth manifold. First, let us establish the existence of the minimal point.

Theorem 2 Let $\lambda_f(p)$ be defined as follows:

  $\lambda_f(p) := \max_{X \in T_pM} \frac{|\nabla_X f(p)|}{[\nabla^2_X f(p)]^{1/2}} \quad \text{for } p \in \mathrm{dom}(f).$  (20)

If $\lambda_f(p) < 1$ for some $p \in \mathrm{dom}(f)$, then there exists a unique point $p^*_f \in \mathrm{dom}(f)$ such that $f(p^*_f) = \min\{ f(p) : p \in \mathrm{dom}(f) \}$.

Proof Let $p_0$ be a point such that $\lambda_f(p_0) < 1$. For any $q \in \mathrm{dom}(f)$ such that $f(q) \le f(p_0)$, from (8) we have

  $0 \ge f(q) - f(p_0) \ge -\lambda_f(p_0)\, [\nabla^2_{X_{p_0 q}} f(p_0)]^{1/2} + [\nabla^2_{X_{p_0 q}} f(p_0)]^{1/2} - \ln\big(1 + [\nabla^2_{X_{p_0 q}} f(p_0)]^{1/2}\big).$

Hence,

  $1 - \frac{\ln\big(1 + [\nabla^2_{X_{p_0 q}} f(p_0)]^{1/2}\big)}{[\nabla^2_{X_{p_0 q}} f(p_0)]^{1/2}} \le \lambda_f(p_0) < 1.$

Since

  $\lim_{t \to +\infty} \frac{\ln(1 + t)}{t} = 0,$

there exists a constant $c > 0$ such that

  $[\nabla^2_{X_{p_0 q}} f(p_0)]^{1/2} \le c.$  (21)

Hence the vectors $X_{p_0 q}$ are contained in the compact set defined by inequality (21). Consider the map taking a tangent vector $X$ to the endpoint $\exp_{p_0}(X)$ of its geodesic. This map is continuous by the definition of geodesics, so the image of the compact set defined by inequality (21) is also compact. Therefore, a minimal point exists.

On the other hand, let $p^*$ denote a minimal point, at which $\nabla_{X_{p^* q}} f(p^*) = 0$. Then, by (8),

  $f(q) \ge f(p^*) + [\nabla^2_{X_{p^* q}} f(p^*)]^{1/2} - \ln\big(1 + [\nabla^2_{X_{p^* q}} f(p^*)]^{1/2}\big) > f(p^*), \quad \forall q \in \mathrm{dom}(f),\ q \ne p^*.$  (22)

The uniqueness is proved. □

Consider the following damped Newton method:

Algorithm 1 (Damped Newton algorithm)
Step 0: Find a feasible point $p_0 \in \mathrm{dom}(f)$.
Step k: $p_{k+1} = \exp_{p_k}\Big( \dfrac{1}{1 + \lambda_f(p_k)}\, X_N \Big)$, where $\exp_{p_k}(tX_N)$ is the exponential map of the Newton decrement at $p_k$.
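In code, Algorithm 1 is a short loop once the manifold supplies two routines: one returning the Newton decrement $X_N$ together with $\lambda_f(p_k)$ at the current iterate, and one evaluating the exponential map. The following generic sketch is ours; `newton_step` and `exp_map` are hypothetical callables to be provided per manifold, as is done concretely in the examples of Sect. 6:

```python
def damped_newton(p0, newton_step, exp_map, tol=1e-10, max_iter=100):
    """Algorithm 1: p_{k+1} = exp_{p_k}( X_N / (1 + lambda_f(p_k)) ).

    newton_step(p) -> (X_N, lam): Newton decrement at p and lambda_f(p);
    exp_map(p, X)  -> the exponential map exp_p(X).
    Both are manifold-specific and must be supplied by the caller.
    """
    p = p0
    for _ in range(max_iter):
        X_N, lam = newton_step(p)
        if lam < tol:                      # lambda_f(p) bounds suboptimality
            break
        p = exp_map(p, X_N / (1.0 + lam))  # damped step stays inside W(p; 1)
    return p
```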

The following theorem establishes the convergence properties of the proposed damped Newton algorithm.

Theorem 3 Let the minimal point of $f(p)$ be denoted $p^*$, and let $p$ be any admissible point in $W(p^*; 1)$.

(1) The following inequality holds:

  $[\nabla^2_{X_{pp^*}} f(p)]^{1/2} \le \frac{\lambda_f(p)}{1 - \lambda_f(p)}.$  (23)

(2) If $\lambda_f(p) < 1$, then

  $f(p) - f(p^*) \le -\lambda_f(p) - \ln\big(1 - \lambda_f(p)\big).$  (24)

(3) For the proposed damped Newton algorithm, there holds

  $f(p_{k+1}) \le f(p_k) - \big( \lambda_f(p_k) - \ln(1 + \lambda_f(p_k)) \big).$  (25)

Proof
(1) Let $[\nabla^2_{X_{pp^*}} f(p)]^{1/2}$ be denoted $r(p)$. In view of (7) and the fact that $\nabla_{X_{pp^*}} f(p^*) = 0$, we have

  $-\nabla_{X_{pp^*}} f(p) \ge \frac{r^2(p)}{1 + r(p)}.$  (26)

On the other hand,

  $|\nabla_{X_{pp^*}} f(p)| \le \lambda_f(p)\, r(p)$

by the definition of $\lambda_f(p)$. Therefore,

  $\frac{r^2(p)}{1 + r(p)} \le \lambda_f(p)\, r(p),$

from which $r(p)$ can be bounded as

  $r(p) \le \frac{\lambda_f(p)}{1 - \lambda_f(p)},$

which is (23).

(2) Based on (8) and inequality (23) obtained above, one has

  $f(p^*) - f(p) \ge \nabla_{X_{pp^*}} f(p) + r(p) - \ln(1 + r(p)) \ge r(p) - \ln(1 + r(p)) - \lambda_f(p)\, r(p).$  (27)

Let an auxiliary function $g(x, y)$ be defined as

  $g(x, y) = x - \ln(1 + x) - xy - y - \ln(1 - y), \quad x \ge 0,\ 0 \le y < 1.$

It can easily be checked that

  $g(x, 0) = x - \ln(1 + x) \ge 0$  and  $g(0, y) = -y - \ln(1 - y) \ge 0.$

If there were a point $(x_1, y_1)$ with $g(x_1, y_1) < 0$, the function would attain a negative minimum at an interior point $(x_0, y_0)$, where the gradient must vanish. It can be calculated that

  $\frac{\partial g}{\partial x}(x_0, y_0) = 1 - \frac{1}{1 + x_0} - y_0 = 0,$
  $\frac{\partial g}{\partial y}(x_0, y_0) = -x_0 - 1 + \frac{1}{1 - y_0} = 0.$

The solutions of this system of equations satisfy $(1 + x_0)(1 - y_0) = 1$. As such, at any such critical point there holds (the logarithmic terms cancel, since $\ln(1 + x_0) + \ln(1 - y_0) = 0$):

  $g(x_0, y_0) = x_0 - x_0 y_0 - y_0 = x_0 (1 - y_0) - y_0 = 0,$

which contradicts $g(x_0, y_0) < 0$. Therefore the minimum is achieved at the boundary, where $g \ge 0$; hence $g(x, y) \ge 0$ for all $x \ge 0$, $0 \le y < 1$. Applying this inequality to (27) with $x = r(p)$ and $y = \lambda_f(p)$, we obtain inequality (24).

(3) It is clear that $p_{k+1} \in W(p_k; 1)$, since

  $\Big[ \nabla^2_{\frac{1}{1+\lambda_f(p_k)} X_N} f(p_k) \Big]^{1/2} = \frac{\lambda_f(p_k)}{1 + \lambda_f(p_k)} < 1.$

Applying (12), there holds

  $f(p_{k+1}) \le f(p_k) + \frac{\nabla_{X_N} f(p_k)}{1 + \lambda_f(p_k)} - \frac{\lambda_f(p_k)}{1 + \lambda_f(p_k)} - \ln\Big( 1 - \frac{\lambda_f(p_k)}{1 + \lambda_f(p_k)} \Big) = f(p_k) - \lambda_f(p_k) + \ln\big(1 + \lambda_f(p_k)\big)$

by the definition of $\lambda_f(p_k)$ and the results in Theorem 1 (in particular, $\nabla_{X_N} f(p_k) = -\nabla^2_{X_N} f(p_k) = -\lambda_f(p_k)^2$ by (16) and (17)). Therefore, inequality (25) is proved. □

Notice that the two functions $\lambda - \ln(1 + \lambda)$, $\lambda \in [0, +\infty)$, and $-\lambda - \ln(1 - \lambda)$, $\lambda \in [0, 1)$, are positive and monotonically increasing. The results proved in Theorem 3 already give a set of error bounds for the function value $f(p)$ and an estimate of the variable point $p$ based on $\lambda_f(p)$. More specifically, inequality (25) implies the following results:

Corollary 1 For the damped Newton algorithm, $\lambda_f(p_k)$ is bounded as follows:

  $\lambda_f(p_k) - \ln\big(1 + \lambda_f(p_k)\big) \le f(p_k) - f(p^*).$  (28)

Furthermore, for a given precision $\epsilon > 0$, the number of iterations $N$ required so that $\lambda_f(p_N) < \epsilon$ is less than or equal to

  $\frac{f(p_0) - f(p^*)}{\epsilon - \ln(1 + \epsilon)}.$
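To get a feel for the numbers in Corollary 1: with an initial gap $f(p_0) - f(p^*) = 1$ and target precision $\epsilon = 0.1$ (illustrative values of ours), the bound evaluates to about 214 iterations:

```python
import math

gap, eps = 1.0, 0.1
bound = gap / (eps - math.log1p(eps))   # Corollary 1 iteration bound
print(math.ceil(bound))                 # -> 214 steps suffice for lambda_f < 0.1
```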

For the convergence rate, the following theorem reveals the quadratic convergence and the computational complexity of the damped Newton algorithm proposed in this paper.

Theorem 4 For the damped Newton algorithm proposed in this paper, the following result holds:

  $\lambda_f(p_{k+1}) \le 2\, \lambda_f(p_k)^2.$  (29)

Proof For a vector field $X \in TM$, construct an auxiliary function $\psi(t)$ as follows:

  $\psi(t) := \nabla_X f\big(\exp_{p_k}(t X_N(p_k))\big), \quad t \in \Big[0, \frac{1}{1 + \lambda_f(p_k)}\Big].$  (30)

Now, with a slight misuse of notation that ignores the vector-field adaptation:

  $\psi'(t) = \nabla_{X_N(p_k)} \nabla_X f\big(\exp_{p_k}(t X_N(p_k))\big),$
  $|\psi''(t)| \le 2\, \nabla^2_{X_N} f\big(\exp_{p_k}(t X_N(p_k))\big)\, \big[\nabla^2_X f\big(\exp_{p_k}(t X_N(p_k))\big)\big]^{1/2}.$

The last inequality is obtained in view of Property 1. Notice that

  $\nabla^2_{X_N} f\big(\exp_{p_k}(t X_N(p_k))\big) = \frac{1}{t^2}\, \nabla^2_{t X_N} f\big(\exp_{p_k}(t X_N(p_k))\big) \le \frac{1}{t^2} \cdot \frac{\nabla^2_{t X_N} f(p_k)}{\big(1 - t\, [\nabla^2_{X_N} f(p_k)]^{1/2}\big)^2} = \frac{\lambda_f(p_k)^2}{\big(1 - t\, \lambda_f(p_k)\big)^2}$  (31)

by applying inequality (10) and the fact that $p_{k+1} \in W(p_k; 1)$. Then, applying inequality (10) once more, we have

  $|\psi''(t)| \le \frac{2\, \lambda_f(p_k)^2}{(1 - t\, \lambda_f(p_k))^2}\, \big[\nabla^2_X f\big(\exp_{p_k}(t X_N(p_k))\big)\big]^{1/2} \le \frac{2\, \lambda_f(p_k)^2}{(1 - t\, \lambda_f(p_k))^2} \cdot \frac{[\nabla^2_X f(p_k)]^{1/2}}{1 - t\, \lambda_f(p_k)} = \frac{2\, \lambda_f(p_k)^2}{(1 - t\, \lambda_f(p_k))^3}\, [\nabla^2_X f(p_k)]^{1/2}.$

Let $\bar r = \frac{1}{1 + \lambda_f(p_k)}$. Then

  $\psi(\bar r) = \psi(0) + \int_0^{\bar r} \psi'(\tau)\, d\tau = \psi(0) + \int_0^{\bar r} \Big( \psi'(0) + \int_0^{\tau} \psi''(t)\, dt \Big)\, d\tau$
  $\le \nabla_X f(p_k) + \bar r\, \nabla_{X_N} \nabla_X f(p_k) + [\nabla^2_X f(p_k)]^{1/2} \int_0^{\bar r} \int_0^{\tau} \frac{2\, \lambda_f(p_k)^2}{(1 - t\, \lambda_f(p_k))^3}\, dt\, d\tau$
  $= \nabla_X f(p_k) + \bar r\, \nabla_{X_N} \nabla_X f(p_k) + [\nabla^2_X f(p_k)]^{1/2}\, \frac{\lambda_f(p_k)^2\, \bar r^2}{1 - \lambda_f(p_k)\, \bar r}.$

Using (16) for the Newton decrement from Theorem 1, the last inequality yields

  $\psi(\bar r) \le (1 - \bar r)\, \nabla_X f(p_k) + [\nabla^2_X f(p_k)]^{1/2}\, \frac{\lambda_f(p_k)^2\, \bar r^2}{1 - \lambda_f(p_k)\, \bar r} \le (1 - \bar r)\, \lambda_f(p_k)\, [\nabla^2_X f(p_k)]^{1/2} + [\nabla^2_X f(p_k)]^{1/2}\, \frac{\lambda_f(p_k)^2\, \bar r^2}{1 - \lambda_f(p_k)\, \bar r} = \frac{2\, \lambda_f(p_k)^2}{1 + \lambda_f(p_k)}\, [\nabla^2_X f(p_k)]^{1/2},$

where $[\nabla^2_X f(p_k)]^{1/2}$ can be estimated by the left part of inequality (10). That is,

  $\psi(\bar r) \le \frac{2\, \lambda_f(p_k)^2}{1 + \lambda_f(p_k)} \cdot \frac{1}{1 - \bar r\, \lambda_f(p_k)}\, \big[\nabla^2_X f\big(\exp_{p_k}(\bar r\, X_N(p_k))\big)\big]^{1/2} = 2\, \lambda_f(p_k)^2\, \big[\nabla^2_X f\big(\exp_{p_k}(\bar r\, X_N(p_k))\big)\big]^{1/2}$

(applying the same bound to $-X$ bounds $|\psi(\bar r)|$). Therefore,

  $\lambda_f(p_{k+1}) = \max_{X \in T_{p_{k+1}}M} \frac{|\nabla_X f(p_{k+1})|}{[\nabla^2_X f(p_{k+1})]^{1/2}} = \max_{X \in T_{p_{k+1}}M} \frac{|\psi(\bar r)|}{\big[\nabla^2_X f\big(\exp_{p_k}(\bar r\, X_N(p_k))\big)\big]^{1/2}} \le 2\, \lambda_f(p_k)^2.$

The proof is complete. □

It is clear that $\lambda_f(p_{k+1}) < \lambda_f(p_k)$ if $\lambda_f(p_k) < 1/2$. It can also be proved by simple analysis that

  $-\lambda - \ln(1 - \lambda) < \lambda^2, \quad \lambda \in (0, 1/2).$
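To see concretely how fast the recursion (29) drives $\lambda_f$ to zero once $\lambda_f(p_k) < 1/2$, one can iterate $\lambda \leftarrow 2\lambda^2$ from just below the threshold (a sketch of ours):

```python
lam = 0.49
for k in range(10):
    lam = 2.0 * lam * lam    # worst case allowed by (29)
    print(k + 1, lam)
# 0.4802, 0.4612, 0.4254, 0.3619, 0.2619, 0.1372,
# 3.77e-02, 2.84e-03, 1.61e-05, 5.2e-10
```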

Remark 1 Under the condition that $\lambda_f(p) < 1$ for all $p \in M$ and starting from an admissible initial point, the convergence results for the proposed damped Newton algorithm can be summarized as follows:

(1) After not more than $\dfrac{f(p_0) - f(p^*)}{1/2 - \ln(3/2)}$ steps, $\lambda_f(p_k)$ falls into the interval $(0, 1/2)$.
(2) $\lambda_f(p_k)$ converges monotonically and quadratically to zero starting from any point $p_k$ such that $\lambda_f(p_k) \in (0, 1/2)$.
(3) $[\nabla^2_{X_{pp^*}} f(p)]^{1/2} \le 2\, \lambda_f(p)$, for $\lambda_f(p) \in (0, 1/2)$.
(4) $f(p) - f(p^*) \le \lambda_f(p)^2$, for $\lambda_f(p) \in (0, 1/2)$.
(5) For a given precision $\epsilon \in (0, 1/2)$, the maximal number of iterations needed so that $\lambda_f(p_k) \le \epsilon$ is not more than

  $\min\Big\{ \frac{f(p_0) - f(p^*)}{\epsilon - \ln(1 + \epsilon)},\ \min_{\alpha \in (0, 1/2)} \Big( \frac{f(p_0) - f(p^*)}{\alpha - \ln(1 + \alpha)} + \log_2 \frac{\ln(2\epsilon)}{\ln(2\alpha)} \Big) \Big\},$

where the first option counts damped steps only and the second combines the damped phase (down to $\lambda_f \le \alpha$) with the quadratic phase governed by (29). This amount is of the order $O(-\ln(\epsilon))$.

6 Illustrative examples

This section includes two simple examples to illustrate the proposed damped Newton method.

Example 1 Consider the following optimization problem:

  minimize $f(x) := x_1 + x_2$,
  subject to $x_1, x_2 > 0$, $x = (x_1, x_2) \in H$,

where $H$ is the hyperbola defined by $x_1 x_2 = 1$. The Riemannian metric is defined as the metric induced from the ambient Euclidean space. Let $T_xH$ be the tangent space of $H$ at $x$, i.e., $T_xH = \{\alpha h : h = (x_1, -x_2),\ \alpha \in \mathbb{R}\}$. Then the geodesic on the hyperbola can be calculated as

  $\exp_x(tX) = A x,$  (32)

where $A = \mathrm{diag}(e^{\alpha t}, e^{-\alpha t})$ and $X = \alpha h \in T_xH$. Hence, the covariant differentials can be calculated as follows:

  $\nabla_X f(x) = (x_1 - x_2)\, \alpha,$
  $\nabla^2_X f(x) = (x_1 + x_2)\, \alpha^2,$
  $\nabla^3_X f(x) = (x_1 - x_2)\, \alpha^3.$

It can be seen that $\nabla^2_X f(x)$ is positive definite. Then

  $\frac{(\nabla^3_X f(x))^2}{(\nabla^2_X f(x))^3} = \frac{(x_1 - x_2)^2}{(x_1 + x_2)^3} \le \frac{1}{x_1 + x_2} \le \frac{1}{2} \le 4,$  (33)

using $(x_1 - x_2)^2 \le (x_1 + x_2)^2$ and $x_1 + x_2 \ge 2\sqrt{x_1 x_2} = 2$.

As such, $f(x)$ is a self-concordant function (with $M_f = 2$). Now we can apply the proposed damped Newton algorithm.

Algorithm 2 (Damped Newton algorithm for Example 1)
Step 0: Randomly generate a feasible initial point $x_0$.
Step k: Calculate the $k$th step according to

  $x_{k+1} = \exp_{x_k}\Big( \frac{1}{1 + \lambda(x_k)}\, X_N \Big) = A_k x_k,$

where

  $A_k = \mathrm{diag}\big( e^{\alpha_k/(1 + \lambda(x_k))},\ e^{-\alpha_k/(1 + \lambda(x_k))} \big),$
  $\lambda(x_k) = \big[ (x_{k,1} + x_{k,2})\, \alpha_k^2 \big]^{1/2},$
  $X_N = \alpha_k\, (x_{k,1}, -x_{k,2})^T, \quad \alpha_k = \frac{x_{k,2} - x_{k,1}}{x_{k,1} + x_{k,2}}.$

The simulation result is shown in Table 1.

Table 1 The simulation result for Example 1: the step $k$, the iterate $x_k$, the cost $f(x_k)$, and the decrement $\lambda(x_k)$ over twelve damped Newton steps, converging to the optimum $x = (1, 1)$ with $f(x) = 2$ and $\lambda(x) = 0$.
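A direct implementation of Algorithm 2 (our sketch; the initial point is our own choice, so the printed trajectory will not reproduce Table 1 digit for digit) converges to the optimum $x = (1, 1)$, $f(x) = 2$:

```python
import numpy as np

def damped_newton_hyperbola(x1, x2, iters=12):
    # Minimize x1 + x2 on the hyperbola x1 * x2 = 1 (Example 1).
    for k in range(iters):
        alpha = (x2 - x1) / (x1 + x2)            # Newton coefficient alpha_k
        lam = np.sqrt((x1 + x2) * alpha**2)      # lambda(x_k)
        t = alpha / (1.0 + lam)                  # damped step
        x1, x2 = x1 * np.exp(t), x2 * np.exp(-t) # x_{k+1} = A_k x_k, stays on H
        print(k, x1, x2, x1 + x2, lam)
    return x1, x2

damped_newton_hyperbola(6.0, 1.0 / 6.0)          # converges to (1, 1), f = 2
```

Note that the update multiplies the coordinates by $e^{\pm t}$, so the constraint $x_1 x_2 = 1$ is preserved exactly at every step.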

Example 2 Consider the following optimization problem:

  min $f(x) := -\ln x_1 - \cdots - \ln x_n$,
  subject to $x = (x_1, \ldots, x_n) \in S^{n-1}$, $0 < x_1, \ldots, x_n < 1$,  (34)

where $S^{n-1}$ is the unit sphere, $x^T x = 1$. Here, we define the Riemannian metric as the metric induced from the ambient Euclidean space, i.e., $\langle y, z \rangle = y^T z$ for $y, z \in T_x S^{n-1}$. Let $x \in S^{n-1}$ and let $h = (h_1, h_2, \ldots, h_n) \in T_x S^{n-1}$ have unit length (if not, we can normalize it). Then the geodesic on the sphere is

  $\exp_x(th) = x \cos(t) + h \sin(t),$

and the parallel transportation along this geodesic is

  $\tau h = h \cos(t) - x \sin(t).$

Therefore, the following covariant differentials can be calculated:

  $\nabla_h f(x) = -\frac{h_1}{x_1} - \frac{h_2}{x_2} - \cdots - \frac{h_n}{x_n},$
  $\nabla^2_h f(x) = \frac{h_1^2}{x_1^2} + \frac{h_2^2}{x_2^2} + \cdots + \frac{h_n^2}{x_n^2} + n,$
  $\nabla^3_h f(x) = -2\Big( \frac{h_1^3}{x_1^3} + \frac{h_1}{x_1} + \frac{h_2^3}{x_2^3} + \frac{h_2}{x_2} + \cdots + \frac{h_n^3}{x_n^3} + \frac{h_n}{x_n} \Big).$

The following procedure proves that $f$ is a self-concordant function on $S^{n-1}$. It can easily be checked that for any $h \in T_x S^{n-1}$, the second covariant differential $\nabla^2_h f(x)$ is always positive. Let

  $y_i = \frac{h_i}{x_i},\ i = 1, \ldots, n, \quad y = (y_1, y_2, \ldots, y_n)^T, \quad b = \big( y_1^2 + 1,\ y_2^2 + 1,\ \ldots,\ y_n^2 + 1 \big)^T.$

Then we have

  $\frac{(\nabla^3_h f(x))^2}{(\nabla^2_h f(x))^3} = \frac{4\, (y_1^3 + \cdots + y_n^3 + y_1 + \cdots + y_n)^2}{(y_1^2 + \cdots + y_n^2 + n)^3} = \frac{4\, \big( y_1(y_1^2 + 1) + \cdots + y_n(y_n^2 + 1) \big)^2}{\big( (y_1^2 + 1) + \cdots + (y_n^2 + 1) \big)^3}$
  $= \frac{4\, (y^T b)^2}{\big( (y_1^2 + 1) + \cdots + (y_n^2 + 1) \big)^3} \le \frac{4\, \|y\|^2\, \|b\|^2}{\big( (y_1^2 + 1) + \cdots + (y_n^2 + 1) \big)^3} \le \frac{4\, (y_1^2 + \cdots + y_n^2)}{(y_1^2 + 1) + \cdots + (y_n^2 + 1)} \le 4,$  (35)

where the first inequality is Cauchy–Schwarz and the second uses $\|b\|^2 \le \big( (y_1^2 + 1) + \cdots + (y_n^2 + 1) \big)^2$. Therefore, $f$ is a self-concordant function defined on $S^{n-1}$, with $M_f = 2$.
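The damped Newton step for this example can be sketched as follows (our code; $n$, the initial point, and the use of a Lagrange multiplier to solve the tangent-space Newton system are our choices). By the covariant differentials above, the quadratic model (14) on $T_x S^{n-1}$ has linear term $-u^T h$ with $u_i = 1/x_i$ and diagonal Hessian $M = \mathrm{diag}(1/x_i^2) + I$, minimized subject to the tangency constraint $h^T x = 0$:

```python
import numpy as np

def damped_newton_sphere(x, iters=15):
    # Minimize f(x) = -sum(ln x_i) on the unit sphere (Example 2).
    for k in range(iters):
        u = 1.0 / x
        m = u**2 + 1.0                       # diagonal of M
        nu = (x @ (u / m)) / (x @ (x / m))   # multiplier so that h^T x = 0
        h = (u - nu * x) / m                 # Newton decrement X_N
        lam = np.sqrt(h @ (m * h))           # lambda_f(x)^2 = h^T M h
        norm_h = np.linalg.norm(h)
        if norm_h < 1e-14:
            break
        s = norm_h / (1.0 + lam)             # damped arc length along geodesic
        x = x * np.cos(s) + (h / norm_h) * np.sin(s)   # exponential map
        print(k, -np.log(x).sum(), lam)
    return x

rng = np.random.default_rng(2)
n = 10
x0 = rng.uniform(0.2, 1.0, n)
x_opt = damped_newton_sphere(x0 / np.linalg.norm(x0))
# x_opt approaches (1/sqrt(n), ..., 1/sqrt(n)), the minimizer of f on the sphere
```

Because the step is damped by $1/(1 + \lambda_f)$ and $W(p; 1) \subset \mathrm{dom}(f)$ by Property 2, the iterates remain in the positive part of the sphere without any explicit safeguard.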

We apply the damped Newton algorithm to this problem for a particular choice of $n$. Figure 1 shows the results of the damped Newton method on the function $f$, where $x^*$ is the optimal solution and the logarithm is base 10. The result clearly demonstrates the quadratic convergence of the proposed algorithm.

Fig. 1 The results of the damped Newton method for Example 2 ($\log_{10} \|x - x^*\|$ versus iteration number)

To illustrate the advantage of the proposed algorithm on manifolds, this optimization problem is also solved in Euclidean space using the Lagrangian multiplier method [13] as a comparison. Given the Lagrangian multiplier $\alpha \in \mathbb{R}$, the Lagrangian form of the original problem is

  $L(x, \alpha) = -\ln x_1 - \cdots - \ln x_n + \alpha\, (1 - x^T x).$  (36)

Then Newton's method can be applied in Euclidean space to find the critical point $(x^*, \alpha^*)$ of $L$. However, because the parametrization of the problem in Euclidean space may not reflect its intrinsic global properties, the efficiency and convergence of the Newton method depend on the choice of the initial point and cannot be guaranteed. Figure 2 shows the simulation result obtained starting from a randomly generated initial point.

Fig. 2 The results of Newton's method in Euclidean space for Example 2 ($\log_{10} \|x - x^*\|$ versus iteration number)

The performance is inferior to that shown in Fig. 1.

7 Conclusions

This paper reports our effort to generalize the definition of, and known results for, self-concordant functions from Euclidean space to manifolds. It lays a comparatively solid foundation for the construction of barrier functions for interior-point algorithms on manifolds. For the proposed self-concordant function defined on a general class of smooth manifolds, a number of desirable properties are obtained. These include the feasibility of a Dikin-type ellipsoid and several inequalities that characterize the similarity between self-concordant functions and quadratic functions along the geodesics of the manifold. Under the convexity condition on the manifold defined by the second-order covariant differential, it is also shown that the optimal solution is global. A Newton decrement is defined for this specific class of functions. This concept is analyzed with regard to the relationship between the first-order covariant derivatives along the Newton direction and along a general direction, and with regard to the maximal ratio of the norm of the first-order covariant derivative to that of the second-order derivative. The latter facilitates

the definition of the index $\lambda_f(p)$. With this theoretical preparation, the existence of a global optimal solution is shown whenever $\lambda_f(p) < 1$ holds at some point $p$. A damped Newton algorithm is proposed. Its computational complexity is carefully analyzed, and a precise bound of the order $O(-\ln(\epsilon))$ is established. Two simple but meaningful optimization problems are given to illustrate the efficiency of the proposed concept and algorithm.

Acknowledgements The authors are very grateful for the valuable comments given by the anonymous reviewers. The first author would like to thank Dr. Jochen Trumpf for valuable discussions and suggestions. His research was partially supported by the IRGS Grant J56 from UTAS. John B. Moore was partially supported by an ARC Large Research Grant A589. Huibo Ji's work was supported in part by National ICT Australia.

References

1. Nesterov, Y., Nemirovskii, A.: Interior-Point Polynomial Algorithms in Convex Programming. Studies in Applied Mathematics, vol. 13. SIAM, Philadelphia, PA (1994)
2. Glineur, F.: Improving complexity of structured convex optimization problems using self-concordant barriers. Eur. J. Oper. Res. 143, 291–310 (2002)
3. Faybusovich, L.: Infinite-dimensional semidefinite programming: regularized determinants and self-concordant barriers. Fields Inst. Commun. 18, 39–49 (1998)

4. Nesterov, Y., Todd, M.: On the Riemannian geometry defined by self-concordant barriers and interior-point methods. Found. Comput. Math. 2, 333–361 (2002)
5. Helmke, U., Moore, J.B.: Optimization and Dynamical Systems. Springer-Verlag, London (1994)
6. Smith, S.: Optimization techniques on Riemannian manifolds. Fields Inst. Commun. 3, 113–146 (1994)
7. Edelman, A., Arias, T., Smith, S.: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20, 303–353 (1998)
8. Yan, W., Helmke, U., Moore, J.B.: Global analysis of Oja's flow for neural networks. IEEE Trans. Neural Netw. 5, 674–683 (1994)
9. Adler, R.L., Dedieu, J.-P., Margulies, J.Y., Martens, M., Shub, M.: Newton's method on Riemannian manifolds and a geometric model for the human spine. IMA J. Numer. Anal. 22, 359–390 (2002)
10. Udriste, C.: Optimization methods on Riemannian manifolds. Algebras Groups Geom. 14, 339–359 (1997)
11. Helgason, S.: Differential Geometry, Lie Groups, and Symmetric Spaces. Academic Press, New York (1978)
12. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Dordrecht (2004)
13. Bertsekas, D.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont, MA (1999)