Second-order optimality and beyond: characterization and evaluation complexity in nonconvex convexly-constrained optimization

by C. Cartis, N. I. M. Gould and Ph. L. Toint

Report NAXYS, August 2016

University of Namur, 61, rue de Bruxelles, B-5000 Namur (Belgium)

Second-order optimality and beyond: characterization and evaluation complexity in convexly-constrained nonlinear optimization

C. Cartis, N. I. M. Gould and Ph. L. Toint

5 August 2016

Abstract

High-order optimality conditions for convexly-constrained nonlinear optimization problems are analyzed. A corresponding (expensive) measure of criticality for arbitrary order is proposed and extended to define high-order ǫ-approximate critical points. This new measure is then used within a conceptual trust-region algorithm to show that, if derivatives of the objective function up to order q can be evaluated and are Lipschitz continuous, then this algorithm applied to the convexly constrained problem needs at most O(ǫ^{-(q+1)}) evaluations of f and its derivatives to compute an ǫ-approximate q-th order critical point. This provides the first evaluation complexity result for critical points of arbitrary order in nonlinear optimization. An example is discussed showing that the obtained evaluation complexity bounds are essentially sharp.

Keywords: nonlinear optimization, high-order optimality conditions, complexity theory, machine learning.

1 Introduction

Recent years have seen a growing interest in the analysis of the worst-case evaluation complexity of nonlinear (possibly nonconvex) smooth optimization (for the nonconvex case only, see [ 7, 3, 5 9, 2 3, 34 36, 39, 43 47] among others). In general terms, this analysis aims at giving (sometimes sharp) bounds on the number of evaluations of a minimization problem's functions (objective and constraints, if relevant) and their derivatives that are, in the worst case, necessary for certain algorithms to find an approximate critical point for the unconstrained, convexly-constrained or general nonlinear optimization problem. It is not uncommon that such algorithms may involve possibly extremely costly internal computations, provided the number of calls to the problem functions is kept as low as possible.

Mathematical Institute, Oxford University, Oxford OX2 6GG, Great Britain. coralia.cartis@maths.ox.ac.uk. Numerical Analysis Group, Rutherford Appleton Laboratory, Chilton OX11 0QX, Great Britain. nick.gould@stfc.ac.uk. Namur Center for Complex Systems (naXys) and Department of Mathematics, University of Namur, 61, rue de Bruxelles, B-5000 Namur, Belgium. philippe.toint@unamur.be.

Most of the research to date focusses on finding first-, second- and, in one case, third-order critical points. Evaluation complexity for first-order critical points was first investigated, for the unconstrained case, by Nesterov in [38], and for first- and second-order critical points by Nesterov and Polyak [39] and by Cartis, Gould and Toint in [2]. Third-order critical points were studied in [1], motivated by highly-nonlinear problems in machine learning. However, the analysis of evaluation complexity for orders higher than three is missing both concepts and results.

The purpose of the present paper is to improve on this situation in two ways. The first is to review optimality conditions of arbitrary orders q for convexly-constrained minimization problems, and the second is to describe a theoretical algorithm whose behaviour provides, for this class of problems, the first evaluation complexity bounds for such arbitrary orders of optimality.

The paper is organized as follows. After the present introduction, Section 2 discusses some preliminary results on tensor norms, a generalized Cauchy-Schwarz inequality and high-order error bounds from Taylor series. Section 3 investigates optimality conditions for convexly-constrained optimization, while Section 4 proposes a trust-region based minimization algorithm for solving this class of problems and analyzes its evaluation complexity. An example is introduced in Section 5 to show that the new evaluation complexity bounds are essentially sharp. A final discussion is presented in Section 6.

2 Preliminaries

2.1 Basic notations

In what follows, y^T x denotes the Euclidean inner product of the vectors x and y of IR^n, and ‖x‖ = (x^T x)^{1/2} is the associated Euclidean norm. If T_1 and T_2 are tensors, T_1 ⊗ T_2 is their tensor product. B(x,δ) is the ball of radius δ centered at x. If X is a closed set, ∂X denotes its boundary and int(X) denotes its interior. The vectors {e_i}_{i=1}^n are the coordinate vectors in IR^n. The notation λ_min[M] stands for the leftmost eigenvalue of the symmetric matrix M. If {a_k} and {b_k} are two infinite sequences of non-negative scalars converging to zero, we say that a_k = o(b_k) if and only if lim_{k→∞} a_k/b_k = 0. The normal cone to a general convex set C at x ∈ C is defined by

N_C(x) def= { s ∈ IR^n | s^T (z − x) ≤ 0 for all z ∈ C }

and its polar, the tangent cone to C at x, by

T_C(x) = N_C(x)° def= { s ∈ IR^n | s^T v ≤ 0 for all v ∈ N_C(x) }.

Note that C − x ⊆ T_C(x) for all x ∈ C. We also define P_C[·] to be the orthogonal projection onto C and use the Moreau decomposition [37], which states that, for every x ∈ C and every y ∈ IR^n,

y = P_{T_C(x)}[y] + P_{N_C(x)}[y]  and  (P_{T_C(x)}[y])^T (P_{N_C(x)}[y]) = 0.    (2.1)

(See [2, Section 3.5] for a brief introduction to the relevant properties of convex sets and cones, or to [32, Chapter 3] or [42, Part I] for an in-depth treatment.)
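To make the tangent/normal cone machinery concrete, the following small sketch (an illustration added here, not part of the original report) checks the Moreau decomposition (2.1) for the simple convex set C = IR^n_+ (the nonnegative orthant) at x = 0, where T_C(0) = IR^n_+ and N_C(0) = IR^n_-, so the two projections are just the componentwise positive and negative parts of y.

```python
import numpy as np

# Moreau decomposition (2.1) illustrated for C = {x >= 0} at x = 0:
# T_C(0) is the nonnegative orthant, N_C(0) the nonpositive orthant,
# so P_T[y] = max(y, 0) and P_N[y] = min(y, 0) componentwise.

def proj_tangent(y):
    return np.maximum(y, 0.0)   # projection onto T_C(0) = IR^n_+

def proj_normal(y):
    return np.minimum(y, 0.0)   # projection onto N_C(0) = IR^n_-

rng = np.random.default_rng(0)
y = rng.standard_normal(5)
yT, yN = proj_tangent(y), proj_normal(y)

assert np.allclose(yT + yN, y)   # y = P_T[y] + P_N[y]
assert abs(yT @ yN) < 1e-12      # the two pieces are orthogonal
print("Moreau decomposition verified for y =", y)
```

For a general closed convex C the two projections are no longer available in closed form, but the identity itself is unchanged.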

2.2 Tensor norms and the generalized Cauchy-Schwarz inequality

We will make substantial use of tensors and their norms in what follows, and thus start by establishing some concepts and notation. If the notation T[v_1,...,v_j] stands for the tensor of order q − j resulting from the application of the q-th order tensor T to the vectors v_1,...,v_j, the (recursively induced (1)) Euclidean norm ‖·‖_q on the space of q-th order tensors is then given by

‖T‖_q def= max_{‖v_1‖=···=‖v_q‖=1} T[v_1,...,v_q].    (2.2)

(Observe that this value is always non-negative since we can flip the sign of T[v_1,...,v_q] by flipping that of one of the vectors v_i.) Note that the definition (2.2) implies that

‖T[v_1,...,v_j]‖_{q−j} = max_{‖w_1‖=···=‖w_{q−j}‖=1} T[v_1,...,v_j][w_1,...,w_{q−j}]
  = max_{‖w_1‖=···=‖w_{q−j}‖=1} T[ v_1/‖v_1‖, ..., v_j/‖v_j‖, w_1,...,w_{q−j} ] ( ∏_{i=1}^j ‖v_i‖ )
  ≤ ( max_{‖w_1‖=···=‖w_q‖=1} T[w_1,...,w_q] ) ∏_{i=1}^j ‖v_i‖
  = ‖T‖_q ∏_{i=1}^j ‖v_i‖,    (2.3)

a simple generalization of the standard Cauchy-Schwarz inequality for order-1 tensors (vectors) and of ‖Mv‖ ≤ ‖M‖ ‖v‖, which is valid for induced norms of matrices (order-2 tensors). Observe also that perturbation theory (see [33, Th. 7]) implies that ‖T‖_q is continuous as a function of T. If T is a symmetric tensor of order q, define

T[v]^q def= T[v,...,v]  (q times)

and the q-kernel of the multilinear q-form T[v]^q by

ker^q[T] def= { v ∈ IR^n | T[v]^q = 0 }

(see [8,9]). Note that, in general, ker^q[T] is a union of cones. Interestingly, the q-kernels are not only unions of cones but also subspaces for q = 1. However this is not true for general q-kernels, since both (1,0)^T and (0,1)^T belong to the 2-kernel of the symmetric 2-form x_1 x_2 on IR^2, but not their sum. We also note that, for symmetric tensors of odd order, T[v]^q = −T[−v]^q and thus that

min_{‖d‖≤1} T[d]^q = min_{‖d‖≤1} ( −T[−d]^q ) = min_{‖d‖≤1} ( −T[d]^q ) = −max_{‖d‖≤1} T[d]^q,    (2.4)

where we used the symmetry of the unit ball with respect to the origin to deduce the second equality.

(1) That it is the norm recursively induced by the standard Euclidean norm results from the observation that

max_{‖v_1‖=···=‖v_q‖=1} T[v_1,...,v_q] = max_{‖v_q‖=1} [ max_{‖v_1‖=···=‖v_{q−1}‖=1} T[v_1,...,v_{q−1}] ] [v_q].
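As a concrete illustration of definition (2.2) and the generalized Cauchy-Schwarz inequality (2.3), the sketch below (added for illustration; not from the original report) contracts a small symmetric third-order tensor with vectors and estimates ‖T‖_3 through the recursion of footnote (1). Random sampling only gives a lower bound on the true norm, so the final comparison is printed rather than asserted.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(1)
n = 3
# A small symmetric third-order tensor T (symmetrized random data).
A = rng.standard_normal((n, n, n))
T = sum(np.transpose(A, p) for p in permutations(range(3))) / 6.0

def apply(T, *vs):
    """Contract the first len(vs) indices of T with the given vectors."""
    out = T
    for v in vs:
        out = np.tensordot(out, v, axes=([0], [0]))
    return out

# Estimate ||T||_3 via the recursion of footnote (1):
# ||T||_3 = max_{||v||=1} ||T[v]||_2, where ||T[v]||_2 is a matrix spectral norm.
V = rng.standard_normal((20000, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)
norm3 = max(np.linalg.norm(apply(T, u), 2) for u in V)

# Generalized Cauchy-Schwarz (2.3): ||T[v1, v2]|| <= ||T||_3 ||v1|| ||v2||.
v1, v2 = rng.standard_normal(n), rng.standard_normal(n)
lhs = np.linalg.norm(apply(T, v1, v2))
rhs = norm3 * np.linalg.norm(v1) * np.linalg.norm(v2)
print(f"||T[v1,v2]|| = {lhs:.4f} <= {rhs:.4f} (up to sampling error in ||T||_3)")
```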

2.3 High-order error bounds from Taylor series

The tensors considered in what follows are symmetric and arise as high-order derivatives of the objective function f. For the p-th derivative of a function f : IR^n → IR to be Lipschitz continuous on the set S ⊆ IR^n, we require that there exists a constant L_{f,p} ≥ 0 such that, for all x, y ∈ S,

‖∇_x^p f(x) − ∇_x^p f(y)‖_p ≤ L_{f,p} ‖x − y‖,    (2.5)

where ∇_x^p h(x) is the p-th order symmetric derivative tensor of h at x. Let T_{f,p}(x,s) denote (2) the p-th order Taylor-series approximation to f(x+s) at some x ∈ IR^n, given by

T_{f,p}(x,s) def= f(x) + Σ_{j=1}^p (1/j!) ∇_x^j f(x)[s]^j,    (2.6)

and consider the Taylor identity

φ(1) − t_k(1) = (1/(k−1)!) ∫_0^1 (1−ξ)^{k−1} [ φ^{(k)}(ξ) − φ^{(k)}(0) ] dξ    (2.7)

involving a given univariate C^k function φ(α) and its k-th order Taylor approximation t_k(α) = Σ_{i=0}^k φ^{(i)}(0) α^i / i!, expressed in terms of the i-th derivatives φ^{(i)}, i = 0,...,k. Let x, s ∈ IR^n. Then, picking φ(α) = f(x+αs) and k = p, it follows immediately from the fact that t_p(1) = T_{f,p}(x,s) that the identity (2.3), (2.5), (2.6), (2.7) and

∫_0^1 (1−ξ)^{p−1} dξ = 1/p    (2.8)

imply that, for all x, s ∈ IR^n,

f(x+s) ≤ T_{f,p}(x,s) + (1/(p−1)!) ∫_0^1 (1−ξ)^{p−1} | ∇_x^p f(x+ξs)[s]^p − ∇_x^p f(x)[s]^p | dξ
  ≤ T_{f,p}(x,s) + (1/p!) ‖s‖^p max_{ξ∈[0,1]} ‖∇_x^p f(x+ξs) − ∇_x^p f(x)‖_p    (2.9)
  ≤ T_{f,p}(x,s) + (L_{f,p}/p!) ‖s‖^{p+1}.

Similarly,

f(x+s) ≥ T_{f,p}(x,s) − (1/p!) ‖s‖^p max_{ξ∈[0,1]} ‖∇_x^p f(x+ξs) − ∇_x^p f(x)‖_p ≥ T_{f,p}(x,s) − (L_{f,p}/p!) ‖s‖^{p+1}.    (2.10)

Inequalities (2.9) and (2.10) will be useful in our developments below, but we immediately note that they in fact depend only on the weaker requirement that

max_{ξ∈[0,1]} ‖∇_x^p f(x+ξs) − ∇_x^p f(x)‖_p ≤ L_{f,p} ‖s‖    (2.11)

for all x and s of interest, rather than relying on (2.5).

(2) Unfortunately, double indices are necessary for most of our notation, as we need to distinguish both the function to which the relevant quantity is associated (the first index) and its order (the second index).
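The error bounds (2.9)-(2.10) are easy to verify numerically in one dimension, which is already an instance of the general statement. The sketch below (an added illustration, not from the report) builds the Taylor model (2.6) of f(t) = cos(t) with p = 3; since the third derivative sin(t) has Lipschitz constant 1, the bound reads |f(x+s) − T_{f,3}(x,s)| ≤ (1/3!) |s|^4.

```python
import numpy as np

# Taylor model (2.6) of f(t) = cos(t) at x, with p = 3:
#   T_{f,3}(x,s) = cos(x) - sin(x) s - cos(x) s^2/2 + sin(x) s^3/6.
# The third derivative, sin(t), is Lipschitz with constant L_{f,3} = 1,
# so (2.9)-(2.10) give |f(x+s) - T_{f,3}(x,s)| <= (L_{f,3}/3!) |s|^4.

def taylor3(x, s):
    return np.cos(x) - np.sin(x) * s - np.cos(x) * s**2 / 2 + np.sin(x) * s**3 / 6

x, L = 0.7, 1.0
s = np.linspace(-1.0, 1.0, 2001)
err = np.abs(np.cos(x + s) - taylor3(x, s))
bound = (L / 6.0) * np.abs(s) ** 4
assert np.all(err <= bound + 1e-15)
print("max error:", err.max(), " max bound:", bound.max())
```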

3 Unconstrained and convexly constrained problems

The problem we wish to solve is formally described as

min_{x ∈ F} f(x)    (3.1)

where we assume that f : IR^n → IR is q-times continuously differentiable and bounded from below, and that f has Lipschitz continuous derivatives of orders 1 to q. We also assume that the feasible set F is closed, convex and non-empty. Note that this formulation covers unconstrained optimization (F = IR^n), as well as standard inequality (and linear equality) constrained optimization in its different forms: the set F may be defined by simple bounds, and/or by polyhedral or more general convex constraints. We are tacitly assuming here that the cost of evaluating values and derivatives of the constraint functions possibly involved in the definition of F is negligible.

3.1 High-order optimality conditions

Given that our ambition is to work with high-order models, it seems natural to aim at finding high-order local minimizers. As is standard, we say that x_* is a local minimizer of f if and only if there exists a (sufficiently small) neighbourhood B of x_* such that

f(x) ≥ f(x_*) for all x ∈ B ∩ F.    (3.2)

However, we must immediately remember important intrinsic limitations. These are exemplified by the smooth two-dimensional problem

min_{x ∈ IR^2} f(x) = { x_2 ( x_2 − e^{−1/x_1^2} )  if x_1 ≠ 0,
                        x_2^2                       if x_1 = 0,    (3.3)

which is a simplified version of a problem stated by Hancock nearly a century ago [3, p. 36], itself a variation of a famous problem stated even earlier by Peano [4, Nos ...]. The contour lines of its objective function are shown in Figure 3.1 on the following page.

The first conclusion which can be drawn by examining this example is that, in general, assessing that a given point x_* (the origin in this case) is a local minimizer needs more than verifying that every direction from this point is an ascent direction. Indeed, this latter property holds in the example, but the origin is not a local minimizer (it is a saddle point). This is caused by the fact that objective function decrease may occur along specific curves starting from the point under consideration, and these curves need not be lines (such as x(α) = x_* + α e_1 + ½ e^{−1/α^2} e_2 for α > 0 in the example).

The second conclusion is that the characterization of a local minimizer cannot always be translated into a set of conditions only involving the Taylor expansion of f at x_*. In our example, the difficulty arises because the coefficients of the Taylor expansion of e^{−1/x_1^2} all vanish as x_1 approaches the origin, and therefore the (non-)minimizing nature of this point cannot be determined from the values of these coefficients. Thus the gap between necessary and sufficient optimality conditions cannot be closed if one restricts one's attention to using derivatives of the objective function at a putative solution of problem (3.1).
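The phenomenon in example (3.3) is easy to reproduce numerically. The sketch below (added here for illustration; the exact constants of (3.3) and of the descent curve are reconstructed from the garbled source, so f below should be read as a representative of that family) evaluates f along straight lines through the origin, where it is nonnegative, and along the curve x_2 = ½ e^{−1/x_1^2}, where it is strictly negative, confirming that the origin is a saddle point even though no line of descent exists.

```python
import numpy as np

def f(x1, x2):
    # Simplified Hancock-type example (3.3), as reconstructed here:
    # f(x) = x2 * (x2 - exp(-1/x1^2)) for x1 != 0, and x2^2 for x1 = 0.
    x1 = np.asarray(x1, dtype=float)
    safe = np.where(x1 != 0.0, x1, 1.0)
    expterm = np.where(x1 != 0.0, np.exp(-1.0 / safe**2), 0.0)
    return x2 * (x2 - expterm)

alphas = np.linspace(0.15, 0.5, 40)

# Along straight lines x(alpha) = alpha*(d1, d2), f is nonnegative ...
for d1, d2 in [(1.0, 0.0), (1.0, 1.0), (1.0, -2.0), (0.0, 1.0)]:
    assert np.all(f(alphas * d1, alphas * d2) >= -1e-15)

# ... but along the curve x(alpha) = (alpha, 0.5*exp(-1/alpha^2)) it is negative:
curve_vals = f(alphas, 0.5 * np.exp(-1.0 / alphas**2))
assert np.all(curve_vals < 0)
print("min of f along the descent curve:", curve_vals.min())
```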

7 Cartis, Gould, Toint: high-order optimality and complexity Figure 3.: The contour lines of the objective function in (3.3) Note that worse situations may also occur, for instance if we consider the following variation on Hancoc simplified example (3.3): { ) min x IR 2f(x) = x 2 (x 2 sin(/x)e /x2 if x, (3.4) x 2 2 if x =, for which no continuous descent path exists in a neighbourhood of the origin despite the origin not being a local minimizer. 3.. Necessary conditions for convexly constrained problems The above examples show that fully characterizing a local minimizer in terms of general continuous descent paths is in general impossible. However, the fact that no such path exists remains a necessary condition for such points, even if Hancoc s example shows that these paths may not be amenable to a characterization using path derivatives. In what follows, we therefore propose derivative-based necessary optimality conditions by focussing on a specific (yet reasonably general) class of descent paths x(α) of the form x(α) = x + q α i s i +o(α q ) (3.5) where α >. Define the q-th order descriptor set of F at x by D q def F (x) = { } q (s,...,s q ) IR n q x+ α i s i F for all α ς ς> i= i= (3.6) Note that D F (x) = T F(x), the standard tangent cone to F at x. We say that a feasible curve (3) x(α) is tangent to D q F (x) if (3.5) holds for some (s,...,s Q ) D q F (x). (3) Or arc, or path.

8 Cartis, Gould, Toint: high-order optimality and complexity 7 Note that the definition (3.6) implies that s i T F (x ) for i {,...,u} (3.7) where s u is the first nonzero s l. We now consider some conditions that preclude the existence of feasible descent paths of the form (3.5). These conditions involve the index sets P(j,) defined, for j, by P(j,) def = {(l,...,l ) {,...,j} For j 4, these are given by Table 3.2. l i = j}. (3.8) j {()} 2 {(2)} {(,)} 3 {(3)} {(,2),(2,)} {(,,)} 4 {(4)} {(,3),(2,2),(3,))} {(,,2),(,2,),(2,,)} {(,,,)} i= Table 3.2: The sets P(j,) for j 4 We now state necessary conditions for x to be a local minimizer. Theorem 3. Suppose that f is q times continuously differentiable in an open neighbourhood of x and that x is a local minimizer for problem (3.). If x F, suppose furthermore that a constraint qualification holds in the sense that every feasible arc (3.5) starting from x is tangent to D q F (x ). Then, for j {,...,q}, the inequality j =! (l,...,l ) P(j,) xf(x )[s l,...,s l ] (3.9) holds for all (s,...,s j ) D j F (x ) such that, for i {,...,j }, i =! (l,...,l ) P(i,) xf(x )[s l,...,s l ] =. (3.) Proof. Consider feasible paths of the form (3.5). Substituting this relation in the expression f(x(α)) f(x ) (given by (3.2)) and collecting terms of equal degree in α, we obtain that, for sufficiently small α, ( ) q j f(x(α)) f(x ) = α j! xf(x )[s l,...,s l ] +o(α q ) (3.) j= = (l,...,l ) P(j,)

9 Cartis, Gould, Toint: high-order optimality and complexity 8 where P(i,) is defined in (3.8). For this to be true, we need each coefficient of α j to be non-negative on the zero set of the coefficients,...,j, subject to the requirement that the arc (3.5) must be feasible for α sufficiently small. Assume now that s T and that (3.) holds for i =. This latter condition request s to be in the zero set of the coefficient in α in (3.), that is s T er [ xf(x )]. Then the coefficient of α 2 in (3.) must be non-negative, which yields, using P(2,) = {(2)}, P(2,2) = {()} (see Table 3.2), that which is (3.9) for q = 2. xf(x )[s 2 ]+ 2 2 xf(x )[s ] 2. (3.2) We may then proceed in the same manner for all coefficients up from order 3 to q, each time considering them in the zero set of the previous coefficients (that is (3.)) and verify that (3.) directly implies (3.9). Following a long tradition, we say that x is a q-th order critical point for problem (3.) if the conclusions of this theorem hold for j {,...,q}. Of course, a q-th order critical point need not be a local minimizer, but every local minimizer is a q-th order critical point. Note that the constraint qualification assumption automatically holds if F is defined by a set of explicit polynomial inequalities and/or linear equations. Also note that, as the order j grows, (3.9) may be interpreted as imposing a condition on s j (via xf(x )[s j ]), given the directions {s i } j i= satisfying (3.). In more general situations, the fact that conditions (3.9) and (3.) not only depends on the behaviour of the objective function in some well-chosen subspace, but involves the geometry of the all possible feasible arcs maes the second-order condition (3.2) difficult to use, particular in the case where F IR n. In what follows we discuss, as far as we currently can, two resulting questions of interest.. Are they cases where these conditions reduce to checing homogeneous polynomials involving the objective function s derivatives on a subspace? 2. If that is not the case, are they circumstances in which not only the complete left-hand side of (3.) vanishes, but also each term of this left-hand side? We start by deriving useful consequences of Theorem 3.. Corollary 3.2 Suppose that the assumptions of Theorem 3. hold and let N be the normal cone to F at x and T the corresponding tangent cone. Then xf(x ) N (3.3) and Moreover s T span { xf(x ) } T and s 2 T. xf(x )[s i ] (i =,2). (3.4)

10 Cartis, Gould, Toint: high-order optimality and complexity 9 Proof. First observe that (3.9) for j = reduces to xf(x )[s ] = for all s T. Thus (3.3) holds. Also note that (3.9)-(3.) impose that s T er [ xf(x )] = T span { xf(x ) }, (3.5) which, because of (3.3) and the polarity of N and T, yields that s belongs to T. Assume now that s 2 T. Then, for all α sufficiently small, αs +α 2 s 2 does not belong to T and thus x(α) = x +αs +α 2 s 2 +o(α 2 ) cannot belong to F, which is a contradiction. Hence s 2 T and (3.4) follows for i = 2, while it follows from s T and (3.3) for i =. The first-order necessary condition (3.3) is well-nown for general first-order minimizers (see [4, Th. 2.9, p. 353] for instance). Consider now the second-order conditions (3.2). If F = IR n (or if the convex constraints are inactive at x ), then xf(x ) = because of (3.3) and (3.2) is nothing but the familiar condition that the Hessian of the objective function must be positive semi-definite. If x happens to lie on the boundary of F and xf(x ), (3.2) indicates that the effect of the curvature of the boundary of F may be represented by the term xf(x )[s 2 ], which is non-negative because of (3.4). Consider, for example, the problem min x F IR 2x where F = {x IR n x 2 x2 2 }, whose global solution is at the origin. In this case it is easy to chec that xf() = e N = span{ e }, that 2 xf() =, and that second-order feasible arcs of the form (3.5) with x() = may be chosen with s = ±e 2 and s 2 = βe where β. This imposes 2 2 xf()[s ] 2, which (unsurprisingly) holds. Interestingly, there are cases where the geometry of the set of locally feasible arcs is simple and manageable. In particular, suppose that the boundary of F is locally polyhedral. Then, given xf(x ), either T span { xf(x ) } =, in which case conditions (3.9) and (3.) are void, or there exists d in that subspace. It then possible to define a locally feasible arc with s = d and s 2 = = s q =. As a consequence, the smallest possible value of xf(x )[s 2 ] for feasible arcs starting from x is identically zero and this term therefore vanishes from (3.9)-(3.). Morever, because of the definition of P(, j) (see Table 3.2), all terms but that in j xf(x )[s ] j also vanish from these conditions, which then simplify to ( j ) j xf(x )[s ] j for all s T er i [ i xf(x )] (3.6) for j = 2,...,q, which is a condition only involving subspaces and (for i 2) cones (for i 3) Necessary conditions for unconstrained problems Consider now the case where x belongs to F, which is obviously the case if the problem is unconstrained. Then we have that the path (3.5) is unrestricted, D q F (x ) = IR n, and one is then free to choose the vectors {s i } q i= (and their sign) arbitrarily. Note first that, since N = {}, (3.3) implies that, unsurprisingly, xf(x ) =. i=
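The index sets P(j,k) of (3.8), whose elements weight the derivative tensors in the necessary conditions (3.9)-(3.10), are simply the ordered compositions of j into k positive parts. A short sketch (an added illustration, not from the report) that enumerates them and reproduces Table 3.2:

```python
from itertools import product

def P(j, k):
    """All (l_1,...,l_k) with l_i in {1,...,j} and l_1 + ... + l_k = j,
    i.e. the ordered compositions of j into k positive parts, as in (3.8)."""
    return [c for c in product(range(1, j + 1), repeat=k) if sum(c) == j]

# Reproduce Table 3.2 for 1 <= k <= j <= 4.
for j in range(1, 5):
    print(j, {k: P(j, k) for k in range(1, j + 1)})
# e.g. P(4, 2) = [(1, 3), (2, 2), (3, 1)], matching the table.
```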

11 Cartis, Gould, Toint: high-order optimality and complexity For the second-order condition, we obtain from (3.9), again unsurprisingly, that, because er [ xf(x ] = IR n, 2 xf(x ) is positive semi-definite on IR n. Hence, if there exists a vector s er 2 [ 2 xf(x )], we have that [ 2 xf(x )] 2s = and therefore that 2 xf(x )[s,s 2 ] = for all s 2 IR n. Thus the term for = vanishes from (3.9), as well as all terms involving 2 xf(x ) applied to a vector s er 2 [ 2 xf(x )]. This implies in particular that the third-order condition may now be written as 3 xf(x )[s ] 3 = for all s er 2 [ 2 xf(x )], (3.7) where the equality is obtained by considering both s and s. Unfortunately, complications arise with fourth-order conditions, even when the objective function is a polynomial. Consider the following variant of Peano s [4] problem: min x IR 2f(x) = x2 2 κ x 2 x 2 +κ 2 x 4, (3.8) where κ and κ 2 are parameters. Then one can verify that ( ) xf() =, 2 xf() = 2 and Hence { [ 3 2κ for (i,j,) {(,2,),(,,2),(2,,)} xf()] ij = otherwise, { [ 4 24κ2 for (i,j,,l) = (,,,) xf()] ijl = otherwise. er [ xf()] = IR 2, er 2 [ 2 xf()] = span{e }, and er 3 [ 3 xf()] = span{e } span{e 2 }. The necessary condition (3.9) then state that, if the origin is a minimizer, then, using the arc defined by s = e and s 2 = 2 κ e 2, 2 2 xf()[s 2 ] xf()[s,s,s 2 ] xf()[s ] 4 = 4 κ2 2 κ2 +κ 2 = 4 κ2 +κ 2. This shows that the condition 4 xf(x )[s ] 4 on 3 i= eri [ i xf(x )], although necessary, is arbitrarily far away from the weaer necessary condition 2 2 xf()[s 2 ] xf()[s,s,s 2 ] xf()[s ] 4 (3.9) when κ grows. As was already the case for problem (3.3), the example for κ = and κ 2 = 2, say, shows that a function may admit a saddle point (x = ) which is a maximum (x = ) along a curve (x 2 = 3 2 x2 in this case) while at the same time be minimal along every line passing through x. Figure 3.2 shows the contour lines of the objective function of (3.8) for increasing values of κ 2, eeping κ = 3. One may attribute the problem that not every term in (3.9) vanishes to the fact that switching signs of s or s 2 does imply that any of the terms in (3.9) is zero (as we have verified) because of the terms 2 xf()[s 2 ] 2 and 4 xf()[s ] 4. Is this a feature of even orders

12 Cartis, Gould, Toint: high-order optimality and complexity *3mm *3mm Figure 3.2: The contour lines of (3.8) for κ = 3 and κ 2 = 2 (left), κ 2 = 2.25 (center) and κ 2 = 2.5 (right) only? Unfortunately, this not the case for q = 7. Indeed is it not difficult to verify that the terms whose multi-index (l,...,l ) is a permutation of (,2,2,2) belong to P(7,4) and those whose multi-index is a permutation of (,,,,,2) belong to P(7,6). Moreover, the contribution of these terms to the sum (3.9) cannot be distinguished by varying s or s 2, for instance by switching their signs as this technique yields only one equality in two unnowns. In general, we may therefore conclude that (3.9) must involve a mixture of terms with derivative tensors of various degrees A sufficient condition Despite the limitations we have seen when considering the simplified Hancoc example, we may still derive a sufficient condition for x to be an isolated minimizer, which is inspired by the standard second-order case (see Theorem 2.4 in Nocedal and Wright [4] for instance). Theorem 3.3 Suppose that f is q times continuously differentiable in an open neighbourhood of x F. If x F, suppose also that a constraint qualification holds in the sense that every feasible arc starting from x is tangent to D p F (x ). Let T be the tangent cone to F at x. If there exists an q [,p ] such that, for all s T, i xf(x )[s] i = (i =,...,q) and q+ x f(x )[s] q+ >, (3.2) then x is an isolated minimizer for problem (3.). Proof. The second part of condition(3.2) and the continuity of the(q+)-th derivative imply that q+ x f(z)[s] q+ > (3.2) for all s T and all z is a sufficiently small feasible neighbourhood of x. Now, using Taylor s expansion, we obtain that, for all s T and all τ (,), f(x +τs) f(x ) = q i= τ i i! i xf(x )[s] i + τq+ (q +)! q+ x f(z)[s] q+

13 Cartis, Gould, Toint: high-order optimality and complexity 2 for some z [x,x +τs]. If τ is sufficiently small, then this equality, the first part (3.2) and (3.2) ensure that f(x +τs) > f(x ). Since this strict inequality holds for all s T and all τ sufficiently small, x must be a feasible isolated minimizer. Observe that, in Peano s example (see (3.8) with κ = 3 and κ 2 = 2), we have that the curvature of the objective function is positive along every line passing through the origin, but that the order of the curvature varies with s (second order along s = e 2 and fourth order along s = e ), which precludes applying Theorem An approach using Taylor models As already noted, the conditions expressed in Theorem 3. may, in general, be very complicated to verify in an algorithm, due to their dependence on the geometry of the set of feasible paths. To avoid this difficulty, we now explore a different approach. Let us now define, for some (,] and some j {,...,p}, φ def f,j (x) = f(x) globmint f,j (x,d), (3.22) x+d F d the smallest value of the j-th order Taylor model T f,j (x,s) achievable by a feasible point at distance at most from x. Note that φ f,j (x) is a continuous function of x and for given F and f (see [33, Th. 7]). The introduction of this quantity is in part motivated by the following theorem. Theorem 3.4 Suppose that f is q times continuously differentiable in an open neighbourhood of x. If x F, suppose furthermore that a constraint qualification holds in the sense that every feasible arc starting from x is tangent to D q F (x). Define Z f,j def F (x) = {(s,...,s j ) D j F (x) (s,...,s i ) satisfy (3.) (at x) for i {,...,j } }. Then φ f,j lim (x) j = implies that (3.9) holds (at x) for all (s,...,s j ) Z f,j F (x). Proof. We start by rewriting the power series (3.) for degree j, for any given arc x(α) tangent to D j F (x) in the form f(x(α)) f(x) = j c i α i +o(α j ) = T f,j (x,s(α)) f(x) (3.23) i= where s(α) def = x(α) x and c i def = i =! (l,...,l ) P(i,) xf(x)[s l,...,s l ],

14 Cartis, Gould, Toint: high-order optimality and complexity 3 and where the last equality in (3.23) holds because f and T f,j share the first j derivatives at x. This reformulation allows us to write that, for i {,...,j}, c i = d i [ Tf,j i! dα i (x,s(α)) f(x) ]. (3.24) α= Assume now there exists an (s,...,s j ) Z f,j F (x) such that (3.9) does not hold. In the notation just introduced, this means that, for this particular (s,...,s j ), Then, from (3.24), c i = for i {,...,j } and c j <. d i [ Tf,j dα i (x,s(α)) f(x) ] = for i {,...,j }, (3.25) α= and thus the first (j ) coefficients of the polynomial T f,j (x,s(α)) f(x) vanish. Thus, using (3.23), d j [ Tf,j dα j (x,s(α)) f(x) ] T f,j (x,s(α)) f(x) = j! lim α= α α j. (3.26) Now let i be the index of the first nonzero s i. Note that i {,...,j} since otherwise the structure of the sets P(i,) implies that c j =. Observe also that we may redefine the α parameter to α s i /i so that we may assume, without loss of generality that s i =. As a consequence, we obtain that, for sufficiently small α, s(α) 3 2 αi 3 α. (3.27) 2 Hence, successively using the facts that c j <, that (3.24) and (3.26) hold for all arcs x(α) tangent to D q F (x), and that (3.27) and (3.22) hold, we may deduce that < c j j! j! f(x) T f,j (x,s(α)) lim α α j f(x) T f,j (x,s(α)) ( 3 2 )j lim α s(α) j f(x) T f,j (x,s(α)) = ( 3 2 )j lim s(α) s(α) j ( 3 2 )j lim φ f,j (x) j. The conclusion of the theorem immediately follows since lim φ f,j (x) j =. This theorem has a useful consequence.

15 Cartis, Gould, Toint: high-order optimality and complexity 4 Corollary 3.5 Suppose that f is q times continuously differentiable in an open neighbourhood of x. If x F, suppose furthermore that a constraint qualification holds in the sense that every feasible arc starting from x is tangent to D q F (x). Then x is a q-th order critical point for problem (3.) if φ f,j lim (x) j = for j {,...,q}. (3.28) Proof. We successively apply Theorem 3.4 q times and deduce that x is a j-th order critical point for j =,...,q. This last result says that we may avoid the difficulty of dealing the the possibly complicated geometry of D q F (x) if we are ready to perform the global optimization occurring in (3.22) exactly and find a way to compute or overestimate the limit in (3.28). Although this is a positive conclusion, these two remaining challenges remain daunting. However, it worthwhile noting that the standard approach to computing first- second- and third-order criticality measures for unconstrained problems follows the exact same approach. In the first-order case, it is easy to verify that xf(x) = [ ] [ min d xf(x)[d] = ( ) ] f(x) globmin f(x)+ xf(x)[d] d where the first equality is justified by the convexity of xf(x)[d] as a function of d. Because the left-hand side of the above relation is independent of, the computation of the limit (3.28) for tending to zero is trivial when j = and the limiting value is x f(x). For the second-order case, assuming xf(x) =, [ [ )] min,λ min [ xf(x)]] 2 = 2 2 globmin d 2 xf(x)[d] 2 = 2 [f(x) globmin d ( f(x)+ xf(x)[d]+ 2 2 xf(x)[d] 2)], (3.29) the first global optimization problem being easily solvable by a trust-region-type calculation [2, Section 7.3] or directly by an equivalent eigenvalue analysis. As for the first-order case, the left-hand side of the equation is independent of and obtaining the limit for tending to zero is trivial. Finally, if M(x) def = er[ xf(x)] er[ 2 xf(x)] and P M(x) is the orthogonal projection onto that subspace, P M(x) ( 3 xf(x)) = 6 3 [ min d xf(x)[d] ] = 3 [ f(x) globmin d ( f(x)+ xf(x)[d]+ 2 2 xf(x)[d] xf(x)[d] 3 )] (3.3) where the first equality results from(2.2). In this case, the global optimization in the subspace M(x) is potentially harder to solve exactly (a randomization argument is used in [] to derive

16 Cartis, Gould, Toint: high-order optimality and complexity 5 a upper bound on its value), although it still involves a subpace (4). While we are unaware of a technique for maing the global minimization in (3.22) easy in the even more complicated general case, we may thin of approximating the limit in (3.28) by choosing a (user-supplied) value of > small enough (5) and consider the size of the quantity (x). (3.3) jφ f,j Unfortunately, it is easy to see that, if is fixed at some positive value, a zero value of φ f,j (x) alone is not a necessary condition for x being a local minimizer. Indeed consider the univariate problem of minimizing f(x) = x 2 ( αx) for α >. One verifies that, for any >, the choice α = 2/ yields that φ f, () =, φ f,2 () = but φ f,3 () = 4 >, (3.32) α2 despite being a local (but not global) minimizer. As a matter of fact, φ f,j (x) gives more information than the mere potential proximity of a j-th order critical point: it is able to see beyond an infinitesimal neighbourhood of x and provides information on possible further descent beyond such a neighbourhood. Rather than a true criticality measure, it can be considered, for fixed, as an indicator of further progress, but its use for terminating at a local minimizer is clearly imperfect. Despite this drawbac, the above arguments would suggest that it is reasonable to consider a (conceptual) minimization algorithm whose objective is to find a point x ǫ such that φ f,j (x ǫ) ǫ j for j =,...,q (3.33) for some (,] sufficiently small and some q {,...,p}. This condition implies an approximate minimizing property which we mae more precise by the following result. Theorem 3.6 Suppose that f is q times continuously differentiable and that q xf is Lipschitz continous with constant L f,q (in the sense of (2.5)) in an open neighbourhood of x ǫ of radius larger than. Suppose also (3.33) holds for j = q. Then ( ) p!ǫ f(x ǫ +d) f(x ǫ ) 2ǫ q q q+ for all x ǫ +d F such that d min. L f,p (3.34) Proof. Consider x+d F. Using the triangle inequality, we have that f(x ǫ +d) = f(x ǫ +d) T f,q (x ǫ,d)+t f,q (x ǫ,d) f(x ǫ +d) T f,q (x ǫ,d) +T f,q (x ǫ,d). (3.35) Now, condition (3.33) for j = q, implies that, if d, T f,q (x ǫ,d) T f,q (x ǫ,) ǫ q = f(x ǫ ) ǫ q. (3.36) (4) We saw in Section 3..2 that q = 3 is the highest order for which this is possible. (5) Note that a small has the advantage of limiting the global optimization effort.

17 Cartis, Gould, Toint: high-order optimality and complexity 6 Hence, substituting (2.) and (3.36) into (3.35) and using again that d <, we deduce that f(x ǫ +d) f(x ǫ ) L f,p d q+ ǫ q, p! and the desired result follows. The size of the neighbourhood of x ǫ where f is locally smallest in that the first part (3.34) holds therefore increases with the criticality order q, a feature potentially useful in various contexts such as global optimization. Before turning to more algorithmic aspects, we briefly compare the results of Theorem 3.6 which what can be deduced on the local behaviour of the Taylor series T f,q (x,s) if, instead of requiring the exact necessary condition (3.9) to hold exactly, this condition is relaxed to j! xf(x )[s l,...,s l ] ǫ (3.37) = (l,...,l ) P(j,) while insisting that (3.) should hold exactly. If j = q =, it is easy to verify that (3.37) for s T is equivalent to the condition that P T [ xf(x )] ǫ, (3.38) from which we deduce, using the Cauchy-Schwarz inequality, that T f, (x,s) T f, (x,) ǫ (3.39) for all s T with d, that is (3.33) for j =. Thus, by Theorem 3.6, we obtain that (3.34) holds for j =. 4 A trust-region minimization algorithm Aware of the optimality conditions and their limitations, we may now consider an algorithm to achieve (3.33). This objective naturally suggests a trust-region (6) formulation with adaptative model degree, in which the user specifies a desired criticality order q, assuming that derivatives of order,...,q are available when needed. We made this idea explicit in Algorithm 4. on the next page. We first state a useful property of Algorithm 4., which ensures that a fixed fraction of the iterations,2,..., must be either successful or very successful. Indeed, if we define the following bound holds. S def = {l {,...,} ρ l η }, (6) A detailed account and a comprehensive bibliography on trust-region methods can be found in [2].

Algorithm 4.1: Trust-region algorithm using adaptive order models for convexly-constrained problems (TRq)

Step 0: Initialization. A criticality order q, an accuracy threshold ǫ ∈ (0,1], a starting point x_{-1} and an initial trust-region radius Δ_0 ∈ [ǫ, 1] are given, as well as algorithmic parameters Δ_max ∈ [Δ_0, 1], 0 < γ_1 ≤ γ_2 < 1 ≤ γ_3 and 0 < η_1 ≤ η_2 < 1. Compute x_0 = P_F[x_{-1}], evaluate f(x_0) and set k = 0.

Step 1: Step computation. For j = 1,...,q,
1. Evaluate ∇^j f(x_k) and compute φ_{f,j}^{Δ_k}(x_k) from (3.22).
2. If φ_{f,j}^{Δ_k}(x_k) > ǫ Δ_k^j, go to Step 3 with s_k = d, where d is the argument of the global minimum in the computation of φ_{f,j}^{Δ_k}(x_k).

Step 2: Termination. Terminate with x_ǫ = x_k and Δ_ǫ = Δ_k.

Step 3: Accept the new iterate. Compute f(x_k + s_k) and

ρ_k = [ f(x_k) − f(x_k + s_k) ] / [ T_{f,j}(x_k, 0) − T_{f,j}(x_k, s_k) ].    (4.1)

If ρ_k ≥ η_1, set x_{k+1} = x_k + s_k. Otherwise set x_{k+1} = x_k.

Step 4: Update the trust-region radius. Set

Δ_{k+1} ∈ [γ_1 Δ_k, γ_2 Δ_k]                 if ρ_k < η_1,
Δ_{k+1} ∈ [γ_2 Δ_k, Δ_k]                     if ρ_k ∈ [η_1, η_2),
Δ_{k+1} ∈ [Δ_k, min(Δ_max, γ_3 Δ_k)]          if ρ_k ≥ η_2,    (4.2)

increment k by one and go to Step 1.
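Algorithm 4.1 is conceptual because of the global optimization hidden in φ_{f,j}^{Δ}(x), but its mechanics are easy to sketch for small problems. The code below is an illustrative toy under simplifying assumptions, not the authors' implementation: the objective, the box F and all parameter values are made up, q = 2, exact derivative tensors are supplied by hand, and the global minimization required by (3.22) is replaced by crude feasible sampling in the trust region.

```python
import numpy as np

# Toy instance: min f(x) over the box F = [0, 2]^2, with an illustrative f.
lo, hi = np.zeros(2), 2.0 * np.ones(2)

def f(x):
    return (x[0] - 1.0) ** 4 + x[0] * x[1] + (x[1] + 0.5) ** 2

def grad(x):
    return np.array([4 * (x[0] - 1.0) ** 3 + x[1], x[0] + 2 * (x[1] + 0.5)])

def hess(x):
    return np.array([[12 * (x[0] - 1.0) ** 2, 1.0], [1.0, 2.0]])

def taylor(x, d, j):
    """Taylor model T_{f,j}(x,d) of order j (here j = 1 or 2), cf. (2.6)."""
    t = f(x) + grad(x) @ d
    if j >= 2:
        t += 0.5 * d @ hess(x) @ d
    return t

def phi(x, j, delta, nsamples=20000, rng=np.random.default_rng(0)):
    """Crude sampling approximation of phi_{f,j}^delta(x) in (3.22):
    f(x) minus the smallest Taylor-model value over feasible d, ||d|| <= delta."""
    d = rng.uniform(-delta, delta, size=(nsamples, 2))
    d = d[np.linalg.norm(d, axis=1) <= delta]
    d = d[np.all((x + d >= lo) & (x + d <= hi), axis=1)]
    vals = np.array([taylor(x, di, j) for di in d])
    best = int(np.argmin(vals))
    return f(x) - vals[best], d[best]

# TRq loop (Algorithm 4.1) with q = 2 and standard-looking parameters.
q, eps = 2, 1e-2
gamma1, gamma2, gamma3, eta1, eta2, dmax = 0.25, 0.5, 2.0, 0.1, 0.9, 1.0
x, delta = np.clip(np.array([1.8, 1.7]), lo, hi), dmax   # Step 0: project start
for k in range(100):
    step = None
    for j in range(1, q + 1):                            # Step 1
        measure, d = phi(x, j, delta)
        if measure > eps * delta ** j:
            step, order = d, j
            break
    if step is None:                                      # Step 2: termination
        print("terminated at", x, "with radius", delta)
        break
    rho = (f(x) - f(x + step)) / max(f(x) - taylor(x, step, order), 1e-16)  # (4.1)
    if rho >= eta1:                                       # Step 3: accept
        x = x + step
    if rho < eta1:                                        # Step 4: radius update
        delta *= gamma1
    elif rho < eta2:
        delta *= gamma2
    else:
        delta = min(dmax, gamma3 * delta)
```

The sampling step makes this a toy illustration of the mechanism only; in the paper the measure φ requires an exact global minimization of the Taylor model over the feasible part of the trust region.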

19 Cartis, Gould, Toint: high-order optimality and complexity 8 Lemma 4. Assume that min for some min > independent of. Then Algorithm 4. ensures that, whenever S, ( def κ u S, where κ u = + logγ ) ( ) 3 + logγ 2 logγ 2 log. (4.3) min Proof. The trust-region update (4.2) ensures that γ U 2 γ S 3, where U = {,...,} \ S. This inequality then yields (4.3) by taing logarithms and using that S and = S + U. 4. Evaluation complexity for Algorithm 4. We start our worst-case analysis by formalizing our assumptions. Let L f def = {x+z IR n x F, f(x) f(x ) and z max }. AS. AS.2 AS.3 The feasible set F is closed, convex and non-empty. The objective function f is q times continuously differentiable on an open set containing L f. For j {,...,q}, the j-th derivative of f is Lipschitz continuous on L f (in the sense of (2.5)) with Lipschitz constant L f,j. For simplicity of notation, define L f def = max j {,...,q} L f,j. Algorithm 4. is required to start from a feasible x F, which, together with the fact that the subproblem solution in Step 2 involves minimization over F, leads to AS.. Note that AS.3 automatically holds if f is q+ times continuously differentiable and F is bounded. It is also important to note that we could replace AS.3 by the condition that (2.) holds on the path of iterates [x,x + ] without altering any of the proofs below. While this weaer formulation may be useful, we prefer to use AS.3 in the sequel of this paper because it is independent of the sequence of iterates and may be easier to verify a priori, given the problem (3.). Lemma 4.2 Suppose that AS.2 and AS.3 hold. Then, for all l {,...,}, l κ ǫ (4.4) where [ def κ = min, γ ] ( η 2 ). (4.5) L f

20 Cartis, Gould, Toint: high-order optimality and complexity 9 Proof. Assume that, for some l {,...,} From (4.), we obtain that, for some j {,...,q}, l η 2 L f ǫ. (4.6) ρ l = f(x l +s l ) T f,j (x l,s l ) T f,j (x l,) T f,j (x l,s l ) < L f s l j+ j!ǫ j l L f l j!ǫ ( η 2 ), where we used (2.9) and the fact that φ l f,j (x l) > ǫ j l to deduce the first inequality, the bound s l l to deduce the second, and (4.6) with j to deduce the third. Thus ρ l η 2 and l+ l. The mechanism of the algorithm and the inequality ǫ then ensures that, for all l, [ l min, γ ] ( η 2 )ǫ κ ǫ. L f We now derive a simple lower bound on the objective function decrease at successful iterations. Lemma 4.3 Suppose that AS. AS.3 hold. Then, if is the index of a successful iteration before termination, f(x ) f(x + ) η κ ǫ q+. (4.7) Proof. We have, using (4.), the fact that φ f,j (x ) > ǫ j (4.4) successively, that for some j {,...,q} and f(x ) f(x + ) η [T f,j (x,) T f,j (x,s )] = η φ f,j (x ) > η κ ǫ j+ η κ ǫ q+. Our worst-case evaluation complexity results can now be proved by summing the decreases guaranteed by this last lemma.

21 Cartis, Gould, Toint: high-order optimality and complexity 2 Theorem 4.4 Suppose that AS. AS.3 hold. Let f low be a lower bound on f within F. Then, given ǫ (,], Algorithm 4. applied to problem (3.) needs at most κ f f(x ) f low S ǫ q+ (4.8) successful iterations(each possibly involving one evaluation of f and its q first derivatives) and at most κ u κ f f(x ) f low S ǫ q+ + (4.9) iterations in total to terminate with an iterate x ǫ such that (3.33) holds, where κ f S = [ max, η L f γ ( η 2 ) ], (4.) and κ u is given by (4.3). Moreover, if ǫ is the value of at termination, for all d such that f(x ǫ +d) f(x ǫ ) 2ǫ q ǫ (4.) ( ) x ǫ +d F and d (ǫ q ǫ) Lf q+ q+. (4.2) q! Observe that, because of (4.2) and (4.4), ǫ [κ δ ǫ, max ]. Proof. Let be the index of an arbitrary iteration before termination. Using the definition of f low, the nature of successful iterations, (4.) and Lemma 4.3, we deduce that f(x ) f low f(x ) f(x + ) = i S [f(x i ) f(x i+ )] S [κ f S ] ǫ q+ (4.3) which proves (4.8). We next call upon Lemma 4. to compute the upper bound on the total number of iterations before termination (obviously, there must be a least one successful iteration unless termination occurs for = ) and add one for the evaluation at termination. Finally, (4.)-(4.2) result from Theorem 3.6, AS.3 and the fact that φ f,q (x ǫ) ǫ q ǫ at termination. Theorem 4.4 generalizes the nown bounds for the cases where F = IR and q = [38], q = 2 [2,39] and q = 3 []. The results for q = 2 with F IR n and for q > 3 appear to be new. The latter provide the first evaluation complexity bounds for general criticality order q. Note that, if q =, bounds of the type O(ǫ (p+)/p ) exist if one is ready to minimize models of degree p > q (see [6]). Whether similar improvements can be obtained for q > remains an open question at this stage.
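Theorem 4.4 bounds the number of successful iterations by a multiple of (f(x_0) − f_low) ǫ^{-(q+1)}. A tiny sketch (illustrative only; the leading constant is set to 1 rather than the paper's κ_f, which collects η_1 and κ_δ) showing how this worst-case count grows as ǫ shrinks and q increases:

```python
# Worst-case successful-iteration count of Theorem 4.4 scales like
# C * (f(x0) - f_low) / eps^(q+1); C = 1 here purely for illustration.
def max_successful_iters(gap, q, eps, C=1.0):
    return C * gap / eps ** (q + 1)

for q in (1, 2, 3):
    print(q, [f"{max_successful_iters(10.0, q, eps):.1e}" for eps in (1e-1, 1e-2, 1e-3)])
```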

22 Cartis, Gould, Toint: high-order optimality and complexity 2 We also observe that the above theory remains valid if the termination rule φ,j (x ) ǫ j for j {,...,q} (4.4) used in Step is replaced by a more flexible one, involving other acceptable termination circumstances, such as if (4.4) hold or some other condition holds. We conclude this section by noting that the global optimization effort involved in the computation of φ j,j (x ) (j {,...,q}) in Algorithm 4. might be limited by choosing max small enough. 5 Sharpness It is interesting that an example was presented in [4] showing that the bound in O(ǫ 3 ) evaluations for q = 2 is essentially sharp for both the trust-region and a regularization algorithm. This is significant, because requiring φ f,2 (x) ǫ 2 is slightly stronger, for small, than the standard condition [ ] xf(x) ǫ and min,λ min [ 2 xf(x)] ǫ (5.) (used in [39] and [2] for instance). Indeed, for one-dimensional problems and assuming 2 xf(x), the former condition amounts to requiring that ( ) 2 2 xf(x)+2 x(f(x) ǫ (5.2) where the absolute value reflects the fact that s = ± depending on the sign of g. In the remainder of this section, we show that the example proposed in [4] can be extended to arbitrary order q, and thus that the complexity bounds (4.8)-(4.9) are esentially sharp for our trust-region algorithm. The idea of our generalized example is to apply Algorithm 4. to a unidimensional objective function f for some fixed q and F = IR + (hence guaranteeing AS.), generating a sequence of iterates {x } starting from the origin, i.e. x = x =. We first choose the sequences of derivatives values up to order q to be, for all, j xf(x ) = for j {,...,q } and q xf(x ) = q! ( ) q+ +δ (5.3) + where δ (,) is a (small) positive constant. This means that, at iterate x, the q-th order Taylor model is given by ( ) q+ +δ T f,q (x,s) = f(x ) s q + wherethevalueoff(x )remainsunspecifiedfornow. Thestepisthenobtainedbyminimizing this model in a trust-region of radius = ( ) q+ +δ, +

23 Cartis, Gould, Toint: high-order optimality and complexity 22 yielding that s = = As a consequence, the model decrease is given by ( ) q+ +δ (,). (5.4) + T f,q (x,) T f,q (x,s ) = ( ) q! q xf(x )s q +(q+)δ =. (5.5) + For our example, we the define the objective function decrease at iteration to be f def = f(x ) f(x +s ) = 2 (η +η 2 )[T f,q (x,) T f,q (x,s )], (5.6) thereby ensuring that ρ [η,η 2 ) and x + = x + s for each. Summing up function decreases, we may then specify the objective function s values at the iterates by f(x ) = η +η 2 ζ(+(q +)δ) and f(x + ) = f(x ) η ( ) +η 2 +(q+)δ. (5.7) where ζ(t) def = = t is the Riemann zeta function. This function is finite for all t > (and thus also for t = +(q +)δ), thereby ensuring that f(x ) for all. We also verify that ( ) + + q+ +δ = [γ 2,] +2 in accordance with (4.2), provided γ 2 ( 2 3 ) q+ +δ. Observe also that (5.3) and (5.5) ensure that, for each, φ f,j (x ) = for j {,...,q } (5.8) and ( ) φ +(q+)δ ( ) f,q (x q+ +δ ) = = q + +. (5.9) We now use Hermite interpolation to construct the objective function f on the successive intervals [x,x + ], and define where p is the polynomial f(x) = p (x x )+f(x ) for x [x,x + ] and, (5.) p (s) = 2q+ i= with coefficients defined by the interpolation conditions p () = f(x ) f(x + ), p (s ) = ; c i, s i, (5.) j sp () = = j sp (s ) for j {,...,q }, (5.2) q sp () = q xf(x ), q sp (s ) = q xf(x + ).

24 Cartis, Gould, Toint: high-order optimality and complexity 23 These conditions ensure that f(x) is q times continuously differentiable on IR + and thus that AS.2 holds. They also impose the following values for the first q + coefficients c, = f(x ) f(x + ), c j, = (j {,...,q }), c q, = q xf(x ); (5.3) and the remaining q + coefficients are solutions of the linear system s q+ s q+2... s 2q+ (q +)s q (q +2)s q+.. (q+)! (q+2)!! s 2! s (2q +)s 2q. (2q+)! (q+)! sq+ c q+, c q+2,. c 2q+, = r, (5.4) where the right-hand side is given by f q! q xf(x )s q (q )! q xf(x )s q r =.. (5.5) p xf(x )s q xf(x + ) q xf(x ) Observe now that the coefficient matrix of this linear system may be written as s q+ s q... M s q... s s q where M q def =... q + q q +. (q+)!!. (q+2)! 2!.... (2q+)! (q+)! is an invertible matrix independent of (see Appendix). Hence c q+, c q+2,. c 2q+, = s... s q M q s (q+) Observe now that, because of (5.4), (5.6), (5.5) and (5.3), s q... s (5.6) r (5.7) f = O(s q+ ), q xf(x )s q j = O(s q+ j ) (j {,...,q }) and, since q xf(x ) < q xf(x + ) <, q xf(x + ) q xf(x ) q xf(x ) q!s.

25 Cartis, Gould, Toint: high-order optimality and complexity 24 These bounds and (5.5) imply that [r ] i, the i-th component of r, satisfies [r ] i = O(s q+2 i ) for i {,...,q +}. Hence, using (5.7) and the non-singularity of M q, we obtain that there exists a constant κ q independent of such that and thus that c i, s i q κ q for i {q +,...,2q +}, (5.8) q+ s p (s) 2q+ i=q+ i! c i, s i q 2q+ i=q+ i! κ q Moreover, using successively (5.), the triangle inequality, (5.3), (5.3), (5.4), (5.8) and κ q, we obtain that, for j {,...,q}, j sp (s) = 2q+ i=j i! (i j)! c i, s i j q! (q j)! c q, s q j + 2q+ i=q+ 2q+ q! (q j)! + i! (i j)! c i, s i q i=q+ 2q+ i! κ q (i j)! i=q i! (i j)! c i, s i q s q+ j and thus all derivatives of order one up to q remain bounded on [,s ]. Because of (5.), we therefore obtain that AS.3 holds. Moreover (5.3), (5.8), the inequalities q xf(x ) q! and f(x ), (5.) and (5.4) also ensure that f(x) is bounded below. We have therefore shown that the bounds of Theorem 4.4 are essentially sharp, in that, for every δ >, Algorithm 4. applied to the problem of minimizing the lower-bounded objective function f just constructed and satisfying AS.-AS.3 will tae, because of (5.8) and (5.9), ǫ q+ +(q+)δ iterations and evaluation of f and its q first derivatives to find an iterate x such that condition (4.4) holds. Moreover, it is clear that, in the example presented, the global rate of convergence is driven by the term of degree q in the Taylor series. 6 Discussion We have analyzed the optimality conditions of order 2 and above, and proposed a measure of criticality for arbitrary order for convexly constrained nonlinear optimization problems. As this measure can be extended to define ǫ-approximate critical points of high-order, we have then used it in a conceptual trust-region algorithm to show that, if derivatives of the objective

function up to order q can be evaluated and are Lipschitz continuous, then this algorithm applied to the convexly constrained problem (3.1) needs at most O(ǫ^{-(q+1)}) evaluations of f and its derivatives to compute an ǫ-approximate q-th order critical point. Moreover, we have shown by an example that this bound is essentially sharp.

In the purely unconstrained case, this result recovers known results for q = 1 (first-order criticality with Lipschitz gradients) [38], q = 2 (second-order criticality (7) with Lipschitz Hessians) [4,39] and q = 3 (third-order criticality (8) with Lipschitz continuous third derivative) [1], but extends them to arbitrary order. The results for the convexly constrained case appear to be new and provide in particular the first complexity bound for second- and third-order criticality for such inequality constrained problems.

Because the conditions (4.4) measure different orders of criticality, we could choose to use a different ǫ for every order (as in [4]), complicating the expression of the bound accordingly. However, as shown by our example, the worst-case behaviour of Algorithm 4.1 is dominated by that of ∇_x^q f, which makes the distinction of the various ǫ-s less crucial.

Because of the global optimization occurring in the definition of the criticality measure φ_{f,j}^δ(x), the algorithm discussed in the present paper remains, in general, of a theoretical nature. However there may be cases where this computation is tractable for δ small enough, for instance if the derivative tensors of the objective function are strongly structured. Such approaches may hopefully be of use for small-dimensional or structured highly nonlinear problems, such as those occurring in machine learning using deep learning techniques (see [1]).

The present framework for handling convex constraints is not free of limitations, resulting from our choice to transfer difficulties associated with the original problem to the subproblem solution, thereby sparing precious evaluations of f and its derivatives. In particular, the cost of evaluating any constraint function/derivative possibly defining the convex feasible set F is neglected by the present approach, which must therefore be seen as a suitable framework to handle cheap inequality constraints such as simple bounds.

It is known that the complexity of obtaining ǫ-approximate first-order criticality for unconstrained and convexly-constrained problems can be reduced to O(ǫ^{-(p+1)/p}) if one is ready to define the step by using a regularization model of order p. In the unconstrained case, this was shown for p = 2 in [2,39] and for general p in [6], while the convexly constrained case was analyzed (for p = 2) in [3]. Whether this methodology and the associated improvements in evaluation complexity bounds can be extended to orders above one is the subject of ongoing research.

Acknowledgements

The authors would like to thank Oliver Stein for suggesting reference [33], as well as Jim Burke and Adrian Lewis for interesting discussions. The third author would also like to acknowledge the support provided by the Belgian Fund for Scientific Research (FNRS), the Leverhulme Trust (UK), Balliol College (Oxford), the Department of Applied Mathematics of the Hong Kong Polytechnic University, ENSEEIHT (Toulouse, France) and INDAM (Florence, Italy).

(7) Using (3.29). (8) Using (3.30).

References

[1] A. Anandkumar and R. Ge. Efficient approaches for escaping high-order saddle points in nonconvex optimization. arXiv, 2016.


More information

Optimisation non convexe avec garanties de complexité via Newton+gradient conjugué

Optimisation non convexe avec garanties de complexité via Newton+gradient conjugué Optimisation non convexe avec garanties de complexité via Newton+gradient conjugué Clément Royer (Université du Wisconsin-Madison, États-Unis) Toulouse, 8 janvier 2019 Nonconvex optimization via Newton-CG

More information

Iteration-complexity of first-order penalty methods for convex programming

Iteration-complexity of first-order penalty methods for convex programming Iteration-complexity of first-order penalty methods for convex programming Guanghui Lan Renato D.C. Monteiro July 24, 2008 Abstract This paper considers a special but broad class of convex programing CP)

More information

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss

More information

How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization

How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization Frank E. Curtis Department of Industrial and Systems Engineering, Lehigh University Daniel P. Robinson Department

More information

2tdt 1 y = t2 + C y = which implies C = 1 and the solution is y = 1

2tdt 1 y = t2 + C y = which implies C = 1 and the solution is y = 1 Lectures - Week 11 General First Order ODEs & Numerical Methods for IVPs In general, nonlinear problems are much more difficult to solve than linear ones. Unfortunately many phenomena exhibit nonlinear

More information

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Coralia Cartis, University of Oxford INFOMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems Methods

More information

Implications of the Constant Rank Constraint Qualification

Implications of the Constant Rank Constraint Qualification Mathematical Programming manuscript No. (will be inserted by the editor) Implications of the Constant Rank Constraint Qualification Shu Lu Received: date / Accepted: date Abstract This paper investigates

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications Weijun Zhou 28 October 20 Abstract A hybrid HS and PRP type conjugate gradient method for smooth

More information

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Alberto Bressan ) and Khai T. Nguyen ) *) Department of Mathematics, Penn State University **) Department of Mathematics,

More information

Mini-Course 1: SGD Escapes Saddle Points

Mini-Course 1: SGD Escapes Saddle Points Mini-Course 1: SGD Escapes Saddle Points Yang Yuan Computer Science Department Cornell University Gradient Descent (GD) Task: min x f (x) GD does iterative updates x t+1 = x t η t f (x t ) Gradient Descent

More information

Chapter 8 Gradient Methods

Chapter 8 Gradient Methods Chapter 8 Gradient Methods An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Introduction Recall that a level set of a function is the set of points satisfying for some constant. Thus, a point

More information

MATH 205C: STATIONARY PHASE LEMMA

MATH 205C: STATIONARY PHASE LEMMA MATH 205C: STATIONARY PHASE LEMMA For ω, consider an integral of the form I(ω) = e iωf(x) u(x) dx, where u Cc (R n ) complex valued, with support in a compact set K, and f C (R n ) real valued. Thus, I(ω)

More information

Unconstrained optimization

Unconstrained optimization Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout

More information

Introduction to gradient descent

Introduction to gradient descent 6-1: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction to gradient descent Derivation and intuitions Hessian 6-2: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction Our

More information

1 Lagrange Multiplier Method

1 Lagrange Multiplier Method 1 Lagrange Multiplier Method Near a maximum the decrements on both sides are in the beginning only imperceptible. J. Kepler When a quantity is greatest or least, at that moment its flow neither increases

More information

ZEROES OF INTEGER LINEAR RECURRENCES. 1. Introduction. 4 ( )( 2 1) n

ZEROES OF INTEGER LINEAR RECURRENCES. 1. Introduction. 4 ( )( 2 1) n ZEROES OF INTEGER LINEAR RECURRENCES DANIEL LITT Consider the integer linear recurrence 1. Introduction x n = x n 1 + 2x n 2 + 3x n 3 with x 0 = x 1 = x 2 = 1. For which n is x n = 0? Answer: x n is never

More information

Arc Search Algorithms

Arc Search Algorithms Arc Search Algorithms Nick Henderson and Walter Murray Stanford University Institute for Computational and Mathematical Engineering November 10, 2011 Unconstrained Optimization minimize x D F (x) where

More information

A Derivative-Free Gauss-Newton Method

A Derivative-Free Gauss-Newton Method A Derivative-Free Gauss-Newton Method Coralia Cartis Lindon Roberts 29th October 2017 Abstract We present, a derivative-free version of the Gauss-Newton method for solving nonlinear least-squares problems.

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

1 Lyapunov theory of stability

1 Lyapunov theory of stability M.Kawski, APM 581 Diff Equns Intro to Lyapunov theory. November 15, 29 1 1 Lyapunov theory of stability Introduction. Lyapunov s second (or direct) method provides tools for studying (asymptotic) stability

More information

CONSTRAINED NONLINEAR PROGRAMMING

CONSTRAINED NONLINEAR PROGRAMMING 149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach

More information

Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016

Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016 Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall 206 2 Nov 2 Dec 206 Let D be a convex subset of R n. A function f : D R is convex if it satisfies f(tx + ( t)y) tf(x)

More information

A Proximal Method for Identifying Active Manifolds

A Proximal Method for Identifying Active Manifolds A Proximal Method for Identifying Active Manifolds W.L. Hare April 18, 2006 Abstract The minimization of an objective function over a constraint set can often be simplified if the active manifold of the

More information

Identifying Active Constraints via Partial Smoothness and Prox-Regularity

Identifying Active Constraints via Partial Smoothness and Prox-Regularity Journal of Convex Analysis Volume 11 (2004), No. 2, 251 266 Identifying Active Constraints via Partial Smoothness and Prox-Regularity W. L. Hare Department of Mathematics, Simon Fraser University, Burnaby,

More information

Complexity Analysis of Interior Point Algorithms for Non-Lipschitz and Nonconvex Minimization

Complexity Analysis of Interior Point Algorithms for Non-Lipschitz and Nonconvex Minimization Mathematical Programming manuscript No. (will be inserted by the editor) Complexity Analysis of Interior Point Algorithms for Non-Lipschitz and Nonconvex Minimization Wei Bian Xiaojun Chen Yinyu Ye July

More information

Analysis II: The Implicit and Inverse Function Theorems

Analysis II: The Implicit and Inverse Function Theorems Analysis II: The Implicit and Inverse Function Theorems Jesse Ratzkin November 17, 2009 Let f : R n R m be C 1. When is the zero set Z = {x R n : f(x) = 0} the graph of another function? When is Z nicely

More information

Lecture 3: Basics of set-constrained and unconstrained optimization

Lecture 3: Basics of set-constrained and unconstrained optimization Lecture 3: Basics of set-constrained and unconstrained optimization (Chap 6 from textbook) Xiaoqun Zhang Shanghai Jiao Tong University Last updated: October 9, 2018 Optimization basics Outline Optimization

More information

Simple Examples on Rectangular Domains

Simple Examples on Rectangular Domains 84 Chapter 5 Simple Examples on Rectangular Domains In this chapter we consider simple elliptic boundary value problems in rectangular domains in R 2 or R 3 ; our prototype example is the Poisson equation

More information

On proximal-like methods for equilibrium programming

On proximal-like methods for equilibrium programming On proximal-lie methods for equilibrium programming Nils Langenberg Department of Mathematics, University of Trier 54286 Trier, Germany, langenberg@uni-trier.de Abstract In [?] Flam and Antipin discussed

More information

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005 3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider

More information

We simply compute: for v = x i e i, bilinearity of B implies that Q B (v) = B(v, v) is given by xi x j B(e i, e j ) =

We simply compute: for v = x i e i, bilinearity of B implies that Q B (v) = B(v, v) is given by xi x j B(e i, e j ) = Math 395. Quadratic spaces over R 1. Algebraic preliminaries Let V be a vector space over a field F. Recall that a quadratic form on V is a map Q : V F such that Q(cv) = c 2 Q(v) for all v V and c F, and

More information

An introduction to Birkhoff normal form

An introduction to Birkhoff normal form An introduction to Birkhoff normal form Dario Bambusi Dipartimento di Matematica, Universitá di Milano via Saldini 50, 0133 Milano (Italy) 19.11.14 1 Introduction The aim of this note is to present an

More information

Written Examination

Written Examination Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes

More information

Cubic-regularization counterpart of a variable-norm trust-region method for unconstrained minimization

Cubic-regularization counterpart of a variable-norm trust-region method for unconstrained minimization Cubic-regularization counterpart of a variable-norm trust-region method for unconstrained minimization J. M. Martínez M. Raydan November 15, 2015 Abstract In a recent paper we introduced a trust-region

More information

A Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity

A Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity A Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity Mohammadreza Samadi, Lehigh University joint work with Frank E. Curtis (stand-in presenter), Lehigh University

More information

CORE 50 YEARS OF DISCUSSION PAPERS. Globally Convergent Second-order Schemes for Minimizing Twicedifferentiable 2016/28

CORE 50 YEARS OF DISCUSSION PAPERS. Globally Convergent Second-order Schemes for Minimizing Twicedifferentiable 2016/28 26/28 Globally Convergent Second-order Schemes for Minimizing Twicedifferentiable Functions YURII NESTEROV AND GEOVANI NUNES GRAPIGLIA 5 YEARS OF CORE DISCUSSION PAPERS CORE Voie du Roman Pays 4, L Tel

More information

arxiv: v4 [math.oc] 5 Jan 2016

arxiv: v4 [math.oc] 5 Jan 2016 Restarted SGD: Beating SGD without Smoothness and/or Strong Convexity arxiv:151.03107v4 [math.oc] 5 Jan 016 Tianbao Yang, Qihang Lin Department of Computer Science Department of Management Sciences The

More information

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares Robert Bridson October 29, 2008 1 Hessian Problems in Newton Last time we fixed one of plain Newton s problems by introducing line search

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

Lecture 1. 1 Conic programming. MA 796S: Convex Optimization and Interior Point Methods October 8, Consider the conic program. min.

Lecture 1. 1 Conic programming. MA 796S: Convex Optimization and Interior Point Methods October 8, Consider the conic program. min. MA 796S: Convex Optimization and Interior Point Methods October 8, 2007 Lecture 1 Lecturer: Kartik Sivaramakrishnan Scribe: Kartik Sivaramakrishnan 1 Conic programming Consider the conic program min s.t.

More information

Selected Examples of CONIC DUALITY AT WORK Robust Linear Optimization Synthesis of Linear Controllers Matrix Cube Theorem A.

Selected Examples of CONIC DUALITY AT WORK Robust Linear Optimization Synthesis of Linear Controllers Matrix Cube Theorem A. . Selected Examples of CONIC DUALITY AT WORK Robust Linear Optimization Synthesis of Linear Controllers Matrix Cube Theorem A. Nemirovski Arkadi.Nemirovski@isye.gatech.edu Linear Optimization Problem,

More information

arxiv: v3 [math.oc] 8 Jan 2019

arxiv: v3 [math.oc] 8 Jan 2019 Why Random Reshuffling Beats Stochastic Gradient Descent Mert Gürbüzbalaban, Asuman Ozdaglar, Pablo Parrilo arxiv:1510.08560v3 [math.oc] 8 Jan 2019 January 9, 2019 Abstract We analyze the convergence rate

More information

Copyrighted Material. L(t, x(t), u(t))dt + K(t f, x f ) (1.2) where L and K are given functions (running cost and terminal cost, respectively),

Copyrighted Material. L(t, x(t), u(t))dt + K(t f, x f ) (1.2) where L and K are given functions (running cost and terminal cost, respectively), Chapter One Introduction 1.1 OPTIMAL CONTROL PROBLEM We begin by describing, very informally and in general terms, the class of optimal control problems that we want to eventually be able to solve. The

More information

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Linear Algebra A Brief Reminder Purpose. The purpose of this document

More information

A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties

A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties André L. Tits Andreas Wächter Sasan Bahtiari Thomas J. Urban Craig T. Lawrence ISR Technical

More information

Cubic regularization of Newton s method for convex problems with constraints

Cubic regularization of Newton s method for convex problems with constraints CORE DISCUSSION PAPER 006/39 Cubic regularization of Newton s method for convex problems with constraints Yu. Nesterov March 31, 006 Abstract In this paper we derive efficiency estimates of the regularized

More information

Linear Algebra. Preliminary Lecture Notes

Linear Algebra. Preliminary Lecture Notes Linear Algebra Preliminary Lecture Notes Adolfo J. Rumbos c Draft date April 29, 23 2 Contents Motivation for the course 5 2 Euclidean n dimensional Space 7 2. Definition of n Dimensional Euclidean Space...........

More information

Elements of Convex Optimization Theory

Elements of Convex Optimization Theory Elements of Convex Optimization Theory Costis Skiadas August 2015 This is a revised and extended version of Appendix A of Skiadas (2009), providing a self-contained overview of elements of convex optimization

More information

Course Summary Math 211

Course Summary Math 211 Course Summary Math 211 table of contents I. Functions of several variables. II. R n. III. Derivatives. IV. Taylor s Theorem. V. Differential Geometry. VI. Applications. 1. Best affine approximations.

More information

On the complexity of an Inexact Restoration method for constrained optimization

On the complexity of an Inexact Restoration method for constrained optimization On the complexity of an Inexact Restoration method for constrained optimization L. F. Bueno J. M. Martínez September 18, 2018 Abstract Recent papers indicate that some algorithms for constrained optimization

More information

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction

More information

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell DAMTP 2014/NA02 On fast trust region methods for quadratic models with linear constraints M.J.D. Powell Abstract: Quadratic models Q k (x), x R n, of the objective function F (x), x R n, are used by many

More information

A line-search algorithm inspired by the adaptive cubic regularization framework, with a worst-case complexity O(ɛ 3/2 ).

A line-search algorithm inspired by the adaptive cubic regularization framework, with a worst-case complexity O(ɛ 3/2 ). A line-search algorithm inspired by the adaptive cubic regularization framewor, with a worst-case complexity Oɛ 3/. E. Bergou Y. Diouane S. Gratton June 16, 017 Abstract Adaptive regularized framewor using

More information

Smooth Structure. lies on the boundary, then it is determined up to the identifications it 1 2

Smooth Structure. lies on the boundary, then it is determined up to the identifications it 1 2 132 3. Smooth Structure lies on the boundary, then it is determined up to the identifications 1 2 + it 1 2 + it on the vertical boundary and z 1/z on the circular part. Notice that since z z + 1 and z

More information

Università di Firenze, via C. Lombroso 6/17, Firenze, Italia,

Università di Firenze, via C. Lombroso 6/17, Firenze, Italia, Convergence of a Regularized Euclidean Residual Algorithm for Nonlinear Least-Squares by S. Bellavia 1, C. Cartis 2, N. I. M. Gould 3, B. Morini 1 and Ph. L. Toint 4 25 July 2008 1 Dipartimento di Energetica

More information

Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL)

Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL) Part 4: Active-set methods for linearly constrained optimization Nick Gould RAL fx subject to Ax b Part C course on continuoue optimization LINEARLY CONSTRAINED MINIMIZATION fx subject to Ax { } b where

More information

Lecture 5. Theorems of Alternatives and Self-Dual Embedding

Lecture 5. Theorems of Alternatives and Self-Dual Embedding IE 8534 1 Lecture 5. Theorems of Alternatives and Self-Dual Embedding IE 8534 2 A system of linear equations may not have a solution. It is well known that either Ax = c has a solution, or A T y = 0, c

More information

In English, this means that if we travel on a straight line between any two points in C, then we never leave C.

In English, this means that if we travel on a straight line between any two points in C, then we never leave C. Convex sets In this section, we will be introduced to some of the mathematical fundamentals of convex sets. In order to motivate some of the definitions, we will look at the closest point problem from

More information

Introduction to Nonlinear Optimization Paul J. Atzberger

Introduction to Nonlinear Optimization Paul J. Atzberger Introduction to Nonlinear Optimization Paul J. Atzberger Comments should be sent to: atzberg@math.ucsb.edu Introduction We shall discuss in these notes a brief introduction to nonlinear optimization concepts,

More information

McMaster University. Advanced Optimization Laboratory. Title: A Proximal Method for Identifying Active Manifolds. Authors: Warren L.

McMaster University. Advanced Optimization Laboratory. Title: A Proximal Method for Identifying Active Manifolds. Authors: Warren L. McMaster University Advanced Optimization Laboratory Title: A Proximal Method for Identifying Active Manifolds Authors: Warren L. Hare AdvOl-Report No. 2006/07 April 2006, Hamilton, Ontario, Canada A Proximal

More information

Ordinary Differential Equation Introduction and Preliminaries

Ordinary Differential Equation Introduction and Preliminaries Ordinary Differential Equation Introduction and Preliminaries There are many branches of science and engineering where differential equations arise naturally. Now days, it finds applications in many areas

More information

c 2000 Society for Industrial and Applied Mathematics

c 2000 Society for Industrial and Applied Mathematics SIAM J. OPIM. Vol. 10, No. 3, pp. 750 778 c 2000 Society for Industrial and Applied Mathematics CONES OF MARICES AND SUCCESSIVE CONVEX RELAXAIONS OF NONCONVEX SES MASAKAZU KOJIMA AND LEVEN UNÇEL Abstract.

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Topological properties

Topological properties CHAPTER 4 Topological properties 1. Connectedness Definitions and examples Basic properties Connected components Connected versus path connected, again 2. Compactness Definition and first examples Topological

More information

A FULL-NEWTON STEP INFEASIBLE-INTERIOR-POINT ALGORITHM COMPLEMENTARITY PROBLEMS

A FULL-NEWTON STEP INFEASIBLE-INTERIOR-POINT ALGORITHM COMPLEMENTARITY PROBLEMS Yugoslav Journal of Operations Research 25 (205), Number, 57 72 DOI: 0.2298/YJOR3055034A A FULL-NEWTON STEP INFEASIBLE-INTERIOR-POINT ALGORITHM FOR P (κ)-horizontal LINEAR COMPLEMENTARITY PROBLEMS Soodabeh

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

1. Introduction. We analyze a trust region version of Newton s method for the optimization problem

1. Introduction. We analyze a trust region version of Newton s method for the optimization problem SIAM J. OPTIM. Vol. 9, No. 4, pp. 1100 1127 c 1999 Society for Industrial and Applied Mathematics NEWTON S METHOD FOR LARGE BOUND-CONSTRAINED OPTIMIZATION PROBLEMS CHIH-JEN LIN AND JORGE J. MORÉ To John

More information

Lecture 7: CS395T Numerical Optimization for Graphics and AI Trust Region Methods

Lecture 7: CS395T Numerical Optimization for Graphics and AI Trust Region Methods Lecture 7: CS395T Numerical Optimization for Graphics and AI Trust Region Methods Qixing Huang The University of Texas at Austin huangqx@cs.utexas.edu 1 Disclaimer This note is adapted from Section 4 of

More information

Accelerating the cubic regularization of Newton s method on convex problems

Accelerating the cubic regularization of Newton s method on convex problems Accelerating the cubic regularization of Newton s method on convex problems Yu. Nesterov September 005 Abstract In this paper we propose an accelerated version of the cubic regularization of Newton s method

More information

MATH 167: APPLIED LINEAR ALGEBRA Chapter 3

MATH 167: APPLIED LINEAR ALGEBRA Chapter 3 MATH 167: APPLIED LINEAR ALGEBRA Chapter 3 Jesús De Loera, UC Davis February 18, 2012 Orthogonal Vectors and Subspaces (3.1). In real life vector spaces come with additional METRIC properties!! We have

More information