Second-order optimality and beyond: characterization and evaluation complexity in nonconvex convexly-constrained optimization

by C. Cartis, N. I. M. Gould and Ph. L. Toint

Report NAXYS, August 2016

University of Namur, 61, rue de Bruxelles, B-5000 Namur (Belgium)

Second-order optimality and beyond: characterization and evaluation complexity in convexly-constrained nonlinear optimization

C. Cartis, N. I. M. Gould and Ph. L. Toint

5 August 2016

Abstract

High-order optimality conditions for convexly-constrained nonlinear optimization problems are analyzed. A corresponding (expensive) measure of criticality for arbitrary order is proposed and extended to define high-order ǫ-approximate critical points. This new measure is then used within a conceptual trust-region algorithm to show that, if derivatives of the objective function up to order q can be evaluated and are Lipschitz continuous, then this algorithm applied to the convexly constrained problem needs at most O(ǫ^{-(q+1)}) evaluations of f and its derivatives to compute an ǫ-approximate q-th order critical point. This provides the first evaluation complexity result for critical points of arbitrary order in nonlinear optimization. An example is discussed showing that the obtained evaluation complexity bounds are essentially sharp.

Keywords: nonlinear optimization, high-order optimality conditions, complexity theory, machine learning.

1 Introduction

Recent years have seen a growing interest in the analysis of the worst-case evaluation complexity of nonlinear (possibly nonconvex) smooth optimization (for the nonconvex case only, see [ 7, 3, 5 9, 2 3, 34 36, 39, 43 47] among others). In general terms, this analysis aims at giving (sometimes sharp) bounds on the number of evaluations of a minimization problem's functions (objective and constraints, if relevant) and their derivatives that are, in the worst case, necessary for certain algorithms to find an approximate critical point for the unconstrained, convexly-constrained or general nonlinear optimization problem. It is not uncommon that such algorithms may involve possibly extremely costly internal computations, provided the number of calls to the problem functions is kept as low as possible.

Mathematical Institute, Oxford University, Oxford OX2 6GG, Great Britain. coralia.cartis@maths.ox.ac.uk. Numerical Analysis Group, Rutherford Appleton Laboratory, Chilton OX11 0QX, Great Britain. nick.gould@stfc.ac.uk. Namur Center for Complex Systems (naXys) and Department of Mathematics, University of Namur, 61, rue de Bruxelles, B-5000 Namur, Belgium. philippe.toint@unamur.be.

Most of the research to date focusses on finding first-, second- and, in one case, third-order critical points. Evaluation complexity for first-order critical points was first investigated, for the unconstrained case, by Nesterov in [38], and for first- and second-order critical points by Nesterov and Polyak [39] and by Cartis, Gould and Toint in [2]. Third-order critical points were studied in [1], motivated by highly-nonlinear problems in machine learning. However, the analysis of evaluation complexity for orders higher than three is missing both concepts and results.

The purpose of the present paper is to improve on this situation in two ways. The first is to review optimality conditions of arbitrary orders q for convexly-constrained minimization problems, and the second is to describe a theoretical algorithm whose behaviour provides, for this class of problems, the first evaluation complexity bounds for such arbitrary orders of optimality.

The paper is organized as follows. After the present introduction, Section 2 discusses some preliminary results on tensor norms, a generalized Cauchy-Schwarz inequality and high-order error bounds from Taylor series. Section 3 investigates optimality conditions for convexly-constrained optimization, while Section 4 proposes a trust-region based minimization algorithm for solving this class of problems and analyzes its evaluation complexity. An example is introduced in Section 5 to show that the new evaluation complexity bounds are essentially sharp. A final discussion is presented in Section 6.

2 Preliminaries

2.1 Basic notations

In what follows, y^T x denotes the Euclidean inner product of the vectors x and y of IR^n, and ‖x‖ = (x^T x)^{1/2} is the associated Euclidean norm. If T_1 and T_2 are tensors, T_1 ⊗ T_2 is their tensor product. B(x,δ) is the ball of radius δ centered at x. If X is a closed set, ∂X denotes its boundary and int(X) denotes its interior. The vectors {e_i}_{i=1}^n are the coordinate vectors in IR^n. The notation λ_min[M] stands for the leftmost eigenvalue of the symmetric matrix M. If {a_k} and {b_k} are two infinite sequences of non-negative scalars converging to zero, we say that a_k = o(b_k) if and only if lim_{k→∞} a_k/b_k = 0. The normal cone to a general convex set C at x ∈ C is defined by

N_C(x) def= { s ∈ IR^n | s^T (z − x) ≤ 0 for all z ∈ C }

and its polar, the tangent cone to C at x, by

T_C(x) = N_C(x)° def= { s ∈ IR^n | s^T v ≤ 0 for all v ∈ N_C(x) }.

Note that C − x ⊆ T_C(x) for all x ∈ C. We also define P_C[·] to be the orthogonal projection onto C and use the Moreau decomposition [37], which states that, for every x ∈ C and every y ∈ IR^n,

y = P_{T_C(x)}[y] + P_{N_C(x)}[y]  and  (P_{T_C(x)}[y])^T (P_{N_C(x)}[y]) = 0.    (2.1)

(See [2, Section 3.5] for a brief introduction to the relevant properties of convex sets and cones, or to [32, Chapter 3] or [42, Part I] for an in-depth treatment.)
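To make the tangent/normal cone machinery concrete, the following small sketch (an illustration added here, not part of the original report) checks the Moreau decomposition (2.1) for the simple convex set C = IR^n_+ (the nonnegative orthant) at x = 0, where T_C(0) = IR^n_+ and N_C(0) = IR^n_-, so the two projections are just the componentwise positive and negative parts of y.

```python
import numpy as np

# Moreau decomposition (2.1) illustrated for C = {x >= 0} at x = 0:
# T_C(0) is the nonnegative orthant, N_C(0) the nonpositive orthant,
# so P_T[y] = max(y, 0) and P_N[y] = min(y, 0) componentwise.

def proj_tangent(y):
    return np.maximum(y, 0.0)   # projection onto T_C(0) = IR^n_+

def proj_normal(y):
    return np.minimum(y, 0.0)   # projection onto N_C(0) = IR^n_-

rng = np.random.default_rng(0)
y = rng.standard_normal(5)
yT, yN = proj_tangent(y), proj_normal(y)

assert np.allclose(yT + yN, y)   # y = P_T[y] + P_N[y]
assert abs(yT @ yN) < 1e-12      # the two pieces are orthogonal
print("Moreau decomposition verified for y =", y)
```

For a general closed convex C the two projections are no longer available in closed form, but the identity itself is unchanged.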

2.2 Tensor norms and the generalized Cauchy-Schwarz inequality

We will make substantial use of tensors and their norms in what follows, and thus start by establishing some concepts and notation. If the notation T[v_1,...,v_j] stands for the tensor of order q − j resulting from the application of the q-th order tensor T to the vectors v_1,...,v_j, the (recursively induced (1)) Euclidean norm ‖·‖_q on the space of q-th order tensors is then given by

‖T‖_q def= max_{‖v_1‖=···=‖v_q‖=1} T[v_1,...,v_q].    (2.2)

(Observe that this value is always non-negative since we can flip the sign of T[v_1,...,v_q] by flipping that of one of the vectors v_i.) Note that the definition (2.2) implies that

‖T[v_1,...,v_j]‖_{q−j} = max_{‖w_1‖=···=‖w_{q−j}‖=1} T[v_1,...,v_j][w_1,...,w_{q−j}]
  = max_{‖w_1‖=···=‖w_{q−j}‖=1} T[ v_1/‖v_1‖, ..., v_j/‖v_j‖, w_1,...,w_{q−j} ] ( ∏_{i=1}^j ‖v_i‖ )
  ≤ ( max_{‖w_1‖=···=‖w_q‖=1} T[w_1,...,w_q] ) ∏_{i=1}^j ‖v_i‖
  = ‖T‖_q ∏_{i=1}^j ‖v_i‖,    (2.3)

a simple generalization of the standard Cauchy-Schwarz inequality for order-1 tensors (vectors) and of ‖Mv‖ ≤ ‖M‖ ‖v‖, which is valid for induced norms of matrices (order-2 tensors). Observe also that perturbation theory (see [33, Th. 7]) implies that ‖T‖_q is continuous as a function of T. If T is a symmetric tensor of order q, define

T[v]^q def= T[v,...,v]  (q times)

and the q-kernel of the multilinear q-form T[v]^q by

ker^q[T] def= { v ∈ IR^n | T[v]^q = 0 }

(see [8,9]). Note that, in general, ker^q[T] is a union of cones. Interestingly, the q-kernels are not only unions of cones but also subspaces for q = 1. However this is not true for general q-kernels, since both (1,0)^T and (0,1)^T belong to the 2-kernel of the symmetric 2-form x_1 x_2 on IR^2, but not their sum. We also note that, for symmetric tensors of odd order, T[v]^q = −T[−v]^q and thus that

min_{‖d‖≤1} T[d]^q = min_{‖d‖≤1} ( −T[−d]^q ) = min_{‖d‖≤1} ( −T[d]^q ) = −max_{‖d‖≤1} T[d]^q,    (2.4)

where we used the symmetry of the unit ball with respect to the origin to deduce the second equality.

(1) That it is the norm recursively induced by the standard Euclidean norm results from the observation that

max_{‖v_1‖=···=‖v_q‖=1} T[v_1,...,v_q] = max_{‖v_q‖=1} [ max_{‖v_1‖=···=‖v_{q−1}‖=1} T[v_1,...,v_{q−1}] ] [v_q].
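As a concrete illustration of definition (2.2) and the generalized Cauchy-Schwarz inequality (2.3), the sketch below (added for illustration; not from the original report) contracts a small symmetric third-order tensor with vectors and estimates ‖T‖_3 through the recursion of footnote (1). Random sampling only gives a lower bound on the true norm, so the final comparison is printed rather than asserted.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(1)
n = 3
# A small symmetric third-order tensor T (symmetrized random data).
A = rng.standard_normal((n, n, n))
T = sum(np.transpose(A, p) for p in permutations(range(3))) / 6.0

def apply(T, *vs):
    """Contract the first len(vs) indices of T with the given vectors."""
    out = T
    for v in vs:
        out = np.tensordot(out, v, axes=([0], [0]))
    return out

# Estimate ||T||_3 via the recursion of footnote (1):
# ||T||_3 = max_{||v||=1} ||T[v]||_2, where ||T[v]||_2 is a matrix spectral norm.
V = rng.standard_normal((20000, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)
norm3 = max(np.linalg.norm(apply(T, u), 2) for u in V)

# Generalized Cauchy-Schwarz (2.3): ||T[v1, v2]|| <= ||T||_3 ||v1|| ||v2||.
v1, v2 = rng.standard_normal(n), rng.standard_normal(n)
lhs = np.linalg.norm(apply(T, v1, v2))
rhs = norm3 * np.linalg.norm(v1) * np.linalg.norm(v2)
print(f"||T[v1,v2]|| = {lhs:.4f} <= {rhs:.4f} (up to sampling error in ||T||_3)")
```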

2.3 High-order error bounds from Taylor series

The tensors considered in what follows are symmetric and arise as high-order derivatives of the objective function f. For the p-th derivative of a function f : IR^n → IR to be Lipschitz continuous on the set S ⊆ IR^n, we require that there exists a constant L_{f,p} ≥ 0 such that, for all x, y ∈ S,

‖∇_x^p f(x) − ∇_x^p f(y)‖_p ≤ L_{f,p} ‖x − y‖,    (2.5)

where ∇_x^p h(x) is the p-th order symmetric derivative tensor of h at x. Let T_{f,p}(x,s) denote (2) the p-th order Taylor-series approximation to f(x+s) at some x ∈ IR^n, given by

T_{f,p}(x,s) def= f(x) + Σ_{j=1}^p (1/j!) ∇_x^j f(x)[s]^j,    (2.6)

and consider the Taylor identity

φ(1) − t_k(1) = (1/(k−1)!) ∫_0^1 (1−ξ)^{k−1} [ φ^{(k)}(ξ) − φ^{(k)}(0) ] dξ    (2.7)

involving a given univariate C^k function φ(α) and its k-th order Taylor approximation t_k(α) = Σ_{i=0}^k φ^{(i)}(0) α^i / i!, expressed in terms of the i-th derivatives φ^{(i)}, i = 0,...,k. Let x, s ∈ IR^n. Then, picking φ(α) = f(x+αs) and k = p, it follows immediately from the fact that t_p(1) = T_{f,p}(x,s) that the identity (2.3), (2.5), (2.6), (2.7) and

∫_0^1 (1−ξ)^{p−1} dξ = 1/p    (2.8)

imply that, for all x, s ∈ IR^n,

f(x+s) ≤ T_{f,p}(x,s) + (1/(p−1)!) ∫_0^1 (1−ξ)^{p−1} | ∇_x^p f(x+ξs)[s]^p − ∇_x^p f(x)[s]^p | dξ
  ≤ T_{f,p}(x,s) + (1/p!) ‖s‖^p max_{ξ∈[0,1]} ‖∇_x^p f(x+ξs) − ∇_x^p f(x)‖_p    (2.9)
  ≤ T_{f,p}(x,s) + (L_{f,p}/p!) ‖s‖^{p+1}.

Similarly,

f(x+s) ≥ T_{f,p}(x,s) − (1/p!) ‖s‖^p max_{ξ∈[0,1]} ‖∇_x^p f(x+ξs) − ∇_x^p f(x)‖_p ≥ T_{f,p}(x,s) − (L_{f,p}/p!) ‖s‖^{p+1}.    (2.10)

Inequalities (2.9) and (2.10) will be useful in our developments below, but we immediately note that they in fact depend only on the weaker requirement that

max_{ξ∈[0,1]} ‖∇_x^p f(x+ξs) − ∇_x^p f(x)‖_p ≤ L_{f,p} ‖s‖    (2.11)

for all x and s of interest, rather than relying on (2.5).

(2) Unfortunately, double indices are necessary for most of our notation, as we need to distinguish both the function to which the relevant quantity is associated (the first index) and its order (the second index).
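The error bounds (2.9)-(2.10) are easy to verify numerically in one dimension, which is already an instance of the general statement. The sketch below (an added illustration, not from the report) builds the Taylor model (2.6) of f(t) = cos(t) with p = 3; since the third derivative sin(t) has Lipschitz constant 1, the bound reads |f(x+s) − T_{f,3}(x,s)| ≤ (1/3!) |s|^4.

```python
import numpy as np

# Taylor model (2.6) of f(t) = cos(t) at x, with p = 3:
#   T_{f,3}(x,s) = cos(x) - sin(x) s - cos(x) s^2/2 + sin(x) s^3/6.
# The third derivative, sin(t), is Lipschitz with constant L_{f,3} = 1,
# so (2.9)-(2.10) give |f(x+s) - T_{f,3}(x,s)| <= (L_{f,3}/3!) |s|^4.

def taylor3(x, s):
    return np.cos(x) - np.sin(x) * s - np.cos(x) * s**2 / 2 + np.sin(x) * s**3 / 6

x, L = 0.7, 1.0
s = np.linspace(-1.0, 1.0, 2001)
err = np.abs(np.cos(x + s) - taylor3(x, s))
bound = (L / 6.0) * np.abs(s) ** 4
assert np.all(err <= bound + 1e-15)
print("max error:", err.max(), " max bound:", bound.max())
```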

3 Unconstrained and convexly constrained problems

The problem we wish to solve is formally described as

min_{x ∈ F} f(x)    (3.1)

where we assume that f : IR^n → IR is q-times continuously differentiable and bounded from below, and that f has Lipschitz continuous derivatives of orders 1 to q. We also assume that the feasible set F is closed, convex and non-empty. Note that this formulation covers unconstrained optimization (F = IR^n), as well as standard inequality (and linear equality) constrained optimization in its different forms: the set F may be defined by simple bounds, and/or by polyhedral or more general convex constraints. We are tacitly assuming here that the cost of evaluating values and derivatives of the constraint functions possibly involved in the definition of F is negligible.

3.1 High-order optimality conditions

Given that our ambition is to work with high-order models, it seems natural to aim at finding high-order local minimizers. As is standard, we say that x_* is a local minimizer of f if and only if there exists a (sufficiently small) neighbourhood B of x_* such that

f(x) ≥ f(x_*) for all x ∈ B ∩ F.    (3.2)

However, we must immediately remember important intrinsic limitations. These are exemplified by the smooth two-dimensional problem

min_{x ∈ IR^2} f(x) = { x_2 ( x_2 − e^{−1/x_1^2} )  if x_1 ≠ 0,
                        x_2^2                       if x_1 = 0,    (3.3)

which is a simplified version of a problem stated by Hancock nearly a century ago [3, p. 36], itself a variation of a famous problem stated even earlier by Peano [4, Nos ...]. The contour lines of its objective function are shown in Figure 3.1 on the following page.

The first conclusion which can be drawn by examining this example is that, in general, assessing that a given point x_* (the origin in this case) is a local minimizer needs more than verifying that every direction from this point is an ascent direction. Indeed, this latter property holds in the example, but the origin is not a local minimizer (it is a saddle point). This is caused by the fact that objective function decrease may occur along specific curves starting from the point under consideration, and these curves need not be lines (such as x(α) = x_* + α e_1 + ½ e^{−1/α^2} e_2 for α > 0 in the example).

The second conclusion is that the characterization of a local minimizer cannot always be translated into a set of conditions only involving the Taylor expansion of f at x_*. In our example, the difficulty arises because the coefficients of the Taylor expansion of e^{−1/x_1^2} all vanish as x_1 approaches the origin, and therefore the (non-)minimizing nature of this point cannot be determined from the values of these coefficients. Thus the gap between necessary and sufficient optimality conditions cannot be closed if one restricts one's attention to using derivatives of the objective function at a putative solution of problem (3.1).
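The phenomenon in example (3.3) is easy to reproduce numerically. The sketch below (added here for illustration; the exact constants of (3.3) and of the descent curve are reconstructed from the garbled source, so f below should be read as a representative of that family) evaluates f along straight lines through the origin, where it is nonnegative, and along the curve x_2 = ½ e^{−1/x_1^2}, where it is strictly negative, confirming that the origin is a saddle point even though no line of descent exists.

```python
import numpy as np

def f(x1, x2):
    # Simplified Hancock-type example (3.3), as reconstructed here:
    # f(x) = x2 * (x2 - exp(-1/x1^2)) for x1 != 0, and x2^2 for x1 = 0.
    x1 = np.asarray(x1, dtype=float)
    safe = np.where(x1 != 0.0, x1, 1.0)
    expterm = np.where(x1 != 0.0, np.exp(-1.0 / safe**2), 0.0)
    return x2 * (x2 - expterm)

alphas = np.linspace(0.15, 0.5, 40)

# Along straight lines x(alpha) = alpha*(d1, d2), f is nonnegative ...
for d1, d2 in [(1.0, 0.0), (1.0, 1.0), (1.0, -2.0), (0.0, 1.0)]:
    assert np.all(f(alphas * d1, alphas * d2) >= -1e-15)

# ... but along the curve x(alpha) = (alpha, 0.5*exp(-1/alpha^2)) it is negative:
curve_vals = f(alphas, 0.5 * np.exp(-1.0 / alphas**2))
assert np.all(curve_vals < 0)
print("min of f along the descent curve:", curve_vals.min())
```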

7 Cartis, Gould, Toint: high-order optimality and complexity Figure 3.: The contour lines of the objective function in (3.3) Note that worse situations may also occur, for instance if we consider the following variation on Hancoc simplified example (3.3): { ) min x IR 2f(x) = x 2 (x 2 sin(/x)e /x2 if x, (3.4) x 2 2 if x =, for which no continuous descent path exists in a neighbourhood of the origin despite the origin not being a local minimizer. 3.. Necessary conditions for convexly constrained problems The above examples show that fully characterizing a local minimizer in terms of general continuous descent paths is in general impossible. However, the fact that no such path exists remains a necessary condition for such points, even if Hancoc s example shows that these paths may not be amenable to a characterization using path derivatives. In what follows, we therefore propose derivative-based necessary optimality conditions by focussing on a specific (yet reasonably general) class of descent paths x(α) of the form x(α) = x + q α i s i +o(α q ) (3.5) where α >. Define the q-th order descriptor set of F at x by D q def F (x) = { } q (s,...,s q ) IR n q x+ α i s i F for all α ς ς> i= i= (3.6) Note that D F (x) = T F(x), the standard tangent cone to F at x. We say that a feasible curve (3) x(α) is tangent to D q F (x) if (3.5) holds for some (s,...,s Q ) D q F (x). (3) Or arc, or path.

8 Cartis, Gould, Toint: high-order optimality and complexity 7 Note that the definition (3.6) implies that s i T F (x ) for i {,...,u} (3.7) where s u is the first nonzero s l. We now consider some conditions that preclude the existence of feasible descent paths of the form (3.5). These conditions involve the index sets P(j,) defined, for j, by P(j,) def = {(l,...,l ) {,...,j} For j 4, these are given by Table 3.2. l i = j}. (3.8) j {()} 2 {(2)} {(,)} 3 {(3)} {(,2),(2,)} {(,,)} 4 {(4)} {(,3),(2,2),(3,))} {(,,2),(,2,),(2,,)} {(,,,)} i= Table 3.2: The sets P(j,) for j 4 We now state necessary conditions for x to be a local minimizer. Theorem 3. Suppose that f is q times continuously differentiable in an open neighbourhood of x and that x is a local minimizer for problem (3.). If x F, suppose furthermore that a constraint qualification holds in the sense that every feasible arc (3.5) starting from x is tangent to D q F (x ). Then, for j {,...,q}, the inequality j =! (l,...,l ) P(j,) xf(x )[s l,...,s l ] (3.9) holds for all (s,...,s j ) D j F (x ) such that, for i {,...,j }, i =! (l,...,l ) P(i,) xf(x )[s l,...,s l ] =. (3.) Proof. Consider feasible paths of the form (3.5). Substituting this relation in the expression f(x(α)) f(x ) (given by (3.2)) and collecting terms of equal degree in α, we obtain that, for sufficiently small α, ( ) q j f(x(α)) f(x ) = α j! xf(x )[s l,...,s l ] +o(α q ) (3.) j= = (l,...,l ) P(j,)

9 Cartis, Gould, Toint: high-order optimality and complexity 8 where P(i,) is defined in (3.8). For this to be true, we need each coefficient of α j to be non-negative on the zero set of the coefficients,...,j, subject to the requirement that the arc (3.5) must be feasible for α sufficiently small. Assume now that s T and that (3.) holds for i =. This latter condition request s to be in the zero set of the coefficient in α in (3.), that is s T er [ xf(x )]. Then the coefficient of α 2 in (3.) must be non-negative, which yields, using P(2,) = {(2)}, P(2,2) = {()} (see Table 3.2), that which is (3.9) for q = 2. xf(x )[s 2 ]+ 2 2 xf(x )[s ] 2. (3.2) We may then proceed in the same manner for all coefficients up from order 3 to q, each time considering them in the zero set of the previous coefficients (that is (3.)) and verify that (3.) directly implies (3.9). Following a long tradition, we say that x is a q-th order critical point for problem (3.) if the conclusions of this theorem hold for j {,...,q}. Of course, a q-th order critical point need not be a local minimizer, but every local minimizer is a q-th order critical point. Note that the constraint qualification assumption automatically holds if F is defined by a set of explicit polynomial inequalities and/or linear equations. Also note that, as the order j grows, (3.9) may be interpreted as imposing a condition on s j (via xf(x )[s j ]), given the directions {s i } j i= satisfying (3.). In more general situations, the fact that conditions (3.9) and (3.) not only depends on the behaviour of the objective function in some well-chosen subspace, but involves the geometry of the all possible feasible arcs maes the second-order condition (3.2) difficult to use, particular in the case where F IR n. In what follows we discuss, as far as we currently can, two resulting questions of interest.. Are they cases where these conditions reduce to checing homogeneous polynomials involving the objective function s derivatives on a subspace? 2. If that is not the case, are they circumstances in which not only the complete left-hand side of (3.) vanishes, but also each term of this left-hand side? We start by deriving useful consequences of Theorem 3.. Corollary 3.2 Suppose that the assumptions of Theorem 3. hold and let N be the normal cone to F at x and T the corresponding tangent cone. Then xf(x ) N (3.3) and Moreover s T span { xf(x ) } T and s 2 T. xf(x )[s i ] (i =,2). (3.4)

10 Cartis, Gould, Toint: high-order optimality and complexity 9 Proof. First observe that (3.9) for j = reduces to xf(x )[s ] = for all s T. Thus (3.3) holds. Also note that (3.9)-(3.) impose that s T er [ xf(x )] = T span { xf(x ) }, (3.5) which, because of (3.3) and the polarity of N and T, yields that s belongs to T. Assume now that s 2 T. Then, for all α sufficiently small, αs +α 2 s 2 does not belong to T and thus x(α) = x +αs +α 2 s 2 +o(α 2 ) cannot belong to F, which is a contradiction. Hence s 2 T and (3.4) follows for i = 2, while it follows from s T and (3.3) for i =. The first-order necessary condition (3.3) is well-nown for general first-order minimizers (see [4, Th. 2.9, p. 353] for instance). Consider now the second-order conditions (3.2). If F = IR n (or if the convex constraints are inactive at x ), then xf(x ) = because of (3.3) and (3.2) is nothing but the familiar condition that the Hessian of the objective function must be positive semi-definite. If x happens to lie on the boundary of F and xf(x ), (3.2) indicates that the effect of the curvature of the boundary of F may be represented by the term xf(x )[s 2 ], which is non-negative because of (3.4). Consider, for example, the problem min x F IR 2x where F = {x IR n x 2 x2 2 }, whose global solution is at the origin. In this case it is easy to chec that xf() = e N = span{ e }, that 2 xf() =, and that second-order feasible arcs of the form (3.5) with x() = may be chosen with s = ±e 2 and s 2 = βe where β. This imposes 2 2 xf()[s ] 2, which (unsurprisingly) holds. Interestingly, there are cases where the geometry of the set of locally feasible arcs is simple and manageable. In particular, suppose that the boundary of F is locally polyhedral. Then, given xf(x ), either T span { xf(x ) } =, in which case conditions (3.9) and (3.) are void, or there exists d in that subspace. It then possible to define a locally feasible arc with s = d and s 2 = = s q =. As a consequence, the smallest possible value of xf(x )[s 2 ] for feasible arcs starting from x is identically zero and this term therefore vanishes from (3.9)-(3.). Morever, because of the definition of P(, j) (see Table 3.2), all terms but that in j xf(x )[s ] j also vanish from these conditions, which then simplify to ( j ) j xf(x )[s ] j for all s T er i [ i xf(x )] (3.6) for j = 2,...,q, which is a condition only involving subspaces and (for i 2) cones (for i 3) Necessary conditions for unconstrained problems Consider now the case where x belongs to F, which is obviously the case if the problem is unconstrained. Then we have that the path (3.5) is unrestricted, D q F (x ) = IR n, and one is then free to choose the vectors {s i } q i= (and their sign) arbitrarily. Note first that, since N = {}, (3.3) implies that, unsurprisingly, xf(x ) =. i=
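The index sets P(j,k) of (3.8), whose elements weight the derivative tensors in the necessary conditions (3.9)-(3.10), are simply the ordered compositions of j into k positive parts. A short sketch (an added illustration, not from the report) that enumerates them and reproduces Table 3.2:

```python
from itertools import product

def P(j, k):
    """All (l_1,...,l_k) with l_i in {1,...,j} and l_1 + ... + l_k = j,
    i.e. the ordered compositions of j into k positive parts, as in (3.8)."""
    return [c for c in product(range(1, j + 1), repeat=k) if sum(c) == j]

# Reproduce Table 3.2 for 1 <= k <= j <= 4.
for j in range(1, 5):
    print(j, {k: P(j, k) for k in range(1, j + 1)})
# e.g. P(4, 2) = [(1, 3), (2, 2), (3, 1)], matching the table.
```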

11 Cartis, Gould, Toint: high-order optimality and complexity For the second-order condition, we obtain from (3.9), again unsurprisingly, that, because er [ xf(x ] = IR n, 2 xf(x ) is positive semi-definite on IR n. Hence, if there exists a vector s er 2 [ 2 xf(x )], we have that [ 2 xf(x )] 2s = and therefore that 2 xf(x )[s,s 2 ] = for all s 2 IR n. Thus the term for = vanishes from (3.9), as well as all terms involving 2 xf(x ) applied to a vector s er 2 [ 2 xf(x )]. This implies in particular that the third-order condition may now be written as 3 xf(x )[s ] 3 = for all s er 2 [ 2 xf(x )], (3.7) where the equality is obtained by considering both s and s. Unfortunately, complications arise with fourth-order conditions, even when the objective function is a polynomial. Consider the following variant of Peano s [4] problem: min x IR 2f(x) = x2 2 κ x 2 x 2 +κ 2 x 4, (3.8) where κ and κ 2 are parameters. Then one can verify that ( ) xf() =, 2 xf() = 2 and Hence { [ 3 2κ for (i,j,) {(,2,),(,,2),(2,,)} xf()] ij = otherwise, { [ 4 24κ2 for (i,j,,l) = (,,,) xf()] ijl = otherwise. er [ xf()] = IR 2, er 2 [ 2 xf()] = span{e }, and er 3 [ 3 xf()] = span{e } span{e 2 }. The necessary condition (3.9) then state that, if the origin is a minimizer, then, using the arc defined by s = e and s 2 = 2 κ e 2, 2 2 xf()[s 2 ] xf()[s,s,s 2 ] xf()[s ] 4 = 4 κ2 2 κ2 +κ 2 = 4 κ2 +κ 2. This shows that the condition 4 xf(x )[s ] 4 on 3 i= eri [ i xf(x )], although necessary, is arbitrarily far away from the weaer necessary condition 2 2 xf()[s 2 ] xf()[s,s,s 2 ] xf()[s ] 4 (3.9) when κ grows. As was already the case for problem (3.3), the example for κ = and κ 2 = 2, say, shows that a function may admit a saddle point (x = ) which is a maximum (x = ) along a curve (x 2 = 3 2 x2 in this case) while at the same time be minimal along every line passing through x. Figure 3.2 shows the contour lines of the objective function of (3.8) for increasing values of κ 2, eeping κ = 3. One may attribute the problem that not every term in (3.9) vanishes to the fact that switching signs of s or s 2 does imply that any of the terms in (3.9) is zero (as we have verified) because of the terms 2 xf()[s 2 ] 2 and 4 xf()[s ] 4. Is this a feature of even orders

12 Cartis, Gould, Toint: high-order optimality and complexity *3mm *3mm Figure 3.2: The contour lines of (3.8) for κ = 3 and κ 2 = 2 (left), κ 2 = 2.25 (center) and κ 2 = 2.5 (right) only? Unfortunately, this not the case for q = 7. Indeed is it not difficult to verify that the terms whose multi-index (l,...,l ) is a permutation of (,2,2,2) belong to P(7,4) and those whose multi-index is a permutation of (,,,,,2) belong to P(7,6). Moreover, the contribution of these terms to the sum (3.9) cannot be distinguished by varying s or s 2, for instance by switching their signs as this technique yields only one equality in two unnowns. In general, we may therefore conclude that (3.9) must involve a mixture of terms with derivative tensors of various degrees A sufficient condition Despite the limitations we have seen when considering the simplified Hancoc example, we may still derive a sufficient condition for x to be an isolated minimizer, which is inspired by the standard second-order case (see Theorem 2.4 in Nocedal and Wright [4] for instance). Theorem 3.3 Suppose that f is q times continuously differentiable in an open neighbourhood of x F. If x F, suppose also that a constraint qualification holds in the sense that every feasible arc starting from x is tangent to D p F (x ). Let T be the tangent cone to F at x. If there exists an q [,p ] such that, for all s T, i xf(x )[s] i = (i =,...,q) and q+ x f(x )[s] q+ >, (3.2) then x is an isolated minimizer for problem (3.). Proof. The second part of condition(3.2) and the continuity of the(q+)-th derivative imply that q+ x f(z)[s] q+ > (3.2) for all s T and all z is a sufficiently small feasible neighbourhood of x. Now, using Taylor s expansion, we obtain that, for all s T and all τ (,), f(x +τs) f(x ) = q i= τ i i! i xf(x )[s] i + τq+ (q +)! q+ x f(z)[s] q+

13 Cartis, Gould, Toint: high-order optimality and complexity 2 for some z [x,x +τs]. If τ is sufficiently small, then this equality, the first part (3.2) and (3.2) ensure that f(x +τs) > f(x ). Since this strict inequality holds for all s T and all τ sufficiently small, x must be a feasible isolated minimizer. Observe that, in Peano s example (see (3.8) with κ = 3 and κ 2 = 2), we have that the curvature of the objective function is positive along every line passing through the origin, but that the order of the curvature varies with s (second order along s = e 2 and fourth order along s = e ), which precludes applying Theorem An approach using Taylor models As already noted, the conditions expressed in Theorem 3. may, in general, be very complicated to verify in an algorithm, due to their dependence on the geometry of the set of feasible paths. To avoid this difficulty, we now explore a different approach. Let us now define, for some (,] and some j {,...,p}, φ def f,j (x) = f(x) globmint f,j (x,d), (3.22) x+d F d the smallest value of the j-th order Taylor model T f,j (x,s) achievable by a feasible point at distance at most from x. Note that φ f,j (x) is a continuous function of x and for given F and f (see [33, Th. 7]). The introduction of this quantity is in part motivated by the following theorem. Theorem 3.4 Suppose that f is q times continuously differentiable in an open neighbourhood of x. If x F, suppose furthermore that a constraint qualification holds in the sense that every feasible arc starting from x is tangent to D q F (x). Define Z f,j def F (x) = {(s,...,s j ) D j F (x) (s,...,s i ) satisfy (3.) (at x) for i {,...,j } }. Then φ f,j lim (x) j = implies that (3.9) holds (at x) for all (s,...,s j ) Z f,j F (x). Proof. We start by rewriting the power series (3.) for degree j, for any given arc x(α) tangent to D j F (x) in the form f(x(α)) f(x) = j c i α i +o(α j ) = T f,j (x,s(α)) f(x) (3.23) i= where s(α) def = x(α) x and c i def = i =! (l,...,l ) P(i,) xf(x)[s l,...,s l ],

14 Cartis, Gould, Toint: high-order optimality and complexity 3 and where the last equality in (3.23) holds because f and T f,j share the first j derivatives at x. This reformulation allows us to write that, for i {,...,j}, c i = d i [ Tf,j i! dα i (x,s(α)) f(x) ]. (3.24) α= Assume now there exists an (s,...,s j ) Z f,j F (x) such that (3.9) does not hold. In the notation just introduced, this means that, for this particular (s,...,s j ), Then, from (3.24), c i = for i {,...,j } and c j <. d i [ Tf,j dα i (x,s(α)) f(x) ] = for i {,...,j }, (3.25) α= and thus the first (j ) coefficients of the polynomial T f,j (x,s(α)) f(x) vanish. Thus, using (3.23), d j [ Tf,j dα j (x,s(α)) f(x) ] T f,j (x,s(α)) f(x) = j! lim α= α α j. (3.26) Now let i be the index of the first nonzero s i. Note that i {,...,j} since otherwise the structure of the sets P(i,) implies that c j =. Observe also that we may redefine the α parameter to α s i /i so that we may assume, without loss of generality that s i =. As a consequence, we obtain that, for sufficiently small α, s(α) 3 2 αi 3 α. (3.27) 2 Hence, successively using the facts that c j <, that (3.24) and (3.26) hold for all arcs x(α) tangent to D q F (x), and that (3.27) and (3.22) hold, we may deduce that < c j j! j! f(x) T f,j (x,s(α)) lim α α j f(x) T f,j (x,s(α)) ( 3 2 )j lim α s(α) j f(x) T f,j (x,s(α)) = ( 3 2 )j lim s(α) s(α) j ( 3 2 )j lim φ f,j (x) j. The conclusion of the theorem immediately follows since lim φ f,j (x) j =. This theorem has a useful consequence.

15 Cartis, Gould, Toint: high-order optimality and complexity 4 Corollary 3.5 Suppose that f is q times continuously differentiable in an open neighbourhood of x. If x F, suppose furthermore that a constraint qualification holds in the sense that every feasible arc starting from x is tangent to D q F (x). Then x is a q-th order critical point for problem (3.) if φ f,j lim (x) j = for j {,...,q}. (3.28) Proof. We successively apply Theorem 3.4 q times and deduce that x is a j-th order critical point for j =,...,q. This last result says that we may avoid the difficulty of dealing the the possibly complicated geometry of D q F (x) if we are ready to perform the global optimization occurring in (3.22) exactly and find a way to compute or overestimate the limit in (3.28). Although this is a positive conclusion, these two remaining challenges remain daunting. However, it worthwhile noting that the standard approach to computing first- second- and third-order criticality measures for unconstrained problems follows the exact same approach. In the first-order case, it is easy to verify that xf(x) = [ ] [ min d xf(x)[d] = ( ) ] f(x) globmin f(x)+ xf(x)[d] d where the first equality is justified by the convexity of xf(x)[d] as a function of d. Because the left-hand side of the above relation is independent of, the computation of the limit (3.28) for tending to zero is trivial when j = and the limiting value is x f(x). For the second-order case, assuming xf(x) =, [ [ )] min,λ min [ xf(x)]] 2 = 2 2 globmin d 2 xf(x)[d] 2 = 2 [f(x) globmin d ( f(x)+ xf(x)[d]+ 2 2 xf(x)[d] 2)], (3.29) the first global optimization problem being easily solvable by a trust-region-type calculation [2, Section 7.3] or directly by an equivalent eigenvalue analysis. As for the first-order case, the left-hand side of the equation is independent of and obtaining the limit for tending to zero is trivial. Finally, if M(x) def = er[ xf(x)] er[ 2 xf(x)] and P M(x) is the orthogonal projection onto that subspace, P M(x) ( 3 xf(x)) = 6 3 [ min d xf(x)[d] ] = 3 [ f(x) globmin d ( f(x)+ xf(x)[d]+ 2 2 xf(x)[d] xf(x)[d] 3 )] (3.3) where the first equality results from(2.2). In this case, the global optimization in the subspace M(x) is potentially harder to solve exactly (a randomization argument is used in [] to derive

16 Cartis, Gould, Toint: high-order optimality and complexity 5 a upper bound on its value), although it still involves a subpace (4). While we are unaware of a technique for maing the global minimization in (3.22) easy in the even more complicated general case, we may thin of approximating the limit in (3.28) by choosing a (user-supplied) value of > small enough (5) and consider the size of the quantity (x). (3.3) jφ f,j Unfortunately, it is easy to see that, if is fixed at some positive value, a zero value of φ f,j (x) alone is not a necessary condition for x being a local minimizer. Indeed consider the univariate problem of minimizing f(x) = x 2 ( αx) for α >. One verifies that, for any >, the choice α = 2/ yields that φ f, () =, φ f,2 () = but φ f,3 () = 4 >, (3.32) α2 despite being a local (but not global) minimizer. As a matter of fact, φ f,j (x) gives more information than the mere potential proximity of a j-th order critical point: it is able to see beyond an infinitesimal neighbourhood of x and provides information on possible further descent beyond such a neighbourhood. Rather than a true criticality measure, it can be considered, for fixed, as an indicator of further progress, but its use for terminating at a local minimizer is clearly imperfect. Despite this drawbac, the above arguments would suggest that it is reasonable to consider a (conceptual) minimization algorithm whose objective is to find a point x ǫ such that φ f,j (x ǫ) ǫ j for j =,...,q (3.33) for some (,] sufficiently small and some q {,...,p}. This condition implies an approximate minimizing property which we mae more precise by the following result. Theorem 3.6 Suppose that f is q times continuously differentiable and that q xf is Lipschitz continous with constant L f,q (in the sense of (2.5)) in an open neighbourhood of x ǫ of radius larger than. Suppose also (3.33) holds for j = q. Then ( ) p!ǫ f(x ǫ +d) f(x ǫ ) 2ǫ q q q+ for all x ǫ +d F such that d min. L f,p (3.34) Proof. Consider x+d F. Using the triangle inequality, we have that f(x ǫ +d) = f(x ǫ +d) T f,q (x ǫ,d)+t f,q (x ǫ,d) f(x ǫ +d) T f,q (x ǫ,d) +T f,q (x ǫ,d). (3.35) Now, condition (3.33) for j = q, implies that, if d, T f,q (x ǫ,d) T f,q (x ǫ,) ǫ q = f(x ǫ ) ǫ q. (3.36) (4) We saw in Section 3..2 that q = 3 is the highest order for which this is possible. (5) Note that a small has the advantage of limiting the global optimization effort.

17 Cartis, Gould, Toint: high-order optimality and complexity 6 Hence, substituting (2.) and (3.36) into (3.35) and using again that d <, we deduce that f(x ǫ +d) f(x ǫ ) L f,p d q+ ǫ q, p! and the desired result follows. The size of the neighbourhood of x ǫ where f is locally smallest in that the first part (3.34) holds therefore increases with the criticality order q, a feature potentially useful in various contexts such as global optimization. Before turning to more algorithmic aspects, we briefly compare the results of Theorem 3.6 which what can be deduced on the local behaviour of the Taylor series T f,q (x,s) if, instead of requiring the exact necessary condition (3.9) to hold exactly, this condition is relaxed to j! xf(x )[s l,...,s l ] ǫ (3.37) = (l,...,l ) P(j,) while insisting that (3.) should hold exactly. If j = q =, it is easy to verify that (3.37) for s T is equivalent to the condition that P T [ xf(x )] ǫ, (3.38) from which we deduce, using the Cauchy-Schwarz inequality, that T f, (x,s) T f, (x,) ǫ (3.39) for all s T with d, that is (3.33) for j =. Thus, by Theorem 3.6, we obtain that (3.34) holds for j =. 4 A trust-region minimization algorithm Aware of the optimality conditions and their limitations, we may now consider an algorithm to achieve (3.33). This objective naturally suggests a trust-region (6) formulation with adaptative model degree, in which the user specifies a desired criticality order q, assuming that derivatives of order,...,q are available when needed. We made this idea explicit in Algorithm 4. on the next page. We first state a useful property of Algorithm 4., which ensures that a fixed fraction of the iterations,2,..., must be either successful or very successful. Indeed, if we define the following bound holds. S def = {l {,...,} ρ l η }, (6) A detailed account and a comprehensive bibliography on trust-region methods can be found in [2].

Algorithm 4.1: Trust-region algorithm using adaptive order models for convexly-constrained problems (TRq)

Step 0: Initialization. A criticality order q, an accuracy threshold ǫ ∈ (0,1], a starting point x_{-1} and an initial trust-region radius Δ_0 ∈ [ǫ, 1] are given, as well as algorithmic parameters Δ_max ∈ [Δ_0, 1], 0 < γ_1 ≤ γ_2 < 1 ≤ γ_3 and 0 < η_1 ≤ η_2 < 1. Compute x_0 = P_F[x_{-1}], evaluate f(x_0) and set k = 0.

Step 1: Step computation. For j = 1,...,q,
1. Evaluate ∇^j f(x_k) and compute φ_{f,j}^{Δ_k}(x_k) from (3.22).
2. If φ_{f,j}^{Δ_k}(x_k) > ǫ Δ_k^j, go to Step 3 with s_k = d, where d is the argument of the global minimum in the computation of φ_{f,j}^{Δ_k}(x_k).

Step 2: Termination. Terminate with x_ǫ = x_k and Δ_ǫ = Δ_k.

Step 3: Accept the new iterate. Compute f(x_k + s_k) and

ρ_k = [ f(x_k) − f(x_k + s_k) ] / [ T_{f,j}(x_k, 0) − T_{f,j}(x_k, s_k) ].    (4.1)

If ρ_k ≥ η_1, set x_{k+1} = x_k + s_k. Otherwise set x_{k+1} = x_k.

Step 4: Update the trust-region radius. Set

Δ_{k+1} ∈ [γ_1 Δ_k, γ_2 Δ_k]                 if ρ_k < η_1,
Δ_{k+1} ∈ [γ_2 Δ_k, Δ_k]                     if ρ_k ∈ [η_1, η_2),
Δ_{k+1} ∈ [Δ_k, min(Δ_max, γ_3 Δ_k)]          if ρ_k ≥ η_2,    (4.2)

increment k by one and go to Step 1.
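Algorithm 4.1 is conceptual because of the global optimization hidden in φ_{f,j}^{Δ}(x), but its mechanics are easy to sketch for small problems. The code below is an illustrative toy under simplifying assumptions, not the authors' implementation: the objective, the box F and all parameter values are made up, q = 2, exact derivative tensors are supplied by hand, and the global minimization required by (3.22) is replaced by crude feasible sampling in the trust region.

```python
import numpy as np

# Toy instance: min f(x) over the box F = [0, 2]^2, with an illustrative f.
lo, hi = np.zeros(2), 2.0 * np.ones(2)

def f(x):
    return (x[0] - 1.0) ** 4 + x[0] * x[1] + (x[1] + 0.5) ** 2

def grad(x):
    return np.array([4 * (x[0] - 1.0) ** 3 + x[1], x[0] + 2 * (x[1] + 0.5)])

def hess(x):
    return np.array([[12 * (x[0] - 1.0) ** 2, 1.0], [1.0, 2.0]])

def taylor(x, d, j):
    """Taylor model T_{f,j}(x,d) of order j (here j = 1 or 2), cf. (2.6)."""
    t = f(x) + grad(x) @ d
    if j >= 2:
        t += 0.5 * d @ hess(x) @ d
    return t

def phi(x, j, delta, nsamples=20000, rng=np.random.default_rng(0)):
    """Crude sampling approximation of phi_{f,j}^delta(x) in (3.22):
    f(x) minus the smallest Taylor-model value over feasible d, ||d|| <= delta."""
    d = rng.uniform(-delta, delta, size=(nsamples, 2))
    d = d[np.linalg.norm(d, axis=1) <= delta]
    d = d[np.all((x + d >= lo) & (x + d <= hi), axis=1)]
    vals = np.array([taylor(x, di, j) for di in d])
    best = int(np.argmin(vals))
    return f(x) - vals[best], d[best]

# TRq loop (Algorithm 4.1) with q = 2 and standard-looking parameters.
q, eps = 2, 1e-2
gamma1, gamma2, gamma3, eta1, eta2, dmax = 0.25, 0.5, 2.0, 0.1, 0.9, 1.0
x, delta = np.clip(np.array([1.8, 1.7]), lo, hi), dmax   # Step 0: project start
for k in range(100):
    step = None
    for j in range(1, q + 1):                            # Step 1
        measure, d = phi(x, j, delta)
        if measure > eps * delta ** j:
            step, order = d, j
            break
    if step is None:                                      # Step 2: termination
        print("terminated at", x, "with radius", delta)
        break
    rho = (f(x) - f(x + step)) / max(f(x) - taylor(x, step, order), 1e-16)  # (4.1)
    if rho >= eta1:                                       # Step 3: accept
        x = x + step
    if rho < eta1:                                        # Step 4: radius update
        delta *= gamma1
    elif rho < eta2:
        delta *= gamma2
    else:
        delta = min(dmax, gamma3 * delta)
```

The sampling step makes this a toy illustration of the mechanism only; in the paper the measure φ requires an exact global minimization of the Taylor model over the feasible part of the trust region.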

19 Cartis, Gould, Toint: high-order optimality and complexity 8 Lemma 4. Assume that min for some min > independent of. Then Algorithm 4. ensures that, whenever S, ( def κ u S, where κ u = + logγ ) ( ) 3 + logγ 2 logγ 2 log. (4.3) min Proof. The trust-region update (4.2) ensures that γ U 2 γ S 3, where U = {,...,} \ S. This inequality then yields (4.3) by taing logarithms and using that S and = S + U. 4. Evaluation complexity for Algorithm 4. We start our worst-case analysis by formalizing our assumptions. Let L f def = {x+z IR n x F, f(x) f(x ) and z max }. AS. AS.2 AS.3 The feasible set F is closed, convex and non-empty. The objective function f is q times continuously differentiable on an open set containing L f. For j {,...,q}, the j-th derivative of f is Lipschitz continuous on L f (in the sense of (2.5)) with Lipschitz constant L f,j. For simplicity of notation, define L f def = max j {,...,q} L f,j. Algorithm 4. is required to start from a feasible x F, which, together with the fact that the subproblem solution in Step 2 involves minimization over F, leads to AS.. Note that AS.3 automatically holds if f is q+ times continuously differentiable and F is bounded. It is also important to note that we could replace AS.3 by the condition that (2.) holds on the path of iterates [x,x + ] without altering any of the proofs below. While this weaer formulation may be useful, we prefer to use AS.3 in the sequel of this paper because it is independent of the sequence of iterates and may be easier to verify a priori, given the problem (3.). Lemma 4.2 Suppose that AS.2 and AS.3 hold. Then, for all l {,...,}, l κ ǫ (4.4) where [ def κ = min, γ ] ( η 2 ). (4.5) L f

20 Cartis, Gould, Toint: high-order optimality and complexity 9 Proof. Assume that, for some l {,...,} From (4.), we obtain that, for some j {,...,q}, l η 2 L f ǫ. (4.6) ρ l = f(x l +s l ) T f,j (x l,s l ) T f,j (x l,) T f,j (x l,s l ) < L f s l j+ j!ǫ j l L f l j!ǫ ( η 2 ), where we used (2.9) and the fact that φ l f,j (x l) > ǫ j l to deduce the first inequality, the bound s l l to deduce the second, and (4.6) with j to deduce the third. Thus ρ l η 2 and l+ l. The mechanism of the algorithm and the inequality ǫ then ensures that, for all l, [ l min, γ ] ( η 2 )ǫ κ ǫ. L f We now derive a simple lower bound on the objective function decrease at successful iterations. Lemma 4.3 Suppose that AS. AS.3 hold. Then, if is the index of a successful iteration before termination, f(x ) f(x + ) η κ ǫ q+. (4.7) Proof. We have, using (4.), the fact that φ f,j (x ) > ǫ j (4.4) successively, that for some j {,...,q} and f(x ) f(x + ) η [T f,j (x,) T f,j (x,s )] = η φ f,j (x ) > η κ ǫ j+ η κ ǫ q+. Our worst-case evaluation complexity results can now be proved by summing the decreases guaranteed by this last lemma.

21 Cartis, Gould, Toint: high-order optimality and complexity 2 Theorem 4.4 Suppose that AS. AS.3 hold. Let f low be a lower bound on f within F. Then, given ǫ (,], Algorithm 4. applied to problem (3.) needs at most κ f f(x ) f low S ǫ q+ (4.8) successful iterations(each possibly involving one evaluation of f and its q first derivatives) and at most κ u κ f f(x ) f low S ǫ q+ + (4.9) iterations in total to terminate with an iterate x ǫ such that (3.33) holds, where κ f S = [ max, η L f γ ( η 2 ) ], (4.) and κ u is given by (4.3). Moreover, if ǫ is the value of at termination, for all d such that f(x ǫ +d) f(x ǫ ) 2ǫ q ǫ (4.) ( ) x ǫ +d F and d (ǫ q ǫ) Lf q+ q+. (4.2) q! Observe that, because of (4.2) and (4.4), ǫ [κ δ ǫ, max ]. Proof. Let be the index of an arbitrary iteration before termination. Using the definition of f low, the nature of successful iterations, (4.) and Lemma 4.3, we deduce that f(x ) f low f(x ) f(x + ) = i S [f(x i ) f(x i+ )] S [κ f S ] ǫ q+ (4.3) which proves (4.8). We next call upon Lemma 4. to compute the upper bound on the total number of iterations before termination (obviously, there must be a least one successful iteration unless termination occurs for = ) and add one for the evaluation at termination. Finally, (4.)-(4.2) result from Theorem 3.6, AS.3 and the fact that φ f,q (x ǫ) ǫ q ǫ at termination. Theorem 4.4 generalizes the nown bounds for the cases where F = IR and q = [38], q = 2 [2,39] and q = 3 []. The results for q = 2 with F IR n and for q > 3 appear to be new. The latter provide the first evaluation complexity bounds for general criticality order q. Note that, if q =, bounds of the type O(ǫ (p+)/p ) exist if one is ready to minimize models of degree p > q (see [6]). Whether similar improvements can be obtained for q > remains an open question at this stage.
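Theorem 4.4 bounds the number of successful iterations by a multiple of (f(x_0) − f_low) ǫ^{-(q+1)}. A tiny sketch (illustrative only; the leading constant is set to 1 rather than the paper's κ_f, which collects η_1 and κ_δ) showing how this worst-case count grows as ǫ shrinks and q increases:

```python
# Worst-case successful-iteration count of Theorem 4.4 scales like
# C * (f(x0) - f_low) / eps^(q+1); C = 1 here purely for illustration.
def max_successful_iters(gap, q, eps, C=1.0):
    return C * gap / eps ** (q + 1)

for q in (1, 2, 3):
    print(q, [f"{max_successful_iters(10.0, q, eps):.1e}" for eps in (1e-1, 1e-2, 1e-3)])
```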

22 Cartis, Gould, Toint: high-order optimality and complexity 2 We also observe that the above theory remains valid if the termination rule φ,j (x ) ǫ j for j {,...,q} (4.4) used in Step is replaced by a more flexible one, involving other acceptable termination circumstances, such as if (4.4) hold or some other condition holds. We conclude this section by noting that the global optimization effort involved in the computation of φ j,j (x ) (j {,...,q}) in Algorithm 4. might be limited by choosing max small enough. 5 Sharpness It is interesting that an example was presented in [4] showing that the bound in O(ǫ 3 ) evaluations for q = 2 is essentially sharp for both the trust-region and a regularization algorithm. This is significant, because requiring φ f,2 (x) ǫ 2 is slightly stronger, for small, than the standard condition [ ] xf(x) ǫ and min,λ min [ 2 xf(x)] ǫ (5.) (used in [39] and [2] for instance). Indeed, for one-dimensional problems and assuming 2 xf(x), the former condition amounts to requiring that ( ) 2 2 xf(x)+2 x(f(x) ǫ (5.2) where the absolute value reflects the fact that s = ± depending on the sign of g. In the remainder of this section, we show that the example proposed in [4] can be extended to arbitrary order q, and thus that the complexity bounds (4.8)-(4.9) are esentially sharp for our trust-region algorithm. The idea of our generalized example is to apply Algorithm 4. to a unidimensional objective function f for some fixed q and F = IR + (hence guaranteeing AS.), generating a sequence of iterates {x } starting from the origin, i.e. x = x =. We first choose the sequences of derivatives values up to order q to be, for all, j xf(x ) = for j {,...,q } and q xf(x ) = q! ( ) q+ +δ (5.3) + where δ (,) is a (small) positive constant. This means that, at iterate x, the q-th order Taylor model is given by ( ) q+ +δ T f,q (x,s) = f(x ) s q + wherethevalueoff(x )remainsunspecifiedfornow. Thestepisthenobtainedbyminimizing this model in a trust-region of radius = ( ) q+ +δ, +

23 Cartis, Gould, Toint: high-order optimality and complexity 22 yielding that s = = As a consequence, the model decrease is given by ( ) q+ +δ (,). (5.4) + T f,q (x,) T f,q (x,s ) = ( ) q! q xf(x )s q +(q+)δ =. (5.5) + For our example, we the define the objective function decrease at iteration to be f def = f(x ) f(x +s ) = 2 (η +η 2 )[T f,q (x,) T f,q (x,s )], (5.6) thereby ensuring that ρ [η,η 2 ) and x + = x + s for each. Summing up function decreases, we may then specify the objective function s values at the iterates by f(x ) = η +η 2 ζ(+(q +)δ) and f(x + ) = f(x ) η ( ) +η 2 +(q+)δ. (5.7) where ζ(t) def = = t is the Riemann zeta function. This function is finite for all t > (and thus also for t = +(q +)δ), thereby ensuring that f(x ) for all. We also verify that ( ) + + q+ +δ = [γ 2,] +2 in accordance with (4.2), provided γ 2 ( 2 3 ) q+ +δ. Observe also that (5.3) and (5.5) ensure that, for each, φ f,j (x ) = for j {,...,q } (5.8) and ( ) φ +(q+)δ ( ) f,q (x q+ +δ ) = = q + +. (5.9) We now use Hermite interpolation to construct the objective function f on the successive intervals [x,x + ], and define where p is the polynomial f(x) = p (x x )+f(x ) for x [x,x + ] and, (5.) p (s) = 2q+ i= with coefficients defined by the interpolation conditions p () = f(x ) f(x + ), p (s ) = ; c i, s i, (5.) j sp () = = j sp (s ) for j {,...,q }, (5.2) q sp () = q xf(x ), q sp (s ) = q xf(x + ).

24 Cartis, Gould, Toint: high-order optimality and complexity 23 These conditions ensure that f(x) is q times continuously differentiable on IR + and thus that AS.2 holds. They also impose the following values for the first q + coefficients c, = f(x ) f(x + ), c j, = (j {,...,q }), c q, = q xf(x ); (5.3) and the remaining q + coefficients are solutions of the linear system s q+ s q+2... s 2q+ (q +)s q (q +2)s q+.. (q+)! (q+2)!! s 2! s (2q +)s 2q. (2q+)! (q+)! sq+ c q+, c q+2,. c 2q+, = r, (5.4) where the right-hand side is given by f q! q xf(x )s q (q )! q xf(x )s q r =.. (5.5) p xf(x )s q xf(x + ) q xf(x ) Observe now that the coefficient matrix of this linear system may be written as s q+ s q... M s q... s s q where M q def =... q + q q +. (q+)!!. (q+2)! 2!.... (2q+)! (q+)! is an invertible matrix independent of (see Appendix). Hence c q+, c q+2,. c 2q+, = s... s q M q s (q+) Observe now that, because of (5.4), (5.6), (5.5) and (5.3), s q... s (5.6) r (5.7) f = O(s q+ ), q xf(x )s q j = O(s q+ j ) (j {,...,q }) and, since q xf(x ) < q xf(x + ) <, q xf(x + ) q xf(x ) q xf(x ) q!s.

25 Cartis, Gould, Toint: high-order optimality and complexity 24 These bounds and (5.5) imply that [r ] i, the i-th component of r, satisfies [r ] i = O(s q+2 i ) for i {,...,q +}. Hence, using (5.7) and the non-singularity of M q, we obtain that there exists a constant κ q independent of such that and thus that c i, s i q κ q for i {q +,...,2q +}, (5.8) q+ s p (s) 2q+ i=q+ i! c i, s i q 2q+ i=q+ i! κ q Moreover, using successively (5.), the triangle inequality, (5.3), (5.3), (5.4), (5.8) and κ q, we obtain that, for j {,...,q}, j sp (s) = 2q+ i=j i! (i j)! c i, s i j q! (q j)! c q, s q j + 2q+ i=q+ 2q+ q! (q j)! + i! (i j)! c i, s i q i=q+ 2q+ i! κ q (i j)! i=q i! (i j)! c i, s i q s q+ j and thus all derivatives of order one up to q remain bounded on [,s ]. Because of (5.), we therefore obtain that AS.3 holds. Moreover (5.3), (5.8), the inequalities q xf(x ) q! and f(x ), (5.) and (5.4) also ensure that f(x) is bounded below. We have therefore shown that the bounds of Theorem 4.4 are essentially sharp, in that, for every δ >, Algorithm 4. applied to the problem of minimizing the lower-bounded objective function f just constructed and satisfying AS.-AS.3 will tae, because of (5.8) and (5.9), ǫ q+ +(q+)δ iterations and evaluation of f and its q first derivatives to find an iterate x such that condition (4.4) holds. Moreover, it is clear that, in the example presented, the global rate of convergence is driven by the term of degree q in the Taylor series. 6 Discussion We have analyzed the optimality conditions of order 2 and above, and proposed a measure of criticality for arbitrary order for convexly constrained nonlinear optimization problems. As this measure can be extended to define ǫ-approximate critical points of high-order, we have then used it in a conceptual trust-region algorithm to show that, if derivatives of the objective

function up to order q can be evaluated and are Lipschitz continuous, then this algorithm applied to the convexly constrained problem (3.1) needs at most O(ǫ^{-(q+1)}) evaluations of f and its derivatives to compute an ǫ-approximate q-th order critical point. Moreover, we have shown by an example that this bound is essentially sharp.

In the purely unconstrained case, this result recovers known results for q = 1 (first-order criticality with Lipschitz gradients) [38], q = 2 (second-order criticality (7) with Lipschitz Hessians) [4,39] and q = 3 (third-order criticality (8) with Lipschitz continuous third derivative) [1], but extends them to arbitrary order. The results for the convexly constrained case appear to be new and provide in particular the first complexity bound for second- and third-order criticality for such inequality constrained problems.

Because the conditions (4.4) measure different orders of criticality, we could choose to use a different ǫ for every order (as in [4]), complicating the expression of the bound accordingly. However, as shown by our example, the worst-case behaviour of Algorithm 4.1 is dominated by that of ∇_x^q f, which makes the distinction of the various ǫ-s less crucial.

Because of the global optimization occurring in the definition of the criticality measure φ_{f,j}^δ(x), the algorithm discussed in the present paper remains, in general, of a theoretical nature. However there may be cases where this computation is tractable for δ small enough, for instance if the derivative tensors of the objective function are strongly structured. Such approaches may hopefully be of use for small-dimensional or structured highly nonlinear problems, such as those occurring in machine learning using deep learning techniques (see [1]).

The present framework for handling convex constraints is not free of limitations, resulting from our choice to transfer difficulties associated with the original problem to the subproblem solution, thereby sparing precious evaluations of f and its derivatives. In particular, the cost of evaluating any constraint function/derivative possibly defining the convex feasible set F is neglected by the present approach, which must therefore be seen as a suitable framework to handle cheap inequality constraints such as simple bounds.

It is known that the complexity of obtaining ǫ-approximate first-order criticality for unconstrained and convexly-constrained problems can be reduced to O(ǫ^{-(p+1)/p}) if one is ready to define the step by using a regularization model of order p. In the unconstrained case, this was shown for p = 2 in [2,39] and for general p in [6], while the convexly constrained case was analyzed (for p = 2) in [3]. Whether this methodology and the associated improvements in evaluation complexity bounds can be extended to orders above one is the subject of ongoing research.

Acknowledgements

The authors would like to thank Oliver Stein for suggesting reference [33], as well as Jim Burke and Adrian Lewis for interesting discussions. The third author would also like to acknowledge the support provided by the Belgian Fund for Scientific Research (FNRS), the Leverhulme Trust (UK), Balliol College (Oxford), the Department of Applied Mathematics of the Hong Kong Polytechnic University, ENSEEIHT (Toulouse, France) and INDAM (Florence, Italy).

(7) Using (3.29). (8) Using (3.30).

References

[1] A. Anandkumar and R. Ge. Efficient approaches for escaping high-order saddle points in nonconvex optimization. arXiv, 2016.


More information

Optimisation non convexe avec garanties de complexité via Newton+gradient conjugué

Optimisation non convexe avec garanties de complexité via Newton+gradient conjugué Optimisation non convexe avec garanties de complexité via Newton+gradient conjugué Clément Royer (Université du Wisconsin-Madison, États-Unis) Toulouse, 8 janvier 2019 Nonconvex optimization via Newton-CG

More information

Iteration-complexity of first-order penalty methods for convex programming

Iteration-complexity of first-order penalty methods for convex programming Iteration-complexity of first-order penalty methods for convex programming Guanghui Lan Renato D.C. Monteiro July 24, 2008 Abstract This paper considers a special but broad class of convex programing CP)

More information

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss

More information

How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization

How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization Frank E. Curtis Department of Industrial and Systems Engineering, Lehigh University Daniel P. Robinson Department

More information

2tdt 1 y = t2 + C y = which implies C = 1 and the solution is y = 1

2tdt 1 y = t2 + C y = which implies C = 1 and the solution is y = 1 Lectures - Week 11 General First Order ODEs & Numerical Methods for IVPs In general, nonlinear problems are much more difficult to solve than linear ones. Unfortunately many phenomena exhibit nonlinear

More information

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Coralia Cartis, University of Oxford INFOMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems Methods

More information

Implications of the Constant Rank Constraint Qualification

Implications of the Constant Rank Constraint Qualification Mathematical Programming manuscript No. (will be inserted by the editor) Implications of the Constant Rank Constraint Qualification Shu Lu Received: date / Accepted: date Abstract This paper investigates

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications Weijun Zhou 28 October 20 Abstract A hybrid HS and PRP type conjugate gradient method for smooth

More information

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Alberto Bressan ) and Khai T. Nguyen ) *) Department of Mathematics, Penn State University **) Department of Mathematics,

More information

Mini-Course 1: SGD Escapes Saddle Points

Mini-Course 1: SGD Escapes Saddle Points Mini-Course 1: SGD Escapes Saddle Points Yang Yuan Computer Science Department Cornell University Gradient Descent (GD) Task: min x f (x) GD does iterative updates x t+1 = x t η t f (x t ) Gradient Descent

More information

Chapter 8 Gradient Methods

Chapter 8 Gradient Methods Chapter 8 Gradient Methods An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Introduction Recall that a level set of a function is the set of points satisfying for some constant. Thus, a point

More information

MATH 205C: STATIONARY PHASE LEMMA

MATH 205C: STATIONARY PHASE LEMMA MATH 205C: STATIONARY PHASE LEMMA For ω, consider an integral of the form I(ω) = e iωf(x) u(x) dx, where u Cc (R n ) complex valued, with support in a compact set K, and f C (R n ) real valued. Thus, I(ω)

More information

Unconstrained optimization

Unconstrained optimization Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout

More information

Introduction to gradient descent

Introduction to gradient descent 6-1: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction to gradient descent Derivation and intuitions Hessian 6-2: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction Our

More information

1 Lagrange Multiplier Method

1 Lagrange Multiplier Method 1 Lagrange Multiplier Method Near a maximum the decrements on both sides are in the beginning only imperceptible. J. Kepler When a quantity is greatest or least, at that moment its flow neither increases

More information

ZEROES OF INTEGER LINEAR RECURRENCES. 1. Introduction. 4 ( )( 2 1) n

ZEROES OF INTEGER LINEAR RECURRENCES. 1. Introduction. 4 ( )( 2 1) n ZEROES OF INTEGER LINEAR RECURRENCES DANIEL LITT Consider the integer linear recurrence 1. Introduction x n = x n 1 + 2x n 2 + 3x n 3 with x 0 = x 1 = x 2 = 1. For which n is x n = 0? Answer: x n is never

More information

Arc Search Algorithms

Arc Search Algorithms Arc Search Algorithms Nick Henderson and Walter Murray Stanford University Institute for Computational and Mathematical Engineering November 10, 2011 Unconstrained Optimization minimize x D F (x) where

More information

A Derivative-Free Gauss-Newton Method

A Derivative-Free Gauss-Newton Method A Derivative-Free Gauss-Newton Method Coralia Cartis Lindon Roberts 29th October 2017 Abstract We present, a derivative-free version of the Gauss-Newton method for solving nonlinear least-squares problems.

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

1 Lyapunov theory of stability

1 Lyapunov theory of stability M.Kawski, APM 581 Diff Equns Intro to Lyapunov theory. November 15, 29 1 1 Lyapunov theory of stability Introduction. Lyapunov s second (or direct) method provides tools for studying (asymptotic) stability

More information

CONSTRAINED NONLINEAR PROGRAMMING

CONSTRAINED NONLINEAR PROGRAMMING 149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach

More information

Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016

Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016 Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall 206 2 Nov 2 Dec 206 Let D be a convex subset of R n. A function f : D R is convex if it satisfies f(tx + ( t)y) tf(x)

More information

A Proximal Method for Identifying Active Manifolds

A Proximal Method for Identifying Active Manifolds A Proximal Method for Identifying Active Manifolds W.L. Hare April 18, 2006 Abstract The minimization of an objective function over a constraint set can often be simplified if the active manifold of the

More information

Identifying Active Constraints via Partial Smoothness and Prox-Regularity

Identifying Active Constraints via Partial Smoothness and Prox-Regularity Journal of Convex Analysis Volume 11 (2004), No. 2, 251 266 Identifying Active Constraints via Partial Smoothness and Prox-Regularity W. L. Hare Department of Mathematics, Simon Fraser University, Burnaby,

More information

Complexity Analysis of Interior Point Algorithms for Non-Lipschitz and Nonconvex Minimization

Complexity Analysis of Interior Point Algorithms for Non-Lipschitz and Nonconvex Minimization Mathematical Programming manuscript No. (will be inserted by the editor) Complexity Analysis of Interior Point Algorithms for Non-Lipschitz and Nonconvex Minimization Wei Bian Xiaojun Chen Yinyu Ye July

More information

Analysis II: The Implicit and Inverse Function Theorems

Analysis II: The Implicit and Inverse Function Theorems Analysis II: The Implicit and Inverse Function Theorems Jesse Ratzkin November 17, 2009 Let f : R n R m be C 1. When is the zero set Z = {x R n : f(x) = 0} the graph of another function? When is Z nicely

More information

Lecture 3: Basics of set-constrained and unconstrained optimization

Lecture 3: Basics of set-constrained and unconstrained optimization Lecture 3: Basics of set-constrained and unconstrained optimization (Chap 6 from textbook) Xiaoqun Zhang Shanghai Jiao Tong University Last updated: October 9, 2018 Optimization basics Outline Optimization

More information

Simple Examples on Rectangular Domains

Simple Examples on Rectangular Domains 84 Chapter 5 Simple Examples on Rectangular Domains In this chapter we consider simple elliptic boundary value problems in rectangular domains in R 2 or R 3 ; our prototype example is the Poisson equation

More information

On proximal-like methods for equilibrium programming

On proximal-like methods for equilibrium programming On proximal-lie methods for equilibrium programming Nils Langenberg Department of Mathematics, University of Trier 54286 Trier, Germany, langenberg@uni-trier.de Abstract In [?] Flam and Antipin discussed

More information

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005 3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider

More information

We simply compute: for v = x i e i, bilinearity of B implies that Q B (v) = B(v, v) is given by xi x j B(e i, e j ) =

We simply compute: for v = x i e i, bilinearity of B implies that Q B (v) = B(v, v) is given by xi x j B(e i, e j ) = Math 395. Quadratic spaces over R 1. Algebraic preliminaries Let V be a vector space over a field F. Recall that a quadratic form on V is a map Q : V F such that Q(cv) = c 2 Q(v) for all v V and c F, and

More information

An introduction to Birkhoff normal form

An introduction to Birkhoff normal form An introduction to Birkhoff normal form Dario Bambusi Dipartimento di Matematica, Universitá di Milano via Saldini 50, 0133 Milano (Italy) 19.11.14 1 Introduction The aim of this note is to present an

More information

Written Examination

Written Examination Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes

More information

Cubic-regularization counterpart of a variable-norm trust-region method for unconstrained minimization

Cubic-regularization counterpart of a variable-norm trust-region method for unconstrained minimization Cubic-regularization counterpart of a variable-norm trust-region method for unconstrained minimization J. M. Martínez M. Raydan November 15, 2015 Abstract In a recent paper we introduced a trust-region

More information

A Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity

A Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity A Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity Mohammadreza Samadi, Lehigh University joint work with Frank E. Curtis (stand-in presenter), Lehigh University

More information

CORE 50 YEARS OF DISCUSSION PAPERS. Globally Convergent Second-order Schemes for Minimizing Twicedifferentiable 2016/28

CORE 50 YEARS OF DISCUSSION PAPERS. Globally Convergent Second-order Schemes for Minimizing Twicedifferentiable 2016/28 26/28 Globally Convergent Second-order Schemes for Minimizing Twicedifferentiable Functions YURII NESTEROV AND GEOVANI NUNES GRAPIGLIA 5 YEARS OF CORE DISCUSSION PAPERS CORE Voie du Roman Pays 4, L Tel

More information

arxiv: v4 [math.oc] 5 Jan 2016

arxiv: v4 [math.oc] 5 Jan 2016 Restarted SGD: Beating SGD without Smoothness and/or Strong Convexity arxiv:151.03107v4 [math.oc] 5 Jan 016 Tianbao Yang, Qihang Lin Department of Computer Science Department of Management Sciences The

More information

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares Robert Bridson October 29, 2008 1 Hessian Problems in Newton Last time we fixed one of plain Newton s problems by introducing line search

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

Lecture 1. 1 Conic programming. MA 796S: Convex Optimization and Interior Point Methods October 8, Consider the conic program. min.

Lecture 1. 1 Conic programming. MA 796S: Convex Optimization and Interior Point Methods October 8, Consider the conic program. min. MA 796S: Convex Optimization and Interior Point Methods October 8, 2007 Lecture 1 Lecturer: Kartik Sivaramakrishnan Scribe: Kartik Sivaramakrishnan 1 Conic programming Consider the conic program min s.t.

More information

Selected Examples of CONIC DUALITY AT WORK Robust Linear Optimization Synthesis of Linear Controllers Matrix Cube Theorem A.

Selected Examples of CONIC DUALITY AT WORK Robust Linear Optimization Synthesis of Linear Controllers Matrix Cube Theorem A. . Selected Examples of CONIC DUALITY AT WORK Robust Linear Optimization Synthesis of Linear Controllers Matrix Cube Theorem A. Nemirovski Arkadi.Nemirovski@isye.gatech.edu Linear Optimization Problem,

More information

arxiv: v3 [math.oc] 8 Jan 2019

arxiv: v3 [math.oc] 8 Jan 2019 Why Random Reshuffling Beats Stochastic Gradient Descent Mert Gürbüzbalaban, Asuman Ozdaglar, Pablo Parrilo arxiv:1510.08560v3 [math.oc] 8 Jan 2019 January 9, 2019 Abstract We analyze the convergence rate

More information

Copyrighted Material. L(t, x(t), u(t))dt + K(t f, x f ) (1.2) where L and K are given functions (running cost and terminal cost, respectively),

Copyrighted Material. L(t, x(t), u(t))dt + K(t f, x f ) (1.2) where L and K are given functions (running cost and terminal cost, respectively), Chapter One Introduction 1.1 OPTIMAL CONTROL PROBLEM We begin by describing, very informally and in general terms, the class of optimal control problems that we want to eventually be able to solve. The

More information

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Linear Algebra A Brief Reminder Purpose. The purpose of this document

More information

A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties

A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties André L. Tits Andreas Wächter Sasan Bahtiari Thomas J. Urban Craig T. Lawrence ISR Technical

More information

Cubic regularization of Newton s method for convex problems with constraints

Cubic regularization of Newton s method for convex problems with constraints CORE DISCUSSION PAPER 006/39 Cubic regularization of Newton s method for convex problems with constraints Yu. Nesterov March 31, 006 Abstract In this paper we derive efficiency estimates of the regularized

More information

Linear Algebra. Preliminary Lecture Notes

Linear Algebra. Preliminary Lecture Notes Linear Algebra Preliminary Lecture Notes Adolfo J. Rumbos c Draft date April 29, 23 2 Contents Motivation for the course 5 2 Euclidean n dimensional Space 7 2. Definition of n Dimensional Euclidean Space...........

More information

Elements of Convex Optimization Theory

Elements of Convex Optimization Theory Elements of Convex Optimization Theory Costis Skiadas August 2015 This is a revised and extended version of Appendix A of Skiadas (2009), providing a self-contained overview of elements of convex optimization

More information

Course Summary Math 211

Course Summary Math 211 Course Summary Math 211 table of contents I. Functions of several variables. II. R n. III. Derivatives. IV. Taylor s Theorem. V. Differential Geometry. VI. Applications. 1. Best affine approximations.

More information

On the complexity of an Inexact Restoration method for constrained optimization

On the complexity of an Inexact Restoration method for constrained optimization On the complexity of an Inexact Restoration method for constrained optimization L. F. Bueno J. M. Martínez September 18, 2018 Abstract Recent papers indicate that some algorithms for constrained optimization

More information

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction

More information

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell DAMTP 2014/NA02 On fast trust region methods for quadratic models with linear constraints M.J.D. Powell Abstract: Quadratic models Q k (x), x R n, of the objective function F (x), x R n, are used by many

More information

A line-search algorithm inspired by the adaptive cubic regularization framework, with a worst-case complexity O(ɛ 3/2 ).

A line-search algorithm inspired by the adaptive cubic regularization framework, with a worst-case complexity O(ɛ 3/2 ). A line-search algorithm inspired by the adaptive cubic regularization framewor, with a worst-case complexity Oɛ 3/. E. Bergou Y. Diouane S. Gratton June 16, 017 Abstract Adaptive regularized framewor using

More information

Smooth Structure. lies on the boundary, then it is determined up to the identifications it 1 2

Smooth Structure. lies on the boundary, then it is determined up to the identifications it 1 2 132 3. Smooth Structure lies on the boundary, then it is determined up to the identifications 1 2 + it 1 2 + it on the vertical boundary and z 1/z on the circular part. Notice that since z z + 1 and z

More information

Università di Firenze, via C. Lombroso 6/17, Firenze, Italia,

Università di Firenze, via C. Lombroso 6/17, Firenze, Italia, Convergence of a Regularized Euclidean Residual Algorithm for Nonlinear Least-Squares by S. Bellavia 1, C. Cartis 2, N. I. M. Gould 3, B. Morini 1 and Ph. L. Toint 4 25 July 2008 1 Dipartimento di Energetica

More information

Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL)

Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL) Part 4: Active-set methods for linearly constrained optimization Nick Gould RAL fx subject to Ax b Part C course on continuoue optimization LINEARLY CONSTRAINED MINIMIZATION fx subject to Ax { } b where

More information

Lecture 5. Theorems of Alternatives and Self-Dual Embedding

Lecture 5. Theorems of Alternatives and Self-Dual Embedding IE 8534 1 Lecture 5. Theorems of Alternatives and Self-Dual Embedding IE 8534 2 A system of linear equations may not have a solution. It is well known that either Ax = c has a solution, or A T y = 0, c

More information

In English, this means that if we travel on a straight line between any two points in C, then we never leave C.

In English, this means that if we travel on a straight line between any two points in C, then we never leave C. Convex sets In this section, we will be introduced to some of the mathematical fundamentals of convex sets. In order to motivate some of the definitions, we will look at the closest point problem from

More information

Introduction to Nonlinear Optimization Paul J. Atzberger

Introduction to Nonlinear Optimization Paul J. Atzberger Introduction to Nonlinear Optimization Paul J. Atzberger Comments should be sent to: atzberg@math.ucsb.edu Introduction We shall discuss in these notes a brief introduction to nonlinear optimization concepts,

More information

McMaster University. Advanced Optimization Laboratory. Title: A Proximal Method for Identifying Active Manifolds. Authors: Warren L.

McMaster University. Advanced Optimization Laboratory. Title: A Proximal Method for Identifying Active Manifolds. Authors: Warren L. McMaster University Advanced Optimization Laboratory Title: A Proximal Method for Identifying Active Manifolds Authors: Warren L. Hare AdvOl-Report No. 2006/07 April 2006, Hamilton, Ontario, Canada A Proximal

More information

Ordinary Differential Equation Introduction and Preliminaries

Ordinary Differential Equation Introduction and Preliminaries Ordinary Differential Equation Introduction and Preliminaries There are many branches of science and engineering where differential equations arise naturally. Now days, it finds applications in many areas

More information

c 2000 Society for Industrial and Applied Mathematics

c 2000 Society for Industrial and Applied Mathematics SIAM J. OPIM. Vol. 10, No. 3, pp. 750 778 c 2000 Society for Industrial and Applied Mathematics CONES OF MARICES AND SUCCESSIVE CONVEX RELAXAIONS OF NONCONVEX SES MASAKAZU KOJIMA AND LEVEN UNÇEL Abstract.

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Topological properties

Topological properties CHAPTER 4 Topological properties 1. Connectedness Definitions and examples Basic properties Connected components Connected versus path connected, again 2. Compactness Definition and first examples Topological

More information

A FULL-NEWTON STEP INFEASIBLE-INTERIOR-POINT ALGORITHM COMPLEMENTARITY PROBLEMS

A FULL-NEWTON STEP INFEASIBLE-INTERIOR-POINT ALGORITHM COMPLEMENTARITY PROBLEMS Yugoslav Journal of Operations Research 25 (205), Number, 57 72 DOI: 0.2298/YJOR3055034A A FULL-NEWTON STEP INFEASIBLE-INTERIOR-POINT ALGORITHM FOR P (κ)-horizontal LINEAR COMPLEMENTARITY PROBLEMS Soodabeh

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

1. Introduction. We analyze a trust region version of Newton s method for the optimization problem

1. Introduction. We analyze a trust region version of Newton s method for the optimization problem SIAM J. OPTIM. Vol. 9, No. 4, pp. 1100 1127 c 1999 Society for Industrial and Applied Mathematics NEWTON S METHOD FOR LARGE BOUND-CONSTRAINED OPTIMIZATION PROBLEMS CHIH-JEN LIN AND JORGE J. MORÉ To John

More information

Lecture 7: CS395T Numerical Optimization for Graphics and AI Trust Region Methods

Lecture 7: CS395T Numerical Optimization for Graphics and AI Trust Region Methods Lecture 7: CS395T Numerical Optimization for Graphics and AI Trust Region Methods Qixing Huang The University of Texas at Austin huangqx@cs.utexas.edu 1 Disclaimer This note is adapted from Section 4 of

More information

Accelerating the cubic regularization of Newton s method on convex problems

Accelerating the cubic regularization of Newton s method on convex problems Accelerating the cubic regularization of Newton s method on convex problems Yu. Nesterov September 005 Abstract In this paper we propose an accelerated version of the cubic regularization of Newton s method

More information

MATH 167: APPLIED LINEAR ALGEBRA Chapter 3

MATH 167: APPLIED LINEAR ALGEBRA Chapter 3 MATH 167: APPLIED LINEAR ALGEBRA Chapter 3 Jesús De Loera, UC Davis February 18, 2012 Orthogonal Vectors and Subspaces (3.1). In real life vector spaces come with additional METRIC properties!! We have

More information