Second-order optimality and beyond: characterization and evaluation complexity in nonconvex convexly-constrained optimization

by C. Cartis, N. I. M. Gould and Ph. L. Toint

Report NAXYS, August. University of Namur, 6, rue de Bruxelles, B5 Namur (Belgium)
Second-order optimality and beyond: characterization and evaluation complexity in convexly-constrained nonlinear optimization

C. Cartis, N. I. M. Gould and Ph. L. Toint

5 August 26

Abstract

High-order optimality conditions for convexly-constrained nonlinear optimization problems are analyzed. A corresponding (expensive) measure of criticality for arbitrary order is proposed and extended to define high-order $\epsilon$-approximate critical points. This new measure is then used within a conceptual trust-region algorithm to show that, if derivatives of the objective function up to order $q$ can be evaluated and are Lipschitz continuous, then this algorithm applied to the convexly constrained problem needs at most $O(\epsilon^{-(q+1)})$ evaluations of $f$ and its derivatives to compute an $\epsilon$-approximate $q$-th order critical point. This provides the first evaluation complexity result for critical points of arbitrary order in nonlinear optimization. An example is discussed showing that the obtained evaluation complexity bounds are essentially sharp.

Keywords: nonlinear optimization, high-order optimality conditions, complexity theory, machine learning.

1 Introduction

Recent years have seen a growing interest in the analysis of the worst-case evaluation complexity of nonlinear (possibly nonconvex) smooth optimization (for the nonconvex case only, see [7, 3, 5-9, 2-3, 34-36, 39, 43-47] among others). In general terms, this analysis aims at giving (sometimes sharp) bounds on the number of evaluations of a minimization problem's functions (objective and constraints, if relevant) and of their derivatives that are, in the worst case, necessary for certain algorithms to find an approximate critical point for the unconstrained, convexly-constrained or general nonlinear optimization problem. It is not uncommon that such algorithms involve possibly extremely costly internal computations, provided the number of calls to the problem functions is kept as low as possible.
Mathematical Institute, Oxford University, Oxford OX2 6GG, Great Britain. coralia.cartis@maths.ox.ac.uk. Numerical Analysis Group, Rutherford Appleton Laboratory, Chilton OX QX, Great Britain. nic.gould@stfc.ac.uk. Namur Center for Complex Systems (naxys) and Department of Mathematics, University of Namur, 6, rue de Bruxelles, B-5 Namur, Belgium. philippe.toint@unamur.be.
Most of the research to date focusses on finding first-, second- and, in one case, third-order critical points. Evaluation complexity for first-order critical points was first investigated, for the unconstrained case, by Nesterov in [38], and for first- and second-order critical points by Nesterov and Polyak [39] and by Cartis, Gould and Toint in [2]. Third-order critical points were studied in [], motivated by highly-nonlinear problems in machine learning. However, the analysis of evaluation complexity for orders higher than three is missing both concepts and results.

The purpose of the present paper is to improve on this situation in two ways. The first is to review optimality conditions of arbitrary order $q$ for convexly-constrained minimization problems, and the second is to describe a theoretical algorithm whose behaviour provides, for this class of problems, the first evaluation complexity bounds for such arbitrary orders of optimality.

The paper is organized as follows. After the present introduction, Section 2 discusses some preliminary results on tensor norms, a generalized Cauchy-Schwarz inequality and high-order error bounds from Taylor series. Section 3 investigates optimality conditions for convexly-constrained optimization, while Section 4 proposes a trust-region based minimization algorithm for solving this class of problems and analyzes its evaluation complexity. An example is introduced in Section 5 to show that the new evaluation complexity bounds are essentially sharp. A final discussion is presented in Section 6.

2 Preliminaries

2.1 Basic notations

In what follows, $y^Tx$ denotes the Euclidean inner product of the vectors $x$ and $y$ of $\mathbb{R}^n$, and $\|x\| = (x^Tx)^{1/2}$ is the associated Euclidean norm. If $T_1$ and $T_2$ are tensors, $T_1 \otimes T_2$ is their tensor product. $\mathcal{B}(x,\delta)$ is the ball of radius $\delta$ centered at $x$. If $\mathcal{X}$ is a closed set, $\partial\mathcal{X}$ denotes its boundary and ${\rm int}(\mathcal{X})$ denotes its interior. The vectors $\{e_i\}_{i=1}^n$ are the coordinate vectors in $\mathbb{R}^n$.
The notation $\lambda_{\min}[M]$ stands for the leftmost eigenvalue of the symmetric matrix $M$. If $\{a_k\}$ and $\{b_k\}$ are two infinite sequences of non-negative scalars converging to zero, we say that $a_k = o(b_k)$ if and only if $\lim_{k\to\infty} a_k/b_k = 0$. The normal cone to a general convex set $C$ at $x \in C$ is defined by
\[ N_C(x) \stackrel{\rm def}{=} \{ s \in \mathbb{R}^n \mid s^T(z-x) \le 0 \mbox{ for all } z \in C \} \]
and its polar, the tangent cone to $C$ at $x$, by
\[ T_C(x) = N_C(x)^\circ \stackrel{\rm def}{=} \{ s \in \mathbb{R}^n \mid s^Tv \le 0 \mbox{ for all } v \in N_C(x) \}. \]
Note that $C \subseteq x + T_C(x)$ for all $x \in C$. We also define $P_C[\cdot]$ to be the orthogonal projection onto $C$ and use the Moreau decomposition [37], which states that, for every $x \in C$ and every $y \in \mathbb{R}^n$,
\[ y = P_{T_C(x)}[y] + P_{N_C(x)}[y] \quad\mbox{and}\quad \big(P_{T_C(x)}[y]\big)^T\big(P_{N_C(x)}[y]\big) = 0. \tag{2.1} \]
(See [2, Section 3.5] for a brief introduction to the relevant properties of convex sets and cones, or [32, Chapter 3] or [42, Part I] for an in-depth treatment.)
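As a small sanity check of the Moreau decomposition (2.1) (this sketch is our illustration, not part of the original development), consider the nonnegative orthant $C = \{x \ge 0\}$ at $x = 0$, for which $T_C(0) = C$ and $N_C(0) = -C$, so both projections are componentwise clippings:

```python
import numpy as np

# Moreau decomposition for C = {x >= 0} at x = 0:
# T_C(0) = {x >= 0}, N_C(0) = {x <= 0}, projections clip componentwise.
def proj_tangent(y):   # projection onto T_C(0)
    return np.maximum(y, 0.0)

def proj_normal(y):    # projection onto N_C(0)
    return np.minimum(y, 0.0)

y = np.array([1.5, -2.0, 0.3])
t, n = proj_tangent(y), proj_normal(y)
print(np.allclose(y, t + n))   # y = P_T[y] + P_N[y]
print(abs(t @ n) < 1e-12)      # the two projections are orthogonal
```

Any other convex cone would do; the orthant simply makes the two projections explicit.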
2.2 Tensor norms and a generalized Cauchy-Schwarz inequality

We will make substantial use of tensors and their norms in what follows, and thus start by establishing some concepts and notation. If the notation $T[v_1,\ldots,v_j]$ stands for the tensor of order $q-j$ resulting from the application of the $q$-th order tensor $T$ to the vectors $v_1,\ldots,v_j$, the (recursively induced (1)) Euclidean norm $\|\cdot\|_q$ on the space of $q$-th order tensors is given by
\[ \|T\|_q \stackrel{\rm def}{=} \max_{\|v_1\|=\cdots=\|v_q\|=1} T[v_1,\ldots,v_q]. \tag{2.2} \]
(Observe that this value is always non-negative since we can flip the sign of $T[v_1,\ldots,v_q]$ by flipping that of one of the vectors $v_i$.) Note that the definition (2.2) implies that
\[ \|T[v_1,\ldots,v_j]\|_{q-j} = \max_{\|w_1\|=\cdots=\|w_{q-j}\|=1} T[v_1,\ldots,v_j][w_1,\ldots,w_{q-j}]
= \max_{\|w_1\|=\cdots=\|w_{q-j}\|=1} T\Big[\frac{v_1}{\|v_1\|},\ldots,\frac{v_j}{\|v_j\|},w_1,\ldots,w_{q-j}\Big]\Big(\prod_{i=1}^j \|v_i\|\Big)
\le \Big(\max_{\|w_1\|=\cdots=\|w_q\|=1} T[w_1,\ldots,w_q]\Big)\Big(\prod_{i=1}^j \|v_i\|\Big)
= \|T\|_q \prod_{i=1}^j \|v_i\|, \tag{2.3} \]
a simple generalization of the standard Cauchy-Schwarz inequality for order-1 tensors (vectors) and of $\|Mv\| \le \|M\|\,\|v\|$, which is valid for induced norms of matrices (order-2 tensors). Observe also that perturbation theory (see [33, Th. 7]) implies that $\|T\|_q$ is continuous as a function of $T$.

If $T$ is a symmetric tensor of order $q$, define
\[ T[v]^q \stackrel{\rm def}{=} T[\underbrace{v,\ldots,v}_{q\ {\rm times}}] \]
and the $q$-kernel of the multilinear $q$-form $T[v]^q$ as
\[ \ker^q[T] \stackrel{\rm def}{=} \{ v \in \mathbb{R}^n \mid T[v]^q = 0 \} \]
(see [8, 9]). Note that, in general, $\ker^q[T]$ is a union of cones. Interestingly, for $q = 1$ the $q$-kernels are not only unions of cones but also subspaces. However this is not true for general $q$-kernels, since both $(1,0)^T$ and $(0,1)^T$ belong to the 2-kernel of the symmetric 2-form $x_1x_2$ on $\mathbb{R}^2$, but not their sum. We also note that, for symmetric tensors of odd order, $T[v]^q = -T[-v]^q$ and thus that
\[ \min_{\|d\|\le 1} T[d]^q = \min_{\|d\|\le 1}\big(-T[-d]^q\big) = \min_{\|d\|\le 1}\big(-T[d]^q\big) = -\max_{\|d\|\le 1} T[d]^q, \tag{2.4} \]
where we used the symmetry of the unit ball with respect to the origin to deduce the second equality.
(1) That it is the norm recursively induced by the standard Euclidean norm results from the observation that
\[ \max_{\|v_1\|=\cdots=\|v_q\|=1} T[v_1,\ldots,v_q] = \max_{\|v_q\|=1} \Big[ \max_{\|v_1\|=\cdots=\|v_{q-1}\|=1} T[v_1,\ldots,v_{q-1}] \Big][v_q]. \]
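To make the induced norm (2.2) and the generalized Cauchy-Schwarz inequality (2.3) concrete, here is a small numerical sketch (our illustration; it assumes a rank-one symmetric tensor, for which the induced norm is known in closed form):

```python
import numpy as np

rng = np.random.default_rng(0)

# Rank-one symmetric order-3 tensor T = u (x) u (x) u, for which the
# induced norm (2.2) is known exactly: ||T||_3 = ||u||^3.
u = rng.standard_normal(4)
T = np.einsum('i,j,k->ijk', u, u, u)
T_norm = np.linalg.norm(u) ** 3

# Check (2.3) with j = 1: ||T[v]||_2 <= ||T||_3 ||v||, where T[v] is the
# matrix obtained by one contraction and ||.||_2 its spectral norm.
for _ in range(100):
    v = rng.standard_normal(4)
    Tv = np.einsum('ijk,k->ij', T, v)
    assert np.linalg.norm(Tv, 2) <= T_norm * np.linalg.norm(v) + 1e-10
print("inequality (2.3) holds on all samples")
```

For a general (non-rank-one) tensor the exact value of (2.2) is itself a hard global optimization problem, which is part of the point made later in Section 3.2.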
2.3 High-order error bounds from Taylor series

The tensors considered in what follows are symmetric and arise as high-order derivatives of the objective function $f$. For the $p$-th derivative of a function $f : \mathbb{R}^n \to \mathbb{R}$ to be Lipschitz continuous on the set $S \subseteq \mathbb{R}^n$, we require that there exists a constant $L_{f,p} \ge 0$ such that, for all $x,y \in S$,
\[ \| \nabla_x^p f(x) - \nabla_x^p f(y) \|_p \le L_{f,p}\,\|x-y\|, \tag{2.5} \]
where $\nabla_x^p h(x)$ is the $p$-th order symmetric derivative tensor of $h$ at $x$. Let $T_{f,p}(x,s)$ denote (2) the $p$-th order Taylor-series approximation to $f(x+s)$ at some $x \in \mathbb{R}^n$, given by
\[ T_{f,p}(x,s) \stackrel{\rm def}{=} f(x) + \sum_{j=1}^p \frac{1}{j!} \nabla_x^j f(x)[s]^j, \tag{2.6} \]
and consider the Taylor identity
\[ \phi(1) - t_k(1) = \frac{1}{(k-1)!} \int_0^1 (1-\xi)^{k-1}\big[\phi^{(k)}(\xi) - \phi^{(k)}(0)\big]\,d\xi \tag{2.7} \]
involving a given univariate $C^k$ function $\phi(\alpha)$ and its $k$-th order Taylor approximation $t_k(\alpha) = \sum_{i=0}^k \phi^{(i)}(0)\alpha^i/i!$, expressed in terms of the $i$-th derivatives $\phi^{(i)}$, $i = 0,\ldots,k$. Let $x,s \in \mathbb{R}^n$. Then, picking $\phi(\alpha) = f(x+\alpha s)$ and $k = p$, it follows from the fact that $t_p(1) = T_{f,p}(x,s)$, from (2.3), (2.5), (2.6), (2.7) and from the identity
\[ \frac{1}{(p-1)!}\int_0^1 (1-\xi)^{p-1}\,d\xi = \frac{1}{p!} \tag{2.8} \]
that, for all $x,s \in \mathbb{R}^n$,
\[ f(x+s) = T_{f,p}(x,s) + \frac{1}{(p-1)!}\int_0^1 (1-\xi)^{p-1}\big[\nabla_x^p f(x+\xi s)[s]^p - \nabla_x^p f(x)[s]^p\big]\,d\xi
\le T_{f,p}(x,s) + \frac{1}{p!}\|s\|^p \max_{\xi\in[0,1]} \|\nabla_x^p f(x+\xi s) - \nabla_x^p f(x)\|_p
\le T_{f,p}(x,s) + \frac{L_{f,p}}{p!}\|s\|^{p+1}. \tag{2.9} \]
Similarly,
\[ f(x+s) \ge T_{f,p}(x,s) - \frac{1}{p!}\|s\|^p \max_{\xi\in[0,1]} \|\nabla_x^p f(x+\xi s) - \nabla_x^p f(x)\|_p \ge T_{f,p}(x,s) - \frac{L_{f,p}}{p!}\|s\|^{p+1}. \tag{2.10} \]
Inequalities (2.9) and (2.10) will be useful in our developments below, but note that they in fact depend only on the weaker requirement that
\[ \max_{\xi\in[0,1]} \|\nabla_x^p f(x+\xi s) - \nabla_x^p f(x)\|_p \le L_{f,p}\|s\| \tag{2.11} \]
for all $x$ and $s$ of interest, rather than relying on (2.5).

(2) Unfortunately, double indices are necessary for most of our notation, as we need to distinguish both the function to which the relevant quantity is associated (the first index) and its order (the second index).
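The bounds (2.9)-(2.10) are easy to verify numerically; the following sketch (our illustration) checks them for $f = \sin$ and $p = 2$, whose second derivative is $1$-Lipschitz, so $|f(x+s) - T_{f,2}(x,s)| \le (L_{f,2}/2!)\,|s|^3$ with $L_{f,2} = 1$:

```python
import numpy as np

# Numerical check of (2.9)-(2.10) for f = sin, p = 2, L_{f,2} = 1:
# |sin(x+s) - T_{f,2}(x,s)| <= |s|^3 / 2.
def T2(x, s):
    # second-order Taylor approximation of sin at x
    return np.sin(x) + np.cos(x) * s - 0.5 * np.sin(x) * s ** 2

rng = np.random.default_rng(1)
for _ in range(1000):
    x, s = rng.uniform(-3, 3, size=2)
    assert abs(np.sin(x + s) - T2(x, s)) <= 0.5 * abs(s) ** 3 + 1e-12
print("Taylor error bound (2.9)-(2.10) verified on all samples")
```

The classical integral-remainder bound would give the sharper constant $1/3! = 1/6$; the constant $L_{f,p}/p!$ above is what the Lipschitz argument (2.7)-(2.8) delivers, and it suffices for the complexity analysis.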
3 Unconstrained and convexly constrained problems

The problem we wish to solve is formally described as
\[ \min_{x\in F} f(x), \tag{3.1} \]
where we assume that $f : \mathbb{R}^n \to \mathbb{R}$ is $q$-times continuously differentiable and bounded from below, and that $f$ has Lipschitz continuous derivatives of orders $1$ to $q$. We also assume that the feasible set $F$ is closed, convex and non-empty. Note that this formulation covers unconstrained optimization ($F = \mathbb{R}^n$), as well as standard inequality (and linear equality) constrained optimization in its different forms: the set $F$ may be defined by simple bounds, and/or by polyhedral or more general convex constraints. We are tacitly assuming here that the cost of evaluating values and derivatives of the constraint functions possibly involved in the definition of $F$ is negligible.

3.1 High-order optimality conditions

Given that our ambition is to work with high-order models, it seems natural to aim at finding high-order local minimizers. As is standard, we say that $x_*$ is a local minimizer of $f$ if and only if there exists a (sufficiently small) neighbourhood $\mathcal{B}$ of $x_*$ such that
\[ f(x) \ge f(x_*) \;\mbox{ for all } x \in \mathcal{B} \cap F. \tag{3.2} \]
However, we must immediately remember important intrinsic limitations. These are exemplified by the smooth two-dimensional problem
\[ \min_{x\in\mathbb{R}^2} f(x) = \left\{ \begin{array}{ll} x_2\big(x_2 - e^{-1/x_1^2}\big) & \mbox{if } x_1 \neq 0, \\ x_2^2 & \mbox{if } x_1 = 0, \end{array}\right. \tag{3.3} \]
which is a simplified version of a problem stated by Hancock nearly a century ago [3, p. 36], itself a variation of a famous problem stated even earlier by Peano [4, Nos ]. The contour lines of its objective function are shown in Figure 3.1. The first conclusion which can be drawn by examining this example is that, in general, assessing that a given point $x_*$ (the origin in this case) is a local minimizer needs more than verifying that every direction from this point is an ascent direction.
Indeed, this latter property holds in the example, but the origin is not a local minimizer (it is a saddle point). This is caused by the fact that objective function decrease may occur along specific curves starting from the point under consideration, and these curves need not be lines (such as $x(\alpha) = x_* + \alpha e_1 + \frac{1}{2}e^{-1/\alpha^2}e_2$ for $\alpha > 0$ in the example). The second conclusion is that the characterization of a local minimizer cannot always be translated into a set of conditions only involving the Taylor expansion of $f$ at $x_*$. In our example, the difficulty arises because the coefficients of the Taylor expansion of $e^{-1/x_1^2}$ all vanish as $x_1$ approaches the origin, and therefore the (non-)minimizing nature of this point cannot be determined from the values of these coefficients. Thus the gap between necessary and sufficient optimality conditions cannot be closed if one restricts one's attention to using derivatives of the objective function at a putative solution of problem (3.1).
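The saddle nature of the origin in (3.3) can be checked numerically; the following sketch (our illustration) verifies both claims: $f$ is nonnegative along every sampled line through the origin for small steps, yet strictly negative along the curve $x_2 = \frac{1}{2}e^{-1/x_1^2}$:

```python
import numpy as np

# Hancock's simplified example (3.3): every direction from the origin is
# (weakly) ascending, yet the origin is not a local minimizer.
def f(x1, x2):
    if x1 == 0.0:
        return x2 ** 2
    return x2 * (x2 - np.exp(-1.0 / x1 ** 2))

# nonnegative along every sampled line direction for small steps ...
for theta in np.linspace(0.0, 2.0 * np.pi, 50):
    d1, d2 = np.cos(theta), np.sin(theta)
    for t in np.linspace(1e-3, 0.3, 50):
        assert f(t * d1, t * d2) >= 0.0

# ... but strictly negative along the descent curve x2 = e^{-1/x1^2}/2,
# where f = -(1/4) e^{-2/x1^2} < 0
for a in np.linspace(0.1, 0.5, 20):
    assert f(a, 0.5 * np.exp(-1.0 / a ** 2)) < 0.0
print("every direction ascends, yet the origin is not a local minimizer")
```

The descent values $-\frac{1}{4}e^{-2/\alpha^2}$ are flatter than any power of $\alpha$, which is exactly why no finite Taylor expansion at the origin can detect them.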
Figure 3.1: The contour lines of the objective function in (3.3)

Note that worse situations may also occur, for instance if we consider the following variation on Hancock's simplified example (3.3):
\[ \min_{x\in\mathbb{R}^2} f(x) = \left\{ \begin{array}{ll} x_2\big(x_2 - \sin(1/x_1)\,e^{-1/x_1^2}\big) & \mbox{if } x_1 \neq 0, \\ x_2^2 & \mbox{if } x_1 = 0, \end{array}\right. \tag{3.4} \]
for which no continuous descent path exists in a neighbourhood of the origin despite the origin not being a local minimizer.

3.1.1 Necessary conditions for convexly constrained problems

The above examples show that fully characterizing a local minimizer in terms of general continuous descent paths is in general impossible. However, the fact that no such path exists remains a necessary condition for such points, even if Hancock's example shows that these paths may not be amenable to a characterization using path derivatives. In what follows, we therefore propose derivative-based necessary optimality conditions by focussing on a specific (yet reasonably general) class of descent paths $x(\alpha)$ of the form
\[ x(\alpha) = x + \sum_{i=1}^q \alpha^i s_i + o(\alpha^q), \tag{3.5} \]
where $\alpha > 0$. Define the $q$-th order descriptor set of $F$ at $x$ by
\[ \mathcal{D}_F^q(x) \stackrel{\rm def}{=} \bigcup_{\varsigma>0}\Big\{ (s_1,\ldots,s_q) \in \mathbb{R}^{n\times q} \;\Big|\; x + \sum_{i=1}^q \alpha^i s_i \in F \mbox{ for all } \alpha \in [0,\varsigma] \Big\}. \tag{3.6} \]
Note that $\mathcal{D}_F^1(x) = T_F(x)$, the standard tangent cone to $F$ at $x$. We say that a feasible curve (3) $x(\alpha)$ is tangent to $\mathcal{D}_F^q(x)$ if (3.5) holds for some $(s_1,\ldots,s_q) \in \mathcal{D}_F^q(x)$.

(3) Or arc, or path.
Note that the definition (3.6) implies that
\[ s_i \in T_F(x) \;\mbox{ for } i \in \{1,\ldots,u\}, \tag{3.7} \]
where $s_u$ is the first nonzero $s_l$. We now consider some conditions that preclude the existence of feasible descent paths of the form (3.5). These conditions involve the index sets $\mathcal{P}(j,k)$ defined, for $k \le j$, by
\[ \mathcal{P}(j,k) \stackrel{\rm def}{=} \Big\{ (l_1,\ldots,l_k) \in \{1,\ldots,j\}^k \;\Big|\; \sum_{i=1}^k l_i = j \Big\}. \tag{3.8} \]
For $j \le 4$, these are given by Table 3.2.

j\k  1      2                    3                           4
1    {(1)}
2    {(2)}  {(1,1)}
3    {(3)}  {(1,2),(2,1)}        {(1,1,1)}
4    {(4)}  {(1,3),(2,2),(3,1)}  {(1,1,2),(1,2,1),(2,1,1)}   {(1,1,1,1)}

Table 3.2: The sets $\mathcal{P}(j,k)$ for $j \le 4$

We now state necessary conditions for $x_*$ to be a local minimizer.

Theorem 3.1 Suppose that $f$ is $q$ times continuously differentiable in an open neighbourhood of $x_*$ and that $x_*$ is a local minimizer for problem (3.1). If $x_* \in \partial F$, suppose furthermore that a constraint qualification holds in the sense that every feasible arc (3.5) starting from $x_*$ is tangent to $\mathcal{D}_F^q(x_*)$. Then, for $j \in \{1,\ldots,q\}$, the inequality
\[ \sum_{k=1}^j \frac{1}{k!} \sum_{(l_1,\ldots,l_k)\in\mathcal{P}(j,k)} \nabla_x^k f(x_*)[s_{l_1},\ldots,s_{l_k}] \ge 0 \tag{3.9} \]
holds for all $(s_1,\ldots,s_j) \in \mathcal{D}_F^j(x_*)$ such that, for $i \in \{1,\ldots,j-1\}$,
\[ \sum_{k=1}^i \frac{1}{k!} \sum_{(l_1,\ldots,l_k)\in\mathcal{P}(i,k)} \nabla_x^k f(x_*)[s_{l_1},\ldots,s_{l_k}] = 0. \tag{3.10} \]

Proof. Consider feasible paths of the form (3.5). Substituting this relation in the expression $f(x(\alpha)) - f(x_*)$ (given by (3.2)) and collecting terms of equal degree in $\alpha$, we obtain that, for sufficiently small $\alpha$,
\[ f(x(\alpha)) - f(x_*) = \sum_{j=1}^q \alpha^j\Big( \sum_{k=1}^j \frac{1}{k!} \sum_{(l_1,\ldots,l_k)\in\mathcal{P}(j,k)} \nabla_x^k f(x_*)[s_{l_1},\ldots,s_{l_k}] \Big) + o(\alpha^q), \tag{3.11} \]
where $\mathcal{P}(i,k)$ is defined in (3.8). For this to be non-negative, we need each coefficient of $\alpha^j$ to be non-negative on the zero set of the coefficients of orders $1,\ldots,j-1$, subject to the requirement that the arc (3.5) must be feasible for $\alpha$ sufficiently small.

Assume now that $s_1 \in T$ and that (3.10) holds for $i = 1$. This latter condition requires $s_1$ to be in the zero set of the coefficient of $\alpha$ in (3.11), that is $s_1 \in T \cap \ker^1[\nabla_x^1 f(x_*)]$. Then the coefficient of $\alpha^2$ in (3.11) must be non-negative, which yields, using $\mathcal{P}(2,1) = \{(2)\}$ and $\mathcal{P}(2,2) = \{(1,1)\}$ (see Table 3.2), that
\[ \nabla_x^1 f(x_*)[s_2] + \frac{1}{2}\nabla_x^2 f(x_*)[s_1]^2 \ge 0, \tag{3.12} \]
which is (3.9) for $q = 2$. We may then proceed in the same manner for all coefficients from order 3 up to $q$, each time considering them on the zero set of the previous coefficients (that is (3.10)), and verify that (3.11) directly implies (3.9).

Following a long tradition, we say that $x_*$ is a $q$-th order critical point for problem (3.1) if the conclusions of this theorem hold for $j \in \{1,\ldots,q\}$. Of course, a $q$-th order critical point need not be a local minimizer, but every local minimizer is a $q$-th order critical point. Note that the constraint qualification assumption automatically holds if $F$ is defined by a set of explicit polynomial inequalities and/or linear equations. Also note that, as the order $j$ grows, (3.9) may be interpreted as imposing a condition on $s_j$ (via $\nabla_x^1 f(x_*)[s_j]$), given the directions $\{s_i\}_{i=1}^{j-1}$ satisfying (3.10).

In more general situations, the fact that conditions (3.9) and (3.10) not only depend on the behaviour of the objective function in some well-chosen subspace, but also involve the geometry of all possible feasible arcs makes the second-order condition (3.12) difficult to use, particularly in the case where $F \neq \mathbb{R}^n$. In what follows we discuss, as far as we currently can, two resulting questions of interest.
1. Are there cases where these conditions reduce to checking homogeneous polynomials involving the objective function's derivatives on a subspace?

2. If that is not the case, are there circumstances in which not only the complete left-hand side of (3.10) vanishes, but also each term of this left-hand side?

We start by deriving useful consequences of Theorem 3.1.

Corollary 3.2 Suppose that the assumptions of Theorem 3.1 hold and let $N$ be the normal cone to $F$ at $x_*$ and $T$ the corresponding tangent cone. Then
\[ -\nabla_x^1 f(x_*) \in N \tag{3.13} \]
and
\[ s_1 \in {\rm span}\{\nabla_x^1 f(x_*)\}^\perp \cap T \quad\mbox{and}\quad s_2 \in T. \]
Moreover,
\[ \nabla_x^1 f(x_*)[s_i] \ge 0 \;\;(i = 1,2). \tag{3.14} \]
Proof. First observe that (3.9) for $j = 1$ reduces to $\nabla_x^1 f(x_*)[s_1] \ge 0$ for all $s_1 \in T$. Thus (3.13) holds. Also note that (3.9)-(3.10) impose that
\[ s_1 \in T \cap \ker^1[\nabla_x^1 f(x_*)] = T \cap {\rm span}\{\nabla_x^1 f(x_*)\}^\perp, \tag{3.15} \]
which, because of (3.13) and the polarity of $N$ and $T$, yields that $s_1$ belongs to $T$. Assume now that $s_2 \notin T$. Then, for all $\alpha$ sufficiently small, $\alpha s_1 + \alpha^2 s_2$ does not belong to $T$ and thus $x(\alpha) = x_* + \alpha s_1 + \alpha^2 s_2 + o(\alpha^2)$ cannot belong to $F$, which is a contradiction. Hence $s_2 \in T$, and (3.14) follows for $i = 2$, while it follows from $s_1 \in T$ and (3.13) for $i = 1$.

The first-order necessary condition (3.13) is well-known for general first-order minimizers (see [4, Th. 2.9, p. 353] for instance). Consider now the second-order condition (3.12). If $F = \mathbb{R}^n$ (or if the convex constraints are inactive at $x_*$), then $\nabla_x^1 f(x_*) = 0$ because of (3.13), and (3.12) is nothing but the familiar condition that the Hessian of the objective function must be positive semi-definite. If $x_*$ happens to lie on the boundary of $F$ and $\nabla_x^1 f(x_*) \neq 0$, (3.12) indicates that the effect of the curvature of the boundary of $F$ may be represented by the term $\nabla_x^1 f(x_*)[s_2]$, which is non-negative because of (3.14). Consider, for example, the problem
\[ \min_{x\in F\subseteq\mathbb{R}^2} x_1 \quad\mbox{where}\quad F = \{ x \in \mathbb{R}^2 \mid x_1 \ge x_2^2 \}, \]
whose global solution is at the origin. In this case it is easy to check that $\nabla_x^1 f(0) = e_1$ with $-e_1 \in N$, that $\nabla_x^2 f(0) = 0$, and that second-order feasible arcs of the form (3.5) with $x(0) = 0$ may be chosen with $s_1 = \pm e_2$ and $s_2 = \beta e_1$ where $\beta \ge 1$. This imposes $\nabla_x^1 f(0)[s_2] + \frac{1}{2}\nabla_x^2 f(0)[s_1]^2 = \beta \ge 0$, which (unsurprisingly) holds.

Interestingly, there are cases where the geometry of the set of locally feasible arcs is simple and manageable. In particular, suppose that the boundary of $F$ is locally polyhedral. Then, given $\nabla_x^1 f(x_*)$, either $T \cap {\rm span}\{\nabla_x^1 f(x_*)\}^\perp = \{0\}$, in which case conditions (3.9) and (3.10) are void, or there exists a nonzero $d$ in that subspace. It is then possible to define a locally feasible arc with $s_1 = d$ and $s_2 = \cdots = s_q = 0$.
As a consequence, the smallest possible value of $\nabla_x^1 f(x_*)[s_2]$ for feasible arcs starting from $x_*$ is identically zero and this term therefore vanishes from (3.9)-(3.10). Moreover, because of the definition of $\mathcal{P}(j,k)$ (see Table 3.2), all terms but that in $\nabla_x^j f(x_*)[s_1]^j$ also vanish from these conditions, which then simplify to
\[ \nabla_x^j f(x_*)[s_1]^j \ge 0 \quad\mbox{for all}\quad s_1 \in T \cap \Big(\bigcap_{i=1}^{j-1} \ker^i[\nabla_x^i f(x_*)]\Big) \tag{3.16} \]
for $j = 2,\ldots,q$, which is a condition only involving subspaces (for $i \le 2$) and cones (for $i \ge 3$).

3.1.2 Necessary conditions for unconstrained problems

Consider now the case where $x_*$ belongs to the interior of $F$, which is obviously the case if the problem is unconstrained. Then the path (3.5) is unrestricted, $\mathcal{D}_F^q(x_*) = \mathbb{R}^{n\times q}$, and one is then free to choose the vectors $\{s_i\}_{i=1}^q$ (and their signs) arbitrarily. Note first that, since $N = \{0\}$, (3.13) implies that, unsurprisingly, $\nabla_x^1 f(x_*) = 0$.
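The index sets $\mathcal{P}(j,k)$ of (3.8), which organize the terms of conditions (3.9)-(3.10), are easy to enumerate by brute force; the following sketch (our illustration) reproduces Table 3.2:

```python
from itertools import product

# P(j, k): ordered k-tuples of integers in {1,...,j} summing to j, as in (3.8).
def P(j, k):
    return [t for t in product(range(1, j + 1), repeat=k) if sum(t) == j]

# Reproduce Table 3.2 for j <= 4.
assert P(2, 2) == [(1, 1)]
assert P(3, 2) == [(1, 2), (2, 1)]
assert P(4, 2) == [(1, 3), (2, 2), (3, 1)]
assert P(4, 3) == [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
for j in range(1, 5):
    print(j, [P(j, k) for k in range(1, j + 1)])
```

Since $\mathcal{P}(j,k)$ contains the ordered compositions of $j$ into $k$ positive parts, its size is the binomial coefficient $\binom{j-1}{k-1}$, so the number of terms in (3.9) grows quickly with $j$.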
For the second-order condition, we obtain from (3.9), again unsurprisingly, that, because $\ker^1[\nabla_x^1 f(x_*)] = \mathbb{R}^n$, $\nabla_x^2 f(x_*)$ is positive semi-definite on $\mathbb{R}^n$. Hence, if there exists a vector $s_1 \in \ker^2[\nabla_x^2 f(x_*)]$, we have that $\nabla_x^2 f(x_*)\,s_1 = 0$ and therefore that $\nabla_x^2 f(x_*)[s_1,s_2] = 0$ for all $s_2 \in \mathbb{R}^n$. Thus the corresponding term vanishes from (3.9), as well as all terms involving $\nabla_x^2 f(x_*)$ applied to a vector $s_1 \in \ker^2[\nabla_x^2 f(x_*)]$. This implies in particular that the third-order condition may now be written as
\[ \nabla_x^3 f(x_*)[s_1]^3 = 0 \quad\mbox{for all}\quad s_1 \in \ker^2[\nabla_x^2 f(x_*)], \tag{3.17} \]
where the equality is obtained by considering both $s_1$ and $-s_1$.

Unfortunately, complications arise with fourth-order conditions, even when the objective function is a polynomial. Consider the following variant of Peano's [4] problem:
\[ \min_{x\in\mathbb{R}^2} f(x) = x_2^2 - \kappa_1 x_1^2 x_2 + \kappa_2 x_1^4, \tag{3.18} \]
where $\kappa_1$ and $\kappa_2$ are parameters. Then one can verify that
\[ \nabla_x^1 f(0) = 0, \qquad \nabla_x^2 f(0) = \left(\begin{array}{cc} 0 & 0 \\ 0 & 2 \end{array}\right), \]
\[ [\nabla_x^3 f(0)]_{ijk} = \left\{\begin{array}{ll} -2\kappa_1 & \mbox{for } (i,j,k) \in \{(1,1,2),(1,2,1),(2,1,1)\}, \\ 0 & \mbox{otherwise,} \end{array}\right. \]
and
\[ [\nabla_x^4 f(0)]_{ijkl} = \left\{\begin{array}{ll} 24\kappa_2 & \mbox{for } (i,j,k,l) = (1,1,1,1), \\ 0 & \mbox{otherwise.} \end{array}\right. \]
Hence
\[ \ker^1[\nabla_x^1 f(0)] = \mathbb{R}^2, \qquad \ker^2[\nabla_x^2 f(0)] = {\rm span}\{e_1\}, \qquad \ker^3[\nabla_x^3 f(0)] = {\rm span}\{e_1\} \cup {\rm span}\{e_2\}. \]
The necessary condition (3.9) then states that, if the origin is a minimizer, then, using the arc defined by $s_1 = e_1$ and $s_2 = \frac{1}{2}\kappa_1 e_2$,
\[ \frac{1}{2}\nabla_x^2 f(0)[s_2]^2 + \frac{1}{2}\nabla_x^3 f(0)[s_1,s_1,s_2] + \frac{1}{24}\nabla_x^4 f(0)[s_1]^4 = \frac{1}{4}\kappa_1^2 - \frac{1}{2}\kappa_1^2 + \kappa_2 = -\frac{1}{4}\kappa_1^2 + \kappa_2 \ge 0. \]
This shows that the condition $\nabla_x^4 f(x_*)[s_1]^4 \ge 0$ on $\bigcap_{i=1}^3 \ker^i[\nabla_x^i f(x_*)]$, although necessary, is arbitrarily far away from the weaker necessary condition
\[ \frac{1}{2}\nabla_x^2 f(0)[s_2]^2 + \frac{1}{2}\nabla_x^3 f(0)[s_1,s_1,s_2] + \frac{1}{24}\nabla_x^4 f(0)[s_1]^4 \ge 0 \tag{3.19} \]
when $\kappa_1$ grows. As was already the case for problem (3.3), the example for $\kappa_1 = 3$ and $\kappa_2 = 2$, say, shows that a function may admit a saddle point ($x = 0$) which is a maximum along a curve ($x_2 = \frac{3}{2}x_1^2$ in this case) while at the same time being minimal along every line passing through $x_*$. Figure 3.2 shows the contour lines of the objective function of (3.18) for increasing values of $\kappa_2$, keeping $\kappa_1 = 3$.
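The saddle behaviour of (3.18) for $\kappa_1 = 3$, $\kappa_2 = 2$ can be checked numerically (our illustrative sketch). The key observation is the factorization $f = (x_2 - x_1^2)(x_2 - 2x_1^2)$ valid for these parameter values, so $f < 0$ exactly between the two parabolas; in particular $f < 0$ along $x_2 = \frac{3}{2}x_1^2$, while along each straight line $f \ge 0$ in a line-dependent neighbourhood of the origin:

```python
import numpy as np

# Peano-type variant (3.18) with kappa1 = 3, kappa2 = 2 < kappa1^2/4.
k1, k2 = 3.0, 2.0
f = lambda x1, x2: x2 ** 2 - k1 * x1 ** 2 * x2 + k2 * x1 ** 4

# descent along the curve x2 = (k1/2) x1^2 = 1.5 x1^2
for t in np.linspace(1e-3, 0.1, 50):
    assert f(t, 0.5 * k1 * t ** 2) < 0.0

# ascent along every sampled line, within a neighbourhood that shrinks
# with the line (precisely why the origin is still not a minimizer)
for theta in np.linspace(0.0, 2.0 * np.pi, 60):
    d1, d2 = np.cos(theta), np.sin(theta)
    t_star = 0.1 if d2 <= 0 or abs(d1) < 1e-8 else min(0.1, 0.4 * d2 / d1 ** 2)
    for t in np.linspace(t_star / 50, t_star, 50):
        assert f(t * d1, t * d2) >= 0.0
print("minimal along every line, maximal along x2 = 1.5*x1^2")
```

The per-line threshold `t_star` keeps the sample below the region between the parabolas; that this threshold cannot be chosen uniformly over all lines is exactly what distinguishes linewise minimality from local minimality.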
One may attribute the problem that not every term in (3.19) vanishes to the fact that switching the signs of $s_1$ or $s_2$ does not imply that any of the terms in (3.19) is zero (as we have verified for (3.17)), because of the even-degree terms $\nabla_x^2 f(0)[s_2]^2$ and $\nabla_x^4 f(0)[s_1]^4$. Is this a feature of even orders
Figure 3.2: The contour lines of (3.18) for $\kappa_1 = 3$ and $\kappa_2 = 2$ (left), $\kappa_2 = 2.25$ (center) and $\kappa_2 = 2.5$ (right)

only? Unfortunately, this is not the case, as can already be seen for $q = 7$. Indeed, it is not difficult to verify that the terms whose multi-index $(l_1,\ldots,l_k)$ is a permutation of $(1,2,2,2)$ belong to $\mathcal{P}(7,4)$ and those whose multi-index is a permutation of $(1,1,1,1,1,2)$ belong to $\mathcal{P}(7,6)$. Moreover, the contributions of these terms to the sum (3.9) cannot be distinguished by varying $s_1$ or $s_2$, for instance by switching their signs, as this technique yields only one equality in two unknowns. In general, we may therefore conclude that (3.9) must involve a mixture of terms with derivative tensors of various degrees.

3.1.3 A sufficient condition

Despite the limitations we have seen when considering the simplified Hancock example, we may still derive a sufficient condition for $x_*$ to be an isolated minimizer, which is inspired by the standard second-order case (see Theorem 2.4 in Nocedal and Wright [4] for instance).

Theorem 3.3 Suppose that $f$ is $q$ times continuously differentiable in an open neighbourhood of $x_* \in F$. If $x_* \in \partial F$, suppose also that a constraint qualification holds in the sense that every feasible arc starting from $x_*$ is tangent to $\mathcal{D}_F^p(x_*)$. Let $T$ be the tangent cone to $F$ at $x_*$. If there exists a $q \in [1,p-1]$ such that, for all $s \in T$,
\[ \nabla_x^i f(x_*)[s]^i = 0 \;\;(i = 1,\ldots,q) \quad\mbox{and}\quad \nabla_x^{q+1} f(x_*)[s]^{q+1} > 0, \tag{3.20} \]
then $x_*$ is an isolated minimizer for problem (3.1).

Proof. The second part of condition (3.20) and the continuity of the $(q+1)$-st derivative imply that
\[ \nabla_x^{q+1} f(z)[s]^{q+1} > 0 \tag{3.21} \]
for all $s \in T$ and all $z$ in a sufficiently small feasible neighbourhood of $x_*$. Now, using Taylor's expansion, we obtain that, for all $s \in T$ and all $\tau \in (0,1)$,
\[ f(x_* + \tau s) - f(x_*) = \sum_{i=1}^q \frac{\tau^i}{i!}\nabla_x^i f(x_*)[s]^i + \frac{\tau^{q+1}}{(q+1)!}\nabla_x^{q+1} f(z)[s]^{q+1} \]
for some $z \in [x_*, x_* + \tau s]$. If $\tau$ is sufficiently small, then this equality, the first part of (3.20) and (3.21) ensure that $f(x_* + \tau s) > f(x_*)$. Since this strict inequality holds for all $s \in T$ and all $\tau$ sufficiently small, $x_*$ must be a feasible isolated minimizer.

Observe that, in Peano's example (see (3.18) with $\kappa_1 = 3$ and $\kappa_2 = 2$), the curvature of the objective function is positive along every line passing through the origin, but the order of the curvature varies with $s$ (second order along $s = e_2$ and fourth order along $s = e_1$), which precludes applying Theorem 3.3.

3.2 An approach using Taylor models

As already noted, the conditions expressed in Theorem 3.1 may, in general, be very complicated to verify in an algorithm, due to their dependence on the geometry of the set of feasible paths. To avoid this difficulty, we now explore a different approach. Let us define, for some $\delta \in (0,1]$ and some $j \in \{1,\ldots,p\}$,
\[ \phi_{f,j}^\delta(x) \stackrel{\rm def}{=} f(x) - \mathop{\rm globmin}_{\substack{x+d\in F \\ \|d\|\le\delta}} T_{f,j}(x,d), \tag{3.22} \]
the difference between $f(x)$ and the smallest value of the $j$-th order Taylor model $T_{f,j}(x,d)$ achievable by a feasible point at distance at most $\delta$ from $x$. Note that $\phi_{f,j}^\delta(x)$ is a continuous function of $x$ and $\delta$ for given $F$ and $f$ (see [33, Th. 7]). The introduction of this quantity is in part motivated by the following theorem.

Theorem 3.4 Suppose that $f$ is $q$ times continuously differentiable in an open neighbourhood of $x$. If $x \in \partial F$, suppose furthermore that a constraint qualification holds in the sense that every feasible arc starting from $x$ is tangent to $\mathcal{D}_F^q(x)$. Define
\[ \mathcal{Z}_F^{f,j}(x) \stackrel{\rm def}{=} \big\{ (s_1,\ldots,s_j) \in \mathcal{D}_F^j(x) \mid (s_1,\ldots,s_i) \mbox{ satisfy (3.10) (at $x$) for } i \in \{1,\ldots,j-1\} \big\}. \]
Then
\[ \lim_{\delta\to 0} \frac{\phi_{f,j}^\delta(x)}{\delta^j} = 0 \]
implies that (3.9) holds (at $x$) for all $(s_1,\ldots,s_j) \in \mathcal{Z}_F^{f,j}(x)$.

Proof. We start by rewriting the power series (3.11) for degree $j$, for any given arc $x(\alpha)$ tangent to $\mathcal{D}_F^j(x)$, in the form
\[ f(x(\alpha)) - f(x) = \sum_{i=1}^j c_i\,\alpha^i + o(\alpha^j) = T_{f,j}(x,s(\alpha)) - f(x) + o(\alpha^j), \tag{3.23} \]
where
\[ s(\alpha) \stackrel{\rm def}{=} x(\alpha) - x \qquad\mbox{and}\qquad c_i \stackrel{\rm def}{=} \sum_{k=1}^i \frac{1}{k!} \sum_{(l_1,\ldots,l_k)\in\mathcal{P}(i,k)} \nabla_x^k f(x)[s_{l_1},\ldots,s_{l_k}], \]
and where the last equality in (3.23) holds because $f$ and $T_{f,j}$ share their first $j$ derivatives at $x$. This reformulation allows us to write, for $i \in \{1,\ldots,j\}$,
\[ c_i = \frac{1}{i!}\frac{d^i}{d\alpha^i}\big[ T_{f,j}(x,s(\alpha)) - f(x) \big]_{\alpha=0}. \tag{3.24} \]
Assume now that there exists an $(s_1,\ldots,s_j) \in \mathcal{Z}_F^{f,j}(x)$ such that (3.9) does not hold. In the notation just introduced, this means that, for this particular $(s_1,\ldots,s_j)$,
\[ c_i = 0 \;\mbox{ for } i \in \{1,\ldots,j-1\} \qquad\mbox{and}\qquad c_j < 0. \]
Then, from (3.24),
\[ \frac{d^i}{d\alpha^i}\big[ T_{f,j}(x,s(\alpha)) - f(x) \big]_{\alpha=0} = 0 \;\mbox{ for } i \in \{1,\ldots,j-1\}, \tag{3.25} \]
and thus the first $j-1$ coefficients of the polynomial $T_{f,j}(x,s(\alpha)) - f(x)$ vanish. Thus, using (3.23),
\[ \frac{1}{j!}\frac{d^j}{d\alpha^j}\big[ T_{f,j}(x,s(\alpha)) - f(x) \big]_{\alpha=0} = \lim_{\alpha\to 0} \frac{T_{f,j}(x,s(\alpha)) - f(x)}{\alpha^j}. \tag{3.26} \]
Now let $i_0$ be the index of the first nonzero $s_i$. Note that $i_0 \in \{1,\ldots,j\}$, since otherwise the structure of the sets $\mathcal{P}(i,k)$ implies that $c_j = 0$. Observe also that we may redefine the $\alpha$ parameter so that we may assume, without loss of generality, that $\|s_{i_0}\| = 1$. As a consequence, we obtain that, for sufficiently small $\alpha$,
\[ \|s(\alpha)\| \le \frac{3}{2}\alpha^{i_0} \le \frac{3}{2}\alpha. \tag{3.27} \]
Hence, successively using the facts that $c_j < 0$, that (3.24) and (3.26) hold for all arcs $x(\alpha)$ tangent to $\mathcal{D}_F^q(x)$, and that (3.27) and (3.22) hold, we may deduce that
\[ 0 < -c_j = \lim_{\alpha\to 0} \frac{f(x) - T_{f,j}(x,s(\alpha))}{\alpha^j} \le \big(\tfrac{3}{2}\big)^j \lim_{\alpha\to 0} \frac{f(x) - T_{f,j}(x,s(\alpha))}{\|s(\alpha)\|^j} \le \big(\tfrac{3}{2}\big)^j \lim_{\delta\to 0} \frac{\phi_{f,j}^\delta(x)}{\delta^j}. \]
The conclusion of the theorem immediately follows, since $\lim_{\delta\to 0} \phi_{f,j}^\delta(x)/\delta^j = 0$ then yields a contradiction.

This theorem has a useful consequence.
Corollary 3.5 Suppose that $f$ is $q$ times continuously differentiable in an open neighbourhood of $x$. If $x \in \partial F$, suppose furthermore that a constraint qualification holds in the sense that every feasible arc starting from $x$ is tangent to $\mathcal{D}_F^q(x)$. Then $x$ is a $q$-th order critical point for problem (3.1) if
\[ \lim_{\delta\to 0} \frac{\phi_{f,j}^\delta(x)}{\delta^j} = 0 \;\mbox{ for } j \in \{1,\ldots,q\}. \tag{3.28} \]

Proof. We successively apply Theorem 3.4 $q$ times and deduce that $x$ is a $j$-th order critical point for $j = 1,\ldots,q$.

This last result says that we may avoid the difficulty of dealing with the possibly complicated geometry of $\mathcal{D}_F^q(x)$ if we are ready to perform the global optimization occurring in (3.22) exactly and find a way to compute or overestimate the limit in (3.28). Although this is a positive conclusion, these two remaining challenges remain daunting. However, it is worthwhile noting that the standard approach to computing first-, second- and third-order criticality measures for unconstrained problems follows the exact same approach. In the first-order case, it is easy to verify that
\[ \|\nabla_x^1 f(x)\| = -\min_{\|d\|\le 1} \nabla_x^1 f(x)[d] = \frac{1}{\delta}\Big[ f(x) - \mathop{\rm globmin}_{\|d\|\le\delta}\big( f(x) + \nabla_x^1 f(x)[d] \big) \Big], \]
where the first equality is justified by the convexity of $\nabla_x^1 f(x)[d]$ as a function of $d$. Because the left-hand side of the above relation is independent of $\delta$, the computation of the limit (3.28) for $\delta$ tending to zero is trivial when $j = 1$ and the limiting value is $\|\nabla_x f(x)\|$. For the second-order case, assuming $\nabla_x^1 f(x) = 0$,
\[ \frac{1}{2}\Big| \min\big[0,\lambda_{\min}[\nabla_x^2 f(x)]\big] \Big| = -\frac{1}{\delta^2}\mathop{\rm globmin}_{\|d\|\le\delta} \frac{1}{2}\nabla_x^2 f(x)[d]^2 = \frac{1}{\delta^2}\Big[ f(x) - \mathop{\rm globmin}_{\|d\|\le\delta}\big( f(x) + \nabla_x^1 f(x)[d] + \tfrac{1}{2}\nabla_x^2 f(x)[d]^2 \big) \Big], \tag{3.29} \]
the first global optimization problem being easily solvable by a trust-region-type calculation [2, Section 7.3] or directly by an equivalent eigenvalue analysis. As for the first-order case, the left-hand side of the equation is independent of $\delta$ and obtaining the limit for $\delta$ tending to zero is trivial.
Finally, if
\[ \mathcal{M}(x) \stackrel{\rm def}{=} \ker[\nabla_x^1 f(x)] \cap \ker[\nabla_x^2 f(x)] \]
and $P_{\mathcal{M}(x)}$ is the orthogonal projection onto that subspace, then
\[ \frac{1}{6}\big\| P_{\mathcal{M}(x)}(\nabla_x^3 f(x)) \big\|_3 = -\frac{1}{6}\min_{\substack{\|d\|\le 1 \\ d\in\mathcal{M}(x)}} \nabla_x^3 f(x)[d]^3 = \frac{1}{\delta^3}\Big[ f(x) - \mathop{\rm globmin}_{\substack{\|d\|\le\delta \\ d\in\mathcal{M}(x)}}\big( f(x) + \nabla_x^1 f(x)[d] + \tfrac{1}{2}\nabla_x^2 f(x)[d]^2 + \tfrac{1}{6}\nabla_x^3 f(x)[d]^3 \big) \Big], \tag{3.30} \]
where the first equality results from (2.2) and (2.4). In this case, the global optimization in the subspace $\mathcal{M}(x)$ is potentially harder to solve exactly (a randomization argument is used in [] to derive
an upper bound on its value), although it still involves a subspace (4). While we are unaware of a technique for making the global minimization in (3.22) easy in the even more complicated general case, we may think of approximating the limit in (3.28) by choosing a (user-supplied) value of $\delta > 0$ small enough (5) and considering the size of the quantity
\[ \frac{\phi_{f,j}^\delta(x)}{\delta^j}. \tag{3.31} \]
Unfortunately, it is easy to see that, if $\delta$ is fixed at some positive value, a zero value of $\phi_{f,j}^\delta(x)$ alone is not a necessary condition for $x$ being a local minimizer. Indeed, consider the univariate problem of minimizing $f(x) = x^2(1 - \alpha x)$ for $\alpha > 0$. One verifies that, for any $\delta > 0$, the choice $\alpha = 2/\delta$ yields that
\[ \phi_{f,1}^\delta(0) = 0, \qquad \phi_{f,2}^\delta(0) = 0 \qquad\mbox{but}\qquad \phi_{f,3}^\delta(0) = \frac{4}{\alpha^2} > 0, \tag{3.32} \]
despite $0$ being a local (but not global) minimizer. As a matter of fact, $\phi_{f,j}^\delta(x)$ gives more information than the mere potential proximity of a $j$-th order critical point: it is able to see beyond an infinitesimal neighbourhood of $x$ and provides information on possible further descent beyond such a neighbourhood. Rather than a true criticality measure, it can be considered, for fixed $\delta$, as an indicator of further progress, but its use for terminating at a local minimizer is clearly imperfect.

Despite this drawback, the above arguments suggest that it is reasonable to consider a (conceptual) minimization algorithm whose objective is to find a point $x_\epsilon$ such that
\[ \phi_{f,j}^\delta(x_\epsilon) \le \epsilon\,\delta^j \;\mbox{ for } j = 1,\ldots,q, \tag{3.33} \]
for some $\delta \in (0,1]$ sufficiently small and some $q \in \{1,\ldots,p\}$. This condition implies an approximate minimizing property, which we make more precise in the following result.

Theorem 3.6 Suppose that $f$ is $q$ times continuously differentiable and that $\nabla_x^q f$ is Lipschitz continuous with constant $L_{f,q}$ (in the sense of (2.5)) in an open neighbourhood of $x_\epsilon$ of radius larger than $\delta$. Suppose also that (3.33) holds for $j = q$. Then
\[ f(x_\epsilon + d) \ge f(x_\epsilon) - 2\epsilon\,\delta^q \;\mbox{ for all } x_\epsilon + d \in F \mbox{ such that } \|d\| \le \min\left[\delta, \left(\frac{q!\,\epsilon\,\delta^q}{L_{f,q}}\right)^{1/(q+1)}\right]. \tag{3.34} \]

Proof.
Consider x_ǫ + d ∈ F. Using the triangle inequality, we have that

  f(x_ǫ + d) = f(x_ǫ + d) − T_{f,q}(x_ǫ, d) + T_{f,q}(x_ǫ, d) ≥ −| f(x_ǫ + d) − T_{f,q}(x_ǫ, d) | + T_{f,q}(x_ǫ, d).   (3.35)

Now condition (3.33) for j = q implies that, if ‖d‖ ≤ Δ,

  T_{f,q}(x_ǫ, d) ≥ T_{f,q}(x_ǫ, 0) − ǫ Δ^q = f(x_ǫ) − ǫ Δ^q.   (3.36)

(4) We saw in Section 3.1.2 that q = 3 is the highest order for which this is possible.
(5) Note that a small Δ has the advantage of limiting the global optimization effort.
Hence, substituting (2.10) and (3.36) into (3.35) and using again that ‖d‖ ≤ Δ, we deduce that

  f(x_ǫ + d) ≥ f(x_ǫ) − (L_{f,q}/q!) ‖d‖^{q+1} − ǫ Δ^q,

and the desired result follows. □

The size of the neighbourhood of x_ǫ where f is locally smallest, in the sense that the first part of (3.34) holds, therefore increases with the criticality order q, a feature potentially useful in various contexts such as global optimization. Before turning to more algorithmic aspects, we briefly compare the results of Theorem 3.6 with what can be deduced about the local behaviour of the Taylor series T_{f,q}(x,s) if, instead of requiring the necessary condition (3.9) to hold exactly, this condition is relaxed to

  ‖ Σ_{(l_1,...,l_k) ∈ P(j,k)} (1/(l_1! ··· l_k!)) ∇^k_x f(x_*)[s^{l_1},...,s^{l_k}] ‖ ≤ ǫ   (3.37)

while insisting that (3.10) should hold exactly. If j = q = 1, it is easy to verify that (3.37) for s ∈ T is equivalent to the condition that

  ‖P_T[∇_x f(x_*)]‖ ≤ ǫ,   (3.38)

from which we deduce, using the Cauchy-Schwarz inequality, that

  T_{f,1}(x_*, s) ≥ T_{f,1}(x_*, 0) − ǫ Δ   (3.39)

for all s ∈ T with ‖s‖ ≤ Δ, that is (3.33) for j = 1. Thus, by Theorem 3.6, we obtain that (3.34) holds for j = 1.

4 A trust-region minimization algorithm

Aware of the optimality conditions and their limitations, we may now consider an algorithm designed to achieve (3.33). This objective naturally suggests a trust-region(6) formulation with adaptive model degree, in which the user specifies a desired criticality order q, assuming that derivatives of order 1,...,q are available when needed. We make this idea explicit in Algorithm 4.1 on the next page. We first state a useful property of Algorithm 4.1, which ensures that a fixed fraction of the iterations 0, 1, 2, ... must be either successful or very successful. Indeed, if we define

  S_k def= { l ∈ {0,...,k} | ρ_l ≥ η_1 },

the following bound holds.

(6) A detailed account and a comprehensive bibliography on trust-region methods can be found in [2].
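Before stating the algorithm, it is worth getting a feel for how the neighbourhood guaranteed by Theorem 3.6 scales with the order. The small computation below evaluates the radius min[Δ, (q! ǫ Δ^q / L_{f,q})^{1/(q+1)}] from (3.34) for some illustrative constants; the helper name `radius` and the chosen values of ǫ, Δ and the Lipschitz constant are ours, not the paper's.

```python
import math

# Radius of the neighbourhood in which (3.34) guarantees near-minimality:
# min[Delta, (q! * eps * Delta**q / L)**(1/(q+1))], for illustrative constants.
def radius(q, eps=1e-3, delta=0.5, lip=1.0):
    return min(delta, (math.factorial(q) * eps * delta**q / lip) ** (1.0 / (q + 1)))

print([radius(q) for q in (1, 2, 3, 4)])
```

As the discussion above suggests, the printed radii increase monotonically with the criticality order q.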
Algorithm 4.1: Trust-region algorithm using adaptive order models for convexly-constrained problems (TRq)

Step 0: Initialization. A criticality order q, an accuracy threshold ǫ ∈ (0,1], a starting point x_{-1} and an initial trust-region radius Δ_0 ∈ [ǫ,1] are given, as well as algorithmic parameters Δ_max ∈ [Δ_0,1], 0 < γ_1 ≤ γ_2 < 1 < γ_3 and 0 < η_1 ≤ η_2 < 1. Compute x_0 = P_F[x_{-1}], evaluate f(x_0) and set k = 0.

Step 1: Step computation. For j = 1,...,q,

  1. Evaluate ∇^j_x f(x_k) and compute φ^{Δ_k}_{f,j}(x_k) from (3.22).
  2. If φ^{Δ_k}_{f,j}(x_k) > ǫ Δ_k^j, go to Step 3 with s_k = d, where d is the argument of the global minimum in the computation of φ^{Δ_k}_{f,j}(x_k).

Step 2: Termination. Terminate with x_ǫ = x_k and Δ_ǫ = Δ_k.

Step 3: Accept the new iterate. Compute f(x_k + s_k) and

  ρ_k = [ f(x_k) − f(x_k + s_k) ] / [ T_{f,j}(x_k, 0) − T_{f,j}(x_k, s_k) ].   (4.1)

If ρ_k ≥ η_1, set x_{k+1} = x_k + s_k. Otherwise set x_{k+1} = x_k.

Step 4: Update the trust-region radius. Set

  Δ_{k+1} ∈ [γ_1 Δ_k, γ_2 Δ_k]             if ρ_k < η_1,
  Δ_{k+1} ∈ [γ_2 Δ_k, Δ_k]                 if ρ_k ∈ [η_1, η_2),   (4.2)
  Δ_{k+1} ∈ [Δ_k, min(Δ_max, γ_3 Δ_k)]     if ρ_k ≥ η_2,

increment k by one and go to Step 1.
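To illustrate the mechanism of Algorithm 4.1, here is a minimal first-order (q = 1) instantiation for a one-dimensional interval F = [lo, hi], where the measure φ^{Δ_k}_{f,1} has a closed form because the linear model attains its minimum at an endpoint of the feasible segment. All names and parameter values are ours; this is a sketch of the conceptual method, not the general algorithm, which requires global model minimization for every order j up to q.

```python
def trq1(f, grad, x0, lo, hi, eps, delta0=1.0, delta_max=1.0,
         gamma1=0.25, gamma2=0.5, gamma3=2.0, eta1=0.1, eta2=0.9, maxit=500):
    """First-order (q = 1) sketch of the TRq trust-region loop on F = [lo, hi]."""
    x, delta = min(max(x0, lo), hi), delta0        # Step 0: project the starting point onto F
    for k in range(maxit):
        g = grad(x)
        # Step 1: phi^{Delta}_{f,1}(x) is the best linear-model decrease -g*d over
        # x + d in F with |d| <= delta; a linear function is extremal at an endpoint.
        d_lo, d_hi = max(lo - x, -delta), min(hi - x, delta)
        d = d_lo if -g * d_lo > -g * d_hi else d_hi
        phi = -g * d
        if phi <= eps * delta:                     # Step 2: termination test (3.33), j = 1
            return x, k
        rho = (f(x) - f(x + d)) / phi              # Step 3: achieved/predicted ratio (4.1)
        if rho >= eta1:
            x = x + d                              # accept the trial point
        if rho < eta1:                             # Step 4: radius update (4.2)
            delta *= gamma1
        elif rho < eta2:
            delta *= gamma2
        else:
            delta = min(delta_max, gamma3 * delta)
    return x, maxit

# Minimize f(x) = (x-2)^2 over F = [0,1]: the constrained minimizer is x = 1.
x_star, iters = trq1(lambda x: (x - 2.0)**2, lambda x: 2.0 * (x - 2.0),
                     x0=0.0, lo=0.0, hi=1.0, eps=1e-6)
print(x_star, iters)
```

On this example the very first full step reaches the boundary minimizer, after which the termination test of Step 2 fires.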
Lemma 4.1 Assume that Δ_k ≥ Δ_min for some Δ_min > 0 independent of k. Then Algorithm 4.1 ensures that

  k ≤ κ_u |S_k|,  where  κ_u def= ( 1 + log γ_3 / |log γ_2| ) + (1 / |log γ_2|) log( Δ_0 / Δ_min ).   (4.3)

Proof. The trust-region update (4.2) ensures that Δ_0 γ_2^{|U_k|} γ_3^{|S_k|} ≥ Δ_min, where U_k = {0,...,k} \ S_k. This inequality then yields (4.3) by taking logarithms and using that |S_k| ≥ 1 and k + 1 = |S_k| + |U_k|. □

4.1 Evaluation complexity for Algorithm 4.1

We start our worst-case analysis by formalizing our assumptions. Let

  ℒ_f def= { x + z ∈ IR^n | x ∈ F, f(x) ≤ f(x_0) and ‖z‖ ≤ Δ_max }.

AS.1 The feasible set F is closed, convex and non-empty.
AS.2 The objective function f is q times continuously differentiable on an open set containing ℒ_f.
AS.3 For j ∈ {1,...,q}, the j-th derivative of f is Lipschitz continuous on ℒ_f (in the sense of (2.5)) with Lipschitz constant L_{f,j}. For simplicity of notation, define L_f def= max_{j ∈ {1,...,q}} L_{f,j}.

Algorithm 4.1 is required to start from a feasible x_0 ∈ F, which, together with the fact that the subproblem solution in Step 1 involves minimization over F, leads to AS.1. Note that AS.3 automatically holds if f is q+1 times continuously differentiable and F is bounded. It is also important to note that we could replace AS.3 by the condition that (2.10) holds on the path of iterates [x_k, x_{k+1}] without altering any of the proofs below. While this weaker formulation may be useful, we prefer to use AS.3 in the sequel of this paper because it is independent of the sequence of iterates and may be easier to verify a priori, given the problem (3.1).

Lemma 4.2 Suppose that AS.2 and AS.3 hold. Then, for all l ∈ {0,...,k},

  Δ_l ≥ κ_Δ ǫ   (4.4)

where

  κ_Δ def= min [ 1, γ_1 (1 − η_2) / L_f ].   (4.5)
Proof. Assume that, for some l ∈ {0,...,k},

  Δ_l ≤ (1 − η_2) ǫ / L_f.   (4.6)

From (4.1), we obtain that, for some j ∈ {1,...,q},

  |ρ_l − 1| = | f(x_l + s_l) − T_{f,j}(x_l, s_l) | / | T_{f,j}(x_l, 0) − T_{f,j}(x_l, s_l) | < L_f ‖s_l‖^{j+1} / ( j! ǫ Δ_l^j ) ≤ L_f Δ_l / ( j! ǫ ) ≤ (1 − η_2),

where we used (2.9) and the fact that φ^{Δ_l}_{f,j}(x_l) > ǫ Δ_l^j to deduce the first inequality, the bound ‖s_l‖ ≤ Δ_l to deduce the second, and (4.6) with j ≥ 1 to deduce the third. Thus ρ_l ≥ η_2 and Δ_{l+1} ≥ Δ_l. The mechanism of the algorithm and the inequality Δ_0 ≥ ǫ then ensure that, for all l,

  Δ_l ≥ min [ 1, γ_1 (1 − η_2) / L_f ] ǫ = κ_Δ ǫ. □

We now derive a simple lower bound on the objective function decrease at successful iterations.

Lemma 4.3 Suppose that AS.1-AS.3 hold. Then, if k is the index of a successful iteration before termination,

  f(x_k) − f(x_{k+1}) ≥ η_1 κ_Δ^q ǫ^{q+1}.   (4.7)

Proof. We have, using (4.1), the fact that φ^{Δ_k}_{f,j}(x_k) > ǫ Δ_k^j and (4.4) successively, that, for some j ∈ {1,...,q},

  f(x_k) − f(x_{k+1}) ≥ η_1 [ T_{f,j}(x_k, 0) − T_{f,j}(x_k, s_k) ] = η_1 φ^{Δ_k}_{f,j}(x_k) > η_1 κ_Δ^j ǫ^{j+1} ≥ η_1 κ_Δ^q ǫ^{q+1}. □

Our worst-case evaluation complexity results can now be proved by summing the decreases guaranteed by this last lemma.
Theorem 4.4 Suppose that AS.1-AS.3 hold. Let f_low be a lower bound on f within F. Then, given ǫ ∈ (0,1], Algorithm 4.1 applied to problem (3.1) needs at most

  κ^S_f ( f(x_0) − f_low ) ǫ^{−(q+1)}   (4.8)

successful iterations (each possibly involving one evaluation of f and its q first derivatives) and at most

  κ_u κ^S_f ( f(x_0) − f_low ) ǫ^{−(q+1)} + 1   (4.9)

iterations in total to terminate with an iterate x_ǫ such that (3.33) holds, where

  κ^S_f def= (1/η_1) max [ 1, L_f / ( γ_1 (1 − η_2) ) ]^q,   (4.10)

and κ_u is given by (4.3). Moreover, if Δ_ǫ is the value of Δ_k at termination, then

  f(x_ǫ + d) ≥ f(x_ǫ) − 2 ǫ Δ_ǫ^q   (4.11)

for all d such that

  x_ǫ + d ∈ F and ‖d‖ ≤ min [ Δ_ǫ, ( q! ǫ Δ_ǫ^q / L_f )^{1/(q+1)} ].   (4.12)

Observe that, because of (4.2) and (4.4), Δ_ǫ ∈ [κ_Δ ǫ, Δ_max].

Proof. Let k be the index of an arbitrary iteration before termination. Using the definition of f_low, the nature of successful iterations, (4.1) and Lemma 4.3, we deduce that

  f(x_0) − f_low ≥ f(x_0) − f(x_{k+1}) = Σ_{i ∈ S_k} [ f(x_i) − f(x_{i+1}) ] ≥ |S_k| [ κ^S_f ]^{−1} ǫ^{q+1}   (4.13)

which proves (4.8). We next call upon Lemma 4.1 to compute the upper bound on the total number of iterations before termination (obviously, there must be at least one successful iteration unless termination occurs for k = 0) and add one for the evaluation at termination. Finally, (4.11)-(4.12) result from Theorem 3.6, AS.3 and the fact that φ^{Δ_ǫ}_{f,q}(x_ǫ) ≤ ǫ Δ_ǫ^q at termination. □

Theorem 4.4 generalizes the known bounds for the cases where F = IR^n and q = 1 [38], q = 2 [2,39] and q = 3 [1]. The results for q = 2 with F ≠ IR^n and for q > 3 appear to be new. The latter provide the first evaluation complexity bounds for general criticality order q. Note that, if q = 1, bounds of the type O(ǫ^{−(p+1)/p}) exist if one is ready to minimize models of degree p > q (see [6]). Whether similar improvements can be obtained for q > 1 remains an open question at this stage.
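To make the order of magnitude in Theorem 4.4 concrete, the fragment below evaluates the successful-iteration bound (4.8) with the constant factor κ^S_f (f(x_0) − f_low) normalized to 1 (a placeholder of ours): halving ǫ multiplies the bound by 2^{q+1}.

```python
import math

# Successful-iteration bound (4.8): const * eps**-(q+1), with const = 1 for illustration.
def successful_iterations(eps, q, const=1.0):
    return math.ceil(const * eps ** -(q + 1))

for q in (1, 2, 3):
    print(q, [successful_iterations(0.5 ** i, q) for i in (1, 2, 3)])
```

The rapid growth with q explains why high-order approximate criticality is expensive to certify in the worst case.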
We also observe that the above theory remains valid if the termination rule

  φ^{Δ_k}_{f,j}(x_k) ≤ ǫ Δ_k^j for j ∈ {1,...,q}   (4.14)

used in Step 1 is replaced by a more flexible one, involving other acceptable termination circumstances, such as "terminate if (4.14) holds or some other condition holds". We conclude this section by noting that the global optimization effort involved in the computation of φ^{Δ_k}_{f,j}(x_k) (j ∈ {1,...,q}) in Algorithm 4.1 might be limited by choosing Δ_max small enough.

5 Sharpness

It is interesting that an example was presented in [4] showing that the bound of O(ǫ^{−3}) evaluations for q = 2 is essentially sharp for both the trust-region and a regularization algorithm. This is significant, because requiring φ^Δ_{f,2}(x) ≤ ǫ Δ² is slightly stronger, for small Δ, than the standard condition

  ‖∇_x f(x)‖ ≤ ǫ and min [ 0, λ_min[∇²_x f(x)] ] ≥ −ǫ   (5.1)

(used in [39] and [2] for instance). Indeed, for one-dimensional problems and assuming ∇²_x f(x) ≤ 0, the former condition amounts to requiring that

  |∇_x f(x)| − (Δ/2) ∇²_x f(x) ≤ ǫ Δ   (5.2)

where the absolute value reflects the fact that s = ±Δ depending on the sign of g. In the remainder of this section, we show that the example proposed in [4] can be extended to arbitrary order q, and thus that the complexity bounds (4.8)-(4.9) are essentially sharp for our trust-region algorithm. The idea of our generalized example is to apply Algorithm 4.1 to a unidimensional objective function f for some fixed q and F = IR_+ (hence guaranteeing AS.1), generating a sequence of iterates {x_k} starting from the origin, i.e. x_{-1} = x_0 = 0. We first choose the sequences of derivative values up to order q to be, for all k,

  ∇^j_x f(x_k) = 0 for j ∈ {1,...,q−1} and ∇^q_x f(x_k) = −q! ( 1/(k+1) )^{1/(q+1)+δ}   (5.3)

where δ ∈ (0,1) is a (small) positive constant. This means that, at iterate x_k, the q-th order Taylor model is given by

  T_{f,q}(x_k, s) = f(x_k) − ( 1/(k+1) )^{1/(q+1)+δ} s^q,

where the value of f(x_k) remains unspecified for now.
The step is then obtained by minimizing this model in a trust region of radius

  Δ_k = ( 1/(k+1) )^{1/(q+1)+δ},
yielding that

  s_k = Δ_k = ( 1/(k+1) )^{1/(q+1)+δ} ∈ (0,1).   (5.4)

As a consequence, the model decrease is given by

  T_{f,q}(x_k, 0) − T_{f,q}(x_k, s_k) = −(1/q!) ∇^q_x f(x_k) s_k^q = ( 1/(k+1) )^{1+(q+1)δ}.   (5.5)

For our example, we then define the objective function decrease at iteration k to be

  Δf_k def= f(x_k) − f(x_k + s_k) = (1/2)(η_1 + η_2) [ T_{f,q}(x_k, 0) − T_{f,q}(x_k, s_k) ],   (5.6)

thereby ensuring that ρ_k ∈ [η_1, η_2) and x_{k+1} = x_k + s_k for each k. Summing up function decreases, we may then specify the objective function's values at the iterates by

  f(x_0) = ((η_1 + η_2)/2) ζ(1 + (q+1)δ) and f(x_{k+1}) = f(x_k) − ((η_1 + η_2)/2) ( 1/(k+1) )^{1+(q+1)δ},   (5.7)

where ζ(t) def= Σ_{k=1}^∞ k^{−t} is the Riemann zeta function. This function is finite for all t > 1 (and thus also for t = 1 + (q+1)δ), thereby ensuring that f(x_k) ≥ 0 for all k. We also verify that

  Δ_{k+1}/Δ_k = ( (k+1)/(k+2) )^{1/(q+1)+δ} ∈ [γ_2, 1]

in accordance with (4.2), provided γ_2 ≤ (1/2)^{1/(q+1)+δ}. Observe also that (5.3) and (5.5) ensure that, for each k,

  φ^{Δ_k}_{f,j}(x_k) = 0 for j ∈ {1,...,q−1}   (5.8)

and

  φ^{Δ_k}_{f,q}(x_k) = ( 1/(k+1) )^{1+(q+1)δ} = Δ_k^{q+1}.   (5.9)

We now use Hermite interpolation to construct the objective function f on the successive intervals [x_k, x_{k+1}], and define

  f(x) = p_k(x − x_k) + f(x_{k+1}) for x ∈ [x_k, x_{k+1}] and k ≥ 0,   (5.10)

where p_k is the polynomial

  p_k(s) = Σ_{i=0}^{2q+1} (1/i!) c_{i,k} s^i,   (5.11)

with coefficients defined by the interpolation conditions

  p_k(0) = f(x_k) − f(x_{k+1}),  p_k(s_k) = 0;

  ∇^j_s p_k(0) = 0 = ∇^j_s p_k(s_k) for j ∈ {1,...,q−1},   (5.12)

  ∇^q_s p_k(0) = ∇^q_x f(x_k),  ∇^q_s p_k(s_k) = ∇^q_x f(x_{k+1}).
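The bookkeeping of this construction is easy to check numerically. The sketch below uses the choices made above, namely s_k = Δ_k = (1/(k+1))^{1/(q+1)+δ}, a model decrease of (1/(k+1))^{1+(q+1)δ}, and the prescribed function decrease (5.6), and verifies that every iteration is successful but not very successful, i.e. ρ_k ∈ [η_1, η_2); the particular values q = 2, δ = 0.05, η_1 = 0.1, η_2 = 0.9 are ours.

```python
q, delta, eta1, eta2 = 2, 0.05, 0.1, 0.9

rhos = []
for k in range(50):
    w = 1.0 / (k + 1)
    s = w ** (1.0 / (q + 1) + delta)         # s_k = Delta_k, the full trust-region step
    pred = w ** (1 + (q + 1) * delta)        # model decrease T(x_k,0) - T(x_k,s_k)
    assert abs(s ** (q + 1) - pred) < 1e-12  # the decrease also equals Delta_k**(q+1), cf. (5.9)
    fdec = 0.5 * (eta1 + eta2) * pred        # prescribed decrease f(x_k) - f(x_{k+1}), cf. (5.6)
    rhos.append(fdec / pred)                 # ratio (4.1)

print(all(eta1 <= r < eta2 for r in rhos))
```

Since ρ_k stays in [η_1, η_2), the radius is (mildly) reduced at every iteration, exactly as required by the slow sequence Δ_k.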
The interpolation conditions (5.12) ensure that f(x) is q times continuously differentiable on IR_+ and thus that AS.2 holds. They also impose the following values for the first q+1 coefficients:

  c_{0,k} = f(x_k) − f(x_{k+1}),  c_{j,k} = 0 (j ∈ {1,...,q−1}),  c_{q,k} = ∇^q_x f(x_k);   (5.13)

and the remaining q+1 coefficients are solutions of the linear system

  [ s_k^{q+1}/(q+1)!   s_k^{q+2}/(q+2)!   ···   s_k^{2q+1}/(2q+1)! ]
  [ s_k^{q}/q!         s_k^{q+1}/(q+1)!   ···   s_k^{2q}/(2q)!     ]  ( c_{q+1,k}, c_{q+2,k}, ..., c_{2q+1,k} )^T = r_k,   (5.14)
  [     ⋮                   ⋮                        ⋮              ]
  [ s_k/1!             s_k^{2}/2!         ···   s_k^{q+1}/(q+1)!   ]

where the right-hand side is given by

  r_k = ( Δf_k − (1/q!) ∇^q_x f(x_k) s_k^q,  −(1/(q−1)!) ∇^q_x f(x_k) s_k^{q−1},  ...,  −∇^q_x f(x_k) s_k,  ∇^q_x f(x_{k+1}) − ∇^q_x f(x_k) )^T.   (5.15)

Observe now that the coefficient matrix of this linear system may be written as

  diag( s_k^{q+1}, s_k^{q}, ..., s_k ) · M_q · diag( 1, s_k, ..., s_k^{q} ),

where

  M_q def= [ 1/(q+1)!  1/(q+2)!  ···  1/(2q+1)! ;  1/q!  1/(q+1)!  ···  1/(2q)! ;  ⋮ ;  1/1!  1/2!  ···  1/(q+1)! ]   (5.16)

is an invertible matrix independent of k (see Appendix). Hence

  ( c_{q+1,k}, ..., c_{2q+1,k} )^T = diag( 1, s_k^{−1}, ..., s_k^{−q} ) M_q^{−1} diag( s_k^{−(q+1)}, s_k^{−q}, ..., s_k^{−1} ) r_k.   (5.17)

Observe now that, because of (5.4), (5.6), (5.5) and (5.3),

  Δf_k = O(s_k^{q+1}),  ∇^q_x f(x_k) s_k^{q−j} = O(s_k^{q+1−j}) (j ∈ {1,...,q−1})

and, since ∇^q_x f(x_k) < ∇^q_x f(x_{k+1}) < 0,

  | ∇^q_x f(x_{k+1}) − ∇^q_x f(x_k) | ≤ | ∇^q_x f(x_k) | = q! s_k.
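The claimed invertibility of M_q is easy to probe numerically for small q. The sketch below builds the matrix with entries 1/(q+1+m−j)! (row j = 0,...,q, column m = 0,...,q), as obtained after factoring the powers of s_k out of the Hermite system, and checks that it has full rank; the function name is ours.

```python
import math
import numpy as np

def hermite_matrix(q):
    # M_q: entry (j, m) equals 1/(q + 1 + m - j)! for j, m in {0, ..., q}
    return np.array([[1.0 / math.factorial(q + 1 + m - j) for m in range(q + 1)]
                     for j in range(q + 1)])

for q in (1, 2, 3, 4):
    print(q, np.linalg.matrix_rank(hermite_matrix(q)) == q + 1)
```

This is only a sanity check for a few orders, of course, not a substitute for the proof of invertibility given in the Appendix.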
The bounds just obtained and (5.15) imply that [r_k]_i, the i-th component of r_k, satisfies

  [r_k]_i = O( s_k^{q+2−i} ) for i ∈ {1,...,q+1}.

Hence, using (5.17) and the non-singularity of M_q, we obtain that there exists a constant κ_q ≥ 1 independent of k such that

  | c_{i,k} | s_k^{i−q} ≤ κ_q for i ∈ {q+1,...,2q+1},   (5.18)

and thus that, for all s ∈ [0, s_k],

  | Σ_{i=q+1}^{2q+1} (1/i!) c_{i,k} s^i | ≤ Σ_{i=q+1}^{2q+1} (1/i!) | c_{i,k} | s_k^i ≤ Σ_{i=q+1}^{2q+1} (1/i!) κ_q s_k^q.

Moreover, using successively (5.11), the triangle inequality, (5.13), (5.3), (5.4), (5.18) and κ_q ≥ 1, we obtain that, for j ∈ {1,...,q},

  | ∇^j_s p_k(s) | = | Σ_{i=j}^{2q+1} (1/(i−j)!) c_{i,k} s^{i−j} |
    ≤ (1/(q−j)!) | c_{q,k} | s^{q−j} + Σ_{i=q+1}^{2q+1} (1/(i−j)!) | c_{i,k} | s^{i−j}
    ≤ q!/(q−j)! + κ_q Σ_{i=q+1}^{2q+1} 1/(i−j)!,

and thus all derivatives of order one up to q remain bounded on [0, s_k]. Because of (5.10), we therefore obtain that AS.3 holds. Moreover, (5.3), (5.18), the inequalities |∇^q_x f(x_k)| ≤ q! and f(x_k) ≥ 0, (5.10) and (5.4) also ensure that f(x) is bounded below. We have therefore shown that the bounds of Theorem 4.4 are essentially sharp, in that, for every δ > 0, Algorithm 4.1 applied to the problem of minimizing the lower-bounded objective function f just constructed, which satisfies AS.1-AS.3, will take, because of (5.8) and (5.9),

  ǫ^{ −(q+1) / (1+(q+1)δ) }

iterations and evaluations of f and its q first derivatives to find an iterate x_k such that condition (4.14) holds. Moreover, it is clear that, in the example presented, the global rate of convergence is driven by the term of degree q in the Taylor series.
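Finally, the exponent (q+1)/(1+(q+1)δ) governing the iteration count of this example approaches the exponent q+1 of the upper bound (4.9) as δ shrinks, which is the sense in which the bound is "essentially" sharp. The helper below, with our naming, tabulates this for q = 2.

```python
# Exponent of 1/eps in the iteration count of the example, versus the exponent
# q + 1 appearing in the upper bound of Theorem 4.4.
def example_exponent(q, delta):
    return (q + 1) / (1 + (q + 1) * delta)

print([example_exponent(2, d) for d in (0.1, 0.01, 0.001)])
```

For q = 2 the printed exponents climb towards 3, the value appearing in the O(ǫ^{−3}) bound.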
6 Discussion

We have analyzed the optimality conditions of order 2 and above, and proposed a measure of criticality of arbitrary order for convexly constrained nonlinear optimization problems. As this measure can be extended to define ǫ-approximate critical points of high order, we have then used it in a conceptual trust-region algorithm to show that, if derivatives of the objective function up to order q can be evaluated and are Lipschitz continuous, then this algorithm applied to the convexly constrained problem (3.1) needs at most O(ǫ^{−(q+1)}) evaluations of f and its derivatives to compute an ǫ-approximate q-th order critical point. Moreover, we have shown by an example that this bound is essentially sharp.

In the purely unconstrained case, this result recovers known results for q = 1 (first-order criticality for Lipschitz gradients) [38], q = 2 (second-order criticality(7) with Lipschitz Hessians) [4,39] and q = 3 (third-order criticality(8) with Lipschitz continuous third derivative) [1], but extends them to arbitrary order. The results for the convexly constrained case appear to be new and provide in particular the first complexity bound for second- and third-order criticality for such inequality constrained problems.

Because the conditions (4.14) measure different orders of criticality, we could choose to use a different ǫ for every order (as in [4]), complicating the expression of the bound accordingly. However, as shown by our example, the worst-case behaviour of Algorithm 4.1 is dominated by that of ∇^q_x f, which makes the distinction of the various ǫ-s less crucial.

Because of the global optimization occurring in the definition of the criticality measure φ^Δ_{f,j}(x), the algorithm discussed in the present paper remains, in general, of a theoretical nature. However, there may be cases where this computation is tractable for Δ small enough, for instance if the derivative tensors of the objective function are strongly structured. Such approaches may hopefully be of use for small-dimensional or structured highly nonlinear problems, such as those occurring in machine learning using deep-learning techniques (see [1]).
The present framework for handling convex constraints is not free of limitations, resulting from our choice to transfer difficulties associated with the original problem to the subproblem solution, thereby sparing precious evaluations of f and its derivatives. In particular, the cost of evaluating any constraint function/derivative possibly defining the convex feasible set F is neglected by the present approach, which must therefore be seen as a suitable framework for handling cheap inequality constraints such as simple bounds.

It is known that the complexity of obtaining ǫ-approximate first-order criticality for unconstrained and convexly-constrained problems can be reduced to O(ǫ^{−(p+1)/p}) if one is ready to define the step by using a regularization model of order p. In the unconstrained case, this was shown for p = 2 in [2,39] and for general p in [6], while the convexly constrained case was analyzed (for p = 2) in [3]. Whether this methodology and the associated improvements in evaluation complexity bounds can be extended to orders above one is the subject of ongoing research.

Acknowledgements

The authors would like to thank Oliver Stein for suggesting reference [33], as well as Jim Burke and Adrian Lewis for interesting discussions. The third author would also like to acknowledge the support provided by the Belgian Fund for Scientific Research (FNRS), the Leverhulme Trust (UK), Balliol College (Oxford), the Department of Applied Mathematics of the Hong Kong Polytechnic University, ENSEEIHT (Toulouse, France) and INDAM (Florence, Italy).

References

[1] A. Anandkumar and R. Ge. Efficient approaches for escaping high-order saddle points in nonconvex optimization. arXiv preprint, 2016.

(7) Using (3.29). (8) Using (3.30).
More informationOn proximal-like methods for equilibrium programming
On proximal-lie methods for equilibrium programming Nils Langenberg Department of Mathematics, University of Trier 54286 Trier, Germany, langenberg@uni-trier.de Abstract In [?] Flam and Antipin discussed
More informationUniversity of Houston, Department of Mathematics Numerical Analysis, Fall 2005
3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider
More informationWe simply compute: for v = x i e i, bilinearity of B implies that Q B (v) = B(v, v) is given by xi x j B(e i, e j ) =
Math 395. Quadratic spaces over R 1. Algebraic preliminaries Let V be a vector space over a field F. Recall that a quadratic form on V is a map Q : V F such that Q(cv) = c 2 Q(v) for all v V and c F, and
More informationAn introduction to Birkhoff normal form
An introduction to Birkhoff normal form Dario Bambusi Dipartimento di Matematica, Universitá di Milano via Saldini 50, 0133 Milano (Italy) 19.11.14 1 Introduction The aim of this note is to present an
More informationWritten Examination
Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes
More informationCubic-regularization counterpart of a variable-norm trust-region method for unconstrained minimization
Cubic-regularization counterpart of a variable-norm trust-region method for unconstrained minimization J. M. Martínez M. Raydan November 15, 2015 Abstract In a recent paper we introduced a trust-region
More informationA Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity
A Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity Mohammadreza Samadi, Lehigh University joint work with Frank E. Curtis (stand-in presenter), Lehigh University
More informationCORE 50 YEARS OF DISCUSSION PAPERS. Globally Convergent Second-order Schemes for Minimizing Twicedifferentiable 2016/28
26/28 Globally Convergent Second-order Schemes for Minimizing Twicedifferentiable Functions YURII NESTEROV AND GEOVANI NUNES GRAPIGLIA 5 YEARS OF CORE DISCUSSION PAPERS CORE Voie du Roman Pays 4, L Tel
More informationarxiv: v4 [math.oc] 5 Jan 2016
Restarted SGD: Beating SGD without Smoothness and/or Strong Convexity arxiv:151.03107v4 [math.oc] 5 Jan 016 Tianbao Yang, Qihang Lin Department of Computer Science Department of Management Sciences The
More informationCS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares
CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares Robert Bridson October 29, 2008 1 Hessian Problems in Newton Last time we fixed one of plain Newton s problems by introducing line search
More information1 Directional Derivatives and Differentiability
Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=
More informationLecture 2: Linear Algebra Review
EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1
More informationLecture 1. 1 Conic programming. MA 796S: Convex Optimization and Interior Point Methods October 8, Consider the conic program. min.
MA 796S: Convex Optimization and Interior Point Methods October 8, 2007 Lecture 1 Lecturer: Kartik Sivaramakrishnan Scribe: Kartik Sivaramakrishnan 1 Conic programming Consider the conic program min s.t.
More informationSelected Examples of CONIC DUALITY AT WORK Robust Linear Optimization Synthesis of Linear Controllers Matrix Cube Theorem A.
. Selected Examples of CONIC DUALITY AT WORK Robust Linear Optimization Synthesis of Linear Controllers Matrix Cube Theorem A. Nemirovski Arkadi.Nemirovski@isye.gatech.edu Linear Optimization Problem,
More informationarxiv: v3 [math.oc] 8 Jan 2019
Why Random Reshuffling Beats Stochastic Gradient Descent Mert Gürbüzbalaban, Asuman Ozdaglar, Pablo Parrilo arxiv:1510.08560v3 [math.oc] 8 Jan 2019 January 9, 2019 Abstract We analyze the convergence rate
More informationCopyrighted Material. L(t, x(t), u(t))dt + K(t f, x f ) (1.2) where L and K are given functions (running cost and terminal cost, respectively),
Chapter One Introduction 1.1 OPTIMAL CONTROL PROBLEM We begin by describing, very informally and in general terms, the class of optimal control problems that we want to eventually be able to solve. The
More informationDuke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014
Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Linear Algebra A Brief Reminder Purpose. The purpose of this document
More informationA Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties
A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties André L. Tits Andreas Wächter Sasan Bahtiari Thomas J. Urban Craig T. Lawrence ISR Technical
More informationCubic regularization of Newton s method for convex problems with constraints
CORE DISCUSSION PAPER 006/39 Cubic regularization of Newton s method for convex problems with constraints Yu. Nesterov March 31, 006 Abstract In this paper we derive efficiency estimates of the regularized
More informationLinear Algebra. Preliminary Lecture Notes
Linear Algebra Preliminary Lecture Notes Adolfo J. Rumbos c Draft date April 29, 23 2 Contents Motivation for the course 5 2 Euclidean n dimensional Space 7 2. Definition of n Dimensional Euclidean Space...........
More informationElements of Convex Optimization Theory
Elements of Convex Optimization Theory Costis Skiadas August 2015 This is a revised and extended version of Appendix A of Skiadas (2009), providing a self-contained overview of elements of convex optimization
More informationCourse Summary Math 211
Course Summary Math 211 table of contents I. Functions of several variables. II. R n. III. Derivatives. IV. Taylor s Theorem. V. Differential Geometry. VI. Applications. 1. Best affine approximations.
More informationOn the complexity of an Inexact Restoration method for constrained optimization
On the complexity of an Inexact Restoration method for constrained optimization L. F. Bueno J. M. Martínez September 18, 2018 Abstract Recent papers indicate that some algorithms for constrained optimization
More informationUNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems
UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction
More informationOn fast trust region methods for quadratic models with linear constraints. M.J.D. Powell
DAMTP 2014/NA02 On fast trust region methods for quadratic models with linear constraints M.J.D. Powell Abstract: Quadratic models Q k (x), x R n, of the objective function F (x), x R n, are used by many
More informationA line-search algorithm inspired by the adaptive cubic regularization framework, with a worst-case complexity O(ɛ 3/2 ).
A line-search algorithm inspired by the adaptive cubic regularization framewor, with a worst-case complexity Oɛ 3/. E. Bergou Y. Diouane S. Gratton June 16, 017 Abstract Adaptive regularized framewor using
More informationSmooth Structure. lies on the boundary, then it is determined up to the identifications it 1 2
132 3. Smooth Structure lies on the boundary, then it is determined up to the identifications 1 2 + it 1 2 + it on the vertical boundary and z 1/z on the circular part. Notice that since z z + 1 and z
More informationUniversità di Firenze, via C. Lombroso 6/17, Firenze, Italia,
Convergence of a Regularized Euclidean Residual Algorithm for Nonlinear Least-Squares by S. Bellavia 1, C. Cartis 2, N. I. M. Gould 3, B. Morini 1 and Ph. L. Toint 4 25 July 2008 1 Dipartimento di Energetica
More informationPart 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL)
Part 4: Active-set methods for linearly constrained optimization Nick Gould RAL fx subject to Ax b Part C course on continuoue optimization LINEARLY CONSTRAINED MINIMIZATION fx subject to Ax { } b where
More informationLecture 5. Theorems of Alternatives and Self-Dual Embedding
IE 8534 1 Lecture 5. Theorems of Alternatives and Self-Dual Embedding IE 8534 2 A system of linear equations may not have a solution. It is well known that either Ax = c has a solution, or A T y = 0, c
More informationIn English, this means that if we travel on a straight line between any two points in C, then we never leave C.
Convex sets In this section, we will be introduced to some of the mathematical fundamentals of convex sets. In order to motivate some of the definitions, we will look at the closest point problem from
More informationIntroduction to Nonlinear Optimization Paul J. Atzberger
Introduction to Nonlinear Optimization Paul J. Atzberger Comments should be sent to: atzberg@math.ucsb.edu Introduction We shall discuss in these notes a brief introduction to nonlinear optimization concepts,
More informationMcMaster University. Advanced Optimization Laboratory. Title: A Proximal Method for Identifying Active Manifolds. Authors: Warren L.
McMaster University Advanced Optimization Laboratory Title: A Proximal Method for Identifying Active Manifolds Authors: Warren L. Hare AdvOl-Report No. 2006/07 April 2006, Hamilton, Ontario, Canada A Proximal
More informationOrdinary Differential Equation Introduction and Preliminaries
Ordinary Differential Equation Introduction and Preliminaries There are many branches of science and engineering where differential equations arise naturally. Now days, it finds applications in many areas
More informationc 2000 Society for Industrial and Applied Mathematics
SIAM J. OPIM. Vol. 10, No. 3, pp. 750 778 c 2000 Society for Industrial and Applied Mathematics CONES OF MARICES AND SUCCESSIVE CONVEX RELAXAIONS OF NONCONVEX SES MASAKAZU KOJIMA AND LEVEN UNÇEL Abstract.
More informationStructural and Multidisciplinary Optimization. P. Duysinx and P. Tossings
Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be
More informationTopological properties
CHAPTER 4 Topological properties 1. Connectedness Definitions and examples Basic properties Connected components Connected versus path connected, again 2. Compactness Definition and first examples Topological
More informationA FULL-NEWTON STEP INFEASIBLE-INTERIOR-POINT ALGORITHM COMPLEMENTARITY PROBLEMS
Yugoslav Journal of Operations Research 25 (205), Number, 57 72 DOI: 0.2298/YJOR3055034A A FULL-NEWTON STEP INFEASIBLE-INTERIOR-POINT ALGORITHM FOR P (κ)-horizontal LINEAR COMPLEMENTARITY PROBLEMS Soodabeh
More informationFrank-Wolfe Method. Ryan Tibshirani Convex Optimization
Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)
More information1. Introduction. We analyze a trust region version of Newton s method for the optimization problem
SIAM J. OPTIM. Vol. 9, No. 4, pp. 1100 1127 c 1999 Society for Industrial and Applied Mathematics NEWTON S METHOD FOR LARGE BOUND-CONSTRAINED OPTIMIZATION PROBLEMS CHIH-JEN LIN AND JORGE J. MORÉ To John
More informationLecture 7: CS395T Numerical Optimization for Graphics and AI Trust Region Methods
Lecture 7: CS395T Numerical Optimization for Graphics and AI Trust Region Methods Qixing Huang The University of Texas at Austin huangqx@cs.utexas.edu 1 Disclaimer This note is adapted from Section 4 of
More informationAccelerating the cubic regularization of Newton s method on convex problems
Accelerating the cubic regularization of Newton s method on convex problems Yu. Nesterov September 005 Abstract In this paper we propose an accelerated version of the cubic regularization of Newton s method
More informationMATH 167: APPLIED LINEAR ALGEBRA Chapter 3
MATH 167: APPLIED LINEAR ALGEBRA Chapter 3 Jesús De Loera, UC Davis February 18, 2012 Orthogonal Vectors and Subspaces (3.1). In real life vector spaces come with additional METRIC properties!! We have
More information