Local Superlinear Convergence of Polynomial-Time Interior-Point Methods for Hyperbolic Cone Optimization Problems


Local Superlinear Convergence of Polynomial-Time Interior-Point Methods for Hyperbolic Cone Optimization Problems

Yu. Nesterov and L. Tunçel

November 2009, revised: December 2014

Abstract

In this paper, we establish the local superlinear convergence property of some polynomial-time interior-point methods for an important family of conic optimization problems. The main structural property used in our analysis is the logarithmic homogeneity of self-concordant barrier functions, which must have negative curvature. We propose a new path-following predictor-corrector scheme, which works only in the dual space. It is based on an easily computable gradient proximity measure, which ensures an automatic transformation of the global linear rate of convergence to the local superlinear one under some mild assumptions. Our step-size procedure for the predictor step is related to the maximum step size maintaining feasibility. As the optimal solution set is approached, our algorithm automatically tightens the neighborhood of the central path proportionally to the current duality gap.

Keywords: conic optimization problems, worst-case complexity analysis, self-concordant barriers, polynomial-time methods, predictor-corrector methods, local superlinear convergence.

Center for Operations Research and Econometrics (CORE), Catholic University of Louvain (UCL), 34 voie du Roman Pays, 1348 Louvain-la-Neuve, Belgium; nesterov@core.ucl.ac.be. The research results presented in this paper have been supported by a grant "Action de recherche concertée ARC 04/09-35" from the Direction de la recherche scientifique, Communauté française de Belgique. The scientific responsibility rests with its author(s).

Department of Combinatorics and Optimization, Faculty of Mathematics, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada; ltuncel@uwaterloo.ca. Research of this author was supported in part by Discovery Grants from NSERC.

1 Introduction

Motivation. Local superlinear convergence is a natural and very desirable property of many methods in Nonlinear Optimization. However, for interior-point methods the corresponding analysis is not trivial. The reason is that the barrier function is not defined in a neighborhood of the solution. Therefore, in order to study the behavior of the central path, we need to somehow employ the separable structure of the functional inequality constraints. From the very beginning [3], this analysis was based on the Implicit Function Theorem as applied to the Karush-Kuhn-Tucker conditions. This tradition explains, to some extent, the delay in developing an appropriate framework for analyzing the local behavior of general polynomial-time interior-point methods [13]. Indeed, in the theory of self-concordant functions it is difficult to analyze the local structure of the solution, since we have no access to the components of the barrier function. Moreover, in general, it is difficult to relate the self-concordant barrier to the particular functional inequality constraints of the initial optimization problem. Therefore, up to now, local superlinear convergence for polynomial-time path-following methods was proved only for Linear Programming [18, 10] and for Semidefinite Programming problems [6, 16, 9, 5, 15]. In both cases, the authors use in their analysis the special boundary structure of the feasible regions and of the set of optimal solutions.

In this paper, we establish the local superlinear convergence property of interior-point path-following methods by employing some geometric properties of quite general conic optimization problems. The main structural property used in our analysis is the logarithmic homogeneity of self-concordant barrier functions, and the condition that the barrier must have negative curvature. We propose a new path-following predictor-corrector scheme, which works in the dual space. It is based on an easily computable gradient proximity measure, which ensures an automatic transformation of the global linear rate of convergence to the local superlinear rate (under a mild assumption). Our step-size procedure for the predictor step is related to the maximum step size maintaining feasibility. As the iterates approach the optimal solution set, our algorithm automatically tightens the neighborhood of the central path proportionally to the current duality gap. In the literature (as we noted above), similar conditions have been imposed on the iterates of interior-point algorithms in order to attain local superlinear convergence in the Semidefinite Programming setting. However, there exist different, weaker combinations of conditions that have been studied in the Semidefinite Programming setting as well (see for instance [5, 15, 7, 8]). As a key feature of our analysis, we avoid distinguishing individual subvectors as large or small. Many of the ingredients of our approach are primal-dual symmetric; however, we break the symmetry, when necessary, by our choice of assumptions and algorithms. Indeed, in general (beyond the special case of symmetric cones and self-scaled barriers), only one of the primal and dual problems admits a logarithmically homogeneous self-concordant barrier with negative curvature.

Contents. The paper is organized as follows. In Section 2 we introduce a conic primal-dual problem pair and define the central path. After that, we consider a small full-dimensional dual problem and define the prediction operator.
We derive some representations which help to bound the curvature of the central path. In Section 3 we justify the choice of the fixed Euclidean metrics in the primal and dual spaces. Then, in Section 4 we introduce two main assumptions ensuring the quadratic drop in the duality gap for predicted points from a tight neighborhood of the central path. The first one is a strict dual maximum condition, and the second one is the boundedness of the vector ∇²F*(s_μ)s* along the central path. The main result of this section is Theorem 3, which demonstrates the quadratic decrease of the distance to the optimal solution for the prediction point, measured in an appropriately chosen fixed Euclidean norm. In Section 5, we estimate the efficiency of the predictor step measured in a local norm defined by the dual barrier function. Also, we show that local quadratic convergence can be achieved by a feasible predictor step. In Section 6 we prepare for the analysis of polynomial-time predictor-corrector strategies. For that, we study an important class of barriers with negative curvature. This class includes at least the self-scaled barriers [14] and hyperbolic barriers [4, 1, 17], and possibly many others. In Section 7 we

establish some bounds on the growth of a variant of the gradient proximity measure. We show that we can achieve a local superlinear rate of convergence. It is important to relate the decrease of the penalty parameter of the central path to the distance to the boundary of the feasible set while performing the predictor step. At the same time, we show that, for achieving the local superlinear convergence, the centering condition must be satisfied with increasing accuracy. In Section 8 we show that the local superlinear convergence can be combined with the global polynomial-time complexity. We present a method which works for barriers with negative curvature and has a cheap computation of the predictor step. Finally, in Section 9, we discuss the results and study two 2D-examples, which demonstrate that our assumptions are quite natural.

Notation and generalities. In what follows, we denote by E a finite-dimensional linear space (other variants: H, V), and by E* its dual space, composed of linear functions on E. The value of a function s ∈ E* at a point x ∈ E is denoted by ⟨s, x⟩. This notation is the same for all spaces in use. For an operator A : E → H we denote by A* the corresponding adjoint operator:

⟨y, Ax⟩ = ⟨A*y, x⟩, x ∈ E, y ∈ H*.

Thus, A* : H* → E*. A self-adjoint positive-definite operator B : E → E* (notation B ≻ 0) defines Euclidean norms for the primal and dual spaces:

‖x‖_B = ⟨Bx, x⟩^{1/2}, x ∈ E,   ‖s‖*_B = ⟨s, B^{-1}s⟩^{1/2}, s ∈ E*.

The sense of this notation is determined by the space of its argument. We use the following notation for ellipsoids in E:

E_B(x, r) = {u ∈ E : ‖u - x‖_B ≤ r}.

If in this notation the parameter r is missing, this means that r = 1. In what follows, we often use the following simple result from Linear Algebra. Let self-adjoint linear operators Δ and B map E to E*, with B ≻ 0. Then, for a tolerance parameter τ > 0, we have

±Δ ⪯ τB  ⟹  Δ B^{-1} Δ ⪯ τ² B. (1.1)

For future reference, let us recall some facts from the theory of self-concordant functions. Most of these results can be found in Section 4 of [11]. We use the following notation for the gradient and Hessian of a function Φ:

∇Φ(x) ∈ E*,  ∇²Φ(x)h ∈ E*,  x, h ∈ E.

Let Φ be a self-concordant function defined on the interior of a convex set Q ⊆ E:

|D³Φ(x)[h, h, h]| ≤ 2 ⟨∇²Φ(x)h, h⟩^{3/2},  x ∈ int Q, h ∈ E, (1.2)

where D³Φ(x)[h₁, h₂, h₃] is the third differential of the function Φ at the point x along the directions h₁, h₂, h₃. Note that D³Φ(x)[h₁, h₂, h₃] is a trilinear symmetric form. Thus, D³Φ(x)[h₁, h₂] ∈ E*, and D³Φ(x)[h₁] is a self-adjoint linear operator from E to E*. Assume that Q contains no straight line. Then ∇²Φ(u) is nondegenerate for every u ∈ int Q. A self-concordant function Φ is called a ν-self-concordant barrier if

⟨∇Φ(u), [∇²Φ(u)]^{-1}∇Φ(u)⟩ ≤ ν. (1.3)

For local norms related to self-concordant functions we use the following concise notation:

‖h‖_u = ⟨∇²Φ(u)h, h⟩^{1/2}, h ∈ E,   ‖s‖*_u = ⟨s, [∇²Φ(u)]^{-1}s⟩^{1/2}, s ∈ E*.

Thus, inequality (1.3) can be written as ‖∇Φ(u)‖*_u ≤ ν^{1/2}. The following result is very useful.

Theorem 1 (Theorem on Recession Direction; see Section 4 of [11] for the proof.) If h is a recession direction of the set Q and u ∈ int Q, then

‖h‖_u ≤ -⟨∇Φ(u), h⟩. (1.4)

For u ∈ int Q, define the Dikin ellipsoid W_r(u) := E_{∇²Φ(u)}(u, r). Then W_r(u) ⊆ Q for all r ∈ [0, 1). If v ∈ int Q and r = ‖v - u‖_u, then

⟨∇Φ(v) - ∇Φ(u), v - u⟩ ≥ r²/(1 + r),  r ≥ 0. (1.5)

For r ∈ [0, 1) we have

(1 - r)² ∇²Φ(u) ⪯ ∇²Φ(v) ⪯ (1 - r)^{-2} ∇²Φ(u), (1.6)

‖∇Φ(v) - ∇Φ(u)‖*_u ≤ r/(1 - r), (1.7)

‖∇Φ(v) - ∇Φ(u) - ∇²Φ(u)(v - u)‖*_u ≤ r²/(1 - r). (1.8)

Finally, we need several statements on barriers for convex cones. We call a cone K ⊂ E regular if it is a closed, convex, and pointed cone with nonempty interior. Sometimes it is convenient to write the inclusion x ∈ K in the form x ⪰_K 0. If K is regular, then the dual cone

K* = {s ∈ E* : ⟨s, x⟩ ≥ 0 for all x ∈ K}

is also regular. For the cone K, we assume available a ν-normal barrier F(x). This means that F is self-concordant and ν-logarithmically homogeneous:

F(τx) = F(x) - ν ln τ,  x ∈ int K, τ > 0. (1.9)

Note that -∇F(x) ∈ int K* for every x ∈ int K. Equality (1.9) leads to many interesting identities:

∇F(τx) = τ^{-1} ∇F(x), (1.10)
∇²F(τx) = τ^{-2} ∇²F(x), (1.11)
⟨∇F(x), x⟩ = -ν, (1.12)
∇²F(x) x = -∇F(x), (1.13)
D³F(x)[x] = -2 ∇²F(x), (1.14)
(‖∇F(x)‖*_x)² = ν, (1.15)

where x ∈ int K and τ > 0. Note that the dual barrier

F*(s) = max_{x ∈ int K} { ⟨-s, x⟩ - F(x) }

is a ν-normal barrier for the cone K*. The differential characteristics of the primal and dual barriers are related as follows:

∇F(-∇F*(s)) = -s,  ∇²F(-∇F*(s)) = [∇²F*(s)]^{-1},
∇F*(-∇F(x)) = -x,  ∇²F*(-∇F(x)) = [∇²F(x)]^{-1}, (1.16)

where x ∈ int K and s ∈ int K*. For normal barriers, the Theorem on Recession Direction (1.4) can be written both in primal and dual forms:

‖u‖_x ≤ -⟨∇F(x), u⟩,  x ∈ int K, u ∈ K, (1.17)
‖s‖*_x ≤ ⟨s, x⟩,  x ∈ int K, s ∈ K*. (1.18)

The following statement (Lemma 1 below) is very useful.
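As an aside, the homogeneity identities above are easy to sanity-check numerically. The sketch below (our own illustration, not part of the paper) does so for the standard ν-normal barrier F(x) = -Σ ln x_i of the nonnegative orthant, for which ν = n.

import numpy as np

# nu-normal barrier for the nonnegative orthant: F(x) = -sum(log(x)), nu = n
F      = lambda x: -np.sum(np.log(x))
grad_F = lambda x: -1.0 / x                 # gradient of F
hess_F = lambda x: np.diag(1.0 / x**2)      # Hessian of F

n, tau = 4, 2.5
x = np.random.default_rng(0).uniform(0.5, 2.0, n)   # a point in int K
nu = n

# (1.9)  F(tau x) = F(x) - nu ln tau
assert np.isclose(F(tau * x), F(x) - nu * np.log(tau))
# (1.10) grad F(tau x) = grad F(x) / tau
assert np.allclose(grad_F(tau * x), grad_F(x) / tau)
# (1.12) <grad F(x), x> = -nu
assert np.isclose(grad_F(x) @ x, -nu)
# (1.13) hess F(x) x = -grad F(x)
assert np.allclose(hess_F(x) @ x, -grad_F(x))
# (1.15) <grad F(x), [hess F(x)]^{-1} grad F(x)> = nu
assert np.isclose(grad_F(x) @ np.linalg.solve(hess_F(x), grad_F(x)), nu)
print("logarithmic-homogeneity identities verified")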

Lemma 1 Let F be a ν-normal barrier for K and H : E → E*, H ≻ 0. Assume that E_H(u) ⊆ K, and that for some x ∈ int K we have ⟨∇F(x), u - x⟩ ≥ 0. Then

H ⪰ (1/(4ν²)) ∇²F(x).

Proof: Let us fix an arbitrary direction h ∈ E*. We can assume that

⟨∇F(x), H^{-1}h⟩ ≥ 0, (1.19)

(otherwise, multiply h by -1). Denote y = u + H^{-1}h/‖h‖*_H. Then y ∈ E_H(u) ⊆ K. Therefore,

‖H^{-1}h‖_x / ‖h‖*_H = ‖y - u‖_x ≤ ‖u‖_x + ‖y‖_x ≤(1.17) -⟨∇F(x), u⟩ - ⟨∇F(x), y⟩ ≤(1.19) -2⟨∇F(x), u⟩ ≤ -2⟨∇F(x), x⟩ =(1.12) 2ν.

Thus, ⟨H^{-1}∇²F(x)H^{-1}h, h⟩ ≤ 4ν²⟨H^{-1}h, h⟩ for every h ∈ E*, which is equivalent to the claimed bound. □

Corollary 1 Let x, u ∈ int K and ⟨∇F(x), u - x⟩ ≥ 0. Then ∇²F(u) ⪰ (1/(4ν²)) ∇²F(x).

Corollary 2 Let x ∈ int K and u ∈ K. Then ∇²F(x + u) ⪯ 4ν² ∇²F(x).

Proof: Denote y = x + u ∈ int K. Then ⟨∇F(y), x - y⟩ = -⟨∇F(y), u⟩ ≥ 0. Hence, we can apply Corollary 1. □

To conclude with notation, let us introduce the following relative measure for directions in E:

σ_x(h) = min_{ρ ≥ 0} {ρ : ρx - h ∈ K} ≤ ‖h‖_x,  x ∈ int K, h ∈ E. (1.20)

2 Predicting the optimal solution

Consider the standard conic optimization problem:

min_{x ∈ K} { ⟨c, x⟩ : Ax = b }, (2.1)

where c ∈ E*, b ∈ H, A is a linear transformation from E to H, and K ⊂ E is a regular cone. The problem dual to (2.1) is then

max_{s ∈ K*, y ∈ H*} { ⟨b, y⟩ : s + A*y = c }. (2.2)

Note that the feasible points of the primal and dual problems move in orthogonal subspaces:

⟨s₁ - s₂, x₁ - x₂⟩ = 0 (2.3)

for all x₁, x₂ ∈ F_p := {x ∈ K : Ax = b} and all s₁, s₂ ∈ F_d := {s ∈ K* : s + A*y = c for some y ∈ H*}. Under the strict feasibility assumption,

∃ x⁰ ∈ int K, s⁰ ∈ int K*, y⁰ ∈ H* :  Ax⁰ = b,  s⁰ + A*y⁰ = c, (2.4)

the optimal sets of the primal and dual problems are nonempty and bounded, and there is no duality gap (see for instance [13]). Moreover, a primal-dual central path z_μ := (x_μ, s_μ, y_μ),

Ax_μ = b,
c + μ∇F(x_μ) = A*y_μ,
s_μ = -μ∇F(x_μ)  ⟺(1.16),(1.10)  x_μ = -μ∇F*(s_μ),  μ > 0, (2.5)

is well defined. Note that

⟨c, x_μ⟩ - ⟨b, y_μ⟩ = ⟨s_μ, x_μ⟩ =(2.5),(1.12) μν. (2.6)

The majority of modern strategies for solving the primal-dual problem pair (2.1), (2.2) suggest to follow this trajectory as μ ↓ 0. On the one hand, it is important that μ be decreased at a linear rate in order to attain polynomial-time complexity. On the other hand, in a small neighborhood of the solution it is highly desirable to switch to a superlinear rate. Such a possibility was already discovered for Linear Programming problems [19, 18, 10]. There has also been significant progress in the case of Semidefinite Programming [6, 16, 9, 5]. In this paper, we study more general conic problems.

For fast local convergence of a path-following scheme, we need to show that the predicted point ẑ_μ = z_μ - μ z'_μ enters a small neighborhood of the solution point z* = lim_{μ↓0} z_μ = (x*, s*, y*). It is more convenient to analyze this situation by looking at the y-component of the central path. Note that the s-component of the dual problem (2.2) can be easily eliminated: s = s(y) := c - A*y. Then, the remaining part of the dual problem can be written in the more concise full-dimensional form:

f* := max_{y ∈ H*} { ⟨b, y⟩ : y ∈ Q }, (2.7)
Q := {y ∈ H* : c - A*y ∈ K*}.

In view of Assumption (2.4), the interior of the set Q is nonempty. Moreover, for this set we have a ν-self-concordant barrier

f(y) = F*(c - A*y),  y ∈ int Q.

Since the optimal set of problem (2.7) is bounded, Q contains no straight line. Thus, this barrier has a nondegenerate Hessian at every strictly feasible point. Note that Assumption (2.4) also implies that the linear transformation A is surjective. It is clear that the y-component of the primal-dual central path z_μ coincides with the central path of the problem (2.7):

(1/μ) b = ∇f(y_μ) = -A∇F*(c - A*y_μ) = -A∇F*(s_μ) =(2.5) (1/μ) Ax_μ,  μ > 0. (2.8)
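For the Linear Programming case (K = K* = R^n_+, F(x) = -Σ ln x_i, so F*(s) = -n - Σ ln s_i), the dual barrier f(y) = F*(c - A^T y) and its derivatives take a particularly simple form: ∇f(y) = A S^{-1} e and ∇²f(y) = A S^{-2} A^T, where S = diag(s(y)). The sketch below (our own illustration; the function name and the random instance are not from the paper) evaluates these quantities.

import numpy as np

def dual_barrier(A, c, y):
    """f(y) = F_*(c - A^T y) for the LP log-barrier, with gradient and Hessian.

    F_*(s) = -sum(log s) - n is the dual barrier of F(x) = -sum(log x).
    """
    s = c - A.T @ y                       # dual slack s(y)
    if np.any(s <= 0):
        raise ValueError("y is not strictly dual feasible")
    f    = -np.sum(np.log(s)) - len(s)
    grad = A @ (1.0 / s)                  # grad f(y) = A S^{-1} e
    hess = (A / s**2) @ A.T               # hess f(y) = A S^{-2} A^T
    return f, grad, hess

# tiny strictly feasible instance (illustration only)
rng = np.random.default_rng(1)
m, n = 2, 5
A  = rng.standard_normal((m, n))
y0 = np.zeros(m)
c  = A.T @ y0 + rng.uniform(1, 2, n)      # makes y0 strictly dual feasible
print(dual_barrier(A, c, y0)[:2])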

Let us estimate the quality of the following prediction point:

p(y) := y + v(y),   v(y) := [∇²f(y)]^{-1}∇f(y),   s_p(y) := s(y) - A*v(y),   y ∈ int Q.

The definition of the displacement v(y) is motivated by identity (2.3), which is valid for arbitrary convex cones. Indeed, in a neighborhood of a suitably non-degenerate solution, the barrier function should be close to the barrier of a tangent cone centered at the solution. Hence, relation (2.3) should be satisfied with reasonably high accuracy. For every y ∈ int Q, we have

p(y) = [∇²f(y)]^{-1} [∇²f(y) y + ∇f(y)]
     = [∇²f(y)]^{-1} [A∇²F*(c - A*y)A*y - A∇F*(c - A*y)]
     =(1.13) [∇²f(y)]^{-1} [A∇²F*(c - A*y)A*y + A∇²F*(c - A*y)(c - A*y)]
     = [∇²f(y)]^{-1} A∇²F*(c - A*y) c.

Let us choose an arbitrary pair (s*, y*) from the optimal solution set of problem (2.2). Then c = A*y* + s*. Thus, we have proved the following representation.

Lemma 2 For every y ∈ int Q and every optimal pair (s*, y*) of the dual problem (2.2), we have

p(y) = y* + [∇²f(y)]^{-1} A∇²F*(s(y)) s*. (2.9)

Remark 1 Note that the right-hand side of equation (2.9) has a gradient interpretation. Indeed, let us fix some s ∈ K* and define the function

φ_s(y) = -⟨s, ∇F*(c - A*y)⟩,  y ∈ Q.

Then ∇φ_s(y) = A∇²F*(c - A*y)s, and for self-scaled barriers φ_s(·) is convex (as it is for barriers with negative curvature, see Section 6). Thus, representation (2.9) can be rewritten as follows:

p(y) = y* + [∇²f(y)]^{-1} ∇φ_{s*}(y). (2.10)

Note that [∇²f(y)]^{-1} in the limit acts as a projector onto the tangent subspace to the feasible set at the solution.

To conclude this section, let us describe our prediction abilities from the points of the central path. For that, we need to compute the derivative of the trajectory (2.5) with respect to μ. Differentiating the last line of this definition in μ, we obtain

0 = -A∇F*(s_μ) + μ A∇²F*(s_μ)A* y'_μ.

Thus,

y'_μ = (1/μ) [A∇²F*(s_μ)A*]^{-1} A∇F*(s_μ) = -(1/μ) [∇²f(y_μ)]^{-1}∇f(y_μ). (2.11)

Therefore, we have the following representation of the prediction point:

p(y_μ) = y_μ - μ y'_μ. (2.12)

In fact, in representation (2.9) we can replace the pair (s*, y*) by any pair (s̄, ȳ) satisfying the condition c = A*ȳ + s̄. However, in our analysis we are interested only in predicting the optimal solutions.
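Computationally, the prediction point only requires one Newton-type solve with the barrier Hessian. A minimal sketch for the LP setting above (our own illustration, reusing the same log-barrier quantities as in the previous sketch):

import numpy as np

def predictor(A, c, y):
    """Prediction point p(y) = y + v(y) with v(y) = [hess f(y)]^{-1} grad f(y),
    for the LP dual barrier f(y) = -sum(log(c - A^T y)). Returns (p, v, s_p)."""
    s = c - A.T @ y
    assert np.all(s > 0), "y must be strictly dual feasible"
    grad = A @ (1.0 / s)                  # grad f(y)
    hess = (A / s**2) @ A.T               # hess f(y)
    v = np.linalg.solve(hess, grad)       # displacement v(y)
    return y + v, v, s - A.T @ v          # p(y), v(y), s_p(y)

On points of the central path this reproduces (2.12): p(y_μ) = y_μ - μ y'_μ.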

8 Local Superlinear Convergence of IPMs 8 Hence, for the points of the central path, identity (.9) can be written in the following form: For the primal trajectory, we have y y y = [ f(y )] A F (s )s. (.3) x (.5) = x + F (s )A y (.) = x + F (s )A [A F (s )A ] A F (s ) (.3) = x F (s )A [A F (s )A ] A F (s )(c A y ) = x F (s )A [A F (s )A ] A F (s )(s + A (y y )) = x F (s ) ( s s + A [A F (s )A ] A F (s )s ). Using now identity (.3), definition (.5), and equation (.3), we obtain the following representation: x = F (s )s F (s )A ( y y y ). (.4) Due to the primal-dual symmetry of our set-up, we have the following elegant geometric interpretation. Let H def = F (s ), then we have H / ( x x + x ) = projection of H / x onto the kernel of AH /, H / A ( y y + y ) = projection of H / s onto the image of ( AH /). In Section 4, we introduce two conditions, ensuring the boundedness of the derivative x and a high quality of the prediction from the central dual trajectory: y y y O( ). However, first we need to decide on the way of measuring the distances in primal and dual spaces. 3 Measuring the distances Recall that the global complexity analysis of interior-point methods is done in an affine-invariant framework. However, for analyzing the local convergence of these schemes, we need to fix some Euclidean norms in the primal and dual spaces. Recall the definitions of Euclidean norms based on a positive definite operator B : E E. x B = Bx, x /, x E, s B = s, B s /, s E. (3.) Using this operator, we can define operator G def = AB A : H H. (3.) By a Schur complement argument and the fact that A is surjective, we conclude that A G A B. (3.3) It is convenient to choose B related in a certain way to our cones and barriers. Sometimes it is useful to have B such that BK K. (3.4) Lemma 3 Let B be as above, suppose B satisfies (3.4) and u K ±v. Then v B u B. Let B, u and v be as above. Then, Bu, u Bv, v = B(u v), u + v (3.4) 0.

9 Local Superlinear Convergence of IPMs 9 Remark As a side remark, we will see that, if F has negative curvature (see Section 6), then B chosen in (3.5) satisfies (3.4). In the general case, it is also possible to satisfy (3.4) by choosing B = F ( x) + F ( x) [ F ( x)], where x is an arbitrary point from int K. This operator acts as Bh = F ( x)h + F ( x), h F ( x), h E. Then, the property (3.4) easily follows from the Theorem on Recession Direction. Note that F ( x) B (.5) (ν + ) F ( x). In what follows, we choose B related to the primal central path. Let us define B = F (x ) with =. (3.5) We need some bounds for the points of the primal and dual central paths. Denote by X E the set of limit points of the primal central path, and by S E the set of limit points of the dual central path. Lemma 4 If (0, 0 ], then In particular, for every x X and every s S we have: Moreover, if (0, ], then x x0 ν, s s0 ν. (3.6) x B ν, s B ν. (3.7) Indeed, 4ν B F (x ) 4ν B, 4ν B F (s ) 4ν B. (3.8) x (.7) x 0 F (x 0 ), x = s 0, x (.5) = [ c, x 0 b, y 0 ] (.6) ν. 0 The last inequality also uses the fact that c, x c, x 0. Applying this inequality with 0 = and taking the limit as 0, we obtain (3.7) in view of the choice (3.5). The reasoning for the dual central path is the same. Further, F (x ), x x (.5) = c, x x 0. Therefore, applying Corollary, we get the first relation in the first line of (3.8). Similarly, we justify the first relation in the second line of (3.8). Finally, F (x ) (.5) = F ( F (s )) (.) = F ( F (s )) (.6) = [ F (s )]. It remains to use the first relation in the second line of (3.8) =. Using this, with the first relation, in the second line of (3.8) = and the Löwner order reversing property of the inverse, we conclude the second relation in the first line of (3.8) =. The last unproved relation can be justified in a similar way.

10 Local Superlinear Convergence of IPMs 0 Corollary 3 For every (0, ] we have Indeed, for every h E, we have F (s ) B 4ν. (3.9) F (s )h B = B F (s )h, F (s )h (3.8) 4ν h, F (s )h (3.8) Finally, we need to estimate the norms of the initial data. Lemma 5 We have 6ν4 4 h, B h. A G,B A B,G def = max h E { Ah G : h B = }, def = max y H { A y B : y G = }, (3.0) Indeed, for every h E, we have b G ν /. Ah G,B = Ah, G Ah = max [ Ah, y Gy, y ] y H [ = max A y, h A y, B A y ] y H [ max s, h s, B s ] = h s E B. Further, A B,G = max h E,y H { A y, h : h B =, y G = } = max h E,y H { Ah, y : h B =, y G = } = A G,B. To justify the remaining inequality, note that b G = b, G b = max y H [ b, y A y, B A y ] [ = max A y, x A y, B A y ] y H [ max s, x s, B s ] = Bx, x s E (3.5) = F (x )x, x (.),(.3) = ν.
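The bounds of Lemma 5 (essentially ‖A‖_{G,B} ≤ 1, ‖A*‖_{B,G} ≤ 1 and ‖b‖*_G ≤ ν^{1/2}) are easy to sanity-check numerically in the LP case, with B = ∇²F(x₁) = diag(x₁)^{-2} and G = A B^{-1} A^T. The sketch below is our own illustration; the random point x1 stands in for the central-path point x_{μ₁}.

import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 7
A  = rng.standard_normal((m, n))
x1 = rng.uniform(0.5, 2.0, n)            # stand-in for x_{mu_1}
b  = A @ x1
nu = n

B_inv = np.diag(x1**2)                   # B = hess F(x_1), F(x) = -sum(log x)
G     = A @ B_inv @ A.T                  # G = A B^{-1} A^T

# ||A||_{G,B}: largest singular value of G^{-1/2} A B^{-1/2}
L = np.linalg.cholesky(G)
M = np.linalg.inv(L) @ A @ np.linalg.cholesky(B_inv)
print("||A||_{G,B} =", np.linalg.svd(M, compute_uv=False).max())          # equals 1
print("||b||_G^*   =", np.sqrt(b @ np.linalg.solve(G, b)),
      "<= sqrt(nu) =", np.sqrt(nu))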

11 Local Superlinear Convergence of IPMs 4 Main assumptions Now, we can introduce our main assumptions. Assumption There exists a constant γ d > 0 such that f b, y = s, x γ d y y G (3.) = γ d s s B, (4.) for every y Q (that is s = s(y) F d ). Thus, we assume that the dual problem (.) admits a sharp optimal solution. Let us derive from Assumption that [ f(y)] becomes small in norm as y approaches y. Lemma 6 For every y int Q, we have [ f(y)] 4 [f b, y ] G. γd (4.) Let us fix some y int Q. Consider an arbitrary direction h H. Without loss of generality, we may assume that b, [ f(y)] h 0 (otherwise, we can consider direction h). Since f is a self-concordant barrier, the point y h def = y + [ f(y)] h h,[ f(y)] h / belongs to the set Q. Therefore, in view of inequality (4.), we have γ d y h y G f b, y h f b, y. Hence, γ d [f b, y ] [ f(y)] h G y y h,[ f(y)] h / G Thus, for every h H we have (4.) [ f(y)] h G h,[ f(y)] h / γ d [f b, y ]. [ f(y)] h G 4 [f b, y ] h, [ f(y)] h. γd This means that [ f(y)] G[ f(y)] 4 [f b, y ] [ f(y)], γd and (4.) follows. Now we can estimate the size of the Hessian f(y) with respect to the norm induced by G: [ f(y)] G def = max h H { [ f(y)] h G : h G = }. Corollary 4 For every y int Q, we have Therefore, v(y) G ν/ γ d [f b, y ]. [ f(y)] G 4 [f b, y ]. γd (4.3)

12 Local Superlinear Convergence of IPMs Note that [ f(y)] h G = h, [ f(y)] G[ f(y)] h, h H. Hence, (4.3) follows directly from (4.). Further, v(y) G = G[ f(y)] f(y), [ f(y)] f(y) (4.) 4 [f b, y ] f(y), [ f(y)] f(y). γd It remains to use inequality (.3). Assumption and Lemma 6 help us bound the norm of the right-hand side of representation (.3). However, in this expression there is one more object, which potentially can be large. This is the vector F (s )s. Therefore, we need one more assumption. Assumption There exists a constant σ d such that for every (0, ] we have F (s )s B σ d. (4.4) In what follows, we always suppose that Assumptions and are valid. Let us point out their immediate consequence. Theorem For every (0, ], we have the following bounds: y y y G 4σ dν γd ( x def B σ p = σ d Indeed, in view of representation (.3), we have + 6ν4 γ d, (4.5) y y y G f(y ) G A G,B F (s )s B. Thus, in view of inequalities (4.3), (3.0), and (4.4), we have y y y G 4σ d (f b, y γd ). ). (4.6) Applying now identity (.6), we obtain (4.5). Further, in view of representation (.4), we have x B F (s )s B + F (s ) B A B,G y y y G. It remains to apply inequalities (4.4), (3.9), (3.0), and (4.5). Corollary 5 There is a unique limit point of the primal central path: x = lim 0 x. Moreover, for every (0, ] we have x x B σ p. (4.7) Note that Assumptions and do not guarantee the uniqueness of the primal optimal solution in the problem (.). Let us show by some examples that our assumptions are natural.

Example 1 Consider the nonnegative orthant K = K* = R^n_+ with barriers

F(x) = -Σ_{i=1}^n ln x^{(i)},   F*(s) = -n - Σ_{i=1}^n ln s^{(i)}.

Denote by I the set of positive components of the optimal dual solution s*. Then, denoting by e the vector of all ones, we have

⟨e, ∇²F*(s_μ)s*⟩ = Σ_{i∈I} s*^{(i)}/(s_μ^{(i)})² = Σ_{i∈I} (1/s_μ^{(i)}) · (s*^{(i)}/s_μ^{(i)}).

Since the components s_μ^{(i)}, i ∈ I, converge to the positive values s*^{(i)}, and since the vector ∇²F*(s_μ)s* is nonnegative, we obtain for its norm an upper bound in terms of n and max_{i∈I} 1/s*^{(i)}. Note that this bound is valid even for degenerate dual solutions (too many active facets in Q), or for multiple dual optimal solutions (which is excluded by Assumption 1). It is interesting that we can find a bound for the vector ∇²F*(s_μ)s* based on the properties of the primal central path. Indeed, for all i ∈ I, we have (using s_μ^{(i)} x_μ^{(i)} = μ and x*^{(i)} = 0)

(∇²F*(s_μ)s*)^{(i)} = s*^{(i)}/(s_μ^{(i)})² = (s*^{(i)}/μ²) (x_μ^{(i)})² = (s*^{(i)}/μ²) (x_μ^{(i)} - x*^{(i)})².

Thus, assuming ‖x_μ - x*‖_B ≤ O(μ), we get a bound for ‖∇²F*(s_μ)s*‖. In view of inequality (4.7), this confirms that for Linear Programming our assumptions are very natural.

Example 2 For the cone of positive semidefinite matrices K = K* = S^n_+, we choose F(X) = -ln det X, F*(S) = -n - ln det S. Then

⟨I, ∇²F*(S_μ)S*⟩ = ⟨I, S_μ^{-1} S* S_μ^{-1}⟩.

It seems difficult to get an upper bound for this value in terms of ‖S_μ - S*‖. However, the second approach also works here (using S_μ^{-1} = X_μ/μ and X*S* = 0):

⟨I, S_μ^{-1} S* S_μ^{-1}⟩ = (1/μ²) ⟨X_μ², S*⟩ = (1/μ²) ⟨(X_μ - X*)², S*⟩.

Thus, we get an upper bound for ‖∇²F*(S_μ)S*‖_B assuming ‖X_μ - X*‖_B ≤ O(μ).

Our algorithms will work with points in a small neighborhood of the central path defined by the local gradient proximity measure. Denote

N(μ, β) = { y : γ(y, μ) := ‖∇f(y) - (1/μ) b‖*_y ≤ β },  μ ∈ (0, 1], β ∈ [0, 1]. (4.8)

This proximity measure has a very familiar interpretation in the special case of Linear Programming. Denoting by S the diagonal matrix made up from the slack variables s = c - A^T y, notice that Dikin's affine-scaling direction in this case is given by [AS^{-2}A^T]^{-1} b. Then, our predictor step corresponds to the search direction [AS^{-2}A^T]^{-1} AS^{-1}e, and our proximity measure becomes

‖AS^{-1}e - (1/μ) b‖_{[AS^{-2}A^T]^{-1}}.

Let us prove the main result of this section (Theorem 3 below).
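In the LP case the proximity measure γ(y, μ) = ‖∇f(y) - b/μ‖*_y is a few matrix operations; the sketch below (our own illustration) evaluates it, so that membership in N(μ, β) is simply the test proximity(...) <= beta.

import numpy as np

def proximity(A, b, c, y, mu):
    """Gradient proximity measure gamma(y, mu) = ||grad f(y) - b/mu||_y^*
    for the LP dual barrier f(y) = -sum(log(c - A^T y))."""
    s = c - A.T @ y
    assert np.all(s > 0) and mu > 0
    grad = A @ (1.0 / s)                   # grad f(y) = A S^{-1} e
    hess = (A / s**2) @ A.T                # hess f(y) = A S^{-2} A^T
    r = grad - b / mu
    return np.sqrt(r @ np.linalg.solve(hess, r))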

14 Local Superlinear Convergence of IPMs 4 Lemma 7 Let y N (, β) with (0, ] and β [0, 9 ]. Then where κ = ν + β(β+ ν) β. Indeed, F (s(y))s B σ d + 6ν β, (4.9) f b, y κ, (4.0) s(y) s s(y) = F (s(y))a (y y ), A (y y ) / = y y y def = r. Therefore, by (.6) we have ( r) F (s ) F (s(y)) ( r) F (s ). Denote = F (s(y)) F (s ). Then, { } ± max ( r), ( r) F (s ) = r( r) ( r) F (s ). Note that F (s(y))s B (4.4) σ d + Hs B. At the same time, Hs B = BHs, Hs (3.8) 4ν [ F (s )] Hs, Hs (.) 4ν r ( r) ( r) 4 F (s )s, s (3.6) 4ν4 r ( r) ( r) 4. Thus, F (s(y))s B σ d + ν r( r) ( r). For proving (4.9), it remains to note that r (.5) and r( r) ( r) = ( r) ( β) ( β) = β 3β ( β) To establish (4.0), note that [f b, y ] = [ b, y y + b, y y ] (.6) ν + b, y y < β( 3β) 4β 3β, β [0, 9 ]. = ν + f(y) + b, y y + f(y), y y β β, ν + f(y) b y y y y + f(y) y y y y ν + β β β + ν β β, where the last inequality follows from the assumptions of the lemma and (.3). Now, we can put all our observations together. Theorem 3 Let dual problem (.) satisfy Assumptions and. If for some (0, ] and β [0, 9 ] we have y N (, β), then ( ) ( ) p(y) y G 4 σ γd d + 6ν β b, y y 4ν σ γd d + 6ν β y y G. (4.)

15 Local Superlinear Convergence of IPMs 5 Indeed, in view of representation (.9), we have p(y) y G [ f(y)] G A G,B F (s(y))s B. Now, we can use inequalities (4.3), (3.0), and (4.9). For justifying the second inequality, we apply the third bound in (3.0). The most important consequence of the estimate (4.) consists in the necessity to keep the neighborhood size parameter β at the same order as the (central path) penalty parameter. In our reasoning below, we often use β = 9. (4.) 5 Efficiency of the predictor step Let us estimate now the efficiency of the predictor step with respect to the local norm. Lemma 8 If y N (, β), then where κ = γ d ( σ d + 6ν β ), and κ = κ κ. Indeed, p(y) y y κ [f b, y ] κ, (5.) p(y) y y (.9) = A F (s(y))s, [ f(y)] A F (s(y))s (4.) 4 [f b, y ] A F γd (s(y))s, G A F (s(y))s (3.3) 4 [f b, y ] B F γd (s(y))s, F (s(y))s. It remains to use the bounds (4.9) and (4.0). Since y y y, inequality (5.) demonstrates a significant drop in the distance to the optimal point after a full predictor step. The following fact is also useful. Lemma 9 For every y Q, we have A F (s(y)) s p (y) = 0. Indeed, A F (s(y))s p (y) = A F (s(y))(s(y) A v(y)) (.3) = A F (s(y)) f(y) = 0. Corollary 6 If F p is bounded, then the point F (s(y)) s p (y) / K (therefore, it is infeasible for the primal problem (.)). We can show now that a large predictor step can still keep dual feasibility. Denote y(α) = y + αv(y), α [0, ].

16 Local Superlinear Convergence of IPMs 6 Theorem 4 Let y N (, β) with (0, ] and β [0, 9 ]. Then, for every r (0, ), the point y(ˆα) with belongs to Q. Moreover, where κ 3 = κ ( r + ν γ d ). ˆα def = r r+κ [f b,y ] (5.) f b, y(ˆα) κ 3 [f b, y ], (5.3) Consider the Dikin ellipsoid W r (y) = {u H : u y y r}. Since W r (y) Q, its convex combination with point y, defined as is contained in Q. Note that Hence, y(ˆα) Q. Further, Q(y) = {u H : u ( t)y ty y r( t), t [0, ]}, y(ˆα) ( ˆα)y ˆαy y = ˆα p(y) y y (5.) κ ˆα[f b, y ] (5.) = r( ˆα). f b, y(ˆα) = ( ˆα)[f b, y ] + ˆα b, y p(y) κ r [f b, y ] + b G p(y) y G. Since b G (3.0) ν and we obtain the desired inequality (5.3). (4.) p(y) y G κ γ d [f b, y ], Denote by ᾱ(y), the maximal feasible step along direction v(y): ᾱ(y) = max{α : y + αv(y) Q}. α 0 Let us show that ᾱ = ᾱ(y) is large enough. In general, ᾱ(y) v(y) y (.3) ν /. (5.4) However, in a small neighborhood of the solution, we can establish a better bound. Theorem 5 Let y N (, β) with (0, ] and β [0, 9 ]. Then Moreover, if is small enough: then and ᾱ(y) ᾱ(y) κ +κ. (5.5) < β κ, (5.6) κ κ β, (5.7) ( y(ᾱ) y y κ + ) ν κ β. (5.8)

17 Local Superlinear Convergence of IPMs 7 Since for every r (0, ) we have ᾱ ᾱ (5.) ˆα = r r+κ [f b,y ], κ[f (4.0) b,y ] +κ [f b,y ] κ +κ, which is (5.5). On the other hand, b, v(y) = b, [f (y)] (f (y) b + b b y β b y = b y ( f (y) + b f (y) y β) b y ( f (y) y β) = b y ( p(y) y y β) b y ( y y y p(y) y y β). Thus, using the estimate (5.) and the bound y y y (since y is on the boundary of Q), we get b, v(y) b y ( κ β). (5.9) Therefore, condition (5.6) guarantees that b, v(y) > 0. Define α = f b,y b,v(y). Then Therefore, y + αv(y) Q. Hence, b, y + αv(y) = f. ᾱ α = + b,y y v(y) b,v(y) (5.) + b y κ b,v(y) (5.9) + κ κ β. Further, y(ᾱ) y y y(ᾱ) p(y) y + p(y) y y (5.) ᾱ v(y) y + κ (.3) ᾱ ν + κ. Taking into account that in view of (5.5) and (5.7) ᾱ κ κ β, we get inequality (5.8). Despite the extremely good progress in function value, we have to worry about the distance to the central path and we cannot yet fully appreciate the new point y( ˆα). Indeed, for getting close again to the central path, we need to find an approximate solution to the auxiliary problem min{f(y) : b, y = b, y(ˆα) }. y In order to estimate the complexity of this corrector stage, we need to develop some bounds on the growth of the gradient proximity measure. 6 Barriers with negative curvature Definition Let F be a normal barrier for the regular cone K. curvature if for every x int K and h K we have We say that F has negative 3 F (x)[h] 0. (6.) It is clear that self-scaled barriers have negative curvature (see [4]). Some other important barriers, like the negation of the logarithms of hyperbolic polynomials (see [4]) also share this property.
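Definition 1 requires D³F(x)[h] ⪯ 0, as a self-adjoint operator, for every x ∈ int K and h ∈ K. For the log-barrier of the nonnegative orthant this is immediate, since D³F(x)[h] = -2 diag(h_i/x_i³) ⪯ 0 whenever h ≥ 0; the sketch below (our own illustration) confirms it by a finite difference of the Hessian.

import numpy as np

hess_F = lambda x: np.diag(1.0 / x**2)   # Hessian of F(x) = -sum(log x)

rng = np.random.default_rng(3)
n = 5
x = rng.uniform(0.5, 2.0, n)             # x in int K
h = rng.uniform(0.1, 1.0, n)             # h in K = R^n_+

# D^3 F(x)[h] ~ (hess_F(x + t h) - hess_F(x)) / t for small t
t = 1e-6
D3 = (hess_F(x + t * h) - hess_F(x)) / t
print("largest eigenvalue of D^3F(x)[h]:", np.linalg.eigvalsh(D3).max())  # <= 0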

18 Local Superlinear Convergence of IPMs 8 Theorem 6 Let K be a regular cone and F be a normal barrier for K. statements are equivalent:. F has negative curvature;. for every x int K and h E we have Then, the following 3 F (x)[h, h] K ; (6.) 3. for every x int K and for every h E such that x + h int K, we have F (x + h) F (x) K F (x)h. (6.3) Let F have negative curvature. Then, for every h E and u K we have 0 3 F (x)[h, h, u] = 3 F (x)[h, h], u. (6.4) Clearly, this condition is equivalent to (6.). On the other hand, from (6.) we have F (x + h) F (x) F (x)h = 3 F (x + τh)[h, h] dτ K 0. 0 Note that we can replace in (6.3) h by τh, divide everything by τ, and take the limit as τ 0 +. Then we arrive back at the inclusion (6.). Theorem 7 Let the curvature of F be negative. Then for every x K, we have F (x)h K 0, h K, (6.5) and, consequently, F (x + h) F (x) K 0. (6.6) Let us prove that F (x)h K for h K. Assume first that h int K. Consider the following vector function: s(t) = F (x + th)h E, t 0. Note that s (t) = 3 F (x + th)[h, h] (6.) K 0. This means that F (x)h K F (x + th)h (.) = t F (h + t x)h. Taking the limit as t, we get F (x)h K. By continuity arguments, we can extend this inclusion onto all h K. Therefore, F (x + h) = F (x) + F (x + τh)h dτ K 0 F (x). As we have proved, if F has negative curvature, then F (x)k K, for every x int K. This property implies that the situations when both F and F has negative curvature are very seldom. Lemma 0 Let both F and F have negative curvature. Then K is a symmetric cone.

19 Local Superlinear Convergence of IPMs 9 Indeed, for every x int K we have F (x)k K. Denote s = F (x). Since F has negative curvature, then F (s)k K. However, since F (s) (.6) = [ F (x)], this means K F (x)k. Thus K = F (x)k. Now, using the same arguments as in [4] it is easy to prove that for every pair x int K and s int K there exists a scaling point w int K such that s = F (w)x (this w can be taken as the minimizer of the convex function F (w), x + s, w ). Thus, we have proved that K is homogeneous and self-dual. Hence, it is symmetric. Theorem 8 Let K be a regular cone and F be a normal barrier for K, which has negative curvature. Further let x, x + h int K. Then for every α [0, ) we have (+ασ x(h)) F (x) F (x + αh) ( α) F (x). (6.7) Indeed, F (x + αh) = F (( α)x + α(x + h)) Further, denote x = x (6.) F (( α)x) (.) = ( α) F (x). h σ x(h). By definition, x K. Note that x = (x + αh) + ασx(h) +ασ x(h) ( x (x + αh)). Therefore, by the second inequality in (6.7), we have F (x) ( + ασ x (h)) F (x + αh). 7 Bounding the growth of the proximity measure Let us analyze now our predictor step y(α) = y + αv(y), α [0, ᾱ], where ᾱ = ᾱ(y). Denote s = s(y(ᾱ)) K. Lemma Let F have negative curvature. Then, for every α [0, ᾱ), we have δ y (α) def = f(y(α)) ᾱ ᾱ α f(y) G αᾱ (ᾱ α) F (s(y)) s B. (7.) Indeed, δ y(α) = G ( f(y(α)) ᾱ ᾱ α f(y)), f(y(α)) ᾱ ᾱ α f(y) = G A( F (s(y(α))) ᾱ ᾱ α F (s(y))), A( F (s(y(α))) ᾱ ᾱ α F (s(y))) (3.3) B( F (s(y(α))) F (( ᾱ α )s(y))), F (s(y(α))) F (( ᾱ α )s(y)).

20 Local Superlinear Convergence of IPMs 0 Note that y(α) = y + ᾱ α (y(ᾱ) y). Therefore, s(y(α)) = ( ᾱ α )s(y) + ᾱ α s. Since F has negative curvature, we have (( x def = F (s(y(α))) F ᾱ ) ) s(y) K 0. α Using (6.3) in Theorem 6, we also have (( x def = F ᾱ ) ) ( ᾱ ) s(y) α α s K x. Thus, x K ±x. By (6.5) in Theorem 7 and Lemma 3, we obtain Bx, x Bx, x which gives the desired conclusion. Note that at the predictor stage, we need to choose the rate of decrease of the penalty parameter (central path parameter) as a function of the predictor step size α. Inequality (7.) suggests the following dependence: (α) ( ᾱ α). (7.) However, if ᾱ is close to its lower limit (5.4), this strategy may be too aggressive. Indeed, in a small neighborhood of the point y we can guarantee only f(y(α)) ( + α) f(y) y = f(y(α)) f(y) α f(y)v(y) y (.8) α v(y) y α v(y) y. (7.3) In this situation, a more reasonable strategy for decreasing seems to be: (α) +α. (7.4) It appears that it is possible to combine both strategies (7.) and (7.4) in a single expression. Denote ξᾱ(α) = + αᾱ ᾱ α, α [0, ᾱ). Note that ξᾱ(α) = + α + α ᾱ α = ᾱ ᾱ α α( ᾱ) ᾱ α. (7.5) Let us prove an upper bound for the growth of the local gradient proximity measure along direction v(y), when the penalty parameter is divided by the factor ξᾱ(α). Theorem 9 Suppose F has negative curvature. Let y N (, β) with (0, ] and β [0, 9 ], satisfying condition (5.6). Then, for y(α) = y + αv(y) with α (0, ᾱ), we have ( ) γ y(α), ξᾱ(α) Γ (y, α) def = ( + α σ s(y) ( A v(y)) ) f(y(α)) ξᾱ(α) b y [ ( )] ( + α σ s(y) ( A v(y))) γ (α) + β + αᾱ ᾱ α, γ (α) def = f(y(α)) ξᾱ(α) f(y) y ( ( αᾱ ) (ᾱ α) α [ ( ᾱ κ ν + κ γ d σ d + 6ν β + κν + ) ]) ν β κ β β. (7.6)

21 Local Superlinear Convergence of IPMs Indeed, ( γ y(α), ) ξᾱ(α) = f(y(α)) ξᾱ(α) b y(α) Further, (6.7) ( + ασ s(y) ( A v(y))) f(y(α)) ξᾱ(α) b y. f(y(α)) ξᾱ(α) b y γ (α) + ξᾱ(α) f(y) b y. Since y N (, β), the last term does not exceed β ξᾱ(α). Let us estimate now γ (α). γ (α) (8.9) f(y(α)) ᾱ ᾱ α f(y) y + α( ᾱ) ᾱ α f(y) y (5.5) f(y(α)) ᾱ ᾱ α f(y) y + α ᾱ α κ ν +κ. For the second inequality above, we also used (.3). Note that f(y(α)) ᾱ ᾱ α f(y) y = [ f(y)] ( f(y(α)) ᾱ ᾱ α f(y)), f(y(α)) ᾱ ᾱ α f(y) (4.) 4 [f b, y ] f(y(α)) ᾱ γd ᾱ α f(y) G (4.0) 4κ δ γ y(α). d Moreover, δ y (α) (7.) αᾱ (ᾱ α) F (s(y)) s B [ ] αᾱ (ᾱ α) F (s(y))s B + F (s(y))( s s ) B (4.9) [ ] αᾱ (ᾱ α) σ d + 6ν β + F (s(y))( s s ) B. It remains to estimate the last term. (.5) Denote r = s(y) s s(y) β β. Then Therefore, B (3.8) 4ν [ (.6) 4ν F (s )] ( r) [ F (s(y))]. ν F (s(y))( s s ) B ( r) y(ᾱ) (5.8) y y ( κν + ) ν κ β β β. Putting all the estimates together, we obtain the claimed upper bound on γ (α). Taking into account the definition of ξᾱ(α), we can see that our predictor-corrector scheme with neighborhood size parameter β = O() has local superlinear convergence.

8 Polynomial-time path-following method

Let us describe now our path-following predictor-corrector strategy. It employs the following univariate function:

ηᾱ(α) = 2α,  α ∈ [0, ᾱ/3),
ηᾱ(α) = (α + ᾱ)/2,  α ∈ [ᾱ/3, ᾱ]. (8.1)

This function will be used for updating the length of the current predictor step α by the rule α⁺ = ηᾱ(α). If the current step is small, then it is doubled. On the other hand, if α is close enough to the maximal step size ᾱ, then this distance is halved for the new value α⁺. Let us fix the tolerance parameter β = 1/6 for the proximity measure Γ. Consider the following method.

Dual path-following method for barriers with negative curvature. (8.2)

1. Set μ₀ = 1 and find a point y₀ ∈ N(μ₀, 1/5).

2. For k ≥ 0 iterate:

a) Compute ᾱ_k = ᾱ(y_k).

b) Set α_{k,0} = (1/6) min{1, 1/‖v(y_k)‖_{y_k}}. Using the recurrence α_{k,i+1} = η_{ᾱ_k}(α_{k,i}), find the maximal i =: i_k such that Γ_{μ_k}(y_k, α_{k,i}) ≤ β.

c) Set α_k = α_{k,i_k}, p_k = y_k + α_k v(y_k), and μ_{k+1} = μ_k / ξ_{ᾱ_k}(α_k).

d) Starting from p_k, apply the Newton method for finding y_{k+1} ∈ N(μ_{k+1}, β_{k+1}) with β_{k+1} = μ_{k+1}/5.

Recall that the bound

Γ_μ(y, α) = (1 + α σ_{s(y)}(-A*v(y))) · ‖∇f(y(α)) - (ξᾱ(α)/μ) b‖*_y

is explicitly computable for different values of α (we need to compute only the new gradient vectors ∇f(y(α))). Let us show now that the predictor-corrector scheme (8.2) has polynomial-time complexity.

Lemma 12 Suppose F has negative curvature and let y ∈ N(μ, β) with β ≤ μ/5. Then for all

α ∈ [0, (1/6) min{1, 1/‖v(y)‖_y}] (8.3)

23 Local Superlinear Convergence of IPMs 3 Hence, Γ (y, α) (8.3) ( + 6) (+ 6 ) 6 < 6. Corollary 7 Let sequence { k } be generated by method (8.). Then for every k 0 we have k+ ( + 6ν / ) k. (8.4) Note that by Lemma, we get Γ k (y k, α k,0 ) β. Therefore, in method (8.) we always have (.3) α k α k,0. It remains to note that ξᾱ(α 6ν / k ) (8.9) + α k. At the same time, it follows from Theorem 9 that method (8.) can be accelerated. Taking into account the choice β k = k 5 in (8.), we see that κ ν + 5 ( 5 +ν/ ), 5 κ γ d (σ d ν ), and κ = κ κ. Let us assume that k is small enough. Namely, we assume that Then κ k β k k (κ+ ). (8.5) 5 ᾱ k (5.7) + Therefore, taking into account that κ k κ k 5 k + κ k (8.5), (5.5) (8.5) ᾱ k +κ k 3. (8.6) σ s(y) ( A v(y)) (.0) A v(y) s(y) = v(y) y (.3) ν /, we have the following bound on the growth of the proximity measure: Γ k (y k, α) γ d [ (7.6) [ ( )] ( + ᾱ k ν / ) αᾱk (ᾱ k α) k c 0 + k 5 + αᾱ k ᾱ k α (8.6) ( + ν / ) k [ ᾱ k (ᾱ k α) c (8.5) where c 0 = κν / + κ σ d ν + κν( + ν / ) 5 5 Assume now that k is even smaller, namely, that ]. ( )] + ᾱ k ᾱ k α, (8.7) k β (+ν / )(9c ). (8.8) Denote by ξ() the unique positive solution of the equation In view of assumption (8.8), we have 3 ξ( k). Therefore, c 0 ξ + 5 ( + ξ) = β (+ν / ). (8.9) β (+ν / ) k (c )ξ ( k ). (8.0)

24 Local Superlinear Convergence of IPMs 4 Note that for α( k ) defined by the equation ᾱ k ᾱ k α( k ) = ξ( k ) (8.8) 3, we have Γ k (y k, α( k )) β. Therefore, y k + α( k )v(y k ) Q. At the same time, α( k ) 3ᾱk. (8.) Note that in view of the termination criterion of Step b) in method (8.) we either have α k α( k ), or (ᾱ k + α k ) α( k ). The first case is possible only if α k < 3ᾱk. This implies α( k ) < 3ᾱk, and this contradicts the lower bound (8.). Thus, we have Therefore, α k α( k ) ᾱ k. ξᾱk (α k ) = + α kᾱ k ᾱ k α k + ᾱk(α( k ) ᾱ k ) (ᾱ k α( k )) (8.) + ᾱ k 6(ᾱ k α( k )) β where c = (+ν / )(c 0+ 7 ). 9 5 Thus, we have proved the following theorem. = + 6ᾱkξ( k ) (8.6) + 9 ξ( k) (8.0) + 9 c k, Theorem 0 Suppose that k satisfies condition (8.8). Then method (8.) has a local superlinear rate of convergence: k+ 9 3/ c / k. It remains to estimate the full complexity of one iteration of method (8.). It has two auxiliary search procedures. The first one (that is Step b)) consists in finding an appropriate value of the predictor step. We need an auxiliary statement on performance of its recursive rule α + = ηᾱ(α). Lemma 3 If α 0 and α + = ηᾱ(α), then ξᾱ(α + ) ξᾱ(α). Hence, for the recurrence we have ξᾱ(α i ) + α 0 i. α i+ = ηᾱ(α i ), i 0, If α + = α, then ξᾱ(α + ) = + αᾱ ᾱ α + αᾱ ᾱ α = ξ ᾱ(α). If α + = α+ᾱ, then ξᾱ(α + ) = + ᾱ(ᾱ+α) ᾱ α = ξᾱ(α) + ᾱ ᾱ α ξ ᾱ(α). Therefore, ξᾱ(α i ) + (ξᾱ(α 0 ) ) i (8.9) + α 0 i. Note that the number of evaluations of the proximity measure Γ k (y k, ) at Step b) of method (8.) is equal to i k +. Therefore, for the first N iterations of this method we have N (i k + ) N + N log k k+ α k,0 N( + log ν) log N. k=0 k=0 Taking into account that for solving the problem (.) with absolute accuracy ɛ we need to ensure N ν ɛ, and using the rate of convergence (8.4), we conclude that the total number of evaluations of the proximity measure Γ k (y k, ) does not exceed O(ν / ν log ν log ɛ ).
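The recursive step-size search of Step b) is cheap to implement. The sketch below (our own illustration, with a generic callback for the proximity bound Γ) doubles the trial step while it is small and then bisects toward the maximal step ᾱ, following the update rule ηᾱ in (8.1), with ξᾱ(α) = 1 + αᾱ/(ᾱ - α) as in Section 7.

def eta(alpha, alpha_bar):
    """Step-length update eta_{alpha_bar}(alpha) from (8.1):
    double small steps, otherwise halve the distance to alpha_bar."""
    return 2 * alpha if alpha < alpha_bar / 3 else (alpha + alpha_bar) / 2

def xi(alpha, alpha_bar):
    """Penalty-division factor xi_{alpha_bar}(alpha) = 1 + alpha*alpha_bar/(alpha_bar - alpha)."""
    return 1 + alpha * alpha_bar / (alpha_bar - alpha)

def predictor_step_search(alpha0, alpha_bar, Gamma, beta=1/6, max_iter=60):
    """Last trial step in the sequence alpha_{i+1} = eta(alpha_i) whose
    proximity bound Gamma(alpha) stays below beta (Step b) of the method)."""
    alpha = alpha0
    for _ in range(max_iter):
        nxt = eta(alpha, alpha_bar)
        if nxt >= alpha_bar or Gamma(nxt) > beta:
            break
        alpha = nxt
    return alpha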

25 Local Superlinear Convergence of IPMs 5 It remains to estimate the complexity of the correction process (this is Step d)). This process cannot be too long either. Note that the penalty parameters k are bounded from below by ɛ ν, where ɛ is the desired accuracy of the solution. On the other hand, the point p k belongs to the region of quadratic convergence of the Newton method. Therefore, the number of iterations at Step d) is bounded by O(ln ln κ ɛ ). In Section 9, we will demonstrate on simple examples that the high accuracy in approximating the trajectory of central path is crucial for local superlinear convergence of the proposed algorithm. There exists another possibility for organizing the correction process at Step d). We can apply a gradient method in the metric defined by the Hessian of the objective function at point p k. Then the rate of convergence will be linear, and we could have up to O(ln ν ɛ ) correction iterations at each step of method (8.). However, each of such an iteration will be cheap since it does not need evaluating and inverting the Hessian. 9 Discussion 9. D-examples Let us look now at several D-examples illustrating different aspects of our approach. Let us start with the following problem: max y R { b, y : y 0, y y}. (9.) For this problem, we can use the following barrier function: f(y) = ln(y y ) ln y. We are going to check our conditions for the optimal point y = 0. Problem (9.) can be seen as a restriction of the following conic problem: max s,y { b, y : s = y, s = y, s 3 =, s 4 = y, s s 3 s, s 4 0}, (9.) endowed with the barrier F (s) = ln(s s 3 s ) ln s 4. Note that ( s3 s s F (s) = s s 3 s, s s 3 s, s s 3 s, ) T. s 4 Denote ω = s s 3 s. Since in problem (9.) y = 0 corresponds to s = e 3, we have the following representation: Let us choose in the primal space the norm F (s)s = F (s)e 3 = ω (s, s s, s, 0 ) T. Bx, x = x + x + x 3 + x 4. Then F (s)s B = [s + s ]/ω. Hence, the region F (s(y))s B σ d is formed by vectors y = (y, y ) satisfying the inequality y + y σ d (y y ). Thus, the boundary curve of this region is given by equation y = y + y σ / + y, d

which has a positive slope at the origin (see Figure 1). Note that the central path corresponding to the vector b = (1, 0)ᵀ can be found from the centering equations ∇f(y_μ) = b/μ; its characteristic equation shows that, for any value of σ_d, it leaves the region of quadratic convergence as μ → 0. It is interesting that, in our example, Assumption 2 is valid if and only if the problem (9.1) with y* = 0 satisfies Assumption 1.

Figure 1. Behavior of ‖∇²F*(s(y))s*‖_B: the boundary of the region ‖∇²F*(s(y))s*‖_B ≤ σ_d, the central path for b = (1, 0)ᵀ, and the optimal point y* = 0.

In our second example we need the maximal neighborhood of the central path:

M(β) = Cl( ∪_{μ > 0} N(μ, β) ) = { y : θ²(y) := (‖∇f(y)‖*_y)² - ⟨b, [∇²f(y)]^{-1}∇f(y)⟩² / (‖b‖*_y)² ≤ β² }. (9.3)

Note that θ(y) = min_{t ∈ R} ‖∇f(y) - t b‖*_y. Consider the following problem:

max_{y ∈ R²} { y₁ : ‖y‖ ≤ 1 }, (9.4)

where ‖·‖ is the standard Euclidean norm. Let us endow the feasible set of this problem with the standard barrier function f(y) = -ln(1 - ‖y‖²). Note that

∇f(y) = 2y/(1 - ‖y‖²),

∇²f(y) = 2I/(1 - ‖y‖²) + 4yyᵀ/(1 - ‖y‖²)² = (2/(1 - ‖y‖²)) (I + 2yyᵀ/(1 - ‖y‖²)),

[∇²f(y)]^{-1}∇f(y) = ((1 - ‖y‖²)/(1 + ‖y‖²)) y.

Therefore, and for b = (1, 0)ᵀ we have

(‖∇f(y)‖*_y)² = 2‖y‖²/(1 + ‖y‖²),

(‖b‖*_y)² = (1 - y₁² + y₂²)(1 - ‖y‖²)/(2(1 + ‖y‖²)),

⟨∇f(y), [∇²f(y)]^{-1}b⟩ = (1 - ‖y‖²) y₁/(1 + ‖y‖²).

Thus,

θ²(y) = 2‖y‖²/(1 + ‖y‖²) - 2(1 - ‖y‖²) y₁²/((1 + ‖y‖²)(1 - y₁² + y₂²)) = 2y₂²/(1 - y₁² + y₂²).

We conclude that for problem (9.4) the maximal neighborhood of the central path has the following representation:

M(β) = { y ∈ R² : y₁² + ((2 - β²)/β²) y₂² ≤ 1 } (9.5)

(see Figure 2).

Figure 2. Prediction in the absence of a sharp maximum: the problem max{y₁ : ‖y‖ ≤ 1}, its central path, the small and large neighborhoods of the central path, and the points y, p(y), y₊.

Note that p(y) = 2y/(1 + ‖y‖²) ∈ int Q. If the radii of the small and large neighborhoods of the central path are fixed, by straightforward computations we can see that the simple predictor-corrector update y ← y₊ shown in Figure 2 has a local linear rate of convergence. In order to get a superlinear rate, we need to tighten the small neighborhood of the central path as μ → 0.

9.2 Examples of cones with negative curvature

In accordance with definition (6.1), negative curvature of barrier functions is preserved by the following operations.

If barriers F_i for cones K_i ⊂ E_i, i = 1, 2, have negative curvature, then the curvature of the barrier F₁ + F₂ for the cone K₁ × K₂ is negative.

If barriers F_i for cones K_i ⊂ E, i = 1, 2, have negative curvature, then the curvature of the barrier F₁ + F₂ for the cone K₁ ∩ K₂ is negative.

If a barrier F for a cone K has negative curvature, then the curvature of the barrier f(y) = F(A*y) for the cone K_y = {y ∈ H* : A*y ∈ K} is negative.

If a barrier F(x) for a cone K has negative curvature, then the curvature of its restriction onto the linear subspace {x ∈ E : Ax = 0} is negative.

At the same time, we know two important families of cones with negative curvature.

Self-scaled barriers have negative curvature (see Corollary 3.(i) in [14]).

Let p(x) be a hyperbolic polynomial. Then the barrier F(x) = -ln p(x) has negative curvature (see [4]).

Thus, using the above-mentioned operations, we can construct barriers with negative curvature for many interesting cones. In some situations, we can argue that, currently, some nonsymmetric treatments of the primal-dual problem pair have better complexity bounds than the primal-dual symmetric treatments.

Example 3 Consider the cone of nonnegative polynomials:

K = { p ∈ R^{2n+1} : Σ_{i=0}^{2n} p_i t^i ≥ 0 for all t ∈ R }.

The dual to this cone is the cone of positive semidefinite Hankel matrices. For k ∈ {0, 1, ..., 2n}, denote by H_k ∈ R^{(n+1)×(n+1)} the matrix with entries

H_k^{(i,j)} = 1 if i + j = k, and 0 otherwise,  i, j ∈ {0, 1, ..., n}.

For s ∈ R^{2n+1} we can now define the following linear operator:

H(s) = Σ_{i=0}^{2n} s_i H_i.

Then the cone dual to K can be represented as follows:

K* = { s ∈ R^{2n+1} : H(s) ⪰ 0 }.

The natural barrier for the dual cone is f(s) = -ln det H(s). Clearly, it has negative curvature. Note that we can lift the primal cone to a higher-dimensional space (see [12]):

K = { p ∈ R^{2n+1} : p_i = ⟨H_i, Y⟩, Y ⪰ 0, i ∈ {0, 1, ..., 2n} },

and use F(Y) = -ln det Y as a barrier function for the extended feasible set. However, in this case we significantly increase the number of variables. Moreover, we need O(n³) operations for computing the value of the barrier F(Y) and its gradient. On the other hand, in the dual space the cost of all necessary computations is very low (O(n ln n) for the function value and O(n ln n) for the solution of the Newton system, see [2]). On top of these advantages, for non-degenerate dual problems, we now have a locally superlinearly convergent path-following scheme (8.2).

To conclude the paper, let us mention that negative curvature seems to be a natural property of self-concordant barriers. Indeed, let us move from some point x ∈ int K along a direction h ∈ K: u = x + h. Then the Dikin ellipsoid of the barrier F at the point x, moved to the new center u, still belongs to K:

u + (W_r(x) - x) = h + W_r(x) ⊆ K.

We should expect that, in this situation, the Dikin ellipsoid W_r(u) becomes even larger (in any case, we should expect that it does not get smaller). This is exactly the negative curvature condition: ∇²F(x) ⪰ ∇²F(u). Thus, we are led to the following unsolved problem: What is the class of regular convex cones that admit a logarithmically homogeneous self-concordant barrier with negative curvature? In particular, does every regular convex cone admit a logarithmically homogeneous self-concordant barrier with negative curvature?

References

[1] H. H. Bauschke, O. Güler, A. S. Lewis and H. Sendov, Hyperbolic polynomials and convex analysis, Canad. J. Math. 53 (2001).
[2] Y. Genin, Y. Hachez, Yu. Nesterov and P. Van Dooren, Optimization problems over positive pseudo-polynomial matrices, SIAM J. Matrix Anal. Appl. 25 (2003).
[3] A. V. Fiacco and G. P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques, J. Wiley & Sons, New York, 1968.
[4] O. Güler, Hyperbolic polynomials and interior-point methods for convex programming, Math. Oper. Res. 22 (1997).
[5] J. Ji, F. A. Potra and R. Sheng, On the local convergence of a predictor-corrector method for semidefinite programming, SIAM J. Optim. 10 (1999).
[6] M. Kojima, M. Shida and S. Shindoh, Local convergence of predictor-corrector infeasible-interior-point algorithms for SDPs and SDLCPs, Math. Program. 80 (1998).
[7] Z. Lu and R. D. C. Monteiro, Limiting behavior of the Alizadeh-Haeberly-Overton weighted paths in semidefinite programming, Optim. Methods Softw. 22 (2007).
[8] Z. Lu and R. D. C. Monteiro, Error bounds and limiting behavior of weighted paths associated with the SDP map X^{1/2}SX^{1/2}, SIAM J. Optim. 15 (2004/05).
[9] Z.-Q. Luo, J. F. Sturm and S. Zhang, Superlinear convergence of a symmetric primal-dual path following algorithm for semidefinite programming, SIAM J. Optim. 8 (1998).
[10] S. Mehrotra, Quadratic convergence in a primal-dual method, Math. Oper. Res. 18 (1993).
[11] Yu. Nesterov, Introductory Lectures on Convex Optimization, Kluwer, Boston, 2004.
[12] Yu. Nesterov, Squared functional systems and optimization problems, in: High Performance Optimization, H. Frenk, T. Terlaky and S. Zhang (eds.), Kluwer, 1999.
[13] Yu. Nesterov and A. Nemirovskii, Interior Point Polynomial Methods in Convex Programming: Theory and Applications, SIAM, Philadelphia, 1994.
[14] Yu. Nesterov and M. J. Todd, Self-scaled barriers and interior-point methods for convex programming, Math. Oper. Res. 22 (1997) 1-42.
[15] F. A. Potra and R. Sheng, Superlinear convergence of a predictor-corrector method for semidefinite programming without shrinking central path neighborhood, Bull. Math. Soc. Sci. Math. Roumanie 43 (2000).
[16] F. A. Potra and R. Sheng, Superlinear convergence of interior-point algorithms for semidefinite programming, J. Optim. Theory Appl. 99 (1998).
[17] J. Renegar, Hyperbolic programs, and their derivative relaxations, Found. Comput. Math. 6 (2006).
[18] Y. Ye, O. Güler, R. A. Tapia and Y. Zhang, A quadratically convergent O(sqrt(n)L)-iteration algorithm for linear programming, Math. Program. 59 (1993).
[19] Y. Zhang, R. A. Tapia and J. E. Dennis, On the superlinear and quadratic convergence of primal-dual interior point linear programming algorithms, SIAM J. Optim. 2 (1992).


L. Vandenberghe EE236C (Spring 2016) 18. Symmetric cones. definition. spectral decomposition. quadratic representation. log-det barrier 18-1 L. Vandenberghe EE236C (Spring 2016) 18. Symmetric cones definition spectral decomposition quadratic representation log-det barrier 18-1 Introduction This lecture: theoretical properties of the following

More information

Gradient methods for minimizing composite functions

Gradient methods for minimizing composite functions Gradient methods for minimizing composite functions Yu. Nesterov May 00 Abstract In this paper we analyze several new methods for solving optimization problems with the objective function formed as a sum

More information

Gradient methods for minimizing composite functions

Gradient methods for minimizing composite functions Math. Program., Ser. B 2013) 140:125 161 DOI 10.1007/s10107-012-0629-5 FULL LENGTH PAPER Gradient methods for minimizing composite functions Yu. Nesterov Received: 10 June 2010 / Accepted: 29 December

More information

Lecture 15 Newton Method and Self-Concordance. October 23, 2008

Lecture 15 Newton Method and Self-Concordance. October 23, 2008 Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications

More information

Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method

Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Robert M. Freund March, 2004 2004 Massachusetts Institute of Technology. The Problem The logarithmic barrier approach

More information

New stopping criteria for detecting infeasibility in conic optimization

New stopping criteria for detecting infeasibility in conic optimization Optimization Letters manuscript No. (will be inserted by the editor) New stopping criteria for detecting infeasibility in conic optimization Imre Pólik Tamás Terlaky Received: March 21, 2008/ Accepted:

More information

On Conically Ordered Convex Programs

On Conically Ordered Convex Programs On Conically Ordered Convex Programs Shuzhong Zhang May 003; revised December 004 Abstract In this paper we study a special class of convex optimization problems called conically ordered convex programs

More information

Full Newton step polynomial time methods for LO based on locally self concordant barrier functions

Full Newton step polynomial time methods for LO based on locally self concordant barrier functions Full Newton step polynomial time methods for LO based on locally self concordant barrier functions (work in progress) Kees Roos and Hossein Mansouri e-mail: [C.Roos,H.Mansouri]@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/

More information

Enlarging neighborhoods of interior-point algorithms for linear programming via least values of proximity measure functions

Enlarging neighborhoods of interior-point algorithms for linear programming via least values of proximity measure functions Enlarging neighborhoods of interior-point algorithms for linear programming via least values of proximity measure functions Y B Zhao Abstract It is well known that a wide-neighborhood interior-point algorithm

More information

Lecture 5. Theorems of Alternatives and Self-Dual Embedding

Lecture 5. Theorems of Alternatives and Self-Dual Embedding IE 8534 1 Lecture 5. Theorems of Alternatives and Self-Dual Embedding IE 8534 2 A system of linear equations may not have a solution. It is well known that either Ax = c has a solution, or A T y = 0, c

More information

POLYNOMIAL OPTIMIZATION WITH SUMS-OF-SQUARES INTERPOLANTS

POLYNOMIAL OPTIMIZATION WITH SUMS-OF-SQUARES INTERPOLANTS POLYNOMIAL OPTIMIZATION WITH SUMS-OF-SQUARES INTERPOLANTS Sercan Yıldız syildiz@samsi.info in collaboration with Dávid Papp (NCSU) OPT Transition Workshop May 02, 2017 OUTLINE Polynomial optimization and

More information

A FULL-NEWTON STEP INFEASIBLE-INTERIOR-POINT ALGORITHM COMPLEMENTARITY PROBLEMS

A FULL-NEWTON STEP INFEASIBLE-INTERIOR-POINT ALGORITHM COMPLEMENTARITY PROBLEMS Yugoslav Journal of Operations Research 25 (205), Number, 57 72 DOI: 0.2298/YJOR3055034A A FULL-NEWTON STEP INFEASIBLE-INTERIOR-POINT ALGORITHM FOR P (κ)-horizontal LINEAR COMPLEMENTARITY PROBLEMS Soodabeh

More information

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical

More information

A full-newton step infeasible interior-point algorithm for linear programming based on a kernel function

A full-newton step infeasible interior-point algorithm for linear programming based on a kernel function A full-newton step infeasible interior-point algorithm for linear programming based on a kernel function Zhongyi Liu, Wenyu Sun Abstract This paper proposes an infeasible interior-point algorithm with

More information

A Second-Order Path-Following Algorithm for Unconstrained Convex Optimization

A Second-Order Path-Following Algorithm for Unconstrained Convex Optimization A Second-Order Path-Following Algorithm for Unconstrained Convex Optimization Yinyu Ye Department is Management Science & Engineering and Institute of Computational & Mathematical Engineering Stanford

More information

Interior Point Methods in Mathematical Programming

Interior Point Methods in Mathematical Programming Interior Point Methods in Mathematical Programming Clóvis C. Gonzaga Federal University of Santa Catarina, Brazil Journées en l honneur de Pierre Huard Paris, novembre 2008 01 00 11 00 000 000 000 000

More information

Lecture 6: Conic Optimization September 8

Lecture 6: Conic Optimization September 8 IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions

More information

Primal-Dual Symmetric Interior-Point Methods from SDP to Hyperbolic Cone Programming and Beyond

Primal-Dual Symmetric Interior-Point Methods from SDP to Hyperbolic Cone Programming and Beyond Primal-Dual Symmetric Interior-Point Methods from SDP to Hyperbolic Cone Programming and Beyond Tor Myklebust Levent Tunçel September 26, 2014 Convex Optimization in Conic Form (P) inf c, x A(x) = b, x

More information

On well definedness of the Central Path

On well definedness of the Central Path On well definedness of the Central Path L.M.Graña Drummond B. F. Svaiter IMPA-Instituto de Matemática Pura e Aplicada Estrada Dona Castorina 110, Jardim Botânico, Rio de Janeiro-RJ CEP 22460-320 Brasil

More information

Interior Point Methods for Mathematical Programming

Interior Point Methods for Mathematical Programming Interior Point Methods for Mathematical Programming Clóvis C. Gonzaga Federal University of Santa Catarina, Florianópolis, Brazil EURO - 2013 Roma Our heroes Cauchy Newton Lagrange Early results Unconstrained

More information

Supplement: Universal Self-Concordant Barrier Functions

Supplement: Universal Self-Concordant Barrier Functions IE 8534 1 Supplement: Universal Self-Concordant Barrier Functions IE 8534 2 Recall that a self-concordant barrier function for K is a barrier function satisfying 3 F (x)[h, h, h] 2( 2 F (x)[h, h]) 3/2,

More information

Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 4

Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 4 Semidefinite and Second Order Cone Programming Seminar Fall 2001 Lecture 4 Instructor: Farid Alizadeh Scribe: Haengju Lee 10/1/2001 1 Overview We examine the dual of the Fermat-Weber Problem. Next we will

More information

A Distributed Newton Method for Network Utility Maximization, II: Convergence

A Distributed Newton Method for Network Utility Maximization, II: Convergence A Distributed Newton Method for Network Utility Maximization, II: Convergence Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract The existing distributed algorithms for Network Utility

More information

The Trust Region Subproblem with Non-Intersecting Linear Constraints

The Trust Region Subproblem with Non-Intersecting Linear Constraints The Trust Region Subproblem with Non-Intersecting Linear Constraints Samuel Burer Boshi Yang February 21, 2013 Abstract This paper studies an extended trust region subproblem (etrs in which the trust region

More information

arxiv: v1 [math.oc] 21 Jan 2019

arxiv: v1 [math.oc] 21 Jan 2019 STATUS DETERMINATION BY INTERIOR-POINT METHODS FOR CONVEX OPTIMIZATION PROBLEMS IN DOMAIN-DRIVEN FORM MEHDI KARIMI AND LEVENT TUNÇEL arxiv:1901.07084v1 [math.oc] 21 Jan 2019 Abstract. We study the geometry

More information

Lecture 9 Sequential unconstrained minimization

Lecture 9 Sequential unconstrained minimization S. Boyd EE364 Lecture 9 Sequential unconstrained minimization brief history of SUMT & IP methods logarithmic barrier function central path UMT & SUMT complexity analysis feasibility phase generalized inequalities

More information

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Roger Behling a, Clovis Gonzaga b and Gabriel Haeser c March 21, 2013 a Department

More information

The Q Method for Symmetric Cone Programmin

The Q Method for Symmetric Cone Programmin The Q Method for Symmetric Cone Programming The Q Method for Symmetric Cone Programmin Farid Alizadeh and Yu Xia alizadeh@rutcor.rutgers.edu, xiay@optlab.mcma Large Scale Nonlinear and Semidefinite Progra

More information

A strongly polynomial algorithm for linear systems having a binary solution

A strongly polynomial algorithm for linear systems having a binary solution A strongly polynomial algorithm for linear systems having a binary solution Sergei Chubanov Institute of Information Systems at the University of Siegen, Germany e-mail: sergei.chubanov@uni-siegen.de 7th

More information

Semidefinite Programming

Semidefinite Programming Chapter 2 Semidefinite Programming 2.0.1 Semi-definite programming (SDP) Given C M n, A i M n, i = 1, 2,..., m, and b R m, the semi-definite programming problem is to find a matrix X M n for the optimization

More information

Using Schur Complement Theorem to prove convexity of some SOC-functions

Using Schur Complement Theorem to prove convexity of some SOC-functions Journal of Nonlinear and Convex Analysis, vol. 13, no. 3, pp. 41-431, 01 Using Schur Complement Theorem to prove convexity of some SOC-functions Jein-Shan Chen 1 Department of Mathematics National Taiwan

More information

An E cient A ne-scaling Algorithm for Hyperbolic Programming

An E cient A ne-scaling Algorithm for Hyperbolic Programming An E cient A ne-scaling Algorithm for Hyperbolic Programming Jim Renegar joint work with Mutiara Sondjaja 1 Euclidean space A homogeneous polynomial p : E!R is hyperbolic if there is a vector e 2E such

More information

CORE 50 YEARS OF DISCUSSION PAPERS. Globally Convergent Second-order Schemes for Minimizing Twicedifferentiable 2016/28

CORE 50 YEARS OF DISCUSSION PAPERS. Globally Convergent Second-order Schemes for Minimizing Twicedifferentiable 2016/28 26/28 Globally Convergent Second-order Schemes for Minimizing Twicedifferentiable Functions YURII NESTEROV AND GEOVANI NUNES GRAPIGLIA 5 YEARS OF CORE DISCUSSION PAPERS CORE Voie du Roman Pays 4, L Tel

More information

Interior Point Algorithms for Constrained Convex Optimization

Interior Point Algorithms for Constrained Convex Optimization Interior Point Algorithms for Constrained Convex Optimization Chee Wei Tan CS 8292 : Advanced Topics in Convex Optimization and its Applications Fall 2010 Outline Inequality constrained minimization problems

More information

An interior-point trust-region polynomial algorithm for convex programming

An interior-point trust-region polynomial algorithm for convex programming An interior-point trust-region polynomial algorithm for convex programming Ye LU and Ya-xiang YUAN Abstract. An interior-point trust-region algorithm is proposed for minimization of a convex quadratic

More information

A path following interior-point algorithm for semidefinite optimization problem based on new kernel function. djeffal

A path following interior-point algorithm for semidefinite optimization problem based on new kernel function.   djeffal Journal of Mathematical Modeling Vol. 4, No., 206, pp. 35-58 JMM A path following interior-point algorithm for semidefinite optimization problem based on new kernel function El Amir Djeffal a and Lakhdar

More information

Universal Gradient Methods for Convex Optimization Problems

Universal Gradient Methods for Convex Optimization Problems CORE DISCUSSION PAPER 203/26 Universal Gradient Methods for Convex Optimization Problems Yu. Nesterov April 8, 203; revised June 2, 203 Abstract In this paper, we present new methods for black-box convex

More information

CS-E4830 Kernel Methods in Machine Learning

CS-E4830 Kernel Methods in Machine Learning CS-E4830 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 27. September, 2017 Juho Rousu 27. September, 2017 1 / 45 Convex optimization Convex optimisation This

More information

Primal-dual subgradient methods for convex problems

Primal-dual subgradient methods for convex problems Primal-dual subgradient methods for convex problems Yu. Nesterov March 2002, September 2005 (after revision) Abstract In this paper we present a new approach for constructing subgradient schemes for different

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

Interior Point Methods: Second-Order Cone Programming and Semidefinite Programming

Interior Point Methods: Second-Order Cone Programming and Semidefinite Programming School of Mathematics T H E U N I V E R S I T Y O H F E D I N B U R G Interior Point Methods: Second-Order Cone Programming and Semidefinite Programming Jacek Gondzio Email: J.Gondzio@ed.ac.uk URL: http://www.maths.ed.ac.uk/~gondzio

More information

A Generalized Homogeneous and Self-Dual Algorithm. for Linear Programming. February 1994 (revised December 1994)

A Generalized Homogeneous and Self-Dual Algorithm. for Linear Programming. February 1994 (revised December 1994) A Generalized Homogeneous and Self-Dual Algorithm for Linear Programming Xiaojie Xu Yinyu Ye y February 994 (revised December 994) Abstract: A generalized homogeneous and self-dual (HSD) infeasible-interior-point

More information

A new primal-dual path-following method for convex quadratic programming

A new primal-dual path-following method for convex quadratic programming Volume 5, N., pp. 97 0, 006 Copyright 006 SBMAC ISSN 00-805 www.scielo.br/cam A new primal-dual path-following method for convex quadratic programming MOHAMED ACHACHE Département de Mathématiques, Faculté

More information

INTERIOR-POINT ALGORITHMS FOR CONVEX OPTIMIZATION BASED ON PRIMAL-DUAL METRICS

INTERIOR-POINT ALGORITHMS FOR CONVEX OPTIMIZATION BASED ON PRIMAL-DUAL METRICS INTERIOR-POINT ALGORITHMS FOR CONVEX OPTIMIZATION BASED ON PRIMAL-DUAL METRICS TOR G. J. MYKLEBUST, LEVENT TUNÇEL Abstract. We propose and analyse primal-dual interior-point algorithms for convex optimization

More information

On Superlinear Convergence of Infeasible Interior-Point Algorithms for Linearly Constrained Convex Programs *

On Superlinear Convergence of Infeasible Interior-Point Algorithms for Linearly Constrained Convex Programs * Computational Optimization and Applications, 8, 245 262 (1997) c 1997 Kluwer Academic Publishers. Manufactured in The Netherlands. On Superlinear Convergence of Infeasible Interior-Point Algorithms for

More information

Penalty and Barrier Methods General classical constrained minimization problem minimize f(x) subject to g(x) 0 h(x) =0 Penalty methods are motivated by the desire to use unconstrained optimization techniques

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

A Primal-Dual Second-Order Cone Approximations Algorithm For Symmetric Cone Programming

A Primal-Dual Second-Order Cone Approximations Algorithm For Symmetric Cone Programming A Primal-Dual Second-Order Cone Approximations Algorithm For Symmetric Cone Programming Chek Beng Chua Abstract This paper presents the new concept of second-order cone approximations for convex conic

More information

arxiv:math/ v1 [math.co] 23 May 2000

arxiv:math/ v1 [math.co] 23 May 2000 Some Fundamental Properties of Successive Convex Relaxation Methods on LCP and Related Problems arxiv:math/0005229v1 [math.co] 23 May 2000 Masakazu Kojima Department of Mathematical and Computing Sciences

More information

A sensitivity result for quadratic semidefinite programs with an application to a sequential quadratic semidefinite programming algorithm

A sensitivity result for quadratic semidefinite programs with an application to a sequential quadratic semidefinite programming algorithm Volume 31, N. 1, pp. 205 218, 2012 Copyright 2012 SBMAC ISSN 0101-8205 / ISSN 1807-0302 (Online) www.scielo.br/cam A sensitivity result for quadratic semidefinite programs with an application to a sequential

More information

A tight iteration-complexity upper bound for the MTY predictor-corrector algorithm via redundant Klee-Minty cubes

A tight iteration-complexity upper bound for the MTY predictor-corrector algorithm via redundant Klee-Minty cubes A tight iteration-complexity upper bound for the MTY predictor-corrector algorithm via redundant Klee-Minty cubes Murat Mut Tamás Terlaky Department of Industrial and Systems Engineering Lehigh University

More information

c 2000 Society for Industrial and Applied Mathematics

c 2000 Society for Industrial and Applied Mathematics SIAM J. OPIM. Vol. 10, No. 3, pp. 750 778 c 2000 Society for Industrial and Applied Mathematics CONES OF MARICES AND SUCCESSIVE CONVEX RELAXAIONS OF NONCONVEX SES MASAKAZU KOJIMA AND LEVEN UNÇEL Abstract.

More information

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST) Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual

More information

Lecture: Algorithms for LP, SOCP and SDP

Lecture: Algorithms for LP, SOCP and SDP 1/53 Lecture: Algorithms for LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html wenzw@pku.edu.cn Acknowledgement:

More information

Lagrangian-Conic Relaxations, Part I: A Unified Framework and Its Applications to Quadratic Optimization Problems

Lagrangian-Conic Relaxations, Part I: A Unified Framework and Its Applications to Quadratic Optimization Problems Lagrangian-Conic Relaxations, Part I: A Unified Framework and Its Applications to Quadratic Optimization Problems Naohiko Arima, Sunyoung Kim, Masakazu Kojima, and Kim-Chuan Toh Abstract. In Part I of

More information

Lecture 5. The Dual Cone and Dual Problem

Lecture 5. The Dual Cone and Dual Problem IE 8534 1 Lecture 5. The Dual Cone and Dual Problem IE 8534 2 For a convex cone K, its dual cone is defined as K = {y x, y 0, x K}. The inner-product can be replaced by x T y if the coordinates of the

More information

Primal-dual path-following algorithms for circular programming

Primal-dual path-following algorithms for circular programming Primal-dual path-following algorithms for circular programming Baha Alzalg Department of Mathematics, The University of Jordan, Amman 1194, Jordan July, 015 Abstract Circular programming problems are a

More information

Newton s Method. Javier Peña Convex Optimization /36-725

Newton s Method. Javier Peña Convex Optimization /36-725 Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and

More information

GEORGIA INSTITUTE OF TECHNOLOGY H. MILTON STEWART SCHOOL OF INDUSTRIAL AND SYSTEMS ENGINEERING LECTURE NOTES OPTIMIZATION III

GEORGIA INSTITUTE OF TECHNOLOGY H. MILTON STEWART SCHOOL OF INDUSTRIAL AND SYSTEMS ENGINEERING LECTURE NOTES OPTIMIZATION III GEORGIA INSTITUTE OF TECHNOLOGY H. MILTON STEWART SCHOOL OF INDUSTRIAL AND SYSTEMS ENGINEERING LECTURE NOTES OPTIMIZATION III CONVEX ANALYSIS NONLINEAR PROGRAMMING THEORY NONLINEAR PROGRAMMING ALGORITHMS

More information

Lecture 3. Optimization Problems and Iterative Algorithms

Lecture 3. Optimization Problems and Iterative Algorithms Lecture 3 Optimization Problems and Iterative Algorithms January 13, 2016 This material was jointly developed with Angelia Nedić at UIUC for IE 598ns Outline Special Functions: Linear, Quadratic, Convex

More information

INTERIOR-POINT ALGORITHMS FOR CONVEX OPTIMIZATION BASED ON PRIMAL-DUAL METRICS

INTERIOR-POINT ALGORITHMS FOR CONVEX OPTIMIZATION BASED ON PRIMAL-DUAL METRICS INTERIOR-POINT ALGORITHMS FOR CONVEX OPTIMIZATION BASED ON PRIMAL-DUAL METRICS TOR MYKLEBUST, LEVENT TUNÇEL Abstract. We propose and analyse primal-dual interior-point algorithms for convex optimization

More information

Agenda. Interior Point Methods. 1 Barrier functions. 2 Analytic center. 3 Central path. 4 Barrier method. 5 Primal-dual path following algorithms

Agenda. Interior Point Methods. 1 Barrier functions. 2 Analytic center. 3 Central path. 4 Barrier method. 5 Primal-dual path following algorithms Agenda Interior Point Methods 1 Barrier functions 2 Analytic center 3 Central path 4 Barrier method 5 Primal-dual path following algorithms 6 Nesterov Todd scaling 7 Complexity analysis Interior point

More information

Advances in Convex Optimization: Theory, Algorithms, and Applications

Advances in Convex Optimization: Theory, Algorithms, and Applications Advances in Convex Optimization: Theory, Algorithms, and Applications Stephen Boyd Electrical Engineering Department Stanford University (joint work with Lieven Vandenberghe, UCLA) ISIT 02 ISIT 02 Lausanne

More information

A Simpler and Tighter Redundant Klee-Minty Construction

A Simpler and Tighter Redundant Klee-Minty Construction A Simpler and Tighter Redundant Klee-Minty Construction Eissa Nematollahi Tamás Terlaky October 19, 2006 Abstract By introducing redundant Klee-Minty examples, we have previously shown that the central

More information

Exploiting Sparsity in Primal-Dual Interior-Point Methods for Semidefinite Programming.

Exploiting Sparsity in Primal-Dual Interior-Point Methods for Semidefinite Programming. Research Reports on Mathematical and Computing Sciences Series B : Operations Research Department of Mathematical and Computing Sciences Tokyo Institute of Technology 2-12-1 Oh-Okayama, Meguro-ku, Tokyo

More information

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

More information

Lecture Note 5: Semidefinite Programming for Stability Analysis

Lecture Note 5: Semidefinite Programming for Stability Analysis ECE7850: Hybrid Systems:Theory and Applications Lecture Note 5: Semidefinite Programming for Stability Analysis Wei Zhang Assistant Professor Department of Electrical and Computer Engineering Ohio State

More information

SEMIDEFINITE PROGRAM BASICS. Contents

SEMIDEFINITE PROGRAM BASICS. Contents SEMIDEFINITE PROGRAM BASICS BRIAN AXELROD Abstract. A introduction to the basics of Semidefinite programs. Contents 1. Definitions and Preliminaries 1 1.1. Linear Algebra 1 1.2. Convex Analysis (on R n

More information

w Kluwer Academic Publishers Boston/Dordrecht/London HANDBOOK OF SEMIDEFINITE PROGRAMMING Theory, Algorithms, and Applications

w Kluwer Academic Publishers Boston/Dordrecht/London HANDBOOK OF SEMIDEFINITE PROGRAMMING Theory, Algorithms, and Applications HANDBOOK OF SEMIDEFINITE PROGRAMMING Theory, Algorithms, and Applications Edited by Henry Wolkowicz Department of Combinatorics and Optimization Faculty of Mathematics University of Waterloo Waterloo,

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

Local Self-concordance of Barrier Functions Based on Kernel-functions

Local Self-concordance of Barrier Functions Based on Kernel-functions Iranian Journal of Operations Research Vol. 3, No. 2, 2012, pp. 1-23 Local Self-concordance of Barrier Functions Based on Kernel-functions Y.Q. Bai 1, G. Lesaja 2, H. Mansouri 3, C. Roos *,4, M. Zangiabadi

More information

January 29, Introduction to optimization and complexity. Outline. Introduction. Problem formulation. Convexity reminder. Optimality Conditions

January 29, Introduction to optimization and complexity. Outline. Introduction. Problem formulation. Convexity reminder. Optimality Conditions Olga Galinina olga.galinina@tut.fi ELT-53656 Network Analysis Dimensioning II Department of Electronics Communications Engineering Tampere University of Technology, Tampere, Finl January 29, 2014 1 2 3

More information

A priori bounds on the condition numbers in interior-point methods

A priori bounds on the condition numbers in interior-point methods A priori bounds on the condition numbers in interior-point methods Florian Jarre, Mathematisches Institut, Heinrich-Heine Universität Düsseldorf, Germany. Abstract Interior-point methods are known to be

More information

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17 EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 17 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory May 29, 2012 Andre Tkacenko

More information

A class of Smoothing Method for Linear Second-Order Cone Programming

A class of Smoothing Method for Linear Second-Order Cone Programming Columbia International Publishing Journal of Advanced Computing (13) 1: 9-4 doi:1776/jac1313 Research Article A class of Smoothing Method for Linear Second-Order Cone Programming Zhuqing Gui *, Zhibin

More information

Conic Linear Programming. Yinyu Ye

Conic Linear Programming. Yinyu Ye Conic Linear Programming Yinyu Ye December 2004, revised January 2015 i ii Preface This monograph is developed for MS&E 314, Conic Linear Programming, which I am teaching at Stanford. Information, lecture

More information

Interior-Point Methods

Interior-Point Methods Interior-Point Methods Stephen Wright University of Wisconsin-Madison Simons, Berkeley, August, 2017 Wright (UW-Madison) Interior-Point Methods August 2017 1 / 48 Outline Introduction: Problems and Fundamentals

More information

EE 227A: Convex Optimization and Applications October 14, 2008

EE 227A: Convex Optimization and Applications October 14, 2008 EE 227A: Convex Optimization and Applications October 14, 2008 Lecture 13: SDP Duality Lecturer: Laurent El Ghaoui Reading assignment: Chapter 5 of BV. 13.1 Direct approach 13.1.1 Primal problem Consider

More information

1 Introduction Semidenite programming (SDP) has been an active research area following the seminal work of Nesterov and Nemirovski [9] see also Alizad

1 Introduction Semidenite programming (SDP) has been an active research area following the seminal work of Nesterov and Nemirovski [9] see also Alizad Quadratic Maximization and Semidenite Relaxation Shuzhong Zhang Econometric Institute Erasmus University P.O. Box 1738 3000 DR Rotterdam The Netherlands email: zhang@few.eur.nl fax: +31-10-408916 August,

More information

A Simple Derivation of a Facial Reduction Algorithm and Extended Dual Systems

A Simple Derivation of a Facial Reduction Algorithm and Extended Dual Systems A Simple Derivation of a Facial Reduction Algorithm and Extended Dual Systems Gábor Pataki gabor@unc.edu Dept. of Statistics and OR University of North Carolina at Chapel Hill Abstract The Facial Reduction

More information