Accelerating the cubic regularization of Newton's method on convex problems


Accelerating the cubic regularization of Newton's method on convex problems

Yu. Nesterov

September 2005

Abstract

In this paper we propose an accelerated version of the cubic regularization of Newton's method [6]. The original version, used for minimizing a convex function with Lipschitz-continuous Hessian, guarantees a global rate of convergence of order $O(1/k^2)$, where $k$ is the iteration counter. Our modified version converges for the same problem class with order $O(1/k^3)$, keeping the complexity of each iteration unchanged. We study the complexity of both schemes on different classes of convex problems. In particular, we argue that for the second-order schemes, the class of non-degenerate problems is different from the standard class.

Keywords: convex optimization, unconstrained minimization, Newton's method, cubic regularization, worst-case complexity, global complexity bounds, non-degenerate problems, condition number.

Center for Operations Research and Econometrics (CORE), Catholic University of Louvain (UCL), 34 voie du Roman Pays, 1348 Louvain-la-Neuve, Belgium; nesterov@core.ucl.ac.be. The research results presented in this paper have been supported by a grant "Action de recherche concertée ARC 04/09-315" from the Direction de la recherche scientifique - Communauté française de Belgique. The scientific responsibility rests with the author.

1 Introduction

Motivation. Newton's method is one of the oldest schemes in Numerical Analysis [1]. The first theoretical study of its local performance was carried out in [4]. During many years, numerous useful suggestions were developed for stabilizing its local behavior (a detailed description of the results and references can be found in the comprehensive monographs [2, 3]). However, the global worst-case complexity analysis for the second-order schemes was only given recently. In [6] a cubic regularization of Newton's method (CNM), which can be applied to a general smooth unconstrained minimization problem, was proposed. The main advantage of this scheme consists in its natural geometrical interpretation: at each iteration we minimize a cubic model, which appears to be a global upper estimate for the objective function. This key feature ensures predictable global behavior of the process.

In [6], attention was mainly paid to different classes of nonconvex problems. For convex problems, some of the statements of [6] can be strengthened. Moreover, we can employ convex optimization techniques for accelerating the local scheme. In this paper we assume that the objective function of a convex unconstrained minimization problem has a Lipschitz-continuous Hessian (that is the basic problem class under consideration). As was shown in [6], the global rate of convergence of CNM on this problem class is of the order $O(1/k^2)$, where $k$ is the iteration counter. Note, however, that CNM is a local one-step second-order method. From the complexity theory of smooth convex optimization (see, for example, Chapter 2 in [5]), it is known that the rate of convergence of the local one-step first-order method (that is, just a gradient method) can be improved from $O(1/k)$ to $O(1/k^2)$ by passing to a multi-step strategy. In this paper we show that a similar trick works with CNM also. As a result, we get a new method which converges on the specified class of problems as $O(1/k^3)$.

Contents.
Section 2 contains all necessary results related to regular functions. Namely, we present the main properties of uniformly convex functions and of functions with Lipschitz-continuous derivatives of a certain degree. In Section 3 we introduce a cubic regularization of the Newton step and prove several useful inequalities. At the end of this section we show that CNM, as applied to a convex function from the basic problem class, converges globally as $O(1/k^2)$. The majority of the results of this section can be found in [6], but in a slightly weaker form. In Section 4 we derive an accelerated multi-step version of CNM. We prove that on the basic problem class it converges as $O(1/k^3)$. This acceleration is achieved by a modification of the technique of estimate sequences described in Section 2.2.1 of [5]. In Section 5 we introduce a new class of problems, which can be seen as the non-degenerate problems for the second-order methods. This class is composed of the functions from the basic problem class which are uniformly convex of degree three. On these problems both CNM and its accelerated version exhibit a global linear rate of convergence, which is proportional to the product of a certain power of the second-order condition number and the logarithm of the required accuracy. We show that the accelerated scheme with an appropriately chosen restarting strategy works better than the pure CNM. The results of this section suggest that the notion of non-degeneracy is method-dependent. The standard classification of non-degenerate problems, based on the usual condition number, works

properly only for the first-order schemes. In Section 6 we analyze the performance of the proposed schemes on strongly convex functions from the basic problem class. We show that the main computational effort is spent at the first stage of the process, aiming to enter the region of quadratic convergence. In some sense, the complexity of such problems is almost independent of the required accuracy. This situation can lead to erroneous conclusions on the efficiency of some numerical schemes. In Section 7 we give an example of a method which, when applied to strongly convex functions, formally converges as $O(1/k^8)$. However, its actual performance appears to be worse than that of accelerated CNM. Finally, in the last Section 8 we discuss the results.

Notation. In what follows, $E$ denotes a finite-dimensional real vector space, and $E^*$ the dual space, which is formed by all linear functions on $E$. The value of a function $s \in E^*$ at $x \in E$ is denoted by $\langle s, x \rangle$. Let us fix a positive definite self-adjoint operator $B: E \to E^*$. Define the following norms:

$\|h\| = \langle Bh, h \rangle^{1/2}$, $h \in E$,
$\|s\|_* = \langle s, B^{-1} s \rangle^{1/2}$, $s \in E^*$,
$\|A\| = \max_{\|h\| \le 1} \|Ah\|_*$, $A: E \to E^*$.

For a self-adjoint operator $A = A^*$, the same norm can be defined as

$\|A\| = \max_{\|h\| \le 1} |\langle Ah, h \rangle|$.   (1.1)

Any $s \in E^*$ generates a rank-one self-adjoint operator $ss^*: E \to E^*$ acting as follows: $(ss^*) x = \langle s, x \rangle s$, $x \in E$. We extend the operator $A(s) = \frac{ss^*}{\|s\|_*}$ onto the origin in a continuous way: $A(0) = 0$.

Further, for a function $f(x)$, $x \in E$, we denote by $\nabla f(x)$ its gradient at $x$:

$f(x + h) = f(x) + \langle \nabla f(x), h \rangle + o(\|h\|)$, $h \in E$.

Clearly $\nabla f(x) \in E^*$. Similarly, we denote by $\nabla^2 f(x)$ the Hessian of $f$ at $x$:

$\nabla f(x + h) = \nabla f(x) + \nabla^2 f(x) h + o(\|h\|)$, $h \in E$.

Of course, $\nabla^2 f(x)$ is a self-adjoint linear operator from $E$ to $E^*$.

Acknowledgements. The author would like to thank Laurence Wolsey for very useful comments on the content of the paper.

2 Regular functions

In this section, for the sake of completeness, we include all necessary results on convex functions possessing a certain type of regularity.
We start from well-known properties of uniformly convex functions (see, for example, [8]).

Let a function $f(x)$ be differentiable on $E$. We call it uniformly convex on $E$ of degree $p \ge 2$ if there exists a constant $\sigma_p = \sigma_p(f) > 0$ such that

$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{\sigma_p}{p} \|y - x\|^p$, $x, y \in E$.   (2.1)

The pair $(p, \sigma_p)$ is called the pair of parameters of the uniformly convex function. Adding such a function to an arbitrary convex function gives a uniformly convex function with the same pair of parameters. Recall that the degree $p = 2$ corresponds to strongly convex functions.

Lemma 1. Assume that for some $p \ge 2$, $\sigma > 0$, and all $x, y \in E$ the following inequality holds:

$\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge \sigma \|x - y\|^p$.   (2.2)

Then the function $f$ is uniformly convex on $E$ with parameters $p$ and $\sigma$.

Proof: Indeed,

$f(y) - f(x) - \langle \nabla f(x), y - x \rangle = \int_0^1 \langle \nabla f(x + \tau(y - x)) - \nabla f(x), y - x \rangle \, d\tau$
$= \int_0^1 \frac{1}{\tau} \langle \nabla f(x + \tau(y - x)) - \nabla f(x), \tau(y - x) \rangle \, d\tau \stackrel{(2.2)}{\ge} \int_0^1 \sigma \tau^{p-1} \|y - x\|^p \, d\tau = \frac{\sigma}{p} \|y - x\|^p$.

In our analysis we often use the following inequality.

Lemma 2. For any $h \in E$ and $s \in E^*$ we have

$\langle s, h \rangle + \frac{\sigma}{p} \|h\|^p \ge -\frac{p-1}{p} \left( \frac{1}{\sigma} \right)^{\frac{1}{p-1}} \|s\|_*^{\frac{p}{p-1}}$.   (2.3)

Proof: Denote by $h^*$ the minimizer of the left-hand side of (2.3). It satisfies the first-order optimality condition $s + \sigma \|h^*\|^{p-2} B h^* = 0$. Hence, $\langle s, h^* \rangle = -\sigma \|h^*\|^p$ and $\|s\|_* = \sigma \|h^*\|^{p-1}$. Therefore

$\langle s, h^* \rangle + \frac{\sigma}{p} \|h^*\|^p = -\frac{p-1}{p} \sigma \|h^*\|^p = -\frac{p-1}{p} \sigma \left( \frac{\|s\|_*}{\sigma} \right)^{\frac{p}{p-1}}$,

and (2.3) follows.
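As a quick numerical sanity check of the last lemma (with $B = I$, so the primal and dual norms coincide, and with illustrative random data of our own choosing), note that the minimizer admits the closed form $h^* = -(\|s\|/\sigma)^{1/(p-1)} \, s/\|s\|$:

```python
import numpy as np

# Check of Lemma 2 with B = I:
#   <s, h> + (sigma/p)*||h||^p >= -((p-1)/p) * (1/sigma)^(1/(p-1)) * ||s||^(p/(p-1)),
# with equality at the minimizer h* = -(||s||/sigma)^(1/(p-1)) * s/||s||,
# which solves the first-order condition s + sigma*||h||^(p-2)*h = 0.

rng = np.random.default_rng(0)
p, sigma = 3.0, 0.5
s = rng.standard_normal(5)

def left_hand_side(h):
    return s @ h + (sigma / p) * np.linalg.norm(h) ** p

bound = -((p - 1) / p) * (1.0 / sigma) ** (1.0 / (p - 1)) \
        * np.linalg.norm(s) ** (p / (p - 1))

norm_s = np.linalg.norm(s)
h_star = -((norm_s / sigma) ** (1.0 / (p - 1))) * s / norm_s

assert abs(left_hand_side(h_star) - bound) < 1e-10   # equality at h*
for _ in range(1000):                                # lower bound everywhere
    h = 10.0 * rng.standard_normal(5)
    assert left_hand_side(h) >= bound - 1e-10
print("Lemma 2 check passed")
```

The closed-form minimizer is exactly the computation used in the proof above, written out in coordinates.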

Lemma 3. Let $f$ be uniformly convex on $E$ of degree $p$. Then

$f(y) - f(x) - \langle \nabla f(x), y - x \rangle \le \frac{p-1}{p} \left( \frac{1}{\sigma_p} \right)^{\frac{1}{p-1}} \|\nabla f(y) - \nabla f(x)\|_*^{\frac{p}{p-1}}$.   (2.4)

Proof: Assume that $f$ attains its global minimum at some point $x^*$. Then

$f(x^*) = \min_y f(y) \stackrel{(2.1)}{\ge} \min_{y \in E} \left[ f(x) + \langle \nabla f(x), y - x \rangle + \frac{\sigma_p}{p} \|y - x\|^p \right] \stackrel{(2.3)}{\ge} f(x) - \frac{p-1}{p} \left( \frac{1}{\sigma_p} \right)^{\frac{1}{p-1}} \|\nabla f(x)\|_*^{\frac{p}{p-1}}$.

Let us now fix $x$ and consider the convex function $\varphi(y) = f(y) - \langle \nabla f(x), y \rangle$. Note that it is uniformly convex with parameters $p$ and $\sigma_p$. Moreover, it attains its minimum at $y = x$. Hence, applying the above inequality to $\varphi(y)$, we get (2.4).

Note that for $p = 2$ conditions (2.2) and (2.4) are necessary and sufficient for a function $f$ to be strongly convex with parameter $\sigma = \sigma_2$. Twice differentiable strongly convex functions admit also the following convenient characterization:

$\langle \nabla^2 f(x) h, h \rangle \ge \sigma \|h\|^2$, $x, h \in E$.   (2.5)

Finally, let us give an important example of a uniformly convex function. Let us fix an arbitrary $x_0 \in E$. Define the function $d_p(x) = \frac{1}{p} \|x - x_0\|^p$. Then

$\nabla d_p(x) = \|x - x_0\|^{p-2} B (x - x_0)$, $x \in E$.

Lemma 4. For any $x$ and $y$ from $E$ we have

$\langle \nabla d_p(x) - \nabla d_p(y), x - y \rangle \ge \left( \frac{1}{2} \right)^{p-2} \|x - y\|^p$,   (2.6)
$d_p(x) - d_p(y) - \langle \nabla d_p(y), x - y \rangle \ge \frac{1}{p} \left( \frac{1}{2} \right)^{p-2} \|x - y\|^p$.   (2.7)

Proof: Without loss of generality, assume $x_0 = 0$. Then

$\langle \nabla d_p(x) - \nabla d_p(y), x - y \rangle = \langle \|x\|^{p-2} Bx - \|y\|^{p-2} By, x - y \rangle = \|x\|^p + \|y\|^p - \langle Bx, y \rangle \left( \|x\|^{p-2} + \|y\|^{p-2} \right)$.

For proving (2.6), we need to show that the right-hand side of the latter equality is greater than or equal to

$\left( \frac{1}{2} \right)^{p-2} \|x - y\|^p = \left( \frac{1}{2} \right)^{p-2} \left[ \|x\|^2 + \|y\|^2 - 2 \langle Bx, y \rangle \right]^{p/2}$.

Without loss of generality we can assume that $x \ne 0$ and $y \ne 0$. Then, denoting

$\tau = \frac{\|y\|}{\|x\|}$, $\alpha = \frac{\langle Bx, y \rangle}{\|x\| \cdot \|y\|} \in [-1, 1]$,
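Before the proof of Lemma 4 is completed below, the first inequality of the lemma can also be probed numerically. This is a sketch with $B = I$, $x_0 = 0$, and sample points of our own choosing, using the explicit gradient $\nabla d_p(z) = \|z\|^{p-2} z$:

```python
import numpy as np

# Numerical check of the first inequality of Lemma 4 (B = I, x0 = 0):
#   <grad d_p(x) - grad d_p(y), x - y>  >=  (1/2)^(p-2) * ||x - y||^p,
# where grad d_p(z) = ||z||^(p-2) * z. Equality holds at x = -y.

rng = np.random.default_rng(0)

def grad_dp(z, p):
    return np.linalg.norm(z) ** (p - 2) * z

worst = 0.0   # largest observed violation (should stay <= 0 up to rounding)
for p in (3, 4):
    for _ in range(2000):
        x = rng.standard_normal(4)
        y = rng.standard_normal(4)
        lhs = (grad_dp(x, p) - grad_dp(y, p)) @ (x - y)
        rhs = 0.5 ** (p - 2) * np.linalg.norm(x - y) ** p
        worst = max(worst, rhs - lhs)
assert worst <= 1e-9
print("inequality of Lemma 4 holds on all sampled pairs")
```

The antipodal pair $x = -y$ attains the inequality with equality, which is why the constant $(1/2)^{p-2}$ cannot be improved.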

we get the statement required to be proved:

$1 + \tau^p - \alpha \tau (1 + \tau^{p-2}) \ge \left( \frac{1}{2} \right)^{p-2} \left[ 1 + \tau^2 - 2 \alpha \tau \right]^{p/2}$, $\tau \ge 0$, $\alpha \in [-1, 1]$.   (2.8)

Since the right-hand side of this inequality is convex in $\alpha$, we need to justify only two marginal inequalities:

$\alpha = 1$:  $1 + \tau^p - \tau (1 + \tau^{p-2}) \ge \left( \frac{1}{2} \right)^{p-2} |1 - \tau|^p$,
$\alpha = -1$:  $1 + \tau^p + \tau (1 + \tau^{p-2}) \ge \left( \frac{1}{2} \right)^{p-2} (1 + \tau)^p$   (2.9)

for all $\tau \ge 0$. The second inequality in (2.9) can be derived from the lower bound for the ratio

$\frac{1 + \tau^p + \tau (1 + \tau^{p-2})}{(1 + \tau)^p} = \frac{1 + \tau^{p-1}}{(1 + \tau)^{p-1}}$, $\tau \ge 0$.

Indeed, its minimum is attained at $\tau = 1$, and that proves the second line in (2.9). For proving the first line, note that it is valid for $\tau = 1$. If $\tau \ge 0$ and $\tau \ne 1$, then we need to estimate from below the ratio

$\frac{1 + \tau^p - \tau (1 + \tau^{p-2})}{|1 - \tau|^p} = \frac{(1 - \tau)(1 - \tau^{p-1})}{|1 - \tau|^p} = \frac{1 + \tau + \dots + \tau^{p-2}}{|1 - \tau|^{p-2}}$.

Since the absolute value of any coefficient of the polynomial $(1 - \tau)^{p-2}$ does not exceed $2^{p-2}$, the first line in (2.9) is also justified. This proves (2.6), and, for proving (2.7), we can now use Lemma 1.

Another type of regularity that we are interested in concerns smoothness conditions (see, for example, [7]). Usually they are stated in terms of Lipschitz conditions for derivatives of a certain order:

$\|\nabla^k f(x) - \nabla^k f(y)\| \le L_{k+1}(f) \|x - y\|$, $x, y \in E$, $k \ge 0$.

In this paper we mainly consider functions with Lipschitz-continuous Hessian:

$\|\nabla^2 f(x) - \nabla^2 f(y)\| \le L \|x - y\|$, $x, y \in E$,   (2.10)

where $L \stackrel{\mathrm{def}}{=} L_3(f)$. Consequently, for all $x$ and $y$ from $E$ we have

$\|\nabla f(y) - \nabla f(x) - \nabla^2 f(x)(y - x)\|_* \le \frac{1}{2} L \|y - x\|^2$.   (2.11)

Moreover, for the quadratic model

$f_2(x; y) \stackrel{\mathrm{def}}{=} f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2} \langle \nabla^2 f(x)(y - x), y - x \rangle$

we can bound the residual:

$|f(y) - f_2(x; y)| \le \frac{L}{6} \|y - x\|^3$, $x, y \in E$.   (2.12)

The simplest functions satisfying condition (2.10) are, of course, the quadratic functions. However, sometimes another function, $d_3(x)$, is quite useful.
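The cubic bound on the residual of the quadratic model is easy to visualize on a concrete function. The sketch below uses a separable cubic of our own choosing, $f(x) = \frac{1}{6} \sum_i |x_i|^3$, whose Hessian $\mathrm{diag}(|x_i|)$ is Lipschitz continuous with constant $1$ in the Euclidean norm:

```python
import numpy as np

# f(x) = sum_i |x_i|^3 / 6 has gradient components x_i*|x_i|/2 and Hessian
# diag(|x_i|); the Hessian is Lipschitz with constant L = 1 (Euclidean norm).
# Check the residual bound of the quadratic model:
#   |f(y) - f2(x; y)| <= (L/6) * ||y - x||^3.

rng = np.random.default_rng(0)
L = 1.0

f = lambda x: np.sum(np.abs(x) ** 3) / 6.0
grad = lambda x: 0.5 * x * np.abs(x)
hess = lambda x: np.diag(np.abs(x))

max_err = 0.0   # largest violation of the bound (should be <= 0 up to rounding)
for _ in range(2000):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    d = y - x
    model = f(x) + grad(x) @ d + 0.5 * d @ hess(x) @ d
    max_err = max(max_err,
                  abs(f(y) - model) - (L / 6.0) * np.linalg.norm(d) ** 3)
assert max_err <= 1e-9
print("cubic residual bound verified")
```

The same test function is reused in later sketches, precisely because its Hessian-Lipschitz constant is known exactly.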

Lemma 5. For any $x, y \in E$ we have

$\|\nabla^2 d_3(x) - \nabla^2 d_3(y)\| \le 2 \|x - y\|$.   (2.13)

Proof: Without loss of generality, assume $x_0 = 0$. Then $d_3(x) = \frac{1}{3} \|x\|^3$, and for any $x \ne 0$ we have

$\nabla^2 d_3(x) = \|x\| B + \frac{1}{\|x\|} B x x^* B$.

Let us fix two points $x, y \in E$ and an arbitrary direction $h \in E$. Define $x(\tau) = x + \tau (y - x)$ and

$\varphi(\tau) = \langle \nabla^2 d_3(x(\tau)) h, h \rangle = \|x(\tau)\| \cdot \|h\|^2 + \frac{1}{\|x(\tau)\|} \langle B x(\tau), h \rangle^2$, $\tau \in [0, 1]$.

Assume first that $0 \notin [x, y]$. Then $\varphi(\tau)$ is continuously differentiable on $[0, 1]$ and

$\varphi'(\tau) = \frac{\langle B x(\tau), y - x \rangle}{\|x(\tau)\|} \left( \|h\|^2 - \frac{\langle B x(\tau), h \rangle^2}{\|x(\tau)\|^2} \right) + \frac{2 \langle B x(\tau), h \rangle}{\|x(\tau)\|} \langle B h, y - x \rangle$,

where the expression in the brackets is nonnegative. Denote $\alpha = \frac{\langle B x(\tau), h \rangle}{\|x(\tau)\| \cdot \|h\|} \in [-1, 1]$. Then

$|\varphi'(\tau)| \le \|y - x\| \cdot \|h\|^2 \left( 1 - \alpha^2 + 2 |\alpha| \right) \le 2 \|y - x\| \cdot \|h\|^2$.

Hence, $|\langle (\nabla^2 d_3(y) - \nabla^2 d_3(x)) h, h \rangle| = |\varphi(1) - \varphi(0)| \le 2 \|y - x\| \cdot \|h\|^2$, and we get (2.13) from (1.1). The remaining case $0 \in [x, y]$ is trivial since $\nabla^2 d_3(0) = 0$.

In the sequel we often establish the complexity of different problem classes in terms of condition numbers of a certain degree:

$\gamma_p(f) \stackrel{\mathrm{def}}{=} \frac{\sigma_p(f)}{L_p(f)}$, $p \ge 2$.   (2.14)

It is clear, for example, that $\gamma_2(d_2) = 1$. On the other hand, from (2.7) and (2.13) we conclude that $\gamma_3(d_3) = \frac{1}{4}$.

3 Cubic regularization of the Newton iteration

In this section we present the most important properties of the cubic regularization of Newton's method. As compared with [6], here we take into account the convexity of the objective function. The main problem of interest is as follows:

$\min_{x \in E} f(x)$,   (3.1)
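Lemma 5 can be checked directly, since with $B = I$ and $x_0 = 0$ the Hessian of $d_3$ has the explicit form $\|z\| I + z z^{\mathsf T} / \|z\|$. A small numerical sketch (sample points are ours):

```python
import numpy as np

# Check of Lemma 5 with B = I, x0 = 0: the Hessian of d3(z) = ||z||^3/3 is
#   H(z) = ||z|| I + z z^T / ||z||,
# and it is Lipschitz continuous with constant 2 in the spectral norm.

rng = np.random.default_rng(0)

def hess_d3(z):
    nz = np.linalg.norm(z)
    return nz * np.eye(len(z)) + np.outer(z, z) / nz

worst = 0.0
for _ in range(2000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    gap = np.linalg.norm(hess_d3(x) - hess_d3(y), 2)   # spectral norm
    worst = max(worst, gap - 2.0 * np.linalg.norm(x - y))
assert worst <= 1e-9
print("Hessian-Lipschitz bound for d3 verified")
```

The constant $2$ is tight: it is approached for antipodal points $x = -y$.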

where $f$ is a twice differentiable convex function with Lipschitz-continuous Hessian. As suggested in [6], we introduce the following mapping:

$T_M(x) \stackrel{\mathrm{def}}{=} \arg\min_{y \in E} \left[ \hat{f}_M(x; y) \stackrel{\mathrm{def}}{=} f_2(x; y) + \frac{M}{6} \|y - x\|^3 \right]$.   (3.2)

Note that $T = T_M(x)$ is the unique solution of the following system:

$\nabla f(x) + \nabla^2 f(x)(T - x) + \frac{M}{2} \|T - x\| \cdot B (T - x) = 0$.   (3.3)

Denote $r_M(x) = \|x - T_M(x)\|$. Then, by (3.3) and (2.11),

$\|\nabla f(T)\|_* = \left\| \nabla f(T) - \nabla f(x) - \nabla^2 f(x)(T - x) - \frac{M}{2} r_M(x) B(T - x) \right\|_* \le \frac{L + M}{2} \, r_M^2(x)$.   (3.4)

Further, multiplying (3.3) by $T - x$, we obtain

$\langle \nabla f(x), x - T \rangle = \langle \nabla^2 f(x)(T - x), T - x \rangle + \frac{M}{2} r_M^3(x)$.   (3.5)

Let us assume $M \ge L$. Then, in view of (2.12), we have

$f(x) - f(T) \ge f(x) - \hat{f}_M(x; T) = \langle \nabla f(x), x - T \rangle - \frac{1}{2} \langle \nabla^2 f(x)(T - x), T - x \rangle - \frac{M}{6} r_M^3(x) \stackrel{(3.5)}{=} \frac{1}{2} \langle \nabla^2 f(x)(T - x), T - x \rangle + \frac{M}{3} r_M^3(x)$.   (3.6)

In particular, since $f$ is convex,

$f(x) - f(T) \stackrel{(3.6)}{\ge} \frac{M}{3} r_M^3(x) \stackrel{(3.4)}{\ge} \frac{M}{3} \left[ \frac{2}{L + M} \|\nabla f(T)\|_* \right]^{3/2}$.   (3.7)

Sometimes we need to interpret this step from a global perspective:

$f(T_M(x)) \stackrel{(M \ge L)}{\le} \min_y \left[ f_2(x; y) + \frac{M}{6} \|y - x\|^3 \right] \stackrel{(2.12)}{\le} \min_y \left[ f(y) + \frac{L + M}{6} \|y - x\|^3 \right]$.   (3.8)

Finally, let us prove the following result.

Lemma 6. If $M \ge 2L$, then

$\langle \nabla f(T), x - T \rangle \ge \left( \frac{2}{L + M} \right)^{1/2} \|\nabla f(T)\|_*^{3/2}$.   (3.9)

Proof: Denote $T = T_M(x)$ and $r = r_M(x)$. Then

$\frac{L^2}{4} r^4 = \left( \frac{L}{2} \|T - x\|^2 \right)^2 \stackrel{(2.11)}{\ge} \|\nabla f(T) - \nabla f(x) - \nabla^2 f(x)(T - x)\|_*^2 \stackrel{(3.3)}{=} \left\| \nabla f(T) + \frac{M}{2} r B(T - x) \right\|_*^2$
$= \|\nabla f(T)\|_*^2 + M r \langle \nabla f(T), T - x \rangle + \frac{M^2}{4} r^4$.
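For $B = I$, the regularized step is computable by elementary means: $T$ satisfies $T = x - \left( \nabla^2 f(x) + \frac{M r}{2} I \right)^{-1} \nabla f(x)$ with $r = \|T - x\|$, and for convex $f$ the map $r \mapsto \|(\nabla^2 f(x) + \frac{M r}{2} I)^{-1} \nabla f(x)\|$ is nonincreasing, so the scalar $r$ can be found by bisection. A minimal sketch (helper names and the test function are ours, not the paper's):

```python
import numpy as np

def cubic_newton_step(grad, hess, x, M):
    """Solve the stationarity system of the cubic model with B = I:
    g + H d + (M/2)*||d||*d = 0, returning T = x + d."""
    g, H = grad(x), hess(x)
    if np.linalg.norm(g) == 0.0:
        return x.copy()
    n = len(x)
    step = lambda r: np.linalg.solve(H + 0.5 * M * r * np.eye(n), g)
    # phi(r) = ||step(r)|| - r decreases and changes sign on [0, ||step(0)||]
    lo, hi = 0.0, np.linalg.norm(step(0.0))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(step(mid)) > mid:
            lo = mid
        else:
            hi = mid
    return x - step(0.5 * (lo + hi))

# Example: f(x) = sum |x_i|^3/6 + 0.5*||x - a||^2 (Hessian-Lipschitz, L = 1).
a = np.array([1.0, -2.0, 0.5])
grad = lambda x: 0.5 * x * np.abs(x) + (x - a)
hess = lambda x: np.diag(np.abs(x)) + np.eye(3)

x, M = np.zeros(3), 1.0
T = cubic_newton_step(grad, hess, x, M)
d = T - x
# residual of the first-order stationarity condition
res = grad(x) + hess(x) @ d + 0.5 * M * np.linalg.norm(d) * d
assert np.linalg.norm(res) < 1e-8
print("step computed, ||T - x|| =", np.linalg.norm(d))
```

This one-dimensional root-finding view is only a convenient implementation for the positive-definite case; it is not the only way to solve the subproblem.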

Hence,

$\langle \nabla f(T), x - T \rangle \ge \frac{1}{M r} \|\nabla f(T)\|_*^2 + \frac{1}{4M} (M^2 - L^2) r^3$.   (3.10)

In view of the conditions of the lemma, we can estimate the derivative in $r$ of the right-hand side of the latter inequality on the feasible ray (3.4):

$-\frac{1}{M r^2} \|\nabla f(T)\|_*^2 + \frac{3 (M^2 - L^2)}{4M} r^2 \stackrel{(3.4)}{\ge} -\frac{L + M}{2M} \|\nabla f(T)\|_* + \frac{3 (M - L)}{2M} \|\nabla f(T)\|_* = \frac{2M - 4L}{2M} \|\nabla f(T)\|_* \ge 0$.

Thus, the right-hand side of (3.10) is nondecreasing in $r$, and its minimum is attained at the boundary point $r = \left[ \frac{2}{L + M} \|\nabla f(T)\|_* \right]^{1/2}$ of the feasible ray (3.4). Substituting this value in (3.10), we obtain (3.9).

To conclude this section, let us estimate the rate of convergence of CNM as applied to our basic problem (3.1). We assume that a solution $x^*$ of this problem exists, and that the Lipschitz constant $L$ for the Hessian of the objective function is known. Thus, we just iterate

$x_{k+1} = T_L(x_k)$, $k = 0, 1, \dots$.   (3.11)

Using the same arguments as in [6], we can prove the following statement.

Theorem 1. Assume that the level sets of the problem (3.1) are bounded:

$\|x - x^*\| \le D$ for all $x$ with $f(x) \le f(x_0)$.   (3.12)

If the sequence $\{x_k\}_{k \ge 1}$ is generated by (3.11), then

$f(x_k) - f(x^*) \le \frac{9 L D^3}{(k + 4)^2}$, $k \ge 1$.   (3.13)

Proof: In view of (3.6), $f(x_{k+1}) \le f(x_k)$ for all $k \ge 0$. Thus, $\|x_k - x^*\| \le D$ for all $k \ge 0$. Further, in view of (3.8), we have

$f(x_1) \le f(x^*) + \frac{L}{3} D^3$.   (3.14)

Consider now an arbitrary $k \ge 1$. Denote $x_k(\tau) = x^* + (1 - \tau)(x_k - x^*)$. In view of (3.8), for any $\tau \in [0, 1]$ we have

$f(x_{k+1}) \le f(x_k(\tau)) + \frac{L}{3} \tau^3 \|x_k - x^*\|^3 \le f(x_k) - \tau (f(x_k) - f(x^*)) + \frac{L}{3} \tau^3 D^3$.

The minimum of the right-hand side is attained for $\tau^* = \left[ \frac{f(x_k) - f(x^*)}{L D^3} \right]^{1/2} \le \left[ \frac{f(x_1) - f(x^*)}{L D^3} \right]^{1/2} \stackrel{(3.14)}{<} 1$. Thus, for any $k \ge 1$ we have

$f(x_{k+1}) \le f(x_k) - \frac{2 \, (f(x_k) - f(x^*))^{3/2}}{3 \, (L D^3)^{1/2}}$.   (3.15)

Denote $\delta_k = f(x_k) - f(x^*)$. Then

$\frac{1}{\delta_{k+1}^{1/2}} - \frac{1}{\delta_k^{1/2}} = \frac{\delta_k^{1/2} - \delta_{k+1}^{1/2}}{\delta_k^{1/2} \delta_{k+1}^{1/2}} = \frac{\delta_k - \delta_{k+1}}{\delta_k^{1/2} \delta_{k+1}^{1/2} \left( \delta_k^{1/2} + \delta_{k+1}^{1/2} \right)} \stackrel{(3.15)}{\ge} \frac{2 \, \delta_k^{3/2}}{3 (L D^3)^{1/2} \, \delta_k^{1/2} \delta_{k+1}^{1/2} \left( \delta_k^{1/2} + \delta_{k+1}^{1/2} \right)} \ge \frac{1}{3 (L D^3)^{1/2}}$.
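The iteration just introduced is easy to simulate. A compact sketch of the plain cubic-regularized Newton process on our strongly convex test function (with $B = I$ and $L = 1$ known exactly) shows the predicted monotone decrease of $f$:

```python
import numpy as np

def cubic_newton_step(grad, hess, x, M):
    # solve the cubic-model stationarity system (B = I) by bisection on ||d||
    g, H = grad(x), hess(x)
    if np.linalg.norm(g) == 0.0:
        return x.copy()
    n = len(x)
    step = lambda r: np.linalg.solve(H + 0.5 * M * r * np.eye(n), g)
    lo, hi = 0.0, np.linalg.norm(step(0.0))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(step(mid)) > mid:
            lo = mid
        else:
            hi = mid
    return x - step(0.5 * (lo + hi))

# test problem: f(x) = sum |x_i|^3/6 + 0.5*||x - a||^2, Hessian-Lipschitz L = 1
a = np.array([1.0, -2.0, 0.5])
f = lambda x: np.sum(np.abs(x) ** 3) / 6.0 + 0.5 * np.sum((x - a) ** 2)
grad = lambda x: 0.5 * x * np.abs(x) + (x - a)
hess = lambda x: np.diag(np.abs(x)) + np.eye(3)

L = 1.0
x = np.zeros(3)
values = [f(x)]
for k in range(30):                 # the basic process: x_{k+1} = T_L(x_k)
    x = cubic_newton_step(grad, hess, x, L)
    values.append(f(x))

assert all(v2 <= v1 + 1e-12 for v1, v2 in zip(values, values[1:]))  # monotone
assert np.linalg.norm(grad(x)) < 1e-6                               # stationary
print("f decreased from", values[0], "to", values[-1])
```

On this strongly convex example the process also illustrates the local quadratic convergence discussed later: the gradient norm collapses after a handful of iterations.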

Thus, for any $k \ge 1$ we have

$\frac{1}{\delta_k^{1/2}} \ge \frac{1}{\delta_1^{1/2}} + \frac{k - 1}{3 (L D^3)^{1/2}} \stackrel{(3.14)}{\ge} \frac{1}{(L D^3)^{1/2}} \left( 3^{1/2} + \frac{k - 1}{3} \right) \ge \frac{k + 4}{3 (L D^3)^{1/2}}$,

and (3.13) follows.

4 Accelerated scheme

In order to accelerate method (3.11), we apply a variant of the technique of estimate sequences, which was described in Section 2.2.1 of [5] as a tool for accelerating the usual gradient method. In our situation, this idea can be applied to CNM in different ways. We mention only two of them.

1. Linear estimate functions. For solving the optimization problem (3.1), we recursively update the following sequences. A sequence of estimate functions

$\psi_k(x) = l_k(x) + \frac{N}{6} \|x - x_0\|^3$, $k = 1, 2, \dots$,

where the $l_k(x)$ are linear functions in $x \in E$, and $N$ is a positive real parameter. A minimizing sequence $\{x_k\}_{k \ge 1}$. A sequence of scaling parameters $\{A_k\}_{k \ge 1}$:

$A_{k+1} \stackrel{\mathrm{def}}{=} A_k + a_k$, $k \ge 1$.

For these objects, we are going to maintain the following relations:

$\mathcal{R}^1_k$:  $A_k f(x_k) \le \psi_k^* \equiv \min_x \psi_k(x)$,
$\mathcal{R}^2_k$:  $\psi_k(x) \le A_k f(x) + \frac{2L + N}{6} \|x - x_0\|^3$, $x \in E$,   for $k \ge 1$.   (4.1)

Let us ensure that relations (4.1) hold for $k = 1$. We choose

$x_1 = T_L(x_0)$, $l_1(x) \equiv f(x_1)$, $x \in E$, $A_1 = 1$.   (4.2)

Then $\psi_1^* = f(x_1)$, so $\mathcal{R}^1_1$ holds. On the other hand, in view of (3.8), we get

$\psi_1(x) = f(x_1) + \frac{N}{6} \|x - x_0\|^3 \stackrel{(3.8)}{\le} \left[ \min_y f(y) + \frac{2L}{6} \|y - x_0\|^3 \right] + \frac{N}{6} \|x - x_0\|^3$,

and $\mathcal{R}^2_1$ follows. Assume now that relations (4.1) hold for some $k \ge 1$. Denote

$v_k = \arg\min_x \psi_k(x)$.

Let us choose some $a_k > 0$ and $M \ge L$. Define

$\alpha_k = \frac{a_k}{A_k + a_k}$,  $y_k = (1 - \alpha_k) x_k + \alpha_k v_k$,  $x_{k+1} = T_M(y_k)$,
$\psi_{k+1}(x) = \psi_k(x) + a_k \left[ f(x_{k+1}) + \langle \nabla f(x_{k+1}), x - x_{k+1} \rangle \right]$.   (4.3)

In view of $\mathcal{R}^2_k$ and the convexity of $f$, for any $x \in E$ we have

$\psi_{k+1}(x) \le A_k f(x) + \frac{2L + N}{6} \|x - x_0\|^3 + a_k \left[ f(x_{k+1}) + \langle \nabla f(x_{k+1}), x - x_{k+1} \rangle \right] \le (A_k + a_k) f(x) + \frac{2L + N}{6} \|x - x_0\|^3$,

and this is $\mathcal{R}^2_{k+1}$. Let us show now that, for an appropriate choice of $a_k$, $N$ and $M$, relation $\mathcal{R}^1_{k+1}$ is also valid. Indeed, in view of $\mathcal{R}^1_k$ and by Lemma 4 with $p = 3$, for any $x \in E$ we have

$\psi_k(x) = l_k(x) + \frac{N}{2} d_3(x) \ge \psi_k^* + \frac{N}{12} \|x - v_k\|^3 \ge A_k f(x_k) + \frac{N}{12} \|x - v_k\|^3$.   (4.4)

Therefore

$\psi_{k+1}^* = \min_x \left\{ \psi_k(x) + a_k \left[ f(x_{k+1}) + \langle \nabla f(x_{k+1}), x - x_{k+1} \rangle \right] \right\}$
$\stackrel{(4.4)}{\ge} \min_x \left\{ A_k f(x_k) + \frac{N}{12} \|x - v_k\|^3 + a_k \left[ f(x_{k+1}) + \langle \nabla f(x_{k+1}), x - x_{k+1} \rangle \right] \right\}$
$\ge \min_x \left\{ (A_k + a_k) f(x_{k+1}) + A_k \langle \nabla f(x_{k+1}), x_k - x_{k+1} \rangle + a_k \langle \nabla f(x_{k+1}), x - x_{k+1} \rangle + \frac{N}{12} \|x - v_k\|^3 \right\}$
$\stackrel{(4.3)}{=} \min_x \left\{ A_{k+1} f(x_{k+1}) + \langle \nabla f(x_{k+1}), A_{k+1} y_k - a_k v_k - A_k x_{k+1} - a_k x_{k+1} + a_k v_k \rangle + a_k \langle \nabla f(x_{k+1}), x - v_k \rangle + \frac{N}{12} \|x - v_k\|^3 \right\}$
$= \min_x \left\{ A_{k+1} f(x_{k+1}) + A_{k+1} \langle \nabla f(x_{k+1}), y_k - x_{k+1} \rangle + a_k \langle \nabla f(x_{k+1}), x - v_k \rangle + \frac{N}{12} \|x - v_k\|^3 \right\}$.

Further, if we choose $M \ge 2L$, then by (3.9) we have

$\langle \nabla f(x_{k+1}), y_k - x_{k+1} \rangle \ge \left( \frac{2}{L + M} \right)^{1/2} \|\nabla f(x_{k+1})\|_*^{3/2}$.

Hence, our choice of parameters must ensure the following inequality:

$A_{k+1} \left( \frac{2}{L + M} \right)^{1/2} \|\nabla f(x_{k+1})\|_*^{3/2} + a_k \langle \nabla f(x_{k+1}), x - v_k \rangle + \frac{N}{12} \|x - v_k\|^3 \ge 0$

for all $x \in E$. Using inequality (2.3) with $p = 3$, $s = a_k \nabla f(x_{k+1})$, and $\sigma = \frac{N}{4}$, we come to the following condition:

$A_{k+1} \left( \frac{2}{L + M} \right)^{1/2} \ge \frac{4}{3 N^{1/2}} \, a_k^{3/2}$.   (4.5)

For $k \ge 1$, let us choose

$A_k = \frac{k(k+1)(k+2)}{6}$,  $a_k = A_{k+1} - A_k = \frac{(k+1)(k+2)(k+3)}{6} - \frac{k(k+1)(k+2)}{6} = \frac{(k+1)(k+2)}{2}$.   (4.6)

Since

$\frac{a_k^{3/2}}{A_{k+1}} = \frac{6 \, [(k+1)(k+2)]^{3/2}}{2^{3/2} (k+1)(k+2)(k+3)} = \frac{3 \, [(k+1)(k+2)]^{1/2}}{2^{1/2} (k+3)} \le \frac{3}{2^{1/2}}$,

inequality (4.5) leads to the following condition on the parameters: $N \ge 4(L + M)$. Hence, we can choose

$M = 2L$,  $N = 4(L + M) = 12L$.   (4.7)

In this case $2L + N = 14L$. Now we are ready to put all the pieces together.

Accelerated cubic regularization of Newton's method

Initialization: Choose $x_0 \in E$. Set $M = 2L$ and $N = 12L$. Compute $x_1 = T_L(x_0)$ and define $\psi_1(x) = f(x_1) + \frac{N}{6} \|x - x_0\|^3$.

Iteration $k$ ($k \ge 1$):
1. Compute $v_k = \arg\min_{x \in E} \psi_k(x)$ and choose $y_k = \frac{k}{k+3} x_k + \frac{3}{k+3} v_k$.
2. Compute $x_{k+1} = T_M(y_k)$ and update
$\psi_{k+1}(x) = \psi_k(x) + \frac{(k+1)(k+2)}{2} \left[ f(x_{k+1}) + \langle \nabla f(x_{k+1}), x - x_{k+1} \rangle \right]$.   (4.8)

The above discussion proves the following theorem.
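The multi-step scheme can be sketched in a few lines once the regularized step is available. The sketch below assumes $B = I$ and an exactly known $L$; it keeps only the gradient $s_k$ of the linear part of $\psi_k$, since the minimizer $v_k$ is then available in closed form. The test function and reference solution are our own illustrative choices:

```python
import numpy as np

def cubic_newton_step(grad, hess, x, M):
    # minimize the cubic model f2(x; .) + (M/6)||.-x||^3 with B = I
    g, H = grad(x), hess(x)
    if np.linalg.norm(g) == 0.0:
        return x.copy()
    n = len(x)
    step = lambda r: np.linalg.solve(H + 0.5 * M * r * np.eye(n), g)
    lo, hi = 0.0, np.linalg.norm(step(0.0))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(step(mid)) > mid:
            lo = mid
        else:
            hi = mid
    return x - step(0.5 * (lo + hi))

a = np.array([1.0, -2.0, 0.5])
f = lambda x: np.sum(np.abs(x) ** 3) / 6.0 + 0.5 * np.sum((x - a) ** 2)
grad = lambda x: 0.5 * x * np.abs(x) + (x - a)
hess = lambda x: np.diag(np.abs(x)) + np.eye(3)

L = 1.0
M, N = 2 * L, 12 * L                         # the parameter choice above
x0 = np.zeros(3)

xk = cubic_newton_step(grad, hess, x0, L)    # x_1 = T_L(x_0)
s = np.zeros(3)                              # gradient of the linear part l_k
K = 25
for k in range(1, K):
    ns = np.linalg.norm(s)
    # v_k = x_0 - (2/(N*||s||))^(1/2) * s  (v_1 = x_0, since s = 0)
    v = x0 if ns == 0.0 else x0 - np.sqrt(2.0 / (N * ns)) * s
    y = (k * xk + 3.0 * v) / (k + 3.0)
    xk = cubic_newton_step(grad, hess, y, M)
    s = s + 0.5 * (k + 1) * (k + 2) * grad(xk)

# reference minimizer via a long run of the plain process
x_star = x0.copy()
for _ in range(100):
    x_star = cubic_newton_step(grad, hess, x_star, L)
f_star = f(x_star)

gap = f(xk) - f_star
bound = 14 * L * np.linalg.norm(x0 - x_star) ** 3 / (K * (K + 1) * (K + 2))
assert -1e-9 <= gap <= bound      # the O(1/k^3) guarantee at k = K
print("gap:", gap, " guaranteed bound:", bound)
```

In practice the observed gap is far below the worst-case bound; the assertion only confirms that the guarantee is respected on this instance.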

Theorem 2. If the sequence $\{x_k\}_{k \ge 1}$ is generated by method (4.8) as applied to the problem (3.1), then for any $k \ge 1$ we have

$f(x_k) - f(x^*) \le \frac{14 L \|x_0 - x^*\|^3}{k(k+1)(k+2)}$,   (4.9)

where $x^*$ is an optimal solution to the problem.

Proof: Indeed, we have shown that

$A_k f(x_k) \stackrel{\mathcal{R}^1_k}{\le} \psi_k^* \stackrel{\mathcal{R}^2_k}{\le} A_k f(x^*) + \frac{2L + N}{6} \|x_0 - x^*\|^3$.

Thus, (4.9) follows from (4.6) and (4.7).

Note that the point $v_k$ in (4.8) can be found by an explicit formula. Consider $s_k = \nabla l_k(x)$. This vector does not depend on $x$ since the function $l_k(x)$ is linear. Then

$v_k = x_0 - \left[ \frac{2}{N \|s_k\|_*} \right]^{1/2} B^{-1} s_k$.

2. Quadratic estimate functions. A similar analysis can be applied to a variant of scheme (4.8) based on quadratic estimate functions. Indeed, let us define

$\psi_k(x) = q_k(x) + \frac{N}{6} \|x - x_0\|^3$, $k = 1, 2, \dots$,

where the $q_k(x)$ are some quadratic functions. If we choose

$x_1 = T_L(x_0)$, $q_1(x) = f_2(x_0; x)$, $A_1 = 1$,   (4.10)

then, for any $N \ge L$, we can guarantee that the conditions

$A_k f(x_k) \stackrel{(3.8)}{\le} \psi_k^*$,  $\psi_k(x) \stackrel{(2.12)}{\le} A_k f(x) + \frac{L + N}{6} \|x - x_0\|^3$, $x \in E$,

hold for $k = 1$. Then, by the same arguments as above, we can see that the rules (4.3) maintain these conditions for all $k \ge 1$, provided that $N \ge 4(L + M)$. Thus, the new version of (4.8) has a slightly better constant in the estimate of the rate of convergence:

$f(x_k) - f(x^*) \le \frac{13 L \|x_0 - x^*\|^3}{k(k+1)(k+2)}$.

However, the rules for computing $v_k$ become more complicated.

5 Global non-degeneracy for second-order schemes

Traditionally, in numerical analysis the term non-degenerate is applied to certain classes of efficiently solvable problems. For unconstrained optimization, non-degeneracy of the objective function is usually characterized by a lower bound on the angle between the gradient at point $x$ and the direction pointing to the optimal solution:

$\alpha(x) \stackrel{\mathrm{def}}{=} \frac{\langle \nabla f(x), x - x^* \rangle}{\|\nabla f(x)\|_* \cdot \|x - x^*\|} \ge \mu(f) > 0$, $x \in E$.   (5.1)

This condition has a nice geometric interpretation. Moreover, there exists a large class of smooth convex functions possessing the required property. This is the class of strongly convex functions with Lipschitz-continuous gradient.

Lemma 7. $\mu(f) \ge \frac{2 \gamma_2^{1/2}(f)}{1 + \gamma_2(f)} \ge \gamma_2^{1/2}(f)$.

Indeed, denoting $\sigma = \sigma_2(f)$ and $L_2 = L_2(f)$, in view of the corresponding inequality for this class in Section 2.1 of [5] we have

$\langle \nabla f(x), x - x^* \rangle \ge \frac{1}{\sigma + L_2} \|\nabla f(x)\|_*^2 + \frac{\sigma L_2}{\sigma + L_2} \|x - x^*\|^2 \ge \frac{2 (\sigma L_2)^{1/2}}{\sigma + L_2} \|\nabla f(x)\|_* \cdot \|x - x^*\|$,

and this proves the required inequality.

Note that the complexity of the first-order schemes on the class of smooth strongly convex functions can be completely characterized in terms of the condition number $\gamma_2$. Indeed, on the one hand, the lower complexity bound for finding an $\epsilon$-solution of any problem from this class is proven to be

$O\left( \left[ \frac{1}{\gamma_2(f)} \right]^{1/2} \ln \frac{L_2 D^2}{\epsilon} \right)$   (5.2)

calls of the oracle, where the constant $D$ bounds the distance between the initial point and the optimal solution. On the other hand, there exist simple numerical schemes which exhibit the required rate of convergence (see Chapter 2 in [5] for details).

What can be said about the complexity of the above problem class for the second-order schemes? Surprisingly enough, in this situation it is quite difficult to find any advantages in condition (5.1). We will discuss the complexity of this class in detail later, in Section 6. Now let us present a new non-degeneracy condition, which replaces (5.1) for the second-order methods.

Assume that $\gamma_3(f) = \frac{\sigma_3(f)}{L_3(f)} > 0$. In this case,

$f(x) - f(x^*) \stackrel{(2.4)}{\le} \frac{2}{3} \left( \frac{1}{\sigma_3} \right)^{1/2} \|\nabla f(x)\|_*^{3/2}$.   (5.3)

Therefore, for method (3.11) we have

$f(x_k) - f(x_{k+1}) \stackrel{(3.7)}{\ge} \frac{1}{3 L^{1/2}} \|\nabla f(x_{k+1})\|_*^{3/2} \stackrel{(5.3)}{\ge} \frac{1}{2} \gamma_3^{1/2}(f) \, (f(x_{k+1}) - f(x^*))$.   (5.4)

Hence, for any $k \ge 1$ we have

$f(x_k) - f(x^*) \stackrel{(5.4)}{\le} \left( 1 + \frac{1}{2} \gamma_3^{1/2}(f) \right)^{-(k-1)} (f(x_1) - f(x^*)) \le e^{-\frac{(k-1) \gamma_3^{1/2}(f)}{2 + \gamma_3^{1/2}(f)}} \cdot \frac{L}{3} \|x_0 - x^*\|^3$,   (5.5)

where the last factor follows from (3.8). Thus, the complexity of minimizing a function with positive third-degree condition number $\gamma_3(f)$ by method (3.11) is of the order

$O\left( \left[ \frac{1}{\gamma_3(f)} \right]^{1/2} \ln \frac{L D^3}{\epsilon} \right)$   (5.6)

calls of the oracle. The structure of this estimate is similar to that of (5.2). Therefore, we will say that such functions possess the global second-order non-degeneracy property.

Let us demonstrate that the accelerated variant of Newton's method (4.8) can be used to improve the complexity estimate (5.6). Denote by $\mathcal{A}_k(x_0)$, $k \ge 1$, the point $x_k$ generated by method (4.8) with starting point $x_0$. Consider the following process:

1. Define $m = \left\lceil 15 \left[ \frac{1}{\gamma_3(f)} \right]^{1/3} \right\rceil$, and set $y_0 = x_0$.
2. For $k \ge 0$, iterate $y_{k+1} = \mathcal{A}_m(y_k)$.   (5.7)

The performance of this scheme can be derived from the following lemma.

Lemma 8. For any $k \ge 0$ we have $\|y_{k+1} - x^*\| \le \frac{1}{e} \|y_k - x^*\|$.

Proof: Indeed, since $m \ge 4e \left[ \frac{1}{\gamma_3(f)} \right]^{1/3}$, we have

$\frac{\sigma_3}{3} \|y_{k+1} - x^*\|^3 \stackrel{(2.1)}{\le} f(y_{k+1}) - f(x^*) \stackrel{(4.9)}{\le} \frac{14 L \|y_k - x^*\|^3}{m(m+1)(m+2)} \le \frac{\sigma_3}{3 e^3} \|y_k - x^*\|^3$.

Thus,

$f(T_L(y_k)) - f(x^*) \stackrel{(3.8)}{\le} \frac{L}{3} \|y_k - x^*\|^3 \le \frac{L}{3} \|y_0 - x^*\|^3 e^{-3k}$,

and we conclude that an $\epsilon$-solution of our problem can be found by (5.7) in

$O\left( \left[ \frac{1}{\gamma_3(f)} \right]^{1/3} \ln \left[ \frac{L}{\epsilon} \|x_0 - x^*\|^3 \right] \right)$   (5.8)

iterations. Unfortunately, since the complexity theory for this problem class is not developed yet, we cannot say how far our results are from the best possible ones.

6 Minimizing strongly convex functions

Let us now look at the complexity of problem (3.1) with

$\sigma_2(f) > 0$, $L_3(f) < \infty$.   (6.1)

The main advantage of such functions consists in the possibility for Newton's method (3.11) to converge quadratically in a certain neighborhood of the solution. Indeed, for $T = T_L(x)$ we have

$f(x) - f(T) \stackrel{(3.6)}{\ge} \frac{1}{2} \langle \nabla^2 f(x)(T - x), T - x \rangle \stackrel{(2.5)}{\ge} \frac{\sigma_2}{2} r_L^2(x) \stackrel{(3.4)}{\ge} \frac{\sigma_2}{2L} \|\nabla f(T)\|_* \stackrel{(2.4)}{\ge} \frac{\sigma_2}{2L} \left[ 2 \sigma_2 (f(T) - f(x^*)) \right]^{1/2}$.   (6.2)

Hence,

$f(T) - f(x^*) \le \frac{2 L^2}{\sigma_2^3} (f(x) - f(T))^2 \le \frac{2 L^2}{\sigma_2^3} (f(x) - f(x^*))^2$,   (6.3)

and the region of quadratic convergence of method (3.11) can be defined as

$Q_f = \left\{ x \in E : \ f(x) - f(x^*) \le \frac{\sigma_2^3}{2 L^2} \right\}$.   (6.4)

Alternatively, the region of quadratic convergence can be described in terms of the norm of the gradients. Indeed,

$\frac{\sigma_2}{2} r_L^2(x) \stackrel{(2.5)}{\le} \frac{1}{2} \langle \nabla^2 f(x)(T - x), T - x \rangle \stackrel{(3.6)}{\le} f(x) - f(T) \le \|\nabla f(x)\|_* \cdot r_L(x)$.

Thus,

$\|\nabla f(x)\|_* \ge \frac{\sigma_2}{2} r_L(x) \stackrel{(3.4)}{\ge} \frac{\sigma_2}{2} \left[ \frac{1}{L} \|\nabla f(T)\|_* \right]^{1/2}$.

Consequently,

$\|\nabla f(T)\|_* \le \frac{4 L}{\sigma_2^2} \|\nabla f(x)\|_*^2$,   (6.5)

and the required region of quadratic convergence can be defined as

$Q_g = \left\{ x \in E : \ \|\nabla f(x)\|_* \le \frac{\sigma_2^2}{4 L} \right\}$.   (6.6)

Thus, the global complexity of problem (3.1), (6.1) is mainly related to the number of iterations required to get from $x_0$ to the region $Q_f$ (or to $Q_g$). For method (3.11), this value can be estimated from above by

$O\left( \left[ \frac{L_3(f) D}{\sigma_2(f)} \right]^{1/2} \right)$,   (6.7)

where $D$ is defined by (3.12) (see Section 6 in [6]). Let us show that, using the accelerated scheme (4.8), it is possible to improve this complexity bound. Assume that we know an upper bound for the distance to the solution: $\|x_0 - x^*\| \le R$ ($\le D$).

Consider the following process:

1. Set $y_0 = T_L(x_0)$, and define $m_0 = \left\lceil 5 \left[ \frac{L_3(f) R}{\sigma_2(f)} \right]^{1/3} \right\rceil$.
2. While $\|\nabla f(T_L(y_k))\|_* > \frac{\sigma_2^2}{4L}$, iterate $\{ \, y_{k+1} = \mathcal{A}_{m_k}(y_k)$, $m_{k+1} = 2^{-1/3} m_k \, \}$.   (6.8)

Theorem 3. The process (6.8) terminates after at most

$\left\lceil \frac{1}{\ln 4} \ln \frac{8 L_3^3(f) R^3}{3 \sigma_2^3(f)} \right\rceil$   (6.9)

stages. The total number of Newton steps in all stages does not exceed $5 m_0$.

Proof: Denote $R_k = R \cdot 2^{-k}$. It is clear that

$m_k \ge 5 \left[ \frac{L_3(f) R_k}{\sigma_2(f)} \right]^{1/3}$, $k \ge 0$.   (6.10)

For $k \ge 0$, let us prove by induction that

$\|y_k - x^*\| \le R_k$.   (6.11)

Assume that for some $k \ge 0$ this statement is valid (it is true for $k = 0$). Then

$\frac{\sigma_2}{2} \|y_{k+1} - x^*\|^2 \stackrel{(2.1)}{\le} f(y_{k+1}) - f(x^*) \stackrel{(4.9)}{\le} \frac{14 L R_k^3}{m_k (m_k + 1)(m_k + 2)} \stackrel{(6.10)}{\le} \frac{14}{125} \sigma_2 R_k^2 \le \frac{\sigma_2}{8} R_k^2 = \frac{\sigma_2}{2} R_{k+1}^2$.

Thus, (6.11) is valid for all $k \ge 0$. On the other hand,

$f(y_{k+1}) - f(x^*) \stackrel{(4.9)}{\le} \frac{14 L \|y_k - x^*\|^3}{m_k (m_k + 1)(m_k + 2)} \stackrel{(6.11)}{\le} \frac{14 L \|y_k - x^*\|^2 R_k}{m_k (m_k + 1)(m_k + 2)} \stackrel{(6.10)}{\le} \frac{14}{125} \sigma_2 \|y_k - x^*\|^2 \stackrel{(2.1)}{\le} \frac{1}{4} (f(y_k) - f(x^*))$.   (6.12)

Hence,

$\frac{\sigma_2}{2L} \|\nabla f(T_L(y_k))\|_* \le f(y_k) - f(T_L(y_k)) \le f(y_k) - f(x^*) \le \left( \frac{1}{4} \right)^k (f(y_0) - f(x^*)) \stackrel{(3.8)}{\le} \left( \frac{1}{4} \right)^k \frac{L R^3}{3}$,

and (6.9) follows from (6.6). Finally, the total number of Newton steps does not exceed

$\sum_{k \ge 0} m_k = m_0 \sum_{k \ge 0} 2^{-k/3} = \frac{m_0}{1 - 2^{-1/3}} < 5 m_0$.

7 Fake acceleration

Note that the properties of the class of smooth strongly convex functions (6.1) leave some space for erroneous conclusions related to the rate of convergence of optimization methods at the first stage of the process, aiming to enter the region of quadratic convergence of Newton's method. Let us demonstrate a possible mistake on a particular example.

Consider a modified version $\hat{\mathcal{M}}$ of the method (4.8). The only modification is introduced in Step 2. Now it looks as follows:

2'. Compute $\hat{y}_k = T_M(y_k)$ and update
$\psi_{k+1}(x) = \psi_k(x) + \frac{(k+1)(k+2)}{2} \left[ f(\hat{y}_k) + \langle \nabla f(\hat{y}_k), x - \hat{y}_k \rangle \right]$.   (7.1)
Choose $\hat{x}_k$: $f(\hat{x}_k) = \min\{ f(x_k), f(\hat{y}_k) \}$. Set $x_{k+1} = T_M(\hat{x}_k)$.

Note that for $\hat{\mathcal{M}}$ the statement of Theorem 2 remains valid. Moreover, the process now becomes monotone, and, using the same reasoning as in (6.2) and $M = 2L$, we obtain

$f(x_k) - f(x_{k+1}) \ge f(\hat{x}_k) - f(x_{k+1}) \ge \frac{2^{1/2} \sigma_2^{3/2}}{3 L} \left[ f(x_{k+1}) - f(x^*) \right]^{1/2}$.   (7.2)

Further, let us fix the number of steps $N$. Define $\hat{k} = \frac{N}{2}$. Then, in view of (4.9), we can guarantee that

$f(x_{\hat{k}}) - f(x^*) \le \frac{112 \, L R^3}{N^3}$.   (7.3)

On the other hand,

$f(x_{\hat{k}}) - f(x^*) \ge f(x_{\hat{k}}) - f(x_{N+1}) \stackrel{(7.2)}{\ge} \frac{N}{2} \cdot \frac{2^{1/2} \sigma_2^{3/2}}{3 L} \left[ f(x_{N+1}) - f(x^*) \right]^{1/2}$.   (7.4)

Combining (7.3) and (7.4), we obtain

$f(x_{N+1}) - f(x^*) \le \frac{2^9 \cdot 3^2 \cdot 7^2 \, L^4 R^6}{\sigma_2^3 N^8}$.   (7.5)

As compared with (4.9), the proposed modification looks amazingly efficient. However, this is just an illusion. Indeed, in view of (6.4), in order to enter the region of quadratic convergence of Newton's method, we need to make the right-hand side of inequality (7.5) smaller than $\frac{\sigma_2^3}{2 L^2}$. For that we need

$O\left( \left[ \frac{L R}{\sigma_2} \right]^{3/4} \right)$   (7.6)

iterations of $\hat{\mathcal{M}}$. This is much worse than the complexity estimate (6.7) of the basic scheme (3.11), even without the acceleration (4.8).

Another test could be an estimate of the number of steps which is necessary for $\hat{\mathcal{M}}$ to halve the distance to the minimum. From (7.5) we see that it needs $O\left( \left[ \frac{L R}{\sigma_2} \right]^{1/2} \right)$ iterations, which is worse than the corresponding estimate for the method (4.8).

8 Discussion

1. From the complexity results presented in the previous sections we can derive a class of problems which are easy for the second-order schemes:

$\sigma_2(f) > 0$, $\sigma_3(f) > 0$, $L_3(f) < \infty$.   (8.1)

For such functions, the second-order methods exhibit a global linear rate of convergence and a local quadratic convergence. In accordance with (5.8) and (6.4), we need

$O\left( \left[ \frac{L_3(f)}{\sigma_3(f)} \right]^{1/3} \ln \left[ \frac{L_3(f)}{\sigma_2(f)} \|x_0 - x^*\| \right] \right)$   (8.2)

iterations of (4.8) to enter the region of quadratic convergence. Note that the class (8.1) is non-trivial. It contains, for example, all functions

$\xi_{\alpha,\beta}(x) = \alpha \, d_2(x) + \beta \, d_3(x)$, $\alpha, \beta > 0$,

with parameters

$\sigma_2(\xi_{\alpha,\beta}) = \alpha$, $\sigma_3(\xi_{\alpha,\beta}) = \frac{\beta}{2}$, $L_3(\xi_{\alpha,\beta}) = 2\beta$.

Moreover, any convex function with Lipschitz-continuous Hessian can be regularized by adding an auxiliary function $\xi_{\alpha,\beta}$.

2. For one important class of convex problems, namely

$\sigma_2(f) > 0$, $L_2(f) < \infty$, $L_3(f) < \infty$,   (8.3)

we have actually failed to clarify the situation. The standard theory of the optimal first-order methods (see, for example, Section 2.2 in [5]) can bound the number of iterations required to enter the region of quadratic convergence (6.4) as follows:

$O\left( \left[ \frac{L_2(f)}{\sigma_2(f)} \right]^{1/2} \ln \left[ \frac{L_2(f) L_3^2(f)}{\sigma_2^3(f)} \|x_0 - x^*\|^2 \right] \right)$.   (8.4)

Note that in this estimate the role of the second-order scheme is quite weak: it is used only to establish the bounds of the termination stage. Of course, as shown in Section 6, we could use it at the first stage also. However, in this case the size of the optimal solution $x^*$ enters the estimate for the number of iterations polynomially. Thus, the following question is still open: for the problem class (8.3), can we get any advantage from the second-order schemes being used at the initial stage of the minimization process?

3. From the computational point of view, the results presented in this paper are rather preliminary. For example, we always assume that all necessary constants are known. In practical algorithms, adaptive strategies for updating these parameters are of crucial importance.
However, the author believes that the theory presented here could serve as a useful guideline for such developments. 9

References

[1] Bennet A.A., Newton's method in general analysis, Proc. Nat. Acad. Sci. USA, 1916, 2, No. 10.
[2] Conn A.R., Gould N.I.M., Toint Ph.L., Trust Region Methods, SIAM, Philadelphia, 2000.
[3] Dennis J.E., Jr., Schnabel R.B., Numerical Methods for Unconstrained Optimization and Nonlinear Equations, SIAM, Philadelphia, 1996.
[4] Kantorovich L.V., Functional analysis and applied mathematics, Uspehi Matem. Nauk, 1948, 3, No. 1, 89-185 (in Russian); translated as N.B.S. Report 1509, Washington D.C., 1952.
[5] Nesterov Yu., Introductory Lectures on Convex Optimization. A Basic Course, Kluwer, Boston, 2004.
[6] Nesterov Yu., Polyak B.T., Cubic regularization of Newton method and its global performance, CORE Discussion Paper 2003/41, 2003. Submitted to Mathematical Programming.
[7] Ortega J.M., Rheinboldt W.C., Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, NY, 1970.
[8] Vladimirov A., Nesterov Yu., Chekanov Yu., Uniformly convex functionals, Vestnik Moskovskogo universiteta, Ser. Vychislit. Matem. i Kibern., 1978, No. 4 (in Russian).


More information

Universal Gradient Methods for Convex Optimization Problems

Universal Gradient Methods for Convex Optimization Problems CORE DISCUSSION PAPER 203/26 Universal Gradient Methods for Convex Optimization Problems Yu. Nesterov April 8, 203; revised June 2, 203 Abstract In this paper, we present new methods for black-box convex

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

Smooth minimization of non-smooth functions

Smooth minimization of non-smooth functions Math. Program., Ser. A 103, 127 152 (2005) Digital Object Identifier (DOI) 10.1007/s10107-004-0552-5 Yu. Nesterov Smooth minimization of non-smooth functions Received: February 4, 2003 / Accepted: July

More information

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical

More information

Oracle Complexity of Second-Order Methods for Smooth Convex Optimization

Oracle Complexity of Second-Order Methods for Smooth Convex Optimization racle Complexity of Second-rder Methods for Smooth Convex ptimization Yossi Arjevani had Shamir Ron Shiff Weizmann Institute of Science Rehovot 7610001 Israel Abstract yossi.arjevani@weizmann.ac.il ohad.shamir@weizmann.ac.il

More information

An example of slow convergence for Newton s method on a function with globally Lipschitz continuous Hessian

An example of slow convergence for Newton s method on a function with globally Lipschitz continuous Hessian An example of slow convergence for Newton s method on a function with globally Lipschitz continuous Hessian C. Cartis, N. I. M. Gould and Ph. L. Toint 3 May 23 Abstract An example is presented where Newton

More information

Newton Method with Adaptive Step-Size for Under-Determined Systems of Equations

Newton Method with Adaptive Step-Size for Under-Determined Systems of Equations Newton Method with Adaptive Step-Size for Under-Determined Systems of Equations Boris T. Polyak Andrey A. Tremba V.A. Trapeznikov Institute of Control Sciences RAS, Moscow, Russia Profsoyuznaya, 65, 117997

More information

Worst Case Complexity of Direct Search

Worst Case Complexity of Direct Search Worst Case Complexity of Direct Search L. N. Vicente May 3, 200 Abstract In this paper we prove that direct search of directional type shares the worst case complexity bound of steepest descent when sufficient

More information

Worst Case Complexity of Direct Search

Worst Case Complexity of Direct Search Worst Case Complexity of Direct Search L. N. Vicente October 25, 2012 Abstract In this paper we prove that the broad class of direct-search methods of directional type based on imposing sufficient decrease

More information

Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization

Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Frank E. Curtis, Lehigh University Beyond Convexity Workshop, Oaxaca, Mexico 26 October 2017 Worst-Case Complexity Guarantees and Nonconvex

More information

Spectral gradient projection method for solving nonlinear monotone equations

Spectral gradient projection method for solving nonlinear monotone equations Journal of Computational and Applied Mathematics 196 (2006) 478 484 www.elsevier.com/locate/cam Spectral gradient projection method for solving nonlinear monotone equations Li Zhang, Weijun Zhou Department

More information

Unconstrained optimization

Unconstrained optimization Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout

More information

Lecture 15 Newton Method and Self-Concordance. October 23, 2008

Lecture 15 Newton Method and Self-Concordance. October 23, 2008 Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications

More information

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications Weijun Zhou 28 October 20 Abstract A hybrid HS and PRP type conjugate gradient method for smooth

More information

1. Introduction. We consider the numerical solution of the unconstrained (possibly nonconvex) optimization problem

1. Introduction. We consider the numerical solution of the unconstrained (possibly nonconvex) optimization problem SIAM J. OPTIM. Vol. 2, No. 6, pp. 2833 2852 c 2 Society for Industrial and Applied Mathematics ON THE COMPLEXITY OF STEEPEST DESCENT, NEWTON S AND REGULARIZED NEWTON S METHODS FOR NONCONVEX UNCONSTRAINED

More information

Higher-Order Methods

Higher-Order Methods Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth

More information

Unconstrained minimization of smooth functions

Unconstrained minimization of smooth functions Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and

More information

An improved convergence theorem for the Newton method under relaxed continuity assumptions

An improved convergence theorem for the Newton method under relaxed continuity assumptions An improved convergence theorem for the Newton method under relaxed continuity assumptions Andrei Dubin ITEP, 117218, BCheremushinsaya 25, Moscow, Russia Abstract In the framewor of the majorization technique,

More information

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

Subgradient methods for huge-scale optimization problems

Subgradient methods for huge-scale optimization problems CORE DISCUSSION PAPER 2012/02 Subgradient methods for huge-scale optimization problems Yu. Nesterov January, 2012 Abstract We consider a new class of huge-scale problems, the problems with sparse subgradients.

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method Lecture 5, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The notion of complexity (per iteration)

More information

Worst case complexity of direct search

Worst case complexity of direct search EURO J Comput Optim (2013) 1:143 153 DOI 10.1007/s13675-012-0003-7 ORIGINAL PAPER Worst case complexity of direct search L. N. Vicente Received: 7 May 2012 / Accepted: 2 November 2012 / Published online:

More information

Complexity of gradient descent for multiobjective optimization

Complexity of gradient descent for multiobjective optimization Complexity of gradient descent for multiobjective optimization J. Fliege A. I. F. Vaz L. N. Vicente July 18, 2018 Abstract A number of first-order methods have been proposed for smooth multiobjective optimization

More information

An introduction to complexity analysis for nonconvex optimization

An introduction to complexity analysis for nonconvex optimization An introduction to complexity analysis for nonconvex optimization Philippe Toint (with Coralia Cartis and Nick Gould) FUNDP University of Namur, Belgium Séminaire Résidentiel Interdisciplinaire, Saint

More information

Lecture 25: Subgradient Method and Bundle Methods April 24

Lecture 25: Subgradient Method and Bundle Methods April 24 IE 51: Convex Optimization Spring 017, UIUC Lecture 5: Subgradient Method and Bundle Methods April 4 Instructor: Niao He Scribe: Shuanglong Wang Courtesy warning: hese notes do not necessarily cover everything

More information

Lecture 3: Huge-scale optimization problems

Lecture 3: Huge-scale optimization problems Liege University: Francqui Chair 2011-2012 Lecture 3: Huge-scale optimization problems Yurii Nesterov, CORE/INMA (UCL) March 9, 2012 Yu. Nesterov () Huge-scale optimization problems 1/32March 9, 2012 1

More information

Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method. Ryan Tibshirani Convex Optimization /36-725 Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016

Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016 Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall 206 2 Nov 2 Dec 206 Let D be a convex subset of R n. A function f : D R is convex if it satisfies f(tx + ( t)y) tf(x)

More information

Introduction to Nonlinear Stochastic Programming

Introduction to Nonlinear Stochastic Programming School of Mathematics T H E U N I V E R S I T Y O H F R G E D I N B U Introduction to Nonlinear Stochastic Programming Jacek Gondzio Email: J.Gondzio@ed.ac.uk URL: http://www.maths.ed.ac.uk/~gondzio SPS

More information

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä New Proximal Bundle Method for Nonsmooth DC Optimization TUCS Technical Report No 1130, February 2015 New Proximal Bundle Method for Nonsmooth

More information

Primal-dual IPM with Asymmetric Barrier

Primal-dual IPM with Asymmetric Barrier Primal-dual IPM with Asymmetric Barrier Yurii Nesterov, CORE/INMA (UCL) September 29, 2008 (IFOR, ETHZ) Yu. Nesterov Primal-dual IPM with Asymmetric Barrier 1/28 Outline 1 Symmetric and asymmetric barriers

More information

Evaluation complexity for nonlinear constrained optimization using unscaled KKT conditions and high-order models by E. G. Birgin, J. L. Gardenghi, J. M. Martínez, S. A. Santos and Ph. L. Toint Report NAXYS-08-2015

More information

Newton s Method. Javier Peña Convex Optimization /36-725

Newton s Method. Javier Peña Convex Optimization /36-725 Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and

More information

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005 3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider

More information

Primal-dual Subgradient Method for Convex Problems with Functional Constraints

Primal-dual Subgradient Method for Convex Problems with Functional Constraints Primal-dual Subgradient Method for Convex Problems with Functional Constraints Yurii Nesterov, CORE/INMA (UCL) Workshop on embedded optimization EMBOPT2014 September 9, 2014 (Lucca) Yu. Nesterov Primal-dual

More information

Statistics 580 Optimization Methods

Statistics 580 Optimization Methods Statistics 580 Optimization Methods Introduction Let fx be a given real-valued function on R p. The general optimization problem is to find an x ɛ R p at which fx attain a maximum or a minimum. It is of

More information

Conditional Gradient (Frank-Wolfe) Method

Conditional Gradient (Frank-Wolfe) Method Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

More information

Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization

Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization Clément Royer - University of Wisconsin-Madison Joint work with Stephen J. Wright MOPTA, Bethlehem,

More information

On Nesterov s Random Coordinate Descent Algorithms - Continued

On Nesterov s Random Coordinate Descent Algorithms - Continued On Nesterov s Random Coordinate Descent Algorithms - Continued Zheng Xu University of Texas At Arlington February 20, 2015 1 Revisit Random Coordinate Descent The Random Coordinate Descent Upper and Lower

More information

The Trust Region Subproblem with Non-Intersecting Linear Constraints

The Trust Region Subproblem with Non-Intersecting Linear Constraints The Trust Region Subproblem with Non-Intersecting Linear Constraints Samuel Burer Boshi Yang February 21, 2013 Abstract This paper studies an extended trust region subproblem (etrs in which the trust region

More information

Local Superlinear Convergence of Polynomial-Time Interior-Point Methods for Hyperbolic Cone Optimization Problems

Local Superlinear Convergence of Polynomial-Time Interior-Point Methods for Hyperbolic Cone Optimization Problems Local Superlinear Convergence of Polynomial-Time Interior-Point Methods for Hyperbolic Cone Optimization Problems Yu. Nesterov and L. Tunçel November 009, revised: December 04 Abstract In this paper, we

More information

A projection algorithm for strictly monotone linear complementarity problems.

A projection algorithm for strictly monotone linear complementarity problems. A projection algorithm for strictly monotone linear complementarity problems. Erik Zawadzki Department of Computer Science epz@cs.cmu.edu Geoffrey J. Gordon Machine Learning Department ggordon@cs.cmu.edu

More information

A Second-Order Path-Following Algorithm for Unconstrained Convex Optimization

A Second-Order Path-Following Algorithm for Unconstrained Convex Optimization A Second-Order Path-Following Algorithm for Unconstrained Convex Optimization Yinyu Ye Department is Management Science & Engineering and Institute of Computational & Mathematical Engineering Stanford

More information

Optimal Newton-type methods for nonconvex smooth optimization problems

Optimal Newton-type methods for nonconvex smooth optimization problems Optimal Newton-type methods for nonconvex smooth optimization problems Coralia Cartis, Nicholas I. M. Gould and Philippe L. Toint June 9, 20 Abstract We consider a general class of second-order iterations

More information

5. Subgradient method

5. Subgradient method L. Vandenberghe EE236C (Spring 2016) 5. Subgradient method subgradient method convergence analysis optimal step size when f is known alternating projections optimality 5-1 Subgradient method to minimize

More information

How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization

How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization Frank E. Curtis Department of Industrial and Systems Engineering, Lehigh University Daniel P. Robinson Department

More information

A Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity

A Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity A Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity Mohammadreza Samadi, Lehigh University joint work with Frank E. Curtis (stand-in presenter), Lehigh University

More information

An interior-point trust-region polynomial algorithm for convex programming

An interior-point trust-region polynomial algorithm for convex programming An interior-point trust-region polynomial algorithm for convex programming Ye LU and Ya-xiang YUAN Abstract. An interior-point trust-region algorithm is proposed for minimization of a convex quadratic

More information

Lagrange Relaxation and Duality

Lagrange Relaxation and Duality Lagrange Relaxation and Duality As we have already known, constrained optimization problems are harder to solve than unconstrained problems. By relaxation we can solve a more difficult problem by a simpler

More information

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This

More information

Kantorovich s Majorants Principle for Newton s Method

Kantorovich s Majorants Principle for Newton s Method Kantorovich s Majorants Principle for Newton s Method O. P. Ferreira B. F. Svaiter January 17, 2006 Abstract We prove Kantorovich s theorem on Newton s method using a convergence analysis which makes clear,

More information

Augmented self-concordant barriers and nonlinear optimization problems with finite complexity

Augmented self-concordant barriers and nonlinear optimization problems with finite complexity Augmented self-concordant barriers and nonlinear optimization problems with finite complexity Yu. Nesterov J.-Ph. Vial October 9, 2000 Abstract In this paper we study special barrier functions for the

More information

A NOTE ON Q-ORDER OF CONVERGENCE

A NOTE ON Q-ORDER OF CONVERGENCE BIT 0006-3835/01/4102-0422 $16.00 2001, Vol. 41, No. 2, pp. 422 429 c Swets & Zeitlinger A NOTE ON Q-ORDER OF CONVERGENCE L. O. JAY Department of Mathematics, The University of Iowa, 14 MacLean Hall Iowa

More information

On the Quadratic Convergence of the Cubic Regularization Method under a Local Error Bound Condition

On the Quadratic Convergence of the Cubic Regularization Method under a Local Error Bound Condition On the Quadratic Convergence of the Cubic Regularization Method under a Local Error Bound Condition Man-Chung Yue Zirui Zhou Anthony Man-Cho So Abstract In this paper we consider the cubic regularization

More information

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL)

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL) Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization Nick Gould (RAL) x IR n f(x) subject to c(x) = Part C course on continuoue optimization CONSTRAINED MINIMIZATION x

More information

Lecture 14: October 17

Lecture 14: October 17 1-725/36-725: Convex Optimization Fall 218 Lecture 14: October 17 Lecturer: Lecturer: Ryan Tibshirani Scribes: Pengsheng Guo, Xian Zhou Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

On the Quadratic Convergence of the Cubic Regularization Method under a Local Error Bound Condition

On the Quadratic Convergence of the Cubic Regularization Method under a Local Error Bound Condition On the Quadratic Convergence of the Cubic Regularization Method under a Local Error Bound Condition Man-Chung Yue Zirui Zhou Anthony Man-Cho So Abstract In this paper we consider the cubic regularization

More information

Step lengths in BFGS method for monotone gradients

Step lengths in BFGS method for monotone gradients Noname manuscript No. (will be inserted by the editor) Step lengths in BFGS method for monotone gradients Yunda Dong Received: date / Accepted: date Abstract In this paper, we consider how to directly

More information

Stochastic and online algorithms

Stochastic and online algorithms Stochastic and online algorithms stochastic gradient method online optimization and dual averaging method minimizing finite average Stochastic and online optimization 6 1 Stochastic optimization problem

More information

Numerical Methods for Large-Scale Nonlinear Systems

Numerical Methods for Large-Scale Nonlinear Systems Numerical Methods for Large-Scale Nonlinear Systems Handouts by Ronald H.W. Hoppe following the monograph P. Deuflhard Newton Methods for Nonlinear Problems Springer, Berlin-Heidelberg-New York, 2004 Num.

More information

Math 273a: Optimization Subgradient Methods

Math 273a: Optimization Subgradient Methods Math 273a: Optimization Subgradient Methods Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com Nonsmooth convex function Recall: For ˉx R n, f(ˉx) := {g R

More information

Convergence Rates in Regularization for Nonlinear Ill-Posed Equations Involving m-accretive Mappings in Banach Spaces

Convergence Rates in Regularization for Nonlinear Ill-Posed Equations Involving m-accretive Mappings in Banach Spaces Applied Mathematical Sciences, Vol. 6, 212, no. 63, 319-3117 Convergence Rates in Regularization for Nonlinear Ill-Posed Equations Involving m-accretive Mappings in Banach Spaces Nguyen Buong Vietnamese

More information

Second Order Optimization Algorithms I

Second Order Optimization Algorithms I Second Order Optimization Algorithms I Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 7, 8, 9 and 10 1 The

More information

Optimisation in Higher Dimensions

Optimisation in Higher Dimensions CHAPTER 6 Optimisation in Higher Dimensions Beyond optimisation in 1D, we will study two directions. First, the equivalent in nth dimension, x R n such that f(x ) f(x) for all x R n. Second, constrained

More information

On the Local Convergence of Regula-falsi-type Method for Generalized Equations

On the Local Convergence of Regula-falsi-type Method for Generalized Equations Journal of Advances in Applied Mathematics, Vol., No. 3, July 017 https://dx.doi.org/10.606/jaam.017.300 115 On the Local Convergence of Regula-falsi-type Method for Generalized Equations Farhana Alam

More information

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen

More information

CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS

CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS Igor V. Konnov Department of Applied Mathematics, Kazan University Kazan 420008, Russia Preprint, March 2002 ISBN 951-42-6687-0 AMS classification:

More information

Accelerated Block-Coordinate Relaxation for Regularized Optimization

Accelerated Block-Coordinate Relaxation for Regularized Optimization Accelerated Block-Coordinate Relaxation for Regularized Optimization Stephen J. Wright Computer Sciences University of Wisconsin, Madison October 09, 2012 Problem descriptions Consider where f is smooth

More information

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Coralia Cartis, University of Oxford INFOMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems Methods

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Proximal-Gradient Mark Schmidt University of British Columbia Winter 2018 Admin Auditting/registration forms: Pick up after class today. Assignment 1: 2 late days to hand in

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

More information

Lecture 6 : Projected Gradient Descent

Lecture 6 : Projected Gradient Descent Lecture 6 : Projected Gradient Descent EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Consider the following update. x l+1 = Π C (x l α f(x l )) Theorem Say f : R d R is (m, M)-strongly

More information

arxiv: v1 [math.oc] 5 Dec 2014

arxiv: v1 [math.oc] 5 Dec 2014 FAST BUNDLE-LEVEL TYPE METHODS FOR UNCONSTRAINED AND BALL-CONSTRAINED CONVEX OPTIMIZATION YUNMEI CHEN, GUANGHUI LAN, YUYUAN OUYANG, AND WEI ZHANG arxiv:141.18v1 [math.oc] 5 Dec 014 Abstract. It has been

More information

Convex Functions and Optimization

Convex Functions and Optimization Chapter 5 Convex Functions and Optimization 5.1 Convex Functions Our next topic is that of convex functions. Again, we will concentrate on the context of a map f : R n R although the situation can be generalized

More information

One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties
