CORE 50 YEARS OF DISCUSSION PAPERS. Globally Convergent Second-order Schemes for Minimizing Twicedifferentiable 2016/28

Size: px
Start display at page:

Download "CORE 50 YEARS OF DISCUSSION PAPERS. Globally Convergent Second-order Schemes for Minimizing Twicedifferentiable 2016/28"

Transcription

1 26/28 Globally Convergent Second-order Schemes for Minimizing Twicedifferentiable Functions YURII NESTEROV AND GEOVANI NUNES GRAPIGLIA 5 YEARS OF CORE DISCUSSION PAPERS

2 CORE Voie du Roman Pays 4, L Tel 2 ) Fax 2 ) immaq-library@uclouvainbe

3 CORE DISCUSSION PAPER 26/28 Globally convergent second-order schemes for minimizing twice-differentiable functions GN Grapiglia and Yu Nesterov July 8, 26 Abstract In this paper, we suggest new universal second-order methods for unconstrained minimization of twice-differentiable convex or non-convex) objective function For the current function, these methods automatically achieve the best possible global complexity estimates among different Hölder classes containing the Hessian of the objective The universal methods for functional residual and for norm of the gradient are different For development of the latter methods, we introduced a new line-search acceptance criterion, which can be seen as a nonlinear modification of the Armijo-Goldstein condition Keywords: unconstrained minimization, second-order methods, Hölder condition, worstcase global complexity bounds Federal University of Paraná, Brazil; research of this author was partially supported by CNPq - Brazil, grant 4288/24-5 CORE, UCL, Belgium; research of this author was partially supported by CNPq - Brazil, grants 4288/24-5 and 498/24-5, and by the grant Action de recherche concertè ARC 4/9-6 from the Direction de la recherche scientifique - Communautè française de Belgique Scientific responsibility rests with the author

4 Introduction Motivation Recent results on the global worst-case complexity bounds for a new variant of Newton method 9] became a starting point for a sequence of publications, addressing the global complexity issues for the second-order methods see, for example, 2], ], 4], 6], 8]) These papers, present the upper and lower estimates for the rate of convergence of second-order schemes in terms of the residual in function value or in the norm of the gradient In the majority of these publications, the standard assumption is the Lipschitz property of the Hessian of the objective function However, it is clear that this is not the only possibility for measuring the level of smoothness of the Hessian For example, we could assume that the Hessian is Höldercontinuous At the same time, a straightforward tuning of the methods to corresponding smoothness assumptions usually results in the explicit changes in the algorithms This is indeed unfortunate, since very often we do not know a priori which smoothness assumption better fits our particular objective function Recently, in the paper 7], it was shown that it is possible to develop universal first-order methods, which do not need preliminary information on the type of Hölder continuity of the gradient Therefore, it looks interesting to develop a similar theory for the second-order methods This is the main goal of this paper Contents In Section 2, we study the structure of Hölder constants for twice differentiable functions and derive main geometric inequalities We also justify the rate of convergence in terms of the function residual both for the simplest overestimating second-order method with perfect knowledge of the Hölder parameter and Hölder constant for the Hessian, and for an adaptive scheme, which needs only the Hölder parameter at the input In Section we present a universal second-order method for achieving a small residual in the function value It appears that, up to the choice of initial estimate for the Hölder constant, this is exactly the Cubic Regularization of Newton Method suggested in 9] Its complexity bound for a twice-differentiable objective function coincides with the bound for a method employing the knowledge on the best Hölder class for this particular function In the second part of the paper, we present methods targeting the finding of points with small norm of the gradient First of all, in Section 4 we show that, if the Hölder parameter is close to zero, then the standard line-search acceptance criterion, based on the relation between the function value at the candidate point and the predicted value of the regularized quadratic model, cannot help in finding the points with small gradients Instead, we propose two new acceptance criteria, which can be seen as a multi-dimensional generalization of Armijo-Goldstein condition First of them allows efficient adaptation of Hölder constant And the second one leads to a fully universal method for finding the points with small norm of the gradient both for convex and non-convex objective functions see Section 5) Notations and generalities In what follows, we denote by E a finite-dimensional linear space, and by E its dual space, composed by linear functions on E The value of function s E at point x E is denoted by s, x Important elements of the dual space are the gradients of a differentiable function f : E R: fx) E, x E

5 For an operator A : E E, denote by A its adjoint operator defined by identity Ax, y = A y, x, x, y E Thus, A : E E It is called self-adjoint if A = A Important examples of such operators are Hessians of twice differentiable function f : E R: 2 fx)u, v = 2 fx)v, u, x, u, v E Operator B : E E is positive-definite if Bx, x >, x E \ {, notation B ; we use notation B if the above inequality is not strict) In what follows, we fix some self-adjoint positive-definite operator B for defining Euclidean norms in the primal and dual spaces: x = Bx, x /2, x E, s = s, B s /2, s E The norm of operator A : E E is defined in a standard way: A = max { Au { : u = min r : r 2 B A B A u E r In what follows, we often use the following simple statement Lemma Let α, 2] Assume that the sequence of positive numbers {δ k k satisfies inequalities δ k δ k+ δk+ α, k ) Then for any t, t m, we have ln δ t ) t α ln δ, 2) and for t m we can guarantee that δ t δ m + α )t m)δα m +δm α ] α On the other hand, if for some m and k m we have then δ t < for t m, and the following inequality holds: δ t ] +δ α m α α )t m) ) δ k δ k+ δ α k, 4) δ m +α )t m)δm α ] α α )t m) ] α 5) Indeed, from condition ), we have δ t δt+ α, and 2) follows On the other hand, inequality ) can be rewritten as δ α k δ k+ + δk+ α )α Therefore, δ α k+ δ α k δ α k+ δ k+ = δ k++δ +δk+ α k+ α )α δ α k+ )α δ α k+ δ k++δk+ α )α 6) 2

6 Since α, ], function τ α is concave for τ Therefore τ α τ α 2 + α )τ α 2 2 τ τ 2 ), τ, τ 2 > Choosing τ = δ k+ and τ 2 = δ k+ + δk+ α, we get δ α k+ δ k+ + δ α k+ )α α )δ k+ + δ α k+ )α 2 δ α k+ Substituting this inequality in 6), for any k {m,, t, we obtain δ α k+ δ α k α )δ k++δk+ α )α 2 δk+ α = α )δ k+ δ α k+ δ k++δk+ α )α δ k+ +δk+ α = α +δ α k+ α +δm α It remains to sum up these inequalities for k = m,, t In order to prove inequality 5), note that function is convex in τ for τ > τ α Therefore, α ) τ+ ) α τ α τ, τ, τ + > 7) α Let δ m < Therefore, δ k < for all k m Choosing in inequality 7) τ = δ k and = δk α, we get δ α k+ 4) δ k δ α k ] α 7) + α )δα δ α k δ α k k = + α δ α k Summing up these inequalities for k = m,, t, we get inequality 5) Note that the right-hand sides of inequalities ) and 5) are increasing functions of δ m Since in inequality 5) δ m, we get the following simple bound: lim α δ t +t m)α )] α Remark In inequality ) we can take the limit in α Since ]) α ln + t m)α )δα m = t m 2, +δm α we conclude that inequality ) has the following limiting form: δ t δ m exp t m ) 2, t m 2 Twice-differentiable convex functions 8) In this section and in Section we consider second-order methods for solving the following unconstrained minimization problem: min fx), 2) x E where f is a convex twice-differentiable function on E We assume that there exists at least one optimal solution x E of this problem, and denote f = fx )

7 The level of smoothness of the objective function f in problem 2) is characterized by the system of Hölder constants H f ν) : x y, ν 22) def = sup x,y E { 2 fx) 2 fy) x y ν For some values of ν the correspondent constants can be infinite However, it is easy to see that function H f ) is log-convex Hence, if ν < ν 2 and H f ν i ) <, i =, 2, then ν 2 ν ν ν ν H f ν) H 2 ν ν f ν ) H 2 ν f ν 2 ), ν ν, ν 2 ] 2) In particular, if H f ) < and H f ) <, then H f ν) H ν f ) Hf ν ), ν, ] 24) At the same time, it is easy to find a uniform lower bound for all constants H f ν) Indeed, if we take two points x, y E with x y =, then clearly H f ν) 2 fx) 2 fy), ν, ] 25) Note that the condition H f ) < allows discontinuous Hessian ) However, its global variation must be bounded An interesting example of such a function is a quadratic penalty for the system of linear inequalities: fx) = m a i, x b i ) 2 +, 26) i= where τ) + = max{, τ The following two simple consequences of definition 22) are valid for all ν, ] and x, y E: fy) fx) fx), y x 2 2 fx)y x), y x H f ν) y x +ν)), 27) fy) fx) 2 fx)y x) H f ν) y x +ν +ν 28) A geometric interpretation of the upper variant of inequality 27) leads to the following constructions Let H f ν) < for some ν, ] Consider the following model of function f around some point x E: Qx; y) M ν,h x; y) def = fx) + fx), y x fx)y x), y x, def = Qx; y) + H y x +ν)), y E, where the parameter H > is an estimate for the Hölder constant H f ν) Clearly, for H H f ν) we have 27) fy) M ν,h x; y), y E 29) ) For this, we need to assume that function f is twice differentiable almost everywhere In this case, we need to use in definition 22) only the points x and y where the Hessian do exist 4

8 Therefore, it is natural to consider the point T ν,h x) def = arg min y E M ν,hx; y) 2) Since function M ν,h x; y) is strictly convex in y, point T = T ν,h x) is the unique solution of the following equation: fx) + 2 fx)t x) + Multiplying equation 2) by T x, we get H T x ν +ν BT x) = 2) fx), T x + 2 fx)t x), T x + H +ν T x = 22) Denote R ν,h x) = T x Then, Mν,H def x) = M ν,h x; T ) 22) = fx) 2 2 fx)t x), T x H R ν,h x) 2) Thus, taking into account inequality 29), we come to the following statement Lemma 2 Let H H f ν) Then for T = T ν,h x) we have fx) ft ) 2 2 fx)t x), T x + H R ν,h x) We will need also some bounds on the growth of values R ν,h x) Lemma For any ν, ] and H > we have In particular, R ν,2h Consider the following function: ξτ) = min y E H R ν,h x) 24) x) R ν,h x) 4R ν,2h x) 25) R,H x) 2 2/ R,2H x) 26) {Qx; y) + y x τ+ν)), τ > The objective function in this minimization problem is jointly convex in y and τ Therefore, function ξ ) is also convex Consequently, its derivative ξ τ) = increasing in τ Thus, choosing τ = H, we conclude that R ν,/τ x) τ 2 +ν)) is H 2 R ν,h x) +ν)) = ξ ) H ξ ) 4H 2H = 2 R ν,2h x) +ν)) In particular, for ν =, we get inequality 26) For solving problem 2), the methods presented in Sections 2 and of this paper, generate the minimizing sequence {x t t E in accordance to the following generic iteration: Define x t+ = T ν,ht x t ) with coefficient H t > such that 27) fx t+ ) Mν,H t x t ) 5

9 The methods differ only by the rules for choosing the parameter ν, ] and generating the sequence of scaling coefficients {H t t Since Mν,H t x t ) fx t ), all our schemes produce the minimizing sequence with monotonically decreasing values of objective function Thus, {x t t Fx ) def = {x E : fx) fx ) Denote D = sup x x x Fx ) 28) We assume that D < Theorem Assume that for some ν, ] with H f ν) <, the scaling coefficients in method 27) satisfy condition for some constant γ Then, for any t we have < H t γ H f ν), t, 29) fx t ) f +γ +ν H f ν)d + t +2ν ) +ν 22) Indeed, M ν,h t x t ) = min y E {Qx t ; y) + H t y x t +ν)) Therefore, for all t, we have 27) 29) min y E min y E {fy) + H f ν)+h t) y x t +ν)) {fy) + +γ)h f ν) y x t +ν)) fx t ) fx ) f + +γ)h f ν) x x +ν)) f + +γ)h f ν) D +ν)) 22) On the other hand, for t we get fx t+ ) Mν,H t x t ) min {fy) + +γ)h f ν) y x t α,] +ν)) : y = x t + αx x t ) min {fx t ) αfx t ) f ) + +γ)h f ν) D α,] +ν)) α The minimum of the last minimization problem is achieved at α = ] fxt) f )+ν) +ν +γ)h f ν) D 22) 6

10 Hence, ] fx t+ ) fx t ) fxt ) f )+ν) +ν fx +γ)h f ν) D t ) f ) Denoting now δ t = +ν is satisfied with α = +ν ] + +γ)h f ν) D +ν)) fxt) f )+ν) +ν +γ)h f ν) D = fx t ) +ν +ν +γ)h f ν)d ] +ν fx t ) f ) +ν ) ] +ν = fx t ) +ν +ν)fxt) f +ν ) +γ)h f ν)d fxt ) f ) ) +ν +ν)fxt ) f ) +γ)h f ν)d At the same time,, we see that the condition ) of Lemma Therefore, δ α +ν ) +ν δ = +ν +ν)fx ) f ) +γ)h f ν)d, and we conclude that 22) +ν ) +ν ) δ t + t δ α +ν +δ α ] +ν) +ν + t +ν +2ν ] +ν) Let us look now at different strategies for satisfying condition 29) Constant H f ν) is known Then we can take H t = H f ν) for all t In this case, we choose in 29) γ = Therefore, in view of the estimate 22), this scheme can find ϵ-solution of problem 2) in ) ] Hf ν)d O +ν ϵ 222) iterations At each iteration the oracle of our objective function is called only once 2 Adaptive estimate of H f ν) For real-life problems, usually it is difficult to have a good a priori estimate of the constant H f ν) In this case, we can apply the following 7

11 adaptive strategy Adaptive method I with specific ν, ] Initialization Choose x E and H, H f ν)] Iteration t 22) a) Find the smallest integer i t such that ft ν,2 i t Ht x t )) M ν,2 i t H t x t ) b) Set x t+ = T ν,2 i tht x t ) and H t+ = 2 it H t Note that the constant H can be chosen, for example, from inequality 25) Theorem 2 Assume that H f ν) < Then the scaling coefficients in method 22) satisfy condition < 2 i t H t 2 H f ν), t 224) Moreover, for any t we have fx t ) f 2 +ν H f ν)d + t +2ν ) +ν 225) At the same time, the total number of calls of oracle N t after t iterations of method 22) is bounded as follows: N t 2t log 2 H f ν) log 2 H Let us prove that all scaling coefficients H t in method 22) satisfy inequality H t H f ν) Indeed, this is true for H Assume that we enter tth iteration with H t H f ν) Then the final value of this coefficient 2 it H t cannot be bigger than 2H f ν) since otherwise we should stop the line-search process earlier Hence, 224) is valid, an consequently, H t+ = 2 2it H t H f ν) Since we have justified relation 224), we can use inequality 22) with γ = This is 225) Finally, let us estimate the total number of calls of oracle At each iteration the oracle is called i t + times At the same time, H k+ = 2 2i kh k Therefore, t i k + ) = t + + t k= k= log 2 2H k+ H k = 2t + ) + log 2 H t+ log 2 H 2t log 2 H f ν) log 2 H As we have seen, the scheme 22) can efficiently estimate the constant H f ν) At the same time, in average it needs only two calls of oracle per iteration Nevertheless, for this strategy we need to choose the value of smoothness parameter ν In the next section we show how to avoid this requirement 8

12 Universal second-order method Denote γ ν ϵ) = 6Hf ν) ] 2 +ν)) Consider the following minimization scheme +ν 2D 5ϵ ) ν +ν ) Universal Method I Initialization Choose x E and H Iteration t ], inf γ νϵ) ν 2) a) Find the smallest integer i t such that ft,2 i t Ht x t )) M,2 i t H t x t ) b) Set x t+ = T,2 i t Ht x t ) and H t+ = 2 i t H t As compared with method 22), we choose here ν = Up to the choice of initial value H, this is exactly 2) version 57) of Cubic Regularization of the Newton Method suggested in 9] for minimizing functions with Lipschitz continuous Hessians However, we prove that this method can work properly even if H f ν) < for some ν, ] Let us prove first the following auxiliary result, which is valid also for nonconvex function f Lemma 4 Let x + = T,H x) for some x E and H > If for some δ > and ν, ] we have ] 2 CHf ν) +ν fx + ) δ and H ) ν +ν +ν)) δ, ) where the constant C 6 Then Moreover, in this case, x + x ν CH f ν) +ν))h 4) H x x + 2 fx + ) 5) For ν =, the statements are trivial Assume ν, ) Denote r = x + x Then δ ) fx + ) fx + ) f x) 2 f x)x + x) + f x) + 2 f x)x + x) 6) 28),2) H f ν)r +ν +ν + 2 Hr2 = r +ν Hf ν) +ν + 2 Hr ν ] 2) As compared with 57) from 9], in method 2) there is no artificial lower bound for scaling coefficients H t 9

13 Assume that Hr ν < δ < r +ν Hf ν) CH f ν) +ν)) Then +ν + CH f ν) 2 +ν)) ] ) = r+ν +ν H f ν) + C 2) < H f ν) +ν ) + C 2) Note that for C 6 we have + C 2) CH f ν) +ν))h ] +ν ν ] 2 C Therefore, δ < 6Hf ν) ν ) +ν ν +ν)) H This contradicts the second inequality in ) Let us prove now inequality 5) In view of inequality 4), we have H f ν) +ν C Hr ν 7) Therefore, in view of inequality 6), we have ] fx + ) r +ν Hf ν) +ν + 2 Hr ν r 2 H C + ] 2 r 2 H 6+C 2C r2 H Corollary Under conditions of Lemma 4 with C 6, we have fx + ) M,H x) f x) 8) fx + ) 27) Q x; x + ) + H f ν)r +ν)) 4) Q x; x + ) + Hr 6 = M,H x) Now we can justify complexity bounds of method 2) on the whole class of twice differentiable objective functions Theorem Assume that for some ν, ] we have H f ν) < + Let sequence {x t T t= be generated by method 2) and satisfy conditions ft,2 i H t x t )) f ϵ >, i =,, i t, t =,, T 9) Then, for all t =,, T we have H t γ ν ϵ) ) Moreover, for t =,, T, we have fx t ) f ) 2 6γ νϵ) t+) 2 +ν D, ) Therefore, T 6) +ν 26 ) ν ] ) Hf ν)d +ν 5 +ν))ϵ 2)

14 Let us prove first, that the sequence {x t T t= is well defined Indeed, assume that i t > Since for all i =,, i t we have M,2 i H t x t ) 2) < ft,2 i H t x t )) 27) M ν,hf ν)x t, T,2 i H t x t )), we get 2i H t 6 R ν,2 i H t x t ) conclude that H f ν) +ν)) Therefore, R ν,2 i H t x t ) T,2 i H t x t ) x ν T,2 i H t x t ) x t + x t x ] ν 6H f ν) +ν)) 2 i H t, and we T,2 i H t x t ) x t ν + x t x ν 6H f ν) +ν)) 2 i H t + D ν Ω i Note that ft,2 i H t x t )) ft,2 i H t x t)) f T,2 i Ht x t ) x 9) ϵ T,2 i Ht x t ) x Therefore, in view of Corollary, the line-search process at each iteration of method 2) terminates in finite time, at least when the following inequality is satisfied: 2 i H t ] 2 ) 6Hf ν) +ν Ω i +ν +ν)) ϵ ν Thus, we have proved that the whole sequence {x t T t= is well defined and x t x 28) D, t =,, T ) Let us establish now an upper bound for values H t, t =,, T For t =, inequality ) is justified by the initial conditions of method 2) Suppose that ) is valid for some t If i t =, then H t+ = 2 H t < γ ν ϵ) Consider now the case i t > Denote ϵ y t = T,2 i t H t x t ), and choose δ = y t x Since in view of Corollary, we have δ 9) fy t) f y t x fy t ), H t+ 2) = 2 i t H t 6Hf ν) ] 2 +ν)) +ν y t x ϵ ) ν +ν 4) It remains to note that y t x y t x t + x t x 26),) 2 2/ x t+ x t + D ) + 2 5/ )D < 2 5 D Substituting this upper bound in inequality 4), we get H t+ γ ν ϵ)

15 Let us estimate now the rate of convergence of method 2) Denote r t = x t x Note that fx t+ ) 2) 27) ) min M,2H t+ x t, y) y E { min fy) + H f ν) y E +ν)) y x t + 2H t+ 6 y x t min α,] {fx t ) αfx t ) f ] + H f ν)α r t +ν)) + H t+α r t min {fx t ) αfx t ) f ] + H f ν)α D α,] +ν)) + γ νϵ)α D Denote the objective function in the latter optimization problem by ω t α) Note that ω ) = f + H f ν)d +ν)) + γνϵ)d fx ) fx t ), t =,, T 5) Therefore, for all t = 2,, T we have ω t) = 5) H f ν)d +ν + γ ν ϵ)d fx t) f ) H f ν)d +ν + γ ν ϵ)d H f ν)d +ν)) γ νϵ)d > Therefore, the solution α t of the optimization problem ω t = min α,] ω t α) is smaller than one and can be found from the equation = ω tα) = H f ν)d +ν α +ν + γ ν ϵ)d α2 fx t ) f ) = H f ν)d +ν α +ν 6Hf ν)d + α +ν +ν)) ] 2 +ν 2 5ϵ ) ν +ν fx t ) f ) 6) Note that ω t 6) = ω t α t ) α t ω tα t ) 6) = fx t ) αt fx t ) f ] + H f ν)α t ) D +ν)) + γνϵ)α t ) D ] α t fx t ) f ] + H f ν)α t )+ν D +ν + γ ν ϵ)αt ) 2 D ) ) = fx t ) αt fx t ) f ] + γ ν ϵ)αt ) D 2

16 Thus, for any t =,, T, we have fx t+ ) ω t fx t ) +ν α t fx t ) f ] 7) Let us find now the lower bound for αt Multiplying the second line in 6) by 2 5ϵ, we have = 2 5ϵ ω tα) = H f ν)d +ν 2α+ν Let us define ᾱ as a solution to equation Then H f ν)d +ν 2ᾱ+ν 5ϵ 6Hf ν)d 5ϵ + +ν)) 2α+ν 5ϵ 6H f ν)d 2ᾱ +ν +ν)) 5ϵ = 2 = 2) 6 Therefore, 2 5ϵ ω tᾱ) ν 2fx t) f ) 5ϵ ] 2 +ν 9) < 2fx t) f ) 5ϵ Thus, ] αt ᾱ = 5ϵ +ν)) +ν 8) 6H f ν)d Note that the second inequality in 8) can be rewritten in the following way: Therefore, = α t ) 2 γ ν ϵ)d fx t ) f 6) 9) α t ) 2 H f ν)d +ν ] αt ) ν 6H f ν)d ν +ν 5ϵ +ν)) 9) = H f ν)d H f ν)d +ν +ν α t ) +ν + γ ν ϵ)d α t ) 2 ] 6Hf ν)d ν 5ϵ +ν)) ] 6Hf ν)d ν +ν 5ϵ +ν)) +ν +ν)) 6H f ν) + γ ν ϵ)d ] ] 2 ] ) ν +ν 5ϵ +ν 2D D + = α t ) 2 γ ν ϵ)d 2 + ν) ) 2 ] 6 ν +ν + = α t ) 2 γ ν ϵ)d 6 ] +ν + 2) ν 2 α t ) 2 γ ν ϵ)d Hence, we obtain the following inequality: fx t ) fx t+ ) 7) +ν 2fxt) f ) γ νϵ)d fx t ) f ], t Denoting now δ t = 2fx t) f ) γ ν ϵ)d +ν ) 2, we get δ t δ t+ δ /2 t, t

17 This implies δ Now we can apply inequality 8) with α = 2 This gives us inequality ) In order to get the upper bound 2), note that ϵ 9) fx t ) f ) 6γ νϵ) t+) 2 +ν ) 2 D, t T Therefore, ] /2 ] 2 T 6γνϵ) +ν ϵ D = 6 6Hf ν) +ν ϵ +ν)) = +ν 6 ] 6Hf ν)d 2 +ν))ϵ +ν 2 5 +ν 2D 5ϵ ) ν +ν ] /2 = 6) +ν ] ) ν /2 +ν D 26 ) ν ] ) Hf ν)d +ν 5 +ν))ϵ Note that the complexity bound 2), up to a constant factor, coincides with the estimate 222) However, method 2) does not require knowledge of the smoothness parameter ν, ] It adapts automatically to its best value, ensuring the smallest right-hand side of inequality 2) Using the same reasoning, as in the proof of Theorem 2, we can prove the following bound for the total number of calls of oracle N t in method 2) after t iterations: N t 2t log 2 ˆγϵ) log 2 H, t, 2) where ˆγϵ) = inf γ νϵ) ν For starting the method 2), we need to ensure initial condition H, ˆγϵ)] 2) Usually, this is not a serious problem since typically all values γ ν ϵ), ν, ], are expected to be big In any case, we can try to find this value from an auxiliary search procedure based on the following fact Lemma 5 Let H >, and for two points y = T,H x ), y = T,2H x ) we have fy i ) f ϵ, i =, 2 If fy ) M,H x ), 22) fy ) M,2H x ), then H ˆγϵ) Denote δ = ϵ y x Since δ fy ) f y x fy ), using Corollary, we get H < 6Hf ν) ] 2 +ν)) +ν y x ϵ ) ν +ν 2) 4

18 On the other hand, fy ) 22) y x 28) D Thus, M,2H x ) fx ) Hence, y Fx ) and therefore, y x y x + x x 26) 2 2/ y x + x x 28) 2 2/ + ) D < 2 D Consequently, H 2) ] 2 6Hf ν) +ν < 2D ) ν +ν ) +ν)) ϵ = γ ν ϵ) It remains to note that the condition 22) does not involve any particular value of ν, ] Thus, we can try to get an appropriate value of H by checking the points y i = T,2 ix ) with positive or negative integer values of i 4 Decreasing the norm of the gradient Let us assume now that the objective function in problem min x E fx) 4) is not convex Then our main goal consists in finding a point with a small norm of the gradient However, note that non-convexity of the objective creates additional difficulties First of all, in our optimization schemes we need to use the global solution of the auxiliary problem 2) it can be efficiently computed, see 5, 9]) This solution T = T ν,h x) is characterized by the first-order optimality condition 2), and by the secondorder condition 2 fx) + H +ν T x ν B 42) see ]; for ν =, this characterization was firstly obtained in 9]) Note that condition 42) is stronger than the usual second-order optimality condition for the auxiliary problem 2) However, even with its help, the standard machinery for convergence analysis often do not work Let us look, for example, what we can guarantee for the process as applied to non-convex problem 4) Indeed, x E, x t+ = T ν,hf ν)x t ), t, 4) fx t+ ) 27) fx t ) + fx t ), x t+ x t fx t )x t+ x t ), x t+ x t + H f ν) +ν)) x t+ x t 22) = fx t ) 2 2 fx t )x t+ x t ), x t+ x t H f ν) x t+ x t 42) fx t ) νh f ν) )) x t+ x t 5

19 Thus, it seems that for ν, the method becomes slower and slower and we cannot guarantee any rate of convergence for the limiting value ν = Therefore, for non-convex problems we need to employ stronger conditions for accepting next points in the minimization sequence Let us replace the functional condition 27) by the following criterion: { G κ x, x + ) fx) fx + ) κ fx) x x +, 44) where κ is a constant belonging to the interval, 2 ] Criterion G κ can be seen as a natural strengthening of Armijo-Goldstein condition Note that it does not depend on the smoothness parameter ν We will need also the following inequality: ft ) ft ) fx) 2 ft )T x) + fx) + 2 ft )T x) 45) 28),2) H f ν)+h +ν T x +ν Lemma 6 Let x + = T ν,h x) and H f ν) < + Then G /4 x, x + ) is true for any H satisfying the inequality ) H + 4 H f ν) 46) If function f is convex, then G /2 x, x + ) is satisfied by any ) H + 2 H f ν) 47) Indeed, f x) fx + ) 27) f x), x x f x)x + x), x + x H f ν) x + x +ν)) ] 22) = 2 2 f x)x + x), x + x + H +ν H f ν) +ν)) x + x 48) Using matrix inequality 42), we get f x) fx + ) H ) ] H f ν) +ν)) x + x In view of assumption 46), the right-hand side of this inequality is nonnegative Thus, f x) fx + ) 45) ) H+H f ν) 2 H H f ν) fx + ) x + x 46) 4 fx +) x + x 6

20 Let us assume now that function f is convex Then, from inequality 48) we have: ] f x) fx + ) H +ν H f ν) +ν)) x + x Since by assumption 47) the right-hand side of this inequality is nonnegative, we get 45) ) f x) fx + ) H+H f ν) H H f ν) fx + ) x + x 47) 2 fx +) x + x Let us consider now different variants of the regularized Newton Method The simplest version uses a constant coefficient in the prox-term: Denote g t = min fx k) kt+ x E, x t+ = T ν,h x t ), t 49) Theorem 4 Let the objective function in problem 4) be below bounded by f and H f ν) < + for some ν, ] If the parameter H of method 49) satisfies the condition 46), then ] ) +ν gt H+Hf ν) 4fx ) f ) +ν t+ 4) Indeed, in view of Lemma 6, we have fx k ) fx k+ ) 4 fx k+) x k x k+ 45) 4 Summing up these inequalities for k =,, t, we get ] +ν 4 t + )gt ) +ν 4 +ν H+H f ν) +ν H+H f ν) ] +ν +ν H+H f ν) t k= ] +ν +ν fx k+ ) +ν fx k+ ) fx ) fx t+ ) fx ) f This is exactly the inequality 4) If the objective function of problem 4) is convex, we can guarantee a better rate of convergence In this case, we assume that problem 4) is solvable and denote by x one of its optimal solutions We assume also that the constant D defined by 28) is finite Theorem 5 Let function f in problem 4) be convex and parameter H in method 49) satisfy condition 47) Then for any t we have ] +ν fx t ) fx H+Hf ν))d ) +ν, 4) and for any t 2 we have ν) t ) ] ] gt 2 +ν +ν) 2 ) H+Hf ν) +ν D +ν 8+ν) +ν 2 t 42) 7

21 Indeed, in view of Lemma 6, at each iteration of method 49) criterion G /2 x t, x t+ ) is satisfied Therefore, this method forms a monotonically decreasing sequence of function values Hence, {x t t Fx ), and we conclude that Further, fx t+ ) fx ) fx t+ ), x t+ x fx t+ ) D fx k ) fx k+ ) 2 fx k+) x k x k+ 45) 2 2 +ν H+H f ν) +ν H+H f ν) ] ) +ν + D fx k+ ) fx +ν )) ] +ν +ν fx k+ ) = ] +ν)fxk+ ) fx )) +ν fx 2 +ν H+H f ν))d k+ ) fx )) Denoting δ k = +ν)fx k) fx )), we can rewrite this inequality as δ k δ k+ δk+ α with 4 +ν H+H f ν))d α = 47) +ν Note that H H f ν) Therefore, fx ) 27) 27) { H min Qx ; x) + x E +ν)) x x { min fx) + H+H f ν) x E +ν)) x x fx ) + H+H f ν) +ν)) D This means that δ Consequently, 2 +ν δα ) 2) /+ν) 2 < Since the estimate ) is monotone in δ m, we conclude that ) δ t δ α + δα t ) +δ α )+ν) +ν + t ) + )+ν) +ν = + t ) 4+ν) This inequality leads to the estimate 4) In order to prove inequality 42), let us fix a number k, k < t Then for any i, k i t, we have fx i ) fx i+ ) 2 fx i+) x i x i+ 45) 2 Summing up these inequalities for i = k,, t, we get 2 +ν H+H f ν) ] +ν t k + )gt ) +ν 2 +ν H+H f ν) +ν H+H f ν) ] +ν t i=k ] +ν ] +ν +ν fx i+ ) +ν fx i+ ) fx k ) fx t+ ) fx k ) f 8

22 Thus, g t ) +ν 4) ] H+Hf ν) +ν fx 2 k ) f +ν t k+ ] H+Hf ν) +ν D +ν +ν 2 t k ν) k ) ] +ν ] H+Hf ν) +ν D +ν +ν 8+ν) ] +ν 2 t k+)k ) +ν Considering now even and odd numbers for t, it is easy to prove that max k Z { t k + )k ) +ν : k t ) t 2 Thus, we obtain the bound 42) Let us consider now two versions of method 49) with adjustable estimates for the Hölder parameters We start from a version applicable to the general functions Adaptive method II with specific ν, ] General functions) Initialization Choose x E and H, H f ν)] Iteration t 4) a) Find the smallest integer i t such that the condition G /4 x t, T ν,2 i t Ht x t )) is satisfied b) Set x t+ = T ν,2 i tht x t ) and H t+ = 2 it H t Theorem 6 Let the objective function in problem 4) be below bounded by f and H f ν) < + for some ν, ] Then the method 4) finds a point x E with f x) δ in T 4fx ) f ) ] 4+ν)Hf ν) +ν ) +ν +ν)) δ 44) iterations The number of calls of oracle in this process does not exceed ) NF 2T + log H + log f ν) 2 H 45) Indeed, in view of Lemma 6, the parameters i t and H t in method 4) satisfy inequalities ) ) 2 i t H t H f ν), H t + 4 H f ν) 46) 9

23 Therefore, in view of Lemma 6, we have fx t ) fx t+ ) 4 fx t+) x t x t+ 45) 4 +ν 2 i th t+h f ν) ] +ν 46) ] +ν)) +ν 4 4+ν)H f ν) +ν fx t+ ) +ν fx t+ ) Assume that fx t ) δ for all t, t T Summing up the above inequalities for t =,, T, we get ] +ν)) +ν 4 26+ν)H f ν) T δ +ν T fx t ) fx t+ )) t= fx ) fx T ) fx ) f This gives us the upper bound 44) for the number of iterations The number of calls of oracle at tth iteration of method 4) is equal to i t + Therefore, NF = T i t + ) = T + T 2H log t+ 2 H t t= t= 46) ) 2T + log H + log f ν) 2 H = 2T + log 2 H T H Let us justify now a version of method 4) applicable to convex functions It differs 2

24 from 4) only by the stopping criterion at Step a) Adaptive method III with specific ν, ] Convex functions) Initialization Choose x E and H, H f ν)] Find the smallest integer i such that ft ν,2 i H x )) M ν,2 i H x ) and condition G /2 x, T ν,2 i H x )) is satisfied Set x = T ν,2 i H x ), H = 2 i H 47) Iteration t a) Find the smallest integer i t such that the condition G /2 x t, T ν,2 i t Ht x t )) is satisfied b) Set x t+ = T ν,2 i t Ht x t ) and H t+ = 2 it H t Theorem 7 Let function f in problem 4) be convex Then for any t we have fx t ) fx ) and for any t 2 we have g t 2 +ν ν) t ) +ν)hf ν)d +ν +ν)) ] +ν +ν)hf ν)d +ν)), 48) ] ] +ν) 2 ) 8+ν) +ν 2 t 49) Indeed, in view of Lemma 6, the parameters i t and H t in method 47) satisfy inequalities ) ) 2 i t H t H f ν), H t + 2 H f ν) 42) On the other hand, using the same arguments as in the beginning of the proof of Theorem 5, we conclude that in method 47) we have fx t ) fx t+ ) ] +ν)fxt+ ) fx )) +ν fx 2 +ν 2 i t H t+h f ν))d t+ ) fx )) 2

25 Denote δ t = +ν)fx t+) fx )) α = +ν 2 +ν 2 i t H t +H f ν))d The above inequality is then δ t δ t+ δ α t+ with Note that by the initialization procedure of method 47), we have fx ) 27) 27) { min Qx ; x) + 2i H x E +ν)) x x { min fx) + 2i H +H f ν) x E +ν)) x x fx ) + 2i H +H f ν) +ν)) D This means that δ Hence, from the proof of Theorem 5, we get the bound 2 +ν ) δ α Using Lemma we obtain, as in Theorem 5, ] +ν δ t + t ) 4+ν) Taking into account the bounds 42), we obtain 48) The proof of the bound 49) is very similar to the proof of inequality 42) Let us fix a number k, k < t Then for any s, k s t, we have fx s ) fx s+ ) 45) 2 +ν 2 i sh s +H f ν) ] +ν 42) ] +ν)) +ν 2 +ν)h f ν) Summing up these inequalities for s = k,, t, we get Thus, ] ] +ν)) +ν 2 +ν)h f ν) t k + )gt ) +ν +ν)) +ν 2 +ν)h f ν) gt ) +ν)hf ν) +ν 2 48) 2 2 +ν)) +ν fx s+ ) +ν fx s+ ) t s=k +ν fx s+ ) fx k ) fx t+ ) fx k ) f ] +ν fx k ) fx ) t k+ ] +ν)hf ν)d +ν +ν +ν)) t k+ ] +ν)hf ν)d +ν +ν)) +ν 8+ν) ν) k ) ] +ν ] +ν t k+)k ) +ν Choosing k 2 t, we get t k + )k )+ν ) t 2 This lower bound justifies inequality 49) 22

26 5 Universal methods for decreasing the norm of gradient Let us start with the following auxiliary result Note that its conditions are identical to conditions of Lemma 4 Lemma 7 Let x + = T,H x) for some x E and H > If for some δ > and ν, ] we have ] 2 CHf ν) +ν fx + ) δ and H ) ν +ν +ν)) δ, 5) with constant C 6, then f x) fx + ) H 2 x x + 52) If f is convex, then f x) fx + ) H x x + 5) Denote r = x + x Then f x) fx + ) 27) f x), x x f x)x + x), x + x H f ν)r +ν)) 54) 22) = 2 2 f x)x + x), x + x + H 2 r H f ν) +ν)) r Using matrix inequality 42) with ν =, we get f x) fx + ) H 4 r H 7) f ν) +ν)) r H 4 r H C r 2 Hr If f is convex, then from inequality 54) we obtain f x) fx + ) H 2 r H 7) f ν) +ν)) r H 2 r H C r Hr Let us introduce now the following stopping criterion { U κ x, H) fx) ft,h x)) κ ft H /2,H x)) /2, 55) where the parameter κ belongs to the interval, ) Note that this criterion does not depend on Hölder parameter ν, ] Denote ] 2 6Hf ν) +ν ξ ν δ) = ) ν +ν +ν)) δ, ˆξδ) = inf ξ νδ) ν In view of inequalities 5), 52), and 5), the following statement is valid 2

27 Corollary 2 For non-convex functions, if H ξ ν δ) and ft,h x)) δ, the criterion U /2 x, H) is satisfied If function is convex, the same conditions are sufficient for satisfying U / x, H) For U /2, it is enough to combine inequalities 5) and 52) For U / we apply 5) and 5) Same as in Section, the simplest universal method for decreasing the norm of the gradient is based on cubic regularization Universal Method II General functions) Initialization Choose x E and H, ˆξδ) ] Iteration t a) Find the smallest integer i t such that either criterion 56) U /2 x t, 2 i t H t ) is satisfied, or ft,2 i tht x t )) δ b) Set x t+ = T,2 i t Ht x t ) and H t+ = 2 i t H t c) If fx t+ ) δ, then Stop Theorem 8 Let objective function f in problem 4) be below bounded by value f and H f ν) < + for some ν, ] Assume that for all t, t T +, we have fx t ) δ Then the number of such steps of method 56) is bounded as follows T 2 ] 6Hf ν) +ν 2fx ) f ) ) +ν +ν)) δ 57) The number of calls of the oracle in this scheme is bounded as follows: ] 2 6Hf ν) NF 2T + log 2 +ν)) Indeed, in view of Corollary 2, we have +ν δ ) ν +ν ) log 2 H 58) 2 i t H t 2ξ ν δ), t =,, T, H t ξ ν δ), t =,, T + 59) 24

28 Therefore, in view of definition 55), we have ] /2 fx t ) fx t+ ) 2 2 i t H t δ /2 59) /2 2 2ξ νδ)] δ /2 Summing up these inequalities for t =,, T, we get T 2fx ) fx T )) 2ξ ν δ)] /2 ) /2 δ ] 2 ] 6Hf ν) +ν = 2fx ) f ) 2 ) ν /2 +ν +ν)) δ δ = 2 ] 6Hf ν) +ν 2fx ) f ) ) +ν +ν)) δ We can bound the number of calls of oracle in this scheme as follows The number of calls of oracle at tth iteration of method 56) is equal to i t + Therefore, NF = T i t + ) = T + T 2H log t+ 2 H t t= t= 59) ] 2 6Hf ν) 2T + log 2 +ν)) +ν δ ) ν +ν ) /2 = 2T + log 2 H T H ) log 2 H Let us look now at the universal method for finding a point with small norm of the gradient of convex function Universal Method III Convex functions) Initialization Choose x E and H, ˆξδ) ] Iteration t a) Find the smallest integer i t such that either criterion 5) U / x t, 2 i t H t ) is satisfied, or ft,2 i t Ht x t )) δ b) Set x t+ = T,2 i t Ht x t ) and H t+ = 2 i t H t c) If fx t+ ) δ, then Stop 25

29 Theorem 9 Let the objective function in problem 4) be convex and H f ν) < + for some ν, ] Assume that the sequence of points {x t t is generated by method 5) and fx t ) δ for all t, t T + Denote by m the first iteration number such that fx m+ ) f 6ξ ν δ)d Then { m ln/2) ln max fx, log ) f 2, 5) 8ξ ν δ)d and for all k m we have fx k+ ) f 42ξ νδ)d k m) 2 52) At the same time, if T = m + s for some integer s, then g T def 2 = min fx k) ξ ν δ) 2D kt + T m) 5) Moreover, the maximal number of such steps T is bounded from above as follows: Indeed, in view of Corollary 2, we have ) 6Hf ν) +ν T m + 2D +ν))δ 54) 2 i kh k 2ξ ν δ), k =,, T, H k ξ ν δ), k =,, T + 55) Therefore, in view of definition 55), we have fx k ) fx k+ ) 55) ] /2 2 i k fxk+ ) /2 H k 55) 28) 2ξ νδ) 2ξ νδ)d 2ξ νδ)d ] /2 fxk+ ) /2 ] /2 fxk+ ) x k+ x ) /2 ] /2 fxk+ ) f ) /2 56) Denoting now δ k = fx k+) f, we see that this sequence satisfies condition ) with 8ξ νδ)d α = 2 Since m is the first iteration number such that δ m 2, in view of inequality 2) we have { m ln/2) ln max{, log 2 δ ln/2) ln max fx, log ) f 2 8ξ νδ)d 26

30 At the same time, in view of inequality ), for k > m we get the following rate of convergence: fx k+ ) f 8ξ ν δ)d Thus, we get inequality 52) Further, let T = m + s for some s Then 42ξ νδ)d 4s 2 52) fx m+2s+ ) f = fx T + ) f + ] ξ k m < ν δ)d k m) 2 T k=m+2s+ fx k ) fx k+ )) 56) s 2ξ ν δ)] /2 g T ) /2 Therefore, g T 5 2 s ξ /2 ν δ)d ] 2/ = 5 2) / T m ) 2 ξν δ)d 2 < 2D T m) 2 ξν δ) Since g T δ, we conclude that T m + 2D δ ξ νδ) ] ] /2 6Hf ν) +ν = m + ) 2D +ν +ν)) δ Comparing the efficiency bound 54) with the rate of convergence 49) of the adaptive method 47) with known value of the smoothness parameter ν, we can see that the universal method 5) ensures the same dependence on the accuracy δ as method 47) Since the value ν is not employed in the scheme of method 5), the bound 54) can be strengthen as follows: ) 6Hf ν) +ν T m + 2D inf ν +ν))δ 57) 27

31 References ] HS Dollar, NIM Gould, and DP Robinson On solving trust-region and other regularised subproblems in optimization Technical Report RAL-TR-29-, Rutherford Appleton Laboratory 29) 2] C Cartis, N I M Gould, and Ph L Toint Adaptive cubic regularisation methods for unconstrained optimization Mathematical Programming, 272), ) ] C Cartis, N I M Gould, and Ph L Toint On the evaluation complexity of cubic regularization methods for potentially rank-deficient nonlinear least-squares problems and its relevance to constrained nonlinear optimization SIOPT, 2), ) 4] FE Curtis, DP Robinson, and M Samadi A trust-region algorithm with a worstcase iteration complexity of Oϵ /2 ) for nonconvex optimization Mathematical Programming, DOI: 7/s ) 5] Y Hsia, R Shew, R, and Y Yuan On the p-regularized trust region subproblem arxiv: ) 6] JM Martínez, and M Raydan Cubic-regularization counterpart of a variable-norm trust-region method for unconstrained minimization Optimization Online 25) 7] Yu Nesterov Universal gradient methods for convex optimization problems Mathematical Programming, 52-2), ) 8] Yu Nesterov Accelerating the cubic regularization of Newton s method on convex problems Mathematical Programming, 2), ) 9] Yu Nesterov, B Polyak Cubic regularization of Newton s method and its global performance Mathematical Programming, 8), ) 28

Cubic regularization of Newton s method for convex problems with constraints

Cubic regularization of Newton s method for convex problems with constraints CORE DISCUSSION PAPER 006/39 Cubic regularization of Newton s method for convex problems with constraints Yu. Nesterov March 31, 006 Abstract In this paper we derive efficiency estimates of the regularized

More information

Accelerating the cubic regularization of Newton s method on convex problems

Accelerating the cubic regularization of Newton s method on convex problems Accelerating the cubic regularization of Newton s method on convex problems Yu. Nesterov September 005 Abstract In this paper we propose an accelerated version of the cubic regularization of Newton s method

More information

Complexity bounds for primal-dual methods minimizing the model of objective function

Complexity bounds for primal-dual methods minimizing the model of objective function Complexity bounds for primal-dual methods minimizing the model of objective function Yu. Nesterov July 4, 06 Abstract We provide Frank-Wolfe ( Conditional Gradients method with a convergence analysis allowing

More information

Universal Gradient Methods for Convex Optimization Problems

Universal Gradient Methods for Convex Optimization Problems CORE DISCUSSION PAPER 203/26 Universal Gradient Methods for Convex Optimization Problems Yu. Nesterov April 8, 203; revised June 2, 203 Abstract In this paper, we present new methods for black-box convex

More information

Gradient methods for minimizing composite functions

Gradient methods for minimizing composite functions Gradient methods for minimizing composite functions Yu. Nesterov May 00 Abstract In this paper we analyze several new methods for solving optimization problems with the objective function formed as a sum

More information

Nonsymmetric potential-reduction methods for general cones

Nonsymmetric potential-reduction methods for general cones CORE DISCUSSION PAPER 2006/34 Nonsymmetric potential-reduction methods for general cones Yu. Nesterov March 28, 2006 Abstract In this paper we propose two new nonsymmetric primal-dual potential-reduction

More information

Gradient methods for minimizing composite functions

Gradient methods for minimizing composite functions Math. Program., Ser. B 2013) 140:125 161 DOI 10.1007/s10107-012-0629-5 FULL LENGTH PAPER Gradient methods for minimizing composite functions Yu. Nesterov Received: 10 June 2010 / Accepted: 29 December

More information

Primal-dual Subgradient Method for Convex Problems with Functional Constraints

Primal-dual Subgradient Method for Convex Problems with Functional Constraints Primal-dual Subgradient Method for Convex Problems with Functional Constraints Yurii Nesterov, CORE/INMA (UCL) Workshop on embedded optimization EMBOPT2014 September 9, 2014 (Lucca) Yu. Nesterov Primal-dual

More information

An example of slow convergence for Newton s method on a function with globally Lipschitz continuous Hessian

An example of slow convergence for Newton s method on a function with globally Lipschitz continuous Hessian An example of slow convergence for Newton s method on a function with globally Lipschitz continuous Hessian C. Cartis, N. I. M. Gould and Ph. L. Toint 3 May 23 Abstract An example is presented where Newton

More information

Numerical Experience with a Class of Trust-Region Algorithms May 23, for 2016 Unconstrained 1 / 30 S. Smooth Optimization

Numerical Experience with a Class of Trust-Region Algorithms May 23, for 2016 Unconstrained 1 / 30 S. Smooth Optimization Numerical Experience with a Class of Trust-Region Algorithms for Unconstrained Smooth Optimization XI Brazilian Workshop on Continuous Optimization Universidade Federal do Paraná Geovani Nunes Grapiglia

More information

Optimal Newton-type methods for nonconvex smooth optimization problems

Optimal Newton-type methods for nonconvex smooth optimization problems Optimal Newton-type methods for nonconvex smooth optimization problems Coralia Cartis, Nicholas I. M. Gould and Philippe L. Toint June 9, 20 Abstract We consider a general class of second-order iterations

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

On the complexity of an Inexact Restoration method for constrained optimization

On the complexity of an Inexact Restoration method for constrained optimization On the complexity of an Inexact Restoration method for constrained optimization L. F. Bueno J. M. Martínez September 18, 2018 Abstract Recent papers indicate that some algorithms for constrained optimization

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

8 Numerical methods for unconstrained problems

8 Numerical methods for unconstrained problems 8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields

More information

Worst Case Complexity of Direct Search

Worst Case Complexity of Direct Search Worst Case Complexity of Direct Search L. N. Vicente May 3, 200 Abstract In this paper we prove that direct search of directional type shares the worst case complexity bound of steepest descent when sufficient

More information

Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization

Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Frank E. Curtis, Lehigh University Beyond Convexity Workshop, Oaxaca, Mexico 26 October 2017 Worst-Case Complexity Guarantees and Nonconvex

More information

Evaluation complexity for nonlinear constrained optimization using unscaled KKT conditions and high-order models by E. G. Birgin, J. L. Gardenghi, J. M. Martínez, S. A. Santos and Ph. L. Toint Report NAXYS-08-2015

More information

A Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity

A Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity A Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity Mohammadreza Samadi, Lehigh University joint work with Frank E. Curtis (stand-in presenter), Lehigh University

More information

Cubic regularization of Newton method and its global performance

Cubic regularization of Newton method and its global performance Math. Program., Ser. A 18, 177 5 (6) Digital Object Identifier (DOI) 1.17/s117-6-76-8 Yurii Nesterov B.T. Polyak Cubic regularization of Newton method and its global performance Received: August 31, 5

More information

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

Complexity of gradient descent for multiobjective optimization

Complexity of gradient descent for multiobjective optimization Complexity of gradient descent for multiobjective optimization J. Fliege A. I. F. Vaz L. N. Vicente July 18, 2018 Abstract A number of first-order methods have been proposed for smooth multiobjective optimization

More information

Parameter Optimization in the Nonlinear Stepsize Control Framework for Trust-Region July 12, 2017 Methods 1 / 39

Parameter Optimization in the Nonlinear Stepsize Control Framework for Trust-Region July 12, 2017 Methods 1 / 39 Parameter Optimization in the Nonlinear Stepsize Control Framework for Trust-Region Methods EUROPT 2017 Federal University of Paraná - Curitiba/PR - Brazil Geovani Nunes Grapiglia Federal University of

More information

An introduction to complexity analysis for nonconvex optimization

An introduction to complexity analysis for nonconvex optimization An introduction to complexity analysis for nonconvex optimization Philippe Toint (with Coralia Cartis and Nick Gould) FUNDP University of Namur, Belgium Séminaire Résidentiel Interdisciplinaire, Saint

More information

Worst Case Complexity of Direct Search

Worst Case Complexity of Direct Search Worst Case Complexity of Direct Search L. N. Vicente October 25, 2012 Abstract In this paper we prove that the broad class of direct-search methods of directional type based on imposing sufficient decrease

More information

Primal-dual subgradient methods for convex problems

Primal-dual subgradient methods for convex problems Primal-dual subgradient methods for convex problems Yu. Nesterov March 2002, September 2005 (after revision) Abstract In this paper we present a new approach for constructing subgradient schemes for different

More information

This manuscript is for review purposes only.

This manuscript is for review purposes only. 1 2 3 4 5 6 7 8 9 10 11 12 THE USE OF QUADRATIC REGULARIZATION WITH A CUBIC DESCENT CONDITION FOR UNCONSTRAINED OPTIMIZATION E. G. BIRGIN AND J. M. MARTíNEZ Abstract. Cubic-regularization and trust-region

More information

Higher-Order Methods

Higher-Order Methods Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth

More information

Lecture 15 Newton Method and Self-Concordance. October 23, 2008

Lecture 15 Newton Method and Self-Concordance. October 23, 2008 Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications

More information

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical

More information

Cubic-regularization counterpart of a variable-norm trust-region method for unconstrained minimization

Cubic-regularization counterpart of a variable-norm trust-region method for unconstrained minimization Cubic-regularization counterpart of a variable-norm trust-region method for unconstrained minimization J. M. Martínez M. Raydan November 15, 2015 Abstract In a recent paper we introduced a trust-region

More information

Unconstrained minimization of smooth functions

Unconstrained minimization of smooth functions Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and

More information

A trust region algorithm with a worst-case iteration complexity of O(ɛ 3/2 ) for nonconvex optimization

A trust region algorithm with a worst-case iteration complexity of O(ɛ 3/2 ) for nonconvex optimization Math. Program., Ser. A DOI 10.1007/s10107-016-1026-2 FULL LENGTH PAPER A trust region algorithm with a worst-case iteration complexity of O(ɛ 3/2 ) for nonconvex optimization Frank E. Curtis 1 Daniel P.

More information

Spectral gradient projection method for solving nonlinear monotone equations

Spectral gradient projection method for solving nonlinear monotone equations Journal of Computational and Applied Mathematics 196 (2006) 478 484 www.elsevier.com/locate/cam Spectral gradient projection method for solving nonlinear monotone equations Li Zhang, Weijun Zhou Department

More information

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications Weijun Zhou 28 October 20 Abstract A hybrid HS and PRP type conjugate gradient method for smooth

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

Algorithms for constrained local optimization

Algorithms for constrained local optimization Algorithms for constrained local optimization Fabio Schoen 2008 http://gol.dsi.unifi.it/users/schoen Algorithms for constrained local optimization p. Feasible direction methods Algorithms for constrained

More information

A Unified Approach to Proximal Algorithms using Bregman Distance

A Unified Approach to Proximal Algorithms using Bregman Distance A Unified Approach to Proximal Algorithms using Bregman Distance Yi Zhou a,, Yingbin Liang a, Lixin Shen b a Department of Electrical Engineering and Computer Science, Syracuse University b Department

More information

Optimisation in Higher Dimensions

Optimisation in Higher Dimensions CHAPTER 6 Optimisation in Higher Dimensions Beyond optimisation in 1D, we will study two directions. First, the equivalent in nth dimension, x R n such that f(x ) f(x) for all x R n. Second, constrained

More information

Iteration and evaluation complexity on the minimization of functions whose computation is intrinsically inexact

Iteration and evaluation complexity on the minimization of functions whose computation is intrinsically inexact Iteration and evaluation complexity on the minimization of functions whose computation is intrinsically inexact E. G. Birgin N. Krejić J. M. Martínez September 5, 017 Abstract In many cases in which one

More information

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL)

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL) Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization Nick Gould (RAL) x IR n f(x) subject to c(x) = Part C course on continuoue optimization CONSTRAINED MINIMIZATION x

More information
