Complexity bounds for primal-dual methods minimizing the model of objective function
Yu. Nesterov

July 4, 2016

Abstract

We provide the Frank-Wolfe (Conditional Gradient) method with a convergence analysis allowing to approach a primal-dual solution of a convex optimization problem with composite objective function. Additional properties of the complementary part of the objective (strong convexity) significantly accelerate the scheme. We also justify a new variant of this method, which can be seen as a trust-region scheme applied to the linear model of the objective function. For this variant, we also prove the rate of convergence for the total variation of the linear model of the composite objective over the feasible set. Our analysis works also for a quadratic model, allowing us to justify the global rate of convergence for a new second-order method as applied to a composite objective function. To the best of our knowledge, this is the first trust-region scheme supported by a worst-case complexity analysis both for the functional gap and for the variation of the local quadratic model over the feasible set.

Keywords: convex optimization, complexity bounds, linear optimization oracle, conditional gradient method, trust-region method.

Center for Operations Research and Econometrics (CORE), Catholic University of Louvain (UCL), 34 voie du Roman Pays, 1348 Louvain-la-Neuve, Belgium; Yurii.Nesterov@uclouvain.be. The research results presented in this paper have been supported by the grant "Action de recherche concertée" ARC 4/9-060 from the Direction de la recherche scientifique - Communauté française de Belgique. Scientific responsibility rests with the author.
1 Introduction

Motivation. In recent years, we can see an increasing interest in the Frank-Wolfe algorithm [3, 2, 11], which is sometimes called the Conditional Gradient Method (CGM) [5, 7, 9, 8]. At each iteration of this scheme, we need to solve an auxiliary problem of minimizing a linear function over a convex feasible set. In some situations, mainly when the feasible set is a simple polytope, the complexity of this subproblem is much lower than that of the standard projection technique (e.g. [12]). The standard complexity results for this method are related to a convex objective function with Lipschitz-continuous gradient. In this situation, CGM converges as $O(1/k)$, where $k$ is the number of iterations. Moreover, it appears that this rate of convergence is optimal for methods with a linear optimization oracle [10].

For nonsmooth functions, CGM cannot converge (we give a simple example in Section 2). Therefore, it is interesting to study the dependence of the rate of convergence of CGM on the level of smoothness of the objective function. On the other hand, sometimes nonsmoothness of the objective function results from a complementary regularization term. This situation can be treated in the framework of composite minimization [14]. However, the performance of CGM for this structure of the objective function has not been studied yet. Finally, by its spirit, CGM is a primal-dual method. Indeed, it generates lower bounds on the optimal value of the objective function, which converge to the optimal value [4]. Therefore, it would be natural to extract from this method an approximate solution of the dual problem. These questions served as the main motivations for this paper.

Contents. In Section 2, we introduce our main problem of interest, where the objective function has a composite form. Our main assumption is that the problem of minimizing a linear function augmented by a simple convex complementary term is solvable.
We assume that the smooth part of the objective has Hölder-continuous gradients. For proving the efficiency estimate for CGM, we apply the technique of estimating sequences (e.g. [12]) in its extended form [18]. As a result, we get a significant freedom in the choice of step-size coefficients. In the next Section 3, we consider a new variant of CGM, which can be seen as a trust-region method with a linear model of the objective. For this scheme, the trust region is formed by contracting the feasible set towards the current test point. This method ensures an appropriate rate of convergence for the total variation of the linear model, computed at the sequence of test points. The analytical form of our bounds for the primal-dual gap is similar to the bounds obtained in [4]. However, we estimate the difference between the current value of the objective function and the minimal value of the accumulated linear model. In Section 4, we explain how to extract from our convergence results an upper bound on the duality gap for some feasible primal-dual solution. In our technique, we use an explicit max-representation of the smooth part of the objective function. In Section 5, we show that additional properties of the complementary part of the objective (strong convexity) significantly accelerate the scheme. Finally, in Section 6, we apply our technique to a new second-order trust-region method, where the quadratic approximation of our objective function is minimized on a trust region formed by a contracted feasible set. To the best of our knowledge, this is the first trust-region scheme [1] supported by a worst-case complexity analysis. (Performance of CGM as applied to an objective function regularized by a norm was studied in [6].)

Notation. In what follows, we consider optimization problems over a finite-dimensional linear space $E$ with dual space $E^*$. The value of a linear function $s \in E^*$ at $x \in E$ is denoted by $\langle s, x \rangle$. In $E$, we fix a norm $\|\cdot\|$, which defines the conjugate norm
$$\|s\|_* = \max_x \{ \langle s, x\rangle : \|x\| \le 1 \}, \quad s \in E^*.$$
For a linear operator $A : E_1 \to E_2^*$, its conjugate operator $A^* : E_2 \to E_1^*$ is defined by the identity $\langle Ax, y\rangle = \langle A^* y, x\rangle$, $x \in E_1$, $y \in E_2$. We call an operator $B : E \to E^*$ self-conjugate if $B^* = B$. It is positive semidefinite if $\langle Bx, x\rangle \ge 0$ for all $x \in E$. For a linear operator $B : E \to E^*$, we define its operator norm in the usual way:
$$\|B\| = \max_x \{ \|Bx\|_* : \|x\| \le 1 \}.$$
For a differentiable function $f$ with $\mathrm{dom}\, f \subseteq E$, we denote by $\nabla f(x) \in E^*$ its gradient, and by $\nabla^2 f(x) : E \to E^*$ its Hessian. Note that $(\nabla^2 f(x))^* = \nabla^2 f(x)$.

In the sequel, we often need to estimate partial sums of different series. For that, it is convenient to use the following trivial lemma.

Lemma 1. Let the function $\xi(\tau)$, $\tau \in \mathbb{R}$, be decreasing and convex. Then, for any two integers $a$ and $b$ such that $[a - \tfrac12,\, b+1] \subseteq \mathrm{dom}\,\xi$, we have
$$\int_a^{b+1} \xi(\tau)\, d\tau \;\le\; \sum_{k=a}^{b} \xi(k) \;\le\; \int_{a-1/2}^{b+1/2} \xi(\tau)\, d\tau. \qquad (1.1)$$
For example, for any $t \ge 0$ and $p \ge 0$, we have
$$\sum_{k=t}^{2t+p} \frac{1}{k+p+1} \;\overset{(1.1)}{\ge}\; \int_t^{2t+p+1} \frac{d\tau}{\tau+p+1} = \ln(\tau+p+1)\Big|_t^{2t+p+1} = \ln \frac{2t+2p+2}{t+p+1} = \ln 2. \qquad (1.2)$$
On the other hand, if $t \ge 1$, then
$$\sum_{k=t}^{2t+1} \frac{1}{(k+2)^2} \;\overset{(1.1)}{\le}\; \int_{t-1/2}^{2t+3/2} \frac{d\tau}{(\tau+2)^2} = -\frac{1}{\tau+2}\Big|_{t-1/2}^{2t+3/2} = \frac{1}{t+3/2} - \frac{1}{2t+7/2} = \frac{4t+8}{(2t+3)(4t+7)} \le \frac{1}{t+3}. \qquad (1.3)$$

In this paper, we will use some special dual functions. Let $Q \subseteq E$ be a bounded closed convex set. For a closed convex function $F$ with $Q \subseteq \mathrm{dom}\, F$, we define its restricted dual function (with respect to a central point $\bar x \in Q$) as follows:
$$\bar F_{\bar x, Q}(s) = \max_{x \in Q} \{ \langle s, \bar x - x\rangle + F(\bar x) - F(x) \}, \quad s \in E^*. \qquad (1.4)$$
Clearly, this function is well defined for all $s \in E^*$. Moreover, it is convex and nonnegative on $E^*$.
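As a quick numerical sanity check of Lemma 1 (an illustrative snippet, not part of the paper), one can compare the three quantities in (1.1) for the decreasing convex function $\xi(\tau) = 1/\tau^2$, whose integral is available in closed form:

```python
# Check of Lemma 1 for xi(t) = 1/t^2 (decreasing and convex on (0, +inf)):
#   int_a^{b+1} xi <= sum_{k=a}^{b} xi(k) <= int_{a-1/2}^{b+1/2} xi.

def integral(lo, hi):
    # exact integral of 1/t^2 over [lo, hi]; antiderivative is -1/t
    return 1.0 / lo - 1.0 / hi

a, b = 2, 50
s = sum(1.0 / k**2 for k in range(a, b + 1))
lower = integral(a, b + 1)
upper = integral(a - 0.5, b + 0.5)
assert lower <= s <= upper
```

The two integrals sandwich the sum tightly, which is exactly how the lemma is used later to bound the sums of step-size coefficients.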
In the theory of Nonsmooth Convex Optimization, dual functions are often used for analyzing the behavior of first-order methods. In these applications, the function $F$ in (1.4) is assumed to be a multiple of a strongly convex prox-function (see, for example, [13]). Such an assumption provides a possibility to control the step sizes in the corresponding optimization schemes. In this paper, we are going to drop the assumption of strong convexity. Hence, we need to develop another mechanism for controlling the sizes of our moves. This is the reason why we introduce in construction (1.4) an additional scaling parameter $\tau \in [0,1]$. We call the function
$$F_{\tau, \bar x, Q}(s) = \max_{x \in Q} \{ \langle s, \bar x - y\rangle + F(\bar x) - F(y) : y = (1-\tau)\bar x + \tau x \}, \quad s \in E^*, \qquad (1.5)$$
the scaled restricted dual of the function $F$.

Lemma 2. For any $s \in E^*$ and $\tau \in [0,1]$, we have
$$\bar F_{\bar x, Q}(s) \;\ge\; F_{\tau, \bar x, Q}(s) \;\ge\; \tau\, \bar F_{\bar x, Q}(s). \qquad (1.6)$$
Proof: Since for any $x \in Q$ the point $y = (1-\tau)\bar x + \tau x$ belongs to $Q$, the first inequality is trivial. On the other hand, using convexity of $F$,
$$F_{\tau, \bar x, Q}(s) = \max_{x \in Q}\{ \langle s, \tau(\bar x - x)\rangle + F(\bar x) - F(y) : y = (1-\tau)\bar x + \tau x \} \ge \max_{x \in Q}\{ \langle s, \tau(\bar x - x)\rangle + F(\bar x) - (1-\tau)F(\bar x) - \tau F(x) \} = \tau \bar F_{\bar x, Q}(s).$$

2 Conditional gradient method

In this paper, we consider numerical methods for solving the following composite minimization problem:
$$\min_x \left\{ \bar f(x) \overset{\mathrm{def}}{=} f(x) + \Psi(x) \right\}, \qquad (2.1)$$
where $\Psi$ is a simple closed convex function with bounded domain $Q \subseteq E$, and $f$ is a convex function, which is subdifferentiable on $Q$. Denote by $x^*$ one of the optimal solutions of (2.1), and $D \overset{\mathrm{def}}{=} \mathrm{diam}(Q)$.

Our assumption on simplicity of the function $\Psi$ means that some auxiliary optimization problems related to $\Psi$ are easily solvable. The complexity of these problems will always be discussed for the corresponding optimization schemes. The most important examples of the function $\Psi$ are as follows.

- $\Psi$ is an indicator function of a closed convex set $Q$:
$$\Psi(x) = \mathrm{Ind}_Q(x) \overset{\mathrm{def}}{=} \begin{cases} 0, & x \in Q, \\ +\infty, & \text{otherwise}. \end{cases} \qquad (2.2)$$
- $\Psi$ is a self-concordant barrier for a closed convex set $Q$ (see [16, 12]).
- $\Psi$ is a nonsmooth convex function with simple structure (e.g. $\Psi(x) = \|x\|$).
We assume that the function $f$ is represented by a black-box oracle. If it is a first-order oracle, we assume its gradients satisfy the following Hölder condition:
$$\|\nabla f(x) - \nabla f(y)\|_* \le G_\nu \|x - y\|^\nu, \quad x, y \in Q. \qquad (2.3)$$
The constant $G_\nu$ is formally defined for any $\nu \in (0,1]$. For some values of $\nu$ it can be $+\infty$. Note that for any $x$ and $y$ in $Q$ we have
$$f(y) \le f(x) + \langle \nabla f(x), y - x\rangle + \frac{G_\nu}{1+\nu} \|y - x\|^{1+\nu}. \qquad (2.4)$$
If this is a second-order oracle, we assume that its Hessians satisfy the Hölder condition
$$\|\nabla^2 f(x) - \nabla^2 f(y)\| \le H_\nu \|x - y\|^\nu, \quad x, y \in Q. \qquad (2.5)$$
In this case, for any $x$ and $y$ in $Q$ we have
$$f(y) \le f(x) + \langle \nabla f(x), y - x\rangle + \tfrac12 \langle \nabla^2 f(x)(y-x), y - x\rangle + \frac{H_\nu \|y-x\|^{2+\nu}}{(1+\nu)(2+\nu)}. \qquad (2.6)$$
For the first variant of the Conditional Gradient Method (CGM), our assumption on simplicity of the function $\Psi$ means exactly the following.

Assumption 1. For any $s \in E^*$, the auxiliary problem
$$\min_{x \in Q} \{ \langle s, x\rangle + \Psi(x) \} \qquad (2.7)$$
is easily solvable. Denote by $v_\Psi(s) \in Q$ one of its optimal solutions.

In the case (2.2), this assumption implies that we are able to solve the problem $\min_x \{ \langle s, x\rangle : x \in Q \}$. Note that the point $v_\Psi(s)$ is characterized by the following variational principle:
$$\langle s, x - v_\Psi(s)\rangle + \Psi(x) \ge \Psi(v_\Psi(s)), \quad x \in Q. \qquad (2.8)$$
In order to solve problem (2.1), we apply the following method.

Conditional Gradient Method, Type I

1. Choose an arbitrary point $x_0 \in Q$.
2. For $t \ge 0$ iterate: $\qquad (2.9)$
   a) Compute $v_t = v_\Psi(\nabla f(x_t))$.
   b) Choose $\tau_t \in (0,1]$ and set $x_{t+1} = (1-\tau_t) x_t + \tau_t v_t$.

It is clear that this method can minimize only functions with continuous gradient.
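For intuition, here is a minimal numerical sketch of method (2.9) (illustrative, not from the paper) for a smooth instance: $f(x) = \tfrac12\|x - b\|^2$ with $\Psi$ the indicator of the standard simplex, so that $v_\Psi(s)$ reduces to picking a vertex minimizing $\langle s, x\rangle$; the step sizes follow the rule $\tau_t = 2/(t+2)$ analyzed below.

```python
# Sketch of Conditional Gradient Method, Type I (2.9), under illustrative choices:
# f(x) = 0.5*||x - b||^2 over the standard simplex Q (Psi = Ind_Q).
# The linear minimization oracle v_Psi(s) = argmin_{x in Q} <s, x> is the vertex
# e_i with i = argmin_i s_i.

def cgm_type1(grad, lmo, x0, iters):
    x = list(x0)
    for t in range(iters):
        v = lmo(grad(x))                  # step (a): v_t = v_Psi(grad f(x_t))
        tau = 2.0 / (t + 2)               # step (b): tau_t = 2/(t+2)
        x = [(1 - tau) * xi + tau * vi for xi, vi in zip(x, v)]
        # x_{t+1} = (1 - tau_t) x_t + tau_t v_t stays feasible by convexity of Q
    return x

b = [0.6, 0.3, 0.1]                       # b lies on the simplex, so x* = b
grad = lambda x: [xi - bi for xi, bi in zip(x, b)]

def lmo(s):                               # vertex of the simplex minimizing <s, x>
    i = min(range(len(s)), key=lambda j: s[j])
    return [1.0 if j == i else 0.0 for j in range(len(s))]

x = cgm_type1(grad, lmo, [1.0, 0.0, 0.0], 300)
gap = 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))
assert gap < 0.05                         # O(1/t) decrease of the functional gap
```

The iterate remains a convex combination of simplex vertices, so feasibility is automatic; only the linear oracle is ever called, never a projection.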
Example 1. Let $\Psi(x) = \mathrm{Ind}_Q(x)$ with $Q = \{x \in \mathbb{R}^2 : (x^{(1)})^2 + (x^{(2)})^2 \le 2\}$. Define $f(x) = \max\{x^{(1)}, x^{(2)}\}$. Then clearly $x^* = (-1, -1)^T$. Let us choose in (2.9) $x_0 \ne x^*$. For the function $f$, we can apply an oracle which returns at any $x \in Q$ a subgradient $\nabla f(x) \in \{(1,0)^T, (0,1)^T\}$. Then, for any feasible $x$, the point $v_\Psi(\nabla f(x))$ is equal either to $y_1 = (-\sqrt2, 0)^T$ or to $y_2 = (0, -\sqrt2)^T$. Therefore, all points of the sequence $\{x_t\}_{t\ge0}$ generated by method (2.9) belong to the triangle $\mathrm{Conv}\{x_0, y_1, y_2\}$, which does not contain the optimal point $x^*$.

In order to justify the rate of convergence of method (2.9) for functions with Hölder-continuous gradients, we apply the estimating sequences technique [12] in its relaxed form [18]. For that, it is convenient to introduce in (2.9) new control variables. Consider a sequence of nonnegative weights $\{a_t\}_{t\ge0}$. Define
$$A_t = \sum_{k=0}^t a_k, \qquad \tau_t = \frac{a_{t+1}}{A_{t+1}}, \quad t \ge 0. \qquad (2.10)$$
From now on, we assume that the parameter $\tau_t$ in method (2.9) is chosen in accordance with the rule (2.10). Denote
$$V_0 = \max_{x \in Q} \{ \langle \nabla f(x_0), x_0 - x\rangle + \Psi(x_0) - \Psi(x) \}, \qquad B_{\nu,t} = a_0 V_0 + \sum_{k=1}^t \frac{a_k^{1+\nu}}{A_k^\nu}\, G_\nu D^{1+\nu}, \quad t \ge 0. \qquad (2.11)$$
It is clear that
$$V_0 \overset{(2.4)}{\le} \max_{x \in Q}\left\{ f(x_0) - f(x) + \frac{G_\nu}{1+\nu}\|x - x_0\|^{1+\nu} + \Psi(x_0) - \Psi(x) \right\} \le \bar f(x_0) - \bar f(x^*) + \frac{G_\nu D^{1+\nu}}{1+\nu} \overset{\mathrm{def}}{=} \Delta(x_0) + \frac{G_\nu D^{1+\nu}}{1+\nu}. \qquad (2.12)$$

Theorem 1. Let the sequence $\{x_t\}_{t\ge0}$ be generated by method (2.9). Then, for any $\nu \in (0,1]$, any step $t \ge 0$, and any $x \in Q$ we have
$$A_t(f(x_t) + \Psi(x_t)) \le \sum_{k=0}^t a_k [ f(x_k) + \langle \nabla f(x_k), x - x_k\rangle + \Psi(x) ] + B_{\nu,t}. \qquad (2.13)$$

Proof:
Indeed, in view of definition (2.11), for $t = 0$ inequality (2.13) is satisfied. Assume that it is valid for some $t \ge 0$. Then
$$\sum_{k=0}^{t+1} a_k [ f(x_k) + \langle \nabla f(x_k), x - x_k\rangle + \Psi(x) ] + B_{\nu,t} \overset{(2.13)}{\ge} A_t(f(x_t) + \Psi(x_t)) + a_{t+1}[f(x_{t+1}) + \langle \nabla f(x_{t+1}), x - x_{t+1}\rangle + \Psi(x)]$$
$$\ge A_{t+1} f(x_{t+1}) + A_t \Psi(x_t) + \langle \nabla f(x_{t+1}), a_{t+1}(x - x_{t+1}) + A_t(x_t - x_{t+1})\rangle + a_{t+1}\Psi(x)$$
$$\overset{(2.9b)}{=} A_{t+1} f(x_{t+1}) + A_t \Psi(x_t) + a_{t+1}[\Psi(x) + \langle \nabla f(x_{t+1}), x - v_t\rangle]$$
$$\overset{(2.9b)}{\ge} A_{t+1}(f(x_{t+1}) + \Psi(x_{t+1})) + a_{t+1}[\Psi(x) - \Psi(v_t) + \langle \nabla f(x_{t+1}), x - v_t\rangle].$$
It remains to note that
$$\Psi(x) - \Psi(v_t) + \langle \nabla f(x_{t+1}), x - v_t\rangle \overset{(2.8)}{\ge} \langle \nabla f(x_{t+1}) - \nabla f(x_t), x - v_t\rangle \overset{(2.3)}{\ge} -\tau_t^\nu G_\nu D^{1+\nu}.$$
Thus, for keeping (2.13) valid for the next iteration, it is enough to choose $B_{\nu,t+1} = B_{\nu,t} + \frac{a_{t+1}^{1+\nu}}{A_{t+1}^\nu}\, G_\nu D^{1+\nu}$.

Corollary 1. For any $t \ge 0$ with $A_t > 0$, and any $\nu \in (0,1]$ we have
$$\bar f(x_t) - \bar f(x^*) \le \frac{1}{A_t} B_{\nu,t}. \qquad (2.14)$$
Let us now discuss the possible variants for choosing the weights $\{a_t\}_{t\ge0}$.

1. Constant weights. Let us choose $a_t \equiv 1$, $t \ge 0$. Then $A_t = t+1$, and for $\nu \in (0,1)$ we have
$$B_{\nu,t} \overset{(2.11)}{=} V_0 + \sum_{k=1}^t \frac{G_\nu D^{1+\nu}}{(1+k)^\nu} \overset{(1.1)}{\le} V_0 + G_\nu D^{1+\nu} \int_{1/2}^{t+1/2} \frac{d\tau}{(1+\tau)^\nu} \overset{(2.12)}{\le} \Delta(x_0) + G_\nu D^{1+\nu}\left[ \frac{1}{1+\nu} + \frac{(3/2)^{1-\nu}}{1-\nu}\left( \left(1 + \tfrac{2t}{3}\right)^{1-\nu} - 1 \right) \right].$$
Thus, for $\nu \in (0,1)$, we have $\frac{1}{A_t} B_{\nu,t} \le O(t^{-\nu})$. For the most important case $\nu = 1$, we have $\lim_{\nu \to 1} \frac{(3/2)^{1-\nu}}{1-\nu}\left( (1 + \frac{2t}{3})^{1-\nu} - 1 \right) = \ln\left(1 + \frac{2t}{3}\right)$. Therefore,
$$\bar f(x_t) - \bar f(x^*) \le \frac{1}{t+1}\left( \Delta(x_0) + G_1 D^2 \left[ \tfrac12 + \ln\left(1 + \tfrac{2t}{3}\right) \right] \right). \qquad (2.15)$$
In this situation, in method (2.9) we take $\tau_t \overset{(2.10)}{=} \frac{1}{t+2}$.
2. Linear weights. Let us choose $a_t \equiv t$, $t \ge 0$. Then $A_t = \frac{t(t+1)}{2}$, and for $\nu \in (0,1)$ with $t \ge 1$ we have
$$B_{\nu,t} \overset{(2.11)}{=} 2^\nu \sum_{k=1}^t \frac{k^{1+\nu}}{k^\nu (1+k)^\nu}\, G_\nu D^{1+\nu} \le 2^\nu G_\nu D^{1+\nu} \sum_{k=1}^t k^{1-\nu} \overset{(1.1)}{\le} \frac{2^\nu}{2-\nu}\left[ \left(t+\tfrac12\right)^{2-\nu} - \left(\tfrac12\right)^{2-\nu} \right] G_\nu D^{1+\nu}.$$
Thus, for $\nu \in (0,1)$, we again have $\frac{1}{A_t} B_{\nu,t} \le O(t^{-\nu})$. For the case $\nu = 1$, we get the following bound:
$$\bar f(x_t) - \bar f(x^*) \le \frac{4}{t+1}\, G_1 D^2, \quad t \ge 1. \qquad (2.16)$$
As we can see, this rate of convergence is better than (2.15). In this case, in method (2.9) we take $\tau_t \overset{(2.10)}{=} \frac{2}{t+2}$, which is a standard recommendation for CGM.

3. Aggressive weights. Let us choose, for example, $a_t \equiv t^2$, $t \ge 0$. Then $A_t = \frac{t(t+1)(2t+1)}{6}$. Note that for $k \ge 1$ we have $(k+1)(2k+1) \ge 2k^2$. Therefore, for $\nu \in (0,1)$ with $t \ge 1$ we obtain
$$B_{\nu,t} \overset{(2.11)}{=} 6^\nu \sum_{k=1}^t \frac{k^{2(1+\nu)}}{k^\nu (k+1)^\nu (2k+1)^\nu}\, G_\nu D^{1+\nu} \le 3^\nu G_\nu D^{1+\nu} \sum_{k=1}^t k^{2-\nu} \overset{(1.1)}{\le} \frac{3^\nu}{3-\nu}\left[ \left(t+\tfrac12\right)^{3-\nu} - \left(\tfrac12\right)^{3-\nu} \right] G_\nu D^{1+\nu}.$$
For $\nu \in (0,1)$, we again get $\frac{1}{A_t} B_{\nu,t} \le O(t^{-\nu})$. For $\nu = 1$, we obtain
$$\bar f(x_t) - \bar f(x^*) \le \frac{9}{2t+1}\, G_1 D^2, \quad t \ge 1, \qquad (2.17)$$
which is slightly worse than (2.16). The rule for choosing the coefficients $\tau_t$ in this situation is $\tau_t \overset{(2.10)}{=} \frac{6(t+1)}{(t+2)(2t+3)}$.

It can easily be checked that a further increase in the growth rate of the coefficients $a_t$ makes the rate of convergence of method (2.9) even worse. Note that the above rules for choosing the coefficients $\{\tau_t\}_{t\ge0}$ in method (2.9) do not depend on the smoothness parameter $\nu \in (0,1]$. In this sense, method (2.9) is a universal method for solving problem (2.1) (see [15]). Moreover, this method does not depend on the choice of the norm in $E$. Hence, its rate of convergence can be established with respect to the best norm describing the geometry of the feasible set.

3 Trust-region variant of the Frank-Wolfe method

Let us consider a variant of method (2.9) which takes into account the composite form of the objective function in problem (2.1). For $\Psi(x) \equiv \mathrm{Ind}_Q(x)$, these two methods coincide.
Otherwise, they generate different minimization sequences.

Conditional Gradient Method, Type II

1. Choose an arbitrary point $x_0 \in Q$.
2. For $t \ge 0$ iterate: Choose a coefficient $\tau_t \in (0,1]$ and compute
$$x_{t+1} = \arg\min_y \{ \langle \nabla f(x_t), y\rangle + \Psi(y) : y = (1-\tau_t)x_t + \tau_t x,\; x \in Q \}. \qquad (3.1)$$

This method can be seen as a trust-region scheme [1] with a linear model of the objective function. The trust region in method (3.1) is formed by a contraction of the initial feasible set. In Section 6, we will consider a more traditional trust-region method with a quadratic model of the objective.

Note that the point $x_{t+1}$ in method (3.1) is characterized by the following variational principle: $x_{t+1} = (1-\tau_t)x_t + \tau_t v_t$ with $v_t \in Q$, and
$$\Psi((1-\tau_t)x_t + \tau_t x) + \tau_t \langle \nabla f(x_t), x - x_t\rangle \ge \Psi(x_{t+1}) + \langle \nabla f(x_t), x_{t+1} - x_t\rangle, \quad x \in Q. \qquad (3.2)$$
Let us choose somehow the sequence of nonnegative weights $\{a_t\}_{t\ge0}$, and define in (3.1) the coefficients $\tau_t$ in accordance with (2.10). Define now the estimating functional sequence $\{\phi_t(x)\}_{t\ge0}$ as follows:
$$\phi_0(x) = a_0 \bar f(x), \qquad \phi_{t+1}(x) = \phi_t(x) + a_{t+1}[ f(x_t) + \langle \nabla f(x_t), x - x_t\rangle + \Psi(x) ], \quad t \ge 0. \qquad (3.3)$$
Clearly, for all $t \ge 0$ we have
$$\phi_t(x) \le A_t \bar f(x), \quad x \in Q. \qquad (3.4)$$
Denote
$$C_{\nu,t} = a_0 \Delta(x_0) + \frac{1}{1+\nu} \sum_{k=1}^t \frac{a_k^{1+\nu}}{A_k^\nu}\, G_\nu D^{1+\nu}, \quad t \ge 0. \qquad (3.5)$$
Let us introduce
$$\delta(x) \overset{\mathrm{def}}{=} \max_{y \in Q} \{ \langle \nabla f(x), x - y\rangle + \Psi(x) - \Psi(y) \} \overset{(1.4)}{=} \bar\Psi_{x,Q}(\nabla f(x)). \qquad (3.6)$$
For problem (2.1), this value measures the level of satisfaction of the first-order optimality conditions at the point $x \in Q$. For any $x \in Q$, we have
$$\delta(x) \ge \bar f(x) - \bar f(x^*) \ge 0. \qquad (3.7)$$
We call $\delta(x)$ the total variation of the linear model of the composite objective function in problem (2.1) over the feasible set.
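To make the auxiliary problem in (3.1) concrete, here is a one-step sketch (illustrative choices, not from the paper) for $\Psi(y) = \|y\|_1$ with $Q$ a box: over the contracted set the objective is separable, so each coordinate is minimized on its own segment by checking the two endpoints and the kink of the absolute value.

```python
# One step of Conditional Gradient Method, Type II (3.1), sketched for
# Psi(y) = ||y||_1 with Q = [-1, 1]^n (illustrative choices, not from the paper).
# With y = (1 - tau) x_t + tau x, the objective <grad, y> + ||y||_1 is separable in x.

def type2_step(x_t, grad, tau, lo=-1.0, hi=1.0):
    x_next = []
    for g, xt in zip(grad, x_t):
        def phi(x):                       # objective contribution of one coordinate
            y = (1 - tau) * xt + tau * x
            return g * y + abs(y)
        cands = [lo, hi]
        kink = -(1 - tau) * xt / tau      # the x that makes y = 0 (kink of |y|)
        if lo <= kink <= hi:
            cands.append(kink)
        x_star = min(cands, key=phi)      # piecewise linear => minimum at a candidate
        x_next.append((1 - tau) * xt + tau * x_star)
    return x_next

x_t = [0.8, -0.5]
grad = [0.3, -2.0]
y = type2_step(x_t, grad, tau=0.5)
assert all(-1.0 <= yi <= 1.0 for yi in y)  # y stays in the contracted set, hence in Q
```

In this run the first coordinate lands exactly on the kink ($y^{(1)} = 0$, the regularizer wins), while the second moves toward the boundary of the contracted set, which is the qualitative behavior the contraction is designed to produce.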
Theorem 2. Let the sequence $\{x_t\}_{t\ge0}$ be generated by method (3.1). Then, for any $\nu \in (0,1]$ and any step $t \ge 0$, we have
$$A_t \bar f(x_t) \le \phi_t(x) + C_{\nu,t}, \quad x \in Q. \qquad (3.8)$$
Moreover, for any $t \ge 0$ we have
$$\bar f(x_t) - \bar f(x_{t+1}) \ge \tau_t\, \delta(x_t) - \frac{G_\nu D^{1+\nu}}{1+\nu}\, \tau_t^{1+\nu}. \qquad (3.9)$$
Proof: Let us prove inequality (3.8). For $t = 0$, we have $C_{\nu,0} = a_0[\bar f(x_0) - \bar f(x^*)]$. Thus, in this case (3.8) follows from (3.4). Assume now that (3.8) is valid for some $t \ge 0$. In view of definition (2.10), the optimality condition (3.2) can be written in the following form:
$$a_{t+1} \langle \nabla f(x_t), x - x_t\rangle \ge A_{t+1}[\, \Psi(x_{t+1}) - \Psi((1-\tau_t)x_t + \tau_t x) + \langle \nabla f(x_t), x_{t+1} - x_t\rangle\,]$$
for all $x \in Q$. Therefore,
$$\phi_{t+1}(x) + C_{\nu,t} = \phi_t(x) + C_{\nu,t} + a_{t+1}[ f(x_t) + \langle \nabla f(x_t), x - x_t\rangle + \Psi(x) ]$$
$$\overset{(3.2),(3.8)}{\ge} A_t[f(x_t) + \Psi(x_t)] + a_{t+1}[f(x_t) + \Psi(x)] + A_{t+1}[\Psi(x_{t+1}) - \Psi((1-\tau_t)x_t + \tau_t x) + \langle \nabla f(x_t), x_{t+1} - x_t\rangle]$$
$$\ge A_{t+1}[ f(x_t) + \langle \nabla f(x_t), x_{t+1} - x_t\rangle + \Psi(x_{t+1}) ] \overset{(2.4)}{\ge} A_{t+1}\left[ \bar f(x_{t+1}) - \frac{G_\nu}{1+\nu}\|x_{t+1} - x_t\|^{1+\nu} \right].$$
It remains to note that $\|x_{t+1} - x_t\| = \tau_t \|x_t - v_t\| \overset{(2.10)}{\le} \frac{a_{t+1}}{A_{t+1}} D$. Thus, we can take $C_{\nu,t+1} = C_{\nu,t} + \frac{a_{t+1}^{1+\nu}}{(1+\nu) A_{t+1}^\nu}\, G_\nu D^{1+\nu}$.

In order to prove inequality (3.9), let us introduce the values
$$\delta_t(\tau) \overset{\mathrm{def}}{=} \max_{x \in Q} \{ \langle \nabla f(x_t), x_t - y\rangle + \Psi(x_t) - \Psi(y) : y = (1-\tau)x_t + \tau x \} \overset{(1.5)}{=} \Psi_{\tau, x_t, Q}(\nabla f(x_t)), \quad \tau \in [0,1].$$
Clearly,
$$-\delta_t(\tau_t) = \min_{x \in Q} \{ \langle \nabla f(x_t), y - x_t\rangle + \Psi(y) - \Psi(x_t) : y = (1-\tau_t)x_t + \tau_t x \} = \langle \nabla f(x_t), x_{t+1} - x_t\rangle + \Psi(x_{t+1}) - \Psi(x_t)$$
$$\overset{(2.4)}{\ge} \bar f(x_{t+1}) - \bar f(x_t) - \frac{G_\nu}{1+\nu}\|x_{t+1} - x_t\|^{1+\nu}.$$
Since $\|x_{t+1} - x_t\| \le \tau_t D$, we conclude that
$$\bar f(x_t) - \bar f(x_{t+1}) \ge \delta_t(\tau_t) - \frac{G_\nu D^{1+\nu}}{1+\nu}\,\tau_t^{1+\nu} \overset{(1.6)}{\ge} \tau_t\, \delta(x_t) - \frac{G_\nu D^{1+\nu}}{1+\nu}\,\tau_t^{1+\nu}.$$

In view of (3.4), inequality (3.8) results in the following rate of convergence:
$$\bar f(x_t) - \bar f(x^*) \le \frac{1}{A_t} C_{\nu,t}, \quad t \ge 0. \qquad (3.10)$$
For the linearly growing weights $a_t \equiv t$, $A_t = \frac{t(t+1)}{2}$, $t \ge 1$, we have already seen that
$$C_{\nu,t} = \frac{1}{1+\nu}\, B_{\nu,t} \le \frac{2^\nu}{(1+\nu)(2-\nu)}\left[ \left(t+\tfrac12\right)^{2-\nu} - \left(\tfrac12\right)^{2-\nu} \right] G_\nu D^{1+\nu}.$$
In the case $\nu = 1$, this results in the following rate of convergence:
$$\bar f(x_t) - \bar f(x^*) \le \frac{2}{t+1}\, G_1 D^2, \quad t \ge 1. \qquad (3.11)$$
Let us justify for this case the rate of convergence of the sequence $\{\delta(x_t)\}_{t\ge1}$. We have $\tau_t \overset{(2.10)}{=} \frac{a_{t+1}}{A_{t+1}} = \frac{2}{t+2}$. On the other hand, for any $T \ge t$,
$$\frac{2 G_1 D^2}{t+1} \overset{(3.11)}{\ge} \bar f(x_t) - \bar f(x^*) \ge \bar f(x_t) - \bar f(x_{T+1}) \overset{(3.9)}{\ge} \sum_{k=t}^{T}\left[ \tau_k\, \delta(x_k) - \frac{G_1 D^2}{2}\,\tau_k^2 \right]. \qquad (3.12)$$
Denote $\delta_T^* = \min_{0 \le t \le T} \delta(x_t)$. Then, choosing $T = 2t+1$, we get
$$\sum_{k=t}^{T} \tau_k = \sum_{k=t}^{2t+1} \frac{2}{k+2} \overset{(1.2)}{\ge} 2\ln 2, \qquad \sum_{k=t}^{T} \tau_k^2 = \sum_{k=t}^{2t+1} \frac{4}{(k+2)^2} \overset{(1.3)}{\le} \frac{4}{t+3}.$$
Therefore,
$$2\ln 2 \cdot \delta_T^* \le \sum_{k=t}^{T} \tau_k\, \delta(x_k) \overset{(3.12)}{\le} G_1 D^2\left[ \frac{2}{t+1} + \frac{2}{t+3} \right] \le \frac{4 G_1 D^2}{t+1} = \frac{8 G_1 D^2}{T+1}.$$
Thus, in the case $\nu = 1$, for odd $T$, we get the following bound:
$$\delta_T^* \le \frac{4}{\ln 2} \cdot \frac{G_1 D^2}{T+1}. \qquad (3.13)$$

4 Computing the primal-dual solution

Note that both methods (2.9) and (3.1) admit computable accuracy certificates. (It seems that such a line of arguments was used for the first time in Section 7.5 of [8].) For the first method, denote
$$\ell_t = \frac{1}{A_t} \min_{x \in Q} \left\{ \sum_{k=0}^t a_k[ f(x_k) + \langle \nabla f(x_k), x - x_k\rangle + \Psi(x) ] \right\}.$$
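Continuing the illustrative simplex example from Section 2 (same assumptions, not from the paper), the certificate $\ell_t$ is directly computable, because the averaged model is affine in $x$ and therefore attains its minimum over the simplex at a vertex:

```python
# Accuracy certificate l_t for CGM, Type I, on the illustrative simplex example:
# l_t = (1/A_t) * min_{x in Q} sum_k a_k [ f(x_k) + <grad f(x_k), x - x_k> ]
# with Psi = Ind_Q and linear weights a_k = k. The accumulated model is affine,
# c0 + <c, x>, so its minimum over the simplex is min_i c_i plus the constant.

n = 3
b = [0.6, 0.3, 0.1]
f = lambda x: 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))
grad = lambda x: [xi - bi for xi, bi in zip(x, b)]

x = [1.0, 0.0, 0.0]
A, c0, c = 0.0, 0.0, [0.0] * n
for t in range(200):
    g, a = grad(x), float(t)              # linear weights: a_k = k (so a_0 = 0)
    A += a
    c0 += a * (f(x) - sum(gi * xi for gi, xi in zip(g, x)))
    c = [ci + a * gi for ci, gi in zip(c, g)]
    i = min(range(n), key=lambda j: g[j])  # linear minimization oracle
    v = [1.0 if j == i else 0.0 for j in range(n)]
    tau = 2.0 / (t + 2)
    x = [(1 - tau) * xi + tau * vi for xi, vi in zip(x, v)]

l_t = (c0 + min(c)) / A                   # vertex e_i gives <c, e_i> = c_i
assert l_t <= 1e-12                       # lower bound on f* = 0 (x* = b)
assert f(x) - l_t < 0.1                   # certified gap (4.1) vanishes as O(1/t)
```

The point is that the gap $\bar f(x_t) - \ell_t$ is an observable quantity: it bounds the true error without knowing $x^*$.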
Clearly,
$$\bar f(x_t) - \bar f(x^*) \le \bar f(x_t) - \ell_t \overset{(2.13)}{\le} \frac{1}{A_t} B_{\nu,t}. \qquad (4.1)$$
For the second method, let us choose $a_0 = 0$. Then the estimate functions are linear:
$$\phi_t(x) = \sum_{k=1}^t a_k [ f(x_{k-1}) + \langle \nabla f(x_{k-1}), x - x_{k-1}\rangle + \Psi(x) ].$$
Therefore, defining $\hat\ell_t = \frac{1}{A_t}\min_{x \in Q} \phi_t(x)$, we also have
$$\bar f(x_t) - \bar f(x^*) \le \bar f(x_t) - \hat\ell_t \overset{(3.8)}{\le} \frac{1}{A_t} C_{\nu,t}, \quad t \ge 1. \qquad (4.2)$$
The accuracy certificates (4.1) and (4.2) show that both methods (2.9) and (3.1) are able to recover some information on the optimal dual solution. However, in order to implement this ability, we need to open the black box and introduce an explicit model of the function $f$. Let us assume that the function $f$ is representable in the following form:
$$f(x) = \max_u \{ \langle Ax, u\rangle - g(u) : u \in Q_d \}, \qquad (4.3)$$
where $A : E \to \hat E^*$, $Q_d$ is a closed convex set in a finite-dimensional linear space $\hat E$, and the function $g$ is $p$-uniformly convex on $Q_d$:
$$\langle \nabla g(u_1) - \nabla g(u_2), u_1 - u_2\rangle \ge \sigma_g \|u_1 - u_2\|^p, \quad u_1, u_2 \in Q_d,$$
where the convexity degree is $p \ge 2$. It is well known (e.g. [15]) that in this case, for $\nu = \frac{1}{p-1}$, we have $G_\nu = \left(\frac{1}{\sigma_g}\right)^{\frac{1}{p-1}}$. In view of Danskin's theorem, $\nabla f(x) = A^* u(x)$, where $u(x) \in Q_d$ is the unique optimal solution of the optimization problem in the representation (4.3).

Let us write down the dual problem to (2.1):
$$\min_{x \in Q} \bar f(x) \overset{(4.3)}{=} \min_{x \in Q}\left\{ \Psi(x) + \max_{u \in Q_d} \{ \langle Ax, u\rangle - g(u) \} \right\} \ge \max_{u \in Q_d}\left\{ -g(u) + \min_{x \in Q} \{ \langle A^* u, x\rangle + \Psi(x) \} \right\}.$$
Thus, defining $\Phi(u) = \min_{x \in Q} \{ \langle A^* u, x\rangle + \Psi(x) \}$, we get the following dual problem:
$$\max_{u \in Q_d} \left\{ \bar g(u) \overset{\mathrm{def}}{=} -g(u) + \Phi(u) \right\}. \qquad (4.4)$$
In this problem, the objective function is nonsmooth and uniformly strongly concave of degree $p$. Clearly, we have
$$\bar f(x) - \bar g(u) \ge 0, \quad x \in Q,\; u \in Q_d. \qquad (4.5)$$
Let us show that both methods (2.9) and (3.1) are able to approximate the optimal solution of the dual problem (4.4).
Note that for any $\bar x \in Q$ we have
$$f(\bar x) + \langle \nabla f(\bar x), x - \bar x\rangle \overset{(4.3)}{=} \langle A\bar x, u(\bar x)\rangle - g(u(\bar x)) + \langle A^* u(\bar x), x - \bar x\rangle = \langle Ax, u(\bar x)\rangle - g(u(\bar x)).$$
Therefore, denoting for the first method (2.9) $u_t = \frac{1}{A_t}\sum_{k=0}^t a_k u(x_k)$, we obtain
$$\ell_t = \min_{x \in Q}\left\{ \Psi(x) + \frac{1}{A_t}\sum_{k=0}^t a_k[ \langle Ax, u(x_k)\rangle - g(u(x_k)) ] \right\} = \Phi(u_t) - \frac{1}{A_t}\sum_{k=0}^t a_k\, g(u(x_k)) \le \bar g(u_t).$$
Thus, we get
$$0 \overset{(4.5)}{\le} \bar f(x_t) - \bar g(u_t) \le \bar f(x_t) - \ell_t \overset{(4.1)}{\le} \frac{1}{A_t} B_{\nu,t}, \quad t \ge 0. \qquad (4.6)$$
For the second method (3.1), we need to choose $a_0 = 0$ and take $u_t = \frac{1}{A_t}\sum_{k=1}^t a_k u(x_{k-1})$. In this case, by a similar reasoning, we get
$$0 \overset{(4.5)}{\le} \bar f(x_t) - \bar g(u_t) \le \bar f(x_t) - \hat\ell_t \overset{(4.2)}{\le} \frac{1}{A_t} C_{\nu,t}, \quad t \ge 1. \qquad (4.7)$$

5 Strong convexity of function $\Psi$

In this section, we assume that the function $\Psi$ in problem (2.1) is strongly convex (see, for example, Section 2.1.3 in [12]). This means that there exists a positive constant $\sigma_\Psi$ such that
$$\Psi(\tau x + (1-\tau)y) \le \tau \Psi(x) + (1-\tau)\Psi(y) - \tfrac12\, \sigma_\Psi \tau(1-\tau)\|x - y\|^2 \qquad (5.1)$$
for all $x, y \in Q$ and $\tau \in [0,1]$. Let us show that in this case CG-methods converge much faster. We demonstrate this for method (2.9). In view of the strong convexity of $\Psi$, the variational principle (2.8), characterizing the point $v_t$ in method (2.9), can be strengthened:
$$\Psi(x) + \langle \nabla f(x_t), x - v_t\rangle \ge \Psi(v_t) + \tfrac12\, \sigma_\Psi \|x - v_t\|^2, \quad x \in Q. \qquad (5.2)$$
Let $V_0$ be defined as in (2.11). Denote
$$\hat B_{\nu,t} = a_0 V_0 + \sum_{k=1}^t \frac{a_k^{1+2\nu}}{2\sigma_\Psi A_k^{2\nu}}\, G_\nu^2 D^{2\nu}, \quad t \ge 0. \qquad (5.3)$$

Theorem 3. Let the sequence $\{x_t\}_{t\ge0}$ be generated by method (2.9), and let the function $\Psi$ be strongly convex. Then, for any $\nu \in (0,1]$, any step $t \ge 0$, and any $x \in Q$ we have
$$A_t(f(x_t) + \Psi(x_t)) \le \sum_{k=0}^t a_k[ f(x_k) + \langle \nabla f(x_k), x - x_k\rangle + \Psi(x) ] + \hat B_{\nu,t}. \qquad (5.4)$$
Proof: The beginning of the proof of this statement is very similar to that of Theorem 1. Assuming that (5.4) is valid for some $t \ge 0$, we get the following inequality:
$$\sum_{k=0}^{t+1} a_k[ f(x_k) + \langle \nabla f(x_k), x - x_k\rangle + \Psi(x) ] + \hat B_{\nu,t} \ge A_{t+1}(f(x_{t+1}) + \Psi(x_{t+1})) + a_{t+1}[\Psi(x) - \Psi(v_t) + \langle \nabla f(x_{t+1}), x - v_t\rangle].$$
Further,
$$\Psi(x) - \Psi(v_t) + \langle \nabla f(x_{t+1}), x - v_t\rangle \overset{(5.2)}{\ge} \langle \nabla f(x_{t+1}) - \nabla f(x_t), x - v_t\rangle + \tfrac12\, \sigma_\Psi \|x - v_t\|^2 \ge -\frac{1}{2\sigma_\Psi}\|\nabla f(x_{t+1}) - \nabla f(x_t)\|_*^2 \overset{(2.3)}{\ge} -\frac{1}{2\sigma_\Psi}\left( \frac{a_{t+1}^\nu}{A_{t+1}^\nu}\, G_\nu D^\nu \right)^2.$$
Thus, for keeping (5.4) valid for the next iteration, it is enough to choose $\hat B_{\nu,t+1} = \hat B_{\nu,t} + \frac{a_{t+1}^{1+2\nu}}{2\sigma_\Psi A_{t+1}^{2\nu}}\, G_\nu^2 D^{2\nu}$.

It can easily be checked that in our situation the linear-weights strategy $a_t \equiv t$ is not the best one. Let us choose $a_t = t^2$, $t \ge 0$. Then $A_t = \frac{t(t+1)(2t+1)}{6}$, and we get
$$\hat B_{\nu,t} \overset{(5.3)}{=} \frac{G_\nu^2 D^{2\nu}}{2\sigma_\Psi}\, 6^{2\nu} \sum_{k=1}^t \frac{k^{2(1+2\nu)}}{k^{2\nu}(k+1)^{2\nu}(2k+1)^{2\nu}} \le \frac{G_\nu^2 D^{2\nu}}{2\sigma_\Psi}\, 3^{2\nu} \sum_{k=1}^t k^{2-2\nu} \overset{(1.1)}{\le} \frac{3^{2\nu}}{3-2\nu}\left[ \left(t+\tfrac12\right)^{3-2\nu} - \left(\tfrac12\right)^{3-2\nu} \right] \frac{G_\nu^2 D^{2\nu}}{2\sigma_\Psi}.$$
Thus, for $\nu \in (0,1)$, we get $\frac{1}{A_t}\hat B_{\nu,t} \le O(t^{-2\nu})$. For $\nu = 1$, we obtain
$$\bar f(x_t) - \bar f(x^*) \le \frac{54}{(t+1)(t+2)} \cdot \frac{G_1^2 D^2}{\sigma_\Psi}, \qquad (5.5)$$
which is much better than (2.16).

6 Second-order trust-region method

Let us assume now that in problem (2.1) the function $f$ is twice continuously differentiable. Then we can apply to this problem the following method.
Contracting Trust-Region Method

1. Choose an arbitrary point $x_0 \in Q$.
2. For $t \ge 0$ iterate: Define a coefficient $\tau_t \in (0,1]$ and choose
$$x_{t+1} \in \operatorname{Arg\,min}_y \left\{ \langle \nabla f(x_t), y - x_t\rangle + \tfrac12 \langle \nabla^2 f(x_t)(y - x_t), y - x_t\rangle + \Psi(y) : y = (1-\tau_t)x_t + \tau_t x,\; x \in Q \right\}. \qquad (6.1)$$

Note that this scheme is well defined even if the Hessian of the function $f$ is merely positive semidefinite. Of course, in general, the computational cost of each iteration of this scheme can be big. However, in one important case, when $\Psi$ is an indicator function of a Euclidean ball, the complexity of each iteration of this scheme is dominated by the complexity of a matrix inversion. Thus, method (6.1) can easily be applied to problems of the form
$$\min_x \{ f(x) : \|x - x_0\| \le r \}, \qquad (6.2)$$
where the norm is Euclidean.

Let $H_\nu < +\infty$ for some $\nu \in (0,1]$. In this section we also assume that
$$\langle \nabla^2 f(x) h, h\rangle \le L \|h\|^2, \quad x \in Q,\; h \in E. \qquad (6.3)$$
Let us choose a sequence of nonnegative weights $\{a_t\}_{t\ge0}$, and define in (6.1) the coefficients $\{\tau_t\}_{t\ge0}$ in accordance with (2.10). Define the estimating functional sequence $\{\phi_t(x)\}_{t\ge0}$ by the recurrent relations (3.3), where the sequence $\{x_t\}_{t\ge0}$ is generated by method (6.1). Finally, denote
$$\hat C_{\nu,t} = a_0 \Delta(x_0) + \sum_{k=1}^t \frac{a_k^{2+\nu}}{A_k^{1+\nu}} \cdot \frac{H_\nu D^{2+\nu}}{(1+\nu)(2+\nu)} + \sum_{k=1}^t \frac{a_k^2}{A_k}\, L D^2. \qquad (6.4)$$
In our convergence results, we also estimate the level of the second-order optimality condition for problem (2.1) at the current test points. Let us introduce
$$\theta(x) \overset{\mathrm{def}}{=} \max_{y \in Q} \{ \langle \nabla f(x), x - y\rangle - \tfrac12 \langle \nabla^2 f(x)(y - x), y - x\rangle + \Psi(x) - \Psi(y) \}. \qquad (6.5)$$
For any $x \in Q$ we have $\theta(x) \ge 0$. We call $\theta(x)$ the total variation of the quadratic model of the composite objective function in problem (2.1) over the feasible set. Denoting $F_x(y) = \tfrac12 \langle \nabla^2 f(x)(y - x), y - x\rangle + \Psi(y)$, we get $\theta(x) = (\bar F_x)_{x,Q}(\nabla f(x))$ (see definition (1.4)).

Theorem 4. Let the sequence $\{x_t\}_{t\ge0}$ be generated by method (6.1). Then, for any $\nu \in (0,1]$ and any step $t \ge 0$ we have
$$A_t \bar f(x_t) \le \phi_t(x) + \hat C_{\nu,t}, \quad x \in Q. \qquad (6.6)$$
Moreover, for any $t \ge 0$ we have
$$\bar f(x_t) - \bar f(x_{t+1}) \ge \tau_t\, \theta(x_t) - \frac{H_\nu D^{2+\nu}}{(1+\nu)(2+\nu)}\, \tau_t^{2+\nu}. \qquad (6.7)$$
Proof: Let us prove inequality (6.6). For $t = 0$, $\hat C_{\nu,0} = a_0[\bar f(x_0) - \bar f(x^*)]$, so this inequality is valid. Note that the point $x_{t+1}$ is characterized by the following variational principle: $x_{t+1} = (1-\tau_t)x_t + \tau_t v_t$ with $v_t \in Q$, and
$$\Psi(y) + \langle \nabla f(x_t) + \nabla^2 f(x_t)(x_{t+1} - x_t), y - x_{t+1}\rangle \ge \Psi(x_{t+1}), \quad y = (1-\tau_t)x_t + \tau_t x,\; x \in Q.$$
Therefore, in view of definition (2.10), for any $x \in Q$ we have
$$a_{t+1}\langle \nabla f(x_t), x - x_t\rangle \ge A_{t+1}\langle \nabla f(x_t) + \nabla^2 f(x_t)(x_{t+1} - x_t), x_{t+1} - x_t\rangle + a_{t+1}\langle \nabla^2 f(x_t)(x_{t+1} - x_t), x_t - x\rangle + A_{t+1}[\Psi(x_{t+1}) - \Psi((1-\tau_t)x_t + \tau_t x)]$$
$$\overset{(6.3)}{\ge} A_{t+1}\langle \nabla f(x_t) + \nabla^2 f(x_t)(x_{t+1} - x_t), x_{t+1} - x_t\rangle + A_{t+1}[\Psi(x_{t+1}) - \Psi((1-\tau_t)x_t + \tau_t x)] - \frac{a_{t+1}^2}{A_{t+1}}\, L D^2.$$
Hence,
$$A_t \bar f(x_t) + a_{t+1}[f(x_t) + \langle \nabla f(x_t), x - x_t\rangle + \Psi(x)] \ge A_t\Psi(x_t) + a_{t+1}\Psi(x) + A_{t+1}[f(x_t) + \langle \nabla f(x_t) + \nabla^2 f(x_t)(x_{t+1} - x_t), x_{t+1} - x_t\rangle] + A_{t+1}[\Psi(x_{t+1}) - \Psi((1-\tau_t)x_t + \tau_t x)] - \frac{a_{t+1}^2}{A_{t+1}}\, L D^2$$
$$\overset{(2.6)}{\ge} A_{t+1}[f(x_{t+1}) + \Psi(x_{t+1})] - A_{t+1}\frac{H_\nu \|x_{t+1} - x_t\|^{2+\nu}}{(1+\nu)(2+\nu)} - \frac{a_{t+1}^2}{A_{t+1}}\, L D^2 \ge A_{t+1}\bar f(x_{t+1}) - \frac{a_{t+1}^{2+\nu}}{A_{t+1}^{1+\nu}} \cdot \frac{H_\nu D^{2+\nu}}{(1+\nu)(2+\nu)} - \frac{a_{t+1}^2}{A_{t+1}}\, L D^2.$$
Thus, if (6.6) is valid for some $t \ge 0$, then
$$\phi_{t+1}(x) + \hat C_{\nu,t} \ge A_t \bar f(x_t) + a_{t+1}[f(x_t) + \langle \nabla f(x_t), x - x_t\rangle + \Psi(x)] \ge A_{t+1}\bar f(x_{t+1}) - \frac{a_{t+1}^{2+\nu}}{A_{t+1}^{1+\nu}} \cdot \frac{H_\nu D^{2+\nu}}{(1+\nu)(2+\nu)} - \frac{a_{t+1}^2}{A_{t+1}}\, L D^2.$$
Therefore, we can take $\hat C_{\nu,t+1} = \hat C_{\nu,t} + \frac{a_{t+1}^{2+\nu}}{A_{t+1}^{1+\nu}} \cdot \frac{H_\nu D^{2+\nu}}{(1+\nu)(2+\nu)} + \frac{a_{t+1}^2}{A_{t+1}}\, L D^2$.

In order to justify inequality (6.7), let us introduce the values
$$\theta_t(\tau) \overset{\mathrm{def}}{=} \max_{x \in Q}\left\{ \langle \nabla f(x_t), x_t - y\rangle - \tfrac12 \langle \nabla^2 f(x_t)(y - x_t), y - x_t\rangle + \Psi(x_t) - \Psi(y) : y = (1-\tau)x_t + \tau x \right\} \overset{(1.5)}{=} (F_{x_t})_{\tau, x_t, Q}(\nabla f(x_t)), \quad \tau \in [0,1].$$
Clearly,
$$-\theta_t(\tau_t) = \min_{x \in Q}\{ \langle \nabla f(x_t), y - x_t\rangle + \tfrac12 \langle \nabla^2 f(x_t)(y - x_t), y - x_t\rangle + \Psi(y) - \Psi(x_t) : y = (1-\tau_t)x_t + \tau_t x \}$$
$$= \langle \nabla f(x_t), x_{t+1} - x_t\rangle + \tfrac12 \langle \nabla^2 f(x_t)(x_{t+1} - x_t), x_{t+1} - x_t\rangle + \Psi(x_{t+1}) - \Psi(x_t) \overset{(2.6)}{\ge} \bar f(x_{t+1}) - \bar f(x_t) - \frac{H_\nu}{(1+\nu)(2+\nu)}\|x_{t+1} - x_t\|^{2+\nu}.$$
Since $\|x_{t+1} - x_t\| \le \tau_t D$, we conclude that
$$\bar f(x_t) - \bar f(x_{t+1}) \ge \theta_t(\tau_t) - \frac{H_\nu D^{2+\nu}}{(1+\nu)(2+\nu)}\,\tau_t^{2+\nu} \overset{(1.6)}{\ge} \tau_t\, \theta(x_t) - \frac{H_\nu D^{2+\nu}}{(1+\nu)(2+\nu)}\,\tau_t^{2+\nu}.$$
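To illustrate the remark that for problem (6.2) each iteration of (6.1) costs essentially one matrix inversion: the contracted set $\{(1-\tau)x_t + \tau x : \|x - x_0\| \le r\}$ is again a Euclidean ball, with center $c = (1-\tau)x_t + \tau x_0$ and radius $\tau r$, so the step is a classical ball-constrained quadratic subproblem. The 2-D sketch below (illustrative, not from the paper; the bisection-based solver for the multiplier is an assumption, valid in the convex case) solves one such step:

```python
# One step of the Contracting Trust-Region Method (6.1) for problem (6.2):
# minimize <g, y - x_t> + 0.5 <H (y - x_t), y - x_t> over ||y - c|| <= tau*r,
# where c = (1 - tau) x_t + tau x0. In s = y - c the model's linear term is
# q = g + H (c - x_t); for psd H the minimizer is s(lam) = -(H + lam I)^{-1} q
# with lam >= 0 found by bisection (||s(lam)|| is decreasing in lam).

def tr_step(H, g, x_t, x0, r, tau, tol=1e-12):
    c = [(1 - tau) * x_t[i] + tau * x0[i] for i in range(2)]
    R = tau * r
    q = [g[i] + sum(H[i][j] * (c[j] - x_t[j]) for j in range(2)) for i in range(2)]

    def s_of(lam):                        # solve (H + lam I) s = -q  (2x2 case)
        a, bb = H[0][0] + lam, H[0][1]
        cc, d = H[1][0], H[1][1] + lam
        det = a * d - bb * cc
        return [(-q[0] * d + q[1] * bb) / det, (q[0] * cc - q[1] * a) / det]

    norm = lambda v: (v[0] ** 2 + v[1] ** 2) ** 0.5
    s = s_of(0.0)
    if norm(s) <= R:                      # model minimizer already feasible
        return [c[0] + s[0], c[1] + s[1]]
    lo, hi = 0.0, 1.0
    while norm(s_of(hi)) > R:             # grow lam until the step fits the ball
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if norm(s_of(mid)) > R else (lo, mid)
    s = s_of(hi)
    return [c[0] + s[0], c[1] + s[1]]

H = [[2.0, 0.0], [0.0, 1.0]]              # psd Hessian, consistent with (6.3)
g = [1.0, 1.0]
y = tr_step(H, g, x_t=[0.0, 0.0], x0=[0.0, 0.0], r=0.1, tau=0.5)
dist = (y[0] ** 2 + y[1] ** 2) ** 0.5     # here c = x0 = 0, so dist = ||y - c||
assert dist <= 0.05 + 1e-9                # y stays inside the contracted ball
```

Each bisection probe is one linear solve with the shifted Hessian, which is the matrix-inversion cost mentioned in the text.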
17 In order to justify inequality (6.7, let us introduce the values θ t (τ def = max x Q { f(x t, x t y f(x t (y x t, y x t (.5 = +Ψ(x t Ψ(y : y = ( τx t + τx} ( F xt τ,x t,q ( f(x t, τ [0, ]. Clearly, θ t (τ t = min { f(x t, y x t x Q f(x t (y x t, y x t +Ψ(y Ψ(x t : y = ( τ t x t + τ t x} = f(x t, x t+ x t f(x t (x t+ x t, x t+ x t + Ψ(x t+ Ψ(x t (.6 f(x t+ f(x t H ν (+ν(+ν x t+ x t +ν. Since x t+ x t τ t D, we conclude that f(x t f(x t+ θ t (τ t H νd +ν (+ν(+ν τ t +ν (.6 τ t θ(x t H νd +ν (+ν(+ν τ +ν t. Thus, inequality (6.6 ensures the following rate of convergence of method (6. f(x t f(x A t Ĉ ν,t. (6.8 A particular expression of the right-hand side of this inequality for different values of ν [0, ] can be obtained exactly in the same way as it was done in Section. Here, we restrict ourselves only by the case ν = and a t = t, t 0. Then A t = t(t+(t+ 6, and t a 3 k A k = t 36k 6 k (k+ (k+ 8t, Thus, we get t a k A k = t f(x t f(x 3k 4 k(k+(k+ 3 t k = 3 4t(t +. 8H D 3 (t+(t+ + 9LD (t+. (6.9 Note that the rate of convergence (6.9 is worse than the convergence rate of cubic regularization of the Newton method [7]. However, to the best of our knowledge, inequality (6.9 gives us the first global rate of convergence of an optimization scheme belonging to the family of trust-region methods []. In view of inequality (6.6, the optimal solution of the dual problem (4.4 can be approximated by method (6. with a 0 = 0 in the same way as it was suggested in Section 4 for Conditional Gradient Methods. 7
Let us now estimate the rate of decrease of the values $\theta(x_t)$, $t \ge 0$, in the case $\nu = 1$. Note that $\tau_t \overset{(2.10)}{=} \frac{a_{t+1}}{A_{t+1}} = \frac{6(t+1)}{(t+2)(2t+3)}$. It is easy to see that these coefficients satisfy the following inequalities:
$$\frac{3}{t+3} \le \tau_t \le \frac{6}{2t+5}, \quad t \ge 0. \qquad (6.10)$$
Therefore, choosing the total number of steps $T = 2t+2$, we have
$$\sum_{k=t}^{T} \tau_k \overset{(6.10)}{\ge} 3\sum_{k=t}^{2t+2}\frac{1}{k+3} \overset{(1.2)}{\ge} 3\ln 2,$$
$$\sum_{k=t}^{T} \tau_k^3 \overset{(6.10)}{\le} \sum_{k=t}^{T} \frac{216}{(2k+5)^3} \overset{(1.1)}{\le} 216\int_{t-1/2}^{T+1/2} \frac{d\tau}{(2\tau+5)^3} = 54\left[\frac{1}{(T+2)^2} - \frac{1}{(2T+6)^2}\right] \le \frac{54}{(T+2)^2}. \qquad (6.11)$$
Now we can use the same trick as at the end of Section 3. Denote $\theta_T^* = \min_{0\le t\le T}\theta(x_t)$. Then
$$\frac{36 H_1 D^3}{T(T-1)} + \frac{9 L D^2}{T-1} \overset{(6.9)}{\ge} \bar f(x_t) - \bar f(x^*) \ge \sum_{k=t}^{T}(\bar f(x_k) - \bar f(x_{k+1})) \overset{(6.7)}{\ge} \theta_T^* \sum_{k=t}^{T}\tau_k - \frac{H_1 D^3}{6}\sum_{k=t}^{T}\tau_k^3.$$
Thus, for even $T$, we get the following bound:
$$\theta_T^* \le \frac{1}{3\ln 2}\left[\frac{36 H_1 D^3}{T(T-1)} + \frac{9 H_1 D^3}{(T+2)^2} + \frac{9 L D^2}{T-1}\right] \le \frac{1}{3\ln 2}\left[\frac{45 H_1 D^3}{T(T-1)} + \frac{9 L D^2}{T-1}\right]. \qquad (6.12)$$

References

[1] A.R. Conn, N.I.M. Gould, Ph.L. Toint. Trust-Region Methods. SIAM, Philadelphia, 2000.

[2] J. Dunn. Convergence rates for conditional gradient sequences generated by implicit step length rules. SIAM Journal on Control and Optimization, 18(5), 1980.

[3] M. Frank, P. Wolfe. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3, 1956.

[4] R.M. Freund, P. Grigas. New analysis and results for the Frank-Wolfe method. Mathematical Programming, 2014.

[5] D. Garber, E. Hazan. A linearly convergent conditional gradient algorithm with application to online and stochastic optimization. arXiv preprint, 2013.

[6] Z. Harchaoui, A. Juditsky, A. Nemirovski. Conditional gradient algorithms for norm-regularized smooth convex optimization. Mathematical Programming, 2014.
[7] M. Jaggi. Revisiting Frank-Wolfe: projection-free sparse convex optimization. In: Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, 2013.

[8] A. Juditsky, A. Nemirovski. Solving variational inequalities with monotone operators on domains given by linear minimization oracles. Mathematical Programming, Ser. A.

[9] S. Lacoste-Julien, M. Jaggi, M. Schmidt, P. Pletscher. Block-coordinate Frank-Wolfe optimization for structural SVMs. In: Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, 2013.

[10] G. Lan. The complexity of large-scale convex programming under a linear optimization oracle. arXiv preprint, 2014.

[11] A. Migdalas. A regularization of the Frank-Wolfe method and unification of certain nonlinear programming methods. Mathematical Programming, 63, 1994.

[12] Yu. Nesterov. Introductory Lectures on Convex Optimization. Kluwer, Boston, 2004.

[13] Yu. Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming, 2009.

[14] Yu. Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, 140(1):125-161, 2013.

[15] Yu. Nesterov. Universal gradient methods for convex optimization problems. Mathematical Programming, 2014.

[16] Yu. Nesterov, A. Nemirovskii. Interior-Point Polynomial Algorithms in Convex Programming. SIAM, Philadelphia, 1994.

[17] Yu. Nesterov, B. Polyak. Cubic regularization of Newton's method and its global performance. Mathematical Programming, 108(1):177-205, 2006.

[18] Yu. Nesterov, V. Shikhman. Quasi-monotone subgradient methods for nonsmooth convex minimization. JOTA, 2014.
Primal-dual IPM with Asymmetric Barrier Yurii Nesterov, CORE/INMA (UCL) September 29, 2008 (IFOR, ETHZ) Yu. Nesterov Primal-dual IPM with Asymmetric Barrier 1/28 Outline 1 Symmetric and asymmetric barriers
More informationGradient Sliding for Composite Optimization
Noname manuscript No. (will be inserted by the editor) Gradient Sliding for Composite Optimization Guanghui Lan the date of receipt and acceptance should be inserted later Abstract We consider in this
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More informationSmooth minimization of non-smooth functions
Math. Program., Ser. A 103, 127 152 (2005) Digital Object Identifier (DOI) 10.1007/s10107-004-0552-5 Yu. Nesterov Smooth minimization of non-smooth functions Received: February 4, 2003 / Accepted: July
More informationCubic regularization of Newton method and its global performance
Math. Program., Ser. A 18, 177 5 (6) Digital Object Identifier (DOI) 1.17/s117-6-76-8 Yurii Nesterov B.T. Polyak Cubic regularization of Newton method and its global performance Received: August 31, 5
More information1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method
L. Vandenberghe EE236C (Spring 2016) 1. Gradient method gradient method, first-order methods quadratic bounds on convex functions analysis of gradient method 1-1 Approximate course outline First-order
More informationRelative-Continuity for Non-Lipschitz Non-Smooth Convex Optimization using Stochastic (or Deterministic) Mirror Descent
Relative-Continuity for Non-Lipschitz Non-Smooth Convex Optimization using Stochastic (or Deterministic) Mirror Descent Haihao Lu August 3, 08 Abstract The usual approach to developing and analyzing first-order
More informationOptimization for Machine Learning
Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html
More informationarxiv: v1 [math.oc] 10 Oct 2018
8 Frank-Wolfe Method is Automatically Adaptive to Error Bound ondition arxiv:80.04765v [math.o] 0 Oct 08 Yi Xu yi-xu@uiowa.edu Tianbao Yang tianbao-yang@uiowa.edu Department of omputer Science, The University
More informationConvex Optimization on Large-Scale Domains Given by Linear Minimization Oracles
Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Arkadi Nemirovski H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology Joint research
More informationOne Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties
One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties Fedor S. Stonyakin 1 and Alexander A. Titov 1 V. I. Vernadsky Crimean Federal University, Simferopol,
More informationCoordinate Descent and Ascent Methods
Coordinate Descent and Ascent Methods Julie Nutini Machine Learning Reading Group November 3 rd, 2015 1 / 22 Projected-Gradient Methods Motivation Rewrite non-smooth problem as smooth constrained problem:
More informationConvex Optimization Lecture 16
Convex Optimization Lecture 16 Today: Projected Gradient Descent Conditional Gradient Descent Stochastic Gradient Descent Random Coordinate Descent Recall: Gradient Descent (Steepest Descent w.r.t Euclidean
More informationDISCUSSION PAPER 2011/70. Stochastic first order methods in smooth convex optimization. Olivier Devolder
011/70 Stochastic first order methods in smooth convex optimization Olivier Devolder DISCUSSION PAPER Center for Operations Research and Econometrics Voie du Roman Pays, 34 B-1348 Louvain-la-Neuve Belgium
More informationAn Optimal Affine Invariant Smooth Minimization Algorithm.
An Optimal Affine Invariant Smooth Minimization Algorithm. Alexandre d Aspremont, CNRS & École Polytechnique. Joint work with Martin Jaggi. Support from ERC SIPA. A. d Aspremont IWSL, Moscow, June 2013,
More informationOn self-concordant barriers for generalized power cones
On self-concordant barriers for generalized power cones Scott Roy Lin Xiao January 30, 2018 Abstract In the study of interior-point methods for nonsymmetric conic optimization and their applications, Nesterov
More informationConvex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013
Convex Optimization (EE227A: UC Berkeley) Lecture 15 (Gradient methods III) 12 March, 2013 Suvrit Sra Optimal gradient methods 2 / 27 Optimal gradient methods We saw following efficiency estimates for
More informationWorst-Case Complexity Guarantees and Nonconvex Smooth Optimization
Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Frank E. Curtis, Lehigh University Beyond Convexity Workshop, Oaxaca, Mexico 26 October 2017 Worst-Case Complexity Guarantees and Nonconvex
More informationProximal Minimization by Incremental Surrogate Optimization (MISO)
Proximal Minimization by Incremental Surrogate Optimization (MISO) (and a few variants) Julien Mairal Inria, Grenoble ICCOPT, Tokyo, 2016 Julien Mairal, Inria MISO 1/26 Motivation: large-scale machine
More informationStochastic and online algorithms
Stochastic and online algorithms stochastic gradient method online optimization and dual averaging method minimizing finite average Stochastic and online optimization 6 1 Stochastic optimization problem
More informationLarge-scale Stochastic Optimization
Large-scale Stochastic Optimization 11-741/641/441 (Spring 2016) Hanxiao Liu hanxiaol@cs.cmu.edu March 24, 2016 1 / 22 Outline 1. Gradient Descent (GD) 2. Stochastic Gradient Descent (SGD) Formulation
More informationNewton s Method. Javier Peña Convex Optimization /36-725
Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and
More informationAn Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods
An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This
More informationUnconstrained optimization
Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout
More informationIntroduction to Nonlinear Stochastic Programming
School of Mathematics T H E U N I V E R S I T Y O H F R G E D I N B U Introduction to Nonlinear Stochastic Programming Jacek Gondzio Email: J.Gondzio@ed.ac.uk URL: http://www.maths.ed.ac.uk/~gondzio SPS
More informationWorst Case Complexity of Direct Search
Worst Case Complexity of Direct Search L. N. Vicente October 25, 2012 Abstract In this paper we prove that the broad class of direct-search methods of directional type based on imposing sufficient decrease
More informationA Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming
A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming Zhaosong Lu Lin Xiao March 9, 2015 (Revised: May 13, 2016; December 30, 2016) Abstract We propose
More informationGeneralized Uniformly Optimal Methods for Nonlinear Programming
Generalized Uniformly Optimal Methods for Nonlinear Programming Saeed Ghadimi Guanghui Lan Hongchao Zhang Janumary 14, 2017 Abstract In this paper, we present a generic framewor to extend existing uniformly
More informationOn Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence:
A Omitted Proofs from Section 3 Proof of Lemma 3 Let m x) = a i On Acceleration with Noise-Corrupted Gradients fxi ), u x i D ψ u, x 0 ) denote the function under the minimum in the lower bound By Proposition
More informationSolving Dual Problems
Lecture 20 Solving Dual Problems We consider a constrained problem where, in addition to the constraint set X, there are also inequality and linear equality constraints. Specifically the minimization problem
More informationConvergence Rates for Deterministic and Stochastic Subgradient Methods Without Lipschitz Continuity
Convergence Rates for Deterministic and Stochastic Subgradient Methods Without Lipschitz Continuity Benjamin Grimmer Abstract We generalize the classic convergence rate theory for subgradient methods to
More informationSupplement: Universal Self-Concordant Barrier Functions
IE 8534 1 Supplement: Universal Self-Concordant Barrier Functions IE 8534 2 Recall that a self-concordant barrier function for K is a barrier function satisfying 3 F (x)[h, h, h] 2( 2 F (x)[h, h]) 3/2,
More informationLocal Superlinear Convergence of Polynomial-Time Interior-Point Methods for Hyperbolic Cone Optimization Problems
Local Superlinear Convergence of Polynomial-Time Interior-Point Methods for Hyperbolic Cone Optimization Problems Yu. Nesterov and L. Tunçel November 009, revised: December 04 Abstract In this paper, we
More informationEfficient Methods for Stochastic Composite Optimization
Efficient Methods for Stochastic Composite Optimization Guanghui Lan School of Industrial and Systems Engineering Georgia Institute of Technology, Atlanta, GA 3033-005 Email: glan@isye.gatech.edu June
More informationStochastic Semi-Proximal Mirror-Prox
Stochastic Semi-Proximal Mirror-Prox Niao He Georgia Institute of echnology nhe6@gatech.edu Zaid Harchaoui NYU, Inria firstname.lastname@nyu.edu Abstract We present a direct extension of the Semi-Proximal
More informationIteration-complexity of first-order penalty methods for convex programming
Iteration-complexity of first-order penalty methods for convex programming Guanghui Lan Renato D.C. Monteiro July 24, 2008 Abstract This paper considers a special but broad class of convex programing CP)
More informationOptimization and Optimal Control in Banach Spaces
Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,
More informationNewton s Method. Ryan Tibshirani Convex Optimization /36-725
Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x
More informationA Greedy Framework for First-Order Optimization
A Greedy Framework for First-Order Optimization Jacob Steinhardt Department of Computer Science Stanford University Stanford, CA 94305 jsteinhardt@cs.stanford.edu Jonathan Huggins Department of EECS Massachusetts
More informationStochastic Gradient Descent with Only One Projection
Stochastic Gradient Descent with Only One Projection Mehrdad Mahdavi, ianbao Yang, Rong Jin, Shenghuo Zhu, and Jinfeng Yi Dept. of Computer Science and Engineering, Michigan State University, MI, USA Machine
More informationA Second-Order Path-Following Algorithm for Unconstrained Convex Optimization
A Second-Order Path-Following Algorithm for Unconstrained Convex Optimization Yinyu Ye Department is Management Science & Engineering and Institute of Computational & Mathematical Engineering Stanford
More informationA Sparsity Preserving Stochastic Gradient Method for Composite Optimization
A Sparsity Preserving Stochastic Gradient Method for Composite Optimization Qihang Lin Xi Chen Javier Peña April 3, 11 Abstract We propose new stochastic gradient algorithms for solving convex composite
More informationA globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications
A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications Weijun Zhou 28 October 20 Abstract A hybrid HS and PRP type conjugate gradient method for smooth
More informationAugmented self-concordant barriers and nonlinear optimization problems with finite complexity
Augmented self-concordant barriers and nonlinear optimization problems with finite complexity Yu. Nesterov J.-Ph. Vial October 9, 2000 Abstract In this paper we study special barrier functions for the
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationAn adaptive accelerated first-order method for convex optimization
An adaptive accelerated first-order method for convex optimization Renato D.C Monteiro Camilo Ortiz Benar F. Svaiter July 3, 22 (Revised: May 4, 24) Abstract This paper presents a new accelerated variant
More informationA Unified Approach to Proximal Algorithms using Bregman Distance
A Unified Approach to Proximal Algorithms using Bregman Distance Yi Zhou a,, Yingbin Liang a, Lixin Shen b a Department of Electrical Engineering and Computer Science, Syracuse University b Department
More informationLearning with stochastic proximal gradient
Learning with stochastic proximal gradient Lorenzo Rosasco DIBRIS, Università di Genova Via Dodecaneso, 35 16146 Genova, Italy lrosasco@mit.edu Silvia Villa, Băng Công Vũ Laboratory for Computational and
More informationSelf-Concordant Barrier Functions for Convex Optimization
Appendix F Self-Concordant Barrier Functions for Convex Optimization F.1 Introduction In this Appendix we present a framework for developing polynomial-time algorithms for the solution of convex optimization
More informationWorst Case Complexity of Direct Search
Worst Case Complexity of Direct Search L. N. Vicente May 3, 200 Abstract In this paper we prove that direct search of directional type shares the worst case complexity bound of steepest descent when sufficient
More informationComplexity and Simplicity of Optimization Problems
Complexity and Simplicity of Optimization Problems Yurii Nesterov, CORE/INMA (UCL) February 17 - March 23, 2012 (ULg) Yu. Nesterov Algorithmic Challenges in Optimization 1/27 Developments in Computer Sciences
More informationResearch Note. A New Infeasible Interior-Point Algorithm with Full Nesterov-Todd Step for Semi-Definite Optimization
Iranian Journal of Operations Research Vol. 4, No. 1, 2013, pp. 88-107 Research Note A New Infeasible Interior-Point Algorithm with Full Nesterov-Todd Step for Semi-Definite Optimization B. Kheirfam We
More informationA full-newton step infeasible interior-point algorithm for linear programming based on a kernel function
A full-newton step infeasible interior-point algorithm for linear programming based on a kernel function Zhongyi Liu, Wenyu Sun Abstract This paper proposes an infeasible interior-point algorithm with
More informationmin f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;
Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many
More informationNOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained
NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume
More informationSubgradient methods for huge-scale optimization problems
CORE DISCUSSION PAPER 2012/02 Subgradient methods for huge-scale optimization problems Yu. Nesterov January, 2012 Abstract We consider a new class of huge-scale problems, the problems with sparse subgradients.
More information12. Interior-point methods
12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity
More informationLimited Memory Kelley s Method Converges for Composite Convex and Submodular Objectives
Limited Memory Kelley s Method Converges for Composite Convex and Submodular Objectives Madeleine Udell Operations Research and Information Engineering Cornell University Based on joint work with Song
More informationFull Newton step polynomial time methods for LO based on locally self concordant barrier functions
Full Newton step polynomial time methods for LO based on locally self concordant barrier functions (work in progress) Kees Roos and Hossein Mansouri e-mail: [C.Roos,H.Mansouri]@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/
More informationLecture 8 Plus properties, merit functions and gap functions. September 28, 2008
Lecture 8 Plus properties, merit functions and gap functions September 28, 2008 Outline Plus-properties and F-uniqueness Equation reformulations of VI/CPs Merit functions Gap merit functions FP-I book:
More informationA Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization
A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London www.doc.ic.ac.uk/ pp500 p.parpas@imperial.ac.uk jointly with D.V.
More informationStochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization
Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization Shai Shalev-Shwartz and Tong Zhang School of CS and Engineering, The Hebrew University of Jerusalem Optimization for Machine
More informationAn interior-point trust-region polynomial algorithm for convex programming
An interior-point trust-region polynomial algorithm for convex programming Ye LU and Ya-xiang YUAN Abstract. An interior-point trust-region algorithm is proposed for minimization of a convex quadratic
More informationLecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem
Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R
More informationLargest dual ellipsoids inscribed in dual cones
Largest dual ellipsoids inscribed in dual cones M. J. Todd June 23, 2005 Abstract Suppose x and s lie in the interiors of a cone K and its dual K respectively. We seek dual ellipsoidal norms such that
More informationStructural and Multidisciplinary Optimization. P. Duysinx and P. Tossings
Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be
More informationOptimisation in Higher Dimensions
CHAPTER 6 Optimisation in Higher Dimensions Beyond optimisation in 1D, we will study two directions. First, the equivalent in nth dimension, x R n such that f(x ) f(x) for all x R n. Second, constrained
More informationOn Nesterov s Random Coordinate Descent Algorithms - Continued
On Nesterov s Random Coordinate Descent Algorithms - Continued Zheng Xu University of Texas At Arlington February 20, 2015 1 Revisit Random Coordinate Descent The Random Coordinate Descent Upper and Lower
More informationConvex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014
Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,
More informationLecture 25: Subgradient Method and Bundle Methods April 24
IE 51: Convex Optimization Spring 017, UIUC Lecture 5: Subgradient Method and Bundle Methods April 4 Instructor: Niao He Scribe: Shuanglong Wang Courtesy warning: hese notes do not necessarily cover everything
More informationConvergence rate of inexact proximal point methods with relative error criteria for convex optimization
Convergence rate of inexact proximal point methods with relative error criteria for convex optimization Renato D. C. Monteiro B. F. Svaiter August, 010 Revised: December 1, 011) Abstract In this paper,
More informationDesign and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016
Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall 206 2 Nov 2 Dec 206 Let D be a convex subset of R n. A function f : D R is convex if it satisfies f(tx + ( t)y) tf(x)
More information6. Proximal gradient method
L. Vandenberghe EE236C (Spring 2016) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping
More informationc 2015 Society for Industrial and Applied Mathematics
SIAM J. OPTIM. Vol. 5, No. 1, pp. 115 19 c 015 Society for Industrial and Applied Mathematics DUALITY BETWEEN SUBGRADIENT AND CONDITIONAL GRADIENT METHODS FRANCIS BACH Abstract. Given a convex optimization
More informationConvex Optimization Algorithms for Machine Learning in 10 Slides
Convex Optimization Algorithms for Machine Learning in 10 Slides Presenter: Jul. 15. 2015 Outline 1 Quadratic Problem Linear System 2 Smooth Problem Newton-CG 3 Composite Problem Proximal-Newton-CD 4 Non-smooth,
More informationRandomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming
Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming Zhaosong Lu Lin Xiao June 25, 2013 Abstract In this paper we propose a randomized block coordinate non-monotone
More informationNumerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen
Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen
More informationThe Frank-Wolfe Algorithm:
The Frank-Wolfe Algorithm: New Results, and Connections to Statistical Boosting Paul Grigas, Robert Freund, and Rahul Mazumder http://web.mit.edu/rfreund/www/talks.html Massachusetts Institute of Technology
More informationPrimal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming
Mathematical Programming manuscript No. (will be inserted by the editor) Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming Guanghui Lan Zhaosong Lu Renato D. C. Monteiro
More informationExtreme Abridgment of Boyd and Vandenberghe s Convex Optimization
Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The
More informationFrank-Wolfe Method. Ryan Tibshirani Convex Optimization
Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)
More informationOptimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison
Optimization Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison optimization () cost constraints might be too much to cover in 3 hours optimization (for big
More informationOn Conically Ordered Convex Programs
On Conically Ordered Convex Programs Shuzhong Zhang May 003; revised December 004 Abstract In this paper we study a special class of convex optimization problems called conically ordered convex programs
More informationTutorial: PART 2. Online Convex Optimization, A Game- Theoretic Approach to Learning
Tutorial: PART 2 Online Convex Optimization, A Game- Theoretic Approach to Learning Elad Hazan Princeton University Satyen Kale Yahoo Research Exploiting curvature: logarithmic regret Logarithmic regret
More informationLECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE
LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization
More informationWorst case complexity of direct search
EURO J Comput Optim (2013) 1:143 153 DOI 10.1007/s13675-012-0003-7 ORIGINAL PAPER Worst case complexity of direct search L. N. Vicente Received: 7 May 2012 / Accepted: 2 November 2012 / Published online:
More informationA Proximal Method for Identifying Active Manifolds
A Proximal Method for Identifying Active Manifolds W.L. Hare April 18, 2006 Abstract The minimization of an objective function over a constraint set can often be simplified if the active manifold of the
More informationGlobal convergence of the Heavy-ball method for convex optimization
Noname manuscript No. will be inserted by the editor Global convergence of the Heavy-ball method for convex optimization Euhanna Ghadimi Hamid Reza Feyzmahdavian Mikael Johansson Received: date / Accepted:
More information