One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties

Fedor S. Stonyakin¹ and Alexander A. Titov²

¹ V. I. Vernadsky Crimean Federal University, Simferopol, Russia (fedyor@mail.ru)
² Moscow Institute of Physics and Technologies, Moscow, Russia (a.a.titov@phystech.edu)

Abstract. The paper is devoted to a special Mirror Descent algorithm for problems of convex minimization with functional constraints. The objective function may not satisfy the Lipschitz condition, but it must have a Lipschitz-continuous gradient. We assume that the functional constraint can be non-smooth, but it must satisfy the Lipschitz condition. In particular, such functionals appear in the well-known Truss Topology Design problem. We also apply the technique of restarts to the mentioned version of Mirror Descent for strongly convex problems. Convergence rate estimates are obtained for the Mirror Descent algorithms under consideration.

Keywords: Adaptive Mirror Descent algorithm · Lipschitz-continuous gradient · Technique of restarts

1 Introduction

The optimization of non-smooth functionals with constraints attracts widespread interest in large-scale optimization and its applications [6, 18]. There are various methods for solving this kind of optimization problem, for example the bundle-level method [14], the penalty method [19], and the Lagrange multipliers method [7]. Among them, Mirror Descent (MD) [4, 12] is viewed as a simple method for non-smooth convex optimization.

We consider optimization problems with quadratic functionals which do not satisfy the usual Lipschitz property (or whose Lipschitz constant is quite large) but which have a Lipschitz-continuous gradient. For such problems, in ([2], item 3.3) the ideas of [14, 15] were adopted to construct an adaptive version of the Mirror Descent algorithm.

Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: S. Belim et al. (eds.): OPTA-SCL 2018, Omsk, Russia, published at http://ceur-ws.org

For example, let $A_i$ ($i = 1, \dots, m$) be positive semi-definite matrices, i.e. $x^T A_i x \geq 0$ for all $x$, and let the objective function be $f(x) = \max_{1 \leq i \leq m} f_i(x)$ for $f_i(x) = \frac{1}{2}\langle A_i x, x \rangle - \langle b_i, x \rangle + \alpha_i$, $i = 1, \dots, m$. Note that such functionals appear in the Truss Topology Design problem with weights of the bars [15] (a code sketch of this objective appears at the end of this section).

In this paper we propose a partially adaptive (with respect to the objective functional) version of the algorithm from ([2], item 3.3). It simplifies work with problems where calculating the norm of the subgradient of the functional constraint is burdensome in view of the large number of constraints. The idea of restarts [9] is adopted to construct the proposed algorithm in the case of strongly convex objective and constraints. It is well known that both considered methods are optimal in terms of the lower bounds [12].

Note that a functional constraint, generally, can be non-smooth. That is why we consider subgradient methods. These methods have a long history, starting with the method for deterministic unconstrained problems in the Euclidean setting [17] and its generalization to constrained problems [16], where the idea of switching steps between the direction of the subgradient of the objective and the direction of the subgradient of the constraint was suggested. The non-Euclidean extension, usually referred to as Mirror Descent, originated in [10, 12] and was later analyzed in [4]. An extension for constrained problems was proposed in [12]; see also a recent version in [3]. To prove a faster convergence rate of Mirror Descent for a strongly convex objective in the unconstrained case, the restart technique [11–13] was used in [8]. Usually, the stepsize and stopping rule for Mirror Descent require knowledge of the Lipschitz constant of the objective function and of the constraint, if any. Adaptive stepsizes, which do not require this information, are considered in [5] for problems without inequality constraints, and in [3] for constrained problems.

We consider Mirror Descent algorithms for constrained problems in the case of non-standard growth properties of the objective functional (it has a Lipschitz-continuous gradient). The paper consists of the Introduction and three main sections. In Section 2 we give basic notation concerning convex optimization problems with functional constraints. In Section 3 we describe a partially adaptive version (Algorithm 2) of the Mirror Descent algorithm from ([2], item 3.3) and prove estimates for its rate of convergence. The final Section 4 is focused on the strongly convex case, with restarts of Algorithm 2 and the corresponding theoretical estimates for the rate of convergence.
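To make the max-of-quadratics objective mentioned above concrete, here is a minimal Python sketch (the data $A_i$, $b_i$, $\alpha_i$ are hypothetical placeholders, and the function names are ours, not the paper's) that evaluates $f(x) = \max_i f_i(x)$ and returns a subgradient, namely the gradient of an active piece:

```python
import numpy as np

def make_max_quadratic(As, bs, alphas):
    """Oracles for f(x) = max_i (0.5*<A_i x, x> - <b_i, x> + alpha_i)."""
    def f(x):
        return max(0.5 * x @ A @ x - b @ x + a
                   for A, b, a in zip(As, bs, alphas))

    def subgrad_f(x):
        vals = [0.5 * x @ A @ x - b @ x + a
                for A, b, a in zip(As, bs, alphas)]
        i = int(np.argmax(vals))       # index of an active piece
        return As[i] @ x - bs[i]       # grad f_i(x) is a subgradient of f at x
    return f, subgrad_f

# Toy instance with positive semi-definite A_i (placeholder data).
rng = np.random.default_rng(0)
n, m = 5, 3
As = []
for _ in range(m):
    M = rng.standard_normal((n, n))
    As.append(M @ M.T)                 # A_i = M M^T is positive semi-definite
bs = [rng.standard_normal(n) for _ in range(m)]
alphas = [float(a) for a in rng.standard_normal(m)]
f, subgrad_f = make_max_quadratic(As, bs, alphas)
```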

2 Problem Statement and Standard Mirror Descent Basics

Let $(E, \|\cdot\|)$ be a normed finite-dimensional vector space and $E^*$ be the conjugate space of $E$ with the norm

$$\|y\|_* = \max_x \{\langle y, x \rangle : \|x\| \leq 1\},$$

where $\langle y, x \rangle$ is the value of the continuous linear functional $y$ at $x \in E$.

Let $X \subset E$ be a (simple) closed convex set. We consider two convex subdifferentiable functionals $f$ and $g: X \to \mathbb{R}$. Also, we assume that $g$ is Lipschitz-continuous:

$$|g(x) - g(y)| \leq M_g \|x - y\| \quad \forall x, y \in X. \tag{1}$$

We focus on the following type of convex optimization problems:

$$f(x) \to \min_{x \in X}, \tag{2}$$
$$\text{s.t. } g(x) \leq 0. \tag{3}$$

Let $d: X \to \mathbb{R}$ be a distance generating function (d.g.f.) which is continuously differentiable and 1-strongly convex w.r.t. the norm $\|\cdot\|$, i.e.

$$\langle \nabla d(x) - \nabla d(y), x - y \rangle \geq \|x - y\|^2 \quad \forall x, y \in X,$$

and assume that $\min_{x \in X} d(x) = d(0)$. Suppose we have a constant $\Theta_0$ such that $d(x_*) \leq \Theta_0^2$, where $x_*$ is a solution of (2)–(3). Note that if there is a set of optimal points $X_*$, then we may assume that $\min_{x_* \in X_*} d(x_*) \leq \Theta_0^2$.

For all $x, y \in X$ consider the corresponding Bregman divergence

$$V(x, y) = d(y) - d(x) - \langle \nabla d(x), y - x \rangle.$$

Standard proximal setups, i.e. Euclidean, entropy, $\ell_1/\ell_2$, simplex, nuclear norm, spectahedron, can be found, e.g., in [5]. Let us define the proximal mapping operator in the standard way:

$$\text{Mirr}_x(p) = \arg\min_{u \in X} \{\langle p, u \rangle + V(x, u)\} \quad \text{for each } x \in X \text{ and } p \in E^*.$$

We make the simplicity assumption, which means that $\text{Mirr}_x(p)$ is easily computable.
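For illustration, here is a minimal Python sketch of the proximal mapping $\text{Mirr}_x(p)$ in two of the standard setups just mentioned; the closed forms below are the standard ones (a projected gradient step for the Euclidean setup, a multiplicative-weights update for the entropy setup on the simplex), while the function names are ours:

```python
import numpy as np

# Euclidean setup: d(x) = 0.5*||x||_2^2, so V(x, y) = 0.5*||y - x||_2^2 and
# Mirr_x(p) = Proj_X(x - p); `proj` is the Euclidean projection onto X.
def mirr_euclidean(x, p, proj):
    return proj(x - p)

# Entropy setup on the unit simplex X = {x >= 0, sum(x) = 1}:
# d(x) = sum_i x_i*log(x_i), V(x, y) = KL(y || x), and Mirr_x(p) is a
# multiplicative-weights update followed by renormalization.
def mirr_entropy(x, p):
    w = x * np.exp(-p)
    return w / w.sum()
```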

3 A Mirror Descent Algorithm for the Type of Problems Under Consideration

Following [14], given a function $f$, for each subgradient $\nabla f(x)$ at a point $y \in X$ we define

$$v_f(x, y) = \begin{cases} \left\langle \dfrac{\nabla f(x)}{\|\nabla f(x)\|_*},\, x - y \right\rangle, & \nabla f(x) \neq 0, \\ 0, & \nabla f(x) = 0, \end{cases} \qquad x \in X. \tag{4}$$

In ([2], item 3.3) the following adaptive Mirror Descent algorithm for Problem (2)–(3) was proposed by the first author.

Algorithm 1 Adaptive Mirror Descent, Non-Standard Growth
Require: $\varepsilon$, $\Theta_0$, $X$, $d(\cdot)$
1: $x^0 = \arg\min_{x \in X} d(x)$
2: $I := \emptyset$
3: $N \leftarrow 0$
4: repeat
5: if $g(x^N) \leq \varepsilon$ then
6: $h_N \leftarrow \dfrac{\varepsilon}{\|\nabla f(x^N)\|_*}$
7: $x^{N+1} \leftarrow \text{Mirr}_{x^N}(h_N \nabla f(x^N))$ ("productive steps")
8: $I \leftarrow I \cup \{N\}$
9: else
10: ($g(x^N) > \varepsilon$)
11: $h_N \leftarrow \dfrac{\varepsilon}{\|\nabla g(x^N)\|_*^2}$
12: $x^{N+1} \leftarrow \text{Mirr}_{x^N}(h_N \nabla g(x^N))$ ("non-productive steps")
13: end if
14: $N \leftarrow N + 1$
15: until $\dfrac{2\Theta_0^2}{\varepsilon^2} \leq |I| + \sum_{k \notin I} \dfrac{1}{\|\nabla g(x^k)\|_*^2}$
Ensure: $\bar{x}^N := \arg\min_{x^k,\, k \in I} f(x^k)$

For this method the following result was obtained in [2].

Theorem 1. Let $\varepsilon > 0$ be a fixed positive number and let Algorithm 1 work

$$N = \left\lceil \frac{2 \max\{1, M_g^2\}\, \Theta_0^2}{\varepsilon^2} \right\rceil \tag{5}$$

steps. Then

$$\min_{k \in I} v_f(x^k, x_*) < \varepsilon. \tag{6}$$
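A compact Python transcription of Algorithm 1 is given below; it is a sketch under the Euclidean proximal setup, where $\text{Mirr}_x(p) = \text{Proj}_X(x - p)$ and $\|\cdot\|_* = \|\cdot\|_2$. The oracle and projection arguments are our naming, and the sketch assumes $\nabla f(x^N) \neq 0$ at productive steps (cf. (4)):

```python
import numpy as np

def adaptive_md(f, grad_f, g, grad_g, proj, x0, eps, Theta0):
    """Algorithm 1 sketch, Euclidean setup: Mirr_x(p) = proj(x - p)."""
    x = x0.copy()
    best_x, best_f = None, np.inf   # running arg min of f over productive x^k
    crit = 0.0                      # |I| + sum_{k not in I} 1/||grad g(x^k)||^2
    while crit < 2 * Theta0**2 / eps**2:
        if g(x) <= eps:             # productive step
            if f(x) < best_f:
                best_x, best_f = x.copy(), f(x)
            gf = grad_f(x)          # assumed nonzero at productive steps
            x = proj(x - (eps / np.linalg.norm(gf)) * gf)
            crit += 1.0
        else:                       # non-productive step
            gg = grad_g(x)
            sq = float(gg @ gg)     # ||grad g(x)||_2^2
            x = proj(x - (eps / sq) * gg)
            crit += 1.0 / sq
    return best_x                   # None would mean no productive steps occurred
```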

Let us recall one well-known statement (see, e.g., [5]).

Lemma 1. Let $f: X \to \mathbb{R}$ be a convex subdifferentiable function over the convex set $X$ and let a sequence $\{x^k\}$ be defined by the relation

$$x^{k+1} = \text{Mirr}_{x^k}(h_k \nabla f(x^k)).$$

Then for each $x \in X$

$$h_k \langle \nabla f(x^k), x^k - x \rangle \leq \frac{h_k^2}{2}\|\nabla f(x^k)\|_*^2 + V(x^k, x) - V(x^{k+1}, x). \tag{7}$$

The following Algorithm 2 is proposed by us for Problem (2)–(3).

Algorithm 2 Partially Adaptive Version of Algorithm 1
Require: $\varepsilon$, $\Theta_0$, $X$, $d(\cdot)$
1: $x^0 = \arg\min_{x \in X} d(x)$
2: $I := \emptyset$
3: $N \leftarrow 0$
4: if $g(x^N) \leq \varepsilon$ then
5: $h_N \leftarrow \dfrac{\varepsilon}{M_g \|\nabla f(x^N)\|_*}$
6: $x^{N+1} \leftarrow \text{Mirr}_{x^N}(h_N \nabla f(x^N))$ ("productive steps")
7: $I \leftarrow I \cup \{N\}$
8: else
9: ($g(x^N) > \varepsilon$)
10: $h_N \leftarrow \dfrac{\varepsilon}{M_g^2}$
11: $x^{N+1} \leftarrow \text{Mirr}_{x^N}(h_N \nabla g(x^N))$ ("non-productive steps")
12: end if
13: $N \leftarrow N + 1$
Ensure: $\bar{x}^N := \arg\min_{x^k,\, k \in I} f(x^k)$

Here steps 4–13 are repeated; the total number of steps $N$ is fixed by (11) in Theorem 2 below. Set $[N] = \{0, 1, \dots, N-1\}$ and $J = [N] \setminus I$, where $I$ is the collection of indexes of productive steps, with

$$h_k = \frac{\varepsilon}{M_g \|\nabla f(x^k)\|_*}, \tag{8}$$

and $|I|$ is the number of productive steps. Similarly, for the non-productive steps from the set $J$ the stepsize is

$$h_k = \frac{\varepsilon}{M_g^2}, \tag{9}$$

and $|J|$ is the number of non-productive steps. Obviously,

$$|I| + |J| = N. \tag{10}$$
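The only change relative to the Algorithm 1 sketch above is the stepsize rule (8)–(9), which uses the single constant $M_g$ in place of the per-iterate norms $\|\nabla g(x^k)\|_*$, together with the fixed number of steps $N$ from (11) below. A minimal Python sketch under the same Euclidean-setup assumptions:

```python
import numpy as np

def partial_adaptive_md(f, grad_f, g, grad_g, proj, x0, eps, Theta0, Mg):
    """Algorithm 2 sketch, Euclidean setup: stepsizes (8)-(9), N steps from (11)."""
    N = int(np.ceil(2 * Mg**2 * Theta0**2 / eps**2))
    x = x0.copy()
    best_x, best_f = None, np.inf           # arg min of f over productive x^k
    for _ in range(N):
        if g(x) <= eps:                     # productive: h = eps/(Mg*||grad f||)
            if f(x) < best_f:
                best_x, best_f = x.copy(), f(x)
            gf = grad_f(x)                  # assumed nonzero at productive steps
            x = proj(x - (eps / (Mg * np.linalg.norm(gf))) * gf)
        else:                               # non-productive: h = eps/Mg^2
            x = proj(x - (eps / Mg**2) * grad_g(x))
    return best_x
```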

Let us formulate the following analogue of Theorem 1.

Theorem 2. Let $\varepsilon > 0$ be a fixed positive number and let Algorithm 2 work

$$N = \left\lceil \frac{2 M_g^2 \Theta_0^2}{\varepsilon^2} \right\rceil \tag{11}$$

steps. Then

$$\min_{k \in I} v_f(x^k, x_*) < \frac{\varepsilon}{M_g}. \tag{12}$$

Proof. 1) For the productive steps, from (7) and (8) one can get that

$$h_k \langle \nabla f(x^k), x^k - x \rangle \leq \frac{h_k^2}{2}\|\nabla f(x^k)\|_*^2 + V(x^k, x) - V(x^{k+1}, x).$$

Taking into account $h_k \|\nabla f(x^k)\|_* = \frac{\varepsilon}{M_g}$, we have

$$h_k \langle \nabla f(x^k), x^k - x \rangle = \frac{\varepsilon}{M_g} \left\langle \frac{\nabla f(x^k)}{\|\nabla f(x^k)\|_*},\, x^k - x \right\rangle = \frac{\varepsilon}{M_g}\, v_f(x^k, x). \tag{13}$$

2) Similarly, for the non-productive steps $k \in J$, by convexity of $g$ and Lemma 1,

$$h_k (g(x^k) - g(x)) \leq \frac{h_k^2}{2}\|\nabla g(x^k)\|_*^2 + V(x^k, x) - V(x^{k+1}, x).$$

Using (9) and the bound $\|\nabla g(x^k)\|_* \leq M_g$ (which follows from (1)), we have

$$h_k (g(x^k) - g(x)) \leq \frac{\varepsilon^2}{2 M_g^2} + V(x^k, x) - V(x^{k+1}, x). \tag{14}$$

3) From (13) and (14), for $x = x_*$ we have

$$\frac{\varepsilon}{M_g} \sum_{k \in I} v_f(x^k, x_*) + \sum_{k \in J} h_k (g(x^k) - g(x_*)) \leq \frac{N \varepsilon^2}{2 M_g^2} + \sum_{k=0}^{N-1} \left( V(x^k, x_*) - V(x^{k+1}, x_*) \right). \tag{15}$$

Let us note that for any $k \in J$ we have $g(x^k) - g(x_*) \geq g(x^k) > \varepsilon$, and in view of

$$\sum_{k=0}^{N-1} \left( V(x^k, x_*) - V(x^{k+1}, x_*) \right) \leq \Theta_0^2,$$

the inequality (15) can be transformed in the following way:

$$\frac{\varepsilon}{M_g} \sum_{k \in I} v_f(x^k, x_*) \leq \frac{N \varepsilon^2}{2 M_g^2} + \Theta_0^2 - \frac{\varepsilon^2}{M_g^2}|J|.$$

On the other hand,

$$\sum_{k \in I} v_f(x^k, x_*) \geq |I| \min_{k \in I} v_f(x^k, x_*).$$

Thus, assume that

$$N \geq \frac{2 M_g^2 \Theta_0^2}{\varepsilon^2}, \quad \text{or} \quad \Theta_0^2 \leq \frac{N \varepsilon^2}{2 M_g^2}. \tag{16}$$

Then

$$\frac{\varepsilon}{M_g}|I| \min_{k \in I} v_f(x^k, x_*) < \frac{N \varepsilon^2}{2 M_g^2} + \frac{N \varepsilon^2}{2 M_g^2} - \frac{\varepsilon^2}{M_g^2}|J| = \frac{\varepsilon^2}{M_g^2}(N - |J|) = \frac{\varepsilon^2}{M_g^2}|I|,$$

whence

$$\min_{k \in I} v_f(x^k, x_*) < \frac{\varepsilon}{M_g}. \tag{17}$$

To finish the proof we should demonstrate that $I \neq \emptyset$. Supposing the reverse, we claim that $I = \emptyset$ and $|J| = N$, i.e. all the steps are non-productive. Then, using $g(x^k) - g(x_*) \geq g(x^k) > \varepsilon$, we can see that

$$\sum_{k=0}^{N-1} h_k (g(x^k) - g(x_*)) \leq N \frac{\varepsilon^2}{2 M_g^2} + \Theta_0^2 \leq N \frac{\varepsilon^2}{2 M_g^2} + N \frac{\varepsilon^2}{2 M_g^2} = N \frac{\varepsilon^2}{M_g^2}.$$

So,

$$\frac{\varepsilon}{M_g^2} \sum_{k=0}^{N-1} \left( g(x^k) - g(x_*) \right) \leq N \frac{\varepsilon^2}{M_g^2}$$

and

$$N \varepsilon < \sum_{k=0}^{N-1} \left( g(x^k) - g(x_*) \right) \leq N \varepsilon.$$

So we have a contradiction. It means that $I \neq \emptyset$. □

The following auxiliary assertion (see, e.g., [14, 15]) is fulfilled ($x_*$ is a solution of (2)–(3)).

Lemma 2. Let us introduce the following function:

$$\omega(\tau) = \max_{x \in X} \{ f(x) - f(x_*) : \|x - x_*\| \leq \tau \}, \tag{18}$$

where $\tau$ is a positive number. Then for any $y \in X$

$$f(y) - f(x_*) \leq \omega(v_f(y, x_*)). \tag{19}$$

Now we can show how, using the previous assertion and Theorem 2, one can estimate the rate of convergence of Algorithm 2 if the objective function $f$ is differentiable and its gradient satisfies the Lipschitz condition

$$\|\nabla f(x) - \nabla f(y)\|_* \leq L \|x - y\| \quad \forall x, y \in X. \tag{20}$$

Using the well-known fact

$$f(x) \leq f(x_*) + \langle \nabla f(x_*), x - x_* \rangle + \frac{L}{2}\|x - x_*\|^2,$$

we can get that

$$\min_{k \in I} f(x^k) - f(x_*) \leq \min_{k \in I} \left\{ \|\nabla f(x_*)\|_* \|x^k - x_*\| + \frac{L}{2}\|x^k - x_*\|^2 \right\}.$$

So, by (12),

$$\min_{k \in I} f(x^k) - f(x_*) \leq \frac{\varepsilon}{M_g}\|\nabla f(x_*)\|_* + \frac{L \varepsilon^2}{2 M_g^2} = \varepsilon \hat{f} + \frac{L \varepsilon^2}{2 M_g^2}, \quad \text{where } \hat{f} = \frac{\|\nabla f(x_*)\|_*}{M_g}.$$

That is why the following result holds.

Corollary 1. Let $f$ be differentiable on $X$ and (20) hold. Then after

$$N = \left\lceil \frac{2 M_g^2 \Theta_0^2}{\varepsilon^2} \right\rceil$$

steps of Algorithm 2 the following estimate is fulfilled:

$$\min_{0 \leq k \leq N} f(x^k) - f(x_*) \leq \varepsilon \hat{f} + \frac{L \varepsilon^2}{2 M_g^2}, \quad \text{where } \hat{f} = \frac{\|\nabla f(x_*)\|_*}{M_g}.$$

We can apply our method to a class of problems with a special class of non-smooth objective functionals.

Corollary 2. Assume that $f(x) = \max_{i=1,\dots,m} f_i(x)$, where the $f_i$ are differentiable at each $x \in X$ and

$$\|\nabla f_i(x) - \nabla f_i(y)\|_* \leq L_i \|x - y\| \quad \forall x, y \in X.$$

Then after

$$N = \left\lceil \frac{2 M_g^2 \Theta_0^2}{\varepsilon^2} \right\rceil$$

steps of Algorithm 2 the following estimate is fulfilled:

$$\min_{0 \leq k \leq N} f(x^k) - f(x_*) \leq \varepsilon \hat{f} + \frac{L \varepsilon^2}{2 M_g^2}, \quad \text{where } \hat{f} = \frac{\|\nabla f(x_*)\|_*}{M_g}, \; L = \max_{i=1,\dots,m} L_i.$$

Remark 1. Generally, $\nabla f(x_*) \neq 0$, because we consider a class of constrained problems.

4 On the Technique of Restarts in the Considered Version of Mirror Descent for Strongly Convex Problems

In this section we consider the problem

$$f(x) \to \min, \quad g(x) \leq 0, \quad x \in X \tag{21}$$

with assumption (1) and the additional assumption of strong convexity of $f$ and $g$ with the same parameter $\mu$, i.e.,

$$f(y) \geq f(x) + \langle \nabla f(x), y - x \rangle + \frac{\mu}{2}\|y - x\|^2 \quad \forall x, y \in X,$$

and the same holds for $g$. We also slightly modify the assumptions on the prox-function $d(x)$. Namely, we assume that $0 = \arg\min_{x \in X} d(x)$ and that $d$ is bounded on the unit ball in the chosen norm $\|\cdot\|_E$, that is,

$$d(x) \leq \Theta_0^2 \quad \forall x \in X : \|x\| \leq 1, \tag{22}$$

where $\Theta_0$ is some known number. Finally, we assume that we are given a starting point $x_0 \in X$ and a number $R_0 > 0$ such that $\|x_0 - x_*\| \leq R_0$.

To construct a method for solving problem (21) under the stated assumptions, we use the idea of restarting Algorithm 2. The idea of restarting a method for convex problems to obtain a faster rate of convergence for strongly convex problems dates back to the 1980s; see, e.g., [12, 13]. To show that restarting is also possible for problems with inequality constraints, we rely on the following lemma (see, e.g., [1]).

Lemma 3. Let $f$ and $g$ be $\mu$-strongly convex functionals with respect to the norm $\|\cdot\|$ on $X$, let $x_* = \arg\min_{x \in X}\{f(x) : g(x) \leq 0\}$, and let for some $\varepsilon_f > 0$ and $\varepsilon_g > 0$

$$f(x) - f(x_*) \leq \varepsilon_f, \quad g(x) \leq \varepsilon_g. \tag{23}$$

Then

$$\frac{\mu}{2}\|x - x_*\|^2 \leq \max\{\varepsilon_f, \varepsilon_g\}. \tag{24}$$

Under the conditions of Corollary 2, after Algorithm 2 stops, inequalities (23) hold with

$$\varepsilon_f = \frac{\varepsilon \|\nabla f(x_*)\|_*}{M_g} + \frac{L \varepsilon^2}{2 M_g^2} \quad \text{and} \quad \varepsilon_g = \varepsilon.$$

Consider the function $\tau: \mathbb{R}^+ \to \mathbb{R}^+$:

$$\tau(\delta) = \max\left\{ \delta \|\nabla f(x_*)\|_* + \frac{\delta^2 L}{2};\; \delta M_g \right\}.$$

It is clear that $\tau$ increases, and therefore for each $\varepsilon > 0$ there exists $\varphi(\varepsilon) > 0$ such that $\tau(\varphi(\varepsilon)) = \varepsilon$.

Remark 2. $\varphi(\varepsilon)$ depends on $\|\nabla f(x_*)\|_*$ and the Lipschitz constant $L$ for $\nabla f$. If $\|\nabla f(x_*)\|_* < M_g$, then

$$\varphi(\varepsilon) = \frac{\varepsilon}{M_g} \quad \text{for small enough } \varepsilon: \; \varepsilon < \frac{2 M_g (M_g - \|\nabla f(x_*)\|_*)}{L}.$$

In the other case ($\|\nabla f(x_*)\|_* > M_g$) we have, for each $\varepsilon > 0$,

$$\varphi(\varepsilon) = \frac{\sqrt{\|\nabla f(x_*)\|_*^2 + 2 L \varepsilon} - \|\nabla f(x_*)\|_*}{L}.$$

Let us consider the following Algorithm 3 for problem (21).

Algorithm 3 Algorithm for the Strongly Convex Problem
Require: accuracy $\varepsilon > 0$; strong convexity parameter $\mu$; $\Theta_0$ s.t. $d(x) \leq \Theta_0^2$ $\forall x \in X: \|x\| \leq 1$; starting point $x_0$ and number $R_0$ s.t. $\|x_0 - x_*\| \leq R_0$.
1: Set $d_0(x) = d\left(\frac{x - x_0}{R_0}\right)$.
2: Set $p = 1$.
3: repeat
4: Set $R_p^2 = R_0^2 \cdot 2^{-p}$.
5: Set $\varepsilon_p = \frac{\mu R_p^2}{2}$.
6: Set $x_p$ as the output of Algorithm 2 with accuracy $\varphi(\varepsilon_p)$, prox-function $d_{p-1}(\cdot)$ and $\Theta_0^2$.
7: $d_p(x) \leftarrow d\left(\frac{x - x_p}{R_p}\right)$.
8: Set $p = p + 1$.
9: until $p > \log_2 \frac{\mu R_0^2}{2\varepsilon}$.
Ensure: $x_p$.
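Below is a Python sketch of this restart scheme under the same Euclidean-setup assumptions as before, reusing `partial_adaptive_md` from the Algorithm 2 sketch. The helper `phi` implements Remark 2 by solving $\tau(\delta) = \varepsilon$ on each branch and taking the smaller root; the parameter `nf_star` stands for $\|\nabla f(x_*)\|_*$, which the sketch assumes is known (or upper-bounded), and all names are ours:

```python
import numpy as np

def phi(eps, nf_star, L, Mg):
    """Solve tau(delta) = eps for tau(delta) = max{delta*nf_star + delta^2*L/2,
    delta*Mg}. tau is increasing, so the root is the smaller of the two branch
    solutions (assumes L > 0)."""
    d1 = (np.sqrt(nf_star**2 + 2 * L * eps) - nf_star) / L   # first branch = eps
    d2 = eps / Mg                                            # second branch = eps
    return min(d1, d2)

def restarted_md(f, grad_f, g, grad_g, proj, x0, R0, mu, eps,
                 Theta0, Mg, nf_star, L):
    """Algorithm 3 sketch: restart Algorithm 2, halving R_p^2 at each restart.
    In the Euclidean setup, re-centering the prox-function at x_p amounts to
    simply restarting Algorithm 2 from the point x_p."""
    x, Rp2 = x0.copy(), R0**2
    p_max = int(np.ceil(np.log2(mu * R0**2 / (2 * eps))))
    for _ in range(max(p_max, 0)):
        Rp2 /= 2                        # R_p^2 = R_0^2 * 2^{-p}
        eps_p = mu * Rp2 / 2            # target accuracy eps_p = mu*R_p^2/2
        x = partial_adaptive_md(f, grad_f, g, grad_g, proj, x,
                                phi(eps_p, nf_star, L, Mg), Theta0, Mg)
    return x
```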

The following theorem holds.

Theorem 3. Let $f$ and $g$ satisfy the conditions of Corollary 2 and be $\mu$-strongly convex functionals on $X \subset \mathbb{R}^n$, and let $d(x) \leq \Theta_0^2$ for all $x \in X$ with $\|x\| \leq 1$. Let the starting point $x_0 \in X$ and the number $R_0 > 0$ be given such that $\|x_0 - x_*\| \leq R_0$. Then for

$$\hat{p} = \left\lceil \log_2 \frac{\mu R_0^2}{2\varepsilon} \right\rceil,$$

$x_{\hat{p}}$ is an $\varepsilon$-solution of Problem (2)–(3), where

$$\|x_{\hat{p}} - x_*\| \leq \sqrt{\frac{2\varepsilon}{\mu}}.$$

At the same time, the total number of iterations of Algorithm 2 does not exceed

$$\hat{p} + \sum_{p=1}^{\hat{p}} \left\lceil \frac{2 \Theta_0^2 M_g^2}{\varphi^2(\varepsilon_p)} \right\rceil, \quad \text{where } \varepsilon_p = \frac{\mu R_0^2}{2^{p+1}}.$$

Proof. The function $d_p(x)$ ($p = 0, 1, 2, \dots$) is 1-strongly convex with respect to the norm $\frac{\|\cdot\|}{R_p}$. By the method of mathematical induction we show that

$$\|x_p - x_*\| \leq R_p \quad \forall p \geq 0.$$

For $p = 0$ this assertion is obvious by virtue of the choice of $x_0$ and $R_0$. Suppose that for some $p$ we have $\|x_p - x_*\| \leq R_p$. Let us prove that $\|x_{p+1} - x_*\| \leq R_{p+1}$. We have $d_p(x_*) \leq \Theta_0^2$, and on the $(p+1)$-th restart, after no more than

$$\left\lceil \frac{2 \Theta_0^2 M_g^2}{\varphi^2(\varepsilon_{p+1})} \right\rceil$$

iterations of Algorithm 2, the following inequalities are true:

$$f(x_{p+1}) - f(x_*) \leq \varepsilon_{p+1}, \quad g(x_{p+1}) \leq \varepsilon_{p+1} \quad \text{for } \varepsilon_{p+1} = \frac{\mu R_{p+1}^2}{2}.$$

Then, according to Lemma 3,

$$\|x_{p+1} - x_*\|^2 \leq \frac{2 \varepsilon_{p+1}}{\mu} = R_{p+1}^2.$$

So, for all $p \geq 0$,

$$\|x_p - x_*\|^2 \leq R_p^2 = R_0^2 \cdot 2^{-p}, \quad f(x_p) - f(x_*) \leq \frac{\mu R_0^2}{2^{p+1}}, \quad g(x_p) \leq \frac{\mu R_0^2}{2^{p+1}}.$$

For $p = \hat{p} = \left\lceil \log_2 \frac{\mu R_0^2}{2\varepsilon} \right\rceil$ the following relation is true:

$$\|x_{\hat{p}} - x_*\| \leq R_{\hat{p}} = \frac{R_0}{2^{\hat{p}/2}} \leq \sqrt{\frac{2\varepsilon}{\mu}}.$$

It remains only to note that the total number of iterations of Algorithm 2 is no more than

$$\hat{p} + \sum_{p=1}^{\hat{p}} \left\lceil \frac{2 \Theta_0^2 M_g^2}{\varphi^2(\varepsilon_p)} \right\rceil. \qquad \square$$

Acknowledgement. The authors are very grateful to Yurii Nesterov, Alexander Gasnikov and Pavel Dvurechensky for fruitful discussions. The authors would like to thank the anonymous reviewers for useful comments and suggestions. The research by Fedor Stonyakin and Alexander Titov presented here (Algorithm 3, Theorem 3, Corollary 1 and Corollary 2) was partially supported by the Russian Foundation for Basic Research, research project 18-31-00219. The research by Fedor Stonyakin presented here (Algorithm 2 and Theorem 2) was partially supported by the grant of the President of the Russian Federation for young candidates of sciences, project no. MK-176.2017.1.

References

1. Bayandina, A., Gasnikov, A., Gasnikova, E., Matsievsky, S.: Primal-dual mirror descent for the stochastic programming problems with functional constraints. Computational Mathematics and Mathematical Physics (accepted) (2018). https://arxiv.org/pdf/1604.08194.pdf (in Russian)
2. Bayandina, A., Dvurechensky, P., Gasnikov, A., Stonyakin, F., Titov, A.: Mirror descent and convex optimization problems with non-smooth inequality constraints. In: LCCC Focus Period on Large-Scale and Distributed Optimization. Lund, Sweden: Springer (accepted) (2017). https://arxiv.org/abs/1710.06612
3. Beck, A., Ben-Tal, A., Guttmann-Beck, N., Tetruashvili, L.: The CoMirror algorithm for solving nonsmooth constrained convex problems. Operations Research Letters 38(6), 493-498 (2010)
4. Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters 31(3), 167-175 (2003)
5. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization. Society for Industrial and Applied Mathematics, Philadelphia (2001)
6. Ben-Tal, A., Nemirovski, A.: Robust Truss Topology Design via semidefinite programming. SIAM Journal on Optimization 7(4), 991-1016 (1997)
7. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, New York (2004)
8. Juditsky, A., Nemirovski, A.: First order methods for non-smooth convex large-scale optimization, I: general purpose methods. In: Sra, S., et al. (eds.) Optimization for Machine Learning, pp. 121-184. MIT Press, Cambridge, MA (2012)
9. Juditsky, A., Nesterov, Y.: Deterministic and stochastic primal-dual subgradient algorithms for uniformly convex minimization. Stochastic Systems 4(1), 44-80 (2014)
10. Nemirovskii, A.: Efficient methods for large-scale convex optimization problems. Ekonomika i Matematicheskie Metody (1979) (in Russian)
11. Nemirovskii, A., Nesterov, Y.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21-30 (1985) (in Russian)
12. Nemirovsky, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. J. Wiley & Sons, New York (1983)
13. Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(1/k^2). Soviet Mathematics Doklady 27(2), 372-376 (1983)
14. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Massachusetts (2004)

15. Nesterov, Y.: Subgradient methods for convex functions with nonstandard growth properties. https://www.mathnet.ru:8080/presentfiles/16179/growthbm_nesterov.pdf [Online; accessed 01-April-2018]
16. Polyak, B.: A general method of solving extremum problems. Soviet Mathematics Doklady 8(3), 593-597 (1967) (in Russian)
17. Shor, N. Z.: Generalized gradient descent with application to block programming. Kibernetika 3(3), 53-55 (1967) (in Russian)
18. Shpirko, S., Nesterov, Y.: Primal-dual subgradient methods for huge-scale linear conic problems. SIAM Journal on Optimization 24(3), 1444-1457 (2014)
19. Vasilyev, F.: Optimization Methods. Fizmatlit, Moscow (2002) (in Russian)