On the acceleration of augmented Lagrangian method for linearly constrained optimization


On the acceleration of augmented Lagrangian method for linearly constrained optimization

Bingsheng He and Xiaoming Yuan

October, 2010

Abstract. The classical augmented Lagrangian method (ALM) plays a fundamental role in the algorithmic development of constrained optimization. In this paper, we mainly show that Nesterov's influential acceleration techniques can be applied to accelerate the ALM, thus yielding an accelerated ALM whose iteration-complexity is O(1/k^2) for linearly constrained convex programming. As a by-product, we also show easily that the convergence rate of the original ALM is O(1/k).

Keywords. Convex programming, augmented Lagrangian method, acceleration.

1 Introduction

The classical augmented Lagrangian method (ALM), also well known as the method of multipliers, has been playing a fundamental role in the algorithmic development of constrained optimization ever since its appearance in [2] and [9]. The existing literature on the ALM is too extensive to be listed here, and we only refer to [1, 8] for comprehensive studies. In this paper, we restrict our discussion to convex minimization with linear equality constraints:

    (P)    min { f(x) | Ax = b, x ∈ X },    (1.1)

where f : R^n → R is a differentiable convex function, A ∈ R^{m×n}, b ∈ R^m, and X is a closed convex set in R^n. Throughout, we assume that the solution set of (1.1), denoted by X*, is nonempty. Note that the Lagrange function of the problem (1.1) is

    L(x, λ) = f(x) − λ^T (Ax − b),    (1.2)

where λ ∈ R^m is the Lagrange multiplier. Then, the dual problem of (1.1) is

    (D)    max_{x ∈ X, λ ∈ R^m} L(x, λ)   s.t.   (x′ − x)^T ∇_x L(x, λ) ≥ 0, ∀ x′ ∈ X.    (1.3)

We denote the solution set of (1.3) by X* × Λ*. As analyzed in [1], the ALM merges the penalty idea with the primal-dual and Lagrangian philosophy, and each of its iterations consists of the task of minimizing the augmented Lagrangian function of (1.1) and the task of updating the Lagrange multiplier. More specifically, starting with λ^0 ∈ R^m, the k-th iteration of the ALM for (1.1)
is

    x^{k+1} = Argmin { f(x) − (λ^k)^T (Ax − b) + (β/2) ||Ax − b||^2 | x ∈ X },
    λ^{k+1} = λ^k − β (Ax^{k+1} − b),    (1.4)

Department of Mathematics and National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210093, China. This author was supported by the NSFC Grant 10971095 and the NSF of Jiangsu Province Grant BK2008255. Email: hebma@nju.edu.cn

Department of Mathematics, Hong Kong Baptist University, Hong Kong, China. This author was supported in part by HKRGC 239. Email: xmyuan@hkbu.edu.hk

where β > 0 is the penalty parameter for the violation of the linear constraints. We refer to [10] for the relevance of the ALM to the classical proximal point algorithm, which was originally proposed in [4] and concretely developed in [11]. Note that one of the significant differences of the ALM from penalty methods is that the penalty parameter β can be kept fixed; it is not necessary to force it to infinity, see e.g. [8]. In this paper, we use a symmetric positive definite matrix H_k to denote the penalty parameter, indicating the eligibility of adjusting the values of this parameter dynamically, even though a specific strategy of adjustment will not be addressed. More specifically, let {H_k} be a given sequence of m×m symmetric positive definite matrices satisfying H_k ⪯ H_{k+1} for all k ≥ 0. Then, the k-th iteration of the ALM with matrix penalty parameter for (1.1) can be written as

    x^{k+1} = Argmin { f(x) − (λ^k)^T (Ax − b) + (1/2) ||Ax − b||^2_{H_k} | x ∈ X },
    λ^{k+1} = λ^k − H_k (Ax^{k+1} − b).    (1.5)

Inspired by the attractive analysis of iteration-complexity for some gradient methods initiated mainly by Nesterov (see e.g. [5, 6, 7]), in this paper we are interested in analyzing the iteration-complexity of the ALM and discussing the possibility of accelerating the ALM with Nesterov's acceleration schemes. More specifically, in Section 2 we first show that the iteration-complexity of the ALM is O(1/k) in terms of the objective residual of the associated Lagrange function of (1.1). Then, in Section 3, with the acceleration scheme in [6], we propose an accelerated ALM whose iteration-complexity is O(1/k^2). Finally, some conclusions are given in Section 4.

2 The complexity of ALM

In this section, we mainly show that the iteration-complexity of the classical ALM is O(1/k) in terms of the objective residual of the associated Lagrange function L(x, λ) defined in (1.2).
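To make the scheme concrete, here is a minimal numerical sketch (ours, not the paper's) of the ALM iteration (1.4)/(1.5) with a fixed scalar penalty H_k ≡ βI, on the toy problem min ½||x||^2 s.t. Ax = b with X = R^n, so that the x-subproblem reduces to a single linear solve. All data, names, and parameter values are illustrative assumptions.

```python
import numpy as np

# Classical ALM sketch (scheme (1.4)/(1.5) with H_k = beta*I), specialized to
#   min 0.5*||x||^2  s.t.  Ax = b,  X = R^n.
# The x-subproblem's optimality condition x - A^T lam + beta*A^T(Ax - b) = 0
# gives the closed-form step  (I + beta*A^T A) x^{k+1} = A^T lam^k + beta*A^T b.
rng = np.random.default_rng(0)          # made-up problem data
m, n = 3, 8
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
beta = 1.0                              # fixed penalty parameter

M = np.eye(n) + beta * A.T @ A
lam = np.zeros(m)                       # lambda^0
for k in range(500):
    x = np.linalg.solve(M, A.T @ lam + beta * A.T @ b)  # x-subproblem
    lam = lam - beta * (A @ x - b)                      # multiplier update

print("feasibility residual:", np.linalg.norm(A @ x - b))
```

On this strongly convex toy instance the feasibility residual ||Ax^k − b|| in fact decays geometrically; the analysis of this section only asserts the weaker O(1/k) rate for the objective residual of L(x, λ) in the general convex case.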
Before that, we need to justify the rationale of estimating the convergence rate of the ALM in terms of the objective residual of L(x, λ), and to prove some properties of the sequence generated by the ALM which are critical for the complexity analysis addressed later. According to (1.3), a pair (x, λ) ∈ X × R^m is dual feasible if and only if

    (x′ − x)^T (∇f(x) − A^T λ) ≥ 0, ∀ x′ ∈ X.    (2.1)

Note that the minimization task regarding x^{k+1} in the ALM scheme (1.5) is characterized by the following variational inequality:

    (x′ − x^{k+1})^T { ∇f(x^{k+1}) − A^T λ^k + A^T H_k (Ax^{k+1} − b) } ≥ 0, ∀ x′ ∈ X.

Therefore, substituting the λ^{k+1}-update of (1.5) into the above variational inequality, we have

    (x′ − x^{k+1})^T { ∇f(x^{k+1}) − A^T λ^{k+1} } ≥ 0, ∀ x′ ∈ X.    (2.2)

In other words, the pair (x^{k+1}, λ^{k+1}) generated by the k-th iteration of the ALM is feasible for the dual problem (1.3). On the other hand, a solution (x*, λ*) ∈ X* × Λ* of (1.3) is also dual feasible. We thus have that the sequence {L(x*, λ*) − L(x^{k+1}, λ^{k+1})} is non-negative. This explains the rationale of estimating the convergence rate of the ALM in terms of the objective residual of L(x, λ).

Now, we present some properties of the sequence generated by the ALM in the following lemmas. Although their proofs are elementary, these lemmas are critical for deriving the main iteration-complexity results later.

Lemma 2.1. For given λ^k, let (x^{k+1}, λ^{k+1}) be generated by the ALM (1.5). Then, for any feasible solution (x, λ) of the dual problem (1.3), we have

    L(x^{k+1}, λ^{k+1}) − L(x, λ) ≥ ||λ^k − λ^{k+1}||^2_{H_k^{-1}} + (λ − λ^k)^T H_k^{-1} (λ^k − λ^{k+1}).    (2.3)

Proof. First, using the convexity of f we obtain

    L(x^{k+1}, λ^{k+1}) − L(x, λ) = f(x^{k+1}) − f(x) + λ^T (Ax − b) − (λ^{k+1})^T (Ax^{k+1} − b)
        ≥ (x^{k+1} − x)^T ∇f(x) + λ^T (Ax − b) − (λ^{k+1})^T (Ax^{k+1} − b).    (2.4)

Since (x, λ) is a feasible solution of the dual problem and x^{k+1} ∈ X, setting x′ = x^{k+1} in (2.1) we obtain

    (x^{k+1} − x)^T ∇f(x) ≥ (x^{k+1} − x)^T A^T λ = λ^T A (x^{k+1} − x).

Substituting the last inequality into the right-hand side of (2.4), we obtain

    L(x^{k+1}, λ^{k+1}) − L(x, λ) ≥ λ^T A (x^{k+1} − x) + λ^T (Ax − b) − (λ^{k+1})^T (Ax^{k+1} − b)
        = (λ − λ^{k+1})^T (Ax^{k+1} − b)
        = (λ − λ^{k+1})^T H_k^{-1} (λ^k − λ^{k+1})    (using (1.5))
        = ||λ^k − λ^{k+1}||^2_{H_k^{-1}} + (λ − λ^k)^T H_k^{-1} (λ^k − λ^{k+1}).

The assertion of this lemma is proved.

Lemma 2.2. For given λ^k, let λ^{k+1} be generated by the ALM (1.5). Then we have

    ||λ^{k+1} − λ*||^2_{H_k^{-1}} ≤ ||λ^k − λ*||^2_{H_k^{-1}} − ||λ^k − λ^{k+1}||^2_{H_k^{-1}} − 2 (L(x*, λ*) − L(x^{k+1}, λ^{k+1})), ∀ (x*, λ*) ∈ X* × Λ*.    (2.5)

Proof. Since (x*, λ*) is dual feasible, setting (x, λ) = (x*, λ*) in (2.3) we obtain

    (λ^k − λ*)^T H_k^{-1} (λ^k − λ^{k+1}) ≥ ||λ^k − λ^{k+1}||^2_{H_k^{-1}} + (L(x*, λ*) − L(x^{k+1}, λ^{k+1})).

Using the above inequality and a direct manipulation, we obtain

    ||λ^{k+1} − λ*||^2_{H_k^{-1}} = ||(λ^k − λ*) − (λ^k − λ^{k+1})||^2_{H_k^{-1}}
        = ||λ^k − λ*||^2_{H_k^{-1}} − 2 (λ^k − λ*)^T H_k^{-1} (λ^k − λ^{k+1}) + ||λ^k − λ^{k+1}||^2_{H_k^{-1}}
        ≤ ||λ^k − λ*||^2_{H_k^{-1}} − ||λ^k − λ^{k+1}||^2_{H_k^{-1}} − 2 (L(x*, λ*) − L(x^{k+1}, λ^{k+1})).

The assertion of this lemma is proved.

The following theorem implies the global convergence of the ALM (1.5).

Theorem 2.3. Let (x^{k+1}, λ^{k+1}) be generated by the ALM (1.5). Then for any k ≥ 1, we have

    L(x^{k+1}, λ^{k+1}) ≥ L(x^k, λ^k) + ||λ^k − λ^{k+1}||^2_{H_k^{-1}},    (2.6)

and

    ||λ^{k+1} − λ*||^2_{H_{k+1}^{-1}} ≤ ||λ^k − λ*||^2_{H_k^{-1}} − ||λ^k − λ^{k+1}||^2_{H_k^{-1}}.    (2.7)

Moreover, if H_k ≡ H, we have

    ||Ax^{k+1} − b||^2_H ≤ ||Ax^k − b||^2_H − ||A(x^k − x^{k+1})||^2_H.    (2.8)

Proof. The first assertion (2.6) is derived immediately from (2.3) with (x, λ) = (x^k, λ^k). Since L(x*, λ*) ≥ L(x^{k+1}, λ^{k+1}), it follows from (2.5) that

    ||λ^{k+1} − λ*||^2_{H_k^{-1}} ≤ ||λ^k − λ*||^2_{H_k^{-1}} − ||λ^k − λ^{k+1}||^2_{H_k^{-1}}.

Because H_{k+1}^{-1} ⪯ H_k^{-1}, the second assertion (2.7) follows from the last inequality directly.

Now, we prove the third assertion (2.8). Setting x′ = x^k in (2.2), we obtain

    (x^k − x^{k+1})^T ( ∇f(x^{k+1}) − A^T λ^{k+1} ) ≥ 0.

Similarly, we have

    (x^{k+1} − x^k)^T ( ∇f(x^k) − A^T λ^k ) ≥ 0.

Adding the above two inequalities and using the monotonicity of ∇f, we obtain

    (x^k − x^{k+1})^T A^T (λ^k − λ^{k+1}) ≥ 0.

By using λ^{k+1} = λ^k − H (Ax^{k+1} − b) (and the assumption H_k ≡ H), the last inequality implies that

    (x^k − x^{k+1})^T A^T H (Ax^{k+1} − b) ≥ 0.

Using the above inequality in the identity

    ||Ax^k − b||^2_H = ||Ax^{k+1} − b||^2_H + ||A(x^k − x^{k+1})||^2_H + 2 (Ax^{k+1} − b)^T H A (x^k − x^{k+1}),

we obtain

    ||Ax^k − b||^2_H ≥ ||Ax^{k+1} − b||^2_H + ||A(x^k − x^{k+1})||^2_H,

and thus the third assertion (2.8) is proved.

Remark 2.4. The inequality (2.7) essentially implies the global convergence of the ALM (1.5) with dynamically-adjusted matrix penalty parameter. In fact, it follows from (2.7) that

    Σ_{l=0}^{∞} ||λ^l − λ^{l+1}||^2_{H_l^{-1}} ≤ ||λ^0 − λ*||^2_{H_0^{-1}},

which instantly implies that

    lim_{k→∞} ||λ^k − λ^{k+1}||^2_{H_k^{-1}} = 0.

In the following we show that the sequence of function values {L(x^k, λ^k)} converges to the optimal value L(x*, λ*) at a rate of convergence that is no worse than O(1/k). Hence, the iteration-complexity of the ALM (1.5) is shown to be O(1/k) in terms of the objective residual of the Lagrange function L(x, λ).

Theorem 2.5. Let (x^k, λ^k) be generated by the ALM (1.5). Then, for any k ≥ 1, we have

    L(x*, λ*) − L(x^k, λ^k) ≤ ||λ* − λ^0||^2_{H_0^{-1}} / (2k), ∀ (x*, λ*) ∈ X* × Λ*.    (2.9)

Proof. Due to H_{j+1}^{-1} ⪯ H_j^{-1}, it follows from Lemma 2.2 that, for all j ≥ 0, we have

    2 (L(x^{j+1}, λ^{j+1}) − L(x*, λ*)) ≥ ||λ^{j+1} − λ*||^2_{H_{j+1}^{-1}} − ||λ^j − λ*||^2_{H_j^{-1}} + ||λ^j − λ^{j+1}||^2_{H_j^{-1}}, ∀ (x*, λ*) ∈ X* × Λ*.

Summing the above inequality over j = 0, ..., k−1 and dropping the non-negative term ||λ^k − λ*||^2_{H_k^{-1}}, we obtain

    2 Σ_{j=0}^{k−1} (L(x^{j+1}, λ^{j+1}) − L(x*, λ*)) ≥ Σ_{j=0}^{k−1} ||λ^j − λ^{j+1}||^2_{H_j^{-1}} − ||λ^0 − λ*||^2_{H_0^{-1}}.    (2.10)

By using Lemma 2.1 for j ≥ 1 with (x, λ) = (x^j, λ^j), we get

    L(x^{j+1}, λ^{j+1}) − L(x^j, λ^j) ≥ ||λ^j − λ^{j+1}||^2_{H_j^{-1}}.

Multiplying the last inequality by 2j and summing over j = 1, ..., k−1, it follows that

    2 Σ_{j=1}^{k−1} j (L(x^{j+1}, λ^{j+1}) − L(x^j, λ^j)) ≥ Σ_{j=1}^{k−1} 2j ||λ^j − λ^{j+1}||^2_{H_j^{-1}},

which can be simplified into

    2 (k−1) L(x^k, λ^k) − 2 Σ_{j=1}^{k−1} L(x^j, λ^j) ≥ Σ_{j=1}^{k−1} 2j ||λ^j − λ^{j+1}||^2_{H_j^{-1}}.    (2.11)

Adding (2.10) and (2.11), we get

    2k (L(x^k, λ^k) − L(x*, λ*)) ≥ Σ_{j=0}^{k−1} (2j + 1) ||λ^j − λ^{j+1}||^2_{H_j^{-1}} − ||λ^0 − λ*||^2_{H_0^{-1}} ≥ −||λ^0 − λ*||^2_{H_0^{-1}},

and hence it follows that

    L(x*, λ*) − L(x^k, λ^k) ≤ ||λ^0 − λ*||^2_{H_0^{-1}} / (2k).

The proof is complete.

3 An accelerated ALM

In this section, we show that the classical ALM (1.5) can be accelerated by the influential acceleration techniques initiated by Nesterov in [6]. As a result, an accelerated ALM with the convergence rate O(1/k^2) for solving (1.3) is proposed. For the convenience of presenting the accelerated ALM, from now on we use (x̃^k, λ̃^k), rather than (x^{k+1}, λ^{k+1}), to denote the iterate generated by the ALM scheme (1.5). Namely, with the given λ^k, the new iterate (x̃^k, λ̃^k) generated by the ALM for (1.1) is

    x̃^k = Argmin { f(x) − (λ^k)^T (Ax − b) + (1/2) ||Ax − b||^2_{H_k} | x ∈ X },
    λ̃^k = λ^k − H_k (Ax̃^k − b).    (3.1)

Accordingly, Lemmas 2.1 and 2.2 can be rewritten as the following lemmas.

Lemma 3.1. For given λ^k, let (x̃^k, λ̃^k) be generated by the ALM (3.1). Then, for any feasible solution (x, λ) of the dual problem (1.3), we have

    L(x̃^k, λ̃^k) − L(x, λ) ≥ ||λ^k − λ̃^k||^2_{H_k^{-1}} + (λ − λ^k)^T H_k^{-1} (λ^k − λ̃^k).    (3.2)

Lemma 3.2. For given λ^k, let (x̃^k, λ̃^k) be generated by the ALM (3.1). Then we have

    ||λ̃^k − λ*||^2_{H_k^{-1}} ≤ ||λ^k − λ*||^2_{H_k^{-1}} − ||λ^k − λ̃^k||^2_{H_k^{-1}} − 2 (L(x*, λ*) − L(x̃^k, λ̃^k)), ∀ (x*, λ*) ∈ X* × Λ*.    (3.3)

Then, the accelerated ALM for (1.1) is as follows.

An accelerated augmented Lagrangian method (AALM)

Step 0. Take λ̃^0 ∈ R^m. Set λ^1 = λ̃^0 and t_1 = 1.

Step k. Let (x̃^k, λ̃^k) be generated by the original ALM (3.1). Set

    t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2,    (3.4a)

and

    λ^{k+1} = λ̃^k + ((t_k − 1)/t_{k+1}) (λ̃^k − λ̃^{k−1}) + (t_k/t_{k+1}) (λ̃^k − λ^k).    (3.4b)
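As a quick sanity check (ours, not from the paper), the AALM above can be run on a tiny equality-constrained quadratic program with X = R^n, f(x) = ½||x||^2, and a fixed scalar penalty H_k ≡ βI, so that each ALM step (3.1) is a single linear solve. Problem data, names, and parameter values are made-up illustrations.

```python
import numpy as np

# AALM sketch: ALM step (3.1) plus the updates (3.4a)-(3.4b), specialized to
#   min 0.5*||x||^2  s.t.  Ax = b,  X = R^n,  H_k = beta*I (fixed).
rng = np.random.default_rng(1)          # made-up problem data
m, n = 3, 8
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
beta = 1.0
M = np.eye(n) + beta * A.T @ A

def alm_step(lam):
    """One ALM step (3.1): returns (x_tilde, lam_tilde)."""
    x = np.linalg.solve(M, A.T @ lam + beta * A.T @ b)
    return x, lam - beta * (A @ x - b)

lam_tilde_prev = np.zeros(m)   # tilde-lambda^0 (Step 0)
lam = lam_tilde_prev.copy()    # lambda^1 = tilde-lambda^0
t = 1.0                        # t_1 = 1
for k in range(2000):
    x_tilde, lam_tilde = alm_step(lam)                  # ALM step (3.1)
    t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0   # (3.4a)
    lam = (lam_tilde                                    # extrapolation (3.4b)
           + ((t - 1.0) / t_next) * (lam_tilde - lam_tilde_prev)
           + (t / t_next) * (lam_tilde - lam))
    lam_tilde_prev, t = lam_tilde, t_next

print("feasibility residual:", np.linalg.norm(A @ x_tilde - b))
```

Note that the extrapolation (3.4b) acts on the multipliers only, in the same way FISTA-type schemes extrapolate iterates, and (3.4a) is the familiar Nesterov/FISTA stepsize sequence; β = 1 and the random data are arbitrary choices for illustration.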

We first present some lemmas before the main result.

Lemma 3.3. The sequence {t_k} generated by (3.4a) with t_1 = 1 satisfies

    t_k ≥ (k + 1)/2, ∀ k ≥ 1.    (3.5)

Proof. Elementary by induction.

For the coming analysis, we use the notations

    v_k := L(x*, λ*) − L(x̃^k, λ̃^k)   and   u_k := t_k (2λ̃^k − λ^k − λ̃^{k−1}) + λ̃^{k−1} − λ*.    (3.6)

Lemma 3.4. The sequences {λ^k} and {λ̃^k} generated by the proposed AALM satisfy

    4 t_k^2 v_k − 4 t_{k+1}^2 v_{k+1} ≥ ||u_{k+1}||^2_{H_{k+1}^{-1}} − ||u_k||^2_{H_{k+1}^{-1}}, ∀ k ≥ 1,    (3.7)

where v_k and u_k are defined in (3.6).

Proof. By using Lemma 3.1 at iteration k+1, setting (x, λ) = (x̃^k, λ̃^k) and (x, λ) = (x*, λ*), we get

    L(x̃^{k+1}, λ̃^{k+1}) − L(x̃^k, λ̃^k) ≥ ||λ^{k+1} − λ̃^{k+1}||^2_{H_{k+1}^{-1}} + (λ̃^k − λ^{k+1})^T H_{k+1}^{-1} (λ^{k+1} − λ̃^{k+1})

and

    L(x̃^{k+1}, λ̃^{k+1}) − L(x*, λ*) ≥ ||λ^{k+1} − λ̃^{k+1}||^2_{H_{k+1}^{-1}} + (λ* − λ^{k+1})^T H_{k+1}^{-1} (λ^{k+1} − λ̃^{k+1}),

respectively. Using the definition of v_k, the last two inequalities can be written as

    v_k − v_{k+1} ≥ ||λ^{k+1} − λ̃^{k+1}||^2_{H_{k+1}^{-1}} + (λ̃^k − λ^{k+1})^T H_{k+1}^{-1} (λ^{k+1} − λ̃^{k+1})    (3.8)

and

    −v_{k+1} ≥ ||λ^{k+1} − λ̃^{k+1}||^2_{H_{k+1}^{-1}} + (λ* − λ^{k+1})^T H_{k+1}^{-1} (λ^{k+1} − λ̃^{k+1}).    (3.9)

To get a relation between v_k and v_{k+1}, we multiply (3.8) by (t_{k+1} − 1) and add it to (3.9):

    (t_{k+1} − 1) v_k − t_{k+1} v_{k+1} ≥ t_{k+1} ||λ^{k+1} − λ̃^{k+1}||^2_{H_{k+1}^{-1}} + (λ^{k+1} − λ̃^{k+1})^T H_{k+1}^{-1} ( (t_{k+1} − 1) λ̃^k + λ* − t_{k+1} λ^{k+1} ).

Multiplying the last inequality by t_{k+1} and using t_k^2 = t_{k+1}^2 − t_{k+1}, which follows from t_{k+1} = (1 + sqrt(1 + 4 t_k^2))/2 in (3.4a), we obtain

    t_k^2 v_k − t_{k+1}^2 v_{k+1} ≥ ||t_{k+1} (λ^{k+1} − λ̃^{k+1})||^2_{H_{k+1}^{-1}} + t_{k+1} (λ^{k+1} − λ̃^{k+1})^T H_{k+1}^{-1} ( (t_{k+1} − 1) λ̃^k + λ* − t_{k+1} λ^{k+1} )
        = ( t_{k+1} λ̃^{k+1} − t_{k+1} λ^{k+1} )^T H_{k+1}^{-1} ( t_{k+1} λ̃^{k+1} − (t_{k+1} − 1) λ̃^k − λ* ).    (3.10)

Applying the identity

    (b − a)^T H_{k+1}^{-1} (b − c) = (1/4) ||2b − a − c||^2_{H_{k+1}^{-1}} − (1/4) ||a − c||^2_{H_{k+1}^{-1}}

(since x^T y = (1/4) ||x + y||^2 − (1/4) ||x − y||^2) to the right-hand side of (3.10) with

    a := t_{k+1} λ^{k+1},   b := t_{k+1} λ̃^{k+1},   c := (t_{k+1} − 1) λ̃^k + λ*,

we get

    t_k^2 v_k − t_{k+1}^2 v_{k+1} ≥ (1/4) ||t_{k+1} (2λ̃^{k+1} − λ^{k+1} − λ̃^k) + λ̃^k − λ*||^2_{H_{k+1}^{-1}} − (1/4) ||t_{k+1} (λ^{k+1} − λ̃^k) + λ̃^k − λ*||^2_{H_{k+1}^{-1}}.

Using the notation u_k = t_k (2λ̃^k − λ^k − λ̃^{k−1}) + λ̃^{k−1} − λ* (see (3.6)), the last inequality can be written as

    4 t_k^2 v_k − 4 t_{k+1}^2 v_{k+1} ≥ ||u_{k+1}||^2_{H_{k+1}^{-1}} − ||t_{k+1} (λ^{k+1} − λ̃^k) + λ̃^k − λ*||^2_{H_{k+1}^{-1}}.    (3.11)

In order to write the inequality (3.11) in the form (3.7), we need only to set

    t_{k+1} (λ^{k+1} − λ̃^k) + λ̃^k − λ* = t_k (2λ̃^k − λ^k − λ̃^{k−1}) + λ̃^{k−1} − λ*.

From the last equality we obtain

    λ^{k+1} = λ̃^k + ((t_k − 1)/t_{k+1}) (λ̃^k − λ̃^{k−1}) + (t_k/t_{k+1}) (λ̃^k − λ^k).

This is just the form (3.4b) in the accelerated multi-step version of the ALM, and the lemma is proved.

Corollary 3.5. Let v_k and u_k be defined in (3.6). Then, we have

    4 t_k^2 v_k ≤ 4 t_1^2 v_1 + ||u_1||^2_{H_1^{-1}}, ∀ k ≥ 1.    (3.12)

Proof. Again, because H_{k+1}^{-1} ⪯ H_k^{-1}, from (3.7) we obtain

    4 t_{k+1}^2 v_{k+1} + ||u_{k+1}||^2_{H_{k+1}^{-1}} ≤ 4 t_k^2 v_k + ||u_k||^2_{H_k^{-1}}.

Since {v_k} is a non-negative sequence, this recursion implies (3.12) immediately.

Now, we are ready to show that the iteration-complexity of the proposed AALM is O(1/k^2).

Theorem 3.6. Let {λ̃^k} and {λ^k} be generated by the proposed AALM. Then, for any k ≥ 1, we have

    L(x*, λ*) − L(x̃^k, λ̃^k) ≤ ||λ̃^0 − λ*||^2_{H_1^{-1}} / (k + 1)^2, ∀ (x*, λ*) ∈ X* × Λ*.    (3.13)

Proof. Using the definition of v_k in (3.6), it follows from (3.12) that

    L(x*, λ*) − L(x̃^k, λ̃^k) = v_k ≤ (4 t_1^2 v_1 + ||u_1||^2_{H_1^{-1}}) / (4 t_k^2).

Combining with the fact t_k ≥ (k + 1)/2 (see (3.5)), it yields

    L(x*, λ*) − L(x̃^k, λ̃^k) ≤ (4 t_1^2 v_1 + ||u_1||^2_{H_1^{-1}}) / (k + 1)^2.    (3.14)

Since t_1 = 1 and λ^1 = λ̃^0, using the definition of u_k given in (3.6), we have

    4 t_1^2 v_1 = 4 v_1 = 4 (L(x*, λ*) − L(x̃^1, λ̃^1))   and   ||u_1||^2_{H_1^{-1}} = ||2λ̃^1 − λ̃^0 − λ*||^2_{H_1^{-1}}.    (3.15)

By using (3.3) with k = 1, we have

    4 (L(x*, λ*) − L(x̃^1, λ̃^1)) ≤ 2 ||λ̃^0 − λ*||^2_{H_1^{-1}} − 2 ||λ̃^1 − λ*||^2_{H_1^{-1}} − 2 ||λ̃^1 − λ̃^0||^2_{H_1^{-1}}.    (3.16)

Applying the identity

    2 ||a − c||^2 − 2 ||b − c||^2 − 2 ||b − a||^2 = ||a − c||^2 − ||(b − a) + (b − c)||^2

to the right-hand side of (3.16) with

    a := λ̃^0,   b := λ̃^1,   c := λ*,

we get

    4 (L(x*, λ*) − L(x̃^1, λ̃^1)) ≤ ||λ̃^0 − λ*||^2_{H_1^{-1}} − ||2λ̃^1 − λ̃^0 − λ*||^2_{H_1^{-1}}.    (3.17)

Consequently, it follows from (3.15) and (3.17) that

    4 v_1 + ||u_1||^2_{H_1^{-1}} ≤ ||λ̃^0 − λ*||^2_{H_1^{-1}}.

Substituting this into (3.14), the assertion is proved.

According to Theorem 3.6, for obtaining an ε-optimal solution of (1.3) (denoted by (x̃, λ̃)) in the sense that L(x*, λ*) − L(x̃, λ̃) ≤ ε, the number of iterations required by the proposed accelerated ALM is at most sqrt(C/ε), where C = ||λ̃^0 − λ*||^2_{H_1^{-1}}.

4 Conclusions

In this paper, we first show that the iteration-complexity of the classical augmented Lagrangian method (ALM) is O(1/k) for solving linearly constrained convex programming. Then, we show that the ALM can be accelerated by applying Nesterov's acceleration techniques, and the iteration-complexity of the resulting accelerated ALM is O(1/k^2). In the future, we will investigate (a) the complexity of the inexact ALM where the subproblems are solved approximately subject to certain criteria, as in [3]; and (b) the complexity of some ALM-based methods, e.g. the well-known alternating direction method for solving separable convex programming with linear constraints.

References

[1] D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Academic Press, New York, 1982.

[2] M. R. Hestenes, Multiplier and gradient methods, J. Optim. Theory Appl., 4 (1969), pp. 303-320.

[3] G. H. Lan and R. D. C. Monteiro, Iteration-complexity of first-order augmented Lagrangian methods for convex programming, manuscript, 2009.

[4] B. Martinet, Régularisation d'inéquations variationnelles par approximations successives, Rev. Française d'Inform. Recherche Opér., 4 (1970), pp. 154-159.

[5] A. S. Nemirovsky and D. B. Yudin, Problem Complexity and Method Efficiency in Optimization, Wiley-Interscience Series in Discrete Mathematics, John Wiley & Sons, New York, 1983.

[6] Y. E. Nesterov, A method for solving the convex programming problem with convergence rate O(1/k^2), Dokl. Akad. Nauk SSSR, 269 (1983), pp. 543-547.

[7] Y. E. Nesterov, Gradient methods for minimizing composite objective function, CORE report, 2007; available at http://www.ecore.be/dps/dp-933936.pdf

[8] J. Nocedal and S. J. Wright, Numerical Optimization, Springer Verlag, 1999.

[9] M. J. D. Powell, A method for nonlinear constraints in minimization problems, in Optimization, R. Fletcher, ed., Academic Press, New York, 1969, pp. 283-298.

[10] R. T. Rockafellar, Augmented Lagrangians and applications of the proximal point algorithm in convex programming, Math. Oper. Res., 1 (1976), pp. 97-116.

[11] R. T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM J. Control Optim., 14 (1976), pp. 877-898.