Primal-dual subgradient methods for convex problems


Yu. Nesterov

March 2002; September 2005 (after revision)

Abstract

In this paper we present a new approach for constructing subgradient schemes for different types of nonsmooth problems with convex structure. Our methods are primal-dual since they are always able to generate a feasible approximation to the optimum of an appropriately formulated dual problem. Besides other advantages, this useful feature provides the methods with a reliable stopping criterion. The proposed schemes differ from the classical approaches (divergent-series methods, mirror descent methods) by the presence of two control sequences. The first sequence is responsible for aggregating the support functions in the dual space, and the second one establishes a dynamically updated scale between the primal and dual spaces. This additional flexibility allows us to guarantee the boundedness of the sequence of primal test points even in the case of an unbounded feasible set (however, we always assume the uniform boundedness of the subgradients). We present variants of subgradient schemes for nonsmooth convex minimization, minimax problems, saddle point problems, variational inequalities, and stochastic optimization. In all situations our methods are proved to be optimal from the viewpoint of worst-case black-box lower complexity bounds.

Keywords: convex optimization, subgradient methods, non-smooth optimization, minimax problems, saddle points, variational inequalities, stochastic optimization, black-box methods, lower complexity bounds.

CORE Discussion Paper #2005/67. Center for Operations Research and Econometrics (CORE), Catholic University of Louvain (UCL), 34 voie du Roman Pays, B-1348 Louvain-la-Neuve, Belgium; nesterov@core.ucl.ac.be. This paper presents research results of the Belgian Program on Interuniversity Attraction Poles, initiated by the Belgian Federal Science Policy Office. The scientific responsibility rests with its author.

1 Introduction

1.1 Prehistory

The results presented in this paper are not very new. Most of them were obtained by the author in 2001-2002. However, a further purification of the developed framework led to rather surprising results related to the smoothing technique. Namely, in [3] it was shown that many nonsmooth convex minimization problems with an appropriate explicit structure can be solved up to absolute accuracy $\epsilon$ in $O(\frac{1}{\epsilon})$ iterations of special gradient-type methods. Recall that the exact lower complexity bound for any black-box subgradient scheme was established on the level of $O(\frac{1}{\epsilon^2})$ iterations (see [1], or [2], Section 3.1, for a more recent exposition). Thus, in [3] it was shown that the gradient-type methods of Structural Optimization outperform the black-box ones by an order of magnitude.¹

At that moment the author got an illusion that the importance of the black-box approach in Convex Optimization would be irreversibly vanishing and that, finally, this approach would be completely replaced by other ones based on a clever use of the problem's structure (interior-point methods, smoothing, etc.). This explains why the results included in this paper were not published in time. However, the developments of the last years clearly demonstrated that in some situations the black-box methods are irreplaceable. Indeed, the structure of a convex problem may be too complex for constructing a good self-concordant barrier or for applying a smoothing technique. Note also that optimization schemes are sometimes employed for modelling certain adjustment processes in real-life systems. In this situation, we are not free in selecting the type of optimization scheme or in the choice of its parameters. However, the results on convergence and the rate of convergence of the corresponding methods remain interesting. These considerations encouraged the author to publish the above-mentioned results on primal-dual subgradient methods for nonsmooth convex problems. Note that some elements of the developed technique were used by the author later on in different papers related to the smoothing approach (see [3], [4], [5]). Hence, by proper referencing we have tried to shorten this paper.

1.2 Motivation

Historically, a subgradient method with constant step was the first numerical scheme suggested for approaching a solution to the optimization problem

$\min_x \{ f(x) : x \in \mathbb{R}^n \}$   (1.1)

with a nonsmooth convex objective function $f$ (see [9] for historical remarks). In a convergent variant of this method [7, 8], we need to choose in advance a sequence of steps $\{\lambda_k\}_{k=0}^{\infty}$ satisfying the divergent-series rule:

$\lambda_k > 0, \quad \lambda_k \to 0, \quad \sum_{k=0}^{\infty} \lambda_k = \infty.$   (1.2)

¹ This unexpected observation resulted also in a series of papers of different authors related to improved methods for variational inequalities [10], [15], and [11].

Then, for $k \ge 0$ we can iterate:

$x_{k+1} = x_k - \lambda_k g_k, \quad k \ge 0,$   (1.3)

with some $g_k \in \partial f(x_k)$. In another variant of this scheme, the step direction is normalized:

$x_{k+1} = x_k - \lambda_k g_k / \|g_k\|_2, \quad k \ge 0,$   (1.4)

where $\|\cdot\|_2$ denotes the standard Euclidean norm in $\mathbb{R}^n$, introduced by the inner product

$\langle x, y\rangle = \sum_{j=1}^n x^{(j)} y^{(j)}, \quad x, y \in \mathbb{R}^n.$

Since the objective function $f$ is nonsmooth, we cannot expect its subgradients to vanish in a neighborhood of an optimal solution to (1.1). Hence, the condition $\lambda_k \to 0$ is necessary for convergence of the processes (1.3) or (1.4). On the other hand, assuming that $\|g_k\|_2 \le L$, for any $x \in \mathbb{R}^n$ we get

$\|x - x_{k+1}\|_2^2 \overset{(1.3)}{=} \|x - x_k\|_2^2 + 2\lambda_k\langle g_k, x - x_k\rangle + \lambda_k^2\|g_k\|_2^2 \le \|x - x_k\|_2^2 + 2\lambda_k\langle g_k, x - x_k\rangle + \lambda_k^2 L^2.$

Hence, for any $x \in \mathbb{R}^n$ with $\frac12\|x - x_0\|_2^2 \le D$ we have

$f(x) \ge l_k(x) \overset{\mathrm{def}}{=} \frac{\sum_{i=0}^k \lambda_i[f(x_i) + \langle g_i, x - x_i\rangle]}{\sum_{i=0}^k \lambda_i} \ge \frac{\sum_{i=0}^k \lambda_i f(x_i) - D - \frac12 L^2\sum_{i=0}^k \lambda_i^2}{\sum_{i=0}^k \lambda_i}.$   (1.5)

Thus, denoting

$f_D^* = \min_x \left\{ f(x) : \tfrac12\|x - x_0\|_2^2 \le D \right\}, \quad \bar f_k = \frac{\sum_{i=0}^k \lambda_i f(x_i)}{\sum_{i=0}^k \lambda_i}, \quad \omega_k = \frac{2D + L^2\sum_{i=0}^k \lambda_i^2}{2\sum_{i=0}^k \lambda_i},$

we conclude that $\bar f_k - f_D^* \le \omega_k$. Note that conditions (1.2) are necessary and sufficient for $\omega_k \to 0$.

In the above analysis, convergence of the process (1.3) is based on the fact that the derivative of $l_k(x)$, the lower linear model of the objective function, is vanishing. This model is very important since it also provides us with a reliable stopping criterion. However, examining the structure of the linear function $l_k(\cdot)$, we can observe a very strange feature:

New subgradients enter the model with decreasing weights.   (1.6)

This feature contradicts common sense. It also contradicts the general principles of iterative schemes, in accordance with which new information is more important than old. Thus, something is wrong.
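As a concrete illustration (not part of the original text), here is a minimal Python sketch of scheme (1.3) with divergent-series steps. The objective $f(x) = \|x\|_1$ and the step rule $\lambda_k = 1/\sqrt{k+1}$ are illustrative choices; the function tracks the weighted average $\bar f_k$, which is exactly the quantity bounded by $\omega_k$ above:

```python
import numpy as np

def subgradient_method(f, subgrad, x0, steps):
    """Classical scheme (1.3): x_{k+1} = x_k - lambda_k * g_k, g_k in df(x_k).

    Returns the last iterate and the weighted average of the recorded
    objective values, which inequality (1.5) bounds via the gap omega_k."""
    x = x0.copy()
    lam_sum, f_bar = 0.0, 0.0
    for lam in steps:
        g = subgrad(x)
        lam_sum += lam
        f_bar += lam * (f(x) - f_bar) / lam_sum  # running weighted mean
        x = x - lam * g
    return x, f_bar

# Illustrative run: f(x) = ||x||_1, divergent-series steps lambda_k = 1/sqrt(k+1).
f = lambda x: np.abs(x).sum()
subgrad = lambda x: np.sign(x)          # a valid subgradient of the l1-norm
steps = [1.0 / np.sqrt(k + 1) for k in range(1000)]
x_last, f_bar = subgradient_method(f, subgrad, np.ones(10), steps)
print(f_bar)    # approaches f* = 0 at the rate omega_k
```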

Unfortunately, in our situation a simple treatment is hardly possible: we have seen that decreasing weights (steps) are necessary for convergence of the primal sequence $\{x_k\}_{k=0}^{\infty}$.

The above contradiction served as a point of departure for the developments presented in this paper. The proposed alternative looks quite natural. Indeed, we have seen that in the primal space it is necessary to have a vanishing sequence of steps, but in the dual space (the space of linear functions) we would like to apply non-decreasing weights. Consequently, we need two different sequences of parameters, each of which is responsible for certain processes in the primal and dual spaces.

The idea to relate the primal minimization sequence to a master process existing in the dual space is not new. It was implemented first in the mirror descent methods (see [1], and [5, 6] for newer versions and historical remarks). However, the divergent series somehow penetrated this approach too. Therefore, in the Euclidean situation the mirror descent method coincides with the subgradient one and, consequently, shares the drawback (1.6). In our approach, we managed to improve the situation. Of course, it was impossible to improve the dependence of the complexity estimates on $\epsilon$. However, the improvement in the constant is significant: the best step-size strategy in [5], [6] gives the same (infinite) constant in the estimate of the rate of convergence as the worst strategy for the new schemes (see Section 8 for a discussion).

In this paper we consider primal-dual subgradient schemes. It seems that this intrinsic feature of all subgradient methods has not yet been recognized explicitly. Consider, for example, the scheme (1.3). From inequality (1.5), it is clear that

$f_D^* \ge \hat f_k(D) \overset{\mathrm{def}}{=} \min_x \left\{ l_k(x) : \tfrac12\|x - x_0\|_2^2 \le D \right\}.$

Note that the value $\hat f_k(D)$ can be easily computed. On the other hand, in Convex Optimization there is only one way to get a lower bound for the optimal value of a minimization problem. For that, we need to find a feasible solution to a certain dual problem.² Thus, computability of $\hat f_k(D)$ implies that we are able to point out a dual solution. And convergence of the primal-dual gap $\bar f_k - \hat f_k(D)$ to zero implies that the dual solution approaches the optimal one. Below, we discuss in detail the meaning of the dual solutions generated by the proposed schemes.

Finally, note that the subgradient schemes proposed in this paper are different from the standard search methods.³ For example, for problem (1.1) we suggest to use the following process:

$x_{k+1} = \arg\min_{x\in\mathbb{R}^n}\left\{ \frac{1}{k+1}\sum_{i=0}^k [f(x_i) + \langle g_i, x - x_i\rangle] + \mu_k\|x - x_0\|_2^2 \right\},$   (1.7)

where $\mu_k = O(\frac{1}{\sqrt k}) \to 0$. Note that in this scheme no artificial elements are needed to ensure the boundedness of the sequence of test points.

² Depending on the available information on the structure of the problem, this dual problem can be posed in different forms. Therefore, sometimes we prefer to call such problems the adjoint ones.

³ We mean the methods described in terms of steps in the primal space. The alternative is the mirror descent approach [1], in accordance with which the main process is arranged in the dual space.
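For $Q = \mathbb{R}^n$, the argmin in (1.7) is available in closed form, which gives the following minimal sketch (an illustration, not from the paper; the choice $\mu_k = 1/\sqrt{k+1}$ is one admissible schedule, and the test objective is hypothetical):

```python
import numpy as np

def dual_averaging(subgrad, x0, n_iters, mu):
    """Process (1.7) on Q = R^n with the regularizer mu_k * ||x - x0||_2^2.

    The linear model (1/(k+1)) * sum_i [f(x_i) + <g_i, x - x_i>] has gradient
    s_{k+1}/(k+1), with s_{k+1} = g_0 + ... + g_k, so the argmin in (1.7) is
    x_{k+1} = x0 - s_{k+1} / (2 * mu_k * (k+1)).
    Note that all subgradients enter s_{k+1} with equal (unit) weights."""
    x, s = x0.copy(), np.zeros_like(x0)
    for k in range(n_iters):
        s += subgrad(x)                       # aggregation in the dual space
        x = x0 - s / (2.0 * mu(k) * (k + 1))  # dynamically rescaled primal point
    return x

# Illustrative run on f(x) = ||x||_1 with mu_k = 1/sqrt(k+1) -> 0.
x = dual_averaging(lambda x: np.sign(x), np.ones(10), 1000,
                   mu=lambda k: 1.0 / np.sqrt(k + 1))
print(np.abs(x).sum())
```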

1.3 Contents

The paper is organized as follows. In Section 2 we consider a general scheme of Dual Averaging (DA) and prove upper bounds for the corresponding gap functions. These bounds will be specified later on for the particular problem classes. We also give the two main variants of DA-methods, the method of Simple Dual Averages (SDA) and the method of Weighted Dual Averages (WDA).

In Section 3 we apply DA-methods to minimization over simple sets. In Section 3.1 we consider different forms of DA-methods as applied to a general minimization problem. The main goal of Section 3 is to show that all DA-methods are primal-dual. For that, we always consider a kind of dual problem and point out the sequence which converges to its solution. In Section 3.2 this is done for the general unconstrained minimization problem. In Section 3.3 we do that for minimax problems. It is interesting that the approximations of the dual multipliers in this case are proportional to the number of times the corresponding functional components were active during the minimization process (compare with [3]). Finally, in Section 3.4 we consider primal-dual schemes for solving minimization problems with simple functional constraints. We manage to obtain for such problems an approximate primal-dual solution despite the fact that usually even the complexity of computing the value of the dual objective function is comparable with the complexity of the initial problem.

In Section 4 we apply the SDA-method to saddle point problems, and in Section 5 it is applied to variational inequalities. It is important that for solvable problems the SDA-method generates a bounded sequence even for an unbounded feasible set. In Section 6 we consider a stochastic version of the SDA-method. The output of this method can be seen as a random point (repeated runs of the algorithm give different results). However, we prove that the expected value of the objective function at the output points converges to the optimal value of the stochastic optimization problem with a certain rate. Finally, in Section 7 we consider two applications of DA-methods to modelling. In Section 7.1 it is shown that a natural adjustment strategy (which we call balanced development) can be seen as an implementation of a DA-method. Hence, it is possible to prove that in the limit this process converges to a solution of some optimization problem. In Section 7.2, for a multi-period optimization problem, we compare the efficiency of static strategies based on preliminary planning with a certain dynamic adjustment process (based on the SDA-method). It is clear that the dynamic adjustment is computationally much cheaper. On the other hand, we show that its results are comparable with the results of preliminary planning based on complete knowledge of the future. We conclude the paper with a short discussion of the results in Section 8. In the Appendix we give the proof of the only result on strongly convex functions for which we did not find an appropriate reference in the comprehensive monographs [8] and [7].

1.4 Notation and generalities

Let $E$ be a finite-dimensional real vector space and $E^*$ be its dual. We denote the value of a linear function $s \in E^*$ at $x \in E$ by $\langle s, x\rangle$. For measuring distances in $E$, let us fix some (primal) norm $\|\cdot\|$. This norm defines a system of primal balls:

$B_r(x) = \{ y \in E : \|y - x\| \le r \}.$

The dual norm on $E^*$ is introduced, as usual, by

$\|s\|_* = \max_x \{ \langle s, x\rangle : x \in B_1(0) \}, \quad s \in E^*.$

Let $Q$ be a closed convex set in $E$. Assume that we know a prox-function $d(x)$ of the set $Q$. This means that $d(x)$ is a continuous function with domain belonging to $Q$, which is strongly convex on $Q$ with respect to $\|\cdot\|$:

$d(\alpha x + (1-\alpha)y) \le \alpha d(x) + (1-\alpha)d(y) - \tfrac12\sigma\alpha(1-\alpha)\|x - y\|^2, \quad x, y \in Q,\ \alpha \in [0,1],$   (1.8)

where $\sigma > 0$ is the convexity parameter. Denote by $x_0$ the prox-center of the set $Q$:

$x_0 = \arg\min_x \{ d(x) : x \in Q \}.$   (1.9)

Without loss of generality, we assume that $d(x_0) = 0$. In view of Lemma 6 in the Appendix, the prox-center is well defined and

$d(x) \ge \tfrac12\sigma\|x - x_0\|^2, \quad x \in Q.$

An important example is the standard Euclidean norm:

$\|x\| = \|x\|_2 = \langle x, x\rangle^{1/2}, \quad d(x) = \tfrac12\|x - x_0\|_2^2, \quad \sigma = 1,$

with some $x_0 \in Q$. For another example, consider the $\ell_1$-norm:

$\|x\| = \|x\|_1 = \sum_{i=1}^n |x^{(i)}|.$   (1.10)

Define $Q = \Delta_n \overset{\mathrm{def}}{=} \{ x \in \mathbb{R}^n_+ : \sum_{i=1}^n x^{(i)} = 1 \}$. Then the entropy function

$d(x) = \ln n + \sum_{i=1}^n x^{(i)}\ln x^{(i)}$

is strongly convex on $Q$ with $\sigma = 1$ and $x_0 = (\frac1n, \ldots, \frac1n)^T$ (see Lemma 3 in [3]).

2 Main algorithmic schemes

Let $Q$ be a closed convex set in $E$ endowed with a prox-function $d(x)$. We allow $Q$ to be unbounded (for example, $Q \equiv E$). For our analysis we need to define two support-type functions of the set $Q$:

$\xi_D(s) = \max_{x\in Q}\{ \langle s, x - x_0\rangle : d(x) \le D \}, \qquad V_\beta(s) = \max_{x\in Q}\{ \langle s, x - x_0\rangle - \beta d(x) \},$   (2.1)

where $D \ge 0$ and $\beta > 0$ are some parameters. The first function is the usual support function of the set

$F_D = \{ x \in Q : d(x) \le D \}.$

The second one is a proximal-type approximation of the support function of the set $Q$. Since $d(\cdot)$ is strongly convex, for any positive $D$ and $\beta$ we have $\mathrm{dom}\,\xi_D = \mathrm{dom}\,V_\beta = E^*$. Note that both functions are nonnegative.

Let us mention some properties of the function $V_\beta(\cdot)$. If $\beta_2 \ge \beta_1 > 0$, then for any $s \in E^*$ we have

$V_{\beta_2}(s) \le V_{\beta_1}(s).$   (2.2)

Note that the level of smoothness of the function $V_\beta(\cdot)$ is controlled by the parameter $\beta$.

Lemma 1. The function $V_\beta(\cdot)$ is convex and differentiable on $E^*$. Moreover, its gradient is Lipschitz continuous with constant $\frac{1}{\beta\sigma}$:

$\|\nabla V_\beta(s_1) - \nabla V_\beta(s_2)\| \le \frac{1}{\beta\sigma}\|s_1 - s_2\|_*, \quad s_1, s_2 \in E^*.$   (2.3)

For any $s \in E^*$, the vector $x_0 + \nabla V_\beta(s)$ belongs to $Q$:

$\nabla V_\beta(s) = \pi_\beta(s) - x_0, \quad \pi_\beta(s) \overset{\mathrm{def}}{=} \arg\max_{x\in Q}\{ \langle s, x - x_0\rangle - \beta d(x) \}.$   (2.4)

(This is a particular case of Theorem 1 in [3].)

As a trivial corollary of (2.3) we get the following inequality:

$V_\beta(s + \delta) \le V_\beta(s) + \langle\delta, \nabla V_\beta(s)\rangle + \frac{1}{2\sigma\beta}\|\delta\|_*^2, \quad s, \delta \in E^*.$   (2.5)

Note that in view of definition (1.9) we have $\pi_\beta(0) = x_0$. This implies $V_\beta(0) = 0$ and $\nabla V_\beta(0) = 0$. Thus, in this case inequality (2.5) with $s = 0$ yields

$V_\beta(\delta) \le \frac{1}{2\sigma\beta}\|\delta\|_*^2, \quad \delta \in E^*.$   (2.6)

In the sequel, we assume that the set $Q$ is simple enough for computing the vector $\pi_\beta(s)$ exactly.

For our analysis we need the following relation between the functions (2.1).

Lemma 2. For any $s \in E^*$ and $\beta \ge 0$ we have

$\xi_D(s) \le \beta D + V_\beta(s).$   (2.7)

Proof: Indeed,

$\xi_D(s) = \max_{x\in Q}\{ \langle s, x - x_0\rangle : d(x) \le D \} = \max_{x\in Q}\min_{\beta\ge0}\{ \langle s, x - x_0\rangle + \beta[D - d(x)] \}$
$\le \min_{\beta\ge0}\max_{x\in Q}\{ \langle s, x - x_0\rangle + \beta[D - d(x)] \} \le \beta D + V_\beta(s). \quad\Box$
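For the two example prox-setups of Section 1.4, the mapping $\pi_\beta$ of (2.4) is available in closed form. A small Python sketch (illustrative helper functions, not from the paper):

```python
import numpy as np

def pi_beta_euclidean(s, x0, beta):
    """pi_beta(s) of (2.4) for Q = R^n, d(x) = ||x - x0||_2^2 / 2 (sigma = 1):
    maximizing <s, x - x0> - beta * d(x) gives x = x0 + s / beta."""
    return x0 + s / beta

def pi_beta_entropy(s, beta):
    """pi_beta(s) of (2.4) for Q = Delta_n with the entropy prox-function
    d(x) = ln n + sum_i x_i ln x_i; the maximizer is the softmax of s / beta."""
    z = s / beta
    z = z - z.max()              # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()
```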

Consider now the sequences

$X_k = \{x_i\}_{i=0}^k \subset Q, \quad G_k = \{g_i\}_{i=0}^k \subset E^*, \quad \Lambda_k = \{\lambda_i\}_{i=0}^k \subset \mathbb{R}_+.$

Typically, the test points $x_i$ and the weights $\lambda_i$ are generated by some algorithmic scheme, and the points $g_i$ are computed by a black-box oracle $G(\cdot)$:

$g_i = G(x_i), \quad i \ge 0,$

which is related to a specific convex problem. In this paper we consider only problem instances for which there exists a point $x^* \in Q$ satisfying the condition

$\langle g, x - x^*\rangle \ge 0 \quad \forall g \in G(x),\ \forall x \in Q.$   (2.8)

We call such an $x^*$ the primal solution of our problem. At the same time, we are going to approximate a certain dual solution by means of the observed subgradients. In the sequel, we will explain the actual sense of this dual object for each particular problem class.

We are going to approximate the primal and dual solutions of our problem using the following aggregates:

$S_{k+1} = \sum_{i=0}^k \lambda_i, \quad \hat x_{k+1} = \frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i x_i, \quad s_{k+1} = \sum_{i=0}^k \lambda_i g_i, \quad \hat s_{k+1} = \frac{1}{S_{k+1}}s_{k+1},$   (2.9)

with $\hat x_0 = x_0$ and $s_0 = 0$. As we will see later, the quality of the test sequence $X_k$ can be naturally described by the following gap function:

$\delta_k(D) = \max_x\left\{ \sum_{i=0}^k \lambda_i\langle g_i, x_i - x\rangle : x \in F_D \right\}, \quad D \ge 0.$   (2.10)

Using notation (2.9), we get an explicit representation of the gap:

$\delta_k(D) = \sum_{i=0}^k \lambda_i\langle g_i, x_i - x_0\rangle + \xi_D(-s_{k+1}).$   (2.11)

Sometimes we will use an upper gap function

$\Delta_k(\beta, D) = \beta D + \sum_{i=0}^k \lambda_i\langle g_i, x_i - x_0\rangle + V_\beta(-s_{k+1}) = \sum_{i=0}^k \lambda_i\langle g_i, x_i - \pi_\beta(-s_{k+1})\rangle + \beta\big(D - d(\pi_\beta(-s_{k+1}))\big).$   (2.12)

In view of (2.7) and (2.11), for any non-negative $D$ and $\beta$ we have

$\delta_k(D) \le \Delta_k(\beta, D).$   (2.13)

Since $Q$ is a simple set, the values of the gap functions can be easily computed. Note that for some $D$ these values can be negative. However, if a solution $x^*$ of our problem exists (in the sense of (2.8)), then for $D \ge d(x^*)$ (so that $x^* \in F_D$),

the value $\delta_k(D)$ is non-negative independently of the sequences $X_k$, $\Lambda_k$ and $G_k$ involved in its definition.

Consider now the generic scheme of Dual Averaging (DA-scheme).

Initialization: Set $s_0 = 0 \in E^*$. Choose $\beta_0 > 0$ (so that $x_0 = \pi_{\beta_0}(-s_0)$ is the prox-center (1.9)).

Iteration ($k \ge 0$):
1. Compute $g_k = G(x_k)$.
2. Choose $\lambda_k > 0$. Set $s_{k+1} = s_k + \lambda_k g_k$.
3. Choose $\beta_{k+1} \ge \beta_k$. Set $x_{k+1} = \pi_{\beta_{k+1}}(-s_{k+1})$.   (2.14)

Theorem 1. Let the sequences $X_k$, $G_k$ and $\Lambda_k$ be generated by (2.14). Then:

1. For any $k \ge 0$ and $D \ge 0$ we have

$\delta_k(D) \le \Delta_k(\beta_{k+1}, D) \le \beta_{k+1}D + \frac{1}{2\sigma}\sum_{i=0}^k \frac{\lambda_i^2}{\beta_i}\|g_i\|_*^2.$   (2.15)

2. Assume that a solution $x^*$ in the sense of (2.8) exists. Then

$\frac{\sigma}{2}\|x_{k+1} - x^*\|^2 \le d(x^*) + \frac{1}{2\sigma\beta_{k+1}}\sum_{i=0}^k \frac{\lambda_i^2}{\beta_i}\|g_i\|_*^2.$   (2.16)

3. Assume that $x^*$ is an interior solution: $B_r(x^*) \subseteq F_D$ for some positive $r$ and $D$. Then

$\|\hat s_{k+1}\|_* \le \frac{1}{r\,S_{k+1}}\left[ \beta_{k+1}D + \frac{1}{2\sigma}\sum_{i=0}^k \frac{\lambda_i^2}{\beta_i}\|g_i\|_*^2 \right].$   (2.17)

Proof:
1. In view of (2.13), we need to prove only the second inequality in (2.15). By the rules of scheme (2.14), for $i \ge 1$ we obtain

$V_{\beta_{i+1}}(-s_{i+1}) \overset{(2.2)}{\le} V_{\beta_i}(-s_{i+1}) \overset{(2.5)}{\le} V_{\beta_i}(-s_i) - \lambda_i\langle g_i, \nabla V_{\beta_i}(-s_i)\rangle + \frac{\lambda_i^2}{2\sigma\beta_i}\|g_i\|_*^2 \overset{(2.4)}{=} V_{\beta_i}(-s_i) + \lambda_i\langle g_i, x_0 - x_i\rangle + \frac{\lambda_i^2}{2\sigma\beta_i}\|g_i\|_*^2.$

Thus,

$\lambda_i\langle g_i, x_i - x_0\rangle \le V_{\beta_i}(-s_i) - V_{\beta_{i+1}}(-s_{i+1}) + \frac{\lambda_i^2}{2\sigma\beta_i}\|g_i\|_*^2, \quad i = 1, \ldots, k.$

The summation of all these inequalities results in

$\sum_{i=1}^k \lambda_i\langle g_i, x_i - x_0\rangle \le V_{\beta_1}(-s_1) - V_{\beta_{k+1}}(-s_{k+1}) + \frac{1}{2\sigma}\sum_{i=1}^k \frac{\lambda_i^2}{\beta_i}\|g_i\|_*^2.$   (2.18)

But in view of (2.6),

$V_{\beta_1}(-s_1) \le \frac{\lambda_0^2}{2\sigma\beta_1}\|g_0\|_*^2 \le \frac{\lambda_0^2}{2\sigma\beta_0}\|g_0\|_*^2.$

Thus, since $\langle g_0, x_0 - x_0\rangle = 0$, (2.18) results in (2.15).

2. Let us assume that $x^*$ exists. Then

$0 \overset{(2.8)}{\le} \sum_{i=0}^k \lambda_i\langle g_i, x_i - x^*\rangle = \sum_{i=0}^k \lambda_i\langle g_i, x_i - x_0\rangle + \langle s_{k+1}, x_0 - x^*\rangle$
$\le \langle s_{k+1}, x_0 - x^*\rangle - V_{\beta_{k+1}}(-s_{k+1}) + \frac{1}{2\sigma}\sum_{i=0}^k \frac{\lambda_i^2}{\beta_i}\|g_i\|_*^2$
$\overset{(2.1)}{=} \langle s_{k+1}, x_{k+1} - x^*\rangle + \beta_{k+1}d(x_{k+1}) + \frac{1}{2\sigma}\sum_{i=0}^k \frac{\lambda_i^2}{\beta_i}\|g_i\|_*^2$
$\overset{(9.5)}{\le} \beta_{k+1}d(x^*) - \frac{\beta_{k+1}\sigma}{2}\|x_{k+1} - x^*\|^2 + \frac{1}{2\sigma}\sum_{i=0}^k \frac{\lambda_i^2}{\beta_i}\|g_i\|_*^2,$

and dividing by $\beta_{k+1}$, that is (2.16).

3. Let us assume now that $x^*$ is an interior solution. Then

$\delta_k(D) = \max_x\left\{ \sum_{i=0}^k \lambda_i\langle g_i, x_i - x\rangle : x \in F_D \right\} \ge \max_x\left\{ \sum_{i=0}^k \lambda_i\langle g_i, x_i - x\rangle : x \in B_r(x^*) \right\}$
$= \sum_{i=0}^k \lambda_i\langle g_i, x_i - x^*\rangle + r\|s_{k+1}\|_* \ge r\|s_{k+1}\|_*.$

Thus, (2.17) follows from (2.15). $\Box$

The form of inequalities (2.15) and (2.16) suggests some natural strategies for choosing the parameters $\beta_i$ in scheme (2.14). Let us define the following sequence:

$\hat\beta_0 = \hat\beta_1 = 1, \quad \hat\beta_{i+1} = \hat\beta_i + \frac{1}{\hat\beta_i}, \quad i \ge 1.$   (2.19)

The advantage of this sequence is justified by the following relation:

$\hat\beta_{k+1} = \sum_{i=0}^k \frac{1}{\hat\beta_i}, \quad k \ge 0.$

Thus, this sequence can be used for balancing the terms appearing in the right-hand side of inequality (2.15). The growth of the sequence can be estimated as follows.

Lemma 3. For $k \ge 1$,

$\sqrt{2k-1} \le \hat\beta_k \le \sqrt{3k}.$   (2.20)

Proof: From (2.19) we have $\hat\beta_1 = 1$ and $\hat\beta_{k+1}^2 = \hat\beta_k^2 + 2 + \frac{1}{\hat\beta_k^2} \ge \hat\beta_k^2 + 2$ for $k \ge 1$. This gives the first inequality in (2.20). On the other hand, since the function $\beta + \frac1\beta$ is increasing for $\beta \ge 1$ and $\hat\beta_1 \le \sqrt3$, the second inequality in (2.20) can be justified by induction. $\Box$
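The sequence (2.19) is trivial to generate; the following sketch (illustrative only) checks the bounds of Lemma 3 and the balancing identity numerically:

```python
import math

def beta_hat(k_max):
    """The sequence (2.19): beta_0 = beta_1 = 1, beta_{i+1} = beta_i + 1/beta_i."""
    b = [1.0, 1.0]
    for _ in range(1, k_max):
        b.append(b[-1] + 1.0 / b[-1])
    return b

b = beta_hat(10**5)
for k in (1, 10, 100, 10**4):                 # bounds of Lemma 3
    assert math.sqrt(2*k - 1) <= b[k] <= math.sqrt(3*k)
# balancing identity: beta_hat_{k+1} = sum_{i=0}^{k} 1/beta_hat_i  (here k = 10)
assert abs(b[11] - sum(1.0 / b[i] for i in range(11))) < 1e-12
```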

We will consider two main strategies for choosing $\lambda_i$ in (2.14):

- Simple averages: $\lambda_k = 1$.
- Weighted averages: $\lambda_k = \frac{1}{\|g_k\|_*}$.

Let us write down the corresponding algorithmic schemes in an explicit form. Both theorems below are straightforward consequences of Theorem 1.

Method of Simple Dual Averages (SDA)

Initialization: Set $s_0 = 0 \in E^*$. Choose $\gamma > 0$.

Iteration ($k \ge 0$):
1. Compute $g_k = G(x_k)$. Set $s_{k+1} = s_k + g_k$.
2. Choose $\beta_{k+1} = \gamma\hat\beta_{k+1}$. Set $x_{k+1} = \pi_{\beta_{k+1}}(-s_{k+1})$.   (2.21)

Theorem 2. Assume that $\|g_k\|_* \le L$ for all $k \ge 0$. For method (2.21) we have $S_{k+1} = k+1$ and

$\delta_k(D) \le \hat\beta_{k+1}\left( \gamma D + \frac{L^2}{2\sigma\gamma} \right).$

Moreover, if a solution $x^*$ in the sense of (2.8) exists, then scheme (2.21) generates a bounded sequence:

$\|x_k - x^*\|^2 \le \frac{2}{\sigma}d(x^*) + \frac{L^2}{\sigma^2\gamma^2}, \quad k \ge 0.$

Method of Weighted Dual Averages (WDA)

Initialization: Set $s_0 = 0 \in E^*$. Choose $\rho > 0$.

Iteration ($k \ge 0$):
1. Compute $g_k = G(x_k)$. Set $s_{k+1} = s_k + g_k/\|g_k\|_*$.
2. Choose $\beta_{k+1} = \frac{\hat\beta_{k+1}}{\rho\sqrt\sigma}$. Set $x_{k+1} = \pi_{\beta_{k+1}}(-s_{k+1})$.   (2.22)

Theorem 3. Assume that $\|g_k\|_* \le L$ for all $k \ge 0$. For method (2.22) we have $S_{k+1} \ge \frac{k+1}{L}$ and

$\delta_k(D) \le \frac{\hat\beta_{k+1}}{\sqrt\sigma}\left( \frac{D}{\rho} + \frac{\rho}{2} \right).$

Moreover, if a solution $x^*$ in the sense of (2.8) exists, then scheme (2.22) generates a bounded sequence:

$\|x_k - x^*\|^2 \le \frac{1}{\sigma}\left( 2d(x^*) + \rho^2 \right).$

In the next sections we show how to apply the above results to different classes of problems with convex structure.
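Before turning to applications, here is a compact, self-contained Python sketch of the SDA-method (2.21). The oracle, the prox-mapping interface, and the final example (minimizing $f(x) = \|x - c\|_1$ with the Euclidean prox-setup of Section 1.4) are illustrative assumptions, not part of the paper:

```python
import numpy as np

def beta_hat_seq(k_max):
    """The sequence (2.19), reproduced here for self-containedness."""
    b = [1.0, 1.0]
    for _ in range(1, k_max):
        b.append(b[-1] + 1.0 / b[-1])
    return b

def sda(oracle, pi_beta, gamma, x0, n_iters):
    """Method of Simple Dual Averages (2.21).

    oracle(x) returns g_k = G(x_k); pi_beta(s, beta) computes the
    prox-mapping pi_beta(s) of (2.4). Returns the average x_hat of (2.9)."""
    bh = beta_hat_seq(n_iters)
    x, s = x0.copy(), np.zeros_like(x0)
    x_sum = np.zeros_like(x0)
    for k in range(n_iters):
        s += oracle(x)                       # s_{k+1} = s_k + g_k
        x_sum += x
        x = pi_beta(-s, gamma * bh[k + 1])   # x_{k+1} = pi_{beta_{k+1}}(-s_{k+1})
    return x_sum / n_iters

# Example: minimize f(x) = ||x - c||_1 over R^5 with the Euclidean prox-setup.
c = np.linspace(-1.0, 1.0, 5)
x0 = np.zeros(5)
x_hat = sda(lambda x: np.sign(x - c), lambda s, b: x0 + s / b, 1.0, x0, 10**4)
print(np.abs(x_hat - c).sum())
```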

3 Minimization over simple sets

3.1 General minimization problem

Consider the following minimization problem:

$\min_x \{ f(x) : x \in Q \},$   (3.1)

where $f$ is a convex function defined on $E$ and $Q$ is a closed convex set. Recall that we assume $Q$ to be simple, which means computability of exact optimal solutions to both minimization problems in (2.1). In order to solve problem (3.1), we need a black-box oracle which is able to compute a subgradient of the objective function at any test point:⁴

$G(x) \in \partial f(x), \quad x \in E.$

Then we have the following interpretation of the gap function $\delta_k(D)$. Denote

$l_k(x) = \frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i[f(x_i) + \langle g_i, x - x_i\rangle] \quad (g_i \in \partial f(x_i)),$

$\hat f_k(D) = \min_x\{ l_k(x) : x \in F_D \}, \quad f_D^* = \min_x\{ f(x) : x \in F_D \}.$

Since $f(\cdot)$ is convex, in view of definition (2.10) of the gap function, we have

$\frac{1}{S_{k+1}}\delta_k(D) = \frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i f(x_i) - \hat f_k(D) \ge f(\hat x_{k+1}) - f_D^*.$   (3.2)

Thus, we can justify the rate of convergence of methods (2.21) and (2.22) as applied to problem (3.1). In the estimates below we assume that $\|g\|_* \le L$ for all $g \in \partial f(x)$, $x \in Q$.

1. Simple averages. In view of Theorem 2 and inequalities (2.20), (3.2), we have

$f(\hat x_{k+1}) - f_D^* \le \frac{\hat\beta_{k+1}}{k+1}\left( \gamma D + \frac{L^2}{2\sigma\gamma} \right).$   (3.3)

Note that the parameters $D$ and $L$ are not used explicitly in this scheme. However, their estimates are needed for choosing a reasonable value of $\gamma$. The optimal choice of $\gamma$ is

$\gamma = \frac{L}{\sqrt{2\sigma D}}.$

In the method of Simple Dual Averages (SDA), the accumulated lower linear model is remarkably simple:

$l_k(x) = \frac{1}{k+1}\sum_{i=0}^k [f(x_i) + \langle g_i, x - x_i\rangle].$

⁴ Usually, such an oracle can also compute the value of the objective function. However, we do not need it in the algorithm.

Thus, its algorithmic scheme looks as follows:

$x_{k+1} = \arg\min_{x\in Q}\left\{ \frac{1}{k+1}\sum_{i=0}^k [f(x_i) + \langle g_i, x - x_i\rangle] + \mu_k d(x) \right\}, \quad k \ge 0,$   (3.4)

where $\{\mu_k\}_{k=0}^{\infty}$ is a sequence of positive scaling parameters. In accordance with the rules of (2.21), we choose $\mu_k = \frac{\beta_{k+1}}{k+1}$. However, the rate of convergence of method (3.4) remains on the same level for any $\mu_k = O(\frac{1}{\sqrt k})$. Note that in (3.4) we do not actually need the function values.

Note that for method (3.4) we do not need $Q$ to be bounded. For example, we can have $Q \equiv E$. Nevertheless, if a solution $x^*$ of problem (3.1) does exist, then by Theorem 2 the generated sequence $\{x_k\}_{k=0}^{\infty}$ is bounded. This feature may look surprising since $\mu_k \to 0$ and no special caution is taken in (3.4) to ensure the boundedness of the sequence.

2. Weighted averages. In view of Theorem 3 and inequalities (2.20), (3.2), we have

$f(\hat x_{k+1}) - f_D^* \le \frac{L\hat\beta_{k+1}}{(k+1)\sqrt\sigma}\left( \frac{D}{\rho} + \frac{\rho}{2} \right).$   (3.5)

As in (2.21), the parameters $D$ and $L$ are not used explicitly in method (2.22). But now, in order to choose a reasonable value of $\rho$, we need a reasonable estimate only of $D$. The optimal choice of $\rho$ is $\rho = \sqrt{2D}$.

3.2 Primal-dual problem

For the sake of notation, in this section we assume that the prox-center of the set $Q$ is at the origin:

$x_0 = \arg\min_{x\in Q} d(x) = 0 \in E.$   (3.6)

Let us assume that an optimal solution $x^*$ of problem (3.1) exists. Then, choosing $D \ge d(x^*)$, we can rewrite this problem in the equivalent form:

$f^* = \min_x\{ f(x) : x \in F_D \}.$   (3.7)

Consider the conjugate function

$f_*(s) = \sup_x[\langle s, x\rangle - f(x)].$   (3.8)

Note that for any $x \in E$ we have

$\partial f(x) \subseteq \mathrm{dom}\, f_*.$   (3.9)

Since $\mathrm{dom}\, f = E$, the converse representation of (3.8) is always valid:

$f(x) = \max_s[\langle s, x\rangle - f_*(s) : s \in \mathrm{dom}\, f_*], \quad x \in E.$

Hence, we can rewrite problem (3.7) in a dual form:

$f^* = \min_{x\in F_D}\max_{s\in\mathrm{dom} f_*}[\langle s, x\rangle - f_*(s)] = \max_{s\in\mathrm{dom} f_*}\min_{x\in F_D}[\langle s, x\rangle - f_*(s)] = \max_{s\in\mathrm{dom} f_*}[-\xi_D(-s) - f_*(s)].$

Thus, we come to the following dual problem:

$-f^* = \min_s[\, f_*(s) + \xi_D(-s) : s \in \mathrm{dom}\, f_* \,].$   (3.10)

As usual, it is worthwhile to unify (3.1) and (3.10) in a single primal-dual problem:

$0 = \min_{x,s}\left[\, \psi_D(x, s) \overset{\mathrm{def}}{=} f(x) + f_*(s) + \xi_D(-s) : x \in Q,\ s \in \mathrm{dom}\, f_* \,\right].$   (3.11)

Let us show that DA-methods converge to a solution of this problem.

Theorem 4. Let the pair $(\hat x_{k+1}, \hat s_{k+1})$ be defined by (2.9) with the sequences $X_k$, $G_k$ and $\Lambda_k$ generated by method (2.14) for problem (3.7). Then $\hat x_{k+1} \in Q$, $\hat s_{k+1} \in \mathrm{dom}\, f_*$, and

$\psi_D(\hat x_{k+1}, \hat s_{k+1}) \le \frac{1}{S_{k+1}}\delta_k(D).$   (3.12)

Proof: Indeed, the point $\hat x_{k+1}$ is feasible since it is a convex combination of feasible points. In view of inclusion (3.9), the same arguments also work for $\hat s_{k+1}$. Finally, using the convexity of $f$ and $f_*$ and the positive homogeneity of $\xi_D$,

$\psi_D(\hat x_{k+1}, \hat s_{k+1}) = \xi_D(-\hat s_{k+1}) + f(\hat x_{k+1}) + f_*(\hat s_{k+1})$
$\overset{(2.9)}{\le} \frac{1}{S_{k+1}}\left[ \xi_D(-s_{k+1}) + \sum_{i=0}^k \lambda_i[f(x_i) + f_*(g_i)] \right]$
$\overset{(3.8)}{=} \frac{1}{S_{k+1}}\left[ \xi_D(-s_{k+1}) + \sum_{i=0}^k \lambda_i\langle g_i, x_i\rangle \right] \overset{(2.11),(3.6)}{=} \frac{1}{S_{k+1}}\delta_k(D). \quad\Box$

Thus, for the corresponding methods, the right-hand sides of inequalities (3.3), (3.5) establish the rate of convergence of the primal-dual function in (3.12) to zero.

In the case of an interior solution $x^*$, the optimal solution of the dual problem (3.10) is attained at $0 \in E^*$. In this case the rate of convergence of $\hat s_{k+1}$ to the origin is given by (2.17).

In this section, we have shown the abilities of DA-methods (2.14) on the general primal-dual problem (3.11). However, as we will see in the next sections, these schemes can provide us with much more detailed dual information. For that we need to employ the available information on the structure of the minimization problem.

3.3 Minimax problems

Consider the following variant of problem (3.1):

$\min_x\{ f(x) = \max_{1\le j\le p} f_j(x) : x \in Q \},$   (3.13)

where the $f_j(\cdot)$, $j = 1, \ldots, p$, are convex functions defined on $E$. Of course, this problem admits a dual representation. Let us fix $D \ge d(x^*)$. Recall that we denote by $\Delta_p$ the standard simplex in $\mathbb{R}^p$:

$\Delta_p = \left\{ y \in \mathbb{R}^p_+ : \sum_{j=1}^p y^{(j)} = 1 \right\}.$

Then

$f^* = \min_{x\in Q}\max_{1\le j\le p} f_j(x) = \min_{x\in F_D}\max_{y\in\Delta_p}\sum_{j=1}^p y^{(j)}f_j(x) = \max_{y\in\Delta_p}\min_{x\in F_D}\sum_{j=1}^p y^{(j)}f_j(x).$

Thus, defining $\phi_D(y) = \min_{x\in F_D}\sum_{j=1}^p y^{(j)}f_j(x)$, we obtain the dual problem

$f^* = \max_{y\in\Delta_p}\phi_D(y).$   (3.14)

Let us show that the DA-schemes also generate an approximation to the optimal solution of this dual problem. For that, we need to employ the structure of the oracle $G$ for the objective function in (3.13). Note that in our case

$\partial f(x) = \mathrm{Conv}\{ \partial f_j(x) : j \in I(x) \}, \quad I(x) = \{ j : f_j(x) = f(x) \}.$

Thus, for any $g_k$ in method (2.14) as applied to problem (3.13) we can define a vector $y_k \in \Delta_p$ such that

$y_k^{(j)} = 0,\ j \notin I(x_k), \qquad g_k = \sum_{j\in I(x_k)} y_k^{(j)} g_{k,j},$

where $g_{k,j} \in \partial f_j(x_k)$ for $j \in I(x_k)$. Denote

$\hat y_{k+1} = \frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i y_i.$   (3.15)

Theorem 5. Let the pair $(\hat x_{k+1}, \hat y_{k+1})$ be defined by the sequences $X_k$, $G_k$ and $\Lambda_k$ generated by method (2.14) for problem (3.13). Then this pair is primal-dual feasible and

$0 \le f(\hat x_{k+1}) - \phi_D(\hat y_{k+1}) \le \frac{1}{S_{k+1}}\delta_k(D).$   (3.16)

Proof: Indeed, the pair $(\hat x_{k+1}, \hat y_{k+1})$ is feasible in view of the convexity of the primal-dual set $Q\times\Delta_p$. Further, denote $F(x) = (f_1(x), \ldots, f_p(x))^T \in \mathbb{R}^p$. Then, by the construction of $y_k$, for any $k \ge 0$ we have

$\langle g_k, x_k - x\rangle = \sum_{j\in I(x_k)} y_k^{(j)}\langle g_{k,j}, x_k - x\rangle \ge \sum_{j\in I(x_k)} y_k^{(j)}[f_j(x_k) - f_j(x)] = \langle y_k, F(x_k) - F(x)\rangle = f(x_k) - \langle y_k, F(x)\rangle.$

Therefore,

$\delta_k(D) = \max_{x\in F_D}\left\{ \sum_{i=0}^k \lambda_i\langle g_i, x_i - x\rangle \right\} \ge \max_{x\in F_D}\left\{ \sum_{i=0}^k \lambda_i[f(x_i) - \langle y_i, F(x)\rangle] \right\}$
$= \sum_{i=0}^k \lambda_i f(x_i) - S_{k+1}\phi_D(\hat y_{k+1}) \ge S_{k+1}\left[ f(\hat x_{k+1}) - \phi_D(\hat y_{k+1}) \right]. \quad\Box$

Let us write down an explicit form of the SDA-method as applied to problem (3.13). Denote by $e_j$ the $j$-th coordinate vector in $\mathbb{R}^p$.

Initialization: Set $l_0(x) \equiv 0$, $m_0 = 0 \in \mathbb{Z}^p$.

Iteration ($k \ge 0$):
1. Choose any $j_k$ such that $f_{j_k}(x_k) = f(x_k)$.
2. Set $l_{k+1}(x) = \frac{k}{k+1}l_k(x) + \frac{1}{k+1}[f(x_k) + \langle g_{k,j_k}, x - x_k\rangle]$.
3. Compute $x_{k+1} = \arg\min_{x\in Q}\left\{ l_{k+1}(x) + \frac{\gamma\hat\beta_{k+1}}{k+1}d(x) \right\}$.
4. Update $m_{k+1} = m_k + e_{j_k}$.

Output: $\hat x_{k+1} = \frac{1}{k+1}\sum_{i=0}^k x_i$, $\hat y_{k+1} = \frac{1}{k+1}m_{k+1}$.   (3.17)

In (3.17), each entry of the optimal dual vector is approximated by the frequency of detecting the corresponding functional component as the maximal one.

Note that the SDA-method can also be applied to a continuous minimax problem. In this case the objective function has the form

$f(x) = \max_y\{ \langle y, F(x)\rangle : y \in Q_d \},$

where $Q_d$ is a bounded closed convex set in $\mathbb{R}^p$. For this problem an approximation to the optimal dual solution is obtained as an average of the vectors $y(x_k)$ defined by

$y(x) \in \mathrm{Arg}\max_y\{ \langle y, F(x)\rangle : y \in Q_d \}.$

(Note that for computing $y(x)$ we need to maximize a linear function over a convex set.) The corresponding modifications of method (3.17) are straightforward.
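A minimal Python sketch of scheme (3.17), assuming $Q = \mathbb{R}^n$ with the Euclidean prox-function, so that the argmin in Step 3 reduces to $x_0 - s/\beta$; the interfaces fs and subgrads are hypothetical, not from the paper:

```python
import numpy as np

def sda_minimax(fs, subgrads, x0, gamma, n_iters):
    """Sketch of scheme (3.17) for min_x max_j f_j(x) on Q = R^n with the
    Euclidean prox-function d(x) = ||x - x0||_2^2 / 2.

    fs[j](x) evaluates f_j; subgrads[j](x) returns some g in df_j(x).
    Returns the primal average x_hat and the activation frequencies y_hat,
    which approximate the optimal dual multipliers of (3.14)."""
    p = len(fs)
    x, s = x0.copy(), np.zeros_like(x0)
    x_sum, m = np.zeros_like(x0), np.zeros(p)
    beta_hat = 1.0                                   # beta_hat_{k+1} of (2.19)
    for k in range(n_iters):
        jk = max(range(p), key=lambda j: fs[j](x))   # Step 1: an active component
        s += subgrads[jk](x)                         # Step 2: extend the model
        x_sum += x
        m[jk] += 1.0                                 # Step 4: m_{k+1} = m_k + e_{j_k}
        x = x0 - s / (gamma * beta_hat)              # Step 3: the new test point
        beta_hat += 1.0 / beta_hat
    return x_sum / n_iters, m / n_iters
```

The returned vector y_hat is exactly the frequency vector $\frac{1}{k+1}m_{k+1}$ discussed above.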

3.4 Problems with simple functional constraints

Let us assume that the set $Q$ in problem (3.1) has the following structure:

$Q = \{ x \in \hat Q : Ax = b,\ F(x) \le 0 \},$   (3.18)

where $\hat Q$ is a closed convex set in $E$, $\dim E = n$, $b \in \mathbb{R}^m$, $A$ is an $m\times n$ matrix, and $F(x) : \hat Q \to \mathbb{R}^p$ is a component-wise convex function. Usually, depending on the importance of the corresponding functional components, we can freely decide whether they should be hidden in the set $\hat Q$ or not. In any case, the representation of the set $Q$ in the form (3.18) is not unique; therefore we call the corresponding dual problem the adjoint one.

Let us introduce dual variables $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^p_+$. Assume that an optimal solution $x^*$ of (3.1), (3.18) exists and $D \ge d(x^*)$. Denote

$\phi_D(u, v) = \min_{x\in\hat Q}\{ f(x) + \langle u, b - Ax\rangle + \langle v, F(x)\rangle : d(x) \le D \}.$   (3.19)

In this definition $d(x)$ is still a prox-function of the set $\hat Q$ with prox-center $x_0 \in \hat Q$. Then the adjoint problem to (3.1), (3.18) consists in

$\max_{u,v}\{ \phi_D(u, v) : u \in \mathbb{R}^m,\ v \in \mathbb{R}^p_+ \}.$   (3.20)

Note that the complexity of computing the objective function of this problem can be comparable with the complexity of the initial problem (3.1). Nevertheless, we will see that the DA-methods (2.14) are able to approach its optimal solutions. Let us consider two possibilities.

1. Let us fix $D \ge d(x^*)$. Denote by $\hat u_k$, $\hat v_k$ the optimal multipliers for the essential constraints in the following optimization problem:

$\frac{1}{S_{k+1}}\delta_k(D) = \max_{x\in\hat Q}\left\{ \frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i\langle g_i, x_i - x\rangle : Ax = b,\ F(x) \le 0,\ d(x) \le D \right\} = \max_{x\in\hat Q,\, d(x)\le D}\ \min_{u\in\mathbb{R}^m,\, v\in\mathbb{R}^p_+}\hat L(x, u, v),$   (3.21)

$\hat L(x, u, v) = \frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i\langle g_i, x_i - x\rangle + \langle u, Ax - b\rangle - \langle v, F(x)\rangle.$

And let $\hat x_k$ be its optimal solution. Then the first-order optimality conditions give, for any $x \in \hat Q$ with $d(x) \le D$,

$\left\langle -\frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i g_i + A^T\hat u_k - \sum_{j=1}^p \hat v_k^{(j)}\psi_j,\ x - \hat x_k \right\rangle \le 0$

with some $\psi_j \in \partial F^{(j)}(\hat x_k)$, $j = 1, \ldots, p$, and

$\sum_{j=1}^p \hat v_k^{(j)}F^{(j)}(\hat x_k) = 0.$

Therefore, since $f$ and $F$ are convex, for any such $x$ we get

$f(x) + \langle\hat u_k, b - Ax\rangle + \langle\hat v_k, F(x)\rangle$
$\ge \frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i[f(x_i) + \langle g_i, x - x_i\rangle] + \langle\hat u_k, b - Ax\rangle + \sum_{j=1}^p \hat v_k^{(j)}\left[F^{(j)}(\hat x_k) + \langle\psi_j, x - \hat x_k\rangle\right]$
$\ge \frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i[f(x_i) + \langle g_i, \hat x_k - x_i\rangle] = \frac{1}{S_{k+1}}\left[ \sum_{i=0}^k \lambda_i f(x_i) - \delta_k(D) \right].$

Thus, we have proved that

$0 \le f(\hat x_{k+1}) - \phi_D(\hat u_k, \hat v_k) \le \frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i f(x_i) - \phi_D(\hat u_k, \hat v_k) \le \frac{1}{S_{k+1}}\delta_k(D).$   (3.22)

Hence, the quality of this primal-dual pair can be estimated by Item 1 of Theorem 1.

2. The above suggestion to form an approximate solution to the adjoint problem (3.20) by the dual multipliers of problem (3.21) has two drawbacks. First of all, it is necessary to guarantee that the auxiliary parameter $D$ is sufficiently big. Secondly, problem (3.21) needs additional computational effort. Let us show that an approximate solution to problem (3.20) can be obtained for free, using the objects necessary for finding the point $\pi_{\beta_{k+1}}(-s_{k+1})$.

Denote by $\bar u_k$, $\bar v_k$ the optimal multipliers for the functional constraints in the following optimization problem:

$x_{k+1} = \pi_{\beta_{k+1}}(-s_{k+1}) = \arg\max_{x\in\hat Q}\left\{ -\frac{1}{S_{k+1}}\left[ \sum_{i=0}^k \lambda_i\langle g_i, x\rangle + \beta_{k+1}d(x) \right] : Ax = b,\ F(x) \le 0 \right\} = \arg\max_{x\in\hat Q}\ \min_{u\in\mathbb{R}^m,\, v\in\mathbb{R}^p_+}L(x, u, v),$

$L(x, u, v) = -\frac{1}{S_{k+1}}\left[ \sum_{i=0}^k \lambda_i\langle g_i, x\rangle + \beta_{k+1}d(x) \right] + \langle u, Ax - b\rangle - \langle v, F(x)\rangle$

(the scaling by $\frac{1}{S_{k+1}}$ does not change the maximizer, but it normalizes the multipliers). Then for any $x \in \hat Q$ we have

$\left\langle -\frac{1}{S_{k+1}}\left[ \sum_{i=0}^k \lambda_i g_i + \beta_{k+1}d' \right] + A^T\bar u_k - \sum_{j=1}^p \bar v_k^{(j)}\psi_j,\ x - x_{k+1} \right\rangle \le 0$

with some $d' \in \partial d(x_{k+1})$ and $\psi_j \in \partial F^{(j)}(x_{k+1})$, $j = 1, \ldots, p$, and

$\sum_{j=1}^p \bar v_k^{(j)}F^{(j)}(x_{k+1}) = 0.$

Therefore, for all such $x$ we obtain

$f(x) + \langle\bar u_k, b - Ax\rangle + \langle\bar v_k, F(x)\rangle$
$\ge \frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i[f(x_i) + \langle g_i, x - x_i\rangle] + \langle\bar u_k, b - Ax\rangle + \sum_{j=1}^p \bar v_k^{(j)}\left[F^{(j)}(x_{k+1}) + \langle\psi_j, x - x_{k+1}\rangle\right]$
$\ge \frac{1}{S_{k+1}}\left\{ \sum_{i=0}^k \lambda_i[f(x_i) + \langle g_i, x_{k+1} - x_i\rangle] + \beta_{k+1}\langle d', x_{k+1} - x\rangle \right\}$
$\ge \frac{1}{S_{k+1}}\left\{ \sum_{i=0}^k \lambda_i f(x_i) - \beta_{k+1}d(x) - \left[ \sum_{i=0}^k \lambda_i\langle g_i, x_i - x_{k+1}\rangle - \beta_{k+1}d(x_{k+1}) \right] \right\}.$   (3.23)

Since in definition (3.19) we need $d(x) \le D$, by (2.12) the above estimates result in

$\phi_D(\bar u_k, \bar v_k) \ge \frac{1}{S_{k+1}}\left[ \sum_{i=0}^k \lambda_i f(x_i) - \Delta_k(\beta_{k+1}, D) \right].$   (3.24)

Hence, the quality of this primal-dual pair can be estimated by Item 1 of Theorem 1.

4 Saddle point problems

Consider the general saddle point problem:

$\min_{u\in U}\max_{v\in V} f(u, v),$   (4.1)

where $U$ is a closed convex set in $E_u$, $V$ is a closed convex set in $E_v$, the function $f(\cdot, v)$ is convex in the first argument on $E_u$ for any $v \in V$, and the function $f(u, \cdot)$ is concave in the second argument on $E_v$ for any $u \in U$. For the set $U$ we assume the existence of a prox-function $d_u(\cdot)$ with prox-center $u_0$, which is strongly convex on $U$ with respect to the norm $\|\cdot\|_u$ with convexity parameter $\sigma_u$. For the set $V$ we introduce similar assumptions.

Note that the point $x^* = (u^*, v^*) \in Q \overset{\mathrm{def}}{=} U\times V$ is a solution to problem (4.1) if and only if

$f(u^*, v) \le f(u^*, v^*) \le f(u, v^*) \quad \forall u \in U,\ v \in V.$

Since for any $x \overset{\mathrm{def}}{=} (u, v) \in Q$ we have

$f(u, v^*) \le f(u, v) + \langle g_v, v^* - v\rangle, \quad g_v \in \partial_v f(u, v),$
$f(u^*, v) \ge f(u, v) + \langle g_u, u^* - u\rangle, \quad g_u \in \partial_u f(u, v),$

we conclude that

$\langle g_u, u - u^*\rangle + \langle -g_v, v - v^*\rangle \ge f(u, v^*) - f(u^*, v) \ge 0$   (4.2)

for any $g = (g_u, -g_v)$ with $g_u \in \partial_u f(u, v)$, $g_v \in \partial_v f(u, v)$. Thus, the oracle

$G : x = (u, v) \in Q \mapsto g(x) = (g_u(x), -g_v(x))$   (4.3)

satisfies condition (2.8).

Further, let us fix some $\alpha \in (0, 1)$. Then we can introduce the following prox-function of the set $Q$:

$d(x) = \alpha d_u(u) + (1-\alpha)d_v(v).$

In this case, the prox-center of $Q$ is $x_0 = (u_0, v_0)$. Defining the norm for $E = E_u\times E_v$ by

$\|x\| = \left[ \alpha\sigma_u\|u\|_u^2 + (1-\alpha)\sigma_v\|v\|_v^2 \right]^{1/2},$

we get for the function $d(\cdot)$ the convexity parameter $\sigma = 1$. Note that the norm for the dual space $E^* = E_u^*\times E_v^*$ is now defined as

$\|g\|_* = \left[ \frac{1}{\alpha\sigma_u}\|g_u\|_{u,*}^2 + \frac{1}{(1-\alpha)\sigma_v}\|g_v\|_{v,*}^2 \right]^{1/2}.$

Assuming now that the partial subdifferentials of $f$ are uniformly bounded,

$\|g_u\|_{u,*} \le L_u, \quad \|g_v\|_{v,*} \le L_v, \quad g_u \in \partial_u f(u, v),\ g_v \in \partial_v f(u, v),\ (u, v) \in Q,$

we obtain the following bound for the answers of the oracle $G$:

$\|g\|_*^2 \le L^2 \overset{\mathrm{def}}{=} \frac{L_u^2}{\alpha\sigma_u} + \frac{L_v^2}{(1-\alpha)\sigma_v}.$   (4.4)

Finally, if $d_u(u^*) \le D_u$ and $d_v(v^*) \le D_v$, then $x^* = (u^*, v^*) \in F_D$ with

$D = \alpha D_u + (1-\alpha)D_v.$   (4.5)

Let us now apply the SDA-method (2.21) to (4.1). In accordance with Theorem 2, we get the following bound:

$\frac{1}{S_{k+1}}\delta_k(D) \le \frac{\hat\beta_{k+1}}{k+1}\Omega, \quad \Omega \overset{\mathrm{def}}{=} \gamma D + \frac{L^2}{2\gamma},$

with $L$ and $D$ defined by (4.4) and (4.5) respectively. With the optimal choice $\gamma = \frac{L}{\sqrt{2D}}$ we obtain

$\Omega = \left[ 2L^2D \right]^{1/2} = \left[ 2\left( \frac{L_u^2}{\alpha\sigma_u} + \frac{L_v^2}{(1-\alpha)\sigma_v} \right)(\alpha D_u + (1-\alpha)D_v) \right]^{1/2} = \left[ 2\left( \frac{L_u^2D_u}{\sigma_u} + \frac{L_v^2D_v}{\sigma_v} + \frac{\alpha L_v^2D_u}{(1-\alpha)\sigma_v} + \frac{(1-\alpha)L_u^2D_v}{\alpha\sigma_u} \right) \right]^{1/2}.$

Minimizing the latter expression in $\alpha$, we obtain

$\Omega = \sqrt{2}\left( L_u\sqrt{\frac{D_u}{\sigma_u}} + L_v\sqrt{\frac{D_v}{\sigma_v}} \right).$

Thus, we have shown that under an appropriate choice of parameters we can ensure for the SDA-method the following rate of convergence of the gap function:

$\frac{1}{S_{k+1}}\delta_k(D) \le \frac{\hat\beta_{k+1}}{k+1}\sqrt{2}\left( L_u\sqrt{\frac{D_u}{\sigma_u}} + L_v\sqrt{\frac{D_v}{\sigma_v}} \right).$   (4.6)

It remains to show that a vanishing gap results in approaching the solution of the saddle point problem (4.1). Let us introduce two auxiliary functions:

$\phi_{D_v}(u) = \max_{v\in V}\{ f(u, v) : d_v(v) \le D_v \}, \quad \psi_{D_u}(v) = \min_{u\in U}\{ f(u, v) : d_u(u) \le D_u \}.$

In view of our assumptions, $\phi_{D_v}(\cdot)$ is convex on $U$ and $\psi_{D_u}(\cdot)$ is concave on $V$. Moreover, for any $u \in U$ and $v \in V$ we have

$\psi_{D_u}(v) \le f^* \le \phi_{D_v}(u),$

where $f^* = f(u^*, v^*)$.
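For completeness, the minimization over $\alpha$ used in the last step of the derivation of $\Omega$ can be verified directly (a routine computation spelled out here, not text from the paper):

```latex
\[
\min_{\alpha\in(0,1)}\left\{
  \frac{\alpha L_v^2 D_u}{(1-\alpha)\sigma_v}
  + \frac{(1-\alpha)L_u^2 D_v}{\alpha\sigma_u}\right\}
= 2 L_u L_v\sqrt{\frac{D_u D_v}{\sigma_u\sigma_v}},
\qquad\text{attained for }\
\frac{\alpha}{1-\alpha} = \frac{L_u}{L_v}\sqrt{\frac{\sigma_v D_v}{\sigma_u D_u}},
\]
so that the bracket in the expression for $\Omega$ becomes
\[
\frac{L_u^2 D_u}{\sigma_u} + \frac{L_v^2 D_v}{\sigma_v}
+ 2 L_u L_v\sqrt{\frac{D_u D_v}{\sigma_u\sigma_v}}
= \left( L_u\sqrt{\frac{D_u}{\sigma_u}} + L_v\sqrt{\frac{D_v}{\sigma_v}} \right)^{2}.
\]
```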

Theorem 6. Let the point $\hat x_{k+1} = (\hat u_{k+1}, \hat v_{k+1})$ be defined by (2.9) with the sequences $X_k$, $G_k$ and $\Lambda_k$ generated by method (2.14) for problem (4.1) with the oracle $G$ defined by (4.3). Then $\hat x_{k+1} \in Q$ and

$0 \le \phi_{D_v}(\hat u_{k+1}) - \psi_{D_u}(\hat v_{k+1}) \le \frac{1}{S_{k+1}}\delta_k(D).$   (4.7)

Proof: Indeed, since $f(\cdot, v)$ is convex and $f(u, \cdot)$ is concave,

$\tau_k \overset{\mathrm{def}}{=} \max_{u\in U}\left\{ \frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i\langle g_u(u_i, v_i), u_i - u\rangle : d_u(u) \le D_u \right\}$
$\ge \max_{u\in U}\left\{ \frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i[f(u_i, v_i) - f(u, v_i)] : d_u(u) \le D_u \right\}$
$\ge \frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i f(u_i, v_i) - \min_{u\in U}\{ f(u, \hat v_{k+1}) : d_u(u) \le D_u \}$
$= \frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i f(u_i, v_i) - \psi_{D_u}(\hat v_{k+1}).$

Similarly,

$\sigma_k \overset{\mathrm{def}}{=} \max_{v\in V}\left\{ \frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i\langle g_v(u_i, v_i), v - v_i\rangle : d_v(v) \le D_v \right\}$
$\ge \max_{v\in V}\left\{ \frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i[f(u_i, v) - f(u_i, v_i)] : d_v(v) \le D_v \right\}$
$\ge \max_{v\in V}\{ f(\hat u_{k+1}, v) : d_v(v) \le D_v \} - \frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i f(u_i, v_i)$
$= \phi_{D_v}(\hat u_{k+1}) - \frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i f(u_i, v_i).$

Thus,

$\phi_{D_v}(\hat u_{k+1}) - \psi_{D_u}(\hat v_{k+1}) \le \tau_k + \sigma_k \le \frac{1}{S_{k+1}}\max_{x\in Q}\left\{ \sum_{i=0}^k \lambda_i\langle g(x_i), x_i - x\rangle : d(x) \le \alpha D_u + (1-\alpha)D_v \right\} = \frac{1}{S_{k+1}}\delta_k(D). \quad\Box$

Note that we did not assume boundedness of the sets $U$ and $V$. The parameter $D$ from inequality (4.7) does not appear in the DA-method (2.14) in explicit form.
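As an illustration of Theorem 6, here is a self-contained Python sketch of the SDA-method on a matrix game, an assumed instance of (4.1); scaling both blocks equally (i.e., absorbing the weights $\alpha$ and $1-\alpha$ of (4.5) into the single constant $\gamma$) is a simplification made for brevity:

```python
import numpy as np

def softmax(z):
    w = np.exp(z - z.max())
    return w / w.sum()

def sda_matrix_game(A, gamma, n_iters):
    """SDA for min_{u in Delta_n} max_{v in Delta_m} <Au, v>, a special case of
    (4.1) with oracle (4.3): g = (A^T v, -Au). Both simplices carry the entropy
    prox-function, for which pi_beta(-s) is a softmax of -s/beta."""
    m, n = A.shape
    u, v = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    su, sv = np.zeros(n), np.zeros(m)
    u_sum, v_sum = np.zeros(n), np.zeros(m)
    beta_hat = 1.0                       # beta_hat_{k+1} of (2.19)
    for k in range(n_iters):
        su += A.T @ v                    # u-block of the oracle answer
        sv += -(A @ u)                   # v-block (note the sign in (4.3))
        u_sum += u
        v_sum += v
        u = softmax(-su / (gamma * beta_hat))
        v = softmax(-sv / (gamma * beta_hat))
        beta_hat += 1.0 / beta_hat
    return u_sum / n_iters, v_sum / n_iters   # (u_hat, v_hat), cf. Theorem 6

# Illustrative run: a random 5x5 zero-sum game.
rng = np.random.default_rng(0)
u_hat, v_hat = sda_matrix_game(rng.standard_normal((5, 5)), 1.0, 5000)
```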

5 Variational inequalities

Let $Q$ be a closed convex set endowed, as usual, with a prox-function $d(x)$. Consider a continuous operator $V(\cdot) : Q \to E^*$, which is monotone on $Q$:

$\langle V(x) - V(y), x - y\rangle \ge 0 \quad \forall x, y \in Q.$

In this section we are interested in the variational inequality problem (VI):

Find $x^* \in Q$ : $\langle V(x), x - x^*\rangle \ge 0 \quad \forall x \in Q.$   (5.1)

Sometimes this problem is called a weak variational inequality. Since $V(\cdot)$ is continuous and monotone, problem (5.1) is equivalent to its strong variant:

Find $x^* \in Q$ : $\langle V(x^*), x - x^*\rangle \ge 0 \quad \forall x \in Q.$   (5.2)

We assume that this problem has at least one solution $x^*$. Let us fix $D > d(x^*)$. A standard measure for the quality of an approximate solution to (5.1) is the restricted merit function

$f_D(x) = \max_{y\in Q}\{ \langle V(y), x - y\rangle : d(y) \le D \}.$   (5.3)

Let us present two important facts related to this function (our exposition is partly taken from Section 2 in [5]).

Lemma 4. The function $f_D(x)$ is well defined and convex on $E$. For any $x \in Q$ we have $f_D(x) \ge 0$, and $f_D(x^*) = 0$. Conversely, if $f_D(\hat x) = 0$ for some $\hat x \in Q$ with $d(\hat x) < D$, then $\hat x$ is a solution to problem (5.1).

Proof: Indeed, the function $f_D(x)$ is well defined since the set $F_D$ is bounded. It is convex in $x$ as a maximum of a parametric family of linear functions. If $x \in F_D$, then taking $y = x$ in (5.3) we guarantee $f_D(x) \ge 0$. If $x \in Q$ and $d(x) > D$, consider the point

$y = \alpha x^* + (1-\alpha)x \in Q$

with $\alpha \in (0, 1)$ chosen so that $d(y) = D$. Then $y \in F_D$ and, since $x - y = \frac{\alpha}{1-\alpha}(y - x^*)$,

$\langle V(y), x - y\rangle = \frac{\alpha}{1-\alpha}\langle V(y), y - x^*\rangle \ge 0.$

Thus, $f_D(x) \ge 0$ for all $x \in Q$. On the other hand, since $x^*$ is a solution to (5.1), we have $\langle V(y), x^* - y\rangle \le 0$ for all $y \in F_D$. Hence, since $x^* \in F_D$, we get $f_D(x^*) = 0$.

Finally, let $f_D(\hat x) = 0$ for some $\hat x \in Q$ with $d(\hat x) < D$. This means that $\hat x$ is a solution of the weak variational inequality $\langle V(y), y - \hat x\rangle \ge 0$ for all $y \in F_D$. Since $V(\cdot)$ is continuous, we conclude that $\hat x$ is a solution to the strong variational inequality

$\langle V(\hat x), y - \hat x\rangle \ge 0 \quad \forall y \in F_D.$

Hence, the minimum of the linear function $\ell(y) = \langle V(\hat x), y\rangle$, $y \in F_D$, is attained at $y = \hat x$. Moreover, at this point the constraint $d(y) \le D$ is not active. Thus, this constraint can be removed, and we conclude that $\hat x$ is a solution to (5.2) and, consequently, to (5.1). $\Box$

Let us use for problem (5.1) the oracle $G : x \mapsto V(x)$.

Lemma 5. For any $k \ge 0$ we have

$f_D(\hat x_{k+1}) \le \frac{1}{S_{k+1}}\delta_k(D).$   (5.4)

Proof: Indeed, since $V(x)$ is a monotone operator,

$f_D(\hat x_{k+1}) = \max_x\{ \langle V(x), \hat x_{k+1} - x\rangle : x \in F_D \} = \max_x\left\{ \frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i\langle V(x), x_i - x\rangle : x \in F_D \right\}$
$\le \max_x\left\{ \frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i\langle V(x_i), x_i - x\rangle : x \in F_D \right\} = \frac{1}{S_{k+1}}\delta_k(D). \quad\Box$

Let us assume that $\|V(x)\|_* \le L$ for all $x \in Q$. Then, for example, by Theorem 2 we conclude that the SDA-method (2.21) as applied to problem (5.1) converges with the following rate:

$f_D(\hat x_{k+1}) \le \frac{\hat\beta_{k+1}}{k+1}\left( \gamma D + \frac{L^2}{2\sigma\gamma} \right).$   (5.5)

Note that the sequence $\{x_k\}_{k=0}^{\infty}$ remains bounded even for an unbounded feasible set $Q$:

$\|x_k - x^*\|^2 \le \frac{2}{\sigma}d(x^*) + \frac{L^2}{\sigma^2\gamma^2}.$   (5.6)

6 Stochastic optimization

Let $\Xi$ be a probability space endowed with a probability measure, and let $Q$ be a closed convex set in $E$ endowed with a prox-function $d(x)$. We are given a cost function $f : Q\times\Xi \to \mathbb{R}$. This mapping defines a family of random variables $f(x, \xi)$ on $\Xi$. We assume that the expectation in $\xi$, $E_\xi(f(x, \xi))$, is well defined for all $x \in Q$. Our problem of interest is the stochastic optimization problem:

$\min_x\{ \varphi(x) \overset{\mathrm{def}}{=} E_\xi(f(x, \xi)) : x \in Q \}.$   (6.1)

We assume that problem (6.1) is solvable and denote by $x^*$ one of its solutions, with $\varphi^* \overset{\mathrm{def}}{=} \varphi(x^*)$.

In this section we make the standard convexity assumption.

Assumption 1. For any $\xi \in \Xi$, the function $f(\cdot, \xi)$ is convex and subdifferentiable on $Q$.

Then the function $\varphi(x)$ is convex on $Q$. Note that computation of the exact value of $\varphi(x)$ may not be numerically possible, even when the distribution of $\xi$ is known. Indeed, if $\Xi \subseteq \mathbb{R}^m$, then computing $\hat\varphi$ such that $|\hat\varphi - \varphi(x)| \le \hat\epsilon$ may involve up to $O\big(\frac{1}{\hat\epsilon^m}\big)$ computations of $f(x, \xi)$ for different $\xi \in \Xi$. Hence, the standard deterministic notion of an approximate solution to problem (6.1) becomes useless.

However, an alternative definition of the solution to (6.1) is possible. Indeed, it is clear that no solution concept can exclude failures in actual implementation. This is the very nature of decision making under uncertainty: we do not have full control over the consequences of a

given decision. In this context, there is no reason to require that the computed solution be the result of some deterministic process. A random computational scheme would be equally acceptable if the solution meets some probabilistic criterion. Namely, let $\bar x$ be a random output of some random algorithm. It appears to be enough to assume that this output is good only on average:

$E(\varphi(\bar x)) - \varphi^* \le \epsilon.$   (6.2)

Then, as was shown in [6], any random optimization algorithm which is able to generate such a random output for any $\epsilon > 0$ can be transformed into an algorithm generating random solutions with an appropriate level of confidence. In [6] this transformation was justified for a stochastic version of subgradient method (1.3) (which is known to possess the feature (6.2)). The goal of this section is to prove that a stochastic version of the SDA-method (2.21) as applied to problem (6.1) is also able to produce a random variable $\bar x \in Q$ satisfying condition (6.2).

In order to justify the convergence of the method we need more assumptions.

Assumption 2. At any point $x \in Q$ and for any realization of the random vector $\xi \in \Xi$, the stochastic oracle $G(x, \xi)$ always returns a predefined answer $f'(x, \xi) \in \partial_x f(x, \xi)$. Moreover, the answers of the oracle are uniformly bounded:

$\|f'(x, \xi)\|_* \le L, \quad x \in Q,\ \xi \in \Xi.$

Let us write down a stochastic modification of the SDA-method in the same vein as [6]. In this section, we adopt the notation $\bar\tau$ for a particular result of drawing a random variable $\tau$. Thus, all $\bar\xi_k \in \mathbb{R}^m$ in the scheme below denote observations of the corresponding i.i.d. random vectors $\xi_k$, which are identical copies of the random vector $\xi$ defined in (6.1).

Method of Stochastic Simple Averages (SSA)

Initialization: Set $\bar s_0 = 0 \in E^*$. Choose $\gamma > 0$.

Iteration ($k \ge 0$):
1. Generate $\bar\xi_k$ and compute $\bar g_k = f'(\bar x_k, \bar\xi_k)$.
2. Set $\bar s_{k+1} = \bar s_k + \bar g_k$, $\bar x_{k+1} = \pi_{\gamma\hat\beta_{k+1}}(-\bar s_{k+1})$.   (6.3)

Denote by $\xi_k$, $k \ge 0$, the collection of random variables $(\xi_0, \ldots, \xi_k)$. The SSA-method (6.3) defines two sequences of random vectors

$s_{k+1} = s_{k+1}(\xi_k) \in E^*, \quad x_{k+1} = x_{k+1}(\xi_k) \in Q, \quad k \ge 0,$

with deterministic $s_0$ and $x_0$. Note that their implementations $\bar s_{k+1}$ and $\bar x_{k+1}$ are different for different runs of the SSA-method.
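A self-contained Python sketch of the SSA-method (6.3), assuming the Euclidean prox-setup of Section 1.4 so that $\pi_\beta(-s) = x_0 - s/\beta$; the test instance $\varphi(x) = E\|x - \xi\|_1$ with Gaussian $\xi$ is illustrative (its minimizer is the component-wise median $x^* = 0$):

```python
import numpy as np

def ssa(stoch_subgrad, sample_xi, x0, gamma, n_iters):
    """Method of Stochastic Simple Averages (6.3) on Q = R^n.

    stoch_subgrad(x, xi) returns f'(x, xi) in d_x f(x, xi);
    sample_xi() produces one drawing of xi.
    Returns the average point whose expected value Theorem 7 bounds."""
    x, s = x0.copy(), np.zeros_like(x0)
    x_sum = np.zeros_like(x0)
    beta_hat = 1.0                        # beta_hat_{k+1} of (2.19)
    for k in range(n_iters):
        s += stoch_subgrad(x, sample_xi())
        x_sum += x
        x = x0 - s / (gamma * beta_hat)   # pi_beta(-s) for the Euclidean prox
        beta_hat += 1.0 / beta_hat
    return x_sum / n_iters

# Illustrative instance: phi(x) = E ||x - xi||_1 with xi ~ N(0, I_4).
rng = np.random.default_rng(1)
x_bar = ssa(lambda x, xi: np.sign(x - xi), lambda: rng.standard_normal(4),
            np.zeros(4), 1.0, 2000)
print(np.abs(x_bar).sum())   # should be close to 0
```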

Thus, for each particular run of this method, we can define the gap function

$\bar\delta_k(D) = \max_x\left\{ \sum_{i=0}^k \langle f'(\bar x_i, \bar\xi_i), \bar x_i - x\rangle : x \in F_D \right\}, \quad D \ge 0,$   (6.4)

which can be seen as a drawing of the random variable $\delta_k(D)$. Since $\|\bar g_k\|_* \le L$, Theorem 2 results in the following uniform bound:

$\frac{1}{k+1}\bar\delta_k(D) \le \frac{\hat\beta_{k+1}}{k+1}\left( \gamma D + \frac{L^2}{2\sigma\gamma} \right).$   (6.5)

Let us now choose $D$ big enough: $D \ge d(x^*)$. Then

$\frac{1}{k+1}\bar\delta_k(D) \ge \frac{1}{k+1}\sum_{i=0}^k \langle f'(\bar x_i, \bar\xi_i), \bar x_i - x^*\rangle \ge \frac{1}{k+1}\sum_{i=0}^k [f(\bar x_i, \bar\xi_i) - f(x^*, \bar\xi_i)].$

Since all $\xi_i$ are i.i.d. random vectors, on average we have

$\frac{1}{k+1}E_{\xi_k}(\delta_k(D)) \ge \frac{1}{k+1}\sum_{i=0}^k E_{\xi_k}(f(x_i, \xi_i)) - \varphi^*.$   (6.6)

Note that the random vector $x_i$ is independent of $\xi_j$ for $i \le j \le k$. Therefore,

$E_{\xi_k}(f(x_i, \xi_i)) = E_{\xi_{i-1}}\big( E_{\xi_i}(f(x_i, \xi_i)) \big) = E_{\xi_k}(\varphi(x_i)),$

and we conclude that

$\frac{1}{k+1}\sum_{i=0}^k E_{\xi_k}(f(x_i, \xi_i)) = \frac{1}{k+1}\sum_{i=0}^k E_{\xi_k}(\varphi(x_i)) \ge E_{\xi_k}\left( \varphi\left( \frac{1}{k+1}\sum_{i=0}^k x_i \right) \right).$

Thus, in view of inequalities (6.5), (6.6), we have proved the following theorem.

Theorem 7. Let the random sequence $\{x_k\}_{k=0}^{\infty}$ be defined by the SSA-method (6.3). Then for any $k \ge 0$ we have

$E_{\xi_k}\left( \varphi\left( \frac{1}{k+1}\sum_{i=0}^k x_i \right) \right) - \varphi^* \le \frac{\hat\beta_{k+1}}{k+1}\left( \gamma d(x^*) + \frac{L^2}{2\sigma\gamma} \right).$

7 Applications in modelling

7.1 Algorithm of balanced development

It appears that a variant of the DA-algorithm (2.14) has a natural interpretation as a certain rational dynamic investment strategy. Let us discuss an example of such a model.

Assume we are going to develop our property by an a priori given sequence $\Lambda = \{\lambda_k\}_{k=0}^{\infty}$ of consecutive investments of capital. This money can be spent on buying certain products with unitary prices $p^{(j)} > 0$, $j = 1, \ldots, N$.

All products are perfectly divisible and available on the market in unlimited quantities. Thus, if from investment $k$ we buy a quantity $X_k^{(j)} \ge 0$ of each product $j$, $j = 1, \ldots, N$, then the balance equation can be written as

$X_k = \lambda_k x_k, \quad \langle p, x_k\rangle = 1, \quad x_k \ge 0,$   (7.1)

where the vectors $p = (p^{(1)}, \ldots, p^{(N)})^T$ and $x_k = (x_k^{(1)}, \ldots, x_k^{(N)})^T$ are both from $\mathbb{R}^N$.

In order to measure the progress of our property (or the efficiency of our investments), we introduce a set of $m$ characteristics (features). Let us assume that after $k \ge 0$ investments we are able to measure exactly the accumulated level $s_k^{(i)}$, $i = 1, \ldots, m$, of each feature $i$ in our property. On the other hand, we have a set of standards $b^{(i)} > 0$, $i = 1, \ldots, m$, which are treated as lower bounds on the concentration of each feature in a perfectly balanced unit of such a property. Finally, let us assume that each product $j$ is described by a vector $a_j = (a_j^{(1)}, \ldots, a_j^{(m)})^T$, $j = 1, \ldots, N$, whose entries are the unitary concentrations of the corresponding characteristics. Introducing the matrix $A = (a_1, \ldots, a_N) \in \mathbb{R}^{m\times N}$, we can describe the dynamics of the accumulated features as follows:

$s_{k+1} = s_k + \lambda_k g_k, \quad g_k = Ax_k, \quad k \ge 0.$   (7.2)

In accordance with our system of standards, the strategy of optimal balanced development can be found from the following optimization problem:

$\tau^* \overset{\mathrm{def}}{=} \max_{x,\tau}\{ \tau : Ax \ge \tau b,\ \langle p, x\rangle = 1,\ x \ge 0 \}.$   (7.3)

Indeed, denote by $x^*$ its optimal solution. Then, choosing $x_k \equiv x^*$ in (7.2), we get

$s_k = \left( \sum_{i=0}^{k-1}\lambda_i \right)Ax^* \ge \left( \sum_{i=0}^{k-1}\lambda_i \right)\tau^* b.$

Thus, the efficiency of such an investment strategy is the maximal possible, and its quality does not even depend on the schedule of the payments. However, note that in order to apply the above strategy, it is necessary to solve in advance the linear programming problem (7.3), which can be very complicated because of high dimensionality. What can be done if this computation appears to be too difficult? Do we have some simple tools for approaching the optimal strategy in real life?

Apparently, the latter question has a positive answer. First of all, let us represent problem (7.3) in dual form. From the Lagrangian

$\mathcal{L}(x, \tau, y, \lambda) = \tau + \langle Ty, Ax - \tau b\rangle + \lambda(1 - \langle p, x\rangle),$

where $T$ is a diagonal matrix with $T^{(i,i)} = \frac{1}{b^{(i)}}$, $i = 1, \ldots, m$, and the corresponding saddle-point problem

$\max_{\tau;\ x\ge0}\ \min_{\lambda;\ y\ge0}\ \mathcal{L}(x, \tau, y, \lambda),$

we come to the following dual representation:

$\tau^* = \min_{y,\lambda}\{ \lambda : A^TTy \le \lambda p,\ y \in \Delta_m \}.$

Or, introducing the function $\phi(y) = \max_{1\le j\le N}\frac{1}{p^{(j)}}\langle a_j, Ty\rangle$, we obtain

$\tau^* = \min_y\{ \phi(y) : y \in \Delta_m \}.$   (7.4)

Let us apply to problem (7.4) a variant of the DA-scheme (2.14). For the feasible set of this problem, $\Delta_m$, we choose the entropy prox-function

$d(y) = \ln m + \sum_{i=1}^m y^{(i)}\ln y^{(i)}$

with prox-center $y_0 = (\frac1m, \ldots, \frac1m)^T$. Note that $d(y)$ is strongly convex on $\Delta_m$ in the $\ell_1$-norm (1.10) with convexity parameter $\sigma = 1$, and for any $y \in \Delta_m$ we have

$d(y) \le \ln m.$   (7.5)

Moreover, with this prox-function the computation of the projection $\pi_\beta(s)$ is very cheap:

$\pi_\beta(s)^{(i)} = e^{s^{(i)}/\beta}\left[ \sum_{j=1}^m e^{s^{(j)}/\beta} \right]^{-1}, \quad i = 1, \ldots, m$   (7.6)

(see, for example, Lemma 4 in [3]). In the algorithm below, the sequence $\Lambda$ is the same as in (7.2).

Algorithm of Balanced Development (BD)

Initialization: Set $s_0 = 0 \in E^*$. Choose $\beta_0 > 0$.

Iteration ($k \ge 0$):
1. Form the set $J_k = \left\{ j : \frac{1}{p^{(j)}}\langle a_j, Ty_k\rangle = \phi(y_k) \right\}$.
2. Choose any $x_k \ge 0$ with $\langle p, x_k\rangle = 1$ and $x_k^{(j)} = 0$ for $j \notin J_k$.
3. Update $s_{k+1} = s_k + \lambda_k g_k$ with $g_k = Ax_k$.
4. Choose $\beta_{k+1} \ge \beta_k$ and for $i = 1, \ldots, m$ define

$y_{k+1}^{(i)} = e^{-s_{k+1}^{(i)}/(\beta_{k+1}b^{(i)})}\left[ \sum_{j=1}^m e^{-s_{k+1}^{(j)}/(\beta_{k+1}b^{(j)})} \right]^{-1}.$   (7.7)

Note that in this scheme $Tg_k \in \partial\phi(y_k)$ and

$y_{k+1} = \pi_{\beta_{k+1}}\left( -T\sum_{j=0}^k \lambda_j g_j \right).$
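A compact Python sketch of the BD-algorithm (7.7), assuming the simplest admissible choice in Step 2 (the whole budget goes to one best product) and a schedule of the form $\beta_{k+1} = \mu(k+1)$ (one admissible choice, cf. the discussion of convergence below); the instance data are hypothetical:

```python
import numpy as np

def balanced_development(A, p, b, lambdas, mu):
    """Algorithm of Balanced Development (7.7) with beta_{k+1} = mu * (k+1).

    A: m x N matrix of unitary concentrations; p: prices; b: minimal standards.
    At every step the whole budget lambda_k is spent on one product with the
    best quality-price ratio (a valid choice in Step 2)."""
    m, N = A.shape
    s = np.zeros(m)
    y = np.full(m, 1.0 / m)            # prox-center of the entropy prox-function
    for k, lam in enumerate(lambdas):
        u = y / b                                  # personal prices u_k = T y_k
        jk = np.argmax((A.T @ u) / p)              # Step 1: best ratio <a_j,u>/p_j
        x = np.zeros(N)
        x[jk] = 1.0 / p[jk]                        # Step 2: <p, x_k> = 1
        s += lam * (A @ x)                         # Step 3: feature accumulation
        beta = mu * (k + 1)                        # Step 4: logit audit prices
        z = -s / (beta * b)
        z -= z.max()                               # numerical stabilization
        w = np.exp(z)
        y = w / w.sum()
    return s, y
```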

Therefore it is indeed a variant of (2.14), and in view of Theorem 1 and (7.5) we have

$\frac{1}{S_{k+1}}\sum_{i=0}^k \lambda_i\phi(y_i) - \tau^* \le \frac{1}{S_{k+1}}\left( \beta_{k+1}\ln m + \frac{L^2}{2}\sum_{i=0}^k \frac{\lambda_i^2}{\beta_i} \right), \quad L = \max_{1\le j\le N}\frac{1}{p^{(j)}}\|Ta_j\|_\infty = \max_{1\le j\le N,\ 1\le i\le m}\frac{a_j^{(i)}}{p^{(j)}b^{(i)}}.$   (7.8)

The BD-algorithm (7.7) admits the following interpretation. Its first three steps describe a natural investment strategy. Indeed, the algorithm updates a system of personal prices for the characteristics:

$u_k \overset{\mathrm{def}}{=} Ty_k, \quad k \ge 0.$

Using these prices, for any product $j$ we can compute an estimate $\langle a_j, u_k\rangle$ of its utility. Therefore, in Step 1 we form the subset of available products with the best quality-price ratio:

$J_k = \left\{ j : \frac{1}{p^{(j)}}\langle a_j, u_k\rangle = \max_{1\le i\le N}\frac{1}{p^{(i)}}\langle a_i, u_k\rangle \right\}.$

In Step 2 we share a unit of budget in an arbitrary way among the best products. And in Step 3 we buy the products within the budget $\lambda_k$ and update the vector of accumulated features.

A non-trivial part of the interpretation is related to Step 4, which describes the dependence of the personal prices, developed by the end of period $k+1$, on the vector of accumulated characteristics $s_{k+1}$ observed during this period. For this interpretation, we first need to explain the meaning of (7.6). Relations (7.6) appear in the so-called logit variant of the discrete choice model (see, for example, [12]). In this model we need to choose one of $m$ variants with utilities $s^{(i)}$, $i = 1, \ldots, m$. The utilities are observed with additive random errors $\epsilon^{(i)}$, which are i.i.d. in accordance with the double-exponential distribution:

$F(\tau) = \mathrm{Prob}\left[ \epsilon^{(i)} \le \tau \right] = \exp\left\{ -e^{-\gamma - \tau/\beta} \right\}, \quad i = 1, \ldots, m,$

where $\gamma$ is the Euler constant and $\beta > 0$ is a tolerance parameter. Then the expected observation of the utility function is given by

$U_\beta(s) = E\left( \max_{1\le i\le m}[s^{(i)} + \epsilon^{(i)}] \right).$

It can be proved that $U_\beta(s) = \beta d_*(s/\beta) + \mathrm{const}$ with

$d_*(s) = \ln\left( \sum_{i=1}^m e^{s^{(i)}} \right).$

Therefore we have the following important expressions for the so-called choice probabilities:

$\mathrm{Prob}\left[ s^{(i)} + \epsilon^{(i)} = \max_{1\le j\le m}[s^{(j)} + \epsilon^{(j)}] \right] = \nabla U_\beta(s)^{(i)} = \nabla d_*(s/\beta)^{(i)} = e^{s^{(i)}/\beta}\left[ \sum_{j=1}^m e^{s^{(j)}/\beta} \right]^{-1}, \quad i = 1, \ldots, m$

(compare with (7.6)). The smaller $\beta$ is, the closer our choice strategy is to the deterministic maximum.

Using the logit model, we can adopt the following behavioral explanation of the rules of Step 4 in (7.7). During period $k+1$ we regularly compare the levels of the accumulated features in the relative scale defined by the vector of minimal standards $b$. Each audit determines the worst characteristic, the one whose relative level is minimal. The above computations are performed with additive errors⁵ which satisfy the logit model. By the end of the period we obtain the frequencies $y_{k+1}^{(i)}$ of detecting the level of the corresponding characteristic as the worst one. Then we apply the following prices:

$u_{k+1}^{(i)} = \frac{y_{k+1}^{(i)}}{b^{(i)}}, \quad i = 1, \ldots, m.$   (7.9)

Note that the conditions for convergence of the BD-method can be derived from the right-hand side of inequality (7.8). Assume, for example, that we have a sequence of equal investments $\lambda$. Further, during each period we compare the average accumulation of the characteristics using the logit model with tolerance parameter $\mu$. Hence, in (7.7) we take

$\beta_{k+1} = \mu(k+1), \quad \lambda_k = \lambda, \quad k \ge 0.$

Defining $\beta_0 = \mu$, we can estimate the right-hand side of inequality (7.8) as follows:

$\frac{\mu}{\lambda}\ln m + \frac{\lambda L^2}{2\mu}\cdot\frac{2 + \ln k}{k+1}, \quad k \ge 1.$

Thus, the quality of our property stabilizes at the level $\tau^* - \frac{\mu}{\lambda}\ln m$. In order to have convergence to the optimal balance, the inaccuracy level $\mu$ of the audit testing must gradually go to zero.

7.2 Preliminary planning versus dynamic adjustment

Let us consider a multi-period optimization problem in the following form. For consecutive periods $k = 0, \ldots, N$, our expenses are defined by different convex objective functions $f_0(x), \ldots, f_N(x)$, $x \in Q$, where $Q$ is a closed convex set. Thus, if we choose to apply in period $k$ some strategy $x_k \in Q$, then the total expenses are given by

$\Psi(X) = \sum_{k=0}^N f_k(x_k), \quad X = \{x_k\}_{k=0}^N.$

In this model there are several possibilities for defining a rational strategy.

1. If all the functions $\{f_k(x)\}_{k=0}^N$ are known in advance, then we can define an optimal dynamic strategy $X^*$ as

$x_k = x_k^* \overset{\mathrm{def}}{=} \arg\min_x\{ f_k(x) : x \in Q \}, \quad k = 0, \ldots, N.$   (7.10)

⁵ Note that these errors can occur in a natural way, due to inexact data, for example. However, even if the actual data and measurements are exact, we need to introduce an artificial randomization of the results.


More information

CORE 50 YEARS OF DISCUSSION PAPERS. Globally Convergent Second-order Schemes for Minimizing Twicedifferentiable 2016/28

CORE 50 YEARS OF DISCUSSION PAPERS. Globally Convergent Second-order Schemes for Minimizing Twicedifferentiable 2016/28 26/28 Globally Convergent Second-order Schemes for Minimizing Twicedifferentiable Functions YURII NESTEROV AND GEOVANI NUNES GRAPIGLIA 5 YEARS OF CORE DISCUSSION PAPERS CORE Voie du Roman Pays 4, L Tel

More information

Pavel Dvurechensky Alexander Gasnikov Alexander Tiurin. July 26, 2017

Pavel Dvurechensky Alexander Gasnikov Alexander Tiurin. July 26, 2017 Randomized Similar Triangles Method: A Unifying Framework for Accelerated Randomized Optimization Methods Coordinate Descent, Directional Search, Derivative-Free Method) Pavel Dvurechensky Alexander Gasnikov

More information

Complexity and Simplicity of Optimization Problems

Complexity and Simplicity of Optimization Problems Complexity and Simplicity of Optimization Problems Yurii Nesterov, CORE/INMA (UCL) February 17 - March 23, 2012 (ULg) Yu. Nesterov Algorithmic Challenges in Optimization 1/27 Developments in Computer Sciences

More information

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1,

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1, Math 30 Winter 05 Solution to Homework 3. Recognizing the convexity of g(x) := x log x, from Jensen s inequality we get d(x) n x + + x n n log x + + x n n where the equality is attained only at x = (/n,...,

More information

Lagrange Relaxation and Duality

Lagrange Relaxation and Duality Lagrange Relaxation and Duality As we have already known, constrained optimization problems are harder to solve than unconstrained problems. By relaxation we can solve a more difficult problem by a simpler

More information

arxiv: v2 [math.oc] 1 Jul 2015

arxiv: v2 [math.oc] 1 Jul 2015 A Family of Subgradient-Based Methods for Convex Optimization Problems in a Unifying Framework Masaru Ito (ito@is.titech.ac.jp) Mituhiro Fukuda (mituhiro@is.titech.ac.jp) arxiv:403.6526v2 [math.oc] Jul

More information

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä New Proximal Bundle Method for Nonsmooth DC Optimization TUCS Technical Report No 1130, February 2015 New Proximal Bundle Method for Nonsmooth

More information

Gradient Sliding for Composite Optimization

Gradient Sliding for Composite Optimization Noname manuscript No. (will be inserted by the editor) Gradient Sliding for Composite Optimization Guanghui Lan the date of receipt and acceptance should be inserted later Abstract We consider in this

More information

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This

More information

Applications of Linear Programming

Applications of Linear Programming Applications of Linear Programming lecturer: András London University of Szeged Institute of Informatics Department of Computational Optimization Lecture 9 Non-linear programming In case of LP, the goal

More information

A Greedy Framework for First-Order Optimization

A Greedy Framework for First-Order Optimization A Greedy Framework for First-Order Optimization Jacob Steinhardt Department of Computer Science Stanford University Stanford, CA 94305 jsteinhardt@cs.stanford.edu Jonathan Huggins Department of EECS Massachusetts

More information

Algorithms for Nonsmooth Optimization

Algorithms for Nonsmooth Optimization Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization

More information

Solving Dual Problems

Solving Dual Problems Lecture 20 Solving Dual Problems We consider a constrained problem where, in addition to the constraint set X, there are also inequality and linear equality constraints. Specifically the minimization problem

More information

Efficient Methods for Stochastic Composite Optimization

Efficient Methods for Stochastic Composite Optimization Efficient Methods for Stochastic Composite Optimization Guanghui Lan School of Industrial and Systems Engineering Georgia Institute of Technology, Atlanta, GA 3033-005 Email: glan@isye.gatech.edu June

More information

Convergence rate of inexact proximal point methods with relative error criteria for convex optimization

Convergence rate of inexact proximal point methods with relative error criteria for convex optimization Convergence rate of inexact proximal point methods with relative error criteria for convex optimization Renato D. C. Monteiro B. F. Svaiter August, 010 Revised: December 1, 011) Abstract In this paper,

More information

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,

More information

LIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS

LIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS LIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS Napsu Karmitsa 1 Marko M. Mäkelä 2 Department of Mathematics, University of Turku, FI-20014 Turku,

More information

Conditional Gradient (Frank-Wolfe) Method

Conditional Gradient (Frank-Wolfe) Method Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

More information

3.10 Lagrangian relaxation

3.10 Lagrangian relaxation 3.10 Lagrangian relaxation Consider a generic ILP problem min {c t x : Ax b, Dx d, x Z n } with integer coefficients. Suppose Dx d are the complicating constraints. Often the linear relaxation and the

More information

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties

One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties Fedor S. Stonyakin 1 and Alexander A. Titov 1 V. I. Vernadsky Crimean Federal University, Simferopol,

More information

An inexact subgradient algorithm for Equilibrium Problems

An inexact subgradient algorithm for Equilibrium Problems Volume 30, N. 1, pp. 91 107, 2011 Copyright 2011 SBMAC ISSN 0101-8205 www.scielo.br/cam An inexact subgradient algorithm for Equilibrium Problems PAULO SANTOS 1 and SUSANA SCHEIMBERG 2 1 DM, UFPI, Teresina,

More information

Additional Homework Problems

Additional Homework Problems Additional Homework Problems Robert M. Freund April, 2004 2004 Massachusetts Institute of Technology. 1 2 1 Exercises 1. Let IR n + denote the nonnegative orthant, namely IR + n = {x IR n x j ( ) 0,j =1,...,n}.

More information

Stochastic and online algorithms

Stochastic and online algorithms Stochastic and online algorithms stochastic gradient method online optimization and dual averaging method minimizing finite average Stochastic and online optimization 6 1 Stochastic optimization problem

More information

arxiv: v1 [math.oc] 21 Apr 2016

arxiv: v1 [math.oc] 21 Apr 2016 Accelerated Douglas Rachford methods for the solution of convex-concave saddle-point problems Kristian Bredies Hongpeng Sun April, 06 arxiv:604.068v [math.oc] Apr 06 Abstract We study acceleration and

More information

A Sparsity Preserving Stochastic Gradient Method for Composite Optimization

A Sparsity Preserving Stochastic Gradient Method for Composite Optimization A Sparsity Preserving Stochastic Gradient Method for Composite Optimization Qihang Lin Xi Chen Javier Peña April 3, 11 Abstract We propose new stochastic gradient algorithms for solving convex composite

More information

Duality Theory of Constrained Optimization

Duality Theory of Constrained Optimization Duality Theory of Constrained Optimization Robert M. Freund April, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 2 1 The Practical Importance of Duality Duality is pervasive

More information

Stochastic model-based minimization under high-order growth

Stochastic model-based minimization under high-order growth Stochastic model-based minimization under high-order growth Damek Davis Dmitriy Drusvyatskiy Kellie J. MacPhee Abstract Given a nonsmooth, nonconvex minimization problem, we consider algorithms that iteratively

More information

Bregman Divergence and Mirror Descent

Bregman Divergence and Mirror Descent Bregman Divergence and Mirror Descent Bregman Divergence Motivation Generalize squared Euclidean distance to a class of distances that all share similar properties Lots of applications in machine learning,

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

arxiv: v4 [math.oc] 5 Jan 2016

arxiv: v4 [math.oc] 5 Jan 2016 Restarted SGD: Beating SGD without Smoothness and/or Strong Convexity arxiv:151.03107v4 [math.oc] 5 Jan 016 Tianbao Yang, Qihang Lin Department of Computer Science Department of Management Sciences The

More information

The Frank-Wolfe Algorithm:

The Frank-Wolfe Algorithm: The Frank-Wolfe Algorithm: New Results, and Connections to Statistical Boosting Paul Grigas, Robert Freund, and Rahul Mazumder http://web.mit.edu/rfreund/www/talks.html Massachusetts Institute of Technology

More information

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 30 Notation f : H R { } is a closed proper convex function domf := {x R n

More information

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING YANGYANG XU Abstract. Motivated by big data applications, first-order methods have been extremely

More information

On the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems

On the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems MATHEMATICS OF OPERATIONS RESEARCH Vol. 35, No., May 010, pp. 84 305 issn 0364-765X eissn 156-5471 10 350 084 informs doi 10.187/moor.1090.0440 010 INFORMS On the Power of Robust Solutions in Two-Stage

More information

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R

More information

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented

More information

Dual and primal-dual methods

Dual and primal-dual methods ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method

More information

Algorithmic models of market equilibrium

Algorithmic models of market equilibrium CORE DISCUSSION PAPER 2013/66 Algorithmic models of market equilibrium Yu. Nesterov and V. Shikhman November 26, 2013 Abstract In this paper we suggest a new framework for constructing mathematical models

More information

On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean

On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean Renato D.C. Monteiro B. F. Svaiter March 17, 2009 Abstract In this paper we analyze the iteration-complexity

More information

Convex Optimization and Modeling

Convex Optimization and Modeling Convex Optimization and Modeling Duality Theory and Optimality Conditions 5th lecture, 12.05.2010 Jun.-Prof. Matthias Hein Program of today/next lecture Lagrangian and duality: the Lagrangian the dual

More information

Online First-Order Framework for Robust Convex Optimization

Online First-Order Framework for Robust Convex Optimization Online First-Order Framework for Robust Convex Optimization Nam Ho-Nguyen 1 and Fatma Kılınç-Karzan 1 1 Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA, 15213, USA. July 20, 2016;

More information

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex concave saddle-point problems

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex concave saddle-point problems Optimization Methods and Software ISSN: 1055-6788 (Print) 1029-4937 (Online) Journal homepage: http://www.tandfonline.com/loi/goms20 An accelerated non-euclidean hybrid proximal extragradient-type algorithm

More information

Lecture 24 November 27

Lecture 24 November 27 EE 381V: Large Scale Optimization Fall 01 Lecture 4 November 7 Lecturer: Caramanis & Sanghavi Scribe: Jahshan Bhatti and Ken Pesyna 4.1 Mirror Descent Earlier, we motivated mirror descent as a way to improve

More information

Mathematics for Economists

Mathematics for Economists Mathematics for Economists Victor Filipe Sao Paulo School of Economics FGV Metric Spaces: Basic Definitions Victor Filipe (EESP/FGV) Mathematics for Economists Jan.-Feb. 2017 1 / 34 Definitions and Examples

More information

Subgradient methods for huge-scale optimization problems

Subgradient methods for huge-scale optimization problems Subgradient methods for huge-scale optimization problems Yurii Nesterov, CORE/INMA (UCL) May 24, 2012 (Edinburgh, Scotland) Yu. Nesterov Subgradient methods for huge-scale problems 1/24 Outline 1 Problems

More information

More First-Order Optimization Algorithms

More First-Order Optimization Algorithms More First-Order Optimization Algorithms Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 3, 8, 3 The SDM

More information

A convergence result for an Outer Approximation Scheme

A convergence result for an Outer Approximation Scheme A convergence result for an Outer Approximation Scheme R. S. Burachik Engenharia de Sistemas e Computação, COPPE-UFRJ, CP 68511, Rio de Janeiro, RJ, CEP 21941-972, Brazil regi@cos.ufrj.br J. O. Lopes Departamento

More information

Solving generalized semi-infinite programs by reduction to simpler problems.

Solving generalized semi-infinite programs by reduction to simpler problems. Solving generalized semi-infinite programs by reduction to simpler problems. G. Still, University of Twente January 20, 2004 Abstract. The paper intends to give a unifying treatment of different approaches

More information

Sparse Regularization via Convex Analysis

Sparse Regularization via Convex Analysis Sparse Regularization via Convex Analysis Ivan Selesnick Electrical and Computer Engineering Tandon School of Engineering New York University Brooklyn, New York, USA 29 / 66 Convex or non-convex: Which

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

Introduction to Nonlinear Stochastic Programming

Introduction to Nonlinear Stochastic Programming School of Mathematics T H E U N I V E R S I T Y O H F R G E D I N B U Introduction to Nonlinear Stochastic Programming Jacek Gondzio Email: J.Gondzio@ed.ac.uk URL: http://www.maths.ed.ac.uk/~gondzio SPS

More information

Selected Methods for Modern Optimization in Data Analysis Department of Statistics and Operations Research UNC-Chapel Hill Fall 2018

Selected Methods for Modern Optimization in Data Analysis Department of Statistics and Operations Research UNC-Chapel Hill Fall 2018 Selected Methods for Modern Optimization in Data Analysis Department of Statistics and Operations Research UNC-Chapel Hill Fall 08 Instructor: Quoc Tran-Dinh Scriber: Quoc Tran-Dinh Lecture 4: Selected

More information

arxiv: v1 [math.oc] 13 Dec 2018

arxiv: v1 [math.oc] 13 Dec 2018 A NEW HOMOTOPY PROXIMAL VARIABLE-METRIC FRAMEWORK FOR COMPOSITE CONVEX MINIMIZATION QUOC TRAN-DINH, LIANG LING, AND KIM-CHUAN TOH arxiv:8205243v [mathoc] 3 Dec 208 Abstract This paper suggests two novel

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of

More information

Douglas-Rachford splitting for nonconvex feasibility problems

Douglas-Rachford splitting for nonconvex feasibility problems Douglas-Rachford splitting for nonconvex feasibility problems Guoyin Li Ting Kei Pong Jan 3, 015 Abstract We adapt the Douglas-Rachford DR) splitting method to solve nonconvex feasibility problems by studying

More information

Online Convex Optimization

Online Convex Optimization Advanced Course in Machine Learning Spring 2010 Online Convex Optimization Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz A convex repeated game is a two players game that is performed

More information

January 29, Introduction to optimization and complexity. Outline. Introduction. Problem formulation. Convexity reminder. Optimality Conditions

January 29, Introduction to optimization and complexity. Outline. Introduction. Problem formulation. Convexity reminder. Optimality Conditions Olga Galinina olga.galinina@tut.fi ELT-53656 Network Analysis Dimensioning II Department of Electronics Communications Engineering Tampere University of Technology, Tampere, Finl January 29, 2014 1 2 3

More information

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming Zhaosong Lu Lin Xiao March 9, 2015 (Revised: May 13, 2016; December 30, 2016) Abstract We propose

More information

LAGRANGIAN TRANSFORMATION IN CONVEX OPTIMIZATION

LAGRANGIAN TRANSFORMATION IN CONVEX OPTIMIZATION LAGRANGIAN TRANSFORMATION IN CONVEX OPTIMIZATION ROMAN A. POLYAK Abstract. We introduce the Lagrangian Transformation(LT) and develop a general LT method for convex optimization problems. A class Ψ of

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

arxiv: v1 [math.oc] 5 Dec 2014

arxiv: v1 [math.oc] 5 Dec 2014 FAST BUNDLE-LEVEL TYPE METHODS FOR UNCONSTRAINED AND BALL-CONSTRAINED CONVEX OPTIMIZATION YUNMEI CHEN, GUANGHUI LAN, YUYUAN OUYANG, AND WEI ZHANG arxiv:141.18v1 [math.oc] 5 Dec 014 Abstract. It has been

More information

Maximal Monotone Inclusions and Fitzpatrick Functions

Maximal Monotone Inclusions and Fitzpatrick Functions JOTA manuscript No. (will be inserted by the editor) Maximal Monotone Inclusions and Fitzpatrick Functions J. M. Borwein J. Dutta Communicated by Michel Thera. Abstract In this paper, we study maximal

More information

ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE. Sangho Kum and Gue Myung Lee. 1. Introduction

ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE. Sangho Kum and Gue Myung Lee. 1. Introduction J. Korean Math. Soc. 38 (2001), No. 3, pp. 683 695 ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE Sangho Kum and Gue Myung Lee Abstract. In this paper we are concerned with theoretical properties

More information

Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version

Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version Convex Optimization Theory Chapter 5 Exercises and Solutions: Extended Version Dimitri P. Bertsekas Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com

More information

Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization

Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization Noname manuscript No. (will be inserted by the editor) Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization Saeed Ghadimi Guanghui Lan Hongchao Zhang the date of

More information

Lagrange duality. The Lagrangian. We consider an optimization program of the form

Lagrange duality. The Lagrangian. We consider an optimization program of the form Lagrange duality Another way to arrive at the KKT conditions, and one which gives us some insight on solving constrained optimization problems, is through the Lagrange dual. The dual is a maximization

More information

Homework 4. Convex Optimization /36-725

Homework 4. Convex Optimization /36-725 Homework 4 Convex Optimization 10-725/36-725 Due Friday November 4 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming

Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming Mathematical Programming manuscript No. (will be inserted by the editor) Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming Guanghui Lan Zhaosong Lu Renato D. C. Monteiro

More information

Regret bounded by gradual variation for online convex optimization

Regret bounded by gradual variation for online convex optimization Noname manuscript No. will be inserted by the editor Regret bounded by gradual variation for online convex optimization Tianbao Yang Mehrdad Mahdavi Rong Jin Shenghuo Zhu Received: date / Accepted: date

More information

Computing closest stable non-negative matrices

Computing closest stable non-negative matrices Computing closest stable non-negative matrices Yu. Nesterov and V.Yu. Protasov August 22, 2017 Abstract Problem of finding the closest stable matrix for a dynamical system has many applications. It is

More information

WE consider an undirected, connected network of n

WE consider an undirected, connected network of n On Nonconvex Decentralized Gradient Descent Jinshan Zeng and Wotao Yin Abstract Consensus optimization has received considerable attention in recent years. A number of decentralized algorithms have been

More information

Assignment 1: From the Definition of Convexity to Helley Theorem

Assignment 1: From the Definition of Convexity to Helley Theorem Assignment 1: From the Definition of Convexity to Helley Theorem Exercise 1 Mark in the following list the sets which are convex: 1. {x R 2 : x 1 + i 2 x 2 1, i = 1,..., 10} 2. {x R 2 : x 2 1 + 2ix 1x

More information

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus 1/41 Subgradient Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes definition subgradient calculus duality and optimality conditions directional derivative Basic inequality

More information