Primal-dual subgradient methods for convex problems
Yu. Nesterov

March 2002, September 2005 (after revision)

Abstract

In this paper we present a new approach for constructing subgradient schemes for different types of nonsmooth problems with convex structure. Our methods are primal-dual since they are always able to generate a feasible approximation to the optimum of an appropriately formulated dual problem. Besides other advantages, this useful feature provides the methods with a reliable stopping criterion. The proposed schemes differ from the classical approaches (divergent-series methods, mirror descent methods) by the presence of two control sequences. The first sequence is responsible for aggregating the support functions in the dual space, and the second one establishes a dynamically updated scale between the primal and dual spaces. This additional flexibility allows us to guarantee boundedness of the sequence of primal test points even in the case of an unbounded feasible set (however, we always assume uniform boundedness of the subgradients). We present variants of subgradient schemes for nonsmooth convex minimization, minimax problems, saddle point problems, variational inequalities, and stochastic optimization. In all situations our methods are proved to be optimal from the viewpoint of worst-case black-box lower complexity bounds.

Keywords: convex optimization, subgradient methods, non-smooth optimization, minimax problems, saddle points, variational inequalities, stochastic optimization, black-box methods, lower complexity bounds.

CORE Discussion Paper # 2005/67. Center for Operations Research and Econometrics (CORE), Catholic University of Louvain (UCL), 34 voie du Roman Pays, 1348 Louvain-la-Neuve, Belgium; nesterov@core.ucl.ac.be. This paper presents research results of the Belgian Program on Interuniversity Attraction Poles, initiated by the Belgian Federal Science Policy Office. The scientific responsibility rests with its author.
1 Introduction

1.1 Prehistory

The results presented in this paper are not very new. Most of them were obtained by the author in 2001-2002. However, a further purification of the developed framework led to rather surprising results related to the smoothing technique. Namely, in [3] it was shown that many nonsmooth convex minimization problems with an appropriate explicit structure can be solved up to absolute accuracy ϵ in O(1/ϵ) iterations of special gradient-type methods. Recall that the exact lower complexity bound for any black-box subgradient scheme was established on the level of O(1/ϵ²) iterations (see [ ], or [2], Section 3.1, for a more recent exposition). Thus, in [3] it was shown that the gradient-type methods of Structural Optimization outperform the black-box ones by an order of magnitude. At that moment, the author had the illusion that the importance of the black-box approach in Convex Optimization would irreversibly vanish and that this approach would finally be completely replaced by other ones based on a clever use of the problem's structure (interior-point methods, smoothing, etc.). This explains why the results included in this paper were not published at the time. However, the developments of the last years have clearly demonstrated that in some situations the black-box methods are irreplaceable. Indeed, the structure of a convex problem may be too complex for constructing a good self-concordant barrier or for applying a smoothing technique. Note also that optimization schemes are sometimes employed for modelling certain adjustment processes in real-life systems. In this situation, we are not free in selecting the type of optimization scheme or the choice of its parameters. However, the results on convergence and the rate of convergence of the corresponding methods remain interesting. These considerations encouraged the author to publish the above-mentioned results on primal-dual subgradient methods for nonsmooth convex problems.
Note that some elements of the developed technique were used by the author later on in different papers related to the smoothing approach (see [3], [4], [5]). Hence, by proper referencing we have tried to shorten this paper.

1.2 Motivation

Historically, a subgradient method with constant step was the first numerical scheme suggested for approaching a solution to the optimization problem

  min_x { f(x) : x ∈ R^n }   (1.1)

with nonsmooth convex objective function f (see [9] for historical remarks). In a convergent variant of this method [7, 8] we need to choose in advance a sequence of steps {λ_k}_{k=0}^∞ satisfying the divergent-series rule:

  λ_k > 0,  λ_k → 0,  Σ_{k=0}^∞ λ_k = ∞.   (1.2)

(This unexpected observation also resulted in a series of papers of different authors related to improved methods for variational inequalities [10], [5], and [ ].)
Then, for k ≥ 0 we can iterate:

  x_{k+1} = x_k − λ_k g_k,  k ≥ 0,   (1.3)

with some g_k ∈ ∂f(x_k). In another variant of this scheme, the step direction is normalized:

  x_{k+1} = x_k − λ_k g_k / ‖g_k‖_2,  k ≥ 0,   (1.4)

where ‖·‖_2 denotes the standard Euclidean norm in R^n, introduced by the inner product ⟨x, y⟩ = Σ_{j=1}^n x^(j) y^(j), x, y ∈ R^n.

Since the objective function f is nonsmooth, we cannot expect its subgradients to vanish in a neighborhood of an optimal solution to (1.1). Hence, the condition λ_k → 0 is necessary for convergence of the processes (1.3) or (1.4). On the other hand, assuming that ‖g_k‖_2 ≤ L, for any x ∈ R^n we get from (1.3)

  ‖x − x_{k+1}‖_2² = ‖x − x_k‖_2² − 2λ_k ⟨g_k, x − x_k⟩ + λ_k² ‖g_k‖_2²
                   ≤ ‖x − x_k‖_2² − 2λ_k ⟨g_k, x − x_k⟩ + λ_k² L².

Hence, for any x ∈ R^n with (1/2)‖x − x_0‖_2² ≤ D,

  f(x) ≥ l_k(x) def= Σ_{i=0}^k λ_i [f(x_i) + ⟨g_i, x − x_i⟩] / Σ_{i=0}^k λ_i
       ≥ { Σ_{i=0}^k λ_i f(x_i) − D − (1/2) L² Σ_{i=0}^k λ_i² } / Σ_{i=0}^k λ_i.   (1.5)

Thus, denoting

  f*_D = min_x { f(x) : (1/2)‖x − x_0‖_2² ≤ D },
  f̄_k = Σ_{i=0}^k λ_i f(x_i) / Σ_{i=0}^k λ_i,
  ω_k = ( 2D + L² Σ_{i=0}^k λ_i² ) / ( 2 Σ_{i=0}^k λ_i ),

we conclude that f̄_k − f*_D ≤ ω_k. Note that the conditions (1.2) are necessary and sufficient for ω_k → 0.

In the above analysis, convergence of the process (1.3) is based on the fact that the derivative of l_k(x), the lower linear model of the objective function, is vanishing. This model is very important since it also provides us with a reliable stopping criterion. However, examining the structure of the linear function l_k(·), we can observe a very strange feature:

  New subgradients enter the model with decreasing weights.   (1.6)

This feature contradicts common sense. It also contradicts the general principles of iterative schemes, in accordance with which new information is more important than old. Thus, something is wrong. Unfortunately, in our situation a simple treatment is hardly
possible: we have seen that decreasing weights (steps) are necessary for convergence of the primal sequence {x_k}_{k=0}^∞.

The above contradiction served as a point of departure for the developments presented in this paper. The proposed alternative looks quite natural. Indeed, we have seen that in the primal space it is necessary to have a vanishing sequence of steps, but in the dual space (the space of linear functions) we would like to apply non-decreasing weights. Consequently, we need two different sequences of parameters, each of which is responsible for certain processes in the primal and dual spaces.

The idea to relate the primal minimization sequence to a master process existing in the dual space is not new. It was implemented first in the mirror descent methods (see [ ], and [5, 6] for newer versions and historical remarks). However, the divergent series somehow penetrated this approach too. Therefore, in the Euclidean situation the mirror descent method coincides with the subgradient one and, consequently, shares the drawback (1.6). In our approach, we managed to improve the situation. Of course, it was impossible to improve the dependence of the complexity estimates on ϵ. However, the improvement in the constant is significant: the best step-size strategy in [5], [6] gives the same (infinite) constant in the estimate of the rate of convergence as the worst strategy for the new schemes (see Section 8 for discussion).

In this paper we consider primal-dual subgradient schemes. It seems that this intrinsic feature of all subgradient methods has not yet been recognized explicitly. Consider, for example, the scheme (1.3). From inequality (1.5), it is clear that

  f*_D ≥ f̂_k(D) def= min_x { l_k(x) : (1/2)‖x − x_0‖_2² ≤ D }.

Note that the value f̂_k(D) can be easily computed. On the other hand, in Convex Optimization there is only one way to get a lower bound for the optimal value of a minimization problem. For that, we need to find a feasible solution to a certain dual problem.
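The classical divergent-series process (1.3) and the estimate f̄_k − f*_D ≤ ω_k can be checked numerically. Below is a minimal sketch; the test function f(x) = ‖x‖₁ and the step constants are illustrative choices, not taken from the paper.

```python
import numpy as np

def subgrad(x):                          # an element of df(x) for f(x) = ||x||_1
    return np.sign(x) + (x == 0)

x = np.array([2.0, -1.0])
x0 = x.copy()
L = np.sqrt(2.0)                         # ||g||_2 <= sqrt(n) for this f
D = 0.5 * np.dot(x0, x0)                 # x* = 0 satisfies (1/2)||x* - x0||^2 <= D
num = den = lam2 = 0.0
for k in range(5000):
    lam = 1.0 / (L * np.sqrt(k + 1))     # steps satisfying the rule (1.2)
    num += lam * np.sum(np.abs(x))       # accumulate lam_i * f(x_i)
    den += lam
    lam2 += lam * lam
    x = x - lam * subgrad(x)             # iteration (1.3)

f_bar = num / den                        # weighted average of f(x_i)
omega = (2 * D + L**2 * lam2) / (2 * den)
print(f_bar, omega)                      # f_bar - f*_D is bounded by omega
```

Here f*_D = 0, so the printed average is itself bounded by ω_k, which decays like O(log k / √k) for this step-size choice.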
Thus, computability of f̂_k(D) implies that we are able to point out a dual solution. And convergence of the primal-dual gap f̄_k − f̂_k(D) to zero implies that the dual solution approaches the optimal one. 2 Below, we will discuss in detail the meaning of the dual solutions generated by the proposed schemes.

Finally, note that the subgradient schemes proposed in this paper are different from the standard search methods. 3 For example, for problem (1.1) we suggest to use the following process:

  x_{k+1} = arg min_{x∈R^n} { (1/(k+1)) Σ_{i=0}^k [f(x_i) + ⟨g_i, x − x_i⟩] + μ_k ‖x − x_0‖_2² },   (1.7)

where μ_k = O(1/√k) → 0. Note that in this scheme no artificial elements are needed to ensure the boundedness of the sequence of test points.

2 Depending on the available information on the structure of the problem, this dual problem can be posed in different forms. Therefore, we sometimes prefer to call such problems the adjoint ones.
3 We mean the methods described in terms of steps in the primal space. The alternative is the mirror descent approach [ ], in accordance with which the main process is arranged in the dual space.
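For the unconstrained Euclidean case, the auxiliary problem in (1.7) has a closed-form solution, x_{k+1} = x_0 − (1/(2 μ_k (k+1))) Σ_{i=0}^k g_i, so the process is cheap to run. A minimal sketch, with an illustrative test function f(x) = max_j |x^(j)| and μ_k = 1/√(k+1) (both assumptions, not from the paper):

```python
import numpy as np

def f(x):
    return np.max(np.abs(x))

def subgrad(x):                          # one subgradient of f at x
    j = int(np.argmax(np.abs(x)))
    g = np.zeros_like(x)
    g[j] = 1.0 if x[j] >= 0 else -1.0
    return g

x0 = np.array([3.0, -2.0, 1.0])
x, s = x0.copy(), np.zeros_like(x0)
x_sum = np.zeros_like(x0)
N = 20000
for k in range(N):
    x_sum += x
    s += subgrad(x)                      # aggregated subgradients (weight 1)
    mu = 1.0 / np.sqrt(k + 1.0)          # mu_k = O(1/sqrt(k)) -> 0
    x = x0 - s / (2.0 * mu * (k + 1))    # closed-form minimizer in (1.7)

f_avg = f(x_sum / N)
print(f_avg)                             # approaches min f = 0
```

Note that, exactly as claimed in the text, no projection or step-size truncation is needed: the quadratic term alone keeps the test points bounded.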
1.3 Contents

The paper is organized as follows. In Section 2 we consider a general scheme of Dual Averaging (DA) and prove the upper bounds for the corresponding gap functions. These bounds will be specified later on for the particular problem classes. We also give two main variants of DA-methods, the method of Simple Dual Averages (SDA) and the method of Weighted Dual Averages. In Section 3 we apply DA-methods to minimization over simple sets. In Section 3.1 we consider different forms of DA-methods as applied to a general minimization problem. The main goal of Section 3 is to show that all DA-methods are primal-dual. For that, we always consider a kind of dual problem and point out the sequence which converges to its solution. In Section 3.2 this is done for the general unconstrained minimization problem. In Section 3.3 we do that for minimax problems. It is interesting that the approximations of the dual multipliers in this case are proportional to the number of times the corresponding functional components were active during the minimization process (compare with [3]). Finally, in Section 3.4 we consider primal-dual schemes for solving minimization problems with simple functional constraints. We manage to obtain for such problems an approximate primal-dual solution despite the fact that usually even the complexity of computing the value of the dual objective function is comparable with the complexity of the initial problem. In Section 4 we apply the SDA-method to saddle point problems, and in Section 5 it is applied to variational inequalities. It is important that for solvable problems the SDA-method generates a bounded sequence even for an unbounded feasible set. In Section 6 we consider a stochastic version of the SDA-method. The output of this method can be seen as a random point (repeated runs of the algorithm give different results).
However, we prove that the expected value of the objective function at the output points converges to the optimal value of the stochastic optimization problem with a certain rate. Finally, in Section 7 we consider two applications of DA-methods to modelling. In Section 7.1 it is shown that a natural adjustment strategy (which we call balanced development) can be seen as an implementation of a DA-method. Hence, it is possible to prove that in the limit this process converges to a solution of some optimization problem. In Section 7.2, for a multi-period optimization problem we compare the efficiency of static strategies based on preliminary planning with a certain dynamic adjustment process (based on the SDA-method). It is clear that the dynamic adjustment is computationally much cheaper. On the other hand, we show that its results are comparable with the results of preliminary planning based on complete knowledge of the future. We conclude the paper with a short discussion of the results in Section 8. In the Appendix we put the proof of the only result on strongly convex functions for which we did not find an appropriate reference in the comprehensive monographs [8] and [7].

1.4 Notations and generalities

Let E be a finite-dimensional real vector space and E* be its dual. We denote the value of a linear function s ∈ E* at x ∈ E by ⟨s, x⟩. For measuring distances in E, let us fix some (primal) norm ‖·‖. This norm defines a system of primal balls:

  B_r(x) = { y ∈ E : ‖y − x‖ ≤ r }.
The dual norm on E* is introduced, as usual, by

  ‖s‖* = max_x { ⟨s, x⟩ : x ∈ B_1(0) },  s ∈ E*.

Let Q be a closed convex set in E. Assume that we know a prox-function d(x) of the set Q. This means that d(x) is a continuous function with domain belonging to Q, which is strongly convex on Q with respect to ‖·‖:

  d(αx + (1−α)y) ≤ αd(x) + (1−α)d(y) − (1/2)σα(1−α)‖x − y‖²,  x, y ∈ Q, α ∈ [0, 1],   (1.8)

where σ > 0 is the convexity parameter. Denote by x_0 the prox-center of the set Q:

  x_0 = arg min_x { d(x) : x ∈ Q }.   (1.9)

Without loss of generality, we assume that d(x_0) = 0. In view of Lemma 6 in the Appendix, the prox-center is well-defined and

  d(x) ≥ (1/2)σ‖x − x_0‖²,  x ∈ Q.

An important example is the standard Euclidean norm:

  ‖x‖ = ‖x‖_2 = ⟨x, x⟩^{1/2},  d(x) = (1/2)‖x − x_0‖_2²,  σ = 1,

with some x_0 ∈ Q. For another example, consider the ℓ_1-norm:

  ‖x‖ = ‖x‖_1 = Σ_{i=1}^n |x^(i)|.   (1.10)

Define Q = Δ_n def= { x ∈ R^n_+ : Σ_{i=1}^n x^(i) = 1 }. Then the entropy function

  d(x) = ln n + Σ_{i=1}^n x^(i) ln x^(i)

is strongly convex on Q with σ = 1 and x_0 = (1/n, ..., 1/n)^T (see Lemma 3 in [3]).

2 Main algorithmic schemes

Let Q be a closed convex set in E endowed with a prox-function d(x). We allow Q to be unbounded (for example, Q ≡ E). For our analysis we need to define two support-type functions of the set Q:

  ξ_D(s) = max_x { ⟨s, x − x_0⟩ : x ∈ Q, d(x) ≤ D },
  V_β(s) = max_x { ⟨s, x − x_0⟩ − βd(x) : x ∈ Q },   (2.1)

where D ≥ 0 and β > 0 are some parameters. The first function is the usual support function of the set

  F_D = { x ∈ Q : d(x) ≤ D }.
The second one is a proximal-type approximation of the support function of the set Q. Since d(·) is strongly convex, for any positive D and β we have dom ξ_D = dom V_β = E*. Note that both functions are nonnegative. Let us mention some properties of the function V_β(·). If β_2 ≥ β_1 > 0, then for any s ∈ E* we have

  V_{β_2}(s) ≤ V_{β_1}(s).   (2.2)

Note that the level of smoothness of the function V_β(·) is controlled by the parameter β.

Lemma 1. The function V_β(·) is convex and differentiable on E*. Moreover, its gradient is Lipschitz continuous with constant 1/(βσ):

  ‖∇V_β(s_1) − ∇V_β(s_2)‖ ≤ (1/(βσ)) ‖s_1 − s_2‖*,  s_1, s_2 ∈ E*.   (2.3)

For any s ∈ E*, the point π_β(s) belongs to Q and

  ∇V_β(s) = π_β(s) − x_0,  π_β(s) def= arg max_{x∈Q} { ⟨s, x − x_0⟩ − βd(x) }.   (2.4)

(This is a particular case of Theorem 1 in [3].)

As a trivial corollary of (2.3) we get the following inequality:

  V_β(s + δ) ≤ V_β(s) + ⟨δ, ∇V_β(s)⟩ + (1/(2σβ)) ‖δ‖*²,  s, δ ∈ E*.   (2.5)

Note that in view of definition (1.9) we have π_β(0) = x_0. This implies V_β(0) = 0 and ∇V_β(0) = 0. Thus, in this case inequality (2.5) with s = 0 yields

  V_β(δ) ≤ (1/(2σβ)) ‖δ‖*²,  δ ∈ E*.   (2.6)

In the sequel, we assume that the set Q is simple enough for computing the vector π_β(s) exactly. For our analysis we need the following relation between the functions (2.1).

Lemma 2. For any s ∈ E* and β ≥ 0 we have

  ξ_D(s) ≤ βD + V_β(s).   (2.7)

Proof: Indeed,

  ξ_D(s) = max_{x∈Q} { ⟨s, x − x_0⟩ : d(x) ≤ D }
         = max_{x∈Q} min_{β≥0} { ⟨s, x − x_0⟩ + β[D − d(x)] }
         ≤ min_{β≥0} max_{x∈Q} { ⟨s, x − x_0⟩ + β[D − d(x)] }
         ≤ βD + V_β(s).  □

Consider now the sequences

  X_k = {x_i}_{i=0}^k ⊂ Q,  G_k = {g_i}_{i=0}^k ⊂ E*,  Λ_k = {λ_i}_{i=0}^k ⊂ R_+.
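For the entropy prox-function on the simplex (the second example above), the mapping π_β(s) of (2.4) is available in closed form: maximizing ⟨s, x⟩ − βd(x) over Δ_n gives the softmax x^(i) ∝ exp(s^(i)/β). A minimal sketch of this computation:

```python
import numpy as np

def prox_entropy(s, beta):
    # pi_beta(s) = argmax_{x in simplex} { <s, x> - beta * d(x) } for the
    # entropy prox-function d(x) = ln n + sum_i x_i ln x_i: a softmax.
    z = s / beta
    z -= z.max()                  # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

s = np.array([1.0, 2.0, 3.0])
x = prox_entropy(s, beta=0.5)
print(x.sum())                    # the result lies in the simplex
```

As β grows, the mapping flattens toward the prox-center x_0 = (1/n, ..., 1/n)^T, which is exactly the smoothing role of β described above.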
Typically, the test points x_i and the weights λ_i are generated by some algorithmic scheme, and the points g_i are computed by a black-box oracle G(·) related to a specific convex problem:

  g_i = G(x_i),  i ≥ 0.

In this paper we consider only the problem instances for which there exists a point x* ∈ Q satisfying the condition

  ⟨g, x − x*⟩ ≥ 0,  g = G(x),  x ∈ Q.   (2.8)

We call such x* the primal solution of our problem. At the same time, we are going to approximate a certain dual solution by means of the observed subgradients. In the sequel, we will explain the actual sense of this dual object for each particular problem class. We are going to approximate the primal and dual solutions of our problem using the following aggregates:

  S_{k+1} = Σ_{i=0}^k λ_i,  x̂_{k+1} = (1/S_{k+1}) Σ_{i=0}^k λ_i x_i,
  s_{k+1} = Σ_{i=0}^k λ_i g_i,  ŝ_{k+1} = (1/S_{k+1}) s_{k+1},   (2.9)

with x̂_0 = x_0 and s_0 = 0. As we will see later, the quality of the test sequence X_k can be naturally described by the following gap function:

  δ_k(D) = max_x { Σ_{i=0}^k λ_i ⟨g_i, x_i − x⟩ : x ∈ F_D },  D ≥ 0.   (2.10)

Using notation (2.9), we get an explicit representation of the gap:

  δ_k(D) = Σ_{i=0}^k λ_i ⟨g_i, x_i − x_0⟩ + ξ_D(−s_{k+1}).   (2.11)

Sometimes we will use an upper gap function

  Δ_k(β, D) = βD + Σ_{i=0}^k λ_i ⟨g_i, x_i − x_0⟩ + V_β(−s_{k+1})
            = Σ_{i=0}^k λ_i ⟨g_i, x_i − π_β(−s_{k+1})⟩ + β ( D − d(π_β(−s_{k+1})) ).   (2.12)

In view of (2.7) and (2.11), for any non-negative D and β we have

  δ_k(D) ≤ Δ_k(β, D).   (2.13)

Since Q is a simple set, the values of the gap functions can be easily computed. Note that for some D these values can be negative. However, if a solution x* of our problem exists (in the sense of (2.8)), then for D ≥ d(x*) (⇒ x* ∈ F_D),
the value δ_k(D) is non-negative independently of the sequences X_k, Λ_k and G_k involved in its definition.

Consider now the generic scheme of Dual Averaging (DA-scheme).

  Initialization: Set s_0 = 0 ∈ E*. Choose β_0 > 0.
  Iteration (k ≥ 0):
    1. Compute g_k = G(x_k).
    2. Choose λ_k > 0. Set s_{k+1} = s_k + λ_k g_k.
    3. Choose β_{k+1} ≥ β_k. Set x_{k+1} = π_{β_{k+1}}(−s_{k+1}).   (2.14)

Theorem 1. Let the sequences X_k, G_k and Λ_k be generated by (2.14). Then:

1. For any k ≥ 0 and D ≥ 0 we have:

  δ_k(D) ≤ Δ_k(β_{k+1}, D) ≤ β_{k+1} D + (1/(2σ)) Σ_{i=0}^k (λ_i²/β_i) ‖g_i‖*².   (2.15)

2. Assume that a solution x* in the sense (2.8) exists. Then

  (1/2) σ ‖x_{k+1} − x*‖² ≤ d(x*) + (1/(2σβ_{k+1})) Σ_{i=0}^k (λ_i²/β_i) ‖g_i‖*².   (2.16)

3. Assume that x* is an interior solution: B_r(x*) ⊆ F_D for some positive r and D. Then

  ‖ŝ_{k+1}‖* ≤ (1/(r S_{k+1})) [ β_{k+1} D + (1/(2σ)) Σ_{i=0}^k (λ_i²/β_i) ‖g_i‖*² ].   (2.17)

Proof: 1. In view of (2.13), we need to prove only the second inequality in (2.15). By the rules of the scheme (2.14), for i ≥ 1 we obtain

  V_{β_{i+1}}(−s_{i+1}) ≤ V_{β_i}(−s_{i+1})   (by (2.2))
    ≤ V_{β_i}(−s_i) − λ_i ⟨g_i, ∇V_{β_i}(−s_i)⟩ + (λ_i²/(2σβ_i)) ‖g_i‖*²   (by (2.5))
    = V_{β_i}(−s_i) + λ_i ⟨g_i, x_0 − x_i⟩ + (λ_i²/(2σβ_i)) ‖g_i‖*²   (by (2.4)).

Thus,

  λ_i ⟨g_i, x_i − x_0⟩ ≤ V_{β_i}(−s_i) − V_{β_{i+1}}(−s_{i+1}) + (λ_i²/(2σβ_i)) ‖g_i‖*²,  i = 1, ..., k.
The summation of all these inequalities results in

  Σ_{i=1}^k λ_i ⟨g_i, x_i − x_0⟩ ≤ V_{β_1}(−s_1) − V_{β_{k+1}}(−s_{k+1}) + (1/(2σ)) Σ_{i=1}^k (λ_i²/β_i) ‖g_i‖*².   (2.18)

But in view of (2.6),

  V_{β_1}(−s_1) ≤ (λ_0²/(2σβ_1)) ‖g_0‖*² ≤ (λ_0²/(2σβ_0)) ‖g_0‖*².

Thus, (2.18) results in (2.15).

2. Let us assume that x* exists. Then, by (2.8) and the above estimates,

  0 ≤ Σ_{i=0}^k λ_i ⟨g_i, x_i − x*⟩
    ≤ ⟨s_{k+1}, x_0 − x*⟩ − V_{β_{k+1}}(−s_{k+1}) + (1/(2σ)) Σ_{i=0}^k (λ_i²/β_i) ‖g_i‖*²
    = ⟨s_{k+1}, x_{k+1} − x*⟩ + β_{k+1} d(x_{k+1}) + (1/(2σ)) Σ_{i=0}^k (λ_i²/β_i) ‖g_i‖*²   (by (2.1))
    ≤ β_{k+1} d(x*) − (1/2) β_{k+1} σ ‖x_{k+1} − x*‖² + (1/(2σ)) Σ_{i=0}^k (λ_i²/β_i) ‖g_i‖*²   (by (9.5)),

and that is (2.16).

3. Let us now assume that x* is an interior solution. Then

  δ_k(D) = max_x { Σ_{i=0}^k λ_i ⟨g_i, x_i − x⟩ : x ∈ F_D } ≥ max_x { Σ_{i=0}^k λ_i ⟨g_i, x_i − x⟩ : x ∈ B_r(x*) }
         = Σ_{i=0}^k λ_i ⟨g_i, x_i − x*⟩ + r ‖s_{k+1}‖* ≥ r S_{k+1} ‖ŝ_{k+1}‖*.

Thus, (2.17) follows from (2.15).  □

The form of inequalities (2.15) and (2.16) suggests some natural strategies for choosing the parameters β_i in the scheme (2.14). Let us define the following sequence:

  β̂_0 = β̂_1 = 1,  β̂_{i+1} = β̂_i + 1/β̂_i,  i ≥ 1.   (2.19)

The advantage of this sequence is justified by the following relation:

  β̂_{k+1} = Σ_{i=0}^k 1/β̂_i,  k ≥ 0.

Thus, this sequence can be used for balancing the terms appearing in the right-hand side of inequality (2.15). The growth of the sequence can be estimated as follows.

Lemma 3.

  √(2k−1) ≤ β̂_k ≤ 1/(1+√3) + √(2k−1),  k ≥ 1.   (2.20)

Proof: From (2.19) we have β̂_1 = 1 and β̂²_{k+1} = β̂²_k + 2 + 1/β̂²_k ≥ β̂²_k + 2 for k ≥ 1. This gives the first inequality in (2.20). On the other hand, since the function β + 1/β is increasing for β ≥ 1 and β̂_k ≥ 1, the second inequality in (2.20) can be justified by induction.  □

We will consider two main strategies for choosing λ_i in (2.14):
  Simple averages: λ_k = 1.
  Weighted averages: λ_k = 1/‖g_k‖*.

Let us write down the corresponding algorithmic schemes in an explicit form. Both theorems below are straightforward consequences of Theorem 1.

Method of Simple Dual Averages (SDA):

  Initialization: Set s_0 = 0 ∈ E*. Choose γ > 0.
  Iteration (k ≥ 0):
    1. Compute g_k = G(x_k). Set s_{k+1} = s_k + g_k.
    2. Choose β_{k+1} = γ β̂_{k+1}. Set x_{k+1} = π_{β_{k+1}}(−s_{k+1}).   (2.21)

Theorem 2. Assume that ‖g_k‖* ≤ L, k ≥ 0. For method (2.21) we have S_{k+1} = k+1 and

  δ_k(D) ≤ β̂_{k+1} ( γD + L²/(2σγ) ).

Moreover, if a solution x* in the sense (2.8) exists, then the scheme (2.21) generates a bounded sequence:

  ‖x_k − x*‖² ≤ (2/σ) d(x*) + L²/(σ²γ²),  k ≥ 0.

Method of Weighted Dual Averages (WDA):

  Initialization: Set s_0 = 0 ∈ E*. Choose ρ > 0.
  Iteration (k ≥ 0):
    1. Compute g_k = G(x_k). Set s_{k+1} = s_k + g_k/‖g_k‖*.
    2. Choose β_{k+1} = β̂_{k+1}/(ρ√σ). Set x_{k+1} = π_{β_{k+1}}(−s_{k+1}).   (2.22)

Theorem 3. Assume that ‖g_k‖* ≤ L, k ≥ 0. For method (2.22) we have S_{k+1} ≥ (k+1)/L and

  δ_k(D) ≤ (β̂_{k+1}/√σ) ( D/ρ + ρ/2 ).

Moreover, if a solution x* in the sense (2.8) exists, then the scheme (2.22) generates a bounded sequence:

  ‖x_k − x*‖² ≤ (1/σ) ( 2d(x*) + ρ² ).

In the next sections we show how to apply the above results to different classes of problems with convex structure.
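The SDA-scheme (2.21) can be sketched in a few lines for the Euclidean setup d(x) = (1/2)‖x − x_0‖₂², σ = 1, Q = R^n, where the prox-mapping has the closed form π_β(−s) = x_0 − s/β. The test function f(x) = ‖x‖₁ below is an illustrative choice, not from the paper.

```python
import numpy as np

def G(x):                                  # subgradient oracle for f(x) = ||x||_1
    return np.sign(x) + (x == 0)

x0 = np.array([2.0, -1.5, 0.5])
L = np.sqrt(3.0)                           # ||g||_2 <= sqrt(n) for this f
D = 0.5 * np.dot(x0, x0)                   # d(x*) = D for x* = 0
gamma = L / np.sqrt(2.0 * D)               # the optimal choice gamma = L/sqrt(2*sigma*D)

beta_hat = 1.0                             # beta_hat_1 = 1, sequence (2.19)
s = np.zeros_like(x0)
x, x_sum = x0.copy(), np.zeros_like(x0)
N = 5000
for k in range(N):
    x_sum += x
    s += G(x)                              # lambda_k = 1 (simple averages)
    x = x0 - s / (gamma * beta_hat)        # x_{k+1} = pi_{gamma*beta_hat}(-s)
    beta_hat += 1.0 / beta_hat             # advance the sequence (2.19)

x_avg = x_sum / N                          # the aggregate x_hat of (2.9)
print(np.sum(np.abs(x_avg)))               # approaches min ||x||_1 = 0
```

By Theorem 2 the averaged value is within (β̂_{k+1}/(k+1))(γD + L²/(2γ)) ≈ L√(2D)·√(2/k) of the optimum, and the iterates stay bounded even though Q is the whole space.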
3 Minimization over simple sets

3.1 General minimization problem

Consider the following minimization problem:

  min_x { f(x) : x ∈ Q },   (3.1)

where f is a convex function defined on E and Q is a closed convex set. Recall that we assume Q to be simple, which means computability of exact optimal solutions to both minimization problems in (2.1). In order to solve problem (3.1), we need a black-box oracle which is able to compute a subgradient of the objective function at any test point: 4

  G(x) ∈ ∂f(x),  x ∈ E.

Then we have the following interpretation of the gap function δ_k(D). Denote

  l_k(x) = (1/S_{k+1}) Σ_{i=0}^k λ_i [f(x_i) + ⟨g_i, x − x_i⟩],  (g_i ∈ ∂f(x_i)),
  f̂_k(D) = min_x { l_k(x) : x ∈ F_D },  f*_D = min_x { f(x) : x ∈ F_D }.

Since f(·) is convex, in view of definition (2.10) of the gap function we have

  (1/S_{k+1}) δ_k(D) = (1/S_{k+1}) Σ_{i=0}^k λ_i f(x_i) − f̂_k(D) ≥ f(x̂_{k+1}) − f*_D.   (3.2)

Thus, we can justify the rate of convergence of methods (2.21) and (2.22) as applied to problem (3.1). In the estimates below we assume that ‖g‖* ≤ L for all g ∈ ∂f(x), x ∈ Q.

1. Simple averages. In view of Theorem 2 and inequalities (2.20), (3.2) we have

  f(x̂_{k+1}) − f*_D ≤ (β̂_{k+1}/(k+1)) ( γD + L²/(2σγ) ).   (3.3)

Note that the parameters D and L are not used explicitly in this scheme. However, their estimates are needed for choosing a reasonable value of γ. The optimal choice of γ is as follows:

  γ = L/√(2σD).

In the method of Simple Dual Averages (SDA), the accumulated lower linear model is remarkably simple:

  l_k(x) = (1/(k+1)) Σ_{i=0}^k [f(x_i) + ⟨g_i, x − x_i⟩].

4 Usually, such an oracle can also compute the value of the objective function. However, we do not need it in the algorithm.
Thus, its algorithmic scheme looks as follows:

  x_{k+1} = arg min_{x∈Q} { (1/(k+1)) Σ_{i=0}^k [f(x_i) + ⟨g_i, x − x_i⟩] + μ_k d(x) },  k ≥ 0,   (3.4)

where {μ_k}_{k=0}^∞ is a sequence of positive scaling parameters. In accordance with the rules of (2.21), we choose μ_k = β_{k+1}/(k+1). However, the rate of convergence of the method (3.4) remains on the same level for any μ_k = O(1/√k). Note that in (3.4) we do not actually need the function values.

Note that for method (3.4) we do not need Q to be bounded. For example, we can have Q = E. Nevertheless, if a solution x* of problem (3.1) does exist, then by Theorem 2 the generated sequence {x_k}_{k=0}^∞ is bounded. This feature may look surprising since μ_k → 0 and no special caution is taken in (3.4) to ensure the boundedness of the sequence.

2. Weighted averages. In view of Theorem 3 and inequalities (2.20), (3.2) we have

  f(x̂_{k+1}) − f*_D ≤ (β̂_{k+1} L/((k+1)√σ)) ( D/ρ + ρ/2 ).   (3.5)

As in (2.21), the parameters D and L are not used explicitly in method (2.22). But now, in order to choose a reasonable value of ρ, we need a reasonable estimate only for D. The optimal choice of ρ is as follows:

  ρ = √(2D).

3.2 Primal-dual problem

For the sake of notation, in this section we assume that the prox-center of the set Q is at the origin:

  x_0 = arg min_{x∈Q} d(x) = 0 ∈ E.   (3.6)

Let us assume that an optimal solution x* of problem (3.1) exists. Then, choosing D ≥ d(x*), we can rewrite this problem in the equivalent form:

  f* = min_x { f(x) : x ∈ F_D }.   (3.7)

Consider the conjugate function

  f*(s) = sup_x [ ⟨s, x⟩ − f(x) ].   (3.8)

Note that for any x ∈ E we have

  ∂f(x) ⊆ dom f*.   (3.9)

Since dom f = E, a converse representation to (3.8) is always valid:

  f(x) = max_s [ ⟨s, x⟩ − f*(s) : s ∈ dom f* ],  x ∈ E.

Hence, we can rewrite the problem (3.7) in a dual form:

  f* = min_{x∈F_D} max_{s∈dom f*} [ ⟨s, x⟩ − f*(s) ] = max_{s∈dom f*} min_{x∈F_D} [ ⟨s, x⟩ − f*(s) ]
     = max_{s∈dom f*} [ −ξ_D(−s) − f*(s) ].
Thus, we come to the following dual problem:

  −f* = min_s [ f*(s) + ξ_D(−s) : s ∈ dom f* ].   (3.10)

As usual, it is worthwhile to unify (3.1) and (3.10) in a single primal-dual problem:

  0 = min_{x,s} [ ψ_D(x, s) def= f(x) + f*(s) + ξ_D(−s) : x ∈ Q, s ∈ dom f* ].   (3.11)

Let us show that DA-methods converge to a solution of this problem.

Theorem 4. Let the pair (x̂_{k+1}, ŝ_{k+1}) be defined by (2.9) with sequences X_k, G_k and Λ_k generated by method (2.14) for problem (3.7). Then x̂_{k+1} ∈ Q, ŝ_{k+1} ∈ dom f*, and

  ψ_D(x̂_{k+1}, ŝ_{k+1}) ≤ (1/S_{k+1}) δ_k(D).   (3.12)

Proof: Indeed, the point x̂_{k+1} is feasible since it is a convex combination of feasible points. In view of inclusion (3.9), the same arguments also work for ŝ_{k+1}. Finally,

  ψ_D(x̂_{k+1}, ŝ_{k+1}) = ξ_D(−ŝ_{k+1}) + f(x̂_{k+1}) + f*(ŝ_{k+1})
    ≤ (1/S_{k+1}) [ ξ_D(−s_{k+1}) + Σ_{i=0}^k λ_i ( f(x_i) + f*(g_i) ) ]   (by (2.9) and convexity)
    = (1/S_{k+1}) [ ξ_D(−s_{k+1}) + Σ_{i=0}^k λ_i ⟨g_i, x_i⟩ ]   (by (3.8))
    = (1/S_{k+1}) δ_k(D)   (by (2.11), (3.6)).  □

Thus, for the corresponding methods, the right-hand sides of inequalities (3.3), (3.5) establish the rate of convergence of the primal-dual function in (3.12) to zero. In the case of an interior solution x*, the optimal solution of the dual problem (3.10) is attained at 0 ∈ E*. In this case the rate of convergence of ŝ_{k+1} to the origin is given by (2.17).

In this section, we have shown the abilities of DA-methods (2.14) on the general primal-dual problem (3.11). However, as we will see in the next sections, these schemes can provide us with much more detailed dual information. For that we need to employ the available information on the structure of the minimization problem.

3.3 Minimax problems

Consider the following variant of problem (3.1):

  min_x { f(x) = max_{1≤j≤p} f_j(x) : x ∈ Q },   (3.13)

where f_j(·), j = 1, ..., p, are convex functions defined on E. Of course, this problem admits a dual representation. Let us fix D ≥ d(x*). Recall that we denote by Δ_p the standard simplex in R^p:

  Δ_p = { y ∈ R^p_+ : Σ_{j=1}^p y^(j) = 1 }.
Then

  f* = min_{x∈Q} max_{1≤j≤p} f_j(x) = min_{x∈F_D} max_{y∈Δ_p} Σ_{j=1}^p y^(j) f_j(x) = max_{y∈Δ_p} min_{x∈F_D} Σ_{j=1}^p y^(j) f_j(x).

Thus, defining ϕ_D(y) = min_{x∈F_D} Σ_{j=1}^p y^(j) f_j(x), we obtain the dual problem

  f* = max_{y∈Δ_p} ϕ_D(y).   (3.14)

Let us show that the DA-schemes generate also an approximation to the optimal solution of the dual problem. For that, we need to employ the structure of the oracle G for the objective function in (3.13). Note that in our case

  ∂f(x) = Conv { ∂f_j(x) : j ∈ I(x) },  I(x) = { j : f_j(x) = f(x) }.

Thus, for any g_k in the method (2.14) as applied to the problem (3.13) we can define a vector y_k ∈ Δ_p such that

  y_k^(j) = 0, j ∉ I(x_k),  g_k = Σ_{j∈I(x_k)} y_k^(j) g_{k,j},

where g_{k,j} ∈ ∂f_j(x_k) for j ∈ I(x_k). Denote

  ŷ_{k+1} = (1/S_{k+1}) Σ_{i=0}^k λ_i y_i.   (3.15)

Theorem 5. Let the pair (x̂_{k+1}, ŷ_{k+1}) be defined by the sequences X_k, G_k and Λ_k generated by method (2.14) for problem (3.13). Then this pair is primal-dual feasible and

  0 ≤ f(x̂_{k+1}) − ϕ_D(ŷ_{k+1}) ≤ (1/S_{k+1}) δ_k(D).   (3.16)

Proof: Indeed, the pair (x̂_{k+1}, ŷ_{k+1}) is feasible in view of the convexity of the primal-dual set Q × Δ_p. Further, denote F(x) = (f_1(x), ..., f_p(x))^T ∈ R^p. Then in view of (3.15), for any k ≥ 0 we have

  ⟨g_k, x_k − x⟩ = Σ_{j∈I(x_k)} y_k^(j) ⟨g_{k,j}, x_k − x⟩ ≥ Σ_{j∈I(x_k)} y_k^(j) [ f_j(x_k) − f_j(x) ]
               = ⟨y_k, F(x_k) − F(x)⟩ = f(x_k) − ⟨y_k, F(x)⟩.

Therefore

  δ_k(D) = max_{x∈F_D} Σ_{i=0}^k λ_i ⟨g_i, x_i − x⟩ ≥ max_{x∈F_D} Σ_{i=0}^k λ_i [ f(x_i) − ⟨y_i, F(x)⟩ ]
         = Σ_{i=0}^k λ_i f(x_i) − S_{k+1} ϕ_D(ŷ_{k+1}) ≥ S_{k+1} [ f(x̂_{k+1}) − ϕ_D(ŷ_{k+1}) ].  □
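The dual aggregate (3.15) with λ_i = 1 and vertex vectors y_i = e_{j_i} is just a frequency count of active components. A minimal sketch on a toy instance (two linear components, a_1 = 1, a_2 = −1, so f(x) = |x| on Q = R with Euclidean prox-function and y* = (1/2, 1/2); all constants are illustrative, not from the paper):

```python
import numpy as np

a = np.array([1.0, -1.0])                 # gradients of the two linear components
x0, gamma = 2.0, 1.0
x, s = x0, 0.0
m = np.zeros(2, dtype=int)                # counters of active components
beta_hat = 1.0                            # sequence (2.19)
N = 5000
for k in range(N):
    jk = int(np.argmax(a * x))            # an active (maximal) component
    m[jk] += 1                            # frequency counter
    s += a[jk]                            # aggregate g_{k,j_k}
    x = x0 - s / (gamma * beta_hat)       # prox-step, closed form for d = (1/2)(x-x0)^2
    beta_hat += 1.0 / beta_hat

y_hat = m / N                             # dual approximation via (3.15)
print(y_hat)                              # approaches the dual solution (1/2, 1/2)
```

This is exactly the interpretation proved above: each dual multiplier is approximated by the fraction of iterations on which the corresponding component was detected as maximal.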
Let us write down an explicit form of the SDA-method as applied to problem (3.13). Denote by e_j the j-th coordinate vector in R^p.

  Initialization: Set l_0(x) ≡ 0, m_0 = 0 ∈ Z^p.
  Iteration (k ≥ 0):
    1. Choose any j_k : f_{j_k}(x_k) = f(x_k).
    2. Set l_{k+1}(x) = (k/(k+1)) l_k(x) + (1/(k+1)) [ f(x_k) + ⟨g_{k,j_k}, x − x_k⟩ ].
    3. Compute x_{k+1} = arg min_{x∈Q} { l_{k+1}(x) + (γ β̂_{k+1}/(k+1)) d(x) }.
    4. Update m_{k+1} = m_k + e_{j_k}.
  Output: x̂_{k+1} = (1/(k+1)) Σ_{i=0}^k x_i,  ŷ_{k+1} = (1/(k+1)) m_{k+1}.   (3.17)

In (3.17), each entry of the optimal dual vector is approximated by the frequency of detecting the corresponding functional component as the maximal one.

Note that the SDA-method can also be applied to a continuous minimax problem. In this case the objective function has the following form:

  f(x) = max_y { ⟨y, F(x)⟩ : y ∈ Q_d },

where Q_d is a bounded closed convex set in R^p. For this problem an approximation to the optimal dual solution is obtained as an average of the vectors y(x_k) defined by

  y(x) ∈ Arg max_y { ⟨y, F(x)⟩ : y ∈ Q_d }.

(Note that for computing y(x) we need to maximize a linear function over a convex set.) The corresponding modifications of the method (3.17) are straightforward.

3.4 Problems with simple functional constraints

Let us assume that the set Q in problem (3.1) has the following structure:

  Q = { x ∈ Q̄ : Ax = b, F(x) ≤ 0 },   (3.18)

where Q̄ is a closed convex set in E, dim E = n, b ∈ R^m, A is an m×n matrix, and F(x) : Q̄ → R^p is a component-wise convex function. Usually, depending on the importance of the corresponding functional components, we can freely decide whether they should be hidden
in the set Q̄ or not. In any case, the representation of the set Q in the form (3.18) is not unique; therefore we call the corresponding dual problem the adjoint one.

Let us introduce dual variables u ∈ R^m and v ∈ R^p_+. Assume that an optimal solution x* of (3.1), (3.18) exists and D ≥ d(x*). Denote

  ϕ_D(u, v) = min_x { f(x) + ⟨u, b − Ax⟩ + ⟨v, F(x)⟩ : x ∈ Q̄, d(x) ≤ D }.   (3.19)

In this definition d(x) is still a prox-function of the set Q̄ with prox-center x_0 ∈ Q̄. Then the adjoint problem to (3.1), (3.18) consists in

  max_{u,v} { ϕ_D(u, v) : u ∈ R^m, v ∈ R^p_+ }.   (3.20)

Note that the complexity of computing the objective function of this problem can be comparable with the complexity of the initial problem (3.1). Nevertheless, we will see that the DA-methods (2.14) are able to approach its optimal solutions. Let us consider two possibilities.

1. Let us fix D ≥ d(x*). Denote by û_k, v̂_k the optimal multipliers for the essential constraints in the following optimization problem:

  (1/S_{k+1}) δ_k(D) = max_{x∈Q̄} { (1/S_{k+1}) Σ_{i=0}^k λ_i ⟨g_i, x_i − x⟩ : Ax = b, F(x) ≤ 0, d(x) ≤ D }
                     = max_{x∈Q̄, d(x)≤D} min_{u∈R^m, v∈R^p_+} L̂(x, u, v),   (3.21)

  L̂(x, u, v) = (1/S_{k+1}) Σ_{i=0}^k λ_i ⟨g_i, x_i − x⟩ + ⟨u, Ax − b⟩ − ⟨v, F(x)⟩.

And let x̂_k be its optimal solution. Then for any x ∈ Q̄ with d(x) ≤ D we have

  ⟨ −(1/S_{k+1}) Σ_{i=0}^k λ_i g_i + A^T û_k − Σ_{j=1}^p v̂_k^(j) ψ_j, x − x̂_k ⟩ ≤ 0

with some ψ_j ∈ ∂F^(j)(x̂_k), j = 1, ..., p, and Σ_{j=1}^p v̂_k^(j) F^(j)(x̂_k) = 0. Therefore, since f and F are convex, for any such x we get

  f(x) + ⟨û_k, b − Ax⟩ + ⟨v̂_k, F(x)⟩
    ≥ (1/S_{k+1}) Σ_{i=0}^k λ_i [ f(x_i) + ⟨g_i, x − x_i⟩ ] + ⟨û_k, b − Ax⟩ + Σ_{j=1}^p v̂_k^(j) [ F^(j)(x̂_k) + ⟨ψ_j, x − x̂_k⟩ ]
    ≥ (1/S_{k+1}) Σ_{i=0}^k λ_i [ f(x_i) + ⟨g_i, x̂_k − x_i⟩ ] = (1/S_{k+1}) [ Σ_{i=0}^k λ_i f(x_i) − δ_k(D) ].
Thus, we have proved that

  0 ≤ f(x̂_{k+1}) − ϕ_D(û_k, v̂_k) ≤ (1/S_{k+1}) Σ_{i=0}^k λ_i f(x_i) − ϕ_D(û_k, v̂_k) ≤ (1/S_{k+1}) δ_k(D).   (3.22)

Hence, the quality of this primal-dual pair can be estimated by Item 1 of Theorem 1.

2. The above suggestion to form an approximate solution to the adjoint problem (3.20) from the dual multipliers of problem (3.21) has two drawbacks. First of all, it is necessary to guarantee that the auxiliary parameter D is sufficiently big. Secondly, problem (3.21) needs additional computational efforts. Let us show that an approximate solution to problem (3.20) can be obtained for free, using the objects necessary for finding the point π_{β_{k+1}}(−s_{k+1}). Denote by ū_k, v̄_k the optimal multipliers for the functional constraints in the following optimization problem:

  x_{k+1} = π_{β_{k+1}}(−s_{k+1}) = arg max_{x∈Q̄} { −(1/S_{k+1}) [ Σ_{i=0}^k λ_i ⟨g_i, x⟩ + β_{k+1} d(x) ] : Ax = b, F(x) ≤ 0 }
          = arg max_{x∈Q̄} min_{u∈R^m, v∈R^p_+} L(x, u, v),

  L(x, u, v) = −(1/S_{k+1}) [ Σ_{i=0}^k λ_i ⟨g_i, x⟩ + β_{k+1} d(x) ] + ⟨u, Ax − b⟩ − ⟨v, F(x)⟩.

Then for any x ∈ Q̄ we have

  ⟨ −(1/S_{k+1}) [ Σ_{i=0}^k λ_i g_i + β_{k+1} d' ] + A^T ū_k − Σ_{j=1}^p v̄_k^(j) ψ_j, x − x_{k+1} ⟩ ≤ 0,

with some d' ∈ ∂d(x_{k+1}) and ψ_j ∈ ∂F^(j)(x_{k+1}), j = 1, ..., p, and Σ_{j=1}^p v̄_k^(j) F^(j)(x_{k+1}) = 0. Therefore, for all such x we obtain

  f(x) + ⟨ū_k, b − Ax⟩ + ⟨v̄_k, F(x)⟩
    ≥ (1/S_{k+1}) Σ_{i=0}^k λ_i [ f(x_i) + ⟨g_i, x − x_i⟩ ] + ⟨ū_k, b − Ax⟩ + Σ_{j=1}^p v̄_k^(j) [ F^(j)(x_{k+1}) + ⟨ψ_j, x − x_{k+1}⟩ ]
    ≥ (1/S_{k+1}) { Σ_{i=0}^k λ_i [ f(x_i) + ⟨g_i, x_{k+1} − x_i⟩ ] + β_{k+1} ⟨d', x_{k+1} − x⟩ }
    ≥ (1/S_{k+1}) { Σ_{i=0}^k λ_i f(x_i) − β_{k+1} d(x) − [ Σ_{i=0}^k λ_i ⟨g_i, x_i − x_{k+1}⟩ − β_{k+1} d(x_{k+1}) ] }.   (3.23)
Since in definition (3.19) we need $d(x) \le D$, the estimate (3.23) results in

    \varphi_D\Big(\tfrac{\bar u_k}{\sum_{i=0}^k \lambda_i}, \tfrac{\bar v_k}{\sum_{i=0}^k \lambda_i}\Big)
      \ge \frac{1}{\sum_{i=0}^k \lambda_i}\Big(\sum_{i=0}^k \lambda_i f(x_i) - \beta_{k+1} D - \Big[\sum_{i=0}^k \lambda_i \langle g_i, x_i - x_{k+1}\rangle - \beta_{k+1} d(x_{k+1})\Big]\Big).   (3.24)

Hence, the quality of this primal-dual pair can be estimated by Item 1 of Theorem 1.

4 Saddle point problems

Consider the general saddle point problem:

    \min_{u \in U}\ \max_{v \in V}\ f(u,v),   (4.1)

where $U$ is a closed convex set in $E_u$, $V$ is a closed convex set in $E_v$, the function $f(\cdot, v)$ is convex in the first argument on $E_u$ for any $v \in V$, and the function $f(u, \cdot)$ is concave in the second argument on $E_v$ for any $u \in U$. For the set $U$ we assume existence of a prox-function $d_u(\cdot)$ with prox-center $u_0$, which is strongly convex on $U$ with respect to the norm $\|\cdot\|_u$ with convexity parameter $\sigma_u$. For the set $V$ we introduce similar assumptions.

Note that the point $x^* = (u^*, v^*) \in Q \overset{\mathrm{def}}{=} U \times V$ is a solution to problem (4.1) if and only if

    f(u^*, v) \le f(u^*, v^*) \le f(u, v^*) \qquad \forall u \in U,\ v \in V.

Since for any $x \overset{\mathrm{def}}{=} (u,v) \in Q$ we have

    f(u, v^*) \le f(u,v) + \langle g_v, v^* - v\rangle, \qquad g_v \in \partial_v f(u,v),
    f(u^*, v) \ge f(u,v) + \langle g_u, u^* - u\rangle, \qquad g_u \in \partial_u f(u,v),

we conclude that

    \langle g_u, u - u^*\rangle + \langle g_v, v^* - v\rangle \ \ge\ f(u, v^*) - f(u^*, v) \ \ge\ 0   (4.2)

for any $g = (g_u, g_v)$ from $\partial_u f(u,v) \times \partial_v f(u,v)$. Thus, the oracle

    G:\ x = (u,v) \in Q\ \mapsto\ g(x) = (g_u(x), -g_v(x))   (4.3)

satisfies condition (2.8).

Further, let us fix some $\alpha \in (0,1)$. Then we can introduce the following prox-function of the set $Q$:

    d(x) = \alpha\, d_u(u) + (1-\alpha)\, d_v(v).

In this case, the prox-center of $Q$ is $x_0 = (u_0, v_0)$. Defining the norm for $E = E_u \times E_v$ by

    \|x\| = \big[\alpha \sigma_u \|u\|_u^2 + (1-\alpha)\sigma_v \|v\|_v^2\big]^{1/2},

we get for the function $d(\cdot)$ the convexity parameter $\sigma = 1$. Note that the norm for the dual space $E^* = E_u^* \times E_v^*$ is defined now as

    \|g\|_* = \Big[\frac{1}{\alpha \sigma_u}\|g_u\|_{u,*}^2 + \frac{1}{(1-\alpha)\sigma_v}\|g_v\|_{v,*}^2\Big]^{1/2}.
Assuming now the partial subdifferentials of $f$ to be uniformly bounded,

    \|g_u\|_{u,*} \le L_u, \qquad \|g_v\|_{v,*} \le L_v, \qquad g \in \partial_u f(u,v) \times \partial_v f(u,v),\ (u,v) \in Q,

we obtain the following bound for the answers of the oracle $G$:

    \|g\|_*^2 \ \le\ L^2 \overset{\mathrm{def}}{=} \frac{L_u^2}{\alpha \sigma_u} + \frac{L_v^2}{(1-\alpha)\sigma_v}.   (4.4)

Finally, if $d_u(u^*) \le D_u$ and $d_v(v^*) \le D_v$, then $x^* = (u^*, v^*) \in F_D$ with

    D = \alpha D_u + (1-\alpha) D_v.   (4.5)

Let us now apply to (4.1) the SDA-method (2.2). In accordance with Theorem 2, we get the following bound:

    \frac{\delta_k(D)}{k+1} \ \le\ \frac{\hat\beta_{k+1}}{k+1}\,\Omega, \qquad \Omega \overset{\mathrm{def}}{=} \gamma D + \frac{L^2}{2\gamma},

with $L$ and $D$ defined by (4.4) and (4.5) respectively. With the optimal choice $\gamma = L(2D)^{-1/2}$ we obtain

    \Omega = \big[2 L^2 D\big]^{1/2}
           = \Big[2\Big(\frac{L_u^2}{\alpha\sigma_u} + \frac{L_v^2}{(1-\alpha)\sigma_v}\Big)\big(\alpha D_u + (1-\alpha) D_v\big)\Big]^{1/2}
           = \Big[2\Big(\frac{L_u^2 D_u}{\sigma_u} + \frac{L_v^2 D_v}{\sigma_v} + \frac{\alpha L_v^2 D_u}{(1-\alpha)\sigma_v} + \frac{(1-\alpha) L_u^2 D_v}{\alpha\sigma_u}\Big)\Big]^{1/2}.

Minimizing the latter expression in $\alpha$, we obtain

    \Omega = \sqrt{2}\,\Big( L_u \sqrt{\tfrac{D_u}{\sigma_u}} + L_v \sqrt{\tfrac{D_v}{\sigma_v}}\Big).

Thus, we have shown that under an appropriate choice of parameters we can ensure for the SDA-method the following rate of convergence for the gap function:

    \frac{\delta_k(D)}{k+1} \ \le\ \frac{\hat\beta_{k+1}}{k+1}\,\sqrt{2}\,\Big( L_u \sqrt{\tfrac{D_u}{\sigma_u}} + L_v \sqrt{\tfrac{D_v}{\sigma_v}}\Big).   (4.6)

It remains to show that the vanishing gap results in approaching the solution of the saddle point problem (4.1). Let us introduce two auxiliary functions:

    \varphi_{D_v}(u) = \max_{v \in V}\{ f(u,v) : d_v(v) \le D_v \}, \qquad
    \psi_{D_u}(v) = \min_{u \in U}\{ f(u,v) : d_u(u) \le D_u \}.

In view of our assumptions, $\varphi_{D_v}(\cdot)$ is convex on $U$ and $\psi_{D_u}(\cdot)$ is concave on $V$. Moreover, for any $u \in U$ and $v \in V$ we have

    \psi_{D_u}(v) \ \le\ f^* \ \le\ \varphi_{D_v}(u),

where $f^* = f(u^*, v^*)$.
Theorem 6 Let the point $\hat x_{k+1} = (\hat u_{k+1}, \hat v_{k+1})$ be defined by (2.9) with the sequences $X_k$, $G_k$ and $\Lambda_k$ being generated by method (2.4) for problem (4.1) with the oracle $G$ defined by (4.3). Then $\hat x_{k+1} \in Q$ and

    0 \ \le\ \varphi_{D_v}(\hat u_{k+1}) - \psi_{D_u}(\hat v_{k+1}) \ \le\ \frac{\delta_k(D)}{\sum_{i=0}^k \lambda_i}.   (4.7)

Proof: Indeed, since $f(\cdot, v)$ is convex and $f(u, \cdot)$ is concave,

    \tau_k \overset{\mathrm{def}}{=} \max_{u \in U}\Big\{ \sum_{i=0}^k \lambda_i \langle g_u(u_i, v_i), u_i - u\rangle : d_u(u) \le D_u \Big\}
      \ge \max_{u \in U}\Big\{ \sum_{i=0}^k \lambda_i [f(u_i, v_i) - f(u, v_i)] : d_u(u) \le D_u \Big\}
      \ge \sum_{i=0}^k \lambda_i f(u_i, v_i) - \Big(\sum_{i=0}^k \lambda_i\Big) \min_{u \in U}\{ f(u, \hat v_{k+1}) : d_u(u) \le D_u \}
      = \sum_{i=0}^k \lambda_i f(u_i, v_i) - \Big(\sum_{i=0}^k \lambda_i\Big)\,\psi_{D_u}(\hat v_{k+1}).

Similarly,

    \sigma_k \overset{\mathrm{def}}{=} \max_{v \in V}\Big\{ \sum_{i=0}^k \lambda_i \langle g_v(u_i, v_i), v - v_i\rangle : d_v(v) \le D_v \Big\}
      \ge \max_{v \in V}\Big\{ \sum_{i=0}^k \lambda_i [f(u_i, v) - f(u_i, v_i)] : d_v(v) \le D_v \Big\}
      \ge \Big(\sum_{i=0}^k \lambda_i\Big) \max_{v \in V}\{ f(\hat u_{k+1}, v) : d_v(v) \le D_v \} - \sum_{i=0}^k \lambda_i f(u_i, v_i)
      = \Big(\sum_{i=0}^k \lambda_i\Big)\,\varphi_{D_v}(\hat u_{k+1}) - \sum_{i=0}^k \lambda_i f(u_i, v_i).

Thus,

    \Big(\sum_{i=0}^k \lambda_i\Big)\big[\varphi_{D_v}(\hat u_{k+1}) - \psi_{D_u}(\hat v_{k+1})\big]
      \ \le\ \tau_k + \sigma_k
      \ \le\ \max_{x \in Q}\Big\{ \sum_{i=0}^k \lambda_i \langle g(x_i), x_i - x\rangle : d(x) \le \alpha D_u + (1-\alpha) D_v \Big\} = \delta_k(D).

Note that we did not assume boundedness of the sets $U$ and $V$. The parameter $D$ from inequality (4.7) does not appear in DA-method (2.4) in explicit form.

5 Variational inequalities

Let $Q$ be a closed convex set endowed, as usual, with a prox-function $d(x)$. Consider a continuous operator $V(\cdot): Q \to E^*$, which is monotone on $Q$:

    \langle V(x) - V(y), x - y\rangle \ \ge\ 0 \qquad \forall x, y \in Q.
In this section we are interested in the variational inequality problem (VI):

    Find $x^* \in Q$:\quad \langle V(x), x - x^*\rangle \ \ge\ 0 \qquad \forall x \in Q.   (5.1)

Sometimes this problem is called a weak variational inequality. Since $V(\cdot)$ is continuous and monotone, problem (5.1) is equivalent to its strong variant:

    Find $x^* \in Q$:\quad \langle V(x^*), x - x^*\rangle \ \ge\ 0 \qquad \forall x \in Q.   (5.2)

We assume that for this problem there exists at least one solution $x^*$. Let us fix $D > d(x^*)$. A standard measure of quality of an approximate solution to (5.1) is the restricted merit function

    f_D(x) = \max_{y \in Q}\{ \langle V(y), x - y\rangle : d(y) \le D \}.   (5.3)

Let us present two important facts related to this function (our exposition is partly taken from Section 2 in [5]).

Lemma 4 The function $f_D(x)$ is well defined and convex on $E$. For any $x \in Q$ we have $f_D(x) \ge 0$, and $f_D(x^*) = 0$. Conversely, if $f_D(\hat x) = 0$ for some $\hat x \in Q$ with $d(\hat x) < D$, then $\hat x$ is a solution to problem (5.1).

Proof: Indeed, the function $f_D(x)$ is well defined since the set $F_D$ is bounded. It is convex in $x$ as a maximum of a parametric family of linear functions. If $x \in F_D$, then taking $y = x$ in (5.3) we guarantee $f_D(x) \ge 0$. If $x \in Q$ and $d(x) > D$, consider the point $y \in Q$ satisfying the condition

    y = \alpha x + (1-\alpha) x^*, \qquad d(y) = D,

for certain $\alpha \in (0,1)$. Then $y \in F_D$ and, since $x - y = \frac{1-\alpha}{\alpha}(y - x^*)$,

    \langle V(y), x - y\rangle = \frac{1-\alpha}{\alpha}\,\langle V(y), y - x^*\rangle \ \ge\ 0.

Thus, $f_D(x) \ge 0$ for all $x \in Q$. On the other hand, since $x^*$ is a solution to (5.1), we have $\langle V(y), x^* - y\rangle \le 0$ for all $y$ from $F_D$. Hence, since $x^* \in F_D$, we get $f_D(x^*) = 0$.

Finally, let $f_D(\hat x) = 0$ for some $\hat x \in Q$ with $d(\hat x) < D$. This means that $\hat x$ is a solution of the weak variational inequality $\langle V(y), y - \hat x\rangle \ge 0$ posed for all $y \in F_D$. Since $V(y)$ is continuous, we conclude that $\hat x$ is a solution to the strong variational inequality

    \langle V(\hat x), y - \hat x\rangle \ \ge\ 0 \qquad \forall y \in F_D.

Hence, the minimum of the linear function $\ell(y) = \langle V(\hat x), y\rangle$, $y \in F_D$, is attained at $y = \hat x$. Moreover, at this point the constraint $d(y) \le D$ is not active. Thus, this constraint can be removed, and we conclude that $\hat x$ is a solution to (5.2) and, consequently, to (5.1).

Let us use for problem (5.1) the oracle $G: x \mapsto V(x)$.
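With this oracle, the dual-averaging iteration for a VI is easy to sketch numerically. Everything problem-specific below is an illustrative assumption rather than the paper's setting: the operator $V(x) = x - a$ (monotone, with known solution $x^* = a$), the set $Q$ a Euclidean ball, the prox-function $d(x) = \|x\|^2/2$ (so $\pi_\beta(-s)$ reduces to the Euclidean projection of $-s/\beta$ onto $Q$), and the simple-dual-averages parameters $\gamma$ and $\hat\beta_k$.

```python
import numpy as np

# Sketch of the simple-dual-averages iteration for a monotone VI.
# Assumptions (not from the paper): Q = {x : ||x|| <= R}, d(x) = ||x||^2 / 2
# (prox-center x_0 = 0, sigma = 1), and V(x) = x - a, so the solution is x* = a.

def project_ball(z, R):
    """Euclidean projection onto {x : ||x|| <= R}."""
    nz = np.linalg.norm(z)
    return z if nz <= R else (R / nz) * z

def sda_vi(V, dim, R=2.0, gamma=1.0, iters=20000):
    s = np.zeros(dim)        # accumulated oracle answers s_k
    beta_hat = 1.0           # beta_hat_1 = 1, then beta_hat_{k+1} = beta_hat_k + 1/beta_hat_k
    x = np.zeros(dim)        # x_0 = prox-center
    x_sum = np.zeros(dim)
    for _ in range(iters):
        x_sum += x                                    # x_k enters the averaged test point
        s += V(x)                                     # s_{k+1} = s_k + V(x_k)
        x = project_ball(-s / (gamma * beta_hat), R)  # x_{k+1} = pi_{gamma beta_hat}( -s_{k+1} )
        beta_hat += 1.0 / beta_hat
    return x_sum / iters                              # averaged point \hat x

a = np.array([0.5, 0.3])
x_hat = sda_vi(lambda x: x - a, dim=2)
print(np.linalg.norm(x_hat - a))   # small: the averaged point approaches x* = a
```

Note that only the averaged point converges here; the individual iterates merely stay bounded, in accordance with the remark after (5.5) below.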
Lemma 5 For any $k \ge 0$ we have

    f_D(\hat x_{k+1}) \ \le\ \frac{\delta_k(D)}{\sum_{i=0}^k \lambda_i}.   (5.4)
Proof: Indeed, since $V(x)$ is a monotone operator,

    f_D(\hat x_{k+1}) = \max_{x}\{ \langle V(x), \hat x_{k+1} - x\rangle : x \in F_D \}
      = \max_{x}\Big\{ \frac{1}{\sum_{i=0}^k \lambda_i}\sum_{i=0}^k \lambda_i \langle V(x), x_i - x\rangle : x \in F_D \Big\}
      \le \max_{x}\Big\{ \frac{1}{\sum_{i=0}^k \lambda_i}\sum_{i=0}^k \lambda_i \langle V(x_i), x_i - x\rangle : x \in F_D \Big\}
      \le \frac{\delta_k(D)}{\sum_{i=0}^k \lambda_i}.

Let us assume that $\|V(x)\|_* \le L$ for all $x \in Q$. Then, for example, by Theorem 2 we conclude that the SDA-method (2.2), as applied to problem (5.1), converges with the following rate:

    f_D(\hat x_{k+1}) \ \le\ \frac{\hat\beta_{k+1}}{k+1}\Big(\gamma D + \frac{L^2}{2\sigma\gamma}\Big).   (5.5)

Note that the sequence $\{x_k\}_{k \ge 0}$ remains bounded even for an unbounded feasible set $Q$:

    \|x_k - x^*\|^2 \ \le\ \frac{2}{\sigma}\, d(x^*) + \frac{L^2}{\sigma^2 \gamma^2}.   (5.6)

6 Stochastic optimization

Let $\Xi$ be a probability space endowed with a probability measure, and let $Q$ be a closed convex set in $E$ endowed with a prox-function $d(x)$. We are given a cost function $f: Q \times \Xi \to \mathbb{R}$. This mapping defines a family of random variables $f(x, \xi)$ on $\Xi$. We assume that the expectation in $\xi$, $E_\xi(f(x,\xi))$, is well defined for all $x \in Q$. Our problem of interest is the stochastic optimization problem:

    \min_x \{ \varphi(x) \equiv E_\xi(f(x,\xi)) : x \in Q \}.   (6.1)

We assume that problem (6.1) is solvable and denote by $x^*$ one of its solutions, with $\varphi^* \overset{\mathrm{def}}{=} \varphi(x^*)$. In this section we make the standard convexity assumption.

Assumption 1 For any $\xi \in \Xi$, the function $f(\cdot, \xi)$ is convex and subdifferentiable on $Q$.

Then the function $\varphi(x)$ is convex on $Q$.

Note that computation of the exact value of $\varphi(x)$ may not be numerically possible, even when the distribution of $\xi$ is known. Indeed, if $\Xi \subset \mathbb{R}^m$, then computing $\hat\varphi$ such that $|\hat\varphi - \varphi(x)| \le \hat\epsilon$ may involve up to $O\big(\hat\epsilon^{-m}\big)$ computations of $f(x,\xi)$ for different $\xi \in \Xi$. Hence, the standard deterministic notion of an approximate solution to problem (6.1) becomes useless. However, an alternative definition of a solution to (6.1) is possible.

Indeed, it is clear that no solution concept can exclude failures in actual implementation. This is the very nature of decision-making under uncertainty: we do not have full control over the consequences of a given decision. In this context, there is no reason to require that the computed solution be the result of some deterministic process. A random computational scheme would be equally acceptable if the solution meets some probabilistic criterion. Namely, let $\bar x$ be a random output of some random algorithm. It appears that it is enough to assume that this output is good only on average:

    E(\varphi(\bar x)) - \varphi^* \ \le\ \epsilon.   (6.2)

Then, as was shown in [6], any random optimization algorithm which is able to generate such a random output for any $\epsilon > 0$ can be transformed into an algorithm generating random solutions with an appropriate level of confidence. In [6] this transformation was justified for a stochastic version of the subgradient method (1.3) (which is known to possess the feature (6.2)). The goal of this section is to prove that a stochastic version of the SDA-method (2.2), as applied to problem (6.1), is also able to produce a random variable $\bar x \in Q$ satisfying condition (6.2).

In order to justify convergence of the method we need more assumptions.

Assumption 2 At any point $x \in Q$ and for any realization of the random vector $\xi \in \Xi$, the stochastic oracle $G(x,\xi)$ always returns a predefined answer $f'(x,\xi) \in \partial_x f(x,\xi)$. Moreover, the answers of the oracle are uniformly bounded:

    \|f'(x,\xi)\|_* \ \le\ L, \qquad x \in Q,\ \xi \in \Xi.

Let us write down a stochastic modification of the SDA-method in the same vein as [6]. In this section we adopt the notation $\bar\tau$ for a particular result of a drawing of a random variable $\tau$. Thus, all $\bar\xi_k$ in the scheme below denote observations of the corresponding i.i.d. random vectors $\xi_k$, which are identical copies of the random vector $\xi$ defined in (6.1).

Method of Stochastic Simple Averages (SSA)

Initialization: Set $s_0 = 0 \in E^*$. Choose $\gamma > 0$.

Iteration ($k \ge 0$):

1. Generate $\bar\xi_k$ and compute $\bar g_k = f'(\bar x_k, \bar\xi_k)$.   (6.3)

2. Set $\bar s_{k+1} = \bar s_k + \bar g_k$, $\ \bar x_{k+1} = \pi_{\gamma\hat\beta_{k+1}}(-\bar s_{k+1})$.

Denote by $\xi^k$, $k \ge 0$, the collection of random variables $(\xi_0, \ldots, \xi_k)$.
The SSA-method (6.3) defines two sequences of random vectors

    s_{k+1} = s_{k+1}(\xi^k) \in E^*, \qquad x_{k+1} = x_{k+1}(\xi^k) \in Q, \qquad k \ge 0,

with deterministic $s_0$ and $x_0$. Note that their implementations $\bar s_{k+1}$ and $\bar x_{k+1}$ are different for different runs of the SSA-method.
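The behaviour of one run of the SSA-method can be sketched numerically. The problem below is an illustrative assumption: minimize $\varphi(x) = E_\xi|x - \xi|$ over $Q = [-2, 2]$ with $\xi \sim \mathrm{Uniform}(0,1)$, prox-function $d(x) = x^2/2$ (so $\pi_\beta(-s)$ is a clipping of $-s/\beta$), and a fixed random seed; the optimal solution is the median $x^* = 0.5$.

```python
import numpy as np

# Sketch of the SSA scheme on an assumed toy problem:
# minimize phi(x) = E_xi |x - xi| over Q = [-2, 2], xi ~ Uniform(0, 1).
# Prox-function d(x) = x^2 / 2 (prox-center 0, sigma = 1); the solution is the median 0.5.

rng = np.random.default_rng(0)

def ssa(iters=20000, gamma=1.0):
    s, beta_hat, x = 0.0, 1.0, 0.0   # s_0 = 0, beta_hat_1 = 1, x_0 = prox-center
    x_sum = 0.0
    for _ in range(iters):
        x_sum += x                                               # x_k enters the average
        xi = rng.uniform(0.0, 1.0)                               # draw xi_k
        g = float(np.sign(x - xi))                               # g_k = f'(x_k, xi_k), |g_k| <= 1 = L
        s += g                                                   # s_{k+1} = s_k + g_k
        x = float(np.clip(-s / (gamma * beta_hat), -2.0, 2.0))   # x_{k+1} = pi_{gamma beta_hat}(-s_{k+1})
        beta_hat += 1.0 / beta_hat                               # beta_hat_{k+1} = beta_hat_k + 1/beta_hat_k
    return x_sum / iters                                         # averaged point

x_hat = ssa()
print(x_hat)   # close to the median 0.5
```

Different seeds give different drawings $\bar x_{k+1}$, but, in the spirit of (6.2), the averaged point is good in expectation.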
Thus, for each particular run of this method we can define a gap function

    \bar\delta_k(D) = \max_x\Big\{ \sum_{i=0}^k \langle f'(\bar x_i, \bar\xi_i), \bar x_i - x\rangle : x \in F_D \Big\}, \qquad D \ge 0,   (6.4)

which can be seen as a drawing of the random variable $\delta_k(D)$. Since $\|\bar g_k\|_* \le L$, Theorem 2 results in the following uniform bound:

    \frac{\bar\delta_k(D)}{k+1} \ \le\ \frac{\hat\beta_{k+1}}{k+1}\Big(\gamma D + \frac{L^2}{2\sigma\gamma}\Big).   (6.5)

Let us now choose $D$ big enough: $D \ge d(x^*)$. Then

    \frac{\bar\delta_k(D)}{k+1} \ \ge\ \frac{1}{k+1}\sum_{i=0}^k \langle f'(\bar x_i, \bar\xi_i), \bar x_i - x^*\rangle
      \ \ge\ \frac{1}{k+1}\sum_{i=0}^k \big[f(\bar x_i, \bar\xi_i) - f(x^*, \bar\xi_i)\big].

Since all $\xi_i$ are i.i.d. random vectors, on average we have

    \frac{1}{k+1}\, E_{\xi^k}\big(\delta_k(D)\big) \ \ge\ \frac{1}{k+1}\sum_{i=0}^k E_{\xi^k}\big(f(x_i, \xi_i)\big) - \varphi^*.   (6.6)

Note that the random vector $x_i$ does not depend on $\xi_j$, $i \le j \le k$. Therefore

    E_{\xi^k}\big(f(x_i, \xi_i)\big) = E_{\xi^{i-1}}\Big(E_{\xi_i}\big(f(x_i, \xi_i)\big)\Big) = E_{\xi^{i-1}}\big(\varphi(x_i)\big),

and we conclude that

    \frac{1}{k+1}\sum_{i=0}^k E_{\xi^k}\big(f(x_i, \xi_i)\big) = \frac{1}{k+1}\sum_{i=0}^k E_{\xi^k}\big(\varphi(x_i)\big)
      \ \ge\ E_{\xi^k}\Big(\varphi\Big(\frac{1}{k+1}\sum_{i=0}^k x_i\Big)\Big).

Thus, in view of inequalities (6.5) and (6.6), we have proved the following theorem.

Theorem 7 Let the random sequence $\{x_k\}_{k \ge 0}$ be defined by the SSA-method (6.3). Then for any $k \ge 0$ we have

    E_{\xi^k}\Big(\varphi\Big(\frac{1}{k+1}\sum_{i=0}^k x_i\Big)\Big) - \varphi^* \ \le\ \frac{\hat\beta_{k+1}}{k+1}\Big(\gamma\, d(x^*) + \frac{L^2}{2\sigma\gamma}\Big).

7 Applications in modelling

7.1 Algorithm of balanced development

It appears that a variant of the DA-algorithm (2.4) admits a natural interpretation as a certain rational dynamic investment strategy. Let us discuss an example of such a model.

Assume we are going to develop our property by an a priori given sequence $\Lambda = \{\lambda_k\}_{k \ge 0}$ of consecutive investments of capital. This money can be spent for buying certain products with unitary prices $p^{(j)} > 0$, $j = 1, \ldots, N$.
All products are perfectly divisible; they are available on the market in unlimited quantities. Thus, if for investment $k$ we buy a quantity $X_k^{(j)} \ge 0$ of each product $j$, $j = 1, \ldots, N$, then the balance equation can be written as

    X_k = \lambda_k x_k, \qquad \langle p, x_k\rangle = 1, \qquad x_k \ge 0,   (7.1)

where the vectors $p = (p^{(1)}, \ldots, p^{(N)})^T$ and $x_k = (x_k^{(1)}, \ldots, x_k^{(N)})^T$ are both from $\mathbb{R}^N$.

In order to measure the progress of our property (or the efficiency of our investments) we introduce a set of $m$ characteristics (features). Let us assume that after $k \ge 0$ investments we are able to measure exactly the accumulated level $s_k^{(i)}$, $i = 1, \ldots, m$, of each feature $i$ in our property. On the other hand, we have a set of standards $b^{(i)} > 0$, $i = 1, \ldots, m$, which are treated as lower bounds on the concentration of each feature in a perfectly balanced unit of such a property. Finally, let us assume that each product $j$ is described by a vector

    a_j = (a_j^{(1)}, \ldots, a_j^{(m)})^T, \qquad j = 1, \ldots, N,

with entries being the unitary concentrations of the corresponding characteristics. Introducing the matrix $A = (a_1, \ldots, a_N) \in \mathbb{R}^{m \times N}$, we can describe the dynamics of the accumulated features as follows:

    s_{k+1} = s_k + \lambda_k g_k, \qquad g_k = A x_k, \qquad k \ge 0.   (7.2)

In accordance with our system of standards, the strategy of optimal balanced development can be found from the following optimization problem:

    \tau^* \overset{\mathrm{def}}{=} \max_{x, \tau}\{ \tau : Ax \ge \tau b,\ \langle p, x\rangle = 1,\ x \ge 0 \}.   (7.3)

Indeed, denote by $x^*$ its optimal solution. Then, choosing $x_k \equiv x^*$ in (7.2), we get

    s_k = \Big(\sum_{i=0}^{k-1} \lambda_i\Big) A x^* \ \ge\ \Big(\sum_{i=0}^{k-1} \lambda_i\Big) \tau^* b.

Thus, the efficiency of such an investment strategy is the maximal possible, and its quality does not even depend on the schedule of the payments. However, note that in order to apply the above strategy it is necessary to solve in advance the linear programming problem (7.3), which can be very complicated because of its high dimension. What can be done if this computation appears to be too difficult? Do we have some simple tools for approaching the optimal strategy in real life? Apparently, the latter question has a positive answer.
First of all, let us represent problem (7.3) in a dual form. From the Lagrangian

    L(x, \tau, y, \lambda) = \tau + \langle T y, Ax - \tau b\rangle + \lambda\,(1 - \langle p, x\rangle),

where $T$ is a diagonal matrix with $T^{(i,i)} = \frac{1}{b^{(i)}}$, $i = 1, \ldots, m$, and the corresponding saddle-point problem

    \max_{\tau;\ x \ge 0}\ \min_{\lambda;\ y \ge 0}\ L(x, \tau, y, \lambda),

we come to the following dual representation:

    \tau^* = \min_{y, \lambda}\{ \lambda : A^T T y \le \lambda p,\ y \in \Delta_m \},

where $\Delta_m$ is the standard simplex in $\mathbb{R}^m$.
Or, introducing the function

    \varphi(y) = \max_{1 \le j \le N} \frac{1}{p^{(j)}}\,\langle a_j, T y\rangle,

we obtain

    \tau^* = \min_y\{ \varphi(y) : y \in \Delta_m \}.   (7.4)

Let us apply to problem (7.4) a variant of the DA-scheme (2.4). For the feasible set of this problem, $\Delta_m$, we choose the entropy prox-function

    d(y) = \ln m + \sum_{i=1}^m y^{(i)} \ln y^{(i)}

with prox-center $y_0 = (\frac{1}{m}, \ldots, \frac{1}{m})^T$. Note that $d(y)$ is strongly convex on $\Delta_m$ in the $\ell_1$-norm (1.10) with convexity parameter $\sigma = 1$, and for any $y \in \Delta_m$ we have

    d(y) \ \le\ \ln m.   (7.5)

Moreover, with this prox-function the computation of the projection $\pi_\beta(s)$ is very cheap:

    \pi_\beta(s)^{(i)} = e^{s^{(i)}/\beta}\,\Big[\sum_{j=1}^m e^{s^{(j)}/\beta}\Big]^{-1}, \qquad i = 1, \ldots, m,   (7.6)

(see, for example, Lemma 4 in [3]). In the algorithm below, the sequence $\Lambda$ is the same as in (7.2).

Algorithm of Balanced Development (BD)

Initialization: Set $s_0 = 0 \in \mathbb{R}^m$. Choose $\beta_0 > 0$.

Iteration ($k \ge 0$):

1. Form the set $J_k = \big\{ j : \frac{1}{p^{(j)}}\langle a_j, T y_k\rangle = \varphi(y_k) \big\}$.

2. Choose any $x_k \ge 0$, $\langle p, x_k\rangle = 1$, with $x_k^{(j)} = 0$ for $j \notin J_k$.   (7.7)

3. Update $s_{k+1} = s_k + \lambda_k g_k$ with $g_k = A x_k$.

4. Choose $\beta_{k+1} \ge \beta_k$ and for $i = 1, \ldots, m$ define

    y_{k+1}^{(i)} = e^{-s_{k+1}^{(i)}/(\beta_{k+1} b^{(i)})}\,\Big[\sum_{j=1}^m e^{-s_{k+1}^{(j)}/(\beta_{k+1} b^{(j)})}\Big]^{-1}.

Note that in this scheme $T g_k \in \partial \varphi(y_k)$ and

    y_{k+1} = \pi_{\beta_{k+1}}\Big(-T \sum_{j=0}^k \lambda_j g_j\Big).
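Computationally, (7.6) is the classical softmax mapping. The following minimal sketch implements it; subtracting the maximal component before exponentiation is a standard numerical-stability device (an implementation assumption, not part of the formula):

```python
import numpy as np

# Sketch of the entropy projection (7.6): pi_beta(s) is the softmax of s / beta
# on the standard simplex.  The max-shift avoids overflow without changing the result.

def pi_beta(s, beta):
    z = np.asarray(s, dtype=float) / beta
    w = np.exp(z - z.max())        # shift by the max component: same ratios, no overflow
    return w / w.sum()

s = np.array([1.0, 2.0, 3.0])
for beta in (10.0, 1.0, 0.01):
    print(beta, pi_beta(s, beta))
# As beta decreases, pi_beta(s) concentrates on the maximal component of s:
# the "soft" choice approaches the deterministic maximum.
```

Step 4 of the BD-algorithm above evaluates the same mapping (with the components of $s$ rescaled by the standards $b$), so each $y_{k+1}$ lies on $\Delta_m$ by construction.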
Therefore it is indeed a variant of (2.4), and in view of Theorem 1 and (7.5) we have

    \frac{1}{\sum_{i=0}^k \lambda_i}\sum_{i=0}^k \lambda_i \varphi(y_i) - \tau^*
      \ \le\ \frac{1}{\sum_{i=0}^k \lambda_i}\Big[\beta_{k+1} \ln m + \frac{L^2}{2}\sum_{i=0}^k \frac{\lambda_i^2}{\beta_i}\Big],
    \qquad
    L = \max_{1 \le j \le N} \frac{1}{p^{(j)}}\,\|T a_j\|_\infty = \max_{1 \le j \le N,\ 1 \le i \le m} \frac{a_j^{(i)}}{p^{(j)} b^{(i)}}.   (7.8)

The BD-algorithm (7.7) admits the following interpretation. Its first three steps describe a natural investment strategy. Indeed, the algorithm updates a system of personal prices for the characteristics:

    u_k \overset{\mathrm{def}}{=} T y_k, \qquad k \ge 0.

By these prices, for any product $j$ we can compute an estimate $\langle a_j, u_k\rangle$ of its utility. Therefore, in Step 1 we form a subset of available products with the best quality-price ratio:

    J_k = \Big\{ j : \frac{1}{p^{(j)}}\langle a_j, u_k\rangle = \max_{1 \le i \le N} \frac{1}{p^{(i)}}\langle a_i, u_k\rangle \Big\}.

In Step 2 we share a unit of budget in an arbitrary way among the best products. And in Step 3 we buy the products within the budget $\lambda_k$ and update the vector of accumulated features.

A non-trivial part of the interpretation is related to Step 4, which describes the dependence of the personal prices, developed by the end of period $k+1$, on the vector of accumulated characteristics $s_{k+1}$ observed during this period. For this interpretation we need to explain first the meaning of (7.6). Relations (7.6) appear in the so-called logit variant of the discrete choice model (see, for example, [2]). In this model we need to choose one of $m$ variants with utilities $s^{(i)}$, $i = 1, \ldots, m$. The utilities are observed with additive random errors $\epsilon^{(i)}$, which are i.i.d. in accordance with the double-exponential distribution:

    F(\tau) = \mathrm{Prob}\big[\epsilon^{(i)} \le \tau\big] = \exp\big\{-e^{-\gamma - \tau/\beta}\big\}, \qquad i = 1, \ldots, m,

where $\gamma$ is Euler's constant and $\beta > 0$ is a tolerance parameter. Then the expected observation of the utility function is given by

    U_\beta(s) = E\Big(\max_{1 \le i \le m}\big[s^{(i)} + \epsilon^{(i)}\big]\Big).

It can be proved that $U_\beta(s) = \beta\, d_*(s/\beta) + \mathrm{const}$ with

    d_*(s) = \ln\Big(\sum_{i=1}^m e^{s^{(i)}}\Big).

Therefore we have the following important expressions for the so-called choice probabilities:

    \mathrm{Prob}\Big[s^{(i)} + \epsilon^{(i)} = \max_{1 \le j \le m}\big[s^{(j)} + \epsilon^{(j)}\big]\Big]
      = \nabla U_\beta(s)^{(i)} = \nabla d_*(s/\beta)^{(i)}
      = e^{s^{(i)}/\beta}\,\Big[\sum_{j=1}^m e^{s^{(j)}/\beta}\Big]^{-1}, \qquad i = 1, \ldots, m

(compare with (7.6)). The smaller $\beta$ is, the closer our choice strategy is to the deterministic maximum.

Using the logit model, we can adopt the following behavioral explanation of the rules of Step 4 in (7.7). During the period $k+1$ we regularly compare the levels of the accumulated features in the relative scale defined by the vector of minimal standards $b$. Each audit detects the worst characteristic, the one for which the corresponding relative level is minimal. The above computations are done with additive errors⁵, which satisfy the logit model. By the end of the period we obtain the frequencies $y_{k+1}^{(i)}$ of detecting the level of the corresponding characteristic as the worst one. Then we apply the following prices:

    u_{k+1}^{(i)} = \frac{y_{k+1}^{(i)}}{b^{(i)}}, \qquad i = 1, \ldots, m.   (7.9)

Note that the conditions for convergence of the BD-method can be derived from the right-hand side of inequality (7.8). Assume, for example, that we have a sequence of equal investments $\lambda$. Further, during each period we compare the average accumulation of the characteristics using the logit model with tolerance parameter $\mu$. Hence, in (7.7) we take

    \beta_{k+1} = \mu(k+1), \qquad \lambda_k = \lambda, \qquad k \ge 0.

Defining $\beta_0 = \mu$, we can estimate the right-hand side of inequality (7.8) as follows:

    \frac{\mu}{\lambda}\ln m + \frac{\lambda L^2}{2\mu}\cdot\frac{2 + \ln k}{k+1} \ \to\ \frac{\mu}{\lambda}\ln m, \qquad k \to \infty.

Thus, the quality of our property stabilizes at the level $\tau^* + \frac{\mu}{\lambda}\ln m$. In order to have convergence to the optimal balance, the inaccuracy level $\mu$ of the audit testing must gradually go to zero.

7.2 Preliminary planning versus dynamic adjustment

Let us consider a multi-period optimization problem in the following form. For $N+1$ consecutive periods, $k = 0, \ldots, N$, our expenses are defined by different convex objective functions $f_0(x), \ldots, f_N(x)$, $x \in Q$, where $Q$ is a closed convex set. Thus, if we choose to apply for period $k$ some strategy $x_k \in Q$, then the total expenses are given by

    \Psi(X) = \sum_{k=0}^N f_k(x_k), \qquad X = \{x_k\}_{k=0}^N.

In this model there are several possibilities for defining a rational strategy.
1. If all functions $\{f_k(x)\}_{k=0}^N$ are known in advance, then we can define an optimal dynamic strategy $X^*$ as

    x_k = x_k^* \overset{\mathrm{def}}{=} \arg\min_x\{ f_k(x) : x \in Q \}, \qquad k = 0, \ldots, N.   (7.10)

⁵ Note that these errors can occur in a natural way, due to inexact data, for example. However, even if the actual data and measurements are exact, we need to introduce an artificial randomization of the results.
More informationDuality Theory of Constrained Optimization
Duality Theory of Constrained Optimization Robert M. Freund April, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 2 1 The Practical Importance of Duality Duality is pervasive
More informationStochastic model-based minimization under high-order growth
Stochastic model-based minimization under high-order growth Damek Davis Dmitriy Drusvyatskiy Kellie J. MacPhee Abstract Given a nonsmooth, nonconvex minimization problem, we consider algorithms that iteratively
More informationBregman Divergence and Mirror Descent
Bregman Divergence and Mirror Descent Bregman Divergence Motivation Generalize squared Euclidean distance to a class of distances that all share similar properties Lots of applications in machine learning,
More information12. Interior-point methods
12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity
More informationarxiv: v4 [math.oc] 5 Jan 2016
Restarted SGD: Beating SGD without Smoothness and/or Strong Convexity arxiv:151.03107v4 [math.oc] 5 Jan 016 Tianbao Yang, Qihang Lin Department of Computer Science Department of Management Sciences The
More informationThe Frank-Wolfe Algorithm:
The Frank-Wolfe Algorithm: New Results, and Connections to Statistical Boosting Paul Grigas, Robert Freund, and Rahul Mazumder http://web.mit.edu/rfreund/www/talks.html Massachusetts Institute of Technology
More informationCoordinate Update Algorithm Short Course Subgradients and Subgradient Methods
Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 30 Notation f : H R { } is a closed proper convex function domf := {x R n
More informationACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING
ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING YANGYANG XU Abstract. Motivated by big data applications, first-order methods have been extremely
More informationOn the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems
MATHEMATICS OF OPERATIONS RESEARCH Vol. 35, No., May 010, pp. 84 305 issn 0364-765X eissn 156-5471 10 350 084 informs doi 10.187/moor.1090.0440 010 INFORMS On the Power of Robust Solutions in Two-Stage
More informationLecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem
Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R
More informationSome Properties of the Augmented Lagrangian in Cone Constrained Optimization
MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented
More informationDual and primal-dual methods
ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method
More informationAlgorithmic models of market equilibrium
CORE DISCUSSION PAPER 2013/66 Algorithmic models of market equilibrium Yu. Nesterov and V. Shikhman November 26, 2013 Abstract In this paper we suggest a new framework for constructing mathematical models
More informationOn the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean
On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean Renato D.C. Monteiro B. F. Svaiter March 17, 2009 Abstract In this paper we analyze the iteration-complexity
More informationConvex Optimization and Modeling
Convex Optimization and Modeling Duality Theory and Optimality Conditions 5th lecture, 12.05.2010 Jun.-Prof. Matthias Hein Program of today/next lecture Lagrangian and duality: the Lagrangian the dual
More informationOnline First-Order Framework for Robust Convex Optimization
Online First-Order Framework for Robust Convex Optimization Nam Ho-Nguyen 1 and Fatma Kılınç-Karzan 1 1 Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA, 15213, USA. July 20, 2016;
More informationAn accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex concave saddle-point problems
Optimization Methods and Software ISSN: 1055-6788 (Print) 1029-4937 (Online) Journal homepage: http://www.tandfonline.com/loi/goms20 An accelerated non-euclidean hybrid proximal extragradient-type algorithm
More informationLecture 24 November 27
EE 381V: Large Scale Optimization Fall 01 Lecture 4 November 7 Lecturer: Caramanis & Sanghavi Scribe: Jahshan Bhatti and Ken Pesyna 4.1 Mirror Descent Earlier, we motivated mirror descent as a way to improve
More informationMathematics for Economists
Mathematics for Economists Victor Filipe Sao Paulo School of Economics FGV Metric Spaces: Basic Definitions Victor Filipe (EESP/FGV) Mathematics for Economists Jan.-Feb. 2017 1 / 34 Definitions and Examples
More informationSubgradient methods for huge-scale optimization problems
Subgradient methods for huge-scale optimization problems Yurii Nesterov, CORE/INMA (UCL) May 24, 2012 (Edinburgh, Scotland) Yu. Nesterov Subgradient methods for huge-scale problems 1/24 Outline 1 Problems
More informationMore First-Order Optimization Algorithms
More First-Order Optimization Algorithms Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 3, 8, 3 The SDM
More informationA convergence result for an Outer Approximation Scheme
A convergence result for an Outer Approximation Scheme R. S. Burachik Engenharia de Sistemas e Computação, COPPE-UFRJ, CP 68511, Rio de Janeiro, RJ, CEP 21941-972, Brazil regi@cos.ufrj.br J. O. Lopes Departamento
More informationSolving generalized semi-infinite programs by reduction to simpler problems.
Solving generalized semi-infinite programs by reduction to simpler problems. G. Still, University of Twente January 20, 2004 Abstract. The paper intends to give a unifying treatment of different approaches
More informationSparse Regularization via Convex Analysis
Sparse Regularization via Convex Analysis Ivan Selesnick Electrical and Computer Engineering Tandon School of Engineering New York University Brooklyn, New York, USA 29 / 66 Convex or non-convex: Which
More informationConstrained Optimization and Lagrangian Duality
CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may
More informationIntroduction to Nonlinear Stochastic Programming
School of Mathematics T H E U N I V E R S I T Y O H F R G E D I N B U Introduction to Nonlinear Stochastic Programming Jacek Gondzio Email: J.Gondzio@ed.ac.uk URL: http://www.maths.ed.ac.uk/~gondzio SPS
More informationSelected Methods for Modern Optimization in Data Analysis Department of Statistics and Operations Research UNC-Chapel Hill Fall 2018
Selected Methods for Modern Optimization in Data Analysis Department of Statistics and Operations Research UNC-Chapel Hill Fall 08 Instructor: Quoc Tran-Dinh Scriber: Quoc Tran-Dinh Lecture 4: Selected
More informationarxiv: v1 [math.oc] 13 Dec 2018
A NEW HOMOTOPY PROXIMAL VARIABLE-METRIC FRAMEWORK FOR COMPOSITE CONVEX MINIMIZATION QUOC TRAN-DINH, LIANG LING, AND KIM-CHUAN TOH arxiv:8205243v [mathoc] 3 Dec 208 Abstract This paper suggests two novel
More informationUses of duality. Geoff Gordon & Ryan Tibshirani Optimization /
Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear
More informationContraction Methods for Convex Optimization and Monotone Variational Inequalities No.16
XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of
More informationDouglas-Rachford splitting for nonconvex feasibility problems
Douglas-Rachford splitting for nonconvex feasibility problems Guoyin Li Ting Kei Pong Jan 3, 015 Abstract We adapt the Douglas-Rachford DR) splitting method to solve nonconvex feasibility problems by studying
More informationOnline Convex Optimization
Advanced Course in Machine Learning Spring 2010 Online Convex Optimization Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz A convex repeated game is a two players game that is performed
More informationJanuary 29, Introduction to optimization and complexity. Outline. Introduction. Problem formulation. Convexity reminder. Optimality Conditions
Olga Galinina olga.galinina@tut.fi ELT-53656 Network Analysis Dimensioning II Department of Electronics Communications Engineering Tampere University of Technology, Tampere, Finl January 29, 2014 1 2 3
More informationA Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming
A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming Zhaosong Lu Lin Xiao March 9, 2015 (Revised: May 13, 2016; December 30, 2016) Abstract We propose
More informationLAGRANGIAN TRANSFORMATION IN CONVEX OPTIMIZATION
LAGRANGIAN TRANSFORMATION IN CONVEX OPTIMIZATION ROMAN A. POLYAK Abstract. We introduce the Lagrangian Transformation(LT) and develop a general LT method for convex optimization problems. A class Ψ of
More informationOptimality Conditions for Constrained Optimization
72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)
More informationarxiv: v1 [math.oc] 5 Dec 2014
FAST BUNDLE-LEVEL TYPE METHODS FOR UNCONSTRAINED AND BALL-CONSTRAINED CONVEX OPTIMIZATION YUNMEI CHEN, GUANGHUI LAN, YUYUAN OUYANG, AND WEI ZHANG arxiv:141.18v1 [math.oc] 5 Dec 014 Abstract. It has been
More informationMaximal Monotone Inclusions and Fitzpatrick Functions
JOTA manuscript No. (will be inserted by the editor) Maximal Monotone Inclusions and Fitzpatrick Functions J. M. Borwein J. Dutta Communicated by Michel Thera. Abstract In this paper, we study maximal
More informationON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE. Sangho Kum and Gue Myung Lee. 1. Introduction
J. Korean Math. Soc. 38 (2001), No. 3, pp. 683 695 ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE Sangho Kum and Gue Myung Lee Abstract. In this paper we are concerned with theoretical properties
More informationConvex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version
Convex Optimization Theory Chapter 5 Exercises and Solutions: Extended Version Dimitri P. Bertsekas Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com
More informationMini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization
Noname manuscript No. (will be inserted by the editor) Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization Saeed Ghadimi Guanghui Lan Hongchao Zhang the date of
More informationLagrange duality. The Lagrangian. We consider an optimization program of the form
Lagrange duality Another way to arrive at the KKT conditions, and one which gives us some insight on solving constrained optimization problems, is through the Lagrange dual. The dual is a maximization
More informationHomework 4. Convex Optimization /36-725
Homework 4 Convex Optimization 10-725/36-725 Due Friday November 4 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationPrimal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming
Mathematical Programming manuscript No. (will be inserted by the editor) Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming Guanghui Lan Zhaosong Lu Renato D. C. Monteiro
More informationRegret bounded by gradual variation for online convex optimization
Noname manuscript No. will be inserted by the editor Regret bounded by gradual variation for online convex optimization Tianbao Yang Mehrdad Mahdavi Rong Jin Shenghuo Zhu Received: date / Accepted: date
More informationComputing closest stable non-negative matrices
Computing closest stable non-negative matrices Yu. Nesterov and V.Yu. Protasov August 22, 2017 Abstract Problem of finding the closest stable matrix for a dynamical system has many applications. It is
More informationWE consider an undirected, connected network of n
On Nonconvex Decentralized Gradient Descent Jinshan Zeng and Wotao Yin Abstract Consensus optimization has received considerable attention in recent years. A number of decentralized algorithms have been
More informationAssignment 1: From the Definition of Convexity to Helley Theorem
Assignment 1: From the Definition of Convexity to Helley Theorem Exercise 1 Mark in the following list the sets which are convex: 1. {x R 2 : x 1 + i 2 x 2 1, i = 1,..., 10} 2. {x R 2 : x 2 1 + 2ix 1x
More informationSubgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus
1/41 Subgradient Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes definition subgradient calculus duality and optimality conditions directional derivative Basic inequality
More information