
A LEVEL-SET METHOD FOR CONVEX OPTIMIZATION WITH A FEASIBLE SOLUTION PATH

QIHANG LIN, SELVAPRABU NADARAJAH, AND NEGAR SOHEILI

Abstract. Large-scale constrained convex optimization problems arise in several application domains. First-order methods are good candidates to tackle such problems due to their low iteration complexity and memory requirement. The level-set framework extends the applicability of first-order methods to problems with complicated convex objectives and constraint sets. Current methods based on this framework either rely on the solution of challenging subproblems or do not guarantee a feasible solution, especially if the procedure is terminated before convergence. We develop a level-set method that finds an ɛ-relative optimal and feasible solution to a constrained convex optimization problem with a fairly general objective function and set of constraints, maintains a feasible solution at each iteration, and relies only on calls to first-order oracles. We establish the iteration complexity of our approach, also accounting for the smoothness and strong convexity of the objective function and constraints when these properties hold. The dependence of our complexity on ɛ is similar to the analogous dependence in the unconstrained setting, which is not known to be true for level-set methods in the literature. Nevertheless, ensuring feasibility is not free: the iteration complexity of our method depends on a condition number, while existing level-set methods that do not guarantee feasibility can avoid such dependence.

Key words. constrained convex optimization, level-set technique, first-order methods, complexity analysis

AMS subject classifications. 90C25, 90C30, 90C52

1. Introduction. Large-scale constrained convex optimization problems arise in several business, science, and engineering applications. A commonly encountered form for such problems is

(1)    f* := min_{x ∈ X} f(x)
(2)    s.t.  g_i(x) ≤ 0,  i = 1, ..., m,

where X ⊆ R^n is a closed and convex set, and f and g_i, i = 1, ..., m, are convex real-valued functions defined on X. Given ɛ > 0, a point x_ɛ ∈ X is ɛ-optimal if it satisfies f(x_ɛ) − f* ≤ ɛ, and ɛ-feasible if it satisfies max_{i=1,...,m} g_i(x_ɛ) ≤ ɛ. In addition, given 0 < ɛ ≤ 1 and a feasible and suboptimal solution e ∈ X, a point x_ɛ is ɛ-relative optimal with respect to e if it satisfies (f(x_ɛ) − f*)/(f(e) − f*) ≤ ɛ.

Level-set techniques [, 3,, 7, 7, 8] provide a flexible framework for solving the optimization problem (1)-(2). Specifically, this problem can be reformulated as a convex root-finding problem L*(t) = 0, where

(3)    L*(t) := min_{x ∈ X} max{ f(x) − t;  g_i(x), i = 1, ..., m };

see Lemaréchal et al. [7]. A bundle method [6, 9, 4] or an accelerated gradient method [0, Section 2.3.5] can then be applied to the minimization subproblem in (3) to estimate L*(t) and facilitate the root-finding process. In general, these methods require a nontrivial quadratic program to be solved at each iteration of the subproblem.

An important variant of (1)-(2) arising in statistics, image processing, and signal processing occurs when f(x) takes a simple form, such as a gauge function, and there is a single constraint with the structure ρ(Ax − b) ≤ λ, where ρ is a convex function, A is a matrix, b is a vector, and λ is a scalar. In this setting, Van Den Berg and

Tippie College of Business, University of Iowa, USA, qihang-lin@uiowa.edu
College of Business Administration, University of Illinois at Chicago, USA, selvan@uic.edu
College of Business Administration, University of Illinois at Chicago, USA, nazad@uic.edu
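The sign behavior of the root-finding function L*(t) around t = f* can be checked numerically. The following is a minimal Python sketch on a toy one-dimensional instance of (1)-(2); the instance, the grid-search evaluation of L*(t), and all tolerances are illustrative assumptions, not part of the paper's method:

```python
# Toy instance of (1)-(2): min x^2  s.t.  1 - x <= 0  over X = [0, 10]; f* = 1 at x = 1.
f = lambda x: x * x
g = lambda x: 1.0 - x

def L_star(t, n_grid=100_000):
    """Approximate L*(t) = min_{x in X} max(f(x) - t, g(x)) by grid search.
    (A sketch only; realistic instances would use a first-order method.)"""
    return min(max(f(10.0 * j / n_grid) - t, g(10.0 * j / n_grid))
               for j in range(n_grid + 1))

assert L_star(0.5) > 0          # t < f*  =>  L*(t) > 0
assert abs(L_star(1.0)) < 1e-2  # t = f*  =>  L*(t) = 0 (the root)
assert L_star(2.0) < 0          # t > f*  =>  L*(t) < 0 (a strictly feasible e exists, e.g. e = 2)
```

The sign change across t = f* is exactly what makes the level-set reformulation a one-dimensional root-finding problem.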

Friedlander [7, 8] and Aravkin et al. [] use a level-set approach that swaps the roles of the objective function and constraints in (1)-(2) to obtain the convex root-finding problem v(t) = 0 with v(t) := min_{x ∈ X, f(x) ≤ t} ρ(Ax − b) − λ. Harchaoui et al. [] proposed a similar approach. The root-finding problem arising in this approach is solved using inexact Newton and secant methods in [7, 8, ] and [], respectively. Since f(x) is assumed to be simple, projection onto the feasible set {x ∈ R^n : x ∈ X, f(x) ≤ t} is possible, and the subproblem in the definition of v(t) can be solved by various first-order methods. Although the assumption on the objective function is important, the constraint structure is not limiting because (1)-(2) can be recovered by choosing A to be an identity matrix, b = 0, λ = 0, and ρ(x) = max_{i=1,...,m} g_i(x).

The level-set techniques discussed above find an ɛ-feasible and ɛ-optimal solution to (1)-(2); that is, the terminal solution may not be feasible. This issue can be overcome when the infeasible terminal solution is also super-optimal and a strictly feasible solution to the problem exists. In this case, a radial-projection scheme [, 5] can be applied to obtain a feasible and ɛ-relative optimal solution. Super-optimality of the terminal solution holds if the constraint f(x) ≤ t is satisfied at termination. This condition can be enforced when projections onto the feasible set with this constraint are computationally cheap, which typically requires f(x) to be simple (see, e.g., []). Otherwise, such a guarantee does not exist. Building on this rich literature, we develop a feasible level-set method that (i) employs a first-order oracle to solve subproblems and computes a feasible and ɛ-relative optimal solution to (1)-(2) with a general objective function and set of constraints while also maintaining feasible intermediate solutions at each iteration (i.e., a feasible solution path), and (ii) avoids potentially costly projections.
Our focus on maintaining feasibility is especially desirable in situations where level-set methods need to be terminated before convergence, for instance due to a time constraint. In this case, our level-set approach returns a feasible, albeit possibly suboptimal, solution that can be implemented, whereas the solution from existing level-set approaches may not be implementable due to potentially large constraint violations. Convex optimization problems where constraints model operating conditions that must be satisfied by solutions for implementation arise in several applications, including network and nonlinear resource allocation, signal processing, and circuit design [7, 8, 3].

The radial sub-gradient method proposed by Renegar [5] and extended by Grimmer [0] also emphasizes feasibility. This method avoids projection and finds a feasible and ɛ-relative optimal solution by performing a line search at each iteration. This line search, while easy to execute for some linear or conic programs, can be challenging for general convex programs (1)-(2). Moreover, it is unknown whether the iteration complexity of this approach can be improved by utilizing the potential smoothness and strong convexity of the objective function and constraints. In contrast, the feasible level-set approach in this paper avoids the need for line search, in addition to projection, and its iteration complexity leverages the strong convexity and smoothness of the objective and constraints when these properties hold.

Our work is related to first-order methods, which are popular for solving large-scale convex optimization problems because of their low per-iteration cost and memory requirement, for example compared to interior point methods []. Most existing first-order methods find a feasible and ɛ-optimal solution to a version of the optimization problem (1)-(2) with a simple feasible region (e.g., a box, simplex, or ball) that makes projection onto the feasible set at each iteration inexpensive.
Ensuring feasibility using projection becomes difficult for general constraints. Some sub-gradient methods, including the method by Nesterov [0, Section 3.2.4] and its

recent variants by Lan and Zhou [5] and Bayandina [4], can be applied to (1)-(2) without the need for projection mappings. However, these methods do not ensure feasibility of the solution path as we do; instead, they find an ɛ-feasible and ɛ-optimal solution to (1)-(2) at convergence.

Given the focus of our research, one naturally wonders whether there is an associated computational cost of ensuring feasibility in the context of level-set methods. Our analysis suggests that the answer depends on the perspective one takes. First, consider the dependence of the iteration complexity on a condition measure as a criterion. A nice property of the level-set method in [] is that its iteration complexity is independent of a condition measure for finding an ɛ-feasible and ɛ-optimal solution. In contrast, finding an ɛ-relative optimal and feasible solution leads to iteration complexities that depend on condition measures when (i) combining the level-set approach [] with the transformation in [5], or (ii) using our feasible level-set approach. This suggests that a cost of ensuring feasibility is the presence of a condition measure in the iteration complexity. Second, suppose we consider the dependence of the number of iterations on ɛ as our assessment criterion. The complexity result of the level-set approach in [] can be interpreted as the product of the iteration complexity of the oracle used to solve the subproblem and an O(log(1/ɛ)) factor. The dependence of our algorithm would be similar under a non-adaptive analysis akin to []. However, using a novel adaptive analysis, we find that the iteration complexity of our feasible level-set approach can in fact eliminate the O(log(1/ɛ)) factor. In other words, our complexity is analogous to the known complexity of first-order methods for unconstrained convex optimization, and it is unclear to us whether a similar adaptive analysis can be applied to the approach in [].
Overall, in terms of the dependence on ɛ, there appears to be no cost to ensuring feasibility.

The rest of this paper is organized as follows. Section 2 introduces the level-set formulation and our feasible level-set approach, and establishes its outer iteration complexity for computing an ɛ-relative optimal and feasible solution to the optimization problem (1)-(2). Section 3 describes two first-order oracles that solve the subproblem of our level-set formulation when the functions f and g_i, i = 1, ..., m, are smooth and non-smooth, respectively, and in each case accounts for the potential strong convexity of these functions. The inner iteration complexity of each oracle is also analyzed in this section. Section 4 presents an adaptive complexity analysis to establish the overall iteration complexity of our feasible level-set approach.

2. Feasible Level-Set Method. Our methodological developments rely on the assumption below.

Assumption 1. There exists a solution e ∈ X such that max_{i=1,...,m} g_i(e) < 0 and f(e) > f*. We label such a solution g-strictly feasible.

Given t ∈ R, we define L*(t) := min_{x ∈ X} L(t, x), where L(t, x) := max{ f(x) − t; g_i(x), i = 1, ..., m } for all x ∈ X. Let ∂L*(t) denote the sub-differential of L* at t. Lemma 1 summarizes known properties of L*(t).

Lemma 1 (Lemmas 2.3.4, 2.3.5, and 2.3.6 in [0]). It holds that (a) L*(t) is non-increasing and convex in t; (b) L*(t) ≥ 0 if t ≤ f*, and L*(t) > 0 if t < f*. Moreover, under Assumption 1, it holds that L*(t) < 0 for any t > f*.

(c) Given Δ ≥ 0, we have L*(t) − Δ ≤ L*(t + Δ) ≤ L*(t). Therefore, −1 ≤ ζ ≤ 0 for all ζ ∈ ∂L*(t) and t ∈ R.

Proof. We only prove the second part of item (b), as the proofs of the other items are provided in [0]. In particular, we show L*(t) < 0 for any t > f* under Assumption 1. Let x* denote an optimal solution to (1)-(2). If t > f*, we have f(x*) − t < 0. Since f is a continuous function, there exists δ > 0 such that f(x) − t < 0 for any x ∈ B_δ(x*) ∩ X, where B_δ(x*) is a ball of radius δ centered at x*. Take x̄ ∈ B_δ(x*) ∩ X such that x̄ = λx* + (1 − λ)e for some λ ∈ (0, 1), where e is a solution that satisfies Assumption 1. By the convexity of the functions g_i, i = 1, ..., m, the feasibility of x*, and the g-strict feasibility of e, we have g_i(x̄) ≤ λg_i(x*) + (1 − λ)g_i(e) < 0 for i = 1, ..., m. In addition, since x̄ ∈ B_δ(x*), we have f(x̄) − t < 0. Therefore, L*(t) ≤ max{ f(x̄) − t; g_i(x̄), i = 1, ..., m } < 0.

Lemma 1(b), specifically L*(t) < 0 for any t > f*, ensures that t* = f* is the unique root of L*(t) = 0 and that the solution x(t*) attaining L*(t*) is an optimal solution to (1)-(2). It is therefore possible to find an ɛ-relative optimal and feasible solution by closely approximating t* from above with some t̂ and then finding a point x̂ ∈ X with L(t̂, x̂) ≤ 0; the latter condition ensures x̂ is feasible. To obtain such a t̂, we start with an initial value t_0 > t* and generate a decreasing sequence t_0 ≥ t_1 ≥ t_2 ≥ ... ≥ t*, where t_k is updated to t_{k+1} based on a term obtained by solving (3) with t = t_k up to a certain optimality gap. We present this approach in Algorithm 1, which utilizes an oracle A defined as follows.

Definition 1. An algorithm is called an oracle A if, given t ≥ f*, x ∈ X, α > 1, and γ ≥ L(t, x) − L*(t), it finds a point x+ ∈ X such that L*(t) ≥ αL(t, x+).

Algorithm 1 Feasible Level-Set Method
1: Input: g-strictly feasible solution e, and parameters α > 1 and 0 < ɛ ≤ 1.
2: Initialization: Set x_0 = e and t_0 = f(x_0).
3: for k = 0, 1, 2, ... do
4:   Choose γ_k such that γ_k ≥ L(t_k, x_k) − L*(t_k).
5:   Generate x_{k+1} = A(t_k, x_k, α, γ_k) such that L*(t_k) ≥ αL(t_k, x_{k+1}).
6:   if αL(t_k, x_{k+1}) ≥ ɛL(t_0, x_1) then
7:     Terminate and return x_{k+1}.
8:   end if
9:   Set t_{k+1} = t_k + L(t_k, x_{k+1}).
10: end for

Given a g-strictly feasible solution e and α > 1, Algorithm 1 finds an ɛ-relative optimal and feasible solution by calling an oracle A with input tuple (t_k, x_k, α, γ_k) at each iteration k. This oracle provides a lower bound, αL(t_k, x_{k+1}) ≤ 0, and an upper bound, L(t_k, x_{k+1}) ≤ 0, for L*(t_k). We use the upper bound to update t_k and the lower bound in the stopping criterion. In particular, Algorithm 1 terminates when αL(t_k, x_{k+1}) is greater than or equal to ɛL(t_0, x_1). We discuss possible choices for the oracle A in Section 3 and for its input γ in Table 1. Theorem 2 shows that Algorithm 1 is a feasible level-set method, that is, the x_k generated at each iteration k is feasible, and that it terminates with an ɛ-relative optimal solution. It also establishes the outer iteration complexity of the algorithm. This complexity depends on a condition measure

β := −L*(t_0) / (t_0 − t*),

[Figure 1: Illustration of the condition measure β, with t* = 0 and t_0 = 1, for problem (1)-(2). Panel (a): well-conditioned case. Panel (b): poorly conditioned case.]

The suboptimality of x_0 = e implies t_0 = f(x_0) > t*. By Lemma 1 we can show that β ∈ (0, 1]. Larger values of β signify a better-conditioned problem, as illustrated in Figure 1, because we iteratively use a close approximation of L*(t) to approach t* from t_0; a larger |L*(t)| therefore intuitively suggests faster movement towards t*. In addition, the following inequalities can be easily verified from the convexity of L*(t) and Lemma 1(c):

(4)    β = −L*(t_0)/(t_0 − t*) ≤ −L*(t)/(t − t*) ≤ 1   for all t ∈ (t*, t_0].

Theorem 2. Given α > 1, 0 < ɛ ≤ 1, and a g-strictly feasible solution e, Algorithm 1 maintains a feasible solution at each iteration and ensures

(5)    t_{k+1} − t* ≤ (1 − β/α)(t_k − t*).

In addition, it returns an ɛ-relative optimal solution to (1)-(2) in at most

K := ⌈ log_{α/(α−β)}( α²/(βɛ) ) ⌉ + 1

outer iterations.

Proof. Notice that t_0 > t* since t_0 = f(e) > f* = t*. Suppose t* ≤ t_k ≤ t_0 at iteration k. At iteration k + 1 we have

t_{k+1} − t* = t_k − t* + L(t_k, x_{k+1}) ≥ t_k − t* + L*(t_k) ≥ t_k − t* − (t_k − t*) = 0,

where the first equality is obtained by substituting for t_{k+1} using the update equation for t in Algorithm 1, the first inequality using the definition of L*(t_k), and the second inequality from (4) with t = t_k, which applies because t* < t_k ≤ t_0. This implies that t_k ≥ t* for all k. In addition, the condition L*(t_k) ≥ αL(t_k, x_{k+1}) satisfied by the oracle A, α > 1, and Lemma 1(b) imply that L(t_k, x_{k+1}) ≤ L*(t_k)/α ≤ 0. It thus follows that t_{k+1} ≤ t_k and that the x_{k+1} maintained by Algorithm 1 is feasible for all k.

Next we establish (5) and the outer iteration complexity of Algorithm 1. Using the definition of t_{k+1}, α > 1, and the condition L*(t_k) ≥ αL(t_k, x_{k+1}) satisfied by oracle A, we write

(6)    t_{k+1} − t* = t_k − t* + L(t_k, x_{k+1}) ≤ t_k − t* + L*(t_k)/α.

Using (4) with t = t_k to substitute for L*(t_k) in (6) gives the required inequality (5). Recursively applying (5) starting from k = 0 gives

(7)    t_k − t* ≤ (1 − β/α)^k (t_0 − t*).

To obtain the terminal condition, we note that, for k ≥ K − 1,

αL(t_k, x_{k+1}) ≥ αL*(t_k) ≥ −α(t_k − t*) ≥ −α(1 − β/α)^{K−1}(t_0 − t*) ≥ (ɛ/α)L*(t_0) ≥ ɛL(t_0, x_1),

where the first inequality holds by the definition of L*(t); the second using (4) with t = t_k; the third using (7); and the fourth by applying the definitions of K and β. The last inequality follows from the condition satisfied by the oracle at t_0 and x_1, that is, L*(t_0) ≥ αL(t_0, x_1). Hence, Algorithm 1 terminates after at most K outer iterations.

Finally, we show that the terminal solution is ɛ-relative optimal. Observe that

(8)    f(x_{k+1}) − f* = f(x_{k+1}) − t* ≤ t_k − t*,

where the equality is a consequence of the definition of t* and the inequality is obtained from f(x_{k+1}) ≤ t_k, which holds because f(x_{k+1}) − t_k ≤ L(t_k, x_{k+1}) ≤ 0. Combining t_k − t* ≤ (t_0 − t*)L*(t_k)/L*(t_0) from (4) and (8), we obtain f(x_{k+1}) − f* ≤ (t_0 − t*)L*(t_k)/L*(t_0). Rearranging terms results in the inequality

( f(x_{k+1}) − f* ) / ( f(e) − f* ) ≤ L*(t_k)/L*(t_0),

where we use t* = f* and t_0 = f(e). Indeed, L*(t_k)/L*(t_0) ≤ ɛ when the algorithm terminates.
To verify this, note that the terminal condition and the property satisfied by oracle A indicate that L*(t_k) ≥ αL(t_k, x_{k+1}) ≥ ɛL(t_0, x_1) ≥ ɛL*(t_0), which, since L*(t_0) < 0, further implies L*(t_k)/L*(t_0) ≤ ɛ.

3. First-order oracles. In this section, we adapt first-order methods to serve as the oracle A in Algorithm 1, that is, to solve (3) under different properties of the convex functions f and g_i, i = 1, ..., m. We adapt these methods in a manner that facilitates our adaptive complexity analysis in Section 4. We assume the functions f and g_i, i = 1, ..., m, are µ-convex for some µ ≥ 0, namely, for h = f, g_1, ..., g_m, we have

(9)    h(x) ≥ h(y) + ⟨ζ_h, x − y⟩ + (µ/2)||x − y||²

for any x and y in X and ζ_h ∈ ∂h(y), where ∂h(y) denotes the set of all sub-gradients of h at y ∈ X and ||·|| denotes the 2-norm. We call the functions f and g_i, i = 1, ..., m, convex if µ = 0 and strongly convex if µ > 0. Let D := max_{x∈X} ||x|| and M := max_{x∈X} max{ ||ζ_f||; ||ζ_{g_i}||, i = 1, ..., m }, where ζ_f ∈ ∂f(x) and ζ_{g_i} ∈ ∂g_i(x), i = 1, ..., m. When f and g_i, i = 1, ..., m, are smooth, M := max_{x∈X} max{ ||∇f(x)||; ||∇g_i(x)||, i = 1, ..., m }. We make Assumption 2 in the rest of the paper.

Assumption 2. M < +∞ and either (i) µ > 0 or (ii) µ = 0 and D < +∞.

Assuming M to be finite is common in the literature when developing first-order methods for solving smooth min-max problems (see, e.g., [6, 4]) and non-smooth problems (see, e.g., [9, 8, ]). The assumption of finite D is present to obtain computable stopping criteria for the first-order oracles that we use.

3.1. Smooth convex functions. In this subsection, in addition to Assumptions 1 and 2, we also assume that the functions in the objective and constraints of (1)-(2) are smooth, formally stated below.

Assumption 3. The functions f and g_i, i = 1, ..., m, are S-smooth for some S ≥ 0, namely, for h = f, g_1, ..., g_m,

h(x) ≤ h(y) + ⟨∇h(y), x − y⟩ + (S/2)||x − y||²

for any x and y in X.

A first-order method achieves a fast linear convergence rate when the objective function is both smooth and strongly convex [0]. Despite this smoothness assumption, since the function L(t, x) is a maximum of functions modeling the objective and constraints, it may be non-smooth in x.
Nevertheless, we can approximate L(t, x) by a smooth and strongly convex function of x using a smoothing method [6, ] and a standard regularization technique. Applying first-order methods to minimize this smooth approximation of L(t, x) leads to a lower iteration complexity than directly working with the original non-smooth function. The technique we utilize is a modified version of the smoothing and regularization reduction method by Allen-Zhu and Hazan []. However, since the formulation and the non-smooth structure of (3) are quite different from the problems considered in [], we present below the reduction technique and conduct the complexity analysis under our framework. The exponentially smoothed and regularized version of L(t, x) is

(10)    L_{σ,λ}(t, x) := (1/σ) log( exp(σ(f(x) − t)) + Σ_{i=1}^m exp(σ g_i(x)) ) + (λ/2)||x||²,

where σ > 0 and λ ≥ 0 are smoothing and regularization parameters, respectively. Lemma 2 is based on related arguments in [6] and summarizes properties of L_{σ,λ}(t, x) and its closeness to L(t, x).
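Numerically, the log-sum-exp term in (10) should be evaluated with a max-shift to avoid overflow. The following small Python sketch checks the closeness bound of the smoothing (the quadratic regularizer is read here as (λ/2)||x||², which is this sketch's reconstruction; the one-dimensional instance is illustrative):

```python
import math

def smoothed_L(t, x, f, gs, sigma, lam):
    """Exponentially smoothed, regularized level-set function, a sketch of (10)
    for scalar x: (1/sigma)*log(exp(sigma*(f(x)-t)) + sum_i exp(sigma*g_i(x)))
    + (lam/2)*x^2, evaluated via a shifted (numerically stable) log-sum-exp."""
    vals = [f(x) - t] + [g(x) for g in gs]
    m = max(vals)                         # shift so the largest exponent is 0
    lse = m + math.log(sum(math.exp(sigma * (v - m)) for v in vals)) / sigma
    return lse + 0.5 * lam * x * x

f = lambda x: x * x
g1 = lambda x: 1.0 - x
t, x, sigma = 0.5, 0.3, 50.0
exact = max(f(x) - t, g1(x))              # L(t, x) with a single constraint (m = 1)
smooth = smoothed_L(t, x, f, [g1], sigma, lam=0.0)
# Closeness bound from Lemma 2 with lam = 0:  0 <= L_sigma - L <= log(m+1)/sigma
assert 0.0 <= smooth - exact <= math.log(2) / sigma
```

Increasing σ tightens the approximation at the cost of a larger smoothness constant σM² + S, which is exactly the trade-off Oracle 1 manages by growing σ_i across restarts.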

Lemma 2. Suppose Assumptions 2 and 3 hold. Given σ > 0 and λ ≥ 0, the function L_{σ,λ}(t, x) is (i) (σM² + S)-smooth and (ii) (λ + µ)-convex in x. Moreover, we have

0 ≤ L_{σ,λ}(t, x) − L(t, x) ≤ log(m + 1)/σ + λD²/2.

Here, we follow the convention that 0 · (+∞) = 0, so that λD² = 0 if λ = 0 and D = +∞.

Proof. Item (i) and the closeness relationship are minor modifications of Proposition 4.1 and Example 4.9 in [6], respectively. Item (ii) directly follows from the µ-convexity of f and g_i, i = 1, ..., m, and the λ-convexity of the function (λ/2)||x||².

We solve the following version of (3) with L(t, x) replaced by L_{σ,λ}(t, x):

(11)    min_{x ∈ X} L_{σ,λ}(t, x).

Specifically, we apply a variant of Nesterov's accelerated gradient method [5, 6] for a given t > t* and a sequence of (σ, λ) pairs with σ increasing to infinity and λ decreasing to zero. We refer to this method as the restarted smoothing gradient method (RSGM) and summarize its steps in Oracle 1. The choice of (σ, λ) and the number of iterations performed by RSGM depend on whether f and g_i, i = 1, ..., m, are strongly convex (µ > 0) or convex (µ = 0). Moreover, RSGM is restarted after N_i accelerated gradient method iterations with a smaller precision γ_{i+1} until this precision falls below the required threshold of −αL(t, x_i). Theorem 3 establishes that Oracle 1 returns a solution x+ ∈ X such that L*(t) ≥ αL(t, x+). Hence, RSGM can be used as an oracle A (see Definition 1).

Theorem 3. Suppose Assumptions 1, 2, and 3 hold. Given a level t ∈ (t*, t_0], a solution x ∈ X, and parameters α > 1 and γ ≥ L(t, x) − L*(t), Oracle 1 returns a solution x+ ∈ X that satisfies L*(t) ≥ αL(t, x+) in at most

( √(40S/µ) + 1 )( log₂( αγ/((α−1)(−L*(t))) ) + 2 ) + 5M √( 160α log(m+1) / ( µ(α−1)(−L*(t)) ) )

iterations of the accelerated gradient method when µ > 0, and at most

20D √( 10Sα/((α−1)(−L*(t))) ) + ( 16MDα/((α−1)(−L*(t))) ) √( 80 log(m+1) ) + log₂( αγ/((α−1)(−L*(t))) ) + 2

such iterations when µ = 0 and D < +∞.

Proof. Let x*_{σ,λ}(t) := argmin_{x∈X} L_{σ,λ}(t, x) and L*_{σ,λ}(t) := L_{σ,λ}(t, x*_{σ,λ}(t)). We will first show by induction that at each iteration i = 0, 1, 2, ... of Oracle 1 the following inequality holds:

(12)    L(t, x_i) − L*(t) ≤ γ_i.
The validity of (12) for i = 0 holds because x_0 = x, γ_0 = γ, and we know L(t, x) − L*(t) ≤ γ. Suppose (12) holds for iteration i. We claim that it also holds for iteration i + 1 in both the strongly convex and convex cases. In fact, at iteration i + 1, we have

L(t, x_{i+1}) − L*(t) ≤ L_{σ_i,λ_i}(t, x_{i+1}) − L*_{σ_i,λ_i}(t) + log(m+1)/σ_i + λ_iD²/2

Oracle 1 Restarted Smoothing Gradient Method: RSGM(t, x, α, γ)
1: Input: level t, initial solution x ∈ X, constant α > 1, and initial precision γ ≥ L(t, x) − L*(t).
2: Set x_0 = x and γ_0 = γ.
3: for i = 0, 1, 2, ... do
4:   if γ_i ≤ −αL(t, x_i) then
5:     Terminate and return x+ = x_i.
6:   end if
7:   Set σ_i = 4 log(m+1)/γ_i if µ > 0 and σ_i = 8 log(m+1)/γ_i if µ = 0; set λ_i = 0 if µ > 0 and λ_i = γ_i/(4D²) if µ = 0.
8:   Starting at x_i, apply the accelerated gradient method [6, Algorithm 1] to the optimization problem min_{x∈X} L_{σ_i,λ_i}(t, x) for
       N_i = ⌈ max{ √(40S/µ), M√(160 log(m+1)/(µγ_i)) } ⌉ if µ > 0,
       N_i = ⌈ max{ √(160SD²/γ_i), (4MD/γ_i)√(80 log(m+1)) } ⌉ if µ = 0,
     iterations, and let x_{i+1} be the solution returned by this method.
9:   Set γ_{i+1} = γ_i/2.
10: end for

≤ (σ_iM² + S) ||x_i − x*_{σ_i,λ_i}(t)||² / N_i² + log(m+1)/σ_i + λ_iD²/2
≤ 4(σ_iM² + S)( L_{σ_i,λ_i}(t, x_i) − L*_{σ_i,λ_i}(t) ) / ( (µ + λ_i)N_i² ) + log(m+1)/σ_i + λ_iD²/2
≤ 5γ_i(σ_iM² + S) / ( (µ + λ_i)N_i² ) + log(m+1)/σ_i + λ_iD²/2
≤ γ_i/8 + γ_i/8 + γ_i/4 = γ_i/2 = γ_{i+1}.

The first inequality follows from Lemma 2; the second from the (σ_iM² + S)-smoothness of L_{σ_i,λ_i}(t, x) and the convergence property, i.e., L_{σ_i,λ_i}(t, x_{i+1}) − L*_{σ_i,λ_i}(t) ≤ (σ_iM² + S)||x_i − x*_{σ_i,λ_i}(t)||²/N_i², of the accelerated gradient method [6, Corollary 1]; the third using the (λ_i + µ)-convexity of L_{σ_i,λ_i}(t, x) in x, which gives ||x_i − x*_{σ_i,λ_i}(t)||² ≤ 2(L_{σ_i,λ_i}(t, x_i) − L*_{σ_i,λ_i}(t))/(µ + λ_i); and the fourth by combining Lemma 2 and the induction hypothesis L(t, x_i) − L*(t) ≤ γ_i.

The rest of the relationships follow from the inequalities 5γ_iσ_iM²/((µ + λ_i)N_i²) ≤ γ_i/8 and 5γ_iS/((µ + λ_i)N_i²) ≤ γ_i/8, which hold for both µ > 0 and µ = 0 according to the definitions of σ_i, λ_i, and N_i.

It can be easily verified that Oracle 1 terminates in at most I iterations, where

I := ⌈ log₂( αγ / ( (α−1)(−L*(t)) ) ) ⌉.

In fact, according to (12), for any i ≥ I we have L(t, x_i) − L*(t) ≤ γ_i = γ_0/2^i ≤ γ_I ≤ ((α−1)/α)(−L*(t)), which implies that αL(t, x_i) ≤ L*(t). Hence γ_i ≤ ((α−1)/α)(−αL(t, x_i)) ≤ −αL(t, x_i), and Oracle 1 terminates.

We next analyze the total number of iterations across all calls of the accelerated gradient method [6, Algorithm 1] during the execution of Oracle 1. Suppose µ > 0. The number of iterations taken by Oracle 1 can be bounded as follows:

Σ_{i=0}^{I} N_i ≤ Σ_{i=0}^{I} [ √(40S/µ) + M√(160 log(m+1)/(µγ_i)) + 1 ]
= ( √(40S/µ) + 1 )( I + 1 ) + M√(160 log(m+1)/(µγ)) Σ_{i=0}^{I} 2^{i/2}
= ( √(40S/µ) + 1 )( I + 1 ) + M√(160 log(m+1)/(µγ)) ( 2^{(I+1)/2} − 1 )/( √2 − 1 )
≤ ( √(40S/µ) + 1 )( log₂( αγ/((α−1)(−L*(t))) ) + 2 ) + 5M √( 160α log(m+1)/( µ(α−1)(−L*(t)) ) ).

The first inequality above is obtained using ⌈a⌉ ≤ a + 1 for a ∈ R_+ and by bounding the maximum defining N_i by the sum of its terms; the equalities follow from γ_i = γ/2^i and the geometric sum formula; and the last inequality holds by substituting for I and using

(13)    2^{(I+1)/2} ≤ 2 √( αγ / ( (α−1)(−L*(t)) ) ),

together with the bound (√2 − 1)^{−1} ≤ 5/2. Suppose µ = 0. The iteration complexity can be bounded using the following steps:

Σ_{i=0}^{I} N_i ≤ Σ_{i=0}^{I} [ 4D√(10S/γ_i) + (4MD/γ_i)√(80 log(m+1)) + 1 ]
= 4D√(10S/γ) Σ_{i=0}^{I} 2^{i/2} + ( 4MD√(80 log(m+1))/γ ) Σ_{i=0}^{I} 2^i + ( I + 1 )
= 4D√(10S/γ) ( 2^{(I+1)/2} − 1 )/( √2 − 1 ) + ( 4MD√(80 log(m+1))/γ )( 2^{I+1} − 1 ) + ( I + 1 )

11 Sα D αl t + 4αMD 80 logm + α γ αl + log t αl t +. The first inequality follows from replacing the maximum in the definition of N i by the sum of its terms and using a a + for a R +, the first and second equalities by rearranging terms and using the geometric sum formula, respectively, and the last inequality by substituting for I and using the inequality Non-smooth functions. In this subsection, we only enforce assumptions and. In other words, the functions f and g i i =,..., m can be non-smooth. Under Assumption, we have max x X ζ L,t x M for any ζ L,t x x Lt, x and t R, where x Lt, x denotes the set of all sub-gradients of Lt, at x with respect to x. As oracle A, we choose the standard sub-gradient method with varying stepsizes from [9] and its variant in [3], respectively, when µ = 0 and µ > 0 to directly solve 3. We present this sub-gradient method for both these choices of µ in Oracle. The iteration complexity of Oracle is well-known and given in Proposition without proof. The proof for µ = 0 can be found in [9,.] and and the one for µ > 0 is given in [3, Section 3.]. Here we only provide arguments to show that Oracle will return a solution with the desired optimality gap required by Algorithm, which then qualifies this algorithm to be used as an oracle A see Definition. Oracle Sub-gradient Descent: SGDt, x, α Input: Level t, initial solution x X and constant α > Set x 0 = x. 3 for i = 0,,,... do 4 Set M if µ > 0, if µ > 0, η i = i + µ and E i := i + µ D DM if µ = 0, if µ = 0. i + M i + 5 Set i j=0 x i = i + i + j + x j if µ > 0, i j=0 i + x j if µ = 0. 6 Let x i+ = arg min x X x x i + η i ζ L,t x i. 7 if E i αlt, x i then 8 Terminate and return x + = x i. 9 end if 0 end for Proposition [3, 9]. Given level t t, t 0 ], solution x X, and a parameter α >, Oracle returns a solution x + X that satisfies L t αlt, x + in at The original algorithms proposed in [3] and [9] are stochastic sub-gradient methods. 
Here, we apply these methods with deterministic sub-gradients.

most

(14)    2αM² / ( µ(α−1)(−L*(t)) )

iterations of the sub-gradient method when µ > 0, and at most

(15)    8α²M²D² / ( (α−1)L*(t) )²

such iterations when µ = 0 and D < +∞.

Proof. We first consider the case µ = 0 and D < +∞. Oracle 2 is then the sub-gradient method of [9] with the Euclidean distance-generating function. From the analysis in [9], we have

L(t, x̄_i) − L*(t) ≤ 2DM/√(i + 1) = E_i.

Hence, when i + 1 ≥ 8α²M²D²/((α−1)L*(t))², we have E_i ≤ ((α−1)/α)(−L*(t)), so that L(t, x̄_i) ≤ L*(t) + ((α−1)/α)(−L*(t)) = L*(t)/α, which implies L*(t) ≥ αL(t, x̄_i). This in turn gives E_i ≤ ((α−1)/α)(−L*(t)) ≤ −L*(t) ≤ −αL(t, x̄_i), which indicates that Oracle 2 terminates in no more than 8α²M²D²/((α−1)L*(t))² iterations.

Now suppose µ > 0. Oracle 2 is then the sub-gradient method in [3]. By the analysis in [3, Section 3], we have

L(t, x̄_i) − L*(t) ≤ 2M²/( µ(i + 1) ) = E_i.

Therefore, when i + 1 ≥ 2αM²/( µ(α−1)(−L*(t)) ), we have L(t, x̄_i) − L*(t) ≤ E_i ≤ ((α−1)/α)(−L*(t)), which implies L*(t) ≥ αL(t, x̄_i). In addition, the algorithm terminates after at most this many iterations because then E_i ≤ ((α−1)/α)(−L*(t)) ≤ −αL(t, x̄_i).

This proposition indicates that the sub-gradient method in [9] and its variant in [3] qualify as oracles A in Algorithm 1 when µ = 0 and µ > 0, respectively (see Definition 1). Since the iteration complexities (14) and (15) do not depend on the initial solution x, the worst-case complexity does not restrict the choice of x ∈ X. In addition, γ is not needed for executing Oracle 2; we therefore omit it as an input to this algorithm even though the generic call to the oracle A in Algorithm 1 includes γ.

4. Overall iteration complexity. We present an adaptive complexity analysis to establish the overall complexity of Algorithm 1 when using the oracles A discussed in Section 3. We first present two lemmas needed for our main complexity result in Theorem 4.

Lemma 3. Suppose Algorithm 1 terminates at iteration K. Then for any k ≤ K − 1 we have

(16)    L*(t_k) < ɛL*(t_0)/α².

Proof. We first use the relationship t_k = t_{k−1} + L(t_{k−1}, x_k) to show that L*(t_k) ≤ L*(t_{k−1}) − L(t_{k−1}, x_k) for any k ≥ 1. Defining x̄_k := argmin_{x∈X} L(t_{k−1}, x), we proceed as follows:

(17)    L*(t_k) ≤ max{ f(x̄_k) − t_{k−1} − L(t_{k−1}, x_k);  g_i(x̄_k), i = 1, ..., m }

= max{ f(x̄_k) − t_{k−1};  g_i(x̄_k) + L(t_{k−1}, x_k), i = 1, ..., m } − L(t_{k−1}, x_k)
≤ L*(t_{k−1}) − L(t_{k−1}, x_k),

where the first inequality in (17) uses t_k = t_{k−1} + L(t_{k−1}, x_k) and the fact that x̄_k need not minimize L(t_k, ·), and the second inequality holds since L(t_{k−1}, x_k) ≤ (1/α)L*(t_{k−1}) ≤ 0, so that g_i(x̄_k) + L(t_{k−1}, x_k) ≤ g_i(x̄_k) and the maximum is at most L(t_{k−1}, x̄_k) = L*(t_{k−1}).

Since Algorithm 1 has not terminated at iteration k for k ≤ K − 1, we have

(18)    ɛL*(t_0) ≥ αɛL(t_0, x_1) > α²L(t_k, x_{k+1}) ≥ α²L*(t_k),

where the first inequality holds by the condition satisfied by oracle A at t_0, the second because the stopping criterion of Algorithm 1 is not satisfied at iteration k, and the third by the definition of L*(t_k). The inequality (16) then follows by rearranging (18).

Lemma 4. Suppose Algorithm 1 terminates at iteration K. The following hold:
1. Σ_{k=0}^{K−1} 1/(−L*(t_k)) ≤ 4α³/( β²(−L*(t_0))ɛ );
2. Σ_{k=0}^{K−1} 1/√(−L*(t_k)) ≤ 4α²/( β^{1.5} √( (−L*(t_0))ɛ ) );
3. Σ_{k=0}^{K−1} 1/( L*(t_k) )² ≤ 16α⁵/( 3β³( (−L*(t_0))ɛ )² ).

Proof. We only prove item 1; items 2 and 3 can be derived following analogous steps. We have

Σ_{k=0}^{K−1} 1/(−L*(t_k)) ≤ Σ_{k=0}^{K−1} 1/( β(t_k − t*) ) ≤ ( 1/( β(t_{K−1} − t*) ) ) Σ_{k=0}^{K−1} (1 − β/α)^{K−1−k}
≤ α/( β²(t_{K−1} − t*) ) ≤ α/( β²(−L*(t_{K−1})) ) < 4α³/( β²(−L*(t_0))ɛ ),

where the first inequality follows from (4); the second from (5); the third from the geometric sum Σ_{j≥0}(1 − β/α)^j ≤ α/β; the fourth from (4); and the fifth from (16) with k = K − 1.
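Returning to Oracle 2: its projected sub-gradient iteration for the convex case (µ = 0) is easy to sketch. The instance below and the error tolerance are illustrative assumptions; the step size η_i = D/(M√(i+1)) and the uniform averaging follow Oracle 2's µ = 0 branch:

```python
import math

def sgd_oracle(t, x0, f, g, gf, gg, proj, D, M, n_iters):
    """Sketch of Oracle 2 (mu = 0): projected sub-gradient descent on
    L(t, x) = max(f(x) - t, g(x)) with eta_i = D/(M*sqrt(i+1)),
    returning the uniform average of the iterates."""
    x, total = x0, 0.0
    for i in range(n_iters):
        # sub-gradient of the max: take the currently active piece
        zeta = gf(x) if f(x) - t >= g(x) else gg(x)
        x = proj(x - (D / (M * math.sqrt(i + 1))) * zeta)
        total += x
    return total / n_iters

# Toy instance: f(x) = x, g(x) = 1 - x over X = [0, 2]; at level t = 1.5,
# L*(t) = -0.25 is attained at x = 1.25.
f = lambda x: x
g = lambda x: 1.0 - x
x_bar = sgd_oracle(t=1.5, x0=0.0, f=f, g=g,
                   gf=lambda x: 1.0, gg=lambda x: -1.0,
                   proj=lambda x: min(max(x, 0.0), 2.0), D=2.0, M=1.0,
                   n_iters=4000)
assert abs(max(f(x_bar) - 1.5, g(x_bar)) - (-0.25)) < 0.1
```

The averaged iterate converges at the familiar O(DM/√N) rate, which is what the E_i term in Oracle 2's stopping test certifies.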

Table 1: Choices of the oracle A and γ_k in Algorithm 1

Smooth (Assumptions 1, 2, & 3), strongly convex (µ > 0): Oracle 1, with any γ_0 ≥ −L*(t_0) if k = 0 and γ_k = −αL(t_{k−1}, x_k) if k ≥ 1.
Smooth (Assumptions 1, 2, & 3), convex (µ = 0 & D < +∞): Oracle 1, with γ_k = 2MD.
Non-smooth (Assumptions 1 & 2), strongly convex (µ > 0): Oracle 2, γ_k not needed (N/A).
Non-smooth (Assumptions 1 & 2), convex (µ = 0 & D < +∞): Oracle 2, γ_k not needed (N/A).

We summarize the choice of the oracle A and γ_k in each iteration of Algorithm 1 in Table 1 and characterize the overall iteration complexity of this algorithm, measured by the total number of gradient iterations, in Theorem 4. The dependence on ɛ in our big-O complexity for solving the constrained convex optimization problem (1)-(2) and the known analogue of this complexity for the unconstrained version of this problem are, interestingly, the same. However, unlike the unconstrained case, the complexity terms in Theorem 4 depend on the condition measure β.

Theorem 4. Suppose the oracle A and γ_k in Algorithm 1 are chosen according to Table 1. Given 0 < ɛ ≤ 1, α > 1, and a g-strictly feasible solution e, Algorithm 1 returns an ɛ-relative optimal and feasible solution to (1)-(2) in at most

O( 1/( β^{1.5} √( (−L*(t_0))ɛ ) ) + log( α/(βɛ) )/log( α/(α−β) ) + log( 1/( (−L*(t_0))ɛ ) ) ),
O( 1/( β²(−L*(t_0))ɛ ) ),  O( 1/( β²(−L*(t_0))ɛ ) ),  and  O( 1/( β³( (−L*(t_0))ɛ )² ) )

gradient iterations, respectively, when the functions f and g_i, i = 1, ..., m, are smooth strongly convex, smooth convex, non-smooth strongly convex, and non-smooth convex. Notice that we have only presented the terms ɛ, β, and L*(t_0) in our big-O complexity; the exact numbers of iterations are given explicitly in the proof.

Remark 1. Based on Table 1, when the functions f, g_i, i = 1, ..., m, are smooth and µ > 0, the input γ_0 used in the first call of Oracle 1 should be bounded below by −L*(t_0). We do not specify the choice of such a γ_0 in our complexity bound (see (23)).
However, one simple choice of γ_0 in this case could be ‖∇f(e)‖²/(2µ), since

−L(t_0) = −min_{x∈X} max{ f(x) − t_0; g_i(x), i = 1, …, m }
 ≤ −min_{x∈X} { f(x) − f(e) }
 ≤ −min_{x∈X} { ⟨∇f(e), x − e⟩ + (µ/2)‖x − e‖² }
 ≤ −min_{x∈R^n} { ⟨∇f(e), x − e⟩ + (µ/2)‖x − e‖² } = ‖∇f(e)‖²/(2µ),

where the first inequality follows from (9) and the third holds since x = e − (1/µ)∇f(e) solves min_{x∈R^n} { ⟨∇f(e), x − e⟩ + (µ/2)‖x − e‖² }.
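This choice of γ_0 is easy to check numerically. The sketch below uses a hypothetical one-dimensional instance (f(x) = x², which is µ-strongly convex with µ = 2, one constraint g(x) = x² − 4, X = [−2, 2], and strictly feasible e = 1) and evaluates −L(t_0) by a fine grid search, confirming −L(t_0) ≤ ‖∇f(e)‖²/(2µ):

```python
# Hypothetical 1-D instance: f(x) = x^2 (mu = 2), g(x) = x^2 - 4 <= 0,
# X = [-2, 2], strictly feasible point e = 1 (g(e) = -3 < 0).
mu, e = 2.0, 1.0
f = lambda x: x ** 2
grad_f = lambda x: 2.0 * x
g = lambda x: x ** 2 - 4.0

t0 = f(e)  # initial level t_0 = f(e), so L(t_0, e) = 0

# Approximate L(t_0) = min_{x in X} max{f(x) - t_0, g(x)} on a fine grid.
xs = [i * 1e-4 - 2.0 for i in range(40001)]          # grid over X = [-2, 2]
L_t0 = min(max(f(x) - t0, g(x)) for x in xs)

gamma0 = grad_f(e) ** 2 / (2.0 * mu)                 # suggested gamma_0
assert -L_t0 <= gamma0 + 1e-9                        # -L(t_0) <= gamma_0 holds
print(-L_t0, gamma0)
```

Here the minimum is attained at x = 0 where the constraint is inactive, giving −L(t_0) = 1 and γ_0 = 1; the bound is tight on this instance because f is exactly quadratic.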

Proof. [Smooth strongly convex case] We first show that we can guarantee γ_k ≥ L(t_k, x_k) − L(t_k) by choosing γ_0 ≥ −L(t_0) and γ_k = −αL(t_{k−1}, x_k) for k ≥ 1. Hence γ_k can be used as an input to the oracle. When k = 0, we have L(t_0, x_0) = 0 and hence L(t_0, x_0) − L(t_0) = −L(t_0) ≤ γ_0. Take k ≥ 1. We show that

L(t_k, x_k) = max{ f(x_k) − t_{k−1} − L(t_{k−1}, x_k); g_i(x_k), i = 1, …, m }
 = max{ f(x_k) − t_{k−1}; g_i(x_k) + L(t_{k−1}, x_k), i = 1, …, m } − L(t_{k−1}, x_k)
 ≤ max{ L(t_{k−1}, x_k); 2L(t_{k−1}, x_k) } − L(t_{k−1}, x_k)
 ≤ L(t_{k−1}, x_k) − L(t_{k−1}, x_k) = 0,

where the first and the second inequalities hold because L(t_{k−1}, x_k) ≤ (1/α)L(t_{k−1}) < 0. Furthermore, since L(t) is a decreasing function in t, we have L(t_k) ≥ L(t_{k−1}) since t_k ≤ t_{k−1}. Combining this with (19) yields

L(t_k, x_k) − L(t_k) ≤ −L(t_k) ≤ −L(t_{k−1}) ≤ −αL(t_{k−1}, x_k) = γ_k.

We next characterize the iteration complexity of oracle A in each iteration of Algorithm 1 using Theorem 3. By virtue of this theorem, the total number of gradient iterations performed by Algorithm 1 is at most

(20)   Σ_{k=0}^{K} [ (40S/µ)^{1/2} log( αγ_k/(−αL(t_k)) ) + 60α² log(m+1) · 3M/(µα(−L(t_k)))^{1/2} ]
 ≤ (40S/µ)^{1/2} [ log( αγ_0/(−αL(t_1)) ) + Σ_{k=1}^{K} (α/(α−1)) log( α(−αL(t_{k−1}, x_k))/(−αL(t_k)) ) ]
  + Σ_{k=0}^{K} 60α² log(m+1) · 3M/(µα(−L(t_k)))^{1/2},

where we have replaced γ_k by −αL(t_{k−1}, x_k) for k > 0, and used the inequality −αL(t_k) ≤ −αL(t_{k−1}) ≤ α(−αL(t_{k−1}, x_k)). The first sum in (20) can be bounded as follows:

(21)   Σ_{k=1}^{K} (α/(α−1)) log( α(−αL(t_{k−1}, x_k))/(−αL(t_k)) )
 = (α/(α−1)) [ K log α + log( L(t_0, x_1)/L(t_K) ) + Σ_{k=1}^{K−1} log( L(t_k, x_{k+1})/L(t_k) ) ]

 ≤ (α/(α−1)) [ K log α + log( L(t_0, x_1)/L(t_K) ) ]
 ≤ (α/(α−1)) [ (log_{α/(α−1)}(1/β) + 1) log α + log(α/(βε)) ],

where the first inequality holds because L(t_k, x_{k+1})/L(t_k) ≤ 1 for all k = 0, 1, …, K−1, and the second inequality follows from setting K equal to ⌈log_{α/(α−1)}(1/β)⌉ + 1, using (16) for k = K, and the inequality L(t_0) ≤ L(t_0, x_1). The second sum in (20) can be upper bounded using Item 2 of Lemma 4. Specifically,

(22)   Σ_{k=0}^{K} 60α² log(m+1) · 3M/(µα(−L(t_k)))^{1/2} ≤ 720Mα⁴ log(m+1) / ( β^{3/2} (µα(−L(t_0))ε)^{1/2} ).

Adding (21) and (22), we obtain the following upper bound on the total number of iterations (20) required by Algorithm 1:

(23)   (40S/µ)^{1/2} [ (α/(α−1))( (log_{α/(α−1)}(1/β) + 1) log α + log(α/(βε)) ) + log( αγ_0/(−αL(t_1)) ) ]
  + 720Mα⁴ log(m+1) / ( β^{3/2} (µα(−L(t_0))ε)^{1/2} ).

[Smooth convex case] Notice that γ_k = MD can be used as an input to the oracle at every iteration k since

L(t_k, x_k) − L(t_k) ≤ ⟨ζ_{L,t_k}(x_k), x_k − x_k*⟩ ≤ ‖ζ_{L,t_k}(x_k)‖ ‖x_k − x_k*‖ ≤ MD,

where x_k* minimizes L(t_k, ·) over X, the first inequality follows from the convexity of L(t, x) in x, the second using the Cauchy-Schwarz inequality, and the last using the definition of M and D. By Theorems 2 and 3, the total number of iterations required by Algorithm 1 is equal to

Σ_{k=0}^{K} [ ( 20αSD²/(−αL(t_k)) )^{1/2} + ( 4MDα/(−αL(t_k)) ) ( 80 log(m+1) + log( αγ_k/(−αL(t_k)) ) + 1 ) ].

An upper bound on this quantity is

Σ_{k=0}^{K} [ ( 20αSD²/(−αL(t_k)) )^{1/2} + ( 4MDα/(−αL(t_k)) ) ( 80 log(m+1) + 1 ) + 4M²D²α²/(−αL(t_k))² ],

which is obtained by replacing γ_k = MD for all k and using the inequality log a ≤ a for a > 0. Next, using Lemma 4, we obtain the required upper bound on the total number of iterations taken by Algorithm 1 as

4Dα² (20αS)^{1/2} / ( β^{3/2} ((−αL(t_0))ε)^{1/2} ) + ( 16MDα⁴/(β²(−αL(t_0))ε) ) ( 80 log(m+1) + 1 ) + 64M²D²α⁷ / ( 3β³ ((−αL(t_0))ε)² ).
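The key observation in the smooth convex case is that the initial gap L(t_k, x_k) − L(t_k) passed to the oracle never exceeds MD. A short sketch on a hypothetical one-dimensional instance (f(x) = x², g(x) = x − 0.5, X = [−1, 1], a fixed level t, and grid-based minimization) verifies this Cauchy-Schwarz bound:

```python
# Hypothetical smooth instance on X = [-1, 1]: f(x) = x^2, g(x) = x - 0.5.
f = lambda x: x * x
g = lambda x: x - 0.5
t = 0.3
L = lambda x: max(f(x) - t, g(x))   # level-set function L(t, .) at fixed t

xs = [i * 1e-4 - 1.0 for i in range(20001)]   # grid over X = [-1, 1]
L_min = min(L(x) for x in xs)                 # L(t) approximated on the grid
gap = max(L(x) for x in xs) - L_min           # worst initial gap L(t, x) - L(t)

M = 2.0   # bound on subgradient norms of L(t, .) over X: |2x| <= 2 and |1| <= 2
D = 2.0   # diameter of X
assert gap <= M * D + 1e-9                    # gamma_k = M*D bounds the gap
print(gap, M * D)
```

On this instance the worst-case gap is 1.0 while MD = 4, so the oracle input γ_k = MD is a valid (if loose) upper bound, which is all the complexity argument needs.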

[Non-smooth strongly convex case] By Lemma 4 and Proposition 1, the total number of iterations required by Algorithm 1 can be bounded above in a straightforward manner:

Σ_{k=0}^{K} αM²/(µ(−αL(t_k))) ≤ 4M²α⁴/(µβ²(−αL(t_0))ε).

[Non-smooth convex case] In a similar fashion, by Lemma 4 and Proposition 1, the total number of iterations required by Algorithm 1 is upper bounded by

Σ_{k=0}^{K} 8α²M²D²/(−αL(t_k))² ≤ 43M²D²α⁷/(β³((−αL(t_0))ε)²).

More information

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization

More information

Convex Optimization and l 1 -minimization

Convex Optimization and l 1 -minimization Convex Optimization and l 1 -minimization Sangwoon Yun Computational Sciences Korea Institute for Advanced Study December 11, 2009 2009 NIMS Thematic Winter School Outline I. Convex Optimization II. l

More information

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Zhaosong Lu October 5, 2012 (Revised: June 3, 2013; September 17, 2013) Abstract In this paper we study

More information

Primal/Dual Decomposition Methods

Primal/Dual Decomposition Methods Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients

More information

Comparison of Modern Stochastic Optimization Algorithms

Comparison of Modern Stochastic Optimization Algorithms Comparison of Modern Stochastic Optimization Algorithms George Papamakarios December 214 Abstract Gradient-based optimization methods are popular in machine learning applications. In large-scale problems,

More information

On Generalized Primal-Dual Interior-Point Methods with Non-uniform Complementarity Perturbations for Quadratic Programming

On Generalized Primal-Dual Interior-Point Methods with Non-uniform Complementarity Perturbations for Quadratic Programming On Generalized Primal-Dual Interior-Point Methods with Non-uniform Complementarity Perturbations for Quadratic Programming Altuğ Bitlislioğlu and Colin N. Jones Abstract This technical note discusses convergence

More information

Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization

Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Frank E. Curtis, Lehigh University Beyond Convexity Workshop, Oaxaca, Mexico 26 October 2017 Worst-Case Complexity Guarantees and Nonconvex

More information

arxiv: v1 [math.oc] 7 Dec 2018

arxiv: v1 [math.oc] 7 Dec 2018 arxiv:1812.02878v1 [math.oc] 7 Dec 2018 Solving Non-Convex Non-Concave Min-Max Games Under Polyak- Lojasiewicz Condition Maziar Sanjabi, Meisam Razaviyayn, Jason D. Lee University of Southern California

More information

Subgradient Methods in Network Resource Allocation: Rate Analysis

Subgradient Methods in Network Resource Allocation: Rate Analysis Subgradient Methods in Networ Resource Allocation: Rate Analysis Angelia Nedić Department of Industrial and Enterprise Systems Engineering University of Illinois Urbana-Champaign, IL 61801 Email: angelia@uiuc.edu

More information

CS 435, 2018 Lecture 3, Date: 8 March 2018 Instructor: Nisheeth Vishnoi. Gradient Descent

CS 435, 2018 Lecture 3, Date: 8 March 2018 Instructor: Nisheeth Vishnoi. Gradient Descent CS 435, 2018 Lecture 3, Date: 8 March 2018 Instructor: Nisheeth Vishnoi Gradient Descent This lecture introduces Gradient Descent a meta-algorithm for unconstrained minimization. Under convexity of the

More information

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725 Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h

More information

Lecture 14: Newton s Method

Lecture 14: Newton s Method 10-725/36-725: Conve Optimization Fall 2016 Lecturer: Javier Pena Lecture 14: Newton s ethod Scribes: Varun Joshi, Xuan Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes

More information

Nonlinear Programming

Nonlinear Programming Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week

More information

A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints

A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints Comput. Optim. Appl. manuscript No. (will be inserted by the editor) A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints Ion

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Iteration-complexity of first-order augmented Lagrangian methods for convex programming

Iteration-complexity of first-order augmented Lagrangian methods for convex programming Math. Program., Ser. A 016 155:511 547 DOI 10.1007/s10107-015-0861-x FULL LENGTH PAPER Iteration-complexity of first-order augmented Lagrangian methods for convex programming Guanghui Lan Renato D. C.

More information

Minimizing Finite Sums with the Stochastic Average Gradient Algorithm

Minimizing Finite Sums with the Stochastic Average Gradient Algorithm Minimizing Finite Sums with the Stochastic Average Gradient Algorithm Joint work with Nicolas Le Roux and Francis Bach University of British Columbia Context: Machine Learning for Big Data Large-scale

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Proximal-Gradient Mark Schmidt University of British Columbia Winter 2018 Admin Auditting/registration forms: Pick up after class today. Assignment 1: 2 late days to hand in

More information

3.10 Lagrangian relaxation

3.10 Lagrangian relaxation 3.10 Lagrangian relaxation Consider a generic ILP problem min {c t x : Ax b, Dx d, x Z n } with integer coefficients. Suppose Dx d are the complicating constraints. Often the linear relaxation and the

More information

Optimization Tutorial 1. Basic Gradient Descent

Optimization Tutorial 1. Basic Gradient Descent E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

More information

Revisiting Approximate Linear Programming Using a Saddle Point Based Reformulation and Root Finding Solution Approach

Revisiting Approximate Linear Programming Using a Saddle Point Based Reformulation and Root Finding Solution Approach Revisiting Approximate Linear Programming Using a Saddle Point Based Reformulation and Root Finding Solution Approach Qihang Lin Selvaprabu Nadarajah Negar Soheili Abstract Approximate linear programs

More information

Lecture 15 Newton Method and Self-Concordance. October 23, 2008

Lecture 15 Newton Method and Self-Concordance. October 23, 2008 Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications

More information

A derivative-free nonmonotone line search and its application to the spectral residual method

A derivative-free nonmonotone line search and its application to the spectral residual method IMA Journal of Numerical Analysis (2009) 29, 814 825 doi:10.1093/imanum/drn019 Advance Access publication on November 14, 2008 A derivative-free nonmonotone line search and its application to the spectral

More information

arxiv: v1 [math.oc] 5 Dec 2014

arxiv: v1 [math.oc] 5 Dec 2014 FAST BUNDLE-LEVEL TYPE METHODS FOR UNCONSTRAINED AND BALL-CONSTRAINED CONVEX OPTIMIZATION YUNMEI CHEN, GUANGHUI LAN, YUYUAN OUYANG, AND WEI ZHANG arxiv:141.18v1 [math.oc] 5 Dec 014 Abstract. It has been

More information

Global convergence of the Heavy-ball method for convex optimization

Global convergence of the Heavy-ball method for convex optimization Noname manuscript No. will be inserted by the editor Global convergence of the Heavy-ball method for convex optimization Euhanna Ghadimi Hamid Reza Feyzmahdavian Mikael Johansson Received: date / Accepted:

More information

Large-scale Stochastic Optimization

Large-scale Stochastic Optimization Large-scale Stochastic Optimization 11-741/641/441 (Spring 2016) Hanxiao Liu hanxiaol@cs.cmu.edu March 24, 2016 1 / 22 Outline 1. Gradient Descent (GD) 2. Stochastic Gradient Descent (SGD) Formulation

More information