Stochastic Semi-Proximal Mirror-Prox
Niao He (Georgia Institute of Technology) and Zaid Harchaoui (NYU, Inria)

Abstract

We present a direct extension of the Semi-Proximal Mirror-Prox algorithm [3] for minimizing convex composite problems whose objectives consist of three parts: a convex loss represented by a stochastic oracle, a proximal-friendly regularizer, and an LMO-friendly regularizer. The algorithm leverages the stochastic oracle, proximal operators, and linear minimization over the problem's domain, and it is optimal in two respects: i) optimal in the number of calls to the stochastic oracle representing the objective, and ii) optimal in the number of calls to the linear minimization oracle representing the problem's domain, in the smooth saddle point case.

1 Introduction

The last decade has seen significant interest in minimizing composite functions of the form

min_{x ∈ X} (1/n) Σ_{i=1}^n f_i(x) + h(x),

where the f_i are convex continuously differentiable functions and h is a convex but possibly nondifferentiable function. Such problems arise ubiquitously in machine learning, where the first term is usually an empirical risk and h is a regularizer. Mainstream algorithms address two challenges: i) the large number of summands, and ii) the nonsmoothness of the regularization term. To deal with the finite sum, a variety of stochastic and incremental gradient methods have been developed that use only one component, or a mini-batch of components, at a time. To handle the nonsmooth regularization and exploit the underlying structure, two kinds of algorithms have been widely studied: proximal-type and conditional-gradient-type methods. Proximal-type methods ([10, 1, 12]) require the computation of a composite proximal operator at each iteration, i.e., solving problems of the form

min_{x ∈ X} { (1/2)‖x‖² − ⟨ξ, x⟩ + α h(x) },

given an input vector ξ and a positive scalar α. Functions h that admit easy-to-compute proximal operators are called proximal-friendly.
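For a concrete instance of a proximal-friendly regularizer, take h(x) = ‖x‖₁ over X = Rⁿ: the composite proximal operator above then reduces to coordinate-wise soft-thresholding. A minimal sketch (the function name is ours, not from the paper):

```python
import numpy as np

def prox_l1(xi, alpha):
    """Composite proximal operator for h(x) = ||x||_1 on X = R^n:
    argmin_x { (1/2)||x||^2 - <xi, x> + alpha * ||x||_1 },
    which works out to soft-thresholding of xi at level alpha."""
    return np.sign(xi) * np.maximum(np.abs(xi) - alpha, 0.0)

print(prox_l1(np.array([3.0, -0.5, 1.2]), 1.0))
```

Setting the subgradient of the objective to zero gives x = sign(ξ)·max(|ξ| − α, 0) coordinate-wise, which is why the l1 prox is considered cheap.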
In contrast, conditional-gradient-type methods [2, 11] operate with a linear minimization oracle (LMO) at each iteration, i.e., they solve problems of the form

min_{x ∈ X} { ⟨ξ, x⟩ + α h(x) },

which is often much cheaper than computing a proximal operator. For instance, in the case of the nuclear norm, the LMO only requires computing the leading pair of singular vectors, which is orders of magnitude faster than the full singular value decomposition required to compute the proximal operator. Such h are called LMO-friendly.

The scope of this paper. Despite much success in this regime, little work has been devoted to the situation where the regularization is a mixture of proximal-friendly and LMO-friendly components. Such mixtures are often introduced to promote several desired properties of the solution simultaneously, such as sparsity and low rank. While recent work [3, 7] considers only the deterministic case, in this paper we focus on the stochastic setting and aim to solve problems such as (i) stochastic composite optimization and (ii) stochastic composite saddle point problems:

min_{x=[x₁; x₂] ∈ X} E_ξ[f(x, ξ)] + h₁(x₁) + h₂(x₂)    (1)

min_{x₁ ∈ X₁} max_{x₂ ∈ X₂} E_ξ[Φ(x₁, x₂, ξ)] + h₁(x₁) − h₂(x₂)    (2)
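The nuclear-norm example above can be made concrete: over the nuclear-norm ball {X : ‖X‖_* ≤ r}, the linear form ⟨Ξ, X⟩ is minimized at the rank-one matrix −r·u₁v₁ᵀ built from the leading singular pair of Ξ. A small sketch (function name ours; a dense SVD stands in for the cheap partial SVD one would use at scale):

```python
import numpy as np

def lmo_nuclear(Xi, radius):
    """LMO for the nuclear-norm ball {X : ||X||_* <= radius}:
    argmin over that ball of <Xi, X> is -radius * u1 v1^T,
    where (u1, v1) is the leading singular pair of Xi."""
    U, s, Vt = np.linalg.svd(Xi)  # only the top pair is needed; use a partial SVD at scale
    return -radius * np.outer(U[:, 0], Vt[0, :])
```

This is the reason LMO-friendliness matters for low-rank problems: one leading singular pair (e.g., by a Lanczos or power method) versus the full decomposition the proximal operator needs.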
where E[f(x, ξ)] is convex in x ∈ X; E[Φ(x₁, x₂, ξ)] is convex in x₁ ∈ X₁ and concave in x₂ ∈ X₂; h₁(x₁) is proximal-friendly and h₂(x₂) is LMO-friendly. Such problems indeed arise in many sparse and low-rank regularized models used in recommendation systems and social network prediction. To address them in a unified manner, we focus on a broad class of variational inequalities with mixed structures, which further generalizes problems (1) and (2).

Our contribution. We develop a stochastic variant of the Semi-Proximal Mirror-Prox algorithm proposed in [3] that supports and leverages stochastic oracles, proximal operators, and linear minimization oracles, achieving the best of three worlds. The algorithm extends the usual conditional gradient method to the stochastic setting and enjoys, in terms of LMO calls, O(1/ε²) complexity in the smooth case (e.g., f and Φ in (1) and (2) smooth) as well as O(1/ε⁴) complexity in the general nonsmooth case. In both situations, the algorithm attains the optimal O(1/ε²) complexity in the number of stochastic oracle calls. To the best of our knowledge, both the algorithm and the theoretical results presented here are novel.

2 Stochastic Semi-Proximal Mirror-Prox

2.1 The situation

Structured variational inequalities. We consider the variational inequality VI(X, F):

find x* ∈ X such that ⟨F(x), x − x*⟩ ≥ 0 for all x ∈ X,

with domain X and operator F satisfying assumptions (A.1)–(A.4) below.

(A.1) The set X ⊂ E_u × E_v is closed convex and its projection P X = {u : x = [u; v] ∈ X} ⊂ U, where U is convex and closed, and E_u, E_v are Euclidean spaces;

(A.2) The function ω(·) : U → R is continuously differentiable and 1-strongly convex w.r.t. some norm ‖·‖.
This defines the Bregman distance V_u(u′) = ω(u′) − ω(u) − ⟨ω′(u), u′ − u⟩ ≥ (1/2)‖u′ − u‖²;

(A.3) The operator F(x = [u; v]) : X → E_u × E_v is monotone and of the form F(u, v) = [F_u(u); F_v], with F_v ∈ E_v a constant and F_u(u) ∈ E_u satisfying, for some L < ∞ and M < ∞,

‖F_u(u) − F_u(u′)‖_* ≤ L‖u − u′‖ + M for all u, u′ ∈ U;

(A.4) The linear form ⟨F_v, v⟩ of [u; v] ∈ E_u × E_v is bounded from below on X and is coercive on X w.r.t. v: whenever [u_t; v_t] ∈ X, t = 1, 2, ..., is a sequence such that {u_t} is bounded and ‖v_t‖₂ → ∞ as t → ∞, we have ⟨F_v, v_t⟩ → ∞ as t → ∞.

Semi-structured variational inequalities. The class of semi-structured variational inequalities goes beyond assumptions (A.1)–(A.4) by assuming additional sub-structure. This structure is consistent with what we call a semi-proximal setup, which encompasses both the regular proximal setup and the regular linear minimization setup as special cases. Indeed, we further assume:

(S.1) Proximal setup for X: we assume that E_u = E_{u1} × E_{u2}, E_v = E_{v1} × E_{v2}, U ⊂ U₁ × U₂, and X = X₁ × X₂ with X_i ⊂ E_{ui} × E_{vi} and P_i X = {u_i : [u_i; v_i] ∈ X_i} ⊂ U_i for i = 1, 2, where U₁ is convex and closed and U₂ is convex and compact. We also assume that ω(u) = ω₁(u₁) + ω₂(u₂) and ‖u‖ = ‖u₁‖_{E_{u1}} + ‖u₂‖_{E_{u2}}, with ω₂(·) : U₂ → R continuously differentiable and such that

ω₂(u₂′) ≤ ω₂(u₂) + ⟨ω₂′(u₂), u₂′ − u₂⟩ + (L₀/κ)‖u₂′ − u₂‖^κ_{E_{u2}} for all u₂, u₂′ ∈ U₂,

for a particular 1 < κ ≤ 2 and L₀ < ∞. Furthermore, we assume that the ‖·‖_{E_{u2}}-diameter of U₂ is bounded by some D > 0.

(S.2) Partition of F: the operator F induced by the above partition of X₁ and X₂ can be written as F(x) = [F_u(u); F_v] with F_u(u) = [F_{u1}(u₁, u₂); F_{u2}(u₁, u₂)] and F_v = [F_{v1}; F_{v2}].

(S.3) Proximal mapping on X₁: we assume that for any η₁ ∈ E_{u1} and α > 0 we have at our disposal easy-to-compute prox-mappings of the form

Prox_{ω1}(η₁, α) := argmin_{x₁=[u₁;v₁] ∈ X₁} { ω₁(u₁) + ⟨η₁, u₁⟩ + α⟨F_{v1}, v₁⟩ }.
(S.4) Linear minimization oracle for X₂: we assume that we have at our disposal a composite linear minimization oracle (LMO) which, given any input η₂ ∈ E_{u2} and α > 0, returns an optimal solution to the minimization problem with linear form, that is,

LMO(η₂, α) := argmin_{x₂=[u₂;v₂] ∈ X₂} { ⟨η₂, u₂⟩ + α⟨F_{v2}, v₂⟩ }.

Stochastic oracles. We are interested in the situation where we only have access to noisy information on F_u(u). More specifically, we assume that the operator is represented by a stochastic oracle which, for any u ∈ U, returns a vector g(u, ξ) satisfying

(C.1) Unbiasedness and bounded variance: E[g(u, ξ)] = F_u(u) and E[‖g(u, ξ) − F_u(u)‖_*²] ≤ σ²;

(C.2) Light tail: E[exp{‖g(u, ξ) − F_u(u)‖_*²/σ²}] ≤ exp{1} for some σ > 0,

where ‖·‖_* is the same dual norm as in (A.3). Note that by Jensen's inequality, (C.2) implies (C.1).

2.2 Stochastic Semi-Proximal Mirror-Prox

We present our Stochastic Semi-Proximal Mirror-Prox in Algorithm 1. The algorithm blends composite Mirror-Prox and composite Conditional Gradient to extract the best efficiency from the mixed structure. At each iteration t, we estimate the operator F by averaging the vectors returned by the stochastic oracle. For the sub-domain X₂ given by the LMO, instead of computing the prox-mapping exactly, we mimic it via a conditional gradient algorithm (denoted CCG) taken directly from [3]. For the sub-domain X₁, we compute the prox-mapping exactly.

Algorithm 1 Stochastic Semi-Proximal Mirror-Prox Algorithm for Semi-VI(X, F)

Input: stepsizes γ_t > 0, accuracies ε_t ≥ 0, batch sizes m_t, t = 1, 2, ...
[1] Initialize x¹ = [x¹₁; x¹₂] ∈ X, where x¹₁ = [u¹₁; v¹₁] and x¹₂ = [u¹₂; v¹₂].
for t = 1, 2, ..., T do
[2] Set u^t = [u^t₁; u^t₂], compute [g^t₁; g^t₂] = (1/m_t) Σ_{j=1}^{m_t} g(u^t, ξ^t_j) and y^t = [y^t₁; y^t₂] such that
    y^t₁ := [û^t₁; v̂^t₁] = Prox_{ω1}(γ_t g^t₁ − ω₁′(u^t₁), γ_t),
    y^t₂ := [û^t₂; v̂^t₂] = CCG(X₂, ω₂(·) + ⟨γ_t g^t₂ − ω₂′(u^t₂), ·⟩, γ_t F_{v2}; ε_t).
[3] Set û^t = [û^t₁; û^t₂], compute [ĝ^t₁; ĝ^t₂] = (1/m_t) Σ_{j=m_t+1}^{2m_t} g(û^t, ξ^t_j) and x^{t+1} = [x^{t+1}₁; x^{t+1}₂] such that
    x^{t+1}₁ := [u^{t+1}₁; v^{t+1}₁] = Prox_{ω1}(γ_t ĝ^t₁ − ω₁′(u^t₁), γ_t),
    x^{t+1}₂ := [u^{t+1}₂; v^{t+1}₂] = CCG(X₂, ω₂(·) + ⟨γ_t ĝ^t₂ − ω₂′(u^t₂), ·⟩, γ_t F_{v2}; ε_t).
end for
Output: x̄_T := [ū_T; v̄_T] = (Σ_{t=1}^T γ_t)⁻¹ Σ_{t=1}^T γ_t y^t

The CCG routine [3] is designed to solve smooth semi-linear problems min_{x=[u;v] ∈ X} { φ⁺(u, v) = φ(u) + ⟨c, v⟩ }. For the input (X, φ, c; ε), the algorithm CCG(X, φ, c; ε) works as follows:

(a) x₁ := [u₁; v₁] ∈ X;
(b) x̂_t := [û_t; v̂_t] = argmin_{x=[u;v] ∈ X} { ⟨φ′(u_t), u⟩ + ⟨c, v⟩ }; compute δ_t = ⟨(φ⁺)′(x_t), x_t − x̂_t⟩ and return x_t if δ_t ≤ ε;
(c) x_{t+1} := [u_{t+1}; v_{t+1}] such that φ⁺(x_{t+1}) ≤ φ⁺(x_t + (2/(t+1))(x̂_t − x_t)).

We provide the convergence analysis in the next section.

3 Convergence Results

3.1 Main Results

Let us denote the execution protocol I_T = {y_t, F(y_t)}_{t=1}^T and the collection λ^T = {λ_t}_{t=1}^T with λ_t = γ_t / Σ_{τ=1}^T γ_τ; the resolution on any set X′ ⊂ X is

Res(X′ | I_T, λ^T) = sup_{x ∈ X′} Σ_{t=1}^T λ_t ⟨F(y_t), y_t − x⟩.

The resolution can be used to certify the accuracy of the approximate solution x̄_T = Σ_t λ_t y_t for generic convex problems, not limited to variational inequalities (see [8] for details). We arrive at the following.
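The conditional-gradient inner routine can be sketched in a stripped-down Euclidean form. The version below drops the paper's Bregman setup and the separate linear term ⟨c, v⟩, keeping only the core mechanism: linearize, call the LMO, stop on the accuracy certificate δ_t ≤ ε, and take the 2/(t+1) step (all names are ours):

```python
import numpy as np

def l1_lmo(g):
    """LMO for the unit l1 ball: a vertex minimizing <g, s>."""
    i = np.argmax(np.abs(g))
    s = np.zeros_like(g)
    s[i] = -np.sign(g[i])
    return s

def ccg(grad, lmo, x0, eps, max_iter=1000):
    """Simplified Euclidean conditional-gradient routine with the
    duality-gap stopping rule delta_t <= eps and the 2/(t+1) step."""
    x = np.array(x0, dtype=float)
    for t in range(1, max_iter + 1):
        g = grad(x)
        x_hat = lmo(g)              # argmin over the feasible set of <g, s>
        delta = g @ (x - x_hat)     # accuracy certificate delta_t
        if delta <= eps:
            break
        x = x + (2.0 / (t + 1)) * (x_hat - x)
    return x

# project b onto the unit l1 ball: minimize (1/2)||x - b||^2
b = np.array([1.5, 0.0])
x = ccg(lambda x: x - b, l1_lmo, np.zeros(2), eps=1e-6)
# x converges to the vertex [1, 0]
```

The certificate δ_t upper-bounds the suboptimality by convexity, which is exactly what Algorithm 1 needs: it runs CCG only until the inexactness of the simulated prox-mapping drops below ε_t.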
Theorem 3.1. Let the stepsizes satisfy 0 < γ_t ≤ (√3 L)⁻¹ and let Θ[X′] = sup_{[u;v] ∈ X′} V_{u¹}(u) for any X′ ⊂ X. For a sequence of inexact prox-mappings with inexactness ε_t ≥ 0 and batch sizes m_t > 0, we have under assumption (C.1) that

E[Res(X′ | I_T, λ^T)] ≤ M₀(T) := (Σ_t γ_t)⁻¹ ( 2Θ[X′] + Σ_t (7γ_t²/2)(M² + 2σ²/m_t) + Σ_t 2ε_t ).

Moreover, if assumption (C.2) holds, then for any Λ > 0,

Prob{ Res(X′ | I_T, λ^T) > M₀(T) + Λ M₁(T) } ≤ exp{−Λ²/3} + exp{−Λ},

where M₁(T) = (Σ_t γ_t)⁻¹ ( 7 Σ_t γ_t² σ²/m_t + 3 √( 2σ²Θ[X] Σ_t γ_t²/m_t ) ).

As a corollary, we can derive (based on intermediate results in [8]):

(i) when F is a monotone vector field, the resulting efficiency estimate applies to the dual gap of the variational inequality, i.e., E[ε_VI(x̄_T | X, F)] ≤ M₀(T);

(ii) when F stems from the (sub)gradient of a convex minimization problem min_{x∈X} f(x) (for instance, problem (1)) with optimal solution x*, the resulting efficiency estimate applies to the suboptimality, i.e., E[f(x̄_T) − f(x*)] ≤ M₀(T);

(iii) when F stems from a convex-concave saddle point problem min_{x₁∈X₁} max_{x₂∈X₂} Φ(x₁, x₂) (for instance, problem (2)), along with the two induced convex optimization problems

Opt(P) = min_{x₁∈X₁} [ Φ̄(x₁) = sup_{x₂∈X₂} Φ(x₁, x₂) ]    (P)
Opt(D) = max_{x₂∈X₂} [ Φ̲(x₂) = inf_{x₁∈X₁} Φ(x₁, x₂) ]    (D)

the resulting efficiency estimate is inherited by both the primal and dual suboptimality gaps, i.e., E[Φ̄(x̄₁,T) − Opt(P)] ≤ M₀(T) and E[Opt(D) − Φ̲(x̄₂,T)] ≤ M₀(T).

3.2 When F_u is Lipschitz continuous (M = 0)

In the case when F_u is a Lipschitz continuous monotone operator with L > 0 and M = 0, we have:

Proposition 3.1. Suppose assumptions (A.1)–(A.4) and (S.1)–(S.4) hold with M = 0, and the proximal setup on U₂ is the Euclidean setup.
Set the stepsizes γ_t = (√2 L)⁻¹ and batch sizes m_t = O(γ_t² T σ²/Θ[X]), t = 1, ..., T. Then, for the Stochastic Semi-Proximal Mirror-Prox algorithm to return a stochastic ε-solution to the variational inequality VI(X, F) represented by the stochastic oracle in (C.1)–(C.2), the total number of stochastic oracle calls required does not exceed

N_SO = O(1) σ²Θ[X]/ε²,

and the total number of calls to the linear minimization oracle does not exceed

N_LMO = O(1) L²D²Θ[X]/ε²,

where σ², D, Θ[X] are defined previously.

3.3 When F_u is bounded (L = 0)

In the case when F_u is a uniformly bounded monotone operator with M > 0 and L = 0, we have:

Proposition 3.2. Suppose the same assumptions hold with L = 0. Set the stepsizes γ_t = O(1)√(Θ[X])/(M√T) and batch sizes m_t = O(1)σ²/M², t = 1, ..., T. Then, for the Stochastic Semi-Proximal Mirror-Prox algorithm to return a stochastic ε-solution to the variational inequality VI(X, F), the total number of stochastic oracle calls required does not exceed

N_SO = O(1) σ²Θ[X]/ε²,

and the total number of calls to the linear minimization oracle does not exceed

N_LMO = O(1) M⁴D²Θ[X]/ε⁴.

Discussion. The stochastic variant of the Semi-Proximal Mirror-Prox algorithm is designed to solve a broad class of variational inequalities covering the stochastic composite minimization and saddle point problems (1) and (2). The algorithm enjoys the optimal complexity bounds, i.e., O(1/ε²), both in the number of calls to the stochastic oracle (see [9]) and in the number of calls to the linear minimization oracle (see [6]) when the underlying problem is in saddle point form and the associated monotone operator is Lipschitz continuous. In the general nonsmooth situation, the algorithm still enjoys the optimal O(1/ε²) complexity in the number of stochastic oracle calls, but the number of linear minimization oracle calls becomes O(1/ε⁴).
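The two-step (extrapolate, then update) structure at the heart of Algorithm 1 can be illustrated on a toy problem. The sketch below is a plain Euclidean extragradient version with a mini-batch oracle, applied to the bilinear saddle point min_x max_y xy, whose monotone operator is F(x, y) = (y, −x); the Bregman setup, the LMO sub-domain, and all names here are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def oracle(z, sigma=0.1, m=16):
    """Mini-batch stochastic oracle for F(x, y) = (y, -x) (the operator of
    min_x max_y x*y); averaging m noisy samples reduces the variance to
    sigma^2 / m, in the spirit of (C.1)."""
    x, y = z
    noise = rng.normal(0.0, sigma, size=(m, 2)).mean(axis=0)
    return np.array([y, -x]) + noise

def stochastic_mirror_prox(z0, gamma, T):
    """Euclidean (extragradient) sketch of the two-step Mirror-Prox update:
    an extrapolation step using the oracle at z_t, then an update step using
    the oracle at the extrapolated point; the output is the average of the
    extrapolated points, mirroring the weighted average in Algorithm 1."""
    z = np.array(z0, dtype=float)
    avg = np.zeros_like(z)
    for _ in range(T):
        z_hat = z - gamma * oracle(z)      # extrapolation step
        z = z - gamma * oracle(z_hat)      # update step
        avg += z_hat
    return avg / T

z_bar = stochastic_mirror_prox([1.0, 1.0], gamma=0.1, T=5000)
# z_bar approaches the saddle point (0, 0)
```

On this bilinear problem a plain stochastic gradient step would spiral outward; the extra extrapolated oracle call is what makes the averaged iterate converge, which is why Mirror-Prox is the natural backbone for the saddle point setting.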
References

[1] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
[2] Zaid Harchaoui, Anatoli Juditsky, and Arkadi Nemirovski. Conditional gradient algorithms for norm-regularized smooth convex optimization. Mathematical Programming, pages 1–38, 2014.
[3] Niao He and Zaid Harchaoui. Semi-proximal mirror-prox for nonsmooth composite minimization. Neural Information Processing Systems (NIPS), 2015.
[4] Niao He, Anatoli Juditsky, and Arkadi Nemirovski. Mirror prox algorithm for multi-term composite minimization and semi-separable problems. Computational Optimization and Applications, 61(2), 2015.
[5] Anatoli Juditsky, Arkadi Nemirovski, and Claire Tauvel. Solving variational inequalities with stochastic mirror-prox algorithm. Stochastic Systems, 1(1):17–58, 2011.
[6] Guanghui Lan. The complexity of large-scale convex programming under a linear optimization oracle. arXiv preprint, 2013.
[7] Cun Mu, Yuqian Zhang, John Wright, and Donald Goldfarb. Scalable robust matrix recovery: Frank-Wolfe meets proximal methods. arXiv preprint.
[8] Arkadi Nemirovski, Shmuel Onn, and Uriel G. Rothblum. Accuracy certificates for computational problems with convex structure. Mathematics of Operations Research, 35(1):52–78, 2010.
[9] A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience Series in Discrete Mathematics, 1983.
[10] Yu. Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, 140(1):125–161, 2013.
[11] Yurii Nesterov. Complexity bounds for primal-dual methods minimizing the model of objective function. Technical report, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
[12] Neal Parikh and Stephen Boyd. Proximal algorithms. Foundations and Trends in Optimization, 1(3):127–239, 2014.
4 Technical Details

4.1 Problems (1) and (2) in the Form of Semi-Structured Variational Inequalities

The stochastic composite optimization problem (1) can be written as

min_{u=[u₁;u₂] ∈ U=U₁×U₂} E_ξ[f(u, ξ)] + h₁(u₁) + h₂(u₂)
⟺ min_{x=[u;v] ∈ X : v₁ ≥ h₁(u₁), v₂ ≥ h₂(u₂)} E_ξ[f(u, ξ)] + v₁ + v₂.

The variational inequality VI(X, F) associated with the above problem is given by

X = { x = [u = [u₁; u₂]; v = [v₁; v₂]] : u₁ ∈ U₁, u₂ ∈ U₂, v₁ ≥ h₁(u₁), v₂ ≥ h₂(u₂) },
F(x = [u; v]) = [ [∇_{u₁} E_ξ[f(u, ξ)]; ∇_{u₂} E_ξ[f(u, ξ)]]; [1; 1] ].

Clearly, the above VI(X, F) satisfies assumptions (A.1)–(A.4) and (S.1)–(S.4). Similarly, for the stochastic composite saddle point problem (2) and its reformulation

min_{u₁ ∈ U₁} max_{u₂ ∈ U₂} E_ξ[Φ(u₁, u₂, ξ)] + h₁(u₁) − h₂(u₂)
⟺ min_{u₁ ∈ U₁, v₁ ≥ h₁(u₁)} max_{u₂ ∈ U₂, v₂ ≥ h₂(u₂)} E_ξ[Φ(u₁, u₂, ξ)] + v₁ − v₂,

the variational inequality VI(X, F) associated with the above problem is given by

X = { x = [u = [u₁; u₂]; v = [v₁; v₂]] : u₁ ∈ U₁, u₂ ∈ U₂, v₁ ≥ h₁(u₁), v₂ ≥ h₂(u₂) },
F(x = [u; v]) = [ [∇_{u₁} E_ξ[Φ(u₁, u₂, ξ)]; −∇_{u₂} E_ξ[Φ(u₁, u₂, ξ)]]; [1; 1] ].

Clearly, the above VI(X, F) also satisfies assumptions (A.1)–(A.4) and (S.1)–(S.4).

4.2 Proof of Theorem 3.1

Proof. The proof builds upon [5] and [4]. First of all, we show:

Lemma 4.1. For any ε ≥ 0, x = [u; v] ∈ X and ξ = [η; ζ] ∈ E, let [u′; v′] ∈ P^ε_x(ξ), where

P^ε_x(ξ) = { x̂ = [û; v̂] ∈ X : ⟨η + ω′(û) − ω′(u), û − s⟩ + ⟨ζ, v̂ − w⟩ ≤ ε for all [s; w] ∈ X }.

Then for all [s; w] ∈ X,

⟨η, u′ − s⟩ + ⟨ζ, v′ − w⟩ ≤ V_u(s) − V_{u′}(s) − V_u(u′) + ε.    (3)
Applying Lemma 4.1 with [u; v] = x_t, ξ = [γ_t g_t; γ_t F_v], [u′; v′] = y_t, and [s; w] = [u_{t+1}; v_{t+1}] = x_{t+1}, we obtain

γ_t [ ⟨g_t, û_t − u_{t+1}⟩ + ⟨F_v, v̂_t − v_{t+1}⟩ ] ≤ V_{u_t}(u_{t+1}) − V_{û_t}(u_{t+1}) − V_{u_t}(û_t) + ε_t,    (4)

and applying Lemma 4.1 with [u; v] = x_t, ξ = γ_t [ĝ_t; F_v], [u′; v′] = x_{t+1}, and [s; w] = z ∈ X, we get

γ_t [ ⟨ĝ_t, u_{t+1} − s⟩ + ⟨F_v, v_{t+1} − w⟩ ] ≤ V_{u_t}(s) − V_{u_{t+1}}(s) − V_{u_t}(u_{t+1}) + ε_t.    (5)

Adding (5) to (4), we obtain for every z = [s; w] ∈ X

γ_t [ ⟨ĝ_t, û_t − s⟩ + ⟨F_v, v̂_t − w⟩ ] ≤ V_{u_t}(s) − V_{u_{t+1}}(s) + σ_t + 2ε_t,    (6)

where σ_t := γ_t ⟨ĝ_t − g_t, û_t − u_{t+1}⟩ − V_{û_t}(u_{t+1}) − V_{u_t}(û_t).
Let Δ_t = F_u(û_t) − ĝ_t; then for any z = [s; w] ∈ X we have

Σ_t γ_t ⟨F(y_t), y_t − z⟩ ≤ Θ[X] + Σ_t (σ_t + 2ε_t) + Σ_t γ_t ⟨Δ_t, û_t − s⟩.    (7)

Let e_t = ‖g_t − F_u(u_t)‖_* and ê_t = ‖ĝ_t − F_u(û_t)‖_* = ‖Δ_t‖_*. Then we have

‖ĝ_t − g_t‖_*² = ‖(ĝ_t − F_u(û_t)) + (F_u(û_t) − F_u(u_t)) + (F_u(u_t) − g_t)‖_*²
≤ (ê_t + L‖û_t − u_t‖ + M + e_t)² ≤ 3L²‖û_t − u_t‖² + 3M² + 3(e_t + ê_t)².

Hence,

σ_t ≤ (γ_t²/2)‖ĝ_t − g_t‖_*² + (1/2)‖û_t − u_{t+1}‖² − V_{û_t}(u_{t+1}) − V_{u_t}(û_t) ≤ (γ_t²/2)‖ĝ_t − g_t‖_*² − (1/2)‖û_t − u_t‖².

Since the stepsizes satisfy 3γ_t²L² ≤ 1, we further have

σ_t ≤ (3γ_t²/2)[ M² + (e_t + ê_t)² ].    (8)

Define a special sequence ũ_t with ũ₁ = u₁ and

ũ_{t+1} = argmin_{u ∈ P_u X} { γ_t ⟨Δ_t, u⟩ + V_{ũ_t}(u) }.

The sequence defined above satisfies the following relation (see Corollary 2 in [5] for details): for any z = [s; w] ∈ X,

Σ_t γ_t ⟨Δ_t, ũ_t − s⟩ ≤ Θ[X] + Σ_t (γ_t²/2) ê_t².    (9)

Combining (7), (8), (9), we end up with

Res(X′ | I_T, λ^T) ≤ (Σ_t γ_t)⁻¹ ( 2Θ[X] + Σ_t (7γ_t²/2)[ M² + (e_t² + ê_t²) ] + Σ_t 2ε_t + Σ_t γ_t ⟨Δ_t, û_t − ũ_t⟩ ).    (10)

Under Assumption (C.1), we have

E[Δ_t | F_t] = 0, E[e_t² | G_{t−1}] ≤ σ²/m_t, and E[ê_t² | F_t] ≤ σ²/m_t,

where F_t = σ(ξ¹₁, ..., ξ¹_{2m₁}, ..., ξ^t₁, ..., ξ^t_{2m_t}) and G_t = σ(ξ¹₁, ..., ξ¹_{2m₁}, ..., ξ^t₁, ..., ξ^t_{m_t}). One can further show that E[⟨Δ_t, û_t − ũ_t⟩] = 0. It follows from (10) that

E[Res(X′ | I_T, λ^T)] ≤ (Σ_t γ_t)⁻¹ ( 2Θ[X] + Σ_t (7γ_t²/2)[ M² + 2σ²/m_t ] + Σ_t 2ε_t ),    (11)

which proves the first part of the theorem.

Under Assumption (C.2), we have

E[exp{ e_t²/(σ/√m_t)² }] ≤ exp{1} and E[exp{ ê_t²/(σ/√m_t)² }] ≤ exp{1}.

Let C₁ = 7 Σ_t γ_t² σ²/m_t; it follows from convexity and the above bounds that

E[ exp{ (1/C₁) Σ_t γ_t² (e_t² + ê_t²) } ] ≤ exp{2}.
Applying Markov's inequality, we obtain

for all Λ > 0 : Prob( Σ_t γ_t²(e_t² + ê_t²) ≥ (2 + Λ) C₁ ) ≤ exp{−Λ}.    (12)

Let ζ_t = ⟨Δ_t, û_t − ũ_t⟩. We showed earlier that E[ζ_t] = 0. Since ‖û_t − ũ_t‖ ≤ 2√(2Θ[X]), we also have

E[exp{ ζ_t² / (2√(2Θ[X]) σ/√m_t)² }] ≤ exp{1}.

Applying the relation exp{x} ≤ x + exp{9x²/16}, one has for any s ≥ 0

E[ exp{ s Σ_t γ_t ζ_t } ] ≤ exp{ (9s²/16) · 8σ²Θ[X] Σ_t γ_t²/m_t }.

By Markov's inequality, one has

for all Λ > 0 : Prob( Σ_t γ_t ζ_t ≥ 3Λ √( 2σ²Θ[X] Σ_t γ_t²/m_t ) ) ≤ exp{−Λ²/2}.    (13)

Combining (10), (12), and (13), we arrive at

for all Λ > 0 : Prob( Res(X′ | I_T, λ^T) > M₀(T) + Λ M₁(T) ) ≤ exp{−Λ} + exp{−Λ²/2},

where

M₀(T) = (Σ_t γ_t)⁻¹ ( 2Θ[X] + Σ_t (7γ_t²/2)[ M² + 2σ²/m_t ] + Σ_t 2ε_t ),
M₁(T) = (Σ_t γ_t)⁻¹ ( 7 Σ_t γ_t² σ²/m_t + 3 √( 2σ²Θ[X] Σ_t γ_t²/m_t ) ).

4.3 Proof of Proposition 3.1

Proof. Let us fix T as the number of Mirror-Prox steps. Since M = 0, Theorem 3.1 implies the efficiency estimate

E[ε_VI(x̄_T | X, F)] ≤ (Σ_t γ_t)⁻¹ ( 2Θ[X] + 7 Σ_t γ_t² σ²/m_t + Σ_t 2ε_t ).

Let us fix ε_t = Θ[X]/T for each t = 1, ..., T. Then, from Proposition 3.1 in [3], it takes at most s_t = O(1)(L₀ D^κ T/Θ[X])^{1/(κ−1)} calls to the LMO to generate a point with inexactness at most ε_t. Moreover, with γ_t = (√2 L)⁻¹ and m_t = O(γ_t² T σ²/Θ[X]), we have

E[ε_VI(x̄_T | X, F)] ≤ O(1) L Θ[X]/T.

Therefore, to ensure E[ε_VI(x̄_T | X, F)] ≤ ε for a given accuracy ε > 0, the number of Mirror-Prox steps is at most T = O(L Θ[X]/ε). Hence, the number of stochastic oracle calls used is at most

N_SO = Σ_t m_t = O(γ_t² T² σ²/Θ[X]) = O(1) σ²Θ[X]/ε².

Moreover, the number of linear minimization oracle calls on X₂ needed is at most

N_LMO = Σ_t s_t = O(1) (L₀ D^κ/Θ[X])^{1/(κ−1)} T^{κ/(κ−1)}.

In particular, if κ = 2 and L₀ = 1, this becomes N_LMO = O(1) L²D²Θ[X]/ε².

4.4 Proof of Proposition 3.2

Proof. Let us fix T as the number of Mirror-Prox steps and set ε_t = Θ[X]/T for each t = 1, ..., T. Since m_t = O(1)σ²/M², Theorem 3.1 implies the efficiency estimate

E[ε_VI(x̄_T | X, F)] ≤ O(1) M √(Θ[X]/T).
Therefore, to ensure E[ε_VI(x̄_T | X, F)] ≤ ε for a given accuracy ε > 0, the number of Mirror-Prox steps is at most T = O(M²Θ[X]/ε²). Hence, the number of stochastic oracle calls used is at most

N_SO = Σ_t m_t = T · O(1)σ²/M² = O(1) σ²Θ[X]/ε².

From Proposition 3.1 in [3], it takes at most s_t = O(1) D²T/Θ[X] calls to the LMO to generate a point with inexactness at most ε_t. Hence, the number of linear minimization oracle calls on X₂ needed is at most

N_LMO = Σ_t s_t = O(1) D²T²/Θ[X] = O(1) D²M⁴Θ[X]/ε⁴.
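As a quick sanity check on how the two oracle budgets scale with the target accuracy, the bounds of Propositions 3.1 and 3.2 can be tabulated directly (all O(1) constants are set to 1 here purely for illustration; the function names are ours):

```python
def complexity_smooth(L, D, Theta, sigma, eps):
    """Oracle-call counts from Proposition 3.1 (Lipschitz case, M = 0),
    with every O(1) constant set to 1."""
    n_so = sigma**2 * Theta / eps**2        # stochastic oracle calls
    n_lmo = L**2 * D**2 * Theta / eps**2    # linear minimization oracle calls
    return n_so, n_lmo

def complexity_nonsmooth(M, D, Theta, sigma, eps):
    """Oracle-call counts from Proposition 3.2 (bounded case, L = 0)."""
    n_so = sigma**2 * Theta / eps**2
    n_lmo = M**4 * D**2 * Theta / eps**4    # note the 1/eps^4 LMO rate
    return n_so, n_lmo
```

Halving ε quadruples both counts in the smooth case, but multiplies the LMO count by 16 in the nonsmooth case, which is the gap between the O(1/ε²) and O(1/ε⁴) regimes discussed in Section 3.3.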
More informationStochastic Composition Optimization
Stochastic Composition Optimization Algorithms and Sample Complexities Mengdi Wang Joint works with Ethan X. Fang, Han Liu, and Ji Liu ORFE@Princeton ICCOPT, Tokyo, August 8-11, 2016 1 / 24 Collaborators
More informationWE consider an undirected, connected network of n
On Nonconvex Decentralized Gradient Descent Jinshan Zeng and Wotao Yin Abstract Consensus optimization has received considerable attention in recent years. A number of decentralized algorithms have been
More informationStochastic Proximal Gradient Algorithm
Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind
More informationOptimization for Machine Learning
Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More informationBig Data Analytics: Optimization and Randomization
Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.
More informationALGORITHMS FOR STOCHASTIC OPTIMIZATION WITH EXPECTATION CONSTRAINTS
ALGORITHMS FOR STOCHASTIC OPTIMIZATIO WITH EXPECTATIO COSTRAITS GUAGHUI LA AD ZHIIAG ZHOU Abstract This paper considers the problem of minimizing an expectation function over a closed convex set, coupled
More informationA Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization
A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London www.doc.ic.ac.uk/ pp500 p.parpas@imperial.ac.uk jointly with D.V.
More information1 Sparsity and l 1 relaxation
6.883 Learning with Combinatorial Structure Note for Lecture 2 Author: Chiyuan Zhang Sparsity and l relaxation Last time we talked about sparsity and characterized when an l relaxation could recover the
More informationUses of duality. Geoff Gordon & Ryan Tibshirani Optimization /
Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear
More informationOracle Complexity of Second-Order Methods for Smooth Convex Optimization
racle Complexity of Second-rder Methods for Smooth Convex ptimization Yossi Arjevani had Shamir Ron Shiff Weizmann Institute of Science Rehovot 7610001 Israel Abstract yossi.arjevani@weizmann.ac.il ohad.shamir@weizmann.ac.il
More informationConvex Optimization Under Inexact First-order Information. Guanghui Lan
Convex Optimization Under Inexact First-order Information A Thesis Presented to The Academic Faculty by Guanghui Lan In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy H. Milton
More informationSADAGRAD: Strongly Adaptive Stochastic Gradient Methods
Zaiyi Chen * Yi Xu * Enhong Chen Tianbao Yang Abstract Although the convergence rates of existing variants of ADAGRAD have a better dependence on the number of iterations under the strong convexity condition,
More information6. Proximal gradient method
L. Vandenberghe EE236C (Spring 2013-14) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping
More informationStatistical Machine Learning II Spring 2017, Learning Theory, Lecture 4
Statistical Machine Learning II Spring 07, Learning Theory, Lecture 4 Jean Honorio jhonorio@purdue.edu Deterministic Optimization For brevity, everywhere differentiable functions will be called smooth.
More informationMulti-class classification via proximal mirror descent
Multi-class classification via proximal mirror descent Daria Reshetova Stanford EE department resh@stanford.edu Abstract We consider the problem of multi-class classification and a stochastic optimization
More informationSIAM Conference on Imaging Science, Bologna, Italy, Adaptive FISTA. Peter Ochs Saarland University
SIAM Conference on Imaging Science, Bologna, Italy, 2018 Adaptive FISTA Peter Ochs Saarland University 07.06.2018 joint work with Thomas Pock, TU Graz, Austria c 2018 Peter Ochs Adaptive FISTA 1 / 16 Some
More informationHow hard is this function to optimize?
How hard is this function to optimize? John Duchi Based on joint work with Sabyasachi Chatterjee, John Lafferty, Yuancheng Zhu Stanford University West Coast Optimization Rumble October 2016 Problem minimize
More informationConvex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013
Convex Optimization (EE227A: UC Berkeley) Lecture 15 (Gradient methods III) 12 March, 2013 Suvrit Sra Optimal gradient methods 2 / 27 Optimal gradient methods We saw following efficiency estimates for
More informationarxiv: v1 [math.oc] 10 Oct 2018
8 Frank-Wolfe Method is Automatically Adaptive to Error Bound ondition arxiv:80.04765v [math.o] 0 Oct 08 Yi Xu yi-xu@uiowa.edu Tianbao Yang tianbao-yang@uiowa.edu Department of omputer Science, The University
More informationProximal Newton Method. Ryan Tibshirani Convex Optimization /36-725
Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h
More informationPrimal-dual first-order methods with O(1/ɛ) iteration-complexity for cone programming
Math. Program., Ser. A (2011) 126:1 29 DOI 10.1007/s10107-008-0261-6 FULL LENGTH PAPER Primal-dual first-order methods with O(1/ɛ) iteration-complexity for cone programming Guanghui Lan Zhaosong Lu Renato
More informationBregman Divergence and Mirror Descent
Bregman Divergence and Mirror Descent Bregman Divergence Motivation Generalize squared Euclidean distance to a class of distances that all share similar properties Lots of applications in machine learning,
More informationProvable Non-Convex Min-Max Optimization
Provable Non-Convex Min-Max Optimization Mingrui Liu, Hassan Rafique, Qihang Lin, Tianbao Yang Department of Computer Science, The University of Iowa, Iowa City, IA, 52242 Department of Mathematics, The
More informationPerturbed Proximal Gradient Algorithm
Perturbed Proximal Gradient Algorithm Gersende FORT LTCI, CNRS, Telecom ParisTech Université Paris-Saclay, 75013, Paris, France Large-scale inverse problems and optimization Applications to image processing
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationProximal and First-Order Methods for Convex Optimization
Proximal and First-Order Methods for Convex Optimization John C Duchi Yoram Singer January, 03 Abstract We describe the proximal method for minimization of convex functions We review classical results,
More informationMaster 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique
Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some
More informationStochastic Subgradient Method
Stochastic Subgradient Method Lingjie Weng, Yutian Chen Bren School of Information and Computer Science UC Irvine Subgradient Recall basic inequality for convex differentiable f : f y f x + f x T (y x)
More informationFast Stochastic AUC Maximization with O(1/n)-Convergence Rate
Fast Stochastic Maximization with O1/n)-Convergence Rate Mingrui Liu 1 Xiaoxuan Zhang 1 Zaiyi Chen Xiaoyu Wang 3 ianbao Yang 1 Abstract In this paper we consider statistical learning with area under ROC
More informationAn accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems
An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems O. Kolossoski R. D. C. Monteiro September 18, 2015 (Revised: September 28, 2016) Abstract
More informationA Greedy Framework for First-Order Optimization
A Greedy Framework for First-Order Optimization Jacob Steinhardt Department of Computer Science Stanford University Stanford, CA 94305 jsteinhardt@cs.stanford.edu Jonathan Huggins Department of EECS Massachusetts
More informationOptimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison
Optimization Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison optimization () cost constraints might be too much to cover in 3 hours optimization (for big
More informationStochastic model-based minimization under high-order growth
Stochastic model-based minimization under high-order growth Damek Davis Dmitriy Drusvyatskiy Kellie J. MacPhee Abstract Given a nonsmooth, nonconvex minimization problem, we consider algorithms that iteratively
More informationAbout Split Proximal Algorithms for the Q-Lasso
Thai Journal of Mathematics Volume 5 (207) Number : 7 http://thaijmath.in.cmu.ac.th ISSN 686-0209 About Split Proximal Algorithms for the Q-Lasso Abdellatif Moudafi Aix Marseille Université, CNRS-L.S.I.S
More information6 First-Order Methods for Nonsmooth Convex Large-Scale Optimization, II: Utilizing Problem s Structure
6 First-Order Methods for Nonsmooth Convex Large-Scale Optimization, II: Utilizing Problem s Structure Anatoli Juditsky Anatoli.Juditsky@imag.fr Laboratoire Jean Kuntzmann, Université J. Fourier B. P.
More informationAn accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex concave saddle-point problems
Optimization Methods and Software ISSN: 1055-6788 (Print) 1029-4937 (Online) Journal homepage: http://www.tandfonline.com/loi/goms20 An accelerated non-euclidean hybrid proximal extragradient-type algorithm
More informationStability of optimization problems with stochastic dominance constraints
Stability of optimization problems with stochastic dominance constraints D. Dentcheva and W. Römisch Stevens Institute of Technology, Hoboken Humboldt-University Berlin www.math.hu-berlin.de/~romisch SIAM
More informationDual Proximal Gradient Method
Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method
More informationProximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725
Proximal Gradient Descent and Acceleration Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: subgradient method Consider the problem min f(x) with f convex, and dom(f) = R n. Subgradient method:
More informationStochastic Gradient Descent with Only One Projection
Stochastic Gradient Descent with Only One Projection Mehrdad Mahdavi, ianbao Yang, Rong Jin, Shenghuo Zhu, and Jinfeng Yi Dept. of Computer Science and Engineering, Michigan State University, MI, USA Machine
More informationA Brief Overview of Practical Optimization Algorithms in the Context of Relaxation
A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation Zhouchen Lin Peking University April 22, 2018 Too Many Opt. Problems! Too Many Opt. Algorithms! Zero-th order algorithms:
More informationA Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices
A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices Ryota Tomioka 1, Taiji Suzuki 1, Masashi Sugiyama 2, Hisashi Kashima 1 1 The University of Tokyo 2 Tokyo Institute of Technology 2010-06-22
More informationUniversal Gradient Methods for Convex Optimization Problems
CORE DISCUSSION PAPER 203/26 Universal Gradient Methods for Convex Optimization Problems Yu. Nesterov April 8, 203; revised June 2, 203 Abstract In this paper, we present new methods for black-box convex
More information1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method
L. Vandenberghe EE236C (Spring 2016) 1. Gradient method gradient method, first-order methods quadratic bounds on convex functions analysis of gradient method 1-1 Approximate course outline First-order
More informationLecture 6: Conic Optimization September 8
IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions
More informationThe basic problem of interest in this paper is the convex programming (CP) problem given by
Noname manuscript No. (will be inserted by the editor) An optimal randomized incremental gradient method Guanghui Lan Yi Zhou the date of receipt and acceptance should be inserted later arxiv:1507.02000v3
More informationarxiv: v1 [math.oc] 18 Mar 2016
Katyusha: Accelerated Variance Reduction for Faster SGD Zeyuan Allen-Zhu zeyuan@csail.mit.edu Princeton University arxiv:1603.05953v1 [math.oc] 18 Mar 016 March 18, 016 Abstract We consider minimizing
More informationPrimal-dual subgradient methods for convex problems
Primal-dual subgradient methods for convex problems Yu. Nesterov March 2002, September 2005 (after revision) Abstract In this paper we present a new approach for constructing subgradient schemes for different
More informationOnline Optimization : Competing with Dynamic Comparators
Ali Jadbabaie Alexander Rakhlin Shahin Shahrampour Karthik Sridharan University of Pennsylvania University of Pennsylvania University of Pennsylvania Cornell University Abstract Recent literature on online
More informationAccelerate Subgradient Methods
Accelerate Subgradient Methods Tianbao Yang Department of Computer Science The University of Iowa Contributors: students Yi Xu, Yan Yan and colleague Qihang Lin Yang (CS@Uiowa) Accelerate Subgradient Methods
More informationFinite-sum Composition Optimization via Variance Reduced Gradient Descent
Finite-sum Composition Optimization via Variance Reduced Gradient Descent Xiangru Lian Mengdi Wang Ji Liu University of Rochester Princeton University University of Rochester xiangru@yandex.com mengdiw@princeton.edu
More informationConditional Gradient Algorithms for Norm-Regularized Smooth Convex Optimization
Conditional Gradient Algorithms for Norm-Regularized Smooth Convex Optimization Zaid Harchaoui Anatoli Juditsky Arkadi Nemirovski April 12, 2013 Abstract Motivated by some applications in signal processing
More informationLecture 8 Plus properties, merit functions and gap functions. September 28, 2008
Lecture 8 Plus properties, merit functions and gap functions September 28, 2008 Outline Plus-properties and F-uniqueness Equation reformulations of VI/CPs Merit functions Gap merit functions FP-I book:
More informationOptimisation non convexe avec garanties de complexité via Newton+gradient conjugué
Optimisation non convexe avec garanties de complexité via Newton+gradient conjugué Clément Royer (Université du Wisconsin-Madison, États-Unis) Toulouse, 8 janvier 2019 Nonconvex optimization via Newton-CG
More informationContraction Methods for Convex Optimization and Monotone Variational Inequalities No.16
XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of
More informationConvergence rate of inexact proximal point methods with relative error criteria for convex optimization
Convergence rate of inexact proximal point methods with relative error criteria for convex optimization Renato D. C. Monteiro B. F. Svaiter August, 010 Revised: December 1, 011) Abstract In this paper,
More information