Stochastic Semi-Proximal Mirror-Prox


Stochastic Semi-Proximal Mirror-Prox

Niao He (Georgia Institute of Technology), Zaid Harchaoui (NYU, Inria)

Abstract

We present a direct extension of the Semi-Proximal Mirror-Prox algorithm [3] for minimizing convex composite problems whose objectives consist of three parts: a convex loss represented by a stochastic oracle, a proximal-friendly regularizer, and an LMO-friendly regularizer. The algorithm leverages stochastic oracles, proximal operators, and linear minimization over the problem's domain, and attains optimality in two respects: i) it is optimal in the number of calls to the stochastic oracle representing the objective, and ii) it is optimal in the number of calls to the linear minimization oracle representing the problem's domain, in the smooth saddle point case.

1 Introduction

The last decade has seen significant interest in minimizing composite functions of the form

    min_{x ∈ X} (1/n) Σ_{i=1}^n f_i(x) + h(x),

where the f_i are convex, continuously differentiable functions and h is convex but possibly nondifferentiable. Such problems arise ubiquitously in machine learning, where the first term usually refers to an empirical risk and h is a regularizer. Mainstream algorithms are devoted to two challenges: i) the large number of summands, and ii) the nonsmoothness of the regularization term. To deal with the finite sum, a variety of stochastic and incremental gradient methods have been developed that use only one component, or a mini-batch of components, at a time. To handle the nonsmooth regularization while exploiting its structure, two kinds of algorithms have been widely studied: proximal-type and conditional-gradient-type methods. Proximal-type methods ([10, 1, 12]) require the computation of a composite proximal operator at each iteration, i.e., solving problems of the form

    min_{x ∈ X} { (1/2)||x||² + <ξ, x> + α h(x) },

given an input vector ξ and a positive scalar α. Functions h that admit easy-to-compute proximal operators are called proximal-friendly.
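To make the proximal-friendly notion concrete: for the ℓ1-norm h(x) = ||x||_1 on R^n (the textbook case, not an example worked out in this paper), the minimizer of (1/2)||x − ξ||² + α||x||_1 — the standard, equivalent form of the prox problem — is entrywise soft-thresholding. A minimal sketch:

```python
import numpy as np

def prox_l1(xi, alpha):
    """Proximal operator of alpha * ||x||_1: the minimizer of
    (1/2)||x - xi||^2 + alpha * ||x||_1, computed entrywise by
    soft-thresholding (shrink each entry toward 0 by alpha)."""
    return np.sign(xi) * np.maximum(np.abs(xi) - alpha, 0.0)

# Entries with magnitude <= alpha are zeroed out; the rest shrink by alpha.
x = prox_l1(np.array([3.0, -0.5, 1.2]), 1.0)
```

The closed form is what makes the per-iteration cost of proximal-type methods negligible for such regularizers.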
In contrast, conditional-gradient-type methods [2, 11] operate with a linear minimization oracle (LMO) at each iteration, i.e., they solve problems of the form

    min_{x ∈ X} { <ξ, x> + α h(x) },

which is much cheaper than computing proximal operators. For instance, in the case of the nuclear norm, the LMO only requires computing the leading pair of singular vectors, which is orders of magnitude faster than the full singular value decomposition required to compute the proximal operator. Such h are called LMO-friendly.

The scope of this paper. Despite much success in this regime, little work has been devoted to the situation where the regularization is a mixture of proximal-friendly and LMO-friendly components. Such mixtures are often introduced to promote several desired properties of the solution simultaneously, such as sparsity and low rank. While recent work [3, 7] considers only the deterministic case, in this paper we focus on the stochastic setting and aim to solve problems such as

(i) stochastic composite optimization:

    min_{x=[x_1,x_2] ∈ X} E_ξ[f(x, ξ)] + h_1(x_1) + h_2(x_2)    (1)

(ii) stochastic composite saddle point problems:

    min_{x_1 ∈ X_1} max_{x_2 ∈ X_2} E_ξ[Φ(x_1, x_2, ξ)] + h_1(x_1) − h_2(x_2)    (2)
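The nuclear-norm LMO mentioned above can be sketched as follows. This is an illustrative implementation over the nuclear-norm ball {X : ||X||_* ≤ τ}; a full SVD stands in, for brevity, for the leading-pair computation that a power or Lanczos method would perform in practice:

```python
import numpy as np

def lmo_nuclear_ball(Xi, tau):
    """Linear minimization oracle for the nuclear-norm ball of radius tau:
    argmin_{||X||_* <= tau} <Xi, X> is attained at -tau * u1 v1^T, where
    (u1, v1) is the leading singular-vector pair of Xi. (A full SVD is used
    here only for brevity; in practice only the leading pair is computed.)"""
    U, s, Vt = np.linalg.svd(Xi, full_matrices=False)
    return -tau * np.outer(U[:, 0], Vt[0, :])

rng = np.random.default_rng(0)
Xi = rng.standard_normal((5, 4))
X = lmo_nuclear_ball(Xi, tau=2.0)
# X is rank one with nuclear norm tau, and <Xi, X> = -tau * s_max(Xi).
```

The returned rank-one matrix is exactly why LMO-based methods scale: one leading singular pair per iteration instead of a full SVD.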

where E[f(x, ξ)] is convex in x ∈ X; E[Φ(x_1, x_2, ξ)] is convex in x_1 ∈ X_1 and concave in x_2 ∈ X_2; h_1(x_1) is proximal-friendly and h_2(x_2) is LMO-friendly. Such problems indeed arise in many sparse and low-rank regularized models used in recommendation systems and social network prediction. To address them in a unified manner, we will focus on a broad class of variational inequalities with mixed structure, which further generalizes problems (1) and (2).

Our contribution. We develop a stochastic variant of the Semi-Proximal Mirror-Prox algorithm proposed in [3] that supports and leverages stochastic oracles, proximal operators, and linear minimization oracles, achieving the best of the three worlds. The algorithm extends the usual conditional gradient method to the stochastic setting and enjoys, in terms of LMO calls, an O(1/ε²) complexity in the smooth case (e.g., f and Φ in (1) and (2) smooth) as well as an O(1/ε⁴) complexity in the general nonsmooth case. In both situations, the algorithm attains the optimal O(1/ε²) complexity in the number of stochastic oracle calls. To the best of our knowledge, both the algorithm and the theoretical results presented here are novel.

2 Stochastic Semi-Proximal Mirror-Prox

2.1 The situation

Structured Variational Inequalities. We consider the variational inequality VI(X, F):

    find x_* ∈ X : <F(x), x − x_*> ≥ 0 for all x ∈ X,

with domain X and operator F satisfying assumptions (A.1)-(A.4) below.

(A.1) The set X ⊂ E_u × E_v is closed convex and its projection P X = {u : x = [u; v] ∈ X} ⊂ U, where U is convex and closed, and E_u, E_v are Euclidean spaces;

(A.2) The function ω(·) : U → R is continuously differentiable and 1-strongly convex w.r.t. some norm ||·||.
This defines the Bregman distance V_u(u') = ω(u') − ω(u) − <ω'(u), u' − u> ≥ (1/2)||u' − u||²;

(A.3) The operator F(x = [u; v]) : X → E_u × E_v is monotone and of the form F(u, v) = [F_u(u); F_v], with F_v ∈ E_v a constant and F_u(u) ∈ E_u satisfying, for some L < ∞ and M < ∞,

    ||F_u(u) − F_u(u')||_* ≤ L||u − u'|| + M for all u, u' ∈ U;

(A.4) The linear form <F_v, v> of [u; v] ∈ E_u × E_v is bounded from below on X and is coercive on X w.r.t. v: whenever [u_t; v_t] ∈ X, t = 1, 2, ..., is a sequence such that {u_t} is bounded and ||v_t||_2 → ∞ as t → ∞, we have <F_v, v_t> → ∞ as t → ∞.

Semi-structured Variational Inequalities. The class of semi-structured variational inequalities goes beyond Assumptions (A.1)-(A.4) by assuming more sub-structure. This structure is consistent with what we call a semi-proximal setup, which encompasses both the regular proximal setup and the regular linear minimization setup as special cases. Indeed, we further assume:

(S.1) Proximal setup for X: we assume that E_u = E_{u1} × E_{u2}, E_v = E_{v1} × E_{v2}, and U ⊂ U_1 × U_2, X = X_1 × X_2 with X_i ⊂ E_{ui} × E_{vi} and P X_i = {u_i : [u_i; v_i] ∈ X_i} ⊂ U_i for i = 1, 2, where U_1 is convex and closed and U_2 is convex and compact. We also assume that ω(u) = ω_1(u_1) + ω_2(u_2) and ||u|| = ||u_1||_{E_{u1}} + ||u_2||_{E_{u2}}, with ω_2(·) : U_2 → R continuously differentiable and such that

    ω_2(u_2') ≤ ω_2(u_2) + <ω_2'(u_2), u_2' − u_2> + (L_0/κ)||u_2' − u_2||^κ_{E_{u2}}, for all u_2, u_2' ∈ U_2,

for a particular 1 < κ ≤ 2 and L_0 < ∞. Furthermore, we assume that the ||·||_{E_{u2}}-diameter of U_2 is bounded by some D > 0.

(S.2) Partition of F: the operator F induced by the above partition of X_1 and X_2 can be written as F(x) = [F_u(u); F_v] with F_u(u) = [F_{u1}(u_1, u_2); F_{u2}(u_1, u_2)] and F_v = [F_{v1}; F_{v2}].

(S.3) Proximal mapping on X_1: we assume that for any η_1 ∈ E_{u1} and α > 0, we have at our disposal easy-to-compute prox-mappings of the form

    Prox_{ω1}(η_1, α) := argmin_{x_1=[u_1;v_1] ∈ X_1} { ω_1(u_1) + <η_1, u_1> + α<F_{v1}, v_1> }.

(S.4) Linear minimization oracle for X_2: we assume that we have at our disposal a composite linear minimization oracle (LMO) which, given any input η_2 ∈ E_{u2} and α > 0, returns an optimal solution to the minimization problem with linear form, that is,

    LMO(η_2, α) := argmin_{x_2=[u_2;v_2] ∈ X_2} { <η_2, u_2> + α<F_{v2}, v_2> }.

Stochastic Oracles. We are interested in the situation where we only have access to noisy information on F_u(u). More specifically, we assume that the operator is represented by a stochastic oracle which, for any u ∈ U, returns a vector g(u, ξ) satisfying

(C.1) Unbiasedness and bounded variance: E[g(u, ξ)] = F_u(u), E[||g(u, ξ) − F_u(u)||_*²] ≤ σ²;
(C.2) Light tail: E[exp{||g(u, ξ) − F_u(u)||_*²/σ²}] ≤ exp{1} for some σ > 0,

where ||·||_* is the same dual norm as in (A.3). Note that, by Jensen's inequality, (C.2) implies (C.1).

2.2 Stochastic Semi-Proximal Mirror-Prox

We present our Stochastic Semi-Proximal Mirror-Prox in Algorithm 1. The algorithm blends composite Mirror Prox and composite Conditional Gradient to extract the best efficiency from the mixed structure. At each iteration t, we estimate the operator F by averaging the vectors returned by the stochastic oracle. For the sub-domain X_2 given by an LMO, instead of computing the prox-mapping exactly, we mimic the prox-mapping via a conditional gradient routine (denoted CCG) taken directly from [3]. For the sub-domain X_1, we compute the prox-mapping exactly.

Algorithm 1: Stochastic Semi-Proximal Mirror-Prox for Semi-VI(X, F)
Input: stepsizes γ_t > 0, accuracies ε_t ≥ 0, t = 1, 2, ...
[1] Initialize x^1 = [x_1^1; x_2^1] ∈ X, where x_1^1 = [u_1^1; v_1^1] and x_2^1 = [u_2^1; v_2^1].
for t = 1, 2, ..., T do
[2] Set u^t = [u_1^t; u_2^t], compute [g_1^t; g_2^t] = (1/m_t) Σ_{j=1}^{m_t} g(u^t, ξ_j^t) and y^t = [y_1^t; y_2^t] such that
    y_1^t := [û_1^t; v̂_1^t] = Prox_{ω1}(γ_t g_1^t − ω_1'(u_1^t), γ_t)
    y_2^t := [û_2^t; v̂_2^t] = CCG(X_2, ω_2(·) + <γ_t g_2^t − ω_2'(u_2^t), ·>, γ_t F_{v2}; ε_t)
[3] Set û^t = [û_1^t; û_2^t], compute [ĝ_1^t; ĝ_2^t] = (1/m_t) Σ_{j=m_t+1}^{2m_t} g(û^t, ξ_j^t) and x^{t+1} = [x_1^{t+1}; x_2^{t+1}] such that
    x_1^{t+1} := [u_1^{t+1}; v_1^{t+1}] = Prox_{ω1}(γ_t ĝ_1^t − ω_1'(u_1^t), γ_t)
    x_2^{t+1} := [u_2^{t+1}; v_2^{t+1}] = CCG(X_2, ω_2(·) + <γ_t ĝ_2^t − ω_2'(u_2^t), ·>, γ_t F_{v2}; ε_t)
end for
Output: x̄^T := [ū^T; v̄^T] = (Σ_{t=1}^T γ_t)^{-1} Σ_{t=1}^T γ_t y^t

The CCG routine [3] is designed to solve smooth semi-linear problems

    min_{x=[u;v] ∈ X} { φ^+(u, v) = φ(u) + <c, v> }.

For an input (X, φ, c; ε), the routine CCG(X, φ, c; ε) works as follows:
(a) x^1 := [u^1; v^1] ∈ X;
(b) x̂^t := [û^t; v̂^t] = argmin_{x=[u;v] ∈ X} { <φ'(u^t), u> + <c, v> }; compute δ_t = <(φ^+)'(x^t), x^t − x̂^t> and return x^t if δ_t ≤ ε;
(c) x^{t+1} := [u^{t+1}; v^{t+1}] s.t. φ^+(x^{t+1}) ≤ φ^+(x^t + (2/(t+1))(x̂^t − x^t)).

We provide the convergence analysis in the next section.

3 Convergence Results

3.1 Main Results

Let us denote the resolution of the execution protocol I_T = {y^t, F(y^t)}_{t=1}^T with weights λ^T = {λ_t}_{t=1}^T, λ_t = γ_t / Σ_{τ=1}^T γ_τ, on any set X' ⊂ X by

    Res(X' | I_T, λ^T) = sup_{x ∈ X'} Σ_{t=1}^T λ_t <F(y^t), y^t − x>.

The resolution can be used to certify the accuracy of the approximate solution x̄^T = Σ_{t=1}^T λ_t y^t for generic convex problems, not limited to variational inequalities (see [8] for details). We arrive at the following theorem.

Theorem 3.1. Let the stepsizes satisfy 0 < γ_t ≤ (√3 L)^{-1}, and let Θ[X'] = sup_{[u;v] ∈ X'} V_{u^1}(u) for any X' ⊂ X. For a sequence of inexact prox-mappings with inexactness ε_t ≥ 0 and batch sizes m_t > 0, we have under assumption (C.1) that

    E[Res(X' | I_T, λ^T)] ≤ M_0(T) := (Σ_{t=1}^T γ_t)^{-1} ( 2Θ[X'] + Σ_{t=1}^T (7γ_t²/2)(M² + 2σ²/m_t) + Σ_{t=1}^T 2ε_t ).

Moreover, if assumption (C.2) holds, then for any Λ > 0,

    Prob{ Res(X' | I_T, λ^T) ≥ M_0(T) + Λ M_1(T) } ≤ exp{−Λ²/3} + exp{−Λ},

where M_1(T) = (Σ_{t=1}^T γ_t)^{-1} ( 7 Σ_{t=1}^T γ_t²σ²/m_t + 3 sqrt( σ²Θ[X'] Σ_{t=1}^T γ_t²/m_t ) ).

As a corollary, we can derive (based on intermediate results in [8]):

(i) when F is a monotone vector field, the efficiency estimate applies to the dual gap of the variational inequality, i.e., E[ε_VI(x̄^T | X, F)] ≤ M_0(T);

(ii) when F stems from the (sub)gradient of a convex minimization problem min_{x ∈ X} f(x) (for instance, problem (1)) with optimal solution x_*, the efficiency estimate applies to the suboptimality, i.e., E[f(x̄^T) − f(x_*)] ≤ M_0(T);

(iii) when F stems from a convex-concave saddle point problem min_{x_1 ∈ X_1} max_{x_2 ∈ X_2} Φ(x_1, x_2) (for instance, problem (2)), with the two induced convex optimization problems

    Opt(P) = min_{x_1 ∈ X_1} [ Φ̄(x_1) = sup_{x_2 ∈ X_2} Φ(x_1, x_2) ]    (P)
    Opt(D) = max_{x_2 ∈ X_2} [ Φ̲(x_2) = inf_{x_1 ∈ X_1} Φ(x_1, x_2) ]    (D)

the efficiency estimate is inherited by both the primal and the dual suboptimality gaps, i.e., E[Φ̄(x̄_1^T) − Opt(P)] ≤ M_0(T) and E[Opt(D) − Φ̲(x̄_2^T)] ≤ M_0(T).

3.2 When F_u is Lipschitz continuous (M = 0)

In the case when F_u is a Lipschitz continuous monotone operator with L > 0 and M = 0, we have

Proposition 3.1. Let assumptions (A.1)-(A.4) and (S.1)-(S.4) hold with M = 0, and let the proximal setup on U_2 be the Euclidean setup.
Setting the stepsizes γ_t = (√2 L)^{-1} and batch sizes m_t = O(γ_t² T σ²/Θ[X]), t = 1, ..., T, for the Stochastic Semi-Proximal Mirror-Prox algorithm to return a stochastic ε-solution to the variational inequality VI(X, F) represented by the stochastic oracle in (C.1)-(C.2), the total number of stochastic oracle calls required does not exceed

    N_SO = O(1) σ²Θ[X]/ε²

and the total number of calls to the linear minimization oracle does not exceed

    N_LMO = O(1) L²D²Θ[X]/ε²,

where σ², D, Θ[X] are defined as above.

3.3 When F_u is bounded (L = 0)

In the case when F_u is a uniformly bounded monotone operator with M > 0 and L = 0, we have

Proposition 3.2. Let the same assumptions hold with L = 0. Setting the stepsizes γ_t = O(1) sqrt(Θ[X])/(M sqrt(T)) and batch sizes m_t = O(1)σ²/M², t = 1, ..., T, for the Stochastic Semi-Proximal Mirror-Prox algorithm to return a stochastic ε-solution to the variational inequality VI(X, F), the total number of stochastic oracle calls required does not exceed

    N_SO = O(1) σ²Θ[X]/ε²

and the total number of calls to the linear minimization oracle does not exceed

    N_LMO = O(1) M⁴D²Θ[X]/ε⁴.

Discussion. The stochastic variant of the Semi-Proximal Mirror-Prox algorithm is designed to solve a broad class of variational inequalities covering the stochastic composite minimization and saddle point problems (1) and (2). The algorithm enjoys the optimal complexity bounds, i.e., O(1/ε²), both in the number of calls to the stochastic oracle (see [9]) and in the number of calls to the linear minimization oracle (see [6]) when the underlying problem is in saddle point form and the associated monotone operator is Lipschitz continuous. In the general nonsmooth situation, the algorithm still enjoys the optimal O(1/ε²) complexity in the number of stochastic oracle calls, but the number of linear minimization oracle calls becomes O(1/ε⁴).
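To illustrate the LMO-based inner machinery, here is a minimal Frank-Wolfe loop in the spirit of the CCG routine of Section 2.2, using the duality-gap quantity δ_t as the stopping rule. The quadratic-over-simplex instance and the tolerance are purely illustrative, not taken from the paper:

```python
import numpy as np

def conditional_gradient(grad_phi, lmo, x0, eps, max_iter=10000):
    """Frank-Wolfe with the CCG-style gap stopping rule: stop once
    delta_t = <grad phi(x_t), x_t - hat{x}_t> <= eps.
    `lmo(g)` returns argmin_x <g, x> over the feasible set."""
    x = x0
    for t in range(1, max_iter + 1):
        g = grad_phi(x)
        x_hat = lmo(g)
        gap = g @ (x - x_hat)            # duality-gap certificate delta_t
        if gap <= eps:
            return x, gap
        x = x + (2.0 / (t + 1)) * (x_hat - x)   # standard FW step size
    return x, gap

# Toy instance: minimize (1/2)||x - c||^2 over the probability simplex.
c = np.array([0.1, 0.5, 0.4])
grad = lambda x: x - c
lmo = lambda g: np.eye(len(g))[np.argmin(g)]    # vertex (basis vector) oracle
x, gap = conditional_gradient(grad, lmo, np.array([1.0, 0.0, 0.0]), 1e-2)
```

The returned gap certifies primal suboptimality: once δ_t ≤ ε, the iterate is an ε-solution of the inner subproblem, which is exactly the inexactness ε_t fed into the outer Mirror-Prox analysis.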

References

[1] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1).
[2] Zaid Harchaoui, Anatoli Juditsky, and Arkadi Nemirovski. Conditional gradient algorithms for norm-regularized smooth convex optimization. Mathematical Programming, pages 1-38.
[3] Niao He and Zaid Harchaoui. Semi-proximal mirror-prox for nonsmooth composite minimization. Neural Information Processing Systems (NIPS).
[4] Niao He, Anatoli Juditsky, and Arkadi Nemirovski. Mirror prox algorithm for multi-term composite minimization and semi-separable problems. Computational Optimization and Applications, 61(2).
[5] Anatoli Juditsky, Arkadi Nemirovski, and Claire Tauvel. Solving variational inequalities with stochastic mirror-prox algorithm. Stochastic Systems, 1(1):17-58.
[6] Guanghui Lan. The complexity of large-scale convex programming under a linear optimization oracle. arXiv.
[7] Cun Mu, Yuqian Zhang, John Wright, and Donald Goldfarb. Scalable robust matrix recovery: Frank-Wolfe meets proximal methods. arXiv preprint.
[8] Arkadi Nemirovski, Shmuel Onn, and Uriel G. Rothblum. Accuracy certificates for computational problems with convex structure. Mathematics of Operations Research, 35(1):52-78.
[9] A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience Series in Discrete Mathematics.
[10] Yu. Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, 140(1).
[11] Yurii Nesterov. Complexity bounds for primal-dual methods minimizing the model of objective function. Technical report, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
[12] Neal Parikh and Stephen Boyd. Proximal algorithms. Foundations and Trends in Optimization, 1(3).

4 Technical Details

4.1 Problems (1) and (2) in the Form of Semi-Structured Variational Inequalities

The stochastic composite optimization problem (1) can be written as

    min_{u=[u_1,u_2] ∈ U=U_1×U_2} E_ξ[f(u, ξ)] + h_1(u_1) + h_2(u_2)
    = min_{u ∈ U, v_1 ≥ h_1(u_1), v_2 ≥ h_2(u_2)} E_ξ[f(u, ξ)] + v_1 + v_2.

The variational inequality VI(X, F) associated with the above problem is given by

    X = { x = [u = [u_1, u_2]; v = [v_1, v_2]] : u_1 ∈ U_1, u_2 ∈ U_2, v_1 ≥ h_1(u_1), v_2 ≥ h_2(u_2) },
    F(x = [u = [u_1, u_2]; v = [v_1, v_2]]) = [ [∇_{u_1} E_ξ[f(u, ξ)]; ∇_{u_2} E_ξ[f(u, ξ)]]; [1; 1] ].

Clearly, this VI(X, F) satisfies assumptions (A.1)-(A.4) and (S.1)-(S.4). Similarly, for the stochastic composite saddle point problem (2) and its reformulation

    min_{u_1 ∈ U_1} max_{u_2 ∈ U_2} E_ξ[Φ(u_1, u_2, ξ)] + h_1(u_1) − h_2(u_2)
    = min_{u_1 ∈ U_1, v_1 ≥ h_1(u_1)} max_{u_2 ∈ U_2, v_2 ≥ h_2(u_2)} E_ξ[Φ(u_1, u_2, ξ)] + v_1 − v_2,

the variational inequality VI(X, F) associated with the above problem is given by

    X = { x = [u = [u_1, u_2]; v = [v_1, v_2]] : u_1 ∈ U_1, u_2 ∈ U_2, v_1 ≥ h_1(u_1), v_2 ≥ h_2(u_2) },
    F(x = [u = [u_1, u_2]; v = [v_1, v_2]]) = [ [∇_{u_1} E_ξ[Φ(u_1, u_2, ξ)]; −∇_{u_2} E_ξ[Φ(u_1, u_2, ξ)]]; [1; 1] ].

Clearly, this VI(X, F) also satisfies assumptions (A.1)-(A.4) and (S.1)-(S.4).

4.2 Proof of Theorem 3.1

Proof. The proof builds upon [5] and [4]. First of all, we show

Lemma 4.1. For any ε ≥ 0, x = [u; v] ∈ X and ξ = [η; ζ] ∈ E, let [u'; v'] = P_x^ε(ξ), where

    P_x^ε(ξ) = { x̂ = [û; v̂] ∈ X : <η + ω'(û) − ω'(u), û − s> + <ζ, v̂ − w> ≤ ε for all [s; w] ∈ X }.

Then for all [s; w] ∈ X,

    <η, u' − s> + <ζ, v' − w> ≤ V_u(s) − V_{u'}(s) − V_u(u') + ε.    (3)
Applying Lemma 4.1 with [u; v] = x^t, ξ = [γ_t g^t; γ_t F_v], [u'; v'] = y^t, and [s; w] = [u^{t+1}; v^{t+1}] = x^{t+1}, we obtain

    γ_t [ <g^t, û^t − u^{t+1}> + <F_v, v̂^t − v^{t+1}> ] ≤ V_{u^t}(u^{t+1}) − V_{û^t}(u^{t+1}) − V_{u^t}(û^t) + ε_t    (4)

and applying Lemma 4.1 with [u; v] = x^t, ξ = γ_t [ĝ^t; F_v], [u'; v'] = x^{t+1}, and [s; w] = z ∈ X, we get

    γ_t [ <ĝ^t, u^{t+1} − s> + <F_v, v^{t+1} − w> ] ≤ V_{u^t}(s) − V_{u^{t+1}}(s) − V_{u^t}(u^{t+1}) + ε_t.    (5)

Adding (4) and (5), we obtain for every z = [s; w] ∈ X

    γ_t [ <ĝ^t, û^t − s> + <F_v, v̂^t − w> ] ≤ V_{u^t}(s) − V_{u^{t+1}}(s) + σ_t + 2ε_t,    (6)

where

    σ_t := γ_t <ĝ^t − g^t, û^t − u^{t+1}> − V_{û^t}(u^{t+1}) − V_{u^t}(û^t).

Let Δ_t = F_u(û^t) − ĝ^t. Then for any z = [s; w] ∈ X, we have

    Σ_{t=1}^T γ_t <F(y^t), y^t − z> ≤ Θ[X] + Σ_{t=1}^T (σ_t + 2ε_t) + Σ_{t=1}^T γ_t <Δ_t, û^t − s>.    (7)

Let e_t = ||g^t − F_u(u^t)||_* and ê_t = ||ĝ^t − F_u(û^t)||_* = ||Δ_t||_*. Then we have

    ||ĝ^t − g^t||_* = ||(ĝ^t − F_u(û^t)) + (F_u(û^t) − F_u(u^t)) + (F_u(u^t) − g^t)||_* ≤ ê_t + L||û^t − u^t|| + M + e_t,

hence ||ĝ^t − g^t||_*² ≤ 3L²||û^t − u^t||² + 3M² + 3(e_t + ê_t)². Hence,

    σ_t ≤ (γ_t²/2)||ĝ^t − g^t||_*² + (1/2)||û^t − u^{t+1}||² − V_{û^t}(u^{t+1}) − V_{u^t}(û^t) ≤ (γ_t²/2)||ĝ^t − g^t||_*² − (1/2)||û^t − u^t||².

Since the stepsizes satisfy 3γ_t²L² ≤ 1, we further have

    σ_t ≤ (3γ_t²/2)[M² + (e_t + ê_t)²].    (8)

Define a special sequence {ũ^t} by ũ^1 = u^1 and

    ũ^{t+1} = argmin_{u ∈ P_u X} { γ_t <Δ_t, u> + V_{ũ^t}(u) }.

The sequence so defined satisfies the following relation (see Corollary 2 in [5] for details): for any z = [s; w] ∈ X,

    Σ_{t=1}^T γ_t <Δ_t, ũ^t − s> ≤ Θ[X] + Σ_{t=1}^T (γ_t²/2)||Δ_t||_*² = Θ[X] + Σ_{t=1}^T (γ_t²/2) ê_t².    (9)

Combining (7), (8), (9), we end up with

    Res(X | I_T, λ^T) ≤ (Σ_{t=1}^T γ_t)^{-1} ( 2Θ[X] + Σ_{t=1}^T (7γ_t²/2)[M² + e_t² + ê_t²] + Σ_{t=1}^T 2ε_t + Σ_{t=1}^T γ_t <Δ_t, û^t − ũ^t> ).    (10)

Under Assumption (C.1), we have

    E[Δ_t | F_t] = 0,  E[e_t² | G_{t−1}] ≤ σ²/m_t,  and  E[ê_t² | F_t] ≤ σ²/m_t,

where F_t = σ(ξ_1^1, ..., ξ_{2m_1}^1, ..., ξ_1^t, ..., ξ_{m_t}^t) and G_t = σ(ξ_1^1, ..., ξ_{2m_1}^1, ..., ξ_1^t, ..., ξ_{2m_t}^t). One can further show that E[<Δ_t, û^t − ũ^t>] = 0. It follows from (10) that

    E[Res(X | I_T, λ^T)] ≤ (Σ_{t=1}^T γ_t)^{-1} ( 2Θ[X] + Σ_{t=1}^T (7γ_t²/2)[M² + 2σ²/m_t] + Σ_{t=1}^T 2ε_t ),    (11)

which proves the first part of the theorem.

Under Assumption (C.2), we have

    E[exp{e_t²/(σ/√m_t)²}] ≤ exp{1}  and  E[exp{ê_t²/(σ/√m_t)²}] ≤ exp{1}.

Let C_1 = Σ_{t=1}^T γ_t²σ²/m_t. It follows from convexity of the exponential and the above that

    E[ exp{ (1/C_1) Σ_{t=1}^T γ_t²(e_t² + ê_t²) } ] ≤ E[ (1/C_1) Σ_{t=1}^T (γ_t²σ²/m_t) exp{ (e_t² + ê_t²)/(σ/√m_t)² } ] ≤ exp{2}.

Applying Markov's inequality, we obtain for all Λ > 0:

    Prob( Σ_{t=1}^T γ_t²(e_t² + ê_t²) ≥ (2 + Λ)C_1 ) ≤ exp{−Λ}.    (12)

Let ζ_t = <Δ_t, û^t − ũ^t>. We showed earlier that E[ζ_t] = 0. Since ||û^t − ũ^t|| ≤ 2√(2Θ[X]), we also have

    E[exp{ ζ_t² / (2√(2Θ[X]) σ/√m_t)² }] ≤ exp{1}.

Applying the relation exp{x} ≤ x + exp{9x²/16}, one has for any s ≥ 0

    E[ exp{ s Σ_{t=1}^T γ_t ζ_t } ] ≤ E[ exp{ (9s²/16) Σ_{t=1}^T γ_t² ζ_t² } ] ≤ exp{ (9s²/2) σ²Θ[X] Σ_{t=1}^T γ_t²/m_t }.

By Markov's inequality, one has for all Λ > 0:

    Prob( Σ_{t=1}^T γ_t ζ_t ≥ 3Λ sqrt( σ²Θ[X] Σ_{t=1}^T γ_t²/m_t ) ) ≤ exp{−Λ²/2}.    (13)

Combining (10), (12), and (13), we arrive at: for all Λ > 0,

    Prob( Res(X | I_T, λ^T) ≥ M_0(T) + Λ M_1(T) ) ≤ exp{−Λ} + exp{−Λ²/2},

where

    M_0(T) = (Σ_{t=1}^T γ_t)^{-1} ( 2Θ[X] + Σ_{t=1}^T (7γ_t²/2)[M² + 2σ²/m_t] + Σ_{t=1}^T 2ε_t ),
    M_1(T) = (Σ_{t=1}^T γ_t)^{-1} ( 7 Σ_{t=1}^T γ_t²σ²/m_t + 3 sqrt( σ²Θ[X] Σ_{t=1}^T γ_t²/m_t ) ).

4.3 Proof of Proposition 3.1

Proof. Let us fix T, the number of Mirror-Prox steps. Since M = 0, Theorem 3.1 implies the efficiency estimate

    E[ε_VI(x̄^T | X, F)] ≤ (Σ_{t=1}^T γ_t)^{-1} ( 2Θ[X] + 7 Σ_{t=1}^T γ_t²σ²/m_t + 2 Σ_{t=1}^T ε_t ).

Let us fix ε_t = Θ[X]/T for each t = 1, ..., T; then, from Proposition 3.1 in [3], it takes at most s = O(1)( L_0 D^κ T / Θ[X] )^{1/(κ−1)} calls to the LMO to generate a point with inner accuracy at most ε_t. Moreover, with m_t = O(γ_t² T σ²/Θ[X]) we have

    E[ε_VI(x̄^T | X, F)] ≤ O(1) L Θ[X] / T.

Therefore, to ensure E[ε_VI(x̄^T | X, F)] ≤ ε for a given accuracy ε > 0, the number of Mirror-Prox steps is at most T = O( L Θ[X] / ε ). Hence, the number of stochastic oracle calls used is at most

    N_SO = Σ_{t=1}^T 2m_t = Σ_{t=1}^T O(γ_t² T σ²/Θ[X]) = O(1) σ²Θ[X]/ε².

Moreover, the number of linear minimization oracle calls on X_2 needed is at most

    N_LMO = T s = O(1) ( L_0 L^κ D^κ / ε^κ )^{1/(κ−1)} Θ[X].

In particular, if κ = 2 and L_0 = 1, this becomes N_LMO = O(1) L²D²Θ[X]/ε².

4.4 Proof of Proposition 3.2

Proof. Let us fix T, the number of Mirror-Prox steps, and set ε_t = Θ[X]/T for each t = 1, ..., T. Since m_t = O(1)σ²/M², Theorem 3.1 implies the efficiency estimate

    E[ε_VI(x̄^T | X, F)] ≤ O(1) sqrt(Θ[X]) M / sqrt(T).

Therefore, to ensure E[ε_VI(x̄^T | X, F)] ≤ ε for a given accuracy ε > 0, the number of Mirror-Prox steps is at most T = O( M²Θ[X]/ε² ). Hence, the number of stochastic oracle calls used is at most

    N_SO = Σ_{t=1}^T 2m_t = T · O(1)σ²/M² = O(1) σ²Θ[X]/ε².

From Proposition 3.1 in [3], it takes at most s = O(1) D²T/Θ[X] calls to the LMO to generate a point with inner accuracy at most ε_t. Hence, the number of linear minimization oracle calls on X_2 needed is at most

    N_LMO = T s = O(1) D²T²/Θ[X] = O(1) D²M⁴Θ[X]/ε⁴.
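As a numerical sanity check on the role of mini-batching in the bounds above (the σ²/m_t variance terms), the following standalone sketch with a toy Gaussian oracle (not the paper's operator; all constants are made up) verifies that averaging m oracle samples divides the mean squared error by m:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, m, trials = 2.0, 64, 20000
F = np.array([1.0, -1.0])                 # stand-in for the true value F_u(u)

def oracle(size):
    """Toy stochastic oracle: g = F + noise, with E[g] = F and
    E||g - F||^2 = sigma^2 (variance split over the two coordinates)."""
    return F + (sigma / np.sqrt(2)) * rng.standard_normal((size, 2))

# Averaging an m-sample mini-batch should divide the variance by m.
batch_means = oracle(m * trials).reshape(trials, m, 2).mean(axis=1)
emp_var = np.mean(np.sum((batch_means - F) ** 2, axis=1))
# emp_var should be close to sigma^2 / m.
```

This is exactly the mechanism that lets the batch sizes m_t in Propositions 3.1 and 3.2 trade extra oracle calls for a smaller variance term in the efficiency estimate.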


More information

FAST DISTRIBUTED COORDINATE DESCENT FOR NON-STRONGLY CONVEX LOSSES. Olivier Fercoq Zheng Qu Peter Richtárik Martin Takáč

FAST DISTRIBUTED COORDINATE DESCENT FOR NON-STRONGLY CONVEX LOSSES. Olivier Fercoq Zheng Qu Peter Richtárik Martin Takáč FAST DISTRIBUTED COORDINATE DESCENT FOR NON-STRONGLY CONVEX LOSSES Olivier Fercoq Zheng Qu Peter Richtárik Martin Takáč School of Mathematics, University of Edinburgh, Edinburgh, EH9 3JZ, United Kingdom

More information

Block stochastic gradient update method

Block stochastic gradient update method Block stochastic gradient update method Yangyang Xu and Wotao Yin IMA, University of Minnesota Department of Mathematics, UCLA November 1, 2015 This work was done while in Rice University 1 / 26 Stochastic

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Incremental and Stochastic Majorization-Minimization Algorithms for Large-Scale Machine Learning

Incremental and Stochastic Majorization-Minimization Algorithms for Large-Scale Machine Learning Incremental and Stochastic Majorization-Minimization Algorithms for Large-Scale Machine Learning Julien Mairal Inria, LEAR Team, Grenoble Journées MAS, Toulouse Julien Mairal Incremental and Stochastic

More information

Tutorial: Mirror Descent Algorithms for Large-Scale Deterministic and Stochastic Convex Optimization

Tutorial: Mirror Descent Algorithms for Large-Scale Deterministic and Stochastic Convex Optimization Tutorial: Mirror Descent Algorithms for Large-Scale Deterministic and Stochastic Convex Optimization Arkadi Nemirovski H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of

More information

consistent learning by composite proximal thresholding

consistent learning by composite proximal thresholding consistent learning by composite proximal thresholding Saverio Salzo Università degli Studi di Genova Optimization in Machine learning, vision and image processing Université Paul Sabatier, Toulouse 6-7

More information

Saddle Point Algorithms for Large-Scale Well-Structured Convex Optimization

Saddle Point Algorithms for Large-Scale Well-Structured Convex Optimization Saddle Point Algorithms for Large-Scale Well-Structured Convex Optimization Arkadi Nemirovski H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology Joint research

More information

Accelerating SVRG via second-order information

Accelerating SVRG via second-order information Accelerating via second-order information Ritesh Kolte Department of Electrical Engineering rkolte@stanford.edu Murat Erdogdu Department of Statistics erdogdu@stanford.edu Ayfer Özgür Department of Electrical

More information

Stochastic Composition Optimization

Stochastic Composition Optimization Stochastic Composition Optimization Algorithms and Sample Complexities Mengdi Wang Joint works with Ethan X. Fang, Han Liu, and Ji Liu ORFE@Princeton ICCOPT, Tokyo, August 8-11, 2016 1 / 24 Collaborators

More information

WE consider an undirected, connected network of n

WE consider an undirected, connected network of n On Nonconvex Decentralized Gradient Descent Jinshan Zeng and Wotao Yin Abstract Consensus optimization has received considerable attention in recent years. A number of decentralized algorithms have been

More information

Stochastic Proximal Gradient Algorithm

Stochastic Proximal Gradient Algorithm Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind

More information

Optimization for Machine Learning

Optimization for Machine Learning Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html

More information

Conditional Gradient (Frank-Wolfe) Method

Conditional Gradient (Frank-Wolfe) Method Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

More information

Big Data Analytics: Optimization and Randomization

Big Data Analytics: Optimization and Randomization Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.

More information

ALGORITHMS FOR STOCHASTIC OPTIMIZATION WITH EXPECTATION CONSTRAINTS

ALGORITHMS FOR STOCHASTIC OPTIMIZATION WITH EXPECTATION CONSTRAINTS ALGORITHMS FOR STOCHASTIC OPTIMIZATIO WITH EXPECTATIO COSTRAITS GUAGHUI LA AD ZHIIAG ZHOU Abstract This paper considers the problem of minimizing an expectation function over a closed convex set, coupled

More information

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London www.doc.ic.ac.uk/ pp500 p.parpas@imperial.ac.uk jointly with D.V.

More information

1 Sparsity and l 1 relaxation

1 Sparsity and l 1 relaxation 6.883 Learning with Combinatorial Structure Note for Lecture 2 Author: Chiyuan Zhang Sparsity and l relaxation Last time we talked about sparsity and characterized when an l relaxation could recover the

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

Oracle Complexity of Second-Order Methods for Smooth Convex Optimization

Oracle Complexity of Second-Order Methods for Smooth Convex Optimization racle Complexity of Second-rder Methods for Smooth Convex ptimization Yossi Arjevani had Shamir Ron Shiff Weizmann Institute of Science Rehovot 7610001 Israel Abstract yossi.arjevani@weizmann.ac.il ohad.shamir@weizmann.ac.il

More information

Convex Optimization Under Inexact First-order Information. Guanghui Lan

Convex Optimization Under Inexact First-order Information. Guanghui Lan Convex Optimization Under Inexact First-order Information A Thesis Presented to The Academic Faculty by Guanghui Lan In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy H. Milton

More information

SADAGRAD: Strongly Adaptive Stochastic Gradient Methods

SADAGRAD: Strongly Adaptive Stochastic Gradient Methods Zaiyi Chen * Yi Xu * Enhong Chen Tianbao Yang Abstract Although the convergence rates of existing variants of ADAGRAD have a better dependence on the number of iterations under the strong convexity condition,

More information

6. Proximal gradient method

6. Proximal gradient method L. Vandenberghe EE236C (Spring 2013-14) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping

More information

Statistical Machine Learning II Spring 2017, Learning Theory, Lecture 4

Statistical Machine Learning II Spring 2017, Learning Theory, Lecture 4 Statistical Machine Learning II Spring 07, Learning Theory, Lecture 4 Jean Honorio jhonorio@purdue.edu Deterministic Optimization For brevity, everywhere differentiable functions will be called smooth.

More information

Multi-class classification via proximal mirror descent

Multi-class classification via proximal mirror descent Multi-class classification via proximal mirror descent Daria Reshetova Stanford EE department resh@stanford.edu Abstract We consider the problem of multi-class classification and a stochastic optimization

More information

SIAM Conference on Imaging Science, Bologna, Italy, Adaptive FISTA. Peter Ochs Saarland University

SIAM Conference on Imaging Science, Bologna, Italy, Adaptive FISTA. Peter Ochs Saarland University SIAM Conference on Imaging Science, Bologna, Italy, 2018 Adaptive FISTA Peter Ochs Saarland University 07.06.2018 joint work with Thomas Pock, TU Graz, Austria c 2018 Peter Ochs Adaptive FISTA 1 / 16 Some

More information

How hard is this function to optimize?

How hard is this function to optimize? How hard is this function to optimize? John Duchi Based on joint work with Sabyasachi Chatterjee, John Lafferty, Yuancheng Zhu Stanford University West Coast Optimization Rumble October 2016 Problem minimize

More information

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013 Convex Optimization (EE227A: UC Berkeley) Lecture 15 (Gradient methods III) 12 March, 2013 Suvrit Sra Optimal gradient methods 2 / 27 Optimal gradient methods We saw following efficiency estimates for

More information

arxiv: v1 [math.oc] 10 Oct 2018

arxiv: v1 [math.oc] 10 Oct 2018 8 Frank-Wolfe Method is Automatically Adaptive to Error Bound ondition arxiv:80.04765v [math.o] 0 Oct 08 Yi Xu yi-xu@uiowa.edu Tianbao Yang tianbao-yang@uiowa.edu Department of omputer Science, The University

More information

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725 Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h

More information

Primal-dual first-order methods with O(1/ɛ) iteration-complexity for cone programming

Primal-dual first-order methods with O(1/ɛ) iteration-complexity for cone programming Math. Program., Ser. A (2011) 126:1 29 DOI 10.1007/s10107-008-0261-6 FULL LENGTH PAPER Primal-dual first-order methods with O(1/ɛ) iteration-complexity for cone programming Guanghui Lan Zhaosong Lu Renato

More information

Bregman Divergence and Mirror Descent

Bregman Divergence and Mirror Descent Bregman Divergence and Mirror Descent Bregman Divergence Motivation Generalize squared Euclidean distance to a class of distances that all share similar properties Lots of applications in machine learning,

More information

Provable Non-Convex Min-Max Optimization

Provable Non-Convex Min-Max Optimization Provable Non-Convex Min-Max Optimization Mingrui Liu, Hassan Rafique, Qihang Lin, Tianbao Yang Department of Computer Science, The University of Iowa, Iowa City, IA, 52242 Department of Mathematics, The

More information

Perturbed Proximal Gradient Algorithm

Perturbed Proximal Gradient Algorithm Perturbed Proximal Gradient Algorithm Gersende FORT LTCI, CNRS, Telecom ParisTech Université Paris-Saclay, 75013, Paris, France Large-scale inverse problems and optimization Applications to image processing

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Proximal and First-Order Methods for Convex Optimization

Proximal and First-Order Methods for Convex Optimization Proximal and First-Order Methods for Convex Optimization John C Duchi Yoram Singer January, 03 Abstract We describe the proximal method for minimization of convex functions We review classical results,

More information

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some

More information

Stochastic Subgradient Method

Stochastic Subgradient Method Stochastic Subgradient Method Lingjie Weng, Yutian Chen Bren School of Information and Computer Science UC Irvine Subgradient Recall basic inequality for convex differentiable f : f y f x + f x T (y x)

More information

Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate

Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate Fast Stochastic Maximization with O1/n)-Convergence Rate Mingrui Liu 1 Xiaoxuan Zhang 1 Zaiyi Chen Xiaoyu Wang 3 ianbao Yang 1 Abstract In this paper we consider statistical learning with area under ROC

More information

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems O. Kolossoski R. D. C. Monteiro September 18, 2015 (Revised: September 28, 2016) Abstract

More information

A Greedy Framework for First-Order Optimization

A Greedy Framework for First-Order Optimization A Greedy Framework for First-Order Optimization Jacob Steinhardt Department of Computer Science Stanford University Stanford, CA 94305 jsteinhardt@cs.stanford.edu Jonathan Huggins Department of EECS Massachusetts

More information

Optimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison

Optimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison Optimization Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison optimization () cost constraints might be too much to cover in 3 hours optimization (for big

More information

Stochastic model-based minimization under high-order growth

Stochastic model-based minimization under high-order growth Stochastic model-based minimization under high-order growth Damek Davis Dmitriy Drusvyatskiy Kellie J. MacPhee Abstract Given a nonsmooth, nonconvex minimization problem, we consider algorithms that iteratively

More information

About Split Proximal Algorithms for the Q-Lasso

About Split Proximal Algorithms for the Q-Lasso Thai Journal of Mathematics Volume 5 (207) Number : 7 http://thaijmath.in.cmu.ac.th ISSN 686-0209 About Split Proximal Algorithms for the Q-Lasso Abdellatif Moudafi Aix Marseille Université, CNRS-L.S.I.S

More information

6 First-Order Methods for Nonsmooth Convex Large-Scale Optimization, II: Utilizing Problem s Structure

6 First-Order Methods for Nonsmooth Convex Large-Scale Optimization, II: Utilizing Problem s Structure 6 First-Order Methods for Nonsmooth Convex Large-Scale Optimization, II: Utilizing Problem s Structure Anatoli Juditsky Anatoli.Juditsky@imag.fr Laboratoire Jean Kuntzmann, Université J. Fourier B. P.

More information

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex concave saddle-point problems

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex concave saddle-point problems Optimization Methods and Software ISSN: 1055-6788 (Print) 1029-4937 (Online) Journal homepage: http://www.tandfonline.com/loi/goms20 An accelerated non-euclidean hybrid proximal extragradient-type algorithm

More information

Stability of optimization problems with stochastic dominance constraints

Stability of optimization problems with stochastic dominance constraints Stability of optimization problems with stochastic dominance constraints D. Dentcheva and W. Römisch Stevens Institute of Technology, Hoboken Humboldt-University Berlin www.math.hu-berlin.de/~romisch SIAM

More information

Dual Proximal Gradient Method

Dual Proximal Gradient Method Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method

More information

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725 Proximal Gradient Descent and Acceleration Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: subgradient method Consider the problem min f(x) with f convex, and dom(f) = R n. Subgradient method:

More information

Stochastic Gradient Descent with Only One Projection

Stochastic Gradient Descent with Only One Projection Stochastic Gradient Descent with Only One Projection Mehrdad Mahdavi, ianbao Yang, Rong Jin, Shenghuo Zhu, and Jinfeng Yi Dept. of Computer Science and Engineering, Michigan State University, MI, USA Machine

More information

A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation

A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation Zhouchen Lin Peking University April 22, 2018 Too Many Opt. Problems! Too Many Opt. Algorithms! Zero-th order algorithms:

More information

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices Ryota Tomioka 1, Taiji Suzuki 1, Masashi Sugiyama 2, Hisashi Kashima 1 1 The University of Tokyo 2 Tokyo Institute of Technology 2010-06-22

More information

Universal Gradient Methods for Convex Optimization Problems

Universal Gradient Methods for Convex Optimization Problems CORE DISCUSSION PAPER 203/26 Universal Gradient Methods for Convex Optimization Problems Yu. Nesterov April 8, 203; revised June 2, 203 Abstract In this paper, we present new methods for black-box convex

More information

1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method

1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method L. Vandenberghe EE236C (Spring 2016) 1. Gradient method gradient method, first-order methods quadratic bounds on convex functions analysis of gradient method 1-1 Approximate course outline First-order

More information

Lecture 6: Conic Optimization September 8

Lecture 6: Conic Optimization September 8 IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions

More information

The basic problem of interest in this paper is the convex programming (CP) problem given by

The basic problem of interest in this paper is the convex programming (CP) problem given by Noname manuscript No. (will be inserted by the editor) An optimal randomized incremental gradient method Guanghui Lan Yi Zhou the date of receipt and acceptance should be inserted later arxiv:1507.02000v3

More information

arxiv: v1 [math.oc] 18 Mar 2016

arxiv: v1 [math.oc] 18 Mar 2016 Katyusha: Accelerated Variance Reduction for Faster SGD Zeyuan Allen-Zhu zeyuan@csail.mit.edu Princeton University arxiv:1603.05953v1 [math.oc] 18 Mar 016 March 18, 016 Abstract We consider minimizing

More information

Primal-dual subgradient methods for convex problems

Primal-dual subgradient methods for convex problems Primal-dual subgradient methods for convex problems Yu. Nesterov March 2002, September 2005 (after revision) Abstract In this paper we present a new approach for constructing subgradient schemes for different

More information

Online Optimization : Competing with Dynamic Comparators

Online Optimization : Competing with Dynamic Comparators Ali Jadbabaie Alexander Rakhlin Shahin Shahrampour Karthik Sridharan University of Pennsylvania University of Pennsylvania University of Pennsylvania Cornell University Abstract Recent literature on online

More information

Accelerate Subgradient Methods

Accelerate Subgradient Methods Accelerate Subgradient Methods Tianbao Yang Department of Computer Science The University of Iowa Contributors: students Yi Xu, Yan Yan and colleague Qihang Lin Yang (CS@Uiowa) Accelerate Subgradient Methods

More information

Finite-sum Composition Optimization via Variance Reduced Gradient Descent

Finite-sum Composition Optimization via Variance Reduced Gradient Descent Finite-sum Composition Optimization via Variance Reduced Gradient Descent Xiangru Lian Mengdi Wang Ji Liu University of Rochester Princeton University University of Rochester xiangru@yandex.com mengdiw@princeton.edu

More information

Conditional Gradient Algorithms for Norm-Regularized Smooth Convex Optimization

Conditional Gradient Algorithms for Norm-Regularized Smooth Convex Optimization Conditional Gradient Algorithms for Norm-Regularized Smooth Convex Optimization Zaid Harchaoui Anatoli Juditsky Arkadi Nemirovski April 12, 2013 Abstract Motivated by some applications in signal processing

More information

Lecture 8 Plus properties, merit functions and gap functions. September 28, 2008

Lecture 8 Plus properties, merit functions and gap functions. September 28, 2008 Lecture 8 Plus properties, merit functions and gap functions September 28, 2008 Outline Plus-properties and F-uniqueness Equation reformulations of VI/CPs Merit functions Gap merit functions FP-I book:

More information

Optimisation non convexe avec garanties de complexité via Newton+gradient conjugué

Optimisation non convexe avec garanties de complexité via Newton+gradient conjugué Optimisation non convexe avec garanties de complexité via Newton+gradient conjugué Clément Royer (Université du Wisconsin-Madison, États-Unis) Toulouse, 8 janvier 2019 Nonconvex optimization via Newton-CG

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of

More information

Convergence rate of inexact proximal point methods with relative error criteria for convex optimization

Convergence rate of inexact proximal point methods with relative error criteria for convex optimization Convergence rate of inexact proximal point methods with relative error criteria for convex optimization Renato D. C. Monteiro B. F. Svaiter August, 010 Revised: December 1, 011) Abstract In this paper,

More information