arxiv: v1 [math.oc] 10 Oct 2018


Frank-Wolfe Method is Automatically Adaptive to Error Bound Condition

Yi Xu (yi-xu@uiowa.edu), Tianbao Yang (tianbao-yang@uiowa.edu)
Department of Computer Science, The University of Iowa, Iowa City, IA 52242

October 10, 2018

Abstract

Error bound conditions have recently attracted renewed interest in optimization. They have been leveraged to derive faster convergence for many popular algorithms, including subgradient methods, the proximal gradient method and the accelerated proximal gradient method. However, it is still unclear whether the Frank-Wolfe (FW) method can enjoy faster convergence under an error bound condition. In this short note, we give an affirmative answer to this question. We show that the FW method (with a line search for the step size) for optimization over a strongly convex set is automatically adaptive to the error bound condition of the problem. In particular, the iteration complexity of FW can be characterized by $O(\max(1/\epsilon^{1-\theta}, \log(1/\epsilon)))$, where $\theta \in [0,1]$ is a constant that characterizes the error bound condition. Our results imply that if the constraint set is characterized by a strongly convex function and the objective function can achieve a smaller value outside the considered domain, then the FW method enjoys a fast rate of $O(1/t^2)$.

1. Introduction

In this note, we consider the following constrained convex optimization problem:

$$\min_{x \in \Omega} f(x) \tag{1}$$

where $f(x)$ is a smooth function and $\Omega \subseteq E$ is a bounded strongly convex set. We assume that linear optimization over $\Omega$ is much cheaper than projection onto $\Omega$, which makes the FW method more suitable for solving the above problem than gradient methods. The goal of this paper is to show that the FW method is automatically adaptive to an error bound condition of the optimization problem. Below, we first review the FW method and the error bound condition. In the next section, we prove that the FW method is automatically adaptive to the error bound condition.
The original FW method, introduced by Frank and Wolfe (1956) and also known as the conditional gradient method (Levitin and Polyak, 1966), is a projection-free first-order method for minimizing smooth convex objective functions over a convex set. In recent years, the FW method has gained increasing interest in large-scale optimization and machine learning (e.g., (Garber and Hazan, 2015; Freund and Grigas, 2016; Nesterov, 2018; Narasimhan, 2018)). Many existing works have shown that the convergence rate of the standard FW method is $O(1/t)$ even for strongly convex objectives (Clarkson, 2008; Hazan, 2008; Jaggi, 2013), and in general this rate cannot be improved. Under different assumptions or for some special cases, a series of works have sought faster rates for the FW method and its variants (Levitin and Polyak, 1966; Demyanov and Rubinov, 1970; Dunn, 1979; Guélat and Marcotte, 1986; Beck and Teboulle, 2004; Garber and Hazan, 2013; Lan, 2013; Lacoste-Julien and Jaggi, 2013; Garber and Hazan, 2015; Lacoste-Julien and Jaggi, 2015; Lan and Zhou, 2016). For example, for minimizing smooth and strongly convex objective functions over a strongly convex set, Garber and Hazan (2015) showed that the FW method enjoys a fast rate of $O(1/t^2)$.

© Y. Xu & T. Yang.

In this paper, we first consider the FW method shown in Algorithm 1, where $L_f$ denotes a smoothness constant of $f(x)$ with respect to $\|\cdot\|$ such that

$$f(x) \le f(y) + \nabla f(y)^\top (x - y) + \frac{L_f}{2}\|x - y\|^2$$

holds for any $x, y \in \Omega$. Note that both options for selecting the step size have been considered in the literature (Jaggi, 2013; Garber and Hazan, 2015). Option I requires evaluating the objective function but does not need to know the smoothness constant. Option II could be cheaper but requires knowing the Lipschitz constant of the gradient. Our analysis applies to both options. In the sequel, we focus on Option I, with which we have

$$f(x_{t+1}) \le f(x_t + \eta(y_t - x_t)) \le f(x_t) + \eta (y_t - x_t)^\top \nabla f(x_t) + \frac{\eta^2 L_f}{2}\|y_t - x_t\|^2, \quad \forall \eta \in [0,1]. \tag{2}$$

Note that for Option II, the final upper bound on $f(x_{t+1})$ in (2) still holds.

We consider the following definition of an error bound condition for the optimization problem (1).

Definition 1 (Hölderian error bound (HEB)). A function $f(x)$ is said to satisfy a HEB condition on $\Omega$ if there exist $\theta \in [0,1]$ and $0 < c < \infty$ such that for any $x \in \Omega$

$$\min_{w \in \Omega^*} \|x - w\| \le c\,(f(x) - f^*)^{\theta}, \tag{3}$$

where $\Omega^*$ denotes the optimal set of $\min_{x \in \Omega} f(x)$ and $f^*$ denotes the optimal objective value.

Note that $\theta = 0$ is a trivial condition: it always holds because $\Omega$ is a compact set.
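As an illustration of the FW iteration with the Option II step size discussed above, here is a minimal sketch in plain Python. The setting is an assumption chosen for simplicity and is not taken from the paper: the domain is a Euclidean ball (for which the linear minimization step has a closed form), the objective is the hypothetical test function $f(x) = \frac{1}{2}\|x - b\|^2$ with $b$ outside the ball (so $L_f = 1$), and the names `lmo_l2_ball` and `frank_wolfe` are invented for this example.

```python
import math

def lmo_l2_ball(grad, radius):
    """Linear minimization oracle: argmin over ||y||_2 <= radius of <grad, y>."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [-radius * g / norm for g in grad]

def frank_wolfe(grad_f, x0, radius, smoothness, num_iters):
    """Frank-Wolfe with the Option II step (minimizer of the quadratic upper bound)."""
    x = list(x0)
    for _ in range(num_iters):
        g = grad_f(x)
        y = lmo_l2_ball(g, radius)
        d = [yi - xi for yi, xi in zip(y, x)]
        slope = sum(di * gi for di, gi in zip(d, g))  # (y_t - x_t)^T grad f(x_t)
        sq_norm = sum(di * di for di in d)
        if sq_norm == 0.0:
            break  # y_t == x_t: the iterate is already optimal for the linear model
        # Closed-form minimizer of eta * slope + eta^2 * L_f / 2 * ||d||^2 over [0, 1]
        eta = min(1.0, max(0.0, -slope / (smoothness * sq_norm)))
        x = [xi + eta * di for xi, di in zip(x, d)]
    return x

# Hypothetical test problem: f(x) = 0.5 * ||x - b||^2 over the unit ball, b outside it.
b = [2.0, 1.0]
grad_f = lambda x: [xi - bi for xi, bi in zip(x, b)]
x_hat = frank_wolfe(grad_f, x0=[-0.5, 0.3], radius=1.0, smoothness=1.0, num_iters=200)

# For this problem the constrained minimizer is the projection of b onto the ball.
x_star = [bi / math.sqrt(5.0) for bi in b]
gap = max(abs(u - v) for u, v in zip(x_hat, x_star))
print(gap)
```

Option I would replace the closed-form `eta` with a one-dimensional line search on $f$ itself; for this quadratic test function with $L_f = 1$ the two choices coincide.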
The above HEB condition has been considered for deriving faster convergence of subgradient methods (Yang and Lin, 2018), the proximal gradient method (Liu and Yang, 2017), the accelerated gradient method (Xu et al., 2016), and stochastic subgradient methods (Xu et al., 2017a). It has been shown that many problems satisfy the above condition (Xu et al., 2016, 2017a,b; Liu and Yang, 2017; Yang and Lin, 2018). For example, when functions are semialgebraic and regular (for instance, continuous), the above inequality is known to hold on any compact set (cf. (Bolte et al., 2017) and references therein).

The last definition in this section concerns strongly convex sets.

Definition 2. A convex set $\Omega$ is $\alpha$-strongly convex with respect to $\|\cdot\|$ if for any $x, y \in \Omega$, any $\gamma \in [0,1]$ and any vector $z \in E$ such that $\|z\| = 1$, it holds that

$$\gamma x + (1-\gamma) y + \gamma(1-\gamma)\frac{\alpha}{2}\|x - y\|^2 z \in \Omega.$$

Remark. Many previous works (e.g., (Levitin and Polyak, 1966; Demyanov and Rubinov, 1970; Dunn, 1979; Garber and Hazan, 2015)) considered this condition on the feasible set when studying the FW method.
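To make the definition of an $\alpha$-strongly convex set concrete, the following sketch numerically checks it for a Euclidean ball of radius $r$ with $\alpha = 1/r$, a standard example. The check is a randomized sanity test rather than a proof, and the function names, dimension, sample count, and tolerance are all assumptions made for this illustration.

```python
import math
import random

def random_point_in_ball(dim, radius, rng):
    """Sample a point of the Euclidean ball: random direction, random length <= radius."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    scale = radius * rng.random() / norm
    return [scale * c for c in v]

def random_unit_vector(dim, rng):
    """Sample a vector z with ||z||_2 = 1."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def count_violations(dim, radius, alpha, trials, seed=0):
    """Count samples where gamma*x + (1-gamma)*y + gamma*(1-gamma)*(alpha/2)*||x-y||^2 * z leaves the ball."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        x = random_point_in_ball(dim, radius, rng)
        y = random_point_in_ball(dim, radius, rng)
        z = random_unit_vector(dim, rng)
        gamma = rng.random()
        dist_sq = sum((a - b) ** 2 for a, b in zip(x, y))
        shift = gamma * (1.0 - gamma) * 0.5 * alpha * dist_sq
        p = [gamma * a + (1.0 - gamma) * b + shift * c for a, b, c in zip(x, y, z)]
        if math.sqrt(sum(c * c for c in p)) > radius + 1e-9:  # small floating-point tolerance
            violations += 1
    return violations

# Ball of radius 2 in dimension 5, tested with alpha = 1 / radius.
violations = count_violations(dim=5, radius=2.0, alpha=0.5, trials=10000)
print(violations)
```

The reason no violation is expected: with $s = \gamma(1-\gamma)\|x-y\|^2$, the convex combination has norm at most $\sqrt{r^2 - s}$, and $\sqrt{r^2 - s} + \frac{s}{2r} \le r$ for all $0 \le s \le r^2$.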

Algorithm 1: Frank-Wolfe Method
  Initialization: $x_0 \in \Omega$
  for $t = 0, \ldots, T$ do
    Compute $y_t \in \arg\min_{y \in \Omega} \nabla f(x_t)^\top y$
    Option I: Set $\eta_t = \arg\min_{\eta \in [0,1]} f(x_t + \eta(y_t - x_t))$
    Option II: Set $\eta_t = \arg\min_{\eta \in [0,1]} \eta (y_t - x_t)^\top \nabla f(x_t) + \frac{\eta^2 L_f}{2}\|y_t - x_t\|^2$
    Compute $x_{t+1} = x_t + \eta_t (y_t - x_t)$
  end for

2. Adaptive Convergence of the FW method

In this section, we show that the FW method is automatically adaptive to the HEB condition, enjoying a faster convergence rate than the standard $O(1/t)$ rate without knowledge of the HEB condition. We first prove the following lemma.

Lemma 3. Assume $f(x)$ obeys the HEB condition on $\Omega$ with $\theta \in [0,1]$. Then it holds that

$$\|\nabla f(x)\|_* \ge \frac{1}{c}(f(x) - f^*)^{1-\theta}.$$

Proof. Let $x^*$ denote the optimal solution in $\Omega^*$ that is closest to $x$ measured in $\|\cdot\|$. By convexity of $f(\cdot)$, we have

$$f(x^*) \ge f(x) + \nabla f(x)^\top (x^* - x).$$

Thus,

$$f(x) - f(x^*) \le \|\nabla f(x)\|_* \|x - x^*\| \le c\,(f(x) - f^*)^{\theta}\,\|\nabla f(x)\|_*.$$

As a result,

$$\|\nabla f(x)\|_* \ge \frac{1}{c}(f(x) - f^*)^{1-\theta}.$$

The second lemma is from (Garber and Hazan, 2015).

Lemma 4. For the FW method given in Algorithm 1, for $t = 0, 1, \ldots$, we have

$$f(x_{t+1}) - f^* \le (f(x_t) - f^*) \max\left\{\frac{1}{2},\; 1 - \frac{\alpha \|\nabla f(x_t)\|_*}{8 L_f}\right\}.$$

Finally, we prove the following theorem.

Theorem 1. For every $t \ge 1$, we have

$$f(x_t) - f^* \le \begin{cases} \dfrac{C}{(t+k)^{1/(1-\theta)}} & \text{if } \theta \in [0,1), \\[2mm] \rho^t (f(x_0) - f^*) & \text{otherwise,} \end{cases}$$

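where $D$ is an upper bound on the diameter of $\Omega$ (so that $\|x - y\| \le D$ for all $x, y \in \Omega$),

$$k \ge \frac{2 - 2^{1-\theta}}{2^{1-\theta} - 1}, \quad C \ge \max\left\{\frac{L_f D^2 (1+k)^{1/(1-\theta)}}{2},\; 2\left(\frac{\widetilde{C}}{M}\right)^{1/(1-\theta)}\right\}, \quad M = \frac{\alpha}{8 c L_f}, \quad \widetilde{C} = \frac{1}{1 - \theta - \theta(2^{1-\theta} - 1)},$$

and $\rho = \max\left\{\frac{1}{2},\, 1 - \frac{\alpha}{8 c L_f}\right\}$.

Remark. In order to find an $\epsilon$-approximate solution $x_t$ such that $f(x_t) - f^* \le \epsilon$, the iteration complexity of the FW method is $O(\max(1/\epsilon^{1-\theta}, \log(1/\epsilon)))$ with $\theta \in [0,1]$.

Proof. When $\theta = 1$, the conclusion is trivial: it follows directly from Lemma 3 and Lemma 4. Next, we prove the result for $\theta \in [0,1)$. Let $\beta = 1 - \theta$ and $h_t = f(x_t) - f^*$. Combining Lemma 3 and Lemma 4, we have

$$h_{t+1} \le h_t \max\left\{\frac{1}{2},\; 1 - \frac{\alpha}{8 c L_f} h_t^{\beta}\right\} = h_t \max\left\{\frac{1}{2},\; 1 - M h_t^{\beta}\right\}. \tag{4}$$

We prove by induction that $h_t \le C/(t+k)^{1/\beta}$. For the case $t = 1$, following (2) we have

$$h_1 \le h_0 (1 - \eta) + \frac{L_f \eta^2}{2} D^2 \le \max\left\{\frac{L_f D^2}{2},\, h_0\right\} \le \frac{L_f D^2}{2}, \quad \forall \eta \in [0,1],$$

where we use the fact that $h_0 = f(x_0) - f(x^*) \le \nabla f(x^*)^\top (x_0 - x^*) + \frac{L_f}{2}\|x_0 - x^*\|^2 = \frac{L_f}{2}\|x_0 - x^*\|^2 \le \frac{L_f D^2}{2}$. (Taking $\eta = 1$ in the first inequality also gives $h_1 \le L_f D^2/2$ directly.) As long as $C/(1+k)^{1/\beta} \ge L_f D^2/2$, the conclusion holds for $t = 1$.

Next, we consider $t \ge 1$. First assume that the max operation in (4) gives $1/2$, i.e.,

$$h_{t+1} \le \frac{h_t}{2} \le \frac{C}{2(t+k)^{1/\beta}} = \frac{C}{(t+1+k)^{1/\beta}} \cdot \frac{(t+1+k)^{1/\beta}}{2(t+k)^{1/\beta}} \le \frac{C}{(t+1+k)^{1/\beta}},$$

where the last inequality holds as long as

$$\frac{(t+1+k)^{1/\beta}}{(t+k)^{1/\beta}} \le 2, \;\forall t \ge 1, \quad \text{i.e.,} \quad 1 + \frac{1}{t+k} \le 2^{\beta}, \;\forall t \ge 1, \quad \text{i.e.,} \quad k \ge \frac{2 - 2^{\beta}}{2^{\beta} - 1}.$$

Next, consider the case that the max operation gives the second argument. In this case, if $h_t \le \frac{C}{2(t+k)^{1/\beta}}$, the same conclusion holds under the above condition on $k$. Otherwise, $\frac{C}{2(t+k)^{1/\beta}} < h_t \le \frac{C}{(t+k)^{1/\beta}}$. We have

$$h_{t+1} \le h_t (1 - M h_t^{\beta}) \le \frac{C}{(t+k)^{1/\beta}} \left(1 - \frac{M (C/2)^{\beta}}{t+k}\right) = \frac{C}{(t+k+1)^{1/\beta}} \left(1 + \frac{1}{t+k}\right)^{1/\beta} \left(1 - \frac{M (C/2)^{\beta}}{t+k}\right) \le \frac{C}{(t+k+1)^{1/\beta}}.$$

To show that the last inequality holds, we can set $\widetilde{C} = \frac{1}{\beta - (1-\beta)(2^{\beta} - 1)} \ge 1$ and $C \ge 2(\widetilde{C}/M)^{1/\beta}$. To see this, we need to show that

$$\log(1 + \widetilde{C} x) - \frac{1}{\beta}\log(1 + x) \ge 0, \quad 0 \le x \le 2^{\beta} - 1.$$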

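In fact, the function $x \mapsto \log(1 + \widetilde{C} x) - \frac{1}{\beta}\log(1 + x)$ vanishes at $x = 0$, and its derivative satisfies

$$\frac{\widetilde{C}}{1 + \widetilde{C} x} - \frac{1}{\beta(1 + x)} \ge 0, \quad 0 \le x \le 2^{\beta} - 1,$$

which gives $1 + \widetilde{C} x \ge (1 + x)^{1/\beta}$ for all $0 \le x \le 2^{\beta} - 1$. Plugging $x = 1/(t+k) \le 2^{\beta} - 1$ into this inequality, we get what we want:

$$1 + \frac{\widetilde{C}}{t+k} \ge \left(1 + \frac{1}{t+k}\right)^{1/\beta}.$$

Indeed, since $M (C/2)^{\beta} \ge \widetilde{C}$, we then have $1 - \frac{M (C/2)^{\beta}}{t+k} \le 1 - \frac{\widetilde{C}}{t+k} \le \frac{1}{1 + \widetilde{C}/(t+k)} \le \left(1 + \frac{1}{t+k}\right)^{-1/\beta}$, which is exactly the inequality needed in the induction step.

3. Examples

Lastly, we give examples exhibiting the HEB condition with $\theta = 1/2$. In particular, let us consider

$$\min_{g(x) \le r} f(x) \tag{5}$$

where $g(x)$ is a non-negative, strongly convex and smooth function. It is shown that $\Omega = \{x : g(x) \le r\}$ is a strongly convex set (Garber and Hazan, 2015).

Lemma 5. Assume that $\min_x f(x) < \min_{g(x) \le r} f(x)$ and that there exists a point $x_0$ such that $g(x_0) < r$. Then the above problem satisfies HEB with $\theta = 1/2$.

Proof. We set $\Omega = \{x : g(x) \le r\}$ and $\Omega^* = \arg\min_{g(x) \le r} f(x)$, and we define an indicator function as follows:

$$I_{\Omega}(x) = \begin{cases} 0 & \text{if } x \in \Omega, \\ +\infty & \text{if } x \notin \Omega. \end{cases}$$

Then problem (5) can be written as $\min_x \hat{f}(x) := f(x) + I_{\Omega}(x)$, and thus we also have $\Omega^* = \arg\min_x \hat{f}(x)$. We only need to consider any fixed $x^* \in \Omega^*$. By the condition $g(x_0) < r$ and Corollary 28.2.1 of (Rockafellar, 1970), there exists $\lambda^* \ge 0$ such that

$$\hat{f}(x^*) = \min_x \hat{f}(x) = f(x^*) = \min_{x \in \Omega} f(x) = \min_x \{f(x) + \lambda^* (g(x) - r)\} \le f(x^*) + \lambda^* (g(x^*) - r) \le f(x^*), \tag{6}$$

where the first inequality is due to $x^* \in \Omega^*$; the second inequality uses the fact that $x^* \in \Omega$ and hence $g(x^*) - r \le 0$. Then equality holds throughout (6), which implies $f(x^*) + \lambda^* (g(x^*) - r) = f(x^*)$, that is,

$$\lambda^* (g(x^*) - r) = 0. \tag{7}$$

On the other hand, let $u^* \in \arg\min_x f(x)$. Based on the assumption $\min_x f(x) < \min_{g(x) \le r} f(x)$, we know $u^* \notin \Omega$ and hence $u^* \notin \Omega^*$. By (6), we also know

$$f(u^*) < \min_{x \in \Omega} f(x) = \min_x \{f(x) + \lambda^* (g(x) - r)\} \le f(u^*) + \lambda^* (g(u^*) - r),$$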

which implies

$$\lambda^* (g(u^*) - r) > 0. \tag{8}$$

Since $u^* \notin \Omega$, we have $g(u^*) - r > 0$. In order to have (8), we need $\lambda^* > 0$. Thus, by (7) we have

$$g(x^*) - r = 0. \tag{9}$$

For any such $\lambda^* > 0$, by Theorem 28.1 of (Rockafellar, 1970), we also have

$$\Omega^* = \{x : g(x) = r\} \cap \arg\min_x \{f(x) + \lambda^* (g(x) - r)\}. \tag{10}$$

Since $g(x)$ is strongly convex, $f(x)$ is convex and $\lambda^* > 0$, the function $f(x) + \lambda^* (g(x) - r)$ is also strongly convex, implying that $v^* = \arg\min_x \{f(x) + \lambda^* (g(x) - r)\}$ is a unique constant. Due to $\lambda^* > 0$, $g(v^*)$ is also a constant (Li and Pong, 2017). By (10) we have $g(v^*) = g(x^*) = r$. Therefore,

$$\Omega^* = \arg\min_x \{f(x) + \lambda^* (g(x) - r)\}. \tag{11}$$

By the strong convexity of $f(x) + \lambda^* (g(x) - r)$, we know that for any $x \in \Omega$ and $x^* \in \Omega^* \subseteq \Omega$,

$$c\,\|x - x^*\|^2 \le f(x) + \lambda^* (g(x) - r) - [f(x^*) + \lambda^* (g(x^*) - r)],$$

where $c > 0$. Since $\lambda^* > 0$, $g(x) - r \le 0$ and $g(x^*) - r = 0$, we get

$$c\,\|x - x^*\|^2 \le f(x) - f(x^*).$$

Therefore, for any $x \in \Omega$,

$$\min_{w \in \Omega^*} \|x - w\| \le \frac{1}{\sqrt{c}}\,(f(x) - f^*)^{1/2},$$

which implies $\theta = 1/2$.

References

Amir Beck and Marc Teboulle. A conditional gradient method with linear rate of convergence for solving convex linear systems. Mathematical Methods of Operations Research, 59(2):235-247, 2004.

Jérôme Bolte, Trong Phong Nguyen, Juan Peypouquet, and Bruce W. Suter. From error bounds to the complexity of first-order descent methods for convex functions. Mathematical Programming, 165(2):471-507, 2017.

Kenneth L. Clarkson. Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). Society for Industrial and Applied Mathematics, 2008.

Vladimir Fedorovich Demyanov and Aleksandr Moiseevich Rubinov. Approximate Methods in Optimization Problems, volume 32. Elsevier Publishing Company, 1970.

Joseph Dunn. Rates of convergence for conditional gradient algorithms near singular and nonsingular extremals. SIAM Journal on Control and Optimization, 17(2):187-211, 1979.

Marguerite Frank and Philip Wolfe. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1-2):95-110, 1956.

Robert M. Freund and Paul Grigas. New analysis and results for the Frank-Wolfe method. Mathematical Programming, 155(1-2):199-230, 2016.

Dan Garber and Elad Hazan. Playing non-linear games with linear oracles. In Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on, pages 420-428, 2013.

Dan Garber and Elad Hazan. Faster rates for the Frank-Wolfe method over strongly-convex sets. In Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015.

Jacques Guélat and Patrice Marcotte. Some comments on Wolfe's away step. Mathematical Programming, 35(1):110-119, 1986.

Elad Hazan. Sparse approximate solutions to semidefinite programs. In Latin American Symposium on Theoretical Informatics. Springer, 2008.

Martin Jaggi. Revisiting Frank-Wolfe: projection-free sparse convex optimization. In Proceedings of the 30th International Conference on Machine Learning (ICML), 2013.

Simon Lacoste-Julien and Martin Jaggi. An affine invariant linear convergence analysis for Frank-Wolfe algorithms. arXiv preprint arXiv:1312.7864, 2013.

Simon Lacoste-Julien and Martin Jaggi. On the global linear convergence of Frank-Wolfe optimization variants. In Advances in Neural Information Processing Systems (NIPS), 2015.

Guanghui Lan. The complexity of large-scale convex programming under a linear optimization oracle. arXiv preprint, 2013.

Guanghui Lan and Yi Zhou. Conditional gradient sliding for convex optimization. SIAM Journal on Optimization, 26(2), 2016.

E. S. Levitin and B. T. Polyak. Constrained minimization methods. USSR Computational Mathematics and Mathematical Physics, 6(5):1-50, 1966.

Guoyin Li and Ting Kei Pong. Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods. Foundations of Computational Mathematics, pages 1-34, 2017.

Mingrui Liu and Tianbao Yang. Adaptive accelerated gradient converging method under Hölderian error bound condition. In Advances in Neural Information Processing Systems, 2017.

Harikrishna Narasimhan. Learning with complex loss functions and constraints. In International Conference on Artificial Intelligence and Statistics, 2018.

Yu. Nesterov. Complexity bounds for primal-dual methods minimizing the model of objective function. Mathematical Programming, 171(1-2):311-330, 2018.

R. Tyrrell Rockafellar. Convex Analysis. Princeton University Press, 1970.

Yi Xu, Yan Yan, Qihang Lin, and Tianbao Yang. Homotopy smoothing for non-smooth problems with lower complexity than O(1/ε). In Advances in Neural Information Processing Systems (NIPS), pages 1208-1216, 2016.

Yi Xu, Qihang Lin, and Tianbao Yang. Stochastic convex optimization: Faster local growth implies faster global convergence. In Proceedings of the 34th International Conference on Machine Learning (ICML), 2017a.

Yi Xu, Mingrui Liu, Qihang Lin, and Tianbao Yang. ADMM without a fixed penalty parameter: Faster convergence with new adaptive penalization. In Advances in Neural Information Processing Systems 30 (NIPS), pages 1267-1277, 2017b.

Tianbao Yang and Qihang Lin. RSG: Beating subgradient method without smoothness and strong convexity. Journal of Machine Learning Research, 19(6), 2018.


More information

One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties

One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties Fedor S. Stonyakin 1 and Alexander A. Titov 1 V. I. Vernadsky Crimean Federal University, Simferopol,

More information

Stochastic Optimization: First order method

Stochastic Optimization: First order method Stochastic Optimization: First order method Taiji Suzuki Tokyo Institute of Technology Graduate School of Information Science and Engineering Department of Mathematical and Computing Sciences JST, PRESTO

More information

SVD-free Convex-Concave Approaches for Nuclear Norm Regularization

SVD-free Convex-Concave Approaches for Nuclear Norm Regularization SV-free Convex-Concave Approaches for Nuclear Norm Regularization Yichi Xiao, Zhe Li, ianbao Yang, Lijun Zhang National Key Laboratory for Novel Software echnology, Nanjing University, Nanjing 003, China

More information

Sharpness, Restart and Compressed Sensing Performance.

Sharpness, Restart and Compressed Sensing Performance. Sharpness, Restart and Compressed Sensing Performance. Alexandre d Aspremont, CNRS & D.I., Ecole normale supérieure. With Vincent Roulet (U. Washington) and Nicolas Boumal (Princeton U.). Support from

More information

Gradient Sliding for Composite Optimization

Gradient Sliding for Composite Optimization Noname manuscript No. (will be inserted by the editor) Gradient Sliding for Composite Optimization Guanghui Lan the date of receipt and acceptance should be inserted later Abstract We consider in this

More information

A Unified Analysis of Stochastic Momentum Methods for Deep Learning

A Unified Analysis of Stochastic Momentum Methods for Deep Learning A Unified Analysis of Stochastic Momentum Methods for Deep Learning Yan Yan,2, Tianbao Yang 3, Zhe Li 3, Qihang Lin 4, Yi Yang,2 SUSTech-UTS Joint Centre of CIS, Southern University of Science and Technology

More information

Coordinate Descent and Ascent Methods

Coordinate Descent and Ascent Methods Coordinate Descent and Ascent Methods Julie Nutini Machine Learning Reading Group November 3 rd, 2015 1 / 22 Projected-Gradient Methods Motivation Rewrite non-smooth problem as smooth constrained problem:

More information

Stochastic gradient descent and robustness to ill-conditioning

Stochastic gradient descent and robustness to ill-conditioning Stochastic gradient descent and robustness to ill-conditioning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France ÉCOLE NORMALE SUPÉRIEURE Joint work with Aymeric Dieuleveut, Nicolas Flammarion,

More information

Barzilai-Borwein Step Size for Stochastic Gradient Descent

Barzilai-Borwein Step Size for Stochastic Gradient Descent Barzilai-Borwein Step Size for Stochastic Gradient Descent Conghui Tan The Chinese University of Hong Kong chtan@se.cuhk.edu.hk Shiqian Ma The Chinese University of Hong Kong sqma@se.cuhk.edu.hk Yu-Hong

More information

A Linearly Convergent Conditional Gradient Algorithm with Applications to Online and Stochastic Optimization

A Linearly Convergent Conditional Gradient Algorithm with Applications to Online and Stochastic Optimization A Linearly Convergent Conditional Gradient Algorithm with Applications to Online and Stochastic Optimization Dan Garber Technion - Israel Inst. of Tech. dangar@tx.technion.ac.il Elad Hazan Technion - Israel

More information

arxiv: v3 [math.oc] 25 Nov 2015

arxiv: v3 [math.oc] 25 Nov 2015 arxiv:1507.04073v3 [math.oc] 5 Nov 015 On the von Neumann and Frank-Wolfe Algorithms with Away Steps Javier Peña Daniel Rodríguez Negar Soheili October 14, 018 Abstract The von Neumann algorithm is a simple

More information

Adaptive Restarting for First Order Optimization Methods

Adaptive Restarting for First Order Optimization Methods Adaptive Restarting for First Order Optimization Methods Nesterov method for smooth convex optimization adpative restarting schemes step-size insensitivity extension to non-smooth optimization continuation

More information

Characterization of Gradient Dominance and Regularity Conditions for Neural Networks

Characterization of Gradient Dominance and Regularity Conditions for Neural Networks Characterization of Gradient Dominance and Regularity Conditions for Neural Networks Yi Zhou Ohio State University Yingbin Liang Ohio State University Abstract zhou.1172@osu.edu liang.889@osu.edu The past

More information

A Two-Stage Approach for Learning a Sparse Model with Sharp Excess Risk Analysis

A Two-Stage Approach for Learning a Sparse Model with Sharp Excess Risk Analysis Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-7) A Two-Stage Approach for Learning a Sparse Model with Sharp Excess Risk Analysis Zhe Li, Tianbao Yang, Lijun Zhang, Rong

More information

constrained convex optimization, level-set technique, first-order methods, com-

constrained convex optimization, level-set technique, first-order methods, com- 3 4 5 6 7 8 9 0 3 4 5 6 7 8 A LEVEL-SET METHOD FOR CONVEX OPTIMIZATION WITH A FEASIBLE SOLUTION PATH QIHANG LIN, SELVAPRABU NADARAJAH, AND NEGAR SOHEILI Abstract. Large-scale constrained convex optimization

More information

FAST DISTRIBUTED COORDINATE DESCENT FOR NON-STRONGLY CONVEX LOSSES. Olivier Fercoq Zheng Qu Peter Richtárik Martin Takáč

FAST DISTRIBUTED COORDINATE DESCENT FOR NON-STRONGLY CONVEX LOSSES. Olivier Fercoq Zheng Qu Peter Richtárik Martin Takáč FAST DISTRIBUTED COORDINATE DESCENT FOR NON-STRONGLY CONVEX LOSSES Olivier Fercoq Zheng Qu Peter Richtárik Martin Takáč School of Mathematics, University of Edinburgh, Edinburgh, EH9 3JZ, United Kingdom

More information

A DELAYED PROXIMAL GRADIENT METHOD WITH LINEAR CONVERGENCE RATE. Hamid Reza Feyzmahdavian, Arda Aytekin, and Mikael Johansson

A DELAYED PROXIMAL GRADIENT METHOD WITH LINEAR CONVERGENCE RATE. Hamid Reza Feyzmahdavian, Arda Aytekin, and Mikael Johansson 204 IEEE INTERNATIONAL WORKSHOP ON ACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 2 24, 204, REIS, FRANCE A DELAYED PROXIAL GRADIENT ETHOD WITH LINEAR CONVERGENCE RATE Hamid Reza Feyzmahdavian, Arda Aytekin,

More information

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013 Convex Optimization (EE227A: UC Berkeley) Lecture 15 (Gradient methods III) 12 March, 2013 Suvrit Sra Optimal gradient methods 2 / 27 Optimal gradient methods We saw following efficiency estimates for

More information

Big Data Analytics: Optimization and Randomization

Big Data Analytics: Optimization and Randomization Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.

More information

The FTRL Algorithm with Strongly Convex Regularizers

The FTRL Algorithm with Strongly Convex Regularizers CSE599s, Spring 202, Online Learning Lecture 8-04/9/202 The FTRL Algorithm with Strongly Convex Regularizers Lecturer: Brandan McMahan Scribe: Tamara Bonaci Introduction In the last lecture, we talked

More information

Modern Stochastic Methods. Ryan Tibshirani (notes by Sashank Reddi and Ryan Tibshirani) Convex Optimization

Modern Stochastic Methods. Ryan Tibshirani (notes by Sashank Reddi and Ryan Tibshirani) Convex Optimization Modern Stochastic Methods Ryan Tibshirani (notes by Sashank Reddi and Ryan Tibshirani) Convex Optimization 10-725 Last time: conditional gradient method For the problem min x f(x) subject to x C where

More information

Recent Advances in Structured Sparse Models

Recent Advances in Structured Sparse Models Recent Advances in Structured Sparse Models Julien Mairal Willow group - INRIA - ENS - Paris 21 September 2010 LEAR seminar At Grenoble, September 21 st, 2010 Julien Mairal Recent Advances in Structured

More information

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725 Proximal Gradient Descent and Acceleration Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: subgradient method Consider the problem min f(x) with f convex, and dom(f) = R n. Subgradient method:

More information

Polytope conditioning and linear convergence of the Frank-Wolfe algorithm

Polytope conditioning and linear convergence of the Frank-Wolfe algorithm Polytope conditioning and linear convergence of the Frank-Wolfe algorithm Javier Peña Daniel Rodríguez December 24, 206 Abstract It is known that the gradient descent algorithm converges linearly when

More information

Distributed Inexact Newton-type Pursuit for Non-convex Sparse Learning

Distributed Inexact Newton-type Pursuit for Non-convex Sparse Learning Distributed Inexact Newton-type Pursuit for Non-convex Sparse Learning Bo Liu Department of Computer Science, Rutgers Univeristy Xiao-Tong Yuan BDAT Lab, Nanjing University of Information Science and Technology

More information

arxiv: v1 [math.oc] 18 Mar 2016

arxiv: v1 [math.oc] 18 Mar 2016 Katyusha: Accelerated Variance Reduction for Faster SGD Zeyuan Allen-Zhu zeyuan@csail.mit.edu Princeton University arxiv:1603.05953v1 [math.oc] 18 Mar 016 March 18, 016 Abstract We consider minimizing

More information

Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming

Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming Mathematical Programming manuscript No. (will be inserted by the editor) Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming Guanghui Lan Zhaosong Lu Renato D. C. Monteiro

More information

Barzilai-Borwein Step Size for Stochastic Gradient Descent

Barzilai-Borwein Step Size for Stochastic Gradient Descent Barzilai-Borwein Step Size for Stochastic Gradient Descent Conghui Tan Shiqian Ma Yu-Hong Dai Yuqiu Qian May 16, 2016 Abstract One of the major issues in stochastic gradient descent (SGD) methods is how

More information

Agenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples

Agenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples Agenda Fast proximal gradient methods 1 Accelerated first-order methods 2 Auxiliary sequences 3 Convergence analysis 4 Numerical examples 5 Optimality of Nesterov s scheme Last time Proximal gradient method

More information

Tutorial: PART 2. Optimization for Machine Learning. Elad Hazan Princeton University. + help from Sanjeev Arora & Yoram Singer

Tutorial: PART 2. Optimization for Machine Learning. Elad Hazan Princeton University. + help from Sanjeev Arora & Yoram Singer Tutorial: PART 2 Optimization for Machine Learning Elad Hazan Princeton University + help from Sanjeev Arora & Yoram Singer Agenda 1. Learning as mathematical optimization Stochastic optimization, ERM,

More information

Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint

Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint Marc Teboulle School of Mathematical Sciences Tel Aviv University Joint work with Ronny Luss Optimization and

More information

Approximate Second Order Algorithms. Seo Taek Kong, Nithin Tangellamudi, Zhikai Guo

Approximate Second Order Algorithms. Seo Taek Kong, Nithin Tangellamudi, Zhikai Guo Approximate Second Order Algorithms Seo Taek Kong, Nithin Tangellamudi, Zhikai Guo Why Second Order Algorithms? Invariant under affine transformations e.g. stretching a function preserves the convergence

More information

c 2015 Society for Industrial and Applied Mathematics

c 2015 Society for Industrial and Applied Mathematics SIAM J. OPTIM. Vol. 5, No. 1, pp. 115 19 c 015 Society for Industrial and Applied Mathematics DUALITY BETWEEN SUBGRADIENT AND CONDITIONAL GRADIENT METHODS FRANCIS BACH Abstract. Given a convex optimization

More information

Subgradient Method. Ryan Tibshirani Convex Optimization

Subgradient Method. Ryan Tibshirani Convex Optimization Subgradient Method Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last last time: gradient descent min x f(x) for f convex and differentiable, dom(f) = R n. Gradient descent: choose initial

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

Smoothing Proximal Gradient Method. General Structured Sparse Regression

Smoothing Proximal Gradient Method. General Structured Sparse Regression for General Structured Sparse Regression Xi Chen, Qihang Lin, Seyoung Kim, Jaime G. Carbonell, Eric P. Xing (Annals of Applied Statistics, 2012) Gatsby Unit, Tea Talk October 25, 2013 Outline Motivation:

More information

A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation

A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation Zhouchen Lin Peking University April 22, 2018 Too Many Opt. Problems! Too Many Opt. Algorithms! Zero-th order algorithms:

More information

Global convergence of the Heavy-ball method for convex optimization

Global convergence of the Heavy-ball method for convex optimization Noname manuscript No. will be inserted by the editor Global convergence of the Heavy-ball method for convex optimization Euhanna Ghadimi Hamid Reza Feyzmahdavian Mikael Johansson Received: date / Accepted:

More information

A LINEARLY CONVERGENT CONDITIONAL GRADIENT ALGORITHM WITH APPLICATIONS TO ONLINE AND STOCHASTIC OPTIMIZATION

A LINEARLY CONVERGENT CONDITIONAL GRADIENT ALGORITHM WITH APPLICATIONS TO ONLINE AND STOCHASTIC OPTIMIZATION A LINEARLY CONVERGENT CONDITIONAL GRADIENT ALGORITHM WITH APPLICATIONS TO ONLINE AND STOCHASTIC OPTIMIZATION DAN GARBER AND ELAD HAZAN Abstract. Linear optimization is many times algorithmically simpler

More information

arxiv: v2 [math.oc] 1 Nov 2017

arxiv: v2 [math.oc] 1 Nov 2017 Stochastic Non-convex Optimization with Strong High Probability Second-order Convergence arxiv:1710.09447v [math.oc] 1 Nov 017 Mingrui Liu, Tianbao Yang Department of Computer Science The University of

More information

Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method

Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method Davood Hajinezhad Iowa State University Davood Hajinezhad Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method 1 / 35 Co-Authors

More information

arxiv: v1 [math.oc] 23 Jan 2019

arxiv: v1 [math.oc] 23 Jan 2019 Model Function Based Conditional Gradient Method with Armijo-like Line Search arxiv:1901.08087v1 [math.oc] 23 Jan 2019 Yura Malitsky and Peter Ochs Univeristy of Göttingen, Göttingen, Germany Saarland

More information