Sharpness, Restart and Compressed Sensing Performance.


Sharpness, Restart and Compressed Sensing Performance.
Alexandre d'Aspremont, CNRS & D.I., École normale supérieure.
With Vincent Roulet (U. Washington) and Nicolas Boumal (Princeton U.). Support from ERC SIPA.
Newton Institute, January 2018.

Introduction

Statistical performance vs. computational complexity. There is a clear empirical link between statistical performance and computational complexity, yet the quantities describing computational complexity lack statistical meaning.

Today: two minor enigmas...

Introduction

"Fast solution of $\ell_1$-norm minimization problems when the solution may be sparse" by [Donoho and Tsaig, 2008].

[Figure 3 of that paper: computational cost of Homotopy. Panel (a) shows the operation count as a fraction of one least-squares solution, with n = 1000; panel (b) shows the number of iterations as a fraction of d = n. A superimposed dashed curve marks the region below which Homotopy recovers the sparsest solution with high probability: in the k-step region, Homotopy recovers the sparse solution in exactly k steps; in the k-sparse region, $\ell_1$ minimization correctly recovers k-sparse signals.]

First enigma: the phase transitions for computation and recovery match...

Introduction

"Templates for convex cone problems with applications to sparse signal recovery" (TFOCS) by [Becker, Candès, and Grant, 2011b].

[Figure 6 of that paper: first-order methods applied to a smoothed Dantzig selector model, plotting $\|x_k - x^*_\mu\| / \|x^*_\mu\|$ against calls to $A$ and $A^*$. Left: all variants (AT, LLM, TS, N07, N83, GRA) using fixed step sizes and backtracking line search, no restart. Right: the AT method with various restart strategies (fixed step, no restart, restart every 100, 200 or 400 iterations).]

Second enigma: restarting yields linear convergence...

Outline

Today:
- Sharpness
- Optimal restart schemes, adaptation
- Compressed Sensing Performance
- Numerical results

Sharpness

Consider
$$\begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & x \in Q \end{array}$$
where $f(x)$ is a convex function and $Q \subseteq \mathbb{R}^n$.

Assume the gradient $\nabla f$ is Hölder continuous,
$$\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|^{s-1}, \quad \text{for every } x, y \in \mathbb{R}^n.$$

Assume sharpness, i.e.
$$\mu \, d(x, X^*)^r \le f(x) - f^*, \quad \text{for every } x \in K,$$
where $f^*$ is the minimum of $f$, $K \subseteq \mathbb{R}^n$ is a compact set, $d(x, X^*)$ is the distance from $x$ to the set $X^* \subseteq K$ of minimizers of $f$, and $r \ge 1$, $\mu > 0$ are constants.

Sharpness, Restart

Strong convexity is a particular case of sharpness:
$$\mu \, d(x, X^*)^2 \le f(x) - f^*.$$

If $f$ is also smooth, an optimal algorithm (ignoring strong convexity) will produce, after $t$ iterations, a point $x$ satisfying
$$f(x) - f^* \le \frac{cL}{t^2} \, d(x_0, X^*)^2.$$

Restarting the algorithm, we thus get
$$f(x_{k+1}) - f^* \le \frac{cL}{\mu \, t_k^2} \big(f(x_k) - f^*\big), \quad k = 1, \ldots, N,$$
at each outer iteration, after $t_k$ inner iterations.

Restart yields linear convergence, without explicitly modifying the algorithm.
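To make the linear rate explicit, here is the arithmetic for a constant restart schedule (a sketch filling in one step, with $c$ as above and $\kappa = L/\mu$):
$$t_k \equiv t^\star = \Big\lceil \sqrt{2c\kappa} \Big\rceil \quad \Longrightarrow \quad f(x_{k+1}) - f^* \le \frac{cL}{\mu (t^\star)^2} \big(f(x_k) - f^*\big) \le \tfrac{1}{2} \big(f(x_k) - f^*\big),$$
so after $R$ restarts, i.e. $N = R\, t^\star$ total inner iterations,
$$f(x_R) - f^* \le 2^{-R} \big(f(x_0) - f^*\big) = \exp\!\big(-(\log 2)\, N / t^\star\big)\big(f(x_0) - f^*\big),$$
a geometric (linear) rate in the total iteration count $N$.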

Sharpness

Smoothness is classical [Nesterov, 1983, 2005]; sharpness less so...
$$\mu \, d(x, X^*)^r \le f(x) - f^*, \quad \text{for every } x \in K.$$

Real analytic functions all satisfy this locally, a result known as Łojasiewicz's inequality [Lojasiewicz, 1963]. It generalizes to a much wider class of non-smooth functions [Lojasiewicz, 1993, Bolte et al., 2007]. Conditions of this form are also known as sharp minimum, error bound, etc. [Polyak, 1979, Burke and Ferris, 1993, Burke and Deng, 2002].

[Figure: the functions $|x|$, $x^2$ and $\exp(-1/x^2)$.]
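As a quick check of the exponents these examples illustrate (a standard computation, not from the slides):
$$|x| = |x - 0|^1 \ \ (r = 1,\ \mu = 1), \qquad x^2 = |x - 0|^2 \ \ (r = 2,\ \mu = 1),$$
while $\exp(-1/x^2)$ vanishes faster than any power of $|x|$ at its minimizer $x = 0$, so no pair $(r, \mu)$ with $\mu > 0$ satisfies the sharpness bound on a neighborhood of $0$.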

Sharpness & Smoothness

The gradient $\nabla f$ being Hölder continuous ensures
$$f(x) - f^* \le \frac{L}{s} \, d(x, X^*)^s,$$
an upper bound on suboptimality. If in addition $f$ is sharp on a set $K$ with parameters $(r, \mu)$, we have
$$\frac{s\mu}{rL} \le d(x, X^*)^{s - r} \quad \text{on } K, \quad \text{hence } s \le r.$$

In the following, we write
$$\kappa \triangleq L^{2/s} / \mu^{2/r} \quad \text{and} \quad \tau \triangleq 1 - \frac{s}{r}.$$

If $r = s = 2$, $\kappa$ matches the classical condition number of the function.
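Two quick instances of these definitions (a worked check, not from the slides):
$$r = s = 2: \quad \kappa = \frac{L}{\mu}, \ \ \tau = 0 \qquad \text{(smooth and strongly convex: the classical condition number)},$$
$$s = 2,\ r > 2: \quad \kappa = \frac{L}{\mu^{2/r}}, \ \ \tau = 1 - \frac{2}{r} \in (0, 1) \qquad \text{(weaker sharpness: the } \tau > 0 \text{ regime in the restart schedules below)}.$$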

Sharpness & Complexity

- Restart schemes were studied for strongly or uniformly convex functions [Nemirovskii and Nesterov, 1985, Nesterov, 2007, Iouditski and Nesterov, 2014, Lin and Xiao, 2014]. In particular, Nemirovskii and Nesterov [1985] link sharpness with (optimal) faster convergence rates using restart schemes.
- Weaker versions of this strict minimum condition were used more recently in restart schemes by [Renegar, 2014, Freund and Lu, 2015].
- Several heuristics [O'Donoghue and Candes, 2015, Su et al., 2014, Giselsson and Boyd, 2014] studied adaptive restart schemes to speed up convergence.
- The robustness of restart schemes was also studied by Fercoq and Qu [2016] in the strongly convex case.
- Sharpness was used to prove linear convergence for matrix games by Gilpin et al. [2012].

Restart schemes

Algorithm 1: Scheduled restarts for smooth convex minimization (RESTART)
    Inputs: $x_0 \in \mathbb{R}^n$ and a sequence $t_k$ for $k = 1, \ldots, R$.
    for $k = 1, \ldots, R$ do
        $x_k := \mathcal{A}(x_{k-1}, t_k)$
    end for
    Output: $\hat{x} := x_R$

Here, the number of inner iterations $t_k$ satisfies
$$t_k = C e^{\alpha k}, \quad k = 1, \ldots, R,$$
for some $C > 0$ and $\alpha \ge 0$, and will ensure
$$f(x_k) - f^* \le \nu e^{-\gamma k}.$$
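A minimal sketch of this scheduled restart loop in Python, assuming a generic smooth convex objective given through its gradient; the inner solver is a plain Nesterov-type accelerated gradient method standing in for the optimal method $\mathcal{A}$ on the slide (names, step-size choice and the toy problem are illustrative, not from the talk):

```python
import numpy as np

def accelerated_gradient(x0, grad_f, step, t):
    """Run t iterations of a Nesterov-type accelerated gradient method from x0."""
    x, y = x0.copy(), x0.copy()
    for i in range(t):
        x_next = y - step * grad_f(y)              # gradient step from the extrapolated point
        y = x_next + i / (i + 3) * (x_next - x)    # momentum / extrapolation
        x = x_next
    return x

def scheduled_restart(x0, grad_f, step, C, alpha, R):
    """Algorithm 1 (RESTART): restart the inner method with schedule t_k = C * exp(alpha * k)."""
    x = x0
    for k in range(1, R + 1):
        t_k = int(np.ceil(C * np.exp(alpha * k)))          # geometrically growing inner budget
        x = accelerated_gradient(x, grad_f, step, t_k)     # warm start from the previous output
    return x

# Toy usage: a strongly convex quadratic, so tau = 0 and a constant schedule (alpha = 0) suffices.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((50, 20))
    H = M.T @ M + 0.1 * np.eye(20)                 # Hessian of 0.5 x'Hx - b'x, L = lambda_max(H)
    b = rng.standard_normal(20)
    grad_f = lambda x: H @ x - b
    L = np.linalg.eigvalsh(H)[-1]
    x_hat = scheduled_restart(np.zeros(20), grad_f, 1.0 / L, C=50, alpha=0.0, R=10)
    print(np.linalg.norm(grad_f(x_hat)))           # should be close to 0
```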

Restart schemes

Proposition [Roulet and A., 2017] (Restart). Let $f$ be a smooth convex function with parameters $(2, L)$, sharp with parameters $(r, \mu)$ on a set $K$. Restart with iteration schedule $t_k = C_{\kappa,\tau} e^{\tau k}$, for $k = 1, \ldots, R$, where
$$C_{\kappa,\tau} \triangleq e^{1 - \tau} (c\kappa)^{\frac{1}{2}} \big(f(x_0) - f^*\big)^{-\frac{\tau}{2}},$$
with $c = 4e^{2/e}$ here. The precision reached at the last point $\hat{x}$ is given by
$$f(\hat{x}) - f^* \le \exp\!\left(-2e^{-1}(c\kappa)^{-\frac{1}{2}} N\right)\big(f(x_0) - f^*\big) = O\!\left(\exp(-\kappa^{-\frac{1}{2}} N)\right), \quad \text{when } \tau = 0,$$
while
$$f(\hat{x}) - f^* \le \frac{f(x_0) - f^*}{\left(\tau e^{-1} \big(f(x_0) - f^*\big)^{\frac{\tau}{2}} (c\kappa)^{-\frac{1}{2}} N + 1\right)^{\frac{2}{\tau}}} = O\!\left(N^{-\frac{2}{\tau}}\right), \quad \text{when } \tau > 0,$$
where $N = \sum_{k=1}^{R} t_k$ is the total number of iterations.

Adaptation

The sharpness constant $\mu$ and exponent $r$ in
$$\mu \, d(x, X^*)^r \le f(x) - f^*, \quad \text{for every } x \in K,$$
are of course never observed.

Can we make restart schemes adaptive? Otherwise, sharpness is useless...

Solves the robustness problem for accelerated methods on strongly convex functions.

Adaptation

Proposition [Roulet and A., 2017] (Adaptation). Assume $N \ge 2 C_{\kappa,\tau}$, and that $C_{\kappa,\tau} > 1$ if $\tau > 0$.

If $\tau = 0$, there exists $i \in [1, \ldots, \log_2 N]$ such that scheme $\mathcal{S}_{i,0}$ achieves a precision given by
$$f(\hat{x}) - f^* \le \exp\!\left(-e^{-1}(c\kappa)^{-\frac{1}{2}} N\right)\big(f(x_0) - f^*\big).$$

If $\tau > 0$, there exist $i \in [1, \ldots, \log_2 N]$ and $j \in [1, \ldots, \log_2 N]$ such that scheme $\mathcal{S}_{i,j}$ achieves a precision given by
$$f(\hat{x}) - f^* \le \frac{f(x_0) - f^*}{\left(\tau e^{-1} (c\kappa)^{-\frac{1}{2}} \big(f(x_0) - f^*\big)^{\frac{\tau}{2}} (N - 1)/4 + 1\right)^{\frac{2}{\tau}}}.$$

Overall, running the logarithmic grid search has a complexity $(\log_2 N)^2$ times higher than running $N$ iterations using the optimal (oracle) scheme.
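A minimal sketch of such a logarithmic grid search, assuming (as in the companion paper) that scheme $\mathcal{S}_{i,j}$ denotes the scheduled restart run with grid parameters $C_i = 2^i$ and $\tau_j = 2^{-j}$; `inner_solver` stands for the base accelerated method and `f` for the objective, and the budget handling is illustrative:

```python
import numpy as np

def adaptive_restart(x0, f, inner_solver, N):
    """Log-scale grid search over candidate schedules S_{i,j}: C_i = 2**i, tau_j = 2**(-j).

    Each candidate scheme gets roughly the same total budget of N inner iterations and the
    point with the best objective value is kept, so the overall cost is about (log2 N)**2
    times that of the single best (oracle) schedule. inner_solver(x, t) must run t
    iterations of the base method from x (e.g. the accelerated-gradient sketch above)."""
    best_x, best_val = x0, f(x0)
    grid = int(np.ceil(np.log2(N)))
    for i in range(1, grid + 1):
        C_i = 2.0 ** i
        for j in range(0, grid + 1):                 # j = 0 stands for tau = 0 (constant schedule)
            tau_j = 0.0 if j == 0 else 2.0 ** (-j)
            x, spent, k = x0, 0, 0
            while spent < N:                         # run restarts until the budget is used up
                k += 1
                t_k = min(int(np.ceil(C_i * np.exp(tau_j * k))), N - spent)
                x = inner_solver(x, t_k)
                spent += t_k
            val = f(x)
            if val < best_val:
                best_x, best_val = x, val
    return best_x
```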

Hölder smooth case

The generic Hölder smooth case $s \ne 2$ is harder. When $f$ is smooth with parameters $(s, L)$ and $s \ne 2$, the restart scheme is more complex.

The universal fast gradient method in [Nesterov, 2015] outputs, after $t$ iterations, a point $x \triangleq \mathcal{U}(x_0, \epsilon, t)$ such that
$$f(x) - f^* \le \frac{\epsilon}{2} + \frac{c L^{\frac{2}{s}} d(x_0, X^*)^2}{\epsilon^{\frac{2}{s}} \, t^{\frac{2\rho}{s}}} \, \frac{\epsilon}{2},$$
where $c$ is a constant ($c = 8$) and $\rho \triangleq 3s/2 - 1$ is the optimal rate of convergence for $s$-smooth functions.

Contrary to the case $s = 2$ above, we need to schedule both the target accuracy $\epsilon_k$ used by the algorithm and the number of iterations $t_k$. We lose adaptivity when $s \ne 2$.

Outline

- Sharpness
- Optimal restart schemes, adaptation
- Compressed Sensing Performance
- Numerical results

Compressed Sensing

Sparse Recovery. Given $A \in \mathbb{R}^{n \times p}$ and observations $b = Ax^*$ on a signal $x^* \in \mathbb{R}^p$, solve
$$\begin{array}{ll} \text{minimize} & \|x\|_1 \\ \text{subject to} & Ax = b \end{array} \qquad (\ell_1 \text{ recovery})$$
in the variable $x \in \mathbb{R}^p$.

Definition [Cohen et al., 2009] (Nullspace Property). The matrix $A$ satisfies the Null Space Property (NSP) on support $S \subseteq [1, p]$ with constant $\alpha \ge 1$ if, for any $z \in \mathrm{Null}(A) \setminus \{0\}$,
$$\alpha \|z_S\|_1 < \|z_{S^c}\|_1. \qquad (\text{NSP})$$
The matrix $A$ satisfies the Null Space Property at order $s$ with constant $\alpha \ge 1$ if it satisfies it on every support $S$ of cardinality at most $s$.
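For reference, the ($\ell_1$ recovery) problem itself can be solved as a linear program by splitting $x$ into nonnegative parts; a minimal sketch with scipy (a generic baseline, not the restarted NESTA scheme discussed later; dimensions are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def l1_recovery(A, b):
    """Solve min ||x||_1 s.t. Ax = b via the standard LP reformulation x = u - v, u, v >= 0."""
    n, p = A.shape
    c = np.ones(2 * p)                      # objective: sum(u) + sum(v) equals ||x||_1 at optimum
    A_eq = np.hstack([A, -A])               # A(u - v) = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
    u, v = res.x[:p], res.x[p:]
    return u - v

# Toy usage: recover a sparse vector from Gaussian measurements.
rng = np.random.default_rng(0)
n, p, s = 50, 100, 5
A = rng.standard_normal((n, p))
x_true = np.zeros(p)
x_true[rng.choice(p, size=s, replace=False)] = 1.0
x_hat = l1_recovery(A, A @ x_true)
print(np.linalg.norm(x_hat - x_true))       # should be ~0 when n is large enough vs s*log(p)
```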

NSP & Sharpness

Sharpness. Sharpness for $\ell_1$ recovery of a sparse signal $x^*$ means
$$\|x\|_1 - \|x^*\|_1 > \gamma \|x - x^*\|_1 \qquad (\text{Sharp})$$
for any $x \ne x^*$ such that $Ax = b$, and some $0 \le \gamma < 1$.

Proposition [Roulet, Boumal, and A., 2017] (NSP & Sharpness). Given a coding matrix $A \in \mathbb{R}^{n \times p}$ satisfying (NSP) at order $s$ with constant $\alpha \ge 1$, if the original signal $x^*$ is $s$-sparse, then for any $x \in \mathbb{R}^p$ satisfying $Ax = b$, $x \ne x^*$, we have
$$\|x\|_1 - \|x^*\|_1 > \frac{\alpha - 1}{\alpha + 1} \|x - x^*\|_1.$$
This implies signal recovery, i.e. optimality of $x^*$ for ($\ell_1$ recovery), and the sharpness bound (Sharp) with $\gamma = (\alpha - 1)/(\alpha + 1)$.
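The argument behind this proposition is short and worth recording (a sketch reconstructed from the NSP definition; the steps follow the standard computation):
$$\text{Let } z = x - x^* \in \mathrm{Null}(A) \setminus \{0\} \text{ and } S = \mathrm{supp}(x^*). \text{ Since } x^*_{S^c} = 0,$$
$$\|x\|_1 - \|x^*\|_1 = \|x^*_S + z_S\|_1 + \|z_{S^c}\|_1 - \|x^*_S\|_1 \ge \|z_{S^c}\|_1 - \|z_S\|_1,$$
and the claimed bound $\|z_{S^c}\|_1 - \|z_S\|_1 > \frac{\alpha - 1}{\alpha + 1}\big(\|z_S\|_1 + \|z_{S^c}\|_1\big)$ rearranges exactly to $\|z_{S^c}\|_1 > \alpha \|z_S\|_1$, which is (NSP).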

Computational Complexity

Restart scheme:

    (Restart)
    Input: $y_0 \in \mathbb{R}^p$, gap $\epsilon_0 \ge \|y_0\|_1 - \|\hat{x}\|_1$, decreasing factor $\rho$, restart clock $t$.
    For $k = 1, \ldots, K$, compute
        $\epsilon_k = \rho\, \epsilon_{k-1}$, $\quad y_k = \mathcal{A}(y_{k-1}, \epsilon_k, t)$   (NESTA)
    Output: a point $\hat{y} = y_K$ approximately solving ($\ell_1$ recovery).

Restart NESTA [Becker et al., 2011a] with geometrically increasing precision targets, i.e. geometrically decreasing gaps $\epsilon_k$.
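A minimal sketch of this outer loop, where `nesta_step(y, eps, t)` is a placeholder for $t$ iterations of an inner smoothed solver such as NESTA run with target gap `eps` (the name and interface are assumptions for illustration, not the actual NESTA API):

```python
def restart_scheme(y0, gap0, rho, t, K, nesta_step):
    """(Restart): geometrically shrink the target gap and restart the inner solver.

    nesta_step(y, eps, t) should run t iterations of the inner method (e.g. NESTA)
    from y with accuracy / smoothing target eps and return the new iterate."""
    y, eps = y0, gap0
    for k in range(1, K + 1):
        eps = rho * eps            # eps_k = rho * eps_{k-1}, with 0 < rho < 1
        y = nesta_step(y, eps, t)  # warm start from the previous outer iterate
    return y
```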

Computational Complexity

Proposition [Roulet, Boumal, and A., 2017] (Complexity). Given a Gaussian design matrix $A \in \mathbb{R}^{n \times p}$ and a signal $x^*$ with sparsity $s < n/(c^2 \log p)$, the optimal (Restart) scheme outputs a point $\hat{y}$ such that
$$\|\hat{y}\|_1 - \|x^*\|_1 \le \exp\!\left(-e^{-1}\left(1 - c\left(\frac{s \log p}{n}\right)^{\frac{1}{2}}\right) N\right) \epsilon_0,$$
with high probability, where $c > 0$ is a universal constant and $N$ is the total number of iterations.

The iteration complexity of solving the ($\ell_1$ recovery) problem is controlled by the oversampling ratio $n/s$.

This directly generalizes to other decomposable norms.

A similar result holds involving Renegar's condition number and cone-restricted eigenvalues.

Outline

- Sharpness
- Optimal restart schemes, adaptation
- Compressed Sensing Performance
- Numerical results

Numerical results

Sample $\ell_1$-recovery problems: solve
$$\begin{array}{ll} \text{minimize} & \|x\|_1 \\ \text{subject to} & Ax = b \end{array}$$
using NESTA and the restart scheme.

- Generate a random design matrix $A \in \mathbb{R}^{n \times p}$ with i.i.d. Gaussian coefficients.
- Normalize $A$ so that $AA^T = I$ (to fit NESTA's format).
- Generate observations $b = Ax^*$ where $x^* \in \mathbb{R}^p$ is an $s$-sparse vector whose nonzero coefficients are all ones.
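A sketch of this experimental setup in Python; the QR step is one way to enforce $AA^T = I$ (it orthonormalizes the rows spanned by a Gaussian matrix rather than merely rescaling it, which is an assumption about how the normalization is done), and the dimensions are illustrative:

```python
import numpy as np

def make_instance(n, p, s, seed=0):
    """Random l1-recovery instance: A with orthonormal rows from a Gaussian matrix,
    s-sparse signal with unit nonzero entries, observations b = A x_star."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((p, n))
    Q, _ = np.linalg.qr(G)          # Q is p x n with orthonormal columns, so A = Q.T has AA^T = I
    A = Q.T
    x_star = np.zeros(p)
    x_star[rng.choice(p, size=s, replace=False)] = 1.0   # s-sparse, nonzeros all ones
    b = A @ x_star
    return A, b, x_star

A, b, x_star = make_instance(n=200, p=1000, s=20)
print(np.allclose(A @ A.T, np.eye(200)))   # True: rows of A are orthonormal
```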

Numerical results 10 0 f(xt) f 10-5 10-10 Plain NESTA low Plain NESTA high Restart Fair restart 0 100 200 300 400 500 Inner iterations Best restarted NESTA (solid red line), overall cost of the adaptive restart scheme (dashed red line) versus plain NESTA implementation with low accuracy ɛ = 10 1 (dotted black line), and higher accuracy ɛ = 10 3 (dash-dotted black line). Total budget of 500 iterations. Alex d Aspremont Newton Institute, January 2018. 23/27

Numerical results 10 0 10 0 f(xt) f 10-2 10-4 10-6 NESTA Continuation Restart Fair restart 10-8 0 50 100 150 200 Inner iterations f(xt) f 10-2 10-4 10-6 NESTA Continuation Restart Fair restart 10-8 0 50 100 150 200 Inner iterations Best restarted NESTA (solid red line) and overall cost of the practical restart schemes (dashed red line) versus NESTA with 5 continuation/restart steps (dotted blue line) for a total budget of 500 iterations. Crosses at restart occurrences. Left: n = 200. Right : n = 300. Alex d Aspremont Newton Institute, January 2018. 24/27

Numerical results 10 0 10 0 f(xk) f(x ) τ 10-5 10-10 τ = 5 τ = 10 τ = 20 0 100 200 300 400 500 Inner iterations f(xk) f(x ) τ 10-5 10-10 τ = 5 τ = 10 τ = 20 0 100 200 300 400 500 Inner iterations Best restart scheme found by grid search for increasing values of the oversampling ratio τ = n/s. Left: Constant sparsity s = 20. Right: constant number of samples n = 200. Alex d Aspremont Newton Institute, January 2018. 25/27

Numerical results

    Number of samples n                              100                   200                   400
    Time (s) for $f(x_t) - f^* < 10^{-2}$            $5.07 \cdot 10^{-2}$  $3.07 \cdot 10^{-2}$  $1.66 \cdot 10^{-2}$

Time to achieve $\epsilon = 10^{-2}$ by the best restart scheme for an increasing number of samples n. More data, less work (ignoring the cost of adaptation).

Conclusion

- Sharpness holds generically. Restarting then accelerates convergence, and the cost of adaptation is marginal.
- Shows that better conditioned recovery problems are faster to solve.

Open problems:
- Adaptation in the generic Hölder gradient case.
- Optimal algorithms for sharp problems without restart.
- Local adaptation to sharpness.

References

Stephen Becker, Jérôme Bobin, and Emmanuel J. Candès. NESTA: a fast and accurate first-order method for sparse recovery. SIAM Journal on Imaging Sciences, 4(1):1-39, 2011a.
Stephen R. Becker, Emmanuel J. Candès, and Michael C. Grant. Templates for convex cone problems with applications to sparse signal recovery. Mathematical Programming Computation, 3(3):165-218, 2011b.
Jérôme Bolte, Aris Daniilidis, and Adrian Lewis. The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM Journal on Optimization, 17(4):1205-1223, 2007.
James Burke and Sien Deng. Weak sharp minima revisited, part I: basic theory. Control and Cybernetics, 31:439-469, 2002.
J. V. Burke and Michael C. Ferris. Weak sharp minima in mathematical programming. SIAM Journal on Control and Optimization, 31(5):1340-1359, 1993.
A. Cohen, W. Dahmen, and R. DeVore. Compressed sensing and best k-term approximation. Journal of the AMS, 22(1):211-231, 2009.
David L. Donoho and Yaakov Tsaig. Fast solution of l1-norm minimization problems when the solution may be sparse. IEEE Transactions on Information Theory, 54(11):4789-4812, 2008.
Olivier Fercoq and Zheng Qu. Restarting accelerated gradient methods with a rough strong convexity estimate. arXiv preprint arXiv:1609.07358, 2016.
Robert M. Freund and Haihao Lu. New computational guarantees for solving convex optimization problems with first order methods, via a function growth condition measure. arXiv preprint arXiv:1511.02974, 2015.
Andrew Gilpin, Javier Peña, and Tuomas Sandholm. First-order algorithm with O(log 1/ε) convergence for ε-equilibrium in two-person zero-sum games. Mathematical Programming, 133(1-2):279-298, 2012.
Pontus Giselsson and Stephen Boyd. Monotonicity and restart in fast gradient methods. In 53rd IEEE Conference on Decision and Control, pages 5058-5063. IEEE, 2014.
Anatoli Iouditski and Yuri Nesterov. Primal-dual subgradient methods for minimizing uniformly convex functions. arXiv preprint arXiv:1401.1792, 2014.
Qihang Lin and Lin Xiao. An adaptive accelerated proximal gradient method and its homotopy continuation for sparse optimization. In ICML, pages 73-81, 2014.
Stanislas Lojasiewicz. Sur la géométrie semi- et sous-analytique. Annales de l'Institut Fourier, 43(5):1575-1595, 1993.
Stanislaw Lojasiewicz. Une propriété topologique des sous-ensembles analytiques réels. Les équations aux dérivées partielles, pages 87-89, 1963.

A. S. Nemirovskii and Yu. E. Nesterov. Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics, 25(2):21-30, 1985.
Y. Nesterov. A method of solving a convex programming problem with convergence rate O(1/k^2). Soviet Mathematics Doklady, 27(2):372-376, 1983.
Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127-152, 2005.
Y. Nesterov. Gradient methods for minimizing composite objective function. CORE DP2007/96, 2007.
Yu. Nesterov. Universal gradient methods for convex optimization problems. Mathematical Programming, 152(1-2):381-404, 2015.
Brendan O'Donoghue and Emmanuel Candès. Adaptive restart for accelerated gradient schemes. Foundations of Computational Mathematics, 15(3):715-732, 2015.
B. T. Polyak. Sharp minima. Institute of Control Sciences lecture notes, Moscow, USSR, 1979. Presented at the IIASA workshop on generalized Lagrangians and their applications, IIASA, Laxenburg, Austria, 1979.
James Renegar. Efficient first-order methods for linear programming and semidefinite programming. arXiv preprint arXiv:1409.5832, 2014.
Vincent Roulet and Alexandre d'Aspremont. Sharpness, restart and acceleration. arXiv preprint arXiv:1702.03828, 2017.
Vincent Roulet, Nicolas Boumal, and Alexandre d'Aspremont. Computational complexity versus statistical performance on sparse recovery problems. arXiv preprint arXiv:1506.03295, 2017.
Weijie Su, Stephen Boyd, and Emmanuel Candès. A differential equation for modeling Nesterov's accelerated gradient method: theory and insights. In Advances in Neural Information Processing Systems, pages 2510-2518, 2014.