References --- a tentative list of papers to be mentioned in the ICML 2017 tutorial "Recent Advances in Stochastic Convex and Non-Convex Optimization"

Disclaimer: the list is in a quite arbitrary order.

1. [ShalevShwartz-Zhang, 2013]: AccSDCA. Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization, by Shai Shalev-Shwartz and Tong Zhang.
2. [Lin-Lu-Xiao, 2014]: APCG. An Accelerated Proximal Coordinate Gradient Method and its Application to Regularized Empirical Risk Minimization, by Qihang Lin, Zhaosong Lu, and Lin Xiao.
3. [BenTal-Nemirovski, 2013]. Lectures on Modern Convex Optimization, by Aharon Ben-Tal and Arkadi Nemirovski.
4. [Nemirovski, 2005]. Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems, by Arkadi Nemirovski.
5. [Chambolle-Pock, 2011]. A first-order primal-dual algorithm for convex problems with applications to imaging, by Antonin Chambolle and Thomas Pock.
6. [LeRoux-Schmidt-Bach, 2012]: SAG. A stochastic gradient method with an exponential convergence rate for finite training sets, by Nicolas Le Roux, Mark Schmidt, and Francis R. Bach.
7. [Johnson-Zhang, 2013]: SVRG. Accelerating stochastic gradient descent using predictive variance reduction, by Rie Johnson and Tong Zhang (a minimal sketch of the SVRG estimator follows this list).
8. [Defazio-Bach-LacosteJulien, 2014]: SAGA. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives, by Aaron Defazio, Francis Bach, and Simon Lacoste-Julien.
9. [ShalevShwartz-Zhang, 2012]: SDCA. Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization, by Shai Shalev-Shwartz and Tong Zhang.
10. [ShalevShwartz-Zhang, 2014]: AccSDCA. Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization, by Shai Shalev-Shwartz and Tong Zhang.
11. [Nesterov, 2012]: RCDM/ACDM. Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems, by Yurii Nesterov.
12. [Lee-Sidford, 2013]: ACDM (better proof). Efficient accelerated coordinate descent methods and faster algorithms for solving linear systems, by Yin Tat Lee and Aaron Sidford.
13. [AllenZhu-Qu-Richtarik-Yuan, 2016]: NUACDM. Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling, by Zeyuan Allen-Zhu, Zheng Qu, Peter Richtarik, and Yang Yuan.
14. [Zhang-Xiao, 2015]: SPDC. Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization, by Yuchen Zhang and Lin Xiao.
15. [Lan-Zhou, 2015]: RPDG. An optimal randomized incremental gradient method, by Guanghui Lan and Yi Zhou.
16. [Frostig-Ge-Kakade-Sidford, 2015]: APPA. Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization, by Roy Frostig, Rong Ge, Sham M. Kakade, and Aaron Sidford.
17. [Lin-Mairal-Harchaoui, 2015]: Catalyst. A universal catalyst for first-order optimization, by Hongzhou Lin, Julien Mairal, and Zaid Harchaoui.
18. [Zhang-Mahdavi-Jin, 2013]. Linear convergence with condition number independent access of full gradients, by Lijun Zhang, Mehrdad Mahdavi, and Rong Jin.
19. [AllenZhu-Yuan, 2015]: SVRG++. Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives, by Zeyuan Allen-Zhu and Yang Yuan.
20. [AllenZhu-Hazan, 2016b]: reduction. Optimal Black-Box Reductions Between Optimization Objectives, by Zeyuan Allen-Zhu and Elad Hazan.
21. [AllenZhu-Hazan, 2016a]: non-convex SVRG. Variance Reduction for Faster Non-Convex Optimization, by Zeyuan Allen-Zhu and Elad Hazan.
22. [Mahdavi-Zhang-Jin, 2013]. Mixed optimization for smooth functions, by Mehrdad Mahdavi, Lijun Zhang, and Rong Jin.
23. [Lei-Jordan, 2017]. Less than a Single Pass: Stochastically Controlled Stochastic Gradient, by Lihua Lei and Michael I. Jordan.
24. [Lei-Ju-Chen-Jordan, 2017]. Nonconvex Finite-Sum Optimization Via SCSG Methods, by Lihua Lei, Cheng Ju, Jianbo Chen, and Michael I. Jordan.
25. [Nesterov, 2005]. Smooth minimization of non-smooth functions, by Yurii Nesterov.
26. [Su-Boyd-Candes, 2014]. A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights, by Weijie Su, Stephen Boyd, and Emmanuel J. Candès.
27. [Bubeck-Lee-Singh, 2015]. A geometric alternative to Nesterov's accelerated gradient descent, by Sébastien Bubeck, Yin Tat Lee, and Mohit Singh.
28. [Wibisono-Wilson-Jordan, 2016]. A Variational Perspective on Accelerated Methods in Optimization, by Andre Wibisono, Ashia C. Wilson, and Michael I. Jordan.
29. [AllenZhu-Orecchia, 2014]. Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent, by Zeyuan Allen-Zhu and Lorenzo Orecchia.
30. [Nitanda, 2014]. Stochastic proximal gradient descent with acceleration techniques, by Atsushi Nitanda.
31. [AllenZhu, 2016]. Katyusha: The First Direct Acceleration of Stochastic Gradient Methods, by Zeyuan Allen-Zhu.
32. [Woodworth-Srebro, 2016]. Tight Complexity Bounds for Optimizing Composite Objectives, by Blake Woodworth and Nathan Srebro.
33. [Frostig-Musco-Musco-Sidford, 2016]: PCR. Principal Component Projection Without Principal Component Analysis, by Roy Frostig, Cameron Musco, Christopher Musco, and Aaron Sidford.
34. [AllenZhu-Li, 2017]: PCR. Faster Principal Component Regression and Stable Matrix Chebyshev Approximation, by Zeyuan Allen-Zhu and Yuanzhi Li.
35. [Shibagaki-Takeuchi, 2017]: mini-batch SPDC. Stochastic primal dual coordinate method with nonuniform sampling based on optimality violations, by Atsushi Shibagaki and Ichiro Takeuchi.
36. [Murata-Suzuki, 2017]: DASVRDA. Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization, by Tomoya Murata and Taiji Suzuki.
37. [Garber et al., 2015]: shift-and-invert (two papers about the same result). Faster Eigenvector Computation via Shift-and-Invert Preconditioning, by Dan Garber, Elad Hazan, Chi Jin, Sham M. Kakade, Cameron Musco, Praneeth Netrapalli, and Aaron Sidford; and [Garber-Hazan, 2015] Fast and Simple PCA via Convex Optimization, by Dan Garber and Elad Hazan.
38. [AllenZhu-Li, 2016]: LazySVD. LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain, by Zeyuan Allen-Zhu and Yuanzhi Li.
39. [Sun-Luo, 2014]. Guaranteed Matrix Completion via Non-convex Factorization, by Ruoyu Sun and Zhi-Quan Luo.
40. [Arora-Ge-Ma-Moitra, 2015]. Simple, Efficient, and Neural Algorithms for Sparse Coding, by Sanjeev Arora, Rong Ge, Tengyu Ma, and Ankur Moitra.
41. [Chen-Candès, 2015]. Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems, by Yuxin Chen and Emmanuel J. Candès.
42. [Nesterov, 2004]: textbook. Introductory Lectures on Convex Programming, Volume I: A Basic Course, by Yurii Nesterov.
43. [Li-Yuan, 2017]. Convergence Analysis of Two-layer Neural Networks with ReLU Activation, by Yuanzhi Li and Yang Yuan.
44. [ShalevShwartz, 2015]. SDCA without Duality, by Shai Shalev-Shwartz.
45. [Reddi et al., 2016a]. Stochastic variance reduction for nonconvex optimization, by Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabás Póczos, and Alex Smola.
46. [Reddi et al., 2016b]. Fast incremental method for nonconvex optimization, by Sashank J. Reddi, Suvrit Sra, Barnabás Póczos, and Alex Smola.
47. [Reddi et al., 2016c]. Fast Stochastic Methods for Nonsmooth Nonconvex Optimization, by Sashank J. Reddi, Suvrit Sra, Barnabás Póczos, and Alex Smola.
48. [Lei-Ju-Chen-Jordan, 2017]. Nonconvex Finite-Sum Optimization Via SCSG Methods, by Lihua Lei, Cheng Ju, Jianbo Chen, and Michael I. Jordan.
49. [Carmon-Duchi-Hinder-Sidford, 2016]. Accelerated Methods for Non-Convex Optimization, by Yair Carmon, John C. Duchi, Oliver Hinder, and Aaron Sidford.
50. [AllenZhu, 2017]. Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter, by Zeyuan Allen-Zhu.
51. [Carmon-Duchi-Hinder-Sidford, 2017]. "Convex Until Proven Guilty": Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions, by Yair Carmon, John C. Duchi, Oliver Hinder, and Aaron Sidford.
52. [Ge-Huang-Jin-Yuan, 2015]. Escaping From Saddle Points --- Online Stochastic Gradient for Tensor Decomposition, by Rong Ge, Furong Huang, Chi Jin, and Yang Yuan.
53. [Jin et al., 2017]. How to Escape Saddle Points Efficiently, by Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, and Michael I. Jordan.
54. [Agarwal et al., 2016]. Finding Approximate Local Minima Faster Than Gradient Descent, by Naman Agarwal, Zeyuan Allen-Zhu, Brian Bullins, Elad Hazan, and Tengyu Ma.
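Several entries above (e.g., items 7, 19, 21, 45) build on the variance-reduction idea of SVRG, referenced in item 7. As a reading aid, the following is a minimal sketch of the SVRG gradient estimator from [Johnson-Zhang, 2013] applied to an ordinary least-squares objective; the synthetic data, step size, and epoch length are illustrative assumptions of this sketch, not values taken from the paper.

```python
import numpy as np

# Illustrative finite-sum least-squares problem:
#   f(x) = (1/2n) * ||A x - b||^2 = (1/n) * sum_i f_i(x)
rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def full_grad(x):
    # Gradient of the full objective f(x).
    return A.T @ (A @ x - b) / n

def grad_i(x, i):
    # Gradient of the i-th component f_i(x) = (1/2) * (a_i^T x - b_i)^2.
    return A[i] * (A[i] @ x - b[i])

x = np.zeros(d)
step = 0.02          # illustrative constant step size
m = 2 * n            # illustrative inner-loop (epoch) length

for epoch in range(20):
    x_snap = x.copy()            # snapshot point
    mu = full_grad(x_snap)       # full gradient computed once per epoch
    for _ in range(m):
        i = rng.integers(n)
        # Variance-reduced estimate: unbiased for full_grad(x), and its
        # variance shrinks as x and x_snap both approach the optimum.
        g = grad_i(x, i) - grad_i(x_snap, i) + mu
        x -= step * g

print("final objective:", 0.5 * np.mean((A @ x - b) ** 2))
```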