Primal-dual coordinate descent A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions
|
|
- April Hudson
- 5 years ago
- Views:
Transcription
1 Primal-dual coordinate descent A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions Olivier Fercoq and Pascal Bianchi
2 Problem Minimize the convex function min f (x) + g(x) + h(mx) x Rn f, g, h convex f is differentiable prox τ,g and prox σ,h are easy to compute prox τ,g (y) = arg min g(x) + 1 n x R n 2 i=1 M : R n R m is linear 1 τ i (x (i) y (i) ) 2 } {{ } 1 2 y 2 τ 1 Equivalent to saddle point problem if 0 ri(mdom g dom h) min x R n max y R m f (x) + g(x) h (y) + Mx, y 2 25 March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
3 Examples: Lasso A is a dictionary b is a vector we want to explain Solution x: a sparse representation of b 1 min x R n 2 Ax b λ x 1 f (x) = 1 2 Ax b 2 2 g(x) = λ x 1 h(x) = March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
4 Examples: L 1 + TV regularized regression Each line of A: a 3D image of brains b j : the result of experiment j x: which regions explain best the result of the experiments? 1 min x R n 2 Ax b α 1 x 1 + α 2 x 2,1 f (x) = 1 2 Ax b 2 2 g(x) = α 1 x 1 = discrete gradient h(y) = α 2 y 2,1 = α 2 n i=1 y 2 i,1 + y 2 i,2 + y 2 i, March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
5 Examples: dual of Support Vector Machines ( 1 m n ) 2 min b x R n 2λn 2 i A ji x (i) 1 n x (i) n j=1 st : x [0, 1] n b T x = 0 i=1 1.5 i=1 f (x) = ( n i=1 b i A ji x (i) ) 2 e x 2λn 2 n g(x) = I [0,1] n(x) h(y) = I b (y) Mx = x March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
6 Part 1: Classical coordinate descent min f (x) + g(x) x Rn h = 0 g is separable f has a coordinate-wise Lipschitz gradient: x R n, i {1,..., n}, t R, i f (x + te i ) i f (x) β i t 6 25 March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
7 At iteration k: Coordinate descent 1. Choose randomly a coordinate i k+1 2. x k+1 = prox τ,g ( xk τ f (x k ) ) 3. Update: x k+1 = { x i k+1 if i = i k+1 x i k if i i k+1 Remarks As g is separable, one only needs to compute x i k+1 k+1 Convergence if τ i < 2 β i for all i [Richtárik, Takáč, 2011] 7 25 March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
8 Part 2: Stochastic primal-dual coordinate descent min f (x) + g(x) + h(mx) x Rn f is L( f )-Lipschitz g and h need not be separable M R m n 8 25 March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
9 Vũ-Condat s algorithm [Vũ + Condat, 2013] ( ) y k+1 = prox σh yk + σmx k ( x k+1 = prox τg xk τm (2y k+1 y k ) τ f (x k ) ) Generalizes Chambolle-Pock, ADMM and proximal gradient Converges to a saddle point of the Lagrangian 1 Convergence as soon as τ < L( f ) + σ M 2 2 (fixed point of a firmly nonexpansive operator) 9 25 March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
10 Coordinate descent for firmly nonexpansive operators [Bianchi, Hachem & Iutzeler + Combettes & Pesquet, 2014] At iteration k: 1. Choose randomly a (block of) coordinate i k+1 2. z k+1 = T (z k ) 3. Update: z k+1 = { z i k+1 if i = i k+1 z i k if i i k+1 Convergence: if T = αr + (1 α)i, where α (0, 1) and R is nonexpansive in a separable norm March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
11 Primal-dual coordinate descent [Bianchi, Hachem & Iutzeler, 2014] We take Vũ-Condat s fixed point operator ȳ {}}{ ( ) ( ( ) ) y prox σh y + σmx T = ( x prox τg x τm (2ȳ y) τ f (x) ) Assume that M is block diagonal and define blocks of primal-dual variables z = (y, x) accordingly { T i (z k ) if i = i k+1 The algorithm is: z k+1 = zk i if i i k+1 1 Convergence as soon as τ < L( f ) + σ M March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
12 Duplication trick M 1,1 0 0 M 1,1 M 1,2 0 0 M 1,2 0 M = 0 M 2,2 0 K = 0 M 2,2 0 M 3,1 M 3,2 M 3,3 M 3, M 3, M 3, Define S = and D(m) = We have D(m)SK = M. We define h = h (D(m)S) : min x R n f (x) + g(x) + h(kx) = min x R n f (x) + g(x) + h(mx) prox mσ,h (y) = (1 m1 prox (1) σ,h (S(y)),..., 1 mp prox (p) σ,h (S(y))) March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
13 Part 3: Stochastic primal-dual coordinate descent with long steps March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
14 Comparison of coordinate descent algorithms Classical Primal-dual f (x) + n i=1 g i(x i ) f (x) + g(x) + h(kx) g separable g and h non-separable M: additional coupling i f (x + te i ) i f (x) β i t Bases on operator T : Longer steps β = L( f ) Speed in O(1/k) Speed in O(1/k) March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
15 Combine advantages of both approaches min f (x) + g(x) + h(mx) x Rn f has a coordinate-wise Lipschitz gradient: x R n, i {1,..., n}, t R, i f (x + te i ) i f (x) β i t g and h need not be separable M R m n March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
16 New algorithm [Suppose that M is block diagonal (duplication trick available)] At iteration k: 1. y k+1 = prox σ,h (y k + D(σ)Mx k ) x k+1 = prox τ,g (x k D(τ)( f (x k ) + M (2y k+1 y k ))) 2. For i = i k+1 and j : M j,ik+1 0, update: x i k+1 = x i k+1 y j k+1 = y j k+1 3. Otherwise, set x i k+1 = x i k, y j k+1 = y j k March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
17 Example: Iterates need not be feasible f (x) = 1 2 { x 2 0 if x 1 = x 2 g(x) = + if x 1 x 2 h = 0 Start with x 0 = [1, 1] (feasible): dom g x March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
18 Iterates need not be feasible Example: f (x) = 1 2 { x 2 0 if x 1 = x 2 g(x) = + if x 1 x 2 h = 0 Start with x 0 = [1, 1] (feasible): x 1 = [0, 0] Let i 1 = 1 dom g + x1 x March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
19 Example: Iterates need not be feasible f (x) = 1 2 { x 2 0 if x 1 = x 2 g(x) = + if x 1 x 2 h = 0 Start with x 0 = [1, 1] (feasible): x 1 = [0, 0] Let i 1 = 1, we get x 1 = [0, 1] (unfeasible) dom g x x1 x March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
20 Distance to optimum may not change f (x) = 1 2 (x 1 + x 2 1) 2, L( f ) = 2, β i = 1 x x 0 x March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
21 Distance to optimum may not change f (x) = 1 2 (x 1 + x 2 1) 2, L( f ) = 2, β i = 1 x x 0 x 1 E[ x 1 x 2 ] = 1 2 = x 0 x March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
22 Distance to optimum may increase f (x) = 1 2 (x 1 + x 2 + x 3 1) 2, L( f ) = 3, β i = 1 x x 0 x 1 E[ x 1 x 2 ] = 2 3 > x 0 x 2 = March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
23 Distance to optimum may increase a lot f (x) = 1 2 ( n x i 1) 2, L( f ) = n, β i = 1 i=1 x x 0 x 1 E[ x 1 x 2 ] = n n 1 x 0 x 2 = 1 n March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
24 Theorem Convergence If for all i {1,..., n}, P(i k+1 = i) = 1/n and τ i < 1 ( ) β i + ρ j J(i) σ jmj,i M j,i then there exists a saddle point (x, y ) of the Lagrangian L(x, y) = f (x) + g(x) + Ax, y h (y) such that lim (x k, y k ) = (x, y ) k Proof: A stochastic Lyapunov function is S k = f (x k ) f (x ) f (x ), x k x z k z 2 P March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
25 L 1 + TV regularized least squares min x R n 1 2 Ax b α(r x 1 + (1 r) x 2,1 ) A: ,280 dense matrix : 195,840 65,280 sparse matrix (3D gradient) FISTA: alpha=0.001,l1_ratio=0.5,nvoxels= FISTA: alpha=0.1,l1_ratio=0.1,nvoxels= FISTA: alpha=0.01,l1_ratio=0.5,nvoxels= FISTA: alpha=0.1,l1_ratio=0.9,nvoxels= March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
26 Comparison of algorithms on L 1 + TV regularized least squares l1_ratio=0.1 l1_ratio=0.5 l1_ratio= alpha= alpha= Chambolle-Pock FISTA alpha= L-BFGS-B Vu-Condat dupl_prim_dual_cd time (s) time (s) time (s) March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
27 Dual SVM max x R n 1 2λ AD(b)x et x I [0,C] n(x) I b (x) RCV1 dataset: A = 47,236 20,242 matrix, nnz = 0.157% C = 1, λ = 1, 100 passes through the data n n SDCA (Shalev-Shwartz & Zhang) Primal-dual coordinate descent RCD (Necoara & Patrascu) Duality gap CPU time (s) March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
28 Larger SVM problem max x R n 1 2λ AD(b)x et x I [0,C] (x) I b (x) KDD 2009: A = 86,825 50,000 matrix, nnz = 1.79% λ = 1 n, C i {C +, C }, 300 passes SDCA (Shalev-Shwartz & Zhang) Primal-dual coordinate descent 10-1 Duality gap CPU time (s) March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
29 Conclusion Summary Genuine coordinate descent method with non-separable and non-smooth convex function Promising numerical results Open questions Non uniform probabilities Speed of convergence Replace β i by β i /2 Longer steps for non-separable proximal operators: eg. MISO, projection on {x : x 1 =... = x n } March 2016 O. Fercoq and P. Bianchi Primal-dual coordinate descent
Primal-dual coordinate descent
Primal-dual coordinate descent Olivier Fercoq Joint work with P. Bianchi & W. Hachem 15 July 2015 1/28 Minimize the convex function f, g, h convex f is differentiable Problem min f (x) + g(x) + h(mx) x
More informationA Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions
A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions Olivier Fercoq Pascal Bianchi February 5, 2018 arxiv:1508.04625v2 [math.oc] 2 Feb 2018 Abstract This
More informationA Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non Separable Functions
A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non Separable Functions Olivier Fercoq, Pascal Bianchi To cite this version: Olivier Fercoq, Pascal Bianchi. A Coordinate Descent
More informationAccelerated Randomized Primal-Dual Coordinate Method for Empirical Risk Minimization
Accelerated Randomized Primal-Dual Coordinate Method for Empirical Risk Minimization Lin Xiao (Microsoft Research) Joint work with Qihang Lin (CMU), Zhaosong Lu (Simon Fraser) Yuchen Zhang (UC Berkeley)
More informationRandomized Coordinate Descent with Arbitrary Sampling: Algorithms and Complexity
Randomized Coordinate Descent with Arbitrary Sampling: Algorithms and Complexity Zheng Qu University of Hong Kong CAM, 23-26 Aug 2016 Hong Kong based on joint work with Peter Richtarik and Dominique Cisba(University
More informationProximal splitting methods on convex problems with a quadratic term: Relax!
Proximal splitting methods on convex problems with a quadratic term: Relax! The slides I presented with added comments Laurent Condat GIPSA-lab, Univ. Grenoble Alpes, France Workshop BASP Frontiers, Jan.
More informationCoordinate descent methods
Coordinate descent methods Master Mathematics for data science and big data Olivier Fercoq November 3, 05 Contents Exact coordinate descent Coordinate gradient descent 3 3 Proximal coordinate descent 5
More informationBig Data Analytics: Optimization and Randomization
Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.
More informationNesterov s Acceleration
Nesterov s Acceleration Nesterov Accelerated Gradient min X f(x)+ (X) f -smooth. Set s 1 = 1 and = 1. Set y 0. Iterate by increasing t: g t 2 @f(y t ) s t+1 = 1+p 1+4s 2 t 2 y t = x t + s t 1 s t+1 (x
More informationCoordinate Descent and Ascent Methods
Coordinate Descent and Ascent Methods Julie Nutini Machine Learning Reading Group November 3 rd, 2015 1 / 22 Projected-Gradient Methods Motivation Rewrite non-smooth problem as smooth constrained problem:
More informationMinimizing Finite Sums with the Stochastic Average Gradient Algorithm
Minimizing Finite Sums with the Stochastic Average Gradient Algorithm Joint work with Nicolas Le Roux and Francis Bach University of British Columbia Context: Machine Learning for Big Data Large-scale
More informationPrimal-dual algorithms for the sum of two and three functions 1
Primal-dual algorithms for the sum of two and three functions 1 Ming Yan Michigan State University, CMSE/Mathematics 1 This works is partially supported by NSF. optimization problems for primal-dual algorithms
More informationARock: an algorithmic framework for asynchronous parallel coordinate updates
ARock: an algorithmic framework for asynchronous parallel coordinate updates Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin ( UCLA Math, U.Waterloo DCO) UCLA CAM Report 15-37 ShanghaiTech SSDS 15 June 25,
More informationAccelerated primal-dual methods for linearly constrained convex problems
Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize
More informationA first-order primal-dual algorithm with linesearch
A first-order primal-dual algorithm with linesearch Yura Malitsky Thomas Pock arxiv:608.08883v2 [math.oc] 23 Mar 208 Abstract The paper proposes a linesearch for a primal-dual method. Each iteration of
More informationUses of duality. Geoff Gordon & Ryan Tibshirani Optimization /
Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear
More informationMini-Batch Primal and Dual Methods for SVMs
Mini-Batch Primal and Dual Methods for SVMs Peter Richtárik School of Mathematics The University of Edinburgh Coauthors: M. Takáč (Edinburgh), A. Bijral and N. Srebro (both TTI at Chicago) arxiv:1303.2314
More informationFAST DISTRIBUTED COORDINATE DESCENT FOR NON-STRONGLY CONVEX LOSSES. Olivier Fercoq Zheng Qu Peter Richtárik Martin Takáč
FAST DISTRIBUTED COORDINATE DESCENT FOR NON-STRONGLY CONVEX LOSSES Olivier Fercoq Zheng Qu Peter Richtárik Martin Takáč School of Mathematics, University of Edinburgh, Edinburgh, EH9 3JZ, United Kingdom
More informationStochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization
Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization Shai Shalev-Shwartz and Tong Zhang School of CS and Engineering, The Hebrew University of Jerusalem Optimization for Machine
More informationConvex Optimization Lecture 16
Convex Optimization Lecture 16 Today: Projected Gradient Descent Conditional Gradient Descent Stochastic Gradient Descent Random Coordinate Descent Recall: Gradient Descent (Steepest Descent w.r.t Euclidean
More informationProximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization
Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R
More informationThis can be 2 lectures! still need: Examples: non-convex problems applications for matrix factorization
This can be 2 lectures! still need: Examples: non-convex problems applications for matrix factorization x = prox_f(x)+prox_{f^*}(x) use to get prox of norms! PROXIMAL METHODS WHY PROXIMAL METHODS Smooth
More informationSIAM Conference on Imaging Science, Bologna, Italy, Adaptive FISTA. Peter Ochs Saarland University
SIAM Conference on Imaging Science, Bologna, Italy, 2018 Adaptive FISTA Peter Ochs Saarland University 07.06.2018 joint work with Thomas Pock, TU Graz, Austria c 2018 Peter Ochs Adaptive FISTA 1 / 16 Some
More informationPrimal and Dual Variables Decomposition Methods in Convex Optimization
Primal and Dual Variables Decomposition Methods in Convex Optimization Amir Beck Technion - Israel Institute of Technology Haifa, Israel Based on joint works with Edouard Pauwels, Shoham Sabach, Luba Tetruashvili,
More informationLinear Convergence under the Polyak-Łojasiewicz Inequality
Linear Convergence under the Polyak-Łojasiewicz Inequality Hamed Karimi, Julie Nutini and Mark Schmidt The University of British Columbia LCI Forum February 28 th, 2017 1 / 17 Linear Convergence of Gradient-Based
More informationConvex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013
Convex Optimization (EE227A: UC Berkeley) Lecture 15 (Gradient methods III) 12 March, 2013 Suvrit Sra Optimal gradient methods 2 / 27 Optimal gradient methods We saw following efficiency estimates for
More informationLecture 3: Minimizing Large Sums. Peter Richtárik
Lecture 3: Minimizing Large Sums Peter Richtárik Graduate School in Systems, Op@miza@on, Control and Networks Belgium 2015 Mo@va@on: Machine Learning & Empirical Risk Minimiza@on Training Linear Predictors
More informationProximal Newton Method. Ryan Tibshirani Convex Optimization /36-725
Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h
More informationRecent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables
Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong 2014 Workshop
More informationA Tutorial on Primal-Dual Algorithm
A Tutorial on Primal-Dual Algorithm Shenlong Wang University of Toronto March 31, 2016 1 / 34 Energy minimization MAP Inference for MRFs Typical energies consist of a regularization term and a data term.
More informationSplitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches
Splitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches Patrick L. Combettes joint work with J.-C. Pesquet) Laboratoire Jacques-Louis Lions Faculté de Mathématiques
More informationCoordinate Update Algorithm Short Course Operator Splitting
Coordinate Update Algorithm Short Course Operator Splitting Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 25 Operator splitting pipeline 1. Formulate a problem as 0 A(x) + B(x) with monotone operators
More informationOperator Splitting for Parallel and Distributed Optimization
Operator Splitting for Parallel and Distributed Optimization Wotao Yin (UCLA Math) Shanghai Tech, SSDS 15 June 23, 2015 URL: alturl.com/2z7tv 1 / 60 What is splitting? Sun-Tzu: (400 BC) Caesar: divide-n-conquer
More informationMaster 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique
Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some
More informationProximal Minimization by Incremental Surrogate Optimization (MISO)
Proximal Minimization by Incremental Surrogate Optimization (MISO) (and a few variants) Julien Mairal Inria, Grenoble ICCOPT, Tokyo, 2016 Julien Mairal, Inria MISO 1/26 Motivation: large-scale machine
More informationOptimization methods
Optimization methods Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda /8/016 Introduction Aim: Overview of optimization methods that Tend to
More informationA Universal Catalyst for Gradient-Based Optimization
A Universal Catalyst for Gradient-Based Optimization Julien Mairal Inria, Grenoble CIMI workshop, Toulouse, 2015 Julien Mairal, Inria Catalyst 1/58 Collaborators Hongzhou Lin Zaid Harchaoui Publication
More informationA Primal-dual Three-operator Splitting Scheme
Noname manuscript No. (will be inserted by the editor) A Primal-dual Three-operator Splitting Scheme Ming Yan Received: date / Accepted: date Abstract In this paper, we propose a new primal-dual algorithm
More informationAsynchronous Parallel Computing in Signal Processing and Machine Learning
Asynchronous Parallel Computing in Signal Processing and Machine Learning Wotao Yin (UCLA Math) joint with Zhimin Peng (UCLA), Yangyang Xu (IMA), Ming Yan (MSU) Optimization and Parsimonious Modeling IMA,
More informationAdaptive Probabilities in Stochastic Optimization Algorithms
Research Collection Master Thesis Adaptive Probabilities in Stochastic Optimization Algorithms Author(s): Zhong, Lei Publication Date: 2015 Permanent Link: https://doi.org/10.3929/ethz-a-010421465 Rights
More informationStochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure
Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure Alberto Bietti Julien Mairal Inria Grenoble (Thoth) March 21, 2017 Alberto Bietti Stochastic MISO March 21,
More informationA Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization
A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London www.doc.ic.ac.uk/ pp500 p.parpas@imperial.ac.uk jointly with D.V.
More information6. Proximal gradient method
L. Vandenberghe EE236C (Spring 2016) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping
More informationACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING
ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING YANGYANG XU Abstract. Motivated by big data applications, first-order methods have been extremely
More informationRandomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints
Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints By I. Necoara, Y. Nesterov, and F. Glineur Lijun Xu Optimization Group Meeting November 27, 2012 Outline
More informationAccelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems)
Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Donghwan Kim and Jeffrey A. Fessler EECS Department, University of Michigan
More informationLinear Convergence under the Polyak-Łojasiewicz Inequality
Linear Convergence under the Polyak-Łojasiewicz Inequality Hamed Karimi, Julie Nutini, Mark Schmidt University of British Columbia Linear of Convergence of Gradient-Based Methods Fitting most machine learning
More informationMath 273a: Optimization Overview of First-Order Optimization Algorithms
Math 273a: Optimization Overview of First-Order Optimization Algorithms Wotao Yin Department of Mathematics, UCLA online discussions on piazza.com 1 / 9 Typical flow of numerical optimization Optimization
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization
More informationOptimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method
Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method Davood Hajinezhad Iowa State University Davood Hajinezhad Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method 1 / 35 Co-Authors
More informationLecture 9: September 28
0-725/36-725: Convex Optimization Fall 206 Lecturer: Ryan Tibshirani Lecture 9: September 28 Scribes: Yiming Wu, Ye Yuan, Zhihao Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These
More informationShiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers
Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)
More informationFast Stochastic Optimization Algorithms for ML
Fast Stochastic Optimization Algorithms for ML Aaditya Ramdas April 20, 205 This lecture is about efficient algorithms for minimizing finite sums min w R d n i= f i (w) or min w R d n f i (w) + λ 2 w 2
More informationFast proximal gradient methods
L. Vandenberghe EE236C (Spring 2013-14) Fast proximal gradient methods fast proximal gradient method (FISTA) FISTA with line search FISTA as descent method Nesterov s second method 1 Fast (proximal) gradient
More informationConvex Optimization Algorithms for Machine Learning in 10 Slides
Convex Optimization Algorithms for Machine Learning in 10 Slides Presenter: Jul. 15. 2015 Outline 1 Quadratic Problem Linear System 2 Smooth Problem Newton-CG 3 Composite Problem Proximal-Newton-CD 4 Non-smooth,
More informationarxiv: v7 [math.oc] 22 Feb 2018
A SMOOTH PRIMAL-DUAL OPTIMIZATION FRAMEWORK FOR NONSMOOTH COMPOSITE CONVEX MINIMIZATION QUOC TRAN-DINH, OLIVIER FERCOQ, AND VOLKAN CEVHER arxiv:1507.06243v7 [math.oc] 22 Feb 2018 Abstract. We propose a
More informationDual and primal-dual methods
ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method
More informationPrimal and Dual Predicted Decrease Approximation Methods
Primal and Dual Predicted Decrease Approximation Methods Amir Beck Edouard Pauwels Shoham Sabach March 22, 2017 Abstract We introduce the notion of predicted decrease approximation (PDA) for constrained
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 8: Optimization Cho-Jui Hsieh UC Davis May 9, 2017 Optimization Numerical Optimization Numerical Optimization: min X f (X ) Can be applied
More informationAccelerating Stochastic Optimization
Accelerating Stochastic Optimization Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem and Mobileye Master Class at Tel-Aviv, Tel-Aviv University, November 2014 Shalev-Shwartz
More informationParallel Coordinate Optimization
1 / 38 Parallel Coordinate Optimization Julie Nutini MLRG - Spring Term March 6 th, 2018 2 / 38 Contours of a function F : IR 2 IR. Goal: Find the minimizer of F. Coordinate Descent in 2D Contours of a
More informationParallel and Distributed Stochastic Learning -Towards Scalable Learning for Big Data Intelligence
Parallel and Distributed Stochastic Learning -Towards Scalable Learning for Big Data Intelligence oé LAMDA Group H ŒÆOŽÅ Æ EâX ^ #EâI[ : liwujun@nju.edu.cn Dec 10, 2016 Wu-Jun Li (http://cs.nju.edu.cn/lwj)
More informationLarge-scale Stochastic Optimization
Large-scale Stochastic Optimization 11-741/641/441 (Spring 2016) Hanxiao Liu hanxiaol@cs.cmu.edu March 24, 2016 1 / 22 Outline 1. Gradient Descent (GD) 2. Stochastic Gradient Descent (SGD) Formulation
More informationAdaptive Primal Dual Optimization for Image Processing and Learning
Adaptive Primal Dual Optimization for Image Processing and Learning Tom Goldstein Rice University tag7@rice.edu Ernie Esser University of British Columbia eesser@eos.ubc.ca Richard Baraniuk Rice University
More informationWHY DUALITY? Gradient descent Newton s method Quasi-newton Conjugate gradients. No constraints. Non-differentiable ???? Constrained problems? ????
DUALITY WHY DUALITY? No constraints f(x) Non-differentiable f(x) Gradient descent Newton s method Quasi-newton Conjugate gradients etc???? Constrained problems? f(x) subject to g(x) apple 0???? h(x) =0
More informationFrank-Wolfe Method. Ryan Tibshirani Convex Optimization
Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)
More informationarxiv: v4 [math.oc] 29 Jan 2018
Noname manuscript No. (will be inserted by the editor A new primal-dual algorithm for minimizing the sum of three functions with a linear operator Ming Yan arxiv:1611.09805v4 [math.oc] 29 Jan 2018 Received:
More informationIncremental and Stochastic Majorization-Minimization Algorithms for Large-Scale Machine Learning
Incremental and Stochastic Majorization-Minimization Algorithms for Large-Scale Machine Learning Julien Mairal Inria, LEAR Team, Grenoble Journées MAS, Toulouse Julien Mairal Incremental and Stochastic
More informationDistributed Convex Optimization
Master Program 2013-2015 Electrical Engineering Distributed Convex Optimization A Study on the Primal-Dual Method of Multipliers Delft University of Technology He Ming Zhang, Guoqiang Zhang, Richard Heusdens
More informationStochastic and online algorithms
Stochastic and online algorithms stochastic gradient method online optimization and dual averaging method minimizing finite average Stochastic and online optimization 6 1 Stochastic optimization problem
More informationAgenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples
Agenda Fast proximal gradient methods 1 Accelerated first-order methods 2 Auxiliary sequences 3 Convergence analysis 4 Numerical examples 5 Optimality of Nesterov s scheme Last time Proximal gradient method
More informationARock: an Algorithmic Framework for Asynchronous Parallel Coordinate Updates
ARock: an Algorithmic Framework for Asynchronous Parallel Coordinate Updates Zhimin Peng Yangyang Xu Ming Yan Wotao Yin May 3, 216 Abstract Finding a fixed point to a nonexpansive operator, i.e., x = T
More informationICS-E4030 Kernel Methods in Machine Learning
ICS-E4030 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 28. September, 2016 Juho Rousu 28. September, 2016 1 / 38 Convex optimization Convex optimisation This
More informationLecture 1: September 25
0-725: Optimization Fall 202 Lecture : September 25 Lecturer: Geoff Gordon/Ryan Tibshirani Scribes: Subhodeep Moitra Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have
More informationAccelerated Proximal Gradient Methods for Convex Optimization
Accelerated Proximal Gradient Methods for Convex Optimization Paul Tseng Mathematics, University of Washington Seattle MOPTA, University of Guelph August 18, 2008 ACCELERATED PROXIMAL GRADIENT METHODS
More informationOptimization for Machine Learning
Optimization for Machine Learning (Lecture 3-A - Convex) SUVRIT SRA Massachusetts Institute of Technology Special thanks: Francis Bach (INRIA, ENS) (for sharing this material, and permitting its use) MPI-IS
More informationAn Optimal Algorithm for Stochastic Three-Composite Optimization
An Optimal Algorithm for Stochastic Three-Composite Optimization Renbo Zhao 1, William B. Haskell 2, Vincent Y. F. Tan 3 1 ORC, Massachusetts Institute of Technology 2 Dept. ISEM, National University of
More informationConvergence of Fixed-Point Iterations
Convergence of Fixed-Point Iterations Instructor: Wotao Yin (UCLA Math) July 2016 1 / 30 Why study fixed-point iterations? Abstract many existing algorithms in optimization, numerical linear algebra, and
More information1 Sparsity and l 1 relaxation
6.883 Learning with Combinatorial Structure Note for Lecture 2 Author: Chiyuan Zhang Sparsity and l relaxation Last time we talked about sparsity and characterized when an l relaxation could recover the
More informationLecture: Algorithms for Compressed Sensing
1/56 Lecture: Algorithms for Compressed Sensing Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html wenzw@pku.edu.cn Acknowledgement:
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More informationLecture 23: November 21
10-725/36-725: Convex Optimization Fall 2016 Lecturer: Ryan Tibshirani Lecture 23: November 21 Scribes: Yifan Sun, Ananya Kumar, Xin Lu Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:
More informationOWL to the rescue of LASSO
OWL to the rescue of LASSO IISc IBM day 2018 Joint Work R. Sankaran and Francis Bach AISTATS 17 Chiranjib Bhattacharyya Professor, Department of Computer Science and Automation Indian Institute of Science,
More informationChapter 2. Optimization. Gradients, convexity, and ALS
Chapter 2 Optimization Gradients, convexity, and ALS Contents Background Gradient descent Stochastic gradient descent Newton s method Alternating least squares KKT conditions 2 Motivation We can solve
More informationSparse and Regularized Optimization
Sparse and Regularized Optimization In many applications, we seek not an exact minimizer of the underlying objective, but rather an approximate minimizer that satisfies certain desirable properties: sparsity
More informationIN this work we are concerned with the problem of minimizing. Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting
Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting Jakub Konečný, Jie Liu, Peter Richtárik, Martin Takáč arxiv:504.04407v2 [cs.lg] 6 Nov 205 Abstract We propose : a method incorporating
More informationDual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725
Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)
More informationOn the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems
On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems Radu Ioan Boţ Ernö Robert Csetnek August 5, 014 Abstract. In this paper we analyze the
More informationRandom coordinate descent algorithms for. huge-scale optimization problems. Ion Necoara
Random coordinate descent algorithms for huge-scale optimization problems Ion Necoara Automatic Control and Systems Engineering Depart. 1 Acknowledgement Collaboration with Y. Nesterov, F. Glineur ( Univ.
More informationarxiv: v3 [math.oc] 29 Jun 2016
MOCCA: mirrored convex/concave optimization for nonconvex composite functions Rina Foygel Barber and Emil Y. Sidky arxiv:50.0884v3 [math.oc] 9 Jun 06 04.3.6 Abstract Many optimization problems arising
More informationTight Rates and Equivalence Results of Operator Splitting Schemes
Tight Rates and Equivalence Results of Operator Splitting Schemes Wotao Yin (UCLA Math) Workshop on Optimization for Modern Computing Joint w Damek Davis and Ming Yan UCLA CAM 14-51, 14-58, and 14-59 1
More informationConvex Optimization. Newton s method. ENSAE: Optimisation 1/44
Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)
More informationConvex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014
Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,
More informationPrimal and dual predicted decrease approximation methods
Math. Program., Ser. B DOI 10.1007/s10107-017-1108-9 FULL LENGTH PAPER Primal and dual predicted decrease approximation methods Amir Beck 1 Edouard Pauwels 2 Shoham Sabach 1 Received: 11 November 2015
More informationA Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices
A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices Ryota Tomioka 1, Taiji Suzuki 1, Masashi Sugiyama 2, Hisashi Kashima 1 1 The University of Tokyo 2 Tokyo Institute of Technology 2010-06-22
More informationDual Proximal Gradient Method
Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method
More informationA Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming
A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming Zhaosong Lu Lin Xiao March 9, 2015 (Revised: May 13, 2016; December 30, 2016) Abstract We propose
More informationInexact Alternating Direction Method of Multipliers for Separable Convex Optimization
Inexact Alternating Direction Method of Multipliers for Separable Convex Optimization Hongchao Zhang hozhang@math.lsu.edu Department of Mathematics Center for Computation and Technology Louisiana State
More informationDecentralized Consensus Optimization with Asynchrony and Delay
Decentralized Consensus Optimization with Asynchrony and Delay Tianyu Wu, Kun Yuan 2, Qing Ling 3, Wotao Yin, and Ali H. Sayed 2 Department of Mathematics, 2 Department of Electrical Engineering, University
More informationOn Stochastic Primal-Dual Hybrid Gradient Approach for Compositely Regularized Minimization
On Stochastic Primal-Dual Hybrid Gradient Approach for Compositely Regularized Minimization Linbo Qiao, and Tianyi Lin 3 and Yu-Gang Jiang and Fan Yang 5 and Wei Liu 6 and Xicheng Lu, Abstract We consider
More information