Primal-dual coordinate descent
Primal-dual coordinate descent
Olivier Fercoq, joint work with P. Bianchi and W. Hachem. 15 July.
Minimize the convex function

Problem: $\min_{x \in \mathbb{R}^n} f(x) + g(x) + h(Mx)$

- $f$, $g$, $h$ convex, $f$ differentiable
- $(I + \tau \partial g)^{-1}$ and $(I + \sigma \partial h^*)^{-1}$ are easy to compute
- $M : \mathbb{R}^n \to \mathbb{R}^m$ is linear

Equivalent to the saddle-point problem, if $0 \in \mathrm{ri}(M \operatorname{dom} g - \operatorname{dom} h)$:

$\min_{x \in \mathbb{R}^n} \max_{y \in \mathbb{R}^m} f(x) + g(x) - h^*(y) + \langle Mx, y \rangle$
Examples: Lasso

$\min_{x \in \mathbb{R}^n} \|Ax - b\|_2^2 + \lambda \|x\|_1$

$f(x) = \|Ax - b\|_2^2$, $g(x) = \lambda \|x\|_1$, $h = 0$
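As a concrete illustration of this splitting, here is a minimal Python/NumPy sketch of the two ingredients the algorithms below rely on for the Lasso: the gradient of $f$ and the proximal operator of $g$, which is coordinate-wise soft-thresholding. The data A, b and the value of lam are made up for illustration only.

```python
import numpy as np

def grad_f(x, A, b):
    # Gradient of f(x) = ||Ax - b||_2^2
    return 2.0 * A.T @ (A @ x - b)

def prox_g(x, step, lam):
    # Proximal operator of g(x) = lam * ||x||_1 with step size `step`:
    # coordinate-wise soft-thresholding.
    return np.sign(x) * np.maximum(np.abs(x) - step * lam, 0.0)

# Toy data (assumed values, for illustration only)
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
b = rng.standard_normal(20)
lam = 0.1
x = np.zeros(50)
print(prox_g(x - 0.01 * grad_f(x, A, b), 0.01, lam)[:5])
```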
Examples: dual of Support Vector Machines

$\min_{x \in \mathbb{R}^n} \frac{1}{2\lambda n^2} \sum_{j=1}^m \Big( \sum_{i=1}^n b_i A_{ji} x^{(i)} \Big)^2 - \frac{1}{n} \sum_{i=1}^n x^{(i)}$
s.t. $x \in [0,1]^n$, $b^\top x = 0$

$f(x) = \frac{1}{2\lambda n^2} \sum_{j=1}^m \big( \sum_{i=1}^n b_i A_{ji} x^{(i)} \big)^2 - \frac{1}{n} \sum_{i=1}^n x^{(i)}$
$g(x) = I_{[0,1]^n}(x)$, $h(y) = I_{\{y : b^\top y = 0\}}(y)$, $Mx = x$
Examples: $\ell_1$ + TV regularized regression

$\min_{x \in \mathbb{R}^n} \|Ax - b\|_2^2 + \alpha_1 \|x\|_1 + \alpha_2 \|Mx\|_{2,1}$

$f(x) = \|Ax - b\|_2^2$, $g(x) = \alpha_1 \|x\|_1$, $M$ = discrete gradient,
$h(y) = \alpha_2 \|y\|_{2,1} = \alpha_2 \sum_{i=1}^n \sqrt{y_{i,1}^2 + y_{i,2}^2 + y_{i,3}^2}$
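The proximal operator of $h$ is group soft-thresholding, applied independently to each 3-vector $y_i$. A minimal Python/NumPy sketch (the array layout with one row per voxel is an assumption):

```python
import numpy as np

def prox_h(y, step, alpha):
    # Proximal operator of h(y) = alpha * sum_i ||y_i||_2, applied row-wise
    # (group soft-thresholding); y holds one 3-vector per voxel.
    norms = np.linalg.norm(y, axis=1, keepdims=True)
    scale = np.maximum(1.0 - step * alpha / np.maximum(norms, 1e-12), 0.0)
    return scale * y

# Toy usage on assumed data: 5 voxels, 3 gradient components each
y = np.random.default_rng(1).standard_normal((5, 3))
print(prox_h(y, step=0.5, alpha=1.0))
```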
Part 1: Classical coordinate descent

Here $h = 0$ and $g$ is separable: $\min_{x \in \mathbb{R}^n} f(x) + g(x)$

$f$ has a coordinate-wise Lipschitz gradient:
$\forall x \in \mathbb{R}^n, \forall i \in \{1,\dots,n\}, \forall t \in \mathbb{R}: \quad |\nabla_i f(x + t e_i) - \nabla_i f(x)| \le \beta_i |t|$
Coordinate descent

At iteration k:
1. Choose a coordinate $i_{k+1}$ at random
2. $\bar{x}_{k+1} = (I + \tau \partial g)^{-1}\big( x_k - \tau \nabla f(x_k) \big)$
3. Update: $x_{k+1}^{(i)} = \bar{x}_{k+1}^{(i)}$ if $i = i_{k+1}$, and $x_{k+1}^{(i)} = x_k^{(i)}$ if $i \ne i_{k+1}$

Remarks: as $g$ is separable, one only needs to compute the single coordinate $\bar{x}_{k+1}^{(i_{k+1})}$.
Convergence if $\tau_i < 2/\beta_i$ for all $i$ [Richtárik, Takáč, 2011]
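A minimal sketch of this loop for the Lasso example above (Python/NumPy). The conservative step sizes $\tau_i = 1/\beta_i$ with $\beta_i = 2\|A_{:,i}\|^2$, the fixed iteration count and the toy data are assumptions, not the authors' settings; only the selected coordinate is recomputed, and the residual $Ax - b$ is maintained incrementally.

```python
import numpy as np

def coordinate_descent_lasso(A, b, lam, n_iters=5000, seed=0):
    """Randomized proximal coordinate descent for min ||Ax - b||^2 + lam*||x||_1."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    beta = 2.0 * np.sum(A**2, axis=0)      # coordinate-wise Lipschitz constants
    tau = 1.0 / beta                        # conservative step sizes
    x = np.zeros(n)
    residual = A @ x - b                    # kept up to date incrementally
    for _ in range(n_iters):
        i = rng.integers(n)
        grad_i = 2.0 * A[:, i] @ residual   # i-th partial derivative of f
        z = x[i] - tau[i] * grad_i
        new_xi = np.sign(z) * max(abs(z) - tau[i] * lam, 0.0)  # prox of lam*|.|
        residual += A[:, i] * (new_xi - x[i])
        x[i] = new_xi
    return x

# Toy usage with assumed data
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60)); b = rng.standard_normal(30)
print(np.count_nonzero(coordinate_descent_lasso(A, b, lam=1.0)))
```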
Part 2: Stochastic primal-dual coordinate descent

$\min_{x \in \mathbb{R}^n} f(x) + g(x) + h(Mx)$

$\nabla f$ is $L(\nabla f)$-Lipschitz; $g$ and $h$ need not be separable; $M \in \mathbb{R}^{m \times n}$.
Vũ–Condat's algorithm [Vũ, Condat, 2013]

$y_{k+1} = (I + \sigma \partial h^*)^{-1}\big( y_k + \sigma M x_k \big)$
$x_{k+1} = (I + \tau \partial g)^{-1}\big( x_k - \tau M^* (2 y_{k+1} - y_k) - \tau \nabla f(x_k) \big)$

Generalizes Chambolle–Pock, ADMM and the proximal gradient method.
Converges to a saddle point of the Lagrangian.
Proof: fixed-point algorithm for a firmly nonexpansive operator.
Vũ–Condat's algorithm

The fixed-point operator of these iterations is

$T\begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} \bar y \\ \bar x \end{pmatrix} = \begin{pmatrix} (I + \sigma \partial h^*)^{-1}( y + \sigma M x ) \\ (I + \tau \partial g)^{-1}\big( x - \tau M^* (2\bar y - y) - \tau \nabla f(x) \big) \end{pmatrix}$

Convergence as soon as $\frac{1}{\tau} - \sigma \|M\|^2 > \frac{L(\nabla f)}{2}$.
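A schematic sketch of the Vũ–Condat iteration (Python/NumPy), assuming the user supplies $\nabla f$, prox_g, prox_h and a dense M; the prox of $h^*$ is obtained from prox_h through Moreau's identity. This is a plain transcription of the two update formulas, not an optimized implementation.

```python
import numpy as np

def vu_condat(x0, y0, grad_f, prox_g, prox_h, M, tau, sigma, n_iters=100):
    """Vu-Condat iterations for min f(x) + g(x) + h(Mx).

    prox_g(v, t) and prox_h(v, t) are the proximal operators of t*g and t*h;
    step sizes should satisfy 1/tau - sigma*||M||^2 > L(grad f)/2.
    """
    x, y = x0.copy(), y0.copy()
    for _ in range(n_iters):
        # Dual update: y_{k+1} = prox_{sigma h*}(y_k + sigma M x_k),
        # computed from prox_h via Moreau's identity.
        v = y + sigma * (M @ x)
        y_new = v - sigma * prox_h(v / sigma, 1.0 / sigma)
        # Primal update with the extrapolated dual variable 2*y_{k+1} - y_k
        x = prox_g(x - tau * (M.T @ (2 * y_new - y)) - tau * grad_f(x), tau)
        y = y_new
    return x, y
```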
Coordinate descent for firmly nonexpansive operators
[Bianchi, Hachem & Iutzeler; Combettes & Pesquet, 2014]

At iteration k:
1. Choose a (block of) coordinates $i_{k+1}$ at random
2. $\bar z_{k+1} = T(z_k)$
3. Update: $z_{k+1}^{(i)} = \bar z_{k+1}^{(i)}$ if $i = i_{k+1}$, and $z_{k+1}^{(i)} = z_k^{(i)}$ if $i \ne i_{k+1}$

Convergence if $T = \alpha R + (1-\alpha) I$ where $\alpha \in (0,1)$ and $R$ is nonexpansive in a separable norm.
Primal-dual coordinate descent

We take Vũ–Condat's fixed-point operator

$T\begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} \bar y \\ \bar x \end{pmatrix} = \begin{pmatrix} (I + \sigma \partial h^*)^{-1}( y + \sigma M x ) \\ (I + \tau \partial g)^{-1}\big( x - \tau M^* (2\bar y - y) - \tau \nabla f(x) \big) \end{pmatrix}$

Assume that $M$ is block diagonal and define blocks of primal-dual variables $z = (y, x)$ accordingly. The algorithm is

$z_{k+1}^{(i)} = T^{(i)}(z_k)$ if $i = i_{k+1}$, and $z_{k+1}^{(i)} = z_k^{(i)}$ if $i \ne i_{k+1}$

Convergence as soon as $\frac{1}{\tau} - \sigma \|M\|^2 > \frac{L(\nabla f)}{2}$.
Speed of convergence

Theorem. Assume that for all $i \in \{1,\dots,n\}$, $p_i = \mathbb{P}(i_{k+1} = i) > 0$ and that $\tau^{-1} > \sigma \|M\|^2 + \frac{L(\nabla f)}{2}$. Denote $\hat{\bar x}_{k+1} = \frac{1}{k+1} \sum_{l=1}^{k} \bar x_l$ and $\hat{\bar y}_{k+1} = \frac{1}{k+1} \sum_{l=1}^{k} \bar y_l$. Then for all $(y, x) \in \mathbb{R}^m \times \mathbb{R}^n$,

$\mathbb{E}\big[ L(\hat{\bar x}_{k+1}, y) - L(x, \hat{\bar y}_{k+1}) \big] \le \frac{1}{2k} \max\Big( 1, \frac{1 + (\kappa - 1)^{-1}}{1 - 1/(2\kappa)} \Big) \, \|z - z_0\|^2_{p^{-1} P}$

where $\kappa = (\tau^{-1} - \sigma \|M\|^2)/L(\nabla f) > 1$ and $P = \begin{pmatrix} \tau^{-1} I & -M^\top \\ -M & \sigma^{-1} I \end{pmatrix}$.
Duplication trick

$M = \begin{pmatrix} M_{1,1} & M_{1,2} & 0 \\ 0 & M_{2,2} & 0 \\ M_{3,1} & M_{3,2} & M_{3,3} \end{pmatrix}$, $\qquad K = \begin{pmatrix} M_{1,1} & 0 & 0 \\ 0 & M_{1,2} & 0 \\ 0 & M_{2,2} & 0 \\ M_{3,1} & 0 & 0 \\ 0 & M_{3,2} & 0 \\ 0 & 0 & M_{3,3} \end{pmatrix}$

Each row of $K$ keeps a single nonzero block of the corresponding row of $M$, duplicated as many times as that row has nonzero blocks. Define $S$ and $D(m)$ such that $D(m) S K = M$. We define $\tilde h = h \circ (D(m) S)$:

$\mathrm{prox}_{m\sigma, \tilde h}(y) = \big( \tfrac{1}{m_1} \mathrm{prox}^{(1)}_{\sigma, h}(S(y)), \dots, \tfrac{1}{m_p} \mathrm{prox}^{(p)}_{\sigma, h}(S(y)) \big)$
Part 3: Stochastic primal-dual coordinate descent with long steps
Comparison of coordinate descent algorithms

Classical coordinate descent:
- problem $\min f(x) + \sum_{i=1}^n g_i(x_i)$, with $g$ separable
- uses coordinate-wise Lipschitz constants: $|\nabla_i f(x + t e_i) - \nabla_i f(x)| \le \beta_i |t|$
- longer steps
- speed in $O(1/k)$

Primal-dual coordinate descent:
- problem $\min f(x) + g(x) + h(Mx)$, with $g$ and $h$ non-separable; $M$: additional coupling
- based on the operator $T$, with $\beta = L(\nabla f)$
- speed in $O(1/k)$
Combine the advantages of both approaches

$\min_{x \in \mathbb{R}^n} f(x) + g(x) + h(Mx)$

$f$ has a coordinate-wise Lipschitz gradient:
$\forall x \in \mathbb{R}^n, \forall i \in \{1,\dots,n\}, \forall t \in \mathbb{R}: \quad |\nabla_i f(x + t e_i) - \nabla_i f(x)| \le \beta_i |t|$

$g$ and $h$ need not be separable; $M \in \mathbb{R}^{m \times n}$.
Iterates need not be feasible

Example: $f(x) = \frac{1}{2}\|x\|^2$, $g(x) = 0$ if $x_1 = x_2$ and $+\infty$ if $x_1 \ne x_2$, $h = 0$.

Start with $x_0 = [1, 1]$ (feasible): $\bar x_1 = [0, 0]$.
Let $i_1 = 1$; we get $x_1 = [0, 1]$ (infeasible).
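A tiny numerical illustration of this example (the step size $\tau = 1$ is an assumption chosen to reproduce the values on the slide; prox_g is the projection onto the diagonal $\{x : x_1 = x_2\}$):

```python
import numpy as np

def prox_g(v):
    # Projection onto the feasible set {x : x_1 = x_2} (prox of its indicator).
    return np.full_like(v, v.mean())

x = np.array([1.0, 1.0])            # feasible starting point
grad_f = x                          # gradient of f(x) = ||x||^2 / 2
x_bar = prox_g(x - 1.0 * grad_f)    # full proximal gradient step: [0, 0]
x_new = x.copy()
x_new[0] = x_bar[0]                 # coordinate update with i_1 = 1
print(x_new)                        # [0, 1]: x_1 != x_2, so x_new is infeasible
```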
Distance to optimum may not change

$f(x) = \frac{1}{2}(x_1 + x_2 - 1)^2$, $L(\nabla f) = 2$, $\beta_i = 1$

[Figure: positions of $x^\star$, $x_0$ and the possible $x_1$]

$\mathbb{E}\big[\|x_1 - x^\star\|^2\big] = \frac{1}{2} = \|x_0 - x^\star\|^2$
Distance to optimum may increase

$f(x) = \frac{1}{2}(x_1 + x_2 + x_3 - 1)^2$, $L(\nabla f) = 3$, $\beta_i = 1$

[Figure: positions of $x^\star$, $x_0$ and the possible $x_1$]

$\mathbb{E}\big[\|x_1 - x^\star\|^2\big] = \frac{2}{3} > \|x_0 - x^\star\|^2$
Distance to optimum may increase a lot

$f(x) = \frac{1}{2}\big(\sum_{i=1}^n x_i - 1\big)^2$, $L(\nabla f) = n$, $\beta_i = 1$

[Figure: positions of $x^\star$, $x_0$ and the possible $x_1$]

$\mathbb{E}\big[\|x_1 - x^\star\|^2\big] = \frac{n-1}{n} \gg \|x_0 - x^\star\|^2 = \frac{1}{n}$
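A quick numerical check of these quantities (a sketch; the starting point $x_0 = 0$, the step size $\tau_i = 1/\beta_i = 1$ and the reference point $x^\star = (1/n, \dots, 1/n)$ are assumptions consistent with the figures):

```python
import numpy as np

n = 10
x0 = np.zeros(n)
x_star = np.full(n, 1.0 / n)            # a minimizer of f(x) = 0.5*(sum(x) - 1)^2
grad = (x0.sum() - 1.0) * np.ones(n)    # gradient of f at x0
x_bar = x0 - 1.0 * grad                 # full gradient step with tau_i = 1/beta_i = 1

# Updating a single random coordinate i gives x1 = x0 with coordinate i replaced.
dists = []
for i in range(n):
    x1 = x0.copy()
    x1[i] = x_bar[i]
    dists.append(np.sum((x1 - x_star) ** 2))

print(np.mean(dists))                   # (n-1)/n = 0.9
print(np.sum((x0 - x_star) ** 2))       # 1/n = 0.1
```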
New algorithm

[Suppose that $M$ is block diagonal (duplication trick available).]

At iteration k:
1. $\bar y_{k+1} = \mathrm{prox}_{\sigma, h^*}\big( y_k + D(\sigma) M x_k \big)$
   $\bar x_{k+1} = \mathrm{prox}_{\tau, g}\big( x_k - D(\tau)\big( \nabla f(x_k) + M^\top (2 \bar y_{k+1} - y_k) \big) \big)$
2. For $i = i_{k+1}$ and $j : M_{j, i_{k+1}} \ne 0$, update: $x^{(i)}_{k+1} = \bar x^{(i)}_{k+1}$, $y^{(j)}_{k+1} = \bar y^{(j)}_{k+1}$
3. Otherwise, set $x^{(i)}_{k+1} = x^{(i)}_k$, $y^{(j)}_{k+1} = y^{(j)}_k$
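A schematic sketch of one iteration of this update (Python/NumPy). It is not the authors' implementation: a single scalar sigma stands in for $D(\sigma)$, h is assumed separable across the rows of $M$ so that Moreau's identity applies row-wise, and the full candidate pair $(\bar x, \bar y)$ is formed before selecting coordinates, whereas an efficient implementation would compute only coordinate $i$ of $\bar x$ and the coupled entries of $\bar y$.

```python
import numpy as np

def pd_cd_iteration(x, y, i, grad_f, prox_g, prox_h, M, tau, sigma):
    """One coordinate-wise primal-dual update (schematic version)."""
    # Dual candidate: prox of sigma * h* via Moreau's identity
    v = y + sigma * (M @ x)
    y_bar = v - sigma * prox_h(v / sigma, 1.0 / sigma)
    # Primal candidate with the extrapolated dual variable
    x_bar = prox_g(x - tau * (grad_f(x) + M.T @ (2 * y_bar - y)), tau)
    # Coordinate selection: keep coordinate i and the dual rows coupled to it
    x_new, y_new = x.copy(), y.copy()
    x_new[i] = x_bar[i]
    J_i = np.nonzero(M[:, i])[0]        # rows j with M[j, i] != 0
    y_new[J_i] = y_bar[J_i]
    return x_new, y_new
```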
Convergence

Theorem. If for all $i \in \{1,\dots,n\}$, $\mathbb{P}(i_{k+1} = i) = 1/n$ and

$\tau_i < \frac{1}{\beta_i + \rho\big( \sum_{j \in J(i)} \sigma_j M_{j,i}^\top M_{j,i} \big)}$

then there exists a saddle point $(x^\star, y^\star)$ of $L$ such that $\lim_{k \to \infty} (x_k, y_k) = (x^\star, y^\star)$.

Proof: we consider the Lyapunov function $S_k = f(x_k) - f(x^\star) - \langle \nabla f(x^\star), x_k - x^\star \rangle + \|z_k - z^\star\|_P^2$
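For scalar coordinate blocks this step-size rule is easy to evaluate: each $M_{j,i}$ is a scalar, so the spectral radius reduces to $\sum_{j \in J(i)} \sigma_j M_{j,i}^2$. A minimal sketch (the sparse test matrix, the $\beta_i$, the $\sigma_j$ and the 0.99 safety factor are assumptions):

```python
import numpy as np
from scipy import sparse

def step_sizes(M, beta, sigma):
    """Per-coordinate step sizes tau_i < 1 / (beta_i + sum_j sigma_j * M[j,i]^2),
    valid when every block M_{j,i} is a scalar."""
    M = sparse.csc_matrix(M)
    coupling = np.asarray((sparse.diags(sigma) @ M.power(2)).sum(axis=0)).ravel()
    return 0.99 / (beta + coupling)      # strict inequality via a 0.99 safety factor

# Toy usage with assumed data
M = sparse.random(30, 8, density=0.2, random_state=0)
beta = np.ones(8)
sigma = np.full(30, 0.5)
print(step_sizes(M, beta, sigma))
```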
Numerical experiments: dual SVM

$\max_{x \in \mathbb{R}^n} -\frac{1}{2}\|A D(b) x\|^2 + e^\top x - I_{[0,C]^n}(x) - I_{\{b^\top x = 0\}}(x)$

RCV1 dataset: $A$ is a 47,236 x 20,242 matrix, $n$ = 47,236, $C$ =

[Figure: primal value versus passes on the data for Vu-Condat, Bianchi et al., SDCA and primal-dual CD]
$\ell_1$ + TV regularized least squares

$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|Ax - b\|^2 + \alpha\big( r \|x\|_1 + (1 - r) \|Mx\|_{2,1} \big)$

$A$: dense matrix with 65,280 columns; $M$: 195,840 x 65,280 sparse matrix (3D gradient)

Lightly regularized problem: $\alpha = 0.001$, $r$ (l1_ratio) $= 0.5$

[Figure: convergence comparison of Chambolle-Pock, FISTA, L-BFGS-B, Vu-Condat and dupl_prim_dual_cd]
$\ell_1$ + TV regularized least squares

$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|Ax - b\|^2 + \alpha\big( r \|x\|_1 + (1 - r) \|Mx\|_{2,1} \big)$

$A$: dense matrix with 65,280 columns; $M$: 195,840 x 65,280 sparse matrix (3D gradient)

Moderately regularized problem: $\alpha = 0.01$, $r$ (l1_ratio) $= 0.5$

[Figure: convergence comparison of Chambolle-Pock, FISTA, L-BFGS-B, Vu-Condat and dupl_prim_dual_cd]
$\ell_1$ + TV regularized least squares

$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|Ax - b\|^2 + \alpha\big( r \|x\|_1 + (1 - r) \|Mx\|_{2,1} \big)$

$A$: dense matrix with 65,280 columns; $M$: 195,840 x 65,280 sparse matrix (3D gradient)

Strongly TV-regularized problem: $\alpha = 0.1$, $r$ (l1_ratio) $= 0.1$

[Figure: convergence comparison of Chambolle-Pock, FISTA, L-BFGS-B, Vu-Condat and dupl_prim_dual_cd]
$\ell_1$ + TV regularized least squares

$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|Ax - b\|^2 + \alpha\big( r \|x\|_1 + (1 - r) \|Mx\|_{2,1} \big)$

$A$: dense matrix with 65,280 columns; $M$: 195,840 x 65,280 sparse matrix (3D gradient)

Strongly $\ell_1$-regularized problem: $\alpha = 0.1$, $r$ (l1_ratio) $= 0.9$

[Figure: convergence comparison of Chambolle-Pock, FISTA, L-BFGS-B, Vu-Condat and dupl_prim_dual_cd]
$\ell_1$ + TV regularized least squares

$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|Ax - b\|^2 + \alpha\big( r \|x\|_1 + (1 - r) \|Mx\|_{2,1} \big)$

$A$: dense matrix with 65,280 columns; $M$: 195,840 x 65,280 sparse matrix (3D gradient)

[Figure: convergence comparison of Chambolle-Pock, FISTA, L-BFGS-B, Vu-Condat and dupl_prim_dual_cd across several values of $\alpha$ and l1_ratio]
Summary and conclusion

- Genuine coordinate descent method for problems with non-separable and non-smooth convex functions
- Promising numerical results

Open questions
- Long steps: non-uniform probabilities, speed of convergence, replace $\beta_i$ by $\beta_i / 2$
- Non-ergodic speed of convergence
- Longer steps for non-separable proximal operators, e.g. MISO, projection onto $\{x : x_1 = \dots = x_n\}$