SIAM Conference on Imaging Science, Bologna, Italy, 2018. Adaptive FISTA. Peter Ochs, Saarland University


Adaptive FISTA
Peter Ochs, Saarland University
Joint work with Thomas Pock, TU Graz, Austria
SIAM Conference on Imaging Science, Bologna, Italy, 2018

Some Facts about FISTA:

FISTA was developed in [Beck, Teboulle: A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences 2(1), 2009]; about 5600 citations on Google Scholar.

It is motivated by [Nesterov: A method of solving a convex programming problem with convergence rate O(1/k²), Soviet Mathematics Doklady, 1983]; about 2000 citations on Google Scholar.

It is an optimal method in the sense of [Nemirovskii, Yudin 83], [Nesterov 04].

Accelerated Gradient Method:

  y^(k) = x^(k) + β_k (x^(k) - x^(k-1)),   β_k, k ∈ N, cleverly predefined
  x^(k+1) = y^(k) - τ ∇f(y^(k))

FISTA extends the Accelerated Gradient Method of [Nesterov 83] to non-smooth problems.

A Class of Structured Non-smooth Optimization Problems:

  min_x g(x) + f(x)

where g is simple (its proximal mapping argmin_x g(x) + (1/2) ‖x - x̄‖² is easy to evaluate) and f is smooth.

Update Scheme: FISTA (f, g convex)

  y^(k) = x^(k) + β_k (x^(k) - x^(k-1))
  x^(k+1) = argmin_{x ∈ R^N} g(x) + f(y^(k)) + ⟨∇f(y^(k)), x - y^(k)⟩ + (1/(2τ_k)) ‖x - y^(k)‖²

Equivalent to

  x^(k+1) = argmin_{x ∈ R^N} g(x) + (1/(2τ_k)) ‖x - (y^(k) - τ_k ∇f(y^(k)))‖² =: prox_{τ_k g}(y^(k) - τ_k ∇f(y^(k)))
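The update above is a gradient step at the extrapolated point followed by a proximal step. A minimal Python sketch of this scheme, assuming a known Lipschitz constant L of ∇f and the standard choice β_k = (k-1)/(k+2); function and argument names here are illustrative, not from the talk:

```python
import numpy as np

def fista(grad_f, prox_g, x0, L, n_iter=500):
    """FISTA sketch: proximal gradient with Nesterov extrapolation.

    grad_f : callable, gradient of the smooth part f
    prox_g : callable (z, tau) -> argmin_x g(x) + ||x - z||^2 / (2 tau)
    L      : Lipschitz constant of grad_f; step size tau = 1/L
    """
    tau = 1.0 / L
    x_prev = x0.copy()
    x = x0.copy()
    for k in range(1, n_iter + 1):
        beta = (k - 1.0) / (k + 2.0)             # predefined extrapolation parameter beta_k
        y = x + beta * (x - x_prev)              # y^(k) = x^(k) + beta_k (x^(k) - x^(k-1))
        x_prev = x
        x = prox_g(y - tau * grad_f(y), tau)     # x^(k+1) = prox_{tau g}(y - tau grad_f(y))
    return x
```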

A Class of Structured Non-smooth Optimization Problems:

  min_x g(x) + f(x),   g simple (easy proximal mapping), f smooth

Update Scheme: Adaptive FISTA (also non-convex)

  y_α^(k) = x^(k) + α (x^(k) - x^(k-1))
  x^(k+1) = argmin_x min_α g(x) + f(y_α^(k)) + ⟨∇f(y_α^(k)), x - y_α^(k)⟩ + (1/(2τ_k)) ‖x - y_α^(k)‖²

A Class of Structured Non-smooth Optimization Problems:

  min_x g(x) + f(x),   g simple (easy proximal mapping), f smooth

Update Scheme: Adaptive FISTA (f quadratic)

  y_α^(k) = x^(k) + α (x^(k) - x^(k-1))
  x^(k+1) = argmin_x min_α g(x) + f(y_α^(k)) + ⟨∇f(y_α^(k)), x - y_α^(k)⟩ + (1/(2τ_k)) ‖x - y_α^(k)‖²

... Taylor expansion around x^(k) and optimizing for α = α(x) ...

  x^(k+1) = argmin_x g(x) + (1/2) ‖x - (x^(k) - Q_k^{-1} ∇f(x^(k)))‖²_{Q_k}

A Class of Structured Non-smooth Optimization Problems

Update Scheme: Adaptive FISTA (f quadratic)

  x^(k+1) = argmin_{x ∈ R^N} g(x) + (1/2) ‖x - (x^(k) - Q_k^{-1} ∇f(x^(k)))‖²_{Q_k} =: prox^{Q_k}_g(x^(k) - Q_k^{-1} ∇f(x^(k)))

with Q_k ∈ S_{++}(N) as in the (zero-memory) SR1 quasi-Newton method: Q = I - uu^⊤ (identity minus rank-1).

SR1 proximal quasi-Newton method proposed by [Becker, Fadili 12] (convex case). A special setting is treated in [Karimi, Vavasis 17]. Unified and extended in [Becker, Fadili, O. 18].
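For context, a small sketch of how a "diagonal ± rank-1" metric can be produced by a zero-memory SR1 update from a single pair of iterate and gradient differences; the base scaling B_0 = (1/τ) I, the skipping safeguard, and all names are illustrative assumptions (the talk does not specify these details). Together with the theorem below, such a Q keeps the proximal step cheap to evaluate.

```python
import numpy as np

def sr1_metric(x, x_prev, g, g_prev, tau=1.0, eps=1e-8):
    """Zero-memory SR1 metric Q = (1/tau) I + sigma * u u^T (sketch).

    Returns (d, u, sigma) with Q = diag(d) + sigma * outer(u, u);
    the rank-1 correction is skipped if the SR1 denominator is tiny.
    """
    n = x.size
    d = np.full(n, 1.0 / tau)          # diagonal part B_0 = (1/tau) I
    s = x - x_prev                      # iterate difference
    y = g - g_prev                      # gradient difference
    r = y - d * s                       # SR1 residual y - B_0 s
    denom = r @ s
    if abs(denom) < eps * np.linalg.norm(r) * np.linalg.norm(s):
        return d, np.zeros(n), 0.0      # skip the rank-1 correction
    sigma = np.sign(denom)              # + or - rank-1 term
    u = r / np.sqrt(abs(denom))         # rank-1 factor
    return d, u, sigma
```

Note that for sigma = -1 an additional safeguard is needed to keep Q positive definite; it is omitted in this sketch.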

Discussion about Solving the Proximal Mapping: (g convex)

For general Q, the main algorithmic step is hard to solve:

  x̂ = prox^Q_g(x̄) := argmin_{x ∈ R^N} g(x) + (1/2) ‖x - x̄‖²_Q

Theorem [Becker, Fadili 12]: Let Q = D ± uu^⊤ ∈ S_{++} for u ∈ R^N and D diagonal. Then

  prox^Q_g(x̄) = D^{-1/2} prox_{g∘D^{-1/2}}(D^{1/2} x̄ - v*),   v* = α* D^{-1/2} u,

where α* is the unique root of

  l(α) = ⟨u, x̄ - D^{-1/2} prox_{g∘D^{-1/2}}(D^{1/2}(x̄ - α D^{-1} u))⟩ + α,

which is strictly increasing and Lipschitz continuous with constant 1 + Σ_i u_i²/d_i.

Solving the rank-1 Proximal Mapping for the ℓ1-norm

Example (solving the rank-1 prox of the ℓ1-norm): The proximal mapping with respect to the diagonal metric is separable and simple:

  prox_{g∘D^{-1/2}}(z) = argmin_{x ∈ R^N} ‖D^{-1/2} x‖_1 + (1/2) ‖x - z‖²
                       = argmin_{x ∈ R^N} Σ_{i=1}^N |x_i|/√d_i + (1/2)(x_i - z_i)²
                       = ( argmin_{x_i ∈ R} |x_i|/√d_i + (1/2)(x_i - z_i)² )_{i=1,...,N}
                       = ( max(0, |z_i| - 1/√d_i) · sign(z_i) )_{i=1,...,N}
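In code this separable prox is one line of vectorized soft-thresholding. A sketch, where d holds the diagonal entries d_i of D:

```python
import numpy as np

def prox_l1_scaled(z, d):
    """prox of x -> ||D^{-1/2} x||_1: componentwise soft-thresholding
    with thresholds 1/sqrt(d_i)."""
    thresh = 1.0 / np.sqrt(d)
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
```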

Solving the rank-1 Proximal Mapping for the ℓ1-norm

The root finding problem in the rank-1 prox of the ℓ1-norm: α* is the root of the 1D function (i.e. l(α*) = 0)

  l(α) = ⟨u, x̄ - D^{-1/2} prox_{g∘D^{-1/2}}(D^{1/2}(x̄ - α D^{-1} u))⟩ + α
       = ⟨u, x̄ - PLin(x̄ - α D^{-1} u)⟩ + α,

which is a piecewise linear function.

Construct this function by sorting K ≤ N breakpoints. Cost: O(K log(K)).
The root is determined using binary search (remember: l(α) is strictly increasing). Cost: O(log(K)).
Computing l(α) costs O(N).
Total cost: O(K log(K)).
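A compact sketch of the whole rank-1 prox for the ℓ1-norm, following the reconstruction of l(α) above and taking the "+" case Q = diag(d) + uu^⊤. For simplicity it brackets and bisects the root of the strictly increasing function l instead of sorting breakpoints, which is easier to check but costs O(N) per evaluation; it reuses prox_l1_scaled from the previous sketch.

```python
import numpy as np

def rank1_prox_l1(x_bar, d, u, tol=1e-10):
    """prox of ||.||_1 in the metric Q = diag(d) + u u^T (sketch, '+' case).

    The answer is D^{-1/2} prox_{||.||_1 o D^{-1/2}}(D^{1/2}(x_bar - alpha* D^{-1} u)),
    where alpha* is the root of the scalar, strictly increasing function l.
    """
    sqrt_d = np.sqrt(d)

    def inner(alpha):
        # D^{-1/2} prox_{g o D^{-1/2}}( D^{1/2} (x_bar - alpha * D^{-1} u) )
        z = sqrt_d * (x_bar - alpha * u / d)
        return prox_l1_scaled(z, d) / sqrt_d

    def l(alpha):
        return u @ (x_bar - inner(alpha)) + alpha

    # bracket the root of the increasing function l, then bisect
    lo, hi = -1.0, 1.0
    while l(lo) > 0:
        lo *= 2.0
    while l(hi) < 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if l(mid) < 0:
            lo = mid
        else:
            hi = mid
    return inner(0.5 * (lo + hi))
```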

Solving the rank-1 Proximal Mapping for the ℓ1-norm

[Figure: the piecewise linear function l(α) with its breakpoints ŝ_1, ŝ_2, ŝ_3, ŝ_4; from S. Becker.]

Discussion about Solving the Proximal Mapping

Function g              Algorithm
ℓ1-norm                 separable: exact
Hinge                   separable: exact
ℓ∞-ball                 separable: exact
Box constraint          separable: exact
Positivity constraint   separable: exact
Linear constraint       nonseparable: exact
ℓ1-ball                 nonseparable: semi-smooth Newton + prox_{g∘D^{-1/2}}, exact
ℓ∞-norm                 nonseparable: Moreau identity
Simplex                 nonseparable: semi-smooth Newton + prox_{g∘D^{-1/2}}, exact

From [Becker, Fadili 12].

Rank-r Modified Metric: (g convex)

(L-)BFGS uses a rank-r update of the metric with r > 1.

Theorem [Becker, Fadili, O. 18]: Let Q = P ± V ∈ S_{++}, with P ∈ S_{++} and V = Σ_{i=1}^r u_i u_i^⊤, rank(V) = r. Then

  prox^Q_g(x̄) = P^{-1/2} prox_{g∘P^{-1/2}}(P^{1/2}(x̄ - P^{-1} U α*)),

where U = (u_1, ..., u_r) and α* is the unique root of

  l(α) = U^⊤ (x̄ - P^{-1/2} prox_{g∘P^{-1/2}}(P^{1/2}(x̄ - P^{-1} U α))) + X α,

where X := U^⊤ V^+ U ∈ S_{++}(r). The mapping l : R^r → R^r is Lipschitz continuous with constant ‖X‖ + ‖P^{-1/2} U‖² and strongly monotone.

Adaptive FISTA: Variants with O(1/k²) convergence rate (convex case):

Adaptive FISTA itself cannot be proved to have the accelerated rate O(1/k²): for each point x, aFISTA decreases the objective more than a FISTA step does, but the global view on the sequence is lost.

However, aFISTA can be embedded into schemes with the accelerated rate O(1/k²):

Monotone FISTA version (motivated by [Li, Lin 15], [Beck, Teboulle 09]).
Tseng-like version (motivated by [Tseng 08]).

Nesterov's Worst Case Function

[Plots: f(x^(k)) - min f versus iteration k and versus time in seconds, comparing FISTA, MFISTA, aFISTA, aMFISTA, aTseng, CG, and the lower bound O(1/k²).]

LASSO

  min_{x ∈ R^N} h(x),   h(x) = (1/2) ‖Ax - b‖² + λ ‖x‖_1

[Plots: h(x^(k)) - h* versus iteration k and versus time in seconds, comparing FISTA, aFISTA, aTseng, MFISTA, aMFISTA.]
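As a usage example, the LASSO objective plugs directly into the fista sketch from earlier; the data below are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 500))
b = rng.standard_normal(200)
lam = 0.1

grad_f = lambda x: A.T @ (A @ x - b)                  # gradient of 0.5 * ||Ax - b||^2
prox_g = lambda z, tau: np.sign(z) * np.maximum(np.abs(z) - lam * tau, 0.0)  # soft-thresholding
L = np.linalg.norm(A, 2) ** 2                         # Lipschitz constant of grad_f

x_hat = fista(grad_f, prox_g, np.zeros(500), L, n_iter=300)
```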

Proposed Algorithm: (non-convex setting)

Current iterate x^(k) ∈ R^N. Step size metric: T ∈ S_{++}(N). Define the extrapolated point y_α^(k) that depends on α:

  y_α^(k) := x^(k) + α (x^(k) - x^(k-1)).

Exact version: compute x^(k+1) as follows:

  x^(k+1) = argmin_{x ∈ R^N} min_α l^g_f(x; y_α^(k)) + (1/2) ‖x - y_α^(k)‖²_T,
  l^g_f(x; y_α^(k)) := g(x) + f(y_α^(k)) + ⟨∇f(y_α^(k)), x - y_α^(k)⟩.

Inexact version: find x^(k+1) and α such that

  l^g_f(x^(k+1); y_α^(k)) + (1/2) ‖x^(k+1) - y_α^(k)‖²_T ≤ f(x^(k)) + g(x^(k)).
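A sketch of one step of the inexact version, specialized to T = (1/τ) I and a fixed finite list of candidate extrapolation parameters α tried in order; the candidate list is an assumption for illustration, not the strategy of the talk. Since x^(k+1) minimizes the model built at the chosen y_α^(k), the candidate α = 0 always satisfies the descent condition when it is reached, so the loop terminates.

```python
import numpy as np

def afista_step_inexact(x, x_prev, f, grad_f, g, prox_g, tau,
                        alphas=(1.0, 0.5, 0.25, 0.0)):
    """One inexact adaptive-FISTA step (sketch), metric T = (1/tau) I.

    Accepts the first prox-gradient step from y_alpha whose model value
    l(x+; y_alpha) + ||x+ - y_alpha||^2 / (2 tau) does not exceed f(x) + g(x).
    """
    target = f(x) + g(x)
    for alpha in alphas:
        y = x + alpha * (x - x_prev)               # extrapolated point y_alpha
        gy = grad_f(y)
        x_new = prox_g(y - tau * gy, tau)          # minimizer of the model at y_alpha
        model = (g(x_new) + f(y) + gy @ (x_new - y)
                 + np.dot(x_new - y, x_new - y) / (2.0 * tau))
        if model <= target:
            return x_new, alpha                    # descent condition satisfied
    return x, 0.0                                  # defensive fallback (alpha = 0 always passes)
```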

Neural network optimization problem / non-linear inverse problem

  min_{W_0,W_1,W_2, b_0,b_1,b_2} Σ_{i=1}^N ( ‖(W_2 σ_2(W_1 σ_1(W_0 X + B_0) + B_1) + B_2 - Ỹ)_{·,i}‖² + ε² )^{1/2} + λ Σ_{j=0}^2 ‖W_j‖_1

[Plots: h(x^(k)) / h(x^(0)) versus iteration k and versus time in seconds, comparing FBS, MFISTA, iPiano, aFISTA.]

Conclusion:

Proposed adaptive FISTA for solving problems of the form min_{x ∈ R^N} g(x) + f(x).
Adaptive FISTA is locally better than FISTA.
Convergence to a stationary point is proved.
Adaptive FISTA is equivalent to a proximal quasi-Newton method if f is quadratic.
Often, the proximal mapping can be computed efficiently.
Adaptive FISTA can be embedded into accelerated / optimal schemes.
