Accelerated Proximal Gradient Methods for Convex Optimization


Paul Tseng
Mathematics, University of Washington, Seattle
MOPTA, University of Guelph, August 18, 2008

Talk Outline

- A Convex Optimization Problem
- Proximal Gradient Method
- Accelerated Proximal Gradient Method I
- Accelerated Proximal Gradient Method II
- Example: Matrix Game
- Conclusions & Extensions

A Convex Optimization Problem

    $\min_{x \in E} f_P(x) := f(x) + P(x)$

- $E$ is a real linear space with norm $\|\cdot\|$. $E^*$ is the dual space of continuous linear functionals on $E$, with dual norm $\|x^*\|_* = \sup_{\|x\| \le 1} \langle x^*, x \rangle$.
- $P : E \to (-\infty, \infty]$ is proper, convex, lsc (and "simple").
- $f : E \to \mathbb{R}$ is convex and differentiable, with $\|\nabla f(x) - \nabla f(y)\|_* \le L \|x - y\|$ for all $x, y \in \mathrm{dom}\, P$ ($L \ge 0$).

Constrained case: $P = \delta_X$ with $X \subseteq E$ nonempty, closed, convex, where $\delta_X(x) = 0$ if $x \in X$, $\infty$ else.

Examples:

- $E = \mathbb{R}^n$, $P(x) = \|x\|_1$, $f(x) = \|Ax - b\|_2^2$   (Basis Pursuit / Lasso)
- $E = \mathbb{R}^{n_1} \times \cdots \times \mathbb{R}^{n_N}$, $P(x) = w_1 \|x_1\|_2 + \cdots + w_N \|x_N\|_2$ ($w_j > 0$), $f(x) = g(Ax)$ with $g(y) = \sum_{i=1}^m \ln(1 + e^{y_i}) - b_i y_i$   (group Lasso)
- $E = \mathbb{R}^n$, $P = \delta_X$ with $X = \{x : x \ge 0,\ x_1 + \cdots + x_n = 1\}$, $f(x) = g^*(Ax)$ with $g(y) = \sum_{i=1}^m y_i \ln y_i$ if $y \ge 0$, $y_1 + \cdots + y_m = 1$, $\infty$ else   (matrix game)
- $E = S^n$, $P = \delta_X$ with $X = \{x : |x_{ij}| \le \rho\ \forall i, j\}$, $f(x) = g^*(x + s)$ with $g(y) = -\ln \det y$ if $\alpha I \preceq y \preceq \beta I$, $\infty$ else ($\rho, \alpha, \beta > 0$)   (covariance selection)

How to solve this (nonsmooth) convex optimization problem?

In applications, m and n are large (on the order of 1000 or more), and A may be dense.

- 2nd-order methods (Newton, interior-point)? Few iterations, but each iteration can be too expensive (e.g., $O(n^3)$ ops).
- 1st-order methods (gradient)? Each iteration is cheap (by using a suitable "prox function"), but often too many iterations.
- Accelerate convergence by interpolation (Nesterov).

Proximal Gradient Method

Let
    $l(x; y) := f(y) + \langle \nabla f(y), x - y \rangle + P(x)$
    $D(x, y) := h(x) - h(y) - \langle \nabla h(y), x - y \rangle$   (Bregman, ...)
with $h : E \to (-\infty, \infty]$ strictly convex, differentiable on $X_h \supseteq \mathrm{int}(\mathrm{dom}\, P)$, and
    $D(x, y) \ge \tfrac{1}{2} \|x - y\|^2 \quad \forall x \in \mathrm{dom}\, P,\ y \in X_h$.

For k = 0, 1, ...,
    $x^{k+1} = \arg\min_x \{\, l(x; x^k) + L D(x, x^k) \,\}$
with $x^0 \in \mathrm{dom}\, P$. Assume $x^k \in X_h$ for all k.

Special cases: steepest descent, gradient projection (Goldstein, Levitin, Polyak, ...), mirror descent (Yudin, Nemirovski), iterative thresholding (Daubechies et al.), ...

For the earlier examples, $x^{k+1}$ has closed form when h is chosen suitably:

- $E = \mathbb{R}^n$, $P(x) = \|x\|_1$:  $h(x) = \|x\|_2^2 / 2$ (sketched below).
- $E = \mathbb{R}^{n_1} \times \cdots \times \mathbb{R}^{n_N}$, $P(x) = w_1 \|x_1\|_2 + \cdots + w_N \|x_N\|_2$ ($w_j > 0$):  $h(x) = \|x\|_2^2 / 2$.
- $E = \mathbb{R}^n$, $P = \delta_X$ with $X = \{x : x \ge 0,\ x_1 + \cdots + x_n = 1\}$:  $h(x) = \sum_{j=1}^n x_j \ln x_j$.
- $E = S^n$, $P = \delta_X$ with $X = \{x : |x_{ij}| \le \rho\ \forall i, j\}$:  $h(x) = \|x\|_F^2 / 2$.
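To make the first case concrete: with $P(x) = \|x\|_1$ and $h(x) = \|x\|_2^2/2$ (so $D(x, y) = \|x - y\|^2/2$), the update $x^{k+1} = \arg\min_x \{l(x; x^k) + L D(x, x^k)\}$ is a gradient step followed by componentwise soft thresholding. The talk's experiments were done in Matlab; the snippet below is only a NumPy sketch of this step for the Lasso example, with illustrative data and a regularization weight lam that are not part of the original slides.

```python
import numpy as np

def soft_threshold(v, t):
    # Componentwise prox of t * ||.||_1: sign(v) * max(|v| - t, 0).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_grad_l1(A, b, L, lam=1.0, iters=500):
    # Proximal gradient method for min_x ||Ax - b||_2^2 + lam * ||x||_1
    # with h(x) = ||x||_2^2 / 2, i.e. D(x, y) = ||x - y||^2 / 2.
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = 2.0 * A.T @ (A @ x - b)               # gradient of f(x) = ||Ax - b||_2^2
        x = soft_threshold(x - grad / L, lam / L)    # x^{k+1} = argmin { l(x; x^k) + L D(x, x^k) }
    return x

# Illustrative use: L = 2 * ||A||_2^2 is a valid Lipschitz constant for this f.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 50))
b = rng.standard_normal(100)
x_hat = prox_grad_l1(A, b, L=2.0 * np.linalg.norm(A, 2) ** 2)
```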

Fact 1:  $f_P(x) \ge l(x; y) \ge f_P(x) - \tfrac{L}{2}\|x - y\|^2 \quad \forall x, y \in \mathrm{dom}\, P$.

Fact 2:  For any proper convex lsc $\psi : E \to (-\infty, \infty]$ and $z \in X_h$, let $z^+ = \arg\min_x \{\psi(x) + D(x, z)\}$. If $z^+ \in X_h$, then
    $\psi(z^+) + D(z^+, z) \le \psi(x) + D(x, z) - D(x, z^+) \quad \forall x \in \mathrm{dom}\, P$.

Prop. 1: For any $x \in \mathrm{dom}\, P$,
    $\min\{e_1, \ldots, e_k\} \le \dfrac{L D(x, x^0)}{k}, \quad k = 1, 2, \ldots$
with $e_k := f_P(x^k) - f_P(x)$.

Proof:
    $f_P(x^{k+1}) \le l(x^{k+1}; x^k) + \tfrac{L}{2}\|x^{k+1} - x^k\|^2$   (Fact 1)
                $\le l(x^{k+1}; x^k) + L D(x^{k+1}, x^k)$
                $\le l(x; x^k) + L D(x, x^k) - L D(x, x^{k+1})$   (Fact 2)
                $\le f_P(x) + L D(x, x^k) - L D(x, x^{k+1})$,   (Fact 1)
so
    $0 \le L D(x, x^{k+1}) \le L D(x, x^k) - e_{k+1} \le \cdots \le L D(x, x^0) - (e_1 + \cdots + e_{k+1}) \le L D(x, x^0) - (k+1)\min\{e_1, \ldots, e_{k+1}\}$.

We will improve the global convergence rate by interpolation.

Idea: At iteration k, use a stepsize of $O(k/L)$ instead of $1/L$ and backtrack towards $x^k$.

Accelerated Proximal Gradient Method I

For k = 0, 1, ...,
    $y^k = (1 - \theta_k) x^k + \theta_k z^k$
    $z^{k+1} = \arg\min_x \{\, l(x; y^k) + \theta_k L D(x, z^k) \,\}$
    $x^{k+1} = (1 - \theta_k) x^k + \theta_k z^{k+1}$
    choose $\theta_{k+1}$ with $\dfrac{1 - \theta_{k+1}}{\theta_{k+1}^2} \le \dfrac{1}{\theta_k^2}$   ($0 < \theta_{k+1} \le 1$)
with $\theta_0 = 1$, $x^0, z^0 \in \mathrm{dom}\, P$   (Nesterov, Auslender, Teboulle, Lan, Lu, Monteiro, ...). Assume $z^k \in X_h$ for all k.

For example, $\theta_k = \dfrac{2}{k+2}$ or $\theta_{k+1} = \dfrac{\sqrt{\theta_k^4 + 4\theta_k^2} - \theta_k^2}{2}$.
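With the Euclidean choice $h(x) = \|x\|_2^2/2$ and $\theta_k = 2/(k+2)$, the method specializes to a FISTA-like scheme. Continuing the Lasso sketch above (same soft_threshold, A, b, L, and illustrative weight lam), a minimal rendering of Method I:

```python
def apg_method_I(A, b, L, lam=1.0, iters=500):
    # Accelerated Proximal Gradient Method I with h(x) = ||x||_2^2 / 2 and theta_k = 2/(k+2),
    # applied to min_x ||Ax - b||_2^2 + lam * ||x||_1.
    n = A.shape[1]
    x = np.zeros(n)
    z = x.copy()
    for k in range(iters):
        theta = 2.0 / (k + 2)
        y = (1 - theta) * x + theta * z                    # y^k = (1 - theta_k) x^k + theta_k z^k
        grad = 2.0 * A.T @ (A @ y - b)                     # grad f(y^k)
        # z^{k+1} = argmin_x { l(x; y^k) + theta_k * L * ||x - z^k||^2 / 2 }
        z = soft_threshold(z - grad / (theta * L), lam / (theta * L))
        x = (1 - theta) * x + theta * z                    # x^{k+1}
    return x
```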

Prop. 2: For any $x \in \mathrm{dom}\, P$,
    $\min\{e_1, \ldots, e_k\} \le L D(x, z^0)\, \theta_{k-1}^2, \quad k = 1, 2, \ldots$
with $e_k := f_P(x^k) - f_P(x)$.

Proof:
    $f_P(x^{k+1}) \le l(x^{k+1}; y^k) + \tfrac{L}{2}\|x^{k+1} - y^k\|^2$   (Fact 1)
                $= l((1 - \theta_k) x^k + \theta_k z^{k+1}; y^k) + \tfrac{L}{2}\|(1 - \theta_k) x^k + \theta_k z^{k+1} - y^k\|^2$
                $\le (1 - \theta_k)\, l(x^k; y^k) + \theta_k\, l(z^{k+1}; y^k) + \tfrac{L}{2}\theta_k^2 \|z^{k+1} - z^k\|^2$
                $\le (1 - \theta_k)\, l(x^k; y^k) + \theta_k \big( l(z^{k+1}; y^k) + \theta_k L D(z^{k+1}, z^k) \big)$
                $\le (1 - \theta_k)\, l(x^k; y^k) + \theta_k \big( l(x; y^k) + \theta_k L D(x, z^k) - \theta_k L D(x, z^{k+1}) \big)$   (Fact 2)
                $\le (1 - \theta_k) f_P(x^k) + \theta_k f_P(x) + \theta_k^2 L D(x, z^k) - \theta_k^2 L D(x, z^{k+1})$,   (Fact 1)

so, subtracting $f_P(x)$ and dividing by $\theta_k^2$, we have
    $\dfrac{1}{\theta_k^2} e_{k+1} \le \dfrac{1 - \theta_k}{\theta_k^2} e_k + L D(x, z^k) - L D(x, z^{k+1})$,
and telescoping (using $\theta_0 = 1$ and the condition on $\theta_{k+1}$) gives the result.

Thus, the global convergence rate improves from $O(1/k)$ to $O(1/k^2)$ with little extra work per iteration!

Can also replace $l(x; y^k)$ by a certain weighted sum of $l(x; y^0), l(x; y^1), \ldots, l(x; y^k)$.

Accelerated Proximal Gradient Method II

For k = 0, 1, ...,
    $y^k = (1 - \theta_k) x^k + \theta_k z^k$
    $z^{k+1} = \arg\min_x \left\{ \sum_{i=0}^{k} \dfrac{l(x; y^i)}{\vartheta_i} + L h(x) \right\}$
    $x^{k+1} = (1 - \theta_k) x^k + \theta_k z^{k+1}$
    choose $\vartheta_{k+1}, \theta_{k+1}$ with $\dfrac{1 - \theta_{k+1}}{\theta_{k+1} \vartheta_{k+1}} = \dfrac{1}{\theta_k \vartheta_k}$   ($\vartheta_{k+1} \ge \theta_{k+1} > 0$)
with $\vartheta_0 \ge \theta_0 = 1$, $x^0 \in \mathrm{dom}\, P$, and $z^0 = \arg\min_{x \in \mathrm{dom}\, P} h(x)$   (Nesterov, d'Aspremont et al., Lu, ...). Assume $z^k \in X_h$ for all k.

For example, $\vartheta_k = \dfrac{2}{k+1}$, $\theta_k = \dfrac{2}{k+2}$, or $\vartheta_{k+1} = \theta_{k+1} = \dfrac{\sqrt{\theta_k^4 + 4\theta_k^2} - \theta_k^2}{2}$.
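When X is the unit simplex and $h(x) = \sum_j x_j \ln x_j$ (the setting of the matrix-game example below), the subproblem for $z^{k+1}$ has a closed form: $z^{k+1}_j \propto \exp(-s_j / L)$, where s accumulates the weighted gradients $\sum_{i \le k} \nabla f(y^i)/\vartheta_i$. A sketch under those assumptions, with smooth_grad a placeholder for $\nabla f$ and the choice $\vartheta_k = 2/(k+1)$, $\theta_k = 2/(k+2)$:

```python
def entropy_argmin(s, L):
    # argmin over the unit simplex of <s, x> + L * sum_j x_j ln x_j:
    # closed form x_j proportional to exp(-s_j / L).
    w = np.exp(-(s - s.min()) / L)      # shift by min(s) for numerical stability
    return w / w.sum()

def apg_method_II(smooth_grad, n, L, iters=500):
    # Accelerated Proximal Gradient Method II over the unit simplex with entropy h.
    z = np.full(n, 1.0 / n)             # z^0 = argmin_{x in simplex} h(x) (uniform point)
    x = z.copy()
    s = np.zeros(n)                     # accumulated sum_{i<=k} grad f(y^i) / vartheta_i
    for k in range(iters):
        theta = 2.0 / (k + 2)
        vartheta = 2.0 / (k + 1)
        y = (1 - theta) * x + theta * z                  # y^k
        s = s + smooth_grad(y) / vartheta                # extend the weighted sum of linearizations
        z = entropy_argmin(s, L)                         # z^{k+1}
        x = (1 - theta) * x + theta * z                  # x^{k+1}
    return x
```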

Prop. 3: For any $x \in \mathrm{dom}\, P$,
    $\min\{e_1, \ldots, e_k\} \le L\,(h(x) - h(z^0))\, \theta_{k-1} \vartheta_{k-1}, \quad k = 1, 2, \ldots$
with $e_k := f_P(x^k) - f_P(x)$.

The proof replaces Fact 2 with:

Fact 3: For any proper convex lsc $\psi : E \to (-\infty, \infty]$, let $z = \arg\min_x \{\psi(x) + h(x)\}$. If $z \in X_h$, then
    $\psi(z) + h(z) \le \psi(x) + h(x) - D(x, z) \quad \forall x \in \mathrm{dom}\, P$.

Advantage?

Example: Matrix Game

    $\min_{x \in X} \max_{v \in V} \langle v, Ax \rangle$

with X and V unit simplices in $\mathbb{R}^n$ and $\mathbb{R}^m$, and $A \in \mathbb{R}^{m \times n}$. Generate $A_{ij} \sim U[-1, 1]$ with probability p; otherwise $A_{ij} = 0$.   (Nesterov, Nemirovski)

Set $P = \delta_X$ and $f(x) = g^*(Ax/\mu)$, with $\mu = \dfrac{\epsilon}{2 \ln m}$ ($\epsilon > 0$) and
    $g(v) = \sum_{i=1}^m v_i \ln v_i$ if $v \in V$, $\infty$ else
($L = 1/\mu$, $\|\cdot\|$ = 1-norm).

Implement PGM, APGM I & II in Matlab, with $h(x) = \sum_{j=1}^n x_j \ln x_j$. Dynamically adjust L. Initialize $x^0 = z^0 = (\tfrac{1}{n}, \ldots, \tfrac{1}{n})$. Terminate when
    $\max_i (Ax^k)_i - \min_j (A^T v^k)_j \le \epsilon$
with $v^k \in V$ a weighted sum of dual vectors associated with $x^0, x^1, \ldots, x^k$.
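For reference, a sketch of the smoothing used here. With g the negative entropy on V, $g^*(y) = \ln \sum_i e^{y_i}$, and the smoothed max is $\mu\, g^*(Ax/\mu) \approx \max_i (Ax)_i$ with gradient $A^T v$, where $v = \mathrm{softmax}(Ax/\mu) \in V$ supplies the dual vectors entering the termination test. The $\mu$ prefactor shown below is the usual Nesterov-smoothing scaling, consistent with $L = 1/\mu$ in the stated norms; this is a NumPy sketch, not the original Matlab code.

```python
def smoothed_max_and_grad(A, x, mu):
    # Smoothed max: mu * log(sum_i exp((Ax)_i / mu)) ~= max_i (Ax)_i.
    # Gradient is A^T v with v = softmax(Ax / mu), a point of the simplex V.
    y = A @ x / mu
    w = np.exp(y - y.max())                  # shift by max(y) for numerical stability
    v = w / w.sum()
    f_val = mu * (np.log(w.sum()) + y.max())
    return f_val, A.T @ v, v

def duality_gap(A, x, v):
    # Termination quantity from the slides: max_i (Ax)_i - min_j (A^T v)_j <= eps.
    return np.max(A @ x) - np.min(A.T @ v)
```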

Table 1: Performance of PGM, APGM I & II for different n, m, sparsity p, and solution accuracy ε; each entry reports the iteration count k and CPU time (sec). (The numerical entries are not recoverable from this transcription.)

Conclusions & Extensions

1. The accelerated prox gradient method is promising in theory and practice. It is applicable to convex-concave optimization by using smoothing (Nesterov). A further extension is to add cutting planes.
2. Application to matrix completion, where $E = \mathbb{R}^{m \times n}$ and $P(x) = \|\sigma(x)\|_1$? (A prox sketch follows this list.)
3. Extending the interpolation technique to incremental gradient methods and coordinate-wise gradient methods?
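On extension 2: with $P(x) = \|\sigma(x)\|_1$ (the nuclear norm) and $h(x) = \|x\|_F^2/2$, the prox step in these methods becomes soft thresholding of the singular values. A brief sketch, not from the slides:

```python
def prox_nuclear_norm(X, t):
    # Prox of t * ||sigma(.)||_1 with h = ||.||_F^2 / 2: soft-threshold the singular values of X.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt
```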

Thanks for coming!
