Proximal methods. S. Villa. October 7, 2014


1 Review of the basics

Often machine learning problems require the solution of minimization problems. For instance, the ERM algorithm requires solving a problem of the form
$$\min_{c \in \mathbb{R}^d} \|y - Kc\|^2,$$
for various choices of the loss function. Another typical problem is the regularized one, e.g. Tikhonov regularization, where for linear kernels one looks for
$$\min_{w \in \mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n V(\langle w, x_i\rangle, y_i) + \lambda R(w).$$
The class of methods we will consider is suitable for problems involving a nonsmooth regularization term. In particular, we will be motivated by the following regularization terms:
(i) $\ell^1$ norm regularization
(ii) group $\ell^1$ norm regularization
(iii) matrix norm regularization
More generally, we are interested in solving a minimization problem
$$\min_{w \in \mathbb{R}^d} F(w).$$
We review the basic concepts that allow us to study this problem.

Existence of a minimizer. We will consider extended real valued functions $F: \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$. The domain of $F$ is
$$\mathrm{dom}\, F = \{w \in \mathbb{R}^d : F(w) < +\infty\}.$$
$F$ is proper if its domain is nonempty. It is useful to consider extended valued functions since they allow us to include constraints in the regularization. $F$ is lower semicontinuous if $\mathrm{epi}\, F$ is closed (for example, the indicator function of a closed set is lower semicontinuous). $F$ is coercive if $\lim_{\|w\| \to +\infty} F(w) = +\infty$.

Theorem 1.1. If $F$ is lower semicontinuous and coercive, then there exists $w^*$ such that $F(w^*) = \min F$.

We will always assume that the functions we consider are lower semicontinuous.

1.1 Convexity concepts

Convexity. $F$ is convex if
$$(\forall w, w' \in \mathrm{dom}\, F)(\forall \lambda \in [0,1]) \quad F(\lambda w + (1-\lambda) w') \le \lambda F(w) + (1-\lambda) F(w').$$
If $F$ is differentiable, we can write an equivalent characterization of convexity based on the gradient:
$$(\forall w, w' \in \mathbb{R}^d) \quad F(w') \ge F(w) + \langle \nabla F(w), w' - w \rangle.$$
If $F$ is twice differentiable and $\nabla^2 F$ is the Hessian matrix, convexity is equivalent to $\nabla^2 F(w)$ being positive semidefinite for all $w \in \mathbb{R}^d$. If a function is convex and differentiable, then $\nabla F(w) = 0$ implies that $w$ is a global minimizer.

Strict convexity. $F$ is strictly convex if
$$(\forall w \ne w' \in \mathrm{dom}\, F)(\forall \lambda \in (0,1)) \quad F(\lambda w + (1-\lambda) w') < \lambda F(w) + (1-\lambda) F(w').$$
If $F$ is differentiable, we can write an equivalent characterization of strict convexity based on the gradient:
$$(\forall w \ne w' \in \mathbb{R}^d) \quad F(w') > F(w) + \langle \nabla F(w), w' - w \rangle.$$
If $F$ is twice differentiable and $\nabla^2 F$ is the Hessian matrix, strict convexity is implied by $\nabla^2 F(w)$ being positive definite for all $w \in \mathbb{R}^d$. The minimizer of a strictly convex function is unique (if it exists).

Strong convexity. $F$ is $\mu$-strongly convex if the function $F - \frac{\mu}{2}\|\cdot\|^2$ is convex, i.e.
$$(\forall w, w' \in \mathrm{dom}\, F)(\forall \lambda \in [0,1]) \quad F(\lambda w + (1-\lambda) w') \le \lambda F(w) + (1-\lambda) F(w') - \frac{\mu}{2}\lambda(1-\lambda)\|w - w'\|^2.$$
If $F$ is differentiable, then strong convexity is equivalent to
$$(\forall w, w' \in \mathbb{R}^d) \quad F(w') \ge F(w) + \langle \nabla F(w), w' - w \rangle + \frac{\mu}{2}\|w' - w\|^2.$$
If $F$ is twice differentiable and $\nabla^2 F$ is the Hessian matrix, strong convexity is equivalent to $\nabla^2 F(w) \succeq \mu I$ for all $w \in \mathbb{R}^d$. If $F$ is strongly convex then it is coercive; therefore, if it is also lower semicontinuous, it admits a unique minimizer $w^*$. Moreover,
$$F(w) - F(w^*) \ge \frac{\mu}{2}\|w - w^*\|^2.$$

Lipschitz continuity of the gradient. We will often assume that the gradient is Lipschitz continuous,
$$\|\nabla F(w) - \nabla F(w')\| \le L\|w - w'\|.$$
This gives a useful quadratic upper bound on $F$:
$$F(w') \le F(w) + \langle \nabla F(w), w' - w \rangle + \frac{L}{2}\|w' - w\|^2 \qquad (\forall w, w' \in \mathrm{dom}\, F). \tag{1}$$
Moreover, for every $w \in \mathrm{dom}\, F$ and every minimizer $w^*$,
$$\frac{1}{2L}\|\nabla F(w)\|^2 \le F(w) - F(w^*) \le \frac{L}{2}\|w - w^*\|^2.$$
The second inequality follows by substituting $w' = w$ and $w = w^*$ in the quadratic upper bound; the first follows by substituting $w' = w - \frac{1}{L}\nabla F(w)$.
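As a concrete instance, for the least squares objective $F(c) = \frac{1}{2}\|Kc - y\|^2$ the gradient is $\nabla F(c) = K^T(Kc - y)$ and one can take $L$ equal to the largest eigenvalue of $K^T K$. The following minimal Python sketch checks the quadratic upper bound (1) on synthetic data (the factor $\frac{1}{2}$ in $F$, the matrix $K$ and the test points are illustrative choices, not taken from the notes):

```python
import numpy as np

# Least-squares objective F(c) = 0.5 * ||K c - y||^2 on synthetic data.
# Its gradient is grad F(c) = K^T (K c - y), which is Lipschitz with
# constant L = largest eigenvalue of K^T K = ||K||_2^2.
rng = np.random.default_rng(0)
K = rng.standard_normal((50, 10))
y = rng.standard_normal(50)

def F(c):
    return 0.5 * np.linalg.norm(K @ c - y) ** 2

def grad_F(c):
    return K.T @ (K @ c - y)

L = np.linalg.norm(K, 2) ** 2  # spectral norm squared

# Check the quadratic upper bound (1) at two arbitrary points.
w = rng.standard_normal(10)
w2 = rng.standard_normal(10)
upper = F(w) + grad_F(w) @ (w2 - w) + 0.5 * L * np.linalg.norm(w2 - w) ** 2
print(F(w2) <= upper + 1e-12)  # True
```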

2 Convergence of the gradient method with constant step-size

Assume $F$ to be convex and differentiable with $L$-Lipschitz continuous gradient, and that a minimizer exists. The first order necessary condition is $\nabla F(w^*) = 0$. Therefore
$$w^* - \alpha \nabla F(w^*) = w^*.$$
This suggests an algorithm based on the fixed point iteration
$$w_{k+1} = w_k - \alpha \nabla F(w_k).$$
We want to study convergence of this algorithm. Convergence can be intended in two senses: towards the minimum value or towards a minimizer. We start from the first one. There are different strategies to choose the stepsize; here we keep $\alpha$ fixed and determine a priori conditions guaranteeing convergence. From the quadratic upper bound (1) we get
$$F(w_{k+1}) \le F(w_k) - \alpha\|\nabla F(w_k)\|^2 + \frac{L\alpha^2}{2}\|\nabla F(w_k)\|^2 = F(w_k) - \alpha\Big(1 - \frac{L\alpha}{2}\Big)\|\nabla F(w_k)\|^2.$$
If $0 < \alpha < 2/L$, the iteration decreases the function value. Choose $\alpha = 1/L$ (which gives the maximum decrease) and get
$$\begin{aligned}
F(w_{k+1}) &\le F(w_k) - \frac{1}{2L}\|\nabla F(w_k)\|^2 \\
&\le F(w^*) + \langle \nabla F(w_k), w_k - w^* \rangle - \frac{1}{2L}\|\nabla F(w_k)\|^2 \\
&= F(w^*) + \frac{L}{2}\Big( \frac{2}{L}\langle \nabla F(w_k), w_k - w^* \rangle - \frac{1}{L^2}\|\nabla F(w_k)\|^2 - \|w_k - w^*\|^2 + \|w_k - w^*\|^2 \Big) \\
&= F(w^*) + \frac{L}{2}\Big( \|w_k - w^*\|^2 - \big\|w_k - \tfrac{1}{L}\nabla F(w_k) - w^*\big\|^2 \Big) \\
&= F(w^*) + \frac{L}{2}\Big( \|w_k - w^*\|^2 - \|w_{k+1} - w^*\|^2 \Big).
\end{aligned}$$
Summing the above inequality for $k = 0, \dots, K-1$ we get
$$\sum_{k=0}^{K-1}\big(F(w_{k+1}) - F(w^*)\big) \le \frac{L}{2}\sum_{k=0}^{K-1}\big(\|w_k - w^*\|^2 - \|w_{k+1} - w^*\|^2\big) \le \frac{L}{2}\|w_0 - w^*\|^2.$$
Noting that $F(w_k)$ is decreasing, $F(w_K) - F(w^*) \le F(w_{k+1}) - F(w^*)$ for every $k \le K-1$, and therefore we obtain
$$F(w_K) - F(w^*) \le \frac{L}{2K}\|w_0 - w^*\|^2.$$
This is called a sublinear rate of convergence. For strongly convex functions, it is possible to prove that the operator $I - \alpha\nabla F$ is a contraction, and therefore we get a linear convergence rate:
$$\|w_K - w^*\|^2 \le \Big(\frac{L-\mu}{L+\mu}\Big)^{2K}\|w_0 - w^*\|^2,$$
which gives, using the bound following (1),
$$F(w_K) - F(w^*) \le \frac{L}{2}\Big(\frac{L-\mu}{L+\mu}\Big)^{2K}\|w_0 - w^*\|^2,$$
which is much better.
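A minimal Python sketch of this iteration with $\alpha = 1/L$ on a synthetic least squares problem (the data and the number of iterations are illustrative choices, not taken from the notes):

```python
import numpy as np

# Gradient descent with constant step alpha = 1/L on a synthetic
# least-squares objective F(w) = 0.5 * ||X w - y||^2.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = rng.standard_normal(100)
L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the gradient

def F(w):
    return 0.5 * np.linalg.norm(X @ w - y) ** 2

w = np.zeros(20)                     # w_0
for k in range(500):
    w = w - (X.T @ (X @ w - y)) / L  # w_{k+1} = w_k - (1/L) grad F(w_k)

w_star = np.linalg.lstsq(X, y, rcond=None)[0]   # a minimizer, for comparison
print(F(w) - F(w_star))              # objective gap after 500 iterations
```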

It is known that there exist convex problems with Lipschitz continuous gradient on which the error of any first order method after $k$ steps is lower bounded by a quantity of order $1/k^2$. Nesterov in 1983 devised an algorithm reaching this lower bound. The algorithm is called accelerated gradient descent and is very similar to the gradient method; it needs to store two iterates instead of only one. It is of the form
$$w_{k+1} = u_k - \frac{1}{L}\nabla F(u_k), \qquad u_{k+1} = a_k w_k + b_k w_{k+1},$$
for some $w_0 \in \mathrm{dom}\, F$, with $u_1 = w_0$ and a suitable (a priori determined) sequence of parameters $a_k$ and $b_k$. More precisely, choose $w_0 \in \mathrm{dom}\, F$, set $u_1 = w_0$ and $t_1 = 1$, and then define
$$w_{k+1} = u_k - \frac{1}{L}\nabla F(u_k), \qquad t_{k+1} = \frac{1 + \sqrt{1 + 4t_k^2}}{2}, \qquad u_{k+1} = \Big(1 + \frac{t_k - 1}{t_{k+1}}\Big) w_{k+1} - \frac{t_k - 1}{t_{k+1}}\, w_k.$$
We obtain
$$F(w_k) - F(w^*) \le \frac{L\|w_0 - w^*\|^2}{2k^2}.$$
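A minimal Python sketch of the accelerated iteration above, again on a synthetic least squares problem (data and iteration count are illustrative choices, not taken from the notes):

```python
import numpy as np

# Accelerated gradient method (as described above) on a synthetic
# least-squares objective F(w) = 0.5 * ||X w - y||^2, with step 1/L.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = rng.standard_normal(100)
L = np.linalg.norm(X, 2) ** 2

def grad_F(w):
    return X.T @ (X @ w - y)

w = np.zeros(20)      # w_0
u = w.copy()          # u_1 = w_0
t = 1.0               # t_1 = 1
for k in range(200):
    w_next = u - grad_F(u) / L                        # gradient step at the extrapolated point
    t_next = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
    u = w_next + ((t - 1.0) / t_next) * (w_next - w)  # extrapolation (momentum) step
    w, t = w_next, t_next

print(0.5 * np.linalg.norm(X @ w - y) ** 2)           # objective value after 200 steps
```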

3 Regularized optimization

We often want to minimize
$$\min_{w\in\mathbb{R}^d} F(w) + R(w),$$
where either $F$ is smooth (e.g. the square loss) and $R$ is convex and nonsmooth, or $R$ is smooth and $F$ is not (as for the SVM). We would like to write a condition similar to $\nabla F(w^*) = 0$ to characterize a minimizer. We use the subdifferential. Let $R$ be a convex, lsc, proper function. A vector $\eta\in\mathbb{R}^d$ is a subgradient of $R$ at $w$ if
$$R(w') \ge R(w) + \langle\eta, w' - w\rangle \qquad \text{for all } w'.$$
The subdifferential $\partial R(w)$ is the set of all subgradients. It is easy to see that $R(w^*) = \min R$ if and only if $0\in\partial R(w^*)$. If $R$ is differentiable, the subdifferential is a singleton and coincides with the gradient.

Example 3.1 (Subdifferential of the indicator function). Let $i_C$ be the indicator function of a convex set $C$ (constrained regularization). If $w\notin C$, then $\partial i_C(w) = \emptyset$. If $w\in C$, then $\eta\in\partial i_C(w)$ if and only if, for all $v\in C$,
$$i_C(v) \ge i_C(w) + \langle\eta, v - w\rangle \iff 0 \ge \langle\eta, v - w\rangle.$$
This set is the normal cone to $C$ at $w$.

Example 3.2 (Subdifferential of $R = \|\cdot\|_1$). If $\eta$ is such that, for all $i = 1,\dots,d$,
$$|v_i| - |w_i| \ge \eta_i(v_i - w_i),$$
then summing over $i$ gives
$$\sum_{i=1}^d |v_i| \ge \sum_{i=1}^d |w_i| + \langle\eta, v - w\rangle,$$
and hence $\eta\in\partial R(w)$. Vice versa, taking $v_j = w_j$ for all $j\ne i$, we get that $\eta\in\partial R(w)$ implies $|v_i| - |w_i| \ge \eta_i(v_i - w_i)$, and thus $\eta_i\in\partial|\cdot|(w_i)$. We therefore proved that
$$\partial R(w) = \partial|\cdot|(w_1)\times\cdots\times\partial|\cdot|(w_d).$$

Proximity operator. Let $R$ be lsc, convex and proper. Then
$$\mathrm{prox}_R(v) = \operatorname*{argmin}_{w\in\mathbb{R}^d}\Big\{R(w) + \frac{1}{2}\|w - v\|^2\Big\}$$
is well defined, i.e. the minimizer exists and is unique. Imposing the first order necessary conditions, we get
$$u = \mathrm{prox}_R(v) \iff 0\in\partial R(u) + (u - v) \iff v - u\in\partial R(u) \iff u = (I + \partial R)^{-1}(v).$$

Example 3.3.
i) If $R = 0$, then $\mathrm{prox}_R(v) = v$.
ii) If $R = i_C$, directly from the definition $\mathrm{prox}_R(v) = P_C(v)$, the projection onto $C$.
iii) Proximity operator of the $\ell^1$ norm. Let $\lambda > 0$ and set $R = \lambda\|\cdot\|_1$. Let $v\in\mathbb{R}^d$ and $u = \mathrm{prox}_R(v)$. Then $v - u\in\lambda\partial\|\cdot\|_1(u)$, i.e. $u = (I + \lambda\partial\|\cdot\|_1)^{-1}(v)$. Since the subdifferential can be computed componentwise, by the previous example this is equivalent to $u_i = (I + \lambda\partial|\cdot|)^{-1}(v_i)$. To compute this quantity, first note that
$$\big((I + \lambda\partial|\cdot|)(u)\big)_i = \begin{cases} u_i + \lambda & \text{if } u_i > 0,\\ [-\lambda, \lambda] & \text{if } u_i = 0,\\ u_i - \lambda & \text{if } u_i < 0.\end{cases}$$
Inverting this relationship we get the soft-thresholding operator
$$\big(\mathrm{prox}_{\lambda\|\cdot\|_1}(v)\big)_i = \begin{cases} v_i - \lambda & \text{if } v_i > \lambda,\\ 0 & \text{if } v_i\in[-\lambda, \lambda],\\ v_i + \lambda & \text{if } v_i < -\lambda.\end{cases}$$
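A minimal Python sketch of the soft-thresholding operator above (the function name prox_l1 and the test vector are illustrative choices, not taken from the notes):

```python
import numpy as np

# Soft-thresholding: the proximity operator of lam * ||.||_1,
# applied componentwise as in the formula above.
def prox_l1(v, lam):
    # v - lam where v > lam, 0 on [-lam, lam], v + lam where v < -lam
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

v = np.array([3.0, -0.5, 0.2, -2.0])
print(prox_l1(v, lam=1.0))  # [ 2. -0.  0. -1.]
```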

4 Basic proximal algorithm (forward-backward splitting)

Assume that $F$ is convex and differentiable with Lipschitz continuous gradient. As for gradient descent, the idea is to start from a fixed point equation characterizing the minimizer. If we write the first order conditions, we get
$$0\in\nabla F(w^*) + \partial R(w^*) \iff -\alpha\nabla F(w^*)\in\alpha\,\partial R(w^*) \iff w^* - \alpha\nabla F(w^*)\in w^* + \alpha\,\partial R(w^*) \iff w^* = \mathrm{prox}_{\alpha R}\big(w^* - \alpha\nabla F(w^*)\big).$$
We consider the fixed point iteration
$$w_{k+1} = \mathrm{prox}_{\alpha_k R}\big(w_k - \alpha_k\nabla F(w_k)\big).$$
Another interpretation:
$$w_{k+1} = \operatorname*{argmin}_w\Big\{\alpha_k R(w) + \frac{1}{2}\big\|w - (w_k - \alpha_k\nabla F(w_k))\big\|^2\Big\} = \operatorname*{argmin}_w\Big\{R(w) + \frac{1}{2\alpha_k}\|w - w_k\|^2 + \langle w - w_k, \nabla F(w_k)\rangle + F(w_k)\Big\}.$$
Special cases: $R = 0$ (gradient method), $R = i_C$ (projected gradient method). The proof of convergence for the sequence of objective values with $\alpha_k = 1/L$ is similar to the proof of convergence for the differentiable case. The rate of convergence is the same as in the differentiable case (this would not be the case if a subgradient method were used):
$$F(w_k) - F(w^*) \le \frac{L\|w_0 - w^*\|^2}{2k}.$$

Convergence proof. Set $\alpha_k = 1/L$ and define the gradient mapping as
$$G_{1/L}(w) = L\Big(w - \mathrm{prox}_{R/L}\big(w - \tfrac{1}{L}\nabla F(w)\big)\Big).$$
Then
$$w_{k+1} = w_k - \frac{1}{L}G_{1/L}(w_k).$$
Note that $G_{1/L}$ is not a gradient or a subgradient of $F + R$, but is called the gradient mapping. By writing the first order condition for the prox operator, we get
$$G_{1/L}(w)\in\nabla F(w) + \partial R\big(w - \tfrac{1}{L}G_{1/L}(w)\big).$$
Recalling the upper bound (1), we obtain
$$F\big(w - \tfrac{1}{L}G_{1/L}(w)\big) \le F(w) - \frac{1}{L}\langle\nabla F(w), G_{1/L}(w)\rangle + \frac{1}{2L}\|G_{1/L}(w)\|^2. \tag{2}$$
If inequality (2) holds, then for every $v\in\mathbb{R}^d$
$$(F + R)\big(w - \tfrac{1}{L}G_{1/L}(w)\big) \le (F + R)(v) + \langle G_{1/L}(w), w - v\rangle - \frac{1}{2L}\|G_{1/L}(w)\|^2,$$
and the proof then proceeds as in the differentiable case.

Accelerated versions. The same acceleration used for the gradient method can be applied to the forward-backward algorithm. The practical issue is that the forward-backward algorithm is effective only when the prox is easy to compute. Note indeed that we replaced our original problem with a sequence of new minimization problems; they are strongly convex (therefore easier), but in general not solvable in closed form.
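A minimal Python sketch of the forward-backward iteration with constant step $1/L$ for the lasso problem $\min_w \frac{1}{2}\|Xw - y\|^2 + \lambda\|w\|_1$ (the data, the value of $\lambda$ and the iteration count are illustrative choices, not taken from the notes):

```python
import numpy as np

# Forward-backward (proximal gradient) iteration with constant step 1/L
# for the lasso problem  min_w 0.5 * ||X w - y||^2 + lam * ||w||_1.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = rng.standard_normal(100)
lam = 0.5
L = np.linalg.norm(X, 2) ** 2

def prox_l1(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

w = np.zeros(20)
for k in range(1000):
    grad = X.T @ (X @ w - y)            # forward (gradient) step on the smooth part
    w = prox_l1(w - grad / L, lam / L)  # backward (prox) step on the nonsmooth part

print(np.count_nonzero(w), "nonzero coefficients")
```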

5 Fenchel conjugate and Moreau decomposition

Fenchel conjugate. The Fenchel conjugate of $R$ is the function $R^*: \mathbb{R}^d\to\mathbb{R}\cup\{+\infty\}$ defined as
$$R^*(\eta) = \sup_{w\in\mathbb{R}^d}\{\langle\eta, w\rangle - R(w)\}.$$
$R^*$ is a convex function (even if $R$ is not), since it is the pointwise supremum of convex (affine) functions.

Examples.
1. Conjugate of an affine function. If $R(w) = \langle a, w\rangle + b$, then $R^*(\eta) = -b + i_{\{a\}}(\eta)$, an indicator function.
2. The conjugate of an indicator function $i_C$ is the support function $\sup_{u\in C}\langle u, \cdot\rangle$.
3. Conjugate of a norm $R(w) = \|w\|$. In this case define the dual norm $\|\eta\|_* = \sup_{\|w\|\le 1}\langle\eta, w\rangle$. Then $R^* = i_{B_*}$, where $B_* = \{\eta : \|\eta\|_*\le 1\}$ (for the $\ell^1$ norm, the dual norm is the $\ell^\infty$ norm). Indeed, for $\eta\in\mathbb{R}^d$,
$$R^*(\eta) = \sup_{w\in\mathbb{R}^d}\{\langle\eta, w\rangle - \|w\|\} = \sup_{t\in\mathbb{R}_+}\sup_{\|w\|=1}\{\langle\eta, tw\rangle - t\|w\|\} = \sup_{t\in\mathbb{R}_+} t\big(\|\eta\|_* - 1\big),$$
and thus $R^*(\eta) = i_{B_*}(\eta)$.

By definition, $R^*(\eta) + R(w) \ge \langle\eta, w\rangle$ for all $\eta, w\in\mathbb{R}^d$ (Fenchel-Young inequality). Moreover,
$$R(w) + R^*(\eta) = \langle\eta, w\rangle \iff \eta\in\partial R(w) \iff w\in\partial R^*(\eta).$$
Indeed, $R^*(\eta) = \langle\eta, w\rangle - R(w)$ if and only if $\langle\eta, w'\rangle - R(w') \le \langle\eta, w\rangle - R(w)$ for every $w'$, if and only if $\eta\in\partial R(w)$. Moreover, from $R(w) + R^*(\eta) = \langle w, \eta\rangle$ we get, for every $\eta'$,
$$R^*(\eta') = \sup_u\{\langle\eta', u\rangle - R(u)\} \ge \langle\eta', w\rangle - R(w) = \langle\eta' - \eta, w\rangle + \langle\eta, w\rangle - R(w) = \langle\eta' - \eta, w\rangle + R^*(\eta),$$
that is, $w\in\partial R^*(\eta)$. If $R$ is lsc and convex, then $R^{**} = R$ (which gives the other implication).

Moreau decomposition. For every $w$,
$$w = \mathrm{prox}_R(w) + \mathrm{prox}_{R^*}(w).$$
It follows from the properties of the subdifferential and of the conjugate stated above:
$$u = \mathrm{prox}_R(w) \iff w - u\in\partial R(u) \iff u\in\partial R^*(w - u) \iff w - (w - u)\in\partial R^*(w - u) \iff w - u = \mathrm{prox}_{R^*}(w).$$
Note that this is a generalization of the classical decomposition into orthogonal components: if $V$ is a linear subspace and $V^\perp$ is its orthogonal complement, we know that $w = P_V(w) + P_{V^\perp}(w)$. This is a special case of the Moreau decomposition obtained by choosing $R = i_V$ (and noting that $R^* = i_{V^\perp}$).

Properties of the proximity operator.
Separable sum: if $R(w) = R_1(w_1) + R_2(w_2)$, then $\mathrm{prox}_R(w) = \big(\mathrm{prox}_{R_1}(w_1), \mathrm{prox}_{R_2}(w_2)\big)$.
Scaling: $\mathrm{prox}_{R + \frac{\mu}{2}\|\cdot\|^2}(v) = \mathrm{prox}_{\frac{1}{1+\mu}R}\Big(\dfrac{v}{1+\mu}\Big)$.
Generalized Moreau decomposition: for every $\lambda > 0$,
$$w = \mathrm{prox}_{\lambda R}(w) + \lambda\,\mathrm{prox}_{R^*/\lambda}(w/\lambda).$$
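A minimal numerical check of the Moreau decomposition in the case $R = \|\cdot\|_1$, for which $R^*$ is the indicator of the $\ell^\infty$ unit ball and $\mathrm{prox}_{R^*}$ is the projection onto that ball (the function names and the test vector are illustrative choices, not taken from the notes):

```python
import numpy as np

# Numerical check of the Moreau decomposition w = prox_R(w) + prox_{R*}(w)
# for R = ||.||_1, whose conjugate is the indicator of the l-infinity unit
# ball, so that prox_{R*} is the projection onto that ball.
def prox_l1(v, lam=1.0):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def proj_linf_ball(v, radius=1.0):
    return np.clip(v, -radius, radius)

rng = np.random.default_rng(0)
w = 3.0 * rng.standard_normal(5)
print(np.allclose(w, prox_l1(w) + proj_linf_ball(w)))  # True
```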

Sometimes the Moreau decomposition is useful to compute proximity operators. Let $R(w) = \lambda\|w\|$ for a norm $\|\cdot\|$. We have seen that $R^* = i_{B_*(\lambda)}$, the indicator of the ball of radius $\lambda$ of the dual norm. Therefore, from the Moreau decomposition, we get
$$\mathrm{prox}_R(w) = w - P_{B_*(\lambda)}(w).$$
In particular, if $R = \lambda\|\cdot\|_1$, noting that the dual norm of $\|\cdot\|_1$ is $\|\cdot\|_\infty$, we obtain again the soft-thresholding formula seen before.

Elastic-net. Combining the scaling property above with the soft-thresholding operator gives the proximity operator of the elastic-net penalty $\lambda\|w\|_1 + \frac{\mu}{2}\|w\|^2$.

Group lasso. Let $\mathcal{G} = \{G_1, \dots, G_t\}$ be a partition of the indices $\{1, \dots, d\}$. The following norm is called the group lasso penalty:
$$R(w) = \sum_{i=1}^t \|w_{G_i}\|, \qquad \text{where } \|w_{G_i}\|^2 = \sum_{j\in G_i} w_j^2.$$
The dual norm is $\max_{j=1,\dots,t}\|w_{G_j}\|$, and therefore $\mathrm{prox}_R(w) = w - P_B(w)$, where $B = \{w\in\mathbb{R}^d : \|w_{G_j}\|\le 1,\ j = 1,\dots,t\}$. The projection onto this set can be expressed blockwise as
$$\big(P_B(w)\big)_{G_j} = \begin{cases} w_{G_j} & \text{if } \|w_{G_j}\|\le 1,\\[4pt] \dfrac{w_{G_j}}{\|w_{G_j}\|} & \text{otherwise.}\end{cases}$$
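A minimal Python sketch of the resulting group-wise shrinkage, here written for a general $\lambda > 0$ via $\mathrm{prox}_{\lambda R}(w) = w - P_{B(\lambda)}(w)$ with $B(\lambda)$ the dual-norm ball of radius $\lambda$ (the function name, the groups and the test vector are illustrative choices, not taken from the notes):

```python
import numpy as np

# Proximity operator of the group lasso penalty lam * sum_i ||w_{G_i}||,
# computed block by block: each block is set to zero if its norm is <= lam,
# and shrunk radially otherwise (w_G - P_{B(lam)}(w_G)).
def prox_group_lasso(w, groups, lam):
    u = w.copy()
    for G in groups:                    # groups: list of index arrays forming a partition
        norm = np.linalg.norm(w[G])
        u[G] = 0.0 if norm <= lam else (1.0 - lam / norm) * w[G]
    return u

w = np.array([3.0, 4.0, 0.1, -0.1, 1.0])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4])]
print(prox_group_lasso(w, groups, lam=1.0))  # large block shrunk, small blocks set to zero
```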

6 Computation of the proximity operator of matrix functions

Let $W\in\mathbb{R}^{D\times T}$ and let $\sigma: \mathbb{R}^{D\times T}\to\mathbb{R}^q$, where $q = \min\{D, T\}$ and $\sigma_1(W) \ge \sigma_2(W) \ge \dots \ge \sigma_q(W) \ge 0$ are the singular values of the matrix $W$. We consider regularization terms of the form $R(W) = g(\sigma(W))$, where $g$ is absolutely symmetric, namely $g(\alpha) = g(\hat\alpha)$, where $\hat\alpha$ is the vector obtained by ordering the absolute values of the components of $\alpha$ in decreasing order. We will show that the computation of the proximity operator of $R$ can be reduced to the computation of that of the function $g$. Indeed, if $W = U\,\mathrm{diag}(\sigma(W))\,V^T$ (with $U\in\mathbb{R}^{D\times D}$, $\mathrm{diag}(\sigma(W))\in\mathbb{R}^{D\times T}$, and $V\in\mathbb{R}^{T\times T}$), it holds that
$$\mathrm{prox}_{\lambda R}(W) = U\,\mathrm{diag}\big(\mathrm{prox}_{\lambda g}(\sigma(W))\big)\,V^T.$$
The proof of this statement is based on two results.

Theorem 6.1 (Von Neumann, 1937). For any $D\times T$ matrices $Z$ and $W$,
$$\max\{\langle UZV^T, W\rangle : U \text{ and } V \text{ orthogonal}\} = \langle\sigma(Z), \sigma(W)\rangle, \qquad \text{and hence} \qquad \langle Z, W\rangle \le \langle\sigma(Z), \sigma(W)\rangle.$$
Equality holds if and only if there exists a simultaneous SVD of $Z$ and $W$.

Theorem 6.2 (Conjugacy formula). If $g: \mathbb{R}^q\to\,]-\infty, +\infty]$ is absolutely symmetric, then $(g\circ\sigma)^* = g^*\circ\sigma$.

Proof. Let $Z\in\mathbb{R}^{D\times T}$. Then
$$\begin{aligned}
(g\circ\sigma)^*(Z) &= \sup_{W\in\mathbb{R}^{D\times T}}\big\{\langle Z, W\rangle - g(\sigma(W))\big\}\\
&= \sup_{U, V \text{ orth.},\ A\in\mathbb{R}^{D\times T}}\big\{\langle Z, UAV^T\rangle - g(\sigma(A))\big\}\\
&= \sup_{A\in\mathbb{R}^{D\times T}}\Big\{\sup_{U, V \text{ orth.}}\langle U^T Z V, A\rangle - g(\sigma(A))\Big\}\\
&= \sup_{A\in\mathbb{R}^{D\times T}}\big\{\langle\sigma(Z), \sigma(A)\rangle - g(\sigma(A))\big\}\\
&= \sup_{\alpha\in\mathbb{R}^q}\big\{\langle\sigma(Z), \alpha\rangle - g(\alpha)\big\} = g^*(\sigma(Z)).
\end{aligned}$$

Using the two previous results it is easy to show a formula for the subdifferential of the function $R$.

Theorem 6.3. Let $g: \mathbb{R}^q\to\,]-\infty, +\infty]$ be absolutely symmetric. Then
$$\partial(g\circ\sigma)(W) = \big\{U\,\mathrm{diag}(\alpha)\,V^T : \alpha\in\partial g(\sigma(W)),\ W = U\,\mathrm{diag}(\sigma(W))\,V^T,\ U \text{ and } V \text{ orthogonal}\big\}.$$

Proof. By definition of the subdifferential, the Fenchel conjugate, and the previous theorem,
$$\partial(g\circ\sigma)(W) = \big\{Z : (g\circ\sigma)(W) + (g\circ\sigma)^*(Z) = \langle W, Z\rangle\big\} = \big\{Z : (g\circ\sigma)(W) + (g^*\circ\sigma)(Z) = \langle W, Z\rangle\big\}.$$
Now, by the Fenchel-Young inequality and the Von Neumann theorem,
$$(g\circ\sigma)(W) + (g^*\circ\sigma)(Z) \ge \langle\sigma(W), \sigma(Z)\rangle \ge \langle W, Z\rangle.$$
Let $Z\in\partial(g\circ\sigma)(W)$. Then we must have $\langle\sigma(W), \sigma(Z)\rangle = \langle W, Z\rangle$ and hence, by the Von Neumann theorem, $Z$ and $W$ admit a simultaneous singular value decomposition. Moreover, $g(\sigma(W)) + g^*(\sigma(Z)) = \langle\sigma(W), \sigma(Z)\rangle$, and therefore $\sigma(Z)\in\partial g(\sigma(W))$. This proves one inclusion.

Let us prove the other one. Take $\alpha\in\partial g(\sigma(W))$ and define $Z = U\,\mathrm{diag}(\alpha)\,V^T$, where $U$ and $V$ are orthogonal matrices such that $W = U\,\mathrm{diag}(\sigma(W))\,V^T$. Then $W$ and $Z$ have a simultaneous singular value decomposition, and hence
$$(g\circ\sigma)(W) + (g\circ\sigma)^*(Z) = g(\sigma(W)) + g^*(\alpha) = \langle\sigma(W), \alpha\rangle = \langle\sigma(W), \sigma(Z)\rangle = \langle W, Z\rangle.$$
This implies that $Z\in\partial(g\circ\sigma)(W)$.

Theorem 6.4. Let $R = g\circ\sigma$, with $g: \mathbb{R}^q\to\,]-\infty, +\infty]$ absolutely symmetric, let $W = U\,\mathrm{diag}(\sigma(W))\,V^T$ with $U$ and $V$ orthogonal, and let $\lambda > 0$. Then
$$\mathrm{prox}_{\lambda R}(W) = U\,\mathrm{diag}\big(\mathrm{prox}_{\lambda g}(\sigma(W))\big)\,V^T.$$

Proof. By definition, $\bar Z = \mathrm{prox}_{\lambda R}(W)$ is the unique minimizer of
$$Z \mapsto R(Z) + \frac{1}{2\lambda}\|Z - W\|^2,$$
and is thus the unique point in $\mathbb{R}^{D\times T}$ satisfying
$$\frac{1}{\lambda}(W - \bar Z)\in\partial R(\bar Z).$$
Now, let $Z = U\,\mathrm{diag}\big(\mathrm{prox}_{\lambda g}(\sigma(W))\big)\,V^T$ and let $\alpha = \mathrm{prox}_{\lambda g}(\sigma(W))$. By definition of $\mathrm{prox}_{\lambda g}$, we have
$$\frac{1}{\lambda}\big(\sigma(W) - \alpha\big)\in\partial g(\alpha).$$
Using the characterization of the subdifferential given in Theorem 6.3, we get
$$\frac{1}{\lambda}\,U\,\mathrm{diag}\big(\sigma(W) - \alpha\big)\,V^T \in \partial(g\circ\sigma)(Z),$$
and the conclusion follows by noting that $\bar Z = Z$, since the left hand side coincides with $(W - Z)/\lambda$.

Using these general results it is easy to compute the proximity operator of the nuclear norm (a.k.a. trace class norm, Schatten 1-norm):
$$R(W) = \|W\|_1 = \sum_{i=1}^q \sigma_i(W).$$
The function $\|\cdot\|_1$ is the composition of the absolutely symmetric function $g: \mathbb{R}^q\to\mathbb{R}$, $g(\alpha) = \sum_{i=1}^q |\alpha_i|$, with $\sigma$. The computation of the prox of $R$ is thus reduced to the computation of the prox of the $\ell^1$ norm in $\mathbb{R}^q$. We already know that $\mathrm{prox}_{\lambda g}(\alpha) = S_\lambda(\alpha)$, the soft-thresholding operator, and finally we get
$$\mathrm{prox}_{\lambda R}(W) = U\,\mathrm{diag}\big(S_\lambda(\sigma(W))\big)\,V^T, \qquad \text{where } W = U\,\mathrm{diag}(\sigma(W))\,V^T.$$
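A minimal Python sketch of the resulting singular value soft-thresholding: compute an SVD, apply $S_\lambda$ to the singular values, and rebuild the matrix (the function name, the test matrix and $\lambda$ are illustrative choices, not taken from the notes):

```python
import numpy as np

# Proximity operator of the nuclear norm: soft-threshold the singular values
# and rebuild the matrix, prox_{lam R}(W) = U diag(S_lam(sigma(W))) V^T.
def prox_nuclear(W, lam):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)   # soft-thresholding of the singular values
    return U @ np.diag(s_shrunk) @ Vt

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))
Z = prox_nuclear(W, lam=1.0)
print(np.linalg.matrix_rank(Z), "<=", np.linalg.matrix_rank(W))  # thresholding can reduce the rank
```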

7 References

[A] P. L. Combettes and J.-C. Pesquet. Proximal splitting methods in signal processing, 2009.
[B] P. L. Combettes and V. R. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Modeling and Simulation, 2005.
[C] A. S. Lewis. The convex analysis of unitarily invariant matrix functions, 1995.
[D] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course.
[E] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences.
