Computational Statistics and Optimisation. Joseph Salmon, Télécom ParisTech, Institut Mines-Télécom

1 Computational Statistics and Optimisation. Joseph Salmon, Télécom ParisTech, Institut Mines-Télécom

2 Plan: Duality gap and stopping criterion; Back to gradient descent analysis; Forward-backward analysis; Forward-backward accelerated; Coordinate descent

3 Fenchel Duality for stopping criterion
F objective function; fix ε > 0 small, and stop when
    |F(θ^{t+1}) - F(θ^t)| / F(θ^t) ≤ ε   or   ∥∇F(θ^t)∥ ≤ ε
Alternative: leverage the duality gap.
Notation: F(θ) = f(Xθ) + g(θ) with f : R^n → R, g : R^p → R and X an n × p matrix.
Fenchel duality: consider the problem min_θ F(θ); then the following holds:
    sup_u { -f*(u) - g*(-X^T u) } ≤ inf_θ { f(Xθ) + g(θ) }
Moreover, if f and g are convex, then under mild assumptions equality of both sides holds (strong duality, no duality gap).
Proof: use the Fenchel-Young inequality (TO DO: Blackboard)

4 Fenchel Duality
We denote by θ* a primal optimal solution of inf_θ { f(Xθ) + g(θ) }, and by u* a dual solution of sup_u { -f*(u) - g*(-X^T u) }.
Define the duality gap by: Δ(θ, u) := F(θ) + f*(u) + g*(-X^T u)
Property of the duality gap: ∀u, Δ(θ, u) ≥ F(θ) - F(θ*) ≥ 0
Proof: Fenchel duality applied to a primal solution θ*.
Motivation: Δ(θ, u) ≤ ε ⇒ F(θ) - F(θ*) ≤ ε

5 Example: Duality gap for the Lasso
Lasso objective: F(θ) = (1/2)∥Xθ - y∥_2² + λ∥θ∥_1
f(z) = (1/2)∥z - y∥_2² ;  f*(u) = (1/2)∥u∥_2² + ⟨u, y⟩ (translation property)
g(θ) = λ∥θ∥_1 ;  g*(u) = ι_{∥u∥_∞ ≤ λ} (indicator of the ℓ_∞ ball)
Duality gap: Δ(θ, u) = F(θ) + f*(u) + g*(-X^T u) = F(θ) + (1/2)∥u∥_2² + ⟨u, y⟩ as soon as ∥X^T u∥_∞ ≤ λ; otherwise the bound is +∞, hence useless.
Rem: at optimal solutions and under mild assumptions, Δ(θ*, u*) = 0

6 Example: Duality gap for the Lasso (II)
Possible choice: θ^t (current iterate of any iterative algorithm), r_t = Xθ^t - y (minus the current residuals), u_t = µ_t r_t with µ_t = min(1, λ/∥X^T r_t∥_∞).
Motivation for this choice: at optimum, u* = ∇f(Xθ*).
Stopping criterion:
    (1/2)∥r_t∥_2² + λ∥θ^t∥_1 + (1/2)∥u_t∥_2² + ⟨u_t, y⟩ ≤ ε  ⇔  (1/2)(1 + µ_t²)∥r_t∥_2² + λ∥θ^t∥_1 + µ_t⟨r_t, y⟩ ≤ ε
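A minimal NumPy sketch of this duality-gap stopping criterion (function and variable names are mine; the small constant only guards against a zero residual):

```python
import numpy as np

def lasso_duality_gap(X, y, theta, lam):
    """Duality gap at theta, using the dual point u = mu * r with
    r = X @ theta - y and mu = min(1, lam / ||X.T @ r||_inf)."""
    r = X @ theta - y                                   # minus the residuals
    mu = min(1.0, lam / max(np.max(np.abs(X.T @ r)), 1e-12))
    primal = 0.5 * r @ r + lam * np.sum(np.abs(theta))  # F(theta)
    dual_term = 0.5 * mu ** 2 * (r @ r) + mu * (r @ y)  # f*(u) for u = mu * r
    return primal + dual_term
```

Inside an iterative solver, one would stop as soon as lasso_duality_gap(X, y, theta, lam) <= eps.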

7 Plan: Duality gap and stopping criterion; Back to gradient descent analysis; Forward-backward analysis; Forward-backward accelerated; Coordinate descent

8 Convergence: Lipschitz gradient
    θ^{t+1} = θ^t - α∇f(θ^t)
Convergence rate for fixed step size
Hypothesis: f convex, differentiable with L-Lipschitz gradient: ∀(θ, θ'), ∥∇f(θ) - ∇f(θ')∥ ≤ L∥θ - θ'∥
Result: for any minimizer θ* of f, if α ≤ 1/L then θ^t satisfies
    f(θ^T) - f(θ*) ≤ ∥θ^0 - θ*∥² / (2αT)
Rem: if f is twice differentiable, this means ∇²f(x) ⪯ L·Id
Example: θ ↦ (1/2)∥Xθ - y∥_2², then L = λ_max(X^T X) (spectral radius)
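As an illustration, a short NumPy sketch of this fixed-step gradient descent on the least-squares example above (the function name and the iteration count are mine):

```python
import numpy as np

def gradient_descent_ls(X, y, n_iter=100):
    """Gradient descent on f(theta) = 0.5 * ||X @ theta - y||^2
    with the fixed step alpha = 1 / L, L = lambda_max(X.T @ X)."""
    L = np.linalg.norm(X, ord=2) ** 2      # squared top singular value of X
    alpha = 1.0 / L
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y)       # gradient of f at theta
        theta = theta - alpha * grad
    return theta
```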

9 Convergence: proof
Point 1: an L-Lipschitz gradient implies a quadratic upper bound: ∀(x, y),
    f(y) ≤ f(x) + ⟨∇f(x), y - x⟩ + (L/2)∥y - x∥²
Point 2: use the definition θ^{t+1} = θ^t - α∇f(θ^t) and get
    f(θ^{t+1}) ≤ f(θ^t) - (1 - Lα/2) α ∥∇f(θ^t)∥²
Point 3: use convexity, 0 < α ≤ 1/L and ab = (a² + b² - (a - b)²)/2:
    f(θ^{t+1}) ≤ f(θ*) + ⟨∇f(θ^t), θ^t - θ*⟩ - (α/2)∥∇f(θ^t)∥² = f(θ*) + (1/(2α)) (∥θ^t - θ*∥² - ∥θ^{t+1} - θ*∥²)

10 Convergence proof (bis)
Point 4: telescopic sums:
    (1/T) Σ_{t=0}^{T-1} ( f(θ^{t+1}) - f(θ*) ) ≤ (1/(2αT)) (∥θ^0 - θ*∥² - ∥θ^T - θ*∥²) ≤ ∥θ^0 - θ*∥² / (2αT)
From Point 2, f(θ^{t+1}) ≤ f(θ^t), hence
    f(θ^T) - f(θ*) ≤ (1/T) Σ_{t=0}^{T-1} ( f(θ^{t+1}) - f(θ*) ) ≤ ∥θ^0 - θ*∥² / (2αT)

11 Plan: Duality gap and stopping criterion; Back to gradient descent analysis; Forward-backward analysis; Forward-backward accelerated; Coordinate descent

12 Composite minimization
One aims at minimizing F = f + g, where
    f is smooth: ∇f is L-Lipschitz,
    g is proximable (prox-capable): prox_{αg}(y) = argmin_{z ∈ R^p} { (1/2)∥z - y∥_2² + αg(z) } is efficiently computable.
Examples: projection onto a box constraint, (block) soft-thresholding operator, shrinkage operator (ridge)

13 Forward-Backward algorithm
Notation: φ_α(θ) := prox_{αg}(θ - α∇f(θ))
Forward-Backward algorithm
Input: initialization θ^0, step size α
Result: θ^T
while not converged do
    θ^{t+1} = φ_α(θ^t)
end
Rem: φ_α(θ) = argmin_{θ'} { f(θ) + ⟨∇f(θ), θ' - θ⟩ + (1/(2α))∥θ' - θ∥² + g(θ') }
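A minimal NumPy sketch of this scheme for the Lasso, where f is the least-squares term and the prox of g = λ∥·∥_1 is soft-thresholding (names and the iteration count are mine):

```python
import numpy as np

def forward_backward_lasso(X, y, lam, n_iter=200):
    """Forward-backward iterations for
    F(theta) = 0.5 * ||X @ theta - y||^2 + lam * ||theta||_1."""
    L = np.linalg.norm(X, ord=2) ** 2                   # Lipschitz constant of grad f
    alpha = 1.0 / L
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = theta - alpha * X.T @ (X @ theta - y)       # forward (gradient) step
        theta = np.sign(z) * np.maximum(np.abs(z) - alpha * lam, 0.0)  # backward (prox) step
    return theta
```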

14 Convergence: Lipschitz gradient
    θ^{t+1} = φ_α(θ^t) = prox_{αg}(θ^t - α∇f(θ^t))
Convergence rate for fixed step size
Hypothesis: f convex, differentiable with L-Lipschitz gradient: ∀(θ, θ'), ∥∇f(θ) - ∇f(θ')∥ ≤ L∥θ - θ'∥
Result: for any minimizer θ* of F, if α ≤ 1/L then θ^t satisfies
    F(θ^T) - F(θ*) ≤ ∥θ^0 - θ*∥² / (2αT)
Rem: same bound as in the case g = 0

15 Proof
Point 1: for α ≤ 1/L and x̂ = φ_α(x̄), for all y:
    F(x̂) + ∥x̂ - y∥_2²/(2α) ≤ F(y) + ∥x̄ - y∥_2²/(2α)
Proof: let H_α(y) = f(x̄) + ⟨∇f(x̄), y - x̄⟩ + (1/(2α))∥y - x̄∥² + g(y)
H_α is (1/α)-strongly convex, since α ≤ 1/L and H_α(·) - (1/(2α))∥·∥_2² is convex (cf. for instance page 280 of Hiriart-Urruty and Lemaréchal (1993))
x̂ = argmin_y H_α(y) (cf. two slides up)
By (1/α)-strong convexity: H_α(x̂) + (1/(2α))∥x̂ - y∥_2² ≤ H_α(y)

16 Point 1 (continued)
    g(x̂) + f(x̄) + ⟨∇f(x̄), x̂ - x̄⟩ + (1/(2α)) (∥x̂ - x̄∥² + ∥x̂ - y∥²) ≤ g(y) + f(x̄) + ⟨∇f(x̄), y - x̄⟩ + (1/(2α))∥y - x̄∥²
By convexity of f: f(x̄) + ⟨∇f(x̄), y - x̄⟩ ≤ f(y), and by the choice α ≤ 1/L the following bound holds:
    f(x̂) ≤ f(x̄) + ⟨∇f(x̄), x̂ - x̄⟩ + (1/(2α))∥x̂ - x̄∥²
Hence Point 1: F(x̂) + (1/(2α))∥x̂ - y∥² ≤ F(y) + (1/(2α))∥y - x̄∥²

17 Theorem proof
Point 1 states F(x̂) + (1/(2α))∥x̂ - y∥² ≤ F(y) + (1/(2α))∥y - x̄∥², so choosing y = θ* (any minimizer of F), x̄ = θ^t and x̂ = θ^{t+1}:
    F(θ^{t+1}) + (1/(2α))∥θ^{t+1} - θ*∥² ≤ F(θ*) + (1/(2α))∥θ* - θ^t∥²
This leads to Point 3 of the smooth case:
    F(θ^{t+1}) ≤ F(θ*) + (1/(2α)) (∥θ^t - θ*∥² - ∥θ^{t+1} - θ*∥²)
and then the same telescopic argument leads to the bound.

18 Plan: Duality gap and stopping criterion; Back to gradient descent analysis; Forward-backward analysis; Forward-backward accelerated; Coordinate descent

19 Forward-Backward accelerated algorithm
Notation: φ_α(θ) := prox_{αg}(θ - α∇f(θ))
Accelerated Forward-Backward algorithm
Input: initialization θ^0 (with z^0 = θ^0), step size α, a sequence (µ_t)_{t∈N} satisfying µ_1 = 1 and µ_{t+1}² - µ_{t+1} ≤ µ_t²
Result: θ^T
while not converged do
    θ^{t+1} = φ_α(z^t)
    z^{t+1} = θ^{t+1} + ((µ_t - 1)/µ_{t+1}) (θ^{t+1} - θ^t)
end
Examples of admissible sequences:
    µ_{t+1} = sqrt(µ_t² + 1/4) + 1/2 (i.e., µ_{t+1}² - µ_{t+1} = µ_t²)
    µ_{t+1} = (t + 1)/2
    µ_{t+1} = (t + a - 1)/a
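A minimal NumPy sketch of the accelerated scheme for the Lasso, using the first admissible sequence above (names and the iteration count are mine):

```python
import numpy as np

def accelerated_fb_lasso(X, y, lam, n_iter=200):
    """Accelerated forward-backward (FISTA-type) iterations for
    F(theta) = 0.5 * ||X @ theta - y||^2 + lam * ||theta||_1."""
    L = np.linalg.norm(X, ord=2) ** 2
    alpha = 1.0 / L
    theta = np.zeros(X.shape[1])
    z = theta.copy()
    mu = 1.0
    for _ in range(n_iter):
        w = z - alpha * X.T @ (X @ z - y)                                  # forward step at z
        theta_new = np.sign(w) * np.maximum(np.abs(w) - alpha * lam, 0.0)  # prox step
        mu_new = np.sqrt(mu ** 2 + 0.25) + 0.5       # mu_{t+1}^2 - mu_{t+1} = mu_t^2
        z = theta_new + (mu - 1.0) / mu_new * (theta_new - theta)          # extrapolation
        theta, mu = theta_new, mu_new
    return theta
```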

20 Convergence: Lipschitz gradient
    θ^{t+1} = φ_α(z^t) = prox_{αg}(z^t - α∇f(z^t))
Convergence rate for fixed step size
Hypothesis: f convex, differentiable with L-Lipschitz gradient: ∀(θ, θ'), ∥∇f(θ) - ∇f(θ')∥ ≤ L∥θ - θ'∥
Result: for any minimizer θ* of F, if α ≤ 1/L then θ^t satisfies
    F(θ^T) - F(θ*) ≤ ∥θ^0 - θ*∥² / (2αµ_T²)
Rem: for the common choices given above, µ_t grows linearly in t, hence a O(1/T²) rate
Rem: define F* = F(θ*) for the proof

21 Proof: rate for the Nesterov acceleration
Point 1 with x̂ = φ_α(z^t), x̄ = z^t and y = (1 - 1/µ_{t+1}) θ^t + (1/µ_{t+1}) θ*:
    F(x̂) + ∥x̂ - y∥_2²/(2α) ≤ F(y) + ∥x̄ - y∥_2²/(2α)
With u^{t+1} = θ^t + µ_{t+1}(θ^{t+1} - θ^t), a little algebra gives:
    F(θ^{t+1}) + ∥u^{t+1} - θ*∥_2²/(2αµ_{t+1}²) ≤ F(y) + ∥u^t - θ*∥_2²/(2αµ_{t+1}²)
    F(θ^{t+1}) - F* - (1 - 1/µ_{t+1}) (F(θ^t) - F*) ≤ (∥u^t - θ*∥_2² - ∥u^{t+1} - θ*∥_2²) / (2αµ_{t+1}²)
    µ_{t+1}² δ_{t+1} - (µ_{t+1}² - µ_{t+1}) δ_t ≤ (∥u^t - θ*∥_2² - ∥u^{t+1} - θ*∥_2²) / (2α)
(using the convexity of F and the notation δ_{t+1} := F(θ^{t+1}) - F*)

22 Proof continued
Define ρ_{t+1} := µ_{t+1} - µ_{t+1}² + µ_t² ≥ 0, so that the previous inequality
    µ_{t+1}² δ_{t+1} - (µ_{t+1}² - µ_{t+1}) δ_t ≤ (∥u^t - θ*∥_2² - ∥u^{t+1} - θ*∥_2²) / (2α)
reads
    µ_{t+1}² δ_{t+1} - µ_t² δ_t + ρ_{t+1} δ_t ≤ (∥u^t - θ*∥_2² - ∥u^{t+1} - θ*∥_2²) / (2α)
Telescopic terms again (convention: µ_0 = 0 and u^0 = θ^0):
    µ_T² δ_T + Σ_{t=0}^{T-1} ρ_{t+1} δ_t ≤ (∥u^0 - θ*∥_2² - ∥u^T - θ*∥_2²) / (2α)
hence µ_T² δ_T ≤ ∥u^0 - θ*∥_2² / (2α)

23 Convergence of the iterates
Very recent result: Chambolle and Dossal (2014). Proof out of the scope of this course.
More reading on the previous theme:
    Nesterov (2004) for proofs, strong convexity, etc.
    Beck and Teboulle (2009) for the ISTA/FISTA analysis
    Chambolle and Dossal (2014) for FISTA with a larger choice of updating rules

24 Plan: Duality gap and stopping criterion; Back to gradient descent analysis; Forward-backward analysis; Forward-backward accelerated; Coordinate descent

25 Coordinate descent
Objective: solve argmin_{θ ∈ R^p} f(θ)
Initialization: θ^(0)
    θ_1^(k) ∈ argmin_{θ_1 ∈ R} f(θ_1, θ_2^(k-1), θ_3^(k-1), ..., θ_p^(k-1))
    θ_2^(k) ∈ argmin_{θ_2 ∈ R} f(θ_1^(k), θ_2, θ_3^(k-1), ..., θ_p^(k-1))
    θ_3^(k) ∈ argmin_{θ_3 ∈ R} f(θ_1^(k), θ_2^(k), θ_3, ..., θ_p^(k-1))
    ...
    θ_p^(k) ∈ argmin_{θ_p ∈ R} f(θ_1^(k), θ_2^(k), θ_3^(k), ..., θ_p)
then cycle over the coordinates

26 Motivation Convergence guarantees toward a minimum for smooth functions, cf. Tseng (2001)

27 Motivation Convergence guarantees toward a minimum for separable functions, cf. Tseng (2001)

28 Motivation NO CONVERGENCE toward a minimum for non-separable/non-smooth functions: some points get stuck (this statement is repeated on slides 29-31)

32 Motivation
Coordinate descent can be extremely fast.
Possibly visit the coordinates in an arbitrary order (cyclic, random, more refined methods, etc.)
Possibly by blocks: update not only one coordinate but a bunch of them (optimize according to your architecture)
Exo: test on the ridge regression problem the relative performance of cyclic and random sampling of the coordinates (a sketch addressing this exercise follows the ridge coordinate descent slide below)

33 CD for least squares
argmin_{θ ∈ R^p} f(θ) for f(θ) = (1/2)∥y - Xθ∥² = (1/2) Σ_{i=1}^n ( y_i - Σ_{j=1}^p θ_j x_{ij} )²
Reminder: ∇f(θ) = X^T(Xθ - y), i.e. ∂f/∂θ_j (θ) = x_j^T(Xθ - y) for j = 1, ..., p
Minimize w.r.t. θ_j with the θ_k (k ≠ j) fixed:
    0 = ∂f/∂θ_j (θ) = x_j^T(Xθ - y) = x_j^T ( x_j θ_j + Σ_{k≠j} x_k θ_k - y )
    ⇔ θ_j = x_j^T ( y - Σ_{k≠j} x_k θ_k ) / (x_j^T x_j) = x_j^T ( y - Σ_{k=1}^p x_k θ_k + x_j θ_j ) / ∥x_j∥_2²

34 CD for least squares II
Clever update scheme with low memory impact, by storing:
    the current residual in a variable r^(k) (size-n vector)
    the current coefficients in θ^(k) (size-p vector)
Coordinate descent for least squares
Input: observations y, features X = [x_1, ..., x_p], initial θ^(0)
Result: vector θ^(k)
while not converged do
    Pick a coordinate j
    r_int ← r^(k) + x_j θ_j^(k)
    θ_j^(k+1) ← x_j^T r_int / ∥x_j∥_2²
    r^(k+1) ← r_int - x_j θ_j^(k+1)
end
Rem: computational simplification when ∥x_j∥ = 1 (NORMALIZE!)
Rem: the residual update can be done in place
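A minimal NumPy sketch of this residual-tracking scheme, with a cyclic order over the coordinates (names and the epoch count are mine; the columns of X are assumed non-zero):

```python
import numpy as np

def cd_least_squares(X, y, n_epochs=50):
    """Cyclic coordinate descent for 0.5 * ||y - X @ theta||^2,
    keeping the residual r = y - X @ theta up to date in place."""
    n, p = X.shape
    theta = np.zeros(p)
    r = y.astype(float)                 # residual for theta = 0
    sq_norms = (X ** 2).sum(axis=0)     # ||x_j||^2 for each column
    for _ in range(n_epochs):
        for j in range(p):
            r += X[:, j] * theta[j]     # r_int = r + x_j * theta_j
            theta[j] = X[:, j] @ r / sq_norms[j]
            r -= X[:, j] * theta[j]     # put the updated coordinate back
    return theta
```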

35 Ridge regression with coordinate descent
argmin_{θ ∈ R^p} f(θ) for f(θ) = (1/2) Σ_{i=1}^n ( y_i - Σ_{j=1}^p θ_j x_{ij} )² + (λ/2) Σ_{j=1}^p θ_j²
Reminder: ∇f(θ) = X^T(Xθ - y) + λθ, i.e. ∂f/∂θ_j (θ) = x_j^T(Xθ - y) + λθ_j
Minimize w.r.t. θ_j with the θ_k (k ≠ j) fixed:
    0 = ∂f/∂θ_j (θ) = x_j^T(Xθ - y) + λθ_j = x_j^T ( x_j θ_j + Σ_{k≠j} x_k θ_k - y ) + λθ_j
    ⇔ θ_j = x_j^T ( y - Σ_{k≠j} x_k θ_k ) / (x_j^T x_j + λ) = x_j^T ( y - Σ_{k=1}^p x_k θ_k + x_j θ_j ) / (∥x_j∥_2² + λ)

36 Ridge regression with coordinate descent II
Clever update scheme with low memory impact, by storing:
    the current residual in a variable r^(k) (size-n vector)
    the current coefficients in θ^(k) (size-p vector)
Coordinate descent for ridge regression
Input: observations y, features X = [x_1, ..., x_p], initial θ^(0)
Result: vector θ^(k)
while not converged do
    Pick a coordinate j
    r_int ← r^(k) + x_j θ_j^(k)
    θ_j^(k+1) ← x_j^T r_int / (∥x_j∥_2² + λ)
    r^(k+1) ← r_int - x_j θ_j^(k+1)
end
Rem: computational simplification when ∥x_j∥ = 1 (NORMALIZE!)
Rem: the residual update can be done in place
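A minimal NumPy sketch of this ridge update; it also addresses the exercise of slide 32, since the coordinate order can be cyclic or random (the function name, the `order` switch and the epoch count are mine):

```python
import numpy as np

def cd_ridge(X, y, lam, n_epochs=50, order="cyclic", seed=0):
    """Coordinate descent for 0.5 * ||y - X @ theta||^2 + 0.5 * lam * ||theta||^2,
    visiting the coordinates either cyclically or uniformly at random."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    theta = np.zeros(p)
    r = y.astype(float)                     # residual y - X @ theta (theta = 0)
    sq_norms = (X ** 2).sum(axis=0)
    for _ in range(n_epochs):
        coords = range(p) if order == "cyclic" else rng.integers(0, p, size=p)
        for j in coords:
            r += X[:, j] * theta[j]         # remove coordinate j from the residual
            theta[j] = X[:, j] @ r / (sq_norms[j] + lam)
            r -= X[:, j] * theta[j]
    return theta

# Exercise of slide 32: run with order="cyclic" and order="random" on the same data
# and compare the objective value reached after each epoch.
```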

37 Lasso with coordinate descent
argmin_{θ ∈ R^p} f(θ) for f(θ) = (1/2)∥y - Xθ∥² + λ Σ_{j=1}^p |θ_j|
Minimize w.r.t. θ_j with the θ_k (k ≠ j) fixed:
    θ̂_j = argmin_{θ_j ∈ R} f(θ_1, ..., θ_p)
        = argmin_{θ_j ∈ R} (1/2)∥y - Σ_{k≠j} θ_k x_k - x_j θ_j∥² + λ|θ_j|
        = argmin_{θ_j ∈ R} (1/2)∥x_j∥² θ_j² - ⟨y - Σ_{k≠j} θ_k x_k, x_j⟩ θ_j + λ|θ_j|
        = argmin_{θ_j ∈ R} (∥x_j∥²/2) [ θ_j - ⟨y - Σ_{k≠j} θ_k x_k, x_j⟩ / ∥x_j∥² ]² + λ|θ_j|

38 Lasso with coordinate descent
    θ̂_j = argmin_{θ_j ∈ R} (∥x_j∥²/2) [ θ_j - ⟨y - Σ_{k≠j} θ_k x_k, x_j⟩ / ∥x_j∥² ]² + λ|θ_j|
Soft-thresholding: S_λ(z) = argmin_{x ∈ R} (1/2)(z - x)² + λ|x|
Update rule:
    θ̂_j = S_{λ/∥x_j∥²} ( ⟨y - Σ_{k≠j} θ_k x_k, x_j⟩ / ∥x_j∥² )

39 Lasso with coordinate descent II
Clever update scheme with low memory impact, by storing:
    the current residual in a variable r^(k) (size-n vector)
    the current coefficients in θ^(k) (size-p vector)
Coordinate descent for the Lasso
Input: observations y, features X = [x_1, ..., x_p], initial θ^(0)
Result: vector θ^(k)
while not converged do
    Pick a coordinate j
    r_int ← r^(k) + x_j θ_j^(k)
    θ_j^(k+1) ← S_{λ/∥x_j∥²} ( x_j^T r_int / ∥x_j∥² )
    r^(k+1) ← r_int - x_j θ_j^(k+1)
end
Rem: computational simplification when ∥x_j∥ = 1 (NORMALIZE!)
Rem: the residual update can be done in place
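A minimal NumPy sketch of this Lasso coordinate descent with soft-thresholding (names and the epoch count are mine):

```python
import numpy as np

def soft_threshold(z, tau):
    """S_tau(z) = argmin_x 0.5 * (z - x)^2 + tau * |x|."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def cd_lasso(X, y, lam, n_epochs=50):
    """Cyclic coordinate descent for 0.5 * ||y - X @ theta||^2 + lam * ||theta||_1."""
    n, p = X.shape
    theta = np.zeros(p)
    r = y.astype(float)                     # residual y - X @ theta (theta = 0)
    sq_norms = (X ** 2).sum(axis=0)
    for _ in range(n_epochs):
        for j in range(p):
            r += X[:, j] * theta[j]         # r_int = r + x_j * theta_j
            theta[j] = soft_threshold(X[:, j] @ r / sq_norms[j], lam / sq_norms[j])
            r -= X[:, j] * theta[j]
    return theta
```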

40 Similarity with Forward-Backward
Update rule for CD:  θ_j^(k+1) ← S_{λ/∥x_j∥²} ( θ_j^(k) + x_j^T r / ∥x_j∥² )
Update rule for FB:  θ^(k+1) ← S_{λ/L} ( θ^(k) + X^T r / L )
where r = y - Xθ^(k) and L = λ_max(X^T X) is the Lipschitz constant of ∇f.
Rem: the Forward-Backward update can be really useful if the operation r ∈ R^n ↦ X^T r ∈ R^p can be performed efficiently. Common examples include the FFT, the wavelet transform, etc.
Rem: note that the residual is usually a full vector (not sparse)

41 Optimisation
Other alternatives to obtain a/the Lasso solution:
    LARS, Efron et al. (2004), for the full Lasso path
    Forward-Backward (ISTA, FISTA), cf. Beck and Teboulle (2009)
    Conditional gradient / Frank-Wolfe, Jaggi (2013): useful for distributed datasets

42 Références I
A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci., 2(1), 2009.
A. Chambolle and C. Dossal. How to make sure the iterates of FISTA converge. 2014.
B. Efron, T. Hastie, I. M. Johnstone, and R. Tibshirani. Least angle regression. Ann. Statist., 32(2), 2004. With discussion, and a rejoinder by the authors.
J.-B. Hiriart-Urruty and C. Lemaréchal. Convex analysis and minimization algorithms. I, volume 305. Springer-Verlag, Berlin, 1993.
M. Jaggi. Revisiting Frank-Wolfe: projection-free sparse convex optimization. In ICML, 2013.

43 Références II
Y. Nesterov. Introductory lectures on convex optimization, volume 87 of Applied Optimization. Kluwer Academic Publishers, Boston, MA, 2004.
P. Tseng. Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl., 109(3), 2001.
