1 This can be 2 lectures! Still need: examples of non-convex problems, applications for matrix factorization. Note: $x = \mathrm{prox}_f(x) + \mathrm{prox}_{f^*}(x)$ (the Moreau decomposition); use it to get the prox of norms!
2 PROXIMAL METHODS
3 WHY PROXIMAL METHODS
Smooth problems, minimize $f(x)$: gradient descent, Newton's method, quasi-Newton, conjugate gradients, etc.
Non-differentiable problems, minimize $f(x)$: proximal methods.
Constrained problems, minimize $f(x)$ subject to $g(x) \le 0$, $h(x) = 0$: Lagrangian methods.
4 PROXIMAL OPERATOR
$\mathrm{prox}_f(z, t) = \arg\min_x f(x) + \frac{1}{2t}\|x - z\|^2$
The first term is the objective; the second is the proximal penalty: stay close to $z$.
5 PROXIMAL OPERATOR
$\mathrm{prox}_f(z, t) = \arg\min_x f(x) + \frac{1}{2t}\|x - z\|^2$, where $t$ is a stepsize.
[Figure: plot of $\mathrm{prox}_f(z, t)$ as a function of $z$.]
6 GRADIENT INTERPRETATION
$\mathrm{prox}_f(z, t) = \arg\min_x f(x) + \frac{1}{2t}\|x - z\|^2$
Assume $f$ is differentiable. The optimality condition is $\nabla f(x) + \frac{1}{t}(x - z) = 0$, i.e., $x = z - t\nabla f(x)$.
Two types of gradient descent:
Forward gradient: $x = z - t\nabla f(z)$
Backward gradient: $x = z - t\nabla f(x)$
7 SUBDIFFERENTIAL
$\mathrm{prox}_f(z, t) = \arg\min_x f(x) + \frac{1}{2t}\|x - z\|^2$
If $f$ is non-differentiable, the optimality condition becomes $0 \in \partial f(x) + \frac{1}{t}(x - z)$, i.e., $0 = g + \frac{1}{t}(x - z)$ for some $g \in \partial f(x)$.
Backward gradient descent: $x = z - t g$
8 EXAMPLE: L1
$\mathrm{prox}_{|\cdot|}(z, t) = \arg\min_x |x| + \frac{1}{2t}\|x - z\|^2$
Is the sub-differential unique? Is the solution unique?
[Figure: plot of $\mathrm{prox}_{|\cdot|}(z, t)$ as a function of $z$.]
9 PROPERTIES
Backward gradient descent: $x = \mathrm{prox}_f(z, t) = \arg\min_x f(x) + \frac{1}{2t}\|x - z\|^2$, i.e., $x = z - tg$ with $g \in \partial f(x)$.
Forward gradient descent: $x = z - t\nabla f(z)$.
Existence: the backward step exists for any proper convex function (and some non-convex ones); the forward step exists only for smooth functions.
Uniqueness: the backward step always gives a unique result; for the forward step, the (sub)gradient is unique only if $f$ is differentiable.
10 RESOLVENT
$\mathrm{prox}_f(z, t) = \arg\min_x f(x) + \frac{1}{2t}\|x - z\|^2$
For non-differentiable $f$: $0 \in \partial f(x) + \frac{1}{t}(x - z)$
$z \in (t\,\partial f + I)\,x$
$x = (t\,\partial f + I)^{-1} z = \mathrm{prox}_f(z, t)$
The resolvent $(t\,\partial f + I)^{-1}$ is single-valued, so the prox is unique.
11 EXAMPLE: L1
$\mathrm{prox}_{|\cdot|}(z, t) = \arg\min_x |x| + \frac{1}{2t}\|x - z\|^2$
Optimality conditions:
$1 + \frac{1}{t}(x^\star - z) = 0$, if $x^\star > 0$
$-1 + \frac{1}{t}(x^\star - z) = 0$, if $x^\star < 0$
$\frac{1}{t}(x^\star - z) \in [-1, 1]$, if $x^\star = 0$
Solution, $\mathrm{shrink}(z, t)$:
$x^\star = z - t$, if $x^\star > 0$
$x^\star = z + t$, if $x^\star < 0$
$x^\star = 0$, otherwise
12 THE RIGOROUS WAY
$\mathrm{prox}_{\|\cdot\|_1}(z, t) = \arg\min_x \|x\|_1 + \frac{1}{2t}\|x - z\|^2$
Write the L1 norm as a maximization, $\|x\|_1 = \max_{\|\lambda\|_\infty \le 1} \langle x, \lambda\rangle$, which gives the Lagrangian saddle-point form
$\min_x \max_{\|\lambda\|_\infty \le 1} \langle x, \lambda\rangle + \frac{1}{2t}\|x - z\|^2$
Minimizing over $x$: $\lambda + \frac{1}{t}(x - z) = 0$, so $x = z - t\lambda$. Substituting back yields the dual problem
$\max_{\|\lambda\|_\infty \le 1} \frac{1}{2t}\|z\|^2 - \frac{t}{2}\|\lambda - z/t\|^2$
solved componentwise by $\lambda = \max(\min(z/t, 1), -1)$. Therefore
$x = z - t\lambda = z - \max(\min(z, t), -t) = \mathrm{shrink}(z, t)$
13 SHRINK OPERATOR
[Figure: plot of $y = \mathrm{shrink}(x, 1)$.]
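The shrink (soft-thresholding) operator is a one-liner in practice; a minimal NumPy sketch (the function name and vectorized form are my own choices):

```python
import numpy as np

def shrink(z, t):
    """Soft thresholding: the prox of t*|.|, applied componentwise."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Example: shrink(np.array([-2.0, 0.5, 3.0]), 1.0) -> array([-1., 0., 2.])
```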
14 NUCLEAR NORM
$\|X\|_* = \sum_i \sigma_i$, the sum of the singular values.
Why would you use this regularizer? (It is a convex surrogate for rank.)
If $X$ is PSD, we call this the trace norm: $\|X\|_* = \sum_i \lambda_i = \mathrm{trace}(X)$.
15 NUCLEAR NORM PROX
$\min_X \|X\|_* + \frac{1}{2t}\|X - Z\|^2$
The minimizer has the same singular vectors as $Z$: write $Z = U S_Z V^T$ and $X = U S_X V^T$. Then
$\min_{S_X} \|U S_X V^T\|_* + \frac{1}{2t}\|U S_X V^T - U S_Z V^T\|^2 = \min_{S_X} \|S_X\|_* + \frac{1}{2t}\|S_X - S_Z\|^2$
so $S_X = \mathrm{shrink}(S_Z, t)$ and $X = U\,\mathrm{shrink}(S_Z, t)\,V^T$.
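In code, the nuclear-norm prox is an SVD followed by soft thresholding of the singular values; a minimal sketch:

```python
import numpy as np

def prox_nuclear(Z, t):
    """Singular value thresholding: the prox of t*||.||_* at Z."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # singular values are nonnegative, so shrink reduces to subtract-and-clip
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt
```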
16 CHARACTERISTIC FUNCTIONS
Let $C$ be some convex set and define $\chi_C(x) = 0$ if $x \in C$, $\infty$ otherwise.
Proximal step: $\min_x \chi_C(x) + \frac{1}{2t}\|x - z\|^2$
What does this do? (It is exactly the orthogonal projection of $z$ onto $C$.) Does the stepsize matter? (No.)
17 EXAMPLES
2-norm ball: $C = B_2 = \{x : \|x\| \le 1\}$, with $\mathrm{prox}_2(z) = \frac{z}{\|z\|}\min\{\|z\|, 1\}$
Infinity ball: $C = B_\infty = \{x : \|x\|_\infty \le 1\}$, with $\mathrm{prox}_\infty(z)_i = \min\{\max\{z_i, -1\}, 1\}$
Positive half-space: $C = \{x : x \ge 0\}$, with $\mathrm{prox}_+(z) = \max\{z, 0\}$
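All three projections are one-liners in NumPy; a minimal sketch (function names are mine):

```python
import numpy as np

def proj_l2_ball(z):
    """Project onto {x : ||x||_2 <= 1}."""
    nrm = np.linalg.norm(z)
    return z if nrm <= 1.0 else z / nrm

def proj_linf_ball(z):
    """Project onto {x : ||x||_inf <= 1}: clip each coordinate."""
    return np.clip(z, -1.0, 1.0)

def proj_nonneg(z):
    """Project onto {x : x >= 0}."""
    return np.maximum(z, 0.0)
```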
18 HARDER EXAMPLES
Semidefinite cone: $C = S_+ = \{X : X \succeq 0\}$, with $\mathrm{prox}_{S_+}(Z) = U \max\{S, 0\} V^T$ (threshold the spectrum).
L1-ball: $C = B_1 = \{x : \|x\|_1 \le 1\}$, with $\mathrm{prox}_1(z) = ???$
19 PROXIMAL-POINT METHOD
minimize $f(x)$
Backward gradient descent: $x^{k+1} = \mathrm{prox}_f(x^k, t) = \arg\min_x f(x) + \frac{1}{2t}\|x - x^k\|^2$
Stepsize restriction? Does it depend on the norm? Would you ever use this?
20 FORWARD-BACKWARD SPLITTING
$\min_x g(x) + f(x)$, where $g$ is non-smooth (use backward gradient descent) and $f$ is smooth (use forward gradient descent).
FBS:
forward step $\hat x = x^k - t\nabla f(x^k)$
backward step $x^{k+1} = \mathrm{prox}_g(\hat x, t)$
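A generic FBS loop is only a few lines; a minimal sketch where the caller supplies grad_f, prox_g, and a valid stepsize t (these names are my own):

```python
import numpy as np

def fbs(grad_f, prox_g, x0, t, iters=500):
    """Forward-backward splitting for min_x g(x) + f(x)."""
    x = x0.copy()
    for _ in range(iters):
        x_hat = x - t * grad_f(x)  # forward (explicit) gradient step on f
        x = prox_g(x_hat, t)       # backward (implicit) proximal step on g
    return x

# Illustrative use, g = ||.||_1 and f = 0.5*||Ax - b||^2 (A, b, n assumed given):
# x = fbs(lambda x: A.T @ (A @ x - b), shrink, np.zeros(n), t=1/np.linalg.norm(A, 2)**2)
```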
21 WHY FORWARD BACKWARD?
$\min_x g(x) + f(x)$
Forward step: $\hat x = x^k - t\nabla f(x^k)$; backward step: $x^{k+1} = \mathrm{prox}_g(\hat x, t)$.
Combined: $x^{k+1} = x^k - t\nabla f(x^k) - t g^{k+1}$ for some $g^{k+1} \in \partial g(x^{k+1})$.
Fixed-point property: $x^\star = \mathrm{prox}_g(x^\star - t\nabla f(x^\star), t)$ exactly when $0 \in \nabla f(x^\star) + \partial g(x^\star)$.
22 GRADIENT INTERPRETATION
Forward step: $\hat x = x^k - t\nabla f(x^k)$; backward step: $x^{k+1} = \mathrm{prox}_g(\hat x, t)$, so $x^{k+1} = x^k - t\nabla f(x^k) - t g^{k+1}$ with $g^{k+1} \in \partial g(x^{k+1})$.
[Figure: one FBS iteration, showing $x^k$, $\hat x$, $x^{k+1}$, and $x^\star$.]
23 MAJORIZATION-MINIMIZATION
minimize $h(x)$
Surrogate function $m$: $m(x^k) = h(x^k)$ and $m(x) \ge h(x)$ for all $x$; the surrogate majorizes $h$.
[Figure: $h$ and a majorizer $m$ touching at $x^k$, with minimizer $x^{k+1}$.]
24 MM ALGORITHM
minimize $h(x) = g(x) + f(x)$
Majorize the smooth term: $f(x) \le f(x^k) + \langle x - x^k, \nabla f(x^k)\rangle + \frac{1}{2t}\|x - x^k\|^2$, valid whenever $t \le 1/L$.
[Figure: $h$ and the quadratic majorizer $m$ at $x^k$.]
25-26 MM ALGORITHM
minimize $h(x) = g(x) + f(x)$, using $f(x) \le f(x^k) + \langle x - x^k, \nabla f(x^k)\rangle + \frac{1}{2t}\|x - x^k\|^2$.
Minimize the surrogate:
$\min_x m(x) = g(x) + f(x^k) + \langle x - x^k, \nabla f(x^k)\rangle + \frac{1}{2t}\|x - x^k\|^2$
Completing the square (and dropping constants):
$\min_x m(x) = g(x) + \frac{1}{2t}\|x - x^k + t\nabla f(x^k)\|^2$
whose minimizer is $\mathrm{prox}_g(x^k - t\nabla f(x^k), t)$: exactly one FBS iteration.
27 PROXIMAL POINT INTERPRETATION
minimize $h(x) = g(x) + f(x)$, using $f(x) \le f(x^k) + \langle x - x^k, \nabla f(x^k)\rangle + \frac{1}{2t}\|x - x^k\|^2$ for $t \le 1/L$.
Build a proximal penalty / distance measure:
$d(x, x^k) = \frac{1}{2t}\|x - x^k\|^2 - f(x) + \langle x - x^k, \nabla f(x^k)\rangle + f(x^k)$
28 PROXIMAL POINT INTERPRETATION
With $d(x, x^k) = \frac{1}{2t}\|x - x^k\|^2 - f(x) + \langle x - x^k, \nabla f(x^k)\rangle + f(x^k)$:
$\min_x h(x) + d(x, x^k) = \min_x g(x) + \frac{1}{2t}\|x - x^k + t\nabla f(x^k)\|^2$
whose minimizer is again $\mathrm{prox}_g(x^k - t\nabla f(x^k), t)$.
29 SPARSE LEAST SQUARES
$\min_x g(x) + f(x)$: $\min_x \mu\|x\|_1 + \frac{1}{2}\|Ax - b\|^2$ (non-smooth $g$, smooth $f$)
Forward step: $\hat x = x^k - t\nabla f(x^k) = x^k - tA^T(Ax^k - b)$
Backward step: $x^{k+1} = \arg\min_x \mu\|x\|_1 + \frac{1}{2t}\|x - \hat x\|^2$
Iterative shrinkage/thresholding: $x^{k+1} = \mathrm{shrink}(\hat x, \mu t)$
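Putting the two steps together gives the classic ISTA iteration; a minimal sketch with the fixed stepsize $t = 1/\|A\|_2^2$ (a safe choice, since $L = \|A^TA\|_2$):

```python
import numpy as np

def ista(A, b, mu, iters=500):
    """Iterative shrinkage/thresholding for min mu*||x||_1 + 0.5*||Ax - b||^2."""
    t = 1.0 / np.linalg.norm(A, 2) ** 2      # t <= 1/L with L = ||A^T A||_2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x_hat = x - t * A.T @ (A @ x - b)    # forward step
        x = np.sign(x_hat) * np.maximum(np.abs(x_hat) - mu * t, 0.0)  # shrink(x_hat, mu*t)
    return x
```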
30 SPARSE LOGISTIC
$\min_x g(x) + f(x)$: $\min_x \mu\|x\|_1 + L(Ax, y)$, with logistic loss $L(z, y) = \sum_i \log(1 + e^{-y_i z_i})$
Forward step: $\hat x = x^k - t\nabla f(x^k) = x^k - tA^T\nabla L(Ax^k, y)$
Backward step: $x^{k+1} = \arg\min_x \mu\|x\|_1 + \frac{1}{2t}\|x - \hat x\|^2 = \mathrm{shrink}(\hat x, \mu t)$
31 EXAMPLE: DEMOCRATIC REPRESENTATIONS
$\min \frac{1}{2}\|Ax - b\|^2$ subject to $\|x\|_\infty \le \mu$
$\min_x \chi_{\{\|x\|_\infty \le \mu\}}(x) + \frac{1}{2}\|Ax - b\|^2$
Forward step: $\hat x = x^k - tA^T(Ax^k - b)$
Backward step: $x^{k+1} = \arg\min_x \chi_{\{\|x\|_\infty \le \mu\}}(x) + \frac{1}{2t}\|x - \hat x\|^2 = \min\{\max\{\hat x, -\mu\}, \mu\}$
32 EXAMPLE: LASSO
$\min \frac{1}{2}\|Ax - b\|^2$ subject to $\|x\|_1 \le \mu$ (implicit constraints)
$\min_x \chi_{\{\|x\|_1 \le \mu\}}(x) + \frac{1}{2}\|Ax - b\|^2$
Forward step: $\hat x = x^k - tA^T(Ax^k - b)$
Backward step: $x^{k+1} = \arg\min_x \chi_{\{\|x\|_1 \le \mu\}}(x) + \frac{1}{2t}\|x - \hat x\|^2$, the projection onto the L1 ball.
33 TOTAL VARIATION
$\min_x \mu\|\nabla x\|_1 + \frac{1}{2}\|x - f\|^2$
Dualize the easy way: $\mu|z| = \max_{\lambda \in [-1,1]} \mu\lambda z = \max_{\lambda \in [-\mu,\mu]} \lambda z$. Why?
$\min_x \max_{\lambda \in [-\mu,\mu]} \langle \lambda, \nabla x\rangle + \frac{1}{2}\|x - f\|^2 = \max_{\lambda \in [-\mu,\mu]} \min_x \langle \nabla^T\lambda, x\rangle + \frac{1}{2}\|x - f\|^2$
Optimality condition: $\nabla^T\lambda + x - f = 0$
34 TOTAL VARIATION
$\min_x \mu\|\nabla x\|_1 + \frac{1}{2}\|x - f\|^2$; $\max_{\lambda \in [-\mu,\mu]} \min_x \langle \nabla^T\lambda, x\rangle + \frac{1}{2}\|x - f\|^2$
The optimality condition $\nabla^T\lambda + x - f = 0$ gives $x = f - \nabla^T\lambda$. Substituting:
$\max_{\lambda \in [-\mu,\mu]} \langle \nabla^T\lambda, f - \nabla^T\lambda\rangle + \frac{1}{2}\|\nabla^T\lambda\|^2$
$= \max_{\lambda \in [-\mu,\mu]} \langle \nabla^T\lambda, f\rangle - \|\nabla^T\lambda\|^2 + \frac{1}{2}\|\nabla^T\lambda\|^2$
$= \max_{\lambda \in [-\mu,\mu]} \langle \nabla^T\lambda, f\rangle - \frac{1}{2}\|\nabla^T\lambda\|^2$
$= \max_{\lambda \in [-\mu,\mu]} -\frac{1}{2}\|\nabla^T\lambda - f\|^2$ (up to a constant)
35 FBS FOR TV
Primal: $\min_x \mu\|\nabla x\|_1 + \frac{1}{2}\|x - f\|^2$. Dual: $\max_{\lambda \in [-\mu,\mu]} -\frac{1}{2}\|\nabla^T\lambda - f\|^2$, with $x = f - \nabla^T\lambda$.
Indicator: $\chi_\mu(z) = 0$ if $z \in [-\mu, \mu]$, $\infty$ otherwise.
$\min_\lambda \chi_\mu(\lambda) + \frac{1}{2}\|\nabla^T\lambda - f\|^2$
Forward step: $\hat\lambda = \lambda^k - t\nabla(\nabla^T\lambda^k - f)$
Backward step: $\lambda^{k+1} = \arg\min_\lambda \chi_\mu(\lambda) + \frac{1}{2t}\|\lambda - \hat\lambda\|^2 = \min\{\max\{-\mu, \hat\lambda\}, \mu\}$
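For intuition, here is a hedged 1-D sketch of this dual FBS scheme, with $\nabla$ as forward differences and a hand-coded adjoint; the discretization and the stepsize $t = 1/4 \le 1/\|\nabla\nabla^T\|$ are my assumptions, not from the slides:

```python
import numpy as np

def tv_denoise_1d(f, mu, iters=500):
    """Dual FBS for min_x mu*||grad x||_1 + 0.5*||x - f||^2 in 1-D."""
    grad = lambda x: np.diff(x)                 # forward differences, R^n -> R^{n-1}
    gradT = lambda lam: np.concatenate(         # adjoint of grad (negative divergence)
        ([-lam[0]], lam[:-1] - lam[1:], [lam[-1]]))
    lam = np.zeros(len(f) - 1)
    t = 0.25                                    # t <= 1/L, L = ||grad grad^T|| <= 4
    for _ in range(iters):
        lam_hat = lam - t * grad(gradT(lam) - f)  # forward step on the smooth dual term
        lam = np.clip(lam_hat, -mu, mu)           # backward step: project onto [-mu, mu]
    return f - gradT(lam)                       # recover the primal solution
```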
36 SVM
$\min_w \frac{1}{2}\|w\|^2 + C\,h(LDw)$, where $h(z) = \sum_i \max\{1 - z_i, 0\}$ is the hinge loss ($D$ holds the data, $L$ the labels).
Use the trick: $C\,h(z) = \max_{\lambda \in [0,C]} \langle \lambda, 1 - z\rangle$, giving
$\min_w \max_{\lambda \in [0,C]} \frac{1}{2}\|w\|^2 + \langle \lambda, 1 - LDw\rangle$
Optimality: $w - D^TL\lambda = 0$
Dual: maximize $-\frac{1}{2}\|D^TL\lambda\|^2 + 1^T\lambda$ subject to $\lambda \in [0, C]$
37 SVM
Maximize $-\frac{1}{2}\|D^TL\lambda\|^2 + 1^T\lambda$ subject to $\lambda \in [0, C]$
Forward ASCENT step: $\hat\lambda = \lambda^k - t(LDD^TL\lambda^k - 1)$
Backward step: $\lambda^{k+1} = \min\{\max\{0, \hat\lambda\}, C\}$
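A minimal sketch of this projected dual ascent; the spectral-norm stepsize and the primal recovery line are my additions:

```python
import numpy as np

def svm_dual_fbs(D, y, C=1.0, iters=1000):
    """Projected gradient ascent on the SVM dual; D is (m, n) data, y in {-1, +1}^m."""
    L = np.diag(y.astype(float))
    M = L @ D @ D.T @ L                       # curvature of the smooth dual term
    t = 1.0 / np.linalg.norm(M, 2)            # stepsize from the spectral norm
    lam = np.zeros(D.shape[0])
    for _ in range(iters):
        lam_hat = lam - t * (M @ lam - 1.0)   # forward ascent step
        lam = np.clip(lam_hat, 0.0, C)        # backward step: project onto [0, C]
    return D.T @ L @ lam                      # recover primal weights w = D^T L lambda
```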
38 NETFLIX PROBLEM
A movies-by-people ratings matrix. Data imputation: fill in the missing values.
Low-rank assumption: everyone is described by their affinity for horror, action, rom-com, etc.
This is matrix completion.
39 NUCLEAR NORM FORM
$\min_X \mathrm{rank}(X) + \frac{1}{2}\|M \odot X - D\|^2$, where $M$ is the mask, $D$ the data, and $\odot$ coordinatewise multiplication.
Relax rank to the nuclear norm: $\min_X \|X\|_* + \frac{1}{2}\|M \odot X - D\|^2$
Forward step: $\hat X = X^k - t\,M \odot (M \odot X^k - D)$ (why not a transpose? masking is its own adjoint)
Backward step: $X^{k+1} = \arg\min_X \|X\|_* + \frac{1}{2t}\|X - \hat X\|^2$ (what does this do? singular value thresholding)
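Combining the masked forward step with the nuclear-norm prox gives a complete matrix-completion iteration; a self-contained sketch, with $t = 1$ (valid because the masking operator has norm 1):

```python
import numpy as np

def matrix_completion(D, M, mu=1.0, iters=100):
    """FBS for min_X mu*||X||_* + 0.5*||M.X - D||^2, with M a 0/1 mask."""
    X = np.zeros_like(D)
    t = 1.0                                        # safe: masking has operator norm 1
    for _ in range(iters):
        X_hat = X - t * M * (M * X - D)            # forward step; masking is self-adjoint
        U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
        X = U @ np.diag(np.maximum(s - mu * t, 0.0)) @ Vt  # backward step: SVT
    return X
```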
40 MATRIX FACTORIZATION
$\min_{X,Y} \frac{1}{2}\|XY - D\|^2$ subject to $X, Y \ge 0$
Forward step: $\hat X = X^k - t(X^kY^k - D)(Y^k)^T$, $\hat Y = Y^k - t(X^k)^T(X^kY^k - D)$
Backward step: $X^{k+1} = \max\{\hat X, 0\}$, $Y^{k+1} = \max\{\hat Y, 0\}$
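This problem is non-convex, so FBS here is a heuristic. A hedged sketch; unlike the slide's simultaneous update, this common variant updates the two blocks alternately with per-block stepsizes (the rank r and the stepsize rule are my assumptions):

```python
import numpy as np

def nmf_fbs(D, r, iters=500, seed=0):
    """Alternating projected gradient for min 0.5*||XY - D||^2 with X, Y >= 0."""
    m, n = D.shape
    rng = np.random.default_rng(seed)
    X, Y = rng.random((m, r)), rng.random((r, n))
    for _ in range(iters):
        R = X @ Y - D
        tx = 1.0 / np.linalg.norm(Y @ Y.T, 2)      # 1/Lipschitz constant of the X-block
        X = np.maximum(X - tx * R @ Y.T, 0.0)      # forward step, then project onto X >= 0
        R = X @ Y - D
        ty = 1.0 / np.linalg.norm(X.T @ X, 2)      # 1/Lipschitz constant of the Y-block
        Y = np.maximum(Y - ty * X.T @ R, 0.0)      # forward step, then project onto Y >= 0
    return X, Y
```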
41 CONVERGENCE
By reduction to the proximal-point method.
minimize $h(x) = g(x) + f(x)$
Theorem: Suppose the stepsize for FBS satisfies $t < 2/L$, where $L$ is a Lipschitz constant for $\nabla f$ (note the condition does not depend on $g$). Then $h(x^k) - h(x^\star) \le O(1/k)$.
42 GRADIENT VS. NESTEROV (Nemirovski and Yudin)
[Figure: iterate trajectories $x^0, x^1, \dots$ for gradient descent vs. Nesterov's method.]
Gradient: $O(1/k)$. Nesterov: $O(1/k^2)$, which is optimal.
43 FISTA
Fast iterative shrinkage/thresholding (Beck and Teboulle '09).
FISTA:
$x^{k+1} = \mathrm{prox}_g(y^k - t\nabla f(y^k), t)$
$\alpha^{k+1} = \frac{1 + \sqrt{1 + 4(\alpha^k)^2}}{2}$
$y^{k+1} = x^{k+1} + \frac{\alpha^k - 1}{\alpha^{k+1}}(x^{k+1} - x^k)$ (momentum)
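A minimal FISTA sketch; as with the FBS sketch above, grad_f, prox_g, and the stepsize t are supplied by the caller:

```python
import numpy as np

def fista(grad_f, prox_g, x0, t, iters=500):
    """FISTA (Beck & Teboulle '09) for min_x g(x) + f(x)."""
    x, y, a = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_new = prox_g(y - t * grad_f(y), t)            # proximal gradient step at y
        a_new = (1.0 + np.sqrt(1.0 + 4.0 * a * a)) / 2.0
        y = x_new + ((a - 1.0) / a_new) * (x_new - x)   # momentum extrapolation
        x, a = x_new, a_new
    return x
```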
44 SPARSA / FASTA
Idea: an adaptive stepsize outperforms acceleration.
Model the curvature by $f(x) \approx \frac{\alpha}{2}x^Tx$ and impose the secant equation $\nabla f(x^{k+1}) - \nabla f(x^k) = \alpha(x^{k+1} - x^k)$.
With $\Delta g = \nabla f(x^{k+1}) - \nabla f(x^k)$ and $\Delta x = x^{k+1} - x^k$, the least-squares solution of $\min_\alpha \frac{1}{2}\|\alpha\,\Delta x - \Delta g\|^2$ is $\alpha = \frac{\Delta x^T \Delta g}{\|\Delta x\|^2}$, giving the spectral stepsize $t = \frac{1}{\alpha} = \frac{\|\Delta x\|^2}{\Delta x^T \Delta g}$.
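The stepsize update itself is two lines; a hedged sketch of the computation (a Barzilai-Borwein style rule; variable names are mine):

```python
import numpy as np

def spectral_stepsize(x_new, x_old, g_new, g_old):
    """Spectral stepsize from the secant equation."""
    dx, dg = x_new - x_old, g_new - g_old
    return (dx @ dx) / (dx @ dg)   # t = ||dx||^2 / <dx, dg>
```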
45 BACKTRACKING
Based on the MM interpretation: require $f(x^{k+1}) < f(x^k) + \langle x^{k+1} - x^k, \nabla f(x^k)\rangle + \frac{1}{2t_k}\|x^{k+1} - x^k\|^2$; here $1/t_k$ behaves like a Lipschitz constant.
FBS + backtracking:
Choose stepsize $t_k$ and set $x^{k+1} = \mathrm{prox}_g(x^k - t_k\nabla f(x^k), t_k)$.
While $f(x^{k+1}) > f(x^k) + \langle x^{k+1} - x^k, \nabla f(x^k)\rangle + \frac{1}{2t_k}\|x^{k+1} - x^k\|^2$:
  $t_k \leftarrow t_k / 2$
  $x^{k+1} = \mathrm{prox}_g(x^k - t_k\nabla f(x^k), t_k)$
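In code, the backtracking loop wraps the FBS update; a minimal sketch in the same style as the earlier FBS function:

```python
import numpy as np

def fbs_backtracking(f, grad_f, prox_g, x0, t0=1.0, iters=200):
    """FBS with backtracking on the quadratic majorization condition."""
    x, t = x0.copy(), t0
    for _ in range(iters):
        g = grad_f(x)
        x_new = prox_g(x - t * g, t)
        dx = x_new - x
        while f(x_new) > f(x) + g @ dx + (dx @ dx) / (2.0 * t):
            t /= 2.0                       # halve until the majorizer is valid
            x_new = prox_g(x - t * g, t)
            dx = x_new - x
        x = x_new
    return x
```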
46 STOPPING CONDITIONS
minimize $h(x) = g(x) + f(x)$
Stop when the residual is small. But what is the residual?
The backward step $x^{k+1} = \arg\min_x g(x) + \frac{1}{2t}\|x - \hat x\|^2$ satisfies $0 \in \partial g(x^{k+1}) + \frac{1}{t}(x^{k+1} - \hat x)$, so $\frac{1}{t}(\hat x - x^{k+1}) \in \partial g(x^{k+1})$.
Form the residual $r^{k+1} = \frac{1}{t}(\hat x - x^{k+1}) + \nabla f(x^{k+1}) \in \partial g(x^{k+1}) + \nabla f(x^{k+1}) = \partial h(x^{k+1})$.
47 HOW SMALL IS SMALL? RELATIVE RESIDUAL
minimize $h(x)$, with residual $r^{k+1} = \frac{1}{t}(\hat x - x^{k+1}) + \nabla f(x^{k+1}) \in \partial h(x^{k+1})$.
Stop when $r^{k+1} \approx 0$, or equivalently when $\frac{1}{t}(\hat x - x^{k+1}) \approx -\nabla f(x^{k+1})$.
Relative residual: $r_{\mathrm{rel}}^{k+1} = \dfrac{\|r^{k+1}\|}{\max\{\|\frac{1}{t}(\hat x - x^{k+1})\|, \|\nabla f(x^{k+1})\|\}}$
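A hedged sketch of how such a stopping test plugs into the FBS loop from earlier (the tolerance value is my choice):

```python
import numpy as np

def fbs_with_stopping(grad_f, prox_g, x0, t, tol=1e-6, max_iters=5000):
    """FBS that stops when the relative residual falls below tol."""
    x = x0.copy()
    for _ in range(max_iters):
        x_hat = x - t * grad_f(x)
        x_new = prox_g(x_hat, t)
        sub_g = (x_hat - x_new) / t            # element of the subgradient of g
        r = sub_g + grad_f(x_new)              # residual, an element of subgradient of h
        denom = max(np.linalg.norm(sub_g), np.linalg.norm(grad_f(x_new)), 1e-12)
        x = x_new
        if np.linalg.norm(r) / denom < tol:    # relative residual test
            break
    return x
```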
WHY DUALITY?
No constraints, minimize $f(x)$: gradient descent, Newton's method, quasi-Newton, conjugate gradients, etc.
Non-differentiable, minimize $f(x)$: ????
Constrained problems, minimize $f(x)$ subject to $g(x) \le 0$, $h(x) = 0$: ????