Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013

Convex Optimization (EE227A: UC Berkeley), Lecture 15 (Gradient methods III), 12 March 2013, Suvrit Sra

Optimal gradient methods 2 / 27

Optimal gradient methods
We saw the following efficiency estimates for the gradient method:
$f \in C_L^1$:  $f(x^k) - f^* \le \dfrac{2L\,\|x^0 - x^*\|_2^2}{k+4}$
$f \in S_{L,\mu}^1$:  $f(x^k) - f^* \le \dfrac{L}{2}\left(\dfrac{L-\mu}{L+\mu}\right)^{2k} \|x^0 - x^*\|_2^2$
We also saw lower complexity bounds:
$f \in C_L^1$:  $f(x^k) - f(x^*) \ge \dfrac{3L\,\|x^0 - x^*\|_2^2}{32(k+1)^2}$
$f \in S_{L,\mu}$:  $f(x^k) - f(x^*) \ge \dfrac{\mu}{2}\left(\dfrac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}\right)^{2k} \|x^0 - x^*\|_2^2$
Can we close the gap?
3 / 27

Gradient with momentum
Polyak's method (aka heavy-ball) for $f \in S_{L,\mu}^1$:
$x^{k+1} = x^k - \alpha_k \nabla f(x^k) + \beta_k (x^k - x^{k-1})$
Converges (locally, i.e., for $\|x^0 - x^*\|_2 \le \epsilon$) as
$\|x^k - x^*\|_2^2 \le \left(\dfrac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}\right)^{2k} \|x^0 - x^*\|_2^2,$
for $\alpha_k = \dfrac{4}{(\sqrt{L}+\sqrt{\mu})^2}$ and $\beta_k = \left(\dfrac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}\right)^2$.
4 / 27
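To make the update concrete, here is a minimal NumPy sketch of the heavy-ball iteration with the constant parameters above, run on a hypothetical strongly convex quadratic; the test matrix, random seed, and function names are illustrative additions, not from the lecture.

    import numpy as np

    def heavy_ball(grad, x0, L, mu, iters=500):
        # Polyak's heavy-ball with the constant parameters from the slide:
        # alpha = 4/(sqrt(L)+sqrt(mu))^2, beta = ((sqrt(L)-sqrt(mu))/(sqrt(L)+sqrt(mu)))^2
        alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
        beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2
        x_prev, x = x0.copy(), x0.copy()
        for _ in range(iters):
            x_next = x - alpha * grad(x) + beta * (x - x_prev)
            x_prev, x = x, x_next
        return x

    # Hypothetical test problem: f(x) = 0.5 x'Ax - b'x with A symmetric positive definite
    rng = np.random.default_rng(0)
    Q = rng.standard_normal((20, 20))
    A = Q.T @ Q + np.eye(20)
    b = rng.standard_normal(20)
    mu, L = np.linalg.eigvalsh(A)[[0, -1]]            # smallest and largest eigenvalues
    x_star = np.linalg.solve(A, b)
    x_hb = heavy_ball(lambda x: A @ x - b, np.zeros(20), L, mu)
    print(np.linalg.norm(x_hb - x_star))              # should be tiny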

Nesterov's optimal gradient method
$\min_x f(x)$, where $f \in S_{L,\mu}^1$ with $\mu \ge 0$
1. Choose $x^0 \in \mathbb{R}^n$, $\alpha_0 \in (0, 1)$
2. Let $y^0 \leftarrow x^0$; set $q = \mu/L$
3. $k$-th iteration ($k \ge 0$):
a). Compute $f(y^k)$ and $\nabla f(y^k)$; update primary solution $x^{k+1} = y^k - \tfrac{1}{L}\nabla f(y^k)$
b). Compute stepsize $\alpha_{k+1}$ by solving $\alpha_{k+1}^2 = (1 - \alpha_{k+1})\,\alpha_k^2 + q\,\alpha_{k+1}$
c). Set $\beta_k = \alpha_k(1 - \alpha_k)/(\alpha_k^2 + \alpha_{k+1})$
d). Update secondary solution $y^{k+1} = x^{k+1} + \beta_k(x^{k+1} - x^k)$
5 / 27
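As an illustration, a small Python sketch of the scheme above; the closed-form root used in step b) and the least-squares test problem are my own additions, not part of the slides.

    import numpy as np

    def nesterov_optimal(grad, x0, L, mu=0.0, alpha0=0.5, iters=500):
        # Follows steps a)-d) above; alpha_{k+1} is the positive root of
        # alpha^2 + (alpha_k^2 - q) alpha - alpha_k^2 = 0 (the recursion in step b).
        q = mu / L
        x, y, alpha = x0.copy(), x0.copy(), alpha0
        for _ in range(iters):
            x_next = y - grad(y) / L                                     # a) primary update
            c = alpha ** 2 - q
            alpha_next = (-c + np.sqrt(c ** 2 + 4 * alpha ** 2)) / 2.0   # b) stepsize
            beta = alpha * (1 - alpha) / (alpha ** 2 + alpha_next)       # c) momentum
            y = x_next + beta * (x_next - x)                             # d) secondary update
            x, alpha = x_next, alpha_next
        return x

    # Hypothetical least-squares test: f(x) = 0.5*||Ax - b||^2, L = ||A||_2^2, mu = 0
    rng = np.random.default_rng(0)
    A = rng.standard_normal((40, 20)); b = rng.standard_normal(40)
    x_hat = nesterov_optimal(lambda x: A.T @ (A @ x - b), np.zeros(20), np.linalg.norm(A, 2) ** 2)
    print(np.linalg.norm(A.T @ (A @ x_hat - b)))      # gradient norm, should be small

For $\mu > 0$, pass mu and choose $\alpha_0 \ge \sqrt{\mu/L}$, as required by the theorem on the next slide.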

Optimal gradient method rate
Theorem. Let $\{x^k\}$ be the sequence generated by the above algorithm. If $\alpha_0 \ge \sqrt{\mu/L}$, then
$f(x^k) - f(x^*) \le c_1 \min\left\{ \left(1 - \sqrt{\tfrac{\mu}{L}}\right)^{k},\ \dfrac{4L}{(2\sqrt{L} + c_2 k)^2} \right\},$
where the constants $c_1, c_2$ depend on $\alpha_0$, $L$, $\mu$.
Proof: Somewhat involved; see notes.
6 / 27

Strongly convex case simplification
If $\mu > 0$, select $\alpha_0 = \sqrt{\mu/L}$. The two main steps,
1. Set $\beta_k = \alpha_k(1 - \alpha_k)/(\alpha_k^2 + \alpha_{k+1})$
2. $y^{k+1} = x^{k+1} + \beta_k(x^{k+1} - x^k)$,
get simplified:
$\alpha_k = \sqrt{\tfrac{\mu}{L}}, \qquad \beta_k = \dfrac{\sqrt{L} - \sqrt{\mu}}{\sqrt{L} + \sqrt{\mu}}, \qquad k \ge 0.$
The optimal method simplifies to
1. Choose $y^0 = x^0 \in \mathbb{R}^n$
2. $k$-th iteration ($k \ge 0$):
a). $x^{k+1} = y^k - \tfrac{1}{L}\nabla f(y^k)$
b). $y^{k+1} = x^{k+1} + \beta(x^{k+1} - x^k)$
Notice the similarity to Polyak's method!
7 / 27

Summary so far
Convex $f(x)$ with $\|g\| \le G$ for all subgradients $g$: subgradient method
Differentiable $f \in C_L^1$: gradient methods
Rate of convergence for smooth convex problems
Faster rate of convergence for smooth, strongly convex problems
Constrained gradient methods: Frank-Wolfe method
Constrained gradient methods: gradient projection
Nesterov's optimal gradient methods (smooth)
Gap between lower and upper bounds: $O(1/\sqrt{t})$ for convex (subgradient method); $O(1/t^2)$ for $C_L^1$; linear for smooth, strongly convex
8 / 27

Nonsmooth optimization
Unconstrained problem: $\min_{x \in \mathbb{R}^n} f(x)$, where
$f$ is convex on $\mathbb{R}^n$, and Lipschitz continuous on a bounded set $X$:
$|f(x) - f(y)| \le L\|x - y\|_2, \quad \forall x, y \in X.$
At each step, we have access to $f(x)$ and $g \in \partial f(x)$
Find $x^k \in \mathbb{R}^n$ such that $f(x^k) - f^* \le \epsilon$
First-order methods: $x^k \in x^0 + \operatorname{span}\{g^0, \ldots, g^{k-1}\}$
9 / 27

Nonsmooth optimization
EXAMPLE Let $\varphi(x) = |x|$ for $x \in \mathbb{R}$
Subgradient method: $x^{k+1} = x^k - \alpha_k g^k$, where $g^k \in \partial\varphi(x^k)$.
If $x^0 = 1$ and $\alpha_k = \tfrac{1}{\sqrt{k+1}} - \tfrac{1}{\sqrt{k+2}}$ (this stepsize is known to be optimal), then $x^k = \tfrac{1}{\sqrt{k+1}}$
Thus, $O(1/\epsilon^2)$ iterations are needed to obtain $\epsilon$-accuracy.
This behavior is typical for the subgradient method, which exhibits $O(1/\sqrt{k})$ convergence in general
Can we do better in general?
10 / 27
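A few lines of Python reproduce the claimed iterates under the stepsize written above; the exact stepsize formula on the original slide is partly garbled, so treat this choice as an assumption that is consistent with $x^k = 1/\sqrt{k+1}$.

    import numpy as np

    x = 1.0                                        # x^0 = 1
    for k in range(10):
        g = np.sign(x)                             # a subgradient of |x| at x (x stays > 0 here)
        alpha = 1 / np.sqrt(k + 1) - 1 / np.sqrt(k + 2)
        x = x - alpha * g
        print(k + 1, x, 1 / np.sqrt(k + 2))        # x^{k+1} agrees with 1/sqrt(k+2)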

Nonsmooth optimization
Nope!
Theorem (Nesterov.) Let $B = \{x : \|x - x^0\|_2 \le D\}$. Assume $x^* \in B$. There exists a convex function $f \in C_L^0(B)$ (with $L > 0$) such that, for $0 \le k \le n-1$, the lower bound
$f(x^k) - f(x^*) \ge \dfrac{LD}{2(1 + \sqrt{k+1})}$
holds for any algorithm that generates $x^k$ by linearly combining the previous iterates and subgradients.
Should we give up? No! Several possibilities remain!
11 / 27

Nonsmooth optimization
Blackbox model too pessimistic
Nesterov's breakthroughs: excessive gap technique; composite objective minimization
Nemirovski's workhorses of general convex optimization: mirror descent, NERML; mirror-prox
Other techniques, problem classes, etc.
12 / 27

Proximal splitting 13 / 27

Composite objectives
Frequently, nonsmooth problems take the form
minimize $f(x) := \ell(x) + r(x)$
(figure: $f = \ell + r$)
Example: $\ell(x) = \tfrac12\|Ax - b\|^2$ and $r(x) = \lambda\|x\|_1$: Lasso, L1-LS, compressed sensing
Example: $\ell(x)$: logistic loss, and $r(x) = \lambda\|x\|_1$: L1-logistic regression, sparse LR
14 / 27

Composite objective minimization
minimize $f(x) := \ell(x) + r(x)$
subgradient: $x^{k+1} = x^k - \alpha_k g^k$, $g^k \in \partial f(x^k)$
subgradient: converges slowly, at rate $O(1/\sqrt{k})$
but: $f$ is smooth plus nonsmooth
we should exploit the smoothness of $\ell$ for a better method!
15 / 27

Projections: another view
Let $\mathbb{I}_X$ be the indicator function of the closed, convex set $X$, defined as
$\mathbb{I}_X(x) := \begin{cases} 0 & \text{if } x \in X, \\ \infty & \text{otherwise.} \end{cases}$
Recall the orthogonal projection $P_X(y)$:
$P_X(y) := \operatorname{argmin}\ \tfrac12\|x - y\|_2^2 \quad \text{s.t. } x \in X.$
Rewrite the orthogonal projection $P_X(y)$ as
$P_X(y) := \operatorname{argmin}_{x \in \mathbb{R}^n}\ \tfrac12\|x - y\|_2^2 + \mathbb{I}_X(x).$
16 / 27

Generalizing projections: proximity
Projection: $P_X(y) := \operatorname{argmin}_{x \in \mathbb{R}^n}\ \tfrac12\|x - y\|_2^2 + \mathbb{I}_X(x)$
Proximity: replace $\mathbb{I}_X$ by some convex function!
$\operatorname{prox}_r(y) := \operatorname{argmin}_{x \in \mathbb{R}^n}\ \tfrac12\|x - y\|_2^2 + r(x)$
Def. $\operatorname{prox}_r : \mathbb{R}^n \to \mathbb{R}^n$ is called a proximity operator.
17 / 27
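Two concrete instances may help: when $r = \mathbb{I}_X$ the prox is exactly the projection $P_X$, and for a simple smooth $r$ the prox has a closed form. A short Python sketch; the ball radius, test vector, and function names are arbitrary choices for illustration.

    import numpy as np

    # prox_r(y) = argmin_x 0.5*||x - y||^2 + r(x), two instances:
    # r = indicator of the Euclidean ball {x: ||x||_2 <= rho}  ->  prox is the projection P_X
    # r = (lam/2)*||x||_2^2                                    ->  prox is shrinkage y/(1 + lam)

    def proj_ball(y, rho):
        nrm = np.linalg.norm(y)
        return y if nrm <= rho else (rho / nrm) * y

    def prox_sq_l2(y, lam):
        return y / (1.0 + lam)

    y = np.array([3.0, 4.0])
    print(proj_ball(y, 1.0))     # [0.6, 0.8]
    print(prox_sq_l2(y, 1.0))    # [1.5, 2.0]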

Proximity operator (figure: $\ell_1$-norm ball of radius $\rho(\lambda)$) 18 / 27

Proximity operators
Exercise: Let $r(x) = \|x\|_1$. Solve for $\operatorname{prox}_{\lambda r}(y)$:
$\min_{x \in \mathbb{R}^n}\ \tfrac12\|x - y\|_2^2 + \lambda\|x\|_1.$
Hint 1: The above problem decomposes into $n$ independent subproblems of the form $\min_{x \in \mathbb{R}}\ \tfrac12(x - y)^2 + \lambda|x|.$
Hint 2: Consider the two cases separately: either $x = 0$ or $x \neq 0$.
Aka: the soft-thresholding operator.
19 / 27
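For reference, one possible solution to the exercise, the soft-thresholding operator, as a short Python snippet (the function name is mine).

    import numpy as np

    def soft_threshold(y, lam):
        # prox_{lam*||.||_1}(y): elementwise sgn(y_i) * max(|y_i| - lam, 0)
        return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

    y = np.array([-2.0, -0.3, 0.0, 0.5, 3.0])
    print(soft_threshold(y, 1.0))   # zeros the small entries, shrinks the rest toward 0 by 1.0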

Basics of proximal splitting
Recall gradient projection for solving $\min_{x \in X} f(x)$ for $f \in C_L^1$:
$x^{k+1} = P_X(x^k - \alpha_k \nabla f(x^k))$
The proximal gradient method solves $\min\ \ell(x) + r(x)$:
$x^{k+1} = \operatorname{prox}_{\alpha_k r}(x^k - \alpha_k \nabla \ell(x^k)).$
This method is aka forward-backward splitting (FBS)
Forward step: the gradient-descent step
Backward step: the prox-operator
20 / 27

FBS example: Lasso / L1-LS
$\min\ \tfrac12\|Ax - b\|_2^2 + \lambda\|x\|_1.$
$\operatorname{prox}_{\lambda\|\cdot\|_1}(y) = \operatorname{sgn}(y) \odot \max(|y| - \lambda,\ 0)$
$x^{k+1} = \operatorname{prox}_{\alpha_k\lambda\|\cdot\|_1}\bigl(x^k - \alpha_k A^T(Ax^k - b)\bigr).$
The so-called iterative soft-thresholding algorithm!
Exercise: Try solving the problem $\min\ \tfrac12\|Ax - b\|_2^2 + \lambda\|x\|_2.$
21 / 27
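A minimal Python sketch of the iterative soft-thresholding iteration on a synthetic Lasso instance; the constant step $\alpha = 1/\|A\|_2^2$, the random data, and the function name are assumptions for illustration, not prescribed by the slide.

    import numpy as np

    def ista(A, b, lam, iters=1000):
        # Forward-backward splitting (ISTA) for min 0.5*||Ax - b||_2^2 + lam*||x||_1
        alpha = 1.0 / np.linalg.norm(A, 2) ** 2                         # constant step 1/L, L = ||A||_2^2
        x = np.zeros(A.shape[1])
        for _ in range(iters):
            z = x - alpha * (A.T @ (A @ x - b))                         # forward: gradient step on the smooth part
            x = np.sign(z) * np.maximum(np.abs(z) - alpha * lam, 0.0)   # backward: soft-thresholding
        return x

    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 100))
    x_true = np.zeros(100); x_true[:5] = rng.standard_normal(5)
    b = A @ x_true + 0.01 * rng.standard_normal(50)
    print(np.round(ista(A, b, lam=0.1)[:8], 3))                         # recovers a sparse vector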

Exercise
Recall our older example: $\tfrac12\|D^T x - b\|_2^2$. We solved its unconstrained and constrained versions so far. Now implement a Matlab script to solve
$\min\ \tfrac12\|D^T x - b\|_2^2 + \lambda\|x\|_1.$
Use FBS as shown above (a Python sketch follows this slide)
Try different values of $\lambda > 0$ in your code
Use different choices of $b$
Show a choice of $b$ for which $x = 0$ (the zero vector) is optimal
Experiment with different stepsizes (try $\alpha_k = 1/4$ and, if that does not work, try smaller values that might work). Do not expect monotonic descent
Compare with versions of the subgradient method
22 / 27
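The promised sketch, in Python rather than the requested Matlab, with hypothetical random stand-ins for $D^T$ and $b$ since the earlier example's data is not reproduced here.

    import numpy as np

    def fbs_l1(Dt, b, lam, alpha, iters=2000):
        # FBS for min 0.5*||D^T x - b||_2^2 + lam*||x||_1
        x = np.zeros(Dt.shape[1])
        for _ in range(iters):
            z = x - alpha * (Dt.T @ (Dt @ x - b))                       # forward: gradient step
            x = np.sign(z) * np.maximum(np.abs(z) - alpha * lam, 0.0)   # backward: soft-threshold
        return x

    rng = np.random.default_rng(0)
    Dt = rng.standard_normal((40, 80))                # plays the role of D^T (hypothetical)
    b = rng.standard_normal(40)
    alpha = 1.0 / np.linalg.norm(Dt, 2) ** 2          # safe fixed step (the slide suggests trying 1/4, then smaller)
    for lam in (0.1, 1.0, 10.0):                      # different values of lambda
        x = fbs_l1(Dt, b, lam, alpha)
        print(lam, np.count_nonzero(np.abs(x) > 1e-8))   # solution gets sparser as lambda grows
    # Note: x = 0 is optimal exactly when ||D b||_inf = max(abs(Dt.T @ b)) <= lam.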

Proximity operators 23 / 27

Proximity operators
$\operatorname{prox}_r$ has several nice properties.
Read / skim the paper: Proximal Splitting Methods in Signal Processing, by Combettes and Pesquet (2010).
Theorem. The operator $\operatorname{prox}_r$ is firmly nonexpansive (FNE):
$\|\operatorname{prox}_r x - \operatorname{prox}_r y\|_2^2 \le \langle \operatorname{prox}_r x - \operatorname{prox}_r y,\ x - y\rangle$
Proof: (blackboard)
Corollary. The operator $\operatorname{prox}_r$ is nonexpansive.
Proof: apply Cauchy-Schwarz to FNE.
24 / 27

Consequence of FNE
Gradient projection: $x^{k+1} = P_X(x^k - \alpha_k \nabla f(x^k))$
Proximal gradients / FBS: $x^{k+1} = \operatorname{prox}_{\alpha_k r}(x^k - \alpha_k \nabla \ell(x^k))$
The same convergence theory goes through!
Exercise: Try extending the proof of gradient-projection convergence to a convergence proof for FBS.
Hint: First show that $x^*$ satisfies the fixed-point equation
$x^* = \operatorname{prox}_{\alpha r}(x^* - \alpha \nabla \ell(x^*)), \quad \alpha > 0.$
25 / 27
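For the hint, a short derivation of the fixed-point property from the optimality condition $0 \in \nabla \ell(x^*) + \partial r(x^*)$, using the characterization $z = \operatorname{prox}_{\alpha r}(y) \iff y - z \in \alpha\,\partial r(z)$ (a sketch, not the full FBS convergence proof):

    \begin{align*}
    0 \in \nabla \ell(x^*) + \partial r(x^*)
      &\iff x^* - \alpha \nabla \ell(x^*) \in x^* + \alpha\, \partial r(x^*) \\
      &\iff x^* = \operatorname{prox}_{\alpha r}\bigl(x^* - \alpha \nabla \ell(x^*)\bigr).
    \end{align*}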

Moreau decomposition
Aim: Compute $\operatorname{prox}_r y$
Sometimes it is easier to compute $\operatorname{prox}_{r^*} y$, where $r^*$ is the conjugate
$r^*(u) := \sup_x\ \langle u, x\rangle - r(x)$
Moreau decomposition: $y = \operatorname{prox}_r y + \operatorname{prox}_{r^*} y$
26 / 27
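A quick numerical sanity check of the decomposition for $r = \lambda\|\cdot\|_1$, whose conjugate is the indicator of the $\ell_\infty$-ball of radius $\lambda$, so that $\operatorname{prox}_{r^*}$ is clipping; the test vector and $\lambda$ are arbitrary.

    import numpy as np

    lam = 1.5
    y = np.array([-3.0, -1.0, 0.2, 2.5])
    prox_r = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)   # prox of lam*||.||_1 (soft-threshold)
    prox_rstar = np.clip(y, -lam, lam)                       # prox of r^* = projection onto {||u||_inf <= lam}
    print(np.allclose(y, prox_r + prox_rstar))               # True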

Moreau decomposition
Proof sketch: Consider $\min_x\ \tfrac12\|x - y\|_2^2 + r(x)$
Introduce a new variable $z = x$, to get
$\operatorname{prox}_r y := \operatorname{argmin}_{x,z}\ \tfrac12\|x - y\|_2^2 + r(z), \quad \text{s.t. } x = z$
Derive the Lagrangian dual for this
Simplify, and conclude!
27 / 27
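A sketch of the dual calculation the slide asks for, under the usual strong-duality assumptions:

    \begin{align*}
    L(x, z, u) &= \tfrac{1}{2}\|x - y\|_2^2 + r(z) + \langle u,\, x - z\rangle, \\
    g(u) = \min_{x, z} L(x, z, u)
      &= \bigl(\langle u, y\rangle - \tfrac{1}{2}\|u\|_2^2\bigr) - r^*(u)
       = \tfrac{1}{2}\|y\|_2^2 - \Bigl(\tfrac{1}{2}\|u - y\|_2^2 + r^*(u)\Bigr),
    \end{align*}

where the minimization over $x$ gives $x = y - u$ and the minimization over $z$ gives $-r^*(u)$. Maximizing $g$ is therefore the prox problem for $r^*$, so the dual optimum is $u^* = \operatorname{prox}_{r^*}(y)$; strong duality and $x^* = y - u^*$ then yield $\operatorname{prox}_r(y) = y - \operatorname{prox}_{r^*}(y)$, which is the Moreau decomposition.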
