A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization


1 A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization
Panos Parpas, Department of Computing, Imperial College London (pp500)
Jointly with D.V. (Ryan) Luong, D. Rueckert, B. Rustem

2 Composite Convex Optimisation
$\min_{x \in \Omega} f(x) + g(x)$
$f : \Omega \to \mathbb{R}$ convex with Lipschitz continuous gradient, $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$
$g : \Omega \to \mathbb{R}$ convex, continuous, non-differentiable; $g$ is simple
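For concreteness, here is a minimal Python sketch (not from the slides) of one composite objective of this form, ℓ1-regularised least squares: the smooth part f has a Lipschitz-continuous gradient, and the non-smooth part g is "simple" because its proximal operator is closed-form soft-thresholding. The problem sizes, the weight mu, and all names are illustrative assumptions.

```python
import numpy as np

def make_lasso(m=40, n=100, mu=0.1, seed=0):
    """Composite objective F(x) = f(x) + g(x) with
    f(x) = ||Ax - b||^2   (convex, Lipschitz-continuous gradient) and
    g(x) = mu * ||x||_1   (convex, non-smooth, but 'simple')."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)

    f = lambda x: float(np.sum((A @ x - b) ** 2))
    grad_f = lambda x: 2.0 * A.T @ (A @ x - b)
    g = lambda x: mu * float(np.sum(np.abs(x)))
    # prox of t*g: soft-thresholding -- this closed form is what makes g 'simple'
    prox_g = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t * mu, 0.0)
    # Lipschitz constant of grad f: 2 * lambda_max(A^T A)
    L = 2.0 * np.linalg.norm(A, 2) ** 2
    return f, grad_f, g, prox_g, L
```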

3 Algorithms: A small sample
First Order Algorithms for $\min_x f(x) + g(x)$:
Iterative Shrinkage Thresholding Algorithm (ISTA, Proximal Point Algorithm) [Rockafellar, 1976]
Accelerated Gradient Methods [Nesterov, 2013]
Fast Iterative Shrinkage Thresholding Algorithm (FISTA) [Beck and Teboulle, 2009]
Block Coordinate Descent [Nesterov, 2012]
Incremental gradient/subgradient [Bertsekas, 2011]
Mirror Descent [Ben-Tal et al., 2001]
Smoothing Algorithms [Nesterov, 2005]
Bundle Methods [Kiwiel, 1990]
Dual Proximal Augmented Lagrangian Method [Yang and Zhang, 2011]
Homotopy Methods [Donoho and Tsaig, 2008]

4 Composite Convex Optimisation
$\min_{x \in \Omega_h} f_h(x) + g_h(x)$
$f_h : \Omega_h \to \mathbb{R}$ convex with Lipschitz continuous gradient, $\|\nabla f_h(x) - \nabla f_h(y)\| \le L \|x - y\|$
$g_h : \Omega_h \to \mathbb{R}$ convex, continuous, non-differentiable; $g_h$ is simple
Multilevel notation: $h$ fine (full) model, $H$ coarse (approximate) model


6 Algorithms: State of the art
First Order Algorithms for $\min_x f_h(x) + g_h(x)$:
Iterative Shrinkage Thresholding Algorithm (ISTA, Proximal Point Algorithm) [Rockafellar, 1976], [Beck and Teboulle, 2009]
Accelerated Gradient Methods [Nesterov, 2013]
Fast Iterative Shrinkage Thresholding Algorithm (FISTA) [Beck and Teboulle, 2009]
Block Coordinate Descent [Nesterov, 2012]
Incremental gradient/subgradient [Bertsekas, 2011]
Mirror Descent [Ben-Tal et al., 2001]
Smoothing Algorithms [Nesterov, 2005]
Bundle Methods [Kiwiel, 1990]
Dual Proximal Augmented Lagrangian Method [Yang and Zhang, 2011]
Homotopy Methods [Donoho and Tsaig, 2008]

7 $\min_{x \in \Omega_h} F_h(x) \triangleq f_h(x) + g_h(x)$
Iterative Shrinkage Thresholding Algorithm (ISTA)
1. Iteration $k$: $x_{h,k}$, $f_{h,k}$, $\nabla f_{h,k}$, $L_h$.
2. Quadratic Approximation (a computationally favorable model):
$Q_L(x_{h,k}, x) = f_{h,k} + \langle \nabla f_{h,k},\, x - x_{h,k} \rangle + \tfrac{L_h}{2} \|x - x_{h,k}\|^2 + g_h(x)$
3. Compute Gradient Map (solve, approximately, the favorable model):
$D_{h,k} = x_{h,k} - \mathrm{prox}_h\!\left(x_{h,k} - \tfrac{1}{L_h}\nabla f_{h,k}\right)
         = x_{h,k} - \arg\min_x \tfrac{L_h}{2}\left\|x - \left(x_{h,k} - \tfrac{1}{L_h}\nabla f_{h,k}\right)\right\|^2 + g_h(x)
         = x_{h,k} - \arg\min_x Q_L(x_{h,k}, x)$
4. Error Correction Step (compute & apply error correction): $x_{h,k+1} = x_{h,k} - s_{h,k} D_{h,k}$.

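A minimal Python sketch of the ISTA iteration just described, reusing the helper sketched after slide 2: minimise the quadratic approximation via the prox, form the gradient map $D_k$, and take the error correction step. The fixed step $s_{h,k} = 1$ and the stopping tolerance are illustrative choices, not the slides' settings.

```python
import numpy as np

def ista(grad_f, prox_g, L, x0, max_iter=500, tol=1e-8, s=1.0):
    """ISTA: x_{k+1} = x_k - s_k * D_k, where the gradient map
    D_k = x_k - argmin_x Q_L(x_k, x) is computed via the prox of g."""
    x = x0.copy()
    for _ in range(max_iter):
        x_plus = prox_g(x - grad_f(x) / L, 1.0 / L)  # minimiser of Q_L(x_k, .)
        D = x - x_plus                               # gradient map D_k
        if np.linalg.norm(D) <= tol:                 # D_k = 0 iff x_k is stationary
            break
        x = x - s * D                                # error correction step (s = 1: x <- x_plus)
    return x

# usage with the lasso sketch given earlier:
# f, grad_f, g, prox_g, L = make_lasso()
# x_hat = ista(grad_f, prox_g, L, np.zeros(100))
```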

14 What is a Computationally Favorable Model?
Global convergence rate: $F_h(x_k) - F_h^* \le O\!\left(\tfrac{L_h}{k}\right)$
1. A smoother model (smaller Lipschitz constant)
2. A lower dimensional model (e.g. $x_H \in \mathbb{R}^H$ with $H < h$)
Numerical Experiment: $\min_{x \in \mathbb{R}^n} \|Ax - b\|_2^2 + \mu \|x\|_1$, $b \in \mathbb{R}^m$, $A \in \mathbb{R}^{m \times n}$ (Gaussian $N(0,1)$ entries); $10^4$ models for $n = 25, 50, 100, 200$, and $400$


17 What is a Computationally Favorable Model?
$\min_{x \in \mathbb{R}^n} \|Ax - b\|_2^2 + \mu \|x\|_1$
[Figure: Lipschitz constant and number of ISTA iterations versus model dimension $n$]

18 What is a Computationally Favorable Model?
$\min_{x \in \mathbb{R}^n} \|Ax - b\|_2^2 + \mu \|x\|_1$
[Figure: Lipschitz constant and number of FISTA iterations versus model dimension $n$]
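The trend in these plots can be reproduced in spirit with the short sketch below: for Gaussian $A$ the Lipschitz constant of the smooth part, $2\lambda_{\max}(A^\top A)$, grows with the model dimension $n$, which is why lower-dimensional (coarse) models are computationally favorable. The fixed $m$, the sample count, and the dimension grid are assumptions rather than the slides' exact experimental setup.

```python
import numpy as np

def lipschitz_vs_dimension(m=200, dims=(25, 50, 100, 200, 400), samples=20, seed=0):
    """Average Lipschitz constant L = 2 * lambda_max(A^T A) of f(x) = ||Ax - b||^2
    for Gaussian A in R^{m x n}, as the model dimension n grows."""
    rng = np.random.default_rng(seed)
    avg_L = {}
    for n in dims:
        Ls = [2.0 * np.linalg.norm(rng.standard_normal((m, n)), 2) ** 2
              for _ in range(samples)]
        avg_L[n] = float(np.mean(Ls))
    return avg_L

print(lipschitz_vs_dimension())  # L increases with n: smaller (coarser) models are smoother
```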

19 Background: Composite Convex Optimisation $\min_x f(x) + g(x)$
Applications & Algorithms
Quadratic Approximations & Search Directions
Coarse Approximations: Motivation
A Multilevel Algorithm
Information Transfer
Coarse Model Construction
Multilevel Iterative Shrinkage Thresholding (MISTA)
Convergence Analysis

20 State-of-the-art in Multilevel Optimization
Nonlinear Optimization:
Nash, S. G. A multigrid approach to discretized optimization problems. Optimization Methods and Software, 2000
Gratton, S., Sartenaer, A., Toint, P. L. Recursive trust-region methods for multiscale nonlinear optimization. SIAM Journal on Optimization, 2008
Wen, Zaiwen, and D. Goldfarb. A line search multigrid method for large-scale nonlinear optimization. SIAM Journal on Optimization, 2009
This work:
Nonsmooth/constrained problems
Beyond PDEs!
Same complexity as ISTA but (10×) faster, and 3× faster than FISTA
PP, Duy V. N. Luong, Daniel Rueckert, Berc Rustem, A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization, Optimization Online, March 2014
Paper & Code: pp500/mista.html


22 Information transfer between levels
Coarse model design vector: $x_H \in \mathbb{R}^H$; fine model design vector: $x_h \in \mathbb{R}^h$, with $h > H$.
Two standard techniques:
Restriction Operator: $I_h^H : \mathbb{R}^h \to \mathbb{R}^H$
Prolongation Operator: $I_H^h : \mathbb{R}^H \to \mathbb{R}^h$
Main Assumption: $I_H^h = c\,(I_h^H)^\top$, $c > 0$
I. Geometric  II. Algebraic
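A minimal sketch of a geometric pair of operators satisfying the assumption above: a linear-interpolation prolongation $I_H^h$ and the matching restriction $I_h^H$ with $I_H^h = c\,(I_h^H)^\top$. The specific interpolation stencil, the factor $c = 2$, and the sizes are assumptions chosen for illustration.

```python
import numpy as np

def prolongation(H):
    """Geometric (linear interpolation) prolongation I_H^h: R^H -> R^h with h = 2H."""
    h = 2 * H
    P = np.zeros((h, H))
    for j in range(H):
        P[2 * j, j] = 1.0          # coarse value copied to the matching fine point
        P[2 * j + 1, j] = 0.5      # fine point between coarse nodes: average of neighbours
        if j + 1 < H:
            P[2 * j + 1, j + 1] = 0.5
    return P

H = 4
P = prolongation(H)    # I_H^h, shape (2H, H)
R = 0.5 * P.T          # I_h^H, so that I_H^h = c * (I_h^H)^T with c = 2
x_h = np.arange(2 * H, dtype=float)
print(R @ x_h)         # coarse representation of the fine vector x_h
```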

23 Coarse Model Construction: Smooth Case
First Order Coherent Condition for $\min f_h(x_h)$.
Coarse Model: with $x_{H,0} = I_h^H x_{h,k}$, we require $\nabla f_{H,0} = I_h^H \nabla f_{h,k}$ (first order coherent):
$f_H(x_H) \triangleq \hat f_H(x_H) + \langle I_h^H \nabla f_{h,k} - \nabla \hat f_{H,0},\, x_H \rangle$
(first term: coarse representation of $f_h$; linear term: enforces first order coherence)
[Lewis and Nash, 2005, Gratton et al., 2008, Wen and Goldfarb, 2009]

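A small numerical check of this construction, under the reconstruction above: build the coarse model's gradient at $x_{H,0}$ as the coarse-representation gradient plus the fixed linear correction, and verify first-order coherence. The quadratic coarse representation and the random operators are assumptions made purely for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
h, H = 8, 4
R = rng.standard_normal((H, h))          # restriction I_h^H (random stand-in)

# fine model f_h and coarse representation \hat f_H: random convex quadratics
Q_h = rng.standard_normal((h, h)); Q_h = Q_h.T @ Q_h
Q_H = rng.standard_normal((H, H)); Q_H = Q_H.T @ Q_H
grad_fh = lambda x: Q_h @ x
grad_fH_hat = lambda x: Q_H @ x

x_hk = rng.standard_normal(h)
x_H0 = R @ x_hk
# linear correction of the coarse model: I_h^H grad f_{h,k} - grad \hat f_{H,0}
v = R @ grad_fh(x_hk) - grad_fH_hat(x_H0)
grad_fH = lambda x: grad_fH_hat(x) + v   # gradient of f_H(x) = \hat f_H(x) + <v, x>

# first-order coherence: grad f_{H,0} = I_h^H grad f_{h,k}
assert np.allclose(grad_fH(x_H0), R @ grad_fh(x_hk))
```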

25 Non-Smooth Case
$\min f_h(x_h) + g_h(x_h)$
Optimality Conditions / Gradient Mapping:
$D_{h,k} = x_{h,k} - \mathrm{prox}_h\!\left(x_{h,k} - \tfrac{1}{L_h}\nabla f_{h,k}\right)
         = x_{h,k} - \arg\min_x \tfrac{L_h}{2}\left\|x - \left(x_{h,k} - \tfrac{1}{L_h}\nabla f_{h,k}\right)\right\|^2 + g_h(x)$
$D_{h,k} = 0$ if and only if $x_{h,k}$ is stationary.
First Order Coherent Condition: $D_{H,0} = I_h^H D_{h,k}$


28 Three Ways to Satisfy First Order Coherency
Fine Model ($h$): $\min_{x_h} f_h(x_h) + g_h(x_h)$
Coarse Model ($H$): $\min_{x_H} f_H(x_H) + g_H(x_H) + \langle v_H, x_H \rangle$, where $f_H + g_H$ is the coarse representation of $f_h + g_h$ and the linear term $\langle v_H, x_H \rangle$ forces first order coherency (FOC).
1. Smooth Coarse Model: $v_H = L_H\, I_h^H D_{h,k} - (\nabla f_{H,0} + \nabla g_{H,0})$.
2. Non-smooth model, dual norm projection: $v_H = \tfrac{L_H}{L_h}\, I_h^H \nabla f_{h,k} - \nabla f_{H,0}$.
3. Non-smooth model, indicator projection: $v_H = L_H x_{H,0} - \nabla f_{H,0} - L_H\, I_h^H\, \mathrm{prox}_h\!\left(x_{h,k} - \tfrac{1}{L_h}\nabla f_{h,k}\right)$.

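A sketch of the first option above (the smooth coarse model), under the reconstruction given on that slide: the linear correction $v_H$ is chosen so that the coarse gradient map at $x_{H,0}$ equals the restricted fine gradient map $I_h^H D_{h,k}$. The function name and its arguments are placeholders, not the paper's interface.

```python
import numpy as np

def smooth_coarse_correction(R, D_hk, grad_fH0, grad_gH0, L_H):
    """Option 1: v_H = L_H * I_h^H D_{h,k} - (grad f_{H,0} + grad g_{H,0}),
    where R plays the role of the restriction operator I_h^H."""
    return L_H * (R @ D_hk) - (grad_fH0 + grad_gH0)

# quick first-order coherence check on random data: the coarse gradient map at
# x_{H,0}, namely (grad f_{H,0} + grad g_{H,0} + v_H) / L_H, equals I_h^H D_{h,k}
rng = np.random.default_rng(1)
h, H, L_H = 8, 4, 1.0
R = rng.standard_normal((H, h))
D_hk = rng.standard_normal(h)
gf, gg = rng.standard_normal(H), rng.standard_normal(H)
v_H = smooth_coarse_correction(R, D_hk, gf, gg, L_H)
assert np.allclose((gf + gg + v_H) / L_H, R @ D_hk)
```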

30 Multilevel Iterative Shrinkage Thresholding Algorithm (MISTA)
1.0 If the condition to use the coarse model is satisfied at $x_{h,k}$:
1.1 Set $x_{H,0} = I_h^H x_{h,k}$
1.2 Perform $m$ coarse iterations with any monotone algorithm
1.3 Compute the feasible coarse correction term $d_{h,k} = I_H^h (x_{H,0} - x_{H,m})$
1.4 Update the fine model: $x_h^+ = \mathrm{prox}_h(x_{h,k} - \tau d_{h,k})$, $x_{h,k+1} = x_{h,k} - s_{h,k}(x_{h,k} - x_h^+)$
1.5 Go to 1.0
2.0 Otherwise do a fine iteration with any monotone algorithm and go to 1.0


34 Multilevel Iterative Shrinkage Thresholding Algorithm (MISTA)
Conditions to use the coarse model:
$\|I_h^H D_{h,k}\| > \kappa \|D_{h,k}\|$ and $\|x_{h,k} - \bar x_h\| > \eta \|\bar x_h\|$,
where $\bar x_h$ is the last point at which a coarse correction was used.
Monotone Algorithm: e.g. ISTA
Step size: determined by the convergence analysis
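Putting the pieces together, a heavily simplified Python sketch of the MISTA loop described on the preceding slides. The coarse-level solver is left abstract (it should run $m$ monotone iterations of a first-order coherent coarse model built at the current fine point), and the parameter values kappa, eta, tau, s, and m are schematic placeholders rather than the paper's rules.

```python
import numpy as np

def mista(grad_f, prox_g, L, R, P, coarse_solver, x0,
          m=5, kappa=0.5, eta=1e-3, tau=1.0, s=1.0, max_iter=200):
    """Schematic MISTA: take a coarse correction step when the restricted
    gradient map is informative enough, otherwise take a fine ISTA step."""
    x = x0.copy()
    x_bar = x0.copy()   # last point at which a coarse correction was used
    for _ in range(max_iter):
        D = x - prox_g(x - grad_f(x) / L, 1.0 / L)       # fine gradient map D_{h,k}
        use_coarse = (np.linalg.norm(R @ D) > kappa * np.linalg.norm(D)
                      and np.linalg.norm(x - x_bar) > eta * np.linalg.norm(x_bar))
        if use_coarse:
            xH0 = R @ x                                  # 1.1 restrict to the coarse level
            xHm = coarse_solver(xH0, x, m)               # 1.2 m monotone coarse iterations
            d = P @ (xH0 - xHm)                          # 1.3 coarse correction term
            x_plus = prox_g(x - tau * d, tau)            # 1.4 feasible correction
            x = x - s * (x - x_plus)
            x_bar = x.copy()
        else:
            x = x - s * D                                # 2.0 fine (ISTA) iteration
    return x
```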

35 Global Convergence Rate Analysis
Main Result: the coarse correction step is a contraction on the optimal point:
$\|x_{h,k+1} - x_{h,*}\|^2 \le \sigma \|x_{h,k} - x_{h,*}\|^2$, where $\sigma \in (0, 1)$.

36 Global Convergence Rate Analysis
Main Result: the coarse correction step is a contraction on the optimal point:
$\|x_{h,k+1} - x_{h,*}\|^2 \le \sigma \|x_{h,k} - x_{h,*}\|^2$, where $\sigma \in (0, 1)$.
Theorem [Rockafellar, 1976, Theorem 1]. Suppose that the coarse correction step satisfies the contraction property, and that $f(x)$ is convex and has Lipschitz-continuous gradients. Then any MISTA step is at least nonexpansive (coarse correction steps are contractions and gradient proximal steps are non-expansive),
$\|x_{h,k+1} - x_{h,*}\|^2 \le \|x_{h,k} - x_{h,*}\|^2$,
and the sequence $\{x_{h,k}\}$ converges R-linearly.

37 Global Convergence Rate Analysis
Main Result: the coarse correction step is a contraction on the optimal point:
$\|x_{h,k+1} - x_{h,*}\|^2 \le \sigma \|x_{h,k} - x_{h,*}\|^2$, where $\sigma \in (0, 1)$.
Theorem [Rockafellar, 1976, Theorem 2]. Suppose that the coarse correction step satisfies the contraction property, and that $f(x)$ is strongly convex and has Lipschitz-continuous gradients. Then any MISTA step is always a contraction,
$\|x_{h,k+1} - x_{h,*}\|^2 \le \sigma \|x_{h,k} - x_{h,*}\|^2$,
where $\sigma \in (0, 1)$, and the sequence $\{x_{h,k}\}$ converges Q-linearly.

38 Key Lemma
Useful inequality: $\langle x - y,\, \nabla f(x) - \nabla f(y) \rangle \ge \tfrac{1}{L} \|\nabla f(x) - \nabla f(y)\|^2$.
Lemma. Consider two coarse correction terms generated by performing $m$ iterations at the coarse level, starting from the points $x_{h,k}$ and $x_{h,*}$:
$d_{h,k} = I_H^h e_{H,m} = I_H^h \sum_{i=0}^{m-1} s_{H,i} D_{H,i}$,  $d_{h,*} = I_H^h \sum_{i=0}^{m-1} s_{H,i} D_{H,i}^* = 0$.
Then the following inequality holds:
$\langle x_{h,k} - x_{h,*},\, d_{h,k} - d_{h,*} \rangle \ge c_m \|d_{h,k} - d_{h,*}\|^2$,
where $c_m > 0$ is a constant independent of $x_{h,k}$ and $x_{h,*}$ (given in the paper).


40 Generalization of Lipschitz Continuity
$\|\nabla f(x) - \nabla f(y)\|^2 \le L^2 \|x - y\|^2$
Lemma. Suppose that a convergent algorithm with nonexpansive steps is applied at the coarse level (e.g. ISTA). Then
$\|d_{h,k} - d_{h,*}\|^2 \le m^2 \omega_1^2 \omega_2^2 s_{H,0}^2 \|x_{h,k} - x_{h,*}\|^2$,
where $\omega_1, \omega_2$ are known constants (depending on the prolongation/restriction operators), $m$ is the number of iterations at the coarse level, and $s_{H,0}$ is the step size used by the coarse algorithm.


42 Main Theorem
Theorem (Contraction for the coarse correction update). Let $\tau$ denote the step size in the feasible projection step and $s$ the step size used when updating the fine model. Then
$\|x_{h,k+1} - x_{h,*}\|^2 \le \sigma(\tau, s)\, \|x_{h,k} - x_{h,*}\|^2$,
where $\sigma(\tau, s)$ depends on $s$ through the quantity
$\Delta(\tau) = \tfrac{8}{9}\, m\, \omega_1^2\, s_{H,0}^2 \left( 4 m \omega_2^2 \tau^2 - 2\tau(1 + 2m) \right)$.
There always exists $\tau > 0$ such that $\Delta(\tau) < 1$, and for a step size $s$ chosen small enough in terms of $\Delta(\tau)$ (the explicit bound, a minimum of two quantities, is given in the paper), $\sigma(\tau, s) < 1$.


45 Numerical Experiments: Image Restoration
$\min_{x_h} \|A_h x_h - b_h\|_2^2 + \mu_h \|W(x_h)\|_1$
$b_h$: input image
$A_h$: blurring operator
$W(\cdot)$: wavelet transform
$x \in \mathbb{R}^h$: restored image, $h =$
Algorithms run in Matlab on a standard laptop

46 Image Restoration: Coarse model
$\min_{x_H} \|A_H x_H - b_H\|_2^2 + \langle v_H, x_H \rangle + \mu_H \sum_i \left( |W(x_H)_i + \varrho_i| - \varrho_i \right)$
$b_H = I_h^H b_h$: coarse input image
$A_H = I_h^H A_h I_H^h$: blurring operator
$x_H \in \mathbb{R}^H$, $H =$ or $H =$ (two coarse resolutions)
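A sketch of how the coarse data and coarse blurring operator above can be assembled from the fine-level ones with restriction/prolongation matrices, i.e. $b_H = I_h^H b_h$ and the Galerkin-style product $A_H = I_h^H A_h I_H^h$. The tiny random matrices here are stand-ins for the actual image-sized operators, and the piecewise-constant prolongation is an assumption for illustration.

```python
import numpy as np

def coarse_image_model(A_h, b_h, R, P):
    """Coarse operator and data for the restoration problem:
    b_H = I_h^H b_h and A_H = I_h^H A_h I_H^h."""
    return R @ A_h @ P, R @ b_h

# tiny stand-in example (in the experiments h and H are full image resolutions)
h, H = 8, 4
rng = np.random.default_rng(0)
A_h = rng.standard_normal((h, h))
b_h = rng.standard_normal(h)
P = np.zeros((h, H))
P[::2, :] += np.eye(H)     # piecewise-constant prolongation I_H^h
P[1::2, :] += np.eye(H)
R = 0.5 * P.T              # matching restriction I_h^H
A_H, b_H = coarse_image_model(A_h, b_h, R, P)
print(A_H.shape, b_H.shape)   # (4, 4) (4,)
```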

47 Iteration Comparison
[Figure: objective value $F(x)$ versus iteration $k$ for MISTA (gradient update and coarse correction steps), FISTA, and ISTA]

48 CPU Time Comparison (1% Noise)
[Figure: CPU time (secs) of ISTA, FISTA, and MISTA on the Barb., Cam., Chess, Cog, Lady, and Lena images]

49 CPU Time Comparison Summary
[Summary figure]


53 Conclusions
Current Algorithms: Quadratic Approx.; Exploit Structure
Multilevel Algorithms: Hierarchical Approx.; Exploit (more) Structure
Numerical Results: 3-4× faster than state of the art
PP, Duy V. N. Luong, Daniel Rueckert, Berc Rustem, A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization, Optimization Online, March 2014
Paper & Code: pp500/mista.html

54 References I
Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1).
Ben-Tal, A., Margalit, T., and Nemirovski, A. (2001). The ordered subsets mirror descent optimization method with applications to tomography. SIAM Journal on Optimization, 12(1).
Bertsekas, D. P. (2011). Incremental gradient, subgradient, and proximal methods for convex optimization: A survey. Optimization for Machine Learning, 2010:1-38.
Donoho, D. L. and Tsaig, Y. (2008). Fast solution of ℓ1-norm minimization problems when the solution may be sparse. IEEE Transactions on Information Theory, 54(11).
Gratton, S., Sartenaer, A., and Toint, P. (2008). Recursive trust-region methods for multiscale nonlinear optimization. SIAM Journal on Optimization, 19(1).
Kiwiel, K. C. (1990). Proximity control in bundle methods for convex nondifferentiable minimization. Mathematical Programming, 46(1-3).
Lewis, R. and Nash, S. (2005). Model problems for the multigrid optimization of systems governed by differential equations. SIAM Journal on Scientific Computing, 26(6).

55 References II
Nesterov, Y. (2005). Smooth minimization of non-smooth functions. Mathematical Programming, 103(1).
Nesterov, Y. (2012). Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2).
Nesterov, Y. (2013). Gradient methods for minimizing composite functions. Mathematical Programming, 140(1).
Rockafellar, R. (1976). Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization, 14(5).
Wen, Z. and Goldfarb, D. (2009). A line search multigrid method for large-scale nonlinear optimization. SIAM Journal on Optimization, 20(3).
Yang, J. and Zhang, Y. (2011). Alternating direction algorithms for ℓ1-problems in compressive sensing. SIAM Journal on Scientific Computing, 33(1).
