A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization
|
|
- Scot Casey
- 5 years ago
- Views:
Transcription
1 A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London pp500 jointly with D.V. (Ryan) Luong, D. Rueckert, B. Rustem
2 2 / 29 Composite Convex Optimisation min x Ω f (x) + g(x) f : Ω R convex & Lipschitz continuous gradient, f (x) f (y) L x y g : Ω R convex, continuous, non-differentiable. g is simple
3 Algorithms A small sample First Order Algorithms min x f (x) + g(x) Iterative Shrinkage Thresholding Algorithm (ISTA, Proximal Point Algorithm) [Rockafellar, 1976] Accelerated Gradient Methods [Nesterov, 2013] Fast Iterative Shrinkage Thresholding Algorithm (FISTA) [Beck and Teboulle, 2009] Block Coordinate Descent [Nesterov, 2012] Incremental gradient/subgradient [Bertsekas, 2011] Mirror Descent [Ben-Tal et al., 2001] Smoothing Algorithms [Nesterov, 2005] Bundle Methods [Kiwiel, 1990] Dual Proximal Augmented Lagrangian Method [Yang and Zhang, 2011] Homotopy Methods [Donoho and Tsaig, 2008]
4 4 / 29 Composite Convex Optimisation min x Ω h f h (x) + g h (x) f h : Ω R convex & Lipschitz continuous gradient, f h (x) f h (y) L x y g h : Ω h R convex, continuous, non-differentiable. g h is simple Multilevel notation: h fine (full) model H coarse (approximate) model
5
6 Algorithms State of the art First Order Algorithms min x f h (x) + g h (x) Iterative Shrinkage Thresholding Algorithm (ISTA, Proximal Point Algorithm)[Rockafellar, 1976], [Beck and Teboulle, 2009] Accelerated Gradient Methods [Nesterov, 2013] Fast Iterative Shrinkage Thresholding Algorithm (FISTA) [Beck and Teboulle, 2009] Block Coordinate Descent [Nesterov, 2012] Incremental gradient/subgradient [Bertsekas, 2011] Mirror Descent [Ben-Tal et al., 2001] Smoothing Algorithms [Nesterov, 2005] Bundle Methods [Kiwiel, 1990] Dual Proximal Augmented Lagrangian Method [Yang and Zhang, 2011] Homotopy Methods [Donoho and Tsaig, 2008]
7 min x Ω h F h (x) f h (x) + g h (x) Iterative Shrinkage Thresholding Algorithm (ISTA ) 1 Iteration k: x h,k, f h,k, f h,k, L h. 2 Quadratic Approximation: Q L (x h,k, x) = f h,k + f h,k, x x h,k + L h 2 x x h,k 2 + g h (x) 3 Compute Gradient Map: (minimize Quadratic Approximation) D h,k = x h,k prox h (x h,k 1 L f h,k ) h = x h,k arg min x (x x h,k 1 ) 2 L f h,k + g h (x) h = x h,k arg min x Q L (x h,k, x) 4 Error Correction Step: x h,k+1 = x h,k s h,k D h,k.
8 min x Ω h F h (x) f h (x) + g h (x) Iterative Shrinkage Thresholding Algorithm (ISTA ) 1 Iteration k: x h,k, f h,k, f h,k, L h. 2 Quadratic Approximation: Q L (x h,k, x) = f h,k + f h,k, x x h,k + L h 2 x x h,k 2 + g h (x) 3 Compute Gradient Map: (minimize Quadratic Approximation) D h,k = x h,k prox h (x h,k 1 L f h,k ) h = x h,k arg min x (x x h,k 1 ) 2 L f h,k + g h (x) h = x h,k arg min x Q L (x h,k, x) 4 Error Correction Step: x h,k+1 = x h,k s h,k D h,k.
9 min x Ω h F h (x) f h (x) + g h (x) Iterative Shrinkage Thresholding Algorithm (ISTA ) 1 Iteration k: x h,k, f h,k, f h,k, L h. 2 Quadratic Approximation: Q L (x h,k, x) = f h,k + f h,k, x x h,k + L h 2 x x h,k 2 + g h (x) 3 Compute Gradient Map: (minimize Quadratic Approximation) D h,k = x h,k prox h (x h,k 1 L f h,k ) h = x h,k arg min x (x x h,k 1 ) 2 L f h,k + g h (x) h = x h,k arg min x Q L (x h,k, x) 4 Error Correction Step: x h,k+1 = x h,k s h,k D h,k.
10 min x Ω h F h (x) f h (x) + g h (x) Iterative Shrinkage Thresholding Algorithm (ISTA ) 1 Iteration k: x h,k, f h,k, f h,k, L h. 2 Quadratic Approximation: Q L (x h,k, x) = f h,k + f h,k, x x h,k + L h 2 x x h,k 2 + g h (x) 3 Compute Gradient Map: (minimize Quadratic Approximation) D h,k = x h,k prox h (x h,k 1 L f h,k ) h = x h,k arg min x (x x h,k 1 ) 2 L f h,k + g h (x) h = x h,k arg min x Q L (x h,k, x) 4 Error Correction Step: x h,k+1 = x h,k s h,k D h,k.
11 min x Ω h F h (x) f h (x) + g h (x) Iterative Shrinkage Thresholding Algorithm (ISTA ) 1 Iteration k: x h,k, f h,k, f h,k, L h. 2 Quadratic Approximation: Computationally favorable model 3 Compute Gradient Map: (minimize Quadratic Approximation) D h,k = x h,k prox h (x h,k 1 L f h,k ) h = x h,k arg min x (x x h,k 1 ) 2 L f h,k + g h (x) h = x h,k arg min x Q L (x h,k, x) 4 Error Correction Step: x h,k+1 = x h,k s h,k D h,k.
12 min x Ω h F h (x) f h (x) + g h (x) Iterative Shrinkage Thresholding Algorithm (ISTA ) 1 Iteration k: x h,k, f h,k, f h,k, L h. 2 Quadratic Approximation: Computationally favorable model 3 Compute Gradient Map Solve (approx) favorable model D h,k = x h,k prox h (x h,k 1 L f h,k ) h = x h,k arg min x (x x h,k 1 ) 2 L f h,k + g(x) h = x h,k arg min Q L (x h,k, x) x 4 Error Correction Step: x h,k+1 = x h,k s h,k D h,k.
13 min x Ω h F h (x) f h (x) + g h (x) Iterative Shrinkage Thresholding Algorithm (ISTA ) 1 Iteration k: x h,k, f h,k, f h,k, L h. 2 Quadratic Approximation: Computationally favorable model 3 Compute Gradient Map Solve (approx) favorable model D h,k = x h,k prox h (x h,k 1 L f h,k ) h = x h,k arg min x (x x h,k 1 ) 2 L f h,k + g(x) h = x h,k arg min Q L (x h,k, x) x 4 Error Correction Step: Compute & Apply Error Correction x h,k+1 = x h,k s h,k D h,k.
14 8 / 29 What is a Computationally Favorable Model? Global convergence rate: F h (x k ) F h O ( Lh k ) k 1 A smoother model (smaller Lipschitz constant) 2 A lower dimensional model (e.g x H R H with H < h) Numerical Experiment min Ax x R b 2 n 2 + µ x 1, b R m, A R m n (Gaussian N (0, 1)) 10 4 models for n = 25, 50, 100, 200, and 400
15 8 / 29 What is a Computationally Favorable Model? Global convergence rate: F h (x k ) F h O ( Lh k ) k 1 A smoother model (smaller Lipschitz constant) 2 A lower dimensional model (e.g x H R H with H < h) Numerical Experiment min Ax x R b 2 n 2 + µ x 1, b R m, A R m n (Gaussian N (0, 1)) 10 4 models for n = 25, 50, 100, 200, and 400
16 8 / 29 What is a Computationally Favorable Model? Global convergence rate: F h (x k ) F h O ( Lh k ) k 1 A smoother model (smaller Lipschitz constant) 2 A lower dimensional model (e.g x H R H with H < h) Numerical Experiment min Ax x R b 2 n 2 + µ x 1, b R m, A R m n (Gaussian N (0, 1)) 10 4 models for n = 25, 50, 100, 200, and 400
17 What is a Computationally Favorable Model? min Ax x R b 2 n 2 + µ x 1, Lipschitz Constant ISTA Iterations Lipschitz Constant # Iterations Model Dimension (n) Model Dimension (n)
18 What is a Computationally Favorable Model? min Ax x R b 2 n 2 + µ x 1, Lipschitz Constant Lipschitz Constant Model Dimension (n) # Iterations FISTA Iterations Model Dimension (n)
19 Background: Composite Convex Optimisation min f (x) + g(x) x Applications & Algorithms Quadratic Approximations & Search Directions Coarse Approximations Motivation A Multilevel Algorithm Information Transfer Coarse Model Construction Multilevel Iterative Shrinkage Thresholding (MISTA) Convergence Analysis
20 Start-of-the-art in Multilevel Optimization Nonlinear Optimization Nash, S. G. A multigrid approach to discretized optimization problems. Optimization Methods and Software, 2000 Gratton, S., Sartenaer, A., Toint, P. L.. Recursive trust-region methods for multiscale nonlinear optimization. SIAM Journal on Optimization, 2008 W., Zaiwen, and D. Goldfarb. A line search multigrid method for large-scale nonlinear optimization. SIAM Journal on Optimization, 2009 This work Nonsmooth/constrained problems Beyond PDEs! Same complexity ISTA but (10 ) faster, and 3 faster than FISTA PP, Duy V. N. Luong, Daniel Rueckert, Berc Rustem, A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization, Optimization Online, March 2014 Paper & Code: pp500/mista.html 11 / 29
21 Start-of-the-art in Multilevel Optimization Nonlinear Optimization Nash, S. G. A multigrid approach to discretized optimization problems. Optimization Methods and Software, 2000 Gratton, S., Sartenaer, A., Toint, P. L.. Recursive trust-region methods for multiscale nonlinear optimization. SIAM Journal on Optimization, 2008 W., Zaiwen, and D. Goldfarb. A line search multigrid method for large-scale nonlinear optimization. SIAM Journal on Optimization, 2009 This work Nonsmooth/constrained problems Beyond PDEs! Same complexity ISTA but (10 ) faster, and 3 faster than FISTA PP, Duy V. N. Luong, Daniel Rueckert, Berc Rustem, A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization, Optimization Online, March 2014 Paper & Code: pp500/mista.html 11 / 29
22 Information transfer between levels 12 / 29 Coarse model design vector: x H R H Fine model design vector: x h R h and h > H Two standard techniques RH h Restriction Operator: Ih H Prolongation Operator: IH h Rh H Main Assumption: IH h = c(ih h ), c > 0 I Geometric II Algebraic
23 Coarse Model Construction Smooth Case 13 / 29 First Order Coherent Condition min f h (x h ) Coarse Model: x H,0 = I H h x h,k, then f H,0 = I H h f h,k f H (x H ) fh (x H ) + Ih }{{} f h,k f H,0, x H }{{} coarse representation of f h H first order coherent [Lewis and Nash, 2005, Gratton et al., 2008, Wen and Goldfarb, 2009]
24 Coarse Model Construction Smooth Case 13 / 29 First Order Coherent Condition min f h (x h ) Coarse Model: x H,0 = I H h x h,k, then f H,0 = I H h f h,k f H (x H ) fh (x H ) + Ih }{{} f h,k f H,0, x H }{{} coarse representation of f h H first order coherent [Lewis and Nash, 2005, Gratton et al., 2008, Wen and Goldfarb, 2009]
25 Non-Smooth Case 14 / 29 min f h (x h ) + g h (x h ) Optimality Conditions Gradient Mapping D h,k = x h,k prox h (x h,k 1 L f h,k ) = x h,k arg min x (x x h,k 1 ) 2 L f h,k + g(x) D h,k = 0 if and only if x h,k is stationary. First Order Coherent Condition: D H,0 = I H h D h,k
26 Non-Smooth Case 14 / 29 min f h (x h ) + g h (x h ) Optimality Conditions Gradient Mapping D h,k = x h,k prox h (x h,k 1 L f h,k ) = x h,k arg min x (x x h,k 1 ) 2 L f h,k + g(x) D h,k = 0 if and only if x h,k is stationary. First Order Coherent Condition: D H,0 = I H h D h,k
27 Non-Smooth Case 14 / 29 min f h (x h ) + g h (x h ) Optimality Conditions Gradient Mapping D h,k = x h,k prox h (x h,k 1 L f h,k ) = x h,k arg min x (x x h,k 1 ) 2 L f h,k + g(x) D h,k = 0 if and only if x h,k is stationary. First Order Coherent Condition: D H,0 = I H h D h,k
28 Three Ways to Satisfy First Order Coherency 15 / 29 Fine Model (h) Coarse Model (H) min x h f h (x h ) + g h (x h ) min f H (x h ) + g H (x h ) + v x H }{{} H, x H }{{} coarse representation f h +g h force FOC 1 Smooth Coarse Model: v H = L H I H h D h,k ( f H,0 + g H,0 ). 2 Non-smooth model, dual norm projection v H = L H Lh I H h f h,k f H,0. 3 Non-smooth model, indicator projection ( )) v H = L H x H,0 f H,0 + L H Ih H prox h (x h,k 1Lh f h,k.
29 Three Ways to Satisfy First Order Coherency 15 / 29 Fine Model (h) Coarse Model (H) min x h f h (x h ) + g h (x h ) min f H (x h ) + g H (x h ) + v x H }{{} H, x H }{{} coarse representation f h +g h force FOC 1 Smooth Coarse Model: v H = L H I H h D h,k ( f H,0 + g H,0 ). 2 Non-smooth model, dual norm projection v H = L H Lh I H h f h,k f H,0. 3 Non-smooth model, indicator projection ( )) v H = L H x H,0 f H,0 + L H Ih H prox h (x h,k 1Lh f h,k.
30 Multilevel Iterative Shrinkage Thresholding Algorithm (MISTA) 1.0 If condition to use coarse model is satisfied at x h,k 1.1. Set x H,0 = I H h x h,k 1.2. m coarse iterations, any monotone algorithm 1.3. Compute feasible coarse correction term, 1.4. Update fine model d h,k = I h H (x H,0 x H,m ) x + = prox h (x h,k τd h,k ) x h,k+1 = x h,k s h,k (x h,k x + h ) 1.5. Go to Otherwise do a fine iteration, any monotone algorithm, go to / 29
31 Multilevel Iterative Shrinkage Thresholding Algorithm (MISTA) 1.0 If condition to use coarse model is satisfied at x h,k 1.1. Set x H,0 = I H h x h,k 1.2. m coarse iterations, any monotone algorithm 1.3. Compute feasible coarse correction term, 1.4. Update fine model d h,k = I h H (x H,0 x H,m ) x + = prox h (x h,k τd h,k ) x h,k+1 = x h,k s h,k (x h,k x + h ) 1.5. Go to Otherwise do a fine iteration, any monotone algorithm, go to / 29
32 Multilevel Iterative Shrinkage Thresholding Algorithm (MISTA) 1.0 If condition to use coarse model is satisfied at x h,k 1.1. Set x H,0 = I H h x h,k 1.2. m coarse iterations, any monotone algorithm 1.3. Compute feasible coarse correction term, 1.4. Update fine model d h,k = I h H (x H,0 x H,m ) x + = prox h (x h,k τd h,k ) x h,k+1 = x h,k s h,k (x h,k x + h ) 1.5. Go to Otherwise do a fine iteration, any monotone algorithm, go to / 29
33 Multilevel Iterative Shrinkage Thresholding Algorithm (MISTA) 1.0 If condition to use coarse model is satisfied at x h,k 1.1. Set x H,0 = I H h x h,k 1.2. m coarse iterations, any monotone algorithm 1.3. Compute feasible coarse correction term, 1.4. Update fine model d h,k = I h H (x H,0 x H,m ) x + = prox h (x h,k τd h,k ) x h,k+1 = x h,k s h,k (x h,k x + h ) 1.5. Go to Otherwise do a fine iteration, any monotone algorithm, go to / 29
34 18 / 29 Multilevel Iterative Shrinkage Thresholding Algorithm (MISTA) Conditions to use coarse model: Ih H D h,k >κ D h,k x h,k x h >η x h, where x h last point to use coarse correction Monotone Algorithm: e.g. ISTA Step size: Determined by convergence analysis
35 Global Convergence Rate Analysis 19 / 29 Main Result: Coarse correction step contraction on the optimal point: where σ (0, 1). x h,k+1 x h, 2 σ x h,k x h, 2,
36 Global Convergence Rate Analysis Main Result: Coarse correction step contraction on the optimal point: where σ (0, 1). Theorem x h,k+1 x h, 2 σ x h,k x h, 2, [Rockafellar, 1976, Theorem 1] Suppose that the coarse correction step satisfies the contraction property, and that f (x) is convex and has Lipschitz-continuous gradients. Then any MISTA step is at least nonexpansive (coarse correction steps are contractions and gradient proximal steps are non-expansive), x h,k+1 x h, 2 x h,k x h, 2, and the sequence {x h,k } converges R-linearly. 19 / 29
37 Global Convergence Rate Analysis Main Result: Coarse correction step contraction on the optimal point: where σ (0, 1). Theorem x h,k+1 x h, 2 σ x h,k x h, 2, [Rockafellar, 1976, Theorem 2] Suppose that the coarse correction step satisfies the contraction property, and that f (x) is strongly convex and has Lipschitz-continuous gradients. Then any MISTA step is always a contraction, x h,k+1 x h, 2 σ x h,k x h, 2, where σ (0, 1) and the sequence {x h,k } converges Q-linearly. 19 / 29
38 Key Lemma 20 / 29 Useful inequality: x y, f (x) f (y) 1 L f (x) f (y) 2. Lemma Consider two coarse correction terms generated by performing m iterations at the coarse level starting from the points x h,k and x h, : then the following inequality holds: d h,k = IH h e H,m = IH h m 1 s H,i D H,i i=0 d h, = IH h m 1 s H,i DH,i = 0 i=0 x h,k x h,, d h,k d h, m d h,k d h, 2 where m > 0 is a constant independent of x h,k & x h, (given in the paper).
39 Key Lemma 20 / 29 Useful inequality: x y, f (x) f (y) 1 L f (x) f (y) 2. Lemma Consider two coarse correction terms generated by performing m iterations at the coarse level starting from the points x h,k and x h, : then the following inequality holds: d h,k = IH h e H,m = IH h m 1 s H,i D H,i i=0 d h, = IH h m 1 s H,i DH,i = 0 i=0 x h,k x h,, d h,k d h, m d h,k d h, 2 where m > 0 is a constant independent of x h,k & x h, (given in the paper).
40 Generalization of Lipschitz Continuity f (x) f (y) 2 L x y 2 Lemma Suppose that a convergent algorithm with nonexpansive steps is applied at the coarse level (e.g. ISTA) then, d h,k d h, m2 ω 2 1 ω2 2 s2 H,0 x h,k x h, 2, ω 1, ω 2 known constants (depend on prolongation/restriction operators) m is the number of iterations in the coarse level s H,0 is the step size used in the coarse algorithm. 21 / 29
41 Generalization of Lipschitz Continuity f (x) f (y) 2 L x y 2 Lemma Suppose that a convergent algorithm with nonexpansive steps is applied at the coarse level (e.g. ISTA) then, d h,k d h, m2 ω 2 1 ω2 2 s2 H,0 x h,k x h, 2, ω 1, ω 2 known constants (depend on prolongation/restriction operators) m is the number of iterations in the coarse level s H,0 is the step size used in the coarse algorithm. 21 / 29
42 Main Theorem 22 / 29 Theorem (Contraction for coarse correction update) Let τ denote the step size in feasible projection step and s denote the step size used when updating the fine model then, x h,k+1 x h, 2 σ(s, τ) x h,k x h, 2 where σ(τ, s) = 2 + (τ)s 2 and, (τ) = 8 9 mω2 1 s2 H,0 (4mω2 2 τ2 2τ(1 + 2m)). There always exists τ > 0 such that, (τ) < 1 and, { } 1 2 s min, 1 (τ) (τ) Therefore σ(τ, s) < 1.
43
44
45 24 / 29 Numerical Experiments Image Restoration min x h A h x h b h µ h W (x h ) 1 b h input image A h blurring operator W ( ) wavelet transform x R h restored image, h = Algorithms run on Matlab & standard laptop
46 25 / 29 Image Restoration Coarse model min x H A H x H b H v H, x H + µ H W (x H ) i + ϱ ϱ i b h = I H h b H coarse input image A H = I H h A h(i H h ) blurring operator x R H H = or H =
47 Iteration Comparison 500 MISTA Gradient Update MISTA Coarse Correction FISTA ISTA 400 F(x) Iteration k 26 / 29
48 CPU Time Comparison (1% Noise) ISTA FISTA MISTA CPU Time (secs) Barb. Cam. Chess Cog Lady Lena Image 26 / 29
49 CPU Time Comparison Summary 26 / 29
50 Conclusions 27 / 29 Current Agorithms Quadratic Approx. Exploit Structure
51 Conclusions 27 / 29 Current Agorithms Multilevel Algorithms Quadratic Approx. Exploit Structure Hierarchical Approx. Exploit (more) Structure
52 Conclusions 27 / 29 Current Agorithms Multilevel Algorithms Numerical Results Quadratic Approx. Exploit Structure Hierarchical Approx. Exploit (more) Structure 3-4 faster than state of the art
53 Conclusions 27 / 29 Current Agorithms Multilevel Algorithms Numerical Results Quadratic Approx. Exploit Structure Hierarchical Approx. Exploit (more) Structure 3-4 faster than state of the art PP, Duy V. N. Luong, Daniel Rueckert, Berc Rustem, A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization, Optimization Online, March 2014 Paper & Code: pp500/mista.html
54 References I Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1): Ben-Tal, A., Margalit, T., and Nemirovski, A. (2001). The ordered subsets mirror descent optimization method with applications to tomography. SIAM Journal on Optimization, 12(1): Bertsekas, D. P. (2011). Incremental gradient, subgradient, and proximal methods for convex optimization: A survey. Optimization for Machine Learning, 2010:1 38. Donoho, D. L. and Tsaig, Y. (2008). Fast solution of-norm minimization problems when the solution may be sparse. Information Theory, IEEE Transactions on, 54(11): Gratton, S., Sartenae, A., and Toint, P. (2008). Recursive trust-region methods for multiscale nonlinear optimization. SIAM Journal on Optimization, 19(1): Kiwiel, K. C. (1990). Proximity control in bundle methods for convex nondifferentiable minimization. Mathematical Programming, 46(1-3): Lewis, R. and Nash, S. (2005). Model problems for the multigrid optimization of systems governed by differential equations. SIAM Journal on Scientific Computing, 26(6):
55 References II Nesterov, Y. (2005). Smooth minimization of non-smooth functions. Mathematical Programming, 103(1): Nesterov, Y. (2012). Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2): Nesterov, Y. (2013). Gradient methods for minimizing composite functions. Mathematical Programming, 140(1): Rockafellar, R. (1976). Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization, 14(5): Wen, Z. and Goldfarb, D. (2009). A line search multigrid method for large-scale nonlinear optimization. SIAM Journal on Optimization, 20(3): Yang, J. and Zhang, Y. (2011). Alternating direction algorithms for \ell 1-problems in compressive sensing. SIAM journal on scientific computing, 33(1):
Optimization methods
Optimization methods Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda /8/016 Introduction Aim: Overview of optimization methods that Tend to
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationAgenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples
Agenda Fast proximal gradient methods 1 Accelerated first-order methods 2 Auxiliary sequences 3 Convergence analysis 4 Numerical examples 5 Optimality of Nesterov s scheme Last time Proximal gradient method
More informationA Line search Multigrid Method for Large-Scale Nonlinear Optimization
A Line search Multigrid Method for Large-Scale Nonlinear Optimization Zaiwen Wen Donald Goldfarb Department of Industrial Engineering and Operations Research Columbia University 2008 Siam Conference on
More information6. Proximal gradient method
L. Vandenberghe EE236C (Spring 2016) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping
More informationFast proximal gradient methods
L. Vandenberghe EE236C (Spring 2013-14) Fast proximal gradient methods fast proximal gradient method (FISTA) FISTA with line search FISTA as descent method Nesterov s second method 1 Fast (proximal) gradient
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationProximal Minimization by Incremental Surrogate Optimization (MISO)
Proximal Minimization by Incremental Surrogate Optimization (MISO) (and a few variants) Julien Mairal Inria, Grenoble ICCOPT, Tokyo, 2016 Julien Mairal, Inria MISO 1/26 Motivation: large-scale machine
More information6. Proximal gradient method
L. Vandenberghe EE236C (Spring 2013-14) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping
More informationarxiv: v1 [math.oc] 1 Jul 2016
Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the
More information1 Sparsity and l 1 relaxation
6.883 Learning with Combinatorial Structure Note for Lecture 2 Author: Chiyuan Zhang Sparsity and l relaxation Last time we talked about sparsity and characterized when an l relaxation could recover the
More informationMultilevel Optimization Methods: Convergence and Problem Structure
Multilevel Optimization Methods: Convergence and Problem Structure Chin Pang Ho, Panos Parpas Department of Computing, Imperial College London, United Kingdom October 8, 206 Abstract Building upon multigrid
More informationAccelerated primal-dual methods for linearly constrained convex problems
Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize
More informationLasso: Algorithms and Extensions
ELE 538B: Sparsity, Structure and Inference Lasso: Algorithms and Extensions Yuxin Chen Princeton University, Spring 2017 Outline Proximal operators Proximal gradient methods for lasso and its extensions
More informationACCELERATED LINEARIZED BREGMAN METHOD. June 21, Introduction. In this paper, we are interested in the following optimization problem.
ACCELERATED LINEARIZED BREGMAN METHOD BO HUANG, SHIQIAN MA, AND DONALD GOLDFARB June 21, 2011 Abstract. In this paper, we propose and analyze an accelerated linearized Bregman (A) method for solving the
More informationSIAM Conference on Imaging Science, Bologna, Italy, Adaptive FISTA. Peter Ochs Saarland University
SIAM Conference on Imaging Science, Bologna, Italy, 2018 Adaptive FISTA Peter Ochs Saarland University 07.06.2018 joint work with Thomas Pock, TU Graz, Austria c 2018 Peter Ochs Adaptive FISTA 1 / 16 Some
More informationOptimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison
Optimization Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison optimization () cost constraints might be too much to cover in 3 hours optimization (for big
More informationMaster 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique
Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some
More informationConvex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013
Convex Optimization (EE227A: UC Berkeley) Lecture 15 (Gradient methods III) 12 March, 2013 Suvrit Sra Optimal gradient methods 2 / 27 Optimal gradient methods We saw following efficiency estimates for
More informationOptimization for Learning and Big Data
Optimization for Learning and Big Data Donald Goldfarb Department of IEOR Columbia University Department of Mathematics Distinguished Lecture Series May 17-19, 2016. Lecture 1. First-Order Methods for
More informationProximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725
Proximal Gradient Descent and Acceleration Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: subgradient method Consider the problem min f(x) with f convex, and dom(f) = R n. Subgradient method:
More informationAccelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems)
Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Donghwan Kim and Jeffrey A. Fessler EECS Department, University of Michigan
More informationJournal Club. A Universal Catalyst for First-Order Optimization (H. Lin, J. Mairal and Z. Harchaoui) March 8th, CMAP, Ecole Polytechnique 1/19
Journal Club A Universal Catalyst for First-Order Optimization (H. Lin, J. Mairal and Z. Harchaoui) CMAP, Ecole Polytechnique March 8th, 2018 1/19 Plan 1 Motivations 2 Existing Acceleration Methods 3 Universal
More informationLecture 9: September 28
0-725/36-725: Convex Optimization Fall 206 Lecturer: Ryan Tibshirani Lecture 9: September 28 Scribes: Yiming Wu, Ye Yuan, Zhihao Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These
More informationComposite nonlinear models at scale
Composite nonlinear models at scale Dmitriy Drusvyatskiy Mathematics, University of Washington Joint work with D. Davis (Cornell), M. Fazel (UW), A.S. Lewis (Cornell) C. Paquette (Lehigh), and S. Roy (UW)
More informationStochastic Optimization: First order method
Stochastic Optimization: First order method Taiji Suzuki Tokyo Institute of Technology Graduate School of Information Science and Engineering Department of Mathematical and Computing Sciences JST, PRESTO
More informationA weighted Mirror Descent algorithm for nonsmooth convex optimization problem
Noname manuscript No. (will be inserted by the editor) A weighted Mirror Descent algorithm for nonsmooth convex optimization problem Duy V.N. Luong Panos Parpas Daniel Rueckert Berç Rustem Received: date
More informationLECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE
LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization
More informationProximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization
Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R
More informationPrimal-dual coordinate descent A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions
Primal-dual coordinate descent A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions Olivier Fercoq and Pascal Bianchi Problem Minimize the convex function
More informationOne Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties
One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties Fedor S. Stonyakin 1 and Alexander A. Titov 1 V. I. Vernadsky Crimean Federal University, Simferopol,
More informationExpanding the reach of optimal methods
Expanding the reach of optimal methods Dmitriy Drusvyatskiy Mathematics, University of Washington Joint work with C. Kempton (UW), M. Fazel (UW), A.S. Lewis (Cornell), and S. Roy (UW) BURKAPALOOZA! WCOM
More informationConvex Optimization Algorithms for Machine Learning in 10 Slides
Convex Optimization Algorithms for Machine Learning in 10 Slides Presenter: Jul. 15. 2015 Outline 1 Quadratic Problem Linear System 2 Smooth Problem Newton-CG 3 Composite Problem Proximal-Newton-CD 4 Non-smooth,
More informationMS&E 318 (CME 338) Large-Scale Numerical Optimization
Stanford University, Management Science & Engineering (and ICME) MS&E 318 (CME 338) Large-Scale Numerical Optimization 1 Origins Instructor: Michael Saunders Spring 2015 Notes 9: Augmented Lagrangian Methods
More informationSPARSE SIGNAL RESTORATION. 1. Introduction
SPARSE SIGNAL RESTORATION IVAN W. SELESNICK 1. Introduction These notes describe an approach for the restoration of degraded signals using sparsity. This approach, which has become quite popular, is useful
More informationEfficient Quasi-Newton Proximal Method for Large Scale Sparse Optimization
Efficient Quasi-Newton Proximal Method for Large Scale Sparse Optimization Xiaocheng Tang Department of Industrial and Systems Engineering Lehigh University Bethlehem, PA 18015 xct@lehigh.edu Katya Scheinberg
More informationI P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION
I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION Peter Ochs University of Freiburg Germany 17.01.2017 joint work with: Thomas Brox and Thomas Pock c 2017 Peter Ochs ipiano c 1
More informationMinimizing Isotropic Total Variation without Subiterations
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Minimizing Isotropic Total Variation without Subiterations Kamilov, U. S. TR206-09 August 206 Abstract Total variation (TV) is one of the most
More informationCubic regularization of Newton s method for convex problems with constraints
CORE DISCUSSION PAPER 006/39 Cubic regularization of Newton s method for convex problems with constraints Yu. Nesterov March 31, 006 Abstract In this paper we derive efficiency estimates of the regularized
More informationDual and primal-dual methods
ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method
More informationACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING
ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING YANGYANG XU Abstract. Motivated by big data applications, first-order methods have been extremely
More informationSparse and Regularized Optimization
Sparse and Regularized Optimization In many applications, we seek not an exact minimizer of the underlying objective, but rather an approximate minimizer that satisfies certain desirable properties: sparsity
More informationCoordinate Update Algorithm Short Course Proximal Operators and Algorithms
Coordinate Update Algorithm Short Course Proximal Operators and Algorithms Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 36 Why proximal? Newton s method: for C 2 -smooth, unconstrained problems allow
More informationOWL to the rescue of LASSO
OWL to the rescue of LASSO IISc IBM day 2018 Joint Work R. Sankaran and Francis Bach AISTATS 17 Chiranjib Bhattacharyya Professor, Department of Computer Science and Automation Indian Institute of Science,
More informationProximal Newton Method. Ryan Tibshirani Convex Optimization /36-725
Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h
More informationNOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained
NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume
More informationAlternating Direction Augmented Lagrangian Algorithms for Convex Optimization
Alternating Direction Augmented Lagrangian Algorithms for Convex Optimization Joint work with Bo Huang, Shiqian Ma, Tony Qin, Katya Scheinberg, Zaiwen Wen and Wotao Yin Department of IEOR, Columbia University
More informationSparse Optimization Lecture: Dual Methods, Part I
Sparse Optimization Lecture: Dual Methods, Part I Instructor: Wotao Yin July 2013 online discussions on piazza.com Those who complete this lecture will know dual (sub)gradient iteration augmented l 1 iteration
More informationA Recursive Trust-Region Method for Non-Convex Constrained Minimization
A Recursive Trust-Region Method for Non-Convex Constrained Minimization Christian Groß 1 and Rolf Krause 1 Institute for Numerical Simulation, University of Bonn. {gross,krause}@ins.uni-bonn.de 1 Introduction
More informationDual Proximal Gradient Method
Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method
More informationA Unified Approach to Proximal Algorithms using Bregman Distance
A Unified Approach to Proximal Algorithms using Bregman Distance Yi Zhou a,, Yingbin Liang a, Lixin Shen b a Department of Electrical Engineering and Computer Science, Syracuse University b Department
More informationA NEW ITERATIVE METHOD FOR THE SPLIT COMMON FIXED POINT PROBLEM IN HILBERT SPACES. Fenghui Wang
A NEW ITERATIVE METHOD FOR THE SPLIT COMMON FIXED POINT PROBLEM IN HILBERT SPACES Fenghui Wang Department of Mathematics, Luoyang Normal University, Luoyang 470, P.R. China E-mail: wfenghui@63.com ABSTRACT.
More informationLecture 1: September 25
0-725: Optimization Fall 202 Lecture : September 25 Lecturer: Geoff Gordon/Ryan Tibshirani Scribes: Subhodeep Moitra Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have
More informationProximal Methods for Optimization with Spasity-inducing Norms
Proximal Methods for Optimization with Spasity-inducing Norms Group Learning Presentation Xiaowei Zhou Department of Electronic and Computer Engineering The Hong Kong University of Science and Technology
More informationLecture 8: February 9
0-725/36-725: Convex Optimiation Spring 205 Lecturer: Ryan Tibshirani Lecture 8: February 9 Scribes: Kartikeya Bhardwaj, Sangwon Hyun, Irina Caan 8 Proximal Gradient Descent In the previous lecture, we
More information2 Regularized Image Reconstruction for Compressive Imaging and Beyond
EE 367 / CS 448I Computational Imaging and Display Notes: Compressive Imaging and Regularized Image Reconstruction (lecture ) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement
More informationLecture: Algorithms for Compressed Sensing
1/56 Lecture: Algorithms for Compressed Sensing Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html wenzw@pku.edu.cn Acknowledgement:
More informationContraction Methods for Convex Optimization and Monotone Variational Inequalities No.16
XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of
More informationThe Proximal Gradient Method
Chapter 10 The Proximal Gradient Method Underlying Space: In this chapter, with the exception of Section 10.9, E is a Euclidean space, meaning a finite dimensional space endowed with an inner product,
More informationHomotopy methods based on l 0 norm for the compressed sensing problem
Homotopy methods based on l 0 norm for the compressed sensing problem Wenxing Zhu, Zhengshan Dong Center for Discrete Mathematics and Theoretical Computer Science, Fuzhou University, Fuzhou 350108, China
More informationAsynchronous Non-Convex Optimization For Separable Problem
Asynchronous Non-Convex Optimization For Separable Problem Sandeep Kumar and Ketan Rajawat Dept. of Electrical Engineering, IIT Kanpur Uttar Pradesh, India Distributed Optimization A general multi-agent
More informationAccelerated Block-Coordinate Relaxation for Regularized Optimization
Accelerated Block-Coordinate Relaxation for Regularized Optimization Stephen J. Wright Computer Sciences University of Wisconsin, Madison October 09, 2012 Problem descriptions Consider where f is smooth
More informationOn the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method
Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical
More informationSOR- and Jacobi-type Iterative Methods for Solving l 1 -l 2 Problems by Way of Fenchel Duality 1
SOR- and Jacobi-type Iterative Methods for Solving l 1 -l 2 Problems by Way of Fenchel Duality 1 Masao Fukushima 2 July 17 2010; revised February 4 2011 Abstract We present an SOR-type algorithm and a
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More informationLearning with stochastic proximal gradient
Learning with stochastic proximal gradient Lorenzo Rosasco DIBRIS, Università di Genova Via Dodecaneso, 35 16146 Genova, Italy lrosasco@mit.edu Silvia Villa, Băng Công Vũ Laboratory for Computational and
More informationCoordinate Update Algorithm Short Course Operator Splitting
Coordinate Update Algorithm Short Course Operator Splitting Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 25 Operator splitting pipeline 1. Formulate a problem as 0 A(x) + B(x) with monotone operators
More informationAccelerated Gradient Method for Multi-Task Sparse Learning Problem
Accelerated Gradient Method for Multi-Task Sparse Learning Problem Xi Chen eike Pan James T. Kwok Jaime G. Carbonell School of Computer Science, Carnegie Mellon University Pittsburgh, U.S.A {xichen, jgc}@cs.cmu.edu
More informationMathematics of Data: From Theory to Computation
Mathematics of Data: From Theory to Computation Prof. Volkan Cevher volkan.cevher@epfl.ch Lecture 8: Composite convex minimization I Laboratory for Information and Inference Systems (LIONS) École Polytechnique
More informationStochastic and online algorithms
Stochastic and online algorithms stochastic gradient method online optimization and dual averaging method minimizing finite average Stochastic and online optimization 6 1 Stochastic optimization problem
More informationIntroduction. New Nonsmooth Trust Region Method for Unconstraint Locally Lipschitz Optimization Problems
New Nonsmooth Trust Region Method for Unconstraint Locally Lipschitz Optimization Problems Z. Akbari 1, R. Yousefpour 2, M. R. Peyghami 3 1 Department of Mathematics, K.N. Toosi University of Technology,
More informationA Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem
A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem Kangkang Deng, Zheng Peng Abstract: The main task of genetic regulatory networks is to construct a
More informationBeyond Heuristics: Applying Alternating Direction Method of Multipliers in Nonconvex Territory
Beyond Heuristics: Applying Alternating Direction Method of Multipliers in Nonconvex Territory Xin Liu(4Ð) State Key Laboratory of Scientific and Engineering Computing Institute of Computational Mathematics
More informationParallel Coordinate Optimization
1 / 38 Parallel Coordinate Optimization Julie Nutini MLRG - Spring Term March 6 th, 2018 2 / 38 Contours of a function F : IR 2 IR. Goal: Find the minimizer of F. Coordinate Descent in 2D Contours of a
More informationA Brief Overview of Practical Optimization Algorithms in the Context of Relaxation
A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation Zhouchen Lin Peking University April 22, 2018 Too Many Opt. Problems! Too Many Opt. Algorithms! Zero-th order algorithms:
More informationPrimal-dual coordinate descent
Primal-dual coordinate descent Olivier Fercoq Joint work with P. Bianchi & W. Hachem 15 July 2015 1/28 Minimize the convex function f, g, h convex f is differentiable Problem min f (x) + g(x) + h(mx) x
More informationSmoothing Proximal Gradient Method. General Structured Sparse Regression
for General Structured Sparse Regression Xi Chen, Qihang Lin, Seyoung Kim, Jaime G. Carbonell, Eric P. Xing (Annals of Applied Statistics, 2012) Gatsby Unit, Tea Talk October 25, 2013 Outline Motivation:
More informationRandomized Coordinate Descent with Arbitrary Sampling: Algorithms and Complexity
Randomized Coordinate Descent with Arbitrary Sampling: Algorithms and Complexity Zheng Qu University of Hong Kong CAM, 23-26 Aug 2016 Hong Kong based on joint work with Peter Richtarik and Dominique Cisba(University
More informationRecent developments on sparse representation
Recent developments on sparse representation Zeng Tieyong Department of Mathematics, Hong Kong Baptist University Email: zeng@hkbu.edu.hk Hong Kong Baptist University Dec. 8, 2008 First Previous Next Last
More informationARock: an algorithmic framework for asynchronous parallel coordinate updates
ARock: an algorithmic framework for asynchronous parallel coordinate updates Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin ( UCLA Math, U.Waterloo DCO) UCLA CAM Report 15-37 ShanghaiTech SSDS 15 June 25,
More informationFrank-Wolfe Method. Ryan Tibshirani Convex Optimization
Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)
More informationPrimal-dual Subgradient Method for Convex Problems with Functional Constraints
Primal-dual Subgradient Method for Convex Problems with Functional Constraints Yurii Nesterov, CORE/INMA (UCL) Workshop on embedded optimization EMBOPT2014 September 9, 2014 (Lucca) Yu. Nesterov Primal-dual
More informationStochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure
Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure Alberto Bietti Julien Mairal Inria Grenoble (Thoth) March 21, 2017 Alberto Bietti Stochastic MISO March 21,
More informationAccelerated Proximal Gradient Methods for Convex Optimization
Accelerated Proximal Gradient Methods for Convex Optimization Paul Tseng Mathematics, University of Washington Seattle MOPTA, University of Guelph August 18, 2008 ACCELERATED PROXIMAL GRADIENT METHODS
More informationProximal methods. S. Villa. October 7, 2014
Proximal methods S. Villa October 7, 2014 1 Review of the basics Often machine learning problems require the solution of minimization problems. For instance, the ERM algorithm requires to solve a problem
More informationComparative Study of Restoration Algorithms ISTA and IISTA
Comparative Study of Restoration Algorithms ISTA and IISTA Kumaresh A K 1, Kother Mohideen S 2, Bremnavas I 3 567 Abstract our proposed work is to compare iterative shrinkage thresholding algorithm (ISTA)
More informationA direct formulation for sparse PCA using semidefinite programming
A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley A. d Aspremont, INFORMS, Denver,
More informationRandomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming
Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming Zhaosong Lu Lin Xiao June 25, 2013 Abstract In this paper we propose a randomized block coordinate non-monotone
More informationMIST: l 0 Sparse Linear Regression with Momentum
MIST: l 0 Sparse Linear Regression with Momentum Goran Marjanovic, Magnus O. Ulfarsson, Alfred O. Hero III arxiv:409.793v [stat.ml] 9 Mar 05 Abstract Significant attention has been given to minimizing
More informationConvex Optimization and l 1 -minimization
Convex Optimization and l 1 -minimization Sangwoon Yun Computational Sciences Korea Institute for Advanced Study December 11, 2009 2009 NIMS Thematic Winter School Outline I. Convex Optimization II. l
More informationFAST FIRST-ORDER METHODS FOR COMPOSITE CONVEX OPTIMIZATION WITH BACKTRACKING
FAST FIRST-ORDER METHODS FOR COMPOSITE CONVEX OPTIMIZATION WITH BACKTRACKING KATYA SCHEINBERG, DONALD GOLDFARB, AND XI BAI Abstract. We propose new versions of accelerated first order methods for convex
More informationSelected Methods for Modern Optimization in Data Analysis Department of Statistics and Operations Research UNC-Chapel Hill Fall 2018
Selected Methods for Modern Optimization in Data Analysis Department of Statistics and Operations Research UNC-Chapel Hill Fall 08 Instructor: Quoc Tran-Dinh Scriber: Quoc Tran-Dinh Lecture 4: Selected
More informationOn the acceleration of the double smoothing technique for unconstrained convex optimization problems
On the acceleration of the double smoothing technique for unconstrained convex optimization problems Radu Ioan Boţ Christopher Hendrich October 10, 01 Abstract. In this article we investigate the possibilities
More informationA Sparsity Preserving Stochastic Gradient Method for Composite Optimization
A Sparsity Preserving Stochastic Gradient Method for Composite Optimization Qihang Lin Xi Chen Javier Peña April 3, 11 Abstract We propose new stochastic gradient algorithms for solving convex composite
More informationEE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6)
EE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement to the material discussed in
More informationConvex Optimization Lecture 16
Convex Optimization Lecture 16 Today: Projected Gradient Descent Conditional Gradient Descent Stochastic Gradient Descent Random Coordinate Descent Recall: Gradient Descent (Steepest Descent w.r.t Euclidean
More informationAccelerated gradient methods
ELE 538B: Large-Scale Optimization for Data Science Accelerated gradient methods Yuxin Chen Princeton University, Spring 018 Outline Heavy-ball methods Nesterov s accelerated gradient methods Accelerated
More informationApproximate Message Passing Algorithms
November 4, 2017 Outline AMP (Donoho et al., 2009, 2010a) Motivations Derivations from a message-passing perspective Limitations Extensions Generalized Approximate Message Passing (GAMP) (Rangan, 2011)
More informationConvex Optimization on Large-Scale Domains Given by Linear Minimization Oracles
Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Arkadi Nemirovski H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology Joint research
More informationCoordinate Descent and Ascent Methods
Coordinate Descent and Ascent Methods Julie Nutini Machine Learning Reading Group November 3 rd, 2015 1 / 22 Projected-Gradient Methods Motivation Rewrite non-smooth problem as smooth constrained problem:
More information