Random coordinate descent algorithms for huge-scale optimization problems. Ion Necoara
1 Random coordinate descent algorithms for huge-scale optimization problems Ion Necoara Automatic Control and Systems Engineering Department 1
2 Acknowledgement Collaboration with Y. Nesterov, F. Glineur ( Univ. Catholique Louvain) A. Patrascu, D. Clipici (Univ. Politehnica Bucharest) Papers can be found at: 2
3 Outline Motivation Problem formulation Previous work Random coordinate descent alg. for smooth convex problems Random coordinate descent alg. for composite convex problems Random coordinate descent alg. for composite nonconvex problems Conclusions 3
4 Motivation Recent Big Data applications: (a) internet (e.g. PageRank) (b) support vector machine (c) truss topology design (d) distributed control ... gave birth to many huge-scale optimization problems (dimension of variables n very large) BUT the matrices defining the optimization problem are very sparse! 4
5 Motivation PageRank problem (Google ranking, network control, data analysis). Let E ∈ R^{n×n} be the adjacency matrix (column-stochastic, sparse). Find the maximal unitary eigenvector satisfying Ex = x. Number of variables (pages): n. Standard technique: power method; calculations of PageRank on supercomputers take about one week! Formulation as an optimization problem: min_{x∈R^n} (1/2)‖Ex − x‖² s.t. e^T x = 1, x ≥ 0. E has at most p << n nonzeros on each row, so this is an instance of min_{x∈R^n} (1/2) x^T Z^T Z x + q^T x s.t. a^T x = b, l ≤ x ≤ u (Z sparse!) 5
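As a point of reference for the later experiments, the power-method baseline mentioned above can be sketched in a few lines. This is a toy dense example with an illustrative 3-page graph, not data from the talk; the real problem is huge and sparse.

```python
import numpy as np

def power_method(E, tol=1e-10, max_iter=1000):
    """Power iteration for the principal eigenvector of a column-stochastic
    matrix E: returns x with Ex ~ x, e^T x = 1, x >= 0 (the PageRank vector)."""
    n = E.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        x_new = E @ x
        x_new /= x_new.sum()              # keep e^T x = 1 (a no-op up to rounding)
        if np.linalg.norm(x_new - x, 1) < tol:
            return x_new
        x = x_new
    return x

# Tiny 3-page example: column-stochastic adjacency matrix (illustrative)
E = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
x = power_method(E)
```

Each iteration costs one sparse matrix-vector product, i.e. O(pn) operations, which is exactly the cost the coordinate descent methods of the talk try to undercut.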
6 Motivation Linear SVM problem. Let z_i ∈ R^m, i = 1,...,n, be a set of training data points, m << n, belonging to two classes. Find a hyperplane a^T y = b which separates the data points z_i into the two classes. Formulation as an optimization problem: min_{a∈R^m, b∈R} (1/2)‖a‖² + C e^T ξ s.t. α_i (a^T z_i − b) ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1,...,n, where α_i ∈ {−1, 1} is the id (label) of the class corresponding to z_i; n is very big (many constraints). 6
7 Motivation The dual formulation for the linear SVM: min_{x∈R^n} (1/2) x^T (Z^T Z) x − e^T x s.t. α^T x = 0, 0 ≤ x ≤ Ce. Here Z ∈ R^{m×n}, m << n, depends on the training points z_i (the columns of Z are α_i z_i), or Z ∈ R^{n×n} with sparse columns. The primal solution is recovered via a = Σ_i α_i x_i z_i and b = Σ_i (a^T z_i − α_i)/n. Again an instance of min_{x∈R^n} (1/2) x^T Z^T Z x + q^T x s.t. a^T x = b, l ≤ x ≤ u (Z sparse!) 7
8 Motivation State-of-the-art: 1. Second-order algorithms (Newton method, interior-point method): solve at least one linear system per iteration. Second-order methods: complexity per iteration O(n³), worst-case number of iterations O(ln ln(1/ε)) / O(ln(1/ε)), where ε is the desired accuracy for solving the optimization problem. For n = 10^8, a standard computer with a 2 GHz processor takes 10^7 years to finish only 1 iteration! 8
9 Motivation 2. First-order algorithms (gradient method, fast gradient method): perform at least one matrix-vector multiplication per iteration (in the quadratic case). First-order methods: complexity per iteration O(n²), worst-case number of iterations O(1/ε) / O(1/√ε). For n = 10^8, a standard computer with a 2 GHz processor takes hours per iteration and 100 days to attain ε = 0.01 accuracy! Conclusion: for such huge n we require algorithms with low complexity per iteration, O(n) or even O(1): Coordinate Descent Methods 9
10 Problem formulation F* = min_{x∈R^n} F(x) (= f(x) + h(x)) s.t. a^T x = b (or even Ax = b) - coupling constraints. Define the block decompositions n = Σ_{i=1}^N n_i and I_n = [E_1 ... E_N] with E_i ∈ R^{n×n_i}, so that x = Σ_{i=1}^N E_i x_i with x_i ∈ R^{n_i}, and x_ij = E_i x_i + E_j x_j. Assumptions: (i) A ∈ R^{m×n}, m << n; (ii) f has block-component Lipschitz continuous gradient, i.e. ‖∇_i f(x + E_i s_i) − ∇_i f(x)‖ ≤ L_i ‖s_i‖ for all x ∈ R^n, s_i ∈ R^{n_i}, i = 1,...,N; (iii) h nonsmooth, convex and componentwise separable, i.e. h(x) = Σ_{i=1}^n h_i(x_i), e.g. h = 0 or h = 1_[l,u] or h = µ‖x‖_1 ... 10
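The block decomposition I_n = [E_1 ... E_N] can be made concrete with a small sketch; the sizes below are illustrative, and the E_i are just column slices of the identity, so E_i^T x extracts the block x_i and Σ_i E_i x_i reassembles x.

```python
import numpy as np

n, block_sizes = 6, [2, 3, 1]       # n = sum of block sizes n_i (illustrative)
I = np.eye(n)
offsets = np.cumsum([0] + block_sizes)
# I_n = [E_1 ... E_N], with E_i in R^{n x n_i}
E = [I[:, offsets[i]:offsets[i + 1]] for i in range(len(block_sizes))]

x = np.arange(1.0, n + 1.0)
x_blocks = [E_i.T @ x for E_i in E]                    # block components x_i = E_i^T x
x_rebuilt = sum(E_i @ xi for E_i, xi in zip(E, x_blocks))  # x = sum_i E_i x_i

g = x                               # for f(x) = 0.5*||x||^2, grad f(x) = x
g_1 = E[0].T @ g                    # block gradient: grad_1 f(x) = E_1^T grad f(x)
```

In an actual implementation one would of course index slices directly instead of multiplying by E_i; the matrices only serve to fix notation.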
11 Previous work - Greedy algorithms Tseng (2009) developed coordinate gradient descent methods with a greedy strategy for min_{x∈R^n} F(x) (= f(x) + h(x)) s.t. a^T x = b (or Ax = b). Let J ⊆ {1,...,N} be the set of indices at the current iterate x, and define the direction d_H(x; J) = arg min_{s∈R^n} f(x) + ⟨∇f(x), s⟩ + (1/2)⟨Hs, s⟩ + h(x+s) s.t. a^T s = 0, s_j = 0 ∀ j ∉ J, (1) where H ∈ R^{n×n} is a positive definite matrix chosen at the initial step of the algorithm. Tseng & Yun, A Block-Coordinate Gradient Descent Method for Linearly Constrained Nonsmooth Separable Optimization, J. Opt. Theory Applications, 2009. 11
12 Previous work - Greedy algorithms Algorithm (CGD): 1. Choose a set of indices J_k ⊆ {1,...,N} w.r.t. the Gauss-Southwell rule 2. Solve (1) with x = x_k, J = J_k, H = H_k to obtain d_k = d_{H_k}(x_k; J_k) 3. Choose a stepsize α_k > 0 and set x_{k+1} = x_k + α_k d_k. Procedure for choosing J_k (Gauss-Southwell rule): (i) decompose the projected gradient direction d_k into low-dimensional vectors, (ii) evaluate the objective of (1) at each low-dimensional vector, (iii) choose the vector with the smallest evaluation and assign its support to J. Alg. (CGD) takes O(n) operations per iteration (for the quadratic case & A = a). An estimate for the rate of convergence of the objective function values is O(nL‖x^0 − x*‖²/ε), L = max_i L_i. Recently Beck (2012) developed a greedy coordinate descent algorithm (approx. the same complexity) for singly linear constrained models with h a box indicator function 12
13 Previous work - Random algorithms Nesterov (2010) derived complexity estimates of random coordinate descent methods for min_{x∈Q} f(x), Q = Q_1 × ... × Q_N convex, h(x) = 1_Q(x), f convex with block-component Lipschitz gradient, a = 0 (no coupling constraints). Algorithm (RCGD): 1. Choose randomly an index i_k with respect to a given probability p_{i_k} 2. Set x_{k+1} = x_k − (1/L_{i_k}) E_{i_k} ∇_{i_k} f(x_k). We can choose Lipschitz-dependent probabilities p_i = L_i / Σ_{i=1}^N L_i. For structured cases (sparse matrices with p << n nonzeros per row) the method has complexity per iteration O(p)! 13
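A minimal sketch of the (RCGD) step x_i ← x_i − ∇_i f(x)/L_i on a small quadratic f(x) = (1/2)x^T A x − b^T x, where L_i = A_ii and indices are drawn with the Lipschitz-dependent probabilities. The matrix and sizes are illustrative, not from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

def rcgd(A, b, iters=5000):
    """Random coordinate gradient descent sketch on f(x) = 0.5 x^T A x - b^T x,
    A symmetric positive definite; coordinate Lipschitz constants L_i = A_ii,
    sampling probabilities p_i = L_i / sum_j L_j."""
    n = len(b)
    L = np.diag(A).copy()
    p = L / L.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(n, p=p)
        g_i = A[i] @ x - b[i]        # i-th component of the gradient only
        x[i] -= g_i / L[i]           # coordinate step of size 1/L_i
    return x

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x = rcgd(A, b)
x_star = np.linalg.solve(A, b)       # reference solution of grad f = 0
```

For a sparse A with at most p nonzeros per row, `A[i] @ x` costs O(p), which is the per-iteration complexity claimed on the slide.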
14 Previous work - Random algorithms An estimate for the rate of convergence of the expected objective values for Nesterov's method (RCGD) is O(Σ_{i=1}^N L_i ‖x^0 − x*‖²/ε). Richtarik (2012), Lu (2012) extended the complexity estimates of Nesterov's random coordinate descent method to the composite case min_{x∈R^n} F(x) (= f(x) + h(x)), f convex with block-component Lipschitz gradient, h nonsmooth, convex, block-separable; parallel implementations & inexact implementations were also analyzed. Y. Nesterov, Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM J. Opt., 2012. 14
15 Random coordinate descent - smooth & constrained case min_{x∈R^n} f(x) s.t. a^T x = b (or Ax = b); f convex with block-component Lipschitz gradient; communication via a connected graph G = (V, E). Algorithm (RCD): given x_0 with a^T x_0 = b, 1. Choose randomly a pair (i_k, j_k) ∈ E with probability p_{i_k j_k} 2. Set x_{k+1} = x_k + E_{i_k} d_{i_k} + E_{j_k} d_{j_k}, where d_ij = (d_i, d_j) = arg min_{s_ij ∈ R^{n_i+n_j}} f(x) + ⟨∇_ij f(x), s_ij⟩ + ((L_i+L_j)/2)‖s_ij‖² s.t. a_i^T s_i + a_j^T s_j = 0. Each iteration requires approximately O(p) operations (quadratic case)! Necoara, Nesterov & Glineur, A random coordinate descent method on large optimization problems with linear constraints, ICCOPT, 2013. Necoara, Random coordinate descent algorithms for multi-agent convex optimization over networks, IEEE Trans. Automatic Control, 2013. 15
16 (RCD) smooth case - convergence rate Characteristics: only 2 components (an edge in E) of x are updated per iteration (distributed!); alg. (RCD) needs only 2 components of the gradient; complexity per iteration O(p)! Closed-form solution, e.g. d_ij = −(1/(L_i+L_j)) (∇_ij f(x) − (a_ij^T ∇_ij f(x) / (a_ij^T a_ij)) a_ij). Theorem 1 Let x_k be generated by Algorithm (RCD). Then the following estimate for the expected objective values holds: E[f(x_k)] − f* ≤ ‖x^0 − x*‖² / (λ_2(Q) k). If additionally f is σ-strongly convex, then E[f(x_k)] − f* ≤ (1 − λ_2(Q)σ)^k (f(x_0) − f*), where Q = Σ_{(i,j)∈E} (p_ij/(L_i+L_j)) (I_{n_i+n_j} − a_ij a_ij^T/(a_ij^T a_ij)) (Laplacian matrix of the graph) 16
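The two-coordinate update with its closed-form direction is easy to sketch for scalar blocks. The toy problem below (projecting a point onto the hyperplane e^T x = 1) is illustrative, not from the source; note that a^T x = b is preserved exactly at every step.

```python
import numpy as np

rng = np.random.default_rng(1)

def rcd_coupled(grad, L, a, x0, iters=5000):
    """(RCD) sketch for min f(x) s.t. a^T x = b, scalar blocks.
    Each step draws a pair (i, j) and applies the closed-form direction
    d_ij = -(1/(L_i+L_j)) (g_ij - (a_ij^T g_ij / a_ij^T a_ij) a_ij)."""
    x = x0.copy()
    n = len(x)
    for _ in range(iters):
        i, j = rng.choice(n, size=2, replace=False)   # uniform pairs, p_ij = 1/|E|
        g_full = grad(x)              # only g[i], g[j] are needed; full for brevity
        g = np.array([g_full[i], g_full[j]])
        aij = np.array([a[i], a[j]])
        d = -(g - (aij @ g) / (aij @ aij) * aij) / (L[i] + L[j])
        x[i] += d[0]
        x[j] += d[1]
    return x

# Toy problem (illustrative): project c onto the hyperplane e^T x = 1
n = 4
c = np.array([0.3, -0.1, 0.5, 0.7])
x = rcd_coupled(lambda y: y - c, L=np.ones(n), a=np.ones(n),
                x0=np.full(n, 1.0 / n))
x_star = c + (1.0 - c.sum()) / n     # analytic hyperplane projection
```

In a distributed setting, agents i and j exchange only their two gradient components over the edge (i, j), which is the point of the graph structure.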
17 Selection of probabilities I. uniform probabilities: p_ij = 1/|E|. II. probabilities dependent on the Lipschitz constants L_i: p_ij^α = L_ij^α / L^α, where L_ij = L_i + L_j, L^α = Σ_{(i,j)∈E} L_ij^α, α ≥ 0. III. optimal probabilities obtained from the SDP max_{Q∈M} λ_2(Q): [p_ij*]_{(i,j)∈E} = arg max_{t,Q} { t : Q + t aa^T/(a^T a) ⪰ t I_n, Q ∈ M }, where M = {Q ∈ R^{n×n} : Q = Σ_{(i,j)∈E} (p_ij/L_ij) Q_ij, p_ij = p_ji, p_ij = 0 if (i,j) ∉ E, Σ_{(i,j)∈E} p_ij = 1}. 17
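The Lipschitz-dependent probabilities p_ij^α are cheap to form and sample from; a tiny sketch on an illustrative complete graph with three nodes (the values of L are made up):

```python
import numpy as np

rng = np.random.default_rng(5)

# Lipschitz-dependent pair probabilities p_ij^alpha = L_ij^alpha / L^alpha,
# with L_ij = L_i + L_j (illustrative small complete graph):
L = np.array([4.0, 1.0, 2.0])
pairs = [(0, 1), (0, 2), (1, 2)]
alpha = 1.0
L_pair = np.array([(L[i] + L[j]) ** alpha for i, j in pairs])
p = L_pair / L_pair.sum()

k = rng.choice(len(pairs), p=p)      # draw the edge for the next (RCD) step
i, j = pairs[k]
```

Setting alpha = 0 recovers the uniform probabilities p_ij = 1/|E|; the SDP-optimal probabilities of item III would instead require solving the semidefinite program offline.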
18 Comparison with full projected gradient alg. Assume a = e and the Lipschitz-dependent probabilities p_ij^1 = (L_i+L_j)/L^1; then Q = (1/Σ_{i=1}^n L_i)(I_n − (1/n) ee^T) and λ_2(Q) = 1/Σ_{i=1}^n L_i.
Alg. (RCD): iteration complexity O(p), rate E[f(x_k)] − f* ≤ Σ_i L_i ‖x^0 − x*‖²/k.
Full projected gradient: iteration complexity O(np), rate f(x_k) − f* ≤ L_f ‖x^0 − x*‖²/k, where ∇²f(x) ⪯ L_f I_n.
Remark: the maximal eigenvalue of a symmetric matrix can reach its trace! Worst case: the rate of convergence of the (RCD) method is the same as that of the full gradient method! However: the (RCD) method has a cheap iteration, and the (RCD) method has more chances to accelerate 18
19 Numerical tests (I) - Google problem Google problem: min_{e^T x = 1} ‖Ex − x‖², accuracy ε = 10^{-3}, full iterations x^0, x^n, x^{2n}, ..., x^{kn}. [Figure: equivalent number of full iterations versus ‖Ex_k − x_k‖/‖x_k‖. Left: n = 30, using probabilities p_ij^0, p_ij^1 and p_ij*. Right: n = 10^6, using probabilities p_ij^0 and p_ij^1.] 19
20 Random coordinate descent - composite case min_{x∈R^n} f(x) + h(x) s.t. a^T x = b; f convex with block-component Lipschitz gradient; h convex, nonsmooth and separable: h(x) = Σ_{i=1}^n h_i(x_i) (e.g. h = 1_[l,u] or h = µ‖x‖_1 ...). Algorithm (CRCD): given x_0 with a^T x_0 = b, 1. Choose randomly a pair (i_k, j_k) with probability p_{i_k j_k} 2. Set x_{k+1} = x_k + E_{i_k} d_{i_k} + E_{j_k} d_{j_k}, where d_ij = (d_i, d_j) = arg min_{s_ij ∈ R^{n_i+n_j}} f(x) + ⟨∇_ij f(x), s_ij⟩ + ((L_i+L_j)/2)‖s_ij‖² + h(x + s_ij) s.t. a_i^T s_i + a_j^T s_j = 0. Each iteration requires approximately O(p) operations (quadratic case)! 20
21 Random coordinate descent - composite case Characteristics: only 2 components of x are updated per iteration; alg. (CRCD) needs only 2 components of the gradient and uses only the 2 functions h_i & h_j of h; if N = n and h is given by the l_1-norm or the indicator function of a box, then the direction d_ij can be computed in closed form; if N < n and h is coordinatewise separable, strictly convex and piecewise linear/quadratic with O(1) pieces (e.g. h given by the l_1-norm), then the direction d_ij can be computed in linear time (i.e. O(n_i + n_j) operations); choosing a random pair (i, j) from the uniform distribution requires O(1) operations 21
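For a = e and h the box indicator, the pair subproblem is one-dimensional: with s_j = −s_i, it reads min_s (g_i − g_j)s + (L_i + L_j)s², and its closed form is the unconstrained minimizer clipped to the box-induced interval. A sketch on a toy projection problem (the data and the analytic solution below are illustrative, not from the source):

```python
import numpy as np

rng = np.random.default_rng(2)

def crcd_box(grad, L, l, u, x0, iters=8000):
    """(CRCD) sketch for min f(x) + 1_[l,u](x) s.t. e^T x = b (scalar blocks,
    a = e): the closed-form pair direction clips the unconstrained minimizer
    of (g_i - g_j) s + (L_i + L_j) s^2 to the feasible interval for s."""
    x = x0.copy()
    n = len(x)
    for _ in range(iters):
        i, j = rng.choice(n, size=2, replace=False)
        g = grad(x)                                  # only g[i], g[j] are needed
        s = -(g[i] - g[j]) / (2.0 * (L[i] + L[j]))   # unconstrained minimizer
        lo = max(l[i] - x[i], x[j] - u[j])           # keep x_i + s, x_j - s in [l, u]
        hi = min(u[i] - x[i], x[j] - l[j])
        s = min(max(s, lo), hi)
        x[i] += s
        x[j] -= s
    return x

# Toy problem: project c onto {x : e^T x = 1, 0 <= x <= 1}
n = 4
c = np.array([0.9, 0.6, 0.1, 0.0])
x = crcd_box(lambda y: y - c, L=np.ones(n), l=np.zeros(n), u=np.ones(n),
             x0=np.full(n, 0.25))
x_star = np.array([0.65, 0.35, 0.0, 0.0])   # analytic projection for this c
```

The same exchange structure (increase x_i, decrease x_j by the same amount) is what SMO-type SVM solvers exploit on the dual problem of the earlier slides.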
22 (CRCD) composite case - convergence rate Theorem 2 Let x_k be generated by Algorithm (CRCD) and L = max_i L_i. If the index pairs are selected with uniform distribution, then E[F(x_k)] − F* ≤ N² L ‖x^0 − x*‖²/k. If additionally f is σ-strongly convex, then E[F(x_k)] − F* ≤ (1 − 2(1−γ)/N²)^k (F(x_0) − F*), where γ = 1 − σ/(8L) if σ ≤ 4L, and γ = 2L/σ otherwise. Necoara & Patrascu, Random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints, Computational Opt. Appl., 2013. 22
23 Arithmetic complexity - comparison N = n (scalar case) & sparse QP; R² = ‖x^0 − x*‖², L_f ≤ Σ_i L_i, L = max_i L_i, L_av = (Σ_i L_i)/n.
GM (Nesterov): complexity O(L_f R²/ε), general h & a, full gradient - O(n) per iteration
CGM (Tseng): complexity O(nLR²/ε), general h & a, partial gradient - O(n) per iteration
RCGM (Nesterov): complexity O(nL_av R²/ε), general h & a = 0, partial gradient - O(1) per iteration
RCD: complexity O(nL_av R²/ε), h = 0 & a, partial gradient - O(1) per iteration
CRCD: complexity O(n²L_av R²/ε), general h & a, partial gradient - O(1) per iteration
Our methods RCD & CRCD usually have better (N < n) or comparable (N = n) arithmetic complexity than existing methods; they are adequate for parallel or distributed architectures; robust, with more chances to accelerate (due to randomness); easy to implement (closed-form solutions) 23
24 Numerical tests (II) - SVM problem
Data set | n/m (p) | (CRCD) full-iter/obj/time(s) | (CGD) iter/obj/time(s)
a7a 16100/122 (p = 14) 11242/ / / /21.5
a9a 32561/123 (p = 14) 15355/ / / /89.0
w8a 49749/300 (p = 12) 15380/ / / /27.2
ijcnn /22 (p = 13) 7601/ / / /16.5
web /254 (p = 85) 1428/ / / /748
covtyp /54 (p = 12) 1722/ / / /-24000/480
test1 /10^6 (p = 50) 228/ / / /568
test2 /10^3 (p = 10) 500/ / / /516.66
Real test problems taken from the LIBSVM library. Our alg. (CRCD) is faster by a factor of 10 than the (CGD) method (Tseng)! 24
25 Numerical tests (III) - Chebyshev center problem Chebyshev center problem: given a set of points z_1,...,z_n ∈ R^m, find the center z_c and radius r of the smallest enclosing ball of the given points. Applications: pattern recognition, protein analysis, mechanical engineering. Formulation as an optimization problem: min_{r, z_c} r s.t. ‖z_i − z_c‖² ≤ r, i = 1,...,n, where r is the radius and z_c is the center of the enclosing ball. Dual problem: min_{x∈R^n} ‖Zx‖² − Σ_{i=1}^n ‖z_i‖² x_i + 1_{[0,∞)}(x) s.t. e^T x = 1, (2) where Z contains the given points z_i as columns 25
26 Numerical tests (III) - Chebyshev center problem Simple recovery of the primal optimal solution from the dual x*: r = (Σ_{i=1}^n ‖z_i‖² x_i* − ‖Zx*‖²)^{1/2}, z_c = Zx*. (3) Two sets of numerical experiments: starting all algorithms from x_0 = e_1, Tseng's algorithm has good performance and the Gradient Method is worst; starting from x_0 = e/n, the Gradient Method has good performance and Tseng's is worst; algorithm (CRCD) is very robust w.r.t. the starting point x_0 26
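The dual (2) has exactly the composite structure of the earlier slides (simplex-type constraint, a = e), so the pairwise clipped update applies directly, followed by the recovery (3). A toy sketch with three points whose enclosing ball is known by elementary geometry (the data are illustrative, not from the source):

```python
import numpy as np

rng = np.random.default_rng(3)

def meb_dual_cd(Z, iters=6000):
    """Pairwise coordinate-descent sketch for the dual (2):
        min_x ||Zx||^2 - sum_i ||z_i||^2 x_i   s.t. e^T x = 1, x >= 0,
    followed by the recovery (3): z_c = Zx*, r = (sum_i ||z_i||^2 x_i* - ||Zx*||^2)^(1/2)."""
    n = Z.shape[1]
    d = np.sum(Z * Z, axis=0)                 # d_i = ||z_i||^2
    L = 2.0 * d + 1e-12                       # coordinate Lipschitz constants 2 (Z^T Z)_ii
    x = np.full(n, 1.0 / n)
    for _ in range(iters):
        i, j = rng.choice(n, size=2, replace=False)
        g = 2.0 * (Z.T @ (Z @ x)) - d         # gradient of the dual objective
        s = -(g[i] - g[j]) / (2.0 * (L[i] + L[j]))
        s = min(max(s, -x[i]), x[j])          # keep x >= 0; e^T x = 1 holds by construction
        x[i] += s
        x[j] -= s
    zc = Z @ x
    return zc, float(np.sqrt(d @ x - zc @ zc))

# Right triangle (0,0), (2,0), (1,1): smallest enclosing ball has center (1,0), radius 1
Z = np.array([[0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0]])
zc, r = meb_dual_cd(Z)
```

Each step only moves weight between two points while keeping x on the simplex, which is why the method is insensitive to the starting point in the experiments above.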
27 Numerical tests (III) - Chebyshev center problem [Figure: convergence of (CGD), (GM) and (CRCD) from the starting points x_0 = e/n and x_0 = e_1.] 27
28 Random coordinate descent - nonconvex & composite min_{x∈R^n} f(x) + h(x); f nonconvex with block-component Lipschitz gradient & a = 0 (no coupling constraints); h proper, convex and block-separable (e.g. h = 0 or h = 1_[l,u] or h = µ‖x‖_1 ...). If x* ∈ R^n is a local minimum, then the following relation holds: 0 ∈ ∇f(x*) + ∂h(x*) (stationary points). Algorithm (NRCD): 1. Choose randomly an index i_k with probability p_{i_k} 2. Compute x_{k+1} = x_k + E_{i_k} d_{i_k}, where d_i = arg min_{s_i ∈ R^{n_i}} f(x) + ⟨∇_i f(x), s_i⟩ + (L_i/2)‖s_i‖² + h(x + E_i s_i). Each iteration is cheap, complexity O(p) with p << n (even closed-form solution)! Patrascu & Necoara, Efficient random coordinate descent algorithms for large-scale structured nonconvex optimization, submitted to J. Global Opt., 28
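For h = µ‖x‖_1 the coordinate subproblem has the soft-thresholding closed form x_i ← soft(x_i − ∇_i f(x)/L_i, µ/L_i). A sketch on a least-squares f, which is a convex stand-in for the nonconvex f of the slide (the coordinate update is identical); the data are random and illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

def soft(v, t):
    """Soft-thresholding: the prox of t*|.|."""
    return np.sign(v) * max(abs(v) - t, 0.0)

def nrcd_l1(A, b, mu, iters=6000):
    """(NRCD)-style coordinate sketch for min f(x) + mu*||x||_1 with
    f(x) = 0.5*||Ax - b||^2, using the closed-form coordinate update
    x_i <- soft(x_i - grad_i f(x)/L_i, mu/L_i), L_i = ||A[:, i]||^2."""
    m, n = A.shape
    L = np.sum(A * A, axis=0)
    x = np.zeros(n)
    r = -b.copy()                        # residual r = Ax - b, kept incrementally
    for _ in range(iters):
        i = rng.integers(n)
        g_i = A[:, i] @ r                # i-th gradient component, O(m) (O(p) if sparse)
        x_new = soft(x[i] - g_i / L[i], mu / L[i])
        r += A[:, i] * (x_new - x[i])
        x[i] = x_new
    return x

A = rng.standard_normal((20, 8))
b = rng.standard_normal(20)
mu = 1.0
x = nrcd_l1(A, b, mu)
```

Maintaining the residual incrementally is what keeps the per-iteration cost at O(p) in the sparse case, matching the slide's claim.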
29 (NRCD) nonconvex & composite - convergence rate We introduce the following map (here L = [L_1 ... L_N] and D_L = diag(L)): d_L(x) = arg min_{s∈R^n} f(x) + ⟨∇f(x), s⟩ + (1/2)‖s‖²_L + h(x+s), and define the optimality measure M_1(x, L) = ‖D_L d_L(x)‖*_L, where ‖s‖²_L = s^T D_L s and ‖u‖*_L is its dual norm (observe that M_1(x, L) = 0 iff x is a stationary point). Theorem 3 Let the sequence x_k be generated by Algorithm (NRCD) using the uniform distribution. Then: (i) the sequence of random variables M_1(x_k, L) converges to 0 a.s. and the sequence F(x_k) converges to a random variable F̄ a.s.; (ii) any accumulation point of the sequence x_k is a stationary point. Moreover, in expectation, min_{0≤l≤k} E[(M_1(x_l, L))²] ≤ 2N (F(x_0) − F*)/k 29
30 Random coordinate descent - nonconvex & constrained min_{x∈R^n} f(x) + h(x) s.t. a^T x = b; f nonconvex with block-component Lipschitz gradient; h proper, convex and separable. If x* is a local minimum, then there exists a scalar λ* such that 0 ∈ ∇f(x*) + ∂h(x*) + λ* a and a^T x* = b. Algorithm (NCRCD): 1. Choose randomly a pair (i_k, j_k) with probability p_{i_k j_k} 2. Compute x_{k+1} = x_k + E_{i_k} d_{i_k} + E_{j_k} d_{j_k}, where d_ij = (d_i, d_j) = arg min_{s_ij ∈ R^{n_i+n_j}} f(x) + ⟨∇_ij f(x), s_ij⟩ + ((L_i+L_j)/2)‖s_ij‖² + h(x + s_ij) s.t. a_i^T s_i + a_j^T s_j = 0. Each iteration is cheap, complexity O(p) (even closed-form solution)! 30
31 (NCRCD) nonconvex & constrained - convergence rate We introduce the following map: d_T(x) = arg min_{s∈R^n: a^T s = 0} f(x) + ⟨∇f(x), s⟩ + (1/2)‖s‖²_T + h(x+s), and define the optimality measure M_2(x, T) = ‖D_T d_T(x)‖*_T, where T_i = (1/N) Σ_j L_ij (observe that M_2(x, T) = 0 iff x is a stationary point). Theorem 4 Let the sequence x_k be generated by Algorithm (NCRCD) using the uniform distribution. Then: (i) the sequence of random variables M_2(x_k, T) converges to 0 a.s. and the sequence F(x_k) converges to a random variable F̄ a.s.; (ii) any accumulation point of the sequence x_k is a stationary point. Moreover, in expectation, min_{0≤l≤k} E[(M_2(x_l, T))²] ≤ N (F(x_0) − F*)/k 31
32 Numerical tests (IV) - eigenvalue complementarity problem Eigenvalue complementarity problem (EiCP): given matrices A, B ∈ R^{n×n}, find λ ∈ R & x ≠ 0 such that w = (λB − A)x, w ≥ 0, x ≥ 0, w^T x = 0. Applications of EiCP: optimal control, stability analysis of dynamic systems, electrical networks, quantum chemistry, chemical reactions, economics... If A, B are symmetric, then we have the symmetric (EiCP). The symmetric (EiCP) is equivalent to finding a stationary point of a generalized Rayleigh quotient on the simplex: min_{x∈R^n} x^T A x / x^T B x s.t. e^T x = 1, x ≥ 0. Equivalent nonconvex logarithmic formulation (for A, B ≥ 0 with a_ii, b_ii > 0, e.g. stability of positive dynamical systems): max_{x∈R^n} f(x) = ln(x^T A x) − ln(x^T B x) s.t. e^T x = 1, x ≥ 0, with h(x) = 1_{[0,∞)}(x). Perron-Frobenius theory for A irreducible and B = I_n implies a global maximum! 32
33 Numerical tests (IV) - eigenvalue complementarity problem Compare with the DC (Difference of Convex functions) algorithm (Thi et al. 2012), equivalent in some sense to the Projected Gradient method. It is hard to estimate the Lipschitz parameter µ in the DC alg., but it is crucial for the convergence of DC: max_{x: e^T x = 1, x ≥ 0} ((µ/2)‖x‖² + ln(x^T A x)) − ((µ/2)‖x‖² + ln(x^T B x)). [Table: CPU time (sec), number of iterations and final objective F* of DC (for several values of µ) versus (NCRCD), over a range of problem dimensions n.] 33
34 Conclusions Usually full first/second-order methods are inefficient for huge-scale optimization. For sparse problems, coordinate descent methods are adequate due to their low complexity per iteration (O(p)). Randomized coordinate descent methods have a simple strategy for choosing the working set - O(1) operations per index choice. Usually randomized methods outperform greedy methods. We provide rates of convergence and arithmetic complexities for randomized coordinate descent methods. Randomized methods are easy to implement and adequate for modern parallel and distributed architectures 34
35 References
A. Beck, The 2-coordinate descent method for solving double-sided simplex constrained minimization problems, Technical Report, 2012.
Y. Nesterov, Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM Journal on Optimization, 22(2), 341-362, 2012.
P. Richtarik and M. Takac, Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function, Mathematical Programming, Series A, 2012.
P. Tseng and S. Yun, A Coordinate Gradient Descent Method for Nonsmooth Separable Minimization, Mathematical Programming, 117, 387-423, 2009.
P. Tseng and S. Yun, A Block-Coordinate Gradient Descent Method for Linearly Constrained Nonsmooth Separable Optimization, Journal of Optimization Theory and Applications, 140, 513-535, 2009.
36 References
I. Necoara, Y. Nesterov and F. Glineur, A random coordinate descent method on large optimization problems with linear constraints, Technical Report, University Politehnica Bucharest, 2011.
I. Necoara, Random coordinate descent algorithms for multi-agent convex optimization over networks, IEEE Transactions on Automatic Control, 58(7), 1-12, 2013.
I. Necoara and A. Patrascu, A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints, Computational Optimization and Applications, in press, 2013.
A. Patrascu and I. Necoara, Efficient random coordinate descent algorithms for large-scale structured nonconvex optimization, submitted to Journal of Global Optimization.
More informationDEN: Linear algebra numerical view (GEM: Gauss elimination method for reducing a full rank matrix to upper-triangular
form) Given: matrix C = (c i,j ) n,m i,j=1 ODE and num math: Linear algebra (N) [lectures] c phabala 2016 DEN: Linear algebra numerical view (GEM: Gauss elimination method for reducing a full rank matrix
More informationIPAM Summer School Optimization methods for machine learning. Jorge Nocedal
IPAM Summer School 2012 Tutorial on Optimization methods for machine learning Jorge Nocedal Northwestern University Overview 1. We discuss some characteristics of optimization problems arising in deep
More informationConditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint
Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint Marc Teboulle School of Mathematical Sciences Tel Aviv University Joint work with Ronny Luss Optimization and
More informationIterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming
Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Zhaosong Lu October 5, 2012 (Revised: June 3, 2013; September 17, 2013) Abstract In this paper we study
More informationCoordinate Update Algorithm Short Course Proximal Operators and Algorithms
Coordinate Update Algorithm Short Course Proximal Operators and Algorithms Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 36 Why proximal? Newton s method: for C 2 -smooth, unconstrained problems allow
More informationLecture 15 Newton Method and Self-Concordance. October 23, 2008
Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications
More informationCHAPTER 11. A Revision. 1. The Computers and Numbers therein
CHAPTER A Revision. The Computers and Numbers therein Traditional computer science begins with a finite alphabet. By stringing elements of the alphabet one after another, one obtains strings. A set of
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationThe Perceptron Algorithm, Margins
The Perceptron Algorithm, Margins MariaFlorina Balcan 08/29/2018 The Perceptron Algorithm Simple learning algorithm for supervised classification analyzed via geometric margins in the 50 s [Rosenblatt
More informationSubgradient methods for huge-scale optimization problems
CORE DISCUSSION PAPER 2012/02 Subgradient methods for huge-scale optimization problems Yu. Nesterov January, 2012 Abstract We consider a new class of huge-scale problems, the problems with sparse subgradients.
More informationA direct formulation for sparse PCA using semidefinite programming
A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley A. d Aspremont, INFORMS, Denver,
More informationConvex Optimization. Newton s method. ENSAE: Optimisation 1/44
Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)
More informationWHY DUALITY? Gradient descent Newton s method Quasi-newton Conjugate gradients. No constraints. Non-differentiable ???? Constrained problems? ????
DUALITY WHY DUALITY? No constraints f(x) Non-differentiable f(x) Gradient descent Newton s method Quasi-newton Conjugate gradients etc???? Constrained problems? f(x) subject to g(x) apple 0???? h(x) =0
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationLecture 2: Linear Algebra Review
EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1
More informationConjugate Gradient (CG) Method
Conjugate Gradient (CG) Method by K. Ozawa 1 Introduction In the series of this lecture, I will introduce the conjugate gradient method, which solves efficiently large scale sparse linear simultaneous
More information10-725/36-725: Convex Optimization Prerequisite Topics
10-725/36-725: Convex Optimization Prerequisite Topics February 3, 2015 This is meant to be a brief, informal refresher of some topics that will form building blocks in this course. The content of the
More informationAffine iterations on nonnegative vectors
Affine iterations on nonnegative vectors V. Blondel L. Ninove P. Van Dooren CESAME Université catholique de Louvain Av. G. Lemaître 4 B-348 Louvain-la-Neuve Belgium Introduction In this paper we consider
More informationminimize x subject to (x 2)(x 4) u,
Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for
More informationAn adaptive accelerated first-order method for convex optimization
An adaptive accelerated first-order method for convex optimization Renato D.C Monteiro Camilo Ortiz Benar F. Svaiter July 3, 22 (Revised: May 4, 24) Abstract This paper presents a new accelerated variant
More informationMATH36001 Perron Frobenius Theory 2015
MATH361 Perron Frobenius Theory 215 In addition to saying something useful, the Perron Frobenius theory is elegant. It is a testament to the fact that beautiful mathematics eventually tends to be useful,
More informationComputational Economics and Finance
Computational Economics and Finance Part II: Linear Equations Spring 2016 Outline Back Substitution, LU and other decomposi- Direct methods: tions Error analysis and condition numbers Iterative methods:
More informationMarkov Chains and Spectral Clustering
Markov Chains and Spectral Clustering Ning Liu 1,2 and William J. Stewart 1,3 1 Department of Computer Science North Carolina State University, Raleigh, NC 27695-8206, USA. 2 nliu@ncsu.edu, 3 billy@ncsu.edu
More informationA Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization
A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London www.doc.ic.ac.uk/ pp500 p.parpas@imperial.ac.uk jointly with D.V.
More informationNonlinear Programming
Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week
More information9. Dual decomposition and dual algorithms
EE 546, Univ of Washington, Spring 2016 9. Dual decomposition and dual algorithms dual gradient ascent example: network rate control dual decomposition and the proximal gradient method examples with simple
More informationA Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices
A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices Ryota Tomioka 1, Taiji Suzuki 1, Masashi Sugiyama 2, Hisashi Kashima 1 1 The University of Tokyo 2 Tokyo Institute of Technology 2010-06-22
More informationFrank-Wolfe Method. Ryan Tibshirani Convex Optimization
Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)
More informationYou should be able to...
Lecture Outline Gradient Projection Algorithm Constant Step Length, Varying Step Length, Diminishing Step Length Complexity Issues Gradient Projection With Exploration Projection Solving QPs: active set
More informationSOCP Relaxation of Sensor Network Localization
SOCP Relaxation of Sensor Network Localization Paul Tseng Mathematics, University of Washington Seattle University of Vienna/Wien June 19, 2006 Abstract This is a talk given at Univ. Vienna, 2006. SOCP
More informationInterpolation-Based Trust-Region Methods for DFO
Interpolation-Based Trust-Region Methods for DFO Luis Nunes Vicente University of Coimbra (joint work with A. Bandeira, A. R. Conn, S. Gratton, and K. Scheinberg) July 27, 2010 ICCOPT, Santiago http//www.mat.uc.pt/~lnv
More informationA Distributed Newton Method for Network Utility Maximization, II: Convergence
A Distributed Newton Method for Network Utility Maximization, II: Convergence Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract The existing distributed algorithms for Network Utility
More information6. Proximal gradient method
L. Vandenberghe EE236C (Spring 2016) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping
More informationA Brief Overview of Practical Optimization Algorithms in the Context of Relaxation
A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation Zhouchen Lin Peking University April 22, 2018 Too Many Opt. Problems! Too Many Opt. Algorithms! Zero-th order algorithms:
More informationARock: an algorithmic framework for asynchronous parallel coordinate updates
ARock: an algorithmic framework for asynchronous parallel coordinate updates Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin ( UCLA Math, U.Waterloo DCO) UCLA CAM Report 15-37 ShanghaiTech SSDS 15 June 25,
More informationAM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods
AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality
More informationAccelerated Block-Coordinate Relaxation for Regularized Optimization
Accelerated Block-Coordinate Relaxation for Regularized Optimization Stephen J. Wright Computer Sciences University of Wisconsin, Madison October 09, 2012 Problem descriptions Consider where f is smooth
More informationThe speed of Shor s R-algorithm
IMA Journal of Numerical Analysis 2008) 28, 711 720 doi:10.1093/imanum/drn008 Advance Access publication on September 12, 2008 The speed of Shor s R-algorithm J. V. BURKE Department of Mathematics, University
More informationDistributed Box-Constrained Quadratic Optimization for Dual Linear SVM
Distributed Box-Constrained Quadratic Optimization for Dual Linear SVM Lee, Ching-pei University of Illinois at Urbana-Champaign Joint work with Dan Roth ICML 2015 Outline Introduction Algorithm Experiments
More informationEfficient Learning on Large Data Sets
Department of Computer Science and Engineering Hong Kong University of Science and Technology Hong Kong Joint work with Mu Li, Chonghai Hu, Weike Pan, Bao-liang Lu Chinese Workshop on Machine Learning
More informationBlock stochastic gradient update method
Block stochastic gradient update method Yangyang Xu and Wotao Yin IMA, University of Minnesota Department of Mathematics, UCLA November 1, 2015 This work was done while in Rice University 1 / 26 Stochastic
More informationRecent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables
Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong 2014 Workshop
More informationPrimal and Dual Predicted Decrease Approximation Methods
Primal and Dual Predicted Decrease Approximation Methods Amir Beck Edouard Pauwels Shoham Sabach March 22, 2017 Abstract We introduce the notion of predicted decrease approximation (PDA) for constrained
More informationNumerical Methods - Numerical Linear Algebra
Numerical Methods - Numerical Linear Algebra Y. K. Goh Universiti Tunku Abdul Rahman 2013 Y. K. Goh (UTAR) Numerical Methods - Numerical Linear Algebra I 2013 1 / 62 Outline 1 Motivation 2 Solving Linear
More informationNotes on PCG for Sparse Linear Systems
Notes on PCG for Sparse Linear Systems Luca Bergamaschi Department of Civil Environmental and Architectural Engineering University of Padova e-mail luca.bergamaschi@unipd.it webpage www.dmsa.unipd.it/
More informationAsynchronous Non-Convex Optimization For Separable Problem
Asynchronous Non-Convex Optimization For Separable Problem Sandeep Kumar and Ketan Rajawat Dept. of Electrical Engineering, IIT Kanpur Uttar Pradesh, India Distributed Optimization A general multi-agent
More informationContents. 1 Introduction. 1.1 History of Optimization ALG-ML SEMINAR LISSA: LINEAR TIME SECOND-ORDER STOCHASTIC ALGORITHM FEBRUARY 23, 2016
ALG-ML SEMINAR LISSA: LINEAR TIME SECOND-ORDER STOCHASTIC ALGORITHM FEBRUARY 23, 2016 LECTURERS: NAMAN AGARWAL AND BRIAN BULLINS SCRIBE: KIRAN VODRAHALLI Contents 1 Introduction 1 1.1 History of Optimization.....................................
More informationLinear and non-linear programming
Linear and non-linear programming Benjamin Recht March 11, 2005 The Gameplan Constrained Optimization Convexity Duality Applications/Taxonomy 1 Constrained Optimization minimize f(x) subject to g j (x)
More informationWE consider an undirected, connected network of n
On Nonconvex Decentralized Gradient Descent Jinshan Zeng and Wotao Yin Abstract Consensus optimization has received considerable attention in recent years. A number of decentralized algorithms have been
More informationALGORITHMS FOR MINIMIZING DIFFERENCES OF CONVEX FUNCTIONS AND APPLICATIONS
ALGORITHMS FOR MINIMIZING DIFFERENCES OF CONVEX FUNCTIONS AND APPLICATIONS Mau Nam Nguyen (joint work with D. Giles and R. B. Rector) Fariborz Maseeh Department of Mathematics and Statistics Portland State
More information