Random coordinate descent algorithms for huge-scale optimization problems. Ion Necoara


1 Random coordinate descent algorithms for huge-scale optimization problems. Ion Necoara, Automatic Control and Systems Engineering Department

2 Acknowledgement Collaboration with Y. Nesterov, F. Glineur (Univ. Catholique Louvain) and A. Patrascu, D. Clipici (Univ. Politehnica Bucharest). Papers can be found at:

3 Outline: Motivation; Problem formulation; Previous work; Random coordinate descent alg. for smooth convex problems; Random coordinate descent alg. for composite convex problems; Random coordinate descent alg. for composite nonconvex problems; Conclusions.

4 Motivation Recent Big Data applications: (a) internet (e.g. PageRank), (b) support vector machines, (c) truss topology design, (d) distributed control... These gave birth to many huge-scale optimization problems (dimension of variables n very large), BUT the matrices defining the optimization problem are very sparse!

5 Motivation PageRank problem (Google ranking, network control, data analysis). Let $E \in \mathbb{R}^{n \times n}$ be the adjacency matrix (column stochastic, sparse matrix). Find the eigenvector corresponding to the maximal (unit) eigenvalue, i.e. satisfying $Ex = x$. The number of variables (pages) n is huge. Standard technique: power method; calculations of PageRank on supercomputers take about one week! Formulation as an optimization problem: $\min_{x \in \mathbb{R}^n} \tfrac{1}{2}\|Ex - x\|^2$ s.t. $e^T x = 1$, $x \ge 0$. E has at most $p \ll n$ nonzeros on each row, so this has the sparse QP form $\min_{x \in \mathbb{R}^n} \tfrac{1}{2}x^T Z^T Z x + q^T x$ s.t. $a^T x = b$, $l \le x \le u$ (Z sparse!).
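A small sketch (SciPy sparse matrices and a random column-stochastic E as illustrative stand-in data) of how the PageRank residual maps onto the sparse QP form above, with Z = E − I and q = 0:

```python
import numpy as np
import scipy.sparse as sp

# Illustrative sizes only: p << n nonzeros per column of the sparse matrix E.
n, p = 1000, 5
rng = np.random.default_rng(0)

# Build a random sparse column-stochastic matrix E (stand-in for the web-link matrix).
cols = np.repeat(np.arange(n), p)
rows = rng.integers(0, n, size=n * p)
E = sp.csc_matrix((np.ones(n * p), (rows, cols)), shape=(n, n))
E = E @ sp.diags(1.0 / np.asarray(E.sum(axis=0)).ravel())   # normalize columns

# PageRank residual 1/2*||Ex - x||^2 = 1/2*x^T Z^T Z x with Z = E - I and q = 0.
Z = E - sp.identity(n, format="csc")
x = np.full(n, 1.0 / n)                  # feasible point: e^T x = 1, x >= 0
print(0.5 * np.linalg.norm(Z @ x) ** 2)  # objective value at x
```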

6 Motivation Linear SVM problem. Let $z_i \in \mathbb{R}^m$, $i = 1,\dots,n$, be a set of training data points, $m \ll n$, split into two classes. Find a hyperplane $a^T y = b$ which separates the data points $z_i$ into the two classes. Formulation as an optimization problem: $\min_{a \in \mathbb{R}^m,\, b \in \mathbb{R}} \tfrac{1}{2}\|a\|^2 + C e^T \xi$ s.t. $\alpha_i (a^T z_i - b) \ge 1 - \xi_i$, $\xi_i \ge 0$, $i = 1,\dots,n$, where $\alpha_i \in \{-1, 1\}$ is the id (label) of the class corresponding to $z_i$; n is very big (many constraints).

7 Motivation The dual formulation for linear SVM: $\min_{x \in \mathbb{R}^n} \tfrac{1}{2}x^T (Z^T Z)x - e^T x$ s.t. $\alpha^T x = 0$, $0 \le x \le Ce$, where $Z \in \mathbb{R}^{m \times n}$, $m \ll n$, depends on the training points $z_i$ (the columns of Z are $\alpha_i z_i$), or $Z \in \mathbb{R}^{n \times n}$ with sparse columns. The primal solution is recovered via $a = \sum_i \alpha_i x_i z_i$ and $b = \sum_i (a^T z_i - \alpha_i)/n$. Again of the form $\min_{x \in \mathbb{R}^n} \tfrac{1}{2}x^T Z^T Z x + q^T x$ s.t. $a^T x = b$, $l \le x \le u$ (Z sparse!).
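A minimal sketch (random stand-in data; variable names are ours) of building Z with columns $\alpha_i z_i$, of the dual objective, and of the primal recovery formulas above:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10, 200                          # m << n
Zpts = rng.standard_normal((m, n))      # training points z_i as columns (stand-in data)
alpha = rng.choice([-1.0, 1.0], size=n) # class labels alpha_i

# Columns of Z are alpha_i * z_i, so x^T (Z^T Z) x is the dual quadratic term.
Z = Zpts * alpha

x = rng.uniform(0.0, 1.0, size=n)       # some dual point (not optimal, illustration only)
dual_obj = 0.5 * x @ (Z.T @ (Z @ x)) - x.sum()   # 1/2 x^T(Z^T Z)x - e^T x

# Primal recovery: a = sum_i alpha_i x_i z_i,  b = sum_i (a^T z_i - alpha_i) / n.
a = Zpts @ (alpha * x)
b = np.mean(a @ Zpts - alpha)
```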

8 Motivation State-of-the-art: 1. Second-order algorithms (Newton method, Interior point method): solve at least one linear system per iteration. Second-order methods: complexity per iteration $O(n^3)$, worst-case number of iterations $O(\ln\ln\frac{1}{\epsilon})$ / $O(\ln\frac{1}{\epsilon})$, where $\epsilon$ is the desired accuracy for solving the optimization problem. For $n = 10^8$, a standard computer with a 2 GHz processor would take $10^7$ years to finish only 1 iteration!
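The $10^7$-year figure is a straightforward flop count; a one-line sanity check (assuming, as a rough simplification, one floating-point operation per clock cycle):

```python
# n^3 flops per Newton-type iteration at ~2e9 flops/s (one flop per cycle assumed).
n, flops_per_sec = 1e8, 2e9
years = n**3 / flops_per_sec / (3600 * 24 * 365)
print(f"{years:.1e} years per O(n^3) iteration")   # ~1.6e7 years
```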

9 Motivation 2. First-order algorithms (Gradient method, Fast Gradient method): perform at least one matrix-vector multiplication per iteration (in the quadratic case). First-order methods: complexity per iteration $O(n^2)$, worst-case number of iterations $O(\frac{1}{\epsilon})$ / $O(\frac{1}{\sqrt{\epsilon}})$. For $n = 10^8$, a standard computer with a 2 GHz processor takes hours per iteration and 100 days to attain $\epsilon = 0.01$ accuracy! Conclusion: for such n we require algorithms with low complexity per iteration, O(n) or even O(1): Coordinate Descent Methods.

10 Problem formulation $F^* = \min_{x \in \mathbb{R}^n} F(x)\ (= f(x)+h(x))$ s.t. $a^T x = b$ (or even $Ax = b$, with $A \in \mathbb{R}^{m \times n}$, $m \ll n$): coupling constraints. Define the decompositions $n = \sum_{i=1}^N n_i$ and $I_n = [E_1 \dots E_N]$ with $E_i \in \mathbb{R}^{n \times n_i}$, so that $x = \sum_{i=1}^N E_i x_i$ with $x_i \in \mathbb{R}^{n_i}$ and $x_{ij} = E_i x_i + E_j x_j$. Assumptions: f has block-component Lipschitz continuous gradient, i.e. $\|\nabla_i f(x + E_i s_i) - \nabla_i f(x)\| \le L_i \|s_i\|$ for all $x \in \mathbb{R}^n$, $s_i \in \mathbb{R}^{n_i}$, $i = 1,\dots,N$; h is nonsmooth, convex and componentwise separable, i.e. $h(x) = \sum_{i=1}^n h_i(x_i)$, e.g. $h = 0$ or $h = 1_{[l,u]}$ or $h = \mu\|x\|_1$...
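A minimal sketch (quadratic f, equal-sized blocks, all names ours) of the block decomposition, the block-partial gradient $\nabla_i f$, and the block Lipschitz constants $L_i$ used above:

```python
import numpy as np

n, N = 12, 4
blocks = np.split(np.arange(n), N)          # index sets of the N blocks (n_i = n/N here)

rng = np.random.default_rng(0)
Q = rng.standard_normal((n, n)); Q = Q.T @ Q
q = rng.standard_normal(n)

def grad_block(x, i):
    """Partial gradient nabla_i f(x) = E_i^T (Qx + q) for f(x) = 1/2 x^T Q x + q^T x."""
    return (Q @ x + q)[blocks[i]]

def block_lipschitz(i):
    """Block Lipschitz constant L_i = largest eigenvalue of the diagonal block Q_ii."""
    Qii = Q[np.ix_(blocks[i], blocks[i])]
    return np.linalg.eigvalsh(Qii).max()

print(grad_block(np.zeros(n), 2), block_lipschitz(2))
```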

11 Previous work - Greedy algorithms Tseng (2009) developed coordinate gradient descent methods with a greedy strategy for $\min_{x \in \mathbb{R}^n} F(x)\ (= f(x)+h(x))$ s.t. $a^T x = b$ (or $Ax = b$). Let $J \subseteq \{1,\dots,N\}$ be the set of indices at the current iterate x; then define the direction: $d_H(x;J) = \arg\min_{s \in \mathbb{R}^n} f(x) + \langle \nabla f(x), s\rangle + \tfrac{1}{2}\langle Hs, s\rangle + h(x+s)$ s.t. $a^T s = 0$, $s_j = 0$ for all $j \notin J$, (1) where $H \in \mathbb{R}^{n \times n}$ is a positive definite matrix chosen at the initial step of the algorithm. Tseng & Yun, A Block-Coordinate Gradient Descent Method for Linearly Constrained Nonsmooth Separable Optimization, J. Opt. Theory Applications, 2009.

12 Previous work - Greedy algorithms Algorithm (CGD): 1. Choose a set of indices $J_k \subseteq \{1,\dots,N\}$ w.r.t. the Gauss-Southwell rule 2. Solve (1) with $x = x_k$, $J = J_k$, $H = H_k$ to obtain $d_k = d_{H_k}(x_k; J_k)$ 3. Choose a stepsize $\alpha_k > 0$ and set $x_{k+1} = x_k + \alpha_k d_k$. Procedure for choosing $J_k$ (Gauss-Southwell rule): (i) decompose the projected gradient direction $d_k$ into low-dimensional vectors, (ii) evaluate the objective of (1) at each low-dimensional vector, (iii) choose the vector with the smallest value and assign its support to J. Alg. (CGD) takes O(n) operations per iteration (for the quadratic case & A = a). An estimate of the rate of convergence of the objective function values is $O\left(\frac{nL\|x_0 - x^*\|^2}{\epsilon}\right)$, with $L = \max_i L_i$. Recently Beck (2012) developed a greedy coordinate descent algorithm (approx. the same complexity) for singly linearly constrained models with h a box indicator function.

13 Previous work - Random algorithms Nesterov (2010) derived complexity estimates of random coordinate descent methods for $\min_{x \in Q} f(x)$, with $Q = Q_1 \times \dots \times Q_N$ convex (i.e. $h(x) = 1_Q(x)$), f convex with block-component Lipschitz gradient, and $a = 0$ (no coupling constraints). Algorithm (RCGD): 1. Choose randomly an index $i_k$ with respect to a given probability $p_{i_k}$ 2. Set $x_{k+1} = x_k - \frac{1}{L_{i_k}} E_{i_k} \nabla_{i_k} f(x_k)$. We can choose Lipschitz-dependent probabilities $p_i = L_i / \sum_{i=1}^N L_i$. For structured cases (sparse matrices with $p \ll n$ nonzeros per row) the complexity per iteration is O(p)!
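A minimal sketch of the (RCGD) update under simplifying assumptions (scalar blocks, no constraints, a quadratic test function with hypothetical data; function names are ours):

```python
import numpy as np

def rcgd(grad_coord, L, x0, iters, rng):
    """Random coordinate gradient descent sketch (scalar blocks, unconstrained),
    with Lipschitz-dependent probabilities p_i = L_i / sum_j L_j."""
    x, p = x0.copy(), L / L.sum()
    for _ in range(iters):
        i = rng.choice(len(x), p=p)
        x[i] -= grad_coord(x, i) / L[i]     # x_{k+1} = x_k - (1/L_i) E_i grad_i f(x_k)
    return x

# Quadratic test function f(x) = 1/2 x^T Q x + q^T x with hypothetical data.
rng = np.random.default_rng(1)
Q = rng.standard_normal((20, 20)); Q = Q.T @ Q + np.eye(20)
q = rng.standard_normal(20)
L = np.diag(Q).copy()                        # coordinate Lipschitz constants of a quadratic
x = rcgd(lambda x, i: Q[i] @ x + q[i], L, np.zeros(20), iters=20000, rng=rng)
print(np.linalg.norm(Q @ x + q))             # gradient norm, close to 0
```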

14 Previous work - Random algorithms An estimate of the rate of convergence for the expected objective values of Nesterov's method (RCGD) is $O\left(\frac{(\sum_{i=1}^N L_i)\|x_0 - x^*\|^2}{\epsilon}\right)$. Richtarik (2012) and Lu (2012) extended the complexity estimates of Nesterov's random coordinate descent method to the composite case $\min_{x \in \mathbb{R}^n} F(x)\ (= f(x)+h(x))$, with f convex with block-component Lipschitz gradient and h nonsmooth, convex, block-separable; parallel implementations & inexact implementations were also analyzed. Y. Nesterov, Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM J. Opt., 2012.

15 Random coordinate descent - smooth & constrained case $\min_{x \in \mathbb{R}^n} f(x)$ s.t. $a^T x = b$ (or $Ax = b$); f convex & with block-component Lipschitz gradient; communication via a connected graph G = (V,E). Algorithm (RCD): given $x_0$ with $a^T x_0 = b$, 1. Choose randomly a pair $(i_k, j_k) \in E$ with probability $p_{i_k j_k}$ 2. Set $x_{k+1} = x_k + E_{i_k} d_{i_k} + E_{j_k} d_{j_k}$, where $d_{ij} = (d_i, d_j) = \arg\min_{s_{ij} \in \mathbb{R}^{n_i+n_j}} f(x) + \langle \nabla_{ij} f(x), s_{ij}\rangle + \frac{L_i+L_j}{2}\|s_{ij}\|^2$ s.t. $a_i^T s_i + a_j^T s_j = 0$. Each iteration requires approximately O(p) operations (quadratic case)! Necoara, Nesterov & Glineur, A random coordinate descent method on large optimization problems with linear constraints, ICCOPT, 2013. Necoara, Random coordinate descent algorithms for multi-agent convex optimization over networks, IEEE Trans. Automatic Control, 2013.

16 (RCD) smooth case - convergence rate Characteristics: only 2 components (an edge of E) of x are updated per iteration (distributed!); alg. (RCD) needs only 2 components of the gradient; complexity per iteration O(p)!; closed-form solution, e.g. $d_{ij} = -\frac{1}{L_i+L_j}\left(\nabla_{ij} f(x) - a_{ij}\frac{a_{ij}^T \nabla_{ij} f(x)}{a_{ij}^T a_{ij}}\right)$. Theorem 1 Let $x_k$ be generated by Algorithm (RCD). Then, the following estimate for the expected objective values holds: $E[f(x_k)] - f^* \le \frac{\|x_0 - x^*\|^2}{\lambda_2(Q)\, k}$. If additionally the function f is $\sigma$-strongly convex, then $E[f(x_k)] - f^* \le \left(1 - \lambda_2(Q)\sigma\right)^k (f(x_0) - f^*)$, where $Q = \sum_{(i,j) \in E} \frac{p_{ij}}{L_i+L_j}\left(I_{n_i+n_j} - \frac{a_{ij} a_{ij}^T}{a_{ij}^T a_{ij}}\right)$ (Laplacian matrix of the graph).
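A small sketch of this closed-form direction for scalar blocks (the quadratic test data and names are illustrative):

```python
import numpy as np

def rcd_direction(g_ij, a_ij, Li, Lj):
    """Closed-form solution of: min <g_ij, s> + (L_i+L_j)/2 ||s||^2  s.t.  a_ij^T s = 0."""
    proj = g_ij - a_ij * (a_ij @ g_ij) / (a_ij @ a_ij)   # gradient projected onto a_ij^T s = 0
    return -proj / (Li + Lj)

# One (RCD) step on coordinates (i, j) of a quadratic f with the constraint a^T x = 1.
rng = np.random.default_rng(0)
n = 100
Q = rng.standard_normal((n, n)); Q = Q.T @ Q
q = rng.standard_normal(n)
a = np.ones(n)
x = np.full(n, 1.0 / n)                                  # feasible starting point

i, j = 3, 57
g_ij = np.array([Q[i] @ x + q[i], Q[j] @ x + q[j]])
d = rcd_direction(g_ij, np.array([a[i], a[j]]), Q[i, i], Q[j, j])
x[i] += d[0]; x[j] += d[1]                               # a^T x is preserved
```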

17 Selection of probabilities I. Uniform probabilities: $p_{ij} = \frac{1}{|E|}$. II. Probabilities dependent on the Lipschitz constants $L_i$: $p_{ij}^\alpha = \frac{L_{ij}^\alpha}{L^\alpha}$, where $L^\alpha = \sum_{(i,j) \in E} L_{ij}^\alpha$ and $\alpha \ge 0$. III. Optimal probabilities obtained from $\max_{Q \in M} \lambda_2(Q)$ (an SDP): $[p_{ij}]_{(i,j) \in E} = \arg\max_{t,\, Q} \left\{ t : Q + t\,\frac{aa^T}{a^T a} \succeq t I_n,\ Q \in M \right\}$, where $M = \{ Q \in \mathbb{R}^{n \times n} : Q = \sum_{(i,j) \in E} \frac{p_{ij}}{L_{ij}} Q_{ij},\ p_{ij} = p_{ji},\ p_{ij} = 0$ if $(i,j) \notin E,\ \sum_{(i,j) \in E} p_{ij} = 1 \}$.

18 Comparison with full projected gradient alg. Assume $a = e$ and Lipschitz-dependent probabilities $p^1_{ij} = \frac{L_i + L_j}{L^1}$; then $Q = \frac{1}{\sum_{i=1}^n L_i}\left(I_n - \frac{1}{n}ee^T\right)$ and $\lambda_2(Q) = \frac{1}{\sum_{i=1}^n L_i}$. Alg. (RCD): iteration complexity O(p), $E[f(x_k)] - f^* \le \frac{(\sum_i L_i)\|x_0 - x^*\|^2}{k}$. Full projected gradient alg.: iteration complexity O(np), $f(x_k) - f^* \le \frac{L_f\|x_0 - x^*\|^2}{k}$, where $\nabla^2 f(x) \preceq L_f I_n$. Remark: the maximal eigenvalue of a symmetric matrix can reach its trace, so in the worst case the rate of convergence of the (RCD) method is the same as that of the full gradient method! However, the (RCD) method has cheap iterations and more chances to accelerate.

19 Numerical tests (I) - Google problem Google problem: $\min_{e^T x = 1} \|Ex - x\|^2$; accuracy $\epsilon = 10^{-3}$; full iterations: $x_0, x_{n/2}, x_n, \dots, x_{kn}$. [Figure: equivalent number of full iterations versus $\|Ex_k - x_k\|/\|x_k\|$. Left: $n = 30$ using probabilities $p^0_{ij}$, $p^1_{ij}$ and $p^*_{ij}$. Right: $n = 10^6$ using probabilities $p^0_{ij}$ and $p^1_{ij}$.]

20 Random coordinate descent - composite case $\min_{x \in \mathbb{R}^n} f(x) + h(x)$ s.t. $a^T x = b$; f convex with block-component Lipschitz gradient; h convex, nonsmooth and separable: $h(x) = \sum_{i=1}^n h_i(x_i)$ (e.g. $h = 1_{[l,u]}$ or $h = \mu\|x\|_1$...). Algorithm (CRCD): given $x_0$ with $a^T x_0 = b$, 1. Choose randomly a pair $(i_k, j_k)$ with probability $p_{i_k j_k}$ 2. Set $x_{k+1} = x_k + E_{i_k} d_{i_k} + E_{j_k} d_{j_k}$, where $d_{ij} = (d_i, d_j) = \arg\min_{s_{ij} \in \mathbb{R}^{n_i+n_j}} f(x) + \langle \nabla_{ij} f(x), s_{ij}\rangle + \frac{L_i+L_j}{2}\|s_{ij}\|^2 + h(x + s_{ij})$ s.t. $a_i^T s_i + a_j^T s_j = 0$. Each iteration requires approximately O(p) operations (quadratic case)!

21 Random coordinate descent - composite case Characteristics: only 2 components of x are updated per iteration; alg. (CRCD) needs only 2 components of the gradient and uses only 2 components $h_i$ & $h_j$ of h; if N = n and h is given by the $\ell_1$-norm or the indicator function of a box, then the direction $d_{ij}$ can be computed in closed form (see the sketch below); if N < n and h is coordinatewise separable, strictly convex and piecewise linear/quadratic with O(1) pieces (e.g. h given by the $\ell_1$-norm), then the direction $d_{ij}$ can be computed in linear time (i.e. $O(n_i + n_j)$ operations); choosing a random pair (i,j) from a uniform probability distribution requires O(1) operations.
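For h a box indicator (e.g. the SVM dual above), the 2-coordinate composite subproblem is a 1-D quadratic over an interval, so the closed form is obtained by clipping the unconstrained minimizer, much as in an SMO step. A sketch under these assumptions (scalar blocks, $a_i, a_j \ne 0$; naming is ours):

```python
import numpy as np

def crcd_box_direction(g, x, a, lo, hi, L, i, j):
    """Direction for scalar blocks with h = box indicator on [lo, hi]:
    min <g_ij, s> + (L_i+L_j)/2 ||s||^2 + 1_[lo,hi](x+s)  s.t.  a_i*s_i + a_j*s_j = 0.
    Assumes a_i, a_j != 0 (e.g. the SVM dual, where a = alpha in {-1,+1})."""
    rho = a[i] / a[j]                           # on the constraint, s_j = -rho * s_i
    t = -(g[i] - rho * g[j]) / ((L[i] + L[j]) * (1.0 + rho**2))   # unconstrained minimizer
    # Intersect the intervals for t imposed by the two box constraints, then clip.
    b1, b2 = (x[j] - hi[j]) / rho, (x[j] - lo[j]) / rho
    t_lo = max(lo[i] - x[i], min(b1, b2))
    t_hi = min(hi[i] - x[i], max(b1, b2))
    t = float(np.clip(t, t_lo, t_hi))
    return t, -rho * t                          # (d_i, d_j)
```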

22 (CRCD) composite case - convergence rate Theorem 2 Let $x_k$ be generated by Algorithm (CRCD) and $L = \max_i L_i$. If the index pairs are selected with uniform distribution, then we have $E[F(x_k)] - F^* \le \frac{N^2 L \|x_0 - x^*\|^2}{k}$. If additionally the function f is $\sigma$-strongly convex, then $E[F(x_k)] - F^* \le \left(1 - \frac{2(1-\gamma)}{N^2}\right)^k (F(x_0) - F^*)$, where $\gamma = 1 - \frac{\sigma}{8L}$ if $\sigma \le 4L$, and $\gamma = 1 - \frac{2L}{\sigma}$ otherwise. Necoara & Patrascu, Random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints, Computational Opt. Appl., 2013.

23 Arithmetic complexity - comparison N = n (scalar case) & sparse QP; $R^2 = \|x_0 - x^*\|^2$; $L_f \le \sum_i L_i$; $L = \max_i L_i$; $L_{av} = \frac{\sum_i L_i}{n}$.
method | worst-case no. of iterations | model | gradient used - cost per iteration
GM (Nesterov) | $O(L_f R^2/\epsilon)$ | h & a | full gradient - O(n)
CGM (Tseng) | $O(nLR^2/\epsilon)$ | h & a | partial gradient - O(n)
RCGM (Nesterov) | $O(nL_{av}R^2/\epsilon)$ | h & a = 0 | partial gradient - O(1)
RCD | $O(nL_{av}R^2/\epsilon)$ | h = 0 & a | partial gradient - O(1)
CRCD | $O(n^2 L_{av} R^2/\epsilon)$ | h & a | partial gradient - O(1)
Our methods RCD & CRCD usually have better (N < n) or comparable (N = n) arithmetic complexity than existing methods; they are adequate for parallel or distributed architectures; robust, with more chances to accelerate (due to randomness); easy to implement (closed-form solutions).

24 Numerical tests (II) - SVM problem
Data set n/m (CRCD) full-iter/obj/time(s) (CGD) iter/obj/time(s)
a7a 16100/122 (p = 14) 11242/ / / /21.5
a9a 32561/123 (p = 14) 15355/ / / /89.0
w8a 49749/300 (p = 12) 15380/ / / /27.2
ijcnn /22 (p = 13) 7601/ / / /16.5
web /254 (p = 85) 1428/ / / /748
covtyp /54 (p = 12) 1722/ / /-24000/480
test /10^6 (p = 50) 228/ / / /568
test /10^3 (p = 10) 500/ / / /516,66
Real test problems taken from the LIBSVM library. Our alg. (CRCD) is faster than the (CGD) method (Tseng) by a factor of 10!

25 Numerical tests (III) - Chebyshev center problem Chebyshev center problem: given a set of points $z_1,\dots,z_n \in \mathbb{R}^m$, find the center $z_c$ and radius r of the smallest enclosing ball of the given points. Applications: pattern recognition, protein analysis, mechanical engineering. Formulation as an optimization problem: $\min_{r, z_c} r$ s.t. $\|z_i - z_c\|^2 \le r$, $i = 1,\dots,n$, where r is the radius and $z_c$ is the center of the enclosing ball. Dual problem: $\min_{x \in \mathbb{R}^n} \|Zx\|^2 - \sum_{i=1}^n \|z_i\|^2 x_i + 1_{[0,\infty)}(x)$ s.t. $e^T x = 1$, (2) where Z contains the given points $z_i$ as columns.

26 Numerical tests (III) - Chebyshev center problem Simple recovery of the primal optimal solution from the dual $x^*$: $r = \left(-\|Zx^*\|^2 + \sum_{i=1}^n \|z_i\|^2 x_i^*\right)^{1/2}$, $z_c = Zx^*$. (3) Two sets of numerical experiments: starting all algorithms from $x_0 = e_1$, we observe that Tseng's algorithm performs well and the Gradient Method is worst; starting from $x_0 = e/n$, the Gradient Method performs well and Tseng's algorithm is worst; algorithm (CRCD) is very robust w.r.t. the starting point $x_0$.
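A small helper (our naming) implementing the recovery formula (3), assuming the dual point x is feasible for (2):

```python
import numpy as np

def recover_center(Z, x):
    """Recovery formula (3): z_c = Z x and r = (sum_i ||z_i||^2 x_i - ||Z x||^2)^(1/2);
    the columns of Z are the given points z_i."""
    zc = Z @ x
    r2 = np.sum((Z**2).sum(axis=0) * x) - np.linalg.norm(zc)**2
    return zc, np.sqrt(max(r2, 0.0))

# Two points at (0,0) and (2,0) with x = (1/2, 1/2): center (1,0), radius 1.
zc, r = recover_center(np.array([[0.0, 2.0], [0.0, 0.0]]), np.array([0.5, 0.5]))
print(zc, r)
```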

27 Numerical tests (III) - Chebyshev center problem [Figure: performance of (CGD), (GM) and (CRCD) for the starting points $x_0 = e/n$ and $x_0 = e_1$.]

28 Random coordinate descent - nonconvex & composite $\min_{x \in \mathbb{R}^n} f(x) + h(x)$; f nonconvex with block-component Lipschitz gradient & a = 0 (no coupling constraints); h proper, convex and block separable, e.g. $h = 0$ or $h = 1_{[l,u]}$ or $h = \mu\|x\|_1$... If $x^* \in \mathbb{R}^n$ is a local minimum, then the following relation holds: $0 \in \nabla f(x^*) + \partial h(x^*)$ (stationary points). Algorithm (NRCD): 1. Choose randomly an index $i_k$ with probability $p_{i_k}$ 2. Compute $x_{k+1} = x_k + E_{i_k} d_{i_k}$, where $d_i = \arg\min_{s_i \in \mathbb{R}^{n_i}} f(x) + \langle \nabla_i f(x), s_i\rangle + \frac{L_i}{2}\|s_i\|^2 + h(x + s_i)$. Each iteration is cheap, complexity O(p) with $p \ll n$ (even closed-form solution)! Patrascu & Necoara, Efficient random coordinate descent algorithms for large-scale structured nonconvex optimization, submitted to J. Global Opt., 2013.
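For scalar blocks and $h = \mu\|\cdot\|_1$, the (NRCD) coordinate subproblem has the familiar soft-thresholding solution; a minimal sketch (naming is ours):

```python
import numpy as np

def soft(v, tau):
    """Soft-thresholding operator."""
    return np.sign(v) * max(abs(v) - tau, 0.0)

def nrcd_step(x, grad_coord, L, mu, rng):
    """One (NRCD) step for scalar blocks and h = mu*||.||_1: the subproblem
    min_s <grad_i f(x), s> + L_i/2 s^2 + mu*|x_i + s| is solved by soft-thresholding."""
    i = rng.integers(len(x))                     # uniform coordinate choice
    x[i] = soft(x[i] - grad_coord(x, i) / L[i], mu / L[i])
    return x
```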

29 (NRCD) nonconvex & composite - convergence rate We introduce the following map (here $L = [L_1 \dots L_N]$ and $D_L = \mathrm{diag}(L)$): $d_L(x) = \arg\min_{s \in \mathbb{R}^n} f(x) + \langle \nabla f(x), s\rangle + \tfrac{1}{2}\|s\|_L^2 + h(x+s)$. We define the optimality measure $M_1(x, L) = \|D_L d_L(x)\|_L^*$, where $\|s\|_L^2 = s^T D_L s$ and $\|u\|_L^*$ is its dual norm (observe that $M_1(x,L) = 0 \Leftrightarrow x$ is a stationary point). Theorem 3 Let the sequence $x_k$ be generated by Algorithm (NRCD) using the uniform distribution; then the following statements are valid: (i) The sequence of random variables $M_1(x_k, L)$ converges to 0 a.s. and the sequence $F(x_k)$ converges to a random variable $F^*$ a.s. (ii) Any accumulation point of the sequence $x_k$ is a stationary point. Moreover, in expectation, $\min_{0 \le l \le k} E\left[\left(M_1(x_l, L)\right)^2\right] \le \frac{2N\left(F(x_0) - F^*\right)}{k}$ for all $k \ge 0$.

30 Random coordinate descent - nonconvex & constrained $\min_{x \in \mathbb{R}^n} f(x) + h(x)$ s.t. $a^T x = b$; f nonconvex with block-component Lipschitz gradient; h proper, convex and separable. If $x^*$ is a local minimum, then there exists a scalar $\lambda^*$ such that $0 \in \nabla f(x^*) + \partial h(x^*) + \lambda^* a$ and $a^T x^* = b$. Algorithm (NCRCD): 1. Choose randomly a pair $(i_k, j_k)$ with probability $p_{i_k j_k}$ 2. Compute $x_{k+1} = x_k + E_{i_k} d_{i_k} + E_{j_k} d_{j_k}$, where $d_{ij} = (d_i, d_j) = \arg\min_{s_{ij} \in \mathbb{R}^{n_i+n_j}} f(x) + \langle \nabla_{ij} f(x), s_{ij}\rangle + \frac{L_i+L_j}{2}\|s_{ij}\|^2 + h(x + s_{ij})$ s.t. $a_i^T s_i + a_j^T s_j = 0$. Each iteration is cheap, complexity O(p) (even closed-form solution)!

31 (NCRCD) nonconvex & constrained - convergence rate We introduce the following map: $d_T(x) = \arg\min_{s \in \mathbb{R}^n:\ a^T s = 0} f(x) + \langle \nabla f(x), s\rangle + \tfrac{1}{2}\|s\|_T^2 + h(x+s)$. We define the optimality measure $M_2(x, T) = \|D_T d_T(x)\|_T^*$, where $T_i = \frac{1}{N}\sum_j L_{ij}$ (observe that $M_2(x,T) = 0 \Leftrightarrow x$ is a stationary point). Theorem 4 Let the sequence $x_k$ be generated by Algorithm (NCRCD) using the uniform distribution; then the following statements are valid: (i) The sequence of random variables $M_2(x_k, T)$ converges to 0 a.s. and the sequence $F(x_k)$ converges to a random variable $F^*$ a.s. (ii) Any accumulation point of the sequence $x_k$ is a stationary point. Moreover, in expectation, $\min_{0 \le l \le k} E\left[\left(M_2(x_l, T)\right)^2\right] \le \frac{N\left(F(x_0) - F^*\right)}{k}$ for all $k \ge 0$.

32 Numerical tests (IV) - eigenvalue complementarity problem Eigenvalue complementarity problem (EiCP): given matrices $A, B \in \mathbb{R}^{n \times n}$, find $\lambda \in \mathbb{R}$ & $x \ne 0$ such that $w = (\lambda B - A)x$, $w \ge 0$, $x \ge 0$, $w^T x = 0$. Applications of EiCP: optimal control, stability analysis of dynamic systems, electrical networks, quantum chemistry, chemical reactions, economics... If A, B are symmetric, then we have the symmetric (EiCP). The symmetric (EiCP) is equivalent to finding a stationary point of a generalized Rayleigh quotient on the simplex: $\min_{x \in \mathbb{R}^n} \frac{x^T A x}{x^T B x}$ s.t. $e^T x = 1$, $x \ge 0$. Equivalent nonconvex logarithmic formulation (for $A, B \succeq 0$ with $a_{ii}, b_{ii} > 0$, e.g. stability of positive dynamical systems): $\max_{x \in \mathbb{R}^n} f(x) = \ln x^T A x - \ln x^T B x$ s.t. $e^T x = 1$, $x \ge 0$, i.e. $h(x) = 1_{[0,\infty)}(x)$. Perron-Frobenius theory for A irreducible and $B = I_n$ implies a global maximum!
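A small helper (our naming, assuming symmetric A, B with positive diagonals as above) that evaluates the logarithmic objective, its gradient, and the complementarity quantities $\lambda$ and w used to check a candidate EiCP solution:

```python
import numpy as np

def eicp_check(A, B, x):
    """Evaluate f(x) = ln(x^T A x) - ln(x^T B x), its gradient, and the EiCP
    quantities lambda = x^T A x / x^T B x and w = (lambda*B - A)x."""
    qA, qB = x @ A @ x, x @ B @ x
    f = np.log(qA) - np.log(qB)
    grad = 2 * (A @ x) / qA - 2 * (B @ x) / qB
    lam = qA / qB
    w = (lam * B - A) @ x            # at a solution: w >= 0, x >= 0 and w^T x = 0
    return f, grad, lam, w
```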

33 Numerical tests (IV) - eigenvalue complementarity problem We compare with the DC (Difference of Convex functions) algorithm (Thi et al. 2012), equivalent in some sense to the Projected Gradient method. The Lipschitz parameter $\mu$ is hard to estimate in the DC algorithm, but it is crucial for its convergence: $\max_{x:\, e^T x = 1,\, x \ge 0} \left(\frac{\mu}{2}\|x\|^2 + \ln x^T A x\right) - \left(\frac{\mu}{2}\|x\|^2 + \ln x^T B x\right)$. [Table: comparison of DC and (NCRCD) in terms of CPU time (sec), number of iterations and final objective value $F^*$ for several problem sizes n and values of $\mu$.]

34 Conclusions Full first/second-order methods are usually inefficient for huge-scale optimization. For sparse problems, coordinate descent methods are adequate due to their low complexity per iteration (O(p)). Randomized coordinate descent methods have a simple strategy for choosing the working set: O(1) operations per index choice. Usually randomized methods outperform greedy methods. We provide rates of convergence and arithmetic complexities for randomized coordinate descent methods. Randomized methods are easy to implement and adequate for modern parallel and distributed architectures.

35 References
A. Beck, The 2-coordinate descent method for solving double-sided simplex constrained minimization problems, Technical Report, 2012.
Y. Nesterov, Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM Journal on Optimization, 22(2), 341-362, 2012.
P. Richtarik and M. Takac, Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function, Mathematical Programming, Series A, 2012.
P. Tseng and S. Yun, A Coordinate Gradient Descent Method for Nonsmooth Separable Minimization, Mathematical Programming, 117, 387-423, 2009.
P. Tseng and S. Yun, A Block-Coordinate Gradient Descent Method for Linearly Constrained Nonsmooth Separable Optimization, Journal of Optimization Theory and Applications, 140, 2009.

36 References
I. Necoara, Y. Nesterov and F. Glineur, A random coordinate descent method on large optimization problems with linear constraints, Technical Report, University Politehnica Bucharest, 2011.
I. Necoara, Random coordinate descent algorithms for multi-agent convex optimization over networks, IEEE Transactions on Automatic Control, 58(7), 1-12, 2013.
I. Necoara and A. Patrascu, A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints, Computational Optimization and Applications, in press, 2013.
A. Patrascu and I. Necoara, Efficient random coordinate descent algorithms for large-scale structured nonconvex optimization, submitted to Journal of Global Optimization, 2013.
