An adaptive accelerated first-order method for convex optimization


Renato D. C. Monteiro    Camilo Ortiz    Benar F. Svaiter

July 2012 (Revised: May 2014)

School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA (monteiro@isye.gatech.edu). The work of this author was partially supported by NSF Grants CMMI-994 and CMMI-322, and ONR Grant ONR N.
School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA (camiort@gatech.edu).
IMPA, Estrada Dona Castorina, Rio de Janeiro, Brazil (benar@impa.br). The work of this author was partially supported by CNPq grants no /28-8, 32962/2-5, 48/28-6, /2-7, and FAPERJ grants E-26/2.82/28, E-26/2.94/2.

Abstract

This paper presents a new accelerated variant of Nesterov's method for solving composite convex optimization problems in which certain acceleration parameters are adaptively (and aggressively) chosen so as to substantially improve its practical performance compared to existing accelerated variants, while at the same time preserving the optimal iteration-complexity shared by these methods. Computational results are presented to demonstrate that the proposed adaptive accelerated method endowed with a restarting scheme outperforms other existing accelerated variants.

1 Introduction

The methods of choice for solving large-scale convex optimization problems, especially when high accuracy is not needed, are first-order methods, due to their cheap iteration cost (time and memory). In [11] (see also [13]), Nesterov presents a scheme for accelerating first-order methods, more specifically the steepest descent method for unconstrained convex optimization and, more generally, the projected gradient method for constrained convex optimization. He shows that the rate of convergence of these methods, namely $O(1/k)$, where $k$ denotes the iteration count, can be improved to $O(1/k^2)$ by considering their corresponding accelerated variants. Due to its wide use for solving large-scale convex optimization problems arising in several applications, other accelerated variants of Nesterov's method for the aforementioned problems, and more generally the composite convex optimization problem, have been proposed and studied in the literature (see for example [1, 3, 5, 6, 8, 10, 11, 12, 14, 15, 18]). Moreover, there has been an increasing effort towards improving the practical performance of these methods (see for example [6, 16]) in the solution of large-scale convex optimization problems.

All of the accelerated variants mentioned above use certain acceleration parameters which are obtained by using well-known update formulas. This paper presents a new accelerated variant for composite convex optimization in which the acceleration parameters are adaptively (and aggressively) chosen so as to substantially improve its practical performance in comparison to these aforementioned variants, while at the same time preserving the optimal iteration-complexity shared by these methods (with the exception of [16]).

More specifically, the new accelerated variant chooses the acceleration parameters at every iteration by solving a simple two-variable convex quadratically constrained linear program. Moreover, its iteration cost is comparable to, or better than, that of the aforementioned accelerated methods since it only computes a single resolvent (or proximal subproblem) at every iteration.

For the purpose of our computational experiments, we have implemented the new accelerated variant endowed with two restarting schemes in order to improve its practical performance. The first restarting scheme is an aggressive one which restarts the method from the last iterate whenever the objective function increases (also proposed in [16]). The second restarting scheme is a more conservative (and more robust) one which restarts the method whenever the objective function increases and the number of iterations at that point is sufficiently large. Finally, computational results are presented on several conic quadratic programming instances showing that the new accelerated variant endowed with either one of the restarting schemes is faster and more robust than other existing variants of Nesterov's method.

Our paper is organized as follows. Section 2 presents a class of accelerated first-order methods, and corresponding iteration-complexity results, for solving composite convex optimization problems. Moreover, this section proposes a specific instance of the latter class which adaptively chooses the sequence of acceleration parameters so as to greedily minimize an upper bound on the primal gap. Section 3 presents computational results comparing the latter instance (endowed with one of the aforementioned restart schemes) with other existing variants of Nesterov's method. Finally, Section 4 presents some final remarks.

2 An accelerated first-order method

This section introduces an accelerated first-order method based on the one presented in [10] for solving composite convex optimization problems. First, we propose a method with general acceleration parameters satisfying certain conditions which still guarantee an optimal iteration-complexity. Next, for the purpose of improving the practical performance of the method, we propose a specific way of choosing these parameters that greedily minimizes an upper bound on the primal gap.

Let $\mathbb{R}$ denote the set of real numbers and define $\bar{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\}$. Also, let $\mathcal{X}$ denote a finite dimensional real inner product space with inner product and induced norm denoted by $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$, respectively. Our problem of interest is the composite convex optimization problem

$\phi_* := \min_{x \in \mathcal{X}} \; \phi(x) := f(x) + h(x)$   (1)

where:

C.1 $h : \mathcal{X} \to \bar{\mathbb{R}}$ is a proper closed convex function;

C.2 $f$ is a real-valued function which is differentiable and convex on a closed convex set $\Omega \supseteq \mathrm{dom}\, h := \{x \in \mathcal{X} : h(x) < \infty\}$;

C.3 $\nabla f$ is $L$-Lipschitz continuous on $\Omega$ for some $L > 0$, i.e., $\|\nabla f(x') - \nabla f(x)\| \le L \|x' - x\|$ for all $x, x' \in \Omega$;

C.4 the set $X^*$ of optimal solutions of (1) is non-empty.

There are many accelerated variants of Nesterov's method for solving (1) under assumptions C.1-C.4. In this paper we are interested in studying the properties of the following accelerated variant of Nesterov's method, whose statement uses the projection operator $\Pi_\Omega : \mathcal{X} \to \Omega$ onto $\Omega$ defined as $\Pi_\Omega(x) = \arg\min_{x' \in \Omega} \|x' - x\|$ for all $x \in \mathcal{X}$.
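To make the setting concrete, the following sketch (ours, not part of the original paper; all names are illustrative and the paper's own implementation is in MATLAB, see Section 3) encodes the oracles needed by the methods of this section for a hypothetical box-constrained quadratic instance of (1): the smooth part $f$ and its gradient, the composite part $h$ through its resolvent, and the projection $\Pi_\Omega$.

```python
import numpy as np

class CompositeProblem:
    """Oracles for problem (1): f smooth with L-Lipschitz gradient on Omega,
    h proper closed convex (here an indicator), and the projection onto Omega."""

    def __init__(self, Q, b, lo, up):
        self.Q, self.b, self.lo, self.up = Q, b, lo, up
        self.L = np.linalg.norm(Q, 2)          # Lipschitz constant of grad f

    def f(self, x):
        return 0.5 * x @ self.Q @ x + self.b @ x

    def grad_f(self, x):
        return self.Q @ x + self.b

    def h(self, x):
        # indicator of the box B = {x : lo <= x <= up}; 0 on B, +inf outside
        return 0.0 if np.all(x >= self.lo) and np.all(x <= self.up) else np.inf

    def prox_h(self, z, lam):
        # resolvent (I + lam*dh)^{-1}(z); for an indicator it is the projection
        return np.clip(z, self.lo, self.up)

    def proj_Omega(self, x):
        # here Omega = X = R^n, so the projection is the identity
        return x
```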

Accelerated Class: A class of accelerated algorithms for (1)

0) Let $\lambda \in (0, 1/L]$ and $x_0 \in \mathcal{X}$ be given, and set $A_0 = 0$, $y_0 = x_0$, the affine function $\Gamma_0 : \mathcal{X} \to \mathbb{R}$ as $\Gamma_0 \equiv 0$, and $k = 1$;

1) compute $a_k$, $\tilde{x}_k$, $\tilde{x}_k^\Omega$, $y_k$ and the affine function $\gamma_k : \mathcal{X} \to \mathbb{R}$ as

$a_k := \dfrac{\lambda + \sqrt{\lambda^2 + 4\lambda A_{k-1}}}{2}$,   (2)

$\tilde{x}_k := \dfrac{A_{k-1}}{A_{k-1} + a_k}\, y_{k-1} + \dfrac{a_k}{A_{k-1} + a_k}\, x_{k-1}$,   $\tilde{x}_k^\Omega := \Pi_\Omega(\tilde{x}_k)$,   (3)

$y_k := \arg\min_{x \in \mathcal{X}} \left\{ \lambda h(x) + \tfrac{1}{2}\|x - \tilde{x}_k + \lambda \nabla f(\tilde{x}_k^\Omega)\|^2 \right\}$,   (4)

$\gamma_k(x) := f(\tilde{x}_k^\Omega) + \langle \nabla f(\tilde{x}_k^\Omega), y_k - \tilde{x}_k^\Omega \rangle + h(y_k) + \tfrac{1}{\lambda}\langle \tilde{x}_k - y_k, x - y_k \rangle$ for all $x \in \mathcal{X}$;   (5)

2) choose a pair $(A_k', a_k') = (A', a') \in \mathbb{R} \times \mathbb{R}$ such that

$A' \ge 0$,  $a' \ge 0$,  $A' + a' \ge A_{k-1} + a_k$,   (6)

$(A' + a')\,\phi(y_k) \le \inf_{x \in \mathcal{X}} \left\{ (A' \Gamma_{k-1} + a' \gamma_k)(x) + \tfrac{1}{2}\|x - x_0\|^2 \right\}$,   (7)

with the safeguard that $A_k' = A_0 = 0$ when $k = 1$;

3) compute

$A_k = A_k' + a_k'$,   (8)

$\Gamma_k = \dfrac{A_k'}{A_k' + a_k'}\, \Gamma_{k-1} + \dfrac{a_k'}{A_k' + a_k'}\, \gamma_k$,   (9)

$x_k = x_0 - A_k \nabla \Gamma_k$;   (10)

4) set $k \leftarrow k + 1$, and go to step 1.

Observe that (7), (8) and (9) imply

$A_k \phi(y_k) \le \inf_{x \in \mathcal{X}} \left\{ A_k \Gamma_k(x) + \tfrac{1}{2}\|x - x_0\|^2 \right\}$.   (11)

Also, note that the Accelerated Class does not specify how to choose $(A_k', a_k') \in \mathbb{R} \times \mathbb{R}$ satisfying (6) and (7). Different choices of the pair $(A_k', a_k')$ determine different instances of the class. For example, if the pair $(A_k', a_k')$ is chosen as $(A_k', a_k') = (A_{k-1}, a_k)$ then, using equation (30) of Lemma A.3, it can be shown that the Accelerated Class reduces to Algorithm I of [10]. Moreover, if $(A_k', a_k') = (A_{k-1}, a_k)$ and $\Omega = \mathcal{X}$ (and hence the gradient of $f$ is defined and is $L$-Lipschitz continuous on the whole $\mathcal{X}$), then the Accelerated Class reduces to a well-known accelerated variant of Nesterov's method, namely FISTA [1]. On the other hand, this paper is concerned with the instance of the above class which chooses the pair $(A_k', a_k')$ so as to maximize $A_k$ subject to conditions (6) and (7).
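The sketch below (ours, with hypothetical names; it is not the paper's MATLAB code) implements steps 1-3 of one iteration of the Accelerated Class for oracles like the `CompositeProblem` sketch above. Affine functions are represented by a pair `(g, c)` meaning $x \mapsto c + \langle g, x\rangle$, and the rule used in step 2 is passed in as a callback so that different instances of the class are obtained by swapping it.

```python
import numpy as np

def accel_class_step(prob, lam, x0, state, choose_pair):
    """One iteration (steps 1-3) of the Accelerated Class.
    state = (A_prev, x_prev, y_prev, (gG, cG)), where Gamma_{k-1}(x) = cG + <gG, x>."""
    A_prev, x_prev, y_prev, (gG, cG) = state

    # step 1: acceleration parameter, extrapolation, resolvent (4), and minorant (5)
    a_k = 0.5 * (lam + np.sqrt(lam**2 + 4.0 * lam * A_prev))
    x_til = (A_prev * y_prev + a_k * x_prev) / (A_prev + a_k)
    x_til_om = prob.proj_Omega(x_til)
    grad = prob.grad_f(x_til_om)
    y_k = prob.prox_h(x_til - lam * grad, lam)

    # gamma_k(x) = c_gam + <g_gam, x>, read off from relation (5)
    g_gam = (x_til - y_k) / lam
    c_gam = prob.f(x_til_om) + grad @ (y_k - x_til_om) + prob.h(y_k) - g_gam @ y_k

    # step 2: any rule returning (A', a') satisfying (6)-(7), e.g. the safe choice
    # (A_prev, a_k) or the adaptive rule (13) sketched later in this section
    phi_yk = prob.f(y_k) + prob.h(y_k)
    A_p, a_p = choose_pair(A_prev, a_k, phi_yk, x0, (gG, cG), (g_gam, c_gam))

    # step 3: aggregate the affine minorant and update x_k via relation (10)
    A_k = A_p + a_p
    gG_new = (A_p * gG + a_p * g_gam) / A_k
    cG_new = (A_p * cG + a_p * c_gam) / A_k
    x_k = x0 - A_k * gG_new
    return A_k, x_k, y_k, (gG_new, cG_new)
```

The initial state corresponding to step 0 is $(A_0, x_0, y_0, \Gamma_0) = (0, x_0, x_0, 0)$, i.e. `(0.0, x0, x0, (np.zeros_like(x0), 0.0))`.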

The following result not only shows that the pair $(A_k', a_k') = (A_{k-1}, a_k)$ satisfies (6) and (7), but also that any instance of the above class has optimal iteration-complexity.

Proposition 2.1. The following statements hold for every $k \ge 1$:

a) $(A_k', a_k') = (A_{k-1}, a_k)$ satisfies (6) and (7), and hence the Accelerated Class is well-defined;

b) $\Gamma_k$ is affine and $\Gamma_k \le \phi$;

c) $A_k \ge \lambda k^2/4$;

d) there holds

$\phi(y_k) - \phi_* \le \dfrac{d_0^2}{2 A_k} \le \dfrac{2 d_0^2}{\lambda k^2}$,   (12)

where $d_0$ is the distance of $x_0$ to $X^*$.

Proof. Statements a), b) and c) follow from Lemma A.3. Letting $x^*$ denote the projection of $x_0$ onto $X^*$, d) follows from c) and the fact that (11) and b) imply that

$A_k \phi(y_k) \le A_k \Gamma_k(x^*) + \tfrac{1}{2}\|x^* - x_0\|^2 \le A_k \phi_* + \tfrac{1}{2} d_0^2$.

Observe that Proposition 2.1(d) implies that any instance of the Accelerated Class with $\lambda = 1/L$ has the optimal iteration-complexity $O(L d_0^2/k^2)$. Moreover, based on the fact that (8) and (12) imply that

$\phi(y_k) - \phi_* = O\!\left(\dfrac{1}{A_k' + a_k'}\right)$,

our new accelerated variant chooses the pair $(A_k', a_k')$ as

$(A_k', a_k') \in \arg\max_{A', a'} \left\{ A' + a' : A' \ge 0,\ a' \ge 0 \text{ and } (A', a') \text{ satisfies (7)} \right\}$   (13)

so as to greedily reduce the primal gap $\phi(y_k) - \phi_*$ as much as possible.

We will now establish that (7) is equivalent to a convex quadratic constraint, and hence that (13) is a simple two-variable convex quadratically constrained linear program. We first state the following simple technical result.

Lemma 2.2. Let $\chi \in \mathbb{R}$, $x_0 \in \mathcal{X}$ and an affine function $\Gamma : \mathcal{X} \to \mathbb{R}$ be given. Then, the following conditions are equivalent:

a) $\min\{\Gamma(x) + \|x - x_0\|^2/2 : x \in \mathcal{X}\} \ge \chi$;

b) $\chi \le \Gamma(x_0) - \|\nabla\Gamma\|^2/2$.

Proof. The result follows from the optimality conditions of the unconstrained optimization problem in a).

The following result gives the aforementioned characterization of condition (7).

Proposition 2.3. For every $k \ge 1$, condition (7) holds if and only if

$A'\,[\phi(y_k) - \Gamma_{k-1}(x_0)] + a'\,[\phi(y_k) - \gamma_k(x_0)] + \tfrac{1}{2}\|A' \nabla\Gamma_{k-1} + a' \nabla\gamma_k\|^2 \le 0$.   (14)
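As an illustration (ours, with hypothetical helper names, not the paper's implementation), the rule (13), characterized by (14), can be solved numerically with a generic nonlinear programming routine; the sketch below uses SciPy's SLSQP solver and starts from the always-feasible pair $(A_{k-1}, a_k)$ of Proposition 2.1(a). It is compatible with the `accel_class_step` sketch above.

```python
import numpy as np
from scipy.optimize import minimize

def choose_pair_adaptive(A_prev, a_k, phi_yk, x_anchor, Gamma_prev, gamma_k):
    """Pick (A', a') maximizing A' + a' subject to (14) and A', a' >= 0.
    Gamma_prev and gamma_k are pairs (g, c) representing the affine map x -> c + <g, x>."""
    (gG, cG), (gg, cg) = Gamma_prev, gamma_k
    rG = phi_yk - (cG + gG @ x_anchor)     # phi(y_k) - Gamma_{k-1}(x_0)
    rg = phi_yk - (cg + gg @ x_anchor)     # phi(y_k) - gamma_k(x_0)

    def slack(z):                          # feasible for (14) iff slack(z) >= 0
        A, a = z
        return -(A * rG + a * rg + 0.5 * np.linalg.norm(A * gG + a * gg) ** 2)

    res = minimize(lambda z: -(z[0] + z[1]),            # maximize A' + a'
                   np.array([A_prev, a_k]),             # feasible by Prop. 2.1(a)
                   method='SLSQP',
                   bounds=[(0.0, None), (0.0, None)],
                   constraints=[{'type': 'ineq', 'fun': slack}])
    if res.success and slack(res.x) >= -1e-10:
        return float(res.x[0]), float(res.x[1])
    return A_prev, a_k                                  # fall back to the safe pair
```

A dedicated two-variable solver (or a closed-form treatment) would of course be cheaper per iteration; a real implementation would also detect the unbounded case characterized in Proposition 2.4 below, in which case the current iterate is already optimal.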

Proof. This result follows from Lemma 2.2 with $\Gamma = A' \Gamma_{k-1} + a' \gamma_k$ and $\chi = (A' + a')\phi(y_k)$.

Note that (14) is a single convex quadratic constraint on the scalars $A'$ and $a'$. Hence, the new accelerated method based on (13) chooses the pair $(A_k', a_k')$ as an optimal solution of the simple two-variable convex quadratically constrained linear program

max  $A' + a'$
s.t.  $A'[\phi(y_k) - \Gamma_{k-1}(x_0)] + a'[\phi(y_k) - \gamma_k(x_0)] + \tfrac{1}{2}\|A' \nabla\Gamma_{k-1} + a' \nabla\gamma_k\|^2 \le 0$,
      $A' \ge 0$, $a' \ge 0$.   (15)

The remaining part of this section discusses the case where problem (15) is unbounded. It will be seen that this case implies that $y_k \in X^*$, in which case the new accelerated method may be successfully stopped. The following result, whose proof is given in Appendix B, considers a slightly more general problem which contains (15) as a special case.

Proposition 2.4. Suppose that $x_0, y \in \mathcal{X}$ and $\gamma_1, \ldots, \gamma_m : \mathcal{X} \to \mathbb{R}$ are affine functions minorizing $\phi$, and consider the problem

max  $\sum_{i=1}^m \alpha_i$
s.t.  $\inf_{x \in \mathcal{X}} \left\{ \sum_{i=1}^m \alpha_i \gamma_i(x) + \tfrac{1}{2}\|x - x_0\|^2 \right\} \ge \left(\sum_{i=1}^m \alpha_i\right) \phi(y)$,
      $\alpha_1, \ldots, \alpha_m \ge 0$.   (16)

Then, the following conditions are equivalent:

a) problem (16) is unbounded;

b) there exist not all zero nonnegative scalars $\bar\alpha_1, \ldots, \bar\alpha_m$ such that

$\sum_{i=1}^m \bar\alpha_i \nabla\gamma_i = 0$,   $\bar\alpha_i\,[\phi(y) - \gamma_i(y)] = 0$,  $i = 1, \ldots, m$.   (17)

In both cases, $y \in X^*$.

Proposition 2.4 deals with the case where (15) is unbounded. On the other hand, when (15) is bounded, its feasible set is compact, and hence (15) has an optimal solution. In our computational experiments, we will refer to the instance of the Accelerated Class which chooses the pair $(A_k', a_k')$ as an optimal solution of (15) as the adaptive accelerated (AA) method.

3 Numerical results

In this section, we describe two restarting variants of the AA method and report numerical results comparing them to the following variants of Nesterov's method:

i) FISTA (fast iterative shrinkage-thresholding algorithm) of [1] (see also Algorithm 2 of [18]);

ii) FISTA-R: restarting variant of FISTA;

iii) GKR-2: Algorithm 2 of [7];

iv) GKR-3: Algorithm 3 of [7].

More specifically, we compare the performance of these methods using three classes of conic quadratic programming instances, namely: a) random convex quadratic programs (CQPs) (see Subsection 3.1); b) semidefinite least squares (SDLSs) (see Subsection 3.2); and c) random nonnegative least squares (NNLSs) (see Subsection 3.3). We observe that we have implemented our own code for FISTA-R. Although we have recently learned that a similar variant was implemented in [16], we have not had the opportunity to include their code in our computational benchmark. Nevertheless, we believe that our implementation should be very similar to the method in [16], and hence should reflect the actual performance of the latter algorithm.

We stop all methods whenever an iterate $y_k$ is found such that

$\left\| y_k - \left(I + \tfrac{1}{L}\partial h\right)^{-1}\!\left(y_k - \tfrac{1}{L}\nabla f(y_k)\right) \right\| \le 10^{-6}$   (18)

or when a prescribed maximum number of iterations has been performed. Note that for the case where $h$ is an indicator function of a closed convex set, the left hand side of (18) is exactly the norm of the projected gradient with stepsize $1/L$.

We now briefly describe the two restarting versions of the AA method. In the first version, if the function value at the end of the $k$th iteration increases, i.e., $\phi(y_k) > \phi(y_{k-1})$, then the method is restarted at step 0 with $x_0 = y_{k-1}$ (and hence $y_0 = y_{k-1}$). FISTA-R refers to the variant of FISTA which incorporates this restarting scheme. Moreover, FISTA-R is equivalent to one of the restarting variants of FISTA discussed in [16]. In contrast to the first restarting version above, the second one allows some iterations to increase the function value, and hence is more conservative than the first one. More specifically, we restart this variant at the $k$th iteration whenever $\phi(y_k) > \phi(y_{k-1})$ and no more than $\log_2(k - l_k)$ restarts have been performed so far, where $l_k$ is the iteration at which the last restart was performed. We will refer to the first and second restarting variants of the AA method as AA-R and AA-R2, respectively.

The codes for all the benchmarked variants tested are written in MATLAB. All the computational results were obtained on a single core of a server with two Xeon X5520 processors at 2.27GHz and 48GB of RAM.

We now make some general remarks about how the results are reported in the tables given below. Tables 1, 3 and 5 report the times and Tables 2, 4 and 6 report the number of iterations for all instances of the three problem classes. Each problem class is associated with two tables, one reporting the times and the other one the number of iterations required by each benchmarked method to solve all instances of the class. We display the time or number of iterations that a variant takes on an instance in red, and also with an asterisk (*), whenever it cannot solve the instance to the required accuracy. In such a case, the accuracy obtained at the last iteration of the variant is also displayed in parentheses. Figures 1, 3 and 5 plot the time performance profiles (see [4]), and Figures 2, 4 and 6 plot the iteration performance profiles for each of the three problem classes. We recall the following definition of a performance profile. For a given instance, a method A is said to be at most $x$ times slower than method B if the time taken (resp. number of iterations performed) by method A is at most $x$ times the time taken (resp. number of iterations performed) by method B. A point $(x, y)$ is in the performance profile curve of a method if it can solve exactly $y\%$ of all the tested instances at most $x$ times slower than any other competing method.
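The sketch below (ours; the exact normalization of (18) is partly illegible in the source, so an absolute tolerance of $10^{-6}$ is assumed) shows how the stopping test and the conservative restart rule can be implemented on top of the iteration sketch from Section 2.

```python
import numpy as np

def projected_gradient_residual(prob, y):
    # left-hand side of (18): || y - (I + (1/L) dh)^{-1}( y - (1/L) grad f(y) ) ||
    step = 1.0 / prob.L
    return np.linalg.norm(y - prob.prox_h(y - step * prob.grad_f(y), step))

def should_restart_conservative(phi_curr, phi_prev, k, last_restart_iter, n_restarts):
    """Conservative rule: restart only if the objective increased AND no more than
    log2(k - l_k) restarts have been performed so far."""
    if phi_curr <= phi_prev:
        return False
    gap = max(k - last_restart_iter, 1)
    return n_restarts <= np.log2(gap)
```

The aggressive rule used by AA-R and FISTA-R simply drops the second condition and restarts whenever `phi_curr > phi_prev`.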

Figure 1: Time performance profiles for solving CQP instances with accuracy $\epsilon = 10^{-6}$. On the left we include all the methods and on the right we include the three fastest variants, namely AA-R, AA-R2 and FISTA-R.

3.1 Numerical results for random CQPs

This subsection compares the performance of our methods AA-R and AA-R2 with the variants of Nesterov's method listed at the beginning of this section on a class of randomly generated sparse CQP instances. These instances were also used to report the performance of GKR-2 and GKR-3 in [7].

Let $\mathbb{R}^n$ denote the $n$-dimensional Euclidean space, $\mathcal{S}^n$ denote the set of all $n \times n$ symmetric matrices and $\mathcal{S}^n_+$ denote the cone of $n \times n$ symmetric positive semidefinite matrices. Given $Q \in \mathcal{S}^n_+$, $b \in \mathbb{R}^n$, $l \in \{\mathbb{R} \cup \{-\infty\}\}^n$ and $u \in \{\mathbb{R} \cup \{\infty\}\}^n$ such that $l \le u$, the box-constrained convex quadratic programming problem is defined as

$\min_{x \in \mathbb{R}^n} \left\{ x^T Q x + b^T x : l \le x \le u \right\}$.   (19)

Letting $f$ and $h$ be defined as

$f(x) = x^T Q x + b^T x$,  $h(x) = \delta_B(x)$,  for all $x \in \mathbb{R}^n$,

where $B = \{x \in \mathbb{R}^n : l \le x \le u\}$, we can easily see that (19) is a special case of (1) with $\Omega = \mathcal{X} = \mathbb{R}^n$.

Figures 1 and 2 plot time and iteration performance profiles of all variants of Nesterov's method for solving this collection of random sparse CQP instances, respectively. Tables 1 and 2 report the time and number of iterations taken by each method, respectively. Note that AA-R and AA-R2 outperform the other methods on most of the random sparse CQP instances. From Figures 1 and 2 we can see that the aggressive restart scheme of AA-R performs slightly better than the conservative one of AA-R2 on these instances.

Figure 2: Iteration performance profiles for solving CQP instances with accuracy $\epsilon = 10^{-6}$. On the left we include all the methods and on the right we include the three fastest variants, namely AA-R, AA-R2 and FISTA-R.

3.2 Numerical results for SDLSs

This subsection compares the performance of our methods AA-R and AA-R2 with the variants of Nesterov's method listed at the beginning of this section on a class of SDLS instances.

Given a linear map $\mathcal{A} : \mathcal{S}^n \to \mathbb{R}^m$ and $b \in \mathbb{R}^m$, the semidefinite programming (SDP) feasibility problem consists of finding $x$ such that

$\mathcal{A}x = b$,  $x \in \mathcal{S}^n_+$.

We can solve the above problem by considering the SDLS reformulation

$\min_{x \in \mathcal{S}^n} \left\{ \tfrac{1}{2}\|\mathcal{A}x - b\|^2 : x \in \mathcal{S}^n_+ \right\}$.   (20)

Letting $f$ and $h$ be defined as

$f(x) = \tfrac{1}{2}\|\mathcal{A}x - b\|^2$,  $h(x) = \delta_{\mathcal{S}^n_+}(x)$,  for all $x \in \mathcal{S}^n$,

where $\|\cdot\|$ denotes the Euclidean norm, we can easily see that (20) is a special case of (1) with $\Omega = \mathcal{X} = \mathcal{S}^n$.

The SDLS instances included in this comparison are obtained via the above construction from the feasibility sets (after bringing them into standard form) of four classes of SDPs, namely: i) randomly generated SDPs as in [17]; ii) SDP relaxations of frequency assignment problems (see for example Subsection 2.4 in [2]); iii) SDP relaxations of binary integer quadratic problems (see for example Section 7 in [19]); and iv) SDP relaxations of quadratic assignment problems (see for example Section 7 in [19]).

Figures 3 and 4 plot time and iteration performance profiles of all variants of Nesterov's method for solving this collection of SDLS instances, respectively. Tables 3 and 4 report the time and number of iterations taken by each method, respectively. Note that AA-R and AA-R2 outperform the other methods on most of the SDLS instances. From Figures 3 and 4 we can see that the aggressive restart scheme of AA-R performs slightly better than the conservative one of AA-R2 on these instances.
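For the SDLS class, the only nontrivial oracle is the resolvent of $h = \delta_{\mathcal{S}^n_+}$, i.e. the projection onto the positive semidefinite cone. A standard way to compute it (a sketch of ours, not code from the paper) is via an eigenvalue decomposition:

```python
import numpy as np

def proj_psd(X):
    """Projection of a symmetric matrix X onto the PSD cone S^n_+,
    obtained by zeroing out the negative eigenvalues."""
    Xs = 0.5 * (X + X.T)                   # symmetrize against round-off
    w, V = np.linalg.eigh(Xs)              # eigen-decomposition Xs = V diag(w) V^T
    return (V * np.maximum(w, 0.0)) @ V.T  # keep only the nonnegative part
```

Since $h$ is an indicator here, the resolvent $(I + \lambda\,\partial h)^{-1}$ equals this projection for every $\lambda > 0$, so the per-iteration cost of all compared methods is dominated by one eigendecomposition.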

Figure 3: Time performance profiles for solving SDLS instances with accuracy $\epsilon = 10^{-6}$. On the left we include all the methods and on the right we include the three fastest variants, namely AA-R, AA-R2 and FISTA-R.

Figure 4: Number of iterations performance profiles for solving SDLS instances with accuracy $\epsilon = 10^{-6}$. On the left we include all the methods and on the right we include the three fastest variants, namely AA-R, AA-R2 and FISTA-R.

Figure 5: Time performance profiles for solving NNLS instances with accuracy $\epsilon = 10^{-6}$. On the left we include all the methods and on the right we include the three fastest variants, namely AA-R, AA-R2 and FISTA-R.

3.3 Numerical results for NNLSs

This subsection compares the performance of our methods AA-R and AA-R2 with the variants of Nesterov's method listed at the beginning of this section on a class of NNLS instances randomly generated as in [9].

Given a matrix $A \in \mathbb{R}^{m \times n}$ and a vector $b \in \mathbb{R}^m$, the NNLS problem is defined as

$\min_{x \in \mathbb{R}^n} \left\{ \tfrac{1}{2}\|Ax - b\|^2 : x \ge 0 \right\}$,   (21)

where $\|\cdot\|$ denotes the Euclidean norm. Letting $f$ and $h$ be defined as

$f(x) = \tfrac{1}{2}\|Ax - b\|^2$,  $h(x) = \delta_{\mathbb{R}^n_+}(x)$,  for all $x \in \mathbb{R}^n$,

where $\mathbb{R}^n_+$ is the cone of nonnegative vectors in $\mathbb{R}^n$, we can easily see that (21) is a special case of (1) with $\Omega = \mathcal{X} = \mathbb{R}^n$.

Figures 5 and 6 plot the time and iteration performance profiles of all variants of Nesterov's method for solving this collection of random NNLS instances, respectively. Tables 5 and 6 report the time and number of iterations taken by each method, respectively. Note that AA-R2 outperforms the other methods on most of the NNLS instances where a solution with the required accuracy was found. From Figures 5 and 6 we can see that the conservative restart scheme of AA-R2 performs better and is more robust than the aggressive one of AA-R on these instances. Also, we can see that the methods that do not incorporate a restart scheme are only able to solve less than 10% of the NNLS instances, while the ones that do incorporate a restart scheme solve at least 70% of them.
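The NNLS oracles are simple enough to state explicitly. The sketch below (ours, with a randomly generated instance only for illustration) produces an object compatible with the iteration sketch of Section 2, using $\nabla f(x) = A^T(Ax - b)$, $L = \|A\|_2^2$, and the resolvent of $\delta_{\mathbb{R}^n_+}$ given by componentwise clipping at zero.

```python
import numpy as np

class NNLSProblem:
    """Oracles for the NNLS instance (21): f(x) = 0.5*||Ax - b||^2, h = indicator of x >= 0."""

    def __init__(self, A, b):
        self.A, self.b = A, b
        self.L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of grad f

    def f(self, x):
        r = self.A @ x - self.b
        return 0.5 * r @ r

    def grad_f(self, x):
        return self.A.T @ (self.A @ x - self.b)

    def h(self, x):
        return 0.0 if np.all(x >= 0) else np.inf

    def prox_h(self, z, lam):
        return np.maximum(z, 0.0)               # projection onto the nonnegative orthant

    def proj_Omega(self, x):
        return x                                # Omega = R^n

# a small random instance, for illustration only
rng = np.random.default_rng(0)
prob = NNLSProblem(rng.standard_normal((40, 20)), rng.standard_normal(40))
```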

Figure 6: Number of iterations performance profiles for solving NNLS instances with accuracy $\epsilon = 10^{-6}$. On the left we include all the methods and on the right we include the three fastest variants, namely AA-R, AA-R2 and FISTA-R.

4 Concluding remarks

We have observed in our computational experiments that AA-R2 quickly stops performing restarts, while AA-R periodically continues to perform restarts. Figures 7 and 8 plot the time and iteration performance profiles of all variants of Nesterov's method on the collection of instances obtained by combining all the three instance classes described in the previous sections. From these plots we can see that the overall performances of AA-R and AA-R2 are very close to one another, with AA-R2 turning out to be more robust, as it solves more problems to the specified accuracy $10^{-6}$ than any of the other variants tested. We can also see that AA-R and AA-R2 are the two fastest variants among the six used in this benchmark in terms of both time and number of iterations. Finally, our implementation of the AA method and its restarting variants is publicly available.

Acknowledgements

We want to thank Elizabeth W. Karas for generously providing us with the code of the algorithms in [7].

References

[1] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183-202, 2009.

[2] S. Burer, R. D. C. Monteiro, and Y. Zhang. A computational study of a gradient-based log-barrier algorithm for a class of large-scale SDPs. Mathematical Programming, 95:359-379, 2003.

Figure 7: Time performance profiles for solving all the three problem classes with accuracy $\epsilon = 10^{-6}$. On the left we include all the methods and on the right we include the three fastest variants, namely AA-R, AA-R2 and FISTA-R.

Figure 8: Iteration performance profiles for solving all the three problem classes with accuracy $\epsilon = 10^{-6}$. On the left we include all the methods and on the right we include the three fastest variants, namely AA-R, AA-R2 and FISTA-R.

[3] O. Devolder, F. Glineur, and Y. Nesterov. First-order methods of smooth convex optimization with inexact oracle. CORE Discussion Paper 2011/02, Center for Operations Research and Econometrics (CORE), Catholic University of Louvain, December 2011.

[4] E. D. Dolan and J. J. Moré. Benchmarking optimization software with performance profiles. Mathematical Programming, 91(2):201-213, 2002.

[5] D. Goldfarb and K. Scheinberg. Fast first-order methods for composite convex optimization with line search. Optimization-online preprint, 2011.

[6] C. C. Gonzaga and E. W. Karas. Fine tuning Nesterov's steepest descent algorithm for differentiable convex programming. Mathematical Programming, 2012.

[7] C. C. Gonzaga, E. W. Karas, and D. R. Rossetto. An optimal algorithm for constrained differentiable convex optimization. Optimization-online preprint, 2011.

[8] G. Lan, Z. Lu, and R. D. C. Monteiro. Primal-dual first-order methods with O(1/ε) iteration-complexity for cone programming. Mathematical Programming, 126(1):1-29, 2011.

[9] M. Merritt and Y. Zhang. Interior-point gradient method for large-scale totally nonnegative least squares problems. Journal of Optimization Theory and Applications, 126(1):191-202, 2005.

[10] R. D. C. Monteiro and B. F. Svaiter. An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM Journal on Optimization, 23(2):1092-1125, 2013.

[11] Y. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k²). Doklady AN SSSR, 269(3):543-547, 1983.

[12] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course, volume 87 of Applied Optimization. Kluwer Academic Publishers, 2004.

[13] Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127-152, 2005.

[14] Y. Nesterov. Dual extrapolation and its applications to solving variational inequalities and related problems. Mathematical Programming, 109(2-3, Ser. B):319-344, 2007.

[15] Y. Nesterov. Gradient methods for minimizing composite objective function. CORE Discussion Paper, 2007.

[16] B. O'Donoghue and E. Candès. Adaptive restart for accelerated gradient schemes. Foundations of Computational Mathematics, 2013.

[17] J. Povh, F. Rendl, and A. Wiegele. A boundary point method to solve semidefinite programs. Computing, 78(3):277-286, 2006.

[18] P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. Submitted to SIAM Journal on Optimization, 2008.

[19] X.-Y. Zhao, D. Sun, and K.-C. Toh. A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM Journal on Optimization, 20(4):1737-1765, 2010.

A Technical results used in Section 2

This section presents some technical results used in Section 2. In particular, Lemma A.3 shows that the Accelerated Class satisfies statements a)-c) of Proposition 2.1 at every iteration. Throughout this appendix we assume that the functions $\phi$, $f$ and $h$, and the set $\Omega$, satisfy conditions C.1-C.4 of Section 2. We start with two technical lemmas that will be used to prove the main result of this appendix, Lemma A.3.

Lemma A.1. Let $A \ge 0$, $\lambda > 0$, $x_0, y \in \mathcal{X}$ and a closed convex function $\Gamma : \mathcal{X} \to \bar{\mathbb{R}}$ be given, and assume that the triple $(y, A, \Gamma)$ satisfies

$A\phi(y) \le \min_{u \in \mathcal{X}} \left\{ A\Gamma(u) + \tfrac{1}{2}\|u - x_0\|^2 \right\}$,  $A\Gamma \le A\phi$.   (22)

Also, define $x \in \mathcal{X}$, $a > 0$ and $\tilde{x} \in \mathcal{X}$ as

$x := \arg\min_{u \in \mathcal{X}} \left\{ A\Gamma(u) + \tfrac{1}{2}\|u - x_0\|^2 \right\}$,   (23)

$a := \dfrac{\lambda + \sqrt{\lambda^2 + 4\lambda A}}{2}$,  $\tilde{x} := \dfrac{A}{A + a}\, y + \dfrac{a}{A + a}\, x$.   (24)

Then, for any proper closed convex function $\gamma : \mathcal{X} \to \bar{\mathbb{R}}$ minorizing $\phi$ we have

$\min_{u \in \mathcal{X}} \left\{ a\gamma(u) + A\Gamma(u) + \tfrac{1}{2}\|u - x_0\|^2 \right\} \ge \dfrac{(A + a)\,\theta}{\lambda}$,  where  $\theta := \min_{u \in \mathcal{X}} \left\{ \lambda\gamma(u) + \tfrac{1}{2}\|u - \tilde{x}\|^2 \right\}$.   (25)

Proof. Let an arbitrary $u \in \mathcal{X}$ be given. Define

$\tilde{u} := \dfrac{Ay + au}{A + a}$

and note that

$\tilde{u} - \tilde{x} = \dfrac{a}{A + a}\,(u - x)$.

Note also that the definition of $a$ in (24) implies

$\lambda = \dfrac{a^2}{A + a}$.

In view of (23), (22), the fact that $\gamma \le \phi$, the convexity of $\Gamma$ and $\gamma$, and the last three relations, we conclude that

$a\gamma(u) + A\Gamma(u) + \tfrac{1}{2}\|u - x_0\|^2 \;\ge\; a\gamma(u) + \tfrac{1}{2}\|u - x\|^2 + \min_{u' \in \mathcal{X}} \left\{ A\Gamma(u') + \tfrac{1}{2}\|u' - x_0\|^2 \right\}$
$\;\ge\; a\gamma(u) + \tfrac{1}{2}\|u - x\|^2 + A\gamma(y)$
$\;\ge\; (A + a)\,\gamma\!\left(\dfrac{Ay + au}{A + a}\right) + \tfrac{1}{2}\|u - x\|^2$
$\;=\; (A + a)\left[ \gamma(\tilde{u}) + \dfrac{A + a}{2a^2}\,\|\tilde{u} - \tilde{x}\|^2 \right]$
$\;\ge\; (A + a)\,\min_{u' \in \mathcal{X}} \left\{ \gamma(u') + \dfrac{1}{2\lambda}\|u' - \tilde{x}\|^2 \right\}$.

The result now follows from (25) and the fact that the above inequality holds for any $u \in \mathcal{X}$.

Lemma A.2. Let $\lambda > 0$, $\xi \in \mathcal{X}$ and a proper closed convex function $g : \mathcal{X} \to (-\infty, \infty]$ be given, and denote the optimal solution and optimal value of

$\min_{x \in \mathcal{X}} \left\{ \lambda g(x) + \tfrac{1}{2}\|x - \xi\|^2 \right\}$   (26)

by $\bar{x}$ and $\bar{g}$, respectively. Then, the affine function $\gamma : \mathcal{X} \to \mathbb{R}$ defined as

$\gamma(x) = g(\bar{x}) + \tfrac{1}{\lambda}\langle \xi - \bar{x}, x - \bar{x} \rangle$,  for all $x \in \mathcal{X}$,

has the property that $\gamma \le g$, and the optimal solution and optimal value of

$\min_{x \in \mathcal{X}} \left\{ \lambda\gamma(x) + \tfrac{1}{2}\|x - \xi\|^2 \right\}$   (27)

are $\bar{x}$ and $\bar{g}$, respectively.

Proof. The optimality condition for (26) implies that $0 \in \lambda\,\partial g(\bar{x}) + \bar{x} - \xi$, or equivalently $\tfrac{1}{\lambda}(\xi - \bar{x}) \in \partial g(\bar{x})$, and hence that $\gamma \le g$, by the definition of $\gamma$ and of the subgradient of a function. Moreover, $\bar{x}$ clearly satisfies the optimality condition of (27). Hence, the last claim of the lemma follows.

The main result of this appendix is as follows.

Lemma A.3. Let $A \ge 0$, $0 < \lambda \le 1/L$, $x_0, y \in \mathcal{X}$ and an affine function $\Gamma : \mathcal{X} \to \mathbb{R}$ be given, and assume that the triple $(y, A, \Gamma)$ satisfies (22). Compute $x$ as in (23), $a$ and $\tilde{x}$ as in (24), $\tilde{x}_\Omega := \Pi_\Omega(\tilde{x})$, $A_+ := A + a$ and

$y_+ := (I + \lambda\,\partial h)^{-1}\big(\tilde{x} - \lambda \nabla f(\tilde{x}_\Omega)\big) = \arg\min_{u \in \mathcal{X}} \left\{ \lambda h(u) + \tfrac{1}{2}\|u - \tilde{x} + \lambda \nabla f(\tilde{x}_\Omega)\|^2 \right\}$,

and define the affine functions $\gamma, \Gamma_+ : \mathcal{X} \to \mathbb{R}$ as

$\gamma(u) := f(\tilde{x}_\Omega) + \langle \nabla f(\tilde{x}_\Omega), y_+ - \tilde{x}_\Omega \rangle + h(y_+) + \tfrac{1}{\lambda}\langle \tilde{x} - y_+, u - y_+ \rangle$  for all $u \in \mathcal{X}$,

$\Gamma_+ := \dfrac{A}{A + a}\,\Gamma + \dfrac{a}{A + a}\,\gamma$.

Then, the triple $(y_+, A_+, \Gamma_+)$ satisfies

$A_+\phi(y_+) \le \min_{u \in \mathcal{X}} \left\{ A_+\Gamma_+(u) + \tfrac{1}{2}\|u - x_0\|^2 \right\}$,  $A_+\Gamma_+ \le A_+\phi$,   (28)

$\sqrt{A_+} \ge \sqrt{A} + \tfrac{1}{2}\sqrt{\lambda}$.   (29)

Moreover,

$x_0 - A_+\nabla\Gamma_+ = x - \dfrac{a}{\lambda}(\tilde{x} - y_+)$.   (30)

Proof. We first claim that $\gamma \le \phi$ and

$\phi(y_+) \le \min_{u \in \mathcal{X}} \left\{ \gamma(u) + \tfrac{1}{2\lambda}\|u - \tilde{x}\|^2 \right\}$.   (31)

Define the function $g : \mathcal{X} \to \bar{\mathbb{R}}$ as

$g(u) = f(\tilde{x}_\Omega) + \langle \nabla f(\tilde{x}_\Omega), u - \tilde{x}_\Omega \rangle + h(u)$  for all $u \in \mathcal{X}$,

and note that assumption C.3 implies

$g(u) \le \phi(u) \le g(u) + \tfrac{L}{2}\|u - \tilde{x}_\Omega\|^2 \le g(u) + \tfrac{1}{2\lambda}\|u - \tilde{x}_\Omega\|^2$  for all $u \in \mathcal{X}$.   (32)

Then, in view of the definition of $y_+$ (which, up to an additive constant in the objective, minimizes $\lambda g(\cdot) + \tfrac{1}{2}\|\cdot - \tilde{x}\|^2$ over $\mathcal{X}$), relation (32) with $u = y_+$, and the fact that $\Pi_\Omega(y_+) = y_+$ (since $y_+ \in \mathrm{dom}\, h \subseteq \Omega$) and hence $\|y_+ - \tilde{x}_\Omega\| \le \|y_+ - \tilde{x}\|$ by the non-expansiveness property of $\Pi_\Omega$, we have

$\lambda\phi(y_+) \le \lambda g(y_+) + \tfrac{1}{2}\|y_+ - \tilde{x}_\Omega\|^2 \le \lambda g(y_+) + \tfrac{1}{2}\|y_+ - \tilde{x}\|^2 = \min_{u \in \mathcal{X}} \left\{ \lambda g(u) + \tfrac{1}{2}\|u - \tilde{x}\|^2 \right\}$.

The claim follows from the above relation, the definitions of $\gamma$ and $g$, and Lemma A.2 with $\xi = \tilde{x}$, noting that in this case $\bar{x} = y_+$.

Using the first inequality of (22), Lemma A.1 and (31), we conclude that

$\min_{u \in \mathcal{X}} \left\{ a\gamma(u) + A\Gamma(u) + \tfrac{1}{2}\|u - x_0\|^2 \right\} \;\ge\; \dfrac{A + a}{\lambda}\,\min_{u \in \mathcal{X}} \left\{ \lambda\gamma(u) + \tfrac{1}{2}\|u - \tilde{x}\|^2 \right\} \;\ge\; (A + a)\,\phi(y_+)$.

Hence, (28) follows from the previous relation, the definitions of $A_+$ and $\Gamma_+$, the second inequality in (22), and the previous claim. Relation (29) can be shown by noting that the definition of $a$ implies $a \ge \tfrac{\lambda}{2} + \sqrt{\lambda A}$, which, together with the definition of $A_+$, implies that

$A_+ = A + a \;\ge\; A + \sqrt{\lambda A} + \dfrac{\lambda}{4} \;=\; \left( \sqrt{A} + \dfrac{\sqrt{\lambda}}{2} \right)^{\!2}$.

Finally, (30) follows immediately from the definitions of $\gamma$, $A_+$ and $\Gamma_+$, and the fact that $x = x_0 - A\nabla\Gamma$.

B Proof of Proposition 2.4

First, note that the equivalence between a) and b) of Lemma 2.2 with $\Gamma = \sum_{i=1}^m \alpha_i\gamma_i$ and $\chi = \left(\sum_{i=1}^m \alpha_i\right)\phi(y)$ implies that the constraint in problem (16) is equivalent to

$\tfrac{1}{2}\left\| \sum_{i=1}^m \alpha_i \nabla\gamma_i + y - x_0 \right\|^2 + \sum_{i=1}^m \alpha_i\,[\phi(y) - \gamma_i(y)] \;\le\; \tfrac{1}{2}\|y - x_0\|^2$.   (33)

Now, assume that b) holds. Then, in view of (17), the point $\alpha(t) := (t\bar\alpha_1, \ldots, t\bar\alpha_m)$ clearly satisfies (33), and hence is feasible for (16), for every $t > 0$. Hence, for a fixed $x' \in \mathcal{X}$, this conclusion implies that

$\left(\sum_{i=1}^m t\bar\alpha_i\right)\phi(y) \;\le\; \sum_{i=1}^m (t\bar\alpha_i)\,\gamma_i(x') + \tfrac{1}{2}\|x' - x_0\|^2 \;\le\; t\left(\sum_{i=1}^m \bar\alpha_i\right)\phi(x') + \tfrac{1}{2}\|x' - x_0\|^2$,

where the last inequality follows from the assumption that $\gamma_i \le \phi$ and the non-negativity of $t\bar\alpha_1, \ldots, t\bar\alpha_m$. Dividing this expression by $t\left(\sum_{i=1}^m \bar\alpha_i\right) > 0$ and letting $t \to \infty$, we conclude that $\phi(y) \le \phi(x')$, and hence that $y \in X^*$. Also, the objective function of (16) at the point $\alpha(t)$ converges to infinity as $t \to \infty$, showing that b) implies a).

Now assume that a) holds. Then, there exists a sequence $\{\alpha^k = (\alpha_1^k, \ldots, \alpha_m^k) : k \ge 1\}$ of feasible solutions of problem (16) along which its objective function converges to infinity. Consider the sequence $\{\bar\alpha^k := \alpha^k/(e^T\alpha^k) : k \ge 1\}$, where $e$ is the vector of all ones. Clearly, $\{\bar\alpha^k\}$ has an accumulation point $\bar\alpha = (\bar\alpha_1, \ldots, \bar\alpha_m)$, where $\bar\alpha \ge 0$ and $\bar\alpha \ne 0$. Moreover, using the fact that $\alpha^k$ satisfies (33), that $e^T\alpha^k \to \infty$, and that $\gamma_i(y) \le \phi(y)$ for every $i$, we easily see that (17) holds.
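For the reader's convenience, the following short calculation (ours, not displayed in the original) spells out how the quadratic form in (33) arises from Lemma 2.2(b) by completing the square. Here $g := \sum_{i=1}^m \alpha_i \nabla\gamma_i$. Lemma 2.2(b) with $\Gamma = \sum_i \alpha_i\gamma_i$ and $\chi = \left(\sum_i \alpha_i\right)\phi(y)$ reads

$\left(\sum_i \alpha_i\right)\phi(y) \;\le\; \sum_i \alpha_i\gamma_i(x_0) - \tfrac{1}{2}\|g\|^2$.

Writing $\gamma_i(x_0) = \gamma_i(y) + \langle \nabla\gamma_i, x_0 - y\rangle$ and rearranging gives

$\sum_i \alpha_i\,[\phi(y) - \gamma_i(y)] + \langle g,\, y - x_0\rangle + \tfrac{1}{2}\|g\|^2 \;\le\; 0$,

and adding $\tfrac{1}{2}\|y - x_0\|^2$ to both sides completes the square:

$\tfrac{1}{2}\|g + y - x_0\|^2 + \sum_i \alpha_i\,[\phi(y) - \gamma_i(y)] \;\le\; \tfrac{1}{2}\|y - x_0\|^2$,

which is exactly (33).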

18 C Tables Table : Time comparison of the methods on CQP instances. Problem Time in seconds (accuracy less that 6 ) Instance n AA-R FISTA GKR-2 GKR-3 QP n5 p QP n5 p QP n5 p QP n5 p QP n p QP n p QP n p *(.2-6).3 QP n p QP n2 p QP n2 p QP n2 p QP n2 p QP n5 p QP n5 p QP n5 p QP n5 p *(.3-6) 6. QP n p QP n p QP n p QP n p *(2. -6) 8.5 QP n2 p *(.6-6).8 QP n2 p QP n2 p QP n2 p QP n5 p QP n5 p QP n5 p QP n5 p QP n7 p QP n7 p QP n7 p QP n7 p QP n9 p *(2.2-6).7 QP n9 p QP n9 p QP n9 p QP nk p *(4. -6) 4.3 QP nk p QP nk p QP nk p QP n2k p *(3. -6) 3.9 QP n2k p QP n2k p QP n2k p QP n5k p *(3.2-6) 2 QP n5k p QP n5k p QP n5k p QP n7k p *(4.4-6) 65.2 QP n7k p QP n7k p *(.2-6) QP n7k p QP n9k p *(5.5-6).6 QP n9k p QP n9k p QP n9k p QP nk p *(5.5-6) 76. QP nk p QP nk p QP nk p

19 Table 2: Number of iterations comparison of the methods on CQP instances. Problem # of iterations (accuracy less that 6 ) Instance n AA-R FISTA GKR-2 GKR-3 QP n5 p QP n5 p QP n5 p QP n5 p QP n p QP n p QP n p *(.2-6) 4 QP n p QP n2 p QP n2 p QP n2 p QP n2 p QP n5 p QP n5 p QP n5 p QP n5 p *(.3-6) 398 QP n p QP n p QP n p QP n p *(2. -6) 86 QP n2 p *(.6-6) 62 QP n2 p QP n2 p QP n2 p QP n5 p QP n5 p QP n5 p QP n5 p QP n7 p QP n7 p QP n7 p QP n7 p QP n9 p *(2.2-6) 62 QP n9 p QP n9 p QP n9 p QP nk p *(4. -6) 76 QP nk p QP nk p QP nk p QP n2k p *(3. -6) 7 QP n2k p QP n2k p QP n2k p QP n5k p *(3.2-6) 79 QP n5k p QP n5k p QP n5k p QP n7k p *(4.4-6) 77 QP n7k p QP n7k p *(.2-6) 388 QP n7k p QP n9k p *(5.5-6) 78 QP n9k p QP n9k p QP n9k p QP nk p *(5.5-6) 8 QP nk p QP nk p QP nk p

20 Table 3: Time comparison of the methods on SDLS instances. Problem Time in seconds (accuracy less that 6 ) Instance n AA-R FISTA GKR-2 GKR-3 BIQ n2 m *(.4-6) 46.8 BIQ n3 m *(.4-6) 47.6 BIQ n4 m *(.4-6) 47.8 BIQ n5 m *(.4-6) 44.4 BIQ n6 m *(.4-6) 42. BIQ n7 m *(.4-6) 52.8 BIQ n8 m *(.4-6) 4.4 BIQ n9 m *(.4-6) 48.7 BIQ n m *(.4-6) 56.7 BIQ n2 m *(.4-6) 44.6 BIQ n5 m *(.4-6) 64.5 BIQ n2 m *(.4-6) 57.9 BIQ n25 m *(.4-6) 44.8 BIQ n5 m *(.4-6) 76 FAP n52 m FAP n6 m FAP n65 m FAP n8 m FAP n84 m FAP n93 m FAP n98 m FAP n2 m FAP n74 m FAP n83 m FAP n252 m FAP n369 m FAP n28 m FAP n4 m QAP n44 m *(. -6) *(2.5-6) 544. QAP n96 m *(3.3-6) 299.8*(2.3-6) QAP n225 m QAP n256 m *(.7-6) *(.2-6) QAP n289 m *(4. -6) QAP n324 m *(.9-6) QAP n4 m *(2.4-6) 557.4*(.3-6) QAP n44 m *(.8-6) QAP n484 m QAP n625 m *(2.8-6) 289 QAP n676 m *(2.7-6) QAP n729 m *(.8-6) QAP n784 m *(2. -6) QAP n9 m QAP n225 m *(.3-6) 59.9 QAP n6 m *(2.3-6) 75.*(. -6) RAND n3 mk p RAND n3 m2k p *(. -6) RAND n3 m25k p *(.5-6) *(. -6) RAND n4 m5k p RAND n4 m3k p RAND n4 m4k p *(.5-6) *(.3-6) RAND n5 m2k p RAND n5 m3k p RAND n5 m4k p RAND n5 m5k p *(. -6) 96.9*(.3-6) RAND n6 m2k p RAND n6 m4k p RAND n6 m5k p RAND n6 m6k p *(.6-6) *(.2-6) RAND n7 m5k p RAND n7 m7k p RAND n7 m9k p *(.3-6) *(. -6) RAND n8 mk p RAND n8 mk p *(.2-6) *(. -6) RAND n8 m7k p RAND n9 mk p RAND n9 m4k p *(.3-6) RAND nk mk p RAND nk m5k p *(.2-6)

21 Table 4: Number of iterations comparison of the methods on SDLS instances. Problem # of iterations (accuracy less that 6 ) Instance n AA-R FISTA GKR-2 GKR-3 BIQ n2 m *(.4-6) 285 BIQ n3 m *(.4-6) 285 BIQ n4 m *(.4-6) 285 BIQ n5 m *(.4-6) 285 BIQ n6 m *(.4-6) 285 BIQ n7 m *(.4-6) 285 BIQ n8 m *(.4-6) 285 BIQ n9 m *(.4-6) 285 BIQ n m *(.4-6) 285 BIQ n2 m *(.4-6) 285 BIQ n5 m *(.4-6) 285 BIQ n2 m *(.4-6) 285 BIQ n25 m *(.4-6) 285 BIQ n5 m *(.4-6) 285 FAP n52 m FAP n6 m FAP n65 m FAP n8 m FAP n84 m FAP n93 m FAP n98 m FAP n2 m FAP n74 m FAP n83 m FAP n252 m FAP n369 m FAP n28 m FAP n4 m QAP n44 m *(. -6) 233 2*(2.5-6) 688 QAP n96 m *(3.3-6) 2*(2.3-6) QAP n225 m QAP n256 m *(.7-6) 2*(.2-6) QAP n289 m *(4. -6) 357 QAP n324 m *(.9-6) 939 QAP n4 m *(2.4-6) 2*(.3-6) QAP n44 m *(.8-6) QAP n484 m QAP n625 m *(2.8-6) 387 QAP n676 m *(2.7-6) 342 QAP n729 m *(.8-6) 646 QAP n784 m *(2. -6) 587 QAP n9 m QAP n225 m *(.3-6) 33 QAP n6 m *(2.3-6) 2*(. -6) RAND n3 mk p RAND n3 m2k p *(. -6) 2 RAND n3 m25k p *(.5-6) 2*(. -6) RAND n4 m5k p RAND n4 m3k p RAND n4 m4k p *(.5-6) 2*(.3-6) RAND n5 m2k p RAND n5 m3k p RAND n5 m4k p RAND n5 m5k p *(. -6) 2*(.3-6) RAND n6 m2k p RAND n6 m4k p RAND n6 m5k p RAND n6 m6k p *(.6-6) 2*(.2-6) RAND n7 m5k p RAND n7 m7k p RAND n7 m9k p *(.3-6) 2*(. -6) RAND n8 mk p RAND n8 mk p *(.2-6) 2*(. -6) RAND n8 m7k p RAND n9 mk p RAND n9 m4k p *(.3-6) 72 RAND nk mk p RAND nk m5k p *(.2-6) 972 2

22 Table 5: Time comparison of the methods on NNLS instances. Problem Time in seconds (accuracy less that 6 ) Instance n AA-R FISTA GKR-2 GKR-3 NNLS n2 mk *(2.3-5) *(3.8-5) 2.9*(3.3-5) NNLS n2 mk *(.2-5) *(.9-6) 6.8*(.7-5) NNLS n2 m2k 2 4.7*(2.2-6) *(3.2-6) 59.*(4.8-6) NNLS n2 m2k NNLS n2 m *(6.7-6) 2 NNLS n2 m4k *(5. -6).3.9*(.3-5) 87.5*(3.2-4) NNLS n2 m4k *(.4-4).8 33.*(. -5) 4.5*(3.3-4) NNLS n2 m *(.7-5).8 7.2*(6.2-6) 2.*(5.6-5) NNLS n2 m6k *(.7-5).6 9.6*(.3-5) 2.8*(5. -6) NNLS n2 m *(7.7-5) 2.*(8. -6) 23.7*(5.4-5) NNLS n2 m8k *(. -6) 5.*(3. -4) *(.6-5) 3.7*(4. -4) NNLS n4 mk *(3. -4) *(. -4) 59.7*(.9-4) NNLS n4 mk *(2. -6) *(7. -6) NNLS n4 m2k *(3.7-4) *(.5-5) 5.6*(3. -4) NNLS n4 m2k *(.4-4).5 4.2*(6.5-6) 4.*(. -4) NNLS n4 m4k 4 6.7*(. -6) *(7.9-5) *(2. -5).7*(.9-4) NNLS n4 m4k *(7.9-6) *(.9-6) 44.3*(5.5-6) NNLS n4 m6k 4 8.7*(. -6).2 6.6*(4.5-5) *(2. -6) 35.2*(4. -5) NNLS n4 m *(.2-6) *(4.6-6) NNLS n4 m8k *(2.6-4) 8.4*(. -6) 43.2*(.2-4) 6.*(2.9-4) NNLS n6 mk *(8.4-5) 7.2*(.9-6) 47.9*(6.8-6) 45.5*(5.6-5) NNLS n6 m2k *(2.2-6) *(.5-6) 22.8*(.8-5) 93.*(4.9-6) NNLS n6 m2k *(6. -6) *(2.6-6) 28.7*(5. -5) NNLS n6 m4k *(4.6-6) 65.*(2.6-6) 88.2*(5.8-5) *(4.5-5) 263.6*(3.3-5) NNLS n6 m4k *(6.6-6).3 4.*(2. -6) 3.*(3.5-5) NNLS n6 m6k *(7.5-5) *(4.6-5) 55.5*(6. -5) NNLS n6 m8k *(3. -5) *(2.8-5) 5.2*(.8-5) NNLS n8 mk *(2.2-5) *(2.4-6).*(.3-5) NNLS n8 m2k *(6.6-5) *(.8-5) 9.9*(2. -4) NNLS n8 m2k *(.2-4).7 29.*(4.2-5) 28.5*(7.8-5) NNLS n8 m4k *(.4-4) 72.*(5.6-6) 36.*(3.4-5) 222.6*(3.9-4) NNLS n8 m4k *(8.3-5) *(5.3-6) 63.*(7. -5) NNLS n8 m6k *(.2-5) *(4. -6) 44.3*(.4-5) NNLS n8 m8k *(3. -6) 22.4*(. -6) *(.3-5) NNLS nk mk 23.3*(.8-6) *(4.7-5) *(4. -5) 97.*(3.9-5) NNLS nk m2k *(7.5-5) 5.*(. -6) 244.5*(6.3-5) 66.8*(7. -5) NNLS nk m2k *(2. -4) *(.8-5) 26.2*(.3-4) NNLS nk m4k 3.6*(3.5-6) 95.3*(. -6) 43.5*(3. -4) 37.5*(.8-6) 33.9*(3. -5) 276.9*(4. -4) NNLS nk m4k *(2.5-5) *(4.7-6) 35.5*(3.5-5) NNLS nk m6k *(. -5) *(4.6-6) 5.9*(5.3-6) NNLS nk m8k *(5.2-5) *(.3-5) 74.2*(.4-4) NNLS n2k mk *(5. -5) *(7.8-6) 33.3*(. -4) NNLS n2k m2k *(.2-6) *(2.3-5) *(2.3-5) 236.4*(2.4-5) NNLS n2k m4k *(5.5-6) *(2.2-5) *(.2-5) 8.8*(.2-5) NNLS n2k m4k *(3. -5) *(2.4-5) 9.5*(2.6-5) NNLS n2k m6k *(8.4-5) *(3.7-5) 3.4*(6.2-5) NNLS n2k m8k *(.3-5) 37.3*(.6-6) 27.8*(8.2-6) 82.6*(. -5) NNLS n4k mk *(.6-6) *(4.9-5) *(4. -5) 35.*(3.6-5) NNLS n4k m2k *(.2-6) 8.8*(8. -5) 85.9*(.3-6) 56.3*(4.7-5) 38.3*(5.8-5) NNLS n4k m4k *(.6-6) 452.2*(.6-6) 435.9*(8.3-5) 533.3*(2.9-6) 558.5*(.3-5) 22.4*(.4-4) NNLS n4k m8k *(8.8-5) *(2. -5) 45.3*(3.8-5) NNLS n6k m2k 6 3.9*(2.5-6) *(7.6-5) 267.9*(.2-6) 95.7*(9.9-6) 64.*(6.5-5) NNLS n6k m4k *(3. -6) *(.2-4) 62.*(.2-6) *(5.5-5) 252.3*(9.2-5) NNLS n8k m2k *(6.6-5) *(3.7-5) 6.5*(7.3-5) NNLS n8k m4k *(2.3-6) 68.5*(. -5) 637.7*(.4-6) 329.2*(6.2-6) *(2. -5) NNLS nk m2k 469.2*(2. -6) *(. -4) 478.3*(3. -6) 623.3*(5.9-5) 4.5*(8.2-5) NNLS nk m4k 78.7*(2.6-6) 72.7*(2.4-6) 39.4*(7. -5) 785.6*(2.8-6) 323.6*(2.6-5) 264.*(.3-4) NNLS n2k m4k *(.2-6) 294.*(.8-6) 792.2*(7.5-5) 244.2*(3.4-6) 667.7*(3.9-5) *(6.9-5) 22

23 Table 6: Number of iterations comparison of the methods on NNLS instances. Problem # of iterations (accuracy less that 6 ) Instance n AA-R FISTA GKR-2 GKR-3 NNLS n2 mk *(2.3-5) 553 2*(3.8-5) 2*(3.3-5) NNLS n2 mk *(.2-5) 689 2*(.9-6) 2*(.7-5) NNLS n2 m2k 2 2*(2.2-6) *(3.2-6) 2*(4.8-6) NNLS n2 m2k NNLS n2 m *(6.7-6) 36 NNLS n2 m4k *(5. -6) 824 2*(.3-5) 2*(3.2-4) NNLS n2 m4k *(.4-4) 47 2*(. -5) 2*(3.3-4) NNLS n2 m *(.7-5) 8 2*(6.2-6) 2*(5.6-5) NNLS n2 m6k *(.7-5) 323 2*(.3-5) 2*(5. -6) NNLS n2 m *(7.7-5) 88 2*(8. -6) 2*(5.4-5) NNLS n2 m8k *(. -6) 2*(3. -4) 32 2*(.6-5) 2*(4. -4) NNLS n4 mk *(3. -4) 943 2*(. -4) 2*(.9-4) NNLS n4 mk *(2. -6) *(7. -6) NNLS n4 m2k *(3.7-4) 866 2*(.5-5) 2*(3. -4) NNLS n4 m2k *(.4-4) 49 2*(6.5-6) 2*(. -4) NNLS n4 m4k 4 2*(. -6) 39 2*(7.9-5) 556 2*(2. -5) 2*(.9-4) NNLS n4 m4k *(7.9-6) 44 2*(.9-6) 2*(5.5-6) NNLS n4 m6k 4 2*(. -6) 387 2*(4.5-5) 673 2*(2. -6) 2*(4. -5) NNLS n4 m *(.2-6) *(4.6-6) NNLS n4 m8k *(2.6-4) 2*(. -6) 2*(.2-4) 2*(2.9-4) NNLS n6 mk *(8.4-5) 2*(.9-6) 2*(6.8-6) 2*(5.6-5) NNLS n6 m2k 6 2*(2.2-6) *(.5-6) 2*(.8-5) 2*(4.9-6) NNLS n6 m2k *(6. -6) 78 2*(2.6-6) 2*(5. -5) NNLS n6 m4k 6 2*(4.6-6) 2*(2.6-6) 2*(5.8-5) 392 2*(4.5-5) 2*(3.3-5) NNLS n6 m4k *(6.6-6) 466 2*(2. -6) 2*(3.5-5) NNLS n6 m6k *(7.5-5) 46 2*(4.6-5) 2*(6. -5) NNLS n6 m8k *(3. -5) 486 2*(2.8-5) 2*(.8-5) NNLS n8 mk *(2.2-5) 395 2*(2.4-6) 2*(.3-5) NNLS n8 m2k *(6.6-5) 626 2*(.8-5) 2*(2. -4) NNLS n8 m2k *(.2-4) 83 2*(4.2-5) 2*(7.8-5) NNLS n8 m4k *(.4-4) 2*(5.6-6) 2*(3.4-5) 2*(3.9-4) NNLS n8 m4k *(8.3-5) 33 2*(5.3-6) 2*(7. -5) NNLS n8 m6k *(.2-5) 499 2*(4. -6) 2*(.4-5) NNLS n8 m8k *(3. -6) 2*(. -6) 76 2*(.3-5) NNLS nk mk 2*(.8-6) 352 2*(4.7-5) 372 2*(4. -5) 2*(3.9-5) NNLS nk m2k *(7.5-5) 2*(. -6) 2*(6.3-5) 2*(7. -5) NNLS nk m2k *(2. -4) 958 2*(.8-5) 2*(.3-4) NNLS nk m4k 2*(3.5-6) 2*(. -6) 2*(3. -4) 2*(.8-6) 2*(3. -5) 2*(4. -4) NNLS nk m4k *(2.5-5) 533 2*(4.7-6) 2*(3.5-5) NNLS nk m6k *(. -5) 356 2*(4.6-6) 2*(5.3-6) NNLS nk m8k *(5.2-5) 87 2*(.3-5) 2*(.4-4) NNLS n2k mk *(5. -5) 889 2*(7.8-6) 2*(. -4) NNLS n2k m2k 2 2*(.2-6) 44 2*(2.3-5) 52 2*(2.3-5) 2*(2.4-5) NNLS n2k m4k 2 2*(5.5-6) 286 2*(2.2-5) 449 2*(.2-5) 2*(.2-5) NNLS n2k m4k *(3. -5) 897 2*(2.4-5) 2*(2.6-5) NNLS n2k m6k *(8.4-5) 93 2*(3.7-5) 2*(6.2-5) NNLS n2k m8k *(.3-5) 2*(.6-6) 2*(8.2-6) 2*(. -5) NNLS n4k mk 4 2*(.6-6) 543 2*(4.9-5) 826 2*(4. -5) 2*(3.6-5) NNLS n4k m2k *(.2-6) 2*(8. -5) 2*(.3-6) 2*(4.7-5) 2*(5.8-5) NNLS n4k m4k 4 2*(.6-6) 2*(.6-6) 2*(8.3-5) 2*(2.9-6) 2*(.3-5) 2*(.4-4) NNLS n4k m8k *(8.8-5) 929 2*(2. -5) 2*(3.8-5) NNLS n6k m2k 6 2*(2.5-6) 49 2*(7.6-5) 2*(.2-6) 2*(9.9-6) 2*(6.5-5) NNLS n6k m4k 6 2*(3. -6) 695 2*(.2-4) 2*(.2-6) 2*(5.5-5) 2*(9.2-5) NNLS n8k m2k *(6.6-5) 953 2*(3.7-5) 2*(7.3-5) NNLS n8k m4k *(2.3-6) 2*(. -5) 2*(.4-6) 2*(6.2-6) 2*(2. -5) NNLS nk m2k 2*(2. -6) 744 2*(. -4) 2*(3. -6) 2*(5.9-5) 2*(8.2-5) NNLS nk m4k 2*(2.6-6) 2*(2.4-6) 2*(7. -5) 2*(2.8-6) 2*(2.6-5) 2*(.3-4) NNLS n2k m4k 2 2*(.2-6) 2*(.8-6) 2*(7.5-5) 2*(3.4-6) 2*(3.9-5) 2*(6.9-5) 23


SIAM Conference on Imaging Science, Bologna, Italy, Adaptive FISTA. Peter Ochs Saarland University SIAM Conference on Imaging Science, Bologna, Italy, 2018 Adaptive FISTA Peter Ochs Saarland University 07.06.2018 joint work with Thomas Pock, TU Graz, Austria c 2018 Peter Ochs Adaptive FISTA 1 / 16 Some

More information

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London www.doc.ic.ac.uk/ pp500 p.parpas@imperial.ac.uk jointly with D.V.

More information

More First-Order Optimization Algorithms

More First-Order Optimization Algorithms More First-Order Optimization Algorithms Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 3, 8, 3 The SDM

More information

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING YANGYANG XU Abstract. Motivated by big data applications, first-order methods have been extremely

More information

A Sparsity Preserving Stochastic Gradient Method for Composite Optimization

A Sparsity Preserving Stochastic Gradient Method for Composite Optimization A Sparsity Preserving Stochastic Gradient Method for Composite Optimization Qihang Lin Xi Chen Javier Peña April 3, 11 Abstract We propose new stochastic gradient algorithms for solving convex composite

More information

On well definedness of the Central Path

On well definedness of the Central Path On well definedness of the Central Path L.M.Graña Drummond B. F. Svaiter IMPA-Instituto de Matemática Pura e Aplicada Estrada Dona Castorina 110, Jardim Botânico, Rio de Janeiro-RJ CEP 22460-320 Brasil

More information

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen

More information

Coordinate Update Algorithm Short Course Proximal Operators and Algorithms

Coordinate Update Algorithm Short Course Proximal Operators and Algorithms Coordinate Update Algorithm Short Course Proximal Operators and Algorithms Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 36 Why proximal? Newton s method: for C 2 -smooth, unconstrained problems allow

More information

Gradient methods for minimizing composite functions

Gradient methods for minimizing composite functions Gradient methods for minimizing composite functions Yu. Nesterov May 00 Abstract In this paper we analyze several new methods for solving optimization problems with the objective function formed as a sum

More information

An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP

An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP Kaifeng Jiang, Defeng Sun, and Kim-Chuan Toh Abstract The accelerated proximal gradient (APG) method, first

More information

Accelerated Proximal Gradient Methods for Convex Optimization

Accelerated Proximal Gradient Methods for Convex Optimization Accelerated Proximal Gradient Methods for Convex Optimization Paul Tseng Mathematics, University of Washington Seattle MOPTA, University of Guelph August 18, 2008 ACCELERATED PROXIMAL GRADIENT METHODS

More information

On convergence rate of the Douglas-Rachford operator splitting method

On convergence rate of the Douglas-Rachford operator splitting method On convergence rate of the Douglas-Rachford operator splitting method Bingsheng He and Xiaoming Yuan 2 Abstract. This note provides a simple proof on a O(/k) convergence rate for the Douglas- Rachford

More information

On the acceleration of the double smoothing technique for unconstrained convex optimization problems

On the acceleration of the double smoothing technique for unconstrained convex optimization problems On the acceleration of the double smoothing technique for unconstrained convex optimization problems Radu Ioan Boţ Christopher Hendrich October 10, 01 Abstract. In this article we investigate the possibilities

More information

Lecture 8 Plus properties, merit functions and gap functions. September 28, 2008

Lecture 8 Plus properties, merit functions and gap functions. September 28, 2008 Lecture 8 Plus properties, merit functions and gap functions September 28, 2008 Outline Plus-properties and F-uniqueness Equation reformulations of VI/CPs Merit functions Gap merit functions FP-I book:

More information

1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method

1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method L. Vandenberghe EE236C (Spring 2016) 1. Gradient method gradient method, first-order methods quadratic bounds on convex functions analysis of gradient method 1-1 Approximate course outline First-order

More information

Convex Optimization Lecture 16

Convex Optimization Lecture 16 Convex Optimization Lecture 16 Today: Projected Gradient Descent Conditional Gradient Descent Stochastic Gradient Descent Random Coordinate Descent Recall: Gradient Descent (Steepest Descent w.r.t Euclidean

More information

Non-negative Matrix Factorization via accelerated Projected Gradient Descent

Non-negative Matrix Factorization via accelerated Projected Gradient Descent Non-negative Matrix Factorization via accelerated Projected Gradient Descent Andersen Ang Mathématique et recherche opérationnelle UMONS, Belgium Email: manshun.ang@umons.ac.be Homepage: angms.science

More information

Spectral gradient projection method for solving nonlinear monotone equations

Spectral gradient projection method for solving nonlinear monotone equations Journal of Computational and Applied Mathematics 196 (2006) 478 484 www.elsevier.com/locate/cam Spectral gradient projection method for solving nonlinear monotone equations Li Zhang, Weijun Zhou Department

More information

Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming

Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming Zhaosong Lu Lin Xiao June 25, 2013 Abstract In this paper we propose a randomized block coordinate non-monotone

More information

Douglas-Rachford splitting for nonconvex feasibility problems

Douglas-Rachford splitting for nonconvex feasibility problems Douglas-Rachford splitting for nonconvex feasibility problems Guoyin Li Ting Kei Pong Jan 3, 015 Abstract We adapt the Douglas-Rachford DR) splitting method to solve nonconvex feasibility problems by studying

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP

An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP Kaifeng Jiang, Defeng Sun, and Kim-Chuan Toh March 3, 2012 Abstract The accelerated proximal gradient (APG)

More information

An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems

An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems Kim-Chuan Toh Sangwoon Yun March 27, 2009; Revised, Nov 11, 2009 Abstract The affine rank minimization

More information

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Roger Behling a, Clovis Gonzaga b and Gabriel Haeser c March 21, 2013 a Department

More information

Merit functions and error bounds for generalized variational inequalities

Merit functions and error bounds for generalized variational inequalities J. Math. Anal. Appl. 287 2003) 405 414 www.elsevier.com/locate/jmaa Merit functions and error bounds for generalized variational inequalities M.V. Solodov 1 Instituto de Matemática Pura e Aplicada, Estrada

More information

One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties

One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties Fedor S. Stonyakin 1 and Alexander A. Titov 1 V. I. Vernadsky Crimean Federal University, Simferopol,

More information

You should be able to...

You should be able to... Lecture Outline Gradient Projection Algorithm Constant Step Length, Varying Step Length, Diminishing Step Length Complexity Issues Gradient Projection With Exploration Projection Solving QPs: active set

More information

Dual and primal-dual methods

Dual and primal-dual methods ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method

More information

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some

More information

Algorithms for Nonsmooth Optimization

Algorithms for Nonsmooth Optimization Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization

More information

FINE TUNING NESTEROV S STEEPEST DESCENT ALGORITHM FOR DIFFERENTIABLE CONVEX PROGRAMMING. 1. Introduction. We study the nonlinear programming problem

FINE TUNING NESTEROV S STEEPEST DESCENT ALGORITHM FOR DIFFERENTIABLE CONVEX PROGRAMMING. 1. Introduction. We study the nonlinear programming problem FINE TUNING NESTEROV S STEEPEST DESCENT ALGORITHM FOR DIFFERENTIABLE CONVEX PROGRAMMING CLÓVIS C. GONZAGA AND ELIZABETH W. KARAS Abstract. We modify the first order algorithm for convex programming described

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of

More information

Alternating Direction Augmented Lagrangian Algorithms for Convex Optimization

Alternating Direction Augmented Lagrangian Algorithms for Convex Optimization Alternating Direction Augmented Lagrangian Algorithms for Convex Optimization Joint work with Bo Huang, Shiqian Ma, Tony Qin, Katya Scheinberg, Zaiwen Wen and Wotao Yin Department of IEOR, Columbia University

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

Dual Proximal Gradient Method

Dual Proximal Gradient Method Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method

More information

A globally convergent Levenberg Marquardt method for equality-constrained optimization

A globally convergent Levenberg Marquardt method for equality-constrained optimization Computational Optimization and Applications manuscript No. (will be inserted by the editor) A globally convergent Levenberg Marquardt method for equality-constrained optimization A. F. Izmailov M. V. Solodov

More information

Primal-Dual First-Order Methods for a Class of Cone Programming

Primal-Dual First-Order Methods for a Class of Cone Programming Primal-Dual First-Order Methods for a Class of Cone Programming Zhaosong Lu March 9, 2011 Abstract In this paper we study primal-dual first-order methods for a class of cone programming problems. In particular,

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions

More information

Complexity bounds for primal-dual methods minimizing the model of objective function

Complexity bounds for primal-dual methods minimizing the model of objective function Complexity bounds for primal-dual methods minimizing the model of objective function Yu. Nesterov July 4, 06 Abstract We provide Frank-Wolfe ( Conditional Gradients method with a convergence analysis allowing

More information

arxiv: v1 [math.fa] 16 Jun 2011

arxiv: v1 [math.fa] 16 Jun 2011 arxiv:1106.3342v1 [math.fa] 16 Jun 2011 Gauge functions for convex cones B. F. Svaiter August 20, 2018 Abstract We analyze a class of sublinear functionals which characterize the interior and the exterior

More information

This can be 2 lectures! still need: Examples: non-convex problems applications for matrix factorization

This can be 2 lectures! still need: Examples: non-convex problems applications for matrix factorization This can be 2 lectures! still need: Examples: non-convex problems applications for matrix factorization x = prox_f(x)+prox_{f^*}(x) use to get prox of norms! PROXIMAL METHODS WHY PROXIMAL METHODS Smooth

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 5. Convex programming and semidefinite programming

E5295/5B5749 Convex optimization with engineering applications. Lecture 5. Convex programming and semidefinite programming E5295/5B5749 Convex optimization with engineering applications Lecture 5 Convex programming and semidefinite programming A. Forsgren, KTH 1 Lecture 5 Convex optimization 2006/2007 Convex quadratic program

More information

Complexity of gradient descent for multiobjective optimization

Complexity of gradient descent for multiobjective optimization Complexity of gradient descent for multiobjective optimization J. Fliege A. I. F. Vaz L. N. Vicente July 18, 2018 Abstract A number of first-order methods have been proposed for smooth multiobjective optimization

More information

Proximal-like contraction methods for monotone variational inequalities in a unified framework

Proximal-like contraction methods for monotone variational inequalities in a unified framework Proximal-like contraction methods for monotone variational inequalities in a unified framework Bingsheng He 1 Li-Zhi Liao 2 Xiang Wang Department of Mathematics, Nanjing University, Nanjing, 210093, China

More information

Gradient Descent. Ryan Tibshirani Convex Optimization /36-725

Gradient Descent. Ryan Tibshirani Convex Optimization /36-725 Gradient Descent Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: canonical convex programs Linear program (LP): takes the form min x subject to c T x Gx h Ax = b Quadratic program (QP): like

More information

An interior-point gradient method for large-scale totally nonnegative least squares problems

An interior-point gradient method for large-scale totally nonnegative least squares problems An interior-point gradient method for large-scale totally nonnegative least squares problems Michael Merritt and Yin Zhang Technical Report TR04-08 Department of Computational and Applied Mathematics Rice

More information

Optimization for Machine Learning

Optimization for Machine Learning Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html

More information

A Greedy Framework for First-Order Optimization

A Greedy Framework for First-Order Optimization A Greedy Framework for First-Order Optimization Jacob Steinhardt Department of Computer Science Stanford University Stanford, CA 94305 jsteinhardt@cs.stanford.edu Jonathan Huggins Department of EECS Massachusetts

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)

More information

On Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence:

On Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence: A Omitted Proofs from Section 3 Proof of Lemma 3 Let m x) = a i On Acceleration with Noise-Corrupted Gradients fxi ), u x i D ψ u, x 0 ) denote the function under the minimum in the lower bound By Proposition

More information

On the iterate convergence of descent methods for convex optimization

On the iterate convergence of descent methods for convex optimization On the iterate convergence of descent methods for convex optimization Clovis C. Gonzaga March 1, 2014 Abstract We study the iterate convergence of strong descent algorithms applied to convex functions.

More information

Oracle Complexity of Second-Order Methods for Smooth Convex Optimization

Oracle Complexity of Second-Order Methods for Smooth Convex Optimization racle Complexity of Second-rder Methods for Smooth Convex ptimization Yossi Arjevani had Shamir Ron Shiff Weizmann Institute of Science Rehovot 7610001 Israel Abstract yossi.arjevani@weizmann.ac.il ohad.shamir@weizmann.ac.il

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

A solution approach for linear optimization with completely positive matrices

A solution approach for linear optimization with completely positive matrices A solution approach for linear optimization with completely positive matrices Franz Rendl http://www.math.uni-klu.ac.at Alpen-Adria-Universität Klagenfurt Austria joint work with M. Bomze (Wien) and F.

More information