An adaptive accelerated first-order method for convex optimization


Renato D. C. Monteiro    Camilo Ortiz    Benar F. Svaiter

July 2012 (Revised: May 2014)

School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA (monteiro@isye.gatech.edu). The work of this author was partially supported by NSF Grants CMMI-994 and CMMI-322, and ONR Grant ONR N.
School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA (camiort@gatech.edu).
IMPA, Estrada Dona Castorina, Rio de Janeiro, Brazil (benar@impa.br). The work of this author was partially supported by CNPq grants no /28-8, 32962/2-5, 48/28-6, /2-7, and FAPERJ grants E-26/2.82/28, E-26/2.94/2.

Abstract

This paper presents a new accelerated variant of Nesterov's method for solving composite convex optimization problems in which certain acceleration parameters are adaptively (and aggressively) chosen so as to substantially improve its practical performance compared to existing accelerated variants, while at the same time preserving the optimal iteration-complexity shared by these methods. Computational results are presented to demonstrate that the proposed adaptive accelerated method endowed with a restarting scheme outperforms other existing accelerated variants.

1 Introduction

The methods of choice for solving large-scale convex optimization problems, especially when high accuracy is not needed, are first-order methods, due to their cheap iteration cost (time and memory). In [11] (see also [13]), Nesterov presents a scheme for accelerating first-order methods, more specifically the steepest descent method for unconstrained convex optimization and, more generally, the projected gradient method for constrained convex optimization. He shows that the rate of convergence of these methods, namely $O(1/k)$, where $k$ denotes the iteration count, can be improved to $O(1/k^2)$ by considering their corresponding accelerated variants. Due to its wide use for solving large-scale convex optimization problems arising in several applications, other accelerated variants of Nesterov's method for the aforementioned problems, and more generally the composite convex optimization problem, have been proposed and studied in the literature (see for example [1, 3, 5, 6, 8, 10, 11, 12, 14, 15, 18]). Moreover, there has been an increasing effort towards improving the practical performance of these methods (see for example [6, 16]) in the solution of large-scale convex optimization problems.

All of the accelerated variants mentioned above use certain acceleration parameters which are obtained by using well-known update formulas. This paper presents a new accelerated variant for composite convex optimization in which the acceleration parameters are adaptively (and aggressively) chosen so as to substantially improve its practical performance in comparison to these aforementioned variants, while at the same time preserving the optimal iteration-complexity shared by these methods (with the exception of [16]).

More specifically, the new accelerated variant chooses the acceleration parameters at every iteration by solving a simple two-variable convex quadratically constrained linear program. Moreover, its iteration cost is comparable to, or better than, that of the aforementioned accelerated methods since it only computes a single resolvent (or proximal subproblem) at every iteration.

For the purpose of our computational experiments, we have implemented the new accelerated variant endowed with two restarting schemes in order to improve its practical performance. The first restarting scheme is an aggressive one which restarts the method from the last iterate whenever the objective function increases (also proposed in [16]). The second restarting scheme is a more conservative (and more robust) one which restarts the method whenever the objective function increases and the number of iterations at that point is sufficiently large. Finally, computational results are presented on several conic quadratic programming instances showing that the new accelerated variant endowed with either one of the restarting schemes is faster and more robust than other existing variants of Nesterov's method.

Our paper is organized as follows. Section 2 presents a class of accelerated first-order methods, and corresponding iteration-complexity results, for solving composite convex optimization problems. Moreover, this section proposes a specific instance of the latter class which adaptively chooses the sequence of acceleration parameters so as to greedily minimize an upper bound on the primal gap. Section 3 presents computational results comparing the latter instance (endowed with one of the aforementioned restart schemes) with other existing variants of Nesterov's method. Finally, Section 4 presents some final remarks.

2 An accelerated first-order method

This section introduces an accelerated first-order method based on the one presented in [10] for solving composite convex optimization problems. First, we propose a method with general acceleration parameters satisfying certain conditions which still guarantee an optimal iteration-complexity. Next, for the purpose of improving the practical performance of the method, we propose a specific way of choosing these parameters that greedily minimizes an upper bound on the primal gap.

Let $\mathbb{R}$ denote the set of real numbers and define $\bar{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\}$. Also, let $\mathcal{X}$ denote a finite dimensional real inner product space with inner product and induced norm denoted by $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$, respectively. Our problem of interest is the composite convex optimization problem

$\phi_* := \min_{x \in \mathcal{X}} \; \phi(x) := f(x) + h(x)$   (1)

where:

C.1 $h : \mathcal{X} \to \bar{\mathbb{R}}$ is a proper closed convex function;

C.2 $f$ is a real-valued function which is differentiable and convex on a closed convex set $\Omega \supseteq \mathrm{dom}\, h := \{x \in \mathcal{X} : h(x) < \infty\}$;

C.3 $\nabla f$ is $L$-Lipschitz continuous on $\Omega$ for some $L > 0$, i.e., $\|\nabla f(x') - \nabla f(x)\| \le L \|x' - x\|$ for all $x, x' \in \Omega$;

C.4 the set $X^*$ of optimal solutions of (1) is non-empty.

There are many accelerated variants of Nesterov's method for solving (1) under assumptions C.1-C.4. In this paper we are interested in studying the properties of the following accelerated variant of Nesterov's method, whose statement uses the projection operator $\Pi_\Omega : \mathcal{X} \to \Omega$ onto $\Omega$ defined as $\Pi_\Omega(x) = \arg\min_{x' \in \Omega} \|x' - x\|$ for all $x \in \mathcal{X}$.
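To make the setting concrete, the following sketch (ours, not part of the original paper; all names are illustrative and the paper's own implementation is in MATLAB, see Section 3) encodes the oracles needed by the methods of this section for a hypothetical box-constrained quadratic instance of (1): the smooth part $f$ and its gradient, the composite part $h$ through its resolvent, and the projection $\Pi_\Omega$.

```python
import numpy as np

class CompositeProblem:
    """Oracles for problem (1): f smooth with L-Lipschitz gradient on Omega,
    h proper closed convex (here an indicator), and the projection onto Omega."""

    def __init__(self, Q, b, lo, up):
        self.Q, self.b, self.lo, self.up = Q, b, lo, up
        self.L = np.linalg.norm(Q, 2)          # Lipschitz constant of grad f

    def f(self, x):
        return 0.5 * x @ self.Q @ x + self.b @ x

    def grad_f(self, x):
        return self.Q @ x + self.b

    def h(self, x):
        # indicator of the box B = {x : lo <= x <= up}; 0 on B, +inf outside
        return 0.0 if np.all(x >= self.lo) and np.all(x <= self.up) else np.inf

    def prox_h(self, z, lam):
        # resolvent (I + lam*dh)^{-1}(z); for an indicator it is the projection
        return np.clip(z, self.lo, self.up)

    def proj_Omega(self, x):
        # here Omega = X = R^n, so the projection is the identity
        return x
```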

Accelerated Class: A class of accelerated algorithms for (1)

0) Let $\lambda \in (0, 1/L]$ and $x_0 \in \mathcal{X}$ be given, and set $A_0 = 0$, $y_0 = x_0$, the affine function $\Gamma_0 : \mathcal{X} \to \mathbb{R}$ as $\Gamma_0 \equiv 0$, and $k = 1$;

1) compute $a_k$, $\tilde{x}_k$, $\tilde{x}_k^\Omega$, $y_k$ and the affine function $\gamma_k : \mathcal{X} \to \mathbb{R}$ as

$a_k := \dfrac{\lambda + \sqrt{\lambda^2 + 4\lambda A_{k-1}}}{2}$,   (2)

$\tilde{x}_k := \dfrac{A_{k-1}}{A_{k-1} + a_k}\, y_{k-1} + \dfrac{a_k}{A_{k-1} + a_k}\, x_{k-1}$,   $\tilde{x}_k^\Omega := \Pi_\Omega(\tilde{x}_k)$,   (3)

$y_k := \arg\min_{x \in \mathcal{X}} \left\{ \lambda h(x) + \tfrac{1}{2}\|x - \tilde{x}_k + \lambda \nabla f(\tilde{x}_k^\Omega)\|^2 \right\}$,   (4)

$\gamma_k(x) := f(\tilde{x}_k^\Omega) + \langle \nabla f(\tilde{x}_k^\Omega), y_k - \tilde{x}_k^\Omega \rangle + h(y_k) + \tfrac{1}{\lambda}\langle \tilde{x}_k - y_k, x - y_k \rangle$ for all $x \in \mathcal{X}$;   (5)

2) choose a pair $(A_k', a_k') = (A', a') \in \mathbb{R} \times \mathbb{R}$ such that

$A' \ge 0$,  $a' \ge 0$,  $A' + a' \ge A_{k-1} + a_k$,   (6)

$(A' + a')\,\phi(y_k) \le \inf_{x \in \mathcal{X}} \left\{ (A' \Gamma_{k-1} + a' \gamma_k)(x) + \tfrac{1}{2}\|x - x_0\|^2 \right\}$,   (7)

with the safeguard that $A_k' = A_0 = 0$ when $k = 1$;

3) compute

$A_k = A_k' + a_k'$,   (8)

$\Gamma_k = \dfrac{A_k'}{A_k' + a_k'}\, \Gamma_{k-1} + \dfrac{a_k'}{A_k' + a_k'}\, \gamma_k$,   (9)

$x_k = x_0 - A_k \nabla \Gamma_k$;   (10)

4) set $k \leftarrow k + 1$, and go to step 1.

Observe that (7), (8) and (9) imply

$A_k \phi(y_k) \le \inf_{x \in \mathcal{X}} \left\{ A_k \Gamma_k(x) + \tfrac{1}{2}\|x - x_0\|^2 \right\}$.   (11)

Also, note that the Accelerated Class does not specify how to choose $(A_k', a_k') \in \mathbb{R} \times \mathbb{R}$ satisfying (6) and (7). Different choices of the pair $(A_k', a_k')$ determine different instances of the class. For example, if the pair $(A_k', a_k')$ is chosen as $(A_k', a_k') = (A_{k-1}, a_k)$ then, using equation (30) of Lemma A.3, it can be shown that the Accelerated Class reduces to Algorithm I of [10]. Moreover, if $(A_k', a_k') = (A_{k-1}, a_k)$ and $\Omega = \mathcal{X}$ (and hence the gradient of $f$ is defined and is $L$-Lipschitz continuous on the whole $\mathcal{X}$), then the Accelerated Class reduces to a well-known accelerated variant of Nesterov's method, namely FISTA [1]. On the other hand, this paper is concerned with the instance of the above class which chooses the pair $(A_k', a_k')$ so as to maximize $A_k$ subject to conditions (6) and (7).
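The sketch below (ours, with hypothetical names; it is not the paper's MATLAB code) implements steps 1-3 of one iteration of the Accelerated Class for oracles like the `CompositeProblem` sketch above. Affine functions are represented by a pair `(g, c)` meaning $x \mapsto c + \langle g, x\rangle$, and the rule used in step 2 is passed in as a callback so that different instances of the class are obtained by swapping it.

```python
import numpy as np

def accel_class_step(prob, lam, x0, state, choose_pair):
    """One iteration (steps 1-3) of the Accelerated Class.
    state = (A_prev, x_prev, y_prev, (gG, cG)), where Gamma_{k-1}(x) = cG + <gG, x>."""
    A_prev, x_prev, y_prev, (gG, cG) = state

    # step 1: acceleration parameter, extrapolation, resolvent (4), and minorant (5)
    a_k = 0.5 * (lam + np.sqrt(lam**2 + 4.0 * lam * A_prev))
    x_til = (A_prev * y_prev + a_k * x_prev) / (A_prev + a_k)
    x_til_om = prob.proj_Omega(x_til)
    grad = prob.grad_f(x_til_om)
    y_k = prob.prox_h(x_til - lam * grad, lam)

    # gamma_k(x) = c_gam + <g_gam, x>, read off from relation (5)
    g_gam = (x_til - y_k) / lam
    c_gam = prob.f(x_til_om) + grad @ (y_k - x_til_om) + prob.h(y_k) - g_gam @ y_k

    # step 2: any rule returning (A', a') satisfying (6)-(7), e.g. the safe choice
    # (A_prev, a_k) or the adaptive rule (13) sketched later in this section
    phi_yk = prob.f(y_k) + prob.h(y_k)
    A_p, a_p = choose_pair(A_prev, a_k, phi_yk, x0, (gG, cG), (g_gam, c_gam))

    # step 3: aggregate the affine minorant and update x_k via relation (10)
    A_k = A_p + a_p
    gG_new = (A_p * gG + a_p * g_gam) / A_k
    cG_new = (A_p * cG + a_p * c_gam) / A_k
    x_k = x0 - A_k * gG_new
    return A_k, x_k, y_k, (gG_new, cG_new)
```

The initial state corresponding to step 0 is $(A_0, x_0, y_0, \Gamma_0) = (0, x_0, x_0, 0)$, i.e. `(0.0, x0, x0, (np.zeros_like(x0), 0.0))`.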

The following result not only shows that the pair $(A_k', a_k') = (A_{k-1}, a_k)$ satisfies (6) and (7), but also that any instance of the above class has optimal iteration-complexity.

Proposition 2.1. The following statements hold for every $k \ge 1$:

a) $(A_k', a_k') = (A_{k-1}, a_k)$ satisfies (6) and (7), and hence the Accelerated Class is well-defined;

b) $\Gamma_k$ is affine and $\Gamma_k \le \phi$;

c) $A_k \ge \lambda k^2/4$;

d) there holds

$\phi(y_k) - \phi_* \le \dfrac{d_0^2}{2 A_k} \le \dfrac{2 d_0^2}{\lambda k^2}$,   (12)

where $d_0$ is the distance of $x_0$ to $X^*$.

Proof. Statements a), b) and c) follow from Lemma A.3. Letting $x^*$ denote the projection of $x_0$ onto $X^*$, d) follows from c) and the fact that (11) and b) imply that

$A_k \phi(y_k) \le A_k \Gamma_k(x^*) + \tfrac{1}{2}\|x^* - x_0\|^2 \le A_k \phi_* + \tfrac{1}{2} d_0^2$.

Observe that Proposition 2.1(d) implies that any instance of the Accelerated Class with $\lambda = 1/L$ has the optimal iteration-complexity $O(L d_0^2/k^2)$. Moreover, based on the fact that (8) and (12) imply that

$\phi(y_k) - \phi_* = O\!\left(\dfrac{1}{A_k' + a_k'}\right)$,

our new accelerated variant chooses the pair $(A_k', a_k')$ as

$(A_k', a_k') \in \arg\max_{A', a'} \left\{ A' + a' : A' \ge 0,\ a' \ge 0 \text{ and } (A', a') \text{ satisfies (7)} \right\}$   (13)

so as to greedily reduce the primal gap $\phi(y_k) - \phi_*$ as much as possible.

We will now establish that (7) is equivalent to a convex quadratic constraint, and hence that (13) is a simple two-variable convex quadratically constrained linear program. We first state the following simple technical result.

Lemma 2.2. Let $\chi \in \mathbb{R}$, $x_0 \in \mathcal{X}$ and an affine function $\Gamma : \mathcal{X} \to \mathbb{R}$ be given. Then, the following conditions are equivalent:

a) $\min\{\Gamma(x) + \|x - x_0\|^2/2 : x \in \mathcal{X}\} \ge \chi$;

b) $\chi \le \Gamma(x_0) - \|\nabla\Gamma\|^2/2$.

Proof. The result follows from the optimality conditions of the unconstrained optimization problem in a).

The following result gives the aforementioned characterization of condition (7).

Proposition 2.3. For every $k \ge 1$, condition (7) holds if and only if

$A'\,[\phi(y_k) - \Gamma_{k-1}(x_0)] + a'\,[\phi(y_k) - \gamma_k(x_0)] + \tfrac{1}{2}\|A' \nabla\Gamma_{k-1} + a' \nabla\gamma_k\|^2 \le 0$.   (14)
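As an illustration (ours, with hypothetical helper names, not the paper's implementation), the rule (13), characterized by (14), can be solved numerically with a generic nonlinear programming routine; the sketch below uses SciPy's SLSQP solver and starts from the always-feasible pair $(A_{k-1}, a_k)$ of Proposition 2.1(a). It is compatible with the `accel_class_step` sketch above.

```python
import numpy as np
from scipy.optimize import minimize

def choose_pair_adaptive(A_prev, a_k, phi_yk, x_anchor, Gamma_prev, gamma_k):
    """Pick (A', a') maximizing A' + a' subject to (14) and A', a' >= 0.
    Gamma_prev and gamma_k are pairs (g, c) representing the affine map x -> c + <g, x>."""
    (gG, cG), (gg, cg) = Gamma_prev, gamma_k
    rG = phi_yk - (cG + gG @ x_anchor)     # phi(y_k) - Gamma_{k-1}(x_0)
    rg = phi_yk - (cg + gg @ x_anchor)     # phi(y_k) - gamma_k(x_0)

    def slack(z):                          # feasible for (14) iff slack(z) >= 0
        A, a = z
        return -(A * rG + a * rg + 0.5 * np.linalg.norm(A * gG + a * gg) ** 2)

    res = minimize(lambda z: -(z[0] + z[1]),            # maximize A' + a'
                   np.array([A_prev, a_k]),             # feasible by Prop. 2.1(a)
                   method='SLSQP',
                   bounds=[(0.0, None), (0.0, None)],
                   constraints=[{'type': 'ineq', 'fun': slack}])
    if res.success and slack(res.x) >= -1e-10:
        return float(res.x[0]), float(res.x[1])
    return A_prev, a_k                                  # fall back to the safe pair
```

A dedicated two-variable solver (or a closed-form treatment) would of course be cheaper per iteration; a real implementation would also detect the unbounded case characterized in Proposition 2.4 below, in which case the current iterate is already optimal.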

Proof. This result follows from Lemma 2.2 with $\Gamma = A' \Gamma_{k-1} + a' \gamma_k$ and $\chi = (A' + a')\phi(y_k)$.

Note that (14) is a single convex quadratic constraint on the scalars $A'$ and $a'$. Hence, the new accelerated method based on (13) chooses the pair $(A_k', a_k')$ as an optimal solution of the simple two-variable convex quadratically constrained linear program

max  $A' + a'$
s.t.  $A'[\phi(y_k) - \Gamma_{k-1}(x_0)] + a'[\phi(y_k) - \gamma_k(x_0)] + \tfrac{1}{2}\|A' \nabla\Gamma_{k-1} + a' \nabla\gamma_k\|^2 \le 0$,
      $A' \ge 0$, $a' \ge 0$.   (15)

The remaining part of this section discusses the case where problem (15) is unbounded. It will be seen that this case implies that $y_k \in X^*$, in which case the new accelerated method may be successfully stopped. The following result, whose proof is given in Appendix B, considers a slightly more general problem which contains (15) as a special case.

Proposition 2.4. Suppose that $x_0, y \in \mathcal{X}$ and $\gamma_1, \ldots, \gamma_m : \mathcal{X} \to \mathbb{R}$ are affine functions minorizing $\phi$, and consider the problem

max  $\sum_{i=1}^m \alpha_i$
s.t.  $\inf_{x \in \mathcal{X}} \left\{ \sum_{i=1}^m \alpha_i \gamma_i(x) + \tfrac{1}{2}\|x - x_0\|^2 \right\} \ge \left(\sum_{i=1}^m \alpha_i\right) \phi(y)$,
      $\alpha_1, \ldots, \alpha_m \ge 0$.   (16)

Then, the following conditions are equivalent:

a) problem (16) is unbounded;

b) there exist not all zero nonnegative scalars $\bar\alpha_1, \ldots, \bar\alpha_m$ such that

$\sum_{i=1}^m \bar\alpha_i \nabla\gamma_i = 0$,   $\bar\alpha_i\,[\phi(y) - \gamma_i(y)] = 0$,  $i = 1, \ldots, m$.   (17)

In both cases, $y \in X^*$.

Proposition 2.4 deals with the case where (15) is unbounded. On the other hand, when (15) is bounded, its feasible set is compact, and hence (15) has an optimal solution. In our computational experiments, we will refer to the instance of the Accelerated Class which chooses the pair $(A_k', a_k')$ as an optimal solution of (15) as the adaptive accelerated (AA) method.

3 Numerical results

In this section, we describe two restarting variants of the AA method and report numerical results comparing them to the following variants of Nesterov's method:

i) FISTA (fast iterative shrinkage-thresholding algorithm) of [1] (see also Algorithm 2 of [18]);

ii) FISTA-R: restarting variant of FISTA;

iii) GKR-2: Algorithm 2 of [7];

iv) GKR-3: Algorithm 3 of [7].

More specifically, we compare the performance of these methods using three classes of conic quadratic programming instances, namely: a) random convex quadratic programs (CQPs) (see Subsection 3.1); b) semidefinite least squares (SDLSs) (see Subsection 3.2); and c) random nonnegative least squares (NNLSs) (see Subsection 3.3). We observe that we have implemented our own code for FISTA-R. Although we have recently learned that a similar variant was implemented in [16], we have not had the opportunity to include their code in our computational benchmark. Nevertheless, we believe that our implementation should be very similar to the method in [16], and hence should reflect the actual performance of the latter algorithm.

We stop all methods whenever an iterate $y_k$ is found such that

$\left\| y_k - \left(I + \tfrac{1}{L}\partial h\right)^{-1}\!\left(y_k - \tfrac{1}{L}\nabla f(y_k)\right) \right\| \le 10^{-6}$   (18)

or when a prescribed maximum number of iterations has been performed. Note that for the case where $h$ is an indicator function of a closed convex set, the left hand side of (18) is exactly the norm of the projected gradient with stepsize $1/L$.

We now briefly describe the two restarting versions of the AA method. In the first version, if the function value at the end of the $k$th iteration increases, i.e., $\phi(y_k) > \phi(y_{k-1})$, then the method is restarted at step 0 with $x_0 = y_{k-1}$ (and hence $y_0 = y_{k-1}$). FISTA-R refers to the variant of FISTA which incorporates this restarting scheme. Moreover, FISTA-R is equivalent to one of the restarting variants of FISTA discussed in [16]. In contrast to the first restarting version above, the second one allows some iterations to increase the function value, and hence is more conservative than the first one. More specifically, we restart this variant at the $k$th iteration whenever $\phi(y_k) > \phi(y_{k-1})$ and no more than $\log_2(k - l_k)$ restarts have been performed so far, where $l_k$ is the iteration at which the last restart was performed. We will refer to the first and second restarting variants of the AA method as AA-R and AA-R2, respectively.

The codes for all the benchmarked variants tested are written in MATLAB. All the computational results were obtained on a single core of a server with two Xeon X5520 processors at 2.27GHz and 48GB of RAM.

We now make some general remarks about how the results are reported in the tables given below. Tables 1, 3 and 5 report the times and Tables 2, 4 and 6 report the number of iterations for all instances of the three problem classes. Each problem class is associated with two tables, one reporting the times and the other one the number of iterations required by each benchmarked method to solve all instances of the class. We display the time or number of iterations that a variant takes on an instance in red, and also with an asterisk (*), whenever it cannot solve the instance to the required accuracy. In such a case, the accuracy obtained at the last iteration of the variant is also displayed in parentheses. Figures 1, 3 and 5 plot the time performance profiles (see [4]), and Figures 2, 4 and 6 plot the iteration performance profiles for each of the three problem classes. We recall the following definition of a performance profile. For a given instance, a method A is said to be at most $x$ times slower than method B if the time taken (resp. number of iterations performed) by method A is at most $x$ times the time taken (resp. number of iterations performed) by method B. A point $(x, y)$ is in the performance profile curve of a method if it can solve exactly $y\%$ of all the tested instances at most $x$ times slower than any other competing method.
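The sketch below (ours; the exact normalization of (18) is partly illegible in the source, so an absolute tolerance of $10^{-6}$ is assumed) shows how the stopping test and the conservative restart rule can be implemented on top of the iteration sketch from Section 2.

```python
import numpy as np

def projected_gradient_residual(prob, y):
    # left-hand side of (18): || y - (I + (1/L) dh)^{-1}( y - (1/L) grad f(y) ) ||
    step = 1.0 / prob.L
    return np.linalg.norm(y - prob.prox_h(y - step * prob.grad_f(y), step))

def should_restart_conservative(phi_curr, phi_prev, k, last_restart_iter, n_restarts):
    """Conservative rule: restart only if the objective increased AND no more than
    log2(k - l_k) restarts have been performed so far."""
    if phi_curr <= phi_prev:
        return False
    gap = max(k - last_restart_iter, 1)
    return n_restarts <= np.log2(gap)
```

The aggressive rule used by AA-R and FISTA-R simply drops the second condition and restarts whenever `phi_curr > phi_prev`.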

Figure 1: Time performance profiles for solving CQP instances with accuracy $\epsilon = 10^{-6}$. On the left we include all the methods and on the right we include the three fastest variants, namely AA-R, AA-R2 and FISTA-R.

3.1 Numerical results for random CQPs

This subsection compares the performance of our methods AA-R and AA-R2 with the variants of Nesterov's method listed at the beginning of this section on a class of randomly generated sparse CQP instances. These instances were also used to report the performance of GKR-2 and GKR-3 in [7].

Let $\mathbb{R}^n$ denote the $n$-dimensional Euclidean space, $\mathcal{S}^n$ denote the set of all $n \times n$ symmetric matrices and $\mathcal{S}^n_+$ denote the cone of $n \times n$ symmetric positive semidefinite matrices. Given $Q \in \mathcal{S}^n_+$, $b \in \mathbb{R}^n$, $l \in \{\mathbb{R} \cup \{-\infty\}\}^n$ and $u \in \{\mathbb{R} \cup \{\infty\}\}^n$ such that $l \le u$, the box-constrained convex quadratic programming problem is defined as

$\min_{x \in \mathbb{R}^n} \left\{ x^T Q x + b^T x : l \le x \le u \right\}$.   (19)

Letting $f$ and $h$ be defined as

$f(x) = x^T Q x + b^T x$,  $h(x) = \delta_B(x)$,  for all $x \in \mathbb{R}^n$,

where $B = \{x \in \mathbb{R}^n : l \le x \le u\}$, we can easily see that (19) is a special case of (1) with $\Omega = \mathcal{X} = \mathbb{R}^n$.

Figures 1 and 2 plot time and iteration performance profiles of all variants of Nesterov's method for solving this collection of random sparse CQP instances, respectively. Tables 1 and 2 report the time and number of iterations taken by each method, respectively. Note that AA-R and AA-R2 outperform the other methods on most of the random sparse CQP instances. From Figures 1 and 2 we can see that the aggressive restart scheme of AA-R performs slightly better than the conservative one of AA-R2 on these instances.

Figure 2: Iteration performance profiles for solving CQP instances with accuracy $\epsilon = 10^{-6}$. On the left we include all the methods and on the right we include the three fastest variants, namely AA-R, AA-R2 and FISTA-R.

3.2 Numerical results for SDLSs

This subsection compares the performance of our methods AA-R and AA-R2 with the variants of Nesterov's method listed at the beginning of this section on a class of SDLS instances.

Given a linear map $\mathcal{A} : \mathcal{S}^n \to \mathbb{R}^m$ and $b \in \mathbb{R}^m$, the semidefinite programming (SDP) feasibility problem consists of finding $x$ such that

$\mathcal{A}x = b$,  $x \in \mathcal{S}^n_+$.

We can solve the above problem by considering the SDLS reformulation

$\min_{x \in \mathcal{S}^n} \left\{ \tfrac{1}{2}\|\mathcal{A}x - b\|^2 : x \in \mathcal{S}^n_+ \right\}$.   (20)

Letting $f$ and $h$ be defined as

$f(x) = \tfrac{1}{2}\|\mathcal{A}x - b\|^2$,  $h(x) = \delta_{\mathcal{S}^n_+}(x)$,  for all $x \in \mathcal{S}^n$,

where $\|\cdot\|$ denotes the Euclidean norm, we can easily see that (20) is a special case of (1) with $\Omega = \mathcal{X} = \mathcal{S}^n$.

The SDLS instances included in this comparison are obtained via the above construction from the feasibility sets (after bringing them into standard form) of four classes of SDPs, namely: i) randomly generated SDPs as in [17]; ii) SDP relaxations of frequency assignment problems (see for example Subsection 2.4 in [2]); iii) SDP relaxations of binary integer quadratic problems (see for example Section 7 in [19]); and iv) SDP relaxations of quadratic assignment problems (see for example Section 7 in [19]).

Figures 3 and 4 plot time and iteration performance profiles of all variants of Nesterov's method for solving this collection of SDLS instances, respectively. Tables 3 and 4 report the time and number of iterations taken by each method, respectively. Note that AA-R and AA-R2 outperform the other methods on most of the SDLS instances. From Figures 3 and 4 we can see that the aggressive restart scheme of AA-R performs slightly better than the conservative one of AA-R2 on these instances.
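For the SDLS class, the only nontrivial oracle is the resolvent of $h = \delta_{\mathcal{S}^n_+}$, i.e. the projection onto the positive semidefinite cone. A standard way to compute it (a sketch of ours, not code from the paper) is via an eigenvalue decomposition:

```python
import numpy as np

def proj_psd(X):
    """Projection of a symmetric matrix X onto the PSD cone S^n_+,
    obtained by zeroing out the negative eigenvalues."""
    Xs = 0.5 * (X + X.T)                   # symmetrize against round-off
    w, V = np.linalg.eigh(Xs)              # eigen-decomposition Xs = V diag(w) V^T
    return (V * np.maximum(w, 0.0)) @ V.T  # keep only the nonnegative part
```

Since $h$ is an indicator here, the resolvent $(I + \lambda\,\partial h)^{-1}$ equals this projection for every $\lambda > 0$, so the per-iteration cost of all compared methods is dominated by one eigendecomposition.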

Figure 3: Time performance profiles for solving SDLS instances with accuracy $\epsilon = 10^{-6}$. On the left we include all the methods and on the right we include the three fastest variants, namely AA-R, AA-R2 and FISTA-R.

Figure 4: Number of iterations performance profiles for solving SDLS instances with accuracy $\epsilon = 10^{-6}$. On the left we include all the methods and on the right we include the three fastest variants, namely AA-R, AA-R2 and FISTA-R.

Figure 5: Time performance profiles for solving NNLS instances with accuracy $\epsilon = 10^{-6}$. On the left we include all the methods and on the right we include the three fastest variants, namely AA-R, AA-R2 and FISTA-R.

3.3 Numerical results for NNLSs

This subsection compares the performance of our methods AA-R and AA-R2 with the variants of Nesterov's method listed at the beginning of this section on a class of NNLS instances randomly generated as in [9].

Given a matrix $A \in \mathbb{R}^{m \times n}$ and a vector $b \in \mathbb{R}^m$, the NNLS problem is defined as

$\min_{x \in \mathbb{R}^n} \left\{ \tfrac{1}{2}\|Ax - b\|^2 : x \ge 0 \right\}$,   (21)

where $\|\cdot\|$ denotes the Euclidean norm. Letting $f$ and $h$ be defined as

$f(x) = \tfrac{1}{2}\|Ax - b\|^2$,  $h(x) = \delta_{\mathbb{R}^n_+}(x)$,  for all $x \in \mathbb{R}^n$,

where $\mathbb{R}^n_+$ is the cone of nonnegative vectors in $\mathbb{R}^n$, we can easily see that (21) is a special case of (1) with $\Omega = \mathcal{X} = \mathbb{R}^n$.

Figures 5 and 6 plot the time and iteration performance profiles of all variants of Nesterov's method for solving this collection of random NNLS instances, respectively. Tables 5 and 6 report the time and number of iterations taken by each method, respectively. Note that AA-R2 outperforms the other methods on most of the NNLS instances where a solution with the required accuracy was found. From Figures 5 and 6 we can see that the conservative restart scheme of AA-R2 performs better and is more robust than the aggressive one of AA-R on these instances. Also, we can see that the methods that do not incorporate a restart scheme are only able to solve less than 10% of the NNLS instances, while the ones that do incorporate a restart scheme solve at least 70% of them.
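The NNLS oracles are simple enough to state explicitly. The sketch below (ours, with a randomly generated instance only for illustration) produces an object compatible with the iteration sketch of Section 2, using $\nabla f(x) = A^T(Ax - b)$, $L = \|A\|_2^2$, and the resolvent of $\delta_{\mathbb{R}^n_+}$ given by componentwise clipping at zero.

```python
import numpy as np

class NNLSProblem:
    """Oracles for the NNLS instance (21): f(x) = 0.5*||Ax - b||^2, h = indicator of x >= 0."""

    def __init__(self, A, b):
        self.A, self.b = A, b
        self.L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of grad f

    def f(self, x):
        r = self.A @ x - self.b
        return 0.5 * r @ r

    def grad_f(self, x):
        return self.A.T @ (self.A @ x - self.b)

    def h(self, x):
        return 0.0 if np.all(x >= 0) else np.inf

    def prox_h(self, z, lam):
        return np.maximum(z, 0.0)               # projection onto the nonnegative orthant

    def proj_Omega(self, x):
        return x                                # Omega = R^n

# a small random instance, for illustration only
rng = np.random.default_rng(0)
prob = NNLSProblem(rng.standard_normal((40, 20)), rng.standard_normal(40))
```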

Figure 6: Number of iterations performance profiles for solving NNLS instances with accuracy $\epsilon = 10^{-6}$. On the left we include all the methods and on the right we include the three fastest variants, namely AA-R, AA-R2 and FISTA-R.

4 Concluding remarks

We have observed in our computational experiments that AA-R2 quickly stops performing restarts, while AA-R periodically continues to perform restarts. Figures 7 and 8 plot the time and iteration performance profiles of all variants of Nesterov's method on the collection of instances obtained by combining all the three instance classes described in the previous sections. From these plots we can see that the overall performances of AA-R and AA-R2 are very close to one another, with AA-R2 turning out to be more robust, as it solves more problems to the specified accuracy $10^{-6}$ than any of the other variants tested. We can also see that AA-R and AA-R2 are the two fastest variants among the six used in this benchmark in terms of both time and number of iterations. Finally, our implementation of the AA method and its restarting variants is publicly available.

Acknowledgements

We want to thank Elizabeth W. Karas for generously providing us with the code of the algorithms in [7].

References

[1] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183-202, 2009.

[2] S. Burer, R. D. C. Monteiro, and Y. Zhang. A computational study of a gradient-based log-barrier algorithm for a class of large-scale SDPs. Mathematical Programming, 95:359-379, 2003.

Figure 7: Time performance profiles for solving all the three problem classes with accuracy $\epsilon = 10^{-6}$. On the left we include all the methods and on the right we include the three fastest variants, namely AA-R, AA-R2 and FISTA-R.

Figure 8: Iteration performance profiles for solving all the three problem classes with accuracy $\epsilon = 10^{-6}$. On the left we include all the methods and on the right we include the three fastest variants, namely AA-R, AA-R2 and FISTA-R.

[3] O. Devolder, F. Glineur, and Y. Nesterov. First-order methods of smooth convex optimization with inexact oracle. CORE Discussion Paper 2011/02, Center for Operations Research and Econometrics (CORE), Catholic University of Louvain, December 2011.

[4] E. D. Dolan and J. J. Moré. Benchmarking optimization software with performance profiles. Mathematical Programming, 91(2):201-213, 2002.

[5] D. Goldfarb and K. Scheinberg. Fast first-order methods for composite convex optimization with line search. Optimization-online preprint, 2011.

[6] C. C. Gonzaga and E. W. Karas. Fine tuning Nesterov's steepest descent algorithm for differentiable convex programming. Mathematical Programming, 2012.

[7] C. C. Gonzaga, E. W. Karas, and D. R. Rossetto. An optimal algorithm for constrained differentiable convex optimization. Optimization-online preprint, 2011.

[8] G. Lan, Z. Lu, and R. D. C. Monteiro. Primal-dual first-order methods with O(1/ε) iteration-complexity for cone programming. Mathematical Programming, 126(1):1-29, 2011.

[9] M. Merritt and Y. Zhang. Interior-point gradient method for large-scale totally nonnegative least squares problems. Journal of Optimization Theory and Applications, 126(1):191-202, 2005.

[10] R. D. C. Monteiro and B. F. Svaiter. An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM Journal on Optimization, 23(2):1092-1125, 2013.

[11] Y. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k²). Doklady AN SSSR, 269(3):543-547, 1983.

[12] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course, volume 87 of Applied Optimization. Kluwer Academic Publishers, 2004.

[13] Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127-152, 2005.

[14] Y. Nesterov. Dual extrapolation and its applications to solving variational inequalities and related problems. Mathematical Programming, 109(2-3, Ser. B):319-344, 2007.

[15] Y. Nesterov. Gradient methods for minimizing composite objective function. CORE Discussion Paper, 2007.

[16] B. O'Donoghue and E. Candès. Adaptive restart for accelerated gradient schemes. Foundations of Computational Mathematics, 2013.

[17] J. Povh, F. Rendl, and A. Wiegele. A boundary point method to solve semidefinite programs. Computing, 78(3):277-286, 2006.

[18] P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. Submitted to SIAM Journal on Optimization, 2008.

[19] X.-Y. Zhao, D. Sun, and K.-C. Toh. A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM Journal on Optimization, 20(4):1737-1765, 2010.

A Technical results used in Section 2

This section presents some technical results used in Section 2. In particular, Lemma A.3 shows that the Accelerated Class satisfies statements a)-c) of Proposition 2.1 at every iteration. Throughout this appendix we assume that the functions $\phi$, $f$ and $h$, and the set $\Omega$, satisfy conditions C.1-C.4 of Section 2. We start with two technical lemmas that will be used to prove the main result of this appendix, Lemma A.3.

Lemma A.1. Let $A \ge 0$, $\lambda > 0$, $x_0, y \in \mathcal{X}$ and a closed convex function $\Gamma : \mathcal{X} \to \bar{\mathbb{R}}$ be given, and assume that the triple $(y, A, \Gamma)$ satisfies

$A\phi(y) \le \min_{u \in \mathcal{X}} \left\{ A\Gamma(u) + \tfrac{1}{2}\|u - x_0\|^2 \right\}$,  $A\Gamma \le A\phi$.   (22)

Also, define $x \in \mathcal{X}$, $a > 0$ and $\tilde{x} \in \mathcal{X}$ as

$x := \arg\min_{u \in \mathcal{X}} \left\{ A\Gamma(u) + \tfrac{1}{2}\|u - x_0\|^2 \right\}$,   (23)

$a := \dfrac{\lambda + \sqrt{\lambda^2 + 4\lambda A}}{2}$,  $\tilde{x} := \dfrac{A}{A + a}\, y + \dfrac{a}{A + a}\, x$.   (24)

Then, for any proper closed convex function $\gamma : \mathcal{X} \to \bar{\mathbb{R}}$ minorizing $\phi$ we have

$\min_{u \in \mathcal{X}} \left\{ a\gamma(u) + A\Gamma(u) + \tfrac{1}{2}\|u - x_0\|^2 \right\} \ge \dfrac{(A + a)\,\theta}{\lambda}$,  where  $\theta := \min_{u \in \mathcal{X}} \left\{ \lambda\gamma(u) + \tfrac{1}{2}\|u - \tilde{x}\|^2 \right\}$.   (25)

Proof. Let an arbitrary $u \in \mathcal{X}$ be given. Define

$\tilde{u} := \dfrac{Ay + au}{A + a}$

and note that

$\tilde{u} - \tilde{x} = \dfrac{a}{A + a}\,(u - x)$.

Note also that the definition of $a$ in (24) implies

$\lambda = \dfrac{a^2}{A + a}$.

In view of (23), (22), the fact that $\gamma \le \phi$, the convexity of $\Gamma$ and $\gamma$, and the last three relations, we conclude that

$a\gamma(u) + A\Gamma(u) + \tfrac{1}{2}\|u - x_0\|^2 \;\ge\; a\gamma(u) + \tfrac{1}{2}\|u - x\|^2 + \min_{u' \in \mathcal{X}} \left\{ A\Gamma(u') + \tfrac{1}{2}\|u' - x_0\|^2 \right\}$
$\;\ge\; a\gamma(u) + \tfrac{1}{2}\|u - x\|^2 + A\gamma(y)$
$\;\ge\; (A + a)\,\gamma\!\left(\dfrac{Ay + au}{A + a}\right) + \tfrac{1}{2}\|u - x\|^2$
$\;=\; (A + a)\left[ \gamma(\tilde{u}) + \dfrac{A + a}{2a^2}\,\|\tilde{u} - \tilde{x}\|^2 \right]$
$\;\ge\; (A + a)\,\min_{u' \in \mathcal{X}} \left\{ \gamma(u') + \dfrac{1}{2\lambda}\|u' - \tilde{x}\|^2 \right\}$.

The result now follows from (25) and the fact that the above inequality holds for any $u \in \mathcal{X}$.

Lemma A.2. Let $\lambda > 0$, $\xi \in \mathcal{X}$ and a proper closed convex function $g : \mathcal{X} \to (-\infty, \infty]$ be given, and denote the optimal solution and optimal value of

$\min_{x \in \mathcal{X}} \left\{ \lambda g(x) + \tfrac{1}{2}\|x - \xi\|^2 \right\}$   (26)

by $\bar{x}$ and $\bar{g}$, respectively. Then, the affine function $\gamma : \mathcal{X} \to \mathbb{R}$ defined as

$\gamma(x) = g(\bar{x}) + \tfrac{1}{\lambda}\langle \xi - \bar{x}, x - \bar{x} \rangle$,  for all $x \in \mathcal{X}$,

has the property that $\gamma \le g$, and the optimal solution and optimal value of

$\min_{x \in \mathcal{X}} \left\{ \lambda\gamma(x) + \tfrac{1}{2}\|x - \xi\|^2 \right\}$   (27)

are $\bar{x}$ and $\bar{g}$, respectively.

Proof. The optimality condition for (26) implies that $0 \in \lambda\,\partial g(\bar{x}) + \bar{x} - \xi$, or equivalently $\tfrac{1}{\lambda}(\xi - \bar{x}) \in \partial g(\bar{x})$, and hence that $\gamma \le g$, by the definition of $\gamma$ and of the subgradient of a function. Moreover, $\bar{x}$ clearly satisfies the optimality condition of (27). Hence, the last claim of the lemma follows.

The main result of this appendix is as follows.

Lemma A.3. Let $A \ge 0$, $0 < \lambda \le 1/L$, $x_0, y \in \mathcal{X}$ and an affine function $\Gamma : \mathcal{X} \to \mathbb{R}$ be given, and assume that the triple $(y, A, \Gamma)$ satisfies (22). Compute $x$ as in (23), $a$ and $\tilde{x}$ as in (24), $\tilde{x}_\Omega := \Pi_\Omega(\tilde{x})$, $A_+ := A + a$ and

$y_+ := (I + \lambda\,\partial h)^{-1}\big(\tilde{x} - \lambda \nabla f(\tilde{x}_\Omega)\big) = \arg\min_{u \in \mathcal{X}} \left\{ \lambda h(u) + \tfrac{1}{2}\|u - \tilde{x} + \lambda \nabla f(\tilde{x}_\Omega)\|^2 \right\}$,

and define the affine functions $\gamma, \Gamma_+ : \mathcal{X} \to \mathbb{R}$ as

$\gamma(u) := f(\tilde{x}_\Omega) + \langle \nabla f(\tilde{x}_\Omega), y_+ - \tilde{x}_\Omega \rangle + h(y_+) + \tfrac{1}{\lambda}\langle \tilde{x} - y_+, u - y_+ \rangle$  for all $u \in \mathcal{X}$,

$\Gamma_+ := \dfrac{A}{A + a}\,\Gamma + \dfrac{a}{A + a}\,\gamma$.

Then, the triple $(y_+, A_+, \Gamma_+)$ satisfies

$A_+\phi(y_+) \le \min_{u \in \mathcal{X}} \left\{ A_+\Gamma_+(u) + \tfrac{1}{2}\|u - x_0\|^2 \right\}$,  $A_+\Gamma_+ \le A_+\phi$,   (28)

$\sqrt{A_+} \ge \sqrt{A} + \tfrac{1}{2}\sqrt{\lambda}$.   (29)

Moreover,

$x_0 - A_+\nabla\Gamma_+ = x - \dfrac{a}{\lambda}(\tilde{x} - y_+)$.   (30)

Proof. We first claim that $\gamma \le \phi$ and

$\phi(y_+) \le \min_{u \in \mathcal{X}} \left\{ \gamma(u) + \tfrac{1}{2\lambda}\|u - \tilde{x}\|^2 \right\}$.   (31)

Define the function $g : \mathcal{X} \to \bar{\mathbb{R}}$ as

$g(u) = f(\tilde{x}_\Omega) + \langle \nabla f(\tilde{x}_\Omega), u - \tilde{x}_\Omega \rangle + h(u)$  for all $u \in \mathcal{X}$,

and note that assumption C.3 implies

$g(u) \le \phi(u) \le g(u) + \tfrac{L}{2}\|u - \tilde{x}_\Omega\|^2 \le g(u) + \tfrac{1}{2\lambda}\|u - \tilde{x}_\Omega\|^2$  for all $u \in \mathcal{X}$.   (32)

Then, in view of the definition of $y_+$ (which, up to an additive constant in the objective, minimizes $\lambda g(\cdot) + \tfrac{1}{2}\|\cdot - \tilde{x}\|^2$ over $\mathcal{X}$), relation (32) with $u = y_+$, and the fact that $\Pi_\Omega(y_+) = y_+$ (since $y_+ \in \mathrm{dom}\, h \subseteq \Omega$) and hence $\|y_+ - \tilde{x}_\Omega\| \le \|y_+ - \tilde{x}\|$ by the non-expansiveness property of $\Pi_\Omega$, we have

$\lambda\phi(y_+) \le \lambda g(y_+) + \tfrac{1}{2}\|y_+ - \tilde{x}_\Omega\|^2 \le \lambda g(y_+) + \tfrac{1}{2}\|y_+ - \tilde{x}\|^2 = \min_{u \in \mathcal{X}} \left\{ \lambda g(u) + \tfrac{1}{2}\|u - \tilde{x}\|^2 \right\}$.

The claim follows from the above relation, the definitions of $\gamma$ and $g$, and Lemma A.2 with $\xi = \tilde{x}$, noting that in this case $\bar{x} = y_+$.

Using the first inequality of (22), Lemma A.1 and (31), we conclude that

$\min_{u \in \mathcal{X}} \left\{ a\gamma(u) + A\Gamma(u) + \tfrac{1}{2}\|u - x_0\|^2 \right\} \;\ge\; \dfrac{A + a}{\lambda}\,\min_{u \in \mathcal{X}} \left\{ \lambda\gamma(u) + \tfrac{1}{2}\|u - \tilde{x}\|^2 \right\} \;\ge\; (A + a)\,\phi(y_+)$.

Hence, (28) follows from the previous relation, the definitions of $A_+$ and $\Gamma_+$, the second inequality in (22), and the previous claim. Relation (29) can be shown by noting that the definition of $a$ implies $a \ge \tfrac{\lambda}{2} + \sqrt{\lambda A}$, which, together with the definition of $A_+$, implies that

$A_+ = A + a \;\ge\; A + \sqrt{\lambda A} + \dfrac{\lambda}{4} \;=\; \left( \sqrt{A} + \dfrac{\sqrt{\lambda}}{2} \right)^{\!2}$.

Finally, (30) follows immediately from the definitions of $\gamma$, $A_+$ and $\Gamma_+$, and the fact that $x = x_0 - A\nabla\Gamma$.

B Proof of Proposition 2.4

First, note that the equivalence between a) and b) of Lemma 2.2 with $\Gamma = \sum_{i=1}^m \alpha_i\gamma_i$ and $\chi = \left(\sum_{i=1}^m \alpha_i\right)\phi(y)$ implies that the constraint in problem (16) is equivalent to

$\tfrac{1}{2}\left\| \sum_{i=1}^m \alpha_i \nabla\gamma_i + y - x_0 \right\|^2 + \sum_{i=1}^m \alpha_i\,[\phi(y) - \gamma_i(y)] \;\le\; \tfrac{1}{2}\|y - x_0\|^2$.   (33)

Now, assume that b) holds. Then, in view of (17), the point $\alpha(t) := (t\bar\alpha_1, \ldots, t\bar\alpha_m)$ clearly satisfies (33), and hence is feasible for (16), for every $t > 0$. Hence, for a fixed $x' \in \mathcal{X}$, this conclusion implies that

$\left(\sum_{i=1}^m t\bar\alpha_i\right)\phi(y) \;\le\; \sum_{i=1}^m (t\bar\alpha_i)\,\gamma_i(x') + \tfrac{1}{2}\|x' - x_0\|^2 \;\le\; t\left(\sum_{i=1}^m \bar\alpha_i\right)\phi(x') + \tfrac{1}{2}\|x' - x_0\|^2$,

where the last inequality follows from the assumption that $\gamma_i \le \phi$ and the non-negativity of $t\bar\alpha_1, \ldots, t\bar\alpha_m$. Dividing this expression by $t\left(\sum_{i=1}^m \bar\alpha_i\right) > 0$ and letting $t \to \infty$, we conclude that $\phi(y) \le \phi(x')$, and hence that $y \in X^*$. Also, the objective function of (16) at the point $\alpha(t)$ converges to infinity as $t \to \infty$, showing that b) implies a).

Now assume that a) holds. Then, there exists a sequence $\{\alpha^k = (\alpha_1^k, \ldots, \alpha_m^k) : k \ge 1\}$ of feasible solutions of problem (16) along which its objective function converges to infinity. Consider the sequence $\{\bar\alpha^k := \alpha^k/(e^T\alpha^k) : k \ge 1\}$, where $e$ is the vector of all ones. Clearly, $\{\bar\alpha^k\}$ has an accumulation point $\bar\alpha = (\bar\alpha_1, \ldots, \bar\alpha_m)$, where $\bar\alpha \ge 0$ and $\bar\alpha \ne 0$. Moreover, using the fact that $\alpha^k$ satisfies (33), that $e^T\alpha^k \to \infty$, and that $\gamma_i(y) \le \phi(y)$ for every $i$, we easily see that (17) holds.
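For the reader's convenience, the following short calculation (ours, not displayed in the original) spells out how the quadratic form in (33) arises from Lemma 2.2(b) by completing the square. Here $g := \sum_{i=1}^m \alpha_i \nabla\gamma_i$. Lemma 2.2(b) with $\Gamma = \sum_i \alpha_i\gamma_i$ and $\chi = \left(\sum_i \alpha_i\right)\phi(y)$ reads

$\left(\sum_i \alpha_i\right)\phi(y) \;\le\; \sum_i \alpha_i\gamma_i(x_0) - \tfrac{1}{2}\|g\|^2$.

Writing $\gamma_i(x_0) = \gamma_i(y) + \langle \nabla\gamma_i, x_0 - y\rangle$ and rearranging gives

$\sum_i \alpha_i\,[\phi(y) - \gamma_i(y)] + \langle g,\, y - x_0\rangle + \tfrac{1}{2}\|g\|^2 \;\le\; 0$,

and adding $\tfrac{1}{2}\|y - x_0\|^2$ to both sides completes the square:

$\tfrac{1}{2}\|g + y - x_0\|^2 + \sum_i \alpha_i\,[\phi(y) - \gamma_i(y)] \;\le\; \tfrac{1}{2}\|y - x_0\|^2$,

which is exactly (33).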

18 C Tables Table : Time comparison of the methods on CQP instances. Problem Time in seconds (accuracy less that 6 ) Instance n AA-R FISTA GKR-2 GKR-3 QP n5 p QP n5 p QP n5 p QP n5 p QP n p QP n p QP n p *(.2-6).3 QP n p QP n2 p QP n2 p QP n2 p QP n2 p QP n5 p QP n5 p QP n5 p QP n5 p *(.3-6) 6. QP n p QP n p QP n p QP n p *(2. -6) 8.5 QP n2 p *(.6-6).8 QP n2 p QP n2 p QP n2 p QP n5 p QP n5 p QP n5 p QP n5 p QP n7 p QP n7 p QP n7 p QP n7 p QP n9 p *(2.2-6).7 QP n9 p QP n9 p QP n9 p QP nk p *(4. -6) 4.3 QP nk p QP nk p QP nk p QP n2k p *(3. -6) 3.9 QP n2k p QP n2k p QP n2k p QP n5k p *(3.2-6) 2 QP n5k p QP n5k p QP n5k p QP n7k p *(4.4-6) 65.2 QP n7k p QP n7k p *(.2-6) QP n7k p QP n9k p *(5.5-6).6 QP n9k p QP n9k p QP n9k p QP nk p *(5.5-6) 76. QP nk p QP nk p QP nk p

19 Table 2: Number of iterations comparison of the methods on CQP instances. Problem # of iterations (accuracy less that 6 ) Instance n AA-R FISTA GKR-2 GKR-3 QP n5 p QP n5 p QP n5 p QP n5 p QP n p QP n p QP n p *(.2-6) 4 QP n p QP n2 p QP n2 p QP n2 p QP n2 p QP n5 p QP n5 p QP n5 p QP n5 p *(.3-6) 398 QP n p QP n p QP n p QP n p *(2. -6) 86 QP n2 p *(.6-6) 62 QP n2 p QP n2 p QP n2 p QP n5 p QP n5 p QP n5 p QP n5 p QP n7 p QP n7 p QP n7 p QP n7 p QP n9 p *(2.2-6) 62 QP n9 p QP n9 p QP n9 p QP nk p *(4. -6) 76 QP nk p QP nk p QP nk p QP n2k p *(3. -6) 7 QP n2k p QP n2k p QP n2k p QP n5k p *(3.2-6) 79 QP n5k p QP n5k p QP n5k p QP n7k p *(4.4-6) 77 QP n7k p QP n7k p *(.2-6) 388 QP n7k p QP n9k p *(5.5-6) 78 QP n9k p QP n9k p QP n9k p QP nk p *(5.5-6) 8 QP nk p QP nk p QP nk p

20 Table 3: Time comparison of the methods on SDLS instances. Problem Time in seconds (accuracy less that 6 ) Instance n AA-R FISTA GKR-2 GKR-3 BIQ n2 m *(.4-6) 46.8 BIQ n3 m *(.4-6) 47.6 BIQ n4 m *(.4-6) 47.8 BIQ n5 m *(.4-6) 44.4 BIQ n6 m *(.4-6) 42. BIQ n7 m *(.4-6) 52.8 BIQ n8 m *(.4-6) 4.4 BIQ n9 m *(.4-6) 48.7 BIQ n m *(.4-6) 56.7 BIQ n2 m *(.4-6) 44.6 BIQ n5 m *(.4-6) 64.5 BIQ n2 m *(.4-6) 57.9 BIQ n25 m *(.4-6) 44.8 BIQ n5 m *(.4-6) 76 FAP n52 m FAP n6 m FAP n65 m FAP n8 m FAP n84 m FAP n93 m FAP n98 m FAP n2 m FAP n74 m FAP n83 m FAP n252 m FAP n369 m FAP n28 m FAP n4 m QAP n44 m *(. -6) *(2.5-6) 544. QAP n96 m *(3.3-6) 299.8*(2.3-6) QAP n225 m QAP n256 m *(.7-6) *(.2-6) QAP n289 m *(4. -6) QAP n324 m *(.9-6) QAP n4 m *(2.4-6) 557.4*(.3-6) QAP n44 m *(.8-6) QAP n484 m QAP n625 m *(2.8-6) 289 QAP n676 m *(2.7-6) QAP n729 m *(.8-6) QAP n784 m *(2. -6) QAP n9 m QAP n225 m *(.3-6) 59.9 QAP n6 m *(2.3-6) 75.*(. -6) RAND n3 mk p RAND n3 m2k p *(. -6) RAND n3 m25k p *(.5-6) *(. -6) RAND n4 m5k p RAND n4 m3k p RAND n4 m4k p *(.5-6) *(.3-6) RAND n5 m2k p RAND n5 m3k p RAND n5 m4k p RAND n5 m5k p *(. -6) 96.9*(.3-6) RAND n6 m2k p RAND n6 m4k p RAND n6 m5k p RAND n6 m6k p *(.6-6) *(.2-6) RAND n7 m5k p RAND n7 m7k p RAND n7 m9k p *(.3-6) *(. -6) RAND n8 mk p RAND n8 mk p *(.2-6) *(. -6) RAND n8 m7k p RAND n9 mk p RAND n9 m4k p *(.3-6) RAND nk mk p RAND nk m5k p *(.2-6)

21 Table 4: Number of iterations comparison of the methods on SDLS instances. Problem # of iterations (accuracy less that 6 ) Instance n AA-R FISTA GKR-2 GKR-3 BIQ n2 m *(.4-6) 285 BIQ n3 m *(.4-6) 285 BIQ n4 m *(.4-6) 285 BIQ n5 m *(.4-6) 285 BIQ n6 m *(.4-6) 285 BIQ n7 m *(.4-6) 285 BIQ n8 m *(.4-6) 285 BIQ n9 m *(.4-6) 285 BIQ n m *(.4-6) 285 BIQ n2 m *(.4-6) 285 BIQ n5 m *(.4-6) 285 BIQ n2 m *(.4-6) 285 BIQ n25 m *(.4-6) 285 BIQ n5 m *(.4-6) 285 FAP n52 m FAP n6 m FAP n65 m FAP n8 m FAP n84 m FAP n93 m FAP n98 m FAP n2 m FAP n74 m FAP n83 m FAP n252 m FAP n369 m FAP n28 m FAP n4 m QAP n44 m *(. -6) 233 2*(2.5-6) 688 QAP n96 m *(3.3-6) 2*(2.3-6) QAP n225 m QAP n256 m *(.7-6) 2*(.2-6) QAP n289 m *(4. -6) 357 QAP n324 m *(.9-6) 939 QAP n4 m *(2.4-6) 2*(.3-6) QAP n44 m *(.8-6) QAP n484 m QAP n625 m *(2.8-6) 387 QAP n676 m *(2.7-6) 342 QAP n729 m *(.8-6) 646 QAP n784 m *(2. -6) 587 QAP n9 m QAP n225 m *(.3-6) 33 QAP n6 m *(2.3-6) 2*(. -6) RAND n3 mk p RAND n3 m2k p *(. -6) 2 RAND n3 m25k p *(.5-6) 2*(. -6) RAND n4 m5k p RAND n4 m3k p RAND n4 m4k p *(.5-6) 2*(.3-6) RAND n5 m2k p RAND n5 m3k p RAND n5 m4k p RAND n5 m5k p *(. -6) 2*(.3-6) RAND n6 m2k p RAND n6 m4k p RAND n6 m5k p RAND n6 m6k p *(.6-6) 2*(.2-6) RAND n7 m5k p RAND n7 m7k p RAND n7 m9k p *(.3-6) 2*(. -6) RAND n8 mk p RAND n8 mk p *(.2-6) 2*(. -6) RAND n8 m7k p RAND n9 mk p RAND n9 m4k p *(.3-6) 72 RAND nk mk p RAND nk m5k p *(.2-6) 972 2

22 Table 5: Time comparison of the methods on NNLS instances. Problem Time in seconds (accuracy less that 6 ) Instance n AA-R FISTA GKR-2 GKR-3 NNLS n2 mk *(2.3-5) *(3.8-5) 2.9*(3.3-5) NNLS n2 mk *(.2-5) *(.9-6) 6.8*(.7-5) NNLS n2 m2k 2 4.7*(2.2-6) *(3.2-6) 59.*(4.8-6) NNLS n2 m2k NNLS n2 m *(6.7-6) 2 NNLS n2 m4k *(5. -6).3.9*(.3-5) 87.5*(3.2-4) NNLS n2 m4k *(.4-4).8 33.*(. -5) 4.5*(3.3-4) NNLS n2 m *(.7-5).8 7.2*(6.2-6) 2.*(5.6-5) NNLS n2 m6k *(.7-5).6 9.6*(.3-5) 2.8*(5. -6) NNLS n2 m *(7.7-5) 2.*(8. -6) 23.7*(5.4-5) NNLS n2 m8k *(. -6) 5.*(3. -4) *(.6-5) 3.7*(4. -4) NNLS n4 mk *(3. -4) *(. -4) 59.7*(.9-4) NNLS n4 mk *(2. -6) *(7. -6) NNLS n4 m2k *(3.7-4) *(.5-5) 5.6*(3. -4) NNLS n4 m2k *(.4-4).5 4.2*(6.5-6) 4.*(. -4) NNLS n4 m4k 4 6.7*(. -6) *(7.9-5) *(2. -5).7*(.9-4) NNLS n4 m4k *(7.9-6) *(.9-6) 44.3*(5.5-6) NNLS n4 m6k 4 8.7*(. -6).2 6.6*(4.5-5) *(2. -6) 35.2*(4. -5) NNLS n4 m *(.2-6) *(4.6-6) NNLS n4 m8k *(2.6-4) 8.4*(. -6) 43.2*(.2-4) 6.*(2.9-4) NNLS n6 mk *(8.4-5) 7.2*(.9-6) 47.9*(6.8-6) 45.5*(5.6-5) NNLS n6 m2k *(2.2-6) *(.5-6) 22.8*(.8-5) 93.*(4.9-6) NNLS n6 m2k *(6. -6) *(2.6-6) 28.7*(5. -5) NNLS n6 m4k *(4.6-6) 65.*(2.6-6) 88.2*(5.8-5) *(4.5-5) 263.6*(3.3-5) NNLS n6 m4k *(6.6-6).3 4.*(2. -6) 3.*(3.5-5) NNLS n6 m6k *(7.5-5) *(4.6-5) 55.5*(6. -5) NNLS n6 m8k *(3. -5) *(2.8-5) 5.2*(.8-5) NNLS n8 mk *(2.2-5) *(2.4-6).*(.3-5) NNLS n8 m2k *(6.6-5) *(.8-5) 9.9*(2. -4) NNLS n8 m2k *(.2-4).7 29.*(4.2-5) 28.5*(7.8-5) NNLS n8 m4k *(.4-4) 72.*(5.6-6) 36.*(3.4-5) 222.6*(3.9-4) NNLS n8 m4k *(8.3-5) *(5.3-6) 63.*(7. -5) NNLS n8 m6k *(.2-5) *(4. -6) 44.3*(.4-5) NNLS n8 m8k *(3. -6) 22.4*(. -6) *(.3-5) NNLS nk mk 23.3*(.8-6) *(4.7-5) *(4. -5) 97.*(3.9-5) NNLS nk m2k *(7.5-5) 5.*(. -6) 244.5*(6.3-5) 66.8*(7. -5) NNLS nk m2k *(2. -4) *(.8-5) 26.2*(.3-4) NNLS nk m4k 3.6*(3.5-6) 95.3*(. -6) 43.5*(3. -4) 37.5*(.8-6) 33.9*(3. -5) 276.9*(4. -4) NNLS nk m4k *(2.5-5) *(4.7-6) 35.5*(3.5-5) NNLS nk m6k *(. -5) *(4.6-6) 5.9*(5.3-6) NNLS nk m8k *(5.2-5) *(.3-5) 74.2*(.4-4) NNLS n2k mk *(5. -5) *(7.8-6) 33.3*(. -4) NNLS n2k m2k *(.2-6) *(2.3-5) *(2.3-5) 236.4*(2.4-5) NNLS n2k m4k *(5.5-6) *(2.2-5) *(.2-5) 8.8*(.2-5) NNLS n2k m4k *(3. -5) *(2.4-5) 9.5*(2.6-5) NNLS n2k m6k *(8.4-5) *(3.7-5) 3.4*(6.2-5) NNLS n2k m8k *(.3-5) 37.3*(.6-6) 27.8*(8.2-6) 82.6*(. -5) NNLS n4k mk *(.6-6) *(4.9-5) *(4. -5) 35.*(3.6-5) NNLS n4k m2k *(.2-6) 8.8*(8. -5) 85.9*(.3-6) 56.3*(4.7-5) 38.3*(5.8-5) NNLS n4k m4k *(.6-6) 452.2*(.6-6) 435.9*(8.3-5) 533.3*(2.9-6) 558.5*(.3-5) 22.4*(.4-4) NNLS n4k m8k *(8.8-5) *(2. -5) 45.3*(3.8-5) NNLS n6k m2k 6 3.9*(2.5-6) *(7.6-5) 267.9*(.2-6) 95.7*(9.9-6) 64.*(6.5-5) NNLS n6k m4k *(3. -6) *(.2-4) 62.*(.2-6) *(5.5-5) 252.3*(9.2-5) NNLS n8k m2k *(6.6-5) *(3.7-5) 6.5*(7.3-5) NNLS n8k m4k *(2.3-6) 68.5*(. -5) 637.7*(.4-6) 329.2*(6.2-6) *(2. -5) NNLS nk m2k 469.2*(2. -6) *(. -4) 478.3*(3. -6) 623.3*(5.9-5) 4.5*(8.2-5) NNLS nk m4k 78.7*(2.6-6) 72.7*(2.4-6) 39.4*(7. -5) 785.6*(2.8-6) 323.6*(2.6-5) 264.*(.3-4) NNLS n2k m4k *(.2-6) 294.*(.8-6) 792.2*(7.5-5) 244.2*(3.4-6) 667.7*(3.9-5) *(6.9-5) 22

23 Table 6: Number of iterations comparison of the methods on NNLS instances. Problem # of iterations (accuracy less that 6 ) Instance n AA-R FISTA GKR-2 GKR-3 NNLS n2 mk *(2.3-5) 553 2*(3.8-5) 2*(3.3-5) NNLS n2 mk *(.2-5) 689 2*(.9-6) 2*(.7-5) NNLS n2 m2k 2 2*(2.2-6) *(3.2-6) 2*(4.8-6) NNLS n2 m2k NNLS n2 m *(6.7-6) 36 NNLS n2 m4k *(5. -6) 824 2*(.3-5) 2*(3.2-4) NNLS n2 m4k *(.4-4) 47 2*(. -5) 2*(3.3-4) NNLS n2 m *(.7-5) 8 2*(6.2-6) 2*(5.6-5) NNLS n2 m6k *(.7-5) 323 2*(.3-5) 2*(5. -6) NNLS n2 m *(7.7-5) 88 2*(8. -6) 2*(5.4-5) NNLS n2 m8k *(. -6) 2*(3. -4) 32 2*(.6-5) 2*(4. -4) NNLS n4 mk *(3. -4) 943 2*(. -4) 2*(.9-4) NNLS n4 mk *(2. -6) *(7. -6) NNLS n4 m2k *(3.7-4) 866 2*(.5-5) 2*(3. -4) NNLS n4 m2k *(.4-4) 49 2*(6.5-6) 2*(. -4) NNLS n4 m4k 4 2*(. -6) 39 2*(7.9-5) 556 2*(2. -5) 2*(.9-4) NNLS n4 m4k *(7.9-6) 44 2*(.9-6) 2*(5.5-6) NNLS n4 m6k 4 2*(. -6) 387 2*(4.5-5) 673 2*(2. -6) 2*(4. -5) NNLS n4 m *(.2-6) *(4.6-6) NNLS n4 m8k *(2.6-4) 2*(. -6) 2*(.2-4) 2*(2.9-4) NNLS n6 mk *(8.4-5) 2*(.9-6) 2*(6.8-6) 2*(5.6-5) NNLS n6 m2k 6 2*(2.2-6) *(.5-6) 2*(.8-5) 2*(4.9-6) NNLS n6 m2k *(6. -6) 78 2*(2.6-6) 2*(5. -5) NNLS n6 m4k 6 2*(4.6-6) 2*(2.6-6) 2*(5.8-5) 392 2*(4.5-5) 2*(3.3-5) NNLS n6 m4k *(6.6-6) 466 2*(2. -6) 2*(3.5-5) NNLS n6 m6k *(7.5-5) 46 2*(4.6-5) 2*(6. -5) NNLS n6 m8k *(3. -5) 486 2*(2.8-5) 2*(.8-5) NNLS n8 mk *(2.2-5) 395 2*(2.4-6) 2*(.3-5) NNLS n8 m2k *(6.6-5) 626 2*(.8-5) 2*(2. -4) NNLS n8 m2k *(.2-4) 83 2*(4.2-5) 2*(7.8-5) NNLS n8 m4k *(.4-4) 2*(5.6-6) 2*(3.4-5) 2*(3.9-4) NNLS n8 m4k *(8.3-5) 33 2*(5.3-6) 2*(7. -5) NNLS n8 m6k *(.2-5) 499 2*(4. -6) 2*(.4-5) NNLS n8 m8k *(3. -6) 2*(. -6) 76 2*(.3-5) NNLS nk mk 2*(.8-6) 352 2*(4.7-5) 372 2*(4. -5) 2*(3.9-5) NNLS nk m2k *(7.5-5) 2*(. -6) 2*(6.3-5) 2*(7. -5) NNLS nk m2k *(2. -4) 958 2*(.8-5) 2*(.3-4) NNLS nk m4k 2*(3.5-6) 2*(. -6) 2*(3. -4) 2*(.8-6) 2*(3. -5) 2*(4. -4) NNLS nk m4k *(2.5-5) 533 2*(4.7-6) 2*(3.5-5) NNLS nk m6k *(. -5) 356 2*(4.6-6) 2*(5.3-6) NNLS nk m8k *(5.2-5) 87 2*(.3-5) 2*(.4-4) NNLS n2k mk *(5. -5) 889 2*(7.8-6) 2*(. -4) NNLS n2k m2k 2 2*(.2-6) 44 2*(2.3-5) 52 2*(2.3-5) 2*(2.4-5) NNLS n2k m4k 2 2*(5.5-6) 286 2*(2.2-5) 449 2*(.2-5) 2*(.2-5) NNLS n2k m4k *(3. -5) 897 2*(2.4-5) 2*(2.6-5) NNLS n2k m6k *(8.4-5) 93 2*(3.7-5) 2*(6.2-5) NNLS n2k m8k *(.3-5) 2*(.6-6) 2*(8.2-6) 2*(. -5) NNLS n4k mk 4 2*(.6-6) 543 2*(4.9-5) 826 2*(4. -5) 2*(3.6-5) NNLS n4k m2k *(.2-6) 2*(8. -5) 2*(.3-6) 2*(4.7-5) 2*(5.8-5) NNLS n4k m4k 4 2*(.6-6) 2*(.6-6) 2*(8.3-5) 2*(2.9-6) 2*(.3-5) 2*(.4-4) NNLS n4k m8k *(8.8-5) 929 2*(2. -5) 2*(3.8-5) NNLS n6k m2k 6 2*(2.5-6) 49 2*(7.6-5) 2*(.2-6) 2*(9.9-6) 2*(6.5-5) NNLS n6k m4k 6 2*(3. -6) 695 2*(.2-4) 2*(.2-6) 2*(5.5-5) 2*(9.2-5) NNLS n8k m2k *(6.6-5) 953 2*(3.7-5) 2*(7.3-5) NNLS n8k m4k *(2.3-6) 2*(. -5) 2*(.4-6) 2*(6.2-6) 2*(2. -5) NNLS nk m2k 2*(2. -6) 744 2*(. -4) 2*(3. -6) 2*(5.9-5) 2*(8.2-5) NNLS nk m4k 2*(2.6-6) 2*(2.4-6) 2*(7. -5) 2*(2.8-6) 2*(2.6-5) 2*(.3-4) NNLS n2k m4k 2 2*(.2-6) 2*(.8-6) 2*(7.5-5) 2*(3.4-6) 2*(3.9-5) 2*(6.9-5) 23


SIAM Conference on Imaging Science, Bologna, Italy, Adaptive FISTA. Peter Ochs Saarland University SIAM Conference on Imaging Science, Bologna, Italy, 2018 Adaptive FISTA Peter Ochs Saarland University 07.06.2018 joint work with Thomas Pock, TU Graz, Austria c 2018 Peter Ochs Adaptive FISTA 1 / 16 Some

More information

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London www.doc.ic.ac.uk/ pp500 p.parpas@imperial.ac.uk jointly with D.V.

More information

More First-Order Optimization Algorithms

More First-Order Optimization Algorithms More First-Order Optimization Algorithms Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 3, 8, 3 The SDM

More information

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING YANGYANG XU Abstract. Motivated by big data applications, first-order methods have been extremely

More information

A Sparsity Preserving Stochastic Gradient Method for Composite Optimization

A Sparsity Preserving Stochastic Gradient Method for Composite Optimization A Sparsity Preserving Stochastic Gradient Method for Composite Optimization Qihang Lin Xi Chen Javier Peña April 3, 11 Abstract We propose new stochastic gradient algorithms for solving convex composite

More information

On well definedness of the Central Path

On well definedness of the Central Path On well definedness of the Central Path L.M.Graña Drummond B. F. Svaiter IMPA-Instituto de Matemática Pura e Aplicada Estrada Dona Castorina 110, Jardim Botânico, Rio de Janeiro-RJ CEP 22460-320 Brasil

More information

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen

More information

Coordinate Update Algorithm Short Course Proximal Operators and Algorithms

Coordinate Update Algorithm Short Course Proximal Operators and Algorithms Coordinate Update Algorithm Short Course Proximal Operators and Algorithms Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 36 Why proximal? Newton s method: for C 2 -smooth, unconstrained problems allow

More information

Gradient methods for minimizing composite functions

Gradient methods for minimizing composite functions Gradient methods for minimizing composite functions Yu. Nesterov May 00 Abstract In this paper we analyze several new methods for solving optimization problems with the objective function formed as a sum

More information

An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP

An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP Kaifeng Jiang, Defeng Sun, and Kim-Chuan Toh Abstract The accelerated proximal gradient (APG) method, first

More information

Accelerated Proximal Gradient Methods for Convex Optimization

Accelerated Proximal Gradient Methods for Convex Optimization Accelerated Proximal Gradient Methods for Convex Optimization Paul Tseng Mathematics, University of Washington Seattle MOPTA, University of Guelph August 18, 2008 ACCELERATED PROXIMAL GRADIENT METHODS

More information

On convergence rate of the Douglas-Rachford operator splitting method

On convergence rate of the Douglas-Rachford operator splitting method On convergence rate of the Douglas-Rachford operator splitting method Bingsheng He and Xiaoming Yuan 2 Abstract. This note provides a simple proof on a O(/k) convergence rate for the Douglas- Rachford

More information

On the acceleration of the double smoothing technique for unconstrained convex optimization problems

On the acceleration of the double smoothing technique for unconstrained convex optimization problems On the acceleration of the double smoothing technique for unconstrained convex optimization problems Radu Ioan Boţ Christopher Hendrich October 10, 01 Abstract. In this article we investigate the possibilities

More information

Lecture 8 Plus properties, merit functions and gap functions. September 28, 2008

Lecture 8 Plus properties, merit functions and gap functions. September 28, 2008 Lecture 8 Plus properties, merit functions and gap functions September 28, 2008 Outline Plus-properties and F-uniqueness Equation reformulations of VI/CPs Merit functions Gap merit functions FP-I book:

More information

1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method

1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method L. Vandenberghe EE236C (Spring 2016) 1. Gradient method gradient method, first-order methods quadratic bounds on convex functions analysis of gradient method 1-1 Approximate course outline First-order

More information

Convex Optimization Lecture 16

Convex Optimization Lecture 16 Convex Optimization Lecture 16 Today: Projected Gradient Descent Conditional Gradient Descent Stochastic Gradient Descent Random Coordinate Descent Recall: Gradient Descent (Steepest Descent w.r.t Euclidean

More information

Non-negative Matrix Factorization via accelerated Projected Gradient Descent

Non-negative Matrix Factorization via accelerated Projected Gradient Descent Non-negative Matrix Factorization via accelerated Projected Gradient Descent Andersen Ang Mathématique et recherche opérationnelle UMONS, Belgium Email: manshun.ang@umons.ac.be Homepage: angms.science

More information

Spectral gradient projection method for solving nonlinear monotone equations

Spectral gradient projection method for solving nonlinear monotone equations Journal of Computational and Applied Mathematics 196 (2006) 478 484 www.elsevier.com/locate/cam Spectral gradient projection method for solving nonlinear monotone equations Li Zhang, Weijun Zhou Department

More information

Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming

Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming Zhaosong Lu Lin Xiao June 25, 2013 Abstract In this paper we propose a randomized block coordinate non-monotone

More information

Douglas-Rachford splitting for nonconvex feasibility problems

Douglas-Rachford splitting for nonconvex feasibility problems Douglas-Rachford splitting for nonconvex feasibility problems Guoyin Li Ting Kei Pong Jan 3, 015 Abstract We adapt the Douglas-Rachford DR) splitting method to solve nonconvex feasibility problems by studying

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP

An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP Kaifeng Jiang, Defeng Sun, and Kim-Chuan Toh March 3, 2012 Abstract The accelerated proximal gradient (APG)

More information

An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems

An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems Kim-Chuan Toh Sangwoon Yun March 27, 2009; Revised, Nov 11, 2009 Abstract The affine rank minimization

More information

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Roger Behling a, Clovis Gonzaga b and Gabriel Haeser c March 21, 2013 a Department

More information

Merit functions and error bounds for generalized variational inequalities

Merit functions and error bounds for generalized variational inequalities J. Math. Anal. Appl. 287 2003) 405 414 www.elsevier.com/locate/jmaa Merit functions and error bounds for generalized variational inequalities M.V. Solodov 1 Instituto de Matemática Pura e Aplicada, Estrada

More information

One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties

One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties Fedor S. Stonyakin 1 and Alexander A. Titov 1 V. I. Vernadsky Crimean Federal University, Simferopol,

More information

You should be able to...

You should be able to... Lecture Outline Gradient Projection Algorithm Constant Step Length, Varying Step Length, Diminishing Step Length Complexity Issues Gradient Projection With Exploration Projection Solving QPs: active set

More information

Dual and primal-dual methods

Dual and primal-dual methods ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method

More information

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some

More information

Algorithms for Nonsmooth Optimization

Algorithms for Nonsmooth Optimization Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization

More information

FINE TUNING NESTEROV S STEEPEST DESCENT ALGORITHM FOR DIFFERENTIABLE CONVEX PROGRAMMING. 1. Introduction. We study the nonlinear programming problem

FINE TUNING NESTEROV S STEEPEST DESCENT ALGORITHM FOR DIFFERENTIABLE CONVEX PROGRAMMING. 1. Introduction. We study the nonlinear programming problem FINE TUNING NESTEROV S STEEPEST DESCENT ALGORITHM FOR DIFFERENTIABLE CONVEX PROGRAMMING CLÓVIS C. GONZAGA AND ELIZABETH W. KARAS Abstract. We modify the first order algorithm for convex programming described

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of

More information

Alternating Direction Augmented Lagrangian Algorithms for Convex Optimization

Alternating Direction Augmented Lagrangian Algorithms for Convex Optimization Alternating Direction Augmented Lagrangian Algorithms for Convex Optimization Joint work with Bo Huang, Shiqian Ma, Tony Qin, Katya Scheinberg, Zaiwen Wen and Wotao Yin Department of IEOR, Columbia University

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

Dual Proximal Gradient Method

Dual Proximal Gradient Method Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method

More information

A globally convergent Levenberg Marquardt method for equality-constrained optimization

A globally convergent Levenberg Marquardt method for equality-constrained optimization Computational Optimization and Applications manuscript No. (will be inserted by the editor) A globally convergent Levenberg Marquardt method for equality-constrained optimization A. F. Izmailov M. V. Solodov

More information

Primal-Dual First-Order Methods for a Class of Cone Programming

Primal-Dual First-Order Methods for a Class of Cone Programming Primal-Dual First-Order Methods for a Class of Cone Programming Zhaosong Lu March 9, 2011 Abstract In this paper we study primal-dual first-order methods for a class of cone programming problems. In particular,

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions

More information

Complexity bounds for primal-dual methods minimizing the model of objective function

Complexity bounds for primal-dual methods minimizing the model of objective function Complexity bounds for primal-dual methods minimizing the model of objective function Yu. Nesterov July 4, 06 Abstract We provide Frank-Wolfe ( Conditional Gradients method with a convergence analysis allowing

More information

arxiv: v1 [math.fa] 16 Jun 2011

arxiv: v1 [math.fa] 16 Jun 2011 arxiv:1106.3342v1 [math.fa] 16 Jun 2011 Gauge functions for convex cones B. F. Svaiter August 20, 2018 Abstract We analyze a class of sublinear functionals which characterize the interior and the exterior

More information

This can be 2 lectures! still need: Examples: non-convex problems applications for matrix factorization

This can be 2 lectures! still need: Examples: non-convex problems applications for matrix factorization This can be 2 lectures! still need: Examples: non-convex problems applications for matrix factorization x = prox_f(x)+prox_{f^*}(x) use to get prox of norms! PROXIMAL METHODS WHY PROXIMAL METHODS Smooth

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 5. Convex programming and semidefinite programming

E5295/5B5749 Convex optimization with engineering applications. Lecture 5. Convex programming and semidefinite programming E5295/5B5749 Convex optimization with engineering applications Lecture 5 Convex programming and semidefinite programming A. Forsgren, KTH 1 Lecture 5 Convex optimization 2006/2007 Convex quadratic program

More information

Complexity of gradient descent for multiobjective optimization

Complexity of gradient descent for multiobjective optimization Complexity of gradient descent for multiobjective optimization J. Fliege A. I. F. Vaz L. N. Vicente July 18, 2018 Abstract A number of first-order methods have been proposed for smooth multiobjective optimization

More information

Proximal-like contraction methods for monotone variational inequalities in a unified framework

Proximal-like contraction methods for monotone variational inequalities in a unified framework Proximal-like contraction methods for monotone variational inequalities in a unified framework Bingsheng He 1 Li-Zhi Liao 2 Xiang Wang Department of Mathematics, Nanjing University, Nanjing, 210093, China

More information

Gradient Descent. Ryan Tibshirani Convex Optimization /36-725

Gradient Descent. Ryan Tibshirani Convex Optimization /36-725 Gradient Descent Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: canonical convex programs Linear program (LP): takes the form min x subject to c T x Gx h Ax = b Quadratic program (QP): like

More information

An interior-point gradient method for large-scale totally nonnegative least squares problems

An interior-point gradient method for large-scale totally nonnegative least squares problems An interior-point gradient method for large-scale totally nonnegative least squares problems Michael Merritt and Yin Zhang Technical Report TR04-08 Department of Computational and Applied Mathematics Rice

More information

Optimization for Machine Learning

Optimization for Machine Learning Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html

More information

A Greedy Framework for First-Order Optimization

A Greedy Framework for First-Order Optimization A Greedy Framework for First-Order Optimization Jacob Steinhardt Department of Computer Science Stanford University Stanford, CA 94305 jsteinhardt@cs.stanford.edu Jonathan Huggins Department of EECS Massachusetts

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)

More information

On Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence:

On Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence: A Omitted Proofs from Section 3 Proof of Lemma 3 Let m x) = a i On Acceleration with Noise-Corrupted Gradients fxi ), u x i D ψ u, x 0 ) denote the function under the minimum in the lower bound By Proposition

More information

On the iterate convergence of descent methods for convex optimization

On the iterate convergence of descent methods for convex optimization On the iterate convergence of descent methods for convex optimization Clovis C. Gonzaga March 1, 2014 Abstract We study the iterate convergence of strong descent algorithms applied to convex functions.

More information

Oracle Complexity of Second-Order Methods for Smooth Convex Optimization

Oracle Complexity of Second-Order Methods for Smooth Convex Optimization racle Complexity of Second-rder Methods for Smooth Convex ptimization Yossi Arjevani had Shamir Ron Shiff Weizmann Institute of Science Rehovot 7610001 Israel Abstract yossi.arjevani@weizmann.ac.il ohad.shamir@weizmann.ac.il

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

A solution approach for linear optimization with completely positive matrices

A solution approach for linear optimization with completely positive matrices A solution approach for linear optimization with completely positive matrices Franz Rendl http://www.math.uni-klu.ac.at Alpen-Adria-Universität Klagenfurt Austria joint work with M. Bomze (Wien) and F.

More information