A New Look at the Performance Analysis of First-Order Methods

1 A New Look at the Performance Analysis of First-Order Methods
Marc Teboulle, School of Mathematical Sciences, Tel Aviv University
Joint work with Yoel Drori, Google's R&D Center, Tel Aviv
Optimization without Borders, In Honor of Yuri Nesterov's 60th Birthday, February 7-12, 2016, Les Houches, France

2 The Problem
Problem: How to evaluate the performance (complexity) of an optimization method for a given class of problems?

3 The Problem
Problem: How to evaluate the performance (complexity) of an optimization method for a given class of problems?
We focus on first-order methods for smooth convex minimization.
Assumption.
(M) min{f(x) : x ∈ R^d}, f convex and f ∈ C^{1,1}_L(R^d).
(M) is solvable, i.e., the optimal set X*(f) := argmin f is nonempty.
Given any starting point x_0, there exists R > 0 such that ‖x_0 − x*‖ ≤ R, x* ∈ X*(f).

4 The Problem
Problem: How to evaluate the performance (complexity) of an optimization method for a given class of problems?
We focus on first-order methods for smooth convex minimization.
Assumption.
(M) min{f(x) : x ∈ R^d}, f convex and f ∈ C^{1,1}_L(R^d).
(M) is solvable, i.e., the optimal set X*(f) := argmin f is nonempty.
Given any starting point x_0, there exists R > 0 such that ‖x_0 − x*‖ ≤ R, x* ∈ X*(f).
We completely depart from conventional approaches...

5 Black-Box First-Order Methods
A black-box optimization method is an algorithm A which has knowledge of:
The underlying space R^d
The family of functions F to be minimized
Oracle Model of Optimization, Nemirovsky-Yudin (1983)

6 Black-Box First-Order Methods
A black-box optimization method is an algorithm A which has knowledge of:
The underlying space R^d
The family of functions F to be minimized
The function itself is not known. To gain information on the objective function f to be minimized, the algorithm A queries a subroutine which, given an input point in R^d, returns the value of f and its gradient ∇f at that point.
First-Order Method: The Algorithm A
The algorithm starts with an initial point x_0 ∈ R^d and generates a finite sequence of points {x_i : i = 1, ..., N}, where at each step the algorithm depends only on the previous steps, their function values and gradients, via some rule:
x_{i+1} = A(x_0, ..., x_i; f(x_0), ..., f(x_i); ∇f(x_0), ..., ∇f(x_i)),  i = 0, 1, ..., N−1.
Note that the algorithm has another implicit knowledge: ‖x_0 − x*‖ ≤ R.
Oracle Model of Optimization, Nemirovsky-Yudin (1983)
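To make the oracle model concrete, here is a minimal Python sketch (illustrative only, not from the talk): the method never sees f itself, only the pairs (f(x_i), ∇f(x_i)) returned by a first-order oracle; the quadratic test function and all names are assumptions made for the example.

```python
import numpy as np

# A first-order oracle for the smooth convex quadratic f(x) = 0.5*x'Ax - b'x
# (illustrative test function; the method below only sees what the oracle returns).
def make_oracle(A, b):
    def oracle(x):
        return 0.5 * x @ A @ x - b @ x, A @ x - b   # (f(x), grad f(x))
    return oracle

# A generic black-box first-order method: x_{i+1} = rule(points, values, gradients).
def run_first_order_method(oracle, x0, N, rule):
    xs, fs, gs = [np.array(x0, dtype=float)], [], []
    for i in range(N):
        fi, gi = oracle(xs[i])
        fs.append(fi)
        gs.append(gi)
        xs.append(rule(xs, fs, gs))
    return xs

# Plain gradient descent as one instance of such a rule (step 1/L).
A = np.array([[4.0, 0.0], [0.0, 1.0]])   # L = largest eigenvalue = 4
b = np.zeros(2)
L = 4.0
rule_gd = lambda xs, fs, gs: xs[-1] - (1.0 / L) * gs[-1]
iterates = run_first_order_method(make_oracle(A, b), [1.0, 1.0], N=10, rule=rule_gd)
```

The gradient method, the heavy ball method and the fast gradient method discussed below all fit this template through different choices of the rule.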

7 Performance/Complexity of an Algorithm
We measure the worst-case performance (or complexity) of an algorithm A by looking at the absolute inaccuracy
δ(f, x_N) = f(x_N) − f(x*),
where x_N is the output of the algorithm after making N calls to the oracle. The worst case is taken over all possible functions f ∈ F with starting points x_0 satisfying ‖x_0 − x*‖ ≤ R, where x* ∈ X*(f).
Problem: We look at finding the maximal absolute inaccuracy over all possible inputs to the algorithm. This leads to the following...

8 Main Observation
The worst-case performance of an optimization method is by itself an optimization problem!

9 Main Observation
The worst-case performance of an optimization method is by itself an optimization problem!
The Performance Estimation Problem (PEP)
To measure the worst-case performance of an algorithm A we need to solve the following Performance Estimation Problem (PEP):
(P) max f(x_N) − f(x*)
s.t. f ∈ F,
x_{i+1} = A(x_0, ..., x_i; f(x_0), ..., f(x_i); ∇f(x_0), ..., ∇f(x_i)), i = 0, ..., N−1,
x* ∈ X*(f), ‖x* − x_0‖ ≤ R,
x_0, ..., x_N, x* ∈ R^d.
PEP is an abstract optimization problem in infinite dimension: f ∈ F. Clearly intractable!?!..

10 A Methodology to Tackle PEP: Basic Informal Approach
A. Relax the functional constraint (f ∈ F) by new variables and constraints in R^d to build a finite-dimensional problem. This is done by:
1. Exploiting adequate properties of the class F at the points x_0, ..., x_N, x* ∈ R^d.
2. Using the rule(s) describing the given algorithm A.

11 A Methodology to Tackle PEP: Basic Informal Approach
A. Relax the functional constraint (f ∈ F) by new variables and constraints in R^d to build a finite-dimensional problem. This is done by:
1. Exploiting adequate properties of the class F at the points x_0, ..., x_N, x* ∈ R^d.
2. Using the rule(s) describing the given algorithm A.
The resulting relaxed finite-dimensional problem remains a valid upper bound on f(x_N) − f(x*). Yet, this problem remains nontrivial to tackle. So what else can be done...?

12 A Methodology to Tackle PEP: Basic Informal Approach
A. Relax the functional constraint (f ∈ F) by new variables and constraints in R^d to build a finite-dimensional problem. This is done by:
1. Exploiting adequate properties of the class F at the points x_0, ..., x_N, x* ∈ R^d.
2. Using the rule(s) describing the given algorithm A.
The resulting relaxed finite-dimensional problem remains a valid upper bound on f(x_N) − f(x*). Yet, this problem remains nontrivial to tackle. So what else can be done...?
B. More relaxations..!! For a given class of algorithms A:
1. Exploit the structure of PEP to simplify it.
2. Develop a novel relaxation technique and duality to find an upper bound to this problem.
Despite "massive" relaxations: We derive new and better complexity bounds than currently known.
In principle... this approach is universal. It can be applied to any optimization algorithm...!

13 Relaxing the functional constraint f ∈ F
We focus on first-order methods (FOM) for smooth convex problems, that is: convex f ∈ F = C^{1,1}_L.
We start with the following well-known fact for convex f in C^{1,1}_L.

14 Relaxing the functional constraint f ∈ F
We focus on first-order methods (FOM) for smooth convex problems, that is: convex f ∈ F = C^{1,1}_L.
We start with the following well-known fact for convex f in C^{1,1}_L.
Proposition
Suppose f : R^d → R is convex and has a Lipschitz continuous gradient with constant L. Then for every x, y ∈ R^d:
(1/(2L))‖∇f(x) − ∇f(y)‖² ≤ f(x) − f(y) − ⟨∇f(y), x − y⟩.  (1)

15 Relaxing the functional constraint f ∈ F
We focus on first-order methods (FOM) for smooth convex problems, that is: convex f ∈ F = C^{1,1}_L.
We start with the following well-known fact for convex f in C^{1,1}_L.
Proposition
Suppose f : R^d → R is convex and has a Lipschitz continuous gradient with constant L. Then for every x, y ∈ R^d:
(1/(2L))‖∇f(x) − ∇f(y)‖² ≤ f(x) − f(y) − ⟨∇f(y), x − y⟩.  (1)
The relaxation scheme - a sort of "discretization"
Apply (1) at the points x_0, ..., x_N and x*.
Use the resulting inequalities as "constraints" instead of the functional constraint f ∈ F.

16 Relaxing the functional constraint f ∈ F
We focus on first-order methods (FOM) for smooth convex problems, that is: convex f ∈ F = C^{1,1}_L.
We start with the following well-known fact for convex f in C^{1,1}_L.
Proposition
Suppose f : R^d → R is convex and has a Lipschitz continuous gradient with constant L. Then for every x, y ∈ R^d:
(1/(2L))‖∇f(x) − ∇f(y)‖² ≤ f(x) − f(y) − ⟨∇f(y), x − y⟩.  (1)
The relaxation scheme - a sort of "discretization"
Apply (1) at the points x_0, ..., x_N and x*.
Use the resulting inequalities as "constraints" instead of the functional constraint f ∈ F.
Define
δ_i := (f(x_i) − f(x*)) / (L‖x* − x_0‖²),  g_i := ∇f(x_i) / (L‖x* − x_0‖),  i = 0, ..., N, ∗.
In terms of δ_i, g_i, condition (1) becomes
(1/2)‖g_i − g_j‖² ≤ δ_i − δ_j − ⟨g_j, (x_i − x_j)/‖x* − x_0‖⟩,  i, j = 0, ..., N, ∗.  (2)
We now treat x*, {x_i, δ_i, g_i}_{i=0}^N as the optimization variables, instead of f ∈ C^{1,1}_L.
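As a quick sanity check (not part of the talk), inequality (1) and its normalized form (2) can be verified numerically on a convex quadratic, for which the minimizer is x* = 0; all data below are illustrative assumptions.

```python
import numpy as np

# Check inequality (1) and condition (2) on f(x) = 0.5*x'Qx (grad f(x) = Qx,
# L = largest eigenvalue of Q, minimizer x* = 0).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
Q = M @ M.T
L = np.linalg.eigvalsh(Q).max()
f = lambda z: 0.5 * z @ Q @ z
grad = lambda z: Q @ z

x, y = rng.standard_normal(5), rng.standard_normal(5)
lhs1 = np.linalg.norm(grad(x) - grad(y))**2 / (2 * L)
rhs1 = f(x) - f(y) - grad(y) @ (x - y)
assert lhs1 <= rhs1 + 1e-12                 # inequality (1)

x0, xstar = rng.standard_normal(5), np.zeros(5)
R = np.linalg.norm(xstar - x0)
delta = lambda z: (f(z) - f(xstar)) / (L * R**2)   # normalized function values
g = lambda z: grad(z) / (L * R)                    # normalized gradients
lhs2 = 0.5 * np.linalg.norm(g(x) - g(y))**2
rhs2 = delta(x) - delta(y) - g(y) @ ((x - y) / R)
assert lhs2 <= rhs2 + 1e-12                 # condition (2)
```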

17 A (Relaxed) Finite-Dimensional PEP
Replacing the constraint on f by the constraints (2), we reach a relaxed finite-dimensional PEP in the variables x*, {x_i, δ_i, g_i}_{i=0}^N:
(P) max_{x*, x_i, g_i ∈ R^d, δ_i ∈ R} L‖x* − x_0‖² δ_N
s.t. (1/2)‖g_i − g_j‖² ≤ δ_i − δ_j − ⟨g_j, (x_i − x_j)/‖x* − x_0‖⟩, i, j = 0, ..., N, ∗,
x_{i+1} = A(x_0, ..., x_i; δ_0, ..., δ_i; g_0, ..., g_i), i = 0, ..., N−1,
‖x* − x_0‖ ≤ R.
Since (P) is a relaxation of the original maximization problem, its solution still provides a valid upper bound on the complexity of the given method A: f(x_N) − f(x*) ≤ val(P).
We will now show our main results for:
1. The gradient method.
2. A broad class of first-order methods.

18 PEP for the Gradient Method
Algorithm (GM)
0. Input: N, h, f ∈ C^{1,1}_L(R^d) convex, x_0 ∈ R^d.
1. For i = 0, ..., N−1, compute x_{i+1} = x_i − (h/L)∇f(x_i), (h > 0).

19 PEP for the Gradient Method
Algorithm (GM)
0. Input: N, h, f ∈ C^{1,1}_L(R^d) convex, x_0 ∈ R^d.
1. For i = 0, ..., N−1, compute x_{i+1} = x_i − (h/L)∇f(x_i), (h > 0).
After some transformations, PEP for (GM) becomes a nonconvex quadratic problem:
(P) max_{g_i ∈ R^d, δ_i ∈ R} LR²δ_N
s.t. (1/2)‖g_i − g_j‖² ≤ δ_i − δ_j − ⟨g_j, Σ_{t=i+1}^{j} h g_{t−1}⟩, i < j = 0, ..., N,
(1/2)‖g_i − g_j‖² ≤ δ_i − δ_j + ⟨g_j, Σ_{t=j+1}^{i} h g_{t−1}⟩, j < i = 0, ..., N,
(1/2)‖g_i‖² ≤ δ_i, i = 0, ..., N,
(1/2)‖g_i‖² ≤ −δ_i − ⟨g_i, ν + Σ_{t=1}^{i} h g_{t−1}⟩, i = 0, ..., N.
Notation: ν ∈ R^d is any unit vector; "i < j = 0, ..., N" is shorthand for i = 0, ..., N−1, j = i+1, ..., N.
As is, PEP remains impossible to tackle..!?.. We turn to the second phase: reformulation and more relaxations!

20 Analyzing PEP for the Gradient Method
The main steps (see the paper for details):
We further drop constraints... This still yields a valid upper bound!
Reformulate it as a Quadratic Matrix (QM) optimization problem:
(G') max_{G ∈ R^{(N+1)×d}, δ ∈ R^{N+1}} LR²δ_N
s.t. Tr(GᵀA_{i−1,i}G) ≤ δ_{i−1} − δ_i, i = 1, ..., N,
Tr(GᵀD_iG + νe_{i+1}ᵀG) ≤ −δ_i, i = 0, ..., N.
The matrices A_{i−1,i}, D_i ∈ S^{N+1} are explicitly given in terms of h.
({e_{i+1}}_{i=0}^N are the canonical unit vectors in R^{N+1} and ν ∈ R^d is a unit vector.)

21 Analyzing PEP for the Gradient Method
The main steps (see the paper for details):
We further drop constraints... This still yields a valid upper bound!
Reformulate it as a Quadratic Matrix (QM) optimization problem:
(G') max_{G ∈ R^{(N+1)×d}, δ ∈ R^{N+1}} LR²δ_N
s.t. Tr(GᵀA_{i−1,i}G) ≤ δ_{i−1} − δ_i, i = 1, ..., N,
Tr(GᵀD_iG + νe_{i+1}ᵀG) ≤ −δ_i, i = 0, ..., N.
The matrices A_{i−1,i}, D_i ∈ S^{N+1} are explicitly given in terms of h.
({e_{i+1}}_{i=0}^N are the canonical unit vectors in R^{N+1} and ν ∈ R^d is a unit vector.)
To find an upper bound on problem (G') we use duality. We further exploit the special structure of this (QM), and a dimension reduction result, to derive a tractable SDP dual.

22 A Dual Problem for the Quadratic Matrix Problem (G')
(DG') min_{λ ∈ R^N, t ∈ R} { (1/2)LR²t : λ ∈ Λ, S(λ, t) ⪰ 0 },
where
Λ := {λ ∈ R^N : λ_{i+1} − λ_i ≥ 0, i = 1, ..., N−1; 1 − λ_N ≥ 0; λ_i ≥ 0, i = 1, ..., N},
S(λ, t) := [ (1−h)S_0(λ) + hS_1(λ),  q ;  qᵀ,  t ] ∈ S^{N+2},
q := (λ_1, λ_2 − λ_1, ..., λ_N − λ_{N−1}, 1 − λ_N)ᵀ,
and S_0, S_1 ∈ S^{N+1} are defined as follows:
S_0(λ) is the tridiagonal matrix with diagonal (2λ_1, 2λ_2, ..., 2λ_N, λ_N) and sub/super-diagonal entries (−λ_1, −λ_2, ..., −λ_N),  (3)
and S_1(λ) is the full matrix whose entries are given explicitly in terms of λ_1, ..., λ_N and the differences λ_{i+1} − λ_i (see the paper for the exact definition).  (4)

23 A Feasible Analytical Solution of this SDP can be found! A crucial and hard part of the proof!

24 A Feasible Analytical Solution of this SDP can be found! A crucial and hard part of the proof!
Lemma
Let t = 1/(2Nh + 1) and λ_i = i/(2N + 1), i = 1, ..., N. Then:
1. the matrices S_0(λ), S_1(λ) ∈ S^{N+1} defined in (3)-(4) are positive definite for every N ∈ N;
2. the pair (λ, t) is feasible for (DG').
Equipped with this result, invoking standard duality leads to the desired complexity result for GM.

25 Complexity Bound for the Gradient Method
Theorem
Let f ∈ C^{1,1}_L(R^d) and let x_0, ..., x_N ∈ R^d be generated by (GM) with 0 < h ≤ 1. Then
f(x_N) − f(x*) ≤ LR²/(4Nh + 2).  (5)
(For comparison, the classical bound on the gradient method is f(x_N) − f(x*) ≤ LR²/(2N).)

26 Complexity Bound for the Gradient Method
Theorem
Let f ∈ C^{1,1}_L(R^d) and let x_0, ..., x_N ∈ R^d be generated by (GM) with 0 < h ≤ 1. Then
f(x_N) − f(x*) ≤ LR²/(4Nh + 2).  (5)
(For comparison, the classical bound on the gradient method is f(x_N) − f(x*) ≤ LR²/(2N).)
We further prove that this bound is tight!
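Before turning to tightness, note that the bound (5) is easy to probe numerically; a minimal sketch (random convex quadratic with minimizer x* = 0; all data are illustrative assumptions):

```python
import numpy as np

# Empirical check of f(x_N) - f(x*) <= L R^2 / (4 N h + 2) for the gradient method.
rng = np.random.default_rng(3)
M = rng.standard_normal((20, 20))
Q = M @ M.T                         # f(x) = 0.5*x'Qx, so f* = 0 at x* = 0
L = np.linalg.eigvalsh(Q).max()
f = lambda z: 0.5 * z @ Q @ z

N, h = 30, 1.0
x = rng.standard_normal(20)
R = np.linalg.norm(x)               # distance from x_0 to the minimizer
for _ in range(N):
    x = x - (h / L) * (Q @ x)       # GM step
print(f(x), L * R**2 / (4 * N * h + 2))   # the left value never exceeds the right one
```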

27 The Bound is Tight
Theorem
Let L > 0, N ∈ N and d ∈ N. Then for every h > 0 there exists a convex function ϕ ∈ C^{1,1}_L(R^d) and a point x_0 ∈ R^d such that, after N iterations, Algorithm GM reaches an approximate solution x_N with the following absolute inaccuracy:
ϕ(x_N) − ϕ* = LR²/(4Nh + 2).

28 The Bound is Tight
Theorem
Let L > 0, N ∈ N and d ∈ N. Then for every h > 0 there exists a convex function ϕ ∈ C^{1,1}_L(R^d) and a point x_0 ∈ R^d such that, after N iterations, Algorithm GM reaches an approximate solution x_N with the following absolute inaccuracy:
ϕ(x_N) − ϕ* = LR²/(4Nh + 2).
Interestingly... this ϕ is nothing else but the Moreau envelope of ‖x‖/(2Nh + 1)!
ϕ(x) = ‖x‖/(2N+1) − 1/(2(2N+1)²) if ‖x‖ ≥ 1/(2N+1), and ϕ(x) = (1/2)‖x‖² if ‖x‖ < 1/(2N+1) (displayed for h = 1 and L = 1), with x_0 = e_1.
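This worst case can be reproduced in a few lines; the sketch below takes L = R = 1 and uses the general-h form of ϕ (kink at 1/(2Nh+1)), which is an assumption consistent with the theorem above:

```python
import numpy as np

# phi is the Moreau envelope of |x|/(2Nh+1): a Huber-type function with
# slope 1/(2Nh+1) outside the interval [-1/(2Nh+1), 1/(2Nh+1)].
N, h = 10, 1.0
c = 1.0 / (2 * N * h + 1)

phi = lambda x: c * abs(x) - 0.5 * c**2 if abs(x) >= c else 0.5 * x**2
dphi = lambda x: c * np.sign(x) if abs(x) >= c else x

x = 1.0                                 # x_0 = 1, so R = |x_0 - x*| = 1, x* = 0
for _ in range(N):
    x = x - h * dphi(x)                 # GM with L = 1
print(phi(x), 1.0 / (4 * N * h + 2))    # the two values coincide (up to rounding)
```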

29 A Conjecture for GM with 0 < h < 2
We conclude this part by raising a conjecture on the worst-case performance of the gradient method with a constant step size 0 < h < 2.

30 A Conjecture for GM with 0 < h < 2
We conclude this part by raising a conjecture on the worst-case performance of the gradient method with a constant step size 0 < h < 2.
Conjecture
Suppose the sequence x_0, ..., x_N is generated by Algorithm GM with 0 < h < 2. Then
f(x_N) − f(x*) ≤ (LR²/2) max{ 1/(2Nh + 1), (1 − h)^{2N} }.
Note: when 0 < h ≤ 1 the bound above coincides with our previous bound.
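For 1 < h < 2, the second term of the conjectured maximum is already attained by the simplest quadratic; a small illustrative sketch (the values of L, R, N, h are chosen arbitrarily):

```python
# On f(x) = (L/2) x^2 each GM step multiplies x by (1 - h), so after N steps
# f(x_N) - f(x*) = (L R^2 / 2) (1 - h)^(2N), the second term in the conjecture.
L, R, N, h = 1.0, 1.0, 8, 1.5
x = R
for _ in range(N):
    x = x - (h / L) * (L * x)
print(0.5 * L * x**2, 0.5 * L * R**2 * (1 - h)**(2 * N))   # identical values
```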

31 A Wide Class of First-Order Algorithms
Consider the following class of first-order algorithms:
Algorithm (FO)
0. Input: f ∈ C^{1,1}_L(R^d), x_0 ∈ R^d.
1. For i = 0, ..., N−1, compute x_{i+1} = x_i − (1/L) Σ_{k=0}^{i} h_k^{(i+1)} ∇f(x_k).
1. We now show that the class (FO) covers some fundamental schemes beyond the gradient method.
2. For this class we establish a complexity bound that can be efficiently computed via SDP solvers.
3. Furthermore, we derive an "optimized" algorithm of this form by finding optimal step sizes h_k^{(i)}.
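A minimal Python sketch of one method from the class (FO); the table H and all names are illustrative, with H[i][k] playing the role of h_k^{(i)}:

```python
import numpy as np

# Class (FO): the i-th update combines all past gradients with weights h_k^{(i+1)}.
def fo_method(grad_f, x0, L, H, N):
    """H[i][k] holds h_k^{(i)} for 1 <= i <= N and 0 <= k <= i-1."""
    xs, gs = [np.array(x0, dtype=float)], []
    for i in range(N):
        gs.append(grad_f(xs[i]))
        step = sum(H[i + 1][k] * gs[k] for k in range(i + 1))
        xs.append(xs[i] - step / L)
    return xs

# The gradient method with h = 1 is the special case h_k^{(i)} = 1 if k = i-1, else 0.
N = 5
H = {i: {k: (1.0 if k == i - 1 else 0.0) for k in range(i)} for i in range(1, N + 1)}
xs = fo_method(lambda x: 2.0 * x, np.ones(3), L=2.0, H=H, N=N)
```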

32 Example 1: the Heavy Ball Method
Example (The heavy ball method, HBM, Polyak (1964))
0. Input: f ∈ C^{1,1}_L(R^d), x_0 ∈ R^d.
1. x_1 ← x_0 − (α/L)∇f(x_0), (α > 0).
2. For i = 1, ..., N−1 compute: x_{i+1} = x_i − (α/L)∇f(x_i) + β(x_i − x_{i−1}), (β > 0).

33 Example 1: the Heavy Ball Method
Example (The heavy ball method, HBM, Polyak (1964))
0. Input: f ∈ C^{1,1}_L(R^d), x_0 ∈ R^d.
1. x_1 ← x_0 − (α/L)∇f(x_0), (α > 0).
2. For i = 1, ..., N−1 compute: x_{i+1} = x_i − (α/L)∇f(x_i) + β(x_i − x_{i−1}), (β > 0).
By recursively eliminating the term x_i − x_{i−1} in the last step, we can rewrite step 2 as follows:
x_{i+1} = x_i − (α/L) Σ_{k=0}^{i} β^{i−k} ∇f(x_k),
hence this method clearly fits in the class (FO).
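This rewriting is easy to confirm numerically; a small sketch (diagonal quadratic and values of α, β chosen for illustration) checks that the two recursions produce identical iterates:

```python
import numpy as np

# Heavy-ball recursion vs. its (FO) form x_{i+1} = x_i - (alpha/L)*sum_k beta^(i-k)*grad f(x_k).
Q = np.diag([1.0, 2.0, 4.0])
L = 4.0
grad = lambda x: Q @ x
alpha, beta, N = 1.0, 0.5, 6
x0 = np.array([1.0, -2.0, 3.0])

xs = [x0, x0 - (alpha / L) * grad(x0)]            # heavy-ball form
for i in range(1, N):
    xs.append(xs[i] - (alpha / L) * grad(xs[i]) + beta * (xs[i] - xs[i - 1]))

ys = [x0]                                         # (FO) form with h_k^{(i+1)} = alpha*beta^(i-k)
for i in range(N):
    ys.append(ys[i] - (alpha / L) * sum(beta**(i - k) * grad(ys[k]) for k in range(i + 1)))

assert all(np.allclose(a, b) for a, b in zip(xs, ys))
```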

34 Example 2: Nesterov's Fast Gradient Method
Example (Nesterov's fast gradient method, FGM (1983))
0. Input: f ∈ C^{1,1}_L(R^d), x_0 ∈ R^d, y_1 ← x_0, t_1 ← 1.
For i = 1, ..., N compute:
1. x_i ← y_i − (1/L)∇f(y_i),
2. t_{i+1} ← (1 + √(1 + 4t_i²))/2,
3. y_{i+1} ← x_i + ((t_i − 1)/t_{i+1})(x_i − x_{i−1}).
This algorithm is as simple as the gradient method, yet achieves an optimal convergence rate of O(1/N²):
f(x_N) − f(x*) ≤ 2L‖x_0 − x*‖²/(N + 1)²,  ∀ x* ∈ X*(f)
(the lower bound for this class is 3L‖x_0 − x*‖²/(32(N + 1)²)).

35 Example 2: Nesterov's Fast Gradient Method
Example (Nesterov's fast gradient method, FGM (1983))
0. Input: f ∈ C^{1,1}_L(R^d), x_0 ∈ R^d, y_1 ← x_0, t_1 ← 1.
For i = 1, ..., N compute:
1. x_i ← y_i − (1/L)∇f(y_i),
2. t_{i+1} ← (1 + √(1 + 4t_i²))/2,
3. y_{i+1} ← x_i + ((t_i − 1)/t_{i+1})(x_i − x_{i−1}).
This algorithm is as simple as the gradient method, yet achieves an optimal convergence rate of O(1/N²):
f(x_N) − f(x*) ≤ 2L‖x_0 − x*‖²/(N + 1)²,  ∀ x* ∈ X*(f)
(the lower bound for this class is 3L‖x_0 − x*‖²/(32(N + 1)²)).
This algorithm includes 2 sequences of points (x_i, y_i). At first glance, it does not appear to belong to the class (FO)...
...It can be shown that FGM fits in the class (FO) (see the paper).
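For completeness, a minimal Python sketch of FGM exactly as written in steps 1-3 above (the quadratic instance is an illustrative assumption):

```python
import numpy as np

# Nesterov's fast gradient method (constant step 1/L), following the slide's steps 1-3.
def fgm(grad_f, x0, L, N):
    x_prev = np.array(x0, dtype=float)   # x_{i-1}, initialized with x_0
    y, t = x_prev.copy(), 1.0            # y_1 = x_0, t_1 = 1
    for _ in range(N):
        x = y - grad_f(y) / L                              # step 1
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t**2)) / 2.0   # step 2
        y = x + ((t - 1.0) / t_next) * (x - x_prev)        # step 3
        x_prev, t = x, t_next
    return x_prev                        # x_N

Q = np.diag([1.0, 10.0, 100.0])
xN = fgm(lambda x: Q @ x, np.ones(3), L=100.0, N=50)
```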

36 PEP for the Wide Class of First-Order Algorithms (FO)
Applying our approach to FO (as done for GM), we derive the following PEP:
max_{g_i ∈ R^d, δ_i ∈ R} LR²δ_N
s.t. (1/2)‖g_i − g_j‖² ≤ δ_i − δ_j − ⟨g_j, Σ_{t=i+1}^{j} Σ_{k=0}^{t−1} h_k^{(t)} g_k⟩, i < j = 0, ..., N,
(1/2)‖g_i − g_j‖² ≤ δ_i − δ_j + ⟨g_j, Σ_{t=j+1}^{i} Σ_{k=0}^{t−1} h_k^{(t)} g_k⟩, j < i = 0, ..., N,
(1/2)‖g_i‖² ≤ δ_i, i = 0, ..., N,
(1/2)‖g_i‖² ≤ −δ_i − ⟨g_i, ν + Σ_{t=1}^{i} Σ_{k=0}^{t−1} h_k^{(t)} g_k⟩, i = 0, ..., N.
In this general case an analytical solution appears unlikely...

37 PEP for the Wide Class of First-Order Algorithms (FO)
Applying our approach to FO (as done for GM), we derive the following PEP:
max_{g_i ∈ R^d, δ_i ∈ R} LR²δ_N
s.t. (1/2)‖g_i − g_j‖² ≤ δ_i − δ_j − ⟨g_j, Σ_{t=i+1}^{j} Σ_{k=0}^{t−1} h_k^{(t)} g_k⟩, i < j = 0, ..., N,
(1/2)‖g_i − g_j‖² ≤ δ_i − δ_j + ⟨g_j, Σ_{t=j+1}^{i} Σ_{k=0}^{t−1} h_k^{(t)} g_k⟩, j < i = 0, ..., N,
(1/2)‖g_i‖² ≤ δ_i, i = 0, ..., N,
(1/2)‖g_i‖² ≤ −δ_i − ⟨g_i, ν + Σ_{t=1}^{i} Σ_{k=0}^{t−1} h_k^{(t)} g_k⟩, i = 0, ..., N.
In this general case an analytical solution appears unlikely...
Nevertheless, using techniques similar to the ones used for GM, we establish a dual bound that can be efficiently computed via any SDP solver. More precisely, we obtain the following result.

38 A Bound on Algorithm FO via Convex SDP
Theorem
Fix any N, d ∈ N. Let f ∈ C^{1,1}_L(R^d) be convex and suppose that x_0, ..., x_N ∈ R^d are generated by Algorithm FO, and that (DQ') is solvable. Then
f(x_N) − f(x*) ≤ LR² B(h).
Here B(·) is the value of the convex SDP (DQ'):
(DQ') B(h) = min_{λ,τ,t} (1/2)t
s.t. [ Σ_{i=1}^{N} λ_i Ã_{i−1,i}(h) + Σ_{i=0}^{N} τ_i D̃_i(h),  τ/2 ;  τᵀ/2,  t/2 ] ⪰ 0,  (λ, τ) ∈ Λ,
Λ := {(λ, τ) ∈ R^N_+ × R^{N+1}_+ : τ_0 = λ_1, λ_i − λ_{i+1} + τ_i = 0, i = 1, ..., N−1, λ_N + τ_N = 1},
and the matrices Ã_{i−1,i}(h), D̃_i(h) are explicitly given in terms of h = (h_k^{(i)})_{0≤k<i≤N}.
Note: The bound is independent of the dimension d.

39 Numerical Examples
[Figure: performance estimate (log scale) versus N for the heavy ball method (α = 1, β = 0.5), the FGM main sequence, the FGM auxiliary sequence, and the FGM analytical bound.]
FGM analytical bound = 2LR²/(N + 1)².
HBM is not competitive versus FGM.
Conjecture 2: f(x_i), f(y_i) converge to the optimal value with the same rate of convergence.

40 Numerical Examples
[Figure: performance estimate (log scale) versus N for the heavy ball method (α = 1, β = 0.5), the FGM main sequence, the FGM auxiliary sequence, and the FGM analytical bound.]
FGM analytical bound = 2LR²/(N + 1)².
HBM is not competitive versus FGM.
Conjecture 2: f(x_i), f(y_i) converge to the optimal value with the same rate of convergence.
...just proven by Kim-Fessler (2015).

41 Finding an "Optimized" Algorithm
Given B(h), a natural question is: how to find the "best" algorithm with respect to the bound? i.e., the best step sizes. That is, find h* = argmin_h B(h), which leads to the mini-max problem:
min_{h_k^{(i)}} max_{x_i, g_i ∈ R^d, δ_i ∈ R} δ_N
s.t. (1/(2L))‖g_i − g_j‖² ≤ δ_i − δ_j − ⟨g_j, x_i − x_j⟩, i, j = 0, ..., N, ∗,
x_{i+1} = x_i − (1/L) Σ_{k=0}^{i} h_k^{(i+1)} g_k, i = 0, ..., N−1,
‖x* − x_0‖ ≤ R.
Once again, we face a challenging problem...

42 Finding an "Optimized" Algorithm
Given B(h), a natural question is: how to find the "best" algorithm with respect to the bound? i.e., the best step sizes. That is, find h* = argmin_h B(h), which leads to the mini-max problem:
min_{h_k^{(i)}} max_{x_i, g_i ∈ R^d, δ_i ∈ R} δ_N
s.t. (1/(2L))‖g_i − g_j‖² ≤ δ_i − δ_j − ⟨g_j, x_i − x_j⟩, i, j = 0, ..., N, ∗,
x_{i+1} = x_i − (1/L) Σ_{k=0}^{i} h_k^{(i+1)} g_k, i = 0, ..., N−1,
‖x* − x_0‖ ≤ R.
Once again, we face a challenging problem...
Using semidefinite relaxations, duality and linearization, a solution to this problem can be efficiently approximated.

43 An Optimized Algorithm: Solution Step I
Remove some selected constraints and eliminate x_i using the equality constraints:
min_{h_k^{(i)}} max_{x*, g_i ∈ R^d, δ_i ∈ R} δ_N
s.t. (1/(2L))‖g_{i−1} − g_i‖² ≤ δ_{i−1} − δ_i − (1/L)⟨g_i, Σ_{k=0}^{i−1} h_k^{(i)} g_k⟩, i = 1, ..., N,
(1/(2L))‖g_i‖² ≤ −δ_i − ⟨g_i, x* − x_0 + (1/L) Σ_{t=1}^{i} Σ_{k=0}^{t−1} h_k^{(t)} g_k⟩, i = 0, ..., N,
‖x* − x_0‖² ≤ R².

44 An Optimized Algorithm: Solution Step I
Remove some selected constraints and eliminate x_i using the equality constraints:
min_{h_k^{(i)}} max_{x*, g_i ∈ R^d, δ_i ∈ R} δ_N
s.t. (1/(2L))‖g_{i−1} − g_i‖² ≤ δ_{i−1} − δ_i − (1/L)⟨g_i, Σ_{k=0}^{i−1} h_k^{(i)} g_k⟩, i = 1, ..., N,
(1/(2L))‖g_i‖² ≤ −δ_i − ⟨g_i, x* − x_0 + (1/L) Σ_{t=1}^{i} Σ_{k=0}^{t−1} h_k^{(t)} g_k⟩, i = 0, ..., N,
‖x* − x_0‖² ≤ R².
Take the dual of the inner "max" problem to obtain a nonconvex (bilinear) SDP:
(BIL) min_{h,λ,τ,t} { (1/2)t : [ Σ_{i=1}^{N} λ_i Ã_i(h) + Σ_{i=0}^{N} τ_i D̃_i(h),  τ/2 ;  τᵀ/2,  t/2 ] ⪰ 0, (λ, τ) ∈ Λ },
Ã_i(h) := (1/2)(e_i − e_{i+1})(e_i − e_{i+1})ᵀ + (1/2) Σ_{k=0}^{i−1} h_k^{(i)} (e_{i+1}e_{k+1}ᵀ + e_{k+1}e_{i+1}ᵀ),
D̃_i(h) := (1/2) e_{i+1}e_{i+1}ᵀ + (1/2) Σ_{t=1}^{i} Σ_{k=0}^{t−1} h_k^{(t)} (e_{i+1}e_{k+1}ᵀ + e_{k+1}e_{i+1}ᵀ),
Λ := {(λ, τ) ∈ R^N_+ × R^{N+1}_+ : τ_0 = λ_1, λ_i − λ_{i+1} + τ_i = 0, i = 1, ..., N−1, λ_N + τ_N = 1}.

45 Optimized Algorithm: Solution Step II
Define a new variable (linearize the bilinear nonconvex SDP):
r_{i,k} = λ_i h_k^{(i)} + τ_i Σ_{t=k+1}^{i} h_k^{(t)},  i = 1, ..., N,  k = 0, ..., i−1,
to obtain a convex SDP:
(LIN) min_{r,λ,τ,t} { (1/2)t : [ S(r, λ, τ),  τ/2 ;  τᵀ/2,  t/2 ] ⪰ 0, (λ, τ) ∈ Λ },
where
S(r, λ, τ) = (1/2) Σ_{i=1}^{N} λ_i (e_i − e_{i+1})(e_i − e_{i+1})ᵀ + (1/2) Σ_{i=0}^{N} τ_i e_{i+1}e_{i+1}ᵀ + (1/2) Σ_{i=1}^{N} Σ_{k=0}^{i−1} r_{i,k} (e_{i+1}e_{k+1}ᵀ + e_{k+1}e_{i+1}ᵀ).

46 Optimized Algorithm: Solution Step II
Define a new variable (linearize the bilinear nonconvex SDP):
r_{i,k} = λ_i h_k^{(i)} + τ_i Σ_{t=k+1}^{i} h_k^{(t)},  i = 1, ..., N,  k = 0, ..., i−1,
to obtain a convex SDP:
(LIN) min_{r,λ,τ,t} { (1/2)t : [ S(r, λ, τ),  τ/2 ;  τᵀ/2,  t/2 ] ⪰ 0, (λ, τ) ∈ Λ },
where
S(r, λ, τ) = (1/2) Σ_{i=1}^{N} λ_i (e_i − e_{i+1})(e_i − e_{i+1})ᵀ + (1/2) Σ_{i=0}^{N} τ_i e_{i+1}e_{i+1}ᵀ + (1/2) Σ_{i=1}^{N} Σ_{k=0}^{i−1} r_{i,k} (e_{i+1}e_{k+1}ᵀ + e_{k+1}e_{i+1}ᵀ).
Theorem (Use the solution of (LIN) to solve (BIL) and get the optimal h.)
Suppose (r*, λ*, τ*, t*) is an optimal solution of (LIN). Then (h*, λ*, τ*, t*) is an optimal solution of (BIL), where h* = (h_k^{*(i)})_{0≤k<i≤N} is defined by the following recursive rule:
h_k^{*(i)} = (r*_{i,k} − τ*_i Σ_{t=k+1}^{i−1} h_k^{*(t)}) / (λ*_i + τ*_i) if λ*_i + τ*_i > 0, and 0 otherwise,  for i = 1, ..., N, k = 0, ..., i−1.
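Since S(r, λ, τ) and Λ are fully specified above, the SDP (LIN) can be set up directly; below is a hedged Python/cvxpy sketch (my own reconstruction, not the authors' code; any SDP-capable solver can be used), together with the recovery of the step sizes h via the theorem's recursion.

```python
import numpy as np
import cvxpy as cp

N = 5
E = np.eye(N + 1)                         # rows E[0..N] play the role of e_1..e_{N+1}

lam = cp.Variable(N, nonneg=True)         # lambda_1..lambda_N
tau = cp.Variable(N + 1, nonneg=True)     # tau_0..tau_N
t = cp.Variable()
r = {(i, k): cp.Variable() for i in range(1, N + 1) for k in range(i)}

# Build S(r, lambda, tau) exactly as defined on the slide.
S = 0
for i in range(1, N + 1):
    v = E[i - 1] - E[i]
    S = S + 0.5 * lam[i - 1] * np.outer(v, v)
for i in range(N + 1):
    S = S + 0.5 * tau[i] * np.outer(E[i], E[i])
for i in range(1, N + 1):
    for k in range(i):
        S = S + 0.5 * r[i, k] * (np.outer(E[i], E[k]) + np.outer(E[k], E[i]))

# LMI [S, tau/2; tau'/2, t/2] >= 0 plus the linear constraints defining Lambda.
M = cp.bmat([[S, cp.reshape(tau, (N + 1, 1)) / 2],
             [cp.reshape(tau, (1, N + 1)) / 2, cp.reshape(t, (1, 1)) / 2]])
cons = [M >> 0, tau[0] == lam[0], lam[N - 1] + tau[N] == 1]
cons += [lam[i - 1] - lam[i] + tau[i] == 0 for i in range(1, N)]
cp.Problem(cp.Minimize(0.5 * t), cons).solve()

# Recover h_k^{(i)} from (r, lambda, tau) via the recursive rule of the theorem.
h = {}
for i in range(1, N + 1):
    for k in range(i):
        prev = sum(h[(s, k)] for s in range(k + 1, i))
        den = lam.value[i - 1] + tau.value[i]
        h[(i, k)] = (r[i, k].value - tau.value[i] * prev) / den if den > 1e-12 else 0.0
print("bound factor multiplying L*R^2:", 0.5 * t.value)
```

Feeding the recovered table h into the class-(FO) update then gives the optimized method whose bound is (1/2)t*·LR², as in the N = 5 example below.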

47 An Optimized Algorithm: Numerical Results
[Figure: performance estimate (log scale) versus N for the heavy ball method (α = 1, β = 0.5), the FGM main sequence, the FGM auxiliary sequence, the FGM analytical bound, and the optimized algorithm.]
The bound on the new algorithm is two times better than the bound on Nesterov's FGM!

48 An Optimized Algorithm: Example with N = 5
Example
A first-order algorithm with optimal step sizes for N = 5 takes the form
x_{i+1} = x_i − (1/L) Σ_{k=0}^{i} h_k^{(i+1)} ∇f(x_k),  i = 0, ..., 4,
with numerically optimized coefficients h_k^{(i)}. We then get f(x_5) − f(x*) ≤ 0.09 LR² for any x* ∈ X*(f).

49 Concluding Remarks and Extensions
The PEP framework offers a new approach to derive complexity bounds.
Finding a bound for the PEP problem is challenging!
Numerical bounds require solving an SDP whose dimension depends on N.

50 Concluding Remarks and Extensions
The PEP framework offers a new approach to derive complexity bounds.
Finding a bound for the PEP problem is challenging!
Numerical bounds require solving an SDP whose dimension depends on N.
Two very recent works building on PEP:
Kim-Fessler (MP-2015) confirmed our Conjecture 2. They also derived an efficient "optimized" algorithm, with an analytical bound for the auxiliary sequence y_k.
Taylor et al. (2015): tightness of our relaxations and numerical bounds + the smooth strongly convex case.

51 Concluding Remarks and Extensions
The PEP framework offers a new approach to derive complexity bounds.
Finding a bound for the PEP problem is challenging!
Numerical bounds require solving an SDP whose dimension depends on N.
Two very recent works building on PEP:
Kim-Fessler (MP-2015) confirmed our Conjecture 2. They also derived an efficient "optimized" algorithm, with an analytical bound for the auxiliary sequence y_k.
Taylor et al. (2015): tightness of our relaxations and numerical bounds + the smooth strongly convex case.
Extensions:
Analyze other algorithms (e.g., with constraints: done for the projected gradient) and different classes F of input functions/optimization models...
PEP is also useful as a constructive approach to design new algorithms...
In our recent work on nonsmooth problems we derive an optimal Kelley-like cutting plane method. [To appear in Math. Prog.]

52 HAPPY BIRTHDAY YURI!
