arxiv: v4 [math.oc] 1 Apr 2018

Size: px
Start display at page:

Download "arxiv: v4 [math.oc] 1 Apr 2018"

Transcription

1 Generalizing the optimized gradient method for smooth convex minimization Donghwan Kim Jeffrey A. Fessler arxiv: v4 [math.oc] Apr 08 Date of current version: April 3, 08 Abstract This paper generalizes the optimized gradient method OGM [9, 5, 6] that achieves the optimal worst-case cost function bound of first-order methods for smooth convex minimization [7]. Specifically, this paper studies a generalized formulation of OGM and analyzes its worst-case rates in terms of both the function value and the norm of the function gradient. This paper also develops a new algorithm called OGM-OG that is in the generalized family of OGM and that has the best known analytical worst-case bound with rate O/N.5 on the decrease of the gradient norm among fixed-step first-order methods. This paper also proves that Nesterov s fast gradient method [4,6] has an O/N.5 worstcase gradient norm rate but with constant larger than OGM-OG. The proof is based on the worst-case analysis called Performance Estimation Problem in [9]. Introduction First-order methods are favorable for solving large-scale problems because their computational complexity per iteration depends mildly on the problem dimension. In particular, Nesterov s fast gradient method FGM[4,6]achievestheoptimalworst-caserateO/N fordecreasingsmoothconvexfunctionsafter N iterations [5], and thus has been widely used in large-scale applications. Recently, the optimized gradient method OGM [9, 5, 6] was found to achieve the optimal worst-case cost function bound of first-order methods with either fixed-step or adaptive-step approaches for smooth convex minimization in [7], whereas FGM achieves that bound only up to constant. Building upon [9, 5, 6], this paper presents two different ways of generalizing OGM and its development. First, this paper specifies a parameterized family of algorithms that generalizes OGM, and provides worst-case bounds on the function and gradient norm values for this family. Like the generalized forms of FGM [3, 6] being widely used and studied e.g., [, 3, 9], we believe introducing the generalized OGM here can be potentially useful. Second, this paper optimizes the step coefficients of fixed-step first-order methods with respect to the rate of decrease of the cost function s gradient norm, leading to a new algorithm called OGM-OG OG for optimized over a gradient. This development expands the choice of worst-case rate metrics for optimizing first-order methods in [9, 5, 6] that focused on the cost function decrease leading to OGM. We next briefly review the Performance Estimation Problem PEP [9] that was used in [9,5,6] to develop OGM and that we extensively use throughout the paper. This research was supported in part by NIH grant U0 EB Donghwan Kim Jeffrey A. Fessler Dept. of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 4809 USA kimdongh@umich.edu, fessler@umich.edu There is a backtracking line-search version of FGM [5] that also achieves the optimal worst-case function bound up to constant, which is sometimes more useful than the fixed-step FGM in practice. However, such backtracking line-search version of OGM with a fast worst-case bound is yet unknown unlike the fixed-step OGM [9,5,6], while recently an exact line-search version of OGM is developed in [8].

2 Drori and Teboulle [9] cast a worst-case analysis into an optimization problem called PEP [9] that examines the maximal absolute cost function inaccuracy over all possible inputs cost functions to the optimization algorithm. See e.g., [5,8,0,5,6,7,8,9,30,3] for its extensions. Moreover, Drori and Teboulle [9] optimized numerically the step coefficients of first-order methods using PEP for smooth convex minimization, and found an algorithm whose worst-case bound is lower than that of FGM, but it required too much computation and memory to be appealing for large-scale problems. Building on their work, the authors [5, 6] found computationally and memory-wise efficient version, called OGM, and showed analytically that OGM satisfies an analytical worst-case bound that is twice smaller than that of FGM. Drori [7] showed that the OGM is optimal for large-dimensional smooth convex minimization over a general class of first-order methods with either fixed or adaptive step sizes [7]. This OGM has been numerically extended for nonsmooth composite convex problems in [30]. In addition, this OGM-type algorithm was already studied in the context of a proximal point method [3, Appendix]. Using the PEP approach[9], this paper proposes a generalized version of OGMGOGM and analyzes its worst-case rates in terms of both the decrease of the cost function and the decrease of the norm of the gradient of the cost function. The results complement the worst-case analysis of the OGM [5, 6], and expands our understanding of OGM-type first-order methods. This paper analyzes the worst-case rate of the gradient norm in addition to that of the cost function because it is important when dealing with dual problems, considering that the dual gradient norm corresponds to the primal distance to feasibility see e.g., [6,, 7]. While FGM has not been shown previously to satisfy a rate O/N.5 for decreasing the gradient norm, modified versions of FGM with such rate were studied in [,,7]. This paper proves that FGM in fact does have that rate, building upon [3] that numerically conjectured such rate for FGM using the gradient norm version of PEP. For further acceleration of the worst-case gradient norm rate, we optimize the step coefficients of first-order methods with respect to the gradient norm using PEP and propose an algorithm named OGM-OG that belongs to the GOGM family and has the best known analytical worst-case bound on rate of decrease of the gradient norm among fixed-step first-order methods. One can extend some aspects of the approaches for generalizing OGM described in this paper to other optimization algorithms and problems. One direction we have already taken in [7] aims to improve the fast iterative shrinkage/thresholding algorithm FISTA [] that reduces to FGM for smooth convex problems for nonsmooth composite convex problems. Naturally, this paper and [7] use some similar approaches, but they are different; the methods in [7] when simplified to the smooth case correspond to a generalization of FGM that differs from the GOGM. Another direction we have recently taken in [8] focuses on optimizing the step coefficients of first-order methods with respect to the gradient norm under the initial bounded function condition that is different from the initial bounded distance condition used in this paper. Sec. defines the smooth convex problem and the first-order methods. Sec. 3 reviews and discusses worst-case analyses of a gradient method GM, FGM, and OGM for both the function value and the gradient norm. Sec. 3 also reviews first-order methods that guarantee an O/N.5 rate for the gradient decrease. Sec. 4 reviews the cost function form of PEP [9] and reviews how the OGM [5] is derived using such PEP. Sec. 4 then proposes a generalized version of OGM GOGM using the cost function form of PEP, and Sec. 5 provides a worst-case gradient norm bound for the GOGM using the gradient form of PEP. Then, Sec. 5 optimizes the step coefficients using the gradient form of PEP and proposes the OGM-OG that belongs to the GOGM family. Sec. 5 also proves that FGM decreases the gradient norm with a rate O/N.5. Sec. 6 and Sec. 7 provide discussion and conclusion. Smooth convex problem and first-order methods. Smooth convex problem We focus on the following smooth convex minimization problem min fx, x R d M The original PEP was intractable to solve, so a series of relaxation on the PEP was introduced in [9] to make it possibly solvable, which we review in Sec. 4..

3 where the following additional conditions are assumed: f : R d R is a convex function of the type C, L Rd, i.e., continuously differentiable with Lipschitz continuous gradient: fx fy L x y, x,y R d,. where L > 0 is the Lipschitz constant. The optimal set X f = argmin x R d fx is nonempty, i.e., the problem M is solvable. We use F L R d to denote the class of functions that satisfy the above conditions. We also assume that the distance between an initial point x 0 and an optimal solution x Xf is bounded by some R > 0, i.e., x 0 x R... First-order methods To solve M, we consider the following class of fixed-step or non-apdative-step first-order methods FSFOM, where the update step at i + th iteration is a weighted sum of the previous and current gradients{ fx k } i scaledby L withfixedconstantstepcoefficients{h i+,k} i thatarenotadaptive tothegivenf andx 0 andthuslandr.thisclassfsfomincludesgm, FGM,OGM,andthemethods proposed in this paper, but excludes line-search-type methods. Algorithm Class FSFOM Input: f F L R d, x 0 R d. For i = 0,...,N x i+ = x i i h i+,k fx k. L 3 Review of the worst-case analysis of FSFOM This section reviews the worst-case analysis of existing FSFOMs and simple variants thereof in terms of bounds on the cost function and gradient norm. Sec. 3. reviews the worst-case cost function decrease of GM, FGM, and OGM. Sec. 3. presents both FSFOMs including GM, FGM, OGM and some variants that have either O/N or O/N.5 rate for the worst-case gradient decrease; it also reviews an O/N lower bound of the worst-case rates of first-order methods for decreasing the gradient norm. 3. Function value worst-case analysis of FSFOM The simplest example of a FSFOM is the following GM that uses only the current gradient and the Lipschitz constant L for the update. Algorithm GM Input: f F L R d, x 0 R d. For i = 0,,... x i+ = x i L fx i 3

4 This GM monotonically decreases the cost function [5] and satisfies the following tight 3 worst-case bound [9, Thm. ], for any i 0, fx i fx LR 4i+. 3. Among the class FSFOM, the following two equivalent forms of FGM [4,6] have been used widely because they decrease the cost function with the optimal rate O/N. Algorithm FGM Input: f F L R d, x 0 = y 0 R d, t 0 =. For i = 0,,... y i+ = x i L fx i + +4t i t i+ =, x i+ = y i+ + t i y i+ y i t i+ Algorithm FGM Input: f F L R d, x 0 = y 0 R d, t 0 =. For i = 0,,... y i+ = x i L fx i z i+ = x 0 i t k fx k L + +4t i t i+ =, x i+ = y i+ + z i+ t i+ t i+ Specifically, the FGM and FGM iterates satisfy the following worst-case cost function bounds [5, 4, 6] for any i : fy i fx LR t LR i i+, and fx i fx LR t i where the parameter t i satisfies t i = i l=0 t l and t i i+ LR i+, 3. for all i. 3.3 A generalized form of FGM in [3] uses parameters t i satisfying t 0 = and t i t i + t i, including the choice t i = i+a a for any a. There is another generalized form of FGM in [6], and these generalized forms of FGM have been widely used and studied e.g., [, 3, 9]. Similarly this paper studies generalizations of the OGM. Building upon [9] that optimized numerically the step coefficients over the cost function form of PEP, the authors [5] developed the following two equivalent forms of OGM, as reviewed in Sec. 4. Algorithm OGM Input: f F L R d, x 0 = y 0 R d, θ 0 =. For i = 0,...,N y i+ = x i L fx i + +4θ i, i N θ i+ = + +8θ i, i = N x i+ = y i+ + θ i θ i+ y i+ y i + θ i θ i+ y i+ x i Algorithm OGM Input: f F L R d, x 0 = y 0 R d, θ 0 =. For i = 0,...,N y i+ = x i L fx i z i+ = x 0 i θ k fx k L + +4θ i, i N θ i+ = + +8θ i, i = N x i+ = y i+ + z i+ θ i+ θ i+ 3 A tight worst-case bound denotes an inequality where the equality holds for some function f. For example, [9, Thm. ] shows the bound 3. is tight. 4

5 The OGM iterates satisfy the following worst-case cost function bounds [5, 6]: for any i N, and The parameter sequence θ i satisfies fy i fx LR 4θi LR i+, 3.4 fx N fx LR θ N θ i = { i+ l=0 θ l, i N, N l=0 θ l +θ N, i = N, LR N +N and θ i { i+, i N, i+, i = N, which is equivalent to t i 3.3 except at the final iteration. The bounds 3.4 and 3.5 of OGM are about twice smaller than the bounds 3. of FGM, so OGM decreases the cost function faster than FGM in the worst case and often in practice [4]. In addition, the bound 3.5 on the final iterate x N is tight and satisfies the optimal worst-case bound of general first-order methods including both FSFOM and adaptive-step first-order methods, when the condition d N + holds [7]. The additional term θi θ i+ y i+ x i of OGM and the additional constant for the update of z i of OGM, compared to FGM and FGM respectively, along with the parameter θ N, are what make OGM optimal for d N +. One of the main goals of this paper is to generalize the form of OGM and analyze the worst-case rate of such generalized OGM in terms of both the function value and the gradient norm, complementing the bounds 3.4 and 3.5 on the function value of OGM. The next section studies the worst-case rate of the gradient of FSFOM Gradient norm worst-case analysis of FSFOM When tackling dual problems, it is known that the gradient norm worst-case rate is important in addition to the function value worst-case rate because the dual gradient norm is related to the primal distance to feasibility see e.g., [6,,7]. One simple way to find a loose worst-casebound for the gradient norm is to use the well-known convex inequality for convex functions with L-Lipschitz continuous gradients [5]: L fx fx f x L fx fx fx, x R d, 3.7 as discussed in [5,3]. Combining the bounds 3., 3. and the inequality 3.7, for any i, the GM iterates satisfy and the iterates of FGM satisfy fx i Lfx i fx LR i+, 3.8 fy i LR t i LR i+, and fx i LR t i LR i Similarly for any i N, the OGM iterates with the bounds 3.4 and 3.5 satisfy fy i LR LR θi i+, and fx N LR LR θ N N Unfortunately, using the inequality 3.7 provides at best an O/N bound due to the optimal rate O/N of the function decrease. Furthermore, in general using 3.7 need not lead to tight worst-case bounds on the gradient norm. Using a different approach, a smaller O/N worst-case bound for the gradient norm of GM was derived in [7], as reviewed in next section. While the bounds 3.8, 3.9, and 3.0 are not guaranteed to be tight, the next section shows that the worst-case gradient bound 3.0 on the final iterate x N of OGM is in fact tight and thus has the same disappointingly slow O/N worst-case bound on the gradient norm as GM. 5

6 3.. FSFOM with rate O/N for decreasing the gradient norm This section uses the following lemma stating that GM monotonically decreases the gradient. Lemma [, Lemma.4] The GM monotonically decreases the gradient norm, i.e., x f L fx fx. 3. The following theorem reviews a simple proof in [7] that provides a worst-case gradient norm bound forgmwithrateo/nthat issmallerthan3.8, where[,thm. 6.]additionallyconsidersLemma. Theorem [, Thm. 6.], [7] Let f : R d R be F L R d and let x 0,,x N R d be generated by GM. Then for any N, LR min fx i = fx N. 3. i {0,...,N} NN + Proof Lemma implies the first equality in 3.. Using 3., 3.7, 3. yields: LR 3. fx m fx 3.7 fx N+ fx + 4m+ L 3. N m+ L fx N, N fx i which is equivalent to 3. using m = N/ for which m N and N m N. Inspired by the conjecture in [3, Sec. 4..3], the following theorem shows that the O/N rate of the worst-case gradient norm bound 3. of GM is tight up to a constant. Theorem Let x 0,,x N R d be generated by GM. Then for any N, LR N + max f F LR d, x Xf, x 0 x R min fx i = i {0,...,N} where the inequality in 3.3 is achieved by the following function in F L R d : ψx = { LR N+ x LR N+, x R N+, i=m max f F LR d, x Xf, x 0 x R fx N, 3.3 L x, x < R N Proof Lemma implies the equality in 3.3. Starting from x 0 = Rν, where ν is a unit vector, the GM iterates are x i = i Rν, ψx i = LR ν, i = 0,...,N, N + N + which implies the inequality 3.3. We next show that the bound 3.0 for the gradient norm at the final iterate x N of OGM is tight and and that its worst-case function is a simple quadratic function. Note that OGM was derived by optimizing a worst-case bound on the cost function decrease and its behavior in terms of gradient norms was not investigated previously. Theorem 3 Let x 0,,x N R d be generated by OGM. Then for any N, max f F LR d, x Xf, x 0 x R min fx i = i {0,...,N} max fx N = LR f F LR d, θ N x Xf, x 0 x R LR, 3.5 N + where the worst-case function in F L R d for OGM in terms of the gradient norm is the quadratic function φx = L x. 6

7 Proof See Appendix A. Comparing 3. and 3.5, we see that GM and OGM have essentially similar worst-case gradient norm bounds. This is a dilemma because OGM is the fastest FSFOM in terms of the worst-case cost function bound, but is as slow as GM in terms of the worst-case gradient norm bound. Therefore, one of the main goals of this paper is to study optimizing the step coefficients of FSFOM using PEP with respect to the gradient norm in Sec We next discuss the specific FSFOM in [7] that decreases the gradient norm with a faster O/N.5 rate. 3.. FSFOM with rate O/N.5 for decreasing the gradient norm Searching for a FSFOM that decreases the gradient norm faster than the O/N rate of GM and OGM, Nesterov [7] among other variants of FGM [, ] considered performing FGM for the first m iterations, and GM for the remaining iterations. He showed that this method, which we denote FGM-m, satisfies a fast rate O/N.5 for decreasing the gradient norm. In [6,,7], FGM-m for m = N/ was used to solve dual problems. To pursue a faster worst-case rate in terms of the constant factor, we consider here another variant that performs OGM for the first m iterations and GM for the remaining iterations, which we denote OGM-m. Algorithm OGM-m Input: f F L R d, x 0 = y 0 R d, ϑ 0 =, m {,...,N }. For i = 0,...,m y i+ = x i L fx i + +4ϑ i, i m ϑ i+ = + +8ϑ i, i = m x i+ = y i+ + ϑ i ϑ i+ y i+ y i + ϑ i ϑ i+ y i+ x i For i = m,...,n x i+ = x i L fx i The following theorem bounds the gradient norm of the OGM-m iterates, inspired by the proof in [, 7] for the worst-case gradient norm bound of the FGM-m iterates. The worst-case bound of FGM-m in [,7] is asymptotically -times larger than the following new bound 3.6 for OGM-m. Theorem 4 Let f : R d R be F L R d and let x 0, x N R d be generated by OGM-m for m N. Then for any N, LR min fx i fx N i {0,...,N} m+ N m Proof Using 3.5, 3.7, 3. yields: LR ϑ m 3.5 fx m fx 3.7 fx N+ fx + L 3. N m+ L fx N, which is equivalent to 3.6 using ϑ m m+ that is implied by 3.6. N fx i The bound 3.6 is minimized at a point close to m = N/3, leading to its approximately smallest constant 3 6 with the rate O/N.5. Other variants of FGM having O/N.5 worst-case gradient bounds were derived in [,]. Such variations of FGM including FGM-m were derived since, prior to this paper, it was unknown whether or not FGM decreases the gradient norm with the rate O/N.5 ; this rate for the gradient norm of i=m 7

8 FGM was conjectured numerically in [3]. Sec. 5. below uses the PEP to show for the first time the rate O/N.5 for the gradient decrease of the FGM. The bound 3.6 of OGM-m for decreasing the gradient is smaller than the bounds for the FGM variants in [,], and Sec. 5.4 below shows that our proposed methods have worst-case bounds even lower than 3.6. The preceding sections have focused on tight or upper worst-case bounds of the gradient norm decrease of first-order methods, whereas the next section reviews a lower bound for the worst-case gradient norm decrease in [3], illustrating the best achievable worst-case rate of the gradient norm decrease for any first-order method with either fixed-step or adaptive-step approaches A lower bound of the worst-case rates of first-order methods for decreasing the gradient norm For completeness, this section reviews a lower bound on the worst-case rate of any first-order method in terms of the gradient norm values for smooth convex quadratic functions [3]. Lower bounds on the function value were studied for convex quadratic functions in [3], and for smooth convex functions in [7, 5]. When the condition d N +3 holds, a worst-case gradient norm bound of any first-order method generating x N after N iterations has rate O/N at best, for convex quadratic f, i.e., has the following lower bound [3, Sec..3.B]: LR 4e N + max fx N, 3.7 f Q LR d, x Xf, x 0 x R where Q L R d := { f : fx x Qx+p x+r for x Ê d, Q 0, Q L }. Since Q L R d F L R d, the lower bound 3.7 for convex quadratic functions also applies to smooth convex functions. A regularization technique in [7] achieves the rate O/N up to a logarithmic factor. However, its adaptive step coefficients require knowing R in advance which is undesirable in practice. To our knowledge, whether there exists any FSFOM satisfying such rate is an open question. Instead, this paper discusses a way to develop FSFOM that achieves an O/N.5 gradient norm bound with the smallest constant among known FSFOM. 4 Relaxation and optimization of the cost function form of PEP Thissectionreviewsarelaxationofthecostfunction formofpep[9]and reviewshow[9,5,6]optimized the step coefficients of the FSFOM class over the cost function form of PEP, leading to OGM. Then, we propose a parameterized family of algorithms that generalizes OGM, and analyze the worst-case cost function decrease of the generalized OGM family. 4. Review: Relaxation for the cost function form of PEP The worst-case bound on the cost function for a FSFOM having given step coefficients h := {h i+,k } corresponds to a solution of the following PEP problem [9, Prob. P]: B P h,n,d,l,r := max fx N fx f F LR d, x 0,,x N R d, x X f, x 0 x R s.t. x i+ = x i i h i+,k fx k, i = 0,...,N. L P Since problem P is impractical to solve due to its functional constraint f F L R d, [9] relaxed it by the following finite set of inequalities satisfied by f [5, Thm...5]: L fx i fx j fx i fx j fx j, x i x j 4. 8

9 for i,j = 0,,...,N,. Then, a matrix G = [g 0,,g N ] R N+ d and a vector δ = [δ 0,,δ N ] R N+ with g i := L x 0 x fx i, and δ i := L x 0 x fx i fx are introduced to represent gradient vectors and function values respectively in the set of 4.. This leads to a finite-dimensional relaxation of problem P [9, Prob. Q]: B P h,n,d,l,r := max G R N+ d, δ R N+ LR δ N P s.t. Tr { G A i,j hg } δ i δ j, i < j = 0,...,N, Tr { G B i,j hg } δ i δ j, j < i = 0,...,N, Tr { G C i G } δ i, i = 0,...,N, Tr { G D i hg+νu i G } δ i, i = 0,...,N, for any given unit vector ν R d, where u i = e i+ R N+ is the i+th standard basis vector. Note that Tr { G u i u j G} = g i, g j by definition. The matrices A i,j h,b i,j h,c i,d i h are defined as A i,j h := u i u j u i u j + j l=i+ B i,j h := u i u j u i u j i l=j+ C i := u iu i, D i h := u iu i + i j= l j h j,ku i u k +u ku i. h l,ku j u k +u ku j, l h l,ku j u k +u ku j, In [9, Prob. Q ], problem P is further relaxed by discarding some constraints to yield B P h,n,d,l,r := max G R N+ d, δ R N+ 4. LR δ N P s.t. Tr { G A i,i hg } δ i δ i, i =,...,N, Tr { G D i hg+νu i G} δ i, i = 0,...,N, for any given unit vector ν R d. We explicitly illustrate the relaxation from P to P because Sec. 5 uses a similar but different relaxation. Taylor et al. [3] avoided this step to analyze a tight worst-case bound of P under a large-scale condition d N + [3, Thm. 5]; however, this relaxation P facilitates the analysis in [9,5,6] and in this paper. Replacing max G,δ LR δ N by min G,δ { δ N } for convenience in P, the Lagrangian of the corresponding constrained minimization problem with dual variables λ = λ,,λ N R N and τ = τ 0,,τ N R N+ for the first and second constraint inequalities of P respectively becomes where N N LG,δ,λ,τ;h = δ N + λ i δ i δ i + τ i δ i 4.3 i= i= i=0 +Tr { G Sh,λ,τG+ντ G }, N N Sh,λ,τ := λ i A i,i h+ τ i D i h. 4.4 Then, we have the following dual problem of P that one could use to compute a valid upper bound of P using a semidefinite program SDP for given h [9, Prob. DQ ]: { Sh,λ,τ B D h,n,l,r := min λ,τ Λ, LR γ : τ } τ γ 0, D γ R i=0 9

10 where Λ = { λ,τ R N+ + : } τ 0 = λ, λ N +τ N =,. 4.5 λ i λ i+ +τ i = 0, i =,...,N, The next section reviews the analytical solution to this upper bound D for OGM, instead of using a numerical SDP solver. 4. Review: Optimizing step coefficients for the cost function form of PEP Drori and Teboulle [9] optimized numerically the step coefficients h over the simple SDP problem D as follows [9, Prob. BIL]: ĥ D := argmin h R NN+/ B D h,n,l,r. HD The problem HD is bilinear, and [9, Thm. 3] 4 used a convex relaxation technique to make it solvable by numerical methods. In [5, Lemma 4], we solved HD analytically yielding the optimized step coefficients θ h i+,k = i+ θ k i j=k+ h j,k, k = 0,...,i, θi θ i+, k = i, for θ i in 3.6. Fortuitously, the optimized coefficients 4.6 lead to equivalent computationally efficient OGM and OGM forms [5, Prop. 3, 4 and 5], and the bound 3.5 for the final secondary iterate x N of OGM is implied by [5, Lemma 4]. Recently, Drori [7] showed that the OGM is optimal for d N+, implying that optimizing over the relaxed bound D in HD for simplicity is equivalent to optimizing over the exact worst-case cost function bound P when d N +. One could use a SDP solver to compute a numerical bound from D for any FSFOM; however, deriving an analytical bound using D is difficult for the primary sequence {y i } of OGM. Therefore, we devised a new relaxed bound in [6] similar to D, which we review next. 4.3 Review: Another cost function form of relaxed PEP for the primary sequence of OGM An upper bound of the worst-case bound on fy N+ fx for FSFOM with step coefficients h and y N+ = x N L fx N could be computed using D by a SDP solver. However, we found it difficult to find its analytical worst-case bound for the primary sequence {y i } of OGM, so [6, Prob. D ] provided the following alternate upper bound on fy N+ fx : B D h,n,l,r := min λ,τ Λ, γ R { LR γ : Sh,λ,τ+ u Nu N τ τ γ } 0, D which led to the bound 3.4 for the primary sequence {y i } of OGM in [6]. Similar to [5, Lemma 4], we found a feasible point of D in [6, Lemma 3.], along with feasible step coefficients h of a FSFOM: t h i+,k = i+ t k i j=k+ h j,k, k = 0,...,i, ti t i+, k = i, for t i in 3.3. Then, [6, Thm. 3.] showed the bound 3.4 using [6, Lemma 3.]. The step coefficients 4.6 and 4.7 are identical except the final iteration, since t i 3.3 and θ i 3.6 are equivalent for i < N. We are now done reviewing the portions of papers [5,6] that are the ingredients for specifying a parameterized family of algorithms that generalizes OGM in the next two sections. 4 [9, Thm. 3] has typos that are fixed in [5, Eq. 6.3]. 0

11 4.4 Feasible points of D and D for the generalized OGM This section specifies feasible points of D and D that lead to a generalized version of OGM. Specifically, the following lemma presents additional feasible points of D; this lemma reduces to [5, Lemma 4] and the step coefficients 4.6 of OGM when θ i = Ω i for all i. Lemma For the following step coefficients: h i+,k = the choice of variables: θ i+ Ω i+ θ k i j=k+ h j,k + θi θi+, i = 0,...,N, k = 0,...,i, Ω i+, i = 0,...,N, k = i, γ = Ω τ N, i = 0, 0, λ i = Ω i τ 0, i =,...,N, τ i = θ i τ 0 i =,...,N, θ N τ 0, i = N, is a feasible point of D for any choice of θ i such that θ 0 =, θ i > 0, and θ i Ω i := { i l=0 θ l, i = 0,...,N, N l=0 θ l +θ N, i = N. 4.0 Proof See Appendix B. The followinglemma alsospecifies somefeasible points of D ; this lemma reducesto [6, Lemma3.] and 4.7 when t i = T i for all i. Lemma 3 For the following step coefficients: h i+,k = t i+ T i+ t k i j=k+ h j,k + ti ti+, i = 0,...,N, k = 0,...,i, T i+, i = 0,...,N, k = i, 4. the choice of variables: γ = τ 0, λ i = T i τ 0, i =,...,N, τ i = { T N, i = 0, t i τ 0 i =,...,N, 4. is a feasible point of D for any choice of t i such that Proof See Appendix C. t 0 =, t i > 0, and t i T i := i t l. 4.3 Similar to the relationship between the step coefficients 4.6 and 4.7, the step coefficients 4.8 and 4. are identical when θ i = t i for i < N except for the final iteration, implying that the iterates {x i } N i=0 of the two FSFOMs with 4.8 and 4. are equivalent; only the final iterate x N is different. The step coefficients 4.6 and 4.7 lead to computationally efficient equivalent OGM forms; similarly the next section provides computationally efficient generalized forms of OGM that each correspond to a FSFOM with either 4.8 or 4., and we analyze their cost function worst-case bounds. l=0

12 4.5 Generalized OGM This section proposes a generalized OGM using lemmas and 3. The FSFOM with the step coefficients 4.8 has the following two equivalent efficient generalized forms of OGM, named GOGM and GOGM, that reduce to the standard OGM when θ i = Ω i for all i. Algorithm GOGM Input: f F L R d, x 0 = y 0 R d, θ 0 = Ω 0 =. For i = 0,...,N y i+ = x i L fx i Choose θ i+ > 0 s.t. θi+ Ω i+, where { i+ Ω i+ = l=0 θ l, i N N l=0 θ l +θ N, i = N x i+ = y i+ + Ω i θ i θ i+ θ i Ω i+ y i+ y i + θ i Ω iθ i+ θ i Ω i+ y i+ x i Algorithm GOGM Input: f F L R d, x 0 = y 0 R d, θ 0 = Ω 0 =. For i = 0,...,N y i+ = x i L fx i z i+ = x 0 i θ k fx k L Choose θ i+ > 0 s.t. θi+ Ω i+, where { i+ Ω i+ = l=0 θ l, i N N l=0 θ l +θ N, i = N x i+ = θ i+ y i+ + θ i+ z i+ Ω i+ Ω i+ Proposition The sequence {x 0,,x N } generated by the FSFOM with 4.8 is identical to the corresponding sequence generated by GOGM and GOGM. Proof See Appendix D. Note that this proof is independent of the choice of θ i and Ω i. Because the proof of Prop. for the FSFOM with step coefficients 4.8 is independent of the choice of θ i and Ω i, it is straightforward to show that the FSFOM with step coefficients 4. has the following two efficient equivalent forms, named GOGM and GOGM, that reduce to [6, Alg. OGM and OGM ] when t i = T i for all i. Algorithm GOGM Input: f F L R d, x 0 = y 0 R d, t 0 = T 0 =. For i = 0,...,N y i+ = x i L fx i Choose t i+ > 0 s.t. t i+ T i+ i+ = t l l=0 x i+ = y i+ + T i t i t i+ t i T i+ y i+ y i + t i T it i+ t i T i+ y i+ x i Algorithm GOGM Input: f F L R d, x 0 = y 0 R d, t 0 = T 0 =. For i = 0,...,N y i+ = x i L fx i z i+ = x 0 i t k fx k L Choose t i+ > 0 s.t. t i+ T i+ i+ = t l l=0 x i+ = t i+ y i+ + t i+ z i+ T i+ T i+ Clearly when θ i = t i for i < N, the primary iterates {y i } N i=0 and the intermediate secondary iterates {x i } N i=0 of GOGM and GOGM are equivalent. Although illustrating two similar algorithms GOGM and GOGM might seem redundant, presenting both formulations with lemmas and 3 completes the story of generalized OGM here and in Sec. 5. Using lemmas and 3, the following theorem bounds the cost function decrease of the GOGM and GOGM iterates.

13 Theorem 5 Let f : R d R be F L R d and let y 0,,y N,x N R d be generated by GOGM and GOGM. Then for any i N, fy i fx LR 4Ω i, 4.4 fx N fx LR Ω N. 4.5 The iterates y 0,,y N R d generated by GOGM and GOGM also satisfy the bound 4.4 when θ i = t i for i < N. Proof Using Lemma 3, the FSFOM with h in 4. of GOGM satisfies fy N+ fx B D h,n,l,r = LR γ = LR 4T N, 4.6 where y N+ = x N L fx N. Since the coefficients h in 4. are recursive and do not depend on a given N, we can extend 4.6 for all iterations. By letting θ i = t i for i < N, the bound 4.6 also satisfies for the iterates {y i } of GOGM, as in 4.4. Using Lemma, the FSFOM with the step h in 4.8 of GOGM satisfies which is equivalent to 4.5. fx N fx B D h,n,l,r = LR γ = LR Ω N, 4.7 GOGM and Thm. 5 reduce to OGM and its bounds 3.4 and 3.5, when{ θi = Ω i for all i. Similar i+a a to general forms of FGM in [3,6], the GOGM family includes the choice θ i =, i < N, N+a a, i = N for any a, because such parameter θ i satisfies the following conditions for GOGM: Ω i θ i = i+i+a a i+a a = a i +aa 3 a 0, 4.8 for i < N, and Ω N θ N = Ω N + θ N θ N θ N + θ N θ N = θ N 0. Similarly, the GOGM family includes the choice t i = i+a a for any a, which we denote as OGM-a. Corollary Let f : R d R be F L R d and let y 0,,y N R d be generated by GOGM with t i = i+a a OGM-a for any a. Then for any i N, fy i fx alr ii+a. 4.9 Proof Thm. 5 implies 4.9, since T i = i+i+a a any a. and the condition T i t i 0 satisfies in 4.8 for 5 Relaxation and optimization of the gradient form of PEP This sectionanalyzesaworst-casebound forthe gradientofanygogm and GOGM usingthe gradient form of PEP. We use relaxations on the gradient form of PEP that are similar but slightly different from those of PEP for the cost function in the previous section. Using this relaxed PEP, we prove that FGM has an O/N.5 rate for the worst-case gradient decrease, and analyze the worst-case gradient bound for the GOGM. Then, we optimize the step coefficients with respect to the gradient form of PEP and propose an algorithm named OGM-OG that lies in the GOGM family and that has the best known analytical worst-case bound for decreasing the gradient norm among the class FSFOM. 3

14 5. Relaxation for the gradient form of PEP To analyze a worst-case bound on the gradient for a FSFOM with a given h, we consider the following gradient-form version of PEP that is similar to P: B P h,n,d,l,r := max f F LR d, x 0,,x N R d, x X f, x 0 x R s.t. x i+ = x i L min fx i P i {0,...,N} i h i+,k fx k, i = 0,...,N. Here, we use the smallest gradient norm squared among all iterates min i {0,...,N} fx i as a criteria,asconsideredin[3,sec.4.3].wecouldinsteadconsiderthefinalgradientnormsquared fx N asacriteria,but ourproposedrelaxationonp inthis sectionforsuchcriteriaprovidedonlyano/n worst-case bound at best even for the corresponding optimized step coefficients results not shown; we leave studying the gradient form of the tight PEP as future work. As in [3], we replace min i {0,...,N} fx i in P by L x 0 x α with the condition α L x 0 x fx i = Tr { G u i u i G} for all i. Then, we relax this reformulated P similar to the relaxation from P to P with the additional constraint Tr { G C N G } δ N in P as follows: B P h,n,d,l,r := max G R N+d, δ R N+, α R L R α P s.t. Tr { G A i,i hg } δ i δ i, i =,...,N, Tr { G C N G } δ N, Tr { G D i hg+νu i G } δ i, i = 0,...,N, Tr { G u i u i G} α, i = 0,...,N. Replacing max G,δ L R α by min G,δ { α} for convenience, the Lagrangian of the corresponding constrainedminimizationproblemwithdualvariablesλ R N +,η R +,τ R N+ +,andβ = β 0,,β N R N+ + for the first, second, third, and fourth set of constraint inequalities of P respectively becomes where N N N L G,δ,λ,η,τ,β;h = α+ λ i δ i δ i ηδ N + τ i δ i + β i α 5. i= i=0 i=0 +Tr { G S h,λ,η,τ,βg+ντ G }, S h,λ,η,τ,β := Sh,λ,τ+ N ηu Nu N β i u i u i. 5. Then similar to D, we have the following dual problem of P that one could use to compute an upper bound of the PEP P of the smallest gradient norm squared among all iterates by a numerical SDP solver: S h,λ,η,τ,β τ where B D h,n,l,r := Λ := { min λ,η,τ,β Λ, L R γ : γ R { λ,η,τ,β R 3N+3 + : i=0 τ γ } 0, D N τ 0 = λ, λ N +τ N = η, i=0 β } i =,. 5.3 λ i λ i+ +τ i = 0, i =,...,N The next two sections use a valid upper bound D of P for given step coefficients h, providing an analytical solution to D for the step coefficients h of FGM and GOGM, superseding the use of a SDP solver. 4

15 5. A worst-case bound for the gradient norm of FGM FGM is equivalent to a FSFOM with the step coefficients [5, Prop. ]: t h i+,k = i+ t k i j=k+ h j,k, k = 0,...,i, + ti t i+, k = i, 5.4 for t i in 3.3. The following lemma provides a feasible point of D associated with the step coefficients 5.4 of FGM to provide a worst-case bound for the gradient of FGM. Lemma 4 For the step coefficients 5.4, the following choice of variables: γ = τ 0, λ i = t N, i τ 0, i =,...,N, τ i = k t i = 0, t i τ 0, i =,...,N, 5.5 η = t N τ 0, β i = t i τ 0, i = 0,...,N, 5.6 is a feasible point of D for t i in 3.3. Proof See Appendix E. Using Lemma 4, the following theorem bounds the gradient norm of the FGM iterates, proving for the first time an O/N.5 rate of decrease. Theorem 6 Let f : R d R be F L R d and let y 0,,y N,x 0,,x N R d be generated by FGM. Then for any N, where y N+ = x N L fx N. min fy i min fx i 5.7 i {0,...,N+} i {0,...,N} LR N t k 3LR N +N +6N +, Proof Lemma implies the first inequality in 5.7. Using Lemma 4, the FSFOM with the step coefficients h 5.4 of FGM satisfies min fx i B D h,n,l,r = i {0,...,N} L R γ = L R N, 5.8 t k which is equivalent to 5.7 using N t k N k+ 4 = N+N +3N A worst-case bound for the gradient norm of GOGM Having established the gradient bound 5.7 for FGM, this section and the next seek to improve on it by studying GOGM. To bound the gradient decrease of GOGM and GOGM, the following lemma illustrates one possible set of feasible points of D. Lemma 5 For the step coefficients 4., the following choice of variables: γ = N τ 0, λ i = T i τ 0, i =,...,N, τ i = Tk tk, i = 0, t i τ 0, i =,...,N, 5.9 η = T N τ 0, β i = T i t i τ0, i = 0,...,N 5.0 is a feasible point of D for any choice of t i and T i that satisfies 4.3 and for which there exists some i such that t i < T i. 5

16 Proof See Appendix F. Using Lemma 5, the following theorem bounds the worst-case gradient norm for the iterates of GOGM and GOGM. Theorem 7 Let f : R d R be F L R d and let y 0,,y N,x 0,,x N R d be generated by GOGM. Then for any N, min fy i min fx i i {0,...,N+} i {0,...,N} LR N, 5. T k t k where y N+ = x N L fx N. The bound 5. can be generalized to the intermediate iterates {x i } N i=0 and {y i } N i=0 of both GOGM and GOGM when θ i = t i for i < N. Proof Lemma implies the first inequality in 5.. Using Lemma 5, FSFOM with the step coefficients h 4. of GOGM satisfies min fx i B D h,n,l,r = i {0,...,N} L R L R γ = 4 N T k t k, which implies 5.. Since the iterates of GOGM are recursive and do not depend on a given N, the bound 5. easily generalizes to the intermediate iterates of GOGM and GOGM when θ i = t i. 5.4 Optimizing step coefficients over the gradient form of PEP In search of a FSFOM that decreases the gradient norm the fastest, we optimize the step coefficients in terms of the gradient form of the relaxed D by solving the following problem: ĥ D := argmin h R NN+/ B D h,n,l,r. HD The problem HD is bilinear, similar to HD, and a convex relaxation technique [9, Thm. 3] makes this problem solvable using numerical methods. We solved HD for many choices of N using a numerical SDP solver [4,], and observed that the following choice of t i :, i = 0, t i = + +4t i, i =,..., N/, N i+, i = N/,...,N, 5. makes the feasible point in Lemma 5 optimal for the problem HD. Based on that numerical evidence, we conjecture that ĥd in HD corresponds to the step coefficients 4. with the parameter t i 5.. The t i factors in 5. start decreasing after i = N/, whereas the usual t i in 3.3 and t i = i+a a for any a increase with i indefinitely. In addition, we found numerically that minimizing the gradient bound 5. of GOGM, i.e., solving the following constrained quadratic problem: k max {t i} N l=0 t l t k s.t. t i satisfies 4.3 for all i, 5.3 is equivalent to solving the problem HD. In other words, the solution of 5.3 numerically appears equivalent to 5., the conjectured solution of HD. The unconstrained maximizer of the cost function of 5.3 is t i = N i+, and this term partially appears in the constrained maximizer 5. for N/ i N. We denote the resulting GOGM with 5. as OGM-OG OG for optimized over gradient. The following theorem bounds the cost function and gradient norm of the OGM-OG iterates. 6

17 Theorem 8 Let f : R d R be F L R d and let y 0,,y N,x 0,,x N R d be generated by OGM-OG. Then, where y N+ = x N L fx N. fy N+ fx LR N + min fy i min fx i i {0,...,N+} i {0,...,N}, 5.4 6LR N N +, 5.5 Proof OGM-OG is an instance of GOGM and thus Thm. 5 implies that OGM-OG satisfies which is equivalent to 5.4, since T N = T m + fy N+ fx LR 4T N, N l=m+ t l = t m N +3 +NN N mn m+ 4 = N +8N +9 6 for m = N/, using m N, N m N, and t m m+ N+3 4 in 3.3. Thm. 7 implies 5.5, using the above inequalities for m, the equality t i = T i for i m, and N k=m+ =N mt m + Tk t k = N N k=m+ N m =N mt m + k = k=m+ k k l=m+ l = t m + k l=m+ N l+ N l m+ =N mt m N m N m+k kk k= =N mt m + N m k= k t l t k N k + N k m+ N m+ N m+k +k 4 N m+ +N m+3/4k 4 =N mt N mn m+/n m+ m 6 N mn m+3/4n m+ N mn m+ + 4 N mm+ + N m N m+ N mn m N m N m+ 3 4 N N +. The gradient bound 5.5 of OGM-OG is asymptotically -times smaller than that of FGM in Thm. 6 and.5-times smaller than that of OGM-m = N/3 in Thm. 4. Regarding the cost function decrease, the bound 5.4 of OGM-OG is asymptotically the same as the bound 3. of FGM, and both are twice larger than the bounds 3.4 and 3.5 of OGM. 7

18 5.5 Decreasing the gradient norm with rate O/N.5 using GOGM without selecting N in advance Although OGM-OG satisfies a small worst-case gradient bound with a rate O/N.5, OGM-OG and FGM-m and OGM-m must select N in advance, unlike FGM. Using Thm. 7, the following corollary shows that OGM-a with a > can decrease the gradient with a rate O/N.5 without selecting N in advance. Cor. showed that OGM-a algorithm with a can decrease the cost function with an optimal rate O/N. Corollary Let f : R d R be F L R d and let y 0,,y N,x 0,,x N R d be generated by GOGM with t i = i+a a OGM-a for any a. Then for N, where y N+ = x N L fx N. min fy i min fx i 5.6 i {0,...,N+} i {0,...,N} a 6LR NN +a N +3a 4a, Proof Using T i = i+i+a a and 4.8, Thm. 7 implies 5.6 since N N T k t k = a k +aa 3k a = NN + a N +3a 4a 6a. a OGM-a forany a > hasagradientbound 5.6 that is about -times largerthan the bound 5.5 a of OGM-OG. This constant factor minimizes to when a = 4, and this OGM-a=4 has a worst-case gradient bound that is asymptotically equivalent to the bound 5.7 of FGM. Therefore, when one does not want to select N in advance, both FGM and OGM-a=4 and OGM-a for any a > will be useful for decreasing the gradient with a rate O/N.5. 6 Discussion This section summarizes analytical worst-case bounds of FSFOM discussed in the previous sections. This section also reports tight numerical worst-case bounds for exact comparison of algorithms because many of the analytical bounds are not guaranteed to be tight. 6. Summary of analytical worst-case bounds on the cost function and gradient norm Table summarizes the asymptotic rate of analytical worst-case bounds of all algorithms described in this paper. As discussed, OGM and OGM-OG have the best known worst-case bounds for the cost function and gradient decrease respectively in Table. However, since OGM has a slow worst-case rate for the gradient decrease, other algorithms such as FGM, OGM-m, OGM-OG, and OGM-a that satisfy both the optimal rate O/N for the function decrease and a fast rate O/N.5 for the gradient decrease could be preferableoverogm when one is interested in both the gradientdecrease as well as the function decrease, particularly when solving dual problems. In addition, when one does not want to choose N in advance, FGM and OGM-a could be preferable. 6. Tight worst-case bounds on the cost function and the gradient norm Since many worst-case bounds presented in Table are not guaranteed to be tight, we used the code in Taylor et al. [3] with SDP solvers [0,8] to compare tight numerical worst-case bounds for N =,,4,0,0,30,40,47,50. These numerical worst-case bounds are guaranteed to be tight, i.e., equivalent to the bounds of either P or P, when the large-scale condition d N + is satisfied [3, Thm. 5], and we assume this condition hereafter. Tables and 3 provide tight worst-case bounds for 8

19 Asymptotic worst-case bound Require selecting Algorithm Cost function Gradient norm N in advance GM 4 N N No FGM N 3N.5 No OGM N N No 9 OGM-m= N/3 4 N 3 6 N.5 Yes OGM-OG N 6N.5 Yes a OGM-a a > N a 6 a N.5 No OGM-a=4 N 3N.5 Table Asymptotic worst-case bounds on the cost function LR fx N fx and the gradient norm min i {0,...,N} LR fx i of GM, FGM, OGM, OGM-m, OGM-OG, and OGM-a. The worst-case cost function bound for OGM-m in the table corresponds to the bound for OGM after m iterations, because we do not have an analytical bound for the final iterate. the decrease of the cost function fx N fx and the gradient norm decrease min i {0,...,N} fx i respectively. Although most of the bounds in Table are not guaranteed to be tight, the worst-case rate formulas in Table are similar to the tight numerical results in Tables and 3, except that the gradient bounds of OGM-m in Table are relatively looser than those of OGM-m in Table 3. In particular, the tight numerical gradient bound of OGM-m is smaller than that of FGM in Table 3, which was not expected from their known possibly loose analytical bounds in Table. N GM FGM OGM OGM-m OGM-OG OGM-a Empi. O N.0 N.9 N.9 N.8 N.8 N.8 Known O N N N N N N LR Table Tight worst-case bounds on, the reciprocal of the cost function, of GM, FGM, OGM, OGMm= N/3, OGM-OG, and OGM-a=4. We computed empirical rates by assuming that the bounds follow the form bn c fx N fx with constants b and c, and then by estimating c from points N = 47,50. Note that the corresponding empirical rates are underestimated due to its simple modeling of bounds. N GM FGM OGM OGM-m OGM-OG OGM-a Empi. O N.0 N.4 N 0.9 N.4 N.4 N.4 Known O N N.5 N N.5 N.5 N.5 LR Table 3 Tight worst-case bounds on, the reciprocal of the gradient norm, of GM, FGM, OGM, min i {0,...,N} fx i OGM-m= N/3, OGM-OG, and OGM-a=4. Empirical rates were computed as described in Table. 9

20 6.3 Tight worst-case bounds on the gradient norm at the final iterate To be clear, Sec. 5 and Tables, 3 have focused on analyzing the smallest gradient norm among all iterates using the gradient form of PEP, whereas the gradient analysis in Sec. 3. considers the final gradient in addition to the smallest gradient among all iterates. As mentioned before, we have not yet found a relaxation on the final gradient form of the PEP that provides as comparable results as for the relaxation on the smallest gradient form of the PEP P in Sec. 5. To complete comparisons on the worst-case gradient bounds, Table 4 uses the code provided by Taylor et al. [3] with SDP solvers [0, 8] to compare tight numerical worst-case bounds on the final gradient of the FSFOMs presented in this paper. 5 The worst-case smallest gradient norm bounds 3. and 3.6 of GM and OGM-m respectively among algorithms considered extend to the final gradient bounds. In Table 4, FGM and OGM-a=4 have slow O/N tight worst-case bounds on the final gradient, unlikeogm-m= N/3 andogm-ogroughlyhavingo/n.5 boundsforboth the smallestand final gradients. Thm. 4 has shown that the final gradient of OGM-m satisfies a worst-case rate O/N.5, but this is unknown yet for OGM-OG, which we leave as future work. We also leave as future work the challenge of developing a FSFOM that has O/N.5 or even faster worst-case rates for the final gradient decrease that are lower than those of OGM-m and OGM-OG, possibly without requiring to choose N in advance. N GM FGM OGM OGM OGM OGM-a y N x N y N x N -m -OG y N x N Empi. O N.0 N 0.9 N 0.9 N.4 N.3 N 0.9 Known O N N N N.5 N N LR LR and fx N fy N Table 4 Tight worst-case bounds on, the reciprocal of the final gradient norm, of GM, FGM, OGM, OGM-m= N/3, OGM-OG, and OGM-a=4. Empirical rates were computed as described in Table. The known bounds of OGM-OG and OGM-a=4 are derived based on Sec. 3., where the empirical bounds of OGM-OG are comparable to the bounds of OGM-m= N/3 with known rate O/N Non-optimality of OGM-OG in terms of the worst-case gradient bound Because OGM is optimal in terms of the function decrease when d N + [7], one might hope that the OGM-OG would achieve the optimal worst-case bound in terms of the gradient decrease, since OGM-OG is also derived by optimizing the step coefficients over the gradient form of relaxed PEP. However, the OGM-OG is apparently not optimal as explained next. Taylor et al. [3] numerically studied an optimal fixed-step GM using their tight PEP in terms of both the cost function and gradient decrease. In other words, they searched for an optimal step h of GM: x i+ = x i h L fx i for i = 0,...,N and a given N with respect to either fx N fx or fx N. In the special case of N =, Taylor et al. [3] numerically conjectured that the step size h =.5 is optimal in terms of the 5 Table 4 reports tight worst-case gradient bounds for both the final primary iterate y N and the final secondary iterate x N if necessary, unlike Tables and 3. We observed that numerical tight worst-case cost function bounds on both final iterates y N and x N have similar values for the algorithms in Table unlike Table 4, so we did not report the bounds on y N for simplicity. We also did not report numerical tight smallest worst-case gradient norm bounds min i {0,...,N} fy i of the primary iterates {y i } because the code provided by Taylor et al. [3] does not support computing their values, unlike that of the secondary iterates {x i } in Table 3. 0

21 cost function decrease. The corresponding GM is equivalent to OGM for N =, and this numerically confirms the optimality of OGM [7] for N =. They also numerically conjectured that the optimal step size of GM for N = in terms of the gradient decrease is h = with a worst-case bound fx LR + LR However, OGM-OG for N = reduces to GM with h = 4 LR 3.3 with a bound.3 in Tables 3 and 4, implying that OGM-OG is not optimal even for N = based on the numerical evidence in [3]. This analysis for N = illustrates that there is still room for improvement in accelerating the worstcase rate of first-order methods in terms of gradients, which we leave as future work possibly with a tighter relaxation on the gradient form of PEP. In addition, we leave as future work studying the optimal worst-case bound for the gradient decrease of first-order methods building upon [7, 3], and developing a FSFOM that achieves such optimal bound. Nevertheless, the OGM-OG is the best known FSFOM for decreasing the gradient norm among the class FSFOM, and will be useful when decreasing the gradient is key. 7 Conclusion We generalized the formulation of OGM and analyzed its worst-case bounds on the function value and gradient, using the cost function form and the gradient form of relaxed PEP. We then proposed OGM-OG by optimizing the step coefficients of FSFOM using a relaxed PEP with respect to the gradient, similar to the development of the optimal OGM. To the best of our knowledge, the worst-case bound on the gradient of the OGM-OG is the best known analytical worst-case bound for decreasing the gradient norm among the class FSFOM. However, this OGM-OG is not optimal for decreasing the gradient norm, and further accelerating the worst-case rate of FSFOM in terms of the gradient possibly with a tight relaxation on the gradient form of PEP is a possible research direction. On the other hand, deriving an optimal worst-case bound for the gradient norm of first-order methods, similar to that for the function decrease [7] will be useful. Nonetheless, the proposed OGM-OG and OGM-a may be useful when one finds minimizing gradients important, particularly in dual problems. In addition, we used the proposed gradient form of PEP to show that FGM decreases the smallest gradient with a rate O/N.5, implying that FGM is comparable in a big-o sense to OGM-m, OGM-OG and OGM-a for the gradient decrease. Our analysis considers unconstrained smooth convex minimization; extending such gradient norm worst-case analysis to constrained problems or nonsmooth composite convex problems is a natural direction to pursue, which is studied for FGM or FISTA [] by the authors [7]. In addition, extending the analyses on the general form of FGM in [,3,9] to GOGM is a possible research direction. Lastly, investigating a new relaxation of the PEP approach that allows adaptive step size such as backtracking line-search or exact line-search [5, 8] is of interest. Software has Matlab codes for the algorithms considered and the SDP approaches in Sec. 5.4 and Sec. 6. A Proof of Thm. 3 Due to 3.0, we have max f F L R d, x Xf, x 0 x R min fx i i {0,...,N} max f F L R d, x Xf, x 0 x R fx N LR θ N,

arxiv: v5 [math.oc] 23 Jan 2018

arxiv: v5 [math.oc] 23 Jan 2018 Another look at the fast iterative shrinkage/thresholding algorithm FISTA Donghwan Kim Jeffrey A. Fessler arxiv:608.086v5 [math.oc] Jan 08 Date of current version: January 4, 08 Abstract This paper provides

More information

Optimized first-order methods for smooth convex minimization

Optimized first-order methods for smooth convex minimization Noname manuscript No. will be inserted by the editor) Optimized first-order methods for smooth convex minimization Donghwan Kim Jeffrey A. Fessler Date of current version: September 9, 05 Abstract We introduce

More information

A New Look at the Performance Analysis of First-Order Methods

A New Look at the Performance Analysis of First-Order Methods A New Look at the Performance Analysis of First-Order Methods Marc Teboulle School of Mathematical Sciences Tel Aviv University Joint work with Yoel Drori, Google s R&D Center, Tel Aviv Optimization without

More information

Optimized first-order minimization methods

Optimized first-order minimization methods Optimized first-order minimization methods Donghwan Kim & Jeffrey A. Fessler EECS Dept., BME Dept., Dept. of Radiology University of Michigan web.eecs.umich.edu/~fessler UM AIM Seminar 2014-10-03 1 Disclosure

More information

Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems)

Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Donghwan Kim and Jeffrey A. Fessler EECS Department, University of Michigan

More information

Adaptive Restart of the Optimized Gradient Method for Convex Optimization

Adaptive Restart of the Optimized Gradient Method for Convex Optimization J Optim Theory Appl (2018) 178:240 263 https://doi.org/10.1007/s10957-018-1287-4 Adaptive Restart of the Optimized Gradient Method for Convex Optimization Donghwan Kim 1 Jeffrey A. Fessler 1 Received:

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

arxiv: v2 [math.oc] 28 Nov 2017

arxiv: v2 [math.oc] 28 Nov 2017 Adaptive Restart of the Optimized Gradient Method for Convex Optimization Donghwan Kim Jeffrey A. Fessler arxiv:703.0464v [math.oc] 8 Nov 07 Date of current version: November 9, 07 Abstract First-order

More information

Performance of first-order methods for smooth convex minimization: a novel approach

Performance of first-order methods for smooth convex minimization: a novel approach Performance of first-order methods for smooth convex minimization: a novel approach arxiv:206.3209v [math.oc] 4 Jun 202 Yoel Drori and Marc Teboulle June 5, 202 Abstract We introduce a novel approach for

More information

Fast proximal gradient methods

Fast proximal gradient methods L. Vandenberghe EE236C (Spring 2013-14) Fast proximal gradient methods fast proximal gradient method (FISTA) FISTA with line search FISTA as descent method Nesterov s second method 1 Fast (proximal) gradient

More information

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume

More information

Coordinate Descent and Ascent Methods

Coordinate Descent and Ascent Methods Coordinate Descent and Ascent Methods Julie Nutini Machine Learning Reading Group November 3 rd, 2015 1 / 22 Projected-Gradient Methods Motivation Rewrite non-smooth problem as smooth constrained problem:

More information

Gradient methods for minimizing composite functions

Gradient methods for minimizing composite functions Gradient methods for minimizing composite functions Yu. Nesterov May 00 Abstract In this paper we analyze several new methods for solving optimization problems with the objective function formed as a sum

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

On Nesterov s Random Coordinate Descent Algorithms - Continued

On Nesterov s Random Coordinate Descent Algorithms - Continued On Nesterov s Random Coordinate Descent Algorithms - Continued Zheng Xu University of Texas At Arlington February 20, 2015 1 Revisit Random Coordinate Descent The Random Coordinate Descent Upper and Lower

More information

On Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence:

On Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence: A Omitted Proofs from Section 3 Proof of Lemma 3 Let m x) = a i On Acceleration with Noise-Corrupted Gradients fxi ), u x i D ψ u, x 0 ) denote the function under the minimum in the lower bound By Proposition

More information

1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method

1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method L. Vandenberghe EE236C (Spring 2016) 1. Gradient method gradient method, first-order methods quadratic bounds on convex functions analysis of gradient method 1-1 Approximate course outline First-order

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Optimization Tutorial 1. Basic Gradient Descent

Optimization Tutorial 1. Basic Gradient Descent E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

More information

Unconstrained minimization of smooth functions

Unconstrained minimization of smooth functions Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

Chapter 8 Optimization basics

Chapter 8 Optimization basics Chapter 8 Optimization basics Contents (class version) 8.0 Introduction........................................ 8.2 8.1 Preconditioned gradient descent (PGD) for LS..................... 8.3 Tool: Matrix

More information

Cubic regularization of Newton s method for convex problems with constraints

Cubic regularization of Newton s method for convex problems with constraints CORE DISCUSSION PAPER 006/39 Cubic regularization of Newton s method for convex problems with constraints Yu. Nesterov March 31, 006 Abstract In this paper we derive efficiency estimates of the regularized

More information

Gradient methods for minimizing composite functions

Gradient methods for minimizing composite functions Math. Program., Ser. B 2013) 140:125 161 DOI 10.1007/s10107-012-0629-5 FULL LENGTH PAPER Gradient methods for minimizing composite functions Yu. Nesterov Received: 10 June 2010 / Accepted: 29 December

More information

EE 227A: Convex Optimization and Applications October 14, 2008

EE 227A: Convex Optimization and Applications October 14, 2008 EE 227A: Convex Optimization and Applications October 14, 2008 Lecture 13: SDP Duality Lecturer: Laurent El Ghaoui Reading assignment: Chapter 5 of BV. 13.1 Direct approach 13.1.1 Primal problem Consider

More information

arxiv: v2 [math.oc] 21 Nov 2017

arxiv: v2 [math.oc] 21 Nov 2017 Unifying abstract inexact convergence theorems and block coordinate variable metric ipiano arxiv:1602.07283v2 [math.oc] 21 Nov 2017 Peter Ochs Mathematical Optimization Group Saarland University Germany

More information

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London www.doc.ic.ac.uk/ pp500 p.parpas@imperial.ac.uk jointly with D.V.

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 5. Convex programming and semidefinite programming

E5295/5B5749 Convex optimization with engineering applications. Lecture 5. Convex programming and semidefinite programming E5295/5B5749 Convex optimization with engineering applications Lecture 5 Convex programming and semidefinite programming A. Forsgren, KTH 1 Lecture 5 Convex optimization 2006/2007 Convex quadratic program

More information

Adaptive Restarting for First Order Optimization Methods

Adaptive Restarting for First Order Optimization Methods Adaptive Restarting for First Order Optimization Methods Nesterov method for smooth convex optimization adpative restarting schemes step-size insensitivity extension to non-smooth optimization continuation

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

Gradient Descent. Ryan Tibshirani Convex Optimization /36-725

Gradient Descent. Ryan Tibshirani Convex Optimization /36-725 Gradient Descent Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: canonical convex programs Linear program (LP): takes the form min x subject to c T x Gx h Ax = b Quadratic program (QP): like

More information

Block Coordinate Descent for Regularized Multi-convex Optimization

Block Coordinate Descent for Regularized Multi-convex Optimization Block Coordinate Descent for Regularized Multi-convex Optimization Yangyang Xu and Wotao Yin CAAM Department, Rice University February 15, 2013 Multi-convex optimization Model definition Applications Outline

More information

Lecture 5: Gradient Descent. 5.1 Unconstrained minimization problems and Gradient descent

Lecture 5: Gradient Descent. 5.1 Unconstrained minimization problems and Gradient descent 10-725/36-725: Convex Optimization Spring 2015 Lecturer: Ryan Tibshirani Lecture 5: Gradient Descent Scribes: Loc Do,2,3 Disclaimer: These notes have not been subjected to the usual scrutiny reserved for

More information

A General Framework for Convex Relaxation of Polynomial Optimization Problems over Cones

A General Framework for Convex Relaxation of Polynomial Optimization Problems over Cones Research Reports on Mathematical and Computing Sciences Series B : Operations Research Department of Mathematical and Computing Sciences Tokyo Institute of Technology 2-12-1 Oh-Okayama, Meguro-ku, Tokyo

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

4. Algebra and Duality

4. Algebra and Duality 4-1 Algebra and Duality P. Parrilo and S. Lall, CDC 2003 2003.12.07.01 4. Algebra and Duality Example: non-convex polynomial optimization Weak duality and duality gap The dual is not intrinsic The cone

More information

Accelerated primal-dual methods for linearly constrained convex problems

Accelerated primal-dual methods for linearly constrained convex problems Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize

More information

Relaxed linearized algorithms for faster X-ray CT image reconstruction

Relaxed linearized algorithms for faster X-ray CT image reconstruction Relaxed linearized algorithms for faster X-ray CT image reconstruction Hung Nien and Jeffrey A. Fessler University of Michigan, Ann Arbor The 13th Fully 3D Meeting June 2, 2015 1/20 Statistical image reconstruction

More information

Homework 5. Convex Optimization /36-725

Homework 5. Convex Optimization /36-725 Homework 5 Convex Optimization 10-725/36-725 Due Tuesday November 22 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

SIAM Conference on Imaging Science, Bologna, Italy, Adaptive FISTA. Peter Ochs Saarland University

SIAM Conference on Imaging Science, Bologna, Italy, Adaptive FISTA. Peter Ochs Saarland University SIAM Conference on Imaging Science, Bologna, Italy, 2018 Adaptive FISTA Peter Ochs Saarland University 07.06.2018 joint work with Thomas Pock, TU Graz, Austria c 2018 Peter Ochs Adaptive FISTA 1 / 16 Some

More information

Iteration-complexity of first-order penalty methods for convex programming

Iteration-complexity of first-order penalty methods for convex programming Iteration-complexity of first-order penalty methods for convex programming Guanghui Lan Renato D.C. Monteiro July 24, 2008 Abstract This paper considers a special but broad class of convex programing CP)

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

Lecture 15 Newton Method and Self-Concordance. October 23, 2008

Lecture 15 Newton Method and Self-Concordance. October 23, 2008 Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications

More information

Lecture 9 Sequential unconstrained minimization

Lecture 9 Sequential unconstrained minimization S. Boyd EE364 Lecture 9 Sequential unconstrained minimization brief history of SUMT & IP methods logarithmic barrier function central path UMT & SUMT complexity analysis feasibility phase generalized inequalities

More information

Projection methods to solve SDP

Projection methods to solve SDP Projection methods to solve SDP Franz Rendl http://www.math.uni-klu.ac.at Alpen-Adria-Universität Klagenfurt Austria F. Rendl, Oberwolfach Seminar, May 2010 p.1/32 Overview Augmented Primal-Dual Method

More information

Stochastic Proximal Gradient Algorithm

Stochastic Proximal Gradient Algorithm Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind

More information

Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming

Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming Mathematical Programming manuscript No. (will be inserted by the editor) Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming Guanghui Lan Zhaosong Lu Renato D. C. Monteiro

More information

Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles

Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Arkadi Nemirovski H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology Joint research

More information

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725 Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h

More information

r=1 r=1 argmin Q Jt (20) After computing the descent direction d Jt 2 dt H t d + P (x + d) d i = 0, i / J

r=1 r=1 argmin Q Jt (20) After computing the descent direction d Jt 2 dt H t d + P (x + d) d i = 0, i / J 7 Appendix 7. Proof of Theorem Proof. There are two main difficulties in proving the convergence of our algorithm, and none of them is addressed in previous works. First, the Hessian matrix H is a block-structured

More information

arxiv: v1 [math.oc] 21 Apr 2016

arxiv: v1 [math.oc] 21 Apr 2016 Accelerated Douglas Rachford methods for the solution of convex-concave saddle-point problems Kristian Bredies Hongpeng Sun April, 06 arxiv:604.068v [math.oc] Apr 06 Abstract We study acceleration and

More information

Proximal-like contraction methods for monotone variational inequalities in a unified framework

Proximal-like contraction methods for monotone variational inequalities in a unified framework Proximal-like contraction methods for monotone variational inequalities in a unified framework Bingsheng He 1 Li-Zhi Liao 2 Xiang Wang Department of Mathematics, Nanjing University, Nanjing, 210093, China

More information

Lecture 8: February 9

Lecture 8: February 9 0-725/36-725: Convex Optimiation Spring 205 Lecturer: Ryan Tibshirani Lecture 8: February 9 Scribes: Kartikeya Bhardwaj, Sangwon Hyun, Irina Caan 8 Proximal Gradient Descent In the previous lecture, we

More information

Math 273a: Optimization Subgradient Methods

Math 273a: Optimization Subgradient Methods Math 273a: Optimization Subgradient Methods Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com Nonsmooth convex function Recall: For ˉx R n, f(ˉx) := {g R

More information

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 30 Notation f : H R { } is a closed proper convex function domf := {x R n

More information

Agenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples

Agenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples Agenda Fast proximal gradient methods 1 Accelerated first-order methods 2 Auxiliary sequences 3 Convergence analysis 4 Numerical examples 5 Optimality of Nesterov s scheme Last time Proximal gradient method

More information

LIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS

LIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS LIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS Napsu Karmitsa 1 Marko M. Mäkelä 2 Department of Mathematics, University of Turku, FI-20014 Turku,

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

Lecture 5: September 15

Lecture 5: September 15 10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 15 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Di Jin, Mengdi Wang, Bin Deng Note: LaTeX template courtesy of UC Berkeley EECS

More information

Determinant maximization with linear. S. Boyd, L. Vandenberghe, S.-P. Wu. Information Systems Laboratory. Stanford University

Determinant maximization with linear. S. Boyd, L. Vandenberghe, S.-P. Wu. Information Systems Laboratory. Stanford University Determinant maximization with linear matrix inequality constraints S. Boyd, L. Vandenberghe, S.-P. Wu Information Systems Laboratory Stanford University SCCM Seminar 5 February 1996 1 MAXDET problem denition

More information

Solving large Semidefinite Programs - Part 1 and 2

Solving large Semidefinite Programs - Part 1 and 2 Solving large Semidefinite Programs - Part 1 and 2 Franz Rendl http://www.math.uni-klu.ac.at Alpen-Adria-Universität Klagenfurt Austria F. Rendl, Singapore workshop 2006 p.1/34 Overview Limits of Interior

More information

A Greedy Framework for First-Order Optimization

A Greedy Framework for First-Order Optimization A Greedy Framework for First-Order Optimization Jacob Steinhardt Department of Computer Science Stanford University Stanford, CA 94305 jsteinhardt@cs.stanford.edu Jonathan Huggins Department of EECS Massachusetts

More information

Research Reports on Mathematical and Computing Sciences

Research Reports on Mathematical and Computing Sciences ISSN 1342-284 Research Reports on Mathematical and Computing Sciences Exploiting Sparsity in Linear and Nonlinear Matrix Inequalities via Positive Semidefinite Matrix Completion Sunyoung Kim, Masakazu

More information

Lecture 5: September 12

Lecture 5: September 12 10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 12 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Barun Patra and Tyler Vuong Note: LaTeX template courtesy of UC Berkeley EECS

More information

A Distributed Newton Method for Network Utility Maximization, II: Convergence

A Distributed Newton Method for Network Utility Maximization, II: Convergence A Distributed Newton Method for Network Utility Maximization, II: Convergence Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract The existing distributed algorithms for Network Utility

More information

Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints

Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints By I. Necoara, Y. Nesterov, and F. Glineur Lijun Xu Optimization Group Meeting November 27, 2012 Outline

More information

15-780: LinearProgramming

15-780: LinearProgramming 15-780: LinearProgramming J. Zico Kolter February 1-3, 2016 1 Outline Introduction Some linear algebra review Linear programming Simplex algorithm Duality and dual simplex 2 Outline Introduction Some linear

More information

Convexification of Mixed-Integer Quadratically Constrained Quadratic Programs

Convexification of Mixed-Integer Quadratically Constrained Quadratic Programs Convexification of Mixed-Integer Quadratically Constrained Quadratic Programs Laura Galli 1 Adam N. Letchford 2 Lancaster, April 2011 1 DEIS, University of Bologna, Italy 2 Department of Management Science,

More information

Accelerated Proximal Gradient Methods for Convex Optimization

Accelerated Proximal Gradient Methods for Convex Optimization Accelerated Proximal Gradient Methods for Convex Optimization Paul Tseng Mathematics, University of Washington Seattle MOPTA, University of Guelph August 18, 2008 ACCELERATED PROXIMAL GRADIENT METHODS

More information

Convex Optimization and l 1 -minimization

Convex Optimization and l 1 -minimization Convex Optimization and l 1 -minimization Sangwoon Yun Computational Sciences Korea Institute for Advanced Study December 11, 2009 2009 NIMS Thematic Winter School Outline I. Convex Optimization II. l

More information

Lecture 3: Huge-scale optimization problems

Lecture 3: Huge-scale optimization problems Liege University: Francqui Chair 2011-2012 Lecture 3: Huge-scale optimization problems Yurii Nesterov, CORE/INMA (UCL) March 9, 2012 Yu. Nesterov () Huge-scale optimization problems 1/32March 9, 2012 1

More information

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This

More information

CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS

CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS Igor V. Konnov Department of Applied Mathematics, Kazan University Kazan 420008, Russia Preprint, March 2002 ISBN 951-42-6687-0 AMS classification:

More information

Lecture 9: September 28

Lecture 9: September 28 0-725/36-725: Convex Optimization Fall 206 Lecturer: Ryan Tibshirani Lecture 9: September 28 Scribes: Yiming Wu, Ye Yuan, Zhihao Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These

More information

Lecture 7 Monotonicity. September 21, 2008

Lecture 7 Monotonicity. September 21, 2008 Lecture 7 Monotonicity September 21, 2008 Outline Introduce several monotonicity properties of vector functions Are satisfied immediately by gradient maps of convex functions In a sense, role of monotonicity

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57

More information

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming Zhaosong Lu Lin Xiao March 9, 2015 (Revised: May 13, 2016; December 30, 2016) Abstract We propose

More information

Inderjit Dhillon The University of Texas at Austin

Inderjit Dhillon The University of Texas at Austin Inderjit Dhillon The University of Texas at Austin ( Universidad Carlos III de Madrid; 15 th June, 2012) (Based on joint work with J. Brickell, S. Sra, J. Tropp) Introduction 2 / 29 Notion of distance

More information

Lecture 6: Conic Optimization September 8

Lecture 6: Conic Optimization September 8 IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions

More information

Constrained optimization: direct methods (cont.)

Constrained optimization: direct methods (cont.) Constrained optimization: direct methods (cont.) Jussi Hakanen Post-doctoral researcher jussi.hakanen@jyu.fi Direct methods Also known as methods of feasible directions Idea in a point x h, generate a

More information

Geometric problems. Chapter Projection on a set. The distance of a point x 0 R n to a closed set C R n, in the norm, is defined as

Geometric problems. Chapter Projection on a set. The distance of a point x 0 R n to a closed set C R n, in the norm, is defined as Chapter 8 Geometric problems 8.1 Projection on a set The distance of a point x 0 R n to a closed set C R n, in the norm, is defined as dist(x 0,C) = inf{ x 0 x x C}. The infimum here is always achieved.

More information

Written Examination

Written Examination Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes

More information

CHAPTER 11. A Revision. 1. The Computers and Numbers therein

CHAPTER 11. A Revision. 1. The Computers and Numbers therein CHAPTER A Revision. The Computers and Numbers therein Traditional computer science begins with a finite alphabet. By stringing elements of the alphabet one after another, one obtains strings. A set of

More information

Unconstrained minimization

Unconstrained minimization CSCI5254: Convex Optimization & Its Applications Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions 1 Unconstrained

More information

Gradient Descent. Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh. Convex Optimization /36-725

Gradient Descent. Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh. Convex Optimization /36-725 Gradient Descent Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh Convex Optimization 10-725/36-725 Based on slides from Vandenberghe, Tibshirani Gradient Descent Consider unconstrained, smooth convex

More information

An inexact subgradient algorithm for Equilibrium Problems

An inexact subgradient algorithm for Equilibrium Problems Volume 30, N. 1, pp. 91 107, 2011 Copyright 2011 SBMAC ISSN 0101-8205 www.scielo.br/cam An inexact subgradient algorithm for Equilibrium Problems PAULO SANTOS 1 and SUSANA SCHEIMBERG 2 1 DM, UFPI, Teresina,

More information

Duality in Linear Programs. Lecturer: Ryan Tibshirani Convex Optimization /36-725

Duality in Linear Programs. Lecturer: Ryan Tibshirani Convex Optimization /36-725 Duality in Linear Programs Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: proximal gradient descent Consider the problem x g(x) + h(x) with g, h convex, g differentiable, and

More information

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R

More information

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013 Convex Optimization (EE227A: UC Berkeley) Lecture 15 (Gradient methods III) 12 March, 2013 Suvrit Sra Optimal gradient methods 2 / 27 Optimal gradient methods We saw following efficiency estimates for

More information

The Steepest Descent Algorithm for Unconstrained Optimization

The Steepest Descent Algorithm for Unconstrained Optimization The Steepest Descent Algorithm for Unconstrained Optimization Robert M. Freund February, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 1 Steepest Descent Algorithm The problem

More information

Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations

Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations Oleg Burdakov a,, Ahmad Kamandi b a Department of Mathematics, Linköping University,

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

A Unified Approach to Proximal Algorithms using Bregman Distance

A Unified Approach to Proximal Algorithms using Bregman Distance A Unified Approach to Proximal Algorithms using Bregman Distance Yi Zhou a,, Yingbin Liang a, Lixin Shen b a Department of Electrical Engineering and Computer Science, Syracuse University b Department

More information

Lagrangian-Conic Relaxations, Part I: A Unified Framework and Its Applications to Quadratic Optimization Problems

Lagrangian-Conic Relaxations, Part I: A Unified Framework and Its Applications to Quadratic Optimization Problems Lagrangian-Conic Relaxations, Part I: A Unified Framework and Its Applications to Quadratic Optimization Problems Naohiko Arima, Sunyoung Kim, Masakazu Kojima, and Kim-Chuan Toh Abstract. In Part I of

More information

Lecture 8 Plus properties, merit functions and gap functions. September 28, 2008

Lecture 8 Plus properties, merit functions and gap functions. September 28, 2008 Lecture 8 Plus properties, merit functions and gap functions September 28, 2008 Outline Plus-properties and F-uniqueness Equation reformulations of VI/CPs Merit functions Gap merit functions FP-I book:

More information

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING YANGYANG XU Abstract. Motivated by big data applications, first-order methods have been extremely

More information

CSCI : Optimization and Control of Networks. Review on Convex Optimization

CSCI : Optimization and Control of Networks. Review on Convex Optimization CSCI7000-016: Optimization and Control of Networks Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one

More information

Lecture 14: October 17

Lecture 14: October 17 1-725/36-725: Convex Optimization Fall 218 Lecture 14: October 17 Lecturer: Lecturer: Ryan Tibshirani Scribes: Pengsheng Guo, Xian Zhou Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

Dual Proximal Gradient Method

Dual Proximal Gradient Method Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method

More information