arxiv: v4 [math.oc] 1 Apr 2018
|
|
- Shon Rice
- 5 years ago
- Views:
Transcription
1 Generalizing the optimized gradient method for smooth convex minimization Donghwan Kim Jeffrey A. Fessler arxiv: v4 [math.oc] Apr 08 Date of current version: April 3, 08 Abstract This paper generalizes the optimized gradient method OGM [9, 5, 6] that achieves the optimal worst-case cost function bound of first-order methods for smooth convex minimization [7]. Specifically, this paper studies a generalized formulation of OGM and analyzes its worst-case rates in terms of both the function value and the norm of the function gradient. This paper also develops a new algorithm called OGM-OG that is in the generalized family of OGM and that has the best known analytical worst-case bound with rate O/N.5 on the decrease of the gradient norm among fixed-step first-order methods. This paper also proves that Nesterov s fast gradient method [4,6] has an O/N.5 worstcase gradient norm rate but with constant larger than OGM-OG. The proof is based on the worst-case analysis called Performance Estimation Problem in [9]. Introduction First-order methods are favorable for solving large-scale problems because their computational complexity per iteration depends mildly on the problem dimension. In particular, Nesterov s fast gradient method FGM[4,6]achievestheoptimalworst-caserateO/N fordecreasingsmoothconvexfunctionsafter N iterations [5], and thus has been widely used in large-scale applications. Recently, the optimized gradient method OGM [9, 5, 6] was found to achieve the optimal worst-case cost function bound of first-order methods with either fixed-step or adaptive-step approaches for smooth convex minimization in [7], whereas FGM achieves that bound only up to constant. Building upon [9, 5, 6], this paper presents two different ways of generalizing OGM and its development. First, this paper specifies a parameterized family of algorithms that generalizes OGM, and provides worst-case bounds on the function and gradient norm values for this family. Like the generalized forms of FGM [3, 6] being widely used and studied e.g., [, 3, 9], we believe introducing the generalized OGM here can be potentially useful. Second, this paper optimizes the step coefficients of fixed-step first-order methods with respect to the rate of decrease of the cost function s gradient norm, leading to a new algorithm called OGM-OG OG for optimized over a gradient. This development expands the choice of worst-case rate metrics for optimizing first-order methods in [9, 5, 6] that focused on the cost function decrease leading to OGM. We next briefly review the Performance Estimation Problem PEP [9] that was used in [9,5,6] to develop OGM and that we extensively use throughout the paper. This research was supported in part by NIH grant U0 EB Donghwan Kim Jeffrey A. Fessler Dept. of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 4809 USA kimdongh@umich.edu, fessler@umich.edu There is a backtracking line-search version of FGM [5] that also achieves the optimal worst-case function bound up to constant, which is sometimes more useful than the fixed-step FGM in practice. However, such backtracking line-search version of OGM with a fast worst-case bound is yet unknown unlike the fixed-step OGM [9,5,6], while recently an exact line-search version of OGM is developed in [8].
2 Drori and Teboulle [9] cast a worst-case analysis into an optimization problem called PEP [9] that examines the maximal absolute cost function inaccuracy over all possible inputs cost functions to the optimization algorithm. See e.g., [5,8,0,5,6,7,8,9,30,3] for its extensions. Moreover, Drori and Teboulle [9] optimized numerically the step coefficients of first-order methods using PEP for smooth convex minimization, and found an algorithm whose worst-case bound is lower than that of FGM, but it required too much computation and memory to be appealing for large-scale problems. Building on their work, the authors [5, 6] found computationally and memory-wise efficient version, called OGM, and showed analytically that OGM satisfies an analytical worst-case bound that is twice smaller than that of FGM. Drori [7] showed that the OGM is optimal for large-dimensional smooth convex minimization over a general class of first-order methods with either fixed or adaptive step sizes [7]. This OGM has been numerically extended for nonsmooth composite convex problems in [30]. In addition, this OGM-type algorithm was already studied in the context of a proximal point method [3, Appendix]. Using the PEP approach[9], this paper proposes a generalized version of OGMGOGM and analyzes its worst-case rates in terms of both the decrease of the cost function and the decrease of the norm of the gradient of the cost function. The results complement the worst-case analysis of the OGM [5, 6], and expands our understanding of OGM-type first-order methods. This paper analyzes the worst-case rate of the gradient norm in addition to that of the cost function because it is important when dealing with dual problems, considering that the dual gradient norm corresponds to the primal distance to feasibility see e.g., [6,, 7]. While FGM has not been shown previously to satisfy a rate O/N.5 for decreasing the gradient norm, modified versions of FGM with such rate were studied in [,,7]. This paper proves that FGM in fact does have that rate, building upon [3] that numerically conjectured such rate for FGM using the gradient norm version of PEP. For further acceleration of the worst-case gradient norm rate, we optimize the step coefficients of first-order methods with respect to the gradient norm using PEP and propose an algorithm named OGM-OG that belongs to the GOGM family and has the best known analytical worst-case bound on rate of decrease of the gradient norm among fixed-step first-order methods. One can extend some aspects of the approaches for generalizing OGM described in this paper to other optimization algorithms and problems. One direction we have already taken in [7] aims to improve the fast iterative shrinkage/thresholding algorithm FISTA [] that reduces to FGM for smooth convex problems for nonsmooth composite convex problems. Naturally, this paper and [7] use some similar approaches, but they are different; the methods in [7] when simplified to the smooth case correspond to a generalization of FGM that differs from the GOGM. Another direction we have recently taken in [8] focuses on optimizing the step coefficients of first-order methods with respect to the gradient norm under the initial bounded function condition that is different from the initial bounded distance condition used in this paper. Sec. defines the smooth convex problem and the first-order methods. Sec. 3 reviews and discusses worst-case analyses of a gradient method GM, FGM, and OGM for both the function value and the gradient norm. Sec. 3 also reviews first-order methods that guarantee an O/N.5 rate for the gradient decrease. Sec. 4 reviews the cost function form of PEP [9] and reviews how the OGM [5] is derived using such PEP. Sec. 4 then proposes a generalized version of OGM GOGM using the cost function form of PEP, and Sec. 5 provides a worst-case gradient norm bound for the GOGM using the gradient form of PEP. Then, Sec. 5 optimizes the step coefficients using the gradient form of PEP and proposes the OGM-OG that belongs to the GOGM family. Sec. 5 also proves that FGM decreases the gradient norm with a rate O/N.5. Sec. 6 and Sec. 7 provide discussion and conclusion. Smooth convex problem and first-order methods. Smooth convex problem We focus on the following smooth convex minimization problem min fx, x R d M The original PEP was intractable to solve, so a series of relaxation on the PEP was introduced in [9] to make it possibly solvable, which we review in Sec. 4..
3 where the following additional conditions are assumed: f : R d R is a convex function of the type C, L Rd, i.e., continuously differentiable with Lipschitz continuous gradient: fx fy L x y, x,y R d,. where L > 0 is the Lipschitz constant. The optimal set X f = argmin x R d fx is nonempty, i.e., the problem M is solvable. We use F L R d to denote the class of functions that satisfy the above conditions. We also assume that the distance between an initial point x 0 and an optimal solution x Xf is bounded by some R > 0, i.e., x 0 x R... First-order methods To solve M, we consider the following class of fixed-step or non-apdative-step first-order methods FSFOM, where the update step at i + th iteration is a weighted sum of the previous and current gradients{ fx k } i scaledby L withfixedconstantstepcoefficients{h i+,k} i thatarenotadaptive tothegivenf andx 0 andthuslandr.thisclassfsfomincludesgm, FGM,OGM,andthemethods proposed in this paper, but excludes line-search-type methods. Algorithm Class FSFOM Input: f F L R d, x 0 R d. For i = 0,...,N x i+ = x i i h i+,k fx k. L 3 Review of the worst-case analysis of FSFOM This section reviews the worst-case analysis of existing FSFOMs and simple variants thereof in terms of bounds on the cost function and gradient norm. Sec. 3. reviews the worst-case cost function decrease of GM, FGM, and OGM. Sec. 3. presents both FSFOMs including GM, FGM, OGM and some variants that have either O/N or O/N.5 rate for the worst-case gradient decrease; it also reviews an O/N lower bound of the worst-case rates of first-order methods for decreasing the gradient norm. 3. Function value worst-case analysis of FSFOM The simplest example of a FSFOM is the following GM that uses only the current gradient and the Lipschitz constant L for the update. Algorithm GM Input: f F L R d, x 0 R d. For i = 0,,... x i+ = x i L fx i 3
4 This GM monotonically decreases the cost function [5] and satisfies the following tight 3 worst-case bound [9, Thm. ], for any i 0, fx i fx LR 4i+. 3. Among the class FSFOM, the following two equivalent forms of FGM [4,6] have been used widely because they decrease the cost function with the optimal rate O/N. Algorithm FGM Input: f F L R d, x 0 = y 0 R d, t 0 =. For i = 0,,... y i+ = x i L fx i + +4t i t i+ =, x i+ = y i+ + t i y i+ y i t i+ Algorithm FGM Input: f F L R d, x 0 = y 0 R d, t 0 =. For i = 0,,... y i+ = x i L fx i z i+ = x 0 i t k fx k L + +4t i t i+ =, x i+ = y i+ + z i+ t i+ t i+ Specifically, the FGM and FGM iterates satisfy the following worst-case cost function bounds [5, 4, 6] for any i : fy i fx LR t LR i i+, and fx i fx LR t i where the parameter t i satisfies t i = i l=0 t l and t i i+ LR i+, 3. for all i. 3.3 A generalized form of FGM in [3] uses parameters t i satisfying t 0 = and t i t i + t i, including the choice t i = i+a a for any a. There is another generalized form of FGM in [6], and these generalized forms of FGM have been widely used and studied e.g., [, 3, 9]. Similarly this paper studies generalizations of the OGM. Building upon [9] that optimized numerically the step coefficients over the cost function form of PEP, the authors [5] developed the following two equivalent forms of OGM, as reviewed in Sec. 4. Algorithm OGM Input: f F L R d, x 0 = y 0 R d, θ 0 =. For i = 0,...,N y i+ = x i L fx i + +4θ i, i N θ i+ = + +8θ i, i = N x i+ = y i+ + θ i θ i+ y i+ y i + θ i θ i+ y i+ x i Algorithm OGM Input: f F L R d, x 0 = y 0 R d, θ 0 =. For i = 0,...,N y i+ = x i L fx i z i+ = x 0 i θ k fx k L + +4θ i, i N θ i+ = + +8θ i, i = N x i+ = y i+ + z i+ θ i+ θ i+ 3 A tight worst-case bound denotes an inequality where the equality holds for some function f. For example, [9, Thm. ] shows the bound 3. is tight. 4
5 The OGM iterates satisfy the following worst-case cost function bounds [5, 6]: for any i N, and The parameter sequence θ i satisfies fy i fx LR 4θi LR i+, 3.4 fx N fx LR θ N θ i = { i+ l=0 θ l, i N, N l=0 θ l +θ N, i = N, LR N +N and θ i { i+, i N, i+, i = N, which is equivalent to t i 3.3 except at the final iteration. The bounds 3.4 and 3.5 of OGM are about twice smaller than the bounds 3. of FGM, so OGM decreases the cost function faster than FGM in the worst case and often in practice [4]. In addition, the bound 3.5 on the final iterate x N is tight and satisfies the optimal worst-case bound of general first-order methods including both FSFOM and adaptive-step first-order methods, when the condition d N + holds [7]. The additional term θi θ i+ y i+ x i of OGM and the additional constant for the update of z i of OGM, compared to FGM and FGM respectively, along with the parameter θ N, are what make OGM optimal for d N +. One of the main goals of this paper is to generalize the form of OGM and analyze the worst-case rate of such generalized OGM in terms of both the function value and the gradient norm, complementing the bounds 3.4 and 3.5 on the function value of OGM. The next section studies the worst-case rate of the gradient of FSFOM Gradient norm worst-case analysis of FSFOM When tackling dual problems, it is known that the gradient norm worst-case rate is important in addition to the function value worst-case rate because the dual gradient norm is related to the primal distance to feasibility see e.g., [6,,7]. One simple way to find a loose worst-casebound for the gradient norm is to use the well-known convex inequality for convex functions with L-Lipschitz continuous gradients [5]: L fx fx f x L fx fx fx, x R d, 3.7 as discussed in [5,3]. Combining the bounds 3., 3. and the inequality 3.7, for any i, the GM iterates satisfy and the iterates of FGM satisfy fx i Lfx i fx LR i+, 3.8 fy i LR t i LR i+, and fx i LR t i LR i Similarly for any i N, the OGM iterates with the bounds 3.4 and 3.5 satisfy fy i LR LR θi i+, and fx N LR LR θ N N Unfortunately, using the inequality 3.7 provides at best an O/N bound due to the optimal rate O/N of the function decrease. Furthermore, in general using 3.7 need not lead to tight worst-case bounds on the gradient norm. Using a different approach, a smaller O/N worst-case bound for the gradient norm of GM was derived in [7], as reviewed in next section. While the bounds 3.8, 3.9, and 3.0 are not guaranteed to be tight, the next section shows that the worst-case gradient bound 3.0 on the final iterate x N of OGM is in fact tight and thus has the same disappointingly slow O/N worst-case bound on the gradient norm as GM. 5
6 3.. FSFOM with rate O/N for decreasing the gradient norm This section uses the following lemma stating that GM monotonically decreases the gradient. Lemma [, Lemma.4] The GM monotonically decreases the gradient norm, i.e., x f L fx fx. 3. The following theorem reviews a simple proof in [7] that provides a worst-case gradient norm bound forgmwithrateo/nthat issmallerthan3.8, where[,thm. 6.]additionallyconsidersLemma. Theorem [, Thm. 6.], [7] Let f : R d R be F L R d and let x 0,,x N R d be generated by GM. Then for any N, LR min fx i = fx N. 3. i {0,...,N} NN + Proof Lemma implies the first equality in 3.. Using 3., 3.7, 3. yields: LR 3. fx m fx 3.7 fx N+ fx + 4m+ L 3. N m+ L fx N, N fx i which is equivalent to 3. using m = N/ for which m N and N m N. Inspired by the conjecture in [3, Sec. 4..3], the following theorem shows that the O/N rate of the worst-case gradient norm bound 3. of GM is tight up to a constant. Theorem Let x 0,,x N R d be generated by GM. Then for any N, LR N + max f F LR d, x Xf, x 0 x R min fx i = i {0,...,N} where the inequality in 3.3 is achieved by the following function in F L R d : ψx = { LR N+ x LR N+, x R N+, i=m max f F LR d, x Xf, x 0 x R fx N, 3.3 L x, x < R N Proof Lemma implies the equality in 3.3. Starting from x 0 = Rν, where ν is a unit vector, the GM iterates are x i = i Rν, ψx i = LR ν, i = 0,...,N, N + N + which implies the inequality 3.3. We next show that the bound 3.0 for the gradient norm at the final iterate x N of OGM is tight and and that its worst-case function is a simple quadratic function. Note that OGM was derived by optimizing a worst-case bound on the cost function decrease and its behavior in terms of gradient norms was not investigated previously. Theorem 3 Let x 0,,x N R d be generated by OGM. Then for any N, max f F LR d, x Xf, x 0 x R min fx i = i {0,...,N} max fx N = LR f F LR d, θ N x Xf, x 0 x R LR, 3.5 N + where the worst-case function in F L R d for OGM in terms of the gradient norm is the quadratic function φx = L x. 6
7 Proof See Appendix A. Comparing 3. and 3.5, we see that GM and OGM have essentially similar worst-case gradient norm bounds. This is a dilemma because OGM is the fastest FSFOM in terms of the worst-case cost function bound, but is as slow as GM in terms of the worst-case gradient norm bound. Therefore, one of the main goals of this paper is to study optimizing the step coefficients of FSFOM using PEP with respect to the gradient norm in Sec We next discuss the specific FSFOM in [7] that decreases the gradient norm with a faster O/N.5 rate. 3.. FSFOM with rate O/N.5 for decreasing the gradient norm Searching for a FSFOM that decreases the gradient norm faster than the O/N rate of GM and OGM, Nesterov [7] among other variants of FGM [, ] considered performing FGM for the first m iterations, and GM for the remaining iterations. He showed that this method, which we denote FGM-m, satisfies a fast rate O/N.5 for decreasing the gradient norm. In [6,,7], FGM-m for m = N/ was used to solve dual problems. To pursue a faster worst-case rate in terms of the constant factor, we consider here another variant that performs OGM for the first m iterations and GM for the remaining iterations, which we denote OGM-m. Algorithm OGM-m Input: f F L R d, x 0 = y 0 R d, ϑ 0 =, m {,...,N }. For i = 0,...,m y i+ = x i L fx i + +4ϑ i, i m ϑ i+ = + +8ϑ i, i = m x i+ = y i+ + ϑ i ϑ i+ y i+ y i + ϑ i ϑ i+ y i+ x i For i = m,...,n x i+ = x i L fx i The following theorem bounds the gradient norm of the OGM-m iterates, inspired by the proof in [, 7] for the worst-case gradient norm bound of the FGM-m iterates. The worst-case bound of FGM-m in [,7] is asymptotically -times larger than the following new bound 3.6 for OGM-m. Theorem 4 Let f : R d R be F L R d and let x 0, x N R d be generated by OGM-m for m N. Then for any N, LR min fx i fx N i {0,...,N} m+ N m Proof Using 3.5, 3.7, 3. yields: LR ϑ m 3.5 fx m fx 3.7 fx N+ fx + L 3. N m+ L fx N, which is equivalent to 3.6 using ϑ m m+ that is implied by 3.6. N fx i The bound 3.6 is minimized at a point close to m = N/3, leading to its approximately smallest constant 3 6 with the rate O/N.5. Other variants of FGM having O/N.5 worst-case gradient bounds were derived in [,]. Such variations of FGM including FGM-m were derived since, prior to this paper, it was unknown whether or not FGM decreases the gradient norm with the rate O/N.5 ; this rate for the gradient norm of i=m 7
8 FGM was conjectured numerically in [3]. Sec. 5. below uses the PEP to show for the first time the rate O/N.5 for the gradient decrease of the FGM. The bound 3.6 of OGM-m for decreasing the gradient is smaller than the bounds for the FGM variants in [,], and Sec. 5.4 below shows that our proposed methods have worst-case bounds even lower than 3.6. The preceding sections have focused on tight or upper worst-case bounds of the gradient norm decrease of first-order methods, whereas the next section reviews a lower bound for the worst-case gradient norm decrease in [3], illustrating the best achievable worst-case rate of the gradient norm decrease for any first-order method with either fixed-step or adaptive-step approaches A lower bound of the worst-case rates of first-order methods for decreasing the gradient norm For completeness, this section reviews a lower bound on the worst-case rate of any first-order method in terms of the gradient norm values for smooth convex quadratic functions [3]. Lower bounds on the function value were studied for convex quadratic functions in [3], and for smooth convex functions in [7, 5]. When the condition d N +3 holds, a worst-case gradient norm bound of any first-order method generating x N after N iterations has rate O/N at best, for convex quadratic f, i.e., has the following lower bound [3, Sec..3.B]: LR 4e N + max fx N, 3.7 f Q LR d, x Xf, x 0 x R where Q L R d := { f : fx x Qx+p x+r for x Ê d, Q 0, Q L }. Since Q L R d F L R d, the lower bound 3.7 for convex quadratic functions also applies to smooth convex functions. A regularization technique in [7] achieves the rate O/N up to a logarithmic factor. However, its adaptive step coefficients require knowing R in advance which is undesirable in practice. To our knowledge, whether there exists any FSFOM satisfying such rate is an open question. Instead, this paper discusses a way to develop FSFOM that achieves an O/N.5 gradient norm bound with the smallest constant among known FSFOM. 4 Relaxation and optimization of the cost function form of PEP Thissectionreviewsarelaxationofthecostfunction formofpep[9]and reviewshow[9,5,6]optimized the step coefficients of the FSFOM class over the cost function form of PEP, leading to OGM. Then, we propose a parameterized family of algorithms that generalizes OGM, and analyze the worst-case cost function decrease of the generalized OGM family. 4. Review: Relaxation for the cost function form of PEP The worst-case bound on the cost function for a FSFOM having given step coefficients h := {h i+,k } corresponds to a solution of the following PEP problem [9, Prob. P]: B P h,n,d,l,r := max fx N fx f F LR d, x 0,,x N R d, x X f, x 0 x R s.t. x i+ = x i i h i+,k fx k, i = 0,...,N. L P Since problem P is impractical to solve due to its functional constraint f F L R d, [9] relaxed it by the following finite set of inequalities satisfied by f [5, Thm...5]: L fx i fx j fx i fx j fx j, x i x j 4. 8
9 for i,j = 0,,...,N,. Then, a matrix G = [g 0,,g N ] R N+ d and a vector δ = [δ 0,,δ N ] R N+ with g i := L x 0 x fx i, and δ i := L x 0 x fx i fx are introduced to represent gradient vectors and function values respectively in the set of 4.. This leads to a finite-dimensional relaxation of problem P [9, Prob. Q]: B P h,n,d,l,r := max G R N+ d, δ R N+ LR δ N P s.t. Tr { G A i,j hg } δ i δ j, i < j = 0,...,N, Tr { G B i,j hg } δ i δ j, j < i = 0,...,N, Tr { G C i G } δ i, i = 0,...,N, Tr { G D i hg+νu i G } δ i, i = 0,...,N, for any given unit vector ν R d, where u i = e i+ R N+ is the i+th standard basis vector. Note that Tr { G u i u j G} = g i, g j by definition. The matrices A i,j h,b i,j h,c i,d i h are defined as A i,j h := u i u j u i u j + j l=i+ B i,j h := u i u j u i u j i l=j+ C i := u iu i, D i h := u iu i + i j= l j h j,ku i u k +u ku i. h l,ku j u k +u ku j, l h l,ku j u k +u ku j, In [9, Prob. Q ], problem P is further relaxed by discarding some constraints to yield B P h,n,d,l,r := max G R N+ d, δ R N+ 4. LR δ N P s.t. Tr { G A i,i hg } δ i δ i, i =,...,N, Tr { G D i hg+νu i G} δ i, i = 0,...,N, for any given unit vector ν R d. We explicitly illustrate the relaxation from P to P because Sec. 5 uses a similar but different relaxation. Taylor et al. [3] avoided this step to analyze a tight worst-case bound of P under a large-scale condition d N + [3, Thm. 5]; however, this relaxation P facilitates the analysis in [9,5,6] and in this paper. Replacing max G,δ LR δ N by min G,δ { δ N } for convenience in P, the Lagrangian of the corresponding constrained minimization problem with dual variables λ = λ,,λ N R N and τ = τ 0,,τ N R N+ for the first and second constraint inequalities of P respectively becomes where N N LG,δ,λ,τ;h = δ N + λ i δ i δ i + τ i δ i 4.3 i= i= i=0 +Tr { G Sh,λ,τG+ντ G }, N N Sh,λ,τ := λ i A i,i h+ τ i D i h. 4.4 Then, we have the following dual problem of P that one could use to compute a valid upper bound of P using a semidefinite program SDP for given h [9, Prob. DQ ]: { Sh,λ,τ B D h,n,l,r := min λ,τ Λ, LR γ : τ } τ γ 0, D γ R i=0 9
10 where Λ = { λ,τ R N+ + : } τ 0 = λ, λ N +τ N =,. 4.5 λ i λ i+ +τ i = 0, i =,...,N, The next section reviews the analytical solution to this upper bound D for OGM, instead of using a numerical SDP solver. 4. Review: Optimizing step coefficients for the cost function form of PEP Drori and Teboulle [9] optimized numerically the step coefficients h over the simple SDP problem D as follows [9, Prob. BIL]: ĥ D := argmin h R NN+/ B D h,n,l,r. HD The problem HD is bilinear, and [9, Thm. 3] 4 used a convex relaxation technique to make it solvable by numerical methods. In [5, Lemma 4], we solved HD analytically yielding the optimized step coefficients θ h i+,k = i+ θ k i j=k+ h j,k, k = 0,...,i, θi θ i+, k = i, for θ i in 3.6. Fortuitously, the optimized coefficients 4.6 lead to equivalent computationally efficient OGM and OGM forms [5, Prop. 3, 4 and 5], and the bound 3.5 for the final secondary iterate x N of OGM is implied by [5, Lemma 4]. Recently, Drori [7] showed that the OGM is optimal for d N+, implying that optimizing over the relaxed bound D in HD for simplicity is equivalent to optimizing over the exact worst-case cost function bound P when d N +. One could use a SDP solver to compute a numerical bound from D for any FSFOM; however, deriving an analytical bound using D is difficult for the primary sequence {y i } of OGM. Therefore, we devised a new relaxed bound in [6] similar to D, which we review next. 4.3 Review: Another cost function form of relaxed PEP for the primary sequence of OGM An upper bound of the worst-case bound on fy N+ fx for FSFOM with step coefficients h and y N+ = x N L fx N could be computed using D by a SDP solver. However, we found it difficult to find its analytical worst-case bound for the primary sequence {y i } of OGM, so [6, Prob. D ] provided the following alternate upper bound on fy N+ fx : B D h,n,l,r := min λ,τ Λ, γ R { LR γ : Sh,λ,τ+ u Nu N τ τ γ } 0, D which led to the bound 3.4 for the primary sequence {y i } of OGM in [6]. Similar to [5, Lemma 4], we found a feasible point of D in [6, Lemma 3.], along with feasible step coefficients h of a FSFOM: t h i+,k = i+ t k i j=k+ h j,k, k = 0,...,i, ti t i+, k = i, for t i in 3.3. Then, [6, Thm. 3.] showed the bound 3.4 using [6, Lemma 3.]. The step coefficients 4.6 and 4.7 are identical except the final iteration, since t i 3.3 and θ i 3.6 are equivalent for i < N. We are now done reviewing the portions of papers [5,6] that are the ingredients for specifying a parameterized family of algorithms that generalizes OGM in the next two sections. 4 [9, Thm. 3] has typos that are fixed in [5, Eq. 6.3]. 0
11 4.4 Feasible points of D and D for the generalized OGM This section specifies feasible points of D and D that lead to a generalized version of OGM. Specifically, the following lemma presents additional feasible points of D; this lemma reduces to [5, Lemma 4] and the step coefficients 4.6 of OGM when θ i = Ω i for all i. Lemma For the following step coefficients: h i+,k = the choice of variables: θ i+ Ω i+ θ k i j=k+ h j,k + θi θi+, i = 0,...,N, k = 0,...,i, Ω i+, i = 0,...,N, k = i, γ = Ω τ N, i = 0, 0, λ i = Ω i τ 0, i =,...,N, τ i = θ i τ 0 i =,...,N, θ N τ 0, i = N, is a feasible point of D for any choice of θ i such that θ 0 =, θ i > 0, and θ i Ω i := { i l=0 θ l, i = 0,...,N, N l=0 θ l +θ N, i = N. 4.0 Proof See Appendix B. The followinglemma alsospecifies somefeasible points of D ; this lemma reducesto [6, Lemma3.] and 4.7 when t i = T i for all i. Lemma 3 For the following step coefficients: h i+,k = t i+ T i+ t k i j=k+ h j,k + ti ti+, i = 0,...,N, k = 0,...,i, T i+, i = 0,...,N, k = i, 4. the choice of variables: γ = τ 0, λ i = T i τ 0, i =,...,N, τ i = { T N, i = 0, t i τ 0 i =,...,N, 4. is a feasible point of D for any choice of t i such that Proof See Appendix C. t 0 =, t i > 0, and t i T i := i t l. 4.3 Similar to the relationship between the step coefficients 4.6 and 4.7, the step coefficients 4.8 and 4. are identical when θ i = t i for i < N except for the final iteration, implying that the iterates {x i } N i=0 of the two FSFOMs with 4.8 and 4. are equivalent; only the final iterate x N is different. The step coefficients 4.6 and 4.7 lead to computationally efficient equivalent OGM forms; similarly the next section provides computationally efficient generalized forms of OGM that each correspond to a FSFOM with either 4.8 or 4., and we analyze their cost function worst-case bounds. l=0
12 4.5 Generalized OGM This section proposes a generalized OGM using lemmas and 3. The FSFOM with the step coefficients 4.8 has the following two equivalent efficient generalized forms of OGM, named GOGM and GOGM, that reduce to the standard OGM when θ i = Ω i for all i. Algorithm GOGM Input: f F L R d, x 0 = y 0 R d, θ 0 = Ω 0 =. For i = 0,...,N y i+ = x i L fx i Choose θ i+ > 0 s.t. θi+ Ω i+, where { i+ Ω i+ = l=0 θ l, i N N l=0 θ l +θ N, i = N x i+ = y i+ + Ω i θ i θ i+ θ i Ω i+ y i+ y i + θ i Ω iθ i+ θ i Ω i+ y i+ x i Algorithm GOGM Input: f F L R d, x 0 = y 0 R d, θ 0 = Ω 0 =. For i = 0,...,N y i+ = x i L fx i z i+ = x 0 i θ k fx k L Choose θ i+ > 0 s.t. θi+ Ω i+, where { i+ Ω i+ = l=0 θ l, i N N l=0 θ l +θ N, i = N x i+ = θ i+ y i+ + θ i+ z i+ Ω i+ Ω i+ Proposition The sequence {x 0,,x N } generated by the FSFOM with 4.8 is identical to the corresponding sequence generated by GOGM and GOGM. Proof See Appendix D. Note that this proof is independent of the choice of θ i and Ω i. Because the proof of Prop. for the FSFOM with step coefficients 4.8 is independent of the choice of θ i and Ω i, it is straightforward to show that the FSFOM with step coefficients 4. has the following two efficient equivalent forms, named GOGM and GOGM, that reduce to [6, Alg. OGM and OGM ] when t i = T i for all i. Algorithm GOGM Input: f F L R d, x 0 = y 0 R d, t 0 = T 0 =. For i = 0,...,N y i+ = x i L fx i Choose t i+ > 0 s.t. t i+ T i+ i+ = t l l=0 x i+ = y i+ + T i t i t i+ t i T i+ y i+ y i + t i T it i+ t i T i+ y i+ x i Algorithm GOGM Input: f F L R d, x 0 = y 0 R d, t 0 = T 0 =. For i = 0,...,N y i+ = x i L fx i z i+ = x 0 i t k fx k L Choose t i+ > 0 s.t. t i+ T i+ i+ = t l l=0 x i+ = t i+ y i+ + t i+ z i+ T i+ T i+ Clearly when θ i = t i for i < N, the primary iterates {y i } N i=0 and the intermediate secondary iterates {x i } N i=0 of GOGM and GOGM are equivalent. Although illustrating two similar algorithms GOGM and GOGM might seem redundant, presenting both formulations with lemmas and 3 completes the story of generalized OGM here and in Sec. 5. Using lemmas and 3, the following theorem bounds the cost function decrease of the GOGM and GOGM iterates.
13 Theorem 5 Let f : R d R be F L R d and let y 0,,y N,x N R d be generated by GOGM and GOGM. Then for any i N, fy i fx LR 4Ω i, 4.4 fx N fx LR Ω N. 4.5 The iterates y 0,,y N R d generated by GOGM and GOGM also satisfy the bound 4.4 when θ i = t i for i < N. Proof Using Lemma 3, the FSFOM with h in 4. of GOGM satisfies fy N+ fx B D h,n,l,r = LR γ = LR 4T N, 4.6 where y N+ = x N L fx N. Since the coefficients h in 4. are recursive and do not depend on a given N, we can extend 4.6 for all iterations. By letting θ i = t i for i < N, the bound 4.6 also satisfies for the iterates {y i } of GOGM, as in 4.4. Using Lemma, the FSFOM with the step h in 4.8 of GOGM satisfies which is equivalent to 4.5. fx N fx B D h,n,l,r = LR γ = LR Ω N, 4.7 GOGM and Thm. 5 reduce to OGM and its bounds 3.4 and 3.5, when{ θi = Ω i for all i. Similar i+a a to general forms of FGM in [3,6], the GOGM family includes the choice θ i =, i < N, N+a a, i = N for any a, because such parameter θ i satisfies the following conditions for GOGM: Ω i θ i = i+i+a a i+a a = a i +aa 3 a 0, 4.8 for i < N, and Ω N θ N = Ω N + θ N θ N θ N + θ N θ N = θ N 0. Similarly, the GOGM family includes the choice t i = i+a a for any a, which we denote as OGM-a. Corollary Let f : R d R be F L R d and let y 0,,y N R d be generated by GOGM with t i = i+a a OGM-a for any a. Then for any i N, fy i fx alr ii+a. 4.9 Proof Thm. 5 implies 4.9, since T i = i+i+a a any a. and the condition T i t i 0 satisfies in 4.8 for 5 Relaxation and optimization of the gradient form of PEP This sectionanalyzesaworst-casebound forthe gradientofanygogm and GOGM usingthe gradient form of PEP. We use relaxations on the gradient form of PEP that are similar but slightly different from those of PEP for the cost function in the previous section. Using this relaxed PEP, we prove that FGM has an O/N.5 rate for the worst-case gradient decrease, and analyze the worst-case gradient bound for the GOGM. Then, we optimize the step coefficients with respect to the gradient form of PEP and propose an algorithm named OGM-OG that lies in the GOGM family and that has the best known analytical worst-case bound for decreasing the gradient norm among the class FSFOM. 3
14 5. Relaxation for the gradient form of PEP To analyze a worst-case bound on the gradient for a FSFOM with a given h, we consider the following gradient-form version of PEP that is similar to P: B P h,n,d,l,r := max f F LR d, x 0,,x N R d, x X f, x 0 x R s.t. x i+ = x i L min fx i P i {0,...,N} i h i+,k fx k, i = 0,...,N. Here, we use the smallest gradient norm squared among all iterates min i {0,...,N} fx i as a criteria,asconsideredin[3,sec.4.3].wecouldinsteadconsiderthefinalgradientnormsquared fx N asacriteria,but ourproposedrelaxationonp inthis sectionforsuchcriteriaprovidedonlyano/n worst-case bound at best even for the corresponding optimized step coefficients results not shown; we leave studying the gradient form of the tight PEP as future work. As in [3], we replace min i {0,...,N} fx i in P by L x 0 x α with the condition α L x 0 x fx i = Tr { G u i u i G} for all i. Then, we relax this reformulated P similar to the relaxation from P to P with the additional constraint Tr { G C N G } δ N in P as follows: B P h,n,d,l,r := max G R N+d, δ R N+, α R L R α P s.t. Tr { G A i,i hg } δ i δ i, i =,...,N, Tr { G C N G } δ N, Tr { G D i hg+νu i G } δ i, i = 0,...,N, Tr { G u i u i G} α, i = 0,...,N. Replacing max G,δ L R α by min G,δ { α} for convenience, the Lagrangian of the corresponding constrainedminimizationproblemwithdualvariablesλ R N +,η R +,τ R N+ +,andβ = β 0,,β N R N+ + for the first, second, third, and fourth set of constraint inequalities of P respectively becomes where N N N L G,δ,λ,η,τ,β;h = α+ λ i δ i δ i ηδ N + τ i δ i + β i α 5. i= i=0 i=0 +Tr { G S h,λ,η,τ,βg+ντ G }, S h,λ,η,τ,β := Sh,λ,τ+ N ηu Nu N β i u i u i. 5. Then similar to D, we have the following dual problem of P that one could use to compute an upper bound of the PEP P of the smallest gradient norm squared among all iterates by a numerical SDP solver: S h,λ,η,τ,β τ where B D h,n,l,r := Λ := { min λ,η,τ,β Λ, L R γ : γ R { λ,η,τ,β R 3N+3 + : i=0 τ γ } 0, D N τ 0 = λ, λ N +τ N = η, i=0 β } i =,. 5.3 λ i λ i+ +τ i = 0, i =,...,N The next two sections use a valid upper bound D of P for given step coefficients h, providing an analytical solution to D for the step coefficients h of FGM and GOGM, superseding the use of a SDP solver. 4
15 5. A worst-case bound for the gradient norm of FGM FGM is equivalent to a FSFOM with the step coefficients [5, Prop. ]: t h i+,k = i+ t k i j=k+ h j,k, k = 0,...,i, + ti t i+, k = i, 5.4 for t i in 3.3. The following lemma provides a feasible point of D associated with the step coefficients 5.4 of FGM to provide a worst-case bound for the gradient of FGM. Lemma 4 For the step coefficients 5.4, the following choice of variables: γ = τ 0, λ i = t N, i τ 0, i =,...,N, τ i = k t i = 0, t i τ 0, i =,...,N, 5.5 η = t N τ 0, β i = t i τ 0, i = 0,...,N, 5.6 is a feasible point of D for t i in 3.3. Proof See Appendix E. Using Lemma 4, the following theorem bounds the gradient norm of the FGM iterates, proving for the first time an O/N.5 rate of decrease. Theorem 6 Let f : R d R be F L R d and let y 0,,y N,x 0,,x N R d be generated by FGM. Then for any N, where y N+ = x N L fx N. min fy i min fx i 5.7 i {0,...,N+} i {0,...,N} LR N t k 3LR N +N +6N +, Proof Lemma implies the first inequality in 5.7. Using Lemma 4, the FSFOM with the step coefficients h 5.4 of FGM satisfies min fx i B D h,n,l,r = i {0,...,N} L R γ = L R N, 5.8 t k which is equivalent to 5.7 using N t k N k+ 4 = N+N +3N A worst-case bound for the gradient norm of GOGM Having established the gradient bound 5.7 for FGM, this section and the next seek to improve on it by studying GOGM. To bound the gradient decrease of GOGM and GOGM, the following lemma illustrates one possible set of feasible points of D. Lemma 5 For the step coefficients 4., the following choice of variables: γ = N τ 0, λ i = T i τ 0, i =,...,N, τ i = Tk tk, i = 0, t i τ 0, i =,...,N, 5.9 η = T N τ 0, β i = T i t i τ0, i = 0,...,N 5.0 is a feasible point of D for any choice of t i and T i that satisfies 4.3 and for which there exists some i such that t i < T i. 5
16 Proof See Appendix F. Using Lemma 5, the following theorem bounds the worst-case gradient norm for the iterates of GOGM and GOGM. Theorem 7 Let f : R d R be F L R d and let y 0,,y N,x 0,,x N R d be generated by GOGM. Then for any N, min fy i min fx i i {0,...,N+} i {0,...,N} LR N, 5. T k t k where y N+ = x N L fx N. The bound 5. can be generalized to the intermediate iterates {x i } N i=0 and {y i } N i=0 of both GOGM and GOGM when θ i = t i for i < N. Proof Lemma implies the first inequality in 5.. Using Lemma 5, FSFOM with the step coefficients h 4. of GOGM satisfies min fx i B D h,n,l,r = i {0,...,N} L R L R γ = 4 N T k t k, which implies 5.. Since the iterates of GOGM are recursive and do not depend on a given N, the bound 5. easily generalizes to the intermediate iterates of GOGM and GOGM when θ i = t i. 5.4 Optimizing step coefficients over the gradient form of PEP In search of a FSFOM that decreases the gradient norm the fastest, we optimize the step coefficients in terms of the gradient form of the relaxed D by solving the following problem: ĥ D := argmin h R NN+/ B D h,n,l,r. HD The problem HD is bilinear, similar to HD, and a convex relaxation technique [9, Thm. 3] makes this problem solvable using numerical methods. We solved HD for many choices of N using a numerical SDP solver [4,], and observed that the following choice of t i :, i = 0, t i = + +4t i, i =,..., N/, N i+, i = N/,...,N, 5. makes the feasible point in Lemma 5 optimal for the problem HD. Based on that numerical evidence, we conjecture that ĥd in HD corresponds to the step coefficients 4. with the parameter t i 5.. The t i factors in 5. start decreasing after i = N/, whereas the usual t i in 3.3 and t i = i+a a for any a increase with i indefinitely. In addition, we found numerically that minimizing the gradient bound 5. of GOGM, i.e., solving the following constrained quadratic problem: k max {t i} N l=0 t l t k s.t. t i satisfies 4.3 for all i, 5.3 is equivalent to solving the problem HD. In other words, the solution of 5.3 numerically appears equivalent to 5., the conjectured solution of HD. The unconstrained maximizer of the cost function of 5.3 is t i = N i+, and this term partially appears in the constrained maximizer 5. for N/ i N. We denote the resulting GOGM with 5. as OGM-OG OG for optimized over gradient. The following theorem bounds the cost function and gradient norm of the OGM-OG iterates. 6
17 Theorem 8 Let f : R d R be F L R d and let y 0,,y N,x 0,,x N R d be generated by OGM-OG. Then, where y N+ = x N L fx N. fy N+ fx LR N + min fy i min fx i i {0,...,N+} i {0,...,N}, 5.4 6LR N N +, 5.5 Proof OGM-OG is an instance of GOGM and thus Thm. 5 implies that OGM-OG satisfies which is equivalent to 5.4, since T N = T m + fy N+ fx LR 4T N, N l=m+ t l = t m N +3 +NN N mn m+ 4 = N +8N +9 6 for m = N/, using m N, N m N, and t m m+ N+3 4 in 3.3. Thm. 7 implies 5.5, using the above inequalities for m, the equality t i = T i for i m, and N k=m+ =N mt m + Tk t k = N N k=m+ N m =N mt m + k = k=m+ k k l=m+ l = t m + k l=m+ N l+ N l m+ =N mt m N m N m+k kk k= =N mt m + N m k= k t l t k N k + N k m+ N m+ N m+k +k 4 N m+ +N m+3/4k 4 =N mt N mn m+/n m+ m 6 N mn m+3/4n m+ N mn m+ + 4 N mm+ + N m N m+ N mn m N m N m+ 3 4 N N +. The gradient bound 5.5 of OGM-OG is asymptotically -times smaller than that of FGM in Thm. 6 and.5-times smaller than that of OGM-m = N/3 in Thm. 4. Regarding the cost function decrease, the bound 5.4 of OGM-OG is asymptotically the same as the bound 3. of FGM, and both are twice larger than the bounds 3.4 and 3.5 of OGM. 7
18 5.5 Decreasing the gradient norm with rate O/N.5 using GOGM without selecting N in advance Although OGM-OG satisfies a small worst-case gradient bound with a rate O/N.5, OGM-OG and FGM-m and OGM-m must select N in advance, unlike FGM. Using Thm. 7, the following corollary shows that OGM-a with a > can decrease the gradient with a rate O/N.5 without selecting N in advance. Cor. showed that OGM-a algorithm with a can decrease the cost function with an optimal rate O/N. Corollary Let f : R d R be F L R d and let y 0,,y N,x 0,,x N R d be generated by GOGM with t i = i+a a OGM-a for any a. Then for N, where y N+ = x N L fx N. min fy i min fx i 5.6 i {0,...,N+} i {0,...,N} a 6LR NN +a N +3a 4a, Proof Using T i = i+i+a a and 4.8, Thm. 7 implies 5.6 since N N T k t k = a k +aa 3k a = NN + a N +3a 4a 6a. a OGM-a forany a > hasagradientbound 5.6 that is about -times largerthan the bound 5.5 a of OGM-OG. This constant factor minimizes to when a = 4, and this OGM-a=4 has a worst-case gradient bound that is asymptotically equivalent to the bound 5.7 of FGM. Therefore, when one does not want to select N in advance, both FGM and OGM-a=4 and OGM-a for any a > will be useful for decreasing the gradient with a rate O/N.5. 6 Discussion This section summarizes analytical worst-case bounds of FSFOM discussed in the previous sections. This section also reports tight numerical worst-case bounds for exact comparison of algorithms because many of the analytical bounds are not guaranteed to be tight. 6. Summary of analytical worst-case bounds on the cost function and gradient norm Table summarizes the asymptotic rate of analytical worst-case bounds of all algorithms described in this paper. As discussed, OGM and OGM-OG have the best known worst-case bounds for the cost function and gradient decrease respectively in Table. However, since OGM has a slow worst-case rate for the gradient decrease, other algorithms such as FGM, OGM-m, OGM-OG, and OGM-a that satisfy both the optimal rate O/N for the function decrease and a fast rate O/N.5 for the gradient decrease could be preferableoverogm when one is interested in both the gradientdecrease as well as the function decrease, particularly when solving dual problems. In addition, when one does not want to choose N in advance, FGM and OGM-a could be preferable. 6. Tight worst-case bounds on the cost function and the gradient norm Since many worst-case bounds presented in Table are not guaranteed to be tight, we used the code in Taylor et al. [3] with SDP solvers [0,8] to compare tight numerical worst-case bounds for N =,,4,0,0,30,40,47,50. These numerical worst-case bounds are guaranteed to be tight, i.e., equivalent to the bounds of either P or P, when the large-scale condition d N + is satisfied [3, Thm. 5], and we assume this condition hereafter. Tables and 3 provide tight worst-case bounds for 8
19 Asymptotic worst-case bound Require selecting Algorithm Cost function Gradient norm N in advance GM 4 N N No FGM N 3N.5 No OGM N N No 9 OGM-m= N/3 4 N 3 6 N.5 Yes OGM-OG N 6N.5 Yes a OGM-a a > N a 6 a N.5 No OGM-a=4 N 3N.5 Table Asymptotic worst-case bounds on the cost function LR fx N fx and the gradient norm min i {0,...,N} LR fx i of GM, FGM, OGM, OGM-m, OGM-OG, and OGM-a. The worst-case cost function bound for OGM-m in the table corresponds to the bound for OGM after m iterations, because we do not have an analytical bound for the final iterate. the decrease of the cost function fx N fx and the gradient norm decrease min i {0,...,N} fx i respectively. Although most of the bounds in Table are not guaranteed to be tight, the worst-case rate formulas in Table are similar to the tight numerical results in Tables and 3, except that the gradient bounds of OGM-m in Table are relatively looser than those of OGM-m in Table 3. In particular, the tight numerical gradient bound of OGM-m is smaller than that of FGM in Table 3, which was not expected from their known possibly loose analytical bounds in Table. N GM FGM OGM OGM-m OGM-OG OGM-a Empi. O N.0 N.9 N.9 N.8 N.8 N.8 Known O N N N N N N LR Table Tight worst-case bounds on, the reciprocal of the cost function, of GM, FGM, OGM, OGMm= N/3, OGM-OG, and OGM-a=4. We computed empirical rates by assuming that the bounds follow the form bn c fx N fx with constants b and c, and then by estimating c from points N = 47,50. Note that the corresponding empirical rates are underestimated due to its simple modeling of bounds. N GM FGM OGM OGM-m OGM-OG OGM-a Empi. O N.0 N.4 N 0.9 N.4 N.4 N.4 Known O N N.5 N N.5 N.5 N.5 LR Table 3 Tight worst-case bounds on, the reciprocal of the gradient norm, of GM, FGM, OGM, min i {0,...,N} fx i OGM-m= N/3, OGM-OG, and OGM-a=4. Empirical rates were computed as described in Table. 9
20 6.3 Tight worst-case bounds on the gradient norm at the final iterate To be clear, Sec. 5 and Tables, 3 have focused on analyzing the smallest gradient norm among all iterates using the gradient form of PEP, whereas the gradient analysis in Sec. 3. considers the final gradient in addition to the smallest gradient among all iterates. As mentioned before, we have not yet found a relaxation on the final gradient form of the PEP that provides as comparable results as for the relaxation on the smallest gradient form of the PEP P in Sec. 5. To complete comparisons on the worst-case gradient bounds, Table 4 uses the code provided by Taylor et al. [3] with SDP solvers [0, 8] to compare tight numerical worst-case bounds on the final gradient of the FSFOMs presented in this paper. 5 The worst-case smallest gradient norm bounds 3. and 3.6 of GM and OGM-m respectively among algorithms considered extend to the final gradient bounds. In Table 4, FGM and OGM-a=4 have slow O/N tight worst-case bounds on the final gradient, unlikeogm-m= N/3 andogm-ogroughlyhavingo/n.5 boundsforboth the smallestand final gradients. Thm. 4 has shown that the final gradient of OGM-m satisfies a worst-case rate O/N.5, but this is unknown yet for OGM-OG, which we leave as future work. We also leave as future work the challenge of developing a FSFOM that has O/N.5 or even faster worst-case rates for the final gradient decrease that are lower than those of OGM-m and OGM-OG, possibly without requiring to choose N in advance. N GM FGM OGM OGM OGM OGM-a y N x N y N x N -m -OG y N x N Empi. O N.0 N 0.9 N 0.9 N.4 N.3 N 0.9 Known O N N N N.5 N N LR LR and fx N fy N Table 4 Tight worst-case bounds on, the reciprocal of the final gradient norm, of GM, FGM, OGM, OGM-m= N/3, OGM-OG, and OGM-a=4. Empirical rates were computed as described in Table. The known bounds of OGM-OG and OGM-a=4 are derived based on Sec. 3., where the empirical bounds of OGM-OG are comparable to the bounds of OGM-m= N/3 with known rate O/N Non-optimality of OGM-OG in terms of the worst-case gradient bound Because OGM is optimal in terms of the function decrease when d N + [7], one might hope that the OGM-OG would achieve the optimal worst-case bound in terms of the gradient decrease, since OGM-OG is also derived by optimizing the step coefficients over the gradient form of relaxed PEP. However, the OGM-OG is apparently not optimal as explained next. Taylor et al. [3] numerically studied an optimal fixed-step GM using their tight PEP in terms of both the cost function and gradient decrease. In other words, they searched for an optimal step h of GM: x i+ = x i h L fx i for i = 0,...,N and a given N with respect to either fx N fx or fx N. In the special case of N =, Taylor et al. [3] numerically conjectured that the step size h =.5 is optimal in terms of the 5 Table 4 reports tight worst-case gradient bounds for both the final primary iterate y N and the final secondary iterate x N if necessary, unlike Tables and 3. We observed that numerical tight worst-case cost function bounds on both final iterates y N and x N have similar values for the algorithms in Table unlike Table 4, so we did not report the bounds on y N for simplicity. We also did not report numerical tight smallest worst-case gradient norm bounds min i {0,...,N} fy i of the primary iterates {y i } because the code provided by Taylor et al. [3] does not support computing their values, unlike that of the secondary iterates {x i } in Table 3. 0
21 cost function decrease. The corresponding GM is equivalent to OGM for N =, and this numerically confirms the optimality of OGM [7] for N =. They also numerically conjectured that the optimal step size of GM for N = in terms of the gradient decrease is h = with a worst-case bound fx LR + LR However, OGM-OG for N = reduces to GM with h = 4 LR 3.3 with a bound.3 in Tables 3 and 4, implying that OGM-OG is not optimal even for N = based on the numerical evidence in [3]. This analysis for N = illustrates that there is still room for improvement in accelerating the worstcase rate of first-order methods in terms of gradients, which we leave as future work possibly with a tighter relaxation on the gradient form of PEP. In addition, we leave as future work studying the optimal worst-case bound for the gradient decrease of first-order methods building upon [7, 3], and developing a FSFOM that achieves such optimal bound. Nevertheless, the OGM-OG is the best known FSFOM for decreasing the gradient norm among the class FSFOM, and will be useful when decreasing the gradient is key. 7 Conclusion We generalized the formulation of OGM and analyzed its worst-case bounds on the function value and gradient, using the cost function form and the gradient form of relaxed PEP. We then proposed OGM-OG by optimizing the step coefficients of FSFOM using a relaxed PEP with respect to the gradient, similar to the development of the optimal OGM. To the best of our knowledge, the worst-case bound on the gradient of the OGM-OG is the best known analytical worst-case bound for decreasing the gradient norm among the class FSFOM. However, this OGM-OG is not optimal for decreasing the gradient norm, and further accelerating the worst-case rate of FSFOM in terms of the gradient possibly with a tight relaxation on the gradient form of PEP is a possible research direction. On the other hand, deriving an optimal worst-case bound for the gradient norm of first-order methods, similar to that for the function decrease [7] will be useful. Nonetheless, the proposed OGM-OG and OGM-a may be useful when one finds minimizing gradients important, particularly in dual problems. In addition, we used the proposed gradient form of PEP to show that FGM decreases the smallest gradient with a rate O/N.5, implying that FGM is comparable in a big-o sense to OGM-m, OGM-OG and OGM-a for the gradient decrease. Our analysis considers unconstrained smooth convex minimization; extending such gradient norm worst-case analysis to constrained problems or nonsmooth composite convex problems is a natural direction to pursue, which is studied for FGM or FISTA [] by the authors [7]. In addition, extending the analyses on the general form of FGM in [,3,9] to GOGM is a possible research direction. Lastly, investigating a new relaxation of the PEP approach that allows adaptive step size such as backtracking line-search or exact line-search [5, 8] is of interest. Software has Matlab codes for the algorithms considered and the SDP approaches in Sec. 5.4 and Sec. 6. A Proof of Thm. 3 Due to 3.0, we have max f F L R d, x Xf, x 0 x R min fx i i {0,...,N} max f F L R d, x Xf, x 0 x R fx N LR θ N,
arxiv: v5 [math.oc] 23 Jan 2018
Another look at the fast iterative shrinkage/thresholding algorithm FISTA Donghwan Kim Jeffrey A. Fessler arxiv:608.086v5 [math.oc] Jan 08 Date of current version: January 4, 08 Abstract This paper provides
More informationOptimized first-order methods for smooth convex minimization
Noname manuscript No. will be inserted by the editor) Optimized first-order methods for smooth convex minimization Donghwan Kim Jeffrey A. Fessler Date of current version: September 9, 05 Abstract We introduce
More informationA New Look at the Performance Analysis of First-Order Methods
A New Look at the Performance Analysis of First-Order Methods Marc Teboulle School of Mathematical Sciences Tel Aviv University Joint work with Yoel Drori, Google s R&D Center, Tel Aviv Optimization without
More informationOptimized first-order minimization methods
Optimized first-order minimization methods Donghwan Kim & Jeffrey A. Fessler EECS Dept., BME Dept., Dept. of Radiology University of Michigan web.eecs.umich.edu/~fessler UM AIM Seminar 2014-10-03 1 Disclosure
More informationAccelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems)
Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Donghwan Kim and Jeffrey A. Fessler EECS Department, University of Michigan
More informationAdaptive Restart of the Optimized Gradient Method for Convex Optimization
J Optim Theory Appl (2018) 178:240 263 https://doi.org/10.1007/s10957-018-1287-4 Adaptive Restart of the Optimized Gradient Method for Convex Optimization Donghwan Kim 1 Jeffrey A. Fessler 1 Received:
More informationConvex Optimization. Newton s method. ENSAE: Optimisation 1/44
Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)
More informationarxiv: v2 [math.oc] 28 Nov 2017
Adaptive Restart of the Optimized Gradient Method for Convex Optimization Donghwan Kim Jeffrey A. Fessler arxiv:703.0464v [math.oc] 8 Nov 07 Date of current version: November 9, 07 Abstract First-order
More informationPerformance of first-order methods for smooth convex minimization: a novel approach
Performance of first-order methods for smooth convex minimization: a novel approach arxiv:206.3209v [math.oc] 4 Jun 202 Yoel Drori and Marc Teboulle June 5, 202 Abstract We introduce a novel approach for
More informationFast proximal gradient methods
L. Vandenberghe EE236C (Spring 2013-14) Fast proximal gradient methods fast proximal gradient method (FISTA) FISTA with line search FISTA as descent method Nesterov s second method 1 Fast (proximal) gradient
More informationNOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained
NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume
More informationCoordinate Descent and Ascent Methods
Coordinate Descent and Ascent Methods Julie Nutini Machine Learning Reading Group November 3 rd, 2015 1 / 22 Projected-Gradient Methods Motivation Rewrite non-smooth problem as smooth constrained problem:
More informationGradient methods for minimizing composite functions
Gradient methods for minimizing composite functions Yu. Nesterov May 00 Abstract In this paper we analyze several new methods for solving optimization problems with the objective function formed as a sum
More information12. Interior-point methods
12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity
More informationOn Nesterov s Random Coordinate Descent Algorithms - Continued
On Nesterov s Random Coordinate Descent Algorithms - Continued Zheng Xu University of Texas At Arlington February 20, 2015 1 Revisit Random Coordinate Descent The Random Coordinate Descent Upper and Lower
More informationOn Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence:
A Omitted Proofs from Section 3 Proof of Lemma 3 Let m x) = a i On Acceleration with Noise-Corrupted Gradients fxi ), u x i D ψ u, x 0 ) denote the function under the minimum in the lower bound By Proposition
More information1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method
L. Vandenberghe EE236C (Spring 2016) 1. Gradient method gradient method, first-order methods quadratic bounds on convex functions analysis of gradient method 1-1 Approximate course outline First-order
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationOptimization Tutorial 1. Basic Gradient Descent
E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.
More informationUnconstrained minimization of smooth functions
Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and
More informationarxiv: v1 [math.oc] 1 Jul 2016
Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the
More informationChapter 8 Optimization basics
Chapter 8 Optimization basics Contents (class version) 8.0 Introduction........................................ 8.2 8.1 Preconditioned gradient descent (PGD) for LS..................... 8.3 Tool: Matrix
More informationCubic regularization of Newton s method for convex problems with constraints
CORE DISCUSSION PAPER 006/39 Cubic regularization of Newton s method for convex problems with constraints Yu. Nesterov March 31, 006 Abstract In this paper we derive efficiency estimates of the regularized
More informationGradient methods for minimizing composite functions
Math. Program., Ser. B 2013) 140:125 161 DOI 10.1007/s10107-012-0629-5 FULL LENGTH PAPER Gradient methods for minimizing composite functions Yu. Nesterov Received: 10 June 2010 / Accepted: 29 December
More informationEE 227A: Convex Optimization and Applications October 14, 2008
EE 227A: Convex Optimization and Applications October 14, 2008 Lecture 13: SDP Duality Lecturer: Laurent El Ghaoui Reading assignment: Chapter 5 of BV. 13.1 Direct approach 13.1.1 Primal problem Consider
More informationarxiv: v2 [math.oc] 21 Nov 2017
Unifying abstract inexact convergence theorems and block coordinate variable metric ipiano arxiv:1602.07283v2 [math.oc] 21 Nov 2017 Peter Ochs Mathematical Optimization Group Saarland University Germany
More informationA Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization
A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London www.doc.ic.ac.uk/ pp500 p.parpas@imperial.ac.uk jointly with D.V.
More informationE5295/5B5749 Convex optimization with engineering applications. Lecture 5. Convex programming and semidefinite programming
E5295/5B5749 Convex optimization with engineering applications Lecture 5 Convex programming and semidefinite programming A. Forsgren, KTH 1 Lecture 5 Convex optimization 2006/2007 Convex quadratic program
More informationAdaptive Restarting for First Order Optimization Methods
Adaptive Restarting for First Order Optimization Methods Nesterov method for smooth convex optimization adpative restarting schemes step-size insensitivity extension to non-smooth optimization continuation
More informationShiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers
Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)
More informationNonlinear Optimization for Optimal Control
Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]
More informationGradient Descent. Ryan Tibshirani Convex Optimization /36-725
Gradient Descent Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: canonical convex programs Linear program (LP): takes the form min x subject to c T x Gx h Ax = b Quadratic program (QP): like
More informationBlock Coordinate Descent for Regularized Multi-convex Optimization
Block Coordinate Descent for Regularized Multi-convex Optimization Yangyang Xu and Wotao Yin CAAM Department, Rice University February 15, 2013 Multi-convex optimization Model definition Applications Outline
More informationLecture 5: Gradient Descent. 5.1 Unconstrained minimization problems and Gradient descent
10-725/36-725: Convex Optimization Spring 2015 Lecturer: Ryan Tibshirani Lecture 5: Gradient Descent Scribes: Loc Do,2,3 Disclaimer: These notes have not been subjected to the usual scrutiny reserved for
More informationA General Framework for Convex Relaxation of Polynomial Optimization Problems over Cones
Research Reports on Mathematical and Computing Sciences Series B : Operations Research Department of Mathematical and Computing Sciences Tokyo Institute of Technology 2-12-1 Oh-Okayama, Meguro-ku, Tokyo
More informationmin f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;
Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many
More information4. Algebra and Duality
4-1 Algebra and Duality P. Parrilo and S. Lall, CDC 2003 2003.12.07.01 4. Algebra and Duality Example: non-convex polynomial optimization Weak duality and duality gap The dual is not intrinsic The cone
More informationAccelerated primal-dual methods for linearly constrained convex problems
Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize
More informationRelaxed linearized algorithms for faster X-ray CT image reconstruction
Relaxed linearized algorithms for faster X-ray CT image reconstruction Hung Nien and Jeffrey A. Fessler University of Michigan, Ann Arbor The 13th Fully 3D Meeting June 2, 2015 1/20 Statistical image reconstruction
More informationHomework 5. Convex Optimization /36-725
Homework 5 Convex Optimization 10-725/36-725 Due Tuesday November 22 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationSIAM Conference on Imaging Science, Bologna, Italy, Adaptive FISTA. Peter Ochs Saarland University
SIAM Conference on Imaging Science, Bologna, Italy, 2018 Adaptive FISTA Peter Ochs Saarland University 07.06.2018 joint work with Thomas Pock, TU Graz, Austria c 2018 Peter Ochs Adaptive FISTA 1 / 16 Some
More informationIteration-complexity of first-order penalty methods for convex programming
Iteration-complexity of first-order penalty methods for convex programming Guanghui Lan Renato D.C. Monteiro July 24, 2008 Abstract This paper considers a special but broad class of convex programing CP)
More information12. Interior-point methods
12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity
More informationLecture 15 Newton Method and Self-Concordance. October 23, 2008
Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications
More informationLecture 9 Sequential unconstrained minimization
S. Boyd EE364 Lecture 9 Sequential unconstrained minimization brief history of SUMT & IP methods logarithmic barrier function central path UMT & SUMT complexity analysis feasibility phase generalized inequalities
More informationProjection methods to solve SDP
Projection methods to solve SDP Franz Rendl http://www.math.uni-klu.ac.at Alpen-Adria-Universität Klagenfurt Austria F. Rendl, Oberwolfach Seminar, May 2010 p.1/32 Overview Augmented Primal-Dual Method
More informationStochastic Proximal Gradient Algorithm
Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind
More informationPrimal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming
Mathematical Programming manuscript No. (will be inserted by the editor) Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming Guanghui Lan Zhaosong Lu Renato D. C. Monteiro
More informationConvex Optimization on Large-Scale Domains Given by Linear Minimization Oracles
Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Arkadi Nemirovski H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology Joint research
More informationProximal Newton Method. Ryan Tibshirani Convex Optimization /36-725
Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h
More informationr=1 r=1 argmin Q Jt (20) After computing the descent direction d Jt 2 dt H t d + P (x + d) d i = 0, i / J
7 Appendix 7. Proof of Theorem Proof. There are two main difficulties in proving the convergence of our algorithm, and none of them is addressed in previous works. First, the Hessian matrix H is a block-structured
More informationarxiv: v1 [math.oc] 21 Apr 2016
Accelerated Douglas Rachford methods for the solution of convex-concave saddle-point problems Kristian Bredies Hongpeng Sun April, 06 arxiv:604.068v [math.oc] Apr 06 Abstract We study acceleration and
More informationProximal-like contraction methods for monotone variational inequalities in a unified framework
Proximal-like contraction methods for monotone variational inequalities in a unified framework Bingsheng He 1 Li-Zhi Liao 2 Xiang Wang Department of Mathematics, Nanjing University, Nanjing, 210093, China
More informationLecture 8: February 9
0-725/36-725: Convex Optimiation Spring 205 Lecturer: Ryan Tibshirani Lecture 8: February 9 Scribes: Kartikeya Bhardwaj, Sangwon Hyun, Irina Caan 8 Proximal Gradient Descent In the previous lecture, we
More informationMath 273a: Optimization Subgradient Methods
Math 273a: Optimization Subgradient Methods Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com Nonsmooth convex function Recall: For ˉx R n, f(ˉx) := {g R
More informationCoordinate Update Algorithm Short Course Subgradients and Subgradient Methods
Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 30 Notation f : H R { } is a closed proper convex function domf := {x R n
More informationAgenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples
Agenda Fast proximal gradient methods 1 Accelerated first-order methods 2 Auxiliary sequences 3 Convergence analysis 4 Numerical examples 5 Optimality of Nesterov s scheme Last time Proximal gradient method
More informationLIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS
LIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS Napsu Karmitsa 1 Marko M. Mäkelä 2 Department of Mathematics, University of Turku, FI-20014 Turku,
More informationConstrained Optimization and Lagrangian Duality
CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may
More informationLecture 5: September 15
10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 15 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Di Jin, Mengdi Wang, Bin Deng Note: LaTeX template courtesy of UC Berkeley EECS
More informationDeterminant maximization with linear. S. Boyd, L. Vandenberghe, S.-P. Wu. Information Systems Laboratory. Stanford University
Determinant maximization with linear matrix inequality constraints S. Boyd, L. Vandenberghe, S.-P. Wu Information Systems Laboratory Stanford University SCCM Seminar 5 February 1996 1 MAXDET problem denition
More informationSolving large Semidefinite Programs - Part 1 and 2
Solving large Semidefinite Programs - Part 1 and 2 Franz Rendl http://www.math.uni-klu.ac.at Alpen-Adria-Universität Klagenfurt Austria F. Rendl, Singapore workshop 2006 p.1/34 Overview Limits of Interior
More informationA Greedy Framework for First-Order Optimization
A Greedy Framework for First-Order Optimization Jacob Steinhardt Department of Computer Science Stanford University Stanford, CA 94305 jsteinhardt@cs.stanford.edu Jonathan Huggins Department of EECS Massachusetts
More informationResearch Reports on Mathematical and Computing Sciences
ISSN 1342-284 Research Reports on Mathematical and Computing Sciences Exploiting Sparsity in Linear and Nonlinear Matrix Inequalities via Positive Semidefinite Matrix Completion Sunyoung Kim, Masakazu
More informationLecture 5: September 12
10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 12 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Barun Patra and Tyler Vuong Note: LaTeX template courtesy of UC Berkeley EECS
More informationA Distributed Newton Method for Network Utility Maximization, II: Convergence
A Distributed Newton Method for Network Utility Maximization, II: Convergence Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract The existing distributed algorithms for Network Utility
More informationRandomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints
Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints By I. Necoara, Y. Nesterov, and F. Glineur Lijun Xu Optimization Group Meeting November 27, 2012 Outline
More information15-780: LinearProgramming
15-780: LinearProgramming J. Zico Kolter February 1-3, 2016 1 Outline Introduction Some linear algebra review Linear programming Simplex algorithm Duality and dual simplex 2 Outline Introduction Some linear
More informationConvexification of Mixed-Integer Quadratically Constrained Quadratic Programs
Convexification of Mixed-Integer Quadratically Constrained Quadratic Programs Laura Galli 1 Adam N. Letchford 2 Lancaster, April 2011 1 DEIS, University of Bologna, Italy 2 Department of Management Science,
More informationAccelerated Proximal Gradient Methods for Convex Optimization
Accelerated Proximal Gradient Methods for Convex Optimization Paul Tseng Mathematics, University of Washington Seattle MOPTA, University of Guelph August 18, 2008 ACCELERATED PROXIMAL GRADIENT METHODS
More informationConvex Optimization and l 1 -minimization
Convex Optimization and l 1 -minimization Sangwoon Yun Computational Sciences Korea Institute for Advanced Study December 11, 2009 2009 NIMS Thematic Winter School Outline I. Convex Optimization II. l
More informationLecture 3: Huge-scale optimization problems
Liege University: Francqui Chair 2011-2012 Lecture 3: Huge-scale optimization problems Yurii Nesterov, CORE/INMA (UCL) March 9, 2012 Yu. Nesterov () Huge-scale optimization problems 1/32March 9, 2012 1
More informationAn Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods
An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This
More informationCONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS
CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS Igor V. Konnov Department of Applied Mathematics, Kazan University Kazan 420008, Russia Preprint, March 2002 ISBN 951-42-6687-0 AMS classification:
More informationLecture 9: September 28
0-725/36-725: Convex Optimization Fall 206 Lecturer: Ryan Tibshirani Lecture 9: September 28 Scribes: Yiming Wu, Ye Yuan, Zhihao Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These
More informationLecture 7 Monotonicity. September 21, 2008
Lecture 7 Monotonicity September 21, 2008 Outline Introduce several monotonicity properties of vector functions Are satisfied immediately by gradient maps of convex functions In a sense, role of monotonicity
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57
More informationA Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming
A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming Zhaosong Lu Lin Xiao March 9, 2015 (Revised: May 13, 2016; December 30, 2016) Abstract We propose
More informationInderjit Dhillon The University of Texas at Austin
Inderjit Dhillon The University of Texas at Austin ( Universidad Carlos III de Madrid; 15 th June, 2012) (Based on joint work with J. Brickell, S. Sra, J. Tropp) Introduction 2 / 29 Notion of distance
More informationLecture 6: Conic Optimization September 8
IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions
More informationConstrained optimization: direct methods (cont.)
Constrained optimization: direct methods (cont.) Jussi Hakanen Post-doctoral researcher jussi.hakanen@jyu.fi Direct methods Also known as methods of feasible directions Idea in a point x h, generate a
More informationGeometric problems. Chapter Projection on a set. The distance of a point x 0 R n to a closed set C R n, in the norm, is defined as
Chapter 8 Geometric problems 8.1 Projection on a set The distance of a point x 0 R n to a closed set C R n, in the norm, is defined as dist(x 0,C) = inf{ x 0 x x C}. The infimum here is always achieved.
More informationWritten Examination
Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes
More informationCHAPTER 11. A Revision. 1. The Computers and Numbers therein
CHAPTER A Revision. The Computers and Numbers therein Traditional computer science begins with a finite alphabet. By stringing elements of the alphabet one after another, one obtains strings. A set of
More informationUnconstrained minimization
CSCI5254: Convex Optimization & Its Applications Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions 1 Unconstrained
More informationGradient Descent. Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh. Convex Optimization /36-725
Gradient Descent Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh Convex Optimization 10-725/36-725 Based on slides from Vandenberghe, Tibshirani Gradient Descent Consider unconstrained, smooth convex
More informationAn inexact subgradient algorithm for Equilibrium Problems
Volume 30, N. 1, pp. 91 107, 2011 Copyright 2011 SBMAC ISSN 0101-8205 www.scielo.br/cam An inexact subgradient algorithm for Equilibrium Problems PAULO SANTOS 1 and SUSANA SCHEIMBERG 2 1 DM, UFPI, Teresina,
More informationDuality in Linear Programs. Lecturer: Ryan Tibshirani Convex Optimization /36-725
Duality in Linear Programs Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: proximal gradient descent Consider the problem x g(x) + h(x) with g, h convex, g differentiable, and
More informationProximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization
Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R
More informationConvex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013
Convex Optimization (EE227A: UC Berkeley) Lecture 15 (Gradient methods III) 12 March, 2013 Suvrit Sra Optimal gradient methods 2 / 27 Optimal gradient methods We saw following efficiency estimates for
More informationThe Steepest Descent Algorithm for Unconstrained Optimization
The Steepest Descent Algorithm for Unconstrained Optimization Robert M. Freund February, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 1 Steepest Descent Algorithm The problem
More informationMultipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations
Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations Oleg Burdakov a,, Ahmad Kamandi b a Department of Mathematics, Linköping University,
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationA Unified Approach to Proximal Algorithms using Bregman Distance
A Unified Approach to Proximal Algorithms using Bregman Distance Yi Zhou a,, Yingbin Liang a, Lixin Shen b a Department of Electrical Engineering and Computer Science, Syracuse University b Department
More informationLagrangian-Conic Relaxations, Part I: A Unified Framework and Its Applications to Quadratic Optimization Problems
Lagrangian-Conic Relaxations, Part I: A Unified Framework and Its Applications to Quadratic Optimization Problems Naohiko Arima, Sunyoung Kim, Masakazu Kojima, and Kim-Chuan Toh Abstract. In Part I of
More informationLecture 8 Plus properties, merit functions and gap functions. September 28, 2008
Lecture 8 Plus properties, merit functions and gap functions September 28, 2008 Outline Plus-properties and F-uniqueness Equation reformulations of VI/CPs Merit functions Gap merit functions FP-I book:
More informationACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING
ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING YANGYANG XU Abstract. Motivated by big data applications, first-order methods have been extremely
More informationCSCI : Optimization and Control of Networks. Review on Convex Optimization
CSCI7000-016: Optimization and Control of Networks Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one
More informationLecture 14: October 17
1-725/36-725: Convex Optimization Fall 218 Lecture 14: October 17 Lecturer: Lecturer: Ryan Tibshirani Scribes: Pengsheng Guo, Xian Zhou Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:
More informationDual Proximal Gradient Method
Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method
More information