FINE TUNING NESTEROV'S STEEPEST DESCENT ALGORITHM FOR DIFFERENTIABLE CONVEX PROGRAMMING
CLÓVIS C. GONZAGA AND ELIZABETH W. KARAS

Abstract. We modify the first-order algorithm for convex programming described by Nesterov in his book [5]. In his algorithms, Nesterov makes explicit use of a Lipschitz constant L for the function gradient, which is either assumed known [5] or estimated by an adaptive procedure [7]. We eliminate the use of L at the cost of an extra imprecise line search, and obtain an algorithm which keeps the optimal complexity properties and also inherits the global convergence properties of the steepest descent method for general continuously differentiable optimization. Besides this, we develop an adaptive procedure for estimating a strong convexity constant for the function. Numerical tests for a limited set of toy problems show an improvement in performance when compared with the original Nesterov algorithms.

Key words. Unconstrained convex problems, optimal method

AMS subject classifications. 62K05, 65K05, 68Q25, 90C25, 90C60

1. Introduction. We study the nonlinear programming problem

(P)    minimize f(x) subject to x ∈ ℝⁿ,

where f : ℝⁿ → ℝ is convex and continuously differentiable, with a Lipschitz constant L > 0 for the gradient and a convexity parameter µ ≥ 0. This means that for all x, y ∈ ℝⁿ,

(1.1)    ‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖

and

(1.2)    f(x) ≥ f(y) + ∇f(y)ᵀ(x − y) + (µ/2)‖x − y‖².

If µ > 0, the function is said to be strongly convex. Note that µ ≤ L. Assume that the problem admits an optimal solution x* (not necessarily unique), and consider an algorithm which generates a sequence {x_k}, starting at a given point x_0 ∈ ℝⁿ. The complexity results here consist in establishing bounds on the number of iterations k which ensure f(x_k) − f(x*) ≤ ε, for a given precision ε > 0. The bounds will depend on ‖x_0 − x*‖, on the Lipschitz constant L and on the strong convexity constant µ (if available).
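The two defining inequalities (1.1) and (1.2) can be checked numerically on a small example. The sketch below is our own illustration (not part of the paper): for a strongly convex quadratic f(x) = ½xᵀAx, the constants L and µ are the extreme eigenvalues of A.

```python
import numpy as np

# Illustration (our own, not from the paper): for f(x) = 0.5 x^T A x with A
# symmetric positive definite, grad f(x) = A x, the Lipschitz constant L of
# the gradient is the largest eigenvalue of A and the strong convexity
# parameter mu is the smallest.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M.T @ M + np.eye(4)                # symmetric positive definite
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
eigs = np.linalg.eigvalsh(A)           # ascending eigenvalues
mu, L = eigs[0], eigs[-1]

for _ in range(100):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    # (1.1): ||grad f(x) - grad f(y)|| <= L ||x - y||
    assert np.linalg.norm(grad(x) - grad(y)) <= L * np.linalg.norm(x - y) + 1e-9
    # (1.2): f(x) >= f(y) + grad f(y)^T (x - y) + (mu/2) ||x - y||^2
    assert f(x) >= f(y) + grad(y) @ (x - y) + 0.5 * mu * np.linalg.norm(x - y) ** 2 - 1e-9
```

Both inequalities are tight for this quadratic along the extreme eigenvectors of A, which is why µ and L are exactly the smallest and largest eigenvalues.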
No method for solving (P) based on accumulated first-order information can achieve an iteration-complexity bound of less than O(1/√ε) [4, 13]. The steepest descent method with constant step lengths cannot achieve a complexity better than O(1/ε) [5]. Results for higher-order methods are discussed in Nesterov and Polyak [9].

Department of Mathematics, Federal University of Santa Catarina, Cx. Postal 52, Florianópolis, SC, Brazil; clovis@mtm.ufsc.br. Research done while visiting LIMOS, Université Blaise Pascal, Clermont-Ferrand, France. This author is supported by CNPq and PRONEX-Optimization.

Department of Mathematics, Federal University of Paraná, Cx. Postal 19081, Curitiba, PR, Brazil; ewkaras@ufpr.br, ewkaras@gmail.com. Research done while visiting the Tokyo Institute of Technology, Global Edge Institute. Supported by CNPq, PRONEX-Optimization and MEXT's program.
Nesterov [5] proves that, with an insignificant increase in computational effort per iteration, the computational complexity of the steepest descent method is lowered to the optimal value O(1/√ε) for general smooth convex functions. For strongly convex functions, Nesterov's methods are linearly convergent, with a complexity bound of O(√(L/µ) log(1/ε)) (see [5, Thm 2.2.2]). For general convex functions, the bound obtained in this theorem is

(1.3)    f(x_k) − f(x*) ≤ 4L‖x_0 − x*‖²/(k + 2)² ≤ 4L‖x_0 − x*‖²/k².

These results were extended to more general convex problems, including problems with so-called simple constraints (see, for instance, [2, 6, 7, 8]). In the special case of minimizing a strictly convex quadratic function, conjugate direction methods with exact line searches find the optimal solution in no more than n iterations and satisfy (see [11, p. 170]):

(1.4)    f(x_k) − f(x*) ≤ L‖x_0 − x*‖²/(2(2k + 1)²) ≤ L‖x_0 − x*‖²/(8k²).

Note that this bound is independent of n, and is about 32 times better than the bound (1.3) obtained by Nesterov for general convex functions. Nemirovsky and Yudin [4, Sec 7.2.2] show that no first-order method can be more than twice better than conjugate gradients for strongly convex quadratic problems. For non-quadratic convex functions, the same reference [4, Sec 8.2.2] shows that both the Fletcher-Reeves and the Polak-Ribière versions of conjugate gradient methods are not optimal. This is done by constructing examples in which, even with exact line searches, both methods need a multiple of 1/ε iterations to achieve an ε accuracy. Our paper does a fine tuning of Nesterov's algorithm [5, Scheme 2.2.6]. His method relies heavily on the usage of a Lipschitz constant L, which is either given or adaptively estimated as in Nesterov [7], at the cost of extra gradient computations.
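The factor of about 32 between the bounds (1.3) and (1.4) is simply the ratio of the leading constants in front of ‖x_0 − x*‖²/k². A one-line arithmetic check (our own illustration):

```python
# Leading constants multiplying ||x0 - x*||^2 / k^2 in the two bounds:
nesterov_const = 4.0        # bound (1.3):  4 L ||x0 - x*||^2 / k^2
cg_const = 1.0 / 8.0        # bound (1.4):  L ||x0 - x*||^2 / (8 k^2)
ratio = nesterov_const / cg_const
print(ratio)                # -> 32.0
```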
The methods achieve optimal complexity, and when L is given this is done with the smallest possible computational time per iteration: only one gradient computation and no function evaluations (besides the ones used in the line search, if it is done at all). We trade this economy in function evaluations for an extra inexact line search, eliminating the usage of L and improving the efficiency of each iteration, while keeping the optimal complexity in number of iterations. Nesterov's algorithm takes advantage of a strong convexity parameter µ > 0 whenever one is known. Our best algorithm, presented in Section 4, uses a decreasing sequence of estimates for the parameter µ.

Remark. Estimates of the Lipschitz constant were used by Nesterov [7]. In another context (constrained minimization of non-smooth homogeneous convex functions), Richtárik [12] uses an adaptive scheme for guessing bounds for the distance between a point x_0 and an optimal solution x*. This bound determines the number of subgradient steps needed to obtain a desired precision. We found no method for guessing the strong convexity constant µ in the literature.

Structure of the method. A detailed explanation of the method is in Nesterov's book [5, Sec ]. Here we give a rough description of its main features. It uses local properties of f(·) (the steepest descent directions) and global properties of convex functions, by generating a sequence of functional upper bounds which are locally imposed on the epigraph of f(·) at each iteration. The functional upper bounds are simple distance functions with the
shape

(1.5)    φ_k(x) = φ*_k + (γ_k/2)‖x − v_k‖²,

where φ*_k ∈ ℝ, γ_k > 0 and v_k ∈ ℝⁿ. Note that these functions are strictly convex with minimum value φ*_k at v_k. At the start, v_0 = x_0 is a given point and γ_0 > 0 is given. The method will compute a sequence of points x_k, a sequence of second-order constants γ_k ≥ µ such that γ_k → µ, and a sequence of functions φ_k(·) as above. The construction is such that at each iteration φ*_k ≥ f(x_k) (in the present paper we require φ*_k = f(x_k)). Hence any optimal solution x* satisfies f(x*) ≤ φ*_k ≤ φ_k(x_k), and so we can think of φ_k(·) as an upper bound imposed on the epigraph of f(·). As k increases, the functions φ_k become flatter. This is shown in Fig. 1.1.

[Fig. 1.1. The mechanics of the algorithm.]

Nesterov shows how to construct these functions so that, defining

λ_k = (γ_k − µ)/(γ_0 − µ),

the following relation holds:

f(x_k) ≤ φ_k(·) ≤ λ_k φ_0(·) + (1 − λ_k) f(·).

With this property, an immediate examination at an optimal solution results in

f(x_k) ≤ φ_k(x*) ≤ λ_k φ_0(x*) + (1 − λ_k) f(x*) = f(x*) + λ_k (φ_0(x*) − f(x*)),

and we see that f(x_k) → f(x*) with the speed with which λ_k → 0. Nesterov's method ensures λ_k = O(1/k²), and this means optimal complexity: an error of ε > 0 is achieved in O(1/√ε) iterations [5, Chap. 2]. In the present paper we shall not use these parameters λ_k, dealing directly with the sequence (γ_k). Moreover, the complete complexity proof in Section 4 is for a modified algorithm based on the construction of an adaptive sequence of strong convexity parameter estimates µ_k ≥ µ such that µ_k → µ. The practical efficiency of the method depends on how much each iteration explores the local properties around each iterate x_k to push λ_k down. Of course, the worst-case complexity cannot be improved, but the speed can be improved in practical problems. This is the goal of this paper.

Structure of the paper.
In Section 2 we present a prototype algorithm and its main variants, and we describe all variables, functions and parameters needed in the analysis.
Section 3 discusses some details of the implementation and simplifications of the methods in several different environments. Section 4 describes the modified algorithm based on adaptive estimates of the strong convexity parameter and gives a complete complexity analysis, proving that Nesterov's complexity bound is preserved. Finally, Section 5 compares numerically the performance of the algorithms. The numerical results show that our modifications improve the performance of Nesterov's algorithms in most examples of a set of toy problems, mainly when no good bound for L is known.

2. Description of the main algorithms. In this section we present a prototype algorithm which includes the choice of two parameters (α_k and θ_k) in each iteration: different choices of these parameters give different algorithms.

2.1. The algorithm. Here we state the prototype algorithm. At this moment it is not possible to understand the meaning of each procedure: this will be the subject of the rest of this section. Consider the problem (P). Here we assume that the strong convexity parameter µ is given (possibly null). In Section 4 we modify the algorithm to use an adaptive decreasing sequence of estimates for µ.

Algorithm 2.1. Prototype algorithm.
Data: x_0 ∈ ℝⁿ, v_0 = x_0, γ_0 > µ.
k = 0
repeat
    d_k = v_k − x_k.
    Choose θ_k ∈ [0, 1].
    y_k = x_k + θ_k d_k.
    If ∇f(y_k) = 0, then stop with y_k as an optimal solution.
    Steepest descent step: x_{k+1} = y_k − ν∇f(y_k).
    Choose α_k ∈ (0, 1].
    γ_{k+1} = (1 − α_k)γ_k + α_k µ.
    v_{k+1} = (1/γ_{k+1}) ((1 − α_k)γ_k v_k + α_k(µ y_k − ∇f(y_k))).
    k = k + 1.

A complete proof of the optimal complexity will be done in Section 4, for an algorithm based on adaptive estimates of µ. The proof for a fixed µ is simpler and follows from that one. The variables associated with each iteration of the algorithm are:
x_k, v_k ∈ ℝⁿ, with x_0 given and v_0 = x_0.
α_k ∈ [0, 1).
γ_k ∈ ℝ₊₊: second-order constants. γ_0 > µ is given.
If L is available, then Nesterov [5, Thm ] suggests the choice γ_0 = L.
y_k ∈ ℝⁿ: a point in the segment joining x_k and v_k, y_k = x_k + θ_k(v_k − x_k). It is from y_k that a steepest descent step is computed in each iteration to obtain x_{k+1}.
x_{k+1} = y_k − ν∇f(y_k): next iterate, computed by any efficient line search procedure. We know that if L is known, then the step length ν = 1/L ensures that f(y_k) − f(x_{k+1}) ≥ ‖∇f(y_k)‖²/(2L). We require that the line search used be at least as good as this whenever L is known. An Armijo search with well-chosen parameters is usually the best choice, and even if L is not known it ensures a decrease of at least ‖∇f(y_k)‖²/(4L), which is also acceptable for the complexity analysis. Hence we shall assume that at all iterations

(2.1)    f(x_{k+1}) ≤ f(y_k) − (1/(4L))‖∇f(y_k)‖².

The geometry of one iteration of the algorithm is shown in Fig. 2.1 (left): y_k is between x_k and v_k, and x_{k+1} is obtained by a steepest descent step from y_k. The directions (x_{k+1} − y_k) and (v_{k+1} − v_k) are both collinear with ∇f(y_k).

[Fig. 2.1. One iteration of the algorithm.]

Choice of the parameters: different variants of the algorithm stem from different choices of the parameters α_k and θ_k. The study of efficient choices will be the main task of this paper. The choices made by Nesterov and by us will be described in Sec. 2.3, after a technical section on the construction of the functions φ_k(·).

2.2. Construction of the functions φ_k(·). Each iteration starts with the simple quadratic function φ_k(·) as defined in (1.5) (we call simple a quadratic function whose Hessian has the shape tI, with t ≥ 0). A new simple quadratic φ_{k+1}(·) is constructed as a convex combination of φ_k(·) and a quadratic approximation ℓ(·) of f(·) around y_k. This is shown in Fig. 2.1 (right) for a one-dimensional problem. The choice of this combination by Nesterov is such that the minimum value φ*_{k+1} of φ_{k+1}(·) satisfies φ*_{k+1} ≥ f(x_{k+1}) ≥ f(x*). We shall require that φ*_{k+1} = f(x_{k+1}). Now we quote Nesterov [5] to describe the construction of the functions φ_k(·). We start from given x_k, v_k, γ_k, φ_k(·).
At this point we assume that y_k is also given (its calculation will be one of the tasks of this paper). Assume also that x_{k+1} has been computed and that (2.1) holds. Once y_k and x_{k+1} are given, we must compute α_k. Here we change the notation to include α as a variable and define the function (α, x) ∈ ℝ × ℝⁿ ↦ φ_{k+1}(α, x).
The function φ_{k+1}(·, ·) is an affine combination of φ_k(·) and a quadratic approximation ℓ_k(·) of f(·) around y_k:

(2.2)    φ_{k+1}(α, x) = (1 − α)φ_k(x) + α ℓ_k(x),  where  ℓ_k(x) = f(y_k) + ∇f(y_k)ᵀ(x − y_k) + (µ/2)‖x − y_k‖².

Remark. In this subsection we shall not require ℓ_k(·) to be a lower quadratic approximation, i.e., µ > 0 may not be a strong convexity constant. If µ > 0, then ℓ_k(·) is minimized at x̂ = y_k − ∇f(y_k)/µ, with

(2.3)    min_{x∈ℝⁿ} ℓ_k(x) = ℓ_k(x̂) = f(y_k) − ‖∇f(y_k)‖²/(2µ),

and ℓ_k(·) may be expressed as ℓ_k(x) = ℓ_k(x̂) + (µ/2)‖x − x̂‖². We shall assume that ℓ_k(·) is not constant, otherwise y_k would be an optimal solution for the problem (P). Define for α ∈ ℝ,

(2.4)    φ*_{k+1}(α) = inf_{x∈ℝⁿ} φ_{k+1}(α, x) ∈ ℝ ∪ {−∞}

and

(2.5)    γ_{k+1}(α) = (1 − α)γ_k + αµ.

The function φ_{k+1}(α, ·) is also a simple quadratic function, with Hessian γ_{k+1}(α)I. If γ_{k+1}(α) > 0, then φ*_{k+1}(α) is finite. If γ_{k+1}(α) < 0, then φ*_{k+1}(α) = −∞. The fact below follows trivially from (2.5):

Fact 2.2. For γ_k > µ ≥ 0, {α ∈ ℝ : γ_{k+1}(α) > 0} = (−∞, α_max), with α_max = γ_k/(γ_k − µ).

So φ*_{k+1}(α) is well defined for any α < α_max, and φ*_{k+1}(α) = −∞ for α > α_max, because in this case γ_{k+1}(α) < 0. Here we shall look at what happens for α = α_max and isolate a simple situation in which φ_k and ℓ_k have coincident minimizers.

Lemma 2.3. Consider γ_k > µ and α_max = γ_k/(γ_k − µ). Then one and only one of the following situations occurs:
(i) v_k is a minimizer of ℓ_k and φ*_{k+1} is linear on (−∞, α_max], with φ*_{k+1}(α) = (1 − α)φ*_k + α ℓ_k(v_k).
(ii) φ*_{k+1}(α_max) = −∞.

Proof. Let us assume that inf_{x∈ℝⁿ} φ_{k+1}(α_max, x) is finite. By construction, φ_{k+1}(α_max, ·) is linear, because γ_{k+1}(α_max) = 0. Thus φ_{k+1}(α_max, ·) = (1 − α_max)φ_k(·) + α_max ℓ_k(·) is constant. If µ = 0, then α_max = 1, ℓ_k(·) is constant and obviously v_k is a minimizer of ℓ_k.
On the other hand, if µ > 0, we can write ℓ_k(x) = ℓ_k(x̂) + (µ/2)‖x − x̂‖², where x̂ is the minimizer of ℓ_k(·). Differentiating the expression for φ_{k+1}(α_max, ·) with respect to x, we have for any x ∈ ℝⁿ,

(1 − α_max)γ_k(x − v_k) + α_max µ(x − x̂) = 0.

If, by contradiction, v_k ≠ x̂, then substituting x = v_k we get α_max µ = 0, which cannot be true, proving that v_k = x̂. So, in any case, v_k is a minimizer of both φ_k(·) and ℓ_k(·), and for any α ∈ (−∞, α_max],

φ*_{k+1}(α) = (1 − α)φ*_k + α ℓ_k(v_k),

which is linear, completing the proof.

Lemma 2.4. For any α ∈ (−∞, α_max), the function x ↦ φ_{k+1}(α, x) is a simple quadratic function given by

(2.6)    φ_{k+1}(α, x) = φ*_{k+1}(α) + (γ_{k+1}(α)/2)‖x − v_{k+1}(α)‖²,

with

(2.7)    φ*_{k+1}(α) = f(y_k) − (α²/(2γ_{k+1}(α)))‖∇f(y_k)‖² + (1 − α)[φ*_k − f(y_k) + (αγ_k/γ_{k+1}(α))((µ/2)‖y_k − v_k‖² + ∇f(y_k)ᵀ(v_k − y_k))],

(2.8)    v_{k+1}(α) = (1/γ_{k+1}(α))((1 − α)γ_k v_k + αµ y_k − α∇f(y_k))

and

(2.9)    γ_{k+1}(α) = (1 − α)γ_k + αµ.

In particular, if µ > 0, then

(2.10)    φ*_{k+1}(1) = f(y_k) − ‖∇f(y_k)‖²/(2µ).

Proof. The proof is straightforward and done in detail in Nesterov's book [5].

We now use Danskin's theorem, in the format presented in Bertsekas [1], to prove that φ*_{k+1}(·) is concave.

Lemma 2.5. The function α ∈ ℝ ↦ φ*_{k+1}(α) constructed above is concave in ℝ and differentiable in (−∞, α_max).

Proof. If φ*_{k+1}(α_max) is finite, then Lemma 2.3 ensures that φ*_{k+1}(·) is linear, and hence this case is proved. Assume then that φ*_{k+1}(α_max) = −∞. We must prove that φ*_{k+1}(·) is concave and differentiable in (−∞, α_max). We will prove the concavity in an arbitrary interval [α₁, α₂] ⊂ (−∞, α_max). For each α ∈ [α₁, α₂], φ_{k+1}(α, ·) has a unique minimizer v_{k+1}(α). By continuity of v_{k+1}(·) in (2.8), v_{k+1}([α₁, α₂]) is bounded, i.e., there exists a compact set V ⊂ ℝⁿ such that φ*_{k+1}(α) = min_{x∈V} φ_{k+1}(α, x). So the hypotheses of Danskin's theorem are satisfied:
φ_{k+1}(·, x) is concave (in fact linear) for each x ∈ ℝⁿ,
φ_{k+1}(α, ·) has a unique minimizer in the compact set V for each α ∈ [α₁, α₂].

We conclude that φ*_{k+1}(·) is concave and differentiable in [α₁, α₂]. Since [α₁, α₂] was taken arbitrarily, this proves the concavity and differentiability in (−∞, α_max), completing the proof.

Setting y_k = x_k + θ_k d_k, with d_k = v_k − x_k and θ_k ∈ [0, 1], (2.7) may be written as

(2.11)    φ*_{k+1}(α) = f(y_k) − (α²/(2γ_{k+1}))‖∇f(y_k)‖² + (1 − α)ζ(α, θ_k),

with

(2.12)    ζ(α, θ) = φ*_k − f(x_k + θd_k) + (αγ_k/((1 − α)γ_k + αµ))((µ/2)(1 − θ)²‖d_k‖² + (1 − θ)f′(x_k + θd_k; d_k)).

Lemma 2.6. Assume that φ*_k ≥ f(x_k), y_k = x_k + θ_k d_k with d_k = v_k − x_k and θ_k ∈ [0, 1]. Assume also that x_{k+1} is obtained by a steepest descent step from y_k satisfying (2.1), where the Lipschitz constant L is possibly infinite. Then

(2.13)    φ*_{k+1}(α) ≥ f(x_{k+1}) + (1/(4L) − α²/(2γ_{k+1}))‖∇f(y_k)‖² + (1 − α)ζ(α, θ_k).

Furthermore,

(2.14)    ζ(α, θ) ≥ (−θ + (αγ_k/((1 − α)γ_k + αµ))(1 − θ)) f′(x_k + θd_k; d_k).

Proof. The expression (2.13) follows directly from (2.11) and the fact that the steepest descent step satisfies (2.1). Using the facts that φ*_k ≥ f(x_k) and µ ≥ 0 in the definition of ζ(·, ·), we can write

ζ(α, θ) ≥ f(x_k) − f(x_k + θd_k) + (αγ_k/((1 − α)γ_k + αµ))(1 − θ)f′(x_k + θd_k; d_k).

By the convexity of f(·), we know that f(x_k) − f(x_k + θd_k) ≥ −θf′(x_k + θd_k; d_k), and so

ζ(α, θ) ≥ (−θ + (αγ_k/((1 − α)γ_k + αµ))(1 − θ)) f′(x_k + θd_k; d_k),

completing the proof.

2.3. Choice of the parameters. In this section we assume that µ ≥ 0 is a strong convexity constant for f(·). We present two different schemes for choosing the parameters θ_k and α_k in Algorithm 2.1. For both schemes we have two main objectives:
Prove that for all choices of θ_k it is always possible to compute α_k such that φ*_{k+1}(α_k) = f(x_{k+1});
Prove that in any case α_k is large, i.e., α_k = Ω(√γ_{k+1}).

We begin by describing Nesterov's choice.
Nesterov's choice [5, Scheme 2.2.6]. When a Lipschitz constant L is known, Nesterov's choices are:

α_N is computed so that the central term in (2.13) is null. So α_N is the positive solution of the second-degree equation

(2.15)    2Lα² − (1 − α)γ_k − αµ = 0.

θ_N is such that the right-hand side of (2.14) is null:

(2.16)    θ_N = α_N γ_k/(γ_k + α_N µ).

Note: if L is unknown but exists, α_N and θ_N are well defined (but unknown). If L does not exist, we set α_N = 0, θ_N = 0. Nesterov's original method sets α_k = α_N and θ_k = θ_N, obtaining directly from (2.13) and ζ(α_N, θ_N) = 0 that

(2.17)    φ*_{k+1}(α_N) ≥ f(x_{k+1})  and  α_N = √(γ_{k+1}/(2L)).

Our choice.
Compute θ_k ∈ [0, 1] such that

(2.18)    f(x_k + θ_k d_k) ≤ f(x_k)  and  (θ_k = 1 or f′(x_k + θ_k d_k; d_k) ≥ 0).

These are an Armijo condition with parameter 0 and a Wolfe condition with parameter 0.
Compute α_k ∈ [0, 1] as the largest root of the equation

(2.19)    Aα² + Bα + C = 0,

with

G = γ_k((µ/2)‖v_k − y_k‖² + ∇f(y_k)ᵀ(v_k − y_k)),
A = G + ‖∇f(y_k)‖²/2 + (µ − γ_k)(f(x_k) − f(y_k)),
B = (µ − γ_k)(f(x_{k+1}) − f(x_k)) − γ_k(f(y_k) − f(x_k)) − G,
C = γ_k(f(x_{k+1}) − f(x_k)).

As we will see, this equation comes from the equality φ*_{k+1}(α) = f(x_{k+1}). The next lemma uses the definition of α_N to show under what conditions we can compute α_k ≥ α_N such that this equality holds. By construction, we have

(2.20)    φ*_{k+1}(0) = φ*_k ≥ f(x_k),
(2.21)    φ*_{k+1}(1) = inf_{x∈ℝⁿ} ℓ_k(x) ≤ f(x*) ≤ f(x_{k+1}).

Let us first eliminate a trivial case: due to (2.21), if φ*_{k+1}(1) = f(x_{k+1}), then x_{k+1} is an optimal solution, and we may terminate the algorithm. Let us assume that φ*_{k+1}(1) < f(x_{k+1}).
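Once the coefficients A, B, C of (2.19) have been assembled from the quantities above, α_k is the largest real root. A minimal sketch of the root computation (our own; the coefficient assembly is left to the caller, and the linear fallback for A ≈ 0 is our own safeguard, not part of the paper):

```python
import math

def largest_root(A, B, C, eps=1e-12):
    """Largest real root of A*alpha^2 + B*alpha + C = 0.

    Sketch for solving equation (2.19); the caller supplies the
    coefficients A, B, C computed from gamma_k, mu, f-values and
    gradients as in the text."""
    if abs(A) < eps:
        # equation degenerates to B*alpha + C = 0 (our own safeguard)
        return -C / B
    disc = B * B - 4.0 * A * C
    if disc < 0.0:
        raise ValueError("no real root: the equality cannot be satisfied")
    r1 = (-B + math.sqrt(disc)) / (2.0 * A)
    r2 = (-B - math.sqrt(disc)) / (2.0 * A)
    return max(r1, r2)
```

Under the assumptions of the analysis, the largest root lies in [0, 1], so in practice the returned value can additionally be checked against that interval.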
The following lemma does not need the assumption that µ > 0 is a strong convexity constant.

Lemma 2.7. Assume that φ*_k > f(x_{k+1}) > inf_{x∈ℝⁿ} ℓ_k(x). Then φ*_{k+1}(α) = f(x_{k+1}) has one or two real roots, and the largest root belongs to [0, 1]. The roots are computed by solving the second-degree equation (2.19).

Proof. Let ᾱ be the largest α ∈ [0, 1) such that φ*_{k+1}(ᾱ) = max_{α∈[0,1]} φ*_{k+1}(α), which exists because φ*_{k+1}(·) is concave and continuous in [0, 1), with φ*_{k+1}(1) < φ*_{k+1}(ᾱ). Since φ*_{k+1}(1) < φ*_{k+1}(ᾱ) and f(x_{k+1}) ≥ φ*_{k+1}(1), the concave function decreases from ᾱ to 1 and crosses the value f(x_{k+1}) exactly once in [ᾱ, +∞), at some point α ∈ [0, 1]. Setting φ*_{k+1}(α) = f(x_{k+1}) in (2.7) and multiplying both sides of the equation by γ_{k+1}(α) = (1 − α)γ_k + αµ, we obtain a second-degree equation in α. The largest root of this equation must be α, completing the proof.

Lemma 2.8. Consider an iteration k of Algorithm 2.1. Assume that x_k is not a minimizer of f and φ*_{k+1}(1) < f(x_{k+1}). Assume that y_k = x_k + θ_k d_k is chosen so that ζ(α_N, θ_k) ≥ 0. Then there exists a value α_k ∈ [α_N, 1] that solves the equation φ*_{k+1}(α_k) = f(x_{k+1}) in (2.7).

Proof. Assume that α_N and y_k = x_k + θ_k d_k have been computed as in the lemma. Since x_k is not a minimizer of f, by convexity of f and the definition of ℓ_k(·),

φ*_k ≥ f(x_k) > f(x*) ≥ ℓ_k(x*) ≥ inf_{x∈ℝⁿ} ℓ_k(x).

On the other hand,

f(x_{k+1}) > φ*_{k+1}(1) = inf_{x∈ℝⁿ} ℓ_k(x).

Using Lemma 2.6 with α = α_N and ζ(α_N, θ_k) ≥ 0, we get from (2.13) that φ*_{k+1}(α_N) ≥ f(x_{k+1}). Now from Lemma 2.7 the equation φ*_{k+1}(α) = f(x_{k+1}) has a real root, and the largest root α is in [0, 1]. Note that α ≥ α_N, because α is the largest root of the concave function φ*_{k+1}(·) − f(x_{k+1}).

We conclude from this lemma that the equality φ*_{k+1}(α_k) = f(x_{k+1}) can always be achieved whenever ζ(α_N, θ_k) ≥ 0.

Theorem 2.9.
Algorithm 2.1, with either of the choices (2.16) or (2.18) for θ_k, is well defined and generates sequences {x_k} and {φ*_k} such that for all k = 0, 1, ..., φ*_k = f(x_k). Moreover, the parameters α_k and γ_k satisfy

(2.22)    α_k ≥ √(γ_{k+1}/(2L)).

Proof. (i) Let us first prove that both versions of the algorithm are well defined.
Nesterov: immediate. Taking θ_k = θ_N, we obtain by the definitions of α_N and θ_N that ζ(α_N, θ_N) = 0, and Lemma 2.8 ensures that α_k ∈ [α_N, 1) can be computed as desired.
Our method: since f(x_k) − f(x_k + θ_k d_k) ≥ 0 and f′(x_k + θ_k d_k; d_k) ≥ 0 by construction, ζ(α, θ_k) ≥ 0 for any α ∈ [0, 1], and Lemma 2.8 can be used similarly.
(ii) In both cases we obtain ζ(α_k, θ_k) ≥ 0. Hence, from (2.13),

f(x_{k+1}) = φ*_{k+1}(α_k) ≥ f(x_{k+1}) + (1/(4L) − α_k²/(2γ_{k+1}))‖∇f(y_k)‖²,
which immediately gives (2.22), completing the proof.

We conclude from this that Nesterov's algorithm, which uses the value θ_k = θ_N with α_k = α_N, can be improved by re-calculating α_k using (2.7) (and keeping θ_k = θ_N).

Remark. Even computing the best value for α_k, the sequence {f(x_k)} is not necessarily decreasing.

Remark. In the case µ = 0 the expressions are simplified. Then α_N is the positive solution of 2Lα_k² = (1 − α_k)γ_k, and expression (2.14) reduces to

ζ(α, θ) ≥ (−θ + (α/(1 − α))(1 − θ)) f′(x_k + θd_k; d_k).

In this case, θ_N = α_N.

3. More on the choice of θ_k. Our choice of θ_k uses a line search along d_k. In this section we discuss the amount of computation needed in this line search, and show that when L and µ are known it may be done in polynomial time. We conclude from the preceding analysis that what must be shown for a given choice of y_k = x_k + θ_k d_k is that ζ(α_N, θ_k) ≥ 0. The following lemma summarizes the conditions that ensure this property.

Lemma 3.1. Let y_k = x_k + θ_k d_k be the choice made by Algorithm 2.1 in iteration k, and let θ_N be the Nesterov choice (2.16). Assume that one of the following conditions is satisfied:
(i) f(y_k) ≤ f(x_k) and f′(y_k; d_k) ≥ 0;
(ii) f(y_k) ≤ f(x_k) and θ_k ≥ θ_N;
(iii) f′(y_k; d_k) ≥ 0 and θ_k ≤ θ_N;
(iv) f′(x_k; d_k) ≥ −(µ/2)‖d_k‖² and θ_k = 0.
Then ζ(α_N, θ_k) ≥ 0 (and α_k ≥ α_N may be computed using (2.7)).

Proof. (i) This has been proved in Theorem 2.9. Let us substitute θ_k = θ_N + Δθ in (2.14), with γ_N = (1 − α_N)γ_k + α_N µ:

ζ(α_N, θ_k) ≥ (−θ_N + (α_N γ_k/γ_N)(1 − θ_N) − Δθ − (α_N γ_k/γ_N)Δθ) f′(x_k + θ_k d_k; d_k).

By the definition of θ_N, −θ_N + α_N γ_k(1 − θ_N)/γ_N = 0. Hence

(3.1)    ζ(α_N, θ_k) ≥ −(1 + α_N γ_k/γ_N) Δθ f′(x_k + θ_k d_k; d_k).

(ii) Assume that f(y_k) ≤ f(x_k) and θ_k ≥ θ_N. If f′(y_k; d_k) ≤ 0, then the result follows from (3.1) because Δθ ≥ 0. Otherwise, it follows from (i).
(iii) Assume now that f′(y_k; d_k) ≥ 0 and θ_k ≤ θ_N. Then the result follows from (3.1) because Δθ ≤ 0.
(iv) This case is also trivial, by (2.12), completing the proof.

All we need is to specify in Algorithm 2.1 the choice of θ_k and α_k as follows:
Choose θ_k ∈ [0, 1] satisfying one of the conditions in Lemma 3.1.
Compute α_k such that φ*_{k+1} = f(x_{k+1}), using (2.7).

We must discuss the computation of θ_k, which requires a line search. Of course, if L is given, we can use θ_k = θ_N and ensure the optimal complexity, but larger values of α_k may be obtained.
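As an illustration of such a choice, here is a simple sketch of a search for θ_k satisfying (2.18) (Armijo and Wolfe conditions, both with parameter 0). It handles the special cases θ_k = 0 and θ_k = 1 and otherwise bisects; the paper relies on the interval-reduction scheme of its Appendix, so this bisection is our own stand-in, not the authors' procedure:

```python
import numpy as np

def theta_search(f, grad, x, d, max_iter=50):
    """Find theta in [0, 1] satisfying (2.18):
         f(x + theta*d) <= f(x)  and  (theta == 1 or grad(x+theta*d)^T d >= 0).
    Simple bisection stand-in for the interval-reduction search of the
    paper's Appendix (our own sketch, not the authors' procedure)."""
    fx = f(x)
    if grad(x) @ d >= 0:            # f'(x_k; d_k) >= 0: take theta_k = 0
        return 0.0
    if f(x + d) <= fx:              # full step acceptable: theta_k = 1
        return 1.0
    lo, hi = 0.0, 1.0               # derivative < 0 at lo, f increases past hi
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(x + mid * d) > fx:
            hi = mid                # beyond theta-bar: function increased
        elif grad(x + mid * d) @ d < 0:
            lo = mid                # before the minimizer: still descending
        else:
            return mid              # both conditions of (2.18) hold
    return lo                       # safe fallback: f(x + lo*d) <= f(x) by convexity
```

For a convex f the two rejection tests bracket the interval [θ*, θ̄] in which both conditions hold, so the bisection terminates quickly when that interval has positive length.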
We would like to obtain θ_k (and then α_k) such that ζ(α_k, θ_k) is as large as possible. Since the determination of α_k depends on the steepest descent step from y_k = x_k + θ_k d_k, we cannot maximize ζ(·, ·). But positive values of ζ(α, θ_k) will be obtained if θ_k is such that f(x_k) − f(x_k + θ_k d_k) and f′(x_k + θ_k d_k; d_k) are made as large as possible (of course, these are conflicting objectives). We define (but do not compute) two points:

(3.2)    θ* ∈ argmin{f(x_k + θd_k) : θ ≥ 0},
(3.3)    θ̄ = max{θ ∈ [0, 1] : f(x_k + θd_k) ≤ f(x_k)}.

Let us examine two important cases:
If f′(x_k; d_k) ≥ 0, then we should choose θ_k = 0: a steepest descent step from x_k follows.
If f(x_k + d_k) ≤ f(x_k), then we may choose θ_k = 1: a steepest descent step from v_k follows.

If neither of these two special situations occurs, then 0 < θ* ≤ θ̄ < 1. Any point θ_k ∈ [θ*, θ̄] gives ζ(α, θ_k) ≥ 0 for any α ∈ [0, 1], and y_k = x_k + θ_k d_k satisfies condition (i) of Lemma 3.1. A point in this interval can be computed by the interval-reduction algorithm described in the Appendix. We have no complexity estimates for a line search procedure such as this if L and µ are not available. When these constants are known, it is possible to limit the number of iterations in the line search so as to keep Nesterov's complexity. We shall discuss briefly the choice of θ_k in three possible situations, according to our knowledge of the constants.

Case 1: no knowledge of L or µ. We start the iteration by checking the special cases above:
If f(x_k + d_k) ≤ f(x_k), then we choose θ_k = 1: a steepest descent step from v_k will follow.
Otherwise, we must decide whether f′(x_k; d_k) ≥ 0. This may be done as follows:
    Compute f(x_k + θ̂d_k) for a small value of θ̂ > 0.
    If f(x_k + θ̂d_k) ≥ f(x_k), compute ∇f(x_k) and then f′(x_k; d_k) = ∇f(x_k)ᵀd_k.
    Else, f′(x_k; d_k) < 0.
If f′(x_k; d_k) ≥ 0, set θ_k = 0. Else, do a line search as in the Appendix, starting with the points 0, θ̂ and 1.

Case 2: L is given.
In this case we start the search with θ_N computed as in (2.16). The line search can be abbreviated to keep the number of function evaluations within a given bound, resulting in a step satisfying one of the first three conditions in Lemma 3.1. There are two cases to consider, depending on the sign of f(x_k) − f(x_k + θ_N d_k).

f(x_k + θ_N d_k) < f(x_k): condition (ii) of Lemma 3.1 is satisfied for any θ_k ≥ θ_N such that f(x_k + θ_k d_k) ≤ f(x_k). We may use an interval-reduction method as in the Appendix, or simply take steps to increase θ. A trivial method is the following, using a constant β > 1:
    θ_k = θ_N;
    while βθ_k ≤ 1 and f(x_k + βθ_k d_k) ≤ f(x_k), set θ_k = βθ_k.
In any method, the number of function evaluations may be limited by a given constant, adopting as θ_k the largest computed step length which satisfies the descent condition.

f(x_k + θ_N d_k) > f(x_k): condition (iii) of Lemma 3.1 holds for θ_N, and we intend to reduce the value of θ. For an interval-reduction search between 0 and θ_N we need
an intermediate point ν such that f(x_k + νd_k) ≤ f(x_k), if it exists. We propose the following procedure:
    Compute f(x_k + νd_k) for ν ≪ θ_N.
    If f(x_k + νd_k) ≤ f(x_k), set θ_k = ν.
    Else, compute θ_k by an interval-reduction method as in the Appendix, starting with the points 0, ν and θ_N.
The algorithm can be interrupted at any time, setting θ_k = B.

Case 3: L and µ > 0 are given. This is a special case of Case 2, with an interesting property: a point satisfying condition (i) in Lemma 3.1 can be computed in polynomial time without computing ∇f(x_k), using the next lemma.

Lemma 3.2. Consider θ* ∈ argmin{f(x_k + θd_k) : θ ≥ 0} and θ̄ > θ* such that f(x_k + θ̄d_k) = f(x_k), if it exists. Define θ̃ = µ/(2L). If

f(x_k + θ̃d_k) ≤ f(x_k) − (µ²/(8L))‖d‖²,

then θ* ≥ θ̃ and θ̄ − θ* ≥ θ̃; otherwise f′(x_k; d_k) ≥ −(µ/2)‖d‖².

Proof. Assume that f(x_k + θ*d_k) ≤ f(x_k + θ̃d_k) ≤ f(x_k) − (µ²/(8L))‖d‖². Using the Lipschitz condition at θ* with f′(x_k + θ*d_k; d_k) = 0, we get for θ ∈ ℝ,

f(x_k + θd_k) ≤ f(x_k + θ*d_k) + (L/2)(θ − θ*)²‖d‖².

Using the assumption,

f(x_k + θd_k) ≤ f(x_k) + ((L/2)(θ − θ*)² − µ²/(8L))‖d‖².

For θ = 0, this gives Lθ*² − µ²/(4L) ≥ 0, or equivalently θ*² ≥ θ̃², proving the first inequality. For θ = θ̄, f(x_k + θ̄d_k) = f(x_k), and we obtain immediately (θ̄ − θ*)² ≥ θ̃².

Assume now that f(x_k + θ̃d_k) > f(x_k) − (µ²/(8L))‖d‖². By the Lipschitz condition for f(·), we know that

f(x_k + θ̃d_k) ≤ f(x_k) + f′(x_k; d_k)θ̃ + (L/2)θ̃²‖d‖².

Assuming by contradiction that f′(x_k; d_k) < −(µ/2)‖d‖², we obtain

f(x_k + θ̃d_k) < f(x_k) − (µ/2)θ̃‖d‖² + (L/2)θ̃²‖d‖² = f(x_k) − (µ²/(4L))‖d‖² + (µ²/(8L))‖d‖² = f(x_k) − (µ²/(8L))‖d‖²,

contradicting the hypothesis and completing the proof.

In this case we may use a golden search, as follows:
If f(x_k + θ̃d_k) ≥ f(x_k) − (µ²/(8L))‖d‖², set θ_k = 0 and stop (condition (iv) in Lemma 3.1 holds).
If f(x_k + d_k) ≤ f(x_k), set θ_k = 1 and stop (condition (ii) in Lemma 3.1 holds).
Now we know that θ* ≥ θ̃ = µ/(2L) (< 1). Use an interval-reduction search as in the Appendix in the interval [θ̃, 1].
If we use a golden section search, then at each iteration j before the stopping condition is met, we have A ≤ θ* ≤ θ̄ ≤ B, and the interval length satisfies B − A ≤ β^j, with β = (√5 − 1)/2 ≈ 0.618. Using the lemma above, a point B ∈ [θ*, θ̄] will have been found when β^j ≤ µ/(2L). Hence the number of iterations of the search is bounded by

j_max = log(µ/(2L))/log β.

This provides a bound on the computational effort per iteration of O(log(2L/µ)) function evaluations.

4. Adaptive convexity parameter µ. In computational tests (see the next section) we see that the use of a strong convexity constant is very effective in reducing the number of iterations. But very seldom is such a constant available. In this section we state an algorithm which uses a decreasing sequence of estimates for the value of the constant µ. We assume that a strong convexity parameter µ_* ≥ 0 (possibly null) is given, and the method will generate a sequence µ_k → µ_*, starting with µ_0 ∈ [µ_*, γ_0). We still need an extra hypothesis: that the level set associated with x_0 is bounded. Note that since f(·) is convex, this is equivalent to the hypothesis that the optimal set is bounded.

The algorithm iterations are the same as in Algorithm 2.1, using a parameter µ_k ≥ µ_*. The parameter is reduced (we do this by setting µ_k = max{µ_*, µ_k/10}) in two situations:
When γ_k − µ_* < β(µ_k − µ_*), where β > 1 is fixed (we used β = 1.02), meaning that γ_k is too close to µ_k.
When it is impossible to satisfy the equation φ*_{k+1}(α) = f(x_{k+1}) for α ∈ [0, 1]. By Lemma 2.7, this can only happen when f(x_{k+1}) ≤ min_{x∈ℝⁿ} ℓ(x). This minimum is given by φ*_{k+1}(1) = f(y_k) − ‖∇f(y_k)‖²/(2µ_k), using expression (2.10). So µ_k cannot be larger than

µ̂ = ‖∇f(y_k)‖²/(2(f(y_k) − f(x_{k+1}))).

If µ_k ≥ µ̂, we reduce µ_k.

Notation: denote φ_{k+1}(x) = φ_{k+1}(α_k, x) and φ*_{k+1} = φ*_{k+1}(α_k).

Algorithm 4.1. Algorithm with adaptive convexity parameter.
Data: x_0 ∈ ℝⁿ, v_0 = x_0, γ_0 > 0, β > 1, µ_* = µ, µ_0 ∈ [µ_*, γ_0).
k = 0
Set µ₊ = µ_0.
repeat
    d_k = v_k − x_k.
    Choose θ_k ∈ [0, 1] as in Algorithm 2.1.
    y_k = x_k + θ_k d_k.
    If ∇f(y_k) = 0, then stop with y_k as an optimal solution.
    Steepest descent step: x_{k+1} = y_k − ν∇f(y_k). If L is known, ν = 1/L.
    If (γ_k − µ_*) < β(µ₊ − µ_*), then choose µ₊ ∈ [µ_*, γ_k/β].
    Compute µ̂ = ‖∇f(y_k)‖²/(2(f(y_k) − f(x_{k+1}))).
    If µ₊ ≥ µ̂, then µ₊ = max{µ_*, µ̂/10}.
    Compute α_k as the largest root of (2.19) with µ = µ₊.
  γ_{k+1} = (1 − α_k)γ_k + α_k µ_+.
  v_{k+1} = (1/γ_{k+1}) ((1 − α_k)γ_k v_k + α_k (µ_+ y_k − ∇f(y_k))).
  Set µ_k = µ_+. k = k + 1.

We suggest γ_0 = L if L is known, µ_0 = max{µ, γ_0/100} and β = 1.02. We also suggest the following reduction scheme for µ_+, provided that β ≤ 10: µ_+ = max{µ, γ_k/10}. Independently of this choice, the following fact holds:

Fact 4.2. For any k ≥ 0, (γ_k − µ) ≥ β(µ_k − µ).

Proof. If at the beginning of an iteration k we have (γ_k − µ) < β(µ_+ − µ), then we choose µ_+ ∈ [µ, γ_k/β] with β > 1. Then
β(µ_+ − µ) ≤ γ_k − βµ ≤ γ_k − µ.
As µ_k ≤ µ_+, we complete the proof.

We now show the optimality of the algorithm, using an additional hypothesis.

Hypothesis. Given x_0 ∈ IRⁿ, assume that the level set associated with x_0 is bounded, i.e.,
D = sup{‖x − y‖ : f(x) ≤ f(x_0), f(y) ≤ f(x_0)} < +∞.
Define Q = D²/2.

Lemma 4.3. Consider γ_0 > µ. Let x* be an optimal solution of (P). Then, at all iterations of Algorithm 4.1,
(4.1) φ_k(x*) − f(x*) ≤ ((γ_0 + L)/(γ_0 − µ)) Q (γ_k − µ).

Proof. First let us prove that (4.1) holds for k = 0. By the definition of φ_0 and the convexity of f with Lipschitz constant L for the gradient, we have
φ_0(x*) − f(x*) = f(x_0) − f(x*) + (γ_0/2)‖x* − x_0‖² ≤ ((L + γ_0)/2)‖x* − x_0‖² ≤ (L + γ_0)Q.
Thus, we can use induction. By (2.2) and (1.2), we have
φ_{k+1}(x) = (1 − α_k)φ_k(x) + α_k (f(y_k) + ∇f(y_k)ᵀ(x − y_k) + (µ_k/2)‖x − y_k‖²)
≤ (1 − α_k)φ_k(x) + α_k (f(x) + ((µ_k − µ)/2)‖x − y_k‖²).
Thus,
φ_{k+1}(x*) − f(x*) ≤ (1 − α_k)(φ_k(x*) − f(x*)) + α_k Q(µ_k − µ).
Using the induction hypothesis and the definition of γ_{k+1}, we have
φ_{k+1}(x*) − f(x*) ≤ (1 − α_k) ((γ_0 + L)/(γ_0 − µ)) Q (γ_k − µ) + α_k Q(µ_k − µ)
≤ ((γ_0 + L)/(γ_0 − µ)) Q ((1 − α_k)γ_k + α_k µ_k − µ)
= ((γ_0 + L)/(γ_0 − µ)) Q (γ_{k+1} − µ),
completing the proof.

Now we prove a technical lemma, following a very similar result in Nesterov [5, Lemma 2.2.4].

Lemma 4.4. Consider a positive sequence {λ_k}. Assume that there exists M > 0 such that λ_{k+1} ≤ (1 − M√λ_{k+1}) λ_k. Then, for all k > 0,
λ_k ≤ (4/M²)(1/k²).

Proof. Denote a_k = 1/√λ_k. Since {λ_k} is a decreasing sequence, we have
a_{k+1} − a_k = (√λ_k − √λ_{k+1})/(√λ_{k+1} √λ_k) = (λ_k − λ_{k+1})/(√λ_{k+1} √λ_k (√λ_k + √λ_{k+1})) ≥ (λ_k − λ_{k+1})/(2 λ_k √λ_{k+1}).
Using the hypothesis,
a_{k+1} − a_k ≥ (M √λ_{k+1} λ_k)/(2 λ_k √λ_{k+1}) = M/2.
Thus,
a_k ≥ a_0 + Mk/2 ≥ Mk/2,
and consequently λ_k ≤ 4/(M²k²), completing the proof.

Now we can discuss how fast (γ_k − µ) goes to zero, for estimating the rate of convergence of the algorithm.

Lemma 4.5. Consider γ_0 > µ. Then, for all k > 0,
(4.2) γ_k − µ ≤ min{ (1 − ((β − 1)/β) √(µ/(2L)))^k (γ_0 − µ), (8β²L)/((β − 1)² k²) }.

Proof. By the definition of γ_{k+1},
γ_{k+1} − µ = (1 − α_k)(γ_k − µ) + α_k(µ_k − µ).
By Fact 4.2, (γ_k − µ) ≥ β(µ_k − µ). Thus,
(4.3) γ_{k+1} − µ ≤ (1 − α_k)(γ_k − µ) + (α_k/β)(γ_k − µ) = (1 − ((β − 1)/β) α_k)(γ_k − µ).
By Theorem 2.10,
(4.4) α_k ≥ √(γ_{k+1}/(2L)) ≥ √((γ_{k+1} − µ)/(2L)).
Thus,
γ_{k+1} − µ ≤ (1 − ((β − 1)/β) √(µ/(2L))) (γ_k − µ) ≤ (1 − ((β − 1)/β) √(µ/(2L)))^{k+1} (γ_0 − µ),
proving the first inequality in (4.2). On the other hand, from (4.3) and (4.4), we have
γ_{k+1} − µ ≤ (1 − ((β − 1)/β) √((γ_{k+1} − µ)/(2L))) (γ_k − µ).
Applying Lemma 4.4 with λ_k = γ_k − µ, for k ≥ 0, and M = (β − 1)/(β√(2L)), we complete the proof.

The next theorem gives the complexity bound of the algorithm.

Theorem 4.6. Consider γ_0 > µ_0. Then Algorithm 4.1 generates a sequence {x_k} such that, for all k > 0,
f(x_k) − f(x*) ≤ min{ (1 − ((β − 1)/β) √(µ/(2L)))^k (γ_0 − µ), (8β²L)/((β − 1)² k²) } · Q(L + γ_0)/(γ_0 − µ).

Proof. By construction, f(x_k) ≤ φ_k(x) for all x ∈ IRⁿ, in particular for x*. Using this and Lemma 4.3, we have
f(x_k) − f(x*) ≤ ((L + γ_0)/(γ_0 − µ)) Q (γ_k − µ).
Applying Lemma 4.5, we complete the proof.

This theorem ensures that an error of ε > 0 in the final objective function value is achieved in O(1/√ε) iterations.

Global convergence. In our algorithms, each iteration produces f(x_{k+1}) ≤ f(y_k) ≤ f(x_k). Hence, if we consider the sequence x_1, y_1, x_2, y_2, ..., we have a descent algorithm, and every other step is a steepest descent step with an Armijo step length. The global convergence of such a method is a standard result.

5. Numerical results. In this section, we report the results of a limited set of computational experiments, comparing the variants of Algorithm 2.1 and Algorithm 4.1. The codes are written in Matlab, and the stopping criteria are based on the norm of the objective function gradient and on the function values, whenever the optimal value is available. In all the algorithms, the line searches are of the Armijo type, and the same routine is used for all. We did not work on the efficiency of the line searches, and so it does not make much sense to compare methods based on the number of function evaluations, with one exception: the original Nesterov algorithm never computes function values, which will be commented on below. Hence our comparisons will be based
on the number of iterations, with one or two gradient computations per iteration and one or two line searches per iteration (see below). Here are the algorithms to be tested, with their acronyms:

[Nesterov1] Algorithm 2.1 with θ_k = θ_N, α_k = α_N.
[α_G, θ_G] Algorithm 2.1 with our choice of parameters.
[Adaptive] Algorithm 4.1 with our choice of parameters.
[Nesterov2] Nesterov's algorithm with adaptive estimates for L, from reference [7].
[Cauchy] Steepest descent algorithm.
[F-R] Fletcher-Reeves, as presented in [10, p. 120].

We shall see that in most examples [Adaptive] has the best performance among the optimal methods cited. When there are good estimates for the constants L and µ, [Nesterov1] may be the best. At the end of the section we compare our method with the Fletcher-Reeves conjugate gradient method, and conclude that in many cases this last method has the best performance.

Now we describe our tests for three different convex functions:
i) Quadratic function with µ = 1 and L = 1000.
ii) Nesterov's worst function in the world.
iii) A convex function including exponentials.
In i) and iii), the Hessian eigenvalues at the unique minimizer are randomly distributed between µ = 1 and L = 1000.

i) Quadratic functions. Given n and L > µ > 0 (in these tests, L = 1000 and µ = 1), the quadratic test functions are given by
x ∈ IRⁿ ↦ f(x) = (1/2) xᵀQx,
where Q is a diagonal matrix with random entries in the interval [1, L]. For each experiment, we generate Q and a random initial point x_0 with ‖x_0‖ = 1. We generated several instances of problems with n assuming the values 10, 100, 200, 500, 1000, 2500, which were solved by all methods.

Figure 5.1 shows the performance of the methods assuming ignorance of µ and L. The algorithms were fed with the data µ = 0, L = 10000. On the left, we show the function values for a typical example in dimension 1000, and on the right the performance profiles for all methods on the whole set of problems.
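For concreteness, the quadratic test setup described above can be sketched as follows (in Python with NumPy rather than the authors' Matlab; the fixed-step gradient iteration shown is only a [Cauchy]-style baseline, not one of the accelerated variants being compared):

```python
import numpy as np

def make_quadratic(n, mu=1.0, L=1000.0, seed=0):
    """Build f(x) = 0.5 x'Qx with Q diagonal and random entries in [mu, L]."""
    rng = np.random.default_rng(seed)
    q = rng.uniform(mu, L, size=n)        # diagonal of Q (Hessian eigenvalues)
    x0 = rng.standard_normal(n)
    x0 /= np.linalg.norm(x0)              # random initial point with ||x0|| = 1
    f = lambda x: 0.5 * float(x @ (q * x))
    grad = lambda x: q * x
    return f, grad, x0

# Plain gradient descent with the safe step 1/L: monotone decrease is guaranteed.
f, grad, x = make_quadratic(200)
L = 1000.0
values = [f(x)]
for _ in range(300):
    x = x - grad(x) / L
    values.append(f(x))
```

Each step contracts coordinate i by the factor (1 − q_i/L), so progress is slow exactly when the spectrum spreads over [1, 1000], which is the regime the performance profiles measure.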
In all cases [Adaptive] used the smallest number of iterations, followed by [Nesterov2] and [α_G, θ_G]. [Nesterov1] always needed over 10 times the number of iterations of [Adaptive].

Figure 5.2 shows the results for the same tests, but using the knowledge of the values µ = 1 and L = 1000. Now [Nesterov1] performs very well, and even if its number of iterations is a little larger, each iteration is faster because only one line search is needed. Now [Nesterov2] is slower because it does not take advantage of the knowledge of µ.

ii) Nesterov's worst function of the world. This function is defined and explained in Nesterov [5, p. 67]. Given L > µ > 0 (we took µ = 1, L = 1000), set Q = L/µ. The function is

(5.1) f(x) = (µ(Q − 1)/8) (x_1² + Σ_{i=1}^{n−1} (x_i − x_{i+1})² − 2x_1) + (µ/2)‖x‖².
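The worst function can be coded directly from its definition in Nesterov [5, p. 67]; the sketch below is our own transcription in Python (in particular, the −2x₁ linear term and the tridiagonal difference structure are taken from the book, not from the authors' test code), and it checks the gradient by central differences, which are exact for quadratics up to rounding:

```python
import numpy as np

def worst_function(n, mu=1.0, L=1000.0):
    """Finite-dimensional version of Nesterov's worst function, with Q = L/mu."""
    Q = L / mu
    c = mu * (Q - 1.0) / 8.0

    def f(x):
        d = x[:-1] - x[1:]                      # differences x_i - x_{i+1}
        return c * (x[0]**2 + np.sum(d**2) - 2.0 * x[0]) + 0.5 * mu * np.sum(x**2)

    def grad(x):
        d = x[:-1] - x[1:]
        g = np.zeros_like(x)
        g[0] = 2.0 * x[0] - 2.0                 # from x_1^2 - 2 x_1
        g[:-1] += 2.0 * d                       # each difference hits index i ...
        g[1:] -= 2.0 * d                        # ... and index i+1 with opposite sign
        return c * g + mu * x

    return f, grad

f, grad = worst_function(50)
rng = np.random.default_rng(1)
x, h = rng.standard_normal(50), rng.standard_normal(50)
eps = 1e-6
fd = (f(x + eps * h) - f(x - eps * h)) / (2.0 * eps)   # directional derivative
err = abs(fd - grad(x) @ h)
```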
Fig. 5.1. Quadratic functions with µ and L unknown.

Fig. 5.2. Quadratic functions with µ and L given.

Generating a set of problems for the same values of n as before and several starting points, Figure 5.3 shows the performance profiles for all problems in two situations: on the left, considering µ and L unknown (µ = 0 and L = 10000 fed to the algorithms); on the right, for µ = 1 and L = 1000 given to the algorithms. We see that the behavior is similar to case i), with one detail: [Nesterov1] was worse than [Cauchy] for unknown µ and L. This is certainly due to the choice L = 10000, which affects the computation of θ_N in the algorithm.

Fig. 5.3. Performance profile for the function defined by (5.1).

iii) Function with exponentials. This function is convex and includes exponentials. The Hessian eigenvalues at the
optimizer are randomly distributed between 1 and 1000:

(5.2) f(x) = (1/2)‖x‖² + Σ_{i=1}^n q_i (e^{x_i} − x_i − 1),

where q_i ∈ [1, 999] for i = 1, ..., n. The minimizer is x* = 0, with f(0) = 0. We use the same values of n as in the former examples. For this function the Hessian matrix H(x) is diagonal with H_ii(x) = q_i e^{x_i} + 1, and hence there is no global Lipschitz constant for the gradient. Figure 5.4 shows a typical run with n = 1000 and the performance profile for µ = 0, L = 10000 fed to the algorithms.

Fig. 5.4. Objective function value and performance profile for a function with exponentials.

Again, [Adaptive] has the best performance, and [Nesterov2] takes about 5 times the number of iterations (computing at least two gradients per iteration). Here again [Nesterov1] is slower than [Cauchy].

Comparison with the Fletcher-Reeves algorithm. As we saw in the Introduction, conjugate gradient algorithms are the most efficient first order methods for quadratic functions. They are also known to perform very well for a strictly convex function in a region where the second derivatives are near their values at an optimal solution (possibly using restarts and precise line searches). There exists a vast literature on conjugate gradient methods, and we do not intend to review it here. We shall construct an example in which the second derivatives are ill-behaved near the optimal solution, and observe that our method performs better than Fletcher-Reeves. For a drastic but not so simple counterexample, see the reference [4] discussed in the Introduction.

The separable function. We start by defining a function g : IR → IR as follows: partition the set [0, +∞) into p intervals defined by the points 0, 2^{−p+2}, 2^{−p+3}, ..., 2^0, +∞, and associate with each interval a positive value chosen at random. Now integrate twice the resulting function to obtain a strictly convex piecewise quadratic function.
Given τ_j ∈ [1, L], j = 1, ..., n, with L > 0, our separable function is constructed as
x ∈ IRⁿ ↦ f(x) = Σ_{j=1}^n τ_j g(x_j).
This function has a global minimum at the origin, the Hessian is always positive diagonal, the derivatives are L-Lipschitz continuous, but the Hessian H(x) differs very much from H(0) at any point x with ‖x‖ > 2^{−p+2}.
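The construction above can be sketched as follows (Python; the random range chosen for the piecewise-constant second derivative and the even extension g(|t|), which places the minimizer at the origin, are our assumptions — the paper only asks for positive random values on the dyadic intervals):

```python
import numpy as np

def make_g(p, seed=0):
    """Convex piecewise-quadratic g: pick a random positive second derivative on
    each interval [0, 2^(2-p)], ..., [1/2, 1], [1, +inf), integrate twice
    (g(0) = g'(0) = 0), and evaluate at |t|."""
    rng = np.random.default_rng(seed)
    bp = np.concatenate(([0.0], 2.0 ** np.arange(2 - p, 1)))  # 0, 2^(2-p), ..., 1
    h = rng.uniform(0.5, 2.0, size=len(bp))                   # g'' on each interval
    gp = np.zeros(len(bp))                                    # g' at breakpoints
    gv = np.zeros(len(bp))                                    # g  at breakpoints
    for i in range(1, len(bp)):
        dt = bp[i] - bp[i - 1]
        gv[i] = gv[i - 1] + gp[i - 1] * dt + 0.5 * h[i - 1] * dt**2
        gp[i] = gp[i - 1] + h[i - 1] * dt

    def g(t):
        t = abs(t)
        i = int(np.searchsorted(bp, t, side="right")) - 1     # interval containing t
        dt = t - bp[i]
        return gv[i] + gp[i] * dt + 0.5 * h[i] * dt**2

    return g

def separable_f(tau, g):
    """f(x) = sum_j tau_j g(x_j)."""
    return lambda x: sum(t * g(xj) for t, xj in zip(tau, x))

g = make_g(p=10)
f = separable_f(np.linspace(1.0, 1000.0, 5), g)
```

Because g'' is constant on the first interval [0, 2^{2−p}], the Hessian of f is constant in a tiny box around the origin but changes abruptly outside it, which is what defeats methods tuned to the local quadratic model.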
Figure 5.5 shows a typical behavior of our algorithm and [F-R] for problems with dimension n = 200 and n = 10000. The algorithm [F-R] was implemented with restarts at every n iterations, and also when the search direction becomes almost orthogonal to the gradient. The line search used was either golden section or Armijo. In practically all runs with dimensions from 100 to 10000, [F-R] needed more iterations than [Adaptive].

Fig. 5.5. Separable function values in IR^200 and IR^10000.

Appendix: interval reduction search. Let g : [0, 1] → IR be a convex differentiable function. Assume that g(0) = 0, g′(0) < 0 and g′(1) > 0. We shall study the problem of finding θ ∈ [0, 1] such that g(θ) ≤ 0 and g′(θ) ≥ 0. This can be seen as a line search problem with an Armijo constant 0 and a Wolfe condition with constant 0, for which there are algorithms in the literature (see for instance [3, 10]). In the present case, in which the function is convex, we can use a simple interval reduction algorithm, based on the following step: assume that three points 0 ≤ A < ν < B ≤ 1 are given, satisfying g(A) ≥ g(ν) ≤ g(B). Note that, as consequences of the convexity of g, the interval [A, B] contains a minimizer of g and g′(B) ≥ 0. The problem can be solved by the following interval reduction scheme:

Algorithm 5.1. Interval reduction algorithm.
while g(B) > 0,
  Choose ξ ∈ (A, B), ξ ≠ ν. Set u = min{ν, ξ}, v = max{ν, ξ}.
  If g(u) ≤ g(v), set B = v, ν = u; else set A = u, ν = v.

The initial value of ν and the values of ξ in each iteration may be chosen so that u and v define the golden section of the interval [A, B], and then the interval length will be reduced by a factor of (√5 − 1)/2 in each iteration. We can also choose ξ as the minimizer of the quadratic function interpolating g(A), g(ν), g(B). In this case one must avoid the case ξ = ν, in which ξ must be perturbed.

Acknowledgements. The authors would like to thank Prof. Masakazu Kojima and Prof. Mituhiro Fukuda for useful discussions about the subject.
We also thank the anonymous referees and Prof. Paulo J. S. Silva for their valuable comments and suggestions, which very much improved this paper. This research was partially completed while the second author was visiting the Tokyo Institute of Technology under a project supported by MEXT's program Promotion of Environmental Improvement
for Independence of Young Researchers under the Special Coordination Funds for Promoting Science and Technology.

REFERENCES

[1] D. P. Bertsekas, A. Nedić, and A. E. Ozdaglar. Convex Analysis and Optimization. Athena Scientific, Belmont, USA, 2003.
[2] G. Lan, Z. Lu, and R. D. C. Monteiro. Primal-dual first-order methods with O(1/ε) iteration-complexity for cone programming. Technical report, School of ISyE, Georgia Tech, USA. Accepted in Mathematical Programming.
[3] J. J. Moré and D. J. Thuente. Line search algorithms with guaranteed sufficient decrease. Technical Report MCS-P330-1092, Mathematics and Computer Science Division, Argonne National Laboratory, 1992.
[4] A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. John Wiley, New York, 1983.
[5] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Boston, 2004.
[6] Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127–152, 2005.
[7] Y. Nesterov. Gradient methods for minimizing composite objective function. Discussion paper 2007/76, CORE, UCL, Belgium, 2007.
[8] Y. Nesterov. Smoothing technique and its applications in semidefinite optimization. Mathematical Programming, 110(2):245–259, 2007.
[9] Y. Nesterov and B. T. Polyak. Cubic regularization of Newton method and its global performance. Mathematical Programming, 108(1):177–205, 2006.
[10] J. Nocedal and S. J. Wright. Numerical Optimization. Springer Series in Operations Research. Springer-Verlag, New York, 1999.
[11] B. T. Polyak. Introduction to Optimization. Optimization Software Inc., New York, 1987.
[12] P. Richtárik. Improved algorithms for convex minimization in relative scale. SIAM Journal on Optimization, 21(3):1073–1092, 2011.
[13] N. Shor. Minimization Methods for Non-Differentiable Functions. Springer-Verlag, Berlin, 1985.
More informationResearch Note. A New Infeasible Interior-Point Algorithm with Full Nesterov-Todd Step for Semi-Definite Optimization
Iranian Journal of Operations Research Vol. 4, No. 1, 2013, pp. 88-107 Research Note A New Infeasible Interior-Point Algorithm with Full Nesterov-Todd Step for Semi-Definite Optimization B. Kheirfam We
More informationA CHARACTERIZATION OF STRICT LOCAL MINIMIZERS OF ORDER ONE FOR STATIC MINMAX PROBLEMS IN THE PARAMETRIC CONSTRAINT CASE
Journal of Applied Analysis Vol. 6, No. 1 (2000), pp. 139 148 A CHARACTERIZATION OF STRICT LOCAL MINIMIZERS OF ORDER ONE FOR STATIC MINMAX PROBLEMS IN THE PARAMETRIC CONSTRAINT CASE A. W. A. TAHA Received
More informationChapter 2 Convex Analysis
Chapter 2 Convex Analysis The theory of nonsmooth analysis is based on convex analysis. Thus, we start this chapter by giving basic concepts and results of convexity (for further readings see also [202,
More informationMS&E 318 (CME 338) Large-Scale Numerical Optimization
Stanford University, Management Science & Engineering (and ICME) MS&E 318 (CME 338) Large-Scale Numerical Optimization 1 Origins Instructor: Michael Saunders Spring 2015 Notes 9: Augmented Lagrangian Methods
More informationAccelerating Nesterov s Method for Strongly Convex Functions
Accelerating Nesterov s Method for Strongly Convex Functions Hao Chen Xiangrui Meng MATH301, 2011 Outline The Gap 1 The Gap 2 3 Outline The Gap 1 The Gap 2 3 Our talk begins with a tiny gap For any x 0
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning First-Order Methods, L1-Regularization, Coordinate Descent Winter 2016 Some images from this lecture are taken from Google Image Search. Admin Room: We ll count final numbers
More informationConvex Optimization. Problem set 2. Due Monday April 26th
Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining
More informationConjugate Gradient (CG) Method
Conjugate Gradient (CG) Method by K. Ozawa 1 Introduction In the series of this lecture, I will introduce the conjugate gradient method, which solves efficiently large scale sparse linear simultaneous
More informationDO NOT OPEN THIS QUESTION BOOKLET UNTIL YOU ARE TOLD TO DO SO
QUESTION BOOKLET EECS 227A Fall 2009 Midterm Tuesday, Ocotober 20, 11:10-12:30pm DO NOT OPEN THIS QUESTION BOOKLET UNTIL YOU ARE TOLD TO DO SO You have 80 minutes to complete the midterm. The midterm consists
More informationMA/OR/ST 706: Nonlinear Programming Midterm Exam Instructor: Dr. Kartik Sivaramakrishnan INSTRUCTIONS
MA/OR/ST 706: Nonlinear Programming Midterm Exam Instructor: Dr. Kartik Sivaramakrishnan INSTRUCTIONS 1. Please write your name and student number clearly on the front page of the exam. 2. The exam is
More informationAM 205: lecture 18. Last time: optimization methods Today: conditions for optimality
AM 205: lecture 18 Last time: optimization methods Today: conditions for optimality Existence of Global Minimum For example: f (x, y) = x 2 + y 2 is coercive on R 2 (global min. at (0, 0)) f (x) = x 3
More informationNumerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09
Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods
More informationNewton s Method. Ryan Tibshirani Convex Optimization /36-725
Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x
More informationPart 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL)
Part 4: Active-set methods for linearly constrained optimization Nick Gould RAL fx subject to Ax b Part C course on continuoue optimization LINEARLY CONSTRAINED MINIMIZATION fx subject to Ax { } b where
More informationAM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods
AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality
More information1 Numerical optimization
Contents Numerical optimization 5. Optimization of single-variable functions.............................. 5.. Golden Section Search..................................... 6.. Fibonacci Search........................................
More informationLecture 10: September 26
0-725: Optimization Fall 202 Lecture 0: September 26 Lecturer: Barnabas Poczos/Ryan Tibshirani Scribes: Yipei Wang, Zhiguang Huo Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These
More informationLecture 2: Convex Sets and Functions
Lecture 2: Convex Sets and Functions Hyang-Won Lee Dept. of Internet & Multimedia Eng. Konkuk University Lecture 2 Network Optimization, Fall 2015 1 / 22 Optimization Problems Optimization problems are
More informationPart 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)
Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective
More informationConstrained Optimization
1 / 22 Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 30, 2015 2 / 22 1. Equality constraints only 1.1 Reduced gradient 1.2 Lagrange
More informationA PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES
IJMMS 25:6 2001) 397 409 PII. S0161171201002290 http://ijmms.hindawi.com Hindawi Publishing Corp. A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES
More informationKantorovich s Majorants Principle for Newton s Method
Kantorovich s Majorants Principle for Newton s Method O. P. Ferreira B. F. Svaiter January 17, 2006 Abstract We prove Kantorovich s theorem on Newton s method using a convergence analysis which makes clear,
More informationDesign and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016
Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall 206 2 Nov 2 Dec 206 Let D be a convex subset of R n. A function f : D R is convex if it satisfies f(tx + ( t)y) tf(x)
More informationOptimization Tutorial 1. Basic Gradient Descent
E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.
More informationOn the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean
On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean Renato D.C. Monteiro B. F. Svaiter March 17, 2009 Abstract In this paper we analyze the iteration-complexity
More informationMaster 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique
Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some
More informationAn accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems
An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems O. Kolossoski R. D. C. Monteiro September 18, 2015 (Revised: September 28, 2016) Abstract
More informationAcceleration Method for Convex Optimization over the Fixed Point Set of a Nonexpansive Mapping
Noname manuscript No. will be inserted by the editor) Acceleration Method for Convex Optimization over the Fixed Point Set of a Nonexpansive Mapping Hideaki Iiduka Received: date / Accepted: date Abstract
More informationGLOBAL CONVERGENCE OF CONJUGATE GRADIENT METHODS WITHOUT LINE SEARCH
GLOBAL CONVERGENCE OF CONJUGATE GRADIENT METHODS WITHOUT LINE SEARCH Jie Sun 1 Department of Decision Sciences National University of Singapore, Republic of Singapore Jiapu Zhang 2 Department of Mathematics
More informationOn the convergence properties of the modified Polak Ribiére Polyak method with the standard Armijo line search
ANZIAM J. 55 (E) pp.e79 E89, 2014 E79 On the convergence properties of the modified Polak Ribiére Polyak method with the standard Armijo line search Lijun Li 1 Weijun Zhou 2 (Received 21 May 2013; revised
More informationFALL 2018 MATH 4211/6211 Optimization Homework 4
FALL 2018 MATH 4211/6211 Optimization Homework 4 This homework assignment is open to textbook, reference books, slides, and online resources, excluding any direct solution to the problem (such as solution
More informationIterative Methods for Solving A x = b
Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http
More information