Iteration-complexity of first-order penalty methods for convex programming
Guanghui Lan    Renato D. C. Monteiro

July 24, 2008

Abstract

This paper considers a special but broad class of convex programming (CP) problems whose feasible region is a simple compact convex set intersected with the inverse image of a closed convex cone under an affine transformation. We study two first-order penalty methods for solving the above class of problems, namely: the quadratic penalty method and the exact penalty method. In addition to one or two gradient evaluations, an iteration of these methods requires one or two projections onto the simple convex set. We establish the iteration-complexity bounds for these methods to obtain two types of near optimal solutions, namely: near primal and near primal-dual optimal solutions. Finally, we present variants, with possibly better iteration-complexity bounds than the aforementioned methods, which consist of applying penalty-based methods to the perturbed problem obtained by adding a suitable perturbation term to the objective function of the original CP problem.

Keywords: Convex programming, quadratic penalty method, exact penalty method, Lagrange multiplier

AMS 2000 subject classification: 90C25, 90C06, 90C22, 49M37

1 Introduction

The basic problem of interest in this paper is the convex programming (CP) problem

    f* := inf{ f(x) : A(x) ∈ K*, x ∈ X },    (1)

where f : X → IR is a convex function with Lipschitz continuous gradient, X ⊆ R^n is a sufficiently simple closed convex set, A : R^n → R^m is an affine function, and K* denotes the dual cone of a closed convex cone K ⊆ R^m, i.e., K* := { s ∈ R^m : ⟨s, x⟩ ≥ 0, ∀x ∈ K }.
For the case where the feasible region consists only of the set X, or equivalently A ≡ 0, Nesterov ([2, 4]) developed a method which finds a point x ∈ X such that f(x) − f* ≤ ɛ in at most O(ɛ^{−1/2}) iterations. Moreover, each iteration of his method requires one gradient evaluation of f and the computation of two projections onto X. It is shown that his method achieves, uniformly in the dimension, the lower bound on the number of iterations for minimizing convex functions with Lipschitz continuous gradient over a closed convex set.

(The work of both authors was partially supported by NSF Grants CCF and CCF and ONR Grant N. Both authors: School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA; glan@isye.gatech.edu, monteiro@isye.gatech.edu.)

When A is not identically 0, Nesterov's optimal method can still be applied directly to problem (1), but this approach would require the computation of projections onto the feasible region X ∩ {x : A(x) ∈ K*}, which for most practical problems is as expensive as solving the original problem itself. In this paper, we are interested in alternative first-order methods for (1) whose iterations consist of, in addition to a couple of gradient evaluations, only projections onto the simple set X. More specifically, we present two penalty-based approaches for solving (1), namely: the quadratic and the exact penalty methods, and establish their iteration-complexity bounds for computing two types of near optimal solutions of (1). It is well-known that the penalty parameters of the penalization problems for the above penalty-based approaches must be chosen sufficiently large so that near optimal solutions of the penalization problems yield near optimal solutions for the original problem (1). Accordingly, we establish theoretical lower bounds on the penalty parameters that depend not only on the desired solution accuracies but also on the size ‖λ*‖ of the minimum norm Lagrange multiplier associated with the constraint A(x) ∈ K*. Theoretically, setting the penalty parameters to their corresponding lower bounds would yield the lowest provable iteration-complexity bounds, and hence would be the best strategy to pursue.
But since ‖λ*‖ is not known a priori, we present alternative penalty-based approaches based on simple guess-and-check procedures for these penalty parameters whose iteration-complexity bounds are of the same order as the approach in which ‖λ*‖ is known a priori, or equivalently, the theoretical lower bound on the penalty parameter is available. Finally, we present variants of the aforementioned methods, with possibly better iteration-complexity bounds, which consist of applying penalty-based methods to the problem obtained by adding a suitable perturbation term to the objective function of the original CP problem.

The paper is organized as follows. In Section 2, we give a more detailed description of problem (1) and definitions of two types of approximate solutions of (1). Section 3 discusses some technical results that will be used in our analysis and reviews two first-order algorithms due to Nesterov [2, 4] for solving special classes of CP problems. In Section 4, we establish the iteration-complexity of quadratic penalty based methods for solving (1). More specifically, we present the iteration-complexity of the quadratic penalty method applied to (1) in Subsection 4.1, and of a variant, which consists of applying the quadratic penalty approach to the perturbed problem obtained by adding a suitable perturbation term to the objective function of (1), in Subsection 4.2. In Section 5, we establish the iteration-complexity of exact penalty based methods for solving (1). More specifically, we present the iteration-complexity of the exact penalty method applied directly to (1) in Subsection 5.1, and of a variant, which consists of applying the exact penalty method to the perturbed problem corresponding to (1), in Subsection 5.2.

1.1 Notation and terminology

We denote the p-dimensional Euclidean space by IR^p. Also, IR^p_+ and IR^p_++ denote the nonnegative and the positive orthants of IR^p, respectively.
In this paper, we use the notation R^p to denote a p-dimensional vector space endowed with an inner product ⟨·,·⟩. The dual norm ‖·‖* associated with an arbitrary norm ‖·‖ on R^p is defined as

    ‖s‖* := max_x { ⟨s, x⟩ : ‖x‖ ≤ 1 },  ∀s ∈ R^p.
Note that if the norm ‖·‖ is the one induced by the inner product, i.e. ‖·‖ = ⟨·,·⟩^{1/2}, then ‖·‖* = ‖·‖. Given a closed convex set C ⊆ R^p, we define the distance function d_C : R^p → IR to C with respect to ‖·‖ as d_C(u) := min{ ‖u − c‖ : c ∈ C } for every u ∈ R^p. It is well-known that this minimum is always achieved at some c ∈ C. Moreover, this minimizer is unique whenever ‖·‖ is an inner product norm. In such a case, we denote this unique minimizer by Π_C(u), i.e., Π_C(u) = argmin{ ‖u − c‖ : c ∈ C } for every u ∈ R^p. The support function of a set C ⊆ R^p is defined as σ_C(u) := sup{ ⟨u, c⟩ : c ∈ C }.

2 Problem of interest

Consider inner product spaces R^n and R^m endowed with arbitrary norms, both denoted simply by ‖·‖. In this paper, we consider the CP problem (1), where f : X → IR is a convex function with L_f-Lipschitz-continuous gradient, i.e.:

    ‖∇f(x̃) − ∇f(x)‖* ≤ L_f ‖x̃ − x‖, ∀x̃, x ∈ X.    (2)

We define the norm of the map A as being the operator norm of its linear part A_0 := A(·) − A(0), i.e.:

    ‖A‖ := ‖A_0‖ = max{ ‖A_0(x)‖ : ‖x‖ ≤ 1 } = max{ ‖A(x) − A(0)‖ : ‖x‖ ≤ 1 }.

The Lagrangian dual function and the value function associated with (1) are defined as

    d(λ) := inf{ f(x) + ⟨λ, A(x)⟩ : x ∈ X }, ∀λ ∈ −K,    (3)
    v(u) := inf{ f(x) : A(x) + u ∈ K*, x ∈ X }, ∀u ∈ R^m.    (4)

We make the following assumptions throughout the paper:

Assumption 1
A.1) the set X is bounded (and hence f* ∈ IR);
A.2) there exists a Lagrange multiplier for (1), i.e., a vector λ* ∈ −K such that f* = d(λ*).

Throughout this paper, we assume that the norm in R^m is induced by the inner product. First note that x* is an optimal solution of (1) if, and only if, x* ∈ X, A(x*) ∈ K* and f(x*) ≤ f*. This observation leads us to our first definition of a near optimal solution x̄ ∈ X of (1), which essentially requires the primal infeasibility measure d_{K*}(A(x̄)) and the primal optimality gap [f(x̄) − f*]_+ to be both small.

Definition 1 For a given pair (ɛ_p, ɛ_o) ∈ IR_++ × IR_++, x̄ ∈ X is called an (ɛ_p, ɛ_o)-primal solution for (1) if

    d_{K*}(A(x̄)) ≤ ɛ_p and f(x̄) − f* ≤ ɛ_o.    (5)

One drawback of the above notion of near optimality of x̄ is that it says nothing about the size of [f(x̄) − f*]_−.
We will now show that this quantity can be bounded as [f(x̄) − f*]_− ≤ ɛ_p ‖λ̄‖, where λ̄ is an arbitrary Lagrange multiplier for (1). We start by stating the following result, whose proof is given in the Appendix.

Proposition 1 Let K ⊆ R^m be a closed convex cone. Then, the following statements hold:

a) d_{K*} = σ_C, where C := (−K) ∩ B(0, 1) and B(0, 1) := { u ∈ R^m : ‖u‖ ≤ 1 };
b) for every u ∈ R^m and λ ∈ K, we have ⟨u, λ⟩ ≥ −‖λ‖ d_{K*}(u).

As a consequence of the above result, we obtain the following technical inequality, which will be frequently used in our analysis.

Corollary 2 Let λ̄ be a Lagrange multiplier for (1). Then, for every x ∈ X, we have

    f(x) − f* ≥ −‖λ̄‖ d_{K*}(A(x)).

Proof. It is well-known that our assumptions imply that v is a convex function such that λ̄ ∈ ∂v(0), and hence that

    v(u) − v(0) ≥ ⟨λ̄, u⟩ = ⟨−λ̄, −u⟩ ≥ −‖λ̄‖ d_{K*}(−u), ∀u ∈ R^m,

where the last inequality follows from Proposition 1(b) applied to −u and the fact that −λ̄ ∈ K. Now, let x ∈ X be given. Since x is clearly feasible for problem (4) with u = −A(x), the definition of v(·) in (4) implies that v(u) ≤ f(x). Hence,

    f(x) − f* ≥ v(u) − v(0) ≥ −‖λ̄‖ d_{K*}(−u) = −‖λ̄‖ d_{K*}(A(x)).

Corollary 3 If x̄ ∈ X is an ɛ_p-feasible solution, i.e., if d_{K*}(A(x̄)) ≤ ɛ_p, then f(x̄) − f* ≥ −ɛ_p ‖λ̄‖, where λ̄ is an arbitrary Lagrange multiplier for (1).

Another way of defining a near optimal solution of (1) is based on the following optimality conditions: x̄ ∈ X is an optimal solution of (1) and λ̄ is a Lagrange multiplier for (1) if, and only if, (x̄, λ̄) = (x*, λ*) satisfies

    A(x̄) ∈ K*, ⟨λ̄, A(x̄)⟩ = 0, −[∇f(x̄) + (A_0)*(λ̄)] ∈ N_X(x̄),    (6)

where N_X(x̄) := { s ∈ R^n : ⟨s, x − x̄⟩ ≤ 0, ∀x ∈ X } denotes the normal cone of X at x̄. Based on this observation, we can introduce our second definition of a near optimal solution of (1).

Definition 2 For a given pair (ɛ_p, ɛ_d) ∈ IR_++ × IR_++, (x̄, λ̄) ∈ X × (−K) is called an (ɛ_p, ɛ_d)-primal-dual solution of (1) if

    d_{K*}(A(x̄)) ≤ ɛ_p, ⟨λ̄, Π_{K*}(A(x̄))⟩ = 0,    (7)
    −[∇f(x̄) + (A_0)*(λ̄)] ∈ N_X(x̄) + B(ɛ_d),    (8)

where B(η) := { x ∈ R^n : ‖x‖ ≤ η } for every η ≥ 0.

In Sections 4 and 5, we will study the iteration-complexities of well-known penalty methods, namely the quadratic and the exact penalty methods, for computing the two types of near optimal solutions defined in this section.
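As a concrete illustration of Definition 1, the two near-optimality measures can be evaluated numerically. The sketch below uses an assumption chosen purely for illustration: K* is the nonnegative orthant (which is self-dual), so the projection onto K* is the componentwise positive part; the toy instance and the names `dist_to_cone`, `is_eps_primal` are not from the paper.

```python
import numpy as np

def dist_to_cone(u):
    # Distance d_{K*}(u) for K* = R^m_+ (self-dual nonnegative orthant):
    # the projection onto K* is the componentwise positive part.
    return float(np.linalg.norm(u - np.maximum(u, 0.0)))

def is_eps_primal(x, f, A, f_star, eps_p, eps_o):
    # Check the two conditions of Definition 1:
    #   d_{K*}(A(x)) <= eps_p  and  f(x) - f* <= eps_o.
    return dist_to_cone(A(x)) <= eps_p and f(x) - f_star <= eps_o

# Toy instance: minimize f(x) = x1 + x2 over X = [0,1]^2 subject to
# A(x) = x - 0.5 in R^2_+, i.e. x >= 0.5 componentwise; here f* = 1.
f = lambda x: float(x[0] + x[1])
A = lambda x: x - 0.5
x_bar = np.array([0.49, 0.5])            # slightly infeasible point
infeas = dist_to_cone(A(x_bar))          # ≈ 0.01
ok = is_eps_primal(x_bar, f, A, 1.0, 0.05, 0.05)
```

Note that, as Corollary 3 quantifies, the slightly infeasible point may have objective value below f*; Definition 1 only controls the positive part of the gap.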
3 Technical results

This section discusses some technical results that will be used in our analysis and reviews two first-order algorithms due to Nesterov [2, 4] for solving special classes of CP problems. It consists of three subsections. The first one develops several technical results involving projected gradients. The second subsection reviews Nesterov's optimal method for solving a class of smooth CP problems. The third subsection reviews Nesterov's smooth approximation scheme for solving a class of non-smooth CP problems that can be reformulated as smooth convex-concave saddle point (i.e., min-max) problems.

3.1 Projected gradient and the optimality conditions

In this subsection, we consider an inner product space R^n endowed with the norm ‖·‖ associated with its inner product. We consider the CP problem

    φ* := min_{x∈X} φ(x),    (9)

where X ⊆ R^n is a convex set and φ : X → IR is a convex function that has L_φ-Lipschitz-continuous gradient over X with respect to the norm ‖·‖. It is well-known that x* ∈ X is an optimal solution of (9) if and only if −∇φ(x*) ∈ N_X(x*). Moreover, this optimality condition is in turn related to the projected gradient of the function φ over X, defined as follows.

Definition 3 Given a fixed constant τ > 0, we define the projected gradient of φ at x̄ ∈ X with respect to X as (see, for example, [3])

    [∇φ(x̄)]^τ_X := (1/τ) [ x̄ − Π_X( x̄ − τ ∇φ(x̄) ) ],    (10)

where Π_X(·) is the projection map onto X defined in terms of the inner product norm (see Subsection 1.1).

The following proposition relates the projected gradient to the aforementioned optimality condition.

Proposition 4 Let x̄ ∈ X be given and define x+ := Π_X(x̄ − τ ∇φ(x̄)). Then, for any given ɛ ≥ 0, the following statements hold:

a) ‖[∇φ(x̄)]^τ_X‖ ≤ ɛ if, and only if, −∇φ(x̄) ∈ N_X(x+) + B(ɛ);

b) ‖[∇φ(x̄)]^τ_X‖ ≤ ɛ implies that −∇φ(x+) ∈ N_X(x+) + B((1 + τ L_φ) ɛ).

Proof. To simplify notation, define v := [∇φ(x̄)]^τ_X.
By (10) and well-known properties of the projection operator Π_X, we have

    v = [∇φ(x̄)]^τ_X
    ⟺ x̄ − τ v = Π_X( x̄ − τ ∇φ(x̄) )
    ⟺ ⟨ (x̄ − τ ∇φ(x̄)) − (x̄ − τ v), y − (x̄ − τ v) ⟩ ≤ 0, ∀y ∈ X
    ⟺ ⟨ v − ∇φ(x̄), y − (x̄ − τ v) ⟩ ≤ 0, ∀y ∈ X
    ⟺ ⟨ v − ∇φ(x̄), y − x+ ⟩ ≤ 0, ∀y ∈ X
    ⟺ v − ∇φ(x̄) ∈ N_X(x+),
from which statement a) clearly follows. To show b), assume that ‖v‖ ≤ ɛ. Then, we have

    ‖x+ − x̄‖ = ‖Π_X(x̄ − τ ∇φ(x̄)) − x̄‖ = ‖(x̄ − τ v) − x̄‖ = τ ‖v‖ ≤ τ ɛ,

which, together with the assumption that φ has L_φ-Lipschitz-continuous gradient, implies that ‖∇φ(x+) − ∇φ(x̄)‖ ≤ τ L_φ ɛ. Statement b) now follows immediately from the latter conclusion and statement a).

The following lemma summarizes some interesting properties of the projected gradient.

Lemma 5 Let x̄ ∈ X be given and x+ := Π_X(x̄ − τ ∇φ(x̄)). Denoting g(·) := ∇φ(·) and g_X(·) := [∇φ(·)]^τ_X, we have:

a) φ(x+) ≤ φ(x̄) − τ ‖g_X(x̄)‖²/2 for any τ ≤ 1/L_φ;

b) for any x ∈ X, there holds

    ⟨ g(x̄) − g_X(x̄), x − x+ ⟩ ≥ 0;    (11)

in particular, setting x = x̄, we obtain

    ⟨ g(x̄) − g_X(x̄), g_X(x̄) ⟩ ≥ 0;    (12)

c) for any x ∈ X, there holds

    φ(x+) − φ(x) ≤ (1 + τ L_φ) ‖g_X(x̄)‖ ‖x+ − x‖;    (13)

d) if τ = 1/L_φ in definition (10), then

    φ(x̄) − φ(x*) ≥ (1/(2 L_φ)) ‖g_X(x̄)‖²,  ∀x̄ ∈ X,    (14)

where x* ∈ Argmin_{x∈X} φ(x).

Proof. a) This statement is proved on p. 87 of [3].

b) Noting that x+ = x̄ − τ g_X(x̄) = Π_X(x̄ − τ g(x̄)), it follows from well-known properties of the projection map Π_X that, for every x ∈ X,

    ⟨ (x̄ − τ g(x̄)) − (x̄ − τ g_X(x̄)), x − (x̄ − τ g_X(x̄)) ⟩ = ⟨ τ (g_X(x̄) − g(x̄)), x − x+ ⟩ ≤ 0,

which clearly implies statement b).

c) It follows from the convexity of φ(·), (11), the assumption that φ(·) has L_φ-Lipschitz-continuous gradient, and definition (10) that, for every x ∈ X,

    φ(x+) − φ(x) ≤ ⟨ g(x+), x+ − x ⟩
    = ⟨ g(x̄) − g_X(x̄), x+ − x ⟩ + ⟨ g_X(x̄), x+ − x ⟩ + ⟨ g(x+) − g(x̄), x+ − x ⟩
    ≤ ⟨ g_X(x̄), x+ − x ⟩ + ⟨ g(x+) − g(x̄), x+ − x ⟩
    ≤ ‖g_X(x̄)‖ ‖x+ − x‖ + L_φ ‖x+ − x̄‖ ‖x+ − x‖
    = ‖g_X(x̄)‖ ‖x+ − x‖ + τ L_φ ‖g_X(x̄)‖ ‖x+ − x‖
    = (1 + τ L_φ) ‖g_X(x̄)‖ ‖x+ − x‖.
d) Using the fact that φ(x*) ≤ φ(x), ∀x ∈ X, and the assumption that φ(·) has L_φ-Lipschitz-continuous gradient, we conclude that

    φ(x*) ≤ φ(x+) = φ( x̄ − (1/L_φ) g_X(x̄) ) ≤ φ(x̄) − (1/L_φ) ⟨ g(x̄), g_X(x̄) ⟩ + (1/(2L_φ)) ‖g_X(x̄)‖² ≤ φ(x̄) − (1/(2L_φ)) ‖g_X(x̄)‖²

for any x̄ ∈ X, where the last inequality follows from (12).

3.2 Nesterov's Optimal Method

In this subsection, we discuss Nesterov's smooth first-order method for solving a class of smooth CP problems. Our problem of interest is still the CP problem (9), which is assumed to satisfy the same assumptions mentioned in Subsection 3.1, except that now we allow ‖·‖ to be an arbitrary norm. Moreover, we assume throughout our discussion that the optimal value φ* of problem (9) is finite and that its set of optimal solutions is nonempty.

Let h : X → IR be a differentiable strongly convex function with modulus σ_h > 0 with respect to ‖·‖, i.e.,

    h(x) ≥ h(x̄) + ⟨∇h(x̄), x − x̄⟩ + (σ_h/2) ‖x − x̄‖², ∀x, x̄ ∈ X.    (15)

The Bregman distance d_h : X × X → IR associated with h is defined as

    d_h(x; x̄) := h(x) − l_h(x; x̄), ∀x, x̄ ∈ X,    (16)

where l_h : R^n × X → IR is the linear approximation of h defined as

    l_h(x; x̄) := h(x̄) + ⟨∇h(x̄), x − x̄⟩, ∀(x, x̄) ∈ R^n × X.

We are now ready to state Nesterov's smooth first-order method for solving (9). We use the superscript "sd" in the sequence obtained by taking a steepest descent step and the superscript "ag" (which stands for "aggregated gradient") in the sequence obtained by using all past gradients.

Nesterov's Algorithm:

0) Let x^sd_0 = x^ag_0 ∈ X be given and set k = 0.

1) Set x̄_k = (2/(k+2)) x^ag_k + (k/(k+2)) x^sd_k and compute φ(x̄_k) and ∇φ(x̄_k).

2) Compute (x^sd_{k+1}, x^ag_{k+1}) ∈ X × X as

    x^sd_{k+1} ∈ Argmin { l_φ(x; x̄_k) + (L_φ/2) ‖x − x̄_k‖² : x ∈ X },    (17)
    x^ag_{k+1} ∈ Argmin { (L_φ/σ_h) d_h(x; x^sd_0) + Σ_{i=0}^{k} ((i+1)/2) l_φ(x; x̄_i) : x ∈ X }.    (18)

3) Set k ← k + 1 and go to step 1.

The main convergence result established by Nesterov [4] regarding the above algorithm is summarized in the following theorem.

Theorem 6 The sequence {x^sd_k} generated by Nesterov's optimal method satisfies

    φ(x^sd_k) − φ* ≤ 4 L_φ d_h(x*; x^sd_0) / (σ_h k (k+1)), ∀k ≥ 1,

where x* is an optimal solution of (9). As a consequence, given any ɛ > 0, an iterate x^sd_k ∈ X satisfying φ(x^sd_k) − φ* ≤ ɛ can be found in no more than

    2 √( d_h(x*; x^sd_0) L_φ / (σ_h ɛ) )    (19)

iterations.

The following result is an immediate special case of Theorem 6.

Corollary 7 Suppose that ‖·‖ is an inner product norm and h : X → IR is chosen as h(·) = ‖·‖²/2 in Nesterov's optimal method. Then, for any ɛ > 0, an iterate x^sd_k ∈ X satisfying φ(x^sd_k) − φ* ≤ ɛ can be found in no more than

    ‖x^sd_0 − x*‖ √( 2 L_φ / ɛ )    (20)

iterations, where x* is an optimal solution of (9).

Proof. If h(x) = ‖x‖²/2, then (16) implies that d_h(x*; x^sd_0) = ‖x^sd_0 − x*‖²/2. The corollary now follows from this bound and relation (19).

Now assume that the objective function φ is strongly convex over X, i.e., for some µ > 0,

    ⟨ ∇φ(x) − ∇φ(x̄), x − x̄ ⟩ ≥ µ ‖x − x̄‖², ∀x, x̄ ∈ X.    (21)

Nesterov shows in [3] that, under the assumptions of Corollary 7, a variant of his optimal method finds a solution x̃ ∈ X satisfying φ(x̃) − φ* ≤ ɛ in no more than

    √( L_φ / µ ) log( L_φ ‖x^sd_0 − x*‖² / ɛ )    (22)

iterations. The following result gives an iteration-complexity bound for Nesterov's optimal method that replaces the term log(L_φ ‖x^sd_0 − x*‖²/ɛ) in (22) with log(µ ‖x^sd_0 − x*‖²/ɛ). The resulting iteration-complexity bound is not only sharper but also more suitable for our purposes, since it makes it easier to compare the quality of the different bounds obtained in our analysis of first-order penalty methods.

Theorem 8 Let ɛ > 0 be given and suppose that the assumptions of Corollary 7 hold and that the function φ is strongly convex with modulus µ. Then, the variant in which we restart Nesterov's optimal method, with prox-function h(·) = ‖·‖²/2, every

    K := ⌈ √( 8 L_φ / µ ) ⌉    (23)
iterations finds a solution x̃ ∈ X satisfying φ(x̃) − φ* ≤ ɛ in no more than

    ⌈ √( 8 L_φ / µ ) ⌉ max{ 1, ⌈ log Q ⌉ }    (24)

iterations, where

    Q := µ ‖x^sd_0 − x*‖² / (2ɛ)    (25)

and x* := argmin_{x∈X} φ(x).

Proof. Denote D := ‖x^sd_0 − x*‖. First consider the case where Q ≤ 1. Clearly we have ɛ ≥ µ D²/2, which, in view of Corollary 7, implies that the number of iterations is bounded by 2√(L_φ/µ), and hence bounded by (24). We now show that bound (24) holds for the case when Q > 1. Let x_j denote the iterate obtained after the j-th restart, i.e., after j cycles of K iterations of Nesterov's optimal method are performed. By Theorem 6 with h(·) = ‖·‖²/2 and inequality (21), we have

    φ(x_1) − φ(x*) ≤ 2 L_φ ‖x^sd_0 − x*‖² / K² ≤ 2 L_φ D² / K²,
    φ(x_j) − φ(x*) ≤ 2 L_φ ‖x_{j−1} − x*‖² / K² ≤ (4 L_φ / (µ K²)) [ φ(x_{j−1}) − φ(x*) ], j ≥ 2.

Using the above relations inductively, we conclude that

    φ(x_j) − φ(x*) ≤ (4 L_φ)^j D² / ( 2 µ^{j−1} K^{2j} ).

Setting j = ⌈log Q⌉ and observing that

    K^{2j} ≥ (8 L_φ/µ)^j = 2^j (4 L_φ/µ)^j ≥ Q (4 L_φ/µ)^j = ( µ D²/(2ɛ) ) (4 L_φ/µ)^j,

we conclude that φ(x_j) − φ(x*) ≤ ɛ. Hence, the overall number of iterations is bounded by K ⌈log Q⌉, or equivalently, by (24).

It is interesting to compare the complexity bounds of Corollary 7 and Theorem 8. Indeed, it can be easily seen that there exists a threshold value Q̄ > 0 such that the condition Q ≥ Q̄, where Q is defined in (25), implies that the bound (20) is always greater than or equal to the bound (24). Moreover, as Q goes to infinity, the ratio between (20) and (24) converges to infinity. Hence, when the function φ is strongly convex and Q ≥ Q̄, we will always use the bound (24) in our analysis.

3.3 Nesterov's smooth approximation scheme

In this subsection, we discuss Nesterov's smooth approximation scheme for solving a class of non-smooth CP problems which can be reformulated as special smooth convex-concave saddle point (i.e., min-max) problems.
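Before turning to the smoothing scheme, the optimal method reviewed in Subsection 3.2 can be sketched numerically. The sketch below is a minimal Euclidean instantiation under assumptions chosen for illustration only: h(x) = ‖x‖²/2 (so σ_h = 1), X a box so that Π_X is a componentwise clip, and a toy quadratic objective; none of these choices come from the paper's general setting.

```python
import numpy as np

def nesterov_optimal(grad, L, x0, project, iters):
    # Minimal sketch of the optimal method with Euclidean prox h = ||.||^2/2:
    # step (17) is a projected gradient step from the midpoint, and step (18)
    # aggregates all past gradients with weights (i+1)/2 against the prox
    # term anchored at x0, which for this h is again a projection.
    x_sd, x_ag = x0.copy(), x0.copy()
    acc = np.zeros_like(x0)                     # running sum of (i+1)/2 * grad_i
    for k in range(iters):
        x_md = 2.0 / (k + 2) * x_ag + k / (k + 2.0) * x_sd
        g = grad(x_md)
        x_sd = project(x_md - g / L)            # cf. (17)
        acc += (k + 1) / 2.0 * g
        x_ag = project(x0 - acc / L)            # cf. (18)
    return x_sd

# Toy smooth problem: min 0.5*||x - c||^2 over the box [0,1]^2 (L_phi = 1);
# the constrained minimizer is the projection of c onto the box, i.e. (1, 0).
c = np.array([2.0, -3.0])
sol = nesterov_optimal(lambda x: x - c, 1.0, np.array([0.5, 0.5]),
                       lambda x: np.clip(x, 0.0, 1.0), 500)
```

By Theorem 6, the function-value gap after k iterations decays like O(1/k²), so 500 iterations place `sol` well within 10⁻² of the minimizer on this instance.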
Our problem of interest in this subsection is still problem (9). We assume that X ⊆ R^n is a closed convex set and φ : X → IR is a convex function given by

    φ(x) := φ̂(x) + sup{ ⟨E x, y⟩ − ψ(y) : y ∈ Y },  ∀x ∈ X,    (26)

where φ̂ : X → IR is a convex function with L_φ̂-Lipschitz-continuous gradient with respect to an arbitrary norm ‖·‖ on R^n, Y ⊆ R^m is a compact convex set, ψ : Y → IR is a continuous convex function, and E : R^n → R^m is an affine operator. Unless stated explicitly otherwise, we assume that the CP problem (9) referred to in this subsection is the one with its objective function given by (26). Also, we assume throughout our discussion that φ* is finite and that the set of optimal solutions of (9) is nonempty.

The function φ defined as in (26) is generally non-differentiable but can be closely approximated by a function with Lipschitz-continuous gradient using the following construction due to Nesterov [4]. Let h : Y → IR be a continuous strongly convex function with modulus σ_h > 0 with respect to an arbitrary norm ‖·‖ on R^m satisfying min{ h(y) : y ∈ Y } = 0. For some smoothness parameter η > 0, consider the following function

    φ_η(x) := φ̂(x) + max_y { ⟨E x, y⟩ − ψ(y) − η h(y) : y ∈ Y }.    (27)

The next result, due to Nesterov [4], shows that φ_η is a function with Lipschitz-continuous gradient with respect to ‖·‖, whose closeness to φ depends linearly on the parameter η.

Proposition 9 The following statements hold:

a) For every x ∈ X, we have φ_η(x) ≤ φ(x) ≤ φ_η(x) + η D_h, where

    D_h := max{ h(y) : y ∈ Y };    (28)

b) the function φ_η has L_η-Lipschitz-continuous gradient with respect to ‖·‖, where

    L_η := L_φ̂ + ‖E‖² / (η σ_h).    (29)

We are now ready to state Nesterov's smooth approximation scheme to solve (9).

Nesterov's smooth approximation scheme:

0) Let ɛ > 0 be given.

1) Let η := ɛ/(2 D_h) and consider the approximation problem

    φ*_η := inf{ φ_η(x) : x ∈ X }.    (30)

2) Apply Nesterov's optimal method (or its variant) to (30) and terminate whenever an iterate x^sd_k satisfying φ(x^sd_k) − φ* ≤ ɛ is found.
The following result describes the convergence behavior of the above scheme.
Theorem 10 Nesterov's smooth approximation scheme generates an iterate x^sd_k satisfying φ(x^sd_k) − φ* ≤ ɛ in no more than

    √( 8 D_h(x^sd_0) / (σ_h ɛ) ) · √( 2 D_h ‖E‖² / (σ_h ɛ) + L_φ̂ )    (31)

iterations, where D_h(x^sd_0) := max_{x∈X} d_h(x; x^sd_0) and D_h is defined in (28) (the prox-functions in the two factors being the ones on X and on Y, respectively).

4 The quadratic penalty method

The goal of this section is to establish the iteration-complexity (in terms of Nesterov's optimal method iterations) of the quadratic penalty method for solving (1). Specifically, we present the iteration-complexity of the classical quadratic penalty method in Subsection 4.1, and a variant of this method, in which a perturbation term is added to the objective function of (1), is discussed and analyzed in Subsection 4.2.

The basic idea underlying penalty methods is rather simple, namely: instead of solving problem (1) directly, we solve certain relaxations of (1) obtained by penalizing the violation of the constraint A(x) ∈ K*. More specifically, in the case of the quadratic penalty method, given a penalty parameter ρ > 0, we solve the relaxation

    Ψ*_ρ := inf_{x∈X} { Ψ_ρ(x) := f(x) + (ρ/2) [d_{K*}(A(x))]² }.    (32)

In order to guarantee that the function Ψ_ρ is smooth, we assume throughout this section that the distance function d_{K*} is defined with respect to an inner product norm in R^m. Note that in this case, we have ‖·‖* = ‖·‖. We will now see that the objective function Ψ_ρ of (32) has Lipschitz continuous gradient. We first state the following well-known result, which guarantees that the squared distance function has Lipschitz continuous gradient (see for example Proposition 15 of [1] for its proof).

Proposition 11 Given a closed convex set C ⊆ R^m, consider the distance function d_C : R^m → IR to C with respect to some inner product norm ‖·‖ on R^m. Then, the function ψ : R^m → IR defined as ψ(u) := [d_C(u)]² is convex and its gradient is given by ∇ψ(u) = 2 [u − Π_C(u)] for every u ∈ R^m. Moreover, ‖∇ψ(ũ) − ∇ψ(u)‖ ≤ 2 ‖ũ − u‖ for every u, ũ ∈ R^m.

As an immediate consequence of Proposition 11, we obtain the following result.
Corollary 12 The function Ψ_ρ has M_ρ-Lipschitz continuous gradient, where M_ρ := L_f + ρ ‖A‖².

Proof. The differentiability of Ψ_ρ follows immediately from the assumption that f is differentiable and Proposition 11. Moreover, it easily follows from the chain rule that ∇Ψ_ρ(x) = ∇f(x) + (ρ/2) (A_0)* ∇(d²_{K*})(A(x)), which, together with (2) and Proposition 11, implies that

    ‖∇Ψ_ρ(x_1) − ∇Ψ_ρ(x_2)‖ ≤ ‖∇f(x_1) − ∇f(x_2)‖ + (ρ/2) ‖(A_0)* [ ∇(d²_{K*})(A(x_1)) − ∇(d²_{K*})(A(x_2)) ]‖
    ≤ L_f ‖x_1 − x_2‖ + (ρ/2) ‖(A_0)*‖ ‖∇(d²_{K*})(A(x_1)) − ∇(d²_{K*})(A(x_2))‖
    ≤ L_f ‖x_1 − x_2‖ + ρ ‖(A_0)*‖ ‖A(x_1) − A(x_2)‖
    ≤ L_f ‖x_1 − x_2‖ + ρ ‖(A_0)*‖ ‖A_0‖ ‖x_1 − x_2‖ = M_ρ ‖x_1 − x_2‖,
for every x_1, x_2 ∈ X, where the last equality follows from the fact that ‖A‖ = ‖A_0‖ = ‖(A_0)*‖. Therefore, problem (32) can be solved, for example, by Nesterov's optimal method or one of its variants (see for example [1, 2, 4]).

4.1 Quadratic penalty method applied to the original problem

In this subsection, we consider the application of the quadratic penalty method directly to the original problem (1), which consists of solving the penalized problem (32) for a suitably chosen penalty parameter ρ. This subsection contains two subsubsections. The first one considers the complexity of obtaining an (ɛ_p, ɛ_o)-primal solution of (1), while the second one discusses the complexity of computing an (ɛ_p, ɛ_d)-primal-dual solution of (1).

4.1.1 Computation of a primal approximate solution

In this subsubsection, we are interested in deriving the iteration-complexity involved in the computation of an (ɛ_p, ɛ_o)-primal solution of problem (1) by approximately solving the penalized problem (32) for some value of the penalty parameter ρ. Given an approximate solution x̄ ∈ X for the penalized problem (32), the following result provides bounds on the primal infeasibility and optimality gap of x̄ with respect to (1).

Proposition 13 If x̄ ∈ X is a δ-approximate solution of (32), i.e., it satisfies

    Ψ_ρ(x̄) − Ψ*_ρ ≤ δ,    (33)

then

    d_{K*}(A(x̄)) ≤ (2/ρ) ‖λ̄‖ + √(2δ/ρ),    (34)
    f(x̄) − f* ≤ δ,    (35)

where λ̄ is an arbitrary Lagrange multiplier associated with (1).

Proof. Using the fact that v(0) = f* ≥ Ψ*_ρ, Corollary 2 and assumption (33), we conclude that

    δ ≥ Ψ_ρ(x̄) − Ψ*_ρ = f(x̄) + (ρ/2) [d_{K*}(A(x̄))]² − Ψ*_ρ ≥ f(x̄) − f* + (ρ/2) [d_{K*}(A(x̄))]² ≥ −‖λ̄‖ d_{K*}(A(x̄)) + (ρ/2) [d_{K*}(A(x̄))]²,

which clearly implies both (34) and (35).

For the sake of future reference, we state the following simple result.

Lemma 14 Let α_i, i = 0, 1, 2, be given positive constants. Then, the only positive scalar ρ satisfying the equation α_2 ρ^{−1} + α_1 ρ^{−1/2} = α_0 is given by

    ρ = [ ( α_1 + √( α_1² + 4 α_0 α_2 ) ) / (2 α_0) ]².
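To make the gradient formula behind Corollary 12 concrete: for an affine map A(x) = A_0 x + b with K* the nonnegative orthant (an illustrative instantiation, not the paper's general cone), Proposition 11 gives ∇Ψ_ρ(x) = ∇f(x) + ρ A_0ᵀ(A(x) − Π_{K*}(A(x))). The helper name `make_penalty_grad` and the toy data are assumptions for this sketch.

```python
import numpy as np

def make_penalty_grad(grad_f, A0, b, rho):
    # Gradient of Psi_rho(x) = f(x) + (rho/2) * d_{K*}(A(x))^2 for
    # K* = R^m_+, using Proposition 11:
    #   grad (d_{K*}^2)(u) = 2 * (u - Pi_{K*}(u)).
    def grad_psi(x):
        u = A0 @ x + b
        residual = u - np.maximum(u, 0.0)       # u - Pi_{K*}(u)
        return grad_f(x) + rho * (A0.T @ residual)
    return grad_psi

# Illustration: f(x) = 0.5*||x||^2, constraint x - 1 in R^2_+ (i.e. x >= 1).
grad_psi = make_penalty_grad(lambda x: x, np.eye(2), -np.ones(2), rho=10.0)
g = grad_psi(np.zeros(2))    # residual is -1 per coordinate, so g = (-10, -10)
```

The penalty gradient points toward the feasible region, with strength scaled by ρ, which is exactly why larger ρ values (and hence larger Lipschitz constants M_ρ) are needed for tighter infeasibility tolerances.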
With the aid of Proposition 13, we can now derive an iteration-complexity bound for Nesterov's method applied to the quadratic penalty problem (32) to compute an (ɛ_p, ɛ_o)-primal solution of (1).

Theorem 15 Let λ̄ be an arbitrary Lagrange multiplier for (1) and let (ɛ_p, ɛ_o) ∈ IR_++ × IR_++ be given. If

    ρ = ρ_p(t) := [ ( √ɛ_o + √( ɛ_o + 4 ɛ_p t ) ) / ( √2 ɛ_p ) ]²    (36)

for some t ≥ ‖λ̄‖, then Nesterov's optimal method applied to problem (32) finds an (ɛ_p, ɛ_o)-primal solution of (1) in at most

    N_p(t) := 2 √( D_h(x^sd_0) / σ_h ) { √( L_f/ɛ_o ) + ( √2 ‖A‖ )/ɛ_p + ‖A‖ √( 2t/(ɛ_p ɛ_o) ) }    (37)

iterations, where D_h(x^sd_0) is defined in Theorem 10. In particular, if ‖·‖ is the inner product norm on R^n and h(·) = ‖·‖²/2, then the above bound reduces to

    N_p(t) := √2 D_X { √( L_f/ɛ_o ) + ( √2 ‖A‖ )/ɛ_p + ‖A‖ √( 2t/(ɛ_p ɛ_o) ) },    (38)

where D_X := max_{x_1, x_2 ∈ X} ‖x_1 − x_2‖.

Proof. Let x̄ ∈ X satisfy (33) with δ = ɛ_o. Proposition 13, the assumption that t ≥ ‖λ̄‖, relation (36) and Lemma 14 with α_0 = ɛ_p, α_1 = √(2ɛ_o) and α_2 = 2t then imply that f(x̄) − f* ≤ ɛ_o and

    d_{K*}(A(x̄)) ≤ (2/ρ_p(t)) ‖λ̄‖ + √( 2ɛ_o/ρ_p(t) ) ≤ 2t/ρ_p(t) + √( 2ɛ_o/ρ_p(t) ) = ɛ_p,

and hence that x̄ is an (ɛ_p, ɛ_o)-primal solution of (1). In view of Theorem 6, Nesterov's optimal method finds an approximate solution x̄ as above in at most

    2 √( D_h(x^sd_0) M_ρ / (σ_h ɛ_o) )

iterations, where M_ρ := L_f + ρ ‖A‖². Substituting the value of ρ given by (36) in this iteration bound and using some trivial majorization, we obtain bound (37).

We now make a few observations regarding Theorem 15. First, the choice of ρ given by (36) requires that t ≥ ‖λ̄‖ so as to guarantee that an ɛ_o-approximate solution x̄ of (32) satisfies d_{K*}(A(x̄)) ≤ ɛ_p. Second, note that the iteration-complexity N_p(t) obtained in Theorem 15 achieves its minimum possible value over the interval t ≥ ‖λ*‖ exactly when t = ‖λ*‖. However, since the quantity ‖λ*‖ is not known a priori, it is necessary to use a guess-and-check procedure for t so as to develop a scheme for computing an (ɛ_p, ɛ_o)-primal solution of (1) whose iteration-complexity has the same order of magnitude as the ideal one in which t = ‖λ*‖.
We now describe the aforementioned guess-and-check procedure for t.

Search Procedure 1:
1) Set k = 0 and define

    β_0 := 2 √( D_h(x^sd_0)/σ_h ) ( √( L_f/ɛ_o ) + ( √2 ‖A‖ )/ɛ_p ),  β_1 := 2 ‖A‖ √( 2 D_h(x^sd_0) / (σ_h ɛ_p ɛ_o) ),  t_0 := [ max(1, β_0) / β_1 ]².    (39)

2) Set ρ = ρ_p(t_k), and perform at most ⌈N_p(t_k)⌉ iterations of Nesterov's optimal method applied to problem (32). If an iterate x̄ is obtained such that (33) with δ = ɛ_o and d_{K*}(A(x̄)) ≤ ɛ_p are satisfied, then stop; otherwise, go to step 3.

3) Set t_{k+1} = 2 t_k, k ← k + 1, and go to step 2.

Before establishing the iteration-complexity of the above procedure, we state a technical result, namely Lemma 17, which will be used more than once in the paper. Lemma 16 states a simple inequality which is also used a few times in our presentation.

Lemma 16 For any scalars τ ≥ 0, x > 0, and α ≥ 0, we have τ x + α ≤ (τ + α) max{x, 1}.

Lemma 17 Let a positive scalar p be given. Then, there exists a constant C = C(p) such that for any positive scalars β_1, β_2, t̄, we have

    Σ_{k=0}^{K} ( β_1 + β_2 t_k^p ) ≤ C ( β_1 + β_2 t̄^p ),    (40)

where t_k = t_0 2^k for k = 1, ..., K, and

    t_0 := ( max(β_1, 1) / β_2 )^{1/p},  K := max{ 0, ⌈ log( t̄/t_0 ) ⌉ }.    (41)

Proof. Assume first that t̄ ≤ t_0. Due to the definition of K in (41), we have K = 0 in this case, and hence

    Σ_{k=0}^{K} ( β_1 + β_2 t_k^p ) = β_1 + β_2 t_0^p = β_1 + max(β_1, 1),

where the second equality follows from the definition of t_0. Hence, in the case where t̄ ≤ t_0, inequality (40) holds with C = 4.

Assume now that t̄ > t_0. By the definition of K in (41), we have K = ⌈log(t̄/t_0)⌉, from which we conclude that K < log(t̄/t_0) + 1, and hence that t_0 2^{K+1} < 4 t̄. Using these relations, the inequality
log x = (log x^p)/p ≤ x^p/p for any x ≥ 1, and the definition of t_0 in (41), we obtain

    Σ_{k=0}^{K} ( β_1 + β_2 t_k^p ) ≤ (1 + β_1)(1 + K) + β_2 t_0^p · 2^{(K+1)p}/(2^p − 1)
    ≤ (1 + β_1)( 2 + log(t̄/t_0) ) + β_2 (4 t̄)^p / (2^p − 1)
    ≤ (1 + β_1)( 2 + (t̄/t_0)^p / p ) + ( 4^p/(2^p − 1) ) β_2 t̄^p
    ≤ 2 (1 + β_1) + (2/p) β_2 t̄^p + ( 4^p/(2^p − 1) ) β_2 t̄^p
    ≤ [ 2 + max{ 2, 2/p + 4^p/(2^p − 1) } ] ( β_1 + β_2 t̄^p ),

which, in view of Lemma 16, implies that (40) holds with C = C(p) := 2 + max{ 2, 2/p + 4^p/(2^p − 1) }.

The following result gives the iteration-complexity of Search Procedure 1 for obtaining an (ɛ_p, ɛ_o)-primal solution of (1).

Corollary 18 Let λ* be the minimum norm Lagrange multiplier for (1). Then, the overall number of iterations of Search Procedure 1 for obtaining an (ɛ_p, ɛ_o)-primal solution of (1) is bounded by O(N_p(‖λ*‖)), where N_p(·) is defined in (37).

Proof. In view of Theorem 15, the iteration count k in Search Procedure 1 cannot exceed K := max{0, ⌈log(‖λ*‖/t_0)⌉}, and hence its overall number of inner (i.e., Nesterov's optimal method) iterations is bounded by Σ_{k=0}^{K} N_p(t_k) = Σ_{k=0}^{K} ( β_0 + β_1 t_k^{1/2} ), where β_0 and β_1 are defined in (39). The result now follows from the definition of t_0 in (39) and (40) with p = 1/2 and t̄ = ‖λ*‖.

4.1.2 Computation of a primal-dual approximate solution

In this subsection, we are interested in deriving the iteration-complexity involved in the computation of an (ɛ_p, ɛ_d)-primal-dual solution of problem (1) by approximately solving the penalized problem (32) for a certain value of the penalty parameter ρ. We assume throughout this subsection that the norms on R^n and R^m are the ones associated with their respective inner products. Given an approximate solution x̄ ∈ X for the penalized problem (32), the following result shows that there exists a pair (x+, λ̃) depending on x̄ that approximately satisfies the optimality conditions (6).
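Stepping back, the doubling strategy of Search Procedure 1 can be sketched generically. In the sketch below, `solve_penalized` and `check` are hypothetical placeholders standing in for a run of Nesterov's optimal method on (32) with ρ = ρ_p(t) and for the accuracy tests in step 2; the tiny illustration at the end is invented to exercise the loop.

```python
def guess_and_check(solve_penalized, check, t0, max_doublings=60):
    # Generic form of Search Procedure 1: guess t, solve the penalized
    # problem, and double t until the computed point passes the accuracy
    # checks. Any fixed threshold t_bar is reached after at most
    # O(log(t_bar / t0)) doublings, which is what drives Corollary 18.
    t = t0
    for _ in range(max_doublings):
        x = solve_penalized(t)       # placeholder: Nesterov's method on (32)
        if check(x, t):              # placeholder: tests in step 2
            return x, t
        t *= 2.0
    raise RuntimeError("guess-and-check did not terminate")

# Tiny illustration: succeed once the guess t dominates a hidden threshold 7.
x, t = guess_and_check(solve_penalized=lambda t: t,
                       check=lambda x, t: t >= 7.0, t0=1.0)
```

Since the per-guess cost N_p(t_k) grows geometrically with t_k, the total work is dominated (up to the constant C(p) of Lemma 17) by the cost of the final, successful guess.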
Proposition 19 If x̄ ∈ X is a δ-approximate solution of (32), i.e., it satisfies (33), then the pair (x+, λ̃) defined as

    x+ := Π_X( x̄ − ∇Ψ_ρ(x̄)/M_ρ ),    (42)
    λ̃ := ρ [ A(x+) − Π_{K*}(A(x+)) ]    (43)

is in X × (−K) and satisfies the second relation in (7) and the relations

    d_{K*}(A(x+)) ≤ (1/ρ) [ 2 ‖λ̄‖ + √(2 δ ρ) ],    (44)
    −[ ∇f(x+) + (A_0)*(λ̃) ] ∈ N_X(x+) + B( 2 √(2 M_ρ δ) ),    (45)

where λ̄ is an arbitrary Lagrange multiplier for (1).

Proof. It follows from Lemma 5(a) with φ = Ψ_ρ and τ = 1/M_ρ that Ψ_ρ(x+) ≤ Ψ_ρ(x̄), and hence that Ψ_ρ(x+) − Ψ*_ρ ≤ δ in view of (33). By Proposition 13, this implies that (44) holds. Moreover, by Lemma 5(d) with φ = Ψ_ρ and L_φ = M_ρ and assumption (33), we have

    ‖ [∇Ψ_ρ(x̄)]^{1/M_ρ}_X ‖ ≤ √( 2 M_ρ [ Ψ_ρ(x̄) − Ψ*_ρ ] ) ≤ √( 2 M_ρ δ ).

Relation (45) now follows from this inequality, Proposition 4(b) with L_φ = M_ρ, τ = 1/M_ρ and ɛ = √(2 M_ρ δ), and the fact that ∇Ψ_ρ(x+) = ∇f(x+) + (A_0)*(λ̃), where λ̃ is given by (43). Finally, λ̃ ∈ −K and the second relation of (7) hold in view of the definition of λ̃ and well-known properties of the projection operator onto a cone.

With the aid of Proposition 19, we can now derive an iteration-complexity bound for Nesterov's method applied to the quadratic penalty problem (32) to compute an (ɛ_p, ɛ_d)-primal-dual solution of (1).

Theorem 20 Let λ̄ be an arbitrary Lagrange multiplier for (1) and let (ɛ_p, ɛ_d) ∈ IR_++ × IR_++ be given. If

    ρ = ρ_pd(t) := (2/ɛ_p) ( t + ɛ_d/(√8 ‖A‖) )    (46)

for some t ≥ ‖λ̄‖, then Nesterov's optimal method with h(·) = ‖·‖²/2 applied to problem (32) finds an (ɛ_p, ɛ_d)-primal-dual solution of (1) in at most

    N_pd(t) := 4 D_X ( L_f/ɛ_d + 2 ‖A‖² t/(ɛ_p ɛ_d) + ‖A‖/(√2 ɛ_p) )    (47)

iterations, where D_X is defined in Theorem 15.

Proof. Let x̄ ∈ X be a δ-approximate solution of (32), where δ := ɛ_d²/(8 M_ρ). Noting that δ ≤ ɛ_d²/(8 ρ ‖A‖²) in view of the fact that M_ρ := L_f + ρ ‖A‖² ≥ ρ ‖A‖², we conclude from Proposition 19 that the pair (x+, λ̃) defined as x+ := Π_X(x̄ − ∇Ψ_ρ(x̄)/M_ρ) and λ̃ := ρ[A(x+) − Π_{K*}(A(x+))] satisfies −[∇f(x+) + (A_0)*(λ̃)] ∈ N_X(x+) + B(ɛ_d) and

    d_{K*}(A(x+)) ≤ (1/ρ) [ 2 ‖λ̄‖ + √(2 δ ρ) ] ≤ (2/ρ) ( ‖λ̄‖ + ɛ_d/(√8 ‖A‖) ) ≤ ɛ_p,
17 where the last inequality is due to 46) and the assumption that t λ. We have thus shown that x +, λ) is an ɛ p, ɛ d )-primal-dual solution of 1). In view of Corollary 7, Nesterov s optimal method finds an approximate solution x as above in at most ) ) 2DX 1/2 Mρ 8M 2 1/2 = ρ δ 2DX ɛ 2 d = M ρ 4D X = ɛ d L f + ρ A 4D 2 X iterations. Substituting the value of ρ given by 36) in the above bound, we obtain bound 37). The following result states the iteration-complexity of a guess and chec procedure based on Theorem 20. Its proof is based on similar arguments as those used in the proof of Corollary 18. Corollary 21 Let λ be the minimum norm Lagrange multiplier for 1). Consider the guess and chec procedure similar to Search Procedure 1 where the functions N p and ρ p are replaced by the functions N pd and ρ pd defined in Theorem 20, δ is set to ɛ 2 d /8M ρ), and t 0 is set to [maxβ 0, 1)/β 1 ] with ) L f β 0 = 4D X + A, β 1 = 4D X A 2. ɛ d 8ɛp ɛ p ɛ d Then, the overall number of iterations of this guess and chec procedure for obtaining an ɛ p, ɛ d )- primal-dual solution of 1) is bounded by ON pd λ )), where N pd ) is defined in 47). 4.2 Quadratic penalty method applied to a perturbed problem Throughout this subsection, we assume that the norm on R n is the one associated with its inner product. Recall that throughout Section 4, we are also assuming that the norm on R m is the one associated with its inner product.) Consider the perturbation problem f γ := min{f γ x) := fx) + γ 2 x x 0 2 : Ax) K, x X}, 48) where x 0 is a fixed point in X and γ > 0 is a pre-specified perturbation parameter. It is well-nown that if γ is sufficiently small, then an approximate solution of 48) will also be an approximate solution of 1). 
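The guess-and-check procedures of Corollaries 18 and 21 can be sketched in a few lines. The sketch below is mine, not the paper's pseudocode: `solve_penalized` stands for running Nesterov's optimal method on the penalized subproblem with ρ set from the current guess t, and `is_acceptable` for checking the target tolerances.

```python
# Hedged sketch of the "guess and check" doubling scheme (Search Procedure 1):
# the multiplier norm ||lambda*|| is unknown, so guess t_k = t_0 * 2^k, solve
# the penalized subproblem with rho = rho(t_k), and stop once the candidate
# meets the target tolerances. All names here are illustrative.
def guess_and_check(solve_penalized, is_acceptable, t0, max_doublings=60):
    t = t0
    for _ in range(max_doublings):
        x = solve_penalized(t)   # inner loop: Nesterov's method on Psi_rho with rho = rho(t)
        if is_acceptable(x):     # e.g. the (eps_p, eps_o) primal tolerances
            return x, t
        t *= 2                   # double the guess for ||lambda*||
    raise RuntimeError("guess-and-check did not terminate")
```

Because the guesses t_k grow geometrically, the total inner-iteration count is dominated by the last guess, which is at most twice the multiplier norm; this is the mechanism behind the O(N_p(‖λ*‖))-type bounds above.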
In this subsection, we derive the iteration-complexity (in terms of Nesterov's optimal method iterations) of computing an approximate solution of (1) by applying the quadratic penalty approach to the perturbation problem (48) for a conveniently chosen perturbation parameter γ > 0. The subsection is divided into two parts: the computation of a primal approximate solution of (1) is treated in the first, while the computation of a primal-dual approximate solution of (1) is treated in the second.

4.2.1 Computation of a primal approximate solution

The quadratic penalty problem associated with (48) is given by

  Ψ_{ρ,γ}* := min_{x∈X} { Ψ_{ρ,γ}(x) := f(x) + (γ/2)‖x − x_0‖² + (ρ/2) d_{K*}(A(x))² }.  (49)
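As a concrete, deliberately simplified illustration, the penalized objective (49) and its gradient can be coded directly once X, K and the norms are fixed. The sketch below is my own, specialized to K* = R^m_+ (so d_{K*}(y) is the norm of the negative part of y) and Euclidean norms; nothing here is from the paper beyond formula (49).

```python
import numpy as np

# Sketch of the perturbed penalty objective (49) and its gradient for the
# special case K* = R^m_+, i.e. the constraint is A x + b >= 0 componentwise:
#   Psi_{rho,gamma}(x) = f(x) + (gamma/2)||x - x0||^2 + (rho/2)||min(Ax+b, 0)||^2.
# The gamma term makes the objective gamma-strongly convex.
def perturbed_penalty(f, grad_f, A, b, x0, rho, gamma):
    def value(x):
        viol = np.minimum(A @ x + b, 0.0)          # componentwise constraint violation
        return (f(x) + 0.5 * gamma * np.dot(x - x0, x - x0)
                + 0.5 * rho * np.dot(viol, viol))
    def grad(x):
        viol = np.minimum(A @ x + b, 0.0)
        return grad_f(x) + gamma * (x - x0) + rho * (A.T @ viol)
    return value, grad
```

The gradient of the squared-distance term follows from d/dy [½ dist(y, K*)²] = y − Π_{K*}(y), which for the nonnegative orthant is min(y, 0).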
It can be easily seen that the function Ψ_{ρ,γ} has M_{ρ,γ}-Lipschitz continuous gradient, where

  M_{ρ,γ} := L_f + ρ‖A‖² + γ.  (50)

The following simple lemma relates the optimal values of the perturbation problem (48) and the original problem (1).

Lemma 22 Let f*, Ψ_ρ*, f_γ*, and Ψ_{ρ,γ}* be the optimal values defined in (1), (32), (48), and (49), respectively. Then,

  0 ≤ f_γ* − f* ≤ γD_X²/2,  (51)
  0 ≤ Ψ_{ρ,γ}* − Ψ_ρ* ≤ γD_X²/2,  (52)

where D_X := max_{x_1,x_2∈X} ‖x_1 − x_2‖.

Proof. The first inequalities in relations (51) and (52) follow immediately from the fact that f_γ ≥ f and Ψ_{ρ,γ} ≥ Ψ_ρ. Now, let x* and x_γ* be optimal solutions of (1) and (48), respectively. Then,

  f_γ* = f(x_γ*) + (γ/2)‖x_γ* − x_0‖² ≤ f(x*) + (γ/2)‖x* − x_0‖² ≤ f* + γD_X²/2,

from which the second inequality in (51) immediately follows. The second inequality in (52) can be shown similarly.

The following theorem gives the iteration-complexity of computing an (ε_p, ε_o)-primal solution of (1) by applying the quadratic penalty approach to the perturbation problem (48) with an appropriate choice of perturbation parameter γ > 0.

Theorem 23 Let λ̄_γ be an arbitrary Lagrange multiplier for (48) and let (ε_p, ε_o) ∈ IR_++ × IR_++ be given. If

  ρ = ρ_p(t) := 4t/ε_p and γ = ε_o/D_X²  (53)

for some t ≥ ‖λ̄_γ‖, then the variant of Nesterov's optimal method of Theorem 8 with μ = γ and L_φ = M_{ρ,γ} applied to problem (49) finds an (ε_p, ε_o)-primal solution of (1) in at most

  Ñ_p(t) := [ 2√2 D_X ( √(L_f/ε_o) + 2‖A‖√(t/(ε_p ε_o)) ) + 1 ] max{1, log(ε_o/(tε_p))}  (54)

iterations, where D_X is defined in Theorem 15.

Proof. Assume that ρ and γ are given by (53) for some t ≥ ‖λ̄_γ‖. Let δ := min{ε_o/4, ε_p t/2} and let x̄ ∈ X be a δ-approximate solution of (49), i.e., Ψ_{ρ,γ}(x̄) − Ψ_{ρ,γ}* ≤ δ. Then, Proposition 13 with f = f_γ, Ψ_ρ = Ψ_{ρ,γ} and ρ = ρ_p(t), together with the assumption that t ≥ ‖λ̄_γ‖, imply that

  d_{K*}(A(x̄)) ≤ (2/ρ_p(t))‖λ̄_γ‖ + √(2δ/ρ_p(t)) ≤ (ε_p/(4t))·2t + √( 2(ε_p t/2)(ε_p/(4t)) ) = ε_p/2 + ε_p/2 = ε_p

and f_γ(x̄) − f_γ* ≤ δ ≤ ε_o/4. The latter inequality together with (51) and (53) in turn imply that

  f(x̄) − f* ≤ f_γ(x̄) − f* = [f_γ(x̄) − f_γ*] + [f_γ* − f*] ≤ ε_o/4 + ε_o/2 < ε_o.

We have thus proved that x̄ is an (ε_p, ε_o)-primal solution of (1). Now, using Theorem 8 with φ = Ψ_{ρ,γ}, μ = γ and ε = δ, and noting that the definition of γ in (53) and the definition of δ imply that

  Q := μ‖x_0^sd − x*‖²/(2ε) = γ‖x_0^sd − x*‖²/(2δ) ≤ γD_X²/(2δ) = ε_o/(2 min{ε_o/4, ε_p t/2}) = max{2, ε_o/(ε_p t)},

we conclude that the number of iterations performed by Nesterov's optimal method (with the restarting feature) for finding a δ-approximate solution x̄ as above is bounded by

  √(8M_{ρ,γ}/γ) max{1, log Q} ≤ √(8M_{ρ,γ}/γ) max{1, log(ε_o/(tε_p))}.

Now, using relations (50) and (53) and some trivial majorization, we easily see that the latter bound is majorized by (54).

Note that the complexity bound (54) derived in Theorem 23 is guaranteed only under the assumption that t ≥ ‖λ_γ*‖, where λ_γ* is the minimum norm Lagrange multiplier for (48). Since the bound (54) is a monotonically increasing function of t, the ideal (theoretical) choice of t would be to set t = ‖λ_γ*‖. Without assuming any knowledge of this Lagrange multiplier, the following result shows that a guess-and-check procedure similar to Search Procedure 1 still has an O(Ñ_p(‖λ_γ*‖)) iteration-complexity bound, where Ñ_p(·) is defined in (54). Before establishing the iteration-complexity of this guess-and-check procedure, we state the following technical result, which substantially generalizes the one stated in Lemma 17. The proof of this result is given in the Appendix.

Proposition 24 For some positive integer L, let positive scalars p_1, p_2, …, p_L be given. Then, there exists a constant C = C(p_1, …, p_L) such that for any positive scalars β_0, β_1, …, β_L, ν, and t, we have

  Σ_{k=0}^K ( β_0 + Σ_{l=1}^L β_l t_k^{p_l} ) max{1, log(ν/t_k)} ≤ C ( β_0 + Σ_{l=1}^L β_l t^{p_l} ) max{1, log(ν/t)},  (55)

where

  K := max{ 0, ⌈log(t/t_0)⌉ },  t_0 := min_{1≤l≤L} [ max(β_0, 1)/β_l ]^{1/p_l},  t_k = t_0 2^k, k = 1, …, K.  (56)

In particular, if ν = t, then (55) implies that

  Σ_{k=0}^K ( β_0 + Σ_{l=1}^L β_l t_k^{p_l} ) ≤ C ( β_0 + Σ_{l=1}^L β_l t^{p_l} ).  (57)

We are now ready to discuss the iteration-complexity bound of the above-mentioned guess-and-check procedure.
Corollary 25 Let λ_γ* be the minimum norm Lagrange multiplier for (48). Consider the guess-and-check procedure similar to Search Procedure 1 where the functions N_p and ρ_p are replaced by the functions Ñ_p and ρ_p defined in Theorem 23, δ is set to δ_k := min{ε_o/4, ε_p t_k/2}, and t_0 is set to [max(β_0, 1)/β_1]² with

  β_0 = 2√2 D_X √(L_f/ε_o) + 1,  β_1 = 4√2 D_X ‖A‖ / √(ε_p ε_o).  (58)

Then, the overall number of iterations of this guess-and-check procedure for obtaining an (ε_p, ε_o)-primal solution of (1) is bounded by O(Ñ_p(‖λ_γ*‖)), where Ñ_p(·) is defined in (54).

Proof. In view of Theorem 23, the iteration count k in Search Procedure 1 cannot exceed K := max{0, ⌈log(‖λ_γ*‖/t_0)⌉}, and hence its overall number of inner (i.e., Nesterov's optimal method) iterations is bounded by

  Σ_{k=0}^K Ñ_p(t_k) = Σ_{k=0}^K ( β_0 + β_1 t_k^{1/2} ) max{1, log(ν/t_k)},

where β_0 and β_1 are defined in (58) and ν := ε_o/ε_p. The result now follows from the definition of t_0 and Proposition 24 with L = 1, p_1 = 1/2, and t = ‖λ_γ*‖.

It is interesting to compare the two functions N_p(t) and Ñ_p(t) defined in Theorems 15 and 23, respectively. Indeed, both functions (up to a constant factor) are comparable when r_t := ε_o/(tε_p) = O(1). Otherwise, if r_t is large and we further assume that the term involving t dominates the other terms in the right-hand side of (54), i.e.,

  2√2 D_X √(L_f/ε_o) + 1 = O( 4√2 D_X ‖A‖ √(t/(ε_p ε_o)) ),

then

  Ñ_p(t)/N_p(t) ≤ O(1) log(r_t)/√(r_t)

for some universal constant O(1). Hence, in the latter case, we conclude that the function Ñ_p(t) is asymptotically smaller than the function N_p(t).

4.2.2 Computation of a primal-dual approximate solution

In this subsection, we derive the iteration-complexity of computing a primal-dual approximate solution of (1) by applying the quadratic penalty method to the perturbation problem (48).
We will show that the resulting approach has a substantially better iteration-complexity than the one discussed in Subsection 4.1.2, which consists of applying the quadratic penalty method directly to the original problem (1).

Theorem 26 Let λ̄_γ be an arbitrary Lagrange multiplier for (48) and let (ε_p, ε_d) ∈ IR_++ × IR_++ be given. Also assume that

  ρ = ρ_pd(t) := (1/ε_p)( t + ε_d/(√32 ‖A‖) ),  γ = ε_d/(2D_X),  (59)

for some t ≥ ‖λ̄_γ‖. Then, the variant of Nesterov's optimal method of Theorem 8 with μ = γ and L_φ = M_{ρ,γ} applied to problem (49) finds an (ε_p, ε_d)-primal-dual solution of (1) in at most

  Ñ_pd(t) := 8S(t) √(2 log(2S(t)))  (60)

iterations, where D_X is defined in Theorem 15 and

  S(t) := [ 2D_X ( L_f/ε_d + ‖A‖/(√32 ε_p) ) ]^{1/2} + [ 2D_X/(ε_p ε_d) ]^{1/2} ‖A‖ t^{1/2}.  (61)

Proof. Define δ := ε_d²/(32M_{ρ,γ}) and let x̄ ∈ X be a δ-approximate solution of (49), i.e., Ψ_{ρ,γ}(x̄) − Ψ_{ρ,γ}* ≤ δ. It then follows from Proposition 19 with Ψ_ρ = Ψ_{ρ,γ}, f = f_γ, and M_ρ = M_{ρ,γ} that the pair (x⁺, λ) defined as x⁺ := Π_X(x̄ − ∇Ψ_{ρ,γ}(x̄)/M_{ρ,γ}) and λ := ρ[A(x⁺) − Π_{K*}(A(x⁺))] satisfies

  ∇f_γ(x⁺) + (A′)*λ ∈ −N_X(x⁺) + B(2√(2M_{ρ,γ} δ)) = −N_X(x⁺) + B(ε_d/2).

This together with the fact that γ‖x⁺ − x_0‖ ≤ γD_X = ε_d/2 then imply that

  ∇f(x⁺) + (A′)*λ = [∇f_γ(x⁺) + (A′)*λ] − γ(x⁺ − x_0) ∈ −N_X(x⁺) + B(ε_d/2) + B(ε_d/2) = −N_X(x⁺) + B(ε_d).

Moreover, noting that δ := ε_d²/(32M_{ρ,γ}) ≤ ε_d²/(32ρ‖A‖²), we conclude that Proposition 19 also implies that

  d_{K*}(A(x⁺)) ≤ (1/ρ)‖λ̄_γ‖ + √(δ/ρ) ≤ (1/ρ)( ‖λ̄_γ‖ + ε_d/(√32 ‖A‖) ) ≤ ε_p,

where the last inequality is due to (59) and the assumption that t ≥ ‖λ̄_γ‖. We have thus shown that (x⁺, λ) is an (ε_p, ε_d)-primal-dual solution of (1). Now, using Theorem 8 with φ = Ψ_{ρ,γ}, μ = γ and ε = δ, and noting that the definition of δ and the definition of γ in (59) imply that

  Q := μ‖x_0^sd − x*‖²/(2ε) = γ‖x_0^sd − x*‖²/(2δ) ≤ γD_X²/(2δ) = γD_X²/( ε_d²/(16M_{ρ,γ}) ) = 4M_{ρ,γ}/γ,

we conclude that the number of iterations performed by Nesterov's optimal method (with the restarting feature) for finding a δ-approximate solution x̄ as above is bounded by

  √( 8(M_{ρ,γ}/γ) log Q ) ≤ √( 8(M_{ρ,γ}/γ) log(4M_{ρ,γ}/γ) ).  (62)

Now, using (50) and the definitions of ρ and γ in (59), we easily see that the latter bound is majorized by (60).

Note that the complexity bound (60) derived in Theorem 26 is guaranteed only under the assumption that t ≥ ‖λ_γ*‖, where λ_γ* is the minimum norm Lagrange multiplier for (48). Since the bound (60) is a monotonically increasing function of t, the ideal (theoretical) choice of t would be to set t = ‖λ_γ*‖.
Without assuming any knowledge of this Lagrange multiplier, the following result shows that a guess-and-check procedure similar to Search Procedure 1 still has an O(Ñ_pd(‖λ_γ*‖)) iteration-complexity bound, where Ñ_pd(·) is defined in (60).

Corollary 27 Let λ_γ* be the minimum norm Lagrange multiplier for (48). Consider the guess-and-check procedure similar to Search Procedure 1 where the functions N_p and ρ_p are replaced by the functions Ñ_pd and ρ_pd defined in Theorem 26, Nesterov's optimal method is replaced by its variant of Theorem 8 with μ = γ and L_φ = M_{ρ,γ} (see step 2 of Search Procedure 1), δ is set to ε_d²/(32M_{ρ,γ}), and t_0 is set to [max(β_0, 1)/β_1]² with

  β_0 = 8[ 2D_X ( L_f/ε_d + ‖A‖/(√32 ε_p) ) ]^{1/2} + 1,  β_1 = 8[ 2D_X/(ε_p ε_d) ]^{1/2} ‖A‖.  (63)

Then, the overall number of iterations of this guess-and-check procedure for obtaining an (ε_p, ε_d)-primal-dual solution of (1) is bounded by O(Ñ_pd(‖λ_γ*‖)), where Ñ_pd(·) is defined in (60).

Proof. In view of Theorem 26, the iteration count k of the above guess-and-check procedure (see Search Procedure 1) for obtaining an (ε_p, ε_d)-primal-dual solution of (1) cannot exceed K := max{0, ⌈log(‖λ_γ*‖/t_0)⌉}, and hence its overall number of inner (i.e., Nesterov's optimal method) iterations is bounded by

  Σ_{k=0}^K Ñ_pd(t_k) = Σ_{k=0}^K 8S(t_k) √(2 log(2S(t_k))) ≤ √(2 log(2S(t_K))) Σ_{k=0}^K 8S(t_k).  (64)

Now, the definition of t_0 and relation (40) with p = 1/2 and t = ‖λ_γ*‖ imply that

  Σ_{k=0}^K 8S(t_k) ≤ Σ_{k=0}^K ( β_0 + β_1 t_k^{1/2} ) = O( β_0 + β_1 ‖λ_γ*‖^{1/2} ) = O( 8S(‖λ_γ*‖) ),

where β_0 and β_1 are defined in (63). Moreover, using the fact that t_K ≤ 2‖λ_γ*‖, we easily see that √(2 log(2S(t_K))) = O( √(2 log(2S(‖λ_γ*‖))) ). Substituting the last two bounds into (64), we then conclude that Σ_{k=0}^K Ñ_pd(t_k) = O(Ñ_pd(‖λ_γ*‖)), and hence that the corollary holds.

It is interesting to compare the functions N_pd(t) and Ñ_pd(t) defined in Theorems 20 and 26, respectively. It follows from (47), (60), and (61) that

  Ñ_pd(t)/N_pd(t) ≤ O(1) · 8S(t)√(2 log(2S(t)))/(S(t))² ≤ O(1) √(log(2S(t)))/S(t),  (65)

where O(1) denotes an absolute constant. Hence, when S(t) is large, the bound Ñ_pd(t) can be substantially smaller than the bound N_pd(t). Note that we cannot use the previous observation to compare the iteration-complexity of Corollary 21 with that obtained in Corollary 27, since the first one is expressed in terms of ‖λ*‖ and the latter in terms of ‖λ_γ*‖. However, if ‖λ_γ*‖ = O(‖λ*‖), then it can be easily seen that (65) implies that

  Ñ_pd(‖λ_γ*‖)/N_pd(‖λ*‖) ≤ O(1) √(log(2S(‖λ*‖)))/S(‖λ*‖).

Hence, the first complexity is better than the second one whenever ‖λ_γ*‖ = O(‖λ*‖) and S(‖λ*‖) is sufficiently large.
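For intuition, the γ-perturbation makes Ψ_{ρ,γ} γ-strongly convex, which is what the restarted variants above exploit to obtain the logarithmic factors. The generic scheme below (a standard constant-momentum accelerated method for an L-smooth, μ-strongly convex function; it is a textbook sketch, not the paper's exact variant of Theorem 8) shows the shape of such an inner solver:

```python
import numpy as np

# Generic accelerated gradient method for an L-smooth, mu-strongly convex
# objective, using the classical constant momentum (sqrt(kappa)-1)/(sqrt(kappa)+1).
# This is an illustrative stand-in for the strongly convex inner solver.
def accel_strongly_convex(grad, x0, L, mu, iters):
    kappa = L / mu
    theta = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    x, x_prev = x0.copy(), x0.copy()
    for _ in range(iters):
        y = x + theta * (x - x_prev)   # momentum extrapolation
        x_prev = x
        x = y - grad(y) / L            # gradient step at the extrapolated point
    return x
```

With condition number κ = L/μ, such a scheme contracts the error linearly at a rate of roughly 1 − 1/√κ per iteration, which is the source of the √(M_{ρ,γ}/γ)·log(·) form of the bounds in this subsection.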
The following result describes an alternative way of choosing the penalty parameter ρ in subproblem (49) as a function of t, one that requires t ≥ ‖λ̄‖ instead of t ≥ ‖λ̄_γ‖.
Theorem 28 Let λ̄ be an arbitrary Lagrange multiplier for (1) and let (ε_p, ε_d) ∈ IR_++ × IR_++ be given. Assume that

  ρ = ρ̂_pd(t) := [ ( √(ε_d D_X/2) + √(ε_d D_X/2 + 4α(t)ε_p) ) / (2ε_p) ]²,  γ = ε_d/(2D_X),  (66)

where α(t) := t + ε_d/(√32 ‖A‖) for some t ≥ ‖λ̄‖. Then, the variant of Nesterov's optimal method of Theorem 8 with μ = γ and L_φ = M_{ρ,γ} applied to problem (49) finds an (ε_p, ε_d)-primal-dual solution of (1) in at most

  N̂_pd(t) := 8Ŝ(t) √(2 log(2Ŝ(t)))  (67)

iterations, where D_X is defined in Theorem 15 and

  Ŝ(t) := √(2L_f D_X/ε_d) + ‖A‖ ( D_X/ε_p + √(2tD_X/(ε_p ε_d)) ) + √( ‖A‖ D_X/(√8 ε_p) ).  (68)

Proof. Define δ := ε_d²/(32M_{ρ,γ}) and let x̄ ∈ X be a δ-approximate solution of (49), i.e., Ψ_{ρ,γ}(x̄) − Ψ_{ρ,γ}* ≤ δ. As shown in the proof of Theorem 26, the pair (x⁺, λ) defined as x⁺ := Π_X(x̄ − ∇Ψ_{ρ,γ}(x̄)/M_{ρ,γ}) and λ := ρ[A(x⁺) − Π_{K*}(A(x⁺))] satisfies ∇f(x⁺) + (A′)*λ ∈ −N_X(x⁺) + B(ε_d). Moreover, it follows from Lemma 5(a) with φ = Ψ_{ρ,γ} and τ = M_{ρ,γ} that Ψ_{ρ,γ}(x⁺) ≤ Ψ_{ρ,γ}(x̄), and hence that Ψ_{ρ,γ}(x⁺) − Ψ_{ρ,γ}* ≤ Ψ_{ρ,γ}(x̄) − Ψ_{ρ,γ}* ≤ δ. This observation together with (32), (49), (52) and (66) then imply that

  Ψ_ρ(x⁺) − Ψ_ρ* = Ψ_{ρ,γ}(x⁺) − (γ/2)‖x⁺ − x_0‖² − Ψ_ρ* ≤ [Ψ_{ρ,γ}(x⁺) − Ψ_{ρ,γ}*] + [Ψ_{ρ,γ}* − Ψ_ρ*] ≤ δ + γD_X²/2 ≤ ε_d²/(32ρ‖A‖²) + ε_d D_X/2,

where the last inequality follows from the definition of δ and the fact that M_{ρ,γ} ≥ ρ‖A‖². The above inequality together with Proposition 13, the assumption that t ≥ ‖λ̄‖, relation (66), and Lemma 14 with α_0 = ε_p, α_1 = ε_d D_X/2 and α_2 = t + ε_d/(√32 ‖A‖) =: α(t) then imply that

  d_{K*}(A(x⁺)) ≤ (1/ρ)‖λ̄‖ + √( ε_d²/(32ρ²‖A‖²) + D_X ε_d/(2ρ) ) ≤ (1/ρ)( ‖λ̄‖ + ε_d/(√32 ‖A‖) ) + √( D_X ε_d/(2ρ) ) ≤ α(t)/ρ + √( D_X ε_d/(2ρ) ) ≤ ε_p.

We have thus shown that (x⁺, λ) is an (ε_p, ε_d)-primal-dual solution of (1). Now, using Theorem 8 with φ = Ψ_{ρ,γ}, μ = γ and ε = δ, and noting that the definition of δ and the definition of γ in (66) imply that

  Q := μ‖x_0^sd − x*‖²/(2ε) = γ‖x_0^sd − x*‖²/(2δ) ≤ γD_X²/(2δ) = γD_X²/( ε_d²/(16M_{ρ,γ}) ) = 4M_{ρ,γ}/γ,

we conclude that the number of iterations performed by Nesterov's optimal method (with the restarting feature) for finding a δ-approximate solution x̄ as above is bounded by (62).
Since relations (50) and (66) and the definition of α(t) imply that

  √(M_{ρ,γ}/γ) = √( (L_f + ρ‖A‖² + γ)/γ ) ≤ √(L_f/γ) + ‖A‖√(ρ/γ) + 1 = √(2L_f D_X/ε_d) + ‖A‖√(ρ/γ) + 1

and

  √(ρ/γ) = √(2D_X/ε_d) · ( √(ε_d D_X/2) + √(ε_d D_X/2 + 4α(t)ε_p) )/(2ε_p) ≤ D_X/ε_p + √( 2D_X α(t)/(ε_p ε_d) ) ≤ D_X/ε_p + √( 2tD_X/(ε_p ε_d) ) + √( D_X/(√8 ε_p ‖A‖) ),

it follows that (62) can be bounded by (67).

The following result states the iteration-complexity of a guess-and-check procedure based on Theorem 28. Its proof is based on arguments similar to those used in the proof of Corollary 27.

Corollary 29 Let λ* be the minimum norm Lagrange multiplier for (1). Consider the guess-and-check procedure similar to Search Procedure 1 where the functions N_p and ρ_p are replaced by the functions N̂_pd and ρ̂_pd defined in Theorem 28, Nesterov's optimal method is replaced by its variant of Theorem 8 with μ = γ and L_φ = M_{ρ,γ} (see step 2 of Search Procedure 1), δ is set to ε_d²/(32M_{ρ,γ}), and t_0 is set to [max(β_0, 1)/β_1]² with

  β_0 = 8( √(2L_f D_X/ε_d) + ‖A‖ D_X/ε_p + √(‖A‖ D_X/(√8 ε_p)) ) + 1,  β_1 = 8‖A‖ √(2D_X/(ε_p ε_d)).

Then, the overall number of iterations of this guess-and-check procedure for obtaining an (ε_p, ε_d)-primal-dual solution of (1) is bounded by O(N̂_pd(‖λ*‖)), where N̂_pd(·) is defined in (67).

It is interesting to compare the functions N_pd(t) and N̂_pd(t) defined in Theorems 20 and 28, respectively. When the second term in the right-hand side of (68) is dominated by the other terms, i.e., when

  ‖A‖ D_X/ε_p = O( ‖A‖ t^{1/2} D_X^{1/2}/√(ε_p ε_d) + √(L_f D_X/ε_d) ),  (69)

then it can be easily seen that N̂_pd(t) ≤ O(1) Ñ_pd(t), and hence that

  N̂_pd(t)/N_pd(t) ≤ O(1) Ñ_pd(t)/N_pd(t) ≤ O(1) √(log(2S(t)))/S(t).

Hence, if (69) holds and S(t) is large, then N̂_pd(t) can be considerably smaller than N_pd(t). It is also interesting to observe that when ε_p = ε_d = ε, the dependence on ε of the function N̂_pd is O((1/ε)√(log(1/ε))) while that of N_pd is O(1/ε²).

5 The exact penalty method

The goal of this section is to establish the iteration-complexity (in terms of Nesterov's optimal method iterations) of first-order exact penalty methods for computing an ε_p-primal solution of (1). It consists of two subsections.
The first one considers the complexity of a first-order exact penalty method applied directly to the original problem (1), while the second one deals with the complexity of a first-order exact penalty method applied to the perturbation problem (48) for a suitably chosen perturbation parameter γ.
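To contrast the two penalties studied in this paper: the quadratic penalty uses (ρ/2)d_{K*}(A(x))² and drives infeasibility to zero only as ρ grows, while the exact penalty uses the nonsmooth term ρ·d_{K*}(A(x)) and, for convex problems admitting a Lagrange multiplier λ*, recovers exact solutions already for finite ρ > ‖λ*‖. A minimal sketch of the exact penalty objective (my own helper, specialized to K* = R^m_+ and Euclidean norms):

```python
import numpy as np

# Exact (nonsmooth) penalty for the constraint A x + b >= 0:
#   f(x) + rho * || min(A x + b, 0) ||  =  f(x) + rho * dist(A x + b, R^m_+).
# Unlike the squared penalty, a single finite rho larger than the multiplier
# norm already makes the penalized minimizers feasible for the original problem.
def exact_penalty(f, A, b, rho):
    def value(x):
        viol = np.minimum(A @ x + b, 0.0)         # componentwise violation
        return f(x) + rho * np.linalg.norm(viol)  # dist_{K*}(A x + b) = ||viol||
    return value
```

For example, minimizing f(x) = x subject to x ≥ 1 has multiplier λ* = 1; any ρ > 1 places the minimizer of the penalized function exactly at the constrained solution x = 1.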
Duality (Continued) Recall, the general primal problem is min f ( x), xx g( x) 0 n m where X R, f : X R, g : XR ( X). he Lagrangian is a function L: XR R m defined by L( xλ, ) f ( x) λ g( x) Duality (Continued)
More informationNear-Potential Games: Geometry and Dynamics
Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo January 29, 2012 Abstract Potential games are a special class of games for which many adaptive user dynamics
More informationContraction Methods for Convex Optimization and Monotone Variational Inequalities No.16
XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of
More information4TE3/6TE3. Algorithms for. Continuous Optimization
4TE3/6TE3 Algorithms for Continuous Optimization (Algorithms for Constrained Nonlinear Optimization Problems) Tamás TERLAKY Computing and Software McMaster University Hamilton, November 2005 terlaky@mcmaster.ca
More informationNumerical Optimization
Constrained Optimization Computer Science and Automation Indian Institute of Science Bangalore 560 012, India. NPTEL Course on Constrained Optimization Constrained Optimization Problem: min h j (x) 0,
More information3.10 Lagrangian relaxation
3.10 Lagrangian relaxation Consider a generic ILP problem min {c t x : Ax b, Dx d, x Z n } with integer coefficients. Suppose Dx d are the complicating constraints. Often the linear relaxation and the
More informationInverse problems Total Variation Regularization Mark van Kraaij Casa seminar 23 May 2007 Technische Universiteit Eindh ove n University of Technology
Inverse problems Total Variation Regularization Mark van Kraaij Casa seminar 23 May 27 Introduction Fredholm first kind integral equation of convolution type in one space dimension: g(x) = 1 k(x x )f(x
More informationInexact alternating projections on nonconvex sets
Inexact alternating projections on nonconvex sets D. Drusvyatskiy A.S. Lewis November 3, 2018 Dedicated to our friend, colleague, and inspiration, Alex Ioffe, on the occasion of his 80th birthday. Abstract
More informationThe Proximal Gradient Method
Chapter 10 The Proximal Gradient Method Underlying Space: In this chapter, with the exception of Section 10.9, E is a Euclidean space, meaning a finite dimensional space endowed with an inner product,
More informationOn Penalty and Gap Function Methods for Bilevel Equilibrium Problems
On Penalty and Gap Function Methods for Bilevel Equilibrium Problems Bui Van Dinh 1 and Le Dung Muu 2 1 Faculty of Information Technology, Le Quy Don Technical University, Hanoi, Vietnam 2 Institute of
More informationOptimization over Sparse Symmetric Sets via a Nonmonotone Projected Gradient Method
Optimization over Sparse Symmetric Sets via a Nonmonotone Projected Gradient Method Zhaosong Lu November 21, 2015 Abstract We consider the problem of minimizing a Lipschitz dierentiable function over a
More informationProximal-like contraction methods for monotone variational inequalities in a unified framework
Proximal-like contraction methods for monotone variational inequalities in a unified framework Bingsheng He 1 Li-Zhi Liao 2 Xiang Wang Department of Mathematics, Nanjing University, Nanjing, 210093, China
More informationPrimal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization
Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Roger Behling a, Clovis Gonzaga b and Gabriel Haeser c March 21, 2013 a Department
More informationKaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization
Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä New Proximal Bundle Method for Nonsmooth DC Optimization TUCS Technical Report No 1130, February 2015 New Proximal Bundle Method for Nonsmooth
More informationSome Properties of the Augmented Lagrangian in Cone Constrained Optimization
MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented
More informationIntroduction to Alternating Direction Method of Multipliers
Introduction to Alternating Direction Method of Multipliers Yale Chang Machine Learning Group Meeting September 29, 2016 Yale Chang (Machine Learning Group Meeting) Introduction to Alternating Direction
More informationCONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS
CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS A Dissertation Submitted For The Award of the Degree of Master of Philosophy in Mathematics Neelam Patel School of Mathematics
More informationACCELERATED BUNDLE LEVEL TYPE METHODS FOR LARGE SCALE CONVEX OPTIMIZATION
ACCELERATED BUNDLE LEVEL TYPE METHODS FOR LARGE SCALE CONVEX OPTIMIZATION By WEI ZHANG A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
More informationOptimality Conditions for Constrained Optimization
72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)
More informationLocal strong convexity and local Lipschitz continuity of the gradient of convex functions
Local strong convexity and local Lipschitz continuity of the gradient of convex functions R. Goebel and R.T. Rockafellar May 23, 2007 Abstract. Given a pair of convex conjugate functions f and f, we investigate
More informationConvex Optimization Methods for Dimension Reduction and Coefficient Estimation in Multivariate Linear Regression
Convex Optimization Methods for Dimension Reduction and Coefficient Estimation in Multivariate Linear Regression Zhaosong Lu Renato D. C. Monteiro Ming Yuan January 10, 2008 Abstract In this paper, we
More informationOptimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30
Optimization Escuela de Ingeniería Informática de Oviedo (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Unconstrained optimization Outline 1 Unconstrained optimization 2 Constrained
More informationConvex Optimization Theory. Athena Scientific, Supplementary Chapter 6 on Convex Optimization Algorithms
Convex Optimization Theory Athena Scientific, 2009 by Dimitri P. Bertsekas Massachusetts Institute of Technology Supplementary Chapter 6 on Convex Optimization Algorithms This chapter aims to supplement
More informationAn Infeasible Interior Proximal Method for Convex Programming Problems with Linear Constraints 1
An Infeasible Interior Proximal Method for Convex Programming Problems with Linear Constraints 1 Nobuo Yamashita 2, Christian Kanzow 3, Tomoyui Morimoto 2, and Masao Fuushima 2 2 Department of Applied
More informationDuality Theory of Constrained Optimization
Duality Theory of Constrained Optimization Robert M. Freund April, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 2 1 The Practical Importance of Duality Duality is pervasive
More informationHölder Metric Subregularity with Applications to Proximal Point Method
Hölder Metric Subregularity with Applications to Proximal Point Method GUOYIN LI and BORIS S. MORDUKHOVICH Revised Version: October 1, 01 Abstract This paper is mainly devoted to the study and applications
More informationsubject to (x 2)(x 4) u,
Exercises Basic definitions 5.1 A simple example. Consider the optimization problem with variable x R. minimize x 2 + 1 subject to (x 2)(x 4) 0, (a) Analysis of primal problem. Give the feasible set, the
More informationSequential Unconstrained Minimization: A Survey
Sequential Unconstrained Minimization: A Survey Charles L. Byrne February 21, 2013 Abstract The problem is to minimize a function f : X (, ], over a non-empty subset C of X, where X is an arbitrary set.
More informationINTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE
INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE HANDE Y. BENSON, ARUN SEN, AND DAVID F. SHANNO Abstract. In this paper, we present global
More informationConvex Optimization M2
Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization
More informationLocally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem
56 Chapter 7 Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem Recall that C(X) is not a normed linear space when X is not compact. On the other hand we could use semi
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More informationA Unified Approach to Proximal Algorithms using Bregman Distance
A Unified Approach to Proximal Algorithms using Bregman Distance Yi Zhou a,, Yingbin Liang a, Lixin Shen b a Department of Electrical Engineering and Computer Science, Syracuse University b Department
More informationLecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem
Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R
More informationPerturbed Proximal Primal Dual Algorithm for Nonconvex Nonsmooth Optimization
Noname manuscript No. (will be inserted by the editor Perturbed Proximal Primal Dual Algorithm for Nonconvex Nonsmooth Optimization Davood Hajinezhad and Mingyi Hong Received: date / Accepted: date Abstract
More informationJoint work with Nguyen Hoang (Univ. Concepción, Chile) Padova, Italy, May 2018
EXTENDED EULER-LAGRANGE AND HAMILTONIAN CONDITIONS IN OPTIMAL CONTROL OF SWEEPING PROCESSES WITH CONTROLLED MOVING SETS BORIS MORDUKHOVICH Wayne State University Talk given at the conference Optimization,
More informationUnconstrained minimization of smooth functions
Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and
More informationComplexity bounds for primal-dual methods minimizing the model of objective function
Complexity bounds for primal-dual methods minimizing the model of objective function Yu. Nesterov July 4, 06 Abstract We provide Frank-Wolfe ( Conditional Gradients method with a convergence analysis allowing
More informationOptimization for Machine Learning
Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html
More informationOn the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method
Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical
More informationE5295/5B5749 Convex optimization with engineering applications. Lecture 5. Convex programming and semidefinite programming
E5295/5B5749 Convex optimization with engineering applications Lecture 5 Convex programming and semidefinite programming A. Forsgren, KTH 1 Lecture 5 Convex optimization 2006/2007 Convex quadratic program
More informationEstimate sequence methods: extensions and approximations
Estimate sequence methods: extensions and approximations Michel Baes August 11, 009 Abstract The approach of estimate sequence offers an interesting rereading of a number of accelerating schemes proposed
More informationLargest dual ellipsoids inscribed in dual cones
Largest dual ellipsoids inscribed in dual cones M. J. Todd June 23, 2005 Abstract Suppose x and s lie in the interiors of a cone K and its dual K respectively. We seek dual ellipsoidal norms such that
More informationStructural and Multidisciplinary Optimization. P. Duysinx and P. Tossings
Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be
More information