Iteration-complexity of first-order penalty methods for convex programming


Guanghui Lan    Renato D.C. Monteiro

July 24, 2008

(The work of both authors was partially supported by NSF Grants CCF and CCF and ONR Grant N. School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, glan@isye.gatech.edu. School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, monteiro@isye.gatech.edu.)

Abstract

This paper considers a special but broad class of convex programming (CP) problems whose feasible region is a simple compact convex set intersected with the inverse image of a closed convex cone under an affine transformation. We study two first-order penalty methods for solving the above class of problems, namely: the quadratic penalty method and the exact penalty method. In addition to one or two gradient evaluations, an iteration of these methods requires one or two projections onto the simple convex set. We establish iteration-complexity bounds for these methods to obtain two types of near optimal solutions, namely: near primal and near primal-dual optimal solutions. Finally, we present variants, with possibly better iteration-complexity bounds than the aforementioned methods, which consist of applying penalty-based methods to the perturbed problem obtained by adding a suitable perturbation term to the objective function of the original CP problem.

Keywords: Convex programming, quadratic penalty method, exact penalty method, Lagrange multiplier

AMS 2000 subject classification: 90C25, 90C06, 90C22, 49M37

1 Introduction

The basic problem of interest in this paper is the convex programming (CP) problem

    f^* := inf{ f(x) : A(x) ∈ K^*, x ∈ X },    (1)

where f : X → R is a convex function with Lipschitz continuous gradient, X ⊆ R^n is a sufficiently simple closed convex set, A : R^n → R^m is an affine function, and K^* denotes the dual cone of a closed convex cone K ⊆ R^m, i.e., K^* := {s ∈ R^m : ⟨s, x⟩ ≥ 0 for all x ∈ K}. For the case where the feasible region consists only of the set X, or equivalently A ≡ 0, Nesterov ([2, 4]) developed a method which finds a point x ∈ X such that f(x) − f^* ≤ ε in at most O(ε^{−1/2}) iterations. Moreover, each iteration of his method requires one gradient evaluation of f and the computation of two projections onto X. It is shown that his method achieves, uniformly in the dimension, the lower bound on the number of iterations for minimizing convex functions with Lipschitz continuous gradient over a closed convex set.

When A is not identically 0, Nesterov's optimal method can still be applied directly to problem (1), but this approach would require the computation of projections onto the feasible region X ∩ {x : A(x) ∈ K^*}, which for most practical problems is as expensive as solving the original problem itself. In this paper, we are interested in alternative first-order methods for (1) whose iterations consist of, in addition to a couple of gradient evaluations, only projections onto the simple set X. More specifically, we present two penalty-based approaches for solving (1), namely: the quadratic and the exact penalty methods, and establish their iteration-complexity bounds for computing two types of near optimal solutions of (1).

It is well-known that the penalty parameters of the penalization problems for the above penalty-based approaches must be chosen sufficiently large so that near optimal solutions of the penalization problems yield near optimal solutions for the original problem (1). Accordingly, we establish theoretical lower bounds on the penalty parameters that depend not only on the desired solution accuracies but also on the size ‖λ^*‖ of the minimum norm Lagrange multiplier associated with the constraint A(x) ∈ K^*. Theoretically, setting the penalty parameters to their corresponding lower bounds would yield the lowest provable iteration-complexity bounds, and hence would be the best strategy to pursue. But since ‖λ^*‖ is not known a priori, we present alternative penalty-based approaches based on simple guess-and-check procedures for these penalty parameters whose iteration-complexity bounds are of the same order as the approach in which ‖λ^*‖ is known a priori, or equivalently, in which the theoretical lower bound on the penalty parameter is available. Finally, we present variants of the aforementioned methods, with possibly better iteration-complexity bounds, which consist of applying penalty-based methods to the problem obtained by adding a suitable perturbation term to the objective function of the original CP problem.

The paper is organized as follows. In Section 2, we give a more detailed description of problem (1) and definitions of two types of approximate solutions of (1). Section 3 discusses some technical results that will be used in our analysis and reviews two first-order algorithms due to Nesterov [2, 4] for solving special classes of CP problems. In Section 4, we establish the iteration-complexity of quadratic penalty based methods for solving (1). More specifically, we present the iteration-complexity of the quadratic penalty method applied to (1) in Subsection 4.1, and of a variant, which consists of applying the quadratic penalty approach to the perturbed problem obtained by adding a suitable perturbation term to the objective function of (1), in Subsection 4.2. In Section 5, we establish the iteration-complexity of exact penalty based methods for solving (1). More specifically, we present the iteration-complexity of the exact penalty method applied directly to (1) in Subsection 5.1, and of a variant, which consists of applying the exact penalty method to the perturbed problem corresponding to (1), in Subsection 5.2.

1.1 Notation and terminology

We denote the p-dimensional Euclidean space by R^p.
Also, R^p_+ and R^p_{++} denote the nonnegative and the positive orthants of R^p, respectively. In this paper, we use the notation R^p to denote a p-dimensional vector space endowed with an inner product ⟨·, ·⟩. The dual norm ‖·‖^* associated with an arbitrary norm ‖·‖ on R^p is defined as

    ‖s‖^* := max_x { ⟨s, x⟩ : ‖x‖ ≤ 1 },  ∀s ∈ R^p.

Note that if the norm ‖·‖ is the one induced by the inner product, i.e., ‖·‖ = ⟨·, ·⟩^{1/2}, then ‖·‖^* = ‖·‖. Given a closed convex set C ⊆ R^p, we define the distance function d_C : R^p → R to C with respect to ‖·‖ as d_C(u) := min{‖u − c‖ : c ∈ C} for every u ∈ R^p. It is well-known that this minimum is always achieved at some c ∈ C. Moreover, this minimizer is unique whenever ‖·‖ is an inner product norm. In such a case, we denote this unique minimizer by Π_C(u), i.e., Π_C(u) = argmin{‖u − c‖ : c ∈ C} for every u ∈ R^p. The support function of a set C ⊆ R^p is defined as σ_C(u) := sup{⟨u, c⟩ : c ∈ C}.

2 Problem of interest

Consider inner product spaces R^n and R^m endowed with arbitrary norms, both denoted simply by ‖·‖. In this paper, we consider the CP problem (1), where f : X → R is a convex function with L_f-Lipschitz-continuous gradient, i.e.,

    ‖∇f(x̃) − ∇f(x)‖^* ≤ L_f ‖x̃ − x‖,  ∀x, x̃ ∈ X.    (2)

We define the norm of the map A as the operator norm of its linear part A_0 := A(·) − A(0), i.e.,

    ‖A‖ := ‖A_0‖ = max{ ‖A_0(x)‖ : ‖x‖ ≤ 1 } = max{ ‖A(x) − A(0)‖ : ‖x‖ ≤ 1 }.

The Lagrangian dual function and the value function associated with (1) are defined as

    d(λ) := inf{ f(x) + ⟨λ, A(x)⟩ : x ∈ X },  λ ∈ −K,    (3)
    v(u) := inf{ f(x) : A(x) + u ∈ K^*, x ∈ X },  u ∈ R^m.    (4)

We make the following assumptions throughout the paper:

Assumption 1
A.1) the set X is bounded (and hence f^* ∈ R);
A.2) there exists a Lagrange multiplier for (1), i.e., a vector λ^* ∈ −K such that f^* = d(λ^*).

Throughout this paper, we assume that the norm in R^m is induced by the inner product.

First note that x^* is an optimal solution of (1) if, and only if, x^* ∈ X, A(x^*) ∈ K^* and f(x^*) ≤ f^*. This observation leads us to our first definition of a near optimal solution x̄ ∈ X of (1), which essentially requires the primal infeasibility measure d_{K^*}(A(x̄)) and the primal optimality gap [f(x̄) − f^*]_+ to be both small.

Definition 1 For a given pair (ε_p, ε_o) ∈ R_{++} × R_{++}, x̄ ∈ X is called an (ε_p, ε_o)-primal solution for (1) if

    d_{K^*}(A(x̄)) ≤ ε_p  and  f(x̄) − f^* ≤ ε_o.    (5)

One drawback of the above notion of near optimality of x̄ is that it says nothing about the size of [f(x̄) − f^*]_−, i.e., of how far f(x̄) may lie below f^*. We will now show that this quantity can be bounded as [f(x̄) − f^*]_− ≤ ε_p ‖λ^*‖, where λ^* is an arbitrary Lagrange multiplier for (1). We start by stating the following result, whose proof is given in the Appendix.

Proposition 1 Let K ⊆ R^m be a closed convex cone. Then, the following statements hold:

a) d_{K^*} = σ_C, where C := (−K) ∩ B(0, 1) and B(0, 1) := {u ∈ R^m : ‖u‖ ≤ 1};

b) for every u ∈ R^m and λ ∈ −K, we have ⟨u, λ⟩ ≤ ‖λ‖ d_{K^*}(u).

As a consequence of the above result, we obtain the following technical inequality, which will be frequently used in our analysis.

Corollary 2 Let λ^* be a Lagrange multiplier for (1). Then, for every x ∈ X, we have

    f(x) − f^* ≥ −‖λ^*‖ d_{K^*}(A(x)).

Proof. It is well-known that our assumptions imply that v is a convex function such that −λ^* ∈ ∂v(0), and hence that

    v(u) ≥ v(0) − ⟨λ^*, u⟩ ≥ v(0) − ‖λ^*‖ d_{K^*}(u),  ∀u ∈ R^m,

where the second inequality follows from Proposition 1(b) and the fact that λ^* ∈ −K. Now, let x ∈ X be given. Since x is clearly feasible for problem (4) with u = Π_{K^*}(A(x)) − A(x), the definition of v(·) in (4) implies that v(u) ≤ f(x). Hence,

    f(x) − f^* ≥ v(u) − v(0) ≥ −‖λ^*‖ d_{K^*}(u) = −‖λ^*‖ d_{K^*}(A(x)).

Corollary 3 If x̄ ∈ X is an ε_p-feasible solution, i.e., if d_{K^*}(A(x̄)) ≤ ε_p, then f(x̄) − f^* ≥ −ε_p ‖λ^*‖, where λ^* is an arbitrary Lagrange multiplier for (1).

Another way of defining a near optimal solution of (1) is based on the following optimality conditions: x̄ ∈ X is an optimal solution of (1) and λ̄ ∈ −K is a Lagrange multiplier for (1) if, and only if, (x̄, λ̄) = (x^*, λ^*) satisfies

    A(x̄) ∈ K^*,  ⟨λ̄, A(x̄)⟩ = 0,  ∇f(x̄) + (A_0)^* λ̄ ∈ −N_X(x̄),    (6)

where N_X(x̄) := {s ∈ R^n : ⟨s, x − x̄⟩ ≤ 0, ∀x ∈ X} denotes the normal cone of X at x̄. Based on this observation, we can introduce our second definition of a near optimal solution of (1).

Definition 2 For a given pair (ε_p, ε_d) ∈ R_{++} × R_{++}, (x̄, λ̄) ∈ X × (−K) is called an (ε_p, ε_d)-primal-dual solution of (1) if

    d_{K^*}(A(x̄)) ≤ ε_p,  ⟨λ̄, Π_{K^*}(A(x̄))⟩ = 0,    (7)
    ∇f(x̄) + (A_0)^* λ̄ ∈ −N_X(x̄) + B(ε_d),    (8)

where B(η) := {x ∈ R^n : ‖x‖ ≤ η} for every η ≥ 0.

In Sections 4 and 5, we will study the iteration-complexities of well-known penalty methods, namely the quadratic and the exact penalty methods, for computing the two types of near optimal solutions defined in this section.
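To make the two residuals in Definition 1 concrete, the following small sketch (ours, not from the paper) computes the primal infeasibility d_{K^*}(A(x̄)) and the optimality gap for the simplest case where K^* is the nonnegative orthant, so that Π_{K^*} is a componentwise clipping; the affine map data A0 and b, the objective f, the reference value f_star and the tolerances are user-supplied placeholders.

```python
import numpy as np

def primal_residuals(x_bar, f, A0, b, f_star):
    """Return (d_{K*}(A(x_bar)), f(x_bar) - f*) when K* = R^m_+ and A(x) = A0 x + b."""
    u = A0 @ x_bar + b                   # A(x_bar)
    proj = np.maximum(u, 0.0)            # Pi_{K*}(u): projection onto the nonnegative orthant
    infeas = np.linalg.norm(u - proj)    # d_{K*}(A(x_bar))
    gap = f(x_bar) - f_star              # primal optimality gap
    return infeas, gap

def is_primal_solution(x_bar, f, A0, b, f_star, eps_p, eps_o):
    infeas, gap = primal_residuals(x_bar, f, A0, b, f_star)
    return infeas <= eps_p and gap <= eps_o   # the two conditions of Definition 1
```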

3 Technical results

This section discusses some technical results that will be used in our analysis and reviews two first-order algorithms due to Nesterov [2, 4] for solving special classes of CP problems. It consists of three subsections. The first one develops several technical results involving projected gradients. The second subsection reviews Nesterov's optimal method for solving a class of smooth CP problems. The third subsection reviews Nesterov's smooth approximation scheme for solving a class of non-smooth CP problems that can be reformulated as smooth convex-concave saddle point (i.e., min-max) problems.

3.1 Projected gradient and the optimality conditions

In this subsection, we consider an inner product space R^n endowed with the norm associated with its inner product. We consider the CP problem

    φ^* := min_{x ∈ X} φ(x),    (9)

where X ⊆ R^n is a convex set and φ : X → R is a convex function that has L_φ-Lipschitz-continuous gradient over X with respect to the norm ‖·‖. It is well-known that x^* ∈ X is an optimal solution of (9) if and only if ∇φ(x^*) ∈ −N_X(x^*). Moreover, this optimality condition is in turn related to the projected gradient of the function φ over X, defined as follows.

Definition 3 Given a fixed constant τ > 0, we define the projected gradient of φ at x̄ ∈ X with respect to X as (see, for example, [3])

    [∇φ(x̄)]^τ_X := (1/τ) [ x̄ − Π_X(x̄ − τ ∇φ(x̄)) ],    (10)

where Π_X(·) is the projection map onto X defined in terms of the inner product norm (see Subsection 1.1).

The following proposition relates the projected gradient to the aforementioned optimality condition.

Proposition 4 Let x̄ ∈ X be given and define x̄^+ := Π_X(x̄ − τ ∇φ(x̄)). Then, for any given ε ≥ 0, the following statements hold:

a) ‖[∇φ(x̄)]^τ_X‖ ≤ ε if, and only if, ∇φ(x̄) ∈ −N_X(x̄^+) + B(ε);

b) ‖[∇φ(x̄)]^τ_X‖ ≤ ε implies that ∇φ(x̄^+) ∈ −N_X(x̄^+) + B((1 + τ L_φ) ε).

Proof. To simplify notation, define v := [∇φ(x̄)]^τ_X. By (10) and well-known properties of the projection operator Π_X, we have

    v = [∇φ(x̄)]^τ_X  ⟺  x̄ − τv = Π_X(x̄ − τ ∇φ(x̄))
                       ⟺  ⟨(x̄ − τ ∇φ(x̄)) − (x̄ − τv), y − (x̄ − τv)⟩ ≤ 0,  ∀y ∈ X
                       ⟺  ⟨v − ∇φ(x̄), y − x̄^+⟩ ≤ 0,  ∀y ∈ X
                       ⟺  ∇φ(x̄) − v ∈ −N_X(x̄^+),

from which statement a) clearly follows. To show b), assume that ‖v‖ ≤ ε. Then, we have

    ‖x̄^+ − x̄‖ = ‖Π_X(x̄ − τ ∇φ(x̄)) − x̄‖ = ‖(x̄ − τv) − x̄‖ = τ ‖v‖ ≤ τ ε,

which, together with the assumption that φ has L_φ-Lipschitz-continuous gradient, implies that ‖∇φ(x̄^+) − ∇φ(x̄)‖ ≤ τ L_φ ε. Statement b) now follows immediately from the latter conclusion and statement a).

The following lemma summarizes some interesting properties of the projected gradient.

Lemma 5 Let x̄ ∈ X be given and x̄^+ := Π_X(x̄ − τ ∇φ(x̄)). Denoting g(·) := ∇φ(·) and g_X(·) := [∇φ(·)]^τ_X, we have:

a) φ(x̄^+) ≤ φ(x̄) − τ ‖g_X(x̄)‖²/2 for any τ ≤ 1/L_φ;

b) for any x ∈ X, there holds

    ⟨g(x̄) − g_X(x̄), x − x̄^+⟩ ≥ 0;    (11)

in particular, setting x = x̄, we obtain

    ⟨g(x̄) − g_X(x̄), g_X(x̄)⟩ ≥ 0;    (12)

c) for any x ∈ X, there holds

    φ(x̄^+) − φ(x) ≤ (1 + τ L_φ) ‖g_X(x̄)‖ ‖x̄^+ − x‖;    (13)

d) if τ = 1/L_φ in definition (10), then

    φ(x̄) − φ(x^*) ≥ (1/(2L_φ)) ‖g_X(x̄)‖²,  ∀x̄ ∈ X,    (14)

where x^* ∈ Argmin_{x ∈ X} φ(x).

Proof. a) This statement is proved on p. 87 of [3].

b) Noting that x̄^+ = x̄ − τ g_X(x̄) = Π_X(x̄ − τ g(x̄)), it follows from well-known properties of the projection map Π_X that

    ⟨(x̄ − τ g(x̄)) − (x̄ − τ g_X(x̄)), x − (x̄ − τ g_X(x̄))⟩ = ⟨−τ(g(x̄) − g_X(x̄)), x − x̄^+⟩ ≤ 0,  ∀x ∈ X,

which clearly implies statement b).

c) It follows from the convexity of φ(·), (11), the assumption that φ(·) has L_φ-Lipschitz-continuous gradient, and definition (10) that, for every x ∈ X,

    φ(x̄^+) − φ(x) ≤ ⟨g(x̄^+), x̄^+ − x⟩
        = ⟨g(x̄) − g_X(x̄), x̄^+ − x⟩ + ⟨g_X(x̄), x̄^+ − x⟩ + ⟨g(x̄^+) − g(x̄), x̄^+ − x⟩
        ≤ ⟨g_X(x̄), x̄^+ − x⟩ + ⟨g(x̄^+) − g(x̄), x̄^+ − x⟩
        ≤ ⟨g_X(x̄), x̄^+ − x⟩ + L_φ ‖x̄^+ − x̄‖ ‖x̄^+ − x‖
        = ⟨g_X(x̄), x̄^+ − x⟩ + τ L_φ ‖g_X(x̄)‖ ‖x̄^+ − x‖
        ≤ (1 + τ L_φ) ‖g_X(x̄)‖ ‖x̄^+ − x‖.

d) Using the fact that φ(x^*) ≤ φ(x) for all x ∈ X and the assumption that φ(·) has L_φ-Lipschitz-continuous gradient, we conclude that

    φ(x^*) ≤ φ( x̄ − (1/L_φ) g_X(x̄) ) ≤ φ(x̄) − (1/L_φ) ⟨g(x̄), g_X(x̄)⟩ + (1/(2L_φ)) ‖g_X(x̄)‖² ≤ φ(x̄) − (1/(2L_φ)) ‖g_X(x̄)‖²

for any x̄ ∈ X, where the last inequality follows from (12).

3.2 Nesterov's optimal method

In this subsection, we discuss Nesterov's smooth first-order method for solving a class of smooth CP problems. Our problem of interest is still the CP problem (9), which is assumed to satisfy the same assumptions mentioned in Subsection 3.1, except that now we allow ‖·‖ to be an arbitrary norm. Moreover, we assume throughout our discussion that the optimal value φ^* of problem (9) is finite and that its set of optimal solutions is nonempty.

Let h : X → R be a differentiable strongly convex function with modulus σ_h > 0 with respect to ‖·‖, i.e.,

    h(x) ≥ h(x̃) + ⟨∇h(x̃), x − x̃⟩ + (σ_h/2) ‖x − x̃‖²,  ∀x, x̃ ∈ X.    (15)

The Bregman distance d_h : X × X → R associated with h is defined as

    d_h(x; x̃) := h(x) − l_h(x; x̃),  ∀x, x̃ ∈ X,    (16)

where l_h : R^n × X → R is the linear approximation of h defined as

    l_h(x; x̃) := h(x̃) + ⟨∇h(x̃), x − x̃⟩,  ∀(x, x̃) ∈ R^n × X.

We are now ready to state Nesterov's smooth first-order method for solving (9). We use the superscript "sd" for the sequence obtained by taking a steepest-descent-type step and the superscript "ag" (which stands for "aggregated gradient") for the sequence obtained by using all past gradients.

Nesterov's Algorithm:

0) Let x^{sd}_0 = x^{ag}_0 ∈ X be given and set k = 0.

1) Set x_k = (2/(k+2)) x^{ag}_k + (k/(k+2)) x^{sd}_k and compute φ(x_k) and ∇φ(x_k).

2) Compute (x^{sd}_{k+1}, x^{ag}_{k+1}) ∈ X × X as

    x^{sd}_{k+1} ∈ Argmin { l_φ(x; x_k) + (L_φ/2) ‖x − x_k‖² : x ∈ X },    (17)
    x^{ag}_{k+1} = argmin { (L_φ/σ_h) d_h(x; x_0) + Σ_{i=0}^{k} ((i+1)/2) l_φ(x; x_i) : x ∈ X }.    (18)

3) Set k ← k + 1 and go to step 1.

end
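For the Euclidean setup of Corollary 7 below (h(·) = ‖·‖²/2, so σ_h = 1 and both subproblems (17)-(18) reduce to projections onto X), the iteration can be sketched as follows. This is an illustrative transcription rather than the authors' code; the gradient oracle grad_phi, Lipschitz constant L and projector proj_X are assumed to be supplied by the user.

```python
import numpy as np

def nesterov_optimal(grad_phi, proj_X, L, x0, num_iters):
    """Sketch of Nesterov's optimal method (17)-(18) with h = ||.||^2/2 (Euclidean prox)."""
    x_sd = x0.copy()                   # "steepest descent" sequence
    x_ag = x0.copy()                   # "aggregated gradient" sequence
    acc_grad = np.zeros_like(x0)       # running sum of (i+1)/2 * grad(phi(x_i))
    for k in range(num_iters):
        x = 2.0/(k + 2) * x_ag + float(k)/(k + 2) * x_sd   # step 1
        g = grad_phi(x)
        x_sd = proj_X(x - g / L)                            # step 2, eq. (17)
        acc_grad += (k + 1) / 2.0 * g
        x_ag = proj_X(x0 - acc_grad / L)                    # step 2, eq. (18)
    return x_sd

# example usage: minimize ||x - c||^2/2 over the unit box
# c = np.array([2.0, -3.0, 0.5])
# sol = nesterov_optimal(lambda x: x - c, lambda x: np.clip(x, -1, 1),
#                        L=1.0, x0=np.zeros(3), num_iters=100)
```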

The main convergence result established by Nesterov [4] regarding the above algorithm is summarized in the following theorem.

Theorem 6 The sequence {x^{sd}_k} generated by Nesterov's optimal method satisfies

    φ(x^{sd}_k) − φ^* ≤ 4 L_φ d_h(x^*; x^{sd}_0) / ( σ_h k (k + 1) ),  ∀k ≥ 1,

where x^* is an optimal solution of (9). As a consequence, given any ε > 0, an iterate x^{sd}_k ∈ X satisfying φ(x^{sd}_k) − φ^* ≤ ε can be found in no more than

    2 √( d_h(x^*; x^{sd}_0) L_φ / (σ_h ε) )    (19)

iterations.

The following result is an immediate special case of Theorem 6.

Corollary 7 Suppose that ‖·‖ is an inner product norm and h : X → R is chosen as h(·) = ‖·‖²/2 in Nesterov's optimal method. Then, for any ε > 0, an iterate x^{sd}_k ∈ X satisfying φ(x^{sd}_k) − φ^* ≤ ε can be found in no more than

    ‖x^{sd}_0 − x^*‖ √( 2 L_φ / ε )    (20)

iterations, where x^* is an optimal solution of (9).

Proof. If h(x) = ‖x‖²/2, then (16) implies that d_h(x^*; x^{sd}_0) = ‖x^{sd}_0 − x^*‖²/2. The corollary now follows from this bound and relation (19).

Now assume that the objective function φ is strongly convex over X, i.e., for some µ > 0,

    ⟨∇φ(x) − ∇φ(x̃), x − x̃⟩ ≥ µ ‖x − x̃‖²,  ∀x, x̃ ∈ X.    (21)

Nesterov shows in [3] that, under the assumptions of Corollary 7, a variant of his optimal method finds a solution x̄ ∈ X satisfying φ(x̄) − φ^* ≤ ε in no more than

    √( L_φ / µ ) log( L_φ ‖x^{sd}_0 − x^*‖² / ε )    (22)

iterations. The following result gives an iteration-complexity bound for Nesterov's optimal method that replaces the term log( L_φ ‖x^{sd}_0 − x^*‖²/ε ) in (22) with log( µ ‖x^{sd}_0 − x^*‖²/ε ). The resulting iteration-complexity bound is not only sharper but also more suitable for our purposes, since it makes it easier to compare the quality of the different bounds obtained in our analysis of first-order penalty methods.

Theorem 8 Let ε > 0 be given and suppose that the assumptions of Corollary 7 hold and that the function φ is strongly convex with modulus µ. Then, the variant in which we restart Nesterov's optimal method, with proximal function h(·) = ‖·‖²/2, every

    K := ⌈ √( 8 L_φ / µ ) ⌉    (23)

iterations finds a solution x̄ ∈ X satisfying φ(x̄) − φ^* ≤ ε in no more than

    ⌈ √( 8 L_φ / µ ) ⌉ max{ 1, ⌈log Q⌉ }    (24)

iterations, where

    Q := µ ‖x^{sd}_0 − x^*‖² / (2ε)    (25)

and x^* := argmin_{x ∈ X} φ(x).

Proof. Denote D := ‖x^{sd}_0 − x^*‖. First consider the case where Q ≤ 1. Clearly we have ε ≥ µD²/2, which, in view of Corollary 7, implies that the number of iterations is bounded by 2√(L_φ/µ), and hence bounded by (24). We now show that bound (24) holds for the case when Q > 1. Let x_j be the iterate obtained at the end of the j-th restart, i.e., after K iterations of Nesterov's optimal method started from x_{j−1}, where x_0 := x^{sd}_0. By Theorem 6 with h(·) = ‖·‖²/2 and inequality (21), we have

    φ(x_1) − φ(x^*) ≤ 2 L_φ ‖x^{sd}_0 − x^*‖² / K² ≤ 2 L_φ D² / K²,
    φ(x_j) − φ(x^*) ≤ 2 L_φ ‖x_{j−1} − x^*‖² / K² ≤ (4 L_φ / (µ K²)) [ φ(x_{j−1}) − φ(x^*) ],  ∀j ≥ 2.

Using the above relations inductively, we conclude that

    φ(x_j) − φ(x^*) ≤ (4 L_φ)^j D² / ( 2 µ^{j−1} K^{2j} ).

Setting j = ⌈log Q⌉ and observing that

    K^{2j} ≥ [ (8 L_φ/µ)^{1/2} ]^{2j} = 2^j (4 L_φ/µ)^j ≥ 2^{log Q} (4 L_φ/µ)^j ≥ ( µ D²/(2ε) ) (4 L_φ)^j / µ^j = ( D²/(2ε) ) (4 L_φ)^j / µ^{j−1},

we conclude that φ(x_j) − φ(x^*) ≤ ε. Hence, the overall number of iterations is bounded by K ⌈log Q⌉, or equivalently, by (24).

It is interesting to compare the complexity bounds of Corollary 7 and Theorem 8. Indeed, it can be easily seen that there exists a threshold value Q̄ > 0 such that the condition Q ≥ Q̄, where Q is defined in (25), implies that the bound (20) is always greater than or equal to the bound (24). Moreover, as Q goes to infinity, the ratio between (20) and (24) converges to infinity. Hence, when the function φ is strongly convex and Q ≥ Q̄, we will always use the bound (24) in our analysis.
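As an illustration of the restarting scheme of Theorem 8 (again for the Euclidean prox), one can simply wrap the nesterov_optimal sketch given earlier; the helper below is our own sketch, rerunning the method from the last iterate every K = ⌈√(8L/µ)⌉ iterations for about ⌈log₂ Q⌉ rounds, and D is assumed to be a user-supplied bound on ‖x_0 − x^*‖.

```python
import math

def restarted_nesterov(grad_phi, proj_X, L, mu, x0, eps, D):
    """Restart variant of Theorem 8: h = ||.||^2/2, restart every K = ceil(sqrt(8L/mu)) iterations."""
    K = math.ceil(math.sqrt(8.0 * L / mu))        # restart period, eq. (23)
    Q = mu * D**2 / (2.0 * eps)                   # eq. (25)
    rounds = max(1, math.ceil(math.log2(Q))) if Q > 1 else 1
    x = x0
    for _ in range(rounds):
        # K iterations of the earlier nesterov_optimal sketch, started from the last restart point
        x = nesterov_optimal(grad_phi, proj_X, L, x, K)
    return x
```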

3.3 Nesterov's smooth approximation scheme

In this subsection, we discuss Nesterov's smooth approximation scheme for solving a class of non-smooth CP problems which can be reformulated as special smooth convex-concave saddle point (i.e., min-max) problems.

Our problem of interest in this subsection is still problem (9). We assume that X ⊆ R^n is a closed convex set and φ : X → R is a convex function given by

    φ(x) := φ̂(x) + sup{ ⟨Ex, y⟩ − ψ(y) : y ∈ Y },  ∀x ∈ X,    (26)

where φ̂ : X → R is a convex function with L_{φ̂}-Lipschitz-continuous gradient with respect to an arbitrary norm ‖·‖ on R^n, Y ⊆ R^m is a compact convex set, ψ : Y → R is a continuous convex function, and E : R^n → R^m is an affine operator. Unless stated explicitly otherwise, we assume that the CP problem (9) referred to in this subsection is the one with its objective function given by (26). Also, we assume throughout our discussion that φ^* is finite and that the set of optimal solutions of (9) is nonempty.

The function φ defined as in (26) is generally non-differentiable but can be closely approximated by a function with Lipschitz-continuous gradient using the following construction due to Nesterov [4]. Let h : Y → R be a continuous strongly convex function with modulus σ_h > 0 with respect to an arbitrary norm ‖·‖ on R^m satisfying min{h(y) : y ∈ Y} = 0. For some smoothness parameter η > 0, consider the following function

    φ_η(x) := φ̂(x) + max_y { ⟨Ex, y⟩ − ψ(y) − η h(y) : y ∈ Y }.    (27)

The next result, due to Nesterov [4], shows that φ_η is a function with Lipschitz-continuous gradient with respect to ‖·‖, whose closeness to φ depends linearly on the parameter η.

Proposition 9 The following statements hold:

a) for every x ∈ X, we have φ_η(x) ≤ φ(x) ≤ φ_η(x) + η D_h, where

    D_h := max{ h(y) : y ∈ Y };    (28)

b) the function φ_η has L_η-Lipschitz-continuous gradient with respect to ‖·‖, where

    L_η := L_{φ̂} + ‖E‖² / (η σ_h).    (29)

We are now ready to state Nesterov's smooth approximation scheme to solve (9).

Nesterov's smooth approximation scheme:

0) Let ε > 0 be given.

1) Set η := ε/(2 D_h) and consider the approximation problem

    φ_η^* := inf{ φ_η(x) : x ∈ X }.    (30)

2) Apply Nesterov's optimal method (or its variant) to (30) and terminate whenever an iterate x^{sd}_k satisfying φ(x^{sd}_k) − φ^* ≤ ε is found.

end

The following result describes the convergence behavior of the above scheme.

Theorem 10 Nesterov's smooth approximation scheme generates an iterate x^{sd}_k satisfying φ(x^{sd}_k) − φ^* ≤ ε in no more than

    √( ( 8 D_h(x^{sd}_0) / (σ_h ε) ) ( 2 D_h ‖E‖² / (σ_h ε) + L_{φ̂} ) )    (31)

iterations, where D_h(x^{sd}_0) := max_{x ∈ X} d_h(x; x^{sd}_0) and D_h is defined in (28).

4 The quadratic penalty method

The goal of this section is to establish the iteration-complexity (in terms of Nesterov's optimal method iterations) of the quadratic penalty method for solving (1). Specifically, we present the iteration-complexity of the classical quadratic penalty method in Subsection 4.1, and a variant of this method, for which a perturbation term is added to the objective function of (1), is discussed and analyzed in Subsection 4.2.

The basic idea underlying penalty methods is rather simple, namely: instead of solving problem (1) directly, we solve certain relaxations of (1) obtained by penalizing the violation of the constraint A(x) ∈ K^*. More specifically, in the case of the quadratic penalty method, given a penalty parameter ρ > 0, we solve the relaxation

    Ψ_ρ^* := inf_{x ∈ X} { Ψ_ρ(x) := f(x) + (ρ/2) [ d_{K^*}(A(x)) ]² }.    (32)

In order to guarantee that the function Ψ_ρ is smooth, we assume throughout this section that the distance function d_{K^*} is defined with respect to an inner product norm in R^m. Note that in this case, we have ‖·‖^* = ‖·‖. We will now see that the objective function Ψ_ρ of (32) has Lipschitz continuous gradient. We first state the following well-known result, which guarantees that the squared distance function has Lipschitz continuous gradient (see for example Proposition 15 of [1] for its proof).

Proposition 11 Given a closed convex set C ⊆ R^m, consider the distance function d_C : R^m → R to C with respect to some inner product norm ‖·‖ on R^m. Then, the function ψ : R^m → R defined as ψ(u) := [d_C(u)]² is convex and differentiable, and its gradient is given by ∇ψ(u) = 2 [u − Π_C(u)] for every u ∈ R^m. Moreover, ‖∇ψ(ũ) − ∇ψ(u)‖ ≤ 2 ‖ũ − u‖ for every u, ũ ∈ R^m.

As an immediate consequence of Proposition 11, we obtain the following result.

Corollary 12 The function Ψ_ρ has M_ρ-Lipschitz continuous gradient, where M_ρ := L_f + ρ ‖A‖².

Proof. The differentiability of Ψ_ρ follows immediately from the assumption that f is differentiable and Proposition 11. Moreover, it easily follows from the chain rule that ∇Ψ_ρ(x) = ∇f(x) + (ρ/2) A_0^* ∇(d_{K^*}²)(A(x)), which together with (2) and Proposition 11 imply that

    ‖∇Ψ_ρ(x_1) − ∇Ψ_ρ(x_2)‖ ≤ ‖∇f(x_1) − ∇f(x_2)‖ + (ρ/2) ‖A_0^* [ ∇(d_{K^*}²)(A(x_1)) − ∇(d_{K^*}²)(A(x_2)) ]‖
        ≤ L_f ‖x_1 − x_2‖ + (ρ/2) ‖A_0^*‖ ‖∇(d_{K^*}²)(A(x_1)) − ∇(d_{K^*}²)(A(x_2))‖
        ≤ L_f ‖x_1 − x_2‖ + ρ ‖A_0^*‖ ‖A_0(x_1 − x_2)‖
        ≤ L_f ‖x_1 − x_2‖ + ρ ‖A_0^*‖ ‖A_0‖ ‖x_1 − x_2‖ = M_ρ ‖x_1 − x_2‖

for every x_1, x_2 ∈ X, where the last equality follows from the fact that ‖A‖ = ‖A_0‖ = ‖A_0^*‖.
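As a concrete illustration of the penalized problem (32) and of the gradient formula used in the proof above, the following sketch (ours, with the same simplifying assumptions as before: K^* = R^m_+, Euclidean norm, A(x) = A0 x + b) evaluates Ψ_ρ and ∇Ψ_ρ so that they can be fed to a projected first-order method such as the nesterov_optimal sketch given earlier.

```python
import numpy as np

def quadratic_penalty(f, grad_f, A0, b, rho):
    """Return callables for Psi_rho and its gradient, eq. (32), when K* = R^m_+ and A(x) = A0 x + b."""
    def residual(x):
        u = A0 @ x + b
        return u - np.maximum(u, 0.0)     # A(x) - Pi_{K*}(A(x)); its norm is d_{K*}(A(x))

    def Psi(x):
        r = residual(x)
        return f(x) + 0.5 * rho * float(r @ r)

    def grad_Psi(x):                       # grad f(x) + rho * A0^T [A(x) - Pi_{K*}(A(x))]
        return grad_f(x) + rho * (A0.T @ residual(x))

    return Psi, grad_Psi

# Lipschitz constant of grad_Psi (Corollary 12): M_rho = L_f + rho * ||A0||_2**2,
# e.g. M_rho = L_f + rho * np.linalg.norm(A0, 2)**2
```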

Therefore, problem (32) can be solved, for example, by Nesterov's optimal method or one of its variants (see for example [1, 2, 4]).

4.1 Quadratic penalty method applied to the original problem

In this subsection, we consider the application of the quadratic penalty method directly to the original problem (1), which consists of solving the penalized problem (32) for a suitably chosen penalty parameter ρ. This subsection contains two subsubsections. The first one considers the complexity of obtaining an (ε_p, ε_o)-primal solution of (1), while the second one discusses the complexity of computing an (ε_p, ε_d)-primal-dual solution of (1).

4.1.1 Computation of a primal approximate solution

In this subsection, we are interested in deriving the iteration-complexity involved in the computation of an (ε_p, ε_o)-primal solution of problem (1) by approximately solving the penalized problem (32) for some value of the penalty parameter ρ. Given an approximate solution x̄ ∈ X for the penalized problem (32), the following result provides bounds on the primal infeasibility and optimality gap of x̄ with respect to (1).

Proposition 13 If x̄ ∈ X is a δ-approximate solution of (32), i.e., it satisfies

    Ψ_ρ(x̄) − Ψ_ρ^* ≤ δ,    (33)

then

    d_{K^*}(A(x̄)) ≤ (2/ρ) ‖λ^*‖ + √( 2δ/ρ ),    (34)
    f(x̄) − f^* ≤ δ,    (35)

where λ^* is an arbitrary Lagrange multiplier associated with (1).

Proof. Using the fact that v(0) = f^* ≥ Ψ_ρ^*, Corollary 2 and assumption (33), we conclude that

    δ ≥ Ψ_ρ(x̄) − Ψ_ρ^* = f(x̄) + (ρ/2) [d_{K^*}(A(x̄))]² − Ψ_ρ^*
      ≥ f(x̄) − f^* + (ρ/2) [d_{K^*}(A(x̄))]²
      ≥ −‖λ^*‖ d_{K^*}(A(x̄)) + (ρ/2) [d_{K^*}(A(x̄))]²,

which clearly implies both (34) and (35).

For the sake of future reference, we state the following simple result.

Lemma 14 Let α_i, i = 0, 1, 2, be given positive constants. Then, the only positive scalar ρ satisfying the equation α_2 ρ^{−1} + α_1 ρ^{−1/2} = α_0 is given by

    ρ = ( ( α_1 + √( α_1² + 4 α_0 α_2 ) ) / (2 α_0) )².

With the aid of Proposition 13, we can now derive an iteration-complexity bound for Nesterov's method applied to the quadratic penalty problem (32) to compute an (ε_p, ε_o)-primal solution of (1).

Theorem 15 Let λ^* be an arbitrary Lagrange multiplier for (1) and let (ε_p, ε_o) ∈ R_{++} × R_{++} be given. If

    ρ = ρ_p(t) := ( ( √(2 ε_o) + √( 2 ε_o + 8 ε_p t ) ) / (2 ε_p) )²    (36)

for some t ≥ ‖λ^*‖, then Nesterov's optimal method applied to problem (32) finds an (ε_p, ε_o)-primal solution of (1) in at most

    N_p(t) := 2 √( D_h(x^{sd}_0)/σ_h ) { √( L_f/ε_o ) + √2 ‖A‖/ε_p + ‖A‖ √( 2t/(ε_p ε_o) ) }    (37)

iterations, where D_h(x^{sd}_0) is defined in Theorem 10. In particular, if ‖·‖ is the inner product norm on R^n and h(·) = ‖·‖²/2, then the above bound reduces to

    N_p(t) := √2 D_X { √( L_f/ε_o ) + √2 ‖A‖/ε_p + ‖A‖ √( 2t/(ε_p ε_o) ) },    (38)

where D_X := max_{x_1, x_2 ∈ X} ‖x_1 − x_2‖.

Proof. Let x̄ ∈ X satisfy (33) with δ = ε_o. Proposition 13, the assumption that t ≥ ‖λ^*‖, relation (36) and Lemma 14 with α_0 = ε_p, α_1 = √(2 ε_o) and α_2 = 2t then imply that f(x̄) − f^* ≤ ε_o and

    d_{K^*}(A(x̄)) ≤ (2/ρ_p(t)) ‖λ^*‖ + √( 2 ε_o/ρ_p(t) ) ≤ 2t/ρ_p(t) + √( 2 ε_o/ρ_p(t) ) = ε_p,

and hence that x̄ is an (ε_p, ε_o)-primal solution of (1). In view of bound (19), Nesterov's optimal method finds an approximate solution x̄ as above in at most

    2 √( D_h(x^{sd}_0) M_ρ / (σ_h ε_o) )

iterations, where M_ρ := L_f + ρ ‖A‖². Substituting the value of ρ given by (36) in this iteration bound and using some trivial majorization, we obtain bound (37).

We now make a few observations regarding Theorem 15. First, the choice of ρ given by (36) requires that t ≥ ‖λ^*‖ so as to guarantee that an ε_o-approximate solution x̄ of (32) satisfies d_{K^*}(A(x̄)) ≤ ε_p. Second, note that the iteration-complexity bound N_p(t) obtained in Theorem 15 achieves its minimum possible value over the interval t ≥ ‖λ^*‖ exactly when t = ‖λ^*‖. However, since the quantity ‖λ^*‖ is not known a priori, it is necessary to use a guess-and-check procedure for t so as to develop a scheme for computing an (ε_p, ε_o)-primal solution of (1) whose iteration-complexity has the same order of magnitude as the ideal one in which t = ‖λ^*‖.

We now describe the aforementioned guess-and-check procedure for t.

Search Procedure 1:

1) Set k = 0 and define

    β_0 := 2 √( D_h(x^{sd}_0)/σ_h ) ( √( L_f/ε_o ) + √2 ‖A‖/ε_p ),  β_1 := 2 √( D_h(x^{sd}_0)/σ_h ) ‖A‖ √( 2/(ε_p ε_o) ),  t_0 := [ max(1, β_0)/β_1 ]².    (39)

2) Set ρ = ρ_p(t_k), and perform at most N_p(t_k) iterations of Nesterov's optimal method applied to problem (32). If an iterate x̄ is obtained such that (33) with δ = ε_o and d_{K^*}(A(x̄)) ≤ ε_p are satisfied, then stop; otherwise, go to step 3.

3) Set t_{k+1} = 2 t_k, k ← k + 1, and go to step 2.
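A minimal sketch of this guess-and-check loop, for the same illustrative setting used in the earlier sketches (K^* = R^m_+, Euclidean prox), is given below; it reuses the hypothetical helpers quadratic_penalty and nesterov_optimal introduced above, runs the prescribed number of iterations for each guess t_k, and doubles t_k whenever the resulting iterate is not ε_p-feasible (the unverifiable condition (33) is taken on faith from the iteration bound).

```python
import numpy as np

def search_procedure_1(f, grad_f, A0, b, proj_X, L_f, x0, eps_p, eps_o, D_X, t0):
    """Guess-and-check loop over t (Search Procedure 1), with K* = R^m_+ and h = ||.||^2/2."""
    normA = np.linalg.norm(A0, 2)
    t = t0
    while True:
        # penalty parameter rho_p(t), eq. (36)
        rho = ((np.sqrt(2*eps_o) + np.sqrt(2*eps_o + 8*eps_p*t)) / (2*eps_p))**2
        # iteration bound N_p(t), eq. (38)
        N = np.sqrt(2)*D_X*(np.sqrt(L_f/eps_o) + np.sqrt(2)*normA/eps_p
                            + normA*np.sqrt(2*t/(eps_p*eps_o)))
        Psi, grad_Psi = quadratic_penalty(f, grad_f, A0, b, rho)
        M_rho = L_f + rho * normA**2                      # Corollary 12
        x_bar = nesterov_optimal(grad_Psi, proj_X, M_rho, x0, int(np.ceil(N)))
        u = A0 @ x_bar + b
        if np.linalg.norm(u - np.maximum(u, 0.0)) <= eps_p:   # d_{K*}(A(x_bar)) <= eps_p
            return x_bar
        t *= 2.0                                           # step 3: double the guess

# t0 would be set as in (39): t0 = (max(1, beta0)/beta1)**2 with beta0, beta1 as defined there.
```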

Before establishing the iteration-complexity of the above procedure, we state a technical result, namely Lemma 17, which will be used more than once in the paper. Lemma 16 states a simple inequality which is also used a few times in our presentation.

Lemma 16 For any scalars τ ≥ 0, α ≥ 0, and x ≥ 1, we have τx + α ≤ (τ + α) x.

Lemma 17 Let a positive scalar p be given. Then, there exists a constant C = C(p) such that for any positive scalars β_1, β_2, t, we have

    Σ_{k=0}^{K} ( β_1 + β_2 t_k^p ) ≤ C ( β_1 + β_2 t^p ),    (40)

where t_k = t_0 2^k for k = 1, ..., K, and

    t_0 := ( max(β_1, 1)/β_2 )^{1/p},  K := max{ 0, ⌈ log( t/t_0 ) ⌉ }.    (41)

Proof. Assume first that t ≤ t_0. Due to the definition of K in (41), we have K = 0 in this case, and hence Σ_{k=0}^{K}(β_1 + β_2 t_k^p) = β_1 + β_2 t_0^p = β_1 + max(β_1, 1), so that, in the case where t ≤ t_0, inequality (40) holds with C = 4.

Assume now that t > t_0. By the definition of K in (41), we have K = ⌈log(t/t_0)⌉, from which we conclude that K < log(t/t_0) + 1, and hence that t_0 2^{K+1} < 4t. Using these relations, the inequality log x = log(x^p)/p ≤ x^p/p for any x > 0, and the definition of t_0 in (41), we obtain

    Σ_{k=0}^{K} ( β_1 + β_2 t_k^p ) ≤ (1 + K)(1 + β_1) + β_2 t_0^p 2^{(K+1)p}/(2^p − 1)
        ≤ (1 + β_1) [ 2 + log( t/t_0 ) ] + (4^p/(2^p − 1)) β_2 t^p
        ≤ (1 + β_1) [ 2 + (1/p)( t/t_0 )^p ] + (4^p/(2^p − 1)) β_2 t^p
        ≤ 2(1 + β_1) + ( 2/p + 4^p/(2^p − 1) ) β_2 t^p,

which, in view of Lemma 16 (applied with x = β_2 t^p ≥ 1), implies that (40) holds with C = C(p) := 2 + max{ 2, 2/p + 4^p/(2^p − 1) }.

The following result gives the iteration-complexity of Search Procedure 1 for obtaining an (ε_p, ε_o)-primal solution of (1).

Corollary 18 Let λ^* be the minimum norm Lagrange multiplier for (1). Then, the overall number of iterations of Search Procedure 1 for obtaining an (ε_p, ε_o)-primal solution of (1) is bounded by O(N_p(‖λ^*‖)), where N_p(·) is defined in (37).

Proof. In view of Theorem 15, the iteration count k in Search Procedure 1 cannot exceed K := max{0, ⌈log(‖λ^*‖/t_0)⌉}, and hence its overall number of inner (i.e., Nesterov's optimal method) iterations is bounded by Σ_{k=0}^{K} N_p(t_k) = Σ_{k=0}^{K} (β_0 + β_1 t_k^{1/2}), where β_0 and β_1 are defined by (39). The result now follows from the definition of t_0 in (39) and (40) with p = 1/2 and t = ‖λ^*‖.

4.1.2 Computation of a primal-dual approximate solution

In this subsection, we are interested in deriving the iteration-complexity involved in the computation of an (ε_p, ε_d)-primal-dual solution of problem (1) by approximately solving the penalized problem (32) for a certain value of the penalty parameter ρ. We assume throughout this subsection that the norms on R^n and R^m are the ones associated with their respective inner products. Given an approximate solution x̄ ∈ X for the penalized problem (32), the following result shows that there exists a pair (x̄^+, λ̄) ∈ X × (−K) depending on x̄ that approximately satisfies the optimality conditions (6).

Proposition 19 If x̄ ∈ X is a δ-approximate solution of (32), i.e., it satisfies (33), then the pair (x̄^+, λ̄) defined as

    x̄^+ := Π_X( x̄ − ∇Ψ_ρ(x̄)/M_ρ ),    (42)
    λ̄ := ρ [ A(x̄^+) − Π_{K^*}(A(x̄^+)) ]    (43)

is in X × (−K), satisfies the second relation in (7), and satisfies the relations

    d_{K^*}(A(x̄^+)) ≤ (2/ρ) ‖λ^*‖ + √( 2δ/ρ ),    (44)
    ∇f(x̄^+) + (A_0)^* λ̄ ∈ −N_X(x̄^+) + B( 2 √( 2 M_ρ δ ) ),    (45)

where λ^* is an arbitrary Lagrange multiplier for (1).

Proof. It follows from Lemma 5(a) with φ = Ψ_ρ and τ = 1/M_ρ that Ψ_ρ(x̄^+) ≤ Ψ_ρ(x̄), and hence that Ψ_ρ(x̄^+) − Ψ_ρ^* ≤ δ in view of (33). By Proposition 13, this implies that (44) holds. Moreover, by Lemma 5(d) with φ = Ψ_ρ and L_φ = M_ρ and assumption (33), we have

    ‖ [∇Ψ_ρ(x̄)]^{1/M_ρ}_X ‖ ≤ √( 2 M_ρ [ Ψ_ρ(x̄) − Ψ_ρ^* ] ) ≤ √( 2 M_ρ δ ).

Relation (45) now follows from this inequality, Proposition 4(b) with L_φ = M_ρ, τ = 1/M_ρ and ε = √(2M_ρδ), and the fact that ∇Ψ_ρ(x̄^+) = ∇f(x̄^+) + (A_0)^* λ̄, where λ̄ is given by (43). Finally, λ̄ ∈ (−K) and the second relation of (7) holds in view of the definition of λ̄ and well-known properties of the projection operator onto a cone.

With the aid of Proposition 19, we can now derive an iteration-complexity bound for Nesterov's method applied to the quadratic penalty problem (32) to compute an (ε_p, ε_d)-primal-dual solution of (1).

Theorem 20 Let λ^* be an arbitrary Lagrange multiplier for (1) and let (ε_p, ε_d) ∈ R_{++} × R_{++} be given. If

    ρ = ρ_pd(t) := (1/ε_p) ( t + ε_d/(8‖A‖) )    (46)

for some t ≥ ‖λ^*‖, then Nesterov's optimal method with h(·) = ‖·‖²/2 applied to problem (32) finds an (ε_p, ε_d)-primal-dual solution of (1) in at most

    N_pd(t) := 4 D_X ( L_f/ε_d + ‖A‖² t/(ε_p ε_d) + ‖A‖/(8 ε_p) )    (47)

iterations, where D_X is defined in Theorem 15.

Proof. Let x̄ ∈ X be a δ-approximate solution of (32), where δ := ε_d²/(8 M_ρ). Noting that δ ≤ ε_d²/(8 ρ‖A‖²) in view of the fact that M_ρ := L_f + ρ‖A‖² ≥ ρ‖A‖², we conclude from Proposition 19 that the pair (x̄^+, λ̄) defined as x̄^+ := Π_X(x̄ − ∇Ψ_ρ(x̄)/M_ρ) and λ̄ := ρ[A(x̄^+) − Π_{K^*}(A(x̄^+))] satisfies ∇f(x̄^+) + (A_0)^* λ̄ ∈ −N_X(x̄^+) + B(ε_d) and

    d_{K^*}(A(x̄^+)) ≤ (1/ρ) ( ‖λ^*‖ + ε_d/(8‖A‖) ) ≤ (1/ρ) ( t + ε_d/(8‖A‖) ) = ε_p,

where the last inequality is due to the assumption that t ≥ ‖λ^*‖ and the last equality is due to (46). We have thus shown that (x̄^+, λ̄) is an (ε_p, ε_d)-primal-dual solution of (1). In view of Corollary 7, Nesterov's optimal method finds an approximate solution x̄ as above in at most

    D_X √( 2 M_ρ/δ ) = D_X √( 16 M_ρ²/ε_d² ) = 4 D_X M_ρ/ε_d = 4 D_X ( L_f + ρ‖A‖² )/ε_d

iterations. Substituting the value of ρ given by (46) in the above bound, we obtain bound (47).

The following result states the iteration-complexity of a guess-and-check procedure based on Theorem 20. Its proof is based on similar arguments as those used in the proof of Corollary 18.

Corollary 21 Let λ^* be the minimum norm Lagrange multiplier for (1). Consider the guess-and-check procedure similar to Search Procedure 1 where the functions N_p and ρ_p are replaced by the functions N_pd and ρ_pd defined in Theorem 20, δ is set to ε_d²/(8 M_ρ), and t_0 is set to max(β_0, 1)/β_1 with

    β_0 = 4 D_X ( L_f/ε_d + ‖A‖/(8 ε_p) ),  β_1 = 4 D_X ‖A‖²/(ε_p ε_d).

Then, the overall number of iterations of this guess-and-check procedure for obtaining an (ε_p, ε_d)-primal-dual solution of (1) is bounded by O(N_pd(‖λ^*‖)), where N_pd(·) is defined in (47).

4.2 Quadratic penalty method applied to a perturbed problem

Throughout this subsection, we assume that the norm on R^n is the one associated with its inner product. (Recall that throughout Section 4, we are also assuming that the norm on R^m is the one associated with its inner product.) Consider the perturbation problem

    f_γ^* := min{ f_γ(x) := f(x) + (γ/2) ‖x − x_0‖² : A(x) ∈ K^*, x ∈ X },    (48)

where x_0 is a fixed point in X and γ > 0 is a pre-specified perturbation parameter. It is well-known that if γ is sufficiently small, then an approximate solution of (48) will also be an approximate solution of (1). In this subsection, we will derive the iteration-complexity (in terms of Nesterov's optimal method iterations) of computing an approximate solution of (1) by applying the quadratic penalty approach to the perturbation problem (48) for a conveniently chosen perturbation parameter γ > 0. The subsection is divided into two subsubsections: the computation of a primal approximate solution of (1) is treated in the first subsubsection, while the computation of a primal-dual approximate solution of (1) is treated in the second one.

4.2.1 Computation of a primal approximate solution

The quadratic penalty problem associated with (48) is given by

    Ψ_{ρ,γ}^* := min_{x ∈ X} { Ψ_{ρ,γ}(x) := f(x) + (γ/2) ‖x − x_0‖² + (ρ/2) [ d_{K^*}(A(x)) ]² }.    (49)
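Continuing the earlier illustrative sketches (K^* = R^m_+, Euclidean norm), the perturbed penalty objective (49) only adds the strongly convex term (γ/2)‖x − x_0‖², so it can be assembled from the quadratic_penalty helper above and handed to the restarted_nesterov sketch with µ = γ; as before, these helper names are assumptions of these notes rather than the paper's code.

```python
def perturbed_penalty(f, grad_f, A0, b, rho, gamma, x0):
    """Psi_{rho,gamma} of eq. (49) and its gradient, built on the quadratic_penalty sketch."""
    Psi, grad_Psi = quadratic_penalty(f, grad_f, A0, b, rho)

    def Psi_pert(x):
        return Psi(x) + 0.5 * gamma * float((x - x0) @ (x - x0))

    def grad_Psi_pert(x):
        return grad_Psi(x) + gamma * (x - x0)

    return Psi_pert, grad_Psi_pert

# The gradient is M_{rho,gamma}-Lipschitz with M_{rho,gamma} = L_f + rho*||A0||_2**2 + gamma (eq. (50)),
# and Psi_{rho,gamma} is gamma-strongly convex, so restarted_nesterov(..., L=M_{rho,gamma}, mu=gamma, ...) applies.
```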

It can be easily seen that the function Ψ_{ρ,γ} has M_{ρ,γ}-Lipschitz continuous gradient, where

    M_{ρ,γ} := L_f + ρ ‖A‖² + γ.    (50)

The following simple lemma relates the optimal values of the perturbation problem (48) and the original problem (1).

Lemma 22 Let f^*, Ψ_ρ^*, f_γ^*, and Ψ_{ρ,γ}^* be the optimal values defined in (1), (32), (48), and (49), respectively. Then,

    0 ≤ f_γ^* − f^* ≤ γ D_X²/2,    (51)
    0 ≤ Ψ_{ρ,γ}^* − Ψ_ρ^* ≤ γ D_X²/2,    (52)

where D_X := max_{x_1, x_2 ∈ X} ‖x_1 − x_2‖.

Proof. The first inequalities in both relations (51) and (52) follow immediately from the fact that f_γ ≥ f and Ψ_{ρ,γ} ≥ Ψ_ρ. Now, let x^* and x_γ^* be optimal solutions of (1) and (48), respectively. Then,

    f_γ^* = f(x_γ^*) + (γ/2) ‖x_γ^* − x_0‖² ≤ f(x^*) + (γ/2) ‖x^* − x_0‖² ≤ f^* + γ D_X²/2,

from which the second inequality in (51) immediately follows. The second inequality in (52) can be shown similarly.

The following theorem gives the iteration-complexity of computing an (ε_p, ε_o)-primal solution of (1) by applying the quadratic penalty approach to the perturbation problem (48) with an appropriate choice of perturbation parameter γ > 0.

Theorem 23 Let λ_γ^* be an arbitrary Lagrange multiplier for (48) and let (ε_p, ε_o) ∈ R_{++} × R_{++} be given. If

    ρ = ρ̃_p(t) := 4t/ε_p  and  γ = ε_o/D_X²    (53)

for some t ≥ ‖λ_γ^*‖, then the variant of Nesterov's optimal method of Theorem 8 with µ = γ and L_φ = M_{ρ,γ} applied to problem (49) finds an (ε_p, ε_o)-primal solution of (1) in at most

    Ñ_p(t) := 2√2 D_X [ √( L_f/ε_o ) + 2 ‖A‖ √( t/(ε_p ε_o) ) ] ( max{ 1, ⌈ log( ε_o/(t ε_p) ) ⌉ } + 1 )    (54)

iterations, where D_X is defined in Theorem 15.

Proof. Assume that ρ and γ are given by (53) for some t ≥ ‖λ_γ^*‖. Let δ := min{ε_o/4, ε_p t/2} and let x̄ ∈ X be a δ-approximate solution of (49), i.e., Ψ_{ρ,γ}(x̄) − Ψ_{ρ,γ}^* ≤ δ. Then, Proposition 13 with f^* = f_γ^*, Ψ_ρ = Ψ_{ρ,γ} and ρ = ρ̃_p(t), and the assumption that t ≥ ‖λ_γ^*‖, imply that

    d_{K^*}(A(x̄)) ≤ (2/ρ̃_p(t)) ‖λ_γ^*‖ + √( 2δ/ρ̃_p(t) ) ≤ 2t/(4t/ε_p) + √( ε_p t/(4t/ε_p) ) = ε_p/2 + ε_p/2 = ε_p

and f_γ(x̄) − f_γ^* ≤ δ ≤ ε_o/4. The latter inequality together with (51) and (53) in turn imply that

    f(x̄) − f^* ≤ f_γ(x̄) − f^* ≤ [ f_γ(x̄) − f_γ^* ] + [ f_γ^* − f^* ] ≤ ε_o/4 + ε_o/2 < ε_o.

We have thus proved that x̄ is an (ε_p, ε_o)-primal solution of (1). Now, using Theorem 8 with φ = Ψ_{ρ,γ}, µ = γ and ε = δ, and noting that the definition of γ in (53) and the definition of δ imply that

    Q = µ ‖x^{sd}_0 − x^*‖²/(2δ) = γ ‖x^{sd}_0 − x^*‖²/(2δ) ≤ γ D_X²/(2δ) = ε_o/( 2 min{ ε_o/4, ε_p t/2 } ) = max{ 2, ε_o/(ε_p t) },

we conclude that the number of iterations performed by Nesterov's optimal method (with the restarting feature) for finding a δ-approximate solution x̄ as above is bounded by

    √( 8 M_{ρ,γ}/γ ) ⌈log Q⌉ ≤ √( 8 M_{ρ,γ}/γ ) max{ 1, ⌈ log( ε_o/(t ε_p) ) ⌉ }.

Now, using relations (50) and (53) and some trivial majorization, we easily see that the latter bound is majorized by (54).

Note that the complexity bound (54) derived in Theorem 23 is guaranteed only under the assumption that t ≥ ‖λ_γ^*‖, where λ_γ^* is the minimum norm Lagrange multiplier for (48). Since the bound (54) is a monotonically increasing function of t, the ideal (theoretical) choice of t would be to set t = ‖λ_γ^*‖. Without assuming any knowledge of this Lagrange multiplier, the following result shows that a guess-and-check procedure similar to Search Procedure 1 still has an O(Ñ_p(‖λ_γ^*‖)) iteration-complexity bound, where Ñ_p(·) is defined in (54). Before establishing the iteration-complexity of this guess-and-check procedure, we state the following technical result, which substantially generalizes the one stated in Lemma 17. The proof of this result is given in the Appendix.

Proposition 24 For some positive integer L, let positive scalars p_1, p_2, ..., p_L be given. Then, there exists a constant C = C(p_1, ..., p_L) such that for any positive scalars β_0, β_1, ..., β_L, ν, and t, we have

    Σ_{k=0}^{K} ( β_0 + Σ_{l=1}^{L} β_l t_k^{p_l} ) max{ 1, ⌈ log( ν/t_k ) ⌉ } ≤ C ( β_0 + Σ_{l=1}^{L} β_l t^{p_l} ) max{ 1, ⌈ log( ν/t ) ⌉ },    (55)

where

    K := max{ 0, ⌈ log( t/t_0 ) ⌉ },  t_0 := min_{1≤l≤L} ( max(β_0, 1)/β_l )^{1/p_l},  t_k = t_0 2^k,  k = 1, ..., K.    (56)

In particular, if ν = t, then (55) implies that

    Σ_{k=0}^{K} ( β_0 + Σ_{l=1}^{L} β_l t_k^{p_l} ) ≤ C ( β_0 + Σ_{l=1}^{L} β_l t^{p_l} ).    (57)

We are now ready to discuss the iteration-complexity bound of the above-mentioned guess-and-check procedure.

Corollary 25 Let λ_γ^* be the minimum norm Lagrange multiplier for (48). Consider the guess-and-check procedure similar to Search Procedure 1 where the functions N_p and ρ_p are replaced by the functions Ñ_p and ρ̃_p defined in Theorem 23, δ is set to δ_k := min{ε_o/4, ε_p t_k/2}, and t_0 is set to [max(β_0, 1)/β_1]² with

    β_0 = 2 D_X √( 2 L_f/ε_o ) + 1,  β_1 = 4√2 D_X ‖A‖/√( ε_p ε_o ).    (58)

Then, the overall number of iterations of this guess-and-check procedure for obtaining an (ε_p, ε_o)-primal solution of (1) is bounded by O(Ñ_p(‖λ_γ^*‖)), where Ñ_p(·) is defined in (54).

Proof. In view of Theorem 23, the iteration count k in this procedure cannot exceed K := max{0, ⌈log(‖λ_γ^*‖/t_0)⌉}, and hence its overall number of inner (i.e., Nesterov's optimal method) iterations is bounded by Σ_{k=0}^{K} Ñ_p(t_k) = Σ_{k=0}^{K} (β_0 + β_1 t_k^{1/2}) max{1, ⌈log(ν/t_k)⌉}, where β_0 and β_1 are defined by (58) and ν = ε_o/ε_p. The result now follows from the definition of t_0 and Proposition 24 with L = 1, p_1 = 1/2, and t = ‖λ_γ^*‖.

It is interesting to compare the two functions N_p(t) and Ñ_p(t) defined in Theorems 15 and 23, respectively. Indeed, both functions are comparable (up to some constant factor) for the case when r_t := ε_o/(t ε_p) = O(1). Otherwise, if r_t = Ω(1) and we further assume that the term involving t dominates the other terms in the right hand side of (54), i.e.,

    2 D_X √( 2 L_f/ε_o ) + 1 = O( 4√2 D_X ‖A‖ √( t/(ε_p ε_o) ) ),

then

    Ñ_p(t)/N_p(t) ≤ O(1) log( r_t )/√( r_t )

for some universal constant O(1). Hence, in the latter case, we conclude that the function Ñ_p(t) is asymptotically smaller than the function N_p(t).

4.2.2 Computation of a primal-dual approximate solution

In this subsection, we derive the iteration-complexity of computing a primal-dual approximate solution of (1) by applying the quadratic penalty method to the perturbation problem (48). We will show that the resulting approach has a substantially better iteration-complexity than the one discussed in Subsection 4.1.2, which consists of applying the quadratic penalty method directly to the original problem (1).

Theorem 26 Let λ_γ^* be an arbitrary Lagrange multiplier for (48) and let (ε_p, ε_d) ∈ R_{++} × R_{++} be given. Also assume that

    ρ = ρ̃_pd(t) := (1/ε_p) ( t + ε_d/(32‖A‖) ),  γ = ε_d/(2 D_X),    (59)

for some t ≥ ‖λ_γ^*‖. Then, the variant of Nesterov's optimal method of Theorem 8 with µ = γ and L_φ = M_{ρ,γ} applied to problem (49) finds an (ε_p, ε_d)-primal-dual solution of (1) in at most

    Ñ_pd(t) := 8 S(t) ⌈ 2 log( 2 S(t) ) ⌉    (60)

iterations, where D_X is defined in Theorem 15 and

    S(t) := [ 2 D_X ( L_f/ε_d + ‖A‖/(32 ε_p) ) ]^{1/2} + [ 2 D_X ‖A‖² t/(ε_p ε_d) ]^{1/2}.    (61)

Proof. Define δ := ε_d²/(32 M_{ρ,γ}) and let x̄ ∈ X be a δ-approximate solution of (49), i.e., Ψ_{ρ,γ}(x̄) − Ψ_{ρ,γ}^* ≤ δ. It then follows from Proposition 19 with Ψ_ρ = Ψ_{ρ,γ}, f = f_γ, and M_ρ = M_{ρ,γ} that the pair (x̄^+, λ̄) defined as x̄^+ := Π_X(x̄ − ∇Ψ_{ρ,γ}(x̄)/M_{ρ,γ}) and λ̄ := ρ[A(x̄^+) − Π_{K^*}(A(x̄^+))] satisfies

    ∇f_γ(x̄^+) + (A_0)^* λ̄ ∈ −N_X(x̄^+) + B( 2√( 2 M_{ρ,γ} δ ) ) = −N_X(x̄^+) + B(ε_d/2).

This together with the fact that γ ‖x̄^+ − x_0‖ ≤ γ D_X = ε_d/2 then imply that

    ∇f(x̄^+) + (A_0)^* λ̄ = [ ∇f_γ(x̄^+) − γ(x̄^+ − x_0) ] + (A_0)^* λ̄ ∈ −N_X(x̄^+) + B(ε_d/2) + B(ε_d/2) ⊆ −N_X(x̄^+) + B(ε_d).

Moreover, noting that δ := ε_d²/(32 M_{ρ,γ}) ≤ ε_d²/(32 ρ‖A‖²), we conclude that Proposition 19 also implies that

    d_{K^*}(A(x̄^+)) ≤ (1/ρ) ( ‖λ_γ^*‖ + ε_d/(32‖A‖) ) ≤ (1/ρ) ( t + ε_d/(32‖A‖) ) = ε_p,

where the last inequality is due to (59) and the assumption that t ≥ ‖λ_γ^*‖. We have thus shown that (x̄^+, λ̄) is an (ε_p, ε_d)-primal-dual solution of (1). Now, using Theorem 8 with φ = Ψ_{ρ,γ}, µ = γ and ε = δ, and noting that the definition of δ and the definition of γ in (59) imply that

    Q = µ ‖x^{sd}_0 − x^*‖²/(2δ) = γ ‖x^{sd}_0 − x^*‖²/(2δ) ≤ γ D_X²/(2δ) = γ D_X²/( ε_d²/(16 M_{ρ,γ}) ) = 4 M_{ρ,γ}/γ,

we conclude that the number of iterations performed by Nesterov's optimal method (with the restarting feature) for finding a δ-approximate solution x̄ as above is bounded by

    √( 8 M_{ρ,γ}/γ ) ⌈log Q⌉ ≤ √( 8 M_{ρ,γ}/γ ) ⌈ log( 4 M_{ρ,γ}/γ ) ⌉.    (62)

Now, using (50) and the definitions of ρ and γ in (59), we easily see that the latter bound is majorized by (60).

Note that the complexity bound (60) derived in Theorem 26 is guaranteed only under the assumption that t ≥ ‖λ_γ^*‖, where λ_γ^* is the minimum norm Lagrange multiplier for (48). Since the bound (60) is a monotonically increasing function of t, the ideal (theoretical) choice of t would be to set t = ‖λ_γ^*‖. Without assuming any knowledge of this Lagrange multiplier, the following result shows that a guess-and-check procedure similar to Search Procedure 1 still has an O(Ñ_pd(‖λ_γ^*‖)) iteration-complexity bound, where Ñ_pd(·) is defined in (60).

Corollary 27 Let λ_γ^* be the minimum norm Lagrange multiplier for (48). Consider the guess-and-check procedure similar to Search Procedure 1 where the functions N_p and ρ_p are replaced by the functions Ñ_pd and ρ̃_pd defined in Theorem 26, Nesterov's optimal method is replaced by its variant of Theorem 8 with µ = γ and L_φ = M_{ρ,γ} (see step 2 of Search Procedure 1), δ is set to ε_d²/(32 M_{ρ,γ}), and t_0 is set to [max(β_0, 1)/β_1]² with

    β_0 = 8 [ 2 D_X ( L_f/ε_d + ‖A‖/(32 ε_p) ) ]^{1/2} + 1,  β_1 = 8 [ 2 D_X ‖A‖²/(ε_p ε_d) ]^{1/2}.    (63)

Then, the overall number of iterations of this guess-and-check procedure for obtaining an (ε_p, ε_d)-primal-dual solution of (1) is bounded by O(Ñ_pd(‖λ_γ^*‖)), where Ñ_pd(·) is defined in (60).

Proof. In view of Theorem 26, the iteration count k of the above guess-and-check procedure (see Search Procedure 1) for obtaining an (ε_p, ε_d)-primal-dual solution of (1) cannot exceed K := max{0, ⌈log(‖λ_γ^*‖/t_0)⌉}, and hence its overall number of inner (i.e., Nesterov's optimal method) iterations is bounded by

    Σ_{k=0}^{K} Ñ_pd(t_k) = Σ_{k=0}^{K} 8 S(t_k) ⌈ 2 log( 2 S(t_k) ) ⌉ ≤ ⌈ 2 log( 2 S(t_K) ) ⌉ Σ_{k=0}^{K} 8 S(t_k).    (64)

Now, the definition of t_0 and relation (40) with p = 1/2 and t = ‖λ_γ^*‖ imply that

    Σ_{k=0}^{K} 8 S(t_k) ≤ Σ_{k=0}^{K} ( β_0 + β_1 t_k^{1/2} ) = O( β_0 + β_1 ‖λ_γ^*‖^{1/2} ) = O( 8 S(‖λ_γ^*‖) ),

where β_0 and β_1 are defined by (63). Moreover, using the fact that t_K ≤ 2‖λ_γ^*‖, we easily see that ⌈2 log(2 S(t_K))⌉ = O( 2 log(2 S(‖λ_γ^*‖)) ). Substituting the last two bounds into (64), we then conclude that Σ_{k=0}^{K} Ñ_pd(t_k) = O(Ñ_pd(‖λ_γ^*‖)), and hence that the corollary holds.

It is interesting to compare the functions N_pd(t) and Ñ_pd(t) defined in Theorems 20 and 26, respectively. It follows from (47), (60), and (61) that

    Ñ_pd(t)/N_pd(t) ≤ O(1) ( 8 S(t) ⌈2 log(2 S(t))⌉ )/( S(t) )² ≤ O(1) log( S(t) )/S(t),    (65)

where O(1) denotes an absolute constant. Hence, when S(t) is large, the bound Ñ_pd(t) can be substantially smaller than the bound N_pd(t). Note that we cannot use the previous observation to compare the iteration-complexity of Corollary 21 with that obtained in Corollary 27, since the first one is expressed in terms of ‖λ^*‖ and the latter in terms of ‖λ_γ^*‖. However, if ‖λ_γ^*‖ = O(‖λ^*‖), then it can be easily seen that (65) implies that

    Ñ_pd(‖λ_γ^*‖)/N_pd(‖λ^*‖) ≤ O(1) log( S(‖λ^*‖) )/S(‖λ^*‖).

Hence, the first complexity is better than the second one whenever ‖λ_γ^*‖ = O(‖λ^*‖) and S(‖λ^*‖) is sufficiently large.

The following result describes an alternative way of choosing the penalty parameter ρ in subproblem (49) as a function of t, but requires that t ≥ ‖λ^*‖ instead of t ≥ ‖λ_γ^*‖.

Theorem 28 Let λ^* be an arbitrary Lagrange multiplier for (1) and let (ε_p, ε_d) ∈ R_{++} × R_{++} be given. Assume that

    ρ = ρ̂_pd(t) := ( [ √( ε_d D_X/2 ) + √( ε_d D_X/2 + 4 α(t) ε_p ) ] / (2 ε_p) )²,  γ = ε_d/(2 D_X),    (66)

where α(t) := t + ε_d/(32‖A‖), for some t ≥ ‖λ^*‖. Then, the variant of Nesterov's optimal method of Theorem 8 with µ = γ and L_φ = M_{ρ,γ} applied to problem (49) finds an (ε_p, ε_d)-primal-dual solution of (1) in at most

    ˆN_pd(t) := 8 Ŝ(t) ⌈ 2 log( 2 Ŝ(t) ) ⌉    (67)

iterations, where D_X is defined in Theorem 15 and

    Ŝ(t) := √( 2 L_f D_X/ε_d ) + ‖A‖ D_X/ε_p + ‖A‖ √( 2 t D_X/(ε_p ε_d) ) + √( ‖A‖ D_X/(8 ε_p) ).    (68)

Proof. Define δ := ε_d²/(32 M_{ρ,γ}) and let x̄ ∈ X be a δ-approximate solution of (49), i.e., Ψ_{ρ,γ}(x̄) − Ψ_{ρ,γ}^* ≤ δ. As shown in the proof of Theorem 26, the pair (x̄^+, λ̄) defined as x̄^+ := Π_X(x̄ − ∇Ψ_{ρ,γ}(x̄)/M_{ρ,γ}) and λ̄ := ρ[A(x̄^+) − Π_{K^*}(A(x̄^+))] satisfies ∇f(x̄^+) + (A_0)^* λ̄ ∈ −N_X(x̄^+) + B(ε_d). Moreover, it follows from Lemma 5(a) with φ = Ψ_{ρ,γ} and τ = 1/M_{ρ,γ} that Ψ_{ρ,γ}(x̄^+) ≤ Ψ_{ρ,γ}(x̄), and hence that Ψ_{ρ,γ}(x̄^+) − Ψ_{ρ,γ}^* ≤ Ψ_{ρ,γ}(x̄) − Ψ_{ρ,γ}^* ≤ δ. This observation together with (32), (49), (52) and (66) then imply that

    Ψ_ρ(x̄^+) − Ψ_ρ^* = Ψ_{ρ,γ}(x̄^+) − (γ/2) ‖x̄^+ − x_0‖² − Ψ_ρ^* ≤ [ Ψ_{ρ,γ}(x̄^+) − Ψ_{ρ,γ}^* ] + [ Ψ_{ρ,γ}^* − Ψ_ρ^* ]
        ≤ δ + γ D_X²/2 ≤ ε_d²/(32 ρ‖A‖²) + ε_d D_X/4,

where the last inequality follows from the definition of δ and the fact that M_{ρ,γ} ≥ ρ‖A‖². The above inequality together with Proposition 13, the assumption that t ≥ ‖λ^*‖, relation (66) and Lemma 14 with α_0 = ε_p, α_1 = √(ε_d D_X/2) and α_2 = α(t) = t + ε_d/(32‖A‖) then imply that

    d_{K^*}(A(x̄^+)) ≤ (1/ρ) ‖λ^*‖ + √( ε_d²/(32 ρ²‖A‖²) ) + √( D_X ε_d/(2ρ) )
        ≤ (1/ρ) ( t + ε_d/(32‖A‖) ) + √( D_X ε_d/(2ρ) ) = ε_p.

We have thus shown that (x̄^+, λ̄) is an (ε_p, ε_d)-primal-dual solution of (1). Now, using Theorem 8 with φ = Ψ_{ρ,γ}, µ = γ and ε = δ, and noting that the definition of δ and the definition of γ in (66) imply that

    Q = µ ‖x^{sd}_0 − x^*‖²/(2δ) = γ ‖x^{sd}_0 − x^*‖²/(2δ) ≤ γ D_X²/(2δ) = γ D_X²/( ε_d²/(16 M_{ρ,γ}) ) = 4 M_{ρ,γ}/γ,

we conclude that the number of iterations performed by Nesterov's optimal method (with the restarting feature) for finding a δ-approximate solution x̄ as above is bounded by (62).

Since relations (50) and (66) and the definition of α(t) imply that

    √( M_{ρ,γ}/γ ) = √( ( L_f + ρ‖A‖² + γ )/γ ) ≤ √( L_f/γ ) + ‖A‖ √( ρ/γ ) + 1 = √( 2 L_f D_X/ε_d ) + ‖A‖ √( ρ/γ ) + 1

and

    √( ρ/γ ) ≤ ( √( ε_d D_X/2 ) + √( α(t) ε_p ) )/ε_p · √( 2 D_X/ε_d ) = D_X/ε_p + √( 2 α(t) D_X/(ε_p ε_d) )
             ≤ D_X/ε_p + √( 2 t D_X/(ε_p ε_d) ) + √( D_X/(8 ‖A‖ ε_p) ),

it follows that (62) can be bounded by (67).

The following result states the iteration-complexity of a guess-and-check procedure based on Theorem 28. Its proof is based on similar arguments as those used in the proof of Corollary 27.

Corollary 29 Let λ^* be the minimum norm Lagrange multiplier for (1). Consider the guess-and-check procedure similar to Search Procedure 1 where the functions N_p and ρ_p are replaced by the functions ˆN_pd and ρ̂_pd defined in Theorem 28, Nesterov's optimal method is replaced by its variant of Theorem 8 with µ = γ and L_φ = M_{ρ,γ} (see step 2 of Search Procedure 1), δ is set to ε_d²/(32 M_{ρ,γ}), and t_0 is set to [max(β_0, 1)/β_1]² with

    β_0 = 8 ( √( 2 L_f D_X/ε_d ) + ‖A‖ D_X/ε_p + √( ‖A‖ D_X/(8 ε_p) ) ) + 1,  β_1 = 8 ‖A‖ √( 2 D_X/(ε_p ε_d) ).

Then, the overall number of iterations of this guess-and-check procedure for obtaining an (ε_p, ε_d)-primal-dual solution of (1) is bounded by O(ˆN_pd(‖λ^*‖)), where ˆN_pd(·) is defined in (67).

It is interesting to compare the functions N_pd(t) and ˆN_pd(t) defined in Theorems 20 and 28, respectively. When the second term in the right hand side of (68) is dominated by the other terms, i.e.,

    ‖A‖ D_X/ε_p = O( ‖A‖ √( 2 t D_X/(ε_p ε_d) ) + √( 2 L_f D_X/ε_d ) ),    (69)

then it can be easily seen that

    ˆN_pd(t)/N_pd(t) ≤ O(1) Ñ_pd(t)/N_pd(t) ≤ O(1) log( S(t) )/S(t).

Hence, if (69) holds and S(t) is large, then ˆN_pd(t) can be considerably smaller than N_pd(t). It is also interesting to observe that, when ε_p = ε_d = ε, the dependence on ε of the function ˆN_pd is O(1/ε) log(1/ε), while that of N_pd is O(1/ε²).

5 The exact penalty method

The goal of this section is to establish the iteration-complexity (in terms of Nesterov's optimal method iterations) of first-order exact penalty methods for computing an ε_p-primal solution of (1). It consists of two subsections. The first one considers the complexity of a first-order exact penalty method applied directly to the original problem (1), while the second one deals with the complexity of a first-order exact penalty method applied to the perturbation problem (48) for some suitably chosen perturbation parameter γ.


More information

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING XIAO WANG AND HONGCHAO ZHANG Abstract. In this paper, we propose an Augmented Lagrangian Affine Scaling (ALAS) algorithm for general

More information

Sparse Approximation via Penalty Decomposition Methods

Sparse Approximation via Penalty Decomposition Methods Sparse Approximation via Penalty Decomposition Methods Zhaosong Lu Yong Zhang February 19, 2012 Abstract In this paper we consider sparse approximation problems, that is, general l 0 minimization problems

More information

A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties

A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties André L. Tits Andreas Wächter Sasan Bahtiari Thomas J. Urban Craig T. Lawrence ISR Technical

More information

10 Numerical methods for constrained problems

10 Numerical methods for constrained problems 10 Numerical methods for constrained problems min s.t. f(x) h(x) = 0 (l), g(x) 0 (m), x X The algorithms can be roughly divided the following way: ˆ primal methods: find descent direction keeping inside

More information

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Abstract This paper presents an accelerated

More information

A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions

A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions Angelia Nedić and Asuman Ozdaglar April 16, 2006 Abstract In this paper, we study a unifying framework

More information

A hybrid proximal extragradient self-concordant primal barrier method for monotone variational inequalities

A hybrid proximal extragradient self-concordant primal barrier method for monotone variational inequalities A hybrid proximal extragradient self-concordant primal barrier method for monotone variational inequalities Renato D.C. Monteiro Mauricio R. Sicre B. F. Svaiter June 3, 13 Revised: August 8, 14) Abstract

More information

c 2013 Society for Industrial and Applied Mathematics

c 2013 Society for Industrial and Applied Mathematics SIAM J. OPTIM. Vol. 3, No., pp. 109 115 c 013 Society for Industrial and Applied Mathematics AN ACCELERATED HYBRID PROXIMAL EXTRAGRADIENT METHOD FOR CONVEX OPTIMIZATION AND ITS IMPLICATIONS TO SECOND-ORDER

More information

Primal-dual Subgradient Method for Convex Problems with Functional Constraints

Primal-dual Subgradient Method for Convex Problems with Functional Constraints Primal-dual Subgradient Method for Convex Problems with Functional Constraints Yurii Nesterov, CORE/INMA (UCL) Workshop on embedded optimization EMBOPT2014 September 9, 2014 (Lucca) Yu. Nesterov Primal-dual

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information

Optimization for Communications and Networks. Poompat Saengudomlert. Session 4 Duality and Lagrange Multipliers

Optimization for Communications and Networks. Poompat Saengudomlert. Session 4 Duality and Lagrange Multipliers Optimization for Communications and Networks Poompat Saengudomlert Session 4 Duality and Lagrange Multipliers P Saengudomlert (2015) Optimization Session 4 1 / 14 24 Dual Problems Consider a primal convex

More information

Euler Equations: local existence

Euler Equations: local existence Euler Equations: local existence Mat 529, Lesson 2. 1 Active scalars formulation We start with a lemma. Lemma 1. Assume that w is a magnetization variable, i.e. t w + u w + ( u) w = 0. If u = Pw then u

More information

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Zhaosong Lu October 5, 2012 (Revised: June 3, 2013; September 17, 2013) Abstract In this paper we study

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL)

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL) Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization Nick Gould (RAL) x IR n f(x) subject to c(x) = Part C course on continuoue optimization CONSTRAINED MINIMIZATION x

More information

On the Existence and Convergence of the Central Path for Convex Programming and Some Duality Results

On the Existence and Convergence of the Central Path for Convex Programming and Some Duality Results Computational Optimization and Applications, 10, 51 77 (1998) c 1998 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands. On the Existence and Convergence of the Central Path for Convex

More information

Gradient methods for minimizing composite functions

Gradient methods for minimizing composite functions Math. Program., Ser. B 2013) 140:125 161 DOI 10.1007/s10107-012-0629-5 FULL LENGTH PAPER Gradient methods for minimizing composite functions Yu. Nesterov Received: 10 June 2010 / Accepted: 29 December

More information

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING YANGYANG XU Abstract. Motivated by big data applications, first-order methods have been extremely

More information

A strongly polynomial algorithm for linear systems having a binary solution

A strongly polynomial algorithm for linear systems having a binary solution A strongly polynomial algorithm for linear systems having a binary solution Sergei Chubanov Institute of Information Systems at the University of Siegen, Germany e-mail: sergei.chubanov@uni-siegen.de 7th

More information

Accelerated primal-dual methods for linearly constrained convex problems

Accelerated primal-dual methods for linearly constrained convex problems Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize

More information

A GLOBALLY CONVERGENT STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE

A GLOBALLY CONVERGENT STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE A GLOBALLY CONVERGENT STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE Philip E. Gill Vyacheslav Kungurtsev Daniel P. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-14-1 June 30,

More information

A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions

A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions Angelia Nedić and Asuman Ozdaglar April 15, 2006 Abstract We provide a unifying geometric framework for the

More information

Algorithms for Nonsmooth Optimization

Algorithms for Nonsmooth Optimization Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization

More information

Optimal Newton-type methods for nonconvex smooth optimization problems

Optimal Newton-type methods for nonconvex smooth optimization problems Optimal Newton-type methods for nonconvex smooth optimization problems Coralia Cartis, Nicholas I. M. Gould and Philippe L. Toint June 9, 20 Abstract We consider a general class of second-order iterations

More information

Douglas-Rachford splitting for nonconvex feasibility problems

Douglas-Rachford splitting for nonconvex feasibility problems Douglas-Rachford splitting for nonconvex feasibility problems Guoyin Li Ting Kei Pong Jan 3, 015 Abstract We adapt the Douglas-Rachford DR) splitting method to solve nonconvex feasibility problems by studying

More information

On Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence:

On Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence: A Omitted Proofs from Section 3 Proof of Lemma 3 Let m x) = a i On Acceleration with Noise-Corrupted Gradients fxi ), u x i D ψ u, x 0 ) denote the function under the minimum in the lower bound By Proposition

More information

Self-Concordant Barrier Functions for Convex Optimization

Self-Concordant Barrier Functions for Convex Optimization Appendix F Self-Concordant Barrier Functions for Convex Optimization F.1 Introduction In this Appendix we present a framework for developing polynomial-time algorithms for the solution of convex optimization

More information

Subgradient Method. Ryan Tibshirani Convex Optimization

Subgradient Method. Ryan Tibshirani Convex Optimization Subgradient Method Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last last time: gradient descent min x f(x) for f convex and differentiable, dom(f) = R n. Gradient descent: choose initial

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

Duality (Continued) min f ( x), X R R. Recall, the general primal problem is. The Lagrangian is a function. defined by

Duality (Continued) min f ( x), X R R. Recall, the general primal problem is. The Lagrangian is a function. defined by Duality (Continued) Recall, the general primal problem is min f ( x), xx g( x) 0 n m where X R, f : X R, g : XR ( X). he Lagrangian is a function L: XR R m defined by L( xλ, ) f ( x) λ g( x) Duality (Continued)

More information

Near-Potential Games: Geometry and Dynamics

Near-Potential Games: Geometry and Dynamics Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo January 29, 2012 Abstract Potential games are a special class of games for which many adaptive user dynamics

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of

More information

4TE3/6TE3. Algorithms for. Continuous Optimization

4TE3/6TE3. Algorithms for. Continuous Optimization 4TE3/6TE3 Algorithms for Continuous Optimization (Algorithms for Constrained Nonlinear Optimization Problems) Tamás TERLAKY Computing and Software McMaster University Hamilton, November 2005 terlaky@mcmaster.ca

More information

Numerical Optimization

Numerical Optimization Constrained Optimization Computer Science and Automation Indian Institute of Science Bangalore 560 012, India. NPTEL Course on Constrained Optimization Constrained Optimization Problem: min h j (x) 0,

More information

3.10 Lagrangian relaxation

3.10 Lagrangian relaxation 3.10 Lagrangian relaxation Consider a generic ILP problem min {c t x : Ax b, Dx d, x Z n } with integer coefficients. Suppose Dx d are the complicating constraints. Often the linear relaxation and the

More information

Inverse problems Total Variation Regularization Mark van Kraaij Casa seminar 23 May 2007 Technische Universiteit Eindh ove n University of Technology

Inverse problems Total Variation Regularization Mark van Kraaij Casa seminar 23 May 2007 Technische Universiteit Eindh ove n University of Technology Inverse problems Total Variation Regularization Mark van Kraaij Casa seminar 23 May 27 Introduction Fredholm first kind integral equation of convolution type in one space dimension: g(x) = 1 k(x x )f(x

More information

Inexact alternating projections on nonconvex sets

Inexact alternating projections on nonconvex sets Inexact alternating projections on nonconvex sets D. Drusvyatskiy A.S. Lewis November 3, 2018 Dedicated to our friend, colleague, and inspiration, Alex Ioffe, on the occasion of his 80th birthday. Abstract

More information

The Proximal Gradient Method

The Proximal Gradient Method Chapter 10 The Proximal Gradient Method Underlying Space: In this chapter, with the exception of Section 10.9, E is a Euclidean space, meaning a finite dimensional space endowed with an inner product,

More information

On Penalty and Gap Function Methods for Bilevel Equilibrium Problems

On Penalty and Gap Function Methods for Bilevel Equilibrium Problems On Penalty and Gap Function Methods for Bilevel Equilibrium Problems Bui Van Dinh 1 and Le Dung Muu 2 1 Faculty of Information Technology, Le Quy Don Technical University, Hanoi, Vietnam 2 Institute of

More information

Optimization over Sparse Symmetric Sets via a Nonmonotone Projected Gradient Method

Optimization over Sparse Symmetric Sets via a Nonmonotone Projected Gradient Method Optimization over Sparse Symmetric Sets via a Nonmonotone Projected Gradient Method Zhaosong Lu November 21, 2015 Abstract We consider the problem of minimizing a Lipschitz dierentiable function over a

More information

Proximal-like contraction methods for monotone variational inequalities in a unified framework

Proximal-like contraction methods for monotone variational inequalities in a unified framework Proximal-like contraction methods for monotone variational inequalities in a unified framework Bingsheng He 1 Li-Zhi Liao 2 Xiang Wang Department of Mathematics, Nanjing University, Nanjing, 210093, China

More information

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Roger Behling a, Clovis Gonzaga b and Gabriel Haeser c March 21, 2013 a Department

More information

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä New Proximal Bundle Method for Nonsmooth DC Optimization TUCS Technical Report No 1130, February 2015 New Proximal Bundle Method for Nonsmooth

More information

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented

More information

Introduction to Alternating Direction Method of Multipliers

Introduction to Alternating Direction Method of Multipliers Introduction to Alternating Direction Method of Multipliers Yale Chang Machine Learning Group Meeting September 29, 2016 Yale Chang (Machine Learning Group Meeting) Introduction to Alternating Direction

More information

CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS

CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS A Dissertation Submitted For The Award of the Degree of Master of Philosophy in Mathematics Neelam Patel School of Mathematics

More information

ACCELERATED BUNDLE LEVEL TYPE METHODS FOR LARGE SCALE CONVEX OPTIMIZATION

ACCELERATED BUNDLE LEVEL TYPE METHODS FOR LARGE SCALE CONVEX OPTIMIZATION ACCELERATED BUNDLE LEVEL TYPE METHODS FOR LARGE SCALE CONVEX OPTIMIZATION By WEI ZHANG A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

Local strong convexity and local Lipschitz continuity of the gradient of convex functions

Local strong convexity and local Lipschitz continuity of the gradient of convex functions Local strong convexity and local Lipschitz continuity of the gradient of convex functions R. Goebel and R.T. Rockafellar May 23, 2007 Abstract. Given a pair of convex conjugate functions f and f, we investigate

More information

Convex Optimization Methods for Dimension Reduction and Coefficient Estimation in Multivariate Linear Regression

Convex Optimization Methods for Dimension Reduction and Coefficient Estimation in Multivariate Linear Regression Convex Optimization Methods for Dimension Reduction and Coefficient Estimation in Multivariate Linear Regression Zhaosong Lu Renato D. C. Monteiro Ming Yuan January 10, 2008 Abstract In this paper, we

More information

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Optimization Escuela de Ingeniería Informática de Oviedo (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Unconstrained optimization Outline 1 Unconstrained optimization 2 Constrained

More information

Convex Optimization Theory. Athena Scientific, Supplementary Chapter 6 on Convex Optimization Algorithms

Convex Optimization Theory. Athena Scientific, Supplementary Chapter 6 on Convex Optimization Algorithms Convex Optimization Theory Athena Scientific, 2009 by Dimitri P. Bertsekas Massachusetts Institute of Technology Supplementary Chapter 6 on Convex Optimization Algorithms This chapter aims to supplement

More information

An Infeasible Interior Proximal Method for Convex Programming Problems with Linear Constraints 1

An Infeasible Interior Proximal Method for Convex Programming Problems with Linear Constraints 1 An Infeasible Interior Proximal Method for Convex Programming Problems with Linear Constraints 1 Nobuo Yamashita 2, Christian Kanzow 3, Tomoyui Morimoto 2, and Masao Fuushima 2 2 Department of Applied

More information

Duality Theory of Constrained Optimization

Duality Theory of Constrained Optimization Duality Theory of Constrained Optimization Robert M. Freund April, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 2 1 The Practical Importance of Duality Duality is pervasive

More information

Hölder Metric Subregularity with Applications to Proximal Point Method

Hölder Metric Subregularity with Applications to Proximal Point Method Hölder Metric Subregularity with Applications to Proximal Point Method GUOYIN LI and BORIS S. MORDUKHOVICH Revised Version: October 1, 01 Abstract This paper is mainly devoted to the study and applications

More information

subject to (x 2)(x 4) u,

subject to (x 2)(x 4) u, Exercises Basic definitions 5.1 A simple example. Consider the optimization problem with variable x R. minimize x 2 + 1 subject to (x 2)(x 4) 0, (a) Analysis of primal problem. Give the feasible set, the

More information

Sequential Unconstrained Minimization: A Survey

Sequential Unconstrained Minimization: A Survey Sequential Unconstrained Minimization: A Survey Charles L. Byrne February 21, 2013 Abstract The problem is to minimize a function f : X (, ], over a non-empty subset C of X, where X is an arbitrary set.

More information

INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE

INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE HANDE Y. BENSON, ARUN SEN, AND DAVID F. SHANNO Abstract. In this paper, we present global

More information

Convex Optimization M2

Convex Optimization M2 Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization

More information

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem 56 Chapter 7 Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem Recall that C(X) is not a normed linear space when X is not compact. On the other hand we could use semi

More information

Conditional Gradient (Frank-Wolfe) Method

Conditional Gradient (Frank-Wolfe) Method Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

More information

A Unified Approach to Proximal Algorithms using Bregman Distance

A Unified Approach to Proximal Algorithms using Bregman Distance A Unified Approach to Proximal Algorithms using Bregman Distance Yi Zhou a,, Yingbin Liang a, Lixin Shen b a Department of Electrical Engineering and Computer Science, Syracuse University b Department

More information

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R

More information

Perturbed Proximal Primal Dual Algorithm for Nonconvex Nonsmooth Optimization

Perturbed Proximal Primal Dual Algorithm for Nonconvex Nonsmooth Optimization Noname manuscript No. (will be inserted by the editor Perturbed Proximal Primal Dual Algorithm for Nonconvex Nonsmooth Optimization Davood Hajinezhad and Mingyi Hong Received: date / Accepted: date Abstract

More information

Joint work with Nguyen Hoang (Univ. Concepción, Chile) Padova, Italy, May 2018

Joint work with Nguyen Hoang (Univ. Concepción, Chile) Padova, Italy, May 2018 EXTENDED EULER-LAGRANGE AND HAMILTONIAN CONDITIONS IN OPTIMAL CONTROL OF SWEEPING PROCESSES WITH CONTROLLED MOVING SETS BORIS MORDUKHOVICH Wayne State University Talk given at the conference Optimization,

More information

Unconstrained minimization of smooth functions

Unconstrained minimization of smooth functions Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and

More information

Complexity bounds for primal-dual methods minimizing the model of objective function

Complexity bounds for primal-dual methods minimizing the model of objective function Complexity bounds for primal-dual methods minimizing the model of objective function Yu. Nesterov July 4, 06 Abstract We provide Frank-Wolfe ( Conditional Gradients method with a convergence analysis allowing

More information

Optimization for Machine Learning

Optimization for Machine Learning Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html

More information

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 5. Convex programming and semidefinite programming

E5295/5B5749 Convex optimization with engineering applications. Lecture 5. Convex programming and semidefinite programming E5295/5B5749 Convex optimization with engineering applications Lecture 5 Convex programming and semidefinite programming A. Forsgren, KTH 1 Lecture 5 Convex optimization 2006/2007 Convex quadratic program

More information

Estimate sequence methods: extensions and approximations

Estimate sequence methods: extensions and approximations Estimate sequence methods: extensions and approximations Michel Baes August 11, 009 Abstract The approach of estimate sequence offers an interesting rereading of a number of accelerating schemes proposed

More information

Largest dual ellipsoids inscribed in dual cones

Largest dual ellipsoids inscribed in dual cones Largest dual ellipsoids inscribed in dual cones M. J. Todd June 23, 2005 Abstract Suppose x and s lie in the interiors of a cone K and its dual K respectively. We seek dual ellipsoidal norms such that

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information