arxiv: v2 [math.oc] 25 Mar 2018

Size: px
Start display at page:

Download "arxiv: v2 [math.oc] 25 Mar 2018"

Transcription

1 arxiv: v [math.oc] 5 Mar 018 Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming Yangyang Xu Abstract Augmented Lagrangian method ALM has been popularly used for solving constrained optimization problems. Practically, subproblems for updating primal variables in the framewor of ALM usually can only be solved inexactly. The convergence and local convergence speed of ALM have been extensively studied. However, the global convergence rate of inexact ALM is still open for problems with nonlinear inequality constraints. In this paper, we wor on general convex programs with both equality and inequality constraints. For these problems, we establish the global convergence rate of inexact ALM and estimate its iteration complexity in terms of the number of gradient evaluations to produce a solution with a specified accuracy. We first establish an ergodic convergence rate result of inexact ALM that uses constant penalty parameters or geometrically increasing penalty parameters. Based on the convergence rate result, we apply Nesterov s optimal first-order method on each primal subproblem and estimate the iteration complexity of the inexact ALM. We show that if the objective is convex, then Oε 1 gradient evaluations are sufficient to guarantee an ε-optimal solution in terms of both primal objective and feasibility violation. If the objective is strongly convex, the result can be improved to Oε 1 logε. Finally, by relating to the inexact proximal point algorithm, we establish a nonergodic convergence rate result of inexact ALM that uses geometrically increasing penalty parameters. We show that the nonergodic iteration complexity result is in the same order as that for the ergodic result. Numerical experiments on quadratically constrained quadratic programming are conducted to compare the performance of the inexact ALM with different settings. Keywords: augmented Lagrangian method ALM, nonlinearly constrained problem, first-order method, global convergence rate, iteration complexity Mathematics Subject Classification: 90C06, 90C5, 68W40, 49M7. 1 Introduction In this paper, we consider the constrained convex programming minimizef 0 x, s.t. Ax = b,f i x 0,i = 1,...,m, 1 x X Y. Xu Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, NY xuy1@rpi.edu

2 Yangyang Xu where X R n is a closed convex set, A and b are respectively given matrix and vector, and f i is a convex function for every i = 0,1,...,m. Any convex optimization problem can be written in the standard form of 1. It appears in many areas including statistics, machine learning, data mining, engineering, signal processing, finance, operations research, and so on. Note that the constraint x X can be equivalently represented by using an inequality constraint ι X x 0 or adding ι X x to the objective, where ι X denotes the indicator function on X. However, we explicitly use it for technical reason. In addition, every affine constraint a j x = b j can be equivalently represented by two inequality constraints: a j x b j 0 and a j x+b j 0. That way does not change theoretical results of an algorithm but will mae the problem computationally more difficult. One popular method for solving 1 is the augmented Lagrangian method ALM, which first appeared in [16, 9]. ALM alternatingly updates the primal variable and the Lagrangian multipliers. At each update, the primal variable is renewed by minimizing the augmented Lagrangian AL function and the multipliers by a dual gradient ascent. The global convergence and local convergence rate of ALM have been extensively studied; see the boos [4, 5]. Several recent wors e.g., [14, 15] establish the global convergence rate of ALM and/or its variants for affinely constrained problems. In the framewor of ALM, the primal subproblem usually can only be solved inexactly, and thus practically inexact ALM ialm is often used. However, to the best of our nowledge, the global convergence rate of ialm for problems with nonlinear inequality constraints still remains open 1. We address this open question in this wor and also establish the iteration complexity of ialm in terms of the number of gradient evaluations. The iteration complexity result appears to be optimal. We will assume composite convex structure on 1. More specifically, we assume f 0 x = gx+hx, where g is a Lipschitz differentiable convex function, and h is a simple possibly nondifferentiable convex function. Also, f i is convex and Lipschitz differentiable for every i [m], namely, there are constants L 0,L 1,...,L m such that gˆx g x L 0 ˆx x, ˆx, x domh X, f i ˆx f i x L i ˆx x, ˆx, x domh X, i [m]. 3a 3b In addition, we assume the boundedness of domh X and denote its diameter as D = maximize ˆx x. ˆx, x domh X 1.1 Augmented Lagrangian function In the literature, there are several different penalty terms used in an augmented Lagrangian AL function, such as the classic one [30, 31], the quadratic penalty on constraint violation [3], and the exponential penalty [34]. The wor [] gives a general class of augmented penalty functions that satisfy certain properties. In 1 Although the global convergence rate in terms of augmented dual objective can be easily shown from existing wors e.g., see our discussion in section 5, that does not indicate the convergence speed from the perspective of the primal objective and feasibility. By simple, we mean the proximal mapping of h is easy to evaluate, i.e., it is easy to find a solution to min x X hx + 1 γ x ˆx for any ˆx and γ > 0.

3 Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming 3 this paper, we use the classic one. As discussed below, it can be derived from a quadratic penalty on an equivalent equality constrained problem. Introducing nonnegative slac variable s i s, one can write 1 to an equivalent form: minimize x X,s 0 f 0x, s.t. Ax = b,f i x+s i = 0,i = 1,...,m. 4 With quadratic penalty on the equality constraints, the AL function of 4 is L β x,s,y,z = f 0 x+y Ax b+ m z i fi x+s i + β Ax b + β m, fi x+s i where y and z are multipliers, and β is the augmented penalty parameter. Minimizing L β with respect to s 0 while fixing x,y and z, we have the optimal s given by [ s i = z ] i β f ix, i = 1,...,m. + Plugging the above s into L β gives where Let L β x,s,y,z = f 0 x+y Ax b+ β Ax b + m ψ β f i x,z i, uv + β ψ β u,v = u, if βu+v 0, v β, if βu+v < 0. 5 Ψ β x,z = and we obtain the classical AL function of 1: m ψ β f i x,z i, L β x,y,z = f 0 x+y Ax b+ β Ax b +Ψ β x,z Inexact augmented Lagrangian method The augmented Lagrangian methodalm was proposed in[16,9]. Within each iteration, ALM first updates the x variable by minimizing the AL function with respect to x while fixing y and z, and then it performs a dual gradient ascent update to y and z. In general, it is difficult to exactly minimize the AL function about x. A more realistic way is to solve the x-subproblem within a tolerance error, which leads to the inexact ALM. Its pseudocode is given in Algorithm 1 below. If ε = 0,, it reduces to the ALM. It is shown in [30] that the augmented dual function 3 d β y,z = min x X L β x,y,z is continuously differentiable, and d β is Lipschitz continuous with constant 1 β. In addition, it turns out that the inexact 3 Although [30] only considers the inequality constrained case, the results derived there apply to the case with both equality and inequality constraints.

4 4 Yangyang Xu Algorithm 1: Inexact augmented Lagrangian method for 1 1 Initialization: choose x 0,y 0,z 0 and {β,ρ } for = 0,1,... do 3 Find x +1 X such that L β x +1,y,z min x X L β x,y,z +ε. 7 4 Update y and z by y +1 = y +ρ Ax +1 b, 8 z +1 i = z i +ρ max z i β,f i x +1,i = 1,...,m. 9 ALM is an inexact augmented dual gradient ascent [31], and thus convergence rate of the inexact ALM in term of d β can be shown from existing results about inexact gradient method [33]. However, it is unclear from there how to get convergence rate result from the perspective of the primal problem, especially as β varies at different updates. Our analysis will be different from this line, and our results will be based on the primal problem. In addition, our requirement on the tolerance errors is weaer than that in [31]. The main results we establish in this paper are summarized as follows. Both ergodic and nonergodic convergence rate results are established. Theorem 1 Summary of main results For a given ε > 0, choose a positive integer K and numbers C β > 0,C ε > 0. Let {x,y,z } K be the iterates generated from Algorithm 1 with parameters set according to one of the follows: C ε 1. ρ = β = C β Kε,ε = ε C β,.. ρ = β = β g σ, for certain β g > 0 and σ > 1 such that β = C β ε, and ε = ε C ε C β,. 3. ρ = β = β g σ, for certain β g > 0 and σ > 1 such that β = C β ε. If f 0 is convex, let ε = Cε 1 β 3 1,, and if f t=0 β 3 0 is strongly convex, let ε = Cε 1 t β 1, t=0 β 1 t Then the averaged point x K = ρ x +1 t=0 ρt is an Oε-optimal solution see Definition 1, where the hidden constant depends on C β,c ε and dual solution y,z, and for the second and third settings, the actual point x K is also an Oε-optimal solution. In addition, the total number of evaluations on g and f i,i = 1,...,m is Oε 1 if f 0 is convex and Oε 1 logε if f 0 is strongly convex. The formal statement and the hidden constants are shown in Theorem 5 for the first setting, in Theorems 6 and 9 for the second setting, and in Theorems 7 and 9 for the third setting. 1.3 Contributions The contributions of this paper are mainly on establishing global convergence rate and estimating iteration complexity of ialm for solving 1, in both ergodic and nonergodic sense. They are listed as follows.

5 Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming 5 We first establish an ergodic convergence rate result of ialm through a novel analysis. With penalty parameters fixed to constant or increased geometrically, we choose the tolerance errors accordingly. By applying Nesterov s optimal first-order method to each x-subproblem, we show that to reach an ε-optimal solution, Oε 1 gradient evaluations are sufficient if the objective is convex, and the order is improved to Oε 1 logε if the objective is strongly convex. For the convex case, the result is optimal, and for the strongly convex, the result also appears the best compared to existing ones in the literature; see the discussions in Remar. We note that if ialm only runs to one iteration, i.e., a single x-subproblem 7 is solved, then the algorithm reduces to the penalty method by setting initial multipliers to zero vectors. Hence, as a byproduct, we establish the iteration complexity result of the penalty method for solving 1. The wor [19] analyzes first-order penalty method for solving affinely constrained problems. To have an ε-optimal solution, Oε gradient evaluations are required, if the penalty method is applied to the original problem, and Oε 1 logε gradient evaluations are required if the penalty method is applied to a carefully perturbed problem. Hence, our result is better than the former case by an order Oε 1 and the latter case by O logε. By relating to inexact proximal point algorithm ippa, we then establish a nonergodic convergence rate result of ialm through applying existing results in [3] about ialm. We show that with geometrically increasing penalty parameters, the nonergodic convergence of ialm enjoys the same order as that of the ergodic convergence, and the constant is just a few times larger. Compared to one recent nonergodic convergence result [1] for solving affinely constrained problem, our result is better by an order Oε Notation For simplicity, throughout the paper, we focus on a finite-dimensional Euclidean space, but our analysis can be directly extended to a general Hilbert space. We use italic letters a,c,b,l,..., for scalars, bold lower-case letters x,y,z,... for vectors, and bold upper-case letters A,B,... for matrices. z i denotes the i-th entry of a vector z. We use 0 to denote a vector of all zeros, and its size is clear from the context. [m] denotes the set {1,,...,m} for any positive integer m. Given a real number a, we let [a] + = max0,a and a be the smallest integer that is no less than a. For a vector a, [a] + taes the positive part of a in a component-wise manner. a denotes the Euclidean norm of a vector a and A the spectral norm of a matrix A. We denote l as the vector consisting of L i,i [m], where L i is the Lipschitz constant of f i in 3b. Also we let f be the vector function with f i as the i-th component scalar function. That is l = [L 1,...,L m ], fx = [f 1 x,...,f m x]. 10 Given a convex function f, fx represents one subgradient of f at x, namely, fˆx fx+ fx,ˆx x, ˆx, and fx denotes its subdifferential, i.e., the set of all subgradients. When f is differentiable, we simply write its subgradient as fx. For a convex set X, we use ι X as its indicator function, i.e., { 0, if x X, ι X x = +, if x X, and N X x = ι X x as its normal cone at x X. Given an ε > 0, the ε-optimal solution of 1 is defined as follows.

6 6 Yangyang Xu Definition 1 ε-optimal solution Let f0 called an ε-optimal solution to 1 if be the optimal value of 1. Given ε 0, a point x X is f 0 x f 0 ε, and Ax b + [fx]+ ε. 1.5 Outline The rest of the paper is organized as follows. In section, we give a few preparatory results and review Nesterov s optimal first-order method for solving a composite convex program. An ergodic convergence rate result of ialm is given in section 3, and a nonergodic convergence rate result is shown in section 4. Iteration complexity results in terms of the number of gradient evaluations are established for both ergodic and nonergodic cases. Related wors are reviewed and compared in section 5, and numerical results are provided in section 6. Finally section 7 concludes the paper. Preliminary results and Nesterov s optimal first-order method In this section, we give a few preliminary results and also review Nesterov s optimal first-order method for composite convex programs..1 Basic facts A point x, y, z satisfies the Karush-Kuhn-Tucer KKT conditions for 1 if m 0 f 0 x+n X x+a y+ z i f i x, Ax = b, x X, z i 0, f i x 0, z i f i x = 0, i [m]. 11a 11b 11c From the convexity of f i s, if x,y,z is a KKT point, then m f 0 x f 0 x + y,ax b + zif i x 0, x X. 1 The result below will be used to establish convergence rate of Algorithm 1. Lemma 1 Assume x,y,z satisfies the KKT conditions in 11. Let x be a point such that for any y and any z 0, m f 0 x f 0 x +y A x b+ z i f i x α+c 1 y +c z, 13 where α and c 1,c are nonnegative constants independent of y and z. Then α+4c 1 y +4c z f 0 x f 0 x α, 14 A x b + [f x]+ α+c1 1+ y +c 1+ z. 15

7 Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming 7 Proof. Letting y = 0 and z = 0 in 13 gives the second inequality in 14. For any nonnegative γ y and γ z, we let A x b y = γ y A x b, z = γ [f x] + z [f x]+ and have from 13 by using the convention 0 0 = 0 that Noting we have from 1 and 16 that f 0 x f 0 x +γ y A x b +γ z [f x]+ α+c1 γ y +c γ z. 16 y,a x b y A x b, m zi f i x z [f x]+, 17 γ y y A x b +γ z z [f x] + α+c1 γ y +c γ z In the above inequality, letting γ y = 1 + y and γ z = 1 + z gives 15, and letting γ y = y and γ z = z gives the first inequality in 14 by 1 and 17.. Nesterov s optimal first-order method In this subsection, we review Nesterov s optimal first-order method for composite convex programs. The method will be used to approximately solve x-subproblems in Algorithm 1. It aims at finding a solution of the following problem minimizeφx+ψx, 18 x where φ is a Lipschitz differentiable and strongly convex function with gradient Lipschitz constant L φ and strong convexity modulus µ 0, and ψ is a simple possibly nondifferentiable closed convex function. Algorithm summarizes the method. Here, for simplicity, we assume L φ and µ are nown. The method does not require the value of L φ but can estimate it by bactracing. In addition, it only requires a lower estimate of µ; see [7] for example. The theorem below gives the convergence rate of Algorithm for both convex i.e., µ = 0 and strongly convex i.e., µ > 0 cases; see [1,6,7]. We will use the results to estimate iteration complexity of ialm. Theorem Let {x } be the sequence generated from Algorithm. Assume x to be a minimizer of 18. The following results holds: 1. If µ = 0 and α 0 = 1, then. If µ > 0 and α 0 = µ L φ, then φx +ψx φx ψx L φ x 0 x φx φx L φ +µ x 0 x, µ 1, 1. 0 Lφ

8 8 Yangyang Xu Algorithm : Nesterov s optimal first-order method for 18 1 Initialization: choose ˆx 0 = x 0, α 0 0,1], and let q = µ L φ ; for = 0,1,..., do 3 Let 4 Set x +1 = argmin φˆx,x + L φ x x ˆx +ψx. and q α q + α +4α α +1 =, ˆx +1 = x +1 + α 1 α α +α x +1 x Ergodic convergence rate and iteration complexity results In this section, we first establish an ergodic convergence rate result of Algorithm 1. From that result, we then specify algorithm parameters and estimate the total number of gradient evaluations in order to produce an ε- optimal solution. Two different settings of the penalty parameters are studied: one with constant penalty and another with geometrically increasing penalty parameters. For each setting, the tolerance error parameter ε is chosen in an optimal way so that the total number of gradient evaluation is minimized. Throughout this section, we mae the following assumptions. Assumption 1 There exists a point x,y,z satisfying the KKT conditions in 11. Assumption For every, there is x +1 satisfying 7. The first assumption holds if a certain regularity condition is satisfied, such as the Slater condition namely, there is an interior point x of X such that Ax = b and f i x < 0, i [m]. The second assumption is for the well-definedness of the algorithm. It holds if X is compact and f i s are continuous. 3.1 Convergence rate analysis of ialm To show the convergence results of Algorithm 1, we first establish a few lemmas. Lemma Let y and z be updated by 8 and 9 respectively. Then for any, it holds 1 [ y +1 y y y + y +1 y ] y +1 y,r +1 = 0, ρ 1 1 [ z +1 z z z + z +1 z ] m z +1 i z i max z i,f i x +1 = 0, ρ β where r = Ax b. Proof. Using the equality u v = u u v + v, we have the results from the updates 8 and 9.

9 Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming 9 Lemma 3 For any z 0, we have m [z i +β f i x +1 m ] + z i fi x +1 z +1 i z i max z i β,f i x +1 1 ρ β ρ z +1 z. 3 Proof. Denote Then I + = {i [m] : z i +β f i x +1 0}, I = [m]\i +. 4 the left hand side of 3 = [ z i z i f i x +1 +β [f i x +1 ] zi +ρ f i x +1 z i fi x +1 ] i I + + i I [ z i f i x +1 zi ρ zi z ] z i i β β =β ρ [f i x +1 ] + i I + β ρ i I + i I [f i x +1 ] + 1 β = 1 ρ β ρ z +1 z, [ ] z i fi x +1 + z i 1 + β β β ρ zi β ρ zi i I where the inequality follows from z i 0 and f i x +1 + z i β 0, i I, and the last equality holds due to the update 9. The next theorem is a fundamental result by running one iteration of Algorithm 1. Theorem 3 One-iteration progress of ialm Let {x,y,z } be the sequence generated from Algorithm 1. Then for any x X such that Ax = b and f i x 0, i [m], any y, and any z 0, it holds that f 0 x +1 f 0 x+y r +1 + m z i f i x +1 + β ρ r +1 + β ρ ρ z +1 z + 1 y +1 y + 1 z +1 z ρ ρ 1 y y + 1 z z +ε. 5 ρ ρ Proof. From 7, it follows that for any x such that Ax = b, f 0 x +1 + y,r +1 + β r+1 +Ψ β x +1,z f 0 x Ψ β x,z ε. 6

10 10 Yangyang Xu Since y,r +1 = y +1 y,r +1 + y,r +1 ρ r +1, by adding 1 and to the above inequality, we have m m f 0 x +1 f 0 x+y r +1 + z i f i x +1 + [z i +β f i x +1 ] + z i fi x +1 + β ρ r +1 +Ψ β x +1,z m [zi +β f i x +1 ] + f i x +1 Ψ β x,z + 1 ρ [ y +1 y y y + y +1 y ] + 1 ρ [ z +1 z z z + z +1 z ] ε. Note that Ψ β x +1,z = i I + = m [zi +β f i x +1 ] + f i x +1 m z +1 i [ zi f ix +1 + β ] [f ix +1 ] [zi +β f i x +1 ]f i x +1 i I + β [f ix +1 ] i I z i β z i max z i β,f i x +1 + ] [ z i β = β ρ z +1 z, 8 z where the sets I+ and I are defined in 4. In addition, if f ix 0, i [m], then Ψ β x,z 0. Hence, plugging 3 and 8 into 7 yields 5. By Lemma 1 and Theorem 3, we have the following convergence rate estimate of Algorithm 1. Theorem 4 Ergodic convergence rate of ialm Under Assumptions 1 and, let {x,y,z } be the sequence generated from Algorithm 1 with y 0 = 0,z 0 = 0 and 0 < ρ β,. Then f0 x K f 0 x 1 t=0 ρ t i I y + z + ρ ε 7, 9a A x K b + [f x K ] y t=0 ρ + 1+ z + ρ ε, 9b t where x K = t=0 ρ tx t+1 t=0 ρ. 30 t

11 Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming 11 Proof. Since ρ β, the two terms β ρ r +1 and β ρ z +1 z are nonnegative. Dropping them ρ and multiplying ρ to both sides of 5 yields ] m ρ [f 0 x +1 f 0 x+y r +1 + z i f i x y+1 y + 1 z+1 z 1 y y + 1 z z +ρ ε. 31 Summing up 5 with x = x gives ρ [f 0 x +1 f 0 x +y r +1 + ] m z i f i x yk y + 1 zk z 1 y0 y + 1 z0 z + ρ ε. 3 By the convexity of f i s and the nonnegativity of z, we have which together with 3 implies f 0 x K f 0 x +y A x K b+ m z i f i x K 1 t=0 ρ ρ [f 0 x +1 f 0 x +y r +1 + t ] m z i f i x +1, m f 0 x K f 0 x +y A x K b+ z i f i x K 1 1 t=0 ρ t y + 1 z + ρ ε. The results thus follow from Lemma 1 with α = ρ ε ρ, c 1 = 1 ρ, c = 1 ρ, and we complete the proof. Remar 1 Notethatifρ ρ > 0, and{ε }issummable,then asublinearconvergenceresultfollowsfrom 9. The wor [31] has also analyzed the convergence of Algorithm 1 through the augmented dual function d β. However, it requires =1 ε <, which is strictly stronger than the condition =1 ε <.

12 1 Yangyang Xu 3. Iteration complexity of ialm In this subsection, we apply Nesterov s optimal first-order method to each x-subproblem 7 and estimate the total number of gradient evaluations to produce an ε-optimal solution of 1. Note that the convergence rate results in Theorem 4 do not assume specific structures of 1 except convexity. If the problem 1 has richer structures than those in 3, more efficient methods can be applied to the subproblems in 7. The following results are easy to show from the Lipschitz differentiability of g and f i, i [m]. Proposition 1 Assume 3a,3b,and the boundedness of domh X. Then there exist constantsb 1,...,B m such that Let the smooth part of L β be denoted as max f i x, f i x B i, x domh X, i [m], 33a f i ˆx f i x B i ˆx x, ˆx, x domh X, i [m]. 33b F β x,y,z = L β x,y,z hx. Based on 33, we are able to show Lipschitz continuity of x F β x,y,z with respect to x for every y,z. Lemma 4 Assume 3a, 3b, and the boundedness of domh X. Let B i s be given in Proposition 1. Then x F β x,y,z is Lipschitz continuous on domh X in terms of x with constant Lz = L 0 +β A A + m β B i B i +L i +L i zi. 34 Proof. For ease of description, we let β = β and y,z = y,z in the proof. First we notice that u ψ βu,v = [βu+v] +, and thus for any v, u ψ βû,v u ψ βũ,v β û ũ, û,ũ. Let h i x,z i = ψ β f i x,z i, i = 1,...,m. Then x h i ˆx,z i x h i x,z i = u ψ βf i ˆx,z i f i ˆx u ψ βf i x,z i f i x u ψ βf i ˆx,z i f i ˆx u ψ βf i x,z i f i ˆx + u ψ βf i x,z i f i ˆx u ψ βf i x,z i f i x β f i ˆx f i x f i ˆx + u ψ βf i x,z i fi ˆx f i x βb i ˆx x +L iβb i + z i ˆx x.

13 Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming 13 Hence, x F β ˆx,y,z x F β x,y,z m gˆx g x +β A Aˆx x + x h i ˆx,z i x h i x,z i m L 0 +β A [ A + βb i +L i βb i + z i ] ˆx x, which completes the proof. Therefore, letting φx = F β x,y,z and ψx = hx + ι X x, we can apply Nesterov s optimal first-order method in Algorithm to find x +1 in 7. From Theorem, we have the following results. Note that if the strong convexity constant µ = 0, the problem is just convex. Lemma 5 Assume that g is strongly convex with modulus µ 0. Given ε > 0, if we start from x and run Algorithm, then at most t iterations are needed to produce x +1 such that 7 holds, where distx,x Lz, if µ = 0, ε t = log Lz +µ ε [distx,x ] log1/ 1 µ Lz and X denotes the set of optimal solutions to min x X L β x,y,z., if µ > 0, Below we specify the sequences {β },{ρ } and {ε } for a given ε > 0, and through combining Theorem 4 and Lemma 5, we give the iteration complexity results of ialm for producing an ε-optimal solution. We study two cases. In the first case, a constant penalty parameter is used, and in the second case, we geometrically increase β and ρ. Given ε > 0, we set {β } and {ρ } according to one of the follows: Setting 1 constant penalty Let K be a positive integer number and C β a positive real number. Set 35 ρ = β = C β, 0 < K. Kε Setting geometrically increasing penalty Let K be a positive integer number, C β a positive real number, and σ > 1. Set β g = C β σ 1 ε σ K 1, 36 and ρ = β = β g σ, 0 < K.

14 14 Yangyang Xu Note that if K = 1, the above two settings are the same, and in this case, Algorithm 1 simply reduces to the quadratic penalty method. For either of the above two settings, we have ρ = C β ε. From 34, we see that the Lipschitz constant depends on z. Hence, from 35, to solve the x-subproblem to the accuracy ε, the number of gradient evaluations will depend on z. Below we show that if ε is sufficiently small, z can be bounded and thus so is Lz. Lemma 6 Let {x,y,z } K be the sequence generated from Algorithm 1 with {β } and {ρ } set according to either Setting 1 or Setting. If y 0 = 0,z 0 = 0, and ε s are chosen such that then ρ ε C ε, 37 where and l is given in 10. H = A A + m Lz L +β H, 0 K, 38 B i B i +L i, L = L 0 + l y + z + C ε Proof. Letting x,y,z = x,y,z in 31 and using 1, we have 1 y+1 y + 1 z+1 z 1 y y + 1 z z +ρ ε. Summing the above inequality yields which implies 1 y y + 1 z z 1 y0 y z0 z + ρ t ε t, 0 K, t=0 1 z z + y0 y + z 0 z + ρ t ε t. Since u u 1 for any vector u, we have from the above inequality that z z + y 0 y + z 0 z + ρ t ε t, 0 K. 39 Hence, if y 0 = 0 and z 0 = 0, and 37 holds, it follows from the above inequality that t=0 z y + z + C ε, 0 K, 40 By the Cauchy-Schwartz inequality, we have from 34 that for any 0 K, Lz L 0 +β H + z l t=0

15 Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming 15 which together with 40 gives the result in 38. Optimal subproblem accuracy parameters. If t gradient evaluations are required to produce x +1, then the total number of gradient evaluations is T K = t to generate {x } K =1. Given ε > 0, and {β },{ρ } set according to either Setting 1 or Setting, we can choose {ε } to minimize T K subject to the condition in 37. When µ = 0, we solve the following problem: minimize ε>0 distx,x Lz, s.t. ε β ε C ε, where ε = [ε 0,...,ε ]. Through formulating the KKT system of the above problem, one can easily find the optimal ε given by ε = C ε When µ > 0, we solve the problem below: minimize ε>0 whose optimal solution is given by [distx,x ] 3[Lz ] 1 3 β 3 t=0 β 1 3 t [distx t,x t ] 3[Lz t ] 1 3 Lz Lz +µ log [distx,x µ ε ], s.t. ε = C ε, 0 < K. 41 β ε C ε, 4 Lz, 0 < K. 43 β t=0 Lzt Note that the summand in the objective of 4 is not exactly the same as that in the second inequality of 35. They are close if µ Lz since log1+a = a+oa. The optimal ε given in 41 and 43 depends on distx,x and the future points z+1,...,z, which are unnown at iteration. We do not assume these unnowns. Instead, we set ε in two different ways. One way is to simply set for both cases of µ = 0 and µ > 0. Another way is to let ε = C ε C ε ε = ε, 0 < K, 44 C β 1 β 1 3 t=0 β 3 t, 0 < K, 45 for the case of µ = 0, and ε = C ε 1, 0 < K, 46 β t=0 βt for the case of µ > 0. We see that if β H dominates L and distx,x is roughly the same for all s, then {ε } in 45 and 46 well approximate those in 41 and 43. If {β } and {ρ } are set according to Setting 1, i.e., constant parameters, then the {ε } in both 45 and 46 is constant as in 44. Plugging these parameters into 35, we have the following estimates on the total number of gradient evaluations.

16 16 Yangyang Xu Theorem 5 Iteration complexity with constant penalty and constant error For any given ε > 0, let K be a positive integer number and C β,c ε two positive real numbers. Set {β } and {ρ } according to Setting 1 and {ε } by 44. Let x K be given in 30. Then f 0 x K f 0 x ε y + z + ε C β A x K b + [f x K ] + ε[ 1+ y +1+ z ] + ε C β C ε C β, C ε C β. 47a 47b Assume µ L0 4. Then Algorithm 1 can produce xk by evaluating gradients of g,f i, i [m] in at most T K times, where Cβ L T K DK ε + 1 Cβ H +K, if µ = 0, 48 ε K C ε and T K K L µ + C β H µkε D C β L +µ log + C βh C ε ε Kε +K, if µ > Proof. The results in 47 directly follows from Theorem 4 and the settings of {β }, {ρ }, and {ε }. For the total number of gradient evaluations, we use the inequalities in 35. First, for the case of µ = 0, from the first inequality of 35 and the parameter setting, it follows that the total number of gradient evaluations T K distx,x L + C βh Kε +K. 50 ε/ Cε /C β Since a+b a+ b for any two nonnegative numbers a,b, we have from the above inequality and by noting distx,x D that T K D Cβ C ε C L + β H Kε ε +K = DK Cβ C ε L ε + 1 Cβ H +K, ε K which gives 48. For the case of µ > 0, we first note that for any 0 < a 1, it holds log1+a a a µ L0 4, we have µ Lz 4 and log µ/lz 1 µ/lz 1 1 µ/lz = log Therefore, µ/lz 1 1 µ/lz µ/lz 1 µ/lz, a. Hence, if and thus 1 log 1 1 µ/lz Lz µ 1 µ/lz Lz µ. 51

17 Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming 17 Using the above inequality and the second inequality of 35, we have that the total number of gradient evaluations L + C βh Kε L + C βh Kε T K log +µ [distx,x µ εc ε /C ] +K. 5 β Since L + C βh Kε C L + β H Kε and distx,x D, the above inequality implies 49. This completes the proof. We mae two observations below about the results in Theorem 5. Remar From the error bounds in 47, we see that if C β max 4 y +4 z,1+ y +1+ z +C ε, 53 then x K is an ε-optimal solution. Otherwise, the errors in 47 are multiples of ε. If we represent ε by the total number t of gradient evaluations, we can obtain the convergence rate result in terms of t. Let C β = C ε L and K = 1 in 48. Then the total number of gradient evaluations is about t = D ε + ε 1 Cβ H. By quadratic formula, one can easily show that ε = D L + L D +Dt C β H t 4L D t + 4D Cβ H. t Let ˆx t = x K to specify the dependence of the iterate on the number of gradient evaluations. Plugging the above ε into 47, we have f 0 ˆx t f 0 x y + z + 1 4L D C β t + 4D Cβ H, 54a t Aˆx t b + [fˆx t ] + 1+ y +1+ z + 1 4L D C β t + 4D Cβ H. 54b t If there are no equality or inequality constraints, then H = 0, y = 0,z = 0, and the rate of convergence in 54a matches with the optimal one in 19; if the objective f 0 x 0 and there areno inequality constraints, thenh = A A,y = 0,z = 0,L = 0,andthe rateofconvergencewith C β = in54broughlybecomes Aˆx t b 8 D A A t, whose order is also optimal. Therefore, the order of convergencerate in 54 is optimal, and so is the iteration complexity result in 48. For the strongly convex case, if there are no equality or inequality constraints, the iteration complexity result in 49 is optimal by comparing it to 0. With the existence of constraints and nonsmooth term in the objective, the order 1/ ε also appears to be optimal and is the best we can find in the literature; see the discussion in section 5.

18 18 Yangyang Xu Remar 3 From both 48 and 49, we see that T 1 T K, K 1, i.e., K = 1 is the best. Note that if y 0 = 0,z 0 = 0, and K = 1, Algorithm 1 reduces to the quadratic penalty method by solving a single penalty problem. However, practically K > 1 could be better since distx,x usually decreases as increases. Hence, from 50 or 5, T K can be smaller than T 1 if K > 1; see our numerical results in Table 1. Theorem 6 Iteration complexity with geometrically increasing penalty and constant error For any given ε > 0, let K be a positive integer number and C β,c ε two positive real numbers. Set {β } and {ρ } according to Setting and {ε } to 44. Assume µ L0 4. Let xk be given in 30. Then the inequalities in 47 hold, and Algorithm 1 can produce x K by evaluating gradients of g,f i, i [m] in at most T K times, where and where T K T K D Cβ C ε G ε K K L ε + Cβ Hσ 1 ε +K, if µ = 0, 55 σ 1 L µ + H Cβ σ 1 +K, if µ > µ ε σ 1 G ε = log C βd +log L +µ+ H C β σ 1+β g ε. εc ε σε Proof. When µ = 0, we have from the first inequality in 35 that the total number of gradient evaluations satisfies distx,x T K L +β H +K. 57 ε Plugging into 57 the ε given in 44 and noting distx,x D yields T K D Cβ C ε Note that β = σ β K 1 g σ 1. From 36, it holds and thus σ K 1 Cβ σ 1 β gε. Therefore, σ K = C βσ 1 β g ε and using L +β H L + β H, we have L +β H L +β H ε +K , 59 β Cβ σ 1 ε σ 1, 60 L + β H K L + Cβ Hσ 1 ε σ 1, 61

19 Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming 19 which together with 58 gives 55. For the strongly convex case, we use 51 and the second inequality of 35 to have T K L +β H L +β H +µ log [distx,x µ ε ] +K. 6 Since distx,x D and ε s are set to those in 44, the above inequality indicates For 0 < K, T K L +β H log C βd L +β H +µ +K. 63 µ εc ε β β = β g σ = β g 59 σk = β g Cβ σ 1 +1 σ σ β g ε = C βσ 1+β g ε. 64 σε Plugging into 63 the second inequality in 61 and the above bound on β, we have 56 and thus complete the proof. Remar 4 Comparingthe iteration complexity results in Theorems 5 and 6, we see that ifk = 1, the number T K in either case of µ = 0 or µ > 0 is the same for both penalty parameter settings as σ. That is because when K = 1, ialm with either of the two settings reduces to the penalty method. If K > 1, the number T K for the setting of geometrically increasing penalty can be smaller than that for the constant parameter setting as σ is big; see numerical results in section 6. Theorem 7 Iteration complexity with geometrically increasing penalty and adaptive error For any given ε > 0, let K be a positive integer number and C β,c ε two positive real numbers. Set {β } and {ρ } according to Setting. Assume µ L0 4. If µ = 0, set {ε } as in 45, and if µ > 0, set {ε } as in 46. Let x K be given in 30. Then the inequalities in 47 hold, and Algorithm 1 can produce x K by evaluating gradients of g,f i, i [m] in at most T K times, where T K D Cβ L C ε ε σ 1 1 σ 1 6 1σ HCβ σ 1 + +K, if µ = 0, 65 εσ and where T K G ε K L µ + H Cβ σ 1 +K, if µ > µ ε σ 1 G ε = log C βd +log L +µ+ H C β σ 1+β g ε σ 1 +β g εσ 1/C β +log εc ε σε σ. σ

20 0 Yangyang Xu Proof. For the case of µ = 0, we have 57, plugging into which the ε given in 45 yields T K distx,xβ 1 6 L +β H 1 +K. Cε t=0 β 3 t Since distx,x D, the above inequality implies T K D Note that and t=0 β 1 6 L +β H 1 Hence, it follows from 67 that Cε β 3 t = T K D β 1 σ K g Cε σ 3 1 t=0 β 3 t t=0 β 1 6 L + β H β 1 6 L +β H 1 +K. 67 β 3 g σ t 3 = β 1 σ K g σ 3 1, L β 1 σ K g = L From 59 and the fact a+b a+ b, a,b 0, it follows that σ K Cβ σ β g ε 3, σ K β H = L β 1 σ K g σ Hβ 3 g β 3 σ K 3 1 σ 3 1. σ Hβ σ K g +K. 68 σ Cβ σ 1 β g ε Therefore, plugging the two inequalities in 69 into 68 yields 65. Forthe caseofµ > 0, we have6. Since distx,x D and ε s areset to those in 46, the inequality in 6 indicates T K = L +β H µ L +β H µ log log β t=0 L +β H +µ βt D +K C ε β D t=0 βt +log L +β H +µ C ε +K. 70 Therefore, plugging into 70 the inequality in 60, the upper bounds of L +β H and β in 61 and 64 respectively, we obtain 66 and complete the proof.

21 Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming 1 Remar 5 Let us compare the iteration complexity results in Theorems 6 and 7. We see that for the case L HCβ of µ = 0, as K > 1 and σ is big, if ε dominates ε, the iteration complexity result in Theorem 7 is HCβ L ε, the two better than that in Theorem 6 see the numerical results in Table, and if ε dominates results are similar. For the case of µ > 0, as K > 1, the iteration complexity result in Theorem 6 is better than that in Theorem 7. 4 Nonergodic convergence rate and iteration complexity In this section, we show a nonergodic convergence rate result of Algorithm 1, by employing the relation between ialm and the inexact proximal point algorithm ippa. Throughout this section, we assume there is no affine equality constraint in 1, i.e., we consider the problem minimizef 0 x, s.t. f i x 0, i [m], 71 x X where f i,i = 0,1,...,m, satisfy the assumptions through 3b. We do not include affine equality constraints for the purpose of directly applying existing results in [30, 3]. Although results similar to those in [30,3] can possibly be shown for the equality and inequality constrained problem 1, we do not extend our discussion but instead formulate any affine equality constraint a x = b by two affine inequality constraints a x b 0 and a x+b 0 if there is any. 4.1 Relation between ialm and ippa Let L 0 x,z be the Lagrangian function of 71, namely, L 0 x,z = f 0 x+ m z i f i x, and let L β x,z be the augmented Lagrangian function of 71, defined in the same way as that in 6. In addition, let d 0 z be the Lagrangian dual function, defined as { minx X L 0 x,z, if z 0, d 0 z =, otherwise, and let d β z min x X L β x,z be the augmented Lagrangian dual function. Applying Algorithm 1 with ρ = β to 71, we have iterates {x,z } that satisfy: L β x +1,z d β z +ε, z +1 = z +β z L β x +1,z. 7a 7b The ippa applied to the Lagrangian dual problem max z d 0 z iteratively performs the updates: z +1 M β z, 73

22 Yangyang Xu where the operator M β is the proximal mapping of βd 0, defined as M β z = argmaxd 0 u 1 u β u z. In 73, the approximation could be measured by the objective error as in 7a or by the gradient norm at the returned point z +1 ; see [13] for example. It was noted in [30] that d β z = max d 0u 1 u β u z, 74 and in addition, if ˆx X satisfies L β ˆx,z d β z+ε, then z+β z L β ˆx,z Mz ialm with updates in 7 reduces to ippa in 73 with approximation error βε. Therefore, z +1 M β z β ε Nonergodic convergence rate of ialm For ialm with updates in 7 on solving 71, [3, Theorem 4] establishes the following bounds on the objective error and feasibility violation: If in 7a, ε = 0,, [1, Theorem.] shows that f 0 x +1 f 0 x ε + z z +1 β, 76a f i x +1 z i z+1 i β, i [m]. 76b z z +1 β z0 z t=0 β. 77 t Therefore, combining the results in 76 with ε = 0, and 77, and also noting the boundedness of z from 40, one can easily obtain a nonergodic convergence rate result of exact ALM on solving 71. However, if ε > 0, we do not notice any existing result on estimating z z +1 β. In the following, we establish a bound on this quantity and thus show a nonergodic convergence rate result of ialm. Lemma 7 Given a positive integer K and a nonnegative number C ε, choose positive sequences {β } and {ε } such that β ε Cε. Let {x,z } K be the sequence generated from the updates in 7 with z 0 = 0 on solving 71. Then where we have assumed that 71 has a primal-dual solution x,z. z z +1 5 z + 7 Cε, 78

23 Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming 3 Proof. Let z +1 = M β z. Then from 74, it follows that 1 β z +1 z = d 0 z +1 d β z. 79 By the wea duality, it holds d 0 z +1 f 0 x. From 7a, we have Recall the definition of ψ β u,v in 5 and note ψ β u,v v β. Hence, Thus by 80, it holds that and Hence, L β x +1,z = f 0 x +1 + d β z L β x +1,z ε. 80 m d β z f 0 x +1 z ψ β f i x +1,z i f 0 x +1 z β. β ε, d 0 z +1 d β z f 0 x f 0 x +1 + z Since x,z is a primal-dual solution of 71, similar to 1, it holds that f 0 x f 0 x + f 0 x f 0 x +1 Therefore, from 81 and the above inequality, it follows that and noting 79, we have β +ε. 81 m zif i x 0, x X. 8 m zi f ix +1 76b z z +1 z. d 0 z +1 d β z z z +1 β z + z β +ε, z +1 z = β d0 z +1 d β z z z +1 z + z +β ε. By the triangle inequality, we have from 75 and the above inequality that z z +1 β z z +1 z + z +β ε + z + z +1 z + z + 3 z + z +1 β ε β ε + z + z + 3 Cε, 83

24 4 Yangyang Xu where the last inequality uses the Young s inequality and the fact β ε Cε. Through the same arguments as those in Lemma 6, one can show z z + C ε, 0 K, 84 i.e., the inequality in 40 with y = 0. Plugging the above bound on z into 83 gives the desired result in 78. Combining 76 and 78, we are able to establish the nonergodic convergence rate result of ialm on solving 71. Theorem 8 nonergodic convergence rate Under the same assumptions of Lemma 7, it holds that for any 0 < K, f0 x +1 f 0 x ε + z + C ε β 5 z + 7 Cε, 85a [fx +1 ] z + 7 Cε. 85b β Proof. Directly from 76, 78, and 84, we obtain 85b and Using 8, we have from 85b that f 0 x +1 f 0 x ε + z + C ε β f 0 x +1 f 0 x z β 5 z + 7 Cε z + 7 Cε, which together with 86 gives 85a. Remar 6 From the results in 85, we see that to have {x } to be a minimizing sequence of 71, we need β and ε 0 as. Hence, setting {β } to a constant sequence will not be a valid option. 4.3 Iteration complexity In this subsection, we set parameters according to Setting, and we estimate the iteration complexity of ialm on solving 71 by applying Nesterov s optimal first-order method to 7a. Again, note that the results in Theorem 8 do not need specific structure of 71 except convexity. Hence, if the problem has richer structures, one can apply more efficient methods to find x +1 that satisfies 7a. Theorem 9 Nonergodiciteration complexity Given apositive integer K and positive numbers C β,c ε, choose positive sequences {ρ } and {β } according to Setting. In addition, choose {ε } according to 44 for both cases of µ = 0 and µ > 0, or choose {ε } according to 45 for the case of µ = 0 and 46 for µ > 0.

25 Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming 5 Let {x,z } K be the sequence generated from Algorithm 1 with y = 0,, and z 0 = 0 on solving 71. Then f 0 x K f 0 x ε C ε C β + [fx K ] + εσ C β σ 1 εσ z C β σ 1 + C ε 5 z + 7 Cε, 87a 5 z + 7 Cε. 87b If {ε } is chosen according to 44 for both cases of µ = 0 and µ > 0, the total number T K of gradient evaluations is given in 55 and 56 respectively; if {ε } is set according to 45 for the case of µ = 0 and 46 for µ > 0, then T K is given in 65 for µ = 0 and 66 for µ > 0. Proof. Note that β is increasing with respect to. Hence, the ε given in both 45 and 46 is decreasing, and thus t=0 ε β tε t t=0 β ε C ε. t C β If {ε } is chosen according to 44 for both cases of µ = 0 and µ > 0, then the above bound on ε obviously holds. In addition, from 64, we have β C βσ 1. εσ Therefore, plugging into 85 the bounds on ε and β gives the desired results in 87. The bounds on the total number T K of gradient evaluations follow from the same arguments as in the proofs of Theorems 6 and 7. Hence, we complete the proof. Remar 7 From the results in 87, we see that if C β C ε + σ z σ 1 + C ε 5 z + 7 Cε, 88 then x K is an ε-optimal solution to 71. If z 1, C ε = z, and σ σ 1 1 e.g., σ = 10 is often used, then the C β in 88 is roughly 10 times of that in 53 by assuming no affine constraint. For the iteration L complexity, if ε dominates H z ε, then the nonergodic result is roughly 10 times of the ergodic result H z for both convex and strongly convex cases. If ε dominates, then the former would be roughly 10 times of the latter for the convex case, but still roughly 10 times for the strongly convex case. However, in either case, both ergodic and nonergodic results have the same order of complexity. 5 Related wors and comparison with existing results In this section, we review related wors and compare them to our results. Our review and comparison focus on convex optimization, but note that ALM has also been popularly applied to nonconvex optimization problems; see [4 6] and the references therein.

26 6 Yangyang Xu Affinely constrained convex problems Several recent wors have established the convergence rate of ALM and its inexact version for affinely constrained convex problems: minimizef 0 x, s.t. Ax = b. 89 x X Assuming exact solution to every x-subproblem, [14] first shows O1/ convergence of ALM for smooth problems in terms of dual objective and then accelerates the rate to O1/ by applying Nesterov s extrapolation technique to the multiplier update. The results are extended to nonsmooth problems in [18] that uses similar technique. By adapting parameters, [35] establishes O1/ convergence of a linearized ALM in terms of primal objective and feasibility violation. The linearized ALM allows linearization to smooth part in the objective but still assumes exact solvability of x-subproblems. When the objective is strongly convex, [17] proves O1/ convergence of ialm with extrapolation technique applied to the multiplier update. It requires summable error and subproblems to be solved more and more accurately. However, it does not give an estimate on the total number of gradient evaluations on solving all subproblems to the required accuracies. For smooth linearly constrained convex problems, [0] analyzes the iteration complexity of the ialm. It applies Nesterov s optimal first-order method to every x-subproblem and shows that Oε 7 4 gradient evaluations are required to reach an ε-optimal solution. Compared to this complexity, our results for the convex case are better by an order Oε 4. 3 In addition, [0] modifies the ialm by solving a perturbed problem. The modified ialm requires Oε 1 logε 3 4 gradient evaluations to produce an ε-optimal solution, and this order is worse than our results by an order O logε 3 4. Motivated by the model predictive control, [3] also analyzes the iteration complexity of inexact dual gradient methods idgm that are essentially ialms.itshowsthattoreachanε-optimalsolution 4,anonacceleratediDGMrequiresOε 1 outeriterations and every x-subproblem solved to an accuracy Oε, and an accelerated idgm requires Oε 1 outer iterations and every x-subproblem solved to an accuracy Oε 3. While the iteration complexity in [0] is estimated based on the best iterate, and that in [3] is ergodic, the recent wor [1] establishes non-ergodic convergence of ialm. It requires Oε gradient evaluations to reach an ε-optimal primal-dual solution x,ȳ in the sense that A x b ε, g x+a ȳ, x x +h x hx ε, x, 90 where it is assumed that f 0 = g + h in 89 and g is Lipschitz differentiable. From the convexity of g, it follows from 90 that f 0 x f 0 x g x, x x +h x hx ε ȳ,a x b ε+ ȳ ε. Hence, if ȳ 0, to have an ε-optimal solution by our Definition 1, the iteration complexity result in [1] would be Oε 4, which is Oε 3 worse than our nonergodic iteration complexity result in Theorem 9. Another line of existing wors on ialm assume two or multiple bloc structure on the problem and simply perform one cycle of Gauss-Seidel update to the bloc variables or update one randomly selected bloc. Global sublinear convergence of these methods has also been established. Exhausting all such wors is impossible and out of scope of this paper. We refer interested readers to [7 10,15,8,36,37] and the references therein. 4 [3] assumes every subproblem solved to the condition L β x +1,y,x x +1 Oε, x X, which is implied by L β x +1,y min x X L β x,y Oε if L β is Lipschitz differentiable with respect to x.

27 Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming 7 General convex problems As there are nonlinear inequality constraints, we do not find any wor in the literature showing the global convergence rate of ialm, though its local convergence rate has been extensively studied e.g., [3, 30, 3]. Many existing wors on nonlinearly constrained convex problems employ Lagrangian function instead of the augmented one and establish global convergence rate through dual subgradient approach e.g., [, 4, 5]. For general convex problems, these methods enjoy O1/ convergence, and for strongly convex case, the rate can be improved to O1/. To achieve an ε-optimal solution, compared to our results, their iteration complexityisoε 1 timesworsefortheconvexproblemsandoε 1 worseforthestronglyconvexproblems. Assuming Lipschitz continuity of f i for every i [m], [39] proposes a new primal-dual type algorithm for nonlinearly constrained convex programs. Every iteration, it minimizes a proximal Lagrangian function and updates the multiplier in a novel way. With sufficiently large proximal parameter that depends on the Lipschitz constants of f i s, the algorithm converges in O1/ ergodic rate. The follow-up paper [38] focuses on smooth constrained convex problems and proposes a linearized variant of the algorithm in [39]. Assuming compactness of the set X, it also establishes O1/ ergodic convergence of the linearized method. Iteration complexity from existing results on ippa Through relating ialm and ippa, iteration complexity result can be obtained from existing results about ippa to produce near-optimal dual solution. On solving problem min z φz, [13] analyzes the ippa with iterative update: z +1 argminφz+ 1 z ẑ. z β If the above approximation error satisfies z +1 prox β φẑ = O1/ a, 91 for a certainnumber a > 1, and the parameterβ is increasing,then by choosingspecifically designed ẑ, [13] shows that φz φz = O1/ +O1/ a 1. From our discussion in section 4.1, if ε = O 1 a β in 7a, then we have 91 holds with φ = d 0, and thus obtain the convergence rate in terms of dual function: d 0 z d 0 z = O1/ +O1/ a 1. Note that z is bounded from the summability of β ε and the proof of Lemma 6. Hence, setting β to a constant for all and applying Nesterov s optimal first-order method to each subproblem in 7a, we need O a gradient evaluations. Let a = 3. Then K = O1/ ε ippa iterations are required to obtain an ε-optimal dual solution, i.e., d 0 z K d 0 z ε, and the total number of gradient evaluations is T K = K =1 O 3 = OK 5 = Oε 5 4. However, it is not clear how to measure the quality of the primal iterates.

28 8 Yangyang Xu 6 Numerical results In this section, we conduct numerical experiments on the quadratically constrained quadratic programming QCQP: 1 minimize x R n x Q 0 x+c 0 x, 1 s.t. x Q j x+c j x+d j 0, j = 1,...,m, x i [l i,u i ], i = 1,...,n. The purpose of the tests is to verify the established theoretical results and compare the ialm with three different settings of parameters. Three QCQP instances are made. The first two instances are convex, and the third one is strongly convex. For all three instances, we set n = 100,m = 5 and l i = 1,u i = 1, i. The vectors c j,j = 0,1,...,m are generated following Gaussian distribution, and the scalars d j,j = 1,...,m are made negative. This way, all inequalities in 9 hold strictly at the origin x = 0, and thus the KKT conditions are satisfied at the optimal solution. Q j,j = 0,1,...,m are randomly generated and symmetric positive semidefinite. Q 0 is ran-deficient for the first two instances and full-ran for the third one. The data in the first two instances are the same except Q 0, which is 100 times in the second instance as that in the first instance. For all tests, we set ε = 10 3, C β = 1, C ε = u l, and K = 10, and the initial primal-dual point is set to zero vector. The algorithm parameters {β,ρ,ε } are set in three different ways corresponding to Theorems 5, 6, and 7 respectively, where σ = 10 is used for the geometrically increasing penalty. On finding x +1 by applying Algorithm to min x X L β x,z, we terminate the algorithm if the iteration number exceeds 10 6 or dist x L β x +1,z,N X x +1 ε u l, 93 where X = n [l i,u i ]. Since L β x,z is convex about x, and u l is the diameter of the feasible set X, the condition in 93 guarantees that x +1 satisfies 7. We report the difference of objective value and optimal value, and the feasibility violation at both actual iterate x and the weighted averaged point x = t=1 xt / t=1 β t. The optimal solution is computed by CVX[11]. In addition, to compare the iteration complexity, we also report the number of gradient evaluations and function evaluations for each outer iteration. The results are provided in Tables 1,, and 3 respectively for the three instances. In Table 1, we also report the results from quadratic penalty method, which corresponds to setting K = 1 see the discussions in Remar 3. From the results, we can clearly see that the quadratic penalty method is worse, namely, running a single ialm step with a big penalty parameter is significantly worse than running multiple steps with smaller penalty parameters. Also, we see that the ialm with three different settings yields the last actual iterate x K and the averaged point x K of similar accuracy. For all three instances, to produce similarly accurate solutions, the ialm with constant penalty requires more gradient and function evaluations than that with geometrically increasing penalty. Furthermore, the ialm with geometrically increasing penalty and constant error requires fewest gradient and function evaluations on the first and third instances. However, the setting of geometrically increasing penalty and adaptive error is the best for ialm on the second instance. That is because the gradient Lipschitz constant of the objective in the second instance is significantly bigger than that in the first instance, in which case the bound on T K in 65 is smaller than that in 55. 9

Accelerated primal-dual methods for linearly constrained convex problems

Accelerated primal-dual methods for linearly constrained convex problems Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize

More information

HYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX PROGRAMMING

HYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX PROGRAMMING SIAM J. OPTIM. Vol. 8, No. 1, pp. 646 670 c 018 Society for Industrial and Applied Mathematics HYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX

More information

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING YANGYANG XU Abstract. Motivated by big data applications, first-order methods have been extremely

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)

More information

Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables

Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong 2014 Workshop

More information

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This

More information

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014 Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,

More information

Iteration-complexity of first-order penalty methods for convex programming

Iteration-complexity of first-order penalty methods for convex programming Iteration-complexity of first-order penalty methods for convex programming Guanghui Lan Renato D.C. Monteiro July 24, 2008 Abstract This paper considers a special but broad class of convex programing CP)

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

arxiv: v7 [math.oc] 22 Feb 2018

arxiv: v7 [math.oc] 22 Feb 2018 A SMOOTH PRIMAL-DUAL OPTIMIZATION FRAMEWORK FOR NONSMOOTH COMPOSITE CONVEX MINIMIZATION QUOC TRAN-DINH, OLIVIER FERCOQ, AND VOLKAN CEVHER arxiv:1507.06243v7 [math.oc] 22 Feb 2018 Abstract. We propose a

More information

Primal/Dual Decomposition Methods

Primal/Dual Decomposition Methods Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

arxiv: v1 [math.oc] 13 Dec 2018

arxiv: v1 [math.oc] 13 Dec 2018 A NEW HOMOTOPY PROXIMAL VARIABLE-METRIC FRAMEWORK FOR COMPOSITE CONVEX MINIMIZATION QUOC TRAN-DINH, LIANG LING, AND KIM-CHUAN TOH arxiv:8205243v [mathoc] 3 Dec 208 Abstract This paper suggests two novel

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 4 Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 2 4.1. Subgradients definition subgradient calculus duality and optimality conditions Shiqian

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

A GLOBALLY CONVERGENT STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE

A GLOBALLY CONVERGENT STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE A GLOBALLY CONVERGENT STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE Philip E. Gill Vyacheslav Kungurtsev Daniel P. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-14-1 June 30,

More information

Lecture 3. Optimization Problems and Iterative Algorithms

Lecture 3. Optimization Problems and Iterative Algorithms Lecture 3 Optimization Problems and Iterative Algorithms January 13, 2016 This material was jointly developed with Angelia Nedić at UIUC for IE 598ns Outline Special Functions: Linear, Quadratic, Convex

More information

ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS

ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS WEI DENG AND WOTAO YIN Abstract. The formulation min x,y f(x) + g(y) subject to Ax + By = b arises in

More information

SEMI-SMOOTH SECOND-ORDER TYPE METHODS FOR COMPOSITE CONVEX PROGRAMS

SEMI-SMOOTH SECOND-ORDER TYPE METHODS FOR COMPOSITE CONVEX PROGRAMS SEMI-SMOOTH SECOND-ORDER TYPE METHODS FOR COMPOSITE CONVEX PROGRAMS XIANTAO XIAO, YONGFENG LI, ZAIWEN WEN, AND LIWEI ZHANG Abstract. The goal of this paper is to study approaches to bridge the gap between

More information

Generalized Uniformly Optimal Methods for Nonlinear Programming

Generalized Uniformly Optimal Methods for Nonlinear Programming Generalized Uniformly Optimal Methods for Nonlinear Programming Saeed Ghadimi Guanghui Lan Hongchao Zhang Janumary 14, 2017 Abstract In this paper, we present a generic framewor to extend existing uniformly

More information

Coordinate Descent and Ascent Methods

Coordinate Descent and Ascent Methods Coordinate Descent and Ascent Methods Julie Nutini Machine Learning Reading Group November 3 rd, 2015 1 / 22 Projected-Gradient Methods Motivation Rewrite non-smooth problem as smooth constrained problem:

More information

Coordinate descent methods

Coordinate descent methods Coordinate descent methods Master Mathematics for data science and big data Olivier Fercoq November 3, 05 Contents Exact coordinate descent Coordinate gradient descent 3 3 Proximal coordinate descent 5

More information

A STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE

A STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE A STABILIZED SQP METHOD: SUPERLINEAR CONVERGENCE Philip E. Gill Vyacheslav Kungurtsev Daniel P. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-14-1 June 30, 2014 Abstract Regularized

More information

Dual Proximal Gradient Method

Dual Proximal Gradient Method Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method

More information

On the acceleration of augmented Lagrangian method for linearly constrained optimization

On the acceleration of augmented Lagrangian method for linearly constrained optimization On the acceleration of augmented Lagrangian method for linearly constrained optimization Bingsheng He and Xiaoming Yuan October, 2 Abstract. The classical augmented Lagrangian method (ALM plays a fundamental

More information

Subgradient Methods in Network Resource Allocation: Rate Analysis

Subgradient Methods in Network Resource Allocation: Rate Analysis Subgradient Methods in Networ Resource Allocation: Rate Analysis Angelia Nedić Department of Industrial and Enterprise Systems Engineering University of Illinois Urbana-Champaign, IL 61801 Email: angelia@uiuc.edu

More information

5. Duality. Lagrangian

5. Duality. Lagrangian 5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

More information

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725 Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)

More information

Quiz Discussion. IE417: Nonlinear Programming: Lecture 12. Motivation. Why do we care? Jeff Linderoth. 16th March 2006

Quiz Discussion. IE417: Nonlinear Programming: Lecture 12. Motivation. Why do we care? Jeff Linderoth. 16th March 2006 Quiz Discussion IE417: Nonlinear Programming: Lecture 12 Jeff Linderoth Department of Industrial and Systems Engineering Lehigh University 16th March 2006 Motivation Why do we care? We are interested in

More information

Convex Optimization M2

Convex Optimization M2 Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

Machine Learning. Support Vector Machines. Manfred Huber

Machine Learning. Support Vector Machines. Manfred Huber Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data

More information

Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms

Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms Carlos Humes Jr. a, Benar F. Svaiter b, Paulo J. S. Silva a, a Dept. of Computer Science, University of São Paulo, Brazil Email: {humes,rsilva}@ime.usp.br

More information

COMPLEXITY OF A QUADRATIC PENALTY ACCELERATED INEXACT PROXIMAL POINT METHOD FOR SOLVING LINEARLY CONSTRAINED NONCONVEX COMPOSITE PROGRAMS

COMPLEXITY OF A QUADRATIC PENALTY ACCELERATED INEXACT PROXIMAL POINT METHOD FOR SOLVING LINEARLY CONSTRAINED NONCONVEX COMPOSITE PROGRAMS COMPLEXITY OF A QUADRATIC PENALTY ACCELERATED INEXACT PROXIMAL POINT METHOD FOR SOLVING LINEARLY CONSTRAINED NONCONVEX COMPOSITE PROGRAMS WEIWEI KONG, JEFFERSON G. MELO, AND RENATO D.C. MONTEIRO Abstract.

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Support vector machines (SVMs) are one of the central concepts in all of machine learning. They are simply a combination of two ideas: linear classification via maximum (or optimal

More information

Conditional Gradient (Frank-Wolfe) Method

Conditional Gradient (Frank-Wolfe) Method Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Perturbed Proximal Primal Dual Algorithm for Nonconvex Nonsmooth Optimization

Perturbed Proximal Primal Dual Algorithm for Nonconvex Nonsmooth Optimization Noname manuscript No. (will be inserted by the editor Perturbed Proximal Primal Dual Algorithm for Nonconvex Nonsmooth Optimization Davood Hajinezhad and Mingyi Hong Received: date / Accepted: date Abstract

More information

WE consider an undirected, connected network of n

WE consider an undirected, connected network of n On Nonconvex Decentralized Gradient Descent Jinshan Zeng and Wotao Yin Abstract Consensus optimization has received considerable attention in recent years. A number of decentralized algorithms have been

More information

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization

More information

Algorithms for constrained local optimization

Algorithms for constrained local optimization Algorithms for constrained local optimization Fabio Schoen 2008 http://gol.dsi.unifi.it/users/schoen Algorithms for constrained local optimization p. Feasible direction methods Algorithms for constrained

More information

Inexact Alternating Direction Method of Multipliers for Separable Convex Optimization

Inexact Alternating Direction Method of Multipliers for Separable Convex Optimization Inexact Alternating Direction Method of Multipliers for Separable Convex Optimization Hongchao Zhang hozhang@math.lsu.edu Department of Mathematics Center for Computation and Technology Louisiana State

More information

minimize x subject to (x 2)(x 4) u,

minimize x subject to (x 2)(x 4) u, Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for

More information

Sparse Optimization Lecture: Dual Methods, Part I

Sparse Optimization Lecture: Dual Methods, Part I Sparse Optimization Lecture: Dual Methods, Part I Instructor: Wotao Yin July 2013 online discussions on piazza.com Those who complete this lecture will know dual (sub)gradient iteration augmented l 1 iteration

More information

Penalty and Barrier Methods General classical constrained minimization problem minimize f(x) subject to g(x) 0 h(x) =0 Penalty methods are motivated by the desire to use unconstrained optimization techniques

More information

Algorithms for Nonsmooth Optimization

Algorithms for Nonsmooth Optimization Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization

More information

Convex Optimization Boyd & Vandenberghe. 5. Duality

Convex Optimization Boyd & Vandenberghe. 5. Duality 5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

More information

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Roger Behling a, Clovis Gonzaga b and Gabriel Haeser c March 21, 2013 a Department

More information

CSCI : Optimization and Control of Networks. Review on Convex Optimization

CSCI : Optimization and Control of Networks. Review on Convex Optimization CSCI7000-016: Optimization and Control of Networks Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one

More information

Pacific Journal of Optimization (Vol. 2, No. 3, September 2006) ABSTRACT

Pacific Journal of Optimization (Vol. 2, No. 3, September 2006) ABSTRACT Pacific Journal of Optimization Vol., No. 3, September 006) PRIMAL ERROR BOUNDS BASED ON THE AUGMENTED LAGRANGIAN AND LAGRANGIAN RELAXATION ALGORITHMS A. F. Izmailov and M. V. Solodov ABSTRACT For a given

More information

CO 250 Final Exam Guide

CO 250 Final Exam Guide Spring 2017 CO 250 Final Exam Guide TABLE OF CONTENTS richardwu.ca CO 250 Final Exam Guide Introduction to Optimization Kanstantsin Pashkovich Spring 2017 University of Waterloo Last Revision: March 4,

More information

Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version

Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version Convex Optimization Theory Chapter 5 Exercises and Solutions: Extended Version Dimitri P. Bertsekas Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com

More information

Lecture 6: Conic Optimization September 8

Lecture 6: Conic Optimization September 8 IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions

More information

Duality Theory of Constrained Optimization

Duality Theory of Constrained Optimization Duality Theory of Constrained Optimization Robert M. Freund April, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 2 1 The Practical Importance of Duality Duality is pervasive

More information

On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean

On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean Renato D.C. Monteiro B. F. Svaiter March 17, 2009 Abstract In this paper we analyze the iteration-complexity

More information

Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725

Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725 Karush-Kuhn-Tucker Conditions Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given a minimization problem Last time: duality min x subject to f(x) h i (x) 0, i = 1,... m l j (x) = 0, j =

More information

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

More information

Inexact alternating projections on nonconvex sets

Inexact alternating projections on nonconvex sets Inexact alternating projections on nonconvex sets D. Drusvyatskiy A.S. Lewis November 3, 2018 Dedicated to our friend, colleague, and inspiration, Alex Ioffe, on the occasion of his 80th birthday. Abstract

More information

1. f(β) 0 (that is, β is a feasible point for the constraints)

1. f(β) 0 (that is, β is a feasible point for the constraints) xvi 2. The lasso for linear models 2.10 Bibliographic notes Appendix Convex optimization with constraints In this Appendix we present an overview of convex optimization concepts that are particularly useful

More information

A SIMPLE PARALLEL ALGORITHM WITH AN O(1/T ) CONVERGENCE RATE FOR GENERAL CONVEX PROGRAMS

A SIMPLE PARALLEL ALGORITHM WITH AN O(1/T ) CONVERGENCE RATE FOR GENERAL CONVEX PROGRAMS A SIMPLE PARALLEL ALGORITHM WITH AN O(/T ) CONVERGENCE RATE FOR GENERAL CONVEX PROGRAMS HAO YU AND MICHAEL J. NEELY Abstract. This paper considers convex programs with a general (possibly non-differentiable)

More information

Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem:

Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem: CDS270 Maryam Fazel Lecture 2 Topics from Optimization and Duality Motivation network utility maximization (NUM) problem: consider a network with S sources (users), each sending one flow at rate x s, through

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

Barrier Method. Javier Peña Convex Optimization /36-725

Barrier Method. Javier Peña Convex Optimization /36-725 Barrier Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: Newton s method For root-finding F (x) = 0 x + = x F (x) 1 F (x) For optimization x f(x) x + = x 2 f(x) 1 f(x) Assume f strongly

More information

A Brief Review on Convex Optimization

A Brief Review on Convex Optimization A Brief Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one convex, two nonconvex sets): A Brief Review

More information

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems O. Kolossoski R. D. C. Monteiro September 18, 2015 (Revised: September 28, 2016) Abstract

More information

Iteration-complexity of first-order augmented Lagrangian methods for convex programming

Iteration-complexity of first-order augmented Lagrangian methods for convex programming Math. Program., Ser. A 016 155:511 547 DOI 10.1007/s10107-015-0861-x FULL LENGTH PAPER Iteration-complexity of first-order augmented Lagrangian methods for convex programming Guanghui Lan Renato D. C.

More information

Convex Analysis Notes. Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE

Convex Analysis Notes. Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE Convex Analysis Notes Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE These are notes from ORIE 6328, Convex Analysis, as taught by Prof. Adrian Lewis at Cornell University in the

More information

FAST FIRST-ORDER METHODS FOR STABLE PRINCIPAL COMPONENT PURSUIT

FAST FIRST-ORDER METHODS FOR STABLE PRINCIPAL COMPONENT PURSUIT FAST FIRST-ORDER METHODS FOR STABLE PRINCIPAL COMPONENT PURSUIT N. S. AYBAT, D. GOLDFARB, AND G. IYENGAR Abstract. The stable principal component pursuit SPCP problem is a non-smooth convex optimization

More information

Optimization for Machine Learning

Optimization for Machine Learning Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html

More information

Implications of the Constant Rank Constraint Qualification

Implications of the Constant Rank Constraint Qualification Mathematical Programming manuscript No. (will be inserted by the editor) Implications of the Constant Rank Constraint Qualification Shu Lu Received: date / Accepted: date Abstract This paper investigates

More information

Lecture 8 Plus properties, merit functions and gap functions. September 28, 2008

Lecture 8 Plus properties, merit functions and gap functions. September 28, 2008 Lecture 8 Plus properties, merit functions and gap functions September 28, 2008 Outline Plus-properties and F-uniqueness Equation reformulations of VI/CPs Merit functions Gap merit functions FP-I book:

More information

A STABILIZED SQP METHOD: GLOBAL CONVERGENCE

A STABILIZED SQP METHOD: GLOBAL CONVERGENCE A STABILIZED SQP METHOD: GLOBAL CONVERGENCE Philip E. Gill Vyacheslav Kungurtsev Daniel P. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-13-4 Revised July 18, 2014, June 23,

More information

1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method

1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method L. Vandenberghe EE236C (Spring 2016) 1. Gradient method gradient method, first-order methods quadratic bounds on convex functions analysis of gradient method 1-1 Approximate course outline First-order

More information

Interior-Point Methods for Linear Optimization

Interior-Point Methods for Linear Optimization Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function

More information

Relative-Continuity for Non-Lipschitz Non-Smooth Convex Optimization using Stochastic (or Deterministic) Mirror Descent

Relative-Continuity for Non-Lipschitz Non-Smooth Convex Optimization using Stochastic (or Deterministic) Mirror Descent Relative-Continuity for Non-Lipschitz Non-Smooth Convex Optimization using Stochastic (or Deterministic) Mirror Descent Haihao Lu August 3, 08 Abstract The usual approach to developing and analyzing first-order

More information

Primal-dual first-order methods with O(1/ɛ) iteration-complexity for cone programming

Primal-dual first-order methods with O(1/ɛ) iteration-complexity for cone programming Math. Program., Ser. A (2011) 126:1 29 DOI 10.1007/s10107-008-0261-6 FULL LENGTH PAPER Primal-dual first-order methods with O(1/ɛ) iteration-complexity for cone programming Guanghui Lan Zhaosong Lu Renato

More information

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming Zhaosong Lu Lin Xiao March 9, 2015 (Revised: May 13, 2016; December 30, 2016) Abstract We propose

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS

CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS A Dissertation Submitted For The Award of the Degree of Master of Philosophy in Mathematics Neelam Patel School of Mathematics

More information

Lectures 9 and 10: Constrained optimization problems and their optimality conditions

Lectures 9 and 10: Constrained optimization problems and their optimality conditions Lectures 9 and 10: Constrained optimization problems and their optimality conditions Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lectures 9 and 10: Constrained

More information

c 2013 Society for Industrial and Applied Mathematics

c 2013 Society for Industrial and Applied Mathematics SIAM J. OPTIM. Vol. 3, No., pp. 109 115 c 013 Society for Industrial and Applied Mathematics AN ACCELERATED HYBRID PROXIMAL EXTRAGRADIENT METHOD FOR CONVEX OPTIMIZATION AND ITS IMPLICATIONS TO SECOND-ORDER

More information

arxiv: v1 [math.oc] 5 Dec 2014

arxiv: v1 [math.oc] 5 Dec 2014 FAST BUNDLE-LEVEL TYPE METHODS FOR UNCONSTRAINED AND BALL-CONSTRAINED CONVEX OPTIMIZATION YUNMEI CHEN, GUANGHUI LAN, YUYUAN OUYANG, AND WEI ZHANG arxiv:141.18v1 [math.oc] 5 Dec 014 Abstract. It has been

More information

A Low Complexity Algorithm with O( T ) Regret and Finite Constraint Violations for Online Convex Optimization with Long Term Constraints

A Low Complexity Algorithm with O( T ) Regret and Finite Constraint Violations for Online Convex Optimization with Long Term Constraints A Low Complexity Algorithm with O( T ) Regret and Finite Constraint Violations for Online Convex Optimization with Long Term Constraints Hao Yu and Michael J. Neely Department of Electrical Engineering

More information

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus 1/41 Subgradient Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes definition subgradient calculus duality and optimality conditions directional derivative Basic inequality

More information

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010 I.3. LMI DUALITY Didier HENRION henrion@laas.fr EECI Graduate School on Control Supélec - Spring 2010 Primal and dual For primal problem p = inf x g 0 (x) s.t. g i (x) 0 define Lagrangian L(x, z) = g 0

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented

More information

An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP

An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP Kaifeng Jiang, Defeng Sun, and Kim-Chuan Toh Abstract The accelerated proximal gradient (APG) method, first

More information

1 Computing with constraints

1 Computing with constraints Notes for 2017-04-26 1 Computing with constraints Recall that our basic problem is minimize φ(x) s.t. x Ω where the feasible set Ω is defined by equality and inequality conditions Ω = {x R n : c i (x)

More information

Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method

Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Robert M. Freund March, 2004 2004 Massachusetts Institute of Technology. The Problem The logarithmic barrier approach

More information

Duality. Lagrange dual problem weak and strong duality optimality conditions perturbation and sensitivity analysis generalized inequalities

Duality. Lagrange dual problem weak and strong duality optimality conditions perturbation and sensitivity analysis generalized inequalities Duality Lagrange dual problem weak and strong duality optimality conditions perturbation and sensitivity analysis generalized inequalities Lagrangian Consider the optimization problem in standard form

More information

Distributed Convex Optimization

Distributed Convex Optimization Master Program 2013-2015 Electrical Engineering Distributed Convex Optimization A Study on the Primal-Dual Method of Multipliers Delft University of Technology He Ming Zhang, Guoqiang Zhang, Richard Heusdens

More information

Key words. alternating direction method of multipliers, convex composite optimization, indefinite proximal terms, majorization, iteration-complexity

Key words. alternating direction method of multipliers, convex composite optimization, indefinite proximal terms, majorization, iteration-complexity A MAJORIZED ADMM WITH INDEFINITE PROXIMAL TERMS FOR LINEARLY CONSTRAINED CONVEX COMPOSITE OPTIMIZATION MIN LI, DEFENG SUN, AND KIM-CHUAN TOH Abstract. This paper presents a majorized alternating direction

More information

Gradient Sliding for Composite Optimization

Gradient Sliding for Composite Optimization Noname manuscript No. (will be inserted by the editor) Gradient Sliding for Composite Optimization Guanghui Lan the date of receipt and acceptance should be inserted later Abstract We consider in this

More information

The Proximal Gradient Method

The Proximal Gradient Method Chapter 10 The Proximal Gradient Method Underlying Space: In this chapter, with the exception of Section 10.9, E is a Euclidean space, meaning a finite dimensional space endowed with an inner product,

More information

CONSTRAINED OPTIMALITY CRITERIA

CONSTRAINED OPTIMALITY CRITERIA 5 CONSTRAINED OPTIMALITY CRITERIA In Chapters 2 and 3, we discussed the necessary and sufficient optimality criteria for unconstrained optimization problems. But most engineering problems involve optimization

More information