Distributed Smooth and Strongly Convex Optimization with Inexact Dual Methods

Distributed Smooth and Strongly Convex Optimization with Inexact Dual Methods

Mahyar Fazlyab, Santiago Paternain, Alejandro Ribeiro and Victor M. Preciado

Abstract. In this paper, we consider a class of decentralized convex optimization problems in which a network of agents aims to minimize a global objective function that is a sum of private (local) smooth and strongly convex objectives. More specifically, we study a decentralized inexact dual ascent method, in which the agents only approximately solve their private minimization problems and then update their dual variables using inexact dual ascent. We study the effect of the inexact inner minimization on the convergence rate. In particular, we show that the overall convergence rate is not affected by the inexact minimization if the errors are decreased at an appropriate rate. We illustrate our findings on a distributed binary classification problem.

I. INTRODUCTION

In decentralized consensus optimization, a network of agents aims to minimize a global objective function that is a sum of private (local) objective functions available to each node. The goal of the agents is to collaboratively agree on a minimizer of the global objective function without revealing their private objective functions over the course of the minimization. Applications include distributed multi-agent coordination, estimation problems in sensor networks, resource allocation problems, decentralized decision making, and large-scale optimization problems in machine learning [1]–[4].

Broadly speaking, decentralized algorithms for convex optimization fall into three categories: (i) primal methods, in which the problem is solved entirely in the primal domain and (partial) consensus among the agents is guaranteed by using disagreement functions as penalty functions [5]–[7];
(ii) primal–dual methods, in which the consensus constraint is relaxed by introducing Lagrange multipliers, and the agents simultaneously iterate over the primal and dual variables to seek a saddle point of the Lagrangian [8]–[11]; and (iii) dual methods, in which the agents solve the dual problem in a decentralized fashion [12]–[14]. The advantage of each of these methods over the others is determined by the assumptions the objective functions satisfy (smoothness, strong convexity, etc.), as well as by how the consensus constraint is encoded in the problem.

In decentralized dual methods, agents need to repeatedly compute the dual function, which amounts to solving an inner minimization problem exactly at each iteration of the dual problem. From a practical point of view, the agents may not be able to solve the inner minimization problem exactly, leading to computational errors in the dual problem. On the other hand, it is known that the convergence rate of first-order methods degrades under an inexact oracle [15]. In this context, there has been increasing interest in analyzing various optimization algorithms with an inexact oracle, such as inexact proximal gradient methods [16], smooth optimization with approximate gradients [17], and inexact augmented Lagrangian methods [18]. In this paper, we focus on smooth and strongly convex objective functions, for which we study distributed inexact dual ascent based on Laplacian consensus.

(Work supported by the NSF under grants CAREER-ECCS and IIS. The authors are with the Department of Electrical and Systems Engineering, University of Pennsylvania. {mahyarfa, paternain, aribeiro, preciado}@seas.upenn.edu.)
In each global iteration of this algorithm, the agents perform the inner minimization step up to a certain accuracy and then communicate via a Laplacian consensus protocol to update their private dual variables based on dual ascent. The resulting dual update is a gradient-ascent-like method with additive errors that depend on the accuracy of the minimization step. We quantify the effect of inexact minimization on the overall convergence of the algorithm. In particular, we show that the convergence rate of the inexact algorithm is the same as that of its exact counterpart, which is exponential, if the accuracy of the local minimization step is increased at an appropriate rate. Numerical examples in which a group of agents collectively train a binary classifier support the theoretical results.

II. DECENTRALIZED OPTIMIZATION

A. Preliminaries

We denote the set of real numbers by $\mathbb{R}$, the set of real $d$-dimensional vectors by $\mathbb{R}^d$, the set of $m \times d$ real matrices by $\mathbb{R}^{m \times d}$, the $d$-dimensional identity matrix by $I_d$, and the vector of $d$ ones by $\mathbf{1}_d$. A weighted undirected graph is defined as $G = (V, E, A)$, where $V$ is a set of $n$ nodes and $E$ is a set of $m$ undirected edges. We assume that the graph is connected and has no self-loops, and we consider graphs with weights associated to the edges. We denote the weight of an edge $\{i, j\} \in E$ by $w_{ij} > 0$. The weighted adjacency matrix, denoted by $A = [a_{ij}]$, is the $n \times n$ symmetric matrix defined entry-wise as $a_{ij} = w_{ij}$ if $\{i, j\} \in E$, and $a_{ij} = 0$ otherwise. We denote by $N_i = \{j \in V : a_{ij} \neq 0\}$ the set of nodes connected to node $i$. The weighted Laplacian matrix of $G$ is defined as $L = \mathrm{diag}(A \mathbf{1}_n) - A$, and its eigenvalue decomposition is given by $L = U \,\mathrm{diag}\{0, \lambda_2, \ldots, \lambda_n\}\, U^\top$, where $0 < \lambda_2 \leq \cdots \leq \lambda_n$ are the nontrivial eigenvalues of $L$. By defining $L^{1/2} = U \,\mathrm{diag}\{0, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_n}\}\, U^\top$, we have that $L^{1/2} L^{1/2} = L$. We define $\mathbf{L} = L \otimes I_d$ and $\mathbf{L}^{1/2} = L^{1/2} \otimes I_d$ to handle $d$-dimensional variables. Notice that by the properties of the Kronecker product,

we can write
$$\mathrm{Null}(\mathbf{L}) = \{x \in \mathbb{R}^{nd} : x = \mathbf{1}_n \otimes \tilde{x},\ \tilde{x} \in \mathbb{R}^d\}. \quad (1)$$

A differentiable function $f : \mathbb{R}^d \to \mathbb{R}$ is $\mu$-strongly convex over a convex set $C \subseteq \mathbb{R}^d$ if and only if it satisfies
$$f(x) + \nabla f(x)^\top (y - x) + \tfrac{\mu}{2} \|y - x\|^2 \leq f(y), \quad (2)$$
for all $x, y \in C$. A differentiable $f$ whose gradient is Lipschitz continuous on $C$ with parameter $L < \infty$ satisfies
$$f(y) \leq f(x) + \nabla f(x)^\top (y - x) + \tfrac{L}{2} \|y - x\|^2, \quad (3)$$
for all $x, y \in C$. We denote by $F(\mu, L)$ the class of functions satisfying both (2) and (3). Note that in this class it must hold that $\mu \leq L$. We define the condition number of $f \in F(\mu, L)$ as $\kappa_f = L/\mu$.

B. Decentralized Optimization with Laplacian Consensus

Consider the following convex optimization problem:
$$\tilde{x}^\star = \operatorname{argmin}_{\tilde{x} \in \mathbb{R}^d} \sum_{i=1}^n f_i(\tilde{x}), \quad (4)$$
where the collective objective function is the sum of $n$ convex functions $f_i : \mathbb{R}^d \to \mathbb{R}$. In a distributed setting, each objective function $f_i$ is privately available to node $i$ of a network of $n$ nodes denoted by $V = \{1, 2, \ldots, n\}$. The topology of the network is determined by a connected, weighted, and undirected graph $G = (V, E)$, where $E \subseteq V \times V$ is the set of edges. The nodes are required to collaboratively minimize the global cost function without revealing their private cost functions; the communication flow pattern among the nodes is determined by the graph $G$. Further, we assume that the local objective functions are twice continuously differentiable and strongly convex with Lipschitz gradients. In other words, we have
$$m_f^i I_d \preceq \nabla^2 f_i(x) \preceq L_f^i I_d, \quad (5)$$
for some $0 < m_f^i \leq L_f^i < \infty$ and all $i \in V$. By defining $x^i \in \mathbb{R}^d$ as the decision variable of node $i$ (a local copy of $\tilde{x} \in \mathbb{R}^d$ at node $i$) and enforcing the equality constraints $x^1 = \cdots = x^n$, an equivalent formulation of (4) is
$$x^\star = \mathbf{1}_n \otimes \tilde{x}^\star = \operatorname{argmin}_{\{x^i\}_{i=1}^n} \sum_{i=1}^n f_i(x^i) \quad \text{s.t.} \quad x^1 = \cdots = x^n. \quad (6)$$

There are various ways to enforce the consensus constraints in (6). In this paper, we use the Laplacian consensus formulation. More explicitly, consider the Laplacian $L$ of the communication graph. Since the graph is connected and undirected, we have that $L \mathbf{1}_n = 0$ and $\mathbf{1}_n^\top L = 0$.
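The Laplacian objects above are easy to construct and check numerically. A minimal numpy sketch (the graph, weights, and dimensions are illustrative, not from the paper):

```python
import numpy as np

# Illustrative 4-node connected weighted graph.
n = 4
A = np.zeros((n, n))
for (i, j, w) in [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.5), (0, 3, 0.5)]:
    A[i, j] = A[j, i] = w                    # symmetric weighted adjacency

L = np.diag(A @ np.ones(n)) - A              # weighted Laplacian L = diag(A 1) - A
lam, U = np.linalg.eigh(L)                   # eigenvalues ascending, lam[0] ~ 0
L_half = U @ np.diag(np.sqrt(np.clip(lam, 0, None))) @ U.T   # L^{1/2}

assert np.allclose(L @ np.ones(n), 0)        # L 1_n = 0
assert np.allclose(L_half @ L_half, L)       # L^{1/2} L^{1/2} = L
assert np.allclose(L_half @ np.ones(n), 0)   # Null(L^{1/2}) contains span{1_n}

# Kronecker lift for d-dimensional variables: bold L = L (x) I_d.
d = 3
L_bold = np.kron(L, np.eye(d))
x_cons = np.kron(np.ones(n), np.random.randn(d))   # consensus vector 1_n (x) x~
assert np.allclose(L_bold @ x_cons, 0)             # (1): consensus set = Null(L)
```

The last assertion verifies the characterization (1): a stacked vector lies in $\mathrm{Null}(\mathbf{L})$ exactly when all its $d$-dimensional blocks agree.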
By defining $x = [x^{1\top}, \ldots, x^{n\top}]^\top \in \mathbb{R}^{nd}$ as the concatenation of the local decision variables, and recalling $\mathbf{L}^{1/2} = L^{1/2} \otimes I_d$, we can equivalently formulate (6) as
$$\min_{x \in \mathbb{R}^{nd}} \Big\{ f(x) = \sum_{i=1}^n f_i(x^i) \Big\} \quad \text{s.t.} \quad \mathbf{L}^{1/2} x = 0. \quad (7)$$
The Lagrangian of (7) is given by
$$\mathcal{L}(x, \lambda) = f(x) + \lambda^\top \mathbf{L}^{1/2} x, \quad (8)$$
where $\lambda = [\lambda^{1\top}, \ldots, \lambda^{n\top}]^\top \in \mathbb{R}^{nd}$ is the stacked vector of local multipliers. Notice that the Lagrangian is strongly convex in $x$ and affine in $\lambda$. Further, the Lagrangian is not affected by the shift $\lambda \to \lambda + \tilde{\lambda}$ for any $\tilde{\lambda} \in \mathrm{Null}(\mathbf{L}^{1/2}) = \mathrm{Null}(\mathbf{L})$. We can therefore assume $\lambda \perp \mathrm{Null}(\mathbf{L})$ (or, equivalently, $\sum_i \lambda^i = 0$) without loss of generality. The dual function for the Lagrangian in (8) is defined by
$$d(\lambda) = \min_{x \in \mathbb{R}^{nd}} \mathcal{L}(x, \lambda). \quad (9)$$
The dual problem is then to maximize the dual function (9) with respect to $\lambda$,
$$d^\star = \max_{\lambda \perp \mathrm{Null}(\mathbf{L})} d(\lambda) = \max_{\lambda \perp \mathrm{Null}(\mathbf{L})} \min_{x \in \mathbb{R}^{nd}} \mathcal{L}(x, \lambda). \quad (10)$$
Since the primal problem (7) is feasible and its constraint is affine, Slater's condition holds, the duality gap is zero, and hence the problem can be solved via its dual formulation. The Karush–Kuhn–Tucker (KKT) optimality conditions characterizing the optimal pair $(x^\star, \lambda^\star) \in \mathbb{R}^{nd} \times \mathbb{R}^{nd}$ are
$$\nabla_x \mathcal{L}(x^\star, \lambda^\star) = \nabla f(x^\star) + \mathbf{L}^{1/2} \lambda^\star = 0, \qquad \nabla_\lambda \mathcal{L}(x^\star, \lambda^\star) = \mathbf{L}^{1/2} x^\star = 0, \quad (11)$$
where $\nabla f(x) = [\nabla f_1(x^1)^\top, \ldots, \nabla f_n(x^n)^\top]^\top$. The second identity ensures consensus, $x^\star = \mathbf{1}_n \otimes \tilde{x}^\star$, according to (1). Multiplying both sides of the first condition from the left by $\mathbf{1}_n^\top \otimes I_d$ and noticing that
$$(\mathbf{1}_n^\top \otimes I_d)\, \mathbf{L}^{1/2} = (\mathbf{1}_n^\top \otimes I_d)(L^{1/2} \otimes I_d) = (\mathbf{1}_n^\top L^{1/2}) \otimes I_d = 0,$$
we obtain $\sum_{i=1}^n \nabla f_i(\tilde{x}^\star) = 0$, which is the optimality condition for the centralized problem (4), as required.

C. Decentralized Dual Ascent

Dual ascent methods solve the dual problem (10) using gradient ascent. Since the dual function is differentiable here, the dual gradient, by Danskin's theorem [19], is given by
$$\nabla d(\lambda) = \mathbf{L}^{1/2} x^\star(\lambda), \quad \text{where} \quad x^\star(\lambda) = \operatorname{argmin}_{x \in \mathbb{R}^{nd}} \mathcal{L}(x, \lambda). \quad (12)$$
Therefore, dual ascent performs the following recursions to solve (10):
$$x_{k+1} = \operatorname{argmin}_{x \in \mathbb{R}^{nd}} \mathcal{L}(x, \lambda_k), \quad (13a)$$
$$\lambda_{k+1} = \lambda_k + \alpha \mathbf{L}^{1/2} x_{k+1}, \quad (13b)$$
with initialization $\lambda_0 \perp \mathrm{Null}(\mathbf{L})$ (or, equivalently, $\sum_i \lambda_0^i = 0$), and $\alpha > 0$ is a constant step size common to all the agents.
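To make the recursions (13) concrete, here is a minimal sketch on illustrative quadratic objectives $f_i(x) = \tfrac{1}{2}\|x - b_i\|^2$, for which the inner argmin is available in closed form (the graph, data, and iteration count are assumptions, not from the paper):

```python
import numpy as np

# Exact dual ascent (13a)-(13b) on f_i(x) = 0.5*||x - b_i||^2, so that
# argmin_x f(x) + lambda' L^{1/2} x is closed form: x = b - L^{1/2} lambda.
n, d = 4, 2
A = np.zeros((n, n))
for (i, j, w) in [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]:
    A[i, j] = A[j, i] = w                    # 4-cycle with unit weights
L = np.diag(A.sum(axis=1)) - A
lam_eig, U = np.linalg.eigh(L)
Lh = np.kron(U @ np.diag(np.sqrt(np.clip(lam_eig, 0, None))) @ U.T, np.eye(d))

rng = np.random.default_rng(0)
b = rng.standard_normal(n * d)               # stacked targets [b_1; ...; b_n]
alpha = 1.0 / lam_eig[-1]                    # within 0 < alpha < 2*m_f/lambda_n(L), m_f = 1
lam = np.zeros(n * d)                        # lambda_0 = 0 is orthogonal to Null(L)

for _ in range(200):
    x = b - Lh @ lam                         # (13a): exact inner minimization
    lam = lam + alpha * Lh @ x               # (13b): dual gradient ascent step

xbar = b.reshape(n, d).mean(axis=0)          # centralized solution of (4)
assert np.allclose(x.reshape(n, d), xbar, atol=1e-6)   # consensus on the minimizer
```

For these quadratics the dual iteration is linear, $\lambda_{k+1} = (I - \alpha \mathbf{L})\lambda_k + \alpha \mathbf{L}^{1/2} b$, so the geometric convergence predicted by the analysis below is visible directly.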
The primal update is an exact minimization of the Lagrangian, so that $\nabla d(\lambda_k) = \mathbf{L}^{1/2} x_{k+1}$ according to (12); the second update therefore corresponds to a gradient ascent step. Notice, however, that (13) is not distributed, since the linear map $\mathbf{L}^{1/2}$ is in general dense and hence not locally computable. Introducing $\nu_k = [\nu_k^{1\top}, \ldots, \nu_k^{n\top}]^\top = \mathbf{L}^{1/2} \lambda_k$ as the scaled

version of the Lagrange multipliers, we can equivalently write (13) in terms of the new dual variables $\nu$ as
$$x_{k+1} = \operatorname{argmin}_{x \in \mathbb{R}^{nd}} f(x) + \nu_k^\top x, \quad (14a)$$
$$\nu_{k+1} = \nu_k + \alpha \mathbf{L} x_{k+1}, \quad (14b)$$
which lends itself to a distributed implementation. More precisely, rewriting (14) in terms of the individual dynamics of the agents yields the following update law for agent $i \in V$:
$$x_{k+1}^i = \operatorname{argmin}_{x \in \mathbb{R}^d} f_i(x) + \nu_k^{i\top} x, \quad (15a)$$
$$\nu_{k+1}^i = \nu_k^i + \alpha \sum_{j \in N_i} w_{ij} \big(x_{k+1}^i - x_{k+1}^j\big). \quad (15b)$$
The first update in (15) requires no communication, since the function $f_i$ and the multiplier $\nu^i$ are local. The second update requires one round of communication of each agent with its neighbors to exchange the private minimizers $x_{k+1}^j$. Notice that, due to the change of variables $\nu = \mathbf{L}^{1/2} \lambda$, the initial dual variables $\nu_0$ in (14) must respect the condition $\sum_{i=1}^n \nu_0^i = 0$.

III. DECENTRALIZED INEXACT DUAL ASCENT

The decentralized dual ascent algorithm (15) relies on the assumption that the Lagrangian minimization with respect to the primal variable is exact at each iteration, which translates into exact dual gradient information being available. In other words, at each global time step $k$, the agents need to compute their local exact minimizers $\operatorname{argmin}_x \big(f_i(x) + \nu_k^{i\top} x\big)$ via an iterative method. In practice, especially in real-time applications, agents are often only able to solve their subproblems approximately by truncating their iterative inner minimization scheme, which means that the outer iteration (dual ascent) is provided with an inexact dual gradient. We therefore consider the practical case in which the agents solve their private minimizations approximately:
$$\hat{x}_{k+1}^i \approx \operatorname{argmin}_{x \in \mathbb{R}^d} f_i(x) + \nu_k^{i\top} x, \quad (16a)$$
$$\nu_{k+1}^i = \nu_k^i + \alpha \sum_{j \in N_i} w_{ij} \big(\hat{x}_{k+1}^i - \hat{x}_{k+1}^j\big). \quad (16b)$$
There exist several ways to characterize the accuracy of approximate minimizers.
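The inexact scheme (16) can be sketched end to end. The following is a toy implementation, not the paper's experiment: local objectives are assumed quadratic, the graph is a ring, and the inner loop is truncated using the standard strong-convexity certificate $\varphi(x) - \min \varphi \leq \|\nabla \varphi(x)\|^2 / (2 m_f)$, so the achieved suboptimality is guaranteed to be at most the current error budget.

```python
import numpy as np

# Sketch of decentralized inexact dual ascent (16a)-(16b) with a geometrically
# shrinking error budget. Local objectives f_i(x) = 0.5*(x-b_i)' Q_i (x-b_i)
# are illustrative; Q_i has diagonal entries in [1, 3], so m_f = 1, L_f = 3.
rng = np.random.default_rng(1)
n, d = 5, 3
A = np.zeros((n, n))
for i in range(n):                           # ring graph with unit weights
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(1)) - A
eigs = np.linalg.eigvalsh(L)
lam2, lamn = eigs[1], eigs[-1]

Q = [np.diag(rng.uniform(1.0, 3.0, d)) for _ in range(n)]
b = [rng.standard_normal(d) for _ in range(n)]
m_f, L_f = 1.0, 3.0
alpha = 2.0 / (lamn / m_f + lam2 / L_f)      # within the admissible range (0, 2*m_f/lamn)

x = [np.zeros(d) for _ in range(n)]
nu = [np.zeros(d) for _ in range(n)]         # sum_i nu_0^i = 0
eps, rho = 1.0, 0.5                          # geometric accuracy schedule

for k in range(80):
    for i in range(n):                       # (16a): truncated inner minimization
        for _ in range(500):
            grad = Q[i] @ (x[i] - b[i]) + nu[i]
            if grad @ grad / (2.0 * m_f) <= eps:    # certified eps-suboptimality
                break
            x[i] = x[i] - grad / L_f         # gradient step with step size 1/L_f
    nu = [nu[i] + alpha * sum(A[i, j] * (x[i] - x[j]) for j in range(n))
          for i in range(n)]                 # (16b): one neighbor communication round
    eps = max(rho * eps, 1e-10)              # shrink inner errors geometrically

spread = max(np.linalg.norm(x[i] - x[0]) for i in range(n))
assert spread < 1e-2                         # agents have (approximately) agreed
```

Early on the budget is loose and each agent takes only a handful of inner gradient steps; the budget then tightens geometrically, matching the error schedule the analysis below shows preserves the exact rate.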
Here, we focus on accuracy levels expressed in terms of objective values; i.e., we assume that at each global time $k+1$, agent $i$ admits an $\varepsilon_{k+1}^i$-suboptimal solution $\hat{x}_{k+1}^i$ such that
$$f_i(\hat{x}_{k+1}^i) + \nu_k^{i\top} \hat{x}_{k+1}^i - \min_x \big\{f_i(x) + \nu_k^{i\top} x\big\} \leq \varepsilon_{k+1}^i. \quad (16c)$$
The updates in (16) can then collectively be written as
$$\hat{x}_{k+1} \approx \operatorname{argmin}_{x \in \mathbb{R}^{nd}} f(x) + \nu_k^\top x, \quad (17a)$$
$$\nu_{k+1} = \nu_k + \alpha \mathbf{L} \hat{x}_{k+1}. \quad (17b)$$
By summing both sides of (16c) over $i = 1, \ldots, n$, we obtain
$$f(\hat{x}_{k+1}) + \nu_k^\top \hat{x}_{k+1} - \min_x \big\{f(x) + \nu_k^\top x\big\} \leq \mathbf{1}_n^\top \varepsilon_{k+1} =: \bar{\varepsilon}_{k+1}, \quad (17c)$$
where $\varepsilon_k = [\varepsilon_k^1, \ldots, \varepsilon_k^n]^\top$ is the stacked vector of local errors defined in (16c). Notice that the exact dual ascent dynamics (14) corresponds to $\varepsilon_k = 0$. The reason for choosing the objective value as a stopping criterion is twofold. First, this criterion is the most relaxed (weakest) stopping criterion [15]. Second, for several algorithms in the literature, we can find explicit bounds on the number of iterations required to obtain a desired accuracy in terms of the objective value. The decentralized inexact dual ascent method is outlined in Algorithm 1.

Algorithm 1 Decentralized Inexact Dual Ascent (DIDA)
Given: $\{f_i\}_{i=1}^n$ with $f_i \in F(m_f^i, L_f^i)$; an undirected, connected communication graph with weighted Laplacian $L$; step size $0 < \alpha < 2 m_f / \lambda_n(L)$, where $m_f = \min_i m_f^i$.
1: Initialize $\nu_0^i = 0$ and $x_0^i \in \mathbb{R}^d$ for all $i = 1, \ldots, n$.
2: for $k = 0, 1, 2, \ldots$, all agents $i$ do
3:   Find $\hat{x}_{k+1}^i$ such that $f_i(\hat{x}_{k+1}^i) + \nu_k^{i\top} \hat{x}_{k+1}^i - \min_x \{f_i(x) + \nu_k^{i\top} x\} \leq \varepsilon_{k+1}^i$.
4:   $\nu_{k+1}^i = \nu_k^i + \alpha \sum_{j \in N_i} w_{ij} (\hat{x}_{k+1}^i - \hat{x}_{k+1}^j)$.
5: end for

A. Convergence Analysis

We now analyze the convergence of the decentralized inexact dual ascent dynamics. For convenience, we perform the convergence analysis in terms of the original dual variables $\lambda$. We first define the dual gradient mapping $F : \mathbb{R}^{nd} \to \mathbb{R}^{nd}$ as follows,
$$F(\lambda) := \lambda + \alpha \mathbf{L}^{1/2} x^\star(\lambda), \qquad x^\star(\lambda) = \operatorname{argmin}_x \mathcal{L}(x, \lambda). \quad (18)$$
By this definition, the recursions in (17), after the change of variables $\nu = \mathbf{L}^{1/2} \lambda$, can be written as
$$\mathcal{L}(\hat{x}_{k+1}, \lambda_k) - \mathcal{L}(x_{k+1}, \lambda_k) \leq \bar{\varepsilon}_{k+1}, \quad (19a)$$
$$\lambda_{k+1} = F(\lambda_k) + e_{k+1}, \quad (19b)$$
$$e_{k+1} = \alpha \mathbf{L}^{1/2} (\hat{x}_{k+1} - x_{k+1}). \quad (19c)$$
Here, $e_{k+1}$ is the error propagated to the dual ascent update as a result of the inexact inner minimization. To analyze the convergence of (19), we first characterize the smoothness properties of $F(\lambda)$.

Lemma 1: Consider the Lagrangian in (8), where $f \in F(m_f, L_f)$. Then the dual gradient mapping, defined as in (18), satisfies $\|F(\lambda) - F(\nu)\| \leq L_F \|\lambda - \nu\|$ for all $\lambda, \nu \perp \mathrm{Null}(\mathbf{L})$, where the Lipschitz constant $L_F$ is given by
$$L_F = \max\big(|1 - \alpha \lambda_n(L)/m_f|,\ |1 - \alpha \lambda_2(L)/L_f|\big). \quad (20)$$
Proof: See Appendix VI-A.

A direct consequence of Lemma 1 is that the dual ascent mapping $F(\lambda)$ is contractive on the subspace orthogonal to $\mathrm{Null}(\mathbf{L})$ provided that $L_F < 1$. It is easy to verify that this condition is satisfied by the selection $0 < \alpha < 2 m_f / \lambda_n(L)$. Therefore, in view of (19b), the dual update is a contractive

dynamics perturbed with additive errors. In the following, we characterize the convergence rate of (19).

Proposition 1: Consider the decentralized inexact dual ascent dynamics outlined in (19), where $f \in F(m_f, L_f)$, and define $L_F$ as in (20). Then, for all $k \geq 1$, we have
$$\|\hat{x}_{k+1} - x^\star\| \leq A_k + B_k, \quad (21)$$
where
$$A_k = L_F^k \, \frac{\lambda_n(L)^{1/2}}{m_f} \|\lambda_0 - \lambda^\star\|, \qquad B_k = \alpha \lambda_n(L) \Big(\frac{2}{m_f^3}\Big)^{1/2} \sum_{j=1}^k L_F^{k-j} \sqrt{\bar{\varepsilon}_j} + \Big(\frac{2 \bar{\varepsilon}_{k+1}}{m_f}\Big)^{1/2}.$$
Proof: See Appendix VI-B.

We now describe the terms $A_k$ and $B_k$ that determine the overall convergence rate of the algorithm. The first term, $A_k$, is error-independent and vanishes exponentially at the same rate as in the exact algorithm (i.e., when $\varepsilon_k \equiv 0$); in other words, $A_k = O(L_F^k)$. The second term, $B_k$, is a weighted sum of the inner minimization errors, whose limiting behavior depends on that of $\varepsilon_k$. Explicitly, suppose $\sqrt{\bar{\varepsilon}_k} \in O(\rho_e^k)$ for some $0 < \rho_e < 1$ with $\rho_e \neq L_F$. Then we can verify that $B_k \in O(\max(\rho_e, L_F)^k)$, which further implies that $\|\hat{x}_{k+1} - x^\star\| \in O(\max(\rho_e, L_F)^k)$. In particular, if the errors in the inexact minimization step vanish faster than the contraction factor of the dual dynamics, i.e., if $\rho_e < L_F$, the convergence rate of the algorithm is unaffected by the inexact inner minimization. This suggests that the agents can start with a crude solution of their inner minimization, when they are far from the global solution, and then increase their accuracy geometrically after each round of communication. This leads to a substantial computational gain for each agent in the initial iterations of the algorithm. Finally, if $\rho_e = L_F$, it is not difficult to show that $B_k \in O(k L_F^k)$, leading to an $O(k L_F^k)$ overall convergence rate.

We close this section with a remark.

Remark 1 (Comparison with Decentralized ADMM): Recall the expression for the convergence factor of the inexact dual gradient algorithm stated in Proposition 1,
$$L_F = \max\big(|1 - \alpha \lambda_n(L)/m_f|,\ |1 - \alpha \lambda_2(L)/L_f|\big), \quad (22)$$
which is a function of the step size $\alpha$.
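The factor (22) is easy to evaluate numerically. A small sketch, with illustrative spectrum and curvature values, checking that $L_F < 1$ holds precisely on the admissible step-size range and that the optimized step size discussed next achieves the claimed factor:

```python
import numpy as np

# Evaluate the convergence factor (22); lam2, lamn, m_f, L_f are illustrative.
lam2, lamn = 1.5, 4.0        # lambda_2(L), lambda_n(L) of a connected graph
m_f, L_f = 1.0, 3.0          # strong convexity and smoothness constants of f

def conv_factor(alpha):
    return max(abs(1 - alpha * lamn / m_f), abs(1 - alpha * lam2 / L_f))

# Contraction (L_F < 1) holds exactly for 0 < alpha < 2*m_f/lambda_n(L).
alphas = np.linspace(1e-4, 2 * m_f / lamn - 1e-4, 1000)
assert all(conv_factor(a) < 1 for a in alphas)
assert conv_factor(2 * m_f / lamn + 0.01) >= 1

# The optimized step size balances the two terms and yields the factor in (23).
a_star = 2.0 / (lamn / m_f + lam2 / L_f)
kf, kG = L_f / m_f, lamn / lam2
assert np.isclose(conv_factor(a_star), (kf * kG - 1) / (kf * kG + 1))
```

At $\alpha^\star$ the two absolute values in (22) are equal, which is the usual balancing argument behind the optimal factor.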
By simple algebraic calculations, the optimal (smallest) convergence factor is attained when the step size is selected as
$$\alpha^\star = \frac{2}{\lambda_n(L)/m_f + \lambda_2(L)/L_f}.$$
For this selection, the optimal convergence factor is
$$L_F^\star = \frac{\kappa_f \kappa_G - 1}{\kappa_f \kappa_G + 1}, \quad (23)$$
where we have defined the graph condition number as $\kappa_G = \lambda_n(L)/\lambda_2(L)$, and $\kappa_f = L_f/m_f$ is the condition number of the objective function, as before.

Fig. 1: A comparison of the exponential convergence factors of decentralized dual ascent and of decentralized ADMM as established in [12]. Here $\kappa_f$ is the condition number of the global objective function and $\kappa_G$ is the condition number of the graph Laplacian; smaller convergence factors mean faster convergence.

For the purpose of comparison, we consider the decentralized ADMM implementation of the original problem (4) outlined in [12]. Under the same assumptions as in the present work, the authors establish a corresponding optimal linear convergence factor $L_{\mathrm{ADMM}}$, again a function of $\kappa_f$ and $\kappa_G$. In Fig. 1, we compare the convergence factor of decentralized dual ascent, Eq. (23), with that of decentralized ADMM over a range of values of $(\kappa_f, \kappa_G)$. We observe that the worst-case convergence factor of decentralized dual ascent is smaller, by a considerable margin, than the rate established in [12]. Intuitively, in ADMM we still need to solve exact inner minimizations, but in a sequential fashion; this sequential minimization introduces inherent inexactness into the dual gradient information even when the inner minimizations are performed exactly. In contrast, decentralized dual ascent provides the algorithm with exact dual gradient information, and even when the inner minimization is performed inexactly, Proposition 1 states that the convergence rate is unaffected as long as the inner minimization errors are decreased at an appropriate rate.

IV.
NUMERICAL SIMULATIONS

In this section we consider the problem of training a binary classifier with a dataset¹ that is scattered across a multi-agent network. For the connectivity graph, we consider a random network with $n$ nodes and connection probability $p$. Let us denote by $(x_k^i, y_k^i) \in \mathbb{R}^{d-1} \times \{-1, 1\}$ the $k$-th data point of the training set of agent $i$, where $i = 1, \ldots, n$ and $k = 1, \ldots, N_i$. Each agent has a local copy $w^i \in \mathbb{R}^d$ of the global classifier and a local function based

¹ftp://ftp.ics.uci.edu/pub/machine-learning-databases

Fig. 2: Plot of the norm of the gradient of the Lagrangian with respect to the primal variable (in blue) and the dual variable (in red) for the numerical example of Section IV.

on the training data observed,
$$f_i(w^i) = \sum_{k=1}^{N_i} \log\Big(1 + e^{-y_k^i [x_k^{i\top}\ 1]\, w^i}\Big) + \frac{\gamma}{2} \|w^i\|^2. \quad (25)$$
Here, $\gamma > 0$ is the regularization constant, which we set to $\gamma = 1$. Given these numerical values, we run Algorithm 1 assuming that the inaccuracy of the inner minimization decreases at a geometric rate with $\rho_e = 0.99$. In Figure 2, we plot the evolution of the norm of the derivative of the Lagrangian with respect to the primal and dual variables.

V. CONCLUSIONS

We studied decentralized consensus optimization of smooth and strongly convex objective functions using decentralized inexact dual ascent. Specifically, we considered a Lagrangian formulation in which the inner minimization of the Lagrangian is locally computable at each node. We assumed that the inner minimizations are performed inexactly, which turns the dual update into an inexact dual ascent. We analyzed the effect of this inaccuracy on the overall convergence rate, and showed that the convergence rate of the algorithm does not deteriorate if the inner minimization errors are controlled in an appropriate way.

VI. APPENDIX

A. Proof of Lemma 1

Since $f(x)$, and in turn $\mathcal{L}(x, \lambda)$, is strongly convex in $x$, the mapping $\lambda \mapsto x^\star(\lambda) = \operatorname{argmin}_x \mathcal{L}(x, \lambda)$ is well defined and differentiable almost everywhere. To show this, we begin with the optimality condition that defines $x^\star(\lambda)$:
$$\nabla f(x^\star(\lambda)) + \mathbf{L}^{1/2} \lambda = 0. \quad (26)$$
By strong convexity of $f$, we can write [20]
$$m_f \|x^\star(\lambda) - x^\star(\mu)\| \leq \|\nabla f(x^\star(\lambda)) - \nabla f(x^\star(\mu))\| = \|\mathbf{L}^{1/2} (\lambda - \mu)\| \leq \lambda_n(L)^{1/2} \|\lambda - \mu\|,$$
where the second step uses (26). The above inequality establishes the Lipschitz continuity of the map $x^\star(\lambda)$; hence, $x^\star(\lambda)$ is differentiable almost everywhere. Now, since (26) holds for all $\lambda$, we can differentiate both sides with respect to $\lambda$ and use the chain rule to obtain
$$\nabla^2 f(x^\star(\lambda)) \, \frac{d x^\star(\lambda)}{d \lambda} + \mathbf{L}^{1/2} = 0.$$
On the other hand, by Danskin's theorem [19], the dual function is differentiable with its gradient given by $\nabla d(\lambda) = \mathbf{L}^{1/2} x^\star(\lambda)$. By differentiating one more time, we obtain the dual Hessian
$$\nabla^2 d(\lambda) = \mathbf{L}^{1/2} \frac{d x^\star(\lambda)}{d \lambda} = -\mathbf{L}^{1/2} \nabla^2 f(x^\star(\lambda))^{-1} \mathbf{L}^{1/2}.$$
Note that the Hessian is negative definite on the orthogonal complement of $\mathrm{Null}(\mathbf{L})$. To see this, we note that $\lambda^\top \nabla^2 d(\lambda) \lambda = -(\mathbf{L}^{1/2} \lambda)^\top \nabla^2 f(x^\star(\lambda))^{-1} (\mathbf{L}^{1/2} \lambda)$. Since $m_f I_{nd} \preceq \nabla^2 f(x^\star(\lambda)) \preceq L_f I_{nd}$ by assumption, we obtain
$$-\frac{1}{m_f} \lambda^\top \mathbf{L} \lambda \leq \lambda^\top \nabla^2 d(\lambda) \lambda \leq -\frac{1}{L_f} \lambda^\top \mathbf{L} \lambda. \quad (27)$$
Furthermore, for all $\lambda \perp \mathrm{Null}(\mathbf{L})$, we can write
$$\lambda_2(L) \|\lambda\|^2 \leq \lambda^\top \mathbf{L} \lambda \leq \lambda_n(L) \|\lambda\|^2. \quad (28)$$
Incorporating (28) into (27), we obtain
$$-\frac{\lambda_n(L)}{m_f} I_{nd} \preceq \nabla^2 d(\lambda) \preceq -\frac{\lambda_2(L)}{L_f} I_{nd}, \qquad \lambda \perp \mathrm{Null}(\mathbf{L}). \quad (29)$$
On the other hand, the Jacobian of $F(\lambda) = \lambda + \alpha \nabla d(\lambda)$ is given by
$$\frac{d}{d\lambda} F(\lambda) = I_{nd} + \alpha \nabla^2 d(\lambda).$$
Using (29), we can write
$$\Big(1 - \frac{\alpha \lambda_n(L)}{m_f}\Big) I_{nd} \preceq \frac{d}{d\lambda} F(\lambda) \preceq \Big(1 - \frac{\alpha \lambda_2(L)}{L_f}\Big) I_{nd}.$$
Therefore, restricted to the orthogonal complement of $\mathrm{Null}(\mathbf{L})$, the Jacobian of $F$ satisfies the bound
$$\Big\|\frac{d}{d\lambda} F(\lambda)\Big\| \leq L_F = \max\Big(\Big|1 - \frac{\alpha \lambda_n(L)}{m_f}\Big|,\ \Big|1 - \frac{\alpha \lambda_2(L)}{L_f}\Big|\Big). \quad (30)$$
Next, using Taylor's theorem, for all $\lambda, \nu \perp \mathrm{Null}(\mathbf{L})$, we can write
$$F(\lambda) - F(\nu) = \int_0^1 \frac{d}{d\lambda} F\big(\nu + t(\lambda - \nu)\big) (\lambda - \nu) \, dt.$$

Since $\nu + t(\lambda - \nu) \perp \mathrm{Null}(\mathbf{L})$ for $0 \leq t \leq 1$, we can write
$$\|F(\lambda) - F(\nu)\| = \Big\| \int_0^1 \frac{d}{d\lambda} F\big(\nu + t(\lambda - \nu)\big) (\lambda - \nu) \, dt \Big\| \leq \int_0^1 \Big\| \frac{d}{d\lambda} F\big(\nu + t(\lambda - \nu)\big) \Big\| \, \|\lambda - \nu\| \, dt \leq L_F \|\lambda - \nu\|.$$
In other words, $F$ is Lipschitz continuous on the orthogonal complement of $\mathrm{Null}(\mathbf{L})$ with parameter $L_F$. The proof is now complete.

B. Proof of Proposition 1

Consider the map $x \mapsto \mathcal{L}(x, \lambda) = f(x) + \lambda^\top \mathbf{L}^{1/2} x$, which is $m_f$-strongly convex. We can therefore write [20]
$$\frac{m_f}{2} \|\hat{x}_{k+1} - x_{k+1}\|^2 \leq \mathcal{L}(\hat{x}_{k+1}, \lambda_k) - \mathcal{L}(x_{k+1}, \lambda_k).$$
The right-hand side is bounded by $\bar{\varepsilon}_{k+1}$, according to (19a). Therefore, we can write
$$\|\hat{x}_{k+1} - x_{k+1}\| \leq \Big(\frac{2 \bar{\varepsilon}_{k+1}}{m_f}\Big)^{1/2}, \quad (31)$$
which further implies
$$\|e_{k+1}\| = \alpha \|\mathbf{L}^{1/2} (\hat{x}_{k+1} - x_{k+1})\| \leq C \sqrt{\bar{\varepsilon}_{k+1}}, \quad (32)$$
where $C = (2 \lambda_n(L) \alpha^2 / m_f)^{1/2}$. Similarly, we use the strong convexity of $f$ to write $m_f \|x_{k+1} - x^\star\| \leq \|\nabla f(x_{k+1}) - \nabla f(x^\star)\|$. Recalling that $\nabla f(x_{k+1}) + \mathbf{L}^{1/2} \lambda_k = 0$ and $\nabla f(x^\star) + \mathbf{L}^{1/2} \lambda^\star = 0$ (see (11)), we can write
$$\|x_{k+1} - x^\star\| \leq \frac{1}{m_f} \|\mathbf{L}^{1/2} \lambda_k - \mathbf{L}^{1/2} \lambda^\star\| \leq \frac{\lambda_n(L)^{1/2}}{m_f} \|\lambda_k - \lambda^\star\|. \quad (33)$$
On the other hand, observe that we can write $\lambda_k = \lambda_0 + \sum_{j=1}^k \alpha \mathbf{L}^{1/2} \hat{x}_j$ for $k \geq 1$. Since $\lambda_0 \perp \mathrm{Null}(\mathbf{L})$ by assumption (initialization) and $\sum_{j=1}^k \alpha \mathbf{L}^{1/2} \hat{x}_j \perp \mathrm{Null}(\mathbf{L})$, we have that $\lambda_k \perp \mathrm{Null}(\mathbf{L})$ for all $k$; i.e., $\lambda_k$ lies in the subspace on which the dual gradient mapping $F(\lambda)$ is contractive. Invoking Lemma 1, we can write
$$\|\lambda_k - \lambda^\star\| = \|F(\lambda_{k-1}) - F(\lambda^\star) + e_k\| \leq \|F(\lambda_{k-1}) - F(\lambda^\star)\| + \|e_k\| \leq L_F \|\lambda_{k-1} - \lambda^\star\| + \|e_k\|. \quad (34)$$
By iterating down to $k = 1$ and using the bound in (32), we obtain
$$\|\lambda_k - \lambda^\star\| \leq L_F^k \|\lambda_0 - \lambda^\star\| + C \sum_{j=1}^k L_F^{k-j} \sqrt{\bar{\varepsilon}_j}. \quad (35)$$
Finally, we use the triangle inequality to write
$$\|\hat{x}_{k+1} - x^\star\| \leq \|\hat{x}_{k+1} - x_{k+1}\| + \|x_{k+1} - x^\star\|. \quad (36)$$
The first and second terms on the right-hand side can be bounded by (31) and (33), respectively. Substituting these bounds into (36) and further using the bound in (35) yields the desired inequality. The proof is now complete.

REFERENCES

[1] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
[2] R. Zhang and J. Kwok, "Asynchronous distributed ADMM for consensus optimization," in International Conference on Machine Learning, 2014.
[3] J. C. Duchi, A. Agarwal, and M. J. Wainwright, "Dual averaging for distributed optimization: Convergence analysis and network scaling," IEEE Transactions on Automatic Control, vol. 57, no. 3, 2012.
[4] D. V. Dimarogonas, E. Frazzoli, and K. H. Johansson, "Distributed event-triggered control for multi-agent systems," IEEE Transactions on Automatic Control, vol. 57, no. 5, 2012.
[5] K. Yuan, Q. Ling, and W. Yin, "On the convergence of decentralized gradient descent," SIAM Journal on Optimization, vol. 26, no. 3, 2016.
[6] W. Shi, Q. Ling, G. Wu, and W. Yin, "EXTRA: An exact first-order algorithm for decentralized consensus optimization," SIAM Journal on Optimization, vol. 25, no. 2, 2015.
[7] A. Mokhtari, Q. Ling, and A. Ribeiro, "Network Newton distributed optimization methods," IEEE Transactions on Signal Processing, vol. 65, no. 1, 2017.
[8] A. Nedic and A. Ozdaglar, "Distributed subgradient methods for multi-agent optimization," IEEE Transactions on Automatic Control, vol. 54, no. 1, 2009.
[9] D. Feijer and F. Paganini, "Stability of primal–dual gradient dynamics and applications to network optimization," Automatica, vol. 46, 2010.
[10] S. S. Kia, J. Cortés, and S. Martínez, "Distributed convex optimization via continuous-time coordination algorithms with discrete-time communication," Automatica, vol. 55, 2015.
[11] J. Wang and N. Elia, "Control approach to distributed optimization," in 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), IEEE, 2010.
[12] W. Shi, Q. Ling, K. Yuan, G. Wu, and W. Yin, "On the linear convergence of the ADMM in decentralized consensus optimization," IEEE Transactions on Signal Processing, vol. 62, no. 7, 2014.
[13] M. Zargham, A. Ribeiro, A. Ozdaglar, and A. Jadbabaie, "Accelerated dual descent for network flow optimization," IEEE Transactions on Automatic Control, vol. 59, no. 4, 2014.
[14] R. Tutunov, H. B. Ammar, and A. Jadbabaie, "A distributed Newton method for large scale consensus optimization," arXiv preprint, 2016.
[15] O. Devolder, F. Glineur, and Y. Nesterov, "First-order methods of smooth convex optimization with inexact oracle," Mathematical Programming, vol. 146, no. 1–2, 2014.
[16] M. Schmidt, N. Le Roux, and F. R. Bach, "Convergence rates of inexact proximal-gradient methods for convex optimization," in Advances in Neural Information Processing Systems, 2011.
[17] A. d'Aspremont, "Smooth optimization with approximate gradient," SIAM Journal on Optimization, vol. 19, no. 3, 2008.
[18] V. Nedelcu, I. Necoara, and Q. Tran-Dinh, "Computational complexity of inexact gradient augmented Lagrangian methods: Application to constrained MPC," SIAM Journal on Control and Optimization, vol. 52, no. 5, 2014.
[19] D. P. Bertsekas, Nonlinear Programming. Athena Scientific, Belmont, MA.
[20] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.


More information

Support Vector Machines and Kernel Methods

Support Vector Machines and Kernel Methods 2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University

More information

Lecture 6: Conic Optimization September 8

Lecture 6: Conic Optimization September 8 IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions

More information

Distributed Convex Optimization

Distributed Convex Optimization Master Program 2013-2015 Electrical Engineering Distributed Convex Optimization A Study on the Primal-Dual Method of Multipliers Delft University of Technology He Ming Zhang, Guoqiang Zhang, Richard Heusdens

More information

Fast Nonnegative Matrix Factorization with Rank-one ADMM

Fast Nonnegative Matrix Factorization with Rank-one ADMM Fast Nonnegative Matrix Factorization with Rank-one Dongjin Song, David A. Meyer, Martin Renqiang Min, Department of ECE, UCSD, La Jolla, CA, 9093-0409 dosong@ucsd.edu Department of Mathematics, UCSD,

More information

Dual Methods. Lecturer: Ryan Tibshirani Convex Optimization /36-725

Dual Methods. Lecturer: Ryan Tibshirani Convex Optimization /36-725 Dual Methods Lecturer: Ryan Tibshirani Conve Optimization 10-725/36-725 1 Last time: proimal Newton method Consider the problem min g() + h() where g, h are conve, g is twice differentiable, and h is simple.

More information

Distributed Optimization via Alternating Direction Method of Multipliers

Distributed Optimization via Alternating Direction Method of Multipliers Distributed Optimization via Alternating Direction Method of Multipliers Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato Stanford University ITMANET, Stanford, January 2011 Outline precursors dual decomposition

More information

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST) Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization

Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization Shai Shalev-Shwartz and Tong Zhang School of CS and Engineering, The Hebrew University of Jerusalem Optimization for Machine

More information

Dual Ascent. Ryan Tibshirani Convex Optimization

Dual Ascent. Ryan Tibshirani Convex Optimization Dual Ascent Ryan Tibshirani Conve Optimization 10-725 Last time: coordinate descent Consider the problem min f() where f() = g() + n i=1 h i( i ), with g conve and differentiable and each h i conve. Coordinate

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

Coordinate Descent and Ascent Methods

Coordinate Descent and Ascent Methods Coordinate Descent and Ascent Methods Julie Nutini Machine Learning Reading Group November 3 rd, 2015 1 / 22 Projected-Gradient Methods Motivation Rewrite non-smooth problem as smooth constrained problem:

More information

A Distributed Newton Method for Network Utility Maximization, II: Convergence

A Distributed Newton Method for Network Utility Maximization, II: Convergence A Distributed Newton Method for Network Utility Maximization, II: Convergence Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract The existing distributed algorithms for Network Utility

More information

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research Introduction to Machine Learning Lecture 7 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Convex Optimization Differentiation Definition: let f : X R N R be a differentiable function,

More information

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

More information

BLOCK ALTERNATING OPTIMIZATION FOR NON-CONVEX MIN-MAX PROBLEMS: ALGORITHMS AND APPLICATIONS IN SIGNAL PROCESSING AND COMMUNICATIONS

BLOCK ALTERNATING OPTIMIZATION FOR NON-CONVEX MIN-MAX PROBLEMS: ALGORITHMS AND APPLICATIONS IN SIGNAL PROCESSING AND COMMUNICATIONS BLOCK ALTERNATING OPTIMIZATION FOR NON-CONVEX MIN-MAX PROBLEMS: ALGORITHMS AND APPLICATIONS IN SIGNAL PROCESSING AND COMMUNICATIONS Songtao Lu, Ioannis Tsaknakis, and Mingyi Hong Department of Electrical

More information

arxiv: v2 [math.oc] 7 Apr 2017

arxiv: v2 [math.oc] 7 Apr 2017 Optimal algorithms for smooth and strongly convex distributed optimization in networks arxiv:702.08704v2 [math.oc] 7 Apr 207 Kevin Scaman Francis Bach 2 Sébastien Bubeck 3 Yin Tat Lee 3 Laurent Massoulié

More information

Constrained Optimization

Constrained Optimization 1 / 22 Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 30, 2015 2 / 22 1. Equality constraints only 1.1 Reduced gradient 1.2 Lagrange

More information

5. Duality. Lagrangian

5. Duality. Lagrangian 5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

More information

A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations

A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations Chuangchuang Sun and Ran Dai Abstract This paper proposes a customized Alternating Direction Method of Multipliers

More information

Distributed online optimization over jointly connected digraphs

Distributed online optimization over jointly connected digraphs Distributed online optimization over jointly connected digraphs David Mateos-Núñez Jorge Cortés University of California, San Diego {dmateosn,cortes}@ucsd.edu Mathematical Theory of Networks and Systems

More information

Lecture Notes on Support Vector Machine

Lecture Notes on Support Vector Machine Lecture Notes on Support Vector Machine Feng Li fli@sdu.edu.cn Shandong University, China 1 Hyperplane and Margin In a n-dimensional space, a hyper plane is defined by ω T x + b = 0 (1) where ω R n is

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Preconditioning via Diagonal Scaling

Preconditioning via Diagonal Scaling Preconditioning via Diagonal Scaling Reza Takapoui Hamid Javadi June 4, 2014 1 Introduction Interior point methods solve small to medium sized problems to high accuracy in a reasonable amount of time.

More information

Interior-Point Methods for Linear Optimization

Interior-Point Methods for Linear Optimization Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function

More information

Dual and primal-dual methods

Dual and primal-dual methods ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method

More information

Gradient Descent. Dr. Xiaowei Huang

Gradient Descent. Dr. Xiaowei Huang Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,

More information

Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method

Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Robert M. Freund March, 2004 2004 Massachusetts Institute of Technology. The Problem The logarithmic barrier approach

More information

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume

More information

A Distributed Newton Method for Network Optimization

A Distributed Newton Method for Network Optimization A Distributed Newton Method for Networ Optimization Ali Jadbabaie, Asuman Ozdaglar, and Michael Zargham Abstract Most existing wor uses dual decomposition and subgradient methods to solve networ optimization

More information

Numerical optimization

Numerical optimization Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal

More information

ALADIN An Algorithm for Distributed Non-Convex Optimization and Control

ALADIN An Algorithm for Distributed Non-Convex Optimization and Control ALADIN An Algorithm for Distributed Non-Convex Optimization and Control Boris Houska, Yuning Jiang, Janick Frasch, Rien Quirynen, Dimitris Kouzoupis, Moritz Diehl ShanghaiTech University, University of

More information

Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints

Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints By I. Necoara, Y. Nesterov, and F. Glineur Lijun Xu Optimization Group Meeting November 27, 2012 Outline

More information

Generalization to inequality constrained problem. Maximize

Generalization to inequality constrained problem. Maximize Lecture 11. 26 September 2006 Review of Lecture #10: Second order optimality conditions necessary condition, sufficient condition. If the necessary condition is violated the point cannot be a local minimum

More information

Sparse Covariance Selection using Semidefinite Programming

Sparse Covariance Selection using Semidefinite Programming Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information

Asynchronous Distributed Optimization. via Randomized Dual Proximal Gradient

Asynchronous Distributed Optimization. via Randomized Dual Proximal Gradient Asynchronous Distributed Optimization 1 via Randomized Dual Proximal Gradient Ivano Notarnicola and Giuseppe Notarstefano arxiv:1509.08373v2 [cs.sy] 24 Jun 2016 Abstract In this paper we consider distributed

More information

WE consider the problem of estimating a time varying

WE consider the problem of estimating a time varying 450 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL 61, NO 2, JANUARY 15, 2013 D-MAP: Distributed Maximum a Posteriori Probability Estimation of Dynamic Systems Felicia Y Jakubiec Alejro Ribeiro Abstract This

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Support vector machines (SVMs) are one of the central concepts in all of machine learning. They are simply a combination of two ideas: linear classification via maximum (or optimal

More information

A Distributed Newton Method for Network Utility Maximization

A Distributed Newton Method for Network Utility Maximization A Distributed Newton Method for Networ Utility Maximization Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie Abstract Most existing wor uses dual decomposition and subgradient methods to solve Networ Utility

More information

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems 1 Numerical optimization Alexander & Michael Bronstein, 2006-2009 Michael Bronstein, 2010 tosca.cs.technion.ac.il/book Numerical optimization 048921 Advanced topics in vision Processing and Analysis of

More information

Conditional Gradient (Frank-Wolfe) Method

Conditional Gradient (Frank-Wolfe) Method Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions

More information

Constrained optimization: direct methods (cont.)

Constrained optimization: direct methods (cont.) Constrained optimization: direct methods (cont.) Jussi Hakanen Post-doctoral researcher jussi.hakanen@jyu.fi Direct methods Also known as methods of feasible directions Idea in a point x h, generate a

More information

Convex Optimization Boyd & Vandenberghe. 5. Duality

Convex Optimization Boyd & Vandenberghe. 5. Duality 5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Distributed Coordination for Separable Convex Optimization with Coupling Constraints

Distributed Coordination for Separable Convex Optimization with Coupling Constraints Distributed Coordination for Separable Convex Optimization with Coupling Constraints Simon K. Niederländer Jorge Cortés Abstract This paper considers a network of agents described by an undirected graph

More information

Convex Optimization M2

Convex Optimization M2 Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization

More information

Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725

Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725 Karush-Kuhn-Tucker Conditions Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given a minimization problem Last time: duality min x subject to f(x) h i (x) 0, i = 1,... m l j (x) = 0, j =

More information

Homework 4. Convex Optimization /36-725

Homework 4. Convex Optimization /36-725 Homework 4 Convex Optimization 10-725/36-725 Due Friday November 4 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

Distributed Consensus-Based Optimization

Distributed Consensus-Based Optimization Advanced Topics in Control 208: Distributed Systems & Control Florian Dörfler We have already seen in previous lectures that optimization problems defined over a network of agents with individual cost

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of

More information

Lecture 9: Large Margin Classifiers. Linear Support Vector Machines

Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Perceptrons Definition Perceptron learning rule Convergence Margin & max margin classifiers (Linear) support vector machines Formulation

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized

More information

Primal-dual Subgradient Method for Convex Problems with Functional Constraints

Primal-dual Subgradient Method for Convex Problems with Functional Constraints Primal-dual Subgradient Method for Convex Problems with Functional Constraints Yurii Nesterov, CORE/INMA (UCL) Workshop on embedded optimization EMBOPT2014 September 9, 2014 (Lucca) Yu. Nesterov Primal-dual

More information

Unconstrained minimization of smooth functions

Unconstrained minimization of smooth functions Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and

More information

A projection algorithm for strictly monotone linear complementarity problems.

A projection algorithm for strictly monotone linear complementarity problems. A projection algorithm for strictly monotone linear complementarity problems. Erik Zawadzki Department of Computer Science epz@cs.cmu.edu Geoffrey J. Gordon Machine Learning Department ggordon@cs.cmu.edu

More information

Average-Consensus of Multi-Agent Systems with Direct Topology Based on Event-Triggered Control

Average-Consensus of Multi-Agent Systems with Direct Topology Based on Event-Triggered Control Outline Background Preliminaries Consensus Numerical simulations Conclusions Average-Consensus of Multi-Agent Systems with Direct Topology Based on Event-Triggered Control Email: lzhx@nankai.edu.cn, chenzq@nankai.edu.cn

More information

Convex Optimization of Graph Laplacian Eigenvalues

Convex Optimization of Graph Laplacian Eigenvalues Convex Optimization of Graph Laplacian Eigenvalues Stephen Boyd Abstract. We consider the problem of choosing the edge weights of an undirected graph so as to maximize or minimize some function of the

More information

Statistical Machine Learning from Data

Statistical Machine Learning from Data Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Support Vector Machines Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique

More information

Distributed Computation of Quantiles via ADMM

Distributed Computation of Quantiles via ADMM 1 Distributed Computation of Quantiles via ADMM Franck Iutzeler Abstract In this paper, we derive distributed synchronous and asynchronous algorithms for computing uantiles of the agents local values.

More information

3.10 Lagrangian relaxation

3.10 Lagrangian relaxation 3.10 Lagrangian relaxation Consider a generic ILP problem min {c t x : Ax b, Dx d, x Z n } with integer coefficients. Suppose Dx d are the complicating constraints. Often the linear relaxation and the

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

Zeno-free, distributed event-triggered communication and control for multi-agent average consensus

Zeno-free, distributed event-triggered communication and control for multi-agent average consensus Zeno-free, distributed event-triggered communication and control for multi-agent average consensus Cameron Nowzari Jorge Cortés Abstract This paper studies a distributed event-triggered communication and

More information

Alternating Direction Method of Multipliers. Ryan Tibshirani Convex Optimization

Alternating Direction Method of Multipliers. Ryan Tibshirani Convex Optimization Alternating Direction Method of Multipliers Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last time: dual ascent min x f(x) subject to Ax = b where f is strictly convex and closed. Denote

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course otes for EE7C (Spring 018): Conve Optimization and Approimation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Ma Simchowitz Email: msimchow+ee7c@berkeley.edu October

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

More information

Homework 3. Convex Optimization /36-725

Homework 3. Convex Optimization /36-725 Homework 3 Convex Optimization 10-725/36-725 Due Friday October 14 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

Convex Optimization and Modeling

Convex Optimization and Modeling Convex Optimization and Modeling Duality Theory and Optimality Conditions 5th lecture, 12.05.2010 Jun.-Prof. Matthias Hein Program of today/next lecture Lagrangian and duality: the Lagrangian the dual

More information

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training

More information

On the Linear Convergence of Distributed Optimization over Directed Graphs

On the Linear Convergence of Distributed Optimization over Directed Graphs 1 On the Linear Convergence of Distributed Optimization over Directed Graphs Chenguang Xi, and Usman A. Khan arxiv:1510.0149v1 [math.oc] 7 Oct 015 Abstract This paper develops a fast distributed algorithm,

More information

On the linear convergence of distributed optimization over directed graphs

On the linear convergence of distributed optimization over directed graphs 1 On the linear convergence of distributed optimization over directed graphs Chenguang Xi, and Usman A. Khan arxiv:1510.0149v4 [math.oc] 7 May 016 Abstract This paper develops a fast distributed algorithm,

More information

Fantope Regularization in Metric Learning

Fantope Regularization in Metric Learning Fantope Regularization in Metric Learning CVPR 2014 Marc T. Law (LIP6, UPMC), Nicolas Thome (LIP6 - UPMC Sorbonne Universités), Matthieu Cord (LIP6 - UPMC Sorbonne Universités), Paris, France Introduction

More information

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/32 Margin Classifiers margin b = 0 Sridhar Mahadevan: CMPSCI 689 p.

More information

A State-Space Approach to Control of Interconnected Systems

A State-Space Approach to Control of Interconnected Systems A State-Space Approach to Control of Interconnected Systems Part II: General Interconnections Cédric Langbort Center for the Mathematics of Information CALIFORNIA INSTITUTE OF TECHNOLOGY clangbort@ist.caltech.edu

More information

Dual Decomposition.

Dual Decomposition. 1/34 Dual Decomposition http://bicmr.pku.edu.cn/~wenzw/opt-2017-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/34 1 Conjugate function 2 introduction:

More information