Distributed Smooth and Strongly Convex Optimization with Inexact Dual Methods

Distributed Smooth and Strongly Convex Optimization with Inexact Dual Methods

Mahyar Fazlyab, Santiago Paternain, Alejandro Ribeiro and Victor M. Preciado

Abstract. In this paper, we consider a class of decentralized convex optimization problems in which a network of agents aims to minimize a global objective function that is a sum of private (local) smooth and strongly convex objectives. More specifically, we study a decentralized inexact dual ascent method, in which the agents only approximately solve their private minimization problems and then update their dual variables using inexact dual ascent. We study the effect of the inexact inner minimization on the convergence rate. In particular, we show that the overall convergence rate is not affected by the inexact minimization if the errors are decreased at an appropriate rate. We illustrate our findings on a distributed binary classification problem.

I. INTRODUCTION

In decentralized consensus optimization, a network of agents aims to minimize a global objective function that is a sum of private (local) objective functions available to each node. The goal of the agents is to collaboratively agree on a minimizer of the global objective function without revealing their private objective functions over the course of the minimization. Applications include distributed multi-agent coordination, estimation problems in sensor networks, resource allocation problems, decentralized decision making, and large-scale optimization problems in machine learning [1]–[4].

Broadly speaking, decentralized algorithms for convex optimization fall into three categories: (i) primal methods, in which the problem is solved entirely in the primal domain and (partial) consensus among the agents is guaranteed by using disagreement functions as penalty functions [5]–[7];
(ii) primal–dual methods, in which the consensus constraint is relaxed by introducing Lagrange multipliers, and the agents simultaneously iterate over the primal and dual variables to seek a saddle point of the Lagrangian [8]–[11]; and (iii) dual methods, in which the agents solve the dual problem in a decentralized fashion [12]–[14]. The advantage of each of these methods over the others is determined by the assumptions the objective functions satisfy (smoothness, strong convexity, etc.), as well as by how the consensus constraint is encoded in the problem.

In decentralized dual methods, agents need to repeatedly compute the dual function, which amounts to solving an inner minimization problem exactly at each iteration of the dual problem. From a practical point of view, the agents may not be able to solve the inner minimization problem exactly, leading to computational errors in the dual problem. On the other hand, it is known that the convergence rate of first-order methods degrades under an inexact oracle [15]. In this context, there has been increasing interest in analyzing various optimization algorithms with an inexact oracle, such as inexact proximal gradient methods [16], smooth optimization with approximate gradients [17], and inexact augmented Lagrangian methods [18]. In this paper, we focus on smooth and strongly convex objective functions, for which we study distributed inexact dual ascent based on Laplacian consensus.

(Work supported by the NSF under grants CAREER-ECCS and IIS. The authors are with the Department of Electrical and Systems Engineering, University of Pennsylvania. {mahyarfa, paternain, aribeiro, preciado}@seas.upenn.edu.)
In each global iteration of this algorithm, the agents perform the inner minimization step up to a certain accuracy and then communicate via a Laplacian consensus protocol to update their private dual variables based on dual ascent. The resulting dual update is a gradient-ascent-like method with additive errors that depend on the accuracy of the minimization step. We quantify the effect of inexact minimization on the overall convergence of the algorithm. In particular, we show that the convergence rate of the inexact algorithm is the same as that of its exact counterpart, which is exponential, if the accuracy of the local minimization step is increased at an appropriate rate. Numerical examples in which a group of agents collectively train a binary classifier support the theoretical results.

II. DECENTRALIZED OPTIMIZATION

A. Preliminaries

We denote the set of real numbers by $\mathbb{R}$, the set of real $d$-dimensional vectors by $\mathbb{R}^d$, the set of $m \times d$ real matrices by $\mathbb{R}^{m \times d}$, the $d$-dimensional identity matrix by $I_d$, and the vector of $d$ ones by $\mathbf{1}_d$. A weighted undirected graph is defined as $G = (V, E, A)$, where $V$ is a set of $n$ nodes and $E$ is a set of $m$ undirected edges. We assume that the graph is connected and has no self-loops, and we consider graphs with weights associated to the edges. We denote the weight of an edge $\{i, j\} \in E$ by $w_{ij} > 0$. The weighted adjacency matrix, denoted by $A = [a_{ij}]$, is the $n \times n$ symmetric matrix defined entry-wise as $a_{ij} = w_{ij}$ if $\{i, j\} \in E$, and $a_{ij} = 0$ otherwise. We denote by $N_i = \{j \in V : a_{ij} \neq 0\}$ the set of nodes connected to node $i$. The weighted Laplacian matrix of $G$ is defined as $L = \mathrm{diag}(A \mathbf{1}_n) - A$, and its eigenvalue decomposition is given by $L = U \,\mathrm{diag}\{0, \lambda_2, \ldots, \lambda_n\}\, U^\top$, where $0 < \lambda_2 \leq \cdots \leq \lambda_n$ are the nontrivial eigenvalues of $L$. By defining $L^{1/2} = U \,\mathrm{diag}\{0, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_n}\}\, U^\top$, we have that $L^{1/2} L^{1/2} = L$. We define $\mathbf{L} = L \otimes I_d$ and $\mathbf{L}^{1/2} = L^{1/2} \otimes I_d$ to handle $d$-dimensional variables. Notice that by the properties of the Kronecker product,

we can write
$$\mathrm{Null}(\mathbf{L}) = \{x \in \mathbb{R}^{nd} : x = \mathbf{1}_n \otimes \tilde{x},\ \tilde{x} \in \mathbb{R}^d\}. \quad (1)$$

A differentiable function $f : \mathbb{R}^d \to \mathbb{R}$ is $\mu$-strongly convex over a convex set $C \subseteq \mathbb{R}^d$ if and only if it satisfies
$$f(x) + \nabla f(x)^\top (y - x) + \tfrac{\mu}{2} \|y - x\|^2 \leq f(y), \quad (2)$$
for all $x, y \in C$. A differentiable $f$ whose gradient is Lipschitz continuous on $C$ with parameter $L < \infty$ satisfies
$$f(y) \leq f(x) + \nabla f(x)^\top (y - x) + \tfrac{L}{2} \|y - x\|^2, \quad (3)$$
for all $x, y \in C$. We denote by $F(\mu, L)$ the class of functions satisfying both (2) and (3). Note that in this class it must hold that $\mu \leq L$. We define the condition number of $f \in F(\mu, L)$ as $\kappa_f = L/\mu$.

B. Decentralized Optimization with Laplacian Consensus

Consider the following convex optimization problem:
$$\tilde{x}^\star = \operatorname{argmin}_{\tilde{x} \in \mathbb{R}^d} \sum_{i=1}^n f_i(\tilde{x}), \quad (4)$$
where the collective objective function is the sum of $n$ convex functions $f_i : \mathbb{R}^d \to \mathbb{R}$. In a distributed setting, each objective function $f_i$ is privately available to node $i$ of a network of $n$ nodes denoted by $V = \{1, 2, \ldots, n\}$. The topology of the network is determined by a connected, weighted, and undirected graph $G = (V, E)$, where $E \subseteq V \times V$ is the set of edges. The nodes are required to collaboratively minimize the global cost function without revealing their private cost functions; the communication flow pattern among the nodes is determined by the graph $G$. Further, we assume that the local objective functions are twice continuously differentiable and strongly convex with Lipschitz gradients. In other words, we have
$$m_f^i I_d \preceq \nabla^2 f_i(x) \preceq L_f^i I_d, \quad (5)$$
for some $0 < m_f^i \leq L_f^i < \infty$ and all $i \in V$. By defining $x^i \in \mathbb{R}^d$ as the decision variable of node $i$ (a local copy of $\tilde{x} \in \mathbb{R}^d$ at node $i$) and enforcing the equality constraints $x^1 = \cdots = x^n$, an equivalent formulation of (4) is
$$x^\star = \mathbf{1}_n \otimes \tilde{x}^\star = \operatorname{argmin}_{\{x^i\}_{i=1}^n} \sum_{i=1}^n f_i(x^i) \quad \text{s.t.} \quad x^1 = \cdots = x^n. \quad (6)$$

There are various ways to enforce the consensus constraints in (6). In this paper, we use the Laplacian consensus formulation. More explicitly, consider the Laplacian $L$ of the communication graph. Since the graph is connected and undirected, we have that $L \mathbf{1}_n = 0$ and $\mathbf{1}_n^\top L = 0$.
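The Laplacian objects above are easy to construct and check numerically. A minimal numpy sketch (the graph, weights, and dimensions are illustrative, not from the paper):

```python
import numpy as np

# Illustrative 4-node connected weighted graph.
n = 4
A = np.zeros((n, n))
for (i, j, w) in [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.5), (0, 3, 0.5)]:
    A[i, j] = A[j, i] = w                    # symmetric weighted adjacency

L = np.diag(A @ np.ones(n)) - A              # weighted Laplacian L = diag(A 1) - A
lam, U = np.linalg.eigh(L)                   # eigenvalues ascending, lam[0] ~ 0
L_half = U @ np.diag(np.sqrt(np.clip(lam, 0, None))) @ U.T   # L^{1/2}

assert np.allclose(L @ np.ones(n), 0)        # L 1_n = 0
assert np.allclose(L_half @ L_half, L)       # L^{1/2} L^{1/2} = L
assert np.allclose(L_half @ np.ones(n), 0)   # Null(L^{1/2}) contains span{1_n}

# Kronecker lift for d-dimensional variables: bold L = L (x) I_d.
d = 3
L_bold = np.kron(L, np.eye(d))
x_cons = np.kron(np.ones(n), np.random.randn(d))   # consensus vector 1_n (x) x~
assert np.allclose(L_bold @ x_cons, 0)             # (1): consensus set = Null(L)
```

The last assertion verifies the characterization (1): a stacked vector lies in $\mathrm{Null}(\mathbf{L})$ exactly when all its $d$-dimensional blocks agree.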
By defining $x = [x^{1\top}, \ldots, x^{n\top}]^\top \in \mathbb{R}^{nd}$ as the concatenation of the local decision variables, and recalling $\mathbf{L}^{1/2} = L^{1/2} \otimes I_d$, we can equivalently formulate (6) as
$$\min_{x \in \mathbb{R}^{nd}} \Big\{ f(x) = \sum_{i=1}^n f_i(x^i) \Big\} \quad \text{s.t.} \quad \mathbf{L}^{1/2} x = 0. \quad (7)$$
The Lagrangian of (7) is given by
$$\mathcal{L}(x, \lambda) = f(x) + \lambda^\top \mathbf{L}^{1/2} x, \quad (8)$$
where $\lambda = [\lambda^{1\top}, \ldots, \lambda^{n\top}]^\top \in \mathbb{R}^{nd}$ is the stacked vector of local multipliers. Notice that the Lagrangian is strongly convex in $x$ and affine in $\lambda$. Further, the Lagrangian is not affected by the shift $\lambda \to \lambda + \tilde{\lambda}$ for any $\tilde{\lambda} \in \mathrm{Null}(\mathbf{L}^{1/2}) = \mathrm{Null}(\mathbf{L})$. We can therefore assume $\lambda \perp \mathrm{Null}(\mathbf{L})$ (or, equivalently, $\sum_i \lambda^i = 0$) without loss of generality. The dual function for the Lagrangian in (8) is defined by
$$d(\lambda) = \min_{x \in \mathbb{R}^{nd}} \mathcal{L}(x, \lambda). \quad (9)$$
The dual problem is then to maximize the dual function (9) with respect to $\lambda$,
$$d^\star = \max_{\lambda \perp \mathrm{Null}(\mathbf{L})} d(\lambda) = \max_{\lambda \perp \mathrm{Null}(\mathbf{L})} \min_{x \in \mathbb{R}^{nd}} \mathcal{L}(x, \lambda). \quad (10)$$
Since the primal problem (7) is feasible and its constraint is affine, Slater's condition holds, the duality gap is zero, and hence the problem can be solved via its dual formulation. The Karush–Kuhn–Tucker (KKT) optimality conditions characterizing the optimal pair $(x^\star, \lambda^\star) \in \mathbb{R}^{nd} \times \mathbb{R}^{nd}$ are
$$\nabla_x \mathcal{L}(x^\star, \lambda^\star) = \nabla f(x^\star) + \mathbf{L}^{1/2} \lambda^\star = 0, \qquad \nabla_\lambda \mathcal{L}(x^\star, \lambda^\star) = \mathbf{L}^{1/2} x^\star = 0, \quad (11)$$
where $\nabla f(x) = [\nabla f_1(x^1)^\top, \ldots, \nabla f_n(x^n)^\top]^\top$. The second identity ensures consensus, $x^\star = \mathbf{1}_n \otimes \tilde{x}^\star$, according to (1). Multiplying both sides of the first condition from the left by $\mathbf{1}_n^\top \otimes I_d$ and noticing that
$$(\mathbf{1}_n^\top \otimes I_d)\, \mathbf{L}^{1/2} = (\mathbf{1}_n^\top \otimes I_d)(L^{1/2} \otimes I_d) = (\mathbf{1}_n^\top L^{1/2}) \otimes I_d = 0,$$
we obtain $\sum_{i=1}^n \nabla f_i(\tilde{x}^\star) = 0$, which is the optimality condition for the centralized problem (4), as required.

C. Decentralized Dual Ascent

Dual ascent methods solve the dual problem (10) using gradient ascent. Since the dual function is differentiable here, the dual gradient, by Danskin's theorem [19], is given by
$$\nabla d(\lambda) = \mathbf{L}^{1/2} x^\star(\lambda), \quad \text{where} \quad x^\star(\lambda) = \operatorname{argmin}_{x \in \mathbb{R}^{nd}} \mathcal{L}(x, \lambda). \quad (12)$$
Therefore, dual ascent performs the following recursions to solve (10):
$$x_{k+1} = \operatorname{argmin}_{x \in \mathbb{R}^{nd}} \mathcal{L}(x, \lambda_k), \quad (13a)$$
$$\lambda_{k+1} = \lambda_k + \alpha \mathbf{L}^{1/2} x_{k+1}, \quad (13b)$$
with initialization $\lambda_0 \perp \mathrm{Null}(\mathbf{L})$ (or, equivalently, $\sum_i \lambda_0^i = 0$), and $\alpha > 0$ is a constant step size common to all the agents.
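To make the recursions (13) concrete, here is a minimal sketch on illustrative quadratic objectives $f_i(x) = \tfrac{1}{2}\|x - b_i\|^2$, for which the inner argmin is available in closed form (the graph, data, and iteration count are assumptions, not from the paper):

```python
import numpy as np

# Exact dual ascent (13a)-(13b) on f_i(x) = 0.5*||x - b_i||^2, so that
# argmin_x f(x) + lambda' L^{1/2} x is closed form: x = b - L^{1/2} lambda.
n, d = 4, 2
A = np.zeros((n, n))
for (i, j, w) in [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]:
    A[i, j] = A[j, i] = w                    # 4-cycle with unit weights
L = np.diag(A.sum(axis=1)) - A
lam_eig, U = np.linalg.eigh(L)
Lh = np.kron(U @ np.diag(np.sqrt(np.clip(lam_eig, 0, None))) @ U.T, np.eye(d))

rng = np.random.default_rng(0)
b = rng.standard_normal(n * d)               # stacked targets [b_1; ...; b_n]
alpha = 1.0 / lam_eig[-1]                    # within 0 < alpha < 2*m_f/lambda_n(L), m_f = 1
lam = np.zeros(n * d)                        # lambda_0 = 0 is orthogonal to Null(L)

for _ in range(200):
    x = b - Lh @ lam                         # (13a): exact inner minimization
    lam = lam + alpha * Lh @ x               # (13b): dual gradient ascent step

xbar = b.reshape(n, d).mean(axis=0)          # centralized solution of (4)
assert np.allclose(x.reshape(n, d), xbar, atol=1e-6)   # consensus on the minimizer
```

For these quadratics the dual iteration is linear, $\lambda_{k+1} = (I - \alpha \mathbf{L})\lambda_k + \alpha \mathbf{L}^{1/2} b$, so the geometric convergence predicted by the analysis below is visible directly.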
The primal update is an exact minimization of the Lagrangian, so that $\nabla d(\lambda_k) = \mathbf{L}^{1/2} x_{k+1}$ according to (12); the second update therefore corresponds to a gradient ascent step. Notice, however, that (13) is not distributed, since the linear map $\mathbf{L}^{1/2}$ is in general dense and hence not locally computable. Introducing $\nu_k = [\nu_k^{1\top}, \ldots, \nu_k^{n\top}]^\top = \mathbf{L}^{1/2} \lambda_k$ as the scaled

version of the Lagrange multipliers, we can equivalently write (13) in terms of the new dual variables $\nu$ as
$$x_{k+1} = \operatorname{argmin}_{x \in \mathbb{R}^{nd}} f(x) + \nu_k^\top x, \quad (14a)$$
$$\nu_{k+1} = \nu_k + \alpha \mathbf{L} x_{k+1}, \quad (14b)$$
which lends itself to a distributed implementation. More precisely, rewriting (14) in terms of the individual dynamics of the agents yields the following update law for agent $i \in V$:
$$x_{k+1}^i = \operatorname{argmin}_{x \in \mathbb{R}^d} f_i(x) + \nu_k^{i\top} x, \quad (15a)$$
$$\nu_{k+1}^i = \nu_k^i + \alpha \sum_{j \in N_i} w_{ij} \big(x_{k+1}^i - x_{k+1}^j\big). \quad (15b)$$
The first update in (15) requires no communication, since the function $f_i$ and the multiplier $\nu^i$ are local. The second update requires one round of communication of each agent with its neighbors to exchange the private minimizers $x_{k+1}^j$. Notice that, due to the change of variables $\nu = \mathbf{L}^{1/2} \lambda$, the initial dual variables $\nu_0$ in (14) must respect the condition $\sum_{i=1}^n \nu_0^i = 0$.

III. DECENTRALIZED INEXACT DUAL ASCENT

The decentralized dual ascent algorithm (15) relies on the assumption that the Lagrangian minimization with respect to the primal variable is exact at each iteration, which translates into exact dual gradient information being available. In other words, at each global time step $k$, the agents need to compute their local exact minimizers $\operatorname{argmin}_x \big(f_i(x) + \nu_k^{i\top} x\big)$ via an iterative method. In practice, especially in real-time applications, agents are often only able to solve their subproblems approximately by truncating their iterative inner minimization scheme, which means that the outer iteration (dual ascent) is provided with an inexact dual gradient. We therefore consider the practical case in which the agents solve their private minimizations approximately:
$$\hat{x}_{k+1}^i \approx \operatorname{argmin}_{x \in \mathbb{R}^d} f_i(x) + \nu_k^{i\top} x, \quad (16a)$$
$$\nu_{k+1}^i = \nu_k^i + \alpha \sum_{j \in N_i} w_{ij} \big(\hat{x}_{k+1}^i - \hat{x}_{k+1}^j\big). \quad (16b)$$
There exist several ways to characterize the accuracy of approximate minimizers.
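The inexact scheme (16) can be sketched end to end. The following is a toy implementation, not the paper's experiment: local objectives are assumed quadratic, the graph is a ring, and the inner loop is truncated using the standard strong-convexity certificate $\varphi(x) - \min \varphi \leq \|\nabla \varphi(x)\|^2 / (2 m_f)$, so the achieved suboptimality is guaranteed to be at most the current error budget.

```python
import numpy as np

# Sketch of decentralized inexact dual ascent (16a)-(16b) with a geometrically
# shrinking error budget. Local objectives f_i(x) = 0.5*(x-b_i)' Q_i (x-b_i)
# are illustrative; Q_i has diagonal entries in [1, 3], so m_f = 1, L_f = 3.
rng = np.random.default_rng(1)
n, d = 5, 3
A = np.zeros((n, n))
for i in range(n):                           # ring graph with unit weights
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(1)) - A
eigs = np.linalg.eigvalsh(L)
lam2, lamn = eigs[1], eigs[-1]

Q = [np.diag(rng.uniform(1.0, 3.0, d)) for _ in range(n)]
b = [rng.standard_normal(d) for _ in range(n)]
m_f, L_f = 1.0, 3.0
alpha = 2.0 / (lamn / m_f + lam2 / L_f)      # within the admissible range (0, 2*m_f/lamn)

x = [np.zeros(d) for _ in range(n)]
nu = [np.zeros(d) for _ in range(n)]         # sum_i nu_0^i = 0
eps, rho = 1.0, 0.5                          # geometric accuracy schedule

for k in range(80):
    for i in range(n):                       # (16a): truncated inner minimization
        for _ in range(500):
            grad = Q[i] @ (x[i] - b[i]) + nu[i]
            if grad @ grad / (2.0 * m_f) <= eps:    # certified eps-suboptimality
                break
            x[i] = x[i] - grad / L_f         # gradient step with step size 1/L_f
    nu = [nu[i] + alpha * sum(A[i, j] * (x[i] - x[j]) for j in range(n))
          for i in range(n)]                 # (16b): one neighbor communication round
    eps = max(rho * eps, 1e-10)              # shrink inner errors geometrically

spread = max(np.linalg.norm(x[i] - x[0]) for i in range(n))
assert spread < 1e-2                         # agents have (approximately) agreed
```

Early on the budget is loose and each agent takes only a handful of inner gradient steps; the budget then tightens geometrically, matching the error schedule the analysis below shows preserves the exact rate.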
Here, we focus on accuracy levels expressed in terms of objective values; i.e., we assume that at each global time $k+1$, agent $i$ admits an $\varepsilon_{k+1}^i$-suboptimal solution $\hat{x}_{k+1}^i$ such that
$$f_i(\hat{x}_{k+1}^i) + \nu_k^{i\top} \hat{x}_{k+1}^i - \min_x \big\{f_i(x) + \nu_k^{i\top} x\big\} \leq \varepsilon_{k+1}^i. \quad (16c)$$
The updates in (16) can then collectively be written as
$$\hat{x}_{k+1} \approx \operatorname{argmin}_{x \in \mathbb{R}^{nd}} f(x) + \nu_k^\top x, \quad (17a)$$
$$\nu_{k+1} = \nu_k + \alpha \mathbf{L} \hat{x}_{k+1}. \quad (17b)$$
By summing both sides of (16c) over $i = 1, \ldots, n$, we obtain
$$f(\hat{x}_{k+1}) + \nu_k^\top \hat{x}_{k+1} - \min_x \big\{f(x) + \nu_k^\top x\big\} \leq \mathbf{1}_n^\top \varepsilon_{k+1} =: \bar{\varepsilon}_{k+1}, \quad (17c)$$
where $\varepsilon_k = [\varepsilon_k^1, \ldots, \varepsilon_k^n]^\top$ is the stacked vector of local errors defined in (16c). Notice that the exact dual ascent dynamics (14) corresponds to $\varepsilon_k = 0$. The reason for choosing the objective value as a stopping criterion is twofold. First, this criterion is the most relaxed (weakest) stopping criterion [15]. Second, for several algorithms in the literature, we can find explicit bounds on the number of iterations required to obtain a desired accuracy in terms of the objective value. The decentralized inexact dual ascent method is outlined in Algorithm 1.

Algorithm 1 Decentralized Inexact Dual Ascent (DIDA)
Given: $\{f_i\}_{i=1}^n$ with $f_i \in F(m_f^i, L_f^i)$; an undirected, connected communication graph with weighted Laplacian $L$; step size $0 < \alpha < 2 m_f / \lambda_n(L)$, where $m_f = \min_i m_f^i$.
1: Initialize $\nu_0^i = 0$ and $x_0^i \in \mathbb{R}^d$ for all $i = 1, \ldots, n$.
2: for $k = 0, 1, 2, \ldots$, all agents $i$ do
3:   Find $\hat{x}_{k+1}^i$ such that $f_i(\hat{x}_{k+1}^i) + \nu_k^{i\top} \hat{x}_{k+1}^i - \min_x \{f_i(x) + \nu_k^{i\top} x\} \leq \varepsilon_{k+1}^i$.
4:   $\nu_{k+1}^i = \nu_k^i + \alpha \sum_{j \in N_i} w_{ij} (\hat{x}_{k+1}^i - \hat{x}_{k+1}^j)$.
5: end for

A. Convergence Analysis

We now analyze the convergence of the decentralized inexact dual ascent dynamics. For convenience, we perform the convergence analysis in terms of the original dual variables $\lambda$. We first define the dual gradient mapping $F : \mathbb{R}^{nd} \to \mathbb{R}^{nd}$ as follows,
$$F(\lambda) := \lambda + \alpha \mathbf{L}^{1/2} x^\star(\lambda), \qquad x^\star(\lambda) = \operatorname{argmin}_x \mathcal{L}(x, \lambda). \quad (18)$$
By this definition, the recursions in (17), after the change of variables $\nu = \mathbf{L}^{1/2} \lambda$, can be written as
$$\mathcal{L}(\hat{x}_{k+1}, \lambda_k) - \mathcal{L}(x_{k+1}, \lambda_k) \leq \bar{\varepsilon}_{k+1}, \quad (19a)$$
$$\lambda_{k+1} = F(\lambda_k) + e_{k+1}, \quad (19b)$$
$$e_{k+1} = \alpha \mathbf{L}^{1/2} (\hat{x}_{k+1} - x_{k+1}). \quad (19c)$$
Here, $e_{k+1}$ is the error propagated to the dual ascent update as a result of the inexact inner minimization. To analyze the convergence of (19), we first characterize the smoothness properties of $F(\lambda)$.

Lemma 1: Consider the Lagrangian in (8), where $f \in F(m_f, L_f)$. Then the dual gradient mapping, defined as in (18), satisfies $\|F(\lambda) - F(\nu)\| \leq L_F \|\lambda - \nu\|$ for all $\lambda, \nu \perp \mathrm{Null}(\mathbf{L})$, where the Lipschitz constant $L_F$ is given by
$$L_F = \max\big(|1 - \alpha \lambda_n(L)/m_f|,\ |1 - \alpha \lambda_2(L)/L_f|\big). \quad (20)$$
Proof: See Appendix VI-A.

A direct consequence of Lemma 1 is that the dual ascent mapping $F(\lambda)$ is contractive on the subspace orthogonal to $\mathrm{Null}(\mathbf{L})$ provided that $L_F < 1$. It is easy to verify that this condition is satisfied by the selection $0 < \alpha < 2 m_f / \lambda_n(L)$. Therefore, in view of (19b), the dual update is a contractive

dynamics perturbed with additive errors. In the following, we characterize the convergence rate of (19).

Proposition 1: Consider the decentralized inexact dual ascent dynamics outlined in (19), where $f \in F(m_f, L_f)$, and define $L_F$ as in (20). Then, for all $k \geq 1$, we have
$$\|\hat{x}_{k+1} - x^\star\| \leq A_k + B_k, \quad (21)$$
where
$$A_k = L_F^k \, \frac{\lambda_n(L)^{1/2}}{m_f} \|\lambda_0 - \lambda^\star\|, \qquad B_k = \alpha \lambda_n(L) \Big(\frac{2}{m_f^3}\Big)^{1/2} \sum_{j=1}^k L_F^{k-j} \sqrt{\bar{\varepsilon}_j} + \Big(\frac{2 \bar{\varepsilon}_{k+1}}{m_f}\Big)^{1/2}.$$
Proof: See Appendix VI-B.

We now describe the terms $A_k$ and $B_k$ that determine the overall convergence rate of the algorithm. The first term, $A_k$, is error-independent and vanishes exponentially at the same rate as in the exact algorithm (i.e., when $\varepsilon_k \equiv 0$); in other words, $A_k = O(L_F^k)$. The second term, $B_k$, is a weighted sum of the inner minimization errors, whose limiting behavior depends on that of $\varepsilon_k$. Explicitly, suppose $\sqrt{\bar{\varepsilon}_k} \in O(\rho_e^k)$ for some $0 < \rho_e < 1$ with $\rho_e \neq L_F$. Then we can verify that $B_k \in O(\max(\rho_e, L_F)^k)$, which further implies that $\|\hat{x}_{k+1} - x^\star\| \in O(\max(\rho_e, L_F)^k)$. In particular, if the errors in the inexact minimization step vanish faster than the contraction factor of the dual dynamics, i.e., if $\rho_e < L_F$, the convergence rate of the algorithm is unaffected by the inexact inner minimization. This suggests that the agents can start with a crude solution of their inner minimization, when they are far from the global solution, and then increase their accuracy geometrically after each round of communication. This leads to a substantial computational gain for each agent in the initial iterations of the algorithm. Finally, if $\rho_e = L_F$, it is not difficult to show that $B_k \in O(k L_F^k)$, leading to an $O(k L_F^k)$ overall convergence rate.

We close this section with a remark.

Remark 1 (Comparison with Decentralized ADMM): Recall the expression for the convergence factor of the inexact dual gradient algorithm stated in Proposition 1,
$$L_F = \max\big(|1 - \alpha \lambda_n(L)/m_f|,\ |1 - \alpha \lambda_2(L)/L_f|\big), \quad (22)$$
which is a function of the step size $\alpha$.
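The factor (22) is easy to evaluate numerically. A small sketch, with illustrative spectrum and curvature values, checking that $L_F < 1$ holds precisely on the admissible step-size range and that the optimized step size discussed next achieves the claimed factor:

```python
import numpy as np

# Evaluate the convergence factor (22); lam2, lamn, m_f, L_f are illustrative.
lam2, lamn = 1.5, 4.0        # lambda_2(L), lambda_n(L) of a connected graph
m_f, L_f = 1.0, 3.0          # strong convexity and smoothness constants of f

def conv_factor(alpha):
    return max(abs(1 - alpha * lamn / m_f), abs(1 - alpha * lam2 / L_f))

# Contraction (L_F < 1) holds exactly for 0 < alpha < 2*m_f/lambda_n(L).
alphas = np.linspace(1e-4, 2 * m_f / lamn - 1e-4, 1000)
assert all(conv_factor(a) < 1 for a in alphas)
assert conv_factor(2 * m_f / lamn + 0.01) >= 1

# The optimized step size balances the two terms and yields the factor in (23).
a_star = 2.0 / (lamn / m_f + lam2 / L_f)
kf, kG = L_f / m_f, lamn / lam2
assert np.isclose(conv_factor(a_star), (kf * kG - 1) / (kf * kG + 1))
```

At $\alpha^\star$ the two absolute values in (22) are equal, which is the usual balancing argument behind the optimal factor.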
By simple algebraic calculations, the optimal (smallest) convergence factor is attained when the step size is selected as
$$\alpha^\star = \frac{2}{\lambda_n(L)/m_f + \lambda_2(L)/L_f}.$$
For this selection, the optimal convergence factor is
$$L_F^\star = \frac{\kappa_f \kappa_G - 1}{\kappa_f \kappa_G + 1}, \quad (23)$$
where we have defined the graph condition number as $\kappa_G = \lambda_n(L)/\lambda_2(L)$, and $\kappa_f = L_f/m_f$ is the condition number of the objective function, as before.

Fig. 1: A comparison of the exponential convergence factors of decentralized dual ascent and of decentralized ADMM as established in [12]. Here $\kappa_f$ is the condition number of the global objective function and $\kappa_G$ is the condition number of the graph Laplacian; smaller convergence factors mean faster convergence.

For the purpose of comparison, we consider the decentralized ADMM implementation of the original problem (4) outlined in [12]. Under the same assumptions as in the present work, the authors establish a corresponding optimal linear convergence factor $L_{\mathrm{ADMM}}$, again a function of $\kappa_f$ and $\kappa_G$. In Fig. 1, we compare the convergence factor of decentralized dual ascent, Eq. (23), with that of decentralized ADMM over a range of values of $(\kappa_f, \kappa_G)$. We observe that the worst-case convergence factor of decentralized dual ascent is smaller, by a considerable margin, than the rate established in [12]. Intuitively, in ADMM we still need to solve exact inner minimizations, but in a sequential fashion; this sequential minimization introduces inherent inexactness into the dual gradient information even when the inner minimizations are performed exactly. In contrast, decentralized dual ascent provides the algorithm with exact dual gradient information, and even when the inner minimization is performed inexactly, Proposition 1 states that the convergence rate is unaffected as long as the inner minimization errors are decreased at an appropriate rate.

IV.
NUMERICAL SIMULATIONS

In this section we consider the problem of training a binary classifier with a dataset¹ that is scattered across a multi-agent network. For the connectivity graph, we consider a random network with $n$ nodes and connection probability $p$. Let us denote by $(x_k^i, y_k^i) \in \mathbb{R}^{d-1} \times \{-1, 1\}$ the $k$-th data point of the training set of agent $i$, where $i = 1, \ldots, n$ and $k = 1, \ldots, N_i$. Each agent has a local copy $w^i \in \mathbb{R}^d$ of the global classifier and a local function based

¹ftp://ftp.ics.uci.edu/pub/machine-learning-databases

Fig. 2: Plot of the norm of the gradient of the Lagrangian with respect to the primal variable (in blue) and the dual variable (in red) for the numerical example of Section IV.

on the training data observed,
$$f_i(w^i) = \sum_{k=1}^{N_i} \log\Big(1 + e^{-y_k^i [x_k^{i\top}\ 1]\, w^i}\Big) + \frac{\gamma}{2} \|w^i\|^2. \quad (25)$$
Here, $\gamma > 0$ is the regularization constant, which we set to $\gamma = 1$. Given these numerical values, we run Algorithm 1 assuming that the inaccuracy of the inner minimization decreases at a geometric rate with $\rho_e = 0.99$. In Figure 2, we plot the evolution of the norm of the derivative of the Lagrangian with respect to the primal and dual variables.

V. CONCLUSIONS

We studied decentralized consensus optimization of smooth and strongly convex objective functions using decentralized inexact dual ascent. Specifically, we considered a Lagrangian formulation in which the inner minimization of the Lagrangian is locally computable at each node. We assumed that the inner minimizations are performed inexactly, which turns the dual update into an inexact dual ascent. We analyzed the effect of this inaccuracy on the overall convergence rate, and showed that the convergence rate of the algorithm does not deteriorate if the inner minimization errors are controlled in an appropriate way.

VI. APPENDIX

A. Proof of Lemma 1

Since $f(x)$, and in turn $\mathcal{L}(x, \lambda)$, is strongly convex in $x$, the mapping $\lambda \mapsto x^\star(\lambda) = \operatorname{argmin}_x \mathcal{L}(x, \lambda)$ is well defined and differentiable almost everywhere. To show this, we begin with the optimality condition that defines $x^\star(\lambda)$:
$$\nabla f(x^\star(\lambda)) + \mathbf{L}^{1/2} \lambda = 0. \quad (26)$$
By strong convexity of $f$, we can write [20]
$$m_f \|x^\star(\lambda) - x^\star(\mu)\| \leq \|\nabla f(x^\star(\lambda)) - \nabla f(x^\star(\mu))\| = \|\mathbf{L}^{1/2} (\lambda - \mu)\| \leq \lambda_n(L)^{1/2} \|\lambda - \mu\|,$$
where the second step uses (26). The above inequality establishes the Lipschitz continuity of the map $x^\star(\lambda)$; hence, $x^\star(\lambda)$ is differentiable almost everywhere. Now, since (26) holds for all $\lambda$, we can differentiate both sides with respect to $\lambda$ and use the chain rule to obtain
$$\nabla^2 f(x^\star(\lambda)) \, \frac{d x^\star(\lambda)}{d \lambda} + \mathbf{L}^{1/2} = 0.$$
On the other hand, by Danskin's theorem [19], the dual function is differentiable with its gradient given by $\nabla d(\lambda) = \mathbf{L}^{1/2} x^\star(\lambda)$. By differentiating one more time, we obtain the dual Hessian
$$\nabla^2 d(\lambda) = \mathbf{L}^{1/2} \frac{d x^\star(\lambda)}{d \lambda} = -\mathbf{L}^{1/2} \nabla^2 f(x^\star(\lambda))^{-1} \mathbf{L}^{1/2}.$$
Note that the Hessian is negative definite on the orthogonal complement of $\mathrm{Null}(\mathbf{L})$. To see this, we note that $\lambda^\top \nabla^2 d(\lambda) \lambda = -(\mathbf{L}^{1/2} \lambda)^\top \nabla^2 f(x^\star(\lambda))^{-1} (\mathbf{L}^{1/2} \lambda)$. Since $m_f I_{nd} \preceq \nabla^2 f(x^\star(\lambda)) \preceq L_f I_{nd}$ by assumption, we obtain
$$-\frac{1}{m_f} \lambda^\top \mathbf{L} \lambda \leq \lambda^\top \nabla^2 d(\lambda) \lambda \leq -\frac{1}{L_f} \lambda^\top \mathbf{L} \lambda. \quad (27)$$
Furthermore, for all $\lambda \perp \mathrm{Null}(\mathbf{L})$, we can write
$$\lambda_2(L) \|\lambda\|^2 \leq \lambda^\top \mathbf{L} \lambda \leq \lambda_n(L) \|\lambda\|^2. \quad (28)$$
Incorporating (28) into (27), we obtain
$$-\frac{\lambda_n(L)}{m_f} I_{nd} \preceq \nabla^2 d(\lambda) \preceq -\frac{\lambda_2(L)}{L_f} I_{nd}, \qquad \lambda \perp \mathrm{Null}(\mathbf{L}). \quad (29)$$
On the other hand, the Jacobian of $F(\lambda) = \lambda + \alpha \nabla d(\lambda)$ is given by
$$\frac{d}{d\lambda} F(\lambda) = I_{nd} + \alpha \nabla^2 d(\lambda).$$
Using (29), we can write
$$\Big(1 - \frac{\alpha \lambda_n(L)}{m_f}\Big) I_{nd} \preceq \frac{d}{d\lambda} F(\lambda) \preceq \Big(1 - \frac{\alpha \lambda_2(L)}{L_f}\Big) I_{nd}.$$
Therefore, restricted to the orthogonal complement of $\mathrm{Null}(\mathbf{L})$, the Jacobian of $F$ satisfies the bound
$$\Big\|\frac{d}{d\lambda} F(\lambda)\Big\| \leq L_F = \max\Big(\Big|1 - \frac{\alpha \lambda_n(L)}{m_f}\Big|,\ \Big|1 - \frac{\alpha \lambda_2(L)}{L_f}\Big|\Big). \quad (30)$$
Next, using Taylor's theorem, for all $\lambda, \nu \perp \mathrm{Null}(\mathbf{L})$, we can write
$$F(\lambda) - F(\nu) = \int_0^1 \frac{d}{d\lambda} F\big(\nu + t(\lambda - \nu)\big) (\lambda - \nu) \, dt.$$

Since $\nu + t(\lambda - \nu) \perp \mathrm{Null}(\mathbf{L})$ for $0 \leq t \leq 1$, we can write
$$\|F(\lambda) - F(\nu)\| = \Big\| \int_0^1 \frac{d}{d\lambda} F\big(\nu + t(\lambda - \nu)\big) (\lambda - \nu) \, dt \Big\| \leq \int_0^1 \Big\| \frac{d}{d\lambda} F\big(\nu + t(\lambda - \nu)\big) \Big\| \, \|\lambda - \nu\| \, dt \leq L_F \|\lambda - \nu\|.$$
In other words, $F$ is Lipschitz continuous on the orthogonal complement of $\mathrm{Null}(\mathbf{L})$ with parameter $L_F$. The proof is now complete.

B. Proof of Proposition 1

Consider the map $x \mapsto \mathcal{L}(x, \lambda) = f(x) + \lambda^\top \mathbf{L}^{1/2} x$, which is $m_f$-strongly convex. We can therefore write [20]
$$\frac{m_f}{2} \|\hat{x}_{k+1} - x_{k+1}\|^2 \leq \mathcal{L}(\hat{x}_{k+1}, \lambda_k) - \mathcal{L}(x_{k+1}, \lambda_k).$$
The right-hand side is bounded by $\bar{\varepsilon}_{k+1}$, according to (19a). Therefore, we can write
$$\|\hat{x}_{k+1} - x_{k+1}\| \leq \Big(\frac{2 \bar{\varepsilon}_{k+1}}{m_f}\Big)^{1/2}, \quad (31)$$
which further implies
$$\|e_{k+1}\| = \alpha \|\mathbf{L}^{1/2} (\hat{x}_{k+1} - x_{k+1})\| \leq C \sqrt{\bar{\varepsilon}_{k+1}}, \quad (32)$$
where $C = (2 \lambda_n(L) \alpha^2 / m_f)^{1/2}$. Similarly, we use the strong convexity of $f$ to write $m_f \|x_{k+1} - x^\star\| \leq \|\nabla f(x_{k+1}) - \nabla f(x^\star)\|$. Recalling that $\nabla f(x_{k+1}) + \mathbf{L}^{1/2} \lambda_k = 0$ and $\nabla f(x^\star) + \mathbf{L}^{1/2} \lambda^\star = 0$ (see (11)), we can write
$$\|x_{k+1} - x^\star\| \leq \frac{1}{m_f} \|\mathbf{L}^{1/2} \lambda_k - \mathbf{L}^{1/2} \lambda^\star\| \leq \frac{\lambda_n(L)^{1/2}}{m_f} \|\lambda_k - \lambda^\star\|. \quad (33)$$
On the other hand, observe that we can write $\lambda_k = \lambda_0 + \sum_{j=1}^k \alpha \mathbf{L}^{1/2} \hat{x}_j$ for $k \geq 1$. Since $\lambda_0 \perp \mathrm{Null}(\mathbf{L})$ by assumption (initialization) and $\sum_{j=1}^k \alpha \mathbf{L}^{1/2} \hat{x}_j \perp \mathrm{Null}(\mathbf{L})$, we have that $\lambda_k \perp \mathrm{Null}(\mathbf{L})$ for all $k$; i.e., $\lambda_k$ lies in the subspace on which the dual gradient mapping $F(\lambda)$ is contractive. Invoking Lemma 1, we can write
$$\|\lambda_k - \lambda^\star\| = \|F(\lambda_{k-1}) - F(\lambda^\star) + e_k\| \leq \|F(\lambda_{k-1}) - F(\lambda^\star)\| + \|e_k\| \leq L_F \|\lambda_{k-1} - \lambda^\star\| + \|e_k\|. \quad (34)$$
By iterating down to $k = 1$ and using the bound in (32), we obtain
$$\|\lambda_k - \lambda^\star\| \leq L_F^k \|\lambda_0 - \lambda^\star\| + C \sum_{j=1}^k L_F^{k-j} \sqrt{\bar{\varepsilon}_j}. \quad (35)$$
Finally, we use the triangle inequality to write
$$\|\hat{x}_{k+1} - x^\star\| \leq \|\hat{x}_{k+1} - x_{k+1}\| + \|x_{k+1} - x^\star\|. \quad (36)$$
The first and second terms on the right-hand side can be bounded by (31) and (33), respectively. Substituting these bounds into (36) and further using the bound in (35) yields the desired inequality. The proof is now complete.

REFERENCES

[1] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
[2] R. Zhang and J. Kwok, "Asynchronous distributed ADMM for consensus optimization," in International Conference on Machine Learning, 2014.
[3] J. C. Duchi, A. Agarwal, and M. J. Wainwright, "Dual averaging for distributed optimization: Convergence analysis and network scaling," IEEE Transactions on Automatic Control, vol. 57, no. 3, 2012.
[4] D. V. Dimarogonas, E. Frazzoli, and K. H. Johansson, "Distributed event-triggered control for multi-agent systems," IEEE Transactions on Automatic Control, vol. 57, no. 5, 2012.
[5] K. Yuan, Q. Ling, and W. Yin, "On the convergence of decentralized gradient descent," SIAM Journal on Optimization, vol. 26, no. 3, 2016.
[6] W. Shi, Q. Ling, G. Wu, and W. Yin, "EXTRA: An exact first-order algorithm for decentralized consensus optimization," SIAM Journal on Optimization, vol. 25, no. 2, 2015.
[7] A. Mokhtari, Q. Ling, and A. Ribeiro, "Network Newton distributed optimization methods," IEEE Transactions on Signal Processing, vol. 65, no. 1, 2017.
[8] A. Nedic and A. Ozdaglar, "Distributed subgradient methods for multi-agent optimization," IEEE Transactions on Automatic Control, vol. 54, no. 1, 2009.
[9] D. Feijer and F. Paganini, "Stability of primal–dual gradient dynamics and applications to network optimization," Automatica, vol. 46, 2010.
[10] S. S. Kia, J. Cortés, and S. Martínez, "Distributed convex optimization via continuous-time coordination algorithms with discrete-time communication," Automatica, vol. 55, 2015.
[11] J. Wang and N. Elia, "Control approach to distributed optimization," in 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), IEEE, 2010.
[12] W. Shi, Q. Ling, K. Yuan, G. Wu, and W. Yin, "On the linear convergence of the ADMM in decentralized consensus optimization," IEEE Transactions on Signal Processing, vol. 62, no. 7, 2014.
[13] M. Zargham, A. Ribeiro, A. Ozdaglar, and A. Jadbabaie, "Accelerated dual descent for network flow optimization," IEEE Transactions on Automatic Control, vol. 59, no. 4, 2014.
[14] R. Tutunov, H. B. Ammar, and A. Jadbabaie, "A distributed Newton method for large scale consensus optimization," arXiv preprint, 2016.
[15] O. Devolder, F. Glineur, and Y. Nesterov, "First-order methods of smooth convex optimization with inexact oracle," Mathematical Programming, vol. 146, no. 1–2, 2014.
[16] M. Schmidt, N. Le Roux, and F. R. Bach, "Convergence rates of inexact proximal-gradient methods for convex optimization," in Advances in Neural Information Processing Systems, 2011.
[17] A. d'Aspremont, "Smooth optimization with approximate gradient," SIAM Journal on Optimization, vol. 19, no. 3, 2008.
[18] V. Nedelcu, I. Necoara, and Q. Tran-Dinh, "Computational complexity of inexact gradient augmented Lagrangian methods: Application to constrained MPC," SIAM Journal on Control and Optimization, vol. 52, no. 5, 2014.
[19] D. P. Bertsekas, Nonlinear Programming. Athena Scientific, Belmont, MA.
[20] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.


More information

Support Vector Machines and Kernel Methods

Support Vector Machines and Kernel Methods 2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University

More information

Lecture 6: Conic Optimization September 8

Lecture 6: Conic Optimization September 8 IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions

More information

Distributed Convex Optimization

Distributed Convex Optimization Master Program 2013-2015 Electrical Engineering Distributed Convex Optimization A Study on the Primal-Dual Method of Multipliers Delft University of Technology He Ming Zhang, Guoqiang Zhang, Richard Heusdens

More information

Fast Nonnegative Matrix Factorization with Rank-one ADMM

Fast Nonnegative Matrix Factorization with Rank-one ADMM Fast Nonnegative Matrix Factorization with Rank-one Dongjin Song, David A. Meyer, Martin Renqiang Min, Department of ECE, UCSD, La Jolla, CA, 9093-0409 dosong@ucsd.edu Department of Mathematics, UCSD,

More information

Dual Methods. Lecturer: Ryan Tibshirani Convex Optimization /36-725

Dual Methods. Lecturer: Ryan Tibshirani Convex Optimization /36-725 Dual Methods Lecturer: Ryan Tibshirani Conve Optimization 10-725/36-725 1 Last time: proimal Newton method Consider the problem min g() + h() where g, h are conve, g is twice differentiable, and h is simple.

More information

Distributed Optimization via Alternating Direction Method of Multipliers

Distributed Optimization via Alternating Direction Method of Multipliers Distributed Optimization via Alternating Direction Method of Multipliers Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato Stanford University ITMANET, Stanford, January 2011 Outline precursors dual decomposition

More information

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST) Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization

Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization Shai Shalev-Shwartz and Tong Zhang School of CS and Engineering, The Hebrew University of Jerusalem Optimization for Machine

More information

Dual Ascent. Ryan Tibshirani Convex Optimization

Dual Ascent. Ryan Tibshirani Convex Optimization Dual Ascent Ryan Tibshirani Conve Optimization 10-725 Last time: coordinate descent Consider the problem min f() where f() = g() + n i=1 h i( i ), with g conve and differentiable and each h i conve. Coordinate

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

Coordinate Descent and Ascent Methods

Coordinate Descent and Ascent Methods Coordinate Descent and Ascent Methods Julie Nutini Machine Learning Reading Group November 3 rd, 2015 1 / 22 Projected-Gradient Methods Motivation Rewrite non-smooth problem as smooth constrained problem:

More information

A Distributed Newton Method for Network Utility Maximization, II: Convergence

A Distributed Newton Method for Network Utility Maximization, II: Convergence A Distributed Newton Method for Network Utility Maximization, II: Convergence Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract The existing distributed algorithms for Network Utility

More information

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research Introduction to Machine Learning Lecture 7 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Convex Optimization Differentiation Definition: let f : X R N R be a differentiable function,

More information

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

More information

BLOCK ALTERNATING OPTIMIZATION FOR NON-CONVEX MIN-MAX PROBLEMS: ALGORITHMS AND APPLICATIONS IN SIGNAL PROCESSING AND COMMUNICATIONS

BLOCK ALTERNATING OPTIMIZATION FOR NON-CONVEX MIN-MAX PROBLEMS: ALGORITHMS AND APPLICATIONS IN SIGNAL PROCESSING AND COMMUNICATIONS BLOCK ALTERNATING OPTIMIZATION FOR NON-CONVEX MIN-MAX PROBLEMS: ALGORITHMS AND APPLICATIONS IN SIGNAL PROCESSING AND COMMUNICATIONS Songtao Lu, Ioannis Tsaknakis, and Mingyi Hong Department of Electrical

More information

arxiv: v2 [math.oc] 7 Apr 2017

arxiv: v2 [math.oc] 7 Apr 2017 Optimal algorithms for smooth and strongly convex distributed optimization in networks arxiv:702.08704v2 [math.oc] 7 Apr 207 Kevin Scaman Francis Bach 2 Sébastien Bubeck 3 Yin Tat Lee 3 Laurent Massoulié

More information

Constrained Optimization

Constrained Optimization 1 / 22 Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 30, 2015 2 / 22 1. Equality constraints only 1.1 Reduced gradient 1.2 Lagrange

More information

5. Duality. Lagrangian

5. Duality. Lagrangian 5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

More information

A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations

A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations Chuangchuang Sun and Ran Dai Abstract This paper proposes a customized Alternating Direction Method of Multipliers

More information

Distributed online optimization over jointly connected digraphs

Distributed online optimization over jointly connected digraphs Distributed online optimization over jointly connected digraphs David Mateos-Núñez Jorge Cortés University of California, San Diego {dmateosn,cortes}@ucsd.edu Mathematical Theory of Networks and Systems

More information

Lecture Notes on Support Vector Machine

Lecture Notes on Support Vector Machine Lecture Notes on Support Vector Machine Feng Li fli@sdu.edu.cn Shandong University, China 1 Hyperplane and Margin In a n-dimensional space, a hyper plane is defined by ω T x + b = 0 (1) where ω R n is

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Preconditioning via Diagonal Scaling

Preconditioning via Diagonal Scaling Preconditioning via Diagonal Scaling Reza Takapoui Hamid Javadi June 4, 2014 1 Introduction Interior point methods solve small to medium sized problems to high accuracy in a reasonable amount of time.

More information

Interior-Point Methods for Linear Optimization

Interior-Point Methods for Linear Optimization Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function

More information

Dual and primal-dual methods

Dual and primal-dual methods ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method

More information

Gradient Descent. Dr. Xiaowei Huang

Gradient Descent. Dr. Xiaowei Huang Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,

More information

Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method

Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Robert M. Freund March, 2004 2004 Massachusetts Institute of Technology. The Problem The logarithmic barrier approach

More information

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume

More information

A Distributed Newton Method for Network Optimization

A Distributed Newton Method for Network Optimization A Distributed Newton Method for Networ Optimization Ali Jadbabaie, Asuman Ozdaglar, and Michael Zargham Abstract Most existing wor uses dual decomposition and subgradient methods to solve networ optimization

More information

Numerical optimization

Numerical optimization Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal

More information

ALADIN An Algorithm for Distributed Non-Convex Optimization and Control

ALADIN An Algorithm for Distributed Non-Convex Optimization and Control ALADIN An Algorithm for Distributed Non-Convex Optimization and Control Boris Houska, Yuning Jiang, Janick Frasch, Rien Quirynen, Dimitris Kouzoupis, Moritz Diehl ShanghaiTech University, University of

More information

Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints

Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints By I. Necoara, Y. Nesterov, and F. Glineur Lijun Xu Optimization Group Meeting November 27, 2012 Outline

More information

Generalization to inequality constrained problem. Maximize

Generalization to inequality constrained problem. Maximize Lecture 11. 26 September 2006 Review of Lecture #10: Second order optimality conditions necessary condition, sufficient condition. If the necessary condition is violated the point cannot be a local minimum

More information

Sparse Covariance Selection using Semidefinite Programming

Sparse Covariance Selection using Semidefinite Programming Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information

Asynchronous Distributed Optimization. via Randomized Dual Proximal Gradient

Asynchronous Distributed Optimization. via Randomized Dual Proximal Gradient Asynchronous Distributed Optimization 1 via Randomized Dual Proximal Gradient Ivano Notarnicola and Giuseppe Notarstefano arxiv:1509.08373v2 [cs.sy] 24 Jun 2016 Abstract In this paper we consider distributed

More information

WE consider the problem of estimating a time varying

WE consider the problem of estimating a time varying 450 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL 61, NO 2, JANUARY 15, 2013 D-MAP: Distributed Maximum a Posteriori Probability Estimation of Dynamic Systems Felicia Y Jakubiec Alejro Ribeiro Abstract This

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Support vector machines (SVMs) are one of the central concepts in all of machine learning. They are simply a combination of two ideas: linear classification via maximum (or optimal

More information

A Distributed Newton Method for Network Utility Maximization

A Distributed Newton Method for Network Utility Maximization A Distributed Newton Method for Networ Utility Maximization Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie Abstract Most existing wor uses dual decomposition and subgradient methods to solve Networ Utility

More information

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems 1 Numerical optimization Alexander & Michael Bronstein, 2006-2009 Michael Bronstein, 2010 tosca.cs.technion.ac.il/book Numerical optimization 048921 Advanced topics in vision Processing and Analysis of

More information

Conditional Gradient (Frank-Wolfe) Method

Conditional Gradient (Frank-Wolfe) Method Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions

More information

Constrained optimization: direct methods (cont.)

Constrained optimization: direct methods (cont.) Constrained optimization: direct methods (cont.) Jussi Hakanen Post-doctoral researcher jussi.hakanen@jyu.fi Direct methods Also known as methods of feasible directions Idea in a point x h, generate a

More information

Convex Optimization Boyd & Vandenberghe. 5. Duality

Convex Optimization Boyd & Vandenberghe. 5. Duality 5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Distributed Coordination for Separable Convex Optimization with Coupling Constraints

Distributed Coordination for Separable Convex Optimization with Coupling Constraints Distributed Coordination for Separable Convex Optimization with Coupling Constraints Simon K. Niederländer Jorge Cortés Abstract This paper considers a network of agents described by an undirected graph

More information

Convex Optimization M2

Convex Optimization M2 Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization

More information

Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725

Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725 Karush-Kuhn-Tucker Conditions Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given a minimization problem Last time: duality min x subject to f(x) h i (x) 0, i = 1,... m l j (x) = 0, j =

More information

Homework 4. Convex Optimization /36-725

Homework 4. Convex Optimization /36-725 Homework 4 Convex Optimization 10-725/36-725 Due Friday November 4 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

Distributed Consensus-Based Optimization

Distributed Consensus-Based Optimization Advanced Topics in Control 208: Distributed Systems & Control Florian Dörfler We have already seen in previous lectures that optimization problems defined over a network of agents with individual cost

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of

More information

Lecture 9: Large Margin Classifiers. Linear Support Vector Machines

Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Perceptrons Definition Perceptron learning rule Convergence Margin & max margin classifiers (Linear) support vector machines Formulation

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized

More information

Primal-dual Subgradient Method for Convex Problems with Functional Constraints

Primal-dual Subgradient Method for Convex Problems with Functional Constraints Primal-dual Subgradient Method for Convex Problems with Functional Constraints Yurii Nesterov, CORE/INMA (UCL) Workshop on embedded optimization EMBOPT2014 September 9, 2014 (Lucca) Yu. Nesterov Primal-dual

More information

Unconstrained minimization of smooth functions

Unconstrained minimization of smooth functions Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and

More information

A projection algorithm for strictly monotone linear complementarity problems.

A projection algorithm for strictly monotone linear complementarity problems. A projection algorithm for strictly monotone linear complementarity problems. Erik Zawadzki Department of Computer Science epz@cs.cmu.edu Geoffrey J. Gordon Machine Learning Department ggordon@cs.cmu.edu

More information

Average-Consensus of Multi-Agent Systems with Direct Topology Based on Event-Triggered Control

Average-Consensus of Multi-Agent Systems with Direct Topology Based on Event-Triggered Control Outline Background Preliminaries Consensus Numerical simulations Conclusions Average-Consensus of Multi-Agent Systems with Direct Topology Based on Event-Triggered Control Email: lzhx@nankai.edu.cn, chenzq@nankai.edu.cn

More information

Convex Optimization of Graph Laplacian Eigenvalues

Convex Optimization of Graph Laplacian Eigenvalues Convex Optimization of Graph Laplacian Eigenvalues Stephen Boyd Abstract. We consider the problem of choosing the edge weights of an undirected graph so as to maximize or minimize some function of the

More information

Statistical Machine Learning from Data

Statistical Machine Learning from Data Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Support Vector Machines Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique

More information

Distributed Computation of Quantiles via ADMM

Distributed Computation of Quantiles via ADMM 1 Distributed Computation of Quantiles via ADMM Franck Iutzeler Abstract In this paper, we derive distributed synchronous and asynchronous algorithms for computing uantiles of the agents local values.

More information

3.10 Lagrangian relaxation

3.10 Lagrangian relaxation 3.10 Lagrangian relaxation Consider a generic ILP problem min {c t x : Ax b, Dx d, x Z n } with integer coefficients. Suppose Dx d are the complicating constraints. Often the linear relaxation and the

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

Zeno-free, distributed event-triggered communication and control for multi-agent average consensus

Zeno-free, distributed event-triggered communication and control for multi-agent average consensus Zeno-free, distributed event-triggered communication and control for multi-agent average consensus Cameron Nowzari Jorge Cortés Abstract This paper studies a distributed event-triggered communication and

More information

Alternating Direction Method of Multipliers. Ryan Tibshirani Convex Optimization

Alternating Direction Method of Multipliers. Ryan Tibshirani Convex Optimization Alternating Direction Method of Multipliers Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last time: dual ascent min x f(x) subject to Ax = b where f is strictly convex and closed. Denote

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course otes for EE7C (Spring 018): Conve Optimization and Approimation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Ma Simchowitz Email: msimchow+ee7c@berkeley.edu October

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

More information

Homework 3. Convex Optimization /36-725

Homework 3. Convex Optimization /36-725 Homework 3 Convex Optimization 10-725/36-725 Due Friday October 14 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

Convex Optimization and Modeling

Convex Optimization and Modeling Convex Optimization and Modeling Duality Theory and Optimality Conditions 5th lecture, 12.05.2010 Jun.-Prof. Matthias Hein Program of today/next lecture Lagrangian and duality: the Lagrangian the dual

More information

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training

More information

On the Linear Convergence of Distributed Optimization over Directed Graphs

On the Linear Convergence of Distributed Optimization over Directed Graphs 1 On the Linear Convergence of Distributed Optimization over Directed Graphs Chenguang Xi, and Usman A. Khan arxiv:1510.0149v1 [math.oc] 7 Oct 015 Abstract This paper develops a fast distributed algorithm,

More information

On the linear convergence of distributed optimization over directed graphs

On the linear convergence of distributed optimization over directed graphs 1 On the linear convergence of distributed optimization over directed graphs Chenguang Xi, and Usman A. Khan arxiv:1510.0149v4 [math.oc] 7 May 016 Abstract This paper develops a fast distributed algorithm,

More information

Fantope Regularization in Metric Learning

Fantope Regularization in Metric Learning Fantope Regularization in Metric Learning CVPR 2014 Marc T. Law (LIP6, UPMC), Nicolas Thome (LIP6 - UPMC Sorbonne Universités), Matthieu Cord (LIP6 - UPMC Sorbonne Universités), Paris, France Introduction

More information

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/32 Margin Classifiers margin b = 0 Sridhar Mahadevan: CMPSCI 689 p.

More information

A State-Space Approach to Control of Interconnected Systems

A State-Space Approach to Control of Interconnected Systems A State-Space Approach to Control of Interconnected Systems Part II: General Interconnections Cédric Langbort Center for the Mathematics of Information CALIFORNIA INSTITUTE OF TECHNOLOGY clangbort@ist.caltech.edu

More information

Dual Decomposition.

Dual Decomposition. 1/34 Dual Decomposition http://bicmr.pku.edu.cn/~wenzw/opt-2017-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/34 1 Conjugate function 2 introduction:

More information