Distributed Smooth and Strongly Convex Optimization with Inexact Dual Methods
Mahyar Fazlyab, Santiago Paternain, Alejandro Ribeiro and Victor M. Preciado

Abstract— In this paper, we consider a class of decentralized convex optimization problems in which a network of agents aims to minimize a global objective function that is a sum of private (local) smooth and strongly convex objectives. More specifically, we study a decentralized inexact dual ascent method, in which the agents only approximately solve their private minimization problems and then update their dual variables via inexact dual ascent. We study the effect of inexact inner minimization on the convergence rate. In particular, we show that the overall convergence rate is not affected by the inexact minimization, provided the errors are decreased at an appropriate rate. We illustrate our findings on a distributed binary classification problem.

I. INTRODUCTION

In decentralized consensus optimization, a network of agents aims to minimize a global objective function that is a sum of private (local) objective functions, one available to each node. The goal of the agents is to collaboratively agree on a minimizer of the global objective without revealing their private objective functions over the course of the minimization. Applications include distributed multi-agent coordination, estimation in sensor networks, resource allocation, decentralized decision making, and large-scale optimization problems in machine learning [1]–[4]. Broadly speaking, decentralized algorithms for convex optimization fall into three categories: i) primal methods, in which the problem is solved entirely in the primal domain and (partial) consensus among the agents is enforced by using disagreement functions as penalties [5]–[7];
ii) primal-dual methods, in which the consensus constraint is relaxed by introducing Lagrange multipliers, and the agents iterate simultaneously over the primal and dual variables to seek a saddle point of the Lagrangian [8]–[11]; and iii) dual methods, in which the agents solve the dual problem in a decentralized fashion [12]–[14]. The advantage of each of these methods over the others is determined by the assumptions the objective functions satisfy (smoothness, strong convexity, etc.), as well as by how the consensus constraint is encoded in the problem.

(Work supported by the NSF under grants CAREER-ECCS and IIS. The authors are with the Department of Electrical and Systems Engineering, University of Pennsylvania. {mahyarfa, paternain, aribeiro, preciado}@seas.upenn.edu.)

In decentralized dual methods, agents need to repeatedly compute the dual function, which amounts to solving an inner minimization problem exactly at each iteration of the dual problem. From a practical point of view, the agents may not be able to solve the inner minimization problem exactly, leading to computational errors in the dual problem. On the other hand, it is known that first-order methods with an inexact oracle necessarily suffer a degraded convergence rate [15]. In this context, there has been increasing interest in analyzing optimization algorithms with inexact oracles, such as inexact proximal gradient methods [16], smooth optimization with approximate gradients [17], and inexact augmented Lagrangian methods [18]. In this paper, we focus on smooth and strongly convex objective functions and study distributed inexact dual ascent based on Laplacian consensus.
In each global iteration of this algorithm, the agents perform the inner minimization step up to a certain accuracy and then communicate via a Laplacian consensus protocol to update their private dual variables by dual ascent. The resulting dual update is a gradient-ascent-like method with additive errors that depend on the accuracy of the minimization step. We quantify the effect of inexact minimization on the overall convergence of the algorithm. In particular, we show that the convergence rate of the inexact algorithm matches that of its exact counterpart, which is exponential, provided the accuracy of the local minimization step is increased at an appropriate rate. Numerical examples in which a group of agents collectively train a binary classifier support the theoretical results.

II. DECENTRALIZED OPTIMIZATION

A. Preliminaries

We denote the set of real numbers by $\mathbb{R}$, the set of real $d$-dimensional vectors by $\mathbb{R}^d$, the set of $m \times d$ real matrices by $\mathbb{R}^{m \times d}$, the $d$-dimensional identity matrix by $I_d$, and the vector of $d$ ones by $\mathbf{1}_d$. A weighted undirected graph is defined as $\mathcal{G} = (\mathcal{V}, \mathcal{E}, A)$, where $\mathcal{V}$ is a set of $n$ nodes and $\mathcal{E}$ is a set of $m$ undirected edges. We assume that the graph is connected and has no self-loops, and that a positive weight $w_{ij} > 0$ is associated with each edge $\{i, j\} \in \mathcal{E}$. The weighted adjacency matrix, denoted by $A = [a_{ij}]$, is the $n \times n$ symmetric matrix defined entry-wise as $a_{ij} = w_{ij}$ if $\{i, j\} \in \mathcal{E}$, and $a_{ij} = 0$ otherwise. We denote by $N_i = \{j \in \mathcal{V} : a_{ij} \neq 0\}$ the set of nodes connected to node $i$. The weighted Laplacian matrix of $\mathcal{G}$ is defined as $L = \mathrm{diag}(A \mathbf{1}_n) - A$, and its eigenvalue decomposition is $L = U\,\mathrm{diag}\{0, \lambda_2, \dots, \lambda_n\}\,U^\top$, where $\lambda_2 \leq \cdots \leq \lambda_n$ are the nontrivial eigenvalues of $L$. By defining $L^{1/2} = U\,\mathrm{diag}\{0, \lambda_2^{1/2}, \dots, \lambda_n^{1/2}\}\,U^\top$, we have $L^{1/2} L^{1/2} = L$. To handle $d$-dimensional variables, we define the lifted Laplacian $\mathbf{L} = L \otimes I_d$. Notice that, by the properties of the Kronecker product,
we can write

$$\mathrm{Null}(\mathbf{L}) = \{\mathbf{x} \in \mathbb{R}^{nd} : \mathbf{x} = \mathbf{1}_n \otimes x, \ x \in \mathbb{R}^d\}. \quad (1)$$

A differentiable function $f : \mathbb{R}^d \to \mathbb{R}$ is $\mu$-strongly convex over a convex set $C \subseteq \mathbb{R}^d$ if and only if it satisfies

$$f(x) + \nabla f(x)^\top (y - x) + \tfrac{\mu}{2}\|y - x\|^2 \leq f(y) \quad (2)$$

for all $x, y \in C$. A differentiable $f$ whose gradient is Lipschitz continuous on $C$ with parameter $L < \infty$ satisfies

$$f(y) \leq f(x) + \nabla f(x)^\top (y - x) + \tfrac{L}{2}\|y - x\|^2 \quad (3)$$

for all $x, y \in C$. We denote by $\mathcal{F}(\mu, L)$ the class of functions satisfying both (2) and (3). Note that in this class it must hold that $\mu \leq L$. We define the condition number of $f \in \mathcal{F}(\mu, L)$ as $\kappa_f = L/\mu$.

B. Decentralized Optimization with Laplacian Consensus

Consider the following convex optimization problem:

$$x^\star = \operatorname*{argmin}_{x \in \mathbb{R}^d} \sum_{i=1}^n f_i(x), \quad (4)$$

where the collective objective function is the sum of $n$ convex functions $f_i : \mathbb{R}^d \to \mathbb{R}$. In a distributed setting, each objective function $f_i(x)$ is privately available to node $i$ of a network of $n$ nodes denoted by $\mathcal{V} = \{1, 2, \dots, n\}$. The topology of the network is determined by a connected, weighted, and undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set of edges. The nodes are required to collaboratively minimize the global cost function without revealing their private cost functions, and the communication pattern among the nodes is determined by the graph $\mathcal{G}$. Further, we assume that the local objective functions are twice continuously differentiable and strongly convex with Lipschitz gradients. In other words,

$$m_f^i I_d \preceq \nabla^2 f_i(x) \preceq L_f^i I_d \quad (5)$$

for some $0 < m_f^i \leq L_f^i < \infty$ and all $i \in \mathcal{V}$. By defining $x^i \in \mathbb{R}^d$ as the decision variable of node $i$ (a local copy of $x \in \mathbb{R}^d$ at node $i$) and enforcing the equality constraints $x^1 = \cdots = x^n$, an equivalent formulation of (4) is

$$\mathbf{1}_n \otimes x^\star = \operatorname*{argmin}_{\{x^i\}_{i=1}^n} \sum_{i=1}^n f_i(x^i) \quad \text{s.t.} \quad x^1 = \cdots = x^n. \quad (6)$$

There are various ways to enforce the consensus constraints in (6). In this paper, we use the Laplacian consensus formulation. More explicitly, consider the Laplacian $L$ of the communication graph. Since the graph is connected and undirected, we have $L\mathbf{1}_n = 0$ and $\mathbf{1}_n^\top L = 0$.
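As a quick sanity check of these definitions (not part of the original paper), the sketch below builds the weighted Laplacian of a small illustrative graph and verifies the null-space property (1); the graph, weights, and dimensions are arbitrary choices for illustration.

```python
import numpy as np

# Toy weighted graph on n = 4 nodes (a cycle); weights are illustrative.
n = 4
A = np.zeros((n, n))
for i, j, w in [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 0, 0.5)]:
    A[i, j] = A[j, i] = w

# Weighted Laplacian L = diag(A 1_n) - A.
L = np.diag(A @ np.ones(n)) - A

# L is symmetric PSD with L 1_n = 0; for a connected graph the
# constant vectors span its null space, so lambda_1 = 0 < lambda_2.
assert np.allclose(L @ np.ones(n), 0)
eigvals = np.linalg.eigvalsh(L)          # ascending order
assert eigvals[0] < 1e-10 and eigvals[1] > 0

# Lifted Laplacian for d-dimensional variables: Lbig = L (x) I_d.
d = 2
Lbig = np.kron(L, np.eye(d))
# Null(Lbig) consists of consensus vectors x = 1_n (x) xbar, as in (1).
xbar = np.array([3.0, -1.0])
assert np.allclose(Lbig @ np.kron(np.ones(n), xbar), 0)
```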
By defining $\mathbf{x} = [x^{1\top}, \dots, x^{n\top}]^\top \in \mathbb{R}^{nd}$ as the concatenation of the local decision variables and recalling $\mathbf{L} = L \otimes I_d$, we can equivalently formulate (6) as

$$\min_{\mathbf{x} \in \mathbb{R}^{nd}} \ f(\mathbf{x}) = \sum_{i=1}^n f_i(x^i) \quad \text{s.t.} \quad \mathbf{L}\mathbf{x} = 0. \quad (7)$$

The Lagrangian of (7) is given by

$$\mathcal{L}(\mathbf{x}, \lambda) = f(\mathbf{x}) + \lambda^\top \mathbf{L}\mathbf{x}, \quad (8)$$

where $\lambda = [\lambda^{1\top}, \dots, \lambda^{n\top}]^\top \in \mathbb{R}^{nd}$ is the stacked vector of local multipliers. Notice that the Lagrangian is strongly convex in $\mathbf{x}$ and affine in $\lambda$. Further, the Lagrangian is unchanged by shifts $\lambda \to \lambda + \tilde{\lambda}$ with $\tilde{\lambda} \in \mathrm{Null}(\mathbf{L})$. We can therefore assume $\lambda \perp \mathrm{Null}(\mathbf{L})$ (or, equivalently, $\sum_i \lambda^i = 0$) without loss of generality. The dual function for the Lagrangian in (8) is defined by

$$d(\lambda) = \min_{\mathbf{x} \in \mathbb{R}^{nd}} \mathcal{L}(\mathbf{x}, \lambda). \quad (9)$$

The dual problem is then to maximize the dual function (9) with respect to $\lambda$,

$$d^\star = \max_{\lambda \perp \mathrm{Null}(\mathbf{L})} d(\lambda) = \max_{\lambda \perp \mathrm{Null}(\mathbf{L})} \ \min_{\mathbf{x} \in \mathbb{R}^{nd}} \mathcal{L}(\mathbf{x}, \lambda). \quad (10)$$

Since the primal problem (7) is feasible, Slater's condition holds, the duality gap is zero, and hence the problem can be solved through its dual formulation. The Karush-Kuhn-Tucker (KKT) optimality conditions characterizing an optimal pair $(\mathbf{x}^\star, \lambda^\star) \in \mathbb{R}^{nd} \times \mathbb{R}^{nd}$ are

$$\nabla_{\mathbf{x}} \mathcal{L}(\mathbf{x}^\star, \lambda^\star) = \nabla f(\mathbf{x}^\star) + \mathbf{L}\lambda^\star = 0, \qquad \nabla_{\lambda} \mathcal{L}(\mathbf{x}^\star, \lambda^\star) = \mathbf{L}\mathbf{x}^\star = 0, \quad (11)$$

where $\nabla f(\mathbf{x}) = (\nabla f_1(x^1)^\top, \dots, \nabla f_n(x^n)^\top)^\top$. The second identity ensures consensus, $\mathbf{x}^\star = \mathbf{1}_n \otimes x^\star$, according to (1). Multiplying both sides of the first condition from the left by $\mathbf{1}_n^\top \otimes I_d$ and noticing that

$$(\mathbf{1}_n^\top \otimes I_d)\,\mathbf{L} = (\mathbf{1}_n^\top \otimes I_d)(L \otimes I_d) = (\mathbf{1}_n^\top L) \otimes (I_d I_d) = 0,$$

we obtain $\sum_{i=1}^n \nabla f_i(x^\star) = 0$, which is the optimality condition for the centralized problem (4), as required.

C. Decentralized Dual Ascent

Dual ascent methods solve the dual problem (10) using gradient ascent. Since the dual function is differentiable, its gradient is given by Danskin's theorem [19] as

$$\nabla d(\lambda) = \mathbf{L}\mathbf{x}(\lambda), \quad \text{where } \mathbf{x}(\lambda) = \operatorname*{argmin}_{\mathbf{x} \in \mathbb{R}^{nd}} \mathcal{L}(\mathbf{x}, \lambda). \quad (12)$$

Therefore, dual ascent performs the following recursions to solve (10):

$$\mathbf{x}_{k+1} = \operatorname*{argmin}_{\mathbf{x} \in \mathbb{R}^{nd}} \mathcal{L}(\mathbf{x}, \lambda_k), \quad (13a)$$
$$\lambda_{k+1} = \lambda_k + \alpha \mathbf{L}\mathbf{x}_{k+1}, \quad (13b)$$

with initialization $\lambda_0 \perp \mathrm{Null}(\mathbf{L})$ (or, equivalently, $\sum_i \lambda_0^i = 0$), and $\alpha > 0$ is a constant step size common to all the agents.
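The KKT conditions (11) can be made concrete with a small numerical sketch (illustrative, not from the paper): for quadratic local objectives $f_i(x) = (a_i/2)\|x - c_i\|^2$, the centralized minimizer is a weighted average of the $c_i$, the stacked gradient at consensus sums to zero, and therefore it lies in the range of the lifted Laplacian, so a multiplier $\lambda^\star$ satisfying (11) exists.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 2
# Quadratic local objectives f_i(x) = (a_i/2) ||x - c_i||^2 (illustrative).
a = rng.uniform(1.0, 3.0, size=n)
c = rng.normal(size=(n, d))

# Centralized minimizer of sum_i f_i: the weighted average of the c_i.
x_star = (a[:, None] * c).sum(axis=0) / a.sum()
assert np.allclose(sum(a[i] * (x_star - c[i]) for i in range(n)), 0)

# Path-graph Laplacian (nodes 0-1-2-3, unit weights), lifted to d dims.
L = np.diag([1., 2., 2., 1.]) - (np.eye(n, k=1) + np.eye(n, k=-1))
Lbig = np.kron(L, np.eye(d))

# KKT: find lambda* with grad f(x*) + Lbig lambda* = 0.  A solution exists
# because the stacked gradient sums to zero, i.e. it lies in range(Lbig).
g = np.concatenate([a[i] * (x_star - c[i]) for i in range(n)])
lam_star = -np.linalg.pinv(Lbig) @ g
assert np.allclose(Lbig @ lam_star, -g)
```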
The primal update is an exact minimization of the Lagrangian, leading to $\nabla d(\lambda_k) = \mathbf{L}\mathbf{x}_{k+1}$ according to (12); the second update therefore corresponds to a gradient ascent step. Notice, however, that (13) is not distributed as written, since the primal update depends on $\lambda$ only through the map $\lambda \mapsto \mathbf{L}\lambda$, which is not locally computable. Introducing $\nu_k = (\nu_k^{1\top}, \dots, \nu_k^{n\top})^\top = \mathbf{L}\lambda_k$ as the scaled
version of the Lagrange multipliers, we can equivalently write (13) in terms of the new dual variables $\nu$ as

$$\mathbf{x}_{k+1} = \operatorname*{argmin}_{\mathbf{x} \in \mathbb{R}^{nd}} f(\mathbf{x}) + \nu_k^\top \mathbf{x}, \quad (14a)$$
$$\nu_{k+1} = \nu_k + \alpha \mathbf{L}\mathbf{x}_{k+1}, \quad (14b)$$

which lends itself to a distributed implementation. More precisely, rewriting (14) in terms of the individual dynamics of the agents, we arrive at the following update law for agent $i \in \mathcal{V}$:

$$x^i_{k+1} = \operatorname*{argmin}_{x} \ f_i(x) + \nu_k^{i\top} x, \quad (15a)$$
$$\nu^i_{k+1} = \nu^i_k + \alpha \sum_{j \in N_i} w_{ij}\,(x^i_{k+1} - x^j_{k+1}). \quad (15b)$$

The first update in (15) requires no communication, since the function $f_i(x)$ and the multiplier $\nu^i$ are local. The second update requires one round of communication, in which each agent exchanges its private minimizer $x^j_{k+1}$ with its neighbors. Notice that, due to the change of variables $\nu = \mathbf{L}\lambda$, the initial dual variables $\nu_0$ in (14) must respect the condition $\sum_{i=1}^n \nu_0^i = 0$.

III. DECENTRALIZED INEXACT DUAL ASCENT

The decentralized dual ascent algorithm (15) relies on the assumption that the Lagrangian minimization with respect to the primal variable is exact at each iteration, which translates into exact dual gradient information being available. In other words, at each global time step $k$, the agents need to compute their exact local minimizers $\operatorname{argmin}_x (f_i(x) + \nu_k^{i\top} x)$ via an iterative method. In practice, especially in real-time applications, agents are often only able to solve their subproblems approximately by truncating their iterative inner minimization scheme, which means that the outer iteration (dual ascent) is provided with an inexact dual gradient. We therefore consider the practical case in which the agents solve their private minimizations approximately:

$$\hat{x}^i_{k+1} \approx \operatorname*{argmin}_{x \in \mathbb{R}^d} \ f_i(x) + \nu_k^{i\top} x, \quad (16a)$$
$$\nu^i_{k+1} = \nu^i_k + \alpha \sum_{j \in N_i} w_{ij}\,(\hat{x}^i_{k+1} - \hat{x}^j_{k+1}). \quad (16b)$$

There exist several ways to characterize the accuracy of approximate minimizers.
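Before turning to those accuracy measures, the exact update law (15) admits a compact simulation. The hedged sketch below uses quadratic local objectives so that step (15a) has a closed form; the graph, step size, and iteration count are illustrative assumptions, not values from the paper. All local copies converge to the centralized minimizer.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 4, 2
a = rng.uniform(1.0, 3.0, size=n)                 # strong-convexity moduli m_f^i
c = rng.normal(size=(n, d))
x_star = (a[:, None] * c).sum(axis=0) / a.sum()   # centralized minimizer

# Path graph 0-1-2-3 with unit edge weights.
W = np.eye(n, k=1) + np.eye(n, k=-1)
neighbors = [np.nonzero(W[i])[0] for i in range(n)]

alpha = 0.05                      # small constant step size (assumed stable)
x = np.zeros((n, d))
nu = np.zeros((n, d))             # scaled multipliers; sum_i nu_i = 0 initially
for _ in range(5000):
    # (15a): local exact minimization, closed form for quadratic f_i:
    # grad = a_i (x - c_i) + nu_i = 0  =>  x = c_i - nu_i / a_i
    for i in range(n):
        x[i] = c[i] - nu[i] / a[i]
    # (15b): one round of neighbor communication
    nu = nu + alpha * np.stack(
        [sum(W[i, j] * (x[i] - x[j]) for j in neighbors[i]) for i in range(n)]
    )

# All local copies agree on the centralized minimizer.
assert np.allclose(x, x_star, atol=1e-4)
```

Note that the update (15b) preserves $\sum_i \nu^i = 0$, since each pairwise term $w_{ij}(x^i - x^j)$ cancels with its mirror image.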
Here, we focus on accuracy levels expressed in terms of objective values; that is, we assume that at each global time $k+1$, agent $i$ computes an $\varepsilon^i_{k+1}$-suboptimal solution $\hat{x}^i_{k+1}$ such that

$$f_i(\hat{x}^i_{k+1}) + \nu_k^{i\top} \hat{x}^i_{k+1} - \min_x \{ f_i(x) + \nu_k^{i\top} x \} \leq \varepsilon^i_{k+1}. \quad (16c)$$

The updates in (16) can then collectively be written as

$$\hat{\mathbf{x}}_{k+1} \approx \operatorname*{argmin}_{\mathbf{x} \in \mathbb{R}^{nd}} f(\mathbf{x}) + \nu_k^\top \mathbf{x}, \quad (17a)$$
$$\nu_{k+1} = \nu_k + \alpha \mathbf{L}\hat{\mathbf{x}}_{k+1}. \quad (17b)$$

By summing both sides of (16c) over $i = 1, \dots, n$, we obtain

$$f(\hat{\mathbf{x}}_{k+1}) + \nu_k^\top \hat{\mathbf{x}}_{k+1} - \min_{\mathbf{x}} \{ f(\mathbf{x}) + \nu_k^\top \mathbf{x} \} \leq \bar\varepsilon_{k+1}, \quad (17c)$$

where $\bar\varepsilon_k = \mathbf{1}_n^\top \varepsilon_k$ and $\varepsilon_k = [\varepsilon^1_k, \dots, \varepsilon^n_k]^\top$ is the stacked vector of local errors defined in (16c). Notice that the exact dual ascent dynamics (14) corresponds to $\varepsilon_k \equiv 0$. The reason for choosing the objective value as a stopping criterion is twofold. First, this criterion is the most relaxed (weakest) stopping criterion [15]. Second, for several algorithms in the literature, explicit bounds are available on the number of iterations required to obtain a desired accuracy in terms of the objective value. The decentralized inexact dual ascent method is outlined in Algorithm 1.

Algorithm 1 Decentralized Inexact Dual Ascent (DIDA)
Given: $\{f_i(x)\}_{i=1}^n$, where $f_i(x) \in \mathcal{F}(m_f^i, L_f^i)$; an undirected connected communication graph with edge weights $w_{ij}$ and Laplacian matrix $L$; step size $0 < \alpha < 2\mu/\lambda_n^2(L)$, where $\mu = \min_i m_f^i$.
1: Initialize $\nu_0^i = 0$ and $x_0^i \in \mathbb{R}^d$ for all $i = 1, \dots, n$.
2: for $k = 0, 1, 2, \dots$, all agents do
3: Find $\hat{x}^i_{k+1}$ with $f_i(\hat{x}^i_{k+1}) + \nu_k^{i\top}\hat{x}^i_{k+1} - \min_x\{f_i(x) + \nu_k^{i\top}x\} \leq \varepsilon^i_{k+1}$.
4: $\nu^i_{k+1} = \nu^i_k + \alpha \sum_{j \in N_i} w_{ij}(\hat{x}^i_{k+1} - \hat{x}^j_{k+1})$.
5: end for

A. Convergence Analysis

We now analyze the convergence of the decentralized inexact dual ascent dynamics. For convenience, we perform the convergence analysis in terms of the original dual variables $\lambda$. We first define the dual gradient mapping $F : \mathbb{R}^{nd} \to \mathbb{R}^{nd}$ as

$$F(\lambda) := \lambda + \alpha \mathbf{L}\mathbf{x}(\lambda), \qquad \mathbf{x}(\lambda) = \operatorname*{argmin}_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \lambda). \quad (18)$$

With this definition, the recursions in (17), after the change of variables $\nu = \mathbf{L}\lambda$, can be written as

$$\mathcal{L}(\hat{\mathbf{x}}_{k+1}, \lambda_k) - \mathcal{L}(\mathbf{x}_{k+1}, \lambda_k) \leq \bar\varepsilon_{k+1}, \quad (19a)$$
$$\lambda_{k+1} = F(\lambda_k) + e_{k+1}, \quad (19b)$$
$$e_{k+1} = \alpha \mathbf{L}(\hat{\mathbf{x}}_{k+1} - \mathbf{x}_{k+1}), \quad (19c)$$

where $\mathbf{x}_{k+1} = \mathbf{x}(\lambda_k)$ denotes the exact minimizer. Here, $e_{k+1}$ is the error propagated into the dual ascent update as a result of the inexact inner minimization. To analyze the convergence of (19), we first characterize the smoothness properties of $F(\lambda)$.

Lemma 1: Consider the Lagrangian in (8), where $f \in \mathcal{F}(\mu, L)$. Then the dual gradient mapping, defined as in (18), satisfies $\|F(\lambda) - F(\nu)\| \leq L_F \|\lambda - \nu\|$ for all $\lambda, \nu \perp \mathrm{Null}(\mathbf{L})$, where the Lipschitz constant $L_F$ is given by

$$L_F = \max\left( \left|1 - \alpha\lambda_n^2(L)/\mu\right|, \ \left|1 - \alpha\lambda_2^2(L)/L\right| \right). \quad (20)$$

Proof: See Appendix VI-A.

A direct consequence of Lemma 1 is that the dual ascent mapping $F(\lambda)$ is contractive on the subspace $\mathrm{Null}(\mathbf{L})^\perp$ provided that $L_F < 1$. It is easy to verify that this condition is satisfied by the selection $0 < \alpha < 2\mu/\lambda_n^2(L)$. Therefore, in view of (19b), the dual update is a contractive
dynamics perturbed with additive errors. In the following, we characterize the convergence rate of (19).

Proposition 1: Consider the decentralized inexact dual ascent dynamics outlined in (19), where $f \in \mathcal{F}(\mu, L)$, and define $L_F$ as in (20). Then, for all $k \geq 1$, we have

$$\|\hat{\mathbf{x}}_{k+1} - \mathbf{x}^\star\| \leq A_k + B_k, \quad (21)$$

where

$$A_k = \frac{\lambda_n(L)}{\mu}\, L_F^k\, \|\lambda_0 - \lambda^\star\|, \qquad B_k = \left(\frac{2}{\mu}\right)^{3/2} \frac{\alpha\lambda_n^2(L)}{2} \sum_{j=1}^k L_F^{k-j} \sqrt{\bar\varepsilon_j} + \left(\frac{2}{\mu}\right)^{1/2} \sqrt{\bar\varepsilon_{k+1}},$$

and $\bar\varepsilon_j = \sum_{i=1}^n \varepsilon^i_j$ is the total inner-minimization error at time $j$.

Proof: See Appendix VI-B.

We now describe the terms $A_k$ and $B_k$ that determine the overall convergence rate of the algorithm. The first term, $A_k$, is error-independent and vanishes exponentially with the same rate as the exact algorithm (i.e., the case $\varepsilon_k \equiv 0$); in other words, $A_k = O(L_F^k)$. The second term, $B_k$, is a weighted sum of the inner-minimization errors, whose limiting behavior depends on that of $\bar\varepsilon_k$. Explicitly, suppose $\bar\varepsilon_k \in O(\rho_e^{2k})$ for some $0 < \rho_e < 1$ with $\rho_e \neq L_F$. Then we can verify that $B_k \in O(\max(\rho_e, L_F)^k)$, which further implies $\|\hat{\mathbf{x}}_{k+1} - \mathbf{x}^\star\| \in O(\max(\rho_e, L_F)^k)$. In particular, if the errors of the inexact minimization step vanish faster than the contraction factor of the dual dynamics, i.e., if $\rho_e < L_F$, the convergence rate of the inexact algorithm is unaffected by the inexact inner minimization. This suggests that the agents can start with a crude solution of their inner minimization, when they are far from the global solution, and then increase their accuracy geometrically after each round of communication, which yields a substantial computational saving for each agent in the initial iterations of the algorithm. Finally, if $\rho_e = L_F$, it is not difficult to show that $B_k \in O(k L_F^k)$, leading to an $O(k L_F^k)$ overall convergence rate. We close this section with a remark.

Remark 1 (Comparison with decentralized ADMM): Recall the expression for the convergence factor of the inexact dual gradient algorithm stated in Lemma 1,

$$L_F = \max\left( \left|1 - \alpha\lambda_n^2(L)/\mu\right|, \ \left|1 - \alpha\lambda_2^2(L)/L\right| \right), \quad (22)$$

which is a function of the step size $\alpha$.
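Proposition 1's message can be checked numerically. The hedged sketch below reuses a toy quadratic setup (all constants illustrative, not from the paper): each agent returns an inexact local minimizer whose objective suboptimality $\varepsilon^i_k = (a_i/2)\delta_k^2$ decays geometrically like $\rho_e^{2k}$, and despite the injected errors the iterates still converge to the centralized minimizer.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 4, 2
a = rng.uniform(1.0, 3.0, size=n)                 # quadratic f_i curvatures
c = rng.normal(size=(n, d))
x_star = (a[:, None] * c).sum(axis=0) / a.sum()   # centralized minimizer

W = np.eye(n, k=1) + np.eye(n, k=-1)              # path graph, unit weights
Lap = np.diag(W.sum(axis=1)) - W                  # graph Laplacian
alpha, rho_e = 0.05, 0.97                         # step size; error decay rate

x = np.zeros((n, d))
nu = np.zeros((n, d))
for k in range(5000):
    for i in range(n):
        x_exact = c[i] - nu[i] / a[i]             # exact local minimizer
        # Return an eps-suboptimal point: perturbing by a vector of norm
        # delta gives f_i-suboptimality (a_i/2) * delta^2 ~ rho_e**(2k).
        delta = 0.5 * rho_e ** k
        dir_ = rng.normal(size=d)
        x[i] = x_exact + delta * dir_ / np.linalg.norm(dir_)
    nu = nu + alpha * Lap @ x                     # Laplacian consensus step

# Geometrically decaying errors do not prevent convergence (Proposition 1).
assert np.allclose(x, x_star, atol=1e-3)
```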
By simple algebraic calculations, the optimal (smallest) convergence factor is attained when the step size is selected as

$$\alpha^\star = \frac{2}{\lambda_n^2(L)/\mu + \lambda_2^2(L)/L}.$$

For this selection, the optimal convergence factor is

$$L_F^\star = \frac{\kappa_f \kappa_G^2 - 1}{\kappa_f \kappa_G^2 + 1}, \quad (23)$$

where we have defined the graph condition number as $\kappa_G = \lambda_n(L)/\lambda_2(L)$, and $\kappa_f = L/\mu$ is the condition number of the objective function, as before. For the purpose of comparison, we consider the decentralized ADMM implementation of the original problem (4) outlined in [12]. Using the same assumptions as in the present work, the authors establish an optimal linear convergence factor $L_{\mathrm{ADMM}}$ as an explicit function of $\kappa_f$ and $\kappa_G$ (see [12] for the expression). (24)

Fig. 1: A comparison of the exponential convergence factors of decentralized dual ascent with that of decentralized ADMM, as established in [12]. $\kappa_f$ is the condition number of the global objective function, whereas $\kappa_G$ is the condition number of the graph Laplacian. Smaller values of the convergence factor mean faster convergence.

In Fig. 1, we compare the convergence factor of decentralized dual ascent, Eq. (23), with that of decentralized ADMM, (24), over a range of pairs $(\kappa_f, \kappa_G)$. We observe that the worst-case convergence factor of decentralized dual ascent is always better, by a considerable margin, than the rate established in [12]. Intuitively, in ADMM the agents still solve exact inner minimizations, but in a sequential fashion. This sequential minimization introduces an inherent inexactness in the dual gradient information even when the inner minimizations are performed exactly. In contrast, decentralized dual ascent provides the algorithm with exact dual gradient information; even if the inner minimization is performed inexactly, Proposition 1 states that the convergence rate is not affected as long as the inner-minimization errors are decreased at an appropriate rate.

IV. NUMERICAL SIMULATIONS

In this section, we consider the problem of training a binary classifier with a dataset¹ that is scattered across a multi-agent network. For the connectivity graph, we consider a random network of $n$ nodes with connection probability $p$. Let us denote by $(x_k^i, y_k^i) \in \mathbb{R}^{d-1} \times \{-1, 1\}$ the $k$-th data point of the training set of agent $i$, for $i = 1, \dots, n$ and $k = 1, \dots, N_i$. Each agent has a local copy $w^i \in \mathbb{R}^d$ of the global classifier and a local function based on the training data observed,

$$f_i(w^i) = \sum_{k=1}^{N_i} \log\left(1 + e^{-y_k^i [x_k^{i\top} \ 1]\, w^i}\right) + \gamma \|w^i\|^2. \quad (25)$$

¹ ftp://ftp.ics.uci.edu/pub/machine-learning-databases

Here, $\gamma > 0$ is the regularization constant, which we set to $\gamma = 1$. Given these values, we run Algorithm 1, letting the inaccuracy of the inner minimization decrease at a geometric rate with $\rho_e = 0.99$. Figure 2 plots the evolution of the norm of the derivative of the Lagrangian with respect to the primal and dual variables.

Fig. 2: Plot of the norm of the gradient of the Lagrangian with respect to the primal variable (in blue) and the dual variable (in red) for the numerical example of Section IV.

V. CONCLUSIONS

We studied decentralized consensus optimization of smooth and strongly convex objective functions using decentralized inexact dual ascent. Specifically, we considered a Lagrangian formulation in which the inner minimization of the Lagrangian is locally computable at each node. We assumed that the inner minimizations are performed inexactly, which turns the dual update into an inexact dual ascent. We analyzed the effect of this inaccuracy on the overall convergence rate and showed that the convergence rate of the algorithm does not deteriorate if the inner-minimization errors are controlled in an appropriate way.

VI. APPENDIX

A. Proof of Lemma 1

Since $f(\mathbf{x})$, and in turn $\mathcal{L}(\mathbf{x}, \lambda)$, is strongly convex, the mapping $\lambda \mapsto \mathbf{x}(\lambda) = \operatorname{argmin}_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \lambda)$ is well defined and differentiable almost everywhere. To show this, we begin with the optimality condition that defines $\mathbf{x}(\lambda)$:

$$\nabla f(\mathbf{x}(\lambda)) + \mathbf{L}\lambda = 0. \quad (26)$$

By strong convexity of $f$, we can write [20, Ch. 9]

$$\mu\,\|\mathbf{x}(\lambda) - \mathbf{x}(\nu)\| \leq \|\nabla f(\mathbf{x}(\lambda)) - \nabla f(\mathbf{x}(\nu))\| = \|\mathbf{L}(\lambda - \nu)\| \leq \lambda_n(L)\,\|\lambda - \nu\|,$$

where the equality uses (26). The above inequality establishes the Lipschitz continuity of the map $\mathbf{x}(\lambda)$; hence, $\mathbf{x}(\lambda)$ is differentiable almost everywhere. Now, since (26) holds for all $\lambda$, we can differentiate both sides with respect to $\lambda$ and use the chain rule to obtain

$$\nabla^2 f(\mathbf{x}(\lambda))\, \frac{d\mathbf{x}(\lambda)}{d\lambda} + \mathbf{L} = 0.$$
On the other hand, by Danskin's theorem [19], the dual function is differentiable with gradient $\nabla d(\lambda) = \mathbf{L}\mathbf{x}(\lambda)$. Differentiating once more, we obtain the dual Hessian

$$\nabla^2 d(\lambda) = \mathbf{L}\, \frac{d\mathbf{x}(\lambda)}{d\lambda} = -\mathbf{L}\, \nabla^2 f(\mathbf{x}(\lambda))^{-1}\, \mathbf{L}.$$

Note that the Hessian is negative definite on $\mathrm{Null}(\mathbf{L})^\perp$. To see this, note that $\lambda^\top \nabla^2 d(\lambda) \lambda = -(\mathbf{L}\lambda)^\top \nabla^2 f(\mathbf{x}(\lambda))^{-1} (\mathbf{L}\lambda)$. Since $\mu I_{nd} \preceq \nabla^2 f(\mathbf{x}(\lambda)) \preceq L I_{nd}$ by assumption, we obtain

$$-\frac{1}{\mu}\, \lambda^\top \mathbf{L}^2 \lambda \ \leq\ \lambda^\top \nabla^2 d(\lambda) \lambda \ \leq\ -\frac{1}{L}\, \lambda^\top \mathbf{L}^2 \lambda. \quad (27)$$

Furthermore, for all $\lambda \perp \mathrm{Null}(\mathbf{L})$, we can write

$$\lambda_2^2(L)\, \|\lambda\|^2 \ \leq\ \lambda^\top \mathbf{L}^2 \lambda \ \leq\ \lambda_n^2(L)\, \|\lambda\|^2. \quad (28)$$

Incorporating (28) in (27), we obtain

$$-\frac{\lambda_n^2(L)}{\mu}\, I_{nd} \ \preceq\ \nabla^2 d(\lambda) \ \preceq\ -\frac{\lambda_2^2(L)}{L}\, I_{nd}, \qquad \lambda \perp \mathrm{Null}(\mathbf{L}). \quad (29)$$

On the other hand, the Jacobian of $F(\lambda) = \lambda + \alpha \nabla d(\lambda)$ is given by

$$\frac{d}{d\lambda} F(\lambda) = I_{nd} + \alpha\, \nabla^2 d(\lambda).$$

Using (29), we can write

$$\left(1 - \frac{\alpha\lambda_n^2(L)}{\mu}\right) I_{nd} \ \preceq\ \frac{d}{d\lambda} F(\lambda) \ \preceq\ \left(1 - \frac{\alpha\lambda_2^2(L)}{L}\right) I_{nd}.$$

Therefore, for all $\lambda \perp \mathrm{Null}(\mathbf{L})$, the Jacobian of $F$ satisfies the bound

$$\left\| \frac{d}{d\lambda} F(\lambda) \right\| \ \leq\ L_F = \max\left( \left|1 - \frac{\alpha\lambda_n^2(L)}{\mu}\right|, \ \left|1 - \frac{\alpha\lambda_2^2(L)}{L}\right| \right). \quad (30)$$

Next, using Taylor's theorem, for all $\lambda, \nu \perp \mathrm{Null}(\mathbf{L})$, we can write

$$F(\lambda) - F(\nu) = \int_0^1 \frac{d}{d\lambda} F(\nu + t(\lambda - \nu))\, (\lambda - \nu)\, dt.$$

Since $\nu + t(\lambda - \nu) \perp \mathrm{Null}(\mathbf{L})$ for $0 \leq t \leq 1$, we can write

$$\|F(\lambda) - F(\nu)\| \ \leq\ \int_0^1 \left\| \frac{d}{d\lambda} F(\nu + t(\lambda - \nu)) \right\| \|\lambda - \nu\|\, dt \ \leq\ L_F\, \|\lambda - \nu\|.$$

In other words, $F$ is Lipschitz continuous on $\mathrm{Null}(\mathbf{L})^\perp$ with parameter $L_F$. The proof is now complete.

B. Proof of Proposition 1

Consider the map $\mathbf{x} \mapsto \mathcal{L}(\mathbf{x}, \lambda) = f(\mathbf{x}) + \lambda^\top \mathbf{L}\mathbf{x}$, which is $\mu$-strongly convex. We can therefore write [20, Ch. 9]

$$\frac{\mu}{2}\, \|\hat{\mathbf{x}}_{k+1} - \mathbf{x}_{k+1}\|^2 \ \leq\ \mathcal{L}(\hat{\mathbf{x}}_{k+1}, \lambda_k) - \mathcal{L}(\mathbf{x}_{k+1}, \lambda_k).$$

The right-hand side is bounded by $\bar\varepsilon_{k+1}$, according to (19a). Therefore, we can write

$$\|\hat{\mathbf{x}}_{k+1} - \mathbf{x}_{k+1}\| \ \leq\ \sqrt{2\bar\varepsilon_{k+1}/\mu}, \quad (31)$$

which further implies

$$\|e_{k+1}\| = \alpha\, \|\mathbf{L}(\hat{\mathbf{x}}_{k+1} - \mathbf{x}_{k+1})\| \ \leq\ C \sqrt{\bar\varepsilon_{k+1}}, \quad (32)$$

where $C = \lambda_n(L)\, \alpha\, \sqrt{2/\mu}$. Similarly, we use the strong convexity of $f$ to write $\mu\, \|\mathbf{x}_{k+1} - \mathbf{x}^\star\| \leq \|\nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}^\star)\|$. Recalling that $\nabla f(\mathbf{x}_{k+1}) + \mathbf{L}\lambda_k = 0$ and $\nabla f(\mathbf{x}^\star) + \mathbf{L}\lambda^\star = 0$ (see (11)), we can write

$$\|\mathbf{x}_{k+1} - \mathbf{x}^\star\| \ \leq\ \frac{1}{\mu}\, \|\mathbf{L}\lambda_k - \mathbf{L}\lambda^\star\| \ \leq\ \frac{\lambda_n(L)}{\mu}\, \|\lambda_k - \lambda^\star\|. \quad (33)$$

On the other hand, observe that we can write $\lambda_k = \lambda_0 + \sum_{j=1}^k \alpha \mathbf{L}\hat{\mathbf{x}}_j$ for $k \geq 1$. Since $\lambda_0 \perp \mathrm{Null}(\mathbf{L})$ by assumption (initialization) and $\sum_{j=1}^k \alpha \mathbf{L}\hat{\mathbf{x}}_j \perp \mathrm{Null}(\mathbf{L})$, we have $\lambda_k \perp \mathrm{Null}(\mathbf{L})$ for all $k$; that is, $\lambda_k$ lies in the subspace on which the dual gradient mapping $F(\lambda)$ is contractive. Invoking Lemma 1, we can write

$$\|\lambda_k - \lambda^\star\| = \|F(\lambda_{k-1}) - F(\lambda^\star) + e_k\| \ \leq\ \|F(\lambda_{k-1}) - F(\lambda^\star)\| + \|e_k\| \ \leq\ L_F\, \|\lambda_{k-1} - \lambda^\star\| + \|e_k\|. \quad (34)$$

By iterating down to $k = 1$ and using the bound in (32), we obtain

$$\|\lambda_k - \lambda^\star\| \ \leq\ L_F^k\, \|\lambda_0 - \lambda^\star\| + C \sum_{j=1}^k L_F^{k-j} \sqrt{\bar\varepsilon_j}. \quad (35)$$

Finally, we use the triangle inequality to write

$$\|\hat{\mathbf{x}}_{k+1} - \mathbf{x}^\star\| \ \leq\ \|\hat{\mathbf{x}}_{k+1} - \mathbf{x}_{k+1}\| + \|\mathbf{x}_{k+1} - \mathbf{x}^\star\|. \quad (36)$$

The first and second terms on the right-hand side are bounded by (31) and (33), respectively. Substituting these bounds into (36) and further using the bound in (35) yields the desired inequality. The proof is now complete.

REFERENCES

[1] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1-122, 2011.
[2] R. Zhang and J. Kwok, "Asynchronous distributed ADMM for consensus optimization," in International Conference on Machine Learning, 2014.
[3] J. C. Duchi, A. Agarwal, and M. J.
Wainwright, "Dual averaging for distributed optimization: Convergence analysis and network scaling," IEEE Transactions on Automatic Control, vol. 57, no. 3, pp. 592-606, 2012.
[4] D. V. Dimarogonas, E. Frazzoli, and K. H. Johansson, "Distributed event-triggered control for multi-agent systems," IEEE Transactions on Automatic Control, vol. 57, no. 5, 2012.
[5] K. Yuan, Q. Ling, and W. Yin, "On the convergence of decentralized gradient descent," SIAM Journal on Optimization, vol. 26, no. 3, 2016.
[6] W. Shi, Q. Ling, G. Wu, and W. Yin, "EXTRA: An exact first-order algorithm for decentralized consensus optimization," SIAM Journal on Optimization, vol. 25, no. 2, pp. 944-966, 2015.
[7] A. Mokhtari, Q. Ling, and A. Ribeiro, "Network Newton distributed optimization methods," IEEE Transactions on Signal Processing, vol. 65, no. 1, 2017.
[8] A. Nedic and A. Ozdaglar, "Distributed subgradient methods for multi-agent optimization," IEEE Transactions on Automatic Control, vol. 54, no. 1, pp. 48-61, 2009.
[9] D. Feijer and F. Paganini, "Stability of primal-dual gradient dynamics and applications to network optimization," Automatica, vol. 46, no. 12, 2010.
[10] S. S. Kia, J. Cortés, and S. Martínez, "Distributed convex optimization via continuous-time coordination algorithms with discrete-time communication," Automatica, vol. 55, 2015.
[11] J. Wang and N. Elia, "Control approach to distributed optimization," in 48th Annual Allerton Conference on Communication, Control, and Computing, IEEE, 2010.
[12] W. Shi, Q. Ling, K. Yuan, G. Wu, and W. Yin, "On the linear convergence of the ADMM in decentralized consensus optimization," IEEE Transactions on Signal Processing, vol. 62, no. 7, 2014.
[13] M. Zargham, A. Ribeiro, A. Ozdaglar, and A. Jadbabaie, "Accelerated dual descent for network flow optimization," IEEE Transactions on Automatic Control, vol. 59, no. 4, 2014.
[14] R. Tutunov, H. B. Ammar, and A. Jadbabaie, "A distributed Newton method for large scale consensus optimization," arXiv preprint, 2016.
[15] O. Devolder, F. Glineur, and Y. Nesterov, "First-order methods of smooth convex optimization with inexact oracle," Mathematical Programming, vol. 146, no. 1-2, pp. 37-75, 2014.
[16] M. Schmidt, N. Le Roux, and F. R. Bach, "Convergence rates of inexact proximal-gradient methods for convex optimization," in Advances in Neural Information Processing Systems, 2011.
[17] A. d'Aspremont, "Smooth optimization with approximate gradient," SIAM Journal on Optimization, vol. 19, no. 3, 2008.
[18] V. Nedelcu, I. Necoara, and Q. Tran-Dinh, "Computational complexity of inexact gradient augmented Lagrangian methods: Application to constrained MPC," SIAM Journal on Control and Optimization, vol. 52, no. 5, 2014.
[19] D. P. Bertsekas, Nonlinear Programming. Athena Scientific, Belmont, MA.
[20] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual
More informationFrank-Wolfe Method. Ryan Tibshirani Convex Optimization
Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)
More informationStochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization
Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization Shai Shalev-Shwartz and Tong Zhang School of CS and Engineering, The Hebrew University of Jerusalem Optimization for Machine
More informationDual Ascent. Ryan Tibshirani Convex Optimization
Dual Ascent Ryan Tibshirani Conve Optimization 10-725 Last time: coordinate descent Consider the problem min f() where f() = g() + n i=1 h i( i ), with g conve and differentiable and each h i conve. Coordinate
More informationConvex Optimization. Newton s method. ENSAE: Optimisation 1/44
Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)
More informationCoordinate Descent and Ascent Methods
Coordinate Descent and Ascent Methods Julie Nutini Machine Learning Reading Group November 3 rd, 2015 1 / 22 Projected-Gradient Methods Motivation Rewrite non-smooth problem as smooth constrained problem:
More informationA Distributed Newton Method for Network Utility Maximization, II: Convergence
A Distributed Newton Method for Network Utility Maximization, II: Convergence Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract The existing distributed algorithms for Network Utility
More informationIntroduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research
Introduction to Machine Learning Lecture 7 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Convex Optimization Differentiation Definition: let f : X R N R be a differentiable function,
More informationExtreme Abridgment of Boyd and Vandenberghe s Convex Optimization
Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The
More informationBLOCK ALTERNATING OPTIMIZATION FOR NON-CONVEX MIN-MAX PROBLEMS: ALGORITHMS AND APPLICATIONS IN SIGNAL PROCESSING AND COMMUNICATIONS
BLOCK ALTERNATING OPTIMIZATION FOR NON-CONVEX MIN-MAX PROBLEMS: ALGORITHMS AND APPLICATIONS IN SIGNAL PROCESSING AND COMMUNICATIONS Songtao Lu, Ioannis Tsaknakis, and Mingyi Hong Department of Electrical
More informationarxiv: v2 [math.oc] 7 Apr 2017
Optimal algorithms for smooth and strongly convex distributed optimization in networks arxiv:702.08704v2 [math.oc] 7 Apr 207 Kevin Scaman Francis Bach 2 Sébastien Bubeck 3 Yin Tat Lee 3 Laurent Massoulié
More informationConstrained Optimization
1 / 22 Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 30, 2015 2 / 22 1. Equality constraints only 1.1 Reduced gradient 1.2 Lagrange
More information5. Duality. Lagrangian
5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized
More informationA Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations
A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations Chuangchuang Sun and Ran Dai Abstract This paper proposes a customized Alternating Direction Method of Multipliers
More informationDistributed online optimization over jointly connected digraphs
Distributed online optimization over jointly connected digraphs David Mateos-Núñez Jorge Cortés University of California, San Diego {dmateosn,cortes}@ucsd.edu Mathematical Theory of Networks and Systems
More informationLecture Notes on Support Vector Machine
Lecture Notes on Support Vector Machine Feng Li fli@sdu.edu.cn Shandong University, China 1 Hyperplane and Margin In a n-dimensional space, a hyper plane is defined by ω T x + b = 0 (1) where ω R n is
More informationStructural and Multidisciplinary Optimization. P. Duysinx and P. Tossings
Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be
More informationPreconditioning via Diagonal Scaling
Preconditioning via Diagonal Scaling Reza Takapoui Hamid Javadi June 4, 2014 1 Introduction Interior point methods solve small to medium sized problems to high accuracy in a reasonable amount of time.
More informationInterior-Point Methods for Linear Optimization
Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function
More informationDual and primal-dual methods
ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method
More informationGradient Descent. Dr. Xiaowei Huang
Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,
More informationPrimal-Dual Interior-Point Methods for Linear Programming based on Newton s Method
Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Robert M. Freund March, 2004 2004 Massachusetts Institute of Technology. The Problem The logarithmic barrier approach
More informationNOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained
NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume
More informationA Distributed Newton Method for Network Optimization
A Distributed Newton Method for Networ Optimization Ali Jadbabaie, Asuman Ozdaglar, and Michael Zargham Abstract Most existing wor uses dual decomposition and subgradient methods to solve networ optimization
More informationNumerical optimization
Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal
More informationALADIN An Algorithm for Distributed Non-Convex Optimization and Control
ALADIN An Algorithm for Distributed Non-Convex Optimization and Control Boris Houska, Yuning Jiang, Janick Frasch, Rien Quirynen, Dimitris Kouzoupis, Moritz Diehl ShanghaiTech University, University of
More informationRandomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints
Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints By I. Necoara, Y. Nesterov, and F. Glineur Lijun Xu Optimization Group Meeting November 27, 2012 Outline
More informationGeneralization to inequality constrained problem. Maximize
Lecture 11. 26 September 2006 Review of Lecture #10: Second order optimality conditions necessary condition, sufficient condition. If the necessary condition is violated the point cannot be a local minimum
More informationSparse Covariance Selection using Semidefinite Programming
Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support
More information5 Handling Constraints
5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest
More informationAsynchronous Distributed Optimization. via Randomized Dual Proximal Gradient
Asynchronous Distributed Optimization 1 via Randomized Dual Proximal Gradient Ivano Notarnicola and Giuseppe Notarstefano arxiv:1509.08373v2 [cs.sy] 24 Jun 2016 Abstract In this paper we consider distributed
More informationWE consider the problem of estimating a time varying
450 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL 61, NO 2, JANUARY 15, 2013 D-MAP: Distributed Maximum a Posteriori Probability Estimation of Dynamic Systems Felicia Y Jakubiec Alejro Ribeiro Abstract This
More informationSupport Vector Machines
Support Vector Machines Support vector machines (SVMs) are one of the central concepts in all of machine learning. They are simply a combination of two ideas: linear classification via maximum (or optimal
More informationA Distributed Newton Method for Network Utility Maximization
A Distributed Newton Method for Networ Utility Maximization Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie Abstract Most existing wor uses dual decomposition and subgradient methods to solve Networ Utility
More informationNumerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems
1 Numerical optimization Alexander & Michael Bronstein, 2006-2009 Michael Bronstein, 2010 tosca.cs.technion.ac.il/book Numerical optimization 048921 Advanced topics in vision Processing and Analysis of
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More informationMath 273a: Optimization Subgradients of convex functions
Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions
More informationConstrained optimization: direct methods (cont.)
Constrained optimization: direct methods (cont.) Jussi Hakanen Post-doctoral researcher jussi.hakanen@jyu.fi Direct methods Also known as methods of feasible directions Idea in a point x h, generate a
More informationConvex Optimization Boyd & Vandenberghe. 5. Duality
5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationDistributed Coordination for Separable Convex Optimization with Coupling Constraints
Distributed Coordination for Separable Convex Optimization with Coupling Constraints Simon K. Niederländer Jorge Cortés Abstract This paper considers a network of agents described by an undirected graph
More informationConvex Optimization M2
Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization
More informationKarush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725
Karush-Kuhn-Tucker Conditions Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given a minimization problem Last time: duality min x subject to f(x) h i (x) 0, i = 1,... m l j (x) = 0, j =
More informationHomework 4. Convex Optimization /36-725
Homework 4 Convex Optimization 10-725/36-725 Due Friday November 4 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationDistributed Consensus-Based Optimization
Advanced Topics in Control 208: Distributed Systems & Control Florian Dörfler We have already seen in previous lectures that optimization problems defined over a network of agents with individual cost
More informationContraction Methods for Convex Optimization and Monotone Variational Inequalities No.16
XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of
More informationLecture 9: Large Margin Classifiers. Linear Support Vector Machines
Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Perceptrons Definition Perceptron learning rule Convergence Margin & max margin classifiers (Linear) support vector machines Formulation
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationPrimal-dual Subgradient Method for Convex Problems with Functional Constraints
Primal-dual Subgradient Method for Convex Problems with Functional Constraints Yurii Nesterov, CORE/INMA (UCL) Workshop on embedded optimization EMBOPT2014 September 9, 2014 (Lucca) Yu. Nesterov Primal-dual
More informationUnconstrained minimization of smooth functions
Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and
More informationA projection algorithm for strictly monotone linear complementarity problems.
A projection algorithm for strictly monotone linear complementarity problems. Erik Zawadzki Department of Computer Science epz@cs.cmu.edu Geoffrey J. Gordon Machine Learning Department ggordon@cs.cmu.edu
More informationAverage-Consensus of Multi-Agent Systems with Direct Topology Based on Event-Triggered Control
Outline Background Preliminaries Consensus Numerical simulations Conclusions Average-Consensus of Multi-Agent Systems with Direct Topology Based on Event-Triggered Control Email: lzhx@nankai.edu.cn, chenzq@nankai.edu.cn
More informationConvex Optimization of Graph Laplacian Eigenvalues
Convex Optimization of Graph Laplacian Eigenvalues Stephen Boyd Abstract. We consider the problem of choosing the edge weights of an undirected graph so as to maximize or minimize some function of the
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Support Vector Machines Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
More informationDistributed Computation of Quantiles via ADMM
1 Distributed Computation of Quantiles via ADMM Franck Iutzeler Abstract In this paper, we derive distributed synchronous and asynchronous algorithms for computing uantiles of the agents local values.
More information3.10 Lagrangian relaxation
3.10 Lagrangian relaxation Consider a generic ILP problem min {c t x : Ax b, Dx d, x Z n } with integer coefficients. Suppose Dx d are the complicating constraints. Often the linear relaxation and the
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationZeno-free, distributed event-triggered communication and control for multi-agent average consensus
Zeno-free, distributed event-triggered communication and control for multi-agent average consensus Cameron Nowzari Jorge Cortés Abstract This paper studies a distributed event-triggered communication and
More informationAlternating Direction Method of Multipliers. Ryan Tibshirani Convex Optimization
Alternating Direction Method of Multipliers Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last time: dual ascent min x f(x) subject to Ax = b where f is strictly convex and closed. Denote
More information12. Interior-point methods
12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course otes for EE7C (Spring 018): Conve Optimization and Approimation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Ma Simchowitz Email: msimchow+ee7c@berkeley.edu October
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationHomework 3. Convex Optimization /36-725
Homework 3 Convex Optimization 10-725/36-725 Due Friday October 14 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationConvex Optimization and Modeling
Convex Optimization and Modeling Duality Theory and Optimality Conditions 5th lecture, 12.05.2010 Jun.-Prof. Matthias Hein Program of today/next lecture Lagrangian and duality: the Lagrangian the dual
More informationLinear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)
Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training
More informationOn the Linear Convergence of Distributed Optimization over Directed Graphs
1 On the Linear Convergence of Distributed Optimization over Directed Graphs Chenguang Xi, and Usman A. Khan arxiv:1510.0149v1 [math.oc] 7 Oct 015 Abstract This paper develops a fast distributed algorithm,
More informationOn the linear convergence of distributed optimization over directed graphs
1 On the linear convergence of distributed optimization over directed graphs Chenguang Xi, and Usman A. Khan arxiv:1510.0149v4 [math.oc] 7 May 016 Abstract This paper develops a fast distributed algorithm,
More informationFantope Regularization in Metric Learning
Fantope Regularization in Metric Learning CVPR 2014 Marc T. Law (LIP6, UPMC), Nicolas Thome (LIP6 - UPMC Sorbonne Universités), Matthieu Cord (LIP6 - UPMC Sorbonne Universités), Paris, France Introduction
More informationLecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem
Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R
More informationNonlinear Optimization for Optimal Control
Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]
More informationSupport Vector Machines
Support Vector Machines Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/32 Margin Classifiers margin b = 0 Sridhar Mahadevan: CMPSCI 689 p.
More informationA State-Space Approach to Control of Interconnected Systems
A State-Space Approach to Control of Interconnected Systems Part II: General Interconnections Cédric Langbort Center for the Mathematics of Information CALIFORNIA INSTITUTE OF TECHNOLOGY clangbort@ist.caltech.edu
More informationDual Decomposition.
1/34 Dual Decomposition http://bicmr.pku.edu.cn/~wenzw/opt-2017-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/34 1 Conjugate function 2 introduction:
More information