On the Linear Convergence of Distributed Optimization over Directed Graphs


Chenguang Xi, and Usman A. Khan

arXiv: v1 [math.OC] 7 Oct 2015

Abstract—This paper develops a fast distributed algorithm, termed DEXTRA, to solve the optimization problem in which N agents reach agreement and collaboratively minimize the sum of their local objective functions over the network, where the communication between agents is described by a directed graph that is uniformly strongly connected. Existing algorithms, including Gradient-Push (GP) and Directed-Distributed Gradient Descent (D-DGD), solve the same problem restricted to directed graphs with a convergence rate of O(ln k/√k), where k is the number of iterations. Our analysis shows that DEXTRA converges at a linear rate O(ɛ^k) for some constant ɛ < 1, under the assumption that the objective functions are strongly convex. The implementation of DEXTRA requires each agent to know only its local out-degree. Simulation examples illustrate our findings.

Index Terms—Distributed optimization; multi-agent networks; directed graphs.

I. INTRODUCTION

Distributed computation and optimization have gained great interest due to their wide applications in, e.g., large-scale machine learning, [1, 2], model predictive control, [3], cognitive networks, [4, 5], source localization, [6, 7], resource scheduling, [8], and message routing, [9]. All of the above applications can be reduced to variations of distributed optimization problems over a network of nodes when knowledge of the objective function is scattered across the network. In particular, we consider the problem of minimizing a sum of objectives, Σ_{i=1}^n f_i(x), where f_i : R^p → R is a private objective function at the ith agent of the network.

C. Xi and U. A. Khan are with the Department of Electrical and Computer Engineering, Tufts University, 161 College Ave, Medford, MA 02155; chenguang.xi@tufts.edu, khan@ece.tufts.edu. This work has been partially supported by an NSF Career Award # CCF

There have been many types of algorithms to solve the problem in a distributed manner. The most common alternatives are Distributed Gradient Descent (DGD), [10, 11], Distributed Dual Averaging (DDA), [12], and the distributed implementations of the Alternating Direction Method of Multipliers (ADMM), [13–15]. DGD and DDA can be reduced to gradient-based methods, where at each iteration a gradient-related step is calculated, followed by averaging with neighbors in the network. The main advantage of these methods is computational simplicity. However, they suffer from a slower convergence rate due to the slowly diminishing stepsize that is necessary to achieve exact convergence. The convergence rate of DGD and DDA, with a diminishing stepsize, is shown to be O(ln k/√k), [10]. When given a constant stepsize, the algorithms accelerate to O(1/k), at the cost of inexact convergence to a neighborhood of the optimal solution, [11]. To overcome these difficulties, some alternatives include Nesterov-based methods, e.g., the Distributed Nesterov Gradient (DNG), [16], whose convergence rate is O(log k/k) with a diminishing stepsize, and the Distributed Nesterov gradient with Consensus iterations (DNC), [16], with a faster convergence rate of O(1/k²) at the cost of k consensus iterations at the kth iteration. Some methods use second-order information, such as the Network Newton (NN) method, [17], with a convergence rate that is at least linear. Both DNC and NN need multiple communication steps in one iteration. The distributed implementation of ADMM is based on augmented Lagrangians, where at each iteration the primal and dual variables are updated to minimize a Lagrangian-related function. This type of method converges exactly to the optimal point at a faster rate of O(1/k) with a constant stepsize, and further achieves a linear convergence rate under the assumption of strongly convex objective functions. However, the disadvantage is the high computation burden.
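To make the consensus-plus-gradient structure of DGD concrete, the following is a minimal sketch on a hypothetical toy instance (scalar quadratic objectives on a 3-agent complete graph, doubly stochastic W, diminishing stepsize α_k = 1/√k); it is our own illustration, not code from the cited papers, and it also illustrates the slow diminishing-stepsize convergence discussed above.

```python
import numpy as np

# Toy DGD instance: f_i(x) = (x - b_i)^2 / 2 on a complete 3-agent graph.
# W is symmetric and doubly stochastic; alpha_k = 1/sqrt(k) is diminishing.
W = np.full((3, 3), 1.0 / 3.0)        # doubly stochastic weighting matrix
b = np.array([0.0, 1.0, 5.0])         # global optimum is mean(b) = 2.0
x = np.zeros(3)                       # one scalar estimate per agent
for k in range(1, 20001):
    grad = x - b                      # each agent's local gradient
    x = W @ x - grad / np.sqrt(k)     # consensus step + local gradient step
print(x)  # slowly approaches 2.0 in every coordinate
```

With a constant stepsize the iterates would converge faster but only to a neighborhood of the optimum, matching the O(1/k) inexact behavior described above.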
Each agent needs to solve an optimization problem at each iteration. To resolve this issue, the Decentralized Linearized ADMM (DLM), [18], and EXTRA, [19], have been proposed, which can be considered as first-order approximations of decentralized ADMM. DLM and EXTRA have been proven to converge at a linear rate if the local objective functions are strongly convex. All of these distributed algorithms, [10–19], assume the multi-agent network to be an undirected graph. In contrast, research successfully solving the problem restricted to directed graphs is limited. We summarize the papers considering directed graphs here. There are two types of alternatives, both gradient-based algorithms, solving the problem restricted to directed graphs. The first type of algorithm, called the Gradient-Push (GP) method, [20, 21], combines gradient

descent and push-sum consensus. The push-sum algorithm, [22, 23], was first proposed for consensus problems¹ to achieve average-consensus given a column-stochastic matrix. The idea is based on computing the stationary distribution of the Markov chain characterized by the multi-agent network and canceling the imbalance by dividing by the left eigenvector. GP follows a similar spirit to push-sum consensus and can be regarded as a perturbed push-sum protocol converging to the optimal solution. Directed-Distributed Gradient Descent (D-DGD), [28, 29], is the other gradient-based alternative to solve the problem. The main idea follows Cai's paper, [30], where a new non-doubly-stochastic matrix is constructed to reach average-consensus. In each iteration, D-DGD simultaneously constructs a row-stochastic matrix and a column-stochastic matrix instead of only a doubly-stochastic matrix. The convergence of the new weight matrix, depending on the row-stochastic and column-stochastic matrices, ensures that agents reach both consensus and optimality. Since both GP and D-DGD are gradient-based algorithms, they suffer from a slow convergence rate of O(ln k/√k) due to the diminishing stepsize. We summarize the existing first-order distributed algorithms for minimizing the sum of agents' local objective functions, and compare them in terms of speed, in Table I, covering both undirected and directed graphs. In Table I, I means that DGD with a constant stepsize is an Inexact method; M indicates that DNC needs to perform Multiple (k) consensus steps at the kth iteration, which is impractical in real situations as k goes to infinity; and C indicates that DADMM has a much higher Computation burden compared to other first-order methods. In this paper, we propose a fast distributed algorithm, termed DEXTRA, to solve the problem over directed graphs by combining the push-sum protocol and EXTRA.
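The push-sum mechanism described above — a column-stochastic mixing matrix plus a scalar sequence that cancels the imbalance — can be sketched in a few lines. This is our own toy numerical example (hypothetical 4-node digraph), not the authors' code:

```python
import numpy as np

# Minimal push-sum average-consensus sketch. A is column-stochastic: agent j
# splits its mass equally among its out-neighbors, so only the out-degree of
# each agent is needed.
n = 4
# adj[i, j] = 1 if agent j can send to agent i (self-loops included);
# the digraph is strongly connected but not weight-balanced.
adj = np.array([[1, 0, 0, 1],
                [1, 1, 0, 0],
                [0, 1, 1, 0],
                [0, 1, 1, 1]], dtype=float)
A = adj / adj.sum(axis=0)             # column-stochastic: a_ij = 1/|N_j^out|

x = np.array([1.0, 2.0, 3.0, 4.0])    # private values; goal: their average
y = np.ones(n)                        # scaling sequence, y^0 = 1_n
for _ in range(200):
    x = A @ x                         # numerator update
    y = A @ y                         # denominator update cancels imbalance
z = x / y                             # each z_i -> average of the initial x

print(z)  # all entries close to 2.5
```

Without the division by y, the iterates x alone would converge to π·(Σ_i x_i^0), i.e., an average skewed by the left-over imbalance of the directed graph — exactly the issue the push-sum correction removes.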
The push-sum protocol has proven to be a good technique for dealing with optimization over digraphs, [20, 21]. EXTRA works well in optimization problems over undirected graphs with a fast convergence rate and a low computation complexity. By applying the push-sum technique to EXTRA, we show that DEXTRA converges exactly to the optimal solution at a linear rate, O(ɛ^k), even when the underlying network is directed. The fast convergence rate is guaranteed because we apply a constant stepsize in DEXTRA, in contrast to the diminishing stepsize used in GP or D-DGD. Currently, our formulation is restricted to strongly convex functions. We claim that although DEXTRA is a combination of push-sum and EXTRA, the proof is not trivial by simply following

¹See, [24–27], for additional information on average consensus problems.

Algorithms      General Convex    Strongly Convex
DGD (α_k)       O(ln k/√k)        O(ln k/k)
DDA (α_k)       O(ln k/√k)        O(ln k/k)
DGD (α)         O(1/k) (I)
DNG (α_k)       O(log k/k)
DNC (α)         O(1/k²) (M)
DADMM (α)                         O(ɛ^k) (C)
DLM (α)                           O(ɛ^k)
EXTRA (α)       O(1/k)            O(ɛ^k)
GP (α_k)        O(ln k/√k)        O(ln k/k)
D-DGD (α_k)     O(ln k/√k)        O(ln k/k)
DEXTRA (α)                        O(ɛ^k)

TABLE I: Convergence rates of first-order distributed optimization algorithms over undirected and directed graphs.

the proof of either algorithm. As the communication link is more restricted compared with the undirected-graph case, many properties used in the proof of EXTRA no longer work for DEXTRA; e.g., the proof of EXTRA requires the weighting matrix to be symmetric, which never happens in directed graphs. The remainder of the paper is organized as follows. Section II describes, develops, and interprets the DEXTRA algorithm. Section III presents the appropriate assumptions and states the main convergence results. In Section IV, we show the proofs of some key lemmas, which will be used in the subsequent proofs of the convergence rate of DEXTRA. The main proof of the convergence rate of DEXTRA is provided in Section V. We show numerical results in Section VI, and Section VII contains the concluding remarks.

Notation: We use lowercase bold letters to denote vectors and uppercase italic letters to denote matrices. We denote by [x]_i the ith component of a vector x. For a matrix A, we denote by [A]_i the ith row of the matrix, and by [A]_ij the (i, j)th element of a matrix, A. The matrix, I_n,

represents the n × n identity, and 1_n (0_n) is the n-dimensional vector of all 1's (0's). The inner product of two vectors x and y is ⟨x, y⟩. We define the A-matrix norm, ‖x‖²_A, of any vector x as ‖x‖²_A ≜ ⟨x, Ax⟩ = ⟨x, Aᵀx⟩ = ⟨x, ((A + Aᵀ)/2)x⟩ for A + Aᵀ being positive semi-definite. If a symmetric matrix A is positive semi-definite, we write A ⪰ 0, while A ≻ 0 means A is positive definite. The largest and smallest eigenvalues of a matrix A are denoted λ_max(A) and λ_min(A). The smallest nonzero eigenvalue of a matrix A is denoted λ̃_min(A). For any objective function f(x), we denote by ∇f(x) the gradient of f at x.

II. DEXTRA DEVELOPMENT

In this section, we formulate the optimization problem and describe the DEXTRA algorithm. We give an informal but intuitive argument for DEXTRA, showing that the algorithm pushes agents to achieve consensus and reach the optimal solution. The EXTRA algorithm, [19], is briefly recapitulated in this section. We derive DEXTRA in a similar form as EXTRA such that our algorithm can be viewed as an improvement of EXTRA suited to the case of directed graphs. This reveals the meaning of DEXTRA: Directed EXTRA. Formal convergence results and proofs are left to Sections III, IV, and V.

Consider a strongly connected network of n agents communicating over a directed graph, G = (V, E), where V is the set of agents, and E is the collection of ordered pairs, (i, j), i, j ∈ V, such that agent j can send information to agent i. Define N_i^in to be the collection of in-neighbors, i.e., the set of agents that can send information to agent i. Similarly, N_i^out is defined as the out-neighborhood of agent i, i.e., the set of agents that can receive information from agent i. We allow both N_i^in and N_i^out to include the node i itself. Note that in a directed graph, (i, j) ∈ E does not imply (j, i) ∈ E. Consequently, N_i^in ≠ N_i^out, in general. We focus on solving a convex optimization problem that is distributed over the above multi-agent network.
In particular, the network of agents cooperatively solves the following optimization problem:

P1 :  min f(x) = Σ_{i=1}^n f_i(x),

where each f_i : R^p → R is convex, not necessarily differentiable, representing the local objective function at agent i.

A. EXTRA for undirected graphs

EXTRA is a fast exact first-order algorithm that solves Problem P1 in the case where the communication network is described by an undirected graph. At the kth iteration, each agent i performs the following update:

x_i^{k+1} = x_i^k + Σ_{j ∈ N_i} w_ij x_j^k − Σ_{j ∈ N_i} w̃_ij x_j^{k−1} − α[∇f_i(x_i^k) − ∇f_i(x_i^{k−1})],   (1)

where the weights, w_ij, form a weighting matrix, W = {w_ij}, that is symmetric and doubly stochastic, and the collection W̃ = {w̃_ij} satisfies W̃ = θI_n + (1 − θ)W with some θ ∈ (0, 1/2]. The update in Eq. (1) ensures that x_i^k converges to the optimal solution of the problem for all i with convergence rate O(1/k), and converges linearly when the objective function is further strongly convex. To better represent EXTRA and later compare it with the DEXTRA algorithm, we write Eq. (1) in matrix form. Let x^k, ∇f(x^k) ∈ R^{np} be the collections of all agents' states and gradients at time k, i.e., x^k ≜ [x_1^k; ...; x_n^k], ∇f(x^k) ≜ [∇f_1(x_1^k); ...; ∇f_n(x_n^k)], and let W, W̃ ∈ R^{n×n} be the weighting matrices collecting the weights w_ij, w̃_ij, respectively. Then Eq. (1) can be represented in matrix form as:

x^{k+1} = [(I_n + W) ⊗ I_p] x^k − (W̃ ⊗ I_p) x^{k−1} − α[∇f(x^k) − ∇f(x^{k−1})],   (2)

where the symbol ⊗ is the Kronecker product. We now state DEXTRA and derive it in a similar form as EXTRA.

B. DEXTRA Algorithm

To solve Problem P1 in the case of directed graphs, we state the DEXTRA algorithm, implemented in a distributed manner, as follows. Each agent, j ∈ V, maintains two vector variables, x_j^k, z_j^k ∈ R^p, as well as a scalar variable, y_j^k ∈ R, where k is the discrete-time index. At the kth iteration, agent j weights its states, a_ij x_j^k, a_ij y_j^k, and ã_ij x_j^{k−1}, and sends these to each out-neighbor, i ∈ N_j^out, where the weights, a_ij and ã_ij, are such that:

a_ij = { > 0, i ∈ N_j^out;  0, otherwise },  with Σ_{i=1}^n a_ij = 1, ∀j,

ã_ij = { θ + (1 − θ)a_ij, i = j;  (1 − θ)a_ij, i ≠ j },

where θ ∈ (0, 1/2]. With agent i receiving the information from its in-neighbors, j ∈ N_i^in, it calculates the state, z_i^k, by dividing x_i^k over y_i^k, and updates the variables x_i^{k+1} and y_i^{k+1} as follows:

z_i^k = x_i^k / y_i^k,   (3a)
x_i^{k+1} = x_i^k + Σ_{j ∈ N_i^in} (a_ij x_j^k) − Σ_{j ∈ N_i^in} (ã_ij x_j^{k−1}) − α[∇f_i(z_i^k) − ∇f_i(z_i^{k−1})],   (3b)
y_i^{k+1} = Σ_{j ∈ N_i^in} (a_ij y_j^k),   (3c)

where ∇f_i(z_i^k) is the gradient of the function f_i(z) at z = z_i^k, and ∇f_i(z_i^{k−1}) is the gradient at z_i^{k−1}, respectively. The method is initiated with an arbitrary vector, x_i^0, and with y_i^0 = 1 for any agent i. The stepsize, α, is a positive number within a certain interval. We will explicitly discuss the range of α in Section III. At the first iteration, i.e., k = 0, we have the following iteration instead of Eq. (3):

z_i^0 = x_i^0 / y_i^0,
x_i^1 = Σ_{j ∈ N_i^in} (a_ij x_j^0) − α∇f_i(z_i^0),   (4)
y_i^1 = Σ_{j ∈ N_i^in} (a_ij y_j^0).

We note that the implementation of Eq. (3) needs each agent to have knowledge of its out-neighbors (such that it can design the weights). In a more restricted setting, e.g., a broadcast application where it may not be possible to know the out-neighbors, we may use a_ij = |N_j^out|^{−1}; thus, the implementation only requires each agent knowing its out-degree; see, e.g., [20, 28] for similar assumptions. To simplify the proof, we write DEXTRA, Eq. (3), in matrix form. Let A = {a_ij} ∈ R^{n×n} and Ã = {ã_ij} ∈ R^{n×n} be the collections of the weights, a_ij, ã_ij, respectively; let x^k, z^k, ∇f(x^k) ∈ R^{np} be the collections of all agents' states and gradients at time k, i.e., x^k ≜ [x_1^k; ...; x_n^k], z^k ≜ [z_1^k; ...; z_n^k], ∇f(x^k) ≜ [∇f_1(x_1^k); ...; ∇f_n(x_n^k)]; and let y^k ∈ R^n be the collection of agent states, y_i^k, i.e., y^k ≜ [y_1^k; ...; y_n^k]. Note that at time k, y^k can be represented by the initial value, y^0:

y^k = Ay^{k−1} = A^k y^0 = A^k 1_n.   (5)
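Equation (5) says that y^k = A^k 1_n, which converges to n·π, where π is the right eigenvector of the column-stochastic A for eigenvalue 1 (normalized so its entries sum to 1). A quick numerical check, using a hypothetical random column-stochastic matrix of our own:

```python
import numpy as np

# Numerical check of Eq. (5): y^k = A^k 1_n converges to n*pi, where pi is
# the Perron (right) eigenvector of the column-stochastic A for eigenvalue 1.
rng = np.random.default_rng(2)
n = 5
A = rng.random((n, n))
A /= A.sum(axis=0)                    # make A column-stochastic

y = np.ones(n)                        # y^0 = 1_n
for _ in range(500):
    y = A @ y                         # Eq. (3c)/(7c)

evals, evecs = np.linalg.eig(A)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()                        # normalize so the entries sum to 1
print(np.max(np.abs(y - n * pi)))     # close to 0
```

This limit is exactly the diagonal of the matrix D^∞ defined in Eq. (16), i.e., the scaling that the division z_i^k = x_i^k / y_i^k uses to cancel the directed graph's imbalance.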

Define the diagonal matrix, D^k ∈ R^{n×n}, for each k, such that the ith diagonal element of D^k is y_i^k, i.e.,

D^k = diag(A^k 1_n).   (6)

Then Eq. (3) can be written in matrix form equivalently as follows:

z^k = ([D^k]^{−1} ⊗ I_p) x^k,   (7a)
x^{k+1} = x^k + (A ⊗ I_p) x^k − (Ã ⊗ I_p) x^{k−1} − α[∇f(z^k) − ∇f(z^{k−1})],   (7b)
y^{k+1} = Ay^k,   (7c)

where both weight matrices, A and Ã, are column-stochastic and satisfy the relationship Ã = θI_n + (1 − θ)A with some θ ∈ (0, 1/2], and when k = 0, Eq. (7b) changes to x^1 = (A ⊗ I_p) x^0 − α∇f(z^0). Considering Eq. (7a), we obtain

x^k = (D^k ⊗ I_p) z^k, ∀k.   (8)

Therefore, it follows that DEXTRA can be represented in one single equation:

(D^{k+1} ⊗ I_p) z^{k+1} = [(I_n + A)D^k ⊗ I_p] z^k − (ÃD^{k−1} ⊗ I_p) z^{k−1} − α[∇f(z^k) − ∇f(z^{k−1})].   (9)

We refer to our algorithm as DEXTRA, since Eq. (9) has a similar form as the EXTRA algorithm, Eq. (2), and can be used to solve Problem P1 in the case of directed graphs. We state our main result in Section III, showing that as time goes to infinity, the iteration in Eq. (9) pushes z^k to achieve consensus and reach the optimal solution. Our proof in this paper will be based on the form, Eq. (9), of DEXTRA.
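The recursion (7) can be sketched directly. The following is a minimal numerical illustration with p = 1, hypothetical scalar quadratic objectives f_i(z) = (z − b_i)²/2, and out-degree weights on a small digraph of our own choosing — a sketch of the update rule, not the authors' code:

```python
import numpy as np

# Sketch of the DEXTRA recursion, Eq. (7), with p = 1 and quadratic
# objectives f_i(z) = (z - b_i)^2 / 2 (hypothetical example data).
n, theta, alpha = 4, 0.5, 0.1
adj = np.array([[1, 0, 0, 1],         # adj[i, j] = 1 if j can send to i
                [1, 1, 0, 0],         # (self-loops included; strongly
                [0, 1, 1, 0],         #  connected, non-balanced digraph)
                [0, 1, 1, 1]], dtype=float)
A = adj / adj.sum(axis=0)             # column-stochastic, a_ij = 1/|N_j^out|
At = theta * np.eye(n) + (1 - theta) * A
b = np.array([0.0, 1.0, 2.0, 5.0])
grad = lambda z: z - b                # stacked local gradients

x_prev = np.zeros(n)                  # x^0 (arbitrary)
y = np.ones(n)                        # y^0 = 1_n
z_prev = x_prev / y                   # z^0, Eq. (7a)
x = A @ x_prev - alpha * grad(z_prev) # special first step, k = 0
y = A @ y
for _ in range(2000):
    z = x / y                                        # Eq. (7a)
    x_next = x + A @ x - At @ x_prev \
             - alpha * (grad(z) - grad(z_prev))      # Eq. (7b)
    y = A @ y                                        # Eq. (7c)
    x_prev, x, z_prev = x, x_next, z
print(z_prev)  # for a stepsize in the admissible interval of Theorem 1,
               # the z_i agree and approach the minimizer of sum_i f_i
```

Note that the constant α = 0.1 here is a guess; as Section III makes precise, DEXTRA converges only for α inside an interval [α_min, α_max] that depends on the network and the objective functions.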

to their own limit points, z^∞ and x^∞, respectively (which might not be true). According to the updating rule in Eq. (7b), the limit point x^∞ satisfies

x^∞ = x^∞ + (A ⊗ I_p)x^∞ − (Ã ⊗ I_p)x^∞ − α[∇f(z^∞) − ∇f(z^∞)],   (10)

which implies that [(A − Ã) ⊗ I_p]x^∞ = 0_np, or x^∞ = π ⊗ v for some vector v ∈ R^p. Therefore, it follows from Eq. (7a) that

z^∞ = ([D^∞]^{−1} ⊗ I_p)(π ⊗ v) = (1/n)(1_n ⊗ v),   (11)

where the consensus is reached. The above analysis reveals the idea of DEXTRA for overcoming the imbalance of agent states that occurs when the graph is directed: both x^∞ and y^∞ lie in the span of π. By dividing x^k over y^k, the imbalance is canceled. By summing the updates in Eq. (7b) over k, we obtain

x^{k+1} = (A ⊗ I_p)x^k − α∇f(z^k) − [(Ã − A) ⊗ I_p] Σ_{r=0}^{k−1} x^r.

Considering that x^∞ = π ⊗ v and the preceding relation, it follows that the limit point z^∞ satisfies

α∇f(z^∞) = −[(Ã − A) ⊗ I_p] Σ_{r=0}^{∞} x^r.   (12)

Therefore, we obtain

α(1_nᵀ ⊗ I_p)∇f(z^∞) = −[(1_nᵀ(Ã − A)) ⊗ I_p] Σ_{r=0}^{∞} x^r = 0_p,

which is the optimality condition of Problem P1, since 1_nᵀ(Ã − A) = 0 for column-stochastic A and Ã. Therefore, given the assumption that the sequences of DEXTRA iterates, {z^k} and {x^k}, have limit points, z^∞ and x^∞, the limit point, z^∞, achieves consensus and reaches the optimal solution of Problem P1. In the next section, we state the main result of this paper, and we leave the formal proof to Sections IV and V.

III. ASSUMPTIONS AND MAIN RESULTS

With appropriate assumptions, our main result states that DEXTRA converges to the optimal solution of Problem P1 linearly. In this paper, we assume that the agent graph, G, is strongly connected; each local function, f_i : R^p → R, is convex and differentiable, ∀i ∈ V; and the optimal solution of Problem P1 and the corresponding optimal value exist. Formally, we denote the optimal solution by φ ∈ R^p and the optimal value by f*, i.e.,

f(φ) = f* = min_{x ∈ R^p} f(x).   (13)

Besides the above assumptions, we emphasize some other assumptions regarding the objective functions and the weighting matrices, which are formally presented as follows.

Assumption A1 (Functions and Gradients). For every agent i, each function, f_i, is differentiable and satisfies the following assumptions.

(a) The function, f_i, has Lipschitz gradient with constant L_{f_i}, i.e., ‖∇f_i(x) − ∇f_i(y)‖ ≤ L_{f_i}‖x − y‖, ∀x, y ∈ R^p.
(b) The gradient, ∇f_i(x), is bounded, i.e., ‖∇f_i(x)‖ ≤ B_{f_i}, ∀x ∈ R^p.
(c) The function, f_i, is strongly convex with constant S_{f_i}, i.e., S_{f_i}‖x − y‖² ≤ ⟨∇f_i(x) − ∇f_i(y), x − y⟩, ∀x, y ∈ R^p.

Following Assumption A1, we have for any x, y ∈ R^{np},

‖∇f(x) − ∇f(y)‖ ≤ L_f‖x − y‖,   (14a)
‖∇f(x)‖ ≤ B_f,   (14b)
S_f‖x − y‖² ≤ ⟨∇f(x) − ∇f(y), x − y⟩,   (14c)

where the constants are B_f = max_i{B_{f_i}}, L_f = max_i{L_{f_i}}, and S_f = min_i{S_{f_i}}. By combining Eqs. (14b) and (14c), we obtain

‖x − y‖ ≤ (1/S_f)‖∇f(x) − ∇f(y)‖ ≤ 2B_f/S_f,

which gives

‖x‖ ≤ 2B_f/S_f,   (15)

by letting y = 0_np. Eq. (15) reveals that the sequences updated by DEXTRA are bounded. Recalling the definition of D^k in Eq. (6), we denote the limit of D^k by D^∞, i.e.,

D^∞ = lim_{k→∞} D^k = diag(A^∞ 1_n) = n · diag(π),   (16)

where π is the normalized right eigenvector of A corresponding to eigenvalue 1. The next assumption relates to the weighting matrices, A and Ã, and D^∞.

Assumption A2 (Weighting matrices). The weighting matrices, A and Ã, used in DEXTRA, Eq. (7) or (9), satisfy the following.

(a) A is a column-stochastic matrix.
(b) Ã is a column-stochastic matrix and satisfies Ã = θI_n + (1 − θ)A, for some θ ∈ (0, 1/2].

(c) (D^∞)^{−1}Ã + Ãᵀ(D^∞)^{−1} ≻ 0.

To ensure (D^∞)^{−1}Ã + Ãᵀ(D^∞)^{−1} ≻ 0, agent i can assign a large value to its own weight, a_ii, in the implementation of DEXTRA (such that the weighting matrices, A and Ã, are more like diagonally-dominant matrices). Then all eigenvalues of (D^∞)^{−1}Ã + Ãᵀ(D^∞)^{−1} can be positive. Before we present the main result of this paper, we define some auxiliary sequences. Let z^∞ ∈ R^{np} be the collection of the optimal solution, φ, of Problem P1, i.e.,

z^∞ = 1_n ⊗ φ,   (17)

and let q^∞ ∈ R^{np} be some vector satisfying

[(Ã − A) ⊗ I_p] q^∞ + α∇f(z^∞) = 0_np.   (18)

Introduce the auxiliary sequence

q^k = Σ_{r=0}^{k} x^r,   (19)

and, for each k,

t^k = [ (D^k ⊗ I_p) z^k ; q^k ],  t^∞ = [ (D^∞ ⊗ I_p) z^∞ ; q^∞ ],  G = [ Ãᵀ(D^∞)^{−1}  0 ; 0  (D^∞)^{−1}(Ã − A) ] ⊗ I_p.   (20)

We are ready to state the main result, shown in Theorem 1, which makes explicit the rate at which the objective function converges to its optimal value.

Theorem 1. Let Assumptions A1, A2 hold. Denote P = I_n − A, L = Ã − A, N = (D^∞)^{−1}(Ã − A), M = (D^∞)^{−1}Ã, d_max = max_k{‖D^k‖}, d⁻_max = max_k{‖(D^k)^{−1}‖}, and d∞ = ‖(D^∞)^{−1}‖. Define

ρ₁ = 4(1 − 2θ)²λ_max(PᵀP) + 8(ᾱL_f d⁻_max)²,  ρ₂ = 4λ_max(ÃᵀÃ) + 8(ᾱL_f d⁻_max)²,

where ᾱ is some constant regarded as a ceiling of α. Then, with a proper stepsize α ∈ [α_min, α_max], there exist δ > 0, Γ < ∞, and γ ∈ (0, 1), such that the sequence {t^k} generated by DEXTRA satisfies

‖t^k − t^∞‖²_G ≥ (1 + δ)‖t^{k+1} − t^∞‖²_G − Γγ^k.   (21)

The lower bound, α_min, of α is

α_min = (1/σ + σλ_max(NNᵀ)ρ₁/λ̃_min(LᵀL)) / (S_f/d_max² − η),

and the upper bound, α_max, of α is

α_max = (λ_min(M + Mᵀ)/2 − σλ_max(MMᵀ) − σλ_max(NNᵀ)ρ₂/λ̃_min(LᵀL)) η / (2L_f²(d∞ d⁻_max)²),

where η and σ are arbitrary constants chosen to ensure that α_min > 0 and α_max > 0, i.e.,

η < S_f/d_max²,  σ < λ_min(M + Mᵀ) / [2(λ_max(MMᵀ) + λ_max(NNᵀ)ρ₂/λ̃_min(LᵀL))].

We will show the proof of Theorem 1 in Section V, with the help of Lemmas 1, 3, and 4 in Section IV. Note that the result of Theorem 1, Eq. (21), does not by itself imply the convergence of DEXTRA to the optimal solution. To prove t^k → t^∞, we further need to show that the matrix G satisfies ‖a‖²_G ≥ 0, ∀a ∈ R^{2np}. With Assumption A2(c), we already have ‖a‖²_{Ãᵀ(D^∞)^{−1}} ≥ 0, ∀a ∈ R^n. It is left to show that ‖a‖²_{(D^∞)^{−1}(Ã−A)} ≥ 0, ∀a ∈ R^n, which will be presented in Section IV. Assume for now that we have shown ‖a‖²_{(D^∞)^{−1}(Ã−A)} ≥ 0, ∀a ∈ R^n. Since γ ∈ (0, 1), we can then say that ‖t^k − t^∞‖²_G converges to 0 at the R-linear rate O((1 + δ̃)^{−k}) for some δ̃ ∈ (0, δ). Consequently, ‖z^k − z^∞‖²_{Ãᵀ(D^∞)^{−1}} converges to 0 at the R-linear rate O((1 + δ̃)^{−k}) for some δ̃ ∈ (0, δ). Again with Assumption A2(c), it follows that z^k → z^∞ at an R-linear rate. In Theorem 1, we have given the specific values of α_min and α_max. For the sake of simplicity, we represent α_min = ᾱ^t_min/(S_f/d_max² − η) and α_max = ᾱ^t_max η/(2L_f²(d∞ d⁻_max)²), where ᾱ^t_min and ᾱ^t_max collect the remaining constants. To ensure α_min < α_max, the objective function needs to have the strong convexity constant satisfying

S_f > d_max² [η + 2ᾱ^t_min L_f²(d∞ d⁻_max)²/(η ᾱ^t_max)].   (22)

It follows from Eq. (22) that in order to achieve a linear convergence rate for the DEXTRA algorithm, the objective function should not only be strongly convex, but its constant should also be above some threshold. It is beyond the scope of this paper how to design the weighting matrix, A, which influences the eigenvalues in the representation of α_min and α_max, such that the threshold on S_f can be close to zero.
In real situations, numerical experiments illustrate that, in practice, S_f can be very small, i.e., DEXTRA has all agents converge to the optimal solution at an R-linear rate for almost all strongly convex functions. Note that the values of α_min and α_max require knowledge of the network topology, which is not available in a distributed manner. It is an open question how to avoid the global knowledge

of network topology when designing the value of α. In numerical experiments, however, the range of admissible α is much wider than the theoretical interval.

IV. BASIC RELATIONS

We will present the complete proof of Theorem 1 in Section V, which will rely on three lemmas stated in this section. From now on, we assume that the sequences updated by DEXTRA have only one dimension, i.e., z_i^k, x_i^k ∈ R, ∀i, k. For x_i^k, z_i^k ∈ R^p being p-dimensional vectors, the proof is the same for every dimension by applying the results to each coordinate. Thus, assuming p = 1 is without loss of generality. The advantage is that with p = 1 we get rid of the Kronecker product in the equations, which makes them easier to follow. Let p = 1, and rewrite DEXTRA, Eq. (9), as

D^{k+1}z^{k+1} = (I_n + A)D^k z^k − ÃD^{k−1}z^{k−1} − α[∇f(z^k) − ∇f(z^{k−1})].   (23)

We begin by introducing the three main lemmas of this section. Recall the definitions of q^k, q^∞ in Eqs. (19) and (18), and of D^k, D^∞ in Eqs. (6) and (16). Lemma 1 establishes the relations among D^k z^k, q^k, D^∞z^∞, and q^∞.

Lemma 1. Let Assumptions A1, A2 hold. In DEXTRA, the quadruple sequence {D^k z^k, q^k, D^∞z^∞, q^∞} obeys, for any k = 0, 1, ...,

(I_n + A − 2Ã)(D^{k+1}z^{k+1} − D^∞z^∞) + Ã(D^{k+1}z^{k+1} − D^k z^k) = −(Ã − A)(q^{k+1} − q^∞) − α[∇f(z^k) − ∇f(z^∞)].   (24)

Proof. We sum DEXTRA, Eq. (23), over time from 0 to k:

D^{k+1}z^{k+1} = ÃD^k z^k − α∇f(z^k) − (Ã − A) Σ_{r=0}^{k} D^r z^r.

By subtracting (Ã − A)D^{k+1}z^{k+1} on both sides of the preceding equation, and rearranging the terms, it follows that

(I_n + A − 2Ã)D^{k+1}z^{k+1} + Ã(D^{k+1}z^{k+1} − D^k z^k) = −(Ã − A)q^{k+1} − α∇f(z^k).   (25)

Since (I_n + A − 2Ã)π = 0_n, where π is the right eigenvector of A corresponding to eigenvalue 1, we have

(I_n + A − 2Ã)D^∞z^∞ = 0_n.   (26)

By combining Eqs. (25) and (26), and noting that (Ã − A)q^∞ + α∇f(z^∞) = 0_n, Eq. (18), we obtain the desired result. ∎

Before presenting Lemma 3, we introduce an existing result on the special structure of column-stochastic matrices, which helps our proof considerably. Specifically, we have the following property of the weighting matrices, A and Ã, used in DEXTRA.

Lemma 2. (Nedic et al. [20]) For any column-stochastic matrix A ∈ R^{n×n}, we have:

(a) The limit lim_{k→∞}[A^k] exists and lim_{k→∞}[A^k]_{ij} = π_i, where π is the right eigenvector of A corresponding to eigenvalue 1.
(b) For all i ∈ {1, ..., n}, the entries [A^k]_{ij} and π_i satisfy |[A^k]_{ij} − π_i| < Cλ^k, ∀j, where we can have C = 4 and λ = (1 − 1/n^n).

The convergence properties of the weighting matrix, A, help to build the relation between ‖D^{k+1}z^{k+1} − D^∞z^∞‖ and ‖z^{k+1} − z^∞‖. There are also similar relations between ‖D^{k+1}z^{k+1} − D^k z^k‖ and ‖z^{k+1} − z^k‖.

Lemma 3. Let Assumptions A1, A2 hold. Denote d_max = max_k{‖D^k‖} and d⁻_max = max_k{‖(D^k)^{−1}‖}. The sequences ‖D^{k+1}z^{k+1} − D^∞z^∞‖, ‖z^{k+1} − z^∞‖, ‖D^{k+1}z^{k+1} − D^k z^k‖, and ‖z^{k+1} − z^k‖, generated by DEXTRA, satisfy the following relations.

(a) ‖z^{k+1} − z^∞‖ ≤ d⁻_max‖D^{k+1}z^{k+1} − D^∞z^∞‖ + (4d⁻_max nCB_f/S_f)λ^k.
(b) ‖z^{k+1} − z^k‖ ≤ d⁻_max‖D^{k+1}z^{k+1} − D^k z^k‖ + (4d⁻_max nCB_f/S_f)λ^k.
(c) ‖D^{k+1}z^{k+1} − D^k z^k‖ ≤ d_max‖z^{k+1} − z^k‖ + (2nCB_f/S_f)λ^k.

Proof. (a)

‖z^{k+1} − z^∞‖ = ‖(D^{k+1})^{−1}(D^{k+1}z^{k+1}) − z^∞‖
≤ ‖(D^{k+1})^{−1}‖(‖D^{k+1}z^{k+1} − D^∞z^∞‖ + ‖D^∞z^∞ − D^{k+1}z^∞‖)

≤ d⁻_max‖D^{k+1}z^{k+1} − D^∞z^∞‖ + d⁻_max‖D^∞ − D^{k+1}‖‖z^∞‖
≤ d⁻_max‖D^{k+1}z^{k+1} − D^∞z^∞‖ + (4d⁻_max nCB_f/S_f)λ^k,

where the last inequality follows by applying Lemma 2 and the boundedness of the sequences updated by DEXTRA, Eq. (15). Similarly, we can prove the result of part (b). The proof of part (c) follows:

‖D^{k+1}z^{k+1} − D^k z^k‖ = ‖D^{k+1}z^{k+1} − D^{k+1}z^k + D^{k+1}z^k − D^k z^k‖
≤ d_max‖z^{k+1} − z^k‖ + ‖D^{k+1} − D^k‖‖z^k‖
≤ d_max‖z^{k+1} − z^k‖ + (2nCB_f/S_f)λ^k. ∎

Lemmas 1 and 3 will be used in the proof of Theorem 1 in Section V. We now analyze the property of the matrix (D^∞)^{−1}(Ã − A). As we have stated in Section III, the result of Theorem 1 by itself cannot show that the sequence, {z^k}, generated by DEXTRA converges to the optimal solution, z^∞, i.e., z^k → z^∞. To prove this, we still need to show that ‖a‖²_{(D^∞)^{−1}(Ã−A)} ≥ 0, ∀a ∈ R^n. This property can be proved by directly applying the following lemma.

Lemma 4. (Chung [31]) The Laplacian, L, of a directed graph, G, is defined by

L(G) = I_n − (S^{1/2}RS^{−1/2} + S^{−1/2}RᵀS^{1/2})/2,   (27)

where R is the transition probability matrix and S = diag(s), where s is the left eigenvector of R corresponding to eigenvalue 1. If G is strongly connected, then the eigenvalues of L(G) satisfy 0 = λ₀ < λ₁ ≤ ··· ≤ λ_{n−1}.

Since (D^∞)^{−1}(I_n − A) + (I_n − A)ᵀ(D^∞)^{−1} = 2(D^∞)^{−1/2}L(G)(D^∞)^{−1/2}, it follows from Lemma 4 that for any a ∈ R^n, ‖a‖²_{(D^∞)^{−1}(I_n−A)} = (1/2)aᵀ[(D^∞)^{−1}(I_n−A) + (I_n−A)ᵀ(D^∞)^{−1}]a ≥ 0. Noting that Ã − A = θ(I_n − A), and I_n + A − 2Ã = (1 − 2θ)(I_n − A), with θ ∈ (0, 1/2], we also have the following relations:

‖a‖²_{(D^∞)^{−1}(Ã−A)} ≥ 0, ∀a ∈ R^n,   (28)

‖a‖²_{(D^∞)^{−1}(I_n+A−2Ã)} ≥ 0, ∀a ∈ R^n,   (29)

and

λ̃_min((D^∞)^{−1}(Ã − A) + (Ã − A)ᵀ(D^∞)^{−1}) > 0 = λ_min((D^∞)^{−1}(Ã − A) + (Ã − A)ᵀ(D^∞)^{−1}),
λ_min((D^∞)^{−1}(I_n + A − 2Ã) + (I_n + A − 2Ã)ᵀ(D^∞)^{−1}) = 0.

Therefore, Eq. (28), together with the result of Theorem 1, Eq. (21), proves the convergence of DEXTRA to the optimal solution of Problem P1. What is left to prove is Theorem 1, which is presented in the next section.

V. PROOF OF THEOREM 1

With all the pieces in place, we are finally ready to prove Theorem 1. According to the strong convexity assumption, A1(c), we have

2αS_f‖z^{k+1} − z^∞‖² ≤ 2α⟨D^∞z^{k+1} − D^∞z^∞, (D^∞)^{−1}[∇f(z^{k+1}) − ∇f(z^∞)]⟩
= 2α⟨D^∞z^{k+1} − D^{k+1}z^{k+1}, (D^∞)^{−1}[∇f(z^{k+1}) − ∇f(z^∞)]⟩
+ 2α⟨D^{k+1}z^{k+1} − D^∞z^∞, (D^∞)^{−1}[∇f(z^{k+1}) − ∇f(z^k)]⟩
+ 2⟨D^{k+1}z^{k+1} − D^∞z^∞, (D^∞)^{−1}α[∇f(z^k) − ∇f(z^∞)]⟩   (30)
:= s₁ + s₂ + s₃,

where s₁, s₂, s₃ denote the RHS terms in Eq. (30). We discuss each term in sequence. Denote d_max = max_k{‖D^k‖}, d⁻_max = max_k{‖(D^k)^{−1}‖}, and d∞ = ‖(D^∞)^{−1}‖, which are all bounded due to the convergence of the weighting matrix, A^k. For the first two terms, we have

s₁ ≤ 2α‖D^∞ − D^{k+1}‖‖z^{k+1}‖ d∞ ‖∇f(z^{k+1}) − ∇f(z^∞)‖ ≤ (8αnCB_f² d∞/S_f)λ^k;   (31)

s₂ ≤ αη‖D^{k+1}z^{k+1} − D^∞z^∞‖² + (αL_f²/η)(d∞)²‖z^{k+1} − z^k‖²
≤ αη‖D^{k+1}z^{k+1} − D^∞z^∞‖² + (2αL_f²(d∞ d⁻_max)²/η)‖D^{k+1}z^{k+1} − D^k z^k‖² + (32α(nCL_fB_f d∞ d⁻_max)²/(ηS_f²))λ^{2k};   (32)

Recalling Lemma 1, Eq. (24), we can represent s₃ as

s₃ = −2‖D^{k+1}z^{k+1} − D^∞z^∞‖²_{(D^∞)^{−1}(I_n+A−2Ã)} + 2⟨D^{k+1}z^{k+1} − D^∞z^∞, (D^∞)^{−1}Ã(D^k z^k − D^{k+1}z^{k+1})⟩ − 2⟨D^{k+1}z^{k+1} − D^∞z^∞, (D^∞)^{−1}(Ã − A)(q^{k+1} − q^∞)⟩
:= s_{3a} + s_{3b} + s_{3c},   (33)

where

s_{3b} = 2⟨D^{k+1}z^{k+1} − D^∞z^∞, Ãᵀ(D^∞)^{−1}(D^k z^k − D^{k+1}z^{k+1})⟩,   (34)

and

s_{3c} = −2⟨D^{k+1}z^{k+1}, (D^∞)^{−1}(Ã − A)(q^{k+1} − q^∞)⟩ = −2⟨q^{k+1} − q^k, (D^∞)^{−1}(Ã − A)(q^{k+1} − q^∞)⟩.   (35)

The first equality in Eq. (35) holds because (Ã − A)ᵀ(D^∞)^{−1}D^∞z^∞ = 0_n, and the second because q^{k+1} − q^k = D^{k+1}z^{k+1}. By substituting Eqs. (34) and (35) into (33), and recalling the definitions of t^k, t^∞, and G in Eq. (20), we obtain

s₃ = −2‖D^{k+1}z^{k+1} − D^∞z^∞‖²_{(D^∞)^{−1}(I_n+A−2Ã)} + 2⟨t^{k+1} − t^∞, G(t^k − t^{k+1})⟩.   (36)

By applying the basic equality

2⟨t^{k+1} − t^∞, G(t^k − t^{k+1})⟩ = ‖t^k − t^∞‖²_G − ‖t^{k+1} − t^∞‖²_G − ‖t^k − t^{k+1}‖²_G,   (37)

and noting that

‖t^{k+1} − t^k‖²_G = ‖D^{k+1}z^{k+1} − D^k z^k‖²_{Ãᵀ(D^∞)^{−1}} + ‖q^{k+1} − q^k‖²_{(D^∞)^{−1}(Ã−A)} ≥ ‖D^{k+1}z^{k+1} − D^k z^k‖²_{Ãᵀ(D^∞)^{−1}},   (38)

where the inequality holds due to Eq. (28), and

2⟨G(t^{k+1} − t^∞), t^k − t^{k+1}⟩
= 2⟨Ãᵀ(D^∞)^{−1}(D^{k+1}z^{k+1} − D^∞z^∞), D^k z^k − D^{k+1}z^{k+1}⟩ + 2⟨(D^∞)^{−1}(Ã − A)(q^{k+1} − q^∞), q^k − q^{k+1}⟩
≤ (1/σ)‖D^{k+1}z^{k+1} − D^∞z^∞‖² + σ‖D^{k+1}z^{k+1} − D^k z^k‖²_{(D^∞)^{−1}ÃÃᵀ(D^∞)^{−1}}

+ σ‖q^{k+1} − q^∞‖²_{(D^∞)^{−1}(Ã−A)(Ã−A)ᵀ(D^∞)^{−1}},   (39)

we are able to bound s₃ with terms of matrix norms by combining Eqs. (36), (37), (38), and (39):

s₃ ≤ −2‖D^{k+1}z^{k+1} − D^∞z^∞‖²_{(D^∞)^{−1}(I_n+A−2Ã)} + ‖t^k − t^∞‖²_G − ‖t^{k+1} − t^∞‖²_G + (1/σ)‖D^{k+1}z^{k+1} − D^∞z^∞‖² − ‖D^{k+1}z^{k+1} − D^k z^k‖²_{Ãᵀ(D^∞)^{−1} − σ(D^∞)^{−1}ÃÃᵀ(D^∞)^{−1}} + σ‖q^{k+1} − q^∞‖²_{(D^∞)^{−1}(Ã−A)(Ã−A)ᵀ(D^∞)^{−1}}.   (40)

It follows from Lemma 3(c) that

‖D^{k+1}z^{k+1} − D^∞z^∞‖² ≤ 2d_max²‖z^{k+1} − z^∞‖² + 8(nCB_f/S_f)²λ^{2k}.   (41)

By combining Eqs. (41) and (30), we obtain

(αS_f/d_max²)‖D^{k+1}z^{k+1} − D^∞z^∞‖² ≤ s₁ + s₂ + s₃ + (8α(nCB_f)²/(S_f d_max²))λ^{2k}.   (42)

Finally, by plugging the bounds of s₁, Eq. (31), s₂, Eq. (32), and s₃, Eq. (40), into Eq. (42), and rearranging the terms, it follows that

‖t^k − t^∞‖²_G − ‖t^{k+1} − t^∞‖²_G ≥ ‖D^{k+1}z^{k+1} − D^∞z^∞‖²_{(α(S_f/d_max² − η) − 1/σ)I_n + 2(D^∞)^{−1}(I_n+A−2Ã)} + ‖D^{k+1}z^{k+1} − D^k z^k‖²_{M − σMMᵀ − (2αL_f²(d∞ d⁻_max)²/η)I_n} − σ‖q^{k+1} − q^∞‖²_{NNᵀ} − [8α(nCB_f)²/(S_f d_max²) + 8αnCB_f² d∞/S_f + 32α(nCL_fB_f d∞ d⁻_max)²/(ηS_f²)]λ^k,   (43)

where we denote M = (D^∞)^{−1}Ã and N = (D^∞)^{−1}(Ã − A) for simplicity of representation. In order to establish the result of Theorem 1, Eq. (21), i.e., ‖t^k − t^∞‖²_G ≥ (1 + δ)‖t^{k+1} − t^∞‖²_G − Γγ^k, it remains to show that the RHS of Eq. (43) is no less than δ‖t^{k+1} − t^∞‖²_G − Γγ^k. This is equivalent to showing that

‖D^{k+1}z^{k+1} − D^∞z^∞‖²_{(α(S_f/d_max² − η) − 1/σ)I_n + 2(D^∞)^{−1}(I_n+A−2Ã) − δM} + ‖D^{k+1}z^{k+1} − D^k z^k‖²_{M − σMMᵀ − (2αL_f²(d∞ d⁻_max)²/η)I_n}

≥ ‖q^{k+1} − q^∞‖²_{σNNᵀ + δN} + [8α(nCB_f)²/(S_f d_max²) + 8αnCB_f² d∞/S_f + 32α(nCL_fB_f d∞ d⁻_max)²/(ηS_f²)]λ^k − Γγ^k.   (44)

From Lemma 1, we have

‖q^{k+1} − q^∞‖²_{(Ã−A)ᵀ(Ã−A)} = ‖(Ã − A)(q^{k+1} − q^∞)‖²
= ‖(I_n + A − 2Ã)(D^{k+1}z^{k+1} − D^∞z^∞) + α[∇f(z^{k+1}) − ∇f(z^∞)] + Ã(D^{k+1}z^{k+1} − D^k z^k) + α[∇f(z^k) − ∇f(z^{k+1})]‖²
≤ 2‖(1 − 2θ)P(D^{k+1}z^{k+1} − D^∞z^∞) + α[∇f(z^{k+1}) − ∇f(z^∞)]‖² + 2‖Ã(D^{k+1}z^{k+1} − D^k z^k) + α[∇f(z^k) − ∇f(z^{k+1})]‖²
≤ ‖D^{k+1}z^{k+1} − D^∞z^∞‖²_{4(1−2θ)²PᵀP + 8(αL_f d⁻_max)²I_n} + ‖D^{k+1}z^{k+1} − D^k z^k‖²_{4ÃᵀÃ + 8(αL_f d⁻_max)²I_n} + 160(αL_f d⁻_max nCB_f/S_f)²λ^k,   (45)

where we denote P = I_n − A for simplicity of representation, and the last inequality uses Lemma 3. Considering the analysis in Eqs. (28) and (29), we get λ_min(N + Nᵀ) ≥ 0. It is also true that λ_min(NNᵀ) ≥ 0 and λ_min((Ã − A)ᵀ(Ã − A)) ≥ 0. Therefore, we have the relation

‖q^{k+1} − q^∞‖²_{σNNᵀ + δN} ≤ [σλ_max(NNᵀ) + δλ_max((N + Nᵀ)/2)] / λ̃_min((Ã − A)ᵀ(Ã − A)) · ‖q^{k+1} − q^∞‖²_{(Ã−A)ᵀ(Ã−A)},   (46)

where λ̃_min(·) gives the smallest nonzero eigenvalue. We denote L = Ã − A. By considering Eqs. (44), (45), and (46), and letting Γ in Eq. (44) be

Γ = [8α(nCB_f)²/(S_f d_max²) + 8αnCB_f² d∞/S_f + 32α(nCL_fB_f d∞ d⁻_max)²/(ηS_f²)] + 160(αL_f d⁻_max nCB_f/S_f)² [σλ_max(NNᵀ) + δλ_max((N + Nᵀ)/2)] / λ̃_min(LᵀL),

it follows that in order to validate Eq. (44), we need the following relation to hold:

‖D^{k+1}z^{k+1} − D^∞z^∞‖²_{(α(S_f/d_max² − η) − 1/σ)I_n + 2(D^∞)^{−1}(I_n+A−2Ã) − δM} + ‖D^{k+1}z^{k+1} − D^k z^k‖²_{M − σMMᵀ − (2αL_f²(d∞ d⁻_max)²/η)I_n}
≥ [σλ_max(NNᵀ) + δλ_max((N + Nᵀ)/2)] / λ̃_min(LᵀL) · (‖D^{k+1}z^{k+1} − D^∞z^∞‖²_{4(1−2θ)²PᵀP + 8(αL_f d⁻_max)²I_n} + ‖D^{k+1}z^{k+1} − D^k z^k‖²_{4ÃᵀÃ + 8(αL_f d⁻_max)²I_n}),

which can be implemented by letting

(α(S_f/d_max² − η) − 1/σ)I_n − δ(M + Mᵀ)/2 ⪰ [σλ_max(NNᵀ) + δλ_max((N + Nᵀ)/2)] / λ̃_min(LᵀL) · (4(1 − 2θ)²PᵀP + 8(αL_f d⁻_max)²I_n),
(M + Mᵀ)/2 − σMMᵀ − (2αL_f²(d∞ d⁻_max)²/η)I_n ⪰ [σλ_max(NNᵀ) + δλ_max((N + Nᵀ)/2)] / λ̃_min(LᵀL) · (4ÃᵀÃ + 8(αL_f d⁻_max)²I_n).

Denote ρ₁ = 4(1 − 2θ)²λ_max(PᵀP) + 8(ᾱL_f d⁻_max)² and ρ₂ = 4λ_max(ÃᵀÃ) + 8(ᾱL_f d⁻_max)², where ᾱ is some constant regarded as a ceiling of α. Therefore, we have

δ ≤ min{ [α(S_f/d_max² − η) − 1/σ − σλ_max(NNᵀ)ρ₁/λ̃_min(LᵀL)] / [λ_max((N + Nᵀ)/2)ρ₁/λ̃_min(LᵀL) + λ_max((M + Mᵀ)/2)],
[λ_min(M + Mᵀ)/2 − σλ_max(MMᵀ) − 2αL_f²(d∞ d⁻_max)²/η − σλ_max(NNᵀ)ρ₂/λ̃_min(LᵀL)] / [λ_max((N + Nᵀ)/2)ρ₂/λ̃_min(LᵀL)] }.

To ensure δ > 0, we set the parameters, η and σ, so that both numerators above are positive; in particular,

η < S_f/d_max²,  σ < λ_min(M + Mᵀ) / [2(λ_max(MMᵀ) + λ_max(NNᵀ)ρ₂/λ̃_min(LᵀL))],

and also α ∈ (α_min, α_max), where α_min and α_max are the values given in Theorem 1. Therefore, we obtain the desired result. ∎

VI. NUMERICAL EXPERIMENTS

This section provides numerical experiments to study the convergence rate of DEXTRA for a least squares problem over a directed graph. The local objective functions in the least squares

problems are strongly convex. We compare the performance of DEXTRA with other algorithms suited to the case of directed graphs: GP as defined by [20, 21] and D-DGD as defined by [32]. Our second experiment verifies the existence of α_min and α_max, such that a proper stepsize α satisfies α ∈ [α_min, α_max]. We also consider various network topologies to see how they affect the stepsize α. Convergence is studied in terms of the residual
$$r^k=\frac{1}{n}\sum_{i=1}^{n}\left\|z_i^k-\phi\right\|,$$
where φ is the optimal solution. The distributed least squares problem is described as follows. Each agent owns a private objective function, s_i = R_i x + n_i, where s_i ∈ R^{m_i} and R_i ∈ R^{m_i×p} are measured data, x ∈ R^p is the unknown state, and n_i ∈ R^{m_i} is random unknown noise. The goal is to estimate x. This can be formulated as a distributed optimization problem solving
$$\min_x\ f(x)=\frac{1}{n}\sum_{i=1}^{n}\left\|R_ix-s_i\right\|^2.$$
We consider the network topologies given by the digraphs shown in Fig. 1.

Fig. 1: Three examples of strongly-connected but non-balanced digraphs: G_a, G_b, and G_c.

Our first experiment compares several algorithms over the network G_a illustrated in Fig. 1. The comparison of DEXTRA, GP, D-DGD, and DGD with a row-stochastic weighting matrix is shown in Fig. 2. In this experiment, we set α = 0.1. The convergence rate of DEXTRA is linear, as stated in Section III. GP and D-DGD apply the same stepsize, α. As a result, the convergence rate of both is sub-linear. We also consider the DGD algorithm, but with the weighting matrix being row-stochastic, because in a directed graph it is in general impossible to construct a doubly-stochastic matrix. As expected, DGD with a row-stochastic matrix does not converge to the exact optimal solution, while the other three algorithms are suited to the case of directed graphs.
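The least squares setup above can be put into a short simulation. The following is a minimal NumPy sketch, not the authors' implementation: it builds a column-stochastic weight matrix A from a small digraph (each agent needs only its own out-degree), runs a DEXTRA-style iteration with the EXTRA-style choice Ã = (I_n + A)/2 (an assumption of this sketch, as are all function and variable names), and records the residual r^k against the centralized least squares solution φ.

```python
import numpy as np

def column_stochastic(adj):
    """Column-stochastic weights for a digraph; adj[i, j] = 1 iff j -> i.
    Agent j splits weight 1 evenly over itself and its out-neighbors,
    so only the local out-degree is needed."""
    n = adj.shape[0]
    mask = (adj + np.eye(n)) > 0                 # add self-loops
    return mask / mask.sum(axis=0, keepdims=True)

def dextra_least_squares(R, s, A, alpha, iters):
    """DEXTRA-style iteration for min_x (1/n) sum_i ||R_i x - s_i||^2.
    Uses A_tilde = (I + A)/2 and the rescaling z_i = x_i / y_i that
    corrects for A being only column-stochastic. Returns the residual
    history r^k = (1/n) sum_i ||z_i^k - phi|| and the final estimates."""
    n, p = len(R), R[0].shape[1]
    phi = np.linalg.lstsq(np.vstack(R), np.concatenate(s), rcond=None)[0]
    A_til = (np.eye(n) + A) / 2
    grad = lambda Z: np.stack([2.0 * Ri.T @ (Ri @ zi - si)
                               for Ri, si, zi in zip(R, s, Z)])
    x_prev, y_prev = np.zeros((n, p)), np.ones(n)
    x, y = A @ x_prev - alpha * grad(x_prev), A @ y_prev   # first step
    residuals = []
    for _ in range(iters):
        z_prev, z = x_prev / y_prev[:, None], x / y[:, None]
        residuals.append(np.mean([np.linalg.norm(zi - phi) for zi in z]))
        x_next = (np.eye(n) + A) @ x - A_til @ x_prev \
                 - alpha * (grad(z) - grad(z_prev))
        x_prev, x = x, x_next
        y_prev, y = y, A @ y
    return residuals, x / y[:, None]
```

On a 3-agent directed cycle with well-conditioned local data, the recorded residuals decay geometrically, consistent with the linear rate O(ε^k) claimed in the paper; the stepsize must still be chosen inside the convergent interval discussed below.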

As stated in Theorem 1, DEXTRA converges to the optimal solution when the stepsize α lies in a proper interval. Fig. 3 illustrates this fact. The network topology is chosen to be G_a in Fig. 1. It is shown that when α = 0.01 or α = 1, DEXTRA does not converge. DEXTRA has a more restricted stepsize range compared with EXTRA, see [19], where the stepsize can be any value arbitrarily close to zero, i.e., α_min = 0, when a symmetric and doubly-stochastic matrix is applied in EXTRA. The relatively smaller interval is due to the fact that the weighting matrix applied in DEXTRA cannot be symmetric.

Fig. 2: Comparison of different distributed optimization algorithms (DGD with a row-stochastic matrix, GP, D-DGD, and DEXTRA) in a least squares problem. GP, D-DGD, and DEXTRA are proved to work when the network topology is described by digraphs. Moreover, DEXTRA has a much faster convergence rate compared with GP and D-DGD.

In the third experiment, we explore the relation between the range of stepsizes and the network topology. Fig. 4 shows the interval of proper stepsizes for which DEXTRA converges to the optimal solution when the network topology is chosen to be G_a, G_b, and G_c in Fig. 1. The interval of proper stepsizes is related to the network topology, as stated in Theorem 1 in Section III. Comparing Fig. 4 (b) and (c) with (a), a faster information diffusion on the network can increase the range of the stepsize interval. Given a fixed network topology, e.g., see Fig. 4(a) only, a larger stepsize increases the convergence speed of the algorithm. When we compare Fig. 4(b) with (a), it is interesting to see that α = 0.3 makes DEXTRA iterate about 190 times to converge with the

Fig. 3: There exists an interval of stepsizes for which DEXTRA converges to the optimal solution (curves shown for α = 0.01, α = 0.1, and α = 1).

choice of G_a, while the same stepsize needs DEXTRA to iterate 30 times to converge when the topology changes to G_b. We claim that a faster information diffusion on the network speeds up the convergence.

VII. CONCLUSIONS

We introduce DEXTRA, a distributed algorithm to solve multi-agent optimization problems over directed graphs. We have shown that DEXTRA succeeds in driving all agents to the same point, which is the optimal solution of the problem, given that the communication graph is strongly-connected and the objective functions are strongly convex. Moreover, the algorithm converges at a linear rate O(ε^k) for some constant ε < 1. The constant depends on the network topology G, as well as the Lipschitz constant, strong-convexity constant, and gradient norm of the objective functions. Numerical experiments are conducted for a least squares problem, where we show that DEXTRA is the fastest among the distributed algorithms suited to the case of directed graphs, compared with GP and D-DGD. Our analysis also explicitly gives the interval of stepsizes, which requires knowledge of the network topology, for which DEXTRA converges to the optimal solution. Although numerical experiments show that the practical interval is much bigger, it is still an open question

Fig. 4: There exists an interval of stepsizes for which DEXTRA converges to the optimal solution. The upper and lower bounds of the interval depend on the network topology, the Lipschitz constant, the strong-convexity constant, and the gradient norm of the objective functions. (a) G_a, α ∈ [0.03, 0.3]; (b) G_b, α ∈ [10^{-10}, 0.3]; (c) G_c, α ∈ [10^{-10}, 0.4]. Given a fixed topology, the convergence speed increases as α increases.

on how to get rid of the knowledge of the network topology, which is global information in a distributed setting.

REFERENCES

[1] V. Cevher, S. Becker, and M. Schmidt, Convex optimization for big data: Scalable, randomized, and parallel algorithms for big data analytics, IEEE Signal Processing Magazine, vol. 31, no. 5, pp. 32–43, 2014.
[2] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, Jan. 2011.
[3] I. Necoara and J. A. K. Suykens, Application of a smoothing technique to decomposition in convex optimization, IEEE Transactions on Automatic Control, vol. 53, no. 11, Dec. 2008.
[4] G. Mateos, J. A. Bazerque, and G. B. Giannakis, Distributed sparse linear regression, IEEE Transactions on Signal Processing, vol. 58, no. 10, Oct. 2010.
[5] J. A. Bazerque and G. B. Giannakis, Distributed spectrum sensing for cognitive radio networks by exploiting sparsity, IEEE Transactions on Signal Processing, vol. 58, no. 3, March 2010.
[6] M. Rabbat and R. Nowak, Distributed optimization in sensor networks, in 3rd International Symposium on Information Processing in Sensor Networks, Berkeley, CA, Apr. 2004.
[7] U. A. Khan, S. Kar, and J. M. F. Moura, DILAND: An algorithm for distributed sensor localization with noisy distance measurements, IEEE Transactions on Signal Processing, vol. 58, no. 3, Mar. 2010.
[8] C. Li and L. Li, A distributed multiple dimensional QoS constrained resource scheduling optimization policy in computational grid, Journal of Computer and System Sciences, vol. 72, no. 4, 2006.
[9] G. Neglia, G. Reina, and S. Alouf, Distributed gradient optimization for epidemic routing: A preliminary evaluation, in 2nd IFIP IEEE Wireless Days, Paris, Dec. 2009.
[10] A. Nedic and A. Ozdaglar, Distributed subgradient methods for multi-agent optimization, IEEE Transactions on Automatic Control, vol. 54, no. 1, pp. 48–61, Jan. 2009.

[11] K. Yuan, Q. Ling, and W. Yin, On the convergence of decentralized gradient descent, arXiv preprint arXiv:1310.7063, 2013.
[12] J. C. Duchi, A. Agarwal, and M. J. Wainwright, Dual averaging for distributed optimization: Convergence analysis and network scaling, IEEE Transactions on Automatic Control, vol. 57, no. 3, Mar. 2012.
[13] J. F. C. Mota, J. M. F. Xavier, P. M. Q. Aguiar, and M. Puschel, D-ADMM: A communication-efficient distributed algorithm for separable optimization, IEEE Transactions on Signal Processing, vol. 61, no. 10, May 2013.
[14] E. Wei and A. Ozdaglar, Distributed alternating direction method of multipliers, in 51st IEEE Annual Conference on Decision and Control, Dec. 2012.
[15] W. Shi, Q. Ling, K. Yuan, G. Wu, and W. Yin, On the linear convergence of the ADMM in decentralized consensus optimization, IEEE Transactions on Signal Processing, vol. 62, no. 7, April 2014.
[16] D. Jakovetic, J. Xavier, and J. M. F. Moura, Fast distributed gradient methods, IEEE Transactions on Automatic Control, vol. 59, no. 5, 2014.
[17] A. Mokhtari, Q. Ling, and A. Ribeiro, Network Newton, aryanm/wiki/nn.pdf, 2014.
[18] Q. Ling and A. Ribeiro, Decentralized linearized alternating direction method of multipliers, in IEEE International Conference on Acoustics, Speech and Signal Processing, 2014.
[19] W. Shi, Q. Ling, G. Wu, and W. Yin, EXTRA: An exact first-order algorithm for decentralized consensus optimization, SIAM Journal on Optimization, vol. 25, no. 2, pp. 944–966, 2015.
[20] A. Nedic and A. Olshevsky, Distributed optimization over time-varying directed graphs, IEEE Transactions on Automatic Control, vol. PP, no. 99, 2014.
[21] K. I. Tsianos, The Role of the Network in Distributed Optimization Algorithms: Convergence Rates, Scalability, Communication/Computation Tradeoffs and Communication Delays, Ph.D. thesis, Dept. Elect. Comp. Eng., McGill University, 2013.
[22] D. Kempe, A. Dobra, and J. Gehrke, Gossip-based computation of aggregate information, in 44th Annual IEEE Symposium on Foundations of Computer Science, Oct. 2003.
[23] F. Benezit, V. Blondel, P. Thiran, J. Tsitsiklis, and M. Vetterli, Weighted gossip: Distributed averaging using non-doubly stochastic matrices, in IEEE International Symposium on

Information Theory, Jun. 2010.
[24] A. Jadbabaie, J. Lin, and A. S. Morse, Coordination of groups of mobile autonomous agents using nearest neighbor rules, IEEE Transactions on Automatic Control, vol. 48, no. 6, pp. 988–1001, Jun. 2003.
[25] R. Olfati-Saber and R. M. Murray, Consensus problems in networks of agents with switching topology and time-delays, IEEE Transactions on Automatic Control, vol. 49, no. 9, pp. 1520–1533, Sep. 2004.
[26] R. Olfati-Saber and R. M. Murray, Consensus protocols for networks of dynamic agents, in IEEE American Control Conference, Denver, Colorado, Jun. 2003, vol. 2.
[27] L. Xiao, S. Boyd, and S. J. Kim, Distributed average consensus with least-mean-square deviation, Journal of Parallel and Distributed Computing, vol. 67, no. 1, pp. 33–46, 2007.
[28] C. Xi, Q. Wu, and U. A. Khan, Distributed gradient descent over directed graphs, cxi01/paper/015d-dgd.pdf, 2015.
[29] C. Xi and U. A. Khan, Directed-distributed gradient descent, in 53rd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, Oct. 2015.
[30] K. Cai and H. Ishii, Average consensus on general strongly connected digraphs, Automatica, vol. 48, no. 11, 2012.
[31] F. Chung, Laplacians and the Cheeger inequality for directed graphs, Annals of Combinatorics, vol. 9, no. 1, pp. 1–19, 2005.
[32] C. Xi, Q. Wu, and U. A. Khan, Distributed gradient descent over directed graphs, cxi01/paper/015d-dgd.pdf, 2015.


Resilient Distributed Optimization Algorithm against Adversary Attacks 207 3th IEEE International Conference on Control & Automation (ICCA) July 3-6, 207. Ohrid, Macedonia Resilient Distributed Optimization Algorithm against Adversary Attacks Chengcheng Zhao, Jianping He

More information

A SIMPLE PARALLEL ALGORITHM WITH AN O(1/T ) CONVERGENCE RATE FOR GENERAL CONVEX PROGRAMS

A SIMPLE PARALLEL ALGORITHM WITH AN O(1/T ) CONVERGENCE RATE FOR GENERAL CONVEX PROGRAMS A SIMPLE PARALLEL ALGORITHM WITH AN O(/T ) CONVERGENCE RATE FOR GENERAL CONVEX PROGRAMS HAO YU AND MICHAEL J. NEELY Abstract. This paper considers convex programs with a general (possibly non-differentiable)

More information

Agreement algorithms for synchronization of clocks in nodes of stochastic networks

Agreement algorithms for synchronization of clocks in nodes of stochastic networks UDC 519.248: 62 192 Agreement algorithms for synchronization of clocks in nodes of stochastic networks L. Manita, A. Manita National Research University Higher School of Economics, Moscow Institute of

More information

IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. 5, NO. 4, AUGUST

IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. 5, NO. 4, AUGUST IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. 5, NO. 4, AUGUST 2011 833 Non-Asymptotic Analysis of an Optimal Algorithm for Network-Constrained Averaging With Noisy Links Nima Noorshams Martin

More information

ARock: an algorithmic framework for asynchronous parallel coordinate updates

ARock: an algorithmic framework for asynchronous parallel coordinate updates ARock: an algorithmic framework for asynchronous parallel coordinate updates Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin ( UCLA Math, U.Waterloo DCO) UCLA CAM Report 15-37 ShanghaiTech SSDS 15 June 25,

More information

Randomized Gossip Algorithms

Randomized Gossip Algorithms 2508 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 52, NO 6, JUNE 2006 Randomized Gossip Algorithms Stephen Boyd, Fellow, IEEE, Arpita Ghosh, Student Member, IEEE, Balaji Prabhakar, Member, IEEE, and Devavrat

More information

ON SEPARATION PRINCIPLE FOR THE DISTRIBUTED ESTIMATION AND CONTROL OF FORMATION FLYING SPACECRAFT

ON SEPARATION PRINCIPLE FOR THE DISTRIBUTED ESTIMATION AND CONTROL OF FORMATION FLYING SPACECRAFT ON SEPARATION PRINCIPLE FOR THE DISTRIBUTED ESTIMATION AND CONTROL OF FORMATION FLYING SPACECRAFT Amir Rahmani (), Olivia Ching (2), and Luis A Rodriguez (3) ()(2)(3) University of Miami, Coral Gables,

More information

Alternative Characterization of Ergodicity for Doubly Stochastic Chains

Alternative Characterization of Ergodicity for Doubly Stochastic Chains Alternative Characterization of Ergodicity for Doubly Stochastic Chains Behrouz Touri and Angelia Nedić Abstract In this paper we discuss the ergodicity of stochastic and doubly stochastic chains. We define

More information

Learning distributions and hypothesis testing via social learning

Learning distributions and hypothesis testing via social learning UMich EECS 2015 1 / 48 Learning distributions and hypothesis testing via social learning Anand D. Department of Electrical and Computer Engineering, The State University of New Jersey September 29, 2015

More information

Asymmetric Information Diffusion via Gossiping on Static And Dynamic Networks

Asymmetric Information Diffusion via Gossiping on Static And Dynamic Networks Asymmetric Information Diffusion via Gossiping on Static And Dynamic Networks Ercan Yildiz 1, Anna Scaglione 2, Asuman Ozdaglar 3 Cornell University, UC Davis, MIT Abstract In this paper we consider the

More information

arxiv: v2 [eess.sp] 20 Nov 2017

arxiv: v2 [eess.sp] 20 Nov 2017 Distributed Change Detection Based on Average Consensus Qinghua Liu and Yao Xie November, 2017 arxiv:1710.10378v2 [eess.sp] 20 Nov 2017 Abstract Distributed change-point detection has been a fundamental

More information

Consensus Problems on Small World Graphs: A Structural Study

Consensus Problems on Small World Graphs: A Structural Study Consensus Problems on Small World Graphs: A Structural Study Pedram Hovareshti and John S. Baras 1 Department of Electrical and Computer Engineering and the Institute for Systems Research, University of

More information

Lecture 1 and 2: Introduction and Graph theory basics. Spring EE 194, Networked estimation and control (Prof. Khan) January 23, 2012

Lecture 1 and 2: Introduction and Graph theory basics. Spring EE 194, Networked estimation and control (Prof. Khan) January 23, 2012 Lecture 1 and 2: Introduction and Graph theory basics Spring 2012 - EE 194, Networked estimation and control (Prof. Khan) January 23, 2012 Spring 2012: EE-194-02 Networked estimation and control Schedule

More information

MULTI-AGENT TRACKING OF A HIGH-DIMENSIONAL ACTIVE LEADER WITH SWITCHING TOPOLOGY

MULTI-AGENT TRACKING OF A HIGH-DIMENSIONAL ACTIVE LEADER WITH SWITCHING TOPOLOGY Jrl Syst Sci & Complexity (2009) 22: 722 731 MULTI-AGENT TRACKING OF A HIGH-DIMENSIONAL ACTIVE LEADER WITH SWITCHING TOPOLOGY Yiguang HONG Xiaoli WANG Received: 11 May 2009 / Revised: 16 June 2009 c 2009

More information

High Order Methods for Empirical Risk Minimization

High Order Methods for Empirical Risk Minimization High Order Methods for Empirical Risk Minimization Alejandro Ribeiro Department of Electrical and Systems Engineering University of Pennsylvania aribeiro@seas.upenn.edu Thanks to: Aryan Mokhtari, Mark

More information

Lecture 3: Huge-scale optimization problems

Lecture 3: Huge-scale optimization problems Liege University: Francqui Chair 2011-2012 Lecture 3: Huge-scale optimization problems Yurii Nesterov, CORE/INMA (UCL) March 9, 2012 Yu. Nesterov () Huge-scale optimization problems 1/32March 9, 2012 1

More information

Decentralized Stabilization of Heterogeneous Linear Multi-Agent Systems

Decentralized Stabilization of Heterogeneous Linear Multi-Agent Systems 1 Decentralized Stabilization of Heterogeneous Linear Multi-Agent Systems Mauro Franceschelli, Andrea Gasparri, Alessandro Giua, and Giovanni Ulivi Abstract In this paper the formation stabilization problem

More information

Sparse Covariance Selection using Semidefinite Programming

Sparse Covariance Selection using Semidefinite Programming Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support

More information

Distributed Computation of Quantiles via ADMM

Distributed Computation of Quantiles via ADMM 1 Distributed Computation of Quantiles via ADMM Franck Iutzeler Abstract In this paper, we derive distributed synchronous and asynchronous algorithms for computing uantiles of the agents local values.

More information

arxiv: v1 [math.oc] 29 Sep 2018

arxiv: v1 [math.oc] 29 Sep 2018 Distributed Finite-time Least Squares Solver for Network Linear Equations Tao Yang a, Jemin George b, Jiahu Qin c,, Xinlei Yi d, Junfeng Wu e arxiv:856v mathoc 9 Sep 8 a Department of Electrical Engineering,

More information

Graph Weight Design for Laplacian Eigenvalue Constraints with Multi-Agent Systems Applications

Graph Weight Design for Laplacian Eigenvalue Constraints with Multi-Agent Systems Applications 211 5th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC) Orlando, FL, USA, December 12-15, 211 Graph Weight Design for Laplacian Eigenvalue Constraints with Multi-Agent

More information

Push-sum Algorithm on Time-varying Random Graphs

Push-sum Algorithm on Time-varying Random Graphs Push-sum Algorithm on Time-varying Random Graphs by Pouya Rezaeinia A thesis submitted to the Department of Mathematics and Statistics in conformity with the requirements for the degree of Master of Applied

More information

On Quantized Consensus by Means of Gossip Algorithm Part I: Convergence Proof

On Quantized Consensus by Means of Gossip Algorithm Part I: Convergence Proof Submitted, 9 American Control Conference (ACC) http://www.cds.caltech.edu/~murray/papers/8q_lm9a-acc.html On Quantized Consensus by Means of Gossip Algorithm Part I: Convergence Proof Javad Lavaei and

More information

arxiv: v2 [math.oc] 7 Apr 2017

arxiv: v2 [math.oc] 7 Apr 2017 Optimal algorithms for smooth and strongly convex distributed optimization in networks arxiv:702.08704v2 [math.oc] 7 Apr 207 Kevin Scaman Francis Bach 2 Sébastien Bubeck 3 Yin Tat Lee 3 Laurent Massoulié

More information

Digraph Fourier Transform via Spectral Dispersion Minimization

Digraph Fourier Transform via Spectral Dispersion Minimization Digraph Fourier Transform via Spectral Dispersion Minimization Gonzalo Mateos Dept. of Electrical and Computer Engineering University of Rochester gmateosb@ece.rochester.edu http://www.ece.rochester.edu/~gmateosb/

More information

Consensus on multi-time Scale digraphs

Consensus on multi-time Scale digraphs Consensus on multi-time Scale digraphs Qiong Wu a,, Chenguang Xi b a Department of Mathematics, Tufts University, 503 Boston Ave, Medford, MA 0255 b Department of Electrical and Computer Engineering, Tufts

More information

Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression

Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression Due: Monday, February 13, 2017, at 10pm (Submit via Gradescope) Instructions: Your answers to the questions below,

More information

Graph Theoretic Methods in the Stability of Vehicle Formations

Graph Theoretic Methods in the Stability of Vehicle Formations Graph Theoretic Methods in the Stability of Vehicle Formations G. Lafferriere, J. Caughman, A. Williams gerardol@pd.edu, caughman@pd.edu, ancaw@pd.edu Abstract This paper investigates the stabilization

More information

Subgradient methods for huge-scale optimization problems

Subgradient methods for huge-scale optimization problems Subgradient methods for huge-scale optimization problems Yurii Nesterov, CORE/INMA (UCL) May 24, 2012 (Edinburgh, Scotland) Yu. Nesterov Subgradient methods for huge-scale problems 1/24 Outline 1 Problems

More information

Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE

Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 55, NO. 9, SEPTEMBER 2010 1987 Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE Abstract

More information

Distributed Control of the Laplacian Spectral Moments of a Network

Distributed Control of the Laplacian Spectral Moments of a Network Distributed Control of the Laplacian Spectral Moments of a Networ Victor M. Preciado, Michael M. Zavlanos, Ali Jadbabaie, and George J. Pappas Abstract It is well-nown that the eigenvalue spectrum of the

More information

On Quantized Consensus by Means of Gossip Algorithm Part II: Convergence Time

On Quantized Consensus by Means of Gossip Algorithm Part II: Convergence Time Submitted, 009 American Control Conference (ACC) http://www.cds.caltech.edu/~murray/papers/008q_lm09b-acc.html On Quantized Consensus by Means of Gossip Algorithm Part II: Convergence Time Javad Lavaei

More information

Fast Nonnegative Matrix Factorization with Rank-one ADMM

Fast Nonnegative Matrix Factorization with Rank-one ADMM Fast Nonnegative Matrix Factorization with Rank-one Dongjin Song, David A. Meyer, Martin Renqiang Min, Department of ECE, UCSD, La Jolla, CA, 9093-0409 dosong@ucsd.edu Department of Mathematics, UCSD,

More information

A DELAYED PROXIMAL GRADIENT METHOD WITH LINEAR CONVERGENCE RATE. Hamid Reza Feyzmahdavian, Arda Aytekin, and Mikael Johansson

A DELAYED PROXIMAL GRADIENT METHOD WITH LINEAR CONVERGENCE RATE. Hamid Reza Feyzmahdavian, Arda Aytekin, and Mikael Johansson 204 IEEE INTERNATIONAL WORKSHOP ON ACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 2 24, 204, REIS, FRANCE A DELAYED PROXIAL GRADIENT ETHOD WITH LINEAR CONVERGENCE RATE Hamid Reza Feyzmahdavian, Arda Aytekin,

More information

Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method

Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method Davood Hajinezhad Iowa State University Davood Hajinezhad Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method 1 / 35 Co-Authors

More information

Alternative Decompositions for Distributed Maximization of Network Utility: Framework and Applications

Alternative Decompositions for Distributed Maximization of Network Utility: Framework and Applications Alternative Decompositions for Distributed Maximization of Network Utility: Framework and Applications Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization

More information