Distributed Optimization With Local Domains: Applications in MPC and Network Flows


João F. C. Mota, João M. F. Xavier, Pedro M. Q. Aguiar, and Markus Püschel

(João F. C. Mota, João M. F. Xavier, and Pedro M. Q. Aguiar are with the Instituto de Sistemas e Robótica (ISR), Instituto Superior Técnico (IST), Technical University of Lisbon, Portugal. Markus Püschel is with the Department of Computer Science at ETH Zurich, Switzerland. João F. C. Mota is also with the Department of Electrical and Computer Engineering at Carnegie Mellon University, USA. This work was supported by grants from Fundação para a Ciência e Tecnologia (FCT): CMU-PT/SIA/006/009, PEst-OE/EEI/LA0009/0, and SFRH/BD/3350/008, through the Carnegie Mellon/Portugal Program managed by ICTI.)

Abstract—In this paper we consider a network with P nodes, where each node has exclusive access to a local cost function. Our contribution is a communication-efficient distributed algorithm that finds a vector x minimizing the sum of all the functions. We make the additional assumption that the functions have intersecting local domains, i.e., each function depends only on some components of the variable. Consequently, each node is interested in knowing only some components of x, not the entire vector. This allows for an improvement in communication-efficiency. We apply our algorithm to model predictive control (MPC) and to network flow problems and show, through experiments on large networks, that the proposed algorithm requires fewer communications to converge than prior algorithms.

Index Terms—Distributed algorithms, alternating direction method of multipliers (ADMM), model predictive control, network flow, multicommodity flow, sensor networks.

[Figure 1: Example of (a) a global and (b) a partial variable. While each function in (a) depends on all the components of the variable x = (x_1, x_2, x_3), each function in (b) depends only on a subset of the components of x.]

I. INTRODUCTION

Distributed algorithms have become popular for solving optimization problems formulated on networks. Consider, for example, a network with P nodes and the following problem:

    minimize_{x ∈ R^n}   f_1(x) + f_2(x) + ... + f_P(x),                                (1)

where f_p is a function known only at node p. Fig. 1(a) illustrates this problem for a variable x of size n = 3. Several algorithms have been proposed to solve (1) in a distributed way, that is, each node communicates only with its neighbors and there is no central node. In a typical distributed algorithm for (1), each node holds an estimate of a solution x*, and iteratively updates and exchanges it with its neighbors. It is usually assumed that all nodes are interested in knowing the entire solution x*. While such an assumption holds for problems like consensus [] or distributed SVMs [], there are important problems where it does not hold, especially in the context of large networks. Two examples we will explore here are distributed model predictive control (MPC) and network flows. The goal in distributed MPC is to control a network of interacting subsystems with coupled dynamics [3]; that control should be performed using the least amount of energy. Network flow problems have many applications [4]; here, we will solve a network flow problem that minimizes delays in a multicommodity routing problem. Both distributed MPC and network flow problems can be written naturally as (1) with functions that depend only on a subset of the components of x. We solve (1) in the case where each function f_p may depend only on a subset of the components of the variable x ∈ R^n.
This situation is illustrated in Fig. 1(b), where, for example, f_1 only depends on x_1 and x_2. To capture these dependencies, we write x_S, S ⊆ {1,...,n}, to denote a subset of the components of x. For example, if S = {1,4}, then x_S = (x_1, x_4). With this notation, our goal is solving

    minimize_{x ∈ R^n}   f_1(x_{S_1}) + f_2(x_{S_2}) + ... + f_P(x_{S_P}),               (2)

where S_p is the set of components the function f_p depends on. Accordingly, every node p is only interested in a part of the solution, x*_{S_p}. We make the following

Assumption 1. No components are global, i.e., ∩_{p=1}^P S_p = ∅.

Whenever this assumption holds, we say that the variable x is partial. Fig. 1(b) shows an example of a partial variable. Note that, although no component appears in all nodes, one of the nodes still depends on all the components, i.e., it has a global domain. In fact, Assumption 1 allows a strict subset of nodes to have global domains. This contrasts with Fig. 1(a), where all nodes have global domains and hence Assumption 1 does not hold. We say that the variable x in Fig. 1(a) is global. Clearly, problem (2) is a particular case of problem (1) and hence it can be solved with any algorithm designed for (1). This approach, however, may introduce unnecessary communications, since nodes exchange full estimates of x, and not just of the components they are interested in, thus potentially wasting useful communication resources.
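To make the notation concrete, the following minimal Python sketch (the dependency sets are hypothetical, loosely modeled on Fig. 1(b), and are not taken from the paper) represents the sets S_p and checks Assumption 1; the count at the end illustrates why exchanging only the components in S_p is cheaper than exchanging full copies of x.

```python
# Hypothetical dependency sets S_p for a 6-node network and a 3-component variable.
S = {
    1: {1, 2},       # f_1 depends on x_1, x_2
    2: {1, 2, 3},    # f_2 depends on all components (a global domain)
    3: {2, 3},
    4: {1, 3},
    5: {2, 3},
    6: {1, 2},
}
n = 3  # total number of components of x

# Assumption 1: no component is global, i.e., the intersection of all S_p is empty.
common = set.intersection(*S.values())
assert not common, "some component appears in every node's domain"

# Numbers sent per round if nodes exchange only their own components vs. full copies of x.
print(sum(len(s) for s in S.values()), "numbers vs.", n * len(S))
```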

In many networks, communication is the operation that consumes the most energy and/or time.

Contributions. We first formalize problem (2) by making a clear distinction between the variable dependencies and the communication network; before, both were usually assumed to be the same. Then, we propose a distributed algorithm for problem (2) that takes advantage of its special structure to reduce communications. We distinguish two cases for the variable of (2), connected and non-connected, and design algorithms for both. To our knowledge, this is the first time an algorithm has been proposed for a non-connected variable. We apply our algorithms to distributed MPC and to network flow problems. A surprising result is that, despite their generality, the proposed algorithms outperform prior algorithms that are application-specific.

Related work. Many algorithms have been proposed for the global problem (1), for example, gradient-based methods [], [5], [6], or methods based on the Alternating Direction Method of Multipliers (ADMM) [7], [8], [9]. As mentioned before, solving (2) with an algorithm designed for (1) introduces unnecessary communications. We will observe this when we compare the algorithm proposed here with D-ADMM [9], the state-of-the-art for (1) in terms of communication-efficiency. To our knowledge, this is the first time problem (2) has been explicitly stated in a distributed context. For example, [0, 7.] proposes an algorithm for (2), but it is not distributed in our sense. Namely, it either requires a platform that supports all-to-all communications (in other words, a central node), or requires running consensus algorithms on each induced subgraph, at each iteration [0, 0.]. Thus, that algorithm is only distributed when every component induces a subgraph that is a star. Actually, we found only one algorithm in the literature that is distributed, or that can easily be made distributed, for all the scenarios considered in this paper. That algorithm was proposed in [] in the context of power system state estimation (the algorithm we propose can also be applied to this problem, although we will not consider it here). Our simulations show that the algorithm in [] always requires more communications than the algorithm we propose. Although we found just one communication-efficient distributed algorithm solving (2), there are many other algorithms solving particular instances of it. For example, in network flow problems, each component of the variable is associated with an edge of the network. We will see that such problems can be written as (2) with a connected variable, in the special case where each induced subgraph is a star. In this case, [0, 7.] becomes distributed, and gradient/subgradient methods can also be applied directly either to the primal problem [] or to the dual problem [3], yielding distributed algorithms. Network flow problems have also been tackled with Newton-like methods [4], [3]. A related problem is Network Utility Maximization (NUM), which is used to model traffic control on the Internet [5], [6]. For example, the TCP/IP protocol has been interpreted as a gradient algorithm solving a NUM. In [7], we compared a particular instance of the proposed algorithm with prior algorithms solving NUM, and showed that it requires less end-to-end communications. However, due to its structure, it does not offer interpretations of end-to-end protocols as realistic as gradient-based algorithms do. Distributed model predictive control (MPC) [3] is another problem that has been addressed with algorithms solving (2), again in the special case of a variable whose components induce star subgraphs only.
Such algorithms include subgradient methods [8], interior-point methods [9], fast gradient methods [0], and ADMM-based methods [0], [], which apply [0, 7.]. All these methods were designed for the special case of star-shaped induced subgraphs and, similarly to [0, 7.], they become inefficient if applied to more generic cases. In spite of its generality, the algorithm we propose requires fewer communications than previous algorithms that were specifically designed for distributed MPC or network flow problems. Additionally, we apply our algorithm to two scenarios in distributed MPC that have not been considered before: problems where the variable is connected but the induced subgraphs are not stars, and problems with a non-connected variable. Both cases can model scenarios where subsystems that are coupled through their dynamics cannot communicate directly. Lastly, this paper extends considerably our preliminary work [7]. In particular, the algorithm in [7] was designed for bipartite networks and was based on the 2-block ADMM. In contrast, the algorithms proposed here work on any connected network and are based on the Extended ADMM; thus, they have different convergence guarantees. Also, the MPC model proposed here is significantly more general than the one in [7].

II. TERMINOLOGY AND PROBLEM STATEMENT

We start by introducing the concepts of communication network and variable connectivity.

Communication network. A communication network is represented as an undirected graph G = (V, E), where V = {1,...,P} is the set of nodes and E ⊆ V × V is the set of edges. Two nodes communicate directly if there is an edge connecting them in G. We assume:

Assumption 2. G is connected and its topology does not change over time; also, a coloring scheme C of G is available beforehand.

A coloring scheme C is a set of numbers, called colors, assigned to the nodes such that two neighbors never have the same color, as shown in Fig. 2. Given its importance in TDMA, a widespread protocol for avoiding packet collisions, there is a large literature on coloring networks, as briefly overviewed in []. Our algorithm integrates naturally with TDMA, since both use coloring as a synchronization scheme: nodes work sequentially according to their colors, and nodes with the same color work in parallel. The difference is that TDMA uses a more restrictive coloring, as nodes within two hops cannot have the same color. Note that packet collision is often ignored in the design of distributed algorithms, as confirmed by the ubiquitous assumption that all nodes can communicate simultaneously.
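As an aside, a coloring scheme satisfying Assumption 2 can be computed beforehand with standard heuristics. A minimal sketch using NetworkX is shown below; the paper itself uses Sage for this step, and the topology here is hypothetical.

```python
# Greedy (not necessarily minimal) coloring of the communication network.
import networkx as nx

G = nx.Graph([(1, 2), (1, 6), (2, 6), (2, 3), (3, 4), (4, 5), (5, 6)])  # hypothetical topology

coloring = nx.coloring.greedy_color(G, strategy="largest_first")  # node -> color index
C = {}
for node, color in coloring.items():
    C.setdefault(color, set()).add(node)

# Two neighbors never share a color, so all nodes of one color class can work in parallel.
assert all(coloring[i] != coloring[j] for i, j in G.edges())
print(C)  # e.g., {0: {...}, 1: {...}, 2: {...}}
```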

[Figure 2: Example of a coloring scheme of the communication network using 3 colors: C_1 = {1,3,5}, C_2 = {4,6}, and C_3 = {2}.]

We associate with each node p in the communication network a function f_p : R^{n_p} → R ∪ {+∞}, where n_p is the dimension of x_{S_p}, and make the following

Assumption 3. Each function f_p is known only at node p and is closed, proper, and convex over R^{n_p}.

Since we allow f_p to take infinite values, each node can impose constraints on the variable using indicator functions, i.e., functions that evaluate to +∞ when the constraints are not satisfied, and to 0 otherwise.

Variable connectivity. Although each function f_p is available only at node p, each component of the variable x may be associated with several nodes. Let x_l be a given component. The subgraph induced by x_l is represented by G_l = (V_l, E_l) ⊆ G, where V_l is the set of nodes whose functions depend on x_l, and an edge (i,j) ∈ E belongs to E_l if both i and j are in V_l. For example, the subgraph induced by x_1 in Fig. 1(b) consists of V_1 = {1,2,4,6} and E_1 = {(1,2),(2,6),(1,6)}. We say that x_l is connected if its induced subgraph is connected, and non-connected otherwise. Likewise, a variable is connected if all its components are connected, and non-connected if it has at least one non-connected component.

Problem statement. Given a network satisfying Assumption 2 and a set of functions satisfying Assumptions 1 and 3, we solve the following problem: design a distributed, communication-efficient algorithm that solves (2), either with a connected or with a non-connected variable. By distributed algorithm we mean a procedure that makes no use of a central node and where each node communicates only with its neighbors. Unfortunately, there is no known lower bound on how many communications are needed to solve (2). Because of this, communication-efficiency can only be assessed relative to existing algorithms that solve the same problem. As mentioned before, our strategy is to design an algorithm for the connected case and then generalize it to the non-connected case.

III. CONNECTED CASE

In this section we derive an algorithm for (2) assuming its variable is connected. Our derivation uses the same principles as the state-of-the-art algorithm [9], [] for the global problem (1). The main idea is to manipulate (2) to make the Extended ADMM [3] applicable. We will see that the algorithm derived here generalizes the one in [9], [].

Problem manipulation. Let x_l be a given component and G_l = (V_l, E_l) the respective induced subgraph. In this section we assume each G_l is connected. Since all nodes in V_l are interested in x_l, we create a copy of x_l in each of those nodes: x_l^(p) will be the copy at node p, and x̄^(p) := {x_l^(p)}_{l ∈ S_p} will be the set of all copies at node p. We rewrite (2) as

    minimize_{{x̄_l}_{l=1}^n}   f_1(x̄^(1)) + f_2(x̄^(2)) + ... + f_P(x̄^(P))
    subject to                  x_l^(i) = x_l^(j),  (i,j) ∈ E_l,  l = 1,...,n,          (3)

where {x̄_l}_{l=1}^n is the optimization variable and represents the set of all copies. We use x̄_l to denote all copies of the component x_l, which are located only in the nodes of G_l: x̄_l := {x_l^(p)}_{p ∈ V_l}. The reason for introducing the constraints in (3) is to enforce equality among the copies of the same component: if two neighboring nodes i and j depend on x_l, then x_l^(i) = x_l^(j) appears in the constraints of (3). We assume that any edge in the communication network is represented as the ordered pair (i,j) ∈ E, with i < j. As such, there are no repeated equations in (3). Problems (2) and (3) are equivalent because each induced subgraph is connected. A useful observation is that the constraints x_l^(i) = x_l^(j), (i,j) ∈ E_l, can be written as A_l x̄_l = 0, where A_l is the transposed node-arc incidence matrix of the subgraph G_l.
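The following small NumPy sketch uses the V_1, E_1 of the example above to build the matrix A_1 and show that A_1 x̄_1 = 0 exactly encodes agreement of the copies along edges of G_1; a node that is isolated in G_1 is left unconstrained, which is precisely the non-connected situation treated in Section IV. The numeric values are illustrative only.

```python
# Transposed node-arc incidence matrix of the subgraph induced by one component.
import numpy as np

V_l = [1, 2, 4, 6]                      # nodes whose functions depend on x_1
E_l = [(1, 2), (2, 6), (1, 6)]          # edges of G between those nodes
index = {p: i for i, p in enumerate(V_l)}

A_l = np.zeros((len(E_l), len(V_l)))    # one row per edge (i,j): +1 at i, -1 at j
for r, (i, j) in enumerate(E_l):
    A_l[r, index[i]] = 1.0
    A_l[r, index[j]] = -1.0

equal = np.array([0.7, 0.7, 0.7, 0.7])      # identical copies at every node
print(np.allclose(A_l @ equal, 0))          # True

unequal = np.array([0.7, 0.7, 0.7, 0.2])    # disagreement at node 6
print(np.allclose(A_l @ unequal, 0))        # False: the constraint catches it

isolated = np.array([0.7, 0.7, 0.2, 0.7])   # disagreement at node 4, which has no
print(np.allclose(A_l @ isolated, 0))       # incident edge in E_l -> True (not enforced)
```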
The node-arc incidence matrix represents a given graph with a matrix where each column corresponds to an edge (i,j) and has 1 in the ith entry, -1 in the jth entry, and zeros elsewhere. We now partition the optimization variable according to the coloring scheme: for each l = 1,...,n, x̄_l = (x̄_l^1, ..., x̄_l^C), where

    x̄_l^c = {x_l^(p)}_{p ∈ V_l ∩ C_c}   if V_l ∩ C_c ≠ ∅,     and     x̄_l^c = ∅   if V_l ∩ C_c = ∅,

and C_c is the set of nodes that have color c. Thus, x̄_l^c is the set of copies of x_l held by the nodes that have color c. If no node with color c depends on x_l, then x̄_l^c is empty. A similar notation for the columns of the matrix A_l enables us to write A_l x̄_l as Ā_l^1 x̄_l^1 + ... + Ā_l^C x̄_l^C, and thus (3) equivalently as

    minimize_{x̄^1,...,x̄^C}   Σ_{p ∈ C_1} f_p(x̄^(p)) + ... + Σ_{p ∈ C_C} f_p(x̄^(p))
    subject to                 Ā^1 x̄^1 + ... + Ā^C x̄^C = 0,                              (4)

where x̄^c = {x̄_l^c}_{l=1}^n, and Ā^c is the diagonal concatenation of the matrices Ā_1^c, Ā_2^c, ..., Ā_n^c, i.e., Ā^c = diag(Ā_1^c, Ā_2^c, ..., Ā_n^c). To better visualize the constraint in (4), note that

    Ā^c x̄^c = [ Ā_1^c x̄_1^c ; Ā_2^c x̄_2^c ; ... ; Ā_n^c x̄_n^c ]                          (5)

for each c = 1,...,C. The format of (4) is exactly the one to which the Extended ADMM applies, as explained next.

Extended ADMM. The Extended ADMM is a natural generalization of the Alternating Direction Method of Multipliers (ADMM). Given a set of closed, convex functions g_1, ..., g_C, and a set of full column rank matrices E_1, ..., E_C, all with the same number of rows, the Extended ADMM solves

    minimize_{x_1,...,x_C}   g_1(x_1) + ... + g_C(x_C)
    subject to                E_1 x_1 + ... + E_C x_C = 0.                                 (6)

It consists of iterating on k the following equations:

    x_1^{k+1} = argmin_{x_1} L_ρ(x_1, x_2^k, ..., x_C^k; λ^k)                              (7)
    x_2^{k+1} = argmin_{x_2} L_ρ(x_1^{k+1}, x_2, x_3^k, ..., x_C^k; λ^k)                   (8)
        ⋮
    x_C^{k+1} = argmin_{x_C} L_ρ(x_1^{k+1}, x_2^{k+1}, ..., x_{C-1}^{k+1}, x_C; λ^k)       (9)
    λ^{k+1}  = λ^k + ρ Σ_{c=1}^C E_c x_c^{k+1},                                           (10)

where λ is the dual variable, ρ is a positive parameter, and

    L_ρ(x; λ) = Σ_{c=1}^C g_c(x_c) + λᵀ Σ_{c=1}^C E_c x_c + (ρ/2) ‖Σ_{c=1}^C E_c x_c‖²     (11)

is the augmented Lagrangian of (6). The original ADMM is recovered whenever C = 2, i.e., when there are only two terms in the sums of (6). The following theorem gathers some known convergence results for (7)-(10).

Theorem 1 ([3], [4]). For each c = 1,...,C, let g_c : R^{n_c} → R ∪ {+∞} be closed and convex over R^{n_c} with dom g_c ≠ ∅. Let each E_c be an m × n_c matrix. Assume (6) is solvable and that either (a) C = 2 and each E_c has full column rank, or (b) C ≥ 2 and each g_c is strongly convex. Then, the sequence {(x_1^k, ..., x_C^k, λ^k)} generated by (7)-(10) converges to a primal-dual solution of (6).

It is believed that (7)-(10) converges even when C > 2, each g_c is closed and convex (not necessarily strongly convex), and each matrix E_c has full column rank. Such belief is supported by empirical evidence [], [3], and its proof remains an open problem. So far, there are only proofs for modifications of (7)-(10) that result either in a slower algorithm [5], or in algorithms not applicable to distributed scenarios [6].

Applying the Extended ADMM. The clear correspondence between (4) and (6) makes (7)-(10) directly applicable to (4). Associate a dual variable λ_l^{ij} with each constraint x_l^(i) = x_l^(j) in (3). Translating (10) component-wise, λ_l^{ij} is updated as

    λ_l^{ij,k+1} = λ_l^{ij,k} + ρ (x_l^{(i),k+1} − x_l^{(j),k+1}),                         (12)

where x_l^{(p),k+1} is the estimate of x_l at node p after iteration k. This estimate is obtained from (7)-(9), on which we focus now. This sequence yields the synchronization mentioned in Section II: nodes work sequentially according to their colors, with same-colored nodes working in parallel. In fact, each problem in (7)-(9) corresponds to a given color. Moreover, each of these problems decomposes into |C_c| problems that can be solved in parallel, each by a node with color c. For example, the copies of the nodes with color 1 are updated according to (7):

    x̄^{1,k+1} = argmin_{x̄^1}  Σ_{p ∈ C_1} f_p(x̄^(p)) + λ^kᵀ Ā^1 x̄^1 + (ρ/2) ‖Ā^1 x̄^1 + Σ_{c=2}^C Ā^c x̄^{c,k}‖².    (13)

Lemma 1 below shows that (13) is equivalent to a set of decoupled problems, one per node with color 1. Since x̄^1 consists of the copies held by the nodes with color 1, and nodes with the same color are never neighbors, none of the copies in x̄^1 appears as a neighbor estimate x_l^{(j),k} in those decoupled problems. Therefore, all nodes p in C_1 can solve in parallel the following problem:

    x̄^{(p),k+1} = argmin_{x̄^(p) = {x_l^(p)}_{l ∈ S_p}}  f_p(x̄^(p)) + Σ_{l ∈ S_p} [ Σ_{j ∈ N_p ∩ V_l} ( sign(j−p) λ_l^{pj,k} − ρ x_l^{(j),k} ) ] x_l^(p) + (ρ/2) Σ_{l ∈ S_p} D_{p,l} (x_l^(p))².    (14)

In (14), the sign function is defined as 1 for nonnegative arguments and as −1 for negative arguments. Also, D_{p,l} is the degree of node p in the subgraph G_l, i.e., the number of neighbors of node p that also depend on x_l; of course, D_{p,l} is only defined when l ∈ S_p. However, node p can solve (14) only if it knows x_l^{(j),k} and λ_l^{pj,k}, for j ∈ N_p ∩ V_l and l ∈ S_p. This is possible if, in the previous iteration, it received the respective copies of x_l from its neighbors; this is also enough for knowing λ_l^{pj,k}, although we will see later that no node needs to know each λ_l^{pj,k} individually. The proof of the following lemma, in Appendix A, shows how we obtained (14) from (13).

Lemma 1. Problems (13) and (14) are equivalent.

We just saw how (7) yields |C_1| problems with the format of (14) that can be solved in parallel by all the nodes with color 1.
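To make iterations (7)-(10) concrete before specializing them further, here is a toy numerical sketch that assumes each g_c is a strongly convex quadratic, so every partial minimization has a closed form. All data below is arbitrary; this is not the paper's implementation.

```python
# Toy Extended ADMM: minimize sum_c 0.5*||x_c - a_c||^2  s.t.  sum_c E_c x_c = 0.
import numpy as np

rng = np.random.default_rng(0)
C, m, nc = 3, 6, 2                                     # blocks, rows, block size
E = [rng.standard_normal((m, nc)) for _ in range(C)]   # generically full column rank
a = [rng.standard_normal(nc) for _ in range(C)]
x = [np.zeros(nc) for _ in range(C)]
lam = np.zeros(m)
rho = 1.0

for k in range(200):
    for c in range(C):                                 # Gauss-Seidel pass: (7)-(9)
        others = sum(E[j] @ x[j] for j in range(C) if j != c)
        # argmin 0.5||x_c - a_c||^2 + lam^T E_c x_c + (rho/2)||E_c x_c + others||^2
        H = np.eye(nc) + rho * E[c].T @ E[c]
        g = a[c] - E[c].T @ (lam + rho * others)
        x[c] = np.linalg.solve(H, g)
    lam = lam + rho * sum(E[c] @ x[c] for c in range(C))   # dual update (10)

# Constraint residual should be close to zero after convergence.
print(np.linalg.norm(sum(E[c] @ x[c] for c in range(C))))
```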
For the other colors, the analysis is the same with one minor difference: in the second term of (14) we have x_l^{(j),k+1} from the neighbors with a smaller color and x_l^{(j),k} from the neighbors with a larger color. The resulting algorithm is shown as Algorithm 1. There is a clear correspondence between the structure of Algorithm 1 and equations (7)-(10): steps 2-9 correspond to (7)-(9), and the loop in step 10 corresponds to (10). In steps 2-9, nodes work according to their colors, with same-colored nodes working in parallel. Each node computes the vector v^(p) in step 4, solves the optimization problem in step 6, and then sends the new estimates of x_l to the neighbors that also depend on x_l, for l ∈ S_p.

Algorithm 1: Algorithm for a connected variable

Initialization: for all p ∈ V and l ∈ S_p, set γ_l^{(p),1} = x_l^{(p),1} = 0; k = 1
1:  repeat
2:    for c = 1,...,C do
3:      for all p ∈ C_c [in parallel] do
4:        for all l ∈ S_p do
            v_l^{(p),k} = γ_l^{(p),k} − ρ Σ_{j ∈ N_p ∩ V_l : C(j) < c} x_l^{(j),k+1} − ρ Σ_{j ∈ N_p ∩ V_l : C(j) > c} x_l^{(j),k}
5:        end for
6:        Set x̄^{(p),k+1} as the solution of
            argmin_{x̄^(p) = {x_l^(p)}_{l ∈ S_p}}  f_p(x̄^(p)) + Σ_{l ∈ S_p} v_l^{(p),k} x_l^(p) + (ρ/2) Σ_{l ∈ S_p} D_{p,l} (x_l^(p))²
7:        For each component l ∈ S_p, send x_l^{(p),k+1} to N_p ∩ V_l
8:      end for
9:    end for
10:   for all p ∈ V and l ∈ S_p [in parallel] do
        γ_l^{(p),k+1} = γ_l^{(p),k} + ρ Σ_{j ∈ N_p ∩ V_l} ( x_l^{(p),k+1} − x_l^{(j),k+1} )
11:   end for
12:   k ← k + 1
13: until some stopping criterion is met

Note the introduction of extra notation in step 4: C(j) is the color of node j. The computation of v_l^{(p),k} in that step requires x_l^{(j),k} from the neighbors with larger colors and x_l^{(j),k+1} from the neighbors with smaller colors. While the former is obtained from the previous iteration, the latter is obtained at the current iteration, after the respective nodes execute step 7. The problem in step 6 involves the private function of node p, f_p, to which a linear and a quadratic term are added. This fulfills our requirement that all operations involving f_p be performed at node p. Note that the update of the dual variables in step 10 is different from (12). In particular, all the λ's at node p were condensed into a single dual variable γ^(p). This was done because the optimization problem (14) does not depend on the individual λ_l^{pj}'s, but only on γ_l^{(p),k} := Σ_{j ∈ N_p ∩ V_l} sign(j−p) λ_l^{pj,k}. If we replace

    λ_l^{ij,k+1} = λ_l^{ij,k} + ρ sign(j−i) ( x_l^{(i),k+1} − x_l^{(j),k+1} )               (15)

in the definition of γ_l^{(p),k}, we obtain the update of step 10. The extra sign in (15) w.r.t. (12) is necessary to account for the extension of the definition of the dual variable λ_l^{ij} to i > j (see Appendix A).

Convergence. Apart from manipulations, Algorithm 1 results from the application of the Extended ADMM to problem (4). Consequently, the conclusions of Theorem 1 apply if we prove that (4) satisfies the conditions of that theorem.

Lemma 2. Each matrix Ā^c in (4) has full column rank.

Proof: Let c be any color in {1,2,...,C}. By definition, Ā^c = diag(Ā_1^c, Ā_2^c, ..., Ā_n^c); therefore, we have to prove that each Ā_l^c has full column rank, for l = 1,2,...,n. Let then c and l be fixed. We are going to prove that (Ā_l^c)ᵀ Ā_l^c, a square matrix, has full rank, and therefore Ā_l^c has full column rank. Since the columns of A_l can be grouped by color as [Ā_l^1 Ā_l^2 ... Ā_l^C], (Ā_l^c)ᵀ Ā_l^c corresponds to a diagonal block of the matrix A_lᵀ A_l, the Laplacian matrix of the induced subgraph G_l, namely the block indexed by the nodes of color c. By assumption, in this section all induced subgraphs are connected. This means each node in G_l has at least one neighbor also in G_l and hence each entry in the diagonal of A_lᵀ A_l is greater than zero. The same happens to the entries in the diagonal of (Ā_l^c)ᵀ Ā_l^c. In fact, these are the only nonzero entries of (Ā_l^c)ᵀ Ā_l^c, as this is a diagonal matrix: it corresponds to the Laplacian entries of nodes that have the same color, which are never neighbors. Therefore, (Ā_l^c)ᵀ Ā_l^c has full rank. ∎

The following corollary, whose proof is omitted, is a straightforward consequence of Theorem 1 and Lemma 2.

Corollary 1. Let Assumptions 1-3 hold and let the variable be connected. Let also one of the following conditions hold: (a) the network is bipartite, i.e., C = 2, or (b) each Σ_{p ∈ C_c} f_p(x_{S_p}) is strongly convex, c = 1,...,C. Then, the sequence {x̄^{(p),k}} at node p, produced by Algorithm 1, converges to x*_{S_p}, where x* solves (2).

As stated before, it is believed that the Extended ADMM converges for C > 2 even when none of the g_c's is strongly convex (just closed and convex). However, it is required that each E_c have full column rank.
This translates into the belief that Algorithm 1 converges for any network, provided each f_p is closed and convex and each matrix Ā^c in (4) has full column rank. The last condition is the content of Lemma 2. (We are implicitly excluding the pathological case where a component x_l appears in only one node, say node p; this would lead to a Laplacian matrix A_lᵀ A_l equal to 0. This case is easily addressed by redefining f_p, the function at node p, as f̃_p = inf_{x_l} f_p(..., x_l, ...).)

Comparison with other algorithms. Algorithm 1 is a generalization of D-ADMM []: by violating Assumption 1 and making S_p = {1,...,n} for all p, the variable becomes global and Algorithm 1 becomes exactly D-ADMM. This is a true generalization, since Algorithm 1 cannot be obtained from D-ADMM. This fact is not surprising, because Algorithm 1 was derived using the same set of ideas as D-ADMM, but adapted to a partial variable. Each iteration of Algorithm 1 (resp. D-ADMM) involves communicating Σ_{p=1}^P |S_p| (resp. nP) numbers. Under Assumption 1, Σ_{p=1}^P |S_p| < nP, and thus there is a clear per-iteration gain in solving (2) with Algorithm 1. Although Assumption 1 can be ignored, in the sense that Algorithm 1 still works without it, we adopted that assumption to make clear the type of problems addressed in this paper.

We mentioned before that the algorithm in [] is the only one we found in the literature that efficiently solves (2) in the same scenarios as Algorithm 1. For comparison purposes, we show it as Algorithm 2. Algorithms 1 and 2 are very similar in format, although their derivations are considerably different. In particular, Algorithm 2 is derived from the 2-block ADMM and thus it has stronger convergence guarantees. Namely, it requires neither the network to be bipartite nor any function to be strongly convex (cf. Corollary 1).
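In both Algorithm 1 and the algorithm of [] shown next, when f_p is a convex quadratic (as in the MPC and simple network-flow models of Section V) the per-node problem reduces to a small linear system. A minimal sketch with made-up data, assuming f_p(y) = 0.5 yᵀP y + qᵀy:

```python
# One node's subproblem (step 6 of Algorithm 1) for a quadratic f_p.
import numpy as np

P = np.array([[2.0, 0.3], [0.3, 1.5]])   # Hessian of f_p (positive definite here)
q = np.array([0.1, -0.4])
D = np.array([2.0, 1.0])                  # D_{p,l}: number of neighbors sharing each component
v = np.array([0.05, -0.2])                # v_l^{(p),k} computed in step 4
rho = 1.0

# minimize 0.5*y^T P y + q^T y + v^T y + (rho/2) * sum_l D_l * y_l^2
H = P + rho * np.diag(D)
y = np.linalg.solve(H, -(q + v))
print(y)
```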

Algorithm 2: The algorithm of []

Initialization: for all p ∈ V and l ∈ S_p, set γ_l^{(p),1} = x_l^{(p),1} = 0; k = 1
1:  repeat
2:    for all p ∈ V [in parallel] do
3:      for all l ∈ S_p do
          v_l^{(p),k} = γ_l^{(p),k} − (ρ/2) ( D_{p,l} x_l^{(p),k} + Σ_{j ∈ N_p ∩ V_l} x_l^{(j),k} )
4:      end for
5:      Set x̄^{(p),k+1} as the solution of
          argmin_{x̄^(p) = {x_l^(p)}_{l ∈ S_p}}  f_p(x̄^(p)) + Σ_{l ∈ S_p} v_l^{(p),k} x_l^(p) + (ρ/2) Σ_{l ∈ S_p} D_{p,l} (x_l^(p))²
6:      For each component l ∈ S_p, send x_l^{(p),k+1} to N_p ∩ V_l
7:    end for
8:    for all p ∈ V and l ∈ S_p [in parallel] do
        γ_l^{(p),k+1} = γ_l^{(p),k} + (ρ/2) Σ_{j ∈ N_p ∩ V_l} ( x_l^{(p),k+1} − x_l^{(j),k+1} )
9:    end for
10:   k ← k + 1
11: until some stopping criterion is met

Also, Algorithm 2 does not require any coloring scheme; instead, all nodes perform the same tasks in parallel. Note also that the updates of v^(p) and γ^(p) differ between the two algorithms. In the same way that Algorithm 1 was derived using the techniques of D-ADMM, Algorithm 2 was derived using the techniques of [7]. And, as in the experimental results of [], [9], we will observe in Section VI that Algorithm 1 always requires fewer communications than Algorithm 2. Next, we propose a modification to Algorithms 1 and 2 that makes them applicable to a non-connected variable.

IV. NON-CONNECTED CASE

So far, we have assumed a connected variable in (2). In this section, the variable will be non-connected, i.e., it will have at least one component that induces a non-connected subgraph. In this case, problems (2) and (3) are no longer equivalent and, therefore, the derivations of the previous section do not apply. We propose a small trick to make these problems equivalent. Let x_l be a component whose induced subgraph G_l = (V_l, E_l) is non-connected. Then, the constraints x_l^(i) = x_l^(j), (i,j) ∈ E_l, in (3) fail to enforce equality on all the copies of x_l. To overcome this, we propose creating a virtual path connecting the disconnected components of G_l. This will allow the nodes in G_l to reach an agreement on an optimal value for x_l. Since our goal is to minimize communications, we would like to find the shortest path between these disconnected components, that is, to find an optimal Steiner tree.

Steiner tree problem. Let G = (V, E) be an undirected graph and let R ⊆ V be a set of required nodes. A Steiner tree is any tree in G that contains the required nodes, i.e., it is an acyclic connected graph (T, F) ⊆ G such that R ⊆ T. The nodes in the tree that are not required are called Steiner nodes and are denoted S := T \ R. In the Steiner tree problem, each edge (i,j) ∈ E has an associated cost c_ij, and the goal is to find a Steiner tree whose edges have minimal total cost. This is exactly our problem if we make c_ij = 1 for all edges and R = V_l. The Steiner tree problem is illustrated in Fig. 3, where the required nodes are black and the Steiner nodes are striped.

[Figure 3: Example of an optimal Steiner tree: black nodes are required and striped nodes are Steiner nodes.]

Unfortunately, computing optimal Steiner trees is NP-hard [7]. There are, however, many heuristic algorithms, some even with approximation guarantees. The Steiner tree problem can be formulated as [8]

    minimize_{{z_ij}_{(i,j) ∈ E}}   Σ_{(i,j) ∈ E} c_ij z_ij
    subject to                       Σ_{i ∈ U, j ∉ U} z_ij ≥ 1,   for all U with 0 < |U ∩ R| < |R|
                                     z_ij ∈ {0,1},  (i,j) ∈ E,                              (16)

where U in the first constraint is any subset of nodes that separates at least two required nodes. The optimization variable is constrained to be binary, and an optimal value z*_ij = 1 means that edge (i,j) was selected for the Steiner tree. Let h(z) := Σ_{(i,j) ∈ E} c_ij z_ij denote the objective of (16). We say that an algorithm for (16) has an approximation ratio of α if it produces a feasible point ẑ such that h(ẑ) ≤ α h(z*), for any problem instance. For example, the primal-dual algorithm for combinatorial problems [8], [9] has an approximation ratio of 2. This ratio has been decreased in a series of works, the smallest one being 1 + ln(3)/2 ≈ 1.55, provided by [30].
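In practice, the approximate Steiner trees needed by the preprocessing step described next can be obtained with off-the-shelf tools. A minimal sketch using NetworkX's 2-approximation (the graph and the set V_l below are hypothetical):

```python
# Connect the required nodes of a disconnected induced subgraph with a Steiner tree.
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

G = nx.barabasi_albert_graph(12, 2, seed=1)   # hypothetical communication network
V_l = [0, 5, 9, 11]                           # nodes whose functions depend on x_l

T_l = steiner_tree(G, V_l)                    # tree containing all required nodes
S_l = set(T_l.nodes()) - set(V_l)             # Steiner nodes: they will relay copies of x_l

# Enlarged induced subgraph G'_l = (V'_l, E_l ∪ F_l), as used in problem (17) below.
V_l_prime = set(T_l.nodes())
E_l_prime = set(G.subgraph(V_l).edges()) | set(T_l.edges())
print(sorted(S_l), sorted(E_l_prime))
```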
Algorithm generalization. To make Algorithms 1 and 2 applicable to a non-connected variable, we propose the following preprocessing step. For every component x_l that induces a disconnected subgraph G_l = (V_l, E_l), compute a Steiner tree (T_l, F_l) ⊆ G using V_l as the set of required nodes. Let S_l := T_l \ V_l denote the Steiner nodes of that tree. The functions of these Steiner nodes do not depend on x_l, i.e., l ∉ S_p for all p ∈ S_l. Define a new induced graph as G'_l = (V'_l, E'_l), with V'_l := T_l and E'_l := E_l ∪ F_l. Then, we can create copies of x_l in all nodes of V'_l and write (2) equivalently as

    minimize_{{x̄_l}_{l=1}^n}   f_1(x̄^(1)) + f_2(x̄^(2)) + ... + f_P(x̄^(P))
    subject to                  x_l^(i) = x_l^(j),  (i,j) ∈ E'_l,  l = 1,...,n,            (17)

where x̄_l := {x_l^(p)}_{p ∈ V'_l} denotes the set of all copies of x_l, and {x̄_l}_{l=1}^n, the optimization variable, represents the set of all copies. Note that the function at node p remains unchanged: it only depends on x̄^(p) := {x_l^(p)}_{l ∈ S_p}, although node p may now hold more copies, namely those of the components in S'_p, the set of components for which node p is a Steiner node.

Algorithm 3: Algorithm for a non-connected variable

Preprocessing (centralized):
1:  Set S'_p = ∅ for all p ∈ V, and V'_l = V_l for all l ∈ {1,...,n}
2:  for all l ∈ {1,...,n} such that x_l is non-connected do
3:    Compute a Steiner tree (T_l, F_l), where V_l are the required nodes
4:    Set V'_l = T_l and S_l := T_l \ V_l (Steiner nodes)
5:    For all p ∈ S_l, set S'_p = S'_p ∪ {x_l}
6:  end for

Main algorithm (distributed):
Initialization: set γ_l^{(p),1} = x_l^{(p),1} = 0 for all l ∈ S_p ∪ S'_p and p ∈ V; k = 1
7:  repeat
8:    for c = 1,...,C do
9:      for all p ∈ C_c [in parallel] do
10:       for all l ∈ S_p ∪ S'_p do
            v_l^{(p),k} = γ_l^{(p),k} − ρ Σ_{j ∈ N_p ∩ V'_l : C(j) < c} x_l^{(j),k+1} − ρ Σ_{j ∈ N_p ∩ V'_l : C(j) > c} x_l^{(j),k}
11:       end for
12:       Set x̄^{(p),k+1} as the solution of
            argmin_{{x_l^(p)}_{l ∈ S_p ∪ S'_p}}  f_p({x_l^(p)}_{l ∈ S_p}) + Σ_{l ∈ S_p ∪ S'_p} v_l^{(p),k} x_l^(p) + (ρ/2) Σ_{l ∈ S_p ∪ S'_p} D_{p,l} (x_l^(p))²
13:       For each l ∈ S_p ∪ S'_p, send x_l^{(p),k+1} to N_p ∩ V'_l
14:     end for
15:   end for
16:   for all p ∈ V and l ∈ S_p ∪ S'_p [in parallel] do
        γ_l^{(p),k+1} = γ_l^{(p),k} + ρ Σ_{j ∈ N_p ∩ V'_l} ( x_l^{(p),k+1} − x_l^{(j),k+1} )
17:   end for
18:   k ← k + 1
19: until some stopping criterion is met

Of course, when a component x_l is connected, we set G'_l = G_l; also, if a node p is not a Steiner node for any component, S'_p = ∅. If we repeat the analysis of the previous section replacing problem (3) by (17), we obtain Algorithm 3. Algorithm 3 has two parts: a preprocessing step, which is new, and the main algorithm, which is essentially Algorithm 1 with some small adaptations. We assume the preprocessing step can be done in a centralized way, before the execution of the main algorithm. In fact, the preprocessing only requires knowing the communication network G and the nodes' dependencies, but not the specific functions f_p. The main algorithm is similar to Algorithm 1 except that each node, in addition to estimating the components its function depends on, also estimates the components for which it is a Steiner node. The additional computations are, however, very simple: if node p is a Steiner node for component x_l, it updates it as x_l^{(p),k+1} = −v_l^{(p),k}/(ρ D_{p,l}) in step 12; since f_p does not depend on x_l, the problem corresponding to the update of x_l becomes a quadratic problem with a closed-form solution. Note that D_{p,l} is now defined as the degree of node p in the subgraph G'_l. The steps we took to generalize Algorithm 1 to a non-connected variable apply in the same way to Algorithm 2.

V. APPLICATIONS

In this section we describe how the proposed algorithms can be used to solve distributed MPC and network flow problems.

Distributed MPC. MPC is a popular control strategy for discrete-time systems [3]. It assumes a state-space model for the system, where the state at time t, here denoted by x[t] ∈ R^n, evolves according to x[t+1] = Θ_t(x[t], u[t]), where u[t] ∈ R^m is the input at time t and Θ_t : R^n × R^m → R^n is a map describing the system dynamics at each time instant t. Given a time horizon T, an MPC implementation consists of measuring the state at time t = 0, computing the desired states and inputs for the next T time steps, applying u[0] to the system, setting t = 0, and repeating the process. The second step, i.e., computing the desired states and inputs for a given time horizon T, is typically addressed by solving

    minimize_{x̄, ū}   Φ(x[T]) + Σ_{t=0}^{T−1} Ψ_t(x[t], u[t])
    subject to          x[t+1] = Θ_t(x[t], u[t]),  t = 0,...,T−1
                        x[0] = x^0,                                                        (18)

where the variable is (x̄, ū) := ({x[t]}_{t=0}^T, {u[t]}_{t=0}^{T−1}). While Φ penalizes deviations of the final state x[T] from our goal, Ψ_t usually measures, for each t = 0,...,T−1, some type of energy consumption that we want to minimize. Regarding the constraints of (18), the first one enforces the state to follow the system dynamics, and the second one encodes the initial measurement x^0.
We solve (18) in the following distributed scenario. There is a set of P systems that communicate through a communication network G = (V, E). Each system has a state x_p[t] ∈ R^{n_p} and a local input u_p[t] ∈ R^{m_p}, where n_1 + ... + n_P = n and m_1 + ... + m_P = m. The state of system p evolves as x_p[t+1] = Θ_t^p({x_j[t], u_j[t]}_{j ∈ Ω_p}), where Ω_p ⊆ V is the set of nodes whose state and/or input influences x_p (we assume {p} ⊆ Ω_p for all p). Note that, in contrast with what is usually assumed, Ω_p is not necessarily a subset of the neighbors of node p. In other words, two systems that influence each other may be unable to communicate directly. This is illustrated in Fig. 4(b) where, for example, the state/input of node 3 influences the state evolution of a node it cannot communicate with directly (dotted arrow, with no corresponding solid communication link). Finally, we assume the functions Φ and Ψ_t in (18) decompose, respectively, as Φ(x[T]) = Σ_{p=1}^P Φ_p({x_j[T]}_{j ∈ Ω_p}) and Ψ_t(x[t], u[t]) = Σ_{p=1}^P Ψ_t^p({x_j[t], u_j[t]}_{j ∈ Ω_p}), where Φ_p and Ψ_t^p are both associated with node p. In sum, we solve

    minimize_{x̄, ū}   Σ_{p=1}^P [ Φ_p({x_j[T]}_{j ∈ Ω_p}) + Σ_{t=0}^{T−1} Ψ_t^p({x_j[t], u_j[t]}_{j ∈ Ω_p}) ]
    subject to          x_p[t+1] = Θ_t^p({x_j[t], u_j[t]}_{j ∈ Ω_p}),  t = 0,...,T−1
                        x_p[0] = x_p^0,
                        p = 1,...,P,                                                        (19)

where x_p^0 is the initial measurement at node p. The variable in (19) is (x̄, ū) := ({x̄_p}_{p=1}^P, {ū_p}_{p=1}^P), where x̄_p := {x_p[t]}_{t=0}^T and ū_p := {u_p[t]}_{t=0}^{T−1}.

[Figure 4: Two MPC scenarios. Solid lines represent links in the communication network and dotted arrows represent system interactions. (a) Connected variable where each induced subgraph is a star. (b) Non-connected variable: node 5 is influenced by (x̄_1, ū_1), but none of its neighbors are.]

Problem (19) can be written as (2) by making

    f_p({x̄_j, ū_j}_{j ∈ Ω_p}) = Φ_p({x_j[T]}_{j ∈ Ω_p}) + i_{x_p[0] = x_p^0}(x̄_p) + Σ_{t=0}^{T−1} [ Ψ_t^p({x_j[t], u_j[t]}_{j ∈ Ω_p}) + i_{Γ_t^p}({x̄_j, ū_j}_{j ∈ Ω_p}) ],

where i_S is the indicator function of the set S, i.e., i_S(x) = 0 if x ∈ S and i_S(x) = +∞ otherwise, and Γ_t^p := { {x̄_j, ū_j}_{j ∈ Ω_p} : x_p[t+1] = Θ_t^p({x_j[t], u_j[t]}_{j ∈ Ω_p}) }. Fig. 4(a) illustrates the case where Ω_p ⊆ N_p ∪ {p}, i.e., the state of node p is influenced by its own state/input and by the states/inputs of the systems with which it can communicate. Using our terminology, this corresponds to a connected variable where each induced subgraph is a star: the center of the star induced by (x̄_p, ū_p) is node p. Particular cases of this model have been considered, for example, in [3], [3], [33], whose solutions are heuristics, and in [8], [9], [0], [], whose solutions are optimization-based. The model we propose here is significantly more general, since it can handle scenarios where interacting nodes do not necessarily need to communicate, and even scenarios with a non-connected variable. Both cases are shown in Fig. 4(b). For example, the subgraph induced by (x̄_3, ū_3) consists of the nodes {1,2,3,4} and is connected. (The reference for connectivity is always the communication network, which, in the plots, is represented by solid lines.) Not all of these nodes, however, can communicate directly with node 3; this is an example of an induced subgraph that is not a star. On the other hand, the subgraph induced by (x̄_1, ū_1) consists of the nodes {1,2,3,5}. This subgraph is not connected, which implies that the optimization variable is non-connected. Situations like these can be useful in scenarios where communication links are expensive or hard to establish. For instance, MPC can be used for temperature regulation in buildings [33], where making wired connections between rooms, here viewed as systems, can be expensive. In that case, two adjacent rooms whose temperatures influence each other may not be able to communicate directly. The proposed MPC model handles this scenario easily.

MPC model for the experiments. We now present a simple linear MPC model, which will be used in the experiments of Section VI. Although simple, this model illustrates all the cases considered above. We assume the systems are coupled through their inputs, i.e., x_p[t+1] = A_p x_p[t] + Σ_{j ∈ Ω_p} B_pj u_j[t], where A_p ∈ R^{n_p × n_p} and each B_pj ∈ R^{n_p × m_j} are arbitrary matrices, known only at node p. Also, we assume Φ_p and Ψ_t^p in (19) are, respectively, Φ_p({x_j[T]}_{j ∈ Ω_p}) = x_p[T]ᵀ Q_p^f x_p[T] and Ψ_t^p({x_j[t]}_{j ∈ Ω_p}) = x_p[t]ᵀ Q_p x_p[t] + u_p[t]ᵀ R_p u_p[t], where Q_p and Q_p^f are positive semidefinite matrices, and R_p is positive definite. Problem (19) then becomes

    minimize_{x_1,...,x_P, u_1,...,u_P}   Σ_{p=1}^P ( u_pᵀ R̄_p u_p + x_pᵀ Q̄_p x_p )
    subject to                             x_p = C_p {u_j}_{j ∈ Ω_p} + D_p^0,  p = 1,...,P,  (20)

where, for each p, x_p = (x_p[1], ..., x_p[T]), u_p = (u_p[0], ..., u_p[T−1]), and

    Q̄_p = diag( I_{T−1} ⊗ Q_p , Q_p^f ),     R̄_p = I_T ⊗ R_p,

    C_p = [ B̄_p                0              ...   0
            A_p B̄_p            B̄_p            ...   0
            ⋮                   ⋮                     ⋮
            A_p^{T−1} B̄_p      A_p^{T−2} B̄_p  ...   B̄_p ],

    D_p^0 = ( A_p x_p^0 , A_p² x_p^0 , ... , A_p^T x_p^0 ).

In the entries of the matrix C_p, B̄_p is the horizontal concatenation of the matrices B_pj, for all j ∈ Ω_p.
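A small NumPy sketch (arbitrary dimensions, made-up data) of how the matrices C_p and D_p^0 above can be assembled from A_p and B̄_p; this illustrates the stacked dynamics only, and is not the authors' code.

```python
# Build the lifted matrices C_p and D_p^0 of problem (20) for x[t+1] = A x[t] + Bbar u[t].
import numpy as np

T, n_p, m = 5, 3, 2                          # horizon, state size, total coupled-input size
rng = np.random.default_rng(0)
A_p = rng.standard_normal((n_p, n_p))
Bbar_p = rng.standard_normal((n_p, m))       # horizontal concatenation of the B_pj
x0_p = rng.standard_normal(n_p)

C_p = np.zeros((T * n_p, T * m))
D_p0 = np.zeros(T * n_p)
for t in range(T):                                        # block row for x_p[t+1]
    D_p0[t * n_p:(t + 1) * n_p] = np.linalg.matrix_power(A_p, t + 1) @ x0_p
    for s in range(t + 1):                                # block column for u[s]
        blk = np.linalg.matrix_power(A_p, t - s) @ Bbar_p
        C_p[t * n_p:(t + 1) * n_p, s * m:(s + 1) * m] = blk

# Then x_p = C_p @ u + D_p0 stacks x_p[1], ..., x_p[T] for the stacked inputs u.
```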
One of the advantages of the model we are using is that all the variables x_p can be eliminated from (20), yielding

    minimize_{u_1,...,u_P}   Σ_{p=1}^P [ ({u_j}_{j ∈ Ω_p})ᵀ E_p ({u_j}_{j ∈ Ω_p}) + w_pᵀ {u_j}_{j ∈ Ω_p} ],    (21)

where each E_p is obtained by summing R̄_p with C_pᵀ Q̄_p C_p in the appropriate entries, and w_p = 2 C_pᵀ Q̄_p D_p^0. Our model thus leads to a very simple problem. In a centralized scenario, where all matrices E_p and all vectors w_p are known in the same location, the solution of (21) can be computed by solving a linear system. Likewise, the problem in step 6 of Algorithm 1 (and steps 5 and 12 of Algorithms 2 and 3, respectively) boils down to solving a linear system.

[Figure 5: A network flow problem: each arc (i,j) has a variable x_ij representing the flow from node i to node j and a cost function φ_ij(x_ij). The arcs in the example are (1,2), (1,6), (2,3), (2,4), (4,3), (4,5), (4,6), (5,7), and (6,7).]

Network flow. A network flow problem is typically formulated on a network with arcs, or directed edges, where an arc from node i to node j indicates a flow in that direction. In the example of Fig. 5, there can be a flow from node 1 to node 6, but not the opposite. Every arc (i,j) ∈ A has an associated non-negative variable x_ij representing the

amount of flow in that arc from node i to node j, and a cost function φ_ij(x_ij) that depends only on x_ij. The goal is to minimize the sum of all the costs, while satisfying the laws of conservation of flow. External flow can be injected into or extracted from a node, making that node a source or a sink, respectively. For example, in Fig. 5, node 1 can only be a source, since it has only outward arcs; in contrast, nodes 3 and 7 can only be sinks, since they have only inward arcs. The remaining nodes may or may not be sources or sinks. We represent the network of flows with the node-arc incidence matrix B, where the column associated with an arc from node i to node j has a 1 in the ith entry, a −1 in the jth entry, and zeros elsewhere. We assume the components of the variable x and the columns of B are in lexicographic order. For example, x = (x_12, x_16, x_23, x_24, x_43, x_45, x_46, x_57, x_67) would be the variable in Fig. 5. The laws of conservation of flow are expressed as Bx = d, where d ∈ R^P is the vector of external inputs/outputs. The entries of d sum up to zero, and d_p < 0 (resp. d_p > 0) if node p is a source (resp. sink). When node p is neither a source nor a sink, d_p = 0. The problem we solve is

    minimize_x   Σ_{(i,j) ∈ A} φ_ij(x_ij)
    subject to   Bx = d
                 x ≥ 0,                                                                    (22)

which can be written as (2) by setting

    f_p({x_pj}_{(p,j) ∈ A}, {x_jp}_{(j,p) ∈ A}) = Σ_{(p,j) ∈ A} φ_pj(x_pj) + Σ_{(j,p) ∈ A} φ_jp(x_jp) + i_{b_pᵀ x = d_p}({x_pj}_{(p,j) ∈ A}, {x_jp}_{(j,p) ∈ A}),

where b_pᵀ is the pth row of B. In words, f_p consists of the sum of the functions associated with all arcs involving node p, plus the indicator function of the set {x : b_pᵀ x = d_p}. This indicator function enforces the conservation of flow at node p and only involves the variables {x_pj}_{(p,j) ∈ A} and {x_jp}_{(j,p) ∈ A}. Regarding the communication network G = (V, E), we assume it consists of the underlying undirected network. This means that nodes i and j can exchange messages directly, i.e., (i,j) ∈ E for i < j, if there is an arc between these nodes, i.e., (i,j) ∈ A or (j,i) ∈ A. Therefore, in contrast with the flows, messages do not necessarily need to be exchanged along the direction of the arcs. In fact, messages and flows might represent different physical quantities: think, for example, of a network of water pipes controlled by actuators at each pipe junction; while the pipes might enforce a direction on the flow of water, for example by using special valves, there is no reason to impose the same constraint on the electrical signals exchanged by the actuators. In problem (22), the subgraph induced by x_ij, (i,j) ∈ A, consists only of nodes i and j and the edge connecting them. This makes the variable in (22) connected and star-shaped. Next we discuss the functions φ_ij used in our simulations.

Models for the experiments. We considered two instances of (22): a simple instance and a complex instance. While the simple instance makes all the algorithms we consider applicable, the more complex instance can be solved only by a subset of the algorithms, but it provides a more realistic application. The simple instance uses φ_ij(x_ij) = ½(x_ij − a_ij)², where a_ij > 0, as the cost function for each arc (i,j), and no constraints besides the conservation of flow, i.e., we drop the non-negativity constraint x ≥ 0 in (22). The reason for dropping this constraint was to make the algorithm in [3] applicable. The other instance we consider is [4, Ch.7]:

    minimize_{x = {x_ij}_{(i,j) ∈ A}}   Σ_{(i,j) ∈ A} x_ij / (c_ij − x_ij)
    subject to                           Bx = d
                                         0 ≤ x_ij ≤ c_ij,  (i,j) ∈ A,                      (23)

where c_ij represents the capacity of the arc (i,j) ∈ A. Problem (23) has the same format as (22) except for the additional capacity constraints x_ij ≤ c_ij, and it models overall system delays in multicommodity flow problems [4, Ch.7].
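Since the simple instance of (22) is an equality-constrained quadratic program, its centralized solution (used later in Section VI as the reference x*) amounts to one solve of the KKT linear system. A tiny sketch with made-up data:

```python
# Centralized solution of: minimize 0.5*||x - a||^2  s.t.  B x = d  (no x >= 0 constraint).
import numpy as np

# Node-arc incidence matrix of a 3-node, 3-arc example: arcs (1,2), (1,3), (2,3).
B = np.array([[ 1.0,  1.0,  0.0],
              [-1.0,  0.0,  1.0],
              [ 0.0, -1.0, -1.0]])
a = np.array([1.0, 2.0, 1.5])            # targets a_ij in phi_ij(x) = 0.5*(x - a_ij)^2
d = np.array([-1.0, 0.0, 1.0])           # inject one unit at node 1, extract it at node 3

# KKT system: [I  B^T; B  0] [x; nu] = [a; d].  B has a null space (rows sum to zero),
# so we use a least-squares solve; the system is consistent, hence the solution is exact.
P_nodes, A_arcs = B.shape
KKT = np.block([[np.eye(A_arcs), B.T],
                [B, np.zeros((P_nodes, P_nodes))]])
sol = np.linalg.lstsq(KKT, np.concatenate([a, d]), rcond=None)[0]
x_star = sol[:A_arcs]
print(x_star, np.allclose(B @ x_star, d))
```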
If we apply Algorithm 1 to problem (23), node p has to solve at each iteration

    minimize_{y = (y_1,...,y_{D_p})}   Σ_{i=1}^{D_p} [ y_i/(c_i − y_i) + v_i y_i + a_i y_i² ]
    subject to                          b_pᵀ y = d_p
                                        0 ≤ y ≤ c,                                          (24)

where each y_i corresponds to x_pj if (p,j) ∈ A, or to x_jp if (j,p) ∈ A. Since projecting a point onto the constraint set of (24) is simple (see [34]), (24) can be solved efficiently with a projected gradient method. In fact, we use the algorithm in [35], which is based on the Barzilai-Borwein method. In sum, we solve two instances of (22): a simple one, where φ_ij(x_ij) = ½(x_ij − a_ij)² and with no constraints besides Bx = d, and (23), a more complex but more realistic one.

VI. EXPERIMENTAL RESULTS

In this section we show experimental results of the proposed algorithms solving MPC and network flow problems. We start with network flows because they are simpler and more algorithms are applicable; they will also illustrate the inefficiency of solving (2) with an algorithm designed for the global problem (1).

Network flows: experimental setup. As mentioned in the previous section, we solved two instances of (22). In both instances, we used a network with 2000 nodes and 3996 edges, generated randomly according to the Barabási-Albert model [37] with parameter 2, using the NetworkX Python package [38]. We made the simplifying assumption that between any pair of nodes there can be at most one arc, as shown in Fig. 5. Hence, the size of the variable x in (22) is equal to the number of edges |E|, in this case 3996. The generated network had a diameter of 8 and an average node degree of 3.996, and it was colored with 3 colors using Sage [39]. This gives us the underlying undirected communication network. Then, we assigned a random direction to each edge, with equal probability for both directions, creating a directed network like the one in Fig. 5. We also assigned to each edge a number drawn randomly from the set {10, 20, 30, 40, 50, 100}. The probabilities were 0.2 for the first four elements and 0.1 for 50 and 100. These numbers played the role of the a_ij's in the simple instance of (22) and of the capacities c_ij in (23).

[Figure 6: Results for the network flow problems on a network with 2000 nodes and 3996 edges; relative error vs. number of communication steps. (a) Simple instance of (22), where φ_ij(x_ij) = ½(x_ij − a_ij)² and there are no non-negativity constraints. (b) Problem (23).]

To generate the vector d, or, in other words, to determine which nodes are sources or sinks, we proceeded as follows. For each k = 1,...,100, we picked a source s_k uniformly at random out of the set of 2000 nodes and then picked a sink r_k uniformly at random out of the set of nodes reachable from s_k. For example, if we were considering the network of Fig. 5 and picked s_k = 4 as a source node, the set of its reachable nodes would be {3, 5, 6, 7}. Next, we added to the entries s_k and r_k of d the values −f_k/100 and f_k/100, respectively, where f_k is a number drawn randomly exactly as c_ij or a_ij. This corresponds to injecting a flow of quantity f_k/100 at node s_k and extracting the same quantity at node r_k. After repeating this process 100 times, we obtained the vector d. To assess the error of each algorithm, we computed the solutions x* of the instances of (22) in a centralized way. The simple instance of (22) considers φ_ij(x_ij) = ½(x_ij − a_ij)² and ignores the constraint x ≥ 0; thus, it is a simple quadratic program and has a closed-form solution obtained by solving a linear system. Similarly, the problem Algorithm 1 (resp. Algorithm 2) has to solve in step 6 (resp. step 5) boils down to solving a linear system. To compute the solution of (23), the complex instance of (22), we used CVXOPT [40]. The plots we show depict the relative error on the primal variable, ‖x^k − x*‖ / ‖x*‖, where x^k is the concatenation of the estimates at all nodes, versus the number of communication steps. A communication step (CS) consists of all nodes communicating their current estimates to their neighbors. That is, in each CS, information flows on each edge in both directions and, hence, the total number of CSs is proportional to the total number of communications. All the algorithms we compared, discussed next, have a tuning parameter: ρ for the ADMM-based algorithms (cf. Algorithms 1 and 2), a Lipschitz constant L for a gradient-based algorithm, and a stepsize α for a Newton-based algorithm. Suppose we selected ρ for an ADMM-based algorithm. We say that ρ has precision γ if both ρ − γ and ρ + γ lead to worse results for that algorithm. A similar definition is used for L and α. We compared Algorithm 1, henceforth denoted Alg. 1, against the ADMM-based algorithms in [0, 7.] and [] (recall that Algorithm 2 describes []), Nesterov's method [36], the distributed Newton method proposed in [3], and D-ADMM [9]. For network flow problems, the algorithms in [0, 7.] and [] coincide, i.e., they become exactly the same algorithm. This is not surprising since both are based on the same algorithm: the 2-block ADMM. All the ADMM-based algorithms, including Alg. 1, take 1 CS per iteration. The work in [3], besides proposing a distributed Newton method, also describes the application of the gradient method to the dual of (22). Here, instead of applying the simple gradient method, we apply Nesterov's method [36], which can be applied under the same conditions, has a better bound on the convergence rate, and is known to converge faster in practice. However, gradient methods, including Nesterov's method, require an objective with a Lipschitz-continuous gradient.
While this is the case for the objective of the dual of the simple instance of (22), the same does not happen for the objective of the dual of (23). Therefore, in the latter case, we had to estimate a Lipschitz constant L. Similarly to the ADMM-based algorithms, each iteration of a gradient algorithm takes 1 CS. Regarding the distributed Newton algorithm in [3], we implemented it with the parameter N, the order of the approximation used in computing the Newton direction, and a fixed stepsize α. With this implementation, each iteration takes 3 CSs. Finally, D-ADMM [9] is currently the most communication-efficient algorithm for the global problem (1). As such, it makes all the nodes compute the full solution x*, which has dimension 3996 in this case. Thus, each message exchanged in one CS of D-ADMM is 3996 times larger than the messages exchanged by the other algorithms.

Network flows: results. The results for the simple instance of (22) are shown in Fig. 6(a). Of all the algorithms, Alg. 1 required the least number of CSs to achieve any relative error between 1 and 10^{-4}. The second best were the algorithms of [0] and [], whose lines coincide because they become the same algorithm when applied to network flows. Nesterov's method [36] and the Newton-based method [3] had a performance very similar to each other, but worse than the ADMM-based algorithms.

However, D-ADMM [9], which is also ADMM-based but solves the global problem (1) instead, was the algorithm with the worst performance. Note that, in addition to requiring many more CSs than any other algorithm, each message exchanged by [9] is 3996 times larger than a message exchanged by any other algorithm. This clearly shows that if we want to derive communication-efficient algorithms, we have to explore the structure of (2). Finally, we mention that the same value of ρ was used for all ADMM-based algorithms in these experiments, the Lipschitz constant L was 70 (precision 5), and the stepsize α was 0.4.

Fig. 6(b) shows the results for (23). In this case, we were not able to make the algorithm in [3] converge (in fact, it is not guaranteed to converge for this problem). It is visible in Fig. 6(b) that this problem is harder to solve, since all algorithms required more CSs to solve it. Again, Alg. 1 was the algorithm with the best performance. This time we did not find any choice of L that made Nesterov's algorithm [36] achieve an error of 10^{-4} in less than 1000 CSs; the best result we obtained was for L = 5000. The parameter ρ was 0.08 for Alg. 1 and 0.2 for [], [0], both computed with precision 0.02.

Table I
STATISTICS FOR THE NETWORKS USED IN MPC

Name  Source  # Nodes  # Edges  Diam.  # Colors  Av. Deg.
A     [37]    100      196      6      3         3.92
B     [4]     4941     6594     46     6         2.67

MPC: experimental setup. For the MPC experiments we used two networks of very different sizes. One network, which we call A, has 100 nodes and 196 edges, and was generated the same way as the network for the network flow experiments: with a Barabási-Albert model [37] with parameter 2. The other network, named B, has 4941 nodes and 6594 edges and represents the topology of the Western States Power Grid [4], obtained in [4]. The diameter, the number of colors used, and the average degree of these networks are shown in Table I. For coloring the networks, we used Sage [39]. We solved the MPC problem (21) and, to illustrate all the particular cases of a variable for (2), we created several types of data. For all data types, the size of the state (resp. input) at each node was always n_p = 3 (resp. m_p = 1), and the time horizon was T = 5. Since (21) has a variable of size m_p T P, network A implied a variable of size 500 and network B implied a variable of size 24705. With network A, we generated the matrices A_p so that each subsystem could be unstable; namely, we drew each of their entries from a normal distribution. With network B, we proceeded the same way, but then shrunk the eigenvalues of each A_p to the interval [−1, 1], hence making each subsystem stable. All matrices B_pj were generated in the same way as each A_p in the unstable case. The way we generated the system couplings, i.e., the set Ω_p for each node p (see also the dotted arrows in the networks of Fig. 4), will be explained as we present the experimental results. Note that for the MPC problem the Lipschitz constant of the gradient of its objective can be computed in closed form and, therefore, does not need to be estimated. The relative error is computed as for the network flows, ‖x^k − x*‖ / ‖x*‖, where x^k is the concatenation of all the nodes' input estimates.

MPC results: connected case. The results for all the experiments with a connected variable are shown in Fig. 7. There, Alg. 1 is compared against [] (see also Algorithm 2), [0], and [36]. We mention that the algorithms [0], [36] were already applied to (2), e.g., in [0], in the special case of a variable with star-shaped induced subgraphs.
This is in fact the only case where [0] and [36] are distributed, and it explains why they do not appear in Figs. 7(c) and 7(d): the induced subgraphs in those figures are not stars, and only Alg. 1 and [] are applicable in that case. In Fig. 7(a) the network is A and each subsystem was generated possibly unstable; in Fig. 7(b) the network is B and each subsystem was generated stable. In both cases, Alg. 1 required the least number of CSs to achieve any relative error between 1 and 10^{-4}, followed by [0], then by [], and finally by [36]. It can be seen from these plots that the difficulty of the problem is determined not so much by the size of the network, but by the stability of the subsystems. In fact, all algorithms required uniformly more communications to solve a problem on network A, which has only 100 nodes, than on network B, which has approximately 5000 nodes. This difficulty can be measured by the Lipschitz constant L: 1.63 × 10^6 for network A (Fig. 7(a)) and 3395 for network B (Fig. 7(b)). Regarding the parameter ρ, in Fig. 7(a) it was 35 for all algorithms except [0] (computed with precision 5); in Fig. 7(b), it was 5 for Alg. 1 and [0], and 30 for Algorithm 2 [] (also computed with precision 5). In Figs. 7(c) and 7(d) we considered a generic connected variable, where each induced subgraph is not necessarily a star. In this case, the system couplings were generated as follows. Given a node p, we assigned it u_p and initialized a fringe with its neighbors N_p. Then, we selected a node with equal probability from the fringe and made it depend on u_p; we also added its neighbors to the fringe. This process was repeated 3 times for each variable u_p (i.e., for each node p). When the induced subgraphs are not stars, only Alg. 1 and [] are applicable. Figs. 7(c) and 7(d) show their performance for network A with unstable subsystems and for network B with stable subsystems, respectively. It can be seen that Alg. 1 required uniformly fewer CSs than [] to achieve the same relative error.

MPC results: non-connected case. A non-connected variable has at least one component whose induced subgraph G_l = (V_l, E_l) is not connected. In this case, Algorithm 1 is no longer applicable and requires the generalization shown in Algorithm 3. Part of that generalization consists of computing Steiner trees, using the nodes in V_l as required nodes. The same generalization can be made to the algorithm in []. To create a problem instance with a non-connected variable, we generated system couplings in a way very similar to the couplings for Figs. 7(c) and 7(d). The difference was that any node in the network could be chosen to depend on a given u_p. However, any node in the fringe had twice the probability of