Distributed Subgradient Methods for Multi-agent Optimization


Angelia Nedić and Asuman Ozdaglar

October 29, 2007

Abstract

We study a distributed computation model for optimizing a sum of convex objective functions corresponding to multiple agents. For solving this (not necessarily smooth) optimization problem, we consider a subgradient method that is distributed among the agents. The method involves every agent minimizing his/her own objective function while exchanging information locally with other agents in the network over a time-varying topology. We provide convergence results and convergence rate estimates for the subgradient method. Our convergence rate results explicitly characterize the tradeoff between a desired accuracy of the generated approximate optimal solutions and the number of iterations needed to achieve the accuracy.

1 Introduction

There has been considerable recent interest in the analysis of large-scale networks, such as the Internet, which consist of multiple agents with different objectives. For such networks, it is essential to design resource allocation methods that can operate in a decentralized manner with limited local information and rapidly converge to an approximately optimal operating point. Most existing approaches use cooperative and noncooperative distributed optimization frameworks under the assumption that each agent has an objective function that depends only on the resource allocated to that agent. In many practical situations, however, individual cost functions or performance measures depend on the entire resource allocation vector. In this paper, we study a distributed computation model that can be used for general resource allocation problems. We provide a simple algorithm and study its convergence and rate of convergence properties.

Footnote: We would like to thank Rayadurgam Srikant, Devavrat Shah, the associate editor, three anonymous referees, and various seminar participants for useful comments and discussions. Department of Industrial and Enterprise Systems Engineering, University of Illinois, angelia@uiuc.edu. Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, asuman@mit.edu.

In particular, we focus on the distributed control of a network consisting of $m$ agents over a time-varying topology. The global objective is to cooperatively minimize the cost function $\sum_{i=1}^m f_i(x)$, where the function $f_i : \mathbb{R}^n \to \mathbb{R}$ represents the cost function of agent $i$, known by this agent only, and $x \in \mathbb{R}^n$ is a decision vector. The decision vector can be viewed as either a resource vector whose sub-components correspond to resources allocated to each agent, or a global decision vector which the agents are trying to compute using local information.

Our algorithm is as follows: each agent generates and maintains estimates of the optimal decision vector based on information concerning his own cost function (in particular, the subgradient information of $f_i$) and exchanges these estimates directly or indirectly with the other agents in the network. We allow such communication to be asynchronous, local, and with time-varying connectivity. Under some weak assumptions, we prove that this type of local communication and computation achieves consensus in the estimates and converges to an (approximate) global optimal solution. Moreover, we provide rate of convergence analysis for the estimates maintained by each agent. In particular, we show that there is a tradeoff between the quality of an approximate optimal solution and the computation load required to generate such a solution. Our convergence rate estimate captures this dependence explicitly in terms of the system and algorithm parameters.

Our approach builds on the seminal work of Tsitsiklis [23] (see also Tsitsiklis et al. [24], Bertsekas and Tsitsiklis [2]), who developed a framework for the analysis of distributed computation models. This framework focuses on the minimization of a (smooth) function $f(x)$ by distributing the processing of the components of the vector $x \in \mathbb{R}^n$ among $n$ agents. In contrast, our focus is on problems in which each agent has a locally known, different, convex and potentially nonsmooth cost function. To the best of our knowledge, this paper presents the first analysis of this type of distributed resource allocation problem. In addition, our work is also related to the literature on reaching consensus on a particular scalar value or computing exact averages of the initial values of the agents, which has attracted much recent attention as natural models of cooperative behavior in networked systems (see Vicsek et al. [25], Jadbabaie et al. [8], Boyd et al. [4], Olfati-Saber and Murray [17], Cao et al. [5], and Olshevsky and Tsitsiklis [18, 19]). Our work is also related to the utility maximization framework for resource allocation in networks (see Kelly et al. [10], Low and Lapsley [11], Srikant [22], and Chiang et al. [7]). In contrast to this literature, we allow the local performance measures to depend on the entire resource allocation vector and provide explicit rate analysis.

The remainder of this paper is organized as follows: In Section 2, we introduce a model for the information exchange among the agents and a distributed optimization model. In Section 3, we establish the basic convergence properties of the transition matrices governing the evolution of the system in time. In Section 4, we establish convergence and convergence rate results for the proposed distributed optimization algorithm. Finally, we give our concluding remarks in Section 5.

Basic Notation and Notions: A vector is viewed as a column vector, unless clearly stated otherwise. We denote by $x_i$ or $[x]_i$ the $i$-th component of a vector $x$. When $x_i \ge 0$ for all components $i$ of a vector $x$, we write $x \ge 0$. For a matrix $A$, we write $A^j_i$ or $[A]^j_i$ to denote the matrix entry in the $i$-th row and $j$-th column. We write $[A]_i$ to denote the $i$-th row of the matrix $A$, and $[A]^j$ to denote the $j$-th column of $A$. We denote the nonnegative orthant by $\mathbb{R}^n_+$, i.e., $\mathbb{R}^n_+ = \{x \in \mathbb{R}^n \mid x \ge 0\}$. We write $x'$ to denote the transpose of a vector $x$. The scalar product of two vectors $x, y \in \mathbb{R}^m$ is denoted by $x'y$. We use $\|x\|$ to denote the standard Euclidean norm, $\|x\| = \sqrt{x'x}$. We write $\|x\|_\infty$ to denote the max norm, $\|x\|_\infty = \max_{1\le i\le m}|x_i|$. A vector $a \in \mathbb{R}^m$ is said to be a stochastic vector when its components $a_i$, $i = 1, \ldots, m$, are nonnegative and their sum is equal to 1, i.e., $\sum_{i=1}^m a_i = 1$. A square matrix $A$ is said to be a stochastic matrix when each row of $A$ is a stochastic vector. A square matrix $A$ is said to be a doubly stochastic matrix when both $A$ and $A'$ are stochastic matrices.

For a function $F : \mathbb{R}^n \to (-\infty, \infty]$, we denote the domain of $F$ by $\mathrm{dom}(F)$, where $\mathrm{dom}(F) = \{x \in \mathbb{R}^n \mid F(x) < \infty\}$. We use the notion of a subgradient of a convex function $F(x)$ at a given vector $\bar x \in \mathrm{dom}(F)$. We say that $s_F(\bar x) \in \mathbb{R}^n$ is a subgradient of the function $F$ at $\bar x \in \mathrm{dom}(F)$ when the following relation holds:

$$F(\bar x) + s_F(\bar x)'(x - \bar x) \le F(x) \quad \text{for all } x \in \mathrm{dom}(F). \qquad (1)$$

The set of all subgradients of $F$ at $\bar x$ is denoted by $\partial F(\bar x)$ (see [1]).

2 Multi-agent Model

We are interested in a distributed computation model for a multi-agent system, where each agent processes his/her local information and shares the information with his/her neighbors. To describe such a multi-agent system, we need to specify two models: an information exchange model describing the evolution of the agents' information in time, and an optimization model specifying the overall system objective that agents are cooperatively minimizing by individually minimizing their own local objectives. Informally speaking, the first model specifies the rules of the agents' interactions, such as how often the agents communicate and how they value their own information and the information received from the neighbors. The second model describes the overall goal that the agents want to achieve through their information exchange. The models are discussed in the following sections.

2.1 Information Exchange Model

We consider a network with node (or agent) set $V = \{1, \ldots, m\}$. At this point, we are not describing what the agents' goal is, but rather what the agents' information exchange process is. We use an information evolution model based on the model proposed by Tsitsiklis [23] (see also Blondel et al. [3]). We impose some rules that govern the information evolution of the agent system in time. These rules include:

- A rule on the weights that an agent uses when combining his information with the information received from his/her neighbors.
- A connectivity rule ensuring that the information of each agent influences the information of any other agent infinitely often in time.
- A rule on the frequency at which an agent sends his information to the neighbors.

We assume that the agents update and (possibly) send their information to their neighbors at discrete times $t_0, t_1, t_2, \ldots$. The neighbors of an agent $i$ are the agents $j$ communicating directly with agent $i$ through a directed link $(j,i)$. We index agents' information states and any other information at time $t_k$ by $k$. We use $x_i(k) \in \mathbb{R}^n$ to denote agent $i$'s information state at time $t_k$.

We now describe a rule that agents use when updating their information states $x_i(k)$. The information update for agents includes combining their own information state with those received from their neighbors. In particular, we assume that each agent $i$ has a vector of weights $a^i(k) \in \mathbb{R}^m$ at any time $t_k$. For each $j$, the scalar $a^i_j(k)$ is the weight that agent $i$ assigns to the information $x_j$ obtained from a neighboring agent $j$, when the information is received during the time interval $(t_k, t_{k+1})$ (or slot $k$). We use the following assumption on these weights.

Assumption 1 (Weights Rule) We have:

a) There exists a scalar $\eta$ with $0 < \eta < 1$ such that for all $i \in \{1, \ldots, m\}$,
   i) $a^i_i(k) \ge \eta$ for all $k \ge 0$.
   ii) $a^i_j(k) \ge \eta$ for all $k \ge 0$ and all agents $j$ communicating directly with agent $i$ in the interval $(t_k, t_{k+1})$.
   iii) $a^i_j(k) = 0$ for all $k \ge 0$ and $j$ otherwise.

b) The vectors $a^i(k)$ are stochastic, i.e., $\sum_{j=1}^m a^i_j(k) = 1$ for all $i$ and $k$.

Assumption 1(a) states that each agent gives significant weight to her own state $x_i(k)$ and to the information states $x_j(k)$ available from her neighboring agents $j$ at the update time $t_k$. Naturally, an agent $i$ assigns zero weight to the states $x_j$ for those agents $j$ whose information state is not available at the update time (see Footnote 1). Note that, under Assumption 1, for a matrix $A(k)$ whose columns are $a^1(k), \ldots, a^m(k)$, the transpose $A'(k)$ is a stochastic matrix for all $k \ge 0$.

The following is an example of weight choices satisfying Assumption 1 (cf. Blondel et al. [3]):

Footnote 1: For Assumption 1(a) to hold, the agents need not be given the lower bound $\eta$ for their weights $a^i_j(k)$. In particular, as a lower bound for the positive weights $a^i_j(k)$, each agent may use her own $\eta_i$, with $0 < \eta_i < 1$. In this case, Assumption 1(a) holds for $\eta = \min_{1\le i\le m}\eta_i$. Moreover, we do not assume that the common bound value $\eta$ is available to any agent at any time.

(Equal Neighbor Weights) Each agent assigns equal weight to his/her information and the information received from the other agents at slot $k$, i.e., $a^i_j(k) = \frac{1}{n_i(k)+1}$ for each $i$, $k$, and those neighbors $j$ whose state information is available to agent $i$ at time $t_k$; otherwise $a^i_j(k) = 0$. Here, the number $n_i(k)$ is the number of agents communicating with agent $i$ at the given slot $k$.

We now discuss the rules we impose on the information exchange. At each update time $t_k$, the information exchange among the agents may be represented by a directed graph $(V, E_k)$ with the set $E_k$ of directed edges given by $E_k = \{(j,i) \mid a^i_j(k) > 0\}$. Note that, by Assumption 1(a), we have $(i,i) \in E_k$ for each agent $i$ and all $k$. Also, we have $(j,i) \in E_k$ if and only if agent $i$ receives the information $x_j$ from agent $j$ in the time interval $(t_k, t_{k+1})$.

Motivated by the model of Tsitsiklis [23] (and the consensus setting of Blondel et al. [3]), we impose a minimal assumption on the connectivity of the multi-agent system as follows: following any time $t_k$, the information of an agent $j$ reaches each and every agent $i$ directly or indirectly (through a sequence of communications between the other agents). In other words, the information state of any agent $i$ has to influence the information state of any other agent infinitely often in time. In formulating this, we use the set $E_\infty$ consisting of edges $(j,i)$ such that $j$ is a neighbor of $i$ who communicates with $i$ infinitely often. The connectivity requirement is formally stated in the following assumption.

Assumption 2 (Connectivity) The graph $(V, E_\infty)$ is connected, where $E_\infty$ is the set of edges $(j,i)$ representing agent pairs communicating directly infinitely many times, i.e.,

$$E_\infty = \{(j,i) \mid (j,i) \in E_k \text{ for infinitely many indices } k\}.$$

In other words, this assumption states that for any $k$ and any agent pair $(j,i)$, there is a directed path from agent $j$ to agent $i$ with edges in the set $\bigcup_{l\ge k} E_l$. Thus, Assumption 2 is equivalent to the assumption that the composite directed graph $(V, \bigcup_{l\ge k} E_l)$ is connected for all $k$.

When analyzing the system state behavior, we use an additional assumption that the intercommunication intervals are bounded for those agents that communicate directly. In particular, we use the following.

Assumption 3 (Bounded Intercommunication Interval) There exists an integer $B \ge 1$ such that for every $(j,i) \in E_\infty$, agent $j$ sends his/her information to the neighboring agent $i$ at least once every $B$ consecutive time slots, i.e., at time $t_k$ or at time $t_{k+1}$ or ... or (at latest) at time $t_{k+B-1}$ for any $k \ge 0$.

This assumption is equivalent to the requirement that there is $B \ge 1$ such that $(j,i) \in E_k \cup E_{k+1} \cup \cdots \cup E_{k+B-1}$ for all $(j,i) \in E_\infty$ and $k \ge 0$.
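As a concrete illustration of the equal neighbor rule above, the following minimal Python sketch builds the weight vector $a^i(k)$ for one agent from the set of neighbors heard during a slot. The function name and data layout (neighbors as a list of agent indices) are illustrative choices, not part of the paper.

def equal_neighbor_weights(i, neighbors, m):
    """Return the weight vector a^i(k) for agent i at a given slot.

    neighbors: indices j whose estimates reached agent i during slot k.
    Every communicating neighbor and agent i itself get weight 1/(n_i(k)+1);
    all other agents get weight 0, so the vector is stochastic. Since
    n_i(k)+1 <= m, Assumption 1 holds with eta = 1/m.
    """
    n_i = len(neighbors)
    a = [0.0] * m
    for j in neighbors:
        a[j] = 1.0 / (n_i + 1)
    a[i] = 1.0 / (n_i + 1)
    return a

# Example: m = 4 agents, agent 0 hears from agents 1 and 3 in this slot.
print(equal_neighbor_weights(0, [1, 3], 4))  # [1/3, 1/3, 0.0, 1/3]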

2.2 Optimization Model

We consider a scenario where agents cooperatively minimize a common additive cost. Each agent has information only about one cost component, and minimizes that component while exchanging information with other agents. In particular, the agents want to cooperatively solve the following unconstrained optimization problem:

$$\text{minimize } \sum_{i=1}^m f_i(x) \quad \text{subject to } x \in \mathbb{R}^n, \qquad (2)$$

where each $f_i : \mathbb{R}^n \to \mathbb{R}$ is a convex function. We denote the optimal value of this problem by $f^*$, which we assume to be finite. We also denote the optimal solution set by $X^*$, i.e., $X^* = \{x \in \mathbb{R}^n \mid \sum_{i=1}^m f_i(x) = f^*\}$.

In this setting, the information state of an agent $i$ is an estimate of an optimal solution of the problem (2). We denote by $x_i(k) \in \mathbb{R}^n$ the estimate maintained by agent $i$ at the time $t_k$. The agents update their estimates as follows: When generating a new estimate, agent $i$ combines his/her current estimate $x_i$ with the estimates $x_j$ received from some of the other agents $j$. Here we assume that there is no communication delay in delivering a message from agent $j$ to agent $i$ (see Footnote 2). In particular, agent $i$ updates his/her estimates according to the following relation:

$$x_i(k+1) = \sum_{j=1}^m a^i_j(k)\, x_j(k) - \alpha_i(k)\, d_i(k), \qquad (3)$$

where the vector $a^i(k) = (a^i_1(k), \ldots, a^i_m(k))'$ is a vector of weights and the scalar $\alpha_i(k) > 0$ is a stepsize used by agent $i$. The vector $d_i(k)$ is a subgradient of agent $i$'s objective function $f_i(x)$ at $x = x_i(k)$. We note that the optimization model of Eqs. (2)-(3) reduces to a consensus or agreement problem when all the objective functions $f_i$ are identically equal to zero; see Jadbabaie et al. [8] and Blondel et al. [3].

Footnote 2: A more general model that accounts for the possibility of such delays is the subject of our current work, see [16].
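The update (3) is simple to state in code. The sketch below is a minimal single-step implementation under illustrative assumptions: estimates are stored as rows of an array and the subgradient oracle for $f_i$ is passed in as a callable; none of these names come from the paper.

import numpy as np

def agent_update(i, estimates, weights, subgrad_fi, alpha):
    """One step of x_i(k+1) = sum_j a^i_j(k) x_j(k) - alpha_i(k) d_i(k).

    estimates:  m x n array whose rows are the x_j(k) available to agent i
    weights:    length-m stochastic weight vector a^i(k)
    subgrad_fi: callable returning a subgradient of f_i at a given point
    alpha:      stepsize alpha_i(k)
    """
    x_mix = weights @ estimates          # consensus part: sum_j a^i_j(k) x_j(k)
    d_i = subgrad_fi(estimates[i])       # subgradient of f_i at x_i(k)
    return x_mix - alpha * d_i

# Toy example: two agents, f_1(x) = |x - 1| and f_2(x) = |x + 1| on the real line.
x = np.array([[2.0], [-2.0]])
w = np.array([0.5, 0.5])
print(agent_update(0, x, w, lambda z: np.sign(z - 1.0), alpha=0.1))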

We are interested in conditions guaranteeing convergence of $x_i(k)$ to a common limit vector in $\mathbb{R}^n$. We are also interested in characterizing these common limit points in terms of the properties of the functions $f_i$. In order to have a more compact representation of the evolution of the estimates $x_i(k)$ of Eq. (3) in time, we rewrite this model in a form similar to that of Tsitsiklis [23]. This form is also more appropriate for our convergence analysis. In particular, we introduce matrices $A(s)$ whose $i$-th column is the vector $a^i(s)$. Using these matrices we can relate the estimate $x_i(k+1)$ to the estimates $x_1(s), \ldots, x_m(s)$ for any $s \le k$. In particular, it is straightforward to verify that for the iterates generated by Eq. (3), we have for any $i$, and any $s$ and $k$ with $k \ge s$,

$$x_i(k+1) = \sum_{j=1}^m [A(s)A(s+1)\cdots A(k-1)a^i(k)]_j\, x_j(s) - \sum_{j=1}^m [A(s+1)\cdots A(k-1)a^i(k)]_j\, \alpha_j(s)\, d_j(s) - \sum_{j=1}^m [A(s+2)\cdots A(k-1)a^i(k)]_j\, \alpha_j(s+1)\, d_j(s+1) - \cdots - \sum_{j=1}^m [A(k-1)a^i(k)]_j\, \alpha_j(k-2)\, d_j(k-2) - \sum_{j=1}^m [a^i(k)]_j\, \alpha_j(k-1)\, d_j(k-1) - \alpha_i(k)\, d_i(k). \qquad (4)$$

Let us introduce the matrices

$$\Phi(k,s) = A(s)A(s+1)\cdots A(k-1)A(k) \quad \text{for all } s \text{ and } k \text{ with } k \ge s,$$

where $\Phi(k,k) = A(k)$ for all $k$. Note that the $i$-th column of $\Phi(k,s)$ is given by

$$[\Phi(k,s)]^i = A(s)A(s+1)\cdots A(k-1)a^i(k) \quad \text{for all } i, s, k \text{ with } k \ge s,$$

while the entry in the $i$-th column and $j$-th row of $\Phi(k,s)$ is given by

$$[\Phi(k,s)]^i_j = [A(s)A(s+1)\cdots A(k-1)a^i(k)]_j \quad \text{for all } i, j, s, k \text{ with } k \ge s.$$

We can now rewrite relation (4) compactly in terms of the matrices $\Phi(k,s)$, as follows: for any $i \in \{1, \ldots, m\}$, and $s$ and $k$ with $k \ge s \ge 0$,

$$x_i(k+1) = \sum_{j=1}^m [\Phi(k,s)]^i_j\, x_j(s) - \sum_{r=s+1}^{k}\sum_{j=1}^m [\Phi(k,r)]^i_j\, \alpha_j(r-1)\, d_j(r-1) - \alpha_i(k)\, d_i(k). \qquad (5)$$

We start our analysis by considering the transition matrices $\Phi(k,s)$.

3 Convergence of the Transition Matrices $\Phi(k,s)$

In this section, we study the convergence behavior of the matrices $\Phi(k,s)$ as $k$ goes to infinity. We establish convergence rate estimates for these matrices. Clearly, the convergence rate of these matrices dictates the convergence rate of the agents' estimates to an optimal solution of the overall optimization problem (2). Recall that these matrices are given by

$$\Phi(k,s) = A(s)A(s+1)\cdots A(k-1)A(k) \quad \text{for all } s \text{ and } k \text{ with } k \ge s, \qquad (6)$$

where

$$\Phi(k,k) = A(k) \quad \text{for all } k. \qquad (7)$$

3.1 Basic Properties

Here, we establish some properties of the transition matrices $\Phi(k,s)$ under the assumptions discussed in Section 2.

Lemma 1 Let the Weights Rule a) hold [cf. Assumption 1(a)]. We then have:

a) $[\Phi(k,s)]^j_j \ge \eta^{k-s+1}$ for all $j$, $k$, and $s$ with $k \ge s$.

b) $[\Phi(k,s)]^i_j \ge \eta^{k-s+1}$ for all $k$ and $s$ with $k \ge s$, and for all $(j,i) \in E_s \cup \cdots \cup E_k$, where $E_t$ is the set of edges defined by $E_t = \{(j,i) \mid a^i_j(t) > 0\}$ for all $t$.

c) Let $(j,v) \in E_s \cup \cdots \cup E_r$ for some $r \ge s$ and $(v,i) \in E_{r+1} \cup \cdots \cup E_k$ for $k > r$. Then, $[\Phi(k,s)]^i_j \ge \eta^{k-s+1}$.

d) Let the Weights Rule b) also hold [cf. Assumption 1(b)]. Then, the matrices $\Phi'(k,s)$ are stochastic for all $k$ and $s$ with $k \ge s$.

Proof. We let $s$ be arbitrary, and prove the relations by induction on $k$.

a) Note that, in view of Assumption 1(a), the matrices $\Phi(k,s)$ have nonnegative entries for all $k$ and $s$ with $k \ge s$. Furthermore, by Assumption 1(a)(i), we have $[\Phi(s,s)]^j_j \ge \eta$. Thus, the relation $[\Phi(k,s)]^j_j \ge \eta^{k-s+1}$ holds for $k = s$. Now, assume that for some $k$ with $k > s$ we have $[\Phi(k,s)]^j_j \ge \eta^{k-s+1}$, and consider $[\Phi(k+1,s)]^j_j$. By the definition of the matrix $\Phi(k,s)$ [cf. Eq. (6)], we have

$$[\Phi(k+1,s)]^j_j = \sum_{h=1}^m [\Phi(k,s)]^h_j\, a^j_h(k+1) \ge [\Phi(k,s)]^j_j\, a^j_j(k+1),$$

where the inequality follows from the nonnegativity of the entries of $\Phi(k,s)$. By using the inductive hypothesis and the relation $a^j_j(k+1) \ge \eta$ [cf. Assumption 1(a)(i)], we obtain $[\Phi(k+1,s)]^j_j \ge \eta^{k-s+2}$. Hence, the relation $[\Phi(k,s)]^j_j \ge \eta^{k-s+1}$ holds for all $k \ge s$.

b) Let $(j,i) \in E_s$. Then, by the definition of $E_s$ and Assumption 1(a), we have that $a^i_j(s) \ge \eta$. Since $\Phi(s,s) = A(s)$ [cf. Eq. (7)], it follows that the relation $[\Phi(k,s)]^i_j \ge \eta^{k-s+1}$ holds for $k = s$ and any $(j,i) \in E_s$. Assume now that for some $k > s$ and all $(j,i) \in E_s \cup \cdots \cup E_k$, we have $[\Phi(k,s)]^i_j \ge \eta^{k-s+1}$. Consider $k+1$, and let $(j,i) \in E_s \cup \cdots \cup E_k \cup E_{k+1}$. There are two possibilities: $(j,i) \in E_s \cup \cdots \cup E_k$ or $(j,i) \in E_{k+1}$.

Suppose that $(j,i) \in E_s \cup \cdots \cup E_k$. Then, by the induction hypothesis, we have $[\Phi(k,s)]^i_j \ge \eta^{k-s+1}$. Therefore

$$[\Phi(k+1,s)]^i_j = \sum_{h=1}^m [\Phi(k,s)]^h_j\, a^i_h(k+1) \ge [\Phi(k,s)]^i_j\, a^i_i(k+1),$$

where the inequality follows from the nonnegativity of the entries of $\Phi(k,s)$. By combining the preceding two relations, and using the fact $a^i_i(r) \ge \eta$ for all $i$ and $r$ [cf. Assumption 1(a)(i)], we obtain $[\Phi(k+1,s)]^i_j \ge \eta^{k-s+2}$.

Suppose now that $(j,i) \in E_{k+1}$. Then, by the definition of $E_{k+1}$, we have $a^i_j(k+1) \ge \eta$. Furthermore, by part a), we have $[\Phi(k,s)]^j_j \ge \eta^{k-s+1}$. Therefore

$$[\Phi(k+1,s)]^i_j = \sum_{h=1}^m [\Phi(k,s)]^h_j\, a^i_h(k+1) \ge [\Phi(k,s)]^j_j\, a^i_j(k+1) \ge \eta^{k-s+2}.$$

Hence, $[\Phi(k,s)]^i_j \ge \eta^{k-s+1}$ holds for all $k \ge s$ and all $(j,i) \in E_s \cup \cdots \cup E_k$.

c) Let $(j,v) \in E_s \cup \cdots \cup E_r$ for some $r \ge s$ and $(v,i) \in E_{r+1} \cup \cdots \cup E_k$ for $k > r$. Then, by the nonnegativity of the entries of $\Phi(r,s)$ and $\Phi(k,r+1)$, we have

$$[\Phi(k,s)]^i_j = \sum_{h=1}^m [\Phi(r,s)]^h_j\, [\Phi(k,r+1)]^i_h \ge [\Phi(r,s)]^v_j\, [\Phi(k,r+1)]^i_v.$$

By part b), we further have

$$[\Phi(r,s)]^v_j \ge \eta^{r-s+1}, \qquad [\Phi(k,r+1)]^i_v \ge \eta^{k-r},$$

implying that

$$[\Phi(k,s)]^i_j \ge \eta^{r-s+1}\,\eta^{k-r} = \eta^{k-s+1}.$$

d) Recall that, for each $k$, the columns of the matrix $A(k)$ are the weight vectors $a^1(k), \ldots, a^m(k)$. Hence, by Assumption 1, the matrix $A'(k)$ is stochastic for all $k$. From the definition of $\Phi(k,s)$ in Eqs. (6)-(7), we have $\Phi'(k,s) = A'(k)\cdots A'(s+1)A'(s)$, thus implying that $\Phi'(k,s)$ is stochastic for all $k$ and $s$ with $k \ge s$.

Lemma 2 Let the Weights Rule a), Connectivity, and Bounded Intercommunication Interval assumptions hold [cf. Assumptions 1(a), 2, and 3]. We then have

$$[\Phi(s+(m-1)B-1,\, s)]^i_j \ge \eta^{(m-1)B} \quad \text{for all } s, i, \text{ and } j,$$

where $\eta$ is the lower bound of Assumption 1(a) and $B$ is the intercommunication interval bound of Assumption 3.

Proof. Let $s$, $i$, and $j$ be arbitrary. If $j = i$, then by Lemma 1(a), we have

$$[\Phi(s+(m-1)B-1,\, s)]^i_i \ge \eta^{(m-1)B}.$$

Assume now that $j \ne i$. By Connectivity [cf. Assumption 2], there is a path $j = i_0 \to i_1 \to \cdots \to i_{r-1} \to i_r = i$ from $j$ to $i$, passing through distinct nodes $i_\kappa$, $\kappa = 0, \ldots, r$, and with edges $(i_{\kappa-1}, i_\kappa)$ in the set

$$E_\infty = \{(\bar h, h) \mid (\bar h, h) \in E_k \text{ for infinitely many indices } k\}.$$

Because each edge $(i_{\kappa-1}, i_\kappa)$ of the path belongs to $E_\infty$, by using Assumption 3 [with $k = s + (\kappa-1)B$ for edge $(i_{\kappa-1}, i_\kappa)$], we obtain

$$(i_{\kappa-1}, i_\kappa) \in E_{s+(\kappa-1)B} \cup \cdots \cup E_{s+\kappa B - 1} \quad \text{for } \kappa = 1, \ldots, r.$$

By using Lemma 1(b), we have

$$[\Phi(s+\kappa B - 1,\, s+(\kappa-1)B)]^{i_\kappa}_{i_{\kappa-1}} \ge \eta^B \quad \text{for } \kappa = 1, \ldots, r.$$

By Lemma 1(c), it follows that $[\Phi(s+rB-1, s)]^i_j \ge \eta^{rB}$. Since there are $m$ agents, and the nodes in the path $j = i_0 \to i_1 \to \cdots \to i_{r-1} \to i_r = i$ are distinct, it follows that $r \le m-1$. Hence, we have

$$[\Phi(s+(m-1)B-1,\, s)]^i_j = \sum_{h=1}^m [\Phi(s+rB-1,\, s)]^h_j\, [\Phi(s+(m-1)B-1,\, s+rB)]^i_h \ge [\Phi(s+rB-1,\, s)]^i_j\, [\Phi(s+(m-1)B-1,\, s+rB)]^i_i \ge \eta^{rB}\, \eta^{(m-1)B - rB} = \eta^{(m-1)B},$$

where the last inequality follows from $[\Phi(k,s)]^i_i \ge \eta^{k-s+1}$ for all $i$, $k$, and $s$ with $k \ge s$ [cf. Lemma 1(a)].

Our ultimate goal is to study the limit behavior of $\Phi(k,s)$ as $k \to \infty$ for a fixed $s \ge 0$. For this analysis, we introduce the matrices $D_k(s)$ as follows: for a fixed $s \ge 0$,

$$D_k(s) = \Phi'\bigl(s + kB_0 - 1,\, s + (k-1)B_0\bigr) \quad \text{for } k = 1, 2, \ldots, \qquad (8)$$

where $B_0 = (m-1)B$. The next lemma shows that, for each $s \ge 0$, the product of these matrices converges as $k$ increases to infinity.

Lemma 3 Let the Weights Rule, Connectivity, and Bounded Intercommunication Interval assumptions hold [cf. Assumptions 1, 2, and 3]. Let the matrices $D_k(s)$ for $k \ge 1$ and a fixed $s \ge 0$ be given by Eq. (8). We then have:

a) The limit $\bar D(s) = \lim_{k\to\infty} D_k(s)\cdots D_1(s)$ exists.

b) The limit $\bar D(s)$ is a stochastic matrix with identical rows (a function of $s$), i.e., $\bar D(s) = e\,\phi'(s)$, where $e \in \mathbb{R}^m$ is a vector of ones and $\phi(s) \in \mathbb{R}^m$ is a stochastic vector.

c) The convergence of $D_k(s)\cdots D_1(s)$ to $\bar D(s)$ is geometric: for every $x \in \mathbb{R}^m$,

$$\bigl\|\bigl(D_k(s)\cdots D_1(s)\bigr)x - \bar D(s)x\bigr\|_\infty \le 2\bigl(1 + \eta^{-B_0}\bigr)\bigl(1 - \eta^{B_0}\bigr)^{k}\,\|x\|_\infty \quad \text{for all } k \ge 1.$$

In particular, for every $j$, the entries $[D_k(s)\cdots D_1(s)]^j_i$, $i = 1, \ldots, m$, converge to the same limit $\phi_j(s)$ as $k \to \infty$ with a geometric rate: for every $j$,

$$\bigl|[D_k(s)\cdots D_1(s)]^j_i - \phi_j(s)\bigr| \le 2\bigl(1 + \eta^{-B_0}\bigr)\bigl(1 - \eta^{B_0}\bigr)^{k} \quad \text{for all } k \ge 1 \text{ and } i,$$

where $\eta$ is the lower bound of Assumption 1(a), $B_0 = (m-1)B$, $m$ is the number of agents, and $B$ is the intercommunication interval bound of Assumption 3.

Proof. In this proof, we suppress the explicit dependence of the matrices $D_i$ on $s$ to simplify our notation.

a) We prove that the limit $\lim_{k\to\infty}(D_k\cdots D_1)$ exists by showing that the sequence $\{(D_k\cdots D_1)x\}$ converges for every $x \in \mathbb{R}^m$. To show this, let $x_0 \in \mathbb{R}^m$ be arbitrary, and consider the vector sequence $\{x_k\}$ defined by $x_k = D_k\cdots D_1 x_0$ for $k \ge 1$. We write each vector $x_k$ in the following form:

$$x_k = z_k + c_k\, e \quad \text{with } z_k \ge 0 \text{ for all } k \ge 0, \qquad (9)$$

where $e \in \mathbb{R}^m$ is the vector with all entries equal to 1. The recursion is initialized with

$$z_0 = x_0 - \min_{1\le i\le m}[x_0]_i\, e \quad \text{and} \quad c_0 = \min_{1\le i\le m}[x_0]_i. \qquad (10)$$

Having the decomposition for $x_k$, we consider the vector $x_{k+1} = D_{k+1}x_k$. In view of relation (9) and the stochasticity of $D_{k+1}$, we have $x_{k+1} = D_{k+1}z_k + c_k e$. We define

$$z_{k+1} = D_{k+1}z_k - \bigl([D_{k+1}]_{j^*}' z_k\bigr)\, e, \qquad (11)$$

$$c_{k+1} = [D_{k+1}]_{j^*}' z_k + c_k, \qquad (12)$$

where $j^*$ is the index of the row vector $[D_{k+1}]_{j^*}$ achieving the minimum of the inner products $[D_{k+1}]_j' z_k$ over all $j \in \{1, \ldots, m\}$. Clearly, we have $x_{k+1} = z_{k+1} + c_{k+1}e$ and $z_{k+1} \ge 0$. By the definition of $z_{k+1}$ in Eq. (11) it follows that for the components $[z_{k+1}]_j$ we have

$$[z_{k+1}]_j = [D_{k+1}]_j' z_k - [D_{k+1}]_{j^*}' z_k \quad \text{for all } j \in \{1, \ldots, m\}, \qquad (13)$$

where $[D_{k+1}]_j$ is the $j$-th row vector of the matrix $D_{k+1}$. By Lemma 2 and the definition of the matrices $D_k$ [cf. Eq. (8)], we have that all entries of each matrix $D_k$ are bounded away from zero, i.e., for all $i, j \in \{1, \ldots, m\}$,

$$[D_{k+1}]^i_j \ge \eta^{B_0} \quad \text{for all } k \ge 0.$$

Then, from relation (13), we have for all $j \in \{1, \ldots, m\}$,

$$[z_{k+1}]_j = \bigl([D_{k+1}]_j - [D_{k+1}]_{j^*}\bigr)' z_k \le \bigl(1 - \eta^{B_0}\bigr)\,\|z_k\|_\infty.$$

Because $z_{k+1} \ge 0$, it follows that

$$\|z_{k+1}\|_\infty \le \bigl(1 - \eta^{B_0}\bigr)\,\|z_k\|_\infty \quad \text{for all } k \ge 0,$$

implying that

$$\|z_k\|_\infty \le \bigl(1 - \eta^{B_0}\bigr)^k\,\|z_0\|_\infty \quad \text{for all } k \ge 0. \qquad (14)$$

Hence $z_k \to 0$ with a geometric rate.

Consider now the sequence $c_k$ of Eq. (12). Since the vectors $[D_{k+1}]_{j^*}$ and $z_k$ have nonnegative entries, it follows that

$$c_k \le c_{k+1} = c_k + [D_{k+1}]_{j^*}' z_k.$$

Furthermore, by using the stochasticity of the matrix $D_{k+1}$, we obtain for all $k$,

$$c_{k+1} \le c_k + \sum_{i=1}^m [D_{k+1}]^i_{j^*}\,\|z_k\|_\infty = c_k + \|z_k\|_\infty.$$

From the preceding two relations and Eq. (14), it follows that

$$0 \le c_{k+1} - c_k \le \|z_k\|_\infty \le \bigl(1 - \eta^{B_0}\bigr)^k\,\|z_0\|_\infty \quad \text{for all } k.$$

Therefore, we have for any $k \ge 1$ and $r \ge 1$,

$$0 \le c_{k+r} - c_k \le (c_{k+r} - c_{k+r-1}) + \cdots + (c_{k+1} - c_k) \le \bigl(q^{k+r-1} + \cdots + q^k\bigr)\,\|z_0\|_\infty = \frac{1-q^r}{1-q}\, q^k\,\|z_0\|_\infty,$$

where $q = 1 - \eta^{B_0}$. Hence, $\{c_k\}$ is a Cauchy sequence and therefore it converges to some $\bar c \in \mathbb{R}$. By letting $r \to \infty$ in the preceding relation, we obtain

$$0 \le \bar c - c_k \le \frac{q^k}{1-q}\,\|z_0\|_\infty. \qquad (15)$$

From the decomposition of $x_k$ [cf. Eq. (9)], and the relations $z_k \to 0$ and $c_k \to \bar c$, it follows that $(D_k\cdots D_1)x_0 \to \bar c\, e$ for any $x_0 \in \mathbb{R}^m$, with $\bar c$ being a function of $x_0$. Therefore, the limit of $D_k\cdots D_1$ as $k \to \infty$ exists. We denote this limit by $\bar D$, for which we have $\bar D x_0 = \bar c(x_0)\, e$ for all $x_0 \in \mathbb{R}^m$.

b) Since each $D_k$ is stochastic, the limit matrix $\bar D$ is also stochastic. Furthermore, because $(D_k\cdots D_1)x \to \bar c(x)\, e$ for any $x \in \mathbb{R}^m$, the limit matrix $\bar D$ has rank one. Thus, the rows of $\bar D$ are collinear. Because the sum of all entries of $\bar D$ in each of its rows is equal to 1, it follows that the rows of $\bar D$ are identical. Therefore, for some stochastic vector $\phi(s) \in \mathbb{R}^m$ [a function of the fixed $s$], we have $\bar D = e\,\phi'(s)$.

c) Let $x_k = (D_k\cdots D_1)x_0$ for an arbitrary $x_0 \in \mathbb{R}^m$. By omitting the explicit dependence on $x_0$ in $\bar c(x_0)$, and by using the decomposition of $x_k$ [cf. Eq. (9)], we have

$$(D_k\cdots D_1)x_0 - \bar D x_0 = z_k + (c_k - \bar c)\, e \quad \text{for all } k.$$

Using the estimates in Eqs. (14) and (15), we obtain for all $k \ge 1$,

$$\bigl\|(D_k\cdots D_1)x_0 - \bar D x_0\bigr\|_\infty \le \|z_k\|_\infty + |c_k - \bar c| \le \left(1 + \frac{1}{1-q}\right) q^k\,\|z_0\|_\infty.$$

Since $z_0 = x_0 - \min_{1\le i\le m}[x_0]_i\, e$ [cf. Eq. (10)], we have $\|z_0\|_\infty \le 2\|x_0\|_\infty$. Therefore,

$$\bigl\|(D_k\cdots D_1)x_0 - \bar D x_0\bigr\|_\infty \le 2\left(1 + \frac{1}{1-q}\right) q^k\,\|x_0\|_\infty \quad \text{for all } k,$$

with $q = 1 - \eta^{B_0}$, or equivalently

$$\bigl\|(D_k\cdots D_1)x_0 - \bar D x_0\bigr\|_\infty \le 2\bigl(1 + \eta^{-B_0}\bigr)\bigl(1 - \eta^{B_0}\bigr)^k\,\|x_0\|_\infty \quad \text{for all } k. \qquad (16)$$

Thus, the first relation of part c) of the lemma is established.

To show the second relation of part c) of the lemma, let $j \in \{1, \ldots, m\}$ be arbitrary. Let $e_j \in \mathbb{R}^m$ be the vector with the $j$-th entry equal to 1 and the other entries equal to 0. By setting $x_0 = e_j$ in Eq. (16), and by using $\bar D = e\,\phi'(s)$ and $\|e_j\|_\infty = 1$, we obtain

$$\bigl\|[D_k\cdots D_1]^j - \phi_j(s)\, e\bigr\|_\infty \le 2\bigl(1 + \eta^{-B_0}\bigr)\bigl(1 - \eta^{B_0}\bigr)^k \quad \text{for all } k.$$

Thus, it follows that

$$\bigl|[D_k\cdots D_1]^j_i - \phi_j(s)\bigr| \le 2\bigl(1 + \eta^{-B_0}\bigr)\bigl(1 - \eta^{B_0}\bigr)^k \quad \text{for all } k \ge 1 \text{ and } i.$$

The explicit form of the bound in part c) of this Lemma 3 is new; the other parts have been established by Tsitsiklis [23].

In the following lemma, we present convergence results for the matrices $\Phi(k,s)$ as $k$ goes to infinity. Lemma 3 plays a crucial role in establishing these results. In particular, we show that the matrices $\Phi(k,s)$ have the same limit as the matrices $(D_k(s)\cdots D_1(s))'$, when $k$ increases to infinity.

Lemma 4 Let the Weights Rule, Connectivity, and Bounded Intercommunication Interval assumptions hold [cf. Assumptions 1, 2, and 3]. We then have:

a) The limit $\bar\Phi(s) = \lim_{k\to\infty}\Phi(k,s)$ exists for each $s$.

b) The limit matrix $\bar\Phi(s)$ has identical columns and the columns are stochastic, i.e., $\bar\Phi(s) = \phi(s)\, e'$, where $\phi(s) \in \mathbb{R}^m$ is a stochastic vector for each $s$.

c) For every $i$, the entries $[\Phi(k,s)]^j_i$, $j = 1, \ldots, m$, converge to the same limit $\phi_i(s)$ as $k \to \infty$ with a geometric rate, i.e., for every $i \in \{1, \ldots, m\}$ and all $s \ge 0$,

$$\bigl|[\Phi(k,s)]^j_i - \phi_i(s)\bigr| \le 2\,\frac{1 + \eta^{-B_0}}{1 - \eta^{B_0}}\,\bigl(1 - \eta^{B_0}\bigr)^{\frac{k-s}{B_0}} \quad \text{for all } k \ge s \text{ and } j \in \{1, \ldots, m\},$$

where $\eta$ is the lower bound of Assumption 1(a), $B_0 = (m-1)B$, $m$ is the number of agents, and $B$ is the intercommunication interval bound of Assumption 3.

Proof. For a given $s$ and $k \ge s + B_0$, there exists $\kappa \ge 1$ such that $s + \kappa B_0 \le k < s + (\kappa+1)B_0$. Then, by the definition of $\Phi(k,s)$ [cf. Eqs. (6)-(7)], we have

$$\Phi(k,s) = \Phi(s + \kappa B_0 - 1,\, s)\,\Phi(k,\, s + \kappa B_0) = \Phi(s + B_0 - 1,\, s)\cdots\Phi\bigl(s + \kappa B_0 - 1,\, s + (\kappa-1)B_0\bigr)\,\Phi(k,\, s + \kappa B_0).$$

By using the matrices

$$D_k = \Phi'\bigl(s + kB_0 - 1,\, s + (k-1)B_0\bigr) \quad \text{for } k \ge 1 \qquad (17)$$

[the dependence of $D_k$ on $s$ is suppressed], we can write

$$\Phi(k,s) = (D_\kappa\cdots D_1)'\,\Phi(k,\, s + \kappa B_0).$$

Therefore, for any $i$, $j$ and $k \ge s + B_0$, we have

$$[\Phi(k,s)]^j_i = \sum_{h=1}^m [D_\kappa\cdots D_1]^i_h\,[\Phi(k,\, s + \kappa B_0)]^j_h \le \max_{1\le h\le m}[D_\kappa\cdots D_1]^i_h\,\sum_{h=1}^m [\Phi(k,\, s + \kappa B_0)]^j_h.$$

Since the columns of the matrix $\Phi(k,\, s + \kappa B_0)$ are stochastic vectors, it follows that for any $i$, $j$ and $k \ge s + B_0$,

$$[\Phi(k,s)]^j_i \le \max_{1\le h\le m}[D_\kappa\cdots D_1]^i_h. \qquad (18)$$

Similarly, it can be seen that for any $i$, $j$ and $k \ge s + B_0$,

$$[\Phi(k,s)]^j_i \ge \min_{1\le h\le m}[D_\kappa\cdots D_1]^i_h. \qquad (19)$$

In view of Lemma 3, for a given $s$, there exists a stochastic vector $\phi(s)$ such that $\lim_{k\to\infty}D_k\cdots D_1 = e\,\phi'(s)$. Furthermore, by Lemma 3(c) we have for every $i \in \{1, \ldots, m\}$,

$$\bigl|[D_\kappa\cdots D_1]^i_h - [\phi(s)]_i\bigr| \le 2\bigl(1 + \eta^{-B_0}\bigr)\bigl(1 - \eta^{B_0}\bigr)^\kappa \quad \text{for } \kappa \ge 1 \text{ and } h \in \{1, \ldots, m\}.$$

From the preceding relation, and inequalities (18) and (19), it follows that for $k \ge s + B_0$ and any $i, j \in \{1, \ldots, m\}$,

$$\bigl|[\Phi(k,s)]^j_i - [\phi(s)]_i\bigr| \le \max\left\{\Bigl|\max_{1\le h\le m}[D_\kappa\cdots D_1]^i_h - [\phi(s)]_i\Bigr|,\ \Bigl|\min_{1\le h\le m}[D_\kappa\cdots D_1]^i_h - [\phi(s)]_i\Bigr|\right\} \le 2\bigl(1 + \eta^{-B_0}\bigr)\bigl(1 - \eta^{B_0}\bigr)^\kappa.$$

Since $\kappa \ge 1$ and $s + \kappa B_0 \le k < s + (\kappa+1)B_0$, we have

$$\bigl(1 - \eta^{B_0}\bigr)^\kappa = \bigl(1 - \eta^{B_0}\bigr)^{\kappa+1}\,\frac{1}{1 - \eta^{B_0}} = \bigl(1 - \eta^{B_0}\bigr)^{\frac{(s + (\kappa+1)B_0) - s}{B_0}}\,\frac{1}{1 - \eta^{B_0}} \le \bigl(1 - \eta^{B_0}\bigr)^{\frac{k-s}{B_0}}\,\frac{1}{1 - \eta^{B_0}},$$

where the last inequality follows from the relations $0 < 1 - \eta^{B_0} < 1$ and $k < s + (\kappa+1)B_0$. By combining the preceding two relations, we obtain for $k \ge s + B_0$ and any $i, j \in \{1, \ldots, m\}$,

$$\bigl|[\Phi(k,s)]^j_i - [\phi(s)]_i\bigr| \le 2\,\frac{1 + \eta^{-B_0}}{1 - \eta^{B_0}}\,\bigl(1 - \eta^{B_0}\bigr)^{\frac{k-s}{B_0}}. \qquad (20)$$

Therefore, we have

$$\lim_{k\to\infty}\Phi(k,s) = \phi(s)\, e' = \bar\Phi(s) \quad \text{for every } s,$$

thus showing part a) of the lemma. Furthermore, we have that all the columns of $\bar\Phi(s)$ coincide with the vector $\phi(s)$, which is a stochastic vector by Lemma 3(b). This shows part b) of the lemma.

Note that relation (20) holds for $k \ge s + B_0$ and any $i, j \in \{1, \ldots, m\}$. To prove part c) of the lemma, we need to show that the estimate of Eq. (20) holds for arbitrary $s$ and for $k$ with $s + B_0 > k \ge s$, and any $i, j \in \{1, \ldots, m\}$. Thus, let $s$ be arbitrary and let $s + B_0 > k \ge s$. Because $\Phi'(k,s)$ is a stochastic matrix, we have $0 \le [\Phi(k,s)]^j_i \le 1$ for all $i$ and $j$. Therefore, for $k$ with $s + B_0 > k \ge s$, and any $i, j \in \{1, \ldots, m\}$,

$$\bigl|[\Phi(k,s)]^j_i - [\phi(s)]_i\bigr| \le 2 < 2\bigl(1 + \eta^{-B_0}\bigr) = 2\,\frac{1 + \eta^{-B_0}}{1 - \eta^{B_0}}\,\bigl(1 - \eta^{B_0}\bigr)^{\frac{(s + B_0) - s}{B_0}} < 2\,\frac{1 + \eta^{-B_0}}{1 - \eta^{B_0}}\,\bigl(1 - \eta^{B_0}\bigr)^{\frac{k-s}{B_0}},$$

where the last inequality follows from the relations $0 < 1 - \eta^{B_0} < 1$ and $k < s + B_0$. From the preceding relation and Eq. (20) it follows that for every $s$ and $i \in \{1, \ldots, m\}$,

$$\bigl|[\Phi(k,s)]^j_i - [\phi(s)]_i\bigr| \le 2\,\frac{1 + \eta^{-B_0}}{1 - \eta^{B_0}}\,\bigl(1 - \eta^{B_0}\bigr)^{\frac{k-s}{B_0}} \quad \text{for all } k \ge s \text{ and } j \in \{1, \ldots, m\},$$

thus showing part c) of the lemma.

The preceding results are shown by following the line of analysis of Tsitsiklis [23] (see Lemma 5.2.1 in [23]; see also Bertsekas and Tsitsiklis [2]). The rate estimate given in part c) is new and provides the explicit dependence of the convergence of the transition matrices on the system parameters and problem data. This estimate will be essential in providing convergence rate results for the subgradient method [cf. Eq. (5)].

3.2 Limit Vectors $\phi(s)$

The agents' objective is to cooperatively minimize the additive cost function $\sum_{i=1}^m f_i(x)$, while each agent individually performs his own state updates according to the subgradient method of Eq. (5). In order to reach a consensus on the optimal solution of the problem, it is essential that the agents process their individual objective functions with the same frequency in the long run. This will be guaranteed if the limit vectors $\phi(s)$ converge to the uniform distribution, i.e., $\lim_{s\to\infty}\phi(s) = \frac{1}{m}e$. One way of ensuring this is to have $\phi(s) = \frac{1}{m}e$ for all $s$, which holds when the weight matrices $A(k)$ are doubly stochastic. We formally impose this condition in the following.

Assumption 4 (Doubly Stochastic Weights) Let the weight vectors $a^1(k), \ldots, a^m(k)$, $k = 0, 1, \ldots$, satisfy the Weights Rule [cf. Assumption 1]. Assume further that the matrices $A(k) = [a^1(k), \ldots, a^m(k)]$ are doubly stochastic for all $k$.

Under this and some additional assumptions, we show that all the limit vectors $\phi(s)$ are the same and correspond to the uniform distribution $\frac{1}{m}e$. This is an immediate consequence of Lemma 4, as seen in the following.

Proposition 1 (Uniform Limit Distribution) Let the Connectivity, Bounded Intercommunication Interval, and Doubly Stochastic Weights assumptions hold [cf. Assumptions 2, 3, and 4]. We then have:

a) The limit matrices $\bar\Phi(s) = \lim_{k\to\infty}\Phi(k,s)$ are doubly stochastic and correspond to a uniform steady state distribution for all $s$, i.e., $\bar\Phi(s) = \frac{1}{m}\, e\, e'$ for all $s$.

b) The entries $[\Phi(k,s)]^j_i$ converge to $\frac{1}{m}$ as $k \to \infty$ with a geometric rate uniformly with respect to $i$ and $j$, i.e., for all $i, j \in \{1, \ldots, m\}$,

$$\left|[\Phi(k,s)]^j_i - \frac{1}{m}\right| \le 2\,\frac{1 + \eta^{-B_0}}{1 - \eta^{B_0}}\,\bigl(1 - \eta^{B_0}\bigr)^{\frac{k-s}{B_0}} \quad \text{for all } s \text{ and } k \text{ with } k \ge s,$$

where $\eta$ is the lower bound of Assumption 1(a), $B_0 = (m-1)B$, $m$ is the number of agents, and $B$ is the intercommunication interval bound of Assumption 3.

Proof. a) Since the matrix $A(k)$ is doubly stochastic for all $k$, the matrix $\Phi(k,s)$ [cf. Eqs. (6)-(7)] is also doubly stochastic for all $s$ and $k$ with $k \ge s$. In view of Lemma 4, the limit matrix $\bar\Phi(s) = \phi(s)e'$ is doubly stochastic for every $s$. Therefore, we have $\phi(s)\,e'e = e$ for all $s$, implying that $\phi(s) = \frac{1}{m}e$ for all $s$.

b) The geometric rate estimate follows directly from Lemma 4(c).
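To get a feel for the geometric bound in Proposition 1(b), the following small Python sketch evaluates it for arbitrary illustrative parameter values (not taken from the paper). It also shows that the bound, while geometric in $k - s$, can be quite conservative when $\eta$ is small and $B_0$ is large.

# Evaluate |[Phi(k,s)]^j_i - 1/m| <= 2*(1+eta^{-B0})/(1-eta^{B0})*(1-eta^{B0})^{(k-s)/B0},
# with B0 = (m-1)*B, for illustrative parameter values.
eta, m, B = 0.25, 4, 2
B0 = (m - 1) * B
rho = 1.0 - eta ** B0                               # contraction factor per B0 slots
coeff = 2.0 * (1.0 + eta ** (-B0)) / (1.0 - eta ** B0)
for k_minus_s in (0, 10, 100, 1000):
    print(k_minus_s, coeff * rho ** (k_minus_s / B0))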

The requirement that $A(k)$ is doubly stochastic for all $k$ inherently dictates that the agents share the information about their weights and coordinate the choices of the weights when updating their estimates. In this scenario, we view the weights of the agents as being of two types: planned weights and the actual weights they use in their updates. Specifically, let the weight $p^i_j(k) > 0$ be the weight that agent $i$ plans to use at the update time $t_{k+1}$ provided that an estimate $x_j(k)$ is received from agent $j$ in the interval $(t_k, t_{k+1})$. If agent $j$ communicates with agent $i$ during the time interval $(t_k, t_{k+1})$, then these agents communicate to each other their estimates $x_j(k)$ and $x_i(k)$ as well as their planned weights $p^j_i(k)$ and $p^i_j(k)$. At the next update time $t_{k+1}$, the actual weight $a^i_j(k)$ that agent $i$ assigns to the estimate $x_j(k)$ is selected as the minimum of the agent $j$ planned weight $p^j_i(k)$ and the agent $i$ planned weight $p^i_j(k)$. We summarize this in the following assumption.

Assumption 5 (Simultaneous Information Exchange) The agents exchange information simultaneously: if agent $j$ communicates to agent $i$ at some time, then agent $i$ also communicates to agent $j$ at that time, i.e., if $(j,i) \in E_k$ for some $k$, then $(i,j) \in E_k$. Furthermore, when agents $i$ and $j$ communicate, they exchange their estimates $x_i(k)$ and $x_j(k)$, and their planned weights $p^i_j(k)$ and $p^j_i(k)$.

We now show that when agents choose the smallest of their planned weights and the planned weights are stochastic, then the actual weights form a doubly stochastic matrix.

Assumption 6 (Symmetric Weights) Let the agent planned weights $p^i_j(k)$, $i, j = 1, \ldots, m$, be such that for some scalar $\eta$, with $0 < \eta < 1$, we have $p^i_j(k) \ge \eta$ for all $i$, $j$ and $k$, and $\sum_{j=1}^m p^i_j(k) = 1$ for all $i$ and $k$. Furthermore, let the actual weights $a^i_j(k)$, $i, j = 1, \ldots, m$, that the agents use in their updates be given by:

i) $a^i_j(k) = \min\{p^i_j(k),\, p^j_i(k)\}$ when agents $i$ and $j$ communicate during the time interval $(t_k, t_{k+1})$, and $a^i_j(k) = 0$ otherwise.

ii) $a^i_i(k) = 1 - \sum_{j \ne i} a^i_j(k)$.
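A minimal Python sketch of the planned/actual weight construction of Assumptions 5 and 6 is given below; the matrix layout (row $i$ holding $a^i(k)$) and the helper name are illustrative choices, not the paper's. The printed row and column sums illustrate the double stochasticity argued in the result that follows.

import numpy as np

def symmetric_actual_weights(planned, comm):
    """planned[i, j]: planned weight p^i_j(k) (each row stochastic, entries >= eta).
    comm[i, j]: True if agents i and j communicate in slot k (symmetric).
    Returns the matrix of actual weights a^i_j(k)."""
    m = planned.shape[0]
    A = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            if i != j and comm[i, j]:
                A[i, j] = min(planned[i, j], planned[j, i])  # a^i_j(k) = min{p^i_j, p^j_i}
        A[i, i] = 1.0 - A[i].sum()                           # a^i_i(k) = 1 - sum_{j != i} a^i_j(k)
    return A

planned = np.array([[0.6, 0.2, 0.2],
                    [0.4, 0.4, 0.2],
                    [0.3, 0.3, 0.4]])
comm = np.array([[False, True, True],
                 [True, False, False],
                 [True, False, False]])
A = symmetric_actual_weights(planned, comm)
print(A)
print(A.sum(axis=0), A.sum(axis=1))  # both row and column sums equal 1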

The preceding discussion, combined with Lemma 1, yields the following result.

Proposition 2 Let the Connectivity, Bounded Intercommunication Interval, Simultaneous Information Exchange, and Symmetric Weights assumptions hold [cf. Assumptions 2, 3, 5, and 6]. We then have:

a) The limit matrices $\bar\Phi(s) = \lim_{k\to\infty}\Phi(k,s)$ are doubly stochastic and correspond to a uniform steady state distribution for all $s$, i.e., $\bar\Phi(s) = \frac{1}{m}\, e\, e'$ for all $s$.

b) The entries $[\Phi(k,s)]^j_i$ converge to $\frac{1}{m}$ as $k \to \infty$ with a geometric rate uniformly with respect to $i$ and $j$, i.e., for all $i, j \in \{1, \ldots, m\}$,

$$\left|[\Phi(k,s)]^j_i - \frac{1}{m}\right| \le 2\,\frac{1 + \eta^{-B_0}}{1 - \eta^{B_0}}\,\bigl(1 - \eta^{B_0}\bigr)^{\frac{k-s}{B_0}} \quad \text{for all } s \text{ and } k \text{ with } k \ge s,$$

where $\eta$ is the lower bound of the Symmetric Weights Assumption 6, $B_0 = (m-1)B$, $m$ is the number of agents, and $B$ is the intercommunication interval bound of Assumption 3.

Proof. In view of the Uniform Limit Distribution result [cf. Proposition 1], it suffices to show that the Simultaneous Information Exchange and Symmetric Weights assumptions [cf. Assumptions 5 and 6] imply that Assumption 4 holds. In particular, we need to show that the actual weights $a^i_j(k)$, $i, j = 1, \ldots, m$, satisfy the Weights Rule [cf. Assumption 1], and that the vectors $a^i(k)$, $i = 1, \ldots, m$, form a doubly stochastic matrix.

First, note that by the Symmetric Weights assumption, the weights $a^i_j(k)$, $i, j = 1, \ldots, m$, satisfy the Weights Rule. Thus, the agent weight vectors $a^i(k)$, $i = 1, \ldots, m$, are stochastic, and hence, the weight matrix $A'(k)$ with rows $a^i(k)'$, $i = 1, \ldots, m$, is stochastic for all $k$. Second, note that by the Simultaneous Information Exchange and Symmetric Weights assumptions, it follows that the weight matrix $A(k)$ is symmetric for all $k$. Since $A'(k)$ is stochastic for any $k$, we have that $A(k)$ is doubly stochastic for all $k$.

4 Convergence of the Subgradient Method

Here, we study the convergence behavior of the subgradient method introduced in Section 2. In particular, we have shown that the iterations of the method satisfy the following relation: for any $i \in \{1, \ldots, m\}$, and $s$ and $k$ with $k \ge s$,

$$x_i(k+1) = \sum_{j=1}^m [\Phi(k,s)]^i_j\, x_j(s) - \sum_{r=s+1}^{k}\sum_{j=1}^m [\Phi(k,r)]^i_j\, \alpha_j(r-1)\, d_j(r-1) - \alpha_i(k)\, d_i(k)$$

[cf. Eq. (5)]. We analyze this model under the Symmetric Weights assumption (cf. Assumption 6). Also, we consider the case of a constant stepsize that is common to all agents, i.e., $\alpha_j(r) = \alpha$ for all $r$ and all agents $j$, so that the model reduces to the following: for any $i \in \{1, \ldots, m\}$, and $s$ and $k$ with $k \ge s$,

$$x_i(k+1) = \sum_{j=1}^m [\Phi(k,s)]^i_j\, x_j(s) - \alpha\sum_{r=s+1}^{k}\sum_{j=1}^m [\Phi(k,r)]^i_j\, d_j(r-1) - \alpha\, d_i(k). \qquad (21)$$

To analyze this model, we consider a related "stopped" model whereby the agents stop computing the subgradients $d_j(k)$ at some time, but they keep exchanging their information and updating their estimates using only the weights for the rest of the time. To describe the stopped model, we use relation (21) with $s = 0$, from which we obtain

$$x_i(k+1) = \sum_{j=1}^m [\Phi(k,0)]^i_j\, x_j(0) - \alpha\sum_{s=1}^{k}\sum_{j=1}^m [\Phi(k,s)]^i_j\, d_j(s-1) - \alpha\, d_i(k). \qquad (22)$$

Suppose that agents cease computing $d_j(k)$ after some time $t_{\bar k}$, so that

$$d_j(k) = 0 \quad \text{for all } j \text{ and all } k \text{ with } k \ge \bar k.$$

Let $\{\tilde x_i(k)\}$, $i = 1, \ldots, m$, be the sequences of the estimates generated by the agents in this case. Then, from relation (22) we have for all $i$, $\tilde x_i(k) = x_i(k)$ for all $k \le \bar k$, and for $k > \bar k$,

$$\tilde x_i(k) = \sum_{j=1}^m [\Phi(k-1,0)]^i_j\, x_j(0) - \alpha\sum_{s=1}^{k-1}\sum_{j=1}^m [\Phi(k-1,s)]^i_j\, d_j(s-1) - \alpha\, d_i(k-1) = \sum_{j=1}^m [\Phi(k-1,0)]^i_j\, x_j(0) - \alpha\sum_{s=1}^{\bar k}\sum_{j=1}^m [\Phi(k-1,s)]^i_j\, d_j(s-1).$$

By letting $k \to \infty$ and by using Proposition 2(b), we see that the limit vector $\lim_{k\to\infty}\tilde x_i(k)$ exists. Furthermore, the limit vector does not depend on $i$, but does depend on $\bar k$. We denote this limit by $y(\bar k)$, i.e.,

$$\lim_{k\to\infty}\tilde x_i(k) = y(\bar k),$$

for which, by Proposition 2(a), we have

$$y(\bar k) = \frac{1}{m}\sum_{j=1}^m x_j(0) - \alpha\sum_{s=1}^{\bar k}\left(\frac{1}{m}\sum_{j=1}^m d_j(s-1)\right).$$

Note that this relation holds for any $\bar k$, so we may re-index these relations by using $k$, and thus obtain

$$y(k+1) = y(k) - \frac{\alpha}{m}\sum_{j=1}^m d_j(k) \quad \text{for all } k. \qquad (23)$$
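The recursion (23) says that the network-wide average behaves like a centralized subgradient step on $\frac{1}{m}\sum_j f_j$, except that each $d_j(k)$ is evaluated at agent $j$'s own estimate $x_j(k)$ rather than at $y(k)$. A minimal Python sketch of one such step, with an illustrative toy objective, is given below; the names and the objective are not from the paper.

import numpy as np

def average_iterate_step(y, agent_estimates, subgrads, alpha):
    """y(k+1) = y(k) - (alpha/m) * sum_j d_j(k), where d_j(k) is a subgradient
    of f_j evaluated at agent j's estimate x_j(k) (agent_estimates[j])."""
    m = len(subgrads)
    d_sum = sum(subgrads[j](agent_estimates[j]) for j in range(m))
    return y - (alpha / m) * d_sum

# Toy example: f_1(x) = |x - 1|, f_2(x) = |x + 1|; any point in [-1, 1] is optimal.
subgrads = [lambda z: np.sign(z - 1.0), lambda z: np.sign(z + 1.0)]
y = np.array([3.0])
x_agents = [np.array([3.2]), np.array([2.8])]   # agents' current estimates
print(average_iterate_step(y, x_agents, subgrads, alpha=0.1))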

Recall that the vector $d_j(k)$ is a subgradient of the agent $j$ objective function $f_j(x)$ at $x = x_j(k)$. Thus, the preceding iteration can be viewed as an iteration of an approximate subgradient method. Specifically, for each $j$, the method uses a subgradient of $f_j$ at the estimate $x_j(k)$ approximating the vector $y(k)$ [instead of a subgradient of $f_j(x)$ at $x = y(k)$].

We start with a lemma providing some basic relations used in the analysis of subgradient methods. Similar relations have been used in various ways to analyze subgradient approaches (for example, see Shor [21], Polyak [20], Nedić and Bertsekas [12], [13], and Nedić, Bertsekas, and Borkar [14]). In the following lemma and thereafter, we use the notation $f(x) = \sum_{i=1}^m f_i(x)$.

Lemma 5 (Basic Iterate Relation) Let the sequence $\{y(k)\}$ be generated by the iteration (23), and the sequences $\{x_j(k)\}$ for $j \in \{1, \ldots, m\}$ be generated by the iteration (22). Let $\{g_j(k)\}$ be a sequence of subgradients such that $g_j(k) \in \partial f_j(y(k))$ for all $j \in \{1, \ldots, m\}$ and $k \ge 0$. We then have:

a) For any $x \in \mathbb{R}^n$ and all $k \ge 0$,

$$\|y(k+1) - x\|^2 \le \|y(k) - x\|^2 + \frac{2\alpha}{m}\sum_{j=1}^m\bigl(\|d_j(k)\| + \|g_j(k)\|\bigr)\,\|y(k) - x_j(k)\| - \frac{2\alpha}{m}\bigl[f(y(k)) - f(x)\bigr] + \frac{\alpha^2}{m^2}\Bigl\|\sum_{j=1}^m d_j(k)\Bigr\|^2.$$

b) When the optimal solution set $X^*$ is nonempty, there holds for all $k \ge 0$,

$$\mathrm{dist}^2\bigl(y(k+1), X^*\bigr) \le \mathrm{dist}^2\bigl(y(k), X^*\bigr) + \frac{2\alpha}{m}\sum_{j=1}^m\bigl(\|d_j(k)\| + \|g_j(k)\|\bigr)\,\|y(k) - x_j(k)\| - \frac{2\alpha}{m}\bigl[f(y(k)) - f^*\bigr] + \frac{\alpha^2}{m^2}\Bigl\|\sum_{j=1}^m d_j(k)\Bigr\|^2.$$

Proof. From relation (23) we obtain for any $x \in \mathbb{R}^n$ and all $k \ge 0$,

$$\|y(k+1) - x\|^2 = \Bigl\|y(k) - \frac{\alpha}{m}\sum_{j=1}^m d_j(k) - x\Bigr\|^2,$$

implying that

$$\|y(k+1) - x\|^2 \le \|y(k) - x\|^2 - \frac{2\alpha}{m}\sum_{j=1}^m d_j(k)'\bigl(y(k) - x\bigr) + \frac{\alpha^2}{m^2}\Bigl\|\sum_{j=1}^m d_j(k)\Bigr\|^2. \qquad (24)$$

We now consider the term $d_j(k)'(y(k) - x)$ for any $j$, for which we have

$$d_j(k)'\bigl(y(k) - x\bigr) = d_j(k)'\bigl(y(k) - x_j(k)\bigr) + d_j(k)'\bigl(x_j(k) - x\bigr) \ge -\|d_j(k)\|\,\|y(k) - x_j(k)\| + d_j(k)'\bigl(x_j(k) - x\bigr).$$

Since $d_j(k)$ is a subgradient of $f_j$ at $x_j(k)$ [cf. Eq. (1)], we further have for any $j$ and any $x \in \mathbb{R}^n$,

$$d_j(k)'\bigl(x_j(k) - x\bigr) \ge f_j(x_j(k)) - f_j(x).$$

Furthermore, by using a subgradient $g_j(k)$ of $f_j$ at $y(k)$ [cf. Eq. (1)], we also have for any $j$ and $x \in \mathbb{R}^n$,

$$f_j(x_j(k)) - f_j(x) = f_j(x_j(k)) - f_j(y(k)) + f_j(y(k)) - f_j(x) \ge g_j(k)'\bigl(x_j(k) - y(k)\bigr) + f_j(y(k)) - f_j(x) \ge -\|g_j(k)\|\,\|x_j(k) - y(k)\| + f_j(y(k)) - f_j(x).$$

By combining the preceding three relations it follows that for any $j$ and $x \in \mathbb{R}^n$,

$$d_j(k)'\bigl(y(k) - x\bigr) \ge -\bigl(\|d_j(k)\| + \|g_j(k)\|\bigr)\,\|y(k) - x_j(k)\| + f_j(y(k)) - f_j(x).$$

Summing this relation over all $j$, we obtain

$$\sum_{j=1}^m d_j(k)'\bigl(y(k) - x\bigr) \ge -\sum_{j=1}^m\bigl(\|d_j(k)\| + \|g_j(k)\|\bigr)\,\|y(k) - x_j(k)\| + f(y(k)) - f(x).$$

By combining the preceding inequality with Eq. (24), the relation in part a) follows, i.e., for all $x \in \mathbb{R}^n$ and all $k \ge 0$,

$$\|y(k+1) - x\|^2 \le \|y(k) - x\|^2 + \frac{2\alpha}{m}\sum_{j=1}^m\bigl(\|d_j(k)\| + \|g_j(k)\|\bigr)\,\|y(k) - x_j(k)\| - \frac{2\alpha}{m}\bigl[f(y(k)) - f(x)\bigr] + \frac{\alpha^2}{m^2}\Bigl\|\sum_{j=1}^m d_j(k)\Bigr\|^2.$$

The relation in part b) follows by letting $x \in X^*$ and by taking the infimum over $x \in X^*$ on both sides of the preceding relation.

We adopt the following assumptions for our convergence analysis:

Assumption 7 (Bounded Subgradients) The subgradient sequences $\{d_j(k)\}$ and $\{g_j(k)\}$ are bounded for each $j$, i.e., there exists a scalar $L > 0$ such that

$$\max\{\|d_j(k)\|,\, \|g_j(k)\|\} \le L \quad \text{for all } j = 1, \ldots, m, \text{ and all } k \ge 0.$$

This assumption is satisfied, for example, when each $f_i$ is polyhedral (i.e., $f_i$ is the pointwise maximum of a finite number of affine functions).

Assumption 8 (Nonempty Optimal Solution Set) The optimal solution set $X^*$ is nonempty.

Our main convergence results are given in the following proposition. In particular, we provide a uniform bound on the norm of the difference between $y(k)$ and $x_i(k)$ that holds for all $i \in \{1, \ldots, m\}$ and all $k \ge 0$. We also consider the averaged vectors $\hat y(k)$ and $\hat x_i(k)$ defined for all $k \ge 1$ as follows:

$$\hat y(k) = \frac{1}{k}\sum_{h=0}^{k-1} y(h), \qquad \hat x_i(k) = \frac{1}{k}\sum_{h=0}^{k-1} x_i(h) \quad \text{for all } i \in \{1, \ldots, m\}. \qquad (25)$$

We provide upper bounds on the objective function value of the averaged vectors. Note that averaging allows us to provide our estimates per iteration (see Footnote 3).

Proposition 3 Let the Connectivity, Bounded Intercommunication Interval, Simultaneous Information Exchange, and Symmetric Weights assumptions hold [cf. Assumptions 2, 3, 5, and 6]. Let the Bounded Subgradients and Nonempty Optimal Solution Set assumptions hold [cf. Assumptions 7 and 8]. Let $x_j(0)$ denote the initial vector of agent $j$ and assume that $\max_{1\le j\le m}\|x_j(0)\| \le \alpha L$. Let the sequence $\{y(k)\}$ be generated by the iteration (23), and let the sequences $\{x_i(k)\}$ be generated by the iteration (22). We then have:

a) For every $i \in \{1, \ldots, m\}$, a uniform upper bound on $\|y(k) - x_i(k)\|$ is given by:

$$\|y(k) - x_i(k)\| \le 2\alpha L C_1 \quad \text{for all } k \ge 0, \qquad C_1 = 1 + \frac{m}{1 - \bigl(1 - \eta^{B_0}\bigr)^{1/B_0}}\,\frac{1 + \eta^{-B_0}}{1 - \eta^{B_0}}.$$

b) Let $\hat y(k)$ and $\hat x_i(k)$ be the averaged vectors of Eq. (25). An upper bound on the objective cost $f(\hat y(k))$ is given by:

$$f(\hat y(k)) \le f^* + \frac{m\,\mathrm{dist}^2\bigl(y(0), X^*\bigr)}{2\alpha k} + \frac{\alpha L^2 C m}{2} \quad \text{for all } k \ge 1.$$

When there are subgradients $\hat g_{ij}(k)$ of $f_j$ at $\hat x_i(k)$ that are bounded uniformly by some constant $\hat L_1$, an upper bound on the objective value $f(\hat x_i(k))$ for each $i$ is given by:

$$f(\hat x_i(k)) \le f^* + \frac{m\,\mathrm{dist}^2\bigl(y(0), X^*\bigr)}{2\alpha k} + \alpha m\left(\frac{L^2 C}{2} + 2\hat L_1 L C_1\right) \quad \text{for all } k \ge 1,$$

where $L$ is the subgradient norm bound of Assumption 7, $y(0) = \frac{1}{m}\sum_{j=1}^m x_j(0)$, and $C = 1 + 8C_1$. The constant $B_0$ is given by $B_0 = (m-1)B$ and $B$ is the intercommunication interval bound of Assumption 3.

Footnote 3: See also our recent work [15] which uses averaging to generate approximate primal solutions with convergence rate estimates for dual subgradient methods.

Proof. a) From Eq. (23) it follows that

$$y(k) = y(0) - \frac{\alpha}{m}\sum_{s=0}^{k-1}\sum_{j=1}^m d_j(s) \quad \text{for all } k \ge 1.$$

Using this relation, the relation $y(0) = \frac{1}{m}\sum_{j=1}^m x_j(0)$, and Eq. (22), we obtain for all $k \ge 0$ and $i \in \{1, \ldots, m\}$,

$$y(k) - x_i(k) = \sum_{j=1}^m\left(\frac{1}{m} - [\Phi(k-1,0)]^i_j\right)x_j(0) - \alpha\sum_{s=1}^{k-1}\sum_{j=1}^m\left(\frac{1}{m} - [\Phi(k-1,s)]^i_j\right)d_j(s-1) - \alpha\left(\frac{1}{m}\sum_{j=1}^m d_j(k-1) - d_i(k-1)\right).$$

Therefore, for all $k \ge 0$ and $i \in \{1, \ldots, m\}$,

$$\|y(k) - x_i(k)\| \le \max_{1\le j\le m}\|x_j(0)\|\sum_{j=1}^m\left|\frac{1}{m} - [\Phi(k-1,0)]^i_j\right| + \alpha\sum_{s=1}^{k-1}\sum_{j=1}^m\|d_j(s-1)\|\left|\frac{1}{m} - [\Phi(k-1,s)]^i_j\right| + \alpha\left(\frac{1}{m}\sum_{j=1}^m\|d_j(k-1)\| + \|d_i(k-1)\|\right).$$

Using the estimates for $\bigl|[\Phi(k-1,s)]^i_j - \frac{1}{m}\bigr|$ of Proposition 2(b), the assumption that $\max_{1\le j\le m}\|x_j(0)\| \le \alpha L$, and the subgradient boundedness [cf. Assumption 7], from the preceding relation we obtain for all $k \ge 0$ and $i \in \{1, \ldots, m\}$,

$$\|y(k) - x_i(k)\| \le 2\alpha L\, m\,\frac{1 + \eta^{-B_0}}{1 - \eta^{B_0}}\sum_{s=0}^{k-1}\bigl(1 - \eta^{B_0}\bigr)^{\frac{k-1-s}{B_0}} + 2\alpha L \le 2\alpha L\left(1 + \frac{m}{1 - \bigl(1 - \eta^{B_0}\bigr)^{1/B_0}}\,\frac{1 + \eta^{-B_0}}{1 - \eta^{B_0}}\right).$$

b) By using Lemma 5(b) and the subgradient boundedness [cf. Assumption 7], we have for all $k \ge 0$,

$$\mathrm{dist}^2\bigl(y(k+1), X^*\bigr) \le \mathrm{dist}^2\bigl(y(k), X^*\bigr) + \frac{4\alpha L}{m}\sum_{j=1}^m\|y(k) - x_j(k)\| - \frac{2\alpha}{m}\bigl[f(y(k)) - f^*\bigr] + \alpha^2 L^2.$$

Using the estimate of part a), we obtain for all $k \ge 0$,

$$\mathrm{dist}^2\bigl(y(k+1), X^*\bigr) \le \mathrm{dist}^2\bigl(y(k), X^*\bigr) + 4\alpha L\,(2\alpha L C_1) - \frac{2\alpha}{m}\bigl[f(y(k)) - f^*\bigr] + \alpha^2 L^2 = \mathrm{dist}^2\bigl(y(k), X^*\bigr) + \alpha^2 L^2 C - \frac{2\alpha}{m}\bigl[f(y(k)) - f^*\bigr],$$

where $C = 1 + 8C_1$. Therefore, we have

$$f(y(k)) - f^* \le \frac{\mathrm{dist}^2\bigl(y(k), X^*\bigr) - \mathrm{dist}^2\bigl(y(k+1), X^*\bigr)}{2\alpha/m} + \frac{\alpha L^2 C m}{2} \quad \text{for all } k \ge 0.$$

By summing the preceding relations over $0, \ldots, k-1$ and dividing the sum by $k$, we have for any $k \ge 1$,

$$\frac{1}{k}\sum_{h=0}^{k-1}f(y(h)) - f^* \le \frac{\mathrm{dist}^2\bigl(y(0), X^*\bigr) - \mathrm{dist}^2\bigl(y(k), X^*\bigr)}{2\alpha k/m} + \frac{\alpha L^2 C m}{2} \le \frac{m\,\mathrm{dist}^2\bigl(y(0), X^*\bigr)}{2\alpha k} + \frac{\alpha L^2 C m}{2}. \qquad (26)$$

By the convexity of the function $f$, we have

$$\frac{1}{k}\sum_{h=0}^{k-1}f(y(h)) \ge f(\hat y(k)), \quad \text{where } \hat y(k) = \frac{1}{k}\sum_{h=0}^{k-1}y(h).$$

Therefore, by using the relation in (26), we obtain

$$f(\hat y(k)) \le f^* + \frac{m\,\mathrm{dist}^2\bigl(y(0), X^*\bigr)}{2\alpha k} + \frac{\alpha L^2 C m}{2} \quad \text{for all } k \ge 1. \qquad (27)$$

We now show the estimate for $f(\hat x_i(k))$. By the subgradient definition, we have

$$f(\hat x_i(k)) \le f(\hat y(k)) + \sum_{j=1}^m \hat g_{ij}(k)'\bigl(\hat x_i(k) - \hat y(k)\bigr) \quad \text{for all } i \in \{1, \ldots, m\} \text{ and } k \ge 1,$$

where $\hat g_{ij}(k)$ is a subgradient of $f_j$ at $\hat x_i(k)$. Since $\|\hat g_{ij}(k)\| \le \hat L_1$ for all $i, j \in \{1, \ldots, m\}$ and $k \ge 1$, it follows that

$$f(\hat x_i(k)) \le f(\hat y(k)) + m\hat L_1\,\|\hat x_i(k) - \hat y(k)\|.$$

Using the estimate in part a), the relation $\|\hat x_i(k) - \hat y(k)\| \le \frac{1}{k}\sum_{l=0}^{k-1}\|x_i(l) - y(l)\|$, and Eq. (27), we obtain for all $i \in \{1, \ldots, m\}$ and $k \ge 1$,

$$f(\hat x_i(k)) \le f^* + \frac{m\,\mathrm{dist}^2\bigl(y(0), X^*\bigr)}{2\alpha k} + \frac{\alpha L^2 C m}{2} + 2\alpha m\hat L_1 L C_1.$$

Part a) of the preceding proposition shows that the error between $y(k)$ and $x_i(k)$ for all $i$ is bounded from above by a constant that is proportional to the stepsize $\alpha$, i.e., by picking a smaller stepsize in the subgradient method, one can guarantee a smaller error between the vectors $y(k)$ and $x_i(k)$ for all $i \in \{1, \ldots, m\}$ and all $k \ge 0$.

In part b) of the proposition, we provide upper bounds on the objective function values of the averaged vectors $\hat y(k)$ and $\hat x_i(k)$. The upper bounds on $f(\hat x_i(k))$ provide estimates for the error from the optimal value $f^*$ at each iteration $k$. More importantly, they show that the error consists of two additive terms: The first term is inversely proportional to the stepsize $\alpha$ and goes to zero at a rate $1/k$. The second term is a constant that is proportional to the stepsize $\alpha$, the subgradient bound $L$, and the constants $C$ and $C_1$, which are related to the convergence of the transition matrices $\Phi(k,s)$. Hence, our bounds provide explicit per-iteration error expressions for the estimates maintained at each agent $i$.

The fact that there is a constant error term in the estimates which is proportional to the stepsize value $\alpha$ is not surprising, and is due to the fact that a constant stepsize rule is used in the subgradient method of Eq. (21). It is possible to use different stepsize rules (e.g., a diminishing stepsize rule or adaptive stepsize rules; see [1], [13], and [12]) to drive the error to zero in the limit. We use the constant stepsize rule in view of its simplicity and because our goal is to generate approximate optimal solutions in a relatively small number of iterations. Our analysis explicitly characterizes the tradeoff between the quality of an approximate solution and the computation load required to generate such a solution in terms of the stepsize value $\alpha$.
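The tradeoff just described can be made concrete numerically: for fixed problem constants, the bound of Proposition 3(b) decays like $1/k$ down to a floor proportional to $\alpha$. The following small Python sketch evaluates the bound $\frac{m\,\mathrm{dist}^2(y(0),X^*)}{2\alpha k} + \frac{\alpha L^2 C m}{2}$ for illustrative values of the constants (not taken from the paper).

# Illustrative evaluation of the constant-stepsize error bound: the 1/k term
# shrinks with more iterations, while the alpha-proportional floor remains.
m, L, C, dist0_sq = 4, 1.0, 50.0, 10.0
for alpha in (0.1, 0.01):
    for k in (10, 100, 1000, 10000):
        bound = m * dist0_sq / (2 * alpha * k) + alpha * L**2 * C * m / 2
        print(f"alpha={alpha}, k={k}: bound={bound:.3f}")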

5 Conclusions

In this paper, we presented an analysis of a distributed computation model for optimizing the sum of objective functions of multiple agents, which are convex but not necessarily smooth. In this model, every agent generates and maintains estimates of the optimal solution of the global optimization problem. These estimates are communicated (directly or indirectly) to other agents asynchronously and over a time-varying connectivity structure. Each agent updates his estimates based on local information concerning the estimates received from his immediate neighbors and his own cost function using a subgradient method.

We provide convergence results for this method focusing on the objective function values of the estimates maintained at each agent. To achieve this, we first analyze the convergence behavior of the transition matrices governing the information exchange among the agents. We provide explicit rate results for the convergence of the transition matrices. We use these rate results in the analysis of the subgradient method. For the constant stepsize rule, we provide bounds on the error between the objective function values of the estimates at each agent and the optimal value of the global optimization problem. Our bounds are per-iteration and explicitly characterize the dependence of the error on the algorithm parameters and the underlying connectivity structure.

The results in this paper add to the growing literature on the cooperative control of multi-agent systems. The framework provided in this paper motivates further analysis of a number of interesting questions.

Our model assumes that there are no constraints in the global optimization problem. One interesting area of research is to incorporate constraints into the distributed computation model. The presence of constraints may destroy the linearity in the information evolution and will necessitate a different line of analysis.

We assume that the set of agents in the system is fixed in our model. Studying the effects of dynamics in the set of agents in the system is an interesting research area, which we leave for future work.

The update rule studied in this paper assumes that there is no delay in receiving the estimates of the other agents. This is a restrictive assumption in many applications in view of communication and other types of delays. The convergence and convergence rate analysis of this paper can be extended to this more general case and is the focus of our current research [16].

The update rule assumes that agents can send and process real-valued estimates, thus excluding the possibility of communication bandwidth constraints on the information exchange. This is a question that is attracting much recent attention in the context of consensus algorithms (see Kashyap et al. [9] and Carli et al. [6]). Understanding the implications of communication bandwidth constraints on the performance of asynchronous distributed optimization algorithms, both in terms of convergence rate and error, is an important area for future study.