Asynchronous Gossip Algorithms for Stochastic Optimization


S. Sundhar Ram, ECE Dept., University of Illinois, Urbana, IL 61801 (ssrini@illinois.edu)
A. Nedić, IESE Dept., University of Illinois, Urbana, IL 61801 (angelia@illinois.edu)
V. V. Veeravalli, ECE Dept., University of Illinois, Urbana, IL 61801 (vvv@illinois.edu)

Abstract. We consider a distributed multi-agent network system where the goal is to minimize an objective function that can be written as the sum of component functions, each of which is known partially, with stochastic errors, to a specific network agent. We propose an asynchronous algorithm that is motivated by random gossip schemes in which each agent has a local Poisson clock. At each tick of its local clock, the agent averages its estimate with a randomly chosen neighbor and adjusts the average using the gradient of its local function, which is computed with stochastic errors. We investigate the convergence properties of the algorithm for two different classes of functions. First, we consider differentiable, but not necessarily convex, functions and prove that the gradients converge to zero with probability 1. Then, we consider convex, but not necessarily differentiable, functions and show that the iterates converge to an optimal solution almost surely.

I. INTRODUCTION

The problem of minimizing a sum of functions, when each component function is available only partially and with stochastic errors to a specific network agent, is an important problem in the context of wired and wireless networks [15], [24], [27], [28]. These problems require the design of optimization algorithms that are distributed, i.e., operate without a central coordinator, and local, in the sense that each agent can use only its own objective function and can exchange only limited information with its immediate neighbors. In this paper, we propose an asynchronous distributed algorithm that is inspired by the random gossip averaging scheme of [7]. Each agent has a local Poisson clock and maintains an iterate sequence.
At each tick of its local clock, the agent first randomly selects a neighbor and computes the average of its current iterate and the iterate received from the selected neighbor. Then, the agent adjusts the computed average using the gradient of its local function, which is known only with stochastic errors. We investigate the convergence properties of the algorithm under two different assumptions on the objective functions: (a) differentiable but not necessarily convex, and (b) convex but not necessarily differentiable.

The algorithm in this paper is related to the distributed consensus-based optimization algorithm proposed in [22] and further studied in [14], [16], [18], [21], [25], [27], [28]. In consensus-based algorithms, each agent maintains an iterate sequence and updates it using its local function gradient information. These algorithms are synchronous and require the agents to update simultaneously, in contrast with the asynchronous algorithm proposed in this paper. A different distributed model has been proposed in [2] and also studied in [5], [31], [32]; there, the complete objective function information is available to each agent, and the aim is to distribute the processing by allowing each agent to update only a part of the decision vector. Also related to the algorithm of this paper is the literature on incremental algorithms [4], [12], [14], [15], [17], [19], [20], [24], [26], [27], [30], where the network agents sequentially update a single iterate sequence and only one agent updates at any given time, in a cyclic or a random order. While being local, incremental algorithms differ fundamentally from the algorithm studied in this paper, where all agents maintain and update their own iterate sequences. In addition, the work in this paper is related to the much broader class of gossip algorithms used for averaging [1], [8].

(This research is supported by a Vodafone Graduate Fellowship and by NSF Awards CNS-08670 and CMMI-0748.)
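The pairwise gossip averaging scheme referenced above is easy to simulate. The following is a minimal illustrative sketch (not from the paper; the ring topology, seed, and tick count are assumed for the example): at each tick of a virtual clock, a uniformly random agent averages its value with a uniformly chosen neighbor. Every pairwise exchange preserves the network-wide average, and all values contract toward it.

```python
import random

# Ring network of m agents; neighbors of i are i-1 and i+1 (mod m).
m = 8
values = [float(i) for i in range(m)]          # initial values 0..7
target = sum(values) / m                       # gossip preserves this average

random.seed(0)
for _ in range(2000):                          # ticks of the virtual clock
    i = random.randrange(m)                    # agent whose clock ticked
    j = random.choice([(i - 1) % m, (i + 1) % m])  # uniformly chosen neighbor
    avg = (values[i] + values[j]) / 2.0        # pairwise average
    values[i] = values[j] = avg

# After many ticks, every agent is close to the network-wide average.
spread = max(values) - min(values)
print(target, spread)
```

Note that without the gradient correction of the paper's algorithm, this plain averaging scheme can only reach consensus on the average of the initial values; the optimization step is what steers the consensus value toward a minimizer.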
Since we are interested in the effect of stochastic errors, our work is also related to the stochastic subgradient methods [11], [20], [23]. The novelty of our work lies in several directions. First, our gossip-based asynchronous algorithm allows each agent to use a stepsize based on the number of its own local updates; thus, the stepsize is not coordinated among the agents. Second, we study the convergence of the algorithm when the functions are non-convex, which is unlike the recent trend in distributed network optimization, where typically convex functions are considered (see, e.g., [16], [18], [21], [22], [25], [27], [28]). (There are papers that discuss the convergence of cyclic incremental algorithms when the functions are nonconvex, e.g., [29]; however, cyclic incremental algorithms are not distributed, since the agents have to be organized in a cycle by a central coordinator.) Third, we deal with the general case where the agents compute their subgradients with stochastic errors. Due to the information exchange among agents, the stochastic errors propagate across agents and time, which, together with the stochastic nature of the agent stepsizes, significantly complicates the convergence analysis. Our analysis combines the ideas used to study the basic gossip-averaging algorithm [7] with the tools that are generally used to study the convergence of stochastic gradient schemes.

The rest of the paper is organized in the following manner. In the next section, we describe the problem of our interest and present our algorithm and assumptions.
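As a concrete illustration of the scheme just outlined, the following sketch simulates the asynchronous gossip optimization step on assumed quadratic components f_i(x) = (x - c_i)^2 / 2 over a ring network (the data c_i, noise level, seed, and tick count are hypothetical, not from the paper). At each tick, the two agents involved average their iterates and take a noisy gradient step with the local, uncoordinated stepsize 1/Γ_i, where Γ_i counts that agent's own updates.

```python
import random

# f_i(x) = 0.5 * (x - c_i)^2, so grad f_i(x) = x - c_i and the sum f
# is minimized at the mean of the c_i (assumed data for illustration).
random.seed(1)
m = 6
c = [float(i) for i in range(m)]                   # local minimizers
x = [random.uniform(-5.0, 5.0) for _ in range(m)]  # agents' iterates
gamma_count = [0] * m                              # Gamma_i: local update counts
neighbors = {i: [(i - 1) % m, (i + 1) % m] for i in range(m)}  # ring topology

for _ in range(20000):                             # ticks of the virtual clock
    i = random.randrange(m)                        # agent whose clock ticked
    j = random.choice(neighbors[i])                # uniformly selected neighbor
    xbar = (x[i] + x[j]) / 2.0                     # pairwise average
    for a in (i, j):                               # both agents update
        gamma_count[a] += 1
        step = 1.0 / gamma_count[a]                # local, uncoordinated stepsize
        noise = random.gauss(0.0, 0.1)             # stochastic gradient error
        x[a] = xbar - step * ((xbar - c[a]) + noise)

x_star = sum(c) / m                                # minimizer of the sum
print(x_star, x)
```

Despite the noise and the lack of any global coordination, all iterates settle near the common minimizer of the sum, matching the behavior the paper proves.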

In Section III, among other preliminaries, we investigate the asymptotic properties of the agent disagreements. In Section IV, the convergence properties of the algorithm are studied. We conclude with a discussion in Section V.

II. PROBLEM, ALGORITHM AND ASSUMPTIONS

We consider a network of m agents indexed by 1, ..., m; when convenient, we will use V = {1, ..., m}. The network has a static topology represented by the bidirectional graph (V, E), where E is the set of links in the network. We have {i, j} ∈ E if agents i and j can communicate with each other. We assume that the network [i.e., the graph (V, E)] is connected. The network objective is to solve the following optimization problem:

  minimize f(x) := Σ_{i=1}^{m} f_i(x) subject to x ∈ ℝ,   (1)

where f_i : ℝ → ℝ for all i. (By componentwise application, our results and proofs can be extended to the case when x is a finite-dimensional vector.) The function f_i is known only to agent i, which can compute the gradient ∇f_i(x) with stochastic errors. (See [26] for the motivation for studying stochastic errors.) The goal is to solve problem (1) using an algorithm that is distributed and local.

A. Asynchronous Gossip Optimization Algorithm

Let N(i) be the set of neighbors of agent i, i.e., N(i) = {j ∈ V : {i, j} ∈ E}. Each agent has a local clock that ticks at a Poisson rate of 1. (The model and the analysis can easily be extended to the case when the clocks have different rates.) At each tick of its clock, agent i averages its iterate with that of a randomly selected neighbor j ∈ N(i), where each neighbor has an equal chance of being selected. Agents i and j then adjust their averages along the negative direction of ∇f_i and ∇f_j, respectively, which are computed with stochastic errors. (If the function is not differentiable but convex, then ∇f_i(x) denotes a subgradient; we will discuss this later.)

As in [7], we find it easier to study the gossip algorithm in terms of a single virtual clock that ticks whenever any of the local Poisson clocks ticks. Thus, the virtual clock ticks according to a Poisson process with rate m. Let Z_k denote the k-th tick of the virtual clock, and let I_k denote the index of the agent whose local clock actually ticked at that instant. The fact that the Poisson clocks at the agents are independent implies that I_k is uniformly distributed over the set V. In addition, the memoryless property of the Poisson arrival process ensures that the process {I_k} is i.i.d. Let J_k denote the random index of the agent communicating with agent I_k. Observe that J_k, conditioned on I_k, is uniformly distributed over the set N(I_k). Let x_{i,k} denote agent i's iterate at the time immediately before Z_{k+1}. The iterates evolve according to

  x_{i,k} = x̄_k − (1/Γ_i(k)) (∇f_i(x̄_k) + ε_{i,k}) if i ∈ {I_k, J_k},
  x_{i,k} = x_{i,k−1} otherwise,   (2)

where x_{i,0}, i ∈ V, are the initial iterates of the agents, x̄_k = (x_{I_k,k−1} + x_{J_k,k−1})/2, ∇f_i(x) denotes the gradient of f_i at x, ε_{i,k} is the stochastic error, and Γ_i(k) denotes the total number of updates agent i has made up to time Z_k.

B. Assumptions

We make the following assumption on the functions.

Assumption 1: The gradients are uniformly bounded, i.e., sup_{x∈ℝ} |∇f_i(x)| ≤ C for some C > 0 and for all i ∈ V.

In addition to this, we will use two complementary sets of assumptions on the functions f_i, as discussed later. Let F_k be the σ-algebra generated by the entire history of the algorithm up to time Z_k, i.e., F_k = {I_l, J_l, ε_{I_l,l}, ε_{J_l,l} ; 0 ≤ l ≤ k}. We make the following assumptions on the stochastic errors.

Assumption 2: With probability 1, we have:
(a) E[ε_{i,k}² | F_{k−1}] ≤ ν for all k and i ∈ V, and some ν > 0;
(b) E[ε_{I_k,k} | F_{k−1}, I_k, J_k] = 0 and E[ε_{J_k,k} | F_{k−1}, I_k, J_k] = 0.

The assumption is satisfied, for example, when the errors are zero mean, independent across time, and have bounded second moments.

III. PRELIMINARIES

All vectors are column vectors, x_i denotes the i-th component of a vector x, and ‖x‖ denotes the Euclidean norm of a vector x. We use 1 to denote the vector with all components equal to 1. In our analysis, we frequently invoke the following result due to Robbins and Siegmund (see [23], Chapter 2).

Lemma 1: Let (Ω, F, P) be a probability space and F_0 ⊆ F_1 ⊆ ... a sequence of sub-σ-fields of F. Let {u_k}, {v_k}, {q_k} and {w_k} be F_k-measurable random variables, where {u_k} is uniformly bounded below, and {v_k}, {q_k} and {w_k} are non-negative. Let Σ_{k=0}^∞ w_k < ∞, Σ_{k=0}^∞ q_k < ∞, and

  E[u_{k+1} | F_k] ≤ (1 + q_k) u_k − v_k + w_k

hold with probability 1. Then, with probability 1, the sequence {u_k} converges and Σ_{k=0}^∞ v_k < ∞.

A. Relative Frequency of Agent Updates

We characterize the number Γ_i(k) of times agent i updates its iterate up to time Z_k inclusively [see Eq. (2)]. Define the event E_{i,k} = {I_k = i} ∪ {J_k = i}. This is essentially the event that agent i updates its iterate at time Z_k. It is easy to see that, for each i, the events {E_{i,k}} are independent with the same time-invariant probability. Define γ_i to be the probability of the event E_{i,k}. Since I_k is uniformly distributed over the set V and J_k, conditioned on I_k = j, is uniformly distributed over the set N(j), it follows that

  γ_i = (1/m) (1 + Σ_{j∈N(i)} 1/|N(j)|).
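The formula for γ_i can be checked numerically. Below is a small sketch (the 4-node example graph is assumed for illustration, not taken from the paper) that computes γ_i = (1/m)(1 + Σ_{j∈N(i)} 1/|N(j)|) and compares it with the empirical frequency of the event E_{i,k} over many simulated ticks. Note that Σ_i γ_i = 2, since exactly two agents update at each tick.

```python
import random

# Example graph on m = 4 nodes (assumed for illustration):
# node 0 is connected to 1, 2, 3; nodes 1 and 2 are also connected.
neighbors = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
m = len(neighbors)

# Closed-form probability that agent i updates at a given virtual-clock tick:
# gamma_i = (1/m) * (1 + sum_{j in N(i)} 1/|N(j)|)
gamma = {i: (1.0 + sum(1.0 / len(neighbors[j]) for j in neighbors[i])) / m
         for i in neighbors}

# Monte Carlo check: draw I_k uniformly, then J_k uniformly from N(I_k).
random.seed(2)
ticks = 200000
updates = {i: 0 for i in neighbors}
for _ in range(ticks):
    i = random.randrange(m)
    j = random.choice(neighbors[i])
    updates[i] += 1
    updates[j] += 1

empirical = {i: updates[i] / ticks for i in neighbors}
print(gamma, empirical)
```

For this graph the formula gives γ_0 = 3/4 and γ_3 = 1/3: the hub updates far more often than the leaf, which is exactly the asymmetry the local stepsize 1/Γ_i(k) compensates for.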

Define χ_A to be the indicator function of an event A, and note that Γ_i(k) = Σ_{l=1}^{k} χ_{E_{i,l}}. Since, for each i, the events {E_{i,k}} are i.i.d., from the law of the iterated logarithm [9] we can conclude that for any p, q > 0, with probability 1,

  lim sup_k |Γ_i(k) − k γ_i| / k^{(1/2)+q} ≤ p for all i ∈ V.   (3)

We can therefore conclude that, with probability 1, for all i ∈ V and all sufficiently large k,

  Γ_i(k) ≥ 1 and |1/Γ_i(k) − 1/(k γ_i)| ≤ p / k^{(3/2)−q}.   (4)

B. Alternative Representation of the Algorithm

We next give the algorithm in a form more convenient for our analysis. Let e_i denote the unit vector with only its i-th component non-zero. Define

  W_k = I − (1/2)(e_{I_k} − e_{J_k})(e_{I_k} − e_{J_k})ᵀ.

Since {I_k} and {J_k} are i.i.d. sequences, {W_k} is also an i.i.d. sequence. Define W̄ = E[W_k]. Since each W_k is symmetric and doubly stochastic with probability 1, W̄ is also symmetric and doubly stochastic. Further, the maximum eigenvalue of W̄ is 1, and it is not a repeated eigenvalue when the network is connected. (In this case, W̄ is a stochastic irreducible matrix and λ_1 = 1 is its largest real eigenvalue with a unique right eigenvector; see, e.g., [13].) We also have E[W_kᵀ W_k] = W̄ (see [7]). Let x_k be the vector with components x_{i,k}, i = 1, ..., m. Then, from the definition of the method in (2), we have

  x_k = W_k x_{k−1} − p_k for k ≥ 1,   (5)

where the i-th component of p_k is (1/Γ_i(k)) (∇f_i(x̄_k) + ε_{i,k}) for i ∈ {I_k, J_k} and 0 otherwise, and x̄_k = (x_{I_k,k−1} + x_{J_k,k−1})/2. Define y_k = (1/m) 1ᵀ x_k. We then have y_k = (1/m) 1ᵀ (W_k x_{k−1} − p_k). By the double stochasticity of W_k, with probability 1, it follows that

  y_k = y_{k−1} − (1/m) 1ᵀ p_k.   (6)

C. Agent Consensus

We use ‖x_k − y_k 1‖ to quantify the disagreement between the agents, and we show that the disagreements converge to 0.

Lemma 2: Let Assumptions 1 and 2(a) hold. Then, with probability 1, we have Σ_{k=1}^∞ (1/k) ‖x_k − y_k 1‖ < ∞ and lim_k ‖x_k − y_k 1‖ = 0.

Proof: From (5) and (6) it follows that

  E[‖x_k − y_k 1‖ | F_{k−1}]
   = E[‖W_k x_{k−1} − p_k − y_{k−1} 1 + (1/m)(1ᵀ p_k) 1‖ | F_{k−1}]
   ≤ E[‖W_k (x_{k−1} − y_{k−1} 1)‖ | F_{k−1}] + E[‖p_k‖ | F_{k−1}],   (7)

where the inequality follows from the triangle inequality for norms and the double stochasticity of W_k. The first term can be estimated using the relation E[W_kᵀ W_k] = E[W_k] = W̄, which implies that W̄ is positive semi-definite, as follows:

  E[‖W_k (x_{k−1} − y_{k−1} 1)‖² | F_{k−1}]
   = (x_{k−1} − y_{k−1} 1)ᵀ E[W_kᵀ W_k] (x_{k−1} − y_{k−1} 1)
   = (x_{k−1} − y_{k−1} 1)ᵀ W̄ (x_{k−1} − y_{k−1} 1)
   = Σ_{i=1}^{m} λ_i (v_iᵀ (x_{k−1} − y_{k−1} 1))²,

where λ_i is the i-th largest eigenvalue of W̄ and v_i is the corresponding eigenvector. The last step follows from the eigendecomposition of the symmetric positive semi-definite matrix W̄. Recall that λ_1 = 1 is the largest eigenvalue of W̄ and that the corresponding eigenvector is proportional to 1, which is orthogonal to x_{k−1} − y_{k−1} 1. Hence,

  E[‖W_k (x_{k−1} − y_{k−1} 1)‖² | F_{k−1}] ≤ λ_2 ‖x_{k−1} − y_{k−1} 1‖².   (8)

We next estimate the second term in (7). Using (4) and the boundedness of the gradients (Assumption 1), we can conclude that, for sufficiently large k,

  E[‖p_k‖² | F_{k−1}] = E[ Σ_{i∈{I_k,J_k}} (1/Γ_i(k))² (∇f_i(x̄_k) + ε_{i,k})² | F_{k−1} ] ≤ (4/k²)(C² + ν).   (9)

From (7), (8), (9), and Jensen's inequality, we can see that

  E[‖x_k − y_k 1‖ | F_{k−1}] ≤ √λ_2 ‖x_{k−1} − y_{k−1} 1‖ + (2/k) √(C² + ν),

where λ_2 < 1. Therefore, for sufficiently large k,

  E[‖x_k − y_k 1‖] ≤ √λ_2 E[‖x_{k−1} − y_{k−1} 1‖] + (2/k) √(C² + ν).

Using the deterministic analog of Lemma 1, we see that Σ_k (1/k) E[‖x_k − y_k 1‖] < ∞, which implies Σ_k (1/k) ‖x_k − y_k 1‖ < ∞ with probability 1. We next prove the second part of the statement. As a consequence of the preceding result, it follows that lim inf_k ‖x_k − y_k 1‖ = 0. We only need to prove that ‖x_k − y_k 1‖ converges almost surely to complete the proof. From the definitions of x_k and y_k in (5) and (6), we obtain

  E[‖x_k − y_k 1‖² | F_{k−1}]
   = E[‖W_k (x_{k−1} − y_{k−1} 1) − (p_k − (1/m)(1ᵀ p_k) 1)‖² | F_{k−1}]
   ≤ E[‖W_k (x_{k−1} − y_{k−1} 1)‖² | F_{k−1}]
    + 2 √(E[‖W_k (x_{k−1} − y_{k−1} 1)‖² | F_{k−1}]) √(E[‖p_k − (1/m)(1ᵀ p_k) 1‖² | F_{k−1}])
    + E[‖p_k − (1/m)(1ᵀ p_k) 1‖² | F_{k−1}],   (10)

where in the last step we use the Cauchy-Schwarz inequality. We next estimate the last term in (10):

  E[‖p_k − (1/m)(1ᵀ p_k) 1‖² | F_{k−1}] ≤ 4 E[‖p_k‖² | F_{k−1}],

where we use the fact that only two components of p_k are non-zero. Using this in (10), substituting from (8) and (9), and taking into account λ_2 < 1, we obtain, for sufficiently large k,

  E[‖x_k − y_k 1‖² | F_{k−1}] ≤ ‖x_{k−1} − y_{k−1} 1‖² + (8/k) √(λ_2 (C² + ν)) ‖x_{k−1} − y_{k−1} 1‖ + (16/k²)(C² + ν).

As shown earlier, we have Σ_k (1/k) ‖x_{k−1} − y_{k−1} 1‖ < ∞ with probability 1. We can therefore invoke Lemma 1 to conclude that ‖x_k − y_k 1‖ converges with probability 1.

IV. CONVERGENCE ANALYSIS

We here study the convergence of the algorithm under two different sets of conditions. The first requires the function to be differentiable with Lipschitz continuous gradient, i.e.,

  |∇f(x) − ∇f(y)| ≤ L |x − y|.

A point x ∈ ℝ is a stationary point of f if ∇f(x) = 0. A global minimum of f is also a stationary point of f. Typically, when the objective function is non-convex and iterative methods are employed, the iterates may converge only to a stationary point.

Theorem 1: Let Assumptions 1 and 2 hold, and let the function f be bounded below with Lipschitz continuous gradient. Then, with probability 1, we have lim_k |x_{i,k} − y_k| = 0 for all i ∈ V, {f(x_{i,k})} converges, and lim inf_k |∇f(x_{i,k})| = 0.

Proof: Lemma 2 asserts that lim_k |x_{i,k} − y_k| = 0. Next, from the definition of p_k in (5), we write, for i ∈ {I_k, J_k},

  (1/Γ_i(k)) (∇f_i(x̄_k) + ε_{i,k})
   = (1/(k γ_i)) ∇f_i(y_{k−1}) + (1/Γ_i(k)) ε_{i,k}
    + (1/Γ_i(k)) (∇f_i(x̄_k) − ∇f_i(y_{k−1})) + (1/Γ_i(k) − 1/(k γ_i)) ∇f_i(y_{k−1}).

Taking conditional expectations and using the Lipschitz continuity and the boundedness of the gradients, we obtain

  | E[(1/m) 1ᵀ p_k | F_{k−1}] − (1/(mk)) ∇f(y_{k−1}) |
   ≤ (1/m) Σ_{i=1}^{m} | E[ε_{i,k} χ_{E_{i,k}} / Γ_i(k) | F_{k−1}] |
    + (L/m) Σ_{i=1}^{m} E[χ_{E_{i,k}} |x̄_k − y_{k−1}| / Γ_i(k) | F_{k−1}]
    + (C/m) Σ_{i=1}^{m} E[χ_{E_{i,k}} |1/Γ_i(k) − 1/(k γ_i)| | F_{k−1}].

Since γ_i is the probability that agent i updates at time Z_k, it follows that

  E[ Σ_{i∈{I_k, J_k}} (1/γ_i) ∇f_i(y_{k−1}) | F_{k−1} ] = Σ_{i=1}^{m} ∇f_i(y_{k−1}) = ∇f(y_{k−1}),

so the corresponding term vanishes. Using Assumption 2(b), the error term is also 0. Further, from (4) and Lemma 2, it follows that the remaining two terms are summable in k. Thus, from (6) we obtain

  y_k = y_{k−1} − (1/(mk)) ∇f(y_{k−1}) + a_k,

where Σ_k |E[a_k | F_{k−1}]| < ∞ with probability 1. Additionally, from Assumption 2(a), Lemma 2 (which shows that |x_{i,k} − y_k| → 0), and relation (4), it follows that Σ_k E[a_k² | F_{k−1}] < ∞. The result now follows from classic stochastic optimization theory (see [23] or [6]).

Observe that, in view of the Lipschitz continuity of the gradient, the assumption that the gradients are bounded is equivalent to the following standard assumption.

Assumption 3: The sequences {x_{i,k}}, i ∈ V, are bounded with probability 1.

This assumption is implicit and not very easy to establish. We refer the reader to [6] for some discussion of techniques to verify this assumption.

We will next investigate the convergence when the functions are convex, but not necessarily differentiable. At points

where the gradient does not exist, we use the notion of a subgradient. A vector g'(x) is a subgradient of a function g at a point x ∈ dom g if the following relation holds:

  g'(x) (y − x) ≤ g(y) − g(x) for all y ∈ dom g.   (11)

We next discuss the convergence of the algorithm.

Theorem 2: Let Assumptions 1 and 2 hold. Assume that the set X* = Argmin_{x∈ℝ} f(x) is non-empty, and that f_i is convex for each i ∈ V. Then, with probability 1, the sequences {x_{i,k}}, i ∈ V, converge to a common point in X*.

Proof: Let x* be an arbitrary point in X*. Using (6), we obtain

  (y_k − x*)² = (y_{k−1} − (1/m) 1ᵀ p_k − x*)²
   = (y_{k−1} − x*)² − (2/m)(1ᵀ p_k)(y_{k−1} − x*) + ((1/m) 1ᵀ p_k)².

Expanding the middle term with the definition of p_k in (5), applying the subgradient inequality (11) at the points x̄_k and y_{k−1}, using the subgradient boundedness (Assumption 1) together with the bounds (4) and (9), and taking conditional expectations, we obtain for sufficiently large k, with probability 1,

  E[(y_k − x*)² | F_{k−1}] ≤ (y_{k−1} − x*)²
   − (2/(mk)) (f(y_{k−1}) − f(x*))
   + (2/m) E[ Σ_{i∈{I_k, J_k}} (f_i(y_{k−1}) − f_i(x*)) (1/(k γ_i) − 1/Γ_i(k)) | F_{k−1} ]
   + (c_1/k) Σ_{i=1}^{m} |y_{k−1} − x_{i,k−1}| + c_2/k²,

where c_1, c_2 > 0 are constants depending only on C and ν. Here we used Assumption 2(b) to see that the terms involving the stochastic errors have zero conditional mean, and the fact that γ_i is the probability that agent i updates at time Z_k, so that

  E[ Σ_{i∈{I_k, J_k}} (1/γ_i) (f_i(y_{k−1}) − f_i(x*)) | F_{k−1} ] = f(y_{k−1}) − f(x*).

Using the subgradient inequality, the subgradient boundedness, and the inequality |a| ≤ 1 + a², we can bound the remaining expectation term as follows:

  Σ_{i∈{I_k, J_k}} (f_i(y_{k−1}) − f_i(x*)) (1/(k γ_i) − 1/Γ_i(k))
   ≤ C Σ_{i∈{I_k, J_k}} |1/(k γ_i) − 1/Γ_i(k)| |y_{k−1} − x*|
   ≤ C Σ_{i∈{I_k, J_k}} |1/(k γ_i) − 1/Γ_i(k)| (1 + (y_{k−1} − x*)²).

Combining the two preceding relations, we obtain, for sufficiently large k,

  E[(y_k − x*)² | F_{k−1}] ≤ (1 + q_k)(y_{k−1} − x*)² − (2/(mk)) (f(y_{k−1}) − f(x*)) + w_k,

where q_k = C E[Σ_{i∈{I_k,J_k}} |1/(k γ_i) − 1/Γ_i(k)| | F_{k−1}] and w_k collects the remaining non-negative terms. Using (4), we see that Σ_k q_k < ∞ with probability 1, and the disagreement terms in w_k are summable by Lemma 2, so the conditions of Lemma 1 are satisfied. Therefore, {(y_k − x*)²} converges and Σ_k (1/k)(f(y_{k−1}) − f(x*)) < ∞ with probability 1. Since f(y_{k−1}) − f(x*) ≥ 0 and Σ_k 1/k = ∞, the latter implies lim inf_k f(y_k) = f(x*), which, together with the convergence of {(y_k − x*)²}, implies that {y_k} converges to a point in the set X* with probability 1. This, and the fact that lim_k |x_{i,k} − y_k| = 0 for all i ∈ V with probability 1 (shown in Lemma 2), imply that the sequences {x_{i,k}} converge to the same point in X* with probability 1.

V. DISCUSSION

Using very similar ideas, the algorithm and the proof of convergence can be extended to the case when x is a finite-dimensional vector. When the problem in (1) is a constrained optimization problem, with x restricted to a convex and closed set X, the algorithm in (2) can be extended by projecting onto the set X at each iteration. It is easy to obtain a convergence result similar to Theorem 2 for this case using standard Euclidean-projection inequalities. As part of our future work, we plan to investigate optimization algorithms based on different gossip schemes.

REFERENCES

[1] T. Aysal, M. Yildiz, A. Sarwate, and A. Scaglione, "Broadcast gossip algorithms: Design and analysis for consensus," Proceedings of the 47th IEEE Conference on Decision and Control, 2008.
[2] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, Athena Scientific, 1997.
[3] D. P. Bertsekas and J. N. Tsitsiklis, "Gradient convergence in gradient methods with errors," SIAM Journal on Optimization 10 (2000), no. 3, 627-642.
[4] D. Blatt, A. O. Hero, and H. Gauchman, "A convergent incremental gradient method with a constant step size," SIAM Journal on Optimization 18 (2007), no. 1, 29-51.
[5] V. Borkar, "Asynchronous stochastic approximations," SIAM Journal on Control and Optimization 36 (1998), no. 3, 840-851.
[6] V. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint, Cambridge University Press, 2008.
[7] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, "Randomized gossip algorithms," IEEE Transactions on Information Theory 52 (2006), no. 6, 2508-2530.
[8] A. Dimakis, A. Sarwate, and M. Wainwright, "Geographic gossip: Efficient averaging for sensor networks," IEEE Transactions on Signal Processing 56 (2008), no. 3, 1205-1216.
[9] R. Dudley, Real Analysis and Probability, Cambridge University Press, 2002.
[10] Y. Ermoliev, Stochastic Programming Methods, Nauka, Moscow, 1976.
[11] Y. Ermoliev, "Stochastic quasi-gradient methods and their application to system optimization," Stochastics 9 (1983), 1-36.
[12] A. A. Gaivoronski, "Convergence properties of backpropagation for neural nets via theory of stochastic gradient methods. Part 1," Optimization Methods and Software 4 (1994), no. 1, 117-134.
[13] R. G. Gallager, Discrete Stochastic Processes, Kluwer Academic Publishers, Norwell, Massachusetts, USA, 1996.
[14] B. Johansson, On Distributed Optimization in Networked Systems, Ph.D. thesis, Royal Institute of Technology, Stockholm, Sweden, 2008.
[15] B. Johansson, M. Rabi, and M. Johansson, "A simple peer-to-peer algorithm for distributed optimization in sensor networks," Proceedings of the 46th IEEE Conference on Decision and Control, 2007.
[16] B. Johansson, T. Keviczky, M. Johansson, and K. Johansson, "Subgradient methods and consensus algorithms for solving convex optimization problems," Proceedings of the 47th IEEE Conference on Decision and Control, 2008.
[17] K. C. Kiwiel, "Convergence of approximate and incremental subgradient methods for convex optimization," SIAM Journal on Optimization 14 (2004), no. 3, 807-840.
[18] I. Lobel and A. Ozdaglar, "Distributed subgradient methods over random networks," Lab. for Information and Decision Systems, MIT, Report 2800, 2008.
[19] A. Nedić and D. P. Bertsekas, "Incremental subgradient methods for nondifferentiable optimization," SIAM Journal on Optimization 12 (2001), no. 1, 109-138.
[20] A. Nedić and D. P. Bertsekas, "The effect of deterministic noise in subgradient methods," Technical report, Lab. for Information and Decision Systems, MIT, 2007.
[21] A. Nedić, A. Olshevsky, A. Ozdaglar, and J. N. Tsitsiklis, "Distributed subgradient algorithms and quantization effects," Proceedings of the 47th IEEE Conference on Decision and Control, 2008.
[22] A. Nedić and A. Ozdaglar, "On the rate of convergence of distributed asynchronous subgradient methods for multi-agent optimization," Proceedings of the 46th IEEE Conference on Decision and Control, 2007.
[23] B. T. Polyak, Introduction to Optimization, Optimization Software Inc., 1987.
[24] M. G. Rabbat and R. D. Nowak, "Quantized incremental algorithms for distributed optimization," IEEE Journal on Selected Areas in Communications 23 (2005), no. 4, 798-808.
[25] S. Sundhar Ram, A. Nedić, and V. V. Veeravalli, "Distributed stochastic subgradient algorithm for convex optimization," available at http://arxiv.org/abs/08.9, 2008.
[26] S. Sundhar Ram, A. Nedić, and V. V. Veeravalli, "Incremental stochastic subgradient algorithms for convex optimization," available at http://arxiv.org/abs/0806.09, 2008.
[27] S. Sundhar Ram, V. V. Veeravalli, and A. Nedić, "Distributed and recursive estimation," in Sensor Networks: When Theory Meets Practice, Springer, 2009.
[28] S. Sundhar Ram, V. V. Veeravalli, and A. Nedić, "Distributed non-autonomous power control through distributed convex optimization," IEEE INFOCOM, 2009.
[29] M. V. Solodov, "Incremental gradient algorithms with stepsizes bounded away from zero," Computational Optimization and Applications 11 (1998), no. 1, 23-35.
[30] M. V. Solodov and S. K. Zavriev, "Error stability properties of generalized gradient-type algorithms," Journal of Optimization Theory and Applications 98 (1998), no. 3, 663-680.
[31] J. N. Tsitsiklis, Problems in Decentralized Decision Making and Computation, Ph.D. thesis, Massachusetts Institute of Technology, 1984.
[32] J. N. Tsitsiklis, D. P. Bertsekas, and M. Athans, "Distributed asynchronous deterministic and stochastic gradient optimization algorithms," IEEE Transactions on Automatic Control 31 (1986), no. 9, 803-812.