
Distributed constrained optimization and consensus in uncertain networks via proximal minimization

Kostas Margellos, Alessandro Falsone, Simone Garatti and Maria Prandini

arXiv:1603.02239v3 [math.OC] 23 May 2017

Research was supported by the European Commission, H2020, under the project UnCoVerCPS, grant number 643921. Preliminary results, related to Sections II-IV of the current manuscript, can be found in []. K. Margellos is with the Department of Engineering Science, University of Oxford, Parks Road, OX1 3PJ, Oxford, UK, e-mail: kostas.margellos@eng.ox.ac.uk. A. Falsone, S. Garatti and M. Prandini are with the Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano, Italy, e-mail: {alessandro.falsone, simone.garatti, maria.prandini}@polimi.it.

Abstract—We provide a unifying framework for distributed convex optimization over time-varying networks, in the presence of constraints and uncertainty, features that are typically treated separately in the literature. We adopt a proximal minimization perspective and show that this set-up allows us to bypass the difficulties of existing algorithms while simplifying the underlying mathematical analysis. We develop an iterative algorithm and show convergence of the resulting scheme to some optimizer of the centralized problem. To deal with the case where the agents' constraint sets are affected by a possibly common uncertainty vector, we follow a scenario-based methodology and offer probabilistic guarantees regarding the feasibility properties of the resulting solution. To this end, we provide a distributed implementation of the scenario approach, allowing agents to use a different set of uncertainty scenarios in their local optimization programs. The efficacy of our algorithm is demonstrated by means of a numerical example related to a regression problem subject to regularization.

Index Terms—Distributed optimization, consensus, proximal minimization, uncertain systems, scenario approach.

I. INTRODUCTION

Optimization in multi-agent networks has attracted significant attention in the control and signal processing literature, due to its applicability in different domains like power systems [], [3], wireless networks [4], [5], robotics [6], etc. Typically, agents solve a local decision-making problem, communicate their decisions with other agents, and repeat the process on the basis of the new information received. The main objective of this cooperative set-up is for agents to agree on a common decision that optimizes a certain performance criterion for the overall multi-agent system while satisfying local constraints. This distributed optimization scheme leads to computational and communication savings compared to centralized paradigms, while allowing agents to preserve privacy by exchanging partial information only.

A. Contributions of this work

In this paper we deal with distributed convex optimization problems over time-varying networks, under a possibly different constraint set per agent, and in the presence of uncertainty. Focusing first on the deterministic case, we construct an iterative, proximal-minimization-based algorithm. Proximal minimization, where a penalty term (proxy) is introduced in the objective function of each agent's local decision problem, serves as an alternative to (sub)gradient methods. This is interesting per se, since it constitutes the multi-agent counterpart of connections between proximal algorithms and gradient methods that have been established in the literature for single-agent problems (see [7]). Moreover, as observed in [8] with reference to incremental algorithms, the proximal minimization approach leads to numerically more stable algorithms compared to their gradient-based counterparts. A rigorous and detailed analysis is provided, showing that the proposed iterative scheme converges to an optimizer of the centralized problem counterpart. This is achieved without imposing differentiability assumptions or requiring excessive memory capabilities as other methods in the literature (see Section I-B for a detailed review).

We then move to the case where constraints depend on an uncertain parameter and should be robustly satisfied for all values that this parameter may take. This poses additional challenges when devising a distributed solution methodology. Here, we exploit results on scenario-based optimization [9] [4]. In particular, we assume that each agent is provided with its own, different from the other agents', set of uncertainty realizations (scenarios), and enforces the constraints corresponding to these scenarios only. We then show that our distributed algorithm is applicable and that the converged solution is feasible in a probabilistic sense for the constraints of the centralized problem, i.e., it satisfies with high probability all agents' constraints when an unseen uncertainty instance is realized. To achieve this we rely on the novel contribution of [5], which leads to a sharper result compared to the one that would be obtained by a direct application of the basic scenario theory [0]. Our approach can be thought of as the data-driven counterpart of robust or worst-case optimization paradigms, enabling us to provide a priori guarantees on the probability of constraint satisfaction without imposing any assumptions on the underlying distribution of the uncertainty and its moments, and/or the geometry of the uncertainty sets (e.g., [6, Chapters 6, 7]); the overall feasibility statement, however, holds with a certain confidence. The proposed distributed implementation of the scenario approach, which is instead typically performed in a centralized fashion, allows for a reduction of the communication burden and the satisfaction of privacy requirements regarding the available knowledge on the uncertain parameter.

B. Related work

Most literature builds on the seminal work of [7], [7], [8] (see also [9], [0] and references therein for a more recent problem exposition), where a wide range of decentralized optimization problems is considered, using techniques based on gradient descent, dual decomposition, and the method of multipliers. The recent work of [] deals with similar problems but from a game-theoretic perspective. Distributed optimization problems, in the absence of constraints though, have been considered in [] [3]. In most of

these references the underlying network is allowed to be time-varying. In the presence of constraints, the authors of [0], [3] [34] adopt Newton-based or gradient/subgradient-based approaches and show asymptotic agreement of the agents' solutions to an optimizer of the centralized problem; in [35], [36] a distributed alternating direction method of multipliers approach is adopted and its convergence properties are analyzed, whereas in [37], [38] a constraints consensus approach is adopted. In these contributions, however, the underlying network is time-invariant, while agents are required to have certain memory capabilities. In a time-varying environment, as that considered in the present paper, [39], [40] propose a projected subgradient methodology to solve distributed convex optimization problems in the presence of constraints. In [39], however, the particular case where the agents' constraint sets are all identical is considered. As a result, the computational complexity of each agent's local optimization program is the same as that of the centralized algorithm. Our approach, which allows for different constraint sets per agent, is most closely related to the work of [40], but we adopt a proximal minimization instead of a subgradient-based perspective, thus avoiding the requirement for gradient/subgradient computation.

In most of the aforementioned references a deterministic set-up is considered. Results taking into account both constraints and uncertainty have recently appeared in [4] [43]. In [4] a penalty-based approach is adopted and convergence of the proposed scheme is shown under the assumption that the algorithm is initialized with some feasible solution, which, however, can be difficult to compute. This is not required in the approach proposed in this paper. In [43] an asynchronous algorithm is developed for a quite particular communication protocol that involves gossiping, i.e., pairwise communication, under stronger regularity conditions (strong convexity of the agents' objective functions).

Our set-up is closely related to, albeit different from, the approach of [4], which proposes a projected gradient descent approach where at every iteration a random extraction of each agent's constraints is performed. In [4] almost-sure convergence is proved, but this requires that different scenarios are extracted at every iteration, and these scenarios must be independent from each other, and independent across iterations. This creates difficulties in accounting for temporal correlation of the uncertain parameter, and poses challenges if sampling from the underlying distribution is computationally expensive. On the contrary, in our algorithm each agent is provided with a given number of scenarios (which accounts for data-driven optimization too) and the same uncertainty scenarios are used at every iteration. In this case, convergence in [4] is not guaranteed, whilst our scenario-based approach provides probabilistic feasibility, as opposed to almost-sure feasibility, guarantees. This probabilistic treatment of uncertainty, which is particularly suited to data-based optimization, does not appear, to the best of our knowledge, in any of the aforementioned references. Moreover, differently from [4], our proximal minimization perspective allows us to bypass the requirement for gradient computations, rendering the developed programs amenable to existing numerical solvers, and does not impose differentiability assumptions on the agents' objective functions or Lipschitz continuity of the objective gradients.

Finally, it is perhaps worth mentioning that our approach is fundamentally different from the randomized algorithm of [37], which is based on iteratively exchanging active constraints over a time-invariant network; in our case the network is time-varying and we do not require constraint exchange, thus reducing the communication requirements.

                                         Network
  Agents' decision vectors     time-invariant     time-varying
  unconstrained                -                  [7], [8], [3] [3]
  same constraints             -                  [39]
  different deterministic
  constraints                  [0], [3]           [39], [40]
  uncertain constraints        -                  [4], our work

TABLE I
CLASSIFICATION OF RELATED WORK.

For a quick overview, Table I provides a classification of the literature most closely related to our work in terms of communication requirements (which is related to whether the underlying network is time-varying or not) and their ability to deal with different types of constraints (which is also related to the overall computational effort, as explained before). All the aforementioned references, and our work as well, are concerned with static optimization problems, or problems with discrete-time dynamics. As for distributed optimization for continuous-time systems, the interested reader is referred to [44] [49], and references therein.

C. Structure of the paper

The paper unfolds as follows: In Section II we provide a formal statement of the problem under study, and, focusing on the deterministic case, formulate the proposed distributed algorithm based on proximal minimization; convergence and optimality are also discussed, but to streamline the presentation all proofs, along with some preparatory results and useful relations regarding the agents' local solutions, are deferred to Section V. Section III deals with the stochastic case where constraints are affected by uncertainty, following a scenario-based methodology. To illustrate the efficacy of our algorithm, Section IV provides a distributed implementation of a regression problem subject to L1-regularization. Finally, Section VI concludes the paper and provides some directions for future work.

Notation: R, R_+ denote the real and positive real numbers, and N, N_+ the natural and positive natural numbers, respectively. For any x ∈ R^n, ‖x‖ denotes the Euclidean norm of x, whereas for a scalar a ∈ R, |a| denotes its absolute value. Moreover, x' denotes the transpose of x. For a continuously differentiable function f(·): R^n → R, ∇f(x) is the gradient of f at x. Given a set X, we denote by co(X) its convex hull. We write dist(y, X) to denote the Euclidean distance of a vector y from a set X, i.e., dist(y, X) = inf_{x ∈ X} ‖y − x‖. A vector a ∈ R^m is said to be a stochastic vector if all its components a_j are non-negative and sum up to one, i.e., Σ_{j=1}^m a_j = 1. Consider a square matrix A ∈ R^{m×m} and denote its i-th column by a^i ∈ R^m, i = 1, ..., m. A is said to be doubly stochastic if both its rows and columns are stochastic vectors, i.e., Σ_{j=1}^m a^i_j = 1 for all i = 1, ..., m, and Σ_{i=1}^m a^i_j = 1 for all j = 1, ..., m.
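As an aside, a standard way to generate the doubly stochastic matrices just defined over an undirected communication graph is the Metropolis rule; this construction is not taken from the paper, but the following Python sketch illustrates the definition numerically:

```python
import numpy as np

def metropolis_weights(adj):
    """Metropolis rule: a_ij = 1/(1 + max(d_i, d_j)) on edges, with the
    remaining mass on the self-loop. Symmetric, hence doubly stochastic."""
    m = adj.shape[0]
    deg = adj.sum(axis=1)
    A = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            if i != j and adj[i, j]:
                A[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        A[i, i] = 1.0 - A[i].sum()  # put the slack on the self-weight
    return A

# Path graph on 4 nodes: 0 - 1 - 2 - 3
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
A = metropolis_weights(adj)
# Every row and every column of A sums to one, and all entries are >= 0.
```

Note that the self-weights produced this way are bounded away from zero (here every positive entry is at least 1/3), which is the kind of uniform lower bound required of the weight coefficients later in the paper.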

II. DISTRIBUTED CONSTRAINED CONVEX OPTIMIZATION

A. Problem set-up

We consider a time-varying network of m agents that communicate to cooperatively solve an optimization problem of the form

  P_δ:  min_{x ∈ R^n}  Σ_{i=1}^m f_i(x)
        subject to  x ∈ ∩_{i=1}^m ∩_{δ∈Δ} X_i(δ),    (1)

where x ∈ R^n represents a vector of n decision variables, and δ ∈ Δ. We assume that Δ is endowed with a σ-algebra D and that P is a fixed, but possibly unknown, probability measure defined over D. For each i = 1, ..., m, f_i(·): R^n → R is the objective function of agent i, whereas, for any δ ∈ Δ, X_i(δ) ⊆ R^n is its constraint set. Problem P_δ is a robust program, where any feasible solution x should belong to ∩_{i=1}^m X_i(δ) for all realizations δ ∈ Δ of the uncertainty. Note that the fact that uncertainty appears only in the constraints and not in the objective functions is without loss of generality; in the opposite case, an epigraphic reformulation would recast the problem in the form of P_δ. Due to the presence of uncertainty, problem P_δ may be very difficult to solve, especially when Δ is a continuous set. Hence, a proper way to deal with uncertainty must be introduced. Moreover, our perspective is that f_i(·) and X_i(·) represent private information, available only to agent i, and/or, even if the whole information were available to all agents, imposing all the constraints in one shot would result in a computationally intensive program. This motivates the use of a distributed algorithm.

To ease the exposition of our distributed algorithm, we focus first on the following deterministic variant of P_δ, with constraint sets being independent of δ:

  P:  min_{x ∈ R^n}  Σ_{i=1}^m f_i(x)
      subject to  x ∈ ∩_{i=1}^m X_i.    (2)

P_δ will be revisited in Section III, where we will specify how to deal with the presence of uncertainty. Since most of the subsequent results are based on f_i(·) and X_i being convex, we formalize this in the following assumption.

Assumption 1. [Convexity] For each i = 1, ..., m, the function f_i(·): R^n → R and the set X_i ⊆ R^n are convex.

B. A new proximal minimization-based algorithm

The pseudo-code of the proposed proximal-minimization-based iterative approach is given in Algorithm 1. Initially, each agent i, i = 1, ..., m, starts with some tentative value x_i(0) which belongs to the local constraint set X_i of agent i, but not necessarily to ∩_{i=1}^m X_i. One sensible choice for x_i(0) is to set it such that x_i(0) ∈ argmin_{x_i ∈ X_i} f_i(x_i).

(For any δ ∈ Δ, X_i(δ) is supposed to represent all constraints on the decision vector imposed by agent i, including explicit constraints expressed, e.g., by inequalities like h_i(x, δ) ≤ 0, and restrictions to the domain of the objective function f_i.)

Algorithm 1 Distributed algorithm
1: Initialization
2: Set {a^i_j(k)}_{k≥0}, for all i, j = 1, ..., m.
3: Set {c(k)}_{k≥0}.
4: k = 0.
5: Consider x_i(0) ∈ X_i, for all i = 1, ..., m.
6: For i = 1, ..., m repeat until convergence
7:   z_i(k) = Σ_{j=1}^m a^i_j(k) x_j(k).
8:   x_i(k+1) = argmin_{x_i ∈ X_i} f_i(x_i) + (1/(2c(k))) ‖z_i(k) − x_i‖².
9: k ← k + 1.

At iteration k, each agent i constructs a weighted average z_i(k) of the solutions communicated by the other agents and its local one (step 7, Algorithm 1), where the a^i_j(k) are the weights. Then, each agent solves a local minimization problem, involving its local objective function f_i(x_i) and a quadratic term penalizing the difference from z_i(k) (step 8, Algorithm 1), where the coefficient c(k), which is assumed to be non-increasing with k, regulates the relative importance of the two terms. Note that, unlike P, under Assumption 1 and due to the presence of the quadratic penalty term, the resulting problem is strictly convex with respect to x_i, and hence admits a unique solution.

For each k ≥ 0 the information exchange between the agents can be represented by a directed graph (V, E_k), where the nodes V = {1, ..., m} are the agents, and the set E_k of directed edges (j, i), indicating that at time k agent i receives information from agent j, is given by

  E_k = { (j, i) : a^i_j(k) > 0 }.    (3)

From (3), we set a^i_j(k) = 0 in the absence of communication. If (j, i) ∈ E_k we say that j is a neighboring agent of i at time k.
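To make the iteration concrete, here is a minimal numerical sketch of Algorithm 1 in Python. It is not taken from the paper: the quadratic objectives f_i(x) = ‖x − b_i‖², the box sets X_i, the uniform weights a^i_j(k) = 1/m (a complete graph) and c(k) = 1/(k+1) are all illustrative choices, made so that step 8 has a closed form.

```python
import numpy as np

def prox_step(b, z, c, lo, hi):
    # Step 8 for f_i(x) = ||x - b||^2 over the box X_i = [lo, hi]^n:
    # argmin ||x - b||^2 + ||z - x||^2 / (2c) has the closed form below,
    # and clipping is exact here (separable quadratic, equal curvature).
    return np.clip((2.0 * c * b + z) / (2.0 * c + 1.0), lo, hi)

m, n = 4, 2
rng = np.random.default_rng(0)
b = rng.normal(size=(m, n))            # private data of each agent
lo, hi = -1.0, 1.0
x = np.clip(b, lo, hi)                 # x_i(0) in argmin over X_i of f_i
A = np.full((m, m), 1.0 / m)           # doubly stochastic uniform weights

for k in range(2000):
    c = 1.0 / (k + 1)                  # satisfies Assumption 5 (see below)
    z = A @ x                          # step 7: weighted average
    x = np.stack([prox_step(b[i], z[i], c, lo, hi) for i in range(m)])

# Centralized problem: min sum_i ||x - b_i||^2 over the common box,
# whose minimizer is the clipped average of the b_i.
x_star = np.clip(b.mean(axis=0), lo, hi)
```

After a few thousand iterations the local iterates are nearly identical across agents and close to x_star, illustrating the consensus-plus-optimality behavior claimed for the algorithm.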
Under this set-up, Algorithm 1 provides a fully distributed implementation, where at iteration k each agent i = 1, ..., m receives information only from neighboring agents. Moreover, this information exchange is time-varying and may be occasionally absent. However, the following connectivity and communication assumption is made, where E = { (j, i) : (j, i) ∈ E_k for infinitely many k } denotes the set of edges (j, i) representing agent pairs that communicate directly infinitely often.

Assumption 2. [Connectivity and Communication] The graph (V, E) is strongly connected, i.e., for any two nodes there exists a path of directed edges that connects them. Moreover, there exists T ∈ N_+ such that for every (j, i) ∈ E, agent i receives information from a neighboring agent j at least once every consecutive T iterations.

Assumption 2 guarantees that any pair of agents communicates directly infinitely often, and the intercommunication interval is bounded. For further details on the interpretation of the imposed network structure the reader is referred to [8], [39].

Algorithm 1 terminates if the iterates maintained by all agents converge. From an implementation point of view, agent i, i = 1, ..., m, will terminate its update process if the absolute difference (relative difference can also be used) between two consecutive iterates, ‖x_i(k+1) − x_i(k)‖, keeps below some user-defined tolerance for a number of iterations equal to T (see Assumption 2) times the diameter of the graph (i.e., the greatest distance between any pair of nodes connected via edges in E). This is the worst-case number of iterations

required for an agent to communicate with all others in the network; note that if an agent terminated the process at the first iteration where the desired tolerance is met, then convergence would not be guaranteed, since its solution may still change as an effect of other agents updating their solutions.

The proposed iterative methodology resembles the structure of proximal minimization for constrained convex optimization [7, Chapter 3.4.3]. The difference, however, is that our set-up is distributed, and the quadratic term in step 8 does not penalize the deviation of x_i from the previous iterate x_i(k), but from an appropriately weighted average z_i(k). Note that, in contrast with the inspiring work in [39] [4] addressing P under a similar set-up but following a projected subgradient approach, our proximal-minimization-based approach allows for an intuitive economic interpretation: at every iteration k we penalize a consensus residual proxy by the time-varying coefficient 1/(2c(k)), which progressively increases. This can be thought of as a price-settling mechanism, where the more we delay achieving consensus, the higher the price is.

In the case where a^i_j(k) = 1/m for all i, j = 1, ..., m, for all k ≥ 0, which corresponds to a decentralized control paradigm, the solution of our proximal minimization approach coincides with the one obtained when the alternating direction method of multipliers [7], [9] is applied to this problem (see eq. (4.7)-(4.74), p. 54 in [7]). In the latter, the quadratic penalty term is not added to the local objective function as in step 8 of Algorithm 1, but to the Lagrangian function of an equivalent problem, and the coefficient c(k) is an arbitrary constant independent of k; however, a dual-update step is required. Formal connections between penalty methods and the method of multipliers have been established in [50].

Remark 1 (Application to a specific problem structure). Algorithm 1 can be simplified when the underlying optimization problem exhibits a specific structure, namely, agents need to agree on a common decision vector y ∈ R^n, but each of them decides upon a local decision vector u_i ∈ R^{n_i}, i = 1, ..., m, as well:

  min_{y ∈ R^n, {u_i ∈ R^{n_i}}}  Σ_{i=1}^m f_i(y, u_i)
  subject to  y ∈ ∩_{i=1}^m Y_i,  u_i ∈ U_i,  i = 1, ..., m,    (4)

where Y_i ⊆ R^n and U_i ⊆ R^{n_i}, for all i = 1, ..., m. Provided that Assumptions 1-2 hold for problem (4) with x = (y, u_1, ..., u_m) and X_i = Y_i × R^{n_1} × ··· × U_i × ··· × R^{n_m}, we can rewrite it as

  min_{y ∈ R^n}  Σ_{i=1}^m g_i(y)
  subject to  y ∈ ∩_{i=1}^m Y_i,

where g_i(y) = min_{u_i ∈ U_i} f_i(y, u_i), and simplify Algorithm 1 by replacing steps 7-8 with:

  z_i(k) = Σ_{j=1}^m a^i_j(k) y_j(k),
  (y_i(k+1), u_i(k+1)) = argmin_{y_i ∈ Y_i, u_i ∈ U_i} f_i(y_i, u_i) + (1/(2c(k))) ‖z_i(k) − y_i‖².

This entails that agents only need to communicate their local estimates y_i(k), i = 1, ..., m, of the common decision vector y, while the local solutions related to u_i, i = 1, ..., m, need not be exchanged.

C. Further structural assumptions and communication requirements

We impose some additional assumptions on the structure of problem P in (2) and the communication set-up that is considered in this paper. These assumptions will play a crucial role in the proof of convergence of Section V.

Assumption 3. [Compactness] For each i = 1, ..., m, X_i ⊆ R^n is compact.

Note that due to Assumption 3, co(∪_{i=1}^m X_i) is also compact. Let then D ∈ R_+ be such that ‖x‖ ≤ D for all x ∈ co(∪_{i=1}^m X_i). Moreover, due to Assumptions 1 and 3, f_i(·): R^n → R is Lipschitz continuous on X_i with Lipschitz constant L_i ∈ R_+, i.e., for all i = 1, ..., m,

  |f_i(x) − f_i(y)| ≤ L_i ‖x − y‖, for all x, y ∈ X_i.    (5)

Assumption 4. [Interior point] The feasibility region ∩_{i=1}^m X_i of P has a non-empty interior, i.e., there exist x̄ and ρ ∈ R_+ such that {x ∈ R^n : ‖x − x̄‖ < ρ} ⊆ ∩_{i=1}^m X_i.

Due to Assumption 4, by the Weierstrass theorem (Proposition A.8, p. 65 in [7]), P admits at least one optimal solution. Therefore, if we denote by X* ⊆ ∩_{i=1}^m X_i the set of optimizers of P, then X* is non-empty.
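The elimination of the local variables u_i described in the Remark can be checked numerically. The sketch below uses an illustrative convex objective (not from the paper) and verifies that minimizing the reduced function g_i(y) = min over u_i in U_i of f_i(y, u_i) gives the same optimal value as the joint minimization over (y, u_i):

```python
import numpy as np

b = 0.4  # private datum of one agent (illustrative)

def f(y, u):
    # a convex joint objective f_i(y, u_i); any convex choice works here
    return (y - u) ** 2 + (u - b) ** 2

def g(y):
    # g_i(y) = min over u in U_i = [-1, 1]; 1-D quadratic in u, so the
    # constrained minimizer is the clamped unconstrained one, u* = (y+b)/2
    u = np.clip((y + b) / 2.0, -1.0, 1.0)
    return f(y, u)

ys = np.linspace(-2.0, 2.0, 4001)
us = np.linspace(-1.0, 1.0, 2001)
Y, U = np.meshgrid(ys, us, indexing="ij")
joint = f(Y, U).min()      # min over (y, u) jointly, on a grid
reduced = g(ys).min()      # min over y of the reduced function g
# both values agree (here both are 0, attained at y = u = b)
```

The practical point of the Remark is exactly this: since g_i depends on y alone, only the y_i(k) estimates travel over the network, while each u_i stays local.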
Notice also that f_i(·), i = 1, ..., m, is continuous due to the convexity condition of Assumption 1; the addition of Assumption 3 serves to imply Lipschitz continuity. However, f_i(·), i = 1, ..., m, is not required to be differentiable.

We impose the following assumption on the coefficients {c(k)}_{k≥0} that appear in step 8 of Algorithm 1.

Assumption 5. [Coefficient {c(k)}_{k≥0}] Assume that, for all k ≥ 0, c(k) ∈ R_+ and {c(k)}_{k≥0} is a non-increasing sequence, i.e., c(k) ≤ c(r) for all k ≥ r, with r ≥ 0. Moreover, Σ_{k=0}^∞ c(k) = ∞ and Σ_{k=0}^∞ c(k)² < ∞.

In standard proximal minimization [7] convergence is highly dependent on the appropriate choice of c(k). Assumption 5 is in fact needed to guarantee convergence of Algorithm 1. A direct consequence of the last part of Assumption 5 is that lim_{k→∞} c(k) = 0. One choice for {c(k)}_{k≥0} that satisfies the conditions of Assumption 5 is to select it from the class of generalized harmonic series, e.g., c(k) = α/(k+1) for some α ∈ R_+. Note that Assumption 5 is in a sense analogous to the conditions that the authors of [39], [40] impose on the step-size of their subgradient algorithm. It should also be noted that our set-up is synchronous, using the same c(k) for all agents at every iteration k. Extension to an asynchronous implementation is a topic for future work.

In line with [7], [8], [9], we impose the following assumptions on the information exchange between the agents.

Assumption 6. [Weight coefficients] There exists η ∈ (0, 1) such that for all i, j ∈ {1, ..., m} and all k ≥ 0, a^i_j(k) ∈ R_+ ∪ {0}, a^i_i(k) ≥ η, and a^i_j(k) > 0 implies that a^i_j(k) ≥ η. Moreover, for all k ≥ 0, Σ_{j=1}^m a^i_j(k) = 1 for all i = 1, ..., m, and Σ_{i=1}^m a^i_j(k) = 1 for all j = 1, ..., m.

Assumptions 2 and 6 are identical to the corresponding assumptions in [39] (the same assumptions are also imposed in [40]), but are reported also here to ease the reader and facilitate the exposition of our results. Note that these are rather standard for

5 distributed optiization and consensus probles; for possible relaxations the reader is referred to [9], [5]. The interpretation of having a unifor lower bound η, independent of k, for the coefficients a i j (k in Assuption 6 is that it ensures that each agent is ixing inforation received by other agents at a non-diinishing rate in tie [39]. Moreover, points and in Assuption 6 ensure that this ixing is a convex cobination of the other agent estiates, assigning a non-zero weight to its local one since a i i (k η. Note that satisfying Assuption 6 requires agents to agree on an infinite sequence of doubly stochastic atrices (double stochasticity arises due to conditions and in Assuption 6, where a i j (k would be eleent (i, j of the atrix at iteration k. This agreeent should be perfored prior to the execution of the algorith in a centralized anner, and the resulting atrices have to be counicated to all agents via soe consensus schee; this is standard in distributed optiization algoriths of this type (see also [9], [39], [40]. It would be of interest to construct doubly stochastic atrices in a distributed anner using the achinery of [5]; however, exploiting these results requires further investigation and is outside the scope of the paper. D. Stateent of the ain convergence result Under the structural assuptions and the counication set-up iposed in the previous subsection, Algorith converges and agents reach consensus, in the sense that their local estiates x i (k, i =,...,, converge to soe iniizer of proble P. This is forally stated in the following theore, which constitutes one ain contribution of our paper. Theore. Consider Assuptions -6 and Algorith. We have that, for soe iniizer x X of P, li x i(k x = 0, for all i =,...,. (6 k To strealine the contribution of the paper, the rather technical proof of this stateent is deferred to Section V-B. III. 
DEALING WITH UNCERTAINTY

In this section, we revisit problem P_δ and give a methodology to deal with the presence of uncertainty. Motivated by data-driven considerations, we assume that each agent i, i = 1, ..., m, is provided with a fixed number of realizations of δ, referred to as scenarios, extracted according to the underlying probability measure P with which δ takes values in Δ. According to the information about the scenarios that the agents possess, two cases are distinguished in the sequel (scenarios as a common resource vs. scenarios as a private resource), and the properties of the corresponding scenario programs are analyzed. Throughout, the following modifications to Assumptions 2-4 are imposed:

1) For each i = 1, ..., m, X_i(δ) is a convex set for any δ ∈ Δ.
2) For each i = 1, ..., m, and for any finite set S of values for δ, ∩_{δ∈S} X_i(δ) is compact.
3) For any finite set S of values for δ, ∩_{i=1}^m ∩_{δ∈S} X_i(δ) has a non-empty interior.

For the subsequent analysis, note that, for any N ∈ N, P^N denotes the corresponding product measure. We assume measurability of all involved functions and sets.

A. Probabilistic feasibility - Scenarios as a common resource

We first consider the case where all agents are provided with the same scenarios of δ, i.e., the scenarios can be thought of as a common resource for the agents. This is the case if all agents have access to the same set of historical data for δ, or if the agents communicate the scenarios to each other. The latter case, however, increases the communication requirements. Let N ∈ N denote the number of scenarios, and let S = {δ^(1), ..., δ^(N)} be the set of scenarios available to all agents. The scenarios are independently and identically distributed (i.i.d.) according to P. Consider then the following optimization program P_N, where the subscript N is introduced to emphasize the dependency on the uncertainty scenarios:

P_N : min_{x ∈ R^n} ∑_{i=1}^m f_i(x)
     subject to x ∈ ∩_{i=1}^m ∩_{δ∈S} X_i(δ).
(7)

Clearly, x ∈ ∩_{i=1}^m ∩_{δ∈S} X_i(δ) is equivalent to x ∈ ∩_{δ∈S} ∩_{i=1}^m X_i(δ), and P_N is amenable to being solved via the distributed algorithm of Section II-A. In fact, one can apply Algorithm 1 with ∩_{δ∈S} X_i(δ) in place of X_i, for all i = 1, ..., m. Let X*_N ⊆ ∩_{i=1}^m ∩_{δ∈S} X_i(δ) be the set of minimizers of P_N. We then have the following corollary of Theorem 1.

Corollary 1. Consider Assumptions 1-6 with the modifications stated in Section III, and Algorithm 1. We have that, for some x*_N ∈ X*_N,

lim_{k→∞} ‖x_{i,N}(k) − x*_N‖ = 0, for all i = 1, ..., m, (8)

where x_{i,N}(k) denotes the solution generated at iteration k, step 8 of Algorithm 1, when X_i is replaced by ∩_{δ∈S} X_i(δ).

We now address the problem of quantifying the robustness of the minimizer x*_N of P_N to which our iterative scheme converges according to Corollary 1. In the current set-up a complete answer is given by the scenario approach theory [9], [10], which shows that x*_N is feasible for P_δ up to a quantifiable level ε. This result is based on the notion of support constraints (see also Definition 4 in [9]), and in particular on the notion of support set [5] (also referred to as a compression scheme in [4]). Given an optimization program, we say that a subset of the constraints constitutes a support set if it is a minimal-cardinality subset of the constraints such that, by solving the optimization problem considering only this subset of constraints, we obtain the same solution as for the original problem where all the constraints are enforced. As a consequence, all constraints that do not belong to the support set are in a sense redundant, since their removal leaves the optimal solution unaffected. By Theorem 3 of [9], for any convex optimization program the cardinality of the support set is at most equal to the number of decision variables n, whereas in [53] a refined bound is provided. The subsequent result is valid for any given bound on the cardinality of the support set. Therefore, and since P_N is convex, let d ∈ N be a known upper bound for the cardinality of its support set.
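The support-set notion can be made concrete on a toy scenario program (hypothetical data, one decision variable; not an example from the paper): minimizing x subject to x ≥ δ^(j) over sampled δ^(j), the single constraint attaining max_j δ^(j) is the support set, and removing any other constraint leaves the minimizer unchanged.

```python
# Toy illustration of a support set: min x subject to x >= delta_j.
# The minimizer is max_j delta_j, so the constraint attaining the
# maximum forms the support set; all other constraints are redundant.
import random

random.seed(0)
deltas = [random.uniform(-1.0, 1.0) for _ in range(20)]  # hypothetical scenarios

def solve(active):
    # Minimizer of x over {x >= delta_j, j in active}.
    return max(deltas[j] for j in active)

x_star = solve(range(len(deltas)))
support = [j for j in range(len(deltas))
           if solve(set(range(len(deltas))) - {j}) != x_star]

assert len(support) == 1             # cardinality <= n = 1 decision variable
assert deltas[support[0]] == x_star  # the support constraint attains the max
# Removing any non-support constraint leaves the solution unchanged:
for j in range(len(deltas)):
    if j not in support:
        assert solve(set(range(len(deltas))) - {j}) == x_star
print(x_star, support)
```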
A direct application of the scenario approach theory in [9] then leads to the following result.

Theorem 2. Fix β ∈ (0, 1) and let

ε = 1 − ( β / C(N, d) )^{1/(N−d)}, (9)

where C(N, d) denotes the binomial coefficient "N choose d".

We then have that

P^N { S ∈ Δ^N : P { δ ∈ Δ : x*_N ∉ ∩_{i=1}^m X_i(δ) } ≤ ε } ≥ 1 − β. (10)

In words, Theorem 2 implies that, with confidence at least 1 − β, x*_N is feasible for P_δ apart from a set of uncertainty instances with measure at most ε. Notice that ε is in fact a function of N, β and d; we suppress this dependency, though, to simplify notation. Note that, even though P_N does not necessarily have a unique solution, Theorem 2 still holds for the solution returned by Algorithm 1 (assuming convergence), since it is a deterministic algorithm and hence serves as a tie-break rule to select among the possibly multiple minimizers. Following [10], (9) could be replaced with an improved ε, obtained as the solution of ∑_{k=0}^{d−1} C(N, k) ε^k (1 − ε)^{N−k} = β. However, we use (9) since it gives an explicit expression for ε, and also renders (10) directly comparable with the results provided in the next subsection. In case ε exceeds one, the result becomes trivial. However, note that Theorem 2 can also be reversed (as in experiment design) to compute the number N of scenarios that is required for (10) to hold for given ε, β ∈ (0, 1). This can be determined by solving (9) with respect to N with the chosen ε fixed (e.g., using numerical inversion). The reader is referred to Theorem 1 of [9] for an explicit expression of N.

B. Probabilistic feasibility - Scenarios as a private resource

We now consider the case where the information carried by the scenarios is distributed; that is, each agent has its own set of scenarios, which constitutes the agent's private information. Specifically, assume that each agent i, i = 1, ..., m, is provided with a set S_i = {δ_i^(1), ..., δ_i^(N_i)} of N_i ∈ N i.i.d. scenarios of δ, extracted according to the underlying probability measure P. Here, δ_i^(j) denotes scenario j of agent i, j = 1, ..., N_i, i = 1, ..., m. The scenarios across the different sets S_i, i = 1, ..., m, are independent from each other. The total number of scenarios is N = ∑_{i=1}^m N_i. Consider then the following optimization program P̄_N, where each agent has its own scenario set.
P̄_N : min_{x ∈ R^n} ∑_{i=1}^m f_i(x)
      subject to x ∈ ∩_{i=1}^m ∩_{δ∈S_i} X_i(δ). (11)

Program P̄_N can be solved via the distributed algorithm of Section II-A, so that a solution is obtained without exchanging any private information regarding the scenarios. In fact, one can apply Algorithm 1 with ∩_{δ∈S_i} X_i(δ) in place of X_i, for all i = 1, ..., m. Similarly to Corollary 1, letting X̄*_N ⊆ ∩_{i=1}^m ∩_{δ∈S_i} X_i(δ) be the set of minimizers of P̄_N, we have the following corollary of Theorem 1.

Corollary 2. Consider Assumptions 1-6 with the modifications stated in Section III, and Algorithm 1. We have that, for some x̄*_N ∈ X̄*_N,

lim_{k→∞} ‖x_{i,N}(k) − x̄*_N‖ = 0, for all i = 1, ..., m, (12)

where x_{i,N}(k) denotes the solution generated at iteration k, step 8 of Algorithm 1, when X_i is replaced by ∩_{δ∈S_i} X_i(δ).

As in Section III-A, we show that the minimizer x̄*_N of P̄_N to which our iterative scheme converges according to Corollary 2 is feasible in a probabilistic sense for P_δ. Here a difficulty arises, since we seek to quantify the probability that x̄*_N satisfies the global constraint ∩_{i=1}^m X_i(δ), where δ is a parameter common to all X_i(δ), i = 1, ..., m, while x̄*_N has been computed considering X_i(δ) for uncertainty scenarios that are independent from those of X_j(δ), j ≠ i. Let S = {S_i}_{i=1}^m be the collection of the scenarios of all agents. Similarly to the previous case, we denote by d ∈ N a known upper bound for the cardinality of the support set of P̄_N. However, the way the constraints of the support set are split among the agents depends on the specific S employed. Therefore, for each collection of scenarios S and for i = 1, ..., m, denote by d_{i,N}(S) (possibly equal to zero) the number of constraints that belong both to the support set of P̄_N and to S_i, i.e., the constraints of agent i. We then have that ∑_{i=1}^m d_{i,N}(S) ≤ d, for any S ∈ Δ^N. For short, we will write d_{i,N} instead of d_{i,N}(S), and make the dependency on S explicit only when necessary.

1) A naive result: For any collection of the agents' scenarios, it clearly holds that d_{i,N} ≤ d for all i = 1, ..., m.
Thus, for each i = 1, ..., m, Theorem 2 can be applied conditionally to the scenarios of all other agents to obtain a local (in the sense that it holds only for the constraints of agent i) feasibility characterization. Fix β_i ∈ (0, 1) and let

ε_i = 1 − ( β_i / C(N_i, d) )^{1/(N_i − d)}. (13)

We then have that

P^N { S ∈ Δ^N : P { δ ∈ Δ : x̄*_N ∉ X_i(δ) } ≤ ε_i } ≥ 1 − β_i. (14)

By the subadditivity of P^N and P, (14) can be used to quantify the probabilistic feasibility of x̄*_N with respect to the global constraint ∩_{i=1}^m X_i(δ). Following the proof of Corollary 1 in [54], where a similar argument is provided, we have that

P^N { S ∈ Δ^N : P { δ ∈ Δ : x̄*_N ∉ ∩_{i=1}^m X_i(δ) } > ∑_{i=1}^m ε_i }
  = P^N { S ∈ Δ^N : P { δ ∈ Δ : ∃ i ∈ {1, ..., m} such that x̄*_N ∉ X_i(δ) } > ∑_{i=1}^m ε_i }
  ≤ P^N { S ∈ Δ^N : ∑_{i=1}^m P { δ ∈ Δ : x̄*_N ∉ X_i(δ) } > ∑_{i=1}^m ε_i }
  ≤ P^N { ∪_{i=1}^m { S ∈ Δ^N : P { δ ∈ Δ : x̄*_N ∉ X_i(δ) } > ε_i } }
  ≤ ∑_{i=1}^m P^N { S ∈ Δ^N : P { δ ∈ Δ : x̄*_N ∉ X_i(δ) } > ε_i } ≤ ∑_{i=1}^m β_i, (15)

which leads to the following proposition.

Proposition 1. Fix β ∈ (0, 1) and choose β_i, i = 1, ..., m, such that ∑_{i=1}^m β_i = β. For each i = 1, ..., m, let ε_i be as in (13), and set ε̄ = ∑_{i=1}^m ε_i. We then have that

P^N { S ∈ Δ^N : P { δ ∈ Δ : x̄*_N ∉ ∩_{i=1}^m X_i(δ) } ≤ ε̄ } ≥ 1 − β. (16)

Proposition 1 implies that, with confidence at least 1 − β, x̄*_N is feasible for P_δ apart from a set with measure at most ε̄. This result, however, tends to be very conservative, thus prohibiting its applicability to problems with a high number of agents. This can be seen by comparing ε̄ with ε, where the latter corresponds to the case where scenarios are treated as a common resource. To this end, consider the particular set-up where N_i = N̄ and β_i = β/m, for all i = 1, ..., m. By (9) and (13), it follows that ε̄ = ∑_{i=1}^m ε_i ≈ mε, thus growing approximately (we do not have exact equality since β_i = β/m) linearly with the number of agents. This can also be observed in the numerical comparison of Section III-B2 (see Fig. 1). The issue with Proposition 1 is that it accounts for a worst-case setting where d_{i,N} = d for all i = 1, ..., m; however, this cannot occur, since ∑_{i=1}^m d_{i,N} ≤ d implies that, if d_{i,N} = d for some i, then d_{j,N} = 0 for all j ≠ i.

2) A tighter result: To alleviate the conservatism of Proposition 1, and exploit the fact that ∑_{i=1}^m d_{i,N} ≤ d, we use the recent results of [5]. For each i = 1, ..., m, fix β_i ∈ (0, 1) and consider a function ε_i(·) defined as follows:

ε_i(k) = 1 − ( β_i / ( (d+1) C(N_i, k) ) )^{1/(N_i − k)}, for all k = 0, 1, ..., d. (17)

Notice that ε_i(·) is also a function of N_i, β_i and d, but this dependency is suppressed to simplify notation. For each i = 1, ..., m, working conditionally with respect to the scenarios S \ S_i of all other agents, Theorem 1 of [5] entails that

P^N { S ∈ Δ^N : P { δ ∈ Δ : x̄*_N ∉ X_i(δ) } ≤ ε_i(d_{i,N}) | S \ S_i ∈ Δ^{N−N_i} } ≥ 1 − β_i. (18)

Integrating (18) with respect to the probability of realizing the scenarios S \ S_i, we have that

P^N { S ∈ Δ^N : P { δ ∈ Δ : x̄*_N ∉ X_i(δ) } ≤ ε_i(d_{i,N}) } ≥ 1 − β_i.
(19)

The statement in (19) implies that, for each agent i = 1, ..., m, with confidence at least 1 − β_i, the probability that x̄*_N does not belong to the constraint set X_i(δ) of agent i is at most equal to ε_i(d_{i,N}). Note, however, that (19) is very different from (14), which is obtained by means of the basic scenario approach theory, since d_{i,N} is not known a priori but depends on the extracted scenarios. Using (19) in place of (14) in the derivations of (15), by the subadditivity of P^N and P, we have that

P^N { S ∈ Δ^N : P { δ ∈ Δ : x̄*_N ∉ ∩_{i=1}^m X_i(δ) } ≤ ∑_{i=1}^m ε_i(d_{i,N}) } ≥ 1 − ∑_{i=1}^m β_i. (20)

Fig. 1. Probability of constraint violation as a function of the number of agents m, for the case where d = 50, β = 10^−5, N_i = N̄ = 4500 and β_i = β/m, for all i = 1, ..., m. The probability of violation ε (green dashed line) for the case of Section III-A is independent of m, so it remains constant as the number of agents increases. For the case of Section III-B1, ε̄ ≈ mε (red dotted-dashed line) for the considered set-up, so it grows approximately linearly with m. For the case of Section III-B2, ε̂ (blue solid line) is moderately increasing with m, thus offering a less conservative result compared to the approach of Section III-B1, while, in contrast to the approach of Section III-A, it allows for distributed information about the scenarios.

Unlike (10) and (16), (20) is an a posteriori statement, due to the dependency of ε_i(d_{i,N}) on the extracted scenarios. However, the sought a priori result can be obtained by considering the worst-case value of ∑_{i=1}^m ε_i(d_{i,N}) with respect to the different combinations of d_{i,N}, i = 1, ..., m, satisfying ∑_{i=1}^m d_{i,N} ≤ d. This can be achieved by means of the following maximization problem:

ε̂ = max_{ {d_i}_{i=1}^m } ∑_{i=1}^m ε_i(d_i)
    subject to ∑_{i=1}^m d_i ≤ d. (21)

Problem (21) is an integer optimization program. It can be solved numerically to obtain ε̂. The optimal value ε̂ of the problem above depends on {N_i, β_i}_{i=1}^m and d, but this dependency is suppressed to simplify notation.
Notice the slight abuse of notation, since {d_i}_{i=1}^m in (21) are integer decision variables and should not be related to {d_{i,N}}_{i=1}^m. We have the following theorem, which is the main achievement of this section.

Theorem 3. Fix β ∈ (0, 1) and choose β_i, i = 1, ..., m, such that ∑_{i=1}^m β_i = β. Set ε̂ according to (21). We then have that

P^N { S ∈ Δ^N : P { δ ∈ Δ : x̄*_N ∉ ∩_{i=1}^m X_i(δ) } ≤ ε̂ } ≥ 1 − β. (22)

Proof. Fix β ∈ (0, 1) and choose β_i, i = 1, ..., m, such that ∑_{i=1}^m β_i = β. Consider any collection S of scenarios and notice that ∑_{i=1}^m d_{i,N}(S) ≤ d. This implies that {d_{i,N}(S)}_{i=1}^m constitutes a feasible solution of (21). Due to the fact that ε̂ is the optimal value of (21), ∑_{i=1}^m ε_i(d_{i,N}(S)) ≤ ε̂ for any S, which, together with (20), leads to (22) and hence concludes the proof.
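The three quantities compared in Fig. 1 can be sketched numerically, assuming the explicit binomial-root form of the bounds, ε = 1 − (β/C(N,d))^{1/(N−d)}, and its per-agent variants (parameters as in Fig. 1). The tighter bound calls for solving the integer program over the split {d_i}; since each ε_i(·) is increasing in its argument, a small dynamic program over the budget d suffices:

```python
# Compare the common-resource bound eps, the naive private bound
# eps_bar = sum over agents, and the tighter eps_hat obtained by
# maximizing sum_i eps_i(d_i) subject to sum_i d_i <= d via dynamic
# programming. Formulas are the assumed binomial-root expressions;
# parameters mirror Fig. 1 (d = 50, beta = 1e-5, N_bar = 4500).
import math

def log_binom(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def one_minus_root(log_beta, log_denom, exponent):
    return 1.0 - math.exp((log_beta - log_denom) / exponent)

d, beta, N_bar = 50, 1e-5, 4500
for m in (2, 6, 15):
    beta_i = beta / m
    # Common resource: all agents share N_bar scenarios.
    eps = one_minus_root(math.log(beta), log_binom(N_bar, d), N_bar - d)
    # Naive private bound: each agent gets the full budget d.
    eps_bar = m * one_minus_root(math.log(beta_i), log_binom(N_bar, d), N_bar - d)
    # Tighter bound: eps_i(k) with the (d+1) factor, maximized over splits.
    eps_k = [one_minus_root(math.log(beta_i),
                            math.log(d + 1) + log_binom(N_bar, k),
                            N_bar - k) for k in range(d + 1)]
    dp = [0.0] * (d + 1)  # dp[b]: best sum with remaining budget b
    for _ in range(m):    # agents are identical in this set-up
        dp = [max(dp[b - k] + eps_k[k] for k in range(b + 1))
              for b in range(d + 1)]
    eps_hat = dp[d]
    assert eps <= eps_hat <= eps_bar  # ordering observed in Fig. 1
    print(m, eps, eps_hat, eps_bar)
```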

The result of Theorem 3 can be significantly less conservative compared to that of Proposition 1, since we explicitly account for the fact that ∑_{i=1}^m d_{i,N} ≤ d in the maximization problem in (21). This can also be observed by means of the numerical example of Fig. 1, where we investigate how ε, ε̄ and ε̂ change as a function of the number of agents m. We consider a particular case where d = 50, β = 10^−5, N_i = N̄ = 4500 and β_i = β/m, for all i = 1, ..., m. For this set-up, where β is split evenly among the agents and all agents have the same number of scenarios, it turned out that the maximum value ε̂ in (21) is achieved for d_i = d/m, i = 1, ..., m. As can be seen from Fig. 1, ε (green dashed line) for the case of Section III-A is independent of m, so it remains constant as the number of agents increases. For the case of Section III-B1, ε̄ (red dotted-dashed line) grows approximately linearly with m (see also the discussion at the end of Section III-B1). For the case of Section III-B2, ε̂ (blue solid line) is moderately increasing with m, thus offering a less conservative result compared to the approach of Section III-B1, while, in contrast to the approach of Section III-A, it allows for distributed information about the uncertainty scenarios. In certain cases (e.g., when the number of agents is high), ε̂ may still exceed one, and hence the result of Theorem 3 becomes trivial (the same holds for Proposition 1 in such cases). Similarly to the discussion at the end of Section III-A, Theorem 3 can be reversed to compute the number of scenarios N_i that need to be extracted by agent i, i = 1, ..., m, for given values of ε̂, β ∈ (0, 1). This can be achieved by numerically seeking values of N_i, i = 1, ..., m, that lead to a solution of (21) that attains the desired ε̂.

IV. NUMERICAL EXAMPLE

We address a multi-agent regression problem subject to L1-regularization, inspired by an example in [55]. Specifically, we consider functions s_i(δ), i = 1, ..., m, which can, for instance, represent the effect of the same phenomenon at different locations of the agents.
The functions are unknown, and each agent i has access to a (private) data set {(δ_i^(j), s_i(δ_i^(j))), j = 1, ..., N_i} of measurements of function s_i(δ) only. The agents seek to determine the magnitudes of d cosinusoids at given frequencies, so that their superposition provides a central approximation of all the s_i(·), i = 1, ..., m. To this end, letting x = [x^[1], ..., x^[d], x^[d+1]] ∈ R^{d+1}, the following program is considered:

min_{x ∈ X ⊆ R^{d+1}} x^[d+1] + λ‖x‖_1
subject to | ∑_{l=1}^d x^[l] cos(l δ_i^(j)) − s_i(δ_i^(j)) | ≤ x^[d+1],
           for all j = 1, ..., N_i, for all i = 1, ..., m. (23)

In (23), one minimizes x^[d+1], which is the worst-case approximation error over the data points of all agents, plus a regularization term λ‖x‖_1, which induces sparsity in the solution. The set X is a hyper-rectangle with large enough edge length so that the solution remains the same as in the unconstrained case; it is introduced to ensure compactness so that Algorithm 1 can be applied (in fact, this set could be different per agent, and does not need to be agreed upon upfront). By setting f_i(x) = (1/m)(x^[d+1] + λ‖x‖_1), X_i(δ) = {x ∈ X : | ∑_{l=1}^d x^[l] cos(lδ) − s_i(δ) | ≤ x^[d+1]}, and S_i = {δ_i^(1), ..., δ_i^(N_i)}, i = 1, ..., m, it is seen that problem (23) is in the form of P̄_N and, moreover, satisfies the assumptions of Corollary 2. Hence, the distributed Algorithm 1 can be employed to compute the optimal solution of (23). Notice that x_i in Algorithm 1 corresponds to a copy of x maintained by agent i, and should not be confused with x^[l], which is the l-th component of x. Each objective function f_i(x) is non-differentiable.

Fig. 2. Data points (grey crosses) and the functions (solid lines) corresponding to the local solutions returned by Algorithm 1 (a) at the initialization and (b) after 50 iterations.
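A minimal sketch of the epigraph structure of the regression program above (synthetic signal, small d instead of the paper's set-up, and a membership test rather than a full solver; all names and numbers are illustrative):

```python
# Epigraph form of the regression program: membership test for the
# scenario constraint sets X_i(delta) and the per-agent cost f_i.
# Synthetic data; a membership/cost sketch, not a full solver.
import math
import random

random.seed(1)
m, d, lam = 6, 4, 0.001  # small d for illustration only
s = lambda delta: math.cos(delta) + 0.5 * math.cos(2 * delta)  # hypothetical signal

def in_X_i(x, delta):
    # x = [x[0..d-1], epigraph variable x[d]]; constraint of one agent.
    fit = sum(x[l] * math.cos((l + 1) * delta) for l in range(d))
    return abs(fit - s(delta)) <= x[d]

def f_i(x):
    # Per-agent share of the objective: (1/m) (epigraph + L1 penalty).
    return (1.0 / m) * (x[d] + lam * sum(abs(v) for v in x))

scenarios = [random.uniform(-math.pi, math.pi) for _ in range(100)]
x = [1.0, 0.5, 0.0, 0.0, 1e-9]     # exact coefficients, tiny epigraph level
assert all(in_X_i(x, delta) for delta in scenarios)  # zero-error fit is feasible
x_bad = [1.0, 0.5, 0.0, 0.0, -0.1]                   # negative level: infeasible
assert not all(in_X_i(x_bad, delta) for delta in scenarios)
print(f_i(x))
```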
In our simulation, we considered m = 6 agents on a ring of alternating communicating pairs (time-varying communication graph), and assigned at each step the same weight to both the local solution and that transmitted by the active neighbor. Moreover, we set n = d + 1 = 51, λ = 0.001, and N_i = N̄ = 4500 for all i = 1, ..., m. All samples δ_i^(j), i = 1, ..., m, j = 1, ..., N̄, were independently drawn from a uniform distribution with support [−π, π], while, mimicking [55], each s_i(δ_i^(j)) was obtained by evaluating the sum of a certain number of randomly shifted cosinusoids. Finally, Algorithm 1 was initialized with solutions satisfying the local constraints only, and c(k) = 0.05/(k+1). Figure 2 shows the data points of each agent (grey crosses) and the functions ∑_{l=1}^d x_i^[l] cos(lδ) corresponding to the agents' solutions returned by Algorithm 1 (a) at the initialization and (b) after 50 iterations. As it appears, in conformity with Corollary 2, all local solutions converge to a unique solution. The fact that this solution is also optimal can be experimentally inspected from Figure 3, where the objective values corresponding to the agents' local solutions as iterations progress are displayed against the optimal objective value of problem (23), computed via a centralized algorithm for comparison purposes. The value to which x_i^[d+1] converged was 0.88. In our simulations, scenarios were treated as private resources, as each agent's scenarios are independent of the scenarios of the other agents. Nonetheless, for a newly seen observation δ, one may be interested in assessing the joint-constraint violation probability P{δ ∈ Δ : x̄*_N ∉ ∩_{i=1}^m X_i(δ)}, which in the present example corresponds to the probability of being apart from the obtained central function ∑_{l=1}^d x_i^[l] cos(lδ) by more than 0.88 for at least one of the functions s_i(δ), i = 1, ..., m. Using 80000 new scenarios (different from those used in the optimization process), this probability was empirically estimated to be of the order of 0.01.
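The empirical violation estimate can be sketched as follows; the per-agent error model below is a hypothetical stand-in for the fitted cosine expansion, and only the mechanics (fresh scenarios, union over agents) mirror the text:

```python
# Monte Carlo estimate of the joint-constraint violation probability
# P{delta : x violates X_i(delta) for at least one agent i}, using
# fresh samples only (never the scenarios used for optimization).
# Toy stand-in error model; 80000 fresh scenarios as in the example.
import random

random.seed(2)
m = 6
level = 0.88  # converged epigraph value reported in the example

def violated(x_level, delta):
    # Hypothetical per-agent approximation errors; joint violation means
    # at least one agent's error exceeds the epigraph level.
    errors = [abs(delta) * (0.5 + 0.1 * i) for i in range(m)]
    return any(e > x_level for e in errors)

fresh = [random.uniform(-1.0, 1.0) for _ in range(80000)]
viol = sum(violated(level, delta) for delta in fresh) / len(fresh)
assert 0.0 <= viol <= 1.0
print(viol)
```

With this toy model, violation occurs exactly when |δ| > 0.88, so the estimate concentrates around 0.12; the point is the estimator, not the number.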
Using β = 10^−5 and d = 50 (the bound on the dimension of the support set is d = 50 and not d + 1 = 51, since we do not need to account for the epigraphic variable x^[d+1]; see [53]), Proposition 1 and Theorem 3 give ε̄ = 0.37 and ε̂ = 0.097, respectively. As can be seen, the novel bound of Theorem 3 provides a much tighter guaranteed upper bound

Fig. 3. Objective values corresponding to the agents' local solutions as iterations progress (solid lines) vs. the optimal value of problem (23) computed via a centralized algorithm (dashed line).

for the probability of joint-constraint violation compared to ε̄, while not requiring the agents to have access to the same set of scenarios. Other runs of the example, with new observations extracted, always gave an estimate of the joint-constraint violation probability smaller than 0.09, as was expected given the high confidence 1 − 10^−5 with which the bound is guaranteed.

V. CONVERGENCE ANALYSIS AND PROOF OF THEOREM 1

A. Preparatory results

We establish several relations between the differences of the agents' estimates and certain average quantities. At the end of this subsection we provide a summability result that is fundamental for the proof of Theorem 1 in subsection V-B. Let

v(k) = (1/m) ∑_{i=1}^m x_i(k), for all k ≥ 0. (24)

By using Assumption 2, the fact that the sets X_i, i = 1, ..., m, are closed thanks to Assumption 3, and Assumption 4, it is shown in [39] that

v̄(k) = ( ɛ(k)/(ɛ(k) + ρ) ) x̄ + ( ρ/(ɛ(k) + ρ) ) v(k) ∈ ∩_{i=1}^m X_i, for all k ≥ 0, (25)

where ɛ(k) = dist(v(k), ∩_{i=1}^m X_i), and x̄ ∈ R^n, ρ ∈ R_+ are as in Assumption 4. Note that, unlike x_i(k) and v(k), which do not necessarily belong to ∩_{i=1}^m X_i, for v̄(k) this is always the case, thus providing a feasible solution of P. For each i = 1, ..., m, denote by

e_i(k) = x_i(k+1) − z_i(k), for all k ≥ 0, (26)

the error between the values computed at steps 7 and 8 of Algorithm 1, i.e., the difference between the weighted average z_i(k) computed by agent i at time k and its local update x_i(k+1).

1) Error relations: We provide some intermediate results that form the basis of the subsequent summability result.

Lemma 1. Consider Assumptions 2, 3 and 4. For all k ≥ 0,

∑_{i=1}^m ‖x_i(k) − v̄(k)‖ ≤ μ ∑_{i=1}^m ‖x_i(k) − v(k)‖, (27)

where μ is a positive constant determined by ρ and the constant D given below Assumption 3.

From step 7 of Algorithm 1, we have that, for all k ≥ 0 and for all i = 1, ..., m,

x_i(k+1) = ∑_{j=1}^m a^i_j(k) x_j(k) + x_i(k+1) − z_i(k) = ∑_{j=1}^m a^i_j(k) x_j(k) + e_i(k), (28)

where the last equality follows from (26).
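The mechanics behind the average v(k) can be checked quickly (a sketch with an illustrative ring of agents, echoing the simulation set-up of Section IV): mixing with a doubly stochastic matrix leaves the network average unchanged, so only the proximal errors e_i(k) can move it.

```python
# Doubly stochastic mixing preserves the network average: if
# z_i = sum_j a^i_j x_j with A doubly stochastic, then mean(z) == mean(x).
import random

random.seed(3)
m, n = 6, 3
# Symmetric doubly stochastic matrix on a ring: each agent keeps weight
# 0.5 and splits 0.25/0.25 with its two ring neighbors (illustrative).
A = [[0.0] * m for _ in range(m)]
for i in range(m):
    j = (i + 1) % m
    A[i][i] += 0.5
    A[i][j] += 0.25
    A[j][i] += 0.25

x = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
z = [[sum(A[i][j] * x[j][c] for j in range(m)) for c in range(n)]
     for i in range(m)]

avg = lambda vecs: [sum(v[c] for v in vecs) / m for c in range(n)]
vx, vz = avg(x), avg(z)
assert all(abs(a - b) < 1e-12 for a, b in zip(vx, vz))
print(vx)
```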
Following [8], for each k ≥ 0 consider a matrix A(k) ∈ R^{m×m} such that a^i_j(k) is the j-th element of its i-th column. For all k, s with k ≥ s, let Φ(k, s) = A(s) A(s+1) ⋯ A(k−1) A(k), with Φ(k, k) = A(k) for all k ≥ 0. Denote by [Φ(k, s)]^i_j element j of column i of Φ(k, s). It is then shown in [8] that, under Assumption 6, Φ(k, s) is doubly stochastic. Similarly to [8], by propagating (28) in time, it can be shown that, for all k > s (the inequality is strict for convenience of the subsequent derivations) and for all i = 1, ..., m,

x_i(k) = ∑_{j=1}^m [Φ(k−1, s)]^i_j x_j(s) + ∑_{r=s}^{k−2} ∑_{j=1}^m [Φ(k−1, r+1)]^i_j e_j(r) + e_i(k−1). (29)

For all k > s, the last statement, together with (24) and the fact that Φ(k, s) is a doubly stochastic matrix, leads to

v(k) = (1/m) ∑_{j=1}^m x_j(s) + (1/m) ∑_{r=s}^{k−2} ∑_{j=1}^m e_j(r) + (1/m) ∑_{i=1}^m e_i(k−1). (30)

We then have the following lemma, which relates ‖x_i(k) − v(k)‖ to the errors ‖e_i(k)‖, i = 1, ..., m. Its proof follows from Lemma 8 in [39].

Lemma 2. Consider Assumptions 1 and 6. For all k, s with s ≥ 0, k > s, and for all i = 1, ..., m,

‖x_i(k) − v(k)‖ ≤ λ q^{k−s−1} ∑_{j=1}^m ‖x_j(s)‖ + λ ∑_{r=s}^{k−2} q^{k−r−1} ∑_{j=1}^m ‖e_j(r)‖ + ‖e_i(k−1)‖ + (1/m) ∑_{j=1}^m ‖e_j(k−1)‖, (31)

where λ = 2 (1 + η^{−T}) / (1 − η^{T}) ∈ R_+ and q = (1 − η^{T})^{1/T} ∈ (0, 1), with T denoting the uniform bound on the intercommunication intervals imposed by Assumption 1.

2) A summability relation: Let N ∈ N and consider the term

∑_{k=1}^N L̄ c(k) ∑_{i=1}^m ‖x_i(k) − v̄(k)‖, (32)

where L̄ = max_{i=1,...,m} L_i, with L_i defined according to (5). We will show that (32) has an interesting relation with

∑_{k=0}^{N} ∑_{i=1}^m ‖e_i(k)‖², and we will come back to it often in the next section to establish certain summability results. Consider Lemma 1 with k+1 in place of k, together with Lemma 2; summing both sides of (31) with respect to i = 1, ..., m and setting s = 0, after some algebraic manipulations and index changes, we have that

∑_{k=1}^N L̄ c(k) ∑_{i=1}^m ‖x_i(k) − v̄(k)‖ ≤ μλ L̄ ∑_{j=1}^m ‖x_j(0)‖ ∑_{k=1}^N c(k) q^{k−1} + μλ L̄ ∑_{k=1}^N c(k) ∑_{r=0}^{k−2} q^{k−r−1} ∑_{i=1}^m ‖e_i(r)‖ + 4μ L̄ ∑_{k=1}^N c(k) ∑_{i=1}^m ‖e_i(k−1)‖. (33)

We then have the following lemma.

Lemma 3. Consider Assumptions 1-6. Fix any α ∈ (0, 1), and consider (24)-(26). We then have that, for any N ∈ N,

∑_{k=1}^N L̄ c(k) ∑_{i=1}^m ‖x_i(k) − v̄(k)‖ < α ∑_{k=0}^{N−1} ∑_{i=1}^m ‖e_i(k)‖² + α₂ ∑_{k=1}^N c(k)² + α₃, (34)

where α₂ and α₃ are positive constants, independent of N; α₂ depends on α, μ, λ, q and L̄, while α₃ additionally depends on c(0), D and the initial conditions x_i(0). Their explicit expressions follow from (33) by standard completion-of-squares bounds. (35)

B. Algorithm analysis

In this subsection we deal with the convergence properties of Algorithm 1, and provide the proof of Theorem 1.

1) Error convergence: We first prove convergence properties for the error in (26), which are instrumental to the proof of Theorem 1. We use the following result, which is proven in Lemma 4.1 of [7] (p. 57) for the case where the constraint sets are polyhedral. As mentioned on p. 66 of the same reference, the assertion of the lemma remains valid also in the case of general convex constraint sets. For the latter, we refer the reader to [56], and to [57] (Lemma 9) for a recent use of the lemma in the case of convex constraint sets.

Lemma 4 (Lemma 4.1 in [7], p. 57). If y* = arg min_{y ∈ Y} J₁(y) + J₂(y) (assuming uniqueness of the minimizer), where Y ⊆ R^n is a closed, convex set, J₁(·), J₂(·) : R^n → R are convex functions, and J₂(·) is continuously differentiable, then y* = arg min_{y ∈ Y} J₁(y) + ∇J₂(y*)ᵀ y, where ∇J₂(y*) is the gradient of J₂(y) with respect to y, evaluated at y*.

Consider step 8 of Algorithm 1. Thanks to Assumption 2, the fact that the sets X_i, i = 1, ..., m, are closed by Assumption 3, Assumption 4, and the fact that (1/(2c(k))) ‖z_i(k) − x_i‖² is continuously differentiable with respect to x_i, Lemma 4 can be applied to this problem with x_i, X_i in place of y, Y, respectively, f_i(x_i) in place of J₁(y), and (1/(2c(k))) ‖z_i(k) − x_i‖² in place of J₂(y).
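Lemma 4 can be checked numerically on a one-dimensional instance (a grid-search sketch; Y = [0, 2], J1(y) = |y − 1.5| and J2(y) = (y − 0.5)² are arbitrary illustrative choices): the minimizer of J1 + J2 over Y also attains the minimum of the linearized objective J1(y) + J2'(y*) y.

```python
# Numerical check of Lemma 4 in one dimension over Y = [0, 2]:
# y* = argmin J1 + J2 also minimizes J1(y) + J2'(y*) * y.
# J1 convex non-smooth, J2 convex smooth (illustrative choices).
J1 = lambda y: abs(y - 1.5)
J2 = lambda y: (y - 0.5) ** 2
dJ2 = lambda y: 2 * (y - 0.5)

Y = [k / 10000 * 2 for k in range(10001)]       # fine grid on [0, 2]
y_star = min(Y, key=lambda y: J1(y) + J2(y))    # minimizer of the sum

lin = lambda y: J1(y) + dJ2(y_star) * y         # linearized objective
lin_min = min(lin(y) for y in Y)
# y* attains the minimum of the linearized problem (up to grid tolerance).
assert lin(y_star) <= lin_min + 1e-6
print(y_star, lin(y_star), lin_min)
```

Here the minimizer of the sum is y* = 1, while the linearized objective is flat on [0, 1.5]; Lemma 4 only guarantees that y* is among its minimizers, which is exactly what the assertion checks.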
We have that

x_i(k+1) = arg min_{x_i ∈ X_i} f_i(x_i) − (1/c(k)) (z_i(k) − x_i(k+1))ᵀ x_i, (36)

where, in the second term of (36), −(1/c(k)) (z_i(k) − x_i(k+1)) is the gradient of (1/(2c(k))) ‖z_i(k) − x_i‖² with respect to x_i, evaluated at x_i(k+1). We then have the following lemma, which provides a useful relation between the consecutive algorithm iterates x_i(k) and x_i(k+1); we will be using it extensively in the subsequent results. Its proof strongly depends on the use of Lemma 4, and deviates from the proofs of the basic iterate relations in [39] (Lemma 6) and [4] (Lemma 5); it is motivated by the proof of the alternating direction method of multipliers (Proposition 4.2 in [7], Appendix A of [9]), and relies on our proximal minimization perspective.

Lemma 5. Consider Assumptions 2, 3, 4 and 6. We then have that, for any k ∈ N and any x ∈ X*,

2c(k) ∑_{i=1}^m f_i(v̄(k)) + ∑_{i=1}^m ‖e_i(k)‖² + ∑_{i=1}^m ‖x_i(k+1) − x‖² ≤ 2c(k) ∑_{i=1}^m f_i(x) + ∑_{i=1}^m ‖x_i(k) − x‖² + 2 L̄ c(k) ∑_{i=1}^m ‖x_i(k) − v̄(k)‖, (37)

where e_i(k) is given as in (26).

Proof. Thanks to Lemma 4, (36) holds true. Since x_i(k+1) ∈ X_i is the minimizer of the optimization problem that appears on the right-hand side of (36), we have that

f_i(x_i(k+1)) − (1/c(k)) (z_i(k) − x_i(k+1))ᵀ x_i(k+1) ≤ f_i(x) − (1/c(k)) (z_i(k) − x_i(k+1))ᵀ x, for all x ∈ X_i. (38)

Since the last statement holds for any x ∈ X_i, it will also hold for any minimizer x ∈ X* ⊆ ∩_{i=1}^m X_i of problem P. We have that, for any x ∈ X*,

(z_i(k) − x_i(k+1))ᵀ (x_i(k+1) − x) = −(1/2) ‖x_i(k+1) − z_i(k)‖² − (1/2) ‖x_i(k+1) − x‖² + (1/2) ‖z_i(k) − x‖². (39)

By (38) and (39), we have that, for any x ∈ X*,

f_i(x_i(k+1)) + (1/(2c(k))) ‖x_i(k+1) − z_i(k)‖² + (1/(2c(k))) ‖x_i(k+1) − x‖² ≤ f_i(x) + (1/(2c(k))) ‖z_i(k) − x‖²