Distributed Optimization over Random Networks


Distributed Optimization over Random Networks
Ilan Lobel and Asu Ozdaglar
Allerton Conference, September 2008
Operations Research Center and Electrical Engineering & Computer Science
Massachusetts Institute of Technology, USA
MIT Laboratory for Information and Decision Systems

Motivation

There is increasing interest in distributed control and coordination of networks consisting of multiple autonomous agents, motivated by many emerging networking applications, such as ad hoc wireless communication networks and sensor networks, which are characterized by:
- Lack of centralized control and access to information
- Randomly varying connectivity

Control algorithms for such networks should be:
- Completely distributed, relying only on local information
- Robust against changes in the network topology

Multi-Agent Optimization Problem

Goal: develop a general computational model for cooperatively optimizing a global system objective through local interactions and computations in a multi-agent system with randomly varying connectivity. The global objective is a combination of individual agent performance measures.

Examples:
- Consensus problems: alignment of estimates maintained by different agents, e.g., control of moving vehicles (UAVs) and computing averages of initial values
- Parameter estimation in distributed sensor networks: regression-based estimates using local sensor measurements
- Congestion control in data networks with heterogeneous users

Related Literature

- Parallel and distributed algorithms: general computational model for distributed asynchronous optimization [Tsitsiklis 84; Bertsekas and Tsitsiklis 95]
- Consensus and cooperative control:
  - Analysis of group behavior (flocking) in dynamical-biological systems [Vicsek 95; Reynolds 87; Toner and Tu 98]
  - Mathematical models of consensus and averaging [Jadbabaie et al. 03; Olfati-Saber and Murray 04; Olshevsky and Tsitsiklis 07]
- Distributed multi-agent optimization: distributed subgradient methods with local information and network effects [Nedić and Ozdaglar 07; Nedić, Olshevsky, Ozdaglar, and Tsitsiklis 07; Nedić, Ozdaglar, and Parrilo 08]

Existing work focuses on deterministic models of network connectivity, with worst-case assumptions about the connectivity of agents (e.g., bounded communication intervals between nodes).

Random Network Models and Our Contribution

There is recent interest in random algorithms:
- Randomized gossip algorithms for averaging or computation of separable functions [Boyd et al. 05; Mosk-Aoyama and Shah 07]
- Consensus/agreement over random networks, where the availability of links between agents is modeled probabilistically [Hatano and Mesbahi 05; Wu 06; Tahbaz-Salehi and Jadbabaie 08]

Our contributions:
- A general random network model
- Development of a distributed subgradient method for multi-agent optimization over a random network
- Convergence analysis and performance bounds

Model

We consider a network of n agents with node set N = {1, ..., n}. The agents want to cooperatively solve

    min_{x ∈ R^m}  Σ_{i=1}^n f_i(x),

where f_i : R^m → R is a convex objective function known only by agent i.

Agents update and send their information at discrete times t_0, t_1, t_2, .... We use x_i(k) ∈ R^m to denote the estimate of agent i at time t_k.

Agent update rule: agent i updates his estimate according to

    x_i(k+1) = Σ_{j=1}^n a_ij(k) x_j(k) − α(k) d_i(k),

where a_ij(k) are the weights, α(k) is the stepsize, and d_i(k) is a subgradient of f_i at x = x_i(k).

The model includes consensus as a special case (f_i(x) = 0 for all i).
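The agent update rule above can be simulated directly. The sketch below assumes scalar estimates (m = 1), quadratic local objectives f_i(x) = (x − c_i)², and a fixed uniform weight matrix; all of these specific choices are illustrative assumptions, not part of the talk's model.

```python
import numpy as np

# Sketch of the agent update rule from the slide:
#   x_i(k+1) = sum_j a_ij(k) x_j(k) - alpha(k) d_i(k)
# The quadratic objectives and uniform weights are illustrative assumptions.

def subgradient_step(x, A, grads, alpha):
    """One synchronous update for all agents.

    x:     (n,) current scalar estimates (m = 1 for simplicity)
    A:     (n, n) row-stochastic weight matrix a_ij(k)
    grads: grads[i] maps x_i to a subgradient of f_i at x_i
    alpha: stepsize
    """
    d = np.array([g(xi) for g, xi in zip(grads, x)])
    return A @ x - alpha * d

n = 3
c = np.array([0.0, 1.0, 2.0])                        # f_i(x) = (x - c_i)^2;
grads = [lambda x, ci=ci: 2 * (x - ci) for ci in c]  # sum f_i is minimized at mean(c) = 1
A = np.full((n, n), 1.0 / n)                         # uniform (doubly stochastic) weights

x = np.array([5.0, -3.0, 0.5])                       # arbitrary initial estimates
for k in range(2000):
    x = subgradient_step(x, A, grads, alpha=0.01)

print(x)  # all three estimates end up close to the global minimizer 1
```

With a constant stepsize the estimates settle in a small neighborhood of the minimizer, consistent with the constant-stepsize error bound later in the talk; a diminishing stepsize removes the residual α-dependent error.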

Random Network Model

Let x^l(k) denote the vector of the l-th components of all agent estimates at time k, i.e., x^l(k) = (x^l_1(k), ..., x^l_n(k)) for all l = 1, ..., m. The agent update rule implies that the component vectors of the agent estimates evolve according to

    x^l(k+1) = A(k) x^l(k) − α(k) d^l(k),

where d^l(k) = (d^l_1(k), ..., d^l_n(k)) is the vector of l-th components of the agents' subgradient vectors, and A(k) = [a_ij(k)]_{i,j ∈ N} is the weight matrix.

We assume that A(k) is a random matrix that describes the time-varying connectivity of the network.

Assumptions

Assumption (Weights): Let F = (Ω, B, μ) be a probability space such that Ω is the set of all n × n stochastic matrices, B is the Borel σ-algebra on Ω, and μ is a probability measure on B.
(a) There exists a scalar γ with 0 < γ < 1 such that A_ii ≥ γ for all i and all A ∈ Ω.
(b) For all k ≥ 0, the matrix A(k) is drawn independently from the probability space F.

Implications:
- Each agent takes a convex combination of the information from his neighbors.
- The induced graph (N, E⁺(k)), with E⁺(k) = {(j, i) | a_ij(k) > 0}, is a random graph that is independent and identically distributed over time k. This allows the edges at any given time k to be correlated.

Formally, we define a product probability space (Ω^∞, B^∞, μ^∞) = Π_{k=1}^∞ (Ω, B, μ). The sequence A = {A(k)} is drawn from this product probability space.
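One simple way to generate i.i.d. draws satisfying part (a) of the Weights Assumption is to mix a random row-stochastic matrix with the identity. The mixture construction below is an illustrative choice, not the talk's general model.

```python
import numpy as np

def sample_weight_matrix(n, gamma, rng):
    """Draw a random row-stochastic matrix A with A_ii >= gamma,
    matching part (a) of the Weights Assumption. Mixing gamma*I with a
    random stochastic matrix is one illustrative way to satisfy it."""
    W = rng.random((n, n))
    W /= W.sum(axis=1, keepdims=True)          # random row-stochastic part
    return gamma * np.eye(n) + (1 - gamma) * W  # rows still sum to 1, diagonal >= gamma

rng = np.random.default_rng(0)
A = sample_weight_matrix(4, 0.2, rng)
print(A.sum(axis=1))  # each row sums to 1
print(np.diag(A))     # every diagonal entry is at least gamma = 0.2
```

Drawing a fresh A(k) from this sampler at every step k gives the i.i.d. sequence required by part (b).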

Connectivity

Consider the expected value of the random matrices A(k),

    Ã = E[A(k)]   for all k ≥ 0,

and define the edge set induced by the positive elements of the matrix Ã,

    Ẽ = {(j, i) | Ã_ij > 0}.

Assumption (Connectivity): The mean connectivity graph (N, Ẽ) is strongly connected.

This assumption ensures that, in expectation, the information of any agent i reaches every other agent j through a directed path.

We assume without loss of generality that the scalar γ > 0 of part (a) of the Weights Assumption satisfies min_{(j,i) ∈ Ẽ} Ã_ij ≥ 2γ.

Linear Dynamics and Transition Matrices

We introduce the transition matrices

    Φ(k, s) = A(s) A(s+1) ··· A(k−1) A(k)   for all k ≥ s.

We use these matrices to relate x_i(k+1) to the estimates x_j(s) at a time s ≤ k:

    x_i(k+1) = Σ_{j=1}^n [Φ(k, s)]_ij x_j(s) − Σ_{r=s+1}^k ( Σ_{j=1}^n [Φ(k, r)]_ij α(r−1) d_j(r−1) ) − α(k) d_i(k).

We analyze the convergence properties of the distributed method by establishing:
- Convergence of the (random) transition matrices Φ(k, s) (the consensus part)
- Convergence of an approximate subgradient method (the effect of optimization)
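The transition matrices are plain matrix products and are easy to form numerically. A minimal sketch, using an illustrative random-stochastic-matrix sampler for the i.i.d. draws:

```python
import numpy as np

def transition_matrix(A_seq, s, k):
    """Phi(k, s) = A(s) A(s+1) ... A(k), multiplied left to right as on
    the slide. A_seq is a list of n x n stochastic matrices."""
    Phi = np.eye(A_seq[s].shape[0])
    for t in range(s, k + 1):
        Phi = Phi @ A_seq[t]
    return Phi

rng = np.random.default_rng(1)
n = 4
A_seq = []
for _ in range(10):                  # illustrative i.i.d. stochastic draws
    W = rng.random((n, n))
    A_seq.append(W / W.sum(axis=1, keepdims=True))

Phi = transition_matrix(A_seq, 2, 8)
print(Phi.sum(axis=1))  # a product of row-stochastic matrices is row-stochastic
```

The row sums confirm a basic fact the analysis relies on: products of stochastic matrices are stochastic, so each x_i(k+1) is a convex combination of the x_j(s) plus the accumulated subgradient terms.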

Properties of Transition Matrices

We consider the edge set E(k) = {(j, i) | [A(k)]_ij ≥ γ}.

We construct a probabilistic event in which the edges of the graphs E(k) are activated over time k in such a way that information propagates from every agent to every other agent in the network.

We fix a node w ∈ N and consider two directed spanning trees in the mean connectivity graph (N, Ẽ): an in-tree rooted at w, T_in,w, and an out-tree rooted at w, T_out,w. We consider a specific ordering of the edges of these spanning trees:

    T_in,w = {e_1, e_2, ..., e_{n−1}},   T_out,w = {f_1, f_2, ..., f_{n−1}}.

Properties of Transition Matrices (Continued)

Given any time s ≥ 0, we define the events

    C_l(s) = {A ∈ Ω^∞ | A_{e_l}(s + l) ≥ γ}            for all l = 1, ..., n−1,
    D_l(s) = {A ∈ Ω^∞ | A_{f_l}(s + (n−1) + l) ≥ γ}    for all l = 1, ..., n−1,
    G(s) = ∩_{l=1,...,n−1} ( C_l(s) ∩ D_l(s) ).

G(s) is the event in which each edge of the spanning trees T_in,w and T_out,w is activated sequentially, in the given order, following time s.

Lemma: Let the Weights and Connectivity Assumptions hold. For any s ≥ 0, let A ∈ G(s). Then

    [Φ(k, s)]_ij ≥ γ^{k−s+1}   for all i, j, and all k ≥ s + 2(n−1) − 1.

Lemma: For any s ≥ 0, the following hold:
(a) The events C_l(s) and D_l(s), l = 1, ..., n−1, are mutually independent, with P(C_l(s)) ≥ γ and P(D_l(s)) ≥ γ for all l = 1, ..., n−1.
(b) P(G(s)) ≥ γ^{2(n−1)}.

Geometric Decay

Assumption (Doubly Stochastic Weights): The matrices A(k) are doubly stochastic with probability 1.

We introduce the metric

    b(k, s) = max_{(i,j) ∈ {1,...,n}²} | [Φ(k, s)]_ij − 1/n |   for all k ≥ s ≥ 0.

Lemma: Let the Connectivity and Doubly Stochastic Weights Assumptions hold. Then

    E[b(k, s)] ≤ C β^{k−s}   for all k ≥ s,

where the constants C and β are given by

    C = 3 + (2 / γ^{2(n−1)}) exp{ γ^{4(n−1)} / 2 },   β = exp{ −γ^{4(n−1)} / (4(n−1)) }.
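The geometric decay of b(k, s) is easy to observe empirically. The sketch below generates doubly stochastic draws as "lazy" random permutations, an illustrative construction satisfying the assumptions with γ = 1/2 (not the talk's general model), and compares b over a short and a long product.

```python
import numpy as np

def b_metric(Phi):
    """b(k, s) = max_{i,j} |[Phi(k,s)]_ij - 1/n| from the slide."""
    n = Phi.shape[0]
    return np.max(np.abs(Phi - 1.0 / n))

def lazy_permutation(n, rng):
    # 0.5*I + 0.5*P is doubly stochastic with diagonal >= 1/2; this is an
    # illustrative random model, not the talk's general one.
    P = np.eye(n)[rng.permutation(n)]
    return 0.5 * np.eye(n) + 0.5 * P

rng = np.random.default_rng(2)
n = 4
draws = [lazy_permutation(n, rng) for _ in range(200)]

Phi_short = np.linalg.multi_dot(draws[:5])   # product of the first 5 draws
Phi_long = np.linalg.multi_dot(draws)        # product of all 200 draws
print(b_metric(Phi_short), b_metric(Phi_long))
```

The long product is far closer to the averaging matrix (1/n)11^T than the short one; right-multiplying by a doubly stochastic matrix can never increase b, and under the connectivity assumption it shrinks it geometrically in expectation, which is what the lemma quantifies.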

Analysis of the Subgradient Method

Recall the evolution of the estimates:

    x_i(k+1) = Σ_{j=1}^n [Φ(k, s)]_ij x_j(s) − Σ_{r=s+1}^k ( Σ_{j=1}^n [Φ(k, r)]_ij α(r−1) d_j(r−1) ) − α(k) d_i(k).

We introduce a related sequence: let y(0) = (1/n) Σ_{j=1}^n x_j(0) and

    y(k+1) = y(k) − (α(k)/n) Σ_{j=1}^n d_j(k),

or, equivalently,

    y(k) = (1/n) Σ_{j=1}^n x_j(0) − (1/n) Σ_{s=1}^k α(s) Σ_{j=1}^n d_j(s−1).

This iteration can be viewed as an approximate subgradient method for minimizing f(x) = Σ_j f_j(x), in which a subgradient at x = x_j(k) is used instead of a subgradient at x = y(k).
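When the weights are doubly stochastic, y(k) is exactly the network-wide average of the agents' estimates, which is what makes it a useful comparison sequence. A small numerical check of this identity, with illustrative quadratic objectives and lazy-permutation weight draws (both assumptions for the demo, not the talk's model):

```python
import numpy as np

# Check that y(k+1) = y(k) - (alpha(k)/n) * sum_j d_j(k) tracks the exact
# average (1/n) sum_j x_j(k) when every A(k) is doubly stochastic.
# Objectives f_j(x) = (x - c_j)^2 are an illustrative assumption.

rng = np.random.default_rng(3)
n = 4
c = rng.normal(size=n)
grad = lambda x: 2 * (x - c)           # stacked subgradients d_j(k)

x = rng.normal(size=n)                 # agents' scalar estimates x_j(0)
y = x.mean()                           # y(0) = average of initial estimates

for k in range(50):
    P = np.eye(n)[rng.permutation(n)]
    A = 0.5 * np.eye(n) + 0.5 * P      # doubly stochastic draw
    alpha = 1.0 / (k + 1)
    d = grad(x)                        # evaluated at x_j(k), as on the slide
    x = A @ x - alpha * d              # agents' update
    y = y - (alpha / n) * d.sum()      # comparison sequence

print(abs(y - x.mean()))  # ~0: y(k) equals the average of the x_j(k)
```

The identity holds because 1^T A(k) = 1^T for doubly stochastic A(k), so the mixing step leaves the average untouched and only the subgradient terms move it; up to floating-point rounding, y and the running average coincide.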

Analysis of the Subgradient Method (Continued)

We assume that the subgradients of each f_i are uniformly bounded by a constant L, and that max_{1 ≤ j ≤ n} ||x_j(0)|| ≤ L.

We consider the (weighted) averaged vectors x̄_i(k), defined for all k ≥ 0 as

    x̄_i(k) = (1 / Σ_{s=1}^k α(s)) Σ_{t=1}^k α(t) x_i(t).

Proposition: An upper bound on the objective value f(x̄_i(k)) for each i is given by

    f(x̄_i(k)) ≤ f* + (n / (2 Σ_{r=1}^k α(r))) dist²(y(0), X*)
                   + (nL² / (2 Σ_{r=1}^k α(r))) Σ_{t=1}^k α²(t)
                   + (3nL² / Σ_{r=1}^k α(r)) Σ_{t=1}^k α(t) [ Σ_{s=0}^{t−1} α(s−1) b(t−1, s) + 2α(t−1) ],

where f* denotes the optimal value and X* the optimal solution set.

Main Convergence Result for the Diminishing Stepsize Rule

We consider a diminishing stepsize rule: the stepsize α(k) is decreasing and satisfies

    Σ_{k=0}^∞ α(k) = ∞   and   Σ_{k=0}^∞ α²(k) = A < ∞.

Proposition: Let the Connectivity and Doubly Stochastic Weights Assumptions hold. For all i ∈ N, we have

    lim_{k→∞} E[f(x̄_i(k))] = f*   and   lim inf_{k→∞} f(x̄_i(k)) = f*   with probability 1.

This follows from the lemma:

Lemma: Let the Connectivity and Doubly Stochastic Weights Assumptions hold. We have

    lim_{k→∞} E[ (1 / Σ_{r=1}^k α(r)) Σ_{t=1}^k Σ_{s=0}^{t−1} α(t) α(s−1) b(t−1, s) ] = 0

and

    lim inf_{k→∞} (1 / Σ_{r=1}^k α(r)) Σ_{t=1}^k Σ_{s=0}^{t−1} α(t) α(s−1) b(t−1, s) = 0   with probability 1.
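A quick numerical sanity check of the two stepsize conditions for the standard choice α(k) = 1/(k+1) (this particular stepsize is an illustrative example, not one mandated by the talk):

```python
import math

# alpha(k) = 1/(k+1): partial sums of alpha diverge (harmonic series),
# while partial sums of alpha^2 stay bounded by pi^2/6.
K = 10**6
sum_alpha = sum(1.0 / (k + 1) for k in range(K))
sum_alpha_sq = sum(1.0 / (k + 1) ** 2 for k in range(K))

print(sum_alpha)     # ~ ln(K) + 0.577..., grows without bound as K increases
print(sum_alpha_sq)  # stays below pi^2 / 6 = 1.6449...
```

The divergent sum lets the method travel arbitrarily far toward the optimum, while the summable squares keep the accumulated subgradient noise finite; both are needed in the proposition above.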

Main Convergence Result for the Constant Stepsize Rule

We consider a constant stepsize rule: the stepsize satisfies α(k) = α for all k, for some scalar α > 0.

Proposition: Let the Connectivity and Doubly Stochastic Weights Assumptions hold. For all i ∈ N, we have

    lim sup_{k→∞} E[f(x̄_i(k))] ≤ f* + αnL² ( (1 + 2n) C / (1 − β) + 1/2 ),

where C and β are the constants governing the geometric decay, i.e., E[b(k, s)] ≤ C β^{k−s} for all k ≥ s.

Conjecture: For all i ∈ N, we have

    lim sup_{k→∞} f(x̄_i(k)) ≤ f* + αnL² ( (1 + 2n) C / (1 − β) + 1/2 )   with probability 1.

Conclusions

We presented a distributed subgradient method for multi-agent optimization with a random network model, and analyzed the convergence of the algorithm under different stepsize rules.

Extensions:
- Optimization algorithms for the case when the probability of link failures depends on the current information state of each node (state-dependent consensus)
- Asynchronous optimization algorithms: analysis of delay effects on performance
- Nonconvex local objectives
- Effects of constraints: distributed primal-dual approaches
- Second-order distributed optimization algorithms