Decentralized Consensus Optimization with Asynchrony and Delay


Tianyu Wu¹, Kun Yuan², Qing Ling³, Wotao Yin¹, and Ali H. Sayed²
¹Department of Mathematics, ²Department of Electrical Engineering, University of California, Los Angeles
³Department of Automation, University of Science and Technology of China
Emails: {wuty,kunyuan,wotaoyin,sayed}@ucla.edu, qingling@mail.ustc.edu.cn
(This work was supported in part by NSF grants DMS-1317602 and ECCS-1407712, an NSF CCF grant, an NSF China grant, and a DARPA project.)

Abstract. We propose an asynchronous, decentralized algorithm for consensus optimization. The algorithm runs in a network of agents, where the agents perform local computation and communicate with neighbors. We design our algorithm so that the agents can compute and communicate independently, at different times, for different durations. This reduces the waiting time for the slowest agent or the longest communication delay, and it also eliminates the need for a global clock. Mathematically, our algorithm involves both primal and dual variables, uses fixed parameters, and has convergence guarantees under a bounded-delay assumption and a random-agent assumption. When running synchronously, its performance matches the current state-of-the-art algorithms (for example, PG-EXTRA, which, however, fails to converge without synchronization). Through simulations, we demonstrate that our asynchronous algorithm converges much faster than it does under synchronization.

Index Terms: decentralized, asynchronous, delay, consensus optimization.

I. INTRODUCTION AND RELATED WORK

This paper considers a connected network of $n$ agents that cooperatively solve the consensus optimization problem

$$\min_{x \in \mathbb{R}^p} \bar f(x) := \sum_{i=1}^n f_i(x), \quad \text{where } f_i(x) := s_i(x) + r_i(x), \; i = 1, \ldots, n. \tag{1}$$

We assume that the functions $s_i, r_i : \mathbb{R}^p \to \mathbb{R}$ are convex, with each $s_i$ differentiable and each $r_i$ possibly nondifferentiable. We call $f_i = s_i + r_i$ a composite objective function. Each $s_i$ and $r_i$ is kept private by agent $i = 1, 2, \ldots, n$, and $r_i$ often serves as a regularization term or as the indicator function of a constraint on $x$.

Decentralized algorithms rely on the agents' local computation, as well as on information exchange between agents. Such algorithms are generally robust to the failure of critical relaying agents and scale well with network size. In the decentralized setting, it is inefficient to synchronize multiple nodes and links. To see this, let $x_{(i)} \in \mathbb{R}^p$ be the local variable of agent $i$, and let $x = [x_{(1)}; \ldots; x_{(n)}] \in \mathbb{R}^{n \times p}$ stack all local variables. To perform an iteration that updates the entire $x^k$ to $x^{k+1}$, all the agents must wait for the slowest agent or the longest communication delay. Hence, the performance is determined by the worst case, not the average. In addition, a global coordinator or clock must be implemented, making it expensive to build, configure, and scale the network.

This paper proposes a new asynchronous decentralized algorithm (which also works if synchronized, of course). Consider a connected network $G = \{V, E\}$ with agents $V = \{1, 2, \ldots, n\}$ and undirected edges $E = \{1, 2, \ldots, m\}$. By convention, all edges $(i, j) \in E$ obey $i < j$. Our algorithm involves both node variables $x = [x_{(1)}; \ldots; x_{(n)}]$ and edge variables $y = [y_{(1)}; \ldots; y_{(m)}]$. We associate each row of $y$ with an edge $e = (i, j) \in E$, and for simplicity we let agent $i$ keep the variable $y_{(e)}$ (this choice is arbitrary).
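To make the composite objective in (1) concrete, here is a minimal sketch, added for illustration (not from the paper), of one agent's $f_i = s_i + r_i$ with a least-squares loss and an $\ell_1$ regularizer, the same pair used in the experiments of Section IV; `A_i`, `b_i`, and `gamma` are placeholder data and parameters:

```python
import numpy as np

# One agent's composite objective f_i = s_i + r_i from problem (1):
# s_i is smooth (least squares); r_i is convex but nondifferentiable (l1).
def s_i(x, A_i, b_i):
    return 0.5 * np.linalg.norm(A_i @ x - b_i) ** 2

def grad_s_i(x, A_i, b_i):
    return A_i.T @ (A_i @ x - b_i)

def r_i(x, gamma):
    return gamma * np.linalg.norm(x, 1)

def prox_r_i(w, gamma, alpha):
    # prox_{alpha r_i}(w): entrywise soft-thresholding for the l1 norm.
    return np.sign(w) * np.maximum(np.abs(w) - alpha * gamma, 0.0)
```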
It is easiest to first present our algorithm in its abstract, synchronous form:

$$x^{k+1} \leftarrow T_x(x^k, y^k), \tag{2a}$$
$$y^{k+1} \leftarrow T_y(x^k, y^k), \tag{2b}$$

where $T_x, T_y$ are operators suitable for decentralized implementation. (Their expressions are given in Section II-A.) The performance of (2) matches the state-of-the-art synchronous algorithms (PG-)EXTRA [1], [2] and ADMM [3]-[5].

In our asynchronous setting, agents can compute and communicate independently, at different moments, for different durations. There is no coordination whatsoever. As such, we must count iterations in a new way: $k$ is incremented whenever any agent completes a round of its computation (ties are broken arbitrarily). Suppose that this occurs at agent $i$, which we let perform the following updates to its local variables:

$$x_{(i)}^{k+1} \leftarrow T_i^x(x^{k-\tau^k}, y^{k-\delta^k}), \tag{3a}$$
$$y_{(i,j)}^{k+1} \leftarrow T_{(i,j)}^y(x^{k-\tau^k}, y^{k-\delta^k}), \quad \forall j : (i, j) \in E. \tag{3b}$$

The rows of $x, y$ not held by agent $i$ remain unchanged from $k$ to $k+1$. In (3), $T_i^x$ and $T_{(i,j)}^y$ are the sub-operators of $T_x$ and $T_y$ corresponding to agent $i$ and edge $(i, j)$, respectively. Again, their computation uses only those entries of $x^{k-\tau^k}, y^{k-\delta^k}$ held by agent $i$ and its neighbors. We let $\tau^k \in \mathbb{R}_+^n$ and $\delta^k \in \mathbb{R}_+^m$ be vectors of delays. If $j$ is a neighbor of $i$, then the $j$th row of $x^{k-\tau^k}$ is $x_{(j)}^{k-\tau_j^k}$, meaning that agent $i$ uses a copy of $x_{(j)}$ that is $\tau_j^k$ iterations out of date.

Mathematically, under uniformly bounded (but otherwise arbitrary) delays, and assuming that each update is performed by a random agent¹, we will show that the sequence $\{x^k\}$ converges to a solution of Problem (1) with probability one.

¹The index $i$ of the agent responsible for any $k$th update is random and independent of the indices responsible for the earlier updates $1, \ldots, k-1$.
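Operationally, the delayed iterate $x^{k-\tau^k}$ in (3) simply means that agent $i$ computes with the last copies it has received from each neighbor. A minimal sketch of this bookkeeping, with hypothetical class and method names of our choosing:

```python
import numpy as np

class LocalView:
    """What agent i knows: its own current row of x and the last
    received (possibly stale) copies of its neighbors' rows."""
    def __init__(self, i, x_i, neighbors):
        self.i = i
        self.x_i = x_i                             # own row, always current
        self.recv = {j: None for j in neighbors}   # last copy of x_(j)

    def on_receive(self, j, x_j):
        # Called whenever a (possibly delayed) message from neighbor j arrives.
        self.recv[j] = np.copy(x_j)

    def delayed_row(self, j):
        # Row j of x^{k - tau^k} as seen by agent i in (3): whatever copy
        # arrived last, no matter how many iterations old it is.
        return self.x_i if j == self.i else self.recv[j]
```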

[Fig. 1: Network and uncoordinated computing. A three-agent network in which agent 1 holds $x_{(1)}, y_{(1,2)}, y_{(1,3)}$, agent 2 holds $x_{(2)}, y_{(2,3)}$, and agent 3 holds $x_{(3)}$, together with a timeline of their uncoordinated updates.]

What can cause delays? Clearly, communication latency and bandwidth limits introduce delays. Besides, as agents start and finish their iterations independently, one agent may have updated its variables while its neighbors are still working on their current iterations, which still use older (i.e., delayed) copies of those variables. Hence, both computation and communication cause delays and account for positive $\tau^k$ and $\delta^k$. The two types of delay are, however, mathematically indistinguishable in (3).

Fig. 1 depicts a simple three-agent network and how the agents perform their computing. The graph has $V = \{1, 2, 3\}$ and $E = \{(1,2), (1,3), (2,3)\}$. The node variable $x_{(i)}$ and edge variable(s) $y_{(i,j)}$ are assigned to agent $i$. The iteration indices $k = 1, 2, \ldots$ are assigned at the moments when agents finish their updates. Not depicted in Fig. 1 is that updated variables may take time to arrive at neighbors. As mentioned above, both communication and uncoordinated computing can cause delays.

A. Relationship to certain synchronous algorithms

Our algorithm, if run synchronously, can be algebraically reduced to PG-EXTRA [2]; both solve Problem (1) with a fixed step-size parameter and are typically faster than algorithms using diminishing step sizes. Both algorithms also generalize EXTRA [1], which only deals with differentiable functions. However, divergence (or convergence to wrong solutions) is observed when we run EXTRA and PG-EXTRA in our asynchronous setting. Our algorithm works correctly, and for this we must introduce the variable $y$, which incurs the moderate cost of updating and communicating $y$. Our algorithm is also very different from decentralized ADMM [3]-[5], except that both algorithms can use fixed parameters. Distributed gradient descent [6], [7] and (Prox-)diffusion methods [8], [9] also use fixed step sizes, but they converge fast only to approximate solutions.

B. Related decentralized algorithms under different settings

Our setting of asynchrony differs from randomized single activation, which is assumed by the randomized gossip algorithms [10], [11]. That setting activates only one edge at a time and does not allow delay: before each activation, the computation and communication associated with previous activations must have completed, and only one edge in each neighborhood can be active at any time. Likewise, our setting differs from randomized multi-activation in papers such as [12], [13] for consensus averaging and [14]-[19] for consensus optimization, which activate multiple edges each time and still do not allow delay. These algorithms can alternatively be viewed as synchronous algorithms running on a sequence of varying subgraphs. Since each iteration waits for the slowest agent or the longest communication among those previously activated, a coordinator or global clock is still needed.

We shall also distinguish our setting from the fixed-communication-delay setting [20], [21], where the information passing through each edge takes a fixed number of iterations to arrive. (Different edges can have different such numbers, and agents compute with only the information they have, instead of waiting.) As demonstrated in [20], this setting can be transformed into one with no communication delay by replacing an edge with a chain of dummy nodes: information passing through a chain of $\tau$ dummy nodes simulates an edge with a $\tau$-iteration delay.
The computation in this setting is synchronous, so a coordinator or global clock is still needed. Other work [20], [22] considers random communication delays. However, those algorithms are only suitable for consensus averaging, not yet for the more general problem (1). Our setting is identical to the setting outlined in Section 2.6 of [23], where the introduced asynchronous decentralized ADMM allows both computation and communication delays. Our algorithm, however, handles composite functions.

C. Contributions

This paper introduces a decentralized algorithm for Problem (1) that has provable convergence when each update is performed by a random agent and when communication is subject to arbitrary but bounded delays. If run synchronously, our algorithm is as fast as the state-of-the-art algorithms, except that, to allow asynchrony, it involves updating and communicating the edge variable $y$. When our algorithm runs asynchronously, it eliminates waiting and is significantly faster. Our asynchronous setting is considerably less restrictive than the settings under which recent non-synchronous or non-deterministic decentralized algorithms were proposed. In our setting, the computation and communication of the agents are uncoordinated; a global clock is no longer needed.

Technical contributions are also made. We borrow ideas from monotone operator theory and primal-dual operator splitting to derive our algorithm in just a few steps. (The edge variable $y$ is our dual variable.) To establish convergence under information delays, we do not simply follow the existing analysis of PG-EXTRA; instead, motivated by [23], [24], a new non-Euclidean metric is introduced to absorb the delays. In machine learning, developing asynchronous algorithms for Problem (1) has become a hot topic. Unlike the existing majority, our algorithm involves both primal and dual variables; hence, our analysis is different. In particular, we cannot rely on the monotonicity of objective values. Instead, our approach establishes monotonic conditional expectations of certain distances to the solution set. We believe this new analysis can extend to a general class of primal-dual algorithms.

D. Discussion and weaknesses

As the reader will soon see, our algorithm involves a relaxation parameter that depends on the bound on the delays, which is a weakness.

As a matter of fact, requiring the delays to be bounded is itself a weakness. However, with more careful analysis, the bound can be relaxed, and the relaxation parameter can be set adaptively to the local delays (i.e., to how out of date an agent's knowledge of its neighbors' variables is). We leave this work to a forthcoming longer report. A weakness of both (PG-)EXTRA and our algorithm is that the step-size parameter depends on a global Lipschitz constant. Unless the constant is known a priori, we must apply consensus averaging to obtain it. This weakness can be overcome by assigning a parameter to each agent; however, we have to leave that to the longer report as well.

A weakness that is difficult to overcome is the random-agent assumption. It is not always practical, but if we dropped it completely, we would face the worst deterministic case, which requires impractical assumptions on the problem. The real world lies somewhere between the worst deterministic case and the ideal random case. Our results shall shed some light on how to control an asynchronous algorithm in practice. Finally, we have not extended our algorithm to handle directed graphs, as the synchronous algorithm ExtraPush [25] does.

E. Notation

Each agent $i$ holds a local variable $x_{(i)} \in \mathbb{R}^p$, whose value at iteration $k$ is denoted by $x_{(i)}^k$. We introduce the variable $x$ to stack all local variables $x_{(i)}$:

$$x := \begin{bmatrix} x_{(1)} \\ x_{(2)} \\ \vdots \\ x_{(n)} \end{bmatrix} \in \mathbb{R}^{n \times p}. \tag{4}$$

The $i$th row of $x$ is the local variable $x_{(i)} \in \mathbb{R}^p$ kept by agent $i$. Now we define

$$s(x) := \sum_{i=1}^n s_i(x_{(i)}), \qquad r(x) := \sum_{i=1}^n r_i(x_{(i)}), \tag{5}$$

as well as $f(x) := \sum_{i=1}^n f_i(x_{(i)}) = s(x) + r(x)$. We then define the gradient of $s(x)$ as

$$\nabla s(x) := \begin{bmatrix} \nabla s_1(x_{(1)}) \\ \nabla s_2(x_{(2)}) \\ \vdots \\ \nabla s_n(x_{(n)}) \end{bmatrix} \in \mathbb{R}^{n \times p}. \tag{6}$$

The inner product on $\mathbb{R}^{n \times p}$ is defined as $\langle x, \tilde x \rangle = \mathrm{tr}(x^T \tilde x) = \sum_{i=1}^n \langle x_{(i)}, \tilde x_{(i)} \rangle$, and the norm is defined as $\|x\| = \sqrt{\langle x, x \rangle}$.

II. ALGORITHMS

In our network $G = \{V, E\}$, to each edge $(i, j) \in E$ we assign a weight $w_{ij} \geq 0$, which agent $i$ uses to scale the $x_{(j)}$ it receives from agent $j$. Likewise, $w_{ji} = w_{ij}$ for agent $j$. If $(i, j) \notin E$, then $w_{ij} = w_{ji} = 0$. For each agent $i$, $\mathcal{N}_i$ denotes its neighborhood and $E_i$ denotes the set of all edges connected to it. Let $W = [w_{ij}] \in \mathbb{R}^{n \times n}$ denote the weight matrix, which is symmetric and assumed to be doubly stochastic. Such a $W$ can be generated through the maximum-degree [26] or Metropolis-Hastings rules [26]. It is easy to verify that $\mathrm{null}\{I - W\} = \mathrm{span}\{\mathbf{1}\}$.

Introduce the diagonal matrix $D \in \mathbb{R}^{m \times m}$ with diagonal entries $D_{e,e} = \sqrt{w_{ij}/2}$ for each edge $e = (i, j)$. Let $C = [c_{ei}] \in \mathbb{R}^{m \times n}$ be the incidence matrix of $G$, and define

$$V := DC \tag{7}$$

as the scaled incidence matrix. It is easy to verify:

Proposition 1 (Matrix factorization).
$$\tfrac{1}{2}(I - W) = V^T V. \tag{8}$$
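As a quick numerical check of Proposition 1, the following sketch (ours, added for illustration; the helper name `build_W_V` is not from the paper) constructs Metropolis-Hastings weights, the incidence matrix $C$, and $V = DC$ for the triangle graph of Fig. 1 and verifies the factorization (8):

```python
import numpy as np

def build_W_V(n, edges):
    """Metropolis-Hastings weight matrix W and scaled incidence V = DC."""
    deg = np.zeros(n)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    W = np.zeros((n, n))
    for i, j in edges:
        W[i, j] = W[j, i] = 1.0 / (1.0 + max(deg[i], deg[j]))
    W += np.diag(1.0 - W.sum(axis=1))     # self-weights: rows sum to one
    C = np.zeros((len(edges), n))         # row e of C: +1 at i, -1 at j
    d = np.zeros(len(edges))              # diagonal of D: sqrt(w_ij / 2)
    for e, (i, j) in enumerate(edges):
        C[e, i], C[e, j] = 1.0, -1.0
        d[e] = np.sqrt(W[i, j] / 2.0)
    return W, np.diag(d) @ C              # V = D C

W, Vmat = build_W_V(3, [(0, 1), (0, 2), (1, 2)])            # graph of Fig. 1
assert np.allclose(0.5 * (np.eye(3) - W), Vmat.T @ Vmat)    # Proposition 1
```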
A. Proposed primal-dual algorithm

Let us reformulate Problem (1). First, it is equivalent to

$$\min_{\{x_{(1)}, \ldots, x_{(n)}\}} \sum_{i=1}^n s_i(x_{(i)}) + r_i(x_{(i)}), \quad \text{subject to } x_{(1)} = x_{(2)} = \cdots = x_{(n)}. \tag{9}$$

Since $\mathrm{null}\{I - W\} = \mathrm{span}\{\mathbf{1}\}$, Problem (9) is equivalent to

$$\min_{x \in \mathbb{R}^{n \times p}} s(x) + r(x), \quad \text{subject to } (I - W)x = 0. \tag{10}$$

By Proposition 1, Problem (10) is further equivalent to

$$\min_{x \in \mathbb{R}^{n \times p}} s(x) + r(x), \quad \text{subject to } Vx = 0, \tag{11}$$

which can be reformulated into the saddle-point problem

$$\max_{y \in \mathbb{R}^{m \times p}} \min_{x \in \mathbb{R}^{n \times p}} s(x) + r(x) + \frac{1}{\alpha}\langle y, Vx \rangle, \tag{12}$$

where $\alpha > 0$ is a parameter and $y$ is the dual variable. Problem (12) can be solved iteratively by the primal-dual algorithm adapted from [27], [28]:

$$y^{k+1} = y^k + Vx^k, \qquad x^{k+1} = \mathrm{prox}_{\alpha r}\big[x^k - \alpha \nabla s(x^k) - V^T(2y^{k+1} - y^k)\big].^2 \tag{13}$$

Next, in the $x$-update, we eliminate $y^{k+1}$ by plugging in the $y$-update and, using $I - 2V^T V = W$, arrive at

$$y^{k+1} = y^k + Vx^k, \qquad x^{k+1} = \mathrm{prox}_{\alpha r}\big[Wx^k - \alpha \nabla s(x^k) - V^T y^k\big], \tag{14}$$

which computes $(y^{k+1}, x^{k+1})$ from $(y^k, x^k)$. Applying $W$, $V$, and $V^T$ requires communication; all other operations are local.

B. Synchronous algorithm

Alg. 1 implements the iteration (14) in the synchronous setting, which requires two synchronization barriers in each iteration $k$. The first barrier holds computing until an agent receives all necessary input; after the agent finishes updating its variables, the second barrier holds it from sending out its updates until all of its neighbors also finish computing theirs (otherwise, an update intended for iteration $k+1$ may arrive at a neighbor too early, entering its computation while it is still at iteration $k$). Note that the second barrier can be replaced by a buffer.

²Proximal operator: $\mathrm{prox}_{\alpha r}(w) := \arg\min_v\, r(v) + \frac{1}{2\alpha}\|v - w\|^2$.
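For concreteness, here is a minimal centralized simulation of iteration (14), added as a sketch; it reuses `prox_r_i` and the `W`, `Vmat` from the snippets above and assumes $\ell_1$ regularization, so $\mathrm{prox}_{\alpha r}$ is entrywise soft-thresholding. In an actual decentralized deployment, each agent would compute only its own rows.

```python
def grad_s(x, A, b):
    # Stacked gradient (6): row i is the gradient of the least-squares
    # loss s_i at x_(i); A and b are lists of per-agent data.
    return np.stack([A[i].T @ (A[i] @ x[i] - b[i]) for i in range(len(A))])

def iterate_14(x, y, A, b, alpha, gamma, W, Vmat):
    """One pass of (14): x is the (n, p) primal stack, y the (m, p) dual."""
    y_new = y + Vmat @ x                                    # dual update
    x_new = prox_r_i(W @ x - alpha * grad_s(x, A, b) - Vmat.T @ y,
                     gamma, alpha)                          # primal update
    return x_new, y_new
```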

Algorithm 1: Synchronous algorithm based on (14)

Input: starting point $x^0, y^0$. Set counter $k = 0$;
while all agents $i \in V$ in parallel do
  Wait until $x_{(j)}^k$, $j \in \mathcal{N}_i$, and $y_{(l)}^k$, $l \in E_i$, are received;
  Compute:
    $x_{(i)}^{k+1} = \mathrm{prox}_{\alpha r_i}\big(\sum_{j \in \mathcal{N}_i \cup \{i\}} w_{ij}\, x_{(j)}^k - \alpha \nabla s_i(x_{(i)}^k) - \sum_{l \in E_i} V_{li}\, y_{(l)}^k\big)$;
    $y_{(e)}^{k+1} = y_{(e)}^k + \big(V_{ei}\, x_{(i)}^k + V_{ej}\, x_{(j)}^k\big)$, $e = (i, j) \in E$;
  Wait until all neighbors finish computing;
  Set $k \leftarrow k + 1$;
  Send out $x_{(i)}^{k+1}$ and $y_{(i,j)}^{k+1}$, $(i, j) \in E$, to neighbors;

C. Asynchronous algorithm

As already discussed, every agent computes and communicates independently in this setting. Hence, no synchronization barrier is needed. We let $k$ increase by 1 whenever an agent finishes a round of updating its variables. In general, agents compute with delayed information from their neighbors. Also, relaxation is added to the abstract update (3) to ensure convergence; the relaxation parameter $\eta_i$ depends on how out of date agent $i$'s knowledge of the inputs from its neighbors is. Longer delays require a smaller $\eta_i$ and cause slower convergence.

Compute:
$$\tilde x_{(i)}^{k+1} = \mathrm{prox}_{\alpha r_i}\Big(\sum_{j \in \mathcal{N}_i \cup \{i\}} w_{ij}\, x_{(j)}^{k-\tau_j^k} - \alpha \nabla s_i(x_{(i)}^k) - \sum_{l \in E_i} V_{li}\, y_{(l)}^{k-\delta_l^k}\Big),$$
$$\tilde y_{(e)}^{k+1} = y_{(e)}^k + \big(V_{ei}\, x_{(i)}^k + V_{ej}\, x_{(j)}^{k-\tau_j^k}\big), \quad e = (i, j) \in E_i;$$

Relaxed updates:
$$x_{(i)}^{k+1} = x_{(i)}^k + \eta_i\big(\tilde x_{(i)}^{k+1} - x_{(i)}^k\big), \qquad y_{(e)}^{k+1} = y_{(e)}^k + \eta_i\big(\tilde y_{(e)}^{k+1} - y_{(e)}^k\big), \quad e = (i, j) \in E_i. \tag{15}$$

The entries of $x, y$ not held by agent $i$ remain unchanged from $k$ to $k+1$. Alg. 2 implements the asynchronous updates.

Algorithm 2: Asynchronous algorithm based on (15)

Input: starting point $x^0, y^0$;
while each agent $i$, asynchronously, do
  Compute per (15) using the information it has;
  Send the updated $x_{(i)}$ and $y_{(i,j)}$, $(i, j) \in E$, to neighbors;

III. CONVERGENCE ANALYSIS

We present our main assumptions and convergence results. As space is limited, all proofs are left to the longer report.

Assumption 1. For any $k > 0$, the index $i_k$ of the agent responsible for the $k$th update is random, with probability $q_i := P(i_k = i) > 0$. The random variables $i_1, i_2, \ldots$ are independent.

This assumption is satisfied under either of the following scenarios: (i) every agent $i$ is activated following an independent Poisson process with parameter $\lambda_i$, and its computation is instant, leading to $q_i = \lambda_i / \sum_{j=1}^n \lambda_j$; (ii) every agent $i$ runs continuously, and the duration of each round follows the exponential distribution $\exp(\beta_i)$, leading to $q_i = \beta_i / \sum_{j=1}^n \beta_j$. Scenarios (i) and (ii) appear in some of the literature as assumptions.

Assumption 2. The delays $\tau_j^k$, $j = 1, 2, \ldots, n$, and $\delta_e^k$, $e = 1, 2, \ldots, m$, $k \geq 1$, defined in (15), have an upper bound $\tau > 0$.

Assumption 3. Statistically speaking, the delays $\tau_j^k$, $j = 1, 2, \ldots, n$, and $\delta_e^k$, $e = 1, 2, \ldots, m$, at iteration $k$ are independent of the index $i_k$ of the agent responsible for the update.

We admit that this assumption is rather artificial, but it is crucial to our proof. In the worst-case scenario, when the delays $\tau_j^k$ and $\delta_e^k$ always attain their upper bound $\tau$, Assumption 3 still holds and the convergence of the algorithm remains provable. In reality, what happens lies between this worst case and the no-delay case.

Since the weight matrix $W$ associated with the network is symmetric and doubly stochastic, its eigenvalues lie in $[-1, 1]$. We further restrict its minimum eigenvalue $\lambda_{\min}(W)$:

Assumption 4. $\lambda_{\min}(W) > -1$.

Lemma 1. Under Assumption 4, we have

$$G := \begin{bmatrix} I & V^T \\ V & I \end{bmatrix} \succ 0. \tag{16}$$

Let $\rho_{\min} := \lambda_{\min}(G) > 0$ be the smallest eigenvalue of $G$ and $\kappa$ be its condition number.

Assumption 5. 1) The functions $s_i$ and $r_i$ are closed, proper, and convex; 2) the functions $s_i$ are differentiable and satisfy $\|\nabla s_i(x) - \nabla s_i(\tilde x)\| \leq L_i \|x - \tilde x\|$ for all $x, \tilde x \in \mathbb{R}^p$; 3) the parameter $\alpha$ in Eq. (14) and Alg. 2 satisfies $0 < \alpha < 2\rho_{\min}/L$, where $L = \max_i L_i$.
We have the following theorem for $z^k := [x^k; y^k]$.

Theorem 1. Let $Z^*$ be the set of primal-dual solutions to (12), let $(z^k)_{k \geq 0}$ be the sequence generated by Alg. 2, and let $\eta_i = \frac{\eta}{n q_i}$ with $\eta \in (0, \eta_{\max}]$, where $\eta_{\max} < \frac{\kappa\, q_{\min}}{2\tau \sqrt{q_{\min}} + \kappa}$ and $q_{\min} := \min_i q_i$. Then $(z^k)_{k \geq 0}$ converges to a point in $Z^*$ with probability 1.

This theorem guarantees that, if we run the asynchronous Alg. 2 from an arbitrary starting point $x^0$, then with probability 1 the produced sequence $\{x^k\}$ converges to one of the solutions of problem (12). The upper bound $\eta_{\max}$ becomes smaller as the maximum delay $\tau$ grows or as the matrix $G$ becomes more ill-conditioned. While the theorem bounds every $\eta_i$ through a uniform $\eta_{\max}$, the bound can be improved by adapting it locally to the delays pertaining only to agent $i$. We leave this and other improvements to the longer report.
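To illustrate the relaxed per-agent update (15) together with the choice $\eta_i = \eta/(nq_i)$ from Theorem 1, here is a sketch of one asynchronous round at agent $i$ (our illustration; the argument names are hypothetical, message passing is abstracted into the possibly stale local copies `x_view` and `y_view`, and $r_i$ is again taken to be an $\ell_1$ norm so `prox_r_i` from the earlier snippet applies):

```python
def async_update_agent(i, x_i, x_view, y_view, own_edges,
                       A_i, b_i, alpha, gamma, eta_i, W, Vmat, edges):
    """One relaxed asynchronous update (15) at agent i.

    x_view[j]: last received (possibly stale) copy of x_(j), j in N_i
    y_view[e]: copy of y_(e) for each edge e in E_i; for e in own_edges
               these are agent i's own, current, edge variables
    """
    # x-tilde: delayed neighbor copies; own row and gradient are current.
    mix = W[i, i] * x_i + sum(W[i, j] * x_view[j] for j in x_view)
    dual = sum(Vmat[e, i] * y_view[e] for e in y_view)
    grad = A_i.T @ (A_i @ x_i - b_i)
    x_tilde = prox_r_i(mix - alpha * grad - dual, gamma, alpha)

    # y-tilde for the edges e = (i, j) kept by agent i, then relax by eta_i.
    y_new = {}
    for e in own_edges:
        j = edges[e][1] if edges[e][0] == i else edges[e][0]
        y_tilde = y_view[e] + Vmat[e, i] * x_i + Vmat[e, j] * x_view[j]
        y_new[e] = y_view[e] + eta_i * (y_tilde - y_view[e])

    # Relaxed x update; rows held by other agents stay unchanged.
    return x_i + eta_i * (x_tilde - x_i), y_new
```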

[Fig. 2: Relative error versus time (ms) for the synchronous Alg. 1 and the asynchronous Alg. 2.]

IV. NUMERICAL EXPERIMENTS

Since there is no similar asynchronous algorithm to compare with, we compare our algorithm across its synchronous and asynchronous settings, i.e., Alg. 1 versus Alg. 2, to illustrate the advantages of asynchrony. The computation times and communication times are generated randomly. The tested problem is decentralized compressed sensing. Each agent $i \in \{1, \ldots, n\}$ holds some measurements

$$b_{(i)} = A_{(i)} x^* + e_{(i)} \in \mathbb{R}^{m_i},$$

where $A_{(i)} \in \mathbb{R}^{m_i \times p}$ is a sensing matrix, $x^* \in \mathbb{R}^p$ is the common unknown sparse signal, and $e_{(i)}$ is i.i.d. Gaussian noise. The goal is to recover $x^*$. The total number of measurements $\sum_{i=1}^n m_i$ may be less than the number of unknowns $p$, so we solve the $\ell_1$-regularized least-squares problem

$$\min_x\; \sum_{i=1}^n s_i(x) + r_i(x), \tag{17}$$

where $s_i(x) = \frac{1}{2}\|A_{(i)} x - b_{(i)}\|_2^2$, $r_i(x) = \gamma_i \|x\|_1$, and $\gamma_i$ is the regularization parameter of agent $i$.

The tested network has 10 nodes and 14 edges. We set $m_i = 3$ for $i = 1, \ldots, 10$ and $p = 50$. The entries of $A_{(i)}$ and $e_{(i)}$ are independently sampled from the standard normal distribution $N(0, 1)$, and each $A_{(i)}$ is normalized so that $\|A_{(i)}\|_2 = 1$, where $\|\cdot\|_2$ is the induced 2-norm. The signal $x^*$ is generated randomly with 20% nonzero entries.

We simulate the computation and communication times. The computation time of agent $i$ is sampled from $\exp(\mu_i)$, where $\mu_i$ is set as $2 + |X_i|$ with $X_i \sim N(0, 1)$. The communication times between agents are independently sampled from $\exp(0.6)$.

We run the synchronous Alg. 1 and the asynchronous Alg. 2 and plot the relative error $\frac{\|x^k - x^*\|_F}{\|x^0 - x^*\|_F}$ against time, as depicted in Fig. 2, where $x^*$ is the exact solution. The step sizes of both algorithms are tuned by hand and are nearly optimal. From Fig. 2 we can see that both algorithms exhibit linear convergence and that Alg. 2 converges significantly faster. Within the same period (roughly 2760 ms), the asynchronous algorithm finishes about 2 times as many rounds of computation and communication, owing to the elimination of waiting time.
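The following sketch reconstructs this setup under stated assumptions (it is our reconstruction, not the authors' code): it reuses `build_W_V`, `prox_r_i`, and `iterate_14` from the earlier snippets, and the particular 14-edge topology, `gamma`, `alpha`, and the iteration count are placeholder choices.

```python
rng = np.random.default_rng(0)
n, p, m_i, gamma = 10, 50, 3, 0.1      # gamma is a placeholder value

# A 10-node, 14-edge network: a cycle plus four chords (arbitrary choice).
edges = [(i, (i + 1) % n) for i in range(n)] + [(0, 5), (1, 6), (2, 7), (3, 8)]
W, Vmat = build_W_V(n, edges)

# Sparse ground truth (20% nonzeros) and per-agent measurements.
x_star = rng.standard_normal(p) * (rng.random(p) < 0.2)
A = [rng.standard_normal((m_i, p)) for _ in range(n)]
A = [Ai / np.linalg.norm(Ai, 2) for Ai in A]       # normalize ||A_i||_2 = 1
b = [A[i] @ x_star + rng.standard_normal(m_i) for i in range(n)]

# Run the synchronous iteration (14) and report the relative error;
# with x^0 = 0, the denominator ||x^0 - x*||_F equals sqrt(n) * ||x*||.
x, y = np.zeros((n, p)), np.zeros((len(edges), p))
alpha = 0.5                                        # hand-tuned placeholder
for k in range(500):
    x, y = iterate_14(x, y, A, b, alpha, gamma, W, Vmat)
rel_err = np.linalg.norm(x - x_star) / (np.sqrt(n) * np.linalg.norm(x_star))
print(rel_err)
```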
REFERENCES

[1] W. Shi, Q. Ling, G. Wu, and W. Yin, "EXTRA: An exact first-order algorithm for decentralized consensus optimization," SIAM Journal on Optimization, vol. 25, no. 2, pp. 944-966, 2015.
[2] W. Shi, Q. Ling, G. Wu, and W. Yin, "A proximal gradient algorithm for decentralized composite optimization," IEEE Transactions on Signal Processing, vol. 63, no. 22, 2015.
[3] I. D. Schizas, A. Ribeiro, and G. B. Giannakis, "Consensus in ad hoc WSNs with noisy links, Part I: Distributed estimation of deterministic signals," IEEE Transactions on Signal Processing, vol. 56, no. 1, 2008.
[4] W. Shi, Q. Ling, K. Yuan, G. Wu, and W. Yin, "On the linear convergence of the ADMM in decentralized consensus optimization," IEEE Transactions on Signal Processing, vol. 62, no. 7, 2014.
[5] T.-H. Chang, M. Hong, and X. Wang, "Multi-agent distributed optimization via inexact consensus ADMM," IEEE Transactions on Signal Processing, vol. 63, no. 2, 2015.
[6] A. Nedić and A. Ozdaglar, "Distributed subgradient methods for multi-agent optimization," IEEE Transactions on Automatic Control, vol. 54, no. 1, pp. 48-61, 2009.
[7] K. Yuan, Q. Ling, and W. Yin, "On the convergence of decentralized gradient descent," arXiv preprint arXiv:1310.7063, 2013.
[8] A. H. Sayed, "Adaptive networks," Proceedings of the IEEE, vol. 102, no. 4, April 2014.
[9] S. Vlaski and A. H. Sayed, "Proximal diffusion for stochastic costs with non-differentiable regularizers," in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, April 2015.
[10] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, "Randomized gossip algorithms," IEEE/ACM Transactions on Networking, vol. 14, no. SI, 2006.
[11] A. G. Dimakis, S. Kar, J. M. F. Moura, M. G. Rabbat, and A. Scaglione, "Gossip algorithms for distributed signal processing," Proceedings of the IEEE, vol. 98, no. 11, 2010.
[12] S. Kar and J. M. F. Moura, "Sensor networks with random links: Topology design for distributed consensus," IEEE Transactions on Signal Processing, vol. 56, no. 7, 2008.
[13] F. Fagnani and S. Zampieri, "Randomized consensus algorithms over large scale networks," IEEE Journal on Selected Areas in Communications, vol. 26, no. 4, 2008.
[14] A. Nedić and A. Olshevsky, "Distributed optimization over time-varying directed graphs," IEEE Transactions on Automatic Control, vol. 60, no. 3, 2015.
[15] F. Iutzeler, P. Bianchi, P. Ciblat, and W. Hachem, "Asynchronous distributed optimization using a randomized alternating direction method of multipliers," in Proc. IEEE Conference on Decision and Control (CDC), 2013.
[16] P. Di Lorenzo, S. Barbarossa, and A. H. Sayed, "Decentralized resource assignment in cognitive networks based on swarming mechanisms over random graphs," IEEE Transactions on Signal Processing, vol. 60, no. 7, 2012.
[17] X. Zhao and A. H. Sayed, "Asynchronous adaptation and learning over networks, Part I: Modeling and stability analysis," IEEE Transactions on Signal Processing, vol. 63, no. 4, 2015.
[18] E. Wei and A. Ozdaglar, "On the O(1/k) convergence of asynchronous distributed alternating direction method of multipliers," in Proc. IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2013.
[19] M. Hong and T.-H. Chang, "Stochastic proximal gradient consensus over random networks," arXiv preprint, 2015.
[20] K. I. Tsianos and M. G. Rabbat, "Distributed consensus and optimization under communication delays," in Proc. Allerton Conference on Communication, Control, and Computing, 2011.
[21] K. I. Tsianos and M. G. Rabbat, "Distributed dual averaging for convex optimization under communication delays," in Proc. IEEE American Control Conference, 2012.
[22] S. Liu, L. Xie, and H. Zhang, "Distributed consensus for multi-agent systems with delays and noises in transmission channels," Automatica, vol. 47, no. 5, 2011.
[23] Z. Peng, Y. Xu, M. Yan, and W. Yin, "ARock: An algorithmic framework for asynchronous parallel coordinate updates," arXiv e-prints, June 2015.
[24] Z. Peng, T. Wu, Y. Xu, M. Yan, and W. Yin, "Coordinate friendly structures, algorithms and applications," Annals of Mathematical Sciences and Applications, vol. 1, no. 1, pp. 57-119, 2016.

[25] J. Zeng and W. Yin, "ExtraPush for convex smooth decentralized optimization over directed networks," UCLA CAM Report 15-61, 2015.
[26] A. H. Sayed, "Adaptation, learning, and optimization over networks," Foundations and Trends in Machine Learning, vol. 7, no. 4-5, pp. 311-801, 2014.
[27] L. Condat, "A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms," Journal of Optimization Theory and Applications, vol. 158, no. 2, 2013.
[28] B. C. Vũ, "A splitting algorithm for dual monotone inclusions involving cocoercive operators," Advances in Computational Mathematics, vol. 38, no. 3, 2013.
