A Practical Parallel Algorithm for Diameter Approximation of Massive Weighted Graphs

Size: px

Start display at page:

Download "A Practical Parallel Algorithm for Diameter Approximation of Massive Weighted Graphs"

Dennis Oliver
5 years ago
Views:

Ceccarello Joint work with Andrea Pietracaprina,

1 A Practical Parallel Algorithm for Diameter Approximation of Massive Weighted Graphs Matteo Ceccarello Joint work with Andrea Pietracaprina, Geppino Pucci, and Eli Upfal Università di Padova Brown University

2 Problem definition Problem definition Undirected, weighted graphs Find the weighted diameter of G(V, E, w) Φ(G) = max {dist(u, v)} u V,v V, dist(u,v)< The diameter is a fundamental topological parameter of a graph with important practical applications Goal Develop a parallel approximation algorithm that requires a number of rounds sublinear in the unweighted diameter linear space

3 Previous work: exact algorithms Computing the diameter exactly Solve All Pairs Shortest Paths Dijkstra s algorithm from every node: O(n (m + n log n)) Floyd-Warshall s dynamic programming algorithm: O(n ) Repeated fast matrix multiplication: O(n ω log n) Ω(n ) space/time is required

4 Previous work: approximation algorithms -stepping [Meyer, Sanders JoA 0] Parallel Single Source Shortest Path algorithm Provides a -approximation to the diameter Linear space Running time lower bounded by unweighted diameter MapReduce approximation algorithm for unweighted graphs [Ceccarello, Pietracaprina, Pucci, Upfal SPAA 5] Linear space Running time sublinear in the unweighted diameter

5 Our Result Our Result MapReduce algorithm for diameter approximation O(log n) approximation factor Linear space Number of rounds sublinear in the unweighted diameter

6 MapReduce: a model of computation for Big Data Characteristics Distributed computation on clusters of commodity machines Practical algorithms can only afford linear space Model [Pietracaprina, Pucci, Riondato, Silvestri, Upfal ACM-ICS ] An algorithm is expressed as a sequence of rounds, each transforming datasets of key-value pairs There are memory limits: ML memory available for local computation: sub-linear in the data size M A sum of local memories

7 A MapReduce round (k, v ) Aggregate space: M A (k, w ) (k, v ) (k, v ) Reduce k + map (space M L) (k, w ) (k, w ) (k, v ) (k, v 5) Reduce k + map (space M L) (k, w ) (k, w 5) (k, v 6) (k, v 7) Reduce k + map (space M L) (k, w 6) (k, v 8) Data shuffle (k 5, w 7)

8 Diameter approximation: high-level strategy Idea Derive from the input graph a smaller auxiliary graph whose diameter can be computed on a single reducer Relate the diameter of the auxiliary graph to the one of the input graph

9 Diameter approximation: high-level strategy G = (V, E): undirected graph, polynomial weights. n = V, m = E, weighted diameter Φ(G).. Compute a decomposition C of G into clusters of small radius centered at random nodes. Estimate diameter Φ(G) from diameter Φ(G C ), with G C a suitable quotient graph derived from C Remarks cluster granularity chosen so that G C fits into local memory small radius low round complexity, better approximation

10 A difficult input Random sampling does not work well High-diameter, skinny regions may be hard to cover large radius

11 Algorithm cluster(τ) Challenges In order to attain small cluster radius we must. Build more clusters in remote regions of the graph. Avoid heavyweight edges in building the clusters

12 Algorithm cluster(τ) Progressive clustering strategy [Ceccarello, Pietracaprina, Pucci, Upfal-SPAA 5]. Select new batch of τ cluster centers from uncovered nodes. Grow both old and new clusters until covering half of the uncovered nodes. Repeat steps - until complete coverage -stepping (inspired by [Meyer,Sanders JoA 0]) guess on minimum cluster radius In each iteration of progressive clustering (Steps -): Use only light edges (weight < ) and stop at radius If desired coverage cannot be obtained then

13 Algorithm cluster(τ) Define R G (τ): optimum radius for a τ-clustering of G l X : max number of edges in a min-weight path of weight X Theorem W.h.p. cluster(τ) computes a decomposition C of G into O(τ log n) clusters max cluster radius: O(R G (τ) log n) round complexity: O(l RG (τ) log n) on MR(n ɛ, m) for any constant ɛ (0, ).

14 Diameter approximation: example Graph G, weighted diameter Φ(G) = 6 B L C D A 5 G E H I F M N S O R P Q

15 Diameter approximation: example τ =, = B L C D A 5 G E H I F M N S O R P Q

16 Diameter approximation: example τ =, = B L C D A 5 G E H I F M N S O R P Q

17 Diameter approximation: example τ =, = B L C D A 5 G E H I F M N S O R P Q

18 Diameter approximation: example τ =, = B L C D A 5 G E H I F M N S O R P Q

19 Diameter approximation: example τ =, = B L C D A 5 G E H I F M N S O R P Q

20 Diameter approximation: example A B F G H C D E L I P Q S N O 5 M R QuotientgraphG C Φ(G C ) = Φ(G) <= ++ = 8 (vs 6) A 5 M R 7 Radius= Radius= Radius=

21 Diameter approximation: main result Theorem For a given weighted graph G, w.h.p. we can compute an upper bound to Φ(G) approximation ratio: O(log n) round complexity: O(l RG (τ) log n log n) on MR(n ɛ, m), for any constant ɛ (0, ). Corollary: Graphs of bounded doubling dimension b Ψ(G) is the unweighted diameter of G. The number of rounds is ( ) Ψ(G) O log n n ε /b Plain -stepping requires Ω(Ψ(G)) rounds.

22 Diameter approximation: experiments Experimental setup In-house cluster with 6 machines 8GB RAM / Intel i7 nehalem -core processor Spark MapReduce platform Datasets Graph n m Φ(G) roads-usa,97,7 9,66,67 55,859,80 roads-cal,890,85,8,87 6,85,58 livejournal,997,96, 68, twitter,65,0,68,65, mesh(s) S S(S ) R-MAT(S) S 6 S roads(s) S. 0 7 S time (s) Scalability R MAT(6) roads() the diameter depends on the size of the graph, controlled by S >. 0 machines

23 Diameter approximation: experiments We compare our algorithm (CLUSTER) with -stepping 0 5 Time CLUSTER stepping Rounds 0 5 CLUSTER stepping time (s) roads USA roads CAL mesh livejournal twitter R MAT() 0 0 roads USA roads CAL mesh livejournal twitter R MAT()

24 Diameter approximation: experiments Work Approximation 0 CLUSTER stepping.5 CLUSTER stepping roads USA roads CAL mesh livejournal twitter R MAT().0 roads USA roads CAL mesh livejournal twitter R MAT()

theoretical approximation factor Relate the approximation factor to the available resources

25 Conclusions Conclusions Approximation factor O(log n): much better in practice Number of rounds sublinear in the unweighted diameter Open source implementation Open problems Reduce the theoretical approximation factor Relate the approximation factor to the available resources Diameter approximation on directed graphs crono.dei.unipd.it/gradias/

arxiv: v3 [cs.dc] 6 Feb 2015

arxiv: v3 [cs.dc] 6 Feb 2015 Space and Time Efficient Parallel Graph Decomposition, Clustering, and Diameter Approximation Matteo Ceccarello Department of Information Engineering University of Padova Padova, Italy ceccarel@dei.unipd.it