Efficient Processing of Large Graphs via Input Reduction

1 Efficient Processing of Large Graphs via Input Reduction
Amlan Kusum, Keval Vora, Rajiv Gupta, Iulian Neamtiu
HPDC 2016, Kyoto, Japan, 04 June 2016

7 Graph Processing
- Iterative graph algorithms: vertices are processed repeatedly until convergence
- Highly parallel execution
- Challenging due to ever-growing graph sizes
- Convergence speed depends on the initial vertex values
How can we find better initializations?
[Figure: example graph with vertices v0 to v8 and t, whose values change across iterations]

9 Key Idea
- Compute initial values using a smaller signature of the original graph
- Generate the smaller graph using lightweight input reduction techniques
[Figure: workflow. Input Reduction turns the Original Graph into a Reduced Graph, which is processed first (Phase 1); the Original Graph is then processed (Phase 2)]

10 Key Idea
The approach pays off when time(input reduction) + time(Phase 1) + time(Phase 2) < time(original).
[Figure: input reduction workflow]
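
The break-even condition can be restated as a minimal Python sketch. The function names and the timings are hypothetical. They only encode the inequality and the speedup that follows from it.

```python
def reduction_pays_off(t_reduce: float, t_phase1: float, t_phase2: float,
                       t_original: float) -> bool:
    """True when reduction + Phase 1 + Phase 2 finishes before the original run."""
    return t_reduce + t_phase1 + t_phase2 < t_original


def pipeline_speedup(t_reduce: float, t_phase1: float, t_phase2: float,
                     t_original: float) -> float:
    """Speedup of the two-phase pipeline over processing the original graph directly."""
    return t_original / (t_reduce + t_phase1 + t_phase2)


# Hypothetical timings in seconds, for illustration only (not measured results).
print(reduction_pays_off(2.0, 10.0, 25.0, 60.0))          # True
print(round(pipeline_speedup(2.0, 10.0, 25.0, 60.0), 2))  # 1.62
```

A cheap Phase 1 is not enough on its own. If the values it produces are poor starting points, Phase 2 costs about as much as the original run, and the reduction and Phase 1 time become pure overhead.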

11 Outline
- Input Reduction
- Vertex Transformations
- Correctness of Results
- Evaluation
- Conclusion

12 Input Reduction
- Must be lightweight and general
- Prior techniques:
  - multilevel graph partitioning [SC 9, SC 0]
  - matching-based contraction [ICPP 9, JPDC 98]
  - pruning based on edge costs affecting paths [ICDM 0]
  - gate graph for the shortest paths problem [ICDM ]
- Approach: develop vertex-level transformations, which are easily parallelized in vertex-centric graph processing systems

13 Vertex Transformations
- Maintain the structural integrity of the graph: preserve overall connectivity
- Lightweight: local and non-interfering

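15 Vertex Transformations
For each $v \in V$, where $E'$ is the edge set of the reduced graph (initially $E$):
- if $\mathit{indegree}(v) = 0$, apply T (drop the out-edges of $v$): $E' \leftarrow E' \setminus \mathit{outedges}(v)$
- if $\mathit{outdegree}(v) = 0$, apply T (drop the in-edges of $v$): $E' \leftarrow E' \setminus \mathit{inedges}(v)$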
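
The two degree-based rules can be sketched in Python, with the graph stored as a set of (source, target) pairs. Two choices here are assumptions rather than details from the slide. First, degrees are read from the original edge set E, not from E' as it shrinks, which keeps each vertex's decision local and order-independent. Second, the hypothetical `protected` parameter exempts vertices such as an SSSP source, which typically has in-degree 0 and must keep its out-edges.

```python
from typing import AbstractSet, Iterable, Set, Tuple

Edge = Tuple[str, str]


def drop_degree_zero_edges(vertices: Iterable[str], edges: Set[Edge],
                           protected: AbstractSet[str] = frozenset()) -> Set[Edge]:
    """Return E' after applying the in-degree-0 and out-degree-0 rules to E.

    Degrees are read from the original edge set, so each vertex's decision is
    local and does not depend on the order in which vertices are visited.
    `protected` is a hypothetical extension: vertices listed there (e.g. the
    SSSP source, which usually has in-degree 0) keep all their edges.
    """
    indeg = {v: 0 for v in vertices}
    outdeg = dict.fromkeys(indeg, 0)
    for u, w in edges:
        outdeg[u] += 1
        indeg[w] += 1

    reduced = set(edges)  # E' starts as a copy of E
    for v in indeg:
        if v in protected:
            continue
        if indeg[v] == 0:   # E' <- E' \ outedges(v)
            reduced -= {(u, w) for (u, w) in edges if u == v}
        if outdeg[v] == 0:  # E' <- E' \ inedges(v)
            reduced -= {(u, w) for (u, w) in edges if w == v}
    return reduced


edges = {("s", "a"), ("a", "b"), ("x", "b"), ("b", "c"), ("c", "y")}
print(sorted(drop_degree_zero_edges(["s", "a", "b", "c", "x", "y"], edges, {"s"})))
# [('a', 'b'), ('b', 'c'), ('s', 'a')]
```

Because degrees come from E, the rules do not cascade within one pass. After c → y is dropped, c has out-degree 0 in E' but keeps its in-edge. Repeating the pass until E' stops changing would let the reduction cascade.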

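16 Vertex Transformations
- if $\mathit{indegree}(v) = \mathit{outdegree}(v) = 1$, apply T (bypass $v$): $E' \leftarrow (E' \setminus \{u \rightarrow v, v \rightarrow w\}) \cup \{u \rightarrow w\}$, where $\{u \rightarrow v, v \rightarrow w\} \subseteq E'$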
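
The bypass rule can be sketched on a weighted edge map. The slide does not say how weights are handled. Two choices here are assumptions that suit path problems such as SSSP: the two bypassed weights are summed, so u → w costs the same as u → v → w, and the cheaper weight is kept when u → w already exists. Unlike the degree-based rules, vertices are bypassed one at a time against the current E', so a chain collapses step by step. The function name and the `protected` parameter are hypothetical.

```python
from typing import AbstractSet, Dict, Iterable, Tuple

WeightedEdges = Dict[Tuple[str, str], float]


def bypass_vertices(vertices: Iterable[str], edges: WeightedEdges,
                    protected: AbstractSet[str] = frozenset()) -> WeightedEdges:
    """Replace u -> v -> w by u -> w for every v with in-degree = out-degree = 1.

    Vertices are bypassed one at a time against the current E', so chains such
    as a -> b -> c -> d collapse correctly. The new edge weight is the sum of the
    two bypassed weights (an SSSP-oriented assumption); if u -> w already
    exists, the cheaper weight is kept.
    """
    reduced = dict(edges)  # E' starts as a copy of E
    for v in vertices:
        if v in protected:
            continue
        ins = [u for (u, x) in reduced if x == v]
        outs = [w for (x, w) in reduced if x == v]
        if len(ins) != 1 or len(outs) != 1:
            continue
        u, w = ins[0], outs[0]
        if v in (u, w) or u == w:
            continue  # skip self-loops and 2-cycles (assumption)
        weight = reduced.pop((u, v)) + reduced.pop((v, w))
        reduced[(u, w)] = min(weight, reduced.get((u, w), float("inf")))
    return reduced


chain = {("s", "a"): 1.0, ("a", "b"): 2.0, ("b", "c"): 3.0, ("s", "c"): 10.0}
print(bypass_vertices(["s", "a", "b", "c"], chain, protected={"s"}))
# {('s', 'c'): 6.0}
```

In the usage example, a and then b are bypassed. The chain s → a → b → c collapses into a single edge whose weight, 6, matches the original path cost, and that edge replaces the more expensive existing s → c edge of weight 10.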

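17 Vertex Transformations
For each $v \in V$:
- if there exists $w \in \mathit{outneighbors}(v)$ such that $w$ is unchanged and $\mathit{outneighbors}(v) \cap \mathit{inneighbors}(w) \neq \emptyset$, apply T (drop $v \rightarrow w$): $E' \leftarrow E' \setminus \{(v \rightarrow w)\}$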
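
The edge-dropping rule can be sketched on an unweighted edge set. The extracted condition is partly garbled. This code assumes it means outneighbors(v) ∩ inneighbors(w) ≠ ∅, that is, w is also reachable through another out-neighbor x of v. It also reads "w is unchanged" as "no in-edge of w has been dropped yet in this pass". Both readings, and the function name, are assumptions.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def drop_redundant_edges(edges: Set[Edge]) -> Set[Edge]:
    """Drop v -> w when some other out-neighbor x of v also has an edge x -> w.

    "w is unchanged" is read here (an assumption) as: no in-edge of w has been
    dropped yet in this pass. Marking w after its first dropped in-edge means
    the alternative edge x -> w can never be removed later in the same pass.
    """
    reduced = set(edges)
    changed: Set[str] = set()
    for v, w in sorted(edges):  # fixed visiting order, for determinism
        if w in changed or (v, w) not in reduced:
            continue
        out_v = {b for (a, b) in reduced if a == v and b != w}
        in_w = {a for (a, b) in reduced if b == w and a != v}
        if out_v & in_w:  # outneighbors(v) & inneighbors(w) is non-empty
            reduced.discard((v, w))
            changed.add(w)
    return reduced


edges = {("s", "a"), ("s", "b"), ("a", "b"), ("b", "c"), ("a", "c")}
print(sorted(drop_redundant_edges(edges)))
# [('a', 'b'), ('b', 'c'), ('s', 'a')]
```

For SSSP this rule can lengthen paths. In the example, b is now two hops from s instead of one. The values computed on the reduced graph are therefore approximations, which Phase 2 corrects.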

19 Other Details
- More vertex transformations; some relax structural integrity
- Order of transformations
- Unified graph reduction algorithm

20 Processing Workflow
[Figure: input reduction workflow (Input Reduction, Reduced Graph processed in Phase 1, Original Graph processed in Phase 2) on an example graph with vertices s, a, b, c, d, e, f, g]

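27 Input Reduction
[Figure: the example graph (vertices s, a, b, c, d, e, f, g) is reduced step by step, removing e, then g, then f, which leaves a reduced graph with vertices s, a, b, c, d]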

30 Processing Reduced Graph
- Run the original iterative algorithm on the reduced graph
[Figure: successive iterations on the reduced graph (vertices s, a, b, c, d) until the values converge]
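
Phase 1 runs the unmodified algorithm on the reduced graph. This sketch uses SSSP as the running example, with a Bellman-Ford-style sweep in Python standing in for the vertex-centric engine (the evaluation uses Galois). The graph, the weights and the function name are hypothetical.

```python
import math
from typing import Dict, Iterable, Optional, Tuple

WeightedEdges = Dict[Tuple[str, str], float]


def iterative_sssp(vertices: Iterable[str], edges: WeightedEdges, source: str,
                   init: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Bellman-Ford-style SSSP: relax every edge, round after round, until no
    distance changes. `init` allows starting from non-default values."""
    dist = {v: math.inf for v in vertices}
    if init is not None:
        dist.update(init)
    dist[source] = 0.0
    changed = True
    while changed:
        changed = False
        for (u, w), weight in edges.items():
            if dist[u] + weight < dist[w]:  # min aggregation over in-edges
                dist[w] = dist[u] + weight
                changed = True
    return dist


# Hypothetical reduced graph over the surviving vertices s, a, b, c, d.
reduced_edges: WeightedEdges = {
    ("s", "a"): 1.0, ("s", "b"): 4.0, ("a", "b"): 2.0,
    ("b", "c"): 1.0, ("c", "d"): 3.0, ("a", "d"): 9.0,
}
phase1 = iterative_sssp(list("sabcd"), reduced_edges, "s")
print(phase1)  # {'s': 0.0, 'a': 1.0, 'b': 3.0, 'c': 4.0, 'd': 7.0}
```

Values only decrease under min aggregation, so the loop stops once a full sweep changes nothing.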

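33 Mapping Results
- Vertices that remain in the reduced graph carry over their Phase 1 values; vertices missing from it get default values
[Figure: values computed for s, a, b, c, d are mapped onto the original graph; e, f, g take default values]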
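
Mapping results amounts to building the Phase 2 initial values. This Python sketch assumes the Phase 1 values are kept in a dictionary keyed by vertex and that the algorithm's default is infinity, as it is for SSSP. The function name and the values are hypothetical.

```python
import math
from typing import Dict, Iterable


def map_results(original_vertices: Iterable[str], phase1_values: Dict[str, float],
                default: float = math.inf) -> Dict[str, float]:
    """Initial values for Phase 2 on the original graph.

    Vertices that survived reduction keep their Phase 1 value; vertices missing
    from the reduced graph get the algorithm's default (infinity for SSSP).
    """
    return {v: phase1_values.get(v, default) for v in original_vertices}


phase1 = {"s": 0.0, "a": 1.0, "b": 3.0, "c": 4.0, "d": 7.0}  # hypothetical
init = map_results(list("sabcdefg"), phase1)
print(init["b"], init["e"])  # 3.0 inf
```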

34 Processing Original Graph
[Figure: Phase 2 processes the original graph (vertices s, a, b, c, d, e, f, g), starting from the mapped values]
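
A small experiment illustrates why Phase 2 benefits from the mapped values. The sketch below uses SSSP with a hypothetical graph, hypothetical mapped values and Bellman-Ford-style sweeps. It runs the same algorithm on the original graph twice, once from default values and once from the mapped values, and counts the sweeps in which some distance changed. The counts depend on the edge order chosen here and only illustrate the effect. They are not measurements from the slides.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

WeightedEdges = Dict[Tuple[str, str], float]


def iterative_sssp(vertices: Sequence[str], edges: WeightedEdges, source: str,
                   init: Optional[Dict[str, float]] = None) -> Tuple[Dict[str, float], int]:
    """Relax every edge per round until no value changes; also count changing rounds."""
    dist = {v: math.inf for v in vertices}
    if init is not None:
        dist.update(init)
    dist[source] = 0.0
    rounds = 0
    while True:
        changed = False
        for (u, w), weight in edges.items():
            if dist[u] + weight < dist[w]:  # min aggregation
                dist[w] = dist[u] + weight
                changed = True
        if not changed:
            return dist, rounds
        rounds += 1


# Hypothetical original graph; edges are listed "backwards" on purpose so that a
# cold start needs several rounds to push values from s to the far vertices.
vertices = list("sabcdefg")
edges: WeightedEdges = {
    ("d", "e"): 2.0, ("c", "d"): 3.0, ("b", "c"): 1.0, ("a", "b"): 2.0,
    ("s", "a"): 1.0, ("s", "b"): 4.0, ("a", "d"): 9.0, ("c", "f"): 1.0,
    ("g", "a"): 1.0,
}
# Mapped Phase 1 values (hypothetical): e, f, g were missing from the reduced graph.
mapped = {"s": 0.0, "a": 1.0, "b": 3.0, "c": 4.0, "d": 7.0}

cold_dist, cold_rounds = iterative_sssp(vertices, edges, "s")
warm_dist, warm_rounds = iterative_sssp(vertices, edges, "s", init=mapped)
print(cold_dist == warm_dist, cold_rounds, warm_rounds)  # True 5 1
```

Both runs reach identical distances. The warm start only has to fill in e and f, which were missing from the reduced graph.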

37 Correctness: SSSP Example

42 Correctness
- Transformation properties
  - defined at the level of vertices, edges and components
  - allow developing and reasoning about new transformations
- Algorithm behavior can be reasoned about
  - Phase 2 initializations
  - properties of the aggregation function
- Correctness argued for the algorithms used, both accurate and approximate
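
For SSSP the aggregation function is min, and a small experiment shows why it tolerates approximate starting values. As long as every initial value over-estimates the true distance and the source starts at 0, relaxation drives the values down to the same fixpoint. Whether reduced-graph values meet this condition depends on the transformations. Removing edges can only lengthen paths, and a bypass that sums the two weights preserves path costs (an assumption about how weights are handled). Missing vertices default to infinity, which is also an over-estimate. The graph, the seed and the helper name below are hypothetical.

```python
import math
import random
from typing import Dict, Tuple

WeightedEdges = Dict[Tuple[str, str], float]


def relax_to_fixpoint(init: Dict[str, float], edges: WeightedEdges) -> Dict[str, float]:
    """Apply min-aggregation relaxation (as in SSSP) until no value changes."""
    dist = dict(init)
    changed = True
    while changed:
        changed = False
        for (u, w), weight in edges.items():
            if dist[u] + weight < dist[w]:
                dist[w] = dist[u] + weight
                changed = True
    return dist


edges: WeightedEdges = {("s", "a"): 1.0, ("s", "b"): 4.0, ("a", "b"): 2.0,
                        ("b", "c"): 1.0, ("c", "d"): 3.0}
vertices = ["s", "a", "b", "c", "d"]
cold = relax_to_fixpoint({v: 0.0 if v == "s" else math.inf for v in vertices}, edges)

rng = random.Random(0)  # fixed seed so the check is reproducible
for _ in range(100):
    # Over-estimate every true distance by a random amount (possibly infinity),
    # keep the source at 0, and check that relaxation reaches the same fixpoint.
    warm = {v: cold[v] + rng.choice([0.0, rng.uniform(0.0, 10.0), math.inf])
            for v in vertices}
    warm["s"] = 0.0
    assert relax_to_fixpoint(warm, edges) == cold
print(cold)  # {'s': 0.0, 'a': 1.0, 'b': 3.0, 'c': 4.0, 'd': 7.0}
```

An initial value below the true distance would never be corrected by min aggregation. That is why the direction of the approximation, and not just its size, matters for this argument.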

43 Evaluation
- Techniques are independent of frameworks and processing environments
- Incorporated in Galois [PLDI ]
- Single machine: 4- core, GB RAM
- Benchmarks: PR, SSSP, SSWP, CC, GC, CD
- 4 input graphs: Friendster (|E| = .B), Twitter (|E| = .B), UKDomain (|E| = 9M), RMAT- 4 (|E| = 8M)

47 [Figure: execution time broken down into Reduction, Phase 1 (Reduced Graph) and Phase 2 (Original Graph)]

48 Execution Time
- Speedups over parallel versions
- Speedups increase as ERP decreases, up to an extent: .x - .7x for 7% - 0%
- Very low ERP leads to structural dissimilarity
[Figure: normalized execution time (Reduction, Phase 1, Phase 2) of SSSP, SSWP, PR, GC, CC and CD across ERP settings, including Friendster (|E| = .B)]

49 Input Reduction
- Transformations are local, and therefore parallelizable
- Higher reduction requires more work
[Figure: speedup of input reduction vs. number of threads, and normalized reduction time across ERP settings, on Friendster (|E| = .B)]
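
Locality is what makes the reduction parallel. Each vertex decides its own drops from its own adjacency, so the decisions can run concurrently and be merged afterwards. This Python sketch applies that structure to the degree-based rules using `concurrent.futures.ThreadPoolExecutor`. Because of the GIL, it shows the structure (independent per-vertex decisions followed by one merge) rather than a real speedup. A native vertex-centric system would run the per-vertex step truly in parallel. The function names and the `protected` parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import AbstractSet, Dict, List, Set, Tuple

Edge = Tuple[str, str]


def local_drops(v: str, in_nbrs: Dict[str, List[str]], out_nbrs: Dict[str, List[str]],
                protected: AbstractSet[str]) -> Set[Edge]:
    """Edges that vertex v asks to drop, decided only from v's own adjacency lists."""
    if v in protected:
        return set()
    drops: Set[Edge] = set()
    if not in_nbrs[v]:   # in-degree 0: drop v's out-edges
        drops.update((v, w) for w in out_nbrs[v])
    if not out_nbrs[v]:  # out-degree 0: drop v's in-edges
        drops.update((u, v) for u in in_nbrs[v])
    return drops


def parallel_reduce(vertices: List[str], edges: Set[Edge],
                    protected: AbstractSet[str] = frozenset(), workers: int = 4) -> Set[Edge]:
    in_nbrs: Dict[str, List[str]] = {v: [] for v in vertices}
    out_nbrs: Dict[str, List[str]] = {v: [] for v in vertices}
    for u, w in edges:
        out_nbrs[u].append(w)
        in_nbrs[w].append(u)
    # Each task only reads the (unchanging) original adjacency lists, so the
    # per-vertex decisions can run concurrently and in any order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_vertex = list(pool.map(lambda v: local_drops(v, in_nbrs, out_nbrs, protected),
                                   vertices))
    return edges - set().union(*per_vertex)


edges = {("s", "a"), ("a", "b"), ("x", "b"), ("b", "c"), ("c", "y")}
print(sorted(parallel_reduce(["s", "a", "b", "c", "x", "y"], edges, protected={"s"})))
# [('a', 'b'), ('b', 'c'), ('s', 'a')]
```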

50 Memory Overhead
- Comes from tracking dissimilar elements: newly added vertices and edges
[Figure: memory overhead (%) across ERP settings on Friendster (|E| = .B)]

51 Community Detection
[Figure: community detection on the reduced graph vs. the original graph; accuracy plotted against execution time (sec) for the baseline and different ERP settings on Friendster (|E| = .B)]

52 More Results
- Contribution of individual transformations: some transformations are more useful than others, and different graphs benefit from different transformations
- Improvement in scalability
- Results for all inputs

53 Conclusion
- Input reduction using transformations that are lightweight, parallelizable and general
- Correctness reasoned about using fine-grained transformation properties
- Achieves .-.4x speedups

54 Thanks GRASP