Multiscale Approach for the Network Compression-friendly Ordering
Ilya Safro (Argonne National Laboratory) and Boris Temkin (Weizmann Institute of Science)
SIAM Parallel Processing for Scientific Computing 2012
Motivation

Networks can be:
- huge (tera-, peta-, exa-, zetta-, yotta-, ... bytes)
- structurally different at different resolutions
- collected in parallel, i.e., the data is mixed
- noisy, irregular, etc.

Challenges
- How to store the network efficiently?
- How to design extremely fast access to nodes and links?
- How to minimize the number of cache misses?
Problem: find a compressed representation of a network.
Network representation: compressed row format

Node | Sorted list of neighbors (with possible edge info)
1    | 2, 5, 6, 12, 18, 23, 103, ...
...  | ...
1584 | 1585, 1592, 1600

[KDD09, Chierichetti, Kumar, Lattanzi, Mitzenmacher, Panconesi, Raghavan]: given a sorted list of neighbours (x_1, x_2, x_3, ...), represent it by a list of differences (x_1, x_2 − x_1, x_3 − x_1, ...) or (x_1, x_2 − x_1, x_3 − x_2, ...).

Node | Gap-encoded list
1    | 1, 4, 5, 11, 17, 22, 102, ...
...  | ...
1584 | 1, 8, 16

... and then use a compression algorithm (such as γ-encoding). Example: Boldi-Vigna compression.
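The gap-plus-γ idea can be sketched in a few lines. The function names are illustrative, and the gap variant chosen (each neighbor stored as its offset from the node id) is the one that matches the table above; this is a toy sketch, not the actual Boldi-Vigna encoder.

```python
def gap_encode(node, neighbors):
    """Gap-encode a sorted adjacency list: store each neighbor as its
    offset from the node id (the variant that reproduces the table)."""
    return [x - node for x in neighbors]

def elias_gamma(n):
    """Elias gamma code of a positive integer n: (len - 1) zeros
    followed by the binary form of n. Small gaps get short codewords."""
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

gaps = gap_encode(1, [2, 5, 6, 12, 18, 23, 103])  # [1, 4, 5, 11, 17, 22, 102]
bits = "".join(elias_gamma(g) for g in gaps)
```

Any node ordering that shrinks the gaps shrinks the codewords, which is exactly what the arrangement problems on the following slides try to achieve.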
The Minimum Logarithmic Arrangement problem

Problem MLogA: minimize over all permutations π ∈ S(n)
    Σ_{ij∈E} lg |π(i) − π(j)|.

When we have additional information about link/node importance (or expected access frequency), we introduce importance coefficients w_ij and node volumes v_i and formulate the generalized version of MLogA:

Problem GMLogA:
    min_{π∈S(n)} c(G, x_π) = Σ_{ij∈E} w_ij lg |x_i − x_j|
    such that ∀i ∈ V: x_i = v_i/2 + Σ_{k: π(k)<π(i)} v_k.
The Minimum Logarithmic Arrangement problem

Goal: minimize the number of bits per link, c(G, x_π) / Σ_{ij∈E} w_ij.

MLogA is NP-hard [Chierichetti, Kumar, Lattanzi, Mitzenmacher, Panconesi, Raghavan]; the proof uses the inapproximability of MaxCut.

[Figure: a small 7-node graph for which the optimal solutions of MLogA and MinLA differ [CKLMPP09].]
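As a sanity check of the objective, a minimal sketch (with hypothetical helper names) that evaluates the MLogA cost and the bits-per-link of a given permutation:

```python
import math

def mloga_cost(edges, pi):
    """MLogA objective: sum of lg|pi(i) - pi(j)| over all edges,
    where pi maps a node to its position in the arrangement."""
    return sum(math.log2(abs(pi[i] - pi[j])) for i, j in edges)

def bits_per_link(edges, pi):
    """MLogA cost normalized by the number of (unit-weight) links."""
    return mloga_cost(edges, pi) / len(edges)

# A 4-cycle under the identity order: three edges at distance 1 cost
# nothing, the wrap-around edge (3, 0) pays lg 3.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
pi = {v: v for v in range(4)}
cost = mloga_cost(edges, pi)   # lg 3
```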
Main questions

Coarsening:
- distance metric between nodes
- fine-to-coarse projection operator

Uncoarsening:
- coarse-to-fine operator
- relaxation and refinement
- exact solution (at the coarsest level)

Safro, Temkin: Multiscale approach for the network compression-friendly ordering, 2011
Lindstrom: The Minimum Edge Product Linear Ordering Problem, 2011
AMG coarsening

P_{c←f}: fine(f)-to-coarse(c) interpolation operator
L_f: weighted Laplacian of G at level f

[Figure: fine-level vertices i, j, k, l and coarse-level vertices a, b; the interpolation weight of fine vertex i to coarse vertex a is w(i→a) / Σ_j w(i→j).]

Coarse operator (Galerkin): L_c = (P_{c←f})^T L_f P_{c←f}, i.e., the coarse edge weights are
    w_IJ = Σ_{l,k} P_{lI} w_{lk} P_{kJ}.
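The Galerkin triple product L_c = (P_{c←f})^T L_f P_{c←f} can be sketched directly. The dense-matrix representation and the tiny aggregation below are illustrative only; real AMG codes work on sparse matrices.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def coarsen(L_f, P):
    """Galerkin coarse operator L_c = P^T L_f P for an interpolation
    matrix P (rows: fine nodes, columns: coarse aggregates)."""
    return matmul(matmul(transpose(P), L_f), P)

# Path graph 0-1-2 (its Laplacian), aggregating fine nodes {0, 1}
# into one coarse node and {2} into another.
L_f = [[ 1, -1,  0],
       [-1,  2, -1],
       [ 0, -1,  1]]
P = [[1, 0],
     [1, 0],
     [0, 1]]
L_c = coarsen(L_f, P)   # [[1, -1], [-1, 1]]: a single coarse edge
```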
How to measure the connectivity? Examples of existing approaches:
- Shortest path
- All/some (weighted) indirect paths
- Spectral approaches
- Flow/network-capacity-based approaches
- Random-walk approaches: commute time, first-passage time, etc. (Fouss, Pirotte, Renders, Saerens, ...)
- Speed of convergence of the compatible relaxation from AMG (Brandt, Ron, Livne, ...)
- Probabilistic interpretation of a diffusion (Nadler, Lafon, Coifman, Kevrekidis, ...)
- Effective resistance of a graph (Ghosh, Boyd, Saberi, ...)
Stationary iterative relaxation

A relaxation process that shows which pairs of vertices tend to be more strongly connected than others:

1. ∀i ∈ V, set x_i = rand()
2. Repeat step 3 k times
3. ∀i ∈ V: x_i^{(k)} = (1 − ω) x_i^{(k−1)} + ω Σ_j w_ij x_j^{(k−1)} / Σ_j w_ij

Conjecture: if |x_i − x_j| > |x_u − x_v|, then the local connectivity between u and v is stronger than that between i and j. We call s_ij^{(k)} = |x_i − x_j| the algebraic distance between i and j after k iterations.
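A minimal sketch of this JOR-style iteration and the resulting algebraic distance (illustrative names; a single random start, whereas the extended definition later averages over several):

```python
import random

def jor_algebraic_distance(adj, k=20, omega=0.5, seed=0):
    """Run k sweeps of x_i <- (1-w)x_i + w * (sum_j w_ij x_j)/(sum_j w_ij)
    from one random initialization; return s(i, j) = |x_i - x_j|.
    adj: dict node -> {neighbor: weight}."""
    rng = random.Random(seed)
    x = {i: rng.random() for i in adj}
    for _ in range(k):
        x = {i: (1 - omega) * x[i]
                + omega * sum(w * x[j] for j, w in adj[i].items())
                        / sum(adj[i].values())
             for i in adj}
    return lambda i, j: abs(x[i] - x[j])

# Two triangles joined by one bridge edge (2-3): after relaxation,
# nodes inside a triangle sit much closer than nodes across the bridge.
adj = {0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
       3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1}}
s = jor_algebraic_distance(adj)
```

The within-triangle differences shrink geometrically with a small factor, while the slow mode across the bridge decays much more slowly, so s(0, 1) ends up far smaller than s(0, 4).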
Toy example: 20×40 mesh with diagonals

[Figure: the mesh; edge weights: red = 2, black = 1.]
Mesh 20×40 with diagonals: random 2D initialization [Figure]
Mesh 20×40 with diagonals: after 10 iterations of JOR [Figure]
Stationary iterative relaxation

Rewrite the iterative process as x^{(k+1)} = H x^{(k)}, where H is one of:
    H_GS  = (D − L)^{−1} U
    H_SOR = (D/ω − L)^{−1} ((1/ω − 1)D + U)
    H_JAC = D^{−1} (L + U)
    H_JOR = (D/ω)^{−1} ((1/ω − 1)D + L + U)

Definition: the extended p-normed algebraic distance between i and j after k iterations of x^{(k+1)} = H x^{(k)} over R random initializations is
    ρ_ij^{(k)} := ( Σ_{r=1}^{R} |x_i^{(k,r)} − x_j^{(k,r)}|^p )^{1/p}.
Stationary iterative relaxation

With the same iteration x^{(k+1)} = H x^{(k)}, the logarithmic variant of the distance used here is
    ρ_ij^{(k)} := ( Σ_{r=1}^{R} lg |x_i^{(k,r)} − x_j^{(k,r)}|^p )^{1/p}.
Applications: multilevel graph coarsening

Algorithmic component    | Classical AMG | Algebraic-distance-based AMG
Future volume            | edge weight   | algebraic distance
C-points selection       | edge weight   | algebraic distance
Interpolation operator   | edge weight   | weak edge filtering

Ron, Safro, Brandt: Relaxation-based coarsening and multiscale graph organization, SIAM Multiscale Modeling and Simulation, 2011
Chen, Safro: Algebraic distance on graphs, SISC, 2011
Safro, Sanders, Schulz: Advanced coarsening schemes for multilevel graph partitioning, 2012
Uncoarsening

[Figure: multilevel V-cycle over the hierarchy P_0, P_1, ..., P_k with solutions S_k, ..., S_1, S_0.]

A. Coarsening: defines the hierarchical structure (P_0, P_1, ..., P_k).
B. Exact solution at the coarsest level P_k.
C. Interpolation and relaxation: produce an initial solution of P_{i−1} from the solution S_i of P_i, and construct the final solution S_{i−1} of P_{i−1}.
Uncoarsening: minimizing the contribution of one node

[Figure: node i with already-placed neighbors a, b, c, d, e and edge weights w_ia, w_ic, w_ie, ...]

Let N_i be the set of i's neighbors with already assigned coordinates x_j. To minimize the local contribution of i to the total energy, we assign to it a coordinate x_i that minimizes
    Σ_{j∈N_i} w_ij lg |x_i − x_j|.   (1)
Restricting the candidates in (1) to the neighbor coordinates, i.e., x_i = x_j for some j ∈ N_i, we resolve this by setting x_i = x_t with
    t = arg min_{k∈N_i} Σ_{j∈N_i, j≠k} w_kj lg |x_k − x_j|.   (2)
Uncoarsening: minimizing the contribution of one node

Problem: arg min_{k∈N_i} Σ_{j∈N_i, j≠k} w_kj lg |x_k − x_j|. Trivial solution complexity: O(|N_i|²).

Our linear approach: look for a nearly minimal sum at the point of maximal density. How: the Parzen-window (kernel density estimation) method. The density at point x is estimated as
    d(x) = 1/(|N_i| h) Σ_{j∈N_i} K((x − x_j)/h),
instantiated here as
    d̂(x) = 1/(|N_i| h) Σ_{j∈N_i} w_ij 2^{−|x − x_j|/h},   (3)
where K is a kernel and h is a smoothing parameter.
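A sketch of density-based placement with the 2^{−|x − x_j|/h} kernel from (3). The function names are illustrative, and this naive version still scans all candidate coordinates; a truly linear pass would evaluate the density on a coarse grid or with a running sum.

```python
def densest_position(coords, weights, h=1.0):
    """Place a node at the neighbor coordinate of maximal estimated
    density d(x) = sum_j w_j * 2^(-|x - x_j| / h), a Parzen-window
    stand-in for minimizing sum_j w_j lg|x - x_j| exactly."""
    def density(x):
        return sum(w * 2.0 ** (-abs(x - xj) / h)
                   for xj, w in zip(coords, weights))
    return max(coords, key=density)

# Three neighbors clustered around 10.0 and an outlier at 50.0:
# the densest candidate is the cluster center.
best = densest_position([9.0, 10.0, 11.0, 50.0], [1, 1, 1, 1])  # 10.0
```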
Uncoarsening

- Interpolation is similar to AMG, x_f = P_{f←c} x_c:
  - fine seeds are projected from x_c
  - fine non-seeds are placed using kernel density estimation
  - the order is legalized by resolving overlaps
- Compatible relaxation: seeds are invariant, non-seeds are relaxed
- Gauss-Seidel relaxation: all nodes are relaxed
- Refinement: nearest-neighbor improvements
Refinement: window minimization pass

Find an ordering π of a window W that minimizes
    Σ_{ij∈W} w_ij lg |x_i − x_j| + Σ_{i∈W, j∉W} w_ij lg |x_i − x_j|
subject to x_i = v_i/2 + Σ_{k: π(k)<π(i)} v_k.
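A brute-force sketch of one window pass (illustrative names and data layout; a practical refinement would enumerate far fewer candidates than all |W|! permutations):

```python
import itertools
import math

def window_pass(order, volumes, edge_w, W):
    """Try every permutation of the consecutive window W inside `order`
    and keep the one minimizing sum of w_ij * lg|x_i - x_j| over edges
    touching W, with x_i = v_i/2 + total volume placed before node i."""
    lo = min(order.index(v) for v in W)
    rest = [v for v in order if v not in W]
    best, best_cost = None, float("inf")
    for perm in itertools.permutations(W):
        cand = rest[:lo] + list(perm) + rest[lo:]
        x, pos = {}, 0.0
        for v in cand:                  # node centers under this order
            x[v] = pos + volumes[v] / 2.0
            pos += volumes[v]
        cost = sum(w * math.log2(abs(x[i] - x[j]))
                   for (i, j), w in edge_w.items()
                   if i in W or j in W)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best

# The path 0-2-1-3 stored as order [0, 1, 2, 3]: permuting the
# window {1, 2} recovers the path order and zeroes the cost.
new_order = window_pass([0, 1, 2, 3], {v: 1 for v in range(4)},
                        {(0, 2): 1, (2, 1): 1, (1, 3): 1}, [1, 2])
```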
What are the most competitive algorithms today?

- Randomized ordering: sometimes better than the order produced by parallel network crawling (fast to obtain, bad for performance).
- Lexicographic ordering: network traversal in some order of neighbours, such as BFS and DFS (easy to compute, can be good for networks with excellent locality).
- Gray ordering: inspired by Gray coding, in which two successive vectors differ by exactly one bit (easy to compute, good for Web-like, i.e., good-locality, networks).
- Shingle ordering: brings nodes with similar neighborhoods together; uses the Jaccard coefficient J(A, B) = |A ∩ B| / |A ∪ B| to measure similarity (works well in preferential-attachment models, where the rich get richer).
- LayeredLPA: a label propagation algorithm similar in spirit to the algebraic distance (usually better than the previous methods).
- Spectral methods: based on the Fiedler vector.
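The similarity measure behind shingle ordering can be sketched in a couple of lines (illustrative function name; shingle ordering itself additionally hashes and sorts neighborhoods, which is not shown here):

```python
def jaccard(A, B):
    """Jaccard coefficient J(A, B) = |A & B| / |A | B| between two
    neighbor sets; J close to 1 means near-identical neighborhoods."""
    A, B = set(A), set(B)
    return len(A & B) / len(A | B) if A | B else 0.0

# Nodes whose neighborhoods overlap heavily score close to 1 and are
# placed next to each other by shingle-style orderings.
j = jaccard([1, 2, 3, 4], [2, 3, 4, 5])   # 3 / 5 = 0.6
```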
ms-gmloga vs min(rnd, lex, nat)

[Figure: ratios lg/min(nat, lex, rnd) and la/min(nat, lex, rnd) over 100 ordered graphs.]
ms-gmloga vs min(rnd, lex, nat)

[Figure: ratios lg/min(nat, lex, rnd) over 100 ordered graphs, for (a) directed and (b) undirected networks.]

Comparison of the fastest and the slowest versions of ms-gmloga. The notations no-n-n-lg and k25-n-n-lg correspond to the fast and slow versions of ms-gmloga, respectively.
ms-gmloga vs Gray/Shingle

[Figure: (a) Gray ordering vs ms-gmloga and (b) double shingle vs ms-gmloga; ratios ms-gmloga/gray and ms-gmloga/shingle for directed and undirected networks over 100 ordered graphs.]
Comparison of LayeredLPA vs ms-gmloga

[Figure: ratios ms-gmloga/layeredlpa for undirected networks over 100 ordered graphs.]
Scalability: time vs graph size

[Figure: lg(|V| + |E|) vs lg(running time in sec.).]
Heavy-tailed degree distribution

[Figure: ratio of compressed BPL to native BPL over networks with heavy-tailed degree distribution.]
Cache misses: ParMETIS, Graclus, hMETIS, power method, SVD

[Figure: cache-miss ratios GMLogA/native over networks with heavy-tailed degree distribution.]
Further compression: the Boldi-Vigna algorithm

[Figure: compression improvement vs order improvement; (a) initial ordering is natural, (b) initial ordering is Gray.]
MUSKETEER - Multiscale Entropic Network Generator

First release (next week)! http://www.mcs.anl.gov/~safro/musindex.html
Conclusions

- A linear-time method for network compression-friendly ordering.
- Computational results: (heavy-tailed) networks can be compressed; the number of cache misses can be reduced; running times can be improved.
- Open questions: directed graph coarsening; collective refinement of nodes; more sophisticated gap encodings.

Looking for a position. Thank you!