Multiscale Approach for the Network Compression-friendly Ordering

Similar documents
ALGEBRAIC DISTANCE ON GRAPHS

Solving PDEs with Multigrid Methods p.1

Algebraic Multigrid as Solvers and as Preconditioner

Spectral element agglomerate AMGe

Lecture 13 Spectral Graph Algorithms

New Multigrid Solver Advances in TOPS

Bootstrap AMG. Kailai Xu. July 12, Stanford University

Aspects of Multigrid

Adaptive algebraic multigrid methods in lattice computations

1 Matrix notation and preliminaries from spectral graph theory

Entropy for Sparse Random Graphs With Vertex-Names

Diffusion/Inference geometries of data features, situational awareness and visualization. Ronald R Coifman Mathematics Yale University

Spectral Graph Theory and its Applications. Daniel A. Spielman Dept. of Computer Science Program in Applied Mathematics Yale University

Communities, Spectral Clustering, and Random Walks

Using an Auction Algorithm in AMG based on Maximum Weighted Matching in Matrix Graphs

A greedy strategy for coarse-grid selection

1 Matrix notation and preliminaries from spectral graph theory

Spectral Clustering. Guokun Lai 2016/10

Uniform Convergence of a Multilevel Energy-based Quantization Scheme

Background. C. T. Kelley, NC State University, NCSU, Spring / 58

AMG for a Peta-scale Navier Stokes Code

Global vs. Multiscale Approaches

Lecture: Local Spectral Methods (2 of 4) 19 Computing spectral ranking with the push procedure

Stabilization and Acceleration of Algebraic Multigrid Method

Drawing Large Graphs by Multilevel Maxent-Stress Optimization

Lecture 24: Element-wise Sampling of Graphs and Linear Equation Solving. 22 Element-wise Sampling of Graphs and Linear Equation Solving

Preface to the Second Edition. Preface to the First Edition

Kasetsart University Workshop. Multigrid methods: An introduction

Markov Chains and Web Ranking: a Multilevel Adaptive Aggregation Method

Data Analysis and Manifold Learning Lecture 7: Spectral Clustering

Graph Metrics and Dimension Reduction

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering

Harmonic Analysis and Geometries of Digital Data Bases

MLCC Clustering. Lorenzo Rosasco UNIGE-MIT-IIT

Spectral Algorithms I. Slides based on Spectral Mesh Processing Siggraph 2010 course

Numerical Programming I (for CSE)

Chapter 7 Iterative Techniques in Matrix Algebra

Data Mining and Analysis: Fundamental Concepts and Algorithms

OPERATOR-BASED INTERPOLATION FOR BOOTSTRAP ALGEBRAIC MULTIGRID

IMPLEMENTATION OF A PARALLEL AMG SOLVER

Algorithms for Data Science: Lecture on Finding Similar Items

An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84

Convex Optimization of Graph Laplacian Eigenvalues

Clustering. SVD and NMF

Beyond Scalar Affinities for Network Analysis or Vector Diffusion Maps and the Connection Laplacian

Parallel Iterative Methods for Sparse Linear Systems. H. Martin Bücker Lehrstuhl für Hochleistungsrechnen

EXAMPLES OF CLASSICAL ITERATIVE METHODS

Introduction to Scientific Computing

Robust solution of Poisson-like problems with aggregation-based AMG

Algebraic Multigrid (AMG) for saddle point systems from meshfree discretizations

R ij = 2. Using all of these facts together, you can solve problem number 9.

CME342 Parallel Methods in Numerical Analysis. Matrix Computation: Iterative Methods II. Sparse Matrix-vector Multiplication.

Laplacian Filters. Sobel Filters.

Fokker-Planck Equation on Graph with Finite Vertices

Conditioning of the Entries in the Stationary Vector of a Google-Type Matrix. Steve Kirkland University of Regina

On nonlinear adaptivity with heterogeneity

ADAPTIVE ALGEBRAIC MULTIGRID

Random Surfing on Multipartite Graphs

Self-Tuning Semantic Image Segmentation

A Generalized Eigensolver Based on Smoothed Aggregation (GES-SA) for Initializing Smoothed Aggregation Multigrid (SA)

LECTURE NOTE #11 PROF. ALAN YUILLE

Mini-project in scientific computing

THE HIDDEN CONVEXITY OF SPECTRAL CLUSTERING

Compressed Sensing and Linear Codes over Real Numbers

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University

Iterative Methods for Solving A x = b

Convex Optimization of Graph Laplacian Eigenvalues

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)

6. Iterative Methods for Linear Systems. The stepwise approach to the solution...

Machine Learning for Data Science (CS4786) Lecture 11

Introduction. Marcel Radermacher Algorithmen zur Visualisierung von Graphen

An Algebraic Multigrid Method for Eigenvalue Problems

Robust and Adaptive Multigrid Methods: comparing structured and algebraic approaches

9. Iterative Methods for Large Linear Systems

Today s class. Linear Algebraic Equations LU Decomposition. Numerical Methods, Fall 2011 Lecture 8. Prof. Jinbo Bi CSE, UConn

Adaptive Multigrid for QCD. Lattice University of Regensburg

The Generalized Haar-Walsh Transform (GHWT) for Data Analysis on Graphs and Networks

Lecture: Some Practical Considerations (3 of 4)

Spectral Clustering. Zitao Liu

Learning on Graphs and Manifolds. CMPSCI 689 Sridhar Mahadevan U.Mass Amherst

MULTILEVEL ADAPTIVE AGGREGATION FOR MARKOV CHAINS, WITH APPLICATION TO WEB RANKING

Dedicated to J. Brahms Symphony No. 1 in C minor, Op. 68

Algorithms for Graph Visualization Force-Directed Algorithms

Von Neumann Analysis of Jacobi and Gauss-Seidel Iterations

INTRODUCTION TO MULTIGRID METHODS

Parallel Numerics, WT 2016/ Iterative Methods for Sparse Linear Systems of Equations. page 1 of 1

Graph and Controller Design for Disturbance Attenuation in Consensus Networks

Math 471 (Numerical methods) Chapter 3 (second half). System of equations

Algorithms and Data Structures

Algebraic Representation of Networks

Review: From problem to parallel algorithm

Using Local Spectral Methods in Theory and in Practice

Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec

Non-Equidistant Particle-In-Cell for Ion Thruster Plumes

Lecture 16 Methods for System of Linear Equations (Linear Systems) Songting Luo. Department of Mathematics Iowa State University

The Removal of Critical Slowing Down. Lattice College of William and Mary

Overlapping Communities

A short course on: Preconditioned Krylov subspace methods. Yousef Saad University of Minnesota Dept. of Computer Science and Engineering

Faster quantum algorithm for evaluating game trees

Transcription:

Multiscale Approach for the Network Compression-friendly Ordering
Ilya Safro (Argonne National Laboratory) and Boris Temkin (Weizmann Institute of Science)
SIAM Parallel Processing for Scientific Computing 2012

Motivation. Networks can be:
- huge (tera-, peta-, exa-, zetta-, yotta-, ... bytes)
- structurally different at different resolutions
- collected in parallel, i.e., the data is mixed
- noisy, irregular, etc.

Challenges:
- How to store the network efficiently?
- How to design extremely fast access to nodes and links?
- How to minimize the number of cache misses?

Problem. Find a compressed representation of a network.

Network representation: compressed row format

Node | Sorted list of neighbors (with possible edge info)
1    | 2, 5, 6, 12, 18, 23, 103
...  | ...
1584 | 1585, 1592, 1600

[KDD'09 Chierichetti, Kumar, Lattanzi, Mitzenmacher, Panconesi, Raghavan]: given a sorted list of neighbours (x_1, x_2, x_3, ...), represent it by a list of differences (x_1, x_2 − x_1, x_3 − x_1, ...) or (x_1, x_2 − x_1, x_3 − x_2, ...).

Node | Gap-encoded neighbor list
1    | 1, 4, 5, 11, 17, 22, 102
...  | ...
1584 | 1, 8, 16

... and then use a compression algorithm (such as γ-encoding). Example: Boldi-Vigna compression.
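To make the gap-plus-γ step concrete, here is a minimal Python sketch. The function names are mine, not from any of the cited systems, and it assumes, as in the table above, that every neighbor id exceeds the node id, so all gaps are positive.

```python
def gap_encode(node, neighbors):
    # Each sorted neighbor stored relative to the node id, matching the
    # table above (node 1: [2, 5, 6, ...] -> [1, 4, 5, ...]).
    return [x - node for x in neighbors]

def elias_gamma(n):
    # Gamma code of a positive integer: (len - 1) zeros, then n in binary.
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

row = gap_encode(1, [2, 5, 6, 12, 18, 23, 103])
print(row)                                    # [1, 4, 5, 11, 17, 22, 102]
print("".join(elias_gamma(g) for g in row))   # the row as one gamma bitstream
```

Small gaps get short codewords, which is why orderings that place neighbors close together compress well.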

The Minimum Logarithmic Arrangement problem

Problem MLogA: minimize over all permutations π ∈ S(n)
  Σ_{ij ∈ E} lg|π(i) − π(j)|.

When we have additional information about link/node importance (or its expected access frequency), we introduce importance coefficients w_ij and node volumes v_i and formulate the generalized version of MLogA:

Problem GMLogA: min_{π ∈ S(n)} c(G, x_π) = Σ_{ij ∈ E} w_ij lg|x_i − x_j|
  such that x_i = v_i/2 + Σ_{k: π(k) < π(i)} v_k.
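Fixing the notation with a small sketch may help; the following Python (names are mine, not the authors' code) computes the positions x_π from an ordering and node volumes and then the objective c(G, x_π).

```python
import math

def positions(order, volume):
    # x_i = v_i / 2 + total volume of the nodes placed before i.
    x, acc = {}, 0.0
    for i in order:
        x[i] = volume[i] / 2.0 + acc
        acc += volume[i]
    return x

def gmloga_cost(edges, x):
    # c(G, x_pi) = sum over edges ij of w_ij * lg|x_i - x_j|.
    return sum(w * math.log2(abs(x[i] - x[j])) for i, j, w in edges)

# Toy usage: a path 0-1-2, unit weights and volumes; each gap is 1, so cost 0.
x = positions([0, 1, 2], {0: 1.0, 1: 1.0, 2: 1.0})
print(gmloga_cost([(0, 1, 1.0), (1, 2, 1.0)], x))  # 0.0
```

With unit volumes the positions reduce to π(i) up to a constant shift, so GMLogA contains MLogA as a special case.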

The Minimum Logarithmic Arrangement problem

Goal: minimize the number of bits per link: c(G, x_π) / Σ_{ij ∈ E} w_ij.

MLogA is NP-hard [Chierichetti, Kumar, Lattanzi, Mitzenmacher, Panconesi, Raghavan]; the proof uses the inapproximability of MaxCut.

[Figure: a small 7-node example graph for which the solutions of MLogA and MinLA differ [CKLMPP09].]

Main questions

Coarsening: distance metric between nodes; fine-to-coarse projection operator.
Uncoarsening: coarse-to-fine operator; relaxation and refinement; exact solution at the coarsest level.

Safro, Temkin, Multiscale approach for the network compression-friendly ordering, 2011
Lindstrom, The Minimum Edge Product Linear Ordering Problem, 2011

AMG coarsening

I_f^c: fine (f) to coarse (c) interpolation operator.
L^f: weighted Laplacian of G at level f.
Galerkin coarsening: L^c ← (I_f^c)^T L^f I_f^c, with coarse edge weights w_IJ = Σ_{l,k} I_{lI} w_{lk} I_{kJ}.

[Figure: fine-level vertices (i, j, k, l, ...) aggregated into coarse-level vertices (a, b, ...), with interpolation weights proportional to w(i,j) / Σ_j w(i,j).]
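The Galerkin triple product is compact in code. A minimal SciPy sketch, under the assumption that an interpolation matrix P is already given (how to build P is exactly the coarsening question of the next slides):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import laplacian

def galerkin_coarsen(L_f, P):
    # Coarse weighted Laplacian L_c = P^T L_f P; its off-diagonal entries
    # are (minus) the coarse edge weights w_IJ.
    return (P.T @ L_f @ P).tocsr()

# Toy usage: a 4-node path graph aggregated into 2 coarse nodes.
W = sp.csr_matrix(np.array([[0, 1, 0, 0],
                            [1, 0, 1, 0],
                            [0, 1, 0, 1],
                            [0, 0, 1, 0]], dtype=float))
P = sp.csr_matrix(np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float))
print(galerkin_coarsen(laplacian(W), P).toarray())  # [[ 1. -1.] [-1.  1.]]
```

The two aggregates end up joined by a single coarse edge of weight 1, as expected for a path cut in the middle.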

How to measure the connectivity? Examples of existing approaches:
- Shortest path
- All/some (weighted) indirect paths
- Spectral approaches
- Flow (network capacity) based approaches
- Random-walk approaches: commute time, first-passage time, etc. (Fouss, Pirotte, Renders, Saerens, ...)
- Speed of convergence of the compatible relaxation from AMG (Brandt, Ron, Livne, ...)
- Probabilistic interpretation of a diffusion (Nadler, Lafon, Coifman, Kevrekidis, ...)
- Effective resistance of a graph (Ghosh, Boyd, Saberi, ...)

Stationary iterative relaxation

A relaxation process that shows which pairs of vertices tend to be more strongly connected than others:
1. For all i ∈ V, define x_i = rand().
2. Repeat step 3 k times.
3. For all i ∈ V: x_i^(k) = (1 − ω) x_i^(k−1) + ω Σ_j w_ij x_j^(k−1) / Σ_j w_ij.

Conjecture. If |x_i − x_j| > |x_u − x_v|, then the local connectivity between u and v is stronger than that between i and j. We call s_ij^(k) = |x_i − x_j| the algebraic distance between i and j after k iterations.
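The three steps translate directly to NumPy. A bare sketch (names are mine; it omits any rescaling or convergence safeguards, which is harmless for small k):

```python
import numpy as np

def jor_relax(W, k=10, omega=0.5, seed=0):
    # W: symmetric nonnegative weight matrix (dense NumPy array).
    # Step 1: random init; steps 2-3: k JOR sweeps
    #   x_i <- (1 - omega) x_i + omega * (sum_j w_ij x_j) / (sum_j w_ij).
    rng = np.random.default_rng(seed)
    x = rng.random(W.shape[0])
    d = W.sum(axis=1)
    for _ in range(k):
        x = (1.0 - omega) * x + omega * (W @ x) / d
    return x

# s_ij^(k) is then simply |x[i] - x[j]| for the returned vector x.
```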

Toy example: 20×40 mesh graph + diagonals. [Figure: the mesh; edge weights: red = 2, black = 1.]

Mesh 20×40 + diagonal: random 2D initialization. [Figure.]

Mesh 20×40 + diagonal: after 10 iterations of JOR. [Figure.]

Stationary iterative relaxation

Rewrite the iterative process as x^(k+1) = H x^(k), where H is one of
  H_GS  = (D − L)^{−1} U,
  H_SOR = (D/ω − L)^{−1} ((1/ω − 1)D + U),
  H_JAC = D^{−1} (L + U),
  H_JOR = (D/ω)^{−1} ((1/ω − 1)D + L + U).

Definition. The extended p-normed algebraic distance between i and j after k iterations of x^(k+1) = H x^(k) on R random initializations is
  ρ_ij^(k) := ( Σ_{r=1}^{R} |x_i^(k,r) − x_j^(k,r)|^p )^{1/p},
and its logarithmic variant
  ρ_ij^(k) := ( Σ_{r=1}^{R} lg|x_i^(k,r) − x_j^(k,r)|^p )^{1/p}.
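Reusing jor_relax from the sketch above, the extended distance over R initializations is a few lines (plain variant shown; swap in lg|...|^p for the logarithmic one):

```python
def algebraic_distance(W, i, j, k=10, omega=0.5, R=10, p=2):
    # rho_ij^(k) = ( sum_{r=1..R} |x_i^(k,r) - x_j^(k,r)|^p )^(1/p)
    xs = [jor_relax(W, k=k, omega=omega, seed=r) for r in range(R)]
    return sum(abs(x[i] - x[j]) ** p for x in xs) ** (1.0 / p)
```

Averaging over several random starts damps the dependence on any single initialization.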

Applications: multilevel graph coarsening

Algorithmic component                | Classical AMG | Algebraic distance-based AMG
Future volume for C-points selection | edge weight   | algebraic distance
C-points selection                   | edge weight   | algebraic distance
Interpolation operator               | edge weight   | weak edge filtering

Ron, Safro, Brandt, Relaxation-based coarsening and multiscale graph organization, SIAM Multiscale Modeling and Simulation, 2011
Chen, Safro, Algebraic distance on graphs, SISC, 2011
Safro, Sanders, Schulz, Advanced coarsening schemes for multilevel graph partitioning, 2012

Uncoarsening

A. Coarsening: defines the hierarchical structure (P = P_0, P_1, ..., P_k).
B. Exact solution: solve the coarsest problem P_k, producing S_k.
C. Interpolation and relaxation: produce an initial solution of P_{i−1} from the solution S_i of P_i, and construct the final solution S_{i−1} of P_{i−1}.

[Figure: the multilevel V-cycle over problems P_0, ..., P_k and their solutions S_0, ..., S_k.]

Uncoarsening: minimizing the contribution of one node

[Figure: node i connected to already-placed neighbors a, b, c, d, e with edge weights w_ia, w_ic, w_ie, ...]

Let N_i be the set of i's neighbors with already assigned coordinates x_j. To minimize the local contribution of i to the total energy, we have to assign to it a coordinate x_i that minimizes

  Σ_{j ∈ N_i} w_ij lg|x_i − x_j|.   (1)

Since (1) is unbounded from below as x_i → x_j for any j ∈ N_i, we resolve this by restricting x_i to the neighbor coordinates, setting x_i = x_t with

  t = arg min_{k ∈ N_i} Σ_{j ∈ N_i, k ≠ j} w_kj lg|x_k − x_j|.   (2)

Uncoarsening: minimizing the contribution of one node

Problem: t = arg min_{k ∈ N_i} Σ_{j ∈ N_i, k ≠ j} w_kj lg|x_k − x_j|. Trivial solution complexity: O(|N_i|^2).

Our linear approach: look for the nearly minimal sum at the point of maximal density. How to do this: the Parzen window (kernel density estimation) method. The density at point x is estimated as

  d̂(x) = (1 / (|N_i| h)) Σ_{j ∈ N_i} K((x − x_j)/h),

and we use the weighted form

  d̂(x) = (1 / (|N_i| h)) Σ_{j ∈ N_i} w_ij 2^{−|x − x_j|/h},   (3)

where K is a kernel and h is a smoothing parameter.
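A literal rendering of estimator (3), used here to pick the neighbor coordinate in the densest region. As written it still evaluates d̂ at every candidate, i.e., O(|N_i|^2); the talk's linear-time version avoids that, but the estimator is the same. The 2^{−|x − x_j|/h} kernel follows the slide; names are mine.

```python
def densest_neighbor(xs, ws, h=1.0):
    # d_hat(x) = (1 / (|N_i| h)) * sum_j w_ij * 2^(-|x - x_j| / h)   -- (3)
    def d_hat(x):
        return sum(w * 2.0 ** (-abs(x - xj) / h)
                   for xj, w in zip(xs, ws)) / (len(xs) * h)
    # Place node i at the neighbor coordinate of maximal estimated density.
    return max(xs, key=d_hat)

print(densest_neighbor([1.0, 2.0, 2.5, 10.0], [1.0, 1.0, 1.0, 1.0]))  # 2.0
```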

Uncoarsening

- Interpolation is similar to AMG, x^f = I_c^f x^c: fine seeds are projected from x^c; fine non-seeds are calculated from kernel density estimation; the order is legalized by resolving overlaps.
- Compatible relaxation: seeds are invariant, non-seeds are relaxed (sketched below).
- Gauss-Seidel relaxation: all nodes are relaxed.
- Refinement: nearest-neighbor improvements.
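Compatible relaxation is the same JOR sweep with the seed coordinates pinned. A sketch reusing the conventions of jor_relax above (assumes x is a float NumPy array and seeds a set of indices):

```python
import numpy as np

def compatible_relax(W, x, seeds, k=3, omega=0.5):
    # Seeds keep their interpolated coordinates; only non-seeds move.
    d = W.sum(axis=1)
    free = np.array([i for i in range(len(x)) if i not in seeds])
    for _ in range(k):
        x_new = (1.0 - omega) * x + omega * (W @ x) / d
        x[free] = x_new[free]
    return x
```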

Refinement: window minimization pass

Given a window W, find an ordering π of W that minimizes
  Σ_{ij ∈ W} w_ij lg|x_i − x_j| + Σ_{i ∈ W, j ∉ W} w_ij lg|x_i − x_j|
subject to x_i = v_i/2 + Σ_{k: π(k) < π(i)} v_k.
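For a small window the pass can be done exhaustively. This helper (my own, factorial in |W|, illustration only) tries every permutation of W while nodes outside W keep their fixed coordinates; it assumes every edge endpoint is either in W or in x_out.

```python
import itertools
import math

def window_pass(W_nodes, start, volume, weights, x_out):
    # weights: {(i, j): w_ij}; x_out: fixed coordinates outside W;
    # start: total volume placed before the window begins.
    def cost(order):
        x, acc = {}, start
        for i in order:
            x[i] = volume[i] / 2.0 + acc   # x_i = v_i/2 + preceding volume
            acc += volume[i]
        return sum(w * math.log2(abs(x.get(i, x_out.get(i)) -
                                     x.get(j, x_out.get(j))))
                   for (i, j), w in weights.items())
    return min(itertools.permutations(W_nodes), key=cost)
```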

What are the most competitive algorithms today?
- Randomized ordering: sometimes better than parallel network crawling (fast to obtain, bad for performance).
- Lexicographic ordering: network traversal for some order of neighbours, such as BFS and DFS (easy to compute, can be good for networks with excellent locality).
- Gray ordering: inspired by Gray coding, in which two successive vectors differ by exactly one bit (easy to compute, good for Web-like, good-locality networks).
- Shingle ordering: brings nodes with similar neighborhoods together, using the Jaccard coefficient J(A, B) = |A ∩ B| / |A ∪ B| to measure similarity (works well in preferential-attachment models, where the rich get richer); a sketch follows below.
- LayeredLPA: a label propagation algorithm, similar in spirit to the algebraic distance (usually better than the previous methods).
- Spectral methods: based on the Fiedler vector.
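For reference, the shingle idea in a few lines (a sketch, not the authors' implementation): a min-hash over each neighborhood collides with probability equal to the Jaccard coefficient, so sorting by it brings similar neighborhoods together.

```python
import random

def jaccard(a, b):
    # J(A, B) = |A intersect B| / |A union B|
    return len(a & b) / len(a | b) if (a or b) else 0.0

def shingle_order(neighbors, seed=0):
    # neighbors: {node: set of neighbor nodes}; every node appears as a key.
    rng = random.Random(seed)
    nodes = list(neighbors)
    sigma = {v: rng.random() for v in nodes}          # random permutation proxy
    shingle = {u: min((sigma[v] for v in neighbors[u]), default=2.0)
               for u in nodes}                        # min-hash of N(u)
    return sorted(nodes, key=shingle.get)
```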

ms-gmloga vs min(rnd, lex, nat). [Plot: ratios la/min(nat,lex,rnd) and lg/min(nat,lex,rnd); x-axis: ordered graphs.]

ms-gmloga vs min(rnd, lex, nat). [Plots: ratios lg/min(nat,lex,rnd); x-axis: ordered graphs; (a) directed, (b) undirected.] Comparison of the fastest and the slowest versions of ms-gmloga: notations no-n-n-lg and k25-n-n-lg correspond to the fast and slow versions of ms-gmloga, respectively.

ms-gmloga vs Gray/Shingle. [Plots: ratios ms-gmloga/gray and ms-gmloga/shingle, directed and undirected; x-axis: ordered graphs; (a) Gray ordering vs ms-gmloga, (b) double shingle vs ms-gmloga.]

Comparison of LayeredLPA vs ms-gmloga. [Plot: ratio ms-gmloga/layeredlpa, undirected; x-axis: ordered graphs.]

Scalability: time vs graph size. [Plot: lg(|V| + |E|) vs lg(running time in sec.).]

Heavy-tailed degree distribution. [Plot: compressed BPL / native BPL across networks with heavy-tailed degree distribution.]

Cache misses: ParMETIS, Graclus, hMETIS, power method, SVD. [Plot: cache-miss ratio GMLogA/native across networks with heavy-tailed degree distribution.]

Further compression: the Boldi-Vigna algorithm. [Scatter plots: compression improvement vs order improvement; (a) initial ordering is natural, (b) initial ordering is Gray.]

MUSKETEER - Multiscale Entropic Network Generator. First release (next week)! http://www.mcs.anl.gov/~safro/musindex.html

Conclusions

- A linear-time method for the network compression-friendly ordering.
- Computational results: (heavy-tailed) networks can be compressed; the number of cache misses can be minimized; running time can be improved.
- Open questions: directed graph coarsening; collective refinement of nodes; more sophisticated gap encodings.

Looking for a position. Thank you!