Model Reduction for Edge-Weighted Personalized PageRank
1. Model Reduction for Edge-Weighted Personalized PageRank
D. Bindel, Purdue, 11 Apr 2016
2. The Computational Science & Engineering Picture
- Application: MEMS, smart grids, networks
- Analysis: linear algebra, approximation theory, symmetry + structure
- Computation: HPC / cloud, simulators, solvers
3. Collaborators
Wenlei Xie (LinkedIn), David Bindel (Cornell), Johannes Gehrke (Microsoft), Al Demers (Cornell)
4. PageRank Problem
Variants: unweighted, node weighted, edge weighted.
- Goal: find important vertices in a network
- The basic approach uses only topology
- Weights incorporate prior information about important nodes/edges
5. PageRank Model
Variants: unweighted, node weighted, edge weighted.
- Random surfer model: $x^{(t+1)} = \alpha P x^{(t)} + (1-\alpha) v$, where $P = AD^{-1}$
- Stationary distribution: $Mx = b$, where $M = I - \alpha P$ and $b = (1-\alpha) v$
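As an illustration (not from the slides), a minimal power-iteration sketch of this model, assuming a sparse adjacency matrix `A` whose column sums give out-degrees; the dangling-node guard is an implementation choice:

```python
import numpy as np
import scipy.sparse as sp

def pagerank(A, alpha=0.85, v=None, tol=1e-10, max_iter=1000):
    """Iterate x <- alpha * P x + (1 - alpha) * v with P = A D^{-1}."""
    n = A.shape[0]
    v = np.ones(n) / n if v is None else v
    d = np.asarray(A.sum(axis=0)).ravel()        # out-degrees (column sums)
    P = A @ sp.diags(1.0 / np.maximum(d, 1))     # column-stochastic transitions
    x = v.copy()
    for _ in range(max_iter):
        x_new = alpha * (P @ x) + (1 - alpha) * v
        done = np.abs(x_new - x).sum() < tol
        x = x_new
        if done:
            break
    return x
```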
6. Edge Weight vs Node Weight Personalization
Introduce personalization parameters $w \in \mathbb{R}^d$ in two ways:
- Node weights $v_i = v_i(w)$: solve $M\,x(w) = b(w)$
- Edge weights $a_{ij} = a_{ij}(w)$: solve $M(w)\,x(w) = b$
7. Edge Weight vs Node Weight Personalization
Node weight personalization is well studied:
- Topic-sensitive PageRank: fast methods based on linearity
- Localized PageRank: fast methods based on sparsity
There is some work on edge weight personalization:
- ObjectRank/ScaleRank: personalized weights for different edge types
But lots of work incorporates edge weights without personalization.
Our goal: general, fast methods for edge weight personalization.
8. Edge Weight Parameterizations
Different ways to personalize imply different algorithm options:
1. Linear: take an edge of type $i$ with probability $w_i$: $P(w) = \sum_{i=1}^d w_i P^{(i)}$
2. Scaled linear: take an edge with probability proportional to a (linear) edge weight: $P(w) = A(w) D(w)^{-1}$, with $A(w) = \sum_{i=1}^d w_i A^{(i)}$ and $D(w) = \sum_{i=1}^d w_i D^{(i)}$
3. Fully nonlinear: both $A$ and $P$ depend nonlinearly on $w$
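For the linear case, assembling $P(w)$ is just a weighted sum of precomputed per-type transition matrices. A minimal sketch (the list `P_list` of per-type matrices and the convexity of `w` are assumptions of the illustration, not from the slides):

```python
def transition_linear(P_list, w):
    """P(w) = sum_i w_i * P^{(i)}: each P^{(i)} is the transition matrix
    restricted to edges of type i; the weights w_i are assumed to sum to 1."""
    return sum(wi * Pi for wi, Pi in zip(w, P_list))
```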
9. Model Reduction
Model reduction procedure from the physical simulation world:
- Offline: construct a reduced basis $U \in \mathbb{R}^{n \times k}$
- Offline: choose $k$ equations to pick the approximation $\hat{x} = Uy$
- Online: solve for $y(w)$ given $w$, and reconstruct $\hat{x}$
The expensive full model ($Mx = b$) becomes a reduced model ($\hat{M} y = \hat{b}$) via the approximation ansatz.
10. Reduced Basis Construction: SVD (aka POD/PCA/KL)
Build a snapshot matrix $[x_1\; x_2\; \cdots\; x_r]$ from solutions at sample points $w_1, w_2, \ldots, w_r$, and take $U$ from its truncated SVD.
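A sketch of that offline step (illustrative names, assuming the snapshot solves are already done):

```python
import numpy as np

def build_basis(snapshots, k):
    """POD: stack solutions x(w_j) as columns and keep the top-k left
    singular vectors as the reduced basis U; the singular values show
    how compressible the snapshot space is."""
    X = np.column_stack(snapshots)                    # n x r snapshot matrix
    U, s, _ = np.linalg.svd(X, full_matrices=False)   # thin SVD
    return U[:, :k], s
```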
11. Choosing Good Spaces
What is the best possible approximation $\hat{x} = Uy$? If $\sum_{j=1}^r x(w_j)\,c_j(w)$ is an interpolant through the snapshots, then
$$\min_y \|Uy - x(w)\|_2 \le \sigma_{k+1}\,\|c(w)\|_2 + e_{\mathrm{interp}}(w),$$
where $e_{\mathrm{interp}}(w) = \big\|x(w) - \sum_{j=1}^r x(w_j)\,c_j(w)\big\|_2$ is the error in the interpolant.
- Pay attention where $x$ has large derivatives!
- Also suggests sampling strategies (sparse grids, adaptive methods)
12. Approximation Ansatz
Want $r = MUy - b \approx 0$. Consider two approximation conditions:

| Method | Ansatz | Properties |
| Bubnov-Galerkin | $U^T r = 0$ | Good accuracy empirically; fast for $P(w)$ linear |
| DEIM (collocation) | $\min \|r_I\|$ | Fast even for nonlinear $P(w)$; complex cost/accuracy tradeoff |

Petrov-Galerkin (a bit more accurate than Bubnov-Galerkin): future work.
13. Bubnov-Galerkin Method
Ansatz: $U^T (MUy - b) = 0$.
Linear case ($w_i$ = probability of transition along an edge of type $i$):
$$M(w) = I - \alpha \sum_i w_i P^{(i)}, \qquad \hat{M}(w) = I - \alpha \sum_i w_i \hat{P}^{(i)},$$
where we can precompute $\hat{P}^{(i)} = U^T P^{(i)} U$.
Nonlinear case: the cost to form $\hat{M}(w)$ is comparable to the cost of PageRank itself!
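A minimal sketch of the offline/online split for the linear case (not the paper's code; the reduced right-hand side `b_hat = U.T @ b` is assumed precomputed):

```python
import numpy as np

def galerkin_offline(U, P_list):
    """Precompute the k x k reduced operators P_hat^{(i)} = U^T P^{(i)} U."""
    return [U.T @ (Pi @ U) for Pi in P_list]

def galerkin_online(U, P_hats, b_hat, w, alpha=0.85):
    """Assemble M_hat(w) = I - alpha * sum_i w_i P_hat^{(i)}, solve the
    k x k system, and expand x_hat = U y. Only the final expansion is O(n)."""
    k = P_hats[0].shape[0]
    M_hat = np.eye(k) - alpha * sum(wi * Pi for wi, Pi in zip(w, P_hats))
    y = np.linalg.solve(M_hat, b_hat)
    return U @ y
```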
14. Discrete Empirical Interpolation Method (DEIM)
Ansatz: minimize $\|r_I\|$ over the equations in a chosen index set $I$, i.e. enforce $(MUy - b)_I \approx 0$.
- Only need a few rows of $M$ (and the associated rows of $U$)
- If given $A(w)$, also need column sums for normalization
- Difference from physics applications: high-degree nodes!
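A sketch of the online collocation solve (an illustration under the assumption that the selected rows `M_rows` of $M(w)$ have already been assembled from the relevant edges):

```python
import numpy as np

def deim_online(U, M_rows, b, I):
    """Enforce the residual only on the index set I: solve
    (M(w) U)_{I,:} y = b_I in the least-squares sense. Forming
    M_rows @ U touches only the edges incident on I."""
    MU_I = M_rows @ U                  # |I| x k
    y, *_ = np.linalg.lstsq(MU_I, b[I], rcond=None)
    return U @ y
```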
15. Error Behavior
A similar error analysis framework applies to both Galerkin and DEIM:
Consistency + Stability ⟹ Accuracy
- Consistency: does the subspace contain good approximants?
- Stability: is the approximation subproblem far from singular?
Characterize stability by a quasi-optimality condition: $\|x - Uy\| \le C \min_z \|x - Uz\|$.
16. Standard Quasi-Optimality Approach
Define a solution projector $\Pi$: $\Pi x$ = the approximate solution when the true solution is $x$. Note that $\Pi U = U$.
The error projector $I - \Pi$ maps a true solution to the error; note that $(I - \Pi)U = 0$ and
$$e = x - \hat{x} = (I - \Pi)x.$$
If $e_{\min} = x - Uz$ is the smallest-norm error in the space, then
$$e = (I - \Pi)x = (I - \Pi)x - (I - \Pi)Uz = (I - \Pi)e_{\min}.$$
Therefore, a bound $\|I - \Pi\| \le 1 + \|\Pi\|$ establishes quasi-optimality.
17. Quasi-Optimality: Galerkin and DEIM
$$\text{Galerkin: } \Pi = U \hat{M}^{-1} W^T M, \quad \hat{M} = W^T M U \qquad \text{DEIM: } \Pi = U \hat{M}^{-1} M_{I,:}, \quad \hat{M} = M_{I,:} U$$
- Key to stability: $\hat{M}$ far from singular
- Suggests pivoting schemes for a good $I$ in DEIM
- Also helps to explicitly enforce $\sum_i \hat{x}_i = 1$
- Can bound $\|\Pi\|$ offline for Galerkin + linear parameterization
18. Interpolation Costs
Consider the subgraph relevant to one interpolation equation for $i \in I$. [Figure: node $i \in I$ with its incoming neighbors and their edge weights, e.g. 1/3, 1/50, ...]
- Really care about the weights of edges incident on $I$
- Need more edges to normalize (unless $A(w)$ is linear)
- Cost to include $i \in I$: $|\{(j,k) : a_{ij} \neq 0 \text{ and } a_{kj} \neq 0\}|$
- High in/out degree nodes are expensive but informative
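That count is easy to compute from the adjacency structure. A sketch (not from the slides), assuming $a_{ij} \neq 0$ means an edge $j \to i$, so renormalizing an incoming neighbor $j$ touches all of $j$'s out-edges:

```python
import numpy as np
import scipy.sparse as sp

def interpolation_costs(A):
    """cost[i] = |{(j,k) : a_ij != 0 and a_kj != 0}|: for each incoming
    neighbor j of i, we must also touch every other edge leaving j to
    renormalize its column."""
    A = sp.csr_matrix(A)
    out_deg = np.diff(sp.csc_matrix(A).indptr)    # nnz per column of A
    # Re-weight each entry a_ij by out_deg[j], then sum along rows.
    B = sp.csr_matrix((out_deg[A.indices].astype(float), A.indices, A.indptr),
                      shape=A.shape)
    return np.asarray(B.sum(axis=1)).ravel()
```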
19. Interpolation Cost and Accuracy
Key question: how to choose $I$ to balance cost vs accuracy?
Want to pick $I$ once, so look at the rows of $Z = [M(w_1)U \;\; M(w_2)U \;\; \cdots]$ for sample parameters $w^{(i)}$.
Pivoted-QR-like greedy row selection, with proxy measures for:
- Cost: nonzeros in the row (+ associated columns if normalization is required)
- Accuracy: residual when projecting the row onto those previously selected
Several heuristics for the cost/accuracy tradeoff (see paper).
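One plausible reading of that greedy selection, sketched below; the scoring rule (residual norm per unit cost, with trade-off `lam`) is a stand-in for the paper's heuristics, not a reproduction of them:

```python
import numpy as np

def greedy_rows(Z, k, cost, lam=1.0):
    """Pivoted-QR-like greedy selection: repeatedly pick the row with the
    largest residual norm per unit cost, then project that row's direction
    out of the remaining rows."""
    R = np.array(Z, dtype=float)           # working copies of the rows
    selected = []
    for _ in range(k):
        score = np.sum(R * R, axis=1) / (1.0 + lam * np.asarray(cost))
        score[selected] = -np.inf          # never re-pick a row
        i = int(np.argmax(score))
        selected.append(i)
        q = R[i] / max(np.linalg.norm(R[i]), 1e-15)
        R = R - np.outer(R @ q, q)         # orthogonalize against the pick
    return selected
```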
20. Online Costs
If $\ell$ = number of PageRank components needed, the online costs are:
- Form $\hat{M}$: $O(dk^2)$ for B-G (more complex for DEIM)
- Factor $\hat{M}$: $O(k^3)$
- Solve for $y$: $O(k^2)$
- Form $Uy$: $O(k\ell)$
Online costs do not depend on graph size (unless you want the whole PageRank vector)!
21. Example Networks
DBLP (citation network): 3.5M nodes / 18.5M edges; seven edge types ⟹ seven parameters; $P(w)$ linear; competition: ScaleRank.
Weibo (micro-blogging): 1.9M nodes / 50.7M edges; edges weighted by topical similarity of posts; number of parameters = number of topics (5, 10, 20).
(Studied global and local PageRank; see paper for the latter.)
22. Singular Value Decay
[Plot: $i$-th largest singular value of the snapshot matrix for DBLP-L, Weibo-S5, Weibo-S10, and Weibo-S20; $r = 1000$ samples, $k = 100$.]
23. DBLP Accuracy
[Bar charts: Kendall@100 and normalized L1 error for Galerkin, DEIM-100, DEIM-120, DEIM-200, and ScaleRank.]
24. DBLP Running Times (All Nodes)
[Bar chart: running time (s), split into coefficient and construction phases, for Galerkin, DEIM-100, DEIM-120, DEIM-200, and ScaleRank.]
25. Weibo Accuracy
[Plot: normalized L1 error vs number of parameters.]
26. Weibo Running Times (All Nodes)
[Bar chart: running time (s), split into coefficient and construction phases, vs number of parameters.]
27. Application: Learning to Rank
Goal: given training pairs $\mathcal{T} = \{(i_q, j_q)\}_{q=1}^{T}$, find $w$ that mostly ranks $i_q$ over $j_q$ (cf. Backstrom and Leskovec, WSDM 2011).
Standard idea: differentiate through $x(w) = M(w)^{-1} b$, i.e. $\partial x / \partial w_j = -M(w)^{-1} (\partial M / \partial w_j)\, x(w)$.
Dominant cost: $d + 1$ solves with the PageRank system $M(w)$:
- One PageRank solve to evaluate a loss function
- One PageRank solve per parameter to evaluate gradients
28. Application: Learning to Rank
Standard approach: gradient descent on the full problem. One PageRank computation for the objective and one per gradient component: $d + 1$ PageRank computations per step.
With model reduction (see the sketch below):
- Rephrase the objective in the reduced coordinate space
- Use one factorization to solve the reduced PageRank system for the objective
- Re-use the same factorization for the gradient
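A sketch of that factorization reuse in the reduced Galerkin setting with linear parameterization (an illustration, not the paper's code); differentiating $\hat{M}(w)\,y = \hat{b}$ gives $\partial y / \partial w_i = \alpha\, \hat{M}^{-1} \hat{P}^{(i)} y$:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def reduced_solve_and_grads(P_hats, b_hat, w, alpha=0.85):
    """One LU factorization of the k x k reduced system serves both the
    objective solve and all d gradient solves."""
    k = P_hats[0].shape[0]
    M_hat = np.eye(k) - alpha * sum(wi * Pi for wi, Pi in zip(w, P_hats))
    lu = lu_factor(M_hat)
    y = lu_solve(lu, b_hat)                                    # objective solve
    dy = [lu_solve(lu, alpha * (Pi @ y)) for Pi in P_hats]     # d gradient solves
    return y, dy
```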
29. DBLP Learning Task
[Plot: objective function value vs iteration for Standard, Galerkin, and DEIM; 8 papers for training + 7 parameters.]
30. The Punchline
Test case: DBLP, 3.5M nodes, 18.5M edges, 7 parameters.
[Table: cost per iteration, time in seconds, for Standard, Bubnov-Galerkin, and DEIM-200.]
31. Roads Not Taken
In the paper (but not the talk):
- Selecting interpolation equations for DEIM
- Localized PageRank experiments (Weibo and DBLP)
- Comparison to BCA for localized PageRank
Room for future work: analysis, applications, systems, ...
32. Questions?
Edge-Weighted Personalized PageRank: Breaking a Decade-Old Performance Barrier. Wenlei Xie, David Bindel, Johannes Gehrke, and Al Demers. KDD 2015, paper 117.
Sponsors: NSF (two IIS grants) and the iAd Project from the Research Council of Norway.
33. Welcome to SIAM ALA in Hong Kong!
Hong Kong Baptist University, 4-8 May 2018