Strong Localization in Personalized PageRank Vectors

Huda Nassar¹, Kyle Kloster², and David F. Gleich¹
¹ Purdue University, Computer Science Department
² Purdue University, Mathematics Department
{hnassar,kkloste,dgleich}@purdue.edu

Abstract. The personalized PageRank diffusion is a fundamental tool in network analysis tasks like community detection and link prediction. This tool models the spread of a quantity from a small, initial set of seed nodes, and has long been observed to stay localized near this seed set. We derive a sublinear upper bound on the number of nonzeros necessary to approximate a personalized PageRank vector on a power-law graph. Our experimental results on power-law graphs with a wide variety of parameter settings demonstrate that the bound is loose, and instead support a new conjectured bound.

Keywords: PageRank, diffusion, local algorithms

1 Introduction

Personalized PageRank vectors [20] are a frequently used tool in data analysis of networks in biology [9, 18] and information-relational domains such as recommender systems and databases [12, 14, 19]. In comparison to the standard PageRank vector, personalized PageRank vectors model a random-walk process on a network that randomly returns to a single starting node, instead of restarting at a node chosen uniformly at random as in traditional PageRank. This process is also called a random walk with restart. The stationary distributions of the resulting processes are typically called personalized PageRank vectors. We prefer the terms localized PageRank or seeded PageRank, as these choices are not as tied to PageRank's origins on the web.

A seeded PageRank vector depends on three terms: the network, modeled as a column-stochastic matrix P characterizing the random-walk process; a parameter α that determines the reset probability (1 − α); and a seed node s. The vector e_s is the vector of all zeros with a single 1 in the position corresponding to node s. The seeded PageRank vector x is then the solution of the linear system

    (I − αP) x = (1 − α) e_s.

(Supported by NSF CAREER award CCF-1149756. Code available online at https://github.com/nassarhuda/pprlocal.)
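To make the linear system concrete, the following minimal sketch (our illustration, not the authors' released code; Python with NumPy/SciPy assumed) builds the column-stochastic walk matrix P = A D^{−1} from an adjacency matrix and solves (I − αP)x = (1 − α)e_s directly:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def seeded_pagerank(A, alpha, s):
    """Solve (I - alpha*P) x = (1 - alpha) e_s for a seeded PageRank vector.

    A is a sparse adjacency matrix of a strongly connected graph, and
    P = A D^{-1} is the column-stochastic random-walk matrix.
    """
    n = A.shape[0]
    degrees = np.asarray(A.sum(axis=0)).ravel()
    P = A @ sp.diags(1.0 / degrees)          # column-stochastic walk matrix
    e_s = np.zeros(n)
    e_s[s] = 1.0
    M = (sp.eye(n) - alpha * P).tocsc()
    return spla.spsolve(M, (1 - alpha) * e_s)

# Example: seeded PageRank on a 5-node cycle graph, seeded at node 0.
A = sp.csr_matrix(np.roll(np.eye(5), 1, axis=1) + np.roll(np.eye(5), -1, axis=1))
x = seeded_pagerank(A, alpha=0.85, s=0)
print(x, x.sum())  # entries are positive and sum to 1
```

A direct solve like this touches every node; the localization question is precisely whether that work can be avoided.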

When the network is strongly connected, the solution x is nonzero for all nodes, because there is a nonzero probability of walking from the seed to any other node in a strongly connected network. Nevertheless, the solution x displays a behavior called localization: we can attain accurate localized PageRank solutions by truncating small elements of x to zero. Put another way, there is a sparse vector x_ε that approximates x to an accuracy of ε. This behavior is desirable for applications of seeded PageRank because they typically seek to highlight a small region of a large graph related to the seed node s.

The essential question we study in this paper is: how sparse can we make x_ε? To be precise, we consider a notion of strong localization, ‖x_ε − x‖_1 ≤ ε, and we are then concerned with the behavior of f(ε) := min nonzeros(x_ε). There are a few details missing in this simplified setup. For instance, x_ε depends on α, P, and the seed s. We only consider stochastic matrices P that arise from random walks on strongly connected graphs. Thus, a more precise statement is

    f_α(ε) = max_P max_s min { nonzeros(x_ε) : ‖x_ε − x(α, P, s)‖_1 ≤ ε },

where x(α, P, s) is the seeded PageRank vector (1 − α)(I − αP)^{−1} e_s. When clear from context, we will just write f(ε).

The goal is to establish bounds on f(ε). That is, can we bound f(ε) in terms of the accuracy ε and some properties of the graph (e.g., its size n)? Answering this also implies lower bounds on the work involved in computing an approximation of a localized PageRank vector. Adversarial localized PageRank constructions exist where the solutions x are essentially the uniform distribution (see Section 2). Thus, over all graphs, it is not possible to meaningfully bound f(ε) as anything other than n. To overcome this obstacle, we study f(ε) where the maximum is taken over graphs with a power-law degree distribution. We establish an upper bound on f_α(ε) as a function of the power-law exponent p, 1/ε, α, the maximum degree d, and a few terms that grow sublinearly with n (Theorem 1). The essence of the argument is that we study the number of steps an algorithm requires to solve the PageRank linear system to a desired accuracy, and we then bound the number of iterations assuming that we start at the vertex with maximum degree.

1.1 Related work on weak localization

There is another notion of localization that appears in uses of PageRank for partitioning undirected graphs:

    ‖D^{−1}(x_ε − x)‖_∞ = max_i |[x_ε]_i − x_i| / d_i ≤ ε.

If this notion is used for a localized Cheeger inequality [1, 8], then we need the additional property that 0 ≤ x_ε ≤ x element-wise. But when restated as a localization result, the famous Andersen-Chung-Lang PageRank partitioning result [1] includes a proof that

    max_P max_s min nonzeros(x_ε) ≤ 1 / ((1 − α) ε), where ‖D^{−1}(x_ε − x(α, P, s))‖_∞ ≤ ε.

This establishes that any uniform random walk on a graph satisfies a weak-localization property. The same paper also gives a fast algorithm to find these weakly local solutions. More recently, a variety of additional weak-localization results on diffusions have appeared [10, 16].

1.2 Related work on functions of matrices and diffusions

Localization in diffusions is broadly related to localization in functions of matrices [5]. The results in that literature tend to focus on the case of banded matrices (e.g., [4]), although there are also discussions of more general results in terms of graphs arising from sparse matrices [5]. In the context of the decay of functions of matrices, our result in this manuscript establishes a strong decay bound without assuming a constant-degree graph. These same types of decay bounds can apply to a variety of graph diffusion models that involve a stochastic matrix [2, 13], and recent work shows that they may even extend beyond this regime [10].

2 A negative result for strong localization

Here we construct an example of a graph that has a seeded PageRank vector that cannot be approximated locally or with a sublinear number of nonzeros. More concretely, we demonstrate the existence of a personalized PageRank vector that requires at least n − (n − 1)ε(1 + α)/α nonzeros to attain a 1-norm accuracy of ε, where n is the number of nodes in the graph. The construction is simple, and we sketch it as follows. Let G be an undirected star graph on n nodes. Then the PageRank vector seeded on the center node has two values: 1/(1 + α) for the center node and α/((1 + α)(n − 1)) for each leaf node. Suppose an approximation x̂ of x has M of these leaf-node entries set to 0. Then the 1-norm error ‖x − x̂‖_1 is at least Mα/((1 + α)(n − 1)). Attaining a 1-norm error of ε requires Mα/((1 + α)(n − 1)) < ε, and so the number of entries of the approximate PageRank vector required to be nonzero, n − M, is lower-bounded by n − (n − 1)ε(1 + α)/α. Thus, the number of nonzeros required in the approximate PageRank vector must be linear in n.
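The closed-form values in this construction are easy to check numerically. The following sketch (illustrative only, not from the paper's code) builds the star's column-stochastic walk matrix with the center at index 0 and verifies both values:

```python
import numpy as np

# Star graph on n nodes: node 0 is the center, nodes 1..n-1 are leaves.
n, alpha = 6, 0.85
P = np.zeros((n, n))
P[1:, 0] = 1.0 / (n - 1)   # from the center, walk to each leaf uniformly
P[0, 1:] = 1.0             # from a leaf, walk back to the center
e_s = np.zeros(n)
e_s[0] = 1.0               # seed on the center node
x = np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * e_s)
assert np.isclose(x[0], 1 / (1 + alpha))                     # center value
assert np.allclose(x[1:], alpha / ((1 + alpha) * (n - 1)))   # leaf values
```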

3 Localization in Personalized PageRank

The example in Section 2 demonstrates that there exist seeded PageRank vectors that are non-local. Here we focus on a particular class of graphs, those with a degree distribution following a power law (Section 3.2), and show that seeded PageRank vectors on these graphs can always be ε-approximated with a number of nonzero entries that depends sublinearly on n.

Theorem 1. Let P be a uniform random-walk transition matrix of a graph with maximum degree d and minimum degree δ. Additionally, suppose the graph has a sub-power-law degree distribution with exponent p, so that the kth largest degree, d(k), satisfies d(k) ≤ d k^{−p}. The Gauss-Southwell coordinate relaxation method applied to the personalized PageRank problem (I − αP)x = (1 − α)e_s produces an approximation x_ε satisfying ‖x − x_ε‖_1 < ε in t steps, with at most t nonzeros in the solution, where t satisfies

    t ≤ (1/δ) C_p (1/ε)^{δ/(1−α)},    (1)

and where we define C_p to be

    C_p := d (1 + log d)    if p = 1,
    C_p := d (1 + ((d/δ)^{1/p − 1} − 1)/(1 − p))    otherwise.

This theorem originates in our recent work on the heat kernel diffusion (see Theorem 5.7 in [11]), and we adapt the techniques used in that proof to the current seeded PageRank setting.

3.1 Deriving the bound

We want to compute an ε-approximation x_ε to the solution of (I − αP) x = (1 − α) e_s for some seed vector e_s. Given an approximation x̂, we can bound the 1-norm error as follows. First, define the residual vector r = (1 − α)e_s − (I − αP)x̂, and note the following relationship between the residual vector and the error vector:

    x − x̂ = (I − αP)^{−1} r.    (2)

Using this relationship, we can bound the approximation's 1-norm accuracy, ‖x − x̂‖_1, by the quantity (1/(1 − α)) ‖r‖_1, because the column-stochasticity of P implies that ‖(I − αP)^{−1}‖_1 = 1/(1 − α). Guaranteeing a 1-norm error ‖x − x̂‖_1 < ε is then a matter of ensuring that ‖r‖_1 < (1 − α)ε holds. To bound the residual norm, we look more closely at a particular method for producing the approximation.

Gauss-Southwell iteration. The Gauss-Southwell algorithm is a coordinate-descent method for solving a linear system, akin to the Gauss-Seidel linear solver. It proceeds by updating the entry of the approximation that corresponds to the largest-magnitude entry of the residual r. Next we describe the Gauss-Southwell update specifically as it is used to solve the seeded PageRank linear system in the theorem statement. The algorithm begins by setting the initial solution x^{(0)} = 0 and r^{(0)} = (1 − α)e_s. In step k, let j = j(k) be the entry of r^{(k)} with the largest magnitude, and let m = r^{(k)}_j. We update the solution and residual vectors as follows:

    x^{(k+1)} = x^{(k)} + m e_j,    (3)
    r^{(k+1)} = (1 − α)e_s − (I − αP) x^{(k+1)},    (4)

and the residual update can be expanded to r^{(k+1)} = r^{(k)} − m e_j + m α P e_j.
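The update (3)-(4) is simple to implement when P is stored by columns. The following sketch (a reference version for illustration, not the paper's implementation; a practical version might track the largest residual entry with a priority queue rather than a full scan) runs the iteration until ‖r‖_1 < (1 − α)ε:

```python
import numpy as np
import scipy.sparse as sp

def gauss_southwell_pagerank(P, alpha, s, eps, max_steps=10**7):
    """Gauss-Southwell updates (3)-(4) for the seeded PageRank system.

    P is the column-stochastic walk matrix (sparse; CSC is efficient here).
    Stopping once ||r||_1 < (1 - alpha)*eps guarantees ||x - x_hat||_1 < eps.
    """
    n = P.shape[0]
    x = np.zeros(n)
    r = np.zeros(n)
    r[s] = 1.0 - alpha
    r_norm = 1.0 - alpha           # residual stays nonnegative, so ||r||_1 = sum(r)
    for _ in range(max_steps):
        if r_norm < (1.0 - alpha) * eps:
            break
        j = int(np.argmax(r))      # largest-magnitude residual entry
        m = r[j]
        x[j] += m                  # update (3)
        r[j] -= m                  # update (4): r <- r - m e_j + m*alpha*P e_j
        col = P.getcol(j)
        r[col.indices] += m * alpha * col.data
        r_norm -= m * (1.0 - alpha)  # ||r||_1 shrinks by m(1 - alpha) each step
    return x, r
```

Each step touches at most one new entry of x, which is what lets the analysis below bound the number of nonzeros by counting steps.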

This application of Gauss-Southwell to personalized-PageRank-style problems has appeared numerous times in the recent literature [6, 7, 15, 17]. In at least one instance ([7], Section 5.2), the authors showed that the residual and solution vectors stay nonnegative throughout this process, assuming the seed vector is nonnegative (which, in our context, it is). So the 1-norm of the residual can be expressed as ‖r^{(k+1)}‖_1 = e^T r^{(k+1)}. Expanding the residual in terms of the iterative update presented above, we can write the residual norm as e^T (r^{(k)} − m e_j + m α P e_j). Then, denoting ‖r^{(k)}‖_1 by r_k and simplifying yields

    r_{k+1} = r_k − m(1 − α).

Next we bound the maximum-magnitude entry, m, in order to bound the residual norm. Observe that since m is the largest-magnitude entry in r^{(k)}, it is in particular at least the average value of r^{(k)}. Let Z(k) denote the number of nonzero entries in r^{(k)}; then the average value can be expressed as r_k / Z(k). Hence we have m ≥ r_k / Z(k), and so we can bound r_k − m(1 − α) above by r_k − r_k (1 − α)/Z(k). Simplifying yields r_{k+1} ≤ r_k (1 − (1 − α)/Z(k)). Recurring this gives

    r_{k+1} ≤ r_0 ∏_{t=0}^{k} (1 − (1 − α)/Z(t)),    (5)

where r_0 = (1 − α) because r^{(0)} = (1 − α)e_s. Then, using the fact that log(1 − x) ≤ −x for x < 1, we can show

    r_{k+1} ≤ (1 − α) ∏_{t=0}^{k} (1 − (1 − α)/Z(t)) ≤ (1 − α) exp( −(1 − α) ∑_{t=0}^{k} 1/Z(t) ).    (6)

To progress from here we need some control over the quantity Z(t). One property that can limit the behavior of Z(t) is the degree distribution of the graph, which we now explore.

3.2 Using the degree distribution

At last we can use the fill-in analysis for power-law graphs presented in [11]. Before continuing, we first establish some notation. Let our graph have maximum degree d and minimum degree δ, and denote the degree of the node with the kth largest degree by d(k). Assuming the graph's degree distribution is power-law or sub-power-law with exponent p, we know d(k) satisfies d(k) ≤ d k^{−p}. (We use sub-power-law here to refer to the fact that d(k) is bounded above by, rather than exactly equal to, this quantity.) We show below that the nonzero fill-in is bounded as follows:

    Z(k) ≤ C_p + δk,    (7)

where the term C_p is defined in the statement of Theorem 1. We remark that the quantity C_p used here is slightly tighter than the value presented in the original paper. Next we use this bound on Z(k) to control the bound on r_k. Lemma 5.6 from [11] implies that

    ∑_{t=0}^{k} 1/Z(t) ≥ (1/δ) log( (δ(k + 1) + C_p) / C_p ),

and so, plugging into (6), we can bound

    r_{k+1} ≤ (1 − α) exp( −((1 − α)/δ) log( (δ(k + 1) + C_p) / C_p ) ),

which simplifies to r_{k+1} ≤ (1 − α) ((δ(k + 1) + C_p)/C_p)^{−(1−α)/δ}. Finally, to guarantee r_k < ε(1 − α), it suffices to choose k so that ((δk + C_p)/C_p)^{−(1−α)/δ} ≤ ε. But this holds if and only if δk + C_p ≥ C_p (1/ε)^{δ/(1−α)}, which, in turn, is guaranteed by k ≥ (1/δ) C_p (1/ε)^{δ/(1−α)}. This proves the bound on the number of iterations.

Proving the degree distribution bound. Here we prove the inequality (7) used in the proof above. We give a sketch, as this is essentially the same as the proof of Lemma 5.5 in [11]. First, observe that the number of nonzeros in the residual after t steps is upper-bounded by the sum of the largest t degrees, Z(t) ≤ ∑_{k=1}^{t} d(k). Now we use the power-law bound on d(k): we substitute the degree bound d(k) ≤ d k^{−p} into this sum. However, this bound is meaningful only when k ≤ (d/δ)^{1/p}, for δ the minimum degree; for k > (d/δ)^{1/p} the quantity d k^{−p} drops below δ, which cannot happen for an actual node's degree, so we bound those degrees by δ instead. Hence, we split the summation into two pieces:

    Z(t) ≤ ∑_{k=1}^{t} d(k) ≤ ∑_{k=1}^{⌈(d/δ)^{1/p}⌉} d k^{−p} + ∑_{k=⌈(d/δ)^{1/p}⌉+1}^{t} δ.

We want to prove that this implies Z(t) ≤ C_p + δt. The second summand is straightforward to majorize by δt. The first summand can be upper-bounded by d (1 + ∫_1^{(d/δ)^{1/p}} x^{−p} dx) using a right-hand integral rule, and this integral is straightforward to bound above by the quantity C_p defined in Theorem 1. This completes the proof.

4 Experiments

We present experimental results on the localization of seeded PageRank vectors on random power-law graphs and compare the actual sparsity with the predictions of our theoretical bound. This involves generating random power-law graphs (Section 4.1) and then comparing the experimental localization with our theoretical bound (Section 4.3). The bound is not particularly accurate, and so we conjecture a new bound that better predicts the behavior we witness (Section 4.4).

4.1 Generating the graphs

For experimental comparison, we wanted a test suite of graphs with varying but specific sizes and degree distributions. To produce these graphs, we use the Bayati-Kim-Saberi procedure [3] for generating undirected graphs with a prescribed degree distribution. The degree distributions used follow a power law in the first (d/δ)^{1/p} elements, with degrees d/i^p; all other nodes are assigned degree δ, up to a total size of n. We choose the maximum degree d to be √n and δ = 2. After generating the degree sequence v, we use the Erdős-Gallai conditions and the Havel-Hakimi algorithm to check whether v is graphical. If the generated sequence fails, we perturb the vector slightly and recheck the conditions; this step must pass for a graph to be generated. The check usually fails because the degree sequence sums to an odd number, in which case it suffices to increase the degree of one of the minimum-degree nodes by 1. After the graph is generated, we verify that it contains a large connected component, and we proceed with the graph when the largest component includes nearly all of the n nodes.
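For illustration, the following sketch (assuming the NetworkX library, and using its Havel-Hakimi construction as a simple stand-in for the Bayati-Kim-Saberi sampler) builds such a degree sequence, applies the parity fix, and checks graphicality:

```python
import math
import networkx as nx

def powerlaw_degree_sequence(n, p, delta=2):
    """Build the paper's degree sequence: the first ceil((d/delta)^(1/p)) nodes
    get degrees d/i^p (clamped below by delta) with d = sqrt(n); every other
    node gets degree delta."""
    d = int(math.sqrt(n))
    head = int(math.ceil((d / delta) ** (1.0 / p)))
    seq = [max(int(d / (i ** p)), delta) for i in range(1, head + 1)]
    seq += [delta] * (n - len(seq))
    if sum(seq) % 2 == 1:          # an odd degree sum is never graphical:
        seq[-1] += 1               # bump one minimum-degree node by 1
    return seq

seq = powerlaw_degree_sequence(10**4, p=1.25)
assert nx.is_graphical(seq)        # Erdos-Gallai check
# Havel-Hakimi used here as a stand-in for the Bayati-Kim-Saberi sampler.
G = nx.havel_hakimi_graph(seq)
largest_cc = max(nx.connected_components(G), key=len)
print(len(largest_cc) / G.number_of_nodes())
```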

4.2 Measuring the non-zeros

We first compute a PageRank vector to high accuracy using the power method, with a 1-norm error tolerance far below the smallest ε we study. This requires log(ε/2)/log(α) iterations, based on the geometric convergence of the power method at rate α. We then study vectors x_ε satisfying ‖x_ε − x‖_1 ≤ ε for the accuracies ε = [10^{−1}, 10^{−2}, 10^{−3}, 10^{−4}]. To count the number of nonzeros in a vector x_ε for a particular accuracy ε, we first recall

    f_α(ε) = max_P max_s min { nonzeros(x_ε) : ‖x_ε − x(α, P, s)‖_1 ≤ ε }.

Thus, we need to compute x_ε in a way that includes as many zeros as possible, subject to the constraint that the 1-norm of the difference between x_ε and x stays at most ε. The idea is to use the solution vector x and generate x_ε from it by zeroing entries as long as the error difference stays less than the chosen ε. The following steps illustrate this process:

1. Compute the PageRank vector x to high accuracy via the power method.
2. Sort x in ascending order.
3. Determine the largest index j so that the sum of entries 1 through j of the sorted vector is less than ε.
4. Truncate these j entries to 0. Then x_ε contains n − j nonzeros.

4.3 Testing the theoretical bound

To test the effectiveness of our theoretical bound in Theorem 1, we generate graphs with different values of the power-law exponent, p = [0.5, 0.75, 1, 1.25], and with different sizes, n = 10^4 through 10^9. Then we solve the seeded PageRank system, seeded on the node of maximum degree, using parameter settings α = [0.25, 0.3, 0.5, 0.65, 0.85].
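Returning to the measurement procedure of Section 4.2, here is a minimal sketch (our illustration, not the paper's code) in which a cumulative sum over the ascending-sorted entries implements steps 2-4:

```python
import numpy as np

def sparsest_approximation(x, eps):
    """Zero out the largest prefix of the ascending-sorted entries of x whose
    sum stays below eps, so that ||x - x_eps||_1 < eps with as few nonzeros
    as possible."""
    order = np.argsort(x)                   # ascending order of entries
    prefix = np.cumsum(x[order])
    j = int(np.searchsorted(prefix, eps))   # largest j with prefix sum < eps
    x_eps = x.copy()
    x_eps[order[:j]] = 0.0
    return x_eps, x.size - j                # approximation and its nonzero count
```

Applying this to the high-accuracy power-method solution would produce the nonzero counts studied in the figures below.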

Figure 1 shows the outcome of this experiment with p = 1.25. We can see that the theoretical bound becomes slightly more accurate as the graph size grows, yet it stays far from the curve of the sparsity of the ε-approximate diffusion. Since the theoretical bound behaves poorly even at the extreme points of the parameter settings, we sought a tighter empirical bound.

Fig. 1. A log-log plot of nnz(x_ε) versus 1/ε obtained for different experiments as α varies. We fix p = 1.25 for all plots and run experiments on graphs of sizes [10^4, 10^5, 10^6, 10^7, 10^8, 10^9]. The red dashed line represents a vector with all nonzeros present (i.e., a ratio of 1). The black dashed line shows the bound predicted by Theorem 1. The blue curve shows the actual ratio of nonzeros found.

4.4 Experimental analysis

In this section, we present a new conjectured bound that better predicts the behavior of the number of nonzeros in x_ε as the other parameters vary. To derive a relationship between nnz(x_ε), ε, p, and α, we consider the following experiment. We fix n = 10^6 and p = 1.25 and generate a graph as described in Section 4.1. We then solve the PageRank problem and find the number of nonzeros for different ε values as described in Section 4.2. We use α = [0.25, 0.3, 0.5, 0.65, 0.85] and count the number of nonzeros in the diffusion vector for four accuracy settings, ε = [10^{−1}, 10^{−2}, 10^{−3}, 10^{−4}].

We then generate a log-log plot of nnz(x_ε) versus 1/ε for the different values of α. (The choice of the normalization nnz(x_ε)/(d log d) on the vertical axis was based on pieces of the theoretical bound.) The outcome is illustrated in Figure 2 (left). From Figure 2, we can see that as α increases, the values of nnz(x_ε) also increase, interestingly, almost as a linear shift. We prefer to focus on (1 − α), and so we restate this observation in terms of (1 − α): as (1 − α) decreases, the curves representing nnz(x_ε) shift upward, nearly linearly. The initial goal is to find a relation of the form nnz(x_ε) ≤ g(α, ε, p) for some function g. Since nnz(x_ε) seems to vary inversely with (1 − α), we choose this relation to be

    nnz(x_ε) ≤ (c_1 / (1 − α)) g(ε, p).

We similarly derive a relation between nnz(x_ε) and p. To study this relationship, we fix n = 10^6, generate graphs with different power-law exponents, namely p = [0.5, 0.75, 1, 1.25], solve the PageRank problem with α = 0.5, and count the number of nonzeros for the four accuracy settings. We report the results in Figure 2 (right). From Figure 2 (right), we can see that as the value of p increases, the curves of nnz(x_ε) grow much more slowly. Furthermore, the difference between the curves widens exponentially as 1/ε increases, which leads us to model the dependence on p as an exponential function of 1/ε. Also, since p and nnz(x_ε) are inversely related, we consider 1/p rather than p. Therefore, we arrive at a relationship of the form

    nnz(x_ε) ≤ (c_1 / (1 − α)) (1/ε)^{c_2/p} c_3

for some constants c_1, c_2, c_3. After experimenting with the above bound, we found that the best bound is achieved at c_1 = 0.2, c_2 = 0.25, c_3 = 5.

4.5 Results

The experimental bound derived in Section 4.4 is now

    nnz(x_ε) ≤ (2 / (1 − α)) (1/ε)^{1/(2p)}.    (8)

In what follows, we demonstrate the effectiveness of this bound in describing the localization of seeded PageRank vectors computed with different values of α, on power-law graphs of different sizes and with varying power-law exponents. For each set of parameters (graph size n, power-law exponent p, and PageRank constant α), a plot in Figures 3 and 4 displays the number of nonzeros needed to approximate a PageRank vector with 1-norm accuracy ε. In other words, each plot shows how f_α(ε) grows as 1/ε grows for one type of graph. The blue curve represents the actual number of nonzeros required in the ε-approximation. Each plot also has a black dashed line showing the prediction of our conjectured bound (8). We note that our conjectured bound fails in a few sub-plots of Figure 3, for p = 0.5 and α = 0.85. This is likely because we tuned the constants in our bound from less extremal parameter settings; in contrast, the settings p = 0.5 and α = 0.85 represent the densest graphs and the most diffusive PageRank setting among those we consider.
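To see how loose Theorem 1 is relative to the conjectured bound, the following sketch evaluates both side by side (illustrative only; the C_p expression mirrors our statement of Theorem 1, and d = √n = 1000, δ = 2 correspond to the n = 10^6 experimental setup):

```python
from math import log

def theorem1_bound(d, delta, p, alpha, eps):
    """Bound (1): t <= (1/delta) * C_p * (1/eps)^(delta/(1-alpha))."""
    if p == 1:
        C_p = d * (1 + log(d))
    else:
        C_p = d * (1 + ((d / delta) ** (1.0 / p - 1.0) - 1.0) / (1.0 - p))
    return (C_p / delta) * (1.0 / eps) ** (delta / (1.0 - alpha))

def conjectured_bound(p, alpha, eps):
    """Conjectured bound (8): nnz <= (2/(1-alpha)) * (1/eps)^(1/(2p))."""
    return (2.0 / (1.0 - alpha)) * (1.0 / eps) ** (1.0 / (2.0 * p))

# Example setting: n = 10^6, so d = sqrt(n) = 1000, with delta = 2.
for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(eps, theorem1_bound(1000, 2, 1.25, 0.5, eps),
          conjectured_bound(1.25, 0.5, eps))
```

Even at ε = 10^{−1}, the Theorem 1 value exceeds the conjectured one by several orders of magnitude, which matches the looseness visible in Figure 1.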

Fig. 2. Log-log plots of nnz(x_ε) versus 1/ε obtained on graphs of size n = 10^6 as α and p vary. At left, p is fixed to p = 1.25, and the black, green, blue, red, and dotted black curves represent nnz(x_ε) for the α values [0.25, 0.3, 0.5, 0.65, 0.85], respectively. At right, α is fixed to α = 0.5, and the dashed blue, black, green, and red curves represent nnz(x_ε) for the p values [0.5, 0.75, 1, 1.25], respectively.

5 Discussion

We have shown that seeded PageRank vectors, though not localized on all graphs, must behave locally on graphs with power-law degree distributions. Our experiments show our theoretical bound to be terribly loose. In some sense this is to be expected, as our algorithmic analysis is worst-case. However, it is not clear that any real-world graphs realize these worst-case scenarios. We thus plan to continue our study of simple graph models to identify empirical and theoretical localization bounds based on the parameters of the models. This will include a theoretical justification or revision of the empirically derived bound. It will also include new studies of Chung-Lu graphs as well as the Havel-Hakimi construction itself. Finally, we also plan to explore the impact of local clustering. Our conjecture is that this should exert a powerful localization effect beyond that due to the degree distribution.

One open question sparked by our work regards the relationship between localized solutions and constant or shrinking average distance in graphs. It is well known that social networks appear to have shrinking or constant effective diameters. Existing results in the theory of localization of functions of matrices imply that a precise bound on diameter would force delocalization as the graph grows. Although the localization theory says nothing about average distance or small effective diameters, it hints that the solutions would delocalize. However, solutions often localize nicely in real-world networks, and we wish to understand the origins of this empirical localization behavior more fully.

Fig. 3. (Panels for p = 0.5 and p = 0.75; rows correspond to α = [0.25, 0.3, 0.5, 0.65, 0.85] and columns to n = 10^4 through 10^9.) Each sub-plot has an x-axis representing 1/ε and a y-axis representing the ratio of nonzeros present in a diffusion vector of 1-norm accuracy ε. The red dashed line represents a vector with all nonzeros present (i.e., a ratio of 1). The black dashed line shows the prediction of our conjectured bound (8). The blue curve shows the actual ratio of nonzeros found. As the graphs get bigger (i.e., the fourth and fifth columns), the conjectured bound (black line) almost exactly predicts the locality of the ε-approximate diffusion.

Fig. 4. (Panels for p = 1 and p = 1.25; rows correspond to α = [0.25, 0.3, 0.5, 0.65, 0.85] and columns to n = 10^4 through 10^9.) Each sub-plot has an x-axis representing 1/ε and a y-axis representing the ratio of nonzeros present in a diffusion vector of 1-norm accuracy ε. The red dashed line represents a vector with all nonzeros present (i.e., a ratio of 1). The black dashed line shows the prediction of our conjectured bound (8). The blue curve shows the actual ratio of nonzeros found. As the graphs get bigger (i.e., the fourth and fifth columns), the conjectured bound (black line) almost exactly predicts the locality of the ε-approximate diffusion.

References

1. Andersen, R., Chung, F., Lang, K.: Local graph partitioning using PageRank vectors. In: FOCS (2006)
2. Baeza-Yates, R., Boldi, P., Castillo, C.: Generalizing PageRank: Damping functions for link-based ranking algorithms. In: SIGIR (2006)
3. Bayati, M., Kim, J., Saberi, A.: A sequential algorithm for generating random graphs. Algorithmica 58(4) (2010)
4. Benzi, M., Razouk, N.: Decay bounds and O(n) algorithms for approximating functions of sparse matrices. ETNA 28 (2007)
5. Benzi, M., Boito, P., Razouk, N.: Decay properties of spectral projectors with applications to electronic structure. SIAM Review 55(1), 3-64 (2013)
6. Berkhin, P.: Bookmark-coloring algorithm for personalized PageRank computing. Internet Mathematics 3(1) (2007)
7. Bonchi, F., Esfandiar, P., Gleich, D.F., Greif, C., Lakshmanan, L.V.: Fast matrix computations for pairwise and columnwise commute times and Katz scores. Internet Mathematics 8(1-2) (2012)
8. Chung, F.: The heat kernel as the PageRank of a graph. Proceedings of the National Academy of Sciences 104(50) (December 2007)
9. Freschi, V.: Protein function prediction from interaction networks using a random walk ranking algorithm. In: BIBE (2007)
10. Ghosh, R., Teng, S.H., Lerman, K., Yan, X.: The interplay between dynamics and networks: Centrality, communities, and Cheeger inequality. In: KDD (2014)
11. Gleich, D.F., Kloster, K.: Sublinear column-wise actions of the matrix exponential on social networks. Internet Mathematics (2014)
12. Gori, M., Pucci, A.: ItemRank: A random-walk based scoring algorithm for recommender engines. In: IJCAI (2007)
13. Huberman, B.A., Pirolli, P.L.T., Pitkow, J.E., Lukose, R.M.: Strong regularities in World Wide Web surfing. Science 280(5360) (1998)
14. Jain, A., Pantel, P.: FactRank: Random walks on a web of facts. In: COLING (2010)
15. Jeh, G., Widom, J.: Scaling personalized web search. In: WWW (2003)
16. Kloster, K., Gleich, D.F.: Heat kernel based community detection. In: KDD (2014)
17. McSherry, F.: A uniform approach to accelerated PageRank computation. In: WWW (2005)
18. Morrison, J.L., Breitling, R., Higham, D.J., Gilbert, D.R.: GeneRank: Using search engine technology for the analysis of microarray experiments. BMC Bioinformatics 6(1), 233 (2005)
19. Nie, Z., Zhang, Y., Wen, J.R., Ma, W.Y.: Object-level ranking: Bringing order to web objects. In: WWW (2005)
20. Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing order to the web. Tech. Rep., Stanford University (1999)
