Strong Localization in Personalized PageRank Vectors

Huda Nassar¹, Kyle Kloster², and David F. Gleich¹
¹ Purdue University, Computer Science Department
² Purdue University, Mathematics Department
{hnassar,kkloste,dgleich}@purdue.edu

Abstract. The personalized PageRank diffusion is a fundamental tool in network analysis tasks like community detection and link prediction. This tool models the spread of a quantity from a small, initial set of seed nodes, and has long been observed to stay localized near this seed set. We derive a sublinear upper bound on the number of nonzeros necessary to approximate a personalized PageRank vector on a power-law graph. Our experimental results on power-law graphs with a wide variety of parameter settings demonstrate that the bound is loose, and instead support a new conjectured bound.

Keywords: PageRank, diffusion, local algorithms

1 Introduction

Personalized PageRank vectors [20] are a frequently used tool in data analysis of networks in biology [9, 18] and information-relational domains such as recommender systems and databases [12, 14, 19]. In comparison to the standard PageRank vector, personalized PageRank vectors model a random-walk process on a network that randomly returns to a single starting node, instead of restarting at a node chosen uniformly at random as in traditional PageRank. This process is also called a random walk with restart. The stationary distributions of the resulting processes are typically called personalized PageRank vectors. We prefer the terms localized PageRank or seeded PageRank, as these choices are not as tied to PageRank's origins on the web.

A seeded PageRank vector depends on three terms: the network, modeled as a column-stochastic matrix P characterizing the random-walk process; a parameter α that determines the reset probability (1 − α); and a seed node s. The vector e_s is the vector of all zeros with a single 1 in the position corresponding to node s. The seeded PageRank vector x is then the solution of the linear system

    (I − αP) x = (1 − α) e_s.

(Supported by NSF CAREER award CCF-1149756. Code available online at https://github.com/nassarhuda/pprlocal.)
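To make the linear system concrete, the following minimal sketch (our illustration, not the authors' released code; Python with NumPy/SciPy assumed) builds the column-stochastic walk matrix P = A D^{−1} from an adjacency matrix and solves (I − αP)x = (1 − α)e_s directly:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def seeded_pagerank(A, alpha, s):
    """Solve (I - alpha*P) x = (1 - alpha) e_s for a seeded PageRank vector.

    A is a sparse adjacency matrix of a strongly connected graph, and
    P = A D^{-1} is the column-stochastic random-walk matrix.
    """
    n = A.shape[0]
    degrees = np.asarray(A.sum(axis=0)).ravel()
    P = A @ sp.diags(1.0 / degrees)          # column-stochastic walk matrix
    e_s = np.zeros(n)
    e_s[s] = 1.0
    M = (sp.eye(n) - alpha * P).tocsc()
    return spla.spsolve(M, (1 - alpha) * e_s)

# Example: seeded PageRank on a 5-node cycle graph, seeded at node 0.
A = sp.csr_matrix(np.roll(np.eye(5), 1, axis=1) + np.roll(np.eye(5), -1, axis=1))
x = seeded_pagerank(A, alpha=0.85, s=0)
print(x, x.sum())  # entries are positive and sum to 1
```

A direct solve like this touches every node; the localization question is precisely whether that work can be avoided.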

When the network is strongly connected, the solution x is nonzero for all nodes, because there is a nonzero probability of walking from the seed to any other node in a strongly connected network. Nevertheless, the solution x displays a behavior called localization: we can attain accurate localized PageRank solutions by truncating small elements of x to zero. Put another way, there is a sparse vector x_ε that approximates x to an accuracy of ε. This behavior is desirable for applications of seeded PageRank because they typically seek to highlight a small region of a large graph related to the seed node s.

The essential question we study in this paper is: how sparse can we make x_ε? To be precise, we consider a notion of strong localization, ‖x_ε − x‖_1 ≤ ε, and we are then concerned with the behavior of f(ε) := min nonzeros(x_ε). There are a few details missing in this simplified setup. For instance, x_ε depends on α, P, and the seed s. We only consider stochastic matrices P that arise from random walks on strongly connected graphs. Thus, a more precise statement is

    f_α(ε) = max_P max_s min { nonzeros(x_ε) : ‖x_ε − x(α, P, s)‖_1 ≤ ε },

where x(α, P, s) is the seeded PageRank vector (1 − α)(I − αP)^{−1} e_s. When clear from context, we will just write f(ε).

The goal is to establish bounds on f(ε). That is, can we bound f(ε) in terms of the accuracy ε and some properties of the graph (e.g., its size n)? Answering this also implies lower bounds on the work involved in computing an approximation of a localized PageRank vector. Adversarial localized PageRank constructions exist where the solutions x are essentially the uniform distribution (see Section 2). Thus, over all graphs, it is not possible to meaningfully bound f(ε) as anything other than n. To overcome this obstacle, we study f(ε) where the maximum is taken over graphs with a power-law degree distribution. We establish an upper bound on f_α(ε) as a function of the power-law exponent p, 1/ε, α, the maximum degree d, and a few terms that grow sublinearly with n (Theorem 1). The essence of the argument is that we study the number of steps an algorithm requires to solve the PageRank linear system to a desired accuracy, and we then bound the number of iterations assuming that we start at the vertex with maximum degree.

1.1 Related work on weak localization

There is another notion of localization that appears in uses of PageRank for partitioning undirected graphs:

    ‖D^{−1}(x_ε − x)‖_∞ = max_i |[x_ε]_i − x_i| / d_i ≤ ε.

If this notion is used for a localized Cheeger inequality [1, 8], then we need the additional property that 0 ≤ x_ε ≤ x element-wise. But when restated as a localization result, the famous Andersen-Chung-Lang PageRank partitioning result [1] includes a proof that

    max_P max_s min nonzeros(x_ε) ≤ 1 / ((1 − α) ε), where ‖D^{−1}(x_ε − x(α, P, s))‖_∞ ≤ ε.

This establishes that any uniform random walk on a graph satisfies a weak-localization property. The same paper also gives a fast algorithm to find these weakly local solutions. More recently, a variety of additional weak-localization results on diffusions have appeared [10, 16].

1.2 Related work on functions of matrices and diffusions

Localization in diffusions is broadly related to localization in functions of matrices [5]. The results in that literature tend to focus on the case of banded matrices (e.g., [4]), although there are also discussions of more general results in terms of graphs arising from sparse matrices [5]. In the context of the decay of functions of matrices, our result in this manuscript establishes a strong decay bound without assuming a constant-degree graph. These same types of decay bounds can apply to a variety of graph diffusion models that involve a stochastic matrix [2, 13], and recent work shows that they may even extend beyond this regime [10].

2 A negative result for strong localization

Here we construct an example of a graph that has a seeded PageRank vector that cannot be approximated locally or with a sublinear number of nonzeros. More concretely, we demonstrate the existence of a personalized PageRank vector that requires at least n − (n − 1)ε(1 + α)/α nonzeros to attain a 1-norm accuracy of ε, where n is the number of nodes in the graph. The construction is simple, and we sketch it as follows. Let G be an undirected star graph on n nodes. Then the PageRank vector seeded on the center node has two values: 1/(1 + α) for the center node and α/((1 + α)(n − 1)) for each leaf node. Suppose an approximation x̂ of x has M of these leaf-node entries set to 0. Then the 1-norm error ‖x − x̂‖_1 is at least Mα/((1 + α)(n − 1)). Attaining a 1-norm error of ε requires Mα/((1 + α)(n − 1)) < ε, and so the number of entries of the approximate PageRank vector required to be nonzero, n − M, is lower-bounded by n − (n − 1)ε(1 + α)/α. Thus, the number of nonzeros required in the approximate PageRank vector must be linear in n.
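The closed-form values in this construction are easy to check numerically. The following sketch (illustrative only, not from the paper's code) builds the star's column-stochastic walk matrix with the center at index 0 and verifies both values:

```python
import numpy as np

# Star graph on n nodes: node 0 is the center, nodes 1..n-1 are leaves.
n, alpha = 6, 0.85
P = np.zeros((n, n))
P[1:, 0] = 1.0 / (n - 1)   # from the center, walk to each leaf uniformly
P[0, 1:] = 1.0             # from a leaf, walk back to the center
e_s = np.zeros(n)
e_s[0] = 1.0               # seed on the center node
x = np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * e_s)
assert np.isclose(x[0], 1 / (1 + alpha))                     # center value
assert np.allclose(x[1:], alpha / ((1 + alpha) * (n - 1)))   # leaf values
```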

3 Localization in Personalized PageRank

The example in Section 2 demonstrates that there exist seeded PageRank vectors that are non-local. Here we focus on a particular class of graphs, those with a degree distribution following a power law (Section 3.2), and show that seeded PageRank vectors on these graphs can always be ε-approximated with a number of nonzero entries that depends sublinearly on n.

Theorem 1. Let P be a uniform random-walk transition matrix of a graph with maximum degree d and minimum degree δ. Additionally, suppose the graph has a sub-power-law degree distribution with exponent p, so that the kth largest degree, d(k), satisfies d(k) ≤ d k^{−p}. The Gauss-Southwell coordinate relaxation method applied to the personalized PageRank problem (I − αP)x = (1 − α)e_s produces an approximation x_ε satisfying ‖x − x_ε‖_1 < ε in t steps, with at most t nonzeros in the solution, where t satisfies

    t ≤ (1/δ) C_p (1/ε)^{δ/(1−α)},    (1)

and where we define C_p to be

    C_p := d (1 + log d)    if p = 1,
    C_p := d (1 + ((d/δ)^{1/p − 1} − 1)/(1 − p))    otherwise.

This theorem originates in our recent work on the heat kernel diffusion (see Theorem 5.7 in [11]), and we adapt the techniques used in that proof to the current seeded PageRank setting.

3.1 Deriving the bound

We want to compute an ε-approximation x_ε to the solution of (I − αP) x = (1 − α) e_s for some seed vector e_s. Given an approximation x̂, we can bound the 1-norm error as follows. First, define the residual vector r = (1 − α)e_s − (I − αP)x̂, and note the following relationship between the residual vector and the error vector:

    x − x̂ = (I − αP)^{−1} r.    (2)

Using this relationship, we can bound the approximation's 1-norm accuracy, ‖x − x̂‖_1, by the quantity (1/(1 − α)) ‖r‖_1, because the column-stochasticity of P implies that ‖(I − αP)^{−1}‖_1 = 1/(1 − α). Guaranteeing a 1-norm error ‖x − x̂‖_1 < ε is then a matter of ensuring that ‖r‖_1 < (1 − α)ε holds. To bound the residual norm, we look more closely at a particular method for producing the approximation.

Gauss-Southwell iteration. The Gauss-Southwell algorithm is a coordinate-descent method for solving a linear system, akin to the Gauss-Seidel linear solver. It proceeds by updating the entry of the approximation that corresponds to the largest-magnitude entry of the residual r. Next we describe the Gauss-Southwell update specifically as it is used to solve the seeded PageRank linear system in the theorem statement. The algorithm begins by setting the initial solution x^{(0)} = 0 and r^{(0)} = (1 − α)e_s. In step k, let j = j(k) be the entry of r^{(k)} with the largest magnitude, and let m = r^{(k)}_j. We update the solution and residual vectors as follows:

    x^{(k+1)} = x^{(k)} + m e_j,    (3)
    r^{(k+1)} = (1 − α)e_s − (I − αP) x^{(k+1)},    (4)

and the residual update can be expanded to r^{(k+1)} = r^{(k)} − m e_j + m α P e_j.
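The update (3)-(4) is simple to implement when P is stored by columns. The following sketch (a reference version for illustration, not the paper's implementation; a practical version might track the largest residual entry with a priority queue rather than a full scan) runs the iteration until ‖r‖_1 < (1 − α)ε:

```python
import numpy as np
import scipy.sparse as sp

def gauss_southwell_pagerank(P, alpha, s, eps, max_steps=10**7):
    """Gauss-Southwell updates (3)-(4) for the seeded PageRank system.

    P is the column-stochastic walk matrix (sparse; CSC is efficient here).
    Stopping once ||r||_1 < (1 - alpha)*eps guarantees ||x - x_hat||_1 < eps.
    """
    n = P.shape[0]
    x = np.zeros(n)
    r = np.zeros(n)
    r[s] = 1.0 - alpha
    r_norm = 1.0 - alpha           # residual stays nonnegative, so ||r||_1 = sum(r)
    for _ in range(max_steps):
        if r_norm < (1.0 - alpha) * eps:
            break
        j = int(np.argmax(r))      # largest-magnitude residual entry
        m = r[j]
        x[j] += m                  # update (3)
        r[j] -= m                  # update (4): r <- r - m e_j + m*alpha*P e_j
        col = P.getcol(j)
        r[col.indices] += m * alpha * col.data
        r_norm -= m * (1.0 - alpha)  # ||r||_1 shrinks by m(1 - alpha) each step
    return x, r
```

Each step touches at most one new entry of x, which is what lets the analysis below bound the number of nonzeros by counting steps.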

This application of Gauss-Southwell to personalized-PageRank-style problems has appeared numerous times in the recent literature [6, 7, 15, 17]. In at least one instance ([7], Section 5.2), the authors showed that the residual and solution vectors stay nonnegative throughout this process, assuming the seed vector is nonnegative (which, in our context, it is). So the 1-norm of the residual can be expressed as ‖r^{(k+1)}‖_1 = e^T r^{(k+1)}. Expanding the residual in terms of the iterative update presented above, we can write the residual norm as e^T (r^{(k)} − m e_j + m α P e_j). Then, denoting ‖r^{(k)}‖_1 by r_k and simplifying yields

    r_{k+1} = r_k − m(1 − α).

Next we bound the maximum-magnitude entry, m, in order to bound the residual norm. Observe that since m is the largest-magnitude entry in r^{(k)}, it is in particular at least the average value of r^{(k)}. Let Z(k) denote the number of nonzero entries in r^{(k)}; then the average value can be expressed as r_k / Z(k). Hence we have m ≥ r_k / Z(k), and so we can bound r_k − m(1 − α) above by r_k − r_k (1 − α)/Z(k). Simplifying yields r_{k+1} ≤ r_k (1 − (1 − α)/Z(k)). Recurring this gives

    r_{k+1} ≤ r_0 ∏_{t=0}^{k} (1 − (1 − α)/Z(t)),    (5)

where r_0 = (1 − α) because r^{(0)} = (1 − α)e_s. Then, using the fact that log(1 − x) ≤ −x for x < 1, we can show

    r_{k+1} ≤ (1 − α) ∏_{t=0}^{k} (1 − (1 − α)/Z(t)) ≤ (1 − α) exp( −(1 − α) ∑_{t=0}^{k} 1/Z(t) ).    (6)

To progress from here we need some control over the quantity Z(t). One property that can limit the behavior of Z(t) is the degree distribution of the graph, which we now explore.

3.2 Using the degree distribution

At last we can use the fill-in analysis for power-law graphs presented in [11]. Before continuing, we first establish some notation. Let our graph have maximum degree d and minimum degree δ, and denote the degree of the node with the kth largest degree by d(k). Assuming the graph's degree distribution is power-law or sub-power-law with exponent p, we know d(k) satisfies d(k) ≤ d k^{−p}. (We use sub-power-law here to refer to the fact that d(k) is bounded above by, rather than exactly equal to, this quantity.) We show below that the nonzero fill-in is bounded as follows:

    Z(k) ≤ C_p + δk,    (7)

where the term C_p is defined in the statement of Theorem 1. We remark that the quantity C_p used here is slightly tighter than the value presented in the original paper. Next we use this bound on Z(k) to control the bound on r_k. Lemma 5.6 from [11] implies that

    ∑_{t=0}^{k} 1/Z(t) ≥ (1/δ) log( (δ(k + 1) + C_p) / C_p ),

and so, plugging into (6), we can bound

    r_{k+1} ≤ (1 − α) exp( −((1 − α)/δ) log( (δ(k + 1) + C_p) / C_p ) ),

which simplifies to r_{k+1} ≤ (1 − α) ((δ(k + 1) + C_p)/C_p)^{−(1−α)/δ}. Finally, to guarantee r_k < ε(1 − α), it suffices to choose k so that ((δk + C_p)/C_p)^{−(1−α)/δ} ≤ ε. But this holds if and only if δk + C_p ≥ C_p (1/ε)^{δ/(1−α)}, which, in turn, is guaranteed by k ≥ (1/δ) C_p (1/ε)^{δ/(1−α)}. This proves the bound on the number of iterations.

Proving the degree distribution bound. Here we prove the inequality (7) used in the proof above. We give a sketch, as this is essentially the same as the proof of Lemma 5.5 in [11]. First, observe that the number of nonzeros in the residual after t steps is upper-bounded by the sum of the largest t degrees, Z(t) ≤ ∑_{k=1}^{t} d(k). Now we use the power-law bound on d(k): we substitute the degree bound d(k) ≤ d k^{−p} into this sum. However, this bound is meaningful only when k ≤ (d/δ)^{1/p}, for δ the minimum degree; for k > (d/δ)^{1/p} the quantity d k^{−p} drops below δ, which cannot happen for an actual node's degree, so we bound those degrees by δ instead. Hence, we split the summation into two pieces:

    Z(t) ≤ ∑_{k=1}^{t} d(k) ≤ ∑_{k=1}^{⌈(d/δ)^{1/p}⌉} d k^{−p} + ∑_{k=⌈(d/δ)^{1/p}⌉+1}^{t} δ.

We want to prove that this implies Z(t) ≤ C_p + δt. The second summand is straightforward to majorize by δt. The first summand can be upper-bounded by d (1 + ∫_1^{(d/δ)^{1/p}} x^{−p} dx) using a right-hand integral rule, and this integral is straightforward to bound above by the quantity C_p defined in Theorem 1. This completes the proof.

4 Experiments

We present experimental results on the localization of seeded PageRank vectors on random power-law graphs and compare the actual sparsity with the predictions of our theoretical bound. This involves generating random power-law graphs (Section 4.1) and then comparing the experimental localization with our theoretical bound (Section 4.3). The bound is not particularly accurate, and so we conjecture a new bound that better predicts the behavior we witness (Section 4.4).

4.1 Generating the graphs

For experimental comparison, we wanted a test suite of graphs with varying but specific sizes and degree distributions. To produce these graphs, we use the Bayati-Kim-Saberi procedure [3] for generating undirected graphs with a prescribed degree distribution. The degree distributions used follow a power law in the first (d/δ)^{1/p} elements, with degrees d/i^p; all other nodes are assigned degree δ, up to a total size of n. We choose the maximum degree d to be √n and δ = 2. After generating the degree sequence v, we use the Erdős-Gallai conditions and the Havel-Hakimi algorithm to check whether v is graphical. If the generated sequence fails, we perturb the vector slightly and recheck the conditions; this step must pass for a graph to be generated. The check usually fails because the degree sequence sums to an odd number, in which case it suffices to increase the degree of one of the minimum-degree nodes by 1. After the graph is generated, we verify that it contains a large connected component, and we proceed with the graph when the largest component includes nearly all of the n nodes.
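For illustration, the following sketch (assuming the NetworkX library, and using its Havel-Hakimi construction as a simple stand-in for the Bayati-Kim-Saberi sampler) builds such a degree sequence, applies the parity fix, and checks graphicality:

```python
import math
import networkx as nx

def powerlaw_degree_sequence(n, p, delta=2):
    """Build the paper's degree sequence: the first ceil((d/delta)^(1/p)) nodes
    get degrees d/i^p (clamped below by delta) with d = sqrt(n); every other
    node gets degree delta."""
    d = int(math.sqrt(n))
    head = int(math.ceil((d / delta) ** (1.0 / p)))
    seq = [max(int(d / (i ** p)), delta) for i in range(1, head + 1)]
    seq += [delta] * (n - len(seq))
    if sum(seq) % 2 == 1:          # an odd degree sum is never graphical:
        seq[-1] += 1               # bump one minimum-degree node by 1
    return seq

seq = powerlaw_degree_sequence(10**4, p=1.25)
assert nx.is_graphical(seq)        # Erdos-Gallai check
# Havel-Hakimi used here as a stand-in for the Bayati-Kim-Saberi sampler.
G = nx.havel_hakimi_graph(seq)
largest_cc = max(nx.connected_components(G), key=len)
print(len(largest_cc) / G.number_of_nodes())
```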

4.2 Measuring the non-zeros

We first compute a PageRank vector to high accuracy using the power method, with a 1-norm error tolerance far below the smallest ε we study. This requires log(ε/2)/log(α) iterations, based on the geometric convergence of the power method at rate α. We then study vectors x_ε satisfying ‖x_ε − x‖_1 ≤ ε for the accuracies ε = [10^{−1}, 10^{−2}, 10^{−3}, 10^{−4}]. To count the number of nonzeros in a vector x_ε for a particular accuracy ε, we first recall

    f_α(ε) = max_P max_s min { nonzeros(x_ε) : ‖x_ε − x(α, P, s)‖_1 ≤ ε }.

Thus, we need to compute x_ε in a way that includes as many zeros as possible, subject to the constraint that the 1-norm of the difference between x_ε and x stays at most ε. The idea is to use the solution vector x and generate x_ε from it by zeroing entries as long as the error difference stays less than the chosen ε. The following steps illustrate this process:

1. Compute the PageRank vector x to high accuracy via the power method.
2. Sort x in ascending order.
3. Determine the largest index j so that the sum of entries 1 through j of the sorted vector is less than ε.
4. Truncate these j entries to 0. Then x_ε contains n − j nonzeros.

4.3 Testing the theoretical bound

To test the effectiveness of our theoretical bound in Theorem 1, we generate graphs with different values of the power-law exponent, p = [0.5, 0.75, 1, 1.25], and with different sizes, n = 10^4 through 10^9. Then we solve the seeded PageRank system, seeded on the node of maximum degree, using parameter settings α = [0.25, 0.3, 0.5, 0.65, 0.85].
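Returning to the measurement procedure of Section 4.2, here is a minimal sketch (our illustration, not the paper's code) in which a cumulative sum over the ascending-sorted entries implements steps 2-4:

```python
import numpy as np

def sparsest_approximation(x, eps):
    """Zero out the largest prefix of the ascending-sorted entries of x whose
    sum stays below eps, so that ||x - x_eps||_1 < eps with as few nonzeros
    as possible."""
    order = np.argsort(x)                   # ascending order of entries
    prefix = np.cumsum(x[order])
    j = int(np.searchsorted(prefix, eps))   # largest j with prefix sum < eps
    x_eps = x.copy()
    x_eps[order[:j]] = 0.0
    return x_eps, x.size - j                # approximation and its nonzero count
```

Applying this to the high-accuracy power-method solution would produce the nonzero counts studied in the figures below.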

Figure 1 shows the outcome of this experiment with p = 1.25. We can see that the theoretical bound becomes slightly more accurate as the graph size grows, yet it stays far from the curve of the sparsity of the ε-approximate diffusion. Since the theoretical bound behaves poorly even at the extreme points of the parameter settings, we sought a tighter empirical bound.

Fig. 1. A log-log plot of nnz(x_ε) versus 1/ε obtained for different experiments as α varies. We fix p = 1.25 for all plots and run experiments on graphs of sizes [10^4, 10^5, 10^6, 10^7, 10^8, 10^9]. The red dashed line represents a vector with all nonzeros present (i.e., a ratio of 1). The black dashed line shows the bound predicted by Theorem 1. The blue curve shows the actual ratio of nonzeros found.

4.4 Experimental analysis

In this section, we present a new conjectured bound that better predicts the behavior of the number of nonzeros in x_ε as the other parameters vary. To derive a relationship between nnz(x_ε), ε, p, and α, we consider the following experiment. We fix n = 10^6 and p = 1.25 and generate a graph as described in Section 4.1. We then solve the PageRank problem and find the number of nonzeros for different ε values as described in Section 4.2. We use α = [0.25, 0.3, 0.5, 0.65, 0.85] and count the number of nonzeros in the diffusion vector for four accuracy settings, ε = [10^{−1}, 10^{−2}, 10^{−3}, 10^{−4}].

We then generate a log-log plot of nnz(x_ε) versus 1/ε for the different values of α. (The choice of the normalization nnz(x_ε)/(d log d) on the vertical axis was based on pieces of the theoretical bound.) The outcome is illustrated in Figure 2 (left). From Figure 2, we can see that as α increases, the values of nnz(x_ε) also increase, interestingly, almost as a linear shift. We prefer to focus on (1 − α), and so we restate this observation in terms of (1 − α): as (1 − α) decreases, the curves representing nnz(x_ε) shift upward, nearly linearly. The initial goal is to find a relation of the form nnz(x_ε) ≤ g(α, ε, p) for some function g. Since nnz(x_ε) seems to vary inversely with (1 − α), we choose this relation to be

    nnz(x_ε) ≤ (c_1 / (1 − α)) g(ε, p).

We similarly derive a relation between nnz(x_ε) and p. To study this relationship, we fix n = 10^6, generate graphs with different power-law exponents, namely p = [0.5, 0.75, 1, 1.25], solve the PageRank problem with α = 0.5, and count the number of nonzeros for the four accuracy settings. We report the results in Figure 2 (right). From Figure 2 (right), we can see that as the value of p increases, the curves of nnz(x_ε) grow much more slowly. Furthermore, the difference between the curves widens exponentially as 1/ε increases, which leads us to model the dependence on p as an exponential function of 1/ε. Also, since p and nnz(x_ε) are inversely related, we consider 1/p rather than p. Therefore, we arrive at a relationship of the form

    nnz(x_ε) ≤ (c_1 / (1 − α)) (1/ε)^{c_2/p} c_3

for some constants c_1, c_2, c_3. After experimenting with the above bound, we found that the best bound is achieved at c_1 = 0.2, c_2 = 0.25, c_3 = 5.

4.5 Results

The experimental bound derived in Section 4.4 is now

    nnz(x_ε) ≤ (2 / (1 − α)) (1/ε)^{1/(2p)}.    (8)

In what follows, we demonstrate the effectiveness of this bound in describing the localization of seeded PageRank vectors computed with different values of α, on power-law graphs of different sizes and with varying power-law exponents. For each set of parameters (graph size n, power-law exponent p, and PageRank constant α), a plot in Figures 3 and 4 displays the number of nonzeros needed to approximate a PageRank vector with 1-norm accuracy ε. In other words, each plot shows how f_α(ε) grows as 1/ε grows for one type of graph. The blue curve represents the actual number of nonzeros required in the ε-approximation. Each plot also has a black dashed line showing the prediction of our conjectured bound (8). We note that our conjectured bound fails in a few sub-plots of Figure 3, for p = 0.5 and α = 0.85. This is likely because we tuned the constants in our bound from less extremal parameter settings; in contrast, the settings p = 0.5 and α = 0.85 represent the densest graphs and the most diffusive PageRank setting among those we consider.
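To see how loose Theorem 1 is relative to the conjectured bound, the following sketch evaluates both side by side (illustrative only; the C_p expression mirrors our statement of Theorem 1, and d = √n = 1000, δ = 2 correspond to the n = 10^6 experimental setup):

```python
from math import log

def theorem1_bound(d, delta, p, alpha, eps):
    """Bound (1): t <= (1/delta) * C_p * (1/eps)^(delta/(1-alpha))."""
    if p == 1:
        C_p = d * (1 + log(d))
    else:
        C_p = d * (1 + ((d / delta) ** (1.0 / p - 1.0) - 1.0) / (1.0 - p))
    return (C_p / delta) * (1.0 / eps) ** (delta / (1.0 - alpha))

def conjectured_bound(p, alpha, eps):
    """Conjectured bound (8): nnz <= (2/(1-alpha)) * (1/eps)^(1/(2p))."""
    return (2.0 / (1.0 - alpha)) * (1.0 / eps) ** (1.0 / (2.0 * p))

# Example setting: n = 10^6, so d = sqrt(n) = 1000, with delta = 2.
for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(eps, theorem1_bound(1000, 2, 1.25, 0.5, eps),
          conjectured_bound(1.25, 0.5, eps))
```

Even at ε = 10^{−1}, the Theorem 1 value exceeds the conjectured one by several orders of magnitude, which matches the looseness visible in Figure 1.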

Fig. 2. Log-log plots of nnz(x_ε) versus 1/ε obtained on graphs of size n = 10^6 as α and p vary. At left, p is fixed to p = 1.25, and the black, green, blue, red, and dotted black curves represent nnz(x_ε) for the α values [0.25, 0.3, 0.5, 0.65, 0.85], respectively. At right, α is fixed to α = 0.5, and the dashed blue, black, green, and red curves represent nnz(x_ε) for the p values [0.5, 0.75, 1, 1.25], respectively.

5 Discussion

We have shown that seeded PageRank vectors, though not localized on all graphs, must behave locally on graphs with power-law degree distributions. Our experiments show our theoretical bound to be terribly loose. In some sense this is to be expected, as our algorithmic analysis is worst-case. However, it is not clear that any real-world graphs realize these worst-case scenarios. We thus plan to continue our study of simple graph models to identify empirical and theoretical localization bounds based on the parameters of the models. This will include a theoretical justification or revision of the empirically derived bound. It will also include new studies of Chung-Lu graphs as well as the Havel-Hakimi construction itself. Finally, we also plan to explore the impact of local clustering. Our conjecture is that this should exert a powerful localization effect beyond that due to the degree distribution.

One open question sparked by our work regards the relationship between localized solutions and constant or shrinking average distance in graphs. It is well known that social networks appear to have shrinking or constant effective diameters. Existing results in the theory of localization of functions of matrices imply that a precise bound on diameter would force delocalization as the graph grows. Although the localization theory says nothing about average distance or small effective diameters, it hints that the solutions would delocalize. However, solutions often localize nicely in real-world networks, and we wish to understand the origins of this empirical localization behavior more fully.

Fig. 3. (Panels for p = 0.5 and p = 0.75; rows correspond to α = [0.25, 0.3, 0.5, 0.65, 0.85] and columns to n = 10^4 through 10^9.) Each sub-plot has an x-axis representing 1/ε and a y-axis representing the ratio of nonzeros present in a diffusion vector of 1-norm accuracy ε. The red dashed line represents a vector with all nonzeros present (i.e., a ratio of 1). The black dashed line shows the prediction of our conjectured bound (8). The blue curve shows the actual ratio of nonzeros found. As the graphs get bigger (i.e., the fourth and fifth columns), the conjectured bound (black line) almost exactly predicts the locality of the ε-approximate diffusion.

Fig. 4. (Panels for p = 1 and p = 1.25; rows correspond to α = [0.25, 0.3, 0.5, 0.65, 0.85] and columns to n = 10^4 through 10^9.) Each sub-plot has an x-axis representing 1/ε and a y-axis representing the ratio of nonzeros present in a diffusion vector of 1-norm accuracy ε. The red dashed line represents a vector with all nonzeros present (i.e., a ratio of 1). The black dashed line shows the prediction of our conjectured bound (8). The blue curve shows the actual ratio of nonzeros found. As the graphs get bigger (i.e., the fourth and fifth columns), the conjectured bound (black line) almost exactly predicts the locality of the ε-approximate diffusion.

References

1. Andersen, R., Chung, F., Lang, K.: Local graph partitioning using PageRank vectors. In: FOCS (2006)
2. Baeza-Yates, R., Boldi, P., Castillo, C.: Generalizing PageRank: Damping functions for link-based ranking algorithms. In: SIGIR (2006)
3. Bayati, M., Kim, J., Saberi, A.: A sequential algorithm for generating random graphs. Algorithmica 58(4) (2010)
4. Benzi, M., Razouk, N.: Decay bounds and O(n) algorithms for approximating functions of sparse matrices. ETNA 28 (2007)
5. Benzi, M., Boito, P., Razouk, N.: Decay properties of spectral projectors with applications to electronic structure. SIAM Review 55(1), 3-64 (2013)
6. Berkhin, P.: Bookmark-coloring algorithm for personalized PageRank computing. Internet Mathematics 3(1) (2007)
7. Bonchi, F., Esfandiar, P., Gleich, D.F., Greif, C., Lakshmanan, L.V.: Fast matrix computations for pairwise and columnwise commute times and Katz scores. Internet Mathematics 8(1-2) (2012)
8. Chung, F.: The heat kernel as the PageRank of a graph. Proceedings of the National Academy of Sciences 104(50) (December 2007)
9. Freschi, V.: Protein function prediction from interaction networks using a random walk ranking algorithm. In: BIBE (2007)
10. Ghosh, R., Teng, S.H., Lerman, K., Yan, X.: The interplay between dynamics and networks: Centrality, communities, and Cheeger inequality. In: KDD (2014)
11. Gleich, D.F., Kloster, K.: Sublinear column-wise actions of the matrix exponential on social networks. Internet Mathematics (2014)
12. Gori, M., Pucci, A.: ItemRank: A random-walk based scoring algorithm for recommender engines. In: IJCAI (2007)
13. Huberman, B.A., Pirolli, P.L.T., Pitkow, J.E., Lukose, R.M.: Strong regularities in World Wide Web surfing. Science 280(5360) (1998)
14. Jain, A., Pantel, P.: FactRank: Random walks on a web of facts. In: COLING (2010)
15. Jeh, G., Widom, J.: Scaling personalized web search. In: WWW (2003)
16. Kloster, K., Gleich, D.F.: Heat kernel based community detection. In: KDD (2014)
17. McSherry, F.: A uniform approach to accelerated PageRank computation. In: WWW (2005)
18. Morrison, J.L., Breitling, R., Higham, D.J., Gilbert, D.R.: GeneRank: Using search engine technology for the analysis of microarray experiments. BMC Bioinformatics 6(1), 233 (2005)
19. Nie, Z., Zhang, Y., Wen, J.R., Ma, W.Y.: Object-level ranking: Bringing order to web objects. In: WWW (2005)
20. Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing order to the web. Tech. Rep., Stanford University (1999)
