A Fast Two-Stage Algorithm for Computing PageRank


Chris Pan-Chi Lee, Stanford University (Scientific Computing & Computational Mathematics Program)
Gene H. Golub, Stanford University (Department of Computer Science)
Stefanos A. Zenios, Stanford University (Division of Operations, Information, & Technology, Graduate School of Business)

ABSTRACT
In this paper we present a fast two-stage algorithm for computing the PageRank [16] vector. Our algorithm exploits the observation that the homogeneous discrete-time Markov chain associated with PageRank is lumpable [13]; the lumpable subset of nodes is precisely the set of dangling nodes. As a result, the algorithm can converge in a fraction of the time required by the standard PageRank algorithm [16]. On data of 451,237 webpages, our two-stage algorithm converged in only 20% of the time required by the standard PageRank algorithm. The algorithm described here also replaces a common practice that is in general not correct: including the dangling nodes only during the last stages of computation [16] does not necessarily accelerate convergence in a general context. Our algorithm, on the other hand, is provable, generally applicable, and achieves the desired speed gains.

Keywords: PageRank, link analysis, dangling nodes, Power Method, eigenvector computation, limiting distribution, statespace reduction, state aggregation, lumpable Markov chains

1. INTRODUCTION
Aside from its commercial success, the PageRank approach to ranking webpages has generated a significant amount of interest in the research community. The Markov chain interpretation gives an explicit model for web traffic and surfer behavior, yet the computation poses a numerically daunting challenge [15]. With billions of webpages already in existence, computing the PageRank vector is a very time-consuming procedure. It is reported in [11] that the computation of a PageRank vector over 290 million webpages requires as much as 3 hours (on a 1.5GHz AMD Athlon with 3.5GB of RAM); the computing time for a realistically large subset of the entire web would run to days. Furthermore, frequent computation of the PageRank vector is often necessary. With webpages constantly updated, added, or removed, the PageRank vector needs to be re-computed continuously to maintain the timeliness and relevance of the search results. In the context of personalized web search [9], a number of PageRank vectors need to be computed to reflect the preferences of different classes of websurfers. Clearly, there is a demand for faster algorithms.

The PageRank vector can be regarded as the limiting distribution of a homogeneous discrete-time Markov chain that jumps from webpage to webpage. In this paper we present a fast algorithm for computing the PageRank vector. The algorithm exploits the observation that this Markov chain is in fact lumpable [13], and it proceeds in two stages. In the first stage, we compute the limiting distribution of the chain in which the dangling nodes [16] are combined into one super node; in the second stage, we compute the limiting distribution of the chain in which the non-dangling nodes are combined into one. (In this paper, the terms node, state, and webpage are used interchangeably.) When the limiting distributions of the two chains are concatenated, we recover the limiting distribution of the original chain, i.e. the PageRank vector. As we shall see, this approach can dramatically reduce the overall amount of computing time.

A number of papers discuss accelerating PageRank computation, and many of these focus on numerical linear algebra techniques.
A Gauss-Seidel algorithm is discussed in [1], where the most recent component values of the PageRank vector are used in the computation. In [12], one periodically subtracts away approximations of the sub-dominant eigenvectors to accelerate convergence. It is noted in [11] that, when sorted by URL, the Google matrix has a block structure; hence a PageRank vector can be computed separately for each block, and the results are pasted together to yield a good starting iterate for the entire matrix. It is noted in [10] that components of the PageRank vector converge at different rates, and hence performance gains are realized by not re-computing components that have already converged.

This paper contributes to this growing literature in a number of ways. First, by adopting a characteristically Markov-chain view and observing that the chain associated with PageRank is lumpable, we not only achieve performance gains but also bring to the forefront a powerful technique for statespace reduction. This technique of lumping is distinctively different from the better-known technique of state aggregation (cf. [3], [14], [17]), which we also make use of in this paper. Thus we have a two-stage algorithm in which a different statespace reduction method is used during each stage; the reduction is aggressive, the overall performance gains are very significant, and the concept is novel. Second, our approach is analyzable. Whereas previous methods sometimes rely on intuition and approximate arguments, our procedure can be analyzed with greater precision, leading to some very interesting results. In addition, we show that the common practice of including the dangling nodes only during the last stages of computation [16] does not accelerate convergence in general and can be replaced by our present algorithm.

Lastly, our approach is easily combined with many other methods to obtain even greater performance gains. For example, all of the existing methods described above can be combined with our approach, especially during the first stage of our algorithm.

Notation
Notation in this paper is as follows. If v is a vector, then v(i) denotes the i-th element of v. If M is a matrix, then M(i,j) denotes the element in the i-th row and the j-th column; M(i:j, k:l) denotes the elements in rows i through j and columns k through l; M(i,:) denotes the entire i-th row; and so on. Superscripts and subscripts may have different meanings depending on the context, but the meaning is always made clear. An un-transposed vector is always a column vector; the transpose is superscripted with a T. The norm ‖·‖_1 is the sum of the absolute values of a vector's components: for example, ‖v‖_1 = Σ_i |v(i)|. The notation e_n means an n-dimensional vector of 1's.

2. PAGERANK REVIEW
The central idea behind PageRank is to regard web surfing as a Markov chain. Imagine a collection of webpages indexed as S = {1, 2, ..., N}, and suppose we have a personalization vector u ∈ R^{N×1} which records a generic surfer's preference for each page in S (the personalization vector is assumed to be componentwise positive and normalized so that Σ_{l=1}^N u(l) = 1). Let this generic surfer be currently at some page i ∈ S. We assume that at the next time step the surfer will move to some j ∈ S with probability

$$Q(i,j) = \begin{cases} \dfrac{G(i,j)}{\sum_{l=1}^{N} G(i,l)} & \text{if } G(i,l) = 1 \text{ for some } l, \\[4pt] u(j) & \text{otherwise,} \end{cases}$$

where

$$G(i,j) = \begin{cases} 1 & \text{if there is an outlink from } i \text{ to } j, \\ 0 & \text{otherwise.} \end{cases}$$

The above definition has a nice interpretation. If the i-th page has outlinks, the surfer will move to one of the outlinked pages with equal probability; this corresponds to the first case in the definition of Q above. If no outlink from i exists, the surfer will move to any page in S with a probability according to preference; this corresponds to the second case above. A page that has no outlink is called a dangling page.

At each time step k = 0, 1, 2, ..., we assume the surfer jumps from page to page according to the above probabilities. This gives rise to a homogeneous discrete-time Markov chain, and Q = [Q(i,j)] is simply its transition probability matrix. Let π_0 ∈ R^{N×1} denote the probability distribution of where the surfer is to be found at the initial time step; then the distribution for the k-th time step is given by π_k^T = π_0^T Q^k. The idea of PageRank is that the importance of webpages can be defined as the limiting probability distribution associated with this Markov chain as k → ∞. Such a limiting distribution can be interpreted as the proportion of time the surfer spends on each webpage, and it certainly quantifies the notion of importance. Unfortunately, there is nothing in our definition so far that guarantees convergence to such a limiting distribution as k → ∞.

The solution is to consider a closely related Markov chain obtained by adding to Q a small shift. We take

$$P = cQ + (1-c)\, e_N u^T \qquad (1)$$

to be our new transition probability matrix. Here e_N u^T is simply a matrix in which each row is u^T, and c is a constant in (0,1); in some sense P approximates Q for c close to 1 (a typical value for c is 0.85 or higher, and it is shown in [6] that c controls the convergence rate of the PageRank algorithm). It is easily seen that P is a positive matrix and that each row sums to 1. In particular, the Markov chain associated with P is irreducible and aperiodic (the positivity ensures a direct positive-probability path between any two pages, and hence the irreducibility and aperiodicity; cf. [2], [13]). The Perron-Frobenius Theorem and the Power Method (cf. [2], [5]) guarantee that for such a matrix a unique limiting distribution π^T = lim_{k→∞} π_0^T P^k exists regardless of the initial distribution.
The PageRank vector is defined to be this limiting distribution π. P is also known as the Google matrix.

P has some interesting properties. Let S_D denote the subset of S containing only the dangling nodes, and let S_ND = S \ S_D be the subset containing only the non-dangling nodes. Then

$$P(i,:) = \begin{cases} c\,\dfrac{G(i,:)}{\sum_{l=1}^{N} G(i,l)} + (1-c)\,u^T & \text{if } i \in S_{ND}; \\[4pt] u^T & \text{if } i \in S_D. \end{cases} \qquad (2)$$

All the rows of P that correspond to a dangling node are identically u^T. The rows that correspond to a non-dangling node are separable into two components: the first component consists of the contribution from G, i.e. the actual outlinks, and the second component is u^T.

Below we give the standard algorithm [16] for computing the PageRank vector π. The algorithm proceeds by taking an arbitrary vector and multiplying it by P repeatedly until convergence; it is an implementation of the Power Method.

ALGORITHM 1 (PAGERANK).
  form P̄, where P̄(i,j) = G(i,j) / Σ_{l=1}^N G(i,l) if i ∈ S_ND, and P̄(i,j) = 0 if i ∈ S_D
  select any y ∈ R^{N×1}
  do
    x = y
    y^T = c x^T P̄
    d = ‖x‖_1 − ‖y‖_1
    y = y + d·u
    δ = ‖y − x‖_1
  until δ < ε

Notice that P is never actually enumerated. Instead, a matrix P̄ is formed. P̄ comprises the contribution from G only; the contribution from u is left out completely. Because most webpages have only a handful of outlinks, P̄ is mostly zeros and is an extremely sparse matrix. The multiplication step x^T P̄ can hence be implemented very efficiently. The contribution of u is subsequently added in during the y + d·u step. Based on this approach, each iteration of the loop can be performed in O(N) operations. In comparison, if P were explicitly enumerated, each iteration would require O(N^2) operations, which is prohibitively expensive due to the large size of N (as of the year 2000, the number of webpages was already on the order of billions; see [15]). We emphasize that the computational savings of the PageRank algorithm are achieved by recognizing that P is separable into a sparse matrix P̄ plus a dense vector u; the multiplication is carried out separately on those components and the results are added together.
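To make the sparse-plus-dense update concrete, here is a minimal Python sketch of Algorithm 1 (our own illustration, not the authors' code): it builds the sparse matrix P̄ from a 0/1 link matrix G and a personalization vector u, and repeats the update above until the one-norm change falls below the tolerance. The function name pagerank_power and the tiny example graph are ours.

```python
import numpy as np
import scipy.sparse as sp

def pagerank_power(G, u, c=0.85, tol=1e-8):
    """Sketch of Algorithm 1: power iteration with P split into the sparse part
    P_bar (outlink structure only) and the dense personalization vector u."""
    N = G.shape[0]
    out_deg = np.asarray(G.sum(axis=1)).ravel()        # number of outlinks per page
    scale = np.zeros(N)
    scale[out_deg > 0] = 1.0 / out_deg[out_deg > 0]    # dangling rows stay all-zero
    P_bar = sp.diags(scale) @ G                        # sparse, row-(sub)stochastic
    y = np.full(N, 1.0 / N)
    while True:
        x = y
        y = c * (P_bar.T @ x)                          # y^T = c x^T P_bar, O(nnz) work
        d = np.abs(x).sum() - np.abs(y).sum()          # mass lost to dangling rows and (1-c)
        y = y + d * u                                  # add the dense contribution of u
        if np.abs(y - x).sum() < tol:
            return y

# Tiny example: 4 pages, pages 3 and 4 (indices 2 and 3) are dangling.
G = sp.csr_matrix(np.array([[0, 1, 1, 0],
                            [1, 0, 0, 1],
                            [0, 0, 0, 0],
                            [0, 0, 0, 0]], dtype=float))
u = np.full(4, 0.25)
print(pagerank_power(G, u))                            # sums to 1; dangling pages are still ranked
```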

3. PAGERANK AS A LUMPABLE MARKOV CHAIN
Our goal in this paper is to present an algorithm that is a substantial improvement over Algorithm 1. Our approach is based on the observation that the Markov chain associated with P is lumpable (cf. [13], [4]). In general, a Markov chain is lumpable if its transition probabilities satisfy certain properties that allow its states (nodes) to be combined into blocks (super nodes). The block-level transitions yield another Markov chain, the transition probabilities of which can be very easily calculated. Unlike conventional state aggregation (cf. [3], [14], [17]), lumping doesn't require prior knowledge or computation of aggregation weights. Lumping is thus very effective in reducing the size of the statespace.

DEFINITION 1. Suppose M ∈ R^{n×n} is the transition probability matrix of a homogeneous discrete-time Markov chain with n states. Let S_1, S_2, ..., S_p ⊆ {1, 2, ..., n} be such that

$$\bigcup_{l=1}^{p} S_l = \{1, 2, \ldots, n\}, \qquad S_l \cap S_m = \emptyset \quad \text{for } 1 \le l \ne m \le p.$$

Then the Markov chain is said to be lumpable with respect to the partition S_1, S_2, ..., S_p if for all l, m ∈ {1, 2, ..., p}, every i ∈ S_l satisfies

$$\sum_{j \in S_m} M(i,j) = c(l,m), \qquad (3)$$

where the right-hand side is a constant that depends only on l and m. The transition probability matrix for the p-by-p lumped chain is

$$\tilde M = [\,c(l,m)\,]. \qquad (4)$$

The key here is that the right-hand side of (3) depends only on l and m. We think of S_1, S_2, ..., S_p as blocks of nodes, and (3) requires every node in the same block to depart for another block with an identical probability. There is thus some notion of symmetry within each block. Lumping the Markov chain exploits this symmetry by discarding the within-block details and focusing on the between-block transitions. Because of the symmetry, the computation of the block-level transition probability matrix (4) involves minimal effort.

We now claim that the Markov chain associated with P is lumpable with respect to the partition in which all the dangling nodes are lumped into one block and each non-dangling node is a singleton block.

PROPOSITION 1. For each k ∈ S_ND, define S_k = {k}. The homogeneous discrete-time Markov chain associated with P is lumpable with respect to the partition consisting of S_k for each k ∈ S_ND, together with S_D. This is a partition with card(S_ND) + 1 blocks.

PROOF. Condition (3) holds by construction whenever the originating block is a singleton S_k, k ∈ S_ND. In addition, for every i ∈ S_D we have Σ_{j ∈ S_m} P(i,j) = P(i,m) = u(m) for every singleton block S_m, m ∈ S_ND, and Σ_{j ∈ S_D} P(i,j) = Σ_{j ∈ S_D} u(j), which is a constant. See (2).

Thus by lumping the dangling nodes into one block we obtain a Markov chain with just card(S_ND) + 1 states, compared to card(S_ND) + card(S_D) states in the original chain. The lumped Markov chain is irreducible and aperiodic, since its transition probability matrix is necessarily positive. The Perron-Frobenius Theorem and the Power Method guarantee the existence of a unique limiting distribution. This limiting distribution is a vector with card(S_ND) + 1 components: card(S_ND) of the components are identical to the components of π corresponding to S_ND, and the remaining component equals the sum of the components of π corresponding to S_D. The benefit of lumping the dangling nodes is that typically card(S_D) is very large, often several times larger than card(S_ND) (according to [11], a 2001 crawl by Stanford's WebBase project [7] contains 290 million pages in total, of which only 70 million are non-dangling). Lumping can dramatically reduce the size of the transition probability matrix and enables the limiting distribution to be computed with much less effort.

4. A TWO-STAGE ALGORITHM
While the lumped chain can be used to compute the limiting probability of the non-dangling nodes (i.e.
π(k) for k ∈ S_ND), we are still left with the task of computing the limiting probability of the dangling nodes (i.e. π(k) for k ∈ S_D). As it turns out, once we have computed the limiting probability for the non-dangling nodes, the limiting probability for the dangling nodes can be computed with very little additional work. This is done by considering yet another Markov chain, this time obtained by combining all the non-dangling nodes into one block and treating each dangling node as a singleton block. This is a Markov chain with card(S_D) + 1 states. We emphasize that this Markov chain is not obtained by lumping, as lumping is not applicable with respect to this particular partition. Rather, we use the traditional state aggregation technique to combine the non-dangling nodes. The procedure requires aggregation weights, but we can readily compute these weights using the limiting probability of the non-dangling nodes.

To summarize, we propose a two-stage algorithm that can be outlined as follows:

1. Compute the transition probability matrix P^(1) of the lumped chain where the dangling nodes are combined into one block. See Proposition 1.
2. Compute the limiting distribution of P^(1). This gives us π(k) for each k ∈ S_ND, as well as Σ_{k ∈ S_D} π(k). The computation is an iterative procedure similar to Algorithm 1, and this step constitutes the bulk of the total work.
3. Compute the weights for state aggregation, π(k) / Σ_{m ∈ S_ND} π(m), for each k ∈ S_ND.
4. Compute the transition probability matrix P^(2) of the Markov chain where the non-dangling nodes are combined into a block. This requires the weights computed in Step 3.
5. Compute the limiting distribution of P^(2). This yields π(k) for each k ∈ S_D, as well as Σ_{k ∈ S_ND} π(k). The amount of work involved is negligible compared to Step 2, as we'll show.
6. Concatenate the results from Step 2 and Step 5 to get π(k) for all k ∈ S. This is the limiting distribution of P, i.e. the PageRank vector.
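Before formalizing these steps, the following small Python illustration (ours, not from the paper) lumps the dangling nodes of a dense toy Google matrix P into one super node, as in Proposition 1, and checks numerically that the lumped chain reproduces the first K PageRank components and the total dangling mass. The helper name lumped_matrix is hypothetical, and the paper's algorithms never form P or P^(1) densely; this is purely for intuition.

```python
import numpy as np

def lumped_matrix(P, K):
    """Lump states K..N-1 (the dangling nodes) of a dense Google matrix P into one
    super node (Proposition 1). For i < K the lumped row is [P[i,:K], sum(P[i,K:])];
    since all dangling rows of P are identical, any one of them gives the last row."""
    top = np.hstack([P[:K, :K], P[:K, K:].sum(axis=1, keepdims=True)])
    bottom = np.hstack([P[K, :K], [P[K, K:].sum()]])
    return np.vstack([top, bottom])

c, N, K = 0.85, 4, 2
u = np.full(N, 0.25)
Q = np.array([[0.0, 0.5, 0.5, 0.0],      # page 1 links to pages 2 and 3
              [0.5, 0.0, 0.0, 0.5],      # page 2 links to pages 1 and 4
              u, u])                     # dangling pages 3 and 4 jump according to u
P = c * Q + (1 - c) * np.outer(np.ones(N), u)
P1 = lumped_matrix(P, K)                 # (K+1)-by-(K+1), rows sum to 1

pi_full = np.linalg.matrix_power(P.T, 200)[:, 0]    # limiting distribution of P
pi_lump = np.linalg.matrix_power(P1.T, 200)[:, 0]   # limiting distribution of the lumped chain
print(np.allclose(pi_lump[:K], pi_full[:K]))        # True: non-dangling components agree
print(np.isclose(pi_lump[K], pi_full[K:].sum()))    # True: last component = total dangling mass
```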

We'll now formalize Steps 1, 2, 4, and 5, give specific numerical algorithms for an efficient implementation of these steps, and discuss performance issues. To simplify the notation, throughout the subsequent sections we'll assume, without loss of generality, that S_ND = {1, 2, ..., K} and S_D = {K+1, K+2, ..., N}. P can be partitioned accordingly as

$$P = \begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix} = \begin{pmatrix} P_{11} & P_{12} \\ e_{N-K}\, u_K^T & e_{N-K}\, u_{N-K}^T \end{pmatrix} \qquad (5)$$

where P_11, P_12, P_21, and P_22 are K-by-K, K-by-(N−K), (N−K)-by-K, and (N−K)-by-(N−K) blocks respectively; the first K rows and columns are associated with the non-dangling nodes, and the last N−K rows and columns are associated with the dangling nodes. Likewise, we partition the personalization vector as u^T = [u_K^T, u_{N-K}^T]. Note that (5) follows from (2).

4.1 Formalizing Steps 1 & 2
According to Proposition 1, the transition probability matrix for the lumped chain is given by

$$P^{(1)} = \begin{pmatrix} P_{11} & P_{12}\, e_{N-K} \\ u_K^T & u_{N-K}^T\, e_{N-K} \end{pmatrix} \qquad (6)$$

This is a (K+1)-by-(K+1) matrix. The matrix is positive, and each row sums to 1.

Recall that Algorithm 1 was able to perform each iteration of the multiplication step x^T P in O(N) rather than O(N^2) operations by separating P into two parts: a sparse matrix P̄ and a dense vector u. Multiplication was done on these parts separately, and the results were subsequently added together. Here we can do the same by separating P^(1) into sparse and dense-vector parts. A mathematically equivalent form of (6) (cf. (2)) is

$$P^{(1)} = c \begin{pmatrix} \bar P^{(1)} \\ 0 \end{pmatrix} + \begin{pmatrix} (1-c)\, e_K \\ 1 \end{pmatrix} \tilde u^T \qquad (7)$$

where, for 1 ≤ i ≤ K,

$$\bar P^{(1)}(i,:) = \left( \dfrac{G(i, 1\!:\!K)}{\sum_{l=1}^{N} G(i,l)} \quad\; 1 - \dfrac{G(i, 1\!:\!K)\, e_K}{\sum_{l=1}^{N} G(i,l)} \right) \qquad (8)$$

$$\tilde u = \begin{pmatrix} u_K \\ 1 - \alpha \end{pmatrix}, \qquad \alpha = u_K^T e_K. \qquad (9)$$

Notice that P̄^(1) is K-by-(K+1) and extremely sparse. Computationally, (7) is a much more efficient form than (6). Let x ∈ R^{(K+1)×1} be an arbitrary componentwise non-negative vector with unit one-norm (i.e. its components sum to 1); then

$$x^T P^{(1)} = c\, x(1\!:\!K)^T \bar P^{(1)} + \big(1 - c + c\, x(K+1)\big)\, \tilde u^T. \qquad (10)$$

Because P̄^(1) is extremely sparse, this multiplication requires only O(K) operations; if one had used (6) directly, O(K^2) operations would be needed. Based on this representation we can formalize an algorithm which combines Steps 1 and 2 above:

ALGORITHM 2 (STAGE 1).
  form P̄^(1) and ũ via (8) and (9)
  select y ∈ R^{K+1}, y ≥ 0, ‖y‖_1 = 1
  do
    x = y
    y^T = c x(1:K)^T P̄^(1) + (1 − c + c x(K+1)) ũ^T
    δ = ‖y − x‖_1
  until δ < ε

Algorithm 2 converges to the limiting distribution of P^(1), and yields π(k) for k = 1, 2, ..., K together with Σ_{m=K+1}^N π(m). We now have K components of the PageRank vector. We also use these results to compute the aggregation weights for Steps 4 & 5. We designate these weights as a vector η ∈ R^{K×1}: for k = 1, 2, ..., K,

$$\eta(k) = \frac{\pi(k)}{\sum_{m=1}^{K} \pi(m)}. \qquad (11)$$
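As an illustration of Stage 1 (Algorithm 2 together with the weights of (11)), here is a Python sketch under the paper's convention that the non-dangling pages are numbered first; the function name stage1 is ours, and this is a sketch rather than the authors' implementation.

```python
import numpy as np
import scipy.sparse as sp

def stage1(G, u, c=0.85, tol=1e-8):
    """Sketch of Algorithm 2 plus eq. (11). Assumes rows 0..K-1 of the sparse 0/1
    link matrix G are the non-dangling pages and rows K..N-1 are dangling."""
    out_deg = np.asarray(G.sum(axis=1)).ravel()
    K = int((out_deg > 0).sum())
    d = out_deg[:K]
    # P1_bar is K-by-(K+1), eq. (8): last column = probability of entering the super node.
    left = sp.diags(1.0 / d) @ G[:K, :K]
    last = 1.0 - np.asarray(left.sum(axis=1)).ravel()
    P1_bar = sp.hstack([left, sp.csr_matrix(last).T]).tocsr()
    alpha = u[:K].sum()
    u_tilde = np.append(u[:K], 1.0 - alpha)            # eq. (9)
    y = np.full(K + 1, 1.0 / (K + 1))
    while True:
        x = y
        y = c * (P1_bar.T @ x[:K]) + (1.0 - c + c * x[K]) * u_tilde   # eq. (10)
        if np.abs(y - x).sum() < tol:
            break
    eta = y[:K] / y[:K].sum()                          # aggregation weights, eq. (11)
    return y, eta              # y[:K] are the PageRank values of the non-dangling pages
```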
4.2 Formalizing Steps 4 & 5
With the aggregation weights of (11), we can compute the transition probability matrix P^(2) by aggregating the non-dangling nodes. According to [14], we have

$$P^{(2)} = \begin{pmatrix} \eta^T P_{11}\, e_K & \eta^T P_{12} \\ e_{N-K}\, u_K^T e_K & e_{N-K}\, u_{N-K}^T \end{pmatrix} = \begin{pmatrix} \eta^T P_{11}\, e_K & \eta^T P_{12} \\ \alpha\, e_{N-K} & e_{N-K}\, u_{N-K}^T \end{pmatrix} \qquad (12)$$

This is an (N−K+1)-by-(N−K+1) matrix (recall that N−K = card(S_D)). Each row of the matrix sums to 1. In addition, since the Perron-Frobenius Theorem guarantees the aggregation weights (11) to be positive, P^(2) is also positive. The Markov chain associated with P^(2) is thus irreducible and aperiodic, and it has a unique limiting distribution.

A remarkable property of P^(2) is that all rows from the second onward are identical; the only distinct rows are the first and the second. In other words, P^(2) is a rank-two matrix. This property allows us to compute the limiting distribution of P^(2) with very little work and storage; we'll return to this shortly. Meanwhile, we can also derive an alternative form of P^(2) that is computationally efficient.

We again split P^(2) into sparse and dense parts. Since P^(2) is rank-two, we need only look at the first two rows:

$$P^{(2)}(1,:) = c\,(\beta \;\; w^T) + (1-c)\,(\alpha \;\; u_{N-K}^T), \qquad P^{(2)}(2,:) = (\alpha \;\; u_{N-K}^T),$$

where

$$w^T = \sum_{i=1}^{K} \eta(i)\, \frac{G(i, K\!+\!1\!:\!N)}{\sum_{l=1}^{N} G(i,l)} \qquad (13)$$

$$\beta = 1 - w^T e_{N-K}. \qquad (14)$$

Notice that w is a weighted sum of K extremely sparse vectors and can be formed very cheaply; the work for computing β is even less. Multiplication with a vector can be efficiently implemented as follows. For x ∈ R^{(N-K+1)×1} an arbitrary componentwise nonnegative vector with unit one-norm, we have

$$x^T P^{(2)} = c\, x(1)\,(\beta \;\; w^T) + \big(1 - c\, x(1)\big)\,(\alpha \;\; u_{N-K}^T). \qquad (15)$$

In other words, multiplication with a vector amounts to just a sum of two scaled vectors, which is extremely efficient.
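The vectors w and β of (13)-(14), and the product (15), can be formed directly from the sparse rows of G. The following sketch (ours, with hypothetical helper names) shows one way to do it; it is meant to accompany the Stage 1 sketch above, which supplies eta.

```python
import numpy as np
import scipy.sparse as sp

def stage2_vectors(G, eta, K):
    """Form w and beta of eqs. (13)-(14) from the sparse rows G[i, K:] of the
    non-dangling pages, weighted by the aggregation weights eta."""
    out_deg = np.asarray(G[:K, :].sum(axis=1)).ravel()
    w = np.asarray((sp.diags(eta / out_deg) @ G[:K, K:]).sum(axis=0)).ravel()
    beta = 1.0 - w.sum()                               # eq. (14)
    return w, beta

def rank_two_product(x, w, beta, u_tail, alpha, c=0.85):
    """x^T P2 via eq. (15): a combination of two dense vectors, no matrix is formed.
    Here u_tail is the dangling part u_{N-K} and alpha = sum of u over S_ND."""
    return c * x[0] * np.append(beta, w) + (1.0 - c * x[0]) * np.append(alpha, u_tail)
```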

Based on this representation we can formalize an algorithm which combines Steps 4 and 5 above:

ALGORITHM 3 (STAGE 2). Suppose we have computed the aggregation weights η according to (11).
  form w and β via (13) and (14)
  select x^(0) ∈ R^{N−K+1}, x^(0) ≥ 0, ‖x^(0)‖_1 = 1
  for i = 1 : 3
    x^(i)T = c x^(i−1)(1) (β  w^T) + (1 − c x^(i−1)(1)) (α  u_{N−K}^T)
  end
  if ‖x^(3) − x^(2)‖_1 < ε
    z = x^(3)
  else
    % Perform Aitken Extrapolation
    for i = 1 : N−K+1
      v(i) = (x^(2)(i) − x^(1)(i))^2 / (x^(3)(i) − 2 x^(2)(i) + x^(1)(i))
    end
    z = x^(1) − v
  end

Algorithm 3 is characteristically different from Algorithm 1 and Algorithm 2. The latter two algorithms amount to a numerical implementation of lim_{k→∞} π_0^T P^k and converge iteratively; in general, convergence to a fixed tolerance occurs only after many iterations, and the number of iterations needed is never known ahead of time. Algorithm 3, on the other hand, requires only three iterations of the vector-matrix multiplication. After three iterations, either convergence has already occurred or, if not, the Aitken Extrapolation [12] is performed to extract the limiting distribution. In either case the limiting distribution is available after just three iterations, guaranteed. What makes this fast convergence possible is the property of P^(2) being a positive, rank-two matrix with each row summing to one. In the next section we'll prove the correctness of Algorithm 3.

Meanwhile, Algorithm 3 involves relatively little computational work. Each iteration of the main loop is just a sum of two vectors, and only three such iterations are needed; the additional work of the Aitken Extrapolation is also very mild. In fact, the amount of work for the whole of Algorithm 3 is far less than one iteration of Algorithm 2, which involves multiplication with a very large matrix. Considering that Algorithm 2 can take 100 or more iterations to converge, Algorithm 3 is relatively cost-free in comparison. Furthermore, the storage requirement is extremely mild: there is no explicit enumeration of a transition probability matrix, and because P^(2) is rank-two, only two vectors are stored. As a consequence, the computational work for Algorithm 3 is relatively negligible, and the overall efficiency of the two-stage algorithm rests entirely on Algorithm 2. We'll compare the performance of Algorithm 2 to Algorithm 1 in the next section and show that Algorithm 2 requires much less work than Algorithm 1.
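A sketch of Algorithm 3 in the same spirit (ours, not the authors' code): three applications of the rank-two product (15), followed by a componentwise Aitken extrapolation when the iterates have not already converged exactly. It assumes w, beta, alpha, and the dangling part of the personalization vector (called u_tail here) have been formed as in (13)-(14).

```python
import numpy as np

def stage2(w, beta, u_tail, alpha, c=0.85, tol=1e-8):
    """Sketch of Algorithm 3 (Stage 2): three rank-two multiplications, then Aitken
    extrapolation to extract the limiting distribution of P2 if needed."""
    M = len(u_tail) + 1
    row1 = np.append(beta, w)                  # (beta, w^T)
    row2 = np.append(alpha, u_tail)            # (alpha, u_{N-K}^T)
    x = [np.full(M, 1.0 / M)]                  # x^(0): any nonnegative, unit one-norm vector
    for _ in range(3):
        prev = x[-1]
        x.append(c * prev[0] * row1 + (1.0 - c * prev[0]) * row2)   # eq. (15)
    x1, x2, x3 = x[1], x[2], x[3]
    if np.abs(x3 - x2).sum() < tol:
        return x3                              # iterates already converged exactly
    den = x3 - 2.0 * x2 + x1
    # Components with a (near-)zero denominator have already converged; keep them at x1.
    v = np.divide((x2 - x1) ** 2, den, out=np.zeros(M), where=np.abs(den) > 1e-15)
    return x1 - v                              # Aitken extrapolation removes the remaining geometric term
```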
5. CONVERGENCE ANALYSIS
We address two issues in this section. First, we compare the performance of Algorithm 2 to Algorithm 1. Second, we validate the correctness of Algorithm 3 for computing the limiting distribution of P^(2).

5.1 Convergence of Algorithm 2
We show that Algorithm 2 can always be made to converge in as many or fewer iterations than Algorithm 1. We begin with a lemma.

LEMMA 1. Let x^(0) ∈ R^{N×1} be given, and define y^(0)T = x^(0)T V, where

$$V = \begin{pmatrix} I & 0 \\ 0 & e_{N-K} \end{pmatrix}$$

and I is the K-by-K identity matrix. Consider the two sequences of iterates, for l = 0, 1, 2, ...,

$$x^{(l+1)T} = x^{(l)T} P, \qquad y^{(l+1)T} = y^{(l)T} P^{(1)}.$$

Then for l = 0, 1, 2, ...,

$$y^{(l)T} = x^{(l)T} V.$$

PROOF. Suppose the claim is true for l_0. Then

$$y^{(l_0)T} P^{(1)} = x^{(l_0)T} V P^{(1)} = x^{(l_0)T} \begin{pmatrix} P_{11} & P_{12}\, e_{N-K} \\ e_{N-K}\, u_K^T & e_{N-K}\, u_{N-K}^T\, e_{N-K} \end{pmatrix} = x^{(l_0)T} \begin{pmatrix} P_{11} & P_{12} \\ e_{N-K}\, u_K^T & e_{N-K}\, u_{N-K}^T \end{pmatrix} \begin{pmatrix} I & 0 \\ 0 & e_{N-K} \end{pmatrix} = x^{(l_0)T} P\, V,$$

so y^{(l_0+1)T} = x^{(l_0+1)T} V, and the claim holds for l_0 + 1. Induction completes the proof.

PROPOSITION 2. Let x^(l) and y^(l) be defined as in the lemma. Then for l = 0, 1, 2, ...,

$$\|y^{(l+1)} - y^{(l)}\|_1 \le \|x^{(l+1)} - x^{(l)}\|_1.$$

PROOF. By the lemma, the first K components of y^(l) and x^(l) coincide, so

$$\|y^{(l+1)} - y^{(l)}\|_1 - \|x^{(l+1)} - x^{(l)}\|_1 = \big| y^{(l+1)}(K\!+\!1) - y^{(l)}(K\!+\!1) \big| - \sum_{i=K+1}^{N} \big| x^{(l+1)}(i) - x^{(l)}(i) \big|$$
$$= \Big| \sum_{i=K+1}^{N} \big( x^{(l+1)}(i) - x^{(l)}(i) \big) \Big| - \sum_{i=K+1}^{N} \big| x^{(l+1)}(i) - x^{(l)}(i) \big| \le 0.$$

This shows that if Algorithm 1 is applied to P with some starting iterate, a related starting iterate can always be constructed for P^(1) so that Algorithm 2 converges in as many or fewer iterations, with respect to the same tolerance. In addition, we note that Algorithm 2 requires much less computational work per iteration. The reasons are:

- Algorithm 2 works with (K+1)-vectors throughout; Algorithm 1 works with N-vectors.
- Algorithm 2 involves multiplying by a (K+1)-by-(K+1) sparse matrix, whereas Algorithm 1 involves multiplying by an N-by-N sparse matrix. (With sparse matrices, the number of non-zeros is a much better gauge of performance than the size of the matrix; it is easily seen that the number of non-zeros in P̄^(1) is only a fraction of that in P̄.)

- Algorithm 2 gets rid of a norm-taking step during each iteration of the loop by requiring the initial vector to have unit norm. In comparison, Algorithm 1 requires taking one additional norm, adding O(N) operations to each iteration.

This suggests each iteration of Algorithm 2 is O(K), while each iteration of Algorithm 1 is O(N). While the actual reduction in work depends on additional factors such as the distribution of the non-zero elements, the difference is rather dramatic, as K is typically only a fraction of N. As it turns out, the amount of time the entire two-stage algorithm takes is roughly proportional to K, versus N for Algorithm 1. Furthermore, the two-stage algorithm never explicitly enumerates the entire transition probability matrix; only a part of it is used at any given time. Consequently, on systems with insufficient memory to store the entire transition probability matrix (which is almost always the case: it is reported in [12] that a modest dataset with 290 million pages requires as much as 6GB, while the amount of addressable memory on a 32-bit machine is 4GB, so disk use cannot be avoided if Algorithm 1 is used), the performance advantage of the two-stage algorithm is even more pronounced, as the frequency of disk access is reduced.

5.2 Convergence of Algorithm 3
We now show that Algorithm 3 indeed computes the unique limiting distribution of P^(2). We remark that it can easily be shown that the limiting distribution of P^(2) is in fact the left eigenvector associated with the eigenvalue 1; thus it suffices to show that Algorithm 3 computes this eigenvector.

LEMMA 2. To simplify notation, denote M = N − K + 1 (i.e. P^(2) is M-by-M).
1. 1 is the dominant eigenvalue of P^(2), with an algebraic and geometric multiplicity of one. 0 is also an eigenvalue, with a geometric multiplicity of M − 2.
2. If P^(2) does not have another distinct eigenvalue, then

$$x^T P^{(2)} P^{(2)} = x^T P^{(2)} P^{(2)} P^{(2)}$$

for any x ∈ R^{M×1}. In other words, the sequence has converged exactly, to either the left eigenvector associated with the eigenvalue 1 or the null vector 0.

PROOF. We have stated earlier that P^(2) is positive, rank-two, and has rows that sum to 1. The Perron-Frobenius Theorem (cf. [2], [8]) establishes the first claim. Next, suppose P^(2) does not have a third distinct eigenvalue. The algebraic multiplicity of 0 is then necessarily M − 1, and the Jordan canonical form of P^(2) establishes the second claim (see [5], [8]).

PROPOSITION 3. Let x ∈ R^{M×1} be given. If

$$x^T P^{(2)} P^{(2)} \ne x^T P^{(2)} P^{(2)} P^{(2)},$$

then there exists another eigenvalue λ of P^(2) such that 0 < λ < 1. In addition, for l = 1, 2, ...,

$$x^T \big(P^{(2)}\big)^l = c_1 v_1^T + c_2 \lambda^l v_2^T,$$

where v_1^T P^(2) = v_1^T, v_2^T P^(2) = λ v_2^T, and c_1 and c_2 are constants.

PROOF. The first part follows directly from the lemma. An examination of the geometric multiplicities of P^(2) reveals the existence of a full set of eigenvectors that span R^{M×1}; writing x as a combination of these eigenvectors establishes the second part.

Proposition 3 can be rephrased as follows. Consider an arbitrary vector repeatedly multiplied by P^(2). Either exact convergence will have occurred after three iterations or it won't. If it has converged, we're done. If not, we're still assured that all subsequent iterates are contained in the span of the first and second eigenvectors. This knowledge enables us to extract the first eigenvector, i.e.
the limiting distribution, by subtracting away the component along the second eigenvector; this is precisely what the Aitken Extrapolation does (for details on the Aitken Extrapolation see [12]). This validates the correctness of Algorithm 3.

6. A NUMERICAL EXPERIMENT
The analysis in the preceding section suggests the two-stage algorithm ought to take only a fraction of the time to converge compared to the standard algorithm. We now show that this is indeed the case with an actual numerical experiment. Our results are based on a subset of N = 451,237 webpages sampled from a 2001 crawl by the Stanford WebBase project. The number of non-dangling nodes in this sample is K = 137,212, so K/N is roughly 30%. The experiment was conducted on a 2.4GHz dual-Xeon workstation with 4GB of RAM and a 4 × 70GB RAID-0 hard disk system. The amount of memory is ample for the size of the data, so there are no complications of disk access in the results.

The following table summarizes the dimensions and the number of non-zero elements of the matrices involved in the computation:

              P̄          P̄^(1)        u        ũ
  dims        N × N      K × (K+1)    N        K + 1
  nnz         1,082,…    …            …        …,213

Two values of c were tried: c = 0.85 as a fast-converging example, and c = 0.95 as a slow-converging example. The same convergence tolerance ε is used for both algorithms. The following table shows that with either value of c, the total time for the two-stage algorithm is just 20% of the standard PageRank algorithm's:

              c = 0.85                       c = 0.95
              Time in sec.  No. iterations   Time in sec.  No. iterations
  Step 1          …             …                …             …
  Step 2          …             …                …             …
  Step 3          …             …                …             …
  Step 4          …             …                …             …
  Step 5          …             …                …             …
  Total           …             …                …             …
  Standard        …             …                …             …

Figure 1: Log-error at each iteration.

Figure 2: A blow-up of the first 15 iterations of Figure 1.

In either case, Stage 1 (Steps 1 & 2) constitutes the bulk of the work and makes up 95% and 98% of the total time, respectively. Furthermore, the error of Stage 1 at each iteration is consistently below that of the standard algorithm's, but the gap eventually diminishes; in both cases the two algorithms terminate after the same number of iterations (see Figure 1 and Figure 2). This coincides with the prediction of Proposition 2. The amount of time needed for Stage 2 (Steps 4 & 5) is minuscule in comparison. When the distributions from the two stages are concatenated, we obtain the entire PageRank vector. The one-norm difference between this vector and the one produced by the standard algorithm is negligible for both c = 0.85 and c = 0.95.

We have presented a way of effectively managing the dangling nodes: regardless of the number of dangling nodes present, which is usually a very large number, the total computation time is only proportional to the number of non-dangling nodes.

7. TREATMENT OF DANGLING NODES
In this section we address the issue of dangling nodes from a modeling perspective. There are two sources of dangling nodes. A webpage is dangling if it genuinely has no outlinks. On the other hand, we also consider a webpage to be dangling if we simply have no information regarding its outlinks. The latter can arise when the webpage has been referenced (i.e. linked to) by another webpage in a crawl but is itself not included in the crawl; this is a very typical scenario, as the vastness and rapidly-changing nature of the Web render a complete crawl impossible [16].

How best to treat the dangling nodes is very much a philosophical question. Some choose to leave them out of the computation completely; this amounts to computing the limiting distribution of just the leading K-by-K block P_11 of (5) and defining it as the PageRank vector. Others have chosen to include the dangling nodes in the computation by inserting the personalization vector u into the rows of P corresponding to the dangling nodes; see (2). In this paper we have adopted the second view, for a number of reasons:

- Because in a typical situation there is a very large number of dangling nodes, throwing all of them away gives up an enormous amount of information. First, we would have no way of ranking any of the dangling pages, which ironically are most of the webpages. Second, the resulting PageRank vector would not incorporate any information from P_12. Keep in mind that P_12 consists of actual links; it is completely legitimate and, in terms of probability mass and contribution to the final limiting distribution, is no less important than P_11.
- Some very important classes of webpages (URLs) are by nature dangling. These include PDFs, images, movies, etc. It would be a significant loss if one could not search for research papers or movie trailers, for example.
- While inserting the personalization vector into the rows of the dangling nodes may seem arbitrary at first, the practice is not necessarily so inappropriate. What is asserted is that if there are no outlinks, the websurfer can move to any page according to preference. This is certainly not an inaccurate way to model transitions (in reality, a websurfer can always go to a page by directly entering the URL; an explicit link is not the only way to move to a page), and it is in fact quite sensible from a behavioral perspective (the Markov chain model of PageRank is very much a hybrid model of structure, i.e. links, and behavior, i.e. preferences/personalization; its success may well lie in its ability to recognize the importance of both aspects).
While on this issue of whether to throw away the dangling nodes, we shall also mention a very common suggestion: do not throw away the dangling nodes from the overall computation, but leave them out until the very last stages [16]. In other words, one would first compute the limiting distribution of P_11, pad it with additional elements, and use that as the initial vector for the entire matrix P. It is believed that this procedure accelerates convergence. While this may well accelerate convergence in particular cases, it isn't true in general (see the Appendix for a simple counter-example constructed by personalization). In general, the limiting distribution of P_11 doesn't coincide with, or even approximate, the first K components of P's limiting distribution. What is true, from the theory of stochastic complementation [14], is that the first K components of P's limiting distribution, when normalized, coincide with the limiting distribution of the stochastic complement of P_11:

$$S_{11} = P_{11} + P_{12}\,(I - P_{22})^{-1}\, P_{21}.$$

If P were a nearly completely decomposable (NCD) matrix, the off-diagonal blocks would contain negligible probability mass, S_11 and P_11 would be roughly the same, and the limiting distribution of P_11 would then approximate the first K components of P's limiting distribution [3]. However, P is not NCD under our present partition: the dangling nodes and the non-dangling nodes do not form two NCD subsets, as a significant probability mass can be found in each block of (5). The procedure described in this paper renders this common practice irrelevant. What we've shown is as follows.

Contrary to common practice, we cannot hope to use the limiting distribution of P_11 to approximate the first K components of P's limiting distribution. On the other hand, we can compute the latter exactly by computing the limiting distribution of the lumped chain. The transition probability matrix (6) of the lumped chain is of course (K+1)-by-(K+1) and effectively the same size as P_11. And once that is done, with very little additional work we obtain the entire PageRank vector.

8. CONCLUDING REMARKS
In this paper we present a fast two-stage algorithm for computing the PageRank vector. We exploit the fact that the Markov chain associated with PageRank is lumpable. In the first stage, we compute the limiting distribution of a Markov chain where the dangling nodes are lumped into one super node; in the second stage, we compute the limiting distribution of a chain where the non-dangling nodes are combined. The two limiting distributions are concatenated to form the PageRank vector. Most of the work lies in computing the limiting distribution of the lumped chain, and the total work is only proportional to the number of non-dangling nodes. A numerical experiment shows that in practice the two-stage algorithm finishes in only a fraction of the time required by the standard PageRank algorithm, in this case as little as 20%. Furthermore, only a part of the transition probability matrix is enumerated at any given time, and the memory requirement is accordingly mild. On machines where the memory is limited relative to the size of the problem, which is almost always the case in reality, the performance gap between the two-stage algorithm and the standard algorithm is likely to be even wider. Lastly, our algorithm represents an alternative to the common practice of not including the dangling nodes until the last stages of the computation. That practice lacks theoretical support and cannot be expected to accelerate convergence in general. The algorithm described here, on the other hand, is provable, generally applicable, and achieves the desired speed gains.

9. ACKNOWLEDGMENTS
The authors would like to thank Hector Garcia-Molina, Andreas Paepcke, Sriram Raghavan, and Gary Wesley of the Stanford WebBase project for assisting with access to their data, and Sepandar Kamvar, Wang Lam, Amy Langville, and Sebastiano Vigna for their helpful comments.

10. ADDITIONAL AUTHORS
Additional author: Stephanie Leung (Computer Science Department, Stanford University), wleung@stanford.edu.

11. REFERENCES
[1] A. Arasu, J. Novak, A. Tomkins, and J. Tomlin. PageRank computation and the structure of the Web: experiments and algorithms. In Proceedings of the Eleventh International World Wide Web Conference, Poster Track.
[2] A. Berman and R. J. Plemmons. Nonnegative Matrices in the Mathematical Sciences. SIAM Press, Pennsylvania.
[3] W. L. Cao and W. J. Stewart. Iterative aggregation/disaggregation techniques for nearly uncoupled Markov chains. Journal of the Association for Computing Machinery, 32.
[4] T. Dayar and W. J. Stewart. Quasi-lumpability, lower bounding coupling matrices, and nearly completely decomposable Markov chains. SIAM Journal on Matrix Analysis and Applications, 18(2).
[5] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, third edition.
[6] T. H. Haveliwala and S. D. Kamvar. The second eigenvalue of the Google matrix. Technical report, Stanford University.
[7] J. Hirai, S. Raghavan, H. Garcia-Molina, and A. Paepcke. WebBase: a repository of web pages. In Proceedings of the Ninth International World Wide Web Conference.
[8] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge.
[9] G. Jeh and J. Widom. Scaling personalized web search. In Proceedings of the Twelfth International World Wide Web Conference.
[10] S. D. Kamvar, T. H. Haveliwala, and G. H. Golub. Adaptive methods for the computation of PageRank. Technical report, Stanford University.
[11] S. D. Kamvar, T. H. Haveliwala, C. D. Manning, and G. H. Golub. Exploiting the block structure of the Web for computing PageRank. Technical report, Stanford University.
[12] S. D. Kamvar, T. H. Haveliwala, C. D. Manning, and G. H. Golub. Extrapolation methods for accelerating PageRank computations. In Proceedings of the Twelfth International World Wide Web Conference.
[13] J. G. Kemeny and J. L. Snell. Finite Markov Chains. D. Van Nostrand, New York.
[14] C. D. Meyer. Stochastic complementation, uncoupling Markov chains, and the theory of nearly reducible systems. SIAM Review, 31(2).
[15] C. Moler. The world's largest matrix computation. MATLAB News & Notes, pages 12-13, October.
[16] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the Web. Technical report, Stanford Digital Library Technologies Project.
[17] H. A. Simon and A. Ando. Aggregation of variables in dynamic systems. Econometrica, 29.

APPENDIX
Here is a small counter-example; it demonstrates that the common practice of including the dangling nodes only during the last stages of computation doesn't always accelerate convergence. Take a link matrix G on N = 4 pages with K = 2 non-dangling pages, c = 0.85, and a suitably constructed personalization vector u. One can verify that the limiting distribution of the leading 2-by-2 submatrix of P differs from the first two components of P's limiting distribution, and that it yields a worse starting iterate for the full matrix P than the uniform vector, so the desired acceleration is not observed. For more details on why, see [14].


Fast PageRank Computation via a Sparse Linear System Internet Mathematics Vol. 2, No. 3: 251-273 Fast PageRank Computation via a Sparse Linear System Gianna M. Del Corso, Antonio Gullí, and Francesco Romani Abstract. Recently, the research community has

More information

Definition A finite Markov chain is a memoryless homogeneous discrete stochastic process with a finite number of states.

Definition A finite Markov chain is a memoryless homogeneous discrete stochastic process with a finite number of states. Chapter 8 Finite Markov Chains A discrete system is characterized by a set V of states and transitions between the states. V is referred to as the state space. We think of the transitions as occurring

More information

The Push Algorithm for Spectral Ranking

The Push Algorithm for Spectral Ranking The Push Algorithm for Spectral Ranking Paolo Boldi Sebastiano Vigna March 8, 204 Abstract The push algorithm was proposed first by Jeh and Widom [6] in the context of personalized PageRank computations

More information

Page rank computation HPC course project a.y

Page rank computation HPC course project a.y Page rank computation HPC course project a.y. 2015-16 Compute efficient and scalable Pagerank MPI, Multithreading, SSE 1 PageRank PageRank is a link analysis algorithm, named after Brin & Page [1], and

More information

CS 6820 Fall 2014 Lectures, October 3-20, 2014

CS 6820 Fall 2014 Lectures, October 3-20, 2014 Analysis of Algorithms Linear Programming Notes CS 6820 Fall 2014 Lectures, October 3-20, 2014 1 Linear programming The linear programming (LP) problem is the following optimization problem. We are given

More information

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 Star Joins A common structure for data mining of commercial data is the star join. For example, a chain store like Walmart keeps a fact table whose tuples each

More information

Lecture: Local Spectral Methods (1 of 4)

Lecture: Local Spectral Methods (1 of 4) Stat260/CS294: Spectral Graph Methods Lecture 18-03/31/2015 Lecture: Local Spectral Methods (1 of 4) Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these notes are still very rough. They provide

More information

1 Searching the World Wide Web

1 Searching the World Wide Web Hubs and Authorities in a Hyperlinked Environment 1 Searching the World Wide Web Because diverse users each modify the link structure of the WWW within a relatively small scope by creating web-pages on

More information

Data and Algorithms of the Web

Data and Algorithms of the Web Data and Algorithms of the Web Link Analysis Algorithms Page Rank some slides from: Anand Rajaraman, Jeffrey D. Ullman InfoLab (Stanford University) Link Analysis Algorithms Page Rank Hubs and Authorities

More information

Link Analysis. Leonid E. Zhukov

Link Analysis. Leonid E. Zhukov Link Analysis Leonid E. Zhukov School of Data Analysis and Artificial Intelligence Department of Computer Science National Research University Higher School of Economics Structural Analysis and Visualization

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

SENSITIVITY OF THE STATIONARY DISTRIBUTION OF A MARKOV CHAIN*

SENSITIVITY OF THE STATIONARY DISTRIBUTION OF A MARKOV CHAIN* SIAM J Matrix Anal Appl c 1994 Society for Industrial and Applied Mathematics Vol 15, No 3, pp 715-728, July, 1994 001 SENSITIVITY OF THE STATIONARY DISTRIBUTION OF A MARKOV CHAIN* CARL D MEYER Abstract

More information

MAT1302F Mathematical Methods II Lecture 19

MAT1302F Mathematical Methods II Lecture 19 MAT302F Mathematical Methods II Lecture 9 Aaron Christie 2 April 205 Eigenvectors, Eigenvalues, and Diagonalization Now that the basic theory of eigenvalues and eigenvectors is in place most importantly

More information

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular

More information

AN APPLICATION OF LINEAR ALGEBRA TO NETWORKS

AN APPLICATION OF LINEAR ALGEBRA TO NETWORKS AN APPLICATION OF LINEAR ALGEBRA TO NETWORKS K. N. RAGHAVAN 1. Statement of the problem Imagine that between two nodes there is a network of electrical connections, as for example in the following picture

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data Instructor: Yizhou Sun yzsun@ccs.neu.edu March 16, 2016 Methods to Learn Classification Clustering Frequent Pattern Mining Matrix Data Decision

More information

MultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors

MultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors MultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors Michael K. Ng Centre for Mathematical Imaging and Vision and Department of Mathematics

More information

Slides based on those in:

Slides based on those in: Spyros Kontogiannis & Christos Zaroliagis Slides based on those in: http://www.mmds.org High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering

More information

Data Mining and Matrices

Data Mining and Matrices Data Mining and Matrices 10 Graphs II Rainer Gemulla, Pauli Miettinen Jul 4, 2013 Link analysis The web as a directed graph Set of web pages with associated textual content Hyperlinks between webpages

More information

Linear Programming Redux

Linear Programming Redux Linear Programming Redux Jim Bremer May 12, 2008 The purpose of these notes is to review the basics of linear programming and the simplex method in a clear, concise, and comprehensive way. The book contains

More information

MPageRank: The Stability of Web Graph

MPageRank: The Stability of Web Graph Vietnam Journal of Mathematics 37:4(2009) 475-489 VAST 2009 MPageRank: The Stability of Web Graph Le Trung Kien 1, Le Trung Hieu 2, Tran Loc Hung 1, and Le Anh Vu 3 1 Department of Mathematics, College

More information

Lecture 10. Lecturer: Aleksander Mądry Scribes: Mani Bastani Parizi and Christos Kalaitzis

Lecture 10. Lecturer: Aleksander Mądry Scribes: Mani Bastani Parizi and Christos Kalaitzis CS-621 Theory Gems October 18, 2012 Lecture 10 Lecturer: Aleksander Mądry Scribes: Mani Bastani Parizi and Christos Kalaitzis 1 Introduction In this lecture, we will see how one can use random walks to

More information

Quick Introduction to Nonnegative Matrix Factorization

Quick Introduction to Nonnegative Matrix Factorization Quick Introduction to Nonnegative Matrix Factorization Norm Matloff University of California at Davis 1 The Goal Given an u v matrix A with nonnegative elements, we wish to find nonnegative, rank-k matrices

More information

6.207/14.15: Networks Lectures 4, 5 & 6: Linear Dynamics, Markov Chains, Centralities

6.207/14.15: Networks Lectures 4, 5 & 6: Linear Dynamics, Markov Chains, Centralities 6.207/14.15: Networks Lectures 4, 5 & 6: Linear Dynamics, Markov Chains, Centralities 1 Outline Outline Dynamical systems. Linear and Non-linear. Convergence. Linear algebra and Lyapunov functions. Markov

More information

Robust PageRank: Stationary Distribution on a Growing Network Structure

Robust PageRank: Stationary Distribution on a Growing Network Structure oname manuscript o. will be inserted by the editor Robust PageRank: Stationary Distribution on a Growing etwork Structure Anna Timonina-Farkas Received: date / Accepted: date Abstract PageRank PR is a

More information

A linear model for a ranking problem

A linear model for a ranking problem Working Paper Series Department of Economics University of Verona A linear model for a ranking problem Alberto Peretti WP Number: 20 December 2017 ISSN: 2036-2919 (paper), 2036-4679 (online) A linear model

More information

Volume in n Dimensions

Volume in n Dimensions Volume in n Dimensions MA 305 Kurt Bryan Introduction You ve seen that if we have two vectors v and w in two dimensions then the area spanned by these vectors can be computed as v w = v 1 w 2 v 2 w 1 (where

More information

eigenvalues, markov matrices, and the power method

eigenvalues, markov matrices, and the power method eigenvalues, markov matrices, and the power method Slides by Olson. Some taken loosely from Jeff Jauregui, Some from Semeraro L. Olson Department of Computer Science University of Illinois at Urbana-Champaign

More information

Link Analysis. Stony Brook University CSE545, Fall 2016

Link Analysis. Stony Brook University CSE545, Fall 2016 Link Analysis Stony Brook University CSE545, Fall 2016 The Web, circa 1998 The Web, circa 1998 The Web, circa 1998 Match keywords, language (information retrieval) Explore directory The Web, circa 1998

More information

CS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works

CS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works CS68: The Modern Algorithmic Toolbox Lecture #8: How PCA Works Tim Roughgarden & Gregory Valiant April 20, 206 Introduction Last lecture introduced the idea of principal components analysis (PCA). The

More information

Incompatibility Paradoxes

Incompatibility Paradoxes Chapter 22 Incompatibility Paradoxes 22.1 Simultaneous Values There is never any difficulty in supposing that a classical mechanical system possesses, at a particular instant of time, precise values of

More information

Topics in linear algebra

Topics in linear algebra Chapter 6 Topics in linear algebra 6.1 Change of basis I want to remind you of one of the basic ideas in linear algebra: change of basis. Let F be a field, V and W be finite dimensional vector spaces over

More information

Markov Chain Monte Carlo The Metropolis-Hastings Algorithm

Markov Chain Monte Carlo The Metropolis-Hastings Algorithm Markov Chain Monte Carlo The Metropolis-Hastings Algorithm Anthony Trubiano April 11th, 2018 1 Introduction Markov Chain Monte Carlo (MCMC) methods are a class of algorithms for sampling from a probability

More information

MATH36001 Perron Frobenius Theory 2015

MATH36001 Perron Frobenius Theory 2015 MATH361 Perron Frobenius Theory 215 In addition to saying something useful, the Perron Frobenius theory is elegant. It is a testament to the fact that beautiful mathematics eventually tends to be useful,

More information

Utilizing Network Structure to Accelerate Markov Chain Monte Carlo Algorithms

Utilizing Network Structure to Accelerate Markov Chain Monte Carlo Algorithms algorithms Article Utilizing Network Structure to Accelerate Markov Chain Monte Carlo Algorithms Ahmad Askarian, Rupei Xu and András Faragó * Department of Computer Science, The University of Texas at

More information

Cutting Graphs, Personal PageRank and Spilling Paint

Cutting Graphs, Personal PageRank and Spilling Paint Graphs and Networks Lecture 11 Cutting Graphs, Personal PageRank and Spilling Paint Daniel A. Spielman October 3, 2013 11.1 Disclaimer These notes are not necessarily an accurate representation of what

More information

Chapter 2: Matrix Algebra

Chapter 2: Matrix Algebra Chapter 2: Matrix Algebra (Last Updated: October 12, 2016) These notes are derived primarily from Linear Algebra and its applications by David Lay (4ed). Write A = 1. Matrix operations [a 1 a n. Then entry

More information

CANONICAL FORMS FOR LINEAR TRANSFORMATIONS AND MATRICES. D. Katz

CANONICAL FORMS FOR LINEAR TRANSFORMATIONS AND MATRICES. D. Katz CANONICAL FORMS FOR LINEAR TRANSFORMATIONS AND MATRICES D. Katz The purpose of this note is to present the rational canonical form and Jordan canonical form theorems for my M790 class. Throughout, we fix

More information

Using Markov Chains To Model Human Migration in a Network Equilibrium Framework

Using Markov Chains To Model Human Migration in a Network Equilibrium Framework Using Markov Chains To Model Human Migration in a Network Equilibrium Framework Jie Pan Department of Mathematics and Computer Science Saint Joseph s University Philadelphia, PA 19131 Anna Nagurney School

More information

a (b + c) = a b + a c

a (b + c) = a b + a c Chapter 1 Vector spaces In the Linear Algebra I module, we encountered two kinds of vector space, namely real and complex. The real numbers and the complex numbers are both examples of an algebraic structure

More information

Math 471 (Numerical methods) Chapter 3 (second half). System of equations

Math 471 (Numerical methods) Chapter 3 (second half). System of equations Math 47 (Numerical methods) Chapter 3 (second half). System of equations Overlap 3.5 3.8 of Bradie 3.5 LU factorization w/o pivoting. Motivation: ( ) A I Gaussian Elimination (U L ) where U is upper triangular

More information

Math 291-2: Lecture Notes Northwestern University, Winter 2016

Math 291-2: Lecture Notes Northwestern University, Winter 2016 Math 291-2: Lecture Notes Northwestern University, Winter 2016 Written by Santiago Cañez These are lecture notes for Math 291-2, the second quarter of MENU: Intensive Linear Algebra and Multivariable Calculus,

More information

Pseudocode for calculating Eigenfactor TM Score and Article Influence TM Score using data from Thomson-Reuters Journal Citations Reports

Pseudocode for calculating Eigenfactor TM Score and Article Influence TM Score using data from Thomson-Reuters Journal Citations Reports Pseudocode for calculating Eigenfactor TM Score and Article Influence TM Score using data from Thomson-Reuters Journal Citations Reports Jevin West and Carl T. Bergstrom November 25, 2008 1 Overview There

More information

642:550, Summer 2004, Supplement 6 The Perron-Frobenius Theorem. Summer 2004

642:550, Summer 2004, Supplement 6 The Perron-Frobenius Theorem. Summer 2004 642:550, Summer 2004, Supplement 6 The Perron-Frobenius Theorem. Summer 2004 Introduction Square matrices whose entries are all nonnegative have special properties. This was mentioned briefly in Section

More information