A Fast Two-Stage Algorithm for Computing PageRank
Chris Pan-Chi Lee (Scientific Computing & Computational Mathematics Program, Stanford University), Gene H. Golub (Department of Computer Science, Stanford University), Stefanos A. Zenios (Division of Operations, Information, & Technology, Graduate School of Business, Stanford University)

ABSTRACT

In this paper we present a fast two-stage algorithm for computing the PageRank [16] vector. Our algorithm exploits the observation that the homogeneous discrete-time Markov chain associated with PageRank is lumpable [13]; the lumpable subset of nodes is precisely the set of dangling nodes. As a result, the algorithm can converge in a fraction of the time required by the standard PageRank algorithm [16]. On data of 451,237 webpages, our two-stage algorithm converged in only 20% of the time taken by the standard PageRank algorithm. The algorithm described here also replaces a common practice that is in general not correct: including the dangling nodes only during the last stages of computation [16] does not necessarily accelerate convergence in a general context. Our algorithm, on the other hand, is provable, generally applicable, and achieves the desired speed gains.

Keywords: PageRank, link analysis, dangling nodes, Power Method, eigenvector computation, limiting distribution, statespace reduction, state aggregation, lumpable Markov chains

1. INTRODUCTION

Aside from its commercial success, the PageRank approach to ranking webpages has generated a significant amount of interest in the research community. The Markov chain interpretation gives an explicit model for web traffic and surfer behavior, yet the computation poses a numerically daunting challenge [15]. With billions of webpages already in existence, computing the PageRank vector is a very time-consuming procedure. It is reported in [11] that the computation of a PageRank vector over 290 million webpages requires as much as 3 hours (on a 1.5GHz AMD Athlon with 3.5GB of RAM); the computing time for a realistically large subset of the entire web would take days. Furthermore, frequent computation of the PageRank vector is often necessary. With webpages constantly updated, added, or removed, the PageRank vector needs to be re-computed continuously to maintain the timeliness and relevance of the search results. In the context of personalized web search [9], a number of PageRank vectors need to be computed to reflect the preferences of different classes of websurfers. Clearly, there is a demand for faster algorithms.

The PageRank vector can be regarded as the limiting distribution of a homogeneous discrete-time Markov chain that jumps from webpage to webpage. In this paper we present a fast algorithm for computing the PageRank vector. The algorithm exploits the observation that the Markov chain is in fact lumpable [13]. The algorithm proceeds in two stages. In the first stage, we compute the limiting distribution of the chain in which the dangling nodes [16] are combined into one super node; in the second stage, we compute the limiting distribution of the chain in which the non-dangling nodes are combined into one. (In this paper, the terms node, state, and webpage are used interchangeably.) When the limiting distributions of the two chains are concatenated, we recover the limiting distribution of the original chain, i.e. the PageRank vector. As we shall see, this approach can dramatically reduce the overall amount of computing time.

A number of papers discuss accelerating PageRank computation, and many of these focus on numerical linear algebra techniques.
A Gauss-Seidel algorithm is discussed in [1], where the most recent component values of the PageRank vector are used in the computation. In [12], one periodically subtracts away approximations of the sub-dominant eigenvectors to accelerate convergence. It is noted in [11] that, when sorted by URL, the Google matrix has a block structure; hence a PageRank vector can be computed separately for each block, and the results are pasted together to yield a good starting iterate for the entire matrix. It is noted in [10] that components of the PageRank vector converge at different rates, and hence performance gains are realized by not re-computing components that have already converged.

This paper contributes to this growing literature in a number of ways. First, by adopting a characteristically Markov-chain view and observing that the chain associated with PageRank is lumpable, we are not only able to achieve performance gains, we also bring to the forefront a powerful technique for statespace reduction. This technique of lumping is distinctively different from the better-known technique of state aggregation (cf. [3], [14], [17]), which we also make use of in this paper. Thus, we have a two-stage algorithm where during each stage a different statespace reduction method is used; the reduction is aggressive, the overall performance gains are very significant, and the concept is novel. Second, our approach is analyzable. Whereas previous methods sometimes rely on intuition and approximate arguments, our procedure can be analyzed with greater precision, leading to some very interesting results. In addition, we show that the common practice of including the dangling nodes only during the last stages of computation [16] does not accelerate convergence in general and can be replaced by our present algorithm.
Lastly, our approach is easily combined with many other methods to exploit even greater performance gains. For example, all of the existing methods described above can be combined with our approach, especially during the first stage of our algorithm.

Notation. Notation in this paper is as follows. If v is a vector, then v_i denotes the i-th element of v. If M is a matrix, then M(i, j) denotes the element in the i-th row and the j-th column; M(i:j, k:l) denotes the elements in rows i through j and columns k through l; M(i, :) denotes the entire i-th row; and so on. Superscripts and subscripts may have different meanings depending on the context, but the meaning is always made clear. An un-transposed vector is always a column vector; the transpose is superscripted with a T. \|\cdot\|_1 is the sum of the absolute values of a vector; for example, \|v\|_1 = \sum_i |v_i|. The notation e_n means an n-dimensional vector of 1's.

2. PAGERANK REVIEW

The central idea behind PageRank is to regard web surfing as a Markov chain. Imagine a collection of webpages indexed as S = {1, 2, ..., N}, and suppose we have a personalization vector u \in R^{N \times 1} which records a generic surfer's preference for each page in S. (Specifically, the personalization vector is assumed to be componentwise positive and normalized so that \sum_{l=1}^{N} u_l = 1.) Let this generic surfer be currently at some page i \in S. We assume that at the next time step, the surfer will move to some j \in S according to the probability

    Q(i, j) = \begin{cases} \frac{G(i, j)}{\sum_{l=1}^{N} G(i, l)} & \text{if } G(i, l) = 1 \text{ for some } l \\ u_j & \text{otherwise} \end{cases}

where

    G(i, j) = \begin{cases} 1 & \text{if there is an outlink from } i \text{ to } j \\ 0 & \text{otherwise.} \end{cases}

The above definition has a nice interpretation. If the i-th page has outlinks, the surfer will move to one of the outlinks with equal probability; this corresponds to the first case in the definition of Q above. If no outlink from i exists, the surfer will move to any page in S with a probability according to preference; this corresponds to the second case above. A page that has no outlink is called a dangling page.

At each time step k = 0, 1, 2, ..., we assume the surfer jumps from page to page according to the above probabilities. This gives rise to a homogeneous discrete-time Markov chain; Q = [Q(i, j)] is simply the transition probability matrix for this Markov chain. Let \pi^{(0)} \in R^{N \times 1} denote the probability distribution of where the surfer is to be found at the initial time step; then the distribution for the k-th time step is given by \pi^{(k)T} = \pi^{(0)T} Q^k. The idea of PageRank is that the importance of webpages can be defined as the limiting probability distribution associated with the Markov chain as k \to \infty. Such a limiting distribution can be interpreted as the proportion of time the surfer spends on each webpage, and certainly quantifies the notion of importance.

Unfortunately, there is nothing in our definition so far that guarantees convergence to such a limiting distribution as k \to \infty. The solution is to consider a closely related Markov chain obtained by adding to Q a small shift. We take

    P = cQ + (1 - c) e_N u^T                                             (1)

to be our new transition probability matrix. e_N u^T is simply a matrix where each row is u^T. c is a constant in (0, 1), and P in some sense approximates Q for c close to 1. (A typical value for c is between 0.85 and 0.99; it is shown in [6] that c controls the convergence rate of the PageRank algorithm.) It is easily seen that P is a positive matrix, and each row sums to 1. In particular, the Markov chain associated with P is irreducible and aperiodic. (The positivity ensures a direct positive-probability path between any two pages, and hence the irreducible and aperiodic properties; cf. [2], [13].) The Perron-Frobenius Theorem and the Power Method (cf. [2], [5]) guarantee that for such a matrix a unique limiting distribution \pi^T = \lim_{k \to \infty} \pi^{(0)T} P^k exists regardless of the initial distribution.
The PageRank vector is defined to be this limiting distribution \pi. P is also known as the Google matrix.

P has some interesting properties. Let S_D denote the subset of S containing only the dangling nodes, and let S_ND = S \ S_D be the subset containing only the non-dangling nodes. Then

    P(i, :) = \begin{cases} c \frac{G(i, :)}{\sum_{l=1}^{N} G(i, l)} + (1 - c) u^T & \text{if } i \in S_{ND} \\ u^T & \text{if } i \in S_D. \end{cases}          (2)

All the rows of P that correspond to a dangling node are identically u^T. The rows that correspond to a non-dangling node are separable into two components: the first component consists of the contribution from G, or the actual outlinks; the second component is u^T.

Below we give the standard algorithm [16] for computing the PageRank vector \pi. The algorithm proceeds by first taking an arbitrary vector, then multiplying it by P repeatedly until convergence; it is an implementation of the Power Method.

ALGORITHM 1 (PAGERANK).

    form \bar{P}, where \bar{P}(i, j) = \begin{cases} G(i, j) / \sum_{l=1}^{N} G(i, l) & \text{if } i \in S_{ND} \\ 0 & \text{if } i \in S_D \end{cases}
    select any y \in R^{N \times 1}
    do
        x = y
        y^T = c x^T \bar{P}
        d = \|x\|_1 - \|y\|_1
        y = y + d u
        \delta = \|y - x\|_1
    until \delta < \epsilon

Notice that P is never actually enumerated. Instead, a matrix \bar{P} is formed. \bar{P} comprises the contribution from G only; the contribution from u is left out completely. Because most webpages have only a handful of outlinks, \bar{P} is mostly zeros and an extremely sparse matrix. The multiplication step x^T \bar{P} can hence be implemented very efficiently. The contribution of u is subsequently added in during the y + du step. Based on this approach, each iteration of the loop can be performed in O(N) operations. In comparison, if P were explicitly enumerated, each iteration would require O(N^2) operations, which is prohibitively expensive due to the large size of N. (As of the year 2000, the number of webpages is on the order of billions; see [15].) We emphasize that the computational savings of the PageRank algorithm are achieved by recognizing that P is separable into a sparse matrix \bar{P} plus a dense vector u; multiplication is done separately with those components, and the results are added together subsequently.
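For concreteness, the following is a minimal Python sketch of Algorithm 1, assuming SciPy sparse matrices; the function and variable names are ours, not the paper's.

    import numpy as np
    import scipy.sparse as sp

    def pagerank_standard(G, u, c=0.85, eps=1e-8):
        # Power iteration on P = c*Q + (1-c)*e_N*u^T, storing only the
        # sparse part P_bar (rows of dangling pages remain all-zero).
        N = G.shape[0]
        out = np.asarray(G.sum(axis=1)).ravel()              # outlink counts
        inv_out = np.where(out > 0, 1.0 / np.maximum(out, 1), 0.0)
        P_bar = sp.diags(inv_out) @ G                        # row-normalized link matrix
        y = np.full(N, 1.0 / N)
        while True:
            x = y
            y = c * (P_bar.T @ x)                            # y^T = c x^T P_bar, cost O(nnz)
            d = np.abs(x).sum() - np.abs(y).sum()            # probability mass lost to c and dangling rows
            y = y + d * u                                    # redistribute it via the personalization vector
            if np.abs(y - x).sum() < eps:                    # delta < epsilon
                return y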
3. PAGERANK AS A LUMPABLE MARKOV CHAIN

Our goal in this paper is to present an algorithm that is a substantial improvement over Algorithm 1. Our approach is based on the observation that the Markov chain associated with P is lumpable (cf. [13], [4]). In general, a Markov chain is lumpable if its transition probabilities satisfy certain properties that allow its states (nodes) to be combined into blocks (super nodes). The block-level transitions yield another Markov chain, the transition probabilities of which can be very easily calculated. Unlike conventional state aggregation (cf. [3], [14], [17]), lumping doesn't require prior knowledge or computation of aggregation weights. Lumping is thus very effective in reducing the size of the statespace.

Definition 1. Suppose M \in R^{n \times n} is the transition probability matrix of a homogeneous discrete-time Markov chain with n states. Let S_1, S_2, ..., S_p \subseteq {1, 2, ..., n} be such that

    \bigcup_{l=1}^{p} S_l = {1, 2, ..., n}   and   S_l \cap S_m = \emptyset   for   1 \le l \ne m \le p.

Then the Markov chain is said to be lumpable with respect to the partition S_1, S_2, ..., S_p if for all l, m \in {1, 2, ..., p}, every i \in S_l satisfies

    \sum_{j \in S_m} M(i, j) = c(l, m)                                   (3)

where the right-hand side is a constant that depends only on l and m. The transition probability matrix for the p-by-p lumped chain is

    \tilde{M} = [c(l, m)].                                               (4)

The key here is that the right-hand side of (3) depends only on l and m. We think of S_1, S_2, ..., S_p as blocks of nodes, and (3) requires every node in the same block to depart for another block with an identical probability. There is thus some notion of symmetry within each block. Lumping the Markov chain is to exploit this symmetry by discarding the within-block details and focusing on the between-block transitions. Because of the symmetry, the computation of the block-level transition probability matrix (4) involves minimal effort.

We now claim the Markov chain associated with P is lumpable with respect to the partition in which all the dangling nodes are lumped into one block and each non-dangling node is a singleton block.

PROPOSITION 1. For each k \in S_ND, define S_k = {k}. The homogeneous discrete-time Markov chain associated with P is lumpable with respect to the partition consisting of S_k for each k \in S_ND, together with S_D. This is a partition with card(S_ND) + 1 blocks.

PROOF. (3) is by construction true for all singleton blocks S_l, S_m with l, m \in S_ND. In addition, for every i \in S_D,

    \sum_{j \in S_m} P(i, j) = P(i, m) = u_m   for all m \in S_ND,   and   \sum_{j \in S_D} P(i, j) = \sum_{j \in S_D} u_j,

which is a constant. See (2).

Thus by lumping the dangling nodes into one block we obtain a Markov chain with just card(S_ND) + 1 states, compared to card(S_ND) + card(S_D) states in the original chain. The lumped Markov chain is irreducible and aperiodic, since its transition probability matrix is necessarily positive. The Perron-Frobenius Theorem and the Power Method guarantee the existence of a unique limiting distribution. This limiting distribution is a vector with card(S_ND) + 1 components: card(S_ND) components are identical to the components of \pi corresponding to S_ND; the remaining component equals the sum of the components of \pi corresponding to S_D. The benefit of lumping the dangling nodes is that typically card(S_D) is very large, often several times larger than card(S_ND). (According to [11], a 2001 crawl by Stanford's WebBase project [7] contains 290 million pages in total; only 70 million are non-dangling.) Lumping can dramatically reduce the size of the transition probability matrix and enables the limiting distribution to be computed with much less effort.
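As an illustration, here is a small dense-matrix sketch (our own, in Python) of the lumping in Proposition 1; for web-scale data one would of course work with the sparse forms developed in Section 4 instead of an explicit P.

    import numpy as np

    def lump_dangling(P, dangling):
        # Lumped matrix of Proposition 1: each non-dangling state is a
        # singleton block; all dangling states form one block. `dangling`
        # is a boolean mask over the states of the full chain P.
        nd = np.flatnonzero(~dangling)
        d = np.flatnonzero(dangling)
        assert np.allclose(P[d], P[d[0]])            # every dangling row is u^T, so (3) holds
        K = len(nd)
        P1 = np.empty((K + 1, K + 1))
        P1[:K, :K] = P[np.ix_(nd, nd)]               # singleton -> singleton
        P1[:K, K] = P[np.ix_(nd, d)].sum(axis=1)     # singleton -> dangling block
        P1[K, :K] = P[d[0], nd]                      # block -> singleton (same for every dangling row)
        P1[K, K] = P[d[0], d].sum()                  # block -> itself
        return P1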
4. A TWO-STAGE ALGORITHM

While the lumped chain can be used to compute the limiting probability of the non-dangling nodes, i.e. \pi_k, k \in S_ND, we are still left with the task of computing the limiting probability of the dangling nodes, i.e. \pi_k, k \in S_D. As it turns out, once we have computed the limiting probability for the non-dangling nodes, the limiting probability for the dangling nodes can be computed with very little additional work. This is done by considering yet another Markov chain, this time obtained by combining all the non-dangling nodes into one block and treating each dangling node as a singleton block. This is a Markov chain with card(S_D) + 1 states. We emphasize this Markov chain is not obtained by lumping, as lumping is not applicable with respect to this particular partition. Rather, we use the traditional state aggregation technique to combine the non-dangling nodes. The procedure requires aggregation weights, but we can readily compute these weights using the limiting probability of the non-dangling nodes.

To summarize, we propose a two-stage algorithm that can be outlined as follows (a sketch of how the steps chain together is given after this list):

1. Compute the transition probability matrix P^{(1)} of the lumped chain where the dangling nodes are combined into one block. See Proposition 1.

2. Compute the limiting distribution of P^{(1)}. This gives us \pi_k for each k \in S_ND, as well as \sum_{k \in S_D} \pi_k. The computation is an iterative procedure similar to Algorithm 1. This step constitutes the bulk of the total work.

3. Compute the weights for state aggregation,

       \pi_k / \sum_{m \in S_ND} \pi_m,

   for each k \in S_ND.

4. Compute the transition probability matrix P^{(2)} of the Markov chain where the non-dangling nodes are combined into a block. This requires the weights computed in Step 3.

5. Compute the limiting distribution of P^{(2)}. This yields \pi_k for each k \in S_D, as well as \sum_{k \in S_ND} \pi_k. The amount of work involved is negligible compared to Step 2, as we will show.

6. Concatenate the results from Step 2 and Step 5 to get \pi_k for all k \in S. This is the limiting distribution of P, or the PageRank vector.
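The following Python sketch shows how the six steps chain together. It is our own illustration, not the paper's code: `stage1`, `stage2`, and `build_rank_two` are hypothetical helpers corresponding to Algorithms 2 and 3 and to equations (13)-(14), formalized in the next sections.

    import numpy as np

    def two_stage_pagerank(G, u, c=0.85, eps=1e-8):
        # Reorder states so the non-dangling nodes come first, as assumed
        # in Section 4 (S_ND = {1,...,K}, S_D = {K+1,...,N}).
        dangling = np.asarray(G.sum(axis=1)).ravel() == 0
        order = np.concatenate([np.flatnonzero(~dangling), np.flatnonzero(dangling)])
        G = G[order][:, order]
        u = u[order]
        K = int((~dangling).sum())
        y = stage1(G[:K], u, c, eps)                 # Steps 1-2: lumped chain, length K+1
        eta = y[:K] / y[:K].sum()                    # Step 3: aggregation weights, eq. (11)
        w, beta, alpha = build_rank_two(G[:K], eta, u, K)   # Step 4: eqs. (13)-(14)
        z = stage2(w, beta, alpha, u[K:], c, eps)    # Step 5: aggregated chain, length N-K+1
        pi = np.concatenate([y[:K], z[1:]])          # Step 6: concatenate the two stages
        return pi[np.argsort(order)]                 # undo the reordering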
We will now formalize Steps 1, 2, 4, and 5, give specific numerical algorithms for an efficient implementation of these steps, and discuss performance issues. To simplify the notation, throughout the subsequent sections we assume, without loss of generality, that S_ND = {1, 2, ..., K} and S_D = {K+1, K+2, ..., N}. P can be partitioned accordingly as

    P = \begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix}
      = \begin{pmatrix} P_{11} & P_{12} \\ e_{N-K} u_K^T & e_{N-K} u_{N-K}^T \end{pmatrix}                   (5)

P_{11}, P_{12}, P_{21}, and P_{22} are K-by-K, K-by-(N-K), (N-K)-by-K, and (N-K)-by-(N-K) blocks respectively; the first K rows and columns are associated with the non-dangling nodes, and the last N-K rows and columns are associated with the dangling nodes. Likewise, we partition the personalization vector u^T = [u_K^T, u_{N-K}^T]. Note that (5) follows from (2).

4.1 Formalizing Steps 1 & 2

According to Proposition 1, the transition probability matrix for the lumped chain is given by

    P^{(1)} = \begin{pmatrix} P_{11} & P_{12} e_{N-K} \\ u_K^T & u_{N-K}^T e_{N-K} \end{pmatrix}               (6)

This is a (K+1)-by-(K+1) matrix. The matrix is positive, and each row sums to 1.

Recall that Algorithm 1 was able to perform each iteration of the multiplication step x^T P in O(N) operations, as opposed to O(N^2), by separating P into two parts: a sparse matrix \bar{P} and a dense vector u. Multiplication was done with these parts separately, and the results were subsequently added together. Here we can do the same by separating P^{(1)} into sparse and dense-vector parts. A mathematically equivalent form of (6) is

    P^{(1)} = c \begin{pmatrix} \bar{P}^{(1)} \\ 0^T \end{pmatrix} + \begin{pmatrix} (1-c) e_K \\ 1 \end{pmatrix} \tilde{u}^T          (7)

where for 1 \le i \le K

    \bar{P}^{(1)}(i, :) = \left[ \frac{G(i, 1:K)}{\sum_{l=1}^{N} G(i, l)},\; 1 - \frac{G(i, 1:K)\, e_K}{\sum_{l=1}^{N} G(i, l)} \right]          (8)

    \tilde{u}^T = [u_K^T,\; 1 - \alpha]                                  (9)

and \alpha = u_K^T e_K (see (2)). Notice that \bar{P}^{(1)} is K-by-(K+1) and extremely sparse. Computationally, (7) is a much more efficient form than (6). Let x \in R^{(K+1) \times 1} be an arbitrary componentwise non-negative vector with unit one-norm (i.e. its components sum to 1); then

    x^T P^{(1)} = c\, x(1:K)^T \bar{P}^{(1)} + (1 - c + c\, x(K+1))\, \tilde{u}^T                            (10)

Because \bar{P}^{(1)} is extremely sparse, this multiplication requires only O(K) operations. On the other hand, if one had used (6) directly, O(K^2) operations would be needed. Based on this representation we can formalize an algorithm which combines Steps 1 and 2 above:

ALGORITHM 2 (STAGE 1).

    form \bar{P}^{(1)} and \tilde{u} via (8) and (9)
    select y \in R^{K+1}, y \ge 0, \|y\|_1 = 1
    do
        x = y
        y^T = c x(1:K)^T \bar{P}^{(1)} + (1 - c + c x(K+1)) \tilde{u}^T
        \delta = \|y - x\|_1
    until \delta < \epsilon

Algorithm 2 converges to the limiting distribution of P^{(1)}, and yields \pi_k, k = 1, 2, ..., K, together with \sum_{m=K+1}^{N} \pi_m. We now have K components of the PageRank vector. We also use these results to compute the aggregation weights for Steps 4 & 5. We designate these weights as a vector \eta \in R^{K \times 1}. For k = 1, 2, ..., K,

    \eta_k = \frac{\pi_k}{\sum_{m=1}^{K} \pi_m}                          (11)
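Again for concreteness, a minimal Python sketch of Algorithm 2 follows, under the same SciPy assumptions as before; `G_nd` denotes the K-by-N link matrix restricted to the non-dangling rows (with the non-dangling columns first), and all names are ours.

    import numpy as np
    import scipy.sparse as sp

    def stage1(G_nd, u, c=0.85, eps=1e-8):
        # Power iteration on the lumped chain via the sparse form (7)-(10).
        K = G_nd.shape[0]
        out = np.asarray(G_nd.sum(axis=1)).ravel()           # outlink counts (all > 0 here)
        B = sp.diags(1.0 / out) @ G_nd[:, :K]                # links into S_ND, row-normalized
        last = 1.0 - np.asarray(B.sum(axis=1)).ravel()       # mass flowing to the dangling block
        P1_bar = sp.hstack([B, sp.csr_matrix(last).T]).tocsr()   # K x (K+1), eq. (8)
        alpha = u[:K].sum()
        u_tilde = np.append(u[:K], 1.0 - alpha)              # eq. (9)
        y = np.full(K + 1, 1.0 / (K + 1))
        while True:
            x = y
            y = c * (P1_bar.T @ x[:K]) + (1.0 - c + c * x[K]) * u_tilde   # eq. (10)
            if np.abs(y - x).sum() < eps:
                return y    # pi_k for k in S_ND, plus the total dangling mass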
4.2 Formalizing Steps 4 & 5

With the aggregation weights of (11), we can compute the transition probability matrix P^{(2)} by aggregating the non-dangling nodes. According to [14], we have

    P^{(2)} = \begin{pmatrix} \eta^T P_{11} e_K & \eta^T P_{12} \\ e_{N-K} u_K^T e_K & e_{N-K} u_{N-K}^T \end{pmatrix}
            = \begin{pmatrix} \eta^T P_{11} e_K & \eta^T P_{12} \\ \alpha e_{N-K} & e_{N-K} u_{N-K}^T \end{pmatrix}             (12)

This is an (N-K+1)-by-(N-K+1) matrix; recall N - K = card(S_D). Each row of the matrix sums to 1. In addition, as the Perron-Frobenius Theorem guarantees the aggregation weights (11) to be positive, P^{(2)} is also positive. The Markov chain associated with P^{(2)} is thus irreducible and aperiodic, and it has a unique limiting distribution.

A remarkable property of P^{(2)} is that all the rows starting from the second are identical; the only distinct rows are the first and the second. In other words, P^{(2)} is a rank-two matrix. This property allows us to compute the limiting distribution of P^{(2)} with very little work and storage. We will return to this shortly. Meanwhile, we can also derive an alternative form of P^{(2)} which is computationally efficient.

We again split P^{(2)} into sparse and dense parts. Since P^{(2)} is rank-two, we need only look at the first two rows:

    P^{(2)}(1:2, :) = c \begin{pmatrix} \beta & w^T \\ 0 & 0^T \end{pmatrix} + \begin{pmatrix} 1-c \\ 1 \end{pmatrix} [\alpha,\; u_{N-K}^T]

where

    w^T = \sum_{i=1}^{K} \eta_i \frac{G(i, K+1:N)}{\sum_{l=1}^{N} G(i, l)}                                   (13)

    \beta = 1 - w^T e_{N-K}                                              (14)

Notice that w is the weighted sum of K extremely sparse vectors and can be formed very cheaply; the work for computing \beta is even less. Multiplication with a vector can be efficiently implemented as follows. For x \in R^{(N-K+1) \times 1} an arbitrary componentwise non-negative vector with unit one-norm, we have

    x^T P^{(2)} = c\, x_1 [\beta,\; w^T] + (1 - c\, x_1) [\alpha,\; u_{N-K}^T]                               (15)

In other words, multiplication with a vector can be implemented as just the sum of two vectors, which is extremely efficient.
Based on this representation we can formalize an algorithm which combines Steps 4 and 5 above:

ALGORITHM 3 (STAGE 2). Suppose we have computed the aggregation weights \eta according to (11).

    form w and \beta via (13) and (14)
    select x^{(0)} \in R^{N-K+1}, x^{(0)} \ge 0, \|x^{(0)}\|_1 = 1
    for i = 1 : 3
        x^{(i)T} = c x^{(i-1)}_1 [\beta, w^T] + (1 - c x^{(i-1)}_1) [\alpha, u_{N-K}^T]
    end
    if \|x^{(3)} - x^{(2)}\|_1 < \epsilon
        z = x^{(3)}
    else
        % Perform Aitken Extrapolation
        for i = 1 : N-K+1
            v_i = (x^{(2)}_i - x^{(1)}_i)^2 / (x^{(3)}_i - 2 x^{(2)}_i + x^{(1)}_i)
        end
        z = x^{(1)} - v
    end

Algorithm 3 is characteristically different from Algorithm 1 and Algorithm 2. The latter two algorithms amount to a numerical implementation of \lim_{k \to \infty} \pi^{(0)T} P^k and converge iteratively; in general, convergence to a fixed tolerance will occur only after many iterations, and the number of iterations needed is never known ahead of time. On the other hand, Algorithm 3 requires only three iterations of the vector-matrix multiplication. After three iterations, either convergence has already occurred or, if not, the Aitken Extrapolation [12] is performed to extract the limiting distribution. In either case the limiting distribution is available after just three iterations, guaranteed. What makes this fast convergence possible is related to the property of P^{(2)} being a positive, rank-two matrix with each row summing to one. In the next section we prove the correctness of Algorithm 3.

Meanwhile, Algorithm 3 involves relatively little computational work. Each iteration of the main loop is just a sum of two vectors, and only three such iterations are needed; the additional work of the Aitken Extrapolation is also very mild. In fact, the amount of work for the whole of Algorithm 3 is far less than one iteration of Algorithm 2, which involves multiplication with a very large matrix. Considering that Algorithm 2 can take 100 or more iterations to converge, Algorithm 3 is relatively cost-free in comparison. Furthermore, the storage requirement is extremely mild. There is no explicit enumeration of a transition probability matrix: because P^{(2)} is rank-two, only two vectors are stored. As a consequence, the work for Algorithm 3 is computationally negligible, and the overall efficiency of the two-stage algorithm rests entirely on Algorithm 2. We will compare the performance of Algorithm 2 to Algorithm 1 in the next section, and show that Algorithm 2 requires much less work than Algorithm 1.
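In Python, Algorithm 3 might look like the following sketch (our own, with our variable names). A denominator in the Aitken step can vanish only where the numerator also vanishes, i.e. where that component has already converged, so we guard the division.

    import numpy as np

    def stage2(w, beta, alpha, u_d, c=0.85, eps=1e-8):
        # Three multiplications with the rank-two matrix P^(2) via (15),
        # followed by Aitken extrapolation if not yet converged. u_d is
        # the personalization vector restricted to the dangling nodes.
        row1 = np.append(beta, w)                    # first-row part  [beta, w^T]
        row2 = np.append(alpha, u_d)                 # dangling row    [alpha, u_{N-K}^T]
        M = len(row2)
        x = [np.full(M, 1.0 / M)]
        for _ in range(3):                           # three iterations always suffice
            x.append(c * x[-1][0] * row1 + (1.0 - c * x[-1][0]) * row2)   # eq. (15)
        x1, x2, x3 = x[1], x[2], x[3]
        if np.abs(x3 - x2).sum() < eps:
            return x3                                # converged exactly
        denom = x3 - 2.0 * x2 + x1                   # Aitken denominator, componentwise
        v = np.where(denom != 0.0, (x2 - x1) ** 2 / np.where(denom != 0.0, denom, 1.0), 0.0)
        return x1 - v                                # Aitken-extrapolated limit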
5. CONVERGENCE ANALYSIS

We address two issues in this section. First, we compare the performance of Algorithm 2 to Algorithm 1. Second, we validate the correctness of Algorithm 3 for computing the limiting distribution of P^{(2)}.

5.1 Convergence of Algorithm 2

We show that Algorithm 2 can always be made to converge in as many or fewer iterations than Algorithm 1. We begin with a lemma.

LEMMA 1. Let x^{(0)} \in R^{N \times 1} be given, and define

    L = \begin{pmatrix} I & 0 \\ 0 & e_{N-K} \end{pmatrix}

where I is the K-by-K identity matrix. Set y^{(0)T} = x^{(0)T} L, and consider the two sequences of iterates, for l = 0, 1, 2, ...,

    x^{(l+1)T} = x^{(l)T} P,    y^{(l+1)T} = y^{(l)T} P^{(1)}.

Then for l = 0, 1, 2, ...,

    y^{(l)T} = x^{(l)T} L.

PROOF. Suppose the claim is true for l_0. Then, using (5) and (6),

    y^{(l_0+1)T} = y^{(l_0)T} P^{(1)} = x^{(l_0)T} L P^{(1)}
                 = x^{(l_0)T} \begin{pmatrix} P_{11} & P_{12} e_{N-K} \\ e_{N-K} u_K^T & e_{N-K} u_{N-K}^T e_{N-K} \end{pmatrix}
                 = x^{(l_0)T} P L = x^{(l_0+1)T} L.

Thus the claim must hold for l_0 + 1. Induction completes the proof.

PROPOSITION 2. Let x^{(l)} and y^{(l)} be defined as in the lemma. Then for l = 0, 1, 2, ...,

    \|y^{(l+1)} - y^{(l)}\|_1 \le \|x^{(l+1)} - x^{(l)}\|_1.

PROOF. Following the lemma, the first K components of y^{(l+1)} - y^{(l)} and x^{(l+1)} - x^{(l)} coincide, and y^{(l)}_{K+1} = \sum_{i=K+1}^{N} x^{(l)}_i. Hence

    \|x^{(l+1)} - x^{(l)}\|_1 - \|y^{(l+1)} - y^{(l)}\|_1
        = \sum_{i=K+1}^{N} |x^{(l+1)}_i - x^{(l)}_i| - |y^{(l+1)}_{K+1} - y^{(l)}_{K+1}|
        = \sum_{i=K+1}^{N} |x^{(l+1)}_i - x^{(l)}_i| - \left| \sum_{i=K+1}^{N} (x^{(l+1)}_i - x^{(l)}_i) \right|
        \ge 0

by the triangle inequality.

This shows that if Algorithm 1 is applied to P with some starting iterate, a related starting iterate can always be constructed for P^{(1)} so that Algorithm 2 converges in as many or fewer iterations, with respect to the same tolerance. In addition, we note that Algorithm 2 requires much less computational work per iteration. The reasons are:

- Algorithm 2 works with (K+1)-vectors throughout; Algorithm 1 works with N-vectors.

- Algorithm 2 involves multiplying by a K-by-(K+1) sparse matrix; Algorithm 1 involves multiplying by an N-by-N sparse matrix. With sparse matrices, the number of non-zeros is a much better gauge of performance than the size of the matrix, and it is easily seen that the number of non-zeros in \bar{P}^{(1)} is only a fraction of \bar{P}'s.
- Algorithm 2 gets rid of a norm-taking step during each iteration of the loop by requiring the initial vector to have unit norm. In comparison, Algorithm 1 requires taking one additional norm, adding O(N) operations to each iteration.

This suggests each iteration of Algorithm 2 is O(K); on the other hand, each iteration of Algorithm 1 is O(N). While the actual reduction in work depends on additional factors, such as the distribution of the non-zero elements, the difference is rather dramatic, as K is typically only a fraction of N. As it turns out, the amount of time the entire two-stage algorithm takes is roughly K/N of what Algorithm 1 takes. Furthermore, the two-stage algorithm never explicitly enumerates the entire transition probability matrix; only a part of it is used at any given time. Consequently, on systems with insufficient memory to store the entire transition probability matrix (almost always the case: it is reported in [12] that a modest dataset with 290 million pages requires as much as 6GB, whereas the amount of addressable memory on a 32-bit machine is 4GB, so if Algorithm 1 is used, disk use cannot be avoided), the performance advantage of the two-stage algorithm is even more pronounced, as the frequency of disk access is reduced.

5.2 Convergence of Algorithm 3

We now show that Algorithm 3 indeed computes the unique limiting distribution of P^{(2)}. We remark that it can easily be shown that the limiting distribution of P^{(2)} is in fact the left eigenvector associated with the eigenvalue 1; thus it suffices to show that Algorithm 3 computes this eigenvector.

LEMMA 2. To simplify notation, denote M = N - K + 1 (i.e. P^{(2)} is M-by-M). Then 1 is the dominant eigenvalue of P^{(2)}, with an algebraic and geometric multiplicity of one, and 0 is also an eigenvalue, with a geometric multiplicity of M - 2. If P^{(2)} does not have another distinct eigenvalue, then

    x^T P^{(2)} P^{(2)} = x^T P^{(2)} P^{(2)} P^{(2)}

for any x \in R^{M \times 1}. In other words, the sequence has converged exactly to either the left eigenvector associated with the eigenvalue 1 or the null vector 0.

PROOF. We have stated earlier that P^{(2)} is positive, rank-two, and has rows that sum to 1. The Perron-Frobenius Theorem (cf. [2], [8]) establishes the first claim. Next, suppose P^{(2)} does not have a third distinct eigenvalue. The algebraic multiplicity of 0 is then necessarily M - 1. The Jordan canonical form of P^{(2)} establishes the second claim (see [5], [8]).

PROPOSITION 3. Let x \in R^{M \times 1} be given. If

    x^T P^{(2)} P^{(2)} \ne x^T P^{(2)} P^{(2)} P^{(2)},

then there exists another eigenvalue \lambda of P^{(2)} such that 0 < |\lambda| < 1. In addition, for l = 1, 2, ...,

    x^T (P^{(2)})^l = c_1 v_1^T + c_2 \lambda^l v_2^T

where v_1^T P^{(2)} = v_1^T, v_2^T P^{(2)} = \lambda v_2^T, and c_1 and c_2 are constants.

[Figure 1: Log-error (one-norm) at each iteration, for the standard algorithm and Stage 1.]

PROOF. The first part follows directly from the lemma. An examination of the geometric multiplicities of P^{(2)} reveals the existence of a full set of eigenvectors that span R^{M \times 1}. Writing x as a combination of these eigenvectors establishes the second part.

Proposition 3 can be rephrased as follows. Consider an arbitrary vector repeatedly multiplied by P^{(2)}. Either exact convergence will have occurred after three iterations, or it won't. If converged, then we are done. If not, we are still assured that all subsequent iterates are contained in the span of the first and second eigenvectors. This knowledge enables us to extract the first eigenvector, i.e. the
limiting distribution, by subtracting away the component along the second eigenvector. (This is precisely what the Aitken Extrapolation does; for details on the Aitken Extrapolation, see [12].) This validates the correctness of Algorithm 3.

6. A NUMERICAL EXPERIMENT

The analysis in the preceding section suggests that the two-stage algorithm ought to take only a fraction of the time to converge compared to the standard algorithm. We now show that this is indeed the case with an actual numerical experiment. Our results are based on a subset of N = 451,237 webpages sampled from a 2001 crawl by the Stanford WebBase project. The number of non-dangling nodes in this sample is K = 137,212; K/N is roughly 30%. The experiment was conducted on a 2.4GHz dual-Xeon workstation with 4GB of RAM and a 70GB x 4 RAID-0 hard disk system. The amount of memory is ample for the size of the data, so there are no complications of disk access in the results. The following table summarizes the dimensions and the number of non-zero elements of the matrices involved in the computation:

              \bar{P}       \bar{P}^{(1)}    u          \tilde{u}
    dims      N x N         K x (K+1)        N          K+1
    nnz       1,082,...     ...              451,237    137,213

Two values of c were tried: c = 0.85 as a fast-converging example, and c = 0.95 as a slow-converging example. The tolerance \epsilon is set to ... The following table shows that with either value of c, the total time for the two-stage algorithm is just 20% of the standard PageRank algorithm's:
[Table: time in seconds and number of iterations for Steps 1-5, the two-stage total, and the standard algorithm, for c = 0.85 and c = 0.95.]

[Figure 2: A blow-up of the first 15 iterations of Figure 1.]

In either case, Stage 1 (Steps 1 & 2) constitutes the bulk of the work, making up 95% and 98% of the total time respectively. Furthermore, the error of Stage 1 at each iteration is consistently below that of the standard algorithm's, but the gap eventually diminishes; in both cases the two algorithms terminate after the same number of iterations (see Figure 1 and Figure 2). This coincides with the prediction of Proposition 2. The amount of time needed for Stage 2 (Steps 4 & 5) is minuscule in comparison. When the distributions from the two stages are concatenated, we obtain the entire PageRank vector; the one-norm difference between this vector and the one produced by the standard algorithm is ... when c = 0.85 and ... when c = 0.95. In sum, we have presented a way of effectively managing the dangling nodes: regardless of the number of dangling nodes present, which is usually a very large number, the total computation time is only proportional to the number of non-dangling nodes.

7. TREATMENT OF DANGLING NODES

In this section we address the issue of dangling nodes from a modeling perspective. There are two sources of dangling nodes. A webpage is dangling if it genuinely has no outlinks. On the other hand, we also consider a webpage to be dangling if we simply have no information regarding its outlinks. The latter can arise when the webpage has been referenced (i.e. linked to) by another webpage in a crawl but is itself not included in the crawl; this is a very typical scenario, as the vastness and rapidly-changing nature of the Web render a complete crawl impossible [16].

How best to treat the dangling nodes is very much a philosophical question. Some people choose to leave them out of the computation completely; this amounts to computing the limiting distribution of just the leading K-by-K block P_{11} of (5) and defining it as the PageRank vector. Others have chosen to include the dangling nodes in the computation by inserting the personalization vector u into the rows of P corresponding to the dangling nodes; see (2). In this paper, we have adopted the second view for a number of reasons:

- Because in a typical situation there usually is a very large number of dangling nodes, throwing all of them away is to give up an enormous amount of information. First, we would have no way of ranking any of the dangling pages, which, ironically, constitute most of the webpages. Second, the resulting PageRank vector would not incorporate any information from P_{12}. Keep in mind that P_{12} consists of actual links; it is completely legitimate and, in terms of probability mass and contribution to the final limiting distribution, no less important than P_{11}.

- There are some very important classes of webpages (URLs) that are by nature dangling. These include PDFs, images, movies, etc. It would be a significant loss if one could not search for research papers or movie trailers, for example.

- While inserting the personalization vector into the rows of the dangling nodes may seem arbitrary at first, the practice is not necessarily so inappropriate. What is asserted here is that if there are no outlinks, the websurfer can move to any page according to preference. This is certainly not an inaccurate way to model transitions (in reality, a websurfer can always go to a page by directly entering the URL; an explicit link is not the only way to move to a page), and is in fact quite sensible from a behavioral perspective. The bottom line is that the Markov chain model of PageRank is very much a hybrid model of structure (i.e. links) and behavior (i.e. preferences/personalization); its success may well lie in its ability to recognize the importance of both aspects.
While on this issue of whether to throw away the dangling nodes, we shall also mention that a very common suggestion is to not throw away the dangling nodes in the overall computation, but to leave them out until the very last stages [16]. In other words, one would first compute the limiting distribution of P_{11}, pad it with N - K more elements, and use that as the initial vector for the entire matrix P. It is believed that this procedure accelerates convergence. While it may well accelerate convergence in particular cases, it isn't true in general; see the Appendix for a simple counter-example constructed by personalization. In general, the limiting distribution of P_{11} doesn't coincide with, or even approximate, the first K components of P's limiting distribution. What is true, from the theory of stochastic complementation [14], is that the first K components of P's limiting distribution, when normalized, coincide with the limiting distribution of the stochastic complement of P_{11}:

    S_{11} = P_{11} + P_{12} (I - P_{22})^{-1} P_{21}

If P were a nearly completely decomposable (NCD) matrix, the off-diagonal blocks would contain negligible probability mass, and S_{11} and P_{11} would be roughly the same; in that case the limiting distribution of P_{11} would approximate the first K components of P's limiting distribution [3]. However, P is not NCD under our present partition: the dangling nodes and the non-dangling nodes do not form two NCD subsets, as a significant probability mass can be found in each block of (5). The procedure described in this paper renders this common practice irrelevant.
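To make the stochastic-complement point concrete, here is a small self-contained Python check (our own toy construction, not the paper's example): the normalized leading components of P's stationary vector match the stationary vector of S_{11}, while the stationary vector of row-normalized P_{11} generally does not.

    import numpy as np

    def stationary(A):
        # Left eigenvector of A for the eigenvalue 1, scaled to sum to 1.
        vals, vecs = np.linalg.eig(A.T)
        v = np.real(vecs[:, np.argmax(np.real(vals))])
        return v / v.sum()

    rng = np.random.default_rng(1)
    P = rng.random((5, 5))
    P /= P.sum(axis=1, keepdims=True)                # a random positive stochastic matrix
    K = 2
    S11 = P[:K, :K] + P[:K, K:] @ np.linalg.inv(np.eye(5 - K) - P[K:, K:]) @ P[K:, :K]
    pi = stationary(P)
    print(stationary(S11))                           # equals pi[:K] / pi[:K].sum()
    print(pi[:K] / pi[:K].sum())
    print(stationary(P[:K, :K] / P[:K, :K].sum(axis=1, keepdims=True)))  # generally different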
What we have shown is as follows. Contrary to common practice, we cannot hope to use the limiting distribution of P_{11} to approximate the first K components of P's limiting distribution. On the other hand, we can compute the latter exactly by computing the limiting distribution of the lumped chain. The transition probability matrix (6) of the lumped chain is of course (K+1)-by-(K+1), effectively the same size as P_{11}. And once that is done, with very little additional work we obtain the entire PageRank vector.

8. CONCLUDING REMARKS

In this paper we present a fast two-stage algorithm for computing the PageRank vector. We exploit the fact that the Markov chain associated with PageRank is lumpable. In the first stage, we compute the limiting distribution of a Markov chain where the dangling nodes are lumped into one; in the second stage, we compute the limiting distribution of a chain where the non-dangling nodes are combined. The two limiting distributions are concatenated to form the PageRank vector. Most of the work lies in computing the limiting distribution of the lumped chain, and the total work is only proportional to the number of non-dangling nodes. A numerical experiment shows that in practice the two-stage algorithm finishes in only a fraction of the time required by the standard PageRank algorithm, in this case as little as 20%. Furthermore, only a part of the transition probability matrix is enumerated at any given time, and the memory requirement is accordingly mild. On machines where the memory is limited relative to the size of the problem (which is almost always the case in reality), the performance gap between the two-stage algorithm and the standard algorithm is likely to be even wider. Lastly, our algorithm represents an alternative to the common practice of not including the dangling nodes until the last stages of the computation. That practice lacks theoretical support and cannot be expected to accelerate convergence in general. On the other hand, the algorithm described here is provable, generally applicable, and achieves the desired speed gains.

9. ACKNOWLEDGMENTS

The authors would like to thank Hector Garcia-Molina, Andreas Paepcke, Sriram Raghavan, and Gary Wesley of the Stanford WebBase project for assisting with access to their data, and Sepandar Kamvar, Wang Lam, Amy Langville, and Sebastiano Vigna for their helpful comments.

10. ADDITIONAL AUTHORS

Additional author: Stephanie Leung (Computer Science Department, Stanford University), wleung@stanford.edu.

11. REFERENCES

[1] A. Arasu, J. Novak, A. Tomkins, and J. Tomlin. PageRank computation and the structure of the Web: experiments and algorithms. In Proceedings of the Eleventh International World Wide Web Conference, Poster Track, 2002.

[2] A. Berman and R. J. Plemmons. Nonnegative Matrices in the Mathematical Sciences. SIAM Press, Philadelphia, 1994.

[3] W. L. Cao and W. J. Stewart. Iterative aggregation/disaggregation techniques for nearly uncoupled Markov chains. Journal of the Association for Computing Machinery, 32, 1985.

[4] T. Dayar and W. J. Stewart. Quasi-lumpability, lower bounding coupling matrices, and nearly completely decomposable Markov chains. SIAM Journal on Matrix Analysis and Applications, 18(2), 1997.

[5] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Third Edition, 1996.

[6] T. H. Haveliwala and S. D. Kamvar. The second eigenvalue of the Google matrix. Technical report, Stanford University, 2003.

[7] J. Hirai, S. Raghavan, H. Garcia-Molina, and A. Paepcke. WebBase: a repository of web pages. In Proceedings of the Ninth International World Wide Web Conference, 2000.

[8] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, 1985.
[9] G. Jeh and J. Widom. Scaling personalized web search. In Proceedings of the Twelfth International World Wide Web Conference, 2003.

[10] S. D. Kamvar, T. H. Haveliwala, and G. H. Golub. Adaptive methods for the computation of PageRank. Technical report, Stanford University, 2003.

[11] S. D. Kamvar, T. H. Haveliwala, C. D. Manning, and G. H. Golub. Exploiting the block structure of the Web for computing PageRank. Technical report, Stanford University, 2003.

[12] S. D. Kamvar, T. H. Haveliwala, C. D. Manning, and G. H. Golub. Extrapolation methods for accelerating PageRank computations. In Proceedings of the Twelfth International World Wide Web Conference, 2003.

[13] J. G. Kemeny and J. L. Snell. Finite Markov Chains. D. Van Nostrand, New York, 1960.

[14] C. D. Meyer. Stochastic complementation, uncoupling Markov chains, and the theory of nearly reducible systems. SIAM Review, 31(2), 1989.

[15] C. Moler. The world's largest matrix computation. MATLAB News & Notes, pages 12-13, October 2002.

[16] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the Web. Technical report, Stanford Digital Library Technologies Project, 1998.

[17] H. A. Simon and A. Ando. Aggregation of variables in dynamic systems. Econometrica, 29, 1961.

APPENDIX

Here is a small counter-example. It demonstrates that the common practice of including the dangling nodes only during the last stages of computation doesn't always accelerate convergence. Take the link matrix

    G = ...

Here, K = 2 and N = 4. Take c = 0.85 and u^T = (1/(3a)) [a, a, a, ...], where a = 43. Thus

    P = ...

It can be verified that the limiting distribution of the leading 2-by-2 submatrix of P is ..., while for the entire matrix it is .... The bottom line is that the limiting distribution of the 2-by-2 submatrix yields a worse starting iterate than the uniform vector, and the desired acceleration is not observed. For more details on why, see [14].
Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular
More informationAN APPLICATION OF LINEAR ALGEBRA TO NETWORKS
AN APPLICATION OF LINEAR ALGEBRA TO NETWORKS K. N. RAGHAVAN 1. Statement of the problem Imagine that between two nodes there is a network of electrical connections, as for example in the following picture
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data Instructor: Yizhou Sun yzsun@ccs.neu.edu March 16, 2016 Methods to Learn Classification Clustering Frequent Pattern Mining Matrix Data Decision
More informationMultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors
MultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors Michael K. Ng Centre for Mathematical Imaging and Vision and Department of Mathematics
More informationSlides based on those in:
Spyros Kontogiannis & Christos Zaroliagis Slides based on those in: http://www.mmds.org High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering
More informationData Mining and Matrices
Data Mining and Matrices 10 Graphs II Rainer Gemulla, Pauli Miettinen Jul 4, 2013 Link analysis The web as a directed graph Set of web pages with associated textual content Hyperlinks between webpages
More informationLinear Programming Redux
Linear Programming Redux Jim Bremer May 12, 2008 The purpose of these notes is to review the basics of linear programming and the simplex method in a clear, concise, and comprehensive way. The book contains
More informationMPageRank: The Stability of Web Graph
Vietnam Journal of Mathematics 37:4(2009) 475-489 VAST 2009 MPageRank: The Stability of Web Graph Le Trung Kien 1, Le Trung Hieu 2, Tran Loc Hung 1, and Le Anh Vu 3 1 Department of Mathematics, College
More informationLecture 10. Lecturer: Aleksander Mądry Scribes: Mani Bastani Parizi and Christos Kalaitzis
CS-621 Theory Gems October 18, 2012 Lecture 10 Lecturer: Aleksander Mądry Scribes: Mani Bastani Parizi and Christos Kalaitzis 1 Introduction In this lecture, we will see how one can use random walks to
More informationQuick Introduction to Nonnegative Matrix Factorization
Quick Introduction to Nonnegative Matrix Factorization Norm Matloff University of California at Davis 1 The Goal Given an u v matrix A with nonnegative elements, we wish to find nonnegative, rank-k matrices
More information6.207/14.15: Networks Lectures 4, 5 & 6: Linear Dynamics, Markov Chains, Centralities
6.207/14.15: Networks Lectures 4, 5 & 6: Linear Dynamics, Markov Chains, Centralities 1 Outline Outline Dynamical systems. Linear and Non-linear. Convergence. Linear algebra and Lyapunov functions. Markov
More informationRobust PageRank: Stationary Distribution on a Growing Network Structure
oname manuscript o. will be inserted by the editor Robust PageRank: Stationary Distribution on a Growing etwork Structure Anna Timonina-Farkas Received: date / Accepted: date Abstract PageRank PR is a
More informationA linear model for a ranking problem
Working Paper Series Department of Economics University of Verona A linear model for a ranking problem Alberto Peretti WP Number: 20 December 2017 ISSN: 2036-2919 (paper), 2036-4679 (online) A linear model
More informationVolume in n Dimensions
Volume in n Dimensions MA 305 Kurt Bryan Introduction You ve seen that if we have two vectors v and w in two dimensions then the area spanned by these vectors can be computed as v w = v 1 w 2 v 2 w 1 (where
More informationeigenvalues, markov matrices, and the power method
eigenvalues, markov matrices, and the power method Slides by Olson. Some taken loosely from Jeff Jauregui, Some from Semeraro L. Olson Department of Computer Science University of Illinois at Urbana-Champaign
More informationLink Analysis. Stony Brook University CSE545, Fall 2016
Link Analysis Stony Brook University CSE545, Fall 2016 The Web, circa 1998 The Web, circa 1998 The Web, circa 1998 Match keywords, language (information retrieval) Explore directory The Web, circa 1998
More informationCS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works
CS68: The Modern Algorithmic Toolbox Lecture #8: How PCA Works Tim Roughgarden & Gregory Valiant April 20, 206 Introduction Last lecture introduced the idea of principal components analysis (PCA). The
More informationIncompatibility Paradoxes
Chapter 22 Incompatibility Paradoxes 22.1 Simultaneous Values There is never any difficulty in supposing that a classical mechanical system possesses, at a particular instant of time, precise values of
More informationTopics in linear algebra
Chapter 6 Topics in linear algebra 6.1 Change of basis I want to remind you of one of the basic ideas in linear algebra: change of basis. Let F be a field, V and W be finite dimensional vector spaces over
More informationMarkov Chain Monte Carlo The Metropolis-Hastings Algorithm
Markov Chain Monte Carlo The Metropolis-Hastings Algorithm Anthony Trubiano April 11th, 2018 1 Introduction Markov Chain Monte Carlo (MCMC) methods are a class of algorithms for sampling from a probability
More informationMATH36001 Perron Frobenius Theory 2015
MATH361 Perron Frobenius Theory 215 In addition to saying something useful, the Perron Frobenius theory is elegant. It is a testament to the fact that beautiful mathematics eventually tends to be useful,
More informationUtilizing Network Structure to Accelerate Markov Chain Monte Carlo Algorithms
algorithms Article Utilizing Network Structure to Accelerate Markov Chain Monte Carlo Algorithms Ahmad Askarian, Rupei Xu and András Faragó * Department of Computer Science, The University of Texas at
More informationCutting Graphs, Personal PageRank and Spilling Paint
Graphs and Networks Lecture 11 Cutting Graphs, Personal PageRank and Spilling Paint Daniel A. Spielman October 3, 2013 11.1 Disclaimer These notes are not necessarily an accurate representation of what
More informationChapter 2: Matrix Algebra
Chapter 2: Matrix Algebra (Last Updated: October 12, 2016) These notes are derived primarily from Linear Algebra and its applications by David Lay (4ed). Write A = 1. Matrix operations [a 1 a n. Then entry
More informationCANONICAL FORMS FOR LINEAR TRANSFORMATIONS AND MATRICES. D. Katz
CANONICAL FORMS FOR LINEAR TRANSFORMATIONS AND MATRICES D. Katz The purpose of this note is to present the rational canonical form and Jordan canonical form theorems for my M790 class. Throughout, we fix
More informationUsing Markov Chains To Model Human Migration in a Network Equilibrium Framework
Using Markov Chains To Model Human Migration in a Network Equilibrium Framework Jie Pan Department of Mathematics and Computer Science Saint Joseph s University Philadelphia, PA 19131 Anna Nagurney School
More informationa (b + c) = a b + a c
Chapter 1 Vector spaces In the Linear Algebra I module, we encountered two kinds of vector space, namely real and complex. The real numbers and the complex numbers are both examples of an algebraic structure
More informationMath 471 (Numerical methods) Chapter 3 (second half). System of equations
Math 47 (Numerical methods) Chapter 3 (second half). System of equations Overlap 3.5 3.8 of Bradie 3.5 LU factorization w/o pivoting. Motivation: ( ) A I Gaussian Elimination (U L ) where U is upper triangular
More informationMath 291-2: Lecture Notes Northwestern University, Winter 2016
Math 291-2: Lecture Notes Northwestern University, Winter 2016 Written by Santiago Cañez These are lecture notes for Math 291-2, the second quarter of MENU: Intensive Linear Algebra and Multivariable Calculus,
More informationPseudocode for calculating Eigenfactor TM Score and Article Influence TM Score using data from Thomson-Reuters Journal Citations Reports
Pseudocode for calculating Eigenfactor TM Score and Article Influence TM Score using data from Thomson-Reuters Journal Citations Reports Jevin West and Carl T. Bergstrom November 25, 2008 1 Overview There
More information642:550, Summer 2004, Supplement 6 The Perron-Frobenius Theorem. Summer 2004
642:550, Summer 2004, Supplement 6 The Perron-Frobenius Theorem. Summer 2004 Introduction Square matrices whose entries are all nonnegative have special properties. This was mentioned briefly in Section
More information