How to optimize the personalization vector to combat link spamming


Delft University of Technology
Faculty of Electrical Engineering, Mathematics and Computer Science
Delft Institute of Applied Mathematics

How to optimize the personalization vector to combat link spamming

Report for the Delft Institute of Applied Mathematics as part of BACHELOR OF SCIENCE in APPLIED MATHEMATICS

by Jenny Tjan

Delft, the Netherlands
June 2016

Copyright © 2016 by Jenny Tjan. All rights reserved.


BSc report APPLIED MATHEMATICS

Jenny Tjan
Delft University of Technology

Thesis advisor: Dr.ir. M.B. van Gijzen

Other members of the graduation committee: Dr. J.L.A. Dubbeldam, Dr.ir. M. Keijzer

June 2016, Delft


Abstract

Google uses the PageRank algorithm to rank the web. The algorithm models the behavior of a random surfer: the surfer either follows an outlink or goes to an arbitrary page by entering a URL into the address bar, which is called teleportation. The probability that the surfer teleports to a given page is specified in the personalization vector. The PageRank algorithm returns a PageRank score for each page, and this score determines the position of the page in the search results: the higher the score, the higher the page appears on the list. However, some people want to increase their PageRank score artificially. Link spamming is the practice of adding and removing links between pages with the sole purpose of increasing a PageRank score. We want to find a method to lower the effect of link spamming. One way is to change the personalization vector: if we restrict the pages the random surfer can teleport to, we can prevent the surfer from teleporting to a page that is suspected of link spamming. So one way to suppress the effect of link spamming is to optimize the personalization vector. In order to combat link spamming, we have examined the role and influence of the personalization vector, and we describe two methods to optimize it. The first method generates a number of personalization vectors and, for each of them, calculates the sum of the PageRank scores of the pages suspected of link spamming; the lowest sum identifies an optimal personalization vector. The second method uses linear programming: we minimize the PageRank scores of the suspected pages and recover the optimal personalization vector. The results were not always useful. For that reason, we added two extra requirements: setting an upper limit, for every page, on the probability that the surfer can teleport to it, and suppressing the pages in the irreducible subsets. If a surfer enters an irreducible subset, it can never leave the subset by following outlinks.

Preface

Finally, this thing in your hands is the last thing I had to do to get my bachelor degree. These three years went by faster than I thought. One of the many reasons why I chose this study was wondering why you can only study mathematics at a university. The other question was why there were two kinds, mathematics and applied mathematics: what was the difference? Instead of asking around, I enrolled at the university just to see for myself. I can say for sure that changing studies was one of the "yolo" moments of my life. The same goes for walking into the numerical department looking for an interesting bachelor project. Although I find probability theory very interesting, somehow I ended up here. I do not regret my decision at all and had a pretty fun time sitting on the third floor every day.

If I had to thank everyone who has helped me through this bachelor journey, I would have to kill a few trees to write down all the names, and the previous versions of this report have already been printed out at least 10 times (by the time you read this, around 15 times). So I am going to be environment-friendly and just thank my supervisor Martin van Gijzen. He has helped me with my bachelor project by giving me advice, sharing his wisdom and checking my spelling. I especially want to thank these people for proofreading my thesis: Dyan Konijnenberg and Pim Otte. Small thanks to K.P. Hart, Rowan Kerstens and Vivian van der Heul for reading and giving me feedback on the abstract, introduction and conclusion, so I know that everybody, whatever their level of mathematics, can understand what this report is about. Thanks as well to Dylan Huizing for putting on the finishing touches and to Tim Hegeman for the light bulb moment. Do not forget Joanne Tjan, for moral support and doing absolutely nothing.

I hope you will enjoy reading this as much as I enjoyed writing it. Let me end with one of my favorite cheesy quotes: it is not about the destination, it is about the journey. Oh, and the journey has not ended, because I will return to Delft as a master student next semester. That means more joy for at least two years.

Jenny Tjan

Contents

1 Introduction
2 Preliminary mathematical definitions
   2.1 Notations
   2.2 Definitions
   2.3 Theorems
3 Google Matrix
   3.1 Model for the random surfer
   3.2 Computing the PageRank vector
      3.2.1 Power Method
      3.2.2 Linear system approach
   3.3 Example
4 Effect of link spamming
   4.1 Link spamming
   4.2 Suspected pages
   4.3 Research question
5 Modifying the PageRank Model with v
   5.1 Personalization vector
   5.2 Influence of the personalization vector
      5.2.1 Validation
6 Finding an optimal v
   6.1 Method 1
      6.1.1 Example with 7 nodes
   6.2 Method 2: Linear Programming
      6.2.1 Example
   6.3 Irreducible subset
   6.4 Method 2.1 and Method 2.2
   6.5 Example
   6.6 Summary of the results of the four methods
7 Numerical results
   7.1 G 500
   7.2 Computation Time
8 Conclusion
A End results
   A.1 G 500
   A.2 G 9914
B Matlab codes
   B.0.1 WebH.m
   B.1 Methods to calculate the PageRank vector
      B.1.1 pagerankpow.m
      B.1.2 IT.m
   B.2 Methods to optimize the personalization vector
      B.2.1 Random.m
      B.2.2 OPTx.m
      B.2.3 OPTx2.m

1 Introduction

Millions of people use the internet to search for information every day. They usually go to a search engine to get their results. A search engine is software designed to search for information on the web, and Google is one of the most used search engines at this moment. But why do people prefer Google over Yahoo and other search engines? The internet nowadays consists of billions of web pages and even more links connecting them. How does Google sort and return these search results? The answer to these questions is PageRank. This algorithm is what Google uses to return its search results and to stay at the top of its field.

The PageRank algorithm models the behavior of a random web surfer that follows an outlink from the current page with probability $\alpha$ or goes to another page with probability $(1-\alpha)$ by writing the URL in the address bar. This is also called teleportation. The surfer teleports to any page with a probability given in the personalization vector. The mathematical way to interpret this is as a random walk on a directed graph, which can also be described as a Markov chain. The algorithm returns a PageRank vector containing, for each page, the probability (also called the PageRank score) that the web surfer will be on that page after many steps. The search result follows from the PageRank vector by ranking the page scores from high to low.

Some people may want to increase the PageRank score of their page. One method to do so is link spamming. Link spamming aims to fool the model by adding links to pages to increase their PageRank scores. What we want to achieve in this project is to return a more honest search result than PageRank would. This will be done by modifying the model such that pages that are suspected of link spamming get a lower PageRank score. It is well known how to detect link spamming; we refer to the work of Sangers and van Gijzen [3]. The personalization vector is a probability vector that indicates the pages the surfer can teleport to. This research aims to investigate and optimize the personalization vector to combat link spamming.

The structure of this report is as follows. In Section 2 we give the preliminary mathematical definitions that will be used later in the report. In Section 3 we discuss the mathematical interpretation of the PageRank algorithm and how Google models the random surfer. In Section 4 we further describe link spamming and its harmful effects. The remaining sections are about the effect of the personalization vector and how to optimize it to combat link spamming. Finally, we discuss the methods and conclude our findings.

2 Preliminary mathematical definitions

This section defines the notations, definitions and theorems that will be used later in the report.

2.1 Notations

A bold symbol denotes a vector; the same symbol, non-bold and with an index, denotes one of its coefficients. Let $\boldsymbol{\pi}$ be a $1 \times n$ vector; then we write $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_n)$.

2.2 Definitions

Definition 1. Let $A$ be an $n \times n$ matrix. $A$ is said to be a reducible matrix when there exists a permutation matrix $P$ such that

$$P^T A P = \begin{bmatrix} X & Y \\ 0 & Z \end{bmatrix}$$

where $X$ and $Z$ are both square. Otherwise $A$ is said to be an irreducible matrix.

Definition 2. Let $l$ be the number of irreducible subsets of $S$. Then we can rewrite $S$ in canonical form by renumbering the coefficients:

$$\begin{bmatrix}
A_{1,1} & A_{1,2} & \cdots & A_{1,r} & A_{1,r+1} & A_{1,r+2} & \cdots & A_{1,m} \\
0       & A_{2,2} & \cdots & A_{2,r} & A_{2,r+1} & A_{2,r+2} & \cdots & A_{2,m} \\
\vdots  &         & \ddots & \vdots  & \vdots    & \vdots    &        & \vdots  \\
0       & 0       & \cdots & A_{r,r} & A_{r,r+1} & A_{r,r+2} & \cdots & A_{r,m} \\
0       & 0       & \cdots & 0       & A_{r+1,r+1} & 0       & \cdots & 0       \\
\vdots  &         &        & \vdots  & 0         & A_{r+2,r+2} &      & \vdots  \\
0       & 0       & \cdots & 0       & 0         & 0         & \cdots & A_{m,m}
\end{bmatrix}$$

where $l = m - r$, each $A_{1,1}, \ldots, A_{m,m}$ is either irreducible or $[0]_{1 \times 1}$, and each of $A_{r+1,r+1}, \ldots, A_{m,m}$ corresponds to an irreducible subset. See [5].

Definition 3. A matrix $A$ is positive if $a_{ij} > 0$ for every element $a_{ij}$ of $A$, i.e. all the elements of $A$ are greater than zero.

Definition 4. Let $A$ be an $n \times n$ matrix with eigenvalues $\lambda_i$. The spectral radius of $A$ is given by $\rho(A) = \max_i |\lambda_i|$.

Definition 5. Given a Markov chain on a state space $V$ with transition matrix $S$, we call a subset $C \subseteq V$ a closed subset or irreducible subset if and only if $\sum_{j \notin C} S_{ij} = 0$ for each $i \in C$.

Definition 6. Let $\mathbf{x}$ be a $1 \times n$ vector and $A$ an $n \times n$ matrix. The 1-norm and $\infty$-norm are defined as follows:

$$\|\mathbf{x}\|_1 = \sum_{i=1}^n |x_i|, \qquad \|\mathbf{x}\|_\infty = \max_{1 \le i \le n} |x_i|,$$
$$\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^n |a_{ij}|, \qquad \|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}|.$$

Definition 7. The condition number of the matrix $A$ is given by $\kappa(A) = \|A\| \, \|A^{-1}\|$.

2.3 Theorems

Theorem 1 (Perron-Frobenius). Let $G$ be an $n \times n$ irreducible nonnegative matrix. Then $G$ has a unique positive real eigenvalue $\lambda_1$ equal to its spectral radius. If $G$ is positive, then $\lambda_1$ is dominant. To $\lambda_1$ corresponds a positive eigenvector.

Theorem 2 (Gershgorin circle theorem). The eigenvalues of a general $n \times n$ matrix $A$ are located in the complex plane in the union of the circles

$$|\lambda - a_{ii}| \le \sum_{j=1,\, j \ne i}^n |a_{ij}|, \qquad \lambda \in \mathbb{C}.$$

3 Google Matrix

3.1 Model for the random surfer

Let $W = (V, E)$ denote the web graph, with $V$ the set of $n$ web pages and $E$ the set of directed edges between the pages. Let $H$ be the matrix representation of $W$, which means $H_{ij} = 1$ if there is an outlink from page $i$ to page $j$, with $1 \le i, j \le n$. Let the row sums be denoted by $r_i = \sum_{j=1}^n H_{ij}$. If $r_i = 0$, page $i$ has no outlinks to other pages; such a page is called a dangling node, for example an image file or a word file [2]. The usual way of treating a dangling node is to link it to all other pages with equal probability. Now we can define the matrix $S$ as follows:

$$S_{ij} = \begin{cases} H_{ij}/r_i & \text{if } r_i \ne 0 \\ 1/n & \text{if } r_i = 0 \end{cases}$$

$S$ is the so-called web hyperlink matrix. Note that $S$ is a row-stochastic matrix and $S^T$ is a column-stochastic matrix: the rows of $S$ sum up to 1 and the columns of $S^T$ sum up to 1. In other words, $S$ is the transition matrix if the surfer only follows outlinks.

To give a more realistic model of the random surfer, we have to take into account that a surfer can also go to other pages by entering a URL in the address bar. This is called teleportation. To model this, we introduce the teleportation matrix $E$, defined by $E = \mathbf{v}\,\mathbf{e}^T$, where $\mathbf{v}$ is an $n \times 1$ probability vector called the personalization vector and $\mathbf{e}$ is the $n \times 1$ vector consisting of only ones. The vector $\mathbf{v}$ gives the probability that a random surfer jumps to a certain page. We introduce the Google matrix to model the random surfer:

$$G = \alpha S^T + (1 - \alpha) E$$

where $\alpha$ is an amplification factor with $0 < \alpha < 1$: the surfer follows an outlink of the current page with probability $\alpha$ or teleports with probability $(1 - \alpha)$. Google started out using $\alpha = 0.85$ and the uniform vector for $\mathbf{v}$ [2, 6]. That means that if the random surfer teleports, it teleports to any page with equal probability; thus $\mathbf{v} = \frac{1}{n}(1, \ldots, 1)^T$.

Remark that $G$ is a column-stochastic matrix, since it is a convex combination of two transition matrices [4]: every element of $G$ is between 0 and 1 (i.e. $0 \le G_{ij} \le 1$) and each column sums up to 1. Furthermore, $V$ is finite and $G$ is irreducible, since it is possible to go from any page to any other page by entering a URL.

Theorem 3. Let $G$ be an $n \times n$ Google matrix. Then $\lambda_1 = 1$ is the dominant eigenvalue of $G$. To $\lambda_1 = 1$ corresponds the PageRank vector.

Proof. Let $\mathbf{e}$ be the $n \times 1$ vector consisting of only ones. We know that the columns of $G$ sum up to 1, so $G^T \mathbf{e} = \mathbf{e}$. Hence 1 is an eigenvalue of $G$. We also apply the Gershgorin circle theorem to bound the moduli of the eigenvalues:

$$\rho(G) \le \max_j \sum_i G_{ij} = 1.$$

Thus $\lambda_1 = 1$ is an eigenvalue of maximal modulus of $G$. Since $G$ is positive and irreducible, it follows from Perron-Frobenius that $\lambda_1 = 1$ is the dominant eigenvalue of $G$.

3.2 Computing the PageRank vector

3.2.1 Power Method

The random surfer can take infinitely many steps without getting tired. The surfer starts with a certain initial distribution $\pi_0$, and to know where the surfer is after $i$ steps we do the following

calculation: $\pi_i = G^i \pi_0$. We assume $G$ is diagonalizable, so $G = P \Lambda P^{-1}$. Remark that the dominant eigenvalue of $G$ equals 1 and that the other eigenvalues are strictly smaller than 1 in modulus. So after infinitely many steps the distribution converges to the eigenvector corresponding to the dominant eigenvalue: the PageRank vector. This method is also called the power method [1, 2, 4]. The method goes as follows. We begin with a starting vector $\pi_0$, for example $\pi_0 = \mathbf{v}$, and compute the steps $\pi_{i+1} = G\pi_i$ for $i = 0, 1, 2, \ldots$ until some convergence condition is satisfied. This is the most common way to solve large problems [3]. Remark that for every $i$, $\pi_i$ is a probability vector. If we rewrite the iteration we get:

$$\pi_{i+1} = G\pi_i \qquad (1)$$
$$= \alpha S^T \pi_i + (1-\alpha)\mathbf{v}\,\mathbf{e}^T \pi_i \qquad (2)$$
$$= \alpha S^T \pi_i + (1-\alpha)\mathbf{v} \qquad (3)$$

So to perform a power iteration, only a matrix-vector multiplication with the matrix $S^T$ is needed, followed by the addition of the vector $(1-\alpha)\mathbf{v}$. The average number of outlinks of a page is 52 [2], which means that $S$ is very sparse, i.e. $S$ has a lot of zero elements. So although $S$ is very large, the power method computes easily and fast.

3.2.2 Linear system approach

For smaller test problems we compute the PageRank vector by rewriting the problem as a linear system. We know that the PageRank vector corresponds to the dominant eigenvalue $\lambda_1 = 1$, so it has to satisfy the condition $G\pi = \pi$. Remark that $\mathbf{e}^T \pi = 1$, so this can be rewritten as:

$$0 = \pi - G\pi \qquad (4)$$
$$= \pi - \alpha S^T \pi - (1-\alpha)\mathbf{v}\,\mathbf{e}^T \pi \qquad (5)$$
$$= \pi - \alpha S^T \pi - (1-\alpha)\mathbf{v} \qquad (6)$$
$$= (I - \alpha S^T)\pi - (1-\alpha)\mathbf{v} \qquad (7)$$

To find the PageRank vector, we just have to solve the equation

$$(I - \alpha S^T)\pi = (1-\alpha)\mathbf{v} \qquad (8)$$

This is a system of $n$ linear equations. Note that calculating matrix-vector multiplications is much faster than solving a linear system, which is why the power method is preferred for large problems.
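To make both computations concrete, here is a minimal MATLAB sketch in the spirit of the routines in Appendix B. It is a sketch under the following assumptions: St stands for the column-stochastic matrix $S^T$, v for the personalization vector, and the function name pagerank_power is ours, not part of the thesis code.

function [ x, it ] = pagerank_power( St, v, alpha, tol )
% Power method of equation (3): x <- alpha*St*x + (1-alpha)*v
x = v;                               % start from the personalization vector
xold = zeros(size(v));
it = 0;
while norm(x - xold, 1) > tol
    xold = x;
    x = alpha*(St*x) + (1-alpha)*v;  % one sparse matrix-vector product per step
    it = it + 1;
end
end

For the linear system approach of equation (8), the same quantities give the one-liner x = (speye(length(v)) - alpha*St) \ ((1-alpha)*v), which is only practical for small test problems.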

3.3 Example

Look at the example given by Figure 1: a small web that consists of only 7 pages. Let us calculate the PageRank vector for this test problem. First we set up the web hyperlink matrix $S$ before computing the Google matrix. Thus: $S = [\ ]$

Figure 1: Test problem with 7 nodes

Notice that pages 6 and 7 are dangling nodes; this can be seen in $S^T$, where columns 6 and 7 are uniformly distributed. To calculate the PageRank vector, let $\alpha = 0.85$ and let $\mathbf{v}$ be the uniform vector. It follows that $G$ is given by $G = \alpha S^T + (1-\alpha)\mathbf{v}\,\mathbf{e}^T = [\ ]$. The PageRank vector calculated by the power method, with convergence requirement $\|\pi_{i+1} - \pi_i\| \le 10^{-4}$, is $\pi = [\ ]$. This took 32 iterations. The vector is the same as the PageRank vector calculated with the linear system approach. The algorithm returns the following page order: Page 2, Page 1, Page 3, Page 4, Page 7, Page 5, Page 6.
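The link structure of Figure 1 is not recoverable from this transcription, so the following sketch only illustrates the mechanics on a made-up 7-page edge list. It builds $S^T$ with WebH.m (Appendix B.0.1), runs the power method with IT.m (Appendix B.1.2), and reads off the ranking. Both the edge list and our reading of WebH's input convention (first row the link targets, second row the link sources) are assumptions.

A = [2 3 1 4 6 7;                      % assumed link targets (not the real Figure 1)
     1 2 3 3 4 5];                     % assumed link sources; pages 6, 7 get no outlinks
St = WebH(A);                          % column-stochastic web hyperlink matrix S^T
n = size(St, 1);
alpha = 0.85;
v = ones(n, 1)/n;                      % uniform personalization vector
G = alpha*St + (1-alpha)*v*ones(1, n); % Google matrix of Section 3.1
[x, it] = IT(St, v, alpha);            % PageRank vector by the power method
[~, order] = sort(x, 'descend');       % page order, highest score first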

4 Effect of link spamming

The goal of this research is to suppress the pages that are suspected of link spamming. In this section we explain what link spamming is and how to find the pages suspected of it.

4.1 Link spamming

Link spamming, or a link farm, aims to fool the algorithm by adding and removing links between pages in order to increase the PageRank score of certain pages. For example, suppose a company has a page that rarely gets hits, so its PageRank score is very low. To increase the score, the company could hire people to add links from other pages. The spammer makes sure the links are hidden in multiple sites, fooling the algorithm into treating the page as important, so that it gets a higher score. For this reason, Google uses other parameters and algorithms to model the random surfer, which we do not know of [2].

4.2 Suspected pages

We can find the suspected pages via the eigenvector corresponding to the second eigenvalue; see [3, 8]. One of the most effective methods of link spamming is creating irreducible subsets. This is so effective because once a surfer gets there, it cannot leave by following outlinks, and the probability that it leaves the irreducible subset by entering a URL is small: if $\alpha = 0.85$, the probability that the surfer teleports is only $1 - \alpha = 0.15$.

One way to find the irreducible subsets is to rewrite $S$ in canonical form by renumbering the nodes. Recall the test problem given in Figure 1.

Figure 2: After renumbering

The web hyperlink matrix $S$ then has the following form:

$$S = \begin{bmatrix} S_{1,1} & S_{1,2} \\ S_{2,1} & S_{2,2} \end{bmatrix} = [\ ]$$

Figure 2 shows that there is one irreducible subset, $S_{2,2}$; this can also be seen in the matrix $S$. If the surfer goes to page 6 or 7, it stays there until it decides to teleport.
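Definition 5 gives a direct numerical test for such a closed subset: the rows of the row-stochastic matrix $S$ belonging to $C$ must have all their probability mass inside $C$. A minimal sketch, assuming St is the column-stochastic matrix produced by WebH.m and $C = \{6, 7\}$ is the candidate subset:

S = St';                                     % row-stochastic S from the column-stochastic S^T
C = [6 7];                                   % candidate closed subset from Figure 2
outside = setdiff(1:size(S,1), C);           % all pages not in C
isClosed = all(sum(S(C, outside), 2) == 0)   % true exactly when Definition 5 holds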

4.3 Research question

This report aims to analyze the personalization vector. Can we use it to combat link spamming? If so, how can we optimize the personalization vector? In the following sections, we look at the role and impact of the personalization vector. Then we discuss how to find and verify the optimal personalization vector.

5 Modifying the PageRank Model with v

5.1 Personalization vector

One of the first modifications made to the model is changing the teleportation matrix $E = \mathbf{v}\,\mathbf{e}^T$: one can use another probability vector instead of the uniform vector for $\mathbf{v}$. Each time the surfer teleports, it teleports to a page with the probability distribution given in $\mathbf{v}$. There are a few reasons why Google named it the personalization vector. One of the main reasons is that Google wanted different vectors to model different types of surfers more accurately [6, 3]: the vector brings a surfer to the pages he is likely to teleport to. For example, a sporty person gets more sports-related pages than someone who has no interest in sports.

5.2 Influence of the personalization vector

We try out different probability distributions for $\mathbf{v}$ and calculate the corresponding PageRank vectors. We compare these to the PageRank vector calculated with the uniform vector, to see how much the PageRank vector depends on the difference in personalization vector. To bound the perturbation, we look at the condition number $\kappa_1(I - \alpha S^T) = \|I - \alpha S^T\|_1 \|(I - \alpha S^T)^{-1}\|_1$.

Theorem 4. Let $S$ be an $n \times n$ row-stochastic matrix whose diagonal elements are $S_{ii} = 0$. Let $\alpha$ be a real number with $0 \le \alpha < 1$. Let $E = \mathbf{v}\,\mathbf{e}^T$ be the rank-one teleportation matrix, where $\mathbf{e}$ is the $n \times 1$ vector whose elements are all $e_i = 1$ and $\mathbf{v}$ is an $n$-vector that represents a probability distribution. Define the matrix $G = \alpha S^T + (1-\alpha)E$. The problem $G\pi = \pi$ has condition number

$$\kappa_1(I - \alpha S^T) = \|I - \alpha S^T\|_1 \|(I - \alpha S^T)^{-1}\|_1 = \frac{1+\alpha}{1-\alpha}.$$

Proof. We determine the norm of the matrix and of its inverse separately.

1. $\|I - \alpha S^T\|_1 = 1 + \alpha$. Since the diagonal elements of $\alpha S^T$ are zero, the absolute column sums of $I$ and $\alpha S^T$ simply add:

$$\|I - \alpha S^T\|_1 = \|I\|_1 + \alpha\|S^T\|_1 = 1 + \alpha. \qquad (9)$$

2. $\|(I - \alpha S^T)^{-1}\|_1 = \frac{1}{1-\alpha}$. Let $\mathbf{e}_i$ be the unit vector with a 1 in position $i$, and let $\pi(\mathbf{e}_i)$ be the PageRank vector belonging to the personalization vector $\mathbf{v} = \mathbf{e}_i$; by equation 8, $\pi(\mathbf{e}_i) = (1-\alpha)(I - \alpha S^T)^{-1}\mathbf{e}_i$ and $\|\pi(\mathbf{e}_i)\|_1 = 1$. Taking norms we get:

$$\|\pi(\mathbf{e}_i)\|_1 = (1-\alpha)\,\|(I - \alpha S^T)^{-1}\mathbf{e}_i\|_1 \qquad (10)$$
$$1 = (1-\alpha)\,\|(I - \alpha S^T)^{-1}\mathbf{e}_i\|_1 \qquad (11)$$
$$\|(I - \alpha S^T)^{-1}\mathbf{e}_i\|_1 = \frac{1}{1-\alpha} \qquad (12)$$

This holds for each column $i$, and since $(I - \alpha S^T)^{-1}$ is nonnegative, $\|(I - \alpha S^T)^{-1}\|_1 = \frac{1}{1-\alpha}$.

The condition number is therefore

$$\kappa_1(I - \alpha S^T) = \|I - \alpha S^T\|_1 \|(I - \alpha S^T)^{-1}\|_1 = \frac{1+\alpha}{1-\alpha}.$$

For the complete proof we refer to Kamvar and Haveliwala [11]. For $\alpha = 0.85$ the condition number is $\kappa_1 = 1.85/0.15 \approx 12.3$.

Since $\mathbf{v}$ is a probability vector, we get

$$\|\Delta\mathbf{v}\|_1 = \|(\mathbf{v} + \Delta\mathbf{v}) - \mathbf{v}\|_1 \le \|\mathbf{v} + \Delta\mathbf{v}\|_1 + \|\mathbf{v}\|_1 = 2,$$

and likewise for the difference in the PageRank vector, since $\pi$ is also a probability vector:

$$\|\Delta\pi\|_1 = \|(\pi + \Delta\pi) - \pi\|_1 \le \|\pi + \Delta\pi\|_1 + \|\pi\|_1 = 2.$$

We look at the linear equation to get a better upper bound. Consider equation 8. Let $\pi$ be the PageRank vector corresponding to the personalization vector $\mathbf{v}$, so $\pi = (1-\alpha)(I - \alpha S^T)^{-1}\mathbf{v}$, and let $\pi + \Delta\pi$ be the perturbed PageRank vector corresponding to the perturbed personalization vector $\mathbf{v} + \Delta\mathbf{v}$:

$$\pi + \Delta\pi = (1-\alpha)(I - \alpha S^T)^{-1}(\mathbf{v} + \Delta\mathbf{v})$$

Note that $\|\pi + \Delta\pi\|_1 = \|\mathbf{v} + \Delta\mathbf{v}\|_1 = 1$ and $\|\pi\|_1 = \|\mathbf{v}\|_1 = 1$. From equation 8 we get:

$$\|\Delta\pi\|_1 \le (1-\alpha)\,\|(I - \alpha S^T)^{-1}\|_1\,\|\Delta\mathbf{v}\|_1 \qquad (13)$$
$$\frac{\|\Delta\pi\|_1}{\|\pi\|_1} \le (1-\alpha)\,\|(I - \alpha S^T)^{-1}\|_1\,\frac{\|\Delta\mathbf{v}\|_1}{\|\mathbf{v}\|_1} \qquad (14)$$
$$\frac{\|\Delta\pi\|_1}{\|\pi\|_1} \le \frac{\|\Delta\mathbf{v}\|_1}{\|\mathbf{v}\|_1} \qquad (15)$$
$$\|\Delta\pi\|_1 \le \|\Delta\mathbf{v}\|_1 \qquad (16)$$

Notice that step (15) follows from part 2 of the proof of Theorem 4, since $(1-\alpha)\|(I - \alpha S^T)^{-1}\|_1 = 1$. We have found a better bound for the overall disturbance in $\pi$; it will be tested in Section 5.2.1. We also want to know how much the difference in personalization vector affects the largest value of the PageRank vector. For this we use the inequality $\|\Delta\pi\|_\infty \le \|\Delta\pi\|_1$, so $\|\Delta\pi\|_\infty$ is bounded by the same bound:

$$\|\Delta\pi\|_\infty \le \|\Delta\pi\|_1 \le \|\Delta\mathbf{v}\|_1$$

5.2.1 Validation

To test the claims stated above, we use the test problem defined in Figure 1. We test the two upper bounds on the 1-norm of the PageRank difference $\|\Delta\pi\|_1$. Let $\mathbf{v}_u$ be the uniform vector with PageRank vector $\pi_u$. We generate 1000 random personalization vectors $\{\mathbf{v}_i\}$, $1 \le i \le 1000$, and define the differences $\Delta\mathbf{v}_i = \mathbf{v}_u - \mathbf{v}_i$. For each personalization vector we compute the PageRank vector $\pi_i$ and define the difference $\Delta\pi_i = \pi_u - \pi_i$. Then we plot $\|\Delta\pi_i\|_1$ against $\|\Delta\mathbf{v}_i\|_1$ and against $\kappa_1\|\Delta\mathbf{v}_i\|_1$; this can be seen in Figure 3.

Figure 3: Upper bounds for $\|\Delta\pi\|_1$: (a) $\|\Delta\pi\|_1 \le \kappa_1\|\Delta\mathbf{v}\|_1$, (b) $\|\Delta\pi\|_1 \le \|\Delta\mathbf{v}\|_1$

As can be seen, both bounds hold. The second upper bound is sharper than the bound involving the condition number: Figure 3a shows that the gap between $\|\Delta\pi\|_1$ and $\kappa_1\|\Delta\mathbf{v}\|_1$ is quite large. Now we test the bound on the maximum value of the PageRank difference, i.e. $\|\Delta\pi\|_\infty \le \|\Delta\mathbf{v}\|_1$. We plot $\|\Delta\pi_i\|_\infty$ against $\|\Delta\mathbf{v}_i\|_1$; the result can be seen in Figure 4.

Figure 4: $\|\Delta\pi\|_\infty \le \|\Delta\mathbf{v}\|_1$

Looking at these results, we can conclude the following. If we want a small difference in the PageRank vector, the obvious choice is a probability vector that does not differ much from the uniform personalization vector. However, a personalization vector that differs a lot from the uniform vector does not necessarily lead to a large difference in the PageRank vector.
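The validation experiment is easy to reproduce. A sketch, under the assumption that the random personalization vectors are drawn by normalizing uniform random numbers (the thesis does not state how its 1000 vectors were generated) and that St is the column-stochastic $S^T$ of the test problem:

n = size(St, 1);
alpha = 0.85;
vu = ones(n, 1)/n;                              % uniform personalization vector
piu = (speye(n) - alpha*St) \ ((1-alpha)*vu);   % reference PageRank vector, eq. (8)
m = 1000;
dv1 = zeros(m,1); dpi1 = zeros(m,1); dpiinf = zeros(m,1);
for k = 1:m
    v = rand(n, 1); v = v/sum(v);               % random personalization vector
    p = (speye(n) - alpha*St) \ ((1-alpha)*v);
    dv1(k)    = norm(vu - v, 1);
    dpi1(k)   = norm(piu - p, 1);
    dpiinf(k) = norm(piu - p, Inf);
end
kappa1 = (1 + alpha)/(1 - alpha);               % condition number of Theorem 4
assert(all(dpi1 <= kappa1*dv1 + 1e-12))         % bound of Figure 3a
assert(all(dpi1 <= dv1 + 1e-12))                % bound of Figure 3b
assert(all(dpiinf <= dv1 + 1e-12))              % bound of Figure 4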

6 Finding an optimal v

Let $C$ be the set of pages that should be suppressed. Let $\mathbf{v}$ be a personalization vector with PageRank vector $\pi$. Let $\beta$ denote the sum of the PageRank scores of the suppressed pages, i.e. $\beta = \sum_{i \in C} \pi_i$. We call the vector $\mathbf{v}$ the optimal personalization vector if it solves

$$\beta = \min_{\mathbf{v}} \sum_{i \in C} \pi_i \quad \text{subject to } \mathbf{v} \text{ being a personalization vector.}$$

In this chapter, we discuss two main methods, and some variations of the last method, to find the optimal personalization vector $\mathbf{v}$.

6.1 Method 1

We do not want the random surfer to teleport to one of the suspected pages, so we generate a set of $m$ personalization vectors such that $v_i = 0$ for every $i \in C$. Let $\mathbf{e}_j$ be the unit vector of length $n$ with a 1 in the $j$-th position. For every $1 \le k \le m$ we define the personalization vector $\mathbf{v}_k = \sum_{j \notin C} \gamma_{k,j}\mathbf{e}_j$ with $\sum_{j \notin C} \gamma_{k,j} = 1$ and $0 \le \gamma_{k,j} \le 1$. We calculate the corresponding PageRank vector and compute $\beta_k = \sum_{i \in C} \pi_{k,i}$. The method returns $\mathbf{v}_t$ for a fixed $t$ such that $\beta_t \le \beta_k$ for every $1 \le k \le m$.

6.1.1 Example with 7 nodes

Let us reformulate the test problem. We alter the web structure by adding in- and outlinks to increase the PageRank of page 4. Also, we put the matrix $S$ in canonical form. This gives the directed graph shown in Figure 5.

Figure 5: After adding and removing links and renumbering

After renumbering, the web hyperlink matrix looks as follows: $S = [\ ]$

As can be seen in the figure, the test problem has two irreducible closed subsets. If the surfer gets into a closed subset, it can never leave the subset by following an outlink. Let $\alpha = 0.85$ and let $\mathbf{v}$ be the uniform vector. The calculated PageRank vector is $\pi_u = [\ ]$.

Suppose, for example, that we want to suppress page 4, since the PageRank score of page 4 is the highest. For the first method we take $m = 1000$. We call $\pi_{m1}$ the PageRank vector and $\mathbf{v}_{m1}$ the personalization vector found with this method. As result we find:

$$\mathbf{v}_{m1} = [\ ], \qquad \pi_{m1} = [\ ]$$

For this method, we have to generate $m$ personalization vectors and calculate $m$ PageRank vectors. Google takes days to calculate one PageRank vector, so finding a personalization vector with this method is space and time consuming. We introduce another method in the next paragraph.

6.2 Method 2: Linear Programming

Instead of looking at different personalization vectors, one can impose requirements on the PageRank scores directly. Recall that we defined $\beta = \min_{\mathbf{v}} \sum_{i \in C} \pi_i$ over personalization vectors $\mathbf{v}$. If we could simply set $\pi_i = 0$ for $i \in C$, we would get the lowest value for $\beta$, but this does not always meet the condition of $\mathbf{v}$ being a probability vector, i.e. $\mathbf{v} \ge 0$ and $\|\mathbf{v}\|_1 = 1$. We can formulate the problem as a linear optimization problem, starting from the linear system $(I - \alpha S^T)\pi = (1-\alpha)\mathbf{v}$ of equation 8:

$$\min_\pi\; \mathbf{c}^T\pi \quad \text{subject to} \quad \frac{(I - \alpha S^T)}{1-\alpha}\,\pi \ge 0, \quad \sum_i \pi_i = 1, \quad 0 \le \pi_i \le 1 \text{ for } 1 \le i \le n,$$

where $\mathbf{c}$ is the $n \times 1$ vector with $c_i = 1$ if $i \in C$ and $c_i = 0$ otherwise. The first constraint expresses $\mathbf{v} \ge 0$; the other constraints ensure that $\pi$ is a probability vector: its coefficients sum to 1 and each lies between 0 and 1.

6.2.1 Example

Given the problem defined in Example 6.1.1, we use method 2 to calculate the PageRank vector and the corresponding personalization vector. We call $\pi_{m2}$ the PageRank vector and $\mathbf{v}_{m2}$ the personalization vector found with this method. We want to suppress page 4, so we have to solve the problem:

$$\min_\pi\; \pi_4 \quad \text{subject to} \quad \frac{(I - \alpha S^T)}{1-\alpha}\,\pi \ge 0, \quad \sum_i \pi_i = 1, \quad 0 \le \pi_i \le 1 \text{ for } 1 \le i \le 7.$$

We get the result:

$$\pi_{m2} = [\ ], \qquad \mathbf{v}_{m2} = [\ ]$$

Note that the solution is not unique: if we choose $\mathbf{v} = [\ ]$, the PageRank score of page 4 is still 0. In other words, the optimal personalization vector is not unique.
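Both methods correspond to routines in Appendix B. A usage sketch for the 7-node problem, assuming St is the column-stochastic matrix $S^T$ of the renumbered test problem; the argument values are ours:

alpha = 0.85;
n = size(St, 1);
C = 4;                                       % page(s) to suppress

% Method 1 (Appendix B.2.1): 1000 random personalization vectors with v(C) = 0
[x_m1, v_m1] = Random(St, C, 1000);

% Method 2 (Appendix B.2.2): linear programming; c marks the pages to minimize
c = zeros(1, n);
c(C) = 1;
[x_m2, v_m2] = OPTx(St, alpha, c, zeros(n,1), ones(n,1));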

6.3 Irreducible subset

The result obtained with method 2 does lower the PageRank score of page 4, but it is not ideal. It is quite logical for the random surfer to jump to an irreducible closed subset: a surfer cannot reach page 4 if he gets stuck and stays stuck. If we decrease the upper bound of the linear problem, we force the surfer to teleport to other pages as well.

6.4 Method 2.1 and Method 2.2

We change method 2 in two different ways to get a better result. We call the first modification method 2.1. It prevents the random surfer from getting stuck in an irreducible subset by lowering the upper bounds on the $\pi_i$. A downside is that we do not know by how much we should lower the bounds: if we lower them too much, there is a chance we will not find a solution for $\mathbf{v}$. The other modification, method 2.2, is to find all the pages that are part of irreducible subsets and to minimize the PageRank scores of those pages together with the pages we originally wanted to suppress. To find the pages of the irreducible subsets we apply Tarjan's SCC algorithm; MATLAB code for it can be found in [13].

6.5 Example

To illustrate the methods described above, we use the example defined in Example 6.1.1. Define the Google matrix of this problem as $G_7$. For illustration we lower the upper bound to 0.2 instead of 1. We call $\pi_{m21}$ the PageRank vector and $\mathbf{v}_{m21}$ the personalization vector found with method 2.1. The result is:

$$\pi_{m21} = [\ ], \qquad \mathbf{v}_{m21} = [\ ]$$

Notice that the values of the PageRank scores are more realistic: the surfer can teleport to more pages than before, and the PageRank score of page 4 is less than the value of $\pi_{u,4}$. So in the end, we did lower the PageRank score. For the other method we first look for the pages of the irreducible subsets. As can be seen in the matrix $S$, those pages are $I = \{4, 5, 6, 7\}$. We call $\pi_{m22}$ the PageRank vector and $\mathbf{v}_{m22}$ the personalization vector found with method 2.2. By minimizing those PageRank scores we get:

$$\pi_{m22} = [\ ], \qquad \mathbf{v}_{m22} = [\ ]$$

Notice that the personalization vectors from every method meet the requirement that $v_{ml,4} = 0$ for $l \in \{1, 2, 21, 22\}$.
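A corresponding usage sketch: method 2.1 is OPTx.m called with a lowered upper bound, and method 2.2 is OPTx2.m (Appendix B.2.3), which augments c with the irreducible-subset pages found by scomponents from the gaimc toolbox [13]. The bound 0.2 is the value used above; everything else is as in the previous sketch.

% Method 2.1: the same LP, but with upper bound 0.2 on every PageRank score
[x_m21, v_m21] = OPTx(St, alpha, c, zeros(n,1), 0.2*ones(n,1));

% Method 2.2: also minimize all pages inside irreducible subsets
[x_m22, v_m22] = OPTx2(St, alpha, c, zeros(n,1), ones(n,1));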

6.6 Summary of the results of the four methods

To get a better visualization of the results, we order the pages based on their PageRank score. Recall the following:

$\mathbf{v}_u$ is the uniform personalization vector, with PageRank vector $\pi_u$.
$\mathbf{v}_{m1}$ is the personalization vector, with PageRank vector $\pi_{m1}$, found with method 1; see Section 6.1.
$\mathbf{v}_{m2}$ is the personalization vector, with PageRank vector $\pi_{m2}$, found with method 2; see Section 6.2.
$\mathbf{v}_{m21}$ is the personalization vector, with PageRank vector $\pi_{m21}$, found with method 2.1; see Section 6.4.
$\mathbf{v}_{m22}$ is the personalization vector, with PageRank vector $\pi_{m22}$, found with method 2.2; see Section 6.4.

The result can be seen in Table 1, where the best search result is shown at the top of each column.

π_u      π_m1     π_m2     π_m21    π_m22
page 4   page 6   page 7   page 7   page 4
page 5   page 7   page 6   page 6   page 5
page 7   page 5   page 4   page 4   page 2
page 6   page 4   page 5   page 5   page 7
page 1   page 3   page 2   page 1   page 6
page 2   page 2   page 3   page 2   page 3
page 3   page 1   page 1   page 3   page 1

Table 1: End result

If we take a closer look at the table, we notice that the position of page 4 has dropped for the first three methods. Judging by these results alone, method 2 combined with minimizing the irreducible subsets (method 2.2) does not work so well: it did not decrease the position of page 4, and its order also differs from that of the first column.

7 Numerical results

In this section we examine the numerical results of the algorithms used in the previous sections. We check whether the methods work and whether they lower the PageRank score of the page(s). Since most methods work for small test problems, we want to know how well they work for test problems of other sizes. In Section 7.1 we test the methods on two matrices of different size, $G_{500}$ and $G_{9914}$: $G_{500}$ is a 500 by 500 matrix proposed by Moler [1] and $G_{9914}$ is a 9914 by 9914 matrix proposed by Gleich [12]. After that, we show the computation time of the methods. We show the link structure of the webs using the MATLAB command spy on the two test problems; see Figure 6.

Figure 6: Spy plots: (a) $G_{500}$, (b) $G_{9914}$

7.1 G 500

We only show the test results of $G_{500}$; the computation time of every method can be found in Section 7.2. First, we look at the order of the pages corresponding to the uniform vector; see Table 2.

Table 2: PageRank scores of $G_{500}$ under the uniform vector (columns: page, PageRank score, URL)

We tried method 2 to optimize the result; the result is given in Table 3. Remark that page 1 is suppressed, but the top 7 pages differ from the order returned by the uniform vector. Moreover, the PageRank scores of the pages are not very realistic: the score of page 132 went up by a factor 517.

Table 3: PageRank scores obtained with method 2 (columns: page, PageRank score, URL)

To get a more realistic PageRank vector, we also minimize the irreducible subsets. Applying Tarjan's SCC algorithm gives $I = \{132, 161\}$. Note that these are exactly the pages that come out on top after applying method 2. The result after suppressing the pages $\{1, 132, 161\}$ can be seen in Table 4.

Table 4: PageRank scores obtained with method 2.2 (columns: page, PageRank score, URL)

As a result, the top pages are now pages closer to page 500 than to page 1. The reason is that the pages further away from page 1 have fewer connections than the pages with a small index, which is what we would expect. For a better overview we plot the PageRank scores; see Figure 7.

Figure 7: PageRanks of $G_{500}$: (a) original, (b) PageRank after using method 2, (c) PageRank after using method 2.2

Notice that the PageRank scores are more realistic when the pages in irreducible subsets are suppressed as well. With both methods, the PageRank score of page 1 has dropped, which is the result we aimed for. The results for $\beta$ can be seen in Table 5, where the factor is calculated as $\beta_{mj}/\beta_u$ for $j \in \{1, 2, 21, 22\}$.

We also denote by $C_i$ the collection of pages whose PageRank score we want to lower:

$$C_1 = \{4\}, \quad C_2 = \{1\}, \quad C_3 = \{252\}, \quad C_4 = \{252, 351, 523, 123, 2345, 1553, 9862\}$$

The result can be seen in Table 5.

Set    Matrix    β_u    β_m1   factor   β_m2   factor   β_m21   factor   β_m22   factor
C_1    G_7       [ ]    [ ]    [ ]      [ ]    [ ]      [ ]     [ ]      [ ]     [ ]
C_2    G_500     [ ]    [ ]    [ ]      [ ]    [ ]      [ ]     [ ]      [ ]     [ ]
C_3    G_9914    [ ]    [ ]    [ ]      [ ]    [ ]      [ ]     [ ]      [ ]     [ ]
C_4    G_9914    [ ]    [ ]    [ ]      [ ]    [ ]      [ ]     [ ]      [ ]     [ ]

Table 5: β, the sum of the PageRank scores of the suppressed pages, for each method

Remark that for small test problems the methods do not always lower the PageRank score; for larger test problems, however, they do. The rest of the numerical results for $G_{500}$ and $G_{9914}$ can be found in Appendix A.

7.2 Computation Time

To time the results we make use of the MATLAB commands tic and toc. The program used was MATLAB and the computer used to obtain the results was a MacBook Pro with a 2.5 GHz Intel Core i5 and 16 GB of 1600 MHz DDR3 memory. The MATLAB codes for the methods are:

Method 1: Random.m
Method 2: OPTx.m
Method 2.2: OPTx2.m

Remark that method 2 and method 2.1 are the same algorithm. For method 1 we generate 1000 personalization vectors, and for method 2.1 we choose 0.2 as the upper bound. We also explain the abbreviations:

Req 1: requirement that the calculated personalization vector is a probability vector, i.e. $\mathbf{v}$ sums up to 1 and $\mathbf{v} \ge 0$.
Req 2: requirement that the calculated PageRank vector is a probability vector, i.e. $\pi$ sums up to 1.

The results can be seen in Table 6. Remark that the first method does not always return a probability vector; this can be seen for $G_{9914}$. That is because the method used to calculate the PageRank vector is not very accurate: the coefficients of the vector are small, which makes them sensitive to rounding errors. This can be solved by rescaling the PageRank vector such that $\|\pi\|_1 = 1$.
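Such a rescaling is a one-line fix after the power method returns; a sketch, with x the vector returned by pagerankpow.m:

x = x / norm(x, 1);   % rescale so that the PageRank vector sums to 1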

Algorithm    G_7: Time / Req 1 / Req 2    G_500: Time / Req 1 / Req 2    G_9914: Time / Req 1 / Req 2
Random.m     [ ] / YES / YES              [ ] / YES / YES                1677 / NO / YES
OPTx.m       [ ] / YES / YES              [ ] / YES / YES                [ ] / YES / YES
OPTx2.m      [ ] / YES / YES              [ ] / YES / YES                [ ] / YES / YES

Table 6: Numerical results

The question was whether we can optimize the personalization vector to combat link spamming. It is possible to suppress the suspicious pages. The disadvantage of the discussed methods is that nothing is left of the original order: the results of the methods are different and unpredictable. Using personalization vectors to combat link spamming with these methods is therefore not the optimal way.

8 Conclusion

In the introduction, we introduced the PageRank algorithm Google uses to return its search results. People who want their page to have a higher PageRank score can fool the algorithm by link spamming. In this report, we discussed link spamming and investigated whether we can combat it by modifying the model, more specifically by changing the personalization vector.

First, we wanted to know how large the impact of the personalization vector on the PageRank vector is. We have seen that the difference in the PageRank vector is bounded by $\|\Delta\pi\|_\infty \le \|\Delta\pi\|_1 \le \|\Delta\mathbf{v}\|_1$. Looking at the plots in Figure 3 and Figure 4, we noticed that a large difference in personalization vectors does not necessarily lead to a large difference in the PageRank vector.

An optimal personalization vector should meet the requirement that $v_i = 0$ if page $i$ is suspected of link spamming. There were two methods to find an optimal $\mathbf{v}$. One is generating a number of personalization vectors and choosing the one that is optimal; the downside of this method is that it takes too much time and space for large matrices. The other method uses linear programming to find the PageRank vector of the optimal personalization vector. The result was not always realistic, so we modified the method to get better results. It is desirable to suppress the effect of link spamming with the personalization vector, but the results are not predictable. It would be preferable to suppress the pages and still maintain the original order of the remaining search results.

In this report, we focused on the personalization vector to combat link spamming. The PageRank model has other elements we can modify. One of them is the dangling node fix, which links a dangling page to all other pages. The dangling node fix could be adjusted to link only to certain pages, i.e. those that should not be suppressed. That way the PageRank scores of the normal pages are increased in comparison to the pages that we want to suppress. This approach might be better at combating link spamming, and it is a recommendation for future research. There are also other ways to combat link spamming with the personalization vector, such as those of Jeh and Widom [9] and Kamvar et al. [10]; these methods have not been reviewed or compared to the methods in this report.

A End results

A.1 G 500

The page we wanted to suppress was page 1. The first plot shows the PageRank vector calculated with the uniform personalization vector.

Figure 8: Original

The next plots show the PageRank vectors calculated with the personalization vectors found with the different methods, including the modifications of the second method.

Figure 9: PageRanks of $G_{500}$: (a) after using method 2, (b) after using method 2.2, (c) after using method 2.1, (d) after using method 2.2

A.2 G 9914

The page we wanted to suppress was page 252. The first plot shows the PageRank vector calculated with the uniform personalization vector.

Figure 10: Original

The next plots show the PageRank vectors calculated with the personalization vectors found with the different methods.

Figure 11: PageRanks of $G_{9914}$: (a) after using method 2, (b) after using method 2.2, (c) after using method 2.1, (d) after using method 2.2

The pages we wanted to suppress were $\{252, 351, 523, 123, 2345, 1553, 9862\}$. The next plots show the PageRank vectors calculated with the personalization vectors found with the different methods.

Figure 12: PageRanks of $G_{9914}$: (a) after using method 2, (b) after using method 2.2, (c) after using method 2.1, (d) after using method 2.2

B Matlab codes

B.0.1 WebH.m

function [ P ] = WebH( A )
% Generates the web hyperlink matrix.
% If A is a 2xn matrix with the first row the outputs
% and the second row the inputs, then it is first made into a
% connectivity matrix.

% Check to see if it is a connectivity matrix, or should be made
% into a connectivity matrix
if max(max(A)) == 1
    n = size(A, 1);
    S = A;
else
    n = max(max(A));
    S = sparse(A(1,:), A(2,:), 1, n, n);   % web hyperlink matrix
end
c = sum(S);                                % column sums, for rescaling
for i = 1:n
    if c(i) > 0
        P(:,i) = S(:,i)/c(i);
    else
        P(:,i) = S(:,i) + 1/n;             % dangling nodes
    end
end
P = sparse(P);
end

B.1 Methods to calculate the PageRank vector

B.1.1 pagerankpow.m

function [ x, cnt ] = pagerankpow( G )
% PAGERANKPOW PageRank by power method with no matrix operations.
% x = pagerankpow(G) is the PageRank of the graph G.
% [x, cnt] = pagerankpow(G) also counts the number of iterations.
% There are no matrix operations. Only the link structure
% of G is used with the power method.

% Link structure
[n, n] = size(G);
for j = 1:n
    L{j} = find(G(:,j));
    c(j) = length(L{j});
end

% Power method
p = 0.85;
delta = (1-p)/n;
x = ones(n,1)/n;
z = zeros(n,1);
cnt = 0;
while max(abs(x - z)) > 0.0001
    z = x;
    x = zeros(n,1);
    for j = 1:n
        if c(j) == 0
            x = x + z(j)/n;                % dangling node: spread mass evenly
        else
            x(L{j}) = x(L{j}) + z(j)/c(j);
        end
    end
    x = p*x + delta;
    cnt = cnt + 1;
end
end

B.1.2 IT.m

function [ x, counter ] = IT( S, v, p )
% Power method to calculate the PageRank vector
n = size(S, 1);
counter = 0;
x = 1/n * ones(n,1);
y = zeros(n,1);
while max(abs(x - y)) > 0.0001   % tolerance assumed 0.0001, as in Section 3.3 (value lost in transcription)
    y = x;
    x = p*S*x + (1-p)*v;
    counter = counter + 1;
end
end

B.2 Methods to optimize the personalization vector

B.2.1 Random.m

function [ xm1, vm1 ] = Random( K, set, l )
% Randomly generate l personalization vectors and pick the best

% combination for the personalization vector suppressing the set
p = 0.85;
set = sort(set);
n = size(K, 2);
m = size(set, 2);
xm1 = pagerankpow(K);
xm = sum(xm1(set));
vm1 = 1/n * ones(n,1);
for j = 1:l
    ran = 1/(n-m-1) * rand(n-m-1, 1);
    ran(n-m) = 1 - sum(ran);          % make the entries sum to 1
    vc = ones(n,1);
    vc(set) = 0;
    c = 1;
    for i = 1:n
        if vc(i) ~= 0
            vc(i) = ran(c);
            c = c + 1;
        end
    end
    y = IT(K, vc, p);
    if sum(y(set)) < xm               % keep the best vector found so far
        xm = sum(y(set));
        vm1 = vc;
        xm1 = y;
    end
end
end

B.2.2 OPTx.m

function [ x, v ] = OPTx( K, alpha, c, lower, upper )
% Optimize the PageRank vector.
% K            : web hyperlink matrix (column-stochastic S^T)
% c            : 1xn vector with c(i) = 1 if page i should be minimized
% alpha        : amplification factor
% lower, upper : nx1 vectors of bounds for x(i)
n = size(K, 1);
b = zeros(n,1);

p = 0.85;
I = speye(n, n);
A = -(I - alpha*sparse(K))/(1-alpha);   % sign restored: A*x <= 0 expresses v >= 0
Aeq = ones(1, n);
beq = 1;
x = linprog(c, A, b, Aeq, beq, lower, upper);
v = (I - p*K)*x/(1-p);                  % recover the personalization vector
end

B.2.3 OPTx2.m

function [ x, v ] = OPTx2( K, alpha, c, lower, upper )
% Optimize the PageRank vector, also minimizing the irreducible subsets.
% K            : web hyperlink matrix (column-stochastic S^T)
% c            : 1xn vector with c(i) = 1 if page i should be minimized
% alpha        : amplification factor
% lower, upper : nx1 vectors of bounds for x(i)
n = size(K, 1);
b = zeros(n,1);
p = 0.85;
[node, ignore] = scomponents(K);        % find irreducible subsets [13]
l = 1;
for i = 1:n
    if node(i) ~= 1
        set(l) = i;
        l = l + 1;
    end
end
c(set) = 1;                             % also minimize pages in irreducible subsets
I = speye(n, n);
A = -(I - alpha*sparse(K))/(1-alpha);   % sign restored: A*x <= 0 expresses v >= 0
Aeq = ones(1, n);
beq = 1;
x = linprog(c, A, b, Aeq, beq, lower, upper);
v = (I - p*K)*x/(1-p);
end

References

[1] Moler, Cleve B. Numerical Computing with MATLAB: Revised Reprint, Chapter 7: Google PageRank. SIAM.
[2] Wills, Rebecca S. Google's PageRank. The Mathematical Intelligencer 28.4 (2006).
[3] Sangers, Alex, and Martin B. van Gijzen. The eigenvectors corresponding to the second eigenvalue of the Google matrix and their relation to link spamming. Journal of Computational and Applied Mathematics 277 (2015).
[4] Langville, Amy N., and Carl D. Meyer. Google's PageRank and Beyond: The Science of Search Engine Rankings, Chapter 6. Princeton University Press.
[5] Meyer, Carl D. Matrix Analysis and Applied Linear Algebra, Chapter 8. Vol. 2. SIAM.
[6] Langville, Amy N., and Carl D. Meyer. Deeper inside PageRank, Chapter 6: Tinkering with the Basic PageRank Model. Internet Mathematics 1.3 (2004).
[7] Ipsen, Ilse C.F., and Rebecca S. Wills. Mathematical properties and analysis of Google's PageRank. Bol. Soc. Esp. Mat. Apl. 34 (2006).
[8] Haveliwala, Taher, and Sepandar Kamvar. The second eigenvalue of the Google matrix. Stanford University Technical Report (2003).
[9] Jeh, Glen, and Jennifer Widom. Scaling personalized web search. Proceedings of the 12th International Conference on World Wide Web. ACM, 2003.
[10] Kamvar, Sepandar, et al. Exploiting the block structure of the web for computing PageRank. Stanford University Technical Report (2003).
[11] Kamvar, Sepandar, and Taher Haveliwala. The condition number of the PageRank problem. Technical Report, Stanford University.
[12] Gleich, David F. Stanford CS web. matrices/gleich/wb-cs-stanford.html, 2001.
[13] Gleich, David F. gaimc: Graph Algorithms In Matlab Code. matlabcentral/fileexchange/24134-gaimc---graph-algorithms-in-matlab-code/content/gaimc/scomponents.m, 2009.


More information

Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE

Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 55, NO. 9, SEPTEMBER 2010 1987 Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE Abstract

More information

How works. or How linear algebra powers the search engine. M. Ram Murty, FRSC Queen s Research Chair Queen s University

How works. or How linear algebra powers the search engine. M. Ram Murty, FRSC Queen s Research Chair Queen s University How works or How linear algebra powers the search engine M. Ram Murty, FRSC Queen s Research Chair Queen s University From: gomath.com/geometry/ellipse.php Metric mishap causes loss of Mars orbiter

More information

1 Searching the World Wide Web

1 Searching the World Wide Web Hubs and Authorities in a Hyperlinked Environment 1 Searching the World Wide Web Because diverse users each modify the link structure of the WWW within a relatively small scope by creating web-pages on

More information

PAGERANK COMPUTATION, WITH SPECIAL ATTENTION TO DANGLING NODES

PAGERANK COMPUTATION, WITH SPECIAL ATTENTION TO DANGLING NODES PAGERANK COMPUTATION, WITH SPECIAL ATTENTION TO DANGLING NODES ILSE CF IPSEN AND TERESA M SELEE Abstract We present a simple algorithm for computing the PageRank (stationary distribution) of the stochastic

More information

Web Ranking. Classification (manual, automatic) Link Analysis (today s lesson)

Web Ranking. Classification (manual, automatic) Link Analysis (today s lesson) Link Analysis Web Ranking Documents on the web are first ranked according to their relevance vrs the query Additional ranking methods are needed to cope with huge amount of information Additional ranking

More information

MAE 298, Lecture 8 Feb 4, Web search and decentralized search on small-worlds

MAE 298, Lecture 8 Feb 4, Web search and decentralized search on small-worlds MAE 298, Lecture 8 Feb 4, 2008 Web search and decentralized search on small-worlds Search for information Assume some resource of interest is stored at the vertices of a network: Web pages Files in a file-sharing

More information

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine CS 277: Data Mining Mining Web Link Structure Class Presentations In-class, Tuesday and Thursday next week 2-person teams: 6 minutes, up to 6 slides, 3 minutes/slides each person 1-person teams 4 minutes,

More information

Complex Social System, Elections. Introduction to Network Analysis 1

Complex Social System, Elections. Introduction to Network Analysis 1 Complex Social System, Elections Introduction to Network Analysis 1 Complex Social System, Network I person A voted for B A is more central than B if more people voted for A In-degree centrality index

More information

Pseudocode for calculating Eigenfactor TM Score and Article Influence TM Score using data from Thomson-Reuters Journal Citations Reports

Pseudocode for calculating Eigenfactor TM Score and Article Influence TM Score using data from Thomson-Reuters Journal Citations Reports Pseudocode for calculating Eigenfactor TM Score and Article Influence TM Score using data from Thomson-Reuters Journal Citations Reports Jevin West and Carl T. Bergstrom November 25, 2008 1 Overview There

More information

Lecture 12: Link Analysis for Web Retrieval

Lecture 12: Link Analysis for Web Retrieval Lecture 12: Link Analysis for Web Retrieval Trevor Cohn COMP90042, 2015, Semester 1 What we ll learn in this lecture The web as a graph Page-rank method for deriving the importance of pages Hubs and authorities

More information

Chapter 10. Finite-State Markov Chains. Introductory Example: Googling Markov Chains

Chapter 10. Finite-State Markov Chains. Introductory Example: Googling Markov Chains Chapter 0 Finite-State Markov Chains Introductory Example: Googling Markov Chains Google means many things: it is an Internet search engine, the company that produces the search engine, and a verb meaning

More information

Convex Optimization CMU-10725

Convex Optimization CMU-10725 Convex Optimization CMU-10725 Simulated Annealing Barnabás Póczos & Ryan Tibshirani Andrey Markov Markov Chains 2 Markov Chains Markov chain: Homogen Markov chain: 3 Markov Chains Assume that the state

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data Instructor: Yizhou Sun yzsun@ccs.neu.edu March 16, 2016 Methods to Learn Classification Clustering Frequent Pattern Mining Matrix Data Decision

More information

UpdatingtheStationary VectorofaMarkovChain. Amy Langville Carl Meyer

UpdatingtheStationary VectorofaMarkovChain. Amy Langville Carl Meyer UpdatingtheStationary VectorofaMarkovChain Amy Langville Carl Meyer Department of Mathematics North Carolina State University Raleigh, NC NSMC 9/4/2003 Outline Updating and Pagerank Aggregation Partitioning

More information

Math 443/543 Graph Theory Notes 5: Graphs as matrices, spectral graph theory, and PageRank

Math 443/543 Graph Theory Notes 5: Graphs as matrices, spectral graph theory, and PageRank Math 443/543 Graph Theory Notes 5: Graphs as matrices, spectral graph theory, and PageRank David Glickenstein November 3, 4 Representing graphs as matrices It will sometimes be useful to represent graphs

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize/navigate it? First try: Human curated Web directories Yahoo, DMOZ, LookSmart

More information

c 2005 Society for Industrial and Applied Mathematics

c 2005 Society for Industrial and Applied Mathematics SIAM J. MATRIX ANAL. APPL. Vol. 27, No. 2, pp. 305 32 c 2005 Society for Industrial and Applied Mathematics JORDAN CANONICAL FORM OF THE GOOGLE MATRIX: A POTENTIAL CONTRIBUTION TO THE PAGERANK COMPUTATION

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING Graph and Network Instructor: Yizhou Sun yzsun@cs.ucla.edu May 31, 2017 Methods Learnt Classification Clustering Vector Data Text Data Recommender System Decision Tree; Naïve

More information

A Fast Two-Stage Algorithm for Computing PageRank

A Fast Two-Stage Algorithm for Computing PageRank A Fast Two-Stage Algorithm for Computing PageRank Chris Pan-Chi Lee Stanford University cpclee@stanford.edu Gene H. Golub Stanford University golub@stanford.edu Stefanos A. Zenios Stanford University stefzen@stanford.edu

More information

Markov Chains, Random Walks on Graphs, and the Laplacian

Markov Chains, Random Walks on Graphs, and the Laplacian Markov Chains, Random Walks on Graphs, and the Laplacian CMPSCI 791BB: Advanced ML Sridhar Mahadevan Random Walks! There is significant interest in the problem of random walks! Markov chain analysis! Computer

More information

Updating Markov Chains Carl Meyer Amy Langville

Updating Markov Chains Carl Meyer Amy Langville Updating Markov Chains Carl Meyer Amy Langville Department of Mathematics North Carolina State University Raleigh, NC A. A. Markov Anniversary Meeting June 13, 2006 Intro Assumptions Very large irreducible

More information

Extrapolation Methods for Accelerating PageRank Computations

Extrapolation Methods for Accelerating PageRank Computations Extrapolation Methods for Accelerating PageRank Computations Sepandar D. Kamvar Stanford University sdkamvar@stanford.edu Taher H. Haveliwala Stanford University taherh@cs.stanford.edu Christopher D. Manning

More information

The Push Algorithm for Spectral Ranking

The Push Algorithm for Spectral Ranking The Push Algorithm for Spectral Ranking Paolo Boldi Sebastiano Vigna March 8, 204 Abstract The push algorithm was proposed first by Jeh and Widom [6] in the context of personalized PageRank computations

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Lecture #9: Link Analysis Seoul National University 1 In This Lecture Motivation for link analysis Pagerank: an important graph ranking algorithm Flow and random walk formulation

More information

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Web Search: How to Organize the Web? Ranking Nodes on Graphs Hubs and Authorities PageRank How to Solve PageRank

More information

Link Analysis Ranking

Link Analysis Ranking Link Analysis Ranking How do search engines decide how to rank your query results? Guess why Google ranks the query results the way it does How would you do it? Naïve ranking of query results Given query

More information

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides Web Search: How to Organize the Web? Ranking Nodes on Graphs Hubs and Authorities PageRank How to Solve PageRank

More information

MAA704, Perron-Frobenius theory and Markov chains.

MAA704, Perron-Frobenius theory and Markov chains. November 19, 2013 Lecture overview Today we will look at: Permutation and graphs. Perron frobenius for non-negative. Stochastic, and their relation to theory. Hitting and hitting probabilities of chain.

More information

Application. Stochastic Matrices and PageRank

Application. Stochastic Matrices and PageRank Application Stochastic Matrices and PageRank Stochastic Matrices Definition A square matrix A is stochastic if all of its entries are nonnegative, and the sum of the entries of each column is. We say A

More information

Lecture 14: Random Walks, Local Graph Clustering, Linear Programming

Lecture 14: Random Walks, Local Graph Clustering, Linear Programming CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 14: Random Walks, Local Graph Clustering, Linear Programming Lecturer: Shayan Oveis Gharan 3/01/17 Scribe: Laura Vonessen Disclaimer: These

More information

Notes on Linear Algebra and Matrix Theory

Notes on Linear Algebra and Matrix Theory Massimo Franceschet featuring Enrico Bozzo Scalar product The scalar product (a.k.a. dot product or inner product) of two real vectors x = (x 1,..., x n ) and y = (y 1,..., y n ) is not a vector but a

More information

Class President: A Network Approach to Popularity. Due July 18, 2014

Class President: A Network Approach to Popularity. Due July 18, 2014 Class President: A Network Approach to Popularity Due July 8, 24 Instructions. Due Fri, July 8 at :59 PM 2. Work in groups of up to 3 3. Type up the report, and submit as a pdf on D2L 4. Attach the code

More information

Markov Chains. As part of Interdisciplinary Mathematical Modeling, By Warren Weckesser Copyright c 2006.

Markov Chains. As part of Interdisciplinary Mathematical Modeling, By Warren Weckesser Copyright c 2006. Markov Chains As part of Interdisciplinary Mathematical Modeling, By Warren Weckesser Copyright c 2006 1 Introduction A (finite) Markov chain is a process with a finite number of states (or outcomes, or

More information

Page rank computation HPC course project a.y

Page rank computation HPC course project a.y Page rank computation HPC course project a.y. 2015-16 Compute efficient and scalable Pagerank MPI, Multithreading, SSE 1 PageRank PageRank is a link analysis algorithm, named after Brin & Page [1], and

More information

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search 6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search Daron Acemoglu and Asu Ozdaglar MIT September 30, 2009 1 Networks: Lecture 7 Outline Navigation (or decentralized search)

More information

Markov Chains and Spectral Clustering

Markov Chains and Spectral Clustering Markov Chains and Spectral Clustering Ning Liu 1,2 and William J. Stewart 1,3 1 Department of Computer Science North Carolina State University, Raleigh, NC 27695-8206, USA. 2 nliu@ncsu.edu, 3 billy@ncsu.edu

More information

Applications to network analysis: Eigenvector centrality indices Lecture notes

Applications to network analysis: Eigenvector centrality indices Lecture notes Applications to network analysis: Eigenvector centrality indices Lecture notes Dario Fasino, University of Udine (Italy) Lecture notes for the second part of the course Nonnegative and spectral matrix

More information

Jeffrey D. Ullman Stanford University

Jeffrey D. Ullman Stanford University Jeffrey D. Ullman Stanford University We ve had our first HC cases. Please, please, please, before you do anything that might violate the HC, talk to me or a TA to make sure it is legitimate. It is much

More information

MITOCW ocw f99-lec30_300k

MITOCW ocw f99-lec30_300k MITOCW ocw-18.06-f99-lec30_300k OK, this is the lecture on linear transformations. Actually, linear algebra courses used to begin with this lecture, so you could say I'm beginning this course again by

More information

Combating Web Spam with TrustRank

Combating Web Spam with TrustRank Combating Web Spam with rustrank Authors: Gyöngyi, Garcia-Molina, and Pederson Published in: Proceedings of the 0th VLDB Conference Year: 00 Presentation by: Rebecca Wills Date: April, 00 Questions we

More information

Stochastic processes. MAS275 Probability Modelling. Introduction and Markov chains. Continuous time. Markov property

Stochastic processes. MAS275 Probability Modelling. Introduction and Markov chains. Continuous time. Markov property Chapter 1: and Markov chains Stochastic processes We study stochastic processes, which are families of random variables describing the evolution of a quantity with time. In some situations, we can treat

More information

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval http://informationretrieval.org IIR 18: Latent Semantic Indexing Hinrich Schütze Center for Information and Language Processing, University of Munich 2013-07-10 1/43

More information

Jeffrey D. Ullman Stanford University

Jeffrey D. Ullman Stanford University Jeffrey D. Ullman Stanford University 2 Web pages are important if people visit them a lot. But we can t watch everybody using the Web. A good surrogate for visiting pages is to assume people follow links

More information

PV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211

PV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211 PV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211 IIR 18: Latent Semantic Indexing Handout version Petr Sojka, Hinrich Schütze et al. Faculty of Informatics, Masaryk University,

More information

Cutting Graphs, Personal PageRank and Spilling Paint

Cutting Graphs, Personal PageRank and Spilling Paint Graphs and Networks Lecture 11 Cutting Graphs, Personal PageRank and Spilling Paint Daniel A. Spielman October 3, 2013 11.1 Disclaimer These notes are not necessarily an accurate representation of what

More information

Probability & Computing

Probability & Computing Probability & Computing Stochastic Process time t {X t t 2 T } state space Ω X t 2 state x 2 discrete time: T is countable T = {0,, 2,...} discrete space: Ω is finite or countably infinite X 0,X,X 2,...

More information

Lecture: Local Spectral Methods (2 of 4) 19 Computing spectral ranking with the push procedure

Lecture: Local Spectral Methods (2 of 4) 19 Computing spectral ranking with the push procedure Stat260/CS294: Spectral Graph Methods Lecture 19-04/02/2015 Lecture: Local Spectral Methods (2 of 4) Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these notes are still very rough. They provide

More information

Introduction to Algebra: The First Week

Introduction to Algebra: The First Week Introduction to Algebra: The First Week Background: According to the thermostat on the wall, the temperature in the classroom right now is 72 degrees Fahrenheit. I want to write to my friend in Europe,

More information

PageRank: The Math-y Version (Or, What To Do When You Can t Tear Up Little Pieces of Paper)

PageRank: The Math-y Version (Or, What To Do When You Can t Tear Up Little Pieces of Paper) PageRank: The Math-y Version (Or, What To Do When You Can t Tear Up Little Pieces of Paper) In class, we saw this graph, with each node representing people who are following each other on Twitter: Our

More information

Data Mining and Matrices

Data Mining and Matrices Data Mining and Matrices 10 Graphs II Rainer Gemulla, Pauli Miettinen Jul 4, 2013 Link analysis The web as a directed graph Set of web pages with associated textual content Hyperlinks between webpages

More information

MultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors

MultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors MultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors Michael K. Ng Centre for Mathematical Imaging and Vision and Department of Mathematics

More information