Graphs and Networks
Lecture 5: PageRank
Lecturer: Daniel A. Spielman
September 20, 2007

5.1 Intro to PageRank

PageRank, the algorithm reportedly used by Google, assigns a numerical rank to every web page. More important pages get higher rankings. The more in-links a page has, the higher its ranking should be. But, more importantly, a page has a higher rank if it is pointed to by high-rank pages. Low-rank pages influence the rank of a page less. If one page points to many others, it will have less influence on their rankings than if it just points to a few.

To algebraicize this intuitively appealing idea, PageRank treats the web as a directed graph, with web pages as vertices and links as directed edges. The rank of vertex $v$ is denoted $r(v)$, and is supposed to satisfy the formula

$$ r(v) = \sum_{u : (u,v) \in E} \frac{r(u)}{d^{+}(u)}, \tag{5.1} $$

where $d^{+}(u)$ is the number of edges going out of $u$. Note that this sum is over edges going in to $v$.

To express this in matrix form, we let $D_{+}$ be the diagonal matrix whose $u$th diagonal entry is $d^{+}(u)$. We then let $A$ be the directed adjacency matrix of the graph, where $A(v,u) = 1$ if there is an edge from $u$ to $v$. Yes, I know that this looks backwards. But, it is what I have to do if I want to make $r$ be a column vector. We then find that $r$ must satisfy the equation

$$ r = A D_{+}^{-1} r. \tag{5.2} $$

That is to say that $r$ is an eigenvector of eigenvalue 1 of the matrix $A D_{+}^{-1}$. However, $A D_{+}^{-1}$ is not a symmetric matrix and, in general, is not similar to a symmetric matrix. So, some of the eigenvalues of this matrix can be complex, it might not have $n$ eigenvectors, and the eigenvectors it does have can have complex entries. Nevertheless, in this lecture we will show that

1. If the graph has no vertices of out-degree 0, then 1 is an eigenvalue.
2. If the graph is strongly connected, then the eigenvalue 1 has multiplicity 1.
3. If the graph is strongly connected, then the solution of (5.2), which is unique up to scaling, can be taken to be strictly positive.
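As a small sanity check of equations (5.1) and (5.2), here is a minimal sketch that builds $A D_{+}^{-1}$ for a toy graph and confirms that a hand-computed rank vector is fixed by it. The four-edge graph, the helper names `walk_matrix` and `apply_matrix`, and the vector `r` are illustrative assumptions, not part of the lecture. Exact fractions are used so the equality test is not affected by rounding.

```python
from fractions import Fraction


def walk_matrix(n, edges):
    """Return M = A D_+^{-1} as a list of rows.

    M[v][u] = 1/d^+(u) when there is an edge from u to v, and 0 otherwise.
    Assumes every vertex has at least one out-edge.
    """
    out_degree = [0] * n
    for u, _ in edges:
        out_degree[u] += 1
    M = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        M[v][u] += Fraction(1, out_degree[u])
    return M


def apply_matrix(M, r):
    """Compute the matrix-vector product M r."""
    return [sum(M[v][u] * r[u] for u in range(len(r))) for v in range(len(M))]


# Hypothetical toy graph (strongly connected): 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
M = walk_matrix(3, edges)

# Candidate rank vector, found by solving (5.1) by hand and normalizing to sum 1.
r = [Fraction(2, 5), Fraction(1, 5), Fraction(2, 5)]

# Each column of M should sum to 1 because no vertex has out-degree 0.
column_sums = [sum(M[v][u] for v in range(3)) for u in range(3)]

print(apply_matrix(M, r) == r)  # does r satisfy r = M r?
print(column_sums)
```

Note that vertex 1 has only one in-link, and it comes from a vertex that splits its influence over two out-links, which is why it ends up with the smallest rank in this example.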

Before I go further, I would like to point out that this measure of importance was first suggested in the social network community in the paper by Phillip Bonacich, "Factoring and weighting approaches to status scores and clique identification", Journal of Mathematical Sociology, 1972. I should also point out that $r$ can be understood as the stable distribution of the directed random walk on the graph $G$. But, random walks on directed graphs are more complicated than on undirected graphs.

5.2 Eigenvalue 1

Set

$$ M \stackrel{\mathrm{def}}{=} A D_{+}^{-1}. $$

Lemma 5.2.1. If $G$ has no vertices of out-degree 0, then 1 is an eigenvalue of $M$.

Proof. If $G$ has no vertices of out-degree 0, then every column of $A$ has at least one non-zero entry. In fact, the $u$th column of $A$ has $d^{+}(u)$ non-zero entries, so the $u$th column of $A D_{+}^{-1}$ has sum 1. This implies that $\mathbf{1}^T M = \mathbf{1}^T$, and so 1 is an eigenvalue of $M$. As a matrix and its transpose have the same eigenvalues, $M$ also has a right-eigenvector of eigenvalue 1.

This is similar to the undirected case: in both cases the walk matrix has the vector $\mathbf{1}$ as a left-eigenvector. However, it differs in that we do not know any simple expression for the corresponding right-eigenvector, $r$.

Lemma 5.2.2. If $G$ is strongly connected, then the eigenvalue 1 has multiplicity 1. In particular, if a vector $\mathbf{v}$ satisfies $\mathbf{v}^T M = \mathbf{v}^T$, then we must have $\mathbf{v} = c\mathbf{1}$ for some constant $c$.

The proof of this is similar to the proof in the undirected case, so we will skip it.

5.3 r is positive

Lemma 5.3.1. If $G$ is strongly connected and if the solution of (5.2) is non-negative, then it is positive.

Proof. First, note that the solution to (5.2) cannot be the all-zero vector. So, if it is non-negative, it must have at least one positive coordinate. So, assume that $r(z) > 0$. Now, let $v$ be any node that $z$ points to. As every term in the sum is non-negative, equation (5.1) tells us that

$$ r(v) = \sum_{u : (u,v) \in E} \frac{r(u)}{d^{+}(u)} \geq \frac{r(z)}{d^{+}(z)} > 0. $$

In general, for every node $z$ for which $r(z) > 0$, every node $v$ that $z$ points to must have $r(v) > 0$. Since the graph is strongly connected, we can apply induction to show that $r(v) > 0$ for all $v \in V$.

Now, we must show that $r$ is non-negative. To do this, we will consider the matrix

$$ \overline{M} \stackrel{\mathrm{def}}{=} \frac{1}{n} \sum_{i=0}^{n-1} M^{i}. $$

We need to establish a few properties of $\overline{M}$.

Claim 5.3.2. If $M r = r$, then $\overline{M} r = r$. Similarly, $\mathbf{1}^T \overline{M} = \mathbf{1}^T$.

Claim 5.3.3. The matrix $\overline{M}$ has no negative or zero entries.

Proof. As $M$ is non-negative, it follows immediately that $\overline{M}$ is non-negative. To show that $\overline{M}$ has no zero entries, note that $M^{t}(b,a)$ is equal to the probability that a random walk starting at $a$ is at $b$ after exactly $t$ time steps. As the graph is strongly connected, for every pair of vertices $a$ and $b$, there is some $t$ less than $n$ for which this probability is non-zero (you may prove this in the same way you proved 5.3.1). As $\overline{M}(b,a)$ is the average of these probabilities for $t$ between 0 and $n-1$, it is non-zero as well.

Theorem 5.3.4. The equation $M r = r$ has a non-negative solution.

Proof. We will show that it has a solution in which all the signs are the same, which implies that it has a non-negative solution (flip all signs if necessary). But, we will work with the matrix $\overline{M}$, which we proved also satisfies

$$ \overline{M} r = r. \tag{5.3} $$

Assume by way of contradiction that $r$ is not sign-uniform. That is, that $r$ has both positive and negative entries. We will use the fact that if $x$ is some vector with both positive and negative entries, then

$$ \Bigl| \sum_{u} x(u) \Bigr| < \sum_{u} |x(u)|. $$

From equation (5.3), we have that for all $u$,

$$ r(u) = \sum_{v} \overline{M}(u,v)\, r(v), $$

and so

$$ |r(u)| = \Bigl| \sum_{v} \overline{M}(u,v)\, r(v) \Bigr|. $$

As we have assumed that $r$ is not sign-uniform, and $\overline{M}(u,v)$ is always positive, we have the inequality

$$ \Bigl| \sum_{v} \overline{M}(u,v)\, r(v) \Bigr| < \sum_{v} \overline{M}(u,v)\, |r(v)|, $$

which implies

$$ |r(u)| < \sum_{v} \overline{M}(u,v)\, |r(v)|. $$

If we now sum over all $u$, we get

$$ \sum_{u} |r(u)| < \sum_{u} \sum_{v} \overline{M}(u,v)\, |r(v)| = \sum_{v} \sum_{u} \overline{M}(u,v)\, |r(v)| = \sum_{v} |r(v)| \sum_{u} \overline{M}(u,v) = \sum_{v} |r(v)|, $$

as $\mathbf{1}^T \overline{M} = \mathbf{1}^T$ is equivalent to

$$ \sum_{u} \overline{M}(u,v) = 1 \quad \text{for every } v. $$

But the two ends of this chain are the same quantity, so we have derived a contradiction.

5.4 Closer to PageRank

Brin and Page tell us that they don't actually take $A$ to be the original web graph. Rather, they consider a random surfer who actually jumps to a random web page with some fixed probability at each time step. We can model this by including an edge between all pairs of vertices, giving that edge low weight. Since we haven't discussed weighting edges yet, let me instead say that this is equivalent to forcing $r$ to satisfy the equation

$$ \bigl( (1-\alpha) M + \alpha J / n \bigr)\, r = r, \tag{5.4} $$

where $\alpha$ is the probability of jumping to a random web page at any moment, and $J$ is the all-1s matrix. This equation is actually much nicer than the original. First of all, it gives us an all-positive matrix. So, we know that the solution will be all positive. It also eliminates the issue of nodes with no out-edges.
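To see concretely why the matrix in equation (5.4) is "all-positive", the following minimal sketch builds $(1-\alpha) M + \alpha J/n$ for a toy graph and checks its entries. The graph, the helper names `walk_matrix` and `jump_matrix`, and the choice $\alpha = 3/20$ are assumptions made for illustration. The sketch also assumes every vertex has at least one out-edge, so that $M$ itself has column sums 1.

```python
from fractions import Fraction


def walk_matrix(n, edges):
    """Return M = A D_+^{-1}, with M[v][u] = 1/d^+(u) for each edge (u, v)."""
    out_degree = [0] * n
    for u, _ in edges:
        out_degree[u] += 1
    M = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        M[v][u] += Fraction(1, out_degree[u])
    return M


def jump_matrix(M, alpha):
    """Return (1 - alpha) M + alpha J / n, where J is the all-1s matrix."""
    n = len(M)
    return [[(1 - alpha) * M[v][u] + alpha / n for u in range(n)]
            for v in range(n)]


# Hypothetical toy graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
alpha = Fraction(3, 20)  # assumed jump probability for this example

M = walk_matrix(3, edges)
G = jump_matrix(M, alpha)

# M has zero entries (missing edges), but every entry of G is at least alpha/n.
has_zero_in_M = any(entry == 0 for row in M for entry in row)
all_positive = all(entry > 0 for row in G for entry in row)

# Columns of G still sum to 1: (1 - alpha) * 1 + n * (alpha / n).
column_sums = [sum(G[v][u] for v in range(3)) for u in range(3)]

print(has_zero_in_M, all_positive, column_sums)
```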

If we decide that we are going to normalize $r$ so that $\mathbf{1}^T r = 1$, then we have that

$$ J r = \mathbf{1}, $$

so equation (5.4) becomes

$$ (1-\alpha) M r + (\alpha/n)\, \mathbf{1} = r, $$

which is equivalent to

$$ (\alpha/n)\, \mathbf{1} = \bigl( I - (1-\alpha) M \bigr)\, r, $$

and

$$ \bigl( I - (1-\alpha) M \bigr)^{-1} (\alpha/n)\, \mathbf{1} = r. $$

That is, $r$ is now given by the solution to a system of linear equations. Even better, we can solve these equations quickly. We have that

$$ \bigl( I - (1-\alpha) M \bigr)^{-1} = \sum_{t=0}^{\infty} \bigl( (1-\alpha) M \bigr)^{t}. $$

(This is just like the formula you learned for $1/(1-x)$, but for matrices. It is true as long as the sum converges.) Moreover, this sum converges very quickly. Brin and Page suggest using $\alpha = 0.15$. We know that every entry of $M^{t}$ is at most 1, so every entry of $\bigl((1-\alpha) M\bigr)^{t}$ is at most $0.85^{t}$, which becomes small very quickly as we increase $t$. So, we can quickly approximate $r$ by using the first few terms from this series.
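The truncated series can be sketched in a few lines. The code below is one possible way to evaluate $(\alpha/n) \sum_{t < T} ((1-\alpha) M)^{t} \mathbf{1}$, not a description of how any production system computes PageRank. The function names `pagerank_series` and `residual`, the toy graph, and the number of terms are assumptions for illustration, and every vertex is assumed to have at least one out-edge. Each new term is obtained from the previous one by a single multiplication by $(1-\alpha) M$, so no matrix powers are formed.

```python
def out_degrees(n, edges):
    """Count the out-edges of each vertex."""
    degree = [0] * n
    for u, _ in edges:
        degree[u] += 1
    return degree


def pagerank_series(n, edges, alpha=0.15, terms=200):
    """Approximate r by the first `terms` terms of the series for
    (I - (1 - alpha) M)^{-1} (alpha / n) 1, where M = A D_+^{-1}."""
    degree = out_degrees(n, edges)
    term = [alpha / n] * n  # the t = 0 term: (alpha / n) * 1
    r = term[:]
    for _ in range(terms - 1):
        # Next term = (1 - alpha) * M * (previous term), computed edge by edge.
        next_term = [0.0] * n
        for u, v in edges:
            next_term[v] += (1 - alpha) * term[u] / degree[u]
        term = next_term
        r = [a + b for a, b in zip(r, term)]
    return r


def residual(n, edges, r, alpha=0.15):
    """Largest entry of |((1 - alpha) M + alpha J / n) r - r|, as in (5.4)."""
    degree = out_degrees(n, edges)
    Mr = [0.0] * n
    for u, v in edges:
        Mr[v] += r[u] / degree[u]
    total = sum(r)  # J r = (sum of r) * 1
    return max(abs((1 - alpha) * Mr[v] + alpha * total / n - r[v])
               for v in range(n))


# Hypothetical toy graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
r = pagerank_series(3, edges)
print(r, sum(r), residual(3, edges, r))
```

Because the columns of $M$ sum to 1, the entries of the $t$th term sum to $\alpha (1-\alpha)^{t}$, so after $T$ terms the approximation sums to $1 - (1-\alpha)^{T}$. With $\alpha = 0.15$, a few dozen terms already leave an error far below what a ranking would notice.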
