ECEN 689 Special Topics in Data Science for Communications Networks. Nick Duffield, Department of Electrical & Computer Engineering, Texas A&M University. Lecture 8: Random Walks, Matrices and PageRank.
Graphs. A graph G = (V,E); V = vertices, a.k.a. nodes; E = edges, a.k.a. links. An edge is a node pair: edge (x,y) joins nodes x and y. Undirected edge (x,y): no order between x and y. Directed edge (x,y): x and y ordered, with initial node x and terminal node y. [Figure: nodes x and y joined by edge (x,y).]
Graph ranking queries. Rank-order nodes for a particular query: find the top k web pages about a topic, or the top k friend recommendations when joining a social network. Use both topological and non-topological attributes. Topological component of rank: relevance as measured by (in-)degree. Semantic relevance: does the page content match the query topic? Overlap: relevance can also be inferred from the content of related pages.
Definitions: adjacency matrix A; transition matrix P. [Figure: example graph with its adjacency matrix A and transition matrix P; edges out of a degree-2 node carry probability 1/2.]
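As a minimal sketch of the two definitions, the transition matrix P can be built from the adjacency matrix A by dividing each row by the node's degree; the 4-node example graph here is an assumption for illustration, not the graph on the slide.

```python
# Build the transition matrix P from an adjacency matrix A by
# row-normalization: P(i, j) = A(i, j) / degree(i).
# The 4-node example graph is an assumption for illustration.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

def transition_matrix(A):
    """Row-normalize A so each row of P sums to 1."""
    return [[a / sum(row) for a in row] for row in A]

P = transition_matrix(A)
print(P[0])  # node 0 has 2 neighbors: [0.0, 0.5, 0.5, 0.0]
```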
What is a random walk? [Figure: animation of a walker on the example graph at t = 0, 1, 2, 3; at each step the walker moves to a uniformly chosen neighbor, e.g. with probability 1/2 from a degree-2 node.]
Probability Distributions. x_t(i) = probability that the surfer is at node i at time t. Then x_{t+1}(i) = Σ_j (probability of being at node j) × Pr(j→i) = Σ_j x_t(j) p(j,i). In matrix form: x_{t+1} = x_t P = x_{t-1} P² = x_{t-2} P³ = … = x_0 P^{t+1}. What happens when the walker keeps walking for a long time?
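The update x_{t+1} = x_t P can be sketched directly: repeatedly multiply the distribution row vector by the transition matrix. The fully connected 3-node graph below is an assumption; its stationary distribution is uniform.

```python
# Iterate x_{t+1} = x_t P on an assumed symmetric 3-node graph
# (every pair of nodes connected, so each move has probability 1/2).
P = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]

def step(x, P):
    n = len(P)
    return [sum(x[j] * P[j][i] for j in range(n)) for i in range(n)]

x = [1.0, 0.0, 0.0]      # walker starts at node 0 with certainty
for t in range(50):
    x = step(x, P)
print(x)                  # approaches the uniform distribution (1/3 each)
```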
Stationary Distribution. When the walker keeps walking for a long time, the distribution may stop changing, i.e. x_{T+1} = x_T. For well-behaved graphs this limit does not depend on the start distribution!
What is a stationary distribution? The stationary distribution at a node is related to the amount of time a random walker spends visiting that node. Remember that the distribution evolves as x_{t+1} = x_t P. The stationary distribution v_0 satisfies v_0 = v_0 P, i.e. v_0 is a left eigenvector of the transition matrix with eigenvalue 1.
Questions. Does a stationary distribution always exist? Is it unique? Yes, if the graph is well-behaved. How fast will the random walker approach this stationary distribution? This is the mixing time.
Well-behaved graphs. Irreducible: there is a path from every node to every other node. [Figure: an irreducible graph next to a graph that is not irreducible.]
Well-behaved graphs. Aperiodic: the greatest common divisor (GCD) of all cycle lengths is 1. This GCD is also called the period. [Figure: a graph with period 3 next to an aperiodic graph.]
Perron-Frobenius Theorem. Assume P is a real square matrix with strictly positive entries. PF Theorem: P has an eigenvalue λ of maximal modulus such that λ is real and positive; λ is simple, i.e. there is only one linearly independent eigenvector v with eigenvalue λ; and the eigenvector v has strictly positive entries. If P merely has nonnegative entries, the maximal eigenvalue need not be simple. Example: the n×n identity matrix I(n) has n linearly independent eigenvectors of eigenvalue 1.
Implications of the Perron-Frobenius Theorem. Suppose the random walk is irreducible and aperiodic. Then the PF eigenvalue of the transition matrix equals 1 and all other eigenvalues are strictly smaller in modulus. Let the eigenvalues of P be {σ_i : i = 0, …, n-1} in non-increasing order of modulus: σ_0 = 1 > |σ_1| ≥ |σ_2| ≥ … ≥ |σ_{n-1}|. These results imply that for a well-behaved graph there exists a unique stationary distribution.
Test out these ideas on undirected graphs. Suppose G is an undirected connected graph with unit edge weights: A(i,j) = A(j,i) = 1 if (i,j) is an edge. Transition probability P(i,j) = 1/n_i: if at i, the walk is equally likely to transition to any of the n_i neighbors. If G is connected then P is irreducible: there is a path connecting any two nodes, and if the shortest path between i and j has length m, the walk can move from i to j in m time steps, i.e. (P^m)_{ij} > 0.
What is the stationary distribution? P_{ij} = 1/n_i if j ∈ N(i), the set of neighbors of i, and 0 otherwise. A stationary distribution v satisfies v_j = Σ_i v_i P_{ij}. To prove that v_j is proportional to n_j, substitute v_i ∝ n_i: Σ_i n_i P_{ij} = Σ_{i ∈ N(j)} n_i (1/n_i) = Σ_{i ∈ N(j)} 1 = n_j. Intuition: the more neighbors a node has, the more likely it is to be visited.
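The degree-proportionality claim can be checked numerically: with v_j = n_j / (2|E|), verify v = vP directly. The example graph is an assumption.

```python
# Verify that v_j = deg(j) / (2|E|) is stationary for an undirected
# graph, i.e. v = vP with P(i, j) = A[i][j] / deg(i).
# The 4-node example graph is an assumption for illustration.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
n = len(A)
deg = [sum(row) for row in A]        # n_i = number of neighbors of i
total = sum(deg)                     # equals 2|E|
v = [d / total for d in deg]         # candidate stationary distribution

# (vP)_j = sum_i v_i * A[i][j] / deg[i]
vP = [sum(v[i] * A[i][j] / deg[i] for i in range(n)) for j in range(n)]
print(v)
print(vP)   # equal: v is stationary
```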
Convergence to the stationary distribution. We need aperiodicity in addition to irreducibility. If the walk is periodic, then P^m = I (the identity matrix) for some m, so u P^m = u for any initial u, and u P^n does not converge to the stationary distribution v_0 as n → ∞. Simple example with period 2: P = [0 1; 1 0], P² = [1 0; 0 1] = I.
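The period-2 example above can be run directly: P swaps the two states, so the distribution oscillates forever instead of converging.

```python
# The period-2 counterexample: P swaps the two states, P^2 = I,
# so the distribution oscillates and never converges.
P = [[0.0, 1.0],
     [1.0, 0.0]]

def step(x):
    return [x[0] * P[0][0] + x[1] * P[1][0],
            x[0] * P[0][1] + x[1] * P[1][1]]

x = [1.0, 0.0]
history = [tuple(x)]
for t in range(4):
    x = step(x)
    history.append(tuple(x))
print(history)  # (1,0) -> (0,1) -> (1,0) -> (0,1) -> (1,0): no convergence
```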
Ranking algorithms on the web. PageRank (Page & Brin, 1998). The web is a directed graph; a webpage is important if other important pages point to it. Expect the stationary distribution to satisfy v(i) = Σ_{j ∈ N_in(i)} v(j) / n_out(j), where N_in(i) = set of inbound neighbors of i, n_in(i) = #N_in(i); N_out(j) = set of outbound neighbors of j, n_out(j) = #N_out(j).
Iterative computation of PageRank. Aim: rank pages i according to the stationary distribution v_0. Method: iterative computation. Start with an initial page rank x_0. Divide the page rank x_{0,i} at page i over all vertices linked to from i, i.e. over all j in N_out(i). Iterate. This is just the iteration x_{t+1} = x_t P, and x_t → the stationary v_0 under the Perron-Frobenius conditions.
PageRank & Perron-Frobenius. Perron-Frobenius needs the graph irreducible and aperiodic. But how can we guarantee that for the web graph? Do it with a small restart probability c: at any time step the random walker jumps (teleports) to any other node with probability c, and jumps to its direct neighbors with total probability 1-c. That is, P̃ = (1-c)P + cU, where U_{ij} = 1/n for all i, j. Effectively: superimpose a fully connected graph with link weights c/n.
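A minimal sketch of constructing P̃ = (1-c)P + cU; the value c = 0.15 and the 3-node graph are assumptions for illustration.

```python
# Build the teleporting chain P~ = (1-c)P + cU with U_ij = 1/n.
# c = 0.15 and the 3-node graph are illustrative assumptions.
c = 0.15
P = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
n = len(P)
P_tilde = [[(1 - c) * P[i][j] + c / n for j in range(n)] for i in range(n)]
print(P_tilde[1])  # every entry strictly positive, rows still sum to 1
```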
Power iteration. Power iteration is an algorithm for computing the stationary distribution. Start with any distribution x_0; keep computing x_{t+1} = x_t P; stop when x_{t+1} and x_t are almost the same.
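The three steps above can be sketched as follows; the teleported matrix is used so the Perron-Frobenius conditions hold, and c and the graph are assumptions.

```python
# Power iteration: iterate x_{t+1} = x_t G until the change drops
# below a tolerance. G is the teleported matrix (assumed c = 0.15
# and an assumed 3-node graph) so convergence is guaranteed.
c = 0.15
P = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
n = len(P)
G = [[(1 - c) * P[i][j] + c / n for j in range(n)] for i in range(n)]

def power_iteration(G, tol=1e-12, max_iter=10000):
    n = len(G)
    x = [1.0 / n] * n                    # any start distribution works
    for _ in range(max_iter):
        x_new = [sum(x[j] * G[j][i] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(x, x_new)) < tol:
            return x_new
        x = x_new
    return x

v = power_iteration(G)
print(v)  # the PageRank vector: sums to 1 and satisfies v = vG
```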
Power iteration: why should this work? Write x_0 as a linear combination of the left eigenvectors {v_0, v_1, …, v_{n-1}} of P. Remember that v_0 is the stationary distribution: x_0 = v_0 + c_1 v_1 + c_2 v_2 + … + c_{n-1} v_{n-1}.
Power iteration. [Figure sequence: the coefficients of x_t in the eigenbasis across iterations.] In the eigenbasis, x_t = x_0 P̃^t = σ_0^t v_0 + σ_1^t c_1 v_1 + σ_2^t c_2 v_2 + … + σ_{n-1}^t c_{n-1} v_{n-1}. Since σ_0 = 1 > |σ_1| ≥ … ≥ |σ_{n-1}|, every coefficient except that of v_0 tends to 0, so x_t → v_0.
Convergence Analysis. x_0 = v_0 + c_1 v_1 + c_2 v_2 + … + c_{n-1} v_{n-1}, so x_t = x_0 P^t = (v_0 + c_1 v_1 + … + c_{n-1} v_{n-1}) P^t = v_0 + c_1 σ_1^t v_1 + c_2 σ_2^t v_2 + … + c_{n-1} σ_{n-1}^t v_{n-1}. Hence ||x_0 P^t − v_0|| ≤ a |λ|^t, where λ = σ_1 is the eigenvalue with second largest magnitude. The smaller the second largest eigenvalue (in magnitude), the faster the mixing. For |λ| < 1 there exists a unique stationary distribution, namely the first left eigenvector of the transition matrix.
PageRank and convergence. The transition matrix PageRank uses is P̃ = (1-c)P + cU. It has strictly positive entries, so by Perron-Frobenius, 1 is a simple eigenvalue and all other eigenvalues satisfy |λ| < 1. The second largest eigenvalue has magnitude at most (1-c), so the convergence rate is determined by c. Trade-off: larger c means faster convergence, but the result is less specific to the original P. Trivial example: c = 1 gives x_t = uniform distribution for all t ≥ 1.
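The role of c can be checked numerically: for the teleported matrix, the L1 distance to the stationary distribution contracts by at least the factor (1-c) on every step (the teleport term cancels because the difference of two distributions sums to zero). The values of c and the graph below are assumptions.

```python
# Check the convergence claim: with restart probability c, the L1
# error to the stationary distribution shrinks by at least (1 - c)
# per step. c = 0.5 and the 3-node graph are assumptions.
c = 0.5
P = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
n = len(P)
G = [[(1 - c) * P[i][j] + c / n for j in range(n)] for i in range(n)]

def step(x):
    return [sum(x[j] * G[j][i] for j in range(n)) for i in range(n)]

v = [1.0 / n] * n
for _ in range(2000):          # iterate to high precision to obtain v
    v = step(v)

x = [1.0, 0.0, 0.0]
errors = []
for t in range(10):
    x = step(x)
    errors.append(sum(abs(p - q) for p, q in zip(x, v)))
print(errors)  # each entry is at most (1 - c) times the previous one
```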
PageRank generalized. PageRank seeks the stationary distribution as the solution to v = vP̃ = (1-c)vP + cvU = (1-c)vP + cr, where r_i = (vU)_i = Σ_j v_j U_{ji} = n^{-1} Σ_j v_j = 1/n: uniform in i, and it does not change over time. What happens if r is non-uniform?
Personalized PageRank. A non-uniform r means a non-uniform teleportation distribution. Regard r as a preference vector, e.g. specific to a user. The resulting v gives personalized views of the web. Convergence? P̃ = (1-c)P + cU becomes (1-c)P + c·1r, where 1 is the all-ones column vector, so every row of the teleportation matrix equals r. One can show the second largest eigenvalue satisfies |λ| ≤ 1-c as before.
Personalized PageRank. How to determine an appropriate r for a user? During a user query: infer preferences from the query. Recompute PageRank? Costly. Precomputation: pre-classify pages according to category; each category c gets a preference vector r_c, with teleportation preferentially to a random page within the category; precompute the PageRank v_c for each category. At query runtime: map query q to a category c(q) and use the PageRank v_{c(q)}.
Multiple Categories. The page rank v_r is a linear function of the preference vector r. Formally: v = (1-c)vP + cr implies v = cr(I − (1-c)P)^{-1}. Example: for r = (a, 0, 1-a), with r_0 = (1, 0, 0) and r_2 = (0, 0, 1), we get v(r) = a·v(r_0) + (1-a)·v(r_2). At query runtime: attribute query q to classes c with weights α_c > 0, Σ_c α_c = 1, and return the page rank vector as the linear combination of precomputed page rank vectors over categories: v = Σ_c α_c v_c.
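The linearity claim can be verified numerically: compute personalized PageRank by power iteration with teleport to r, and check that the rank of a mixed preference equals the same mixture of the individual ranks. The value of c, the graph, and the two "category" vectors are assumptions.

```python
# Check linearity of personalized PageRank: v(a*r0 + (1-a)*r2)
# equals a*v(r0) + (1-a)*v(r2). c, the graph, and the category
# preference vectors are illustrative assumptions.
c = 0.2
P = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
n = len(P)

def ppr(r, iters=500):
    """Iterate x <- (1-c) x P + c r, converging to v(r)."""
    x = r[:]
    for _ in range(iters):
        x = [(1 - c) * sum(x[j] * P[j][i] for j in range(n)) + c * r[i]
             for i in range(n)]
    return x

r0 = [1.0, 0.0, 0.0]                      # "category 0" preference
r2 = [0.0, 0.0, 1.0]                      # "category 2" preference
a = 0.3
mix = [a * u + (1 - a) * w for u, w in zip(r0, r2)]

lhs = ppr(mix)                            # rank of the mixed preference
rhs = [a * u + (1 - a) * w for u, w in zip(ppr(r0), ppr(r2))]
print(lhs)
print(rhs)   # agree up to numerical precision
```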
Rank stability. How does the ranking change when the link structure changes? The web graph is changing continuously; how does that affect PageRank?
Rank stability. Theorem: let v be the left eigenvector of P̃ = (1-c)P + cU. Let the pages i_1, i_2, …, i_k be changed in any way, and let v′ be the new PageRank. Then ||v − v′||_1 ≤ (2/c) Σ_{j=1}^k v(i_j). So if c is not too close to 0, the ranks are stable and convergence remains fast.
Acknowledgements and References. Based in part on slides by Purnamrita Sarkar, CMU. The anatomy of a large-scale hypertextual Web search engine, Brin & Page, 1998. The Second Eigenvalue of the Google Matrix, Haveliwala & Kamvar, Stanford University Technical Report, 2003. Topic-sensitive PageRank, Haveliwala, WWW 2002. Scaling Personalized Web Search, Jeh & Widom, WWW 2003. Random Walks on Graphs: A Survey, Laszlo Lovasz. Reversible Markov Chains and Random Walks on Graphs, Aldous & Fill.