Link Analysis
Leonid E. Zhukov
School of Data Analysis and Artificial Intelligence, Department of Computer Science, National Research University Higher School of Economics
Structural Analysis and Visualization of Networks
Leonid E. Zhukov (HSE), Lecture 6, 17.02.2015
Lecture outline
1. Graph-theoretic definitions
2. Web search engines
3. Web page ranking algorithms: PageRank, HITS
4. The Web as a graph
5. PageRank beyond the web
Graph theory
Graph $G(E, V)$, $|V| = n$, $|E| = m$
Adjacency matrix $A^{n \times n}$, $A_{ij}$, edge $i \to j$
Graph is directed, matrix is non-symmetric: $A^T \neq A$, $A_{ij} \neq A_{ji}$
Graph theory
Sinks: zero out-degree nodes, $k_{out}(i) = 0$, absorbing nodes
Sources: zero in-degree nodes, $k_{in}(i) = 0$
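The two definitions can be checked directly on an adjacency matrix. The sketch below assumes the convention that `A[i][j] = 1` means an edge from `i` to `j`. The function name `sinks_and_sources` and the 4-node example graph are illustrative, not part of the lecture.

```python
def sinks_and_sources(A):
    """Return (sinks, sources) for a 0/1 adjacency matrix A (A[i][j] = 1 means i -> j)."""
    n = len(A)
    out_deg = [sum(A[i]) for i in range(n)]                      # row sums: k_out(i)
    in_deg = [sum(A[j][i] for j in range(n)) for i in range(n)]  # column sums: k_in(i)
    sinks = [i for i in range(n) if out_deg[i] == 0]
    sources = [i for i in range(n) if in_deg[i] == 0]
    return sinks, sources

# Hypothetical example graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 3
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
sinks, sources = sinks_and_sources(A)
```

In this example node 3 has no outgoing edges (a sink) and node 0 has no incoming edges (a source).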
Graph theory
A graph is strongly connected if every vertex is reachable from every other vertex.
The strongly connected components partition the graph into maximal subgraphs that are strongly connected.
In a strongly connected graph there is a path in each direction between every pair of vertices.
(image from Wikipedia)
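One simple way to test strong connectivity is to check that a single start vertex reaches every vertex, and that every vertex reaches it (a search on the reversed graph). The sketch below assumes an adjacency matrix with `A[i][j] = 1` for an edge `i -> j`. The helper names `reachable` and `is_strongly_connected` are illustrative.

```python
from collections import deque

def reachable(adj, start):
    """Breadth-first search: set of vertices reachable from start in adjacency lists adj."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_strongly_connected(A):
    """True if every vertex is reachable from every other vertex."""
    n = len(A)
    if n == 0:
        return True
    forward = [[j for j in range(n) if A[i][j]] for i in range(n)]   # edges i -> j
    backward = [[j for j in range(n) if A[j][i]] for i in range(n)]  # reversed edges
    # Vertex 0 reaches everyone, and everyone reaches vertex 0.
    return len(reachable(forward, 0)) == n and len(reachable(backward, 0)) == n

cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # 0 -> 1 -> 2 -> 0
chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # 0 -> 1 -> 2
```

The directed cycle is strongly connected. The chain is not, because nothing leads back to vertex 0.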
Graph theory
A directed graph is aperiodic if the greatest common divisor of the lengths of its cycles is one (there is no integer $k > 1$ that divides the length of every cycle of the graph).
(image from Wikipedia)
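For a strongly connected graph, the gcd of all cycle lengths can be computed without enumerating cycles: take breadth-first levels from any vertex and accumulate the gcd of `level[u] + 1 - level[v]` over all edges `u -> v`. The sketch below assumes strong connectivity and the convention `A[i][j] = 1` for an edge `i -> j`. The function name `period` is illustrative.

```python
from collections import deque
from math import gcd

def period(A):
    """gcd of cycle lengths of a strongly connected directed graph (1 means aperiodic)."""
    n = len(A)
    level = [None] * n
    level[0] = 0
    queue = deque([0])
    while queue:                       # breadth-first levels from vertex 0
        u = queue.popleft()
        for v in range(n):
            if A[u][v] and level[v] is None:
                level[v] = level[u] + 1
                queue.append(v)
    g = 0
    for u in range(n):
        for v in range(n):
            if A[u][v]:
                # every edge closes cycles whose lengths are multiples of the period
                g = gcd(g, level[u] + 1 - level[v])
    return g

triangle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]        # single cycle of length 3
with_chord = [[0, 1, 1], [0, 0, 1], [1, 0, 0]]      # adds 0 -> 2, creating a 2-cycle
```

The triangle has period 3. Adding the chord creates cycles of lengths 2 and 3, so the graph becomes aperiodic.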
Web search engine
"The Anatomy of a Large-Scale Hypertextual Web Search Engine", Sergey Brin and Lawrence Page, 1998
Web as a graph
Hyperlinks: implicit endorsements
Web graph: graph of endorsements (sometimes reciprocal)
Ranking on directed graph
Iteratively update: $r_i \leftarrow \sum_{j \in N(i)} r_j = \sum_j A_{ji} r_j$, where $N(i)$ is the set of nodes linking to $i$
$r_i^{t+1} = \sum_j A_{ji} r_j^t$, with $r_j^{t=0} = r_j^0$
$r^{t+1} = A^T r^t$, $r^{t=0} = r^0$
norm: $\|r^{t+1} - r^t\|$
Ranking on directed graph
Problem cases: absorbing nodes, source nodes, cycles
$r^{t+1} = A^T r^t$, $r^{t=0} = r^0$
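The update $r^{t+1} = A^T r^t$ and two of the problem cases can be tried on tiny graphs. The sketch below assumes `A[i][j] = 1` for an edge `i -> j` and rescales the ranks to sum to one after each step. The function names and both example graphs are illustrative.

```python
def rank_step(A, r):
    """One update r_i <- sum_j A[j][i] * r[j], i.e. r <- A^T r."""
    n = len(A)
    return [sum(A[j][i] * r[j] for j in range(n)) for i in range(n)]

def iterate_rank(A, r0, steps):
    """Repeat the update, rescaling to unit sum; stop early if all rank has vanished."""
    r = list(r0)
    for _ in range(steps):
        r = rank_step(A, r)
        total = sum(r)
        if total == 0:
            break
        r = [x / total for x in r]
    return r

chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # 0 -> 1 -> 2: source 0, absorbing node 2
two_cycle = [[0, 1], [1, 0]]                # 0 <-> 1

drained = iterate_rank(chain, [1.0, 1.0, 1.0], 3)
flipped = rank_step(two_cycle, [1.0, 0.0])
```

On the chain, the source loses its rank after one step and the absorbing node passes nothing on, so after three steps every rank is zero. On the 2-cycle, the vector `[1, 0]` flips to `[0, 1]` and back, so the iteration never settles.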
PageRank
"PageRank can be thought of as a model of user behavior. We assume there is a 'random surfer' who is given a web page at random and keeps clicking on links, never hitting 'back' but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank."
$PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \dots + \frac{PR(T_n)}{C(T_n)} \right)$
Here $T_1, \dots, T_n$ are the pages linking to $A$, $C(T)$ is the number of links going out of page $T$, and $d$ is a damping factor.
Sergey Brin and Larry Page, 1998
Random walk
Random walk on a directed graph:
$p_i^{t+1} = \sum_{j \in N(i)} \frac{p_j^t}{d_j^{out}} = \sum_j \frac{A_{ji}}{d_j^{out}} p_j^t$
$D_{ii} = \mathrm{diag}\{d_i^{out}\}$
$p^{t+1} = (D^{-1} A)^T p^t$
$p^{t+1} = P^T p^t$
Markov chain with transition probability matrix $P = D^{-1} A$
$\lim_{t \to \infty} p^t = \pi$
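A minimal sketch of the walk, assuming `A[i][j] = 1` for an edge `i -> j` and that every node has at least one outgoing link (so $D^{-1}$ exists). The function names and the 3-node example graph are illustrative.

```python
def transition_matrix(A):
    """P = D^{-1} A: divide each row of A by the node's out-degree."""
    P = []
    for row in A:
        d = sum(row)
        if d == 0:
            raise ValueError("node with zero out-degree: D is not invertible")
        P.append([a / d for a in row])
    return P

def walk_step(P, p):
    """One step of the walk: p <- P^T p."""
    n = len(P)
    return [sum(P[j][i] * p[j] for j in range(n)) for i in range(n)]

# Hypothetical example: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
# (strongly connected, with cycles of lengths 2 and 3, hence aperiodic)
A = [[0, 1, 1], [0, 0, 1], [1, 0, 0]]
P = transition_matrix(A)

p = [1.0, 0.0, 0.0]          # walker starts at node 0
for _ in range(200):
    p = walk_step(P, p)
```

Solving $\pi P = \pi$ by hand for this graph gives $\pi = (0.4, 0.2, 0.4)$, and the iterated vector `p` approaches these values.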
Perron-Frobenius Theorem
Perron-Frobenius theorem (Fundamental Theorem of Markov Chains). If the matrix $P$ is
stochastic (non-negative and rows sum to one; describes a Markov chain),
irreducible (strongly connected graph),
aperiodic,
then the limit $\lim_{t \to \infty} p^t = \pi$ exists and can be found as a left eigenvector: $\pi P = \pi$, where $\|\pi\|_1 = 1$.
$\pi$ is the stationary distribution of the Markov chain, a row vector.
Oskar Perron, 1907; Georg Frobenius, 1912.
PageRank
Transition matrix: $P = D^{-1} A$
Stochastic matrix: $P' = P + \frac{s e^T}{n}$
PageRank matrix: $P'' = \alpha P' + (1 - \alpha) \frac{e e^T}{n}$
Eigenvalue problem (choose the solution with $\lambda = 1$): $P''^T p = \lambda p$
Notation: $e$ is the column vector of all ones, $s$ is the indicator (column) vector of absorbing nodes.
PageRank computations
Eigenvalue problem ($\lambda = 1$, $\|p\|_1 = p^T e = 1$):
$\left( \alpha P' + (1 - \alpha) \frac{e e^T}{n} \right)^T p = \lambda p$
$p = \alpha P'^T p + (1 - \alpha) \frac{e}{n}$
Power iterations: $p \leftarrow \alpha P'^T p + (1 - \alpha) \frac{e}{n}$
Sparse linear system: $(I - \alpha P'^T) p = (1 - \alpha) \frac{e}{n}$
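The power iteration can be sketched in a few lines. The code below assumes `A[i][j] = 1` for an edge `i -> j` and handles absorbing nodes by spreading their probability uniformly, which is the $s e^T / n$ correction. The default `alpha = 0.85`, the tolerance, and the function name `pagerank` are assumptions made for illustration, not values fixed by the lecture.

```python
def pagerank(A, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration p <- alpha * P'^T p + (1 - alpha) * e / n on a dense 0/1 matrix."""
    n = len(A)
    out_deg = [sum(row) for row in A]
    p = [1.0 / n] * n                       # uniform start, sums to one
    for _ in range(max_iter):
        # probability currently on absorbing nodes, redistributed uniformly
        dangling = sum(p[j] for j in range(n) if out_deg[j] == 0)
        new_p = []
        for i in range(n):
            # probability flowing into i along links: sum_j A[j][i] / d_out(j) * p[j]
            link_mass = sum(A[j][i] * p[j] / out_deg[j]
                            for j in range(n) if out_deg[j] > 0)
            new_p.append(alpha * (link_mass + dangling / n) + (1 - alpha) / n)
        change = sum(abs(x - y) for x, y in zip(new_p, p))
        p = new_p
        if change < tol:                    # stop when the 1-norm of the update is tiny
            break
    return p

cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # 0 -> 1 -> 2 -> 0
chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # 0 -> 1 -> 2, node 2 is absorbing
pr_cycle = pagerank(cycle)
pr_chain = pagerank(chain)
```

On the symmetric cycle every node receives the same score. On the chain, the scores increase along the direction of the links while still summing to one. For large graphs the same update would normally be written with a sparse matrix representation rather than dense nested lists.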
Graph structure of the web
Bow tie structure of the web
Andrei Broder et al., 1999
Hubs and Authorities (HITS)
Citation networks: reviews vs. original research (authoritative) papers
Authorities contain useful information, $a_i$
Hubs contain links to authorities, $h_i$
Mutual recursion:
Good authorities are referred to by good hubs: $a_i \leftarrow \sum_j A_{ji} h_j$
Good hubs point to good authorities: $h_i \leftarrow \sum_j A_{ij} a_j$
Jon Kleinberg, 1999
HITS
System of linear equations:
$a = \alpha A^T h$
$h = \beta A a$
Symmetric eigenvalue problem:
$(A^T A) a = \lambda a$
$(A A^T) h = \lambda h$
where the eigenvalue is $\lambda = (\alpha \beta)^{-1}$
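The mutual recursion can be run as an alternating iteration with rescaling, which amounts to power iteration on $A^T A$ and $A A^T$. The sketch below assumes `A[i][j] = 1` for an edge `i -> j` and normalizes both vectors to unit Euclidean length after each round. The function name `hits`, the iteration count, and the example graph are illustrative.

```python
from math import sqrt

def hits(A, iterations=100):
    """Alternate a <- A^T h and h <- A a, rescaling both to unit length each round."""
    n = len(A)
    a = [1.0] * n
    h = [1.0] * n
    for _ in range(iterations):
        a = [sum(A[j][i] * h[j] for j in range(n)) for i in range(n)]  # authorities
        h = [sum(A[i][j] * a[j] for j in range(n)) for i in range(n)]  # hubs
        norm_a = sqrt(sum(x * x for x in a)) or 1.0   # guard against an empty graph
        norm_h = sqrt(sum(x * x for x in h)) or 1.0
        a = [x / norm_a for x in a]
        h = [x / norm_h for x in h]
    return a, h

# Hypothetical example: node 0 links to 2 and 3, node 1 links to 2
A = [
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
authority, hub = hits(A)
```

Node 2 is cited by both hubs and ends up as the strongest authority. Node 0 points to both authorities and ends up as the strongest hub.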
HITS
Focused subgraph of WWW
Jon Kleinberg, 1999
PageRank beyond the Web
(from David Gleich)
References
The Anatomy of a Large-Scale Hypertextual Web Search Engine. Sergey Brin and Lawrence Page, 1998.
Authoritative Sources in a Hyperlinked Environment. Jon M. Kleinberg, Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1999.
Graph Structure in the Web. Andrei Broder et al., Proc. 9th International World Wide Web Conference, 2000.
A Survey of Eigenvector Methods for Web Information Retrieval. Amy N. Langville and Carl D. Meyer, 2004.
PageRank beyond the Web. David F. Gleich, arXiv:1407.5107, 2014.