Google PageRank. Francesco Ricci Faculty of Computer Science Free University of Bozen-Bolzano


Google PageRank Francesco Ricci Faculty of Computer Science Free University of Bozen-Bolzano fricci@unibz.it

Content
- Linear Algebra
- Matrices
- Eigenvalues and eigenvectors
- Markov chains
- Google PageRank

Literature
- C. D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008, Chapter 21
- Markov chains description on Wikipedia
- Amy N. Langville & Carl D. Meyer, Google's PageRank and Beyond: The Science of Search Engine Rankings, Princeton University Press, 2006

Google
- Google is the leading search and online advertising company, founded by Larry Page and Sergey Brin (Ph.D. students at Stanford University)
- googol, or 10^100, is the mathematical term Google was named after
- Google's success in search is largely based on its PageRank algorithm
- Gartner reckons that Google now makes use of more than 1 million servers, spitting out search results, images, videos, emails and ads
- Google reports that it spends some 200 to 250 million US dollars a year on IT equipment

Matrices
- A matrix is a rectangular array of numbers, e.g. A = [a11 a12 a13; a21 a22 a23] = [1 2 3; 4 5 6]
- a_ij is the element of matrix A in row i and column j
- A is said to be an n x m matrix if it has n rows and m columns
- A square matrix is an n x n matrix
- The transpose A^T of a matrix A is the matrix obtained by exchanging the rows and the columns: here A^T = [1 4; 2 5; 3 6]

Exercise
- What is the size of these matrices (shown on the slide)?
- Compute their transpose

Exercise (solution)
- The sizes are 2x3, 3x1 and 3x4
- Their transposes are therefore 3x2, 1x3 and 4x3: [1 20; 13 9; 5 6], (4 1 8), and [0 1 2; 1 0 1; 2 1 0; 3 2 1]

Matrices
- A square matrix is diagonal iff a_ij = 0 for all i ≠ j, e.g. A = [a11 0; 0 a22]
- The identity matrix 1 is the diagonal matrix with 1's along the diagonal, e.g. 1 = [1 0; 0 1]
- A symmetric matrix A satisfies the condition A = A^T

Exercise
- Is a diagonal matrix symmetric?
- Make an example of a symmetric matrix
- Make an example of a 2x3 symmetric matrix

Exercise (solution)
- Is a diagonal matrix symmetric?
  - YES: if the matrix is diagonal then a_ij = 0 for all i ≠ j, hence a_ij = a_ji for all i ≠ j
- Make an example of a symmetric matrix: [1 2; 2 3]
- Make an example of a 2x3 symmetric matrix
  - Impossible, a symmetric matrix is a square matrix

Vectors
- A vector v is a one-dimensional array of numbers (an n x 1 matrix, a column vector)
- Example: v = (3 5 7)^T
- The standard form of a vector is a column vector
- The transpose of a column vector, v^T = (3 5 7), is a row vector

Operations on matrices
- Addition: if A = (a_ij) and B = (b_ij), then C = (c_ij) = A + B with c_ij = a_ij + b_ij
- Scalar multiplication: if λ is a number, λA = (λ a_ij)
- Multiplication: if A and B are compatible, i.e., the number of columns of A is equal to the number of rows of B, then C = (c_ij) = AB with c_ij = Σ_k a_ik b_kj

Examples! " 1 2 3 4 5 6! $ & & % " 1 4 2 5 3 6 $ & &! & = " & % 1*1+ 2 * 2 + 3*3 1* 4 + 2 * 5+ 3* 6 4*1+ 5* 2 + 6 *3 4* 4 + 5* 5+ 6 * 6 $! & = % " 14 32 32 77 $ & %! " 1 1 0 1 $! & %" 1 1 0 1 $ & = %! " 1 0 0 1 $ & % It is symmetric. Is it a general fact? Is AA T always symmetric? p If AB=1, then B is said to be the inverse of A and is denoted with A -1 p If a matrix has an inverse is called invertible or non singular 13

Exercise
- Compute the following operations (matrices shown on the slide)

Exercise (solution)
- Compute the following operations (worked results shown on the slide)

Rank of a Matrix
- The row (column) rank of a matrix is the maximum number of rows (columns) that are linearly independent
- The vectors v_1, ..., v_n are linearly independent iff there is no linear combination a_1 v_1 + ... + a_n v_n (with coefficients a_i not all 0) that is equal to 0
- Example 1: (1 2 3), (1 4 6), and (0 2 3) are linearly dependent: show it
- Example 2: (1 2 3) and (1 4 6) are not linearly dependent: show it
- The kernel of a matrix A is the subspace of vectors v such that Av = 0

Exercise solution
- Example 1: 1*(1 2 3)^T - 1*(1 4 6)^T + 1*(0 2 3)^T = (0 0 0)^T
- Equivalently [1 1 0; 2 4 2; 3 6 3] (1 -1 1)^T = (0 0 0)^T, so (1 -1 1)^T is in the kernel of the matrix [1 1 0; 2 4 2; 3 6 3]
- Example 2: a*(1 2 3) + b*(1 4 6) = (0 0 0) implies a = -b and also a = -2b, absurd

Rank and Determinant
- Theorem. An n x n square matrix is nonsingular iff it has full rank (i.e. n)
- Theorem. A matrix has full column rank iff it does not have a null vector
- Theorem. An n x n matrix A is singular iff det(A) = 0
- det(A) = a_11 if n = 1; det(A) = Σ_{j=1..n} (-1)^(1+j) a_1j det(A_[1j]) if n > 1
- A_[ij] is the ij minor, i.e., the matrix obtained by deleting the i-th row and the j-th column from A

Exercise
- Compute the determinant of the following matrices: [1 1; 2 4] and [1 1 0; 2 4 2; 3 6 3]

Exercise (solution)
- det [1 1; 2 4] = 1*4 - 1*2 = 2
- det [1 1 0; 2 4 2; 3 6 3] = 1*(4*3 - 2*6) - 1*(2*3 - 2*3) + 0 = 0
- http://www.bluebit.gr/matrix-calculator/
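A one-line check of these determinants with NumPy (a convenience sketch; the online matrix calculator linked on the slide does the same job):

```python
import numpy as np

# Determinants of the two exercise matrices
A = np.array([[1, 1],
              [2, 4]])
B = np.array([[1, 1, 0],
              [2, 4, 2],
              [3, 6, 3]])

print(np.linalg.det(A))          # 2.0
print(np.linalg.det(B))          # ~0.0: B is singular, consistent with its rank being < 3
print(np.linalg.matrix_rank(B))  # 2
```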

Eigenvectors and Eigenvalues
- Definition. If M is a square matrix, v is a nonzero vector and λ is a number such that M v = λ v, then v is said to be a (right) eigenvector of M with eigenvalue λ
- If v is an eigenvector of M with eigenvalue λ, then so is any nonzero multiple of v
- Only the direction matters

Example
- The matrix M = [2 -3; 1 -2]. Is it singular?
- It has two (right) eigenvectors: v_1 = (1 1)^t and v_2 = (3 1)^t. Prove that.

Example (solution)
- The matrix M = [2 -3; 1 -2]. Is it singular? No: det(M) = -4 + 3 = -1 ≠ 0
- It has two eigenvectors: v_1 = (1 1)^t and v_2 = (3 1)^t
- M v_1 = (-1 -1)^t = -1 v_1, so the eigenvalue is -1
- M v_2 = (3 1)^t = 1 v_2, so the eigenvalue is 1

Transformation
- M = [2 -3; 1 -2]
- There is a lot of distortion in these directions: (1 0)^t, (1 1)^t, (0 1)^t

Transformation along eigenvectors
- There are two independent directions which are not twisted at all by the matrix M: (1 1) and (3 1)
- One of them, (1 1), is flipped
- We see less distortion if our box is oriented along the two special directions

Results
- Theorem: every square matrix has at least one eigenvector (allowing complex eigenvalues)
- The usual situation is that an n x n matrix has n linearly independent eigenvectors
- If there are n of them, they are a useful basis for R^n
- Unfortunately, it can happen that there are fewer than n of them

Finding Eigenvectors
- M v = λ v: v is an eigenvector and λ is an eigenvalue
- If λ = 0, then finding eigenvectors is the same as finding nonzero vectors in the null space of M; such vectors exist iff det(M) = 0, i.e., the matrix is singular
- If λ ≠ 0, then finding the eigenvectors is equivalent to finding the null space of the matrix M - λ1 (1 is the identity matrix)
- The matrix M - λ1 has a nonzero vector in its null space iff det(M - λ1) = 0
- det(M - λ1) = 0 is called the characteristic equation

Exercise
- M = [2 -3; 1 -2]. Find the eigenvalues and the eigenvectors of this matrix:
- 1) Find the solutions λ of the characteristic equation (the eigenvalues)
- 2) Find the eigenvectors corresponding to the found eigenvalues

Exercise Solution
- M = [2 -3; 1 -2]
- det(M - λ1) = 0: (2 - λ)(-2 - λ) + 3 = λ^2 - 1
- The solutions are +1 and -1

Exercise Solution
- M = [2 -3; 1 -2], det(M - λ1) = 0: (2 - λ)(-2 - λ) + 3 = λ^2 - 1
- The solutions are +1 and -1
- Now we have to solve the set of linear equations Mv = v (for the first eigenvalue)

Exercise Solution
- M = [2 -3; 1 -2], det(M - λ1) = 0: (2 - λ)(-2 - λ) + 3 = λ^2 - 1
- The solutions are +1 and -1
- Now we have to solve the set of linear equations Mv = v (for the first eigenvalue): 2x - 3y = x, x - 2y = y
- This has solution x = 3y, i.e., (3 1)^t and all vectors obtained by multiplying it by a scalar

Algorithm
- To find the eigenvalues and eigenvectors of M:
  - First find the eigenvalues by solving the characteristic equation. Call the solutions λ_1, ..., λ_n. (There is always at least one eigenvalue, and there are at most n of them.)
  - For each λ_k, the existence of a nonzero vector in the null space of M - λ_k 1 is guaranteed. Any such vector is an eigenvector.
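For the example matrix M = [2 -3; 1 -2] this algorithm can be checked numerically; the following is a minimal NumPy sketch (not part of the original slides):

```python
import numpy as np

M = np.array([[2, -3],
              [1, -2]])

# np.linalg.eig solves the characteristic equation det(M - lambda*1) = 0 numerically
eigenvalues, eigenvectors = np.linalg.eig(M)
print(eigenvalues)                # +1 and -1 (in some order)

# Each column v of `eigenvectors` satisfies M v = lambda v; rescaling so the
# second component is 1 recovers the directions (3 1)^t and (1 1)^t from the slides
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(lam, v / v[1])
```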

Graphs
- A directed graph G is a pair (V, E), where V is a finite set and E is a binary relation on V
  - V is the vertex set of G: it contains the vertices
  - E is the edge set of G: it contains the edges
- In an undirected graph G = (V, E) the edges consist of unordered pairs of vertices
- The in-degree of a vertex v (directed graph) is the number of edges entering v
- The out-degree of a vertex v (directed graph) is the number of edges leaving v

The Web as a Directed Graph
- (Diagram: Page A links to Page B via an anchored hyperlink)
- Assumption 1: A hyperlink between pages denotes author-perceived relevance (quality signal)
- Assumption 2: The anchor of the hyperlink describes the target page (textual context)

Ranking web pages
- To count inlinks: enter link:www.mydomain.com in the Google search form
- Web pages are not equally important
  - www.unibz.it vs. www.stanford.edu
  - Inlinks as votes: www.stanford.edu has 3200 inlinks, www.unibz.it has 352 inlinks (Feb 2013)
- Are all inlinks equal?
  - Recursive question!

Simple recursive formulation
- Each link's vote is proportional to the importance of its source page
- If page P with importance x has n outlinks, each link gets x/n votes
- (Figure: a page with importance 1000$ and three outlinks passes about 333$ along each link)

Simple flow model
- "The web in 1839": three pages Yahoo (y), Amazon (a) and Microsoft (m); y, a, and m are the importance of these pages
- Flow equations: y = y/2 + a/2; a = y/2 + m; m = a/2

Solving the flow equations
- 3 equations, 3 unknowns, no constant terms
  - No unique solution
  - If you multiply a solution by a constant λ you obtain another solution: try with (2 2 1)
- An additional constraint forces uniqueness
  - y + a + m = 1 (normalization)
  - y = 2/5, a = 2/5, m = 1/5
  - These are the scores of the pages under the assumptions of the flow model
- Gaussian elimination works for small examples, but we need a better method for large graphs

Matrix formulation
- Matrix M has one row and one column for each web page (square matrix)
- Suppose page i has n outlinks
  - If i links to j, then M_ij = 1/n
  - Else M_ij = 0
- M is a row-stochastic matrix: rows sum to 1
- Suppose r is a vector with one entry per web page
  - r_i is the importance score of page i
  - Call it the rank vector

Example
- For the Yahoo (y), Amazon (a), Microsoft (m) graph, with rows and columns ordered (y, a, m): M = [1/2 1/2 0; 1/2 0 1/2; 0 1 0]
- (y a m) = (y a m) M, which is exactly the flow equations y = y/2 + a/2, a = y/2 + m, m = a/2

Power Iteration Solution
- With the same matrix M, start from (1/3 1/3 1/3) and repeatedly multiply by M on the right:
  - (1/3 1/3 1/3) M = (1/3 1/2 1/6)
  - (1/3 1/2 1/6) M = (5/12 1/3 1/4)
  - (5/12 1/3 1/4) M = (3/8 11/24 1/6)
  - ... which converges to (2/5 2/5 1/5)
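A minimal sketch of this power iteration in NumPy, using the Yahoo/Amazon/Microsoft matrix from the slide (the 50-iteration count is an arbitrary choice for the sketch):

```python
import numpy as np

# Row-stochastic link matrix for the Yahoo (y), Amazon (a), Microsoft (m) example
M = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])

r = np.array([1/3, 1/3, 1/3])   # start from the uniform distribution
for _ in range(50):
    r = r @ M                    # power iteration: r(t+1) = r(t) M
print(r)                         # -> approximately [0.4, 0.4, 0.2] = (2/5 2/5 1/5)
```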

Example (figure shown on the slide)

States and probabilities
- (Diagram: two states, DRY and RAIN; P(DRY→DRY) = 0.85, P(DRY→RAIN) = 0.15, P(RAIN→RAIN) = 0.62, P(RAIN→DRY) = 0.38)

Composing transitions
- 0.44 = 0.38*0.15 + 0.62*0.62
- What kind of operation is this on the matrix?

Composing transitions
- With rows and columns ordered (dry, rain): A = [.85 .15; .38 .62] and A·A = A^2 = [.78 .22; .56 .44]
- The probabilities of the 12-hour transitions are given by squaring the matrix of the 6-hour transition probabilities
  - P(rain-in-12hours | rain-now) = P(rain-in-12hours | rain-in-6hours)*P(rain-in-6hours | rain-now) + P(rain-in-12hours | dry-in-6hours)*P(dry-in-6hours | rain-now) = .62*.62 + .15*.38 = .44
  - P(dry-in-12hours | rain-now) = P(dry-in-12hours | rain-in-6hours)*P(rain-in-6hours | rain-now) + P(dry-in-12hours | dry-in-6hours)*P(dry-in-6hours | rain-now) = .38*.62 + .85*.38 = .56

Behavior in the limit
- A = [.85 .15; .38 .62]
- A^2 = [.78 .22; .56 .44], A^3 = [.75 .25; .64 .36], A^4 = [.73 .27; .68 .32], A^5 = [.72 .28; .70 .30], A^6 = [.72 .28; .71 .29], A^7 = [.72 .28; .71 .29], A^8 = [.72 .28; .72 .28], A^9 = [.72 .28; .72 .28], ..., A^∞ = [.72 .28; .72 .28]

Behavior in the limit
- If a, b <= 1 and a + b = 1, i.e., (a b) is a generic state with probability a of being dry and b = 1 - a of being rain, then (a b) A^∞ = (a b) [.72 .28; .72 .28] = (.72 .28)
- In particular (.72 .28) A = (.72 .28), i.e., it is a (left) eigenvector with eigenvalue 1
- The eigenvector (.72 .28) represents the limit situation starting from a generic situation (a b): it is called the stationary distribution
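This limiting behaviour can be reproduced numerically; a short sketch, assuming the (dry, rain) ordering used above:

```python
import numpy as np

# 6-hour transition matrix, rows/columns ordered (dry, rain)
A = np.array([[0.85, 0.15],
              [0.38, 0.62]])

print(np.linalg.matrix_power(A, 2))   # ~[[0.78 0.22] [0.56 0.44]]
print(np.linalg.matrix_power(A, 30))  # both rows converge to ~[0.72 0.28]

# Any starting distribution (a, b) ends up at the stationary distribution
start = np.array([0.5, 0.5])
print(start @ np.linalg.matrix_power(A, 30))  # ~[0.72 0.28]
```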

Exercise
- Find one (left) eigenvector of the matrix [.85 .15; .38 .62]:
  - First solve the characteristic equation (to find the eigenvalues)
  - Then find the left eigenvector corresponding to the largest eigenvalue

Exercise Solution
- Characteristic equation: det([.85-λ .15; .38 .62-λ]) = (0.85 - λ)(0.62 - λ) - 0.15*0.38 = λ^2 - 1.47λ + 0.47 = 0
- λ = (1.47 ± sqrt(1.47^2 - 4*0.47)) / 2, with solutions λ = 1 and λ = 0.47
- Left eigenvector for λ = 1: (x y) [.85 .15; .38 .62] = (x y), i.e., 0.85x + 0.38y = x, together with x + y = 1
- 0.85x + 0.38(1-x) = x, hence -0.53x + 0.38 = 0, so x = 0.38/0.53 = 0.72 and y = 1 - 0.72 = 0.28

Markov Chain
- A Markov chain is a sequence X_1, X_2, X_3, ... of random variables (with Σ_v P(X = v) = 1 over all possible values v of X) with the following property
- Markov property: the conditional probability distribution of the next future state X_n+1, given the present and past states, is a function of the present state X_n alone
- If the state space is finite then the transition probabilities can be described with a matrix P_ij = P(X_n+1 = j | X_n = i), i, j = 1, ..., m
- In the weather example: [.85 .15; .38 .62] = [P(X_n+1=1|X_n=1) P(X_n+1=2|X_n=1); P(X_n+1=1|X_n=2) P(X_n+1=2|X_n=2)]

Example: Web
- X_t is the page visited by a user (random surfer) at time t
- At every time t the user can be on one of m pages (states)
- We assume that when a user is on page i at time t, the probability of being on page j at time t+1 depends only on the fact that the user is on page i, and not on the pages previously visited

Probabilities
- (Diagram with 5 pages P0, ..., P4, where P4 is the goal: P(P1|P0) = 1.0, P(P1|P1) = 0.1, P(P2|P1) = 0.4, P(P3|P1) = 0.3, P(P4|P1) = 0.15, P(P0|P1) = 0.05, P(P1|P2) = 1.0, P(P1|P3) = 0.5, P(P4|P3) = 0.5)
- In this example there are 5 states and the probability of jumping from a page/state to another is not constant (it is not 1/(number of outlinks of the node)) as we assumed before in the simple web graph
- This is not a Markov chain! (why?)

Examples
- P_ij = P(X_n+1 = j | X_n = i), i, j = 1, ..., m
- (1, 0, 0, ..., 0) P = (P_11, P_12, P_13, ..., P_1n)
  - If at time n the chain is in state 1, then at time n+1 it is in state j with probability P_1j, i.e., the first row of P gives the probabilities of being in the other states
- (0.5, 0.5, 0, ..., 0) P = (P_11 0.5 + P_21 0.5, ..., P_1n 0.5 + P_2n 0.5)
  - This is the linear combination of the first two rows

Stationary distribution
- A stationary distribution is an m-dimensional (sum 1) vector which satisfies the equation π^T P = π^T
- Here π is a (column) vector and π^T (row vector) is the transpose of π
- A stationary distribution always exists, but is not guaranteed to be unique (can you make an example of a Markov chain with more than one stationary distribution?)
- If there is only one stationary distribution then π^T = lim_{n→∞} x^T P^n
- Here x is a generic distribution over the m states (i.e., it is an m-dimensional vector whose entries are <= 1 and sum to 1)
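One way to obtain a stationary distribution numerically is as a left eigenvector of P with eigenvalue 1; a small sketch using the weather matrix from the earlier slides:

```python
import numpy as np

P = np.array([[0.85, 0.15],
              [0.38, 0.62]])

# Left eigenvectors of P are (right) eigenvectors of P transposed
eigenvalues, eigenvectors = np.linalg.eig(P.T)
# Pick the eigenvector whose eigenvalue is (closest to) 1
idx = np.argmin(np.abs(eigenvalues - 1.0))
pi = np.real(eigenvectors[:, idx])
pi = pi / pi.sum()               # normalize so the entries sum to 1

print(pi)                        # ~[0.72 0.28]
print(pi @ P)                    # unchanged by P: ~[0.72 0.28]
```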

Random Walk Interpretation
- Imagine a random web surfer
  - At any time t, the surfer is on some page P
  - At time t+1, the surfer follows an outlink from P uniformly at random
  - They end up on some page Q linked from P
  - The process repeats indefinitely
- Let p(t) be a vector whose i-th component is the probability that the surfer is at page i at time t
  - p(t) is a probability distribution over pages

The stationary distribution
- Where is the surfer at time t+1?
  - They follow a link uniformly at random: p(t+1) = p(t) M
- Suppose the random walk reaches a state such that p(t+1) = p(t) M = p(t)
  - Then p(t) is a stationary distribution for the random walk
- Our rank vector r = p(t) satisfies r = r M

Ergodic Markov chains
- A Markov chain is ergodic if:
  - Informally: there is a path from any state to any other, and the states are not partitioned into sets such that all state transitions occur cyclically from one set to another
  - Formally: for any start state, after a finite transient time T_0, the probability of being in any state at any fixed time T > T_0 is nonzero
- Not ergodic example (even/odd): the probability of being in a state at a fixed time, e.g., after 500 transitions, is always either 0 or 1 according to the initial state

Ergodic Markov chains
- For any ergodic Markov chain, there is a unique long-term visit rate for each state
  - The steady-state probability distribution
- Over a long time period, we visit each state in proportion to this rate
- It doesn't matter where we start
- Note: non-ergodic Markov chains may still have a steady state

Non-Ergodic Example
- P = [0 0.5 0.5; 0.2 0 0.8; 0 0 1] (states 1, 2, 3; state 3 links only to itself)
- It is easy to show that the steady state (left eigenvector) is π^T = (0 0 1), with π^T P = π^T, i.e., state 3
- The user will always reach state 3 and stay there (spider trap)
- This is a non-ergodic Markov chain (with a steady state)

Random teleports
- The Google solution for spider traps (not for dead ends)
- At each time step, the random surfer has two options:
  - With probability β, follow a link at random
  - With probability 1-β, jump to some page uniformly at random
  - Common values for β are in the range 0.8 to 0.9
- The surfer will teleport out of a spider trap within a few time steps

Matrix formulation
- Suppose there are N pages
  - Consider a page i, with set of outlinks O(i)
  - We have M_ij = 1/|O(i)| when i links to j, and M_ij = 0 otherwise
  - The random teleport is equivalent to:
    - adding a teleport link from i to every other page with probability (1-β)/N
    - reducing the probability of following each outlink from 1/|O(i)| to β/|O(i)|
- Equivalently: tax each page a fraction (1-β) of its score and redistribute it evenly

Example
- Simple example with 6 pages (graph shown on the slide); page 1 links to pages 2, 3, 4 and 5
- P(5|1) = P(4|1) = P(3|1) = P(2|1) = β/4 + (1-β)/6
- P(1|1) = P(6|1) = (1-β)/6
- P(*|1) = 4[β/4 + (1-β)/6] + 2(1-β)/6 = 1

Google PageRank
- Construct the NxN matrix A as follows: A_ij = β M_ij + (1-β)/N
- Verify that A is a stochastic matrix
- The PageRank vector r is the principal eigenvector of this matrix, satisfying r = r A
- The score of each page r_i satisfies: r_i = β Σ_{k ∈ I(i)} r_k / |O(k)| + (1-β)/N
  - I(i) is the set of nodes that have a link to page i
  - O(k) is the set of links exiting from k
- r is the stationary distribution of the random walk with teleports
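A minimal sketch of this construction, reusing the three-page Yahoo/Amazon/Microsoft graph from the earlier slides and β = 0.85 (the value used in the next example); this is illustrative, not a production implementation:

```python
import numpy as np

beta = 0.85
M = np.array([[0.5, 0.5, 0.0],    # y links to y and a
              [0.5, 0.0, 0.5],    # a links to y and m
              [0.0, 1.0, 0.0]])   # m links to a
N = M.shape[0]

# Google matrix: follow a link with probability beta, teleport anywhere with probability 1-beta
A = beta * M + (1 - beta) / N
print(A.sum(axis=1))              # every row sums to 1: A is stochastic

# PageRank vector: principal (left) eigenvector of A, computed here by power iteration
r = np.full(N, 1.0 / N)
for _ in range(100):
    r = r @ A
print(r)                          # PageRank scores of y, a, m (they sum to 1)
```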

Example
- With the 6-page graph (nodes 1, ..., 6) and β = 0.85:
  A = [0.03 0.24 0.24 0.24 0.24 0.03;
       0.03 0.03 0.45 0.03 0.03 0.45;
       0.03 0.03 0.03 0.03 0.88 0.03;
       0.03 0.88 0.03 0.03 0.03 0.03;
       0.03 0.03 0.03 0.88 0.03 0.03;
       0.03 0.03 0.03 0.88 0.03 0.03]
- P(4|1) = 0.24 = 0.85/4 + 0.15/6; P(6|1) = 0.03 = 0.15/6; P(4|6) = 0.88 = 0.85/1 + 0.15/6
- Every row of A^30 equals (0.03 0.23 0.13 0.24 0.14 0.24)
- Stationary distribution = (0.03 0.23 0.13 0.24 0.14 0.24)

Dead ends
- Pages with no outlinks are dead ends for the random surfer (dangling nodes)
  - Nowhere to go on the next step
- When there are dead ends the matrix is no longer stochastic (the sum of the row elements is not 1)
- This is true even if we add the teleport
  - because the probability of following a teleport link is only (1-β)/N and there are just N of these teleports, hence their total probability is only (1-β)

Dealing with dead ends
- 1) Teleport
  - Follow random teleport links with probability 1.0 from dead ends (i.e., for those pages set β = 0)
  - Adjust the matrix accordingly
- 2) Prune and propagate
  - Preprocess the graph to eliminate dead ends
  - Might require multiple passes (why?)
  - Compute PageRank on the reduced graph
  - Approximate values for dead ends by propagating values from the reduced graph

Computing PageRank
- The key step is a matrix-vector multiply: r_new = r_old A
- Easy if we have enough main memory to hold A, r_old and r_new
- Say N = 1 billion pages
  - We need 4 bytes (32 bits) for each entry (say)
  - 2 billion entries for the vectors r_new and r_old, approx. 8 GB
  - Matrix A has N^2 entries, i.e., 10^18: a large number!

Sparse matrix formulation
- Although A is a dense matrix, it is obtained from a sparse matrix M
  - About 10 links per node, approx. 10N entries
- We can restate the PageRank equation as r = β r M + [(1-β)/N]_N (see the Google PageRank slide above)
  - [(1-β)/N]_N is an N-vector with all entries equal to (1-β)/N
- So in each iteration we need to:
  - Compute r_new = β r_old M
  - Add the constant value (1-β)/N to each entry in r_new

Sparse matrix encoding
- Encode the sparse matrix using only its nonzero entries
  - Space roughly proportional to the number of links
  - Say 10N, or 4*10*1 billion = 40 GB
  - Still won't fit in memory, but will fit on disk

  source node | degree | destination nodes
  0           | 3      | 1, 5, 7
  1           | 5      | 17, 64, 113, 117, 245
  2           | 2      | 13, 23

Basic Algorithm
- Assume we have enough RAM to fit r_new, plus some working memory; store r_old and the matrix M on disk
- Basic algorithm:
  - Initialize: r_old = [1/N]_N
  - Iterate:
    - Update: perform a sequential scan of M and r_old and update r_new
    - Write out r_new to disk as r_old for the next iteration
    - Every few iterations, compute |r_new - r_old| and stop if it is below a threshold (this step needs to read both vectors into memory)

Update step
- Initialize all entries of r_new to (1-β)/N
- For each page p (out-degree n):
  - Read into memory: p, n, dest_1, ..., dest_n, r_old(p)
  - for j = 1 ... n: r_new(dest_j) += β * r_old(p) / n

  src | degree | destination
  0   | 3      | 1, 5, 6
  1   | 4      | 17, 64, 113, 117
  2   | 2      | 13, 23

- The old value of page 0 contributes to updating only the new values of pages 1, 5 and 6
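A compact in-memory sketch of this update step (the small link table below is a made-up toy graph in the slide's source/degree/destinations format, not the one pictured on the slide):

```python
# Sketch of PageRank iterations over a sparse, adjacency-list encoded graph.
beta = 0.85
N = 7                                     # pages 0..6
links = {0: [1, 5, 6],                    # src -> destination nodes (toy example)
         1: [0, 2],
         2: [0],
         3: [0, 4],
         4: [3],
         5: [0],
         6: [0]}

r_old = [1.0 / N] * N                     # initialize r_old = [1/N]_N
for _ in range(50):
    r_new = [(1 - beta) / N] * N          # initialize all entries to (1-beta)/N
    for p, dests in links.items():        # sequential scan of the link table and r_old
        n = len(dests)
        for dest in dests:                # the update: r_new(dest_j) += beta * r_old(p) / n
            r_new[dest] += beta * r_old[p] / n
    r_old = r_new                         # write r_new back as r_old for the next iteration

print(r_old)                              # PageRank scores (no dead ends here, so they sum to ~1)
```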