Analysis of Google s PageRank

Similar documents
Mathematical Properties & Analysis of Google s PageRank

Analysis and Computation of Google s PageRank

PAGERANK COMPUTATION, WITH SPECIAL ATTENTION TO DANGLING NODES

c 2007 Society for Industrial and Applied Mathematics

The Second Eigenvalue of the Google Matrix

CONVERGENCE ANALYSIS OF A PAGERANK UPDATING ALGORITHM BY LANGVILLE AND MEYER

Computing PageRank using Power Extrapolation

PageRank Problem, Survey And Future Research Directions

Krylov Subspace Methods to Calculate PageRank

Fast PageRank Computation Via a Sparse Linear System (Extended Abstract)

Three results on the PageRank vector: eigenstructure, sensitivity, and the derivative

Fast PageRank Computation via a Sparse Linear System

Calculating Web Page Authority Using the PageRank Algorithm

Updating PageRank. Amy Langville Carl Meyer

c 2005 Society for Industrial and Applied Mathematics

Updating Markov Chains Carl Meyer Amy Langville

for an m-state homogeneous irreducible Markov chain with transition probability matrix

UpdatingtheStationary VectorofaMarkovChain. Amy Langville Carl Meyer

A hybrid reordered Arnoldi method to accelerate PageRank computations

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Uncertainty and Randomization

Link Analysis. Leonid E. Zhukov

Combating Web Spam with TrustRank

Finding central nodes in large networks

1998: enter Link Analysis

REPORT DOCUMENTATION PAGE

Link Analysis Ranking

A Two-Stage Algorithm for Computing PageRank and Multistage Generalizations

Link Analysis. Stony Brook University CSE545, Fall 2016

arxiv: v1 [cs.ir] 24 Mar 2013

Extrapolation Methods for Accelerating PageRank Computations

PageRank. Ryan Tibshirani /36-662: Data Mining. January Optional reading: ESL 14.10

A Fast Two-Stage Algorithm for Computing PageRank

Local properties of PageRank and graph limits. Nelly Litvak University of Twente Eindhoven University of Technology, The Netherlands MIPT 2018

Slides based on those in:

Lecture: Local Spectral Methods (2 of 4) 19 Computing spectral ranking with the push procedure

On the eigenvalues of specially low-rank perturbed matrices

Introduction to Search Engine Technology Introduction to Link Structure Analysis. Ronny Lempel Yahoo Labs, Haifa

Complex Social System, Elections. Introduction to Network Analysis 1

Link Mining PageRank. From Stanford C246

CS246: Mining Massive Datasets Jure Leskovec, Stanford University.

A Singular Perturbation Approach for Choosing the PageRank Damping Factor

Graph Models The PageRank Algorithm

Introduction to Data Mining

Traps and Pitfalls of Topic-Biased PageRank

The Push Algorithm for Spectral Ranking

A Note on Google s PageRank

Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University.

Distribution of PageRank Mass Among Principle Components of the Web

PAGERANK PARAMETERS. Amy N. Langville. American Institute of Mathematics Workshop on Ranking Palo Alto, CA August 17th, 2010

eigenvalues, markov matrices, and the power method

A New Method to Find the Eigenvalues of Convex. Matrices with Application in Web Page Rating

Affine iterations on nonnegative vectors

ECEN 689 Special Topics in Data Science for Communications Networks

Computational Economics and Finance

Random Surfing on Multipartite Graphs

NewPR-Combining TFIDF with Pagerank

Data and Algorithms of the Web

Comparison of perturbation bounds for the stationary distribution of a Markov chain

Intelligent Data Analysis. PageRank. School of Computer Science University of Birmingham

How to optimize the personalization vector to combat link spamming

MultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS

Online Social Networks and Media. Link Analysis and Web Search

Multiple Relational Ranking in Tensor: Theory, Algorithms and Applications

LINK ANALYSIS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

SUPPLEMENTARY MATERIALS TO THE PAPER: ON THE LIMITING BEHAVIOR OF PARAMETER-DEPENDENT NETWORK CENTRALITY MEASURES

MATH36001 Perron Frobenius Theory 2015

Wiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte

PageRank: Functional Dependencies

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine

Node and Link Analysis

How works. or How linear algebra powers the search engine. M. Ram Murty, FRSC Queen s Research Chair Queen s University

Node Centrality and Ranking on Networks

Information Retrieval and Search. Web Linkage Mining. Miłosz Kadziński

The Kemeny Constant For Finite Homogeneous Ergodic Markov Chains

Models and Algorithms for Complex Networks. Link Analysis Ranking

Lab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018

Maximizing PageRank via outlinks

Pseudocode for calculating Eigenfactor TM Score and Article Influence TM Score using data from Thomson-Reuters Journal Citations Reports

Eigenvalue Problems Computation and Applications

Department of Applied Mathematics. University of Twente. Faculty of EEMCS. Memorandum No. 1712

Using a Layered Markov Model for Distributed Web Ranking Computation

Algebraic Representation of Networks

Computing approximate PageRank vectors by Basis Pursuit Denoising

Algebraic Multigrid Preconditioners for Computing Stationary Distributions of Markov Processes

0.1 Naive formulation of PageRank

Markov Chains and Web Ranking: a Multilevel Adaptive Aggregation Method

Google PageRank. Francesco Ricci Faculty of Computer Science Free University of Bozen-Bolzano

CS54701 Information Retrieval. Link Analysis. Luo Si. Department of Computer Science Purdue University. Borrowed Slides from Prof.

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

Link Analysis Information Retrieval and Data Mining. Prof. Matteo Matteucci

The Choice of a Damping Function for Propagating Importance in Link-Based Ranking

Robust PageRank: Stationary Distribution on a Growing Network Structure

Lecture: Local Spectral Methods (1 of 4)

CS47300: Web Information Search and Management

Monte Carlo methods in PageRank computation: When one iteration is sufficient

CS6220: DATA MINING TECHNIQUES

Transcription:

Analysis of Google s PageRank Ilse Ipsen North Carolina State University Joint work with Rebecca M. Wills AN05 p.1

PageRank An objective measure of the citation importance of a web page [Brin & Page 1998] Assigns a rank to every web page Influences the order in which Google displays search results Based on link structure of the web graph Topic independent AN05 p.2

Overview Google Matrix Stability of PageRank Eigenvalue Problem: Power Method Linear System: Jacobi Method Dangling Nodes AN05 p.3

Simple Web Model Construct matrix S Page i has d 1 outgoing links: If page i has link to page j then s ij = 1/d else s ij = 0 Page i has 0 outgoing links: s ij = 1/n (dangling node) s ij : probability that surfer moves from page i to page j AN05 p.4

Google Matrix Convex combination G = αs + (1 α)1v T Stochastic matrix S Damping factor 0 < α < 1, e.g. α =.85 Personalization vector v 0 v 1 = 1 TrustRank (to combat web spam) v i = 0 if page i is spam page [Gyöngyi, Garcia-Molina, Pedersen 2004] AN05 p.5

Properties of G G = αs + (1 α)1v T Stochastic, reducible Eigenvalues: 1 > αλ 2 (S) αλ 3 (S)... [Elden 2003] Unique left eigenvector: π T G = π T π 0 π 1 = 1 ith entry of π: PageRank of page i PageRank. = largest left eigenvector of G AN05 p.6

Stability of PageRank How sensitive is PageRank π to Round off errors Changes in damping factor α Changes in personalization vector v Addition/deletion of links AN05 p.7

Perturbation Theory For Markov chains Schweizer 1968, Meyer 1980 Haviv & van Heyden 1984 Funderlic & Meyer 1986 Golub & Meyer 1986 Seneta 1988, 1991 Ipsen & Meyer 1994 Kirkland, Neumann & Shader 1998 Cho & Meyer 2000, 2001 Kirkland 2003, 2004 AN05 p.8

Perturbation Theory For Google matrix Chien, Dwork, Kumar & Sivakumar 2001 Ng, Zheng & Jordan 2001 Bianchini, Gori & Scarselli 2003 Boldi, Santini & Vigna 2004 Langville & Meyer 2004 Golub & Greif 2004 Kirkland 2005 Chien, Dwork, Kumar, Simon & Sivakumar 2005 AN05 p.9

Changes in the Matrix S Exact: π T G = π T G = αs + (1 α)1v T Perturbed: π T G = π T G = α(s + E) + (1 α)1v T Error: π T π T = α π T E(I αs) 1 π π 1 α 1 α E AN05 p.10

Changes in Damping Factor α Exact: π T G = π T G = αs + (1 α)1v T Perturbed: π T G = π T G = (α+µ)s+(1 (α + µ)) 1v T Error: π π 1 2 1 α µ [Langville & Meyer 2004] AN05 p.11

Changes in Vector v Exact: π T G = π T G = αs + (1 α)1v T Perturbed: π T G = π T G = αs + (1 α)1(v + f) T Error: π π 1 f 1 AN05 p.12

Sensitivity of PageRank π π T G = π T G = αs + (1 α)1v T Changes in S: condition number α/(1 α) α: condition number 2/(1 α) v: condition number 1 α =.85: condition numbers 14 α =.99: condition numbers 200 PageRank insensitive to perturbations AN05 p.13

Adding an In-Link i π i > π i Adding an in-link increases PageRank (monotonicity) Removing an in-link decreases PageRank [Chien, Dwork, Kumar & Sivakumar 2001] [Chien, Dwork, Kumar, Simon & Sivakumar 2005] AN05 p.14

Adding an Out-Link 1 3 2 π 3 = 1 + α + α2 3(1 + α + α 2 /2) < π 3 = 1 + α + α2 3(1 + α) Adding an out-link may decrease PageRank AN05 p.15

Justification for TrustRank Adjust personalization vector to change PageRank Increase v for page i: v i := v i + φ Decrease v for page j: v j := v j φ PageRank of page i increases: π i > π i PageRank of page j decreases: π j < π j Total change in PageRank π π 1 2φ AN05 p.16

PageRank Computation Power method Page, Brin, Motwani & Winograd 1999 Acceleration of power method Kamvar, Haveliwala, Manning & Golub 2003 Haveliwala, Kamvar, Klein, Manning & Golub 2003 Brezinski & Redivo-Zaglia 2004 Brezinski, Redivo-Zaglia & Serra-Capizzano 2005 Aggregation/Disaggregation Langville & Meyer 2002, 2003, 2004 Ipsen & Kirkland 2004 AN05 p.17

PageRank Computation Methods that adapt to web graph Broder, Lempel, Maghoul & Pedersen 2004 Kamvar, Haveliwala & Golub 2004 Haveliwala, Kamvar, Manning & Golub 2003 Lee, Golub & Zenios 2003 Lu, Zhang, Xi, Chen, Liu, Lyu & Ma 2004 Krylov methods Golub & Greif 2004 AN05 p.18

Power Method Eigenvector problem: π T G = π T Power method: Pick x (0) > 0 x (0) 1 = 1 Repeat [x (k+1) ] T = [x (k) ] T G until x (k+1) x (k) τ [x (k+1) ] T [x (k) ] T = [x (k) ] T G [x (k) ] T residual AN05 p.19

Error in Power Method π T G = π T G = αs + (1 α)1v T Error in iteration k Forward: e k x (k) π e k 1, α k e 0 1, [Bianchini, Gori & Scarselli 2003] Residual: r T k = [x(k) ] T G [x (k) ] T r k 1, α k r 0 1, AN05 p.20

Termination Residual norm r k 2α k Stop when r k 10 8 For α =.85: k 119 n 2293 2947 3468 5757 281903 683446 k 75 76 83 79 69 65 Bound can be pessimistic AN05 p.21

PageRank from Linear System Eigenvector problem: π T (αs + (1 α)1v T ) }{{} G = π T π 0 π 1 = 1 Linear system: π T (I αs) = (1 α)v T I αs nonsingular M-matrix [Arasu, Novak, Tomkins & Tomlin 2002] [Bianchini, Gori & Scarselli 2003] AN05 p.22

Jacobi Method Assume no page has a link to itself π T (I αs) = (1 α)v T I αs = D O [x (k+1) ] T = [x (k) ] T OD 1 + (1 α)v T D 1 I αs is M-matrix Jacobi converges No dangling nodes: D = I O = αs Jacobi method = power method AN05 p.23

Dangling Nodes S = H + dw T is dense What to do about dangling nodes? Remove [Brin, Page, Motwani & Winograd 1998] No PageRank for dangling nodes Biased PageRank for other nodes Lump into single state [Lee, Golub & Zenios 2003] As above Remove dw T [Langville & Meyer 2004] [Arasu, Novak, Tomkins & Tomlin 2002] H is not stochastic What is being computed? AN05 p.24

Use v for Dangling Nodes π T (I αs) = (1 α)v T S = H + dw T Choose w = v Solve π T (I αh) = (1 α + απ T d) v T }{{} multiple of v T δ T (I αh) = multiple of v T Then δ is multiple of π [Gleich, Zhukov & Berkhin 2005] AN05 p.25

Iteration Counts for w = v n Power Jacobi 2293 75 74 2947 76 74 3468 83 82 5757 79 78 281903 69 69 683446 65 65 r k 10 8, α =.85 Jacobi: same # iterations as power method AN05 p.26

Extension to Arbitrary w π T (I αs) = (1 α)v T S = H + dw T Rank-one update: I αs = (I αh) αdw T 1. Solve δ T (I αh) = (1 α)v T 2. Solve ω T (I αh) = w T 3. Update π T = δ T + α δt d 1 α ω T d ωt This requires only two sparse solves AN05 p.27

Solve with I αh After similarity permutation: ( ) H1 H H = 2 0 0 ( ) ( ) x T 1 x T I αh 1 αh 2 2 0 I = ( b T 1 b T 2 ) 1. Sparse solve x T 1 (I αh 1) = b 1 2. Set x T 2 = α xt 1 H 2 + b T 2 AN05 p.28

PageRank via Linear System Rank one update: S = ( ) H1 H 2 0 0 + ( ) 0 1 w T General dangling node vector w 0, w 1 = 1 Traditional: w = 1 n 1, w = v Cost: Two sparse solves with I αh 1 via Jacobi Two matrix vector multiplications with H 2 Inner products and vector additions More dangling nodes cheaper AN05 p.29

Summary Google Matrix G = α S + (1 α) 1v T PageRank = left eigenvector of G PageRank insensitive to perturbations in G Adding in-links increases PageRank Adding out-links may decrease PageRank Justification for TrustRank AN05 p.30

Summary, ctd Power method Computes PageRank as eigenvector Forward, backward errors 2 α k Jacobi Method Computes PageRank from linear system No dangling nodes: Jacobi = power method Rank one update for dangling nodes More dangling nodes cheaper AN05 p.31