Analysis and Computation of Google s PageRank

Similar documents
Analysis of Google s PageRank

Mathematical Properties & Analysis of Google s PageRank

c 2007 Society for Industrial and Applied Mathematics

PAGERANK COMPUTATION, WITH SPECIAL ATTENTION TO DANGLING NODES

CONVERGENCE ANALYSIS OF A PAGERANK UPDATING ALGORITHM BY LANGVILLE AND MEYER

The Second Eigenvalue of the Google Matrix

Computing PageRank using Power Extrapolation

Fast PageRank Computation Via a Sparse Linear System (Extended Abstract)

c 2005 Society for Industrial and Applied Mathematics

Fast PageRank Computation via a Sparse Linear System

PageRank Problem, Survey And Future Research Directions

Krylov Subspace Methods to Calculate PageRank

Link Mining PageRank. From Stanford C246

Graph Models The PageRank Algorithm

Calculating Web Page Authority Using the PageRank Algorithm

Combating Web Spam with TrustRank

Link Analysis. Stony Brook University CSE545, Fall 2016

Updating PageRank. Amy Langville Carl Meyer

Wiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte

Three results on the PageRank vector: eigenstructure, sensitivity, and the derivative

UpdatingtheStationary VectorofaMarkovChain. Amy Langville Carl Meyer

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Local properties of PageRank and graph limits. Nelly Litvak University of Twente Eindhoven University of Technology, The Netherlands MIPT 2018

Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University.

Updating Markov Chains Carl Meyer Amy Langville

Link Analysis. Leonid E. Zhukov

REPORT DOCUMENTATION PAGE

Uncertainty and Randomization

Slides based on those in:

A Two-Stage Algorithm for Computing PageRank and Multistage Generalizations

Online Social Networks and Media. Link Analysis and Web Search

A Note on Google s PageRank

Link Analysis Ranking

Finding central nodes in large networks

arxiv: v1 [cs.ir] 24 Mar 2013

The Push Algorithm for Spectral Ranking

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

1998: enter Link Analysis

PageRank. Ryan Tibshirani /36-662: Data Mining. January Optional reading: ESL 14.10

Intelligent Data Analysis. PageRank. School of Computer Science University of Birmingham

How to optimize the personalization vector to combat link spamming

A hybrid reordered Arnoldi method to accelerate PageRank computations

Extrapolation Methods for Accelerating PageRank Computations

A Fast Two-Stage Algorithm for Computing PageRank

Information Retrieval and Search. Web Linkage Mining. Miłosz Kadziński

Algebraic Representation of Networks

CS246: Mining Massive Datasets Jure Leskovec, Stanford University.

Manipulability of PageRank under Sybil Strategies

A New Method to Find the Eigenvalues of Convex. Matrices with Application in Web Page Rating

PAGERANK PARAMETERS. Amy N. Langville. American Institute of Mathematics Workshop on Ranking Palo Alto, CA August 17th, 2010

Models and Algorithms for Complex Networks. Link Analysis Ranking

PageRank: Functional Dependencies

Node Centrality and Ranking on Networks

ECEN 689 Special Topics in Data Science for Communications Networks

MultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors

Introduction to Data Mining

Data and Algorithms of the Web

Introduction to Search Engine Technology Introduction to Link Structure Analysis. Ronny Lempel Yahoo Labs, Haifa

Using a Layered Markov Model for Distributed Web Ranking Computation

Complex Social System, Elections. Introduction to Network Analysis 1

eigenvalues, markov matrices, and the power method

Maximizing PageRank via outlinks

for an m-state homogeneous irreducible Markov chain with transition probability matrix

Affine iterations on nonnegative vectors

Online Social Networks and Media. Link Analysis and Web Search

Google Page Rank Project Linear Algebra Summer 2012

Department of Applied Mathematics. University of Twente. Faculty of EEMCS. Memorandum No. 1712

Traps and Pitfalls of Topic-Biased PageRank

How works. or How linear algebra powers the search engine. M. Ram Murty, FRSC Queen s Research Chair Queen s University

Eigenvalue Problems Computation and Applications

A Singular Perturbation Approach for Choosing the PageRank Damping Factor

0.1 Naive formulation of PageRank

Inf 2B: Ranking Queries on the WWW

Computational Economics and Finance

Distribution of PageRank Mass Among Principle Components of the Web

The Google Markov Chain: convergence speed and eigenvalues

Lecture 7 Mathematics behind Internet Search

Data Mining Recitation Notes Week 3

CS6220: DATA MINING TECHNIQUES

NewPR-Combining TFIDF with Pagerank

Manipulation-resistant Reputations Using Hitting Time

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Node and Link Analysis

Multiple Relational Ranking in Tensor: Theory, Algorithms and Applications

LINK ANALYSIS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

Exploring and Understanding Scientific Metrics in Citation Networks

Lecture 12: Link Analysis for Web Retrieval

Lab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018

Google PageRank. Francesco Ricci Faculty of Computer Science Free University of Bozen-Bolzano

Numerical Methods I: Eigenvalues and eigenvectors

Non-backtracking PageRank. Arrigo, Francesca and Higham, Desmond J. and Noferini, Vanni. MIMS EPrint:

Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE

1 Searching the World Wide Web

Robust PageRank: Stationary Distribution on a Growing Network Structure

MATH36001 Perron Frobenius Theory 2015

Randomization and Gossiping in Techno-Social Networks

Chapter 7 Network Flow Problems, I

Lecture: Local Spectral Methods (2 of 4) 19 Computing spectral ranking with the push procedure

Transcription:

Analysis and Computation of Google s PageRank Ilse Ipsen North Carolina State University, USA Joint work with Rebecca S. Wills ANAW p.1

PageRank An objective measure of the citation importance of a web page [Brin & Page 1998]... continues to provide the basis for all of our web search tools http://www.google.com/technology/ Assigns a rank to every web page Influences the order in which Google displays search results Based on link structure of the web graph ANAW p.2

Overview PageRank Computation of PageRank Termination Criterion Ranking ANAW p.3

Simple Web Model Construct matrix S Page i has d 1 outgoing links: If page i has link to page j then s ij = 1/d else s ij = 0 Page i has 0 outgoing links: s ij = 1/n (dangling node) s ij : probability that surfer moves from page i to page j ANAW p.4

Google Matrix S is stochastic: 0 s ij 1 S1 = 1 Google Matrix: convex combination G = αs + (1 α)1v T Damping factor 0 < α < 1, e.g. α =.85 Personalization vector v 0 v 1 = 1 TrustRank: v i = 0 if page i is spam page [Gyöngyi, Garcia-Molina & Pedersen 2004] ANAW p.5

PageRank G = αs + (1 α)1v T Unique left eigenvector: π T G = π T π 0 π 1 = 1 ith entry of π: PageRank of page i PageRank. = largest left eigenvector of G Properties of PageRank and extensions: [Serra-Capizzano 2004, Horn & Serra-Capizzano 2006] ANAW p.6

PageRank Computation Power method Page, Brin, Motwani & Winograd 1999 Acceleration of power method Kamvar, Haveliwala, Manning & Golub 2003 Haveliwala, Kamvar, Klein, Manning & Golub 2003 Brezinski & Redivo-Zaglia 2004 Brezinski, Redivo-Zaglia & Serra-Capizzano 2005 Aggregation/Disaggregation Langville & Meyer 2002, 2003, 2004 Ipsen & Kirkland 2004 ANAW p.7

PageRank Computation Methods that adapt to web graph Broder, Lempel, Maghoul & Pedersen 2004 Kamvar, Haveliwala & Golub 2004 Haveliwala, Kamvar, Manning & Golub 2003 Lee, Golub & Zenios 2003 Lu, Zhang, Xi, Chen, Liu, Lyu & Ma 2004 Krylov methods Golub & Greif 2004 ANAW p.8

PageRank Computation Linear System Solution Arasu, Novak & Tomkins 2003 Bianchini, Gori & Scarselli 2003 Gleich, Zukov & Berkhin 2004 Del Corso, Gullì & Romani 2004 Survey Berkhin 2005 ANAW p.9

Power Method Want: π such that π T G = π T Power method: [x (k) ] T = [x (0) ] T G k, k 1 Converges to unique vector Vectorizes Storage for only a single vector Matrix vector multiplication with very sparse matrix Accurate (no subtractions) Simple (few decisions) ANAW p.10

Error in Power Method π T G = π T G = αs + (1 α)1v T Forward error in iteration k: x (k) π 2 α k [Bianchini, Gori & Scarselli 2003] Residual in iteration k: [x (k) ] T G [x (k) ] T 2 α k Norms: 1, ANAW p.11

Termination Criterion Residual: [x (k) ] T G [x (k) ] T = [x (k+1) ] T [x (k) ] T Power method: Pick x (0) 0 x (0) 1 = 1 Repeat [x (k+1) ] T = [x (k) ] T G until x (k+1) x (k) small enough After k iterations: x (k+1) x (k) 2 α k ANAW p.12

Iteration Counts for Different α G = αs + (1 α)1v T α n = 281903 n = 683446 bound.85 69 65 118.90 107 102 166.95 219 220 415.99 1114 1208 2075 bound: k such that 2 α k 10 8 Iteration counts independent of matrix size n ANAW p.13

Ranking After k iterations of power method: Residual: [x (k) ] T G [x (k) ] T 2 α k Error: x (k) π 2 α k But: Do the components of x (k) have the same ranking as those of π? Rank-stability, rank-similarity: [Lempel & Moran, 2005] [Borodin, Roberts, Rosenthal & Tsaparas 2005] ANAW p.14

Absolute Errors versus Ranking π T = (.23.24.26.27 ) [x (k) ] T = (.27.26.24.23 ) x (k) π =.04 Small error, but incorrect ranking [x (k) ] T = ( 0.001.002.997 ) x (k) π =.727 Large error, but correct ranking ANAW p.15

Web Graph is a Ring S = 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 ANAW p.16

All Pages are Trusted S is circulant of order n, v = 1 n 1 PageRank: π = 1 n 1 All pages have same PageRank Power method x (0) = v: x (0) = π correct ranking x (0) v: [x (k) ] T = 1 n 1T + α k ( [x (0) ] T S k 1 n 1) Ranking does not converge (in exact arithmetic) ANAW p.17

Only One Page is Trusted v T = ( 1 0 0 0 0 ) ANAW p.18

Only One Page is Trusted π T ( 1 α α 2 α 3 α 4) PageRank decreases with distance from page 1 ANAW p.19

Only One Page is Trusted S is circulant of order n, v = e 1 PageRank: π T ( 1 α... α n 1) Power method ( with x (0) = v: ) [x (k) ] T 1 α... α k 1 α k 1 α 0... 0 ) [x (n) ] T (1 + αn 1 α α α2... α n 1 Rank convergence in n iterations ANAW p.20

Too Many Iterations Power method with x (0) = v = e 1 : After n iterations: [x (n) ] T ( 1 + αn 1 α α α2... α n 1 ) After n + 1 iterations: [x (n+1) ] T If α =.85, n = 10: ( 1 + α n α + αn+1 1 α α 2... α n 1 ) Additional iterations can destroy a converged ranking α + αn+1 1 α > 1 + αn ANAW p.21

Recovery of Ranking S is circulant of order n After k iterations: k 1 [x (k) ] T = α k [x (0) ] T S k + (1 α)v T After k + n iterations: j=0 [x (k+n) ] T = α n [x (k) ] T + (1 α n )π T α j S j If x (k) has correct ranking, so does x (k+n) ANAW p.22

Any Personalization Vector S is circulant of order n PageRank: π T v T n 1 j=0 αj S j Power method with x (0) = 1 n 1 [x (n) ] T = (1 α n ) π }{{} T + scalar αn n 1T }{{} constant vector For any v: rank convergence after n iterations ANAW p.23

Summary Google matrix G = αs + (1 α) 1v T PageRank π: left eigenvector of G Computation of π via power method Convergence of residual/error norms: Depends only on α Convergence of rank: May never happen Depends on: α, v, starting vector, matrix size, graph of S A converged ranking can be destroyed ANAW p.24