PAGERANK PARAMETERS. Amy N. Langville. American Institute of Mathematics Workshop on Ranking Palo Alto, CA August 17th, 2010

Similar documents
Finding central nodes in large networks

Analysis of Google s PageRank

Link Analysis. Leonid E. Zhukov

Mathematical Properties & Analysis of Google s PageRank

Link Analysis Information Retrieval and Data Mining. Prof. Matteo Matteucci

Local properties of PageRank and graph limits. Nelly Litvak University of Twente Eindhoven University of Technology, The Netherlands MIPT 2018

Analysis and Computation of Google s PageRank

Math 304 Handout: Linear algebra, graphs, and networks.

SUPPLEMENTARY MATERIALS TO THE PAPER: ON THE LIMITING BEHAVIOR OF PARAMETER-DEPENDENT NETWORK CENTRALITY MEASURES

Slides based on those in:

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Three results on the PageRank vector: eigenstructure, sensitivity, and the derivative

STA141C: Big Data & High Performance Statistical Computing

Lecture: Local Spectral Methods (1 of 4)

Lecture 7 Mathematics behind Internet Search

As it is not necessarily possible to satisfy this equation, we just ask for a solution to the more general equation

Link Analysis Ranking

Personalized PageRank with Node-dependent Restart

PageRank. Ryan Tibshirani /36-662: Data Mining. January Optional reading: ESL 14.10

1998: enter Link Analysis

Strong Localization in Personalized PageRank Vectors

Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University.

Complex Social System, Elections. Introduction to Network Analysis 1

Google PageRank. Francesco Ricci Faculty of Computer Science Free University of Bozen-Bolzano

ECEN 689 Special Topics in Data Science for Communications Networks

Google Matrix, dynamical attractors and Ulam networks Dima Shepelyansky (CNRS, Toulouse)

How does Google rank webpages?

Node Centrality and Ranking on Networks

Introduction to Search Engine Technology Introduction to Link Structure Analysis. Ronny Lempel Yahoo Labs, Haifa

Facebook Friends! and Matrix Functions

MultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors

Graph Models The PageRank Algorithm

Monte Carlo methods in PageRank computation: When one iteration is sufficient

Applications. Nonnegative Matrices: Ranking

Online Social Networks and Media. Link Analysis and Web Search

CS6220: DATA MINING TECHNIQUES

Calculating Web Page Authority Using the PageRank Algorithm

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine

Updating PageRank. Amy Langville Carl Meyer

eigenvalues, markov matrices, and the power method

CS 188: Artificial Intelligence

Introduction to Data Mining

Combating Web Spam with TrustRank

Data science with multilayer networks: Mathematical foundations and applications

Model Reduction for Edge-Weighted Personalized PageRank

PageRank: Functional Dependencies

Machine Learning CPSC 340. Tutorial 12

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search

The Theory behind PageRank

Link Analysis. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze

A Note on Google s PageRank

Random Surfing on Multipartite Graphs

UpdatingtheStationary VectorofaMarkovChain. Amy Langville Carl Meyer

Approximate Inference

CS6220: DATA MINING TECHNIQUES

Online Social Networks and Media. Link Analysis and Web Search

Fast PageRank Computation Via a Sparse Linear System (Extended Abstract)

IR: Information Retrieval

The Choice of a Damping Function for Propagating Importance in Link-Based Ranking

How works. or How linear algebra powers the search engine. M. Ram Murty, FRSC Queen s Research Chair Queen s University

Degree Distribution: The case of Citation Networks

Multiple Relational Ranking in Tensor: Theory, Algorithms and Applications

PageRank algorithm Hubs and Authorities. Data mining. Web Data Mining PageRank, Hubs and Authorities. University of Szeged.

CS246: Mining Massive Datasets Jure Leskovec, Stanford University.

Uncertainty and Randomization

Traps and Pitfalls of Topic-Biased PageRank

Data and Algorithms of the Web

CS 188: Artificial Intelligence Fall Recap: Inference Example

Using Rank Propagation and Probabilistic Counting for Link-based Spam Detection

AXIOMS FOR CENTRALITY

Web Ranking. Classification (manual, automatic) Link Analysis (today s lesson)

Krylov Subspace Methods to Calculate PageRank

CS47300: Web Information Search and Management

CS 188: Artificial Intelligence Spring 2009

Chapter 10. Finite-State Markov Chains. Introductory Example: Googling Markov Chains

0.1 Naive formulation of PageRank

Data Mining Techniques

Outline for today. Information Retrieval. Cosine similarity between query and document. tf-idf weighting

Link Analysis. Stony Brook University CSE545, Fall 2016

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

arxiv: v1 [math.pr] 8 May 2018

Markov Chains and Hidden Markov Models

Data Mining and Matrices

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

c 2007 Society for Industrial and Applied Mathematics

Kristina Lerman USC Information Sciences Institute

Improving Diversity in Ranking using Absorbing Random Walks

Page rank computation HPC course project a.y

Edge-Weighted Personalized PageRank: Breaking a Decade-Old Performance Barrier

How to optimize the personalization vector to combat link spamming

CS 5522: Artificial Intelligence II

Lab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018

Applications to network analysis: Eigenvector centrality indices Lecture notes

Information Retrieval and Search. Web Linkage Mining. Miłosz Kadziński

Spectral Graph Theory and You: Matrix Tree Theorem and Centrality Metrics

DiffusionRank: A Possible Penicillin for Web Spamming

Pr[positive test virus] Pr[virus] Pr[positive test] = Pr[positive test] = Pr[positive test]

Hidden Markov Models. Hal Daumé III. Computer Science University of Maryland CS 421: Introduction to Artificial Intelligence 19 Apr 2012

Transcription:

PAGERANK PARAMETERS 100David F. Gleich 120 Amy N. Langville American Institute of Mathematics Workshop on Ranking Palo Alto, CA August 17th, 2010 Gleich & Langville AIM 1 / 21

The most important page on the web 100 120 Gleich & Langville Recap AIM 2 / 21

The most important page on the web 100 120 Gleich & Langville Recap AIM 2 / 21

PageRank details 2 1 3 100 120 5 1/6 1/2 0 0 0 0 1/6 0 0 1/3 0 0 4 1/6 1/2 0 1/3 0 0 P j 0 1/6 0 1/2 0 0 0 e T P=e T 1/6 0 1/2 1/3 0 1 1/6 6 } 0 0 {{ 0 1 0 } P Markov chain Linear system Ignored jump v = [ 1 n... 1 n ] T 0 αp + (1 α)ve T x = x unique x j 0, e T x = 1. ( αp)x = (1 α)v dangling nodes patched back to v algorithms later e T v=1 Gleich & Langville Recap AIM 3 / 21

NM_003748 Contig32125_RC NM_003862 AB037863 U82987 Contig55377_RC NM_020974 NM_003882 Contig48328_RC NM_000849 Contig46223_RC NM_006117 NM_003239 NM_0181 AF257175 NM_001282 AF201951 Contig63102_RC Contig34634_RC NM_000286 NM_000320 AB033007 NM_000017 AL355708 NM_006763 Contig57595 AF148505 NM_0012 AJ224741 Contig49670_RC U45975 Contig25055_RC Contig753_RC Contig53646_RC Contig42421_RC Contig51749_RC NM_004911 AL137514 NM_000224 Contig41887_RC NM_013262 NM_004163 NM_015416 AB020689 Contig43747_RC NM_012429 AB033043 NM_016569 AL133619 NM_0044 Contig37063_RC NM_004798 NM_000507 Contig502_RC AB037745 Contig53742_RC NM_001007 Contig51963 NM_018104 Contig53268_RC NM_012261 Contig55813_RC NM_020244 Contig27312_RC Contig464_RC NM_002570 NM_002900 NM_015417 AL050090 Contig475_RC Contig55829_RC NM_016337 Contig45347_RC Contig37598 NM_020675 NM_003234 AL0110 Contig17359_RC AL137295 NM_013296 NM_019013 Contig55313_RC AF052159 NM_002358 Contig50106_RC NM_004358 NM_005342 NM_014754 Contig64688 U533 Contig3902_RC NM_001827 Contig41413_RC NM_015434 NM_0178 NM_018120 NM_001124 Contig45816_RC L275 NM_006115 AL050021 NM_001333 Contig51519_RC NM_005496 Contig1778_RC NM_014363 NM_001905 NM_018454 NM_002811 NM_0043 NM_0096 AB032973 Contig462_RC D25328 NM_0104 X94232 Contig55188_RC Contig8581_RC Contig53226_RC Contig50410 NM_012214 NM_006201 Contig134_RC NM_006372 Contig128_RC AL137502 NM_003676 Contig2504_RC NM_013437 NM_012177 AL1333 R70506_RC NM_003662 NM_018136 NM_000158 Contig21812_RC NM_018410 NM_0052 Contig864_RC Contig4595 NM_003878 NM_005563 U96131 Contig44799_RC NM_018455 NM_003258 NM_004456 NM_003158 Contig25343_RC NM_014750 Contig57864_RC NM_005196 NM_014109 Contig58368_RC NM_0028 Contig46653_RC NM_004504 NM_014875 M21551 NM_001168 NM_003376 NM_0198 NM_020166 AF161553 NM_017779 NM_018265 NM_004701 AF155117 Contig44289_RC NM_006281 Contig33814_RC NM_004336 NM_0030 NM_006265 NM_000291 NM_000096 NM_001673 NM_001216 NM_014968 NM_018354 NM_007036 Contig2399_RC NM_004702 Contig20217_RC NM_0019 NM_003981 NM_007203 NM_006681 NM_014889 AF055033 NM_020386 Contig56457_RC NM_000599 Contig24252_RC NM_005915 Contig55725_RC NM_002916 NM_014321 NM_006931 Contig51464_RC AL0079 NM_000788 NM_016448 NM_014791 X05610 Contig831_RC NM_015984 AK000745 Contig32185_RC NM_016577 AF052162 NM_0037 AF073519 NM_006101 Contig25991 NM_003875 Contig35251_RC NM_004994 NM_000436 NM_002073 NM_002019 NM_000127 NM_020188 Contig28552_RC AL137718 Contig38288_RC AA555029_RC Contig46218_RC NM_016359 Contig63649_RC AL0059 10 20 30 50 70 Other uses for PageRank What else people use PageRank to do GeneRank EventRank 100 120 Morrison et al. GeneRank, 2005 Note Use ( αgd 1 )x = w to find nearby important genes. New paper LabRank with a random scientist? ProteinRank ObjectRank IsoRank Clustering Sports ranking Food webs Centrality Reverse PageRank FutureRank SocialPageRank BookRank ArticleRank ItemRank SimRank DiffusionRank TrustRank TweetRank Gleich & Langville Recap AIM 4 / 21

Ulam Networks Chirikov map Ulam network y t+1 = ηy t +k sin( t +θ t ) 1. divide phase space into100 uniform cells 120 t+1 = t + y t+1 2. form P based on trajectories. Note log(e [x(a)]) A Bet (2, 16) log(std [x(a)]))/ log(e [x(a)]) White is larger, black is smaller Google matrix, dynamical attractors, and Ulam networks, Shepelyansky and Zhirov, arxiv Gleich & Langville Recap AIM 5 / 21

100 120 Choosing alpha Choosing alpha Slide 6 of 21 Choosing personalization Related methods Open issues

What is alpha? There s no single answer. Ask yourself, why am I computing PageRank? Then use the best value for your application. 100 120 tune α for the best feature web-search vector node centrality find important nodes in a web-graph understand what random jumps mean in your graph use the random surfer interpretation Author α Brin and Page (1998) Najork et al. (2007) 0.85 0.85 Litvak et al. (2006) 0.5 Pan el al. (2004) 0.15 Algorithms (...) 0.85 Experiment??? Gleich & Langville Choosing alpha AIM 7 / 21

The PageRank limit value Singular? ( αp)x = (1 α)v 100 120 0 P = X X 0 J 1 1 0 αx X 1 x = (1 α)v X α α 0 J 1 0 0 J 1 0 0 J 1 X 1 x = (1 α)v y = (1 α)z (1 α)y 1 = (1 α)z 1 ( αj 2 )y 2 = (1 α)z 2 Boldi et al. 2003: PageRank as a function of the damping parameter Gleich & Langville Choosing alpha AIM 8 / 21

TotalRank 100 120 1 t = x(α) dα 0 Proposed by Boldi et al. (2005) as a parameter free PageRank. Gleich & Langville Choosing alpha AIM 9 / 21

Generalized PageRank 100 120 PageRank ( αp)x = (1 α)v x = =0 (1 α)(α )P v Generalized PageRank y = =0 ƒ ( )P v ƒ ( ) < TotalRank ƒ ( ) = 1 +1 1 +2 LinearRank... HyperRank... Baeza-Yates et al. 2006 Gleich & Langville Choosing alpha AIM 10 / 21

Pick a distribution Multiple surfers should have an impact! Each person picks α from distribution 100 A 120 x(e [A]) x(e [A]) = E [x(a)] TotalRank : E [x(a)] : A U[0, 1] Constantine & Gleich, Internet Mathematics, in press.... E [x(a)] Gleich & Langville Choosing alpha AIM 11 / 21

From users 100 120 2.0 1.5 1.0 density 0.5 0.0 0.0 0.2 0.4 0.6 0.8 1.0 Raw α Sample mean μ = 0.631. Gleich et al., WWW2010 Note 257,664 users from Microsoft toolbar data Gleich & Langville Choosing alpha AIM 12 / 21

100 120 Choosing alpha Choosing personalization Slide 13 of 21 Choosing personalization Related methods Open issues

Personalization choices 100 120 Application specific GeneRank : v = normalized microarray weights TopicRank: v = pages on the same topic TrustRank: v = only pages known to be good BadRank: v = only pages known to be bad (an reverse the graph) Super-personalized Set v to have only a single non-zero : v = e. Gleich & Langville Choosing personalization AIM 14 / 21

Personalized PageRank 100 120 B = (1 α)( αp) 1 B j = personalized score of page when jumping to page Gleich & Langville Choosing personalization AIM 15 / 21

100 120 Choosing alpha Related methods Slide 16 of 21 Choosing personalization Related methods Open issues

PageRank history 100 120 See Vigna 2010: Spectral Ranking and Franceschet 2010: PageRank: Standing on the shoulder of giants. Let A be the adjacency matrix of a graph. PageRank ( αp)x = (1 α)v (αp + (1 α)ve T )x = x Seeley 1949 Wei 1952 Katz 1953 Hubbell 1965 ( αa)x = e Px = x A T x = x A T x = x + v Gleich & Langville Related methods AIM 17 / 21

Graph centrality 100 120 For a graph G, a score assigned to each vertex V is a centrality score if larger scores are more central vertices and the score is independent of the labeling on the vertices. Gleich & Langville Related methods AIM 18 / 21

100 120 Choosing alpha Open issues Slide 19 of 21 Choosing personalization Related methods Open issues

100 120 Vigna, A history of spectral ranking, MMDS2010 Fro

Other issues 100 120 Gleich & Langville Open issues AIM 21 / 21

100 120 QUESTIONS?