PAGERANK PARAMETERS 100David F. Gleich 120 Amy N. Langville American Institute of Mathematics Workshop on Ranking Palo Alto, CA August 17th, 2010 Gleich & Langville AIM 1 / 21
The most important page on the web 100 120 Gleich & Langville Recap AIM 2 / 21
The most important page on the web 100 120 Gleich & Langville Recap AIM 2 / 21
PageRank details 2 1 3 100 120 5 1/6 1/2 0 0 0 0 1/6 0 0 1/3 0 0 4 1/6 1/2 0 1/3 0 0 P j 0 1/6 0 1/2 0 0 0 e T P=e T 1/6 0 1/2 1/3 0 1 1/6 6 } 0 0 {{ 0 1 0 } P Markov chain Linear system Ignored jump v = [ 1 n... 1 n ] T 0 αp + (1 α)ve T x = x unique x j 0, e T x = 1. ( αp)x = (1 α)v dangling nodes patched back to v algorithms later e T v=1 Gleich & Langville Recap AIM 3 / 21
NM_003748 Contig32125_RC NM_003862 AB037863 U82987 Contig55377_RC NM_020974 NM_003882 Contig48328_RC NM_000849 Contig46223_RC NM_006117 NM_003239 NM_0181 AF257175 NM_001282 AF201951 Contig63102_RC Contig34634_RC NM_000286 NM_000320 AB033007 NM_000017 AL355708 NM_006763 Contig57595 AF148505 NM_0012 AJ224741 Contig49670_RC U45975 Contig25055_RC Contig753_RC Contig53646_RC Contig42421_RC Contig51749_RC NM_004911 AL137514 NM_000224 Contig41887_RC NM_013262 NM_004163 NM_015416 AB020689 Contig43747_RC NM_012429 AB033043 NM_016569 AL133619 NM_0044 Contig37063_RC NM_004798 NM_000507 Contig502_RC AB037745 Contig53742_RC NM_001007 Contig51963 NM_018104 Contig53268_RC NM_012261 Contig55813_RC NM_020244 Contig27312_RC Contig464_RC NM_002570 NM_002900 NM_015417 AL050090 Contig475_RC Contig55829_RC NM_016337 Contig45347_RC Contig37598 NM_020675 NM_003234 AL0110 Contig17359_RC AL137295 NM_013296 NM_019013 Contig55313_RC AF052159 NM_002358 Contig50106_RC NM_004358 NM_005342 NM_014754 Contig64688 U533 Contig3902_RC NM_001827 Contig41413_RC NM_015434 NM_0178 NM_018120 NM_001124 Contig45816_RC L275 NM_006115 AL050021 NM_001333 Contig51519_RC NM_005496 Contig1778_RC NM_014363 NM_001905 NM_018454 NM_002811 NM_0043 NM_0096 AB032973 Contig462_RC D25328 NM_0104 X94232 Contig55188_RC Contig8581_RC Contig53226_RC Contig50410 NM_012214 NM_006201 Contig134_RC NM_006372 Contig128_RC AL137502 NM_003676 Contig2504_RC NM_013437 NM_012177 AL1333 R70506_RC NM_003662 NM_018136 NM_000158 Contig21812_RC NM_018410 NM_0052 Contig864_RC Contig4595 NM_003878 NM_005563 U96131 Contig44799_RC NM_018455 NM_003258 NM_004456 NM_003158 Contig25343_RC NM_014750 Contig57864_RC NM_005196 NM_014109 Contig58368_RC NM_0028 Contig46653_RC NM_004504 NM_014875 M21551 NM_001168 NM_003376 NM_0198 NM_020166 AF161553 NM_017779 NM_018265 NM_004701 AF155117 Contig44289_RC NM_006281 Contig33814_RC NM_004336 NM_0030 NM_006265 NM_000291 NM_000096 NM_001673 NM_001216 NM_014968 NM_018354 NM_007036 Contig2399_RC NM_004702 Contig20217_RC NM_0019 NM_003981 NM_007203 NM_006681 NM_014889 AF055033 NM_020386 Contig56457_RC NM_000599 Contig24252_RC NM_005915 Contig55725_RC NM_002916 NM_014321 NM_006931 Contig51464_RC AL0079 NM_000788 NM_016448 NM_014791 X05610 Contig831_RC NM_015984 AK000745 Contig32185_RC NM_016577 AF052162 NM_0037 AF073519 NM_006101 Contig25991 NM_003875 Contig35251_RC NM_004994 NM_000436 NM_002073 NM_002019 NM_000127 NM_020188 Contig28552_RC AL137718 Contig38288_RC AA555029_RC Contig46218_RC NM_016359 Contig63649_RC AL0059 10 20 30 50 70 Other uses for PageRank What else people use PageRank to do GeneRank EventRank 100 120 Morrison et al. GeneRank, 2005 Note Use ( αgd 1 )x = w to find nearby important genes. New paper LabRank with a random scientist? ProteinRank ObjectRank IsoRank Clustering Sports ranking Food webs Centrality Reverse PageRank FutureRank SocialPageRank BookRank ArticleRank ItemRank SimRank DiffusionRank TrustRank TweetRank Gleich & Langville Recap AIM 4 / 21
Ulam Networks Chirikov map Ulam network y t+1 = ηy t +k sin( t +θ t ) 1. divide phase space into100 uniform cells 120 t+1 = t + y t+1 2. form P based on trajectories. Note log(e [x(a)]) A Bet (2, 16) log(std [x(a)]))/ log(e [x(a)]) White is larger, black is smaller Google matrix, dynamical attractors, and Ulam networks, Shepelyansky and Zhirov, arxiv Gleich & Langville Recap AIM 5 / 21
100 120 Choosing alpha Choosing alpha Slide 6 of 21 Choosing personalization Related methods Open issues
What is alpha? There s no single answer. Ask yourself, why am I computing PageRank? Then use the best value for your application. 100 120 tune α for the best feature web-search vector node centrality find important nodes in a web-graph understand what random jumps mean in your graph use the random surfer interpretation Author α Brin and Page (1998) Najork et al. (2007) 0.85 0.85 Litvak et al. (2006) 0.5 Pan el al. (2004) 0.15 Algorithms (...) 0.85 Experiment??? Gleich & Langville Choosing alpha AIM 7 / 21
The PageRank limit value Singular? ( αp)x = (1 α)v 100 120 0 P = X X 0 J 1 1 0 αx X 1 x = (1 α)v X α α 0 J 1 0 0 J 1 0 0 J 1 X 1 x = (1 α)v y = (1 α)z (1 α)y 1 = (1 α)z 1 ( αj 2 )y 2 = (1 α)z 2 Boldi et al. 2003: PageRank as a function of the damping parameter Gleich & Langville Choosing alpha AIM 8 / 21
TotalRank 100 120 1 t = x(α) dα 0 Proposed by Boldi et al. (2005) as a parameter free PageRank. Gleich & Langville Choosing alpha AIM 9 / 21
Generalized PageRank 100 120 PageRank ( αp)x = (1 α)v x = =0 (1 α)(α )P v Generalized PageRank y = =0 ƒ ( )P v ƒ ( ) < TotalRank ƒ ( ) = 1 +1 1 +2 LinearRank... HyperRank... Baeza-Yates et al. 2006 Gleich & Langville Choosing alpha AIM 10 / 21
Pick a distribution Multiple surfers should have an impact! Each person picks α from distribution 100 A 120 x(e [A]) x(e [A]) = E [x(a)] TotalRank : E [x(a)] : A U[0, 1] Constantine & Gleich, Internet Mathematics, in press.... E [x(a)] Gleich & Langville Choosing alpha AIM 11 / 21
From users 100 120 2.0 1.5 1.0 density 0.5 0.0 0.0 0.2 0.4 0.6 0.8 1.0 Raw α Sample mean μ = 0.631. Gleich et al., WWW2010 Note 257,664 users from Microsoft toolbar data Gleich & Langville Choosing alpha AIM 12 / 21
100 120 Choosing alpha Choosing personalization Slide 13 of 21 Choosing personalization Related methods Open issues
Personalization choices 100 120 Application specific GeneRank : v = normalized microarray weights TopicRank: v = pages on the same topic TrustRank: v = only pages known to be good BadRank: v = only pages known to be bad (an reverse the graph) Super-personalized Set v to have only a single non-zero : v = e. Gleich & Langville Choosing personalization AIM 14 / 21
Personalized PageRank 100 120 B = (1 α)( αp) 1 B j = personalized score of page when jumping to page Gleich & Langville Choosing personalization AIM 15 / 21
100 120 Choosing alpha Related methods Slide 16 of 21 Choosing personalization Related methods Open issues
PageRank history 100 120 See Vigna 2010: Spectral Ranking and Franceschet 2010: PageRank: Standing on the shoulder of giants. Let A be the adjacency matrix of a graph. PageRank ( αp)x = (1 α)v (αp + (1 α)ve T )x = x Seeley 1949 Wei 1952 Katz 1953 Hubbell 1965 ( αa)x = e Px = x A T x = x A T x = x + v Gleich & Langville Related methods AIM 17 / 21
Graph centrality 100 120 For a graph G, a score assigned to each vertex V is a centrality score if larger scores are more central vertices and the score is independent of the labeling on the vertices. Gleich & Langville Related methods AIM 18 / 21
100 120 Choosing alpha Open issues Slide 19 of 21 Choosing personalization Related methods Open issues
100 120 Vigna, A history of spectral ranking, MMDS2010 Fro
Other issues 100 120 Gleich & Langville Open issues AIM 21 / 21
100 120 QUESTIONS?