Lecture 7 Mathematics behind Internet Search


CCST907 Hidden Order in Daily Life: A Mathematical Perspective. Dr. S. P. Yung (907A), Dr. Z. Hua (907B), Department of Mathematics, HKU

Outline: Google is the preferred search engine; Google's PageRank; ranking web pages by importance scores; iterative steps for finding importance scores; Limiting Scores Theorem; Perron-Frobenius Theorem.

Google and other search engines. You probably use Google every day to search the internet. There are many search engines other than Google, but Google gets a large chunk of the search market: 64.6% of searches in the US in August 2009 (market share by number of searches; source: Nielsen MegaView Search). The other engines in that survey include Yahoo! (powered by Bing), Microsoft/Bing (its homepage is now similar to Google's), AOL, Ask (powered by HITS) and My Web. Why do people choose Google? Try searching for: Lloyd Shapley.

Google's PageRank. The PageRank algorithm is one major ingredient of the Google search engine: it rates the importance of each webpage on the internet and puts important/relevant pages at the front in response to search requests. Consequences: more people use Google; webmasters want to boost the PageRank of their webpages; webpages with high PageRank are traded.

How it begins. Background: text-based web pages (*.html). Hypertext is text with links to other text. Jon Kleinberg (Cornell University) developed Hyperlink-Induced Topic Search (HITS) in 1998, a new idea of using the hyperlink structure of the web to improve search engine results. Before this, most search engines used textual content to return relevant documents, and the results were not very satisfactory. E.g., 907 pages contain "Lloyd Shapley", and the 900th is about the Nobel prize winner. We need to put them in some order of relevance.

How it begins. Nearby, two PhD students at Stanford, Sergey Brin and Larry Page, were using similar (but not identical) ideas. Page got the idea of ranking the importance of webpages from citations in the scientific literature: e.g., the published work of Shapley may be cited by thousands of different papers. Replacing citations by links, the same can be said about the importance of websites. Thus the sites with the most links pointing to them should be considered more important.

How it begins. On the other hand, Page also realized that not all links are created equal: if a site is pointed to by an important site, this should also raise the importance of that site, much as a reference letter from a public figure increases one's importance. Page began calling his link-rating scheme PageRank, and Brin and their PhD advisors worked with him to develop an algorithm. With their search algorithm, Brin and Page started a business in their dorm rooms in 1998. It later became the giant Google, which benefited from the elegant mathematics behind PageRank. The name Google came from a misspelling of googol, which means $10^{100}$.

Ranking pages. Definition (Importance score): the importance score (or just score) is a quantitative rating of a web page's importance; we use a nonnegative number to represent it. Definition (Backlinks): the inward links to a given page are the backlinks of that page. Basic idea of PageRank for ranking pages: a page is important if it is pointed to by many other pages, or by other important pages.

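Web as a directed graph. Suppose the web of interest has $n$ pages, each indexed by a number $1, 2, \ldots, n$. Represent each forward link by an arrow. Example of a web with 4 pages: [Figure: a 4-page web in which page 1 links to pages 2, 3 and 4; page 2 links to pages 3 and 4; page 3 links to page 1; and page 4 links to pages 1 and 3.] Page 4 has 2 backlinks. Denote by $x_i$ the importance score of page $i$. How can we determine $x_1, x_2, x_3, x_4$ for our 4-page web?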

Formulating importance scores. Three rules govern the importance scores. R0: Initially, without taking into account the hyperlink structure of the web, all pages are equally important, with the same score 1. For our 4-page web, $x_1 = x_2 = x_3 = x_4 = 1$.

Formulating importance scores. Three rules of the importance scores (cont.): after taking into account the hyperlink structure of the web, page scores should be updated as follows. R1: A page has a higher score if it is pointed to by more pages. R2: A page has a higher score if it is pointed to by other important (i.e. higher-score) pages. Note that R2 means the score $x_i$ of page $i$ is also the influence power of page $i$ on other pages, and R1 means the score $x_i$ reflects the total influence from all other pages pointing to page $i$.

Formulating importance scores. To update page scores according to R1, we count the number of backlinks to each page (R2 is ignored for now and will be dealt with later). Assuming each backlink carries the same weight of power 1, one may update a page's score to $x$ if the page has $x$ backlinks. Consider page 1 in our 4-page web and define
$$b_{1,i} = \begin{cases} 1, & \text{if page 1 has a backlink from page } i,\\ 0, & \text{otherwise.} \end{cases}$$
Then
$$x_1 = \text{total backlinks from pages 2, 3 and 4} = b_{1,2} + b_{1,3} + b_{1,4} = (0) + (1) + (1) = 2,$$
which is the sum of (power of page $i$, i.e. $x_i = 1$) $\times\, b_{1,i}$ for $i = 2, 3, 4$.

Formulating importance scores. Update the other page scores according to R1:
$$x_2 = b_{2,1} + b_{2,3} + b_{2,4} = (1) + (0) + (0) = 1,$$
$$x_3 = b_{3,1} + b_{3,2} + b_{3,4} = (1) + (1) + (1) = 3,$$
$$x_4 = b_{4,1} + b_{4,2} + b_{4,3} = (1) + (1) + (0) = 2.$$
Note that $x_1, x_2, x_3, x_4$ are all updated simultaneously. However, the way each term (power of page $i$, $x_i$) $\times\, b_{j,i}$ is weighted still needs to be improved.
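The unweighted R1 update is just a count of backlinks. A minimal Python sketch, assuming the 4-page web is encoded as a hypothetical dictionary `out_links` that maps each page to the pages it links to (read off the figure and the link matrix of this lecture):

```python
# Hypothetical encoding of the lecture's 4-page web:
# out_links[i] lists the pages that page i links to.
out_links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}


def backlink_counts(out_links):
    """R1 with unit weights: the new score of page j is the number of
    pages i with a link i -> j, i.e. the sum of b_{j,i} over i."""
    counts = {page: 0 for page in out_links}
    for source, targets in out_links.items():
        for target in targets:
            counts[target] += 1  # b_{target,source} = 1
    return counts


print(backlink_counts(out_links))  # {1: 2, 2: 1, 3: 3, 4: 2}
```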

Formulating importance scores. After updating our 4-page web according to R1: $x_1 = 2$, $x_2 = 1$, $x_3 = 3$, $x_4 = 2$. The way we assigned values to $b_{1,i}$ has a defect: the arrows should not all carry identical power. Page 4 sends out two arrows (to pages 1 and 3), while page 3 sends out only one arrow (to page 1), so each arrow from page 4 should count only 1/2, and similarly for all the other pages. We now abolish the old weights and redo the calculations using new weights.

Formulating importance scores. Denote by $N_i$ the number of outward links from page $i$. All $N_1$ outward links from page 1 should carry the same weight, say $W_1$, given by
$$W_1 = \frac{1}{\text{no. of outward links from page 1}} = \frac{1}{N_1} = \frac{1}{3}.$$
For the other pages we likewise set $W_i = 1/N_i$. Therefore $W_2 = \tfrac12$, $W_3 = 1$, $W_4 = \tfrac12$, and we use these weights in the updating.
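The weights $W_i = 1/N_i$ depend only on how many links leave each page. A small sketch, again assuming the hypothetical `out_links` encoding and using Python's `fractions.Fraction` so the weights stay exact:

```python
from fractions import Fraction

# Hypothetical encoding: out_links[i] lists the pages that page i links to.
out_links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}

# W_i = 1 / N_i, where N_i is the number of outward links from page i.
weights = {page: Fraction(1, len(targets)) for page, targets in out_links.items()}

print({page: str(w) for page, w in weights.items()})
# {1: '1/3', 2: '1/2', 3: '1', 4: '1/2'}
```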

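Formulating importance scores. Recall that the modified backlink weights for our 4-page web are $W_1 = \tfrac13$, $W_2 = \tfrac12$, $W_3 = 1$, $W_4 = \tfrac12$. Considering page 1 again, $b_{1,i}$ should now be set as
$$b_{1,i} = \begin{cases} W_i, & \text{if page 1 has a backlink from page } i,\\ 0, & \text{otherwise.} \end{cases}$$
Then
$$x_1 = b_{1,2} + b_{1,3} + b_{1,4} = (0) + (W_3) + (W_4) = (0) + (1) + \left(\tfrac12\right) = \tfrac32,$$
which is just the sum of $x_i\, b_{1,i}$ for $i = 2, 3, 4$ (with every $x_i = 1$).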

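Formulating importance scores. Update the other page scores accordingly:
$$x_2 = b_{2,1} + b_{2,3} + b_{2,4} = (W_1) + (0) + (0) = \tfrac13 + (0) + (0) = \tfrac13,$$
$$x_3 = b_{3,1} + b_{3,2} + b_{3,4} = (W_1) + (W_2) + (W_4) = \tfrac13 + \tfrac12 + \tfrac12 = \tfrac43,$$
$$x_4 = b_{4,1} + b_{4,2} + b_{4,3} = (W_1) + (W_2) + (0) = \tfrac13 + \tfrac12 + (0) = \tfrac56.$$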

Formulating importance scores. After updating our 4-page web according to R1: $x_1 = \tfrac32$, $x_2 = \tfrac13$, $x_3 = \tfrac43$, $x_4 = \tfrac56$. Note that each newly computed score differs from the old score that was used in the updating. Since each page now has a different score (i.e. power), R2 demands further updating using the newly computed page scores.
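The weighted update can be phrased as "each page $i$ passes $x_i W_i$ along every outward link". A sketch under the same assumptions (hypothetical `out_links` encoding, exact fractions), reproducing the scores just computed:

```python
from fractions import Fraction

out_links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}  # hypothetical encoding


def weighted_update(x, out_links):
    """One update: page i passes x_i * W_i along each of its N_i outward links,
    and each page's new score is the total it receives (x_j = sum_i x_i b_{j,i})."""
    new = {page: Fraction(0) for page in out_links}
    for i, targets in out_links.items():
        w_i = Fraction(1, len(targets))  # W_i = 1 / N_i
        for j in targets:
            new[j] += x[i] * w_i
    return new


x0 = {page: Fraction(1) for page in out_links}  # R0: every page starts with score 1
x1 = weighted_update(x0, out_links)
print({page: str(s) for page, s in x1.items()})
# {1: '3/2', 2: '1/3', 3: '4/3', 4: '5/6'}
```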

Formulating importance scores. For our 4-page web, the initial scores are $x_1 = x_2 = x_3 = x_4 = 1$. After updating according to R1: $x_1 = \tfrac32$, $x_2 = \tfrac13$, $x_3 = \tfrac43$, $x_4 = \tfrac56$. R2 demands a further update of the scores: e.g., the backlinks of page 1 come from pages 3 and 4, whose scores were initially 1 and 1 but are now $\tfrac43$ and $\tfrac56$; the score $x_1$ should be updated again to reflect these changes in the importance of each page.

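Formulating importance scores. Recall that the backlink weights for our 4-page web are $W_1 = \tfrac13$, $W_2 = \tfrac12$, $W_3 = 1$, $W_4 = \tfrac12$, with $b_{1,i} = W_i$ if page 1 has a backlink from page $i$ and $b_{1,i} = 0$ otherwise, so that previously $x_1 = b_{1,2} + b_{1,3} + b_{1,4}$. Considering page 1 again, its score should now be updated as
$$x_1 = x_2 b_{1,2} + x_3 b_{1,3} + x_4 b_{1,4} = x_2(0) + x_3(W_3) + x_4(W_4) = \tfrac13(0) + \tfrac43(1) + \tfrac56\left(\tfrac12\right) = \tfrac74.$$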

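Formulating importance scores. Update the other page scores according to R2:
$$x_2 = x_1 b_{2,1} + x_3 b_{2,3} + x_4 b_{2,4} = x_1(W_1) + x_3(0) + x_4(0) = \tfrac32\left(\tfrac13\right) + (0) + (0) = \tfrac12,$$
$$x_3 = x_1 b_{3,1} + x_2 b_{3,2} + x_4 b_{3,4} = x_1(W_1) + x_2(W_2) + x_4(W_4) = \tfrac32\left(\tfrac13\right) + \tfrac13\left(\tfrac12\right) + \tfrac56\left(\tfrac12\right) = \tfrac{13}{12},$$
$$x_4 = x_1 b_{4,1} + x_2 b_{4,2} + x_3 b_{4,3} = x_1(W_1) + x_2(W_2) + x_3(0) = \tfrac32\left(\tfrac13\right) + \tfrac13\left(\tfrac12\right) + (0) = \tfrac23.$$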

Formulating importance scores. After further updating our 4-page web as required by R2: $x_1 = \tfrac74$, $x_2 = \tfrac12$, $x_3 = \tfrac{13}{12}$, $x_4 = \tfrac23$. Each newly computed score again differs from the old one! Is this an endless process? The answer is no if $x_1, x_2, x_3, x_4$ remain unchanged after a certain number of updates. Will this happen? The explanation below uses matrix-vector notation.
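Repeating the same weighted update reproduces the two rounds above, and one more round shows the scores are still moving. A sketch under the same assumptions (hypothetical `out_links` encoding, exact fractions):

```python
from fractions import Fraction

out_links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}  # hypothetical encoding


def weighted_update(x, out_links):
    """x_j(new) = sum over pages i linking to j of x_i * W_i, with W_i = 1/N_i."""
    new = {page: Fraction(0) for page in out_links}
    for i, targets in out_links.items():
        for j in targets:
            new[j] += x[i] * Fraction(1, len(targets))
    return new


x = {page: Fraction(1) for page in out_links}  # R0
history = []
for _ in range(3):
    x = weighted_update(x, out_links)
    history.append(x)

for k, scores in enumerate(history, start=1):
    print(k, {page: str(s) for page, s in scores.items()})
# 1 {1: '3/2', 2: '1/3', 3: '4/3', 4: '5/6'}
# 2 {1: '7/4', 2: '1/2', 3: '13/12', 4: '2/3'}
# 3 {1: '17/12', 2: '7/12', 3: '7/6', 4: '5/6'}
```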

Iterative steps for finding scores. For a web of $n$ pages, introduce the notation $e_1, e_2, \ldots, e_n$, each equal to 1, for the initial, equally important scores. For our 4-page web, the initial scores are $e_1 = e_2 = e_3 = e_4 = 1$. We call the scores after the first update the importance scores after iteration 1. In general we denote the scores after $k$ iterations by $x_1^{[k]}, x_2^{[k]}, \ldots, x_n^{[k]}$, where the superscript $[k]$ indicates the number of iterations. By convention, the initial scores $e_1, e_2, \ldots, e_n$ are denoted $x_1^{[0]}, x_2^{[0]}, \ldots, x_n^{[0]}$.

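Iterative steps for finding scores. The importance scores after iteration 1 are $x_1^{[1]} = \tfrac32$, $x_2^{[1]} = \tfrac13$, $x_3^{[1]} = \tfrac43$, $x_4^{[1]} = \tfrac56$. Similarly, the importance scores after iteration 2 are $x_1^{[2]} = \tfrac74$, $x_2^{[2]} = \tfrac12$, $x_3^{[2]} = \tfrac{13}{12}$, $x_4^{[2]} = \tfrac23$.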

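The link matrix. For a web of $n$ pages, the outward links from page $i$ carry the same weight
$$W_i = \frac{1}{\text{no. of outward links from page } i}.$$
For a fixed page $j$, each backlink is associated with a number
$$b_{j,i} = \begin{cases} W_i, & \text{if page } j \text{ has a backlink from page } i,\\ 0, & \text{otherwise.} \end{cases}$$
Define the link matrix of an $n$-page web as
$$A = \begin{pmatrix} 0 & b_{1,2} & b_{1,3} & \cdots & b_{1,n} \\ b_{2,1} & 0 & b_{2,3} & \cdots & b_{2,n} \\ b_{3,1} & b_{3,2} & 0 & \cdots & b_{3,n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ b_{n,1} & b_{n,2} & b_{n,3} & \cdots & 0 \end{pmatrix}.$$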

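The link matrix. The backlink weights for our 4-page web are $W_1 = \tfrac13$, $W_2 = \tfrac12$, $W_3 = 1$, $W_4 = \tfrac12$. Then the link matrix is
$$A = \begin{pmatrix} 0 & b_{1,2} & b_{1,3} & b_{1,4} \\ b_{2,1} & 0 & b_{2,3} & b_{2,4} \\ b_{3,1} & b_{3,2} & 0 & b_{3,4} \\ b_{4,1} & b_{4,2} & b_{4,3} & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & W_3 & W_4 \\ W_1 & 0 & 0 & 0 \\ W_1 & W_2 & 0 & W_4 \\ W_1 & W_2 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 & \tfrac12 \\ \tfrac13 & 0 & 0 & 0 \\ \tfrac13 & \tfrac12 & 0 & \tfrac12 \\ \tfrac13 & \tfrac12 & 0 & 0 \end{pmatrix}.$$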
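The link matrix can be built mechanically from the list of links. A sketch assuming the hypothetical `out_links` encoding with pages numbered $1, \ldots, n$; the helper name `link_matrix` is ours, not from the lecture. Entry `A[j-1][i-1]` holds $b_{j,i}$ (Python lists are 0-indexed):

```python
from fractions import Fraction

out_links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}  # hypothetical encoding


def link_matrix(out_links):
    """Build A with A[j-1][i-1] = b_{j,i} = W_i = 1/N_i if page i links to page j, else 0.
    Pages are assumed to be numbered 1..n."""
    n = len(out_links)
    A = [[Fraction(0)] * n for _ in range(n)]
    for i, targets in out_links.items():
        for j in targets:
            A[j - 1][i - 1] = Fraction(1, len(targets))
    return A


A = link_matrix(out_links)
for row in A:
    print([str(a) for a in row])
# ['0', '0', '1', '1/2']
# ['1/3', '0', '0', '0']
# ['1/3', '1/2', '0', '1/2']
# ['1/3', '1/2', '0', '0']
```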

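The link matrix. Interpretation of the link matrix
$$A = \begin{pmatrix} 0 & 0 & W_3 & W_4 \\ W_1 & 0 & 0 & 0 \\ W_1 & W_2 & 0 & W_4 \\ W_1 & W_2 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 & \tfrac12 \\ \tfrac13 & 0 & 0 & 0 \\ \tfrac13 & \tfrac12 & 0 & \tfrac12 \\ \tfrac13 & \tfrac12 & 0 & 0 \end{pmatrix}.$$
Zero diagonal entries mean there is no link from a page to itself. Column $i$ (vertical) represents the outward links from page $i$; each column sums to 1, reflecting that the initial power of each page is 1. Row $j$ (horizontal) represents the backlinks to page $j$: put $W_i$ at the $i$-th position of row $j$ if page $i$ links to page $j$; otherwise put a 0 there.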

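Iterative steps for finding scores. We used $e_1 = e_2 = e_3 = e_4 = 1$ in the calculation of the importance scores after iteration 1:
$$x_1^{[1]} = (0) + (1) + \left(\tfrac12\right) = \tfrac32, \qquad x_2^{[1]} = \left(\tfrac13\right) + (0) + (0) = \tfrac13,$$
$$x_3^{[1]} = \left(\tfrac13\right) + \left(\tfrac12\right) + \left(\tfrac12\right) = \tfrac43, \qquad x_4^{[1]} = \left(\tfrac13\right) + \left(\tfrac12\right) + (0) = \tfrac56.$$
Thus $x_1^{[1]} = \tfrac32$, $x_2^{[1]} = \tfrac13$, $x_3^{[1]} = \tfrac43$, $x_4^{[1]} = \tfrac56$.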

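Iterative steps for finding scores. The calculation of the importance scores after iteration 1,
$$x_1^{[1]} = e_2(0) + e_3(1) + e_4\left(\tfrac12\right) = \tfrac32, \qquad x_2^{[1]} = e_1\left(\tfrac13\right) + e_3(0) + e_4(0) = \tfrac13,$$
$$x_3^{[1]} = e_1\left(\tfrac13\right) + e_2\left(\tfrac12\right) + e_4\left(\tfrac12\right) = \tfrac43, \qquad x_4^{[1]} = e_1\left(\tfrac13\right) + e_2\left(\tfrac12\right) + e_3(0) = \tfrac56,$$
can be written as a matrix-vector equation
$$A\begin{pmatrix} e_1\\ e_2\\ e_3\\ e_4 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 & \tfrac12 \\ \tfrac13 & 0 & 0 & 0 \\ \tfrac13 & \tfrac12 & 0 & \tfrac12 \\ \tfrac13 & \tfrac12 & 0 & 0 \end{pmatrix}\begin{pmatrix} e_1\\ e_2\\ e_3\\ e_4 \end{pmatrix} = \begin{pmatrix} x_1^{[1]}\\ x_2^{[1]}\\ x_3^{[1]}\\ x_4^{[1]} \end{pmatrix}.$$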
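The matrix-vector form can be checked directly: each entry of $Ae$ is a row of $A$ dotted with $e$. A plain-Python sketch (no external libraries assumed; `mat_vec` is an illustrative helper name), with the lecture's link matrix typed in as exact fractions:

```python
from fractions import Fraction

half, third = Fraction(1, 2), Fraction(1, 3)
# Link matrix of the lecture's 4-page web (column i = outward links of page i).
A = [[0, 0, 1, half],
     [third, 0, 0, 0],
     [third, half, 0, half],
     [third, half, 0, 0]]


def mat_vec(A, x):
    """Row j of the product is sum_i A[j][i] * x[i], i.e. x_j = sum_i b_{j,i} x_i."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


e = [Fraction(1)] * 4  # R0: all initial scores equal 1
x1 = mat_vec(A, e)
print([str(v) for v in x1])  # ['3/2', '1/3', '4/3', '5/6']
```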

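Iterative steps for finding scores. We used $x_1^{[1]} = \tfrac32$, $x_2^{[1]} = \tfrac13$, $x_3^{[1]} = \tfrac43$, $x_4^{[1]} = \tfrac56$ in the calculation of the importance scores after iteration 2:
$$x_1^{[2]} = \tfrac13(0) + \tfrac43(1) + \tfrac56\left(\tfrac12\right) = \tfrac74, \qquad x_2^{[2]} = \tfrac32\left(\tfrac13\right) + \tfrac43(0) + \tfrac56(0) = \tfrac12,$$
$$x_3^{[2]} = \tfrac32\left(\tfrac13\right) + \tfrac13\left(\tfrac12\right) + \tfrac56\left(\tfrac12\right) = \tfrac{13}{12}, \qquad x_4^{[2]} = \tfrac32\left(\tfrac13\right) + \tfrac13\left(\tfrac12\right) + \tfrac43(0) = \tfrac23.$$
Thus $x_1^{[2]} = \tfrac74$, $x_2^{[2]} = \tfrac12$, $x_3^{[2]} = \tfrac{13}{12}$, $x_4^{[2]} = \tfrac23$.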

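Iterative steps for finding scores. The calculation of the importance scores after iteration 2,
$$x_1^{[2]} = x_2^{[1]}(0) + x_3^{[1]}(1) + x_4^{[1]}\left(\tfrac12\right) = \tfrac74, \qquad x_2^{[2]} = x_1^{[1]}\left(\tfrac13\right) + x_3^{[1]}(0) + x_4^{[1]}(0) = \tfrac12,$$
$$x_3^{[2]} = x_1^{[1]}\left(\tfrac13\right) + x_2^{[1]}\left(\tfrac12\right) + x_4^{[1]}\left(\tfrac12\right) = \tfrac{13}{12}, \qquad x_4^{[2]} = x_1^{[1]}\left(\tfrac13\right) + x_2^{[1]}\left(\tfrac12\right) + x_3^{[1]}(0) = \tfrac23,$$
can be written as a matrix-vector equation (with the same $A$)
$$A\begin{pmatrix} x_1^{[1]}\\ x_2^{[1]}\\ x_3^{[1]}\\ x_4^{[1]} \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 & \tfrac12 \\ \tfrac13 & 0 & 0 & 0 \\ \tfrac13 & \tfrac12 & 0 & \tfrac12 \\ \tfrac13 & \tfrac12 & 0 & 0 \end{pmatrix}\begin{pmatrix} x_1^{[1]}\\ x_2^{[1]}\\ x_3^{[1]}\\ x_4^{[1]} \end{pmatrix} = \begin{pmatrix} x_1^{[2]}\\ x_2^{[2]}\\ x_3^{[2]}\\ x_4^{[2]} \end{pmatrix}.$$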

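Iterative steps for finding scores. To satisfy R2, we can continue to calculate the importance scores after iteration 3 as
$$\begin{pmatrix} x_1^{[3]}\\ x_2^{[3]}\\ x_3^{[3]}\\ x_4^{[3]} \end{pmatrix} = A\begin{pmatrix} x_1^{[2]}\\ x_2^{[2]}\\ x_3^{[2]}\\ x_4^{[2]} \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 & \tfrac12 \\ \tfrac13 & 0 & 0 & 0 \\ \tfrac13 & \tfrac12 & 0 & \tfrac12 \\ \tfrac13 & \tfrac12 & 0 & 0 \end{pmatrix}\begin{pmatrix} x_1^{[2]}\\ x_2^{[2]}\\ x_3^{[2]}\\ x_4^{[2]} \end{pmatrix} = \begin{pmatrix} 1.41\ldots\\ 0.58\ldots\\ 1.16\ldots\\ 0.83\ldots \end{pmatrix}.$$
In general, the importance scores of our 4-page web after $k$ iterations are given by
$$\begin{pmatrix} x_1^{[k]}\\ x_2^{[k]}\\ x_3^{[k]}\\ x_4^{[k]} \end{pmatrix} = A\begin{pmatrix} x_1^{[k-1]}\\ x_2^{[k-1]}\\ x_3^{[k-1]}\\ x_4^{[k-1]} \end{pmatrix} = A^k\begin{pmatrix} e_1\\ e_2\\ e_3\\ e_4 \end{pmatrix}.$$
(Is this an endless process?)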

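Iterative steps for finding scores. The iterative formula for the importance scores of our 4-page web generalizes to a web of $n$ pages. Let $A$ be the link matrix of a web of $n$ pages, and let $e = [e_1, e_2, \ldots, e_n]^T$ be the column vector with all $n$ entries equal to 1. The scores $x_1^{[1]}, \ldots, x_n^{[1]}$ after iteration 1 are given by
$$Ae = \begin{pmatrix} 0 & b_{1,2} & b_{1,3} & \cdots & b_{1,n} \\ b_{2,1} & 0 & b_{2,3} & \cdots & b_{2,n} \\ b_{3,1} & b_{3,2} & 0 & \cdots & b_{3,n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ b_{n,1} & b_{n,2} & b_{n,3} & \cdots & 0 \end{pmatrix}\begin{pmatrix} 1\\ 1\\ 1\\ \vdots\\ 1 \end{pmatrix} = \begin{pmatrix} x_1^{[1]}\\ x_2^{[1]}\\ x_3^{[1]}\\ \vdots\\ x_n^{[1]} \end{pmatrix},$$
or equivalently $x^{[1]} = Ae$, where $x^{[1]}$ is the column vector with entries $x_1^{[1]}, \ldots, x_n^{[1]}$.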

Iterative steps for finding scores. The scores after iteration 2 are given by $x^{[2]} = Ax^{[1]} = A(Ae) = A^2 e$. In general, the scores after $k$ iterations are given by
$$x^{[k]} = Ax^{[k-1]} = A^k e. \qquad (*)$$
This iteration process can go on indefinitely, producing newer and newer sets of scores. Question: will it go on and on without settling into a fixed set of scores?
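Formula (*) is easy to run for a few steps. A sketch in plain Python with exact fractions (the helper names `mat_vec` and `score_iterates` are illustrative), returning $x^{[0]}, x^{[1]}, \ldots, x^{[k]}$ for the lecture's link matrix; the printed rows match the hand computations above:

```python
from fractions import Fraction

half, third = Fraction(1, 2), Fraction(1, 3)
A = [[0, 0, 1, half],
     [third, 0, 0, 0],
     [third, half, 0, half],
     [third, half, 0, 0]]


def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def score_iterates(A, k):
    """Return [x^[0], x^[1], ..., x^[k]] with x^[0] = e and x^[m] = A x^[m-1] = A^m e."""
    x = [Fraction(1)] * len(A)
    xs = [x]
    for _ in range(k):
        x = mat_vec(A, x)
        xs.append(x)
    return xs


for k, x in enumerate(score_iterates(A, 6)):
    print(k, [round(float(v), 3) for v in x])
```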

Limiting Scores Theorem. Thanks to nice properties of $A$, here is an answer. Theorem (Limiting scores): Suppose the web is interconnected in such a way that one can travel from any given page to any other given page through the existing links (in this case we say the link matrix is irreducible). Let $x_1^{[k]}, x_2^{[k]}, \ldots, x_n^{[k]}$ denote the scores of the pages after $k$ iterations. Then, as $k$ becomes larger and larger, either (1) the set of scores converges to a unique set of limit scores, or, when (1) does not hold, (2) for each page $i$, the average of its scores $x_i^{[0]}, x_i^{[1]}, \ldots, x_i^{[k]}$ approaches a limit score.
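The hypothesis of the theorem (every page can reach every other page) can be checked with a graph search. A sketch assuming the hypothetical `out_links` encoding in which every page appears as a key; it runs one depth-first search per page, which is fine for small examples but is not how one would treat the real web. The second web is a made-up counterexample:

```python
def is_irreducible(out_links):
    """True if every page can reach every other page by following links,
    i.e. the link graph is strongly connected (the link matrix is irreducible)."""
    pages = set(out_links)

    def reachable_from(start):
        seen, stack = {start}, [start]
        while stack:  # depth-first search along outward links
            page = stack.pop()
            for nxt in out_links[page]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return all(reachable_from(p) == pages for p in pages)


lecture_web = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}
cut_off_web = {1: [2], 2: [1], 3: [1]}  # hypothetical: no page links to page 3
print(is_irreducible(lecture_web), is_irreducible(cut_off_web))  # True False
```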

Limiting Scores Theorem. According to the Theorem, if the web is well-connected enough, one may compute the scores $x_1^{[k]}, x_2^{[k]}, \ldots, x_n^{[k]}$ by the iterative formula (*) and observe whether these sets of scores approach a set of limit scores as $k$ becomes large. If they do, which is case (1) of the Theorem, then these limit scores are the final scores of the pages. If case (1) does not hold, then case (2) must hold; in this case the average of $x_i^{[0]}, x_i^{[1]}, \ldots, x_i^{[k]}$, for $k$ large enough, is taken as the final score of page $i$. This establishes the final ranking of the pages. Q: How do we find these final scores? A: Solve $Av = v$, with $A$ the link matrix.
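Case (2) really can occur. In the hypothetical 3-page web below (not from the lecture, but irreducible), page 1 links to pages 2 and 3, and each of those links back only to page 1. Starting from $e$, the scores alternate forever between two vectors, yet their running averages settle on a vector $v$ with $Av = v$. A sketch with exact fractions:

```python
from fractions import Fraction

half = Fraction(1, 2)
# Hypothetical 3-page web: 1 -> 2, 3 and 2 -> 1, 3 -> 1 (column i = outward links of page i).
A = [[0, 1, 1],
     [half, 0, 0],
     [half, 0, 0]]


def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


x = [Fraction(1)] * 3  # x^[0] = e
iterates = [x]
for _ in range(1000):
    x = mat_vec(A, x)
    iterates.append(x)

# The iterates keep alternating, so case (1) fails.
print([float(v) for v in iterates[-2]], [float(v) for v in iterates[-1]])
# [2.0, 0.5, 0.5] [1.0, 1.0, 1.0]

# Case (2): average x_i^[0], ..., x_i^[k] for each page i.
averages = [sum(column) / len(iterates) for column in zip(*iterates)]
print([round(float(v), 3) for v in averages])  # [1.5, 0.75, 0.75]
```

The limit $(3/2, 3/4, 3/4)$ satisfies $Av = v$, as the theorem predicts.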

Stochastic matrix and 1-eigenvector. The Limiting Scores Theorem says that as $k$ gets larger and larger, either $x^{[k]}$ approaches a vector $v$, or the average of $x^{[0]}, x^{[1]}, \ldots, x^{[k]}$ approaches a vector $v$. In fact, in either case $v$ satisfies $Av = v$. Definition (1-eigenvector): given an $n \times n$ matrix
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix},$$
we say that a vector $w$ is a 1-eigenvector of $A$ if $w$ is not the zero vector and $Aw = w$.

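Stochastic matrix and 1-eigenvector. Definition (stochastic matrix): an $n \times n$ matrix $A$ is called a (column) stochastic matrix if all its entries are nonnegative and every column sums to 1. It is not difficult to see that if every page of a web has at least one link to some other page, then the link matrix of the web
$$A = \begin{pmatrix} 0 & b_{1,2} & b_{1,3} & \cdots & b_{1,n} \\ b_{2,1} & 0 & b_{2,3} & \cdots & b_{2,n} \\ b_{3,1} & b_{3,2} & 0 & \cdots & b_{3,n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ b_{n,1} & b_{n,2} & b_{n,3} & \cdots & 0 \end{pmatrix}$$
is a stochastic matrix: column $i$ contains exactly $N_i$ nonzero entries, each equal to $W_i = 1/N_i$, so it sums to 1.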
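Both conditions of the definition can be checked exactly with fractions. A sketch (the helper name is illustrative); the second matrix is a made-up 2-page web in which page 2 has no outward links, so its column is all zeros and the matrix fails to be stochastic:

```python
from fractions import Fraction


def is_column_stochastic(A):
    """Entries nonnegative and every column sums to exactly 1."""
    n = len(A)
    nonnegative = all(a >= 0 for row in A for a in row)
    column_sums_one = all(sum(A[j][i] for j in range(n)) == 1 for i in range(n))
    return nonnegative and column_sums_one


half, third = Fraction(1, 2), Fraction(1, 3)
lecture_A = [[0, 0, 1, half],
             [third, 0, 0, 0],
             [third, half, 0, half],
             [third, half, 0, 0]]
# Hypothetical 2-page web where page 2 has no outward links: column 2 is all zeros.
dangling_A = [[0, 0],
              [1, 0]]
print(is_column_stochastic(lecture_A), is_column_stochastic(dangling_A))  # True False
```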

Stochastic matrix and 1-eigenvector. To find the 1-eigenvector $v$ of a stochastic matrix $A$, one can compute $x^{[0]} = e$, $x^{[1]} = Ax^{[0]}$, ..., $x^{[k]} = Ax^{[k-1]}$, ... When $k$ is large enough, $x^{[k]}$ approaches $v$ (this is case (1) of the Limiting Scores Theorem). Remark: starting with the vector $e$ is required by R0, which states that initially all pages are equally important, with the same score. In many cases it does not take many iterations before the largest entries of $x^{[k]}$ settle in particular positions, which identify the pages that are most important.

Perron-Frobenius Theorem. In fact, the Limiting Scores Theorem is a consequence of the following. Theorem (Perron-Frobenius): Every stochastic matrix $A$ has a 1-eigenvector $v$ whose entries are all nonnegative, i.e. $Av = v$. Moreover, if $A$ is irreducible, then it has a 1-eigenvector $v$ whose entries are all positive. For the link matrix of a web, if it is an irreducible stochastic matrix, then its 1-eigenvector $v$ with positive entries determines the ranking of the pages: the bigger the $i$-th entry of $v$, the higher the rank of page $i$.

When A is not irreducible. Suppose a web is not well-connected, so that some page cannot reach some other page via a route of existing links. Then the corresponding link matrix $A$ is not irreducible, and the Limiting Scores Theorem cannot be applied. For simplicity we assume every page of the web has some link pointing to another page, so that the link matrix has no column of zeros and is therefore still a stochastic matrix. For this stochastic but not irreducible matrix $A$, define
$$S = (1-\alpha)A + \alpha E,$$
where $0 < \alpha \le 1$ and $E$ is the $n \times n$ matrix with all entries equal to $1/n$. Then all entries of $S$ are positive, and $S$ is an irreducible stochastic matrix (each column of $E$ also sums to 1). We may choose $\alpha$ to be a very small positive number so that $S$ is very close to $A$. Using $S$ as the link matrix instead of $A$, we may compute the scores $x_i^{[k]}$ as before. It turns out that, since all entries of $S$ are positive, case (1) of the Limiting Scores Theorem holds, and we may take the limit scores as the final page scores of the web.
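A floating-point sketch of this fix, on a hypothetical reducible 3-page web (pages 1 and 2 link to each other, page 3 links to page 1, and nothing links to page 3). Iterating with $A$ itself oscillates between $(2, 1, 0)$ and $(1, 2, 0)$; with $S$ the iteration settles. The value $\alpha = 0.15$ and the stopping tolerance are assumptions chosen for illustration; the lecture only asks for a small positive $\alpha$:

```python
def damped_matrix(A, alpha):
    """S = (1 - alpha) * A + alpha * E, where every entry of E equals 1/n."""
    n = len(A)
    return [[(1 - alpha) * A[j][i] + alpha / n for i in range(n)] for j in range(n)]


def power_iteration(S, tol=1e-12, max_iter=10_000):
    """Iterate x^[k] = S x^[k-1] from x^[0] = e until successive iterates agree to tol."""
    n = len(S)
    x = [1.0] * n
    for _ in range(max_iter):
        new = [sum(S[j][i] * x[i] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    raise RuntimeError("no convergence within max_iter steps")


# Hypothetical reducible web: 1 -> 2, 2 -> 1, 3 -> 1 (no page links to page 3).
A = [[0.0, 1.0, 1.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
alpha = 0.15  # assumed illustrative value
S = damped_matrix(A, alpha)
print(all(s > 0 for row in S for s in row))  # True: every entry of S is positive
print([round(v, 4) for v in power_iteration(S)])  # approx. [1.4595, 1.3905, 0.15]
```

Because $S$ is stochastic, the entries of every iterate still sum to $n = 3$.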

Google's PageRank (cont.). Main idea of Google's PageRank method: form a (huge!) stochastic matrix $A$ which represents the link structure of the WWW; form the matrix $S = (1-\alpha)A + \alpha E$, where $\alpha$ is a small positive number; compute the 1-eigenvector $v$ of $S$ whose entries are all positive. The ranking of webpages follows the magnitudes of the entries of $v$. To find the 1-eigenvector $v$ of $S$, one can compute $x^{[0]} = e$, $x^{[1]} = Sx^{[0]}$, ..., $x^{[k]} = Sx^{[k-1]}$, ... When $k$ is large enough, $x^{[k]}$ approaches $v$.
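Putting the steps together, here is a compact sketch of the recipe, assuming the hypothetical `out_links` encoding (pages numbered $1, \ldots, n$, each with at least one outward link), dense Python lists (fine for 4 pages, hopeless at web scale) and a stopping tolerance of our own choosing. With `alpha = 0` we have $S = A$; `alpha = 0.15` is only an illustrative value:

```python
def pagerank_scores(out_links, alpha, tol=1e-12, max_iter=10_000):
    """Build the link matrix A, form S = (1 - alpha) A + alpha E, and iterate
    x^[k] = S x^[k-1] from x^[0] = e until successive iterates agree to tol."""
    n = len(out_links)
    A = [[0.0] * n for _ in range(n)]
    for i, targets in out_links.items():
        for j in targets:
            A[j - 1][i - 1] = 1.0 / len(targets)  # b_{j,i} = W_i = 1/N_i
    S = [[(1 - alpha) * A[j][i] + alpha / n for i in range(n)] for j in range(n)]
    x = [1.0] * n
    for _ in range(max_iter):
        new = [sum(S[j][i] * x[i] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    raise RuntimeError("no convergence within max_iter steps")


def ranking(scores):
    """Page numbers (1-based) ordered from highest to lowest score."""
    return sorted(range(1, len(scores) + 1), key=lambda p: scores[p - 1], reverse=True)


web = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}
for alpha in (0.0, 0.15):
    v = pagerank_scores(web, alpha)
    print(alpha, [round(s, 4) for s in v], ranking(v))
```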

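Example. For our example,
$$A = \begin{pmatrix} 0 & 0 & 1 & \tfrac12 \\ \tfrac13 & 0 & 0 & 0 \\ \tfrac13 & \tfrac12 & 0 & \tfrac12 \\ \tfrac13 & \tfrac12 & 0 & 0 \end{pmatrix}.$$
After verifying that $A$ is irreducible, there is no need to define $S$, and we may compute the scores $x^{[k]}$ directly using $A$:
$$x^{[1]} = Ae = \begin{pmatrix} 1.5\\ 0.333\\ 1.333\\ 0.833 \end{pmatrix}, \qquad x^{[2]} = Ax^{[1]} = \begin{pmatrix} 1.75\\ 0.5\\ 1.083\\ 0.667 \end{pmatrix},$$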

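Example (cont.).
$$x^{[3]} = Ax^{[2]} = \begin{pmatrix} 1.417\\ 0.583\\ 1.167\\ 0.833 \end{pmatrix}, \quad x^{[4]} = Ax^{[3]} = \begin{pmatrix} 1.583\\ 0.472\\ 1.181\\ 0.764 \end{pmatrix}, \quad x^{[5]} = Ax^{[4]} = \begin{pmatrix} 1.563\\ 0.528\\ 1.146\\ 0.764 \end{pmatrix}, \quad x^{[6]} = Ax^{[5]} = \begin{pmatrix} 1.528\\ 0.521\\ 1.167\\ 0.785 \end{pmatrix}.$$
One may pick up the trend from these results and conclude that webpage 1 should rank 1st, webpage 3 2nd, webpage 4 3rd, and webpage 2 4th. In fact, the 1-eigenvector of $A$, scaled so that its entries sum to 4 like those of $e$ (multiplying by a stochastic matrix preserves this sum), is
$$v = \begin{pmatrix} 1.54839\ldots\\ 0.51613\ldots\\ 1.16129\ldots\\ 0.77419\ldots \end{pmatrix}.$$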
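The decimals above agree with the exact vector $v = (48, 16, 36, 24)^T/31$, which one can obtain by solving $Av = v$ by hand. A short exact check with Python fractions that this $v$ is a 1-eigenvector, that its entries sum to 4, and that it gives the ranking stated above:

```python
from fractions import Fraction

half, third = Fraction(1, 2), Fraction(1, 3)
A = [[0, 0, 1, half],
     [third, 0, 0, 0],
     [third, half, 0, half],
     [third, half, 0, 0]]


def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Candidate 1-eigenvector, scaled so its entries sum to 4 (like e).
v = [Fraction(48, 31), Fraction(16, 31), Fraction(36, 31), Fraction(24, 31)]
print(mat_vec(A, v) == v, sum(v))  # True 4
print([round(float(s), 5) for s in v])  # [1.54839, 0.51613, 1.16129, 0.77419]
print(sorted(range(1, 5), key=lambda p: v[p - 1], reverse=True))  # [1, 3, 4, 2]
```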

Assignment 7, Q1. Due date: Oct 0 (Monday) before :00pm. Please put your assignment into the assignment box of this course, and write your tutorial group number in the right-hand corner of your assignment. Question 1: Your Google Twin is the person you find as a search result for your own full name on Google. Consider the 4-page web example we have been using in the lecture. The owner of page 3 finds that page 1 is her Google Twin. Upset, she creates a new page 5 that links to page 3, and page 3 also links to page 5. Will this help boost her PageRank score?

Assignment 7, Q2. Question 2: Let $A$ be a $2 \times 2$ matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$. An eigenvalue of $A$ is a number $\lambda$ such that there exists a nonzero (column) vector $v$ with $Av = \lambda v$. For a $2 \times 2$ matrix, the eigenvalues of $A$ are the roots of the polynomial equation $\lambda^2 - (a+d)\lambda + (ad - bc) = 0$. Now suppose $A$ is a $2 \times 2$ column stochastic matrix. Show that there is no eigenvalue $\lambda$ of $A$ such that $\lambda > 1$.
