Lecture 13: Link Analysis (13.1 Link Analysis, 13.2 Google's PageRank Algorithm)


Lecture 13: Link Analysis

Outline
13.1 Link Analysis
13.2 Google's PageRank Algorithm
The Top Ten Algorithms in Data Mining
J. MacCormick, Nine Algorithms That Changed the Future, Princeton University Press, 2nd chapter
13.3 Efficient Computation of PageRank by MapReduce
Slides provided by J. Leskovec, A. Rajaraman, J. Ullman, available at http://www.mmds.org

13.1 Search Engine Indexing

Matching
Query "cat": pages 1, 3
Query "cat dog": page 3
Query "cat sat": word-location trick
cat: 1-2, 3-2
sat: 1-3, 3-7
Answer: page 1

Web as a Directed Graph
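The matching slide can be illustrated with a small positional index. The sketch below is only an illustration: the three page texts are hypothetical, chosen so that the word positions agree with the slide (cat: 1-2, 3-2; sat: 1-3, 3-7), and the function names are made up for this example.

```python
from collections import defaultdict

# Hypothetical three-page corpus (an assumption), chosen so that the word
# positions match the slide: cat: 1-2, 3-2 and sat: 1-3, 3-7.
pages = {
    1: "the cat sat on the mat",
    2: "the dog stood on the mat",
    3: "the cat stood while a dog sat",
}

def build_index(pages):
    """Map each word to a list of (page, position) pairs; positions start at 1."""
    index = defaultdict(list)
    for page_id, text in pages.items():
        for pos, word in enumerate(text.split(), start=1):
            index[word].append((page_id, pos))
    return index

def and_query(index, words):
    """Pages that contain every query word, in any order."""
    page_sets = [{page for page, _ in index.get(w, [])} for w in words]
    return sorted(set.intersection(*page_sets)) if page_sets else []

def phrase_query(index, words):
    """Pages where the words occur consecutively (the word-location trick)."""
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for offset, word in enumerate(words[1:], start=1):
        locations = set(index.get(word, []))
        # Keep a starting location only if the next word sits right after it.
        hits = {(p, pos) for (p, pos) in hits if (p, pos + offset) in locations}
    return sorted({p for p, _ in hits})

index = build_index(pages)
print(and_query(index, ["cat"]))            # [1, 3]
print(and_query(index, ["cat", "dog"]))     # [3]
print(phrase_query(index, ["cat", "sat"]))  # [1]
```

Storing the position next to the page number is what lets the phrase query reject page 3, where "cat" (position 2) and "sat" (position 7) are not adjacent.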

How to organize the Web? (1)
First try: human-curated Web directories (Yahoo, DMOZ, LookSmart).
Term spam: add the term "movie" 1000 times at the bottom of the page. Change the color of the text to the background color of the page and make it really small.

How to organize the Web? (2)
Second try: Web search.
Information Retrieval investigates: find relevant docs in a small and trusted set (newspaper articles, patents, etc.).
But: the Web is huge and full of untrusted documents, random things, web spam, etc.
Idea: links as votes. A page is more important if it has more links. In-coming links? Out-going links?
www.stanford.edu has 23,400 in-links; www.joe-schmoe.com has 1 in-link.

13.2 Google's PageRank Algorithm (1/2)
References:
http://pr.efactory.de/e-pagerank-algorithm.shtml
L. Page, S. Brin, R. Motwani, and T. Winograd, The PageRank Citation Ranking: Bringing Order to the Web, Stanford University, 1999.
PageRank: a quality measure of Web pages.
Assumption: the number of links to a page gives information about the importance of the page.
Example: web page i has 1 in-link and 3 out-links.

Google's PageRank Algorithm (2/2)
PR(X) is the PageRank of page X. Model: random surf (Markov chain, 1906).

$$PR(A) = PR(C), \qquad PR(B) = \frac{PR(A)}{2}, \qquad PR(C) = \frac{PR(A)}{2} + PR(B)$$

In matrix form:

$$\begin{pmatrix} PR(A) \\ PR(B) \\ PR(C) \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 1/2 & 0 & 0 \\ 1/2 & 1 & 0 \end{pmatrix} \begin{pmatrix} PR(A) \\ PR(B) \\ PR(C) \end{pmatrix} = M \times PR$$

The PageRank vector is the eigenvector of M with eigenvalue 1. (Application: Twitter who-to-follow.)
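As a small check of the eigenvector statement, the sketch below builds M from a link graph and verifies one candidate vector. The link structure (A links to B and C, B links to C, C links to A) is an assumption read off the columns of M, and the candidate (2/5, 1/5, 2/5) is the solution of the three equations normalized to sum to 1. Exact fractions are used so the equality test is not affected by rounding.

```python
from fractions import Fraction

# Link structure implied by the columns of M (an assumption):
# A -> B, A -> C, B -> C, C -> A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
pages = sorted(links)

def transition_matrix(links, pages):
    """Column-stochastic M: M[i][j] = 1/outdeg(j) if page j links to page i."""
    M = [[Fraction(0)] * len(pages) for _ in pages]
    for j, src in enumerate(pages):
        for dst in links[src]:
            M[pages.index(dst)][j] = Fraction(1, len(links[src]))
    return M

def matvec(M, v):
    """Plain matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

M = transition_matrix(links, pages)

# Candidate values for PR(A), PR(B), PR(C), normalized to sum to 1.
v = [Fraction(2, 5), Fraction(1, 5), Fraction(2, 5)]

print([[str(x) for x in row] for row in M])
print(matvec(M, v) == v)  # True: v is an eigenvector of M with eigenvalue 1
```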

Solving PageRank Algorithm (1/2)
Solving the above system of equations directly:

$$PR = \begin{pmatrix} 2r \\ r \\ 2r \end{pmatrix}, \qquad PR(A) + PR(B) + PR(C) = 1 \;\Rightarrow\; r = \frac{1}{5}$$

Web pages A and C are more important than page B.
Randomize the order of A and C (wiki).
Arithmetic complexity of Gaussian elimination: $O(n^3)$.
Problem: the web consists of a trillion ($10^{12}$) documents (Google, 7/25/2008).

Solving PageRank Algorithm (2/2)
Instead, the PageRank vector is computed by the following iterative process:

$$PR^{(k+1)} = M \cdot PR^{(k)}$$

If the matrix M satisfies certain conditions, the process converges to a unique distribution, and it converges very fast.
Excel: select a 3 × 1 range, enter =MMULT($A$2:$C$4,E2:E4), and press Ctrl+Shift+Enter.
Stop if $\|PR^{(k+1)} - PR^{(k)}\|_1 < \varepsilon$, where $\|x\|_1 = \sum_{1 \le i \le N} |x_i|$ is the L1 norm.
Any other vector norm can be used, e.g., Euclidean.

PageRank: Problems
(1) Some pages are dead ends (have no out-links).
The random walk has nowhere to go. Such pages cause importance to leak out.
(2) Spider traps (all out-links are within the group).
The random walk gets stuck in a trap, and eventually spider traps absorb all importance.

Problem: Dead Ends
Pages y, a, m; page m is a dead end.

$$M = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 0 \\ 0 & 1/2 & 0 \end{pmatrix}$$

$$r_y = r_y/2 + r_a/2, \qquad r_a = r_y/2, \qquad r_m = r_a/2$$

Iteration   0     1     2      3      ...   limit
r_y         1/3   2/6   3/12   5/24   ...   0
r_a         1/3   1/6   2/12   3/24   ...   0
r_m         1/3   1/6   1/12   2/24   ...   0

The PageRank leaks out since the matrix is not stochastic.
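The iterative process and the dead-end leak can be sketched in a few lines. This is a minimal illustration, not a production implementation: the tolerance `eps`, the iteration cap, the uniform starting vector, and the names `M_abc` and `M_dead` are all choices made for this example.

```python
def matvec(M, v):
    """Plain matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def l1_distance(x, y):
    """L1 norm of the difference of two vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))

def power_iteration(M, eps=1e-10, max_iter=1000):
    """Iterate PR(k+1) = M PR(k) from the uniform vector until the L1 change is below eps."""
    n = len(M)
    pr = [1.0 / n] * n
    for k in range(1, max_iter + 1):
        nxt = matvec(M, pr)
        if l1_distance(nxt, pr) < eps:
            return nxt, k
        pr = nxt
    return pr, max_iter

# Three-page example with pages A, B, C.
M_abc = [[0.0, 0.0, 1.0],
         [0.5, 0.0, 0.0],
         [0.5, 1.0, 0.0]]
pr, steps = power_iteration(M_abc)
print([round(x, 4) for x in pr])  # close to [0.4, 0.2, 0.4]

# Dead-end example (pages y, a, m): the column of the dead end m is all zeros,
# so the matrix is not stochastic and the total importance shrinks.
M_dead = [[0.5, 0.5, 0.0],
          [0.5, 0.0, 0.0],
          [0.0, 0.5, 0.0]]
r = [1 / 3] * 3
for _ in range(3):
    r = matvec(M_dead, r)
print(sum(r))  # about 10/24 after three steps, no longer 1
```

The power iteration costs one matrix-vector product per step, which is why it scales to graphs where Gaussian elimination at $O(n^3)$ is out of reach.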

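Solution: Teleport! (1)
Teleports: follow random teleport links with probability 1.0 from dead ends.

Solution: Teleport! (2)
The all-zero column of the dead end m is replaced by a uniform column:

$$\begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 0 \\ 0 & 1/2 & 0 \end{pmatrix} \;\longrightarrow\; \begin{pmatrix} 1/2 & 1/2 & 1/3 \\ 1/2 & 0 & 1/3 \\ 0 & 1/2 & 1/3 \end{pmatrix}$$

$$r_y = r_y/2 + r_a/2 + r_m/3, \qquad r_a = r_y/2 + r_m/3, \qquad r_m = r_a/2 + r_m/3$$

Iteration   0     1      2        ...   limit
r_y         1/3   8/18   49/108   ...   6/13
r_a         1/3   5/18   34/108   ...   4/13
r_m         1/3   5/18   25/108   ...   3/13

Problem: Spider Traps
Page m is a spider trap.

$$M = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 0 \\ 0 & 1/2 & 1 \end{pmatrix}$$

$$r_y = r_y/2 + r_a/2, \qquad r_a = r_y/2, \qquad r_m = r_a/2 + r_m$$

Iteration   0     1     2      3       ...   limit
r_y         1/3   2/6   3/12   5/24    ...   0
r_a         1/3   1/6   2/12   3/24    ...   0
r_m         1/3   3/6   7/12   16/24   ...   1

All the PageRank score gets trapped in node m.

Solution: Teleports!
The Google solution for spider traps: at each time step, the random surfer has two options.
With probability β, follow a link at random.
With probability 1 - β, jump to some random page.
Common values for β are in the range 0.8 to 0.9.
The surfer will teleport out of a spider trap within a few steps.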
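The dead-end fix and the spider-trap behaviour can be reproduced with exact fractions. The sketch below is illustrative only: the helper name `fix_dead_ends` and the page order y, a, m are assumptions of this example, and 50 iterations is an arbitrary cutoff for showing the trend.

```python
from fractions import Fraction as F

def matvec(M, v):
    """Plain matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def fix_dead_ends(M):
    """Replace every all-zero column (a dead end) by a uniform column of 1/N."""
    n = len(M)
    fixed = [row[:] for row in M]
    for j in range(n):
        if all(M[i][j] == 0 for i in range(n)):
            for i in range(n):
                fixed[i][j] = F(1, n)
    return fixed

half = F(1, 2)
# Page order: y, a, m.
M_dead = [[half, half, 0], [half, 0, 0], [0, half, 0]]  # m is a dead end
M_trap = [[half, half, 0], [half, 0, 0], [0, half, 1]]  # m links only to itself

M_fixed = fix_dead_ends(M_dead)
r0 = [F(1, 3)] * 3
r1 = matvec(M_fixed, r0)
r2 = matvec(M_fixed, r1)
print([str(x) for x in r1])  # ['4/9', '5/18', '5/18'], i.e. 8/18, 5/18, 5/18
print([str(x) for x in r2])  # ['49/108', '17/54', '25/108'], i.e. 34/108 in the middle

# The limit (6/13, 4/13, 3/13) is a fixed point of the repaired matrix.
limit = [F(6, 13), F(4, 13), F(3, 13)]
print(matvec(M_fixed, limit) == limit)  # True

# Spider trap: nothing leaks (the sum stays 1), but the score drifts into m.
t = [F(1, 3)] * 3
for _ in range(50):
    t = matvec(M_trap, t)
print(sum(t) == 1, [round(float(x), 4) for x in t])
```

The two failure modes differ in kind: a dead end makes the total score shrink, while a spider trap keeps the total at 1 but concentrates it in the trap.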

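The Google Matrix
PageRank equation [Brin-Page, 98], where $d_i$ is the number of out-links of page i:

$$r_j = \sum_{i \to j} \beta \, \frac{r_i}{d_i} + (1 - \beta) \, \frac{1}{N}$$

The Google Matrix A:

$$A = \beta M + (1 - \beta) \left[ \tfrac{1}{N} \right]_{N \times N}$$

Here $[1/N]_{N \times N}$ is the N by N matrix where all entries are 1/N.
We have a recursive problem, $r = A \cdot r$, and the power method still works!
What is β? In practice β = 0.8 (make 5 steps on average, then jump).

Random Teleports (β = 0.8)

$$A = 0.8 \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 0 \\ 0 & 1/2 & 1 \end{pmatrix} + 0.2 \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix} = \begin{pmatrix} 7/15 & 7/15 & 1/15 \\ 7/15 & 1/15 & 1/15 \\ 1/15 & 7/15 & 13/15 \end{pmatrix}$$

Iteration   0     1      2      3      ...   limit
r_y         1/3   0.33   0.28   0.26   ...   7/33
r_a         1/3   0.20   0.20   0.18   ...   5/33
r_m         1/3   0.46   0.52   0.56   ...   21/33

13.3 Efficient Computation of PageRank by MapReduce
One iteration of the PageRank algorithm involves taking an estimated PageRank vector v and computing the next estimate v' by

$$v' = \beta M v + (1 - \beta) \, \frac{e}{n}$$

where β = 0.8, e is a vector of all 1's, and n is the number of nodes in the graph.
Problem: the web consists of a trillion ($10^{12}$) documents (Google, 7/25/2008), so v is much too large to fit in main memory.

PageRank by MapReduce
Partition the matrix into square blocks.
Size of the matrix: [...]
One approach:
Mapper: compute [...]
Reducer: [...] on one processor, 9 in total.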
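The update $v' = \beta M v + (1 - \beta) e / n$ and the block partitioning can be sketched together. The exact mapper and reducer expressions of the slide are not reproduced here, so the arrangement below is an assumption: each map task multiplies one square block of M by the matching stripe of v, and the reduce step sums the partial products per row and adds the teleport term. The function names and the choice of 1 × 1 blocks (which gives 9 map tasks for the 3 × 3 example) are likewise illustrative.

```python
from collections import defaultdict
from fractions import Fraction as F

BETA = F(4, 5)  # beta = 0.8
half = F(1, 2)
# Spider-trap graph, page order y, a, m.
M = [[half, half, 0], [half, 0, 0], [0, half, 1]]
N = len(M)

# Google matrix A = beta * M + (1 - beta) * [1/N]_{N x N}
A = [[BETA * M[i][j] + (1 - BETA) * F(1, N) for j in range(N)] for i in range(N)]

def blocks(M, size):
    """Yield ((I, J), block) for a partition of M into size x size blocks."""
    n = len(M)
    for I in range(0, n, size):
        for J in range(0, n, size):
            yield (I, J), [row[J:J + size] for row in M[I:I + size]]

def mapper(key, block, v):
    """Multiply one block by its stripe of v; emit (row index, partial sum)."""
    I, J = key
    for di, row in enumerate(block):
        yield I + di, sum(m * v[J + dj] for dj, m in enumerate(row))

def reducer(pairs, n):
    """Sum the partial products per row, then add the teleport term (1 - beta)/n."""
    sums = defaultdict(F)
    for i, partial in pairs:
        sums[i] += partial
    return [BETA * sums[i] + (1 - BETA) / n for i in range(n)]

def mapreduce_step(M, v, size=1):
    """One PageRank iteration, simulated sequentially."""
    pairs = [kv for key, blk in blocks(M, size) for kv in mapper(key, blk, v)]
    return reducer(pairs, len(M))

v0 = [F(1, 3)] * N
v1 = mapreduce_step(M, v0)       # 3 x 3 matrix in 1 x 1 blocks: 9 map tasks
print([str(x) for x in v1])      # ['1/3', '1/5', '7/15']

# Because v0 sums to 1, the dense product A * v0 gives the same vector.
dense = [sum(a * x for a, x in zip(row, v0)) for row in A]
print(dense == v1)               # True

limit = [F(7, 33), F(5, 33), F(21, 33)]
print(mapreduce_step(M, limit) == limit)  # True: fixed point of the update
```

Working with M plus a separate teleport term, rather than the dense matrix A, is the point of the update formula: M is sparse for a real web graph, while A has no zero entries.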

Cluster Architecture
Each rack contains 16-64 nodes.
1 Gbps between any pair of nodes in a rack.
2-10 Gbps backbone between racks.
[Figure: racks of nodes with memory, connected through switches]

Some Information
Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, Communications of the ACM, 2008, pp. 107-113. The paper on Google's map-reduce.
Chapter 2, Map-Reduce and the New Software Stack, in the 2nd edition, http://www.mmds.org
Cheng T. Chu, et al., Map-Reduce for Machine Learning on Multicore, NIPS, 2006, pp. 281-288. Implementations of 10 algorithms.
In 2011 it was estimated that Google had 1M machines, http://bit.l/shh0ro