Link Analysis. Leonid E. Zhukov

Similar documents
Node Centrality and Ranking on Networks

Node and Link Analysis

Online Social Networks and Media. Link Analysis and Web Search

Introduction to Search Engine Technology Introduction to Link Structure Analysis. Ronny Lempel Yahoo Labs, Haifa

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS

Link Analysis Ranking

1998: enter Link Analysis

Data Mining and Matrices

Applications to network analysis: Eigenvector centrality indices Lecture notes

How works. or How linear algebra powers the search engine. M. Ram Murty, FRSC Queen s Research Chair Queen s University

How does Google rank webpages?

Online Social Networks and Media. Link Analysis and Web Search

Inf 2B: Ranking Queries on the WWW

ECEN 689 Special Topics in Data Science for Communications Networks

Data Mining Recitation Notes Week 3

Link Analysis. Stony Brook University CSE545, Fall 2016

0.1 Naive formulation of PageRank

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine

LINK ANALYSIS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

MultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors

Updating PageRank. Amy Langville Carl Meyer

Calculating Web Page Authority Using the PageRank Algorithm

Google PageRank. Francesco Ricci Faculty of Computer Science Free University of Bozen-Bolzano

Web Ranking. Classification (manual, automatic) Link Analysis (today s lesson)

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Page rank computation HPC course project a.y

Graph Models The PageRank Algorithm

Lecture 7 Mathematics behind Internet Search

Web Structure Mining Nodes, Links and Influence

Lab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018

IR: Information Retrieval

Lecture 12: Link Analysis for Web Retrieval

eigenvalues, markov matrices, and the power method

Complex Social System, Elections. Introduction to Network Analysis 1

Link Mining PageRank. From Stanford C246

Math 304 Handout: Linear algebra, graphs, and networks.

Link Analysis Information Retrieval and Data Mining. Prof. Matteo Matteucci

6.207/14.15: Networks Lectures 4, 5 & 6: Linear Dynamics, Markov Chains, Centralities

Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University.

PageRank. Ryan Tibshirani /36-662: Data Mining. January Optional reading: ESL 14.10

STA141C: Big Data & High Performance Statistical Computing

Web Ranking. Classification (manual, automatic) Link Analysis (today s lesson)

1 Searching the World Wide Web

CS54701 Information Retrieval. Link Analysis. Luo Si. Department of Computer Science Purdue University. Borrowed Slides from Prof.

CS246: Mining Massive Datasets Jure Leskovec, Stanford University.

Applications. Nonnegative Matrices: Ranking

Hub, Authority and Relevance Scores in Multi-Relational Data for Query Search

Information Retrieval and Search. Web Linkage Mining. Miłosz Kadziński

CS47300: Web Information Search and Management

Diffusion and random walks on graphs

PageRank algorithm Hubs and Authorities. Data mining. Web Data Mining PageRank, Hubs and Authorities. University of Szeged.

Intelligent Data Analysis. PageRank. School of Computer Science University of Birmingham

MAE 298, Lecture 8 Feb 4, Web search and decentralized search on small-worlds

MATH36001 Perron Frobenius Theory 2015

Uncertainty and Randomization

The Second Eigenvalue of the Google Matrix

Markov Chains and Stochastic Sampling

A Note on Google s PageRank

Wiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte

8. Statistical Equilibrium and Classification of States: Discrete Time Markov Chains

Complex Networks CSYS/MATH 303, Spring, Prof. Peter Dodds

INTRODUCTION TO MCMC AND PAGERANK. Eric Vigoda Georgia Tech. Lecture for CS 6505

UpdatingtheStationary VectorofaMarkovChain. Amy Langville Carl Meyer

Updating Markov Chains Carl Meyer Amy Langville

INTRODUCTION TO MCMC AND PAGERANK. Eric Vigoda Georgia Tech. Lecture for CS 6505

SUPPLEMENTARY MATERIALS TO THE PAPER: ON THE LIMITING BEHAVIOR OF PARAMETER-DEPENDENT NETWORK CENTRALITY MEASURES

The Google Markov Chain: convergence speed and eigenvalues

Google Page Rank Project Linear Algebra Summer 2012

Multiple Relational Ranking in Tensor: Theory, Algorithms and Applications

The Theory behind PageRank

Link Analysis. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

Slides based on those in:

Introduction to Data Mining

Link Analysis and Web Search

Hyperlinked-Induced Topic Search (HITS) identifies. authorities as good content sources (~high indegree) HITS [Kleinberg 99] considers a web page

THEORY OF SEARCH ENGINES. CONTENTS 1. INTRODUCTION 1 2. RANKING OF PAGES 2 3. TWO EXAMPLES 4 4. CONCLUSION 5 References 5

MATH 56A: STOCHASTIC PROCESSES CHAPTER 1

Finding central nodes in large networks

Markov Chains and Spectral Clustering

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search

Models and Algorithms for Complex Networks. Link Analysis Ranking

On the mathematical background of Google PageRank algorithm

Some relationships between Kleinberg s hubs and authorities, correspondence analysis, and the Salsa algorithm

Affine iterations on nonnegative vectors

Outlines. Discrete Time Markov Chain (DTMC) Continuous Time Markov Chain (CTMC)

Ten good reasons to use the Eigenfactor TM metrics

Applications of The Perron-Frobenius Theorem

DOMINANT VECTORS APPLICATION TO INFORMATION EXTRACTION IN LARGE GRAPHS. Laure NINOVE OF NONNEGATIVE MATRICES

Algebraic Representation of Networks

COMPSCI 514: Algorithms for Data Science

Mathematical Properties & Analysis of Google s PageRank

The Dynamic Absorbing Model for the Web

Computing PageRank using Power Extrapolation

Degree Distribution: The case of Citation Networks

No class on Thursday, October 1. No office hours on Tuesday, September 29 and Thursday, October 1.

Pseudocode for calculating Eigenfactor TM Score and Article Influence TM Score using data from Thomson-Reuters Journal Citations Reports

Perron Frobenius Theory

Transcription:

Link Analysis Leonid E. Zhukov School of Data Analysis and Artificial Intelligence Department of Computer Science National Research University Higher School of Economics Structural Analysis and Visualization of Networks Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 1 / 21

Lecture outline 1 Graph-theoretic definitions 2 Web search engines 3 Web page ranking algorithms Pagerank HITS 4 The Web as a graph 5 PageRank beyond the web Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 2 / 21

Graph theory Graph G(E, V ), V = n, E = m Adjacency matrix A n n, A ij, edge i j Graph is directed, matrix is non-symmetric: A T A, A ij A ji Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 3 / 21

Graph theory sinks: zero out degree nodes, k out (i) = 0, absorbing nodes sources: zero in degree nodes, k in (i) = 0 Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 4 / 21

Graph theory Graph is strongly connected if every vertex is reachable form every other vertex. Strongly connected components are partitions of the graph into subgraphs that are strongly connected In strongly connected graphs there is a path is each direction between any two pairs of vertices image from Wikipedia Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 5 / 21

Graph theory A directed graph is aperiodic if the greatest common divisor of the lengths of its cycles is one (there is no integer k >1 that divides the length of every cycle of the graph) image from Wikipedia Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 6 / 21

Web search engine The Anatomy of a Large-Scale Hypertextual Web Search Engine Sergey Brin and Lawrence Page, 1998 Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21

Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal) Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 8 / 21

Ranking on directed graph iteratively update r i j N(i) r j = j A ji r j r t+1 i = j A ji rj t, with rj t=0 = rj 0 r t+1 = A T r t, r t=0 = r 0 norm r t+1 r t Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 9 / 21

Ranking on directed graph Absorbing nodes Source nodes Cycles r t+1 = A T r t, r t=0 = r 0 Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 10 / 21

PageRank PageRank can be thought of as a model of user behavior. We assume there is a random surfer who is given a web page at random and keeps clicking on links, never hitting back but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. PR(A) = (1 d) + d(pr(t 1)/C(T 1) +... + PR(Tn)/C(Tn)) Sergey Brin and Larry Page, 1998 Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 11 / 21

Random walk Random walk on a directed graph p t+1 i = j N(i) p t j d out j = j A ji dj out p j D ii = diag{d out i } p t+1 = (D 1 A) T p t p t+1 = P T p t Markov chain with transition probability matrix P = D 1 A lim t pt = π Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 12 / 21

Perron-Frobenius Theorem Perron-Frobenius theorem (Fundamental Theorem of Markov Chains) If matrix is then stochastic (non-negative and rows sum up to one, describes Markov chain) irreducible (strongly connected graph) aperiodic lim t p t = π and can be found as a left eigenvector πp = π, where π 1 = 1 π - stationary distribution of Markov chain, raw vector Oscar Perron, 1907, Georg Frobenius,1912. Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 13 / 21

PageRank Transition matrix: P = D 1 A Stochastic matrix: P = P + set n PageRank matrix: P = αp + (1 α) eet n Eigenvalue problem (choose solution with λ = 1): P T p = λp Notations: e - unit column vector, s - absorbing nodes indicator vector (column) Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 14 / 21

PageRank computations Eigenvalue problem (λ = 1, p 1 = p T e = 1): (αp + (1 α) eet n ) T p = λp p = αp T p + (1 α) e n Power iterations: p αp T p + (1 α) e n Sparse linear system: (I αp T )p = (1 α) e n Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 15 / 21

Graph structure of the web Bow tie structure of the web Andrei Broder et al, 1999 Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 16 / 21

Hubs and Authorities (HITS) Citation networks. Reviews vs original research (authoritative) papers authorities, contain useful information, a i hubs, contains links to authorities, h i Mutual recursion Good authorities reffered by good hubs a i j A ji h j Good hubs point to good authorities h i j A ij a j Jon Kleinberg, 1999 Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 17 / 21

HITS System of linear equations a = αa T h h = βaa Symmetric eigenvalue problem (A T A)a = λa (AA T )h = λh where eigenvalue λ = (αβ) 1 Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 18 / 21

HITS Focused subgraph of WWW Jon Kleinberg, 1999 Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 19 / 21

PageRank beyond the Web from David Gleich Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 20 / 21

References The Anatomy of a Large-Scale Hypertextual Web Search Engine, Sergey Brin and Lawrence Page, 1998 Authoritative Sources in a Hyperlinked Environment. Jon M. Kleinberg, Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1999 Graph structure in the Web, Andrei Broder et all. Procs of the 9th international World Wide Web conference, 2000 A Survey of Eigenvector Methods of Web Information Retrieval. Amy N. Langville and Carl D. Meyer, 2004 PageRank beyond the Web. David F. Gleich, arxiv:1407.5107, 2014 Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 21 / 21