Node and Link Analysis

Similar documents
Node Centrality and Ranking on Networks

Link Analysis. Leonid E. Zhukov

Centrality Measures. Leonid E. Zhukov

Centrality Measures. Leonid E. Zhukov

Web Structure Mining Nodes, Links and Influence

Inf 2B: Ranking Queries on the WWW

Applications to network analysis: Eigenvector centrality indices Lecture notes

Link Analysis Ranking

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS

CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68

6.207/14.15: Networks Lectures 4, 5 & 6: Linear Dynamics, Markov Chains, Centralities

Introduction to Search Engine Technology Introduction to Link Structure Analysis. Ronny Lempel Yahoo Labs, Haifa

Data Mining and Matrices

Online Social Networks and Media. Link Analysis and Web Search

MultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors

How works. or How linear algebra powers the search engine. M. Ram Murty, FRSC Queen s Research Chair Queen s University

Lab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Online Social Networks and Media. Link Analysis and Web Search

Degree Distribution: The case of Citation Networks

1998: enter Link Analysis

STA141C: Big Data & High Performance Statistical Computing

The Second Eigenvalue of the Google Matrix

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Uncertainty and Randomization

PageRank. Ryan Tibshirani /36-662: Data Mining. January Optional reading: ESL 14.10

Lecture 7 Mathematics behind Internet Search

Complex Social System, Elections. Introduction to Network Analysis 1

Hub, Authority and Relevance Scores in Multi-Relational Data for Query Search

ECEN 689 Special Topics in Data Science for Communications Networks

1 Searching the World Wide Web

Calculating Web Page Authority Using the PageRank Algorithm

Complex Networks CSYS/MATH 303, Spring, Prof. Peter Dodds

Graph Models The PageRank Algorithm

MAE 298, Lecture 8 Feb 4, Web search and decentralized search on small-worlds

Math 304 Handout: Linear algebra, graphs, and networks.

Link Analysis Information Retrieval and Data Mining. Prof. Matteo Matteucci

Pseudocode for calculating Eigenfactor TM Score and Article Influence TM Score using data from Thomson-Reuters Journal Citations Reports

Applications of The Perron-Frobenius Theorem

0.1 Naive formulation of PageRank

Link Mining PageRank. From Stanford C246

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine

Google Page Rank Project Linear Algebra Summer 2012

Link Analysis. Stony Brook University CSE545, Fall 2016

Multiple Relational Ranking in Tensor: Theory, Algorithms and Applications

Bonacich Measures as Equilibria in Network Models

On the mathematical background of Google PageRank algorithm

Ten good reasons to use the Eigenfactor TM metrics

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

Data Mining Recitation Notes Week 3

Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University.

ELE 538B: Mathematics of High-Dimensional Data. Spectral methods. Yuxin Chen Princeton University, Fall 2018

Web Ranking. Classification (manual, automatic) Link Analysis (today s lesson)

CS246: Mining Massive Datasets Jure Leskovec, Stanford University.

AXIOMS FOR CENTRALITY

IR: Information Retrieval

Computational Economics and Finance

How does Google rank webpages?

6 Evolution of Networks

Justification and Application of Eigenvector Centrality

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

Lecture 15 Perron-Frobenius Theory

Hyperlinked-Induced Topic Search (HITS) identifies. authorities as good content sources (~high indegree) HITS [Kleinberg 99] considers a web page

A Note on Google s PageRank

Wiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte

Algebraic Representation of Networks

Krylov Subspace Methods to Calculate PageRank

Lecture 12: Link Analysis for Web Retrieval

Random Surfing on Multipartite Graphs

Finding central nodes in large networks

Link Analysis and Web Search

Spectral Graph Theory Tools. Analysis of Complex Networks

Slides based on those in:

LINK ANALYSIS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

Page rank computation HPC course project a.y

Introduction to Data Mining

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search

The Push Algorithm for Spectral Ranking

As it is not necessarily possible to satisfy this equation, we just ask for a solution to the more general equation

Local properties of PageRank and graph limits. Nelly Litvak University of Twente Eindhoven University of Technology, The Netherlands MIPT 2018

NewPR-Combining TFIDF with Pagerank

Computing PageRank using Power Extrapolation

Faloutsos, Tong ICDE, 2009

eigenvalues, markov matrices, and the power method

PageRank algorithm Hubs and Authorities. Data mining. Web Data Mining PageRank, Hubs and Authorities. University of Szeged.

Google PageRank. Francesco Ricci Faculty of Computer Science Free University of Bozen-Bolzano

The Google Markov Chain: convergence speed and eigenvalues

Links between Kleinberg s hubs and authorities, correspondence analysis, and Markov chains

Models and Algorithms for Complex Networks. Link Analysis Ranking

Some relationships between Kleinberg s hubs and authorities, correspondence analysis, and the Salsa algorithm

Analysis of Google s PageRank

Randomization and Gossiping in Techno-Social Networks

How Does Google?! A journey into the wondrous mathematics behind your favorite websites. David F. Gleich! Computer Science! Purdue University!

Information Retrieval and Search. Web Linkage Mining. Miłosz Kadziński

Applications of Matrix Functions to Network Analysis and Quantum Chemistry Part I: Complex Networks

Convex Optimization CMU-10725

MATH36001 Perron Frobenius Theory 2015

c 2005 Society for Industrial and Applied Mathematics

Lecture: Local Spectral Methods (1 of 4)

Mathematical Properties & Analysis of Google s PageRank

Transcription:

Node and Link Analysis Leonid E. Zhukov School of Applied Mathematics and Information Science National Research University Higher School of Economics 10.02.2014 Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 1 / 25

Centrality Measures Sociology. Linton Freeman, 1979. Most "important"actors: actor location in the social network Actor centrality - involvment with other actors, many ties, source or recepient Actor prestige - recipient (object) of many ties, ties directed to an actor Three graphs: Star graph Circle graph Line graph Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 2 / 25

Three graphs Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 3 / 25

Degree Centrality Degree centrality C D (i) = k(i) = j A ij = j A ji Normalized degree centrality C D (i) = 1 n 1 C D(i) High centrality degree -direct contact with many other actors Low degree - not active, peripherial Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 4 / 25

Closeness Centrality Closeness centrality How close an actor to all the other actors in network C C (i) = d(i, j) j 1 Normalized closenss centrality C C (i) = (n 1)C C (i) Actor in the center can quickly interact with all others, short communication path to others, minimal number of steps to reach others Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 5 / 25

Betweenness Centrality Betweenness Centrality Number of shortest paths going through the actor σ st (i) C B (i) = Normalized betweenness centrality s t i σ st (i) σ st C B (i) = 2 (n 1)(n 2) C B(i) Probability that a communication from s to t will go through i (geodesics) Edge betweenness Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 6 / 25

Degree Prestige Degree prestige P D (i) = k in (i) = j A ji Normalized degree prestige P D (i) = 1 n 1 P D(i) Prestigious actors recieve many nominations Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 7 / 25

Proximity Prestige Proximity prestige Influence domain - set of actors that can reach i directly and indirectly. I i - size of influence domain. Average distance j d(j, i)/i i P p (i) = I i/(n 1) j d(j, i)/i i Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 8 / 25

Centralization Centralization (network measure) - how central the most central node in the network in relation to all other nodes. C x - one of the centrality measures N i [C x (p ) C x (p i )] C x = max N i [C x (p ) C x (p i )] Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 9 / 25

Link Analysis Graph G(n, m) and adjacency matrix A ij, edge i j Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 10 / 25

Status or Rank Prestige Leo Katz, 1953. Take into account status (prestige) of directly connected actors p i j N(i) p j = j A ji p j p i = j A ji p j p = A T p (I A T )p = 0 Nontrivial solution only if det(i A T ) = 0. Need to constraint matrix Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 11 / 25

Status or Rank Prestige Leo Katz, 1953 An actor gives equal parts of its prestige to all nearest neighbours p i j A ji p j k out (j) = j A ji k out (j) p j p = (D 1 A) T p where D ii = max(k i, 1) D 1 A - stochastic matrix, j (D 1 A) ij = 1, guaranteed λ max = 1 (D 1 A) T p = λp Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 12 / 25

Eigenvector Centrality Phillip Bonacich, 1972. c i j A ji c j c i = 1 A ji c j κ κc = A T c (κi A T )p = 0 Nontrivial solution only when det(κi A T ) = 0. Eigenvalue problem. Choose eigenvector corresponding do maximum eigenvalue: κ max = κ 1, c = c 1 j Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 13 / 25

Status or Rank Prestige. Eigenvector Centrality Phillip Bonacich, 1987. Parametrized centrality measure c(α, β) c i = j (α + βc j )A ji c = αa T e + βa T c c = α(i βa T ) 1 A T e α - found from normalizaion c 2 = c 2 i = 1 β - parameter, degree and direction of dependence on others Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 14 / 25

Link Analysis Graph G(n, m) zero out degree nodes, k out (i) = 0 zero in degree nodes, k in (i) = 0 Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 15 / 25

Real World Bow tie structure of the web Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 16 / 25

Perron-Frobenius Theorem Oscar Perron, 1907, Georg Frobenius,1912. Eigenvalue problem: Pp = λp Perron-Frobenius theorem: Real square matrix with positive entries stochastic (non-negative and rows sum up to one) irreducible (strongly connected graph) aperiodic then unique largest eigenvalue λ max = 1, with positive left eigenvector and power iterations coneverges to it. Solution satisfies p 1 = 1 Stationary distribution of Markov chain Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 17 / 25

PageRank Sergey Brin and Larry Page, 1998 Transition matrix: P = D 1 A Stochastic matrix: P = P + det n PageRank matrix: P = αp + (1 α) eet n Eigenvalue problem (choose solution with λ = 1): P T p = λp Notations: e - unit column vector, d - absorbing nodes indicator vector (column) Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 18 / 25

PageRank computations Eigenvalue problem [ ) ] α ((D 1 A) T + edt + (1 α) eet p = λp n n Power iterations p α(d 1 A) T p + α e n (dt p) + (1 α) e n (et p) p p p Sparse linear system (λ = 1, p 1 = 1) [ )] I α ((D 1 A) T + edt p = (1 α) e n n Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 19 / 25

Hubs and Authorities HITS, Jon Kleinberg, 1999 Citation networks. Reviews vs original research (authoritative) papers authorities, contain useful information, a i hubs, contains links to authorities, h i Mutual recursion Good authorities reffered by good hubs a i j A ji h j Good hubs point to good authorities h i j A ij a j Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 20 / 25

HITS System of linear equations a = αa T h h = βaa Symmetric eigenvalue problem (A T A)a = λa (AA T )h = λh where eigenvalue λ = (αβ) 1 Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 21 / 25

Centrality and Prestige of Florentine Families The Medici family marriage network Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 22 / 25

Ranking comparison The Kendall tau rank distance is a metric that counts the number of pairwise disagreements between two ranking lists Kendall rank correlation coefficient, commonly referred to as Kendall s tau coefficient τ = n c n d n(n 1)/2 n c - number of concordant pairs, n d - number of discordant pairs 1 τ 1, perfect agreement τ = 1, reversed τ = 1 Example Rank 1 A B C D E Rank 2 C D A B E τ = 6 4 5(5 1)/2 = 0.2 Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 23 / 25

References Centrality in Social Networks. Conceptual Clarification, Linton C. Freeman, Social Networks, 1, 215-239, 1979 Power and Centrality: A Family of Measures, Phillip Bonacich, The American Journal of Sociology, Vol. 92, No. 5 (Mar., 1987), 1170-1182. Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 24 / 25

References A new status index dervided from sociometric analysis, L. Katz, Psychometrika, 19, 39-43, 1953. Power and centrality: A family of measures, P. Bonacich, American Journal of Sociology, 92,1170-1182,1987. Graph structure in the Web, Andrei Broder et all, Procs of the 9th international World Wide Web conference on Computer networks, 2000. The PageRank Citation Ranknig: Bringing Order to the Web, S. Brin, L. Page, R. Motwany, T. Winograd, Stanford Digital Library Technologies Project, 1998 Authoritative Sources in a Hyperlinked Environment, Jon M. Kleinberg, Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, Leonid E. Zhukov (HSE) Lecture 5 10.02.2014 25 / 25