Complex Social System, Elections. Introduction to Network Analysis 1

Similar documents
Link Analysis. Leonid E. Zhukov

Link Analysis Ranking

Degree Distribution: The case of Citation Networks

Node Centrality and Ranking on Networks

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Finding central nodes in large networks

Online Social Networks and Media. Link Analysis and Web Search

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Link Analysis Information Retrieval and Data Mining. Prof. Matteo Matteucci

Thanks to Jure Leskovec, Stanford and Panayiotis Tsaparas, Univ. of Ioannina for slides

Online Social Networks and Media. Link Analysis and Web Search

Web Structure Mining Nodes, Links and Influence

Lecture: Local Spectral Methods (2 of 4) 19 Computing spectral ranking with the push procedure

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine

Lab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018

Introduction to Search Engine Technology Introduction to Link Structure Analysis. Ronny Lempel Yahoo Labs, Haifa

Link Analysis. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze

CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68

A Note on Google s PageRank

Node and Link Analysis

PageRank. Ryan Tibshirani /36-662: Data Mining. January Optional reading: ESL 14.10

Pseudocode for calculating Eigenfactor TM Score and Article Influence TM Score using data from Thomson-Reuters Journal Citations Reports

Web Ranking. Classification (manual, automatic) Link Analysis (today s lesson)

Applications to network analysis: Eigenvector centrality indices Lecture notes

MultiRank and HAR for Ranking Multi-relational Data, Transition Probability Tensors, and Multi-Stochastic Tensors

How does Google rank webpages?

Introduction to Data Mining

Data and Algorithms of the Web

ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 3 Centrality, Similarity, and Strength Ties

Computing PageRank using Power Extrapolation

Slides based on those in:

CS246: Mining Massive Datasets Jure Leskovec, Stanford University.

0.1 Naive formulation of PageRank

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search

1998: enter Link Analysis

CS54701 Information Retrieval. Link Analysis. Luo Si. Department of Computer Science Purdue University. Borrowed Slides from Prof.

Google PageRank. Francesco Ricci Faculty of Computer Science Free University of Bozen-Bolzano

How works. or How linear algebra powers the search engine. M. Ram Murty, FRSC Queen s Research Chair Queen s University

Data Mining and Matrices

Jeffrey D. Ullman Stanford University

CS47300: Web Information Search and Management

PageRank: The Math-y Version (Or, What To Do When You Can t Tear Up Little Pieces of Paper)

Analysis of Google s PageRank

The Push Algorithm for Spectral Ranking

Mathematical Properties & Analysis of Google s PageRank

Kristina Lerman USC Information Sciences Institute

Math 2J Lecture 16-11/02/12

ECEN 689 Special Topics in Data Science for Communications Networks

LINK ANALYSIS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

Improving Diversity in Ranking using Absorbing Random Walks

Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University.

Link Analysis. Stony Brook University CSE545, Fall 2016

IR: Information Retrieval

CS6220: DATA MINING TECHNIQUES

1 Searching the World Wide Web

As it is not necessarily possible to satisfy this equation, we just ask for a solution to the more general equation

Extrapolation Methods for Accelerating PageRank Computations

Local properties of PageRank and graph limits. Nelly Litvak University of Twente Eindhoven University of Technology, The Netherlands MIPT 2018

Lecture 7 Mathematics behind Internet Search

Web Ranking. Classification (manual, automatic) Link Analysis (today s lesson)

Information Retrieval and Search. Web Linkage Mining. Miłosz Kadziński

Why matrices matter. Paul Van Dooren, UCL, CESAME

Class President: A Network Approach to Popularity. Due July 18, 2014

Krylov Subspace Methods to Calculate PageRank

Lecture: Local Spectral Methods (1 of 4)

Data science with multilayer networks: Mathematical foundations and applications

Linear Algebra review Powers of a diagonalizable matrix Spectral decomposition

Page rank computation HPC course project a.y

Calculating Web Page Authority Using the PageRank Algorithm

Jeffrey D. Ullman Stanford University

STA141C: Big Data & High Performance Statistical Computing

PageRank algorithm Hubs and Authorities. Data mining. Web Data Mining PageRank, Hubs and Authorities. University of Szeged.

6.207/14.15: Networks Lectures 4, 5 & 6: Linear Dynamics, Markov Chains, Centralities

Eigenvalues of Exponentiated Adjacency Matrices

UpdatingtheStationary VectorofaMarkovChain. Amy Langville Carl Meyer

Complex Networks CSYS/MATH 303, Spring, Prof. Peter Dodds

Math 304 Handout: Linear algebra, graphs, and networks.

Multimedia Databases - 68A6 Final Term - exercises

arxiv: v1 [physics.soc-ph] 26 Apr 2017

Eigenvalue Problems Computation and Applications

CS6220: DATA MINING TECHNIQUES

Authoritative Sources in a Hyperlinked Enviroment

Linear Algebra review Powers of a diagonalizable matrix Spectral decomposition

Analysis and Computation of Google s PageRank

Cutting Graphs, Personal PageRank and Spilling Paint

Lecture 12: Link Analysis for Web Retrieval

2. The Power Method for Eigenvectors

Wiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte

CS 3750 Advanced Machine Learning. Applications of SVD and PCA (LSA and Link analysis) Cem Akkaya

Data Mining and Analysis: Fundamental Concepts and Algorithms

Hub, Authority and Relevance Scores in Multi-Relational Data for Query Search

AXIOMS FOR CENTRALITY

On the eigenvalues of specially low-rank perturbed matrices

Bounds on the Ratio of Eigenvalues

Uncertainty and Randomization

Updating PageRank. Amy Langville Carl Meyer

Ten good reasons to use the Eigenfactor TM metrics

Math 307 Learning Goals. March 23, 2010

Transcription:

Complex Social System, Elections Introduction to Network Analysis 1

Complex Social System, Network I person A voted for B A is more central than B if more people voted for A In-degree centrality index = number of in-edges Introduction to Network Analysis 2

Complex Social System, Network II person A convinced B Central because it mediates information between groups - Betweeness centrality (between groups) Another example: friendship network, degree-based centrality Introduction to Network Analysis 3

Measures and Metrics Q: What are the most important nodes and edges? Degree centrality: Closeness centrality: Edge removal centrality: There are many different ways to quantify the importance Can you come up with your interesting new measure of importance? Is it important for different types of networks? Can you compute it efficiently? Introduction to Network Analysis 4

Eccentricity from U. Brandes Network Analysis Introduction to Network Analysis 5

Eigenvector Centrality - - - - degree centrality - - - - Repeating last step t times Rewrite this linear combination with eigenvectors of A d(i)=2 Eigenvector centrality is defined by values Introduction to Network Analysis 6

Eigenvector Centrality 1 0.120 3 0.103 8 0.101 5 0.095 4 0.093 9 0.092 6 0.085 16 0.077 2 0.053 5 3 6 2 7 1 4 16 8 9 15 11 10 12 7 0.044 11 0.039 13 15 0.038 10 0.035 12 0.018 13 0.005 14 0.001 17 0.000 14 17 Introduction to Network Analysis 7

PageRank Markov Chain: probabilities of visiting the pages after k steps is P k x. Problem: dangling nodes (d + (i)=0) Solution: damping factor a (usually small) transition matrix regular random walk teleportation Note: P is still a Markov matrix with no 0s Introduction to Network Analysis 8

Eigenproblem-based Centralities: Computational Problems These problems are equivalent to solving eigenproblems, namely, P v = v. Gaussian elimination is very expensive. Iterative computation of P v is less expensive because P is usually sparse. Principal eigenvector of P is a PageRank. It can be solved by Power Method (iterative) and by Algebraic Multigrid. Parallel computation Sublinear computation Updating PageRank vector, evolving networks Further reading: Berkhin ``A Survey on PageRank computing Langville, Meyer ``Deeper Inside PageRank Introduction to Network Analysis 9

Hubs and Authorities Observation: In some networks a vertex may be important if it points to others with high centrality. Examples: X, Y, Z Cool Survey 9/11 contacts X sends an information to many leaders but they don t know X. X is influential. Scientific paper citations Raddicchi et al. Diffusion of scientific credits and the ranking of scientists Introduction to Network Analysis 10

Hubs and Authorities Authorities are nodes that contain useful information on a topic of interest. Hubs are nodes that point to best authorities. HITS Algorithm computes for each node i its authority and hub centralities x i and y i, respectively same leading (see eigenvector centrality) eigenvalue in both cases! All eigenvalues of AA T and A T A are the same (check and prove!). Multiplying both sides of the first equation by A T gives i.e., once we have a vector of authorities, the hub centrality can be calculated faster Note: Authority centrality of A = Eigenvector centrality of co-citation matrix AA T Current applications: Teoma.com and Ask.com Further reading: Kleinberg, Authoritative sources in a hyper-linked environment Introduction to Network Analysis 11

Closeness Centrality Problems: i-j path is infinity (for example different connected components) Solution: Harmonic Mean Distance Centrality In practice C spans relatively small range of values Geodesic distances are integers, often small because distance increases logarithmically with the size of the network Example: Internet Movie Database highest centrality = 0.4143, lowest centrality = 0.1154, ratio 3.6 for 500K actors Introduction to Network Analysis 12

Shortest Paths-based Centrality is a number of s-t shortest paths containing i is a number of all s-t shortest-paths Observation: In practice, communication or transport of goods in networks follow different kinds of paths that tend to be shortest. Question: How much work can be done by a node? Disadvantage? Stress Centrality for nodes Relation between stress centralities Betweenness Centrality for edges Interpretation: BC is a quantification of communication control which has a vertex over all pairs of nodes. Introduction to Network Analysis 13