Temporal patterns in information and social systems

Similar documents
6.207/14.15: Networks Lecture 12: Generalized Random Graphs

Power Laws & Rich Get Richer

LINK ANALYSIS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

CS224W: Analysis of Networks Jure Leskovec, Stanford University

arxiv: v1 [physics.soc-ph] 15 Dec 2009

1 Complex Networks - A Brief Overview

arxiv:cond-mat/ v1 [cond-mat.dis-nn] 4 May 2000

Chaos, Complexity, and Inference (36-462)

Self Similar (Scale Free, Power Law) Networks (I)

MAE 298, Lecture 8 Feb 4, Web search and decentralized search on small-worlds

1 The preferential attachment mechanism

Chaos, Complexity, and Inference (36-462)

Modeling of Growing Networks with Directional Attachment and Communities

Network models: dynamical growth and small world

poster presented at: Complex Networks: from Biology to Information Technology Pula (CA), Italy, July 2-6, 2007

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search

Competition and multiscaling in evolving networks

Network Science (overview, part 1)

Growing a Network on a Given Substrate

Modeling face-to-face social interaction networks

Mining Newsgroups Using Networks Arising From Social Behavior by Rakesh Agrawal et al. Presented by Will Lee

Rate Equation Approach to Growing Networks

Measuring the shape of degree distributions

Mining Triadic Closure Patterns in Social Networks

Wiki Definition. Reputation Systems I. Outline. Introduction to Reputations. Yury Lifshits. HITS, PageRank, SALSA, ebay, EigenTrust, VKontakte

Complex Social System, Elections. Introduction to Network Analysis 1

Page Max. Possible Points Total 100

Google Matrix, dynamical attractors and Ulam networks Dima Shepelyansky (CNRS, Toulouse)

arxiv: v1 [physics.soc-ph] 26 Apr 2017

Modeling Dynamic Evolution of Online Friendship Network

arxiv:cond-mat/ v1 3 Sep 2004

Model Reduction for Edge-Weighted Personalized PageRank

A LINE GRAPH as a model of a social network

PageRank algorithm Hubs and Authorities. Data mining. Web Data Mining PageRank, Hubs and Authorities. University of Szeged.

arxiv:physics/ v1 [physics.soc-ph] 14 Dec 2006

Decision Making and Social Networks

Outline for today. Information Retrieval. Cosine similarity between query and document. tf-idf weighting

The Beginning of Graph Theory. Theory and Applications of Complex Networks. Eulerian paths. Graph Theory. Class Three. College of the Atlantic

Urban characteristics attributable to density-driven tie formation

Node Centrality and Ranking on Networks

arxiv:physics/ v2 [physics.soc-ph] 9 Feb 2007

Modelling exploration and preferential attachment properties in individual human trajectories

arxiv:cond-mat/ v2 [cond-mat.stat-mech] 3 Oct 2005

A PSYCHOPHYSICAL INTERPRETATION OF RASCH S PSYCHOMETRIC PRINCIPLE OF SPECIFIC OBJECTIVITY

networks in molecular biology Wolfgang Huber

Social and Technological Network Analysis: Spatial Networks, Mobility and Applications

Scale-Free Networks. Outline. Redner & Krapivisky s model. Superlinear attachment kernels. References. References. Scale-Free Networks

On Search, Ranking, and Matchmaking in Information Networks

Predicting the Impact of Software Engineering Topics An Empirical Study

Supporting Statistical Hypothesis Testing Over Graphs

Stability and topology of scale-free networks under attack and defense strategies

Small-world structure of earthquake network

Updating PageRank. Amy Langville Carl Meyer

arxiv: v1 [physics.soc-ph] 2 Sep 2008

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS

Collaborative Filtering. Radek Pelánek

Analysis & Generative Model for Trust Networks

Biased Assimilation, Homophily, and the Dynamics of Polarization

Heavy Tails: The Origins and Implications for Large Scale Biological & Information Systems

Rank and Rating Aggregation. Amy Langville Mathematics Department College of Charleston

UpdatingtheStationary VectorofaMarkovChain. Amy Langville Carl Meyer

Opinion Dynamics on Triad Scale Free Network

Overlapping Communities

Gutenberg-Richter Law for Internetquakes

Infinite Limits of Other Random Graphs

Data Mining Recitation Notes Week 3

Overview of Network Theory

Networks as vectors of their motif frequencies and 2-norm distance as a measure of similarity

NONLINEAR FEEDBACK LOOPS. Posted on February 5, 2016 by Joss Colchester

Overview of gradient descent optimization algorithms. HYUNG IL KOO Based on

HW 3: Heavy-tails and Small Worlds. 1 When monkeys type... [20 points, collaboration allowed]

The Mathematics of Scientific Research: Scientometrics, Citation Metrics, and Impact Factors

Internal link prediction: a new approach for predicting links in bipartite graphs

Fundamentals of Dynamical Systems / Discrete-Time Models. Dr. Dylan McNamara people.uncw.edu/ mcnamarad

Self-organized Criticality in a Modified Evolution Model on Generalized Barabási Albert Scale-Free Networks

DIFFERENTIAL EQUATIONS

Problem (INFORMAL). Given a dynamic graph, find a set of possibly overlapping temporal subgraphs to concisely describe the given dynamic graph in a

Bayesian nonparametric models for bipartite graphs

Link Analysis Ranking

EVOLUTION OF COMPLEX FOOD WEB STRUCTURE BASED ON MASS EXTINCTION

arxiv:physics/ v1 [physics.soc-ph] 11 Mar 2005

* Matrix Factorization and Recommendation Systems

CS224W: Social and Information Network Analysis

arxiv:physics/ v3 [physics.soc-ph] 21 Nov 2005

Communities Via Laplacian Matrices. Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices

Snowballing Effects in Preferential Attachment: The Impact of The Initial Links

The first-mover advantage in scientific publication

Complex Systems Methods 11. Power laws an indicator of complexity?

Construction and Analysis of Climate Networks

Deterministic scale-free networks

CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68

Shlomo Havlin } Anomalous Transport in Scale-free Networks, López, et al,prl (2005) Bar-Ilan University. Reuven Cohen Tomer Kalisky Shay Carmi

Why does attention to web articles fall with time?

A neural network representation of the potential energy surface in Si- and Si-Li systems

Matrix Factorization Techniques For Recommender Systems. Collaborative Filtering

Greedy Search in Social Networks

Worst case analysis for a general class of on-line lot-sizing heuristics

Supplementary Information: The origin of bursts and heavy tails in human dynamics

TEMPORAL EXPONENTIAL- FAMILY RANDOM GRAPH MODELING (TERGMS) WITH STATNET

MARLBORO CENTRAL SCHOOL DISTRICT CURRICULUM MAP. Unit 1: Integers & Rational Numbers

Transcription:

Temporal patterns in information and social systems Matúš Medo University of Fribourg, Switzerland COST Workshop Quantifying scientific impact: networks, measures, insights? 12-13 February, 2015, Zurich Matúš Medo (Uni Fribourg) Temporal patterns... 1 / 12

Outline 1 Growing networks with fitness and aging 2 Temporal bias of PageRank 3 Leaders and followers and the consequences The common theme Temporal patterns and the role of time in information and social systems. Matúš Medo (Uni Fribourg) Temporal patterns... 2 / 12

Preferential attachment (PA) A classical network model Yule (1925), Simon (1955), Price (1976), Barabási & Albert (1999) Growth of cities, citations of scientific papers, WWW,... Matúš Medo (Uni Fribourg) Temporal patterns... 3 / 12

Preferential attachment (PA) A classical network model Yule (1925), Simon (1955), Price (1976), Barabási & Albert (1999) Growth of cities, citations of scientific papers, WWW,... Nodes and links are added with time Probability that a node acquires a new link proportional to its current degree P(i, t) k i (t) 3 1 2 4 Matúš Medo (Uni Fribourg) Temporal patterns... 3 / 12

Preferential attachment (PA) A classical network model Yule (1925), Simon (1955), Price (1976), Barabási & Albert (1999) Growth of cities, citations of scientific papers, WWW,... Nodes and links are added with time Probability that a node acquires a new link proportional to its current degree P(i, t) k i (t) Pros: simple, produces a power-law degree distribution 3 1 2 4 Matúš Medo (Uni Fribourg) Temporal patterns... 3 / 12

PA in scientific citation data Journals of the American Physical Society from 1893 to 2009: degree increment k 5 4 3 2 1 empirical data preferential attachment 0 0 50 100 150 200 250 300 degree k See also Adamic & Huberman (2000), Redner (2005), Newman (2009),... Matúš Medo (Uni Fribourg) Temporal patterns... 4 / 12

PA in scientific citation data Journals of the American Physical Society from 1893 to 2009: degree increment k 5 4 3 2 1 1 year 10 years 30 years 0 0 50 100 150 200 250 300 degree k Matúš Medo (Uni Fribourg) Temporal patterns... 4 / 12

Time decay is fundamental Matúš Medo (Uni Fribourg) Temporal patterns... 5 / 12

Growing networks with fitness and aging (PRL 107, 238701, 2011) Probability that node i attracts a new link P(i, t) k i (t) }{{} degree }{{} f i D R (t) fitness }{{} aging } {{ } relevance The aging factor D R (t) decays with time: a decay of relevance When D R (t) 0, the popularity of nodes eventually saturates Matúš Medo (Uni Fribourg) Temporal patterns... 6 / 12

Growing networks with fitness and aging (PRL 107, 238701, 2011) Probability that node i attracts a new link P(i, t) k i (t) }{{} degree }{{} f i D R (t) fitness }{{} aging } {{ } relevance The aging factor D R (t) decays with time: a decay of relevance When D R (t) 0, the popularity of nodes eventually saturates The bottom line: Good: Produces various realistic degree distributions (power-law, etc.) Bad: Difficult to validate (high-dimensional statistics) Good: This model explains the data much better than any other (PRE 89, 032801, 2014) Matúš Medo (Uni Fribourg) Temporal patterns... 6 / 12

Growing networks with fitness and aging (PRL 107, 238701, 2011) Probability that node i attracts a new link P(i, t) k i (t) }{{} degree }{{} f i D R (t) fitness }{{} aging } {{ } relevance The aging factor D R (t) decays with time: a decay of relevance When D R (t) 0, the popularity of nodes eventually saturates Point to note: Paper popularity grows exponentially with fitness (quality) Fitness depends logarithmically on popularity Matúš Medo (Uni Fribourg) Temporal patterns... 6 / 12

Two forms of aging in information networks The decay of relevance: D R (t) Node relevance influences the in-coming links Matúš Medo (Uni Fribourg) Temporal patterns... 7 / 12

Two forms of aging in information networks The decay of relevance: D R (t) Node relevance influences the in-coming links The decay of activity: D A (t) Nodes activity influences the out-going links Activity decays in time (mostly) Matúš Medo (Uni Fribourg) Temporal patterns... 7 / 12

Two forms of aging in information networks The decay of relevance: D R (t) Node relevance influences the in-coming links The decay of activity: D A (t) Nodes activity influences the out-going links Activity decays in time (mostly) time old nodes recent nodes A growing network with a quick decay of attractiveness and no decay of activity Matúš Medo (Uni Fribourg) Temporal patterns... 7 / 12

The biases of PageRank recent nodes in top 100 100 90 80 70 60 50 40 30 20 indegree PageRank no time bias 10 0 10 100 1000 10000 relevance decay θ R Matúš Medo (Uni Fribourg) Temporal patterns... 8 / 12

The biases of PageRank 10000 r(fitness-pagerank) / r(fitness-indegree) 2.0 activity decay θa 1000 100 1.5 1.0 0.5 10 10 100 1000 10000 relevance decay θ R 0.0 Matúš Medo (Uni Fribourg) Temporal patterns... 8 / 12

The biases of PageRank Implications In citation data, the time scales of relevance and activity decay are very different (Θ A = 0 because outgoing links are created only upon arrival). PageRank (and its variants) is nevertheless commonly applied on citation data. One should think twice! Matúš Medo (Uni Fribourg) Temporal patterns... 8 / 12

The case for leaders in social systems Bipartite user-item data (who bought what at Amazon.com) Similar behavior in monopartite social data (user-user) Most users are driven by item popularity (followers) Some users are driven by item fitness (leaders) Matúš Medo (Uni Fribourg) Temporal patterns... 9 / 12

The case for leaders in social systems Bipartite user-item data (who bought what at Amazon.com) Similar behavior in monopartite social data (user-user) Most users are driven by item popularity (followers) Some users are driven by item fitness (leaders) A user makes a discovery when they are among the first 5 users to collect an eventually highly popular item (top 1% of all items are used as target) A new metric, user surprisal, shows that there are users who make discoveries so often that it cannot be explained by luck Matúš Medo (Uni Fribourg) Temporal patterns... 9 / 12

Leaders in Amazon data A B item degree item degree 1000 ordinary user 500 0 0 1000 time 2000 3000 1000 temporal leader 500 0 0 1000 time 2000 3000 Black bars: popularity of collected items when they are collected. Blue bars: final popularity of collected items. Red circles: discoveries. Matúš Medo (Uni Fribourg) Temporal patterns... 10 / 12

The game changer Network growth model with to rules reproduces the real data patterns 1 Followers choose items driven by k i (t)d R (t) 2 Leaders choose items driven by f i (t)d R (t) Matúš Medo (Uni Fribourg) Temporal patterns... 11 / 12

The game changer Network growth model with to rules reproduces the real data patterns 1 Followers choose items driven by k i (t)d R (t) 2 Leaders choose items driven by f i (t)d R (t) Model data poses a puzzle to classical ranking algorithms fraction of leaders in top 50 1.0 0.8 0.6 0.4 0.2 0.0 bi-hits surprisal random 0 500 1000 1500 2000 number of leaders Matúš Medo (Uni Fribourg) Temporal patterns... 11 / 12

The game changer Network growth model with to rules reproduces the real data patterns 1 Followers choose items driven by k i (t)d R (t) 2 Leaders choose items driven by f i (t)d R (t) Model data poses a puzzle to classical ranking algorithms fraction of leaders in top 50 1.0 0.8 0.6 0.4 0.2 0.0 bi-hits surprisal random 0 500 1000 1500 2000 number of leaders Reason: Insightful choices of the leaders are copied by the followers. All users ultimately collect items of the same fitness and an algorithm acting on a static data snapshot cannot distinguish them. Solution: Algorithms that take time into account adequately. Matúš Medo (Uni Fribourg) Temporal patterns... 11 / 12

Coming back to the guiding questions: Are the implicit assumptions of centrality measures justified in scientometrics? It doesn t seem to be the case! Are altmetrics shallow? Build on the community structure of science! (PLoS ONE 9, e112022, 2014) Thank you for your attention