Supplementary Information Activity driven modeling of time varying networks

Similar documents
Time varying networks and the weakness of strong ties

Epidemics in Complex Networks and Phase Transitions

A new centrality measure for probabilistic diffusion in network

Modeling face-to-face social interaction networks

The Spreading of Epidemics in Complex Networks

Epidemics and information spreading

Epidemics on networks

Oscillatory epidemic prevalence in g free networks. Author(s)Hayashi Yukio; Minoura Masato; Matsu. Citation Physical Review E, 69(1):

ECS 289 F / MAE 298, Lecture 15 May 20, Diffusion, Cascades and Influence

A BINOMIAL MOMENT APPROXIMATION SCHEME FOR EPIDEMIC SPREADING IN NETWORKS

Urban characteristics attributable to density-driven tie formation

SIS epidemics on Networks

arxiv: v2 [cond-mat.stat-mech] 9 Dec 2010

Network Science & Telecommunications

Introduction to SEIR Models

Reaction-diffusion processes and meta-population models in heterogeneous networks Supplementary information

Géza Ódor MTA-EK-MFA Budapest 16/01/2015 Rio de Janeiro

Notes on MapReduce Algorithms

Lecture VI Introduction to complex networks. Santo Fortunato

Transmission in finite populations

Lecture 10. Under Attack!

Géza Ódor, MTA-EK-MFA Budapest Ronald Dickman, UFMG Brazil 08/04/2015

Data Mining and Analysis: Fundamental Concepts and Algorithms

Comparative analysis of transport communication networks and q-type statistics

Analysis and Monte Carlo simulations of a model for the spread of infectious diseases in heterogeneous metapopulations

Die-out Probability in SIS Epidemic Processes on Networks

Virgili, Tarragona (Spain) Roma (Italy) Zaragoza, Zaragoza (Spain)

Thursday. Threshold and Sensitivity Analysis

INCT2012 Complex Networks, Long-Range Interactions and Nonextensive Statistics

CS224W: Analysis of Networks Jure Leskovec, Stanford University

arxiv:cond-mat/ v1 [cond-mat.dis-nn] 18 Feb 2004 Diego Garlaschelli a,b and Maria I. Loffredo b,c

Improved Bounds on the Epidemic Threshold of Exact SIS Models on Complex Networks

Cost and Preference in Recommender Systems Junhua Chen LESS IS MORE

arxiv:cond-mat/ v2 [cond-mat.stat-mech] 24 Apr 2004

Preventive behavioural responses and information dissemination in network epidemic models

Mathematical Analysis of Epidemiological Models: Introduction

Machine Learning and Modeling for Social Networks

On Susceptible-Infected-Susceptible Epidemic Spreading: an Overview of Recent Study

CS224W: Analysis of Networks Jure Leskovec, Stanford University

ECS 289 / MAE 298, Lecture 7 April 22, Percolation and Epidemiology on Networks, Part 2 Searching on networks

KINETICS OF SOCIAL CONTAGION. János Kertész Central European University. SNU, June

Phase Transitions of an Epidemic Spreading Model in Small-World Networks

Personalized Social Recommendations Accurate or Private

1 Complex Networks - A Brief Overview

Inferring the origin of an epidemic with a dynamic message-passing algorithm

A note on modeling retweet cascades on Twitter

arxiv: v2 [physics.soc-ph] 9 Jul 2016

Finding Rumor Sources on Random Trees

arxiv:physics/ v3 [physics.soc-ph] 28 Jan 2007

Networks as a tool for Complex systems

arxiv: v1 [physics.soc-ph] 24 Mar 2013

SOCIAL MEDIA IN THE COMMUNICATIONS CENTRE

Research Article Effects of Resource Limitations and Cost Influences on Computer Virus Epidemic Dynamics and Tipping Points

arxiv:cond-mat/ v1 [cond-mat.dis-nn] 4 May 2000

Introduction: What one must do to analyze any model Prove the positivity and boundedness of the solutions Determine the disease free equilibrium

Several ways to solve the MSO problem

Modeling Dynamic Evolution of Online Friendship Network

Network Infusion to Infer Information Sources in Networks Soheil Feizi, Ken Duffy, Manolis Kellis, and Muriel Medard

arxiv: v2 [physics.soc-ph] 5 Sep 2016

Deep Poisson Factorization Machines: a factor analysis model for mapping behaviors in journalist ecosystem

COOPERATIVE EPIDEMIC SPREADING IN COMPLEX NETWORKS

Quarantine generated phase transition in epidemic spreading. Abstract

arxiv:cond-mat/ v3 [cond-mat.dis-nn] 10 Dec 2003

Assortativity and Mixing. Outline. Definition. General mixing. Definition. Assortativity by degree. Contagion. References. Contagion.

Degree Distribution: The case of Citation Networks

Social Networks- Stanley Milgram (1967)

A node-based SIRS epidemic model with infective media on complex networks

How to distribute antidote to control epidemics

Social Influence in Online Social Networks. Epidemiological Models. Epidemic Process

F 99 Final Quiz Solutions

Modeling Social Media Memes as a Contagious Process

Balanced Allocation Through Random Walk

Supplementary Information Modeling human mobility responses to the large-scale spreading of infectious diseases

LAW OF LARGE NUMBERS FOR THE SIRS EPIDEMIC

arxiv: v1 [physics.soc-ph] 15 Dec 2009

Epidemic spreading is always possible on regular networks

Complex networks: an introduction Alain Barrat

Note on Influences of Resource Limitations and Transmission Costs on Epidemic Simulations and Critical Thresholds in Scale-Free Networks

Analytically tractable processes on networks

THRESHOLDS FOR EPIDEMIC OUTBREAKS IN FINITE SCALE-FREE NETWORKS. Dong-Uk Hwang. S. Boccaletti. Y. Moreno. (Communicated by Mingzhou Ding)

Epidemics on networks

Diffusion of information and social contagion

Spreading and Opinion Dynamics in Social Networks

A Study on Relationship between Modularity and Diffusion Dynamics in Networks from Spectral Analysis Perspective

Estimating the Exponential Growth Rate and R 0

Spreading of infectious diseases on complex networks with non-symmetric transmission probabilities

CHALMERS, GÖTEBORGS UNIVERSITET. EXAM for COMPUTATIONAL BIOLOGY A. COURSE CODES: FFR 110, FIM740GU, PhD

On Influential Node Discovery in Dynamic Social Networks

ECS 253 / MAE 253, Lecture 15 May 17, I. Probability generating function recap

Edge Removal in Random Contact Networks and the Basic Reproduction Number

Dynamics of a network-based SIS epidemic model with nonmonotone incidence rate

arxiv: v1 [cs.si] 14 Dec 2018

Fixed Point Analysis of Kermack Mckendrick SIR Model

Modeling, Analysis, and Control of Information Propagation in Multi-layer and Multiplex Networks. Osman Yağan

Forecast and Control of Epidemics in a Globalized World

Probabilistic Foundations of Statistical Network Analysis Chapter 3: Network sampling

Complex Social System, Elections. Introduction to Network Analysis 1

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Lecture and notes by: Alessio Guerrieri and Wei Jin Bloom filters and Hashing

Transcription:

Supplementary Information Activity driven modeling of time varying networks. Perra, B. Gonçalves, R. Pastor-Satorras, A. Vespignani May 11, 2012 Contents 1 The Model 1 1.1 Integrated network...................................... 2 2 Epidemic threshold 4 2.1 Case m = 1........................................... 4 2.2 Case m > 1........................................... 5 3 Persistence of links 5 4 Dataset Details 6 4.1 PRL Dataset.......................................... 6 4.2 Twitter Dataset........................................ 8 4.3 IMDb Dataset......................................... 8 1 The Model Let us consider initially disconnected individuals/nodes. We define for each agent i the activity potential x i as the probability that any social act/connection in the system has been engaged by the actor i. We define F(x) as the probability distribution that a randomly chosen agent, i, has activity potential x. To avoid possible divergences of the distribution close to 0 we fix a bound by imposing x [ɛ, 1]. Given the activity potential x i is natural to define the activity rate a i as: a i = ηx i, (1) where η is a rescaling factor defined such that the average number of active nodes per unit time in the system is η x. At each time step t the network G t is build starting from disconnected vertices (all the edges in the network are deleted). The links are created in the following way: With probability a i t the vertex i becomes active (fires) and generates m links that are connected to m other vertex selected randomly 1

. Perra, B. Gonçalves, R. Pastor-Satorras, A. Vespignani 2 With probability 1 a i t the node will not fire, but can however receive connections from other active vertices. 1.1 Integrated network We define the integrated network G as the union of all networks obtained in each time step, G = t=t t=0 G t. (2) Multiple edges or self-links are not allowed. For a given distribution of activity potential, F(x), we are interested in computing the degree distribution of the integrated network at time T, P T (k). Let us consider the number τ = T/ t of instantaneous networks generated up to time T. The number of times that the vertex i will be active is given by a binomial distribution with an average τa i t = Ta i. At time T the average number of active nodes will be while in the single instantaneous network will be Ta i = T a, (3) i t = a t η x t. (4) Each active node will create m links. Therefore, the average number of edges per unit time will be on average: E t = mη x (5) leading to an average degree per unit time k t = 2E t = 2mη x. (6) The instantaneous network will be composed by a set of stars, the vertices that were active at that time step, with degree larger than or equal to m, plus some vertices with low degree. The structure of the integrated network will be however much complex. Consider this integrated network at time T. The degree of each one of its nodes can be written as k T (i) = k out T (i) + k in T (i), where the out-degree k out T (i) corresponds to the links emanating from i due to its becoming active, while the in-degree k in T (i) is due to the links that have arrived to i from other active nodes. Let us focus on the out-degree. In the time interval T, node i will have tried to send in average Tma i edges. ot all those edges will contribute to its integrated out-degree, just only those arriving to different nodes (no multiple links are allowed in the integrated network). The out-degree can thus be computed by making an analogy to the Polya urns problem: it will be equal to the number of different balls extracted from a urn with balls, performing Tma i extractions. The probability of extracting d balls will be given by ( ) P(d) = p d (1 p) ( d) (7) d

. Perra, B. Gonçalves, R. Pastor-Satorras, A. Vespignani 3 where ( p = 1 1 ) 1 Tmai (8) is the probability of extracting at least one ball in the urn. The average value is then d = p. (9) Therefore the average out-degree of a vertex i in the integrated network can be estimated as k out T (i) = (1 e Tma i/ ) (10) in the limit of large and small T/. The in-degree will come from the rest of vertices that are active and have sent connections to i, without receiving them from it. In T time steps, vertex i will have fired Ta i times in average. The probability that a vertex has not received any connection form i will then be ( 1 1 ) mtai exp[ mta i /], (11) where we assume again large and small T/. The average number of vertices that have fired and are not connected to i with and egde that emanated from i will then be T a exp[ mta i /]. These nodes have m possibilities to reach at random i, each with probability 1/. The number of connections that have reached this node will then be on average mtη x exp[ mta i /]. The degree of the node i will be simply the sum of these to contributions: k T (i) = k out T (i) + k in T (i) = (1 e Tmai/ ) + Tmη x e Tma i/ [ ( = 1 1 mη x T ) ] e Tmai/ ) (12) (1 e Tma i/ ) = (1 e Tmηx i/ ), (13) again for small T/. From the last relation we can write the activity potential x as an effective function of the integrated degree k, i.e.: x(k) = ( ηmt ln 1 k ). (14) At the probabilistic level we can use the relation P T (k)dk F(x)dx, where P is the degree distribution of the integrated, network after T time steps, to finally obtain P T (k) F[x(k)] dx(k) dk = 1 [ 1 Tmη 1 k F ( ηmt ln 1 k )]. (15) In the limit of small k/ (for not very large values of T) we can expand the logarithm, to obtain the simplified expression [ ] 1 k P T (k) Tmη F. (16) Tmη That is, we obtain an important relation binding the functional form of the degree distribution to the activity distribution of the nodes.

. Perra, B. Gonçalves, R. Pastor-Satorras, A. Vespignani 4 2 Epidemic threshold Let us consider the SIS epidemic compartmental model, characterized by a transition probability λ and a recovery time µ 1, spreading in the dynamical network generated as discussed above. Let us assume a distribution of activity a of nodes given by a general distribution F(x) as before. At a mean-field level, the epidemic process will be characterized by the number of infected individuals in the class of activity a, at time t, namely I t a. 2.1 Case m = 1 The number of infected individuals of class a at time t + t given by: I t+ t a = µ ti t a + I t a + λ( t a I t a)a t da It a + λ(t a I t a) da It a a t, (17) where a is the total number of individuals with activity a. In Eq. (17), the third term on the right side takes into account the probability the a susceptible of class a is active and get the infection getting a connection from any other infected individual (summing over all different classes), while the last term takes into account the probability that a susceptible, independently of his activity, gets a connection from any infected active individual. ow summing on all the classes we get (ignoring the second order terms): dai t+ t a = I t+ t = I t µ ti t + λ a I t t + λθ t t, (18) where θ t = da I t a a. We can get another expression multiplying both sides of Eq. (17) by a and integrating, to obtain θ t+ t = θ t µθ t t + λ a 2 I t t + λ a θ t t. (19) In the limit t 0, we can write Eqs. (17) and (19) in a differential form: t I = µi + λ a I + λθ, (20) t θ = µθ + λ a 2 I + λ a θ. (21) The Jacobian matrix of this set of linear differential equations takes the form ( µ + λ a λ J = λ a 2 µ + λ a ), and has eigenvalues Λ (1,2) = λ a µ ± λ a 2. (22) The epidemic threshold is obtained requiring the largest eigenvalues to be larger the 0, which leads to the condition for the presence of an endemic state: λ µ > 1 a + a 2 + O( 1 ) (23)

. Perra, B. Gonçalves, R. Pastor-Satorras, A. Vespignani 5 The order 1/ is present because we are not considering events in which two infected nodes choose each other for connection. An important epidemiological quantity is the reproductive number R 0, defined as the average number of secondary cases generated by a primary case in an entirely susceptible population. For a SIS model we have R 0 = β µ λ k (24) µ where β = λ k is the per capita spreading rates that takes into account the rate of contacts of each individual. Considering this in the equation (23) at the first order we get as a threshold in terms of the reproductive numner: 2.2 Case m > 1 R 0 > R C 0 = 2 a a + a 2 = 2 x x + x 2 = 2 (25) 1 + x 2 x 2 In this general case we can use the same machinery as above. We have just to add the fact that each active agent can now make more then one connection (m at most) at each trial. In this case the probability to get a connection changes: ( 1 1 1 1 ) m m Using this fact, we can work out the corresponding set of differential equations, whose Jacobian matrix is now: ( ) µ + λm a λm J m = λm a 2, µ + λm a with eigenvalues (26) Λ (1,2) = mλ a µ ± λm a 2. (27) That is, the eigenvalues in the case m > 1. Thus, the epidemic threshold in the general case can be simply written as R 0 > R C 0 = 1 2m a m a + a 2 = 2 x x + (28) x 2 3 Persistence of links In the model, at each time step a random markovian network is created. There is no memory. Each link created at time step t t is deleted and recreated randomly at time step t. This is an oversimplification of real social interaction. In the real world we can imagine that individuals establish social connections preferentially within an limited circle of friends, and that they have a memory. This implies that a certain number of connections are active for more than one single time window t, they are persistent. In Figure S1 we show the distribution of persistence for the APS dataset, in Figure S2 for Twitter and in Figure S3 for the IMDb. As it becomes clear from the plots,

. Perra, B. Gonçalves, R. Pastor-Satorras, A. Vespignani 6 10 6 10 4 ρ( t) 10 2 1 10 100 t Figure S1: For the APS database we plot the distribution of connections active for t consecutive time steps: persistence of links. In this case one time window correspond to one year a relevant number of connections stay active across different time windows. In our model we consider instead a mean-field approach where temporal correlations between users are neglected. However it is easy to generalize our model in order to take into account the persistence of links and its effect in the spreading of an epidemic disease. This will be a matter of future work. 4 Dataset Details We study three different dataset: the collaborations in the journal "Physical Review Letters (PRL) published by the APS 1, the message exchanged on Twitter and the activity of actors in movies and TV series as recorded in the Internet Movie Database (IMDb) 2. 4.1 PRL Dataset In this database the network representation considers each author of a PRL article as a node. An undirected link between two different authors is drawn if they collaborated in the same article. We filter out all the articles with more than 10 authors in order to focus our attention just on small collaborations in which we can assume that the social components is relevant. We consider the period between 1960 and 2004. In this time window we registered 71.583 active nodes and 261.553 connections among them. In this dataset is natural defining the activity rate, a, of each author as the number of papers written in a specific time window t. 1 The data are available here: http://prx.aps.org/node/3966 2 The data are available here: http://www.imdb.com/interfaces

. Perra, B. Gonçalves, R. Pastor-Satorras, A. Vespignani 7 10 6 ρ( t) 10 4 10 2 10 1 10 2 t Figure S2: For the Twitter database we plot the distribution of connections active for t consecutive time steps: persistence of links. In this case one time window correspond to one day 10 8 10 6 ρ( t) 10 4 10 2 10 1 10 2 t Figure S3: For IMDb we plot the distribution of connections active for t consecutive time steps: persistence of links. In this case one time window correspond to one year.

. Perra, B. Gonçalves, R. Pastor-Satorras, A. Vespignani 8 10-2 P(k) 10-4 10-6 1 day 5 days 10 days 30 days 10 1 10 2 k Figure S4: Considering the subset of users active on the first 31 days of our twitter dataset we plot the degree distribution of the resulting networks for different integrating time windows. 4.2 Twitter Dataset Having been granted temporary access to Twitter s firehose we mined the stream for over 6 months to identify a large sample of active user accounts. Using the API, we then queried for the complete history of 3 million users, resulting in a total of over 380 million individual tweets covering almost 4 years of user activity on Twitter. In this database the network representation considers each users as a node. An undirected link between two different users is drawn if they exchanged at least one message. We focus our attention on 9 months during 2008. In this time window we registered 531.788 active nodes and 2.566.398 connections among them. In this dataset we define the activity rate of each user as the number of messages sent in a time window t. In figure S4 we show the degree distribution of the subgraph obtained by integrating over 1, 5, 10, 30 days. The subset of nodes considered are those active in the the first months of the dataset (January 2008). It is clear how the network integrated in one day has a sparse nature. Increasing the time window heterogeneous connectivity patterns start to emerge. 4.3 IMDb Dataset In this database the network representation considers each actor as a node. An undirected link between two different actors is drawn if they collaborated in the same movie/tv series. We focus on the period between 1950 and 2010. During this time period we registered 1.273.631 active nodes and 47.884.882 connections between them. A natural way to define the activity rate in this dataset is to consider the number of movies acted by each actor in a specific time window t. In figure S5 we show the degree distribution of the subgraph obtained by integrating over 1, 5, 10, 30 years. The subset of actors is formed by those active in the period 1970 1980. Even in this case is clear how increasing the integrating time window the level of heterogeneity increases.

. Perra, B. Gonçalves, R. Pastor-Satorras, A. Vespignani 9 10-2 P(k) 10-4 10-6 1 year 5 years 10 years 30 years 10-8 10 1 10 2 10 3 k Figure S5: Considering the subset of actors active during the period 1970 1980 in the IMDb data we plot the degree distribution of the resulting networks for different integrating time windows.