Information Dissemination in Social Networks under the Linear Threshold Model

Size: px

Start display at page:

Download "Information Dissemination in Social Networks under the Linear Threshold Model"

Kerry Holt
6 years ago
Views:

1 Information Dissemination in Social Networks under the Linear Threshold Model Srinivasan Venkatramanan and Anurag Kumar Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore January, 2011 Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

2 Outline of Tutorial 1 Social Networks and Applications 2 Mathematical Model 3 Recursive Analytical Expressions 4 Examples 5 The HILT Model: ODE Approximation 6 Final Remarks Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

3 Outline of Talk 1 Social Networks and Applications 2 Mathematical Model 3 Recursive Analytical Expressions 4 Examples 5 The HILT Model: ODE Approximation 6 Final Remarks Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

4 Social Networks A collection of individuals with pair-wise relationships between them Naturally modelled as a graph Recently much interest in the context of information and communication technologies World-wide-web link networks Online social networks, -contact networks Citation networks and coauthorship networks Forwarding in mobile opportunistic networks An important application area for Network Science Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

5 Mobile Wireless Opportunistic Networks Disruption (or Delay) Tolerant Networks (DTN) People or vehicles carry mobile communication devices Links form when such nodes come in contact Data moves by Store-and-Carry-Forward (much like a postal system) Suitable for some services, e.g., asynchronous messaging, content distribution (e.g., news) Pocket Switched Networks (PSN) Mobility models Contact models Not every physical contact may lead to link formation Communication link formation can depend on existence of a social link Forwarding models Not every link formation should lead to message transfer Message forwarding should depend on who has been met and his/her relation to the destination in a social network Recall: greedy geographical forwarding! Hence, need to bring in the social network model above the traditional DTN model Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

6 Inuence Spread, Information Dissemination, Web Mining One of the interesting problems to consider is modeling and analyzing spread of information among the nodes in the network Applications Identifying most inuential nodes in the network for viral marketing Characterize the spread of epidemics across the network Propose suitable vaccination and quarantine schemes to impede the spread of epidemics Recent study facilitated by massive quantities of data available for empirical analysis, from social media sites. The recent DARPA Network Challenge [1] tested the ability of such massive ad-hoc teams to solve real-world problems, with applications to disaster management. Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

7 Role of Network Science: Characterising Graph Structure To characterize information spread, it is essential to know the underlying graph structure. In most scenarios, it is not practically feasible to obtain/work with the entire network data. Random graphs were used to model such networks in the past, but in reality, there are imperceptible rules rather than randomness that drives the network structure. Network science allows us to incorporate random graphs along with models of graph evolution (based on local interaction) to obtain a more realistic model of the network. Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

8 Outline of Talk 1 Social Networks and Applications 2 Mathematical Model 3 Recursive Analytical Expressions 4 Examples 5 The HILT Model: ODE Approximation 6 Final Remarks Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

9 The Social Network Model The social network is modeled by a directed weighted graph N = (V, E) Edge weights w i,j give a measure of inuence of node i on node j Activation process We begin with an initial set of active nodes A 0 At each step k, new nodes are added to the active set A k by the inuence of their neighbours Nodes once activated cannot become inactive (progressive case) This goes on until we reach a terminal set A S. Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

10 wm,j Activation Model: The Linear Threshold Model We require that i j w i,j 1 At the beginning, each node j randomly chooses a threshold Θ j distributed uniformly over [0,1] The thresholds are independent across the nodes At step k, a node j gets activated if, it had been inactive until step k 1 and i A k 1 w i,j Θ j j w i,j wk,j i k m i w i,j 1, j i.e., the total inuence of the active nodes into j exceeds the threshold of j Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

11 Literature Survey Threshold models to explain collective behaviour were rst put forward by Granovetter in [2], where he discussed the spread of binary decisions, among a group of rational agents (e.g. voting models) Domingos and Richardson [3] studied information diusion under the viral marketing framework, and proposed the combinatorial optimization problem of nding the most inuential nodes Kempe et al. [4] studied the inuence spread problem under the LT and IC (independent cascade) model Studied the problem of obtaining the optimal initial set of size K (in the sense of maximising the expected size of the nal active set) Showed that the problem is NP-hard The objective function is monotone and submodular, and hence the greedy algorithm has an approximation factor of (1 1 e ) Recent works include a general framework for cost eective outbreak detection [5], where they exploit the submodularity property to propose a lazy forward algorithm to achieve near greedy algorithm performance Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

12 Our Contributions We consider only the LT activation model Recursive analytical expressions for the expected information spread achieved by an initial set under the Linear Threshold model Interpretation in terms of acyclic paths in a certain DTMC Insights into information spread in some simple network topologies, such as the star, ring, and certain special mesh networks Homogeneous Inuence LT model (HILT): by using Kurtz's theorem, we show that, an appropriately scaled version of the inuence process, converges to a system of ODEs Provide explicit formulas for trajectories followed by the inuence process Simple design formulas can be obtained Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

13 Notation N : weighted directed graph of the entire social network w i,j : edge weights of N indicating inuence from i to j W: inuence matrix with w i,j as entries Θ j : random threshold chosen by j uniformly from [0, 1] b j (A): = i A w i,j, total inuence into node j from set A A 0 : Initial active set A k : Set of all active nodes at time step k D k : Set of infectious nodes at time step k S: = arg min k {A k = A k 1 } (k): = P (N ) (j D j k A 0 = A) (N,A) g (N,A) g : = P (N,A) (j A j S ) σ (N,A) : = E (N,A) [ A S ] Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48 A 0 A k 1 A k D k N

14 Outline of Talk 1 Social Networks and Applications 2 Mathematical Model 3 Recursive Analytical Expressions 4 Examples 5 The HILT Model: ODE Approximation 6 Final Remarks Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

15 (N,A) Activation Probabilities g (k) 1/3 j The following key lemma writes the activation probabilities g (N,A 0) (k) j in recursive form The i.i.d. uniformly distributed activation threshold plays a key role in the proof Lemma 1 j A 0, (N,A0) (a) g (N,A0) j (0) = 1 (b) g j (k) = 0, for all k > 0 2 j / A 0, (N,A0) (a) g j (0) = 0 (N,A0) (b) g j (k) = l N \{j} (N \{j},a0) g l (k 1) w l,j, for all k > 0 Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

16 (N,A) Activation Probabilities g (k) 2/3 j Proof. 1(a), 1(b) and 2(a) are obvious For 2(b), since A 0 N \{j}, ( ) g (N,A 0) (k) = P (N \{j},a 0) b j j (A k 2 ) < Θ j b j (A k 1 ) Since D k 1 = A k 1 \A k 2, and Θ j is chosen uniformly from [0,1], we can write, g (N,A 0) (k) = E (N \{j},a0) [b j j (D k 1 )] [ ] = E (N \{j},a 0) I {l Dk 1 }w l,j = l N \{j} l N \{j} g (N \{j},a 0) (k 1) w l l,j Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

17 (N,A) Activation Probabilities g (k) 3/3 j The previous result yields g (N,i) j (k) = l 1 i,j l 2 i,j,l 1... l k 1 i,j,l 1,l 2,...l k 2 w i,l1 w l1,l 2... w lk 1,j i.e., starting with A 0 = {i}, the probability that j is activated at Step k is the sum of the product of path-weights along all k-hop acyclic paths starting at i and ending at j (N,A) Further, g (k) is just the sum of the weights of such paths that j avoid the set A 0 = A (except for the initial node) Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

18 (N,i) Total Expected Activation due to {i}: σ Theorem Given a social network N, with inuence matrix W, the total inuence of any node i in the network under the LT model is given by σ (N,i) = 1 + w i,j σ (N \{i},j) (1) j N \{i} Proof. Note that, σ (N,i) = g k=0 j N (N,i) j (k) In the above expression, by substituting using the previous lemma recursively and rearranging summations, we get the theorem. Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

19 Total Expected Activation due to Set A 0 : σ (N,A 0) Theorem Given a network N with inuence matrix W and an initial set A 0, for all i A 0, dene sub-networks N A 0 = {N \A i 0 } {i}. Then the expected inuence of the initial set A 0 is given by, σ (N,A 0) = σ (N A 0,i) i i A 0 (2) Proof. The proof follows by substituting the g (N,A 0) (k) expressions and noting j that the edge weights {w i,j, j A 0 } do not have any eect on σ (N,A0). This allows us to split the problem of evaluating inuences sub-problems each involving only one node from A 0. Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

20 (N,A) Illustrations of the Recursive Calculations for σ j i j j k 1+w i,j k +w i,k k Singleton Initial Set {i} i i k + k Initial Set A 0 = {i, k} Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

21 Replacing an Initial Set with a Supernode Theorem Given a network N, with inuence matrix W, an initial set A 0 can be replaced by a supernode X as shown below. Then, σ (N,A 0) is equal to the inuence of X in the modied network, with X being counted as X nodes instead of 1. i j k X X {i, j, k} v / X, w X,v = w i,v + w j,v + w k,v w v,x = w v,i + w v,j + w v,k Replacing the initial set with a supernode Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

22 A Discrete Time Markov Chain P := W T is a row substochastic matrix with zero diagonal terms Make stochastic by additing p j,j values, appropriately Interpret P as the transition matrix of a DTMC {X k }, obtained by reversing the edges in the social network Dene c k (j A 0 ) := P({X m N \A 0, 0 m < k, X k A 0, X 0 X 1 X k } X0 = j) the probability of the DTMC reaching A 0 for the rst time via a k step acyclic path, given that X 0 = j Also, dene c(j A 0 ) := c k (j A 0 ) i.e., given that we start from state j, the probability of hitting the set A 0 for the rst time through an acyclic path Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48 k=0

23 Inuence Spread Analysis via the DTMC j i j i p i,j = w j,i k k (N,i) g k = w i,k + w i,j w j,k c(k i) = p k,i + p k,j p j,i Social Network Equivalent Markov Chain In general, it can be shown that g (N,A 0) j (k) = c k (j A 0 ) g (N,A 0) j = c(j A 0 ) This viewpoint leads to an alternative proof of the monotonicity and submodularity of σ (N,A 0) Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

24 Outline of Talk 1 Social Networks and Applications 2 Mathematical Model 3 Recursive Analytical Expressions 4 Examples 5 The HILT Model: ODE Approximation 6 Final Remarks Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

25 Star Topology Consider the star topology network N with N nodes including the hub e.g., a central authority through which all inuence/information must be disseminated α( 1) : inuence of the hub on peripheral nodes β( 1 ) : inuence of each peripheral node on the hub N 1 Star topology Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

26 Star Topology: Analysis using Recursive Expressions What is the optimal initial set of size K? Let H(K) : set with one hub node and K 1 peripheral nodes. H(K) : set with K peripheral nodes The recursive expressions yield σ (N,H(K)) = K + α(n K) σ (N, H(K)) = K + Kβ ( 1 + α(n K 1) ) There exists α, such that σ (N, H(K)) > σ (N,H(K)) for α < α, i.e., it is optimal to exclude the hub from the initial set α = (N K) 1 β K K(N K 1) Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

27 Ring Topology Consider the ring topology N with N homogeneous nodes α 0.5 denotes the inuence of any node on each of its neighbours Ring topology Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

28 Ring Topology: Analysis using Recursive Expressions Let A(K) be an initial set of K nodes (a 1, a 2,... a K ) : Indices of the chosen nodes in the ring (l 1, l 2,..., l K ) be the number of nodes in between the chosen nodes in the ring, i.e. l 1 = a 2 a 1 1, l 2 = a 3 a 2 1 and so on Then the earlier analysis yields σ (N,A(K)) = K + 2 α ( K ) K α l i 1 α In order to maximize the inuence, we see that the K nodes should be distributed uniformly over the ring Argument via majorisation and Shur convexity Also, with α = 0.5, for large N and small K, the inuence of A(K) grows as 3K i=1 Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

29 Degree Based Model Start with an undirected graph without cycles, whose adjacency matrix is given by M d j : the degree of node j Dene W as follows w i,j = m i,j /d j j 1 d(j) i 1 d(j) k m Node Degree Based Model Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

30 Degree Based Model: Example Calculation of Inuence j σ (N,j) = ( ( )) = 3 = d j + 1 Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

31 Degree Based Model: General Result on Inuence Theorem Consider an acyclic undirected network N with degree based inuence weights (as dened). Then, for any node i N, σ (N,i) = d i + 1 Note that the local property (a node's degree) governs its global property (inuence on the entire network) The most inuential node is the one with the highest degree, and the set of top-k inuential nodes has high correlation with the set of top-k high degree nodes Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

32 Uniform Inuence and Susceptance Model UISLT: Uniform Inuence and Susceptance LT Model Two parameters α i and β i associated with the node i Measures of the level of inuence and susceptance of the node i Dene w i,j = α i β j, for allj i α i, β i, 1 i N, are chosen such that, j i w j,i 1 Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

33 Uniform Models k α j β k j i α j β i j i α j 1 β i, i Uniform Inuence and Susceptance (UISLT) model v α v β i α i β i α i β i α i α i i i j α i 1, j Uniform Inuence (UILT) model β i β i i β i 1 N 1, i Uniform Susceptance (USLT) model Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

34 USLT and UILT Models: Optimal Initial Set USLT: The optimal A 0 in this case, is found to be the set of K nodes with least β i values, i.e. the least susceptible ones UILT: The optimal A 0 in this case, is found to be the set of K nodes with highest α i values, i.e. the most inuential ones For these models, this turns out to be equivalent to picking the K nodes with the top-k values of the stationary p.m.f. of P = W T Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

35 Comments on the PageRank Algorithm Google's PageRank algorithm, in its simplest form, ranks pages by calculating the stationary probability over the inuence graph of web-pages, by assuming a random surfer model Thus, PageRank yields the optimum solution for UILT and USLT We have found general UISLT examples where PageRank can perform poorly Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

36 Outline of Talk 1 Social Networks and Applications 2 Mathematical Model 3 Recursive Analytical Expressions 4 Examples 5 The HILT Model: ODE Approximation 6 Final Remarks Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

37 Homogeneous Inuence Linear Threshold (HILT) Model All edge weights = γ, γ 1 N 1 HILT model Consider a network of N nodes with a complete graph Each edge carries an equal weight γ 1. N 1 Dene h γ (N) (k) := σ (N,A), for all A of size k By using the analytical expressions derived earlier, we obtain h (N) γ (k) = k[1 + (N k)γ[1 + (N k 1)γ[1 + We would like to approximate the evolution of the inuence process We study this via a uid limit in the large N regime Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

38 The Activation Markov Chain Let A(k) and D(k) be the sizes of the active and infectious sets at step k 0 Dene B(k) = A(k 1) B(k) is the size of the set of active nodes that have already exercised their inuence (no longer infectious) Consider the process (B(k), D(k)), k 0, with B(0) = 0 and D(0) = A(0) Using the uniformly distributed i.i.d. activation thresholds B(k + 1) = B(k) + D(k) P(D(k + 1) = l (B(i), D(i)), 0 i k 1, (B(k), D(k)) = (b, d)) ( )( ) N b l ( d γd = 1 γd ) (N b d l) l 1 γb 1 γb = P(D(k + 1) = l B(k) = b, D(k) = d) Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

39 Scaling the Process (B(k), D(k)) Dene a scaled Markov process (B N (k), D N (k)) Can be viewed as evolving over minislots of width 1, whereas the N original process evolves over unit steps In each minislot, each node in D N (k) decides to spread its inuence with probability 1 or defer with probability 1 1 N N In the former case, it contributes inuence of γ and then moves to the set B N (k + 1), else it stays in D N (k + 1) set Dening B N (k) = B N (k), D N (k) = DN (k), the evolution equations N N can be written as follows: B N (k + 1) = B N (k) + D N (k) N + Ỹ N (k + 1) D N (k + 1) = N 1 D N (k) + Γ D N (k) N N 1 Γ B N (k) (N B N (k) D N (k)) + Z N (k + 1) where we have asserted that γ scales with N as Γ N Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

40 ODE approximation to HILT model The hypotheses of Kurtz's theorem [6] (uniform convergence of drift to a Lipschitz function, and condition on noise variance) Applying ( the theorem we conclude that for every ɛ > 0 and T > 0 P ( B N ( Nt ), D N ( Nt ) ) ( b(t), d(t) ) ) N > ɛ 0 sup 0 t T where (b(t), d(t)) is the unique solution of the ODE (with given (b(0), d(0))) ḃ(t) = d(t) d(t) = d(t) + Γd(t) (1 b(t) d(t)) 1 Γb(t) Solving the ODE for b(0) = 0, d(0) = d 0, dening r = 1 Γ Γd 0, we obtain b(t) = d 0 r d 0 r e rt d(t) = d 0 e rt Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

41 Convergence of B N ( Nt ), D N ( Nt ) to (b(t), d(t)) The trajectory of the ODE is plotted along with sample paths of the scaled process B N ( Nt ), D N ( Nt ) for N=50,100,500,1000 B N (k)/n,d N (k)/n,b(t),d(t) Convergence of scaled HILT process to the ODE solution Illustration of Kurtz s theorem 0.8 N=500 b(t) N= N=1000 N= Time (t) d(t) HILT parameters: Γ = 0.9 d 0 = 0.2 Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

42 Approximation of the Original Process by the ODE Simulation of the original process ( B(k), D(k) ) compared with the N N ODE solution (b(t), d(t)) b(t), d(t), B(k)/N, D(k)/N Comparison of ODE approximation (b(t),d(t)) with the original process (B(k),D(k)) D(k)/N B(k)/N b(t) d(t) Time (t) Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

43 HILT Network: Applications of the ODE Solution We have an approximation to the trajectory of the activation process Given that we start with d 0 fraction of nodes in the infectious set, the nal fraction of activated nodes (b ) is d 0 r where r = 1 Γ + Γd 0. Given b, the optimal d 0 to be chosen is given by, d 0 = b (1 Γ) 1 b Γ For large N, with Γ < 1 we cannot inuence the entire population (i.e., b = 1) unless we start with the entire population active (i.e., d 0 = 1) But if Γ = 1 then b = 1 provided d 0 > 0 We can also answer questions on time-constrained inuence spread, where we are interested in the value of a(t ) = b(t ) + d(t ), given nite T Time T at which a(t ) = α, given d 0 and Γ Optimal d 0 to start with so that a(t ) α Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

44 Outline of Talk 1 Social Networks and Applications 2 Mathematical Model 3 Recursive Analytical Expressions 4 Examples 5 The HILT Model: ODE Approximation 6 Final Remarks Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

45 Other Related Work We have used the Acyclic Path Probabilities interpretation to provide an alternative proof to the monotonicity and submodularity of the function σ (N,A 0) Using the recursive expression, we have identied special cases where the PageRank algorithm is optimal for the inuence maximization problem We have also given a heuristic algorithm (G1-sieving) for Inuence maximization problem, based on insights from the recursive expression, which performs on par with the Greedy algorithm We have extended the work done on uid limit for HILT model to multi-class scenario and derived the approximating system of ODE Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

46 Future Work We can consider mobile nodes (as in a Delay-Tolerant Network), where nodes are able to transfer inuence only on meeting The problem can also be generalized to edge weights and threshold functions varying with time Finally, we can also study information dissemination with dierent activation processes, and on more generic networks to gain insights into the underlying mechanisms of information dissemination Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

47 References I Darpa Network Challenge Mark Granovetter. Threshold models of collective behaviour. American Journal of Sociology, Matt Richardson and Pedro Domingos. Mining knowledge-sharing sites for viral marketing. In ACM SIGKDD, D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of inuence through a social network. In ACM SIGKDD, Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

48 References II C. Guestrin C. Faloutsos J. Vanbriesen J. Leskovec, A. Krause and N. Glance. Cost-eective outbreak detection in networks. In ACM SIGKDD, Thomas G. Kurtz. Solutions of ordinary dierential equations as limits of pure jump markov processes, Srinivasan, Kumar (IISc, Bangalore) Information Spread in Social Networks 3 January, / 48

Web Structure Mining Nodes, Links and Influence

Web Structure Mining Nodes, Links and Influence 1 Outline 1. Importance of nodes 1. Centrality 2. Prestige 3. Page Rank 4. Hubs and Authority 5. Metrics comparison 2. Link analysis 3. Influence model 1.