Influence Maximization in Big Networks: An Incremental Algorithm for Streaming Subgraph Influence Spread Estimation


Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015)

Influence Maximization in Big Networks: An Incremental Algorithm for Streaming Subgraph Influence Spread Estimation

Wei-Xue Lu, Peng Zhang, Chuan Zhou, Chunyi Liu, Li Gao
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
Center on Quantum Computation & Intelligent Systems, Univ. of Technology Sydney, Australia

Abstract

Influence maximization plays a key role in social network viral marketing. Although the problem has been widely studied, it is still challenging to estimate influence spread in big networks with hundreds of millions of nodes. Existing heuristic and greedy algorithms incur heavy computation cost in big networks and are incapable of processing dynamic network structures. In this paper, we propose an incremental algorithm for influence spread estimation in big networks. The incremental algorithm breaks down big networks into small subgraphs and continuously estimates influence spread on these subgraphs as data streams. The challenge is that subgraphs derived from a big network are not independent, and MC simulations on each subgraph (defined as snapshots) may conflict with each other. In this paper, we assume that different combinations of MC simulations on subgraphs generate independent samples. In so doing, the incremental algorithm on streaming subgraphs can estimate influence spread with fewer simulations. Experimental results demonstrate the performance of the proposed algorithm.

1 Introduction

Social networks are of vital importance in information diffusion and viral marketing. Influence maximization in social networks is defined as finding a subset of nodes that can trigger the largest number of users propagating the given information.
The problem can be formulated as a discrete optimization problem under the independent cascade (IC) model and the linear threshold (LT) model [Kempe et al., 2003]. It has been proved NP-hard, and a greedy algorithm is favored. However, heavy Monte-Carlo simulations are needed to estimate the influence spreads of different seed sets at each iteration. As social networks continuously increase in size, greedy algorithms are incapable of processing big networks. Many efforts have been made to explore scalable algorithms in big networks. An intuitive idea is to use heuristic algorithms, e.g., [Chen et al., 2009], [Chen et al., 2010], IRIE [Jung et al., 2011], and Group- [Liu et al., 2014]. However, these algorithms are not robust with respect to network structures. On the other hand, many advanced greedy algorithms, e.g., [Leskovec et al., 2007], [Goyal et al., 2011], [Zhou et al.], have been proposed to speed up seed selection. These methods focus on pruning unnecessary Monte-Carlo simulations when selecting new influential nodes. Unfortunately, they are still incapable of processing big networks with hundreds of millions of nodes because of the unavoidable cost of simulating influence cascades. Moreover, all these greedy methods are designed for static networks and cannot be generalized to the dynamic networks that commonly occur in reality.

In this paper, we propose an incremental algorithm for estimating the expected influence spread of seed nodes and evaluate its performance on influence maximization. Specifically, the algorithm breaks down a big network into a large number of small subgraphs, and then processes these subgraphs as data streams. The results on the subgraphs are joined to recover the global result on the whole network. The nature of incremental processing also enables the algorithm to handle dynamic networks whose structures change over time. The main challenge of the problem is that small subgraphs derived from a big network are not independent.
For example, as in Figure 1, the two subgraphs on the right-hand side share nodes 4, 5, 7; even worse, two subgraphs may share common edges. In this case, MC simulation results on subgraphs (defined as snapshots) may not be directly addable, or may even conflict with each other. In this work, we assume that the edge sets of different subgraphs are disjoint, and each subgraph is converted into a class of Strongly Connected Components (SCCs). SCCs on subgraphs are joined to form snapshots, and hence SCCs, of the whole network, and different combinations of simulations on subgraphs generate independent samples. In experiments, we demonstrate that the solution outperforms heuristics and existing MC-simulation-based methods.

The merits of the proposed incremental algorithm are summarized as follows: A big network is broken down into subgraphs and the coin-flip technique [Kempe et al., 2003] is applied to generate snapshots on each subgraph. Subgraphs can be processed in parallel. Snapshots on subgraphs are converted into a class of

Strongly Connected Components (SCCs). Snapshots on different subgraphs are joined to restore the global result on the original network. The number of snapshot joins increases polynomially with the number of subgraphs, which significantly reduces the complexity of MC simulations. Snapshots are incrementally maintained and updated such that replicated simulations can be avoided when the seed set changes.

2 Related Work

Data-intensive applications such as data streams motivate incremental learning algorithms. Incremental learning is advantageous when dealing with very large or non-stationary data. Existing incremental learning algorithms can be categorized into two groups: approximate incremental learning [Kivinen et al., 2004] and exact incremental learning [Laskov et al., 2006]. Typical examples include the incremental SVM [Poggio, 2001] and PCA models [Balsubramani et al., 2013]. However, these models were proposed to handle vectorial training data under the Independent and Identically Distributed (IID) assumption. Incrementally learning from subgraph data without the IID assumption has not been addressed before.

In the domain of network influence maximization, most existing models cannot handle big networks with hundreds of millions of nodes due to heavy Monte-Carlo computation. The typical Cost-Effective Lazy Forward (CELF) algorithm [Leskovec et al., 2007] greatly reduces the number of influence spread estimations and achieves a 700-fold speedup over previous greedy algorithms. Unfortunately, these improved greedy algorithms are still inefficient due to too many Monte-Carlo simulations for influence spread estimation. StaticGreedyDU [Cheng et al., 2013] reuses snapshots generated from a given graph and reduces the number of simulations by two orders of magnitude, while its performance is almost the same as that of the original greedy algorithm. Recent work [Ohsaka et al., 2014] includes a pruned BFS method, and there are also community-based methods, e.g.,
[Wang et al., 2010], [Chen et al., 2014]. Unlike these works, our method scales well to big networks by breaking down the big network into small subgraphs for fast spread estimation.

3 Preliminaries

Let G = (V, E) be an undirected graph with a node set V of size |V| and an edge set E of size |E|. Without loss of generality, we use the independent cascade (IC) model as the propagation model. A node subset S ⊆ V is called a seed set if we first activate the nodes in S. Given a seed set S, its influence spread σ(S) is defined as the expected number of nodes eventually activated. Formally, the influence maximization problem is to find the optimal seed set S* such that S* = argmax_{S ⊆ V, |S| = k} σ(S), where |S| stands for the number of elements in S and k is a predefined integer. To solve the influence estimation problem, one needs to estimate σ(S) for any given S. However, the exact computation of σ(·) is #P-hard [Chen et al., 2010], so in practice Monte-Carlo simulations are employed to estimate σ(S).

Figure 1: An illustration of the network decomposition.

Propagation Simulation. The influence spread is obtained by directly simulating the random propagation process from a given seed set S. Let A_i denote the set of nodes newly activated in the i-th iteration, with A_0 = S. In the (i+1)-th iteration, each node u in A_i has a single chance to activate each of its inactive neighbors v with probability p_uv. If v is activated, it is added to A_{i+1}. The process stops at the first T for which A_{T+1} = ∅, and X(S) = |A_0 ∪ A_1 ∪ ... ∪ A_T| is the influence spread of this single simulation. We run such simulations many times and obtain the estimated expected influence spread by averaging over all simulations.

Snapshot Simulation. Based on the coin-flip technique [Kempe et al., 2003], we flip coins in advance to produce a snapshot X = (V, E_X), a sampled subgraph of G in which each edge uv is either kept with probability p_uv or removed, so that E_X ⊆ E.
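As an illustrative sketch (our own Python, not from the paper; the paper's implementation is in C++ and all names here are ours), a snapshot can be produced by flipping one coin per edge, and the spread of a seed set on that snapshot is the number of reachable nodes:

```python
import random
from collections import deque

def coin_flip_snapshot(edges, p, rng):
    """Flip a coin for every edge in advance: keep edge uv with probability p_uv."""
    return [e for e in edges if rng.random() < p.get(e, 0.0)]

def snapshot_spread(kept_edges, seed_set):
    """X(S): the number of nodes reachable from S over the kept (live) edges."""
    adj = {}
    for u, v in kept_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)   # the graph is undirected
    reached, frontier = set(seed_set), deque(seed_set)
    while frontier:                        # plain BFS over live edges
        u = frontier.popleft()
        for v in adj.get(u, []):
            if v not in reached:
                reached.add(v)
                frontier.append(v)
    return len(reached)
```

Averaging `snapshot_spread` over many snapshots gives the estimate of σ(S) described next.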
Thus, in this specific snapshot X, the influence spread for the given seed set S, denoted X(S), equals the number of nodes reachable from S. We generate a large number of snapshots and obtain the estimated expected influence spread by averaging over all generated snapshots.

The two simulation methods are equivalent, because the spreads they produce share the same probability distribution [Kempe et al., 2003]. In this work, we use snapshot simulations because we need to frequently estimate the influence spread of different seed sets when adding new seeds to S.

4 Problem Formulation

The basic idea of the paper is to break down a big network into small subgraphs and analyze the reachability of nodes on each subgraph independently. The results from each subgraph are joined to restore the results on the original network. We use Figure 1 to explain the incremental subgraph algorithm. Figure 1(a) is the original network, where each edge is associated with a propagation probability p_uv, u, v ∈ {1, 2, ..., 8}. For simplicity, assume the edges of the graph are split into two disjoint sets, yielding the two subgraphs shown in Figures 1(b) and 1(c). The propagation probability p_uv on each edge of a subgraph is inherited from the original network. This way, the original graph is broken down into two small and tractable subgraphs, and incremental learning can be applied to the subgraphs.

Consider that the original network G = (V, E) is broken down into N subgraphs G_1 = (V_1, E_1), G_2 = (V_2, E_2), ..., G_N = (V_N, E_N) with V_1 ∪ V_2 ∪ ... ∪ V_N = V, E_1 ∪ E_2 ∪ ... ∪ E_N = E, and E_i ∩ E_j = ∅ for 1 ≤ i < j ≤ N. In dynamic networks with increasing nodes and edges, we keep adding nodes and edges as new streaming subgraphs, denoted G_{N+1} = (V_{N+1}, E_{N+1}), and so on. Then, we apply the coin-flip method on each subgraph G_i for M times and generate M snapshots, denoted X^(i)_1, X^(i)_2, ..., X^(i)_M. Our problem thus becomes: given a seed set in a big network, incrementally learn from the snapshots generated on the subgraphs to estimate the influence spread of the seed set in the original big network.

5 The Incremental Algorithm

5.1 Expression of Influence

First, each snapshot X_r of the original network can be represented by an |E|-dimensional binary vector r = (r_uv)_{uv ∈ E}, where r_uv = 1 if edge uv is activated in a coin flip and r_uv = 0 otherwise. We call r the representation vector of X_r. The set of all possible snapshot representation vectors is denoted by

    R = { r = (r_uv)_{uv ∈ E} ∈ {0, 1}^{|E|} }.    (1)

Under the IC model assumption, the probability distribution on R is given by

    q(r) = ∏_{uv ∈ E} p_uv^{r_uv} (1 - p_uv)^{1 - r_uv},  r ∈ R.    (2)

If the original graph is broken down into subgraphs, the snapshot obtained by restricting X_r to subgraph G_i is denoted X^(i)_r = X_r|_{G_i}, with representation vector r^(i) = (r_uv)_{uv ∈ E_i}. Obviously, the marginal distribution of r^(i) is q(r^(i)) = ∏_{uv ∈ E_i} p_uv^{r_uv} (1 - p_uv)^{1 - r_uv}, which likewise describes the distribution of snapshots on subgraph G_i. Because the edge sets are disjoint, these distributions satisfy q(r) = q(r^(1)) q(r^(2)) ··· q(r^(N)). Therefore, subgraphs can be processed incrementally by using the network decomposition. We introduce the join operator (written ⊕ here) through the equation X_r = X^(1)_r ⊕ X^(2)_r ⊕ ··· ⊕ X^(N)_r, which means that X_r can be broken down into the snapshots X^(i)_r, 1 ≤ i ≤ N, where X^(i)_r is the snapshot corresponding to subgraph G_i. Conversely, if snapshots X^(i)_r, 1 ≤ i ≤ N, are incrementally obtained from the subgraphs, we can join them to restore the snapshot X_r of the original network.
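The product factorization over edge-disjoint subgraphs can be checked numerically. The sketch below is our own illustrative code (edge keys and function names are hypothetical): it evaluates the snapshot probability of Eq. (2) and verifies that it equals the product of the marginals on two disjoint edge sets.

```python
def q(r, p):
    """Probability of a representation vector r under the IC model (Eq. 2).

    r: dict edge -> 0/1 (1 means the edge was kept in the coin flip)
    p: dict edge -> propagation probability p_uv
    """
    out = 1.0
    for e, bit in r.items():
        out *= p[e] if bit else (1.0 - p[e])
    return out

def restrict(r, sub_edges):
    """Representation vector r^(i): restriction of r to a subgraph's edges."""
    return {e: r[e] for e in sub_edges}
```

For instance, with edges {ab, bc, cd} split into E_1 = {ab} and E_2 = {bc, cd}, q(r) equals q(r^(1)) * q(r^(2)) because the per-edge coin flips are independent.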
Hence the equation X_r = X^(1)_r ⊕ X^(2)_r ⊕ ··· ⊕ X^(N)_r describes the relationship between X_r and the X^(i)_r. Let N(S; X_r) denote the set of nodes reachable from the seed set S in a specific snapshot X_r. The expected influence spread σ(S) of the seed set S can then be written as

    σ(S) = Σ_{r^(1)} Σ_{r^(2)} ··· Σ_{r^(N)} q(r^(1)) q(r^(2)) ··· q(r^(N)) |N(S; X_r)|.    (3)

We can observe from the above equation that σ(S) is determined by N(S; X_r) = N(S; X^(1)_r ⊕ X^(2)_r ⊕ ··· ⊕ X^(N)_r). Thus, we next explain how to use the snapshots generated from the subgraphs to estimate the expected influence spread of a given seed set in the original network.

Algorithm 1: The join of SCC classes.
Input: C_1 = {C_11, C_12, ..., C_1α1} and C_2 = {C_21, C_22, ..., C_2α2}
Output: a class of SCCs after joining C_1 and C_2
 1  function join(C_1, C_2)
 2      for j = 1, 2, ..., α_2 do
 3          s ← 1, C ← ∅, C' ← ∅
 4          for i = 1, 2, ..., α_1 do
 5              if C_1i ∩ C_2j ≠ ∅ then
 6                  if C_2j ⊆ C_1i then return C_1
 7                  else C ← C ∪ C_1i
 8              else
 9                  C'_s ← C_1i; C' ← C' ∪ {C'_s}; s ← s + 1
10          C ← C ∪ C_2j; C_1 ← C' ∪ {C}
11      return C_1

For any snapshot X of the network G = (V, E), we write u → v if node v is reachable from node u on the snapshot X. We say that two nodes u and v are equivalent (u ∼ v) if and only if both u → v and v → u hold; in addition, each node u is equivalent to itself, u ∼ u. Based on these definitions, the set of nodes C = C(u; X) = {v ∈ V : v ∼ u on X} is called a Strongly Connected Component (SCC) of X, represented by the node u. The set C is either a singleton {u} or a set of equivalent nodes. This way, the snapshot X can be described by a class of SCCs C = {C(u; X) : u ∈ V} = {C_1, C_2, ...}, where C_1, C_2, ... are disjoint sets and C_1 ∪ C_2 ∪ ··· = V. SCCs also enable memory-saving and efficient calculation. As we aim to find the most influential nodes, singleton sets in the class C can be removed, leaving C = {C_1, C_2, ..., C_α} with |C_i| ≥ 2, i ∈ {1, 2, ..., α}.
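Because G is undirected, strong connectivity on a snapshot reduces to ordinary connectivity, so extracting the SCC class of a snapshot can be sketched as a connected-components routine. The code below is our own illustrative Python (the paper's implementation uses DFS in C++; we use BFS here), and it drops singleton components as described above.

```python
from collections import deque

def scc_class(nodes, kept_edges):
    """SCC class of an (undirected) snapshot given its kept edges.

    For undirected snapshots, strongly connected components coincide with
    connected components. Singletons are dropped, matching |C_i| >= 2.
    """
    adj = {u: [] for u in nodes}
    for u, v in kept_edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, comps = set(), []
    for s in nodes:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])   # BFS from each unseen node
        seen.add(s)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        if len(comp) >= 2:              # singletons are discarded
            comps.append(frozenset(comp))
    return comps
```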
The elements of N(S; X_r) can be described as follows:

    N(S; X_r) = ∪_{u ∈ S} C(u; X_r) = ∪_{u ∈ S, |C(u; X_r)| ≥ 2} C(u; X^(1)_r ⊕ ··· ⊕ X^(N)_r).

Based on the above equation, we join SCCs on subgraphs to restore the result on the original network.

5.2 Join of the SCC Classes

Suppose we have transformed every snapshot X^(i)_j, 1 ≤ i ≤ N, 1 ≤ j ≤ M, generated from the subgraphs into a class of SCCs C^(i)_j = {C^(i)_jk : 1 ≤ k ≤ α^(i)_j, |C^(i)_jk| ≥ 2}, where C^(i)_j is the SCC class of X^(i)_j. We have discussed that once snapshots have been incrementally obtained on each subgraph, we can join them to obtain snapshots of the original network. Now, the problem becomes using the SCC classes of these subgraphs to restore SCC classes of the original network; what we need to do next is to join the SCC classes of the subgraphs.

Take N = 4 for example. Suppose we incrementally receive C^(1)_{j1}, C^(2)_{j2}, C^(3)_{j3}, C^(4)_{j4}, where each C^(i)_{ji} is an SCC class of subgraph G_i. To restore an SCC class of the original network, we first join C^(1)_{j1} and C^(2)_{j2} to obtain C^(1,2)_{j1,j2}, which is an SCC class of the subgraph G_1 ⊕ G_2 = (V_1 ∪ V_2, E_1 ∪ E_2); then C^(3)_{j3} arrives and we incrementally join it with C^(1,2)_{j1,j2} to obtain C^(1,2,3)_{j1,j2,j3}. At last, C^(4)_{j4} arrives and we join it with C^(1,2,3)_{j1,j2,j3} to obtain C^(1,2,3,4)_{j1,j2,j3,j4}, which is an SCC class of the original network. From this example, we can observe that the join of C^(1)_{j1}, C^(2)_{j2}, ..., C^(N)_{jN}, 1 ≤ j1, ..., jN ≤ M, reduces to joins of pairs of SCC classes. Therefore, we define a function join(C_1, C_2) in Algorithm 1 that returns the class obtained by joining C_1 = {C_11, C_12, ..., C_1α1} and C_2 = {C_21, C_22, ..., C_2α2}.

5.3 Influence Estimation

In this section, we first briefly review traditional network influence estimation, which is theoretically guaranteed by Theorem 1. When extending the traditional method to streaming subgraphs, we can join the snapshots of subgraphs to restore samples of the original network. However, the number of possible snapshot joins grows exponentially with the number of subgraphs, so we propose to select only a small number of them. Proposition 2 explains that when the selected joins are independent and identically distributed (i.i.d.), the estimation converges fast. However, it is hard to select i.i.d. snapshot joins across subgraphs in reality. Therefore, we propose Definitions 1 and 2 to choose snapshot joins that approximately satisfy the i.i.d. condition. Proposition 3 then shows that with this approximation, a polynomial number of snapshot joins suffices to restore the global result.

For simplicity, |N(S; X^(1)_{j1} ⊕ X^(2)_{j2} ⊕ ··· ⊕ X^(N)_{jN})| is denoted by g_{j1,...,jN}(S), which means that a snapshot of the original network is obtained by incrementally joining the j_i-th snapshot from the i-th subgraph, and g_{j1,...,jN}(S) returns the influence spread of a given seed set S. The representation vector of X^(1)_{j1} ⊕ ··· ⊕ X^(N)_{jN} follows the distribution q(r). Hence for any 1 ≤ j1, ..., jN ≤ M, g_{j1,...,jN}(S) is a random variable such that E g_{j1,...,jN}(S) = σ(S), where E is the expectation operator.
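A sketch of what Algorithm 1 computes: components from two SCC classes that share a node are merged into one. We implement this here with union-find rather than the paper's pairwise scan (both yield the SCC class of the union graph when the subgraphs' edge sets are disjoint); all names are our own.

```python
def join(c1, c2):
    """Join two SCC classes (cf. Algorithm 1) via union-find.

    c1, c2: iterables of disjoint node sets (each of size >= 2).
    Components that share at least one node end up in one merged component.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for comp in list(c1) + list(c2):
        nodes = list(comp)
        for v in nodes[1:]:                 # link every node to a representative
            union(nodes[0], v)

    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return [frozenset(g) for g in groups.values() if len(g) >= 2]
```

For example, joining {{1,2}, {3,4}} with {{2,3}, {5,6}} merges the first three components through the shared nodes 2 and 3.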
Traditionally, M simulations are independently run on the original network, generating M snapshots denoted X_i = X^(1)_i ⊕ ··· ⊕ X^(N)_i, and σ(S) is empirically estimated by

    σ̂(S) = (1/M) Σ_{i=1}^{M} g_{i,i,...,i}(S),

where g_{i,i,...,i}(S), 1 ≤ i ≤ M, are bounded and i.i.d. The estimation is therefore guaranteed by the following theorem [Hoeffding, 1963].

Theorem 1 (Hoeffding's inequality). Let Y_1, ..., Y_M be independent random variables taking values in the interval [a, b], and let S_M = Y_1 + ··· + Y_M. Then, for all ε > 0,

    P( |S_M/M - E S_M/M| ≥ ε ) ≤ 2 exp( -2Mε² / (b - a)² ).    (4)

In the traditional problem setting, the components of the subscript of g_{i,i,...,i}(S) are identical, because the simulations are run on the single network G and the subscript (i, i, ..., i) denotes the i-th snapshot simulation on G. Furthermore, the snapshots on subgraphs are not obtained by independent simulations but by restricting snapshots of the original network to the subgraphs. In contrast, our incremental algorithm generates the snapshots of subgraphs independently and then joins them to the previous results. Specifically, we first generate M snapshots X^(1)_j, 1 ≤ j ≤ M, on subgraph G_1; then subgraph G_2 arrives, where we generate another M snapshots X^(2)_j, 1 ≤ j ≤ M, and join them to the snapshots of G_1 to obtain snapshots X^(1)_{j1} ⊕ X^(2)_{j2}. The process continues until the N-th subgraph is joined. This way, we generate M^N random variables g_{j1,...,jN}(S), 1 ≤ j1, ..., jN ≤ M, such that E g_{j1,...,jN}(S) = σ(S). Unfortunately, it is not reasonable to take S_Σ = (1/M^N) Σ_{j1=1}^{M} ··· Σ_{jN=1}^{M} g_{j1,...,jN}(S) as an estimate of σ(S). On the one hand, the number of snapshot joins M^N increases exponentially with the number of subgraphs. On the other hand, many snapshot joins are redundant for estimating the expected influence spread.
To better explain these two points, we first rewrite S_Σ as the average of M^{N-1} groups, each of which is the average of M independent and identically distributed random variables:

    S_Σ = (1/M^{N-1}) Σ_{β=1}^{M^{N-1}} [ (1/M) Σ_{α=1}^{M} g_{j1^(α,β),...,jN^(α,β)}(S) ] = (1/M^{N-1}) Σ_{β=1}^{M^{N-1}} T_β,

where the g_{j1^(α,β),...,jN^(α,β)}(S), 1 ≤ α ≤ M, inside each T_β are i.i.d. Figure 2(a) shows the join of three snapshots of the original network: the first joined snapshot is obtained by incrementally joining X^(1)_1, X^(2)_1, X^(3)_1, X^(4)_1 and corresponds to the random variable g_{1,1,1,1}(S). Similarly, the remaining two correspond to g_{2,2,2,2}(S) and g_{3,3,3,3}(S). These three are i.i.d., and their average is T_1 = (1/3)[g_{1,1,1,1}(S) + g_{2,2,2,2}(S) + g_{3,3,3,3}(S)]. Hence, Figure 2(a) corresponds to T_1; similarly, Figures 2(b), 2(c), 2(d) correspond to T_2, T_3, T_4, respectively. Note that not all T_β are shown in the figure. Each T_β can be treated as an S_M/M in Theorem 1, so Eq. (4) is satisfied, i.e., P(|T_β - σ(S)| ≥ ε) ≤ 2 exp(-2Mε²/(b - a)²). Therefore, each T_β has the same convergence rate as the traditional method.

Next, our goal is to choose a portion of the T_β to estimate the influence spread so as to obtain a higher convergence rate. If the same convergence rate is required, our method needs fewer simulations on each subgraph than the traditional method; moreover, our method can deal with dynamic networks. Toward this goal, we first introduce the following proposition.

Proposition 2. If T_1, ..., T_k are independent and identically distributed random variables, then (1/k) Σ_{i=1}^{k} T_i has a higher convergence rate than the right-hand side of Eq. (4).

Proof. By independence and Hoeffding's lemma [Hoeffding, 1963], we have, for any t > 0,

    P( (1/k) Σ_{i=1}^{k} T_i - σ(S) ≥ ε )
      = P( t (1/k) Σ_{i=1}^{k} T_i ≥ t σ(S) + t ε )
      ≤ E exp( t (1/k) Σ_{i=1}^{k} T_i ) e^{-t σ(S)} e^{-t ε}
      = ∏_{i=1}^{k} E e^{(t/k)(T_i - σ(S))} e^{-t ε}
      ≤ exp( t² (b - a)² / (8Mk) ) e^{-t ε}
      ≤ exp( -2kMε² / (b - a)² ),

where the last step follows by optimizing over t, and the two-sided bound P(|(1/k) Σ_{i=1}^{k} T_i - σ(S)| ≥ ε) ≤ 2 exp(-2kMε²/(b - a)²) follows in the same way.
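As a numeric illustration of Theorem 1 and Proposition 2 (our own sketch; the fair-coin example and all names are assumptions, not the paper's), the Hoeffding bound can be computed and compared against an empirical tail probability. Replacing a sample size of M by kM reproduces the tighter rate of Proposition 2.

```python
import math
import random

def hoeffding_bound(n, eps, a=0.0, b=1.0):
    """Two-sided Hoeffding bound for the mean of n i.i.d. samples in [a, b]."""
    return 2.0 * math.exp(-2.0 * n * eps * eps / (b - a) ** 2)

def empirical_tail(n, eps, trials=2000, seed=0):
    """Empirical P(|sample mean - 0.5| >= eps) for n fair coin flips."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        mean = sum(rng.random() < 0.5 for _ in range(n)) / n
        if abs(mean - 0.5) >= eps:
            bad += 1
    return bad / trials
```

With k = 3 groups of M = 100 samples each, `hoeffding_bound(300, eps)` is strictly smaller than `hoeffding_bound(100, eps)`, matching the exp(-2kMε²/(b-a)²) rate in the proposition.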

Figure 2: Joins of subgraph snapshots. X^(i)_j represents the j-th snapshot simulation result on the i-th subgraph. Panels: (a) join group 1, (b) join group 2, (c) join group 3, (d) join group 4.

In the proof, the first inequality is based on Markov's inequality. The second equality is guaranteed by the independence of T_1, ..., T_k. The second inequality is due to Hoeffding's lemma, and the last inequality is the upper bound obtained after optimizing over t. The last term shows that a higher convergence rate is obtained.

From the above proposition, the strategy is to choose those T_β that tend to be independent. Besides, the influence spread of a particular seed set depends not only on the subgraph snapshots X^(i)_j but also on how they are joined, which determines the network topology of the resulting snapshot of the original graph. Hence, we define (j1, j2, ..., jN) as a join path with N - 1 stages (j1, j2), (j2, j3), ..., (j_{N-1}, jN).

Definition 1. Two join paths (i1, i2, ..., iN) and (j1, j2, ..., jN) are called orthogonal if they have no overlapping stage, i.e., (i_α, i_{α+1}) ≠ (j_α, j_{α+1}) for all 1 ≤ α ≤ N - 1.

Definition 2. Let T_{β1} = (1/M) Σ_{α=1}^{M} g_{j1^(α,β1),...,jN^(α,β1)}(S) and T_{β2} = (1/M) Σ_{α'=1}^{M} g_{j1^(α',β2),...,jN^(α',β2)}(S). T_{β1} and T_{β2} are called join-independent if for any α and α' with 1 ≤ α, α' ≤ M, the paths (j1^(α,β1), ..., jN^(α,β1)) and (j1^(α',β2), ..., jN^(α',β2)) are orthogonal.

For example, in Figure 2, T_1, T_2, T_3 are pairwise join-independent, while T_1 and T_4 contain paths with two overlapping stages. Now, we introduce the set T = {T_β : T_{β1}, T_{β2} are join-independent for all T_{β1}, T_{β2} ∈ T} and use the T_β ∈ T to estimate the influence spread. The following proposition shows how many T_β are selected.

Proposition 3. Let T be a maximal set of pairwise join-independent T_β, i.e., T = {T_β : T_{β1}, T_{β2} are join-independent for all T_{β1}, T_{β2} ∈ T}. Then |T| = M.

Proof. All the paths appearing in the T_β ∈ T form a set of paths P. Each candidate T corresponds to a unique P, and all candidate sets T (or P) have the same cardinality.
It can be verified that the set P_0 = {(i, j, i, j, ...) : 1 ≤ i, j ≤ M} is one of the candidate sets P. So we have |P| = |P_0| = M², which means |T| = M, since each T_β ∈ T contains M paths.

Algorithm 2: The Subgraph Incremental Algorithm.
Input: G = (V, E); subgraphs G_1 = (V_1, E_1), ..., G_N = (V_N, E_N); seed set size k; number of snapshots M; propagation probability p
Output: a seed set S that maximizes the influence on the original network
 1  for i = 1, 2, ..., N do
 2      for j = 1, 2, ..., M do
 3          sample each edge with probability p on the i-th subgraph to obtain a snapshot X^(i)_j;
 4          use DFS graph search to get a class of SCCs C^(i)_j;
 5  for i = 1, 2, ..., M do
 6      for j = 1, 2, ..., M do
 7          calculate C_ij using the function join(·, ·);
 8  S ← ∅;
 9  while |S| < k do
10      for v ∈ V \ S do
11          σ(v|S) ← 0;
12          for i = 1, 2, ..., M do
13              for j = 1, 2, ..., M do
14                  if S cannot reach v on the (i, j)-th joined snapshot then
15                      σ(v|S) ← σ(v|S) + |C_ij(v)|;
16      t ← argmax_{v ∈ V\S} (1/M²) σ(v|S);
17      S ← S ∪ {t}
18  return S

Therefore, we regard σ̂(S) = (1/M) Σ_{T_β ∈ T} T_β as the estimate of the influence spread of the seed set S on the original graph. Although there may be many such sets T, it is easily seen from the proof of Proposition 3 that there exists some T_0 such that the set of all paths in the T_β ∈ T_0 equals P_0. Hence one specific expression of σ̂(S) is σ̂(S) = (1/M²) Σ_{i=1}^{M} Σ_{j=1}^{M} g_{i,j,i,j,...}(S), in which only a polynomial number M² of snapshot joins is used.

5.4 Influence Maximization

In this section, we evaluate our estimate of the influence spread in the process of influence maximization. The greedy algorithm [Kempe et al., 2003] is adopted to select seed nodes: starting with an empty seed set S, it adds the node u with the maximum marginal influence, i.e., u = argmax_{v ∈ V\S} σ̂(S ∪ {v}) - σ̂(S), to S in every iteration until the threshold |S| = k is met. Because the influence spread function is non-negative, monotone and submodular, the quality of the greedy solution is guaranteed by f(S) ≥ (1 - 1/e) f(S*), where S* is the optimal solution [Nemhauser et al., 1978]. We summarize the above in Algorithm 2.
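The seed-selection loop of Algorithm 2 can be sketched as follows. This is our own simplified Python over precomputed joined SCC classes (one class per joined snapshot; all names are hypothetical): since snapshots of an undirected graph are used, a seed reaches exactly its own component, so the marginal gain of v on one snapshot is |C(v)| when no current seed shares v's component, and 0 otherwise.

```python
def greedy_seeds(nodes, classes, k):
    """Greedy selection (cf. Algorithm 2) over precomputed joined SCC classes.

    classes: list of SCC classes, one per joined snapshot; each class is a
    list of disjoint node sets of size >= 2.
    """
    # Per snapshot, map each node to its component (nodes in no component
    # belong to dropped singletons and contribute only themselves).
    comp_of = [{u: c for c in cls for u in c} for cls in classes]
    seeds = set()
    for _ in range(k):
        best, best_gain = None, -1.0
        for v in nodes:
            if v in seeds:
                continue
            gain = 0
            for cmap in comp_of:
                c = cmap.get(v)
                if c is not None and not (seeds & c):
                    gain += len(c)       # v newly activates its whole component
            gain /= len(classes)         # the 1/M^2 averaging in the paper
            if gain > best_gain:
                best, best_gain = v, gain
        seeds.add(best)
    return seeds
```

The constant +1 for a seed activating itself is omitted since it does not change the ranking of candidates.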
Note that σ̂(S) is used in each iteration when the marginal influence is calculated:

    σ̂(S ∪ {v}) - σ̂(S)
      = (1/M²) Σ_{i=1}^{M} Σ_{j=1}^{M} [ g_{i,j,i,j,...}(S ∪ {v}) - g_{i,j,i,j,...}(S) ]
      = (1/M²) Σ_{i=1}^{M} Σ_{j=1}^{M} [ |N(S ∪ {v}; X^(1)_i ⊕ X^(2)_j ⊕ ···)| - |N(S; X^(1)_i ⊕ X^(2)_j ⊕ ···)| ]
      = (1/M²) Σ_{i=1}^{M} Σ_{j=1}^{M} [1 - I(S → v)] |C_{i,j}(v)|,

where I(S → v) = 1 if there exists s ∈ S such that s → v (and 0 otherwise), and C_{i,j}(v) denotes the SCC including v on the snapshot

X^(1)_i ⊕ X^(2)_j ⊕ ···, whose class of SCCs is denoted by C_ij.

6 Experiments

We conduct experiments on four real-world data sets. All algorithms are implemented in C++ and run on a Windows 7 system with a 2.3GHz CPU and 4GB memory.

Datasets       number of nodes   number of edges
ego-Facebook   4,039             176,468
ca-HepPh       12,008            118,521
ca-CondMat     23,133            186,936
email-Enron    36,692            367,662

Table 1: The four real-world datasets.

6.1 Data Sets

We use four network datasets from snap.stanford.edu/data/ for testing and comparison. The number of nodes and edges in each network is summarized in Table 1.

ego-Facebook: This data set consists of circles (friends lists) from Facebook, collected from survey participants using a Facebook app.

ca-HepPh and ca-CondMat: These two data sets are obtained from the e-print arXiv. If authors i and j published a paper together, the network contains an edge connecting nodes i and j.

email-Enron: Nodes of the network are email addresses, and if an address i sent at least one email to address j, the graph contains an undirected edge between i and j.

6.2 Benchmark Methods

We compare the incremental algorithm with five algorithms. The propagation probability of each edge is set to p = p_uv = 0.01 for all uv ∈ E.

CELF: We set M = 10,000 in the Monte-Carlo simulations to estimate the spread of a seed set.

StaticGreedy: It reuses the generated snapshots to reduce the number of simulations. The number of snapshots is set to M = 10,000.

DegreeDiscount: This algorithm is developed for the IC model with a uniform propagation probability.

PMIA: It uses maximum influence paths for influence spread estimation. The value of the parameter θ is set to 1/320.

PageRank: We implement the power method with a damping factor of 0.85 and take the k highest-ranked nodes as seeds. The algorithm stops when the L_1-norm change falls below the threshold 10^{-6}.
6.3 Experimental Results

In our experiments, to obtain the influence spread of the heuristic algorithms for each seed set, we run 10,000 simulations and take the average spread as the influence spread, which matches the accuracy of the greedy algorithms.

Figure 3: Comparisons between benchmark algorithms and the incremental algorithm on the four data sets: (a) ego-Facebook, (b) ca-HepPh, (c) ca-CondMat, (d) email-Enron.

Figure 4: Comparisons w.r.t. the running time (in msec) on the four data sets.

Influence spread. Figure 3 shows the influence spreads of the different algorithms on the real-world networks with respect to the seed set size k, which increases from 1 to 50. From the figure, we can observe that our method performs the best in all settings. Like the greedy baselines, our method is also independent of the network structures. Moreover, only M = 100 snapshots per subgraph are generated to estimate the influence spreads while selecting the most influential nodes, requiring far fewer MC simulations than the other greedy methods.

Running time. Figure 4 shows the running time for the optimal seed set of size 50 on the four datasets. Unsurprisingly, the heuristic methods, including DegreeDiscount, are the fastest, while the simulation-based greedy method is the slowest because too many MC simulations are employed when estimating influence spreads. We can observe that the speed of our algorithm is as fast as the algorithm which

is a popularly used scalable algorithm. From these results, we can conclude that our algorithm is robust to network structures, performs the best compared with existing algorithms, and scales well to large networks.

7 Conclusions

In this paper, we presented an incremental algorithm for solving the big-network influence maximization problem. Our idea is to break down the original graph into small subgraphs that can be sequentially processed as subgraph streams. Experimental results show that the algorithm is robust to network structures, performs the best compared with existing algorithms, and scales well to large networks.

Acknowledgments

This work was supported by the NSFC, the Strategic Leading Science and Technology Projects of CAS (No. XDA...), the 973 project (No. 2013CB...), and the Australia ARC Discovery Project (DP...).

References

[Balsubramani et al., 2013] Akshay Balsubramani, Sanjoy Dasgupta, and Yoav Freund. The fast convergence of incremental PCA. In Advances in Neural Information Processing Systems, 2013.

[Chen et al., 2009] Wei Chen, Yajun Wang, and Siyu Yang. Efficient influence maximization in social networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2009.

[Chen et al., 2010] Wei Chen, Chi Wang, and Yajun Wang. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2010.

[Chen et al., 2014] Yi-Cheng Chen, Wen-Yuan Zhu, Wen-Chih Peng, Wang-Chien Lee, and Suh-Yin Lee. CIM: Community-based influence maximization in social networks. ACM Trans. Intell. Syst. Technol., 5(2):25:1-25:31, April 2014.

[Cheng et al., 2013] Suqi Cheng, Huawei Shen, Junming Huang, Guoqing Zhang, and Xueqi Cheng. StaticGreedy: Solving the scalability-accuracy dilemma in influence maximization.
In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. ACM, 2013.

[Goyal et al., 2011] Amit Goyal, Wei Lu, and Laks V.S. Lakshmanan. CELF++: Optimizing the greedy algorithm for influence maximization in social networks. In Proceedings of the 20th International Conference Companion on World Wide Web. ACM, 2011.

[Hoeffding, 1963] Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13-30, 1963.

[Jung et al., 2011] Kyomin Jung, Wooram Heo, and Wei Chen. IRIE: Scalable and robust influence maximization in social networks. arXiv preprint, 2011.

[Kempe et al., 2003] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2003.

[Kivinen et al., 2004] Jyrki Kivinen, Alexander J. Smola, and Robert C. Williamson. Online learning with kernels. IEEE Transactions on Signal Processing, 52(8), 2004.

[Laskov et al., 2006] Pavel Laskov, Christian Gehl, Stefan Krüger, and Klaus-Robert Müller. Incremental support vector learning: Analysis, implementation and applications. The Journal of Machine Learning Research, 7:1909-1936, 2006.

[Leskovec et al., 2007] Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2007.

[Liu et al., 2014] Qi Liu, Biao Xiang, Enhong Chen, Hui Xiong, Fangshuang Tang, and Jeffrey Xu Yu. Influence maximization over large-scale social networks: A bounded linear approach. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management. ACM, 2014.

[Nemhauser et al., 1978] George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher.
An analysis of approximations for maximizing submodular set functionsłi. Mathematical Programming, 14(1: , [Ohsaka et al., 2014] aoto Ohsaka, Takuya Akiba, Yuichi Yoshida, and Ken-ichi Kawarabayashi. Fast and accurate influence maximization on large networks with pruned monte-carlo simulations [Poggio, 2001] Gert CauwenberghsTomaso Poggio. Incremental and decremental support vector machine learning. In Advances in eural Information Processing Systems 13: Proceedings of the 2000 Conference, volume 13, page 409. MIT Press, [Wang et al., 2010] Yu Wang, Gao Cong, Guojie Song, and Kunqing Xie. Community-based greedy algorithm for mining top-k influential nodes in mobile social networks. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pages ACM, [Zhou et al., ] Chuan Zhou, Peng Zhang, Wenyu Zang, and Li Guo. On the upper bounds of spread for greedy algorithms in social network influence maximization. IEEE Transactions on Knowledge and Data Engineering, no. 1, pp. 1, PrePrints PrePrints, doi: /tkde


More information

12. LOCAL SEARCH. gradient descent Metropolis algorithm Hopfield neural networks maximum cut Nash equilibria

12. LOCAL SEARCH. gradient descent Metropolis algorithm Hopfield neural networks maximum cut Nash equilibria 12. LOCAL SEARCH gradient descent Metropolis algorithm Hopfield neural networks maximum cut Nash equilibria Lecture slides by Kevin Wayne Copyright 2005 Pearson-Addison Wesley h ttp://www.cs.princeton.edu/~wayne/kleinberg-tardos

More information

On Subset Selection with General Cost Constraints

On Subset Selection with General Cost Constraints On Subset Selection with General Cost Constraints Chao Qian, Jing-Cheng Shi 2, Yang Yu 2, Ke Tang UBRI, School of Computer Science and Technology, University of Science and Technology of China, Hefei 23002,

More information

PAC-learning, VC Dimension and Margin-based Bounds

PAC-learning, VC Dimension and Margin-based Bounds More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based

More information

Time Series Data Cleaning

Time Series Data Cleaning Time Series Data Cleaning Shaoxu Song http://ise.thss.tsinghua.edu.cn/sxsong/ Dirty Time Series Data Unreliable Readings Sensor monitoring GPS trajectory J. Freire, A. Bessa, F. Chirigati, H. T. Vo, K.

More information

CS 231A Section 1: Linear Algebra & Probability Review. Kevin Tang

CS 231A Section 1: Linear Algebra & Probability Review. Kevin Tang CS 231A Section 1: Linear Algebra & Probability Review Kevin Tang Kevin Tang Section 1-1 9/30/2011 Topics Support Vector Machines Boosting Viola Jones face detector Linear Algebra Review Notation Operations

More information

Randomized Algorithms

Randomized Algorithms Randomized Algorithms Saniv Kumar, Google Research, NY EECS-6898, Columbia University - Fall, 010 Saniv Kumar 9/13/010 EECS6898 Large Scale Machine Learning 1 Curse of Dimensionality Gaussian Mixture Models

More information

Splitting a Logic Program Revisited

Splitting a Logic Program Revisited Splitting a Logic Program Revisited Jianmin Ji School of Computer Science and Technology University of Science and Technology of China Hefei 230027, China jianmin@ustc.edu.cn Hai Wan and Ziwei Huo and

More information

arxiv: v1 [cs.lg] 15 Aug 2017

arxiv: v1 [cs.lg] 15 Aug 2017 Theoretical Foundation of Co-Training and Disagreement-Based Algorithms arxiv:1708.04403v1 [cs.lg] 15 Aug 017 Abstract Wei Wang, Zhi-Hua Zhou National Key Laboratory for Novel Software Technology, Nanjing

More information