Identifying influential spreaders in complex networks based on entropy method


Xiaojian Ma a, Yinghong Ma
School of Business, Shandong Normal University, Jinan 250014, China
a xiaojianma0813@163.com

Abstract
Identifying the importance of nodes is a fundamental task in network analysis. In this paper, we propose a semi-local measure that considers both the nearest and the next nearest neighbors of a node. With the help of the entropy value, a node whose neighbors follow a uniform distribution is assigned a high centrality value. The experimental results show that the proposed method distinguishes the influence of nodes better than DC, BC, CC, and H-index. The proposed method is also highly correlated with the SIR ranking and outperforms the other methods in evaluating the spreading ability of nodes.

Keywords
Complex network; Influential nodes; SIR epidemic model.

1. Introduction
Identifying influential nodes in complex networks is of theoretical and practical significance, for example in controlling the dissemination of information. A significant number of measures have been proposed in recent years to evaluate the importance of nodes. Degree centrality (DC) [1] is a basic measure in which the importance of a node is given by its number of neighbors. Global measures such as betweenness centrality (BC) [2] and closeness centrality (CC) [3] identify node influence in the global scope. The K-shell method [4] has also been proposed to identify the importance of nodes, and later researchers proposed modified methods based on K-shell to further improve the ranking performance [5]. Chen et al. proposed a semi-local centrality [6], which takes more neighbor information into consideration when evaluating node importance. The H-index [7] measures the importance of a node following the concept of the h-index: a node has index h if it has at least h neighbors whose degree is no less than h.
Nodes with high degree tend to be more influential in the spreading process, since they are likely to influence more nodes. However, degree considers nodes only in a local scope, so information from more distant neighbors is neglected. An effective measure that captures more information should therefore be employed to evaluate node importance in general information transfer. Motivated by these considerations, in this paper we design a measure that evaluates a node's centrality by taking both its nearest and its next nearest neighbors into consideration. Through the application of entropy, a node whose neighbors follow a uniform distribution is assigned a high centrality value. The SIR model is employed to examine the performance of the proposed method. Experimental results on several real networks suggest that the measure outperforms the other measures in terms of the correctness of the ranking list. In addition, the method can be easily applied to large-scale networks since it requires only local information.

The rest of the paper is organized as follows. The details of the proposed measure are described in Section 2. The simulation strategies and experimental results are presented in Section 3. Section 4 concludes the paper.

2. Proposed method

2.1 Entropy
Information entropy is widely used in information science and statistical physics to describe the degree of order of an information distribution. The Shannon entropy of a probability vector $p$ is defined as:

$E(p) = -\sum_{i=1}^{n} p_i \log(p_i)$    (1)

where $p = (p_1, p_2, \ldots, p_n)$ is the probability vector of the events, with $0 \le p_i \le 1$ and $\sum_{i=1}^{n} p_i = 1$. The closer the probabilities $p_i$ are to a uniform distribution, the higher the entropy value [8], and as the value of $n$ increases, so does the entropy value (for a uniform distribution, $E(p) = \log n$). Therefore, the entropy method can be employed to detect nodes with more uniform neighbors [9].

2.2 Proposed method
A social network is denoted as a graph $G = (V, E)$ consisting of nodes and edges, where $V$ represents the set of nodes in the network and $E$ represents the set of edges between nodes. The entropy centrality of a node $v$ is defined by:

$C(v) = -\sum_{w \in \Gamma_2(v)} \frac{d_w}{\sum_{x \in \Gamma_2(v)} d_x} \log\left(\frac{d_w}{\sum_{x \in \Gamma_2(v)} d_x}\right)$    (2)

$EC(v) = \sum_{u \in \Gamma_1(v)} C(u)$    (3)

where $d_w$ is the degree of node $w$, $\Gamma_2(v)$ is the set of nearest and next nearest neighbors of node $v$, and $\Gamma_1(v)$ is the set of nearest neighbors of node $v$.

3. Experiment

3.1 Dataset
Four real-world networks are used: the Karate club network (Karate) [10], the Pol-book network (Polbook) [11], the US airline network (Usair) [12], and an e-mail network (Email) [13]. Table 1 lists the number of nodes, the number of edges, the maximum degree, the average degree, and the clustering coefficient of each network.

Table 1 Some properties of the networks

Network   Node number   Edge number   Max degree   Average degree   Clustering coefficient
Karate    34            78            17           4.5882           0.5706
Polbook   105           441           25           8.4              0.4875
Usair     332           2126          139          12.8072          0.6252
Email     1133          5451          71           9.623            0.2219

3.2 Evaluation
In this section, the proposed method is compared with four well-known measures in terms of distinguishability and correctness. These measures are degree centrality [1], betweenness centrality [2], closeness centrality [3], and H-index [7].

3.2.1 Distinguishability
This experiment evaluates the capability of the different measures to distinguish the spreading ability of individual nodes.
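For reference when reading the comparisons that follow, Eqs. (2) and (3) of Section 2.2 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation. It assumes an undirected graph stored as a dictionary mapping each node to the set of its neighbors, a natural logarithm (the paper does not state the base), and that the node v itself is excluded from its own two-hop neighborhood. The function names are hypothetical.

```python
import math


def two_hop_neighbors(adj, v):
    """Nearest and next nearest neighbors of v (assumption: v itself is excluded)."""
    first = set(adj[v])
    second = set()
    for u in first:
        second |= adj[u]
    return (first | second) - {v}


def local_entropy(adj, v):
    """C(v) as in Eq. (2): entropy of the normalized degrees over the two-hop neighborhood."""
    degrees = [len(adj[w]) for w in two_hop_neighbors(adj, v)]
    total = sum(degrees)
    if total == 0:  # isolated node: no neighbors, so no entropy contribution
        return 0.0
    # Assumption: natural logarithm.
    return -sum((d / total) * math.log(d / total) for d in degrees if d > 0)


def entropy_centrality(adj):
    """EC(v) as in Eq. (3): sum of C(u) over the nearest neighbors u of v."""
    c = {v: local_entropy(adj, v) for v in adj}
    return {v: sum(c[u] for u in adj[v]) for v in adj}


if __name__ == "__main__":
    # Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
    toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    scores = entropy_centrality(toy)
    for node in sorted(scores, key=scores.get, reverse=True):
        print(node, round(scores[node], 4))
```

Only the degrees inside a two-hop neighborhood are needed for each node, which is consistent with the claim that the measure relies on local information.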
In this section, the Complementary Cumulative Distribution Function (CCDF) [14] is used to characterize the distribution of ranks produced by the different measures. It is calculated by:

$CCDF(r) = \frac{|V| - \sum_{i=1}^{r} n_i}{|V|}$

where $|V|$ is the total number of nodes and $n_i$ denotes the number of nodes in rank $i$. The more slowly the CCDF decreases, the better the ranking distribution achieved by the method.

Fig. 1 CCDF plots for the ranking lists produced by the different measures

The CCDF is plotted for Karate, Polbook, Usair and Email in Fig. 1. As can be seen, the proposed method achieves better performance in terms of the ranking distribution. The degree method considers neighbor information, but many nodes have the same number of neighbors, so degree cannot distinguish the influence of nodes well. The BC and CC methods measure node influence in a global scope, yet our method still performs better even though these two methods consider more information.

3.2.2 Rank correlation
In this section, the susceptible-infectious-recovered (SIR) model [15] is used to examine the performance of the measures. The ranking produced by each measure is compared with the ranking obtained from the SIR model. In the SIR model, one node is initially set to be infected and all other nodes are in the susceptible state. Then, each infected node infects each of its susceptible neighbors with probability $\beta$, after which the node itself moves to the recovered (R) state. The spreading process is repeated until no node remains in state I. The number of recovered nodes denotes the spreading influence of the initially infected node. The value of $\beta$ should be set larger than the threshold value $\beta_{th}$ [16], calculated as $\beta_{th} \approx \langle k \rangle / \langle k^2 \rangle$, where $\langle k \rangle$ and $\langle k^2 \rangle$ denote the average degree and the average second-order degree of the nodes, respectively. Kendall's $\tau$ [17] is employed as the rank correlation coefficient to compare the ranking list obtained from the SIR simulation with the ranking list generated by each of the measures.
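The evaluation ingredients described in this section (the CCDF of a ranking, the threshold $\beta_{th}$, the SIR spreading influence of a seed node, and a pairwise rank comparison by Kendall's $\tau$) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' experimental code: the graph is a dictionary of neighbor sets, rank 1 is taken to be the highest distinct score, the SIR influence is averaged over a hypothetical number of `runs`, ties in the $\tau$ computation count as neither concordant nor discordant, and all function names are made up for this example.

```python
import random


def ccdf(scores):
    """CCDF(r) for r = 1..R; assumption: rank 1 is the highest distinct score."""
    n = len(scores)
    values = list(scores.values())
    out, seen = [], 0
    for s in sorted(set(values), reverse=True):
        seen += values.count(s)      # n_i: number of nodes sharing rank i
        out.append((n - seen) / n)   # fraction of nodes ranked below rank r
    return out


def epidemic_threshold(adj):
    """beta_th ~ <k> / <k^2>, with <k^2> the mean of the squared degrees."""
    degrees = [len(nb) for nb in adj.values()]
    k1 = sum(degrees) / len(degrees)
    k2 = sum(d * d for d in degrees) / len(degrees)
    return k1 / k2


def sir_influence(adj, seed, beta, runs=100, rng=None):
    """Average number of recovered nodes when the spread starts from `seed`."""
    rng = rng or random.Random(0)
    total = 0
    for _ in range(runs):
        infected, recovered = {seed}, set()
        while infected:
            newly = set()
            for u in infected:
                for w in adj[u]:
                    susceptible = w not in infected and w not in recovered
                    if susceptible and rng.random() < beta:
                        newly.add(w)
            recovered |= infected    # every infected node recovers after one step
            infected = newly
        total += len(recovered)
    return total / runs


def kendall_tau(x, y):
    """(N_c - N_d) / (N (N - 1) / 2) over all node pairs; ties count as neither."""
    nodes = list(x)
    nc = nd = 0
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            s = (x[nodes[i]] - x[nodes[j]]) * (y[nodes[i]] - y[nodes[j]])
            if s > 0:
                nc += 1
            elif s < 0:
                nd += 1
    n = len(nodes)
    return (nc - nd) / (n * (n - 1) / 2)
```

A typical use would compute `sir_influence` for every node at some `beta` slightly above `epidemic_threshold(adj)`, then pass the resulting dictionary together with a centrality dictionary to `kendall_tau`.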
Kendall's $\tau$ of two ranking lists $X$ and $Y$ is

$\tau(X, Y) = \frac{N_c - N_d}{\frac{1}{2} N (N - 1)}$

where $N_c$ and $N_d$ are the numbers of concordant and discordant pairs in the ranking lists, respectively, and $N$ is the number of ranked nodes. The higher $\tau$ is, the more concordant the ranking list is with the SIR spreading.

Firstly, the rank correlations of DC, BC, CC, H-index and our proposed method are compared on the Karate, Polbook, Usair and Email networks. Table 2 shows the rank correlation calculated under the infection probability $\beta$. The result shows that our method is more correlated with the SIR spreading process than the other measures. Next, the performance of the different methods is evaluated under different infection probabilities. As can be observed from Fig. 2, the proposed method outperforms the other methods when the infection probability $\beta$ is around the threshold value $\beta_{th}$.

Table 2 The rank correlation (Kendall's τ between each measure and the SIR ranking)

Network   β_th     β      DC       BC       CC       H-index   Proposed
Karate    0.129    0.13   0.7385   0.6256   0.7395   0.6779    0.9397
Polbook   0.0838   0.09   0.7814   0.3669   0.3871   0.7588    0.9149
Usair     0.0225   0.03   0.7538   0.8126   0.8126   0.7356    0.9193
Email     0.0535   0.06   0.7913   0.6308   0.8145   0.7711    0.8351

Fig. 2 Rank correlation under different infection probabilities (panels: Karate, Polbook, Usair, Email)

4. Conclusion
Identifying the importance of nodes is a fundamental task in network analysis, especially for applications in epidemic spread control. In this paper, an effective measure is proposed for evaluating the influence of nodes. The importance of a node is evaluated in a semi-local manner in which both the nearest and the next nearest neighbors are taken into consideration. Through the application of the entropy value, a node whose neighbors follow a uniform distribution is assigned a high centrality value. The experimental results demonstrate that the proposed method distinguishes the influence of nodes better, with more nodes assigned distinct centrality values, than DC, BC, CC, and H-index. Another experiment shows that the proposed method is highly correlated with the SIR ranking.

References
[1] Freeman L C. Centrality in social networks conceptual clarification[J]. Social Networks, 1978, 1(3):215-239.
[2] Freeman L C. A Set of Measures of Centrality Based on Betweenness[J]. Sociometry, 1977, 40(1):35-41.

[3] Sabidussi G. The centrality index of a graph[J]. Psychometrika, 1966, 31(4):581-603.
[4] Kitsak M, Gallos L K, Havlin S, et al. Identification of influential spreaders in complex networks[J]. Nature Physics, 2010, 6(11):888-893.
[5] Wang Z, Zhao Y, Xi J, et al. Fast ranking influential nodes in complex networks using a k-shell iteration factor[J]. Physica A: Statistical Mechanics and its Applications, 2016, 461:171-181.
[6] Chen D, Lü L, Shang M S, et al. Identifying influential nodes in complex networks[J]. Physica A, 2012, 391(4):1777-1787.
[7] Lü L, Zhou T, Zhang Q M, et al. The H-index of a network node and its relation to degree and coreness[J]. Nature Communications, 2016, 7:10168.
[8] Cover T M, Thomas J A. Elements of Information Theory. 2nd ed. 2006.
[9] Zareie A, Sheikhahmadi A, Fatemi A. Influential nodes ranking in complex networks: An entropy-based approach[J]. Chaos, Solitons & Fractals, 2017, 104:485-494.
[10] Zachary W W. An Information Flow Model for Conflict and Fission in Small Groups[J]. Journal of Anthropological Research, 1977, 33(4):452-473.
[11] Information on http://www.orgnet.com/
[12] Vladimir Batagelj and Andrej Mrvar. Pajek datasets. http://vlado.fmf.uni-lj.si/pub/networks/data/.
[13] Guimerà R, Danon L, Díaz-Guilera A, et al. Self-similar community structure in a network of human interactions[J]. Phys Rev E, 2003, 68(6 Pt 2):065103.
[14] Bae J, Kim S. Identifying and ranking influential spreaders in complex networks by neighborhood coreness[J]. Physica A, 2014, 395:549-559.
[15] Kitsak M, Gallos L K, Havlin S, et al. Identification of influential spreaders in complex networks[J]. Nature Physics, 2010, 6(11):888-893.
[16] Bae J, Kim S. Identifying and ranking influential spreaders in complex networks by neighborhood coreness[J]. Physica A, 2014, 395:549-559.
[17] Nelsen R B. Kendall tau metric. Encyclopedia of Mathematics. Springer, 2001. ISBN: 978-1-55608-010-4.