Lobby index in networks

Similar documents
Analysis of an Optimal Measurement Index Based on the Complex Network

6.207/14.15: Networks Lecture 12: Generalized Random Graphs

Degree Distribution: The case of Citation Networks

arxiv: v1 [physics.soc-ph] 15 Dec 2009

Opinion Dynamics on Triad Scale Free Network

Data Mining and Analysis: Fundamental Concepts and Algorithms

Web Structure Mining Nodes, Links and Influence

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search

Kristina Lerman USC Information Sciences Institute

Journal of Informetrics

Mining Newsgroups Using Networks Arising From Social Behavior by Rakesh Agrawal et al. Presented by Will Lee

arxiv: v2 [physics.soc-ph] 15 Nov 2017

Computational Tasks and Models

CS224W: Analysis of Networks Jure Leskovec, Stanford University

Modeling of Growing Networks with Directional Attachment and Communities

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine

Evolving network with different edges

The Beginning of Graph Theory. Theory and Applications of Complex Networks. Eulerian paths. Graph Theory. Class Three. College of the Atlantic

Three right directions and three wrong directions for tensor research

Correlation. We don't consider one variable independent and the other dependent. Does x go up as y goes up? Does x go down as y goes up?

Combating Web Spam with TrustRank

Majorizations for the Eigenvectors of Graph-Adjacency Matrices: A Tool for Complex Network Design

Complex-Network Modelling and Inference

AXIOMS FOR CENTRALITY

Centrality Measures. Leonid E. Zhukov

Lab 8: Measuring Graph Centrality - PageRank. Monday, November 5 CompSci 531, Fall 2018

Hidden Markov Models. Hal Daumé III. Computer Science University of Maryland CS 421: Introduction to Artificial Intelligence 19 Apr 2012

Self Similar (Scale Free, Power Law) Networks (I)

Justification and Application of Eigenvector Centrality

Measuring the attractiveness of academic journals: A direct influence aggregation model

Appendix of Homophily in Peer Groups The Costly Information Case

Spatial and Temporal Behaviors in a Modified Evolution Model Based on Small World Network

Complex Social System, Elections. Introduction to Network Analysis 1

. p.1. Mathematical Models of the WWW and related networks. Alan Frieze

Lecture: Local Spectral Methods (1 of 4)

1 The preferential attachment mechanism

ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 3 Centrality, Similarity, and Strength Ties

Notes on Primitive Roots Dan Klain

Lecture 8 - Algebraic Methods for Matching 1

Deterministic scale-free networks

Modeling Dynamic Evolution of Online Friendship Network

arxiv: v1 [math.co] 11 Jul 2016

PageRank: The Math-y Version (Or, What To Do When You Can t Tear Up Little Pieces of Paper)

This section is an introduction to the basic themes of the course.

Data Mining Recitation Notes Week 3

Discrete-time Consensus Filters on Directed Switching Graphs

MAE 298, Lecture 8 Feb 4, Web search and decentralized search on small-worlds

Multivariate Distribution Models

Shortest Paths & Link Weight Structure in Networks

Complex Systems Methods 11. Power laws an indicator of complexity?

Power Laws & Rich Get Richer

Rao s degree sequence conjecture

COMPUTATIONAL INEFFICIENCY OF NONPARAMETRIC TESTS FOR CONSISTENCY OF DATA WITH HOUSEHOLD CONSUMPTION. 1. Introduction

Introduction Centrality Measures Implementation Applications Limitations Homework. Centrality Metrics. Ron Hagan, Yunhe Feng, and Jordan Bush

The L 3 (4) near octagon

Hyperbolic metric spaces and their applications to large complex real world networks II

CSI 445/660 Part 6 (Centrality Measures for Networks) 6 1 / 68

Networks as vectors of their motif frequencies and 2-norm distance as a measure of similarity

Numerical differentiation

Robust Knowledge and Rationality

An improved approximation algorithm for the stable marriage problem with one-sided ties

Network Augmentation and the Multigraph Conjecture

arxiv: v1 [math.co] 3 Nov 2014

Modelling self-organizing networks

Growing a Network on a Given Substrate

The following are generally referred to as the laws or rules of exponents. x a x b = x a+b (5.1) 1 x b a (5.2) (x a ) b = x ab (5.

Online Social Networks and Media. Link Analysis and Web Search

Móstoles, Spain. Keywords: complex networks, dual graph, line graph, line digraph.

Robust Network Codes for Unicast Connections: A Case Study

Modular Monochromatic Colorings, Spectra and Frames in Graphs

6.207/14.15: Networks Lecture 16: Cooperation and Trust in Networks

Learning distributions and hypothesis testing via social learning

arxiv: v2 [math.co] 7 Jan 2016

Undirected Graphical Models

MATH 415, WEEKS 14 & 15: 1 Recurrence Relations / Difference Equations

A new centrality measure for probabilistic diffusion in network

Computing Trusted Authority Scores in Peer-to-Peer Web Search Networks

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations

The Second Eigenvalue of the Google Matrix

Link Analysis Ranking

arxiv: v1 [math.co] 13 May 2016

Self-organized Criticality in a Modified Evolution Model on Generalized Barabási Albert Scale-Free Networks

Modeling, Analysis and Validation of Evolving Networks with Hybrid Interactions

Introduction to Data Mining

DATA MINING LECTURE 13. Link Analysis Ranking PageRank -- Random walks HITS

Do not copy, post, or distribute

A New Estimator for a Tail Index

Module 1 Linear Regression

2) There should be uncertainty as to which outcome will occur before the procedure takes place.

Tutorial on Mathematical Induction

12. LOCAL SEARCH. gradient descent Metropolis algorithm Hopfield neural networks maximum cut Nash equilibria

Functional Limits and Continuity

Social Networks. Chapter 9

Some Background Material

Nonlinear Dynamical Behavior in BS Evolution Model Based on Small-World Network Added with Nonlinear Preference

Network models: random graphs

Dependence. MFM Practitioner Module: Risk & Asset Allocation. John Dodson. September 11, Dependence. John Dodson. Outline.

13.7 ANOTHER TEST FOR TREND: KENDALL S TAU

Comparative analysis of transport communication networks and q-type statistics

Competition amongst scientists for publication status: Toward a model of scientific publication and citation distributions

Transcription:

arxiv:0809.0514v6 [physics.soc-ph] 26 Jan 2009 Lobby index in networks A. Korn, A. Schubert, A. Telcs May 22, 2018 Abstract We propose a new node centrality measure in networks, the lobby index, which is inspired by Hirsch s h-index. It is shown that in scale free networks with exponent α the distribution of the l-index has power tail with exponent α(α+1). Properties of the l-index and extensions are discussed. Efficient communication means high impact (wide access or high reach) and low cost. This goal is common in communication networks, in society and in biological systems. In the course of time many centrality measures proposed to characterize a node s role, position, or influence in a network but none of them capture the efficiency of communication. This paper is intended to fill this gap and propose a new centrality measure, the lobby index. Hirsch [?] proposed the h-index: the number of papers with citation number h, as a useful index to characterize the scientific output of a researcher. Barabási & al [?] devised a very simple network model which has several key properties: most importantly the degree distribution has a powerlaw upper tail, the node degrees are independent, and typical nodes are close Department of Telecommunications and Media Informatics, University of Technology and Economics Budapest, 1117 Budapest, Magyar tudósok körútja 2, korn.andras@tmit.bme.hu Institute for Research Organisation, H-1245 Budapest, P.O.Box 994, schuba@iif.hu Department of Computer Science and Information Theory, University of Technology and Economics Budapest, 1117 Budapest, Magyar tudósok körútja 2, telcs@szit.bme.hu 1

to each other. Schubert [?] used the h-index as a network indicator, particularly in scale-free networks. This paper is devoted to the characterization of network nodes with a h-index type measure. Definition 1 The l-index or lobby index of a node x is the largest integer k such that x has at least k neighbors with a degree of at least k. (See also (1).) In what follows some properties of the lobby index are investigated; it is shown that in Scale Free (SF) networks, with exponent α, the distribution of the l-index has a fat tail with exponent α(α+1). Furthermore the empirical distribution of the l-index in generated and real life networks is investigated and some further extensions are discussed. 1 Centrality measures Freeman s prominent paper [?] (1979) pointed out that: Over the years, a great many measures of centrality have been proposed. The several measures are often only vaguely related to the intuitive ideas they purport to index, and many are so complex that it is difficult or impossible to discover what, if anything, they are measuring. It is perhaps worth noting that research in this field dates back to Bavelas [?] (1949). At the time of Freeman s paper most centrality measures were equivalents or modifications of the three major and widely accepted indexes, the degree (cf.[?]), closeness (cf. [?] and [?]) and betweenness (cf. [?]) centrality (see also Borgatti [?]). As time passed, many new centrality measures were proposed. After years of research and application, the above three and eigenvector centrality (a variant of which computer scientists call PageRank [?] and Google uses to rank search results) can be said to have become a standard; the others are not widely used. The historical three and eigenvector centrality are thus the conceptual base for investigating centrality behavior of nodes and full networks. Notwithstanding Freeman s wise warning the present paper proposes the lobby centrality (index) in the belief that Hirsch s insight into publication activity(which produces the citation network) has an interesting and relevant message to network analysis in general. 2

The diplomat s dilemma It is clear that a person has strong lobby power, the ability to influence people s opinions, if he or she has many highly connected neighbors. This is exactly the aim of a lobbyist or a diplomat [?]. The diplomat s goal is to have strong influence on the community while keeping the number of his connections (which have a cost) low. If x has a high lobby index, then the l-core L(x) (those neighbors which provide the index) has high connectivity (statistically higher than l(x), see (6) and the comment there). In this sense, the l-index is closely related to the solution of the diplomat s dilemma. Communication networks Research of communication networks and network topology are in interaction. Node centrality measures are essential in the study of net mining [?], malware detection [?], in reputation based peer topeer systems [?], delay tolerant networks [?] andothers (see [?], [?] andthe references therein). We expect that in the case of social and communication networks (some of which are also based on social networks) the lobby-index is located between the bridgeness [?], closeness, eigenvector and betweenness centrality. Based on this intermediate position of the lobby index we expect that it can be a useful aid in developing good defence and immunization strategies for peer to peer networks as well as help create more efficient broadcasting schemes in sensor networks and marketing or opinion shaping strategies. The distribution of the l-index Let us consider scale-free networks and assume that the node degrees are independent. The degree is denoted by deg(x) for nodes and the l-index is defined as follows. Let us consider all y i neighbors of x so that deg(y 1 ) deg(y 2 )...; then, l(x) = max{k : deg(y k ) k}. (1) Theorem If the vertex degrees are independent and P(deg(x) k) ck α for all nodes x, then for all nodes x, 1 The proof is provided in the Appendix. P(l(x) k) k α(α+1) (2) 1 Here and in what follows a n b n means that an b n c as n and a n b n means 1 that there is a C > 1 such that for all n, C αn b n C. 3

The Hirsch index The original Hirsch index is based on a richer model: author paper and paper citing paper links. Let x be a randomly chosen authorofthescientificcommunityunderscrutinyandn = n(x)isthenumber of his/her papers (either in general or within a defined period). Let y i denote the individual papers (where i = 1,...n,) and c(y i ) their citation score (in decreasing order), so that c(y 1 ) c(y 2 )... c(y n ). h(x) is the Hirsch index of x : h(x) = max{k : c(y k ) k}. Assume that the paper productivity has an α-fat tail: G P l = P(n(x) l) cl α and the citation score has a β-fat tail: G Q l = P(c(y) l) cl β. Along the lines of the argument that led to (2) one can see that h has an α(β +1)-fat tail [?]: P(h(x) k) k α(β+1) (3) How good is an l-index of k? If a node x of degree n has an l-index of k, Glänzel s [?] observation provides a preliminary assessment of this value: l(x) cdeg(x) 1 α+1 (4) where α is the tail exponent of the degree distribution. Consequently a lobbyist is doing a good job of solving the diplomat s dilemma if l(x) deg(x) 1 α+1. On the other hand our result shows that l(x) k means that x belongs to the top 100c α k α(α+1) percent of lobbyists. The lobby gain The performance of a lobbyist is indicated by a measure called the lobby gain. The lobby gain shows how the access to the network is multiplied using a link to the l-core. Let us use the notation D i (y) = {z : d(x,y) = i} and set D2 L (x) = y L(x) D 1 (y)\[d 1 (x) {x}] then the number of second neighbors reachable via the l-core is deg L 2 (x) = D L 2 (x) and the lobby gain is defined as Γ l (x) = degl 2 (x). (5) l(x) The lobby gain Γ l (x) is much larger than one if a typical link to the l-core provides a lot of connections to the rest of the network for x via that link. 4

It can be shown (see [?]) that the number of second neighbors reachable via the l-core (with multiplicity) is l(x) 2. The degree distribution within the l-core The influential acquaintances of a given lobbyist follow a fat tail distribution provided the underlying network is SF. In other words if y L(x) and l = k then the truncated distribution (by k) of the degree distribution of y again follows a fat tail distribution: for m > k > 0 ( m ) α. P(deg(y) m y L(x) and l = k) c (6) k Let us note that this conditional or truncated distribution has a higher expected value than the original one. 2 Network examples The analysis of different networks received particular attention in the last decades. The research goals and tools vary greatly. Here we regress to the roots and consider some classic networks and study the distribution of their lobby index. Generated scale-free networks We have generated 50 20000-node Barabási (BA) networks [?] with 10 new links each step, starting with 10 initial nodes. The degree distribution passed the preliminary test and has α = 1.96, i.e. a 1.96-fat tail. As Figure 1. shows, the empirical distribution of the lobby index has η = 5.14 while the theory predicts η = 5.76. 5

The log-log plots for the distribution of l-index We have used the generalized Barabási model [?] which can provide arbitrary α > 1. We generated 50 graphs of 10000 nodes with the proposed algorithm and obtained networks with α = 1.9186 in average, which would imply η = 5.60; we observed η = 5.28. Let us remark that the estimate of the tail exponent of fat tail distributions has a sophisticated technique [?] superior to the line fit on the log-log scale. The careful analysis and application of these methods to the l-index will be published elsewhere. The AS level graph The Autonomous System (AS) level of the Internet infrastructure has already been investigated in depth (c.f. [?] and its bibliography). It turned out that it not only has a scale-free degree distribution but displays the rich club phenomenon as well. High degree nodes are more densely interlinked than expected in a BA graph. The standard choice for AS a source of sample data is the CAIDA [?] project. We determined the exponent of the tail of the degree distribution and compared it with the exponent of the tail of the empirical distribution of the l-index. We found that α = 1.61 and η = α(α+1) = 4.21 and η = 4.14 is estimated from the empirical distribution of l. 6

The IG model Mondragón&al[?] proposed a modification of the Barabási network model, the Interactive Growth (IG) model to generate scale-free networks which exhibit the rich club behavior. In each iteration, a new node is linked to one or two existing nodes (hosts). In the first case the host node is connected to two additional peer nodes using the preferential attachment scheme while in the latter case only one of the involved (randomly chosen) hosts is connected to a new peer. We implemented this algorithm and again compared the exponents extracted from sample data. In this case the network sizewas3000andtheprobabilityofonehostwas0.4andoftwohostswas0.6. Again the log-log fit of the degree distribution tail yielded α = 1.23,η = 2.74 and η = 2.45 given by the empirical distribution of l. The place of the lobby index among other centrality measures As we already indicated above the lobby index lies somewhere between the closeness centrality (cl), betweenness (bw) and eigenvector (ev) centrality. Strong correlation with degree centrality is out of the question in the light of (4). In order to gain a better picture on the behavior of the lobby index we determined the Spearman correlation between these centrality measures in the AS graph. l cl bw ev l 1.652.768.604 cl.652 1.500.972 bw.768.500 1.479 ev.604.972.479 1 Table 1. Spearman rank correlation The correlations in Table 1. indicate that the l-index contains a well balanced mix of other centrality measures; the l-index is slightly closer to the three classical centralities than they are to each other (the quadratic mean of the three correlations, in boldface, is 0.638 while the quadratic mean of of the correlations, in italic, with the l-index is 0.678). The Kendall correlations of the investigated centralities have been calculated and yielded a very similar picture. For biological networks the Spearman correlation between the closeness and eigenvector centrality is high(c.f. [?]); high Pearson correlation can be observed on other networks as well (c.f. [?]). One centrality measure can be used to approximate the other, which is not the case with the l-index for the AS graph, but may happen in other types of networks. This will 7

save computation time given the simplicity of the calculation of the l-index. Freeman s paper [?] as well as [?] and [?] are calls for a further analysis of centrality measures and the l-index on different types of networks. Conclusion A new centrality measure, the l-index is proposed and examined. It is shown that the distribution of the l-index has α(α+1)-fat tail of SF networks with exponent α. There is a good match between empirical observations (collected in Table 2) and the theoretical result. In this case the aim of the empirical results was not to verify the theory but to emphasize that the investigated networks behave in the expected way with respect to the lobby index. α η η BA graph 1.96 5.76 5.14 generalized BA 1.92 5.60 5.28 AS 1.61 4.21 4.14 IG 1.23 2.74 2.45 Table 2. The tail exponents of networks The lobby index is placed on the map along other centrality measures. Some further extensions and properties are discussed as well: the relation to the diplomat s dilemma is investigated and the lobby index is demonstrated to be a good performance measure for lobbyists. Acknowledgments We thank Tamás Nepusz who kindly provided the calculations of the classical centralities and their correlations in Table 1. for the AS graph and gave same valuable remarks on the manuscript. We also thank Fülöp Bazsó for his useful comments on the draft. Thanks are due to Tommaso Colozza and Erik Bodzsár who gave very valuable numerical calculations and simulation results for this work. The authors were partially supported by the High Speed Network Laboratory of the Budapest University of Technology and Economics. The authors also wish to thank the referees for their helpful comments and suggestions. 3 Appendix In what follows we provide a rigorous derivation of (2) 2. Let us use the notation l k = P(l(x) = k) for the distribution of the l-index, and G k = 2 Henceforth c will be an arbitrary positive constant unless specified otherwise. Its value may change from occurrence to occurrence. 8

P(deg(x) k) = 1 F k. l k = = P(l(x) = k,deg(x) = k +l) P(l(x) = k deg(x) = k +l)p(deg(x) = k +l) Partition of unity and conditional probability is used (Bayes Theorem). As a result we have to investigate what is the probability that a node has k links, each has degree k and l other links with degree not higher then k given that it has k +l links in total. That criteria makes exactly l(x) = k. First we develop a lower estimate for l k. l k c k 1k αk ( ) k +l (k +l) (α+1) (1 c1 k α) l l We estimate l k using 1 c 1 k α e c 1k α and ( ) k+l l l k l k c ck 1 k αk c ck 1 k αk 0 l k (α+1) e c 1 l k α x k (α+1) e c 1 x k α dx cc k αkγ(k (α+1)+1) 1k = ck α2γ(k α) ( k α ck α2(k α)k α 1/2 e k α+θ/12(k α) k k+1/2 e k+θ/12(k) ck α(α+1) 1. c 1 ) k (α+1)+1 where the Stirling formula has been used for Γ(k α) and as well and 0 < Θ < 1. The upper estimate works similarly as follows. l k = (G k ) k ( ) k +l (k +l) (α+1) (F k+1 ) l. l 9

l k c ck 1 k αk c ck 1k αk = c ck 1k αk = c ck 1 k αk c ck 1 k αk c ck 1k αk l=k l=k Introducing a new variable one obtains l k c ck 1k αk [( k = c ck 1 k αk 0 k+1) αc1 k α 0 (k +l) k (α+1) e c 1 l (k+1) α (k +l) k (α+1) e c 1 l k α( k k+1) α l k (α+1) e c 1 (l k) ( k k+1) α k α l k (α+1) e c 1 l k α( k k+1) α l k (α+1) e c 1 l k α( k k+1) α x k (α+1) e ( k x k (α+1) e ( k = cc k αkγ(k (α+1)+1) 1k ck α2γ(k α) k+1) α c 1 x k α k+1) α c 1 x k α dx dx ] k+α y k (α+1) e y dy ( 1+ 1 k ) k ck α2(k α)k α 1/2 e k α+θ/12(k α) k k+1/2 e k+θ/12(k) ck α(α+1) 1 0 ( ) k α k α [( ) α ] k α k +1 whereattheendthestirlingformulashavebeenusedasinthelowerestimate. c 1 k 10

-6 3 3.5 4 4.5-7 -8-9 -10-11 -12 y = -5.1447x + 11.003 R 2 = 0.9213-13

-2.8 1.5 2 2.5 3-3.8-4.8-5.8-6.8-7.8 y = -2.497x + 0.9611 R 2 = 0.9243-8.8-9.8

0 2.5 3 3.5-1 -2-3 -4-5 -6-7 -8-9 -10 y = -5.2813x + 11.4 R 2 = 0.9975

-3-4 2.5 3 3.5-5 -6-7 y = -2.6427x + 3.8965 R 2 = 0.8485-8 -9