Lognormal distribution in the digg online social network


Eur. Phys. J. B 83, 251–261 (2011), DOI: 10.1140/epjb/e2011-20124-0
Regular Article, THE EUROPEAN PHYSICAL JOURNAL B

P. Van Mieghem, N. Blenn, and C. Doerr
Delft University of Technology, Faculty of EEMCS, 2628 CD Delft, Netherlands

Received 16 February 2011 / Received in final form 2 June 2011
Published online 15 September 2011
© EDP Sciences, Società Italiana di Fisica, Springer-Verlag 2011

Abstract. We analyse the number of votes, called the digg value, which measures the impact or popularity of submitted information in the Online Social Network Digg. Experiments over five years indicate that the digg value of a story on the first frontpage closely follows a lognormal distribution. While the law of proportionate effect explains lognormal behavior, the proportionality factor $a$ in that law is assumed to have a constant mean, whereas experiments show that $a$ decreases linearly with time. Our hypothesis, the probability that a user diggs (votes) on a story given that he observes a certain digg value $m$ equals $am$, can explain observations, provided that the population of users that can digg on that story is close to Gaussian.

1 Introduction

In recent years, online social networks (OSNs) have experienced an explosive growth. With hundreds of such services available and a subscribed user base of several hundred million people, they have significantly altered the way people spend their time [1] and how they search for content [2]. One specific kind of OSNs addressed in this paper are social bookmarking services and, in particular, social news aggregators such as digg.com, delicious.com or reddit.com. In this type of OSN, users share and comment on information (such as bookmarks, opinions, news, etc.), further called stories. The community will vote (an activity referred to as "to digg") on the submitted stories, where the sum of all votes on a story, which is called the digg value of a story, is publicly displayed as some ranking information and therefore reflects the impact of a story.
The life-time of a story is also studied and used in [3] to predict the popularity of online content. Collaborative tagging in social media, as in digg.com, has been overviewed in [4] from a statistical physics point of view. Empirical data (see e.g. [5,6]) illustrate that the digg value of an arbitrary story follows a lognormal distribution as shown in Figure 1. For over a hundred years, lognormal distributions have been observed in many different areas [7–9], from economy to biology and now in OSNs. Recent work [10] further demonstrated that a lognormal distribution may further universally characterize datasets which have previously been thought of as typical instances of a power-law. The fascinating underlying process that asymptotically generates a lognormal distribution is the law of proportionate effect, which is briefly reviewed in Appendix A. Earlier, Wu and Huberman [11] have argued that the digg value is, indeed, generated by the law of proportionate effect when ageing effects are taken into account. However, Section 6 questions their arguments.

e-mail: p.f.a.vanmieghem@tudelft.nl

In this article, we propose another process in Section 3 that leads to a lognormal distribution: our counting process avoids asymptotic limits, but in return requires that the user population is approximately Gaussian. The new insight is the probabilistic relation (12) in Section 3.2 of whether a user will digg on a story, given that he observes the digg value. The remainder of the paper tries to relate both governing processes with experiments. The mechanics of Digg and the data extraction method from the Digg OSN, observed over a period of five years, are described in [6,12]. Section 7 concludes our exploration: the law of proportionate effect alone seems insufficient to explain the experimental evidence, while our counting process is able to explain most findings, provided the user population is normally distributed. The key quantity is the proportionality factor $a$. Experiments exhibit a roughly linearly decreasing $a$ with time.
The average of the proportionality factor in the law of proportionate effect (Appendix A) is, however, constant. Our finite counting process and the hypothesis (12) agree with experiments: $a$ decreases linearly in the number of users that could digg on a story and approximately linearly with time. Our results may have broader applicability, as [13] for example also find log-normal behavior in the popularity of movies at various levels, and hypothesize a self-reinforcing process from explicit social recommendations as a potential cause.

2 The digg value of a story

Let $D_s$ denote the digg value, the number of diggs, on a story $s$. If we assume that a user can only digg once on a story, then

$$D_s = \sum_{j=1}^{N_s} 1_{\{\text{user } j \text{ diggs on } s\}} \tag{1}$$

where $N_s$ is the total number of users that have the opportunity to digg on story $s$. The indicator function $1_x$ equals one if the event $x$ is true, else it is zero.

Fig. 1. (Color online) The probability density function (pdf) of the digg value $X_d$ of stories on the first five frontpages in Digg. Each pdf is fitted by a lognormal (2) and the corresponding parameters $\mu$ and $\sigma$ are shown in the legend.

2.1 Experiments

Empirical data illustrated in Figure 1 show that only the digg value of stories on the first frontpage is well fitted by a lognormal density function [14], p. 57

$$f_{\text{lognormal}}(u) = \frac{1}{u \sigma \sqrt{2\pi}} \exp\left[-\frac{(\log u - \mu)^2}{2\sigma^2}\right]. \tag{2}$$

The probability density function (pdf) of the other frontpages decays in the left and right tail faster than a lognormal distribution. The decrease faster than a lognormal is likely due to the ageing of the story (explained in Sect. 2.2), the decreasing number of users that visit subsequent frontpages and the convolution effect (that forces any distribution towards a Gaussian distribution). The latter is a consequence of the definition $D_s = D_{s0} + D_{s1} + D_{s2} + \ldots$, where $D_{sk}$ is the sum of the diggs on story $s$ while on frontpage $k = 1, 2, \ldots$ and $D_{s0}$ is the digg value of story $s$ just before it is promoted from the upcoming section to the first frontpage. Although $D_{sk}$ and $D_{sm}$ are not independent, we infer that they are only weakly dependent.

2.2 Users digg dependently

It is conceivable that the probability that the $j$-th user, further called user $j$, diggs on story $s$ is dependent on the digg value that he observes, when his eye catches the story $s$. The digg value that user $j$ sees when first encountering the story $s$ is, for $j > 1$, the sum of all diggs of users that dugg on the story before him/her

$$X_s(j) = \sum_{k=1}^{j-1} 1_{\{\text{user } k \text{ diggs on } s\}}. \tag{3}$$

The eventual digg value of story $s$ is $D_s = X_s(N_s + 1)$, while $X_s(1) = 0$ and $E[X_s(2)] = p_1$, the probability that the first user diggs on story $s$.
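As a small numerical companion to (2), the lognormal density can be evaluated directly from its definition. The sketch below is illustrative only: the function names are hypothetical, $\mu = 5.2$ is the approximate first-frontpage value quoted later in the text, and $\sigma = 0.6$ is an arbitrary assumption rather than a fitted value.

```python
import math

def lognormal_pdf(u, mu, sigma):
    """Lognormal density (2): exp(-(log u - mu)^2 / (2 sigma^2)) / (u sigma sqrt(2 pi))."""
    if u <= 0:
        return 0.0
    z = (math.log(u) - mu) / sigma
    return math.exp(-0.5 * z * z) / (u * sigma * math.sqrt(2.0 * math.pi))

def integrate_pdf(mu, sigma, n=20000):
    """Trapezoidal integral of the density over u, using the substitution u = exp(y)."""
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        y = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        # du = exp(y) dy, so the integrand in y is pdf(exp(y)) * exp(y)
        total += weight * lognormal_pdf(math.exp(y), mu, sigma) * math.exp(y)
    return total * h

MU, SIGMA = 5.2, 0.6  # illustrative values; sigma is an assumption
print(integrate_pdf(MU, SIGMA))  # should be close to 1
print(math.exp(MU))              # median of the lognormal, exp(mu)
```

Integrating in $y = \log u$ turns the lognormal into a Gaussian in $y$, which is the same change of variable that links the two distributions throughout the paper.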
We may argue that the digging probability of a user is influenced by the story's digg value: if the number of diggs is low, a user has an a priori feeling that the story is not so attractive and his motivation to digg is lowered, while the opposite occurs when a high number of diggs is observed. Another phenomenon is ageing: after some time, the novelty of the story may diminish, especially if the story contains news or temporary information. In that case, the overall motivation to digg on the story, independent of the digg value, decreases. Wu and Huberman [11] explain that the story's growing attraction, measured via its increasing digg value, is counterbalanced by the ageing of the story. As discussed in Section 6, the analysis and assumptions of ageing effects in [11] are debatable. Moreover, the user population that visits the $k$-th frontpage for $k > 4$ is decreasing so quickly that data is scarce. Therefore, in the sequel, ageing is not considered, nor time-dependent effects that affect the digg value. We also ignore primacy and recency effects [15] that would predict that the topmost and bottommost elements would attract more attention, simply because of their position in a long list. For example, the Digg feature "Top Stories in All Topics" displays 10 stories in an abbreviated format on each frontpage, so that the list can be seen in one scan by any visitor (15 items on a VGA screen require one scrolling action). Due to their special placement alone (irrespective of their content), Top Stories are likely to receive a higher than normal number of diggs. In the sequel, we do not distinguish between stories.

Our main interest is to figure out whether the motivation to digg on a story is linearly dependent on that story's digg value. If a linear dependence holds, a lognormal distribution is the natural consequence by the law of proportionate effect (Appendix A). In order to keep external factors as constant as possible and guided by the experiments in Figure 1, we mainly concentrate on the digg value of a story as long as it is on the first frontpage.

3 A general description for the average digg value

We take advantage of the fact that the digg value, both $X_s(j)$ and $D_s$, is a sum of indicators: Bernoulli random variables that are either zero or one. Since $E\left[1_{\{\text{user } k \text{ diggs on } s\}}\right] = \Pr[\text{user } k \text{ diggs on } s]$ and since the expectation operator is linear (irrespective of dependencies between random variables in the sum), taking the expectation of (3) yields

$$E[X_s(j)] = \sum_{k=1}^{j-1} \Pr[\text{user } k \text{ diggs on } s].$$

By the law of total probability [14], we have that

$$\Pr[\text{user } k \text{ diggs on } s] = \sum_{m=0}^{k-1} \Pr[\text{user } k \text{ diggs on } s \mid X_s(k) = m] \Pr[X_s(k) = m].$$

When writing the conditional probability as

$$\Pr[\text{user } k \text{ diggs on } s \mid X_s(k) = m] = g_k(m) \tag{4}$$

where $g_k(x)$ is a non-negative function that maps $x$ to the interval $[0, 1]$, we obtain

$$\Pr[\text{user } k \text{ diggs on } s] = \sum_{m=0}^{k-1} g_k(m) \Pr[X_s(k) = m] = E[g_k(X_s(k))] \tag{5}$$

where we have used the definition of the expectation of a function of a random variable [14], p. 17. Thus, we arrive at the general relation for the average number of diggs that the $j$-th user sees

$$E[X_s(j)] = \sum_{k=1}^{j-1} E[g_k(X_s(k))] \tag{6}$$

from which we obtain the difference equation

$$E[X_s(j)] - E[X_s(j-1)] = E[g_{j-1}(X_s(j-1))]. \tag{7}$$

The average total number $D_s = X_s(N_s + 1)$ of diggs on a story $s$ by a population of $N_s$ users is

$$E[D_s] = \sum_{k=1}^{N_s} E[g_k(X_s(k))].$$

We illustrate in Appendix B that, for positively correlated users as in Digg, the variance $\text{Var}[X_s(j)]$ can be small. In fact, the stronger the correlation in the digging behavior between users, the smaller the variance. A small variance $\text{Var}[X_s(j)]$ implies that the mean $E[X_s(j)]$ is a good approximation for the random variable $X_s(j)$.

So far, we have implicitly assumed that $N_s$ is constant. However, $N_s$ is, in fact, also a random variable, denoting the number of users that have discovered story $s$ within a certain time interval. If $N_s$ is a random variable, then the above computation is valid for the conditional expectation $Y_s = E[D_s \mid N_s]$, which is the random variable equal to the average number of diggs on a story $s$ given that the total number of users equals $N_s$.

We believe that the r.v. $Y_s$ is approximately measured. Each story $s$ has a number $D_s$ of diggs that are recorded, while the total population of potential diggers is $N_s$. Hence, we have a sequence of stories with their corresponding diggs and population $\{(D_s, N_s)\}_{s \geq 1}$. Clearly, it follows from the definition (1) that $D_s \leq N_s$. If the information about the population is omitted, intuitively one feels that $D_1$ cannot be compared to $D_2$, because it is obvious that if $n_2 = \frac{n_1}{2}$, then the number of diggs $D_2$ cannot be higher than $\frac{D_1}{2}$. Yet, all digg values of the different stories, the sequence $\{D_s\}_{s \geq 1}$, are placed in one histogram. In fact, by doing so, we compare different processes, while a histogram should only record the outcomes of the same stochastic process and each outcome should be independent of all others. Therefore, we believe that the histogram approximates the conditional random variable $D_s \mid N_s$ by its best guess, the estimated value $E[D_s \mid N_s] = Y_s$. The remainder consists of choosing the function $g_k(x)$, defined in (4).

3.1 A simple case: proportionality

A simple mathematical choice is the linear function $g_k(x) = a_k x + b_k$ and the general difference equation (7) becomes

$$E[X_s(j)] = (1 + a_{j-1}) E[X_s(j-1)] + b_{j-1}.$$

After $m$ iterations of this difference equation, we obtain

$$E[X_s(j)] = E[X_s(j-m)] \prod_{k=1}^{m} (1 + a_{j-k}) + \sum_{l=1}^{m} b_{j-l} \prod_{k=1}^{l-1} (1 + a_{j-k}) \tag{8}$$

and for $j = N_s + 1$, since $D_s = X_s(N_s + 1)$,

$$E[D_s] = E[X_s(N_s + 1 - m)] \prod_{k=N_s+1-m}^{N_s} (1 + a_k) + \sum_{l=N_s+1-m}^{N_s} b_l \prod_{k=l+1}^{N_s} (1 + a_k).$$

With the initial condition $E[X_s(2)] = p_1$, we have

$$E[D_s] = p_1 \prod_{k=2}^{N_s} (1 + a_k) + \sum_{l=2}^{N_s} b_l \prod_{k=l+1}^{N_s} (1 + a_k). \tag{9}$$

When choosing all $a_k = a$ and all $b_k = b$, then expression (8) simplifies to

$$E[X_s(j)] = E[X_s(j-m)] (1 + a)^m + b \sum_{l=1}^{m} (1 + a)^{l-1} = \left(E[X_s(j-m)] + \frac{b}{a}\right) (1 + a)^m - \frac{b}{a} \tag{10}$$

such that, for $j = N_s + 1$,

$$E[D_s] = \left(E[X_s(N_s + 1 - m)] + \frac{b}{a}\right) (1 + a)^m - \frac{b}{a}. \tag{11}$$

Hence, the average digg value $E[D_s] = E[D_{s1}]$ of a story just before it disappears from the frontpage can be expressed as a function of the average digg value $E[X_s(N_s + 1 - m)] = E[D_{s0}]$ just after the story appears on the frontpage¹. During the time on the frontpage, precisely $m$ users have had the opportunity to digg on that story. Suppose that the number of users $m$ in (11) is a Gaussian $N(\tilde{\mu}, \tilde{\sigma}^2)$, then the random variable $Y_{s1} = E[D_{s1} \mid m]$ is lognormal, provided $E[X_s(N_s + 1 - m)] = E[D_{s0}]$ is a known constant. Indeed,

$$\Pr[Y_{s1} \leq x] = \Pr\left[\left(E[D_{s0}] + \frac{b}{a}\right)(1 + a)^m - \frac{b}{a} \leq x\right] = \Pr\left[m \leq \frac{\log \frac{x + b/a}{E[D_{s0}] + b/a}}{\log(1 + a)}\right] = \frac{1}{\tilde{\sigma}\sqrt{2\pi}} \int_{-\infty}^{\frac{\log \frac{x + b/a}{E[D_{s0}] + b/a}}{\log(1 + a)}} \exp\left[-\frac{(t - \tilde{\mu})^2}{2\tilde{\sigma}^2}\right] dt$$

such that

$$f_{Y_{s1}}(x) = \frac{d \Pr[Y_{s1} \leq x]}{dx} = \frac{1}{\left(x + \frac{b}{a}\right) \tilde{\sigma} \log(1 + a) \sqrt{2\pi}} \exp\left[-\frac{\left(\log\left(x + \frac{b}{a}\right) - \log\left(E[D_{s0}] + \frac{b}{a}\right) - \tilde{\mu} \log(1 + a)\right)^2}{2 \tilde{\sigma}^2 \log^2(1 + a)}\right]$$

which is recognized from (2) as a lognormal distribution in $u = x + \frac{b}{a}$ with parameters $\mu = \log\left(E[D_{s0}] + \frac{b}{a}\right) + \tilde{\mu} \log(1 + a)$ and $\sigma = \tilde{\sigma} \log(1 + a)$. Hence, assuming a Gaussian population, the digg value $D_{sk}$ of the story on frontpage $k$ can be described by a lognormal distribution, possibly with different parameters $a$ and $b$ when ageing is taken into account. Clearly, the expressions simplify considerably if $b = 0$.

¹ Similarly, we can express the average number on the $k$-th frontpage by considering $E[D_s] = E[D_{sk}]$, $E[X_s(N_s + 1 - m)] = E[D_{s,k-1}]$ and $m$ is the number of users that had the opportunity to digg on the story during its stay on the $k$-th frontpage.
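The closed form (10) and the lognormal claim for a Gaussian $m$ can be checked numerically. The sketch below iterates the difference equation with constant $a$ and $b$, compares the result with (10), and then samples $m$ from a Gaussian to confirm that $\log(Y + b/a)$ has mean $\log(E[D_{s0}] + b/a) + \tilde{\mu}\log(1+a)$ and standard deviation $\tilde{\sigma}\log(1+a)$. All numerical values and function names are assumptions chosen for illustration, not fitted to Digg data, and $m$ is treated as a real number, as in the derivation.

```python
import math
import random
import statistics

def iterate_mean(x0, a, b, m):
    """Apply E[X(j)] = (1 + a) E[X(j-1)] + b a total of m times, starting from x0."""
    x = x0
    for _ in range(m):
        x = (1.0 + a) * x + b
    return x

def closed_form(x0, a, b, m):
    """Closed form (10): (x0 + b/a) (1 + a)^m - b/a; m may be real-valued."""
    return (x0 + b / a) * (1.0 + a) ** m - b / a

# Illustrative values (assumptions, not taken from the paper).
A, B, X0 = 0.002, 0.01, 60.0
M_MEAN, M_STD = 1000.0, 150.0

print(iterate_mean(X0, A, B, 500), closed_form(X0, A, B, 500))

# Sample a Gaussian number of users m and inspect log(Y + b/a).
rng = random.Random(1)
logs = [math.log(closed_form(X0, A, B, rng.gauss(M_MEAN, M_STD)) + B / A)
        for _ in range(20000)]
mu_pred = math.log(X0 + B / A) + M_MEAN * math.log(1.0 + A)
sigma_pred = M_STD * math.log(1.0 + A)
print(statistics.mean(logs), mu_pred)      # sample mean vs predicted mu
print(statistics.stdev(logs), sigma_pred)  # sample std vs predicted sigma
```

Because $\log(Y + b/a)$ is an affine function of $m$, the Gaussian shape of $m$ carries over exactly, and only the sampling noise separates the printed pairs.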
In the case $b = 0$, our derivation is in line with the law of proportionate effect as shown in Section 4. However, the parameters of the lognormal distribution are different.

3.2 Interpretation

The choice $g_j(x) = a_j x + b_j$ implies, as follows from (4), that

$$\Pr[\text{user } j \text{ diggs on } s \mid X_s(j) = m] = a_j m + b_j.$$

This general linear form dependent on user $j$ can be useful to specify the digging behavior of friends and non-friends of the originator of the story $s$. In case user $j$ is a friend of the originator of the story $s$, he/she may be insensitive to the digg value $X_s(j) = m$ that he/she observes when first encountering the story, because his/her friendship relation with the originator outweighs the judgements of other diggers. Hence, $a_j = 0$ and $b_j = b \leq 1$, a constant value that expresses the faith or depth of the friendship relation of friends in the originator. On the other hand, non-friends have almost no faith in the originator, but in their peers, such that $a_j = a \leq 1$ and $b_j = 0$. Consequently, the general linear relation (9) becomes

$$E[D_s] = p_1 (1 + a)^{(N_{\text{non-friends}} - 2)} + b \left(N_{\text{friends}} - 2\right)$$

where $N_{\text{friends}}$ is the number of friends of the originator of the story $s$ and $N_s = N_{\text{non-friends}} + N_{\text{friends}}$. Assuming that both populations of friends and non-friends are Gaussian-like distributed (with possibly different mean and variance), this expression shows that $E[D_s \mid N_{\text{friends}}, N_{\text{non-friends}}]$ is the sum of a Gaussian and a lognormal, which is again heavy-tailed. Since $N_{\text{friends}}$ is usually smaller than $N_{\text{non-friends}}$, a lognormal distribution is expected to dominate [6].

When returning to the simplest linear case for $g_j(x) = a_j x + b_j$, where $a_j = a$ and $b_j = 0$, then

$$\Pr[\text{user } j \text{ diggs on } s \mid X_s(j) = m] = a m \tag{12}$$

implying that $a > 0$ is assumed to be constant for each user $j$. Thus, $a \leq \frac{1}{N_s}$ in order that all conditional probabilities, for each $j \leq N_s$, are smaller than or equal to 1. Thus, (12) illustrates that $a$ decreases with $N_s$, which is in line with Figure 4. Furthermore, relation (5) reduces to

$$\Pr[\text{user } k \text{ diggs on } s] = a E[X_s(k)].$$

If the user is the $k$-th user, counted since the story is on the $j$-th frontpage, then, with (10), we have that

$$\Pr[\text{user } k \text{ diggs on } s] = a E[D_{s,j-1}] (1 + a)^k$$

which shows an exponentially increasing digging probability in $k$.

4 Application of the law of proportionate effect to the digg value

The law of proportionate effect, explained in Appendix A, is more general than the analysis in Section 2.2: it applies to the random variable $X_s(n)$ instead of the mean $E[X_s(n)]$. On the other hand, it is an asymptotic result ($n \to \infty$), which implicitly assumes a rapid convergence towards a Gaussian in order to observe the lognormal distribution already for finite $n$. In our case, where the digg value $X_j = X_s(j + 1)$ is a sum of indicators as defined in (3), we have that

$$X_s(j + 1) - X_s(j) = 1_{\{\text{user } j \text{ diggs on } s\}}.$$

According to the law of proportionate effect $X_j = (1 + \alpha_j) X_{j-1}$, we deduce that

$$1_{\{\text{user } j \text{ diggs on } s\}} = \alpha_j X_s(j). \tag{13}$$

Since $\alpha_j$ and $X_s(j)$ are independent as assumed in the law of proportionate effect, taking the expectation leads to

$$\Pr[\text{user } j \text{ diggs on } s] = E[\alpha_j] E[X_s(j)]$$

and we conclude from (5) with $g_j(x) = a_j x$ that $E[\alpha_j] = a_j$. The law of proportionate effect assumes that all random variables $\alpha_j$ are i.i.d. with mean $E[\alpha]$, such that $E[\alpha_j] = E[\alpha]$ and $a_j = a$ for each $j$. Hence, the conditional probability (12) is a manifestation of the law of proportionate effect and provides another way to verify proportional behavior. We will now compute the parameter $\mu = E[\log(1 + \alpha)]$ in the lognormal limit law for $X_n = X_s(n + 1)$ for large $n$ (see Appendix A).
Taking the $m$-th power of (13) yields

$$1_{\{\text{user } j \text{ diggs on } s\}} = \alpha_j^m X_s^m(j)$$

from which it follows that

$$\Pr[\text{user } j \text{ diggs on } s] = E\left[\alpha_j^m\right] E[X_s^m(j)] = E[\alpha^m] E[X_s^m(j)].$$

Since $E[X_s(j)] > 1$ for not too small $j$, we may conclude that $E[\alpha^m] < 1$ for all $m \geq 1$. Hence, the series

$$E[\log(1 + \alpha)] = E\left[\sum_{m=1}^{\infty} \frac{(-1)^{m-1} \alpha^m}{m}\right] = \sum_{m=1}^{\infty} \frac{(-1)^{m-1} E[\alpha^m]}{m} = \Pr[\text{user } j \text{ diggs on } s] \sum_{m=1}^{\infty} \frac{(-1)^{m-1}}{m E[X_s^m(j)]}$$

converges and the parameter $\mu = E[\log(1 + \alpha)]$ equals

$$\mu = \Pr[\text{user } j \text{ diggs on } s] \sum_{m=1}^{\infty} \frac{(-1)^{m-1}}{m E[X_s^m(j)]}.$$

The latter alternating series with decreasing terms is bounded by

$$\frac{1}{E[X_s(j)]} - \frac{1}{2 E[X_s^2(j)]} < \sum_{m=1}^{\infty} \frac{(-1)^{m-1}}{m E[X_s^m(j)]} < \frac{1}{E[X_s(j)]}.$$

Invoking the bounds yields

$$\frac{\Pr[\text{user } j \text{ diggs on } s]}{E[X_s(j)]} \left(1 - \frac{E[X_s(j)]}{2 E[X_s^2(j)]}\right) < \mu < \frac{\Pr[\text{user } j \text{ diggs on } s]}{E[X_s(j)]}$$

and

$$\mu E[X_s(j)] < \Pr[\text{user } j \text{ diggs on } s] < \frac{\mu E[X_s(j)]}{1 - \frac{E[X_s(j)]}{2 E[X_s^2(j)]}}.$$

Hence, we find that $\mu E[X_s(j)] < 1$. However, experimental results (Fig. 1) indicate that both $\mu \approx 5.2$ and $E[X_s(j)]$ are larger than 1, contradicting $\mu E[X_s(j)] < 1$. The analysis indicates that the law of proportionate effect only sets in when a certain value of $X_s(j)$, or equivalently $j$, is reached.

This conclusion is supported by the analysis in Section 3.1 (and Fig. 3 below). Using (11), assuming that $E[X_s(N_s + 1 - m)] = D_{s0}$, the digg value of story $s$ just before it is promoted to the frontpage, is a known constant and $m$ is normally distributed as $N(\tilde{\mu}, \tilde{\sigma}^2)$, we obtain a lognormal with parameter $\mu = \log\left(D_{s0} + \frac{b}{a}\right) + \tilde{\mu} \log(1 + a)$ and $a \leq \frac{1}{m}$. Confining to the case where $b = 0$ and using the average $\tilde{\mu}$ as a good estimate for $m$ such that $a = \frac{1}{\tilde{\mu}}$, then

$$\tilde{\mu} \log(1 + a) \approx \tilde{\mu} \log\left(1 + \frac{1}{\tilde{\mu}}\right) = \tilde{\mu} \left(\frac{1}{\tilde{\mu}} - \frac{1}{2\tilde{\mu}^2} + O\left(\frac{1}{\tilde{\mu}^3}\right)\right) = 1 - \frac{1}{2\tilde{\mu}} + O\left(\frac{1}{\tilde{\mu}^2}\right) < 1$$

and $\mu - \log(D_{s0}) \approx 1$. Since $\mu \approx 5.2$ in Figure 1, we deduce that $D_{s0} \approx e^{4.2}$, which agrees with experiments [6].

Fig. 2. (Color online) The time $t$ of a story $s$ at which a user has dugg on that story. Five typical stories are shown. The horizontal lines indicate the duration when the story was on the frontpage and the time $t$ seems to be approximately linear with the number of digging users while the story is on the frontpage. A story stays, on average, 2.4 h on the first frontpage, on which most Digg users are active.

5 The incremental increase of the digg value with time

If the law of proportionate effect is correct, then it follows from (A.2) that

$$X_n - X_m = X_m \left(\prod_{j=m+1}^{n} (1 + \alpha_j) - 1\right).$$

For not too small $m$, (13) shows that $\alpha_j \leq \frac{1}{X_j} < 1$. Multiplying out and neglecting terms with products yields

$$X_n - X_m \approx X_m \sum_{j=m+1}^{n} \alpha_j \approx X_m (n - m) E[\alpha]$$

and we arrive, with $a = E[\alpha]$ as shown in Section 4, at

$$X_n - X_m \approx a (n - m) X_m.$$

Figure 2 shows the age $t$ of a story as a function of the number of users that dugg on the story $s$. In other words, each data point reflects the time at which a user diggs on the story $s$. The remarkable observation is that the number of digging users on stories, when stories are on the frontpage (time interval between two horizontal lines in Fig. 2), seems to be linear in time $t$. Figure 2 illustrates five stories with different eventual digg value. Hence, Figure 2 suggests that the user's increment $n - m$ for stories on the frontpage is proportional to time $t$, i.e. the $n$-th user and his appearance are related as $n = \beta t_n + \delta$. In that case (and assuming that non-digging users also arrive according to the linear law), we have approximately

$$X_n - X_m \approx a X_m \beta (t_n - t_m). \tag{14}$$

Figure 3 shows four five-minute intervals in which the increment $f(x) = X_n - X_m$ is drawn versus the digg value $x = X_m$. When the time interval is short, hardly any linear correlation as suggested by (14) is observed. When the time interval is longer, for example 1 h instead of 5 min, Figure 4 starts revealing the law in (14): the increments are proportional to the digg value at the beginning of the time interval.
Hence, Figure 4 supports the claim that the digg values of a story are dependent and obey the law of proportionate effect, when the digg value is sufficiently high. The latter condition is mathematically not precisely determined, because the law of proportionate effect is an asymptotic law (see Appendix A), yet observable in reality, i.e. when the digg values are finite. Figure 4 indicates that the tendency towards the asymptotic regime is rather slow.

Fig. 3. (Color online) The increments during 5 min of stories on the frontpage versus their digg value at the beginning of the 5 min interval. Four 5 min intervals are shown and the linear fit of the increments $f(x)$ versus the digg value $x$. Notice that the linear fit $y = ax$ on a log-log scale is a line with slope 1 (or 45 degrees) and at $x = 1$ equal to $\log a$.

Fig. 4. (Color online) The increments of a story $s$ versus its digg value $x$ during 1 h. The only difference with Figure 3 is the duration of the interval (1 h versus 5 min).

In view of the rather slow tendency towards the asymptotic regime, it is surprising that the distribution $\Pr[X_d = k]$ in Figure 1 is close to a lognormal. These observations question that the underlying process responsible for the nice lognormal on the frontpage is generated by the law of proportionate effect, because the time scales do not match. Rather, they suggest that the user population must be Gaussian-like distributed (as needed for finite digg values in Sect. 3.1) and that a collective dependence, in proportional fashion according to the hypothesis (12), constitutes the generating process.

Figure 5 shows the fits $a \beta (t_n - t_m)$ in (14), where $t_n - t_m = 1$ hour such that $a \beta \approx c\, t_n^{-\gamma}$. Additionally approximating $\gamma \approx 1$ and using $n = \beta t_n + \delta$, we deduce that $a \approx \frac{c}{\beta t_n} \approx \frac{c}{n}$ and this result (with $c \approx 0.3 < 1$) agrees with $a \leq \frac{1}{N_s}$ derived in Section 3.2, but disagrees with the law of proportionate effect. For longer time intervals (exceeding 17 h), we observe in Figure 5 deviations in the fit of the slopes. For these long times, other effects (mainly due to ageing, the role of the user interface and the principles of human attention) start dominating.

Fig. 5. (Color online) The slope $a$, fitted from the increments during one hour versus the digg value, versus the sequential time intervals of 1 h. The fit of the first hours is also included.

6 Analysis of Wu and Huberman in [11]

After an initially fast increase, the digg value of a story flattens with time because the novelty of the story has passed. Wu and Huberman [11] have taken the ageing of a story into account. The effect of ageing means that the set $\{\alpha_k\}_{k \geq 1}$ in (A.1) starts decreasing after some time threshold $k_c$. Based on fitting experiments, Wu and Huberman [11] propose $\alpha_k = r_k b_k$, where $r_k = \exp\left(-0.4 k^{0.4}\right)$ is the ageing factor and the set $\{b_k\}_{k \geq 1}$ of random variables is i.i.d. with finite mean and variance.
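To get a feeling for how fast the ageing factor decays, it can simply be tabulated. Because $\alpha_k = r_k b_k$ with i.i.d. $b_k$, the variance satisfies $\text{Var}[\alpha_k] = r_k^2 \text{Var}[b]$, so the partial sums of $r_k^2$ track the sum of the variances up to a constant factor. The sketch below is illustrative; the function names and cut-offs are assumptions, and the time unit of $k$ is left unspecified, as in the text.

```python
import math

def ageing_factor(k):
    """Ageing factor r_k = exp(-0.4 * k**0.4), as quoted from [11]."""
    return math.exp(-0.4 * k ** 0.4)

def partial_sum_r2(n):
    """Sum of r_k^2 for k = 1..n, proportional to the sum of Var[alpha_k]."""
    return sum(ageing_factor(k) ** 2 for k in range(1, n + 1))

# Decay of the ageing factor itself.
for k in (1, 10, 100, 1000):
    print(k, ageing_factor(k))

# Partial sums of r_k^2 for increasing cut-offs.
for n in (100, 1000, 10000, 20000):
    print(n, partial_sum_r2(n))
```

In this sketch the partial sums change very little beyond a few thousand terms, since the stretched exponential decays faster than any power of $k$.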
The Lindeberg conditions in [16] for the CLT state that $\sigma_k^2 = \text{Var}[\alpha_k]$ should be small compared to $\sum_{k=1}^{N_s} \sigma_k^2$ and the latter sum should tend to infinity when $N_s \to \infty$. The decrease in $\alpha_k$ is so fast that $\lim_{n \to \infty} \sum_{k=1}^{n} \sigma_k^2$ is finite, in which case the limiting distribution is not Gaussian and, consequently, a lognormal cannot be explained from the law of proportionate effect! In that case, the function $g_k(x)$ in (4) cannot be a linear function. Nevertheless, Wu and Huberman [11] claim convergence to a Gaussian (lognormal) by referring to Embrechts and Maejima [17], Theorem 2, who show that $Z = \sum_{j=1}^{\infty} c_j(\lambda) X_j$ converges to a Gaussian with a rate of the order of $O\left(\lambda^{\alpha\beta}\right)$, where $\alpha > 0$ and where $\{X_j\}_{j \geq 0}$ is a set of i.i.d. random variables with mean 0 and variance 1, and where $c_j(\lambda) = O\left(\lambda^{\beta}\right)$ with $\beta > 0$. Yet, it is not clear whether this theorem applies to demonstrate convergence to a Gaussian (lognormal), because $r_j = \exp\left(-0.4 j^{0.4}\right)$ is compared to $c_j(\lambda)$ and the $r_j$'s depend on $j$, while the convergence in Embrechts and Maejima assumes that $c_j(\lambda) = O\left(\lambda^{\beta}\right)$ for all $j$. While we have shown in Section 5 that the law of proportionate effect cannot explain the experiments, the arguments of Wu and Huberman at least lack rigor, as sketched above. Apart from mathematical rigor, since ageing does not play a dominant role on the first frontpage, their approach is essentially equal to the law of proportionate effect, which cannot explain the nice lognormal on the first frontpage.

7 Conclusion

We have investigated two different analyses that lead to a lognormal distribution, observed (see Fig. 1) for the digg value of a story, while on the first few frontpages. Usually, the lognormal distribution is the characteristic fingerprint of the law of proportionate effect. However, we found that the law of proportionate effect only seems to hold when a certain digg value of the story is reached, and not from the beginning of the digg counting. Moreover, the proportionality factor $a$ is not constant as required by the law of proportionate effect, but dependent on the number of users as shown in our analysis of Section 3 and experimentally verified in Section 5. Finally, the governing difference equation (A.1) of the law of proportionate effect is not experimentally observed at times the story is on the first frontpage. Hence, the law of proportionate effect cannot explain the fast convergence towards the lognormal distribution of digg values on the first frontpage.

The second analysis just sums dependent Bernoulli random variables. We show that the dependence among users is reflected by (12), $\Pr[\text{user } j \text{ diggs on } s \mid X_s(j) = m] = a m$. This conditional probability (12) is a manifestation of proportionate effect and provides another way to verify proportional behavior. The conditional probability (12) illustrates how individual human behavior is affected by that of others, given that the individual can observe how the others react, e.g. via the digg value $X_s(j) = m$. Also, the above conditional probability shows that $a \leq \frac{1}{m}$ for each user $m$. At last, our analysis of Section 3 demonstrates that a lognormal distribution of the digg value $D_s$ is obtained when the population $N_s$ of potential diggers is normally distributed. Gaussian user populations are approximately measured².
At last, our analysis underlines the importance of the user population specifics, which are not relevant in the law of proportionate effect because of its asymptotic nature ($N_s \to \infty$).

Footnote 2: See e.g. http://www.alexa.com, Alexa the Web Information Company, Alexa Internet, Inc., 2010.

We are grateful to Siyu Tang for her useful comments. Norbert Blenn is funded by TRANS (www.trans-research.nl).

Appendix A: The law of proportionate effect

As mentioned in [7] and in [18], Kapteyn considered in 1903 the equation

$$X_j - X_{j-1} = \alpha_j f(X_{j-1})$$

where the set $\{\alpha_j\}_{1 \le j \le n}$ of random variables is mutually independent and identically distributed, equal to the distribution of the random variable $\alpha$ with mean $E[\alpha]$ and variance $\mathrm{Var}[\alpha]$. Moreover, the set $\{\alpha_j\}_{1 \le j \le n}$ of random variables is also independent of the random variables $X_1, X_2, \ldots, X_n$. The special case where $f(x) = x$ reduces to

$$X_j = (1 + \alpha_j) X_{j-1} \qquad \text{(A.1)}$$

and the process that determines the sequence $X_1, X_2, \ldots, X_n$, given $X_0$, is said to obey the law of proportionate effect, which was first introduced by Gibrat [19]. After iterating the equation (A.1), we obtain

$$X_n = X_0 \prod_{j=1}^{n} (1 + \alpha_j). \qquad \text{(A.2)}$$

By the Central Limit Theorem [14] and assuming that every $\alpha_j > -1$, the sum $S_n = \sum_{j=1}^{n} \log(1 + \alpha_j)$ of the i.i.d. random variables $\{\log(1 + \alpha_j)\}_{j \ge 1}$, each with distribution identical to that of $\log(1 + \alpha)$ with (finite) mean $E[\log(1 + \alpha)] = \mu$ and variance $\sigma^2 = \mathrm{Var}[\log(1 + \alpha)]$, satisfies

$$\frac{S_n - n\mu}{\sigma \sqrt{n}} \xrightarrow{d} N(0, 1)$$

which implies that, for large $n$, $S_n = \log\left(\prod_{j=1}^{n} (1 + \alpha_j)\right)$ is approximately distributed as $N(n\mu, n\sigma^2)$. Equivalently, $e^{S_n} = \prod_{j=1}^{n} (1 + \alpha_j)$ tends, for large $n$, to a lognormal distribution with parameters $n\mu$ and $n\sigma^2$. Hence, we have shown that, for large $n$, $X_n$ is asymptotically lognormally distributed with parameters $n\mu$ and $n\sigma^2$, which are linear in $n$.

There is a continuous variant of the law of proportionate effect. In biology, the growth in the number $n(t)$ of items of a same species over time $t$ can be modelled by the following first-order differential equation

$$\frac{dn(t)}{dt} = r(t)\, n(t).$$
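As a numerical illustration of the discrete law (A.1)-(A.2), the following minimal Python sketch iterates $X_j = (1 + \alpha_j) X_{j-1}$ and compares the sample mean and variance of $\log X_n$ with $n\mu$ and $n\sigma^2$. The uniform distribution chosen for $\alpha$, its bounds, the number of steps and runs, and all function names are assumptions made for this illustration only; they do not come from the Digg measurements.

```python
import math
import random
import statistics


def log_moments_uniform(low, high):
    """Exact mean and variance of log(1 + alpha) for alpha ~ Uniform(low, high), low > -1."""
    c, d = 1.0 + low, 1.0 + high
    # Antiderivatives: int log u du = u log u - u
    #                  int (log u)^2 du = u (log u)^2 - 2 u log u + 2 u
    first = lambda u: u * math.log(u) - u
    second = lambda u: u * math.log(u) ** 2 - 2.0 * u * math.log(u) + 2.0 * u
    mu = (first(d) - first(c)) / (d - c)
    sigma2 = (second(d) - second(c)) / (d - c) - mu ** 2
    return mu, sigma2


def simulate_log_x(n_steps, n_runs, low=-0.1, high=0.3, x0=1.0, seed=1):
    """Iterate X_j = (1 + alpha_j) X_{j-1} (A.1) and return log X_n for each run."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n_runs):
        log_x = math.log(x0)
        for _ in range(n_steps):
            alpha = rng.uniform(low, high)  # assumed i.i.d. law for alpha, with alpha > -1
            log_x += math.log1p(alpha)      # log X_j = log X_{j-1} + log(1 + alpha_j)
        logs.append(log_x)
    return logs


if __name__ == "__main__":
    n_steps, n_runs = 200, 2000
    mu, sigma2 = log_moments_uniform(-0.1, 0.3)
    logs = simulate_log_x(n_steps, n_runs)
    print("sample mean of log X_n:", statistics.mean(logs), "n*mu:", n_steps * mu)
    print("sample var  of log X_n:", statistics.variance(logs), "n*sigma^2:", n_steps * sigma2)
```

Working with $\log X_n$ rather than $X_n$ avoids overflow for large $n$ and makes the comparison with the Gaussian parameters $n\mu$ and $n\sigma^2$ direct.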
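The differential equation for $n(t)$ states that the growth (the change in the population) is proportional to the population $n(t)$, with a time-dependent proportionality factor $r(t)$. The general solution is, for $t > a$,

$$\log n(t) = \log n(a) + \int_a^t r(u)\, du.$$

When we additionally assume that $r(t)$ changes at some times $a = t_0 < t_1 < t_2 < \ldots < t_m = t$, where the $t_j$ are random time moments, then

$$\int_a^t r(u)\, du = \sum_{j=1}^{m} \int_{t_{j-1}}^{t_j} r(u)\, du = \sum_{j=1}^{m} R_j$$

where, by the mean value theorem,

$$R_j = \int_{t_{j-1}}^{t_j} r(u)\, du = r(\xi_j)(t_j - t_{j-1}) \quad \text{with} \quad \xi_j \in [t_{j-1}, t_j].$$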

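Each $R_j$ is a random variable with mean $\mu_j = E[r(\xi_j)(t_j - t_{j-1})]$. Assuming that the Central Limit Theorem can be applied, the sum of the random variables $\{R_j\}_{1 \le j \le m}$ tends to a Gaussian $N(m\mu, m\sigma^2)$, and the lognormal distribution of $n(t)$ for large $t$ then follows in the usual way. Again, the mean is linear in the time $t$ because $t/m = E[\Delta t]$, the average time-spacing.

The linear correlation coefficient referred to in footnote 3 is

$$\rho = \frac{\{\Pr[\text{user } k \text{ diggs on } s \mid \text{user } l \text{ diggs on } s] - \Pr[\text{user } k \text{ diggs on } s]\}\, \Pr[\text{user } l \text{ diggs on } s]}{\sqrt{\Pr[\text{user } k \text{ diggs on } s]\,(1 - \Pr[\text{user } k \text{ diggs on } s])\, \Pr[\text{user } l \text{ diggs on } s]\,(1 - \Pr[\text{user } l \text{ diggs on } s])}}.$$

Appendix B: The variance of $X_s(j)$

In order to compute the variance $\mathrm{Var}[X_s(j)] = E[X_s^2(j)] - (E[X_s(j)])^2$, we first rewrite $X_s^2(j)$ using the definition (3) as

$$X_s^2(j) = \sum_{k=1}^{j-1} \sum_{l=1}^{j-1} \mathbf{1}_{\{\text{user } k \text{ diggs on } s\}}\, \mathbf{1}_{\{\text{user } l \text{ diggs on } s\}}$$

and, since the square of an indicator equals the indicator itself,

$$X_s^2(j) = \sum_{k=1}^{j-1} \mathbf{1}_{\{\text{user } k \text{ diggs on } s\}} + 2 \sum_{k=1}^{j-1} \sum_{l=1}^{k-1} \mathbf{1}_{\{\text{user } k \text{ diggs on } s\}}\, \mathbf{1}_{\{\text{user } l \text{ diggs on } s\}} = X_s(j) + 2 \sum_{k=1}^{j-1} \sum_{l=1}^{k-1} \mathbf{1}_{\{(\text{user } k \text{ diggs on } s) \cap (\text{user } l \text{ diggs on } s)\}}.$$

Taking the expectation gives

$$E[X_s^2(j)] = E[X_s(j)] + 2 \sum_{k=1}^{j-1} \sum_{l=1}^{k-1} \Pr[(\text{user } k \text{ diggs on } s) \cap (\text{user } l \text{ diggs on } s)]$$

where user $l$ precedes user $k$. Invoking the conditional probability finally results in

$$E[X_s^2(j)] = E[X_s(j)] + 2 \sum_{k=1}^{j-1} \sum_{l=1}^{k-1} \Pr[\text{user } k \text{ diggs on } s \mid \text{user } l \text{ diggs on } s]\, \Pr[\text{user } l \text{ diggs on } s]. \qquad \text{(B.1)}$$

In general, we can upper bound $E[X_s^2(j)]$ as

$$E[X_s^2(j)] \le E[X_s(j)] + 2 \sum_{k=1}^{j-1} \sum_{l=1}^{k-1} \Pr[\text{user } l \text{ diggs on } s] = E[X_s(j)] + 2 \sum_{k=1}^{j-1} E[X_s(k)] \le (j-1)^2$$

from which we obtain an upper bound of the variance

$$\mathrm{Var}[X_s(j)] \le E[X_s(j)] + 2 \sum_{k=1}^{j-1} E[X_s(k)] - (E[X_s(j)])^2.$$

If users were to digg independently, so that

$$\Pr[\text{user } k \text{ diggs on } s \mid \text{user } l \text{ diggs on } s] = \Pr[\text{user } k \text{ diggs on } s],$$

then (B.1) would reduce to

$$E[X_s^2(j)] = E[X_s(j)] + 2 \sum_{k=1}^{j-1} \Pr[\text{user } k \text{ diggs on } s]\, E[X_s(k)].$$

In Digg, users are positively correlated (see footnote 3) such that

$$\Pr[\text{user } k \text{ diggs on } s \mid \text{user } l \text{ diggs on } s] \ge \Pr[\text{user } k \text{ diggs on } s]$$

which leads to a higher second moment $E[X_s^2(j)]$ in (B.1) than for independent users.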
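The bounds of this appendix can be checked on simulated data. The minimal Python sketch below generates sequential digg decisions under a hypothetical reinforcement rule, in which the probability that user $k$ diggs grows linearly with the number of earlier diggs, and then evaluates the empirical variance of $X_s(j)$ against the two upper bounds. The rule, its parameters `p0` and `a`, and all function names are assumptions introduced for this illustration; they are not the model or the estimates of the paper.

```python
import random


def simulate_digg_counts(j, n_runs, p0=0.05, a=0.02, seed=0):
    """Return, per run, the list [X_s(1), ..., X_s(j)] of digg counts seen by users 1..j."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        counts = [0]  # X_s(1) = 0: nobody has dugg before user 1
        for _k in range(1, j):  # users k = 1, ..., j-1 decide in turn
            # Hypothetical reinforcement rule (an assumption of this sketch)
            prob = min(1.0, p0 + a * counts[-1])
            counts.append(counts[-1] + (1 if rng.random() < prob else 0))
        runs.append(counts)
    return runs


def variance_and_bounds(runs):
    """Empirical Var[X_s(j)] and the two upper bounds derived in Appendix B."""
    n_runs, j = len(runs), len(runs[0])
    mean = sum(r[-1] for r in runs) / n_runs
    second_moment = sum(r[-1] ** 2 for r in runs) / n_runs
    variance = second_moment - mean ** 2
    # Sum over k = 1..j-1 of the empirical E[X_s(k)] (stored at list index k-1)
    sum_of_means = sum(sum(r[k] for r in runs) for k in range(j - 1)) / n_runs
    general_bound = mean + 2.0 * sum_of_means - mean ** 2
    trivial_bound = (j - 1) ** 2 - mean ** 2
    return variance, general_bound, trivial_bound


def independent_variance(runs):
    """Variance of X_s(j) if users dugg independently with the same marginals."""
    n_runs, j = len(runs), len(runs[0])
    total = 0.0
    for k in range(1, j):
        p_k = sum(r[k] - r[k - 1] for r in runs) / n_runs  # empirical Pr[user k diggs on s]
        total += p_k * (1.0 - p_k)
    return total


if __name__ == "__main__":
    runs = simulate_digg_counts(j=50, n_runs=5000)
    variance, general_bound, trivial_bound = variance_and_bounds(runs)
    print("Var[X_s(j)]:", variance)
    print("general bound:", general_bound, "trivial bound:", trivial_bound)
    print("variance for independent users:", independent_variance(runs))
```

Because $X_s^2(j) = X_s(j) + 2 \sum_k \mathbf{1}_{\{\text{user } k \text{ diggs on } s\}} X_s(k)$ holds for every sample path, the empirical variance respects both bounds exactly, up to floating-point rounding, for any choice of the hypothetical parameters.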
However, the mean $E[X_s(j)] \le j - 1$ in Digg is also larger. Since $\mathrm{Var}[X_s(j)] \le (j-1)^2 - (E[X_s(j)])^2$, a larger mean lowers this upper bound on the variance quadratically, which may lead to a relatively small variance. In other words, the stronger the proportionate effect (measured via the proportionality factor in (12)), the smaller the variance $\mathrm{Var}[X_s(j)]$ and the better the mean $E[X_s(j)]$ approximates the random variable $X_s(j)$.

Footnote 3: Applying the definition [14], p. 30 of the linear correlation coefficient $\rho$ yields the equation for $\rho$ given above.

References

1. Rob McGann, Internet Edges Out Family Time More Than TV Time (ClickZ.com, 2005)
2. C. Nuttall and D. Gelles, Facebook becomes bigger hit than Google (Financial Times, 2010)
3. G. Szabo, B.A. Huberman, Commun. ACM 53, 80 (2010)
4. C. Castellano, S. Fortunato, V. Loreto, Rev. Mod. Phys. 81, 591 (2009)
5. K. Lerman, A. Galstyan, Proceedings of the First Workshop on online Social Networks (WOSP 08) (ACM, New York, 2008), pp. 7-12
6. S. Tang, N. Blenn, C. Doerr, P. Van Mieghem, to appear in IEEE Transactions on Multimedia
7. E.L. Crow, K. Shimizu, Lognormal distributions, Theory and Applications (Marcel Dekker, Inc., New York, 1988)
8. E. Limpert, W.A. Stahel, M. Abbt, Bioscience 51, 341 (2001)
9. W. Shockley, Proceedings of the IRE 45, 279 (1957)
10. F. Radicchi, S. Fortunato, C. Castellano, Proc. Natl. Acad. Sci. USA 105, 17268 (2008)
11. F. Wu, B.A. Huberman, Proc. Natl. Acad. Sci. USA 104, 17599 (2007)
12. C. Doerr, S. Tang, N. Blenn, P. Van Mieghem, Are Friends Overrated? A Study of the Social News Aggregator Digg.com (IFIP Networking, 2011)
13. R.K. Pan, S. Sinha, New J. Phys. 12, 115004 (2010)
14. P. Van Mieghem, Performance Analysis of Communications Systems and Networks (Cambridge University Press, Cambridge, U.K., 2006)
15. Deese, Kaufman, J. Exp. Psychol. 54, 180 (1957)
16. W. Feller, An Introduction to Probability Theory and Its Applications, 2nd edn. (John Wiley & Sons, New York, 1971), Vol. 2
17. P. Embrechts, M. Maejima, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 68, 191 (1984)
18. M. Armatte, Mathématiques et sciences humaines 129, 5 (1995)
19. R. Gibrat, Bulletin de la Statistique Générale de la France 19, 469 (1930)