A Random Graph Model for Massive Graphs


William Aiello, AT&T Labs, Florham Park, New Jersey
Fan Chung, University of California, San Diego
Linyuan Lu, University of Pennsylvania, Philadelphia

ABSTRACT

We propose a random graph model which is a special case of sparse random graphs with given degree sequences. This model involves only a small number of parameters, called logsize and log-log growth rate. These parameters capture some universal characteristics of massive graphs. Furthermore, from these parameters, various properties of the graph can be derived. For example, for certain ranges of the parameters, we will compute the expected distribution of the sizes of the connected components which almost surely occur with high probability. We will illustrate the consistency of our model with the behavior of some massive graphs derived from data in telecommunications. We will also discuss the threshold function, the giant component, and the evolution of random graphs in this model.

1. INTRODUCTION

Is the World Wide Web completely connected? If not, how big is the largest component, the second largest component, etc.? Anyone who has surfed the Web for any length of time will undoubtedly come away feeling that if there are disconnected components at all, then they must be small and few in number. Is the Web too large, dynamic and structureless to answer these questions? Probably yes, if the answers for the sizes of the largest components are required to be exact. Recently, however, some structure of the Web has come to light which may enable us to describe graph properties of the Web qualitatively. Kumar et al. [11; 12] and Kleinberg et al. [10] have measured the degree sequences of the Web and shown that it is well approximated by a power law distribution. That is, the number of nodes, $y$, of a given degree $x$ is proportional to $x^{-\beta}$ for some constant $\beta > 0$. This was reported independently by Albert, Barabási and Jeong in [3; 5; 6].
The power law distribution of the degree sequence appears to be a very robust property of the Web despite its dynamic nature. In fact, the power law distribution of the degree sequence may be a ubiquitous characteristic, applying to many massive real world graphs. Indeed, Abello et al. [1] have shown that the degree sequence of so-called call graphs is nicely approximated by a power law distribution. Call graphs are graphs of calls handled by some subset of telephony carriers for a specific time period. In addition, Faloutsos et al. [9] have shown that the degree sequence of the Internet router graph also follows a power law.

Just as many other real world processes have been effectively modeled by appropriate random models, in this paper we propose a parsimonious random graph model for graphs with a power law degree sequence. We then derive connectivity results which hold with high probability in various regimes of our parameters. And finally, we compare the results from the model with the exact connectivity structure for some call graphs computed by Abello et al. [1].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. STOC 2000, Portland, Oregon. Copyright 2000 ACM.

1.1 Power-Law Random Graphs

The study of random graphs dates back to the work of Erdős and Rényi whose seminal papers [7; 8] laid the foundation for the theory of random graphs. There are three standard models for what we will call in this paper uniform random graphs [4]. Each has two parameters. One parameter controls the number of nodes in the graph and one controls the density, or number of edges.
For example, the random graph model $G(n, m)$ assigns uniform probability to all graphs with $n$ nodes and $m$ edges, while in the random graph model $G(n, p)$ each edge in an $n$-node graph is chosen with probability $p$. Our power law random graph model also has two parameters. The two parameters only roughly delineate the size and density, but they are natural and convenient for describing a power law degree sequence. The power law random graph model $P(\alpha, \beta)$ is described as follows. Let $y$ be the number of nodes with degree $x$. $P(\alpha, \beta)$ assigns uniform probability to all graphs with $y = e^{\alpha}/x^{\beta}$ (where self loops are allowed). Note that $\alpha$ is the intercept and $\beta$ is the (negative) slope when the degree sequence is plotted on a log-log scale. We remark that there is also an alternative power law random graph model analogous to the uniform graph model $G(n, p)$. Instead of having a fixed degree sequence, the random graph has an expected degree sequence distribution. The two models are basically asymptotically equivalent, subject to bounding error estimates of the variances (which will be further described in a subsequent paper).

1.2 Our Results

Just as for the uniform random graph model, where graph properties are studied for certain regimes of the density parameter and shown to hold with high probability asymptotically in the size parameter, in this paper we study the connectivity properties of $P(\alpha, \beta)$ as a function of the power $\beta$ which hold almost surely for sufficiently large graphs. Briefly, we show that when $\beta < 1$, the graph is almost surely connected. For $1 < \beta < 2$ there is a giant component, i.e., a component of size $\Theta(n)$. Moreover, all smaller components are of size $O(1)$. For $2 < \beta < \beta_0 = 3.47875\ldots$ there is a giant component and all smaller components are of size $O(\log n)$. For $\beta = 2$ the smaller components are of size $O(\log n/\log\log n)$. For $\beta > \beta_0$ the graph almost surely has no giant component. In addition we derive several results on the sizes of the second largest component. For example, we show that for $\beta > 4$ the numbers of components of given sizes can be approximated by a power law as well.

1.3 Previous Work

Strictly speaking our model is a special case of random graphs with a given degree sequence, for which there is a large literature. For example, Wormald [17] studied the connectivity of graphs whose degrees are in an interval $[r, R]$, where $r \ge 3$. Luczak [13] considered the asymptotic behavior of the largest component of a random graph with given degree sequence as a function of the number of vertices of degree 2. His result was further improved by Molloy and Reed [14; 15]. They consider a random graph on $n$ vertices with the following degree distribution. The fraction of vertices of degree $0, 1, 2, \ldots$ is asymptotically $\lambda_0, \lambda_1, \ldots$, respectively, where the $\lambda$'s sum to 1. It is shown in [14] that if $Q = \sum_i i(i-2)\lambda_i > 0$ (and the maximum degree is not too large), then such random graphs have a giant component with probability tending to 1 as $n$ goes to infinity, while if $Q < 0$ (and the maximum degree is not too large), then all components are small with probability tending to 1 as $n \to \infty$. They also examined the threshold behavior of such graphs. In this paper, we will apply these techniques to deal with the special case that applies to our model.
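The Molloy and Reed quantity $Q = \sum_i i(i-2)\lambda_i$ is easy to evaluate for a concrete degree distribution. The following is a minimal sketch, not code from the paper; the function name `molloy_reed_q` and the dictionary input format are assumptions made for illustration.

```python
def molloy_reed_q(degree_fractions):
    """Return Q = sum_i i*(i-2)*lambda_i for a distribution {degree i: fraction lambda_i}.

    Hypothetical helper for illustration; the fractions are expected to sum to 1.
    """
    total = sum(degree_fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("degree fractions must sum to 1")
    return sum(i * (i - 2) * lam for i, lam in degree_fractions.items())


# A 3-regular degree distribution has Q = 3 > 0 (giant component regime),
# while a distribution with only degree-1 vertices has Q = -1 < 0.
q_regular = molloy_reed_q({3: 1.0})
q_matching = molloy_reed_q({1: 1.0})
```

Vertices of degree 2 contribute nothing to $Q$, which is why the sign of $Q$ is decided by the balance between degree-1 vertices and vertices of degree 3 or more.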
Several other papers have taken a different approach to modeling power law graphs than the one taken here [2; 5; 6; 10; 12]. The essential idea of these papers is to define a random process for growing a graph by adding nodes and edges. The intent is to show that the defined processes asymptotically yield graphs with a power law degree sequence with very high probability. While this approach is interesting and important, it has several difficulties. First, the models are difficult to analyze rigorously since the transition probabilities are themselves dependent on the current state. For example, [5; 6] implicitly assume that the probability that a node has a given degree is a continuous function. The authors of [10; 12] will offer an improved analysis in an upcoming paper [16]. In [2] we derive a power law degree sequence for several graph evolution models for asymptotically large graphs by explicitly solving the recurrence relations given by the random evolution process for the expected degree sequence and showing tight concentration around the mean using Azuma's inequality for martingales. We also derive results for the distribution of connected component sizes, but not for the entire range of powers given in this paper. Second, while the models may generate graphs with power law degree sequences, it remains to be seen if they generate graphs which duplicate other structural properties of the Web, the Internet, and call graphs. For example, the model in [5; 6] cannot generate graphs with a power law other than $c/x^3$. Moreover, all the graphs can be decomposed into $m$ disjoint trees, where $m$ is a parameter of the model. The $(\alpha, \beta)$ model in [2] is able to generate graphs for which the power law for the indegree is different than the power law for the outdegree, as is the case for the Web. However, to do so, the model requires that there be a constant fraction of nodes that have only indegree and no outdegree and vice versa.
While this may be appropriate for call graphs (e.g., customer service numbers), it remains to be seen whether it models the Web. Thus, while the random graph generation approach holds the promise of accurately predicting a wide variety of structural properties of many real world massive graphs, much work remains to be done.

In this paper we take a different approach. We do not attempt to answer how a graph comes to have a power law degree sequence. Rather, we take that as a given. In our model, all graphs with a given power law degree sequence are equi-probable. The goal is to derive structural properties which hold with probability asymptotically approaching 1. Such an approach, while potentially less accurate than the detailed modeling approach above, has the advantage of being robust: the structural properties derived in this model will be true for the vast majority of graphs with the given degree sequence. Thus, we believe that this model will be an important complement to random graph generation models.

The power law random graph model will be described in detail in the next section. In Sections 3 and 4, our results on connectivity will be derived. In Section 5, we discuss the sizes of the second largest components. In Section 6, we compare the results of our model to exact connectivity data for call graphs.

2. A RANDOM GRAPH MODEL

We consider a random graph with the following degree distribution depending on two given values $\alpha$ and $\beta$. Suppose there are $y$ vertices of degree $x > 0$, where $x$ and $y$ satisfy

$$\log y = \alpha - \beta \log x.$$

In other words, we have

$$|\{v \mid \deg(v) = x\}| = y = \frac{e^{\alpha}}{x^{\beta}}.$$

Basically, $\alpha$ is the logarithm of the number of nodes of degree 1 and $\beta$ is the log-log rate of decrease of the number of nodes of a given degree. We note that the number of vertices of a given degree should be an integer. To be precise, the above expression for $y$ should be rounded down to $\lfloor e^{\alpha}/x^{\beta} \rfloor$. If we use real numbers instead of rounding down to integers, it may cause some error terms in further computation. However, we will see that the error terms can be easily bounded.
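To make the rounding concrete, the sketch below tabulates the integer degree counts $\lfloor e^{\alpha}/x^{\beta} \rfloor$ and measures how far their total is from the real-valued sum. It is an illustrative sketch only; the helper names `degree_counts` and `rounding_error` and the sample parameters are assumptions, not part of the paper.

```python
import math


def degree_counts(alpha, beta):
    """Return {x: floor(e^alpha / x^beta)} for every degree x with a nonzero count."""
    # Degrees beyond e^(alpha/beta) would have fewer than one vertex.
    max_degree = int(math.floor(math.exp(alpha / beta) + 1e-9))
    counts = {}
    for x in range(1, max_degree + 1):
        y = int(math.floor(math.exp(alpha) / x ** beta + 1e-9))
        if y > 0:
            counts[x] = y
    return counts


def rounding_error(alpha, beta):
    """Real-valued vertex total minus the integer total over the same degrees."""
    counts = degree_counts(alpha, beta)
    real_total = sum(math.exp(alpha) / x ** beta for x in counts)
    return real_total - sum(counts.values())


# Hypothetical sample parameters: alpha = 10, beta = 2.5.
counts = degree_counts(10.0, 2.5)
error = rounding_error(10.0, 2.5)
```

Each degree contributes a rounding loss below 1, so the total loss is below the number of distinct degrees, which is small compared with the number of vertices when $\beta > 1$.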
For simplicity and convenience, we will use real numbers with the understanding that the actual numbers are their integer parts. Another constraint is that the sum of the degrees should be even. This can be assured by adding a vertex of degree 1 if the sum is odd. Furthermore, for simplicity, we here assume that there are no isolated vertices. (There are several ways to deal with nodes with zero degree; for simplicity, here we simply exclude such isolated nodes from the graph.) We can deduce the following facts for our graph:

(1) The maximum degree of the graph is $e^{\alpha/\beta}$. Note that $0 \le \log y = \alpha - \beta \log x$.

(2) The number of vertices $n$ can be computed as follows: by summing $y(x)$ for $x$ from 1 to $e^{\alpha/\beta}$, we have

$$n = \sum_{x=1}^{e^{\alpha/\beta}} \frac{e^{\alpha}}{x^{\beta}} \approx \begin{cases} \zeta(\beta)\, e^{\alpha} & \text{if } \beta > 1 \\ \alpha\, e^{\alpha} & \text{if } \beta = 1 \\ \dfrac{e^{\alpha/\beta}}{1-\beta} & \text{if } 0 < \beta < 1 \end{cases}$$

where $\zeta(t) = \sum_{n=1}^{\infty} \frac{1}{n^{t}}$ is the Riemann Zeta function.

(3) The number of edges $E$ can be computed as follows:

$$E = \frac{1}{2} \sum_{x=1}^{e^{\alpha/\beta}} x\, \frac{e^{\alpha}}{x^{\beta}} \approx \begin{cases} \frac{1}{2}\zeta(\beta-1)\, e^{\alpha} & \text{if } \beta > 2 \\ \frac{1}{4}\alpha\, e^{\alpha} & \text{if } \beta = 2 \\ \dfrac{1}{2}\,\dfrac{e^{2\alpha/\beta}}{2-\beta} & \text{if } 0 < \beta < 2 \end{cases}$$

(4) The differences between the real numbers in (1)-(3) and their integer parts can be estimated as follows: For the number $n$ of vertices, the error term is at most $e^{\alpha/\beta}$. For $\beta \ge 1$, it is $o(n)$, which is a lower order term. For $0 < \beta < 1$, the error term for $n$ is relatively large. In this case, we have

$$n \ge \frac{e^{\alpha/\beta}}{1-\beta} - e^{\alpha/\beta} = \frac{\beta\, e^{\alpha/\beta}}{1-\beta}.$$

Therefore, $n$ has the same magnitude as $\frac{e^{\alpha/\beta}}{1-\beta}$. The number of edges $E$ can be treated in a similar way. For $\beta \ge 2$, the error term of $E$ is $o(E)$, a lower order term. For $0 < \beta < 2$, $E$ has the same magnitude as in the formula of item (3). In this paper, we mainly deal with the case $\beta > 2$. The only place that we deal with the case $0 < \beta < 2$ is in the next section, where we refer to $2 - \beta$ as a constant. By using real numbers instead of rounding down to their integer parts, we simplify the arguments without affecting the conclusions.

In order to consider the random graph model, we will need to consider large $n$. We say that some property almost surely (a.s.) happens if the probability that the property holds tends to 1 as the number $n$ of vertices goes to infinity. Thus we consider $\alpha$ to be large while $\beta$ is fixed.

We use the following random graph model for a given degree sequence:

The model:
1. Form a set $L$ containing $\deg(v)$ distinct copies of each vertex $v$.
2. Choose a random matching of the elements of $L$.
3. For two vertices $u$ and $v$, the number of edges joining $u$ and $v$ is equal to the number of edges in the matching of $L$ joining copies of $u$ to copies of $v$.

We remark that the graphs that we are considering are in fact multi-graphs, possibly with loops. This model is a natural extension of the model for $k$-regular graphs, which is formed by combining $k$ random matchings.
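The three steps of the model can be sketched directly: list every vertex once per unit of degree, shuffle the list, and pair consecutive entries. This is a minimal illustration under stated assumptions, not the authors' code; the function name `random_matching_multigraph`, the dictionary input, and the `seed` parameter are hypothetical.

```python
import random


def random_matching_multigraph(degrees, seed=None):
    """Sample a multigraph with the given degrees via a random matching of vertex copies.

    degrees: dict mapping vertex -> degree; the degree sum must be even.
    Returns a list of edges (u, v); loops and parallel edges may occur.
    """
    # Step 1: the set L contains deg(v) copies of each vertex v.
    copies = [v for v, d in degrees.items() for _ in range(d)]
    if len(copies) % 2 != 0:
        raise ValueError("the sum of the degrees must be even")
    # Step 2: shuffling and pairing consecutive entries gives a uniform random matching.
    rng = random.Random(seed)
    rng.shuffle(copies)
    # Step 3: each matched pair of copies becomes one edge between the original vertices.
    return [(copies[k], copies[k + 1]) for k in range(0, len(copies), 2)]


# Hypothetical small degree sequence with even sum 8.
edges = random_matching_multigraph({"a": 3, "b": 2, "c": 2, "d": 1}, seed=0)
```

Whatever the seed, the result has exactly half as many edges as the degree sum, and every vertex appears in the edge list as many times as its degree (a loop counts twice).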
For references and undefined terminology, the reader is referred to [4; 18]. We note that this random graph model is slightly different from the uniform selection model $P(\alpha, \beta)$ as described in Section 1.1. However, by using techniques in Lemma 1 of [15], it can be shown that if a random graph with a given degree sequence a.s. has property $P$ under one of these two models, then it a.s. has property $P$ under the other model, provided some general conditions are satisfied.

3. THE CONNECTED COMPONENTS

Molloy and Reed [14] showed that for a random graph with $(\lambda_i + o(1))n$ vertices of degree $i$, where the $\lambda_i$ are nonnegative values which sum to 1, the giant component emerges a.s. when $Q = \sum_{i \ge 1} i(i-2)\lambda_i > 0$, provided that the maximum degree is less than $n^{1/4-\epsilon}$. They also show that almost surely there is no giant component when $Q = \sum_{i \ge 1} i(i-2)\lambda_i < 0$ and the maximum degree is less than $n^{1/8-\epsilon}$.

Here we compute $Q$ for our $(\alpha, \beta)$-graphs:

$$Q = \sum_{x=1}^{e^{\alpha/\beta}} x(x-2) \left\lfloor \frac{e^{\alpha}}{x^{\beta}} \right\rfloor \approx \sum_{x=1}^{e^{\alpha/\beta}} \frac{e^{\alpha}}{x^{\beta-2}} - 2 \sum_{x=1}^{e^{\alpha/\beta}} \frac{e^{\alpha}}{x^{\beta-1}} \approx \left(\zeta(\beta-2) - 2\zeta(\beta-1)\right) e^{\alpha} \quad \text{if } \beta > 3.$$

Hence, we consider the value $\beta_0 = 3.47875\ldots$, which is a solution to

$$\zeta(\beta-2) - 2\zeta(\beta-1) = 0.$$

If $\beta > \beta_0$, we have

$$\sum_{x=1}^{e^{\alpha/\beta}} x(x-2) \left\lfloor \frac{e^{\alpha}}{x^{\beta}} \right\rfloor < 0.$$

We first summarize the results here:

1. When $\beta > \beta_0 = 3.47875\ldots$, the random graph a.s. has no giant component. When $\beta < \beta_0$, there is a.s. a unique giant component.
2. When $2 < \beta < \beta_0$, the second largest components are a.s. of size $\Theta(\log n)$. For any $2 \le x < \Theta(\log n)$, there is almost surely a component of size $x$.
3. When $\beta = 2$, a.s. the second largest components are of size $\Theta(\frac{\log n}{\log\log n})$. For any $2 \le x < \Theta(\frac{\log n}{\log\log n})$, there is almost surely a component of size $x$.
4. When $1 < \beta < 2$, the second largest components are a.s. of size $\Theta(1)$. The graph is a.s. not connected.
5. When $0 < \beta < 1$, the graph is a.s. connected.
6. The case $\beta = \beta_0$ is very complicated. It corresponds to the double jump of the random graph $G(n, p)$ with $p = \frac{1}{n}$. For $\beta = 1$, there is a nontrivial probability for either case, that the graph is connected or disconnected.
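The threshold $\beta_0$ is defined as the root of $\zeta(\beta-2) - 2\zeta(\beta-1) = 0$, so it can be checked numerically. The sketch below is an illustration, not the authors' computation: it approximates $\zeta$ with a partial sum plus Euler-Maclaurin tail terms and locates the root by bisection. The function names and the bracketing interval $[3.1, 4.0]$ are assumptions.

```python
def zeta(s, terms=1000):
    """Approximate the Riemann zeta function for real s > 1.

    Uses a partial sum plus the first Euler-Maclaurin tail corrections.
    """
    if s <= 1:
        raise ValueError("this sketch only handles s > 1")
    n = terms
    partial = sum(k ** -s for k in range(1, n))
    tail = n ** (1 - s) / (s - 1) + 0.5 * n ** -s + s * n ** (-s - 1) / 12.0
    return partial + tail


def q_limit(beta):
    """The coefficient zeta(beta-2) - 2*zeta(beta-1) that decides the sign of Q."""
    return zeta(beta - 2) - 2 * zeta(beta - 1)


def find_beta0(lo=3.1, hi=4.0, tol=1e-10):
    """Bisection for the root of q_limit on [lo, hi], where it changes sign."""
    if not (q_limit(lo) > 0 > q_limit(hi)):
        raise ValueError("q_limit must be positive at lo and negative at hi")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if q_limit(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


beta0 = find_beta0()
```

Bisection is used here only because the sign change on the bracket is easy to verify; any standard root finder would serve equally well.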
Before proceeding to state the main theorems, here are some general discussions:

For $\beta > 8$, Molloy and Reed's result immediately implies that almost surely there is no giant component. When $\beta \le 8$, additional analysis is needed to deal with the degree constraints. We will prove in Theorem 2 that almost surely there is no giant component when $\beta > \beta_0$. Also, almost surely there is a unique giant component when $\beta < \beta_0$ (the proof will be given in the full paper).

For $2 < \beta < \beta_0$, we will consider the sizes of the second largest component in Section 5. It can be shown that the second largest component almost surely has size $O(\log n)$.

In the other direction, we will show that the second largest component has size at least $\Theta(\log n)$.

For $0 < \beta < 2$, the graph has $\Theta(e^{2\alpha/\beta})$ edges. We expect that the giant component is very large. For some constants $T$ and $C$, a.s. every vertex of degree greater than $T \log n \le C\alpha$ belongs to the giant component. That is, the number of edges which do not belong to the giant component is quite small. It is at most

$$\sum_{x=1}^{C\alpha} x\, \frac{e^{\alpha}}{x^{\beta}} \approx \Theta\!\left((C\alpha)^{2-\beta} e^{\alpha}\right) \le O\!\left(E^{\beta/2} \log^{2-\beta} E\right).$$

Now we consider the second largest component. For any pair $(u, v)$, the probability that $u$ belongs to the giant component while $v$ belongs to another component of size greater than $M = O(1)$ is at most

$$\left(E^{\beta/2 - 1} \log^{2-\beta} E\right)^{M} = o(n^{-2})$$

for some large constant $M$, which only depends on $\beta$. This implies that all components except for the giant component a.s. have size at most $M$. Therefore, a.s. the second largest component has size $O(1)$.

For $1 < \beta < 2$, fix a vertex $v$ of degree 1. The probability that the other vertex that connects to $v$ is also of degree 1 is about $\Theta\!\left(\frac{e^{\alpha}}{e^{2\alpha/\beta}}\right)$. Therefore the probability that no component has size 2 is at most

$$\left(1 - \Theta\!\left(\frac{e^{\alpha}}{e^{2\alpha/\beta}}\right)\right)^{e^{\alpha}} \approx e^{-\Theta\left(e^{2\alpha - 2\alpha/\beta}\right)} = o(1).$$

In other words, the graph a.s. has at least one component of size 2.

For $0 < \beta < 1$, the random graph is a.s. connected. Here we sketch the ideas. Since the size of the possible second largest component is bounded by a constant $M$, all vertices of degree at least $M$ are almost surely in the giant component. We only need to show that the probability that there is an edge connecting two small degree vertices is small. There are only

$$\sum_{x=1}^{M} \frac{e^{\alpha}}{x^{\beta}} \le C e^{\alpha}$$

vertices with degree less than $M$. For any random pair of vertices $(u, v)$, the probability that there is an edge connecting them is about $\frac{1}{E} = \Theta(e^{-2\alpha/\beta})$. Hence the probability that there is an edge connecting two small degree vertices is at most

$$\sum_{u,v} \frac{1}{E} = (C e^{\alpha})^{2}\, \Theta(e^{-2\alpha/\beta}) = o(1).$$

Hence, every vertex is a.s. connected to a vertex with degree at least $M$, which a.s. belongs to the giant component. Hence, the random graph is a.s. connected.

The case of $\beta = 2$ is quite interesting. In this case, the graph has $\frac{1}{4}\alpha e^{\alpha}$ edges. A.s. all other components except for the giant one have size at most $O(\log n) = M \log E$ for some constant $M$. Hence, a.s. all vertices with degree at least $M \log E$ are in the giant component. Hence, the giant component is so large that only a small portion of the edges (as bounded below) is not in it:

$$\sum_{x=1}^{M \log E} x\, \frac{e^{\alpha}}{x^{2}} \approx (\log \alpha)\, e^{\alpha}.$$

For any pair of vertices $(u, v)$, the probability that $u$ is in the giant component while $v$ is in another component of size at least $2.1 \frac{\alpha}{\log \alpha}$ is at most

$$\left(\frac{\log \alpha}{\alpha}\right)^{2.1 \frac{\alpha}{\log \alpha}} = e^{2.1 \frac{\alpha \log\log \alpha}{\log \alpha} - 2.1 \alpha} = o(n^{-2}).$$

Again, this almost surely does not happen. Hence, we have proved that the size of the second largest components is at most $2.1 \frac{\alpha}{\log \alpha}$.

Now we find a vertex $v$ of degree $x = 0.9 \frac{\alpha}{\log \alpha}$. The probability that all its neighbors are of degree 1 is $\left(\frac{1}{\alpha}\right)^{x}$. The probability that no such vertex exists is at most

$$\left(1 - \left(\tfrac{1}{\alpha}\right)^{x}\right)^{e^{\alpha}/x^{2}} \approx e^{-\left(\frac{1}{\alpha}\right)^{x} \frac{e^{\alpha}}{x^{2}}} = e^{-\frac{e^{0.1\alpha}}{x^{2}}} = o(1).$$

Hence, a.s. there is a vertex of degree $0.9 \frac{\alpha}{\log \alpha}$ which forms a connected component of size $x + 1$. Again, when $x$ is smaller, almost surely there is a component of size $x$.

4. THE SIZES OF CONNECTED COMPONENTS IN CERTAIN RANGES FOR $\beta$

For $\beta > \beta_0 = 3.47875\ldots$, almost surely there is no giant component. This range is of special interest since it is quite useful later for describing the distribution of small components. We will prove the following:

Theorem 1. For $(\alpha, \beta)$-graphs with $\beta > 4$, the distribution of the number of connected components is as follows:

1. For each vertex $v$ of degree $d = \Omega(1)$, let $\tau$ be the size of the connected component containing $v$. Then

$$\Pr\left( \left| \tau - \frac{d}{c_1} \right| > \frac{2\lambda}{c_1} \sqrt{\frac{d\, c_2}{c_1}} \right) \le \frac{2}{\lambda^{2}}$$

where $c_1 = 2 - \frac{\zeta(\beta-2)}{\zeta(\beta-1)}$ and $c_2 = \frac{\zeta(\beta-3)}{\zeta(\beta-1)} - \left(\frac{\zeta(\beta-2)}{\zeta(\beta-1)}\right)^{2}$ are two constants. In other words, for $d$ a (slowly) increasing function and $\lambda = d^{\epsilon}$, for some arbitrarily small positive constant $\epsilon$, the vertex $v$ a.s. belongs to a connected component of size $\frac{d}{c_1} + O(d^{\frac{1}{2}+\epsilon})$.

2. The number of connected components of size $x$ is a.s. at least

$$(1 + o(1))\, \frac{e^{\alpha}}{c_1^{\beta} x^{\beta}}$$

and at most

$$c_3\, \frac{e^{\alpha} \log^{\frac{\beta}{2}-1} n}{x^{\frac{\beta}{2}+1}},$$

where $c_3 = \frac{4^{\beta+1} c_2}{(\beta-2)\, c_1^{\beta+1}}$ is a constant only depending on $\beta$.

3. A connected component of the $(\alpha, \beta)$-graph a.s. has size at most

$$e^{\frac{2\alpha}{\beta+2}}\, \alpha = \Theta\!\left(n^{\frac{2}{\beta+2}} \log n\right).$$

In our proof we use the second moment, whose convergence depends on $\beta > 4$. In fact for $\beta \le 4$ the second moment diverges as the size of the graph goes to infinity, so that our method no longer applies. Theorem 1 strengthens the following result (which can be derived from Lemma 3 in [14]) for the range $\beta > 4$.

Theorem 2. For $\beta > \beta_0 = 3.47875\ldots$, a connected component of the $(\alpha, \beta)$-graph a.s. has size at most

$$C e^{\frac{2\alpha}{\beta}}\, \alpha = \Theta\!\left(n^{\frac{2}{\beta}} \log n\right)$$

where $C = \frac{16}{c_1^{2}}$ (with $c_1$ as in Theorem 1) is a constant only depending on $\beta$.

The proof of Theorem 2 uses the branching process method. We here briefly describe the proof since it is needed for the proof of Theorem 1. We start by exposing any vertex $v_0$ in our graph, then we expose its neighbors, and then the neighbors of its neighbors, repeating until the entire component is exposed. At any stage of the process the entire component will have some nodes which are marked live, some which are marked dead, and some which are not marked at all. At stage $i$, we choose an arbitrary live vertex $v$ to expose. Then we mark $v$ dead and, for each neighbor $u$ of $v$, we mark $u$ live if $u$ is unmarked so far. Let $L_i$ be the set of marked vertices at stage $i$ and $X_i$ be the random variable that denotes the number of vertices in $L_i$. We note that all vertices in $L_i$ are marked either live or dead. Let $O_i$ be the set of live vertices and $Y_i$ be the random variable that is the number of vertices in $O_i$. At each step we mark exactly one vertex dead, so the total number of dead vertices at the $i$-th step is $i$. We have $X_i = Y_i + i$. Initially we assign $L_0 = O_0 = \{v_0\}$. Then at stage $i \ge 1$, we do the following:

1. If $Y_{i-1} = 0$, then we stop and output $X_{i-1}$.
2. Otherwise, randomly choose a live vertex $u$ from $O_{i-1}$ and expose its neighbors in $N_u$. Then mark $u$ dead and mark each vertex live if it is in $N_u$ but not in $L_{i-1}$. We have $L_i = L_{i-1} \cup N_u$, and $O_i = (O_{i-1} \setminus \{u\}) \cup (N_u \setminus L_{i-1})$.

Suppose that $v_0$ has degree $d$. Then $X_1 = d + 1$, and $Y_1 = d$. Eventually $Y_i$ will hit 0 if $i$ is large enough.
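The exposure process above can be run on any explicit graph to compute the size of a component. The following is a minimal sketch assuming the graph is given as an adjacency list; the name `explore_component` is hypothetical, and the live vertex is taken from the end of a list (an arbitrary choice) rather than drawn at random.

```python
def explore_component(adj, v0):
    """Expose the component of v0 with live/dead marking and return its size.

    adj: dict mapping each vertex to a list of neighbors (repeats and loops allowed).
    """
    marked = {v0}      # the set of marked (live or dead) vertices
    live = [v0]        # the set of live vertices
    dead_count = 0     # number of stages completed, one dead vertex per stage
    while live:        # stop as soon as no live vertex remains
        u = live.pop()             # choose a live vertex and expose it
        dead_count += 1            # u is now dead
        for w in adj[u]:
            if w not in marked:    # neighbors already marked are back edges
                marked.add(w)
                live.append(w)
        # Invariant from the text: marked count = live count + stage number.
        assert len(marked) == len(live) + dead_count
    return len(marked)


# Hypothetical multigraph: component {0, 1, 2} with a double loop at 2,
# component {3, 4}, and an isolated vertex 5.
adj = {0: [1, 2], 1: [0], 2: [0, 2, 2], 3: [4], 4: [3], 5: []}
size_of_zero = explore_component(adj, 0)
```

When the loop ends there are no live vertices, so the number of stages equals the number of marked vertices, which is the component size.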
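Let $\tau$ denote the stopping time of $Y$, namely, $Y_{\tau} = 0$. Then $X_{\tau} = Y_{\tau} + \tau = \tau$ measures the size of the connected component. We first compute the expected value of $Y_i$ and then use Azuma's Inequality [4] to prove Theorem 2.

Suppose that the vertex $u$ is exposed at stage $i$. Then $N_u \cap L_{i-1}$ contains at least one vertex, which was exposed to reach $u$. However, $N_u \cap L_{i-1}$ may contain more than one vertex. We call them backedges. We note that backedges cause the exploration to stop more quickly, especially when the component is large. However, in our case $\beta > \beta_0 = 3.47875\ldots$, the contribution of backedges is quite small. We denote $Z_i = \#\{N_u\}$ and $W_i = \#\{N_u \cap L_{i-1}\} - 1$. $Z_i$ measures the degree of the vertex exposed at stage $i$, while $W_i$ measures the number of backedges. By definition, we have

$$Y_i - Y_{i-1} = Z_i - 2 - W_i.$$

We have

$$\mathrm{E}(Z_i) = \sum_{x=1}^{e^{\alpha/\beta}} x \cdot \frac{x\, e^{\alpha}/x^{\beta}}{2E} = \frac{\sum_{x=1}^{e^{\alpha/\beta}} x^{2-\beta}}{\sum_{x=1}^{e^{\alpha/\beta}} x^{1-\beta}} = \frac{\zeta(\beta-2) + O(n^{\frac{3}{\beta}-1})}{\zeta(\beta-1) + O(n^{\frac{2}{\beta}-1})} = \frac{\zeta(\beta-2)}{\zeta(\beta-1)} + O(n^{\frac{3}{\beta}-1}).$$

Now we will bound $W_i$. Suppose that there are $m$ edges exposed at stage $i-1$. Then the probability that a new neighbor is in $L_{i-1}$ is at most $\frac{m}{E}$. We have

$$\mathrm{E}(W_i) \le \sum_{x=1}^{\infty} x \left(\frac{m}{E}\right)^{x} = \frac{m/E}{\left(1 - m/E\right)^{2}} = \frac{m}{E} + O\!\left(\left(\frac{m}{E}\right)^{2}\right)$$

provided $\frac{m}{E} = o(1)$. When $i \le C e^{\frac{2\alpha}{\beta}} \alpha$, $m$ is at most $i\, e^{\frac{\alpha}{\beta}} \le C e^{\frac{3\alpha}{\beta}} \alpha$. Hence,

$$\frac{m}{E} = O(n^{\frac{3}{\beta}-1} \log n) = o(1).$$

We have

$$\mathrm{E}(Y_i) = \mathrm{E}(Y_1) + \sum_{j=2}^{i} \mathrm{E}(Y_j - Y_{j-1}) = d + \sum_{j=2}^{i} \mathrm{E}(Z_j - 2 - W_j) = d + (i-1)\left(\frac{\zeta(\beta-2)}{\zeta(\beta-1)} - 2\right) - i\, O(n^{\frac{3}{\beta}-1} \log n) = d - (i-1)c_1 + i\, o(1).$$

Proof of Theorem 2: Since $|Y_j - Y_{j-1}| \le e^{\frac{\alpha}{\beta}}$, by Azuma's martingale inequality, we have

$$\Pr\left(|Y_i - \mathrm{E}(Y_i)| > t\right) \le 2 \exp\left(-\frac{t^{2}}{2 i\, e^{2\alpha/\beta}}\right).$$

Take $i = \frac{16}{c_1^{2}} e^{\frac{2\alpha}{\beta}} \log n$ and $t = \frac{c_1}{2} i$. Since

$$\mathrm{E}(Y_i) + t = d - (i-1)c_1 + i\, o(1) + \frac{c_1}{2} i = -\frac{c_1}{2} i + d + c_1 + i\, o(1) < 0,$$

we have

$$\Pr\left(\tau > \frac{16}{c_1^{2}} e^{\frac{2\alpha}{\beta}} \log n\right) = \Pr(\tau > i) \le \Pr(Y_i \ge 0) \le \Pr\left(Y_i > \mathrm{E}(Y_i) + t\right) \le 2 \exp\left(-\frac{t^{2}}{2 i\, e^{2\alpha/\beta}}\right) = \frac{2}{n^{2}}.$$

Hence, the probability that there exists a vertex $v$ such that $v$ lies in a component of size greater than $\frac{16}{c_1^{2}} e^{\frac{2\alpha}{\beta}} \log n$ is at most

$$n \cdot \frac{2}{n^{2}} = \frac{2}{n} = o(1).$$

The proof of Theorem 1 uses the methodology above as a starting point while introducing the calculation of the variance of the above random variables.

Proof of Theorem 1: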

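We follow the notation and previous results of Section 4. Under the assumption $\beta > 4$, we consider the following:

$$\mathrm{Var}(Z_i) = \sum_{x=1}^{e^{\alpha/\beta}} x^{2} \cdot \frac{x\, e^{\alpha}/x^{\beta}}{2E} - \mathrm{E}(Z_i)^{2} = \frac{\sum_{x=1}^{e^{\alpha/\beta}} x^{3-\beta}}{\sum_{x=1}^{e^{\alpha/\beta}} x^{1-\beta}} - \mathrm{E}(Z_i)^{2}$$

$$= \frac{\zeta(\beta-3) + O(n^{\frac{4}{\beta}-1})}{\zeta(\beta-1) + O(n^{\frac{2}{\beta}-1})} - \left(\frac{\zeta(\beta-2)}{\zeta(\beta-1)} + O(n^{\frac{3}{\beta}-1})\right)^{2} = \frac{\zeta(\beta-3)}{\zeta(\beta-1)} - \left(\frac{\zeta(\beta-2)}{\zeta(\beta-1)}\right)^{2} + O(n^{\frac{4}{\beta}-1}) = c_2 + o(1),$$

since $\beta > 4$.

We need to compute the covariances. There are models for random graphs in which the edges are independently chosen. Then, $Z_i$ and $Z_j$ are independent. However, in the model based on random matchings, there is a small correlation. For example, $Z_i = x$ slightly affects the probability of $Z_j = y$. Namely, $Z_j = x$ has slightly less chance, while $Z_j = y \ne x$ has slightly more chance. Both differences can be bounded by

$$\frac{1}{2E-1} - \frac{1}{2E} \le \frac{1}{E^{2}}.$$

Hence

$$\mathrm{CoVar}(Z_i, Z_j) \le \mathrm{E}(Z_i)^{2}\, \frac{1}{E} = O\!\left(\frac{1}{n}\right) \quad \text{if } i \ne j.$$

Now we will bound $W_i$. Suppose that there are $m$ edges exposed at stage $i-1$. Then the probability that a new neighbor is in $L_{i-1}$ is at most $\frac{m}{E}$. We have

$$\mathrm{Var}(W_i) \le \sum_{x=1}^{\infty} x^{2} \left(\frac{m}{E}\right)^{x} - \mathrm{E}(W_i)^{2} = \frac{\frac{m}{E}\left(\frac{m}{E} + 1\right)}{\left(1 - \frac{m}{E}\right)^{3}} - O\!\left(\left(\frac{m}{E}\right)^{2}\right) = \frac{m}{E} + O\!\left(\left(\frac{m}{E}\right)^{2}\right),$$

$$\mathrm{CoVar}(W_i, W_j) \le \sqrt{\mathrm{Var}(W_i)\, \mathrm{Var}(W_j)} \le \frac{m}{E} + O\!\left(\left(\frac{m}{E}\right)^{2}\right),$$

$$\mathrm{CoVar}(Z_i, W_j) \le \sqrt{\mathrm{Var}(Z_i)\, \mathrm{Var}(W_j)} = O\!\left(\sqrt{\frac{m}{E}}\right).$$

When $i = O(e^{\frac{\alpha}{\beta}})$, $m \le i\, e^{\frac{\alpha}{\beta}} = O(e^{\frac{2\alpha}{\beta}})$, we have

$$\mathrm{E}(Y_i) = d + (i-1)\left(\frac{\zeta(\beta-2)}{\zeta(\beta-1)} - 2\right) + i\, O(n^{\frac{3}{\beta}-1}) + i\, O\!\left(\frac{m}{E}\right) = d - (i-1)c_1 + O(n^{\frac{4}{\beta}-1}) = d - (i-1)c_1 + o(1),$$

$$\mathrm{Var}(Y_i) = \mathrm{Var}\left(d + \sum_{j=2}^{i} (Y_j - Y_{j-1})\right) = \mathrm{Var}\left(\sum_{j=2}^{i} (Z_j - W_j)\right)$$

$$= \sum_{j=2}^{i} \left(\mathrm{Var}(Z_j) + \mathrm{Var}(W_j)\right) + \sum_{2 \le j \ne k \le i} \left(\mathrm{CoVar}(Z_j, Z_k) - \mathrm{CoVar}(Z_j, W_k) + \mathrm{CoVar}(W_j, W_k)\right)$$

$$= i c_2 + i\, o(1) + i^{2} \left(O\!\left(\tfrac{1}{n}\right) + O\!\left(\sqrt{e^{\alpha(\frac{2}{\beta}-1)}}\right) + O\!\left(e^{\alpha(\frac{2}{\beta}-1)}\right)\right) = i c_2 + i\, o(1) + i \left(O\!\left(e^{\alpha(\frac{2}{\beta}-\frac{1}{2})}\right) + O\!\left(e^{\alpha(\frac{3}{\beta}-1)}\right)\right) = i c_2 + i\, o(1).$$

Chebyshev's inequality gives

$$\Pr\left(|Y_i - \mathrm{E}(Y_i)| > \lambda\sigma\right) < \frac{1}{\lambda^{2}}$$

where $\sigma$ is the standard deviation of $Y_i$, $\sigma = \sqrt{i c_2} + o(\sqrt{i})$.

Let $i_1 = \left\lfloor \frac{d}{c_1} - \frac{2\lambda}{c_1} \sqrt{\frac{c_2 d}{c_1}} \right\rfloor$ and $i_2 = \left\lceil \frac{d}{c_1} + \frac{2\lambda}{c_1} \sqrt{\frac{c_2 d}{c_1}} \right\rceil$. We have

$$\mathrm{E}(Y_{i_1}) - \lambda\sigma = d - (i_1 - 1)c_1 + o(1) - \lambda\sqrt{c_2 i_1} + o(\sqrt{i_1}) \ge 2\lambda\sqrt{\frac{c_2 d}{c_1}} - \lambda\sqrt{\frac{c_2 d}{c_1}} - o(\sqrt{d}) = \lambda\sqrt{\frac{c_2 d}{c_1}} - o(\sqrt{d}) > 0.$$

Hence,

$$\Pr(\tau < i_1) \le \Pr(Y_{i_1} \le 0) \le \Pr\left(Y_{i_1} < \mathrm{E}(Y_{i_1}) - \lambda\sigma\right) \le \frac{1}{\lambda^{2}}.$$

Similarly,

$$\mathrm{E}(Y_{i_2}) + \lambda\sigma = d - (i_2 - 1)c_1 + o(1) + \lambda\sqrt{c_2 i_2} + o(\sqrt{i_2}) \le -2\lambda\sqrt{\frac{c_2 d}{c_1}} + \lambda\sqrt{\frac{c_2 d}{c_1}} + o(\sqrt{d}) = -\lambda\sqrt{\frac{c_2 d}{c_1}} + o(\sqrt{d}) < 0.$$

Hence,

$$\Pr(\tau > i_2) \le \Pr(Y_{i_2} > 0) \le \Pr\left(Y_{i_2} > \mathrm{E}(Y_{i_2}) + \lambda\sigma\right) \le \frac{1}{\lambda^{2}}.$$

Therefore

$$\Pr\left( \left| \tau - \frac{d}{c_1} \right| > \frac{2\lambda}{c_1} \sqrt{\frac{c_2 d}{c_1}} \right) \le \frac{2}{\lambda^{2}}.$$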
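The concentration bound of part 1 can be evaluated numerically for a given $\beta > 4$, using $c_1 = 2 - \zeta(\beta-2)/\zeta(\beta-1)$ and $c_2 = \zeta(\beta-3)/\zeta(\beta-1) - (\zeta(\beta-2)/\zeta(\beta-1))^2$. The sketch below is illustrative only: the function names are hypothetical, and $\zeta$ is approximated by a partial sum with Euler-Maclaurin tail terms rather than taken from a library.

```python
import math


def zeta(s, terms=1000):
    """Approximate the Riemann zeta function for real s > 1 (Euler-Maclaurin tail)."""
    if s <= 1:
        raise ValueError("this sketch only handles s > 1")
    n = terms
    partial = sum(k ** -s for k in range(1, n))
    return partial + n ** (1 - s) / (s - 1) + 0.5 * n ** -s + s * n ** (-s - 1) / 12.0


def theorem1_constants(beta):
    """Return (c1, c2) as defined in Theorem 1; requires beta > 4."""
    if beta <= 4:
        raise ValueError("the second-moment argument needs beta > 4")
    ratio = zeta(beta - 2) / zeta(beta - 1)
    c1 = 2 - ratio
    c2 = zeta(beta - 3) / zeta(beta - 1) - ratio ** 2
    return c1, c2


def component_size_window(d, beta, lam):
    """Interval d/c1 +/- (2*lam/c1)*sqrt(c2*d/c1) that holds tau with prob. >= 1 - 2/lam^2."""
    c1, c2 = theorem1_constants(beta)
    center = d / c1
    half_width = (2 * lam / c1) * math.sqrt(c2 * d / c1)
    return center - half_width, center + half_width


# Hypothetical example: a vertex of degree 100 in a graph with beta = 5.
c1, c2 = theorem1_constants(5.0)
low, high = component_size_window(100, 5.0, 3.0)
```

For $\beta = 5$ the constant $c_1$ is below 1, so the typical component size $d/c_1$ is somewhat larger than the degree $d$ of the starting vertex.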

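For a fixed $v$ and $\lambda$ a slowly increasing function tending to infinity, the above inequality implies that almost surely we have $\tau = \frac{d}{c_1} + O(\lambda \sqrt{d})$.

We note that almost all components generated by vertices of degree $d$ have size about $\frac{d}{c_1}$. One such component can contain at most about one vertex of degree $d$. Hence, the number of components of size $\frac{d}{c_1}$ is at least $\frac{e^{\alpha}}{d^{\beta}}$. Let $d = c_1 x$. Then the number of components of size $x$ is at least

$$(1 + o(1))\, \frac{e^{\alpha}}{c_1^{\beta} x^{\beta}}.$$

The above proof actually gives the following result. The size of every component whose vertices have degree at most $d_0$ is almost surely at most $C d_0^{2} \log n$, where $C = \frac{16}{c_1^{2}}$ is the same constant as in Theorem 2. Let $x = C d_0^{2} \log n$ and consider the number of components of size $x$. A component of size $x$ almost surely contains at least one vertex of degree greater than $d_0$. For each vertex $v$ with degree $d \ge d_0$, by part 1, we have

$$\Pr\left( \left| \tau - \frac{d}{c_1} \right| > \frac{2\lambda}{c_1} \sqrt{\frac{c_2 d}{c_1}} \right) \le \frac{2}{\lambda^{2}}.$$

Let $\lambda = \frac{c_1 C d_0^{2} \log n}{4} \sqrt{\frac{c_1}{c_2 d}}$. We have

$$\Pr\left(\tau \ge C d_0^{2} \log n\right) \le \Pr\left(\tau > \frac{d}{c_1} + \frac{2\lambda}{c_1} \sqrt{\frac{c_2 d}{c_1}}\right) \le \frac{2}{\lambda^{2}} = \frac{C_3\, d}{d_0^{4} \log^{2} n}$$

where $C_3 = \frac{32 c_2}{c_1^{3} C^{2}} = \frac{c_1 c_2}{8}$ is a constant depending only on $\beta$. Since there are only $\frac{e^{\alpha}}{d^{\beta}}$ vertices of degree $d$, the number of components of size at least $x$ is at most

$$\sum_{d \ge d_0} \frac{e^{\alpha}}{d^{\beta}} \cdot \frac{C_3\, d}{d_0^{4} \log^{2} n} = \sum_{d \ge d_0} \frac{C_3\, e^{\alpha}}{d^{\beta-1} d_0^{4} \log^{2} n} \le \frac{2 C_3\, e^{\alpha}}{(\beta-2)\, d_0^{\beta+2} \log^{2} n} = c_3\, \frac{e^{\alpha} \log^{\frac{\beta}{2}-1} n}{x^{\frac{\beta}{2}+1}}$$

where $c_3 = \frac{2 C_3}{\beta-2}\, C^{\frac{\beta}{2}+1} = \frac{4^{\beta+1} c_2}{(\beta-2)\, c_1^{\beta+1}}$.

For $x = e^{\frac{2\alpha}{\beta+2}} \alpha$, the above inequality implies that the number of components of size at least $x$ is at most $o(1)$. In other words, almost surely no component has size greater than $e^{\frac{2\alpha}{\beta+2}} \alpha$. This completes the proof of Theorem 1.

5. ON THE SIZE OF THE SECOND LARGEST COMPONENT

For the range $2 < \beta < \beta_0$, we want to show that the second largest components almost surely have size at most $O(\log n)$. However, we cannot apply Azuma's martingale inequality directly as in the proofs of previous sections. For example, the branching process method is no longer feasible when vertices of large degrees are involved. We will modify the branching process method as follows:

For $2 < \beta < \beta_0$, we consider $Q = \frac{1}{E} \sum_{x=1}^{e^{\alpha/\beta}} x(x-2)\, \frac{e^{\alpha}}{x^{\beta}}$. (Note that $Q$ is a positive constant.)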
There is a constant integer $x_0$ satisfying

$$\frac{1}{E} \sum_{x=1}^{x_0} x(x-2)\, \frac{e^{\alpha}}{x^{\beta}} > \frac{Q}{2}. \qquad (*)$$

We choose $\delta$ satisfying

$$\frac{\delta}{(1-\delta)^{2}} = \frac{Q}{4}.$$

If the component has more than $\delta E$ edges, it must have $\Theta(n)$ vertices. So it is a giant component and we are done. We may assume that the component has no more than $\delta E$ edges. We now consider the following modified branching process:

We start with $Y^*_0$ live vertices and $Y^*_0 \ge C \log n$. At the $i$-th step, we choose one live vertex $u$ and expose its neighbors. If the degree of $u$ is less than or equal to $x_0$, we proceed as in Section 4, by marking $u$ dead and all vertices $v \in N(u)$ live (provided $v$ is not marked before). If the degree of $u$ is greater than $x_0$, we will mark exactly one vertex $v \in N(u)$ live and the others dead, provided $v$ is not marked before. In both cases $u$ is marked dead.

Let $O^*_i$ be the set of live vertices at the $i$-th step (in contrast to the live set $O_i$). We denote by $Y^*_i$ the new random variable (in contrast to $Y_i$) that is the number of vertices in $O^*_i$. Our main idea is to show that $Y^*_i$, a truncated version of $Y_i$, is well-concentrated around $\mathrm{E}(Y^*_i)$. Although it is difficult to directly derive such a result for $Y_i$ because of vertices of large degrees, we will be able to bound the distribution of $Y^*_i$. To be precise, $Y^*_i$ satisfies the following:

$Y^*_0 \ge C \log n$, where $C = \frac{130 x_0^{2}}{Q^{2}}$ is a constant only depending on $\beta$.

$-1 \le Y^*_i - Y^*_{i-1} \le x_0$.

Let $W_i$ be the number of backedges as defined in Section 4. By inequality (*) and the assumption that the number of edges $m$ in the component is at most $\delta E$, we have

$$\mathrm{E}(W_i) \le \frac{\delta}{(1-\delta)^{2}} = \frac{Q}{4}.$$

Hence, we have

$$\mathrm{E}(Y^*_i - Y^*_{i-1}) \ge \frac{1}{E} \sum_{x=1}^{x_0} x(x-2)\, \frac{e^{\alpha}}{x^{\beta}} - \mathrm{E}(W_i) \ge \frac{Q}{2} - \frac{Q}{4} = \frac{Q}{4}.$$

By Azuma's martingale inequality, we have

$$\Pr\left(Y^*_i \le \frac{Qi}{8}\right) \le \Pr\left(Y^*_i - \mathrm{E}(Y^*_i) \le -\frac{Qi}{8}\right) < e^{-\frac{(Qi/8)^{2}}{2 i x_0^{2}}} = o(n^{-1})$$

provided $i > C \log n$. The above inequality implies that with probability at least $1 - o(n^{-1})$, $Y^*_i > \frac{Qi}{8} > 0$ when $i > C \log n$. Since $Y^*_i$ decreases by at most 1 at each step, $Y^*_i$ cannot be zero if $i \le C \log n$. So $Y^*_i > 0$ for all $i$. In other words, a.s. the branching process will not stop. However, it is impossible to have $Y^*_n > 0$; that is a contradiction.
Thus we conclude that the component must have at least $\delta E$ edges. So it is a giant component. We note that if a component has more than $C \log n$ edges exposed, then almost surely it is a giant component. In particular, any vertex with degree more than

$C \log n$ is almost surely in the giant component. Hence, the second largest component has size at most $\Theta(\log n)$.

Next, we will show that the second largest component has size at least $\Theta(\log n)$. We consider the vertices $v$ of degree $x = c\alpha$, where $c$ is some constant. There is a positive probability that all neighboring vertices of $v$ have degree 1. In this case, we get a connected component of size $x + 1 = \Theta(\log n)$. The probability of this is about

$$\left(\frac{1}{\zeta(\beta-1)}\right)^{c\alpha}.$$

Since there are $\frac{e^{\alpha}}{(c\alpha)^{\beta}}$ vertices of degree $x$, the probability that none of them has the above property is about

$$\left(1 - \left(\frac{1}{\zeta(\beta-1)}\right)^{c\alpha}\right)^{\frac{e^{\alpha}}{(c\alpha)^{\beta}}} \approx \exp\left(-\left(\frac{1}{\zeta(\beta-1)}\right)^{c\alpha} \frac{e^{\alpha}}{(c\alpha)^{\beta}}\right) = \exp\left(-\left(\frac{e}{\zeta(\beta-1)^{c}}\right)^{\alpha} \frac{1}{(c\alpha)^{\beta}}\right) = o(1),$$

where $c = 1$ if $\beta \ge 3$, and for $3 > \beta > 2$ the constant $c$ is chosen small enough, depending only on $\beta$, that $\zeta(\beta-1)^{c} < e$. In other words, a.s. there is a component of size $c\alpha + 1 = \Theta(\log n)$. Therefore, the second largest component has size $\Theta(\log n)$. Moreover, the above argument holds if we replace $c\alpha$ by any smaller number. Hence, small components exhibit a continuous behavior.

We remark that the methods described in this section can be extended to deal with the remaining cases of $\beta$. The detailed treatment will be left to the full paper.

6. COMPARISONS WITH REALISTIC MASSIVE GRAPHS

Our $(\alpha, \beta)$-random graph model was originally derived from massive graphs generated by long distance telephone calls. These so-called call graphs are taken over different time intervals. For the sake of simplicity, we consider all the calls made in one day. Every completed phone call is an edge in the graph. Every phone number which either originates or receives a call is a node in the graph. When a node originates a call, the edge is directed out of the node and contributes to that node's outdegree. Likewise, when a node receives a call, the edge is directed into the node and contributes to that node's indegree.

In Figure 2, we plot the number of vertices versus the indegree for the call graph of a particular day. Let $y(i)$ be the number of vertices with indegree $i$. For each $i$ such that $y(i) > 0$, a point is marked at the coordinate $(i, y(i))$. A similar plot is shown in Figure 1 for the outdegree.
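Because the model posits $\log y = \alpha - \beta \log x$, a plot of this kind can be summarized by a straight-line fit on log-log axes. The sketch below shows one way to do this with ordinary least squares; it is an assumption-laden illustration, not the procedure used for the call graph data. The function name `fit_power_law` and the synthetic input are hypothetical.

```python
import math


def fit_power_law(degree_counts):
    """Least-squares fit of log y = alpha - beta * log x; returns (alpha, beta).

    degree_counts: dict mapping degree x > 0 to the number y > 0 of vertices of that degree.
    At least two distinct degrees are needed for the slope to be defined.
    """
    points = [(math.log(x), math.log(y)) for x, y in degree_counts.items() if x > 0 and y > 0]
    k = len(points)
    mean_x = sum(px for px, _ in points) / k
    mean_y = sum(py for _, py in points) / k
    sxx = sum((px - mean_x) ** 2 for px, _ in points)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The intercept of the line is alpha and the negated slope is beta.
    return intercept, -slope


# Synthetic, exactly power-law counts (not real call graph data).
synthetic = {x: math.exp(12.0) / x ** 2.1 for x in range(1, 200)}
alpha_hat, beta_hat = fit_power_law(synthetic)
```

On real data the counts are noisy and not monotone in the degree, so a fit of this kind gives only a rough estimate of the two parameters.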
Plots of the number of vertices versus the indegree or outdegree for the call graphs of other days are very similar. For the same call graph, in Figure 3 we plot the number of connected components for each possible size.

The degree sequence of the call graph does not obey perfectly the $(\alpha, \beta)$-graph model. The number of vertices of a given degree does not even monotonically decrease with increasing degree. Moreover, the call graph is directed, i.e., for each edge there is a node that originates the call and a node that receives the call. The indegree and outdegree of a node need not be the same. Clearly the $(\alpha, \beta)$-random graph model does not capture all of the random behavior of the real world call graph.

Nonetheless, our model does capture some of the behavior of the call graph. To see this we first estimate $\alpha$ and $\beta$ of Figure 2. Recall that for an $(\alpha, \beta)$-graph, the number of vertices as a function of degree is given by $\log y = \alpha - \beta \log x$. By approximating Figure 2 by a straight line, $\beta$ can be estimated using the slope of the line to be approximately 2.1. The value of $e^{\alpha}$ can be read off Figure 2 in the same way, and the total number of nodes in the call graph can then be estimated by $\zeta(2.1)\, e^{\alpha} = 1.56\, e^{\alpha}$.

For $\beta$ between 2 and $\beta_0$, the $(\alpha, \beta)$-graph will have a giant component of size $\Theta(n)$. In addition, a.s., all other components are of size $O(\log n)$. Moreover, for any $2 \le x \le O(\log n)$, a component of size $x$ exists. This is qualitatively true of the distribution of component sizes of the call graph in Figure 3 (see footnote 2). The one giant component contains nearly all of the nodes. The maximum size of the next largest component is indeed exponentially smaller than the size of the giant component. Also, a component of nearly every size below this maximum exists. Interestingly, the distribution of the number of components of size smaller than the giant component is nearly log-log linear. This suggests that after removing the giant component, one is left with an $(\alpha, \beta)$-graph with $\beta > 4$. (Theorem 1 yields a log-log linear relation between number of components and component size for $\beta > 4$.) This intuitively seems true since the greater the degree, the fewer nodes of that degree we expect to remain after deleting the giant component. This will increase the value of $\beta$ for the resulting graph.

There are numerous questions that remain to be studied. For example, what is the effect of time scaling? How does it correspond with the evolution of $\beta$? What are the structural behaviors of the call graphs? What are the correlations between the directed and undirected graphs? It is of interest to understand the phase transition of the giant component in the realistic graph. In the other direction, the number of tiny components of size 1 leads to many interesting questions as well. Clearly, there is much work to be done in our understanding of massive graphs.

Footnote 2. This data was compiled by J. Abello and A. Buchsbaum of AT&T Labs from raw phone call records using, in part, the external memory algorithm of Abello, Buchsbaum, and Westbrook [1] for computing connected components of massive graphs.

Acknowledgments. We are grateful to J. Feigenbaum, J. Abello, A. Buchsbaum, J. Reeds, and J. Westbrook for their assistance in preparing the figures and for many interesting discussions on call graphs.

7. REFERENCES

[1] J. Abello, A. Buchsbaum, and J. Westbrook, A functional approach to external graph algorithms, Proc. 6th European Symposium on Algorithms, 1998.
[2] W. Aiello, F. Chung, L. Lu, Random evolution of power law graphs, manuscript.
[3] R. Albert, H. Jeong and A. Barabási, Diameter of the World Wide Web, Nature, 401, September 9, 1999.
[4] N. Alon and J. H. Spencer, The Probabilistic Method, Wiley and Sons, New York, 1992.
[5] A. Barabási and R. Albert, Emergence of scaling in random networks, Science, 286, October 15, 1999.

[6] A. Barabási, R. Albert, and H. Jeong, Scale-free characteristics of random networks: the topology of the world wide web, Elsevier Preprint, August 6, 1999.
[7] P. Erdős and A. Rényi, On the evolution of random graphs, Publ. Math. Inst. Hung. Acad. Sci. 5 (1960), 17-61.
[8] P. Erdős and A. Rényi, On the strength of connectedness of random graphs, Acta Math. Acad. Sci. Hungar. 12 (1961).
[9] M. Faloutsos, P. Faloutsos, and C. Faloutsos, On power-law relationships of the internet topology, Proceedings of the ACM SIGCOMM Conference, Cambridge, MA, 1999.
[10] J. Kleinberg, S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, The web as a graph: Measurements, models and methods, Proceedings of the International Conference on Combinatorics and Computing, July 26-28, 1999.
[11] S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, Trawling the web for emerging cyber communities, Proceedings of the 8th World Wide Web Conference, Edinburgh, Scotland, May 1999.
[12] S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, Extracting large-scale knowledge bases from the web, Proceedings of the 25th VLDB Conference, Edinburgh, Scotland, September 7-10, 1999.
[13] Tomasz Łuczak, Sparse random graphs with a given degree sequence, Random Graphs, vol. 2 (Poznań, 1989), 165-182, Wiley, New York, 1992.
[14] Michael Molloy and Bruce Reed, A critical point for random graphs with a given degree sequence, Random Structures and Algorithms, Vol. 6, no. 2 and 3 (1995).
[15] Michael Molloy and Bruce Reed, The size of the giant component of a random graph with a given degree sequence, Combin. Probab. Comput. 7 (1998).
[16] P. Raghavan, personal communication.
[17] N. C. Wormald, The asymptotic connectivity of labeled regular graphs, J. Comb. Theory (B) 31 (1981).
[18] N. C. Wormald, Models of random regular graphs, Surveys in Combinatorics, 1999 (LMS Lecture Note Series 267, eds. J. D. Lamb and D. A. Preece).

[Figure 1: The number of vertices for each possible outdegree for the call graph of a typical day. Log-log plot; horizontal axis: outdegree, vertical axis: number of vertices.]

[Figure 2: The number of vertices for each possible indegree for the call graph of a typical day. Log-log plot; horizontal axis: indegree, vertical axis: number of vertices.]

[Figure 3: The number of connected components for each possible component size for the call graph of a typical day. Log-log plot; horizontal axis: component size, vertical axis: number of components.]
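The straight-line estimate of $\beta$ from the indegree plot (Figure 2), and the node-count estimate $\zeta(\beta)\, e^{\alpha}$ used in Section 6, can be sketched as follows. This is a minimal illustration, not the authors' procedure: the paper only says the plot was approximated by a straight line, so ordinary least squares on the log-log points is an assumption here, and the names `fit_power_law`, `zeta`, and `estimated_nodes` are hypothetical.

```python
import math


def fit_power_law(hist):
    """Least-squares fit of log y = alpha - beta * log x.

    `hist` maps a degree x >= 1 to the number y(x) > 0 of vertices with
    that degree. At least two distinct degrees are required.
    Returns (alpha, beta).
    """
    pts = [(math.log(x), math.log(y)) for x, y in hist.items()
           if x >= 1 and y > 0]
    n = len(pts)
    mean_x = sum(px for px, _ in pts) / n
    mean_y = sum(py for _, py in pts) / n
    sxx = sum((px - mean_x) ** 2 for px, _ in pts)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in pts)
    slope = sxy / sxx                 # slope of the fitted line is -beta
    alpha = mean_y - slope * mean_x   # intercept of the fitted line
    return alpha, -slope


def zeta(s, terms=1000):
    """Approximate the Riemann zeta function for real s > 1.

    Sums the first terms-1 terms and adds the leading Euler-Maclaurin
    tail correction.
    """
    head = sum(k ** (-s) for k in range(1, terms))
    return head + terms ** (1 - s) / (s - 1) + 0.5 * terms ** (-s)


def estimated_nodes(alpha, beta):
    """Node-count estimate zeta(beta) * e^alpha for an (alpha, beta)-graph."""
    return zeta(beta) * math.exp(alpha)
```

On real data the fit is sensitive to the noisy high-degree tail, so the slope obtained this way should be read as a rough estimate, in the same spirit as the approximate value 2.1 quoted in the text.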
