Assortativity and Mixing. Outline: Definition. General mixing. Assortativity by degree. Contagion. References.


Assortativity and Mixing
Complex Networks, Course 303A, Spring 2009
Prof. Peter Dodds, Department of Mathematics & Statistics, University of Vermont
Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License

Basic idea: mixing between node categories

Random networks with arbitrary degree distributions cover much territory but do not represent all networks. Moving away from pure random networks was a key first step. We can extend in many other directions, and a natural one is to introduce correlations between different kinds of nodes. Node attributes may be anything, e.g.:

1. degree
2. demographics (age, gender, etc.)
3. group affiliation

We speak of mixing patterns, correlations, biases. Networks are still random at base but now have more global structure. We build on work by Newman [3, 4].

General mixing between node categories

Assume types of nodes are countable and are assigned numbers 1, 2, 3, .... Consider networks with directed edges.

e_{μν} = Pr(an edge connects a node of type μ to a node of type ν)
a_μ = Pr(an edge comes from a node of type μ)
b_ν = Pr(an edge leads to a node of type ν)

Write E = [e_{μν}], a = [a_μ], and b = [b_ν].

Requirements:

Σ_{μν} e_{μν} = 1,  Σ_ν e_{μν} = a_μ,  and  Σ_μ e_{μν} = b_ν.
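The three requirements on e_{μν}, a_μ, and b_ν can be checked directly from data. A minimal sketch, assuming a hypothetical typed, directed edge list (the node types and edges below are invented for illustration):

```python
from collections import Counter

# Hypothetical directed edge list: each entry is (type of source, type of target).
edges = [(1, 1), (1, 2), (2, 1), (2, 2), (2, 2), (1, 1)]
m = len(edges)

# E[(mu, nu)] estimates e_mu_nu; a and b are its row and column marginals.
E = {pair: c / m for pair, c in Counter(edges).items()}
a = {mu: c / m for mu, c in Counter(mu for mu, _ in edges).items()}
b = {nu: c / m for nu, c in Counter(nu for _, nu in edges).items()}

# The requirements: entries of E sum to 1; rows sum to a; columns sum to b.
assert abs(sum(E.values()) - 1.0) < 1e-12
for mu in a:
    assert abs(sum(p for (s, _), p in E.items() if s == mu) - a[mu]) < 1e-12
for nu in b:
    assert abs(sum(p for (_, t), p in E.items() if t == nu) - b[nu]) < 1e-12
```

The assertions mirror the three requirements above; they hold by construction for any edge list, since a and b are exactly the marginals of E.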

Correlation coefficient

Varying e_{μν} allows us to move between the following:

1. Perfectly assortative networks, where nodes only connect to like nodes, and the network breaks into subnetworks. Requires e_{μν} = 0 if μ ≠ ν and Σ_μ e_{μμ} = 1.
2. Uncorrelated networks (as we have studied so far). For these we must have independence: e_{μν} = a_μ b_ν.
3. Disassortative networks, where nodes connect to nodes distinct from themselves. Disassortative networks can be hard to build and may require constraints on the e_{μν}.

Basic story: the level of assortativity reflects the degree to which nodes are connected to nodes within their group.

Quantify the level of assortativity with the following assortativity coefficient [4]:

r = (Σ_μ e_{μμ} − Σ_μ a_μ b_μ) / (1 − Σ_μ a_μ b_μ) = (Tr E − ||E²||₁) / (1 − ||E²||₁),

where ||·||₁ is the 1-norm = the sum of a matrix's entries.

- Tr E is the fraction of edges that are within groups.
- ||E²||₁ is the fraction of edges that would be within groups if connections were random.
- 1 − ||E²||₁ is a normalization factor so that r_max = 1.

When Tr E = Σ_μ e_{μμ} = 1, we have r = 1. When e_{μμ} = a_μ b_μ, we have r = 0.

Notes:

- r = −1 is inaccessible if three or more types are present. Disassortative networks simply have nodes connected to unlike nodes; r gives no measure of how unlike those nodes are.
- The minimum value of r occurs when all links run between non-like nodes, Tr E = 0, giving r_min = −||E²||₁ / (1 − ||E²||₁), where −1 ≤ r_min < 0.

Correlation coefficient: scalar quantities

Now consider nodes defined by a scalar integer quantity. Examples: age in years, height in inches, number of friends.

e_{jk} = Pr(a randomly chosen edge connects a node with value j to a node with value k),

and a_j and b_k are defined as before. We can now measure correlations between nodes based on this scalar quantity using the standard Pearson correlation coefficient:

r = Σ_{jk} j k (e_{jk} − a_j b_k) / (σ_a σ_b),

where σ_a² = Σ_j j² a_j − (Σ_j j a_j)² and σ_b² = Σ_k k² b_k − (Σ_k k b_k)².

This is the observed normalized deviation from randomness in the product jk.
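The coefficient r = (Tr E − ||E²||₁)/(1 − ||E²||₁) is a one-liner once E is in hand. A minimal numerical sketch, with a toy 3-type mixing matrix assumed for illustration:

```python
import numpy as np

# Toy 3-type mixing matrix E = [e_mu_nu] (an assumption for illustration);
# its entries sum to 1, as required.
E = np.array([[0.30, 0.05, 0.05],
              [0.05, 0.25, 0.05],
              [0.05, 0.05, 0.15]])

a = E.sum(axis=1)   # a_mu: Pr(an edge comes from a node of type mu)
b = E.sum(axis=0)   # b_nu: Pr(an edge leads to a node of type nu)

# ||E^2||_1, the sum of all entries of E^2, equals sum_mu a_mu b_mu,
# since 1^T E E 1 = (1^T E)(E 1) = b . a.
e2 = float(a @ b)
r = (np.trace(E) - e2) / (1.0 - e2)
```

Note the identity used in the comment: because a and b are the marginals of E, the fraction of within-group edges expected at random, ||E²||₁, reduces to the dot product a · b.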

Degree-degree correlations

The natural correlation is between the degrees of connected nodes. Now define e_{jk} with a slight twist:

e_{jk} = Pr(an edge connects a degree j+1 node to a degree k+1 node)
       = Pr(an edge runs between a node of in-degree j and a node of out-degree k).

Useful for calculations (as per R_k). Important: we must separately define P_0, as the {e_{jk}} contain no information about isolated nodes. Directed networks are still fine, but we will assume from here on that e_{jk} = e_{kj}.

Notation reconciliation for undirected networks:

r = Σ_{jk} j k (e_{jk} − R_j R_k) / σ_R²,

where, as before, R_k is the probability that a randomly chosen edge leads to a node of degree k+1, and

σ_R² = Σ_j j² R_j − (Σ_j j R_j)².

Error estimate for r: remove edge i and recompute r to obtain r_i; repeat for all edges and compute, using the jackknife method [1],

σ_r² = Σ_i (r_i − r)².

Mildly sneaky, as variables need to be independent for us to be truly happy, and edges are correlated.

Measurements of degree-degree correlations (Table II of Newman, Phys. Rev. E 67, 026126, 2003 [4]): size n, assortativity coefficient r, and expected error σ_r for a number of social, technological, and biological networks, both directed and undirected (coauthorship, film actor, company director, student relationship, email address book, power grid, Internet, World Wide Web, software dependency, protein interaction, metabolic, neural, and food web networks):

Group          Network                       Type         Size n      r        σ_r
Social         a  Physics coauthorship       undirected     52,909    0.363    0.002
               a  Biology coauthorship       undirected  1,520,251    0.127    0.0004
               b  Mathematics coauthorship   undirected    253,339    0.120    0.002
               c  Film actor collaborations  undirected    449,913    0.208    0.0002
               d  Company directors          undirected      7,673    0.276    0.004
               e  Student relationships      undirected        573   -0.029    0.037
               f  Email address books        directed       16,881    0.092    0.004
Technological  g  Power grid                 undirected      4,941   -0.003    0.013
               h  Internet                   undirected     10,697   -0.189    0.002
               i  World Wide Web             directed      269,504   -0.067    0.0002
               j  Software dependencies      directed        3,162   -0.016    0.020
Biological     k  Protein interactions       undirected      2,115   -0.156    0.010
               l  Metabolic network          undirected        765   -0.240    0.007
               m  Neural network             directed          307   -0.226    0.016
               n  Marine food web            directed          134   -0.263    0.037
               o  Freshwater food web        directed           92   -0.326    0.031

Observations:

- Social networks tend to be assortative (homophily).
- Technological and biological networks tend to be disassortative.

(The slides also excerpt the paper's recipe for simulating networks with a given e_{jk}: compute the excess degree distribution q_k from e_{jk}, then invert to recover the degree distribution, p_k = (q_{k−1}/k) / Σ_j (q_{j−1}/j); one can then draw edges from e_{jk} and join their ends randomly in groups.)
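The degree-degree version of r can be computed straight from an edge list: it is the Pearson correlation of the remaining degrees (degree minus one) found at the two ends of a randomly chosen edge. A minimal sketch on a toy, invented undirected edge list:

```python
import numpy as np

# Toy undirected edge list (hypothetical data for illustration).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4), (4, 5)]

deg = {}
for u, v in edges:
    deg[u] = deg.get(u, 0) + 1
    deg[v] = deg.get(v, 0) + 1

# Each undirected edge contributes both orderings (j, k) and (k, j),
# which enforces the symmetry e_jk = e_kj.
j = np.array([deg[u] - 1 for u, v in edges] + [deg[v] - 1 for u, v in edges], float)
k = np.array([deg[v] - 1 for u, v in edges] + [deg[u] - 1 for u, v in edges], float)

# Pearson correlation of remaining degrees over edge ends.
r = ((j * k).mean() - j.mean() * k.mean()) / (j.std() * k.std())
```

For comparison, this is the same quantity that networkx's `degree_assortativity_coefficient` computes for an undirected graph.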

Spreading on degree-correlated networks

Next: generalize our work for random networks to degree-correlated networks. As before, by allowing that a node of degree k is activated by one neighbor with probability β_{k1}, we can handle various problems:

1. find the giant component size;
2. find the probability and extent of spread for simple disease models;
3. find the probability of spreading for simple threshold models.

Goal: find

f_{n,j} = Pr(an edge emanating from a degree j+1 node leads to a finite active subcomponent of size n).

Repeat: a node of degree k is in the game with probability β_{k1}. Define β₁ = [β_{k1}].

Plan: find the generating function F_j(x; β₁) = Σ_{n=0}^∞ f_{n,j} x^n.

Recursive relationship:

F_j(x; β₁) = x⁰ Σ_k (e_{jk}/R_j) (1 − β_{k+1,1}) + x Σ_k (e_{jk}/R_j) β_{k+1,1} [F_k(x; β₁)]^k.

The first term is the probability that the first node we reach is not in the game; the second term involves the probability that we hit an active node which has k outgoing edges.

Next: find the average size of active components reached by following a link from a degree j node, F_j′(1; β₁). Differentiate F_j(x; β₁), set x = 1, and rearrange. We use F_k(1; β₁) = 1, which is true when no giant component exists. We find:

R_j F_j′(1; β₁) = Σ_k e_{jk} β_{k+1,1} + Σ_k k e_{jk} β_{k+1,1} F_k′(1; β₁).

Rearranging and introducing a sneaky δ_{jk}:

Σ_k (δ_{jk} R_k − k β_{k+1,1} e_{jk}) F_k′(1; β₁) = Σ_k e_{jk} β_{k+1,1}.
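The rearranged linear system can be solved numerically once e_{jk}, R_k, and β_{k+1,1} are specified. A minimal sketch under toy assumptions (truncated degrees, uncorrelated e_{jk} = R_j R_k, uniform β; all inputs invented for illustration):

```python
import numpy as np

# Solve sum_k (delta_jk R_k - k beta_{k+1,1} e_jk) F'_k = sum_k e_jk beta_{k+1,1}
# for the average active-subcomponent sizes F'_j(1; beta_1).
kmax = 5                                # truncate: j, k = 0..kmax-1
R = np.full(kmax, 1.0 / kmax)           # toy R_k: edge leads to a degree k+1 node
e = np.outer(R, R)                      # uncorrelated case: e_jk = R_j R_k
beta = np.full(kmax, 0.3)               # beta[k] stands in for beta_{k+1,1}

k_idx = np.arange(kmax)
A = np.diag(R) - k_idx[None, :] * beta[None, :] * e   # [A]_jk = delta_jk R_k - k beta e_jk
rhs = e @ beta                                        # [rhs]_j = sum_k e_jk beta_{k+1,1}
Fprime = np.linalg.solve(A, rhs)                      # finite while det A != 0
```

By symmetry of the toy inputs, every F′_j comes out equal here; with a genuinely degree-correlated e_{jk}, the solution varies with j.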

Spreading on degree-correlated networks

So, in principle at least, in matrix form we have

F′(1; β₁) = A⁻¹_{E,β₁} E β₁,

where

[A_{E,β₁}]_{j+1,k+1} = δ_{jk} R_k − k β_{k+1,1} e_{jk},
[F′(1; β₁)]_{k+1} = F_k′(1; β₁),
[E]_{j+1,k+1} = e_{jk}, and [β₁]_{k+1} = β_{k+1,1}.

Now: as F′(1; β₁), the average size of an active component reached along an edge, increases, we move towards a transition to a giant component. Right at the transition, the average component size explodes. Exploding inverses of matrices occur when their determinants are 0. The condition is therefore:

det A_{E,β₁} = 0.

General condition details:

det A_{E,β₁} = det(δ_{jk} R_{k−1} − (k−1) β_{k1} e_{j−1,k−1}) = 0.

The above collapses to our standard contagion condition when e_{jk} = R_j R_k. When β_{k1} = β, we have the condition for a simple disease model's successful spread:

det(δ_{jk} R_{k−1} − β (k−1) e_{j−1,k−1}) = 0.

When β₁ = 1, we have the condition for the existence of a giant component:

det(δ_{jk} R_{k−1} − (k−1) e_{j−1,k−1}) = 0.

Bonusville: we'll find another (possibly better) version of this set of conditions later.

We'll next find two more pieces:

1. P_trig, the probability of starting a cascade;
2. S, the expected extent of activation given a small seed.

Triggering probability generating function:

H(x; β₁) = x Σ_k P_k [F_{k−1}(x; β₁)]^k.

The generating function for the vulnerable component size is more complicated.
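The determinant condition can be explored numerically by scanning β for a sign change of det A. A minimal sketch under toy assumptions (uncorrelated e_{jk} = R_j R_k and uniform β, i.e. the simple-disease case; degree distribution invented for illustration):

```python
import numpy as np

kmax = 10
k = np.arange(1, kmax + 1)            # node degrees 1..kmax
R = k * 0.5 ** k                      # R_{k-1} proportional to k P_k for P_k ~ 0.5^k
R /= R.sum()

def detA(beta):
    # [A]_{jk} = delta_jk R_{k-1} - beta (k-1) e_{j-1,k-1}, with e = outer(R, R).
    return np.linalg.det(np.diag(R) - beta * (k - 1)[None, :] * np.outer(R, R))

# For the uncorrelated case, det A = 0 collapses to the standard contagion
# condition beta * sum_k (k-1) R_{k-1} = 1:
beta_c = 1.0 / ((k - 1) * R).sum()
```

For this toy case det A is positive below β_c and negative just above it, so a sign-change bisection on detA recovers the same threshold the closed form gives; with a correlated e_{jk}, only the determinant scan is available.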

Spreading on degree-correlated networks

We want the probability of not reaching a finite component:

P_trig = S_trig = 1 − H(1; β₁) = 1 − Σ_k P_k [F_{k−1}(1; β₁)]^k.

Last piece: we have to compute F_{k−1}(1; β₁). This is nastier (nonlinear): we have to solve the recursive expression we started with, at x = 1:

F_j(1; β₁) = Σ_k (e_{jk}/R_j)(1 − β_{k+1,1}) + Σ_k (e_{jk}/R_j) β_{k+1,1} [F_k(1; β₁)]^k.

Iterative methods should work here.

Truly final piece: find the final size using the approach of Gleeson [2], a generalization of that used for uncorrelated random networks. We need to compute θ_{j,t}, the probability that an edge leading to a degree j node is infected at time t.

Evolution of edge activity probability:

θ_{j,t+1} = G_j(θ_t) = φ₀ + (1 − φ₀) Σ_{k=1}^∞ (e_{j−1,k−1}/R_{j−1}) Σ_{i=0}^{k−1} (k−1 choose i) θ_{k,t}^i (1 − θ_{k,t})^{k−1−i} β_{ki}.

Overall active fraction's evolution:

φ_{t+1} = φ₀ + (1 − φ₀) Σ_k P_k Σ_{i=0}^{k} (k choose i) θ_{k,t}^i (1 − θ_{k,t})^{k−i} β_{ki}.

As before, these equations give the actual evolution of φ_t for synchronous updates. The spreading condition follows from the θ_{t+1} = G(θ_t) equation: we need a small θ_0 to take off, so we linearize G around θ_0 = 0,

θ_{j,t+1} = G_j(0) + Σ_{k=1}^∞ [∂G_j(0)/∂θ_{k,t}] θ_{k,t} + (1/2) Σ_{k=1}^∞ [∂²G_j(0)/∂θ_{k,t}²] θ_{k,t}² + ...

For spreading, the largest eigenvalue of the matrix [∂G_j(0)/∂θ_{k,t}] must exceed 1. The condition for spreading therefore depends on the eigenvalues of this matrix:

∂G_j(0)/∂θ_{k,t} = (e_{j−1,k−1}/R_{j−1}) (k−1) (β_{k1} − β_{k0}).

(Insert question from assignment 5.)

How the giant component changes with assortativity (from Newman, Phys. Rev. Lett. 89, 208701, 2002 [3]):

The slides excerpt Newman's analysis. The average component size diverges at the point at which the determinant of A is zero; this point marks the phase transition at which a giant component forms, the appropriate generalization to assortatively mixed networks of the Molloy-Reed criterion. The size S of the giant component follows from u_k, the probability that an edge connected to a vertex of remaining degree k leads to a vertex not in the giant component:

S = 1 − p₀ − Σ_{k=1}^∞ p_k u_{k−1}^k,  with  u_k = (Σ_j e_{jk} u_j^j) / (Σ_j e_{jk}).

To test these results and form a more complete picture, Newman also performs computer simulations. Generating networks with a given e_{jk} is not entirely trivial: one cannot simply draw degree pairs (j_i, k_i) for edges from e_{jk}, since such a set would almost certainly violate the basic topological requirement that the number of edge ends at vertices of degree k be a multiple of k. Instead: first generate a random graph with the desired degree distribution, then apply a Metropolis dynamics in which, on each step, two edges (v₁, w₁) and (v₂, w₂) with remaining degree pairs (j₁, k₁) and (j₂, k₂) are chosen at random and replaced by (v₁, v₂) and (w₁, w₂) with probability min(1, e_{j₁j₂} e_{k₁k₂} / (e_{j₁k₁} e_{j₂k₂})). This dynamics conserves the degree sequence, is ergodic on the set of graphs with that degree sequence, and satisfies detailed balance, so it has the required edge distribution e_{jk} as its fixed point.

As an example, consider the symmetric binomial form

e_{jk} = N e^{−(j+k)/κ} [ (j+k choose j) p^j q^k + (j+k choose k) p^k q^j ],

where p + q = 1, κ > 0, and N = (1/2)(1 − e^{−1/κ}) is a normalizing constant. This form is chosen for analytic tractability, though its behavior is quite natural: the sum j + k of the degrees at the ends of an edge falls off exponentially, while that sum is distributed binomially between the two ends. The resulting r (Eq. (10) of the paper) can take both positive and negative values, passing through zero at p₀ = 1/2 − √2/4 ≈ 0.1464.

Figure 1 (from Newman, 2002 [3]): size of the giant component as a fraction of graph size versus the exponential parameter κ, for assortative (p = 0.5), neutral (p = p₀ ≈ 0.146), and disassortative (p = 0.05) graphs. Points are simulations on graphs of N = 100,000 vertices, averaged over ten graphs; solid lines are the numerical solution of the equations above, and the two agree well.

Two important points:

1. The position of the phase transition moves lower as the graph becomes more assortative: the graph percolates more easily if high-degree vertices preferentially associate with other high-degree vertices.
2. By contrast, the size of the giant component at large κ is smaller in the assortatively mixed network.

These findings are intuitively reasonable: in an assortative network the high-degree vertices tend to stick together in a core subnetwork of higher mean degree than the network as a whole, so percolation occurs earlier within that subnetwork; but since percolation is largely restricted to it, the resulting giant component is smaller.

Summary: more assortative networks percolate for lower average degrees, but disassortative networks end up with higher extents of spreading.
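The synchronous update equations above can be iterated directly to a fixed point. A minimal sketch under toy assumptions: degrees 1..kmax with P_k ∝ 0.5^k, uncorrelated mixing (so e_{j−1,k−1}/R_{j−1} = R_{k−1} and θ is the same for every j), and a disease-like response β_{ki} = β for i ≥ 1, β_{k0} = 0 (all parameters invented for illustration):

```python
import numpy as np

kmax = 10
k = np.arange(1, kmax + 1)
P = 0.5 ** k
P /= P.sum()                          # toy degree distribution P_k, k >= 1
R = k * P / (k * P).sum()             # R_{k-1}: edge leads to a degree-k node
beta, phi0 = 0.4, 0.01                # response probability and seed fraction

theta = phi0                          # scalar, by the uncorrelated assumption
for _ in range(200):
    # With beta_{ki} = beta for i >= 1, the inner binomial sum collapses to
    # beta * Pr(at least one of the k-1 other incoming edges is active).
    theta = phi0 + (1 - phi0) * (R * beta * (1 - (1 - theta) ** (k - 1))).sum()

# Overall active fraction at the fixed point:
phi = phi0 + (1 - phi0) * (P * beta * (1 - (1 - theta) ** k)).sum()
```

With a genuinely degree-correlated e_{jk}, θ becomes a vector indexed by j and the update is a matrix of sums, but the iteration scheme is the same.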

[1] B. Efron and C. Stein. The jackknife estimate of variance. The Annals of Statistics, 9:586-596, 1981.
[2] J. P. Gleeson. Cascades on correlated and modular random networks. Phys. Rev. E, 77:046117, 2008.
[3] M. E. J. Newman. Assortative mixing in networks. Phys. Rev. Lett., 89:208701, 2002.
[4] M. E. J. Newman. Mixing patterns in networks. Phys. Rev. E, 67:026126, 2003.