Adventures in random graphs: Models, structures and algorithms

BCAM January 2011

Armand M. Makowski
ECE & ISR/HyNet, University of Maryland at College Park
armand@isr.umd.edu

Complex networks

Many examples:
- Biology (genomics, proteomics)
- Transportation (communication networks, Internet, roads and railroads)
- Information systems (World Wide Web)
- Social networks (Facebook, LinkedIn, etc.)
- Sociology (friendship networks, sexual contacts)
- Bibliometrics (co-authorship networks, references)
- Ecology (food webs)
- Energy (electricity distribution, smart grids)

Larger context of Network Science

Objectives

- Identify generic structures and properties of networks
- Mathematical models and their analysis
- Understand how network structure and processes on networks interact

What is new?
- Very large data sets now easily available!
- Dynamics of networks vs. dynamics on networks


LECTURE 1

Basics of graph theory

What are graphs?

With $V$ a finite set, a graph $G$ is an ordered pair $(V, E)$ where elements in $V$ are called vertices/nodes and $E$ is the set of edges/links:

$$E \subseteq V \times V, \qquad E(G) = E$$

Nodes $i$ and $j$ are said to be adjacent, written $i \sim j$, if $e = (i, j) \in E$, $i, j \in V$.

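Multiple representations for $G = (V, E)$

- Set-theoretic: edge variables $\{\xi_{ij},\ i, j \in V\}$ with

$$\xi_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{if } (i, j) \notin E \end{cases}$$

- Algebraic: adjacency matrix $A = (A_{ij})$ with

$$A_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{if } (i, j) \notin E \end{cases}$$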
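The two representations above can be sketched in a few lines of Python. This is only an illustration: the helper names `adjacency_matrix` and `edge_set`, and the choice of labeling nodes 0, ..., n-1, are assumptions of the sketch and do not come from the slides.

```python
def adjacency_matrix(n, edges):
    """Build the symmetric 0/1 adjacency matrix of a simple undirected graph.

    Nodes are labeled 0..n-1 (an assumption of this sketch); `edges` is an
    iterable of unordered pairs (i, j) with i != j.
    """
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i == j:
            raise ValueError("self loops are not allowed")
        A[i][j] = 1
        A[j][i] = 1  # undirected: (i, j) in E if and only if (j, i) in E
    return A


def edge_set(A):
    """Recover the set E of ordered pairs (i, j) with A[i][j] == 1."""
    n = len(A)
    return {(i, j) for i in range(n) for j in range(n) if A[i][j] == 1}


# A triangle on nodes 0, 1, 2 plus the isolated node 3.
A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 0)])
E = edge_set(A)
```

Since $E \subseteq V \times V$ holds ordered pairs, each undirected link appears twice in `E`, once as `(i, j)` and once as `(j, i)`.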

Some terminology

- Simple graphs vs. multigraphs
- Directed vs. undirected: a graph is undirected when $(i, j) \in E$ if and only if $(j, i) \in E$
- No self loops: $(i, i) \notin E$, $i \in V$

Here: simple, undirected graphs with no self loops!

Types of graphs

- The empty graph
- Complete graphs
- Trees/forests
- A subgraph $H = (W, F)$ of $G = (V, E)$ is a graph with vertex set $W$ such that $W \subseteq V$ and $F = E \cap (W \times W)$
- Cliques (complete subgraphs)

Labeled vs. unlabeled graphs

A graph automorphism of $G = (V, E)$ is any one-to-one mapping $\sigma : V \to V$ that preserves the graph structure, namely

$$(\sigma(i), \sigma(j)) \in E \quad \text{if and only if} \quad (i, j) \in E$$

Group $\mathrm{Aut}(G)$ of graph automorphisms of $G$
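For very small graphs, the automorphism group can be listed by brute force over all permutations of $V$. The sketch below is only illustrative: the function name `automorphisms` is hypothetical, and the search costs $|V|!$ steps, so it is not meant for graphs of any real size.

```python
import itertools


def automorphisms(nodes, edges):
    """List all bijections sigma of `nodes` that map the edge set onto itself."""
    nodes = list(nodes)
    E = {frozenset(e) for e in edges}  # undirected edges as unordered pairs
    result = []
    for perm in itertools.permutations(nodes):
        sigma = dict(zip(nodes, perm))
        image = {frozenset((sigma[i], sigma[j])) for i, j in edges}
        # sigma is a bijection, so image == E means edges map to edges
        # and non-edges map to non-edges.
        if image == E:
            result.append(sigma)
    return result


path = automorphisms([1, 2, 3], [(1, 2), (2, 3)])          # identity and the flip
triangle = automorphisms([1, 2, 3], [(1, 2), (2, 3), (1, 3)])  # every permutation
```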

Of interest

- Connectivity and $k$-connectivity (with $k \geq 1$)
- Number and size of components
- Isolated nodes
- Degree of a node: degree distribution/average degree, maximal/minimal degree
- Distance between nodes (in terms of number of hops): shortest path, diameter, eccentricity, radius
- Small graph containment (e.g., triangles, trees, cliques, etc.)
- Clustering
- Centrality: degree, closeness, betweenness

For $i, j$ in $V$,

$$\ell_{ij} = \text{shortest path length between nodes } i \text{ and } j \text{ in the graph } G = (V, E)$$

Convention: $\ell_{ij} = \infty$ if nodes $i$ and $j$ belong to different components, and $\ell_{ii} = 0$.

Average distance

$$\ell_{\mathrm{Avg}} = \frac{1}{|V|(|V| - 1)} \sum_{i \in V} \sum_{j \in V} \ell_{ij}$$

Diameter

$$d(G) = \max\left(\ell_{ij},\ i, j \in V\right)$$

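Eccentricity

$$\mathrm{Ec}(i) = \max\left(\ell_{ij},\ j \in V\right), \quad i \in V$$

Radius

$$\mathrm{rad}(G) = \min\left(\mathrm{Ec}(i),\ i \in V\right)$$

$$\ell_{\mathrm{Avg}} \leq d(G) \quad \text{and} \quad \mathrm{rad}(G) \leq d(G) \leq 2\,\mathrm{rad}(G)$$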
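The distance-based quantities above can be computed by breadth-first search from every node. The following is a minimal sketch; the function names and the adjacency-list input format (a dict mapping each node to its neighbors) are assumptions made for illustration.

```python
import math
from collections import deque


def distances_from(adj, source):
    """Hop distances from `source` by breadth-first search (inf if unreachable)."""
    dist = {v: math.inf for v in adj}
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == math.inf:  # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_summary(adj):
    """Return (average distance, diameter, radius) for a graph with >= 2 nodes."""
    n = len(adj)
    eccentricity = {}
    total = 0
    for i in adj:
        dist = distances_from(adj, i)
        eccentricity[i] = max(dist.values())
        total += sum(dist.values())  # l_ii = 0 contributes nothing
    average = total / (n * (n - 1))
    return average, max(eccentricity.values()), min(eccentricity.values())


# Path graph 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
avg, diam, rad = distance_summary(path)
```

On this path graph the end nodes have eccentricity 3 and the inner nodes eccentricity 2, so the diameter is 3 and the radius is 2, in line with $\mathrm{rad}(G) \leq d(G) \leq 2\,\mathrm{rad}(G)$.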

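Centrality

Q: How central is a node?

Closeness centrality

$$g(i) = \frac{1}{\sum_{j \in V} \ell_{ij}}, \quad i \in V$$

Betweenness centrality

$$b(i) = \sum_{k \neq i,\ j \neq i,\ k \neq j} \frac{\sigma_{kj}(i)}{\sigma_{kj}}, \quad i \in V$$

where, in the standard definition, $\sigma_{kj}$ is the number of shortest paths between nodes $k$ and $j$, and $\sigma_{kj}(i)$ is the number of those paths that pass through node $i$.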

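Clustering

Clustering coefficient of node $i$

$$C(i) = \frac{\sum_{j \neq i,\ k \neq i,\ j \neq k} \xi_{ij}\,\xi_{ik}\,\xi_{kj}}{\sum_{j \neq i,\ k \neq i,\ j \neq k} \xi_{ij}\,\xi_{ik}}$$

Average clustering coefficient

$$C_{\mathrm{Avg}} = \frac{1}{n} \sum_{i \in V} C(i)$$

$$C = \frac{3 \times \text{Number of fully connected triples}}{\text{Number of triples}}$$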
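A small sketch of the three clustering quantities follows. It assumes the graph is given as a dict mapping each node to the set of its neighbors, reads "number of triples" as the number of connected triples (paths of length two), and sets $C(i) = 0$ when node $i$ has fewer than two neighbors; these choices and the function names are assumptions of the sketch.

```python
def local_clustering(adj, i):
    """C(i): fraction of ordered pairs of neighbors of i that are adjacent."""
    nbrs = list(adj[i])
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention of this sketch: no pair of neighbors exists
    closed = sum(1 for a in nbrs for b in nbrs if a != b and b in adj[a])
    return closed / (k * (k - 1))


def average_clustering(adj):
    """C_Avg: mean of the local clustering coefficients."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


def global_clustering(adj):
    """C = 3 * (number of triangles) / (number of connected triples)."""
    closed_triples = 0  # each triangle is counted once per corner, i.e. 3 times
    triples = 0
    for i in adj:
        nbrs = list(adj[i])
        k = len(nbrs)
        triples += k * (k - 1) // 2
        closed_triples += sum(
            1
            for x in range(k)
            for y in range(x + 1, k)
            if nbrs[y] in adj[nbrs[x]]
        )
    return closed_triples / triples if triples else 0.0


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
c_avg = average_clustering(adj)
c_global = global_clustering(adj)
```

The example illustrates that $C_{\mathrm{Avg}}$ and $C$ generally differ: here they evaluate to 7/12 and 3/5 respectively.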

Random graphs

Random graphs?

$\mathcal{G}(V)$: collection of all (simple, free of self-loops, undirected) graphs with vertex set $V$.

Definition: Given a probability triple $(\Omega, \mathcal{F}, \mathbb{P})$, a random graph is simply a graph-valued rv $\mathbb{G} : \Omega \to \mathcal{G}(V)$.

Modeling: We need only specify the pmf $\{\mathbb{P}[\mathbb{G} = G],\ G \in \mathcal{G}(V)\}$. Many, many ways to do that!

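Equivalent representations for $\mathbb{G}$

- Set-theoretic: link assignment rvs $\{\xi_{ij},\ i, j \in V\}$ with

$$\xi_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E(\mathbb{G}) \\ 0 & \text{if } (i, j) \notin E(\mathbb{G}) \end{cases}$$

- Algebraic: random adjacency matrix $A = (A_{ij})$ with

$$A_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E(\mathbb{G}) \\ 0 & \text{if } (i, j) \notin E(\mathbb{G}) \end{cases}$$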

Why random graphs?

Useful models in many applications to capture binary relationships between participating entities.

Because

$$|\mathcal{G}(V)| = 2^{\frac{|V|(|V| - 1)}{2}} \simeq 2^{\frac{|V|^2}{2}}$$

is a very large number, there is a need to identify/discover typicality!

Scaling laws, zero-one laws as $|V|$ becomes large, e.g., $V \equiv V_n = \{1, \ldots, n\}$ ($n \to \infty$)

Ménagerie of random graphs

- Erdős-Rényi graphs $\mathbb{G}(n; m)$
- Erdős-Rényi graphs $\mathbb{G}(n; p)$
- Generalized Erdős-Rényi graphs
- Geometric random models/disk models
- Intrinsic fitness and threshold random models
- Random intersection graphs
- Growth models: preferential attachment, copying
- Small worlds
- Exponential random graphs
- Etc.

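Erdős-Rényi graphs $\mathbb{G}(n; m)$

With

$$1 \leq m \leq \binom{n}{2} = \frac{n(n - 1)}{2},$$

the pmf on $\mathcal{G}(V_n)$ is specified by

$$\mathbb{P}[\mathbb{G}(n; m) = G] = \begin{cases} u(n; m)^{-1} & \text{if } |E(G)| = 2m \\ 0 & \text{if } |E(G)| \neq 2m \end{cases}$$

where

$$u(n; m) = \binom{\frac{n(n - 1)}{2}}{m}$$

Here $|E(G)| = 2m$ because $E(G) \subseteq V \times V$ holds ordered pairs, so each undirected edge contributes both $(i, j)$ and $(j, i)$.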

Uniform selection over the collection of all graphs on the vertex set $\{1, \ldots, n\}$ with exactly $m$ edges
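One way to realize this uniform selection is to draw $m$ distinct pairs $\{i, j\}$ without replacement from the $n(n-1)/2$ candidates. The sketch below uses only the Python standard library; the names `sample_gnm` and `u` are chosen here for illustration.

```python
import itertools
import math
import random


def u(n, m):
    """Number of graphs on n labeled nodes with exactly m edges."""
    return math.comb(n * (n - 1) // 2, m)


def sample_gnm(n, m, rng=random):
    """Draw one graph uniformly among those on {1, ..., n} with m edges.

    The graph is returned as a set of pairs (i, j) with i < j.
    """
    pairs = list(itertools.combinations(range(1, n + 1), 2))
    if not 1 <= m <= len(pairs):
        raise ValueError("m must satisfy 1 <= m <= n(n-1)/2")
    # Sampling m pairs without replacement makes every m-subset equally likely.
    return set(rng.sample(pairs, m))


edges = sample_gnm(6, 7, random.Random(1))  # seeded only for reproducibility
```

Each outcome then has probability `1 / u(n, m)`, matching the pmf of $\mathbb{G}(n; m)$.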

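Erdős-Rényi graphs $\mathbb{G}(n; p)$

With $0 \leq p \leq 1$, the link assignment rvs $\{\chi_{ij}(p),\ 1 \leq i < j \leq n\}$ are i.i.d. $\{0, 1\}$-valued rvs with

$$\mathbb{P}[\chi_{ij}(p) = 1] = 1 - \mathbb{P}[\chi_{ij}(p) = 0] = p, \quad 1 \leq i < j \leq n$$

For every $G$ in $\mathcal{G}(V)$,

$$\mathbb{P}[\mathbb{G}(n; p) = G] = p^{\frac{|E(G)|}{2}} (1 - p)^{\frac{n(n - 1)}{2} - \frac{|E(G)|}{2}}$$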

Related to, but easier to implement than, $\mathbb{G}(n; m)$

Similar behavior/results under the matching condition

$$|E(\mathbb{G}(n; m))| = \mathbb{E}\left[\,|E(\mathbb{G}(n; p))|\,\right], \quad \text{namely} \quad m = \frac{n(n - 1)}{2}\, p$$
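The ease of implementation is visible in a short sketch: each of the $n(n-1)/2$ pairs is kept independently with probability $p$. The function names `sample_gnp` and `matching_m` are illustrative choices, not notation from the slides.

```python
import itertools
import random


def sample_gnp(n, p, rng=random):
    """Draw G(n; p) on {1, ..., n} as a set of pairs (i, j) with i < j."""
    # rng.random() is uniform on [0, 1), so each pair is kept with probability p.
    return {
        pair
        for pair in itertools.combinations(range(1, n + 1), 2)
        if rng.random() < p
    }


def matching_m(n, p):
    """Edge count m that matches G(n; m) to G(n; p): m = n(n-1)p/2."""
    return n * (n - 1) / 2 * p


rng = random.Random(0)  # seeded only for reproducibility
counts = [len(sample_gnp(20, 0.3, rng)) for _ in range(200)]
empirical_mean = sum(counts) / len(counts)
target = matching_m(20, 0.3)  # the empirical mean should be close to this value
```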

Generalized Erdős-Rényi graphs

With

$$0 \leq p_{ij} \leq 1, \quad 1 \leq i < j \leq n,$$

the link assignment rvs $\{\chi_{ij}(p_{ij}),\ 1 \leq i < j \leq n\}$ are mutually independent $\{0, 1\}$-valued rvs with

$$\mathbb{P}[\chi_{ij}(p_{ij}) = 1] = 1 - \mathbb{P}[\chi_{ij}(p_{ij}) = 0] = p_{ij}, \quad 1 \leq i < j \leq n$$

An important case: With positive weights $w_1, \ldots, w_n$,

$$p_{ij} = \frac{w_i w_j}{W} \quad \text{with} \quad W = w_1 + \ldots + w_n$$
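The weighted case can be sketched as follows. The sketch assumes the weights satisfy $w_i w_j \leq W$ for all pairs, so that every $p_{ij}$ is a valid probability, and it raises an error otherwise; that check and the function names are assumptions added here.

```python
import random


def link_probabilities(weights):
    """Return {(i, j): w_i * w_j / W} for 1 <= i < j <= n."""
    W = sum(weights)
    n = len(weights)
    probs = {}
    for i in range(n):
        for j in range(i + 1, n):
            p = weights[i] * weights[j] / W
            if p > 1:
                raise ValueError("weights give p_ij > 1; not a valid probability")
            probs[(i + 1, j + 1)] = p
    return probs


def sample_generalized(probs, rng=random):
    """Keep each pair independently with its own probability p_ij."""
    return {pair for pair, p in probs.items() if rng.random() < p}


probs = link_probabilities([1, 2, 3])      # W = 6
graph = sample_generalized(probs, random.Random(0))
```

With these weights the pair (2, 3) has $p_{23} = 6/6 = 1$ and is therefore always present.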

Geometric random graphs ($d \geq 1$)

With random locations in $\mathbb{R}^d$ at $X_1, \ldots, X_n$, the link assignment rvs $\{\chi_{ij}(\rho),\ 1 \leq i < j \leq n\}$ are given by

$$\chi_{ij}(\rho) = \mathbf{1}\left[\,\|X_i - X_j\| \leq \rho\,\right], \quad 1 \leq i < j \leq n$$

where $\rho > 0$.

Usually, the rvs $X_1, \ldots, X_n$ are taken to be i.i.d. rvs uniformly distributed over some compact subset $\Gamma \subseteq \mathbb{R}^d$.

Even then, it is not so obvious how to write

$$\mathbb{P}[\mathbb{G}(n; \rho) = G], \quad G \in \mathcal{G}(V)$$

since the rvs $\{\chi_{ij}(\rho),\ 1 \leq i < j \leq n\}$ are no longer i.i.d. rvs.

For $d = 2$, long history for modeling wireless networks (known as the disk model), where $\rho$ is interpreted as the transmission range.
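Although the pmf is hard to write down, sampling is straightforward. The sketch below takes $\Gamma$ to be the unit cube $[0, 1]^d$ with the Euclidean norm; both are assumptions of the example, as is the function name `sample_geometric`.

```python
import itertools
import math
import random


def sample_geometric(n, rho, d=2, rng=random):
    """Place n i.i.d. uniform points in [0, 1]^d and link pairs within rho.

    Returns (points, edges), where edges holds pairs (i, j) with i < j
    and nodes labeled 1..n.
    """
    points = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
    edges = {
        (i + 1, j + 1)
        for i, j in itertools.combinations(range(n), 2)
        if math.dist(points[i], points[j]) <= rho
    }
    return points, edges


points, edges = sample_geometric(50, 0.2, rng=random.Random(0))
```

The link variables are dependent because they share node locations: if nodes 1 and 2 are close, and nodes 2 and 3 are close, then nodes 1 and 3 cannot be far apart.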

Threshold random graphs

Given $\mathbb{R}_+$-valued i.i.d. rvs $W_1, \ldots, W_n$ with absolutely continuous probability distribution function $F$, for some $\theta > 0$,

$$i \sim j \quad \text{if and only if} \quad W_i + W_j > \theta$$

Generalizations:

$$i \sim j \quad \text{if and only if} \quad R(W_i, W_j) > \theta$$

for some symmetric mapping $R : \mathbb{R}_+^2 \to \mathbb{R}_+$.
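A minimal sketch of the threshold rule follows. The exponential distribution used to draw the fitness values and the product rule used as an example of a symmetric $R$ are illustrative assumptions; the slides leave $F$ and $R$ general.

```python
import itertools
import random


def threshold_graph(fitness, theta, rule=lambda a, b: a + b):
    """Link i ~ j if and only if rule(W_i, W_j) > theta.

    `fitness` lists W_1, ..., W_n; `rule` must be symmetric in its arguments.
    Returns pairs (i, j) with i < j, nodes labeled 1..n.
    """
    n = len(fitness)
    return {
        (i + 1, j + 1)
        for i, j in itertools.combinations(range(n), 2)
        if rule(fitness[i], fitness[j]) > theta
    }


# Deterministic illustration with fixed fitness values.
additive = threshold_graph([0.5, 1.0, 2.0], theta=2.0)
product = threshold_graph([0.5, 1.0, 2.0], theta=1.5, rule=lambda a, b: a * b)

# Random illustration: i.i.d. exponential fitness values (an assumed choice of F).
rng = random.Random(0)
W = [rng.expovariate(1.0) for _ in range(30)]
random_graph = threshold_graph(W, theta=3.0)
```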

Random intersection graphs

Given a finite set $\mathcal{W} \equiv \{1, \ldots, W\}$ of features, with random subsets $K_1, \ldots, K_n$ of $\mathcal{W}$,

$$i \sim j \quad \text{if and only if} \quad K_i \cap K_j \neq \emptyset$$

Co-authorship networks, random key distribution schemes, classification/clustering
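The intersection rule can be sketched as below. The slides do not fix how the subsets $K_1, \ldots, K_n$ are drawn; the sketch assumes each node picks `k` distinct features uniformly at random, which is only one possible choice, and the function names are hypothetical.

```python
import itertools
import random


def sample_feature_sets(n, num_features, k, rng=random):
    """Each node draws k distinct features uniformly from {1, ..., num_features}."""
    return [set(rng.sample(range(1, num_features + 1), k)) for _ in range(n)]


def intersection_graph(feature_sets):
    """Link i ~ j if and only if K_i and K_j share at least one feature."""
    n = len(feature_sets)
    return {
        (i + 1, j + 1)
        for i, j in itertools.combinations(range(n), 2)
        if feature_sets[i] & feature_sets[j]
    }


fixed = intersection_graph([{1, 2}, {2, 3}, {4}])
K = sample_feature_sets(10, 50, 5, random.Random(0))
graph = intersection_graph(K)
```

In the fixed example only nodes 1 and 2 share a feature (feature 2), so the graph has the single edge (1, 2).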

Growth models

$\{\mathbb{G}_t,\ t = 0, 1, \ldots\}$ with rules

$$V_{t+1} \leftarrow V_t \quad \text{and} \quad \mathbb{G}_{t+1} \leftarrow (\mathbb{G}_t, V_{t+1})$$

- Preferential attachment
- Copying
- Scale-free networks

Small worlds

- Between randomness and order
- Shortcuts
- Short paths but high clustering
- Milgram's experiment and six degrees of separation

Exponential random graphs

- Models favored by sociologists and statisticians
- Graph analog of exponential families often used in statistical modeling
- Related to Markov random fields

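With $I$ parameters $\theta = (\theta_1, \ldots, \theta_I)$ and a set of $I$ observables (statistics)

$$u_i : \mathcal{G}(V) \to \mathbb{R}_+, \quad i = 1, \ldots, I,$$

we postulate

$$\mathbb{P}[\mathbb{G} = G] = \frac{e^{\sum_{i=1}^{I} \theta_i u_i(G)}}{Z(\theta)}, \quad G \in \mathcal{G}(V)$$

with normalization constant

$$Z(\theta) = \sum_{G \in \mathcal{G}(V)} e^{\sum_{i=1}^{I} \theta_i u_i(G)}$$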
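For very small $n$, the pmf and $Z(\theta)$ can be evaluated by enumerating every graph. The sketch below does this with two illustrative statistics, the number of edges and the number of triangles; these observables and all function names are assumptions of the example. Enumeration visits $2^{n(n-1)/2}$ graphs and is hopeless beyond a handful of nodes.

```python
import itertools
import math


def all_graphs(n):
    """Yield every simple undirected graph on {0, ..., n-1} as a frozenset of pairs."""
    pairs = list(itertools.combinations(range(n), 2))
    for bits in itertools.product((0, 1), repeat=len(pairs)):
        yield frozenset(pair for pair, b in zip(pairs, bits) if b)


def num_edges(g):
    return len(g)


def num_triangles(g):
    nodes = sorted({v for e in g for v in e})
    return sum(
        1
        for a, b, c in itertools.combinations(nodes, 3)
        if {(a, b), (a, c), (b, c)} <= g
    )


def erg_pmf(n, theta, stats):
    """Return (pmf, Z) with pmf[G] = exp(sum_i theta_i * u_i(G)) / Z."""
    weights = {
        g: math.exp(sum(t * u(g) for t, u in zip(theta, stats)))
        for g in all_graphs(n)
    }
    Z = sum(weights.values())
    return {g: w / Z for g, w in weights.items()}, Z


pmf, Z = erg_pmf(4, (0.5, 0.0), (num_edges, num_triangles))
```

With the triangle parameter set to zero, the weight of a graph factorizes over its edges, so $Z(\theta) = (1 + e^{\theta_1})^{n(n-1)/2}$ and the model coincides with $\mathbb{G}(n; p)$ for $p = e^{\theta_1} / (1 + e^{\theta_1})$.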

In sum

- Many different ways to specify the pmf on $\mathcal{G}(V)$
- Local description vs. global representation
- Static vs. dynamic
- Application-dependent mechanisms