Erdős–Rényi random graphs: basics

Nathanaël Berestycki, U.B.C., class on percolation

We take $n$ vertices and a number $p = p(n)$ with $0 < p < 1$. Let $G(n, p(n))$ be the graph such that there is an edge between any two given vertices $i$ and $j$ with probability $p(n)$, the existence of edges between vertices being independent events for different pairs of vertices. We denote by $P_p$ the law of this random variable on the space of all graphs with $n$ vertices. (The graph $G(n, p(n))$ is now known as an Erdős–Rényi random graph.) The question asked by Erdős and Rényi was the following: what can be said about the properties of the graph for various scalings of the function $p(n)$?

1 Connectivity

Our first example concerns connectivity of the random graph. Let $A$ denote the event that the graph is connected.

Theorem 1 (Erdős and Rényi, 1960). Let $\varepsilon > 0$. If $p = (1 - \varepsilon) \log n / n$ then $P_p(A) \to 0$, and if $p = (1 + \varepsilon) \log n / n$ then $P_p(A) \to 1$.

We won't prove this result, but it is easy to see that connectivity cannot occur for $p$ smaller than $(1 - \varepsilon) \log n / n$, since simple first-moment calculations show that some vertex is then still isolated. In fact, the random graph becomes connected precisely when the last isolated vertex connects to something else. In order to get a more exciting phase transition, closer to the percolation one, we need to look at different scalings of $p$.

2 The giant component

The result announced concerns the emergence of a giant cluster that encompasses a positive fraction of all vertices much before the graph becomes completely connected. More precisely, the size of the largest connected component has a phase transition at $p = c/n$ at $c = 1$.
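The phase transition at $p = c/n$ just announced is easy to observe in simulation. The following is a minimal sketch (plain Python standard library; the choice $n = 800$, the union-find representation, and all function names are our own, not from the notes): for $c = 0.5$ the largest component is only logarithmic in $n$, while for $c = 2$ it contains a positive fraction (about $0.8$) of all vertices.

```python
import random
from itertools import combinations

def largest_component(n, p, rng):
    """Sample G(n, p) and return the size of its largest connected component,
    using a simple union-find over the Bernoulli(p) edge indicators."""
    parent = list(range(n))

    def find(v):
        # Find the root of v's component, with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for i, j in combinations(range(n), 2):
        if rng.random() < p:          # edge {i, j} is present
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj       # merge the two components

    counts = {}
    for v in range(n):
        r = find(v)
        counts[r] = counts.get(r, 0) + 1
    return max(counts.values())

if __name__ == "__main__":
    n, rng = 800, random.Random(0)
    for c in (0.5, 1.0, 2.0):
        print(c, largest_component(n, c / n, rng))
```

With a fixed seed the run is reproducible; across seeds the subcritical value hovers around a few dozen while the supercritical one stays close to $\theta(2) n \approx 0.8 n$.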

Theorem 2 (Erdős–Rényi, 1959). Suppose $p(n) = c/n$ for $c > 0$. As $n \to \infty$,

  Regime                          Size of largest component
  $c < 1$ (subcritical regime)    $\frac{3}{(1-c)^2} \log n$
  $c = 1$ (critical regime)       $O(n^{2/3})$
  $c > 1$ (supercritical regime)  largest $= \theta(c)\, n$, second $= O(\log n)$

Here, $\theta(c)$ is the probability of survival for a branching process whose offspring distribution is a Poisson random variable with mean $c$. This has the precise meaning that the ratio of the size of the largest component to $\frac{3}{(1-c)^2} \log n$ (when $c < 1$), or to $\theta(c) n$ (when $c > 1$), converges to 1 in probability. When $c = 1$ there is a non-degenerate limit in distribution, although a satisfactory description of the limiting distribution has only recently been given by Aldous (1997).

2.1 Proof: the branching process approximation

The proof of Theorem 2 relies on a limit theorem for the size of the connected component containing a distinguished vertex, which can be identified as the total progeny of a branching process with offspring distribution a Poisson random variable with mean $c$ (abbreviated PGW($c$), for Poisson Galton-Watson process). This limit theorem will also be very useful in the understanding of the random transposition random walk. Recall that $(Z_i, i \ge 0)$ is a PGW($c$) process if it is a Markov chain on the nonnegative integers $\mathbb{N}$, with transition probabilities $p(k, \cdot) = \mu^{*k}(\cdot)$, where $\mu$ is the Poisson($c$) distribution and $\mu^{*k}$ denotes the $k$-fold convolution of $\mu$ with itself. Let $Z = \sum_{i \ge 0} Z_i$ be the total progeny of this process when started with $Z_0 = 1$ individual.

Proposition 1. Let $C$ be the cluster that contains vertex 1. Let $c > 0$ and $p = c/n$. Then as $n \to \infty$,

  $P_p(|C| = k) \to P(Z = k).$

Proof. The number of children of vertex 1, $Z_1^n = Y_1$, has the Binomial($n-1$, $c/n$) distribution, since vertex 1 has $n-1$ potential neighbours, each being an actual neighbour with probability $c/n$. This converges to a Poisson($c$) limit by elementary computations with Stirling's formula. We can proceed one generation further: given that 1 is connected to exactly $n_1$ other vertices, each of these $n_1$ children is connected to a Binomial($n - 1 - n_1$, $c/n$) number of individuals. This also converges to the Poisson($c$) distribution. However, this time we need to be a little

careful, since some of these children in the second generation could be the same for different individuals in the first. An easy calculation shows that these coincidences can be neglected asymptotically. Proceeding like this with further generations gives the proof.

Armed with this Proposition, we now finish the sketch of the proof of Theorem 2. When $c < 1$, the expected number of offspring is $< 1$, so the branching process dies out. Roughly, there are $n$ clusters of finite size $Z$ with exponential tails, so assuming independence of the sizes of different clusters gives that the largest cluster is of order $O(\log n)$. When $c > 1$, there is a probability $1 - \theta(c)$ that the PGW($c$) dies out. This means that for roughly $(1 - \theta(c)) n$ vertices, the cluster dies very quickly. For the remaining $\theta(c) n$ vertices, the cluster grows quite large, but this observation is not enough to understand the existence of the giant component: it remains to see why there is only one such large component.

2.2 Uniqueness of the giant cluster

We need to see why all vertices whose cluster is quite large are actually part of the same cluster. In the Proposition above, we proved convergence of finite-dimensional distributions towards those of a PGW($c$). However, a more sophisticated argument (a consequence of the birthday problem) shows that this approximation is reasonable at least until $n^{1/2} \omega(n)$ vertices have been exposed in this growing procedure, where $\omega(n) \to \infty$ slowly enough. Let $x$ and $y$ be two vertices whose clusters are larger than $n^{1/2} \omega(n)$, and suppose that by the time each has size $n^{1/2} \omega(n)$, they haven't intersected yet. We show that it is likely that they will intersect at the next generation. Indeed, there are roughly $n^{1/2} \omega(n)$ vertices in the last generation of each cluster, which means that there are potentially $N = \left(n^{1/2} \omega(n)\right)^2 = n \, \omega(n)^2$ possible edges between these two sets.
Therefore, the probability that there is no edge between the two sets of vertices is smaller than

  $(1 - c/n)^N \le \exp(-c \, \omega(n)^2) \to 0,$

which explains the uniqueness of the giant component. For more on this approach (and real proofs), see the recent notes of Durrett (2005).

3 Exact asymptotics for $|C|$

Let $C$ be the cluster containing a distinguished vertex, say 1. Let $p = c/n$.

Theorem 3. For $k \ge 1$,

  $P(|C| = k) \to \frac{1}{c} \frac{k^{k-1}}{k!} (c e^{-c})^k.$

This is known as the Borel distribution.

Since we have already identified the limit $|C| \to_d Z$, this means:

Corollary 1. Let $Z$ be the total progeny of a PGW($c$). Then $Z$ has a Borel($c$) distribution.
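Corollary 1 is easy to probe numerically. Below is a sketch (plain Python; the Knuth product-of-uniforms Poisson sampler, the parameters $c = 0.5$ and $n = 500$, and all function names are our own choices, not from the notes). It evaluates the Borel($c$) probabilities in log space, checks that for $c < 1$ they sum to 1 with mean $1/(1-c)$ (the classical total progeny mean of a subcritical branching process), and compares them with Monte Carlo samples of the total progeny $Z$.

```python
import math
import random

def poisson(rng, lam):
    """Knuth's product-of-uniforms Poisson sampler (fine for small lam)."""
    threshold, k, prod = math.exp(-lam), -1, 1.0
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def total_progeny(c, rng, cap=10**6):
    """Total progeny Z of a Galton-Watson process with Poisson(c) offspring,
    explored one individual at a time; `cap` guards against survival."""
    active, total = 1, 0
    while active > 0 and total < cap:
        total += 1
        active += poisson(rng, c) - 1
    return total

def cluster_size(n, p, rng):
    """Size of the component of a fixed vertex in G(n, p), grown by the same
    one-generation-at-a-time exploration as in the proof of Proposition 1."""
    unexplored = set(range(1, n))
    active, size = [0], 1
    while active:
        active.pop()
        found = [w for w in unexplored if rng.random() < p]
        unexplored.difference_update(found)
        active.extend(found)
        size += len(found)
    return size

def borel_pmf(k, c):
    """Borel(c): P(Z = k) = k^(k-1) (c e^(-c))^k / (c k!), in log space."""
    return math.exp((k - 1) * math.log(k) + k * (math.log(c) - c)
                    - math.lgamma(k + 1) - math.log(c))

if __name__ == "__main__":
    c, rng = 0.5, random.Random(42)
    emp = sum(total_progeny(c, rng) == 1 for _ in range(4000)) / 4000
    print("P(Z=1):", borel_pmf(1, c), "empirical:", emp)
    print("sum:", sum(borel_pmf(k, c) for k in range(1, 400)),
          "mean:", sum(k * borel_pmf(k, c) for k in range(1, 400)))
```

Since `cluster_size` implements exactly the exploration used in the proof of Proposition 1, its empirical law for large $n$ can be compared directly with that of $Z$, and hence with the Borel($c$) probabilities of Theorem 3.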

3.1 Proof: combinatorics

We start by solving a different problem and count the expected number $N(k)$ of clusters of a given size $k$ in the random graph. Using the previous results we may simplify the problem and only count the number of tree components of size $k$. Each given tree on $k$ given vertices has probability

  $\left(\frac{c}{n}\right)^{k-1} \left(1 - \frac{c}{n}\right)^{\binom{k}{2} - (k-1) + k(n-k)} \sim n^{-(k-1)} \, \frac{1}{c} (c e^{-c})^k$

of being one of the connected components of the random graph (its $k-1$ edges must be present, and every other pair involving one of its vertices must be absent). By Cayley's (1889) formula, there are $k^{k-2}$ ways to draw a tree on $k$ given vertices, so, since $\binom{n}{k} \sim n^k / k!$ for fixed $k$, the expected number of clusters of size $k$ is, approximately,

  $E(N(k)) \sim n \, \frac{1}{c} \frac{k^{k-2}}{k!} (c e^{-c})^k.$

Now, observe that, by exchangeability,

  $P(|C| = k) = E\left(\frac{k N(k)}{n}\right),$

which gives the result.

4 A closer look at the critical regime

In 1997, David Aldous gave a very precise description of the component sizes of the random graph at and near the point of its phase transition. Remember that the critical value of $p$ is $p = 1/n$. Aldous takes, for $\lambda \in \mathbb{R}$,

  $p = \frac{1}{n} + \lambda n^{-4/3} = \frac{1}{n} \left(1 + \lambda n^{-1/3}\right),$

that is, $c = 1 + \lambda n^{-1/3}$.

To state his result, we call $K_1^n \ge K_2^n \ge \dots$ the ordered component sizes of $G(n, p)$. Let $(B_s, s \ge 0)$ be a standard one-dimensional Brownian motion. Call $X$ the process such that

  $X_t = B_t + \int_0^t (\lambda - s) \, ds = B_t + \lambda t - t^2/2;$

$X_t$ is a Brownian motion with a parabolic drift. Let $\bar{X}$ be the process obtained by reflecting $X$ above its infimum:

  $\bar{X}_t = X_t - \inf_{s \le t} X_s.$

If $X$ were a Brownian motion, then an old result of Paul Lévy says that $\bar{X}$ would have the same distribution as the absolute value $|B_t|$ of a Brownian motion, but here $\bar{X}$ is a bit more complicated because of the parabolic drift. Having made these definitions, we may state Aldous' result:

Theorem 4. Fix $\lambda \in \mathbb{R}$. As $n \to \infty$,

  $\{n^{-2/3} K_j^n;\ j \ge 1\} \to_d \{L_j;\ j \ge 1\},$

where the $L_j$ are the ranked lengths of the excursions of $\bar{X}$ away from 0.

Here we don't worry too much about the sense of the convergence in distribution, but if you are curious: to give this statement an exact meaning we need to put a metric on the space of nonincreasing nonnegative sequences, which here is the $\ell^2$ metric.

The sketch of proof given here is inspired by Durrett's notes (2005), which follow quite closely Aldous' original arguments. The idea is to expose the vertices of a cluster one at a time, instead of generation by generation as we did in the branching process approximation. We proceed in discrete time and start at vertex 1. We split the vertices into three categories. First, we have a set of removed sites $R_t$, corresponding to vertices whose connections have been entirely explored by time $t$; thus initially $R_0 = \varnothing$. Then we have a set of active sites $A_t$, corresponding to sites which have been discovered by time $t$ but whose connections have not yet been completely explored (hence initially $A_0 = \{1\}$). Finally, there is the set of unexplored sites $U_t$, consisting of all the other vertices in the graph. We explore the cluster one vertex at a time, this vertex being taken from the active set. If it has neighbours that were never discovered before, we add them to the active sites, and we remove the vertex being explored from the list of active sites. This process is known to computer scientists as breadth-first search. Note that since we explore one vertex at a time, we always have $|R_t| = t$. When we have finished exploring the last vertex of a cluster, we take the smallest-numbered vertex in $U_t$ and begin exploring its cluster. Note that as long as we are exploring a cluster, $|A_t| > 0$, and $|A_t| = 0$ only at the beginning and at the end of this exploration, so that the size of the cluster is exactly the length of the excursion of $|A_t|$ away from 0.
In fact, to be consistent with ourselves, we artificially subtract 1 from $|A_t|$ each time we finish the exploration of a cluster. We call $a_t$ the resulting process. Then the sizes of the clusters are the lengths of the excursions of $a_t$ above its running infimum. Hence the result will be proved (or at least made plausible!) if we show that, after speeding up time by a factor of $n^{2/3}$ (to scale the size of the components) and scaling the increments of $a_t$ by a factor of $n^{-1/3}$ in order to have a random walk scaling,

  $x_s = n^{-1/3} a_{s n^{2/3}} \to_d X_s.$

On the other hand, note that the increments of $a_t$ are given by the number of previously undiscovered neighbours of the active site being explored, minus one. Since at time $t$ this site has $n - t - a_t$ potential new neighbours, each an actual neighbour with probability $p$,

  $E(\Delta a_t \mid \mathcal{F}_t) = -1 + (n - t - a_t) p, \qquad \operatorname{var}(\Delta a_t \mid \mathcal{F}_t) = (n - t - a_t) p (1 - p).$

After taking $p = (1 + \lambda n^{-1/3})/n$ and speeding up time by a factor of $n^{2/3}$ and space by $n^{-1/3}$:

  $E(\Delta x_s \mid \mathcal{F}_{s n^{2/3}}) = (\lambda - s) n^{-2/3} + o(n^{-2/3}) \approx (\lambda - s) \, ds,$

since now the time-increments are $ds = n^{-2/3}$. On the other hand,

  $\operatorname{var}(\Delta x_s \mid \mathcal{F}_{s n^{2/3}}) \approx n^{-2/3} = ds.$

In other words, asymptotically

  $M_t = x_t - \int_0^t (\lambda - s) \, ds$

is a martingale and

  $M_t^2 - \int_0^t ds = M_t^2 - t$

is another martingale. Hence, asymptotically, $x_s$ is continuous and satisfies the martingale problem(1) associated with the parabolic drift $\lambda - t$ and diffusion coefficient equal to 1, which characterizes the process $X_t$. By uniqueness in law of the solution to the martingale problem, we conclude that $x \to_d X$ as $n \to \infty$, which finishes the proof of Aldous' result.

In fact, Aldous' results are stronger than what is stated here. In particular, a simple modification of the arguments also yields results about the corresponding complexity of the components, and also allows one to discuss the much more complex problem of the dynamics of the components as $\lambda$ evolves from $-\infty$ to $+\infty$. To do this he introduced a process called the multiplicative coalescent.

(1) If you need to refresh your memory on martingale problems, here is a quick reminder. Recall that a continuous process $X$ is said to satisfy the martingale problem associated with the functions $f$ and $\sigma$ if $M_t = X_t - \int_0^t f(s, X_s) \, ds$ is a martingale and $M_t^2 - \int_0^t \sigma^2(s, X_s) \, ds$ is another martingale. It can be shown that if $X$ satisfies the $(f, \sigma)$-martingale problem and $f$ and $\sigma$ are nice enough, then $X$ is in fact a solution to the stochastic differential equation

  $dX_t = \sigma(t, X_t) \, dW_t + f(t, X_t) \, dt.$