arxiv: v2 [math.pr] 22 Oct 2018


Breaking of ensemble equivalence for perturbed Erdős-Rényi random graphs

F. den Hollander (1), M. Mandjes (2), A. Roccaverde (3), N.J. Starreveld (4)

(1) Mathematical Institute, Leiden University, P.O. Box 9512, 2300 RA Leiden, The Netherlands, denholla@math.leidenuniv.nl
(2) Korteweg-de Vries Institute, University of Amsterdam, P.O. Box 94248, 1090 GE Amsterdam, The Netherlands, m.r.h.mandjes@uva.nl
(3) Mathematical Institute, Leiden University, P.O. Box 9512, 2300 RA Leiden, The Netherlands, a.roccaverde@math.leidenuniv.nl
(4) Korteweg-de Vries Institute, University of Amsterdam, P.O. Box 94248, 1090 GE Amsterdam, The Netherlands, n.j.starreveld@uva.nl

October 2018

Abstract

In a previous paper we analysed a simple undirected random graph subject to constraints on the total number of edges and the total number of triangles. We considered the dense regime in which the number of edges per vertex is proportional to the number of vertices. We showed that, as soon as the constraints are frustrated, i.e., do not lie on the Erdős-Rényi line, there is breaking of ensemble equivalence, in the sense that the specific relative entropy per edge of the microcanonical ensemble with respect to the canonical ensemble is strictly positive in the limit as the number of vertices tends to infinity. In the present paper we analyse what happens near the Erdős-Rényi line. It turns out that the way in which the specific relative entropy tends to zero depends on whether the total number of triangles is slightly larger or slightly smaller than typical. We investigate what the constrained random graph looks like asymptotically in the microcanonical ensemble.

MSC 2010: 05C80, 60K35, 82B20.
Key words: Erdős-Rényi random graph, Gibbs ensembles, relative entropy, graphon, breaking of ensemble equivalence.
Acknowledgement: The research in this paper was supported through NWO Gravitation Grant NETWORKS.

1 Introduction

In this paper we analyse random graphs that are subject to constraints. Statistical physics prescribes what probability distribution on the set of graphs we should choose when we want to model a given type of constraint []. Two important choices are:

(1) The microcanonical ensemble, where the constraints are hard (i.e., are satisfied by each individual graph).
(2) The canonical ensemble, where the constraints are soft (i.e., hold as ensemble averages, while individual graphs may violate the constraints).

For random graphs that are large but finite, the two ensembles are obviously different and, in fact, represent different empirical situations. Each ensemble represents the unique probability distribution with maximal entropy respecting the constraints. In the limit as the size of the graph diverges, the two ensembles are traditionally assumed to become equivalent as a result of the expected vanishing of the fluctuations of the soft constraints, i.e., the soft constraints are expected to behave asymptotically like hard constraints. This assumption of ensemble equivalence is one of the cornerstones of statistical physics, but it does not hold in general (see [3] for more background).

In a series of papers the question of possible breaking of ensemble equivalence was investigated for various choices of the constraints, including the degree sequence and the total number of edges, wedges and triangles. Both the sparse regime (where the number of edges per vertex remains bounded) and the dense regime (where the number of edges per vertex is of the order of the number of vertices) have been considered. The effect of community structure on ensemble equivalence has been investigated as well. Relevant references are [3], [4], [5], [30] and [3].

In [5] we considered a random graph subject to constraints on the total number of edges and the total number of triangles, in the dense regime. With the help of large deviation theory for graphons, we derived a variational formula for $s_\infty = \lim_{n\to\infty} n^{-2} s_n$, where $n$ is the number of vertices and $s_n$ is the relative entropy of the microcanonical ensemble with respect to the canonical ensemble. We found that $s_\infty > 0$ when the constraints are frustrated. In the present paper we analyse the behaviour of $s_\infty$ when the constraints are close to but different from those of the Erdős-Rényi random graph, and we identify what the constrained random graph looks like asymptotically in the microcanonical ensemble.
It turns out that the behaviour changes depending on whether the total number of triangles is larger or smaller than that of the Erdős-Rényi random graph with a given total number of edges.

While breaking of ensemble equivalence is a relatively new concept in the theory of random graphs, there are many studies on the asymptotic structure of random graphs. In the pioneering work [9], followed by [], the large deviation principle for dense Erdős-Rényi random graphs is proven and the asymptotic structure of constrained Erdős-Rényi random graphs is described as the solution of a variational problem. In the past few years significant progress has been made regarding sparse random graphs as well. We refer the reader to [8], [0], [3] and [36].

Two other random graph models that have been extensively studied are the exponential random graph model and the constrained exponential random graph model. Exponential random graphs, which are related to the canonical ensemble we consider in this paper, have been analysed in [], [3], [7], [7], [9], [33] and [35]. Constrained exponential random graphs have been analysed in [3], [], [7], [] and [34]. In [7], [8], [4] and [8] the asymptotic structure of graphs drawn from the microcanonical ensemble is investigated for various values of the constraints on the edge density and the triangle density. In [8] the authors study the behaviour of random graphs with edge and triangle densities close to the Erdős-Rényi curve. They identify the scaling behaviour close to the curve by proving a bound on the entropy function. We extend their results and find the exact behaviour of constrained random graphs close to the Erdős-Rényi curve. The same question has been dealt with in [4] for a constraint on the edge and triangle density close to the lower boundary curve of the admissibility region.
In [8] the authors determine, through extensive simulations, regions where phase transitions in the structure of constrained random graphs occur.

The remainder of this paper is organised as follows. In Section 2 we define the two ensembles, give the definition of equivalence of ensembles in the dense regime, and recall some basic facts about graphons. In Section 2.4 we recall the variational representation of $s_\infty$ derived in [5] when the constraints are on the total numbers of subgraphs drawn from a finite collection of subgraphs. We also recall the analysis of $s_\infty$ in [5] for the special case where the subgraphs are the edges and the triangles. In Section 3 we state our main theorems. Proofs are given in Sections 4 and 5.

2 Definitions and preliminaries

In Section 2.1 we give the formal definition of the two ensembles we are interested in and give our definition of equivalence of ensembles in the dense regime. In Section 2.2 we recall some basic facts

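about graphons, in Section 2.3 we present some basic properties of the canonical ensemble, and in Section 2.4 we give a variational characterisation of ensemble equivalence proven in [5].

2.1 Microcanonical ensemble, canonical ensemble, relative entropy

For $n \in \mathbb{N}$, let $\mathcal{G}_n$ denote the set of all $2^{\binom{n}{2}}$ simple undirected graphs with $n$ vertices. Any graph $G \in \mathcal{G}_n$ can be represented by a symmetric $n \times n$ matrix with elements

$$h^G(i,j) := \begin{cases} 1 & \text{if there is an edge between vertex } i \text{ and vertex } j, \\ 0 & \text{otherwise.} \end{cases} \qquad (2.1)$$

Let $\vec{C}$ denote a vector-valued function on $\mathcal{G}_n$. We choose a specific vector $\vec{C}^*$, which we assume to be graphical, i.e., realisable by at least one graph in $\mathcal{G}_n$. For this $\vec{C}^*$ the microcanonical ensemble is the probability distribution $P_{\mathrm{mic}}$ on $\mathcal{G}_n$ with hard constraint $\vec{C}^*$ defined as

$$P_{\mathrm{mic}}(G) := \begin{cases} 1/\Omega_{\vec{C}^*}, & \text{if } \vec{C}(G) = \vec{C}^*, \\ 0, & \text{otherwise,} \end{cases} \qquad G \in \mathcal{G}_n, \qquad (2.2)$$

where

$$\Omega_{\vec{C}^*} := |\{G \in \mathcal{G}_n \colon \vec{C}(G) = \vec{C}^*\}| \qquad (2.3)$$

is the number of graphs that realise $\vec{C}^*$. The canonical ensemble $P_{\mathrm{can}}$ is the unique probability distribution on $\mathcal{G}_n$ that maximises the entropy

$$S_n(P) := -\sum_{G \in \mathcal{G}_n} P(G) \log P(G) \qquad (2.4)$$

subject to the soft constraint $\langle \vec{C} \rangle = \vec{C}^*$, where

$$\langle \vec{C} \rangle := \sum_{G \in \mathcal{G}_n} \vec{C}(G)\, P(G). \qquad (2.5)$$

This gives the formula [6]

$$P_{\mathrm{can}}(G) := \frac{1}{Z(\vec{\theta}^*)}\, e^{H(\vec{\theta}^*, \vec{C}(G))}, \qquad G \in \mathcal{G}_n, \qquad (2.6)$$

with

$$H(\vec{\theta}^*, \vec{C}(G)) := \vec{\theta}^* \cdot \vec{C}(G), \qquad Z(\vec{\theta}^*) := \sum_{G \in \mathcal{G}_n} e^{\vec{\theta}^* \cdot \vec{C}(G)}, \qquad (2.7)$$

denoting the Hamiltonian and the partition function, respectively. In (2.6)-(2.7) the parameter $\vec{\theta}^*$, which is a real-valued vector whose size is equal to the number of constraints, must be set to the unique value that realises $\langle \vec{C} \rangle = \vec{C}^*$. As a Lagrange multiplier, $\vec{\theta}^*$ always exists, but uniqueness is non-trivial. In the sequel we will only consider examples where the gradients of the constraints in (2.5) are linearly independent vectors. Consequently, the Hessian matrix of the entropy of the canonical ensemble in (2.6) is a positive-definite matrix, which implies uniqueness.

The relative entropy of $P_{\mathrm{mic}}$ with respect to $P_{\mathrm{can}}$ is defined as

$$S_n(P_{\mathrm{mic}} \mid P_{\mathrm{can}}) := \sum_{G \in \mathcal{G}_n} P_{\mathrm{mic}}(G) \log \frac{P_{\mathrm{mic}}(G)}{P_{\mathrm{can}}(G)}. \qquad (2.8)$$

For any $G_1, G_2 \in \mathcal{G}_n$, $P_{\mathrm{can}}(G_1) = P_{\mathrm{can}}(G_2)$ whenever $\vec{C}(G_1) = \vec{C}(G_2)$, i.e., the canonical probability is the same for all graphs with the same value of the constraint. We may therefore rewrite (2.8) as

$$S_n(P_{\mathrm{mic}} \mid P_{\mathrm{can}}) = \log \frac{P_{\mathrm{mic}}(G^*)}{P_{\mathrm{can}}(G^*)}, \qquad (2.9)$$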

where $G^*$ is any graph in $\mathcal{G}_n$ such that $\vec{C}(G^*) = \vec{C}^*$ (recall that we assumed that $\vec{C}^*$ is realisable by at least one graph in $\mathcal{G}_n$).

All the quantities above depend on $n$. In order not to burden the notation, we exhibit this $n$-dependence only in the symbols $\mathcal{G}_n$ and $S_n(P_{\mathrm{mic}} \mid P_{\mathrm{can}})$. When we pass to the limit $n \to \infty$, we need to specify how $\vec{C}(G)$, $\vec{C}^*$ and $\vec{\theta}^*$ are chosen to depend on $n$. We refer the reader to [5] where this issue has been discussed in detail.

Definition 2.1 In the dense regime, if

$$s_\infty := \lim_{n\to\infty} n^{-2}\, S_n(P_{\mathrm{mic}} \mid P_{\mathrm{can}}) = 0, \qquad (2.10)$$

then $P_{\mathrm{mic}}$ and $P_{\mathrm{can}}$ are said to be equivalent.

Remark 2.2 In [3], which was concerned with the sparse regime, the relative entropy was divided by $n$ (the number of vertices). In the dense regime, however, it is appropriate to divide by $n^2$ (the order of the number of edges).

2.2 Graphons

There is a natural way to embed a simple graph on $n$ vertices in a space of functions called graphons. Let $W$ be the space of functions $h \colon [0,1]^2 \to [0,1]$ such that $h(x,y) = h(y,x)$ for all $(x,y) \in [0,1]^2$. A finite simple graph $G$ on $n$ vertices can be represented as a graphon $h^G \in W$ in a natural way as (see Fig. 1)

$$h^G(x,y) := \begin{cases} 1 & \text{if there is an edge between vertex } \lceil nx \rceil \text{ and vertex } \lceil ny \rceil, \\ 0 & \text{otherwise.} \end{cases} \qquad (2.11)$$

Figure 1: An example of a graph $G$ and its graphon representation $h^G$.

The space of graphons $W$ is endowed with the cut distance

$$d_\square(h_1, h_2) := \sup_{S,T \subset [0,1]} \left| \int_{S \times T} dx\, dy\, [h_1(x,y) - h_2(x,y)] \right|, \qquad h_1, h_2 \in W. \qquad (2.12)$$

On $W$ there is a natural equivalence relation. Let $\Sigma$ be the space of measure-preserving bijections $\sigma \colon [0,1] \to [0,1]$. Then $h_1(x,y) \sim h_2(x,y)$ if $h_1(x,y) = h_2(\sigma x, \sigma y)$ for some $\sigma \in \Sigma$. This equivalence relation yields the quotient space $(\tilde{W}, \delta_\square)$, where $\delta_\square$ is the metric defined by

$$\delta_\square(\tilde{h}_1, \tilde{h}_2) := \inf_{\sigma_1, \sigma_2 \in \Sigma} d_\square(h_1^{\sigma_1}, h_2^{\sigma_2}), \qquad \tilde{h}_1, \tilde{h}_2 \in \tilde{W}. \qquad (2.13)$$

As noted above, we suppress the $n$-dependence. Thus, by $G$ we denote any simple graph on $n$ vertices, by $h^G$ its image in the graphon space $W$, and by $\tilde{h}^G$ its image in the quotient space $\tilde{W}$.
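To make the relative entropy of the microcanonical ensemble with respect to the canonical ensemble concrete, the following minimal sketch evaluates it for the simplest possible case, a single hard constraint on the number of edges. This special case is an illustrative assumption and is not the edge-triangle model studied in the paper. For it, the canonical ensemble is the Erdős-Rényi random graph with edge probability p = m/N, where N = n(n-1)/2, and the microcanonical ensemble is uniform on graphs with exactly m edges. The function name `relative_entropy_edges` is hypothetical.

```python
import math


def relative_entropy_edges(n: int, m: int) -> float:
    """Relative entropy log(P_mic(G*) / P_can(G*)) for a hard constraint
    of exactly m edges on n labelled vertices (illustrative special case)."""
    N = n * (n - 1) // 2          # number of vertex pairs
    if not 0 < m < N:
        raise ValueError("need 0 < m < n(n-1)/2")
    p = m / N                     # canonical edge probability matching the soft constraint
    log_p_mic = -math.log(math.comb(N, m))                    # uniform on graphs with m edges
    log_p_can = m * math.log(p) + (N - m) * math.log(1 - p)   # Erdos-Renyi probability of G*
    return log_p_mic - log_p_can


if __name__ == "__main__":
    # Dense regime: half of all pairs are edges; the entropy per n^2 shrinks with n.
    for n in (4, 8, 16, 32, 64):
        N = n * (n - 1) // 2
        s_n = relative_entropy_edges(n, N // 2)
        print(n, s_n, s_n / n**2)
```

In this toy case the relative entropy grows only logarithmically in n, so after division by n^2 it vanishes, which is what equivalence in the dense regime means.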
For a more detailed description of the structure of the space $(\tilde{W}, \delta_\square)$ we refer the reader to [4], [5], []. In the

sequel we will deal with constraints on the edge and triangle density. In the space $W$ the edge density and the triangle density of a graphon $h$ are defined by

$$T_1(h) = \int_{[0,1]^2} dx_1\, dx_2\, h(x_1, x_2), \qquad T_2(h) = \int_{[0,1]^3} dx_1\, dx_2\, dx_3\, h(x_1, x_2)\, h(x_2, x_3)\, h(x_3, x_1). \qquad (2.14)$$

For an element $\tilde{h}$ of the quotient space $\tilde{W}$ we define the edge and triangle density by $T_1(\tilde{h}) = T_1(h)$ and $T_2(\tilde{h}) = T_2(h)$, where $h$ is any representative element of the equivalence class $\tilde{h}$.

2.3 Subgraph counts

Label the simple graphs in any order, e.g., $F_1$ is an edge, $F_2$ is a wedge, $F_3$ is a triangle, etc. Let $C_k(G)$ denote the number of subgraphs $F_k$ in $G$. In the dense regime, $C_k(G)$ grows like $n^{V_k}$, where $V_k = |V(F_k)|$ is the number of vertices in $F_k$. For $m \in \mathbb{N}$, consider the following scaled vector-valued function on $\mathcal{G}_n$:

$$\vec{C}(G) := \left( \frac{p(F_k)\, C_k(G)}{n^{V_k - 2}} \right)_{k=1}^{m} = n^2 \left( \frac{p(F_k)\, C_k(G)}{n^{V_k}} \right)_{k=1}^{m}. \qquad (2.15)$$

The term $p(F_k)$ counts the edge-preserving permutations of the vertices of $F_k$, i.e., $p(F_1) = 2$ for an edge, $p(F_2) = 2$ for a wedge, $p(F_3) = 6$ for a triangle, etc. The term $C_k(G)/n^{V_k}$ represents a subgraph density in the graph $G$. The additional $n^2$ guarantees that the full vector scales like $n^2$, the scaling of the large deviation principle for graphons in the Erdős-Rényi random graph derived in [9]. For a simple graph $F_k$, let $\hom(F_k, G)$ be the number of homomorphisms from $F_k$ to $G$, and define the homomorphism density as

$$t(F_k, G) := \frac{\hom(F_k, G)}{n^{V_k}} = \frac{p(F_k)\, C_k(G)}{n^{V_k}}, \qquad (2.16)$$

which does not distinguish between permutations of the vertices. Hence the Hamiltonian becomes

$$H(\vec{\theta}, \vec{T}(G)) = n^2 \sum_{k=1}^{m} \theta_k\, t(F_k, G) = n^2\, (\vec{\theta} \cdot \vec{T}(G)), \qquad G \in \mathcal{G}_n, \qquad (2.17)$$

where

$$\vec{T}(G) := \big( t(F_k, G) \big)_{k=1}^{m}. \qquad (2.18)$$

The canonical ensemble with parameter $\vec{\theta}$ thus takes the form

$$P_{\mathrm{can}}(G \mid \vec{\theta}) := e^{n^2 \left[ \vec{\theta} \cdot \vec{T}(G) - \psi_n(\vec{\theta}) \right]}, \qquad G \in \mathcal{G}_n, \qquad (2.19)$$

where $\psi_n$ replaces the partition function $Z(\vec{\theta})$:

$$\psi_n(\vec{\theta}) := \frac{1}{n^2} \log \sum_{G \in \mathcal{G}_n} e^{n^2 (\vec{\theta} \cdot \vec{T}(G))} = \frac{1}{n^2} \log Z(\vec{\theta}). \qquad (2.20)$$

In the sequel we take $\vec{\theta}$ equal to a specific value $\vec{\theta}^*$, so as to meet the soft constraint, i.e.,

$$\langle \vec{T} \rangle = \sum_{G \in \mathcal{G}_n} \vec{T}(G)\, P_{\mathrm{can}}(G) = \vec{T}^*. \qquad (2.21)$$

The canonical probability then becomes

$$P_{\mathrm{can}}(G) = P_{\mathrm{can}}(G \mid \vec{\theta}^*). \qquad (2.22)$$
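As a small illustration of the homomorphism densities that enter the Hamiltonian, the sketch below computes the edge and triangle homomorphism densities of a finite graph directly from its adjacency matrix. These coincide with the edge and triangle densities of the step-function graphon of the graph. The helper names (`edge_density`, `triangle_density`, `complete_graph`) are hypothetical, and the brute-force triple sum is only meant for small examples.

```python
from itertools import product


def edge_density(adj):
    """hom(edge, G) / n^2: ordered pairs (i, j) joined by an edge."""
    n = len(adj)
    return sum(adj[i][j] for i in range(n) for j in range(n)) / n**2


def triangle_density(adj):
    """hom(triangle, G) / n^3: ordered triples (i, j, k) forming a triangle."""
    n = len(adj)
    total = sum(
        adj[i][j] * adj[j][k] * adj[k][i]
        for i, j, k in product(range(n), repeat=3)
    )
    return total / n**3


def complete_graph(n):
    """Adjacency matrix of the complete graph on n vertices (no self-loops)."""
    return [[1 if i != j else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    k4 = complete_graph(4)
    # K4 has 6 edges and 4 triangles, so p(edge) * 6 / 4^2 and p(triangle) * 4 / 4^3.
    print(edge_density(k4), triangle_density(k4))
```

For the complete graph on four vertices this reproduces the identity between homomorphism counts and subgraph counts weighted by the number of edge-preserving permutations: 2 * 6 / 16 for edges and 6 * 4 / 64 for triangles.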

Both the constraint $\vec{T}^*$ and the Lagrange multiplier $\vec{\theta}^*$ in general depend on $n$, i.e., $\vec{T}^* = \vec{T}^*_n$ and $\vec{\theta}^* = \vec{\theta}^*_n$. We consider constraints that converge when we pass to the limit $n \to \infty$, i.e.,

$$\lim_{n\to\infty} \vec{T}^*_n = \vec{T}^*_\infty. \qquad (2.23)$$

Consequently, we expect that

$$\lim_{n\to\infty} \vec{\theta}^*_n = \vec{\theta}^*_\infty. \qquad (2.24)$$

Throughout the sequel we assume that (2.24) holds. If convergence fails, then we may still consider subsequential convergence. The subtleties concerning (2.24) are discussed in [5, Appendix A].

2.4 Variational characterisation of ensemble equivalence

The expression in (2.17) can be written in terms of graphons as

$$H(\vec{\theta}, \vec{T}(G)) = n^2 \sum_{k=1}^{m} \theta_k\, t(F_k, h^G). \qquad (2.25)$$

With this scaling the hard constraint $\vec{T}^*$ has the interpretation of the density of an observable quantity in $G$, and defines a subspace of the quotient space $\tilde{W}$, which we denote by $\tilde{W}^*$, and which consists of all graphons that meet the hard constraint, i.e.,

$$\tilde{W}^* := \{ \tilde{h} \in \tilde{W} \colon \vec{T}(h) = \vec{T}^*_\infty \}. \qquad (2.26)$$

The soft constraint in the canonical ensemble becomes $\langle \vec{T} \rangle = \vec{T}^*$ (recall (2.5)). Recall that for $n \in \mathbb{N}$ we write $\vec{\theta}^*$ for $\vec{\theta}^*_n$.

In order to characterise the asymptotic behaviour of the two ensembles, the entropy function of a Bernoulli random variable is essential. For $u \in [0,1]$ we define

$$I(u) := \tfrac{1}{2}\, u \log u + \tfrac{1}{2}\, (1-u) \log(1-u). \qquad (2.27)$$

We extend the domain of this function to the graphon space $W$ by defining

$$I(h) := \int_{[0,1]^2} dx\, dy\, I(h(x,y)) \qquad (2.28)$$

(with the convention that $0 \log 0 = 0$). On the quotient space $(\tilde{W}, \delta_\square)$ we define $I(\tilde{h}) = I(h)$, where $h$ is any element of the equivalence class $\tilde{h}$. In order to keep the notation minimal we use $I(\cdot)$ for both (2.27) and (2.28). Depending on the argument of the function it will be clear which of the two is considered. The key result in [5] is the following variational formula for $s_\infty$.

Theorem 2.3 [5] Subject to (2.21), (2.23) and (2.24),

$$\lim_{n\to\infty} n^{-2}\, S_n(P_{\mathrm{mic}} \mid P_{\mathrm{can}}) =: s_\infty \qquad (2.29)$$

with

$$s_\infty = \sup_{\tilde{h} \in \tilde{W}} \left[ \vec{\theta}^*_\infty \cdot \vec{T}(\tilde{h}) - I(\tilde{h}) \right] - \sup_{\tilde{h} \in \tilde{W}^*} \left[ \vec{\theta}^*_\infty \cdot \vec{T}(\tilde{h}) - I(\tilde{h}) \right]. \qquad (2.30)$$

Theorem 2.3 and the compactness of $\tilde{W}^*$ give us a variational characterisation of ensemble equivalence: $s_\infty = 0$ if and only if at least one of the maximisers of $\vec{\theta}^*_\infty \cdot \vec{T}(\tilde{h}) - I(\tilde{h})$ in $\tilde{W}$ also lies in $\tilde{W}^* \subset \tilde{W}$. Equivalently, $s_\infty = 0$ when at least one of the maximisers of $\vec{\theta}^*_\infty \cdot \vec{T}(\tilde{h}) - I(\tilde{h})$ satisfies the hard constraint.

Theorem 2.3 allows us to identify examples where ensemble equivalence holds ($s_\infty = 0$) or is broken ($s_\infty > 0$). In [5] a detailed analysis was given for the special case where the constraint is on the total number of edges and the total number of triangles. The analysis in [5] relied on the large deviation principle for dense Erdős-Rényi random graphs established in [9]. The function defined in (2.27) plays a crucial role and is related to the rate function of the large deviation principle.
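The variational characterisation can be checked numerically in the simplest situation, on the Erdős-Rényi line. The sketch below is an illustration under stated assumptions: it takes the triangle multiplier equal to zero, so that the functional reduces to the integral of theta_1 h(x,y) - I(h(x,y)) and is maximised pointwise by a constant graphon, and it assumes the factor 1/2 in the Bernoulli entropy function I. With theta_1 = I'(T_1), the maximising constant is T_1 itself, which automatically has triangle density T_1^3 and therefore satisfies the hard constraint. The function names are hypothetical.

```python
import math


def I(u: float) -> float:
    """Bernoulli entropy function with 0 log 0 = 0 (factor 1/2 is an assumption)."""
    def xlogx(x: float) -> float:
        return 0.0 if x == 0.0 else x * math.log(x)
    return 0.5 * (xlogx(u) + xlogx(1.0 - u))


def I_prime(u: float) -> float:
    """Derivative of I on (0, 1)."""
    return 0.5 * math.log(u / (1.0 - u))


def argmax_constant_graphon(theta1: float, grid: int = 10001) -> float:
    """Grid search for the constant u in [0, 1] maximising theta1 * u - I(u)."""
    best_u, best_val = 0.0, -math.inf
    for k in range(grid):
        u = k / (grid - 1)
        val = theta1 * u - I(u)
        if val > best_val:
            best_u, best_val = u, val
    return best_u


if __name__ == "__main__":
    T1 = 0.3
    theta1 = I_prime(T1)               # multiplier tuned to edge density T1
    u_star = argmax_constant_graphon(theta1)
    print(u_star, u_star**3)           # maximiser and its triangle density
```

Because the unconstrained maximiser already meets the constraint pair (T_1, T_1^3), the two suprema in the variational formula coincide and the specific relative entropy is zero on the Erdős-Rényi line.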

7 Theorem.4 [5] For the edge-triangle model, s = 0 when T = T 3, 0 < T and T = 0, while s > 0 when T T 3 and T 8, T T 3, 0 < T and 0 < T < 8, (T, T ) lies on the scallopy curve in Fig.. Here, T, T are in fact the limits T,, T, in (.3), but in order to keep the notation light we now also suppress the index. (0,) (,) s = 0 s > 0 s > 0 triangle density T T = T 3 T = T 3 s =? (0, 8 ) (0,0) (,0) edge density T T = T (T ) (,0) Figure : The admissible edge-triangle density region is the region on and between the blue curves [8]. Theorem.4 is illustrated in Fig.. The region on and between the blue curves corresponds to the choices of (T, T ) that are graphical, i.e., there exists a graph with edge density T and triangle density T. The red curves represent ensemble equivalence, the blue curves and the grey region represent breaking of ensemble equivalence, while in the white region between the red curve and the lower blue curve we do not know what happens. Breaking of ensemble equivalence arises from frustration between the values of T and T. The lower blue curve, called the scallopy curve, consist of infinitely many pieces labelled by l N\{}. The l-th piece corresponds to T ( l l, l l+ ] and a T that is a computable but non-trivial function of T (see [5], [7], [8]). The structure of the graphs drawn from the microcanonical ensemble was determined in [5] and [8]: the vertex set can be partitioned into l subsets, the first l subsets have size c l n, the last subset has size between c l n and c l n, where [ ] c l := l+ + l+ l T [ l+, l ). (.3) The graph has the form of a complete l-partite graph, with some additional edges on the last subset 7

8 that create no triangles within that last subset. The optimal graphons have the form, k < l: x < kc l < y or y < kc l < x, gl (x, y) := p l, (l )c l < x < [ + (l )c l] < y or (l )c l < y < [ + (l )c] < x, 0, otherwise, (.3) where p l = 4c l( lc l ) (0, ]. (.33) ( (l )c l ) Figure 3 plots c l and p l as a function of T for l N. An illustration of the graphon limit of such a graph is given below in Figure Figure 3: For l N: c l (left) and p l (right) as a function of T. y (l ) c l g l (x,y) = p l on gl (x,y) = on 4 c l gl (x,y) = 0 else 3 c l c l c l c l c l 3 c l 4 c l (l ) c l x Figure 4: Graphon g l, for l N and T ( l l, l l+ ]. 8

9 3 Theorems In this section we present our results, which address the following two issues: In Theorems we identify the scaling behaviour of s for fixed T and T T 3, respectively, T T 3. It turns out that the way in which s tends to zero differs in the two cases. In Propositions we characterise some possible asymptotic structures of random graphs drawn from the microcanonical ensemble when the hard constraint is on the edge and triangle density. Our results indicate that the structure of the graphs differs for T T 3, respectively, T T 3. In the sequel we make the following two assumptions: Assumption Fix the edge density T (0, ) and consider the triangle density T 3 + ɛ, for some ɛ either positive or negative. For this pair of constraints we consider the Lagrange multipliers θ (ɛ) := (θ(ɛ), θ(ɛ)) as defined in Section. (see also the follow up discussion in Section.3). Then, for ɛ sufficiently small, we have the representation sup h W [ θ (ɛ)t ( h) + θ (ɛ)t ( h) I( h) where θ := θ (0), γ = θ (0) and γ = θ (0). ] = θ T I(T ) + (γ T + γ T 3 )ɛ + O(ɛ ), (3.) In Section 4. we show that Assumption is true when T [, ). For T (0, ) we can prove (3.4) and (3.5) below, but with replacing the equality. If Assumption is true, then we again obtain (3.4) and (3.5) with equality. If it fails, then we have strict inequality. Assumption Fix the edge density T (0, ) and consider the triangle density T 3 + ɛ, for some ɛ either positive or negative. For this pair of constraints we consider the microcanonical entropy J(ɛ) := sup{ I( h) : h W, T ( h) = T, T ( h) = T 3 + ɛ}. (3.) Then for ɛ sufficiently small the solution of (3.), denoted by h ɛ, has the following form h ɛ = T + g ɛ, where g ɛ = g I I + g (I J) (J I) + g J J, (3.3) with g, g, g [ T, T ] and I, J [0, ]. Assumption is based on the following intuitive argument. Suppose we want to maximise the microcanonical entropy among all piecewise constant graphons. 
Then we expect the entropy to decrease when we add more structure, i.e. more steps, in the graphon. A piecewise constant graphon with m steps corresponds to a random graph where the vertices are divided into m groups, and within each group we make an ER random graph with some probability. We expect that the microcanonical entropy will decrease as m increases. This statement is also supported by extensive numerical experiments performed in [9]. The methodology we rely on in order to analyse the variational problem in (3.) does not always identify the exact optimal graphon. It identifies a candidate optimal graphon, which is sufficient in some cases, for the scaling behaviour of the relative entropy s. We call these graphons balance optimal. Informally speaking, a balance optimal graphon is obtained when solving the optimisation problem in (3.) in a smaller class of graphons than the whole class of graphons that satisfy the hard constraint. This is the class of graphons satisfying the conditions in Assumption such that the values g, g, g all correspond to contributions of the same order. The precise definition of a balance optimal graphon is given in Section 5. We want to investigate in this chapter whether the global maximizer of (3.) lies in this smaller class of graphons. We show that balance optimisers have specific structural properties. 9

10 But, for the case of a perturbation upwards, the unique optimal graphon does not lie in this class, and this happens because λ(i) gets very small as ɛ 0 while g stays constant. We refer the reader to [0]. For the case of a perturbation downwards the exact structure of the unique optimal graphon is still not known: the only results we are aware of come from an extensive numerical study [7]. From this numerical study it seems that, at least for T (0, T ), with T 0.44, the unique global optimiser is indeed a balance optimal graphon. In this chapter we investigate this question further by identifying the balance optimal graphons and comparing them with the results established numerically in [7]. Balance optimal graphons are candidate optimisers of J(ɛ). In what follows, because all the graphons we derive are balance optimal graphons, we simply speak of optimal graphons. When at some point a clear distinction is needed we say so. Another important feature is that balance optimal graphons are in general not unique. In the following sections we construct various balance optimal graphons, showing the different structures that can emerge. The variational problem J(ɛ) in (3.) has been solved in [0] for the case T > T 3, while the case T < T 3 still remains unsolved. In this chapter we consider only a small perturbation around the typical values, but the advantage of our method is that it is simpler and yields more intuition about the way the constraint is attained. Moreover, it also applies for the case ɛ < 0, which has not been rigorously analysed before. In [7] the authors identify the maximizers of the microcanonical entropy numerically. The optimal graphons obtained numerically in [7] agree structurally with the balance optimal graphons that we find. Theorem 3. For T (0, ) and T, Theorem 3. For T (0, ], lim ɛ s (T, T 3 + 3T ɛ) = ɛ 0 6 T T log T (0, ). (3.4) lim sup ɛ 0 ɛ /3 s (T, T 3 T 3 ɛ) 4 T T (0, ). 
(3.5) Theorem 3.3 For T (, ), lim sup ɛ /3 s (T, T 3 T 3 ɛ) f(t, T ) (0, ), (3.6) ɛ 0 where T ( T, 0) is the unique point where the function x f(t, x), defined by attains its global minimum. f(t, x) := T I(T + x) I(T ) I (T )x x, x ( T, 0), (3.7) We illustrate these results in Figure 5. In the left panel we plot the limits in the right-hand side of (3.5) (3.6) as a function of T. In the right panel we plot s (T, T 3 + ɛ) as a function of ɛ, for ɛ sufficiently small, and for four different values of T. Remark 3.4 We believe, and there is numerical evidence in [7], that the results in (3.5) and (3.6) hold with equality and that the corresponding limits exist. Remark 3.5 From [8, Theorem.] we also have that for some constant c = c(ɛ) > 0. s (T, T 3 T 3 ɛ) cɛ /3, (3.8) In Propositions below we identify the structure of balance optimal graphons corresponding to the perturbed constraints in the microcanonical ensemble in the limit as n. 0

11 * * * * Figure 5: Limit of scaled s as a function of ɛ for ɛ sufficiently small. Proposition 3.6 When the ER-line is approached from above, a balance optimal graphon is given by h = T + ɛ g + O(ɛ) (global perturbation) (3.9) with g given by, (x, y) [0, ], g (x, y) = 0, (x, y) [0, ] (, ] (, ] [0, ],, (x, y) (, ]. (3.0) It is important to mention that the balance optimal graphon determined in Proposition 3.6 is not unique in the sense that there are multiple graphons that yield the same entropy value. From Proposition 3.6 we also see it is possible that the class of balance optimisers does not contain the actual unique optimiser of J(ɛ). For this pair of constraints and from [0] we have that the actual unique optimiser, denoted by h ɛ, is given by h, (x, y) [0, λɛ], h ɛ (x, y) = T + h ɛ, (x, y) [0, λɛ] (λɛ, ] (λɛ, ] [0, λɛ], (3.) T + h ɛ, (x, y) (λɛ, ], where λ := ( T, h ) := h, h := T. (3.) The term h solves the equation I (h ) = 3I ( T ) and is constant as ɛ 0. For details on this issue we refer to [0]. As mentioned above, balance optimal graphons have the structural property that g, g, g all contribute equally to the constraint. This is not the case for the graphon in (3.0), because only g and g contribute to the constraint, to leading order. The exact computations are provided in Section 5. From (3.0) and (3.) we observe that balance optimal graphons can have structures very different from the optimal graphons. Proposition 3.7 When the ER-line is approached from below and T (0, ], a balance optimal graphon is given by h = T + ɛ /3 g + O(ɛ /3 ) (global perturbation) (3.3)

12 with g given by T, (x, y) [0, ], g (x, y) = T, (x, y) [0, ] (, ] (, ] [0, ], T, (x, y) (, ]. This g is not unique, in the sense that there are multiple graphons that are balance optimal. (3.4) Proposition 3.8 When the ER-line is approached from below and T (, ), the unique balance optimal graphon is given by h = T + gɛ (local perturbation) (3.5) with g ɛ defined by gɛ (x, y) := with T ( T, 0) defined in Theorem 3.3. T ɛ T /3, (x, y) [0, T T ɛ/3 ] T ɛ /3, (x, y) [0, T T ɛ/3 ] [ T T ɛ/3, ] or (x, y) [ T T ɛ/3, ] [0, T T ɛ/3 ], T, (x, y) [ T T ɛ/3, ], (3.6) In conclusion, Theorems say that at a fixed density of the edges it is less costly in terms of relative entropy to increase the density of triangles than to decrease it. The ER-line represents a crossover in the cost (see Figure 5, right panel). Above the ER-line the cost is linear in the distance, below the ER-line the cost is proportional to the 3-power of the distance. Propositions show that the optimal perturbation of the ER-graphon is global above the ER-line and below the ER-line when the edge density is less than and local below the ER-line when the edge density is larger than. 4 Proofs of Theorems In this section we prove Theorems Along the way we use the results given in Propositions , which we prove in Section Proof of Theorem 3. For ease of notation we drop the superscript from the constraint on the edge density and write T instead of T. Let T (ɛ) = T, T (ɛ) = T 3 + 3T ɛ. (4.) The factor 3T appearing in front of the ɛ is put in for convenience. We know that for every pair of graphical constraints (T (ɛ), T (ɛ)) there exists a unique pair of Lagrange multipliers (θ (ɛ), θ (ɛ)) corresponding to these constraints. For an elaborate discussion on this issue we refer the reader to [?]. By considering the Taylor expansion of the Lagrange multipliers (θ (ɛ), θ (ɛ)) around ɛ = 0, we obtain where θ (ɛ) = θ + γ ɛ + Γ ɛ + O(ɛ 3 ), θ (ɛ) = γ ɛ + Γ ɛ + O(ɛ 3 ), (4.) 
θ (0) = θ = I (T ), γ = θ (0), Γ = θ (0), θ (0) = 0, γ = θ (0), Γ = θ (0). (4.3) We denote the two terms in the expression for s in (.30) by I, I, i.e., s = sup h W [ θ T ( h) I( h) ] sup h W [ θ T ( h) I( h) ] = I I, (4.4) and we let s (ɛ) denote the relative entropy corresponding to the perturbed constraints. We distinguish between the cases T [, ) and T (0, ).

13 Case I T [, ): From [?, Section 5], if T [, ) and T [ 8, ), then the corresponding Lagrange multipliers (θ, θ ) are both non-negative. Hence from [7, Theorem 4.] we have that [ ] [ I := sup θ (ɛ)t ( h) + θ (ɛ)t ( h) I( h) = sup θ (ɛ)u + θ (ɛ)u I(u) ], (4.5) h W 0 u and, consequently, I = sup 0<u< [ θ (ɛ)u + θ (ɛ)u 3 I(u) ] = θ (ɛ)u (ɛ) + θ (ɛ)u (ɛ) 3 I(u (ɛ)). (4.6) The optimiser u (ɛ) corresponding to the perturbed multipliers θ (ɛ) and θ (ɛ) is analytic in ɛ, as shown in [9]. Therefore, a Taylor expansion around ɛ = 0 gives where δ = u (0) and = u (0). Hence I can be written as Moreover, u (ɛ) = T + δɛ + ɛ + O(ɛ 3 ), (4.7) I = θ T I(T ) + (γ T + γ T 3 )ɛ + O(ɛ ). (4.8) I = [ θ + γ ɛ + Γ ɛ + O(ɛ 3 ) ] T + [ γ ɛ + Γ ɛ + O(ɛ 3 ) ] (T 3 + 3T ɛ) inf h W ɛ I( h) (4.9) where Consequently, =θ T + γ T ɛ + Γ T ɛ + T 3 γ ɛ + Γ T 3 ɛ + 3T γ ɛ J (ɛ) + O(ɛ 3 ), J (ɛ) := h () ɛ inf I( h), W ɛ := { h W : T ( h) = T, T ( h) = T 3 + 3T ɛ}. (4.0) h W ɛ s (T, T 3 + 3T ɛ) = J (ɛ) I(T ) + O(ɛ ). (4.) Denote by one of the, possibly multiple, balance optimisers of the variational problem J (ɛ). () From Proposition 3.6 we know that, for ɛ sufficiently small, any graphon in the equivalence class h ɛ, denoted by h () ɛ, has the form h () ɛ = T + ɛg + O(ɛ) where the graphon g was defined in (3.0). By considering the Taylor expansion of the function I around ɛ = 0, we get I(h () ɛ ) = I(T ) + I (T ) ɛ dx dy g (x, y) [0,] + I (T ) ɛ dx dy g (x, y) + o(ɛ) [0,] = I(T ) + I (T ) ɛ dx dy g (x, y) + o(ɛ) [0,] = I(T ) + I (T )ɛ + o(ɛ) = I(T ) + T ( T ) ɛ + o(ɛ). But, from (3.), a straightforward computation of the entropy of h ɛ shows that (4.) J 6 (ɛ) = I(T ) + T T log T ɛ + o(ɛ). (4.3) Hence we obtain that the global optimiser is not a balance optimiser and that s (T, T 3 + 3T ɛ) = 6 T T log T ɛ + o(ɛ). (4.4) 3

14 Case II T (0, ): Consider the term I := sup h W [ θ (ɛ)t ( h) + θ (ɛ)t ( h) I( h) ], as above. If Assumption applies, then this case is proved in the same way as Case I. Otherwise, consider the following lower bound [ ] [ θ (ɛ)t ( h) + θ (ɛ)t ( h) I( h) sup θ (ɛ)u + θ (ɛ)u 3 I(u) ]. (4.5) 0 u sup h W The arguments used in Case I after (4.6) apply, and the result in (4.) is obtained with an inequality instead of an equality. 4. Proof of Theorem 3. In this section we omit the computations that are similar to those in the proof of Theorem 3. in Section??. Let T (ɛ) = T, T (ɛ) = T 3 T 3 ɛ. (4.6) The factor T 3 appearing in front of the ɛ is put in for convenience in the computations. The perturbed Lagrange multipliers are where θ (ɛ) = θ + γ ɛ + Γ ɛ + O(ɛ 3 ), θ (ɛ) = γ ɛ + Γ ɛ + O(ɛ 3 ), (4.7) θ = I (T ), γ = θ (0), Γ = θ (0) γ = θ (0), Γ = θ (0). (4.8) We denote the two terms in the expression for s in (.30) by I, I, i.e., s = I I, and let s (ɛ) denote the perturbed relative entropy. The computations for I are similar as before, because the exact form of the constraint does not affect the expansions in (4.7) and (4.8). For I, on the other hand, we have I = θ T + γ T ɛ + Γ T ɛ + T 3 γ ɛ + Γ T 3 ɛ T 3 γ ɛ J (ɛ) = θ T + γ T ɛ + T 3 γ ɛ J (ɛ) + (4.9) O(ɛ ), where Consequently, J (ɛ) := inf I( h), W ɛ := { h W : T ( h) = T, T ( h) = T h W 3 T 3 ɛ}. (4.0) ɛ s (T, T T 3 ɛ) = J (ɛ) I(T ) + O(ɛ ). (4.) Denote by h ɛ one of the, possibly multiple, optimisers of the variational problem J (ɛ). From Proposition 3.7 we know that, for T (0, ], a balance optimal graphon in the equivalence class h ɛ, denoted by h ɛ for simplicity in the notation, has the form h ɛ = T + ɛ /3 g + O(ɛ /3 ) (4.) with g given by Hence which gives T, (x, y) [0, ], g (x, y) = T, (x, y) [0, ] (, ] (, ] [0, ], T, (x, y) (, ]. (4.3) J (ɛ) I(T ) + T I (T )ɛ /3 I(T ) + 4 T ɛ /3, (4.4) T s (T, T T 3 ɛ) 4 T ɛ /3 + o(ɛ /3 ). (4.5) T 4
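The global perturbation used in this proof can be checked numerically on a two-block step graphon. The sketch below rests on several assumptions that are not recoverable with certainty from the text as extracted: the sign pattern of the perturbation (value T_1 - T_1 eps^{1/3} on the two diagonal blocks and T_1 + T_1 eps^{1/3} on the off-diagonal blocks, with blocks of equal size), the factor 1/2 in the entropy function I, and the constant (1/4) T_1/(1 - T_1) in front of eps^{2/3}. Under these assumptions the edge density stays equal to T_1, the triangle density equals T_1^3 - T_1^3 eps exactly, and the entropy cost matches the stated order. All function names are hypothetical.

```python
import math


def I(u: float) -> float:
    """Bernoulli entropy function with 0 log 0 = 0 (factor 1/2 is an assumption)."""
    def xlogx(x: float) -> float:
        return 0.0 if x == 0.0 else x * math.log(x)
    return 0.5 * (xlogx(u) + xlogx(1.0 - u))


def block_graphon_stats(B, w):
    """Edge density, triangle density and entropy of a step graphon
    with symmetric block values B and block weights w (summing to 1)."""
    r = range(len(w))
    t1 = sum(w[i] * w[j] * B[i][j] for i in r for j in r)
    t2 = sum(
        w[i] * w[j] * w[k] * B[i][j] * B[j][k] * B[k][i]
        for i in r for j in r for k in r
    )
    ent = sum(w[i] * w[j] * I(B[i][j]) for i in r for j in r)
    return t1, t2, ent


def bipartite_perturbation(T1: float, eps: float):
    """Two equal blocks: lowered density inside blocks, raised density across
    (assumed sign pattern, chosen because it lowers the triangle density)."""
    a = T1 * eps ** (1.0 / 3.0)
    return [[T1 - a, T1 + a], [T1 + a, T1 - a]], [0.5, 0.5]


if __name__ == "__main__":
    T1, eps = 0.3, 1e-6
    B, w = bipartite_perturbation(T1, eps)
    t1, t2, ent = block_graphon_stats(B, w)
    cost = (ent - I(T1)) / eps ** (2.0 / 3.0)
    print(t1, t2, T1**3 * (1 - eps))
    print(cost, 0.25 * T1 / (1 - T1))
```

The exactness of the triangle density follows from the perturbation having zero row averages, so that only its cubic term contributes; the entropy cost comes from the second-order term of the Taylor expansion of I around T_1.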

15 4.3 Proof of Theorem 3.3 The computations leading to the expression for the relative entropy in the right-hand side of (3.6) are similar as those in Section 4., and we omit them. Hence we have where, for T (, ), s (T, T T 3 ɛ) = J (ɛ) I(T ) + O(ɛ ), (4.6) J (ɛ) := inf I( h), W ɛ := { h W : T ( h) = T, T ( h) = T h W 3 T 3 ɛ}. (4.7) ɛ Denote by h ɛ one of the, possibly multiple, optimisers of the variational problem J (ɛ). From Proposition 3.8 we know that, for T (, ), a balance optimal graphon in the equivalence class h ɛ, denoted by h ɛ for simplicity in the notation, has the form h ɛ = T + g ɛ (4.8) with g ɛ given by gɛ (x, y) := T ɛ T /3, (x, y) [0, T T ɛ/3 ] T ɛ /3, (x, y) [0, T T ɛ/3 ] [ T T ɛ/3, ] or (x, y) [ T T ɛ/3, ] [0, T T ɛ/3 ], T, (x, y) [ T T ɛ/3, ]. (4.9) The term T ( T, 0) is defined in Theorem 3.3. Hence we have s (T, T T 3 ɛ) f(t, T )ɛ /3 + o(ɛ /3 ), (4.30) where T ( T, 0) is the unique point where the global minimum of the function x f(t, x), defined by f(t, x) := T I(T + x) I(T ) I (T )x x, x ( T, 0). (4.3) We need to show that, for every T (0, ) and for every x ( T, 0), f(t, x) > 0 or equivalently that I(T + x) I(T ) I (T )x > 0. (4.3) From the mean-value theorem we have that there exists ξ (T +x, T ) such that I (T +x) I(T ) = I (ξ)x. Hence we have that f(t, x) = (I (ξ) I (T ))x > 0, (4.33) which follows because I is an increasing function, x ( T, 0) and ξ (T + x, T ). More detailed arguments are provided in the following section. 5 Proofs of Propositions In this section we prove Propositions In Section 5. we prove Proposition 3.6 and in Section 5. we prove Propositions 3.7 and 3.8. The proof of Proposition 3.8 is similar to the proof of Proposition 3.7, only computations are different. In Section 4 the following variational problems were encountered: () For T (0, ), () For T (0, ], J (ɛ) = inf { I( h): h W, T ( h) = T, T ( h) = T 3 + 3T ɛ }. (5.) J (ɛ) = inf { I( h): h W, T ( h) = T, T ( h) = T 3 T 3 ɛ }. (5.) 5

(3) For $T_1 \in (\tfrac12, 1)$,

$$J(\epsilon) = \inf\{ I(\tilde h) : \tilde h \in \tilde W,\ T_1(\tilde h) = T_1,\ T_2(\tilde h) = T_1^3 - T_1^3\epsilon \}. \tag{5.3}$$

In order to prove Propositions 3.6–3.8, we need to analyse these three variational problems for $\epsilon$ sufficiently small, which is the objective of this section. The variational formula in (5.1) has been rigorously analysed in [0], and hence we study the variational formulas in (5.2) and (5.3), under the assumption that the optimiser lies in the class of balance optimal graphons. We remind the reader that we suppose Assumption to be true. We analyse the variational formulas with the help of a perturbation argument. In particular, we show that the balance optimal perturbations are those given in (3.9), (3.3) and (3.5), respectively. The results in Propositions 3.7 and 3.8 follow directly from the following two lemmas.

Lemma 5.1 Let $T_1 \in (0,\tfrac12]$. For $\epsilon > 0$ consider the variational formula for $J(\epsilon)$ given in (5.2). Then, for $\epsilon$ sufficiently small,

$$J(\epsilon) \le I(T_1) + \frac14\,\frac{T_1}{1-T_1}\,\epsilon^{2/3} + o(\epsilon^{2/3}). \tag{5.4}$$

Lemma 5.2 Let $T_1 \in (\tfrac12, 1)$. For $\epsilon > 0$ consider the variational formula for $J(\epsilon)$ given in (5.3). Then, for $\epsilon$ sufficiently small,

$$J(\epsilon) \le I(T_1) + \min_{x \in (-T_1,0)} f(T_1,x)\,\epsilon^{2/3} + o(\epsilon^{2/3}), \tag{5.5}$$

where $f(T_1,x)$, $x \in (-T_1,0)$, and the unique point at which its minimum is attained were defined in Theorem 3.3.

Remark 5.3 As argued in Remark 3.4, we believe, and there is numerical evidence in [7], that the results in (5.4) and (5.5) hold with equality.

In what follows we use the notation $f(\epsilon) \asymp g(\epsilon)$, for two functions $f$, $g$, when $f(\epsilon)/g(\epsilon)$ converges to a positive constant as $\epsilon \downarrow 0$.

5.1 Proof of Proposition 3.6

In this section we prove Proposition 3.6 given that Assumption holds. In order to find the optimal perturbation when the ER-line is approached from above, we need to solve $J(\epsilon)$ in (5.1). The following construction shows intuitively why balance optimal perturbations have the form given in (3.9). Consider an inhomogeneous ER-random graph on $n$ vertices. We split the vertices of the graph into two parts of equal size, i.e. of size $n/2$.
In one part we connect two vertices with probability $T_1 + 2\sqrt{\epsilon}$, in the other part we connect two vertices with probability $T_1 - 2\sqrt{\epsilon}$, and we connect vertices lying in different parts with probability $T_1$. This graph has expected edge density equal to

$$\frac{1}{\binom{n}{2}} \left( T_1 \Big(\frac{n}{2}\Big)^2 + (T_1 + 2\sqrt{\epsilon})\binom{n/2}{2} + (T_1 - 2\sqrt{\epsilon})\binom{n/2}{2} \right) = T_1. \tag{5.6}$$

Similarly, the expected triangle density is equal to

$$\frac{1}{\binom{n}{3}} \left( \binom{n/2}{3}\Big[(T_1 + 2\sqrt{\epsilon})^3 + (T_1 - 2\sqrt{\epsilon})^3\Big] + \frac{n}{2}\binom{n/2}{2}\Big[T_1^2(T_1 + 2\sqrt{\epsilon}) + T_1^2(T_1 - 2\sqrt{\epsilon})\Big] \right) = T_1^3 + \frac{n-4}{n-1}\,3T_1\epsilon \approx T_1^3 + 3T_1\epsilon,$$

for $n$ large. Below, when we speak of an optimal perturbation we mean balance optimal. In the proof below we will see that the optimal perturbation is indeed given by the graphon counterpart of the inhomogeneous ER-random graph described above. We now proceed to the technical details of the proof.

We consider the variational formula $J(\epsilon)$, with $\epsilon > 0$, given in (5.1). We denote by $\tilde h^*_\epsilon$ one of the, possibly multiple, optimisers of $J(\epsilon)$. For simplicity in the notation, in what follows we work with a representative element, denoted by $h^*_\epsilon$, of the equivalence class $\tilde h^*_\epsilon$. We write the optimiser $h^*_\epsilon$ in the form $h^*_\epsilon = T_1 + H_\epsilon$ for some bounded symmetric function $H_\epsilon$ defined on the unit square $[0,1]^2$ and taking values in $\mathbb{R}$. This term will be called the perturbation term. The optimiser $h^*_\epsilon$ has to satisfy the conditions on the edge and triangle densities, i.e.,

$$T_1(h^*_\epsilon) = T_1, \qquad T_2(h^*_\epsilon) = T_1^3 + 3T_1\epsilon. \tag{5.7}$$

Hence the perturbation term $H_\epsilon$ needs to satisfy the constraints

$$(G_1):\quad \int_{[0,1]^2} dx\,dy\; H_\epsilon(x,y) = 0 \tag{5.8}$$

and

$$(G_2):\quad 3T_1 \int_{[0,1]^3} dx\,dy\,dz\; H_\epsilon(x,y)H_\epsilon(y,z) + \int_{[0,1]^3} dx\,dy\,dz\; H_\epsilon(x,y)H_\epsilon(y,z)H_\epsilon(z,x) = 3T_1\epsilon. \tag{5.9}$$

In what follows we prove the result stated in Proposition 3.6, i.e., the optimal perturbation is a three-step function and is of the order $\sqrt{\epsilon}$. In Assumption it is stated that it suffices to restrict to graphons that can be written in the form $T_1 + H_\epsilon$, where $H_\epsilon$ is a bounded symmetric function defined on $[0,1]^2$, taking three non-zero values. In what follows, for simplicity in the computations and without loss of generality, we suppose that the optimal graphon has the form

$$H_\epsilon = g_{11}\,1_{I \times I} + g_{12}\,1_{(I \times J) \cup (J \times I)} + g_{22}\,1_{J \times J}. \tag{5.10}$$

Then $(G_1)$ above becomes

$$\lambda(I)^2 g_{11} + 2\lambda(I)(1-\lambda(I))\,g_{12} + (1-\lambda(I))^2 g_{22} = 0, \tag{5.11}$$

and the first integral in $(G_2)$ becomes

$$\int_{[0,1]^3} dx\,dy\,dz\; H_\epsilon(x,y)H_\epsilon(y,z) = \lambda(I)^3 g_{11}^2 + 2\lambda(I)^2(1-\lambda(I))\,g_{11}g_{12} + 2\lambda(I)(1-\lambda(I))^2 g_{12}g_{22} + \lambda(I)(1-\lambda(I))\,g_{12}^2 + (1-\lambda(I))^3 g_{22}^2, \tag{5.12}$$

and a similar expression can be computed for the second integral in $(G_2)$.

We now give the formal definition of a balance optimal graphon:

Definition 5.4 For $T_1 \in (0,1)$, a graphon $T_1 + H_\epsilon$, $\epsilon > 0$, is called balanced if it has the structure given in (5.10) and the terms $\lambda(I)^2 g_{11}$, $2\lambda(I)(1-\lambda(I))\,g_{12}$ and $(1-\lambda(I))^2 g_{22}$ are all of the same order when $\epsilon$ is sufficiently small.
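The closed form of the first integral in (G2) for a three-valued step perturbation can be verified by integrating out the outer variables first. The derivation below is a sketch added for illustration, not part of the original proof. It assumes $J = [0,1] \setminus I$, writes $g_{11}$, $g_{12}$, $g_{22}$ for the values of the perturbation $H$ on $I \times I$, on the mixed rectangles and on $J \times J$, and introduces the shorthand $a := \lambda(I)$, $b := 1 - \lambda(I)$ and the row sums $r_I$, $r_J$.

```latex
\begin{align*}
\int_{[0,1]^3} H(x,y)\,H(y,z)\,dx\,dy\,dz
  &= \int_0^1 \Big( \int_0^1 H(x,y)\,dx \Big)^2 dy
   \qquad \text{(by symmetry of } H\text{)} \\
  &= a\,r_I^2 + b\,r_J^2,
   \qquad r_I := a\,g_{11} + b\,g_{12}, \quad r_J := a\,g_{12} + b\,g_{22}, \\
  &= a^3 g_{11}^2 + 2a^2 b\,g_{11}g_{12} + 2ab^2 g_{12}g_{22}
     + ab\,(a+b)\,g_{12}^2 + b^3 g_{22}^2 .
\end{align*}
```

Since $a + b = 1$, the coefficient of $g_{12}^2$ reduces to $\lambda(I)(1-\lambda(I))$.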

18 Definition 5.5 For ɛ > 0 a graphon h ɛ is called balance optimal if it solves the following optimisation problem: J bal (ɛ) := inf{i( h), h W, h is balanced, T ( h) = T, T ( h) = T 3 + 3T ɛ}. (5.4) It is straightforward to observe that, for ɛ > 0, J bal (ɛ) J(ɛ). (5.5) In what follows we essentially determine J bal (ɛ) for ɛ sufficiently small. We distinguish two cases, first g = 0 and then g 0. Case g = 0: The values of g + and g are such so that T + H ɛ () satisfies the conditions in (5.8) and (5.9). We proceed with the condition in (5.9). A standard computation yields dx dy dz H ɛ () (x, y) H ɛ () (y, z) = λ(i) 3 g+ + λ(j) 3 g (5.6) [0,] 3 and [0,] 3 dx dy dz H () ɛ (x, y) H () ɛ (y, z) H () ɛ (z, x) = λ(i) 3 g λ(j) 3 g 3. (5.7) From (5.8) we obtain the first order condition Using the condition in (5.8), we get that (5.9) equals λ(i) g + + λ(j) g = 0. (5.8) g λ(j) 3 3T λ(i) (λ(j) + λ(i)) λ(j) 3 g3 λ(i) 3 (λ(i)3 λ(j) 3 ) = 3T ɛ + o(ɛ). (5.9) There are multiple ways in which the condition in (5.9) can be met. We show that the lowest possible value of the function I is attained when g + ɛ, g ɛ and λ(i), λ(j) are constant. To that end we distinguish the following cases: (I) (II) g λ(j) 3 3T (λ(j) + λ(i)) ɛ, λ(i) g3 which splits into three sub-cases: (Ia) (Ib) (Ic) (d) g + ɛ /, g ɛ /, g + ɛ /+δ/3, g ɛ / δ, g + ɛ / 3δ, g ɛ /+δ, λ(j) 3 λ(i) 3 (λ(i)3 λ(j) 3 ) = o(ɛ), (5.0) λ(j) λ(i). (5.) λ(j) 3 λ(i) ɛδ, δ (0, ). (5.) λ(j) 3 λ(i) ɛ δ, δ (0, 6 ). (5.3) g + ɛ /3, g = ḡ ( T, 0), λ(j) ɛ /3. (5.4) g 3T λ(j) 3 λ(i) (λ(j) + λ(i)) ɛ+δ, g λ(i) ɛ δ, δ > 0. (5.5) A simple calculation shows that, in all five cases above, λ(i) + λ(j) and λ(i) 3 λ(j) 3, and hence we can omit these two factors from the analysis below. In what follows we exclude cases (Ib), (Ic) and (II) one by one by comparing them to graphons of the type given in case (Ia). 8

19 Case (Ib): We show that, for ɛ > 0 sufficiently small, graphons having the structure indicated in (Ia) yield smaller values of the function I than graphons with the structure in (Ib). We consider two graphons, denoted by T + g and T + ĝ, where g is as in Case (Ia) and ĝ is as in Case (Ib). Before giving the technical details of the proof, we present a heuristic argument why I(T + g ) < I(T + ĝ ). In what follows we will denote by B(p) a Bernoulli random variable with parameter p. The function I(x), x [0, ], defined in (??) represents the entropy of a B(x) random variable with parameter x. On the graphon space the function I(h), h W, defined in (??), can be seen as the expectation of the entropy of a Bernoulli random variable with a random parameter (the expectation is with respect to the random parameter), i.e., B(h(X, Y )) with (X, Y ) a uniformly distributed random variable on [0, ]. For h W we have I(h) = dx dy [ I(h(x, y))] = E[ I(h(X, Y ))]. (5.6) [0,] Hence we have the following equivalence I(T + g ) < I(T + ĝ ) E[ I(T + g (X, Y ))] > E[ I(T + ĝ (X, Y ))], (5.7) where (X, Y ) is a uniformly distributed random vector on [0, ]. Instead of working with entropy, it is intuitively simpler to work with the relative entropy with respect to the random variable B( ). The relative entropy is defined by Note that I (x) := x log x + ( x) log x, x [0, ]. (5.8) E[ I(T + g (X, Y ))] > E[ I(T + ĝ (X, Y ))] E[I (T + g (X, Y ))] < E[I (T + ĝ (X, Y ))]. (5.9) We first give an intuitive argument and afterwards prove that E[I (T + g (X, Y ))] < E[I (T + ĝ (X, Y ))]. (5.30) We distinguish between the cases T (0, ] and T (, ). The case T (0, ] follows by using similar arguments as in case T (, ). We treat in detail only the case T (, ). The relative entropy of a random variable with respect to B( ) is zero if and only if that random variable is equal to B( ). 
So, in order to compare the relative entropies in (5.30), we need to see how close the Bernoulli random variables with random parameters T + g (X, Y ) and T + ĝ (X, Y ) are to B( ). We are considering the case T >. Hence the random variables B(T + g (X, Y )) and B(T + ĝ (X, Y )) will be close to B( ) when the random parameters T + g (X, Y ) and T + ĝ (X, Y ) are close to. This is the case when g (X, Y ) and ĝ (X, Y ) are negative. These events occur with probabilities P(T + g (X, Y ) < T ) = P(g (X, Y ) < 0) = P(g (X, Y ) = g ) = λ(j), (5.3) because of the properties of the graphon in Case (Ia). Similarly, we have that P(T + ĝ (X, Y ) < T ) = P(ĝ (X, Y ) < 0) = P(ĝ (X, Y ) = g ) = λ(ĵ) ɛ 4δ/3, (5.3) for some δ (0, ], because of the properties of the graphon in Case (Ib). Hence we see that the random variable B(T + g (X, Y )) is closer to the random variable B( ) with much higher probability than the random variable B(T + ĝ (X, Y )). We can see this by computing the corresponding expectations, E(g (X, Y ) g (X, Y ) = g ) P(g (X, Y ) = g ) = g P(g (X, Y ) = g ) ɛ /, (5.33) 9

20 while E(ĝ (X, Y ) ĝ (X, Y ) = ĝ ) P(ĝ (X, Y ) = ĝ ) = ĝ P(ĝ (X, Y ) = ĝ ) ɛ / δ ɛ 4δ/3 = ɛ /+δ/3. In what follows we complete this argument by adding the technical details. We work out the expressions in the left-hand and right-hand sides of (5.30). The expression in the right-hand side of (5.30) can be written as E[I (T + g (X, Y ))] = LI (T + g + ) + KI (T + g ) + ( L K)I (T ), (5.34) for some constants L := P(g (X, Y ) = g + ) and K = P(g (X, Y ) = g ) independent of ɛ. Similarly, E[I (T + ĝ (X, Y ))]=λ(î) I (T + ĝ + ) + ɛ 4δ/3 I (T + ĝ ) + ( λ(î) ɛ 4δ/3 )I (T ), (5.35) where λ(î) = P(ĝ (X, Y ) = ĝ + ) and P(ĝ (X, Y ) = ĝ ) ɛ 4δ/3. Moreover, we recall that from the properties of the graphons in Case (Ia) and Case (Ib) we get g + ɛ, g ɛ, ĝ + ɛ /+δ/3, ĝ ɛ / δ, δ (0, ]. (5.36) Hence, for T (, ] and ɛ sufficiently small, because of (5.36), we obtain the following inequalities: I (T + g + ) > I (T + ĝ + ) > I (T + g ) > I (T + ĝ ). (5.37) Using a Taylor expansion of the function I around T and the first order conditions Lg + + Kg = 0 and λ(î) ĝ + + λ(ĵ) ĝ = 0, (5.38) we observe that (5.34) and (5.35) read E[I (T + g (X, Y ))] = I (T ) + I (T )(Lg+ + Kg ) + o ( g+ + g ) and (5.39) E[I (T + ĝ (X, Y ))] = I (T ) + I (T )(λ(î )ĝ+ + λ(ĵ) ĝ ) ( ) + o λ(î )ĝ+ + λ(ĵ) ĝ. (5.40) Using (5.36), we observe that Lg + + Kg ɛ and Hence, for ɛ sufficiently small, [ ] E I (T + g (X, Y )) which proves (5.30). λ(î )ĝ + + λ(ĵ) ĝ ɛ +δ/3 + ɛ 4/3δ ɛ δ ɛ δ/3. (5.4) [ ] < E I (T + ĝ (X, Y )), (5.4) Similar arguments can be used for the case T (0, ) to show that graphons having the structure as in Case (Ic), yield larger values of I for ɛ sufficiently small. We omit the details. Case (d): In this case we have that the optimal graphon is constant on a subset of the unit square with a size tending to zero as ɛ 0. 
Such a graphon yields I(T + g ) = λ(i) I(T + g + ) + ( λ(i))( λ(j))i(t ) + λ(j) I(T + g ) = λ(i) (I(T ) + I (T )g + + o(ɛ /3 )) + ( λ(i))( λ(j))i(t ) + λ(j) I(T + g ) = I(T ) λ(j) I(T ) λ(j) ḡi (T ) + λ(j) I(T + ḡ) = I(T ) + ɛ /3 (I(T + ḡ) ḡi (T ) I(T )) + o(ɛ /3 ). (5.43) 0

21 The second equality follows by considering a Taylor expansion around ɛ = 0 in the terms that go to zero as ɛ 0, i.e, g +. In the third equality we use (5.8). What remains is to show that I(T + ḡ) I(T ) ḡi (T ) > 0, (5.44) for ḡ ( T, 0). From the mean-value theorem we have that I(T + ḡ) I(T ) = I (ξ)ḡ for some ξ (T + ḡ, T ). Since ḡ < 0 and I is a convex function, i.e. I is an increasing function, we have that I (ξ) < I (T ). This proves the claim above. From (5.43) we observe that graphons having the form as in Case (d) yield larger values of I, for ɛ sufficiently small, than graphons as in Case (a). Case (II): This case is simpler to exclude than the ones above. Indeed, suppose that (5.5) holds. Then either λ(i) should become small or g should become large. But g ɛ δ is not possible because g should stay bounded in ( T, 0) as ɛ 0. Hence the only possibility is λ(i) ɛ η and g ɛ ζ for some η, ζ such that ζ η = δ, because of the second condition in (5.5). From the first condition in (5.5) we have that ζ η = + δ. Solving these two equations we obtain that η = 3 + δ and ζ = 3 + δ. From (5.8) we then get that g + ɛ δ, which is not possible because g + should stay bounded in (0, T ) as ɛ 0. At this point we summarise our findings. We considered the variational formula J (ɛ) given in (5.) and we assumed that we can restrict ourselves to piece-wise constant graphons (see Assumption ) subject to the constraints in (5.8) and (5.9). Afterwards, without loss of generality, we restricted ourselves to an even smaller class of graphons, those of the form g = g + I I + g J J (5.45) for some g + > 0, g < 0 and I, J [0, ] with λ(i) +λ(j). At the end of this section we elaborate on the case g 0. More specifically, we have shown that the optimal perturbation satisfies g + ɛ /, g ɛ / and λ(i), λ(j). Hence the solution to J (ɛ) has the form T + g ɛ + o(ɛ), where g = g + L L + g K K, for some g + > 0, g < 0, L, K (0, ) independent of ɛ, is a symmetric function defined on [0, ]. 
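From the constraints (5.8) and (5.9) we have that $g_+ L^2 = -g_- K^2$ and $L^3 g_+^2 + K^3 g_-^2 = 1$. A simple calculation shows that

$$I(T_1 + g\sqrt{\epsilon}) = I(T_1) + I'(T_1)\big(L^2 g_+ + K^2 g_-\big)\sqrt{\epsilon} + \tfrac12 I''(T_1)\big(L^2 g_+^2 + K^2 g_-^2\big)\epsilon + o(\epsilon) = I(T_1) + \tfrac12 I''(T_1)\big(L^2 g_+^2 + K^2 g_-^2\big)\epsilon + o(\epsilon).$$

Hence, in order to find the optimal graphon we need to solve the following optimisation problem:

$$\min\ \big(L^2 g_+^2 + K^2 g_-^2\big) \quad \text{such that} \quad L + K \le 1, \quad g_+ L^2 + g_- K^2 = 0, \quad L^3 g_+^2 + K^3 g_-^2 = 1. \tag{5.46}$$

This is equivalent to

$$\min\ \Big(\frac{K}{L} + \frac{L}{K}\Big)\frac{1}{K+L} \quad \text{such that} \quad L + K \le 1. \tag{5.47}$$

From a standard computation we find that the optimal $K$, $L$ should satisfy $K + L = 1$. Hence we need to minimise $\frac{1 - 2L + 2L^2}{L(1-L)}$. This function is convex in $L \in (0,1)$ and attains a unique minimum at the point $L = \tfrac12$. Having computed $L$, $K$ we find $g_+ = -g_- = 2$, and so the optimal solution to $J(\epsilon)$, for $\epsilon$ sufficiently small, is the graphon

$$h^*_\epsilon(x,y) = \begin{cases} T_1 + 2\sqrt{\epsilon}, & \text{if } (x,y) \in [0,\tfrac12]^2, \\ T_1, & \text{if } (x,y) \in [0,\tfrac12] \times (\tfrac12,1] \text{ or } (\tfrac12,1] \times [0,\tfrac12], \\ T_1 - 2\sqrt{\epsilon}, & \text{if } (x,y) \in (\tfrac12,1]^2. \end{cases} \tag{5.48}$$

A standard computation shows that $T_1(h^*_\epsilon) = T_1$ and $T_2(h^*_\epsilon) = T_1^3 + 3T_1\epsilon$.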
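The final claims of this argument can be checked numerically. The sketch below is not part of the original proof. It assumes the two-block graphon takes the values $T_1 \pm 2\sqrt{\epsilon}$ on the diagonal blocks and $T_1$ elsewhere, that $I(u) = \tfrac12[u\log u + (1-u)\log(1-u)]$, and that the reduced objective on $K + L = 1$ is $(1 - 2L + 2L^2)/(L(1-L))$. The values of `T1` and `eps` and the function names are illustrative choices.

```python
import math
from itertools import product


def entropy_density(u):
    """I(u) = (1/2)[u log u + (1-u) log(1-u)] (convention assumed here)."""
    return 0.5 * (u * math.log(u) + (1.0 - u) * math.log(1.0 - u))


def block_graphon_stats(weights, values):
    """Edge density, triangle density and I(h) of a step-function graphon."""
    idx = range(len(weights))
    edge = sum(weights[i] * weights[j] * values[i][j]
               for i, j in product(idx, repeat=2))
    triangle = sum(weights[i] * weights[j] * weights[k]
                   * values[i][j] * values[j][k] * values[k][i]
                   for i, j, k in product(idx, repeat=3))
    cost = sum(weights[i] * weights[j] * entropy_density(values[i][j])
               for i, j in product(idx, repeat=2))
    return edge, triangle, cost


def reduced_objective(L):
    """Objective of the reduced problem on the line K = 1 - L."""
    return (1.0 - 2.0 * L + 2.0 * L * L) / (L * (1.0 - L))


# Grid search over L in (0, 1); the minimiser should be L = 1/2.
grid = [i / 1000 for i in range(1, 1000)]
best_L = min(grid, key=reduced_objective)

T1, eps = 0.3, 1e-4             # illustrative values
a = 2.0 * math.sqrt(eps)        # perturbation size 2*sqrt(eps)
weights = [0.5, 0.5]
values = [[T1 + a, T1],
          [T1, T1 - a]]

edge, triangle, cost = block_graphon_stats(weights, values)
excess = cost - entropy_density(T1)          # I(h) - I(T1)
predicted = eps / (2.0 * T1 * (1.0 - T1))    # I''(T1) * eps
print(best_L, reduced_objective(best_L), edge, triangle, excess, predicted)
```

The grid minimiser is $L = \tfrac12$ with objective value 2, which corresponds to an entropy excess of $\tfrac12 I''(T_1) \cdot 2\epsilon = I''(T_1)\epsilon$; the computed excess matches this up to terms of order $\epsilon^2$.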

Case $g_{12} \neq 0$: By following similar arguments as for the case $g_{12} = 0$, we can show that the optimal values of $g_{11}$, $g_{12}$, $g_{22}$, $K$ and $L$ can be retrieved by solving the following optimisation problem:

$$\min\ \big(L^2 g_{11}^2 + K^2 g_{22}^2 + 2LK\,g_{12}^2\big) \quad \text{such that} \quad L + K = 1, \quad L^2 g_{11} + K^2 g_{22} + 2LK\,g_{12} = 0, \quad L^3 g_{11}^2 + K^3 g_{22}^2 + 2L^2K\,g_{11}g_{12} + 2LK^2 g_{12}g_{22} + LK\,g_{12}^2 = 1. \tag{5.49}$$

Suppose first that $L = K = \tfrac12$. Then we have the following optimisation problem:

$$\min\ \tfrac14\big(g_{11}^2 + g_{22}^2 + 2g_{12}^2\big) \quad \text{such that} \quad g_{11} + g_{22} + 2g_{12} = 0, \quad g_{11}^2 + g_{22}^2 + 2g_{11}g_{12} + 2g_{12}g_{22} + 2g_{12}^2 = 8.$$

Introducing Lagrange multipliers, we obtain the solution $g_{12} = 0$ and $g_{11} = -g_{22} = 2$. For arbitrary $L$, $K$, substituting $g_{12} = -\tfrac12\big(\tfrac{L}{1-L}\,g_{11} + \tfrac{1-L}{L}\,g_{22}\big)$ into (5.49), and differentiating the Lagrangian, we obtain $g_{12} = 0$. We observe at this point that this argument holds only for the case where $g_{11}$, $g_{12}$ and $g_{22}$ go to zero as $\epsilon \downarrow 0$. This is not the case for the actual optimal graphon in (3.).

Case g 0 and g = 0: From (5.44) we observe that g = 0 yields an equality. Hence in this case the microcanonical entropy will be of the order ɛ instead of ɛ /3. From the first-order constraint in (5.8) we obtain g = λ ( λ) g, (5.50) where λ := λ(i). Then the second order constraint reads g λ( λ) = ɛ. (5.5) 4 ( λ) λ Following similar arguments as before, we can show that the case g ɛ δ, λ ɛ /3 δ/3, g ɛ /3+δ/ is not optimal. The case g or g are constant, independently of ɛ, is also not optimal, since if one of them is constant then the entropy cost will be ɛ /3 instead of ɛ. A standard computation yields I(T + g ) = I(T ) + ( I (T ) + 4 λ ) ɛ + o(ɛ), (5.5) λ while for the graphon defined in (5.48) we have I(h ɛ ) = I(T ) + I (T )ɛ + o(ɛ). (5.53) Hence we see that I(T + g ) > I(T + h ɛ ) if and only if λ is constant and independent of ɛ. If λ ɛ δ, then further analysis is needed in order to establish the optimal graphon. In any case, the graphon h ɛ is balance optimal, as desired.
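For the symmetric sub-case $L = K = \tfrac12$ of the case with a non-zero off-diagonal value, the reduced problem can also be solved by hand without Lagrange multipliers. The following is a sketch under the assumption that the reduced constraints are $g_{11} + g_{22} + 2g_{12} = 0$ and $g_{11}^2 + g_{22}^2 + 2g_{11}g_{12} + 2g_{12}g_{22} + 2g_{12}^2 = 8$, with objective $\tfrac14(g_{11}^2 + g_{22}^2 + 2g_{12}^2)$; the variables $s$ and $d$ are introduced here for illustration only.

```latex
\begin{align*}
s &:= g_{11} + g_{22} = -2g_{12}, \qquad d := g_{11} - g_{22}, \\
g_{11}^2 + g_{22}^2 + 2g_{12}(g_{11} + g_{22}) + 2g_{12}^2
  &= \tfrac12(s^2 + d^2) - s^2 + \tfrac12 s^2 = \tfrac12 d^2 = 8
  \;\Longrightarrow\; d = \pm 4, \\
\tfrac14\big(g_{11}^2 + g_{22}^2 + 2g_{12}^2\big)
  &= \tfrac14\Big(\tfrac12(s^2 + d^2) + \tfrac12 s^2\Big) = \tfrac14 s^2 + 2 .
\end{align*}
```

The objective is therefore minimised at $s = 0$, i.e. $g_{12} = 0$ and, up to relabelling the two blocks, $g_{11} = -g_{22} = 2$, with minimal value 2.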
