Asymptotics and Extremal Properties of the Edge-Triangle Exponential Random Graph Model

Similar documents
Ground states for exponential random graphs

Phase transitions in the edge-triangle exponential random graph model

The large deviation principle for the Erdős-Rényi random graph

arxiv: v1 [stat.ml] 30 Dec 2008

EMERGENT STRUCTURES IN LARGE NETWORKS

arxiv: v2 [math.pr] 4 Jan 2018

Multipodal Structure and Phase Transitions in Large Constrained Graphs

arxiv: v3 [math.pr] 6 Apr 2011

Consistency Under Sampling of Exponential Random Graph Models

Introduction to Real Analysis Alternative Chapter 1

Graph Limits: Some Open Problems

ESTIMATING AND UNDERSTANDING EXPONENTIAL RANDOM GRAPH MODELS

Course 212: Academic Year Section 1: Metric Spaces

Characterizing extremal limits

arxiv: v2 [math.co] 11 Mar 2008

Cauchy s Theorem (rigorous) In this lecture, we will study a rigorous proof of Cauchy s Theorem. We start by considering the case of a triangle.

Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations

Topological properties

Undecidability of Linear Inequalities Between Graph Homomorphism Densities

Algorithmic approaches to fitting ERG models

Discrete Markov Random Fields the Inference story. Pradeep Ravikumar

arxiv: v2 [math.st] 21 Nov 2017

Problem Set 6: Solutions Math 201A: Fall a n x n,

Graph limits Graph convergence Approximate asymptotic properties of large graphs Extremal combinatorics/computer science : flag algebra method, proper

Problem Set 2: Solutions Math 201A: Fall 2016

arxiv: v2 [stat.ot] 10 Nov 2011

Moment Measures. Bo az Klartag. Tel Aviv University. Talk at the asymptotic geometric analysis seminar. Tel Aviv, May 2013

McGill University Math 354: Honors Analysis 3

14 : Theory of Variational Inference: Inner and Outer Approximation

A Parametric Simplex Algorithm for Linear Vector Optimization Problems

Szemerédi s regularity lemma revisited. Lewis Memorial Lecture March 14, Terence Tao (UCLA)

Quasi-conformal minimal Lagrangian diffeomorphisms of the

Maximum Likelihood Estimation in Loglinear Models

Discrete Geometry. Problem 1. Austin Mohr. April 26, 2012

On the number of cycles in a graph with restricted cycle lengths

Functional Analysis Exercise Class

Chapter 4: Asymptotic Properties of the MLE

Normal Fans of Polyhedral Convex Sets

Lecture 4: Graph Limits and Graphons

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Submodularity in Machine Learning

A Graphon-based Framework for Modeling Large Networks

Probabilistic Graphical Models

A brief introduction to Conditional Random Fields

14 : Theory of Variational Inference: Inner and Outer Approximation

In class midterm Exam - Answer key

l(y j ) = 0 for all y j (1)

Fourth Week: Lectures 10-12

L p Spaces and Convexity

Constrained Optimization and Lagrangian Duality

Elusive problems in extremal graph theory

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi

Limit shapes outside the percolation cone

1. Let A R be a nonempty set that is bounded from above, and let a be the least upper bound of A. Show that there exists a sequence {a n } n N

Consistency of the maximum likelihood estimator for general hidden Markov models

A Potpourri of Nonlinear Algebra

Hard-Core Model on Random Graphs

Mean-field dual of cooperative reproduction

Generalized Exponential Random Graph Models: Inference for Weighted Graphs

On the Logarithmic Calculus and Sidorenko s Conjecture

Hyperbolic metric spaces and their applications to large complex real world networks II

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016

A LARGE DEVIATION PRINCIPLE FOR THE ERDŐS RÉNYI UNIFORM RANDOM GRAPH

Probabilistic Graphical Models

Applications of the Lopsided Lovász Local Lemma Regarding Hypergraphs

Computing Maximum Likelihood Estimates in Log-Linear Models

Steiner s formula and large deviations theory

Metric Spaces Lecture 17

Some Background Math Notes on Limsups, Sets, and Convexity

TEICHMÜLLER SPACE MATTHEW WOOLF

The Lopsided Lovász Local Lemma

Parameter estimators of sparse random intersection graphs with thinned communities

The number of edge colorings with no monochromatic cliques

Combinatorial Optimization

Permutations with fixed pattern densities

Random Geometric Graphs

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

arxiv: v2 [math.pr] 22 Oct 2018

Statistical Inference on Large Contingency Tables: Convergence, Testability, Stability. COMPSTAT 2010 Paris, August 23, 2010

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed

Probabilistic Graphical Models. Theory of Variational Inference: Inner and Outer Approximation. Lecture 15, March 4, 2013

P-adic Functions - Part 1

1. Principle of large deviations

Subhypergraph counts in extremal and random hypergraphs and the fractional q-independence

A Variational Principle for Permutations

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem

Combinatorial Types of Tropical Eigenvector

Entropy, mixing, and independence

We are going to discuss what it means for a sequence to converge in three stages: First, we define what it means for a sequence to converge to zero

Mathematics 530. Practice Problems. n + 1 }

On John type ellipsoids

A NOTE ON THE TURÁN FUNCTION OF EVEN CYCLES

Mathematical Methods for Physics and Engineering

Large deviations for random projections of l p balls

Multipodal Structure and Phase Transitions in Large Constrained Graphs

Mid Term-1 : Practice problems

Part III. 10 Topological Space Basics. Topological Spaces

Conditional Marginalization for Exponential Random Graph Models

Behaviour of Lipschitz functions on negligible sets. Non-differentiability in R. Outline

Lecture 6: Gaussian Channels. Copyright G. Caire (Sample Lectures) 157

Transcription:

Asymptotics and Extremal Properties of the Edge-Triangle Exponential Random Graph Model Alessandro Rinaldo Carnegie Mellon University joint work with Mei Yin, Sukhada Fadnavis, Stephen E. Fienberg and Yi Zhou May 20, 2014 Alegbraic Statistics 2014 Illinois Institute of Technology A. Rinaldo Asymptotics and Extremal Properties of the ET Model 1/36

Outline Exponential random graphs models (ERGMs) for network data. Degeneracy and Geometry of Discrete exponential families. Asymptotic Geometry of the Edge-Triangle Model. Asymptotic Extremal Properties of the Edge Triangle Model. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 2/36

Outline Exponential random graphs models (ERGMs) for network data. Degeneracy and Geometry of Discrete exponential families. Asymptotic Geometry of the Edge-Triangle Model. Asymptotic Extremal Properties of the Edge Triangle Model. Asymptotic quantization of exponential random graphs (2103) Yin, M., Rinaldo, A. and Fadnavis, S. http://arxiv.org/abs/1311.1738 A. Rinaldo Asymptotics and Extremal Properties of the ET Model 2/36

Statistical Network (Random Graph) Analysis Let G n be the set of simple graphs on n nodes. Thus, G n = 2 (n 2). A. Rinaldo Asymptotics and Extremal Properties of the ET Model 3/36

Statistical Network (Random Graph) Analysis Let G n be the set of simple graphs on n nodes. Thus, G n = 2 (n 2). The nodes represent the units of some (sub)population of interest. The edges of any x G n encode a set of static relationships among population units. Source: Chung and Lu (2006), Complex Graphs and Networks. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 3/36

Statistical Network (Random Graph) Analysis Let G n be the set of simple graphs on n nodes. Thus, G n = 2 (n 2). The nodes represent the units of some (sub)population of interest. The edges of any x G n encode a set of static relationships among population units. Source: Chung and Lu (2006), Complex Graphs and Networks. Statistical Network Analysis Construct interpretable and realistic statistical models for G n by modeling the random occurrence of edges. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 3/36

Exponential Random Graph Models (ERGMs) Exponential random graph models arise by specifying a set of informative network statistics on G n x t(g n) = (t 1 (G n),..., t d (G n)) R d, such that the probability of observing G n is a function of t(g n) only. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 4/36

Exponential Random Graph Models (ERGMs) Exponential random graph models arise by specifying a set of informative network statistics on G n x t(g n) = (t 1 (G n),..., t d (G n)) R d, such that the probability of observing G n is a function of t(g n) only. Examples: the number of edges E(G n) (Erdös-Renyi model); number of triangles T (G n); the number of k-stars; the degree sequence (the β-model) or degree distribution. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 4/36

Exponential Random Graph Models (ERGMs) Exponential random graph models arise by specifying a set of informative network statistics on G n x t(g n) = (t 1 (G n),..., t d (G n)) R d, such that the probability of observing G n is a function of t(g n) only. Examples: the number of edges E(G n) (Erdös-Renyi model); number of triangles T (G n); the number of k-stars; the degree sequence (the β-model) or degree distribution....or any combinations of statistics. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 4/36

This talk: the Edge-Triangle (ET) Model ET Model: the network statistics are t(g n) = (E(G n), T (G n)) N 2. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 5/36

This talk: the Edge-Triangle (ET) Model ET Model: the network statistics are t(g n) = (E(G n), T (G n)) N 2. Example: n = 9. On G 9, there are 2 36 distinct graphs but only 444 distinct network statistics. 90 80 70 Number of triangles 60 50 40 30 20 10 0 0 5 10 15 20 25 30 35 40 Number of edges A. Rinaldo Asymptotics and Extremal Properties of the ET Model 5/36

Exponential Random Graph Models (ERGMs) Exponential family representation Given the choice t of network statistics, construct the exponential family of probability distributions {Q θ, θ Θ} on G n such that, for a given parameter θ Θ, the probability of observing x is Q θ (G n) = exp { θ, t(g n) ψ(θ)}, A. Rinaldo Asymptotics and Extremal Properties of the ET Model 6/36

Exponential Random Graph Models (ERGMs) Exponential family representation Given the choice t of network statistics, construct the exponential family of probability distributions {Q θ, θ Θ} on G n such that, for a given parameter θ Θ, the probability of observing x is where ψ(θ) = log Q θ (G n) = exp { θ, t(g n) ψ(θ)}, ( x Gn e θ,t(gn) ) the log-partition function - a (often intractable) normalizing term; Θ = {θ R d : e ψ(θ) < } = R d is the natural parameter space. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 6/36

Exponential Random Graph Models (ERGMs) Exponential family representation Given the choice t of network statistics, construct the exponential family of probability distributions {Q θ, θ Θ} on G n such that, for a given parameter θ Θ, the probability of observing x is where ψ(θ) = log Q θ (G n) = exp { θ, t(g n) ψ(θ)}, ( x Gn e θ,t(gn) ) the log-partition function - a (often intractable) normalizing term; Θ = {θ R d : e ψ(θ) < } = R d is the natural parameter space. Two observations: It may be invariant with respect to relabeling of the vertices. Redundancy: if t(g n) = t(g n), then Q θ (G n) = Q θ (G n), for all θ. For the ET example, the median number of graphs corresponding to a network statistic is 2, 741, 130! A. Rinaldo Asymptotics and Extremal Properties of the ET Model 6/36

Sufficiency Principle Let T n = {t : t = t(g n), G n G n} R d and ν(t) = {G n G n : t(g n) = t}. Consider instead the family of probability distributions {P θ, θ Θ} on T n such that, for a given parameter θ Θ, the probability of observing t is with the same Θ and ψ(θ). P θ (t) = exp { θ, t ψ(θ)} ν(t), A. Rinaldo Asymptotics and Extremal Properties of the ET Model 7/36

Sufficiency Principle 90 80 70 60 50 40 30 20 10 0 0 5 10 15 20 25 30 35 40 Number of edges Let T n = {t : t = t(g n), G n G n} R d and ν(t) = {G n G n : t(g n) = t}. Consider instead the family of probability distributions {P θ, θ Θ} on T n such that, for a given parameter θ Θ, the probability of observing t is with the same Θ and ψ(θ). P θ (t) = exp { θ, t ψ(θ)} ν(t), In the ET example, instead of G 9 we can work with T n = Number of triangles A. Rinaldo Asymptotics and Extremal Properties of the ET Model 7/36

Statistical Inference Given one observation x G n, i.e. given t = t(g n), estimate θ (usually with the MLE); assess whether the ERG model fits the data. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 8/36

Statistical Inference Given one observation x G n, i.e. given t = t(g n), estimate θ (usually with the MLE); assess whether the ERG model fits the data. In the ET model, we want to learn 2 parameters: the edge and triangle parameters. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 8/36

ERGMs are Bad!...I mean Hard! Despite being discrete exponential families, many ERGMs are troublesome and behave extremely badly". A. Rinaldo Asymptotics and Extremal Properties of the ET Model 9/36

ERGMs are Bad!...I mean Hard! Despite being discrete exponential families, many ERGMs are troublesome and behave extremely badly". ERGMs and most network models are highly non-standard models for which traditional parametric statistics does not apply. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 9/36

ERGMs are Bad!...I mean Hard! Despite being discrete exponential families, many ERGMs are troublesome and behave extremely badly". ERGMs and most network models are highly non-standard models for which traditional parametric statistics does not apply. The asymptotic (large n) properties of most ERGMs remain unknown. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 9/36

ERGMs are Bad!...I mean Hard! Despite being discrete exponential families, many ERGMs are troublesome and behave extremely badly". ERGMs and most network models are highly non-standard models for which traditional parametric statistics does not apply. The asymptotic (large n) properties of most ERGMs remain unknown. This talk will focus on extremal asymptotics of the ET model. See Chatterjee and Diaconis (2013) for many other asymptotic results. One route to the extremal asymptotics of the ET goes through degeneracy. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 9/36

Degeneracy in ERG Models See Rinaldo, Fienberg and Zhou (2009) What is degeneracy? Quoting from Handcock (2003): A random graph model is near degenerate if the model places almost all its probability mass on a small number of graph configurations [...] e.g. empty graph, full graph, an individual graph, no 2-stars"; a degenerrate model is not able to represent a range of realistic [networks]" since only a small range of graphs [is] covered as the parameters vary"; the MLE does not exist and/or MCMCMLE fails to converge; the observed network t is very unlikely under the distribution specified by the MLE; the model misbehaves... A. Rinaldo Asymptotics and Extremal Properties of the ET Model 10/36

Degeneracy in the ET Model: Entropy Plots We capture overall degenerate behavior using Shannon s entropy function θ ( ) Pθ (t) P θ (t) log 2, θ Θ. ν(t) t T n Rationale: degeneracy occurs when the probability mass is spread over a small number of network statistics, so degenerate distributions will tend to correspond to values of θ for which the entropy function is small. Entropy plot: for each point θ R 2 in the natural parameter space, plot the entropy of the corresponding probability distribution. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 11/36

Degeneracy in the ET Model: Entropy Plots A. Rinaldo Asymptotics and Extremal Properties of the ET Model 11/36

Degeneracy in the ET Model: Entropy Plots A. Rinaldo Asymptotics and Extremal Properties of the ET Model 11/36

Basics of Discrete Exponential Families 90 80 70 60 50 40 30 20 10 0 0 5 10 15 20 25 30 35 40 Number of edges See Barndorff-Nielsen (1974) and Brown (1986) Let T be a random vector taking values in a finite set T n, for example Number of triangles The distribution of T belongs to the exponential family E = {P θ, θ Θ}, with P θ (t) = exp { θ, t ψ(θ)} ν(t), θ Θ = R d. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 12/36

Basics of Discrete Exponential Families 90 80 70 60 50 40 30 20 10 0 0 5 10 15 20 25 30 35 40 Number of edges See Barndorff-Nielsen (1974) and Brown (1986) Let T be a random vector taking values in a finite set T n, for example Number of triangles The distribution of T belongs to the exponential family E = {P θ, θ Θ}, with P θ (t) = exp { θ, t ψ(θ)} ν(t), θ Θ = R d. The set P n = convhull(t n) is called the convex support. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 12/36

Basics of Discrete Exponential Families Choose File no file selected 90 80 70 60 50 40 30 20 10 0 0 5 10 15 20 25 30 35 40 Number of edges See Barndorff-Nielsen (1974) and Brown (1986) Let T be a random vector taking values in a finite set T n, for example The distribution of T belongs to the exponential family E = {P θ, θ Θ}, with P θ (t) = exp { θ, t ψ(θ)} ν(t), θ Θ = R d. Number of triangles The set P n = convhull(t n) is called the convex support. It is a polytope. JavaView v.3.95 www.javaview.de Loading http://www.javaview.de/models/primitive/dodecahedron_demo.jvx... int(p n) = {E In the display, θ [T ], θ Θ} is precisely the set of all possible expected values use the right mouse to get help or to open the control panel. of T : mean value space. Type or browse a file from your local disk and press <upload>: upload Currently, the file formats described in data formats are supported which include JavaView's JVX, BYU, Sun's OBJ, Mathematica graphics MGS, Maple graphics MPL, STL, WRL, DXF (some formats are partially supported only). You may also upload gzip- or zip-compressed files which must have an extension like.jvx.gz or.mpl.zip. Your uploaded file will remain on the server for at most 3 hours! int(p n) and Applications: Θ are homeomorphic: we can represent the exponential family Apply any of the algorithms implemented in JavaView to your own models. using int(p n) instead of Θ: mean value parametrization. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 12/36

Convex support for the ET example. 90 Convex support 80 70 Number of triangles 60 50 40 30 20 10 0 0 5 10 15 20 25 30 35 40 Number of edges A. Rinaldo Asymptotics and Extremal Properties of the ET Model 13/36

One-to-one correspondence between Θ and int(p n). 2 Natural parameters space 90 Mean values space 1 80 0 70 1 2 3 4 5 6 7 10 0 10 20 Number of triangles 60 50 40 30 20 10 0 0 10 20 30 40 Number of edges A. Rinaldo Asymptotics and Extremal Properties of the ET Model 13/36

Extended Exponential Families: Geometric Construction For every face F of P n, construct the exponential family of distributions E F for the points in F with convex support F. Note that E F depends on dim(f ) < d parameters only. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 14/36

Extended Exponential Families: Geometric Construction For every face F of P n, construct the exponential family of distributions E F for the points in F with convex support F. Note that E F depends on dim(f ) < d parameters only. The extended exponential family is E = E {E F : F is a face of P n} 90 80 70 60 50 40 30 20 10 0 90 80 70 60 50 40 30 20 10 0 90 80 70 60 50 40 30 20 10 0 0 10 20 30 0 10 20 30 0 10 20 30 A. Rinaldo Asymptotics and Extremal Properties of the ET Model 14/36

Extended Exponential Families: Geometric Construction For every face F of P n, construct the exponential family of distributions E F for the points in F with convex support F. Note that E F depends on dim(f ) < d parameters only. The extended exponential family is E = E {E F : F is a face of P n} 90 80 70 60 50 40 30 20 10 0 90 80 70 60 50 40 30 20 10 0 90 80 70 60 50 40 30 20 10 0 0 10 20 30 0 10 20 30 0 10 20 30 Extended Exponential Family The extended exponential family is the closure of the original family. Geometrically, this corresponds to taking the closure of the mean value space, i.e. including the boundary of P n. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 14/36

Back to Degeneracy A. Rinaldo Asymptotics and Extremal Properties of the ET Model 15/36

Back to Degeneracy A. Rinaldo Asymptotics and Extremal Properties of the ET Model 15/36

Extended Exponential Families: Dual Geometric Construction Let P be a full-dimensional polytope in R d. The normal cone to a face F is the polyhedral cone { } N F = c R k : F {x P : c, x = max c, y, } y P consisting of all the linear functionals on P that are maximal over F. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 16/36

Extended Exponential Families: Dual Geometric Construction Let P be a full-dimensional polytope in R d. The normal cone to a face F is the polyhedral cone { } N F = c R k : F {x P : c, x = max c, y, } y P consisting of all the linear functionals on P that are maximal over F. The set of cones is called the normal fan of P. N (P) = {N F, F is a face of P} A. Rinaldo Asymptotics and Extremal Properties of the ET Model 16/36

Extended Exponential Families: Dual Geometric Construction Let P be a full-dimensional polytope in R d. The normal cone to a face F is the polyhedral cone { } N F = c R k : F {x P : c, x = max c, y, } y P consisting of all the linear functionals on P that are maximal over F. The set of cones is called the normal fan of P. N (P) = {N F, F is a face of P} Key properties: The (relative interiors of the) cones in N (P) partition R d. dim(n F ) = d dim(f ). A. Rinaldo Asymptotics and Extremal Properties of the ET Model 16/36

The Normal Fan for the ET Model 1 Normal Fan 100 80 0 Number of triangles 60 40 20 1 2 1 0 1 2 0 20 0 20 40 Number of edges A. Rinaldo Asymptotics and Extremal Properties of the ET Model 17/36

The Normal Fan for the ET Model 1 Normal Fan 100 80 0 Number of triangles 60 40 20 1 2 1 0 1 2 0 20 0 20 40 Number of edges A. Rinaldo Asymptotics and Extremal Properties of the ET Model 17/36

The Normal Fan for the ET Model 1 Normal Fan 100 80 0 Number of triangles 60 40 20 1 2 1 0 1 2 0 20 0 20 40 Number of edges A. Rinaldo Asymptotics and Extremal Properties of the ET Model 17/36

The Normal Fan for the ET Model 1 Normal Fan 100 80 0 Number of triangles 60 40 20 1 2 1 0 1 2 0 20 0 20 40 Number of edges A. Rinaldo Asymptotics and Extremal Properties of the ET Model 17/36

Degeneracy Explained (...Graphically) Entropy plots of the natural space and mean value spaces. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 18/36

Degeneracy Explained (...Graphically) Entropy plots of the natural space space with superimposed the normal fan and of the mean value space. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 18/36

Main Result (Colloquial Form) Pick any θ Θ, any face F of P n and any direction o 0 in the interior of the normal cone N F. For a sequence of positive numbers r i, form the sequence θ i = θ + r i o in Θ. Let µ i = E θi [T ] be the mean value parameter corresponding to θ i ; then, the sequence {µ i } is contained in the interior of P n. Then, lim i µ i is a point in the interior of F, which depends only on θ and d. Conversely, any point on the boundary of the convex support can be obtained in this way. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 19/36

Main Result (Colloquial Form) Pick any θ Θ, any face F of P n and any direction o 0 in the interior of the normal cone N F. For a sequence of positive numbers r i, form the sequence θ i = θ + r i o in Θ. Let µ i = E θi [T ] be the mean value parameter corresponding to θ i ; then, the sequence {µ i } is contained in the interior of P n. Then, lim i µ i is a point in the interior of F, which depends only on θ and d. Conversely, any point on the boundary of the convex support can be obtained in this way. Normal Fan and Extended Exponential Families The normal fan realizes geometrically the closure of the family inside the natural parameter space. See Rinaldo, Fienberg and Zhou (2009) and Geyer (2009) A. Rinaldo Asymptotics and Extremal Properties of the ET Model 19/36

Now let n... A. Rinaldo Asymptotics and Extremal Properties of the ET Model 20/36

Now let n... n = 5 A. Rinaldo Asymptotics and Extremal Properties of the ET Model 20/36

Now let n... n = 6 A. Rinaldo Asymptotics and Extremal Properties of the ET Model 20/36

Now let n... n = 7 A. Rinaldo Asymptotics and Extremal Properties of the ET Model 20/36

Now let n... n = 9 A. Rinaldo Asymptotics and Extremal Properties of the ET Model 20/36

Now let n... We are interested in the following: What happens when θ = (θ 1, θ 2 ) diverges to infinity along fixed directions and n? It is a form of asymptotic closure of the family. Interpretation is: how does the model behave when both θ and n are large? We will require the use of fairly recent results on the theory of graph limits (though see Häggström and Jonasson, 1999). A. Rinaldo Asymptotics and Extremal Properties of the ET Model 20/36

Now let n... We are interested in the following: What happens when θ = (θ 1, θ 2 ) diverges to infinity along fixed directions and n? This would correspond to taking the asymptotic closure of the ET model. Interpretation: how does the model behave when both θ and n are large? We will require the use of fairly recent results on the theory of graph limits (though see Häggström and Jonasson, 1999). A. Rinaldo Asymptotics and Extremal Properties of the ET Model 20/36

Rescaling First! For G n G n and a graph H with V (H) n vertices, the density homomorphism of H in G n is t(h, G n) = In particular, t(k 2, G n) = 2E(Gn) n 2 hom(h, Gn) n V (H). and t(k 3, G n) = 6T (Gn) n 3. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 21/36

Rescaling First! For G n G n and a graph H with V (H) n vertices, the density homomorphism of H in G n is t(h, G n) = In particular, t(k 2, G n) = 2E(Gn) n 2 hom(h, Gn) n V (H). and t(k 3, G n) = 6T (Gn) n 3. Rescaling of the ET model To avoid trivialities, rescale the ET model so that { } Q θ (G n) = exp n 2 ( θ, t(g n) ψ n(θ)), G n G n where θ = (θ 1, θ 2 ) R 2 and t(g n) = (t(k 2, G n), t(k 3, G n)) R 2. This rescaling ensures that the sufficient statistics are of the same order n 2. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 21/36

Rescaling First! For G n G n and a graph H with V (H) n vertices, the density homomorphism of H in G n is t(h, G n) = In particular, t(k 2, G n) = 2E(Gn) n 2 hom(h, Gn) n V (H). and t(k 3, G n) = 6T (Gn) n 3. Rescaling of the ET model Let T n = {t(g n), G n G n} [0, 1] 2 and let E n = {P n,θ, θ R 2 } be the exponential family for t(g n): { } P θ (t) = exp n 2 ( θ, t ψ n(θ)) ν n(t), t T n. Also let P n = convhull(t n) [0, 1] 2. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 21/36

Turán Graphs Turán graphs are fundamental in extremal graph theory and also in describing the asymptotic closure of the ET model. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 22/36

Turán Graphs Turán graphs are fundamental in extremal graph theory and also in describing the asymptotic closure of the ET model. Let T (n, r) be a Turán graph on n nodes and r classes, r = 1, 2,..., n (complete r-partite graph with partition sets differing in size by at most 1). For k = 0, 1,..., n 1, set v k,n = t(t (n, k + 1)) = [ t(k2, T (n, k + 1)) t(k 3, T (n, k + 1)) ] [0, 1] 2. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 22/36

Turán Graphs Turán graphs are fundamental in extremal graph theory and also in describing the asymptotic closure of the ET model. Let T (n, r) be a Turán graph on n nodes and r classes, r = 1, 2,..., n (complete r-partite graph with partition sets differing in size by at most 1). For k = 0, 1,..., n 1, set v k,n = t(t (n, k + 1)) = [ t(k2, T (n, k + 1)) t(k 3, T (n, k + 1)) ] [0, 1] 2. Vertices of P n are Turán Graphs (Bollobás, 1976) The vertices of P n are the points {v k,n, k = 0, 1,..., n/2 1} and v n 1,n. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 22/36

Limiting Geometry of the ET Model For the asymptotics of the ET model we need the sets (well studied in extremal graph theory) T = cl ({t(g n), G n G n, n = 1, 2,...}) and P = convhull(t ). A. Rinaldo Asymptotics and Extremal Properties of the ET Model 23/36

Limiting Geometry of the ET Model For the asymptotics of the ET model we need the sets (well studied in extremal graph theory) T = cl ({t(g n), G n G n, n = 1, 2,...}) and P = convhull(t ). The set T (Razborov, 2008) T is the closed subset of [0, 1] 2 with upper boundary curve y = x 3/2 and lower boundary given by y = 0 for 0 x 1/2 and y ( (k 1) k 2 ) ( k(k x(k + 1)) k + ) 2 k(k x(k + 1)) k 2 (k + 1) 2 for (k 1)/k x k/(k + 1), k = 2, 3,.... A. Rinaldo Asymptotics and Extremal Properties of the ET Model 23/36

Limiting Geometry of the ET Model 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 A. Rinaldo Asymptotics and Extremal Properties of the ET Model 23/36

More on T Infinite homomorphism densities of Turán graphs: For k = 0, 1,..., let ( ) k lim v k(k 1) k,n = v k :=, R 2. n k + 1 (k + 1) 2 A. Rinaldo Asymptotics and Extremal Properties of the ET Model 24/36

More on T Infinite homomorphism densities of Turán graphs: For k = 0, 1,..., let ( ) k lim v k(k 1) k,n = v k :=, R 2. n k + 1 (k + 1) 2 Think of lower boundary of T as a series of scallops" (strictly concave segments) between the v k s. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 24/36

More on T Infinite homomorphism densities of Turán graphs: For k = 0, 1,..., let ( ) k lim v k(k 1) k,n = v k :=, R 2. n k + 1 (k + 1) 2 Think of lower boundary of T as a series of scallops" (strictly concave segments) between the v k s. triangle density 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 edge density A. Rinaldo Asymptotics and Extremal Properties of the ET Model 24/36

More on T A. Rinaldo Asymptotics and Extremal Properties of the ET Model 24/36

The set P The set P = convhull(t ) P = lim n P n, with P n P for all n. The extreme points of P are {v k, k = 0, 1,...} and lim k v k = (1, 1). If n is a multiple of (k + 1)(k + 2), then v k,n = v k and v k+1,n = v k+1. For all such n, the line segment connecting v k and v k+1 intersects P only in v k and v k+1. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 25/36

The set P The set P = convhull(t ) P = lim n P n, with P n P for all n. The extreme points of P are {v k, k = 0, 1,...} and lim k v k = (1, 1). If n is a multiple of (k + 1)(k + 2), then v k,n = v k and v k+1,n = v k+1. For all such n, the line segment connecting v k and v k+1 intersects P only in v k and v k+1. Engström and Norén (2011) show a similar result using isomorphism densities, in which case P = np n. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 25/36

The set P triangle density 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 edge density A. Rinaldo Asymptotics and Extremal Properties of the ET Model 25/36

The set P triangle density 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 edge density A. Rinaldo Asymptotics and Extremal Properties of the ET Model 25/36

The Normal Fan to P Let L k the line segment joining v k and v k1 and set L 1 the segment joining v 0 = (0, 0) and (1, 1) = lim k v k. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 26/36

The Normal Fan to P Let L k the line segment joining v k and v k1 and set L 1 the segment joining v 0 = (0, 0) and (1, 1) = lim k v k. The outer normal to L k are ( 1, 1) if k = 1. o k = (0, ( 1) ) if k = 0 1, (k+1)(k+2) if k = 1, 2,... k(3k+5) We call the o k the critical directions of the model. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 26/36

The Normal Fan to P Let L k the line segment joining v k and v k1 and set L 1 the segment joining v 0 = (0, 0) and (1, 1) = lim k v k. The outer normal to L k are ( 1, 1) if k = 1. o k = (0, ( 1) ) if k = 0 1, (k+1)(k+2) if k = 1, 2,... k(3k+5) We call the o k the critical directions of the model. Finally, set C k = cone(o k 1, o k ) for k = 0, 1,.... A. Rinaldo Asymptotics and Extremal Properties of the ET Model 26/36

The Normal Fan to P -1.0-0.5 0.0 0.5 1.0-1.0-0.5 0.0 0.5 1.0 A. Rinaldo Asymptotics and Extremal Properties of the ET Model 26/36

The Normal Fan to P -1.0-0.5 0.0 0.5 1.0 triangle density 0.0 0.2 0.4 0.6 0.8 1.0-1.0-0.5 0.0 0.5 1.0 0.0 0.2 0.4 0.6 0.8 1.0 edge density A. Rinaldo Asymptotics and Extremal Properties of the ET Model 26/36

The Normal Fan to P -1.0-0.5 0.0 0.5 1.0 triangle density 0.0 0.2 0.4 0.6 0.8 1.0-1.0-0.5 0.0 0.5 1.0 0.0 0.2 0.4 0.6 0.8 1.0 edge density A. Rinaldo Asymptotics and Extremal Properties of the ET Model 26/36

Asymptotics/Extremal Properties of the ET Model: Generic Directions Recall that P θ,n is the ET probability distribution on T n corresponding to θ R 2. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 27/36

Asymptotics/Extremal Properties of the ET Model: Generic Directions Asymptotics along generic directions Pick arbitrary θ, o R 2 with o 0 and o {o K, k = 1, 0,...} and let k be such that o C k. For any 0 < ɛ < 1, there exists an n 0 = n 0 (θ, ɛ, o) > 0 such that the following holds: for every n > n 0, there exists an r 0 = r 0 (θ, ɛ, o, n) > 0 such that for all r > r 0, P θ+ro,n (v k,n ) > 1 ɛ. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 27/36

Asymptotics/Extremal Properties of the ET Model: Generic Directions Asymptotics along generic directions Pick arbitrary θ, o R 2 with o 0 and o {o K, k = 1, 0,...} and let k be such that o C k. For any 0 < ɛ < 1, there exists an n 0 = n 0 (θ, ɛ, o) > 0 such that the following holds: for every n > n 0, there exists an r 0 = r 0 (θ, ɛ, o, n) > 0 such that for all r > r 0, P θ+ro,n (v k,n ) > 1 ɛ. Choice of θ does not matter asymptotically. All points in C k yield the same asymptotics! When n is a multiple of (k + 1)(k + 2), this gives convergence in total variation (in n and r, along subsequences) to v k. For other n s, we get weak convergence. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 27/36

Asymptotics along Generic Directions: Example Directions that will give convergence to the full graph, empty graph and Turán graph with 2 classes -1.0-0.5 0.0 0.5 1.0 triangle density 0.0 0.2 0.4 0.6 0.8 1.0-1.0-0.5 0.0 0.5 1.0 0.0 0.2 0.4 0.6 0.8 1.0 edge density A. Rinaldo Asymptotics and Extremal Properties of the ET Model 28/36

Asymptotics/Extremal Properties of the ET Model: Critical Directions Technical assumption: n multiple of (k + 1)(k + 2). Most likely it can be removed. Pick a o k, k { 1, 0}. Let l k R 2 span the line through the origin parallel to L k (line segment joining v k and v k+1 ) and consider the halfspaces H + k = {x R 2 : x, l k 0} and H k = {x R 2 : x, l k < 0}. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 29/36

Asymptotics/Extremal Properties of the ET Model: Critical Directions Asymptotics along critical directions Let θ R 2, k > 1 and 0 < ɛ < 1, arbitrarily small. Then there exists an n 0 = n 0 (θ, ɛ, k) > 0 such that the following holds: for every n > n 0 and a multiple of (k + 1)(k + 2) there exists an r 0 = r 0 (β, ɛ, k, n) > 0 such that for all r > r 0, if θ H + k then P θ+ro k,n(v k+1 ) > 1 ɛ; if θ H k, then P θ+ro k,n(v k ) > 1 ɛ. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 29/36

Asymptotics/Extremal Properties of the ET Model: Critical Directions Asymptotics along critical directions Let θ R 2, k > 1 and 0 < ɛ < 1, arbitrarily small. Then there exists an n 0 = n 0 (θ, ɛ, k) > 0 such that the following holds: for every n > n 0 and a multiple of (k + 1)(k + 2) there exists an r 0 = r 0 (β, ɛ, k, n) > 0 such that for all r > r 0, if θ H + k then P θ+ro k,n(v k+1 ) > 1 ɛ; if θ H k, then P θ+ro k,n(v k ) > 1 ɛ. Asymptotics depends on whether θ, l k 0 or not. The model undergoes an asymptotic phase transition as θ traverse the boundary of H + k. This type of discontinuity only occurs when n. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 29/36

Asymptotics along Critical Directions. Example: k = 2. -1.0-0.5 0.0 0.5 1.0 triangle density 0.0 0.2 0.4 0.6 0.8 1.0-1.0-0.5 0.0 0.5 1.0 0.0 0.2 0.4 0.6 0.8 1.0 edge density A. Rinaldo Asymptotics and Extremal Properties of the ET Model 30/36

The Critical directions o 1 and o 0 The critical directions o 1 and o 0 are different. -1.0-0.5 0.0 0.5 1.0 triangle density 0.0 0.2 0.4 0.6 0.8 1.0-1.0-0.5 0.0 0.5 1.0 0.0 0.2 0.4 0.6 0.8 1.0 edge density A. Rinaldo Asymptotics and Extremal Properties of the ET Model 31/36

The Critical directions o 1 and o 0 The direction o 1 : For fixed ( n, the outer ) normal to the segment joining v 0,n and v n 1,n is o 1,n := 1,. For n and r large, the probability n n 2 P θ+ron,n, with θ = (θ 1, θ 2 ), assigns almost all of its mass to the empty graph when θ 1 + θ 2 < 0 and to the complete graph when θ 1 + θ 2 > 0, and it is uniform over v 0,n and v n 1,n when θ 1 n(n 1) + θ 2 (n 1)(n 2) = 0. The direction o 0 : For large r and n, P n,θ+ro+0 with θ = (θ 1, θ 2 ), becomes indistinguishable from the exponential family on the number of edges of triangle-free graphs with natural parameter θ 1. More refined result: replace triangle-free graphs" with subgraphs of T (n, 2)." (See Diaconis and Chatterjee, 2013.) A. Rinaldo Asymptotics and Extremal Properties of the ET Model 31/36

From Convergence in Total Variation to Convergence in Cut Metric 0.0 0.2 0.4 0.6 0.8 1.0 edge density We have shown that, when n is a multiple of (k + 2)(k + 1), P n,θ+ro converges in total variation (weakly for other n s) to point masses on the corner points" and to a distribution on the horizontal segment of the set T = triangle density 0.0 0.2 0.4 0.6 0.8 1.0 depending on o and θ, as n, r jointly. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 32/36

From Convergence in Total Variation to Convergence in Cut Metric 0.0 0.2 0.4 0.6 0.8 1.0 edge density We have shown that, when n is a multiple of (k + 2)(k + 1), P n,θ+ro converges in total variation (weakly for other n s) to point masses on the corner points" and to a distribution on the horizontal segment of the set T = triangle density 0.0 0.2 0.4 0.6 0.8 1.0 depending on o and θ, as n, r jointly. Can this be extended to some form of stochastic convergence of the sequence {G n,r } with G n,r P n,θ+ro? A. Rinaldo Asymptotics and Extremal Properties of the ET Model 32/36

From Convergence in Total Variation to Convergence in Cut Metric 0.0 0.2 0.4 0.6 0.8 1.0 edge density We have shown that, when n is a multiple of (k + 2)(k + 1), P n,θ+ro converges in total variation (weakly for other n s) to point masses on the corner points" and to a distribution on the horizontal segment of the set T = triangle density 0.0 0.2 0.4 0.6 0.8 1.0 depending on o and θ, as n, r jointly. Can this be extended to some form of stochastic convergence of the sequence {G n,r } with G n,r P n,θ+ro? Yes, but we need graphons! A. Rinaldo Asymptotics and Extremal Properties of the ET Model 32/36

From Convergence in Total Variation to Convergence in Cut Metric Let W be the pace of measurable and symmetric function from [0, 1] 2 into [0, 1], called graphons. For f W, the interval [0, 1] can be thought as a continuum of vertices, and f (x, y) = f (y, x) is the probability that x and y are connected by an edge. Any finite graph G n has a graphon representation { 1, if ( nx, ny ) is an edge in Gn, f Gn (x, y) = 0, otherwise. The cut distance between graphons f and g is d (f, g) = sup (f (x, y) g(x, y)) dxdy, S,T [0,1] S T Let W arise from W by identifying graphons that are at a cut-distance 0 and related by a measure preserving transformation. The space ( W, d ) is compact. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 33/36

From Convergence in Total Variation to Convergence in Cut Metric A sequence {G n} of possibly random graphs converges to the graphon f when lim n d (f Gn, f ) = 0. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 33/36

From Convergence in Total Variation to Convergence in Cut Metric Paraphrasing Our Results There exists sequences {n i, r i } i such that the sequence of random graphs {G i } i with G i P ni,θ+r i o satisfies lim i d (f G i, f Kr ) = 0 in probability, where f Tr is the Turán graphon on r classes f Tr (x, y) = { 1 if [xr] [yr], 0 otherwise, (x, y) [0, 1] 2, with r depending on o and possibly θ ([x] denotes the integer part of x). When o = o 0, the limiting graphon is pf T 2 with p (0, 1) depending on θ 1. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 34/36

The Variational Analysis Approach Most of our results can also be derived (in a slightly different form) using a powerful analytic method due to Chatterjee and Diaconis (2013). However, it gives less precise results along critical directions. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 35/36

The Variational Analysis Approach Most of our results can also be derived (in a slightly different form) using a powerful analytic method due to Chatterjee and Diaconis (2013). However, it gives less precise results along critical directions. Chatterjee and Diaconis (2013) showed that, for fixed, θ = (θ 1, θ 2 ), the limiting behavior of {G n} with G n Q n,θ, is determined by the solution of the variational problem ( ) sup f W θ 1 t(k 2, f ) + θ 2 t(k 3, f ) [0,1] 2 I(f )dxdy where I(u) = 1 u log u + 1 (1 u) log(1 u), and, for any finite graph H 2 2 on k node, t(h, f ) := f (x i, x j )dx 1 x k. [0,1] k {i,j} E(H), A. Rinaldo Asymptotics and Extremal Properties of the ET Model 35/36

The Variational Analysis Approach Most of our results can also be derived (in a slightly different form) using a powerful analytic method due to Chatterjee and Diaconis (2013). However, it gives less precise results along critical directions. Chatterjee and Diaconis (2013) showed that, for fixed, θ = (θ 1, θ 2 ), the limiting behavior of {G n} with G n Q n,θ, is determined by the solution of the variational problem ( ) sup f W θ 1 t(k 2, f ) + θ 2 t(k 3, f ) [0,1] 2 I(f )dxdy where I(u) = 1 u log u + 1 (1 u) log(1 u), and, for any finite graph H 2 2 on k node, t(h, f ) := f (x i, x j )dx 1 x k. [0,1] k {i,j} E(H) This result can be applied to the case θ along fixed directions., A. Rinaldo Asymptotics and Extremal Properties of the ET Model 35/36

Conclusions We have fully described the asymptotic extremal behavior of the ET model. We made heavy use of recent results on graph limits (see, e.g., the book by Lovász, 2012 for details) and probability (Chatterjee and Diaconis, 2013 and Chatterjee and Varadhan, 2011). Some of our results and technicques are likely to generalize to more complex ERGMs. A. Rinaldo Asymptotics and Extremal Properties of the ET Model 36/36