Consistency Under Sampling of Exponential Random Graph Models


Cosma Shalizi and Alessandro Rinaldo. Summary by: Elly Kaizar

Remember ERGMs (Exponential Random Graph Models): exponential-family models whose sufficient statistics count the number of edges, triangles, k-cliques, etc. We only observe sub-networks; the idea is to estimate from the sub-network and extrapolate to the whole network.
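As a concrete (if toy) illustration of what these sufficient statistics look like, here is a minimal Python sketch, my own and not from the paper, assuming a small undirected graph stored as a set of frozenset dyads. It computes edge and triangle counts and the corresponding unnormalized ERGM weight exp(theta . T(x)):

```python
from itertools import combinations
from math import exp

def edge_count(adj):
    """Edges of an undirected graph stored as a set of frozenset dyads."""
    return len(adj)

def triangle_count(nodes, adj):
    """Count triangles by checking every 3-node subset."""
    return sum(
        1 for a, b, c in combinations(nodes, 3)
        if {frozenset((a, b)), frozenset((a, c)), frozenset((b, c))} <= adj
    )

def ergm_weight(nodes, adj, theta_edge, theta_tri):
    """Unnormalized ERGM probability exp(theta . T(x)) with edge and triangle statistics."""
    return exp(theta_edge * edge_count(adj) + theta_tri * triangle_count(nodes, adj))

# A triangle on {0, 1, 2}: 3 edges, 1 triangle.
nodes = [0, 1, 2]
adj = {frozenset((0, 1)), frozenset((0, 2)), frozenset((1, 2))}
print(edge_count(adj), triangle_count(nodes, adj))  # 3 1
```

The normalizing constant would require summing such weights over all 2^C(n,2) graphs, which is only feasible for tiny n; that intractability is part of why the projectivity question matters.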

Practical questions. 1. Are the MLEs from the sub-graph consistent? (What does consistency mean in this setting?) 0. A logically prior question: is it probabilistically consistent to apply the same ERGM, with the same parameters, both to the whole network and to its sub-networks?

Definition 1: Projective. The family {P_{A,θ} : A ∈ A, θ ∈ Θ} is projective when A ⊆ B implies that P_{A,θ} can be recovered from P_{B,θ} by marginalizing out X_{B\A}, for all θ.
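For the edge-count-only ERGM this definition can be checked by brute force on tiny graphs. The sketch below (my own illustration, not code from the paper) enumerates all graphs on 3 nodes, marginalizes out the two dyads touching the third node, and confirms that the result is exactly the 2-node model with the same θ:

```python
from itertools import product
from math import exp

theta = 0.7  # arbitrary edge parameter

def edge_model_probs(n_dyads, theta):
    """Exact probabilities of the edge-count ERGM over all graphs with n_dyads dyads."""
    configs = list(product([0, 1], repeat=n_dyads))
    weights = [exp(theta * sum(x)) for x in configs]
    z = sum(weights)
    return {x: w / z for x, w in zip(configs, weights)}

# 3-node graph: dyads (0,1), (0,2), (1,2). Marginalize out the two dyads touching node 2.
p3 = edge_model_probs(3, theta)
marginal_01 = {}
for x, p in p3.items():
    marginal_01[x[0]] = marginal_01.get(x[0], 0.0) + p

# 2-node graph: a single dyad (0,1), same theta.
p2 = edge_model_probs(1, theta)
for x01 in (0, 1):
    assert abs(marginal_01[x01] - p2[(x01,)]) < 1e-12
print("edge-only ERGM is projective on this example")
```

The check succeeds because the edge-count model factorizes over dyads; the point of the paper is that triangle-type statistics destroy exactly this factorization.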

Definition 2: Separable increments. The sufficient statistics of the family {P_{A,θ} : A ∈ A, θ ∈ Θ} have separable increments when, for each A ⊆ B and each x ∈ X_A, the range of possible increments δ is the same for all x, and the conditional volume factor is constant in x, that is, v_{B\A|A}(δ, x) = v_{B\A}(δ). [In words: the number of super-graphs on B that are consistent with a given sub-graph on A and whose extra statistic (increment) equals δ does not depend on which sub-graph on A we start from.] Note: this depends only on the functional forms of the sufficient statistics, and not on the model parameters.

Theorems/Propositions on projectivity: The exponential family {P_{A,θ} : A ∈ A} is projective if and only if the sufficient statistics {T_A : A ∈ A} have separable increments. (Predictive sufficiency) In a projective exponential family, the distribution of X_{B\A} conditional on X_A depends on the data only through T_{B\A}.

Consistency theorems. Suppose the model P_θ (with an infinite-dimensional index set A) is projective, and that the log partition function obeys equation (10) for each A ∈ A, where r_A measures the size of the sets in a growing sequence A:

(10) log z_A(θ) = a_A(θ) = r_A a(θ)

Then the MLE exists and is strongly consistent. If instead lim_{r_A→∞} a_A(θ)/r_A = a(θ), then convergence in probability still holds (Theorem 4). If (10) holds only for some θ, then the MLE may be consistent only for those θ. All components of T_A must be scaled by the same factor r_A.
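For the edge-count ERGM, condition (10) holds exactly: by dyad independence, log z_A(θ) = C(n,2) · log(1 + e^θ), so r_A = C(n,2) and a(θ) = log(1 + e^θ). A small brute-force check (my own illustration, not code from the paper):

```python
from itertools import product
from math import exp, log, comb

def log_partition_edges(n, theta):
    """Brute-force log z_n(theta) for the edge-count ERGM on n nodes."""
    n_dyads = comb(n, 2)
    z = sum(exp(theta * sum(x)) for x in product([0, 1], repeat=n_dyads))
    return log(z)

theta = -0.3
for n in (2, 3, 4):
    r = comb(n, 2)              # scaling factor r_A
    a = log(1 + exp(theta))     # a(theta), the same function for every n
    assert abs(log_partition_edges(n, theta) - r * a) < 1e-9
print("log z_n(theta) = C(n,2) * log(1 + e^theta) verified for n = 2, 3, 4")
```

Note that the same a(θ) works for every n, which is exactly what (10) demands; for triangle-type models no such exact decomposition exists.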

Application: ERGMs. Remember that the sufficient statistics are typically vectors of counts of edges, triangles, cliques, k-stars, etc. Dyadic independence, t(X) = Σ_{i=1}^{n} Σ_{j<i} t_{ij}(X_{ij}, X_{ji}), implies projectability via Theorem 1. This applies to the β-model. Homophily ("a friend of a friend is a friend") causes problems.
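Dyadic independence is what makes the β-model projective: its partition function factorizes over dyads, Z = Π_{i<j} (1 + e^{β_i + β_j}). A quick numerical check of this factorization (my own sketch, with arbitrarily chosen node parameters):

```python
from itertools import product, combinations
from math import exp

beta = [0.2, -0.5, 0.9]   # node parameters, chosen arbitrarily
n = len(beta)
dyads = list(combinations(range(n), 2))

# Brute-force partition function of the beta-model on n nodes:
# P(X) is proportional to exp(sum over dyads of (beta_i + beta_j) * X_ij).
z_brute = sum(
    exp(sum((beta[i] + beta[j]) * xij for (i, j), xij in zip(dyads, x)))
    for x in product([0, 1], repeat=len(dyads))
)

# Dyadic independence: the partition function factorizes over dyads.
z_factored = 1.0
for i, j in dyads:
    z_factored *= 1 + exp(beta[i] + beta[j])

assert abs(z_brute - z_factored) < 1e-9
```

Because of this factorization, marginalizing out a node just drops its factors, which is projectivity in action.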

Application: ERGM (p. 16). Sadly, no statistic that counts triangles, or larger motifs, can have the nice additive form of dyad counts, no matter how we decompose the network. Take, for instance, triangles. Any given edge among the first n nodes could become part of a triangle, depending on ties to the next node. Thus, to determine the number of triangles among the first n+1 nodes, we need much more information about the sub-graph of the first n nodes than just the number of triangles among them. Indeed, we can go further: the range of possible increments to the number of triangles changes with the number of existing triangles. This is incompatible with separable increments, so, by Theorem 1, the model cannot be projective. Question: Can we look at this problem so one-dimensionally?
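The failure of separable increments for triangles can be seen directly with 3 existing nodes plus one new node: the set of achievable triangle increments depends on the existing sub-graph. A small enumeration (my own illustration):

```python
from itertools import combinations, product

def triangle_increments(adj3):
    """All possible increments to the triangle count when node 3 joins
    a graph on nodes {0,1,2} with edge set adj3 (a set of frozensets)."""
    incs = set()
    for ties in product([0, 1], repeat=3):        # ties from node 3 to nodes 0, 1, 2
        S = [v for v, t in zip(range(3), ties) if t]
        # each existing edge inside S closes one new triangle with node 3
        incs.add(sum(1 for a, b in combinations(S, 2) if frozenset((a, b)) in adj3))
    return incs

empty = set()
full = {frozenset(p) for p in combinations(range(3), 2)}
print(triangle_increments(empty))  # {0}
print(triangle_increments(full))   # {0, 1, 3}
```

Starting from the empty sub-graph the only possible increment is 0, while starting from the complete triangle the increments are {0, 1, 3}; since the range of δ depends on x, separable increments fail, exactly as the slide argues.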

Application: ERGM (pp. 16-17). While these ERGMs are not projective, some of them may still satisfy equation (14) [and thus the MLEs converge in probability]. For instance, in models where T has two elements, the number of edges and the (normalized) number of triangles or of 2-stars, the log partition function is known to scale like n(n-1) as n → ∞, at least in the parameter regimes where the models behave basically like either very full or very empty Erdős-Rényi networks. Since these models are not projective, however, it is impossible to improve parameter estimates by getting more data: parameters for smaller sub-graphs simply cannot be extrapolated to larger graphs (or vice versa). Such an ERGM may provide a good description of a social network on a certain set of nodes, but it cannot be projected to give predictions on any larger or more global graph from which that one was drawn.

Application: ERGM (p. 17) If an ERGM is postulated for the whole network, then inference for its parameters must explicitly treat the unobserved portions of the network as missing data (perhaps through an expectation-maximization algorithm), though of course there may be considerable uncertainty about just how much data is missing. Question: Does this make sense in light of the last slide?
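As a cartoon of what such missing-data inference looks like in the simplest (projective) case, here is a toy EM iteration for the edge-count (Erdős-Rényi) model with hypothetical counts of observed and missing dyads; the fixed point is just the observed edge density:

```python
# Toy EM for an edge-count ERGM (i.e. Erdos-Renyi) with some dyads unobserved.
# Hypothetical data: 10 observed dyads containing 4 edges, plus 5 unobserved dyads.
obs_edges, obs_dyads, missing_dyads = 4, 10, 5
total_dyads = obs_dyads + missing_dyads

p = 0.5  # initial guess for the edge probability
for _ in range(200):
    expected_edges = obs_edges + missing_dyads * p   # E-step: fill in missing dyads
    p = expected_edges / total_dyads                  # M-step: MLE given expected statistics

assert abs(p - obs_edges / obs_dyads) < 1e-9  # converges to the observed density
print(round(p, 6))
```

For non-projective statistics like triangle counts the E-step no longer simplifies this way, which is why the missing-data treatment the slide describes becomes genuinely hard.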

Solutions? (1) Model the evolution of networks over time. Hanneke, Fu and Xing consider situations where the distribution of the network at time t+1, conditional on the network at time t, follows an exponential family. Even when the statistics in the conditional specification include (say) changes in the number of triangles, the issues raised above do not apply. (2) Give up the exponential family form. This won't work, because Lauritzen showed that whenever the sufficient statistics form a semi-group, the models must be either ordinary exponential families or certain generalizations thereof with much the same properties.
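A minimal sketch of the temporal idea, assuming a toy conditional model (not the actual Hanneke-Fu-Xing specification) whose statistics are the edge count of the new graph and the change in triangle count relative to the old one; because the normalization is conditional on X_t, triangle-based statistics cause no projectivity trouble here:

```python
from itertools import product, combinations
from math import exp

dyads = list(combinations(range(3), 2))

def triangles(x):
    # On 3 nodes there is a single possible triangle: all three dyads present.
    return int(all(x))

def conditional_probs(x_t, th_edge, th_dtri):
    """P(X_{t+1} = x' | X_t = x_t) for a toy temporal ERGM whose statistics
    are the edge count of x' and the change in triangle count from x_t."""
    weights = {}
    for x_next in product([0, 1], repeat=len(dyads)):
        stat = th_edge * sum(x_next) + th_dtri * (triangles(x_next) - triangles(x_t))
        weights[x_next] = exp(stat)
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}

# Starting from a two-edge path, a positive triangle-change parameter makes
# the triangle-closing transition more likely than under the edges-only model.
probs = conditional_probs((1, 1, 0), th_edge=0.5, th_dtri=1.0)
probs_no_tri = conditional_probs((1, 1, 0), th_edge=0.5, th_dtri=0.0)
assert abs(sum(probs.values()) - 1.0) < 1e-12
assert probs[(1, 1, 1)] > probs_no_tri[(1, 1, 1)]
```

The key point the slide makes survives in this toy: each transition kernel is normalized given X_t, so nothing needs to marginalize consistently across network sizes.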

The end (p. 18). "...every infinite exchangeable graph distribution is actually a mixture over projective dyadic-independence distributions, though not necessarily ones with a finite-dimensional sufficient statistic." Along any one sequence of sub-graphs from such an infinite graph, the densities of all motifs approach limiting values, which pick out a unique projective dyadic-independence distribution. "This suggests that an alternative to parametric inference would be nonparametric estimation of the limiting dyadic-independence model, by smoothing the adjacency matrix; this, too, we pursue elsewhere."
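As a crude illustration of "smoothing the adjacency matrix" (my own sketch, far simpler than any serious graphon estimator), one can sample from a constant edge-probability (hence rank-1) model and recover an estimate of the probability matrix from the leading eigencomponent of the adjacency matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p_true = 60, 0.3

# Sample a symmetric adjacency matrix whose edge-probability matrix is the
# constant (rank-1) matrix with all off-diagonal entries equal to p_true.
upper = (rng.random((n, n)) < p_true).astype(float)
A = np.triu(upper, 1)
A = A + A.T

# Crude spectral smoothing: keep only the leading eigencomponent of A
# (np.linalg.eigh returns eigenvalues in ascending order) and clip to [0, 1]
# to obtain an estimated edge-probability matrix.
vals, vecs = np.linalg.eigh(A)
P_hat = np.clip(vals[-1] * np.outer(vecs[:, -1], vecs[:, -1]), 0.0, 1.0)

off_diag = ~np.eye(n, dtype=bool)
print(abs(P_hat[off_diag].mean() - p_true))  # typically small for this seed
```

This is only meant to convey the flavor of estimating a limiting dyadic-independence model by smoothing; real graphon estimators use more careful rank selection and neighborhood averaging.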

Naïve thoughts and questions. Did I miss the assertion that the ERGM MLEs from sub-graphs are not consistent? Is the problem that sampling/structure is assumed to be uniform? That is, any newly sampled node is equally likely to be connected to each of the existing identical nodes (those currently part of the same motifs)? Is there a way to think about a network with 4 nodes as a collection of 4 non-independent networks with 3 nodes? Or 6 networks with 2 nodes? That is, can we define a dimension that is scientifically meaningful and start there, rather than with all of the nodes (6-degrees-of-Kevin-Bacon style), and then think of the smaller networks as coming from a mixture of ERGMs? Is the problem that we are thinking about binary adjacency matrices? If we extended the matrices to ordinal or continuous measures of connection, would local networks make more sense?