Overview course module Stochastic Modelling

Similar documents
Specification and estimation of exponential random graph models for social (and other) networks

Overview course module Stochastic Modelling. I. Introduction II. Actor-based models for network evolution

Continuous-time Statistical Models for Network Panel Data

Statistical Model for Soical Network

Introduction to statistical analysis of Social Networks

Conditional Marginalization for Exponential Random Graph Models

Modeling tie duration in ERGM-based dynamic network models

Chaos, Complexity, and Inference (36-462)

Actor-Based Models for Longitudinal Networks

Chaos, Complexity, and Inference (36-462)

Statistical models for dynamics of social networks: inference and applications

Parameterizing Exponential Family Models for Random Graphs: Current Methods and New Directions

Exponential random graph models for the Japanese bipartite network of banks and firms

Bayesian Analysis of Network Data. Model Selection and Evaluation of the Exponential Random Graph Model. Dissertation

Tied Kronecker Product Graph Models to Capture Variance in Network Populations

Overview of Stochastic Approaches to Social Network Analysis

Partners in power: Job mobility and dynamic deal-making

Fast Maximum Likelihood estimation via Equilibrium Expectation for Large Network Data

Missing data in networks: exponential random graph (p ) models for networks with non-respondents

Random Effects Models for Network Data

Determining the E ects of Social Network Evolution

Assessing Goodness of Fit of Exponential Random Graph Models

Statistical Methods for Social Network Dynamics

Decision Making and Social Networks

Statistical Analysis of Longitudinal Network Data With Changing Composition

Curved exponential family models for networks

Network Dynamics. Tom A.B. Snijders. May 25, 2011

MULTILEVEL LONGITUDINAL NETWORK ANALYSIS

and ). Key words and phrases. Graphs, longitudinal data, method of moments, stochastic approximation,

A note on perfect simulation for exponential random graph models

Measuring the shape of degree distributions

Delayed Rejection Algorithm to Estimate Bayesian Social Networks

Using Potential Games to Parameterize ERG Models

Dynamic modeling of organizational coordination over the course of the Katrina disaster

An Introduction to Exponential-Family Random Graph Models

Statistical Models for Social Networks with Application to HIV Epidemiology

Tom A.B. Snijders Christian E.G. Steglich Michael Schweinberger Mark Huisman

Generalized Exponential Random Graph Models: Inference for Weighted Graphs

arxiv:cond-mat/ v1 [cond-mat.dis-nn] 18 Feb 2004 Diego Garlaschelli a,b and Maria I. Loffredo b,c

6.207/14.15: Networks Lecture 12: Generalized Random Graphs

Statistics Canada International Symposium Series - Proceedings Symposium 2004: Innovative Methods for Surveying Difficult-to-reach Populations

Goodness of Fit of Social Network Models 1

Social Networks 34 (2012) Contents lists available at ScienceDirect. Social Networks. jo ur nal homep ag e:

Appendix: Modeling Approach

Relational Event Modeling: Basic Framework and Applications. Aaron Schecter & Noshir Contractor Northwestern University

BAYESIAN ANALYSIS OF EXPONENTIAL RANDOM GRAPHS - ESTIMATION OF PARAMETERS AND MODEL SELECTION

Sequential Importance Sampling for Bipartite Graphs With Applications to Likelihood-Based Inference 1

Models for Longitudinal Network Data

Uncovering structure in biological networks: A model-based approach

LOGIT MODELS FOR AFFILIATION NETWORKS

Latent Stochastic Actor Oriented Models for Relational Event Data

Goodness of Fit of Social Network Models

Bernoulli Graph Bounds for General Random Graphs

Hierarchical Models for Social Networks

SMALL-WORLD NAVIGABILITY. Alexandru Seminar in Distributed Computing

EXPLAINED VARIATION IN DYNAMIC NETWORK MODELS 1. Tom A.B. SNIJDERS 2

c Copyright 2015 Ke Li

Instability, Sensitivity, and Degeneracy of Discrete Exponential Families

An Approximation Method for Improving Dynamic Network Model Fitting

Opinion Dynamics on Triad Scale Free Network

University of Groningen. The multilevel p2 model Zijlstra, B.J.H.; van Duijn, Maria; Snijders, Thomas. Published in: Methodology

Agent-Based Methods for Dynamic Social Networks. Duke University

Bayesian Meta-Analysis of Social Network Data via Conditional Uniform Graph Quantiles

LOGIT MODELS AND LOGISTIC REGRESSIONS FOR SOCIAL NETWORKS: I. AN INTRODUCTION TO MARKOV GRAPHS AND p* STANLEY WASSERMAN UNIVERSITY OF ILLINOIS

Deterministic Decentralized Search in Random Graphs

Inference in curved exponential family models for networks

6.207/14.15: Networks Lecture 7: Search on Networks: Navigation and Web Search

Centrality in Social Networks: Theoretical and Simulation Approaches

Department of Statistics. Bayesian Modeling for a Generalized Social Relations Model. Tyler McCormick. Introduction.

Meeting Strangers and Friends of Friends: How Random are Social Networks?

Discrete Temporal Models of Social Networks

arxiv: v1 [stat.me] 3 Apr 2017

A Graphon-based Framework for Modeling Large Networks

Properties of Latent Variable Network Models

A multilayer exponential random graph modelling approach for weighted networks

The Beginning of Graph Theory. Theory and Applications of Complex Networks. Eulerian paths. Graph Theory. Class Three. College of the Atlantic

Author's personal copy

Heider vs Simmel: Emergent Features in Dynamic Structures

arxiv:cond-mat/ v1 [cond-mat.dis-nn] 4 May 2000

A COMPARATIVE ANALYSIS ON COMPUTATIONAL METHODS FOR FITTING AN ERGM TO BIOLOGICAL NETWORK DATA A THESIS SUBMITTED TO THE GRADUATE SCHOOL

Exploratory Study of a New Model for Evolving Networks

An approximation method for improving dynamic network model fitting

CS224W Project Writeup: On Navigability in Small-World Networks

arxiv: v1 [stat.me] 1 Aug 2012

(51) 2019 Department of Mathematics & Statistics, Old Dominion University: Highdimensional multivariate time series with spatial structure.

Modeling Organizational Positions Chapter 2

Measuring Segregation in Social Networks

Assessing the Goodness-of-Fit of Network Models

TEMPORAL EXPONENTIAL- FAMILY RANDOM GRAPH MODELING (TERGMS) WITH STATNET

Consistency Under Sampling of Exponential Random Graph Models

An Introduction to Stochastic Actor Oriented Models

Mathematical Foundations of Social Network Analysis

Learning latent structure in complex networks

Siena Advanced Users Workshop, 2018

Markov Chains and MCMC

Spectral Graph Theory Tools. Analysis of Complex Networks

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine

Modeling the co-evolution of networks and behavior

Data Mining and Analysis: Fundamental Concepts and Algorithms

Node and Link Analysis

Transcription:

Overview course module Stochastic Modelling I. Introduction II. Actor-based models for network evolution III. Co-evolution models for networks and behaviour IV. Exponential Random Graph Models A. Definition B. Example: gossip about the boss C. Estimation / goodness of fit issues LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 123

A: Exponential Random Graph Models history: Frank & Strauss Markov random graph model (1986), Frank (1991) and Wasserman & Pattison (1996) generalised to exponential family distribution: Pr( X = x) exp( β ks k ( x)) K k= 1 proportional to proportionality constant unknown and practically uncalculable(!) linear combination of network statistics very flexible! Likelihood of macro structure x is explained by prevalence of micro structures s(x), testable via parameters β. LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 124

Some possible statistics tie count statistic i j i reciprocity statistic (for directed graphs) i j i transitive closure i j i k n sk ( x) = x j i, j= 1 n ij s ( x ) = x x k k ij ji i, j= 1 j n s ( x) = x x x k ij jk ik i, j, k= 1 j A positive parameter β attached to any of these statistics means: the right configuration is more likely than the left one. LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 125

Example Consider an ERG model for an undirected network with 3 parameters for these statistics: n 1 (1) number of edges sedges ( x) = xij 2 1 s2 ( x) = x x 2 stars ij ik i, j, k= 1 (3) number of triangles n i, j = 1 number of 2-stars (2) 1 s ( x) = x x x triangles ij jk ik 6 i, j, k= 1 Then the 3-parameter ERG distribution function is this one: Pr( X = x) exp( β s ( x) edges + β s 2 stars 2 stars + β s triangles edges triangles ( x) ( x)) LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 126 n

Example and consider the following two 4-node-networks & their statistics: x a x b s edges 4 3 s 2-stars 5 3 s triangles 1 0 LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 127

Example Probabilities are (only) given up to a proportionality factor: x a x b Pr(x a ) exp(4 β edges +5 β 2-stars +1 β triangles ) Pr(x b ) exp(3 β edges +3 β 2-stars ) LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 128

but the ratio of probabilities can be calculated! x a x b a Pr( x ) b Pr( x ) = exp(4β + 5 β + β ) edges 2 stars triangles exp(3β + 3 β ) edges 2 stars ( β ) edges β2 stars βtriangles = exp (4 3) + (5 3) + = exp( β + 2 β + β ) edges 2 stars triangles LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 129

Local characterisation of ERGMs ERG distribution is considered as a collection of local conditional tie probabilities a Pr( x ) b Pr( x ) K = exp βk k ( ) k ( ) k= 1 ( a b s x s x ) compared are a network x a and a neighbouring network x b differences in model statistics between x a and x b What this equation refers to is the (conditional) probability of the one tie by which they differ! model parameters LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 130

Recall similar formula for dynamic networks Local characterisation of ERG models: a K Pr( x ) = exp k k ( ) k ( ) b β Pr( x ) k= 1 ( a b s x s x ) Local characterisation for actor-based network evolution models: c a K i x exp ( a b ( ) ( )) = c b βk sik x sik x i x k= 1 Pr( x ) Pr( x ) compared are two moves ( micro steps ) made by actor i from a network x c to two neighbouring networks x a and x b model parameters difference in model statistics of actor i between the two compared moves LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 131

so how do these conditional odds for the middle tie to exist (vs. not to exist) look like in applications? Suppose, in a (larger) trade network, estimation gave: redundant ties are avoided: β triangles = -0.4 positive degree variance: β 2-stars = 0.1 low density: β edges = -1.5 LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 132

Then the equation becomes: Pr( middle tie) Pr( no middle tie) = exp( β + 2 β + β ) edges 2 stars triangles = exp( 1.5 + 2 0.1 0.4) = exp( 1.7) 0.183 1/5.5 i.e., the middle tie is about 5.5 times as likely NOT to exist as to exist [given the rest of the network]. LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 133

Same model in a different application: Suppose, in a (larger) friendship network, estimates were: closure: β triangles = 0.4 pos. degree variance: β 2-stars = 0.1 low density: β edges = -1.5 Then the equation becomes: Pr( middle tie) exp( 0.9) 0.407 1/2.5 Pr( no middle tie) = Still two½ times as likely NOT to exist but always keep in mind that all ties are random LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 134

random ties are benchmark for evaluating such conditional odds! Hence, consider these networks as baseline comparison: Pr( tie) Pr( no tie) β edges = exp( ) = exp( 1.5) 0.223 1/ 4.5 In the friendship context (odds=2.5), the tie in question is relatively likely to be present, In the trade context (odds=5.5), it is relatively unlikely to be present, given the network s density. LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 135

Exponential Random Graph Models high flexibility due to the many possibilities of choosing statistics & controlling effects for each other + can express small worlds (see Robins et al. 2005) but also other network types network probability model: not tied to any particular algorithm+ problems: estimation, model specification, interpretation LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 136

B: Example: gossip about the boss data from Lea Ellwardt (2008) organisation for social work with juveniles one department with internal teams 28 employees gossip about the boss in this department was largely negative LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 137

Hypotheses: 1. closeness / proximity is associated with gossip friendship / liking predicts gossip communication frequency predicts gossip team structure predicts gossip gossip occurs in local clusters 2. information asymmetry is associated with gossip dissimilarity in / lack of contacts with the boss predicts g sp. 3. negative attitude is associated with gossip distrust in management predicts gossip disliking the boss predicts gossip 4. attitude similarity is associated with gossip LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 138

Results from an exponential random graph analysis: (1) (1) (1) (1) (1) Unpredicted: heterogeneity of outdegrees (1) LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 139

Results from an exponential random graph analysis: (3) (3) (2) (3) (3) (2) LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 140

Results from an exponential random graph analysis: (4) (4) (4) * p<0.05 (one-sided test) Conclusion: by and large, the analysis supports the hypotheses LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 141

Example: 50 Scottish girls nominated friendship data from Michell & Amos (1997) 50 girls Scotland 1995 1 st year secondary 13 years old triangulation can be observed LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 142

triangulation general: typically more triangles in data than expected under random tie formation zooming in on the next slide Red markers stand for empirically observed data sets, black line for expectations under randomness. LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 143

Red markers stand for observed values in several empirical data sets; the black line indicates expectations under randomness. Red markers lie way above the random expectation. So: include triangulation in your model for this type of data! LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 144

First try: estimate a simple model with triangulation / transitive closure tendencies include precursor configurations in attempt to arrive at a meaningful model (Snijders 2002) 2-path in-2-star out-2-star LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 145

Good convergence is indicated by the t-statistics being close to zero. One or more of the t-statistics are rather large. Convergence of the algorithm is doubtful. @2 Estimation results. ------------------- Regular end of estimation algorithm. Total of 2092 iteration steps. @3 Estimates and standard errors 1. u: number of arcs 4.9212 ( 1.7544) 2. u: reciprocity 3.9062 ( 0.3581) 3. u: in-2-stars -1.2377 ( 0.3367) 4. u: 2-paths -0.8630 ( 0.1349) 5. u: transitive triplets -12.7060 ( 99.0000) LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 146

Second try: estimate another model with new specification (Snijders et al. 2006) m n 2 2 s ( x) = ( 1) k m= 1 τ m+ 1 m m 1 alternating m-triangles statistic τ m = number of m-triangles (configurations on the left) 2 LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 147

Good convergence is indicated by the t-statistics being close to zero. @2 Estimation results. ------------------- Regular end of estimation algorithm. Total of 2015 iteration steps. @3 Estimates and standard errors 1. u: number of arcs -3.2498 ( 0.3456) 2. u: reciprocity 4.4430 ( 0.3867) 3. u: alternating m-triangles, parameter 2 0.6734 ( 0.0758) Why does estimation seem so difficult for some models? Why so complex statistics like alternating m-triangles? LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 148

C: Estimation of ERGMs estimation is based on simulation: 1. assume starting values for parameters 2. simulate networks that are drawn from this parametrisation s ERG distribution 3. compare them to actually observed (and to-be-modelled) network 4. adjust parameters in order to get better fit [on desired dimensions] 5. repeat from 2. until parameter values converge this unfortunately does not always work, and notably triangulation effects cause problems LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 149

see here the distribution of the number of transitive triplets over simulated networks from a model like the first we estimated Problem: nonzero probability of a full network! This is unrealistic and implies that estimates are meaningless LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 150

zooming in on the low end (i.e., realistic networks; vertical line = observed value): because observed value is almost never assumed by simulations only on average! LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 151

To keep in mind about ERGMs: Visualise ERG models as probability distributions on a (huge) space of all possible network, one observed network is modelled as drawn from that distribution. Model parameters β are attached to network statistics s, these statistics in general correspond to subgraph counts (local patterns, motifs ), the parameters describe the relative prevalence of the corresponding subgraph in the total graph. Interpretation of parameters needs to take into account other parameters (notably edges / arcs). LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 152

References (in chronological order of appearance) Small worlds and scale-free networks: Barabási, A.L., and Albert, R. (1999). Emergence of scaling in random networks Science, 286:509-512. Caldarelli, G.; Capocci, A.; De Los Rios, P.; Munoz, M. A. (2002). Scale-free Networks without Growth or Preferential Attachment: Good get Richer. eprint arxiv:cond-mat/0207366. Guare, J. (1990). Six Degrees of Separation: A Play. New York: Vintage Books. Kasturirangan, R. (1999). Multiple scales in small-world graphs. Massachusetts Institute of Technology AI Lab Memo 1663. Also cond-mat: 9904055. Kleinberg, J.M. (2000). Navigation in a small world. Nature 406, 845. Milgram, S. (1967). The Small World Problem. Psychology Today, 2, 60-67. Newman, M.E.J. (2000). Models of the Small World. Journal of Statistical Physics, 101, 819-841. Watts, D.J. (1999). Networks, dynamics, and the small-world phenomenon. American Journal of Sociology, 105, 493-527. Watts, D.J., and Strogatz, S.H. (1998). Collective Dynamics of small world networks. Nature, 393, 440-442. LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 153

Classic paper on testing the triad census against tie-independence: Holland, P.W. & Leinhardt, S. (1978). An Omnibus Test for Social Structure Using Triads. Sociological Methods & Research 7: 227-256. (see also Wasserman & Faust, chp.14) Null model assumes tie-independence, given the degree distribution: Karlberg,M. (1999). Testing Transitivity in Digraphs. Sociological Methodology 29 (1): 225 251. Papers on permutation tests: Hubert, L.J., & Schultz, J. (1976). Quadratic assignment as a general data analysis strategy. British Journal of Mathematical and Statistical Psychology 29: 190-241. Krackhardt, D. (1988). Predicting with networks: nonparametric multiple regression analysis of dyadic data. Social Networks 10: 359-381. Dekker, D., Krackhardt, D., & Snijders, T.A.B. (2003). Multicollinearity robust QAP for multiple regression. Forthcoming in Psychometrika. Butts, C.T. (2005). Permutation models for relational data. Forthcoming in Sociological Methodology. LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 154

Exponential Random Graph Models: Frank, Ove. 1991. Statistical Analysis of Change in Networks. Statistica Neerlandica, 45: 283-293. Frank, Ove, and David Strauss. 1986. Markov Graphs. Journal of the American Statistical Association, 81: 832-842. Robins, G.L., Woolcock, J., and Pattison, P. (2005) Small and other worlds: Global network structures from local processes. American Journal of Sociology, 110, 894-936. Snijders, Tom A.B., Philippa E. Pattison, Garry L. Robins, and Mark S. Handcock (2006). New specifications for exponential random graph models. To be published, Sociological Methodology. Handcock, Mark S. (2003). Assessing degeneracy in statistical models of social networks. Working paper no. 39, Center for Statistics and the Social Sciences. Seattle: University of Washington. Snijders, Tom A.B. (2002) Markov Chain Monte Carlo Estimation of Exponential Random Graph Models Journal of Social Structure. Volume 3, number 2. Snijders, Tom A.B., Philippa E. Pattison, Garry L. Robins, and Mark S. Handcock (2006). New specifications for exponential random graph models. Sociological Methodology. LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 155

Network dynamics: Snijders, T.A.B., Ch. Steglich, M. Schweinberger and M.E. Huisman (2007). Manual for SIENA version 3. Torlò, V., Steglich, Ch., Lomi, A. and Snijders, T.A.B. (2007). Network Influence, Social Selection and Individual Performance in Organizations. Working paper. http://www.gmw.rug.nl/~steglich/pdf/tsls2007.pdf Co-evolution: Steglich, Christian,Tom Snijders, and Patrick West, (2006). Applying SIENA: An illustrative analysis of the co-evolution of adolescents' friendship networks, taste in music, and alcohol consumption. Methodology 2(1), 48-56. The s50 data set: Michell, L., and Amos, A. (1997) Girls, pecking order and smoking. Social Science and Medicine, 44, 1861-1869. LINKS workshop 2009, module Stochastic Modelling, Christian Steglich 156