Assessing the Goodness-of-Fit of Network Models

Size: px

Start display at page:

Download "Assessing the Goodness-of-Fit of Network Models"

Willa Bradford
5 years ago
Views:

David Hunter Steve Goodreau Martina Morris and the U.

1 Assessing the Goodness-of-Fit of Network Models Mark S. Handcock Department of Statistics University of Washington Joint work with David Hunter Steve Goodreau Martina Morris and the U. Washington Network Modeling Group Supported by NIH NIDA Grant DA023 and NICHD Grant HD04 NIPS Networks Workshop 200, December 200

2 Examples of Friendship Relationships The National Longitudinal Study of Adolescent Health Add Health" is a school-based study of the health-related behaviors of adolescents in grades to 2. Each nominated up to 5 boys and 5 girls as their friends 60 schools: Smallest has 6 adolescents in grades 2

4 Grade Grade Grade Grade 0 Grade Grade 2 White (non-hispanic) Black (non-hispanic) Hispanic (of any race) Asian / Native Am / Other (non-hispanic) Race NA

5 Statistical Models for Social Networks Notation A social network is defined as a set of n social actors" and a social relationship between each pair of actors.

6 Statistical Models for Social Networks Notation A social network is defined as a set of n social actors" and a social relationship between each pair of actors. { relationship from actor i to actor j Y ij = 0 otherwise

7 Statistical Models for Social Networks Notation A social network is defined as a set of n social actors" and a social relationship between each pair of actors. { relationship from actor i to actor j Y ij = 0 otherwise call Y [Y ij ] n n a sociomatrix a N = n(n ) binary array

8 Statistical Models for Social Networks Notation A social network is defined as a set of n social actors" and a social relationship between each pair of actors. { relationship from actor i to actor j Y ij = 0 otherwise call Y [Y ij ] n n a sociomatrix a N = n(n ) binary array The basic problem of stochastic modeling is to specify a distribution for Y i.e., P(Y = y)

9 A Framework for Network Modeling Let Y be the sample space of Y e.g. {0, } N Any model-class for the multivariate distribution of Y can be parametrized in the form: P η (Y = y) = exp{η g(y)} κ(η, Y) y Y η Λ R q q-vector of parameters Besag (4), Frank and Strauss (6) g(y) q-vector of network statistics. g(y ) are jointly sufficient for the model For a saturated" model-class q = 2 Y κ(η, Y) distribution normalizing constant κ(η, Y) = y Y exp{η g(y)}

10 Approximating the loglikelihood Suppose Y, Y 2,..., Y m i.i.d. P η0 (Y = y) for some η 0. Using the LOLN, the difference in log-likelihoods is l(η) l(η 0 ) = log κ(η 0) κ(η)

11 Approximating the loglikelihood Suppose Y, Y 2,..., Y m i.i.d. P η0 (Y = y) for some η 0. Using the LOLN, the difference in log-likelihoods is l(η) l(η 0 ) = log κ(η 0) κ(η) = log E η0 (exp {(η 0 η) g(y )})

12 Approximating the loglikelihood Suppose Y, Y 2,..., Y m i.i.d. P η0 (Y = y) for some η 0. Using the LOLN, the difference in log-likelihoods is l(η) l(η 0 ) = log κ(η 0) κ(η) = log E η0 (exp {(η 0 η) g(y )}) log M exp {(η 0 η) (g(y i ) g(y obs ))} M i= l(η) l(η 0 ).

13 Approximating the loglikelihood Suppose Y, Y 2,..., Y m i.i.d. P η0 (Y = y) for some η 0. Using the LOLN, the difference in log-likelihoods is l(η) l(η 0 ) = log κ(η 0) κ(η) = log E η0 (exp {(η 0 η) g(y )}) log M exp {(η 0 η) (g(y i ) g(y obs ))} M i= l(η) l(η 0 ). Simulate Y, Y 2,..., Y m using a MCMC (Metropolis-Hastings) algorithm Handcock (2002).

14 Approximating the loglikelihood Suppose Y, Y 2,..., Y m i.i.d. P η0 (Y = y) for some η 0. Using the LOLN, the difference in log-likelihoods is l(η) l(η 0 ) = log κ(η 0) κ(η) = log E η0 (exp {(η 0 η) g(y )}) log M exp {(η 0 η) (g(y i ) g(y obs ))} M i= l(η) l(η 0 ). Simulate Y, Y 2,..., Y m using a MCMC (Metropolis-Hastings) algorithm Handcock (2002). Approximate the MLE ˆη = argmax η { l(η) l(η 0 )} (MC-MLE) Geyer and Thompson (2)

15 Estimating the change in log-likelihood Theoretically, the estimated value of l(θ) l(θ 0 ) converges to the true value as the size of the MCMC sample increases, regardless of the value of θ 0.

16 Estimating the change in log-likelihood Theoretically, the estimated value of l(θ) l(θ 0 ) converges to the true value as the size of the MCMC sample increases, regardless of the value of θ 0. However, in practice this convergence can be agonizingly slow, especially if θ 0 is not chosen close to the maximizer of the likelihood. Hunter and Handcock (2006)

17 Measures of the Goodness-of-fit of models As this is an Exponential family, natural to measure goodness-of-fit via deviance [ ] deviance = 2 l(saturated model) l(ˆθ)) and residual deviance = 2 [ ] l(ˆθ) l(0)

18 Measures of the Goodness-of-fit of models As this is an Exponential family, natural to measure goodness-of-fit via deviance [ ] deviance = 2 l(saturated model) l(ˆθ)) and residual deviance = 2 [ ] l(ˆθ) l(0) Standard" asymptotic arguments approximate this by a χ 2 distribution

19 Measures of the Goodness-of-fit of models As this is an Exponential family, natural to measure goodness-of-fit via deviance [ ] deviance = 2 l(saturated model) l(ˆθ)) and residual deviance = 2 [ ] l(ˆθ) l(0) Standard" asymptotic arguments approximate this by a χ 2 distribution The standard asymptotic approximation can be very bad here...

20 Measures of the Goodness-of-fit of models As this is an Exponential family, natural to measure goodness-of-fit via deviance [ ] deviance = 2 l(saturated model) l(ˆθ)) and residual deviance = 2 [ ] l(ˆθ) l(0) Standard" asymptotic arguments approximate this by a χ 2 distribution The standard asymptotic approximation can be very bad here... but the deviance may still be a useful measure of fit if properly calibrated. Hunter and Handcock (2006)

21 How can we tell if a model class is useful? Many aspects: Is the model-class itself able to represent a range of realistic networks? model degeneracy: small range of graphs covered as the parameters vary (Handcock 2003)

22 How can we tell if a model class is useful? Many aspects: Is the model-class itself able to represent a range of realistic networks? model degeneracy: small range of graphs covered as the parameters vary (Handcock 2003) What are the properties of different methods of estimation? e.g, MLE, psuedolikelihood, Bayesian framework computational failure: estimates do not exist for certain observable graphs

23 How can we tell if a model class is useful? Many aspects: Is the model-class itself able to represent a range of realistic networks? model degeneracy: small range of graphs covered as the parameters vary (Handcock 2003) What are the properties of different methods of estimation? e.g, MLE, psuedolikelihood, Bayesian framework computational failure: estimates do not exist for certain observable graphs Can we assess the goodness-of-fit of models? appropriate measures and tests (Besag 2000; Hunter, Goodreau, Handcock 200)

24 Model Degeneracy idea: A random graph model is near degenerate if the model places almost all its probability mass on a small number of graph configurations in Y. e.g. empty graph, full graph, an individual graph, no 2 stars, mono-degree graphs

25 Model Degeneracy idea: A random graph model is near degenerate if the model places almost all its probability mass on a small number of graph configurations in Y. e.g. empty Classes graph, of statistics full graph, usedan forindividual modeling graph, no 2 stars, mono-degree graphs ) Nodal Markov statistics Frank and Strauss (6) Example: The 2-star model motivated by notions of symmetry and homogeneity P(Y = y) = exp{η edges in Y that do note(y) share an+ actor η 2 S(y)} are c(η, η 2 ) is near-degenerate for most values of η 2 > 0 E(y) = y ij S(y) = y ij y ik i<j i<j<k y Y conditionally independent given the rest of the network analogous to nearest neighbor ideas in spatial statistics Degree distribution: d k (y) = proportion of nodes of degree k in y. k-star distribution: s k (y) = proportion of k-stars in the graph y. triangles: t (y) = proportion of triangles in the graph y.. j 3 h i i. i j j j 2 j j 2 triangle two-star three-star = transitive triad Mark S. Handcock Statistical Modeling With ERGM Figure: Some configurations for non-directed graphs

28 Geometry of Exponential Random Graph Models Consider the alternative parametrization of the models µ : Λ int(c) defined by µ(η) = E η [Z (Y )] y Y Z (y) exp{ηt Z (y)} c(η) The mapping is injective: µ(η a ) = µ(η b ) P ηa (Y = y) = P ηb (Y = y) y. The mapping in strictly increasing in the sense that (η a η b ) T (µ(η a ) µ(η b )) 0 with equality only if P ηa (Y = y) = P ηb (Y = y) y. Represents an alternative parameterization of the model

29 Example of the 2 star model P(Y = y) = exp{η E(y) + η 2 S(y)} y Y c(η, η 2 ) where E(y) is the number of edges (0 N = ( g 2) ) S(y) is the number of 2 stars (0 M = 3 ( g 3) )

30 Example of the 2 star model P(Y = y) = exp{η E(y) + η 2 S(y)} y Y c(η, η 2 ) where E(y) is the number of edges (0 N = ( g 2) ) S(y) is the number of 2 stars (0 M = 3 ( g 3) ) µ = E η [E(Y )] = i<j E[Y ij ] = NE[Y 2 ] µ is the expected number of edges, or N µ is the probability that two actors are linked.

31 Example of the 2 star model P(Y = y) = exp{η E(y) + η 2 S(y)} y Y c(η, η 2 ) where E(y) is the number of edges (0 N = ( g 2) ) S(y) is the number of 2 stars (0 M = 3 ( g 3) ) µ = E η [E(Y )] = i<j E[Y ij ] = NE[Y 2 ] µ is the expected number of edges, or N µ is the probability that two actors are linked. µ 2 = E η [S(Y )] = E[Y ij Y ik ] = ME[Y 2 Y 3 ] i<j<k µ 2 is the expected number of 2 stars, or M µ 2 is the probability that a given actor is tied to two randomly chosen other actors.

32 Figure 4: Regions of the parameter space of µ µ 2 : mean 2 stars parameter µ : mean edges parameter

33 Figure 5: Regions of the parameter space of θ θ 2 : 2 stars parameter θ : edges parameter

34 In some cases mixed parameterizations may be better Let (t (), t (2) ) be a partition of t such that: t () is interpretable as a mean value parametrization t (2) is interpretable as the natural" conditional log-odds Consider similar partitions (η (), η (2) ) of η and (µ () (η), µ (2) (η)) of µ(η). Let Λ (2) be the set of values of η (2) for η varying in Λ and C () be the convex hull of {t () (y) : y Y}. The mapping η : Λ Λ (2) int(c () ) defined by η(η) = (µ () (η), η (2) ) () is a mixed parametization of the model (Y, t, η). The components µ () and η (2) are variationally independent, that is, the range of η(η) is a product space.

35 Degeneracy in the mean value parametization Definition: A model is near degenerate if µ(η) is close to the boundary of C

36 Degeneracy in the mean value parametization Definition: A model is near degenerate if µ(η) is close to the boundary of C Let deg Y = {y Y : Z (y) bdc} be the set of graph on the boundary of the convex hull. idea: Based on the geometry of the mean value parametrization the expected sufficient statistics are close to a boundary of the hull and the model will place much probability mass on graphs in deg Y.

37 Figure 4: Regions of the parameter space of µ µ 2 : mean 2 stars parameter µ : mean edges parameter

38 This statement can be quantified in a number of ways: Result: Let e be a unit vector in R q and bd(e) = sup µ intc (e T µ). µ(λe) bd(e)e as λ. 2 P λe,y (Y deg Y) as λ. 3 For every d < bd(e), P λe,y (e T Z (Y ) d) 0 as λ. 4 Let η 0 intc. Then Kullback Leibler divergence(η 0 ; λe) as λ.

39 Effect of Near-Degeneracy on MCMC Estimation Closely related to nice properties of simple MCMC schemes (Geyer ). If a random graph model is simulated using a MCMC based on a near-degenerate ψ it will very likely fail.

40 Effect of Near-Degeneracy on MCMC Estimation Closely related to nice properties of simple MCMC schemes (Geyer ). If a random graph model is simulated using a MCMC based on a near-degenerate ψ it will very likely fail. Full-conditional MCMC with dyad update: M(ψ) = max y Y ψt δ(y c ij ) where δ(yij c) = Z (y + ij ) Z (y ij ) As µ(ψ) bd(c), M(ψ) There exists y Y with [ ] logit P(Y ij = Yij c = yij c ) = ±M(ψ) If ψ is near-degenerate then M(ψ) is large and the MCMC will mix very slowly.

41 Example of degeneracy of the 2 star model P(Y = y) = exp{η E(y) + η 2 S(y)} c(η, η 2 ) M(η) = max{ η, η + 2(g 2)η 2 } y Y

42 Example of degeneracy of the 2 star model P(Y = y) = exp{η E(y) + η 2 S(y)} c(η, η 2 ) y Y M(η) = max{ η, η + 2(g 2)η 2 } MCMC will usually mix poorly.

43 Example of degeneracy of the 2 star model P(Y = y) = exp{η E(y) + η 2 S(y)} c(η, η 2 ) y Y M(η) = max{ η, η + 2(g 2)η 2 } MCMC will usually mix poorly. If µ(η) close to (3, 0) (e.g., η = (4.5,.4)) then M(η) = 4.5 So an MCMC will approach (3, 0) and stay there (.% and.% at (2, 0) bd(c)).

44 Example of degeneracy of the 2 star model P(Y = y) = exp{η E(y) + η 2 S(y)} c(η, η 2 ) y Y M(η) = max{ η, η + 2(g 2)η 2 } MCMC will usually mix poorly. If µ(η) close to (3, 0) (e.g., η = (4.5,.4)) then M(η) = 4.5 So an MCMC will approach (3, 0) and stay there (.% and.% at (2, 0) bd(c)). If µ(η) close to (, 40) (e.g., η = ( 3.43, 0.63)) then M(η) = The model places 50% of its mass on graphs with 2 or fewer edges and 36% on graphs with at least edges.

45 Example of degeneracy of the 2 star model P(Y = y) = exp{η E(y) + η 2 S(y)} c(η, η 2 ) y Y M(η) = max{ η, η + 2(g 2)η 2 } MCMC will usually mix poorly. If µ(η) close to (3, 0) (e.g., η = (4.5,.4)) then M(η) = 4.5 So an MCMC will approach (3, 0) and stay there (.% and.% at (2, 0) bd(c)). If µ(η) close to (, 40) (e.g., η = ( 3.43, 0.63)) then M(η) = The model places 50% of its mass on graphs with 2 or fewer edges and 36% on graphs with at least edges. The model is also unstable e.g., η = ( 3.43, 0.6)) µ(η) (4.4,.) and the model places almost all its mass on empty graphs.

46 Figure 4: Regions of the parameter space of µ µ 2 : mean 2 stars parameter µ : mean edges parameter

47 (a) Trace plot of edges (b) Density of edges Density e+04 6e+04 e+05 Iterations edges (c) Trace plot of 2 stars (d) Density of 2 stars Density e+04 6e+04 e+05 Iterations stars

48 Estimation within the mean value parametization If Z (y obs ) int(c), the MLE of µ is Z (y obs ).

49 Estimation within the mean value parametization If Z (y obs ) int(c), the MLE of µ is Z (y obs ). If Z (y obs ) int(c) the MLE of µ does not exist.

50 Estimation within the mean value parametization If Z (y obs ) int(c), the MLE of µ is Z (y obs ). If Z (y obs ) int(c) the MLE of µ does not exist. The MLE ˆµ is unbiased and has minimum variance: [ ] log c(η) E η (ˆµ) = E η [Z (Y )] = µ(η) = (η) η i [ 2 ] log c(η) V η (ˆµ) = V η [Z (Y )] = (η) η i η j An estimate of the variance-covariance is available using the same MCMC.

51 Trace plot of edges Density of edges Density e+04 6e+04 e+05 Iterations edges Trace plot of 2 stars Density of 2 stars Density e+04 6e+04 e+05 Iterations stars

52 Figure 4: Regions of the parameter space of µ µ 2 : mean 2 stars parameter µ : mean edges parameter

53 Existence and uniqueness of MLE Let C be the convex hull of {Z (y) : y Y} - the convex hull of the discrete support points. Let int(c) be the interior of C.

54 Existence and uniqueness of MLE Let C be the convex hull of {Z (y) : y Y} - the convex hull of the discrete support points. Let int(c) be the interior of C. Result (Barndorff-Nielsen ) The MLE exists if, and only if, Z (y observed ) int(c) If it exists, it is unique and can be found by solving the likelihood equations or by direct optimization of L.

55 Figure : Enumeration of sufficient statistics for graphs with nodes. The circles are centered on the possible values and the area of the circle is proportional to the number of graphs with that value of the sufficient statistic. There are a total of 2,0,52 graphs. number of 2 stars number of edges

56 How can we tell if a model class is useful? Many aspects: Is the model-class itself able to represent a range of realistic networks? model degeneracy: small range of graphs covered as the parameters vary (Handcock 2003) What are the properties of different methods of estimation? e.g, MLE, psuedolikelihood, Bayesian framework computational failure: estimates do not exist for certain observable graphs Can we assess the goodness-of-fit of models? appropriate measures and tests (Besag 2000; Hunter, Goodreau, Handcock 200)

57 Existence and uniqueness of MC-MLE Geyer and Thompson (2) show the MC-MLE converges to the true MLE as the number of simulations increases. also produces estimates of the asymptotic covariance matrix, size of the MCMC induced error, etc. Let CO be the convex hull of sampled sufficient statistics. In practice, three cases: Z (y) int(co) C: MC-MLE exists and is unique 2 Z (y) int(co) but is in int(c): MC-MLE does not exist, even though MLE does 3 Z (y) int(c): MC-MLE and MLE do not exist

58 Figure : Enumeration of sufficient statistics for graphs with nodes. The circles are centered on the possible values and the area of the circle is proportional to the number of graphs with that value of the sufficient statistic. There are a total of 2,0,52 graphs. number of 2 stars number of edges

59 How can we tell if a model class is useful? Many aspects: Is the model-class itself able to represent a range of realistic networks? model degeneracy: small range of graphs covered as the parameters vary (Handcock 2003) What are the properties of different methods of estimation? e.g, MLE, psuedolikelihood, Bayesian framework computational failure: estimates do not exist for certain observable graphs Can we assess the goodness-of-fit of models? appropriate measures and tests (Besag 2000; Hunter, Goodreau, Handcock 200)

60 Goodness of fit intuition ERGM class exp{η g(y)}

61 Goodness of fit intuition ERGM class exp{η g(y)} y obs

62 Goodness of fit intuition ERGM (approx) class MLE exp{η g(y)} η y obs

63 Goodness of fit intuition ERGM (approx) Fitted class MLE ERGM exp{η g(y)} η exp{ η g(y)} y obs

64 Goodness of fit intuition ERGM (approx) Fitted class MLE ERGM exp{η g(y)} η exp{ η g(y)} y obs Randomly generated networks Ỹ, Ỹ2,...

65 Goodness of fit intuition ERGM (approx) Fitted class MLE ERGM exp{η g(y)} η exp{ η g(y)} y obs Randomly generated networks Ỹ, Ỹ2,... Question: How does y obs look as a representative of the sample Ỹ, Ỹ2,...?

66 The eyeball test The data: School 0: 205 Students Simulated network, model A: Simulated graph: By grade 0 2??

67 The eyeball test (cont d) The data: School 0: 205 Students Simulated network, model B: Simulated graph: By grade 0 2?? (Yikes!)

68 The models Model A: g(y) contains terms for # of edges Homophily effects of grade, sex, and race factors Main effects of grade, sex, and race factors i (.632)i EP i, where EP i =# edges with i shared partners Model B: g(y) contains terms for # of edges # of neighbors of the same sex (homophily effect) # of 2-stars # of triangles (Note: It was necessary to use MPLE to fit Model B)

69 Quantitative checks for goodness of fit A well-known example: Lamberteschi Pucci Guadagni Tornabuoni Albizzi Ginori Medici Acciaiuoli Salviati Bischeri Peruzzi Strozzi Castellani Ridolfi Barbadori Florentine marriage data Edge indicates marriage tie between families Sides=degree + 3 Color=degree Size=log(wealth) Pazzi

70 Quantitative checks for goodness of fit A well-known example: Lamberteschi Pucci Guadagni Tornabuoni Albizzi Ginori Medici Acciaiuoli Salviati Bischeri Peruzzi Strozzi Castellani Ridolfi Barbadori Florentine marriage data Edge indicates marriage tie between families Sides=degree + 3 Color=degree Size=log(wealth) Pazzi model <- ergm(flomarriage ~ edges + kstar(2))

71 Graphical GOF check: degree distribution model <- ergm(flomarriage ~ edges + kstar(2)) Goodness of fit diagnostics proportion of nodes degree

72 Graphical GOF: edgewise shared partner distribution model <- ergm(flomarriage ~ edges + kstar(2)) Goodness of fit diagnostics proportion of edges edge wise shared partners

73 Graphical GOF check: geodesic distance distribution model <- ergm(flomarriage ~ edges + kstar(2)) Goodness of fit diagnostics proportion of dyads NR minimum geodesic distance

74 GOF check: Examples from Add Health networks n=205 log odds for a node log odds for an edge log odds for a dyad log odds for a dyad degree log odds for a dyad edge wise shared partners n=220 School 44, Edges + Attributes + WSPartner(.5), free degree log odds for a dyad minimum geodesic distance degree shared partners minimum geodesic distance

75 MCMC Testing Significance tests based on comparing the observed value of a statistics to a null probability distribution. MCMC p values (2000) Besag and Clifford (), Besag

76 Illustration: Empirical evidence of competition among Darwin s Finches Island Finch A B C D E F G H I J K L M N O P Q Large ground finch Medium ground finch Small ground finch Sharp-beaked ground finch Cactus ground finch Large cactus ground finch Large tree finch Medium tree finch Small tree finch Vegetarian finch Woodpecker finch Mangrove finch Warbler finch Table: Darwin s finch data

77 MCMC Testing and p values Does the observed grouping of finch species on islands happened by random chance or if it was the result of a struggle in which only species which depended on different food sources could coexist on an island. To test this hypothesis, consider the test statistic S 2 = m(m ) sij 2, where m is the number of finch species, S = ( s ij ) = AA T, and A = ( a ij ) is the bipartite graph in the table. i j

78 Figure: Null distribution of the test statistic S 2

79 Figure: Number of pairs of finches sharing x islands, x = 0,,...,

80 Summary Network representations intersect with most sciences Sparse models are being used to capture structural properties The models must depend on the scientific objective. Some seemingly simple models are not so. The inclusion of attributes is very important actor attributes dyad attributes e.g. homophily, race, location structural terms e.g. transitive homophily

81 We need better and more local models for social networks: e.g. nearest neighbor" ideas for local dependence Baddeley and Moller () Snijders, Robins, Pattison, Handcock (2006) Taking into account class membership is very important known classes block models" Wang and Wong ()

82 latent class and trait models are important an underlying latent social space" of actors Hoff, Raftery and Handcock (2002) Hoff (2003, 2004,...) latent class models are very promising Nowicki and Snijders (200) latent class and trait models Handcock, Raftery, Tantrum (200); Krivitsky et. al (200) Hoff (2005, 200) grade of membership models Airoldi, Blei, Feinberg (200)

Statistical Models for Social Networks with Application to HIV Epidemiology

Statistical Models for Social Networks with Application to HIV Epidemiology Mark S. Handcock Department of Statistics University of Washington Joint work with Pavel Krivitsky Martina Morris and the U.