Lifted and Constrained Sampling of Attributed Graphs with Generative Network Models

1 Lifted and Constrained Sampling of Attributed Graphs with Generative Network Models Jennifer Neville Departments of Computer Science and Statistics Purdue University (joint work with Pablo Robles Granda, and Sebastian Moreno)

2 Statistical network methods/models are critical for: hypothesis testing; simulation of system behavior over time; understanding key system properties and causal effects.

3 Many of the current generative network models are edge-based

4 Edge-based generative network models (GNMs): a matrix of edge probabilities P is used to sample the adjacency matrix A of a graph G = (V, E). Models differ with respect to how they produce the edge probabilities in P (e.g., ER, SBM, CL, KPGM).
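The edge-based view on this slide can be sketched in a few lines: a model supplies a probability matrix P, and a graph is drawn with one independent Bernoulli trial per node pair. This is a minimal illustration, not code from the talk. The function names are hypothetical, and the Chung-Lu form used here (d_u * d_v / sum of degrees, clipped to 1) is one common parameterization.

```python
import random

def er_probabilities(n, p):
    """Erdos-Renyi (ER): every node pair gets the same edge probability p."""
    return [[p] * n for _ in range(n)]

def chung_lu_probabilities(degrees):
    """Chung-Lu (CL): P[u][v] = d_u * d_v / sum(d), clipped to at most 1."""
    total = sum(degrees)
    return [[min(du * dv / total, 1.0) for dv in degrees] for du in degrees]

def sample_adjacency(P, rng):
    """Draw one independent Bernoulli trial per (directed) node pair."""
    return [[1 if rng.random() < p else 0 for p in row] for row in P]

rng = random.Random(0)
A_er = sample_adjacency(er_probabilities(5, 0.3), rng)
P_cl = chung_lu_probabilities([3, 2, 2, 1])
A_cl = sample_adjacency(P_cl, rng)
```

Only the construction of P changes between models; the sampling step is shared, which is what makes these models "edge-based".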

5 These models represent/learn probability distributions over the graph space. [Figure: a distribution over graphs G, with regions labeled "Likely networks" and "Unlikely networks".]

6 Example: mixed Kronecker Product Graph Models

7 mixed KPGM (Moreno et al. KDD 13): Starting from an initiator matrix of b x b Bernoulli parameters, a matrix of size b^ℓ x b^ℓ is constructed using Kronecker multiplication. Example: K = 6; ℓ = 4. Then the Bernoulli trials for the subsequent K − ℓ Kronecker multiplications are tied. Result: each edge occurs with the same expectation as in KPGM, but overall variance increases.

9 mixed KPGM (Moreno et al. KDD 13): Starting from an initiator matrix of b x b Bernoulli parameters, a matrix of size b^ℓ x b^ℓ is constructed using Kronecker multiplication. Example: K = 6; ℓ = 4. Then the Bernoulli trials for the subsequent K − ℓ Kronecker multiplications are tied. Result: each edge occurs with the same expectation as in KPGM, but overall variance increases. The hierarchical structure, with tied parameters, allows mKPGM to better capture complex properties (e.g., clustering and variability) observed in real networks. Other hierarchical GNMs: BTER (Seshadhri et al., 2012), biSBM (Larremore et al., 2014), nested SBM (Peixoto, 2014).
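The mKPGM construction described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: `kron`, `bernoulli`, and `sample_mkpgm` are hypothetical names, and the initiator values are made up for the example.

```python
import random

def kron(A, B):
    """Kronecker product of two square matrices given as lists of lists."""
    b = len(B)
    n = len(A) * b
    return [[A[i // b][j // b] * B[i % b][j % b] for j in range(n)]
            for i in range(n)]

def bernoulli(P, rng):
    """One independent Bernoulli trial per matrix cell."""
    return [[1 if rng.random() < p else 0 for p in row] for row in P]

def sample_mkpgm(theta, K, ell, rng):
    # Untied part: ell - 1 Kronecker products give a b^ell x b^ell
    # probability matrix, which is sampled once.
    P = theta
    for _ in range(ell - 1):
        P = kron(P, theta)
    G = bernoulli(P, rng)
    # Tied part: the sampled 0/1 matrix (not the probabilities) is expanded,
    # so every descendant of a zero cell stays zero.
    for _ in range(K - ell):
        G = bernoulli(kron(G, theta), rng)
    return G

theta = [[0.9, 0.6], [0.6, 0.2]]   # illustrative initiator, b = 2
G = sample_mkpgm(theta, K=6, ell=4, rng=random.Random(0))
```

Setting `ell = K` skips the tied loop and reduces the sketch to plain KPGM sampling; smaller `ell` ties more levels.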

10 GNMs are typically defined procedurally (e.g., through a prescribed sampling process), which makes it difficult to compare model properties.*
* For a nice overview, see: A. Jacobs, A. Clauset (2014). A unified view of generative models for networks: models, methods, opportunities, and challenges. In NIPS 2014 Workshop on Networks: From Graphs to Rich Data.

11 Observation: mKPGM can be represented as a Bayesian network. [Figure: block variables Z^[0] and Z^[1] with probabilities P^[0] and P^[1], leading to the edge probabilities P and the edges A.]

12 Observation: mKPGM can be represented as a Bayesian network. [Figure: block variables Z^[0] and Z^[1] with probabilities P^[0] and P^[1], leading to the edge probabilities P and the edges A.] The final BN N consists of all the RVs Z^[0], Z^[1], ..., Z^[K − ℓ] and their associated probabilities.

13 Observation: mKPGM can be represented as a Bayesian network. [Figure: super-block level 0 with variables Z_i^[0]; block level 1 with variables Z_j^[1]; edge level 2 with variables A_uv.] The final BN N consists of all the RVs Z^[0], Z^[1], ..., Z^[K − ℓ] and their associated probabilities.

14 What is the advantage of representing hierarchical GNMs as Bayesian networks?

15 Utility of BN representation for GNMs (Robles et al. IJCAI 17)

Facilitates comparison between hierarchical GNM models.
Enables development of generalized sampling algorithms that are provably correct and efficient.
Can easily identify parametric symmetries that enable lifted inference.
Context-specific dependence sparsifies the set of RVs to be sampled.

Difficult to compare (procedural definitions):

mKPGM Sampling(Θ, K, ℓ) {
  Compute P[0] = Θ^[ℓ]
  Sample G[0] ~ Bernoulli(P[0])
  For l = 1 ... K − ℓ:
    Set P[l] = G[l−1] ⊗ Θ
    Sample G[l] ~ Bernoulli(P[l])
  Return graph G[l]
}

BTER Sampling( ) {
  Preprocess: create node groups
  Link within-block nodes with ER model
  Link between-block nodes with CL model
  Return graph
}

Comparison feasible (BN form of Fig. 1b):

[Figure: Fig. 1b and its BN form, with block variables Z_i^[0] (super-block level 0), Z_j^[1] (block level 1), Z_k^[2], and edge variables A_uv; one panel shows symmetries among the CPD tables, another shows sparsification when a parent is sampled as Z = 0.]

Legend for the symmetries panel. Fill (color/blend): represents (latent) random variables (RVs) with the same value. Arrows: represent dependencies among RVs. Tables: represent the CPDs of the RVs. Identifying symmetries is not a trivial task.

Legend for the sparsification panel. Squares: represent binary random variables for blocks and graph links; red (sampled Z = 1), grey (sampled Z = 0); crosshatch indicates Z = 0 (sampling prevented) due to context-specific dependence on the parent RV value. Arrows: represent dependencies among RVs due to block membership.

16 Sampling algorithm for GNM transformed to BN (Robles et al. IJCAI 17). Average-case complexity: O(|E|).
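One way to see why sparsification helps: if a block variable is sampled as Z = 0, none of its descendants need to be sampled at all. The sketch below is only an illustration of that idea, not the IJCAI 17 algorithm (which also exploits symmetries for lifted sampling). It stores only active blocks and expands each with the same b x b initiator; the function name and the tied-from-level-0 setup are assumptions made for brevity.

```python
import random

def sample_sparse_hierarchy(theta, levels, rng):
    """Hierarchical block sampling that visits only children of active blocks."""
    b = len(theta)
    # Level 0: one Bernoulli trial per cell of the initiator matrix.
    active = [(i, j) for i in range(b) for j in range(b)
              if rng.random() < theta[i][j]]
    for _ in range(levels - 1):
        nxt = []
        for (r, c) in active:          # blocks with Z = 0 were never stored
            for i in range(b):
                for j in range(b):
                    if rng.random() < theta[i][j]:
                        nxt.append((r * b + i, c * b + j))
        active = nxt
    return active                      # edge list (u, v) at the final level

edges = sample_sparse_hierarchy([[0.9, 0.6], [0.6, 0.2]], 6, random.Random(0))
```

The work at each level is proportional to the number of active blocks at the previous level rather than to the full b^K x b^K matrix, which is the intuition behind a cost that scales with the number of edges.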

17 So, given that we have models that can efficiently generate networks with varying structure, how can we extend these to generate networks with correlated attributes?

18 [Diagram: a model P generating graphs G.]

19 P(X, G): P_E(X, E | Θ_E, Θ_X) = P_E(E | Θ_E) P(X | Θ_X). How to learn and effectively sample from P(X, G)? Discrete space with O(2^(|V|^2 + |V| p)) structures. [Diagram: attributes X and graph G.]
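To get a feel for the size of this space, the exponent can be computed directly. The sketch below reads the bound on this slide as 2^(|V|^2 + |V| p), and assumes directed graphs with self-loops allowed (|V|^2 possible edges) and p binary attributes per node; the function name is hypothetical.

```python
def log2_num_attributed_graphs(num_nodes, num_attrs):
    """Exponent of 2 in the count of attributed graphs.

    Assumes |V|^2 possible directed edges (self-loops included) and
    num_attrs binary attributes per node.
    """
    return num_nodes ** 2 + num_nodes * num_attrs

bits = log2_num_attributed_graphs(100, 3)   # 100^2 + 100 * 3
```

Even for 100 nodes with 3 attributes the exponent is 10300, so enumerating or naively searching the joint space is out of the question.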

20 There has been work on learning joint network models, e.g., PRMs (Getoor et al. 03), ERGMs (Wasserman & Pattison 96), MAGs (Kim & Leskovec 12). But since these were developed as descriptive models, it is difficult to use them to efficiently generate networks with varying structure.

21 Issues with learning/sampling from full joint

Learning: A single network does not provide enough data to easily learn a full joint model. Solution: use marginals and combine them together to approximate sampling from the full joint distribution.

Sampling: Hierarchical network models concentrate mass in smaller regions to capture edge dependence; this makes it difficult to identify likely networks in the sampling space. Solution: develop an approach that can avoid the sparse regions to produce reasonable samples from the joint distribution efficiently.

[Figure: the space of attributed graphs, with regions of likely graph structures and likely attributed graphs.]

22 Sampling attributed networks: Sample attribute values for nodes; use graph model to propose edges; then sample edges conditioned on node attributes

23 [Diagram: a generative network model for G and attribute marginals for X together define P(X, G); given an attribute vector x, the output is a sampled network.]

24 Constrained sampling of attributed graphs (CSAG) (Robles et al. KDD 16)
1. Sample attributes from P(X)
2. Sample from the [hierarchical] model P(G)
3. Sample edges by constraining G given X
4. If attribute-relations are captured accurately, done
5. Else go to 2; back up a level in the hierarchy to further constrain sampling at that level
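The propose-then-constrain idea in the steps above can be illustrated with a much simpler stand-in. The sketch below is not CSAG: it replaces the maximum-entropy and linear-programming machinery with a plain quota on edge types, and every name in it (`constrained_edge_sample`, `target_mix`, the uniform proposal) is hypothetical. An edge's "type" is the number of endpoints whose binary attribute equals 1.

```python
import random

def constrained_edge_sample(propose_edge, x, target_mix, num_edges, rng,
                            max_tries=100000):
    """Accept proposed edges only while their attribute type has quota left.

    target_mix[t] is the desired fraction of edges whose endpoints carry
    t attribute values equal to 1 (t = 0, 1, 2).
    """
    quota = [round(f * num_edges) for f in target_mix]
    edges = set()
    for _ in range(max_tries):
        if sum(quota) == 0:
            break
        u, v = propose_edge(rng)       # stand-in for the graph model P(G)
        t = x[u] + x[v]
        if u != v and (u, v) not in edges and quota[t] > 0:
            edges.add((u, v))
            quota[t] -= 1
    return edges, quota

rng = random.Random(0)
x = [0, 0, 1, 1]                       # stand-in for attributes drawn from P(X)
uniform = lambda r: (r.randrange(4), r.randrange(4))
edges, leftover = constrained_edge_sample(uniform, x, (0.5, 0.0, 0.5), 4, rng)
```

A nonzero `leftover` would signal that the proposal distribution cannot supply enough edges of some type, which loosely corresponds to the case in step 5 where sampling has to be constrained at a higher level.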

25 Constrained sampling of attributed graphs (CSAG) (Robles et al. KDD 16)

We developed a two-stage constrained sampling process using the BN view of mKPGMs.

CSAG constraints bias the search to regions of the space with higher-likelihood networks.

Replace basic sampling at the edge level with maximum entropy (ME) constraints to sample edges that match the target attribute correlation.

If matching the correlation (ρ_OUT ≈ ρ_IN) is not possible: back up a level in the hierarchy and use linear programming (LP) to constrain block sampling to facilitate a better match on attribute correlation.

[Flowchart: starting from G_0, basic sampling or LP sampling at the super-block and block levels yields G_{K-l-2} and G_{K-l-1}; ME sampling at the edge level yields G_{K-l}; if ρ_OUT ≈ ρ_IN, return G_{K-l}, else back up a level.]

26 CSAG results: network structure CSAG produces network structure that closely matches that of underlying mkpgm marginal

27 CSAG results: attribute correlation. CSAG produces attribute correlations that closely match those of the input data.

28 CSAG results: overall. CSAG achieves the lowest joint error.

29 Why does it work?

Computational insight: Representations and objective functions that are easy to specify (e.g., full joint, edge-based likelihood) do not always match well to the goals of network learning.

Potential solution: Using constraints on the model space and/or objectives adds inductive bias (it can be easier to identify/guide towards properties of interest than to construct better representations/objective functions).

Current work: Use the CSAG generation method to study the transferability of relational models learned from one (sub)network to another (sub)network, based on network structure and attribute correlation.

30 Regularization vs. probabilistic modeling (Zeno & Neville, MLG 16). Simulation experiments based on semi-synthetic data generated from CSAG (with parameters learned from real social network data). [Figure: two panels, "Weighted vote relational neighbor" and "Relational Bayes collective classifier", showing AUC as a function of attribute correlation and network density.]

31 Questions?
