Nonparametric Bayesian Matrix Factorization for Assortative Networks

Mingyuan Zhou
IROM Department, McCombs School of Business
Department of Statistics and Data Sciences
The University of Texas at Austin
23rd European Signal Processing Conference (EUSIPCO 2015)
Nice, France, September 4, 2015

Table of Contents
- Introduction
- Gamma process edge partition model
- Example results
- Improve EPM to model dissortativity
- Conclusions

Introduction

Network community detection and link prediction
We focus on unweighted, undirected relational networks, which can be represented as binary symmetric adjacency matrices.
Non-probabilistic community detection algorithms (see Fortunato, 2010, for a comprehensive review)
- Examples:
  - Modularity maximization (Newman and Girvan, 2004)
  - Clique percolation (Palla et al., 2005)
- Restrictions:
  - Usually cannot be used to generate networks or to predict missing edges (links)
  - Often require tuning the number of communities

Network community detection and link prediction
Generative network models
- Model assumptions are clearly stated in a hierarchical model
- Generate random networks
- Detect latent communities
- Detect community-community interactions
- Predict missing edges (links)
- Automatically infer the number of communities with nonparametric Bayesian priors

Assortative and dissortative relational networks
Assortativity (also known as homophily):
- A subset of nodes that are densely connected to each other but sparsely connected to the other nodes is often considered to form a community.
- Example: in a social network, a community may consist of a group of closely related friends.

Assortative and dissortative relational networks
Dissortativity (also known as stochastic equivalence):
- A subset of nodes that are sparsely connected to each other but densely connected to another subset of nodes is often considered to form a community.
- Example: in a predator-prey network, a community may consist of a group of animals that play similar roles in the ecosystem but do not necessarily prey on each other.

Probabilistic models for network analysis
- Latent class models
  - Stochastic blockmodel (Holland et al., 1983; Nowicki and Snijders, 2001)
  - Infinite relational model (Kemp et al., 2006)
  - Mixed-membership stochastic blockmodel, MMSB (Airoldi et al., 2008)
- Latent factor models
  - Eigenmodel (Hoff, 2008)
  - Infinite latent feature relational model (Miller et al., 2009; Morup et al., 2011)
  - Community-affiliation graph model (Yang and Leskovec, 2012, 2014)
- Detection of disjoint or overlapping communities
- Interpretation of latent representations
- Prediction of missing edges

Gamma process edge partition model

Detect overlapping communities and predict missing edges
- As a latent factor model:
  - Connect each binary edge to a latent count via the Bernoulli-Poisson link
  - Factorize the latent count matrix
- As a latent class model:
  - Explicitly partition each observed edge into multiple latent communities
  - Implicitly assign a node to multiple communities based on how its edges are partitioned (overlapping communities)
- Designed to analyze assortative networks
- Can be generalized to capture dissortativity by modeling community-community interactions

Hierarchical model
$$b_{ij} = \mathbf{1}(m_{ij} \ge 1), \quad m_{ij} = \sum_{k=1}^{K} m_{ijk}, \quad m_{ijk} \sim \mathrm{Po}(r_k \phi_{ik} \phi_{jk}),$$
$$\phi_{ik} \sim \mathrm{Gam}(a_i, 1/c_i), \quad a_i \sim \mathrm{Gam}(e_0, 1/f_0),$$
$$r_k \sim \mathrm{Gam}(\gamma_0/K, 1/c_0), \quad \gamma_0 \sim \mathrm{Gam}(e_1, 1/f_1).$$
The Bernoulli-Poisson link
$$b_{ij} \sim \mathrm{Bernoulli}\left[1 - \exp\left(-\sum_{k=1}^{K} r_k \phi_{ik} \phi_{jk}\right)\right].$$
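
A minimal Python sketch of the generative process above, truncated at a finite K. It is an illustration under stated assumptions, not the author's code: the function name `sample_epm_network` and the default hyperparameter values are hypothetical, and a_i, c_i, c_0, and gamma_0 are held fixed rather than drawn from their gamma priors. Edges are drawn from the marginal Bernoulli-Poisson form, so the latent counts m_ijk are never instantiated.

```python
import math
import random


def sample_epm_network(n_nodes, n_communities, gamma0=4.0, c0=1.0,
                       a=0.5, c=1.0, seed=0):
    """Draw (r, phi, adjacency) from a truncated EPM with fixed hyperparameters.

    Assumption: a_i = a and c_i = c for every node, and gamma0, c0 are constants.
    """
    rng = random.Random(seed)
    K = n_communities
    # r_k ~ Gam(gamma0 / K, 1 / c0); gammavariate takes (shape, scale).
    r = [rng.gammavariate(gamma0 / K, 1.0 / c0) for _ in range(K)]
    # phi_ik ~ Gam(a, 1 / c)
    phi = [[rng.gammavariate(a, 1.0 / c) for _ in range(K)]
           for _ in range(n_nodes)]
    adjacency = [[0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            rate = sum(r[k] * phi[i][k] * phi[j][k] for k in range(K))
            link_prob = -math.expm1(-rate)  # 1 - exp(-rate), numerically stable
            if rng.random() < link_prob:
                adjacency[i][j] = adjacency[j][i] = 1  # undirected, no self-loops
    return r, phi, adjacency


r, phi, adjacency = sample_epm_network(n_nodes=20, n_communities=4)
n_edges = sum(adjacency[i][j] for i in range(20) for j in range(i + 1, 20))
print("edges drawn:", n_edges)
```

Only the upper triangle is sampled and then mirrored, which matches the restriction to unweighted undirected networks.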

The Bernoulli-Poisson link
- Thresholding a count variable to obtain a binary variable:
  $$b = \mathbf{1}(m \ge 1), \quad m \sim \mathrm{Po}(\lambda). \tag{1}$$
- Marginal likelihood of the Bernoulli-Poisson link:
  $$b \sim \mathrm{Ber}\left(1 - e^{-\lambda}\right).$$
- The conditional posterior of the latent count $m$ follows a truncated Poisson distribution, expressed as
  $$(m \mid b, \lambda) \sim b \cdot \mathrm{Po}_+(\lambda).$$
- Use rejection sampling to sample from the truncated Poisson distribution.
- Conceptual and computational advantages over the probit and logistic links.
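
One way to realize the rejection-sampling step is sketched below in Python. Here Po_+(lambda) is read as the Poisson distribution restricted to m >= 1. The slide does not say which rejection scheme is used, so the two-branch proposal is an assumption: for lambda >= 1 it redraws a plain Poisson until the draw is positive, and for lambda < 1 it proposes m = y + 1 with y ~ Po(lambda) and accepts with probability 1/(y + 1). All function names are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small to moderate rates.

    exp(-lam) underflows for very large lam, so this is a demo helper only.
    """
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def sample_truncated_poisson(lam, rng):
    """Draw m from Po_+(lam), i.e. a Poisson conditioned on m >= 1."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if lam >= 1.0:
        while True:  # acceptance probability is 1 - exp(-lam)
            m = sample_poisson(lam, rng)
            if m >= 1:
                return m
    while True:  # shifted proposal for small rates
        y = sample_poisson(lam, rng)
        if rng.random() < 1.0 / (y + 1):
            return y + 1


def sample_latent_count(b, lam, rng):
    """(m | b, lam) ~ b * Po_+(lam): zero when b = 0, truncated Poisson when b = 1."""
    return sample_truncated_poisson(lam, rng) if b == 1 else 0


rng = random.Random(0)
draws = [sample_latent_count(1, 0.5, rng) for _ in range(5)]
print(draws)  # every entry is at least 1
```

The shifted proposal is correct because the ratio of the truncated Poisson mass at m to the proposal mass at m is proportional to 1/m, so accepting with probability 1/(y + 1) leaves exactly the target distribution.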

Overlapping community structures
- Edge partition model (EPM) under data augmentation:
  $$m_{ij} = \sum_{k} m_{ijk}, \quad m_{ijk} \sim \mathrm{Po}(r_k \phi_{ik} \phi_{jk}).$$
- $m_{ijk}$ represents how often nodes $i$ and $j$ interact due to their affiliations with community $k$.
- $r_k \phi_{ik} \sum_{j \ne i} \phi_{jk}$ measures how strongly node $i$ is affiliated with community $k$, and the latent count
  $$m_{i \cdot k} := \sum_{j=i+1}^{N} m_{ijk} + \sum_{j=1}^{i-1} m_{jik} \tag{2}$$
  represents how often node $i$ is connected to the other nodes due to its affiliation with community $k$.
- Assign node $i$ to multiple communities in $\{k : m_{i \cdot k} \ge 1\}$, or (hard) assign it to a single community using either $\arg\max_k \left(r_k \phi_{ik} \sum_{j \ne i} \phi_{jk}\right)$ or $\arg\max_k \left(m_{i \cdot k}\right)$.
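
The assignment rules above can be sketched in a few lines of Python. The data layout (a dict from node pairs (i, j) with i < j to a list of K latent counts) and the function names are assumptions made for this illustration, and the counts in the example are made up.

```python
def node_community_counts(edge_counts, n_nodes, n_communities):
    """Compute m_{i.k}: each m_ijk is credited to both endpoints i and j."""
    totals = [[0] * n_communities for _ in range(n_nodes)]
    for (i, j), counts in edge_counts.items():
        for k, m in enumerate(counts):
            totals[i][k] += m
            totals[j][k] += m
    return totals


def overlapping_assignment(totals):
    """Node i belongs to every community k with m_{i.k} >= 1."""
    return [{k for k, m in enumerate(row) if m >= 1} for row in totals]


def hard_assignment(totals):
    """Node i is assigned to argmax_k m_{i.k} (first index wins on ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in totals]


# Hypothetical latent counts for a 4-node path graph and K = 2 communities.
edge_counts = {(0, 1): [2, 0], (1, 2): [1, 1], (2, 3): [0, 3]}
totals = node_community_counts(edge_counts, n_nodes=4, n_communities=2)
print(totals)                          # [[2, 0], [3, 1], [1, 4], [0, 3]]
print(overlapping_assignment(totals))  # [{0}, {0, 1}, {0, 1}, {1}]
print(hard_assignment(totals))         # [0, 0, 1, 1]
```

Nodes 1 and 2 sit in both communities under the overlapping rule because the edge between them was partitioned across both.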

Related model: community-affiliation graph model (AGM)
- A restricted version of the gamma process EPM, expressed as
  $$b_{ij} \sim \mathrm{Ber}\left[1 - e^{-\epsilon} \exp\left(-\sum_{k} r_k \phi_{ik} \phi_{jk}\right)\right],$$
  where $\epsilon \in \mathbb{R}_+$ and $\phi_{ik} \in \{0, 1\}$, could be considered a nonparametric Bayesian generalization of the community-affiliation graph model (AGM) of Yang and Leskovec (2012, 2014).
- It is argued in the AGM work that all previous community detection methods, including clique percolation and MMSB, would fail to detect communities with dense overlaps, due to a hidden assumption that a community's overlapping parts are less densely connected than its non-overlapping ones.
- The EPM does not make such a restrictive assumption, and, unlike the AGM, it does not restrict $\phi_{ik}$ to be binary.
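
A small Python illustration of the restricted link function above, using binary memberships. The function name and the numeric values of r and epsilon are hypothetical; the point is only that, under this form, two nodes become more likely to link as they share more communities, which is the opposite of the "sparse overlap" assumption discussed above.

```python
import math


def agm_link_probability(memberships_i, memberships_j, r, epsilon):
    """1 - exp(-epsilon) * exp(-sum_k r_k * phi_ik * phi_jk) with binary phi."""
    rate = sum(r_k * z_i * z_j
               for r_k, z_i, z_j in zip(r, memberships_i, memberships_j))
    return 1.0 - math.exp(-epsilon) * math.exp(-rate)


r = [1.0, 1.0]   # illustrative community strengths
epsilon = 0.01   # illustrative background rate

share_none = agm_link_probability([1, 0], [0, 1], r, epsilon)
share_one = agm_link_probability([1, 0], [1, 1], r, epsilon)
share_two = agm_link_probability([1, 1], [1, 1], r, epsilon)
print(share_none < share_one < share_two)  # True
```

With no shared community the probability reduces to 1 - exp(-epsilon), so epsilon acts as a background link rate.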

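Data augmentation and marginalization
- Using the Poisson additive property, we have
  $$m_{i \cdot k} \sim \mathrm{Po}\left(r_k \phi_{ik} \sum_{j \ne i} \phi_{jk}\right) \ \forall i, \quad m_{\cdot \cdot k} \sim \mathrm{Po}\left(r_k \frac{\sum_i \sum_{j \ne i} \phi_{ik} \phi_{jk}}{2}\right).$$
- Marginalizing out $\phi_{ik}$ leads to
  $$m_{i \cdot k} \sim \mathrm{NB}(a_i, p_{ik}), \quad p_{ik} := \frac{r_k \sum_{j \ne i} \phi_{jk}}{c_i + r_k \sum_{j \ne i} \phi_{jk}}.$$
- Marginalizing out $r_k$ leads to
  $$m_{\cdot \cdot k} \sim \mathrm{NB}(\gamma_0 / K, p_k), \quad p_k := \frac{\sum_i \sum_{j \ne i} \phi_{ik} \phi_{jk}}{2 c_0 + \sum_i \sum_{j \ne i} \phi_{ik} \phi_{jk}}.$$
- Using these equations, we can develop closed-form Gibbs sampling update equations for all model parameters.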
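
The first marginalization can be checked by a short worked derivation. The shorthand $s_{ik}$ is introduced here only for illustration, and the negative binomial is assumed to be parameterized so that $\mathrm{NB}(a, p)$ has mass proportional to $p^m (1 - p)^a$.

```latex
% Shorthand (not in the original slides): s_{ik} := r_k \sum_{j \ne i} \phi_{jk},
% and m := m_{i \cdot k}, so that m \mid \phi_{ik} \sim \mathrm{Po}(\phi_{ik} s_{ik})
% and \phi_{ik} \sim \mathrm{Gam}(a_i, 1/c_i).
\begin{align*}
P(m \mid a_i, c_i, s_{ik})
  &= \int_0^\infty
     \frac{(\phi_{ik} s_{ik})^{m} e^{-\phi_{ik} s_{ik}}}{m!}
     \cdot
     \frac{c_i^{a_i} \phi_{ik}^{a_i - 1} e^{-c_i \phi_{ik}}}{\Gamma(a_i)}
     \, d\phi_{ik} \\
  &= \frac{s_{ik}^{m} c_i^{a_i}}{m! \, \Gamma(a_i)}
     \int_0^\infty \phi_{ik}^{a_i + m - 1} e^{-(c_i + s_{ik}) \phi_{ik}} \, d\phi_{ik} \\
  &= \frac{\Gamma(a_i + m)}{m! \, \Gamma(a_i)}
     \left(\frac{s_{ik}}{c_i + s_{ik}}\right)^{m}
     \left(\frac{c_i}{c_i + s_{ik}}\right)^{a_i}
   = \mathrm{NB}(m; a_i, p_{ik}),
   \quad p_{ik} = \frac{s_{ik}}{c_i + s_{ik}}.
\end{align*}
```

The integrand in the second line is an unnormalized $\mathrm{Gam}(a_i + m_{i \cdot k}, 1/(c_i + s_{ik}))$ density, which is why the conditional posterior of $\phi_{ik}$ is also available in closed form. The same argument with $r_k \sim \mathrm{Gam}(\gamma_0/K, 1/c_0)$ and rate $r_k \sum_i \sum_{j \ne i} \phi_{ik} \phi_{jk} / 2$ gives the expression for $p_k$.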

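Gibbs sampling
- Sample $m_{ij}$:
  $$(m_{ij} \mid -) \sim b_{ij} \, \mathrm{Po}_+\left(\sum_{k=1}^{K} r_k \phi_{ik} \phi_{jk}\right).$$
- Sample $m_{ijk}$:
  $$(\{m_{ijk}\}_{k=1:K} \mid -) \sim \mathrm{Mult}\left(m_{ij}; \frac{\{r_k \phi_{ik} \phi_{jk}\}_{k=1:K}}{\sum_{k} r_k \phi_{ik} \phi_{jk}}\right).$$
- Sample $a_i$:
  $$(l_{ik} \mid -) \sim \sum_{t=1}^{m_{i \cdot k}} \mathrm{Ber}\left(\frac{a_i}{a_i + t - 1}\right), \quad (a_i \mid -) \sim \mathrm{Gam}\left(e_0 + \sum_k l_{ik}, \frac{1}{f_0 - \sum_k \ln(1 - p_{ik})}\right).$$
- Sample $\phi_{ik}$:
  $$(\phi_{ik} \mid -) \sim \mathrm{Gam}\left(a_i + m_{i \cdot k}, \frac{1}{c_i + r_k \sum_{j \ne i} \phi_{jk}}\right).$$
- Sample $\gamma_0$, $c_i$ and $c_0$.
- Sample $r_k$:
  $$(r_k \mid -) \sim \mathrm{Gam}\left(\frac{\gamma_0}{K} + m_{\cdot \cdot k}, \frac{1}{c_0 + \sum_i \sum_{j \ne i} \frac{1}{2} \phi_{ik} \phi_{jk}}\right).$$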
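
A partial Gibbs sweep is sketched below in Python to show how the count, phi, and r updates fit together. It is a simplified illustration, not the author's implementation: a_i, c_i, c_0, and gamma_0 are held fixed (their updates are omitted), all pairs are treated as observed, the helper names are hypothetical, and the Poisson helper uses Knuth's method, which is only suitable for moderate rates.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's method; fine for the moderate rates in this demo."""
    threshold, count, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def truncated_poisson(lam, rng):
    """Draw from Po_+(lam), the Poisson restricted to m >= 1, by rejection."""
    while True:
        if lam >= 1.0:
            m = poisson(lam, rng)
            if m >= 1:
                return m
        else:
            y = poisson(lam, rng)
            if rng.random() < 1.0 / (y + 1):
                return y + 1


def multinomial(total, weights, rng):
    """Split `total` into len(weights) counts with the given unnormalized weights."""
    counts = [0] * len(weights)
    for k in rng.choices(range(len(weights)), weights=weights, k=total):
        counts[k] += 1
    return counts


def gibbs_sweep(adjacency, r, phi, a, c, gamma0, c0, rng):
    """Update latent counts, phi, and r in place; return (m_{i.k}, m_{..k})."""
    n, K = len(phi), len(r)
    node_counts = [[0] * K for _ in range(n)]  # m_{i.k}
    comm_counts = [0] * K                      # m_{..k}
    for i in range(n):
        for j in range(i + 1, n):
            if adjacency[i][j] == 0:
                continue  # b_ij = 0 forces m_ij = 0
            # Tiny floor guards against underflowed gamma draws.
            weights = [max(r[k] * phi[i][k] * phi[j][k], 1e-300) for k in range(K)]
            m_ij = truncated_poisson(sum(weights), rng)
            for k, m in enumerate(multinomial(m_ij, weights, rng)):
                node_counts[i][k] += m
                node_counts[j][k] += m
                comm_counts[k] += m
    for k in range(K):
        for i in range(n):  # phi_ik | - ~ Gam(a_i + m_{i.k}, 1 / (c_i + r_k sum_{j != i} phi_jk))
            others = sum(phi[j][k] for j in range(n) if j != i)
            phi[i][k] = rng.gammavariate(a[i] + node_counts[i][k],
                                         1.0 / (c[i] + r[k] * others))
        col_sum = sum(phi[i][k] for i in range(n))
        col_sq = sum(phi[i][k] ** 2 for i in range(n))
        pair_sum = 0.5 * (col_sum ** 2 - col_sq)  # (1/2) sum_i sum_{j != i} phi_ik phi_jk
        r[k] = rng.gammavariate(gamma0 / K + comm_counts[k], 1.0 / (c0 + pair_sum))
    return node_counts, comm_counts
```

Each latent count m_ijk is credited to both endpoints, so the node-level counts for a community always sum to twice the community-level count.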

Example results

Synthetic assortative network
- Four communities with dense intra-community connections.
- The 2nd community overlaps with both the 1st and 3rd ones.
Figure: Comparison of four algorithms' abilities to recover the ground-truth link probabilities using 80% of the pairs of nodes randomly selected from a synthetic relational network. Panels: (a) ground truth, (b) adjacency matrix, (c) IRM, (d) Eigenmodel, (e) AGM, (f) GP-EPM. The number of features for the Eigenmodel is set as K = 4.

- The infinite relational model (IRM) accurately captures the community structures but produces cartoonish blocks.
- The Eigenmodel somewhat overfits the data.
- The AGM produces some undesired artifacts.
- The gamma process EPM (GP-EPM) provides the reconstruction that looks most similar to the ground truth.

Both the GP-EPM and the Eigenmodel perform well and clearly outperform the IRM and AGM in missing link prediction, as measured by both the area under the ROC curve and the area under the precision-recall (PR) curve.

Table: Comparison of four algorithms' abilities to predict missing edges of a synthetic assortative network. The number of features for the Eigenmodel is set as K = 4.

Model        AUC-ROC            AUC-PR
IRM          0.9680 ± 0.0073    0.8636 ± 0.0448
Eigenmodel   0.9746 ± 0.0066    0.9073 ± 0.0236
AGM          0.9291 ± 0.0184    0.8166 ± 0.0470
GP-EPM       0.9746 ± 0.0056    0.9042 ± 0.0270
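
For readers who want to reproduce this kind of evaluation, the sketch below shows one way to hold out node pairs and score predictions by AUC-ROC in Python. It is not the evaluation code behind the table: the helper names are hypothetical, the 80/20 split mirrors the setup described for the figures, and the AUC uses the simple pairwise (Mann-Whitney) form, which is quadratic in the number of held-out pairs.

```python
import random


def split_node_pairs(n_nodes, train_fraction=0.8, seed=0):
    """Randomly split all unordered node pairs (i < j) into train and test sets."""
    pairs = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)]
    random.Random(seed).shuffle(pairs)
    cut = round(train_fraction * len(pairs))
    return pairs[:cut], pairs[cut:]


def auc_roc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


train_pairs, test_pairs = split_node_pairs(n_nodes=10)
print(len(train_pairs), len(test_pairs))  # 36 9

# Toy held-out labels (1 = edge present) and predicted link probabilities.
print(auc_roc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))  # 0.75
```

In practice the scores would be the model's estimated link probabilities for the held-out pairs, and the labels would be the corresponding entries of the adjacency matrix.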

Improve EPM to model dissortativity

Synthetic dissortative network
- Four communities with dense intra-community or inter-community connections.
Figure: Comparison of four algorithms' abilities to recover the ground-truth link probabilities using 80% of the pairs of nodes randomly selected from a synthetic relational network that exhibits clear dissortativity. Panels: (a) ground truth, (b) adjacency matrix, (c) IRM, (d) Eigenmodel, (e) AGM, (f) GP-EPM.

EPM for dissortative networks (Zhou, AISTATS 2015)
- EPM that captures community-community interactions:
  $$b_{ij} = \mathbf{1}(m_{ij} \ge 1), \quad m_{ij} = \sum_{k_1=1}^{K} \sum_{k_2=1}^{K} m_{i k_1 k_2 j}, \quad m_{i k_1 k_2 j} \sim \mathrm{Po}(\phi_{i k_1} \lambda_{k_1 k_2} \phi_{j k_2}).$$
- Use a relational hierarchical gamma process to support $K = \infty$.
Figure: The inferred latent feature matrix $\{\phi_k\}$ and community-community interaction rate matrix $\{\lambda_{k_1 k_2}\}$ for the improved EPM on Protein230.
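
The effect of the interaction rates can be seen in a small Python example. Marginalizing the latent counts as in the Bernoulli-Poisson link gives a link probability of 1 - exp(-sum over k1, k2 of phi_ik1 * lambda_k1k2 * phi_jk2). The function name, the node features, and the rate matrices below are made up for illustration, and the rate matrix is assumed symmetric.

```python
import math


def interaction_link_probability(phi_i, phi_j, interaction):
    """1 - exp(-sum_{k1,k2} phi_i[k1] * interaction[k1][k2] * phi_j[k2])."""
    rate = sum(phi_i[k1] * interaction[k1][k2] * phi_j[k2]
               for k1 in range(len(phi_i))
               for k2 in range(len(phi_j)))
    return -math.expm1(-rate)  # 1 - exp(-rate), numerically stable


# Two communities; nodes A and C belong to the first, node B to the second.
node_a, node_b, node_c = [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]

dissortative = [[0.0, 2.0], [2.0, 0.0]]  # only cross-community interactions
assortative = [[2.0, 0.0], [0.0, 2.0]]   # diagonal rates, as in the original EPM

print(interaction_link_probability(node_a, node_b, dissortative))  # dense across communities
print(interaction_link_probability(node_a, node_c, dissortative))  # 0.0 within a community
print(interaction_link_probability(node_a, node_c, assortative))   # dense within a community
```

When the rate matrix is diagonal with entries r_k, the double sum collapses to the sum over k of r_k * phi_ik * phi_jk, recovering the assortative model.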

EPM for dissortative networks (Zhou, AISTATS 2015)
Figure: Comparison of three models on estimating the link probabilities for the Protein230 network using 80% of its node pairs. Panels: (a) protein interaction network, (b) HGP-EPM, (c) GP-EPM, (d) IRM.

Conclusions
- The gamma process edge partition model (GP-EPM) provides an efficient and effective solution for modeling assortative relational networks.
- The GP-EPM has limited ability to model dissortativity in relational networks.
- As in (Zhou, AISTATS 2015), to model dissortativity in relational networks, one may modify the GP-EPM to capture community-community interactions.