Nonparametric Bayesian Matrix Factorization for Assortative Networks
Mingyuan Zhou
IROM Department, McCombs School of Business
Department of Statistics and Data Sciences
The University of Texas at Austin
23rd European Signal Processing Conference (EUSIPCO'15), Nice, France, September 4, 2015

Table of Contents
- Introduction
- Gamma process edge partition model
- Example results
- Improve EPM to model dissortativity
- Conclusions

Introduction: Network community detection and link prediction
We will focus on unweighted undirected relational networks, which can also be represented as binary symmetric adjacency matrices.
Non-probabilistic community detection algorithms (see Fortunato, 2010 for a comprehensive review):
- Examples: modularity maximization (Newman and Girvan, 2004); clique percolation (Palla et al., 2005)
- Restrictions: usually cannot be used to generate networks or to predict missing edges (links); often require the number of communities to be tuned by hand

Introduction: Network community detection and link prediction
Generative network models:
- Model assumptions are clearly stated in a hierarchical model
- Generate random networks
- Detect latent communities
- Detect community-community interactions
- Predict missing edges (links)
- Automatically infer the number of communities with nonparametric Bayesian priors

Introduction: Assortative and dissortative relational networks
Assortativity (also known as homophily): a subset of nodes that are densely connected to each other but sparsely connected to the others is often considered to belong to the same community.
Example: in a social network, a community may consist of a group of closely related friends.

Introduction: Assortative and dissortative relational networks
Dissortativity (also known as stochastic equivalence): a subset of nodes that are sparsely connected to each other but densely connected to another subset of nodes is often considered to belong to the same community.
Example: in a predator-prey network, a community may consist of a group of animals that play similar roles in the ecosystem but do not necessarily prey on each other.

Introduction: Probabilistic models for network analysis
Latent class models:
- Stochastic blockmodel (Holland et al., 1983; Nowicki and Snijders, 2001)
- Infinite relational model (Kemp et al., 2006)
- Mixed-membership stochastic blockmodel (Airoldi et al., 2008)
Latent factor models:
- Eigenmodel (Hoff, 2008)
- Infinite latent feature relational model (Miller et al., 2009; Mørup et al., 2011)
- Community-affiliation graph model (Yang and Leskovec, 2012, 2014)
Points of comparison:
- Detection of disjoint or overlapping communities
- Interpretation of latent representations
- Prediction of missing edges

Gamma process edge partition model
Detects overlapping communities and predicts missing edges.
As a latent factor model:
- Connects each binary edge to a latent count via the Bernoulli-Poisson link
- Factorizes the latent count matrix
As a latent class model:
- Explicitly partitions each observed edge into multiple latent communities
- Implicitly assigns a node to multiple communities based on how its edges are partitioned (overlapping communities)
Designed to analyze assortative networks; can be generalized to capture dissortativity by modeling community-community interactions.

Gamma process edge partition model: Hierarchical model
$$b_{ij} = \mathbf{1}(m_{ij} \ge 1), \quad m_{ij} = \sum_{k=1}^{K} m_{ijk}, \quad m_{ijk} \sim \mathrm{Po}(r_k \phi_{ik} \phi_{jk}),$$
$$\phi_{ik} \sim \mathrm{Gam}(a_i, 1/c_i), \quad a_i \sim \mathrm{Gam}(e_0, 1/f_0), \quad r_k \sim \mathrm{Gam}(\gamma_0/K, 1/c_0), \quad \gamma_0 \sim \mathrm{Gam}(e_1, 1/f_1).$$
The Bernoulli-Poisson link:
$$b_{ij} \sim \mathrm{Bernoulli}\left[1 - \exp\left(-\sum_{k=1}^{K} r_k \phi_{ik} \phi_{jk}\right)\right].$$
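To make the generative process concrete, here is a minimal Python sketch that simulates a network from a finite-K truncation of this hierarchical model. The function name and hyperparameter defaults are illustrative choices for this sketch, not values prescribed by the paper.

```python
import numpy as np

def sample_epm_network(N, K, e0=1.0, f0=1.0, e1=1.0, f1=1.0,
                       ci=1.0, c0=1.0, rng=None):
    """Simulate an undirected binary network from a finite-K truncation
    of the gamma process EPM (hyperparameter values are illustrative)."""
    rng = np.random.default_rng() if rng is None else rng
    gamma0 = rng.gamma(e1, 1.0 / f1)                    # gamma_0 ~ Gam(e1, 1/f1)
    r = rng.gamma(gamma0 / K, 1.0 / c0, size=K)         # r_k ~ Gam(gamma0/K, 1/c0)
    a = rng.gamma(e0, 1.0 / f0, size=N)                 # a_i ~ Gam(e0, 1/f0)
    Phi = rng.gamma(a[:, None], 1.0 / ci, size=(N, K))  # phi_ik ~ Gam(a_i, 1/c_i)

    # Bernoulli-Poisson link: b_ij ~ Ber(1 - exp(-sum_k r_k phi_ik phi_jk)).
    lam = (Phi * r) @ Phi.T
    B = (rng.random((N, N)) < 1.0 - np.exp(-lam)).astype(int)
    B = np.triu(B, k=1)                                 # keep each pair once
    return B + B.T, Phi, r                              # symmetric, zero diagonal
```

For example, `B, Phi, r = sample_epm_network(100, 10)` yields a 100-node network; the gamma process construction shrinks most of the K rates r_k toward zero, so only a data-driven subset of communities is effectively active.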

Gamma process edge partition model: The Bernoulli-Poisson link
Thresholding a count variable to obtain a binary variable:
$$b = \mathbf{1}(m \ge 1), \quad m \sim \mathrm{Po}(\lambda). \quad (1)$$
Marginal likelihood of the Bernoulli-Poisson link:
$$b \sim \mathrm{Ber}(1 - e^{-\lambda}).$$
The conditional posterior of the latent count m follows a truncated Poisson distribution, expressed as
$$(m \mid b, \lambda) \sim b \cdot \mathrm{Po}_+(\lambda),$$
so m = 0 when b = 0, and m is drawn from the zero-truncated Poisson $\mathrm{Po}_+(\lambda)$ when b = 1. Use rejection sampling to sample from the truncated Poisson distribution.
The link has conceptual and computational advantages over the probit and logistic links.
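A minimal sketch of such a rejection sampler, assuming the naive propose-from-Po(λ) scheme (the slide does not specify the proposal; more efficient samplers exist for small λ):

```python
import numpy as np

def sample_po_plus(lam, rng):
    """Draw m ~ Po_+(lam): a Poisson(lam) variable conditioned on m >= 1.

    Naive rejection: propose m ~ Po(lam) and reject zeros. The acceptance
    probability is 1 - exp(-lam), so this is efficient unless lam is tiny,
    in which case an inversion-based sampler would be preferable.
    """
    while True:
        m = rng.poisson(lam)
        if m >= 1:
            return m

rng = np.random.default_rng(0)
samples = [sample_po_plus(0.8, rng) for _ in range(5)]  # all >= 1 by construction
```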

Gamma process edge partition model: Overlapping community structures
Edge partition model (EPM) under data augmentation:
$$m_{ij} = \sum_k m_{ijk}, \quad m_{ijk} \sim \mathrm{Po}(r_k \phi_{ik} \phi_{jk}).$$
m_{ijk} represents how often nodes i and j interact due to their affiliations with community k. The quantity $r_k \phi_{ik} \sum_{j \neq i} \phi_{jk}$ measures how strongly node i is affiliated with community k, and the latent count
$$m_{i\cdot k} := \sum_{j=i+1}^{N} m_{ijk} + \sum_{j=1}^{i-1} m_{jik} \quad (2)$$
represents how often node i is connected to the other nodes due to its affiliation with community k.
Assign node i to multiple communities in $\{k : m_{i\cdot k} \ge 1\}$, or (hard) assign it to a single community using either $\operatorname{argmax}_k \big(r_k \phi_{ik} \sum_{j \neq i} \phi_{jk}\big)$ or $\operatorname{argmax}_k m_{i\cdot k}$.
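As an illustration, both assignment rules can be read off directly from the accumulated counts and factors; `m_node` below is a hypothetical (N, K) array holding the counts m_{i·k} from a posterior sample:

```python
import numpy as np

def assign_communities(m_node, r, Phi):
    """Overlapping and hard community assignments for each node.

    m_node[i, k] = m_{i.k}; r[k] = r_k; Phi[i, k] = phi_ik.
    """
    # Overlapping assignment: every community in {k : m_{i.k} >= 1}.
    overlapping = [np.flatnonzero(row >= 1) for row in m_node]
    # Hard assignment via the affiliation strength r_k phi_ik sum_{j!=i} phi_jk;
    # (Phi.sum(axis=0) - Phi) broadcasts to sum_{j!=i} phi_jk for each (i, k).
    strength = r * Phi * (Phi.sum(axis=0) - Phi)
    hard = strength.argmax(axis=1)        # or, equivalently: m_node.argmax(axis=1)
    return overlapping, hard
```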

Gamma process edge partition model: Related model, the community-affiliation graph model (AGM)
A restricted version of the gamma process EPM, expressed as
$$b_{ij} \sim \mathrm{Ber}\left[1 - e^{-\epsilon} \exp\left(-\sum_k r_k \phi_{ik} \phi_{jk}\right)\right],$$
where $\epsilon \in \mathbb{R}_+$ and $\phi_{ik} \in \{0, 1\}$, could be considered a nonparametric Bayesian generalization of the community-affiliation graph model (AGM) of Yang and Leskovec (2012, 2014).
It is argued in the AGM papers that all previous community detection methods, including clique percolation and the MMSB, would fail to detect communities with dense overlaps, due to a hidden assumption that a community's overlapping parts are less densely connected than its non-overlapping ones.
The EPM does not make such a restrictive assumption; and beyond the AGM, it does not restrict $\phi_{ik}$ to be binary.

Gamma process edge partition model: Data augmentation and marginalization
Using the Poisson additive property, we have
$$m_{i\cdot k} \sim \mathrm{Po}\Big(r_k \phi_{ik} \sum_{j \neq i} \phi_{jk}\Big) \;\forall i, \qquad m_{\cdot\cdot k} \sim \mathrm{Po}\Big(\frac{r_k}{2} \sum_i \sum_{j \neq i} \phi_{ik} \phi_{jk}\Big).$$
Marginalizing out $\phi_{ik}$ leads to
$$m_{i\cdot k} \sim \mathrm{NB}(a_i, p_{ik}), \quad p_{ik} := \frac{r_k \sum_{j \neq i} \phi_{jk}}{c_i + r_k \sum_{j \neq i} \phi_{jk}}.$$
Marginalizing out $r_k$ leads to
$$m_{\cdot\cdot k} \sim \mathrm{NB}(\gamma_0/K, \tilde{p}_k), \quad \tilde{p}_k := \frac{\sum_i \sum_{j \neq i} \phi_{ik} \phi_{jk}}{2c_0 + \sum_i \sum_{j \neq i} \phi_{ik} \phi_{jk}}.$$
Using these equations, we can develop closed-form Gibbs sampling update equations for all model parameters.
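The gamma-Poisson-to-negative-binomial marginalization is easy to verify numerically; a quick Monte Carlo check (with illustrative parameter values) compares empirical moments of m_{i·k} against the NB(a_i, p_ik) moments:

```python
import numpy as np

rng = np.random.default_rng(1)
a_i, c_i = 2.0, 1.0
rate = 1.5                      # stands in for r_k * sum_{j != i} phi_jk

phi = rng.gamma(a_i, 1.0 / c_i, size=200_000)  # phi_ik ~ Gam(a_i, 1/c_i)
m = rng.poisson(rate * phi)                    # m_{i.k} | phi_ik ~ Po(rate * phi_ik)

p_ik = rate / (c_i + rate)
# NB(a, p) has mean a*p/(1-p) and variance a*p/(1-p)^2.
print(m.mean(), a_i * p_ik / (1 - p_ik))       # both approx. 3.0
print(m.var(), a_i * p_ik / (1 - p_ik) ** 2)   # both approx. 7.5
```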

Gamma process edge partition model: Gibbs sampling
- Sample $m_{ij}$: $(m_{ij} \mid -) \sim b_{ij} \cdot \mathrm{Po}_+\big(\sum_{k=1}^{K} r_k \phi_{ik} \phi_{jk}\big)$.
- Sample $m_{ijk}$: $(\{m_{ijk}\}_{k=1:K} \mid -) \sim \mathrm{Mult}\Big(m_{ij}; \Big\{\frac{r_k \phi_{ik} \phi_{jk}}{\sum_{k'} r_{k'} \phi_{ik'} \phi_{jk'}}\Big\}_{k=1:K}\Big)$.
- Sample $a_i$: $(l_{ik} \mid -) \sim \sum_{t=1}^{m_{i\cdot k}} \mathrm{Ber}\big(\frac{a_i}{a_i + t - 1}\big)$, then $(a_i \mid -) \sim \mathrm{Gam}\big(e_0 + \sum_k l_{ik}, \frac{1}{f_0 - \sum_k \ln(1 - p_{ik})}\big)$.
- Sample $\phi_{ik}$: $(\phi_{ik} \mid -) \sim \mathrm{Gam}\big(a_i + m_{i\cdot k}, \frac{1}{c_i + r_k \sum_{j \neq i} \phi_{jk}}\big)$.
- Sample $\gamma_0$, $c_i$, and $c_0$.
- Sample $r_k$: $(r_k \mid -) \sim \mathrm{Gam}\big(\frac{\gamma_0}{K} + m_{\cdot\cdot k}, \frac{1}{c_0 + \frac{1}{2}\sum_i \sum_{j \neq i} \phi_{ik} \phi_{jk}}\big)$.
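The sketch below puts the count, factor, and rate updates together into one Gibbs sweep for a finite-K truncation; the hyperparameter updates for a_i, γ_0, c_i, and c_0 are omitted for brevity, and all names are this sketch's own:

```python
import numpy as np

def epm_gibbs_sweep(B, Phi, r, a, ci, gamma0, c0, rng):
    """One Gibbs sweep for a finite-K truncation of the GP-EPM.

    B: (N, N) binary symmetric adjacency matrix with zero diagonal;
    Phi: (N, K) factors phi_ik; r: (K,) rates r_k; a, ci, gamma0, c0
    are held fixed here (their updates are omitted). Updates Phi, r in place.
    """
    N, K = Phi.shape
    m_node = np.zeros((N, K))                   # m_{i.k}
    m_comm = np.zeros(K)                        # m_{..k}

    # Sample latent counts; m_ij = 0 wherever b_ij = 0, so only edges matter.
    for i, j in zip(*np.triu_indices(N, k=1)):
        if B[i, j] == 0:
            continue
        rates = r * Phi[i] * Phi[j]             # r_k phi_ik phi_jk
        lam = rates.sum()
        m_ij = 0
        while m_ij == 0:                        # m_ij ~ Po_+(lam) by rejection
            m_ij = rng.poisson(lam)
        m_ijk = rng.multinomial(m_ij, rates / lam)
        m_node[i] += m_ijk
        m_node[j] += m_ijk
        m_comm += m_ijk

    # Sample phi_ik | - ~ Gam(a_i + m_{i.k}, 1/(c_i + r_k sum_{j!=i} phi_jk)).
    col_sums = Phi.sum(axis=0)
    for i in range(N):
        new_phi = rng.gamma(a[i] + m_node[i],
                            1.0 / (ci + r * (col_sums - Phi[i])))
        col_sums += new_phi - Phi[i]
        Phi[i] = new_phi

    # Sample r_k | - ~ Gam(gamma0/K + m_{..k}, 1/(c0 + 0.5 sum_i sum_{j!=i} phi_ik phi_jk)).
    pair_sum = col_sums ** 2 - (Phi ** 2).sum(axis=0)
    r[:] = rng.gamma(gamma0 / K + m_comm, 1.0 / (c0 + 0.5 * pair_sum))
```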

Example results: Synthetic assortative network
Four communities with dense intra-community connections; the 2nd community overlaps with both the 1st and 3rd ones.
[Figure: comparison of four algorithms' abilities to recover the ground-truth link probabilities using 80% of the pairs of nodes randomly selected from a synthetic relational network; panels: (a) ground truth, (b) adjacency matrix, (c) IRM, (d) Eigenmodel, (e) AGM, (f) GP-EPM. The number of features for the Eigenmodel is set as K = 4.]

Example results
- The infinite relational model (IRM) accurately captures the community structures but produces cartoonish blocks.
- The Eigenmodel somewhat overfits the data.
- The AGM produces some undesired artifacts.
- The gamma process EPM (GP-EPM) provides a reconstruction that looks most similar to the ground truth.
[Figure repeated from the previous slide: (a) ground truth, (b) adjacency matrix, (c) IRM, (d) Eigenmodel, (e) AGM, (f) GP-EPM.]

Example results
Both the GP-EPM and the Eigenmodel perform well and clearly outperform the IRM and AGM in missing link prediction, measured by both the area under the ROC curve (AUC-ROC) and the area under the precision-recall curve (AUC-PR).
Table: Comparison of four algorithms' abilities to predict missing edges of a synthetic assortative network. The number of features for the Eigenmodel is set as K = 4.
Model       AUC-ROC           AUC-PR
IRM         0.9680 ± 0.0073   0.8636 ± 0.0448
Eigenmodel  0.9746 ± 0.0066   0.9073 ± 0.0236
AGM         0.9291 ± 0.0184   0.8166 ± 0.0470
GP-EPM      0.9746 ± 0.0056   0.9042 ± 0.0270
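For reference, both scores can be computed on the held-out pairs with scikit-learn; this is a generic evaluation sketch (mask construction and posterior averaging of the link probabilities are assumed to happen elsewhere), using average precision as the AUC-PR summary:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def link_prediction_scores(B_true, P_hat, heldout_mask):
    """AUC-ROC and AUC-PR over held-out node pairs.

    B_true: (N, N) binary adjacency; P_hat: (N, N) predicted link
    probabilities, e.g. 1 - exp(-sum_k r_k phi_ik phi_jk) averaged over
    posterior samples; heldout_mask: (N, N) boolean, True for held-out pairs.
    """
    iu = np.triu_indices_from(B_true, k=1)      # each pair counted once
    keep = heldout_mask[iu]
    y, s = B_true[iu][keep], P_hat[iu][keep]
    return roc_auc_score(y, s), average_precision_score(y, s)
```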

Improve EPM to model dissortativity: Synthetic dissortative network
Four communities with dense intra-community or inter-community connections.
[Figure: comparison of four algorithms' abilities to recover the ground-truth link probabilities using 80% of the pairs of nodes randomly selected from a synthetic relational network that exhibits clear dissortativity; panels: (a) ground truth, (b) adjacency matrix, (c) IRM, (d) Eigenmodel, (e) AGM, (f) GP-EPM.]

Improve EPM to model dissortativity: EPM for dissortative networks (Zhou, AISTATS 2015)
EPM that captures community-community interactions:
$$b_{ij} = \mathbf{1}(m_{ij} \ge 1), \quad m_{ij} = \sum_{k_1=1}^{K} \sum_{k_2=1}^{K} m_{ik_1k_2j}, \quad m_{ik_1k_2j} \sim \mathrm{Po}(\phi_{ik_1} \lambda_{k_1k_2} \phi_{jk_2}).$$
Use a relational hierarchical gamma process to support $K = \infty$.
[Figure: the inferred latent feature matrix $\{\phi_k\}$ and community-community interaction rate matrix $\{\lambda_{k_1k_2}\}$ for the improved EPM on Protein230.]
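A small sketch of how the interaction matrix enters the link probabilities; Phi and Lam are illustrative inputs, and the hierarchical gamma process prior over Lam is not shown:

```python
import numpy as np

def hgp_epm_link_probs(Phi, Lam):
    """Link probabilities under the EPM with community-community interactions.

    Phi: (N, K) nonnegative factors phi_ik; Lam: (K, K) symmetric nonnegative
    interaction rates lambda_{k1 k2}. A diagonal Lam recovers the assortative
    GP-EPM; off-diagonal mass lets the model express dissortativity.
    """
    rates = Phi @ Lam @ Phi.T        # sum_{k1,k2} phi_ik1 lambda_k1k2 phi_jk2
    P = 1.0 - np.exp(-rates)         # b_ij ~ Ber(1 - exp(-rates_ij))
    np.fill_diagonal(P, 0.0)         # no self-links
    return P
```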

Improve EPM to model dissortativity: EPM for dissortative networks (Zhou, AISTATS 2015)
[Figure: comparison of three models on estimating the link probabilities for the Protein230 network using 80% of its node pairs; panels: (a) protein interaction network, (b) HGP-EPM, (c) GP-EPM, (d) IRM.]

Conclusions
- The gamma process edge partition model (GP-EPM) provides an efficient and effective solution for modeling assortative relational networks.
- The GP-EPM has limited ability to model dissortativity in relational networks.
- As in (Zhou, AISTATS 2015), to model dissortativity in relational networks, one may modify the GP-EPM to capture community-community interactions.