Mixed Membership Stochastic Blockmodels


Mixed Membership Stochastic Blockmodels (Journal of Machine Learning Research, 2008) by E.M. Airoldi, D.M. Blei, S.E. Fienberg, and E.P. Xing, as interpreted by Ted Westling. STAT 572 Final Talk, May 8, 2014.

Overview
1. Notation and motivation
2. The Mixed Membership Stochastic Blockmodel
3. Simulations
4. Applications
5. Conclusions


Network theory: notation
N nodes or individuals. We observe relations/interactions Y(i, j) on pairs of individuals. Here we assume Y(i, j) ∈ {0, 1} and Y(i, i) = 0, but we do not assume Y(i, j) = Y(j, i) (we deal with directed networks).
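For concreteness, here is a small hypothetical example (not from the talk) of how such a directed binary network can be stored as an adjacency matrix:

```python
import numpy as np

# Toy directed network on N = 4 nodes: Y[i, j] = 1 means an edge from i to j.
# Entries are binary, the diagonal is zero (no self-relations), and the matrix
# need not be symmetric because edges are directed.
Y = np.array([[0, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 0],
              [0, 0, 1, 0]])

assert set(np.unique(Y)) <= {0, 1}    # Y(i, j) in {0, 1}
assert np.all(np.diag(Y) == 0)        # Y(i, i) = 0
assert not np.array_equal(Y, Y.T)     # Y(i, j) != Y(j, i) in general
```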

Scientific motivation for blockmodels
General blockmodels. Hypothesis: the nodes can be grouped into non-overlapping blocks such that the probability of observing an edge from i to j is determined by latent block indicators π_i, π_j and the block relation π_i^T B π_j. Goal: recover the blocks and the block relationships.
MMSB. Hypothesis: nodes exhibit mixed membership over latent blocks such that the edge probability has the same form, except that π_i, π_j are now distributions over blocks rather than indicators. Goal: recover the mixed memberships and the block relationships.
Two contrasting examples: the monks and a rural Indian village.
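In symbols, the shared edge-probability structure of the two hypotheses can be written as follows (g indexes blocks and e_g denotes the g-th standard basis vector):

\[
\text{Blockmodel:} \quad \Pr\{Y(i,j) = 1\} = \pi_i^\top B\, \pi_j, \qquad \pi_i = e_g \ \text{for exactly one block } g,
\]
\[
\text{MMSB:} \quad \Pr\{Y(i,j) = 1 \mid \pi_i, \pi_j\} = \pi_i^\top B\, \pi_j, \qquad \pi_i \ \text{a probability vector over the } K \text{ blocks}.
\]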

Example 1: Crisis in a Cloister
Sampson (1968) collected a directed social network of 18 monks. Qualitative analysis indicated three clusters of monks: Young Turks, Loyal Opposition, and Outcasts.
Figure: Monk adjacency matrices, ordered randomly (left) and using the a priori clusters (right).

Example 1: Crisis in a Cloister
Is this qualitative hypothesis supported by the data? What are the relationships between the blocks? How strongly defined are the blocks? (The MMSB can answer this last question, while a regular blockmodel cannot.)

Example 2: Indian village network
Researchers collected individual- and household-level demographic and network data for 75 villages near Bangalore, India. I focused on one village of 114 households and constructed the network from a household-level visit survey. There is no ground truth.
Figure: Indian village adjacency matrix, arbitrarily ordered.

Example 2: Indian village network
Do the data exhibit block structure? Do the estimated blocks correspond to anything in the data? How strongly defined are the blocks? Are there particularly influential households?


Generative MMSB
1. Fix the number of latent blocks K.
2. For each node i = 1, ..., N: draw a distribution π_i over the latent blocks, π_i ~ Dirichlet(α).
3. For each ordered pair of nodes (i, j):
   (a) Assume i and j are in particular blocks for the interaction i → j, indicated by z_{i→j} and z_{i←j}, where z_{i→j} ~ Categorical(π_i) and z_{i←j} ~ Categorical(π_j).
   (b) Draw the binary relation Y(i, j) ~ Bernoulli(b), where b = z_{i→j}^T B z_{i←j}. This equals B(g, h) when z_{i→j} = e_g and z_{i←j} = e_h.
4. Choose K by estimating the model for K = 1, 2, 3, ... and picking the model with the highest approximate BIC.
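As a concrete illustration, here is a minimal simulation sketch of steps 1-3 above (NumPy; the function and variable names are my own, not the authors' code; α is taken as a scalar multiplied into a vector of ones, matching the symmetric α used later in the simulations):

```python
import numpy as np

def simulate_mmsb(N, K, alpha, B, seed=None):
    """Sample a directed binary network from the MMSB generative process."""
    rng = np.random.default_rng(seed)
    # Step 2: each node gets a mixed-membership vector pi_i ~ Dirichlet(alpha * 1).
    pi = rng.dirichlet(alpha * np.ones(K), size=N)      # N x K
    Y = np.zeros((N, N), dtype=int)
    # Step 3: per ordered pair, draw the interaction-specific block indicators and an edge.
    for i in range(N):
        for j in range(N):
            if i == j:
                continue                                # Y(i, i) = 0
            g = rng.choice(K, p=pi[i])                  # z_{i->j} ~ Categorical(pi_i)
            h = rng.choice(K, p=pi[j])                  # z_{i<-j} ~ Categorical(pi_j)
            Y[i, j] = rng.binomial(1, B[g, h])          # Y(i, j) ~ Bernoulli(B(g, h))
    return Y, pi

# Example with the "Diagonal" B used later in the simulations (0.8 within, 0.1 between).
K = 3
B = np.full((K, K), 0.1) + 0.7 * np.eye(K)
Y, pi = simulate_mmsb(N=20, K=K, alpha=0.1, B=B, seed=572)
```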

MMSB vs. previous methods
1. Central advancement over previous methods: each node i is given a distribution π_i over blocks, rather than being a member of exactly one block.
2. This allows us to determine how strong each node's allegiance to each block is.
3. We learn about the block-level relationships through B.
4. Limitations: unlike previous methods, the MMSB requires fixing the number of blocks K. It also does not incorporate other possible network mechanisms, e.g. reciprocity or triangle-closing.

Model estimation: overview
Strategy: treat θ = {π, Z→, Z←} as random latent variables and obtain their posterior distribution. Treat β = {α, B} as fixed parameters to estimate via Empirical Bayes. We cannot use the EM algorithm, because there is no closed form for p(θ | Y, β). Sampling from the exact posterior does not scale well, so we instead use Variational Bayes.

Variational Bayes
Main idea: write down a simple parametric approximation q(π, Z→, Z←) to the posterior distribution that depends on free variational parameters. Minimize the KL divergence between q and the true posterior with respect to those parameters, which is equivalent to maximizing the evidence lower bound (ELBO)
L(γ, Φ; Y, α, B) = E_q[log p(π, Z→, Z←, Y | α, B)] − E_q[log q(π, Z→, Z←)].
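Here q is the fully factorized (mean-field) family, with one Dirichlet factor per node and one categorical factor per interaction; in terms of the variational parameters γ and Φ that appear in the algorithm below, the factorization can be written as

\[
q(\pi, Z_{\to}, Z_{\leftarrow} \mid \gamma, \Phi)
  = \prod_{i=1}^{N} q_1(\pi_i \mid \gamma_i)
    \prod_{i \neq j} q_2(z_{i \to j} \mid \phi_{i \to j})\, q_2(z_{i \leftarrow j} \mid \phi_{i \leftarrow j}),
\]

where q_1 is a Dirichlet distribution and q_2 is a categorical (single-draw multinomial) distribution.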

Empirical Bayes hyperparameter estimation
We said we would update α and B with Empirical Bayes. This usually means maximizing p(Y | α, B), which we cannot obtain in closed form. However, L is a lower bound:
log p(Y | α, B) = KL(q ‖ p(· | Y, α, B)) + L(γ, Φ; Y, α, B),
so we maximize L instead, hoping that the KL term is relatively constant in α and B. This is really an approximate Empirical Bayes, and it is a different sort of approximation than the variational inference itself.

Nested algorithm for the MMSB
1. Initialize B^(0), α^(0), γ^(0)_{1:N}, Φ^(0).
2. E-step:
   (a) For each pair (i, j):
       (i) Update φ_{i→j}, φ_{i←j}.
       (ii) Update γ_i, γ_j.
       (iii) Update B.
   (b) Repeat until convergence.
3. M-step: update α.
4. Repeat steps 2-3 until convergence.
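The sketch below illustrates what the inner coordinate updates can look like under the mean-field family above (standard Dirichlet/categorical updates; my own illustrative code, not the authors' implementation). For brevity it sweeps all pairs before refreshing γ and B, and it holds α fixed, whereas the nested algorithm above interleaves the γ updates and fits α in the M-step.

```python
import numpy as np
from scipy.special import digamma

def mmsb_estep(Y, K, alpha, B, n_sweeps=50):
    """Illustrative mean-field E-step updates for the MMSB (sketch only)."""
    N = Y.shape[0]
    gamma = np.full((N, K), alpha + 2.0 * (N - 1) / K)   # rough start for gamma_i
    phi_out = np.full((N, N, K), 1.0 / K)                # q(z_{i->j} | phi_{i->j})
    phi_in = np.full((N, N, K), 1.0 / K)                 # q(z_{i<-j} | phi_{i<-j})
    diag = np.arange(N)
    phi_out[diag, diag] = phi_in[diag, diag] = 0.0       # no self-relations
    for _ in range(n_sweeps):
        Elogpi = digamma(gamma) - digamma(gamma.sum(1, keepdims=True))  # E_q[log pi_i]
        for i in range(N):
            for j in range(N):
                if i == j:
                    continue
                # Log-likelihood of the observed edge/non-edge for every (g, h) block pair.
                logB = Y[i, j] * np.log(B) + (1 - Y[i, j]) * np.log(1 - B)
                s = Elogpi[i] + logB @ phi_in[i, j]      # update phi_{i->j}
                phi_out[i, j] = np.exp(s - s.max())
                phi_out[i, j] /= phi_out[i, j].sum()
                r = Elogpi[j] + logB.T @ phi_out[i, j]   # update phi_{i<-j}
                phi_in[i, j] = np.exp(r - r.max())
                phi_in[i, j] /= phi_in[i, j].sum()
        # gamma_i = alpha + sum_j phi_{i->j} + sum_j phi_{j<-i}
        gamma = alpha + phi_out.sum(axis=1) + phi_in.sum(axis=0)
        # B(g, h): expected edge count between blocks g and h over expected pair count.
        num = np.einsum('ij,ijg,ijh->gh', Y, phi_out, phi_in)
        den = np.einsum('ijg,ijh->gh', phi_out, phi_in)
        B = np.clip(num / np.maximum(den, 1e-12), 1e-6, 1 - 1e-6)
    return gamma, phi_out, phi_in, B
```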


Model simulations
Simulated data from the assumed model under various parameter values:
K = 3, 6, 9.
N = 20, 50, 80.
α = 0.025·1, 0.1·1, 0.25·1.
B with 0.8 on the diagonal and 0.1 off-diagonal ("Diagonal"), or with Uniform(0.5, 1) on the diagonal and Uniform(0.2, 0.5) off-diagonal ("Diffuse").
For each combination of parameter values, I ran five simulations. Each simulation was initialized both at the true parameter values ("Oracle") and at uninformed values ("Naive").
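A small sketch of how these settings could be set up in code (again illustrative, with my own names and seed; simulate_mmsb from the earlier sketch would then generate each dataset):

```python
import numpy as np
from itertools import product

def make_B(K, kind, rng):
    """Construct the two B matrices described above."""
    if kind == "Diagonal":
        return np.full((K, K), 0.1) + 0.7 * np.eye(K)       # 0.8 within, 0.1 between
    B = rng.uniform(0.2, 0.5, size=(K, K))                  # "Diffuse": Uniform(.2,.5) off-diagonal
    B[np.diag_indices(K)] = rng.uniform(0.5, 1.0, size=K)   # "Diffuse": Uniform(.5,1) diagonal
    return B

rng = np.random.default_rng(572)
# 3 x 3 x 3 x 2 = 54 parameter settings; the talk runs five replicates of each,
# once initialized at the truth ("Oracle") and once at uninformed values ("Naive").
settings = list(product([3, 6, 9],                 # K
                        [20, 50, 80],              # N
                        [0.025, 0.1, 0.25],        # alpha (times a vector of ones)
                        ["Diagonal", "Diffuse"]))  # B type
```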

Simulations: run time
Figure: run time (minutes) versus N (number of nodes), faceted by α (0.025, 0.1, 0.25) and K (3, 6, 9), distinguishing B type (Diagonal vs. Diffuse) and initialization (Naive vs. Oracle).

Simulations: MSE of γ
Figure: MSE of the block membership vectors versus N (number of nodes), faceted by α (0.025, 0.1, 0.25) and K (3, 6, 9), distinguishing B type (Diagonal vs. Diffuse) and initialization (Naive vs. Oracle).

Simulations: conclusions
Runtime is of order N². Initialization matters, especially when α is small. Performance is better when there is less mixing (small α, B close to diagonal).


Crisis in a Cloister: posterior expected memberships
Figure: posterior expected block memberships for the 18 monks, colored by their true faction (Loyal Opposition, Outcasts, Waverers, Young Turks).

Crisis in a Cloister: adjacency matrices
Figure: Left: original adjacency matrix, ordered according to posterior group memberships. Right: smoothed adjacency matrix using the posterior Z.

Indian village: adjacency matrices
Figure: Left: original adjacency matrix, ordered according to posterior group memberships. Right: smoothed adjacency matrix using the posterior Z.


Conclusions
The MMSB is a useful extension to blockmodels when there is a ground truth or when the goal is prediction. However, when ground truth is lacking, the model lacks interpretability. It could be improved by incorporating other network components. The variational algorithm also makes interpretation difficult, since the posterior distribution and the Empirical Bayes estimates are both approximations.

Thank you!