Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach

Authors: Jaewon Yang, Jure Leskovec (Stanford University)
Venue: WSDM 2013
Presenter: Yupeng Gu

Background: Community

Background: Community Overlap

Background
Communities are everywhere in networks, especially in large social networks.
Nodes can belong to multiple communities simultaneously, which leads to overlapping community structure.
Traditional methods assume that the overlaps between communities are sparsely connected.
In contrast, the more communities a pair of nodes shares, the more likely the pair is to be connected in the network.

Cluster Affiliation Model
The social/information network is assumed to be undirected and unweighted.
The authors represent node community memberships with a bipartite affiliation network that links nodes to communities. The affiliation of node $u$ to community $A$ carries a weight $F_{uA}$.

Notations
Notation                              Meaning
$G(V, E)$                             Network
$N$                                   Total number of nodes, $|V| = N$
$B(V, C, M)$                          Bipartite affiliation network
$C$                                   Set of communities
$M$                                   Node community affiliations
$K$                                   Total number of communities, $|C| = K$
$F \in \mathbb{R}^{N \times K}$       Affiliation factor matrix

Cluster Affiliation Model for Big Networks (BIGCLAM)
The network $G(V, E)$ is generated from a bipartite community affiliation $B(V, C, M)$ as follows.
Each affiliation of a node $u$ to a community $c$ carries a nonnegative weight $F_{uc}$.
A larger $F_{uc}$ makes $u$ more likely to generate links inside $c$; $F_{uc} = 0$ does not affect the link generation probabilities.
Community $c$ connects its members $u$ and $v$ with probability $1 - \exp(-F_{uc} F_{vc})$.
Over all communities, $p(u, v) = 1 - \exp(-F_u^T F_v)$, where $F_u$ is the weight vector of node $u$ (row $u$ of $F$): $F_u = F_{u\cdot}$.
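The link probability defined on this slide can be computed for all pairs at once. The following is a minimal sketch, not the authors' implementation: the function name `edge_probabilities`, the toy matrix `F`, and the choice to zero out self-loops are assumptions made for illustration.

```python
import numpy as np

def edge_probabilities(F):
    """Return P with P[u, v] = 1 - exp(-F[u] . F[v]) for every node pair."""
    F = np.asarray(F, dtype=float)
    if (F < 0).any():
        raise ValueError("affiliation weights must be nonnegative")
    P = 1.0 - np.exp(-F @ F.T)
    np.fill_diagonal(P, 0.0)  # assumption: self-loops are ignored
    return P

# Hypothetical toy example: 3 nodes, 2 communities (columns A and B).
F = np.array([
    [1.0, 0.0],   # node 0 belongs only to A
    [1.0, 1.0],   # node 1 belongs to A and B
    [0.0, 2.0],   # node 2 belongs only to B
])
P = edge_probabilities(F)
print(np.round(P, 3))
```

Nodes 0 and 2 share no community, so their dot product is zero and the model assigns them probability 0; this is the case the background edge probability is meant to handle.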

Probabilistic Interpretation
The generation process yields an undirected weighted network in which each pair of nodes $u, v$ has a latent interaction of nonnegative strength $X_{uv}$.
In the observed graph $G(V, E)$, $u$ and $v$ are connected if $X_{uv} > 0$.
Within each community $c$, $u$ and $v$ generate an interaction of strength $X_{uv}^{(c)}$, drawn from a Poisson distribution with mean $F_{uc} F_{vc}$:
$$X_{uv}^{(c)} \sim \mathrm{Pois}(F_{uc} F_{vc})$$
The total amount of interaction is then
$$X_{uv} = \sum_c X_{uv}^{(c)} \sim \mathrm{Pois}\Big(\sum_c F_{uc} F_{vc}\Big)$$
and therefore
$$\Pr(X_{uv} > 0) = 1 - \exp(-F_u^T F_v)$$
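The Poisson argument can be checked numerically. The sketch below draws the per-community interactions for one node pair and compares the fraction of trials with a positive total against the closed form. The weight vectors, the seed, and the number of trials are arbitrary assumptions, and the per-community draws are assumed independent.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Hypothetical weights of two nodes over K = 3 communities.
F_u = np.array([0.5, 0.0, 1.2])
F_v = np.array([0.8, 0.3, 0.4])

n_trials = 200_000
# One Poisson draw per trial and community: X_uv^(c) ~ Pois(F_uc * F_vc).
X_c = rng.poisson(lam=F_u * F_v, size=(n_trials, F_u.size))
X = X_c.sum(axis=1)  # total interaction strength X_uv per trial

empirical = np.mean(X > 0)               # fraction of trials with an edge
closed_form = 1.0 - np.exp(-F_u @ F_v)   # 1 - exp(-F_u^T F_v)
print(f"empirical={empirical:.4f}  closed form={closed_form:.4f}")
```

The two numbers should agree up to sampling noise, because a sum of independent Poisson variables is Poisson with the summed mean, and a Poisson variable with mean $\lambda$ equals zero with probability $\exp(-\lambda)$.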

Probabilistic Interpretation
Which kind of node is likely to have a high degree?
Answer: a node $u$ with larger $F_{uc}$ is more likely to be connected to other members of $c$, because $1 - \exp(-F_{uc} F_{vc})$ increases with $F_{uc}$. If $F_{uc} = 0$, then $p_c(u, v) = 0$ for all $v \in c$.

Probabilistic Interpretation
Which pair of nodes is likely to have a link?
Answer: pairs of nodes that share multiple community memberships receive multiple chances to create a link.
If $u$ and $v$ share only community $A$: $p(u, v) = 1 - \exp(-F_{uA} F_{vA})$.
If $u$ and $v$ share communities $A$ and $B$: $p(u, v) = 1 - \exp(-F_{uA} F_{vA} - F_{uB} F_{vB}) \ge 1 - \exp(-F_{uA} F_{vA})$.

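Background Part
Edges between pairs of nodes $u, v$ that do not share any community: the model alone would give such pairs $p(u, v) = 0$.
Instead, set $p(u, v) = \varepsilon$, where $\varepsilon$ is the background edge probability between a random pair of nodes:
$$\varepsilon = \frac{2|E|}{|V|(|V| - 1)} \approx 10^{-8}$$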

Community Detection
Given an undirected network $G(V, E)$, detect $K$ communities by finding the affiliation factor matrix $\hat{F}$ that is most likely for the underlying network $G$, that is, by maximizing the log-likelihood $l(F) = \log P(G \mid F)$:
$$\hat{F} = \arg\max_{F \ge 0} l(F)$$
where
$$l(F) = \sum_{(u,v) \in E} \log\big(1 - \exp(-F_u^T F_v)\big) - \sum_{(u,v) \notin E} F_u^T F_v$$
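To make the objective concrete, here is a small sketch that evaluates $l(F)$ for a toy graph. It builds a dense $N \times N$ matrix, so it only suits small examples and is not how a large-scale implementation would proceed. The function name `log_likelihood`, the toy data, and the small constant guarding against $\log 0$ are assumptions.

```python
import numpy as np

def log_likelihood(F, edges):
    """l(F): sum over edges of log(1 - exp(-F_u.F_v)) minus sum over non-edges of F_u.F_v."""
    F = np.asarray(F, dtype=float)
    n = F.shape[0]
    S = F @ F.T                                   # S[u, v] = F_u . F_v
    is_edge = np.zeros((n, n), dtype=bool)
    for u, v in edges:
        is_edge[u, v] = is_edge[v, u] = True
    upper = np.triu(np.ones((n, n), dtype=bool), k=1)  # each unordered pair once
    guard = 1e-12                                 # assumption: avoids log(0)
    edge_term = np.log(1.0 - np.exp(-S[upper & is_edge]) + guard).sum()
    non_edge_term = S[upper & ~is_edge].sum()
    return edge_term - non_edge_term

# Hypothetical toy example: a path 0 - 1 - 2 with two communities.
F = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
edges = [(0, 1), (1, 2)]
ll = log_likelihood(F, edges)
print(ll)
```

In this example the only non-edge, the pair (0, 2), has a zero dot product, so it contributes no penalty and the value is the sum of the two edge terms.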

A variant of nonnegative matrix factorization (NMF): learn $\hat{F} \in \mathbb{R}^{N \times K}$ that best approximates the adjacency matrix $A$ of a given network $G$.
$$\hat{F} = \arg\min_{F \ge 0} D\big(A, f(F F^T)\big)$$
where the loss function is $D = -l(F)$ and the link function is $f(\cdot) = 1 - \exp(-\cdot)$.

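Optimization
$$\hat{F} = \arg\max_{F \ge 0} l(F), \quad l(F) = \sum_{(u,v) \in E} \log\big(1 - \exp(-F_u^T F_v)\big) - \sum_{(u,v) \notin E} F_u^T F_v$$
Update $F_u$ with all other $F_v$ fixed (this subproblem is convex).
After each update, $F_u$ is projected onto the space of nonnegative vectors: $F_{uc} = \max(F_{uc}, 0)$.
Each step requires $O(|N(u)|)$ time, where $N(u)$ is the set of neighbors of $u$.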
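One way to realize this update is projected gradient ascent on a single row. Differentiating $l(F)$ with respect to $F_u$ gives a sum over neighbors of $F_v \exp(-F_u^T F_v) / (1 - \exp(-F_u^T F_v))$ minus the sum of $F_v$ over non-neighbors. The non-neighbor sum can be obtained from a cached column sum of $F$, which is what keeps the cost proportional to $|N(u)|$. The sketch below is written under those assumptions: the fixed step size, the clipping constant, and all names are hypothetical, and a practical implementation would likely choose the step size adaptively.

```python
import numpy as np

def update_row(F, u, neighbors, col_sum, step=0.01):
    """One projected gradient-ascent step on F[u]; all other rows stay fixed."""
    Fu = F[u].copy()
    Fn = F[neighbors]                        # rows of u's neighbors
    dots = np.clip(Fn @ Fu, 1e-10, None)     # assumption: clip to avoid division by 0
    w = np.exp(-dots) / (1.0 - np.exp(-dots))
    grad_edges = (w[:, None] * Fn).sum(axis=0)
    # Sum of F_v over non-neighbors v != u, from the cached column sum.
    grad_non_edges = col_sum - Fu - Fn.sum(axis=0)
    new_Fu = np.maximum(Fu + step * (grad_edges - grad_non_edges), 0.0)  # projection
    col_sum += new_Fu - Fu                   # keep the cache consistent (in place)
    F[u] = new_Fu
    return F

# Hypothetical toy example: a path 0 - 1 - 2 with K = 2 communities.
F = np.full((3, 2), 0.5)
adj = {0: [1], 1: [0, 2], 2: [1]}
col_sum = F.sum(axis=0)
for _ in range(100):
    for u in range(3):
        update_row(F, u, adj[u], col_sum)
print(np.round(F, 3))
```

Caching the column sum means no pass over the non-neighbors of $u$ is needed, which matters because in a sparse network almost all node pairs are non-edges.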

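Community Affiliations
Does $u$ belong to community $c$ or not?
Criterion: ignore the membership of $u$ in $c$ if $F_{uc}$ is below some threshold $\delta$.
Choose $\delta$ so that two nodes with weight $\delta$ in $c$ are connected with at least the background probability: $\varepsilon \le 1 - \exp(-\delta^2)$, which gives
$$\delta = \sqrt{-\log(1 - \varepsilon)} \approx 10^{-5} \text{ to } 10^{-4}$$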
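The threshold rule translates into a few lines of code. This sketch assumes the background probability $\varepsilon = 2|E| / (|V|(|V| - 1))$ from the Background Part slide; the function names and the example numbers are hypothetical.

```python
import math

def background_edge_probability(num_nodes, num_edges):
    """epsilon = 2|E| / (|V| (|V| - 1)): edge density of the whole network."""
    return 2.0 * num_edges / (num_nodes * (num_nodes - 1))

def membership_threshold(eps):
    """delta = sqrt(-log(1 - eps)), so that 1 - exp(-delta**2) equals eps."""
    return math.sqrt(-math.log1p(-eps))

def communities_of(F_row, delta):
    """Indices c whose weight F_uc is not below the threshold delta."""
    return [c for c, w in enumerate(F_row) if w >= delta]

delta = membership_threshold(1e-8)       # close to 1e-4 for eps = 1e-8
members = communities_of([0.0, 0.5, 2e-5], delta)
print(delta, members)
```

`math.log1p` is used instead of `math.log(1 - eps)` because it stays accurate when `eps` is as small as the background probabilities quoted above.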

Number of communities
Reserve 20% of node pairs as a hold-out set.
The $K$ with the maximum hold-out likelihood is chosen as the number of communities.

Experimental results

Data Description
The authors collected 6 large social and information networks in which nodes explicitly state their community memberships. (Having ground-truth communities also makes it possible to evaluate performance quantitatively.)
LiveJournal, Friendster, Orkut, YouTube: nodes are users, edges are friendships; groups are formed over specific interests, hobbies, etc.
DBLP: nodes are authors, edges are co-authorships; communities are publication venues.
Amazon: nodes are products, edges connect commonly co-purchased products; each node belongs to one or more hierarchically organized product categories.

Data Description
Some statistics
N: number of nodes
E: number of edges
C: number of communities
S: average community size
A: community memberships per node

Data Overview
Ground-truth communities overlap heavily: on average, 95% of all communities overlap with at least one other community.
How the edge probability changes as the number $k$ of communities shared by a pair increases: it rises by a factor of $10^4$ (from $10^{-5}$ to $10^{-1}$) when the pair shares two communities.

Evaluation Measures
Runtime comparison

Evaluation Measures
F-1 score
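Assuming the usual definition of the F-1 score as the harmonic mean of precision and recall, the score between one detected community and one ground-truth community, both treated as node sets, can be sketched as follows. How the scores are aggregated over all communities is not specified here, so this covers only the pairwise score; the function name and the example sets are hypothetical.

```python
def f1_score(detected, truth):
    """F-1 between two node sets: harmonic mean of precision and recall."""
    detected, truth = set(detected), set(truth)
    overlap = len(detected & truth)
    if overlap == 0:
        return 0.0                       # no shared nodes (also covers empty sets)
    precision = overlap / len(detected)  # share of detected nodes that are correct
    recall = overlap / len(truth)        # share of true nodes that were found
    return 2 * precision * recall / (precision + recall)

score = f1_score({1, 2, 3}, {2, 3, 4, 5})
print(score)
```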

Evaluation Measures
Omega Index
$C_{uv}$ is the set of ground-truth communities that $u$ and $v$ share.

Evaluation Measures
NMI (normalized mutual information)
Accuracy in the number of communities ( 1)

Experimental results
Composite performance over the six datasets.
Scores are scaled so that the best-performing method achieves a score of 1.


Conclusion
A novel large-scale community detection method.
A set of networks with explicit ground-truth labels for nodes.
Overlaps of communities are more densely connected.