Overlapping Communities

Overlapping Communities. Davide Mottin, Hasso Plattner Institute. Graph Mining course, Winter Semester 2017.

Acknowledgements: Most of this lecture is taken from http://web.stanford.edu/class/cs224w/slides

Lecture road: Introduction to graph clustering; Hierarchical approaches; Spectral clustering.

Identifying Communities: Can we identify node groups (communities, modules, clusters)? Nodes: Football Teams; Edges: Games played.

College Football Network: communities correspond to cups / conferences (e.g., Atlantic). Nodes: Football Teams; Edges: Games played.

Protein-Protein Interactions: Can we identify functional modules? Nodes: Proteins; Edges: Physical interactions.

Protein-Protein Interactions: functional modules. Nodes: Proteins; Edges: Physical interactions.

Facebook Network: Can we identify social communities? Nodes: Facebook Users; Edges: Friendships.

Facebook Network: social communities (high school, summer internship, Stanford (Squash), Stanford (Basketball)). Nodes: Facebook Users; Edges: Friendships.

Overlapping Communities: non-overlapping vs. overlapping communities.

Non-overlapping Communities [figure: a network and its nodes-by-nodes adjacency matrix].

Communities as Tiles! What is the structure of community overlaps? Edge density in the overlaps is higher! [figure: communities as tiles]

Recap so far: communities in a network. This is what we want!

Plan of attack: 1) Given a model, we generate the network (generative model for networks). 2) Given a network, we find the best model.

Model of networks. Goal: define a model that can generate networks. The model will have a set of parameters that we will later want to estimate (and use to detect communities). Q: Given a set of nodes, how do communities generate the edges of the network?

Community-Affiliation Graph Model (AGM): a generative model B(V, C, M, {p_c}) for graphs, with nodes V, communities C, and memberships M. Each community c has a single edge probability p_c. Later we fit the model to networks to detect communities.

AGM: Generative Process. Given the community affiliations, AGM generates the links: for each pair of nodes in community A, we connect them with probability p_A. The overall edge probability is
P(u, v) = 1 − ∏_{c ∈ M_u ∩ M_v} (1 − p_c)
where M_u is the set of communities node u belongs to. If u and v share no communities: P(u, v) = ε. Think of this as an OR function: if at least one shared community says YES, we create an edge.
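To make the generative step concrete, here is a minimal Python sketch (not the authors' code); the community names, memberships, edge probabilities, and the background probability eps are illustrative assumptions.

```python
import itertools
import random

# Illustrative AGM instance: community -> (member nodes, edge probability p_c).
communities = {"A": ({"u", "v", "w"}, 0.7),
               "B": ({"w", "x", "y"}, 0.5)}
nodes = sorted(set().union(*(members for members, _ in communities.values())))
eps = 1e-4  # background edge probability for pairs sharing no community

def edge_prob(u, v):
    """P(u,v) = 1 - prod_{c in M_u ∩ M_v} (1 - p_c); eps if no shared community."""
    shared = [p for members, p in communities.values() if u in members and v in members]
    if not shared:
        return eps
    prob_no_edge = 1.0
    for p in shared:            # each shared community independently fails to link u, v
        prob_no_edge *= (1.0 - p)
    return 1.0 - prob_no_edge

# Generate the network: one biased coin per node pair (the "OR" over communities).
edges = [(u, v) for u, v in itertools.combinations(nodes, 2)
         if random.random() < edge_prob(u, v)]
print(edges)
```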

Recap: AGM networks [figure: model and the network it generates].

AGM: Flexibility. AGM can express a variety of community structures: non-overlapping, overlapping, nested.

Detecting Communities. Detecting communities with AGM: the model generates the network; given the network, we infer the model. Given a graph G(V, E), find the model: 1) the affiliation graph M, 2) the number of communities C, 3) the parameters p_c.

Maximum Likelihood Estimation. Maximum Likelihood Principle (MLE): Given data X. Assumption: the data is generated by some model f(Θ), where f is the model and Θ its parameters. We want to estimate P_f(X | Θ): the probability that our model f (with parameters Θ) generated the data. Now let's find the most likely model that could have generated the data: arg max_Θ P_f(X | Θ).

Example: MLE. Imagine we are given a set of coin flips. Task: figure out the bias of the coin! Data: a sequence of coin flips X = [1, 0, 0, 0, 1, 0, 0, 1]. Model: f(Θ) returns 1 with probability Θ, else 0. What is P_f(X | Θ)? Assuming the coin flips are independent, P_f(X | Θ) = P_f(1 | Θ) · P_f(0 | Θ) · P_f(0 | Θ) · ... · P_f(1 | Θ). What is P_f(1 | Θ)? Simple: P_f(1 | Θ) = Θ. Then P_f(X | Θ) = Θ^3 (1 − Θ)^5. For example: P_f(X | Θ = 0.5) = 0.003906 and P_f(X | Θ = 3/8) = 0.005029. What did we learn? Our data was most likely generated by a coin with bias Θ = 3/8, the value at which P_f(X | Θ) is maximized.
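As a quick numerical check of the coin example (a minimal sketch, not part of the original slides), the code below evaluates P_f(X | Θ) for the two values above and confirms by grid search that the maximizer is the empirical fraction of heads, 3/8.

```python
import numpy as np

X = np.array([1, 0, 0, 0, 1, 0, 0, 1])  # three heads, five tails

def likelihood(theta, x):
    """P_f(X | theta) = theta^(#ones) * (1 - theta)^(#zeros), assuming independent flips."""
    return theta ** x.sum() * (1 - theta) ** (len(x) - x.sum())

print(likelihood(0.5, X))      # ~0.003906
print(likelihood(3 / 8, X))    # ~0.005029

thetas = np.linspace(0.01, 0.99, 981)                         # grid of candidate biases
print(thetas[np.argmax([likelihood(t, X) for t in thetas])])  # -> 0.375
```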

MLE for Graphs. How do we do MLE for graphs? The model generates a probabilistic adjacency matrix; we then flip a biased coin for each entry to obtain the binary adjacency matrix A. For every pair of nodes u, v, AGM gives the probability p_uv of them being linked. Example probabilistic matrix:
0     0.9   0.4   0.04
0.1   0     0.85  0.75
0.1   0.77  0     0.6
0.04  0.65  0.7   0
Flip biased coins to obtain the binary adjacency matrix A:
0 1 0 0
1 0 1 1
0 1 0 1
0 1 1 0
The likelihood of AGM generating graph G:
P(G | Θ) = ∏_{(u,v) ∈ E} P(u, v) · ∏_{(u,v) ∉ E} (1 − P(u, v))
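The "flip biased coins" step can be sketched as follows; the probabilistic matrix is the one shown above, and the random seed is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Probabilistic adjacency matrix from the slide: entry (u, v) is the edge probability.
P = np.array([[0.00, 0.90, 0.40, 0.04],
              [0.10, 0.00, 0.85, 0.75],
              [0.10, 0.77, 0.00, 0.60],
              [0.04, 0.65, 0.70, 0.00]])

# Flip one biased coin per entry to obtain a binary adjacency matrix A.
# (For an undirected graph, sample only the upper triangle and mirror it.)
A = (rng.random(P.shape) < P).astype(int)
print(A)
```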

Graphs: Likelihood P(G | Θ). Given a graph G(V, E) and Θ = B(V, C, M, {p_c}), we calculate the likelihood that Θ generated G:
P(G | Θ) = ∏_{(u,v) ∈ E} P(u, v) · ∏_{(u,v) ∉ E} (1 − P(u, v))
Example: model edge probabilities
0   0.9 0.9 0
0.9 0   0.9 0
0.9 0.9 0   0.9
0   0   0.9 0
and observed graph G with adjacency matrix
0 1 1 0
1 0 1 0
1 1 0 1
0 0 1 0
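A direct translation of the likelihood formula into code, using the example matrices as reconstructed above (a sketch for intuition, not the reference implementation).

```python
import numpy as np

P = np.array([[0.0, 0.9, 0.9, 0.0],     # model edge probabilities
              [0.9, 0.0, 0.9, 0.0],
              [0.9, 0.9, 0.0, 0.9],
              [0.0, 0.0, 0.9, 0.0]])
A = np.array([[0, 1, 1, 0],             # observed adjacency matrix of G
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

def graph_likelihood(P, A):
    """P(G|Theta) = prod over edges of P(u,v) times prod over non-edges of (1 - P(u,v))."""
    like = 1.0
    n = len(A)
    for u in range(n):
        for v in range(u + 1, n):        # each unordered pair counted once
            like *= P[u, v] if A[u, v] else (1.0 - P[u, v])
    return like

print(graph_likelihood(P, A))            # 0.9^4 = 0.6561 for this example
```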

MLE for Graphs. Our goal: find Θ = B(V, C, M, {p_c}) such that arg max_Θ P(G | Θ). How do we find the B(V, C, M, {p_c}) that maximizes the likelihood?

MLE for AGM. Our goal is to find B(V, C, M, {p_c}) such that:
arg max_{B(V, C, M, {p_c})} ∏_{(u,v) ∈ E} P(u, v) · ∏_{(u,v) ∉ E} (1 − P(u, v))
Problem: finding B means finding the bipartite affiliation network, and there is no nice way to do this. Fitting B(V, C, M, {p_c}) is too hard, so let's change the model (so it is easier to fit)!

From AGM to BigCLAM. Relaxation: memberships have strengths. F_uA is the membership strength of node u to community A (F_uA = 0: no membership). Each community A links nodes independently:
P_A(u, v) = 1 − exp(−F_uA · F_vA)

Factor Matrix F. Community membership strength matrix F (rows: nodes, columns: communities); F_uA is the strength of u's membership to community A, and F_u is the vector of community membership strengths of u.
P_A(u, v) = 1 − exp(−F_uA · F_vA)
The probability of connection increases with the product of the strengths. Notice: if one node does not belong to the community (its strength is 0), then P_A(u, v) = 0. Probability that at least one common community C links the nodes:
P(u, v) = 1 − ∏_C (1 − P_C(u, v))

From AGM to BigCLAM. Community A links nodes u, v independently: P_A(u, v) = 1 − exp(−F_uA · F_vA). Then the probability that at least one common community C links them is
P(u, v) = 1 − ∏_C (1 − P_C(u, v)) = 1 − exp(−Σ_C F_uC · F_vC) = 1 − exp(−F_u F_v^T)
Example F matrix (node community membership strengths):
F_u: 0    1.2  0  0.2
F_v: 0.5  0    0  0.8
F_w: 0    1.8  1  0
Then F_u F_v^T = 0.16, so P(u, v) = 1 − exp(−0.16) ≈ 0.15. But P(u, w) ≈ 0.88 and P(v, w) = 0.
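The example above in code: the membership-strength vectors are the rows of F as reconstructed from the slide, and the outputs reproduce the quoted numbers (0.16, ≈0.15, ≈0.88, 0).

```python
import numpy as np

# Membership-strength vectors (rows of the example F matrix).
F = {"u": np.array([0.0, 1.2, 0.0, 0.2]),
     "v": np.array([0.5, 0.0, 0.0, 0.8]),
     "w": np.array([0.0, 1.8, 1.0, 0.0])}

def edge_prob(a, b):
    """P(a,b) = 1 - exp(-F_a . F_b): prob. that at least one shared community links a, b."""
    return 1.0 - np.exp(-F[a] @ F[b])

print(F["u"] @ F["v"])        # 0.16
print(edge_prob("u", "v"))    # ~0.15
print(edge_prob("u", "w"))    # ~0.88
print(edge_prob("v", "w"))    # 0.0
```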

BigCLAM: How to find F. Task: given a network G(V, E), estimate F. Find F that maximizes the likelihood:
arg max_F ∏_{(u,v) ∈ E} P(u, v) · ∏_{(u,v) ∉ E} (1 − P(u, v))
where P(u, v) = 1 − exp(−F_u F_v^T). Often we take the logarithm of the likelihood and call it the log-likelihood: ℓ(F) = log P(G | F). Goal: find F that maximizes
ℓ(F) = Σ_{(u,v) ∈ E} log(1 − exp(−F_u F_v^T)) − Σ_{(u,v) ∉ E} F_u F_v^T
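A small sketch of the log-likelihood computation, under the assumption that the objective above is the one being optimized; the factor matrix F, the edge set, and the numerical guard against log(0) are toy/illustrative choices.

```python
import numpy as np

def log_likelihood(F, edges):
    """l(F) = sum_{(u,v) in E} log(1 - exp(-F_u.F_v)) - sum_{(u,v) not in E} F_u.F_v."""
    n = F.shape[0]
    ll = 0.0
    for u in range(n):
        for v in range(u + 1, n):
            dot = F[u] @ F[v]
            if (u, v) in edges or (v, u) in edges:
                ll += np.log(1.0 - np.exp(-dot) + 1e-12)  # small constant guards log(0)
            else:
                ll -= dot
    return ll

# Toy example: 3 nodes, 2 communities, a single observed edge (0, 1).
F = np.array([[1.0, 0.0],
              [0.8, 0.1],
              [0.0, 1.2]])
print(log_likelihood(F, {(0, 1)}))
```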

BigCLAM: V1.0. Compute the gradient of a single row F_u of F:
∇ℓ(F_u) = Σ_{v ∈ N(u)} F_v · exp(−F_u F_v^T) / (1 − exp(−F_u F_v^T)) − Σ_{v ∉ N(u)} F_v
Coordinate gradient ascent: iterate over the rows of F; compute the gradient ∇ℓ(F_u) of row u (while keeping the other rows fixed); update the row: F_u ← F_u + η ∇ℓ(F_u); project F_u back to a non-negative vector: if F_uC < 0, set F_uC = 0. This is slow! Computing ∇ℓ(F_u) takes time linear in the number of nodes. N(u): the set of (out-going) neighbors of u.

BigCLAM: V2.0. However, we notice that Σ_{v ∉ N(u)} F_v = Σ_v F_v − F_u − Σ_{v ∈ N(u)} F_v. We cache Σ_v F_v, so computing the gradient of row u now takes time linear in the degree |N(u)| of u. In networks the degree of a node is much smaller than the total number of nodes, so this is a significant speedup!
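A sketch of the cached update, assuming the gradient given above; `neighbors` maps each node index to its set of neighbor indices, and the step size, iteration count, and toy graph are arbitrary illustrative choices.

```python
import numpy as np

def grad_row(F, u, neighbors, F_total):
    """Gradient of l(F) w.r.t. row F_u, using the cached column sum F_total = sum_v F_v."""
    neigh = list(neighbors[u])
    pos = np.zeros_like(F[u])
    for v in neigh:                       # term over the neighbors of u
        dot = F[u] @ F[v]
        pos += F[v] * np.exp(-dot) / (1.0 - np.exp(-dot) + 1e-12)
    neigh_sum = F[neigh].sum(axis=0) if neigh else np.zeros_like(F[u])
    neg = F_total - F[u] - neigh_sum      # sum over non-neighbors, via the cached total
    return pos - neg

def fit_bigclam(F, neighbors, eta=0.01, iters=50):
    """Coordinate gradient ascent with non-negativity projection (illustrative sketch)."""
    F_total = F.sum(axis=0)               # cache sum_v F_v once
    for _ in range(iters):
        for u in range(F.shape[0]):
            new_row = np.maximum(0.0, F[u] + eta * grad_row(F, u, neighbors, F_total))
            F_total += new_row - F[u]      # keep the cached sum consistent
            F[u] = new_row
    return F

# Toy usage: a triangle plus a pendant node, 2 communities, random non-negative init.
neighbors = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
F = np.abs(np.random.default_rng(0).normal(size=(4, 2)))
print(fit_bigclam(F, neighbors))
```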

BigCLAM: Scalability. BigCLAM takes 5 minutes on networks with 300k nodes, where other methods take 10 days. It can process networks with 100M edges!

Extension: Beyond Clusters

Extension: Directed AGM. Extension: make community membership edges directed! Outgoing membership: the node sends edges. Incoming membership: the node receives edges.

Example: Model and Network

Directed AGM. Everything is almost the same, except now we have two matrices, F and H: F holds out-going community memberships and H holds in-coming community memberships. Edge probability:
P(u, v) = 1 − exp(−F_u H_v^T)
The network log-likelihood has the same form as before (with F_u H_v^T in place of F_u F_v^T), and we optimize it the same way as before.
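In code, the directed edge probability is the same one-liner with H_v in place of F_v; the vectors below are made-up values for illustration.

```python
import numpy as np

F_u = np.array([1.0, 0.0, 0.3])   # u's out-going community membership strengths
H_v = np.array([0.5, 0.2, 0.0])   # v's in-coming community membership strengths

p_uv = 1.0 - np.exp(-F_u @ H_v)   # P(u -> v) = 1 - exp(-F_u . H_v)
print(p_uv)                       # ~0.39
```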

Predator-prey Communities

Questions?

References
Yang, J. and Leskovec, J. Community-Affiliation Graph Model for Overlapping Network Community Detection. IEEE International Conference on Data Mining (ICDM), 2012.
Yang, J. and Leskovec, J. Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach. ACM International Conference on Web Search and Data Mining (WSDM), 2013.
Yang, J., McAuley, J. and Leskovec, J. Detecting Cohesive and 2-mode Communities in Directed and Undirected Networks. ACM International Conference on Web Search and Data Mining (WSDM), 2014.
Yang, J., McAuley, J. and Leskovec, J. Community Detection in Networks with Node Attributes. IEEE International Conference on Data Mining (ICDM), 2013.