Quilting Stochastic Kronecker Graphs to Generate Multiplicative Attribute Graphs

Similar documents
Quilting Stochastic Kronecker Product Graphs to Generate Multiplicative Attribute Graphs

Supporting Statistical Hypothesis Testing Over Graphs

Tied Kronecker Product Graph Models to Capture Variance in Network Populations

GraphRNN: A Deep Generative Model for Graphs (24 Feb 2018)

Graph Detection and Estimation Theory

Consistency Under Sampling of Exponential Random Graph Models

Using Bayesian Network Representations for Effective Sampling from Generative Network Models

Mixed Membership Stochastic Blockmodels

Lecture 21: Spectral Learning for Graphical Models

Large-Scale Feature Learning with Spike-and-Slab Sparse Coding

26 : Spectral GMs. Lecturer: Eric P. Xing Scribes: Guillermo A Cidre, Abelino Jimenez G.

CS224W: Methods of Parallelized Kronecker Graph Generation

Scalable Gaussian process models on matrices and tensors

Learning Structured Probability Matrices!

Specification and estimation of exponential random graph models for social (and other) networks

Lifted and Constrained Sampling of Attributed Graphs with Generative Network Models

ICML Scalable Bayesian Inference on Point processes. with Gaussian Processes. Yves-Laurent Kom Samo & Stephen Roberts

Analytic Theory of Power Law Graphs

Facebook Friends! and Matrix Functions

Algebra II. A2.1.1 Recognize and graph various types of functions, including polynomial, rational, and algebraic functions.

ORIE 4741: Learning with Big Messy Data. Spectral Graph Theory

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

Introduction to the Tensor Train Decomposition and Its Applications in Machine Learning

Collaborative topic models: motivations cont

Using R for Iterative and Incremental Processing

Scalable and exact sampling method for probabilistic generative graph models

RaRE: Social Rank Regulated Large-scale Network Embedding

Statistical and Computational Phase Transitions in Planted Models

A physical model for efficient rankings in networks

Lecture 1: Asymptotics, Recurrences, Elementary Sorting

How to exploit network properties to improve learning in relational domains

Community Detection. fundamental limits & efficient algorithms. Laurent Massoulié, Inria

STA 4273H: Statistical Machine Learning

Essential Learning Outcomes for Algebra 2

Nonparametric Bayesian Matrix Factorization for Assortative Networks

Deterministic Decentralized Search in Random Graphs

Distributed Randomized Algorithms for the PageRank Computation Hideaki Ishii, Member, IEEE, and Roberto Tempo, Fellow, IEEE

Piazza Recitation session: Review of linear algebra Location: Thursday, April 11, from 3:30-5:20 pm in SIG 134 (here)

Iterative solvers for linear equations

STA 414/2104: Machine Learning

Bayesian Inference for Contact Networks Given Epidemic Data

CME323 Distributed Algorithms and Optimization. GloVe on Spark. Alex Adamson SUNet ID: aadamson. June 6, 2016

CURRICULUM CATALOG. Algebra II (3135) VA

Theory and Methods for the Analysis of Social Networks

A New Space for Comparing Graphs

Information Recovery from Pairwise Measurements

1 Mechanistic and generative models of network structure

Modeling of Growing Networks with Directional Attachment and Communities

Intelligent Systems (AI-2)

Matrices and Vectors

Lecture 13: Spectral Graph Theory

13: Variational inference II

Blind Identification of Invertible Graph Filters with Multiple Sparse Inputs 1

Bayesian Methods for Machine Learning

Reconstruction in the Generalized Stochastic Block Model

Recoverabilty Conditions for Rankings Under Partial Information

Probabilistic Graphical Models

COMPUTING SIMILARITY BETWEEN DOCUMENTS (OR ITEMS) This part is to a large extent based on slides obtained from

Randomized Algorithms

Lecture: Local Spectral Methods (1 of 4)

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

MATRIX DETERMINANTS. 1 Reminder Definition and components of a matrix

Robust Principal Component Analysis

Log Gaussian Cox Processes. Chi Group Meeting February 23, 2016

Parameter estimators of sparse random intersection graphs with thinned communities

Data Mining and Matrices

Content-based Recommendation

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

More on Neural Networks

Curriculum Catalog

CPSC 540: Machine Learning

CS6220: DATA MINING TECHNIQUES

Accelerated Training of Max-Margin Markov Networks with Kernels

A Nearly Sublinear Approximation to exp{p}e i for Large Sparse Matrices from Social Networks

Data mining in large graphs

Finite Model Theory and Graph Isomorphism. II.

Community Detection on Euclidean Random Graphs

Network Event Data over Time: Prediction and Latent Variable Modeling

Algebra 2 Syllabus. Certificated Teacher: Date: Desired Results

Graph Sparsification III: Ramanujan Graphs, Lifts, and Interlacing Families

A Random Dot Product Model for Weighted Networks arxiv: v1 [stat.ap] 8 Nov 2016

Algorithmic approaches to fitting ERG models

Fast Multipole Methods: Fundamentals & Applications. Ramani Duraiswami Nail A. Gumerov

Modeling heterogeneity in random graphs

Bayesian Learning in Undirected Graphical Models

An Introduction to Exponential-Family Random Graph Models

MACFP: Maximal Approximate Consecutive Frequent Pattern Mining under Edit Distance

Linear Algebra and Probability

Scaling Neighbourhood Methods

VCMC: Variational Consensus Monte Carlo

1 Complex Networks - A Brief Overview

Communities Via Laplacian Matrices. Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices

Matrix estimation by Universal Singular Value Thresholding

Spectral Methods for Subgraph Detection

Transcription:

Quilting Stochastic Kronecker Graphs to Generate Multiplicative Attribute Graphs Hyokun Yun (work with S.V.N. Vishwanathan) Department of Statistics Purdue Machine Learning Seminar November 9, 2011

Overview Question: How can we efficiently sample graphs from the Multiplicative Attribute Graphs model? We introduce the first sub-quadratic sampling algorithm for Multiplicative Attribute Graphs. Time complexity: O((log2(n))^3 |E|) under mild conditions, where n is the number of nodes in the graph and |E| is the number of edges. We exploit the close connection between Stochastic Kronecker Graphs (SKG) and Multiplicative Attribute Graphs (MAG). We can sample a graph with 8 million nodes and 20 billion edges in under 6 hours (the naive algorithm would take 93 days).

Outline 1 Introduction 2 Stochastic Kronecker Graphs (SKG) Model 3 Multiplicative Attribute Graphs (MAG) Model 4 Quilting Algorithm 5 Experiments 6 Conclusion

Motivation: Protein-Protein Interaction Network (H. Jeong et al., 2001); Social Network (Facebook).

Need for a Statistical Model. A statistical model is a class of probability distributions describing the stochastic process that could have generated the data, capturing the uncertainty in the data. Examples: continuous data: Normal distribution; count data: Poisson distribution. Now, graph data: ?!?! We need a probability distribution on the space of graphs!

Initial Works. Erdős–Rényi model (1960). Exponential Random Graph Model (Anderson et al., 1999): usually called ERGM or p*; uses the usual exponential family model with interesting features of the graph as sufficient statistics. Latent Space Model (Hoff et al., 2002): embeds nodes into a latent social space. Problem: parameter estimation and sampling algorithms do not scale (at least O(n^2)).

The Hunt for Scalability. Stochastic Kronecker Graphs (SKG) model (Leskovec et al., 2010): parameter estimation O(n log2(n)) for each MCMC step; sampling O(log2(n) |E|). Multiplicative Attribute Graphs (MAG) model (Kim and Leskovec, 2010), a generalization of the SKG model: parameter estimation O((log2(n))^2 |E|); sampling ?!?!

Outline 1 Introduction 2 Stochastic Kronecker Graphs (SKG) Model 3 Multiplicative Attribute Graphs (MAG) Model 4 Quilting Algorithm 5 Experiments 6 Conclusion

Graph and Adjacency Matrix. Let G be a directed graph with nodes labeled 1, 2, ..., n. Its adjacency matrix A has A_ij = 1 if there is an edge from node i to node j, and A_ij = 0 otherwise. [Figure: an example graph G on 8 nodes and its adjacency matrix A.]

Question. You are given an adjacency matrix, and you want to describe it compactly; in other words, compress it using a small number of parameters.

Kronecker Multiplication. Suppose you are given a 2x2 matrix Θ = (0.4 0.7; 0.7 0.9). You would like to take the Kronecker product of Θ with itself: Θ ⊗ Θ.
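For readers who want to follow along, the product can be computed with NumPy's `np.kron`; a minimal sketch using the seed matrix from the slides (variable names are mine):

```python
import numpy as np

# The 2x2 seed matrix from the slides.
theta = np.array([[0.4, 0.7],
                  [0.7, 0.9]])

# Kronecker product of theta with itself: each entry theta[i, j]
# scales a full copy of theta, giving a 4x4 matrix.
theta2 = np.kron(theta, theta)
```

The top-left entry of the result is 0.4 * 0.4 and the bottom-right entry is 0.9 * 0.9, matching the "stretch, repeat, multiply" picture on the next slides.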

Kronecker Multiplication. On the left side, stretch the matrix! On the right side, repeat the matrix four times! Now, element-wise multiply each entry: this gives Θ ⊗ Θ, a 4x4 matrix.

Kronecker Multiplication. You can do it further: Θ ⊗ Θ ⊗ Θ, and so on.

Generative Model. Stochastic Kronecker Graphs (SKG) Model: the probability of an edge between two nodes is given by the Kronecker power of the parameter matrix Θ. Each edge is independently sampled given the parameter matrix.
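The generative process can be sketched in its naive O(n^2) form, which is the baseline the next slides improve on; a minimal NumPy sketch (function name mine):

```python
import numpy as np

def sample_skg_naive(theta, k, rng=None):
    """Sample an SKG adjacency matrix the naive O(n^2) way: form the
    k-th Kronecker power of theta, then draw every edge independently
    as a Bernoulli trial with the corresponding probability."""
    rng = rng or np.random.default_rng(0)
    p = theta
    for _ in range(k - 1):
        p = np.kron(p, theta)          # edge-probability matrix, n = 2^k
    return (rng.random(p.shape) < p).astype(int)

adj = sample_skg_naive(np.array([[0.4, 0.7], [0.7, 0.9]]), k=3)
```

With k = 3 this yields an 8x8 binary adjacency matrix; the point of the talk is that forming p explicitly like this does not scale.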

Sampling Algorithm. Sampling the entries of the adjacency matrix one by one would take O(n^2) time. The fractal structure of the Kronecker matrix lets us do it in a more clever way: O(log2(n) |E|).

Sampling Algorithm. Divide the matrix into four quadrants, each corresponding to an entry of Θ = (0.4 0.7; 0.7 0.9). Choose one quadrant with probability proportional to its entry (say, the 0.9 quadrant). Divide the chosen quadrant into four quadrants again and choose one (say, 0.7), and again (say, 0.9), until a single entry is reached. Repeat this procedure |E| times, once per edge.
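The recursive descent above can be sketched directly; a hedged sketch (names mine), in the "ball dropping" style commonly used for Kronecker graph generators. Note that repeating the descent once per edge can produce duplicate edges, which generators in practice typically discard:

```python
import random

def sample_edge(theta, k, rng=None):
    """Descend k levels of the Kronecker fractal: at each level pick one
    of the four quadrants with probability proportional to its seed
    entry, accumulating the row/column bits of the sampled edge."""
    rng = rng or random.Random(0)
    total = sum(theta[i][j] for i in range(2) for j in range(2))
    row, col = 0, 0
    for _ in range(k):
        u = rng.uniform(0, total)
        acc, done = 0.0, False
        for i in range(2):
            for j in range(2):
                acc += theta[i][j]
                if u <= acc:
                    # Append this level's quadrant bits to the indices.
                    row, col = 2 * row + i, 2 * col + j
                    done = True
                    break
            if done:
                break
    return row, col

r, c = sample_edge([[0.4, 0.7], [0.7, 0.9]], k=3)
```

Each call costs O(k) = O(log2(n)), giving the O(log2(n) |E|) total claimed on the slide.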

Difficulty of Parameter Estimation. Observed graph data never arrives permuted so as to resemble a Kronecker matrix; you have to find the right permutation to do the inference! Quiz: the following are the same adjacency matrix sampled from the SKG model, permuted differently. Which one is the unpermuted version?

Outline 1 Introduction 2 Stochastic Kronecker Graphs (SKG) Model 3 Multiplicative Attribute Graphs (MAG) Model 4 Quilting Algorithm 5 Experiments 6 Conclusion

Idea. Suppose there are d attributes which describe each node; each node either possesses or lacks each attribute. For example, each attribute can be understood as an answer to a question such as: Do you like playing StarCraft? Do you speak Esperanto? Do you watch Big Bang Theory? There is a 2x2 parameter matrix associated with each attribute. For example, Θ_SC = (0.5 0.2; 0.3 0.9), Θ_Esp = (0.6 0.4; 0.3 0.8), Θ_BB = (0.2 0.3; 0.3 0.6).

Model. Θ_SC = (0.5 0.2; 0.3 0.9), Θ_Esp = (0.6 0.4; 0.3 0.8), Θ_BB = (0.2 0.3; 0.3 0.6). Consider two people: Yun (StarCraft O, Esperanto X, Big Bang Theory O) and Bill (StarCraft X, Esperanto X, Big Bang Theory O). The probability of an edge from Yun to Bill is given by P_yun,bill = 0.3 (SC) × 0.6 (Esp) × 0.6 (BB) = 0.108. The effect of each attribute is multiplicative. Proposed by Kim and Leskovec (2010).
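The multiplicative rule can be written out directly; a minimal sketch where I encode lacking an attribute as 0 and possessing it as 1 (an encoding I chose because it reproduces the slide's numbers; function and variable names are mine):

```python
def mag_edge_prob(thetas, attrs_u, attrs_v):
    """MAG edge probability: multiply, over the attributes, the matrix
    entry indexed by the sender's and receiver's attribute values
    (0 = lacks the attribute, 1 = possesses it)."""
    p = 1.0
    for theta, a_u, a_v in zip(thetas, attrs_u, attrs_v):
        p *= theta[a_u][a_v]
    return p

theta_sc  = [[0.5, 0.2], [0.3, 0.9]]
theta_esp = [[0.6, 0.4], [0.3, 0.8]]
theta_bb  = [[0.2, 0.3], [0.3, 0.6]]

yun  = (1, 0, 1)   # StarCraft O, Esperanto X, Big Bang Theory O
bill = (0, 0, 1)   # StarCraft X, Esperanto X, Big Bang Theory O
p = mag_edge_prob([theta_sc, theta_esp, theta_bb], yun, bill)
# 0.3 (SC) * 0.6 (Esp) * 0.6 (BB) = 0.108
```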

Graphical Representation. From the attributes of senders and the attributes of receivers we form the edge probability matrix Q; the adjacency matrix is then sampled from Q, with each edge sampled independently. Naively this takes O(n^2 d) time. Can we do it faster?

Outline 1 Introduction 2 Stochastic Kronecker Graphs (SKG) Model 3 Multiplicative Attribute Graphs (MAG) Model 4 Quilting Algorithm 5 Experiments 6 Conclusion

Idea. Suppose we have a set of attributes which looks like the following, and the parameter for each attribute is the same: Θ = (0.4 0.7; 0.7 0.9). What would the edge probability matrix Q look like?

Idea. This is the answer! Looks familiar? It is the Kronecker power of the matrix: Θ, then Θ^[2], then Θ^[3], ... — the edge probability matrix Q is a Kronecker power of Θ.
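The identification of Q with a Kronecker power can be checked numerically: assuming nodes are ordered by the binary expansion of their attribute configuration (most significant bit first), the matrix built entry-by-entry from the MAG product rule coincides with the Kronecker power. A sketch (all names mine):

```python
import numpy as np

theta = np.array([[0.4, 0.7], [0.7, 0.9]])
d = 3
n = 2 ** d

# Node u's attribute vector is the binary expansion of u (MSB first).
def bits(u):
    return [(u >> (d - 1 - l)) & 1 for l in range(d)]

# Edge-probability matrix built the MAG way: each attribute contributes
# one multiplicative factor taken from theta.
q_mag = np.array([[np.prod([theta[bits(u)[l], bits(v)[l]] for l in range(d)])
                   for v in range(n)] for u in range(n)])

# The same matrix built as a Kronecker power.
q_kron = theta
for _ in range(d - 1):
    q_kron = np.kron(q_kron, theta)

assert np.allclose(q_mag, q_kron)
```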

Connection. If every node has a unique attribute configuration, it suffices to sample from the SKG model! The problem is, there are duplications... Solution: sample multiple SKGs!

Step 1. Partition the nodes such that each node has a unique attribute configuration within its part. [Figure: the nodes split into two parts, (1) and (2).]
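One way to realize this partitioning step is a greedy pass: the i-th occurrence of each configuration goes to part i, so no configuration repeats within a part. A hypothetical sketch (not necessarily the authors' construction):

```python
from collections import defaultdict

def partition_nodes(attr_configs):
    """Greedy partition: the i-th occurrence of a configuration goes to
    part i, so within each part every configuration is unique. The
    number of parts B equals the largest multiplicity of any
    configuration."""
    seen = defaultdict(int)    # configuration -> occurrences so far
    parts = defaultdict(list)  # part index -> node list
    for node, config in enumerate(attr_configs):
        parts[seen[config]].append(node)
        seen[config] += 1
    return [parts[i] for i in range(len(parts))]

configs = [(0, 1), (1, 1), (0, 1), (1, 0)]  # (0, 1) appears twice -> B = 2
print(len(partition_nodes(configs)))         # 2
```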

Step 2. Use this partition to divide the edge probability matrix Q into blocks Q^(1,1), Q^(1,2), Q^(2,1), Q^(2,2).

Step 3. When permuted, each Q^(k,l) block becomes a submatrix of the Kronecker product matrix.
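This submatrix view is easy to express with fancy indexing: assuming each node in a part is identified with the row/column index of its attribute configuration in the Kronecker power, a block Q^(k,l) is just a row/column selection. A small sketch (the part contents are illustrative):

```python
import numpy as np

theta = np.array([[0.4, 0.7], [0.7, 0.9]])
q = np.kron(theta, theta)  # Theta^[2], the full 4x4 Kronecker power

# Hypothetical parts: nodes identified by their attribute configurations,
# written as indices into the Kronecker power (00 -> 0, 01 -> 1, 11 -> 3).
part_k = [0, 1, 3]
part_l = [0, 1, 3]

# Within a block, the edge-probability matrix is a submatrix of the
# Kronecker power, picked out by the configurations of the two parts.
q_block = q[np.ix_(part_k, part_l)]
```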

Step 4. We sample a graph for each Q^(k,l), obtaining pieces A^(1,1), A^(1,2), A^(2,1), A^(2,2), and quilt these pieces together to form the final graph A!

Analysis. The size of the partition, B, is critically important! In the last example B = 2, so we had to sample B^2 = 2^2 = 4 SKGs. If B = 3, we have to sample 3^2 = 9 SKGs! If B = O(n), this is useless! When the density of the attribute matrix µ is 0.5, B = O(log2 n). Since sampling each SKG takes O(log2(n) |E|), the overall time complexity is O((log2(n))^3 |E|).

Analysis. Actually, B = O(log2 n) is not even a tight bound. [Plot: size of the partition B versus number of nodes n (up to 10^6); the observed B stays below the log2(n) curve.]

Further Considerations. Existence of small sets: sampling a whole SKG for a small part would be wasteful. µ ≠ 0.5: when µ is close to 0 or 1, some attribute configurations are much more frequent than others. [Plot: number of occurrences of each attribute configuration (ranked), on log-log axes, for µ = 0.5, 0.6, 0.7, 0.9.]

Outline 1 Introduction 2 Stochastic Kronecker Graphs (SKG) Model 3 Multiplicative Attribute Graphs (MAG) Model 4 Quilting Algorithm 5 Experiments 6 Conclusion

Setup. We chose two parameter matrices from the literature (Kim and Leskovec, 2010; Moreno and Neville, 2009): Θ1 = [0.15 0.7; 0.7 0.85] and Θ2 = [0.35 0.52; 0.52 0.95]. Set the number of attributes d = log2 n and µ = 0.5. Now increase n and observe the running time!

Total Running Time. [Plot: running time (ms) versus number of nodes n (up to 8×10^6) under Θ1 and Θ2, comparing Quilting against Naive.]

Running Time per Edge. [Plot: running time per edge versus number of nodes n (up to 8×10^6) under Θ1 and Θ2, comparing Quilting against Naive.]

The Effect of Attribute Density µ. [Plot: relative running time ρ(µ) versus attribute probability µ (0.2 to 0.8) under Θ1 and Θ2, for n = 2^10, 2^12, 2^14, 2^16, 2^18.]

Outline 1 Introduction 2 Stochastic Kronecker Graphs (SKG) Model 3 Multiplicative Attribute Graphs (MAG) Model 4 Quilting Algorithm 5 Experiments 6 Conclusion

Summary Question: How can we efficiently sample graphs from the Multiplicative Attribute Graphs model? We introduced the first sub-quadratic sampling algorithm for Multiplicative Attribute Graphs. Time complexity: O((log2(n))^3 |E|) under mild conditions, where n is the number of nodes in the graph and |E| is the number of edges. We exploit the close connection between Stochastic Kronecker Graphs (SKG) and Multiplicative Attribute Graphs (MAG). We can sample a graph with 8 million nodes and 20 billion edges in under 6 hours (the naive algorithm would take 93 days).

Further Directions. The analysis for µ ≠ 0.5 is only empirical. When the attribute matrix is sparse and high-dimensional, the method fails completely; similarity search algorithms such as Locality Sensitive Hashing (LSH) or Cover Trees may help. The characteristics of such MAG models would also be of interest.