Quilting Stochastic Kronecker Graphs to Generate Multiplicative Attribute Graphs Hyokun Yun (work with S.V.N. Vishwanathan) Department of Statistics Purdue Machine Learning Seminar November 9, 2011
Overview Question: How can we efficiently sample graphs from the Multiplicative Attribute Graphs model? We introduce the first sub-quadratic sampling algorithm for Multiplicative Attribute Graphs. Time complexity: O((log₂ n)³ |E|) under mild conditions, where n is the number of nodes and |E| is the number of edges. We exploit the close connection between Stochastic Kronecker Graphs (SKG) and Multiplicative Attribute Graphs (MAG). We can sample a graph with 8 million nodes and 20 billion edges in under 6 hours (the naïve algorithm would take 93 days).
Outline 1 Introduction 2 Stochastic Kronecker Graphs (SKG) Model 3 Multiplicative Attribute Graphs (MAG) Model 4 Quilting Algorithm 5 Experiments 6 Conclusion
Motivation Protein-Protein Interaction Network Social Network (Facebook) H. Jeong, et al., 2001
Need for Statistical Model Statistical Model: a class of probability distributions describing the stochastic process that could have generated the data; it captures the uncertainty in the data. Examples: continuous data: Normal distribution; count data: Poisson distribution. Now, graph data: ?!? We need a probability distribution on the space of graphs!
Initial Works Erdős–Rényi model (1960). Exponential Random Graph Model (Anderson et al., 1999): usually called ERGM or p*; uses the usual exponential family model with interesting features of the graph as sufficient statistics. Latent Space Model (Hoff et al., 2002): embeds nodes into a latent social space.
Initial Works Problem: for all of these models, parameter estimation and sampling algorithms do not scale (at least O(n²)).
The Hunt for Scalability Stochastic Kronecker Graphs (SKG) Model (Leskovec et al., 2010). Parameter estimation: O(n log₂(n)) for each MCMC step. Sampling: O(log₂(n) |E|). Multiplicative Attribute Graphs (MAG) Model (Kim and Leskovec, 2010): a generalization of the SKG model. Parameter estimation: O((log₂ n)² |E|). Sampling: ?!?
Graph and Adjacency Matrix Let G be a directed graph with nodes labeled 1, 2, ..., n. The adjacency matrix A has A_ij = 1 if there is an edge from node i to node j, and 0 otherwise. [Figure: a graph G on 8 nodes and its adjacency matrix A.]
Question You are given an adjacency matrix! You want to describe it compactly; in other words, compress it using a small number of parameters.
Kronecker Multiplication Suppose you are given a 2 × 2 matrix Θ = [0.4 0.7; 0.7 0.9]. You would like to take the Kronecker product of Θ with itself: Θ ⊗ Θ.
Kronecker Multiplication On the left side, stretch the matrix! On the right side, repeat the matrix four times! Now, element-wise multiply each pair of entries: the result is Θ ⊗ Θ.
Kronecker Multiplication You can go further: Θ ⊗ Θ ⊗ Θ, and so on.
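The stretch-repeat-multiply picture above is exactly what NumPy's `np.kron` computes; a minimal sketch (NumPy assumed available):

```python
import numpy as np

# The 2 x 2 seed matrix from the slides.
theta = np.array([[0.4, 0.7],
                  [0.7, 0.9]])

# One Kronecker product: each entry of theta ("stretched" on the left)
# scales a full copy of theta ("repeated" on the right).
theta_2 = np.kron(theta, theta)    # 4 x 4
theta_3 = np.kron(theta_2, theta)  # 8 x 8

# Corner entries are products of the corresponding seed entries,
# e.g. theta_2[0, 0] = 0.4 * 0.4 and theta_3[7, 7] = 0.9 ** 3.
```

Taking the k-th Kronecker power of a 2 × 2 seed gives a 2^k × 2^k matrix described by only four parameters.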
Generative Model Stochastic Kronecker Graphs (SKG) Model: the probability of an edge between two nodes is given by the Kronecker power of the parameter matrix Θ. Each edge is sampled independently given the parameter matrix.
Sampling Algorithm Sampling the entries of the adjacency matrix one by one would take O(n²) time. The fractal structure of the Kronecker matrix lets us do it in a more clever way: O(log₂(n) |E|).
Sampling Algorithm Divide the matrix into four quadrants, weighted 0.4, 0.7, 0.7, 0.9. Choose one quadrant with probability proportional to its weight (say the one weighted 0.9). Divide the chosen quadrant into four again and choose one (say 0.7), then again (say 0.9), descending until a single entry is reached. Repeat this whole descent |E| times, once per edge.
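The descent above can be sketched in plain Python; a toy implementation, not the paper's exact algorithm (the seed Θ is from the slides; duplicate positions are simply discarded here, one of several ways collisions can be handled):

```python
import random

# Seed matrix from the slides; its entries serve as quadrant weights.
THETA = [[0.4, 0.7],
         [0.7, 0.9]]
TOTAL = sum(sum(row) for row in THETA)   # 2.7

def sample_edge(k, rng=random):
    """Place one edge in the 2^k x 2^k Kronecker matrix by descending
    k levels, choosing quadrant (i, j) with probability THETA[i][j]/TOTAL."""
    row = col = 0
    for _ in range(k):
        u = rng.uniform(0.0, TOTAL)
        for i in (0, 1):
            for j in (0, 1):
                u -= THETA[i][j]
                if u <= 0.0:
                    row, col = 2 * row + i, 2 * col + j
                    break
            else:
                continue
            break
    return row, col

# Repeat the descent once per desired edge; a set discards duplicates.
edges = {sample_edge(10) for _ in range(1000)}   # positions in a 1024 x 1024 matrix
```

Each edge costs k = log₂(n) steps, so |E| edges cost O(log₂(n) |E|), matching the slide.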
Difficulty of Parameter Estimation Observed graph data never arrives permuted to resemble a Kronecker matrix; you have to find the right permutation to do the inference! Quiz: the following are the same adjacency matrix sampled from the SKG model, permuted differently. Which one is the unpermuted version?
Idea Suppose there are d attributes which describe each node, and each node either possesses or lacks each attribute. For example, each attribute can be understood as the answer to a question, such as: Do you like playing StarCraft? Do you speak Esperanto? Do you watch Big Bang Theory? There is a 2 × 2 parameter matrix associated with each attribute. For example, Θ_SC = [0.5 0.2; 0.3 0.9], Θ_Esp = [0.6 0.4; 0.3 0.8], Θ_BB = [0.2 0.3; 0.3 0.6].
Model Θ_SC = [0.5 0.2; 0.3 0.9], Θ_Esp = [0.6 0.4; 0.3 0.8], Θ_BB = [0.2 0.3; 0.3 0.6]. Consider two people: Yun (StarCraft O, Esperanto X, Big Bang Theory O) and Bill (StarCraft X, Esperanto X, Big Bang Theory O). The probability of an edge from Yun to Bill is built from one entry per attribute: P_{Yun,Bill} = 0.3 (SC) × 0.6 (Esp) × 0.6 (BB) = 0.108. The effect of each attribute is multiplicative. Proposed by Kim and Leskovec (2010).
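The multiplication above can be written out directly; a small sketch, assuming the convention (one consistent reading of the slide's numbers) that index 1 means possessing the attribute and 0 means lacking it:

```python
# Attribute parameter matrices from the slides; rows index the sender's
# attribute value, columns the receiver's (0 = lacks, 1 = possesses).
THETA_SC  = [[0.5, 0.2], [0.3, 0.9]]
THETA_ESP = [[0.6, 0.4], [0.3, 0.8]]
THETA_BB  = [[0.2, 0.3], [0.3, 0.6]]
THETAS = [THETA_SC, THETA_ESP, THETA_BB]

def edge_prob(sender, receiver):
    """P[u, v] = product over attributes k of Theta_k[a_k(u), a_k(v)]."""
    p = 1.0
    for theta, s, r in zip(THETAS, sender, receiver):
        p *= theta[s][r]
    return p

yun  = (1, 0, 1)   # StarCraft O, Esperanto X, Big Bang Theory O
bill = (0, 0, 1)   # StarCraft X, Esperanto X, Big Bang Theory O
p = edge_prob(yun, bill)   # 0.3 * 0.6 * 0.6 = 0.108
```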
Graphical Representation The edge probability matrix Q is built from the attributes of senders and receivers, and the adjacency matrix is sampled from Q: each edge is sampled independently. Naively this takes O(n² d). Can we do it faster?
Idea Suppose the attribute configurations of the nodes look like the following, and the parameter matrix for each attribute is the same: Θ = [0.4 0.7; 0.7 0.9]. What would the edge probability matrix Q look like?
Idea This is the answer! Looks familiar? It is the Kronecker power of the matrix: the blocks of Q build up as Θ, then Θ^[2], then Θ^[3] = Q.
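This connection is easy to check numerically; a sketch (NumPy assumed) that builds Q from per-attribute products over all 2^d configurations, ordered as binary numbers, and compares it with the Kronecker power:

```python
import itertools
import numpy as np

theta = np.array([[0.4, 0.7],
                  [0.7, 0.9]])
d = 3

# One node per attribute configuration, ordered as d-bit binary numbers.
configs = list(itertools.product([0, 1], repeat=d))

# MAG edge probabilities: one factor of theta per attribute.
Q = np.array([[np.prod([theta[s[k], r[k]] for k in range(d)])
               for r in configs] for s in configs])

# d-th Kronecker power of theta.
K = theta
for _ in range(d - 1):
    K = np.kron(K, theta)

assert np.allclose(Q, K)   # Q coincides with the Kronecker power
```

So when every configuration appears exactly once and all attribute matrices equal Θ, the MAG model reduces to the SKG model.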
Connection If every node had a unique attribute configuration, it would suffice to sample from the SKG model! The problem is that there are duplicate configurations... Solution: sample multiple SKGs!
Step 1 Partition the nodes such that each node has a unique attribute configuration within its partition.
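One simple way to build such a partition (the slides do not fix a particular scheme, so this greedy assignment is an assumption): send the i-th copy of each configuration to partition i.

```python
from collections import defaultdict

def partition_nodes(attr_configs):
    """Assign the i-th node seen with a given attribute configuration to
    partition i, so no configuration repeats within a partition."""
    copies_seen = defaultdict(int)   # configuration -> copies assigned so far
    partitions = defaultdict(list)   # partition index -> node ids
    for node, config in enumerate(attr_configs):
        partitions[copies_seen[config]].append(node)
        copies_seen[config] += 1
    return list(partitions.values())

# Toy example: (1, 1) occurs three times, so we need B = 3 partitions.
configs = [(0, 1), (0, 1), (1, 1), (1, 0), (1, 1), (1, 1)]
parts = partition_nodes(configs)
```

The number of partitions B is the largest multiplicity of any configuration.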
Step 2 Use this partition to divide the edge probability matrix Q into blocks Q^(1,1), Q^(1,2), Q^(2,1), Q^(2,2), one per pair of partitions.
Step 3 When permuted, each block Q^(k,l) becomes a submatrix of the Kronecker power matrix.
Step 4 We sample a graph A^(k,l) for each block Q^(k,l) and quilt these pieces together to form the final graph A!
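Putting the four steps together, a toy sketch of the quilting loop. Assumptions beyond the slides: each node's configuration, read as a binary number, serves as its index in the Kronecker matrix; the number of draws per block is taken as given; and samples landing on configurations absent from a partition are simply rejected. The real algorithm is more careful about edge counts.

```python
import random

THETA = [[0.4, 0.7], [0.7, 0.9]]
TOTAL = sum(sum(row) for row in THETA)

def sample_skg_edge(d, rng):
    """One SKG descent: pick quadrant (i, j) with prob. THETA[i][j]/TOTAL."""
    row = col = 0
    for _ in range(d):
        u = rng.uniform(0.0, TOTAL)
        for i in (0, 1):
            for j in (0, 1):
                u -= THETA[i][j]
                if u <= 0.0:
                    row, col = 2 * row + i, 2 * col + j
                    break
            else:
                continue
            break
    return row, col

def quilt(partitions, configs, draws_per_block, rng=random):
    """Sample one SKG per pair of partitions; keep an edge only if both
    endpoints correspond to configurations present in those partitions."""
    d = len(configs[0])
    # Per partition: Kronecker index (configuration read as binary) -> node id.
    lookup = [{sum(b << (d - 1 - k) for k, b in enumerate(configs[v])): v
               for v in part} for part in partitions]
    edges = set()
    for senders in lookup:
        for receivers in lookup:
            for _ in range(draws_per_block):
                r, c = sample_skg_edge(d, rng)
                if r in senders and c in receivers:
                    edges.add((senders[r], receivers[c]))
    return edges

configs = [(0, 1), (1, 0), (1, 1), (0, 1)]   # node 3 duplicates node 0
partitions = [[0, 1, 2], [3]]
quilted = quilt(partitions, configs, draws_per_block=50)
```

With B partitions this runs B² SKG samplers, which is exactly where the analysis on the next slide picks up.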
Analysis The size of the partition, B, is critically important! In the last example B = 2, so we had to sample B² = 2² = 4 SKGs; if B = 3, then we have to sample 3² = 9 SKGs. If B = O(n), this is useless! When the density of the attribute matrix µ is 0.5, B = O(log₂ n). Since sampling each SKG takes O(log₂(n) |E|), the overall time complexity is O((log₂ n)³ |E|).
Analysis Actually, B = O(log₂ n) is not even a tight bound. [Plot: size of the partition B versus the number of nodes n (up to 10⁶); the observed B stays well below log₂(n).]
Further Considerations Existence of small sets: sampling a whole SKG for a small set would be wasteful. µ ≠ 0.5: when µ is close to 0 or 1, some attribute configurations are much more frequent than others. [Plot: number of occurrences of each attribute configuration (ranked), for µ = 0.5, 0.6, 0.7, 0.9.]
Setup We chose two parameter matrices from the literature (Kim and Leskovec 2010; Moreno and Neville 2009): Θ₁ = [0.15 0.7; 0.7 0.85] and Θ₂ = [0.35 0.52; 0.52 0.95]. Set the number of attributes d = log₂ n and µ = 0.5. Now increase n and observe the running time!
Total Running Time [Plot: total running time (ms) versus the number of nodes n (up to 8 × 10⁶), for Θ₁ and Θ₂; Quilting versus Naive.]
Running Time per Edge [Plot: running time per edge versus the number of nodes n (up to 8 × 10⁶), for Θ₁ and Θ₂; Quilting versus Naive.]
The Effect of Attribute Density µ [Plot: relative running time ρ(µ) versus attribute probability µ, for n = 2¹⁰, 2¹², 2¹⁴, 2¹⁶, 2¹⁸, for Θ₁ and Θ₂.]
Summary Question: How can we efficiently sample graphs from the Multiplicative Attribute Graphs model? We introduced the first sub-quadratic sampling algorithm for Multiplicative Attribute Graphs. Time complexity: O((log₂ n)³ |E|) under mild conditions, where n is the number of nodes and |E| is the number of edges. We exploit the close connection between Stochastic Kronecker Graphs (SKG) and Multiplicative Attribute Graphs (MAG). We can sample a graph with 8 million nodes and 20 billion edges in under 6 hours (the naïve algorithm would take 93 days).
Further Direction The analysis for µ ≠ 0.5 is only empirical. When the attribute matrix is sparse and high-dimensional, the method fails completely; similarity search algorithms such as Locality Sensitive Hashing (LSH) or Cover Trees may help. The characteristics of such MAG models would also be of interest.