GraphRNN: A Deep Generative Model for Graphs (24 Feb 2018)

Jiaxuan You, Rex Ying, Xiang Ren, William L. Hamilton, Jure Leskovec
Presented by: Jesse Bettencourt and Harris Chan
March 9, 2018, University of Toronto, Vector Institute

Introduction: Generative Model for Graphs
Modeling graphs is fundamental for studying networks, e.g. medical, chemical, and social networks.
Goal: model and efficiently sample complex distributions over graphs, learning the generative model from an observed set of graphs.

Challenges in Graph Generation
- Large and variable output spaces: a graph with n nodes requires up to n^2 values to fully specify its structure, and the number of nodes and edges varies between graphs.
- Non-unique representations: we want distributions over graphs without assuming a fixed set of nodes; an n-node graph can be represented by up to n! equivalent adjacency matrices, where π ∈ Π is an arbitrary node ordering.
- Complex, non-local dependencies: new edges depend on previously generated edges.

Overview of GraphRNN
Decompose graph generation into two RNNs:
- Graph-level: generates the sequence of nodes
- Edge-level: generates the sequence of edges for each new node

Modeling Graphs as Sequences
Graph G ~ p(G) with n nodes under node ordering π.
Define a mapping f_S from G to sequences:

S^π = f_S(G, π) = (S^π_1, ..., S^π_n)    (1)

Each sequence element is an adjacency vector S^π_i ∈ {0, 1}^{i-1}, i ∈ {1, ..., n}, giving the edges between node π(v_i) and the nodes π(v_j), j ∈ {1, ..., i-1}.
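To make the mapping concrete, here is a minimal Python sketch (assuming NetworkX and NumPy; the helper name graph_to_sequence is ours, not from the paper's code) of f_S applied to a small graph under an arbitrary ordering π.

```python
import networkx as nx
import numpy as np

def graph_to_sequence(G, ordering):
    """Map a graph G and a node ordering pi to the sequence
    S^pi = (S^pi_1, ..., S^pi_n), where S^pi_i is the length-(i-1)
    adjacency vector between node pi(v_i) and nodes pi(v_1), ..., pi(v_{i-1})."""
    sequence = []
    for i, v in enumerate(ordering):
        prev = ordering[:i]  # nodes that come before v in the ordering
        sequence.append(np.array([1 if G.has_edge(v, u) else 0 for u in prev],
                                 dtype=np.int8))
    return sequence

# Example: a triangle with one pendant node
G = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])
ordering = [0, 1, 2, 3]  # an arbitrary permutation pi
for i, s in enumerate(graph_to_sequence(G, ordering), start=1):
    print(f"S^pi_{i} = {s}")
# S^pi_1 = [], S^pi_2 = [1], S^pi_3 = [1 1], S^pi_4 = [0 0 1]
```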

Distribution on Graphs → Distribution on Sequences
Instead of learning p(G) directly, sample π ∈ Π to get observations of S^π, then learn p(S^π), modeled autoregressively:

p(G) = Σ_{S^π} p(S^π) · 1[f_G(S^π) = G]    (3)

Exploiting the sequential structure of S^π, decompose p(S^π):

p(S^π) = Π_{i=1}^{n+1} p(S^π_i | S^π_1, ..., S^π_{i-1}) = Π_{i=1}^{n+1} p(S^π_i | S^π_{<i})    (4)

Motivating GraphRNN
- Model p(G): distribution over graphs
- Model p(S^π): distribution over sequences of edge connections
- Model p(S^π_i | S^π_{<i}): distribution over the i-th node's edge connections, conditioned on the previous nodes' edge connections, parameterized with an expressive neural network

GraphRNN Framework
Idea: use an RNN that consists of a state-transition function and an output function:

h_i = f_trans(h_{i-1}, S^π_{i-1})    (5)
θ_i = f_out(h_i)    (6)

- h_i ∈ R^d encodes the state of the graph generated so far
- S^π_{i-1} encodes the adjacency vector of the most recently generated node, i-1
- θ_i specifies the distribution of the next node's adjacency vector, S^π_i ~ P_{θ_i}
- f_trans and f_out can be arbitrary neural networks
- P_{θ_i} can be an arbitrary distribution over binary vectors

GraphRNN Framework (Corrected)
Idea: use an RNN that consists of a state-transition function and an output function:

h_i = f_trans(h_{i-1}, S^π_i)    (5)
θ_{i+1} = f_out(h_i)    (6)

- h_i ∈ R^d encodes the state of the graph generated so far
- S^π_i encodes the adjacency vector of the most recently generated node, i
- θ_{i+1} specifies the distribution of the next node's adjacency vector, S^π_{i+1} ~ P_{θ_{i+1}}
- f_trans and f_out can be arbitrary neural networks
- P_{θ_{i+1}} can be an arbitrary distribution over binary vectors

GraphRNN Inference Algorithm
Algorithm 1: GraphRNN inference algorithm
Input: RNN-based transition module f_trans, output module f_out, probability distribution P_{θ_i} parameterized by θ_i, start token SOS, end token EOS, empty graph state h
Output: graph sequence S^π
S^π_0 = SOS, h_0 = h, i = 0
repeat
  i = i + 1
  h_i = f_trans(h_{i-1}, S^π_{i-1})    {update graph state}
  θ_i = f_out(h_i)
  S^π_i ~ P_{θ_i}    {sample node i's edge connections}
until S^π_i is EOS
Return S^π = (S^π_1, ..., S^π_i)

GraphRNN Inference Algorithm (Corrected)
Algorithm 1: GraphRNN inference algorithm, with corrected indices
Input: RNN-based transition module f_trans, output module f_out, probability distribution P_{θ_i} parameterized by θ_i, start token SOS, end token EOS, empty graph state h
Output: graph sequence S^π
S^π_1 = SOS, h_0 = h, i = 0
repeat
  i = i + 1
  h_i = f_trans(h_{i-1}, S^π_i)    {update graph state}
  θ_{i+1} = f_out(h_i)
  S^π_{i+1} ~ P_{θ_{i+1}}    {sample node i+1's edge connections}
until S^π_{i+1} is EOS
Return S^π = (S^π_1, ..., S^π_i)
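A minimal PyTorch sketch of this inference loop, using a GRU cell for f_trans and an MLP with sigmoid outputs for f_out (i.e., the simpler GraphRNN-S output head described on the next slide). The class name, the all-ones SOS token, and the all-zero EOS test are our simplifying assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class GraphRNNS(nn.Module):
    """Graph-level GRU (f_trans) plus an MLP output head (f_out) that
    parameterizes a multivariate Bernoulli over the next adjacency vector."""

    def __init__(self, max_prev_nodes, hidden_dim=64):
        super().__init__()
        self.M = max_prev_nodes                       # fixed vector size (BFS window)
        self.f_trans = nn.GRUCell(max_prev_nodes, hidden_dim)
        self.f_out = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, max_prev_nodes), nn.Sigmoid())

    @torch.no_grad()
    def sample(self, max_nodes=50):
        h = torch.zeros(1, self.f_trans.hidden_size)  # empty graph state h_0
        s = torch.ones(1, self.M)                     # SOS token (all ones, by assumption)
        sequence = []
        for _ in range(max_nodes):
            h = self.f_trans(s, h)                    # h_i = f_trans(h_{i-1}, S^pi_i)
            theta = self.f_out(h)                     # theta_{i+1} = f_out(h_i)
            s = torch.bernoulli(theta)                # S^pi_{i+1} ~ P_theta (independent Bernoulli)
            if s.sum() == 0:                          # treat an all-zero vector as EOS
                break
            sequence.append(s.squeeze(0))
        return sequence

model = GraphRNNS(max_prev_nodes=10)
print([s.tolist() for s in model.sample()])           # random output from an untrained model
```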

GraphRNN Variants
Objective: maximize p_model(S^π) over all observed graph sequences.
Implement f_trans as a Gated Recurrent Unit (GRU), but make different assumptions about p(S^π_i | S^π_{<i}) for each variant:
1. Multivariate Bernoulli (GraphRNN-S):
- f_out is an MLP with sigmoid activation that outputs θ_{i+1} ∈ R^i
- θ_{i+1} parameterizes the multivariate Bernoulli
- S^π_{i+1} ~ P_{θ_{i+1}}, with each entry sampled independently

GraphRNN Variants (continued)
2. Dependent Bernoulli sequence (GraphRNN):

p(S^π_i | S^π_{<i}) = Π_{j=1}^{i-1} p(S^π_{i,j} | S^π_{i,<j}, S^π_{<i})    (7)

- S^π_{i,j} ∈ {0, 1} indicates whether node π(v_i) is connected to node π(v_j)
- f_out is an edge-level RNN that generates the edges of a given node one at a time (sketched below)
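A hedged sketch of how the edge-level RNN might generate one node's adjacency vector bit by bit, conditioned on the graph-level state h_i; the module names and the way h_i seeds the edge-level GRU are our assumptions rather than the reference implementation.

```python
import torch
import torch.nn as nn

class EdgeLevelRNN(nn.Module):
    """Edge-level GRU: given the graph-level state h_i, emits the entries of
    S^pi_i one at a time, each conditioned on the previously sampled entries."""

    def __init__(self, graph_hidden_dim, edge_hidden_dim=16):
        super().__init__()
        self.init = nn.Linear(graph_hidden_dim, edge_hidden_dim)  # seed edge state from h_i
        self.cell = nn.GRUCell(1, edge_hidden_dim)                # input: previous edge bit
        self.out = nn.Sequential(nn.Linear(edge_hidden_dim, 1), nn.Sigmoid())

    @torch.no_grad()
    def sample(self, h_graph, num_prev_nodes):
        h = self.init(h_graph)                 # edge-level state initialized from graph state
        x = torch.ones(h_graph.size(0), 1)     # SOS for the edge sequence (assumption)
        bits = []
        for _ in range(num_prev_nodes):
            h = self.cell(x, h)
            p = self.out(h)                    # p(S^pi_{i,j} | S^pi_{i,<j}, S^pi_{<i})
            x = torch.bernoulli(p)             # sample whether this edge exists
            bits.append(x)
        return torch.cat(bits, dim=1)          # the adjacency vector S^pi_i

edge_rnn = EdgeLevelRNN(graph_hidden_dim=64)
h_i = torch.zeros(1, 64)                       # stand-in for the graph-level state
print(edge_rnn.sample(h_i, num_prev_nodes=5))  # e.g. tensor([[0., 1., 0., 0., 1.]])
```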

Tractability via Breadth-First Search (BFS)
Idea: apply a BFS ordering to the graph G with node permutation π before generating the sequence S^π.
Benefits:
- Reduces the overall number of sequences to consider: we only need to train on all possible BFS orderings, rather than all possible node permutations
- Reduces the number of edge predictions: the edge-level RNN only predicts at most M edges, where M is the maximum size of the BFS queue

BFS Order Leads to a Fixed-Size S^π_i
S^π_i ∈ R^M represents a sliding window over the nodes in the BFS queue.
Zero-pad all S^π_i to be length-M vectors:

S^π_i = (A^π_{max(1, i-M), i}, ..., A^π_{i-1, i})^T,   i ∈ {2, ..., n}    (9)
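A minimal sketch (NetworkX + NumPy, helper names ours) of a BFS node ordering and the zero-padded length-M sliding-window adjacency vectors of Eq. (9).

```python
import networkx as nx
import numpy as np

def bfs_ordering(G, start):
    """Node ordering pi given by a breadth-first search from `start`."""
    return [start] + [v for _, v in nx.bfs_edges(G, start)]

def to_fixed_size_sequence(G, ordering, M):
    """S^pi_i as a length-M vector: adjacency to the previous min(i-1, M) nodes
    in BFS order, zero-padded to length M (padding side is a convention choice)."""
    rows = []
    for i in range(1, len(ordering)):
        window = ordering[max(0, i - M):i]             # sliding window over the BFS queue
        bits = [1 if G.has_edge(ordering[i], u) else 0 for u in window]
        rows.append(np.pad(bits, (M - len(bits), 0)))  # zero-pad to length M
    return np.stack(rows)

G = nx.grid_2d_graph(3, 3)                # 3x3 grid; BFS keeps new edges close in the ordering
seq = to_fixed_size_sequence(G, bfs_ordering(G, (0, 0)), M=4)
print(seq.shape)                          # (8, 4): n-1 vectors, each of length M
```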

Experiments

Datasets
3 synthetic and 2 real graph datasets:

Dataset     Type       # Graphs   Graph size         Description
Community   Synthetic  500        60 ≤ |V| ≤ 160     Two communities, Erdős-Rényi model (E-R)
Grid        Synthetic  100        100 ≤ |V| ≤ 400    Standard 2D grid
B-A         Synthetic  500        100 ≤ |V| ≤ 200    Barabási-Albert model, each new node connects to 4 existing nodes
Protein     Real       918        100 ≤ |V| ≤ 500    Amino acid nodes, edge if less than 6 Angstroms apart
Ego         Real       757        50 ≤ |V| ≤ 399     Document nodes, edges are citation relationships, from Citeseer
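For reference, a hedged sketch of how synthetic graphs in these three families can be generated with NetworkX; the parameter values shown are illustrative and may differ from those used in the paper.

```python
import random
import networkx as nx

def community_graph(n=100, p_intra=0.3, p_inter=0.01):
    """Two-community graph: two Erdos-Renyi blocks joined by sparse random
    inter-community edges (parameter values here are illustrative)."""
    G = nx.disjoint_union(nx.erdos_renyi_graph(n // 2, p_intra),
                          nx.erdos_renyi_graph(n - n // 2, p_intra))
    for u in range(n // 2):
        for v in range(n // 2, n):
            if random.random() < p_inter:
                G.add_edge(u, v)
    return G

grid = nx.grid_2d_graph(10, 10)                    # standard 2D grid
ba = nx.barabasi_albert_graph(150, 4)              # B-A: each new node attaches to 4 existing nodes
print(len(community_graph()), len(grid), len(ba))  # number of nodes in each sampled graph
```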

Baseline Methods & Settings
Compared GraphRNN to traditional models and deep learning baselines:

Method type    Algorithm
Traditional    Erdős-Rényi model (E-R) (Erdős & Rényi, 1959)
               Barabási-Albert model (B-A) (Albert & Barabási, 2002)
               Kronecker graph models (Leskovec et al., 2010)
               Mixed-membership stochastic block models (MMSB) (Airoldi et al., 2008)
Deep learning  GraphVAE (Simonovsky & Komodakis, 2018)
               DeepGMG (Li et al., 2018)

- 80%-20% train-test split; all models trained with early stopping.
- Traditional methods learn from a single graph, so a separate model is trained for each training graph in order to compare with them.
- Deep learning baselines are evaluated on smaller datasets: Community-small (12 ≤ |V| ≤ 20) and Ego-small (4 ≤ |V| ≤ 18).

Evaluating Generated Graphs via an MMD Metric
Existing approaches: visual inspection, or simple comparisons of average statistics between the two sets of graphs.
Proposed: a metric based on Maximum Mean Discrepancy (MMD) that compares all moments of the empirical distributions of graph statistics, using an exponential kernel with the Wasserstein distance.
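A minimal sketch of such a metric: squared MMD between two sets of graph-statistic histograms (e.g. degree histograms), with an exponential kernel over the first Wasserstein distance. The kernel bandwidth, the biased estimator, and the use of SciPy's wasserstein_distance are our choices for illustration, not the paper's exact setup.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def kernel(x, y, sigma=1.0):
    """Exponential kernel over the first Wasserstein distance between two
    histograms (support = bin indices, weights = bin masses)."""
    d = wasserstein_distance(np.arange(len(x)), np.arange(len(y)), x, y)
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def mmd_squared(samples_a, samples_b, sigma=1.0):
    """(Biased) squared MMD estimate between two sets of histograms."""
    k = lambda A, B: np.mean([kernel(x, y, sigma) for x in A for y in B])
    return k(samples_a, samples_a) + k(samples_b, samples_b) - 2 * k(samples_a, samples_b)

# Toy usage: normalized degree histograms from two sets of graphs.
set_a = [np.array([0.1, 0.6, 0.3]), np.array([0.2, 0.5, 0.3])]
set_b = [np.array([0.5, 0.4, 0.1]), np.array([0.6, 0.3, 0.1])]
print(mmd_squared(set_a, set_b))
```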

Graph Visualization
Figure 2: Visualization of graphs from the grid dataset (left group), community dataset (middle group), and ego dataset (right group). Within each group: graphs from the training set (first row), graphs generated by GraphRNN (second row), and graphs generated by the Kronecker, MMSB, and B-A baselines respectively (third row). Different visualization layouts are used for different datasets.

Comparison with Traditional Models
Table 1: Comparison of GraphRNN to traditional graph generative models using MMD. (max(|V|), max(|E|)) of each dataset is shown.

            Community (160, 1945)    Ego (399, 1071)          Grid (361, 684)          Protein (500, 1575)
            Deg.   Clus.  Orbit      Deg.   Clus.  Orbit      Deg.   Clus.  Orbit      Deg.   Clus.  Orbit
E-R         0.021  1.243  0.049      0.508  1.288  0.232      1.011  0.018  0.900      0.145  1.779  1.135
B-A         0.268  0.322  0.047      0.275  0.973  0.095      1.860  0      0.720      1.401  1.706  0.920
Kronecker   0.259  1.685  0.069      0.108  0.975  0.052      1.074  0.008  0.080      0.084  0.441  0.288
MMSB        0.166  1.590  0.054      0.304  0.245  0.048      1.881  0.131  1.239      0.236  0.495  0.775
GraphRNN-S  0.055  0.016  0.041      0.090  0.006  0.043      0.029  10^-5  0.011      0.057  0.102  0.037
GraphRNN    0.014  0.002  0.039      0.077  0.316  0.030      10^-5  0      10^-4      0.034  0.935  0.217

GraphRNN achieved an 80% decrease in MMD on average compared with the traditional baselines.
GraphRNN-S performed well on Protein, which may not involve highly complex edge dependencies.

Comparison with Deep Learning Models & Generalization
Table 2: GraphRNN compared to state-of-the-art deep graph generative models on small graph datasets, using MMD and negative log-likelihood (NLL). (max(|V|), max(|E|)) of each dataset is shown. (GraphVAE and DeepGMG cannot scale to the graphs in Table 1.)

            Community-small (20, 83)                         Ego-small (18, 69)
            Degree  Clustering  Orbit   Train NLL  Test NLL  Degree  Clustering  Orbit   Train NLL  Test NLL
GraphVAE    0.35    0.98        0.54    13.55      25.48     0.13    0.17        0.05    12.45      14.28
DeepGMG     0.22    0.95        0.40    106.09     112.19    0.04    0.10        0.02    21.17      22.40
GraphRNN-S  0.02    0.15        0.01    31.24      35.94     0.002   0.05        0.0009  8.51       9.88
GraphRNN    0.03    0.03        0.01    28.95      35.10     0.0003  0.05        0.0009  9.05       10.61

GraphRNN achieved a 90% decrease in MMD on average compared with the deep learning baselines, and a 22% smaller average NLL gap compared to the other deep models.

Experiments: Evaluation with Graph Statistics
Figure 3: Average degree (left) and clustering coefficient (right) distributions of graphs from the test set and graphs generated by GraphRNN and the baseline models. The average statistics of GraphRNN-generated graphs closely match the overall test set distribution.

Experiments: Robustness
Interpolate between B-A and E-R graphs by randomly perturbing [0%, 20%, ..., 100%] of the edges of B-A graphs: 0% noise gives pure B-A, 100% noise is essentially E-R.
Figure 4: MMD performance of different approaches on degree (left) and clustering coefficient (right) under different noise levels. GraphRNN maintains strong performance as we interpolate between these structures, indicating high robustness and versatility.
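One plausible reading of this perturbation, sketched below: for noise level p, each edge of a B-A graph is removed with probability p and replaced by a uniformly random non-edge, so 0% leaves the B-A graph intact and 100% yields an essentially random graph with the same number of edges. The exact perturbation scheme used in the paper may differ.

```python
import random
import networkx as nx

def perturb_edges(G, p):
    """Rewire each edge with probability p: remove it and add a uniformly
    random edge between a currently non-adjacent pair (one plausible scheme)."""
    H = G.copy()
    nodes = list(H.nodes())
    for u, v in list(H.edges()):
        if random.random() < p:
            H.remove_edge(u, v)
            while True:
                a, b = random.sample(nodes, 2)
                if not H.has_edge(a, b):
                    H.add_edge(a, b)
                    break
    return H

ba = nx.barabasi_albert_graph(150, 4)
for p in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]:
    H = perturb_edges(ba, p)
    print(p, nx.density(H))  # density stays fixed; structure moves from B-A toward E-R
```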