A New Space for Comparing Graphs

Anshumali Shrivastava and Ping Li, Cornell University and Rutgers University. ASONAM 2014, August 18th, 2014.

Main Contribution. A succinct and informative covariance matrix for graph structure. Gains: It can be computed in O(E) time, hence it is scalable. Dealing with a covariance matrix is easier than dealing with a graph. Different graph structures can be compared directly by simply comparing the corresponding covariance matrices. Machine learning can be applied directly to the matrices. No need to worry about a graph and its combinatorial isomorphic variants.

Networks: A New Source of Information. The connectivity (the presence or absence of edges) in various networks carries an altogether new set of valuable information. The local connectivity structure of an individual (his/her ego network) can be used to infer many peculiarities about him/her. Analyzing this network connectivity structure can lead to many interesting and useful applications.

Identify Users Across Networks. Get the ego networks and try to match them across networks?

New Classifications Based on Ego Networks. The collaboration pattern of a field gets reflected in the ego network of an individual. The High Energy Physics (HEnP) collaboration network is very dense: dependence on specialized labs leads to more collaboration. Can we classify a researcher purely on the basis of his/her collaboration ego network? YES!! (This work.) Gains: personalized recommendations.

Chemical Compound/Activity Classification? Yes!! (Available in a separate tech report: http://arxiv.org/abs/1404.5214)

Many More Applications... Synonym extraction using word graphs. Structure matching across databases. Structured text translation. Protein alignment.

The Underlying Fundamental Problem. What is the right common space (with a well-defined metric) for graph structure? Challenges: Varying sizes. Node correspondence is usually not available. The same graph object exhibits many isomorphic forms. Succinct summarization of graphs is a wide-open research direction.

Graph Representation: Permutation Invariance. A permutation-invariant property is one that does not change when the nodes are renumbered. Example: the same 5-node graph under two different node orderings (nodes listed as 2, 1, 5, 3, 4 versus 4, 2, 1, 3, 5) gives two different adjacency matrices:

    0 1 0 1 1        0 1 1 0 0
    1 0 1 0 0        1 0 0 1 1
    0 1 0 1 1        1 0 0 1 1
    1 0 1 0 0        0 1 1 0 0
    1 0 1 0 0        0 1 1 0 0

Adjacency matrices are not permutation invariant, so they are not directly comparable.
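As a concrete illustration (a minimal NumPy sketch, not from the slides; the permutation chosen here is arbitrary), relabeling the nodes changes the adjacency matrix even though the graph itself is unchanged:

    import numpy as np

    # Adjacency matrix of the 5-node example graph above (first ordering).
    A = np.array([[0, 1, 0, 1, 1],
                  [1, 0, 1, 0, 0],
                  [0, 1, 0, 1, 1],
                  [1, 0, 1, 0, 0],
                  [1, 0, 1, 0, 0]])

    # Renumber the nodes: the relabeled graph has adjacency matrix P A P^T.
    perm = np.array([3, 1, 0, 2, 4])       # an arbitrary renumbering of the nodes
    P = np.eye(5, dtype=int)[perm]
    B = P @ A @ P.T

    print(np.array_equal(A, B))            # False: same graph, different matrix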

Hardness. Graph isomorphism is a special case of graph comparison (checking equality). Graph isomorphism is a hard problem: whether it lies in P or is NP-complete is still open. It is hopeless to have an efficient exact embedding for all possible graphs.

Good News: Real-World Graphs are Special. They have a very specific spectrum. They exhibit triadic closure and local clustering. The degree distribution follows a power law. There are lots of hubs. We can hope to capture all of these in a succinct representation.

The Right Object for Studying Graphs. It should be permutation invariant. It should be sensitive to variations in the spectral properties. It should be sensitive to the distributions of different substructures (subgraphs). The last two are related, so it is not clear what the right balance is. It should also be efficient to compute!!

Existing Approach 1. A normalized feature vector of various known graph invariants: top eigenvalues of the adjacency matrix, clustering coefficient, mean and variance of degrees, number of edges, etc.

Problems. Graphs of different sizes have different numbers of eigenvalues. It is not really clear whether their values are directly comparable. How many graph invariants are enough? What relative importance should be given to different invariants? (Some characteristics might dominate others.)

Existing Approach 2. A histogram based on the frequency of small graphs (graphlets) contained in the given graph. Usually the frequency of small graphs of size 3-4 is used. The histogram can be efficiently and accurately estimated by sampling.

Problems. Small graphs do not always capture enough structural information; we need frequencies of larger graphs for richer information. Counting graphs of size 5 is very costly. Every sampled subgraph must be matched with one of its many isomorphic variants, so every sampling step encodes the graph isomorphism problem (costly for large substructures). Computation time increases exponentially with the size of the subgraphs, even with sampling.

Alternative View: Power Iteration of the Adjacency Matrix. Power iteration is a cheap and effective way of summarizing any matrix. View the adjacency matrix A as a dynamical operator (a function) acting on vectors. If two operators A and B are similar, then the vectors {Ax, A^2 x, ..., A^k x} should be similar to {Bx, B^2 x, ..., B^k x}. The subspace {Ax, A^2 x, ..., A^k x} is a well-studied object known as the k-th order Krylov subspace, and Krylov subspace methods are among the fastest known linear algebraic algorithms for sparse matrices. Problem: Krylov subspaces are not permutation invariant in general.

Summarization with Power Iteration on the Unit Vector. Start with e, the vector of all 1's. Given the adjacency matrix A, the generated subspace is {Ae, A^2 e, A^3 e, ...}. For a 5-node graph, stack these iterates as the columns of a matrix M_A: the (i, t) entry of M_A is (A^t e)_i for i, t = 1, ..., 5, where (A^t e)_i is the i-th component of the vector A^t e. Truncated power iterations are very informative, and power iteration on the unit vector is a key ingredient in many web algorithms, including the famous HITS.
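A minimal NumPy sketch (an assumed implementation, not code from the talk) of building M_A; here the columns are the raw iterates A^t e, before the normalization introduced two slides later:

    import numpy as np

    def power_iteration_matrix(A, k=5):
        """Stack the iterates Ae, A^2 e, ..., A^k e as the columns of M_A."""
        n = A.shape[0]
        x = np.ones(n)                    # e, the all-ones vector
        cols = []
        for _ in range(k):
            x = A @ x                     # next power iterate
            cols.append(x.copy())
        return np.column_stack(cols)      # n x k matrix: row i describes node i

Renumbering the nodes of the graph only shuffles the rows of this matrix, which is the observation exploited on the next slide.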

Observation: Power Iteration on the Unit Vector is Special. If A and B are adjacency matrices of the same graph under a reordering of the nodes, then the rows of M_B are simply a row-shuffled version of the rows of M_A; the two matrices contain the same rows, just in a different order.

Graphs as Bags of Vectors. We can associate with a graph of n nodes the set of n vectors given by the rows of M_A. Reordering the nodes does not change this set; it simply permutes its elements. The dimension k of these vectors (the number of columns) is the number of power iterations performed. We are therefore looking for an object that richly describes a set of vectors.

What Describes a Set (Bag) of Vectors? A set of vectors is easier to deal with than a graph, and the cardinality n of the set can vary. Two candidate objects: (1) the subspace spanned by the n vectors (a bad choice, since n >> k and the span is typically all of R^k); (2) the most likely probability distribution generating these vectors (fit the most likely Gaussian). The key component of the maximum-likelihood multivariate Gaussian over n samples: the covariance matrix.

Our Proposal: The Covariance Matrix Representation. We propose C_A ∈ R^{k×k}, the covariance matrix of M_A, as the representation of the graph with adjacency matrix A.

    Input: adjacency matrix A ∈ R^{n×n}; k, the number of power iterations.
    Initialize x_0 = e ∈ R^{n×1}.
    for t = 1 to k do
        M_{(:,t)} = n · A x_{t-1} / ||A x_{t-1}||_1
        x_t = M_{(:,t)}
    end for
    μ = e ∈ R^{k×1}   (each column of M sums to n, so every column mean is 1)
    C_A = (1/n) Σ_{i=1}^{n} (M_{(i,:)} − μ)(M_{(i,:)} − μ)^T
    return C_A ∈ R^{k×k}

What's nice? For a fixed k, all graphs (irrespective of size) are represented in a common space: the R^{k×k} p.s.d. covariance matrices.
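A possible NumPy rendering of this procedure (a sketch, not the authors' code); it uses the L1 normalization from the pseudocode, which makes every column of M sum to n and hence makes μ the all-ones vector:

    import numpy as np

    def graph_covariance(A, k=5):
        """Covariance-matrix representation C_A (k x k) of the graph with adjacency matrix A."""
        n = A.shape[0]
        x = np.ones(n)                        # x_0 = e
        M = np.empty((n, k))
        for t in range(k):
            y = A @ x
            M[:, t] = n * y / y.sum()         # column t of M, normalized to sum to n
            x = M[:, t]
        D = M - 1.0                           # subtract mu = e (all column means are 1)
        return (D.T @ D) / n                  # C_A = (1/n) sum_i (M_i - mu)(M_i - mu)^T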

Property 1. Theorem: C_A is a graph invariant. Proof idea: the covariance matrix is independent of the ordering of the rows of M_A. Implications: C_A can be used as a representation of the graph, and the covariance matrix is a well-studied object that is easier to handle than a graph.

Property 2. Theorem:

C^A_{i,j} = \frac{n \sum_{t=1}^{n} \lambda_t^{i+j} s_t^2}{\left(\sum_{t=1}^{n} \lambda_t^{i} s_t^2\right)\left(\sum_{t=1}^{n} \lambda_t^{j} s_t^2\right)} - 1,

where \lambda_t is the t-th eigenvalue of A and s_t is the component-wise sum of the t-th eigenvector. Proof idea: the mean of the vector A^i e can be written as [e^T A^i e]/n. Some algebra gives C^A_{i,j} = n [e^T A^{i+j} e] / ([e^T A^i e][e^T A^j e]) - 1; representing e in the eigenbasis of A completes the proof. Implications: C_A encodes the spectrum; each component of C_A is a (weighted and exponentiated) combination of all the \lambda_t's and s_t's.
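A quick numerical check of this identity on a random graph (an illustrative sketch under the same normalization as the procedure above, not part of the original slides):

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 8, 4
    A = np.triu(rng.integers(0, 2, (n, n)), 1)    # random symmetric 0/1 matrix,
    A = A + A.T                                   # no self-loops

    # C_A from the power-iteration procedure of the previous slide.
    x = np.ones(n)
    M = np.empty((n, k))
    for t in range(k):
        x = A @ x
        M[:, t] = n * x / x.sum()
        x = M[:, t]
    C = (M - 1.0).T @ (M - 1.0) / n

    # The same entries from the spectrum: lambda_t and s_t = sum of the t-th eigenvector.
    lam, V = np.linalg.eigh(A)
    s = V.T @ np.ones(n)
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            num = n * np.sum(lam ** (i + j) * s ** 2)
            den = np.sum(lam ** i * s ** 2) * np.sum(lam ** j * s ** 2)
            assert np.isclose(C[i - 1, j - 1], num / den - 1)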

Property 3. Theorem: given the adjacency matrix A of an undirected graph with n nodes and m edges,

C^A_{1,2} = \frac{n}{2m} \cdot \frac{3\Delta + P_3 + n\,\mathrm{Var}(\mathrm{deg}) + m\left(\frac{4m}{n} - 1\right)}{P_2 + m} - 1,

where \Delta denotes the total number of triangles, P_3 is the total number of distinct simple paths of length 3, P_2 is the total number of distinct simple paths of length 2, and Var(deg) = \frac{1}{n}\sum_{i=1}^{n} \deg(i)^2 - \left(\frac{1}{n}\sum_{i=1}^{n} \deg(i)\right)^2 is the variance of the degrees. Proof idea: we get C^A_{1,2} = n [e^T A^3 e] / ([e^T A^1 e][e^T A^2 e]) - 1; terms of the form e^T A^i e are sums of the counts of paths of length i, although with a lot of repetition, and careful counting leads to the above expression. Implication: the components of C_A are sensitive to the counts of substructures.

How Many Iterations? k is the number of power iterations, i.e., the number of columns of M_A. Power iteration converges to the dominant eigenvector geometrically, and near convergence the new columns are uninformative. We therefore only need very few iterations, e.g., 4 or 5.

Similarity Between Graphs. We compare the corresponding covariance matrices, using the standard Bhattacharyya similarity between C_A ∈ R^{k×k} and C_B ∈ R^{k×k}:

Sim(C_A, C_B) = \exp(-Dist(C_A, C_B)),
Dist(C_A, C_B) = \frac{1}{2} \log \frac{\det(\Sigma)}{\sqrt{\det(C_A)\det(C_B)}},   \Sigma = \frac{C_A + C_B}{2}.

Sim(C_A, C_B) is positive semi-definite (hence a valid kernel). Note: a covariance matrix has special properties (e.g., it is symmetric), and the similarity measure should respect that structure.
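A small NumPy sketch of this similarity (an assumed implementation using log-determinants; the small ridge added to guard against near-singular covariance matrices is our assumption, not something stated in the talk):

    import numpy as np

    def bhattacharyya_similarity(CA, CB, ridge=1e-10):
        """Bhattacharyya similarity between two k x k covariance matrices."""
        k = CA.shape[0]
        CA = CA + ridge * np.eye(k)               # guard against singular matrices
        CB = CB + ridge * np.eye(k)
        sigma = 0.5 * (CA + CB)
        dist = 0.5 * (np.linalg.slogdet(sigma)[1]
                      - 0.5 * (np.linalg.slogdet(CA)[1] + np.linalg.slogdet(CB)[1]))
        return np.exp(-dist)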

Computation Time. Given a choice of k, computing the set of vectors {Ae, A^2 e, ..., A^k e} recursively is O(E k) (A is sparse!!). Computing the covariance matrix C_A is O(n k^2). Computing a similarity is O(k^3). Usually we need a very small k, like 4 or 5, so the overall complexity is O(E).

Evaluation Tasks: How Good is this Representation? We test the effectiveness on two graph classification tasks: (1) classifying a researcher's subfield based on his/her ego network structure; (2) discriminating random Erdos-Renyi graphs from real graphs. A good representation (or similarity measure) should have better discriminative power.

Task 1: Classify a Researcher's Subfield. We take three publicly available collaboration network datasets: (1) High Energy Physics (HEnP), (2) Condensed Matter Physics (CM), (3) Astro Physics (ASTRO). We sample ego networks from them to generate a dataset of graphs. Given a researcher's ego collaboration network, determine whether he/she belongs to HEnP, CM, or ASTRO.

Task 2: Classify Random vs. Social. Discriminate random Erdos-Renyi graphs from Twitter ego networks. For every Twitter ego network, an Erdos-Renyi graph is generated with the same number of nodes and edges. A good similarity measure should be able to discriminate between graphs following different distributions.
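For illustration, such a size-matched null model is a one-liner with NetworkX (a sketch assuming the G(n, m) Erdos-Renyi model; the ego network used here is just a stand-in):

    import networkx as nx

    ego = nx.karate_club_graph()                        # stand-in for a Twitter ego network
    n, m = ego.number_of_nodes(), ego.number_of_edges()
    random_twin = nx.gnm_random_graph(n, m, seed=42)    # random graph with the same n and m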

Data Statistics. Table: Graph statistics of the ego-networks used in the experiments.

    STATS                          High Energy   Condensed Matter   Astro Physics   Twitter    Random
    Number of Graphs               1000          415                1000            973        973
    Mean Number of Nodes           131.95        73.87              87.40           137.57     137.57
    Mean Number of Edges           8644.53       410.20             1305.00         1709.20    1709.20
    Mean Clustering Coefficient    0.95          0.86               0.85            0.55       0.18

Competing Methodologies. The proposed similarity based on the covariance matrix; we report results for k = 4, 5, 6, with no tuning. Subgraph frequency histograms with subgraphs of size 3, 4, and 5 (going beyond 5 is far too costly). Random walk kernels. Feature vectors of eigenvalues (top-5 and top-10).

Experimental Details. We run kernel SVMs on the similarity values computed from the competing representations. We generate 10 partitions; 9 are used for training and cross-validation of the SVM parameter C, and the 10th for testing. Each experiment is repeated 10 times, randomizing over partitions. We report classification accuracy and the time required to compute the similarities.
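A sketch of how such an experiment could be run with scikit-learn (illustrative, not the authors' pipeline); K is assumed to be the precomputed graph-similarity kernel matrix and y the class labels, and for brevity the cross-validation over the SVM parameter C is left out:

    import numpy as np
    from sklearn.model_selection import StratifiedKFold
    from sklearn.svm import SVC

    def kernel_svm_cv_accuracy(K, y, C=1.0, n_splits=10, seed=0):
        """Mean 10-fold accuracy of an SVM on a precomputed graph-similarity kernel K."""
        y = np.asarray(y)
        folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
        accs = []
        for train, test in folds.split(K, y):
            svm = SVC(kernel="precomputed", C=C)
            svm.fit(K[np.ix_(train, train)], y[train])               # train-vs-train block
            accs.append(svm.score(K[np.ix_(test, train)], y[test]))  # test-vs-train block
        return float(np.mean(accs)), float(np.std(accs))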

Classification Accuracy.

    Methods     COLLAB (HEnP vs CM)   COLLAB (HEnP vs ASTRO)   COLLAB (ASTRO vs CM)   COLLAB (Full)   SOCIAL (Twitter vs Random)
    Our (k=4)   98.06 (0.05)          87.70 (0.13)             89.29 (0.18)           82.94 (0.16)    99.18 (0.03)
    Our (k=5)   98.22 (0.06)          87.47 (0.04)             89.26 (0.17)           83.56 (0.12)    99.43 (0.02)
    Our (k=6)   97.51 (0.04)          82.07 (0.06)             89.65 (0.09)           82.87 (0.11)    99.48 (0.03)
    FREQ-5      96.97 (0.04)          85.61 (0.1)              88.04 (0.14)           81.50 (0.08)    99.42 (0.03)
    FREQ-4      97.16 (0.05)          82.78 (0.06)             86.93 (0.12)           78.55 (0.08)    98.30 (0.08)
    FREQ-3      96.38 (0.03)          80.35 (0.06)             82.98 (0.12)           73.42 (0.13)    89.70 (0.04)
    RW          96.12 (0.07)          80.43 (0.14)             85.68 (0.03)           75.64 (0.09)    90.23 (0.06)
    EIGS-5      94.85 (0.18)          77.69 (0.24)             83.16 (0.47)           72.02 (0.25)    90.74 (0.22)
    EIGS-10     96.92 (0.21)          78.15 (0.17)             84.60 (0.27)           72.93 (0.19)    92.71 (0.15)

Running Time Comparisons. Table: Time (in seconds) required for computing all pairwise similarities on the two datasets.

                              SOCIAL      COLLAB (Full)
    Total Number of Graphs    1946        2415
    Our (k=4)                 177.20      260.56
    Our (k=5)                 200.28      276.77
    Our (k=6)                 207.20      286.87
    FREQ-5 (1000 Samp)        5678.67     7433.41
    FREQ-4 (1000 Samp)        193.39      265.77
    FREQ-3 (All)              115.58      369.83
    RW                        19669.24    25195.54
    EIGS-5                    36.84       26.03
    EIGS-10                   41.15       29.46

Lessons. Simply computing many graph invariants does not give the right representation: (1) there is the issue of relative importance; (2) we never know how many invariants are enough; (3) one graph usually has more invariants (e.g., eigenvalues) than another. The histogram of subgraphs is a good representation but becomes very costly once subgraphs of size 5 are counted. Power iteration is a very cheap way of summarizing graphs that captures information about various substructures. Finding the right representation is the key to machine learning with graphs.

Thanks!!