arxiv: v2 [cs.dc] 2 Apr 2016


Subgraph Counting: Color Coding Beyond Trees

Venkatesan T. Chakaravarthy¹, Michael Kapralov², Prakash Murali¹, Fabrizio Petrini³, Xinyu Que³, Yogish Sabharwal¹, and Baruch Schieber³

¹,³ IBM Research: ¹ {vechakra, prakmura, ysabharwal}@in.ibm.com, ³ {fpetrin, xque, sbar}@us.ibm.com
² EPFL: michael.kapralov@epfl.ch

August 9, 2018

Abstract

The problem of counting occurrences of query graphs in a large data graph, known as subgraph counting, is fundamental to several domains such as genomics and social network analysis. Many important special cases (e.g. triangle counting) have received significant attention. Color coding is a very general and powerful algorithmic technique for subgraph counting. Color coding has been shown to be effective in several applications, but scalable implementations are only known for the special case of tree queries (i.e. queries of treewidth one). In this paper we present the first efficient distributed implementation for color coding that goes beyond tree queries: our algorithm applies to any query graph of treewidth 2. Since tree queries can be solved in time linear in the size of the data graph, our contribution is the first step into the realm of color coding for queries that require superlinear running time in the worst case. This superlinear complexity leads to significant load balancing problems on graphs with heavy tailed degree distributions. Our algorithm structures the computation to work around high degree nodes in the data graph, and as a result achieves very good runtime and scalability on a diverse collection of data and query graph pairs. We also provide theoretical analysis of our algorithmic techniques, showing asymptotic improvements in runtime on random graphs with power law degree distributions, a popular model for real world graphs.

1 Introduction

Graphs serve as common abstractions for real world data, making graph mining primitives a critical tool for analyzing real-world networks.
Counting the number of occurrences of a query graph in a large data graph (subgraph counting, often referred to as motif counting) is an important problem with applications in a variety of domains such as bioinformatics, social sciences and spam detection (e.g. [8, 10, 23]). Subgraph counting and its variants have received a lot of attention in the literature. Substantial progress has been achieved for the case of small queries such as triangles or


4-vertex subgraphs: not only have very efficient algorithms been developed (e.g. [15, 20, 27, 31]), but also theoretical explanation of their performance on popular graph models has been obtained (see [?] and references therein). Some of the recent work has addressed larger queries [29, 30, ?, 26, 7], but our understanding here is far from complete. Even for reasonably large graphs (a million edges) and small queries (e.g. 5-cycles), the number of solutions tends to be enormous, running into billions. This explosion in the search space makes the subgraph counting problem very hard even for moderately large queries.

Figure 1: Illustration of a match (left) and a colorful match (right).

Theoretically, the fastest known algorithm for counting occurrences of a $k$-vertex subgraph in an $n$-vertex data graph runs in time $n^{\omega k/3}$, where $O(n^{\omega})$ is the time complexity of matrix multiplication (currently $\omega \approx 2.38$). This improves upon the trivial algorithm with runtime $n^k$, but is prohibitively expensive even for moderate size queries.

To address the above issue, Alon et al. [2] proposed the color coding technique. Here, given a $k$-node query, we assign random colors between 1 and $k$ to the vertices of the data graph, and count the number of occurrences of the query that are colorful, meaning the vertices matched to the query have distinct colors. See Figure 1. The count is scaled up appropriately to get an estimate of the actual number of occurrences. The accuracy is then improved by repeating the process over multiple random colorings and taking the average. Restricting the search to colorful matches prunes the search space and improves efficiency. Using this method, Alon et al. obtained faster algorithms for certain queries such as paths, cycles, trees and bounded treewidth graphs. The power of color coding as a very general counting technique, together with the importance of subgraph counting in various applications (as mentioned above), makes it important to design practically efficient and scalable implementations.
In a different work, Alon et al. [1] applied the color coding technique for counting the occurrences of treelets (tree queries) in biological networks. Color coding allowed them to handle tree queries up to size 10 in protein interaction networks, extending beyond the reach of previously known approaches [25, 18, 17]. Recently, Slota and Madduri [28, 30] presented FASCIA, an efficient and scalable distributed implementation of subgraph counting (via color coding), again for the case of treelet queries. However, despite considerable interest in non-tree queries from several application domains (see the experimental section for details), the technique has not been explored for more general settings. In this work we present the first efficient distributed implementation of color coding beyond tree queries.

As part of their original color coding solution, Alon et al. [2] presented faster algorithms for certain special classes of queries. They showed that if the query is a tree, then colorful subgraph counting can be solved in time $O(2^k m)$, i.e. in time linear in the size of the data graph. They extended the algorithm to show that if the query is close to a tree, specifically has (small) treewidth $t$, a running time of $O(2^k n^{t+1})$ can be achieved. Treewidth [9] is a widely adopted measure of the intrinsic complexity of a graph. Intuitively, it measures how close the topology of a given graph is to being a tree: tree queries have treewidth 1, and a cycle is the simplest example of a treewidth 2 query. The above algorithm, restricted to trees, forms the basis for the previously-mentioned treelet counting implementations [28, 30, 1].

While the runtime of the above algorithm is linear for the case of trees (i.e. acyclic queries), it becomes at least quadratic for query graphs of treewidth 2 and beyond. This phenomenon also manifests itself in practice: on real world graphs with even moderately skewed degree distribution, load imbalance is observed and the running time tends to have quadratic dependence on the maximum degree of the graph. Thus, even triangles (the smallest cyclic query) are harder to handle, and have received considerable attention from the research community (as mentioned earlier).

The goal of this paper is to study the colorful subgraph counting problem on queries of treewidth 2, taking the first step in the realm of color coding with cyclic queries. The class of queries of treewidth 2 is quite rich. In particular, it contains all trees, cycles, series-parallel graphs and beyond. Figure 8 shows treewidth 2 queries (used in our experimental evaluation) drawn from real-world studies on biological, social and collaboration networks [22, 32, 4]. To the best of our knowledge, the previously-mentioned algorithm [1] is the best known algorithm for treewidth 2 queries, and we use it as our baseline.
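The linear dependence on the data graph size for tree queries can be illustrated on the simplest tree, a path. The following is a minimal sketch under stated assumptions, not the implementation evaluated in this paper: the data graph is an adjacency dictionary, colors are numbered from 0, color sets are stored as bitmasks, and the function name `count_colorful_path_matches` is hypothetical.

```python
from collections import defaultdict


def count_colorful_path_matches(adj, color, k):
    """Count colorful matches of a k-node path query (illustrative sketch).

    adj   : dict vertex -> iterable of neighbors (undirected data graph)
    color : dict vertex -> color in range(k)
    Returns the number of injective mappings of the path whose images
    carry k distinct colors. Each undirected path is counted twice,
    once per direction, because a match is a mapping.
    """
    # table[v][S] = number of colorful paths that end at v and use
    # exactly the color set S (S is a bitmask over the k colors).
    table = {v: {1 << color[v]: 1} for v in adj}
    for _ in range(k - 1):
        nxt = {v: defaultdict(int) for v in adj}
        for v in adj:
            bit = 1 << color[v]
            for u in adj[v]:
                for sig, cnt in table[u].items():
                    if not sig & bit:  # the color of v must be unused
                        nxt[v][sig | bit] += cnt
        table = nxt
    full = (1 << k) - 1
    return sum(t.get(full, 0) for t in table.values())
```

Every round scans each edge once per stored color set, and a color set of a given size appears in only one round, so the total work stays within roughly $2^k m$ basic steps. Distinct colors imply distinct vertices, so injectivity never has to be checked separately.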
We rephrase the baseline algorithm within our framework and devise a distributed implementation. The rephrased algorithm becomes a recursive procedure that decomposes the query into simpler path subqueries, which are then solved to get the overall count. We thus refer to our baseline as the Path Splitting algorithm (PS).

Our Contributions

1. Building on the PS algorithm, we develop novel strategies that lead to significant performance gains in terms of runtime, scalability, and the size of graphs and queries handled.
2. Our algorithm works by decomposing the query into cycles and leaves, thereby reducing the problem of colorful subgraph counting on treewidth 2 queries to counting (annotated) cycles.
3. The decomposition in terms of cycles enables us to exploit the so-called degree ordering approach (e.g., the MINBUCKET algorithm for triangle enumeration [?]). Specifically, we show how to force the computation process to (mostly) work around high degree vertices, leading to substantial speedups and scalability gains.
4. We present a detailed experimental evaluation of the algorithms on real-world graphs having more than million edges and real-world queries of size up to 10 nodes. The results show that our strategies offer improvements of up to 28x in terms of running time and

exhibit improved scalability.
5. Finally, we complement our experimental evaluation by a theoretical analysis of the runtime of our degree ordering approach for cycle queries, on a popular class of random power law graphs (Chung-Lu graphs [14]). Our analysis provides justification for the empirically observed performance gains of the approach.

Related Work

Subgraph counting has received significant attention in the fields of computational biology [25, 18, 17] and social network analysis [21, 13, 27, ?, 20]. We give an overview of prior work on the problem (both theoretical and empirical) as well as techniques for making subgraph counting scalable, and explain how our contributions relate to this prior work.

Color Coding and Approximate Subgraph Counting: Color coding was introduced in an influential paper by Alon et al. [2] as a fast algorithm for finding occurrences of a query in a data graph and counting the number of such occurrences. In a different work, Alon et al. [1] explored its applications to approximate subgraph counting (most commonly known as motif counting) in computational biology. They were motivated by the fact that subgraph counting is an important primitive for characterizing biological networks [?]. Color coding allowed Alon et al. to count occurrences of treelets (tree queries) up to size 10 in protein interaction networks, extending beyond the reach of previously known approaches [25, 18, 17]. A scalable distributed implementation of color coding for trees has been reported by Slota and Madduri [29, 30], but no principled solutions beyond tree queries are known. ParSE [33] extends beyond tree queries by considering query graphs that can be partitioned into subtemplates via edge cuts of size 1. However, the only class of query graphs that can be perfectly partitioned using this method is trees; ParSE resorts to brute force enumeration for other cases. Our work provides the first principled approach to implementing color coding in a scalable way beyond tree queries.
Further, our analysis of the runtime of our cycle counting subroutine on random graphs with a power law degree distribution provides a theoretical justification of our algorithmic techniques. While our work and the above-mentioned prior work [1, 29, 30] count non-induced subgraphs, some other prior work [25, 18, 17] addressed the case of counting induced subgraphs. The search space of non-induced subgraphs is larger and, furthermore, these counts are more robust with respect to perturbations of the data graph [1].

Degree Based Approaches: Designing scalable subgraph counting algorithms turns out to be hard even for the simple case of triangle counting. A naive approach lets each vertex enumerate pairs of neighbors and check if they are connected. This leads to wasteful computations and also runs into load balancing issues on graphs with heavy tailed degree distributions [31]. The above issue has been addressed using a simple but efficient solution (referred to as the MINBUCKET algorithm [15, 31]): each vertex enumerates pairs of neighbors with degree no smaller than its own (with arbitrary tie breaking) and checks whether they are connected. It is not hard to see that this gives a correct count, and it has been empirically observed that this algorithm does not run into load balancing issues even on heavy tailed graphs [31]. The MINBUCKET heuristic has also been shown to give polynomial runtime improvement over the naive method when the input is a random graph with a power law degree distribution [?]. A recent work by Jha et al. [20] applies the

degree based technique for counting 4-vertex queries. There are a few prior approaches for arbitrary queries [7, 3, 26], but these algorithms do not use degree information, and are comparable to the baseline algorithm used in our study. To the best of our knowledge, prior to our work there has not been a systematic study of how MINBUCKET generalizes to larger subgraph counting problems. In this work we generalize the method for counting occurrences of treewidth 2 graphs, perform a thorough experimental evaluation and provide a theoretical runtime analysis of our technique in the random power law graph model. Our paper improves upon prior work along three axes: generality of queries handled, scalability of the proposed solution and theoretical analysis of the main algorithmic primitive on a class of graphs often used to model real world networks.

2 Preliminaries

Subgraph counting problem. The subgraph counting problem is defined as follows. The input consists of a query graph $Q = (V_Q, E_Q)$ over a set of $k$ nodes and a data graph $G = (V_G, E_G)$ over a set of $n$ vertices and $m$ edges. The task is to count the number of (not necessarily induced) subgraphs of $G$ that are isomorphic to $Q$. Formally, count the number of injective mappings $\pi : V_Q \to V_G$ such that for any pair of query nodes $a, b \in V_Q$, if $(a, b) \in E_Q$, then $(\pi(a), \pi(b)) \in E_G$. We refer to such mappings $\pi$ as matches.

Color coding and colorful matches. A coloring is a function $\chi : V_G \to \{1, 2, \ldots, k\}$, where for every vertex $u \in V_G$, $\chi(u)$ denotes its color. A match $\pi$ from $V_Q$ to $V_G$ is colorful if the vertices of $Q$ are mapped to $k$ distinctly colored vertices in $G$, i.e. $\bigcup_{a \in V_Q} \{\chi(\pi(a))\} = \{1, 2, 3, \ldots, k\}$. The main idea is that instead of counting all possible matches of the $k$ vertices of the query graph to the vertices of the data graph, one first colors the vertices of the data graph uniformly at random using $k$ colors, and then searches for colorful matches.

Colorful subgraph counting problem.
In the colorful subgraph counting problem the task is to count the number of colorful matches of the query $Q$ in $G$. Our setting counts the number of colorful matches, or mappings, from $Q$ to the data vertices. Alternatively, we may want to count the number of colorful subgraphs that are isomorphic to $Q$. The latter quantity can be obtained by dividing the former by $\mathrm{aut}(Q)$, the number of automorphisms of $Q$. While it is computationally hard to compute $\mathrm{aut}(Q)$ for an arbitrary query graph, the quantity can be computed quickly for queries of relatively small size (say about 10 nodes). Given the above discussion, we focus on counting the number of colorful matches.

Treewidth. Intuitively, if the query graph $Q = (V_Q, E_Q)$ has treewidth $t$ then $Q$ can be decomposed into subgraphs $Q_1, Q_2, \ldots$ such that each subgraph $Q_i$ is also of treewidth $t$, and each $Q_i$ has no more than $t$ nodes that belong also to other subgraphs. We call such nodes the boundary nodes of $Q_i$. In addition, the total number of distinct boundary nodes in all subgraphs $Q_1, Q_2, \ldots$ is at most $t + 1$. Note that the decomposition can be done recursively, as each $Q_i$ has treewidth $t$, until we are left only with subgraphs that have at most $t + 1$ nodes. This results in a treewidth decomposition tree denoted $T_Q$. A formal definition is given below.

A tree decomposition of a query $Q$ is a tree $T = (V_T, E_T)$, wherein each node $p \in V_T$ is

associated with a subset of query nodes $S(p) \subseteq V_Q$, called pieces, such that the following properties are true: (i) for every query edge $(a, b) \in E_Q$, there exists a piece $S(p)$ (for some $p \in V_T$) that contains both $a$ and $b$; (ii) for every query node $a \in V_Q$, the set of nodes whose pieces contain $a$ induces a connected subtree. Alternatively, the second property states that if $a$ belongs to pieces $S(p_1)$ and $S(p_2)$ for some $p_1$ and $p_2$, then $a$ must also belong to the piece $S(p)$ for any node $p$ found on the (unique) path connecting $p_1$ and $p_2$ in $T$. The width of the tree decomposition is the maximum cardinality over all pieces minus one, i.e., $\max_p |S(p)| - 1$. The treewidth $t$ of the query is the minimum width over all its tree decompositions.

Approximate subgraph counting via color coding. Counting the number of colorful matches turns out to be easier than counting the actual (not necessarily colorful) matches. The price to pay is that the algorithm is randomized. We color the graph randomly and obtain the number of colorful matches, and repeat the process independently at random a few times. Then, an estimate for the number of matches (occurrences of the query) can be obtained by taking the average. For a given input graph $G$ and query $Q$ let $n(G, Q)$ denote the number of matches $\pi$ from $Q$ to $G$. For a (random) coloring $\chi$ of the vertices of $G$ let $n_{\mathrm{colorful}}(G, Q, \chi)$ denote the number of colorful matches of $Q$ to $G$ under coloring $\chi$. It was shown [2, 1] that with proper normalization the colorful count $n_{\mathrm{colorful}}(G, Q, \chi)$ is an unbiased estimator of the actual count. Specifically, the right normalization factor is $k^k/k!$, i.e. we have $(k^k/k!) \cdot E_{\chi}[n_{\mathrm{colorful}}(G, Q, \chi)] = n(G, Q)$. The variance of the estimator can also be bounded (see [1], section 2.1). Thus, the average of the normalized counts $(k^k/k!) \cdot n_{\mathrm{colorful}}(G, Q, \chi)$ over a few independently chosen colorings $\chi$ converges to the right answer, i.e. $n(G, Q)$. Thus, in order to obtain an approximate subgraph counting algorithm it suffices to solve the colorful subgraph counting problem.
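The normalization identity can be checked exactly on a tiny instance by enumerating every coloring instead of sampling. The sketch below is only an illustration for graphs of a handful of vertices (it is exponential in the graph size); the function names `matches` and `scaled_expected_colorful` are hypothetical, and colors are numbered from 0 for convenience.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def matches(q_nodes, q_edges, g_nodes, g_edges):
    """All injective mappings of the query into the data graph that
    send every query edge onto a data edge (non-induced matches)."""
    g_set = {frozenset(e) for e in g_edges}
    found = []
    for image in permutations(g_nodes, len(q_nodes)):
        pi = dict(zip(q_nodes, image))
        if all(frozenset((pi[a], pi[b])) in g_set for a, b in q_edges):
            found.append(pi)
    return found


def scaled_expected_colorful(q_nodes, q_edges, g_nodes, g_edges):
    """(k^k / k!) * E_chi[n_colorful], computed exactly by enumerating
    all k^n colorings rather than drawing random ones."""
    k = len(q_nodes)
    all_matches = matches(q_nodes, q_edges, g_nodes, g_edges)
    total = 0
    for colors in product(range(k), repeat=len(g_nodes)):
        chi = dict(zip(g_nodes, colors))
        # A match is colorful when its k images carry k distinct colors.
        total += sum(
            1 for pi in all_matches
            if len({chi[v] for v in pi.values()}) == k
        )
    expectation = Fraction(total, k ** len(g_nodes))
    return Fraction(k ** k, factorial(k)) * expectation
```

For a triangle query in the complete graph on four vertices, both `len(matches(...))` and `scaled_expected_colorful(...)` evaluate to 24, in line with the identity: each match is colorful with probability $k!/k^k$.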
The rest of the paper is devoted to designing a scalable solution to colorful subgraph counting.

3 Overview

The work of Alon et al. [2] yields a natural algorithm for the colorful subgraph counting problem on bounded treewidth query graphs. This algorithm is based on the following intuition. Suppose that we have found a colorful match $\pi'$ for a subgraph $Q'$ of the input query graph $Q$, and we wish to extend it into a colorful match $\pi$ for $Q$ by additionally fixing the mapping of the nodes outside $Q'$. For this we do not need to know the mapping of the non-boundary nodes of $Q'$, since they do not share edges with nodes outside $Q'$. Instead, it suffices to know the mapping of the boundary nodes (i.e., the nodes that share edges with nodes outside $Q'$) and the set of colors used by $\pi'$. The mapping of the boundary nodes is needed to ensure that for any edge from a boundary node to the outside, the corresponding data vertices share an edge in the data graph; and the set of colors is needed to avoid repeating a color already used by $\pi'$. Analogously, in the setting of counting, in order to count the number of colorful matches for $Q$, we do not need a complete listing of the colorful matches of $Q'$. Instead, we can group the colorful matches based on the set of colors used and the mappings for the boundary nodes, and it suffices to know the count per group. Based on the above intuition, we apply dynamic programming to count the number

of colorful matches of $Q$. Let $T_Q$ be the tree decomposition of $Q$ with treewidth $t$. The algorithm processes $T_Q$ in a bottom-up manner and creates a hash table (that we call a projection table) for each tree node. The subgraph $Q'$ associated with a node has at most $t$ boundary nodes and these nodes can be mapped to the data vertices in at most $n^t$ ways. In addition, we need to record the colors of the data vertices to which the nodes of $Q'$ are mapped. Since we focus on colorful matches, the set of colors used (that we call the signature) can take at most $\binom{k}{t} \le 2^k$ values (where $k$ is the size of the query graph). For each combination of mappings to the boundary nodes and the signature, we record the number of colorful matches of $Q'$ consistent with the combination. The number of entries in the table is at most $n^t 2^k$. The projection table for a tree node can be computed from those of its children. We get the total number of colorful matches by performing an aggregation on the projection table of the root node.

Working in the realm of motif counting, Slota and Madduri [30] described an efficient distributed implementation of the above algorithm for the case of tree queries and presented an experimental evaluation. Trees have treewidth one; hence, the size of the projection tables is linear in the number of vertices and the overall computation can be carried out in time linear in the graph size. Our goal is to address a more general class of queries (beyond trees) in a distributed setting, and we focus on the case of queries of treewidth 2. Treewidth 2 queries are more challenging since in the worst case the tables can be of size quadratic in the number of vertices, and the computation time also becomes quadratic. The construction of our algorithm is motivated by the fact that real life data graphs tend to exhibit variations in the degree distribution.
A naive implementation that treats all data vertices in the same manner would result in a lot of entries in the projection tables of the high degree vertices that do not lead to colorful matches for the overall input query. Moreover, in a distributed setting the processors owning such vertices perform more computation, leading to load imbalance. Our algorithm is based on a crucial observation that any treewidth 2 query can be recursively decomposed into (annotated) cycles or leaves. The core component of the algorithm is an efficient procedure for handling cycles that employs a strategy based on degree based ordering of vertices. This leads to a reduction in wasteful computation, as well as improved load balancing. The procedure is inspired by a similar strategy used in prior work [?] for handling triangles. The overall algorithm uses the above decomposition and the improved procedure for handling cycles.

4 Overall Algorithm

In this section we describe the overall structure of our subgraph counting algorithm, which proceeds in two steps. In the first step, we decompose the query into cycles and leaves (called blocks) and construct a decomposition tree for the input query $Q$, which is essentially a carefully chosen treewidth decomposition tree; each node of the tree represents a block and encodes a convenient subquery. This step is independent of the data graph and can be viewed as a preprocessing phase for the query. Then, in the second step, we traverse the tree in a bottom-up manner, performing primitive counting operations over the data graph prescribed by the internal nodes and combining the results. The final count is produced

by the root of the tree.

4.1 Decomposition Tree

For an input query graph $Q = (V_Q, E_Q)$, construct the decomposition tree $T(Q)$ by iteratively applying one of two primitive operations: contraction of a leaf edge or of a cycle. As these operations are applied, the number of nodes in the query $Q$ decreases. At the same time, new edges may appear in $Q$ to represent contracted structures, and edges as well as nodes may get annotated with the identity of the contracted structures that they represent.

Before defining the tree construction algorithm we need to introduce two definitions. First, we say that a cycle $C$ in $Q$ is contractible if (a) $C = (a_0, a_1, \ldots, a_{L-1})$ is induced (i.e. there are no edges between nodes $a_0, a_1, \ldots, a_{L-1}$ except the edges of $C$) and (b) the cycle $C$ has at most two boundary nodes (i.e., nodes that share edges with nodes outside of $C$). Second, a leaf edge is an edge $L = (a, b)$, where $b$ is a leaf node (has degree one); $a$ is called the boundary node of the leaf edge. We use the common term block to refer to leaf edges and contractible cycles. For example, consider the query named Satellite in Fig 2. The cycle $(i, j, k)$ is contractible with a single boundary node $i$, the cycle $(a, b, c, d, e)$ is contractible with two boundary nodes $a$ and $c$, and $(f, h)$ is a leaf edge. The cycle $(i, f, g)$ is not contractible since it has three boundary nodes.

We construct the decomposition tree $T(Q)$ starting with an empty tree. The tree is built bottom-up starting from the leaf level and hence the structure may be a forest with multiple roots in the intermediate stages. Each iteration adds a new node and may make some of the existing roots its children, culminating in a tree. In the construction process we iteratively perform the following operations until $Q$ contains a single node: find a block $B$ (a leaf edge or a contractible cycle) in $Q$ and remove it from $Q$ (while possibly adding an edge to $Q$), and add a corresponding node to $T(Q)$. We distinguish 3 cases.
Case 1: $B$ is a contractible cycle $C$ with exactly one boundary node $a \in V_Q$: Remove the nodes and edges of $C$ from $Q$, except for node $a$. Erase any annotation found on $a$ in $Q$ and annotate it with the block name $B$.

Case 2: $B$ is a contractible cycle $C$ with two boundary nodes $a, b \in V_Q$: Remove the nodes and edges of $C$ from $Q$, except for the nodes $a$ and $b$. Add an edge $(a, b)$ in $Q$ and annotate it with $B$. Erase any annotation found on $a$ and $b$ in $Q$.

Case 3: $B$ is a leaf edge $L = (a, b)$: Remove $b$ and the edge from $Q$. Erase any annotation found on node $a \in Q$ and annotate it with the block name $B$.

The nodes and edges of $B$ inherit the annotations from $Q$, as they were before $Q$ was transformed (this ensures that the annotations on the boundary nodes that got erased get captured by the new annotation). Next we add a new node $B$ to the tree $T(Q)$. If any node or edge in $B$ has an annotation $B'$, make $B'$ a child of $B$ in $T(Q)$. This completes the construction of $T$.

We show below that the process can find a block in each iteration and terminate successfully on every query of treewidth 2. Assuming termination, it is not difficult to see that the process produces a tree. During contraction, every block $B$ annotates a particular node or an edge of $Q$, recording the way in which it has been contracted. The annotation gets inherited by

some other block $B'$ in a subsequent iteration. The block $B'$ becomes the parent of $B$. The annotation is erased in $Q$, ensuring that no other block becomes a parent of $B$.

Figure 2: Illustration of the decomposition process. The top row shows the sequence of queries considered in the process (the original query is on the left), the bottom row shows the blocks that were contracted in each step.

Taking Satellite as the input query $Q$, Figure 2 provides an illustration of the process, along with the output decomposition tree. The bottom row shows the blocks being contracted and the top row shows the transformed $Q$. The first iteration contracts the cycle $B_1 = (a, b, c, d, e)$. A new edge $(a, c)$ is added to $Q$, along with the annotation $B_1$, and $B_1$ is added to the tree. The second iteration contracts the leaf block $B_2 = (f, h)$. Node $f$ is annotated as $B_2$ and $B_2$ is added to the tree. The third iteration contracts $B_3 = (a, f, g, c)$, by adding an edge $(f, g)$ with the annotation $B_3$. The block is added to the tree and it is made the parent of $B_1$ and $B_2$. In the fourth iteration, the cycle $B_4 = (i, j, k)$ is contracted. Node $i$ gets annotated as $B_4$ and $B_4$ is added to the tree. Finally, the query $Q_4$ is contracted, leaving $Q$ empty. We add $Q_4$ as the root of the tree, making it the parent of $B_3$ and $B_4$.

The following lemma guarantees that for any treewidth 2 query $Q$, the tree construction procedure will always find a block (a leaf edge or a contractible cycle) in each iteration and terminate successfully. The proof relies on prior work on nested ear decompositions of treewidth 2 queries [16].
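The definitions of contractible cycles and leaf edges from this subsection can be checked mechanically on the Satellite example. The following sketch is an illustration only: the edge list `SATELLITE_EDGES` is an assumption reconstructed from the textual description of Figure 2 (the figure itself is not reproduced here), and the helper names are hypothetical.

```python
# Assumed edge list of the Satellite query, reconstructed from the text.
SATELLITE_EDGES = [
    ("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a"),
    ("a", "f"), ("f", "g"), ("g", "c"), ("f", "h"),
    ("i", "f"), ("i", "g"), ("i", "j"), ("j", "k"), ("k", "i"),
]


def neighbors(edges):
    """Adjacency sets of an undirected query graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def boundary_nodes(adj, block):
    """Nodes of the block that share an edge with a node outside it."""
    inside = set(block)
    return {a for a in block if adj[a] - inside}


def is_contractible_cycle(adj, cycle):
    """cycle is a node sequence (a_0, ..., a_{L-1}) in cyclic order."""
    L = len(cycle)
    if L < 3 or len(set(cycle)) != L:
        return False
    pairs = [(cycle[i], cycle[(i + 1) % L]) for i in range(L)]
    if any(b not in adj[a] for a, b in pairs):
        return False  # some consecutive pair is not an edge of the query
    cycle_edges = {frozenset(p) for p in pairs}
    inside = set(cycle)
    # Condition (a): no edges among cycle nodes except the cycle edges.
    induced = all(
        frozenset((a, b)) in cycle_edges
        for a in cycle for b in adj[a] & inside
    )
    # Condition (b): at most two boundary nodes.
    return induced and len(boundary_nodes(adj, cycle)) <= 2


def is_leaf_edge(adj, a, b):
    """(a, b) is a leaf edge when b has degree one and hangs off a."""
    return adj[b] == {a}
```

With `adj = neighbors(SATELLITE_EDGES)`, the cycles `("i", "j", "k")` and `("a", "b", "c", "d", "e")` pass the check, `("i", "f", "g")` fails because all three of its nodes are boundary nodes, and `("a", "f", "g", "c")` only becomes a cycle after the first contraction adds the edge $(a, c)$.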

Lemma 4.1 (i) Any treewidth 2 query $Q$ contains a block; (ii) the transformed query resulting from the contraction process is also a treewidth 2 query.

Proof: We first prove part (ii) of the lemma. If the contracted block has one boundary node then no new edges are added to $Q$, in which case the tree $T_Q$ for the updated $Q$ is given by deleting all the nodes not in the updated $V_Q$ from the subsets $S_Q(t)$. If the contracted block has two boundary nodes $a$ and $b$ then the edge $(a, b)$ is added to $Q$. In this case we get the tree for the updated $Q$ by replacing each occurrence of the nodes not in the updated $V_Q$ by $b$. Note that the size of each subset is still at most 3, the nodes associated with subsets that contain $b$ form a connected component, and for at least one subset $S_Q(t)$, $\{a, b\} \subseteq S_Q(t)$.

We now prove part (i). First, root the tree $T_Q$ at an arbitrary non-leaf node. This induces an ancestor-descendant relationship on the nodes in $V_T$. Note that if there are two nodes $\{t, t'\} \subseteq V_T$ such that $S_Q(t') \subseteq S_Q(t)$, node $t'$ can be omitted and all its children connected to $t$. Thus from now on we assume that no subset $S_Q(t)$ is contained in (or identical to) another subset. We need the following definition and claim.

Definition 4.1 For a node $t \in V_T$, let $Q_t$ be the subgraph of $Q$ induced by the nodes that are in the union of the subsets associated with the nodes of $T_Q$ in the subtree rooted at $t$.

Claim 4.1 For every node $t \in V_T$, either $Q_t$ contains a block, or $Q_t$ is a path whose endpoints are in the subset associated with the parent of $t$ (if such exists).

Before proving the claim we show how it implies the lemma. Since the claim also holds for the root of $T_Q$, either $Q$ contains a block or it is a path, in which case it also contains a leaf block.

Proof of Claim 4.1: We prove the claim by induction. The base of the induction is a leaf node. Consider a leaf node $t \in V_T$. There are two possibilities: (i) $S_Q(t) = \{x, y\}$, and (ii) $S_Q(t) = \{x, y, z\}$.
If $S_Q(t) = \{x, y\}$, then at least one node, say $y$, is only connected to $x$ and thus $(x, y)$ is a leaf edge. If $S_Q(t) = \{x, y, z\}$, then consider the subgraph induced by $\{x, y, z\}$. If this subgraph is a triangle then it must be a contractible cycle. The only remaining case is that the subgraph induced by $\{x, y, z\}$ forms a path. Assume that the endpoints of this path are $x$ and $z$. If one of these endpoints, say $z$, is not in the subset associated with the parent of $t$ then $(y, z)$ is a leaf edge. Otherwise, letting $t'$ be the parent of $t$, we have $S_Q(t) \cap S_Q(t') = \{x, z\}$.

For the inductive step consider a non-leaf node $t \in V_T$. If $Q_{t'}$ for any child $t'$ of $t$ contains a block then we are done. Assume that this is not the case. Consider first the case that $t$ has a single child $t'$. By the inductive hypothesis $Q_{t'}$ is a path whose endpoints $x$ and $y$ are in $S_Q(t)$. Let $S_Q(t) = \{x, y, z\}$. If $z$ is connected to both $x$ and $y$ then the cycle closed by $z$ is a contractible cycle. If $z$ is connected to only one endpoint, say $y$, then we get a path with endpoints $x$ and $z$. If either $x$ or $z$ is not in the subset associated with the parent of $t$, then the missing endpoint is a leaf node. If both $x$ and $z$ are in the subset associated with the parent of $t$ then the inductive claim follows.

Next, consider the case that $t$ has several children. If two of the children of $t$, say $t'$ and $t''$, share endpoints then the cycle formed by $Q_{t'}$ and $Q_{t''}$ is contractible. Otherwise, $t$ must have exactly two children, say $t'$ and $t''$, with endpoints $\{x, y\}$ and $\{y, z\}$, forming a path with endpoints $x$ and $z$. If $z$ is connected also to $x$ then the cycle closed by the edge $(x, z)$

is a contractible cycle. If either $x$ or $z$ is not in the subset associated with the parent of $t$, then the missing endpoint is a leaf node. If both $x$ and $z$ are in the subset associated with the parent of $t$ then the inductive claim follows.

An input query may admit multiple decomposition trees, and the choice of the tree influences the performance of our algorithm. In Section 6, we present a heuristic for finding a good decomposition. Each node of the tree represents a block, and it will be convenient to view the node simply as the block represented by it.

At this point, it is interesting to consider the tree queries studied by Slota and Madduri [30]. Given a tree query, their algorithm fixes a suitable query node as the root and iteratively processes the tree in a bottom-up manner. The algorithm implicitly uses a decomposition tree. However, since trees do not have cycles, the decomposition tree consists of only leaf edge blocks. In contrast, the decomposition trees of treewidth two queries involve the more challenging case of cycles as well.

4.2 Tree Traversal

Here, we describe the second step of the algorithm, which traverses the decomposition tree in a bottom-up manner and computes the number of colorful matches of the blocks in the data graph. For this purpose, we define the notion of subqueries represented by blocks. A subquery $Q'$ of the input query $Q$ refers to any induced subgraph of $Q$. Consider a block $B$ and let $U$ be the union of nodes found in the block $B$ and its descendant blocks in the tree. The subquery represented by $B$, denoted $SQ(B)$, refers to the subquery induced by $U$. For example, Figure 2 shows the subquery represented by the block $B_4$. The decomposition tree yields a nested hierarchy of subqueries: the root block represents the whole input query, and for any block $B$ with parent $B'$, the subquery $SQ(B)$ is contained within $SQ(B')$. Let $B$ be a block. A node $a \in SQ(B)$ is said to be a boundary node if $a$ shares an edge with a node outside $SQ(B)$.
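The definitions of $SQ(B)$ and its boundary nodes translate directly into a few lines of code. The sketch below is an illustration under assumptions: the decomposition tree and the edge list of the Satellite query are transcribed from the walkthrough of Figure 2, and the names `BLOCK_NODES`, `CHILDREN`, `subquery_nodes` and `subquery_boundary` are hypothetical.

```python
# Blocks of the Satellite decomposition and their own nodes (assumed
# transcription of the walkthrough of Figure 2).
BLOCK_NODES = {"B1": "abcde", "B2": "fh", "B3": "afgc", "B4": "ijk", "Q4": "fgi"}
CHILDREN = {"B3": ["B1", "B2"], "Q4": ["B3", "B4"]}

# Assumed edge list of the original Satellite query.
QUERY_EDGES = [
    ("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a"),
    ("a", "f"), ("f", "g"), ("g", "c"), ("f", "h"),
    ("i", "f"), ("i", "g"), ("i", "j"), ("j", "k"), ("k", "i"),
]


def subquery_nodes(block, block_nodes=BLOCK_NODES, children=CHILDREN):
    """Node set U of SQ(block): the block's nodes plus those of all
    its descendant blocks in the decomposition tree."""
    nodes = set(block_nodes[block])
    for child in children.get(block, ()):
        nodes |= subquery_nodes(child, block_nodes, children)
    return nodes


def subquery_boundary(nodes, query_edges=QUERY_EDGES):
    """Nodes of the subquery that share an edge with a node outside it."""
    boundary = set()
    for a, b in query_edges:
        if (a in nodes) != (b in nodes):  # exactly one endpoint inside
            boundary.add(a if a in nodes else b)
    return boundary
```

For instance, `subquery_nodes("B3")` covers the eight nodes `a` through `h`, and its boundary consists of `f` and `g`, the same two nodes identified when $B_3$ was contracted.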
It is not hard to see that these boundary nodes are the same as the boundary nodes of B (identified during the tree construction process). Thus, SQ(B) can have at most two boundary nodes.

Before describing the counting algorithm, we extend the notion of colorful matches to subqueries: a colorful match for a subquery $Q' = (V_{Q'}, E_{Q'})$ is an injective mapping $\pi : V_{Q'} \to V_G$ such that for any edge $(a, b) \in E_{Q'}$, $(\pi(a), \pi(b)) \in E_G$, and the vertices of Q' are mapped to distinctly colored vertices of G. The algorithm traverses the tree in a bottom-up manner. For each block B, it outputs a succinct synopsis of the set of colorful matches of the subquery SQ(B), using projection tables and signatures (as outlined in Section 3), which we now define precisely.

Signature: Let K = {1, 2, ..., k} denote the set of colors used in the data graph, where k is the size of the input query Q. The term signature refers to any subset $\alpha \subseteq K$. For a subquery Q' and a colorful match π of Q', the signature of π refers to the set of colors of the data vertices used by π and is denoted sig(π), i.e., $\mathrm{sig}(\pi) = \bigcup_{a \in Q'} \{\chi(\pi(a))\}$.

Projection Tables: Let Q' be a subquery with two boundary nodes a and b. For a pair of data vertices u and v and a signature $\alpha \subseteq K$, let $\mathrm{cnt}(u, v, \alpha \mid Q')$ denote the number of colorful matches π of Q' wherein the boundary nodes a and b are mapped to u and v and the signature of π is α:

$$\mathrm{cnt}(u, v, \alpha \mid Q') = |\{\pi \in \Pi : \pi(a) = u \text{ and } \pi(b) = v \text{ and } \mathrm{sig}(\pi) = \alpha\}|,$$

where Π is the set of all colorful matches of Q'. These counts can be conveniently represented in the form of a hash table, with (u, v, α) forming the key and the count forming the value. We refer to any encoding of the above counts (such as the hash table above) as the projection table of Q'. In the worst case, the table may have size quadratic in the input data graph. However, a significant fraction of the triples will have a count of zero, and we maintain only the non-zero counts. The projection table for subqueries having a single boundary node a is defined in a similar manner. For a data vertex u and a signature $\alpha \subseteq K$, define

$$\mathrm{cnt}(u, \alpha \mid Q') = |\{\pi \in \Pi : \pi(a) = u \text{ and } \mathrm{sig}(\pi) = \alpha\}|.$$

Overall Algorithm
1. Compute a decomposition tree T(Q) for the input query Q.
2. Traverse the tree bottom-up. For each non-root block B: use the projection tables of the children blocks of B and compute the projection table for B.
3. Output the number of colorful matches of the subquery represented by the root block.
Figure 3: Overall Algorithm

4.3 Computing the Counts

Given a decomposition tree, the algorithm relies on the fact that the projection table for a block can be computed by joining the projection tables of its children blocks. As an illustration of the idea, consider the block $B_3$ having boundary nodes f and g, and the subquery represented by it (Figure 2). For a pair of vertices u and v, and a signature α, the projection count $\mathrm{cnt}(u, v, \alpha \mid B_3)$ can be computed as follows. The block consists of the path (a, f, g, c), and any match π for the subquery must map these nodes to vertices (x, u, v, y) that form a path in the data graph. The block is annotated by its children blocks $B_1$, with boundary nodes a and c, and $B_2$, with boundary node f. Any pair of matches $\pi_1$ and $\pi_2$ for $SQ(B_1)$ and $SQ(B_2)$ can be extended to a match for $SQ(B_3)$, as long as their signatures $\alpha_1$ and $\alpha_2$ are disjoint (since the blocks do not share any node) and are contained within α.
Therefore, we can derive the desired count by performing the following aggregation over all quadruples $(x, y, \alpha_1, \alpha_2)$ satisfying the properties: (x, u, v, y) forms a path in the data graph; $\alpha_1, \alpha_2 \subseteq \alpha$; and $\alpha_1 \cap \alpha_2$ is empty. The aggregation is:

$$\mathrm{cnt}(u, v, \alpha \mid B_3) = \sum_{x, y} \sum_{\alpha_1, \alpha_2} \mathrm{cnt}(x, y, \alpha_1 \mid B_1) \cdot \mathrm{cnt}(u, \alpha_2 \mid B_2).$$
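The aggregation for $B_3$ can be sketched as a join in Python. This is a minimal illustration under stated assumptions, not the distributed implementation: the function names (`join_b3`, `edge_table`, `vertex_table`), the adjacency-list dictionary, and the use of `frozenset` for signatures are all hypothetical choices. The sketch also makes explicit a condition the text leaves implicit, namely that the color of v (the image of g) must not occur in $\alpha_1 \cup \alpha_2$, and it takes $\alpha = \alpha_1 \cup \alpha_2 \cup \{\chi(v)\}$.

```python
from collections import defaultdict

def edge_table(adj, color):
    """Two-boundary table of a plain edge: one colorful match per directed edge."""
    return {(x, y, frozenset((color[x], color[y]))): 1
            for x in adj for y in adj[x] if color[x] != color[y]}

def vertex_table(adj, color):
    """Single-boundary table of a lone query node: one match per data vertex."""
    return {(u, frozenset((color[u],))): 1 for u in adj}

def join_b3(adj, color, cnt_b1, cnt_b2):
    """Combine cnt(x, y, a1 | B1) and cnt(u, a2 | B2) into cnt(u, v, alpha | B3).

    (x, u, v, y) must be a path in the data graph; a1, a2 and the color of v
    must be pairwise disjoint (assumption made explicit in this sketch).
    """
    b1_by_pair = defaultdict(list)
    for (x, y, a1), c in cnt_b1.items():
        b1_by_pair[(x, y)].append((a1, c))
    b2_by_vertex = defaultdict(list)
    for (u, a2), c in cnt_b2.items():
        b2_by_vertex[u].append((a2, c))

    out = defaultdict(int)
    for u, entries2 in b2_by_vertex.items():
        for v in adj[u]:                  # edge (u, v): images of f and g
            for x in adj[u]:              # edge (x, u): image of a
                for y in adj[v]:          # edge (v, y): image of c
                    for a1, c1 in b1_by_pair.get((x, y), ()):
                        for a2, c2 in entries2:
                            if a1 & a2 or color[v] in a1 or color[v] in a2:
                                continue  # the combined match would reuse a color
                            alpha = a1 | a2 | {color[v]}
                            out[(u, v, alpha)] += c1 * c2
    return dict(out)

# Toy data graph: a 4-cycle whose vertices all have distinct colors.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
color = {0: 0, 1: 1, 2: 2, 3: 3}
table = join_b3(adj, color, edge_table(adj, color), vertex_table(adj, color))
```

In this toy setting $B_1$ is a plain edge and $B_2$ a single node, so the joined table counts colorful 4-cycles projected onto the images of f and g. Because colorful matches use distinct colors, the color checks also rule out repeated vertices.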

We can express the projection counts for any block in the above manner. However, as the number of children increases, the cartesian product involved in the aggregation would be prohibitively expensive. Our procedures efficiently simulate the aggregation by performing a sequence of join operations involving the projection tables of the children blocks.

Given a decomposition tree, the algorithm traverses the decomposition tree in a bottom-up manner, computing the projection tables for all the blocks, and culminates in the root block representing the whole input query. At this step, instead of producing a projection table, the algorithm simply computes the number of colorful matches. The pseudo-code is shown in Figure 3.

5 Solving Blocks

The main step of the algorithm is the construction of the projection table of a block from those of its children blocks. In this section we develop efficient procedures for handling cycles. For the sake of highlighting the main ideas, we first focus on the case of cycles found at the leaf level of the decomposition tree (such as the cycle $B_1$ in Figure 2); these cycles do not have other blocks annotating them. General cycles are handled by extending these ideas, as discussed later.

5.1 Solving Cycles at the Leaf Level

Consider a cycle block $C = (a_0, \ldots, a_{L-1})$ of length L without annotations. The cycle may have at most two boundary nodes. We discuss the more interesting case where the number of boundary nodes is exactly two; the other cases are handled in a similar fashion. Let the two boundary nodes of the cycle be $a_p$ and $a_q$, for some $0 \le p, q \le L-1$. We present two procedures for computing the projection table of C: a baseline procedure that uses a path splitting strategy and an efficient procedure guided by a degree based ordering of vertices.

Path Splitting Algorithm (PS).
For two nodes $a_s$ and $a_t$ on the cycle, let $P^+_{s,t}$ and $P^-_{s,t}$ be the paths obtained by traversing the cycle from $a_s$ to $a_t$ in the clockwise and counter-clockwise directions, respectively, i.e., $P^+_{s,t} = (a_s, a_{s \oplus 1}, \ldots, a_t)$ and $P^-_{s,t} = (a_s, a_{s \ominus 1}, \ldots, a_t)$, where $\oplus$ and $\ominus$ refer to addition and subtraction modulo L. Let $\mathrm{cnt}(\cdot, \cdot, \cdot \mid P^+_{s,t})$ denote the projection counts for the path $P^+_{s,t}$, taking $a_s$ and $a_t$ as the boundary nodes. Namely, for a triple (u, v, α), let $\mathrm{cnt}(u, v, \alpha \mid P^+_{s,t})$ denote the number of colorful matches π for $P^+_{s,t}$ wherein $\pi(a_s) = u$, $\pi(a_t) = v$ and $\mathrm{sig}(\pi) = \alpha$. A similar notion is defined for the paths $P^-_{s,t}$.

The procedure splits the cycle into two paths along the boundary nodes, given by $P^+_{p,q}$ and $P^-_{p,q}$; we refer to these special paths as $P^+$ and $P^-$. See Figure 5 (a) for an illustration. The projection table for $P^+$ is constructed iteratively, by building the tables for the paths $P^+_{p,j}$, for each node $a_j$ found along the path. This is accomplished by extending the projection table for the prior path $P^+_{p,j \ominus 1}$ via a join with the edges of the data graph. The pseudo-code is given in Figure 4 (Procedure 1). We assume that all the counts are initialized to zero. The first iteration is handled by directly reading the edges of the data graph. In the subsequent iterations, we extend every triple (u, v, α) with non-zero count $\mathrm{cnt}(u, v, \alpha \mid P^+_{p,j \ominus 1})$ with any edge (v, w), provided the resulting match is colorful. The counts for $P^-$ are constructed analogously. Finally, the projection table for the cycle C is obtained by joining the counts of $P^+$ and $P^-$, as shown in Procedure 2. Here, a pair of triples $(u, v, \alpha_1)$ and $(u, v, \alpha_2)$ are joined if the resulting match is colorful.

Procedure 1: Computing Projection Table for $P^+$
  For each edge (u, v) in the data graph G
    $\mathrm{cnt}(u, v, \alpha \mid P^+_{p,p \oplus 1}) \leftarrow 1$, where $\alpha = \{\chi(u), \chi(v)\}$.
  For $j = p \oplus 2, p \oplus 3, \ldots, q$
    For each triple (u, v, α) with $\mathrm{cnt}(u, v, \alpha \mid P^+_{p,j \ominus 1}) \neq 0$
      For each edge (v, w) in G such that $\chi(w) \notin \alpha$ do:
        Let $\alpha' = \alpha \cup \{\chi(w)\}$.
        Increment $\mathrm{cnt}(u, w, \alpha' \mid P^+_{p,j})$ by $\mathrm{cnt}(u, v, \alpha \mid P^+_{p,j \ominus 1})$.
Procedure 2: Computing Projection Table for C
  For each entry $(u, v, \alpha_1)$ with $\mathrm{cnt}(u, v, \alpha_1 \mid P^+) \neq 0$
    For each entry $(u, v, \alpha_2)$ with $\mathrm{cnt}(u, v, \alpha_2 \mid P^-) \neq 0$
      If $\alpha_1 \cap \alpha_2 = \{\chi(u), \chi(v)\}$
        $\alpha \leftarrow \alpha_1 \cup \alpha_2$
        $val_1 \leftarrow \mathrm{cnt}(u, v, \alpha_1 \mid P^+)$; $val_2 \leftarrow \mathrm{cnt}(u, v, \alpha_2 \mid P^-)$
        Increment $\mathrm{cnt}(u, v, \alpha \mid C)$ by $val_1 \cdot val_2$.
Figure 4: PS Algorithm

Discussion of baseline. As discussed below (Section 5.2), the PS procedure can be extended to handle general cycles with annotations, and yields an algorithm for handling treewidth 2 queries. The resultant PS algorithm is equivalent to the original color coding algorithm of Alon et al. [2]. Prior work [30, 1] on colorful subgraph counting utilizes the algorithm of Alon et al. as the basis for counting tree queries (treelets). We developed a distributed implementation of the PS algorithm, and use it as the baseline in our experimental study. Known techniques for subgraph counting with large queries (e.g. [7, 26]) employ similar graph traversal techniques, making PS consistent with the state of the art for subgraph counting as well as color coding.

We develop a procedure, called the Degree Based (DB) algorithm, that outperforms the PS algorithm for practical graphs and queries. It is motivated by the following observations. First, the paths $P^+$ and $P^-$ may have uneven lengths (for instance, in Figure 5, $|P^+| = 6$ and $|P^-| = 2$), and the processing of the longer path dominates the overall running time.
Second, in real graphs with skewed degree distributions, high degree vertices tend to have more paths passing through them, which populate the projection tables of $P^+$ and $P^-$. However, a significant fraction of these paths do not find appropriate counterparts in the other table to complete a match, leading to wasteful computations. Third, in a distributed setting, the above phenomenon manifests as higher load on processors owning high degree vertices, leading to load imbalance.

It is not difficult to address the first issue alone. The only intricacy is that when the paths are split evenly, the boundary nodes may appear internally on the paths (see Figure 5, with a split across the nodes denoted h and d). This can be handled by recording the mapping for the boundary nodes as part of the projection counts. We implemented the above algorithm as well and noticed that the issues of wasteful computations and load imbalance still persist. Furthermore, the performance of the PS algorithm and the modified implementation does not differ significantly on our benchmark graphs and queries.

Figure 5: PS and DB Illustrations.

Degree Based Algorithm (DB). The DB algorithm addresses all three issues by using the strategy of building the paths from high degree vertices. Arrange the data vertices in increasing order of their degree; if two vertices have the same degree, the tie is broken arbitrarily, say by placing the vertex having the least id first. We say that a vertex u is higher than a vertex v if u appears after v in the above ordering, and this is denoted $u \succ v$.

Consider the input cycle $C = (a_0, a_1, \ldots, a_{L-1})$ with boundary nodes $a_p$ and $a_q$, and let π be a colorful match for C that maps the above nodes to data vertices $u_0, u_1, \ldots, u_{L-1}$, respectively. Among these data vertices, let $u_j$ be the highest vertex. We refer to the corresponding node $a_j$ as the highest node of π. The idea is to partition the set of colorful matches into L groups based on their highest node $a_h$ and compute the projection table for each group separately. For a pair of data vertices u and v, and a signature α, let $\mathrm{cnt}(u, v, \alpha \mid C, hi = h)$ denote the number of colorful matches π for C wherein $\pi(a_p) = u$, $\pi(a_q) = v$, $\mathrm{sig}(\pi) = \alpha$ and $a_h$ is the highest node of π. The projection table for C can be obtained by aggregating the above counts: for any triple (u, v, α),

$$\mathrm{cnt}(u, v, \alpha \mid C) = \sum_{h=0}^{L-1} \mathrm{cnt}(u, v, \alpha \mid C, hi = h). \qquad (1)$$

We next describe an efficient procedure for computing the counts $\mathrm{cnt}(u, v, \alpha \mid C, hi = h)$. The concept of high-starting matches plays a crucial role in the procedure. Let $a_d$ be the node diagonally opposite to $a_h$ on the cycle, i.e., $d = h \oplus \lfloor L/2 \rfloor$. The procedure splits the cycle into two paths $P^+_{h,d}$ and $P^-_{h,d}$; Figure 5 (b) shows the paths for two sample values of h.
Let $a_j$ be a node found on the path $P^+_{h,d}$. A colorful match π for $P^+_{h,j}$ is said to be high-starting if the data vertex $\pi(a_h)$ is higher than all the other data vertices used by π, i.e., $\pi(a_h) \succ \pi(a_i)$ for all other nodes $a_i$ on the path $P^+_{h,j}$. For a pair of vertices u and v, and a signature α, let $\mathrm{cnt}^{*}(u, v, \alpha \mid P^+_{h,j})$ denote the number of high-starting colorful matches for the path $P^+_{h,j}$ wherein $\pi(a_h) = u$, $\pi(a_j) = v$ and $\mathrm{sig}(\pi) = \alpha$.

We then count the high-starting colorful matches for the two paths, which can be accomplished via edge extensions, as in the PS algorithm. However, the current setting offers a crucial advantage: we can dictate that the starting node $a_h$ is the highest node, meaning that whenever an entry (u, v, α) gets extended by an edge (v, w), we can impose the condition that u is higher than w in the degree based ordering. Imposing the condition leads to a significant pruning of the tables. The pseudo-code is given in Figure 6 (Procedure 1).

Procedure 1: Compute $\mathrm{cnt}^{*}(u, v, \alpha \mid P^+_{h,d})$
  For each edge (u, v) in the data graph G with $u \succ v$
    $\mathrm{cnt}^{*}(u, v, \alpha \mid P^+_{h,h \oplus 1}) \leftarrow 1$, where $\alpha = \{\chi(u), \chi(v)\}$.
  For $j = h \oplus 2, h \oplus 3, \ldots, d$
    For each triple (u, v, α) with $\mathrm{cnt}^{*}(u, v, \alpha \mid P^+_{h,j \ominus 1}) \neq 0$
      For each edge (v, w) in G s.t. $u \succ w$ and $\chi(w) \notin \alpha$:
        Let $\alpha' = \alpha \cup \{\chi(w)\}$.
        Incr. $\mathrm{cnt}^{*}(u, w, \alpha' \mid P^+_{h,j})$ by $\mathrm{cnt}^{*}(u, v, \alpha \mid P^+_{h,j \ominus 1})$.
Procedure 2: Compute $\mathrm{cnt}(x, y, \alpha \mid C, hi = h)$ for Config. (A)
  For each entry $(u, v, x, \alpha_1)$ with $\mathrm{cnt}^{*}(u, v, x, \alpha_1 \mid P^+_{h,d}) \neq 0$
    For each entry $(u, v, y, \alpha_2)$ with $\mathrm{cnt}^{*}(u, v, y, \alpha_2 \mid P^-_{h,d}) \neq 0$
      If $\alpha_1 \cap \alpha_2 = \{\chi(u), \chi(v)\}$
        $\alpha \leftarrow \alpha_1 \cup \alpha_2$
        $val_1 \leftarrow \mathrm{cnt}^{*}(u, v, x, \alpha_1 \mid P^+_{h,d})$; $val_2 \leftarrow \mathrm{cnt}^{*}(u, v, y, \alpha_2 \mid P^-_{h,d})$
        Incr. $\mathrm{cnt}(x, y, \alpha \mid C, hi = h)$ by $val_1 \cdot val_2$.
Figure 6: DB Algorithm

While the degree based strategy is more efficient, we need to address an intricacy regarding the projection aspects. In contrast to the PS algorithm, the DB algorithm splits at the highest node and, consequently, the boundary nodes $a_p$ and $a_q$ may appear inside the paths. Thus, in order to get the projection counts on $a_p$ and $a_q$, we also need to explicitly record the mappings for the boundary nodes. The two nodes $a_p$ and $a_q$ may occur on either $P^+_{h,d}$ or $P^-_{h,d}$. Six different configurations are possible, of which two are shown in Figure 5 (b).
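The edge-extension step and its degree-based pruning can be sketched in Python as follows. This is a single-machine illustration under stated assumptions: the names `degree_rank` and `extend_paths`, the adjacency-list dictionary, and the `frozenset` signatures are hypothetical. Passing `rank=None` gives the plain extension used by PS; passing a degree ranking keeps only high-starting paths, in the spirit of Procedure 1 of Figure 6. The sketch also skips edges whose endpoints share a color, which the pseudo-code leaves implicit.

```python
from collections import defaultdict

def degree_rank(adj):
    """Position of each vertex in increasing-degree order (ties broken by id)."""
    order = sorted(adj, key=lambda v: (len(adj[v]), v))
    return {v: i for i, v in enumerate(order)}

def extend_paths(adj, color, num_edges, rank=None):
    """Count colorful paths with `num_edges` edges, keyed by (start, end, signature).

    With `rank` given, only high-starting paths are kept: the start vertex
    must rank above every other vertex on the path.
    """
    table = defaultdict(int)
    for u in adj:
        for v in adj[u]:
            if color[u] == color[v]:
                continue                  # a single edge must already be colorful
            if rank is None or rank[u] > rank[v]:
                table[(u, v, frozenset((color[u], color[v])))] = 1
    for _ in range(num_edges - 1):
        nxt = defaultdict(int)
        for (u, v, alpha), c in table.items():
            for w in adj[v]:
                if color[w] in alpha:
                    continue              # extension would reuse a color
                if rank is not None and rank[u] <= rank[w]:
                    continue              # start vertex must stay the highest
                nxt[(u, w, alpha | {color[w]})] += c
        table = nxt
    return dict(table)

# Toy star graph: hub 0 with leaves 1, 2, 3; leaves 1 and 3 share a color.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
color = {0: 0, 1: 1, 2: 2, 3: 1}
rank = degree_rank(adj)
ps_paths = extend_paths(adj, color, 2)
db_paths = extend_paths(adj, color, 2, rank)
```

On this star, every two-edge path passes through the hub but starts at a leaf, so the unpruned table holds four entries while the high-starting table is empty. This is a small instance of the pruning effect on high degree vertices.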
In Configuration (A), the paths include one boundary node each, whereas in the second configuration, Configuration (B), the same path includes both boundary nodes. The other four configurations are symmetric: the boundary nodes may swap the paths in which they occur, and in Configuration (B) they can also reverse the order in which they occur. We discuss the two configurations shown in the figure; the other configurations are handled in a similar fashion.

Consider Configuration (A). In order to record the mapping of the boundary node $a_p$, we introduce an additional field in the projection counts. For a triple of data vertices u, v and x, and a signature α, let $\mathrm{cnt}^{*}(u, v, x, \alpha \mid P^+_{h,d})$ denote the number of high-starting matches π for $P^+_{h,d}$ with $\pi(a_h) = u$, $\pi(a_d) = v$, $\pi(a_p) = x$ and $\mathrm{sig}(\pi) = \alpha$. These counts are computed in a manner similar to the base procedure shown in Figure 6 (Procedure 1); however, when the process encounters the boundary node $a_p$ (namely, in the initialization step or at j = p), the mapped vertex (v or w, respectively) is recorded in the additional field. The analogous counts for $P^-$ can be derived in a similar manner. The value of $\mathrm{cnt}(u, v, \alpha \mid C, hi = h)$ is obtained by joining the two; see Procedure 2 in Figure 6.

Configuration (B) is handled in a similar fashion, except that we need two additional fields to record the mappings for both the boundary nodes. Namely, we maintain counts having keys of the form (u, v, x, y), representing the mapping of the nodes $a_h$, $a_d$, $a_q$ and $a_p$ to the vertices u, v, x and y. Procedure 2 is also adjusted accordingly. Finally, we can get the projection table $\mathrm{cnt}(u, v, \alpha \mid C)$ via aggregation, as in Equation (1).

Compute Projection Table for $P^+_{h,d}$
  Let B be the block annotating the edge $(a_h, a_{h \oplus 1})$
  $\mathrm{cnt}^{*}(\cdot, \cdot, \cdot \mid P^+_{h,h \oplus 1}) = \mathrm{cnt}^{*}(\cdot, \cdot, \cdot \mid B)$
  For $j = h \oplus 1, h \oplus 2, \ldots, d \ominus 1$
    Execute NodeJoin($a_j$)
    Execute EdgeJoin($a_j$)
  Execute NodeJoin($a_d$)
NodeJoin($a_j$):
  If $a_j$ is annotated by a block B
    For each $(u, v, \alpha_1)$ with $\mathrm{cnt}^{*}(u, v, \alpha_1 \mid P^+_{h,j}) \neq 0$
      For each $(v, \alpha_2)$ with $\mathrm{cnt}(v, \alpha_2 \mid B) \neq 0$
        If $\alpha_1 \cap \alpha_2 = \{\chi(v)\}$
          $\alpha \leftarrow \alpha_1 \cup \alpha_2$
          $val_1 \leftarrow \mathrm{cnt}^{*}(u, v, \alpha_1 \mid P^+_{h,j})$; $val_2 \leftarrow \mathrm{cnt}(v, \alpha_2 \mid B)$
          Incr. $\mathrm{cnt}^{*}(u, v, \alpha \mid P^+_{h,j})$ by $val_1 \cdot val_2$
EdgeJoin($a_j$):
  Let B be the block annotating the edge $(a_j, a_{j \oplus 1})$
  For each entry $\mathrm{cnt}^{*}(u, v, \alpha_1 \mid P^+_{h,j}) \neq 0$
    For each entry $\mathrm{cnt}(v, w, \alpha_2 \mid B) \neq 0$ with $u \succ w$
      If $\alpha_1 \cap \alpha_2 = \{\chi(v)\}$
        $\alpha \leftarrow \alpha_1 \cup \alpha_2$
        $val_1 \leftarrow \mathrm{cnt}^{*}(u, v, \alpha_1 \mid P^+_{h,j})$; $val_2 \leftarrow \mathrm{cnt}(v, w, \alpha_2 \mid B)$
        Incr. $\mathrm{cnt}^{*}(u, w, \alpha \mid P^+_{h,j \oplus 1})$ by $val_1 \cdot val_2$
Figure 7: DB Procedure for General Cycle Blocks

5.2 Solving General Blocks

In this section, we present procedures for handling generic blocks. We first consider the case of cycle blocks with two boundary nodes. Consider a generic cycle $C = (a_0, a_1, \ldots, a_{L-1})$ having two boundary nodes $a_p$ and $a_q$, whose nodes and edges may be annotated with other blocks (children of C in the decomposition tree).
All these blocks have at most two boundary nodes, and these are found on C. For any such block B, the subquery represented by B has the same boundary nodes as B. Thus, we can get the projection table for C by joining the projection tables of the subqueries represented by the above blocks, as described below.

As before, we consider each possible choice for the highest node $a_h$ and split the cycle into two paths $P^+_{h,d}$ and $P^-_{h,d}$. The path segment $P^+_{h,d}$ also represents a subquery (induced by the union of the nodes found in the path and the blocks annotating the path). Thus, we can extend the notion of projection tables to these segments as well. The procedure for computing the projection table for $P^+_{h,d}$ is similar to the one discussed in the previous section (Procedure 1 in Figure 6), and works by extending one edge in each step. However, two aspects need to be addressed.

Firstly, in contrast to the prior procedure, the edge being extended may be un-annotated (and correspond to an original edge found in the input query Q) or annotated with a block B. In the former case, we perform a join operation with the edges of the data graph (as before), whereas in the latter case the join operation involves the projection table of the block B. For the sake of uniformity, it will be convenient to view the former edges as blocks as well, denoted $B_G$, and associate with them a projection table derived from the graph edges, as follows. For each edge $(u, v) \in G$, set $\mathrm{cnt}(u, v, \alpha)$ to 1, for $\alpha = \{\chi(u), \chi(v)\}$; all other entries of the table are set to a count of zero. The second aspect is that the nodes of the cycle may also be annotated, and these get included as part of the sequence of joins being performed.

The two aspects are addressed by procedures called NodeJoin and EdgeJoin. The pseudo-code is shown in Figure 7. The procedure starts with an initial table representing the first edge $(a_h, a_{h \oplus 1})$ and performs a sequence of join operations with the blocks annotating the nodes and edges of the cycle. At this juncture, two intricacies must be highlighted. Firstly, the endpoints $a_h$ and/or $a_d$ may be annotated by a block B, which must be joined with either $P^+_{h,d}$ or $P^-_{h,d}$, but not with both (to avoid double counting).
For this purpose, we adopt the convention that $P^+_{h,d}$ and $P^-_{h,d}$ include only the block annotating $a_d$ and $a_h$ (if found), respectively. Secondly, for a block with two boundary nodes p and q, the projection table views one of them as the first boundary node and the other as the second (corresponding to the two components of the keys of the form (u, v, α)). Thus, the boundary nodes are ordered and the projection tables need not be symmetric: taking q as the first boundary node and p as the second would produce a different projection table. However, the two tables are transposes of each other ($\mathrm{cnt}(u, v, \alpha) = \mathrm{cnt}(v, u, \alpha)$). Our algorithm maintains both tables and uses the appropriate one as dictated by the nodes of the cycle. The pseudo-code reflects the first aspect but, for the sake of clarity, ignores the second.

The projection counts obtained by the above process are joined using a procedure similar to Procedure 2 in Figure 6, taking into account the configuration in which the boundary nodes occur. These are aggregated over all possible choices of the highest node $a_h$.

Cycles with a single boundary node are handled in a similar manner, by considering each possible choice for the highest node $a_h$ and splitting the cycle into two paths $P^+_{h,d}$ and $P^-_{h,d}$. The setting is simpler, with only two possible configurations for how the boundary node may appear on the paths: the (single) boundary node may appear in $P^+$ or $P^-$. Thus, the prior procedures can be applied here as well.

Leaf edge blocks are also handled via join operations. Any leaf block (a, b) is processed by joining the projection tables of the blocks annotating the node a, the edge (a, b) and the node b (if found).

At the end of the traversal process, the root block is solved, which is either a cycle or a singleton node. In the former case, the block is treated as a cycle without boundary nodes. Instead of computing its projection table, we simply count the number of colorful matches, via a procedure similar to that of two-boundary cycles. In the latter case, we consider the projection table of the block annotating the singleton node and output the sum of counts across all entries of the table. The process yields the number of colorful matches of the input query Q.

6 Finding Good Decomposition Trees

In each step of the decomposition process, multiple blocks may be available for contraction. Each sequence of choices leads to a unique decomposition tree, and hence multiple trees are possible for a given query. For example, the query brain1 (Figure 8) admits two decomposition trees: (i) contract the 4-cycle first and then the 6-cycle, and (ii) vice versa.

We conducted an experimental study involving a number of real-world data graphs and queries. For each query, we enumerated all the possible decomposition trees and evaluated the execution time on each graph. We observed a maximum difference of 13x in the execution times of two decomposition trees for the same graph-query combination. However, we noted that in most cases the optimal tree is independent of the data graph and is mainly determined by the structure of the query. These observations show that we need a procedure for selecting a good tree, but in this process we need not analyze the large data graph; rather, it suffices to focus on the structural properties of the small query graph. Our study also showed that the following factors, in decreasing order of importance, determine the execution time: (i) length of the longest cycle block; (ii) number of boundary nodes; (iii) number of node/edge annotations.

Armed with the above observations, we designed a simple heuristic procedure: enumerate all possible trees for the given query and pick the best using the above factors for comparison.
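The comparison step of the heuristic can be sketched in Python as a lexicographic ordering on the three factors. This is only an illustration under stated assumptions: the `TreeStats` record, its field names, and the example values are hypothetical, and the sketch assumes that a smaller value is preferable for every factor, which the text states only implicitly through the order of importance.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TreeStats:
    """Structural summary of one candidate decomposition tree (hypothetical)."""
    name: str
    longest_cycle: int    # (i) length of the longest cycle block
    boundary_nodes: int   # (ii) number of boundary nodes
    annotations: int      # (iii) number of node/edge annotations

def plan_key(tree):
    """Lexicographic key: factors in decreasing order of importance.

    Assumption of this sketch: smaller is better for each factor.
    """
    return (tree.longest_cycle, tree.boundary_nodes, tree.annotations)

def pick_tree(candidates):
    """Return the candidate tree with the smallest key."""
    return min(candidates, key=plan_key)

# Made-up statistics for three candidate trees of one query.
candidates = [
    TreeStats("tree_a", longest_cycle=6, boundary_nodes=3, annotations=2),
    TreeStats("tree_b", longest_cycle=5, boundary_nodes=4, annotations=3),
    TreeStats("tree_c", longest_cycle=5, boundary_nodes=4, annotations=1),
]
best = pick_tree(candidates)
```

Because Python compares tuples element by element, a later factor only matters when all earlier factors tie, which mirrors the stated order of importance.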
In our experimental setting, the heuristic picked the optimal tree in the majority of cases and a near-optimal tree in the rest. Since the queries are of small size (about 10 nodes), even a sequential implementation of the heuristic takes an insignificant amount of running time.

7 Distributed Implementation

In this section, we present a brief sketch of the distributed implementation of the two algorithms, highlighting their main aspects. The distributed implementation consists of three layers. The first layer, called the planner, finds a good decomposition tree for the given query using a fast sequential implementation of the heuristic discussed in Section 6. The second layer, called the plan solver, takes the data graph and the decomposition tree and implements the PS and DB algorithms presented in Section 5. It accomplishes this task by using efficient join routines supported by the third layer, called the engine.

Table 1: Real Data Graphs

Graph       Domain     Nodes  Edges  Avg Deg  Max Deg
brightkite  Geo loc.   58K    214K
condmat     Collab.    23K    93K
astroph     Collab.    18K    198K
enron       Commun.    36K    180K
hepph       Citation   34K    421K
slashdot    Soc. net.  82K    900K
epinions    Soc. net.  131K   841K
orkut       Soc. net.  524K   1.3M
roadnetca   Road net.  2M     2.7M
brain       Biology    400K   1.1M

The engine has three functionalities. The first is to store the data graph in a distributed manner. This is achieved via a 1D decomposition, wherein the vertices are equally distributed among the processors using block distribution, and each vertex is owned by some processor.

The second is to maintain projection tables. These tables are of two types: unary projection tables, having single-vertex keys of the form (u, α), associated with blocks having a single boundary node; and binary projection tables, having two-vertex keys of the form (u, v, α). The binary tables also have variants involving additional fields for storing the mappings for the boundary vertices. The engine provides a convenient abstraction to the plan solver for all these types of tables. All the tables are maintained as distributed hash tables which use open addressing to resolve collisions. Every entry (u, v, α) is stored on the processor owning v; the degree of v is packed as part of the entry for enforcing the degree constraint in the join operations (of the form $u \succ w$ in Procedure 1 of Figure 6). Signatures are maintained as bitmaps.

The third functionality is to support two types of join operations on the projection tables. The first type of join is used for extending a path segment by an edge; this involves a join with either the graph edges or the projection table of the block annotating the edge. In the former case, the extension of an entry with a key (u, v, α) by an edge (v, w) is performed at the owner of v. The result is an entry with a key (u, w, α'); this entry is communicated to the owner of w, where it gets stored. The latter case involves a join of two entries with keys $(u, v, \alpha_1)$ and $(v, w, \alpha_2)$. Since the first entry is stored at the owner of v and the second at the owner of w, a communication is performed to bring the two entries to a common processor. The second type of join is used for merging the projection tables of two path segments (for example, Procedure 2 in Figure 6), and it is implemented in a similar way.
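The bookkeeping behind these joins can be sketched in Python with signatures stored as integer bitmaps. This is an illustration under stated assumptions, not the engine's code: the function names are hypothetical, vertices are assumed to be numbered from 0, and the ceiling-based block size is one possible way to realize a 1D block distribution.

```python
def owner(vertex, num_vertices, num_ranks):
    """Rank owning `vertex` under a 1D block distribution (assumed layout)."""
    block = -(-num_vertices // num_ranks)   # ceiling division
    return vertex // block

def signature(*colors):
    """Bitmap with one bit set per color."""
    mask = 0
    for c in colors:
        mask |= 1 << c
    return mask

def extend_entry(entry, w, color):
    """Extend key (u, v, alpha) by the edge (v, w); None if not colorful."""
    u, v, alpha = entry
    bit = 1 << color[w]
    if alpha & bit:
        return None                         # color of w is already used
    return (u, w, alpha | bit)

def merge_signatures(alpha1, alpha2, shared):
    """Union of two signatures if they overlap exactly on `shared`, else None."""
    if alpha1 & alpha2 != shared:
        return None
    return alpha1 | alpha2

# An entry created at the owner of v = 1 is shipped to the owner of w = 9.
color = {0: 0, 1: 1, 9: 2}
new_entry = extend_entry((0, 1, signature(0, 1)), 9, color)
destination = owner(new_entry[1], num_vertices=10, num_ranks=4)
```

With bitmaps, the colorfulness test is a single AND, and the compatibility test for merging two path segments is one AND followed by a comparison.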
The two operations are implemented using a standard sort-merge join procedure, with signature compatibility checks performed via fast bitwise operations.

8 Experimental Study

We present an extensive experimental evaluation of the algorithms presented in the paper. Our experiments include a comparison of the algorithms on execution time, strong and weak scaling studies for our algorithm, and studies to evaluate the quality of our query plan generation heuristic and the efficacy of color coding for treewidth two queries.

Figure 8: Real world queries used in our study.

Figure 9: Average execution time (seconds).

8.1 Experimental Setup

System. The experiments were conducted on an IBM Blue Gene/Q system [12]. Each BG/Q node has 16 cores and 16 GB memory; multiple nodes are connected using a 5D torus interconnect. Our implementation is based on MPI2 with gcc, with the number of ranks varying from 32 to 512. Each MPI rank was mapped to a single core. The number of MPI ranks mapped to a node was adjusted based on the memory requirements of individual experiments.

Graphs. The experiments involved nine real world graphs obtained from the SNAP dataset collection (http://snap.stanford.edu) and the human brain network from the Open Connectome Project. Our benchmark includes representative graphs from different domains in SNAP. The graphs and their characteristics are presented in Table 1. We also used synthetic R-MAT graphs [11], for the purpose of studying the weak scaling behavior of our algorithms.

Queries. Our query benchmark consists of the ten real world queries shown in Figure 8. The queries were derived from prior network analysis work spanning diverse domains: dros, ecoli1, ecoli2, brain1, brain2, brain3 - biological networks [22, 19]; glet1, glet2 - graphlets [7]; wiki - collaboration networks [32]; youtube - spam networks [24].

Algorithms. We study two algorithms: PS, which serves as the baseline, and our degree based DB algorithm. Recall that PS is equivalent to the dynamic programming based algorithm of Alon et al. [2].

8.2 Graph-Query Characteristics

The characteristics of the input graph and query strongly influence the running time of query counting algorithms. To obtain an overall characterization of the phenomenon, we measured the execution time of the DB algorithm on each of the 100 real graph and query combinations using 512 MPI ranks. Figure 9 shows the average running time for each graph across the ten queries and the average running time of each query across the ten graphs. The wide variation in execution time across graphs and queries is indicative of their relative difficulty in practice. For example, although roadnetca is a larger graph than epinions, the average running time of the former is smaller than that of the latter by an order of magnitude. We can understand this behavior by studying the skew in the underlying degree distribution. In general, counting colorful occurrences of a query on a graph with high skew (indicated by a high maximum degree in Table 1) tends to be computationally expensive. Similarly, the queries also exhibit large variations in running time, ranging from sub-second for youtube, glet1 and glet2 to more than a minute for brain2 and brain3. These variations can be accounted for by studying the differences in the size and the sub-structures of the queries. We observed that queries with longer cycles are more challenging. As an extreme case, a 12-vertex complete binary tree query requires 2 seconds on average, whereas the 10-vertex brain3 query requires nearly 2 minutes on average.

8.3 Performance Comparison of PS and DB Algorithms

We study the performance of the PS and DB algorithms on 100 graph-query combinations obtained by selecting a graph from Table 1 and a query from Figure 8. For our DB algorithm, we used plans supplied by the heuristic described in Section 6. In contrast, for the PS algorithm, we enumerated all the possible plans and obtained the optimal plan. Thus, we compare our algorithm to the best possible scenario for the baseline algorithm.
We compute the improvement factor (IF) of DB over PS as the ratio of the execution time of PS to that of DB. Figure 10 shows IF at 32 and 512 ranks. The combinations where DB outperforms PS (IF > 1) are highlighted in green. The blank entries represent cases where PS (or DB) did not complete execution due to lack of available memory. At 32 ranks, DB outperforms PS on 84% of the graph-query combinations, with IF being as high as 9.1x (average 2.4x). At 512 ranks, DB outperforms the baseline on 89% of the cases, with IF becoming as high as 28.7x (average 5.0x).

The relative performance of the two algorithms depends on the graph-query pair. For instance, the average IF on the enron and condmat graphs is 8.4 and 3.1 on 512 ranks, respectively, correlating well with the skew in their degree distributions (see Table 1). Similarly, the improvement factor is higher on complex queries such as brain1, where the average improvement is 13.1x, compared to youtube, where the average improvement is only 4.1x. The phenomenon becomes extreme in the case of road networks, which have very low skew and exhibit sub-second average running time across queries.

Our DB algorithm scales better than PS, as demonstrated by the increase in IF at higher ranks. For different graph-query combinations, we computed the ratio of IF at 512 ranks to that at 32 ranks and found that IF increases by a factor of up to 4.7x (average 1.7x).

Figure 10: Improvement factor of the DB algorithm over the PS algorithm. Panels: (a) 32 Ranks, (b) 512 Ranks.

Figure 11: Normalized execution time, average load and maximum load on the enron graph, for DB and PS on the real-world queries brain1, brain2, dros, ecoli1, ecoli2, glet1, glet2, wiki and youtube: (a) Time, (b) Max. Load, (c) Avg. Load.

To understand this trend further, we compute the load (number of projection function operations) for both algorithms when processing different queries on the enron graph at 512 ranks. For the different queries, Figure 11 shows the execution time and the average and maximum load. DB has a lower average load than PS, since DB avoids wasteful computations. Furthermore, the improvement obtained by DB over PS in execution time correlates well with the improvement obtained in the maximum load. For example, on the ecoli1 query, even though PS outperforms DB at 32 ranks, the performance is reversed at 512 ranks (see Figure 10), because of the superior load balancing characteristics of DB.

8.4 Scalability Characteristics of DB Algorithm

We studied the scaling of DB across the 100 graph-query combinations. For each combination, we computed the speedup as the ratio of the execution time at 32 ranks to that at 512 ranks. Figure 12 summarizes this information by providing the average speedup for each query across graphs and for each graph across queries. Against an ideal speedup of 16x, the algorithm obtains speedups in the range of 7.4x to 15.8x.

Figure 12: Avg. speedup of DB at 512 ranks compared to 32 ranks.

Figure 13: Strong and weak scaling (speedup versus ranks, and time in seconds versus ranks).

We studied the strong scaling behavior of our algorithm, using enron as a representative graph. Taking 32 ranks as the baseline, Figure 13 shows the speedup up to 512 ranks for different queries. The algorithm scales well across queries, with an average speedup of 8.2x and a maximum speedup of 9.9x at 512 ranks (against an ideal speedup of 16x). To study weak scaling, we use R-MAT synthetic graphs with parameters A = 0.5, B = 0.1, C = 0.1 and D = 0.3 and edge factor 16, suggested in a Graph 500 benchmark specification ( jriedy/tmp/graph500/). The number of vertices was fixed at 1K per rank and the number of ranks was varied from 32 to 512. We report the execution times for each query-rank combination in Figure 13. We see excellent weak scaling behavior, with the execution times at 512 ranks remaining close to those at the baseline of 32 ranks.

8.5 Evaluation of Plan Generation Heuristic

We studied the quality of our plan generation heuristic for the DB algorithm at 512 ranks. For each graph-query combination, we determined the optimal plan via exhaustive enumeration. We compared the execution time of the heuristic plan to that of the optimal plan and measured the percentage difference. These results are reported in Figure 14. In 90% of the cases, the heuristic generated the optimal plan, whereas in the remaining cases, the difference was at most 15%.
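The comparison against the exhaustively enumerated optimum can be sketched as follows. This is only an illustration: the plan names and timings are hypothetical, and measuring the difference relative to the optimal plan's time is an assumption, since the text does not spell out the exact normalization.

```python
def heuristic_error_percent(plan_times, heuristic_plan):
    """Percentage by which the heuristic plan is slower than the best enumerated plan.

    plan_times maps every candidate plan to its measured execution time;
    the optimal plan is taken to be the one with the smallest time.
    """
    optimal_time = min(plan_times.values())
    heuristic_time = plan_times[heuristic_plan]
    return 100.0 * (heuristic_time - optimal_time) / optimal_time

# Hypothetical execution times (seconds) for all plans of one graph-query pair.
plan_times = {"plan_a": 10.0, "plan_b": 11.5, "plan_c": 14.0}

error_if_a = heuristic_error_percent(plan_times, "plan_a")  # heuristic picked the optimum
error_if_b = heuristic_error_percent(plan_times, "plan_b")  # heuristic picked a slower plan

print(error_if_a, error_if_b)
```

An error of 0 corresponds to the heuristic selecting the optimal plan for that graph-query combination.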

Figure 14: Error % of the execution time of the plan proposed by the plan heuristic with reference to the optimal plan for each graph-query combination.


Figure 15: Coefficient of variation with 50 trials of color coding for each graph-query combination.

8.6 Precision of Color Coding

We evaluated the precision of color coding on our benchmark by performing independent trials and computing the empirical variance of the sample (see Section 2). Specifically, for a given graph-query combination we performed a sequence of trials, where in each trial the colorful count n_colorful(G, Q, χ) was computed for a fresh random coloring χ. We performed 10 random trials for each of the 100 graph-query combinations in our test set and evaluated the empirical mean and variance of the number of colorful matches. For each graph-query combination, we computed the coefficient of variation, which is the ratio of the empirical variance to the mean. The results are shown in Figure 15. A value close to 0 indicates the convergence of our estimate to the true mean n(G, Q). We observed that with only three trials, 82% of the graph-query combinations had a coefficient of variation of at most 0.1; when the number of trials was increased to 10, this fraction increased to 91%. Hence, using 512 ranks, for a majority of the input graph-query combinations in our benchmark, we require less than a minute to count the actual number of matches of the query with 10% accuracy. We conclude that our DB algorithm enables fast approximate counting of treewidth 2 queries for data graphs spanning various real domains.
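The trial statistics described above can be sketched as below. Two assumptions are worth flagging: the sketch uses the conventional coefficient of variation (sample standard deviation divided by the mean), which may differ from the normalization used for Figure 15, and it assumes the number of colors equals the number k of query vertices, in which case a fixed match is colorful with probability k!/k^k and the mean colorful count is scaled by k^k/k!. The counts are hypothetical.

```python
import math
from statistics import mean, stdev

def coefficient_of_variation(samples):
    """Sample standard deviation divided by the sample mean (conventional definition)."""
    return stdev(samples) / mean(samples)

def estimate_matches(colorful_counts, k):
    """Estimate the total match count from colorful counts over independent colorings.

    Assumes k colors for a k-vertex query, so each match is colorful with
    probability k! / k^k and the mean colorful count is scaled by k^k / k!.
    """
    return mean(colorful_counts) * (k ** k) / math.factorial(k)

# Hypothetical colorful counts from five independent random colorings.
colorful_counts = [96, 104, 100, 98, 102]

cv = coefficient_of_variation(colorful_counts)
estimate = estimate_matches(colorful_counts, k=3)

print(f"coefficient of variation: {cv:.4f}, estimated matches: {estimate:.1f}")
```

A small coefficient of variation across trials suggests that averaging a handful of colorings already gives a stable estimate, which is the behavior the section reports for most graph-query combinations.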
