arxiv: v2 [cs.dc] 2 Apr 2016


Subgraph Counting: Color Coding Beyond Trees

Venkatesan T. Chakaravarthy 1, Michael Kapralov 2, Prakash Murali 1, Fabrizio Petrini 3, Xinyu Que 3, Yogish Sabharwal 1, and Baruch Schieber 3

1,3 IBM Research, 1 {vechakra, prakmra, ysabharwal}@in.ibm.com, 3 {fpetrin, xqe, sbar}@us.ibm.com
2 EPFL, michael.kapralov@epfl.ch

August 9, 2018

Abstract

The problem of counting occurrences of query graphs in a large data graph, known as subgraph counting, is fundamental to several domains such as genomics and social network analysis. Many important special cases (e.g. triangle counting) have received significant attention. Color coding is a very general and powerful algorithmic technique for subgraph counting. Color coding has been shown to be effective in several applications, but scalable implementations are only known for the special case of tree queries (i.e. queries of treewidth one). In this paper we present the first efficient distributed implementation for color coding that goes beyond tree queries: our algorithm applies to any query graph of treewidth 2. Since tree queries can be solved in time linear in the size of the data graph, our contribution is the first step into the realm of color coding for queries that require superlinear running time in the worst case. This superlinear complexity leads to significant load balancing problems on graphs with heavy tailed degree distributions. Our algorithm structures the computation to work around high degree nodes in the data graph, and achieves very good runtime and scalability on a diverse collection of data and query graph pairs as a result. We also provide theoretical analysis of our algorithmic techniques, showing asymptotic improvements in runtime on random graphs with power law degree distributions, a popular model for real world graphs.

1 Introduction

Graphs serve as common abstractions for real world data, making graph mining primitives a critical tool for analyzing real-world networks.
Counting the number of occurrences of a query graph in a large data graph (subgraph counting, often referred to as motif counting) is an important problem with applications in a variety of domains such as bioinformatics, social sciences and spam detection (e.g. [8, 10, 23]). Subgraph counting and its variants have received a lot of attention in the literature. Substantial progress has been achieved for the case of small queries such as triangles or 4-vertex subgraphs: not only have very efficient algorithms been developed (e.g. [15, 20, 27, 31]), but a theoretical explanation of their performance on popular graph models has also been obtained (see [?] and references therein). Some of the recent work has addressed larger queries [29, 30, ?, 26, 7], but our understanding here is far from complete. Even for reasonably large graphs (a million edges) and small queries (e.g. 5-cycles), the number of solutions tends to be enormous, running into billions. This explosion in the search space makes the subgraph counting problem very hard even for moderately large queries. Theoretically, the fastest known algorithm for counting occurrences of a k-vertex subgraph in an n-vertex data graph runs in time $n^{\omega k/3}$, where $O(n^{\omega})$ is the time complexity of matrix multiplication (currently $\omega \approx 2.38$). This improves upon the trivial algorithm with runtime $n^k$, but is prohibitively expensive even for moderate size queries.

To address the above issue, Alon et al. [2] proposed the color coding technique. Here, given a k-node query, we assign random colors between 1 and k to the vertices of the data graph, and count the number of occurrences of the query that are colorful, meaning the vertices matched to the query have distinct colors. See Figure 1. The count is scaled up appropriately to get an estimate of the actual number of occurrences. The accuracy is then improved by repeating the process over multiple random colorings and taking the average. Restricting the search to colorful matches leads to pruning of the search space and improved efficiency. Using this method, Alon et al. obtained faster algorithms for certain queries such as paths, cycles, trees and bounded treewidth graphs. The power of color coding as a very general counting technique together with the importance of subgraph counting in various applications (as mentioned above) makes it important to design practically efficient and scalable implementations.

Figure 1: Illustration of a match (left) and a colorful match (right).
In a different work, Alon et al. [1] applied the color coding technique for counting the occurrences of treelets (tree queries) in biological networks. Color coding allowed them to handle tree queries up to size 10 in protein interaction networks, extending beyond the reach of previously known approaches [25, 18, 17]. Recently, Slota and Madduri [28, 30] presented FASCIA, an efficient and scalable distributed implementation of subgraph counting (via color coding), again for the case of treelet queries. However, despite considerable interest in non-tree queries from several application domains (see the experimental section for details), the technique has not been explored for more general settings. In this work we present the first efficient distributed implementation of color coding beyond tree queries.

As part of their original color coding solution, Alon et al. [2] presented faster algorithms for certain special classes of queries. They showed that if the query is a tree, then colorful subgraph counting can be solved in time $O(2^k m)$, i.e. in time linear in the size of the data graph. They extended the algorithm to show that if the query is close to a tree, specifically has (small) treewidth t, a running time of $O(2^k n^{t+1})$ can be achieved. Treewidth [9] is a widely adopted measure of the intrinsic complexity of a graph. Intuitively, it measures how close the topology of a given graph is to being a tree: tree queries have treewidth 1, and a cycle is the simplest example of a treewidth 2 query. The above algorithm, restricted to trees, forms the basis for the previously-mentioned treelet counting implementations [28, 30, 1]. While the runtime of the above algorithm is linear for the case of trees (i.e. acyclic queries), it becomes at least quadratic for query graphs of treewidth 2 and beyond. This phenomenon also manifests itself in practice: on real world graphs with even moderately skewed degree distribution, load imbalance is observed and the running time tends to have quadratic dependence on the maximum degree of the graph. Thus, even triangles (the smallest cyclic query) are harder to handle, and have received considerable attention from the research community (as mentioned earlier).

The goal of this paper is to study the colorful subgraph counting problem on queries of treewidth 2, taking the first step in the realm of color coding with cyclic queries. The class of queries of treewidth 2 is quite rich. In particular, it contains all trees, cycles, series-parallel graphs and beyond. Figure 8 shows treewidth 2 queries (used in our experimental evaluation) drawn from real-world studies on biological, social and collaboration networks [22, 32, 4]. To the best of our knowledge, the previously-mentioned algorithm [1] is the best known algorithm for treewidth 2 queries, and we use it as our baseline.
We rephrase this algorithm within our framework and devise a distributed implementation. The rephrased algorithm becomes a recursive procedure that decomposes the query into simpler path subqueries, which are then solved to get the overall count. We thus refer to our baseline as the Path Splitting algorithm (PS).

Our Contributions

1. Building on the PS algorithm, we develop novel strategies that lead to significant performance gains in terms of runtime, scalability, and the size of graphs and queries handled.
2. Our algorithm works by decomposing the query into cycles and leaves, thereby reducing the problem of colorful subgraph counting on treewidth 2 queries to counting (annotated) cycles.
3. The decomposition in terms of cycles enables us to exploit the so-called degree ordering approach (e.g., the MINBUCKET algorithm for triangle enumeration [?]). Specifically, we show how to force the computation process to (mostly) work around high degree vertices, leading to substantial speedups and scalability gains.
4. We present a detailed experimental evaluation of the algorithms on real-world graphs having more than million edges and real-world queries of size up to 10 nodes. The results show that our strategies offer improvements of up to 28x in terms of running time and exhibit improved scalability.
5. Finally, we complement our experimental evaluation with a theoretical analysis of the runtime of our degree ordering approach for cycle queries, on a popular class of random power law graphs (Chung-Lu graphs [14]). Our analysis provides justification for the empirically observed performance gains of the approach.

Related Work

Subgraph counting has received significant attention in the fields of computational biology [25, 18, 17] and social network analysis [21, 13, 27, ?, 20]. We give an overview of prior work on the problem (both theoretical and empirical) as well as techniques for making subgraph counting scalable, and explain how our contributions relate to this prior work.

Color Coding and Approximate Subgraph Counting: Color coding was introduced in an influential paper by Alon et al. [2] as a fast algorithm for finding occurrences of a query in a data graph and counting the number of such occurrences. In a different work, Alon et al. [1] explored its applications to approximate subgraph counting (most commonly known as motif counting) in computational biology. They were motivated by the fact that subgraph counting is an important primitive for characterizing biological networks [?]. Color coding allowed Alon et al. to count occurrences of treelets (tree queries) up to size 10 in protein interaction networks, extending beyond the reach of previously known approaches [25, 18, 17]. A scalable distributed implementation of color coding for trees has been reported by Slota and Madduri [29, 30], but no principled solutions beyond tree queries are known. ParSE [33] extends beyond tree queries by considering query graphs that can be partitioned into subtemplates via edge cuts of size 1. However, the only class of query graphs that can be perfectly partitioned using this method is trees; ParSE resorts to brute force enumeration for other cases. Our work provides the first principled approach to implementing color coding in a scalable way beyond tree queries.
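The degree ordering approach named in contribution 3 is easiest to see on triangles. The following is a minimal single-machine sketch of the MINBUCKET idea, not the authors' distributed implementation: every vertex only pairs up neighbors that rank higher than itself by degree, so each triangle is counted once, at its lowest-ranked vertex, and high degree vertices do little work. The function name and the adjacency format (a dict mapping integer vertex ids to neighbor sets) are assumptions made for illustration.

```python
from itertools import combinations


def count_triangles_minbucket(adj):
    """Count triangles, charging each one to its lowest-ranked vertex.

    adj: dict mapping a vertex id to the set of its neighbors (undirected).
    """
    # Rank by (degree, id); the id breaks degree ties consistently.
    rank = {v: (len(nbrs), v) for v, nbrs in adj.items()}
    triangles = 0
    for v, nbrs in adj.items():
        # Only neighbors ranked above v are enumerated, so a hub with many
        # low-degree neighbors contributes almost no pairs.
        higher = [w for w in nbrs if rank[w] > rank[v]]
        for a, b in combinations(higher, 2):
            if b in adj[a]:
                triangles += 1
    return triangles


# A hub (vertex 0) with five neighbors and a single extra edge 1-2.
hub_graph = {0: {1, 2, 3, 4, 5}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}, 5: {0}}
print(count_triangles_minbucket(hub_graph))
```

In the hub example the naive method would let vertex 0 inspect all ten pairs of its neighbors, whereas here vertex 0 inspects none and the only pair examined is the one at vertex 1.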
Further, our analysis of the runtime of our cycle counting subroutine on random graphs with a power law degree distribution provides a theoretical justification of our algorithmic techniques. While our work and the above-mentioned prior work [1, 29, 30] count non-induced subgraphs, some other prior work [25, 18, 17] addressed the case of counting induced subgraphs. The search space of non-induced subgraphs is larger and, furthermore, these counts are more robust with respect to perturbations of the data graph [1].

Degree Based Approaches: Designing scalable subgraph counting algorithms turns out to be hard even for the simple case of triangle counting. A naive approach lets each vertex enumerate pairs of neighbors and check if they are connected. This leads to wasteful computations and also runs into load balancing issues on graphs with heavy tailed degree distributions [31]. The above issue has been addressed using a simple but efficient solution (referred to as the MINBUCKET algorithm [15, 31]): each vertex enumerates pairs of neighbors with degree no smaller than its own (with arbitrary tie breaking) and checks whether they are connected. It is not hard to see that this gives a correct count, and it has been empirically observed that this algorithm does not run into load balancing issues even on heavy tailed graphs [31]. The MINBUCKET heuristic has also been shown to give a polynomial runtime improvement over the naive method when the input is a random graph with a power law degree distribution [?]. A recent work by Jha et al. [20] applies the degree based technique for counting 4-vertex queries. There are a few prior approaches for arbitrary queries [7, 3, 26], but these algorithms do not use degree information, and are comparable to the baseline algorithm used in our study. To the best of our knowledge, prior to our work there has not been a systematic study of how MINBUCKET generalizes to larger subgraph counting problems. In this work we generalize the method for counting occurrences of treewidth 2 graphs, perform a thorough experimental evaluation and provide a theoretical runtime analysis of our technique in the random power law graph model.

Our paper improves upon prior work along three axes: generality of queries handled, scalability of the proposed solution and theoretical analysis of the main algorithmic primitive on a class of graphs often used to model real world networks.

2 Preliminaries

Subgraph counting problem. The subgraph counting problem is defined as follows. The input consists of a query graph $Q = (V_Q, E_Q)$ over a set of k nodes and a data graph $G = (V_G, E_G)$ over a set of n vertices and m edges. The task is to count the number of (not necessarily induced) subgraphs of G that are isomorphic to Q. Formally, count the number of injective mappings $\pi : V_Q \to V_G$ such that for any pair of query nodes $a, b \in V_Q$, if $(a, b) \in E_Q$, then $(\pi(a), \pi(b)) \in E_G$. We refer to such mappings $\pi$ as matches.

Color coding and colorful matches. A coloring is a function $\chi : V_G \to \{1, 2, \ldots, k\}$, where for every vertex $u \in V_G$, $\chi(u)$ denotes its color. A match $\pi$ from $V_Q$ to $V_G$ is colorful if the vertices of Q are mapped to k distinctly colored vertices in G, i.e. $\bigcup_{a \in V_Q} \{\chi(\pi(a))\} = \{1, 2, 3, \ldots, k\}$. The main idea is that instead of counting all possible matches of the k vertices of the query graph to the vertices of the data graph, one first colors the vertices of the data graph uniformly at random using k colors, and then searches for colorful matches.

Colorful subgraph counting problem.
In the colorful subgraph counting problem the task is to count the number of colorful matches of the query Q in G. Our setting counts the number of colorful matches or mappings from Q to the data vertices. Alternatively, we may want to count the number of colorful subgraphs that are isomorphic to Q. The latter quantity can be obtained by dividing the former by aut(Q), the number of automorphisms of Q. While it is computationally hard to compute aut(Q) for an arbitrary query graph, the quantity can be computed quickly for queries of relatively small size (say about 10 nodes). Given the above discussion, we focus on counting the number of colorful matches.

Treewidth. Intuitively, if the query graph $Q = (V_Q, E_Q)$ has treewidth t then Q can be decomposed into subgraphs $Q_1, Q_2, \ldots$ such that each subgraph $Q_i$ is also of treewidth t, and each $Q_i$ has no more than t nodes that belong also to other subgraphs. We call such nodes the boundary nodes of $Q_i$. In addition, the total number of distinct boundary nodes in all subgraphs $Q_1, Q_2, \ldots$ is at most $t + 1$. Note that the decomposition can be done recursively as each $Q_i$ has treewidth t, until we are left only with subgraphs that have at most $t + 1$ nodes. This results in a treewidth decomposition tree denoted $T_Q$. A formal definition is given below.

A tree decomposition of a query Q is a tree $T = (V_T, E_T)$, wherein each node $p \in V_T$ is associated with a subset of query nodes $S(p) \subseteq V_Q$, called pieces, such that the following properties are true: (i) for every query edge $(a, b) \in E_Q$, there exists a piece $S(p)$ (for some $p \in V_T$) that contains both a and b; (ii) for every query node $a \in V_Q$, the set of nodes whose pieces contain a induce a connected subtree. Alternatively, the second property states that if a belongs to pieces $S(p_1)$ and $S(p_2)$ for some $p_1$ and $p_2$, then a must also belong to the piece $S(p)$ for any node p found on the (unique) path connecting $p_1$ and $p_2$ in T. The width of the tree decomposition is the maximum cardinality over all pieces minus one, i.e., $\max_p |S(p)| - 1$. The treewidth t of the query is the minimum width over all its tree decompositions.

Approximate subgraph counting via color coding. Counting the number of colorful matches turns out to be easier than counting the actual (not necessarily colorful) matches. The price to pay is that the algorithm is randomized. We color the graph randomly and obtain the number of colorful matches, and repeat the process independently at random a few times. Then, an estimate for the number of matches (occurrences of the query) can be obtained by taking the average. For a given input graph G and query Q let $n(G, Q)$ denote the number of matches $\pi$ from Q to G. For a (random) coloring $\chi$ of vertices of G let $n_{\mathrm{colorful}}(G, Q, \chi)$ denote the number of colorful matches of Q to G under coloring $\chi$. It was shown [2, 1] that with proper normalization the colorful count $n_{\mathrm{colorful}}(G, Q, \chi)$ is an unbiased estimator of the actual count. Specifically, the right normalization factor is $k^k/k!$, i.e. we have $(k^k/k!) \cdot \mathbb{E}_{\chi}[n_{\mathrm{colorful}}(G, Q, \chi)] = n(G, Q)$. The variance of the estimator can also be bounded (see [1], section 2.1). Thus, the average of the normalized counts $(k^k/k!) \cdot n_{\mathrm{colorful}}(G, Q, \chi)$ over a few independently chosen colorings $\chi$ converges to the right answer, i.e. $n(G, Q)$. Thus, in order to obtain an approximate subgraph counting algorithm it suffices to solve the colorful subgraph counting problem.
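The normalization $k^k/k!$ can be checked by brute force on a toy instance. The sketch below is only an illustration under stated assumptions: it enumerates matches naively (which is exactly what the paper's algorithm avoids), uses colors 0..k-1 instead of 1..k, numbers the query nodes 0..k-1, and replaces random sampling by an exact average over all $k^n$ colorings so that the result is deterministic. The function names are hypothetical.

```python
from itertools import permutations, product
from math import factorial


def count_matches(query_edges, k, adj, coloring=None):
    """Brute-force count of injective, edge-preserving maps from the query
    (nodes 0..k-1) into the data graph. With a coloring, only colorful
    matches (k distinct colors on the image) are counted."""
    total = 0
    for image in permutations(adj, k):  # image[a] plays the role of pi(a)
        if any(image[b] not in adj[image[a]] for a, b in query_edges):
            continue
        if coloring is not None and len({coloring[u] for u in image}) != k:
            continue
        total += 1
    return total


def expected_scaled_colorful_count(query_edges, k, adj):
    """(k^k / k!) times the exact mean of the colorful count over all colorings."""
    vertices = sorted(adj)
    scale = k**k / factorial(k)
    total = 0
    for colors in product(range(k), repeat=len(vertices)):
        coloring = dict(zip(vertices, colors))
        total += count_matches(query_edges, k, adj, coloring)
    return scale * total / k ** len(vertices)


# Toy data graph: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
data = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
triangle = [(0, 1), (1, 2), (2, 0)]
print(count_matches(triangle, 3, data))
print(expected_scaled_colorful_count(triangle, 3, data))
```

In practice one would average the scaled colorful count over a handful of random colorings rather than all of them; the exhaustive average is used here only to show that the scaled expectation agrees with the number of matches.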
The rest of the paper is devoted to designing a scalable solution to colorful subgraph counting.

3 Overview

The work of Alon et al. [2] yields a natural algorithm for the colorful subgraph counting problem on bounded treewidth query graphs. This algorithm is based on the following intuition. Suppose that we have found a colorful match $\pi'$ for a subgraph $Q'$ of the input query graph Q, and we wish to extend it into a colorful match $\pi$ for Q by additionally fixing the mapping of the nodes outside $Q'$. For this we do not need to know the mapping of the non-boundary nodes of $Q'$, since they do not share edges with nodes outside $Q'$. Instead, it suffices to know the mapping of the boundary nodes (i.e., the nodes that share edges with nodes outside $Q'$) and the set of colors used by $\pi'$. The mapping of the boundary nodes is needed to ensure that for any edge from a boundary node to the outside, the corresponding data vertices share an edge in the data graph; and the set of colors is needed to avoid repeating a color already used by $\pi'$. Analogously, in the setting of counting, in order to count the number of colorful matches for Q, we do not need a complete listing of colorful matches of $Q'$. Instead, we can group the colorful matches based on the set of colors used and the mappings for the boundary nodes, and it suffices to know the count per group.

Based on the above intuition, we apply dynamic programming to count the number of colorful matches of Q. Let $T_Q$ be the tree decomposition of Q with treewidth t. The algorithm processes $T_Q$ in a bottom-up manner and creates a hash table (that we call a projection table) for each tree node. The subgraph $Q'$ associated with a node has at most t boundary nodes and these nodes can be mapped to the data vertices in at most $n^t$ ways. In addition, we need to record the colors of the data vertices to which the nodes of $Q'$ are mapped. Since we focus on colorful matches, the number of possible sets of colors used (that we call signatures) is at most $\binom{k}{t} \le 2^k$ (where k is the size of the query graph). For each combination of mappings to the boundary nodes and the signature, we record the number of colorful matches of $Q'$ consistent with the combination. The number of entries in the table is at most $n^t 2^k$. The projection table for a tree node can be computed from those of its children. We get the total number of colorful matches by performing an aggregation on the projection table of the root node.

Working in the realm of motif counting, Slota and Madduri [30] described an efficient distributed implementation of the above algorithm for the case of tree queries and presented an experimental evaluation. Trees have treewidth one; hence, the size of projection tables is linear in the number of vertices and the overall computation can be carried out in time linear in the graph size. Our goal is to address a more general class of queries (beyond trees) in a distributed setting and we focus on the case of queries of treewidth 2. Treewidth 2 queries are more challenging since, in the worst case, the tables can be of size quadratic in the number of vertices and the computation time also becomes quadratic. The construction of our algorithm is motivated by the fact that real life data graphs tend to exhibit variations in the degree distribution.
A naive implementation that treats all data vertices in the same manner would result in a lot of entries in the projection tables of the high degree vertices that do not lead to colorful matches for the overall input query. Moreover, in a distributed setting the processors owning such vertices perform more computation, leading to load imbalance. Our algorithm is based on a crucial observation that any treewidth 2 query can be recursively decomposed into (annotated) cycles or leaves. The core component of the algorithm is an efficient procedure for handling cycles that employs a strategy based on degree based ordering of vertices. This leads to a reduction in wasteful computation, as well as improved load balancing. The procedure is inspired by a similar strategy used in prior work [?] for handling triangles. The overall algorithm uses the above decomposition and the improved procedure for handling cycles.

4 Overall Algorithm

In this section we describe the overall structure of our subgraph counting algorithm, which proceeds in two steps. In the first step, we decompose the query into cycles and leaves (called blocks) and construct a decomposition tree for the input query Q, which is essentially a carefully chosen treewidth decomposition tree; each node of the tree represents a block and encodes a convenient subquery. This step is independent of the data graph and can be viewed as a preprocessing phase for the query. Then in the second step we traverse the tree in a bottom-up manner, performing primitive counting operations over the data graph prescribed by the internal nodes and combining the results. The final count is produced by the root of the tree.

4.1 Decomposition Tree

For an input query graph $Q = (V_Q, E_Q)$, construct the decomposition tree $T(Q)$ by iteratively applying one of two primitive operations: contraction of a leaf edge or of a cycle. As these operations are applied the number of nodes in the query Q decreases. At the same time new edges may appear in Q to represent contracted structures, and edges as well as nodes may get annotated with the identity of the contracted structures that they represent.

Before defining the tree construction algorithm we need to introduce two definitions. First, we say that a cycle C in Q is contractible if (a) $C = (a_0, a_1, \ldots, a_{L-1})$ is induced (i.e. there are no edges between nodes $a_0, a_1, \ldots, a_{L-1}$ except the edges of C) and (b) cycle C has at most two boundary nodes (i.e., nodes that share edges with nodes outside of C). Second, a leaf edge is an edge $L = (a, b)$, where b is a leaf node (has degree one); a is called the boundary node of the leaf edge. We use the common term block to refer to leaf edges and contractible cycles. For example, consider the query named Satellite in Fig 2. The cycle (i, j, k) is contractible with a single boundary node i, the cycle (a, b, c, d, e) is contractible with two boundary nodes a and c, and (f, h) is a leaf edge. The cycle (i, f, g) is not contractible since it has three boundary nodes.

We construct the decomposition tree $T(Q)$ starting with an empty tree. The tree is built bottom-up starting from the leaf level and hence, the structure may be a forest with multiple roots in the intermediate stages. Each iteration adds a new node and may make some of the existing roots its children, culminating in a tree. In the construction process we iteratively perform the following operations until Q contains a single node: find a block B (a leaf edge or a contractible cycle) in Q and remove it from Q (while possibly adding an edge to Q), and add a corresponding node to $T(Q)$. We distinguish 3 cases.
Case 1: B is a contractible cycle C with exactly one boundary node $a \in V_Q$: Remove the nodes and edges of C from Q, except for node a. Erase any annotation found on a in Q and annotate it with the block name B.

Case 2: B is a contractible cycle C with two boundary nodes $a, b \in V_Q$: Remove the nodes and edges of C from Q, except for the nodes a and b. Add an edge (a, b) in Q and annotate it with B. Erase any annotation found on a and b in Q.

Case 3: B is a leaf edge $L = (a, b)$: Remove b and the edge from Q. Erase any annotation found on node a in Q and annotate it with the block name B.

The nodes and edges of B inherit the annotations from Q, as they were before Q was transformed (this ensures that the annotations on the boundary nodes that got erased get captured by the new annotation). Next we add a new node B to the tree $T(Q)$. If any node or edge in B has an annotation $B'$, make $B'$ a child of B in $T(Q)$. This completes the construction of T.

We show below that the process can find a block in each iteration and terminate successfully on every query of treewidth 2. Assuming termination, it is not difficult to see that the process produces a tree. During contraction, every block B annotates a particular node or an edge of Q, recording the way in which it has been contracted. The annotation gets inherited by some other block $B'$ in a subsequent iteration. The block $B'$ becomes the parent of B. The annotation is erased in Q, ensuring that no other block becomes a parent of B.

Taking Satellite as the input query Q, Figure 2 provides an illustration of the process, along with the output decomposition tree. The bottom row shows the blocks being contracted and the top row shows the transformed Q. The first iteration contracts the cycle $B_1 = (a, b, c, d, e)$. A new edge (a, c) is added to Q, along with the annotation $B_1$, and $B_1$ is added to the tree. The second iteration contracts the leaf block $B_2 = (f, h)$. Node f is annotated as $B_2$ and $B_2$ is added to the tree. The third iteration contracts $B_3 = (a, f, g, c)$, by adding an edge (f, g) with the annotation $B_3$. The block is added to the tree and it is made the parent of $B_1$ and $B_2$. In the fourth iteration, the cycle $B_4 = (i, j, k)$ is contracted. Node i gets annotated as $B_4$ and $B_4$ is added to the tree. Finally, the query $Q_4$ is contracted leaving Q empty. We add $Q_4$ as the root of the tree, making it the parent of $B_3$ and $B_4$.

Figure 2: Illustration of the decomposition process. The top row shows the sequence of queries considered in the process (the original query is on the left), the bottom row shows the blocks that were contracted in each step.

The following lemma guarantees that for any treewidth 2 query Q, the tree construction procedure will always find a block (a leaf edge or a contractible cycle) in each iteration and terminate successfully. The proof relies on prior work on nested ear decompositions of treewidth 2 queries [16].
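The two block tests used by the construction (contractible cycle and leaf edge) can be written down directly from their definitions. The sketch below is a hypothetical helper, not code from the paper. The edge list for the Satellite query is inferred from the prose description above (Figure 2 itself is not reproduced here), so it should be read as an assumption.

```python
def boundary_nodes(cycle, adj):
    """Nodes of the cycle that share an edge with a node outside the cycle."""
    inside = set(cycle)
    return [a for a in cycle if any(b not in inside for b in adj[a])]


def is_contractible_cycle(cycle, adj):
    """True if `cycle` is an induced cycle with at most two boundary nodes."""
    n = len(cycle)
    inside = set(cycle)
    cycle_edges = {frozenset((cycle[i], cycle[(i + 1) % n])) for i in range(n)}
    # Every consecutive pair must really be an edge of the query.
    if n not in range(3, len(adj) + 1) or any(
        cycle[(i + 1) % n] not in adj[cycle[i]] for i in range(n)
    ):
        return False
    # Induced: no chord between two cycle nodes.
    for a in cycle:
        for b in adj[a] & inside:
            if frozenset((a, b)) not in cycle_edges:
                return False
    return 2 >= len(boundary_nodes(cycle, adj))


def is_leaf_edge(a, b, adj):
    """True if (a, b) is an edge and b has degree one; a is the boundary node."""
    return b in adj[a] and len(adj[b]) == 1


# Assumed edge list for the Satellite query, reconstructed from the text.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a"),
         ("a", "f"), ("f", "g"), ("g", "c"), ("f", "h"),
         ("i", "j"), ("j", "k"), ("k", "i"), ("i", "f"), ("i", "g")]
satellite = {}
for x, y in edges:
    satellite.setdefault(x, set()).add(y)
    satellite.setdefault(y, set()).add(x)

print(is_contractible_cycle(["i", "j", "k"], satellite))
print(boundary_nodes(["a", "b", "c", "d", "e"], satellite))
print(is_contractible_cycle(["i", "f", "g"], satellite))
```

A full construction of $T(Q)$ would repeatedly search for any block that passes one of these tests, contract it as in Cases 1 to 3, and record annotations; that search is omitted here.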

Lemma 4.1 (i) Any treewidth 2 query Q contains a block; (ii) the transformed query resulting from the contraction process is also a treewidth 2 query.

Proof: We first prove part (ii) of the lemma. If the contracted block has one boundary node then no new edges are added to Q, in which case the tree $T_Q$ for the updated Q is given by deleting all the nodes not in the updated $V_Q$ from the subsets $S_Q(t)$. If the contracted block has two boundary nodes a and b then the edge (a, b) is added to Q. In this case we get the tree for the updated Q by replacing each occurrence of the nodes not in the updated $V_Q$ by b. Note that the size of each subset is still at most 3, nodes associated with subsets that contain b form a connected component, and for at least one subset $S_Q(t)$, $\{a, b\} \subseteq S_Q(t)$.

We now prove part (i). First, root the tree $T_Q$ at an arbitrary non-leaf node. This induces an ancestor-descendant relationship on the nodes in $V_T$. Note that if there are two nodes $\{t, t'\} \subseteq V_T$ such that $S_Q(t') \subseteq S_Q(t)$, node $t'$ can be omitted and all its children connected to t. Thus from now on we assume that no subset $S_Q(t)$ is contained in (or identical to) another subset. We need the following definition and claim.

Definition 4.1 For a node $t \in V_T$, let $Q_t$ be the subgraph of Q induced by the nodes that are in the union of the subsets associated with the nodes of $T_Q$ in the subtree rooted at t.

Claim 4.1 For every node $t \in V_T$, either $Q_t$ contains a block, or $Q_t$ is a path whose endpoints are in the subset associated with the parent of t (if such exists).

Before proving the claim we show how it implies the lemma. Since the claim also holds for the root of $T_Q$, either Q contains a block or it is a path, in which case it also contains a leaf block.

Proof of Claim 4.1: We prove the claim by induction. The base of the induction is a leaf node. Consider a leaf node $t \in V_T$. There are two possibilities: (i) $S_Q(t) = \{x, y\}$, and (ii) $S_Q(t) = \{x, y, z\}$.
If $S_Q(t) = \{x, y\}$, then at least one node, say y, is only connected to x and thus (x, y) is a leaf edge. If $S_Q(t) = \{x, y, z\}$, then consider the subgraph induced by $\{x, y, z\}$. If this subgraph is a triangle then it must be a contractible cycle. The only remaining case is that the subgraph induced by $\{x, y, z\}$ forms a path. Assume that the endpoints of this path are x and z. If one of these endpoints, say z, is not in the subset associated with the parent of t then (y, z) is a leaf edge. Otherwise, letting $t'$ be the parent of t, we have $S_Q(t) \cap S_Q(t') = \{x, z\}$.

For the inductive step consider a non-leaf node $t \in V_T$. If $Q_{t'}$ for any child $t'$ of t contains a block then we are done. Assume that this is not the case. Consider first the case that t has a single child $t'$. By the inductive hypothesis $Q_{t'}$ is a path whose endpoints x and y are in $S_Q(t)$. Let $S_Q(t) = \{x, y, z\}$. If z is connected to both x and y then the cycle closed by z is a contractible cycle. If z is connected to only one endpoint, say y, then we get a path with endpoints x and z. If either x or z is not in the subset associated with the parent of t, then the missing endpoint is a leaf node. If both x and z are in the subset associated with the parent of t then the inductive claim follows.

Next, consider the case that t has several children. If two of the children of t, say $t'$ and $t''$, share endpoints then the cycle formed by $Q_{t'}$ and $Q_{t''}$ is contractible. Otherwise, t must have exactly two children, say $t'$ and $t''$, with endpoints $\{x, y\}$ and $\{y, z\}$, forming a path with endpoints x and z. If z is also connected to x then the cycle closed by the edge (x, z) is a contractible cycle. If either x or z is not in the subset associated with the parent of t, then the missing endpoint is a leaf node. If both x and z are in the subset associated with the parent of t then the inductive claim follows.

An input query may admit multiple decomposition trees and the choice of the tree influences the performance of our algorithm. In Section 6, we present a heuristic for finding a good decomposition. Each node of the tree represents a block and it will be convenient to view the node simply as the block represented by it.

At this point, it is interesting to consider the tree queries studied by Slota and Madduri [30]. Given a tree query, their algorithm fixes a suitable query node as the root and iteratively processes the tree in a bottom-up manner. The algorithm implicitly uses a decomposition tree. However, since trees do not have cycles, the decomposition tree consists of only leaf edge blocks. In contrast, the decomposition trees of treewidth two queries involve the more challenging case of cycles as well.

4.2 Tree Traversal

Here, we describe the second step of the algorithm, which traverses the decomposition tree in a bottom-up manner and computes the number of colorful matches of the blocks in the data graph. For this purpose, we define the notion of subqueries represented by blocks. A subquery $Q'$ of the input query Q refers to any induced subgraph of Q. Consider a block B and let U be the union of nodes found in the block B and its descendant blocks in the tree. The subquery represented by B, denoted SQ(B), refers to the subquery induced by U. For example, Figure 2 shows the subquery represented by the block $B_4$. The decomposition tree yields a nested hierarchy of subqueries: the root block represents the whole input query and for any block B with the parent $B'$, the subquery SQ(B) is contained within SQ($B'$). Let B be a block. A node $a \in$ SQ(B) is said to be a boundary node if a shares an edge with a node outside SQ(B).
It is not hard to see that these boundary nodes are the same as the boundary nodes of B (identified during the tree construction process). Thus, SQ(B) can have at most two boundary nodes.

Before describing the counting algorithm, we extend the notion of colorful matches to subqueries: a colorful match for a subquery $Q' = (V_{Q'}, E_{Q'})$ is an injective mapping $\pi : V_{Q'} \to V_G$ such that for any edge $(a, b) \in E_{Q'}$, $(\pi(a), \pi(b)) \in E_G$, and the vertices of Q' are mapped to distinctly colored vertices of G. The algorithm traverses the tree in a bottom-up manner. For each block B, it outputs a succinct synopsis of the set of colorful matches of the subquery SQ(B), using a projection table and signatures (as outlined in Section 3), which we now define precisely.

Signature: Let K = {1, 2, ..., k} denote the set of colors used in the data graph, where k is the size of the input query Q. The term signature refers to any subset $\alpha \subseteq K$. For a subquery Q' and a colorful match $\pi$ of Q', the signature of $\pi$ refers to the set of colors of the data vertices used by $\pi$, and it is denoted $\mathrm{sig}(\pi)$, i.e., $\mathrm{sig}(\pi) = \bigcup_{a \in Q'} \{\chi(\pi(a))\}$.

Projection Tables: Let Q' be a subquery with two boundary nodes a and b. For a pair of data vertices u and v and a signature $\alpha \subseteq K$, let $\mathrm{cnt}(u, v, \alpha \mid Q')$ denote the number of colorful matches of Q' wherein the boundary nodes a and b are mapped to u and v and the signature of $\pi$ is $\alpha$:

\[ \mathrm{cnt}(u, v, \alpha \mid Q') = |\{\pi \in \Pi : \pi(a) = u \text{ and } \pi(b) = v \text{ and } \mathrm{sig}(\pi) = \alpha\}|, \]

where $\Pi$ is the set of all the colorful matches of Q'. These counts can be conveniently represented in the form of a hash table with $(u, v, \alpha)$ forming the key and the count forming the value. We refer to any encoding of the above counts (such as the hash table above) as the projection table of Q'. In the worst case, the table may have size quadratic in the input data graph. However, a significant fraction of the triplets will have a count of zero, and we maintain only the non-zero counts.

The projection table for subqueries having a single boundary node a is defined in a similar manner. For a data vertex u and a signature $\alpha \subseteq K$, define $\mathrm{cnt}(u, \alpha \mid Q') = |\{\pi \in \Pi : \pi(a) = u \text{ and } \mathrm{sig}(\pi) = \alpha\}|$.

    Overall Algorithm
    1. Compute a decomposition tree T(Q) for the input query Q.
    2. Traverse the tree bottom-up. For each non-root block B:
       use the projection tables of the children blocks of B to compute the projection table for B.
    3. Output the number of colorful matches of the subquery represented by the root block.

Figure 3: Overall Algorithm

4.3 Computing the Counts

Given a decomposition tree, the algorithm works based on the fact that the projection table for a block can be computed by joining the projection tables of its children blocks. As an illustration of the idea, consider the block $B_3$ having boundary nodes f and g, and the subquery represented by it (Figure 2). For a pair of vertices u and v, and a signature $\alpha$, the projection count $\mathrm{cnt}(u, v, \alpha \mid B_3)$ can be computed as follows. The block consists of the path (a, f, g, c), and any match $\pi$ for the subquery must map these nodes to vertices (x, u, v, y) that form a path in the data graph. The block is annotated by its children blocks $B_1$, with boundary nodes a and c, and $B_2$, with boundary node f. Any pair of matches $\pi_1$ and $\pi_2$ for $SQ(B_1)$ and $SQ(B_2)$ can be extended as matches for $SQ(B_3)$, as long as their signatures $\alpha_1$ and $\alpha_2$ are disjoint (since the blocks do not share any node) and are contained within $\alpha$.
Therefore, we can derive the desired count by performing the following aggregation over all quadruples $(x, y, \alpha_1, \alpha_2)$ satisfying the properties: (x, u, v, y) forms a path in the data graph; $\alpha_1, \alpha_2 \subseteq \alpha$; and $\alpha_1 \cap \alpha_2$ is empty. The aggregation is:

\[ \mathrm{cnt}(u, v, \alpha \mid B_3) = \sum_{x, y} \sum_{\alpha_1, \alpha_2} \mathrm{cnt}(x, y, \alpha_1 \mid B_1) \cdot \mathrm{cnt}(u, \alpha_2 \mid B_2). \]
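The aggregation for $B_3$ can be illustrated with a deliberately naive sketch that enumerates the cartesian product of the two child tables. The function name `aggregate_b3`, the dictionary-based table layout and the toy data in the usage note are assumptions made for illustration, not part of the implementation described in this paper. The sketch also makes one reading of the conditions explicit: since g is the only node of $SQ(B_3)$ outside the child subqueries, it takes $\alpha = \alpha_1 \cup \alpha_2 \cup \{\chi(v)\}$ with $\chi(v)$ a fresh color.

```python
from collections import defaultdict

def aggregate_b3(edges, color, cnt_b1, cnt_b2):
    """Naive aggregation for the block B3 = path (a, f, g, c).

    cnt_b1 maps (x, y, signature) -> count for SQ(B1), boundary nodes a, c.
    cnt_b2 maps (u, signature)    -> count for SQ(B2), boundary node f.
    Signatures are frozensets of colors. Returns (u, v, signature) -> count.
    """
    adj = defaultdict(set)
    for p, q in edges:          # undirected data graph
        adj[p].add(q)
        adj[q].add(p)

    out = defaultdict(int)
    for (x, y, a1), c1 in cnt_b1.items():
        for (u, a2), c2 in cnt_b2.items():
            # Child signatures must be disjoint and (x, u) must be a data edge.
            if (a1 & a2) or u not in adj[x]:
                continue
            # v must close the path x - u - v - y.
            for v in adj[u] & adj[y]:
                if color[v] in a1 or color[v] in a2:
                    continue    # assumption: g must receive a fresh color
                out[(u, v, a1 | a2 | {color[v]})] += c1 * c2
    return dict(out)
```

For example, on the path 0 - 1 - 2 - 3 with colors 1, 2, 3, 4, a child table `{(0, 3, frozenset({1, 4})): 2}` for $B_1$ and `{(1, frozenset({2})): 3}` for $B_2$ combine into a single entry for (u, v) = (1, 2) with count 6. The nested loops over both tables are exactly the cost that the join-based procedures are designed to avoid.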

We can express the projection counts for any block in the above manner. However, as the number of children increases, the cartesian product involved in the aggregation would be prohibitively expensive. Our procedures efficiently simulate the aggregation by performing a sequence of join operations involving the projection tables of the children blocks.

Given a decomposition tree, the algorithm traverses the tree in a bottom-up manner, computing the projection tables for all the blocks, and culminates in the root block representing the whole input query. At this step, instead of producing a projection table, the algorithm simply computes the number of colorful matches. The pseudo-code is shown in Figure 3.

5 Solving Blocks

The main step of the algorithm is the construction of the projection table of a block from its children blocks. In this section we develop efficient procedures for handling cycles. For the sake of highlighting the main ideas, we first focus on the case of cycles found at the leaf level of the decomposition tree (such as the cycle $B_1$ in Figure 2); these cycles do not have other blocks annotating them. General cycles are handled by extending these ideas, as discussed later.

5.1 Solving Cycles at the Leaf Level

Consider a cycle block $C = (a_0, \ldots, a_{L-1})$ of length L without annotations. The cycle may have at most two boundary nodes. We discuss the more interesting case where the number of boundary nodes is exactly two; the other cases are handled in a similar fashion. Let the two boundary nodes of the cycle be $a_p$ and $a_q$, for some $0 \le p, q \le L-1$. We present two procedures for computing the projection table of C: a baseline procedure that uses a path splitting strategy and an efficient procedure guided by a degree based ordering of vertices.

Path Splitting Algorithm (PS).
For two nodes $a_s$ and $a_t$ on the cycle, let $P^{+}_{s,t}$ and $P^{-}_{s,t}$ be the paths obtained by traversing the cycle from $a_s$ to $a_t$ in the clockwise and counter-clockwise directions, respectively, i.e., $P^{+}_{s,t} = (a_s, a_{s \oplus 1}, \ldots, a_t)$ and $P^{-}_{s,t} = (a_s, a_{s \ominus 1}, \ldots, a_t)$, where $\oplus$ and $\ominus$ refer to addition and subtraction modulo L. Let $\mathrm{cnt}(\cdot, \cdot, \cdot \mid P^{+}_{s,t})$ denote the projection counts for the path $P^{+}_{s,t}$, taking $a_s$ and $a_t$ as the boundary nodes. Namely, for a triple $(u, v, \alpha)$, let $\mathrm{cnt}(u, v, \alpha \mid P^{+}_{s,t})$ denote the number of colorful matches for $P^{+}_{s,t}$ wherein $\pi(a_s) = u$, $\pi(a_t) = v$ and $\mathrm{sig}(\pi) = \alpha$. A similar notion is defined for the paths $P^{-}_{s,t}$.

The procedure splits the cycle into two paths along the boundary nodes, given by $P^{+}_{p,q}$ and $P^{-}_{p,q}$; we refer to these special paths as $P^{+}$ and $P^{-}$. See Figure 5 (a) for an illustration. The projection table for $P^{+}$ is constructed iteratively, by building the tables for the paths $P^{+}_{p,j}$, for each node $a_j$ found along the path. This is accomplished by extending the projection table for the prior path $P^{+}_{p,j \ominus 1}$ via a join with the edges of the data graph. The pseudo-code is given in Figure 4 (Procedure 1). We assume that all the counts are initialized to zero. The first iteration is handled by directly reading the edges of the data graph. In the subsequent iterations, we extend every triple $(u, v, \alpha)$ with non-zero count $\mathrm{cnt}(u, v, \alpha \mid P^{+}_{p,j \ominus 1})$ with any edge (v, w), provided the resulting match is colorful. The counts for $P^{-}$ are constructed analogously. Finally, the projection table for the cycle C is obtained by joining the counts of $P^{+}$ and $P^{-}$, as shown in Procedure 2. Here, a pair of triples $(u, v, \alpha_1)$ and $(u, v, \alpha_2)$ are joined if the resulting match is colorful.

    Procedure 1: Computing Projection Table for $P^{+}$
    For each edge (u, v) in the data graph G
        $\mathrm{cnt}(u, v, \alpha \mid P^{+}_{p,p \oplus 1}) \leftarrow 1$, where $\alpha = \{\chi(u), \chi(v)\}$.
    For $j = p \oplus 2, p \oplus 3, \ldots, q$
        For each triple $(u, v, \alpha)$ with $\mathrm{cnt}(u, v, \alpha \mid P^{+}_{p,j \ominus 1}) \neq 0$
            For each edge (v, w) in G such that $\chi(w) \notin \alpha$ do:
                Let $\alpha' = \alpha \cup \{\chi(w)\}$.
                Increment $\mathrm{cnt}(u, w, \alpha' \mid P^{+}_{p,j})$ by $\mathrm{cnt}(u, v, \alpha \mid P^{+}_{p,j \ominus 1})$.

    Procedure 2: Computing Projection Table for C
    For each entry $(u, v, \alpha_1)$ with $\mathrm{cnt}(u, v, \alpha_1 \mid P^{+}) \neq 0$
        For each entry $(u, v, \alpha_2)$ with $\mathrm{cnt}(u, v, \alpha_2 \mid P^{-}) \neq 0$
            If $\alpha_1 \cap \alpha_2 = \{\chi(u), \chi(v)\}$
                $\alpha \leftarrow \alpha_1 \cup \alpha_2$
                $val_1 \leftarrow \mathrm{cnt}(u, v, \alpha_1 \mid P^{+})$; $val_2 \leftarrow \mathrm{cnt}(u, v, \alpha_2 \mid P^{-})$
                Increment $\mathrm{cnt}(u, v, \alpha \mid C)$ by $val_1 \cdot val_2$.

Figure 4: PS Algorithm

Discussion of baseline. As discussed below (Section 5.2), the PS procedure can be extended to handle general cycles with annotations, and yields an algorithm for handling treewidth 2 queries. The resultant PS algorithm is equivalent to the original color coding algorithm of Alon et al. [2]. Prior work [30, 1] on colorful subgraph counting utilizes the algorithm of Alon et al. as the basis for counting tree queries (treelets). We developed a distributed implementation of the PS algorithm, and use it as the baseline in our experimental study. Known techniques for subgraph counting with large queries (e.g. [7, 26]) employ similar graph traversal techniques, making PS consistent with the state of the art for subgraph counting as well as color coding.

We develop a procedure, called the Degree Based (DB) algorithm, that outperforms the PS algorithm for practical graphs and queries. It is motivated by the following observations. First, the paths $P^{+}$ and $P^{-}$ may have uneven lengths (for instance, in Figure 5, $|P^{+}| = 6$ and $|P^{-}| = 2$), and the processing of the longer path dominates the overall running time.
Second, in real graphs with skewed degree distributions, high degree vertices tend to have more paths passing through them, which populate the projection tables of $P^{+}$ and $P^{-}$. However, a significant fraction of these paths do not find appropriate counterparts in the other table to complete a match, leading to wasteful computations. Third, in a distributed setting, the above phenomenon manifests as higher load on the processors owning high degree vertices, leading to load imbalance.

It is not difficult to address the first issue alone. The only intricacy is that when the paths are split evenly, the boundary nodes may appear internally on the paths (see Figure 5, with a split across the nodes denoted h and d). This can be handled by recording the mapping for the boundary nodes as part of the projection counts. We implemented the above algorithm as well and noticed that the issue of wasteful computations and load imbalance still persists. Furthermore, the performance of the PS algorithm and of the modified implementation does not differ significantly on our benchmark graphs and queries.

Figure 5: PS and DB Illustrations.

Degree Based Algorithm (DB). The DB algorithm addresses all three issues by using the strategy of building the paths from high degree vertices. Arrange the data vertices in increasing order of their degree; if two vertices have the same degree, the tie is broken arbitrarily, say by placing the vertex having the least id first. We say that a vertex u is higher than a vertex v if u appears after v in the above ordering, and this is denoted $u \succ v$.

Consider the input cycle $C = (a_0, a_1, \ldots, a_{L-1})$ with boundary nodes $a_p$ and $a_q$, and let $\pi$ be a colorful match for C that maps the above nodes to data vertices $u_0, u_1, \ldots, u_{L-1}$, respectively. Among these data vertices, let $u_j$ be the highest vertex. We refer to the corresponding node $a_j$ as the highest node of $\pi$. The idea is to partition the set of colorful matches into L groups based on their highest node $a_h$ and compute the projection table for each group separately. For a pair of data vertices u and v, and a signature $\alpha$, let $\mathrm{cnt}(u, v, \alpha \mid C, hi = h)$ denote the number of colorful matches $\pi$ for C wherein $\pi(a_p) = u$, $\pi(a_q) = v$, $\mathrm{sig}(\pi) = \alpha$ and $a_h$ is the highest node of $\pi$. The projection table for C can be obtained by aggregating the above counts: for any triple $(u, v, \alpha)$,

\[ \mathrm{cnt}(u, v, \alpha \mid C) = \sum_{h=0}^{L-1} \mathrm{cnt}(u, v, \alpha \mid C, hi = h). \qquad (1) \]

We next describe an efficient procedure for computing the counts $\mathrm{cnt}(u, v, \alpha \mid C, hi = h)$. The concept of high-starting matches plays a crucial role in the procedure. Let $a_d$ be the node diagonally opposite to $a_h$ on the cycle, i.e., $d = h \oplus \lfloor L/2 \rfloor$. The procedure splits the cycle into two paths $P^{+}_{h,d}$ and $P^{-}_{h,d}$; Figure 5 (b) shows the paths for two sample values of h.
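The diagonal split can be made concrete with a few lines of index arithmetic. The helper name `split_cycle` is hypothetical, and the rounding of L/2 (floor) is an assumption about how the diagonally opposite node is chosen for odd L; the sketch only lists query-node indices and does not touch the data graph.

```python
def split_cycle(L, h):
    """Split the cycle (a_0, ..., a_{L-1}) at the highest node a_h.

    Returns (d, plus, minus), where d is the index of the diagonally
    opposite node and plus / minus list the node indices of the two
    paths from a_h to a_d (clockwise and counter-clockwise).
    """
    d = (h + L // 2) % L                                   # assumed: floor(L/2)
    plus = [(h + i) % L for i in range((d - h) % L + 1)]   # a_h, a_{h+1}, ..., a_d
    minus = [(h - i) % L for i in range((h - d) % L + 1)]  # a_h, a_{h-1}, ..., a_d
    return d, plus, minus
```

For a 6-cycle with h = 1, this yields d = 4 and the paths (1, 2, 3, 4) and (1, 0, 5, 4); both have three edges, in contrast to the possibly uneven split of PS along the boundary nodes.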
Let $a_j$ be a node found on the path $P^{+}_{h,d}$. A colorful match $\pi$ for $P^{+}_{h,j}$ is said to be high-starting if the data vertex $\pi(a_h)$ is higher than all the other data vertices used by $\pi$, i.e., $\pi(a_h) \succ \pi(a_i)$ for all other nodes $a_i$ on the path $P^{+}_{h,j}$. For a pair of vertices u and v, and a signature $\alpha$, let $\mathrm{cnt}^{*}(u, v, \alpha \mid P^{+}_{h,j})$ denote the number of high-starting colorful matches for the path $P^{+}_{h,j}$ wherein $\pi(a_h) = u$, $\pi(a_j) = v$ and $\mathrm{sig}(\pi) = \alpha$.

We then count the high-starting colorful matches for the two paths, which can be accomplished via edge extensions, as in the PS algorithm. However, the current setting offers a crucial advantage: we can dictate that the starting node $a_h$ is the highest node, meaning that whenever an entry $(u, v, \alpha)$ gets extended by an edge (v, w), we can impose the condition that u is higher than w in the degree based ordering. Imposing the condition leads to a significant pruning of the tables. The pseudo-code is given in Figure 6 (Procedure 1).

    Procedure 1: Compute $\mathrm{cnt}^{*}(u, v, \alpha \mid P^{+}_{h,d})$
    For each edge (u, v) in the data graph G with $u \succ v$
        $\mathrm{cnt}^{*}(u, v, \alpha \mid P^{+}_{h,h \oplus 1}) \leftarrow 1$, where $\alpha = \{\chi(u), \chi(v)\}$.
    For $j = h \oplus 2, h \oplus 3, \ldots, d$
        For each triple $(u, v, \alpha)$ with $\mathrm{cnt}^{*}(u, v, \alpha \mid P^{+}_{h,j \ominus 1}) \neq 0$
            For each edge (v, w) in G s.t. $u \succ w$ and $\chi(w) \notin \alpha$:
                Let $\alpha' = \alpha \cup \{\chi(w)\}$.
                Incr. $\mathrm{cnt}^{*}(u, w, \alpha' \mid P^{+}_{h,j})$ by $\mathrm{cnt}^{*}(u, v, \alpha \mid P^{+}_{h,j \ominus 1})$.

    Procedure 2: Compute $\mathrm{cnt}(x, y, \alpha \mid C, hi = h)$ for Config. (A)
    For each entry $(u, v, x, \alpha_1)$ with $\mathrm{cnt}^{*}(u, v, x, \alpha_1 \mid P^{+}_{h,d}) \neq 0$
        For each entry $(u, v, y, \alpha_2)$ with $\mathrm{cnt}^{*}(u, v, y, \alpha_2 \mid P^{-}_{h,d}) \neq 0$
            If $\alpha_1 \cap \alpha_2 = \{\chi(u), \chi(v)\}$
                $\alpha \leftarrow \alpha_1 \cup \alpha_2$
                $val_1 \leftarrow \mathrm{cnt}^{*}(u, v, x, \alpha_1 \mid P^{+}_{h,d})$; $val_2 \leftarrow \mathrm{cnt}^{*}(u, v, y, \alpha_2 \mid P^{-}_{h,d})$
                Incr. $\mathrm{cnt}(x, y, \alpha \mid C, hi = h)$ by $val_1 \cdot val_2$.

Figure 6: DB Algorithm

While the degree based strategy is more efficient, we need to address an intricacy regarding the projection aspects. In contrast to the PS algorithm, the DB algorithm splits at the highest node and, consequently, the boundary nodes $a_p$ and $a_q$ may appear inside the paths. Thus, in order to get the projection counts on $a_p$ and $a_q$, we also need to explicitly record the mappings for the boundary nodes. The two nodes $a_p$ and $a_q$ may occur on either $P^{+}_{h,d}$ or $P^{-}_{h,d}$. Six different configurations are possible, of which two are shown in Figure 5 (b).
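The edge-extension step shared by PS (Procedure 1 of Figure 4) and DB (Procedure 1 of Figure 6) can be sketched sequentially as follows. The names `degree_rank` and `extend_paths`, the adjacency-dictionary input and the use of frozensets for signatures are assumptions for illustration; the sketch ignores the extra fields for boundary nodes and the distributed layout. Passing `rank=None` gives the unpruned PS behaviour, while passing a degree ranking keeps only high-starting matches.

```python
from collections import defaultdict

def degree_rank(adj):
    """Position of each vertex in increasing (degree, id) order."""
    order = sorted(adj, key=lambda x: (len(adj[x]), x))
    return {v: i for i, v in enumerate(order)}

def extend_paths(adj, color, num_edges, rank=None):
    """Projection table of a path with `num_edges` edges.

    Keys are (start vertex, end vertex, signature). If `rank` is given,
    only matches whose start vertex is higher than every other vertex
    on the path are kept (the DB pruning condition).
    """
    table = defaultdict(int)
    for u in adj:                       # first edge: read the data graph
        for v in adj[u]:
            if color[u] != color[v] and (rank is None or rank[v] < rank[u]):
                table[(u, v, frozenset((color[u], color[v])))] = 1
    for _ in range(num_edges - 1):      # extend by one edge per iteration
        nxt = defaultdict(int)
        for (u, v, sig), c in table.items():
            for w in adj[v]:
                if color[w] in sig:
                    continue            # match would not be colorful
                if rank is not None and rank[w] > rank[u]:
                    continue            # w is higher than the start vertex
                nxt[(u, w, sig | {color[w]})] += c
        table = nxt
    return dict(table)
```

On a triangle 0-1-2 with a pendant vertex 3 attached to 0 and all colors distinct, the unpruned table for two-edge paths has 10 entries, whereas the pruned table has only the 2 entries that start at the highest vertex 0. This is a small instance of the table pruning described above.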
In Configuration (A), the paths include one boundary node each, whereas in Configuration (B), the same path includes both the boundary nodes. The other four configurations are symmetric: the boundary nodes may swap the paths in which they occur, and in Configuration (B) they can also reverse the order in which they occur. We discuss the two configurations shown in the figure; the other configurations are handled in a similar fashion.

Consider Configuration (A). In order to record the mapping of the boundary node $a_p$, we introduce an additional field in the projection counts. For a triple of data vertices u, v and x, and a signature $\alpha$, let $\mathrm{cnt}^{*}(u, v, x, \alpha \mid P^{+}_{h,d})$ denote the number of high-starting matches $\pi$ for $P^{+}_{h,d}$ with $\pi(a_h) = u$, $\pi(a_d) = v$, $\pi(a_p) = x$ and $\mathrm{sig}(\pi) = \alpha$. These counts are computed in a manner similar to the base procedure shown in Figure 6 (Procedure 1); however, when the process encounters the boundary node $a_p$ (namely, in the initialization step or when j = p), the mapped vertex (v or w, respectively) is recorded in the additional field. The analogous counts for $P^{-}$ can be derived in a similar manner. The value of $\mathrm{cnt}(u, v, \alpha \mid C, hi = h)$ is obtained by joining the two; see Procedure 2 in Figure 6.

Configuration (B) is handled in a similar fashion, except that we need two additional fields to record the mappings for both the boundary nodes. Namely, we maintain counts having keys of the form (u, v, x, y), representing the mapping of the nodes $a_h$, $a_d$, $a_q$ and $a_p$ to the vertices u, v, x and y. Procedure 2 is also adjusted accordingly. Finally, we can get the projection table $\mathrm{cnt}(u, v, \alpha \mid C)$ via aggregation, as in Equation (1).

    Compute Projection Table for $P^{+}_{h,d}$
    Let B be the block annotating the edge $(a_h, a_{h \oplus 1})$
    $\mathrm{cnt}^{*}(\cdot, \cdot, \cdot \mid P^{+}_{h,h \oplus 1}) = \mathrm{cnt}^{*}(\cdot, \cdot, \cdot \mid B)$
    For $j = h \oplus 1, h \oplus 2, \ldots, d \ominus 1$
        Execute NodeJoin($a_j$)
        Execute EdgeJoin($a_j$)
    Execute NodeJoin($a_d$)

    NodeJoin($a_j$):
    If $a_j$ is annotated by a block B
        For each $(u, v, \alpha_1)$ with $\mathrm{cnt}^{*}(u, v, \alpha_1 \mid P^{+}_{h,j}) \neq 0$
            For each $(v, \alpha_2)$ with $\mathrm{cnt}(v, \alpha_2 \mid B) \neq 0$
                If $\alpha_1 \cap \alpha_2 = \{\chi(v)\}$
                    $\alpha \leftarrow \alpha_1 \cup \alpha_2$
                    $val_1 \leftarrow \mathrm{cnt}^{*}(u, v, \alpha_1 \mid P^{+}_{h,j})$; $val_2 \leftarrow \mathrm{cnt}(v, \alpha_2 \mid B)$
                    Incr. $\mathrm{cnt}^{*}(u, v, \alpha \mid P^{+}_{h,j})$ by $val_1 \cdot val_2$

    EdgeJoin($a_j$):
    Let B be the block annotating the edge $(a_j, a_{j \oplus 1})$
    For each entry $\mathrm{cnt}^{*}(u, v, \alpha_1 \mid P^{+}_{h,j}) \neq 0$
        For each entry $\mathrm{cnt}(v, w, \alpha_2 \mid B) \neq 0$ with $u \succ w$
            If $\alpha_1 \cap \alpha_2 = \{\chi(v)\}$
                $\alpha \leftarrow \alpha_1 \cup \alpha_2$
                $val_1 \leftarrow \mathrm{cnt}^{*}(u, v, \alpha_1 \mid P^{+}_{h,j})$; $val_2 \leftarrow \mathrm{cnt}(v, w, \alpha_2 \mid B)$
                Incr. $\mathrm{cnt}^{*}(u, w, \alpha \mid P^{+}_{h,j \oplus 1})$ by $val_1 \cdot val_2$

Figure 7: DB Procedure for General Cycle Blocks

5.2 Solving General Blocks

In this section, we present procedures for handling generic blocks. We first consider the case of cycle blocks with two boundary nodes. Consider a generic cycle $C = (a_0, a_1, \ldots, a_{L-1})$ having two boundary nodes $a_p$ and $a_q$, whose nodes and edges may be annotated with other blocks (children of C in the decomposition tree).
All these blocks have at most two boundary nodes, and these are found on C. For any such block B, the subquery represented by B has the same boundary nodes as B. Thus, we can get the projection table for C by joining the projection tables of the subqueries represented by the above blocks, as described below.

As before, we consider each possible choice for the highest node $a_h$ and split the cycle into two paths $P^{+}_{h,d}$ and $P^{-}_{h,d}$. The path segment $P^{+}_{h,d}$ also represents a subquery (induced by the union of the nodes found in the path and in the blocks annotating the path). Thus, we can extend the notion of projection tables to these segments as well.

The procedure for computing the projection table for $P^{+}_{h,d}$ is similar to the one discussed in the previous section (Procedure 1 in Figure 6), and works by extending one edge in each step. However, two aspects need to be addressed. Firstly, in contrast to the prior procedure, the edge being extended may be annotated with a block B or un-annotated (in which case it corresponds to an original edge found in the input query Q). For an un-annotated edge, we perform a join operation with the edges of the data graph (as before), whereas for an annotated edge the join operation involves the projection table of the block B. For the sake of uniformity, it will be convenient to view the un-annotated edges as blocks as well, denoted $B_G$, and associate with them a projection table derived from the graph edges, as follows. For each edge $(u, v) \in G$, set $\mathrm{cnt}(u, v, \alpha)$ to 1 for $\alpha = \{\chi(u), \chi(v)\}$; all other entries of the table are set to a count of zero. The second aspect is that the nodes of the cycle may also be annotated, and these get included as part of the sequence of joins being performed.

The two aspects are addressed by procedures called NodeJoin and EdgeJoin. The pseudo-code is shown in Figure 7. The procedure starts with an initial table representing the first edge $(a_h, a_{h \oplus 1})$ and performs a sequence of join operations with the blocks annotating the nodes and edges of the cycle.

At this juncture, two intricacies must be highlighted. Firstly, the endpoints $a_h$ and/or $a_d$ may be annotated by a block B, which must be joined by either $P^{+}_{h,d}$ or $P^{-}_{h,d}$, but not by both (to avoid double counting).
For this purpose, we adopt the convention that $P^{+}_{h,d}$ and $P^{-}_{h,d}$ include only the block annotating $a_d$ and $a_h$ (if found), respectively. Secondly, for a block with two boundary nodes p and q, the projection table views one of them as the first boundary node and the other as the second (corresponding to the two components of the keys of the form $(u, v, \alpha)$). Thus, the boundary nodes are ordered and the projection tables need not be symmetric: taking q as the first boundary node and p as the second boundary node would produce a different projection table. However, the two tables are transposes of each other ($\mathrm{cnt}(u, v, \alpha) = \mathrm{cnt}(v, u, \alpha)$). Our algorithm maintains both tables and uses the appropriate one as dictated by the nodes of the cycle. The pseudo-code reflects the first aspect but, for the sake of clarity, ignores the second.

The projection counts obtained by the above process are joined using a procedure similar to Procedure 2 of Figure 6, taking into account the configuration in which the boundary nodes occur. These are aggregated over all possible choices of the highest node $a_h$.

Cycles with a single boundary node are handled in a similar manner, by considering each possible choice for the highest node $a_h$ and splitting the cycle into two paths $P^{+}_{h,d}$ and $P^{-}_{h,d}$. The setting is simpler, with only two configurations possible for how the boundary node may appear on the paths: the (single) boundary node may appear in $P^{+}$ or $P^{-}$. Thus, the prior procedures can be applied here as well.

The case of leaf blocks is also handled via join operations. Any leaf block (a, b) is processed by joining the projection tables of the blocks annotating the node a, the edge (a, b) and the node b (if found).

At the end of the traversal process, the root block is solved, which is either a cycle or a singleton node. In the former case, the block is treated as a cycle without boundary nodes. Instead of computing its projection table, we simply count the number of colorful matches, via a procedure similar to that of two-boundary cycles. In the latter case, we consider the projection table of the block annotating the singleton node and output the sum of the counts across all entries of the table. The process yields the number of colorful matches of the input query Q.

6 Finding Good Decomposition Trees

In each step of the decomposition process, multiple blocks may be available for contraction. Each sequence of choices leads to a unique decomposition tree, and hence multiple trees are possible for a given query. For example, the query brain1 (Figure 8) admits two decomposition trees: (i) contract the 4-cycle first and then the 6-cycle, and (ii) vice versa.

We conducted an experimental study involving a number of real-world data graphs and queries. For each query, we enumerated all the possible decomposition trees and evaluated the execution time on each graph. We observed a maximum difference of 13x in the execution times of two decomposition trees for the same graph-query combination. However, we noted that in most cases the optimal tree is independent of the data graph and is mainly determined by the structure of the query. These observations show that we need a procedure for selecting a good tree, but in this process we need not analyze the large data graph; rather, it suffices to focus on the structural properties of the small query graph.

Our study also showed that the following factors, in decreasing order of importance, determine the execution time: (i) length of the longest cycle block; (ii) number of boundary nodes; (iii) number of node/edge annotations. Armed with the above observations, we designed a simple heuristic procedure: enumerate all possible trees for the given query and pick the best using the above factors for comparison.
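One way to realize such a comparison is a lexicographic ordering over the three factors, sketched below. The `Block` record, the function names, the aggregation of factors (ii) and (iii) as totals over the tree, and the convention that smaller values are preferred are all assumptions made for illustration; the text above fixes only the order of importance of the factors.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Block:
    kind: str            # "cycle" or "leaf_edge" (hypothetical labels)
    length: int          # number of query nodes on the block
    boundary_nodes: int  # 0, 1 or 2
    annotations: int     # number of annotated nodes/edges

def tree_cost(tree: List[Block]) -> Tuple[int, int, int]:
    """Lexicographic cost: longest cycle, then boundary nodes, then annotations."""
    longest_cycle = max((b.length for b in tree if b.kind == "cycle"), default=0)
    total_boundary = sum(b.boundary_nodes for b in tree)   # assumed: total over blocks
    total_annotations = sum(b.annotations for b in tree)   # assumed: total over blocks
    return (longest_cycle, total_boundary, total_annotations)

def pick_tree(candidates: List[List[Block]]) -> List[Block]:
    """Return the candidate decomposition tree with the smallest cost."""
    return min(candidates, key=tree_cost)
```

Because Python compares tuples lexicographically, a tree with a shorter longest cycle is always preferred, and the later factors only break ties. Enumerating the candidate trees themselves is outside the scope of this sketch.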
In our experimental setting, the heuristic picked the optimal tree in the majority of the cases and, barring a few exceptions, a near-optimal tree for the rest. Since the queries are of small size (about 10 nodes), even a sequential implementation of the heuristic takes an insignificant amount of running time.

7 Distributed Implementation

In this section, we present a brief sketch of the distributed implementation of the two algorithms, highlighting their main aspects. The distributed implementation consists of three layers. The first layer, called the planner, finds a good decomposition tree for the given query using a fast sequential implementation of the heuristic discussed in Section 6. The second layer, called the plan solver, takes the data graph and the decomposition tree and implements the PS and DB algorithms presented in Section 5. It accomplishes the above task by using efficient join routines supported by the third layer, called the engine.

The engine has three functionalities. The first is to store the data graph in a distributed manner. This is achieved via a 1D decomposition, wherein the vertices are equally distributed among the processors using block distribution, and each vertex is owned by some processor.

The second is to maintain projection tables. These tables are of two types: unary projection tables, having single-vertex keys of the form $(u, \alpha)$, associated with blocks having a single boundary node; and binary projection tables, having two-vertex keys of the form $(u, v, \alpha)$. The binary tables also have variants involving additional fields for storing the mappings for the boundary vertices. The engine provides a convenient abstraction to the plan solver for all these types of tables. All the tables are maintained as distributed hash tables which use open addressing to resolve collisions. Every entry $(u, v, \alpha)$ is stored on the processor owning v; the degree of u is packed as part of the entry for enforcing the degree constraint in the join operations (of the form $u \succ w$ in Procedure 1 of Figure 6). Signatures are maintained as bitmaps.

The third functionality is to support two types of join operations on the projection tables. The first type of join is used for extending a path segment by an edge; this involves a join with either the graph edges or the projection table of the block annotating the edge. In the former case, the extension of an entry with a key $(u, v, \alpha)$ with an edge (v, w) is performed at the owner of v. The result is an entry with a key $(u, w, \alpha')$; this entry is communicated to the owner of w, where it gets stored. The latter case involves the join of two entries with keys $(u, v, \alpha_1)$ and $(v, w, \alpha_2)$. Since the first entry is stored at the owner of v and the second at the owner of w, a communication is performed to bring the two entries to a common processor. The second type of join is used for merging the projection tables of two path segments (for example, Procedure 2 in Figure 6), and it is implemented in a similar way.

Table 1: Real Data Graphs

    Graph       Domain     Nodes  Edges  Avg Deg  Max Deg
    brightkite  Geo loc.   58K    214K
    condmat     Collab.    23K    93K
    astroph     Collab.    18K    198K
    enron       Commn.     36K    180K
    hepph       Citation   34K    421K
    slashdot    Soc. net.  82K    900K
    epinions    Soc. net.  131K   841K
    orkut       Soc. net.  524K   1.3M
    roadnetca   Road net.  2M     2.7M
    brain       Biology    400K   1.1M
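The bitmap representation makes the compatibility test of the merge join (Procedure 2 in Figures 4 and 6) a pair of bitwise operations. The following single-process sketch is illustrative only: the names `bit` and `merge_paths` are hypothetical, colors are assumed to be small non-negative integers, and a hash-based grouping on the shared endpoints is used here in place of the distributed machinery described in this section.

```python
from collections import defaultdict

def bit(c):
    """Bitmap with only color c set."""
    return 1 << c

def merge_paths(plus, minus, color):
    """Join two path tables that share both endpoints.

    plus, minus: dicts mapping (u, v, mask) -> count, where mask is the
    signature stored as an integer bitmap. Two entries are compatible
    when their signatures overlap exactly on the colors of u and v.
    """
    by_ends = defaultdict(list)
    for (u, v, m2), c2 in minus.items():
        by_ends[(u, v)].append((m2, c2))

    out = defaultdict(int)
    for (u, v, m1), c1 in plus.items():
        shared = bit(color[u]) | bit(color[v])
        for m2, c2 in by_ends.get((u, v), ()):
            if (m1 & m2) == shared:          # alpha_1 ∩ alpha_2 = {chi(u), chi(v)}
                out[(u, v, m1 | m2)] += c1 * c2
    return dict(out)
```

For a data 4-cycle 0-1-2-3 with colors 0 to 3 and endpoints u = 0, v = 2, each path table holds the two entries with masks 0b0111 and 0b1101; the join keeps the two cross pairs and yields a single entry with mask 0b1111 and count 2, one for each way of placing vertices 1 and 3 on the two sides of the cycle.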
The two operations are implemented using a standard sort-merge join procedure, with signature compatibility checks performed via fast bitwise operations.

8 Experimental Study

We present an extensive experimental evaluation of the algorithms presented in the paper. Our experiments include a comparison of the algorithms on execution time, strong and weak scaling studies for our algorithm, and studies to evaluate the quality of our query plan generation heuristic and the efficacy of color coding for treewidth two queries.

Figure 8: Real world queries used in our study.

Figure 9: Average execution time (seconds).

8.1 Experimental Setup

System. The experiments were conducted on an IBM Blue Gene/Q system [12]. Each BG/Q node has 16 cores and 16 GB memory; multiple nodes are connected using a 5D torus interconnect. Our implementation is based on MPI2 with gcc, with the number of ranks varying from 32 to 512. Each MPI rank was mapped to a single core. The number of MPI ranks mapped to a node was adjusted based on the memory requirements of individual experiments.

Graphs. The experiments involved nine real world graphs obtained from the SNAP dataset collection (http://snap.stanford.edu) and the human brain network from the Open Connectome Project. Our benchmark includes representative graphs from different domains in SNAP. The graphs and their characteristics are presented in Table 1. We also used synthetic R-MAT graphs [11] for the purpose of studying the weak scaling behavior of our algorithms.

Queries. Our query benchmark consists of the ten real world queries shown in Figure 8. The queries were derived from prior network analysis work spanning diverse domains: dros, ecoli1, ecoli2, brain1, brain2, brain3 - biological networks [22, 19]; glet1, glet2 - graphlets [7]; wiki - collaboration networks [32]; youtube - spam networks [24].

Algorithms. We study two algorithms: PS, which serves as the baseline, and our degree based DB algorithm. Recall that PS is equivalent to the dynamic programming based algorithm of Alon et al. [2].

8.2 Graph-Query Characteristics

The characteristics of the input graph and query strongly influence the running time of query counting algorithms. To obtain an overall characterization of the phenomenon, we measured the execution time of the DB algorithm on each of the 100 real graph and query combinations using 512 MPI ranks. Figure 9 shows the average running time for each graph across the ten queries and the average running time of each query across the ten graphs.

The wide variations in execution time across graphs and queries are indicative of their relative difficulty in practice. For example, although roadnetca is a larger graph than epinions, the average running time of the former is smaller than that of the latter by an order of magnitude. We can understand this behavior by studying the skew in the underlying degree distribution. In general, counting colorful occurrences of a query on a graph with high skew (indicated by a high maximum degree in Table 1) tends to be computationally expensive. Similarly, the queries also exhibit large variations in running time, ranging from sub-second for youtube, glet1 and glet2 to more than a minute for brain2 and brain3. These variations can be accounted for by studying the differences in the size and the sub-structures of the queries. We observed that queries with longer cycles are more challenging. As an extreme case that exemplifies our observation, a 12-vertex complete binary tree query requires 2 seconds on average, whereas the 10-vertex brain3 query requires nearly 2 minutes on average.

8.3 Performance Comparison of PS and DB Algorithms

We study the performance of the PS and DB algorithms on 100 graph-query combinations obtained by selecting a graph from Table 1 and a query from Figure 8. For our DB algorithm, we used plans supplied by the heuristic described in Section 6. In contrast, for the PS algorithm, we enumerated all the possible plans and obtained the optimal plan. Thus, we compare our algorithm to the best possible scenario for the baseline algorithm.
We compute the improvement factor (IF) of DB over PS as the ratio of the execution time of PS to that of DB. Figure 10 shows IF at 32 and 512 ranks. The combinations where DB outperforms PS (IF > 1) are highlighted in green. The blank entries represent cases where PS (or DB) did not complete execution, due to lack of available memory. At 32 ranks, we can see that DB outperforms PS on 84% of the graph-query combinations, with IF being as high as 9.1x (average 2.4x). At 512 ranks, DB outperforms the baseline on 89% of the cases, with IF becoming as high as 28.7x (average 5.0x).

We can see that the relative performance of the two algorithms is dependent on the graph-query pair. For instance, the average IF on the enron and condmat graphs is 8.4 and 3.1 on 512 ranks, respectively, correlating well with the skew in their degree distributions (see Table 1). Similarly, the improvement factor is higher on complex queries such as brain1, where the average improvement is 13.1x, compared to youtube, where the average improvement is only 4.1x. The phenomenon becomes extreme in the case of road networks, which have very low skew and exhibit sub-second average running time across queries.

Our DB algorithm scales better than PS, as demonstrated by the increase in IF at higher ranks. For different graph-query combinations, we computed the ratio of IF at 512 ranks to that at 32 ranks and found that IF increases by a factor of up to 4.7x (average 1.7x).

Figure 10: Improvement factor of the DB algorithm over the PS algorithm. (a) 32 ranks; (b) 512 ranks.

Figure 11: Normalized execution time, average load and maximum load on the enron graph, for DB and PS across the real-world queries. (a) Time. (b) Max. load. (c) Avg. load.

To understand this trend further, we compute the load (number of projection function operations) for both algorithms for processing different queries on the enron graph at 512 ranks. For different queries, Figure 11 shows the execution time and the average and maximum load. We can see that DB has a lower average load than PS, since DB avoids wasteful computations. Furthermore, the improvement obtained by DB over PS in execution time correlates well with the improvement obtained in the maximum load. For example, on the ecoli1 query, even though PS outperforms DB at 32 ranks, the performance is reversed at 512 ranks (see Figure 10), because of the superior load balancing characteristics of DB.

8.4 Scalability Characteristics of DB Algorithm

We studied the scaling of DB across the 100 graph-query combinations. For each combination, we computed the speedup as the ratio of the execution time at 32 ranks to that at 512 ranks. Figure 12 summarizes this information by providing the average speedup for each query across graphs, and for each graph across queries. As against an ideal speedup of 16x, we see that the algorithm obtains speedups in the range of 7.4x to 15.8x.

Figure 12: Avg. speedup of DB at 512 ranks compared to 32 ranks.

We studied the strong scaling behavior of our algorithm, using enron as a representative graph. Taking 32 ranks as the baseline, Figure 13 shows the speedup up to 512 ranks for different queries. The algorithm scales well across queries, with an average speedup of 8.2x and a maximum speedup of 9.9x at 512 ranks (as against an ideal speedup of 16x).

To study weak scaling, we use R-MAT synthetic graphs with parameters A = 0.5, B = 0.1, C = 0.1 and D = 0.3 and edge factor 16, suggested in a Graph 500 benchmark specification ( jriedy/tmp/graph500/). The number of vertices was fixed at 1K per rank and the number of ranks was varied from 32 to 512. We report the execution times for each query-rank combination in Figure 13. We see excellent weak scaling behavior, with the execution times at 512 ranks remaining close to those of the baseline 32 ranks.

Figure 13: Strong and weak scaling. Panels plot speedup against ranks and time (seconds) against ranks.

8.5 Evaluation of Plan Generation Heuristic

We studied the quality of our plan generation heuristic for the DB algorithm at 512 ranks. For each graph-query combination, we determined the optimal plan via an exhaustive enumeration. We compared the execution time of the heuristic plan to that of the optimal plan and measured the percentage difference. These results are reported in Figure 14. We can see that in 90% of the cases, the heuristic generated the optimal plan, whereas in the remaining cases, the difference was at most 15%.
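For readers who want to reproduce inputs similar to those of the weak-scaling experiment in Section 8.4, the following is a minimal sketch of R-MAT edge generation with the stated parameters (A = 0.5, B = 0.1, C = 0.1, D = 0.3, edge factor 16). It is not the generator used in the experiments. The function names are hypothetical, 1K is assumed to mean 1024 vertices per rank, and refinements that production generators typically apply (such as permuting vertex labels or removing duplicate edges) are omitted.

```python
import random


def rmat_edges(scale, edge_factor=16, a=0.5, b=0.1, c=0.1, d=0.3, seed=0):
    """Generate edge_factor * 2**scale directed edges over 2**scale vertices.

    Each edge is placed by descending `scale` levels of the adjacency
    matrix, picking one of the four quadrants with probabilities
    a, b, c, d at every level.
    """
    assert abs(a + b + c + d - 1.0) < 1e-9
    rng = random.Random(seed)
    num_vertices = 1 << scale
    edges = []
    for _edge in range(edge_factor * num_vertices):
        src = dst = 0
        for _level in range(scale):
            r = rng.random()
            src <<= 1
            dst <<= 1
            if r < a:
                pass            # top-left quadrant: both bits stay 0
            elif r < a + b:
                dst |= 1        # top-right quadrant
            elif r < a + b + c:
                src |= 1        # bottom-left quadrant
            else:
                src |= 1        # bottom-right quadrant
                dst |= 1
        edges.append((src, dst))
    return edges


def scale_for_ranks(ranks, vertices_per_rank=1024):
    """log2 of the total vertex count, assuming it is a power of two."""
    total = ranks * vertices_per_rank
    assert total > 0 and total & (total - 1) == 0
    return total.bit_length() - 1


# Small demonstration graph; the experiments would use much larger scales.
edges = rmat_edges(scale=6, seed=1)
```

Under the 1024-vertices-per-rank assumption, scale_for_ranks(32) and scale_for_ranks(512) give the R-MAT scales at the two ends of the weak-scaling range.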

Figure 14: Error % of the execution time of the plan proposed by the plan heuristic with reference to the optimal plan for each graph-query combination.

Figure 15: Coefficient of variation with 50 trials of color coding for each graph-query combination.

8.6 Precision of Color Coding

We evaluated the precision of color coding on our benchmark by performing independent trials and computing the empirical variance of the sample (see Section 2). Specifically, for a given graph-query combination we performed a sequence of trials, where in each trial the colorful count n_colorful(G, Q, χ) was computed for a fresh random coloring. We performed 10 random trials for each of the 100 graph-query combinations in our test set and evaluated the empirical mean and variance of the number of colorful matches. For each graph-query combination, we computed the coefficient of variation, which is the ratio of the empirical variance to the mean. The results are shown in Figure 15. A value close to 0 indicates the convergence of our estimate to the true mean n(G, Q). We observed that with only three trials, 82% of the graph-query combinations had a coefficient of variation of at most 0.1; when the number of trials was increased to 10, this fraction increased to 91%. Hence, using 512 ranks, for a majority of the input graph-query combinations in our benchmark, we require less than a minute to count the actual number of matches of the query, with 10% accuracy. We conclude that our DB algorithm enables fast approximate counting of treewidth 2 queries for data graphs spanning various real domains.
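The trial statistics above can be illustrated with a short Python sketch. Several points here are assumptions rather than details confirmed by this section: the sketch uses the conventional coefficient of variation (sample standard deviation divided by the mean), it assumes the query has k vertices colored uniformly at random with k colors so that a fixed match is colorful with probability k!/k^k, and the function names and counts are hypothetical.

```python
import math
import statistics


def colorful_probability(k):
    """Probability that k fixed vertices receive k distinct colors
    under a uniform random coloring with k colors."""
    return math.factorial(k) / k**k


def summarize_trials(colorful_counts, k):
    """Summarize colorful counts from independent random colorings.

    Returns the scaled estimate of the total number of matches and the
    coefficient of variation (sample std / mean) of the per-trial counts.
    """
    mean = statistics.fmean(colorful_counts)
    std = statistics.stdev(colorful_counts)  # n - 1 denominator
    return {
        "estimate": mean / colorful_probability(k),
        "cv": std / mean,
    }


# Hypothetical colorful counts from three trials of a 3-vertex query.
counts = [90, 100, 110]
summary = summarize_trials(counts, k=3)
```

Because the coefficient of variation is unchanged by the constant scaling factor, it can be computed on the raw colorful counts or on the scaled estimates with the same result.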


Study on the Mathematic Model of Product Modular System Orienting the Modular Design Natre and Science, 2(, 2004, Zhong, et al, Stdy on the Mathematic Model Stdy on the Mathematic Model of Prodct Modlar Orienting the Modlar Design Shisheng Zhong 1, Jiang Li 1, Jin Li 2, Lin Lin 1 (1. College

More information

UNCERTAINTY FOCUSED STRENGTH ANALYSIS MODEL

UNCERTAINTY FOCUSED STRENGTH ANALYSIS MODEL 8th International DAAAM Baltic Conference "INDUSTRIAL ENGINEERING - 19-1 April 01, Tallinn, Estonia UNCERTAINTY FOCUSED STRENGTH ANALYSIS MODEL Põdra, P. & Laaneots, R. Abstract: Strength analysis is a

More information

International Journal of Physical and Mathematical Sciences journal homepage:

International Journal of Physical and Mathematical Sciences journal homepage: 64 International Jornal of Physical and Mathematical Sciences Vol 2, No 1 (2011) ISSN: 2010-1791 International Jornal of Physical and Mathematical Sciences jornal homepage: http://icoci.org/ijpms PRELIMINARY

More information

Second-Order Wave Equation

Second-Order Wave Equation Second-Order Wave Eqation A. Salih Department of Aerospace Engineering Indian Institte of Space Science and Technology, Thirvananthapram 3 December 016 1 Introdction The classical wave eqation is a second-order

More information

Home Range Formation in Wolves Due to Scent Marking

Home Range Formation in Wolves Due to Scent Marking Blletin of Mathematical Biology () 64, 61 84 doi:1.16/blm.1.73 Available online at http://www.idealibrary.com on Home Range Formation in Wolves De to Scent Marking BRIAN K. BRISCOE, MARK A. LEWIS AND STEPHEN

More information

Sareban: Evaluation of Three Common Algorithms for Structure Active Control

Sareban: Evaluation of Three Common Algorithms for Structure Active Control Engineering, Technology & Applied Science Research Vol. 7, No. 3, 2017, 1638-1646 1638 Evalation of Three Common Algorithms for Strctre Active Control Mohammad Sareban Department of Civil Engineering Shahrood

More information

Worst-case analysis of the LPT algorithm for single processor scheduling with time restrictions

Worst-case analysis of the LPT algorithm for single processor scheduling with time restrictions OR Spectrm 06 38:53 540 DOI 0.007/s009-06-043-5 REGULAR ARTICLE Worst-case analysis of the LPT algorithm for single processor schedling with time restrictions Oliver ran Fan Chng Ron Graham Received: Janary

More information

Robust Tracking and Regulation Control of Uncertain Piecewise Linear Hybrid Systems

Robust Tracking and Regulation Control of Uncertain Piecewise Linear Hybrid Systems ISIS Tech. Rept. - 2003-005 Robst Tracking and Reglation Control of Uncertain Piecewise Linear Hybrid Systems Hai Lin Panos J. Antsaklis Department of Electrical Engineering, University of Notre Dame,

More information

A Macroscopic Traffic Data Assimilation Framework Based on Fourier-Galerkin Method and Minimax Estimation

A Macroscopic Traffic Data Assimilation Framework Based on Fourier-Galerkin Method and Minimax Estimation A Macroscopic Traffic Data Assimilation Framework Based on Forier-Galerkin Method and Minima Estimation Tigran T. Tchrakian and Sergiy Zhk Abstract In this paper, we propose a new framework for macroscopic

More information

Restricted cycle factors and arc-decompositions of digraphs. J. Bang-Jensen and C. J. Casselgren

Restricted cycle factors and arc-decompositions of digraphs. J. Bang-Jensen and C. J. Casselgren Restricted cycle factors and arc-decompositions of digraphs J. Bang-Jensen and C. J. Casselgren REPORT No. 0, 0/04, spring ISSN 0-467X ISRN IML-R- -0-/4- -SE+spring Restricted cycle factors and arc-decompositions

More information

PREDICTABILITY OF SOLID STATE ZENER REFERENCES

PREDICTABILITY OF SOLID STATE ZENER REFERENCES PREDICTABILITY OF SOLID STATE ZENER REFERENCES David Deaver Flke Corporation PO Box 99 Everett, WA 986 45-446-6434 David.Deaver@Flke.com Abstract - With the advent of ISO/IEC 175 and the growth in laboratory

More information

Collaborative Filtering with Low Regret

Collaborative Filtering with Low Regret Collaborative Filtering with Low Regret Gy Bresler IDSS/LIDS/EECS, MIT 3 Vassar Street Cambridge, Massachsetts gy@mit.ed Devavrat Shah IDSS/LIDS/EECS, MIT 3 Vassar Street Cambridge, Massachsetts devavrat@mit.ed

More information

Computational Fluid Dynamics Simulation and Wind Tunnel Testing on Microlight Model

Computational Fluid Dynamics Simulation and Wind Tunnel Testing on Microlight Model Comptational Flid Dynamics Simlation and Wind Tnnel Testing on Microlight Model Iskandar Shah Bin Ishak Department of Aeronatics and Atomotive, Universiti Teknologi Malaysia T.M. Kit Universiti Teknologi

More information

FREQUENCY DOMAIN FLUTTER SOLUTION TECHNIQUE USING COMPLEX MU-ANALYSIS

FREQUENCY DOMAIN FLUTTER SOLUTION TECHNIQUE USING COMPLEX MU-ANALYSIS 7 TH INTERNATIONAL CONGRESS O THE AERONAUTICAL SCIENCES REQUENCY DOMAIN LUTTER SOLUTION TECHNIQUE USING COMPLEX MU-ANALYSIS Yingsong G, Zhichn Yang Northwestern Polytechnical University, Xi an, P. R. China,

More information

3 2D Elastostatic Problems in Cartesian Coordinates

3 2D Elastostatic Problems in Cartesian Coordinates D lastostatic Problems in Cartesian Coordinates Two dimensional elastostatic problems are discssed in this Chapter, that is, static problems of either plane stress or plane strain. Cartesian coordinates

More information