Neighborhood Based Fast Graph Search in Large Networks


Arijit Khan, Dept. of Computer Science, University of California, Santa Barbara, CA 93106
Ziyu Guan, Dept. of Computer Science, University of California, Santa Barbara, CA 93106
Nan Li, Dept. of Computer Science, University of California, Santa Barbara, CA 93106
Supriyo Chakraborty, Dept. of Electrical Engineering, University of California, Los Angeles, CA
Xifeng Yan, Dept. of Computer Science, University of California, Santa Barbara, CA 93106
Shu Tao, IBM T.J. Watson, 19 Skyline Drive, Hawthorne, NY 10532, shutao@us.ibm.com

ABSTRACT

Complex social and information network search becomes important with a variety of applications. At the core of these applications lies a common and critical problem: given a labeled network and a query graph, how to efficiently search the query graph in the target network. The presence of noise and the incomplete knowledge about the structure and content of the target network make it unrealistic to find an exact match. Rather, it is more appealing to find the top-k approximate matches.

In this paper, we propose a neighborhood-based similarity measure that could avoid costly graph isomorphism and edit distance computation. Under this new measure, we prove that subgraph similarity search is NP-hard, while graph similarity match is polynomial. By studying the principles behind this measure, we found an information propagation model that is able to convert a large network into a set of multidimensional vectors, where sophisticated indexing and similarity search algorithms are available. The proposed method, called Ness (Neighborhood Based Similarity Search), is appropriate for graphs with low automorphism and high noise, which are common in many social and information networks. Ness is not only efficient, but also robust against structural noise and information loss. Empirical results show that it can quickly and accurately find high-quality matches in large networks, with negligible cost.
Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Search process; I.2.8 [Problem Solving, Control Methods, and Search]: Graph and tree search strategies

General Terms
Algorithms, Performance

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGMOD'11, June 12-16, 2011, Athens, Greece. Copyright 2011 ACM //06...$10.00.

Keywords
Graph Query, Graph Search, Graph Alignment, RDF

1. INTRODUCTION

Recent advances in social and information science have shown that linked data pervade our society and the natural world around us [36]. Graphs become increasingly important to represent complicated structures and schema-less data such as Wikipedia, Freebase [5] and various social networks. Given an attributed network and a small query graph, how to efficiently search the query graph in the target network is a critical task for many graph applications. It has been extensively studied in chemi-informatics, bioinformatics, XML and the Semantic Web.

SPARQL [27] is the state-of-the-art RDF query language for the Semantic Web. SPARQL requires accurate knowledge about the graph structure to write a query, and it performs exact graph pattern matching. However, due to the noise and the incomplete information (structure and content) in many networks, it is not realistic to find exact matches for a given query. It is more appealing to find the top-k approximate matches.

Unfortunately, graph similarity measures such as subgraph isomorphism, maximum common subgraphs, graph edit distance and missing edges, which are appropriate for chemical structures and biological networks, are not suitable for entity-relationship graphs and social networks. There are two challenging issues for these graph theoretic measures. First, entity-relationship graphs and social networks have quite different characteristics from physical networks.
They are not governed by physical laws and are often full of noise, thus making strict topological similarity examination nearly impossible. How the entities are connected in these networks is not as important as how closely these entities are connected. Second, these graphs are very large and complex, with a lot of attributes associated. If accuracy is to be ensured, the algorithms developed for edit distance and missing edges are not scalable. These two issues motivate us to invent new graph similarity measures that are less sensitive to structure changes, and have scalable indexing and search solutions.

[Figure 1: Top-1 Match for Query (a) in FreeBase. (a) Query; (b) Match in Freebase]

Figure 1(a) shows a graph query to "Find the athlete who is from Romania and won gold in 3000m and bronze in 1500m, both in the 1984 Olympics". Comparing this query against a possible match in FreeBase (Olympics) shown in Figure 1(b), it is observed that these two graphs are by no means similar under traditional graph similarity definitions. The graph edit distance between these two graphs is 7. The size of their maximum common graph is 3. The number of maximum missing edges for the query graph is 4. However, Maricica Puica in Figure 1(b) is a good match for the query shown in Figure 1(a), because she has all these attributes quite close to her in Figure 1(b).

In practice, it is hard to come up with a query that exactly conforms with the graph structures in the target network, due to the lack of schemas in linked data. However, it is easy to write a query like Figure 1(a), where a user connects entities with possible links. As long as the proximity between these entities is approximately maintained in a query graph, the system shall be able to deliver matches like Figure 1(b).

The above approximate query form can serve as a primitive for many advanced graph operators such as RDF query answering, network alignment, subgraph similarity search, name disambiguation and database schema matching. For example, based on partial information related to one person, e.g. his friends, one can align his physical social circle with his cyber social network on Facebook. In many cases, nodes in social or information networks have incomplete information or even anonymized information. Nevertheless, the partial neighborhood information available from a query graph will be helpful to identify entities in the target network. Clearly, there is a need to adopt approximate similarity search techniques to solve the above problem.

In bioinformatics, approximate graph alignment has been extensively studied, e.g. PathBlast [2], Saga [33]. These studies resort to a strict approximation definition such as graph edit distance, whose optimal solution is expensive to compute. Since they target relatively small biological networks with less than 10k nodes, it is difficult to apply them to social and information networks with thousands or even millions of nodes. As illustrated in NetAlign [23], in order to handle large graphs with 10k nodes, one has to sacrifice accuracy to achieve better query response time.
Recently there have been other studies on approximate matching with large graphs, i.e., TALE [34], SIGMA [24] and G-Ray [35]. However, both TALE and SIGMA consider the number of missing edges as the qualitative measure of approximate matching and hence, the techniques cannot capture the notion of proximity among labels, as shown in Figure 1. G-Ray, on the other hand, tries to maintain the shape of the query by allowing some approximation in the match. Unfortunately, shape is not an important factor in entity-relationship graphs.

In this paper, we introduce a novel neighborhood-based similarity measure by vectorizing nodes according to the label distribution of their neighbors. We further extend the similarity notion to graphs by finding the embeddings in the target graph that maximize the sum of node matches. This graph matching technique avoids complicated subgraph isomorphism and graph edit distance calculation, which becomes infeasible for large graphs. It is observed that social/information networks usually have more diversified node labels and therefore less auto-isomorphic structure, but may contain more noise. Our objective function can provide better similarity semantics for graphs with various random noise. It simplifies the procedure of graph matching, leading to the development of an efficient graph search framework, called Ness (Neighborhood Based Similarity Search). With the introduction of scalable indices built on vectorized nodes and an intelligent query optimization technique, Ness can quickly and accurately find high-quality matches in large networks, with negligible time cost.

Our contributions. We propose a novel similarity search problem in graphs, neighborhood-based similarity search, which combines the topological structure and content information together during the search process. The similarity definition proposed in this work is able to avoid expensive isomorphism testing as much as possible. The principles to derive appropriate functions to fit this definition are carefully examined.
We found that the information propagation model satisfies these principles, where each node propagates a certain fraction of its labels to its neighbors, and thereby we could convert each node into a multidimensional vector, where sophisticated indexing and similarity search algorithms are available. That is, we successfully turn a graph search problem into a high-dimension index problem.

We first identify a set of rules to define approximate matches of nodes based on their neighborhood structure and labels. These rules are important since the query may not always have complete information about the exact neighborhood structure in the target graph. The approximate node match concept is further extended to subgraph similarity search, i.e. multiple node alignment for a given query graph. We prove that under this measure, subgraph similarity search is NP-hard. However, in comparison with graph isomorphism, which is neither known to be solvable in polynomial time nor NP-hard, graph similarity match is proved to be polynomial.

We demonstrate that, without performing subgraph isomorphism testing, it is possible to prune unpromising nodes by iteratively propagating node information among a shrinking candidate set, which significantly reduces query execution time. We further analyze how to index the vector structure as well as optimize query processing to speed up similarity search. The information propagation model and the neighborhood vectorization approach keep the index structure much simpler than the graph itself, thus making it easy to update dynamically for graph changes arising from node/edge insertion and deletion.

In summary, we propose a completely new graph similarity search framework, Ness, to define and determine approximate matches in massive graphs. As tested in real and synthetic networks, Ness is able to find high-quality matches efficiently in large scale networks.

2. PRELIMINARIES

A labeled graph $G = (V_G, E_G, L_G)$ has a label set $L_G$, and each node $u \in V_G$ is attached with a set of labels. The label set of node $u$ in $G$ is denoted by $L(u) \subseteq L_G$. For the sake of simplicity, we assume there are no labels and weights on the edges.
Nevertheless, the proposed techniques could be extended for graphs with labeled or weighted edges.

Given two labeled graphs $G$ and $G'$, $G$ is called subgraph isomorphic to $G'$ if there exists a subgraph $H$ of $G'$ such that $G$ is isomorphic to $H$. Formally, we define subgraph isomorphism as follows.

DEFINITION 1 (SUBGRAPH ISOMORPHISM). A subgraph isomorphism is an injective function $f : V_G \to V_{G'}$, s.t., (1) $\forall u \in V_G$, $L(u) \subseteq L(f(u))$, and (2) $\forall (u, v) \in E_G$, $(f(u), f(v)) \in E_{G'}$.

DEFINITION 2 (EMBEDDING). Given a graph $G$ and a query graph $Q$, an embedding of $Q$ is an (injective) function $f : V_Q \to V_G$, such that $\forall v \in V_Q$, $L(v) \subseteq L(f(v))$, where $f(v) \in V(G)$.

In this work, we only study one-to-one node matching for a query graph $Q$, and the node labels are preserved in the embedding. However, our cost function and algorithms can be extended to include other matching and node label similarity scenarios.

Given two graphs $G$ and $Q$, there might be many possible embeddings. Certainly, the quality of an embedding depends on whether it preserves the connections and labels in the query graph or not. Subgraph isomorphism actually defines an exact embedding, written as $f_e$. The quality of an embedding can be defined in various ways; e.g., for a given label-preserved embedding $f$, we can count the number of edge mismatches, $C_e = |\{(u, v) \in E_Q : (f(u), f(v)) \notin E_G\}|$, as the embedding's quality. In general, for a cost function $C : f \to \mathbb{R}$, we define the top-k graph similarity search problem as below.

PROBLEM STATEMENT 1. Given a graph $G$ and a query graph $Q$, find the top-k embeddings with respect to a cost function $C$.

The edge mismatch cost function $C_e$ has been studied in [38, 34, 24]. Unfortunately, it cannot differentiate the case where two nodes are close to each other but there is no direct edge between them.

[Figure 2: Problem with Edge Mismatch Cost Function]

[Figure 3: Information Propagation Model]

Figure 2 shows one example. There are two label-preserved embeddings $f_1$ and $f_2$ of the query graph $Q$ in the target graph $G$. In $f_1$ and $f_2$, there is no edge connecting $a$ and $b$. Thus, $C_e$ will assign equal cost to both embeddings. On the other hand, the graph edit distance between $f_1$ and $Q$ is 2, whereas it is only 1 between $f_2$ and $Q$. Intuitively, however, $f_1$ is a better match than $f_2$, because the nodes with labels $a$ and $b$ are only 2 hops away in $f_1$, whereas they are disconnected in $f_2$.
This observation inspires us to develop a neighborhood-based similarity measure that discounts how nodes are exactly connected, but focuses on the proximity among the labels carried by these nodes. It needs to achieve the following two objectives: (1) the cost function should identify approximate embeddings, and (2) it must be easy to compute. In the next section, we will define the neighborhood-based similarity cost function and give the complexity analysis of that function.

3. NEIGHBORHOOD-BASED GRAPH SIMILARITY

In order to solve the problem raised by the edge mismatch cost function, we define a novel neighborhood-based similarity measure by comparing the h-hop neighbors of a node, defined as follows.

DEFINITION 3 (h-HOP NEIGHBORS). Given a graph $G$ and a node $u \in V(G)$, the h-hop neighborhood of $u$ is the set of nodes $v$ whose distance from $u$ is less than or equal to $h$.

To compare the neighborhoods of two nodes, we resort to an information propagation model [22] that is able to transform neighborhoods into vectors in a multidimensional space, where sophisticated indexing and fast similarity search algorithms are available.

3.1 Information Propagation Model

Figure 3 shows the information propagation model to characterize the neighborhood information around a node $u$. The label information encoded in $u$'s neighbors is propagated to $u$ through different paths and accumulated at $u$. One could use the accumulated information and its strength as a vector to describe the neighborhood of $u$. The neighborhood vector of $u$ is denoted by $R(u)$, which consists of a set of tuples, $R(u) = \{\langle l, A(u, l) \rangle\}$, where $l$ is a label present in the neighborhood of $u$ and $A(u, l)$ represents the strength of label $l$ at node $u$ in the graph.

There are many different mechanisms to propagate information. However, not every one is valid for graph similarity search. Any valid one must comply with the following principle.

PROPERTY 1 (COST FUNCTION). For a graph similarity cost function $C$, given an exact embedding $f_e$, $C(f_e)$ must be equal to 0.
Here, we consider a simple but effective information propagation model so that the derived neighborhood-based similarity measure satisfies the above principle. It propagates information along the shortest paths between two nodes, with exponential decay in the path length. Eq. 1 describes the formula of $A(u, l)$ in $R(u) = \{\langle l, A(u, l) \rangle\}$ that represents the h-hop neighborhood of node $u$ in the graph.

$$A(u, l) = \sum_{i=1}^{h} \alpha^{i} \sum_{d(u,v)=i} I(l \in L(v)), \qquad (1)$$

where $I(l \in L(v))$ is an indicator function which takes value one when $l$ is in the label set of $v$ and zero otherwise, and $d(u, v)$ is the distance between $u$ and $v$. $\alpha$ is a constant called the propagation factor. It is between 0 and 1, and its optimum value will be discussed later. Eq. 2 confines Eq. 1 to an embedding $f$ in $G$ by only considering the vertices and the shortest paths in $f$.

$$A_f(u, l) = \sum_{i=1}^{h} \alpha^{i} \sum_{v \in V_f,\, d(u,v)=i} I(l \in L(v)). \qquad (2)$$

Using this information propagation model, we shall formulate the neighborhood-based cost function.

3.2 Neighborhood-based Cost Function

Given a query graph $Q$ and its embedding $f$ in the target graph $G$, we can apply the information propagation model to propagate labels in $Q$ and $f$. Since vertices in $f$ might not be directly connected, we will consider all of the shortest paths connecting these vertices during propagation.

To derive the neighborhood-based cost function $C_N(f)$, we first compute the difference between the neighborhood vectors $R_f(u)$ and $R_Q(v)$, representing the neighborhoods of $u$ and $v$ in the embedding and the query graph, respectively.

$$C_N(v, u) = \sum_{l \in R_Q(v)} M(A_Q(v, l), A_f(u, l)), \qquad (3)$$

where $M(x, y)$ is a positive difference function as given below.

$$M(x, y) = \begin{cases} x - y, & \text{if } x > y; \\ 0, & \text{otherwise.} \end{cases}$$

The reason to adopt a positive difference function is that if the embedding $f$ carries more labels than $Q$, we shall not penalize it. Only when there are labels and edges missed in $f$ will $C_N(v, u)$ return a positive value. Note that the summation in Eq. 3 is taken over all labels $l$ present in $R_Q(v)$, i.e., $\{l : A_Q(v, l) > 0\}$. For brevity, we simply denote this by $l \in R_Q(v)$ in Eq. 3, and the same notation will be used in the remainder of the paper.

Given an embedding $f$, we aggregate the differences for all pairs $(v, u)$, where $u = f(v)$. The neighborhood-based graph similarity cost $C_N(f)$ is given as follows.

$$C_N(f) = \sum_{v \in V_Q} C_N(v, f(v)) \qquad (4)$$

[Figure 4: Neighborhood Based Similarity Cost]

[Figure 5: Example of False Positive]

Figure 4 provides an example of neighborhood-based graph matching cost. In graph $G$, label $b$ is propagated to node $u_1$ from nodes $u_2$ and $u'_2$, via the corresponding shortest paths respectively. Assume $\alpha = 0.5$ and $h = 2$; we have $A_G(u_1, b) = 0.5 + 0.5^2 = 0.75$. We can derive the neighborhood vectors for the other nodes in $G$: $R_G(u_1) = \{\langle b, 0.75 \rangle, \langle c, 0.5 \rangle\}$, $R_G(u_2) = \{\langle a, 0.5 \rangle, \langle c, 0.25 \rangle\}$, $R_G(u_3) = \{\langle a, 0.5 \rangle, \langle b, 0.75 \rangle\}$ and $R_G(u'_2) = \{\langle c, 0.5 \rangle, \langle a, 0.25 \rangle\}$. Similarly, $R_Q(v_1) = \{\langle b, 0.5 \rangle\}$ and $R_Q(v_2) = \{\langle a, 0.5 \rangle\}$.

In Figure 4, we have two possible embeddings $f_1$ and $f_2$. For $f_1$, $R_{f_1}(u_1) = \{\langle b, 0.5 \rangle\}$ and $R_{f_1}(u_2) = \{\langle a, 0.5 \rangle\}$. Hence, $C_N(f_1) = (0.5 - 0.5) + (0.5 - 0.5) = 0$. For $f_2$, we match $v_1$ to $u_1$ and $v_2$ to $u'_2$. We have $R_{f_2}(u_1) = \{\langle b, 0.25 \rangle\}$ and $R_{f_2}(u'_2) = \{\langle a, 0.25 \rangle\}$. Therefore, $C_N(f_2) = (0.5 - 0.25) + (0.5 - 0.25) = 0.5$. Note that, for the embedding $f_2$, node $u_3$ will not contribute any labels to $R_{f_2}$ since it does not participate in the matching. However, it is on the shortest path from $u'_2$ to $u_1$, thus propagating labels between $u'_2$ and $u_1$.

We must mention that the vectorization of the neighborhoods and the comparison among these vectors can be done in various ways. However, the final cost function must satisfy the basic property of $C$ (Property 1) to avoid false negatives for exact embeddings.
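The worked example above can be reproduced with a minimal Python sketch of Eqs. 1 to 4. This is an illustration rather than the authors' implementation: the function names, the dictionary-based graph encoding, and the node name `u2p` (standing for $u'_2$) are assumptions, and the edge set of $G$ is inferred from the neighborhood vectors quoted in the example. Distances for $A_f$ are measured in $G$, so that non-embedding nodes such as $u_3$ relay information without contributing labels.

```python
from collections import deque


def neighborhood_vector(adj, labels, u, alpha, h, allowed=None):
    """Label strengths A(u, l) of Eq. 1; with `allowed` set, A_f(u, l) of Eq. 2."""
    dist = {u: 0}
    queue = deque([u])
    vec = {}
    while queue:
        x = queue.popleft()
        if dist[x] == h:  # do not expand beyond h hops
            continue
        for y in adj[x]:
            if y in dist:
                continue
            dist[y] = dist[x] + 1  # BFS yields shortest-path distances
            queue.append(y)
            # Nodes outside `allowed` still relay distance but add no labels.
            if allowed is None or y in allowed:
                for label in labels[y]:
                    vec[label] = vec.get(label, 0.0) + alpha ** dist[y]
    return vec


def node_cost(vec_q, vec_f):
    """C_N(v, u) of Eq. 3: positive differences over the labels in R_Q(v)."""
    return sum(max(a_q - vec_f.get(l, 0.0), 0.0) for l, a_q in vec_q.items())


def embedding_cost(adj_q, labels_q, adj_g, labels_g, f, alpha, h):
    """C_N(f) of Eq. 4; f maps each query node to its target node."""
    matched = set(f.values())
    total = 0.0
    for v, u in f.items():
        vec_q = neighborhood_vector(adj_q, labels_q, v, alpha, h)
        vec_f = neighborhood_vector(adj_g, labels_g, u, alpha, h, allowed=matched)
        total += node_cost(vec_q, vec_f)
    return total


# Target graph assumed from the Figure 4 example.
adj_g = {"u1": ["u2", "u3"], "u2": ["u1"], "u3": ["u1", "u2p"], "u2p": ["u3"]}
labels_g = {"u1": {"a"}, "u2": {"b"}, "u3": {"c"}, "u2p": {"b"}}
adj_q = {"v1": ["v2"], "v2": ["v1"]}
labels_q = {"v1": {"a"}, "v2": {"b"}}

r_u1 = neighborhood_vector(adj_g, labels_g, "u1", 0.5, 2)
cost_f1 = embedding_cost(adj_q, labels_q, adj_g, labels_g,
                         {"v1": "u1", "v2": "u2"}, 0.5, 2)
cost_f2 = embedding_cost(adj_q, labels_q, adj_g, labels_g,
                         {"v1": "u1", "v2": "u2p"}, 0.5, 2)
```

Under these assumptions, `r_u1` holds strength 0.75 for label `b` and 0.5 for label `c`, while `cost_f1` and `cost_f2` evaluate to 0 and 0.5, matching the values derived in the text.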
The following theorem shows that $C_N$ satisfies Property 1.

THEOREM 1. For an exact embedding $f_e$, $C_N(f_e) = 0$.

PROOF. For an exact embedding $f_e$, if $(v_1, v_2) \in E_Q$, then $(f_e(v_1), f_e(v_2)) \in E_G$. Thus, the shortest distance between the node pair $f_e(v_1), f_e(v_2)$ in $f_e$ cannot be higher than the shortest distance between the node pair $v_1, v_2$ in $Q$. Hence, it follows from Eq. 1 that $\forall l, \forall v$, $A_f(f_e(v), l) \ge A_Q(v, l)$. Therefore, based on Eq. 3 and Eq. 4, $C_N(f_e) = 0$.

Theorem 1 ensures that there are no false negatives for exact embeddings. However, there might be some false positives, as shown in Figure 5. In this example, if $h = 1$, $C_N(f) = 0$, although $f$ is not an exact embedding of $Q$. Fortunately, if we increase $h$ to 2, $C_N(f) > 0$. In real-life graphs that have low automorphism and more distinct labels in nodes, false positives can mostly be avoided, as shown in our experiments and in the following lemma.

LEMMA 1. Given a graph $G$ and a query graph $Q$, if each of their nodes has a distinct label, then for any inexact embedding $f$, $\forall h > 0, \alpha > 0$, $C_N(f) > 0$.

PROOF. Omitted.

Our definition of the neighborhood-based cost function is robust against structural differences and other forms of noise. As long as two close labels in the query graph are close enough in the target graph, we consider it a potential match. We can also rank the embeddings based on the proximity of their labels in the target graph compared to that in the query graph. Thus, even if there exists no exact embedding of the query graph, the cost function can identify the closely approximate matches and rank them based on their structural differences. We formally define our problem statement as follows.

PROBLEM STATEMENT 2. [Neighborhood-Based Top-k Similarity Search] Given a target graph $G$ and a query graph $Q$, find the top-k embeddings with respect to the cost function $C_N$.

In the following discussion, we show that the above problem is NP-hard by reducing the clique problem to it.

LEMMA 2. Given a graph $G$ and a query graph $Q$ with $|L(u)| = 1$ and $|L(v)| = 1$ for all $u \in V_G$, $v \in V_Q$, if $Q$ is a complete graph, then for all inexact embeddings $f$, $C_N(f) > 0$.

PROOF.
Since $\forall u \in V_G, v \in V_Q$, $|L(u)| = 1$ and $|L(v)| = 1$, for any inexact embedding $f$, each node $u = f(v)$ has only one label, which is the same as the label of node $v$ in $Q$. Since $Q$ is a complete graph, there exists at least one node $f(v)$ in $f$ and a label $l$ such that the number of 1-hop neighbors of $v$ in $Q$ that have label $l$ is more than the number of 1-hop neighbors of $f(v)$ in $f$ with label $l$. Hence, $A_Q(v, l) > A_f(f(v), l)$. Therefore, it follows from the definition of $C_N$ that $C_N(f) > 0$.

THEOREM 2. Neighborhood-Based Top-k Similarity Search is NP-hard.

PROOF. Let us consider the case where $|L(u)| = 1$, $|L(v)| = 1$, $\forall u \in V_G, v \in V_Q$, and $Q$ is a complete graph. Suppose the top-1 match $f$ can be identified in polynomial time. Given $f$, it can also be verified in polynomial time whether $C_N(f) = 0$. Now, if $C_N(f) = 0$, by Lemma 2, there exists a clique of the size of $Q$ in the target graph $G$. So, it is possible to solve the clique problem in polynomial time. However, we know that the clique decision problem is NP-hard [10], therefore we have a contradiction. Hence, the similarity search problem is NP-hard.

The graph isomorphism problem is neither known to be solvable in polynomial time nor NP-complete. However, given two graphs $Q$ and $G$ of the same size, it is possible to determine in polynomial time if $G$ itself is an embedding of $Q$ with cost $C_N(f) = 0$. We call this problem the Graph Similarity Match problem. Thus, we suspect that neighborhood-based similarity search might have lower time complexity than graph theoretic measures such as graph isomorphism and edit distance.

THEOREM 3. Graph Similarity Match is polynomial in $n$, where $n = |V_Q|$.

PROOF. Since $G$ itself is an embedding $f$ of $Q$, we can determine the individual node matching costs $C_N(v, u)$ in polynomial time, for all $v \in V_Q$, $u \in V_G$. Next, we construct a flow network and determine the minimum cost of the maximum flow in that network (see Figure 6). From the source node $s$, add a directed edge to each node $v$ in $Q$. The capacity of each of these edges is 1 and the cost is 0. Similarly, from each node $u$ in $G$, add a directed edge to the sink node $t$. The capacity and cost of each of these edges are 1 and 0 respectively. From each node $v$ in $Q$, add a directed edge to each node $u$ in $G$ if $L(v) \subseteq L(u)$. The capacity and cost of this edge are 1 and $C_N(v, u)$ respectively. Due to the capacity constraints, each node in $Q$ can be matched with at most one node in $G$, and only one node of $Q$ can be matched with the same node in $G$. Clearly, if the maximum flow in this network is $n$ and the minimum cost of the maximum flow is 0, then $G$ is an embedding of $Q$ with cost $C_N(f) = 0$. This flow problem can be solved using the Ford and Fulkerson algorithm [] in $O(n^3)$ time. Therefore, given two graphs $Q$ and $G$ of the same size, it is possible to determine in polynomial time if $G$ itself is an embedding $f$ of $Q$ with cost $C_N(f) = 0$.

[Figure 6: Flow Network to Solve Graph Similarity Match]

3.3 Propagation Factor: α

In the information propagation model described in Eq. 1, the propagation factor $\alpha$ should be less than 1 in order to reflect the relation that the strength $A(u, l)$ of label $l$ at node $u$ decreases with the increase of distance. However, we find the top-k embeddings by repeatedly matching the individual nodes from $G$ and $Q$ that satisfy a cost threshold $\epsilon$ (the detailed procedure will be discussed in the next section). Now, if $\alpha$ is large, each node will propagate a high fraction of its labels to its neighbors, and this can increase the number of false positives at the initial node matching stage, thus slowing down the overall search process. In Figure 7, for $\alpha = 0.5$ and $h = 2$, we get $R_G(u) = \{\langle a, 0.5^2 + 0.5^2 \rangle\} = \{\langle a, 0.5 \rangle\}$ and $R_Q(v) = \{\langle a, 0.5 \rangle\}$. Thus, node $u \in G$ will be reported as a match of node $v \in Q$ even for cost threshold $\epsilon = 0$. Clearly, this is a false positive.

To solve this problem, we do not employ a uniform propagation factor for different labels. Instead, for each label $l$, we select an optimum $\alpha(l)$. For a given label $l$, let us assume that the maximum number of one-hop neighbors with label $l$ of any node in $G$ is $n(l)$. To consider the worst case, let us assume that some node $u$ in $G$ has no one-hop neighbor with label $l$, but it has $n^2(l)$ two-hop neighbors with label $l$, $n^3(l)$ three-hop neighbors with label $l$, and so on. Therefore, the strength of label $l$ at node $u$ in $G$ will be as follows,

$$A_G(u, l) = \sum_{i=2}^{h} n^{i}(l)\, \alpha^{i}(l) < \frac{n^{2}(l)\, \alpha^{2}(l)}{1 - n(l)\, \alpha(l)} \qquad (5)$$

To avoid false positives, we want $A_G(u, l) < A_Q(v, l) = \alpha(l)$, as shown in Figure 7. Hence, $\alpha(l) < \frac{1}{n(l) + n^{2}(l)}$.

In the next section, we will introduce an iterative method to find the top-k embeddings in a large graph.

4. SEARCH ALGORITHM

In this section, we introduce a scalable iterative approach to find the top-k graph embeddings. Our goal is not to enumerate all the possible embeddings $f$ in $G$ for a given query graph, whose cost is prohibitive. Instead of enumerating $f$, we directly use $A_G(u, l)$ to bound $A_f(u, l)$, since $A_G(u, l) \ge A_f(u, l)$.

LEMMA 3. Given a query graph $Q$ and its embedding $f$ in $G$, $\forall l, \forall u \in V_f$, $A_G(u, l) \ge A_f(u, l)$.

PROOF. Omitted.

Lemma 3 shows that $A_G(u, l)$ in the neighborhood vector $R_G(u)$ cannot be lower than $A_f(u, l)$ of the same label $l$ in the neighborhood vector $R_f(u)$, where $f$ is a subgraph of $G$.

THEOREM 4. Given a query graph $Q$ and its embedding $f$ in $G$,

$$\sum_{v \in V_Q} \sum_{l \in R_Q(v)} M(A_Q(v, l), A_G(f(v), l)) \le C_N(f)$$

PROOF. It follows from Lemma 3 that $M(A_Q(v, l), A_f(u, l)) \ge M(A_Q(v, l), A_G(u, l))$.

Theorem 4 shows that without enumerating embeddings of $Q$ in the target graph $G$, we can derive the lower bound $\sum_{l \in R_Q(v)} M(A_Q(v, l), A_G(u, l))$, where $u$ is a possible match of $v$ in $G$.
[Figure 7: High α, False Positive for u. In $G$, $A_G(u, a) = 0.5$; in $Q$, $A_Q(v, a) = 0.5$]

[Figure 8: Node Matching Example]

Our algorithm works by iteratively pruning unpromising nodes in the target graph.

1. Match the individual nodes of the query graph with those nodes in the target graph that satisfy a predefined cost threshold $\epsilon$ (see Eq. 7).
2. Discard the labels of the unmatched nodes in the target graph.
3. Propagate the labels only among the matched nodes from the previous step. Recompute the neighborhood vectors $R_G(u)$ only for the matched nodes. Repeat Step 1 until convergence.

During each iteration, we remove the labels of the unmatched nodes in the target graph $G$ and then recompute the neighborhood vectors only for the matched nodes. Since the modified target graph has more unlabeled nodes compared to the previous iteration, this will decrease $A_G(u, l)$. With this new and reduced set of neighborhood vectors and using the same cost threshold $\epsilon$, we determine the individual node matches with the nodes of the query graph. Therefore, some additional nodes in $G$ will be unmatched at each iteration. The iteration continues until no further unmatched nodes are found. For real-life graphs, with less automorphism and more distinct labels, we can unlabel most of the unpromising nodes using this technique. Thus finding the top-k embeddings from the set of remaining matched nodes of $G$ becomes almost trivial.

To determine the runtime complexity of our iterative search algorithm, let us denote the number of promising nodes present before the $i$-th iteration as $n_i$ and the number of unpromising nodes discovered at the $i$-th iteration as $k_i$, where $i \ge 1$. Clearly, $n_1 = n$ and $n_{i+1} = n_i - k_i$. If there are $r$ iterations in total, $\sum_{i=1}^{r} k_i = O(n)$. Let the complexity of iteration $i$ be $T_i$. In the first iteration, each node needs to propagate its labels up to $h$ hops. Thus, $T_1 = O(n l d^h)$, where $l$ is the average number of labels and $d^h$ is the average number of h-hop neighbors for each node in $G$. However, for each of the subsequent iterations, it is not necessary to perform such propagation for all the nodes in the graph. Rather, the number of unpromising nodes at iteration $i + 1$, for $i \ge 1$, can be determined by either propagating the labels of the remaining $n_{i+1}$ nodes, or by subtracting the effect of the $k_i$ unpromising nodes from the previous iteration. Hence, $T_{i+1} = O(\min\{n_{i+1}, k_i\}\, l d^h)$, for $i \ge 1$. Therefore, the overall runtime complexity of our search algorithm is given as follows.

$$T_1 + \sum_{i=2}^{r} T_i = O(n l d^h) + \sum_{i=1}^{r-1} O(\min\{n_{i+1}, k_i\}\, l d^h) = O(n l d^h) + \sum_{i=1}^{r-1} O(k_i\, l d^h) = O(n l d^h) \qquad (6)$$

In practice, it converges much faster.
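The three pruning steps can be sketched in Python as below. This is a simplified illustration, not the authors' implementation: it recomputes every neighborhood vector from scratch in each round instead of subtracting the effect of newly unlabeled nodes, and the function names, the dictionary-based graph encoding, and the toy graph (an exact path `a1-b1-c1-d1` plus a decoy path `a2-b2-c2`) are assumptions made for the example.

```python
from collections import deque


def neighborhood_vector(adj, labels, u, alpha, h):
    """Label strengths A(u, l) of Eq. 1, computed by breadth-first search."""
    dist, queue, vec = {u: 0}, deque([u]), {}
    while queue:
        x = queue.popleft()
        if dist[x] == h:
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
                for label in labels[y]:
                    vec[label] = vec.get(label, 0.0) + alpha ** dist[y]
    return vec


def node_cost(vec_q, vec_g):
    """Positive-difference cost between a query vector and a target vector."""
    return sum(max(a - vec_g.get(l, 0.0), 0.0) for l, a in vec_q.items())


def iterative_unlabel(adj_g, labels_g, adj_q, labels_q, alpha, h, eps):
    """Return, for each query node, the target nodes that survive pruning."""
    labels = {u: set(ls) for u, ls in labels_g.items()}  # working copy
    r_q = {v: neighborhood_vector(adj_q, labels_q, v, alpha, h) for v in adj_q}
    while True:
        live = [u for u in adj_g if labels[u]]  # nodes that still carry labels
        r_g = {u: neighborhood_vector(adj_g, labels, u, alpha, h) for u in live}
        # Step 1: match query nodes against labeled target nodes.
        lists = {
            v: {u for u in live
                if labels_q[v] <= labels[u] and node_cost(r_q[v], r_g[u]) <= eps}
            for v in adj_q
        }
        matched = set().union(*lists.values())
        unmatched = [u for u in live if u not in matched]
        if not unmatched:  # converged: nothing left to unlabel
            return lists
        # Step 2: discard labels of unmatched nodes; they stay in the graph,
        # so paths through them are kept. Step 3 happens on the next pass.
        for u in unmatched:
            labels[u] = set()


# Hypothetical target graph: one exact match and one decoy that lacks label d.
adj_g = {"a1": ["b1"], "b1": ["a1", "c1"], "c1": ["b1", "d1"], "d1": ["c1"],
         "a2": ["b2"], "b2": ["a2", "c2"], "c2": ["b2"]}
labels_g = {"a1": {"a"}, "b1": {"b"}, "c1": {"c"}, "d1": {"d"},
            "a2": {"a"}, "b2": {"b"}, "c2": {"c"}}
adj_q = {"v1": ["v2"], "v2": ["v1", "v3"], "v3": ["v2", "v4"], "v4": ["v3"]}
labels_q = {"v1": {"a"}, "v2": {"b"}, "v3": {"c"}, "v4": {"d"}}

lists = iterative_unlabel(adj_g, labels_g, adj_q, labels_q, 0.5, 2, 0.0)
```

In this toy run, `a2` passes the first round because its own 2-hop neighborhood looks correct, and it is pruned only in the second round after `b2` and `c2` have lost their labels, which is the cascading effect the iteration relies on.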
Next, we shall discuss the details of the iterative algorithm and the algorithm to find the top-k embeddings from the nodes filtered by the iterative algorithm.

4.1 Node Match

Given the target graph $G$ and the query graph $Q$, we compute the vectors $R_G(u)$ and $R_Q(v)$ for all nodes $u \in V_G$, $v \in V_Q$, considering their h-hop neighborhoods. For each node pair $u \in V_G$, $v \in V_Q$, s.t. $L(v) \subseteq L(u)$, we calculate the node matching cost, $cost(u, v)$, as the difference of their neighborhood vectors,

$$cost(u, v) = \sum_{l \in R_Q(v)} M(A_Q(v, l), A_G(u, l)). \qquad (7)$$

Figure 8 shows an example. Assume $\alpha = 0.5$ and $h = 2$. We get $R_G(u) = \{\langle b, 0.5 \rangle, \langle c, 0.5^2 + 0.5^2 \rangle\} = \{\langle b, 0.5 \rangle, \langle c, 0.5 \rangle\}$, and similarly, $R_G(u') = \{\langle b, 1 \rangle, \langle c, 0.25 \rangle\}$. Meanwhile, for the query graph $Q$, we have $R_Q(v) = \{\langle b, 0.5 \rangle, \langle c, 0.25 \rangle\}$. Hence, $cost(u, v) = 0$ and also $cost(u', v) = 0$ following the above equation.

Now, for each node $v \in V_Q$, we maintain a list of nodes $u \in V_G$, such that $L(v) \subseteq L(u)$ and $cost(u, v) \le \epsilon$. Here, $\epsilon$ is a predefined cost threshold. The value of $\epsilon$ will be discussed shortly.

4.2 Top-k Search

In order to find the top-k graph embeddings, we initialize the cost threshold $\epsilon$ to a small value $\epsilon_0$ close to 0 and perform the above mentioned iterative procedure until it terminates. Given the matched nodes, if we cannot find at least $k$ embeddings from them, with cost $C_N(f) \le \epsilon |V_Q|$ each, then the threshold cost $\epsilon$ is doubled and we repeat the above procedure until the $k$ embeddings are found. Otherwise, we find the top-k embeddings among the matched nodes. Note that, at this point, any embedding formed by all unmatched nodes will have cost $C_N(f) > \epsilon |V_Q|$. However, it is possible to have some embedding with a few matched and unmatched nodes, and the cost of such embeddings might also be $C_N(f) \le \epsilon |V_Q|$. The problem is eliminated as follows. We set $\epsilon$ equal to the highest cost of the discovered top-k embeddings and then run the algorithm again (this step will find top-k embeddings whose node cost might be higher than $\epsilon$). In this case, any embedding formed by at least one of the unmatched nodes will have cost more than that of any of the top-k embeddings found earlier.
Hence, the top-k embeddings identified only using the matched nodes will be the best top-k embeddings. The complete algorithm is given below.

Algorithm 1 Top-k Search
Input: Target graph $G$, query graph $Q$, positive integer $k$.
Output: Top-k matches $f$ based on the cost metric $C_N$.
procedure
 1: $\epsilon \leftarrow \epsilon_0$, compute $R_Q(v)$, $\forall v \in V_Q$
 2: $list_0(v) = \{u : u \in V_G \wedge L(v) \subseteq L(u)\}$
 3: $i \leftarrow 1$, start with original graph $G$ and compute $R_G(u)$, $\forall u \in V_G$
 4: for all $v \in V_Q$ do
 5:     $list_i(v) = \{u : u \in V_G \wedge L(v) \subseteq L(u) \wedge cost(u, v) \le \epsilon\}$
 6: end for
 7: $(list, i)$ = Iterative Unlabel$(list, i, G, Q)$
 8: if $k$ matches of cost $C_N(f) \le \epsilon |V_Q|$ can be found in $\{u : u \in list_i(v) \wedge v \in V_Q\}$ then
 9:     report top-k matches and stop
10: else
11:     $\epsilon \leftarrow 2\epsilon$
12:     go back to step 2
13: end if

Algorithm 2 Iterative Unlabel $(list, i, G, Q)$
procedure
 1: if $|list_i(v)| < |list_{i-1}(v)|$ for some $v \in V_Q$ then
 2:     for all $u \in V_G$ do
 3:         if $u \notin list_i(v)$ $\forall v \in V_Q$ then
 4:             unlabel $u$
 5:         end if
 6:     end for
 7:     recompute $R(u)$ $\forall u \in V_G$
 8:     $(list, i)$ = Iterative Unlabel$(list, i + 1, G, Q)$
 9: else
10:     return $(list, i)$
11: end if

From the final list of matched nodes for each node in $V_Q$, how can we find embeddings with cost $C_N(f) \le \epsilon |V_Q|$ each (line 8 of Algorithm 1)? One simple technique is to consider all possible combinations from the lists and verify their costs. When the number of matched nodes in each of the final lists is small, it is not time-consuming to check. However, when the lists are long, we can do better than brute force enumeration using dynamic programming.

After a final list of matched nodes $list(v)$ for each $v \in V_Q$ is generated, we perform the propagation once more among the matched nodes; however, this time we propagate the node ids instead of labels. After this propagation, each matched node $u$ in $G$ will have its neighboring nodes (denoted as $neighbor(u)$) within $h$ hops that have influence on the cost (Eq. 1). The final embeddings can be formed as follows. We select a node $u \in list(v)$ for some $v \in V_Q$ and initialize a set $Possible\_match = neighbor(u)$. We have two situations: (1) within $h$ hops of $u$, there is no $f(v')$, $\forall v' \ne v$ in $Q$; (2) $\exists v' \ne v$ of $Q$, and we try to identify a match $u'$ inside $Possible\_match$ and extend this set by adding $neighbor(u')$ and also eliminating the node $u'$ from $Possible\_match$. For the first situation, we could derive the cost for node $u$, $\sum_{l \in L(v)} A_Q(v, l)$. We can recurse among these two situations to find the embeddings. In this way, we can find the low-cost embeddings without enumerating all possible combinations among the nodes in the final lists.

5. INDEXING

The most expensive parts of Ness are the computation of $R_G(u)$ for all $u$ in $G$ (Line 3 of Algorithm 1) and the determination of $list_1(v)$ for all $v$ in $V_Q$ (Line 5 of Algorithm 1). However, the computation of $R_G(u)$ can be done off-line by performing breadth first search up to $h$ hops from each node in $G$. Its time complexity is $O(|V_G| d^h)$, where $d$ is the average degree of each node.

To speed up the computation of $list_1(v)$ for all $v \in V_Q$, we use two types of simple index structures. In the first type of indexing, we build a hash table corresponding to each label. The nodes in $G$ are hashed based on their labels. Given a query node $v$, we use this hash structure to quickly identify the set of possible matches $u$, such that $L(v) \subseteq L(u)$. If the labels of $v$ are very selective, there will be a limited number of possible matches $u$, and we can quickly determine the nodes $u$ among these matches for which $cost(u, v) \le \epsilon$.
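A minimal sketch of the first, label-based index is given below, using Python dictionaries as the hash tables. The function names and the toy label assignment are assumptions for illustration, and the sketch assumes every query node carries at least one label.

```python
from collections import defaultdict


def build_label_index(labels):
    """Map each label to the set of target nodes that carry it."""
    index = defaultdict(set)
    for node, node_labels in labels.items():
        for label in node_labels:
            index[label].add(node)
    return index


def possible_matches(index, query_labels):
    """Nodes u with L(v) contained in L(u): intersect one bucket per query label."""
    # Assumes query_labels is non-empty; an unknown label yields no matches.
    buckets = [index.get(label, set()) for label in query_labels]
    return set.intersection(*buckets)


# Hypothetical target labels.
labels_g = {"u1": {"a", "b"}, "u2": {"a"}, "u3": {"b", "c"}}
index = build_label_index(labels_g)

single = possible_matches(index, {"a"})
double = possible_matches(index, {"a", "b"})
missing = possible_matches(index, {"d"})
```

Only the nodes returned by `possible_matches` need their $cost(u, v)$ evaluated, which is why the scheme works best when the query labels are selective.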
However, if the labels of $v$ are not very selective and there are many possible matches using the hashing technique discussed above, we use the second index structure, which is built on the neighborhood vector $R_G(u)$ following the principle of the Threshold Algorithm [2]. The neighborhood vector $R_G(u) = \{\langle l, A_G(u, l) \rangle\}$ for each node $u \in V_G$ is precomputed. Next, for each label $l$, we generate a sorted list $S(l)$ of nodes $u$ in descending order of their $A_G(u, l)$ values. Let us denote the node at position $i$ from the top of $S(l)$ as $u_i(l)$. In the online phase, we start from the top of each sorted list $S(l)$ in parallel and go to the next position in the subsequent iteration. For some position $i$ from the top, we compute

$sum(i) = \sum_{l \in R_Q(v)} M[A_Q(v, l), A_G(u_i(l), l)]$.

Assume that at iteration $i = i'$, $sum(i')$ becomes greater than the cost threshold $\epsilon$. Then, we terminate this iterative procedure and verify, for all nodes $u_j(l)$ with $j < i'$ and $l \in R_Q(v)$, whether $cost(u_j(l), v) \le \epsilon$. For each $v \in V_Q$, we need to verify only $O((i' - 1)|l|)$ nodes for their cost, where $|l|$ denotes the number of labels in $R_Q(v)$. This can reduce the complexity of the online algorithm significantly. The complete procedure for neighborhood based indexing is given in Algorithm 3.

Algorithm 3 Neighborhood Based Indexing
Off-line Procedure
 1: precompute $R_G(u) = \{\langle l, A_G(u, l) \rangle\}$ for all $u \in V_G$
 2: for all labels $l$ do
 3:     create sorted list $S(l)$ of nodes in descending order of $A_G(u, l)$, such that $u_i(l)$ is the $i$-th node in $S(l)$
 4: end for
On-line Procedure
 1: $i \leftarrow 1$
 2: $sum(i) \leftarrow \sum_{l \in R_Q(v)} M(A_Q(v, l), A_G(u_i(l), l))$
 3: if $sum(i) \le \epsilon$ then
 4:     $i \leftarrow i + 1$
 5:     go to step 2
 6: else
 7:     verify $u_j(l)$ if $cost(u_j(l), v) \le \epsilon$, $\forall j < i$, $\forall l \in R_Q(v)$
 8: end if

Proof of Correctness. Let us denote $S_i(l)$ as all the nodes up to position $i$ from the top of the sorted list $S(l)$, i.e., $S_i(l) = \{u_j(l) : j \le i\}$. The following lemma will be useful to prove the correctness of our indexing algorithm.

LEMMA 4. If $sum(i') > \epsilon$, then for all $u \notin \bigcup_{l \in R_Q(v)} S_{i'-1}(l)$, $cost(u, v) > \epsilon$.

PROOF. It follows directly from the fact that each $S(l)$ is a sorted list of nodes $u$ in descending order of $A_G(u, l)$ values: any such $u$ sits at position $i'$ or later in every list, so $A_G(u, l) \le A_G(u_{i'}(l), l)$ for each $l \in R_Q(v)$, and since $M$ does not increase as its second argument grows, $cost(u, v) \ge sum(i') > \epsilon$.
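The on-line procedure of Algorithm 3 can be sketched in a few lines. This is a hedged illustration under stated assumptions rather than the paper's code: `build_sorted_lists` and `threshold_candidates` are hypothetical names, every target node is placed in every sorted list (with strength 0 when the label is absent), and $M$ is again assumed to be the positive difference.

```python
def positive_difference(x, y):
    # Assumed form of M(x, y): x - y when x > y, otherwise 0.
    return x - y if x > y else 0.0


def node_cost(query_vec, target_vec):
    # cost(u, v) over the labels present in the query vector R_Q(v).
    return sum(
        positive_difference(a_q, target_vec.get(label, 0.0))
        for label, a_q in query_vec.items()
    )


def build_sorted_lists(target_vecs, labels):
    # Off-line step: S(l) holds all nodes in descending order of A_G(u, l).
    return {
        label: sorted(
            target_vecs,
            key=lambda u: target_vecs[u].get(label, 0.0),
            reverse=True,
        )
        for label in labels
    }


def threshold_candidates(query_vec, sorted_lists, target_vecs, eps):
    # On-line step: scan all lists in parallel, stop at the first depth
    # whose lower bound sum(i) exceeds eps, then verify the nodes seen.
    if not query_vec:
        return set(target_vecs)  # empty query vector: every cost is 0
    seen = set()
    for depth in range(len(target_vecs)):
        bound = sum(
            positive_difference(
                a_q, target_vecs[sorted_lists[label][depth]].get(label, 0.0)
            )
            for label, a_q in query_vec.items()
        )
        if bound > eps:
            break  # no unseen node can have cost <= eps (Lemma 4)
        for label in query_vec:
            seen.add(sorted_lists[label][depth])
    return {u for u in seen if node_cost(query_vec, target_vecs[u]) <= eps}
```

The early exit is what saves work: only nodes that appear above the stopping depth in at least one list are ever passed to `node_cost`.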
Therefore, in Algorithm 3, we start from $i = 1$ and find the smallest $i'$ for which $sum(i') > \epsilon$. Following the previous lemma, any node $u \notin \bigcup_{l \in R_Q(v)} S_{i'-1}(l)$ can be eliminated without actually computing $cost(u, v)$. We note that our indexing can be easily implemented in a disk-based manner for very large graphs. Also, we can apply external memory breadth first search algorithms, e.g., Ulrich Meyer [1] and Lars Arge [2], to compute the neighborhood vectors $R_G(u)$ for all the nodes.

Dynamic Update. Our indexing structure can efficiently accommodate dynamic updates in $G$, i.e., insertion/deletion of nodes, edges and labels. If a node $u$ is added or deleted in $G$, it will only change the vectors of $u$'s $h$-hop neighbors. We only need to propagate the labels of these nodes and modify their neighborhood vectors. They also need to be updated in the sorted lists of label $l$ for all $l \in L(u)$. The addition/deletion of a label can be handled similarly. If an edge $(u_1, u_2)$ is added/deleted in $G$, we need to update vectors for the $h$-hop neighbors of both $u_1$ and $u_2$.

6. QUERY OPTIMIZATION

In this section, we eliminate the non-discriminative labels both from the target and query graphs at the initial stage of our matching algorithm to make the technique more efficient. The efficiency of the algorithm Iterative Unlabel is related to the number of individual node matches for each node in the query graph. If there exists some node which is not very selective in terms of its own labels or the labels present in its neighborhood, there will be many matches corresponding to that node at the initial stage of our algorithm. In order to eliminate the problem posed by these nodes, we first eliminate all the non-discriminative labels both from the target graph and the query graph, and then we also ignore the nodes in the query graph which do not contain a sufficient number of discriminative labels in themselves and in their neighborhoods. These non-discriminative labels are considered at the last stage of our matching algorithm, i.e., when we search for the final matches.
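As a side note on the dynamic update rule of Section 5, the set of nodes whose neighborhood vectors must be recomputed after an edge change can be found with a bounded breadth first search. The sketch below is only an illustration: the names `within_h_hops` and `affected_by_edge_change` are hypothetical, the graph is assumed to be an undirected adjacency dictionary, and the returned set conservatively includes the endpoints themselves.

```python
from collections import deque


def within_h_hops(adjacency, source, h):
    # Breadth first search that stops expanding at distance h.
    # Returns every node at distance 0..h from source (source included).
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == h:
            continue  # do not expand beyond h hops
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return set(dist)


def affected_by_edge_change(adjacency, u1, u2, h):
    # Nodes whose vectors may change when edge (u1, u2) is added or deleted.
    # Assumption: adjacency describes the graph that contains the edge, i.e.
    # call this after an insertion and before a deletion.
    return within_h_hops(adjacency, u1, h) | within_h_hops(adjacency, u2, h)
```

Only the returned nodes need their vectors recomputed and their positions in the sorted lists $S(l)$ adjusted; the rest of the index is left untouched.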
In the following discussion, we shall clarify the notion of discriminative and non-discriminative labels in the perspective of node and graph matches.

Figure 9: Discriminative (Heavy-Head) vs. Non-Discriminative (Heavy-Tail) Distribution. Panels: (a) heavy-head; (b) heavy-tail. X-axis: $A_G(u, l)$, with pruned and not-pruned regions separated at $A_Q(v, l)$; Y-axis: # of nodes.

Figure 10: Top-2 Matches (Query 1). Panels: (a) Query; (b) Match_1; (c) Match_2.

Figure 11: Top-2 Matches (Query 2). Panels: (a) Query; (b) Match_1; (c) Match_2.

Let us consider the distribution of $A_G(u, l)$ values of some label $l$, $\langle l, A_G(u, l) \rangle \in R_G(u)$, for different nodes $u \in V_G$. Figure 9 shows one example. For a label $l$, we plot the different $A_G(u, l)$ values along the X-axis. The Y-axis shows the number of nodes $u$ having that particular $A_G(u, l)$ value in their neighborhood vector $R_G(u)$. The distribution in Figure 9(a) is skewed towards the smaller values of $A_G(u, l)$, whereas Figure 9(b) is skewed towards the higher values of $A_G(u, l)$. We call them heavy-head and heavy-tail distributions, respectively. Given a query node $v$, since we prune all the nodes $u$ in $G$ for which $\sum_{l \in R_Q(v)} M[A_Q(v, l), A_G(u, l)] > \epsilon$, the labels with heavy-head distribution have more pruning power than those with heavy-tail distribution. Therefore, we should retain labels with heavy-head distribution for node match, as those labels are more discriminative.

7. EXPERIMENTAL RESULTS

In this section, we present the experimental results to demonstrate the effectiveness and the efficiency of the neighborhood based similarity search technique on a number of real-life and synthetic graph datasets including DBLP, Intrusion, Freebase and WebGraph. In order to evaluate the effectiveness, we show two possible applications: RDF query answering and network alignment. We test the robustness of our approach by providing the accuracy of the best matches for queries of different sizes and under the presence of random noise.
The efficiency and scalability of our approach are also investigated. All experiments are performed using a single core in a 40GB, 2.50GHz Xeon server.

7.1 Graph Data Sets

DBLP Collaboration Graph. The DBLP collaboration graph is downloaded from ley /db. There are 684K distinct authors and 7M co-author edges among them. We consider the name of each author as the label of that node. There are 683,927 distinct labels in DBLP. We use the DBLP dataset for the efficiency test.

Freebase Entity Relationship Graph. Freebase is a large collaborative knowledge base of structured data harvested from many sources including Wikipedia. We downloaded the film entity relationship graph data from /. This graph has 172K nodes, each representing an entity, i.e., actor, movie, director, producer and so on. An edge represents the relationship between two entities. Names of entities are treated as labels. There are a total of 579K edges and 59, 54 distinct labels in this graph. The Freebase graph is used for effectiveness, robustness and efficiency analysis.

Intrusion Alert Network. This network contains the anonymous log data of intrusion alerts in a computer network. It has 200K nodes and 703K edges, where each node is a computer and an edge means a possible attack such as Denial-of-Service and TCP Service Sweep. Each node has 25 labels (computer generated alerts in this case) on average. There are around 1,000 types of alerts. We use this graph for robustness and efficiency experiments.

WebGraph with Synthetic Labels. We downloaded the uk web graph data from [4]. This web graph is a collection of UK web pages. For our experiments, we use a subset that contains 10M pages (i.e., nodes) and 23M hyperlinks (i.e., edges). We uniformly assign 10,000 synthetically generated labels across various nodes, such that each node gets one label. We test the scalability of our approach on this graph.

7.2 RDF Query Answering

In addition to the query shown in Figure 1, we show two more examples using the Freebase graph dataset.

Query 1: Who did cinematography for at least two Sheila McCarthy movies, one of them being Andre? The person was also the cinematographer of the movie Magic in the Water.
Here, we would like to emphasize that Sheila McCarthy did not act in the movie Andre. However, as discussed earlier, this type of inaccuracy is common, since the user may not have the accurate information, or there can be some noise in the target graph. Using our approach, we get the following top-2 answers for this query, as shown in Figure 10.

Query 2: Which actors have appeared in both a "John Waters" movie and a "Steven Spielberg" movie? The query and the corresponding top-2 matches are shown in Figure 11. Here, we would like to emphasize that actors in the Freebase dataset are not directly connected with the directors and cinematographers; rather, they are connected via some movies. To write a SPARQL query, we need to maintain this structural property. However, given the query graph shown in Figure 11, which does not maintain this structural property, we still obtain results where the embeddings are very close to the query graph.

Figure 12: Robustness of Network Alignment. Panels: (a) Accuracy (Intrusion); (b) Error Ratio (Freebase); (c) Error Ratio (Intrusion).

Figure 13: Convergence of Online Search Algorithm (DBLP). Panels: (a) Top-k Search (Algorithm 1), avg # of iterations; (b) Iterative Unlabel (Algorithm 2), avg # of iterations; (c) Online Search Time, search time (sec).

7.3 Network Alignment

We perform network alignment for query graphs of different sizes and in the presence of various amounts of noise. For these experiments, three different sets of query graphs are used with diameters 2, 3, 4 and the number of nodes 100, 150, 200, respectively. These query sets simulate the situation when we align a small social network to a large one. In each query set, we randomly select 100 subgraphs with the specified diameters and nodes from the original graph datasets. Then we introduce noise by adding edges to the query graphs that are not present in the original graph. The noise ratio is defined as the number of edges added divided by the original number of edges present in the query graph. We use propagation depth 2, and $\alpha$ is selected as described earlier in Section 3.3. The robustness of our approach in the presence of random noise is measured using two metrics.
The accuracy is defined as the number of correctly identified nodes of the target graph in all the top-1 matches divided by the total number of nodes in all query graphs in the corresponding query set. The accuracy is 1 for both DBLP and Freebase datasets with different amounts of noise, since these graphs have a larger number of distinct labels. The accuracy vs. noise ratio plot for the Intrusion dataset is shown in Figure 12(a). The accuracy remains at a relatively high level when the noise ratio increases up to 0.2.

We also measure the error ratio, which is defined as the number of incorrectly identified nodes of the target graph in all the top-1 matches divided by the total number of nodes in all query graphs in the corresponding query set. The lower the error ratio, the more distinguishable the nodes are in terms of their neighborhood structure and contents. The error ratio remains close to 0 for the DBLP graph at different amounts of noise. The error ratio vs. noise ratio plots for Freebase and Intrusion are shown in Figures 12(b) and 12(c), respectively. It can be observed that the error ratio remains at a relatively low level for the Freebase graph when the noise ratio increases up to 0.2. Hence, these experiments indicate that DBLP and Freebase are less automorphic compared to the Intrusion network.

7.4 Efficiency Results

We provide the running time of our algorithm for different datasets in Table 1. For these experiments, we randomly select query graphs with 50 nodes and diameter 2 from the original graph datasets. The vectorization and indexing are performed with propagation depth 2, and the search algorithm is used to identify the top-1 matches. It can be observed that our algorithm is very efficient for large graph datasets. The on-line phase for the Intrusion graph requires more time because the average number of labels per node is much higher than that in other graphs. This leads to more time used for cost computation (Eq. (7)). We also verify the convergence rate of our Top-k Search and Iterative Unlabel algorithms for the various network alignment experiments discussed earlier.
The convergence rate of these algorithms is measured as the average number of iterations required before they terminate. When the noise ratio is increased, our algorithm requires more iterations to satisfy the cost threshold. Thus, the corresponding running time also increases, as shown in Figure 13 for the DBLP dataset. Moreover, it requires more time to identify the matches of a larger query graph. The convergence plots for Freebase and Intrusion networks are given in Figure 14.

Figure 14: Convergence of Online Search Algorithm (Freebase & Intrusion). Panels: (a) Convergence (Freebase), avg # of iterations; (b) Search Time (Freebase), search time (sec); (c) Convergence (Intrusion), avg # of iterations; (d) Search Time (Intrusion), search time (sec).

Table 1: Efficiency: Off-line Indexing and Online Search

Dataset                        2-hop Indexing (Off-line)    Top-1 Search (Online)
DBLP (0.7M, 7M, 0.7M)          1,733 sec                    0.06 sec
Freebase (0.2M, 0.6M, 0.2M)    280 sec                      0.22 sec
Intrusion (0.2M, 0.5M, 1K)     227 sec                      1.6 sec
WebGraph (10M, 23M, 10K)       5, 25 sec                    0.26 sec

7.5 Neighborhood-based Cost Function Properties

Recall that we proved in Theorem 1 that our neighborhood-based cost function ensures there are no false negatives when the cost threshold is set to 0. In this subsection, we investigate the false positive rate of our neighborhood-based cost function with the threshold set to 0. This experiment is performed on the DBLP, Freebase and Intrusion datasets. In particular, for each dataset, we select 100 small query subgraphs with 10 nodes each from the original graph. For each of the query graphs, by using 2-hop propagation, we identify all matches with cost = 0. Among these matches, we manually verify if there are any false positives, i.e., matches which are not graph isomorphic with the query graph. The percentage of false positives is calculated as the number of false positives divided by the total number of matches obtained. We show the results in Table 2. It can be seen that, using our cost function with the cost threshold set to 0, the percentage of false positives on real-life social/information networks is very small.

Table 2: False Positive Ratio

Dataset      False Positive
DBLP         0%
Freebase     0%
Intrusion    0.3%

Table 3: Benefits of Index and Optimization

Dataset      Search with Index & Optimization    Search w/o Index & Optimization
DBLP         0.06 sec                            9.63 sec
Freebase     0.22 sec                            1.75 sec

As we have discussed earlier, the higher the value of $h$ is, the lower the number of false positives will be.
Therefore, for a target graph, we can employ the error ratio as a cost function and learn a satisfactory value of $h$ from training queries generated from the target graph. The DBLP graph is used in this experiment. We use a training set of 100 small query graphs (with 10 nodes each) generated from the DBLP graph. The queries are generated in such a way that the labels in the query nodes are mostly not unique. Some noise is also added in these query graphs as explained earlier. Next, we start with $h = 0$ and gradually increase $h$ until the error ratio becomes less than a small value. We show the results for the DBLP graph in Figure 15. It can be observed that, by setting $h = 2$, we can reduce the error ratio to an acceptable level when the noise ratio is below 0.1. This indicates that for real-life social/information networks with few automorphisms and many distinct labels, we only need a small propagation depth to make the error ratio close to zero.

7.6 Pruning Capacity of Search Algorithm

We verify the pruning capacity of our Top-k search algorithm with respect to the number of distinct labels present in the target graph. For this experiment, we use a subgraph extracted from the WebGraph dataset, which contains 1,000 nodes and 4,067 edges. We vary the number of distinct labels from 1 to 800. Given a randomly extracted query graph with the number of nodes $|V_Q|$ = 8, 10 and 12, respectively, we check how many subgraphs need to be verified during the final match phase of our approach. The smaller this number is, the more powerful the pruning of our algorithm is. We plot the number of subgraphs that need to be verified in the final match phase vs. the number of distinct labels in Figure 16. Note that the Y axis is in log scale. It can be observed that, when there is only 1 distinct label in the entire graph, we need to verify about $10^{25}$ subgraphs for a query graph with 8 nodes during the final match phase. However, as the number of distinct labels increases, the number of subgraphs that we need to verify decreases rapidly. For 800 distinct labels, we only need to verify a very small number of subgraphs (e.g., 2 subgraphs when $|V_Q| = 8$) in the final match phase of our approach.
Thus, our algorithm can be very efficient on graphs with few automorphisms and many distinct labels.

7.7 Indexing and Query Optimization

In Table 3, we compare the running time of our online search algorithm with that of a linear scan with no indexing and query optimization. Each of the query graphs has 50 nodes and diameter 2 for this experiment. It can be observed that our indexing and query optimization techniques can significantly speed up online search. We also compare the index construction time of dynamic update with the cost of rebuilding the whole index when the target graph is modified. The propagation depth is 2 for these experiments. The results for the DBLP dataset are shown in Figure 17. As we can see, for a wide range of updates in the target graph, it is more efficient to update the index structure rather than re-indexing the graph. The results also indicate that our index structure is very efficient against dynamic updates in the target graph.

Figure 15: Satisfactory h Value (DBLP). X-axis: propagation depth; Y-axis: error ratio; one curve per noise ratio.

Figure 16: Pruning Capacity (WebGraph). X-axis: # of distinct labels; Y-axis: # of subgraphs ($10^x$); curves for $|V_Q|$ = 8, 10 and 12.

Figure 17: Dynamic Update Index (DBLP). X-axis: % node update; Y-axis: time (sec); curves: Dynamic Update, Re-Index.

7.8 Scalability

We show the scalability of our approach on the WebGraph dataset. The vectorization time as a function of the number of nodes in the graph is shown in Figure 18(a). Figure 18(b) shows the trend of the online search time with respect to the number of nodes. The propagation depth is 2 for indexing, and we identify the top-1 matches using our search algorithm. Each of the query graphs has 10 nodes and diameter 3 for this experiment. As can be observed, for a graph with 10 million nodes, our approach can return the top-1 match in 0.1 second. The corresponding index building time is also tolerable. Both the index building time and the online search time are roughly linear in the number of nodes. These results show that our technique is highly scalable for large scale information/social networks.

Figure 18: Scalability Results (WebGraph). Panels: (a) Vectorization Time; (b) Search Time. X-axis: # of nodes (M); Y-axis: time (sec).

8. RELATED WORK

Graph search has been studied in different contexts such as graph isomorphism, graph indexing, structure matching, etc. In XML, where the structures encountered are often trees and lattices, queries built on path expressions became popular [28] and their corresponding indices have been developed [9]. In bioinformatics, exact and approximate graph alignment has been extensively studied, e.g., PathBlast [2], Saga [33], NetAlign [23], IsoRank [32]. They target relatively small biological networks with less than 10k nodes. It is difficult to apply them in social and information networks with thousands or even millions of nodes. Kernel based graph matching techniques have also been proposed, e.g., common walks [6, 8], shortest paths [5], limited-size subgraphs [9] and subtree patterns [20]. Recently, Shervashidze et al. [25] proposed a fast subtree pattern kernel based on the Weisfeiler-Lehman method. Kernel methods do not support subgraph search well.

For subgraph search, Shasha et al. [3] extend the path-based technique for full-scale graph retrieval; Yan et al. propose gIndex [37] using frequent subgraphs. These studies inspired new graph index structures such as δ-tolerance Closed Frequent Subgraphs [8], Tree+Δ [40], and GCoding [4]. He et al. [7] develop a closure tree index to perform approximate graph search. Tian et al. [33] design a fragment based index to assemble an approximate match. Shang et al. introduce an efficient algorithm for testing subgraph isomorphism [29]. Ferro et al. propose a novel indexing scheme, SING [26], based on locality information. All these methods are built strictly on graph structures and are not well suited to the approximate search shown in Figure 1.

There have been significant studies on inexact graph matching on attributed graphs [30, 7]. Tong et al. [35] propose best-effort pattern matching in large attributed graphs. It finds the best match based not on the proximity among the labels, but on the shape of the query graph. Tian et al. [34] proposed an approximate subgraph matching tool, called TALE, with efficient indexing and high pruning capabilities. Mongiovì et al. introduce a set-cover-based inexact graph matching technique, called SIGMA [24]. Both techniques only use edge misses to measure the quality of graph matching. Therefore, they are not appropriate for the proximity based search scenario studied in this work. There has been some recent work on inexact graph matching, i.e., simulation based cubic time graph pattern matching [3], homomorphism based subgraph matching [4], belief propagation based net alignment [3], an edge-edit-distance based subgraph indexing technique [39] and a graph partition based subgraph identification scheme [6].

9. CONCLUSIONS

In this paper, we defined a new graph similarity measure, neighborhood based graph similarity, and proposed an information propagation model to convert a large network into a set of multidimensional vectors, where sophisticated indexing and similarity search algorithms are available. We proved, under this measure, that subgraph similarity search is NP hard, while graph similarity match is polynomial. We introduced a criterion to select the best propagation rate with respect to different node labels in a graph. We further investigated the techniques to index the neighborhood vectors and to compress them by deleting non-discriminative labels, thus optimizing the query processing time. The proposed method, called Ness, is not only efficient, but also robust against structure changes and information loss. Empirical results show that it can quickly and accurately find high-quality matches in large networks, with negligible time cost. In future work, it will be interesting to consider the graph alignment problem when the node labels in two graphs are not exactly identical, e.g., the same user can have slightly different usernames in Facebook and Twitter.


More information

Linear Algebra Introduction

Linear Algebra Introduction Introdution Wht is Liner Alger out? Liner Alger is rnh of mthemtis whih emerged yers k nd ws one of the pioneer rnhes of mthemtis Though, initilly it strted with solving of the simple liner eqution x +

More information

Solutions to Assignment 1

Solutions to Assignment 1 MTHE 237 Fll 2015 Solutions to Assignment 1 Problem 1 Find the order of the differentil eqution: t d3 y dt 3 +t2 y = os(t. Is the differentil eqution liner? Is the eqution homogeneous? b Repet the bove

More information

Nondeterministic Automata vs Deterministic Automata

Nondeterministic Automata vs Deterministic Automata Nondeterministi Automt vs Deterministi Automt We lerned tht NFA is onvenient model for showing the reltionships mong regulr grmmrs, FA, nd regulr expressions, nd designing them. However, we know tht n

More information

Tutorial Worksheet. 1. Find all solutions to the linear system by following the given steps. x + 2y + 3z = 2 2x + 3y + z = 4.

Tutorial Worksheet. 1. Find all solutions to the linear system by following the given steps. x + 2y + 3z = 2 2x + 3y + z = 4. Mth 5 Tutoril Week 1 - Jnury 1 1 Nme Setion Tutoril Worksheet 1. Find ll solutions to the liner system by following the given steps x + y + z = x + y + z = 4. y + z = Step 1. Write down the rgumented mtrix

More information

Introduction to Olympiad Inequalities

Introduction to Olympiad Inequalities Introdution to Olympid Inequlities Edutionl Studies Progrm HSSP Msshusetts Institute of Tehnology Snj Simonovikj Spring 207 Contents Wrm up nd Am-Gm inequlity 2. Elementry inequlities......................

More information

Comparing the Pre-image and Image of a Dilation

Comparing the Pre-image and Image of a Dilation hpter Summry Key Terms Postultes nd Theorems similr tringles (.1) inluded ngle (.2) inluded side (.2) geometri men (.) indiret mesurement (.6) ngle-ngle Similrity Theorem (.2) Side-Side-Side Similrity

More information

22: Union Find. CS 473u - Algorithms - Spring April 14, We want to maintain a collection of sets, under the operations of:

22: Union Find. CS 473u - Algorithms - Spring April 14, We want to maintain a collection of sets, under the operations of: 22: Union Fin CS 473u - Algorithms - Spring 2005 April 14, 2005 1 Union-Fin We wnt to mintin olletion of sets, uner the opertions of: 1. MkeSet(x) - rete set tht ontins the single element x. 2. Fin(x)

More information

18.06 Problem Set 4 Due Wednesday, Oct. 11, 2006 at 4:00 p.m. in 2-106

18.06 Problem Set 4 Due Wednesday, Oct. 11, 2006 at 4:00 p.m. in 2-106 8. Problem Set Due Wenesy, Ot., t : p.m. in - Problem Mony / Consier the eight vetors 5, 5, 5,..., () List ll of the one-element, linerly epenent sets forme from these. (b) Wht re the two-element, linerly

More information

6.5 Improper integrals

6.5 Improper integrals Eerpt from "Clulus" 3 AoPS In. www.rtofprolemsolving.om 6.5. IMPROPER INTEGRALS 6.5 Improper integrls As we ve seen, we use the definite integrl R f to ompute the re of the region under the grph of y =

More information

Generalization of 2-Corner Frequency Source Models Used in SMSIM

Generalization of 2-Corner Frequency Source Models Used in SMSIM Generliztion o 2-Corner Frequeny Soure Models Used in SMSIM Dvid M. Boore 26 Mrh 213, orreted Figure 1 nd 2 legends on 5 April 213, dditionl smll orretions on 29 My 213 Mny o the soure spetr models ville

More information

Maintaining Mathematical Proficiency

Maintaining Mathematical Proficiency Nme Dte hpter 9 Mintining Mthemtil Profiieny Simplify the epression. 1. 500. 189 3. 5 4. 4 3 5. 11 5 6. 8 Solve the proportion. 9 3 14 7. = 8. = 9. 1 7 5 4 = 4 10. 0 6 = 11. 7 4 10 = 1. 5 9 15 3 = 5 +

More information

Section 1.3 Triangles

Section 1.3 Triangles Se 1.3 Tringles 21 Setion 1.3 Tringles LELING TRINGLE The line segments tht form tringle re lled the sides of the tringle. Eh pir of sides forms n ngle, lled n interior ngle, nd eh tringle hs three interior

More information

Engr354: Digital Logic Circuits

Engr354: Digital Logic Circuits Engr354: Digitl Logi Ciruits Chpter 4: Logi Optimiztion Curtis Nelson Logi Optimiztion In hpter 4 you will lern out: Synthesis of logi funtions; Anlysis of logi iruits; Tehniques for deriving minimum-ost

More information

Review Topic 14: Relationships between two numerical variables

Review Topic 14: Relationships between two numerical variables Review Topi 14: Reltionships etween two numeril vriles Multiple hoie 1. Whih of the following stterplots est demonstrtes line of est fit? A B C D E 2. The regression line eqution for the following grph

More information

15-451/651: Design & Analysis of Algorithms December 3, 2013 Lecture #28 last changed: November 28, 2013

15-451/651: Design & Analysis of Algorithms December 3, 2013 Lecture #28 last changed: November 28, 2013 15-451/651: Design & nlysis of lgorithms Deemer 3, 2013 Leture #28 lst hnged: Novemer 28, 2013 Lst time we strted tlking out mehnism design: how to llote n item to the person who hs the mximum vlue for

More information
