arxiv: v1 [cs.db] 30 May 2012


Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases

Ye Yuan, Guoren Wang, Lei Chen, Haixun Wang
College of Information Science and Engineering, Northeastern University, China
Hong Kong University of Science and Technology, Hong Kong, China
Microsoft Research Asia, Beijing, China

ABSTRACT
Many studies have been conducted on seeking efficient solutions for subgraph similarity search over certain (deterministic) graphs due to its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. However, in reality, graphs are often noisy and uncertain due to various factors, such as errors in data extraction, inconsistencies in data integration, and privacy-preserving purposes. Therefore, in this paper, we study subgraph similarity search on large probabilistic graph databases. Different from previous works, which assume that edges in an uncertain graph are independent of each other, we study uncertain graphs where edge occurrences are correlated. We formally prove that subgraph similarity search over probabilistic graphs is #P-complete; thus, we employ a filter-and-verify framework to speed up the search. In the filtering phase, we develop tight lower and upper bounds of subgraph similarity probability based on a probabilistic matrix index, PMI. PMI is composed of discriminative subgraph features associated with tight lower and upper bounds of subgraph isomorphism probability. Based on PMI, we can filter out a large number of probabilistic graphs and maximize the pruning capability. During the verification phase, we develop an efficient sampling algorithm to validate the remaining candidates. The efficiency of our proposed solutions has been verified through extensive experiments.

1. INTRODUCTION
Graphs have been used to model various data in a wide range of applications, such as bioinformatics, social network analysis, and RDF data management.
Furthermore, in these real applications, due to noisy measurements, inference models, ambiguities of data integration, and privacy-preserving mechanisms, uncertainties are often introduced in the graph data. For example, in a protein-protein interaction (PPI) network, the pairwise interaction is derived from statistical models [5, 6, 20], and the STRING database is a public data source that contains PPIs with uncertain edges provided by statistical predictions. In a social network, probabilities can be assigned to edges to model the degree of influence or trust between two social entities [2, 25, 4]. In an RDF graph, uncertainties/inconsistencies are introduced in data integration where various data sources are integrated into RDF graphs [8, 24].

To model the uncertain graph data, a probabilistic graph model is introduced [27, 43, 2, 8, 24]. In this model, each edge is associated with an edge existence probability to quantify the likelihood that this edge exists in the graph, and edge probabilities are independent of each other. However, this probabilistic graph model is invalid in many real scenarios. For example, for uncertain PPI networks, the authors in [9, 28] first establish elementary interactions with probabilities between proteins, then use machine learning tools to predict other possible interactions based on the elementary links.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Articles from this volume were invited to present their results at The 38th International Conference on Very Large Data Bases, August 27th - 31st 2012, Istanbul, Turkey. Proceedings of the VLDB Endowment, Vol. 5, No. 9. Copyright 2012 VLDB Endowment.
The predictive results show that interactions are correlated, especially with high dependence among interactions at the same proteins. As another example, in communication networks or road networks, an edge probability is used to quantify the reliability of a link [8] or the degree of a traffic jam [6]. Obviously, there are correlations among the routing paths in these networks [6], i.e., a busy traffic path often blocks traffic in nearby paths. Therefore, it is necessary for a probabilistic graph model to consider the correlations that exist among edges or nodes.

Clearly, it is unrealistic to model the joint distribution for the entire set of nodes in a large graph, e.g., road and social networks. Thus, in this paper, we introduce joint distributions for local nodes. For example, in graph 001 of Figure 1, we give a joint distribution to measure the interactions (neighbor edges, see footnote 1) of the 3 nodes in a local neighborhood. The joint probability table (JPT) shows the joint distribution, and a probability in the JPT (the second row) is given as $\Pr(e_1 = 1, e_2 = 1, e_3 = 0) = 0.2$, where 1 denotes existence while 0 denotes nonexistence. For larger graphs, we have multiple joint distributions of nodes in small neighborhoods (in fact, these are marginal distributions). In real applications, these marginal distributions can be easily obtained. For example, the authors in [6] use sampling methods to estimate the traffic joint probability of nearby roads, and point out that the traffic joint probability follows a multi-Gaussian distribution. For PPI networks, the authors in [9, 28] establish marginal distributions using Bayesian prediction.

In this paper, we study subgraph similarity search over probabilistic graphs due to the wide usage of subgraph similarity search in many application fields, such as answering SPARQL queries (graphs) in RDF graph data [8, ], predicting complex biological interactions (graphs) [33, 9], and identifying vehicle routings (graphs) in road networks [8, 6]. In the following, we give the details about subgraph similarity search, our solutions and contributions.

Footnote 1: Neighbor edges are the edges that are incident to the same vertex or the edges of a triangle.
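The joint probability table described above can be pictured as a small lookup structure. The following is a minimal sketch, assuming a dictionary keyed by 0/1 assignments; the names `EDGES`, `JPT1` and `marginal` are hypothetical, and every probability except $\Pr(e_1 = 1, e_2 = 1, e_3 = 0) = 0.2$ is made up for illustration.

```python
# Hypothetical JPT over the neighbor edges (e1, e2, e3).
# Only the entry (1, 1, 0) -> 0.2 comes from the text; the other
# values are invented so that the table sums to 1.
EDGES = ("e1", "e2", "e3")
JPT1 = {
    (1, 1, 1): 0.30,
    (1, 1, 0): 0.20,
    (1, 0, 1): 0.10,
    (1, 0, 0): 0.10,
    (0, 1, 1): 0.10,
    (0, 1, 0): 0.10,
    (0, 0, 1): 0.05,
    (0, 0, 0): 0.05,
}


def marginal(jpt, edges, keep):
    """Sum out every edge that is not listed in `keep`."""
    idx = [edges.index(e) for e in keep]
    out = {}
    for assignment, prob in jpt.items():
        key = tuple(assignment[i] for i in idx)
        out[key] = out.get(key, 0.0) + prob
    return out


# Marginal distribution of e3 alone, e.g. Pr(e3 = 1).
e3_marginal = marginal(JPT1, EDGES, ("e3",))
```

A dictionary keeps each row of the table addressable by its edge assignment, which is the access pattern needed when tuples of two tables are later joined on a shared edge.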

Figure 1: Probabilistic graph database & query graph. (Panels: probabilistic graphs 001 and 002 with their joint probability tables JPT1 and JPT2, and the query graph q.)

1.1 Probabilistic Subgraph Matching
In this paper, we focus on threshold-based probabilistic subgraph similarity matching (T-PS) over a large set of probabilistic graphs. Specifically, let $D = \{g_1, g_2, \dots, g_n\}$ be a set of probabilistic graphs where edge existences are not independent but are given explicitly by joint distributions, $q$ be a query graph, and $\epsilon$ be a probability threshold. A T-PS query retrieves all graphs $g \in D$ such that the subgraph similarity probability (SSP) between $q$ and $g$ is at least $\epsilon$. We will formally define SSP later (Def 9).

We employ the possible world semantics [3, ], which has been widely used for modeling probabilistic databases, to explain the meaning of the returned results for subgraph similarity search. A possible world graph (PWG) of a probabilistic graph is a possible instance of the probabilistic graph. It contains all vertices and a subset of the edges of the probabilistic graph, and it has a weight which is obtained by joining the joint probability tables of all neighbor edges. Then, for a query graph $q$ and a probabilistic graph $g$, the probability that $q$ subgraph similarly matches $g$ is the summation of the weights of those PWGs of $g$ to which $q$ is subgraph similar. If $q$ is subgraph similar to a PWG $g'$, then $g'$ must contain a subgraph of $q$, say $q'$, such that the difference between $q$ and $q'$ is no larger than the user-specified error tolerance threshold $\delta$. In other words, $q$ is subgraph isomorphic to $g'$ after $q$ is relaxed with $\delta$ edges.

Example 1. Consider graph 002 in Figure 1. JPT1 and JPT2 give the joint distributions of neighbor edges $\{e_1, e_2, e_3\}$ and $\{e_3, e_4, e_5\}$, respectively. Figure 2 lists partial PWGs of probabilistic graph 002 and their weights. The weight of PWG (1) is obtained by joining tuple $t_1$ of JPT1 and tuple $t_2$ of JPT2, i.e., $\Pr(e_1 = 1, e_2 = 1, e_3 = 1, e_4 = 1, e_5 = 0) = \Pr(e_1 = 1, e_2 = 1, e_3 = 1) \cdot \Pr(e_3 = 1, e_4 = 1, e_5 = 0)$. Suppose the distance threshold is 1.
To decide if $q$ subgraph similarly matches probabilistic graph 002, we first find all of 002's PWGs that contain a subgraph whose difference from $q$ is no larger than 1. The results are PWGs (1), (2), (3) and (4), as shown in Figure 2, since we can delete edge a, b or c of $q$. Next, we add up the probabilities of these PWGs, which gives 0.45. If the query specifies a probability threshold of 0.4, then graph 002 is returned since 0.45 > 0.4.

Figure 2: Partial possible world graphs of probabilistic graph 002.

The above example gives a naive solution to T-PS query processing that needs to enumerate all PWGs of a probabilistic graph. This solution is very inefficient due to the exponential number of PWGs. Therefore, in this paper, we propose a filter-and-verify method to reduce the search space.

1.2 Overview of Our Approach
Given a set of probabilistic graphs $D = \{g_1, \dots, g_n\}$ and a query graph $q$, our solution performs T-PS query processing in three steps, namely, structural pruning, probabilistic pruning, and verification.

Structural Pruning. The idea of structural pruning is straightforward. If we remove all the uncertainty in a probabilistic graph, and $q$ is still not subgraph similar to the resulting graph, then $q$ cannot subgraph similarly match the original probabilistic graph. Formally, for $g \in D$, let $g^c$ denote the corresponding deterministic graph after we remove all the uncertain information from $g$. We have

Theorem 1. If $q \not\subseteq_{sim} g^c$, then $\Pr(q \subseteq_{sim} g) = 0$,

where $\subseteq_{sim}$ denotes the subgraph similar relationship (Def 8), and $\Pr(q \subseteq_{sim} g)$ denotes the subgraph similarity probability of $q$ to $g$.

Based on this observation, given $D$ and $q$, we can prune the database $D^c = \{g_1^c, \dots, g_n^c\}$ using conventional deterministic graph similarity matching methods. In this paper, we adopt the method in [38] to quickly compute the results. [38] uses a multi-filter composition strategy to prune a large number of graphs directly without performing pairwise similarity computation, which makes [38] more efficient than other graph similarity search algorithms [5, 4]. Assume the result is $SC_q^c = \{g^c \mid q \subseteq_{sim} g^c, g^c \in D^c\}$.
Then, its corresponding probabilistic graph set, $SC_q = \{g \mid g^c \in SC_q^c\}$, is the input for uncertain subgraph similarity matching in the next step.

Probabilistic Pruning. To further prune the results, we propose a Probabilistic Matrix Index (PMI), introduced later, for probabilistic pruning. For a given set of probabilistic graphs $D$ and its corresponding set of deterministic graphs $D^c$, we create a feature set $F$ from $D^c$, where each feature is a deterministic graph, i.e., $F \subset D^c$. In PMI, for each $g \in SC_q$, we can locate a set

$$D_g = \{\langle LowerB(f_j), UpperB(f_j) \rangle \mid f_j \subseteq_{iso} g^c, 1 \le j \le |F|\}$$

where $LowerB(f)$ and $UpperB(f)$ are the lower and upper bounds of the subgraph isomorphism probability of $f$ to $g$ (Def 6), denoted by $\Pr(f \subseteq_{iso} g)$. In this paper, $\subseteq_{iso}$ is used to denote subgraph isomorphism. If $f$ is not subgraph isomorphic to $g^c$, the entry is $\langle 0 \rangle$.

In the probabilistic filtering, we first determine the remaining graphs after $q$ is relaxed with $\delta$ edges, where $\delta$ is the subgraph distance threshold. Suppose the remaining graphs are $\{rq_1, \dots, rq_i, \dots, rq_a\}$. For each $rq_i$, we compute two features $f_i^1$ and $f_i^2$ in $D_g$ such that $rq_i \supseteq_{iso} f_i^1$ and $rq_i \subseteq_{iso} f_i^2$. Let $\Pr(q \subseteq_{sim} g)$ denote the subgraph similarity probability of $q$ to $g$ (Def 9). Then, we can calculate upper and lower bounds of $\Pr(q \subseteq_{sim} g)$ based on the values of $UpperB(f_i^1)$ and $LowerB(f_i^2)$ for $1 \le i \le a$, respectively. If the upper bound of $\Pr(q \subseteq_{sim} g)$ is smaller than the probability threshold $\epsilon$, $g$ is pruned. If the lower bound of $\Pr(q \subseteq_{sim} g)$ is not smaller than $\epsilon$, $g$ is in the final answers.
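The decision rule at the end of the filtering step can be sketched as a three-way test. This is a minimal illustration, assuming the two bounds of $\Pr(q \subseteq_{sim} g)$ have already been computed; the function name `filter_decision` and its string return values are hypothetical and not part of the paper.

```python
def filter_decision(lower, upper, eps):
    """Classify a candidate graph g from bounds on Pr(q sim g).

    lower, upper: lower and upper bounds of the subgraph similarity
                  probability (assumed to be given).
    eps:          the user-specified probability threshold.
    """
    if upper < eps:
        return "prune"    # g cannot reach the threshold
    if lower >= eps:
        return "answer"   # g is guaranteed to be in the final answers
    return "verify"       # bounds are inconclusive; go to verification


# A graph whose bounds straddle the threshold is passed on.
decisions = [
    filter_decision(0.10, 0.30, 0.4),
    filter_decision(0.45, 0.80, 0.4),
    filter_decision(0.20, 0.60, 0.4),
]
```

Only graphs classified as "verify" incur the cost of estimating the exact probability, which is why tighter bounds translate directly into fewer verifications.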

Verification. In this step, we calculate $\Pr(q \subseteq_{sim} g)$ for the query $q$ and each candidate answer $g$ remaining after probabilistic pruning, to make sure $g$ is really an answer, i.e., $\Pr(q \subseteq_{sim} g) \ge \epsilon$.

1.3 Contributions and Paper Organization
The main idea (contribution) of our approach is to use probabilistic checking against a feature-based index to discriminate most graphs. To achieve this, several challenges need to be addressed.

Challenge 1: Determine the best bounds of $\Pr(q \subseteq_{sim} g)$. For each $rq_i$, we can find many $f_i^1$s and $f_i^2$s in PMI; thus, a large number of bounds of $\Pr(q \subseteq_{sim} g)$ based on the combinations of $UpperB(f_i^1)$ and $LowerB(f_i^2)$ for $1 \le i \le a$ can be computed. In this paper, we convert the problem of computing the best upper bound into a set cover problem. Our contribution is to develop an efficient randomized algorithm to obtain the best lower bound using integer quadratic programming, which is presented in Section 3.

Challenge 2: Compute an effective $D_g$. An effective $D_g$ should consist of tight $UpperB(f)$ and $LowerB(f)$ whose values can be computed efficiently. As we will show later, calculating $\Pr(f \subseteq_{iso} g)$ is #P-complete, which increases the difficulty of computing an effective $D_g$. To address this challenge, we derive tight $UpperB(f)$ and $LowerB(f)$ by converting the problem of computing bounds into a maximum clique problem, and propose an efficient solution by combining the properties of conditional independence and graph theory, which is discussed in Section 4.1.

Challenge 3: Find the features that maximize pruning. Frequent subgraphs (mined from $D^c$) are commonly used as features in graph matching. However, it would be impractical to index all of them. Our goal is to maximize the pruning capability with a small number of features. To achieve this goal, we consider two criteria in selecting features: the size of the feature and the number of disjoint embeddings that the feature has. A feature of small size and with many embeddings is preferred. The details of feature selection are given in Section 4.2.
Challenge 4: Compute SSP efficiently. Though we are able to filter out a large number of probabilistic graphs, computing the exact SSP in the verification phase may still take quite some time and become the bottleneck in query processing. To address this issue, we develop an efficient sampling algorithm, based on Monte Carlo theory, to estimate SSP with high quality, which is presented in Section 5.

In addition, in Section 2, we formally define T-PS queries over probabilistic graphs and give the complexity of the problem. We discuss the results of performance tests on real data sets in Section 6 and the related works in Section 7. We conclude our work in Section 8.

2. PROBLEM DEFINITION
In this section, we define some necessary concepts and show the complexity of our problem. Table 1 summarizes the notations used in this paper.

Table 1: Notations

Symbol                          Description
D, SC_q, A_q                    the probabilistic database set
D^c, SC_q^c                     the deterministic database
g                               the probabilistic graph
ϵ                               the user-specified probability threshold
δ                               the subgraph distance threshold
f, q, g', g^c                   the deterministic graph
U = {rq_1, ..., rq_a}           the remaining graph set after q is relaxed with δ edges
LowerB(f), UpperB(f)            the lower and upper bounds of SIP
L_sim(q), U_sim(q)              the lower and upper bounds of SSP
Brq_i, Bf_i, Bc_i               the Boolean variables of query, embedding and cut
Ef, Ec                          the set of embeddings and cuts
IN                              the set of disjoint embeddings
F                               the feature set
Pr(x_ne)                        the joint probability distribution of neighbor edges
Pr(q ⊆iso g)                    the subgraph isomorphism probability between q and g
Pr(q ⊆sim g)                    the subgraph similarity probability between q and g

2.1 Problem Definition
Definition 1. (Deterministic Graph) An undirected deterministic graph (see footnote 2) $g^c$ is denoted as $(V, E, \Sigma, L)$, where $V$ is a set of vertices, $E$ is a set of edges, $\Sigma$ is a set of labels, and $L : V \cup E \to \Sigma$ is a function that assigns labels to vertices and edges. A set of edges are neighbor edges, denoted by $ne$, if they are incident to the same vertex or the edges form a triangle in $g^c$.

Footnote 2: In this paper, we consider undirected graphs, although it is straightforward to extend our methods to directed graphs.
For example, consider graph 001 in Figure 1. Edges $e_1$, $e_2$ and $e_3$ are neighbor edges, since they form a triangle. Consider graph 002 in Figure 1. Edges $e_3$, $e_4$ and $e_5$ are also neighbor edges, since they are incident to the same vertex.

Definition 2. (Probabilistic Graph) A probabilistic graph is defined as $g = (g^c, X_E)$, where $g^c$ is a deterministic graph, and $X_E$ is a binary random variable set indexed by $E$. An element $x_e \in X_E$ takes values 0 or 1, and denotes the existence possibility of edge $e$. A joint probability density function $\Pr(x_{ne})$ is assigned to each neighbor edge set, where $x_{ne}$ denotes the assignments restricted to the random variables of a neighbor edge set $ne$.

A probabilistic graph has uncertain edges but deterministic vertices. The probability function $\Pr(x_{ne})$ is given as a joint probability table of the random variables of $ne$. For example, the probabilistic graph 002 in Figure 1 has 2 joint probability tables associated with 2 neighbor edge sets, respectively.

Definition 3. (Possible World Graph) A possible world graph $g' = (V', E', \Sigma', L')$ is an instantiation of a probabilistic graph $g = ((V, E, \Sigma, L), X_E)$, where $V' = V$, $E' \subseteq E$, $\Sigma' \subseteq \Sigma$. We denote the instantiation from $g$ to $g'$ as $g \Rightarrow g'$.

Both $g'$ and $g^c$ are deterministic graphs, but a probabilistic graph $g$ corresponds to one $g^c$ and multiple possible world graphs. We use $PWG(g)$ to denote the set of all possible world graphs derived from $g$. For example, Figure 2 lists 4 possible world graphs of the probabilistic graph 002 in Figure 1.

Definition 4. (Conditional Independence) Let $X$, $Y$, and $Z$ be sets of random variables. $X$ is conditionally independent of $Y$ given $Z$ (denoted by $X \perp Y \mid Z$) in distribution $\Pr$ if

$$\Pr(X = x, Y = y \mid Z = z) = \Pr(X = x \mid Z = z) \Pr(Y = y \mid Z = z)$$

for all values $x \in dom(X)$, $y \in dom(Y)$ and $z \in dom(Z)$.

Following real applications [9, 28, 8, 6], we assume that any two disjoint subsets of Boolean variables, $X_A$ and $X_B$ of $X_E$, are conditionally independent given a subset $X_C$ ($X_A \perp X_B \mid X_C$), if there is a path from a vertex in $A$ to a vertex in $B$ passing through $C$. Then, the probability of a possible world graph $g'$ is given by

$$\Pr(g \Rightarrow g') = \prod_{ne \in NS} \Pr(x_{ne}) \tag{1}$$

where $NS$ is the collection of all sets of neighbor edges of $g$. For example, in probabilistic graph 002 of Figure 1, $\{e_1, e_2\} \perp \{e_4, e_5\} \mid e_3$. Clearly, for any possible world graph $g'$, we have $\Pr(g \Rightarrow g') > 0$ and $\sum_{g' \in PWG(g)} \Pr(g \Rightarrow g') = 1$, that is, each possible world graph has an existence probability, and the sum of these probabilities is 1.

Definition 5. (Subgraph Isomorphism) Given two deterministic graphs $g_1 = (V_1, E_1, \Sigma_1, L_1)$ and $g_2 = (V_2, E_2, \Sigma_2, L_2)$, we say $g_1$ is subgraph isomorphic to $g_2$ (denoted by $g_1 \subseteq_{iso} g_2$) if and only if there is an injective function $f : V_1 \to V_2$ such that: (1) for any $(u, v) \in E_1$, there is an edge $(f(u), f(v)) \in E_2$; (2) for any $u \in V_1$, $L_1(u) = L_2(f(u))$; (3) for any $(u, v) \in E_1$, $L_1(u, v) = L_2(f(u), f(v))$.

The subgraph $(V_3, E_3)$ of $g_2$ with $V_3 = \{f(v) \mid v \in V_1\}$ and $E_3 = \{(f(u), f(v)) \mid (u, v) \in E_1\}$ is called the embedding of $g_1$ in $g_2$. When $g_1$ is subgraph isomorphic to $g_2$, we also say that $g_1$ is a subgraph of $g_2$ and $g_2$ is a super-graph of $g_1$.

Definition 6. (Subgraph Isomorphism Probability) For a deterministic graph $f$ and a probabilistic graph $g$, we define their subgraph isomorphism probability (SIP) as

$$\Pr(f \subseteq_{iso} g) = \sum_{g' \in SUB(f, g)} \Pr(g \Rightarrow g') \tag{2}$$

where $SUB(f, g)$ is the set of $g$'s possible world graphs that are super-graphs of $f$, that is, $SUB(f, g) = \{g' \in PWG(g) \mid f \subseteq_{iso} g'\}$.

Definition 7. (Maximum Common Subgraph, MCS) Given two deterministic graphs $g_1$ and $g_2$, the maximum common subgraph of $g_1$ and $g_2$ is the largest subgraph of $g_2$ that is subgraph isomorphic to $g_1$, denoted by $mcs(g_1, g_2)$.

Definition 8. (Subgraph Distance) Given two deterministic graphs $g_1$ and $g_2$, the subgraph distance is $dis(g_1, g_2) = |g_1| - |mcs(g_1, g_2)|$. Here, $|g_1|$ and $|mcs(g_1, g_2)|$ denote the number of edges in $g_1$ and $mcs(g_1, g_2)$, respectively. For a distance threshold $\delta$, if $dis(g_1, g_2) \le \delta$, we say that $g_1$ is subgraph similar to $g_2$.
Note that, in this definition, subgraph distance only depends on the edge set difference, which is consistent with previous works on similarity search over deterministic graphs [38, 5, 30]. The operations on an edge consist of edge deletion, relabeling and insertion.

Definition 9. (Subgraph Similarity Probability) For a given query graph $q$, a probabilistic graph $g$ (see footnote 3) and a subgraph distance threshold $\delta$, we define their subgraph similarity probability as

$$\Pr(q \subseteq_{sim} g) = \sum_{g' \in SIM(q, g)} \Pr(g \Rightarrow g') \tag{3}$$

where $SIM(q, g)$ is the set of $g$'s possible world graphs that have subgraph distance to $q$ no larger than $\delta$, that is, $SIM(q, g) = \{g' \in PWG(g) \mid dis(q, g') \le \delta\}$.

Footnote 3: Without loss of generality, in this paper, we assume the query graph is a connected deterministic graph, and the probabilistic graph is connected.

Figure 3: The probabilistic graph g and query graph q constructed for $(y_1 \wedge y_2) \vee (y_1 \wedge y_2 \wedge y_3) \vee (y_2 \wedge y_3)$.

Figure 4: Probabilistic Matrix Index (PMI) & features of probabilistic graph database.

graph   f1             f2             f3
001     (0.55, 0.64)   (0.3, 0.48)    0
002     (0.42, 0.5)    (0.26, 0.58)   (0.08, 0.5)

Problem Statement. Given a set of probabilistic graphs $D = \{g_1, \dots, g_n\}$, a query graph $q$, and a probability threshold $\epsilon$ ($0 < \epsilon \le 1$), a subgraph similarity query returns the set of probabilistic graphs $\{g \mid \Pr(q \subseteq_{sim} g) \ge \epsilon, g \in D\}$.

2.2 Problem Complexity
From the problem statement, we know that in order to answer probabilistic subgraph similarity queries efficiently, we need to calculate the SSP (subgraph similarity probability) efficiently. We now show the time complexity of calculating SSP.

Theorem 2. It is #P-complete to calculate the subgraph similarity probability.

Proof. Due to the space limit, we do not give the full proof and just highlight the major steps here. We consider a probabilistic graph whose edge probabilities are independent of each other. This probabilistic graph model is a special case of the probabilistic graph defined in Definition 2.
We prove the theorem by reducing an arbitrary instance of the #P-complete DNF counting problem [3] to an instance of the problem of computing $\Pr(q \subseteq_{sim} g)$ in polynomial time. Figure 3 illustrates a reduction for the DNF formula $F = (y_1 \wedge y_2) \vee (y_1 \wedge y_2 \wedge y_3) \vee (y_2 \wedge y_3)$. In the figure, the graph distance between $q$ and each possible world graph $g'$ is 1 (delete vertex $w$ from $q$). Each truth assignment to the variables in $F$ corresponds to a possible world graph $g'$ derived from $g$. The probability of each truth assignment equals the probability of the $g'$ that the truth assignment corresponds to. A truth assignment satisfies $F$ if and only if the $g'$ that the truth assignment corresponds to is subgraph similar to $q$ (suppose the graph distance is 1). Thus, $\Pr(F)$ is equal to the probability $\Pr(q \subseteq_{sim} g)$.

3. PROBABILISTIC PRUNING
As mentioned in Section 1.2, we first conduct structural pruning to remove probabilistic graphs that do not approximately contain the query graph $q$, and then we use probabilistic pruning techniques to further filter the remaining probabilistic graph set, named $SC_q$.

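3.1 Pruning Conditions
We first introduce an index structure, the Probabilistic Matrix Index (PMI), to facilitate probabilistic filtering. Each column of the matrix corresponds to a probabilistic graph in the database $D$, and each row corresponds to an indexed feature. Each entry records $\{LowerB(f), UpperB(f)\}$, where $UpperB(f)$ and $LowerB(f)$ are the upper and lower bounds of the subgraph isomorphism probability of $f$ to $g$, respectively.

Example 2. Figure 4 shows the PMI of the probabilistic graphs in Figure 1.

Given a query $q$, a probabilistic graph $g$ and a subgraph distance $\delta$, we generate a graph set $U = \{rq_1, \dots, rq_a\}$ by relaxing $q$ with $\delta$ edge deletions or relabelings (see footnote 4). Here, we use the solution proposed in [38] to generate $\{rq_1, \dots, rq_a\}$. Suppose we have built the PMI. For each $g \in SC_q$, in PMI, we locate

$$D_g = \{\langle LowerB(f_j), UpperB(f_j) \rangle \mid f_j \subseteq_{iso} g^c, 1 \le j \le |F|\}$$

For each $rq_i$, we find two graph features in $D_g$, $\{f_i^1, f_i^2\}$, such that $rq_i \supseteq_{iso} f_i^1$ and $rq_i \subseteq_{iso} f_i^2$, where $1 \le i \le a$. Then we have the following probabilistic pruning conditions.

Pruning 1. (subgraph pruning) Given a probability threshold $\epsilon$ and $D_g$, if $\sum_{i=1}^{a} UpperB(f_i^1) < \epsilon$, then $g$ can be safely pruned from $SC_q$.

Pruning 2. (super-graph pruning) Given a probability threshold $\epsilon$ and $D_g$, if $\sum_{i=1}^{a} LowerB(f_i^2) - \sum_{1 \le i, j \le a} UpperB(f_i^2) UpperB(f_j^2) \ge \epsilon$, then $g$ is in the final answers, i.e., $g \in A_q$, where $A_q$ is the final answer set.

Footnote 4: According to the subgraph similarity search, insertion does not change the query graph.

Before proving the correctness of the above two pruning conditions, we first introduce a lemma about $\Pr(q \subseteq_{sim} g)$, which will be used in the proofs. Let $Brq_i$ be a Boolean variable, where $1 \le i \le a$; $Brq_i$ is true when $rq_i$ is subgraph isomorphic to $g^c$, and $\Pr(Brq_i)$ is the probability that $Brq_i$ is true. We have

Lemma 1.
$$\Pr(q \subseteq_{sim} g) = \Pr(Brq_1 \vee \dots \vee Brq_a). \tag{4}$$

Proof. From Definition 9, we have

$$\Pr(q \subseteq_{sim} g) = \sum_{g' \in SIM(q, g)} \Pr(g \Rightarrow g') \tag{5}$$

where $SIM(q, g)$ is the set of possible world graphs that have subgraph distance to $q$ no larger than $\delta$.

Let $d$ be the subgraph distance between $q$ and $g^c$. We divide $SIM(q, g)$ into $\delta - d + 1$ subsets (see footnote 5), $\{SM_0, \dots, SM_{\delta - d}\}$, such that a possible world graph in $SM_i$ has subgraph distance $d + i$ to $q$. Thus, from Equation 5, we get

$$\begin{aligned}
\Pr(q \subseteq_{sim} g) &= \sum_{g' \in SM_0 \cup \dots \cup SM_{\delta - d}} \Pr(g \Rightarrow g') \\
&= \sum_{0 \le j \le \delta - d} \; \sum_{g' \in SM_j} \Pr(g \Rightarrow g') - \sum_{0 \le j_1 < j_2 \le \delta - d} \; \sum_{g' \in SM_{j_1} \cap SM_{j_2}} \Pr(g \Rightarrow g') + \dots \\
&\quad + (-1)^{i-1} \sum_{0 \le j_1 < \dots < j_i \le \delta - d} \; \sum_{g' \in SM_{j_1} \cap \dots \cap SM_{j_i}} \Pr(g \Rightarrow g') + \dots \\
&\quad + (-1)^{\delta - d} \sum_{g' \in SM_0 \cap \dots \cap SM_{\delta - d}} \Pr(g \Rightarrow g').
\end{aligned} \tag{6}$$

Footnote 5: For $g \in SC_q$, we have $d \le \delta$, since the probabilistic graphs with $d > \delta$ have been filtered out in the deterministic pruning.

Let $L_i$, $0 \le i \le \delta - d$, be the graph set after $q$ is relaxed with $d + i$ edges, and let $BL_i$ be a Boolean variable that is true when at least one graph in $L_i$ is a subgraph of $g^c$. Consider the $i$-th item on the RHS of Equation 6. Let $A$ be the set composed of all graphs in the $i$ graph sets, and $B = BL_{j_1} \wedge \dots \wedge BL_{j_i}$ be the corresponding Boolean variable of $A$. The set $SM_{j_1} \cap \dots \cap SM_{j_i}$ contains all PWGs that have all graphs in $A$. Then, for the $i$-th item, we get

$$(-1)^{i-1} \sum_{0 \le j_1 < \dots < j_i \le \delta - d} \; \sum_{g' \in SM_{j_1} \cap \dots \cap SM_{j_i}} \Pr(g \Rightarrow g') = (-1)^{i-1} \sum_{0 \le j_1 < \dots < j_i \le \delta - d} \Pr(BL_{j_1} \wedge \dots \wedge BL_{j_i}). \tag{7}$$

Similarly, we can get the results for the other items. By replacing the corresponding items with these results in Equation 6, we get

$$\begin{aligned}
\Pr(q \subseteq_{sim} g) &= \sum_{0 \le j \le \delta - d} \Pr(BL_j) - \sum_{0 \le j_1 < j_2 \le \delta - d} \Pr(BL_{j_1} \wedge BL_{j_2}) + \dots \\
&\quad + (-1)^{i-1} \sum_{0 \le j_1 < \dots < j_i \le \delta - d} \Pr(BL_{j_1} \wedge \dots \wedge BL_{j_i}) + \dots + (-1)^{\delta - d} \Pr(BL_0 \wedge \dots \wedge BL_{\delta - d}).
\end{aligned} \tag{8}$$

Based on the Inclusion-Exclusion Principle [26], the RHS of Equation 8 is $\Pr(BL_0 \vee \dots \vee BL_{\delta - d})$. Clearly, $BL_0 \subseteq \dots \subseteq BL_{\delta - d}$; then

$$\Pr(BL_0 \vee \dots \vee BL_{\delta - d}) = \Pr(BL_{\delta - d}) = \Pr(Brq_1 \vee \dots \vee Brq_a).$$

Lemma 1 gives a method to compute SSP. Intuitively, the probability of $q$ being subgraph similar to $g$ equals the probability that at least one graph of the graph set $U = \{rq_1, \dots, rq_a\}$ is a subgraph of $g$, where $U$ is the remaining graph set after $q$ is relaxed with $\delta$ edges. With Lemma 1, we can formally prove the two pruning conditions.

Theorem 3. Given a probability threshold $\epsilon$ and $D_g$, if $\sum_{i=1}^{a} UpperB(f_i^1) < \epsilon$, then $g$ can be safely pruned from $SC_q$.

Proof. Since $rq_i \supseteq_{iso} f_i^1$, we have $Brq_1 \vee \dots \vee Brq_a \subseteq Bf_1^1 \vee \dots \vee Bf_a^1$, where $Bf_i^1$ is a Boolean variable denoting that $f_i^1$ is a subgraph of $g$, for $1 \le i \le a$. Based on Lemma 1, we obtain

$$\begin{aligned}
\Pr(q \subseteq_{sim} g) &= \Pr(Brq_1 \vee \dots \vee Brq_a) \le \Pr(Bf_1^1 \vee \dots \vee Bf_a^1) \\
&\le \Pr(Bf_1^1) + \dots + \Pr(Bf_a^1) \le UpperB(f_1^1) + \dots + UpperB(f_a^1) < \epsilon.
\end{aligned}$$

Then $g$ can be pruned.

Theorem 4. Given a probability threshold $\epsilon$ and $D_g$, if $\sum_{i=1}^{a} LowerB(f_i^2) - \sum_{1 \le i, j \le a} UpperB(f_i^2) UpperB(f_j^2) \ge \epsilon$, then $g \in A_q$, where $A_q$ is the final answer set.

Proof. Since $Brq_i \supseteq Bf_i^2$, we can show that

$$\begin{aligned}
\Pr(q \subseteq_{sim} g) &= \Pr(Brq_1 \vee \dots \vee Brq_a) \ge \Pr(Bf_1^2 \vee \dots \vee Bf_a^2) \\
&\ge \sum_{i=1}^{a} \Pr(Bf_i^2) - \sum_{1 \le i, j \le a} \Pr(Bf_i^2) \Pr(Bf_j^2) \\
&\ge \sum_{i=1}^{a} LowerB(f_i^2) - \sum_{1 \le i, j \le a} UpperB(f_i^2) UpperB(f_j^2) \ge \epsilon.
\end{aligned}$$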

Then $g \in A_q$.

Figure 5: Obtain tightest $U_{sim}(q)$. (Panels: relaxed graphs $rq_1$, $rq_2$, $rq_3$; features $f_1$, $f_2$, $f_3$ with $S_1 = \{rq_1, rq_2\}$, $W(S_1) = 0.4$; $S_2 = \{rq_2, rq_3\}$, $W(S_2) = 0.1$; $S_3 = \{rq_1, rq_3\}$, $W(S_3) = 0.5$.)

Note that the pruning process needs to address the traditional subgraph isomorphism problem ($rq \supseteq_{iso} f$ or $rq \subseteq_{iso} f$). In our work, we implement the state-of-the-art method VF2 [0].

3.2 Obtain Tightest Bounds of Subgraph Similarity Probability
In the pruning conditions, for each $rq_i$ ($1 \le i \le a$), we find only one pair of features $\{f_i^1, f_i^2\}$, among the $|F|$ features, such that $rq_i \supseteq_{iso} f_i^1$ and $rq_i \subseteq_{iso} f_i^2$. Then we compute the upper bound $U_{sim}(q) = \sum_{i=1}^{a} UpperB(f_i^1)$ and the lower bound $L_{sim}(q) = \sum_{i=1}^{a} LowerB(f_i^2) - \sum_{1 \le i, j \le a} UpperB(f_i^2) UpperB(f_j^2)$. However, there are many $f_i^1$s and $f_i^2$s satisfying the conditions among the $|F|$ features; therefore, we can compute a large number of $U_{sim}(q)$s and $L_{sim}(q)$s. For each $rq_i$, if we find $x$ features meeting the needs among the $|F|$ features, we can derive $x^a$ $U_{sim}(q)$s. Let $x = 10$ and $a = 10$; then there are $10^{10}$ upper bounds. The same holds for $L_{sim}(q)$. Clearly, it is unrealistic to determine the best bounds by enumerating all the possible ones; thus, in this section, we give efficient algorithms to obtain the tightest $U_{sim}(q)$ and $L_{sim}(q)$.

3.2.1 Obtain Tightest $U_{sim}(q)$
For each $f_j$ ($1 \le j \le |F|$) in PMI, we determine a graph set $s_j$ that is a subset of $U = \{rq_1, \dots, rq_a\}$, such that $\forall rq_i \in s_j$, $rq_i \supseteq_{iso} f_j$. We also associate $s_j$ with a weight, $UpperB(f_j)$. Then we obtain $|F|$ sets $\{s_1, \dots, s_{|F|}\}$, with each set having a weight $w(s_j) = UpperB(f_j)$ for $1 \le j \le |F|$. With this mapping, we transform the problem of computing the tightest $U_{sim}(q)$ into a weighted set cover problem defined as follows.

Definition 10. (Tightest $U_{sim}(q)$) Given a finite set $U = \{rq_1, \dots, rq_a\}$ and a collection $S = \{s_1, \dots, s_j, \dots, s_{|F|}\}$ of subsets of $U$ with each $s_j$ attached a weight $w(s_j)$, we want to compute a subset $C \subseteq S$ to minimize $\sum_{s_j \in C} w(s_j)$ s.t. $\bigcup_{s_j \in C} s_j = U$.

It is well known that the set cover problem is NP-complete [3], so we use a greedy approach to approximate the tightest $U_{sim}(q)$. Algorithm 1 gives the detailed steps.
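The greedy strategy for the weighted set cover of Definition 10 can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation: the function name `greedy_upper_bound` and the dictionary-based input format are assumptions made for illustration, and the sets and weights are the ones shown for $S_1$, $S_2$, $S_3$ in Figure 5.

```python
def greedy_upper_bound(universe, sets, weights):
    """Greedy weighted set cover: approximate the tightest U_sim(q).

    universe: iterable of relaxed query graphs (here just their names).
    sets:     dict mapping a set name to the relaxed graphs it covers.
    weights:  dict mapping a set name to its weight UpperB(f_j).
    Returns (total weight of the chosen sets, list of chosen set names).
    """
    universe = set(universe)
    covered = set()
    total = 0.0
    chosen = []
    while covered != universe:
        best, best_ratio = None, float("inf")
        for name, members in sets.items():
            newly = (set(members) & universe) - covered
            if not newly:
                continue  # this set adds nothing new
            ratio = weights[name] / len(newly)  # cost per new element
            if ratio < best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            raise ValueError("the given sets do not cover the universe")
        covered |= set(sets[best]) & universe
        total += weights[best]
        chosen.append(best)
    return total, chosen


# Sets and weights as read from Figure 5.
U = {"rq1", "rq2", "rq3"}
S = {"s1": {"rq1", "rq2"}, "s2": {"rq2", "rq3"}, "s3": {"rq1", "rq3"}}
W = {"s1": 0.4, "s2": 0.1, "s3": 0.5}
bound, picked = greedy_upper_bound(U, S, W)
```

On this input the greedy choice first takes $s_2$ (lowest weight per newly covered graph) and then $s_1$, so the returned bound is the sum of their two weights.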
Assume the optimal value is $OPT$; the approximate value is within $OPT \cdot \ln |U|$ [2].

Algorithm 1 ObtainTightestU_sim(q)(U, S)
1: A ← ∅, U_sim(q) = 0;
2: while A is not a cover of U do
3:    for each s ∈ S, compute γ(s) = w(s) / |s − A|;
4:    choose an s with minimal γ(s);
5:    A ← A ∪ s;
6:    U_sim(q) += w(s);
7: end while
8: return U_sim(q);

Example 3. In Figure 1, suppose we use $q$ to query probabilistic graph 002, and the subgraph distance is 1. The relaxed graph set of $q$ is $U = \{rq_1, rq_2, rq_3\}$, as shown in Figure 5. Given indexed features $\{f_1, f_2, f_3\}$, we first determine $s_1 = \{rq_1, rq_2\}$, $s_2 = \{rq_2, rq_3\}$ and $s_3 = \{rq_1, rq_3\}$. We use $UpperB(f_j)$, $1 \le j \le 3$, as the weights of the three sets, and thus we have $w(s_1) = 0.4$, $w(s_2) = 0.1$ and $w(s_3) = 0.5$. Based on Definition 10, we obtain three $U_{sim}(q)$s, which are $0.4 + 0.1 = 0.5$, $0.4 + 0.5 = 0.9$ and $0.1 + 0.5 = 0.6$. Finally the smallest (tightest) value, 0.5, is used as the upper bound, i.e., $U_{sim}(q) = 0.5$.

3.2.2 Obtain Tightest $L_{sim}(q)$
For the lower bound $L_{sim}(q)$, the larger (tighter) $L_{sim}(q)$ is, the better the probabilistic pruning power is. Here we formalize the problem of computing the largest $L_{sim}(q)$ as an integer quadratic programming problem, and develop an efficient randomized algorithm to solve it.

For each $f_i$ ($1 \le i \le |F|$) in PMI, we determine a graph set $s_i$ that is a subset of $U = \{rq_1, \dots, rq_a\}$, such that $\forall rq_j \in s_i$, $rq_j \subseteq_{iso} f_i$. We associate $s_i$ with a pair of weights $\{LowerB(f_i), UpperB(f_i)\}$. Then we obtain $|F|$ sets $\{s_1, \dots, s_{|F|}\}$, with each set having a pair of weights $\{w_L(s_i), w_U(s_i)\}$ for $1 \le i \le |F|$. Thus the problem of computing the tightest $L_{sim}(q)$ can be formalized as follows.

Definition 11. (Tightest $L_{sim}(q)$) Given a finite set $U = \{rq_1, \dots, rq_a\}$ and a collection $S = \{s_1, \dots, s_{|F|}\}$ of subsets of $U$ with each $s_i$ attached a pair of weights $\{w_L(s_i), w_U(s_i)\}$, we want to compute a subset $C \subseteq \{s_1, \dots, s_{|F|}\}$ to maximize

$$\sum_{s_i \in C} w_L(s_i) - \sum_{s_i, s_j \in C} w_U(s_i) w_U(s_j) \quad \text{s.t.} \quad \bigcup_{s_i \in C} s_i = U.$$

Associate an indicator variable $x_{s_i}$ with each set $s_i \in S$, which takes value 1 if set $s_i$ is selected, and 0 otherwise. Then we want to:

$$\begin{aligned}
\text{Maximize} \quad & \sum_{s_i \in S} x_{s_i} w_L(s_i) - \sum_{s_i, s_j \in S} x_{s_i} x_{s_j} w_U(s_i) w_U(s_j) \\
\text{s.t.} \quad & \sum_{s_i : rq \in s_i} x_{s_i} \ge 1 \quad \forall rq \in U, \qquad x_{s_i} \in \{0, 1\}.
\end{aligned} \tag{9}$$

Equation 9 is an integer quadratic program, which is a hard problem [3]. We relax $x_{s_i}$ to take values within $[0, 1]$, i.e., $x_{s_i} \in [0, 1]$. Then the equation becomes a standard quadratic program (QP). Clearly, this QP is convex, and there is an efficient solution to solve the program [23]. Since all feasible solutions for Equation 9 are also feasible solutions for the relaxed quadratic program, the maximum value $QP(I)$ computed by the relaxed QP provides an upper bound for the value computed in Equation 9. Thus the value of $QP(I)$ can be used as the tightest lower bound. However, the proposed relaxation technique cannot give any theoretical guarantee on how tight $QP(I)$ is to Equation 9 [2].

Following the relaxed QP, we now propose a randomized rounding algorithm that yields an approximation bound for Equation 9. Algorithm 2 shows the detailed steps. According to Equation 9, it is not difficult to see that the more elements in $U$ are covered, the tighter $L_{sim}(q)$ is. The following theorem states that the number of covered elements of $U$ has a theoretical guarantee.

Theorem 5. When Algorithm 2 terminates, the probability that all elements are covered is at least $1 - \frac{1}{|U|}$.

Algorithm 2 ObtainTightestL_sim(q)(U, S)
1: C ← ∅, L_sim(q) = 0;
2: Let x*_s be an optimal solution to the relaxed QP;
3: for k = 1 to 2 ln|U| do
4:    Pick each s ∈ S independently with probability x*_s;
5:    if s is picked then
6:       C ← C ∪ s;
7:       L_sim(q) = L_sim(q) + w_L(s) − w_U(s) Σ_{l=1}^{|C|} w_U(s_l);
8:    end if
9: end for
10: return L_sim(q);

Figure 6: Obtain tightest $L_{sim}(q)$.

Proof. For an element $rq \in U$, the probability that $rq$ is not covered in an iteration is

$$\prod_{s : rq \in s} (1 - x^*_s) \le \prod_{s : rq \in s} e^{-x^*_s} = e^{-\sum_{s : rq \in s} x^*_s} \le \frac{1}{e}.$$

Then the probability that $rq$ is not covered at the end of the algorithm is at most $e^{-2 \ln |U|} = \frac{1}{|U|^2}$. Thus, the probability that there is some $rq$ that is not covered is at most $|U| \cdot \frac{1}{|U|^2} = \frac{1}{|U|}$.

Example 4. In Figure 1, suppose we use $q$ to query probabilistic graph 002, and the subgraph distance is 1. The relaxed graph set of $q$ is $U = \{rq_1, rq_2, rq_3\}$, shown in Figure 6. Given indexed features $\{f_1, f_2\}$, we first determine $s_1 = \{rq_1\}$ and $s_2 = \{rq_1, rq_2, rq_3\}$. Then we use $\{LowerB(f_i), UpperB(f_i)\}$, $1 \le i \le 2$, as weights, and thus we have $\{w_L(s_1) = 0.28, w_U(s_1) = 0.36\}$ and $\{w_L(s_2) = 0.08, w_U(s_2) = 0.5\}$. Based on Definition 11, we then assign $L_{sim}(q)$.

4. PROBABILISTIC MATRIX INDEX
In this section, we discuss how to obtain tight $\{LowerB(f), UpperB(f)\}$ and generate the features used in the probabilistic matrix index (PMI).

4.1 Bounds of Subgraph Isomorphism Probability

4.1.1 LowerB(f)
Let $Ef = \{f_1, \dots, f_{|Ef|}\}$ be the set of all embeddings (see footnote 6) of feature $f$ in the deterministic graph $g^c$, $Bf_i$ be a Boolean variable for $1 \le i \le |Ef|$ which indicates whether $f_i$ exists in $g^c$ or not, and $\Pr(Bf_i)$ be the probability that the embedding $f_i$ exists in $g$. Similar to Lemma 1, we have

$$\Pr(f \subseteq_{iso} g) = \Pr(Bf_1 \vee \dots \vee Bf_{|Ef|}). \tag{10}$$

Footnote 6: In this paper, we use the algorithm in [36] to compute the embeddings of a feature in $g^c$.

According to Theorem 2, it is not difficult to see that calculating the exact $\Pr(f \subseteq_{iso} g)$ is #P-complete. Thus we rewrite Equation 10 as follows:

$$\begin{aligned}
\Pr(f \subseteq_{iso} g) &= \Pr(Bf_1 \vee \dots \vee Bf_{|Ef|}) = 1 - \Pr(\overline{Bf_1} \wedge \dots \wedge \overline{Bf_{|Ef|}}) \\
&\ge 1 - \Pr(\overline{Bf_1} \wedge \dots \wedge \overline{Bf_{|IN|}} \mid \overline{Bf_{|IN|+1}} \wedge \dots \wedge \overline{Bf_{|Ef|}})
\end{aligned} \tag{11}$$
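where $IN = \{Bf_1, \ldots, Bf_{|IN|}\} \subseteq Ef$. Suppose the corresponding embeddings of $Bf_i$, $1 \le i \le |IN|$, have no common parts (edges). Since $g^c$ is connected, these $|IN|$ Boolean variables are conditionally independent given any random variable of $g$. Then Equation 11 is written as

$$Pr(f \subseteq_{iso} g) \ge 1 - Pr(\neg Bf_1 \wedge \ldots \wedge \neg Bf_{|IN|} \mid \neg Bf_{|IN|+1} \wedge \ldots \wedge \neg Bf_{|Ef|}) = 1 - \prod_{i=1}^{|IN|} \left[1 - Pr(Bf_i \mid \neg Bf_{|IN|+1} \wedge \ldots \wedge \neg Bf_{|Ef|})\right]. \qquad (12)$$

For variables $Bf_x, Bf_y \in \{Bf_{|IN|+1}, \ldots, Bf_{|Ef|}\}$, we have

$$Pr(Bf_i \mid \neg Bf_x \wedge \neg Bf_y) = \frac{Pr(Bf_i \wedge \neg Bf_x \wedge \neg Bf_y)}{Pr(\neg Bf_x \wedge \neg Bf_y)} = \frac{Pr(Bf_i \wedge \neg Bf_x \wedge \neg Bf_y)/Pr(\neg Bf_y)}{Pr(\neg Bf_x \wedge \neg Bf_y)/Pr(\neg Bf_y)} = \frac{Pr(Bf_i \wedge \neg Bf_x \mid \neg Bf_y)}{Pr(\neg Bf_x \mid \neg Bf_y)}. \qquad (13)$$

If $Bf_i$ and $Bf_x$ are conditionally independent given $Bf_y$, then

$$Pr(Bf_i \wedge \neg Bf_x \mid \neg Bf_y) = Pr(Bf_i \mid \neg Bf_y)\, Pr(\neg Bf_x \mid \neg Bf_y). \qquad (14)$$

By combining Equations 13 and 14, we obtain

$$Pr(Bf_i \mid \neg Bf_x \wedge \neg Bf_y) = Pr(Bf_i \mid \neg Bf_y). \qquad (15)$$

Based on this property, Equation 12 is reduced to

$$Pr(f \subseteq_{iso} g) \ge 1 - \prod_{i=1}^{|IN|} \left[1 - Pr(Bf_i \mid \neg Bf_{|IN|+1} \wedge \ldots \wedge \neg Bf_{|Ef|})\right] = 1 - \prod_{i=1}^{|IN|} \left[1 - Pr(Bf_i \mid \neg Bf_1 \wedge \ldots \wedge \neg Bf_{|C|})\right] = 1 - \prod_{i=1}^{|IN|} \left[1 - Pr(Bf_i \mid COR)\right] \qquad (16)$$

where $COR = \neg Bf_1 \wedge \ldots \wedge \neg Bf_{|C|}$, and the corresponding embedding of each $Bf_j \in C = \{Bf_1, \ldots, Bf_{|C|}\}$ overlaps with the corresponding embedding of $Bf_i$.

For a given $Bf_i$, $Pr(Bf_i \mid COR)$ is a constant, since the number of embeddings overlapping with $f_i$ in $g^c$ is constant. Now we obtain the lower bound of $Pr(f \subseteq_{iso} g)$ as

$$LowerB(f) = 1 - \prod_{i=1}^{|IN|} \left[1 - Pr(Bf_i \mid COR)\right], \qquad (17)$$

which is only dependent on the selected $|IN|$ embeddings that do not have common parts with each other.

To compute $Pr(Bf_i \mid COR)$, a straightforward approach is the following. We first join all the joint probability tables (JPT), and meanwhile multiply joint probabilities of joining tuples in JPTs.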

Figure 7: Embeddings & fg of feature $f_2$ in probabilistic graph 002 (left panel: embeddings EM1, EM2, EM3 of $f_2$ in 002; right panel: graph fg of embeddings)

Then, in the join result, we project on edge labels involved in $Bf_i$ and $COR$, and eliminate duplicates by summing up their existence probabilities. The summation is the final result. However, this solution is clearly time-inefficient because of the join, duplicate elimination, and probability multiplication.

In order to calculate $Pr(Bf_i \mid COR)$ efficiently, we use a sampling algorithm to estimate its value. Algorithm 3 shows the detailed steps. The main idea of the algorithm is as follows. We first sample a possible world $g'$. Then we check the condition, in Line 4, that is used to estimate $Pr(Bf_i \wedge COR)$, and the condition, in Line 7, that is used to estimate $Pr(COR)$. Finally we return $n_1/n_2$, which is an estimation of $Pr(Bf_i \wedge COR)/Pr(COR) = Pr(Bf_i \mid COR)$. The cycling number $m$ is set to $(4\ln\frac{2}{\xi})/\tau^2$ ($0 < \xi < 1$, $\tau > 0$) used in Monte Carlo theory [26].

Algorithm 3 CalculatePr(Bf_i | COR)(g, Bf_i, COR)
1: $n_1 = 0$, $n_2 = 0$;
2: for $i = 1$ to $m$ do
3:   Sample each neighbor edge set $ne$ of $g$ according to $Pr(x_{ne})$, and then obtain an instance $g'$;
4:   if $g'$ has embedding $f_i$ & no embeddings involved in $COR$ then
5:     $n_1 \mathrel{+}= 1$;
6:   end if
7:   if $g'$ has no embeddings involved in $COR$ then
8:     $n_2 \mathrel{+}= 1$;
9:   end if
10: end for
11: return $n_1/n_2$;

Example 5. In Figure …, consider $f_2$, a feature of probabilistic graph 002 shown in Figure 1. $f_2$ has three embeddings in 002, namely EM1, EM2 and EM3, as shown in Figure 7. Among the corresponding $Bf_i$'s, $Bf_1$ and $Bf_3$ are conditionally independent given $Bf_2$. Then based on Equation 17, we have $LowerB(f) = 1 - [1 - Pr(Bf_1 \mid \neg Bf_2)][1 - Pr(Bf_3 \mid \neg Bf_2)] = \ldots$

As stated earlier, $LowerB(f)$ depends on embeddings that do not have common parts. However, among all $|Ef|$ embeddings, there are many groups which contain disjoint embeddings and lead to different lower bounds. We want to get a tight lower bound in order to increase the pruning power. Next, we introduce how to obtain the tightest $LowerB(f)$.
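The estimator of Algorithm 3 can be sketched in Python as below. This is a simplified illustration, not the paper's code: it assumes edges exist independently (the paper samples correlated neighbor edge sets according to $Pr(x_{ne})$), it represents an embedding as a plain set of edge labels, and the names `sample_size` and `estimate_conditional` are hypothetical.

```python
import math
import random

def sample_size(xi, tau):
    """Number of samples m = 4 ln(2/xi) / tau^2, rounded up."""
    return math.ceil(4 * math.log(2 / xi) / tau ** 2)

def estimate_conditional(edge_prob, target, cor, m, rng=None):
    """Estimate Pr(Bf_i | COR) as n1 / n2 over m sampled possible worlds.

    edge_prob: dict edge -> existence probability (independent edges are an
               assumption of this sketch, not of the paper's model).
    target:    set of edges forming embedding f_i.
    cor:       list of edge sets, the embeddings that overlap f_i.
    """
    rng = rng or random.Random()
    n1 = n2 = 0
    for _ in range(m):
        # Sample one possible world: the set of edges that exist.
        world = {e for e, p in edge_prob.items() if rng.random() < p}
        if any(emb <= world for emb in cor):
            continue  # COR is violated: an overlapping embedding exists
        n2 += 1
        if target <= world:
            n1 += 1
    return n1 / n2 if n2 else float("nan")
```

For instance, `sample_size(0.1, 0.5)` asks for 48 worlds. Counting $n_1$ only inside the worlds that already satisfy $COR$ is equivalent to the two separate checks in Lines 4 and 7.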
Obtain Tightest Lower Bound

We construct an undirected graph, fg, with each node representing an embedding $f_i$, $1 \le i \le |Ef|$, and a link connecting two disjoint embeddings (nodes). Note that, to avoid confusion, nodes and links are used for fg, while vertices and edges are used for graphs. We also assign each node a weight, $-\ln[1 - Pr(Bf_i \mid COR)]$. In fg, a clique is a set of nodes such that any two nodes of the set are adjacent. We define the weight of a clique as the sum of node weights in the clique. Clearly, given a clique in fg with weight $v$, $LowerB(f)$ is $1 - e^{-v}$. Thus, the larger the weight, the tighter (larger) the lower bound. To obtain a tight lower bound, we should find a clique whose weight is largest, which is exactly the maximum weight clique problem. Here we use the efficient solution in [7] to solve the maximum clique problem, and the algorithm returns the largest weight $z$. Therefore, we use $1 - e^{-z}$ as the tightest value for $LowerB(f)$.

Example 6. Following Example 5, as shown in Figure 7, EM1 is disjoint with EM3. Based on the above discussion, we construct fg, for the three embeddings, shown in Figure 7. There are two maximum cliques, namely {EM1, EM3} and {EM2}. According to Equation 17, the lower bounds derived from the 2 maximum cliques are 0.26 and 0.1 respectively. Therefore we select the larger (tighter) value 0.26 to be the lower bound of $f_2$ in 002.

4.1.2 UpperB(f)

Firstly, we define Embedding Cut: For a feature $f$, an embedding cut is a set of edges in $g^c$ whose removal will cause the absence of all of $f$'s embeddings in $g^c$. An embedding cut is minimal if no proper subset of the embedding cut is an embedding cut. In this paper, we use minimal embedding cuts.

Denote an embedding cut by $c$ and its corresponding Boolean variable (same as $Bf$) by $Bc$, where $Bc$ being true indicates that the embedding cut $c$ exists in $g^c$. Similar to Equation 10, it is not difficult to obtain

$$Pr(f \subseteq_{iso} g) = 1 - Pr(Bc_1 \vee \ldots \vee Bc_{|Ec|}) = Pr(\neg Bc_1 \wedge \ldots \wedge \neg Bc_{|Ec|}) \qquad (18)$$

where $Ec = \{c_1, \ldots, c_{|Ec|}\}$ is the set of all embedding cuts of $f$ in $g^c$. Equation 18 shows that the subgraph isomorphism probability of $f$ to $g$ equals the probability of all of $f$'s embedding cuts disappearing in $g$.
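The clique-based selection of the tightest $LowerB(f)$ described above can be illustrated with a brute-force search, which is adequate only for a handful of embeddings. This sketch is an assumption-laden stand-in for the maximum weight clique solver cited as [7]: the function name `tightest_lower_bound`, the edge labels, and the conditional probabilities in the usage note are hypothetical and are not the values of Example 6.

```python
import math
from itertools import combinations

def tightest_lower_bound(embeddings, cond_prob):
    """Return (best clique, LowerB) by brute-force maximum weight clique.

    embeddings: dict name -> set of edges of that embedding.
    cond_prob:  dict name -> Pr(Bf_i | COR), assumed already estimated
                and strictly below 1.
    """
    names = sorted(embeddings)
    # Node weight in fg: -ln[1 - Pr(Bf_i | COR)].
    weight = {n: -math.log(1.0 - cond_prob[n]) for n in names}
    best, best_w = (), 0.0
    for r in range(1, len(names) + 1):
        for group in combinations(names, r):
            # A clique in fg: every pair of embeddings is edge-disjoint.
            if all(embeddings[a].isdisjoint(embeddings[b])
                   for a, b in combinations(group, 2)):
                w = sum(weight[n] for n in group)
                if w > best_w:
                    best, best_w = group, w
    # A clique of weight v gives LowerB = 1 - e^{-v}.
    return best, 1.0 - math.exp(-best_w)
```

With three embeddings shaped like those of Figure 7 (EM1 and EM3 disjoint, EM2 overlapping both) and made-up probabilities 0.2, 0.1 and 0.3, the search picks {EM1, EM3} and returns $1 - 0.8 \times 0.7 = 0.44$, which is larger than the 0.1 obtained from {EM2} alone.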
Similar to the deduction from Equation 10 to 17 for $LowerB(f)$, we can rewrite Equation 18 as follows

$$Pr(f \subseteq_{iso} g) = Pr(\neg Bc_1 \wedge \ldots \wedge \neg Bc_{|Ec|}) \le Pr(\neg Bc_1 \wedge \ldots \wedge \neg Bc_{|IN|} \mid \neg Bc_{|IN|+1} \wedge \ldots \wedge \neg Bc_{|Ec|}) = \prod_{i=1}^{|IN|} \left[1 - Pr(Bc_i \mid \neg Bc_{|IN|+1} \wedge \ldots \wedge \neg Bc_{|Ec|})\right] = \prod_{i=1}^{|IN|} \left[1 - Pr(Bc_i \mid \neg Bc_1 \wedge \ldots \wedge \neg Bc_{|D|})\right] = \prod_{i=1}^{|IN|} \left[1 - Pr(Bc_i \mid COM)\right] \qquad (19)$$

where $IN = \{Bc_1, \ldots, Bc_{|IN|}\}$ is a set of Boolean variables whose corresponding cuts are disjoint, $COM = \neg Bc_1 \wedge \ldots \wedge \neg Bc_{|D|}$, and the corresponding cut of each $Bc_j \in D = \{Bc_1, \ldots, Bc_{|D|}\}$ has common parts with the corresponding cut of $Bc_i$. Finally we obtain the upper bound as

$$UpperB(f) = \prod_{i=1}^{|IN|} \left[1 - Pr(Bc_i \mid COM)\right]. \qquad (20)$$

The upper bound only relies on the picked embedding cut set in which any two cuts are disjoint. The value of $Pr(Bc_i \mid COM)$ is estimated using Algorithm 3 by replacing embeddings with cuts.

Similar to the lower bound, computing the tightest $UpperB(f)$ can be converted into a maximum weight clique problem. However, different from the lower bound, each node of the constructed graph fg represents a cut and has a weight of $-\ln[1 - Pr(Bc_i \mid COM)]$ instead. Thus, for the maximum weight clique with weight $v$, the tightest value of $UpperB(f)$ is $e^{-v}$. Now we discuss how to determine embedding cuts in $g^c$.

Calculation of Embedding Cuts

We build a connection between embedding cuts in $g^c$ and cuts for two vertices in a deterministic graph.

Figure 8: Transformation from embeddings of $f_2$ to parallel graph cg (left panel: embeddings EM1, EM2, EM3 of $f_2$; right panel: parallel graph cg between nodes $s$ and $t$)

Suppose $f$ has $|Ef|$ embeddings in $g^c$, and each embedding has $k$ edges. Assign $k$ labels, $\{e_1, \ldots, e_k\}$, to the edges of each embedding (the order is random). We create a corresponding line graph for each embedding by (1) creating $k + 1$ isolated nodes, and (2) connecting these $k + 1$ nodes into a line by associating the $k$ edges (with corresponding labels) of the embedding. Based on these line graphs, we construct a parallel graph, cg. The node set of cg consists of all nodes of the $|Ef|$ line graphs and two new nodes, $s$ and $t$. The edge set of cg consists of all edges (with labels) of the $|Ef|$ line graphs. In addition, one edge (without label) is placed between an end node of each line graph and $s$. Similarly, there is an edge between $t$ and the other end node of each line graph. As a result, the $|Ef|$ embeddings are transformed into a deterministic graph cg. Based on this transformation, we have

Theorem 6. The embedding cut set of $g^c$ is also the cut set (without edges incident to $s$ and $t$) from $s$ to $t$ in cg.

In this work, we determine embedding cuts using the method in [22].

Example 7. Figure 8 shows the transformation for feature $f_2$ in graph 002 in Figure 1. In cg, we can find cuts $\{e_2, \ldots\}$, $\{e, \ldots, \ldots\}$ and $\{e_2, \ldots\}$, which are clearly the embedding cuts of $f_2$ in 002.

4.2 Feature Generation

We would like to select frequent and discriminative features to construct the probabilistic matrix index (PMI). To achieve this, we consider $UpperB(f)$ given in Equation 20, since the upper bound plays the most important role in the pruning capability. According to Equation 20, to get a tight upper bound, we need a large disjoint cut set and a large $Pr(Bc_i \mid COM)$. Suppose the cut set is $IN'$. Note that $|IN'| = |IN|$, since a cut in $IN'$ has a corresponding Boolean variable $Bc_i$ in $IN$. From the calculation of embedding cuts, it is not difficult to see that a large number of disjoint embeddings leads to a large $|IN'|$. Thus we would like a feature that has a large number of disjoint embeddings. Since $COM$ is small, a small size feature results in a large $Pr(Bc_i \mid COM)$.
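Returning to the embedding cuts used by $UpperB(f)$: a set of labeled edges separates $s$ from $t$ in the parallel graph cg exactly when it removes at least one edge from every line, so for small features the minimal embedding cuts can be enumerated as minimal hitting sets of the embeddings. The sketch below is a brute-force illustration under that reading, not the cut enumeration method of [22]; the function name `minimal_embedding_cuts` and the edge labels in the usage note are hypothetical.

```python
from itertools import combinations

def minimal_embedding_cuts(embeddings):
    """Enumerate minimal edge sets that intersect every embedding.

    embeddings: list of sets of edge labels, one set per embedding of f.
    """
    edges = sorted(set().union(*embeddings))
    cuts = []
    # Candidates are visited by increasing size, so any non-minimal
    # candidate contains a smaller cut that was recorded earlier.
    for r in range(1, len(edges) + 1):
        for cand in combinations(edges, r):
            cand = set(cand)
            hits_all = all(cand & emb for emb in embeddings)
            if hits_all and not any(c <= cand for c in cuts):
                cuts.append(cand)
    return cuts
```

For three hypothetical embeddings `{e1, e2}`, `{e2, e3}` and `{e3, e4}`, the function returns the cuts `{e1, e3}`, `{e2, e3}` and `{e2, e4}`. The running time is exponential in the number of distinct edges, which is why a dedicated cut enumeration algorithm is preferable in practice.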
In summary, we should index a feature which complies with the following rules:

Rule 1. Select features that have a large number of disjoint embeddings.
Rule 2. Select small size features.

To achieve rule 1, we define the frequency of feature $f$ as

$$frq(f) = \frac{|\{g \mid f \subseteq_{iso} g^c,\ |IN|/|Ef| \ge \alpha,\ g \in D\}|}{|D|},$$

where $\alpha$ is a threshold of the ratio of disjoint embeddings among all embeddings. Given a frequency threshold $\beta$, a feature $f$ is frequent iff $frq(f) \ge \beta$. Thus we would like to index frequent features. To achieve rule 2, we control the feature size used in Algorithm 4. To control the feature number [37, 29], we also define the discriminative measure as

$$dis(f) = \frac{|\bigcap \{D_{f'} \mid f' \subseteq_{iso} f\}|}{|D_f|},$$

where $D_f$ is the list of probabilistic graphs $g$ s.t. $f \subseteq_{iso} g^c$. Given a discriminative threshold $\gamma$, a feature $f$ is discriminative iff $dis(f) > \gamma$. Thus we should also select discriminative features.

Based on the above discussion, we select frequent and discriminative features, which is implemented in Algorithm 4. In this algorithm, we first initialize a feature set $F$ with a single edge or vertex (line 1-4). Then we increase the feature size (number of vertices) from 1, and pick out desirable features (line 6-9). $maxL$ is used to control the feature size, and guarantees picking out small size features satisfying rule 2. $frq(f)$ and $dis(f)$ are used to measure the frequency and discrimination of a feature. The controlling parameters $\alpha$, $\beta$ and $\gamma$ guarantee picking out features satisfying rule 1. The default values of the parameters are usually set to 0.1 [37, 38].

Algorithm 4 FeatureSelection(D, α, β, γ, maxL)
1: $F \leftarrow \emptyset$;
2: Initialize a feature set $F$ with a single edge or vertex;
3: $D_f \leftarrow \{g \mid f \subseteq_{iso} g^c\}$;
4: $F \leftarrow F \cup \{f\}$;
5: for $i = 1$ to $maxL$ do
6:   for each feature $f$ with $i$ vertices do
7:     if $frq(f) \ge \beta$ & $dis(f) > \gamma$ then
8:       $D_f \leftarrow \{g \mid f \subseteq_{iso} g^c\}$;
9:       $F \leftarrow F \cup \{f\}$;
10:     end if
11:   end for
12: end for
13: return $F$;

5. VERIFICATION

In this section, we present the algorithms to compute the subgraph similarity probability (SSP) of a candidate probabilistic graph $g$ to $q$. Equation 4 is the formula to compute SSP. By simplifying this equation, we have

$$Pr(q \subseteq_{sim} g) = \sum_{i=1}^{a} (-1)^{i-1} \sum_{J \subseteq \{1, \ldots, a\},\ |J| = i} Pr\Big(\bigwedge_{j \in J} Brq_j\Big). \qquad (21)$$
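To see why the exact computation is costly, the inclusion-exclusion expansion of Equation 21 can be sketched in a few lines of Python. This is an illustrative simplification rather than the paper's Exact method: each event is modeled as "all edges of one edge set exist", edges are assumed independent so that a conjunction has a closed-form probability, and the function name `union_probability` is hypothetical.

```python
from itertools import combinations
from math import prod

def union_probability(events, edge_prob):
    """Exact Pr(B_1 or ... or B_a) by inclusion-exclusion.

    events:    list of edge sets; event j holds when all its edges exist.
    edge_prob: dict edge -> existence probability (independent edges are an
               assumption of this sketch).
    """
    total = 0.0
    a = len(events)
    for i in range(1, a + 1):
        sign = (-1) ** (i - 1)
        for J in combinations(events, i):
            # All events in J hold iff every edge in their union exists.
            needed = set().union(*J)
            total += sign * prod(edge_prob[e] for e in needed)
    return total
```

The two nested loops visit every non-empty subset $J$ of the $a$ events, that is $2^a - 1$ terms. For two single-edge events with probability 0.5 each, the result is $0.5 + 0.5 - 0.25 = 0.75$.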
Clearly, we need an exponential number of steps to perform the exact calculation. Therefore, we develop an efficient sampling algorithm to estimate $Pr(q \subseteq_{sim} g)$. By Equation 4, we know there are in total $a$ $Brq$s that are used to compute SSP. By Equation 10, we know $Brq = Bf_1 \vee \ldots \vee Bf_{|Ef|}$. Then, we have

$$Pr(q \subseteq_{sim} g) = Pr(Bf_1 \vee \ldots \vee Bf_m) \qquad (22)$$

where $m$ is the number of $Bf$s contained in these $Brq$s. Assume the $m$ $Bf$s have $x_1, \ldots, x_k$ Boolean variables for uncertain edges. Algorithm 5 gives the detailed steps of the sampling algorithm. In this algorithm, we use the junction tree algorithm to calculate $Pr(Bf_i)$ [7].

Algorithm 5 Calculate Pr(q ⊆sim g)
  $Cnt = 0$, $V = \sum_{i=1}^{m} Pr(Bf_i)$;
  $N = (4\ln\frac{2}{\xi})/\tau^2$;
  for 1 to $N$ do
    randomly choose $i \in \{1, \ldots, m\}$ with probability $Pr(Bf_i)/V$;
    randomly choose $x_1, \ldots, x_k$ (according to probability $Pr(x_{ne})$) with values in $\{0, 1\}$ s.t. $Bf_i = 1$;
    if $Bf_1 = 0 \wedge \ldots \wedge Bf_{i-1} = 0$ then
      $Cnt = Cnt + 1$;
    end if
  end for
  return $Cnt/N$;

6. PERFORMANCE EVALUATION

In this section, we report the effectiveness and efficiency test results of our newly proposed techniques. Our methods are implemented on a Windows XP machine with a Core 2 Duo CPU (… GHz and 2.8 GHz) and 4GB main memory. Programs are compiled by Microsoft Visual C++. In the experiments, we use a real probabilistic graph data set.

Real Probabilistic Graph Dataset. The real probabilistic graph dataset is obtained from the STRING database⁷ that contains the protein-protein interaction (PPI) networks of organisms in the BioGRID database⁸. A PPI network is a probabilistic graph where vertices represent proteins, edges represent interactions between proteins, the labels of vertices are the COG functional annotations of proteins⁹ provided by the STRING database, and the existence probabilities of edges are provided by the STRING database. We extract 5K probabilistic graphs from the database. The probabilistic graphs have an average number of 385 vertices and 62 edges. Each edge has an average value of existence probability. According to [9], the neighbor PPIs (edges) are dominated by the strongest interactions of the neighbor PPIs. Thus, for each neighbor edge set $ne$, we set its probabilities as $Pr(x_{ne}) = \max_{1 \le i \le |ne|} Pr(x_i)$, where $x_i$ is a binary assignment to each edge in $ne$. Then, for each $ne$, we obtain $2^{|ne|}$ probabilities. We normalize those probabilities to construct the probability distribution of $ne$ that is input into the algorithms.

Each query set $q_i$ has … connected query graphs, and query graphs in $q_i$ are size-$i$ graphs (the edge number in each query is $i$), which are extracted randomly from the corresponding deterministic graphs of the probabilistic graphs, such as $q_{50}$, $q_{100}$, $q_{150}$, $q_{200}$ and $q_{250}$. In the scalability test, we randomly generate 2K, 4K, 6K, 8K and 10K data graphs.

The experimental parameters are set as follows: the probability threshold is …, and the default value is 0.5; the subgraph distance is 2 to 6, and the default value is 4; the query size is …, and the default value is 50. In feature generation, the value of $maxL$ is …, and the default value is 50; the values of $\{\alpha, \beta, \gamma\}$ are …, and the default value is 0.5.

As introduced in Section 1.2, we implement the method in [38] to do structural pruning. This method is called Structure in the experiments.
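One possible reading of the neighbor edge set construction in the dataset description above is sketched below: every one of the $2^{|ne|}$ binary assignments is scored by the largest per-edge probability among its bits, and the scores are then normalized into a distribution. The interpretation of $Pr(x_i)$ as $p_i$ for a present edge and $1 - p_i$ for an absent one is an assumption of this sketch, as is the function name `neighbor_edge_set_distribution`.

```python
from itertools import product

def neighbor_edge_set_distribution(edge_prob):
    """Build a normalized joint table over all 2^|ne| assignments.

    edge_prob: list of marginal existence probabilities, one per edge in ne.
    Each assignment is scored by the largest per-edge probability of its
    bits (p if the edge is present, 1 - p if absent), then normalized.
    """
    scores = {}
    for bits in product((0, 1), repeat=len(edge_prob)):
        per_edge = [p if b else 1.0 - p for b, p in zip(bits, edge_prob)]
        scores[bits] = max(per_edge)
    z = sum(scores.values())
    return {bits: s / z for bits, s in scores.items()}
```

For two edges with marginals 0.9 and 0.6, the four raw scores are 0.4, 0.6, 0.9 and 0.9, so the assignment in which both edges exist receives probability $0.9 / 2.8$ after normalization.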
In probabilistic pruning, the method using bounds of subgraph similarity probability is called SSPBound, and the approach using the best bounds is called OPT-SSPBound. To implement SSPBound, for each $rq_i$, we randomly find two features satisfying the conditions in the probabilistic matrix index (PMI). The method using bounds of subgraph isomorphism probability is called SIPBound, and the method using the tightest bound approach is called OPT-SIPBound. In verification, the sampling algorithm is called SMP, and the method given by Equation 21 is called Exact. Since there are no previous works on the topic studied in this paper, we also compare the proposed algorithms with Exact, which scans the probabilistic graph databases one by one. The complete proposed algorithm of this paper is called PMI. We report average results in the following experiments.

In the first experiment, we demonstrate the efficiency of SMP against Exact in the verification step. We first run the structural and probabilistic filtering algorithms against the default dataset to create candidate sets. The candidate sets are then verified by calculating SSP using the proposed algorithms. Figure 9(a) reports the result, from which we know SMP is efficient with an average time of less than 3 seconds, while the curve of Exact decreases exponentially. The approximation quality of SMP is measured by the precision and recall metrics with respect to query size, shown in Figure 9(b). Precision is the percentage of true probabilistic graphs in the output probabilistic graphs. Recall is the percentage of returned probabilistic graphs in all true probabilistic graphs. The experimental results verify that SMP has a very high approximation quality, with precision and recall both larger than 90%. We use SMP for verification in the following experiments.

Figure 10 reports candidate sizes and pruning time of SSPBound, OPT-SSPBound and Structure with respect to probability thresholds. Recall that SSPBound and OPT-SSPBound are derived from upper and lower bounds of SIP. Here, we feed them with OPT-SIPBound.
From the results, we know that the bars of SSPBound and OPT-SSPBound decrease with the increase of the probability threshold, since larger thresholds can remove more false graphs with low confidences. As shown in Figure 10(a), the candidate size of OPT-SSPBound is very small (i.e., 5 on average), and is smaller than that of SSPBound, which indicates that our derived best bounds are tight enough to have great pruning power. As shown in Figure 10(b), OPT-SSPBound has a short pruning time (i.e., smaller than 1s on average) but takes more time than SSPBound due to more subgraph isomorphism tests during the calculation of OPT-SSPBound. Obviously, probabilities do not have an impact on Structure, and thus both bars of Structure hold constant.

Figure 11 shows candidate sizes and pruning time of SIPBound, OPT-SIPBound and Structure with respect to subgraph distance thresholds. To examine the two metrics, we feed SIPBound and OPT-SIPBound to OPT-SSPBound. From the results, we know that all bars increase with the increase of the subgraph distance threshold, since larger thresholds lead to a larger remaining graph set which is input into the proposed algorithms. Both OPT-SIPBound and SIPBound have a small number of candidate graphs, but OPT-SIPBound takes more time due to the additional time for computing the tightest bounds. From Figures 10 and 11, we believe that though Structure leaves a large number of candidates, the probabilistic pruning algorithms can further remove most false graphs with efficient runtime. This observation verifies that our algorithmic framework (i.e., structural pruning, then probabilistic pruning, then verification) is effective for processing queries on a large probabilistic graph database.

Figure 12 examines the impact of parameters $\{maxL, \alpha, \beta, \gamma\}$ for feature generation. Structure holds constant in the 4 results, since the feature generation algorithm is used for probabilistic pruning. From Figure 12(a), we know the larger $maxL$ is, the more candidates SSPBound and OPT-SSPBound have. The reason is that a large $maxL$ generates large sized features, which leads to loose probabilistic bounds.
From Figure 12(b), we see that all bars of probabilistic pruning first decrease and then increase, and reach their lowest at the values 0.1 and 0.5 of $\alpha$. As shown in Figures 12(c) and 12(d), both bars of OPT-SIPBound decrease as the values of the parameters increase, since either a large $\beta$ or a large $\gamma$ results in fewer features.

Figure 13 reports total query processing time with respect to different graph database sizes. PMI denotes the complete algorithm, that is, a combination of Structure, OPT-SSPBound (fed with OPT-SIPBound) and SMP. From the result, we know PMI has quite efficient runtime and avoids the huge cost of computing SSP (#P-complete). PMI can process queries within 10 seconds on average. But the runtime of Exact grows exponentially, and has gone beyond 10 seconds at the database size of 6K. The result of this experiment validates the designs of this paper.

Figure 14 examines the quality of query answers based on the probability correlated and independent models. The query returns probabilistic graphs if the probabilistic graphs and the query (subgraph) belong to the same organism. We say the query and a probabilistic graph belong to the same organism if the subgraph similarity probability is not less than the threshold. In fact the STRING


More information

Random subgroups of a free group

Random subgroups of a free group Rndom sugroups of free group Frédérique Bssino LIPN - Lortoire d Informtique de Pris Nord, Université Pris 13 - CNRS Joint work with Armndo Mrtino, Cyril Nicud, Enric Ventur et Pscl Weil LIX My, 2015 Introduction

More information

Section 4: Integration ECO4112F 2011

Section 4: Integration ECO4112F 2011 Reding: Ching Chpter Section : Integrtion ECOF Note: These notes do not fully cover the mteril in Ching, ut re ment to supplement your reding in Ching. Thus fr the optimistion you hve covered hs een sttic

More information

New Expansion and Infinite Series

New Expansion and Infinite Series Interntionl Mthemticl Forum, Vol. 9, 204, no. 22, 06-073 HIKARI Ltd, www.m-hikri.com http://dx.doi.org/0.2988/imf.204.4502 New Expnsion nd Infinite Series Diyun Zhng College of Computer Nnjing University

More information

New data structures to reduce data size and search time

New data structures to reduce data size and search time New dt structures to reduce dt size nd serch time Tsuneo Kuwbr Deprtment of Informtion Sciences, Fculty of Science, Kngw University, Hirtsuk-shi, Jpn FIT2018 1D-1, No2, pp1-4 Copyright (c)2018 by The Institute

More information

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata CS103B ndout 18 Winter 2007 Ferury 28, 2007 Finite Automt Initil text y Mggie Johnson. Introduction Severl childrens gmes fit the following description: Pieces re set up on plying ord; dice re thrown or

More information

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz University of Southern Cliforni Computer Science Deprtment Compiler Design Fll Lexicl Anlysis Smple Exercises nd Solutions Prof. Pedro C. Diniz USC / Informtion Sciences Institute 4676 Admirlty Wy, Suite

More information

ARITHMETIC OPERATIONS. The real numbers have the following properties: a b c ab ac

ARITHMETIC OPERATIONS. The real numbers have the following properties: a b c ab ac REVIEW OF ALGEBRA Here we review the bsic rules nd procedures of lgebr tht you need to know in order to be successful in clculus. ARITHMETIC OPERATIONS The rel numbers hve the following properties: b b

More information

2.4 Linear Inequalities and Interval Notation

2.4 Linear Inequalities and Interval Notation .4 Liner Inequlities nd Intervl Nottion We wnt to solve equtions tht hve n inequlity symol insted of n equl sign. There re four inequlity symols tht we will look t: Less thn , Less thn or

More information

CMPSCI 250: Introduction to Computation. Lecture #31: What DFA s Can and Can t Do David Mix Barrington 9 April 2014

CMPSCI 250: Introduction to Computation. Lecture #31: What DFA s Can and Can t Do David Mix Barrington 9 April 2014 CMPSCI 250: Introduction to Computtion Lecture #31: Wht DFA s Cn nd Cn t Do Dvid Mix Brrington 9 April 2014 Wht DFA s Cn nd Cn t Do Deterministic Finite Automt Forml Definition of DFA s Exmples of DFA

More information

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS.

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS. THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS RADON ROSBOROUGH https://intuitiveexplntionscom/picrd-lindelof-theorem/ This document is proof of the existence-uniqueness theorem

More information

4.4 Areas, Integrals and Antiderivatives

4.4 Areas, Integrals and Antiderivatives . res, integrls nd ntiderivtives 333. Ares, Integrls nd Antiderivtives This section explores properties of functions defined s res nd exmines some connections mong res, integrls nd ntiderivtives. In order

More information

Section 6.1 Definite Integral

Section 6.1 Definite Integral Section 6.1 Definite Integrl Suppose we wnt to find the re of region tht is not so nicely shped. For exmple, consider the function shown elow. The re elow the curve nd ove the x xis cnnot e determined

More information

NUMERICAL INTEGRATION. The inverse process to differentiation in calculus is integration. Mathematically, integration is represented by.

NUMERICAL INTEGRATION. The inverse process to differentiation in calculus is integration. Mathematically, integration is represented by. NUMERICAL INTEGRATION 1 Introduction The inverse process to differentition in clculus is integrtion. Mthemticlly, integrtion is represented by f(x) dx which stnds for the integrl of the function f(x) with

More information

Theoretical foundations of Gaussian quadrature

Theoretical foundations of Gaussian quadrature Theoreticl foundtions of Gussin qudrture 1 Inner product vector spce Definition 1. A vector spce (or liner spce) is set V = {u, v, w,...} in which the following two opertions re defined: (A) Addition of

More information

Section 6.1 INTRO to LAPLACE TRANSFORMS

Section 6.1 INTRO to LAPLACE TRANSFORMS Section 6. INTRO to LAPLACE TRANSFORMS Key terms: Improper Integrl; diverge, converge A A f(t)dt lim f(t)dt Piecewise Continuous Function; jump discontinuity Function of Exponentil Order Lplce Trnsform

More information

Suppose we want to find the area under the parabola and above the x axis, between the lines x = 2 and x = -2.

Suppose we want to find the area under the parabola and above the x axis, between the lines x = 2 and x = -2. Mth 43 Section 6. Section 6.: Definite Integrl Suppose we wnt to find the re of region tht is not so nicely shped. For exmple, consider the function shown elow. The re elow the curve nd ove the x xis cnnot

More information

Optimal Network Design with End-to-End Service Requirements

Optimal Network Design with End-to-End Service Requirements ONLINE SUPPLEMENT for Optiml Networ Design with End-to-End Service Reuirements Anntrm Blrishnn University of Tes t Austin, Austin, TX Gng Li Bentley University, Wlthm, MA Prsh Mirchndni University of Pittsurgh,

More information

September 13 Homework Solutions

September 13 Homework Solutions College of Engineering nd Computer Science Mechnicl Engineering Deprtment Mechnicl Engineering 5A Seminr in Engineering Anlysis Fll Ticket: 5966 Instructor: Lrry Cretto Septemer Homework Solutions. Are

More information

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies Stte spce systems nlysis (continued) Stbility A. Definitions A system is sid to be Asymptoticlly Stble (AS) when it stisfies ut () = 0, t > 0 lim xt () 0. t A system is AS if nd only if the impulse response

More information

Hamiltonian Cycle in Complete Multipartite Graphs

Hamiltonian Cycle in Complete Multipartite Graphs Annls of Pure nd Applied Mthemtics Vol 13, No 2, 2017, 223-228 ISSN: 2279-087X (P), 2279-0888(online) Pulished on 18 April 2017 wwwreserchmthsciorg DOI: http://dxdoiorg/1022457/pmv13n28 Annls of Hmiltonin

More information

Continuous Random Variables Class 5, Jeremy Orloff and Jonathan Bloom

Continuous Random Variables Class 5, Jeremy Orloff and Jonathan Bloom Lerning Gols Continuous Rndom Vriles Clss 5, 8.05 Jeremy Orloff nd Jonthn Bloom. Know the definition of continuous rndom vrile. 2. Know the definition of the proility density function (pdf) nd cumultive

More information

Math& 152 Section Integration by Parts

Math& 152 Section Integration by Parts Mth& 5 Section 7. - Integrtion by Prts Integrtion by prts is rule tht trnsforms the integrl of the product of two functions into other (idelly simpler) integrls. Recll from Clculus I tht given two differentible

More information

Harvard University Computer Science 121 Midterm October 23, 2012

Harvard University Computer Science 121 Midterm October 23, 2012 Hrvrd University Computer Science 121 Midterm Octoer 23, 2012 This is closed-ook exmintion. You my use ny result from lecture, Sipser, prolem sets, or section, s long s you quote it clerly. The lphet is

More information

Computing the Optimal Global Alignment Value. B = n. Score of = 1 Score of = a a c g a c g a. A = n. Classical Dynamic Programming: O(n )

Computing the Optimal Global Alignment Value. B = n. Score of = 1 Score of = a a c g a c g a. A = n. Classical Dynamic Programming: O(n ) Alignment Grph Alignment Mtrix Computing the Optiml Globl Alignment Vlue An Introduction to Bioinformtics Algorithms A = n c t 2 3 c c 4 g 5 g 6 7 8 9 B = n 0 c g c g 2 3 4 5 6 7 8 t 9 0 2 3 4 5 6 7 8

More information

Minimal DFA. minimal DFA for L starting from any other

Minimal DFA. minimal DFA for L starting from any other Miniml DFA Among the mny DFAs ccepting the sme regulr lnguge L, there is exctly one (up to renming of sttes) which hs the smllest possile numer of sttes. Moreover, it is possile to otin tht miniml DFA

More information

Numerical integration

Numerical integration 2 Numericl integrtion This is pge i Printer: Opque this 2. Introduction Numericl integrtion is problem tht is prt of mny problems in the economics nd econometrics literture. The orgniztion of this chpter

More information

Golden Section Search Method - Theory

Golden Section Search Method - Theory Numericl Methods Golden Section Serch Method - Theory http://nm.mthforcollege.com For more detils on this topic Go to http://nm.mthforcollege.com Click on Keyword Click on Golden Section Serch Method You

More information

Scanner. Specifying patterns. Specifying patterns. Operations on languages. A scanner must recognize the units of syntax Some parts are easy:

Scanner. Specifying patterns. Specifying patterns. Operations on languages. A scanner must recognize the units of syntax Some parts are easy: Scnner Specifying ptterns source code tokens scnner prser IR A scnner must recognize the units of syntx Some prts re esy: errors mps chrcters into tokens the sic unit of syntx x = x + y; ecomes

More information

Chapter 3 Polynomials

Chapter 3 Polynomials Dr M DRAIEF As described in the introduction of Chpter 1, pplictions of solving liner equtions rise in number of different settings In prticulr, we will in this chpter focus on the problem of modelling

More information

An Overview of Integration

An Overview of Integration An Overview of Integrtion S. F. Ellermeyer July 26, 2 The Definite Integrl of Function f Over n Intervl, Suppose tht f is continuous function defined on n intervl,. The definite integrl of f from to is

More information

Goals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite

Goals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite Unit #8 : The Integrl Gols: Determine how to clculte the re described by function. Define the definite integrl. Eplore the reltionship between the definite integrl nd re. Eplore wys to estimte the definite

More information

Reasoning with Bayesian Networks

Reasoning with Bayesian Networks Complexity of Probbilistic Inference Compiling Byesin Networks Resoning with Byesin Networks Lecture 5: Complexity of Probbilistic Inference, Compiling Byesin Networks Jinbo Hung NICTA nd ANU Jinbo Hung

More information

Polynomials and Division Theory

Polynomials and Division Theory Higher Checklist (Unit ) Higher Checklist (Unit ) Polynomils nd Division Theory Skill Achieved? Know tht polynomil (expression) is of the form: n x + n x n + n x n + + n x + x + 0 where the i R re the

More information

Matching patterns of line segments by eigenvector decomposition

Matching patterns of line segments by eigenvector decomposition Title Mtching ptterns of line segments y eigenvector decomposition Author(s) Chn, BHB; Hung, YS Cittion The 5th IEEE Southwest Symposium on Imge Anlysis nd Interprettion Proceedings, Snte Fe, NM., 7-9

More information

Numerical Analysis: Trapezoidal and Simpson s Rule

Numerical Analysis: Trapezoidal and Simpson s Rule nd Simpson s Mthemticl question we re interested in numericlly nswering How to we evlute I = f (x) dx? Clculus tells us tht if F(x) is the ntiderivtive of function f (x) on the intervl [, b], then I =

More information

Torsion in Groups of Integral Triangles

Torsion in Groups of Integral Triangles Advnces in Pure Mthemtics, 01,, 116-10 http://dxdoiorg/1046/pm011015 Pulished Online Jnury 01 (http://wwwscirporg/journl/pm) Torsion in Groups of Integrl Tringles Will Murry Deprtment of Mthemtics nd Sttistics,

More information

Designing Information Devices and Systems I Spring 2018 Homework 7

Designing Information Devices and Systems I Spring 2018 Homework 7 EECS 16A Designing Informtion Devices nd Systems I Spring 2018 omework 7 This homework is due Mrch 12, 2018, t 23:59. Self-grdes re due Mrch 15, 2018, t 23:59. Sumission Formt Your homework sumission should

More information

CS 310 (sec 20) - Winter Final Exam (solutions) SOLUTIONS

CS 310 (sec 20) - Winter Final Exam (solutions) SOLUTIONS CS 310 (sec 20) - Winter 2003 - Finl Exm (solutions) SOLUTIONS 1. (Logic) Use truth tles to prove the following logicl equivlences: () p q (p p) (q q) () p q (p q) (p q) () p q p q p p q q (q q) (p p)

More information

W. We shall do so one by one, starting with I 1, and we shall do it greedily, trying

W. We shall do so one by one, starting with I 1, and we shall do it greedily, trying Vitli covers 1 Definition. A Vitli cover of set E R is set V of closed intervls with positive length so tht, for every δ > 0 nd every x E, there is some I V with λ(i ) < δ nd x I. 2 Lemm (Vitli covering)

More information

Improper Integrals, and Differential Equations

Improper Integrals, and Differential Equations Improper Integrls, nd Differentil Equtions October 22, 204 5.3 Improper Integrls Previously, we discussed how integrls correspond to res. More specificlly, we sid tht for function f(x), the region creted

More information

Chapter 6 Techniques of Integration

Chapter 6 Techniques of Integration MA Techniques of Integrtion Asst.Prof.Dr.Suprnee Liswdi Chpter 6 Techniques of Integrtion Recll: Some importnt integrls tht we hve lernt so fr. Tle of Integrls n+ n d = + C n + e d = e + C ( n ) d = ln

More information

Review of Calculus, cont d

Review of Calculus, cont d Jim Lmbers MAT 460 Fll Semester 2009-10 Lecture 3 Notes These notes correspond to Section 1.1 in the text. Review of Clculus, cont d Riemnn Sums nd the Definite Integrl There re mny cses in which some

More information

Chapter 5 : Continuous Random Variables

Chapter 5 : Continuous Random Variables STAT/MATH 395 A - PROBABILITY II UW Winter Qurter 216 Néhémy Lim Chpter 5 : Continuous Rndom Vribles Nottions. N {, 1, 2,...}, set of nturl numbers (i.e. ll nonnegtive integers); N {1, 2,...}, set of ll

More information

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true.

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true. York University CSE 2 Unit 3. DFA Clsses Converting etween DFA, NFA, Regulr Expressions, nd Extended Regulr Expressions Instructor: Jeff Edmonds Don t chet y looking t these nswers premturely.. For ech

More information

Lecture 09: Myhill-Nerode Theorem

Lecture 09: Myhill-Nerode Theorem CS 373: Theory of Computtion Mdhusudn Prthsrthy Lecture 09: Myhill-Nerode Theorem 16 Ferury 2010 In this lecture, we will see tht every lnguge hs unique miniml DFA We will see this fct from two perspectives

More information

Quadratic Forms. Quadratic Forms

Quadratic Forms. Quadratic Forms Qudrtic Forms Recll the Simon & Blume excerpt from n erlier lecture which sid tht the min tsk of clculus is to pproximte nonliner functions with liner functions. It s ctully more ccurte to sy tht we pproximte

More information

More on automata. Michael George. March 24 April 7, 2014

More on automata. Michael George. March 24 April 7, 2014 More on utomt Michel George Mrch 24 April 7, 2014 1 Automt constructions Now tht we hve forml model of mchine, it is useful to mke some generl constructions. 1.1 DFA Union / Product construction Suppose

More information

Ehrenfeucht-Fraïssé Games: Applications and Complexity. Department of Mathematics and Computer Science University of Udine, Italy ESSLLI 2010 CPH

Ehrenfeucht-Fraïssé Games: Applications and Complexity. Department of Mathematics and Computer Science University of Udine, Italy ESSLLI 2010 CPH Ehrenfeucht-Frïssé Gmes: Applictions nd Complexity Angelo Montnri Nicol Vitcolonn Deprtment of Mthemtics nd Computer Science University of Udine, Itly ESSLLI 2010 CPH Outline Introduction to EF-gmes Inexpressivity

More information

Tests for the Ratio of Two Poisson Rates

Tests for the Ratio of Two Poisson Rates Chpter 437 Tests for the Rtio of Two Poisson Rtes Introduction The Poisson probbility lw gives the probbility distribution of the number of events occurring in specified intervl of time or spce. The Poisson

More information

APPENDIX. Precalculus Review D.1. Real Numbers and the Real Number Line

APPENDIX. Precalculus Review D.1. Real Numbers and the Real Number Line APPENDIX D Preclculus Review APPENDIX D.1 Rel Numers n the Rel Numer Line Rel Numers n the Rel Numer Line Orer n Inequlities Asolute Vlue n Distnce Rel Numers n the Rel Numer Line Rel numers cn e represente

More information

Chapter 14. Matrix Representations of Linear Transformations

Chapter 14. Matrix Representations of Linear Transformations Chpter 4 Mtrix Representtions of Liner Trnsformtions When considering the Het Stte Evolution, we found tht we could describe this process using multipliction by mtrix. This ws nice becuse computers cn

More information

1 Online Learning and Regret Minimization

1 Online Learning and Regret Minimization 2.997 Decision-Mking in Lrge-Scle Systems My 10 MIT, Spring 2004 Hndout #29 Lecture Note 24 1 Online Lerning nd Regret Minimiztion In this lecture, we consider the problem of sequentil decision mking in

More information

Vectors , (0,0). 5. A vector is commonly denoted by putting an arrow above its symbol, as in the picture above. Here are some 3-dimensional vectors:

Vectors , (0,0). 5. A vector is commonly denoted by putting an arrow above its symbol, as in the picture above. Here are some 3-dimensional vectors: Vectors 1-23-2018 I ll look t vectors from n lgeric point of view nd geometric point of view. Algericlly, vector is n ordered list of (usully) rel numers. Here re some 2-dimensionl vectors: (2, 3), ( )

More information

Interpreting Integrals and the Fundamental Theorem

Interpreting Integrals and the Fundamental Theorem Interpreting Integrls nd the Fundmentl Theorem Tody, we go further in interpreting the mening of the definite integrl. Using Units to Aid Interprettion We lredy know tht if f(t) is the rte of chnge of

More information

Linear Systems with Constant Coefficients

Linear Systems with Constant Coefficients Liner Systems with Constnt Coefficients 4-3-05 Here is system of n differentil equtions in n unknowns: x x + + n x n, x x + + n x n, x n n x + + nn x n This is constnt coefficient liner homogeneous system

More information