Topic II.1: Frequent Subgraph Mining
Discrete Topics in Data Mining
Universität des Saarlandes, Saarbrücken
Winter Semester 2012/13

T II.1: Frequent Subgraph Mining (20 November 2012)
1. Definitions and Problems
   1.1. Graph Isomorphism
2. Apriori-Based Graph Mining (AGM)
   2.1. Labelled Adjacency Matrices
   2.2. Matrix Codes
   2.3. Normal and Canonical Forms
3. DFS-Based Method: gSpan
   3.1. DFS Trees
   3.2. DFS Codes and Their Orders
   3.3. Candidate Generation

Definitions and Problems
- The data is a set of graphs D = {G1, G2, …, Gn}
  - Directed or undirected
- The graphs Gi are labelled
  - Each vertex v has a label L(v)
  - Each edge e = (u, v) has a label L(u, v)
- Data can be e.g. molecule structures

Graph Isomorphism
- Graphs G = (V, E) and G′ = (V′, E′) are isomorphic if there exists a bijective function φ: V → V′ such that
  - (u, v) ∈ E if and only if (φ(u), φ(v)) ∈ E′
  - L(v) = L(φ(v)) for all v ∈ V
  - L(u, v) = L(φ(u), φ(v)) for all (u, v) ∈ E
- Graph G′ is subgraph isomorphic to G if there exists a subgraph of G which is isomorphic to G′
- No polynomial-time algorithm is known for determining if G and G′ are isomorphic
- Determining if G′ is subgraph isomorphic to G is NP-hard

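The definition can be checked directly by trying every bijection φ. The sketch below does this for small undirected graphs. The representation (a dict of vertex labels plus a dict of edge labels keyed by `frozenset`) and the name `is_isomorphic` are assumptions made for illustration, and the running time is factorial in the number of vertices.

```python
from itertools import permutations

def is_isomorphic(g1, g2):
    """Brute-force isomorphism test for labelled, undirected graphs.

    A graph is a pair (vlabels, elabels): vlabels maps vertex -> label and
    elabels maps frozenset({u, v}) -> label (an assumed representation).
    """
    v1, e1 = g1
    v2, e2 = g2
    if len(v1) != len(v2) or len(e1) != len(e2):
        return False
    nodes1 = list(v1)
    for image in permutations(v2):
        phi = dict(zip(nodes1, image))  # candidate bijection V -> V'
        if any(v1[v] != v2[phi[v]] for v in nodes1):
            continue  # vertex labels must be preserved
        mapped = {frozenset(phi[u] for u in edge): lab
                  for edge, lab in e1.items()}
        if mapped == e2:  # same edges carrying the same labels
            return True
    return False

# A labelled path a - a - b, written down with two different vertex namings.
g_path = ({1: "a", 2: "a", 3: "b"},
          {frozenset({1, 2}): "x", frozenset({2, 3}): "y"})
g_same = ({"p": "b", "q": "a", "r": "a"},
          {frozenset({"p", "q"}): "y", frozenset({"q", "r"}): "x"})
# Same shape, but the two edge labels are swapped.
g_other = ({"p": "b", "q": "a", "r": "a"},
           {frozenset({"p", "q"}): "x", frozenset({"q", "r"}): "y"})

print(is_isomorphic(g_path, g_same))
print(is_isomorphic(g_path, g_other))
```
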
Equivalence and Canonical Graphs
- Isomorphism defines an equivalence relation
  - id: V → V, id(v) = v shows G is isomorphic to itself
  - If G is isomorphic to G′ via φ, then G′ is isomorphic to G via φ⁻¹
  - If G is isomorphic to H via φ and H to I via χ, then G is isomorphic to I via χ ∘ φ (apply φ first, then χ)
- A canonization of a graph G, canon(G), produces another graph C such that if H is a graph that is isomorphic to G, canon(G) = canon(H)
  - Two graphs are isomorphic if and only if their canonical versions are the same

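A small illustration of the symmetry and transitivity arguments, assuming vertex bijections are stored as Python dicts. The helper names `inverse` and `compose` are hypothetical.

```python
def inverse(phi):
    """Inverse bijection: if phi maps G to G', the result maps G' back to G."""
    return {image: vertex for vertex, image in phi.items()}

def compose(chi, phi):
    """Composition 'chi after phi': apply phi first, then chi."""
    return {vertex: chi[image] for vertex, image in phi.items()}

phi = {1: "a", 2: "b", 3: "c"}        # vertices of G -> vertices of H
chi = {"a": "x", "b": "y", "c": "z"}  # vertices of H -> vertices of I

g_to_i = compose(chi, phi)            # vertices of G -> vertices of I
identity = compose(inverse(phi), phi) # maps every vertex of G to itself
print(g_to_i)
print(identity)
```
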
An Example of Isomorphic Graphs
[Figure: example of isomorphic graphs]

Frequent Subgraph Mining
- Given a set D of n graphs and a minimum support parameter minsup, find all connected graphs that are subgraph isomorphic to at least minsup graphs in D
- Enormously complex problem
  - For graphs that have m vertices there are 2^O(m²) subgraphs (not all are connected)
  - If we have s labels for vertices and edges, we have O((2s)^O(m²)) labelings of the different graphs
  - Counting the support means solving multiple NP-hard problems

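To make the cost of support counting concrete, here is a minimal brute-force sketch: one subgraph isomorphism test per database graph, each trying all injective vertex mappings. The graph representation (vertex-label dict plus `frozenset`-keyed edge-label dict, with string labels) and the names `occurs_in` and `support` are assumptions for illustration only.

```python
from itertools import permutations

def occurs_in(pattern, graph):
    """True if `pattern` is subgraph isomorphic to `graph` (brute force)."""
    p_vlab, p_elab = pattern
    g_vlab, g_elab = graph
    p_nodes = list(p_vlab)
    # Try every injective mapping of pattern vertices to graph vertices.
    for image in permutations(g_vlab, len(p_nodes)):
        phi = dict(zip(p_nodes, image))
        if any(p_vlab[v] != g_vlab[phi[v]] for v in p_nodes):
            continue  # a vertex label does not match
        # Every pattern edge must exist in the graph with the same label.
        if all(g_elab.get(frozenset(phi[u] for u in edge)) == lab
               for edge, lab in p_elab.items()):
            return True
    return False

def support(pattern, database):
    """Number of database graphs that contain the pattern."""
    return sum(occurs_in(pattern, g) for g in database)

pattern = ({0: "a", 1: "b"}, {frozenset({0, 1}): "x"})
database = [
    ({1: "a", 2: "b", 3: "a"},
     {frozenset({1, 2}): "x", frozenset({2, 3}): "y"}),
    ({1: "a", 2: "b"}, {frozenset({1, 2}): "y"}),
    ({1: "b", 2: "a", 3: "c"},
     {frozenset({1, 2}): "x", frozenset({1, 3}): "x"}),
]
print(support(pattern, database))
```

With minsup = 2 this single-edge pattern would count as frequent in the toy database, since it occurs in the first and third graphs.
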
An Example
[Figure: example for frequent subgraph mining]

Apriori-Based Graph Mining (AGM)
- Subgraph frequency follows the downwards closedness property
  - A supergraph cannot be frequent unless its subgraph is
- Idea: generate all k-vertex graphs that are supergraphs of (k − 1)-vertex frequent graphs and check frequency
- Two problems:
  - How to generate the graphs
  - How to check the frequency
- Idea: do the generation based on adjacency matrices
Inokuchi, Washio & Motoda 2000

Matrices and Codes
- In a labelled adjacency matrix we have
  - Vertex labels in the diagonal
  - Edge labels in the off-diagonal entries (or 0 if no edge)
- The code of the adjacency matrix X is the lower-left triangular submatrix listed in row-major order
  - x1,1 x2,1 x2,2 x3,1 … xk,1 … xk,k … xn,n
- The adjacency matrices can be sorted using the standard lexicographical order on their codes

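A minimal sketch of the matrix code, assuming the matrix is a list of lists of strings with "0" standing for "no edge" (strings are used throughout so that codes compare lexicographically). The name `matrix_code` is hypothetical.

```python
def matrix_code(X):
    """Lower-left triangle of X (diagonal included), listed row by row."""
    return tuple(X[i][j] for i in range(len(X)) for j in range(i + 1))

# Vertex labels on the diagonal, edge labels elsewhere, "0" for "no edge".
X = [["a", "x", "0"],
     ["x", "b", "y"],
     ["0", "y", "a"]]
Y = [["a", "0", "x"],
     ["0", "a", "y"],
     ["x", "y", "b"]]

print(matrix_code(X))
print(matrix_code(Y))
print(sorted([X, Y], key=matrix_code)[0] is Y)
```

X and Y describe the same graph with the vertices listed in a different order, yet their codes differ. This is the redundancy that normal and canonical forms are meant to remove.
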
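Joining Two Subgraphs
- Assume we have two frequent subgraphs of k vertices whose adjacency matrices agree on the first k − 1 vertices, that is, they share the leading (k − 1) × (k − 1) block X_{k−1}:

\[
X_k = \begin{pmatrix} X_{k-1} & x_1 \\ x_2^T & x_{kk} \end{pmatrix},
\qquad
Y_k = \begin{pmatrix} X_{k-1} & y_1 \\ y_2^T & y_{kk} \end{pmatrix}
\]

  - X_{k−1} is the adjacency matrix of the shared (k − 1)-vertex graph
- We can do the join as follows:

\[
Z_{k+1} =
\begin{pmatrix}
X_{k-1} & x_1 & y_1 \\
x_2^T & x_{kk} & z_{k,k+1} \\
y_2^T & z_{k+1,k} & y_{kk}
\end{pmatrix}
=
\begin{pmatrix}
X_k & \begin{matrix} y_1 \\ z_{k,k+1} \end{matrix} \\
\begin{matrix} y_2^T & z_{k+1,k} \end{matrix} & y_{kk}
\end{pmatrix}
\]

- z_{k+1,k} = z_{k,k+1} ranges over all possible edge labels
  - One matrix for each possibility
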
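The join can be sketched as a block-wise copy. This is an illustration for the undirected case under stated assumptions, not the authors' implementation: matrices are lists of lists of strings, the name `join` is hypothetical, and the caller supplies the candidate values for the unknown entry z (in this sketch the list is assumed to contain a "no edge" marker "0" as well).

```python
def join(X, Y, edge_labels):
    """Join two k x k labelled adjacency matrices that share their leading
    (k-1) x (k-1) block. Returns one (k+1) x (k+1) matrix per candidate
    value of the unknown entry z between the two last vertices."""
    k = len(X)
    if any(X[i][j] != Y[i][j] for i in range(k - 1) for j in range(k - 1)):
        return []  # the leading blocks differ, so the pair is not joinable
    results = []
    for z in edge_labels:
        # Rows of X_{k-1} | x_1, extended by the matching entry of y_1.
        Z = [row[:] + [Y[i][k - 1]] for i, row in enumerate(X[:k - 1])]
        # Row (x_2^T, x_kk, z).
        Z.append(X[k - 1][:] + [z])
        # Row (y_2^T, z, y_kk).
        Z.append(Y[k - 1][:k - 1] + [z, Y[k - 1][k - 1]])
        results.append(Z)
    return results

X = [["a", "x"],
     ["x", "b"]]
Y = [["a", "y"],
     ["y", "c"]]
candidates = join(X, Y, ["0", "x"])
for Z in candidates:
    print(Z)
```
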
Avoiding Redundancy
- The two adjacency matrices are joined only if code(X_k) ≤ code(Y_k) ("normal order")
- We need to confirm that all subgraphs of the resulting (k + 1)-vertex matrix are frequent
- We need to consider the normal-order generated k-vertex subgraphs
  - The algorithm only stores normal-order generated graphs
  - They are generated by re-generating the k-vertex subgraph from singletons in normal order
  - The process is called normalization and can compute the normal forms of all subgraphs
- Normalization can be expressed as row and column permutations: X_n = P^T X P

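To see why P^T X P is just a simultaneous reordering of rows and columns, the following sketch builds a permutation matrix and multiplies it out in plain Python. The integer label codes and the helper names (`perm_matrix`, `matmul`, `transpose`) are assumptions for illustration. It does not compute AGM's normal form, only the effect of a given permutation.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def perm_matrix(order):
    """P with P[old][new] = 1, where order[new] = old vertex index."""
    n = len(order)
    P = [[0] * n for _ in range(n)]
    for new, old in enumerate(order):
        P[old][new] = 1
    return P

# Labels encoded as integers so the products are well defined
# (1 and 2 are vertex labels, 5 and 6 edge labels, 0 means no edge).
X = [[1, 5, 0],
     [5, 2, 6],
     [0, 6, 1]]
order = (2, 0, 1)  # new vertex 0 is old vertex 2, and so on
P = perm_matrix(order)
Xn = matmul(matmul(transpose(P), X), P)
reindexed = [[X[i][j] for j in order] for i in order]
print(Xn)
```
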
Canonical Forms
- Isomorphic graphs can have many different normal forms
- Given a set NF(G) of all normal forms representing graphs isomorphic to G, the canonical form of G is the adjacency matrix X_c that has the minimum code in NF(G):

\[
X_c = \arg\min \{ \mathrm{code}(X) : X \in NF(G) \}
\]

- Given an adjacency matrix X, its normal form is X_n = P^T X P for some permutation matrix P, and its canonical form X_c is Q^T P^T X P Q for some permutation matrix Q

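A brute-force sketch of the idea of a minimum-code representative. Note the simplification: it minimises the code over all vertex permutations rather than only over the normal forms NF(G), so the matrix it returns need not coincide with AGM's canonical form. It is still identical for isomorphic graphs, which is the property that matters here. The function names and the string encoding ("0" for no edge) are assumptions.

```python
from itertools import permutations

def matrix_code(X):
    """Lower-left triangle of X (diagonal included), listed row by row."""
    return tuple(X[i][j] for i in range(len(X)) for j in range(i + 1))

def permute(X, order):
    """Reorder rows and columns of X simultaneously (same effect as P^T X P)."""
    return [[X[i][j] for j in order] for i in order]

def canonical_form(X):
    """Matrix with the smallest code among all vertex reorderings of X."""
    n = len(X)
    return min((permute(X, p) for p in permutations(range(n))),
               key=matrix_code)

X = [["a", "x", "0"],
     ["x", "b", "y"],
     ["0", "y", "a"]]
Y = permute(X, (0, 2, 1))   # the same graph, vertices listed differently
W = [["a", "x", "0"],       # a different graph: both edges labelled x
     ["x", "b", "x"],
     ["0", "x", "a"]]

print(canonical_form(X) == canonical_form(Y))
print(canonical_form(X) == canonical_form(W))
```
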
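Finding Canonical Forms
- Let X be an adjacency matrix of k + 1 vertices
- Let Y be X with vertex m removed
- Let P be the permutation of Y to its normal form and Q the permutation of P^T Y P to the canonical form
  - We assume we have already computed them
- We compute candidates P′ and Q′ for X (both (k + 1) × (k + 1) matrices) by
  - Q′ is like Q but with the bottom-right corner set to 1
  - P′ has entries

\[
p'_{ij} =
\begin{cases}
p_{ij} & \text{if } i < m \text{ and } j \le k \\
p_{i-1,j} & \text{if } i > m \text{ and } j \le k \\
1 & \text{if } i = m \text{ and } j = k + 1 \\
0 & \text{otherwise}
\end{cases}
\]

- The final P′ and Q′ are found by trying all candidates and selecting the ones that give the lowest code
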
The Algorithm
Start with frequent graphs of 1 vertex
while there are frequent graphs left
    Join two frequent (k − 1)-vertex graphs
    Check that the resulting graph's subgraphs are frequent
        If not, continue
    Compute the canonical form of the graph
        If this canonical form has already been studied, continue
    Compare the canonical form with the canonical forms of the k-vertex subgraphs of the graphs in D
    If the graph is frequent, keep, otherwise discard
return all frequent subgraphs

The gSpan Algorithm
- We can improve the running time of frequent subgraph mining by either
  - Making the frequency check faster
    - Lots of effort has gone into faster isomorphism checking, but with only little progress
  - Creating fewer candidates that need to be checked
    - Level-wise algorithms (like AGM) generate huge numbers of candidates
    - Each must be checked for isomorphism with the others
- The gSpan (graph-based Substructure pattern mining) algorithm replaces the level-wise approach with a depth-first approach
Yan & Han 2002; Z&M Ch. 11

Depth-First Spanning Tree
- A depth-first spanning (DFS) tree of a graph G
  - Is a connected tree
  - Contains all the vertices of G
  - Is built in depth-first order
    - Selection between the siblings is e.g. based on the vertex index
- Edges of the DFS tree are forward edges
- Edges not in the DFS tree are backward edges
- A rightmost path in the DFS tree is the path that travels from the root to the rightmost vertex by always taking the rightmost (last-added) child

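The definitions above can be sketched as a recursive traversal of a connected, undirected graph. The adjacency-dict input and the name `dfs_tree` are assumptions for illustration. Siblings are visited in increasing vertex index, as suggested on the slide, and the rightmost vertex is taken to be the vertex discovered last.

```python
def dfs_tree(adj, root):
    """Return (forward_edges, backward_edges, rightmost_path).

    adj maps each vertex to an iterable of its neighbours (undirected,
    connected graph assumed).
    """
    order = {}               # vertex -> discovery index
    parent = {root: None}
    forward, backward = [], []
    seen_edges = set()

    def visit(u):
        order[u] = len(order)
        for v in sorted(adj[u]):       # tie-break siblings by vertex index
            edge = frozenset((u, v))
            if edge in seen_edges:
                continue
            seen_edges.add(edge)
            if v not in order:         # tree edge: discovers a new vertex
                parent[v] = u
                forward.append((u, v))
                visit(v)
            else:                      # non-tree edge back to an ancestor
                backward.append((u, v))

    visit(root)
    # Rightmost path: walk from the last-discovered vertex up to the root.
    vertex = max(order, key=order.get)
    path = []
    while vertex is not None:
        path.append(vertex)
        vertex = parent[vertex]
    return forward, backward, path[::-1]

# A triangle 1-2-3 with a pendant vertex 4 attached to vertex 2.
adj = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2], 4: [2]}
forward, backward, rightmost_path = dfs_tree(adj, 1)
print(forward, backward, rightmost_path)
```
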
An Example
[Figure: example graph on vertices v1 to v8]

The DFS Tree
[Figure: DFS tree of the example graph on vertices v1 to v8]

Generating Candidates from DFS Tree
- Given graph G, we extend it only from the vertices in the rightmost path
  - We can add backward edges from the rightmost vertex to some other vertex in the rightmost path
  - We can add a forward edge from any vertex in the rightmost path
    - This increases the number of vertices by 1
- The order of generating the candidates is
  - First backward extensions
    - First to the root, then to the root's child, …
  - Then forward extensions
    - First from the leaf, then from the leaf's father, …

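The generation order can be sketched on vertex pairs alone, ignoring labels. Assumptions of this sketch: vertices are numbered 0, 1, … in DFS discovery order, the rightmost path is given as a list from the root to the rightmost vertex, and `rightmost_extensions` is a hypothetical name. Each returned pair would still have to be combined with every label choice.

```python
def rightmost_extensions(rightmost_path, edges, n):
    """Candidate edges (i, j) in generation order.

    rightmost_path: DFS indices from the root to the rightmost vertex
    edges: set of frozenset({i, j}) for the edges already in the graph
    n: current number of vertices (the new vertex gets index n)
    """
    rm = rightmost_path[-1]
    # Backward extensions: rightmost vertex to a path vertex, root first,
    # skipping pairs that are already connected.
    backward = [(rm, v) for v in rightmost_path[:-1]
                if frozenset((rm, v)) not in edges]
    # Forward extensions: from the leaf first, then its father, up to the root.
    forward = [(u, n) for u in reversed(rightmost_path)]
    return backward + forward

# Triangle 0-1-2 plus the tree edge 1-3; rightmost path is 0, 1, 3.
edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 0), (1, 3)]}
extensions = rightmost_extensions([0, 1, 3], edges, 4)
print(extensions)
```
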
An Example
[Figure: candidate extensions of the DFS tree on vertices v1 to v8]

DFS Codes and their Orders
- A DFS code is a sequence of tuples of type ⟨vi, vj, L(vi), L(vj), L(vi, vj)⟩
  - Tuples are given in DFS order
    - Backward edges are listed before forward edges
- A DFS code is canonical if it is the smallest of the codes in the ordering
  - ⟨vi, vj, L(vi), L(vj), L(vi, vj)⟩ < ⟨vx, vy, L(vx), L(vy), L(vx, vy)⟩ if
    - ⟨vi, vj⟩ <_e ⟨vx, vy⟩; or
    - ⟨vi, vj⟩ = ⟨vx, vy⟩ and ⟨L(vi), L(vj), L(vi, vj)⟩ <_l ⟨L(vx), L(vy), L(vx, vy)⟩
  - The ordering <_l of the label tuples is the lexicographical ordering

Ordering the Edges
- Let eij = ⟨vi, vj⟩ and exy = ⟨vx, vy⟩
- eij <_e exy if
  - eij and exy are both forward edges, and j < y; or j = y and i > x
  - eij and exy are both backward edges, and i < x; or i = x and j < y
  - eij is forward and exy is backward, and j ≤ x
  - eij is backward and exy is forward, and i < y

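The edge order and the resulting tuple and code orders can be written as three small comparison functions. This is a sketch under assumptions: vertices are represented by their DFS indices, a forward edge has i < j and a backward edge i > j, a forward edge precedes a backward edge when j ≤ x, and a backward edge precedes a forward edge when i < y. The function names and the labels in the usage lines are hypothetical.

```python
def edge_less(e1, e2):
    """True if e1 <_e e2 for edges given as (i, j) pairs of DFS indices."""
    (i, j), (x, y) = e1, e2
    fwd1, fwd2 = i < j, x < y
    if fwd1 and fwd2:              # both forward
        return j < y or (j == y and i > x)
    if not fwd1 and not fwd2:      # both backward
        return i < x or (i == x and j < y)
    if fwd1:                       # forward vs backward
        return j <= x
    return i < y                   # backward vs forward

def tuple_less(t1, t2):
    """Compare extended tuples (i, j, L(vi), L(vj), L(vi, vj))."""
    if t1[:2] != t2[:2]:
        return edge_less(t1[:2], t2[:2])
    return t1[2:] < t2[2:]         # lexicographic order on the labels

def code_less(c1, c2):
    """Compare two DFS codes tuple by tuple; a proper prefix is smaller."""
    for t1, t2 in zip(c1, c2):
        if t1 != t2:
            return tuple_less(t1, t2)
    return len(c1) < len(c2)

# Two forward edges into vertex 4: the one starting deeper comes first.
print(edge_less((2, 4), (1, 4)))
# Illustrative codes that differ only in one vertex label of the second tuple.
code_a = [(1, 2, "a", "a", "q"), (2, 3, "a", "a", "r")]
code_b = [(1, 2, "a", "a", "q"), (2, 3, "a", "b", "r")]
print(code_less(code_a, code_b))
```
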
Example
[Figure: graphs G1, G2 and G3, each on vertices v1 to v4, with the edge between v1 and v2 labelled q]
DFS codes (· marks a label that is not recoverable here):
G1: t11 = ⟨v1, v2, ·, ·, q⟩   t12 = ⟨v2, v3, ·, ·, ·⟩   t13 = ⟨v3, v1, ·, ·, ·⟩   t14 = ⟨v2, v4, ·, ·, ·⟩
G2: t21 = ⟨v1, v2, ·, ·, q⟩   t22 = ⟨v2, v3, ·, ·, ·⟩   t23 = ⟨v2, v4, ·, ·, ·⟩   t24 = ⟨v4, v1, ·, ·, ·⟩
G3: t31 = ⟨v1, v2, ·, ·, q⟩   t32 = ⟨v2, v3, ·, ·, ·⟩   t33 = ⟨v3, v1, ·, ·, ·⟩   t34 = ⟨v1, v4, ·, ·, ·⟩
- First rows are identical
- In the second row, G2 is bigger in the labels' order
- Last rows (t14 and t34) are forward edges, and 4 = 4 but 2 > 1, so G1 is smallest

Building the Candidates
- The candidates are built in a DFS code tree
  - A DFS code a is an ancestor of DFS code b if a is a proper prefix of b
  - The siblings in the tree follow the DFS code order
- A graph can be frequent only if all of the graphs representing its ancestors in the DFS code tree are frequent
- The DFS code tree contains all the canonical codes for all the subgraphs of the graphs in the data
  - But not all of the vertices in the code tree correspond to canonical codes
- We will (implicitly) traverse this tree

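The ancestor relation in the code tree is a plain prefix test. A minimal sketch, assuming DFS codes are Python lists of tuples (the name `is_ancestor` and the labels below are hypothetical):

```python
def is_ancestor(a, b):
    """True if DFS code a is a proper prefix of DFS code b."""
    return len(a) < len(b) and list(b[:len(a)]) == list(a)

parent_code = [(1, 2, "a", "a", "q")]
child_code = parent_code + [(2, 3, "a", "b", "r")]

print(is_ancestor(parent_code, child_code))
print(is_ancestor(child_code, parent_code))
```

Because every ancestor is a prefix, a depth-first traversal that stops at an infrequent code never has to look at any of its descendants.
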
The Algorithm
gSpan:
    for each frequent 1-edge graph
        call subgm to grow all nodes in the code tree rooted in this 1-edge graph
        remove this edge from the graph
subgm:
    if the code is not canonical, return
    add this graph to the set of frequent graphs
    create each super-graph with one more edge and compute its frequency
    call subgm with each frequent super-graph