Topic II.1: Frequent Subgraph Mining

Discrete Topics in Data Mining
Universität des Saarlandes, Saarbrücken
Winter Semester 2012/13

TII.1: Frequent Subgraph Mining (20 November 2012)
1. Definitions and Problems
1.1. Graph Isomorphism
2. Apriori-Based Graph Mining (AGM)
2.1. Labelled Adjacency Matrices
2.2. Matrix Codes
2.3. Normal and Canonical Forms
3. DFS-Based Method: gSpan
3.1. DFS Trees
3.2. DFS Codes and Their Orders
3.3. Candidate Generation

Definitions and Problems
The data is a set of graphs $D = \{G_1, G_2, \ldots, G_n\}$. The graphs may be directed or undirected. The graphs $G_i$ are labelled: each vertex $v$ has a label $L(v)$, and each edge $e = (u, v)$ has a label $L(u, v)$. The data can be, e.g., molecule structures.

Graph Isomorphism
Graphs $G = (V, E)$ and $G' = (V', E')$ are isomorphic if there exists a bijective function $\varphi: V \to V'$ such that
$(u, v) \in E$ if and only if $(\varphi(u), \varphi(v)) \in E'$,
$L(v) = L(\varphi(v))$ for all $v \in V$, and
$L(u, v) = L(\varphi(u), \varphi(v))$ for all $(u, v) \in E$.
Graph $G'$ is subgraph isomorphic to $G$ if there exists a subgraph of $G$ which is isomorphic to $G'$.
No polynomial-time algorithm is known for determining whether $G$ and $G'$ are isomorphic. Determining whether $G'$ is subgraph isomorphic to $G$ is NP-hard.
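
A brute-force check makes the definition concrete. The sketch assumes undirected graphs stored as a list of vertex labels plus a dict from `frozenset({u, v})` to the edge label (a representation chosen here for illustration). It tries all $|V|!$ bijections, so it is only usable on tiny graphs, in line with the remark that no polynomial-time algorithm is known.

```python
from itertools import permutations

# Hypothetical representation: (vertex_labels, edge_labels), where
# vertex_labels[v] = L(v) and edge_labels[frozenset({u, v})] = L(u, v).
Graph = tuple[list[str], dict[frozenset, str]]


def is_isomorphic(g: Graph, h: Graph) -> bool:
    """Search all bijections phi: V -> V' for one preserving labels and edges."""
    g_labels, g_edges = g
    h_labels, h_edges = h
    if len(g_labels) != len(h_labels) or len(g_edges) != len(h_edges):
        return False
    n = len(g_labels)
    for phi in permutations(range(n)):  # phi[v] is the image of vertex v
        # L(v) = L(phi(v)) for all v
        if any(g_labels[v] != h_labels[phi[v]] for v in range(n)):
            continue
        # Each edge must map to an edge with the same label. Since phi is a
        # bijection and |E| = |E'|, this also covers the "only if" direction.
        if all(h_edges.get(frozenset(phi[u] for u in e)) == lab
               for e, lab in g_edges.items()):
            return True
    return False


g = (["a", "a", "b"], {frozenset({0, 1}): "q", frozenset({1, 2}): "r"})
h = (["b", "a", "a"], {frozenset({0, 1}): "r", frozenset({1, 2}): "q"})
print(is_isomorphic(g, h))  # True, via 0 -> 2, 1 -> 1, 2 -> 0
```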

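Equivalence and Canonical Graphs
Isomorphism is an equivalence relation:
$\mathrm{id}: V \to V$, $\mathrm{id}(v) = v$ shows that $G$ is isomorphic to itself.
If $G$ is isomorphic to $G'$ via $\varphi$, then $G'$ is isomorphic to $G$ via $\varphi^{-1}$.
If $G$ is isomorphic to $H$ via $\varphi$ and $H$ to $I$ via $\chi$, then $G$ is isomorphic to $I$ via $\chi \circ \varphi$.
A canonization of a graph $G$, $\mathrm{canon}(G)$, produces another graph $C$, isomorphic to $G$, such that if $H$ is a graph that is isomorphic to $G$, then $\mathrm{canon}(G) = \mathrm{canon}(H)$. Two graphs are therefore isomorphic if and only if their canonical versions are the same.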
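
A brute-force canonization, assuming the same hypothetical (vertex labels, edge-label dict) representation. It returns the smallest encoding over all vertex orderings, a hashable stand-in for the graph $C$ from which $C$ can be read back. This is one simple choice of canonical form for illustration, not the one used by AGM or gSpan, and it costs $n!$ time.

```python
from itertools import permutations

# Hypothetical representation: (vertex_labels, {frozenset({u, v}): label}).
Graph = tuple[list[str], dict[frozenset, str]]


def canon(g: Graph) -> tuple:
    """Smallest encoding of g over all n! vertex orderings.

    An encoding lists the vertex labels in the chosen order, followed by the
    sorted (i, j, L(u, v)) triples with i < j in the new numbering. Isomorphic
    graphs yield the same set of encodings, hence the same minimum.
    """
    labels, edges = g
    best = None
    for order in permutations(range(len(labels))):  # order[i] = old vertex put at i
        pos = {old: new for new, old in enumerate(order)}
        enc_vertices = tuple(labels[old] for old in order)
        enc_edges = tuple(sorted(
            (min(pos[u] for u in e), max(pos[u] for u in e), lab)
            for e, lab in edges.items()
        ))
        enc = (enc_vertices, enc_edges)
        if best is None or enc < best:
            best = enc
    return best


g = (["a", "a", "b"], {frozenset({0, 1}): "q", frozenset({1, 2}): "r"})
h = (["b", "a", "a"], {frozenset({0, 1}): "r", frozenset({1, 2}): "q"})
print(canon(g) == canon(h))  # True
```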

An Example of Isomorphic Graphs [figure]

Frequent Subgraph Mining
Given a set $D$ of $n$ graphs and a minimum support parameter minsup, find all connected graphs that are subgraph isomorphic to at least minsup graphs in $D$.
This is an enormously complex problem. For graphs that have $m$ vertices there are $2^{O(m^2)}$ subgraphs (not all of them connected). If we have $s$ labels for vertices and edges, we have $O\big((2s)^{O(m^2)}\big)$ labelings of the different graphs. Counting the support means solving multiple NP-hard problems.
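
Support counting can be sketched with a brute-force subgraph-isomorphism test: try every injective map of the pattern's vertices into $G$ and check labels and edges. The representation is the same hypothetical one as above. Extra edges of $G$ between mapped vertices are allowed, since the definition asks for any subgraph of $G$, not an induced one.

```python
from itertools import permutations

# Hypothetical representation: (vertex_labels, {frozenset({u, v}): label}).
Graph = tuple[list[str], dict[frozenset, str]]


def subgraph_isomorphic(p: Graph, g: Graph) -> bool:
    """True if some (not necessarily induced) subgraph of g is isomorphic to p."""
    p_labels, p_edges = p
    g_labels, g_edges = g
    # Every injective map from pattern vertices into g's vertices.
    for image in permutations(range(len(g_labels)), len(p_labels)):
        if any(p_labels[v] != g_labels[image[v]] for v in range(len(p_labels))):
            continue
        if all(g_edges.get(frozenset(image[u] for u in e)) == lab
               for e, lab in p_edges.items()):
            return True
    return False


def support(p: Graph, data: list[Graph]) -> int:
    """Number of graphs in D containing p; p is frequent if this is >= minsup."""
    return sum(subgraph_isomorphic(p, g) for g in data)


pattern = (["a", "b"], {frozenset({0, 1}): "q"})
d1 = (["a", "b", "c"], {frozenset({0, 1}): "q", frozenset({1, 2}): "r"})
d2 = (["a", "b"], {frozenset({0, 1}): "r"})
d3 = (["c", "b", "a"], {frozenset({0, 1}): "r", frozenset({1, 2}): "q",
                        frozenset({0, 2}): "q"})
print(support(pattern, [d1, d2, d3]))  # 2
```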

An Example [figure]

Apriori-Based Graph Mining (AGM)
Subgraph frequency has the downward-closedness property: a supergraph cannot be frequent unless all of its subgraphs are.
Idea: generate all $k$-vertex graphs that are supergraphs of frequent $(k-1)$-vertex graphs, and check their frequency.
Two problems: how to generate the graphs, and how to check the frequency.
Idea: base the generation on adjacency matrices.
Inokuchi, Washio & Motoda 2000

Matrices and Codes
In a labelled adjacency matrix we have the vertex labels on the diagonal and the edge labels off the diagonal (or 0 if there is no edge).
The code of the adjacency matrix $X$ is its lower-left triangular submatrix listed in row-major order: $x_{1,1}\, x_{2,1}\, x_{2,2}\, x_{3,1} \cdots x_{k,1} \cdots x_{k,k} \cdots x_{n,n}$.
The adjacency matrices can be sorted using the standard lexicographical order on their codes.
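
A minimal sketch of the matrix code, assuming labels are strings, `"0"` marks a missing edge, and labels are ordered by ordinary string comparison (so `"0"` sorts before letters). These conventions are illustrative only.

```python
# Hypothetical conventions: string labels, "0" = no edge, string ordering.
Matrix = list[list[str]]


def matrix_code(x: Matrix) -> list[str]:
    """Lower-left triangle of x in row-major order: x11 x21 x22 x31 x32 x33 ..."""
    return [x[i][j] for i in range(len(x)) for j in range(i + 1)]


# Two adjacency matrices of the same path a -q- b -r- a, in different vertex orders.
x1 = [["a", "q", "0"],
      ["q", "b", "r"],
      ["0", "r", "a"]]
x2 = [["b", "q", "r"],
      ["q", "a", "0"],
      ["r", "0", "a"]]
print(matrix_code(x1))                       # ['a', 'q', 'b', '0', 'r', 'a']
print(min([x1, x2], key=matrix_code) is x1)  # True
```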

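Joining Two Subgraphs
Assume we have two frequent subgraphs of $k$ vertices whose adjacency matrices agree on the leading $(k-1) \times (k-1)$ submatrix $X_{k-1}$:
$$X_k = \begin{pmatrix} X_{k-1} & x_1 \\ x_2^T & x_{kk} \end{pmatrix}, \qquad Y_k = \begin{pmatrix} X_{k-1} & y_1 \\ y_2^T & y_{kk} \end{pmatrix}$$
We can do the join as follows:
$$Z_{k+1} = \begin{pmatrix} X_{k-1} & x_1 & y_1 \\ x_2^T & x_{kk} & z_{k,k+1} \\ y_2^T & z_{k+1,k} & y_{kk} \end{pmatrix} = \begin{pmatrix} X_k & \begin{matrix} y_1 \\ z_{k,k+1} \end{matrix} \\ \begin{matrix} y_2^T & z_{k+1,k} \end{matrix} & y_{kk} \end{pmatrix}$$
$Z_{k+1}$ is the adjacency matrix representing the graph whose size is $k+1$. The entry $z_{k,k+1} = z_{k+1,k}$ assumes all possible edge labels, with one matrix for each possibility.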
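
The join can be sketched directly on nested lists, under the same string-label conventions. It assumes undirected graphs, checks only that the leading $(k-1)\times(k-1)$ blocks agree, and emits one matrix per value of $z_{k,k+1}$; the set of possibilities is assumed here to be `"0"` (no edge) plus the given edge labels. The $\mathrm{code}(X_k) \le \mathrm{code}(Y_k)$ test and the frequency checks are left out.

```python
Matrix = list[list[str]]


def join(x: Matrix, y: Matrix, edge_labels: list[str]) -> list[Matrix]:
    """Join two k-vertex matrices sharing the leading (k-1)x(k-1) block.

    Returns one (k+1)-vertex matrix per value of z_{k,k+1} = z_{k+1,k}:
    "0" (no edge, an assumption) or one of edge_labels.
    """
    k = len(x)
    if len(y) != k or any(x[i][:k - 1] != y[i][:k - 1] for i in range(k - 1)):
        raise ValueError("matrices do not share X_{k-1}")
    joined = []
    for z in ["0"] + edge_labels:
        # X_k fills the top-left k x k block; the new row/column start empty.
        m = [row + [None] for row in x] + [[None] * (k + 1)]
        for i in range(k - 1):
            m[i][k] = y[i][k - 1]      # column y_1
            m[k][i] = y[k - 1][i]      # row y_2^T
        m[k][k] = y[k - 1][k - 1]      # y_kk
        m[k - 1][k] = m[k][k - 1] = z  # z_{k,k+1} = z_{k+1,k}
        joined.append(m)
    return joined


x = [["a", "q"], ["q", "b"]]
y = [["a", "r"], ["r", "c"]]
for z in join(x, y, ["q", "r"]):
    print(z)
# first line: [['a', 'q', 'r'], ['q', 'b', '0'], ['r', '0', 'c']]
```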

Avoiding Redundancy
The two adjacency matrices are joined only if $\mathrm{code}(X_k) \le \mathrm{code}(Y_k)$ ("normal form").
We need to confirm that all subgraphs of the resulting $(k+1)$-vertex matrix are frequent. For this, we need to consider the normal-form generated $k$-vertex subgraphs.
The algorithm only stores normal-form generated graphs. They are generated by re-generating the $k$-vertex subgraph from singletons in normal form. This process is called normalization, and it can compute the normal forms of all subgraphs. Normalization can be expressed as row and column permutations: $X_n = P^T X P$.

Canonical Forms
Isomorphic graphs can have many different normal forms. Given the set $NF(G)$ of all normal forms representing graphs isomorphic to $G$, the canonical form of $G$ is the adjacency matrix $X_c$ that has the minimum code in $NF(G)$:
$$X_c = \arg\min_{X \in NF(G)} \mathrm{code}(X)$$
Given an adjacency matrix $X$, its normal form is $X_n = P^T X P$ for some permutation matrix $P$, and its canonical form $X_c$ is $Q^T P^T X P Q$ for some permutation matrix $Q$.
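
The relation $X_n = P^T X P$ amounts to reordering rows and columns by the same permutation. The sketch below uses that to pick the smallest code, but as a stated simplification it minimizes over all $k!$ permutations instead of over the normal forms $NF(G)$, because AGM's normalization procedure is not reproduced here. The result is still identical for isomorphic inputs, yet it need not coincide with AGM's canonical form.

```python
from itertools import permutations

Matrix = list[list[str]]


def matrix_code(x: Matrix) -> list[str]:
    """Lower-left triangle of x, row by row."""
    return [x[i][j] for i in range(len(x)) for j in range(i + 1)]


def permute(x: Matrix, perm: tuple[int, ...]) -> Matrix:
    """P^T X P, where P is the permutation matrix with P[perm[i]][i] = 1."""
    n = len(x)
    return [[x[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


def min_code_form(x: Matrix) -> Matrix:
    """Permuted copy of x with the smallest code over ALL n! permutations.

    Simplification: AGM takes the minimum over normal forms only.
    """
    return min((permute(x, p) for p in permutations(range(len(x)))),
               key=matrix_code)


x1 = [["a", "q", "0"], ["q", "b", "r"], ["0", "r", "a"]]
x2 = permute(x1, (2, 1, 0))                    # same graph, vertices reversed
print(min_code_form(x1) == min_code_form(x2))  # True
```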

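Finding Canonical Forms
Let $X$ be an adjacency matrix of $k+1$ vertices, and let $Y$ be $X$ with vertex $m$ removed. Let $P$ be the permutation of $Y$ to its normal form and $Q$ the permutation of $P^T Y P$ to the canonical form; we assume we have already computed them.
We compute candidates $P'$ and $Q'$ for $X$ as follows. $Q'$ is like $Q$, but with an extra row and column whose bottom-right entry is 1, and
$$p'_{ij} = \begin{cases} p_{ij} & \text{if } i < m \text{ and } j \le k,\\ p_{i-1,j} & \text{if } i > m \text{ and } j \le k,\\ 1 & \text{if } i = m \text{ and } j = k+1,\\ 0 & \text{otherwise.} \end{cases}$$
The final $P$ and $Q$ are found by trying all candidates and selecting the ones that give the lowest code.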
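
The candidate $P'$ can be checked numerically: the top-left $k\times k$ block of $P'^T X P'$ should equal $P^T Y P$, and vertex $m$ should land in the last position. The sketch encodes labels as integers (a hypothetical choice, so that $P^T X P$ can be computed by plain multiplication) and builds a single candidate for one fixed $m$.

```python
Matrix = list[list[int]]


def extend_p(p: Matrix, m: int) -> Matrix:
    """Candidate P' ((k+1)x(k+1)) from P (k x k); m is the 1-based removed vertex."""
    k = len(p)
    q = [[0] * (k + 1) for _ in range(k + 1)]
    for i in range(1, k + 2):            # 1-based indices, as in the formula
        for j in range(1, k + 2):
            if i < m and j <= k:
                q[i - 1][j - 1] = p[i - 1][j - 1]   # p_ij
            elif i > m and j <= k:
                q[i - 1][j - 1] = p[i - 2][j - 1]   # p_{i-1,j}
            elif i == m and j == k + 1:
                q[i - 1][j - 1] = 1
    return q


def extend_q(q: Matrix) -> Matrix:
    """Q' = Q with an extra row and column whose bottom-right entry is 1."""
    k = len(q)
    return [row + [0] for row in q] + [[0] * k + [1]]


def transform(p: Matrix, x: Matrix) -> Matrix:
    """P^T X P by direct summation (fine for small matrices)."""
    n = len(x)
    return [[sum(p[a][i] * x[a][b] * p[b][j] for a in range(n) for b in range(n))
             for j in range(n)] for i in range(n)]


# Labels encoded as integers (hypothetical); the diagonal holds vertex labels.
x = [[1, 5, 0],
     [5, 2, 6],
     [0, 6, 3]]
m = 2
y = [[1, 0], [0, 3]]                    # X with vertex m removed
p = [[0, 1], [1, 0]]                    # some permutation of Y
print(transform(p, y))                  # [[3, 0], [0, 1]]
print(transform(extend_p(p, m), x))     # [[3, 0, 6], [0, 1, 5], [6, 5, 2]]
```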

The Algorithm
Start with the frequent graphs of 1 vertex.
While there are frequent graphs left:
  Join two frequent $(k-1)$-vertex graphs.
  Check that the resulting graph's subgraphs are frequent; if not, continue.
  Compute the canonical form of the graph; if this canonical form has already been studied, continue.
  Compare the canonical form with the canonical forms of the $k$-vertex subgraphs of the graphs in $D$.
  If the graph is frequent, keep it; otherwise, discard it.
Return all frequent subgraphs.

The gSpan Algorithm
We can improve the running time of frequent subgraph mining by either
making the frequency check faster, where lots of effort has gone into faster isomorphism checking but with only little progress, or
creating fewer candidates that need to be checked.
Level-wise algorithms (like AGM) generate huge numbers of candidates, and each must be checked for isomorphism with the others.
The gSpan (graph-based Substructure pattern mining) algorithm replaces the level-wise approach with a depth-first approach.
Yan & Han 2002; Z&M Ch. 11

Depth-First Spanning Tree
A depth-first spanning (DFS) tree of a graph $G$ is a connected tree, contains all the vertices of $G$, and is built in depth-first order. The selection between siblings is, e.g., based on the vertex index.
The edges of the DFS tree are forward edges; the edges not in the DFS tree are backward edges.
The rightmost path in the DFS tree is the path that travels from the root to the rightmost vertex by always taking the rightmost (last-added) child.
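
A small sketch that builds a DFS tree, splits the edges into forward and backward edges, and extracts the rightmost path. It assumes a connected, undirected graph given as an adjacency dict, visits siblings in increasing vertex index, and writes each backward edge from the later-discovered vertex. The example graph is hypothetical, since the figure's edges are not recoverable. The rightmost path is read off by walking from the last discovered vertex back to the root, which is the same as always taking the last-added child.

```python
def dfs_tree(adj: dict[int, list[int]], root: int):
    """Return (discovery order, forward edges, backward edges, rightmost path)."""
    order, parent, forward = [], {root: None}, []

    def visit(u: int) -> None:
        order.append(u)
        for v in sorted(adj[u]):         # siblings chosen by vertex index
            if v not in parent:          # first time v is reached: tree edge
                parent[v] = u
                forward.append((u, v))
                visit(v)

    visit(root)
    index = {v: i for i, v in enumerate(order)}
    tree = {frozenset(e) for e in forward}
    all_edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    # Backward edges, written (later vertex, earlier vertex).
    backward = sorted(tuple(sorted(e, key=index.get, reverse=True))
                      for e in all_edges - tree)
    # Rightmost path: walk up from the last discovered vertex to the root.
    path, v = [], order[-1]
    while v is not None:
        path.append(v)
        v = parent[v]
    return order, forward, backward, path[::-1]


adj = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2], 4: [2, 5], 5: [4]}
print(dfs_tree(adj, 1))
# ([1, 2, 3, 4, 5], [(1, 2), (2, 3), (2, 4), (4, 5)], [(3, 1)], [1, 2, 4, 5])
```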

An Example [Figure: example graph with vertices v1 to v8]

The DFS Tree [Figure: DFS tree of the example graph, vertices v1 to v8]

Generating Candidates from a DFS Tree
Given a graph $G$, we extend it only from the vertices on the rightmost path:
We can add a backward edge from the rightmost vertex to some other vertex on the rightmost path.
We can add a forward edge from any vertex on the rightmost path; this increases the number of vertices by 1.
The order of generating the candidates is:
first the backward extensions: first to the root, then to the root's child, and so on;
then the forward extensions: first from the leaf, then from the leaf's father, and so on.
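
The generation order can be sketched on vertex numbers alone, assuming vertices are numbered $1, \ldots, n$ in discovery order and ignoring labels (a full implementation would additionally try every label for each position). The example uses a small hypothetical graph whose rightmost path is $1, 2, 4, 5$.

```python
def rightmost_extensions(path: list[int], edges: set[frozenset],
                         n: int) -> list[tuple[int, int]]:
    """Candidate new edges, in generation order, for a pattern on vertices 1..n.

    path is the rightmost path, root first. Labels are ignored here.
    """
    rightmost = path[-1]
    # Backward extensions: rightmost vertex -> path vertex, root first,
    # skipping pairs that are already connected.
    backward = [(rightmost, v) for v in path[:-1]
                if frozenset((rightmost, v)) not in edges]
    # Forward extensions: path vertex -> new vertex n + 1, starting at the leaf.
    forward = [(v, n + 1) for v in reversed(path)]
    return backward + forward


edges = {frozenset(e) for e in [(1, 2), (2, 3), (3, 1), (2, 4), (4, 5)]}
print(rightmost_extensions([1, 2, 4, 5], edges, 5))
# [(5, 1), (5, 2), (5, 6), (4, 6), (2, 6), (1, 6)]
```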

An Example [Figure: the DFS tree on vertices v1 to v8]

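DFS Codes and Their Orders
A DFS code is a sequence of tuples of the type $\langle v_i, v_j, L(v_i), L(v_j), L(v_i, v_j)\rangle$. The tuples are given in DFS order, and for each vertex its backward edges are listed before its forward edges.
A DFS code is canonical if it is the smallest of the graph's DFS codes, where codes are compared tuple by tuple in the ordering
$\langle v_i, v_j, L(v_i), L(v_j), L(v_i, v_j)\rangle < \langle v_x, v_y, L(v_x), L(v_y), L(v_x, v_y)\rangle$ if
$\langle v_i, v_j\rangle <_e \langle v_x, v_y\rangle$, or
$\langle v_i, v_j\rangle = \langle v_x, v_y\rangle$ and $\langle L(v_i), L(v_j), L(v_i, v_j)\rangle <_l \langle L(v_x), L(v_y), L(v_x, v_y)\rangle$.
The ordering $<_l$ of the label tuples is the lexicographical ordering.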

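Ordering the Edges
Let $e_{ij} = \langle v_i, v_j\rangle$ and $e_{xy} = \langle v_x, v_y\rangle$. Then $e_{ij} <_e e_{xy}$ if:
if $e_{ij}$ and $e_{xy}$ are both forward edges, then $j < y$, or $j = y$ and $i > x$;
if $e_{ij}$ and $e_{xy}$ are both backward edges, then $i < x$, or $i = x$ and $j < y$;
if $e_{ij}$ is forward and $e_{xy}$ is backward, then $j \le x$;
if $e_{ij}$ is backward and $e_{xy}$ is forward, then $i < y$.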

Example
[Figure: graphs $G_1$, $G_2$, $G_3$ on vertices $v_1, \ldots, v_4$]
$G_1$: $t_{11} = \langle v_1, v_2, a, a, q\rangle$, $t_{12} = \langle v_2, v_3, a, a, r\rangle$, $t_{13} = \langle v_3, v_1, a, a, r\rangle$, $t_{14} = \langle v_2, v_4, a, b, r\rangle$
$G_2$: $t_{21} = \langle v_1, v_2, a, a, q\rangle$, $t_{22} = \langle v_2, v_3, a, b, r\rangle$, $t_{23} = \langle v_2, v_4, a, a, r\rangle$, $t_{24} = \langle v_4, v_1, a, a, r\rangle$
$G_3$: $t_{31} = \langle v_1, v_2, a, a, q\rangle$, $t_{32} = \langle v_2, v_3, a, a, r\rangle$, $t_{33} = \langle v_3, v_1, a, a, r\rangle$, $t_{34} = \langle v_1, v_4, a, b, r\rangle$
The first tuples are identical. In the second tuple, $G_2$ is bigger in the label order, so $t_{12} < t_{22}$. The last tuples of $G_1$ and $G_3$ are both forward edges with $4 = 4$ but $2 > 1$, so $t_{14} < t_{34}$. Hence $G_1$ is the smallest.
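
The two orders can be written as comparison functions. This sketch encodes a tuple as `(i, j, L(v_i), L(v_j), L(v_i, v_j))` with integer vertex indices and checks the three example codes using vertex labels a, b and edge labels q, r. Treating a proper prefix as smaller is an added convention, not something the slides state; codes of the same graph have equal length anyway.

```python
# A DFS-code tuple (i, j, L(v_i), L(v_j), L(v_i, v_j)); i < j means forward edge.
Entry = tuple[int, int, str, str, str]


def edge_less(e: tuple[int, int], f: tuple[int, int]) -> bool:
    """The edge order <_e between two distinct edges (i, j) and (x, y)."""
    (i, j), (x, y) = e, f
    if i < j and x < y:                  # both forward
        return j < y or (j == y and i > x)
    if i > j and x > y:                  # both backward
        return i < x or (i == x and j < y)
    if i < j:                            # e forward, f backward
        return j <= x
    return i < y                         # e backward, f forward


def entry_less(s: Entry, t: Entry) -> bool:
    if s[:2] != t[:2]:
        return edge_less(s[:2], t[:2])
    return s[2:] < t[2:]                 # lexicographic order <_l on the labels


def code_less(c: list[Entry], d: list[Entry]) -> bool:
    """Compare two DFS codes tuple by tuple (a proper prefix counts as smaller)."""
    for s, t in zip(c, d):
        if s != t:
            return entry_less(s, t)
    return len(c) < len(d)


g1 = [(1, 2, "a", "a", "q"), (2, 3, "a", "a", "r"), (3, 1, "a", "a", "r"), (2, 4, "a", "b", "r")]
g2 = [(1, 2, "a", "a", "q"), (2, 3, "a", "b", "r"), (2, 4, "a", "a", "r"), (4, 1, "a", "a", "r")]
g3 = [(1, 2, "a", "a", "q"), (2, 3, "a", "a", "r"), (3, 1, "a", "a", "r"), (1, 4, "a", "b", "r")]
print(code_less(g1, g2), code_less(g1, g3))  # True True
```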

Building the Candidates
The candidates are built in a DFS code tree. A DFS code $\alpha$ is an ancestor of DFS code $\beta$ if $\alpha$ is a proper prefix of $\beta$. The siblings in the tree follow the DFS code order.
A graph can be frequent only if all of the graphs representing its ancestors in the DFS code tree are frequent.
The DFS code tree contains all the canonical codes for all the subgraphs of the graphs in the data, but not all of the vertices in the code tree correspond to canonical codes.
We will (implicitly) traverse this tree.

The Algorithm
gSpan: for each frequent 1-edge graph, call subgrm to grow all nodes in the code tree rooted in this 1-edge graph, then remove this edge from the graphs in $D$.
subgrm: if the code is not canonical, return. Otherwise, add this graph to the set of frequent graphs, create each super-graph with one more edge and compute its frequency, and call subgrm with each frequent super-graph.
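
The recursion can be sketched as a skeleton in which the gSpan-specific parts are supplied as callbacks: `extend`, `is_canonical` and `support` are hypothetical parameters, not an existing API, and the step that removes each processed edge from the graphs in $D$ is omitted. The toy callbacks in the usage example only exercise the control flow, with non-decreasing tuples standing in for canonical codes.

```python
from typing import Callable, Iterable

Code = tuple  # a DFS code, e.g. a tuple of 5-tuples


def gspan_skeleton(
    initial: Iterable[Code],
    extend: Callable[[Code], Iterable[Code]],
    is_canonical: Callable[[Code], bool],
    support: Callable[[Code], int],
    minsup: int,
) -> list[Code]:
    """Depth-first traversal of the DFS code tree, following the pseudocode."""
    frequent: list[Code] = []

    def subgrm(code: Code) -> None:
        if not is_canonical(code):      # non-canonical: duplicate subtree, prune
            return
        frequent.append(code)
        for child in extend(code):      # super-graphs with one more edge
            if support(child) >= minsup:
                subgrm(child)

    for code in initial:                # frequent 1-edge codes
        subgrm(code)
    return frequent


result = gspan_skeleton(
    initial=[(1,), (2,), (3,)],
    extend=lambda c: [c + (x,) for x in (1, 2, 3)] if len(c) < 3 else [],
    is_canonical=lambda c: list(c) == sorted(c),
    support=lambda c: 1 if sum(c) <= 4 else 0,
    minsup=1,
)
print(result)
# [(1,), (1, 1), (1, 1, 1), (1, 1, 2), (1, 2), (1, 3), (2,), (2, 2), (3,)]
```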