1 Prefix-Free Regular-Expression Matching
Yo-Sub Han, Yajun Wang and Derick Wood
Department of Computer Science, HKUST
Prefix-Free Regular-Expression Matching p.1/15

2 Pattern Matching
Given a pattern P and a text T, find all substrings of T that are in P.
|P| = 1: string pattern matching [BM, KMP]
|P| = k: keyword pattern matching [AC]
P is a regular expression: regular-expression pattern matching!!!

3 Overview
Basic Notions
Related Work
Regular-Expression Matching
Infix-Free Regular-Expression Matching
Prefix-Free Regular-Expression Matching
Determine whether or not L(E) is prefix-free
Conclusions

4 Basic Notions
An automaton A is specified by a tuple (Q, Σ, δ, s, F):
Q is a finite set of states
Σ is a finite alphabet
δ ⊆ Q × Σ × Q
s ∈ Q is the start state
F ⊆ Q is a set of final states
λ = the null-string symbol
|A| = |Q| + |δ|
|E| = the number of character appearances in a given regular expression E

5 Basic Notions
Given a transition (p, a, q) in δ:
p has an out-transition
q has an in-transition
p is a source state of q
q is a target state of p
A is non-returning if the start state of A does not have any in-transitions.
A is non-exiting if a final state of A does not have any out-transitions.
[Figure: a transition from state p to state q]

6 Basic Notions
Given two strings x and y over Σ, we say
x is a prefix of y if there exists z ∈ Σ* such that xz = y.
x is an infix of y if there exist u, v ∈ Σ* such that uxv = y; we often call x a substring of y.

7 Basic Notions
We define a language L to be
prefix-free if no string in L is a prefix of any other string in L.
infix-free if no string in L is an infix of any other string in L.
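For a finite language the two definitions can be checked directly by comparing every ordered pair of distinct strings. The sketch below is only an illustration of the definitions; the function names are chosen here and do not come from the slides, and regular languages are generally infinite, so this brute-force check does not replace the automaton-based test discussed later.

```python
from itertools import permutations


def is_prefix(x: str, y: str) -> bool:
    """x is a prefix of y if xz = y for some string z."""
    return y.startswith(x)


def is_infix(x: str, y: str) -> bool:
    """x is an infix (substring) of y if uxv = y for some strings u, v."""
    return x in y


def is_prefix_free(language) -> bool:
    """No string is a prefix of any other string in the (finite) language."""
    return not any(is_prefix(x, y) for x, y in permutations(set(language), 2))


def is_infix_free(language) -> bool:
    """No string is an infix of any other string in the (finite) language."""
    return not any(is_infix(x, y) for x, y in permutations(set(language), 2))
```

For instance, {"ab", "cab"} is prefix-free but not infix-free, because "ab" occurs inside "cab" without being a prefix of it.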

8 Related Work
Given a regular expression E and a text T, where m is the size of E and n is the size of T:
The membership problem: we can determine whether or not T ∈ L(E) in O(mn) time [Thompson].
The decision problem: we can determine whether or not there is a substring of T that is in L(E) in O(mn) time [Aho] or in O(mn / log n) time [Myers].
The recognition problem: we can report all end positions of matching substrings of T in O(mn) time [Aho] or in O(mn / log n) time [Myers].
The identification problem: we can report all (start, end) positions of matching substrings of T in O(mn log n) time [Myers et al.].

9 The Membership Problem
[Worked example for an expression E = ( + ) and a text T; the symbols of E and T are not recoverable]

19 The Recognition Problem
Given E over Σ, we prepend Σ* to E, thus allowing matching to begin at any position in T.
[Figure: the example expression with Σ* prepended; its symbols are not recoverable]

20 The Recognition Problem
ExpressionMatching(A, T)
    Q = null({s})
    if f ∈ Q then output λ
    for j = 1 to n
        Q = null(goto(Q, w_j))
        if f ∈ Q then output j
null(Q) computes all states in A that can be reached from a state in the set Q of states by null transitions.
goto(Q, w_j) gives all states that can be reached from a state in Q by a transition with w_j, the current input character.
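The ExpressionMatching procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the automaton is given as a dictionary `delta` mapping (state, label) to a set of target states, `LAMBDA` is a label chosen here for null transitions, and the names `null_closure`, `goto` and `expression_matching` are illustrative. An output of 0 stands for the empty match (the slide's "output λ").

```python
LAMBDA = None  # label used for null transitions (an assumption of this sketch)


def null_closure(delta, states):
    """All states reachable from `states` by null transitions only."""
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for q in delta.get((p, LAMBDA), ()):
            if q not in closure:
                closure.add(q)
                stack.append(q)
    return closure


def goto(delta, states, ch):
    """All states reachable from `states` by one transition labeled `ch`."""
    return {q for p in states for q in delta.get((p, ch), ())}


def expression_matching(delta, start, final, text):
    """Return every position j (1-based) at which the final state is reached."""
    ends = []
    current = null_closure(delta, {start})
    if final in current:
        ends.append(0)  # the empty string is accepted
    for j, ch in enumerate(text, start=1):
        current = null_closure(delta, goto(delta, current, ch))
        if final in current:
            ends.append(j)
    return ends


# Hand-built automaton for Σ*ab over Σ = {a, b} (illustrative example only):
# state 0 loops on every character, which plays the role of the prepended Σ*.
SIGMA_STAR_AB = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
```

With this automaton, `expression_matching(SIGMA_STAR_AB, 0, 2, "abaab")` reports the end positions 2 and 5. Each step touches every transition at most once, which is where the O(mn) bound comes from.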

21 The Recognition Problem
[Figure: the example expression E = ( + ) matched against a text T; symbols not recoverable]
Given a regular expression E and a text T, we can find all end positions of matching substrings of T in O(mn) worst-case time using O(m) space [Crochemore and Hancart].

22 The Identification Problem
Given a regular expression E and a text T, we can identify all matching substrings of T that belong to L(E) in O(mn²) worst-case time using O(m) space.
Note that the algorithm of Myers et al. takes O(mn log n) time using O(m log n) space.
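One straightforward way to reach an O(mn²) bound is to restart the state-set simulation at every start position of T and report each position at which the final state appears. The slides do not say that this is the construction behind the stated bound, so the sketch below is an assumption; the dictionary representation of the automaton and the names `closure`, `step` and `identify_all` are also chosen here for illustration. Empty matches are ignored.

```python
LAMBDA = None  # label used for null transitions (an assumption of this sketch)


def closure(delta, states):
    """All states reachable from `states` by null transitions only."""
    result, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for q in delta.get((p, LAMBDA), ()):
            if q not in result:
                result.add(q)
                stack.append(q)
    return result


def step(delta, states, ch):
    """Advance the state set by one input character, then close under nulls."""
    moved = {q for p in states for q in delta.get((p, ch), ())}
    return closure(delta, moved)


def identify_all(delta, start, final, text):
    """Return all (start, end) pairs, 1-based and inclusive, of matches."""
    matches = []
    n = len(text)
    for i in range(1, n + 1):           # n start positions ...
        current = closure(delta, {start})
        for j in range(i, n + 1):       # ... each scanned in O(mn) time
            current = step(delta, current, text[j - 1])
            if not current:
                break                   # no run survives from this start
            if final in current:
                matches.append((i, j))
    return matches


# Hand-built automaton for the expression ab* (illustrative example only).
A_B_STAR = {(0, "a"): {1}, (1, "b"): {1}}
```

On the text "abab" this reports (1, 1), (1, 2), (3, 3) and (3, 4). Only one state set is alive at a time, so the extra space beyond the automaton stays O(m).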

23 Infix-Free Regular-Expression Matching
L_IN ⊂ L_PRE ⊂ L_REG
Given an infix-free regular expression E and a text T, we can identify all matching substrings of T that belong to L(E) in O(mn) worst-case time using O(m) space.

24 Prefix-Free Regular-Expression Matching
L_IN ⊂ L_PRE ⊂ L_REG
If E is infix-free, we have an O(mn) running time algorithm.
If E is a (normal) regular expression, we have an O(mn²) running time algorithm.
If E is prefix-free, there are at most n matching substrings of T that belong to L(E), where n is the size of T.

25 Prefix-Free Regular-Expression Matching
Given a prefix-free regular expression E and a text T, we find all end positions of matching substrings of T in O(mn) time.
Let P = {p_1, p_2, ..., p_k} be the set of end positions of matching substrings, for k ≤ n.
Construct the Thompson automaton A = (Q, Σ, δ, s, f) for E^R.
Scan T^R = w_n ⋯ w_1 starting from the last position p_k in P to find the corresponding start position.

26 Prefix-Free Regular-Expression Matching
[Figure: the text T with the state set Q_15 attached to end position 15]
For the current input position i in T^R, Q_15 is the set of states such that there is a path from s to each state in Q_15 that spells out w_15 w_14 ⋯ w_i.
We keep reading T^R until we meet f.

29 Prefix-Free Regular-Expression Matching
[Figure: the text T with state sets Q_9, Q_10, Q_13 and Q_15]
In the worst case, there are k such sets of states and we need O(km) time for each character of T to update these k sets. Thus, the total running time is O(mn²) in the worst case, since k is at most n.

30 Prefix-Free Regular-Expression Matching
If a state r in A is reached from two different states p and q, where p ∈ Q_i and q ∈ Q_j, when reading a character w_h in EM, where h ≤ i < j, then both paths from p and q via r cannot reach f by reading any prefix of the remaining input in EM.
[Figure: T^R with positions j, i and h marked; p ∈ Q_i and q ∈ Q_j both enter r]

31 Prefix-Free Regular-Expression Matching
Each state in A appears in at most one reachable set.
Any two sets of reachable states are disjoint.
We need at most O(m) time to update all sets of reachable states simultaneously at each step.
Given a prefix-free regular expression E and a text T, we can identify all matching substrings of T that belong to L(E) in O(mn) worst-case time using O(m) space.

34 Prefix-Freeness
An FA A is prefix-free if L(A) is prefix-free.
A DFA A is prefix-free if it is non-exiting.
What about the NFA case?
If an NFA A is prefix-free, then A must be non-exiting.
However, the converse does not hold.
[Figure: an NFA with start state s and final state f; transition labels not recoverable]

35 State-Pair Graph
Given a finite-state automaton A = (Q, Σ, δ, s, f), we define the state-pair graph G_A = (V, E), where V is a set of nodes and E is a set of edges, as follows:
V = {(i, j) | q_i and q_j ∈ Q} and
E = {((i, j), a, (x, y)) | (q_i, a, q_x) and (q_j, a, q_y) ∈ δ and a ∈ Σ}
[Figure: example state-pair graph; recoverable node labels include (1,1), (3,3), (4,4), (4,6), (5,5), (5,7), (6,6) and (7,7)]

36 State-Pair Graph & Prefix-Freeness [CPM 2005]
Given a finite-state automaton A, L(A) is prefix-free if and only if there is no path from (1, 1) to (m, j), for any j ≠ m, in G_A.

37 State-Pair Graph & Prefix-Freeness [CPM 2005]
Given a finite-state automaton A = (Q, Σ, δ, s, f), we can determine whether or not L(A) is prefix-free in O(|Q|² + |δ|²) worst-case time.
Let G_A = (V, E) be the state-pair graph of A.
|V| = |Q|²
Let δ_i denote the set of out-transitions from state q_i in A.
|δ| = ∑_{i=1}^{m} |δ_i|, where m = |Q|
A node (i, j) in G_A can have at most |δ_i| |δ_j| out-transitions.
|E| ≤ ∑_{i,j=1}^{m} |δ_i| |δ_j| = |δ|²

38 State-Pair Graph & Prefix-Freeness [CPM 2005]
Given a regular expression E, we can determine whether or not L(E) is prefix-free in O(|E|²) worst-case time.
Construct the Thompson automaton for E.
|Q| = O(|E|) and |δ| = O(|E|)
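The state-pair idea can be sketched as a breadth-first search over pairs of states reached by the same string. The sketch below is a variant under stated assumptions rather than the exact criterion on the slides: it assumes an automaton without null transitions, given as a list of (p, a, q) triples, and instead of testing for a node (m, j) with j ≠ m it explicitly computes which states can reach a final state by a nonempty string. That makes the check independent of state numbering and of whether every state is useful. The function name and representation are chosen here for illustration.

```python
from collections import deque


def nfa_is_prefix_free(transitions, start, finals):
    """Decide prefix-freeness of L(A) for an NFA without null transitions."""
    out, preds = {}, {}
    for p, a, q in transitions:
        out.setdefault(p, {}).setdefault(a, set()).add(q)
        preds.setdefault(q, set()).add(p)
    finals = set(finals)

    # States with a path of length >= 1 to some final state (backward search).
    productive, queue = set(), deque()
    for f in finals:
        for p in preds.get(f, ()):
            if p not in productive:
                productive.add(p)
                queue.append(p)
    while queue:
        q = queue.popleft()
        for p in preds.get(q, ()):
            if p not in productive:
                productive.add(p)
                queue.append(p)

    # Search the state-pair graph from (start, start).
    seen = {(start, start)}
    queue = deque(seen)
    while queue:
        p, q = queue.popleft()
        # Some string x reaches both p and q: x is accepted via p, and x can
        # be extended by a nonempty string to an accepted string via q.
        if p in finals and q in productive:
            return False
        for a, targets_p in out.get(p, {}).items():
            for x in targets_p:
                for y in out.get(q, {}).get(a, ()):
                    if (x, y) not in seen:
                        seen.add((x, y))
                        queue.append((x, y))
    return True
```

As a usage example, the NFA with transitions (0, a, 2), (0, a, 1), (1, b, 2) and final state 2 is non-exiting, yet it accepts {a, ab}, so the function returns False; this is one concrete instance of a non-exiting NFA that is not prefix-free. The search visits each node and edge of the state-pair graph at most once, in line with the O(|Q|² + |δ|²) bound.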

39 Conclusions
Solve the prefix-free regular-expression matching problem in O(mn) time using O(m) space, based on Thompson automata.
Determine whether or not L(A) is prefix-free for a given NFA A in polynomial time, based on state-pair graphs.

