Prefix-Free Regular-Expression Matching
Yo-Sub Han, Yajun Wang and Derick Wood
Department of Computer Science, HKUST

Pattern Matching
Given a pattern P and a text T, find all substrings of T that are in P.
- |P| = 1: string pattern matching [BM, KMP]
- |P| = k: keyword pattern matching [AC]
- P is a regular expression: regular-expression pattern matching!!!

Overview
- Basic Notions
- Related Work
- Regular-Expression Matching
- Infix-Free Regular-Expression Matching
- Prefix-Free Regular-Expression Matching
- Determine whether or not L(E) is prefix-free
- Conclusions

Basic Notions
An automaton A is specified by a tuple (Q, Σ, δ, s, F):
- Q: a finite set of states
- Σ: a finite alphabet
- δ ⊆ Q × Σ × Q: the set of transitions
- s ∈ Q: the start state
- F ⊆ Q: the set of final states
λ = the null-string symbol
|A| = |Q| + |δ|
|E| = the number of character appearances in a given regular expression E

Given a transition (p, a, q) in δ:
- p has an out-transition
- q has an in-transition
- p is a source state of q
- q is a target state of p
A is non-returning if the start state of A does not have any in-transitions.
A is non-exiting if a final state of A does not have any out-transitions.
[Figure: a transition from p to q]

Given two strings x and y over Σ, we say
- x is a prefix of y if there exists z ∈ Σ* such that xz = y;
- x is an infix of y if there exist u, v ∈ Σ* such that uxv = y; we often call x a substring of y.

We define a language L to be
- prefix-free if no string in L is a prefix of any other string in L;
- infix-free if no string in L is an infix of any other string in L.
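
The two definitions above can be checked directly on a finite language. The following is a minimal sketch, assuming the language is given as a Python set of strings; the function names are illustrative and not taken from the slides.

```python
def is_prefix(x, y):
    """x is a prefix of y if xz = y for some string z."""
    return y.startswith(x)


def is_infix(x, y):
    """x is an infix (substring) of y if uxv = y for some strings u, v."""
    return x in y


def is_prefix_free(language):
    """No string of the language is a prefix of another string in it."""
    return not any(x != y and is_prefix(x, y)
                   for x in language for y in language)


def is_infix_free(language):
    """No string of the language is an infix of another string in it."""
    return not any(x != y and is_infix(x, y)
                   for x in language for y in language)


# {"ab", "b"} is prefix-free, but "b" is an infix of "ab".
print(is_prefix_free({"ab", "b"}), is_infix_free({"ab", "b"}))
```

Since every prefix is also an infix, any infix-free language is prefix-free, while the example shows that the converse can fail.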

Related Work
Given a regular expression E of size m and a text T of size n:
- The membership problem: we can determine whether or not T ∈ L(E) in O(mn) time [Thompson].
- The decision problem: we can determine whether or not there is a substring of T that is in L(E) in O(mn) time [Aho] or in O(mn / log n) time [Myers].
- The recognition problem: we can report all end positions of matching substrings of T in O(mn) time [Aho] or in O(mn / log n) time [Myers].
- The identification problem: we can report all (start, end) positions of matching substrings of T in O(mn log n) time [Myers et al.].

The Membership Problem
E = ( + ) and T =

The Recognition Problem
Given E over Σ, we prepend Σ* to E, thus allowing matching to begin at any position in T.
[Figure: automaton for Σ*( + )]
ExpressionMatching(A, T)
    Q = null({s})
    if f ∈ Q then output λ
    for j = 1 to n
        Q = null(goto(Q, w_j))
        if f ∈ Q then output j
- null(Q) computes all states in A that can be reached from a state in the set Q of states by null transitions.
- goto(Q, w_j) gives all states that can be reached from a state in Q by a transition with w_j, the current input character.
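
The ExpressionMatching procedure can be sketched in Python as follows. This is only an illustration under assumptions not made on the slides: transitions are stored in a dictionary keyed by (state, symbol), null transitions use the label `None`, and position 0 stands for the output λ. The example automaton for Σ*ab over Σ = {a, b} is hand-built, not produced by the Thompson construction.

```python
LAMBDA = None  # assumed label for null (λ) transitions


def null(delta, states):
    """All states reachable from `states` using null transitions only."""
    closure = set(states)
    stack = list(states)
    while stack:
        p = stack.pop()
        for q in delta.get((p, LAMBDA), ()):
            if q not in closure:
                closure.add(q)
                stack.append(q)
    return closure


def goto(delta, states, symbol):
    """All states reachable from `states` by one transition on `symbol`."""
    return {q for p in states for q in delta.get((p, symbol), ())}


def expression_matching(delta, start, final, text):
    """Return every position j (1-based) after which `final` is active."""
    ends = []
    current = null(delta, {start})
    if final in current:
        ends.append(0)  # corresponds to "output λ"
    for j, symbol in enumerate(text, start=1):
        current = null(delta, goto(delta, current, symbol))
        if final in current:
            ends.append(j)
    return ends


# Hand-built automaton for Σ*ab: state 0 loops on Σ, then reads "ab".
SIGMA_STAR_AB = {
    (0, "a"): {0}, (0, "b"): {0}, (0, LAMBDA): {1},
    (1, "a"): {2}, (2, "b"): {3},
}
print(expression_matching(SIGMA_STAR_AB, 0, 3, "abab"))
```

Each step touches every state and transition at most once, which is where the O(m) cost per input character comes from.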
[Figure: example with E = ( + ) and text T]
Given a regular expression E and a text T, we can find all end positions of matching substrings of T in O(mn) worst-case time using O(m) space [Crochemore and Hancart].

The Identification Problem
Given a regular expression E and a text T, we can identify all matching substrings of T that belong to L(E) in O(mn^2) worst-case time using O(m) space.
Note that the algorithm of Myers et al. takes O(mn log n) time using O(m log n) space.
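
One straightforward way to reach the O(mn^2) bound is to restart the automaton simulation at every text position. The sketch below assumes a λ-free automaton stored as a dictionary keyed by (state, symbol) and ignores empty matches; it is an illustration of the bound, not necessarily the procedure the authors have in mind.

```python
def identify(delta, start, final, text):
    """Report all (start, end) pairs, 1-based and inclusive, such that
    the substring between them is accepted by the automaton."""
    matches = []
    n = len(text)
    for i in range(n):              # n candidate start positions
        current = {start}
        for j in range(i, n):       # at most n characters per start
            current = {q for p in current
                       for q in delta.get((p, text[j]), ())}
            if not current:
                break               # no run survives, try the next start
            if final in current:
                matches.append((i + 1, j + 1))
    return matches


# Assumed example automaton for aa*: 0 -a-> 1, 1 -a-> 1, final state 1.
A_PLUS = {(0, "a"): {1}, (1, "a"): {1}}
print(identify(A_PLUS, 0, 1, "aaa"))
```

Only the current state set is kept besides the output, which matches the O(m) working space stated above.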

Infix-Free Regular-Expression Matching
[Figure: nested language classes L_IN ⊆ L_PRE ⊆ L_REG; text T with positions 1 to 15]
Given an infix-free regular expression E and a text T, we can identify all matching substrings of T that belong to L(E) in O(mn) worst-case time using O(m) space.

Prefix-Free Regular-Expression Matching
[Figure: nested language classes L_IN ⊆ L_PRE ⊆ L_REG]
- If E is infix-free, we have an O(mn) running time algorithm.
- If E is a (normal) regular expression, we have an O(mn^2) running time algorithm.
- If E is prefix-free, there are at most n matching substrings of T that belong to L(E), where n is the size of T.
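
The bound of n matches follows because two matches starting at the same position would make one a prefix of the other, so each start position contributes at most one match. A small sketch, assuming finite languages given as Python sets of strings (the example languages are made up for illustration):

```python
def matching_substrings(text, language):
    """All (start, end) pairs, 1-based and inclusive, whose substring
    of `text` belongs to `language`."""
    n = len(text)
    return [(i + 1, j)
            for i in range(n)
            for j in range(i + 1, n + 1)
            if text[i:j] in language]


# Prefix-free language: at most one match per start position.
prefix_free_matches = matching_substrings("abab", {"ab", "b"})
print(len(prefix_free_matches), prefix_free_matches)

# Not prefix-free: the number of matches can exceed the text length.
other_matches = matching_substrings("aaa", {"a", "aa", "aaa"})
print(len(other_matches), other_matches)
```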
Given a prefix-free regular expression E and a text T, we find all end positions of matching substrings of T in O(mn) time.
[Figure: text T with positions 1 to 15]
- Let P = {p_1, p_2, ..., p_k} be the set of end positions of matching substrings, for k ≤ n.
- Construct the Thompson automaton A = (Q, Σ, δ, s, f) for E^R.
- Scan T^R = w_n ... w_1 starting from the last position p_k in P to find the corresponding start position.
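
The slide builds the Thompson automaton for the reversed expression E^R. As a stand-in that avoids parsing expressions, the sketch below reverses an already-built automaton instead: it flips every transition and swaps s and f, which yields an automaton for the reversed language. It assumes a λ-free automaton with a single final state, given as a set of (p, a, q) triples; the example automaton is hypothetical.

```python
def reverse_automaton(transitions, start, final):
    """Flip every transition and swap the start and final states."""
    reversed_transitions = {(q, a, p) for (p, a, q) in transitions}
    return reversed_transitions, final, start


def accepts(transitions, start, final, word):
    """Simulate a λ-free automaton on `word`."""
    current = {start}
    for symbol in word:
        current = {q for (p, a, q) in transitions
                   if p in current and a == symbol}
    return final in current


# Assumed example: automaton for a b b*, states 0, 1, 2, final state 2.
AB_PLUS = {(0, "a", 1), (1, "b", 2), (2, "b", 2)}
REV, REV_START, REV_FINAL = reverse_automaton(AB_PLUS, 0, 2)
print(accepts(AB_PLUS, 0, 2, "abb"), accepts(REV, REV_START, REV_FINAL, "bba"))
```

Running the reversed automaton on T^R from an end position then plays the role of the backward scan described above.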
[Figure: text T with positions 1 to 15; the state sets Q_15, Q_13, Q_10 and Q_9 are introduced one at a time as T^R is read]
For the current input position i in T^R, Q_15 is the set of states such that there is a path from s to each state in Q_15 that spells out w_15 w_14 ... w_i. We keep reading T^R until we meet f.
In the worst case, there are k such sets of states and we need O(km) time for each character of T to update these k sets. Thus, the total running time is O(mn^2) in the worst case since k is at most n.

Prefix-Free Regular-Expression Matching
If a state r in A is reached from two different states p and q, where p ∈ Q_i and q ∈ Q_j, when reading a character w_h in EM, where h ≤ i < j, then both paths from p and q via r cannot reach f by reading any prefix of the remaining input in EM.
[Figure: T^R with positions j, i and h; p ∈ Q_i and q ∈ Q_j both enter r]
- Each state in A appears in at most one reachable set.
- Any two sets of reachable states are disjoint.
- We need at most O(m) time to update all sets of reachable states simultaneously at each step.
Given a prefix-free regular expression E and a text T, we can identify all matching substrings of T that belong to L(E) in O(mn) worst-case time using O(m) space.

Prefix-Freeness
- An FA A is prefix-free if L(A) is prefix-free.
- A DFA A is prefix-free if it is non-exiting.
- What about the NFA case?
  - If an NFA A is prefix-free, then A must be non-exiting.
  - However, the reverse does not hold.
[Figure: an example NFA with start state s and final state f]

State-Pair Graph
Given a finite-state automaton A = (Q, Σ, δ, s, f), we define the state-pair graph G_A = (V, E), where V is a set of nodes and E is a set of edges, as follows:
V = {(i, j) | q_i and q_j ∈ Q} and
E = {((i, j), a, (x, y)) | (q_i, a, q_x) and (q_j, a, q_y) ∈ δ and a ∈ Σ}.
[Figure: an example automaton with states 1 to 7 and its state-pair graph with nodes (1,1), (2,2), (3,3), (4,4), (4,6), (5,5), (5,7), (6,6), (7,7)]
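
The definition translates almost literally into code. The sketch below assumes a λ-free automaton given as a set of states and a set of (p, a, q) triples, and uses state names in place of the indices i and j; the small automaton is made up for illustration.

```python
from itertools import product


def state_pair_graph(states, transitions):
    """Build G_A = (V, E): nodes are all pairs of states, and there is an
    edge labelled a from (p1, p2) to (q1, q2) whenever both (p1, a, q1)
    and (p2, a, q2) are transitions of the automaton."""
    nodes = set(product(states, repeat=2))
    edges = {((p1, p2), a, (q1, q2))
             for (p1, a, q1) in transitions
             for (p2, b, q2) in transitions
             if a == b}
    return nodes, edges


# Assumed example automaton with three states and three transitions.
STATES = {1, 2, 3}
DELTA = {(1, "a", 2), (1, "a", 3), (3, "b", 2)}
NODES, EDGES = state_pair_graph(STATES, DELTA)
print(len(NODES), len(EDGES))
```

Every edge corresponds to an ordered pair of transitions, so the graph has |Q|^2 nodes and at most |δ|^2 edges.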

State-Pair Graph & Prefix-Freeness (CPM 2005)
Given a finite-state automaton A with start state q_1 and final state q_m, L(A) is prefix-free if and only if there is no path from (1, 1) to (m, j), for any j ≠ m, in G_A.
[Figure: the example automaton with states 1 to 7 and its state-pair graph]
Given a finite-state automaton A = (Q, Σ, δ, s, f), we can determine whether or not L(A) is prefix-free in O(|Q|^2 + |δ|^2) worst-case time:
- Let G_A = (V, E) be the state-pair graph of A.
- |V| = |Q|^2.
- Let δ_i denote the set of out-transitions from state q_i in A.
- |δ| = Σ_{i=1}^{m} |δ_i|, where m = |Q|.
- A node (i, j) in G_A can have at most |δ_i| × |δ_j| out-transitions.
- |E| ≤ Σ_{i,j=1}^{m} |δ_i| × |δ_j| = |δ|^2.
Given a regular expression E, we can determine whether or not L(E) is prefix-free in O(|E|^2) worst-case time:
- Construct the Thompson automaton for E.
- |Q| = |δ| = O(|E|).
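
The test can be sketched as a breadth-first search over state pairs. The sketch assumes a λ-free automaton with a single final state, given as (p, a, q) triples. Instead of the condition j ≠ m it checks whether the second state can still reach the final state by at least one transition; for a non-exiting automaton in which every state can reach the final state, the two conditions coincide. This variation and all names are assumptions of the sketch, not the authors' code.

```python
from collections import deque


def language_is_prefix_free(transitions, start, final):
    out, rev = {}, {}
    for p, a, q in transitions:
        out.setdefault(p, []).append((a, q))
        rev.setdefault(q, []).append(p)

    # States that can reach `final` using at least one transition.
    can_extend = set()
    queue = deque([final])
    while queue:
        q = queue.popleft()
        for p in rev.get(q, []):
            if p not in can_extend:
                can_extend.add(p)
                queue.append(p)

    # Explore the state-pair graph from (start, start).
    seen = {(start, start)}
    queue = deque(seen)
    while queue:
        p, q = queue.popleft()
        if p == final and q in can_extend:
            return False  # some accepted string has an accepted extension
        for a, x in out.get(p, []):
            for b, y in out.get(q, []):
                if a == b and (x, y) not in seen:
                    seen.add((x, y))
                    queue.append((x, y))
    return True


# Non-exiting NFA for {a, ab}: state "f" has no out-transitions,
# yet the language is not prefix-free.
A_OR_AB = {("s", "a", "f"), ("s", "a", "t"), ("t", "b", "f")}
print(language_is_prefix_free(A_OR_AB, "s", "f"))
```

The search visits each node and edge of G_A at most once, in line with the O(|Q|^2 + |δ|^2) bound.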

Conclusions
- Solve the prefix-free regular-expression matching problem in O(mn) time using O(m) space, based on the Thompson automata.
- Determine whether or not L(A) is prefix-free for a given NFA A in polynomial time, based on state-pair graphs.