Fast index for approximate string matching

Dekel Tsur

Abstract

We present an index that stores a text of length $n$ such that given a pattern of length $m$, all the substrings of the text that are within Hamming distance (or edit distance) at most $k$ from the pattern are reported in $O(m + \log\log n + \#\mathrm{matches})$ time (for constant $k$). The space complexity of the index is $O(n^{1+\epsilon})$ for any constant $\epsilon > 0$.

1 Introduction

One of the fundamental problems in pattern matching is indexing a text $t$ such that given a query pattern $p$, all the occurrences of $p$ in $t$ can be reported efficiently. This can be solved optimally using suffix trees [12]: the construction time and space complexity of the index is $O(n)$, and the query time is $O(m + \#\mathrm{matches})$, where $n$ is the length of $t$, $m$ is the length of $p$, and $\#\mathrm{matches}$ is the number of times $p$ appears in $t$. For simplicity, we shall assume throughout the paper that the size of the alphabet is constant.

A natural extension of text indexing is to allow approximate search in the index. Formally, given a text $t$ and an integer $k$, the goal is to build an index for $t$ such that given a query string $p$, all the substrings of $t$ with Hamming distance (or edit distance) at most $k$ from $p$ can be reported efficiently. Again for simplicity, we assume throughout that $k$ is constant.

Building an approximate index with almost linear space and query time was a major open problem. The first efficient approximate index was obtained for the case $k = 1$ by Amir et al. [1]. The index of Amir et al. uses $O(n \log^2 n)$ space, and answers queries in time $O(m \log n \log\log n + \#\mathrm{matches})$. A faster query time is obtained using the data-structure of [2]. Linear space indices that support one error were given in [7,8]. A big breakthrough was obtained by Cole et al. [4], who presented an index that supports an arbitrary number of errors. The index of Cole et al. uses $O(n \log^k n)$ space and answers queries in time $O(m + \log^k n \log\log n + \#\mathrm{matches})$. Chan et al. [3] gave an $O(n)$-space index that answers queries in time $O(m + (\log n)^{k(k+1)} \log\log n + \#\mathrm{matches})$. Most of the results above work for both Hamming distance and edit distance.
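To make the indexing problem stated above concrete, the following is a minimal baseline sketch for the Hamming distance variant. It is not the index described in this paper: it scans the text in $O(nm)$ time per query and uses no preprocessing. The function names `hamming` and `naive_k_mismatch` are illustrative assumptions, and positions are 0-based.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def naive_k_mismatch(text: str, pattern: str, k: int) -> list[int]:
    """Start positions (0-based) of the substrings of text whose Hamming
    distance from pattern is at most k. Brute force, for illustration only."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if hamming(text[i:i + m], pattern) <= k]


if __name__ == "__main__":
    print(naive_k_mismatch("abracadabra", "abda", 1))
```

Any index for the problem has to report exactly the positions this scan reports; the point of the constructions discussed here is to avoid the dependence of the query time on $n$.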
We note that the query time complexity of the edit distance index in [4] is $O(m + \log^k n \log\log n + 3^k \cdot \#\mathrm{matches})$. However, as we assume here that $k$ is constant, the time complexity becomes $O(m + \log^k n \log\log n + \#\mathrm{matches})$.

Author affiliation: Department of Computer Science, Ben-Gurion University of the Negev. Email: dekelts@cs.bgu.ac.il

The indices mentioned above have worst case performance guarantees. Indices with good performance on average were given in [5,6,9-11].

In this paper, we show how to speed up the query time in the index of Cole et al. This comes at a cost of increasing the space complexity of the index. More precisely, we show that for every integer $\alpha$ with $2 \le \alpha \le n/2$, there is an $O(n(\alpha \log\alpha \log n)^k)$-space index (for Hamming distance or edit distance) that answers queries in time $O(m + (\log_\alpha n)^k \log\log n + \#\mathrm{matches})$. In particular, for every fixed $\epsilon > 0$, one can take $\alpha = \log^{\epsilon/(2k)} n$ and get an index with space complexity $O(n \log^{k+\epsilon} n)$ and query time $O(m + \log^k n/(\log\log n)^{k-1} + \#\mathrm{matches})$ (recall that $k$ is assumed to be constant, so $\log_\alpha n = \Theta(\log n/\log\log n)$). To get a faster query time, one can take $\alpha = n^{\epsilon/(2k)}$ for some $\epsilon > 0$ and get an index with space complexity $O(n^{1+\epsilon})$ and query time $O(m + \log\log n + \#\mathrm{matches})$.

2 Preliminaries

Let $s_1,\ldots,s_n$ be a collection of strings, where each string ends with the character $\$$, and $\$$ does not appear elsewhere in $s_1,\ldots,s_n$. A compressed trie for $s_1,\ldots,s_n$ is a rooted tree $T$ that has $n$ leaves and each internal vertex has at least two children. Every edge of $T$ is labeled by a string. Every string $s_i$ corresponds to a distinct leaf $v_i$ of $T$ such that the concatenation of the labels of the edges on the path from the root of $T$ to $v_i$ is exactly $s_i$.

A location $l$ on a compressed trie $T$ is a pair $(v,s)$ where $v$ is a vertex of $T$ and $s$ is an empty string or a proper prefix of the label of some edge between $v$ and a child of $v$. We will sometimes refer to a vertex $v$ as a location $(v,\epsilon)$ and vice versa. For a vertex $v$ in a compressed trie $T$, the string that corresponds to $v$ is the concatenation of the labels on the path from the root of $T$ to $v$. For a location $l = (v,s)$, the string that corresponds to $l$, denoted $\mathrm{str}(l)$, is the concatenation of the string that corresponds to $v$ and $s$.

The weight of a vertex $v$ in a tree $T$ is the number of descendant leaves of $v$.
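The definitions of a compressed trie and of vertex weight can be illustrated with a short sketch. This is a simple quadratic-time construction by repeated insertion with edge splitting, not the linear-time suffix tree construction cited in the introduction. The names `Node`, `insert`, `build` and `weight` are assumptions made for this example, and the code relies on every input string ending with the terminator `$`, which appears nowhere else.

```python
class Node:
    """Vertex of a compressed trie; children maps first character -> (label, child)."""

    def __init__(self):
        self.children = {}


def insert(root, s):
    """Insert string s (assumed to end with the unique terminator '$')."""
    node = root
    while s:
        c = s[0]
        if c not in node.children:
            node.children[c] = (s, Node())      # new leaf edge labeled by the rest of s
            return
        label, child = node.children[c]
        k = 0                                   # length of the common prefix of label and s
        while k < len(label) and k < len(s) and label[k] == s[k]:
            k += 1
        if k == len(label):                     # whole edge label matched: descend
            node, s = child, s[k:]
        else:                                   # split the edge at offset k
            mid = Node()
            mid.children[label[k]] = (label[k:], child)
            mid.children[s[k]] = (s[k:], Node())
            node.children[c] = (label[:k], mid)
            return


def build(strings):
    """Compressed trie of the given '$'-terminated strings."""
    root = Node()
    for s in strings:
        insert(root, s)
    return root


def weight(v):
    """Number of descendant leaves of v (a leaf has weight 1)."""
    if not v.children:
        return 1
    return sum(weight(child) for _, child in v.children.values())


if __name__ == "__main__":
    text = "banana$"
    trie = build([text[i:] for i in range(len(text))])   # all suffixes of the text
    print(weight(trie))
```

Because every split creates a vertex with exactly two children and unmatched strings hang off existing vertices, every internal vertex ends up with at least two children, as the definition requires.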
A path $[v_1,\ldots,v_d]$ in a tree $T$ is a heavy path if (1) $v_1$ is the root of $T$, (2) $v_d$ is a leaf, and (3) for every $i < d$, there is no child of $v_i$ with weight greater than the weight of $v_{i+1}$. A heavy path decomposition of a tree $T$ is a set $\mathcal{C}$ of paths in $T$ such that (1) $\mathcal{C}$ contains a heavy path $C$ of $T$, and (2) for every connected component $T'$ in $T - C$, $\mathcal{C}$ contains the paths in a heavy path decomposition of $T'$ ($T - C$ is the graph obtained from $T$ by removing the vertices of $C$). For a heavy path decomposition $\mathcal{C}$ define $T_{\mathcal{C}}$ to be a rooted tree whose set of vertices is $\mathcal{C}$, and there is an edge from $C$ to $C'$ in $T_{\mathcal{C}}$ if there is a vertex $v \in C$ such that the topmost vertex in $C'$ is a child of $v$ in $T$.

Given a heavy path decomposition $\mathcal{C}$ of a compressed trie $T$ and a location $l = (v,s)$ in $T$, $\mathrm{nextloc}(l)$ is the location reached when moving from $l$ one character along the path $C \in \mathcal{C}$ that contains $v$. Formally, $\mathrm{nextloc}(l)$ is the location $l' = (v',s')$ such that the string $\mathrm{str}(l')$ is the prefix of length $|\mathrm{str}(l)| + 1$ of the string that corresponds to the bottommost vertex of $C$. If there is no such location $l'$ then $\mathrm{nextloc}(l)$ is undefined. We also define $\mathrm{next}(l)$ to be the last character of the string that corresponds to $\mathrm{nextloc}(l)$.

For a vertex $v$ in a compressed trie $T$, $\mathrm{nextchars}(v)$ is the set of all first characters in the labels of the edges between $v$ and its children. For a character $c \in \mathrm{nextchars}(v)$, let $w$ be the child of $v$ such that the first character of the label of the edge $(v,w)$ is $c$. We define $\mathrm{Sub}(T,v,c)$ to be the tree obtained by first taking the subtree of $T$ induced by $v$, $w$, and all the descendants of $w$. Furthermore, if the label of $(v,w)$ contains only one character then the vertex $v$ and the edge $(v,w)$ are removed from $\mathrm{Sub}(T,v,c)$. Otherwise, the first character of the label of $(v,w)$ is erased.

Let $T_1,\ldots,T_d$ be compressed tries. The merge of $T_1,\ldots,T_d$ is a compressed trie whose strings set is the union of the strings sets of $T_1,\ldots,T_d$.

3 k-mismatches index

The following problem is a generalization of the indexing problem that was discussed in the introduction.

Input: A compressed trie $T$ over strings $s_1,\ldots,s_n$.
Query: A string $p$, and a location $l$ on $T$.
Output: All the strings $s_i$ such that $\mathrm{str}(l)$ is a prefix of $s_i$ and the Hamming distance between $p$ and $s_i[|\mathrm{str}(l)|+1\,..\,|\mathrm{str}(l)|+m]$ is exactly $k$, where $m$ is the length of $p$.

A data-structure that solves the problem above is called an unrooted $k$-mismatches index. A data-structure that solves a simpler variant of the problem in which $\mathrm{str}(l)$ is always empty is called a rooted $k$-mismatches index. To solve the indexing problem mentioned in the introduction, one can construct a rooted $k'$-mismatches index on all the suffixes of the input string $t$ for all $k' \le k$.

We note that we use Hamming distance to simplify the presentation. The same techniques can also be used for edit distance.

We first describe the $k$-mismatches index of Cole et al. [4]. The main idea is to define new compressed tries called group trees, and recursively build a rooted $(k-1)$-mismatches index on each group tree (the recursion stops when $k$ is equal to 0). A $k$-mismatches query on $T$ is answered by making $(k-1)$-mismatches queries on $O(\log n)$ group trees.

Let $T$ be a compressed trie of the strings $s_1,\ldots,s_n$, and let $\mathcal{C}$ be a heavy path decomposition of $T$.
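The heavy path decomposition defined in the preliminaries can be sketched as follows. The sketch works on any rooted tree given as a dictionary from a vertex to the list of its children; this representation, the function names, and the tie-breaking rule (the first child of maximum weight is taken as the heavy child) are assumptions made for the example.

```python
def weights(children, root):
    """Map each vertex to its number of descendant leaves (a leaf has weight 1)."""
    w = {}

    def rec(v):
        kids = children.get(v, [])
        w[v] = 1 if not kids else sum(rec(c) for c in kids)
        return w[v]

    rec(root)
    return w


def heavy_path_decomposition(children, root):
    """Return a list of paths; each path starts at its topmost vertex and
    follows a maximum-weight child until it reaches a leaf."""
    w = weights(children, root)
    paths = []
    stack = [root]                    # topmost vertices of paths still to be traced
    while stack:
        v = stack.pop()
        path = [v]
        while children.get(v):
            kids = children[v]
            heavy = max(kids, key=lambda c: w[c])
            stack.extend(c for c in kids if c != heavy)   # light children start new paths
            path.append(heavy)
            v = heavy
        paths.append(path)
    return paths


if __name__ == "__main__":
    tree = {"r": ["a", "b"], "a": ["c", "d", "e"], "b": ["f", "g"]}
    for p in heavy_path_decomposition(tree, "r"):
        print(p)
```

A vertex that starts a new path is never a heavy child, so its weight is at most half the weight of its parent. A root-to-leaf walk therefore crosses at most about $\log_2 n$ paths, which is where the $O(\log n)$ bounds on group tree queries come from.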
Consider some heavy path $C \in \mathcal{C}$, and let $v_1,\ldots,v_d$ be the vertices along the path $C$ (where $v_1$ is the topmost vertex in the path). We define error trees as follows: For every vertex $v_i$ and every $c \in \mathrm{nextchars}(v_i) \setminus \{\mathrm{next}(v_i)\}$, the error tree $\mathrm{Err}(T,v_i,c)$ is equal to $\mathrm{Sub}(T,v_i,c)$. The error tree $\mathrm{Err}(T,v_i)$ is the tree obtained by merging the trees $\mathrm{Sub}(T,v_i,c)$ for every $c \in \mathrm{nextchars}(v_i) \setminus \{\mathrm{next}(v_i)\}$. Then, if the root $u$ of the resulting tree has more than one child we add a new root $u'$ and an edge $(u',u)$ with label $s$, where $s$ is the string obtained by concatenating the labels of the edges on the path from $v_1$ to $v_i$, and the character $\mathrm{next}(v_i)$. If $u$ has only one child we prepend the string $s$ to the label of the edge between $u$ and its child. See Figure 1 for examples of the definitions above.

Figure 1: Example of error trees. Figure (a) shows a heavy path $v_1,v_2,\ldots$ and the vertices hanging from this path. Figures (b) and (c) show two error trees of the form $\mathrm{Err}(T,v_2,c)$, for two different characters $c$. Figure (d) shows the error tree $\mathrm{Err}(T,v_2)$, which is obtained by merging these two error trees and adding a new root $u'$.

The next step is to construct group trees from the error trees. Let $w_i$ be the number of leaves in the tree $\mathrm{Err}(T,v_i)$. For each vertex $v_i$ we assign an interval $I_i = [\sum_{j<i} w_j, \sum_{j \le i} w_j)$. For an interval $I = [a,b)$, we will denote $\mathrm{left}(I) = a$ and $\mathrm{right}(I) = b$. The merge of $\mathrm{Err}(T,v_i),\ldots,\mathrm{Err}(T,v_j)$ will be denoted $\mathrm{Group}_1(T,v_i,v_j)$ and will be called a type 1 group tree. We do not create $\mathrm{Group}_1(T,v_i,v_j)$ for all $i$ and $j$ (as this would take too much space). Instead, the type 1 group trees are constructed by the following procedure (an example is given in Figure 2).

1: For every $C \in \mathcal{C}$ which is not a leaf in $T_{\mathcal{C}}$ do
2:   Let $v_1,\ldots,v_d$ be the vertices of $C$ with intervals $I_1,\ldots,I_d$.
3:   $L_1 \leftarrow \{(1,d)\}$.
4:   $t \leftarrow 1$.
5:   While $L_t \ne \emptyset$ do
6:     $L_{t+1} \leftarrow \emptyset$.
7:     For every $(i,i') \in L_t$ do
8:       $a \leftarrow \mathrm{left}(I_i)$, $b \leftarrow \mathrm{right}(I_{i'})$.
9:       Let $j$ be the index such that $\frac{a+b}{2} \in I_j$.
10:      If $j \ge i+1$ then build the group tree $\mathrm{Group}_1(T,v_i,v_{j-1})$.
11:      Build the group tree $\mathrm{Group}_1(T,v_j,v_j)$.
12:      If $j \le i'-1$ then build the group tree $\mathrm{Group}_1(T,v_{j+1},v_{i'})$.
13:      If $j > i+1$ then add $(i,j-1)$ to $L_{t+1}$.
14:      If $j < i'-1$ then add $(j+1,i')$ to $L_{t+1}$.
15:    $t \leftarrow t+1$.

Figure 2: An example of type 1 group tree construction. The top line shows intervals $I_1,\ldots,I_7$ and the point $\frac{a+b}{2} \in I_3$. Thus, the first iteration creates the group trees $\mathrm{Group}_1(T,v_1,v_2)$, $\mathrm{Group}_1(T,v_3,v_3)$, and $\mathrm{Group}_1(T,v_4,v_7)$. In the next iteration, the following trees are created: $\mathrm{Group}_1(T,v_1,v_1)$, $\mathrm{Group}_1(T,v_2,v_2)$, $\mathrm{Group}_1(T,v_4,v_4)$, $\mathrm{Group}_1(T,v_5,v_5)$, and $\mathrm{Group}_1(T,v_6,v_7)$. In the final iteration, the group trees $\mathrm{Group}_1(T,v_6,v_6)$ and $\mathrm{Group}_1(T,v_7,v_7)$ are created.

For every vertex $v$ in $T$ we create group trees from the error trees $\mathrm{Err}(T,v,c)$ in a similar way. These trees will be called type 2 group trees. On every group tree (of type 1 or 2) we build a rooted $(k-1)$-mismatches index. Also, we build an unrooted $(k-1)$-mismatches index on $T$.

We now describe how to answer a rooted query $p$. This is done by performing $(k-1)$-mismatches queries on some group trees or on $T$. Let $l$ be the location in $T$ such that $\mathrm{str}(l)$ is a prefix of $p$, and $|\mathrm{str}(l)|$ is maximal. The path that corresponds to $p$ is the path from the root of $T$ to $l$. Let $C_1,\ldots,C_r$ be the paths of $\mathcal{C}$ through which the path that corresponds to $p$ passes, in order from top to bottom. For $t = 1,\ldots,r$, let $l_t$ be the last location on $C_t$ through which the path that corresponds to $p$ passes. Note that for $t < r$, $l_t$ must be a vertex. For every path $C_t$, let $v_1,\ldots,v_d$ be the vertices of the path, and let $j$ be the minimum index such that $|\mathrm{str}(v_j)| \ge |\mathrm{str}(l_t)|$. The following queries are performed:

1. If $l_t$ is not a leaf, do an unrooted $(k-1)$-mismatches query on $T$ with query string $p[|\mathrm{str}(l_t)|+2\,..\,m]$ and start position $\mathrm{nextloc}(l_t)$.
2. Identify the type 1 group trees whose merge includes precisely the error trees $\mathrm{Err}(T,v_1),\ldots,\mathrm{Err}(T,v_{j-1})$. On each group tree, do a $(k-1)$-mismatches query with query string $p[|\mathrm{str}(v_1)|+1\,..\,m]$.
3. If $l_t = v_j$ and $l_t$ is not a leaf, identify the type 2 group trees whose merge includes precisely the error trees $\mathrm{Err}(T,v_j,c)$ for all $c \ne p[|\mathrm{str}(v_j)|+1]$. On each group tree, do a $(k-1)$-mismatches query with query string $p[|\mathrm{str}(v_j)|+2\,..\,m]$.
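The type 1 group tree construction procedure of this section can be sketched for a single heavy path as follows. The sketch only computes which index ranges $(i,j)$ give rise to a tree $\mathrm{Group}_1(T,v_i,v_j)$; it does not merge any tries. The function name `type1_groups` is an assumption, the leaf counts $w_i$ are assumed to be positive, and the sample weights are illustrative values chosen so that the iterations follow the pattern described in the caption of Figure 2 (the weights used in the original figure are not known).

```python
from bisect import bisect_right
from itertools import accumulate


def type1_groups(w):
    """Index ranges (i, j), 1-based and inclusive, of the type 1 group trees
    built for one heavy path whose error trees have w[0], w[1], ... leaves."""
    prefix = [0] + list(accumulate(w))      # I_i = [prefix[i-1], prefix[i])
    groups = []
    level = [(1, len(w))]                   # the list L_t of the procedure
    while level:
        next_level = []                     # the list L_{t+1}
        for i, i2 in level:
            a, b = prefix[i - 1], prefix[i2]
            # j is the index whose interval I_j contains the midpoint (a+b)/2
            j = bisect_right(prefix, (a + b) / 2)
            if j >= i + 1:
                groups.append((i, j - 1))
            groups.append((j, j))
            if j <= i2 - 1:
                groups.append((j + 1, i2))
            if j > i + 1:
                next_level.append((i, j - 1))
            if j < i2 - 1:
                next_level.append((j + 1, i2))
        level = next_level
    return groups


if __name__ == "__main__":
    print(type1_groups([2, 2, 4, 1, 3, 1, 1]))
```

Each range that is passed to the next iteration lies entirely on one side of the midpoint, so the total length of its intervals is at most half that of its parent range. This halving is what bounds the number of iterations, and hence the number of group trees containing a given leaf, by $O(\log n)$.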
Handling an unrooted query is done similarly: in this case the path that corresponds to $p$ starts at the query location $l$ instead of starting at the root. Handling the paths $C_2,\ldots,C_r$ is the same as before. For the path $C_1$, the type 1 group trees that are queried are the trees whose merge includes precisely the error trees $\mathrm{Err}(T,v_i),\ldots,\mathrm{Err}(T,v_{j-1})$, where $i$ is the minimum index such that $|\mathrm{str}(v_i)| \ge |\mathrm{str}(l)|$ and $j$ is defined as before.

4 New index

Our construction is similar to the construction of Cole et al. We build more group trees in order to reduce the number of group trees that are searched when answering a query. In particular, while in the construction of Cole et al. a group tree consists of error trees that come from one heavy path, in our construction some group trees (called type 3 group trees) consist of error trees from several heavy paths.

Let $\alpha$ be some integer with $2 \le \alpha \le n/2$. The type 1 group trees are built using procedure Build described below.

1: For every $C \in \mathcal{C}$ which is not a leaf in $T_{\mathcal{C}}$ do
2:   Let $v_1,\ldots,v_d$ be the vertices of $C$ with intervals $I_1,\ldots,I_d$.
3:   $L_1 \leftarrow \{(1,d)\}$.
4:   $t \leftarrow 1$.
5:   While $L_t \ne \emptyset$ do
6:     $L_{t+1} \leftarrow \emptyset$.
7:     For every $(i,i') \in L_t$ do
8:       $a \leftarrow \mathrm{left}(I_i)$, $b \leftarrow \mathrm{right}(I_{i'})$.
9:       $i_0 \leftarrow i-1$.
10:      For $j = 1,\ldots,\alpha-1$ do
11:        Let $i_j$ be the index such that $a + \frac{j}{\alpha}(b-a) \in I_{i_j}$.
12:        If $i_j > i_{j-1}$ then
13:          If $i_j \ge i+1$ then build the group tree $\mathrm{Group}_1(T,v_i,v_{i_j-1})$.
14:          Build the group tree $\mathrm{Group}_1(T,v_{i_j},v_{i_j})$.
15:          If $i_j \le i'-1$ then build the group tree $\mathrm{Group}_1(T,v_{i_j+1},v_{i'})$.
16:          If $i_j > i_{j-1}+2$ then add $(i_{j-1}+1,i_j-1)$ to $L_{t+1}$.
17:      If $i_{\alpha-1} < i'-1$ then add $(i_{\alpha-1}+1,i')$ to $L_{t+1}$.
18:    $t \leftarrow t+1$.

The type 2 group trees are built similarly. We also define type 3 group trees as follows. The weight of a path $C \in \mathcal{C}$ is the weight of the topmost vertex in $C$. A path $C' \in \mathcal{C}$ is called bad if $\mathrm{weight}(C') > \frac{1}{\alpha}\mathrm{weight}(C)$, where $C$ is the parent of $C'$ in $T_{\mathcal{C}}$. We scan the vertices of the tree $T_{\mathcal{C}}$ in preorder. When we reach a vertex $C$ that has at least one bad child, we build a set $B(C)$ containing the path $C$ and all paths $C' \in \mathcal{C}$ such that $C'$ is a descendant of $C$ in $T_{\mathcal{C}}$ and $\mathrm{weight}(C') > \frac{1}{\alpha}\mathrm{weight}(C)$. Note that every $C' \in B(C) \setminus \{C\}$ is a bad path.

For every $C',C'' \in B(C)$ such that $C''$ is a descendant of $C'$ we create a type 3 group tree, denoted $\mathrm{Group}_3(T,C',C'')$, in the following way. Let $C' = C_1,C_2,\ldots,C_{r-1},C_r = C''$ be the path from $C'$ to $C''$ in $T_{\mathcal{C}}$. Let $u_i$ be the first vertex in the path $C_i$, and for $i < r$ let $v_i$ be the parent of $u_{i+1}$ in $T$ (note that $v_i \in C_i$). Let $c_i$ be the first character of the label of the edge $(v_i,u_{i+1})$.
Let $s_i$ be the concatenation of the labels of the edges on the path from $u_1$ to $u_i$, and let $s'_i$ be the concatenation of the labels of the edges on the path from $u_1$ to $v_i$, and the character $c_i$. The group tree $\mathrm{Group}_3(T,C',C'')$ is the merge of the following trees.

1. For every $i < r$ and every $v \in C_i$ which is an ancestor of $v_i$, the tree obtained by taking $\mathrm{Err}(T,v)$ and prepending the string $s_i$ to the label of the edge between the root of $\mathrm{Err}(T,v)$ and its only child.
2. For every $i < r$ and every $c \in \mathrm{nextchars}(v_i) \setminus \{c_i\}$ (note that this includes $c = \mathrm{next}(v_i)$), the tree obtained by taking $\mathrm{Sub}(T,v_i,c)$ and, if the root of this tree has only one child, prepending the string $s'_i$ to the edge between the root and its child. Otherwise, a new root is added and connected to the old root by an edge, where the label of the edge is $s'_i$.

An example is given in Figure 3.

Figure 3: Example of type 3 group trees. The paths $C' = C_1$, $C_2$, and $C'' = C_3$ are shown in Figure (a). Two of the trees that are merged when creating $\mathrm{Group}_3(T,C',C'')$ are shown in (c) and (d). The tree in (c) is obtained from $\mathrm{Err}(T,v)$ (shown in (b)) by prepending the string $s_2$ to the label of the edge between the root and its child. The tree in (d) is obtained from a tree $\mathrm{Sub}(T,v_2,c)$ by adding a new root, where the label of the new edge is $s'_2$.

Answering a rooted query $p$ is performed as follows. Let $C_1,\ldots,C_r$ be the paths of $\mathcal{C}$ through which the path that corresponds to $p$ in $T$ passes. Start with $t = 1$. At each iteration, if $t = r$ or $C_{t+1}$ is not a bad path, perform queries for $C_t$ as described in the previous section, and increase $t$ by 1. Otherwise, do a rooted $(k-1)$-mismatches query on $\mathrm{Group}_3(T,C_t,C_{t'})$ and set $t$ to $t'$, where $t' > t$ is the maximum index such that $C_{t'} \in B(C_t)$. In more detail, the algorithm is as follows (we omit the queries on type 2 group trees, which are handled similarly to the queries on type 1 group trees).

1: Let $C_1,\ldots,C_r$ be the paths of $\mathcal{C}$ through which the path that corresponds to $p$ in $T$ passes.
2: $t \leftarrow 1$.
3: While $t \le r$ do
4:   Let $v_1,\ldots,v_d$ be the vertices of $C_t$, with intervals $I_1,\ldots,I_d$.
5:   If $t < r$ and $C_{t+1}$ is a bad path
6:     Let $t' > t$ be the maximum index such that $C_{t'} \in B(C_t)$.
7:     Do a rooted $(k-1)$-mismatches query on $\mathrm{Group}_3(T,C_t,C_{t'})$ with query string $p[|\mathrm{str}(v_1)|+1\,..\,m]$.
8:     $t \leftarrow t'$.
9:   Else
10:    Let $l_t$ be the last location on $C_t$ through which the path that corresponds to $p$ passes.
11:    If $l_t$ is not a leaf then do an unrooted $(k-1)$-mismatches query on $T$ with query string $p[|\mathrm{str}(l_t)|+2\,..\,m]$ and start position $\mathrm{nextloc}(l_t)$.
12:    Let $j$ be the minimum index such that $|\mathrm{str}(v_j)| \ge |\mathrm{str}(l_t)|$.
13:    $p' \leftarrow p[|\mathrm{str}(v_j)|+1\,..\,m]$.
14:    $i \leftarrow 1$, $i' \leftarrow d$.
15:    While $i < j$ do
16:      $a \leftarrow \mathrm{left}(I_i)$, $b \leftarrow \mathrm{right}(I_{i'})$.
17:      Let $\beta$ be the maximum integer such that $a + \frac{\beta}{\alpha}(b-a) < \mathrm{right}(I_j)$.
18:      If $\beta > 0$ then let $j_1$ be the index such that $a + \frac{\beta}{\alpha}(b-a) \in I_{j_1}$ else $j_1 \leftarrow i-1$.
19:      If $\beta < \alpha-1$ then let $j_2$ be the index such that $a + \frac{\beta+1}{\alpha}(b-a) \in I_{j_2}$ else $j_2 \leftarrow i'+1$.
20:      If $j_1 \ge i+1$ then do a rooted $(k-1)$-mismatches query on $\mathrm{Group}_1(T,v_i,v_{j_1-1})$ with query string $p'$.
21:      If $i \le j_1 < j$ then do a rooted $(k-1)$-mismatches query on $\mathrm{Group}_1(T,v_{j_1},v_{j_1})$ with query string $p'$.
22:      $i \leftarrow j_1+1$, $i' \leftarrow j_2-1$.
23:    $t \leftarrow t+1$.

For an unrooted query, the path $C_1$ is handled as in the handling of unrooted queries described in the previous section. Then, $C_2,\ldots,C_r$ are handled using the algorithm above.

Theorem 1. The time for answering a query is $O(m + (\log_\alpha n)^k \log\log n + \#\mathrm{matches})$.

Proof. Let $t_1,\ldots,t_{r'}$ be the different values of $t$ during the run of the algorithm. We first give a bound on $r'$. We claim that for every $i \le r'-2$, $\mathrm{weight}(C_{t_{i+2}}) \le \frac{1}{\alpha}\mathrm{weight}(C_{t_i})$: If $C_{t_i+1}$ is not a bad path then $t_{i+1} = t_i+1$ and $\mathrm{weight}(C_{t_{i+1}}) \le \frac{1}{\alpha}\mathrm{weight}(C_{t_i})$. Since $\mathrm{weight}(C_1) > \mathrm{weight}(C_2) > \cdots > \mathrm{weight}(C_r)$ and $t_{i+2} \ge t_{i+1}$, we obtain that $\mathrm{weight}(C_{t_{i+2}}) \le \frac{1}{\alpha}\mathrm{weight}(C_{t_i})$. If $C_{t_i+1}$ is a bad path then $C_{t_{i+1}+1}$ is not in $B(C_{t_i})$. Therefore, $\mathrm{weight}(C_{t_{i+2}}) \le \mathrm{weight}(C_{t_{i+1}+1}) \le \frac{1}{\alpha}\mathrm{weight}(C_{t_i})$. Since $\mathrm{weight}(C_1) = n$ and $\mathrm{weight}(C_t) \ge 1$ for every $t$, we conclude that $r' \le 2 + 2\log_\alpha n$. Therefore, the number of $(k-1)$-mismatches queries performed at lines 7 and 11 is at most $r' \le 2 + 2\log_\alpha n$.

We next bound the number of queries performed on type 1 group trees. During the execution of lines 15-22, we say that the current interval is the interval $I_i \cup I_{i+1} \cup \cdots \cup I_{i'}$. The sequence of current intervals during the execution of the algorithm (for all $t$) is decreasing in lengths. If for some $C_t$, lines 15-22 are executed $s$ times, then the length of the current interval decreases by a factor of at least $\alpha^{\max(1,s-1)}$. Thus, lines 15-22 are executed at most $2 + 2\log_\alpha n$ times, and the number of queries performed on type 1 group trees is at most $4 + 4\log_\alpha n$. Using similar analysis, the number of queries on type 2 group trees is at most $8 + 8\log_\alpha n$ (in each iteration of the search in the type 2 group trees, up to 4 queries can be made).

Combining the bounds above, we have that the total number of $(k-1)$-mismatches queries performed when answering a rooted query is at most $14 + 14\log_\alpha n$. When answering an unrooted query, at most $18 + 18\log_\alpha n$ $(k-1)$-mismatches queries are made (the additional $4 + 4\log_\alpha n$ queries are due to the special handling of the path $C_1$). Using induction, the total number of 0-mismatches queries performed for a rooted or unrooted query is at most $(18 + 18\log_\alpha n)^k = O((\log_\alpha n)^k)$.

Using the LCP data-structures of Cole et al. [4] we have that after a preprocessing stage that takes $O(m)$ time, the $i$-th 0-mismatches query takes $O(\log\log n + \#\mathrm{matches}_i)$ time, where $\#\mathrm{matches}_i$ is the number of matches returned by the query. Since each approximate match of $p$ in $t$ is reported exactly once, $\sum_i \#\mathrm{matches}_i = \#\mathrm{matches}$. Therefore, the total time complexity of a $k$-mismatches query is $O(m + (\log_\alpha n)^k \log\log n + \#\mathrm{matches})$.

Theorem 2. The space complexity of the index is $O(n(\alpha \log\alpha \log n)^k)$.

Proof.
First, we bound the total number of leaves in all type 1 group trees (the analysis is similar to the analysis of Cole et al.). Define $S_k(n) = (5\alpha \log\alpha \log n)^k$. We will show that the total number of leaves in all group trees that are built for a $k$-mismatches index over a compressed trie $T$ with $n$ leaves is at most $S_k(n) \cdot n$. The claim is proved using induction on $k$. The base $k = 0$ is trivial. Suppose we proved the claim for $k-1$, and consider some $k$-mismatches index over a compressed trie $T$ with $n$ leaves. Let $T_1,\ldots,T_d$ be all the type 1 group trees that are built for $T$ by procedure Build, and denote by $x_i$ the number of leaves in $T_i$. By induction, we have that the $(k-1)$-mismatches indices constructed on the trees $T_1,\ldots,T_d$ have at most $\sum_{i=1}^{d} S_{k-1}(x_i) \cdot x_i$ leaves. For a leaf $v$ of $T$, let $i(v,1),\ldots,i(v,d_v)$ denote the indices of group trees in which $v$ appears. Clearly, $\sum_{i=1}^{d} S_{k-1}(x_i) \cdot x_i = \sum_v \sum_{j=1}^{d_v} S_{k-1}(x_{i(v,j)})$. The function $S_{k-1}(x)$ is an increasing function of $x$. Therefore, $\sum_{i=1}^{d} S_{k-1}(x_i) \cdot x_i \le \sum_v \sum_{j=1}^{d_v} S_{k-1}(n) = S_{k-1}(n) \cdot \sum_v d_v$.

We now give a bound on $d_v$. Fix some leaf $v$ of $T$. We partition the group trees that contain $v$ into sets, where each set consists of all the trees that are generated during one execution of lines 10-16 of procedure Build. In each set the number of trees that contain $v$ is at most $\alpha-1$. Similarly to the proof of Theorem 1, the number of sets is at most $\log n + \log_\alpha n \le 2\log n$. Summing over the $n$ leaves of $T$, it follows that the number of leaves in the $(k-1)$-mismatches indices built on the type 1 group trees is at most $(\alpha-1) \cdot 2\log n \cdot S_{k-1}(n) \cdot n$. Similarly, the number of leaves in the indices built on the type 2 group trees is at most $(\alpha-1) \cdot 2\log n \cdot S_{k-1}(n) \cdot n$.

It remains to bound the number of leaves in the indices built on the type 3 group trees. We begin by bounding the size of $B(C)$ for some path $C$. Consider the subtree $T'$ of $T_{\mathcal{C}}$ that is induced by the vertices of $B(C)$. For every two leaves $C_1$ and $C_2$ in $T'$, the set of vertices of $T$ that are descendants of the topmost vertex in $C_1$ is disjoint with the set of vertices of $T$ that are descendants of the topmost vertex in $C_2$. It follows that the sum of weights of the leaves of $T'$ is less than or equal to $\mathrm{weight}(C)$. Since each leaf in $T'$ has weight greater than $\frac{1}{\alpha}\mathrm{weight}(C)$, we conclude that $T'$ has at most $\alpha$ leaves. By the definition of heavy path decomposition, we have that if $C_1$ is a child of $C_2$ in $T'$ then the weight of $C_1$ is less than half the weight of $C_2$. Therefore, for every leaf $C'$ in $T'$, the number of ancestors of $C'$ in $T'$ is at most $\log\alpha$. Thus, $|B(C)| \le \alpha\log\alpha$.

Using the same arguments as above, the number of leaves in the $(k-1)$-mismatches indices built on the type 3 group trees is at most $S_{k-1}(n) \cdot \sum_v d'_v$, where $d'_v$ is the number of type 3 group trees that contain the leaf $v$. A type 3 group tree that contains $v$ must be of the form $\mathrm{Group}_3(T,C',C'')$ where $C'$ is a path through which the path from the root of $T$ to $v$ passes. The number of such paths is at most $\log n$. Moreover, for fixed $C'$, there are at most $\alpha\log\alpha$ ways to choose $C''$. Therefore, $d'_v \le \alpha\log\alpha\log n$.

We conclude that the total number of leaves in the indices built on all group trees is at most $(2 \cdot 2(\alpha-1)\log n + \alpha\log\alpha\log n) \cdot S_{k-1}(n) \cdot n \le 5\alpha\log\alpha\log n \cdot S_{k-1}(n) \cdot n = S_k(n) \cdot n$.

References

[1] A. Amir, D. Keselman, G. M. Landau, N. Lewenstein, M. Lewenstein, and M. Rodeh. Dictionary matching with one error. J. of Algorithms, 37(2):309-325, 2000.
[2] A. L. Buchsbaum, M. T. Goodrich, and J. R. Westbrook. Range searching over tree cross products. In Proc. 8th European Symposium on Algorithms (ESA), pages 120-131, 2000.
[3] H. Chan, T. W. Lam, W. Sung, S. Tam, and S. Wong. A linear size index for approximate pattern matching. In Proc. 17th Symposium on Combinatorial Pattern Matching (CPM), LNCS 4009, pages 49-59, 2006.
[4] R. Cole, L. Gottlieb, and M. Lewenstein. Dictionary matching and indexing with errors and don't cares. In Proc. 36th ACM Symposium on Theory Of Computing (STOC), pages 91-100, 2004.
[5] C. Epifanio, A. Gabriele, F. Mignosi, A. Restivo, and M. Sciortino. Languages with mismatches. Theoretical Computer Science, 385(1-3):152-166, 2007.

[6] A. Gabriele, F. Mignosi, A. Restivo, and M. Sciortino. Indexing structures for approximate string matching. In Proc. 5th Italian Conference on Algorithms and Complexity (CIAC), pages 140-151, 2003.
[7] T. N. D. Huynh, W. K. Hon, T. W. Lam, and W. K. Sung. Approximate string matching using compressed suffix arrays. In Proc. 15th Symposium on Combinatorial Pattern Matching (CPM), pages 434-444, 2004.
[8] T. W. Lam, W. K. Sung, and S. S. Wong. Improved approximate string matching using compressed suffix data structures. In Proc. 16th International Symposium on Algorithms and Computation (ISAAC), pages 339-348, 2005.
[9] M. G. Maaß and J. Nowak. Text indexing with errors. In Proc. 16th Symposium on Combinatorial Pattern Matching (CPM), pages 21-32, 2005.
[10] G. Navarro and R. Baeza-Yates. A hybrid indexing method for approximate string matching. J. of Discrete Algorithms, 1(1):205-239, 2000.
[11] G. Navarro and E. Chávez. A metric index for approximate string matching. Theoretical Computer Science, 352(1-3):266-279, 2006.
[12] P. Weiner. Linear pattern matching algorithm. In Proc. 14th IEEE Symposium on Switching and Automata Theory, pages 1-11, 1973.