Looking for All Palindromes in a String

Shih Jang Pan and R. C. T. Lee
Department of Computer Science and Information Engineering, National Chi-Nan University, Puli, Nantou Hsien, Taiwan, ROC
sjpan@algdoc.csie.ncnu.edu.tw, rctlee@ncnu.edu.tw

Abstract

A palindrome is a string of the form αα', where α and α' are strings that are the reverse of each other. The problem of this paper is defined as follows: given a string S of length n, find all palindromes occurring in S. We present an algorithm based on suffix trees to find palindromes. Our algorithm first finds all maximum palindromes, which are not contained in any other palindromes. After finding the maximum palindromes, we apply eliminating operations to find the other palindromes, which are contained in maximum palindromes.

1 Introduction

Let us first define palindromes. Given a string \(\alpha = \alpha_1 \alpha_2 \cdots \alpha_n\), the string \(\alpha' = \alpha_n \alpha_{n-1} \cdots \alpha_1\) is called the reverse of α. The string αα' is called a palindrome. For example, bb and cddc are both palindromes.

Finding DNA sequences that contain palindromes is important in biology. Many researchers have discussed the palindrome problem []. Suffix trees have been used extensively by many algorithms []. Gusfield proposed an algorithm based on suffix trees to find all tandem repeats in a given DNA sequence. In this paper, we modify that idea and present a suffix tree approach to deal with the palindrome problem.

2 A suffix tree approach to detect all palindromes

2.1 Term definition

Given an input string \(K\), we first add a character $ which does not occur in \(K\) to the end of the input string. The new string is denoted as \(K\$\). Then we reverse the given string \(K\) and add a character ' which does not occur in the string to the end of the reversed string. The new string is denoted as \(K_r'\). We now construct a suffix tree for both \(K\$\) and \(K_r'\).

Assume that we have \(K\$\) = cbaab$ and \(K_r'\) = baabc'. We will construct a suffix tree for \(K\$\) and \(K_r'\). To describe the suffix tree clearly, we need several indices for its nodes. They are explained below.

(1) Rule 1: Each leaf node is associated with either \(K\$\) or \(K_r'\), indicated by the $ or ' on the edge ending at the leaf node, as indicated in Fig. 1(a).

(2) Rule 2: If a leaf node is associated with \(K\$\) (
\(K_r'\)), there is an index \(i\) (\(i_r\)) indicating the starting location of the associated string. The index \(i\) (\(i_r\)) is in regular type (italic type), as indicated in Fig. 1(a).

(3) Rule 3: For each node, except the root node, the collection of leaf indices below the node is called the forward collection or the reversed collection, for \(K\$\) or \(K_r'\) respectively. The indices collected in the forward and reversed collections are enclosed in ( ) parentheses. If the numbers inside the parentheses are in regular type (italic type), they are associated with \(K\$\) (\(K_r'\)), as indicated in Fig. 1(b).

We further have Rule 4 as follows:

(4) Rule 4: For each node \(v\), the length from the root to the node is denoted as \(D(v)\). \(D(v)\) is shown in [ ] brackets, as shown in Fig. 2.

The above results are merged and shown in Fig. 3.

For our palindrome-finding algorithm, we need the following terms.

(1) Eliminating operation: Given a palindrome string \(S = s_1 s_2 \cdots s_n\), the eliminating operation deletes the first and last characters of \(S\). The remaining string \(S' = s_2 \cdots s_{n-1}\) is also a palindrome. For example, bb is generated from cbbc by an eliminating operation.

(2) Maximum palindromes: A substring \(T\) of \(S\) is a maximum palindrome of \(S\) if \(T\) is not contained in any other palindrome of \(S\). For example, in the string dcbbce, the substring cbbc is a maximum palindrome while bb is not.
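The two definitions above can be illustrated with a short Python sketch. It is not part of the suffix tree method; it expands around each gap between adjacent characters, which is a simple stand-in technique. The function names are hypothetical, and the sketch assumes that a palindrome counts as "maximum" when it cannot be extended by one character on each side, that is, when it is not the result of an eliminating operation on a longer palindrome of S.

```python
from typing import List, Tuple


def is_palindrome(t: str) -> bool:
    """Palindrome in the sense used here: t = alpha + reverse(alpha), so even length."""
    return len(t) > 0 and len(t) % 2 == 0 and t == t[::-1]


def eliminate(t: str) -> str:
    """Eliminating operation: delete the first and last characters."""
    return t[1:-1]


def maximum_palindromes(s: str) -> List[Tuple[int, str]]:
    """Return (1-based start, substring) for every even-length palindrome
    that cannot be extended by one character on both sides (assumed
    reading of 'maximum')."""
    found = []
    n = len(s)
    for gap in range(1, n):            # gap lies between s[gap-1] and s[gap]
        lo, hi = gap - 1, gap
        if s[lo] != s[hi]:
            continue                   # no palindrome centred on this gap
        while lo - 1 >= 0 and hi + 1 < n and s[lo - 1] == s[hi + 1]:
            lo -= 1                    # grow outwards while the ends match
            hi += 1
        found.append((lo + 1, s[lo:hi + 1]))
    return found
```

For the string dcbbce from the example above, `maximum_palindromes` reports only cbbc (starting at position 2), and `eliminate("cbbc")` yields the contained palindrome bb.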
[Figure 1: suffix tree for \(K\$\) = cbaab$ and \(K_r'\) = baabc', panels (a) and (b)]

[Figure 2]

2.2 The Method

In this section, we first present the algorithm based on suffix trees to find all maximum palindromes in a given string. We construct the suffix tree of \(K\$\) and \(K_r'\). At each node \(v\), we check whether there exist an index \(i\) in the forward collection and an index \(i_r\) in the reversed collection such that \(i + i_r = n - D(v) + 2\). If the substring associated with this node is a maximum palindrome, we will find \(i\) and \(i_r\) such that \(i + i_r = n - D(v) + 2\). On the other hand, if \(i + i_r = n - D(v) + 2\) for some \(i\) and \(i_r\), the substring corresponding to this node must be a palindrome, but not necessarily a maximum palindrome.

We now use the input string cbaab to explain the above discussion. Consider Fig. 4, whose suffix tree is constructed for cbaab$ and baabc'. We are interested in the node indicated by the arrow sign. For this pointed node, \(i = 2\), \(i_r = 1\), \(n = 5\), and \(D(v) = 4\). Thus \(i + i_r = 3 = n - D(v) + 2\). The substring corresponding to this node is baab, and it is a maximum palindrome. By using an eliminating operation, we get another palindrome, aa. There are no other maximum palindromes at the other nodes of the suffix tree.

We now show another example, cabbaabb. The suffix tree is shown in Fig. 5. For the left pointed node, \(n = 8\) and \(D(v) = 4\). Besides, there exist \(i = 2\) and \(i_r = 4\). Thus the formula
\(i + i_r = n - D(v) + 2\) holds for such \(i\) and \(i_r\). The substring corresponding to this node is abba, which is a maximum palindrome. By using an eliminating operation, we get another palindrome, bb. We can easily see that two more maximum palindromes are found at the other two pointed nodes. They are bbaabb and bb. We would like to point out that we may find another bb which is a palindrome, but not a maximum palindrome. There are no other maximum palindromes.

[Figure 3]

[Figure 4]

In the following, we present the algorithm for finding all maximum palindromes.

Algorithm 1
Input: a string \(S\) with length \(n\)
Output: all palindromes which occur in \(S\)
1. Add a character $ which does not occur in \(S\) to the end of \(S\). The new string is denoted as \(S\$\).
2. Add a character ' which does not occur in the string to the end of the reversed input string \(S\). The new string is denoted as \(S_r'\).
3. Construct the suffix tree for \(S\$\) and \(S_r'\).
4. Collect the forward and reversed collections for \(S\$\) and \(S_r'\), and find \(D(v)\) for each node in the suffix tree.
5. For each node \(v\) of the suffix tree, check whether there exist \(i\) and \(i_r\) such that \(i + i_r = n - D(v) + 2\). If yes, return the substring corresponding to this node \(v\) as a possible maximum palindrome of \(S\).
6. Apply eliminating operations to all palindromes obtained in the above steps to produce more palindromes.

[Figure 5: suffix tree for cabbaabb$ and bbaabbac']

[Figure 6]

3 The correctness of our proposed algorithm

We finally discuss why the algorithm works. Let us assume that there is a maximum palindrome starting at \(i\) with length \(k\) in the given string \(S\). Then there must be a maximum palindrome starting at \(i_r\) in the reversed string \(S_r\). As shown in Fig. 6, the palindrome ends at position \(i + k - 1\) of \(S\), and the \(i_r - 1\) characters that precede it in \(S_r\) are exactly the characters that follow it in \(S\), so we have

\[ (i + k - 1) + (i_r - 1) = n. \]

But, on the suffix tree constructed out of \(S\$\) and \(S_r'\), we have

\[ k = D(v). \]

Thus, we have

\[ i + i_r = n - D(v) + 2. \]

4 Conclusions

In this paper, we presented an algorithm based on suffix trees to find all palindromes in a given string. The main goal is actually to define a new kind of palindrome, namely palindromes with gaps between α and α'. We already have some preliminary results indicating that our present algorithm, with modification, can be used to find this new kind of palindrome, which cannot be found by using the Manacher algorithm. This is our future research.
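The index condition \(i + i_r = n - D(v) + 2\) used in step 5 of the algorithm can be checked on small inputs without building a suffix tree. The Python sketch below is a brute-force stand-in, not the suffix tree method itself: it enumerates every even-length substring, takes its length \(k\) in place of \(D(v)\), and tests whether the same substring occurs in the reversed string at the position \(i_r\) that the condition prescribes. The function names and the example strings cbaab and cabbaabb are assumptions chosen for illustration, and "maximum" is again read as "cannot be extended by one character on each side".

```python
from typing import List, Tuple

Hit = Tuple[int, int, int, str]  # (i, i_r, k, substring), 1-based positions


def palindromes_by_index_condition(s: str) -> List[Hit]:
    """Brute-force check of i + i_r = n - k + 2 for even-length substrings."""
    n = len(s)
    rev = s[::-1]
    hits = []
    for k in range(2, n + 1, 2):          # k plays the role of D(v)
        for i in range(1, n - k + 2):     # 1-based start in s
            i_r = n - k + 2 - i           # start in rev required by the condition
            if s[i - 1:i - 1 + k] == rev[i_r - 1:i_r - 1 + k]:
                hits.append((i, i_r, k, s[i - 1:i - 1 + k]))
    return hits


def maximum_only(s: str) -> List[Hit]:
    """Keep the hits that cannot be extended by one character on both sides."""
    n = len(s)
    keep = []
    for i, i_r, k, t in palindromes_by_index_condition(s):
        left, right = i - 2, i - 1 + k    # 0-based neighbours of the hit
        extendable = left >= 0 and right < n and s[left] == s[right]
        if not extendable:
            keep.append((i, i_r, k, t))
    return keep


if __name__ == "__main__":
    print(maximum_only("cbaab"))
    print(sorted(maximum_only("cabbaabb")))
```

For cbaab the only maximum hit is baab with \(i = 2\), \(i_r = 1\) and \(k = 4\), so \(i + i_r = 3 = 5 - 4 + 2\). The quadratic enumeration is used only for clarity; the suffix tree is what lets the algorithm above avoid comparing every substring explicitly.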
References

D. Gusfield. Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1997.
G. Manacher. A New Linear-Time "On-Line" Algorithm for Finding the Smallest Initial Palindrome of a String. Journal of the Association for Computing Machinery, 1975.
A. Porto, V. Barbosa. Finding Approximate Palindromes in Strings. Pattern Recognition, 2002.
R. Gupta, A. Mittal, V. Narang, S. Wing-Kin. Detection of Palindromes in DNA Sequences Using Periodicity Transform. IEEE International Workshop on Biomedical Circuits & Systems, 2004.
J. Stoye, D. Gusfield. Simple and Flexible Detection of Contiguous Repeats Using a Suffix Tree. Theoretical Computer Science, 2002.