On Pattern Matching With Swaps

On Pattern Matching With Swaps

Fouad B. Chedid
Dhofar University, Salalah, Oman
Notre Dame University - Louaize, Lebanon
P.O. Box 2509, Postal Code 211, Salalah, Oman
fchedid@du.edu.om, fchedid@ndu.edu.lb

Abstract: Pattern Matching with Swaps (PMS for short) is a variation of the classical pattern matching problem where a match is allowed to include disjoint local swaps. In 2009, Cantone and Faro devised a new dynamic programming algorithm for PMS, named Cross-Sampling, that runs in O(nm) time and uses O(m) space. More important, Cross-Sampling admits a linear-time implementation based on bit-parallelism when the pattern's size is comparable to the word size of the machine. In this paper, we present improved dynamic programming formulations of the approach of Cantone and Faro for PMS which result in simpler algorithms that are much easier to comprehend and implement.

Keywords: Pattern Matching with Swaps, Approximate Pattern Matching with Swaps, Bit-Parallelism, Dynamic Programming, Efficient Algorithms.

I. INTRODUCTION

The classical Pattern Matching problem (PM for short) is a well-studied problem in computer science. This problem is defined as follows. Given a fixed alphabet Σ, a pattern P ∈ Σ* of length m, and a text T ∈ Σ* of length n ≥ m, PM asks for a way to find all occurrences of P in T. The Pattern Matching with Swaps problem (PMS for short) is a variation of PM in which a match is allowed to include disjoint local swaps. More precisely, a pattern P is said to have a swapped match with a text T at location j if adjacent characters in P can be swapped, if necessary, so as to make P identical to the substring of T ending at location j. We have included below an example of a pattern P = bbababab having a swapped match with a string T = ababbbaabba at location j = 8. Observe that two swaps are needed for this swap-match. Also, observe that both swaps are disjoint; that is, each character can be involved in at most one swap, and that identical adjacent characters are not allowed to be swapped.

  P =    b b a b a b a b
  T =  a b a b b b a a b b a
  j =  0 1 2 3 4 5 6 7 8 9 10

PMS was introduced in 1995 [8] as an open problem in non-standard stringology. A variant of PMS, named Approximate Pattern Matching with Swaps (APMS for short), asks to find, for each location of the text where there is a swapped match of the pattern, the number of swaps needed to obtain a match at that location. We now know that PMS and APMS have important applications in many fields such as computational biology, text and musical retrieval, data mining, and network security [5]. An algorithm for PMS that runs in o(nm) time first appeared in [1]. Algorithms for PMS and APMS that run in time O(n log m log σ), where σ is the size of the alphabet (= |Σ|), appear in [2] and [3]. We mention that the solutions in [1]–[3] are based on the Fast Fourier Transform (FFT) method. A non-FFT-based algorithm for PMS first appeared in [7], where an algorithm based on bit-parallelism was devised with running time O((n + m) log m) if the pattern size is comparable to the word size of the machine. In 2009, Cantone and Faro [5] devised a new dynamic programming algorithm for PMS, named Cross-Sampling, that runs in time O(nm) and uses O(m) space. More important, Cross-Sampling admits a linear-time implementation based on bit-parallelism if the size of the pattern is comparable to the word size of the machine. Moreover, Cross-Sampling can be easily adapted to solve APMS in time O(nm) (O(n) for short patterns, based on bit-parallelism).
Thus, for the first time, we have an algorithm that solves PMS and APMS for short patterns in linear time. In 2009, Campanelli et al. [4] described a variation of the Cross-Sampling algorithm that inherits much of the structure of Cross-Sampling but is based on a right-to-left scan of the text. The new algorithm, named Backward-Cross-Sampling, runs in time O(nm²); however, extensive computer runs show that Backward-Cross-Sampling outperforms Cross-Sampling in practice [4].

In this paper, we present improved dynamic programming formulations of the approaches of Cantone and Faro and of Campanelli et al. for PMS. Our work gives new algorithms for PMS and APMS that are much easier to comprehend and implement. In the sequel, a string P will be represented as a finite array P[0..m−1], which is basically the concatenation of the characters P[i], for 0 ≤ i ≤ m−1. Note that P[i] denotes the (i+1)th character of the string P. Let P_i denote the prefix of P of length i+1 (0 ≤ i ≤ m−1).

The rest of the paper is organized as follows. Section 2 gives basic definitions. Section 3 presents our simpler solutions for PMS and APMS. Section 4 presents more efficient versions of our solutions based on bit-parallelism, and Section 5 concludes the paper with some observations for future work.

II. PROBLEM DEFINITION

Let Σ be a fixed alphabet and let P and T be two strings over Σ of lengths m and n ≥ m, respectively.

Definition 1: A swap permutation of P is a permutation π : {0, …, m−1} → {0, …, m−1} such that:
1) if π(i) = j then π(j) = i (characters are swapped);
2) for all i, π(i) ∈ {i−1, i, i+1} (only adjacent characters can be swapped);
3) if π(i) ≠ i then P[π(i)] ≠ P[i] (identical characters are not allowed to be swapped).

The swapped version of P under the permutation π is denoted as π(P); that is, π(P) is the concatenation of the characters P[π(i)], for 0 ≤ i ≤ m−1. For a given text string T ∈ Σ*, we say that P has a swapped match with T at location j if there exists a swap permutation π of P such that π(P) has an exact match with T ending at location j. For example, the swap permutation that corresponds to the swap match shown as an example in the previous section is π(bbababab) = babbbaab. This swap match has two swaps: π(1) = 2, π(2) = 1 and π(4) = 5, π(5) = 4. In this case, we write P ∝ T_8. Moreover, since 2 swaps are needed for this swap-match, we also write P ∝_2 T_8.

The Pattern Matching with Swaps Problem (PMS for short) is the following:
INPUT: A text string T[0..n−1] and a pattern string P[0..m−1] over a fixed alphabet Σ.
OUTPUT: All locations j, m−1 ≤ j ≤ n−1, such that P ∝ T_j.

The Approximate Pattern Matching with Swaps Problem (APMS for short) is the following:
INPUT: A text string T[0..n−1] and a pattern string P[0..m−1] over a fixed alphabet Σ.
OUTPUT: The number of swaps k needed for each m−1 ≤ j ≤ n−1 where P ∝_k T_j.

III. SIMPLER ALGORITHMS FOR PMS AND APMS

We present improved dynamic programming formulations of the approaches of Cantone and Faro and of Campanelli et al. for PMS and APMS which result in simpler algorithms for these problems that are much easier to comprehend and implement.

A. A Simpler Algorithm for PMS

The main idea behind the Cross-Sampling algorithm is a new approach for finding all prefixes P_i of P that have swapped matches with T ending at some location j, for 0 ≤ j ≤ n−1. This will be denoted by P_i ∝ T_j. The paper [5] defines a collection of sets S_j, for 0 ≤ j ≤ n−1, as follows:

  S_j = {0 ≤ i ≤ m−1 : P_i ∝ T_j}.

Thus, the pattern P has a swapped match with the text T ending at location j if and only if m−1 ∈ S_j. To compute S_j, the authors of [5] define another collection of sets S'_j, for 0 ≤ j ≤ n−1, as follows:

  S'_j = {0 ≤ i ≤ m−1 : P_{i−1} ∝ T_{j−1} and P[i] = T[j+1]}.

Then, it is shown how to compute S_j in terms of S_{j−1} and S'_{j−1}, where S'_{j−1} is computed in terms of S_{j−2}. This formulation of the solution gives a dynamic programming based iterative solution for PMS that runs in O(mn) time and uses O(m) space.

The dynamic programming approach of Cross-Sampling is based on the following lemma:

Lemma 2: Let T and P be a text of length n and a pattern of length m, respectively. Then, for 0 ≤ i ≤ m−1 and 0 ≤ j ≤ n−1, we have that P_i ∝ T_j if and only if one of the following two facts holds:
- P[i] = T[j] and P_{i−1} ∝ T_{j−1};
- P[i] = T[j−1], P[i−1] = T[j], and P_{i−2} ∝ T_{j−2}.
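For concreteness, the swap-match relation of Definition 1 can also be checked by a direct left-to-right scan: whenever P[i] differs from the aligned text character, the only possible repair is a swap of P[i] and P[i+1], so a greedy scan suffices. The following Python sketch is purely illustrative (its names are ours, not taken from the paper) and serves only as a reference against which the dynamic programming solutions below can be tested.

def swap_matches(P, W):
    # True if P can be turned into W by disjoint swaps of adjacent,
    # distinct characters (Definition 1); assumes len(P) == len(W).
    i = 0
    while i < len(P):
        if P[i] == W[i]:
            i += 1
        elif (i + 1 < len(P) and P[i] != P[i + 1]
              and P[i] == W[i + 1] and P[i + 1] == W[i]):
            i += 2                       # swap P[i] and P[i+1]
        else:
            return False
    return True

def prefix_swap_match(P, T, i, j):
    # True if the prefix P_i (of length i+1) has a swapped match with T ending at j.
    return j - i >= 0 and swap_matches(P[:i + 1], T[j - i:j + 1])

For the example of Section I, swap_matches("bbababab", "babbbaab") returns True, and prefix_swap_match implements the relation P_i ∝ T_j used below.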
We use the above lemma to propose a simpler version of Cross-Sampling. Let us define the Boolean matrix S^j_i, for 0 ≤ i ≤ m−1 and 0 ≤ j ≤ n−1, as follows:

  S^j_i = 1 if P_i ∝ T_j, and S^j_i = 0 otherwise.

Thus, the pattern P has a swapped match with T at location j if and only if S^j_{m−1} = 1. The following recursive definition of S^j_i is inspired by Lemma 2, for 2 ≤ i ≤ m−1 and i ≤ j ≤ n−1:

  S^j_i ← (S^{j−1}_{i−1} ∧ (P[i] = T[j])) ∨ (S^{j−2}_{i−2} ∧ (P[i] = T[j−1]) ∧ (P[i−1] = T[j])).   (1)

The base cases for i = 0 and i = 1 are given by

  S^j_0 ← (P[0] = T[j]), for 0 ≤ j ≤ n−1;
  S^j_1 ← (S^{j−1}_0 ∧ (P[1] = T[j])) ∨ ((P[1] = T[j−1]) ∧ (P[0] = T[j])), for 1 ≤ j ≤ n−1.

These recursive relations compute S^j_i in terms of S^{j−1}_{i−1} and S^{j−2}_{i−2}. The recursive relations in Equation 1 give a dynamic programming algorithm for computing the elements of the m × n matrix S iteratively in O(nm) time and O(m) space. Our resultant algorithm, named Prefix-Sampling, is shown in Fig. 1.

  Algorithm Prefix-Sampling(P, m, T, n)
    S[0..m−1, 0..n−1] ← 0   { initially, all entries are set to False }
    for j ← 0 to n−1 do
      S[0, j] ← (P[0] = T[j])
    for j ← 1 to n−1 do
      S[1, j] ← (S[0, j−1] ∧ (P[1] = T[j])) ∨ ((P[1] = T[j−1]) ∧ (P[0] = T[j]))
    for i ← 2 to m−1 do
      for j ← i to n−1 do
        S[i, j] ← (S[i−1, j−1] ∧ (P[i] = T[j])) ∨ (S[i−2, j−2] ∧ (P[i] = T[j−1]) ∧ (P[i−1] = T[j]))
    for j ← m−1 to n−1 do
      if S[m−1, j] then print j   { here, P ∝ T_j }

  Fig. 1. The Prefix-Sampling Algorithm for PMS

The code in Fig. 1 runs in O(nm) time and uses O(nm) space. However, it is a simple matter to modify the code so that it uses only O(m) space: compute the matrix column-wise, keeping track of only three columns at a time, S_1, S_2, and S_3, where S_3 is computed in terms of S_2 and S_1. We mention that the ideas behind Prefix-Sampling first appear in [6] as part of our parallel algorithm for PMS, which runs in O(m²) time using n/(m−1) processors on a linear array model of computation.
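For illustration, Fig. 1 translates directly into the following Python sketch (the naming is ours; like the figure, it keeps the full m × n table and therefore uses O(nm) space):

def prefix_sampling(P, T):
    # S[i][j] = True iff the prefix P_i has a swapped match with T ending at j.
    m, n = len(P), len(T)
    S = [[False] * n for _ in range(m)]
    for j in range(n):                           # base case i = 0
        S[0][j] = (P[0] == T[j])
    if m > 1:
        for j in range(1, n):                    # base case i = 1
            S[1][j] = (S[0][j - 1] and P[1] == T[j]) or \
                      (P[1] == T[j - 1] and P[0] == T[j])
    for i in range(2, m):                        # Equation (1)
        for j in range(i, n):
            S[i][j] = (S[i - 1][j - 1] and P[i] == T[j]) or \
                      (S[i - 2][j - 2] and P[i] == T[j - 1] and P[i - 1] == T[j])
    return [j for j in range(m - 1, n) if S[m - 1][j]]

On the instance traced in the next paragraph (P = babaaab, T = abbababaabbab) this returns [9], in agreement with Table I; replacing the full table by the last three columns gives the O(m)-space version described above.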

We have traced Prefix-Sampling on the following PMS instance (taken from [5] for ease of comparison): let P = babaaab of length m = 7 and T = abbababaabbab of length n = 13. The results (see Table I) show that P has a swapped match with T at location j = 9 (a swap-match corresponds to a non-zero entry in the last row of the table).

  TABLE I. A SAMPLE RUN OF PREFIX-SAMPLING

B. A Simpler Algorithm for APMS

Cantone and Faro [5] showed how to adapt their Cross-Sampling algorithm to solve the APMS problem. Their Approximate-Cross-Sampling algorithm works with two new collections of sets S_j and S'_j, for 0 ≤ j ≤ n−1, where

  S_j  = {(i, k) : 0 ≤ i ≤ m−1 and P_i ∝_k T_j},
  S'_j = {(i, k) : 0 ≤ i ≤ m−1 and (P_{i−1} ∝_k T_{j−1} or i = 0) and P[i] = T[j+1]}.

Clearly, P ∝_k T_j if and only if (m−1, k) ∈ S_j. The dynamic programming approach of Approximate-Cross-Sampling is based on the following lemma:

Lemma 3: Let T and P be a text of length n and a pattern of length m, respectively. Then, for 0 ≤ i ≤ m−1 and 0 ≤ j ≤ n−1, we have that P_i ∝_k T_j if and only if one of the following two facts holds:
- P[i] = T[j] and either (i = 0 and k = 0) or P_{i−1} ∝_k T_{j−1};
- P[i] = T[j−1], P[i−1] = T[j], and either (i = 1 and k = 1) or P_{i−2} ∝_{k−1} T_{j−2}.

We use the above lemma to propose a simpler version of Approximate-Cross-Sampling. We redefine our matrix S^j_i from the previous section so that its definition reads as follows. For 0 ≤ i ≤ m−1 and 0 ≤ j ≤ n−1, we have

  S^j_i = k + 1 if P_i ∝_k T_j, and S^j_i = 0 otherwise.

Thus, there will be k swaps involved in the swap-match of the pattern P with the text T at location j if and only if S^j_{m−1} = k + 1. The following recursive definition of S^j_i is inspired by Lemma 3, for 2 ≤ i ≤ m−1 and i ≤ j ≤ n−1:

  S^j_i ← S^{j−1}_{i−1},     if S^{j−1}_{i−1} ≠ 0 and P[i] = T[j];
  S^j_i ← S^{j−2}_{i−2} + 1, if S^{j−2}_{i−2} ≠ 0 and P[i] = T[j−1] and P[i−1] = T[j];
  S^j_i ← 0,                 otherwise.   (2)

The base case for i = 0 is defined as follows, for 0 ≤ j ≤ n−1:

  S^j_0 ← (P[0] = T[j]).

The base case for i = 1 is defined as follows, for 1 ≤ j ≤ n−1:

  S^j_1 ← 1, if S^{j−1}_0 ≠ 0 and P[1] = T[j];
  S^j_1 ← 2, if P[1] = T[j−1] and P[0] = T[j];
  S^j_1 ← 0, otherwise.

These recursive relations compute S^j_i in terms of S^{j−1}_{i−1} and S^{j−2}_{i−2}. The recursive relations in Equation 2 give a dynamic programming algorithm for computing the elements of the m × n matrix S iteratively in O(nm) time and O(m) space. For lack of space, we do not show the code of our resultant algorithm; however, we show a trace of this algorithm on the problem instance P = babaaab and T = abbababaabbab. The results (see Table II) show that there will be 2 (= S[6, 9] − 1) swaps needed for the swap-match of P with T at location j = 9.

  TABLE II. A SAMPLE RUN OF APPROXIMATE-PREFIX-SAMPLING
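For illustration, Equation 2 admits the following direct Python transcription (a sketch with our own naming, not the paper's original code; a zero entry means no swapped match, otherwise the entry stores k + 1):

def approximate_prefix_sampling(P, T):
    # S[i][j] = k + 1 iff P_i swap-matches T ending at j using k swaps, else 0.
    m, n = len(P), len(T)
    S = [[0] * n for _ in range(m)]
    for j in range(n):                                   # base case i = 0
        S[0][j] = 1 if P[0] == T[j] else 0
    if m > 1:
        for j in range(1, n):                            # base case i = 1
            if S[0][j - 1] and P[1] == T[j]:
                S[1][j] = 1
            elif P[1] == T[j - 1] and P[0] == T[j]:
                S[1][j] = 2
    for i in range(2, m):                                # Equation (2)
        for j in range(i, n):
            if S[i - 1][j - 1] and P[i] == T[j]:
                S[i][j] = S[i - 1][j - 1]
            elif S[i - 2][j - 2] and P[i] == T[j - 1] and P[i - 1] == T[j]:
                S[i][j] = S[i - 2][j - 2] + 1
    return {j: S[m - 1][j] - 1 for j in range(m - 1, n) if S[m - 1][j]}

On P = babaaab and T = abbababaabbab it returns {9: 2}, matching the two swaps reported in Table II.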
C. A Simpler Algorithm for PMS by Scanning the Text Backward

The basic idea of Backward-Cross-Sampling of Campanelli et al. [4] is to search for all occurrences of the pattern in the text by scanning the characters of the text from right to left. In particular, Backward-Cross-Sampling processes the text in fixed-size windows of size m, which are searched for the longest prefix of the pattern that has a swapped match with the text ending at the last position j of the current window. After processing a text window, the largest matched prefix P_i ∝ T_j is computed, and then j is incremented by m − i so as to left-align the current window of the text with P_i. Let P[i−h+1..i] denote the substring of P of length h ending at location i.

The paper [4] defines two collections of sets S^h_j and W^h_j, for 0 ≤ j ≤ n−1 and 0 ≤ h ≤ m, where

  S^h_j = {h−1 ≤ i ≤ m−1 : P[i−h+1..i] ∝ T_j},
  W^h_j = {h ≤ i ≤ m−2 : P[i−h+1..i] ∝ T_j and P[i−h+1] = T[j−h]}.

Observe that P_{h−1} ∝ T_j if and only if (h−1) ∈ S^h_j. By the same token, P ∝ T_j if and only if S^m_j = {m−1}. The Backward-Cross-Sampling algorithm for computing the sets S^h_j is inspired by the following lemma:

Lemma 4: Let T and P be a text of length n and a pattern of length m, respectively. Then, for 0 ≤ j ≤ n−1, 0 ≤ h ≤ m, and h−1 ≤ i ≤ m−1, we have that P[i−h+1..i] ∝ T_j if and only if one of the following two facts holds:
- P[i−h+2..i] ∝ T_j and P[i−h+1] = T[j−h+1];
- P[i−h+3..i] ∝ T_j, P[i−h+1] = T[j−h+2], and P[i−h+2] = T[j−h+1].

We use the above lemma to propose a simpler version of Backward-Cross-Sampling. We define the Boolean matrix S^h[i, j], for 0 ≤ i ≤ m−1, 0 ≤ j ≤ n−1, and 1 ≤ h ≤ m, as follows:

  S^h[i, j] = 1 if P[i−h+1..i] ∝ T_j, and S^h[i, j] = 0 otherwise.

Thus, P_{h−1} ∝ T_j if and only if S^h[h−1, j] = 1. By the same token, the pattern P has a swapped match with T at location j if and only if S^m[m−1, j] = 1. The following recursive definition of S^h[i, j] is inspired by Lemma 4, for 0 ≤ j ≤ n−1, 3 ≤ h ≤ m, and h−1 ≤ i ≤ m−1:

  S^h[i, j] ← (S^{h−1}[i, j] ∧ (P[i−h+1] = T[j−h+1])) ∨ (S^{h−2}[i, j] ∧ (P[i−h+2] = T[j−h+1]) ∧ (P[i−h+1] = T[j−h+2])).   (3)

The base cases for h = 1 and h = 2 are given by

  S^1[i, j] ← (P[i] = T[j]), for 0 ≤ i ≤ m−1 and 0 ≤ j ≤ n−1;
  S^2[i, j] ← (S^1[i, j] ∧ (P[i−1] = T[j−1])) ∨ ((P[i] = T[j−1]) ∧ (P[i−1] = T[j])), for 1 ≤ i ≤ m−1 and 1 ≤ j ≤ n−1.

These recursive relations compute the column S^h_j in terms of S^{h−1}_j and S^{h−2}_j. The recursive relations in Equation 3 give a dynamic programming algorithm that runs in O(nm²) time and O(m) space. Our resultant algorithm, named Backward-Prefix-Sampling, is shown in Fig. 2.

  Algorithm Backward-Prefix-Sampling(P, m, T, n)
  1.  l ← 0; j ← m−1
  2.  while j ≤ n−1 do
  3.    for i ← 0 to m−1 do
  4.      S_1[i] ← (P[i] = T[j])
  5.    if S_1[0] then l ← 1
  6.    for i ← 1 to m−1 do
  7.      S_2[i] ← (S_1[i] ∧ (P[i−1] = T[j−1]))
  8.              ∨ ((P[i] = T[j−1]) ∧ (P[i−1] = T[j]))
  9.    if S_2[1] then l ← 2
  10.   for h ← 3 to m do
  11.     for i ← h−1 to m−1 do
  12.       S_3[i] ← (S_2[i] ∧ (P[i−h+1] = T[j−h+1]))
  13.               ∨ (S_1[i] ∧ (P[i−h+2] = T[j−h+1])
  14.                         ∧ (P[i−h+1] = T[j−h+2]))
  15.     if S_3[h−1] then
  16.       l ← h   { here, P_{h−1} ∝ T_j }
  17.     if (S_2 = 0) ∧ (S_3 = 0) then goto 19
  18.     S_1 ← S_2; S_2 ← S_3
  19.   if l = m then
  20.     print j   { here, P ∝ T_j }
  21.     j ← j + 1
  22.   else j ← j + m − l
  23. end of while (j ≤ n−1)

  Fig. 2. The Backward-Prefix-Sampling Algorithm for PMS

D. A Simpler Algorithm for APMS by Scanning the Text Backward

Our solution from the previous section can be easily extended to solve the APMS problem. We redefine the matrix S^h[i, j] from the previous section so that its definition reads as follows. For 0 ≤ i ≤ m−1, 0 ≤ j ≤ n−1, and 1 ≤ h ≤ m, we have

  S^h[i, j] = k + 1 if P[i−h+1..i] ∝_k T_j, and S^h[i, j] = 0 otherwise.

Thus, P ∝_k T_j if and only if S^m[m−1, j] = k + 1. The following recursive definition of S^h[i, j] is inspired by Lemmas 3 and 4, for 0 ≤ j ≤ n−1, 3 ≤ h ≤ m, and h−1 ≤ i ≤ m−1:

  S^h[i, j] ← S^{h−1}[i, j],     if S^{h−1}[i, j] ≠ 0 and P[i−h+1] = T[j−h+1];
  S^h[i, j] ← S^{h−2}[i, j] + 1, if S^{h−2}[i, j] ≠ 0 and P[i−h+2] = T[j−h+1] and P[i−h+1] = T[j−h+2].   (4)

The base case for h = 1 is defined as follows, for 0 ≤ i ≤ m−1 and 0 ≤ j ≤ n−1:

  S^1[i, j] ← 1, if P[i] = T[j].

The base case for h = 2 is defined as follows, for 1 ≤ i ≤ m−1 and 1 ≤ j ≤ n−1:

  S^2[i, j] ← 1, if S^1[i, j] ≠ 0 and P[i−1] = T[j−1];
  S^2[i, j] ← 2, if P[i] = T[j−1] and P[i−1] = T[j].

The recursive relations in Equation 4 give a dynamic programming algorithm for computing the elements of the matrix S^h[i, j] iteratively in O(nm²) time and O(m) space. For lack of space, we do not show the code of our resultant algorithm; however, we show a trace of this algorithm on the problem instance P = babaaab and T = abbababaabbab. The results (see Table III) show that there will be 2 (= S^m_9[m−1] − 1 = S^7_9[6] − 1 = 3 − 1) swaps needed for the swap-match of P with T at location j = 9.

  TABLE III. A SAMPLE RUN OF APPROXIMATE-BACKWARD-PREFIX-SAMPLING
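For illustration, the window-shifting scheme of Fig. 2 combined with the counters of Equation 4 can be sketched in Python as follows (an illustrative transcription with our own naming, not the paper's original code; it assumes m ≥ 3 and resets the matched-prefix length l at every window):

def approximate_backward_prefix_sampling(P, T):
    # S1/S2/S3 play the roles of the columns S^(h-2), S^(h-1), S^h for the current
    # window ending at j; an entry k + 1 means a swap match with k swaps, 0 means none.
    m, n = len(P), len(T)
    results = {}                                   # location j -> number of swaps
    j = m - 1
    while j <= n - 1:
        S1 = [1 if P[i] == T[j] else 0 for i in range(m)]         # h = 1
        l = 1 if S1[0] else 0
        S2 = [0] * m                                               # h = 2
        for i in range(1, m):
            if S1[i] and P[i - 1] == T[j - 1]:
                S2[i] = S1[i]
            elif P[i] == T[j - 1] and P[i - 1] == T[j]:
                S2[i] = 2
        if S2[1]:
            l = 2
        for h in range(3, m + 1):                                  # Equation (4)
            S3 = [0] * m
            for i in range(h - 1, m):
                if S2[i] and P[i - h + 1] == T[j - h + 1]:
                    S3[i] = S2[i]
                elif S1[i] and P[i - h + 2] == T[j - h + 1] and P[i - h + 1] == T[j - h + 2]:
                    S3[i] = S1[i] + 1
            if S3[h - 1]:
                l = h
                if h == m:
                    results[j] = S3[m - 1] - 1     # swaps of the full match at j
            if not any(S2) and not any(S3):
                break
            S1, S2 = S2, S3
        j = j + 1 if l == m else j + m - l
    return results

On P = babaaab and T = abbababaabbab it returns {9: 2}, in agreement with Table III.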
IV. IMPROVED ALGORITHMS USING BIT-PARALLELISM

We now consider the case of short patterns. In particular, we are interested in the case where the entire pattern can be stored in one word of computer memory. We show that under this condition, all our algorithms from the previous section admit more efficient implementations using bit-parallelism. In particular, Prefix-Sampling and Backward-Prefix-Sampling will then have a linear running time.

Let T and P be a text of length n and a pattern of length m, respectively. Following [5], for each c ∈ Σ, we define the m-bit vector M_c[0..m−1] as follows: M_c[i] = 1 if P[i] = c, and 0 otherwise.

First, we consider Prefix-Sampling (see Fig. 1). The main statement in Prefix-Sampling is the one that computes S_j from S_{j−1} and S_{j−2}. Using bit-parallelism, that statement can be coded as follows:

  S_j ← ((S_{j−1} << 1) & M_{T[j]}) | ((S_{j−2} << 2) & (M_{T[j−1]} & (M_{T[j]} << 1))).

We demonstrate the embedding of bit-parallelism in Prefix-Sampling on the input instance P = babaaab and T = abbababaabbab. In particular, we show how the algorithm would compute the vector S_5 from S_4 = (0000111) and S_3 = (0001110) (see Table I in Section III). (The bit vectors are written from highest- to lowest-ordered bits.) We have M_{T[j]} = M_{T[5]} = M_a = (0111010) and M_{T[j−1]} = M_{T[4]} = M_b = (1000101).

The 0th bit of S_5 is determined as follows:

  S_5 = M_a & (0000001) = (0111010) & (0000001) = (0000000).

Thus, the 0th bit of S_5 is zero. The 1st bit of S_5 is determined as follows. For ease of presentation, we make use of two temporary variables T_1 and T_2. Let

  T_1 = (S_4 << 1) & M_a = (0001110) & (0111010) = (0001010),
  T_2 = T_1 | (M_b & (M_a << 1)) = (0001010) | ((1000101) & (1110100)) = (0001010) | (1000100) = (1001110).

Then,

  S_5 = S_5 | (T_2 & 0^{m−2}10) = (0000000) | ((1001110) & (0000010)) = (0000010).

Thus, the 1st bit of S_5 is 1. The remaining bits of S_5 are computed as follows. Let

  T_2 = T_2 & (T_1 | (S_3 << 2)) = (1001110) & ((0001010) | (0111000)) = (1001110) & (0111010) = (0001010).

Finally,

  S_5 = S_5 | (T_2 & 1^{m−2}00) = (0000010) | ((0001010) & (1111100)) = (0001010).

For lack of space, we chose not to include the code of our Bit-Parallelism-Prefix-Sampling, but to include the code of our Bit-Parallelism-Backward-Prefix-Sampling in Fig. 3, where S_1, S_2, and S_3 are now three m-bit vectors. We demonstrate some of the steps of Bit-Parallelism-Backward-Prefix-Sampling, which runs in O(n) time, on the problem instance P = babaaab and T = abbababaabbab. First, observe that m = |P| = 7, n = |T| = 13, M_a = (0111010), and M_b = (1000101). Line 1 of the code in Fig. 3 sets j to 6. Line 3 sets S_1 to M_{T[6]} = M_b = (1000101). Since S_1[0] = 1, the prefix of P ending at location i = 0 has a swapped match with T at location j = 6. This is true because P[0] = T[6]. Line 5 sets S_2 as follows:

  S_2 = (S_1 & (M_{T[5]} << 1)) | (M_{T[5]} & (M_{T[6]} << 1))
      = (S_1 & (M_a << 1)) | (M_a & (M_b << 1))
      = ((1000101) & (1110100)) | ((0111010) & (0001010))
      = (1000100) | (0001010) = (1001110).

Since S_2[1] = 1, the prefix of P ending at location i = 1 has a swapped match with T at location j = 6. This is true because P[0..1] = ba and T[5..6] = ab.

  Bit-Parallelism-Backward-Prefix-Sampling(m, n)
  1.  S_1 ← S_2 ← S_3 ← 0; l ← 0; j ← m−1
  2.  while j ≤ n−1 do
  3.    S_1 ← M_{T[j]}
  4.    if (S_1 & 0^{m−1}1) ≠ 0 then l ← 1
  5.    S_2 ← (S_1 & (M_{T[j−1]} << 1)) | (M_{T[j−1]} & (M_{T[j]} << 1))
  6.    if (S_2 & 0^{m−2}10) ≠ 0 then l ← 2
  7.    for h ← 3 to m do
  8.      S_3 ← (S_2 & (M_{T[j−h+1]} << (h−1)))
  9.            | (S_1 & (M_{T[j−h+1]} << (h−2))
  10.                  & (M_{T[j−h+2]} << (h−1)))
  11.     if (S_3 & 0^{m−h}10^{h−1}) ≠ 0 then
  12.       l ← h   { here, P_{h−1} ∝ T_j }
  13.     if (S_2 = 0) ∧ (S_3 = 0) then goto 15
  14.     S_1 ← S_2; S_2 ← S_3
  15.   if l = m then
  16.     print j   { here, P ∝ T_j }
  17.     j ← j + 1
  18.   else j ← j + m − l
  19. end-while (j ≤ n−1)

  Fig. 3. The Bit-Parallelism-Backward-Prefix-Sampling Algorithm for PMS
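For completeness, here is an illustrative Python sketch of the bit-parallel version of Prefix-Sampling (the naming is ours, not the paper's original code; Python's unbounded integers stand in for machine words, and the base-case bits 0 and 1 are handled with explicit masks, exactly as in the worked example above):

from collections import defaultdict

def bit_parallel_prefix_sampling(P, T):
    # Bit i of S holds the value S^j_i of Equation (1): 1 iff P_i swap-matches
    # T ending at the current location j.
    m, n = len(P), len(T)
    M = defaultdict(int)                         # M[c] has bit i set iff P[i] == c
    for i, c in enumerate(P):
        M[c] |= 1 << i
    matches = []
    S_prev2 = S_prev1 = 0                        # S_{j-2} and S_{j-1}
    for j in range(n):
        cur = M[T[j]]
        prev = M[T[j - 1]] if j > 0 else 0
        S = cur & 1                                              # bit 0: P[0] = T[j]
        if j > 0:
            ext = (S_prev1 << 1) & cur                           # extend by an exact character
            swp = prev & (cur << 1)                              # adjacent characters swapped
            S |= (ext | swp) & 0b10                              # bit 1 (base case i = 1)
            S |= (ext | ((S_prev2 << 2) & swp)) & ~0b11          # bits i >= 2, Equation (1)
        if (S >> (m - 1)) & 1:
            matches.append(j)                    # P swap-matches T at location j
        S_prev2, S_prev1 = S_prev1, S
    return matches

On P = babaaab and T = abbababaabbab it reports the single location 9; a backward bit-parallel version is obtained in the same way from Fig. 3.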
V. CONCLUDING REMARKS

A drawback of both our Backward-Prefix-Sampling and Campanelli et al.'s Backward-Cross-Sampling is that neither algorithm remembers the length of the prefix matched in previous search attempts. We propose to rectify this issue as follows. Once a largest prefix P_i, for i ≤ m−1, is found to have a swapped match with the text at location j, and after the text window is shifted to the right so as to become left-aligned with the pattern (line 22 in Fig. 2), the following iteration of the j loop (line 2 in Fig. 2) can simply search for the subpattern P[i..m−1] in the text subwindow T[j−m+i..j], and then combine results to expand the largest matched prefix of the pattern in the text window. This modification can be expected to improve the performance of both algorithms in practice.

REFERENCES

[1] A. Amir, Y. Aumann, G.M. Landau, M. Lewenstein, and N. Lewenstein, Pattern Matching With Swaps, Proc. IEEE Symposium on Foundations of Computer Science (FOCS), 1997.
[2] A. Amir, M. Lewenstein, and E. Porat, Approximate Swapped Matching, Information Processing Letters, 83:1, 2002.
[3] A. Amir, R. Cole, R. Hariharan, M. Lewenstein, and E. Porat, Overlap Matching, Inf. Comput., 181:1, 2003.
[4] M. Campanelli, D. Cantone, and S. Faro, A New Algorithm for Efficient Pattern Matching With Swaps, Proc. IWOCA 2009, to appear.
[5] D. Cantone and S. Faro, Pattern Matching With Swaps for Short Patterns in Linear Time, Proc. 35th Intl. Conference on Theory and Practice of Computer Science (SofSem 2009), LNCS 5404, Springer, 2009.
[6] F.B. Chedid, Parallel Pattern Matching With Swaps on a Linear Array, Proc. 10th Intl. Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2010), LNCS 6081, Springer, 2010.
[7] C.S. Iliopoulos and M.S. Rahman, A New Model to Solve the Swap Matching Problem and Efficient Algorithms for Short Patterns, Proc. 34th Intl. Conference on Theory and Practice of Computer Science (SofSem 2008), LNCS 4910, Springer, 2008.
[8] S. Muthukrishnan, New Results and Open Problems Related to Non-Standard Stringology, Proc. 6th Annual Symp. on Combinatorial Pattern Matching, LNCS 937, Springer, 1995.
