Approximate periods of strings


Theoretical Computer Science 262 (2001)

Jeong Seop Sim (a, 1), Costas S. Iliopoulos (b, c, 2), Kunsoo Park (a, 1, *), W.F. Smyth (c, d, 3)

a School of Computer Science and Engineering, Seoul National University, Seoul, South Korea
b Department of Computer Science, King's College London, London WC2R 2LS, UK
c School of Computing, Curtin University, Perth WA 6825, Australia
d Department of Computing & Software, McMaster University, Hamilton, Ontario, Canada L8S 4K1

* Corresponding author. E-mail addresses: jssim@theory.snu.ac.kr (J.S. Sim), csi@dcs.kcl.ac.uk (C.S. Iliopoulos), kpark@theory.snu.ac.kr (K. Park), smyth@mcmaster.ca (W.F. Smyth).
1 Supported by KOSEF Grant and the Brain Korea 21 Project.
2 Supported in part by the CCSLAAR Royal Society Research Grant.
3 Supported by NSERC Grant No. A

Received 27 December 1999; revised 27 June 2000; accepted 7 August 2000
Communicated by M. Crochemore

Abstract

The study of approximately periodic strings is relevant to diverse applications such as molecular biology, data compression, and computer-assisted music analysis. Here we study different forms of approximate periodicity under a variety of distance functions. We consider three related problems, for two of which we derive polynomial-time algorithms; we then show that the third problem is NP-complete. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Periodicity; Approximate periods; Repetitions; Distance function

1. Introduction

Repetitive or periodic strings have been studied in such diverse fields as molecular biology, data compression, and computer-assisted music analysis. In response to requirements arising out of a variety of applications, interest has arisen in algorithms for finding regularities in strings; that is, periodicities of an approximate nature. Some important regularities that have been studied in the literature are the following:

Periods: A string $p$ is called a period of a string $x$ if $x$ can be written as $x = p^k p'$, where $k \ge 1$ and $p'$ is a prefix of $p$. The shortest period of $x$ is called the period of $x$. For example, if $x = abcabcab$, then $abc$, $abcabc$, and $x$ itself are periods of $x$, while $abc$ is the period of $x$. If $x$ has a period $p$ such that $|p| \le |x|/2$, then $x$ is said to be periodic. Further, if setting $x = p^k$ implies $k = 1$, $x$ is said to be primitive; if $k \ge 2$, $p^k$ is called a repetition.

Covers: A string $w$ is called a cover of $x$ if $x$ can be constructed by concatenations and superpositions of $w$. For example, if $x = ababaaba$, then $aba$ and $x$ are the covers of $x$. If $x$ has a cover $w \ne x$, $x$ is said to be quasiperiodic; otherwise, $x$ is superprimitive.

Seeds: A substring $w$ of $x$ is called a seed of $x$ if it is a cover of any superstring of $x$. For example, $aba$ and $ababa$ are some seeds of $x = ababaab$.

Repetitions: A repetition is an immediately repeated nonempty string. For example, if $x = aababab$, then $aa$ and $ababab$ are repetitions in $x$; in particular, $a^2 = aa$ is called a square or tandem repeat and $(ab)^3 = ababab$ is called a cube.

The notions cover and seed are generalizations of periods in the sense that superpositions as well as concatenations are used to define them. A significant amount of research has been done on each of these four notions:

Periods: The preprocessing of the Knuth-Morris-Pratt algorithm [21] finds all periods of $x$ in linear time (in fact, all periods of every prefix of $x$). In parallel computation, Apostolico et al. [2] gave an optimal $O(\log\log n)$ time algorithm for finding all periods, where $n$ is the length of $x$.

Covers: Apostolico et al. [4] introduced the notion of covers and described a linear-time algorithm to test whether $x$ is superprimitive or not (see also [7, 8, 17]). Moore and Smyth [28] and recently Li and Smyth [24] gave linear-time algorithms for finding all covers of $x$. In parallel computation, Iliopoulos and Park [18] obtained an optimal $O(\log\log n)$ time algorithm for finding all covers of $x$. Apostolico and Ehrenfeucht [3] and Iliopoulos and Mouchard [16] considered the problem of finding maximal quasiperiodic substrings of $x$. A two-dimensional variant of the covering problem was studied in [11, 14], and minimum covering by substrings of a given length in [19].

Seeds: Iliopoulos et al. [15] introduced the notion of seeds and gave an $O(n \log n)$ time algorithm for computing all seeds of $x$. For the same problem, Berkman et al. [6] presented a parallel algorithm that requires $O(\log n)$ time and $O(n \log n)$ work.

Repetitions: There are several $O(n \log n)$ time algorithms for finding all the repetitions in a string [10, 5, 26]. In parallel computation, Apostolico and Breslauer [1] gave an optimal $O(\log\log n)$ time algorithm (i.e., total work is $O(n \log n)$) for finding all the repetitions.

A natural extension of the repetition problems is to allow errors. Approximate repetitions are common in applications such as molecular biology and computer-assisted music analysis [9, 12]. Among the four notions above, only approximate repetitions have been studied. If $x = uww'v$ where $w$ and $w'$ are similar, $ww'$ is called an approximate square or approximate tandem repeat. When there is a nonempty string $y$ between $w$ and $w'$, we say that $w$ and $w'$ are an approximate nontandem repeat. In [23], Landau and Schmidt gave an $O(kn \log k \log n)$ time algorithm for finding approximate squares whose edit distance is at most $k$ in a text of length $n$. Schmidt also gave an $O(n^2 \log n)$ algorithm for finding approximate tandem or nontandem repeats in [30], which uses an arbitrary score for similarity of repeated strings.

In this paper, we introduce the notion of approximate periods, which is an approximate version of periods. Here we study different forms of approximate periodicity under a variety of distance functions. We consider three related problems, for two of which we derive polynomial-time algorithms; we then show that the third problem is NP-complete.

This paper is organized as follows. In Section 2, we describe some notations and definitions used in this paper. In Section 3, we define approximate periods and three related problems studied in this paper. Then we present polynomial-time algorithms for two problems and the proof of NP-completeness of the third problem in Section 4. We conclude in Section 5.

2. Preliminaries

A string is a sequence of zero or more characters from an alphabet $\Sigma$. The set of all strings over the alphabet $\Sigma$ is denoted by $\Sigma^*$, and the set of all strings of length $m$ over $\Sigma$ is denoted by $\Sigma^m$. The empty string is denoted by $\epsilon$. The length of a string $x$ is denoted by $|x|$ and the $i$th character of $x$ by $x[i]$. When a string $w$ is $x[i]x[i+1] \cdots x[j]$, we denote $w$ by $x[i..j]$, and $w$ is called a substring (or a factor [13]) of $x$. Conversely, $x$ is called a superstring of $w$. For example, $bcd$ is a substring of $abcde$, and $abcde$ is a superstring of $bcd$. A blank space is denoted by $\triangle$, and we regard it as a character for convenience.

A string $w$ is a prefix of $x$ if $x = wu$ for $u \in \Sigma^*$. Similarly, $w$ is a suffix of $x$ if $x = uw$ for $u \in \Sigma^*$. A string $w$ is a subsequence (also called a subword [13]) of $x$ (or $x$ is a supersequence of $w$) if $w$ is obtained by deleting zero or more characters (at any positions) from $x$. For example, $ace$ is a subsequence of $aabcdef$. For a given set $S$ of strings, a string $s$ is called a common supersequence of $S$ if $s$ is a supersequence of every string in $S$.
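The subsequence and common-supersequence definitions above can be checked mechanically. The following is a minimal Python sketch; the function names `is_subsequence` and `is_common_supersequence` are illustrative and do not come from the paper.

```python
from typing import Iterable


def is_subsequence(w: str, x: str) -> bool:
    """Return True if w can be obtained from x by deleting zero or more characters."""
    it = iter(x)
    # `ch in it` consumes the iterator up to the first match, so the
    # characters of w must be found in x in left-to-right order.
    return all(ch in it for ch in w)


def is_common_supersequence(s: str, strings: Iterable[str]) -> bool:
    """Return True if s is a supersequence of every string in `strings`."""
    return all(is_subsequence(w, s) for w in strings)


# The example from the text: ace is a subsequence of aabcdef.
print(is_subsequence("ace", "aabcdef"))
print(is_common_supersequence("abcde", ["bcd", "ace"]))
```

Each subsequence test scans $x$ once, so checking a candidate common supersequence takes time linear in its length per string in $S$.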
The distance $\delta(x, y)$ between two strings $x$ and $y$ is the minimum cost to convert one string $x$ to the other string $y$. There are several well-known distance functions. The edit distance between two strings $x$ and $y$ is the minimum number of edit operations to convert $x$ to $y$. The edit operations are the insertion of a character into $x$, the deletion of a character from $x$, and the change (or substitution) of a character in $x$ with a character in $y$. Note that since we define the edit distance by the number of edit operations, the cost of each edit operation is 1. The Hamming distance between $x$ and $y$ is the smallest number of change operations to convert $x$ to $y$. Note that the Hamming distance can be defined only when $|x| = |y|$ because it does not allow insertions and deletions.

The edit distance can be generalized by using a penalty matrix. A penalty matrix specifies the substitution cost for each pair of characters and the insertion/deletion cost for each character. The weighted edit distance between $x$ and $y$ is the minimum cost to convert $x$ to $y$ using a penalty matrix. A penalty matrix is a metric when it satisfies the following conditions for all $a, b, c \in \Sigma \cup \{\triangle\}$: $\delta(a, b) \ge 0$, $\delta(a, b) = \delta(b, a)$, $\delta(a, a) = 0$, and $\delta(a, c) \le \delta(a, b) + \delta(b, c)$ (triangle inequality).

An alignment of a set $S$ of strings can be represented by a two-dimensional matrix, where each row is a string in $S$ and all rows have the same length. The equality of lengths of all the rows can be obtained by inserting zero or more spaces into each of the strings in $S$. For example, when $S = \{abcae, bcd, abde\}$, an alignment of $S$ is shown in Fig. 1.

Fig. 1. An alignment.

We say that a distance function $\delta(x, y)$ is a relative distance function if the lengths of strings $x$ and $y$ are considered in the value of $\delta(x, y)$; otherwise, it is an absolute distance function. The Hamming distance and the edit distance are examples of absolute distance functions. There are two ways to define a relative distance between $x$ and $y$:

First, we can fix one of the two strings and define a relative distance function with respect to the fixed string. The error ratio with respect to $x$ is defined to be $d/|x|$, where $d$ is an absolute distance between $x$ and $y$.

Second, we can define a relative distance function symmetrically. The symmetric error ratio is defined to be $d/l$, where $d$ is an absolute distance between $x$ and $y$, and $l = (|x| + |y|)/2$ [31]. Note that we may take $l = |x| + |y|$ (then everything is the same except that the ratio is multiplied by 2).

If $d$ is the edit distance between $x$ and $y$, the error ratio with respect to $x$ or the symmetric error ratio is called a relative edit distance. The weighted edit distance can also be used as a relative distance function because the penalty matrix can contain arbitrary costs.

3. Problem definitions

Given two strings $x$, $p$ and a distance function $\delta$, we define approximate periods as follows. If there exists a partition of $x$ into disjoint blocks of substrings, i.e., $x = p_1 p_2 \cdots p_r$ ($p_i \ne \epsilon$), such that $\delta(p, p_i) \le t$ for $1 \le i < r$, and $\delta(p', p_r) \le t$ where $p'$ is some prefix of $p$, we say that $p$ is a $t$-approximate period of $x$ (or $p$ is an approximate period of $x$ with distance $t$). Each $p_i$, $1 \le i \le r$, will be called a partition block of $x$. The last partition block $p_r$ can be matched to any prefix $p'$ of $p$ as in the definition of exact periods. If $p'$ is shorter than $p$, the last partition block will be called short. Note that there can be several versions of approximate periods according to the definition of the distance function. We consider the following problems related to approximate periods.

Problem 1. Given $x$, $p$, and any distance function $\delta$, find the minimum $t$ such that $p$ is a $t$-approximate period of $x$.

Since $p$ is fixed in this case, it makes no difference whether $\delta$ is an absolute distance function or an error ratio with respect to $p$. If a threshold $k \le |p|$ on the edit distance is given as input in Problem 1, the problem asks whether $p$ is a $k$-approximate period of $x$ or not. Note that if the edit distance is used for $\delta$, it is trivially true that $p$ is a $|p|$-approximate period of $x$.

Problem 2. Given a string $x$ and a relative distance function $\delta$, find a substring $p$ of $x$ that is an approximate period of $x$ with the minimum distance.

Since the length of $p$ is not (a priori) fixed in this problem, we need to use a relative distance function for $\delta$ (i.e., an error ratio or a weighted edit distance) rather than an absolute distance function. For example, if the absolute edit distance is used, every substring of $x$ of length 1 is a 1-approximate period of $x$. Moreover, we assume that there are at least two partition blocks of $x$ except possibly the short partition block, because otherwise the longest proper prefix of $x$ (or any long prefix of $x$) can easily become an approximate period of $x$ with a small distance. This assumption will be applied to Problem 3, too.

Problem 3. Given a string $x$ and a relative distance function $\delta$, find a string $p$ that is an approximate period of $x$ with the minimum distance.

This problem is harder than Problem 2 because $p$ can be any string, not necessarily a substring of $x$.

4. Algorithms and NP-completeness

Basically, we will use weighted edit distances with metric penalty matrices for the distance function $\delta$ in each problem.
Recall that a penalty matrix defines the substitution cost for each pair of characters and the insertion or deletion cost for each character.

4.1. Problem 1

Our algorithm for Problem 1 consists of two steps. Let $n = |x|$ and $m = |p|$.

(1) Let $w_{ij}$ be the distance between $p$ and $x[i..j]$, $1 \le i \le j < n$, and $w_{in}$ be the minimum value of the distances between all prefixes of $p$ and $x[i..n]$. Note that $x[i..n]$ can be matched to any prefix of $p$ by the definition of approximate periods. Compute $w_{ij}$ for all $1 \le i \le j \le n$.

(2) Compute the minimum $t$ such that $p$ is a $t$-approximate period of $x$. We use dynamic programming to compute $t$ as follows. Let $t_i$ be the minimum value such that $p$ is a $t_i$-approximate period of $x[1..i]$. Let $t_0 = 0$. For $i = 1$ to $n$, we compute $t_i$ by the following formula:

$$t_i = \min_{0 \le h < i} \bigl( \max(t_h, w_{h+1,i}) \bigr).$$

The value $t_n$ is the minimum $t$ such that $p$ is a $t$-approximate period of $x$.

To compute the distances in step 1, we use a dynamic programming table called the D table. To compute the distance between two strings $x$ and $y$, a D table of size $(|x| + 1) \times (|y| + 1)$ is used. Each entry $D[i, j]$, $0 \le i \le |x|$ and $0 \le j \le |y|$, stores the minimum cost of transforming $x[1..i]$ to $y[1..j]$. Initially, $D[0, 0] = 0$, $D[i, 0] = D[i-1, 0] + \delta(x[i], \triangle)$, and $D[0, j] = D[0, j-1] + \delta(\triangle, y[j])$. Then we can compute all the entries of the D table in $O(|x||y|)$ time by the following recurrence:

$$D[i, j] = \min \begin{cases} D[i-1, j] + \delta(x[i], \triangle), \\ D[i, j-1] + \delta(\triangle, y[j]), \\ D[i-1, j-1] + \delta(x[i], y[j]). \end{cases}$$

Note that $\triangle$ is a blank space, and thus $\delta(a, \triangle)$ means the deletion cost of $a$ and $\delta(\triangle, a)$ means the insertion cost of $a$.

We now consider the time complexity of the above algorithm for Problem 1 under various distance functions. For a weighted edit distance, we make a D table of size $(m + 1) \times (n - i + 2)$ for each position $i$ of $x$. Since the distances between all prefixes of $p$ and $x[i..n]$ appear in the last column of the D table, we can get the minimum value of the last column as $w_{in}$ without increasing the time complexity. Hence, step 1 takes $O(mn^2)$ time. In step 2, we can compute the minimum $t$ in $O(n^2)$ time since we compare $O(n)$ values for each $t_i$. Therefore, the total time complexity is $O(mn^2)$.

When the edit distance is used for $\delta$, this algorithm for Problem 1 can be improved. In this case, $\delta(a, b) = 1$ if $a \ne b$, and $\delta(a, b) = 0$ otherwise.
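As an illustration of the general two-step algorithm described above (the $O(mn^2)$ version, before any unit-cost refinement), here is a minimal Python sketch. The names `BLANK`, `unit_cost`, `d_table` and `min_approx_period_distance` are hypothetical, the empty string stands in for the blank symbol, and the default penalty function is the unit-cost edit distance; any metric penalty function with the same signature could be passed instead. Both `x` and `p` are assumed to be non-empty.

```python
from typing import Callable, List

BLANK = ""  # hypothetical stand-in for the blank symbol of the penalty matrix


def unit_cost(a: str, b: str) -> float:
    """Unit-cost penalty: 0 for equal characters, 1 for any edit operation."""
    return 0.0 if a == b else 1.0


def d_table(p: str, y: str, cost: Callable[[str, str], float]) -> List[List[float]]:
    """D[i][j] is the minimum cost of transforming p[:i] into y[:j]."""
    D = [[0.0] * (len(y) + 1) for _ in range(len(p) + 1)]
    for i in range(1, len(p) + 1):
        D[i][0] = D[i - 1][0] + cost(p[i - 1], BLANK)
    for j in range(1, len(y) + 1):
        D[0][j] = D[0][j - 1] + cost(BLANK, y[j - 1])
    for i in range(1, len(p) + 1):
        for j in range(1, len(y) + 1):
            D[i][j] = min(
                D[i - 1][j] + cost(p[i - 1], BLANK),         # delete p[i]
                D[i][j - 1] + cost(BLANK, y[j - 1]),         # insert y[j]
                D[i - 1][j - 1] + cost(p[i - 1], y[j - 1]),  # change or match
            )
    return D


def min_approx_period_distance(
    x: str, p: str, cost: Callable[[str, str], float] = unit_cost
) -> float:
    """Minimum t such that p is a t-approximate period of x."""
    n, m = len(x), len(p)
    # Step 1: w[i][j] for 1 <= i <= j <= n (1-based, as in the text).
    w = [[0.0] * (n + 1) for _ in range(n + 2)]
    for i in range(1, n + 1):
        D = d_table(p, x[i - 1:], cost)  # one table per start position i
        for j in range(i, n):
            w[i][j] = D[m][j - i + 1]    # whole p against x[i..j]
        # A block ending at n may match any prefix of p:
        # take the minimum of the last column.
        w[i][n] = min(D[r][n - i + 1] for r in range(m + 1))
    # Step 2: t[i] = min over 0 <= h < i of max(t[h], w[h+1][i]).
    t = [0.0] * (n + 1)
    for i in range(1, n + 1):
        t[i] = min(max(t[h], w[h + 1][i]) for h in range(i))
    return t[n]


print(min_approx_period_distance("abcabcab", "abc"))
print(min_approx_period_distance("abcabdab", "abc"))
```

The sketch keeps a full table per start position for readability; only the last row and the last column of each table are actually consulted.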
Under the unit-cost edit distance, it is not necessary to compute the edit distances between $p$ and the substrings of $x$ whose lengths are larger than $2m$, because their edit distances with $p$ will exceed $m$. Step 1 now takes $O(m^2 n)$ time since we make a D table of size $(m + 1) \times (2m + 1)$ for each position of $x$. Also, step 2 can be done in $O(mn)$ time since we compare $O(m)$ values at each position of $x$. Thus the time complexity is reduced to $O(m^2 n)$.

However, we can do better. Step 1 can be solved in $O(mn)$ time by the algorithm due to Landau et al. [22]. Given two strings $x$ and $y$ and a forward (resp. backward) solution for the comparison between $x$ and $y$, the algorithm in [22] incrementally computes a solution for $x$ and $by$ (resp. $yb$) in $O(k)$ time, where $b$ is an additional character and $k$ is a threshold on the edit distance. This can be done due to the relationship between the solution for $x$ and $y$ and the solution for $x$ and $by$. When $k = m$ (i.e., the threshold is not given), we can compute all the edit distances between $p$ and every substring of $x$ whose length is at most $2m$ in $O(mn)$ time using this algorithm. Recently, Kim and Park [20] gave a simpler $O(mn)$-time algorithm for the same problem. Therefore, we can solve Problem 1 in $O(mn)$ time if the edit distance is used for $\delta$. When the threshold $k$ on the edit distance is given as input for Problem 1, it can be solved in $O(kn)$ time because each step of the above algorithm takes $O(kn)$ time.

If we use the Hamming distance for $\delta$, the algorithm trivially takes $O(n)$ time since $x$ must be partitioned into blocks of size $m$. (The last partition block can be shorter than $m$.) Therefore, we have the following theorem.

Theorem 1. Problem 1 can be solved in $O(mn^2)$ time when a weighted edit distance is used for $\delta$. If the edit distance (resp. the Hamming distance) is used for $\delta$, it can be solved in $O(mn)$ time (resp. in $O(n)$ time).

4.2. Problem 2

When a relative edit distance is used for $\delta$, Problem 2 can be solved in $O(n^4)$ time by our algorithm for Problem 1. Let $p$ be a candidate string for the approximate period of $x$. If we take each substring of $x$ as $p$ and apply the $O(mn)$ algorithm for Problem 1 (that uses the algorithm in [22]), it takes $O(|p| n)$ time for each $p$. Since there are $O(n^2)$ substrings of $x$, the overall time is $O(n^4)$.

Without using the somewhat complicated algorithm in [22], however, we can solve Problem 2 in $O(n^4)$ time by the following simple algorithm for weighted edit distances as well as relative edit distances. We take every substring of $x$ as a candidate string $p$, and compute the minimum distance $t$ such that $p$ is a $t$-approximate period of $x$. But we do this for all substrings of $x$ that start at a position $i$ at the same time, as follows.

Let $T$ be the minimum distance so far. Initially, $T = \infty$. For $i = 1$ to $n$, we do the following. For each $i$, we process the $n - i + 1$ substrings that start at position $i$ as candidate strings. Let $m$ be the length of a chosen substring of $x$ as $p$. Initially, $m = 1$.

(1) Take $x[i..i+m-1]$ as $p$ and compute $w_{hj}$ for all $1 \le h \le j \le n$. The definition of $w_{hj}$ is the same as in Problem 1. This computation can be done by making $n$ D tables with $p$ and each of the $n$ suffixes of $x$. By adding just one row to each of the previous D tables (i.e., the $n$ D tables when $p = x[i..i+m-2]$), we can compute these new D tables in $O(n^2)$ time. See Fig. 2. (Note that when $m = 1$, we create new D tables.)

(2) Compute the minimum distance $t$ such that $p$ is a $t$-approximate period of $x$. This step is similar to the second step of the algorithm for Problem 1. Let $t_j$ be the minimum value such that $p$ is a $t_j$-approximate period of $x[1..j]$ and let $t_0 = 0$. For $j = 1$ to $n$, we compute $t_j$ by the following formula:

$$t_j = \min_{0 \le h < j} \bigl( \max(t_h, w_{h+1,j}) \bigr).$$

The value $t_n$ is the minimum $t$ such that $p$ is a $t$-approximate period of $x$. If $t_n$ is smaller than $T$, we update $T$ with $t_n$. If $m < n - i + 1$, increase $m$ by 1 and go to step 1.
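The row-extension idea in step 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: `first_row`, `extend_row` and `rows_for_growing_candidate` are hypothetical names, the empty string stands in for the blank symbol, and the unit-cost penalty is used as a placeholder for an arbitrary penalty matrix. For one fixed string `y` (for example a suffix of $x$), growing the candidate by one character costs one new row, $O(|y|)$ time, instead of a whole new table.

```python
from typing import Callable, Iterator, List, Tuple

BLANK = ""  # hypothetical stand-in for the blank symbol


def unit_cost(a: str, b: str) -> float:
    """Placeholder penalty function: unit-cost edit distance."""
    return 0.0 if a == b else 1.0


def first_row(y: str, cost: Callable[[str, str], float]) -> List[float]:
    """Row 0 of the D table: cost of building y[:j] from the empty candidate."""
    row = [0.0]
    for ch in y:
        row.append(row[-1] + cost(BLANK, ch))
    return row


def extend_row(
    prev_row: List[float], ch: str, y: str, cost: Callable[[str, str], float]
) -> List[float]:
    """Given the last D-table row for candidate p against y, return the row for p + ch."""
    row = [prev_row[0] + cost(ch, BLANK)]
    for j in range(1, len(y) + 1):
        row.append(min(
            prev_row[j] + cost(ch, BLANK),         # delete ch
            row[j - 1] + cost(BLANK, y[j - 1]),    # insert y[j]
            prev_row[j - 1] + cost(ch, y[j - 1]),  # change or match
        ))
    return row


def rows_for_growing_candidate(
    x: str, i: int, y: str, cost: Callable[[str, str], float] = unit_cost
) -> Iterator[Tuple[str, List[float]]]:
    """Yield (p, row) for p = x[i:i+1], x[i:i+2], ... (0-based i), one new row per step."""
    row = first_row(y, cost)
    for m in range(1, len(x) - i + 1):
        row = extend_row(row, x[i + m - 1], y, cost)
        yield x[i:i + m], row  # row[j] is the distance between p and y[:j]


for p, row in rows_for_growing_candidate("abcab", 0, "abcab"):
    print(p, row)
```

Running this once per suffix of $x$ reproduces the $O(n^2)$ cost per candidate length stated in step 1; the running minimum of the last entries of the rows gives the prefix-matching value needed for blocks that end at position $n$.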

Fig. 2. Computing new D tables.

When all the steps are completed, the final value of $T$ is the minimum distance, and the substring $p$ that is a $T$-approximate period of $x$ is an answer to Problem 2.

Theorem 2. Problem 2 can be solved in $O(n^4)$ time when a weighted edit distance or a relative edit distance is used for $\delta$. When a relative Hamming distance is used for $\delta$, Problem 2 can be solved in $O(n^3)$ time.

Proof. For a weighted edit distance, we make $n$ D tables in $O(n^2)$ time in step 1 and compute the minimum distance in $O(n^2)$ time in step 2. For $m = 1$ to $n - i + 1$, we repeat the two steps. Therefore, it takes $O(n^3)$ time for each $i$ and the total time complexity of this algorithm is $O(n^4)$. If a relative edit distance is used, this algorithm can be slightly simplified as in Problem 1, but it still takes $O(n^4)$ time. For a relative Hamming distance, it takes $O(n)$ time for each candidate string and there are $O(n^2)$ candidate strings. Thus, the total time complexity is $O(n^3)$.

4.3. Problem 3

Given a set of strings, the shortest common supersequence (SCS) problem is to find a shortest common supersequence of all strings in the set. The SCS problem is NP-complete [25, 29]. We will show that Problem 3 is NP-complete by a reduction from the SCS problem. In this section we will call Problem 3 the AP problem (abbreviation of the approximate period problem). The decision versions of the SCS and AP problems are as follows:

Definition 1. Given a positive integer $m$ and a finite set $S$ of strings from $\Sigma^*$ where $\Sigma$ is a finite alphabet, the SCS problem is to decide if there exists a common supersequence $w$ of $S$ such that $|w| \le m$.

Definition 2. Given a number $t$, a string $x$ from $(\Sigma')^*$ where $\Sigma'$ is a finite alphabet, and a penalty matrix, the AP problem is to decide if there exists a string $u$ such that $u$ is a $t$-approximate period of $x$.

Fig. 3. The penalty matrix M.

Now we transform an instance of the SCS problem to an instance of the AP problem. We can assume that $\Sigma = \{0, 1\}$ since the SCS problem is NP-complete even if $\Sigma = \{0, 1\}$ [27, 29]. Assume that there are $n$ strings $s_1, \ldots, s_n$ in $S$. First, we set $\Sigma' = \Sigma \cup \{a, b, \#, \$, *_1, *_2, \triangle\}$. Let $x = \# *_1^m \$ \# *_2^m \$ \# s_1 \$ \# s_2 \$ \cdots \# s_n \$$. Then, set $t = m$ and define the penalty matrix as in Fig. 3, where a shaded entry can be any value greater than $m$ which makes the penalty matrix a metric. For example, every shaded entry can be $m + 1$. It is easy to see that this transformation can be done in polynomial time.

Lemma 1. Assume that $x$ is constructed as above. If $u$ is an $m$-approximate period of $x$, then $u$ is of the form $\#\alpha\$$ where $\alpha \in \{a, b\}^m$.

Proof. We first show that $u$ must have one # and one $.

(1) Suppose that $u$ has no # (resp. $). Clearly, there exists a partition block of $x$ which has at least one # (resp. $), and the distance between $u$ and the partition block is greater than $m$. Therefore, $u$ must have at least one # and at least one $.

(2) Suppose that $u$ has more than one # (or $). Assume that $u$ has two #'s. (The other cases are similar.) Then $u$ must also have two $'s, because unless the number of #'s equals that of $'s in $u$, at least one partition block of $x$ cannot have the same numbers of #'s and $'s as those of $u$. If the numbers of #'s and $'s in $u$ differ from those of a partition block of $x$, the distance between $u$ and the partition block exceeds $m$ due to the penalty matrix M. Consider the first partition block of $x$. The first partition block is $\# *_1^m \$ \# *_2^m \$$ because it must have two #'s and two $'s as $u$ does. For the distance between $u$ and the first partition block of $x$ to be at most $m$, $u$ must have at least $m$ characters from $\{*_1, *_2\}$. In such cases, however, the distance between $u$ and at least one partition block of $x$ will exceed $m$.

It remains to show that $u = \#\alpha\$$ such that $\alpha \in \{a, b\}^m$. Since $u$ has one # and one $, $x$ must be partitioned just after every occurrence of $. Let $u$ be of the form $\beta\#\alpha\$\gamma$, where $\alpha, \beta, \gamma \in \{0, 1, a, b, *_1, *_2, \triangle\}^*$. Consider the first two partition blocks $\# *_1^m \$$ and $\# *_2^m \$$ of $x$. If $\alpha$ contains $i$ $*_1$'s for $i \ge 1$, $\alpha$ must also have $i$ $*_2$'s and the remaining $m - 2i$ characters in $\alpha$ must be from $\{a, b\}$ so that the distances between $u$ and the first two partition blocks of $x$ do not exceed $m$. However, this makes the distance between $u$ and any other partition block of $x$ exceed $m$ due to the $*_1$'s and $*_2$'s in $\alpha$. Hence $\alpha$ cannot have $*_1$ or $*_2$. Also, $\alpha$ cannot have any character from $\{0, 1, \triangle\}$ since 0, 1 and $\triangle$ have cost 2 with $*_1$ and $*_2$ in the first two partition blocks of $x$. For the distances between $u$ and the first two partition blocks of $x$ to be at most $m$, $\beta$ and $\gamma$ must be empty and $\alpha$ must be of the form $\{a, b\}^m$. See Fig. 4.

Fig. 4. An alignment of $S \cup \{u\}$.

Theorem 3. The AP problem is NP-complete.

Proof. It is easy to see that the AP problem is in NP. To show that the AP problem is NP-complete, we need to show that $S$ has a common supersequence $w$ such that $|w| \le m$ if and only if there exists a string $u$ such that $u$ is an $m$-approximate period of $x$.

(if) By Lemma 1, $u = \#\alpha\$$ where $\alpha \in \{a, b\}^m$. Since $u$ is an $m$-approximate period of $x$, the distance between $u$ and each partition block $\# s_i \$$ is at most $m$. (The distances between $u$ and the first two partition blocks $\# *_1^m \$$ and $\# *_2^m \$$ are always $m$.) Consider an alignment of $S \cup \{u\}$. Since $|\alpha| = m$ and the distance between $\alpha$ and $s_i$ is at most $m$, each $a$ (resp. $b$) in $\alpha$ must be aligned with 0 (resp. 1) or $\triangle$ in $s_i$. (See cases (a) and (b) in Fig. 4.) If we substitute 0 for $a$ and 1 for $b$ in $\alpha$, we obtain a common supersequence $w$ of $s_1, \ldots, s_n$ such that $|w| = m$. (Note that if $a$ or $b$ in $\alpha$ is aligned with $\triangle$ for all $s_i$, we can delete the character in $\alpha$ and obtain a common supersequence which is shorter than $m$. See case (c) in Fig. 4.) A similar alignment was used by Wang and Jiang [32].

(only if) Let $w$ be a common supersequence of $S$ such that $|w| \le m$. Let $\alpha$ be the string constructed by substituting $a$ for 0 and $b$ for 1 in $w$. (When $|w| < m$, we append some characters from $\{a, b\}$ to $\alpha$ so that $|\alpha| = m$.) Partition $x$ just after every occurrence of $. The distance between each partition block of $x$ and $\#\alpha\$$ is $m$ since each $a$ (resp. $b$) in $\alpha$ can be aligned with 0 (resp. 1), $\triangle$, $*_1$, or $*_2$ in each partition block. Therefore, $\#\alpha\$$ is an $m$-approximate period of $x$.

5. Conclusion

In this paper, we have introduced the notion of approximate periods and three related problems. We presented polynomial-time algorithms for two of the problems and an NP-completeness proof for the third problem. There can be a variety of problems on approximate periods. For example, the following problem is an interesting one: Given a string $x$ and a distance function $\delta$, find a substring of $x$ that has an approximate period. This is a problem of finding an interesting area (i.e., an approximately periodic area) in a given string.

References

[1] A. Apostolico, D. Breslauer, An optimal O(log log N)-time parallel algorithm for detecting all squares in a string, SIAM J. Comput. 25 (6) (1996).
[2] A. Apostolico, D. Breslauer, Z. Galil, Optimal parallel algorithms for periods, palindromes and squares, in: Proc. 19th Int. Colloq. Automata Languages and Programming, Lecture Notes in Computer Science, Vol. 623, Springer, Berlin, 1992.
[3] A. Apostolico, A. Ehrenfeucht, Efficient detection of quasiperiodicities in strings, Theoret. Comput. Sci. 119 (2) (1993).
[4] A. Apostolico, M. Farach, C.S. Iliopoulos, Optimal superprimitivity testing for strings, Inform. Process. Lett. 39 (1) (1991).
[5] A. Apostolico, F.P. Preparata, Optimal off-line detection of repetitions in a string, Theoret. Comput. Sci. 22 (1983).
[6] O. Berkman, C.S. Iliopoulos, K. Park, The subtree max gap problem with application to parallel string covering, Inform. Comput. 123 (1) (1995).
[7] D. Breslauer, An on-line string superprimitivity test, Inform. Process. Lett. 44 (1992).
[8] D. Breslauer, Testing string superprimitivity in parallel, Inform. Process. Lett. 49 (5) (1994).
[9] T. Crawford, C.S. Iliopoulos, R. Raman, String matching techniques for musical similarity and melodic recognition, Comput. Musicol. 11 (1998).
[10] M. Crochemore, An optimal algorithm for computing the repetitions in a word, Inform. Process. Lett. 12 (5) (1981).
[11] M. Crochemore, C.S. Iliopoulos, M. Korda, Two-dimensional prefix string matching and covering on square matrices, Algorithmica 20 (1998).
[12] M. Crochemore, C.S. Iliopoulos, H. Yu, Algorithms for computing evolutionary chains in molecular and musical sequences, Proc. 9th Austral. Workshop on Combinatorial Algorithms, 1998.
[13] M. Crochemore, W. Rytter, Text Algorithms, Oxford University Press, Oxford.
[14] C.S. Iliopoulos, M. Korda, Optimal parallel superprimitivity testing on square arrays, Parallel Process. Lett. 6 (3) (1996).
[15] C.S. Iliopoulos, D.W.G. Moore, K. Park, Covering a string, Algorithmica 16 (1996).
[16] C.S. Iliopoulos, L. Mouchard, An O(n log n) algorithm for computing all maximal quasiperiodicities in strings, in: Proc. Computing: Australasian Theory Symposium, Lecture Notes in Computer Science, Springer, Berlin, 1999.
[17] C.S. Iliopoulos, K. Park, An optimal O(log log n)-time algorithm for parallel superprimitivity testing, J. Korea Inform. Sci. Soc. 21 (1994).
[18] C.S. Iliopoulos, K. Park, A work-time optimal algorithm for computing all string covers, Theoret. Comput. Sci. 164 (1996).
[19] C.S. Iliopoulos, W.F. Smyth, On-line algorithms for k-covering, Proc. 9th Austral. Workshop on Combinatorial Algorithms, 1998.
[20] S. Kim, K. Park, A dynamic edit distance table, in: Proc. 11th Symp. Combinatorial Pattern Matching, Lecture Notes in Computer Science, Vol. 1848, Springer, Berlin, 2000.
[21] D.E. Knuth, J.H. Morris, V.R. Pratt, Fast pattern matching in strings, SIAM J. Comput. 6 (1) (1977).
[22] G.M. Landau, E.W. Myers, J.P. Schmidt, Incremental string comparison, SIAM J. Comput. 27 (2) (1998).
[23] G.M. Landau, J.P. Schmidt, An algorithm for approximate tandem repeats, in: Proc. 4th Symp. Combinatorial Pattern Matching, Lecture Notes in Computer Science, Vol. 648, Springer, Berlin, 1993.
[24] Y. Li, W.F. Smyth, An optimal on-line algorithm to compute all the covers of a string, preprint.
[25] D. Maier, The complexity of some problems on subsequences and supersequences, J. ACM 25 (2) (1978).
[26] M.G. Main, R.J. Lorentz, An algorithm for finding all repetitions in a string, J. Algorithms 5 (1984).
[27] M. Middendorf, More on the complexity of common superstring and supersequence problems, Theoret. Comput. Sci. 125 (2) (1994).
[28] D. Moore, W.F. Smyth, A correction to "An optimal algorithm to compute all the covers of a string", Inform. Process. Lett. 54 (2) (1995).
[29] K.J. Räihä, E. Ukkonen, The shortest common supersequence problem over binary alphabet is NP-complete, Theoret. Comput. Sci. 16 (1981).
[30] J.P. Schmidt, All highest scoring paths in weighted grid graphs and its application to finding all approximate repeats in strings, SIAM J. Comput. 27 (4) (1998).
[31] P.H. Sellers, Pattern recognition in genetic sequences by mismatch density, Bull. Math. Biol. 46 (4) (1984).
[32] L. Wang, T. Jiang, On the complexity of multiple sequence alignment, J. Comput. Biol. 1 (4) (1994).


More information

Consensus Optimizing Both Distance Sum and Radius

Consensus Optimizing Both Distance Sum and Radius Consensus Optimizing Both Distance Sum and Radius Amihood Amir 1, Gad M. Landau 2, Joong Chae Na 3, Heejin Park 4, Kunsoo Park 5, and Jeong Seop Sim 6 1 Bar-Ilan University, 52900 Ramat-Gan, Israel 2 University

More information

String Regularities and Degenerate Strings

String Regularities and Degenerate Strings M.Sc. Engg. Thesis String Regularities and Degenerate Strings by Md. Faizul Bari Submitted to Department of Computer Science and Engineering in partial fulfilment of the requirments for the degree of Master

More information

Average Complexity of Exact and Approximate Multiple String Matching

Average Complexity of Exact and Approximate Multiple String Matching Average Complexity of Exact and Approximate Multiple String Matching Gonzalo Navarro Department of Computer Science University of Chile gnavarro@dcc.uchile.cl Kimmo Fredriksson Department of Computer Science

More information

Varieties of Regularities in Weighted Sequences

Varieties of Regularities in Weighted Sequences Varieties of Regularities in Weighted Sequences Hui Zhang 1, Qing Guo 2 and Costas S. Iliopoulos 3 1 College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang 310023,

More information

Efficient (δ, γ)-pattern-matching with Don t Cares

Efficient (δ, γ)-pattern-matching with Don t Cares fficient (δ, γ)-pattern-matching with Don t Cares Yoan José Pinzón Ardila Costas S. Iliopoulos Manolis Christodoulakis Manal Mohamed King s College London, Department of Computer Science, London WC2R 2LS,

More information

On the number of occurrences of all short factors in almost all words

On the number of occurrences of all short factors in almost all words Theoretical Computer Science 290 (2003) 2031 2035 www.elsevier.com/locate/tcs Note On the number of occurrences of all short factors in almost all words Ioan Tomescu Faculty of Mathematics, University

More information

University of Waterloo. W. F. Smyth. McMaster University. Curtin University of Technology

University of Waterloo. W. F. Smyth. McMaster University. Curtin University of Technology WEAK REPETITIONS IN STRINGS L. J. Cummings Faculty of Mathematics University of Waterloo W. F. Smyth Department of Computer Science & Systems McMaster University School of Computing Curtin University of

More information

String Regularities and Degenerate Strings

String Regularities and Degenerate Strings M. Sc. Thesis Defense Md. Faizul Bari (100705050P) Supervisor: Dr. M. Sohel Rahman String Regularities and Degenerate Strings Department of Computer Science and Engineering Bangladesh University of Engineering

More information

A Lower-Variance Randomized Algorithm for Approximate String Matching

A Lower-Variance Randomized Algorithm for Approximate String Matching A Lower-Variance Randomized Algorithm for Approximate String Matching Mikhail J. Atallah Elena Grigorescu Yi Wu Department of Computer Science Purdue University West Lafayette, IN 47907 U.S.A. {mja,egrigore,wu510}@cs.purdue.edu

More information

Efficient High-Similarity String Comparison: The Waterfall Algorithm

Efficient High-Similarity String Comparison: The Waterfall Algorithm Efficient High-Similarity String Comparison: The Waterfall Algorithm Alexander Tiskin Department of Computer Science University of Warwick http://go.warwick.ac.uk/alextiskin Alexander Tiskin (Warwick)

More information

SORTING SUFFIXES OF TWO-PATTERN STRINGS.

SORTING SUFFIXES OF TWO-PATTERN STRINGS. International Journal of Foundations of Computer Science c World Scientific Publishing Company SORTING SUFFIXES OF TWO-PATTERN STRINGS. FRANTISEK FRANEK and WILLIAM F. SMYTH Algorithms Research Group,

More information

SIMPLE ALGORITHM FOR SORTING THE FIBONACCI STRING ROTATIONS

SIMPLE ALGORITHM FOR SORTING THE FIBONACCI STRING ROTATIONS SIMPLE ALGORITHM FOR SORTING THE FIBONACCI STRING ROTATIONS Manolis Christodoulakis 1, Costas S. Iliopoulos 1, Yoan José Pinzón Ardila 2 1 King s College London, Department of Computer Science London WC2R

More information

Compressed Index for Dynamic Text

Compressed Index for Dynamic Text Compressed Index for Dynamic Text Wing-Kai Hon Tak-Wah Lam Kunihiko Sadakane Wing-Kin Sung Siu-Ming Yiu Abstract This paper investigates how to index a text which is subject to updates. The best solution

More information

A GREEDY APPROXIMATION ALGORITHM FOR CONSTRUCTING SHORTEST COMMON SUPERSTRINGS *

A GREEDY APPROXIMATION ALGORITHM FOR CONSTRUCTING SHORTEST COMMON SUPERSTRINGS * A GREEDY APPROXIMATION ALGORITHM FOR CONSTRUCTING SHORTEST COMMON SUPERSTRINGS * 1 Jorma Tarhio and Esko Ukkonen Department of Computer Science, University of Helsinki Tukholmankatu 2, SF-00250 Helsinki,

More information

Note On Parikh slender context-free languages

Note On Parikh slender context-free languages Theoretical Computer Science 255 (2001) 667 677 www.elsevier.com/locate/tcs Note On Parikh slender context-free languages a; b; ; 1 Juha Honkala a Department of Mathematics, University of Turku, FIN-20014

More information

On Pattern Matching With Swaps

On Pattern Matching With Swaps On Pattern Matching With Swaps Fouad B. Chedid Dhofar University, Salalah, Oman Notre Dame University - Louaize, Lebanon P.O.Box: 2509, Postal Code 211 Salalah, Oman Tel: +968 23237200 Fax: +968 23237720

More information

Note Watson Crick D0L systems with regular triggers

Note Watson Crick D0L systems with regular triggers Theoretical Computer Science 259 (2001) 689 698 www.elsevier.com/locate/tcs Note Watson Crick D0L systems with regular triggers Juha Honkala a; ;1, Arto Salomaa b a Department of Mathematics, University

More information

On the Number of Distinct Squares

On the Number of Distinct Squares Frantisek (Franya) Franek Advanced Optimization Laboratory Department of Computing and Software McMaster University, Hamilton, Ontario, Canada Invited talk - Prague Stringology Conference 2014 Outline

More information

arxiv: v1 [cs.ds] 2 Dec 2009

arxiv: v1 [cs.ds] 2 Dec 2009 Variants of Constrained Longest Common Subsequence arxiv:0912.0368v1 [cs.ds] 2 Dec 2009 Paola Bonizzoni Gianluca Della Vedova Riccardo Dondi Yuri Pirola Abstract In this work, we consider a variant of

More information

Decision Problems Concerning. Prime Words and Languages of the

Decision Problems Concerning. Prime Words and Languages of the Decision Problems Concerning Prime Words and Languages of the PCP Marjo Lipponen Turku Centre for Computer Science TUCS Technical Report No 27 June 1996 ISBN 951-650-783-2 ISSN 1239-1891 Abstract This

More information

How many double squares can a string contain?

How many double squares can a string contain? How many double squares can a string contain? F. Franek, joint work with A. Deza and A. Thierry Algorithms Research Group Department of Computing and Software McMaster University, Hamilton, Ontario, Canada

More information

Chapter 1. Comparison-Sorting and Selecting in. Totally Monotone Matrices. totally monotone matrices can be found in [4], [5], [9],

Chapter 1. Comparison-Sorting and Selecting in. Totally Monotone Matrices. totally monotone matrices can be found in [4], [5], [9], Chapter 1 Comparison-Sorting and Selecting in Totally Monotone Matrices Noga Alon Yossi Azar y Abstract An mn matrix A is called totally monotone if for all i 1 < i 2 and j 1 < j 2, A[i 1; j 1] > A[i 1;

More information

Efficient Polynomial-Time Algorithms for Variants of the Multiple Constrained LCS Problem

Efficient Polynomial-Time Algorithms for Variants of the Multiple Constrained LCS Problem Efficient Polynomial-Time Algorithms for Variants of the Multiple Constrained LCS Problem Hsing-Yen Ann National Center for High-Performance Computing Tainan 74147, Taiwan Chang-Biau Yang and Chiou-Ting

More information

String Range Matching

String Range Matching String Range Matching Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi Department of Computer Science, University of Helsinki Helsinki, Finland firstname.lastname@cs.helsinki.fi Abstract. Given strings

More information

Minimal enumerations of subsets of a nite set and the middle level problem

Minimal enumerations of subsets of a nite set and the middle level problem Discrete Applied Mathematics 114 (2001) 109 114 Minimal enumerations of subsets of a nite set and the middle level problem A.A. Evdokimov, A.L. Perezhogin 1 Sobolev Institute of Mathematics, Novosibirsk

More information

The number of runs in Sturmian words

The number of runs in Sturmian words The number of runs in Sturmian words Paweł Baturo 1, Marcin Piatkowski 1, and Wojciech Rytter 2,1 1 Department of Mathematics and Computer Science, Copernicus University 2 Institute of Informatics, Warsaw

More information

Optimal Superprimitivity Testing for Strings

Optimal Superprimitivity Testing for Strings Purdue University Purdue e-pubs Computer Science Technical Reports Department of Computer Science 1990 Optimal Superprimitivity Testing for Strings Alberto Apostolico Martin Farach Costas S. Iliopoulos

More information

SUFFIX TREE. SYNONYMS Compact suffix trie

SUFFIX TREE. SYNONYMS Compact suffix trie SUFFIX TREE Maxime Crochemore King s College London and Université Paris-Est, http://www.dcs.kcl.ac.uk/staff/mac/ Thierry Lecroq Université de Rouen, http://monge.univ-mlv.fr/~lecroq SYNONYMS Compact suffix

More information

State Complexity of Neighbourhoods and Approximate Pattern Matching

State Complexity of Neighbourhoods and Approximate Pattern Matching State Complexity of Neighbourhoods and Approximate Pattern Matching Timothy Ng, David Rappaport, and Kai Salomaa School of Computing, Queen s University, Kingston, Ontario K7L 3N6, Canada {ng, daver, ksalomaa}@cs.queensu.ca

More information

Approximate Binary Search Algorithms for Mean Cuts and Cycles

Approximate Binary Search Algorithms for Mean Cuts and Cycles Approximate Binary Search Algorithms for Mean Cuts and Cycles S. Thomas McCormick Faculty of Commerce and Business Administration University of British Columbia Vancouver, BC V6T 1Z2 Canada June 1992,

More information

Counting and Verifying Maximal Palindromes

Counting and Verifying Maximal Palindromes Counting and Verifying Maximal Palindromes Tomohiro I 1, Shunsuke Inenaga 2, Hideo Bannai 1, and Masayuki Takeda 1 1 Department of Informatics, Kyushu University 2 Graduate School of Information Science

More information

An efficient alignment algorithm for masked sequences

An efficient alignment algorithm for masked sequences Theoretical Computer Science 370 (2007) 19 33 www.elsevier.com/locate/tcs An efficient alignment algorithm for masked sequences Jin Wook Kim, Kunsoo Park School of Computer Science and Engineering, Seoul

More information

Restricted b-matchings in degree-bounded graphs

Restricted b-matchings in degree-bounded graphs Egerváry Research Group on Combinatorial Optimization Technical reports TR-009-1. Published by the Egerváry Research Group, Pázmány P. sétány 1/C, H1117, Budapest, Hungary. Web site: www.cs.elte.hu/egres.

More information

More Dynamic Programming

More Dynamic Programming CS 374: Algorithms & Models of Computation, Spring 2017 More Dynamic Programming Lecture 14 March 9, 2017 Chandra Chekuri (UIUC) CS374 1 Spring 2017 1 / 42 What is the running time of the following? Consider

More information

A shrinking lemma for random forbidding context languages

A shrinking lemma for random forbidding context languages Theoretical Computer Science 237 (2000) 149 158 www.elsevier.com/locate/tcs A shrinking lemma for random forbidding context languages Andries van der Walt a, Sigrid Ewert b; a Department of Mathematics,

More information

More Dynamic Programming

More Dynamic Programming Algorithms & Models of Computation CS/ECE 374, Fall 2017 More Dynamic Programming Lecture 14 Tuesday, October 17, 2017 Sariel Har-Peled (UIUC) CS374 1 Fall 2017 1 / 48 What is the running time of the following?

More information

Shortest Common Superstrings and Scheduling. with Coordinated Starting Times. Institut fur Angewandte Informatik. und Formale Beschreibungsverfahren

Shortest Common Superstrings and Scheduling. with Coordinated Starting Times. Institut fur Angewandte Informatik. und Formale Beschreibungsverfahren Shortest Common Superstrings and Scheduling with Coordinated Starting Times Martin Middendorf Institut fur Angewandte Informatik und Formale Beschreibungsverfahren D-76128 Universitat Karlsruhe, Germany

More information

Text Searching. Thierry Lecroq Laboratoire d Informatique, du Traitement de l Information et des

Text Searching. Thierry Lecroq Laboratoire d Informatique, du Traitement de l Information et des Text Searching Thierry Lecroq Thierry.Lecroq@univ-rouen.fr Laboratoire d Informatique, du Traitement de l Information et des Systèmes. International PhD School in Formal Languages and Applications Tarragona,

More information

NP-hardness of the stable matrix in unit interval family problem in discrete time

NP-hardness of the stable matrix in unit interval family problem in discrete time Systems & Control Letters 42 21 261 265 www.elsevier.com/locate/sysconle NP-hardness of the stable matrix in unit interval family problem in discrete time Alejandra Mercado, K.J. Ray Liu Electrical and

More information

Theoretical Computer Science. Efficient algorithms to compute compressed longest common substrings and compressed palindromes

Theoretical Computer Science. Efficient algorithms to compute compressed longest common substrings and compressed palindromes Theoretical Computer Science 410 (2009) 900 913 Contents lists available at ScienceDirect Theoretical Computer Science journal homepage: www.elsevier.com/locate/tcs Efficient algorithms to compute compressed

More information

Lecture 14 - P v.s. NP 1

Lecture 14 - P v.s. NP 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) February 27, 2018 Lecture 14 - P v.s. NP 1 In this lecture we start Unit 3 on NP-hardness and approximation

More information

An on{line algorithm is presented for constructing the sux tree for a. the desirable property of processing the string symbol by symbol from left to

An on{line algorithm is presented for constructing the sux tree for a. the desirable property of processing the string symbol by symbol from left to (To appear in ALGORITHMICA) On{line construction of sux trees 1 Esko Ukkonen Department of Computer Science, University of Helsinki, P. O. Box 26 (Teollisuuskatu 23), FIN{00014 University of Helsinki,

More information

Finding a longest common subsequence between a run-length-encoded string and an uncompressed string

Finding a longest common subsequence between a run-length-encoded string and an uncompressed string Journal of Complexity 24 (2008) 173 184 www.elsevier.com/locate/jco Finding a longest common subsequence between a run-length-encoded string and an uncompressed string J.J. Liu a, Y.L. Wang b,, R.C.T.

More information

String Matching with Variable Length Gaps

String Matching with Variable Length Gaps String Matching with Variable Length Gaps Philip Bille, Inge Li Gørtz, Hjalte Wedel Vildhøj, and David Kofoed Wind Technical University of Denmark Abstract. We consider string matching with variable length

More information

Including Interval Encoding into Edit Distance Based Music Comparison and Retrieval

Including Interval Encoding into Edit Distance Based Music Comparison and Retrieval Including Interval Encoding into Edit Distance Based Music Comparison and Retrieval Kjell Lemström; Esko Ukkonen Department of Computer Science; P.O.Box 6, FIN-00014 University of Helsinki, Finland {klemstro,ukkonen}@cs.helsinki.fi

More information

On-line String Matching in Highly Similar DNA Sequences

On-line String Matching in Highly Similar DNA Sequences On-line String Matching in Highly Similar DNA Sequences Nadia Ben Nsira 1,2,ThierryLecroq 1,,MouradElloumi 2 1 LITIS EA 4108, Normastic FR3638, University of Rouen, France 2 LaTICE, University of Tunis

More information

The Complexity of Maximum. Matroid-Greedoid Intersection and. Weighted Greedoid Maximization

The Complexity of Maximum. Matroid-Greedoid Intersection and. Weighted Greedoid Maximization Department of Computer Science Series of Publications C Report C-2004-2 The Complexity of Maximum Matroid-Greedoid Intersection and Weighted Greedoid Maximization Taneli Mielikäinen Esko Ukkonen University

More information

Dynamic Programming. Shuang Zhao. Microsoft Research Asia September 5, Dynamic Programming. Shuang Zhao. Outline. Introduction.

Dynamic Programming. Shuang Zhao. Microsoft Research Asia September 5, Dynamic Programming. Shuang Zhao. Outline. Introduction. Microsoft Research Asia September 5, 2005 1 2 3 4 Section I What is? Definition is a technique for efficiently recurrence computing by storing partial results. In this slides, I will NOT use too many formal

More information

BOUNDS ON ZIMIN WORD AVOIDANCE

BOUNDS ON ZIMIN WORD AVOIDANCE BOUNDS ON ZIMIN WORD AVOIDANCE JOSHUA COOPER* AND DANNY RORABAUGH* Abstract. How long can a word be that avoids the unavoidable? Word W encounters word V provided there is a homomorphism φ defined by mapping

More information

THE MAXIMAL SUBGROUPS AND THE COMPLEXITY OF THE FLOW SEMIGROUP OF FINITE (DI)GRAPHS

THE MAXIMAL SUBGROUPS AND THE COMPLEXITY OF THE FLOW SEMIGROUP OF FINITE (DI)GRAPHS THE MAXIMAL SUBGROUPS AND THE COMPLEXITY OF THE FLOW SEMIGROUP OF FINITE (DI)GRAPHS GÁBOR HORVÁTH, CHRYSTOPHER L. NEHANIV, AND KÁROLY PODOSKI Dedicated to John Rhodes on the occasion of his 80th birthday.

More information

Independence of certain quantities indicating subword occurrences

Independence of certain quantities indicating subword occurrences Theoretical Computer Science 362 (2006) 222 231 wwwelseviercom/locate/tcs Independence of certain quantities indicating subword occurrences Arto Salomaa Turku Centre for Computer Science, Lemminkäisenkatu

More information

arxiv: v1 [cs.dm] 12 Jun 2016

arxiv: v1 [cs.dm] 12 Jun 2016 A Simple Extension of Dirac s Theorem on Hamiltonicity Yasemin Büyükçolak a,, Didem Gözüpek b, Sibel Özkana, Mordechai Shalom c,d,1 a Department of Mathematics, Gebze Technical University, Kocaeli, Turkey

More information

The Intractability of Computing the Hamming Distance

The Intractability of Computing the Hamming Distance The Intractability of Computing the Hamming Distance Bodo Manthey and Rüdiger Reischuk Universität zu Lübeck, Institut für Theoretische Informatik Wallstraße 40, 23560 Lübeck, Germany manthey/reischuk@tcs.uni-luebeck.de

More information

The limitedness problem on distance automata: Hashiguchi s method revisited

The limitedness problem on distance automata: Hashiguchi s method revisited Theoretical Computer Science 310 (2004) 147 158 www.elsevier.com/locate/tcs The limitedness problem on distance automata: Hashiguchi s method revisited Hing Leung, Viktor Podolskiy Department of Computer

More information

Outline. Approximation: Theory and Algorithms. Motivation. Outline. The String Edit Distance. Nikolaus Augsten. Unit 2 March 6, 2009

Outline. Approximation: Theory and Algorithms. Motivation. Outline. The String Edit Distance. Nikolaus Augsten. Unit 2 March 6, 2009 Outline Approximation: Theory and Algorithms The Nikolaus Augsten Free University of Bozen-Bolzano Faculty of Computer Science DIS Unit 2 March 6, 2009 1 Nikolaus Augsten (DIS) Approximation: Theory and

More information

arxiv: v1 [cs.cc] 4 Feb 2008

arxiv: v1 [cs.cc] 4 Feb 2008 On the complexity of finding gapped motifs Morris Michael a,1, François Nicolas a, and Esko Ukkonen a arxiv:0802.0314v1 [cs.cc] 4 Feb 2008 Abstract a Department of Computer Science, P. O. Box 68 (Gustaf

More information

On-line Bin-Stretching. Yossi Azar y Oded Regev z. Abstract. We are given a sequence of items that can be packed into m unit size bins.

On-line Bin-Stretching. Yossi Azar y Oded Regev z. Abstract. We are given a sequence of items that can be packed into m unit size bins. On-line Bin-Stretching Yossi Azar y Oded Regev z Abstract We are given a sequence of items that can be packed into m unit size bins. In the classical bin packing problem we x the size of the bins and try

More information

Algorithms for Three Versions of the Shortest Common Superstring Problem

Algorithms for Three Versions of the Shortest Common Superstring Problem Algorithms for Three Versions of the Shortest Common Superstring Problem Maxime Crochemore, Marek Cygan, Costas Iliopoulos, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, Tomasz Walen King s College

More information

Minimizing the size of an identifying or locating-dominating code in a graph is NP-hard

Minimizing the size of an identifying or locating-dominating code in a graph is NP-hard Theoretical Computer Science 290 (2003) 2109 2120 www.elsevier.com/locate/tcs Note Minimizing the size of an identifying or locating-dominating code in a graph is NP-hard Irene Charon, Olivier Hudry, Antoine

More information

Theoretical Computer Science

Theoretical Computer Science Theoretical Computer Science 410 (2009) 2759 2766 Contents lists available at ScienceDirect Theoretical Computer Science journal homepage: www.elsevier.com/locate/tcs Note Computing the longest topological

More information

Motif Extraction from Weighted Sequences

Motif Extraction from Weighted Sequences Motif Extraction from Weighted Sequences C. Iliopoulos 1, K. Perdikuri 2,3, E. Theodoridis 2,3,, A. Tsakalidis 2,3 and K. Tsichlas 1 1 Department of Computer Science, King s College London, London WC2R

More information

Outline. Similarity Search. Outline. Motivation. The String Edit Distance

Outline. Similarity Search. Outline. Motivation. The String Edit Distance Outline Similarity Search The Nikolaus Augsten nikolaus.augsten@sbg.ac.at Department of Computer Sciences University of Salzburg 1 http://dbresearch.uni-salzburg.at WS 2017/2018 Version March 12, 2018

More information

Approximation: Theory and Algorithms

Approximation: Theory and Algorithms Approximation: Theory and Algorithms The String Edit Distance Nikolaus Augsten Free University of Bozen-Bolzano Faculty of Computer Science DIS Unit 2 March 6, 2009 Nikolaus Augsten (DIS) Approximation:

More information

DD2446 Complexity Theory: Problem Set 4

DD2446 Complexity Theory: Problem Set 4 DD2446 Complexity Theory: Problem Set 4 Due: Friday November 8, 2013, at 23:59. Submit your solutions as a PDF le by e-mail to jakobn at kth dot se with the subject line Problem set 4: your full name.

More information

Lifting to non-integral idempotents

Lifting to non-integral idempotents Journal of Pure and Applied Algebra 162 (2001) 359 366 www.elsevier.com/locate/jpaa Lifting to non-integral idempotents Georey R. Robinson School of Mathematics and Statistics, University of Birmingham,

More information

On shredders and vertex connectivity augmentation

On shredders and vertex connectivity augmentation On shredders and vertex connectivity augmentation Gilad Liberman The Open University of Israel giladliberman@gmail.com Zeev Nutov The Open University of Israel nutov@openu.ac.il Abstract We consider the

More information

Analysis and Design of Algorithms Dynamic Programming

Analysis and Design of Algorithms Dynamic Programming Analysis and Design of Algorithms Dynamic Programming Lecture Notes by Dr. Wang, Rui Fall 2008 Department of Computer Science Ocean University of China November 6, 2009 Introduction 2 Introduction..................................................................

More information

Periodicity in Rectangular Arrays

Periodicity in Rectangular Arrays Periodicity in Rectangular Arrays arxiv:1602.06915v2 [cs.dm] 1 Jul 2016 Guilhem Gamard LIRMM CNRS, Univ. Montpellier UMR 5506, CC 477 161 rue Ada 34095 Montpellier Cedex 5 France guilhem.gamard@lirmm.fr

More information

Let S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely

Let S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely JOURNAL OF COMPUTATIONAL BIOLOGY Volume 8, Number 1, 2001 Mary Ann Liebert, Inc. Pp. 69 78 Perfect Phylogenetic Networks with Recombination LUSHENG WANG, 1 KAIZHONG ZHANG, 2 and LOUXIN ZHANG 3 ABSTRACT

More information

Theoretical Computer Science. Dynamic rank/select structures with applications to run-length encoded texts

Theoretical Computer Science. Dynamic rank/select structures with applications to run-length encoded texts Theoretical Computer Science 410 (2009) 4402 4413 Contents lists available at ScienceDirect Theoretical Computer Science journal homepage: www.elsevier.com/locate/tcs Dynamic rank/select structures with

More information

Similarity Search. The String Edit Distance. Nikolaus Augsten. Free University of Bozen-Bolzano Faculty of Computer Science DIS. Unit 2 March 8, 2012

Similarity Search. The String Edit Distance. Nikolaus Augsten. Free University of Bozen-Bolzano Faculty of Computer Science DIS. Unit 2 March 8, 2012 Similarity Search The String Edit Distance Nikolaus Augsten Free University of Bozen-Bolzano Faculty of Computer Science DIS Unit 2 March 8, 2012 Nikolaus Augsten (DIS) Similarity Search Unit 2 March 8,

More information

Constant-Space String-Matching. in Sublinear Average Time. (Extended Abstract) Wojciech Rytter z. Warsaw University. and. University of Liverpool

Constant-Space String-Matching. in Sublinear Average Time. (Extended Abstract) Wojciech Rytter z. Warsaw University. and. University of Liverpool Constant-Space String-Matching in Sublinear Average Tie (Extended Abstract) Maxie Crocheore Universite de Marne-la-Vallee Leszek Gasieniec y Max-Planck Institut fur Inforatik Wojciech Rytter z Warsaw University

More information

Theoretical Computer Science. Why greed works for shortest common superstring problem

Theoretical Computer Science. Why greed works for shortest common superstring problem Theoretical Computer Science 410 (2009) 5374 5381 Contents lists available at ScienceDirect Theoretical Computer Science journal homepage: www.elsevier.com/locate/tcs Why greed works for shortest common

More information

arxiv: v2 [cs.ds] 5 Mar 2014

arxiv: v2 [cs.ds] 5 Mar 2014 Order-preserving pattern matching with k mismatches Pawe l Gawrychowski 1 and Przemys law Uznański 2 1 Max-Planck-Institut für Informatik, Saarbrücken, Germany 2 LIF, CNRS and Aix-Marseille Université,

More information

It is well-known (cf. [2,4,5,9]) that the generating function P w() summed over all tableaux of shape = where the parts in row i are at most a i and a

It is well-known (cf. [2,4,5,9]) that the generating function P w() summed over all tableaux of shape = where the parts in row i are at most a i and a Counting tableaux with row and column bounds C. Krattenthalery S. G. Mohantyz Abstract. It is well-known that the generating function for tableaux of a given skew shape with r rows where the parts in the

More information

Computability and Complexity

Computability and Complexity Computability and Complexity Non-determinism, Regular Expressions CAS 705 Ryszard Janicki Department of Computing and Software McMaster University Hamilton, Ontario, Canada janicki@mcmaster.ca Ryszard

More information

Pairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55

Pairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55 Pairwise Alignment Guan-Shieng Huang shieng@ncnu.edu.tw Dept. of CSIE, NCNU Pairwise Alignment p.1/55 Approach 1. Problem definition 2. Computational method (algorithms) 3. Complexity and performance Pairwise

More information

Counting Palindromic Binary Strings Without r-runs of Ones

Counting Palindromic Binary Strings Without r-runs of Ones 1 3 47 6 3 11 Journal of Integer Sequences, Vol. 16 (013), Article 13.8.7 Counting Palindromic Binary Strings Without r-runs of Ones M. A. Nyblom School of Mathematics and Geospatial Science RMIT University

More information

arxiv: v1 [cs.dm] 18 May 2016

arxiv: v1 [cs.dm] 18 May 2016 A note on the shortest common superstring of NGS reads arxiv:1605.05542v1 [cs.dm] 18 May 2016 Tristan Braquelaire Marie Gasparoux Mathieu Raffinot Raluca Uricaru May 19, 2016 Abstract The Shortest Superstring

More information

Maximal Unbordered Factors of Random Strings arxiv: v1 [cs.ds] 14 Apr 2017

Maximal Unbordered Factors of Random Strings arxiv: v1 [cs.ds] 14 Apr 2017 Maximal Unbordered Factors of Random Strings arxiv:1704.04472v1 [cs.ds] 14 Apr 2017 Patrick Hagge Cording 1 and Mathias Bæk Tejs Knudsen 2 1 DTU Compute, Technical University of Denmark, phaco@dtu.dk 2

More information

arxiv: v1 [cs.ds] 15 Feb 2012

arxiv: v1 [cs.ds] 15 Feb 2012 Linear-Space Substring Range Counting over Polylogarithmic Alphabets Travis Gagie 1 and Pawe l Gawrychowski 2 1 Aalto University, Finland travis.gagie@aalto.fi 2 Max Planck Institute, Germany gawry@cs.uni.wroc.pl

More information

Similarity Search. The String Edit Distance. Nikolaus Augsten.

Similarity Search. The String Edit Distance. Nikolaus Augsten. Similarity Search The String Edit Distance Nikolaus Augsten nikolaus.augsten@sbg.ac.at Dept. of Computer Sciences University of Salzburg http://dbresearch.uni-salzburg.at Version October 18, 2016 Wintersemester

More information

Minimal and Maximal Critical Sets in Room Squares

Minimal and Maximal Critical Sets in Room Squares University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 1996 Minimal and Maximal Critical Sets in Room Squares Ghulam Rasool Chaudhry

More information

On the reconstruction of the degree sequence

On the reconstruction of the degree sequence Discrete Mathematics 259 (2002) 293 300 www.elsevier.com/locate/disc Note On the reconstruction of the degree sequence Charles Delorme a, Odile Favaron a, Dieter Rautenbach b;c; ;1 a LRI, Bât. 490, Universite

More information

On the Complexity of Budgeted Maximum Path Coverage on Trees

On the Complexity of Budgeted Maximum Path Coverage on Trees On the Complexity of Budgeted Maximum Path Coverage on Trees H.-C. Wirth An instance of the budgeted maximum coverage problem is given by a set of weighted ground elements and a cost weighted family of

More information

Greedy Conjecture for Strings of Length 4

Greedy Conjecture for Strings of Length 4 Greedy Conjecture for Strings of Length 4 Alexander S. Kulikov 1, Sergey Savinov 2, and Evgeniy Sluzhaev 1,2 1 St. Petersburg Department of Steklov Institute of Mathematics 2 St. Petersburg Academic University

More information

Upper and Lower Bounds on the Number of Faults. a System Can Withstand Without Repairs. Cambridge, MA 02139

Upper and Lower Bounds on the Number of Faults. a System Can Withstand Without Repairs. Cambridge, MA 02139 Upper and Lower Bounds on the Number of Faults a System Can Withstand Without Repairs Michel Goemans y Nancy Lynch z Isaac Saias x Laboratory for Computer Science Massachusetts Institute of Technology

More information