Approximate periods of strings
Theoretical Computer Science 262 (2001)

Approximate periods of strings

Jeong Seop Sim (a, 1), Costas S. Iliopoulos (b, c, 2), Kunsoo Park (a, 1), W.F. Smyth (c, d, 3)

a School of Computer Science and Engineering, Seoul National University, Seoul, South Korea
b Department of Computer Science, King's College London, London WC2R 2LS, UK
c School of Computing, Curtin University, Perth WA 6825, Australia
d Department of Computing & Software, McMaster University, Hamilton, Ontario, Canada L8S 4K1

Received 27 December 1999; revised 27 June 2000; accepted 7 August 2000
Communicated by M. Crochemore

E-mail addresses: jssim@theory.snu.ac.kr (J.S. Sim), csi@dcs.kcl.ac.uk (C.S. Iliopoulos), kpark@theory.snu.ac.kr (K. Park), smyth@mcmaster.ca (W.F. Smyth).
1 Supported by a KOSEF grant and the Brain Korea 21 Project.
2 Supported in part by the CCSLAAR Royal Society Research Grant.
3 Supported by an NSERC grant.

Abstract

The study of approximately periodic strings is relevant to diverse applications such as molecular biology, data compression, and computer-assisted music analysis. Here we study different forms of approximate periodicity under a variety of distance functions. We consider three related problems, for two of which we derive polynomial-time algorithms; we then show that the third problem is NP-complete. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Periodicity; Approximate periods; Repetitions; Distance function

1. Introduction

Repetitive or periodic strings have been studied in such diverse fields as molecular biology, data compression, and computer-assisted music analysis. In response to requirements arising out of a variety of applications, interest has arisen in algorithms for finding regularities in strings; that is, periodicities of an approximate nature. Some important regularities that have been studied in the literature are the following:

Periods: A string p is called a period of a string x if x can be written as $x = p^k p'$, where $k \ge 1$ and $p'$ is a prefix of p. The shortest period of x is called the period of x. For example, if x = abcabcab, then abc, abcabc, and x itself are periods of x, while abc is the period of x. If x has a period p such that $|p| \le |x|/2$, then x is said to be periodic. Further, if setting $x = p^k$ implies $k = 1$, x is said to be primitive; if $k \ge 2$, $p^k$ is called a repetition.

Covers: A string w is called a cover of x if x can be constructed by concatenations and superpositions of w. For example, if x = ababaaba, then aba and x are the covers of x. If x has a cover $w \ne x$, x is said to be quasiperiodic; otherwise, x is superprimitive.

Seeds: A substring w of x is called a seed of x if it is a cover of any superstring of x. For example, aba and ababa are some seeds of x = ababaab.

Repetitions: A repetition is an immediately repeated nonempty string. For example, if x = aababab, then aa and ababab are repetitions in x; in particular, $a^2 = aa$ is called a square or tandem repeat and $(ab)^3 = ababab$ is called a cube.

The notions cover and seed are generalizations of periods in the sense that superpositions as well as concatenations are used to define them. A significant amount of research has been done on each of these four notions:

Periods: The preprocessing of the Knuth-Morris-Pratt algorithm [21] finds all periods of x in linear time (in fact, all periods of every prefix of x). In parallel computation, Apostolico et al. [2] gave an optimal $O(\log \log n)$ time algorithm for finding all periods, where n is the length of x.

Covers: Apostolico et al. [4] introduced the notion of covers and described a linear-time algorithm to test whether x is superprimitive or not (see also [7, 8, 17]). Moore and Smyth [28] and recently Li and Smyth [24] gave linear-time algorithms for finding all covers of x. In parallel computation, Iliopoulos and Park [18] obtained an optimal $O(\log \log n)$ time algorithm for finding all covers of x. Apostolico and Ehrenfeucht [3] and Iliopoulos and Mouchard [16] considered the problem of finding maximal quasiperiodic substrings of x.
A two-dimensional variant of the covering problem was studied in [11, 14], and minimum covering by substrings of a given length in [19].

Seeds: Iliopoulos et al. [15] introduced the notion of seeds and gave an $O(n \log n)$ time algorithm for computing all seeds of x. For the same problem, Berkman et al. [6] presented a parallel algorithm that requires $O(\log n)$ time and $O(n \log n)$ work.

Repetitions: There are several $O(n \log n)$ time algorithms for finding all the repetitions in a string [10, 5, 26]. In parallel computation, Apostolico and Breslauer [1] gave an optimal $O(\log \log n)$ time algorithm (i.e., total work is $O(n \log n)$) for finding all the repetitions.

A natural extension of the repetition problems is to allow errors. Approximate repetitions are common in applications such as molecular biology and computer-assisted music analysis [9, 12]. Among the four notions above, only approximate repetitions have been studied. If $x = uww'v$ where $w$ and $w'$ are similar, $ww'$ is called an approximate square or approximate tandem repeat. When there is a nonempty string y between $w$ and $w'$, we say that $w$ and $w'$ are an approximate nontandem repeat. In [23], Landau and Schmidt gave an $O(kn \log k \log n)$ time algorithm for finding approximate squares whose edit distance is at most k in a text of length n. Schmidt also
gave an $O(n^2 \log n)$ algorithm for finding approximate tandem or nontandem repeats in [30] which uses an arbitrary score for similarity of repeated strings.

In this paper, we introduce the notion of approximate periods, which is an approximate version of periods. Here we study different forms of approximate periodicity under a variety of distance functions. We consider three related problems, for two of which we derive polynomial-time algorithms; we then show that the third problem is NP-complete.

This paper is organized as follows. In Section 2, we describe some notations and definitions used in this paper. In Section 3, we define approximate periods and three related problems studied in this paper. Then we present polynomial-time algorithms for two problems and the proof of NP-completeness of the third problem in Section 4. We conclude in Section 5.

2. Preliminaries

A string is a sequence of zero or more characters from an alphabet $\Sigma$. The set of all strings over the alphabet $\Sigma$ is denoted by $\Sigma^*$ and the set of all strings of length m over $\Sigma$ is denoted by $\Sigma^m$. The empty string is denoted by $\varepsilon$. The length of a string x is denoted by $|x|$ and the ith character of x by $x[i]$. When a string w is $x[i]\,x[i+1] \cdots x[j]$, we denote w by $x[i..j]$ and w is called a substring (or a factor [13]) of x. Conversely, x is called a superstring of w. For example, bcd is a substring of abcde, and abcde is a superstring of bcd. A blank space is denoted by $\Delta$, and we regard it as a character for convenience.

A string w is a prefix of x if $x = wu$ for $u \in \Sigma^*$. Similarly, w is a suffix of x if $x = uw$ for $u \in \Sigma^*$. A string w is a subsequence (also called a subword [13]) of x (or x is a supersequence of w) if w is obtained by deleting zero or more characters (at any positions) from x. For example, ace is a subsequence of aabcdef. For a given set S of strings, a string s is called a common supersequence of S if s is a supersequence of every string in S.
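To make the exact notions concrete, the following is a minimal Python sketch of the exact periods from Section 1 and of the subsequence and common-supersequence tests defined above. The function names (`periods`, `is_subsequence`, `is_common_supersequence`) are illustrative choices, not names from the paper, and the period computation uses the standard border (failure) array of Knuth-Morris-Pratt preprocessing.

```python
def periods(x):
    """Lengths of all periods of x, shortest first, via the KMP border array."""
    n = len(x)
    fail = [0] * n          # fail[i] = length of the longest border of x[:i+1]
    k = 0
    for i in range(1, n):
        while k and x[i] != x[k]:
            k = fail[k - 1]
        if x[i] == x[k]:
            k += 1
        fail[i] = k
    out = []
    k = fail[-1] if n else 0
    while k > 0:            # every border of length k yields a period of length n - k
        out.append(n - k)
        k = fail[k - 1]
    if n:
        out.append(n)       # x itself is always a period of x
    return out


def is_subsequence(w, x):
    """True if w can be obtained from x by deleting zero or more characters."""
    it = iter(x)
    return all(ch in it for ch in w)   # 'in' consumes the iterator left to right


def is_common_supersequence(s, strings):
    """True if s is a supersequence of every string in the given collection."""
    return all(is_subsequence(w, s) for w in strings)


print(periods("abcabcab"))                   # period lengths of the running example
print(is_subsequence("ace", "aabcdef"))
```

For x = abcabcab the period lengths 3, 6 and 8 correspond to the periods abc, abcabc and x itself named in the Introduction.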
The distance $\delta(x, y)$ between two strings x and y is the minimum cost to convert one string x to the other string y. There are several well-known distance functions. The edit distance between two strings x and y is the minimum number of edit operations to convert x to y. The edit operations are the insertion of a character into x, the deletion of a character from x, and the change (or substitution) of a character in x with a character in y. Note that since we define the edit distance by the number of edit operations, the cost of each edit operation is 1. The Hamming distance between x and y is the smallest number of change operations to convert x to y. Note that the Hamming distance can be defined only when $|x| = |y|$ because it does not allow insertions and deletions.

The edit distance can be generalized by using a penalty matrix. A penalty matrix specifies the substitution cost for each pair of characters and the insertion/deletion cost for each character. The weighted edit distance between x and y is the minimum cost to convert x to y using a penalty matrix. A penalty matrix is a metric when it satisfies
the following conditions for all $a, b, c \in \Sigma \cup \{\Delta\}$: $\delta(a, b) \ge 0$, $\delta(a, b) = \delta(b, a)$, $\delta(a, a) = 0$, and $\delta(a, c) \le \delta(a, b) + \delta(b, c)$ (triangle inequality).

An alignment of a set S of strings can be represented by a two-dimensional matrix, where each row is a string in S and all rows have the same length. The equality of lengths of all the rows can be obtained by inserting zero or more spaces into each of the strings in S. For example, when S = {abcae, bcd, abde}, an alignment of S is shown in Fig. 1.

Fig. 1. An alignment.

We say that a distance function $\delta(x, y)$ is a relative distance function if the lengths of strings x and y are considered in the value of $\delta(x, y)$; otherwise, it is an absolute distance function. The Hamming distance and the edit distance are examples of absolute distance functions. There are two ways to define a relative distance between x and y:

First, we can fix one of the two strings and define a relative distance function with respect to the fixed string. The error ratio with respect to x is defined to be $d/|x|$, where d is an absolute distance between x and y.

Second, we can define a relative distance function symmetrically. The symmetric error ratio is defined to be $d/l$, where d is an absolute distance between x and y, and $l = (|x| + |y|)/2$ [31]. Note that we may take $l = |x| + |y|$ (then everything is the same except that the ratio is multiplied by 2).

If d is the edit distance between x and y, the error ratio with respect to x or the symmetric error ratio is called a relative edit distance. The weighted edit distance can also be used as a relative distance function because the penalty matrix can contain arbitrary costs.

3. Problem definitions

Given two strings x, p and a distance function $\delta$, we define approximate periods as follows.
If there exists a partition of x into disjoint blocks of substrings, i.e., $x = p_1 p_2 \cdots p_r$ ($p_i \ne \varepsilon$) such that $\delta(p, p_i) \le t$ for $1 \le i < r$, and $\delta(p', p_r) \le t$ where $p'$ is some prefix of p, we say that p is a t-approximate period of x (or p is an approximate period of x with distance t). Each $p_i$, $1 \le i \le r$, will be called a partition block of x. The last partition block $p_r$ can be matched to any prefix $p'$ of p as in the definition of exact periods. If $p'$ is shorter than p, the last partition block will be
called short. Note that there can be several versions of approximate periods according to the definition of the distance function $\delta$. We consider the following problems related to approximate periods.

Problem 1. Given x, p, and any distance function $\delta$, find the minimum t such that p is a t-approximate period of x.

Since p is fixed in this case, it makes no difference whether $\delta$ is an absolute distance function or an error ratio with respect to p. If a threshold $k \le |p|$ on the edit distance is given as input in Problem 1, the problem asks whether p is a k-approximate period of x or not. Note that if the edit distance is used for $\delta$, it is trivially true that p is a $|p|$-approximate period of x.

Problem 2. Given a string x and a relative distance function $\delta$, find a substring p of x that is an approximate period of x with the minimum distance.

Since the length of p is not (a priori) fixed in this problem, we need to use a relative distance function for $\delta$ (i.e., an error ratio or a weighted edit distance) rather than an absolute distance function. For example, if the absolute edit distance is used, every substring of x of length 1 is a 1-approximate period of x. Moreover, we assume that there are at least two partition blocks of x except possibly the short partition block, because otherwise the longest proper prefix of x (or any long prefix of x) can easily become an approximate period of x with a small distance. This assumption will be applied to Problem 3, too.

Problem 3. Given a string x and a relative distance function $\delta$, find a string p that is an approximate period of x with the minimum distance.

This problem is harder than Problem 2 because p can be any string, not necessarily a substring of x.

4. Algorithms and NP-completeness

Basically, we will use weighted edit distances with metric penalty matrices for the distance function $\delta$ in each problem.
Recall that a penalty matrix defines the substitution cost for each pair of characters and the insertion or deletion cost for each character.

4.1. Problem 1

Our algorithm for Problem 1 consists of two steps. Let $n = |x|$ and $m = |p|$.

(1) Let $w_{ij}$ be the distance between p and $x[i..j]$, $1 \le i \le j < n$, and $w_{in}$ be the minimum value of the distances between all prefixes of p and $x[i..n]$. Note that $x[i..n]$ can be matched to any prefix of p by the definition of approximate periods. Compute $w_{ij}$ for all $1 \le i \le j \le n$.
(2) Compute the minimum t such that p is a t-approximate period of x. We use dynamic programming to compute t as follows. Let $t_i$ be the minimum value such that p is a $t_i$-approximate period of $x[1..i]$. Let $t_0 = 0$. For $i = 1$ to $n$, we compute $t_i$ by the following formula:

$$t_i = \min_{0 \le h < i} \bigl( \max(t_h, w_{h+1,i}) \bigr).$$

The value $t_n$ is the minimum t such that p is a t-approximate period of x.

To compute the distances in step 1, we use a dynamic programming table called the D table. To compute the distance between two strings x and y, a D table of size $(|x| + 1) \times (|y| + 1)$ is used. Each entry $D[i, j]$, $0 \le i \le |x|$ and $0 \le j \le |y|$, stores the minimum cost of transforming $x[1..i]$ to $y[1..j]$. Initially, $D[0, 0] = 0$, $D[i, 0] = D[i-1, 0] + \delta(x[i], \Delta)$, and $D[0, j] = D[0, j-1] + \delta(\Delta, y[j])$. Then we can compute all the entries of the D table in $O(|x|\,|y|)$ time by the following recurrence:

$$D[i, j] = \min \begin{cases} D[i-1, j] + \delta(x[i], \Delta), \\ D[i, j-1] + \delta(\Delta, y[j]), \\ D[i-1, j-1] + \delta(x[i], y[j]). \end{cases}$$

Note that $\Delta$ is a blank space, and thus $\delta(a, \Delta)$ means the deletion cost of a and $\delta(\Delta, a)$ means the insertion cost of a.

We now consider the time complexity of the above algorithm for Problem 1 under various distance functions. For a weighted edit distance, we make a D table of size $(m + 1) \times (n - i + 2)$ for each position i of x. Since the distances between all prefixes of p and $x[i..n]$ appear in the last column of the D table, we can get the minimum value of the last column as $w_{in}$ without increasing the time complexity. Hence, step 1 takes $O(mn^2)$ time. In step 2, we can compute the minimum t in $O(n^2)$ time since we compare $O(n)$ values for each $t_i$. Therefore, the total time complexity is $O(mn^2)$.

When the edit distance is used for $\delta$, this algorithm for Problem 1 can be improved. In this case, $\delta(a, b)$ is always 1 if $a \ne b$; $\delta(a, b) = 0$, otherwise.
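The two steps of the $O(mn^2)$ algorithm described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `d_table`, `unit_cost` and `min_approx_period_distance` are hypothetical, the blank symbol is represented by `None`, and the default cost function is the unit-cost edit distance (any metric penalty function with the same signature could be passed instead). Following the remark that $w_{in}$ is the minimum of the last column of the D table, the last block may be matched against any prefix of p, including the empty one.

```python
from math import inf


def unit_cost(a, b):
    """Unit-cost penalty: 0 for equal symbols, 1 otherwise. None is the blank."""
    return 0 if a == b else 1


def d_table(p, s, cost):
    """D[k][j] = minimum cost of transforming p[:k] into s[:j]."""
    m, n = len(p), len(s)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for k in range(1, m + 1):
        D[k][0] = D[k - 1][0] + cost(p[k - 1], None)      # delete p[k-1]
    for j in range(1, n + 1):
        D[0][j] = D[0][j - 1] + cost(None, s[j - 1])      # insert s[j-1]
    for k in range(1, m + 1):
        for j in range(1, n + 1):
            D[k][j] = min(
                D[k - 1][j] + cost(p[k - 1], None),
                D[k][j - 1] + cost(None, s[j - 1]),
                D[k - 1][j - 1] + cost(p[k - 1], s[j - 1]),
            )
    return D


def min_approx_period_distance(x, p, cost=unit_cost):
    """Minimum t such that p is a t-approximate period of x (Problem 1)."""
    n, m = len(x), len(p)
    # Step 1: w[i][j] is the distance for the block x[i:j] (0-based, j exclusive).
    w = [[inf] * (n + 1) for _ in range(n)]
    for i in range(n):
        D = d_table(p, x[i:], cost)
        for j in range(i + 1, n):
            w[i][j] = D[m][j - i]                          # block matched to all of p
        w[i][n] = min(D[k][n - i] for k in range(m + 1))   # last block: any prefix of p
    # Step 2: t[i] = min over h < i of max(t[h], w[h][i]).
    t = [inf] * (n + 1)
    t[0] = 0
    for i in range(1, n + 1):
        t[i] = min(max(t[h], w[h][i]) for h in range(i))
    return t[n]


print(min_approx_period_distance("abcabdab", "abc"))
```

In the example, the partition abc | abd | ab has one substitution in the second block and a last block equal to the prefix ab of p, so the reported distance is 1.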
Now it is not necessary to compute the edit distances between p and the substrings of x whose lengths are larger than $2m$ because their edit distances with p will exceed m. Step 1 now takes $O(m^2 n)$ time since we make a D table of size $(m + 1) \times (2m + 1)$ for each position of x. Also, step 2 can be done in $O(mn)$ time since we compare $O(m)$ values at each position of x. Thus the time complexity is reduced to $O(m^2 n)$.

However, we can do better. Step 1 can be solved in $O(mn)$ time by the algorithm due to Landau et al. [22]. Given two strings x and y and a forward (resp. backward) solution for the comparison between x and y, the algorithm in [22] incrementally computes a solution for x and $by$ (resp. $yb$) in $O(k)$ time, where b is an additional character and k is a threshold on the edit distance. This can be done due to the relationship between the solution for x and y and the solution for x and $by$. When $k = m$ (i.e., the threshold is not given), we can compute all the edit distances between p and every substring of x whose length is at most $2m$ in $O(mn)$ time using this algorithm. Recently, Kim and Park [20] gave a simpler $O(mn)$-time algorithm for the same problem. Therefore,
we can solve Problem 1 in $O(mn)$ time if the edit distance is used for $\delta$. When the threshold k on the edit distance is given as input for Problem 1, it can be solved in $O(kn)$ time because each step of the above algorithm takes $O(kn)$ time.

If we use the Hamming distance for $\delta$, the algorithm takes trivially $O(n)$ time since x must be partitioned into blocks of size m. (The last partition block can be shorter than m.) Therefore, we have the following theorem.

Theorem 1. Problem 1 can be solved in $O(mn^2)$ time when a weighted edit distance is used for $\delta$. If the edit distance (resp. the Hamming distance) is used for $\delta$, it can be solved in $O(mn)$ time (resp. in $O(n)$ time).

4.2. Problem 2

When a relative edit distance is used for $\delta$, Problem 2 can be solved in $O(n^4)$ time by our algorithm for Problem 1. Let p be a candidate string for the approximate period of x. If we take each substring of x as p and apply the $O(mn)$ algorithm for Problem 1 (that uses the algorithm in [22]), it takes $O(|p|\,n)$ time for each p. Since there are $O(n^2)$ substrings of x, the overall time is $O(n^4)$.

Without using the somewhat complicated algorithm in [22], however, we can solve Problem 2 in $O(n^4)$ time by the following simple algorithm for weighted edit distances as well as relative edit distances. We take every substring of x as a candidate string p, and compute the minimum distance t such that p is a t-approximate period of x. But, we do this for all substrings of x that start at a position i at the same time, as follows.

Let T be the minimum distance so far. Initially, $T = \infty$. For $i = 1$ to $n$, we do the following. For each i, we process the $n - i + 1$ substrings that start at position i as candidate strings. Let m be the length of a chosen substring of x as p. Initially, $m = 1$.

(1) Take $x[i..i+m-1]$ as p and compute $w_{hj}$ for all $1 \le h \le j \le n$. The definition of $w_{hj}$ is the same as in Problem 1.
This computation can be done by making n D tables with p and each of the n suffixes of x. By adding just one row to each of the previous D tables (i.e., the n D tables when $p = x[i..i+m-2]$), we can compute these new D tables in $O(n^2)$ time. See Fig. 2. (Note that when $m = 1$, we create new D tables.)

(2) Compute the minimum distance t such that p is a t-approximate period of x. This step is similar to the second step of the algorithm for Problem 1. Let $t_j$ be the minimum value such that p is a $t_j$-approximate period of $x[1..j]$ and let $t_0 = 0$. For $j = 1$ to $n$, we compute $t_j$ by the following formula:

$$t_j = \min_{0 \le h < j} \bigl( \max(t_h, w_{h+1,j}) \bigr).$$

The value $t_n$ is the minimum t such that p is a t-approximate period of x. If $t_n$ is smaller than T, we update T with $t_n$. If $m < n - i + 1$, increase m by 1 and go to step 1.
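The candidate enumeration above can be illustrated with the following Python sketch. It is a simplification under several assumptions, all of them mine rather than the paper's: the relative distance is taken to be the unit-cost edit distance divided by $|p|$ for every block (including the last one), the requirement of at least two partition blocks besides a possible short block is read as "at least two blocks matched against the whole of p", and the D tables are recomputed from scratch for every candidate, so the running time is worse than the $O(n^4)$ bound obtained by the row-by-row reuse described in step 1. The names `best_t` and `best_substring_period` are hypothetical.

```python
from math import inf


def d_table(p, s):
    """Unit-cost edit distance table: D[k][j] = distance between p[:k] and s[:j]."""
    m, n = len(p), len(s)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for k in range(1, m + 1):
        D[k][0] = k
    for j in range(1, n + 1):
        D[0][j] = j
    for k in range(1, m + 1):
        for j in range(1, n + 1):
            D[k][j] = min(D[k - 1][j] + 1, D[k][j - 1] + 1,
                          D[k - 1][j - 1] + (p[k - 1] != s[j - 1]))
    return D


def best_t(x, p):
    """Minimum t for candidate p, requiring at least two blocks matched to all of p."""
    n, m = len(x), len(p)
    full = [[inf] * (n + 1) for _ in range(n)]   # full[i][j]: p versus x[i:j]
    pre = [inf] * n                              # pre[i]: best prefix of p versus x[i:]
    for i in range(n):
        D = d_table(p, x[i:])
        for j in range(i + 1, n + 1):
            full[i][j] = D[m][j - i]
        pre[i] = min(D[k][n - i] for k in range(m + 1))
    # t[c][j]: best cost of splitting x[:j] into full blocks, c = min(#blocks, 2).
    t = [[inf] * (n + 1) for _ in range(3)]
    t[0][0] = 0
    for j in range(1, n + 1):
        for h in range(j):
            for c in range(3):
                if t[c][h] < inf:
                    c2 = min(c + 1, 2)
                    t[c2][j] = min(t[c2][j], max(t[c][h], full[h][j]))
    best = t[2][n]                               # no short block at the end
    for h in range(1, n):                        # last block x[h:] matched to a prefix
        best = min(best, max(t[2][h], pre[h]))
    return best


def best_substring_period(x):
    """Return (error ratio, p) minimising t / |p| over all substrings p of x."""
    best, seen = (inf, None), set()
    for i in range(len(x)):
        for j in range(i + 1, len(x) + 1):
            p = x[i:j]
            if p not in seen:
                seen.add(p)
                ratio = best_t(x, p) / len(p)
                if ratio < best[0]:
                    best = (ratio, p)
    return best


print(best_substring_period("abcabcab"))
```

For the exactly periodic string abcabcab the only candidate with ratio 0 is abc, since every other substring either fails to tile x exactly or allows fewer than two full blocks.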
Fig. 2. Computing new D tables.

When all the steps are completed, the final value of T is the minimum distance and the substring p that is a T-approximate period of x is an answer to Problem 2.

Theorem 2. Problem 2 can be solved in $O(n^4)$ time when a weighted edit distance or a relative edit distance is used for $\delta$. When a relative Hamming distance is used for $\delta$, Problem 2 can be solved in $O(n^3)$ time.

Proof. For a weighted edit distance, we make n D tables in $O(n^2)$ time in step 1 and compute the minimum distance in $O(n^2)$ time in step 2. For $m = 1$ to $n - i + 1$, we repeat the two steps. Therefore, it takes $O(n^3)$ time for each i and the total time complexity of this algorithm is $O(n^4)$. If a relative edit distance is used, this algorithm can be slightly simplified as in Problem 1, but it still takes time $O(n^4)$. For a relative Hamming distance, it takes $O(n)$ time for each candidate string and there are $O(n^2)$ candidate strings. Thus, the total time complexity is $O(n^3)$.

4.3. Problem 3

Given a set of strings, the shortest common supersequence (SCS) problem is to find a shortest common supersequence of all strings in the set. The SCS problem is NP-complete [25, 29]. We will show that Problem 3 is NP-complete by a reduction from the SCS problem. In this section we will call Problem 3 the AP problem (abbreviation of the approximate period problem). The decision versions of the SCS and AP problems are as follows:

Definition 1. Given a positive integer m and a finite set S of strings from $\Sigma^*$ where $\Sigma$ is a finite alphabet, the SCS problem is to decide if there exists a common supersequence w of S such that $|w| \le m$.

Definition 2. Given a number t, a string x from $(\Sigma')^*$ where $\Sigma'$ is a finite alphabet, and a penalty matrix, the AP problem is to decide if there exists a string u such that u is a t-approximate period of x.
Fig. 3. The penalty matrix M.

Now we transform an instance of the SCS problem to an instance of the AP problem. We can assume that $\Sigma = \{0, 1\}$ since the SCS problem is NP-complete even if $\Sigma = \{0, 1\}$ [27, 29]. Assume that there are n strings $s_1, \ldots, s_n$ in S. First, we set $\Sigma' = \Sigma \cup \{a, b, \#, \$, *_1, *_2, \Delta\}$. Let $x = \# *_1^m \$ \# *_2^m \$ \# s_1 \$ \# s_2 \$ \cdots \# s_n \$$. Then, set $t = m$ and define the penalty matrix as in Fig. 3, where a shaded entry can be any value greater than m which makes the penalty matrix a metric. For example, every shaded entry can be $m + 1$. It is easy to see that this transformation can be done in polynomial time.

Lemma 1. Assume that x is constructed as above. If u is an m-approximate period of x, then u is of the form $\#\beta\$$ where $\beta \in \{a, b\}^m$.

Proof. We first show that u must have one $\#$ and one $\$$.

(1) Suppose that u has no $\#$ (resp. $\$$). Clearly, there exists a partition block of x which has at least one $\#$ (resp. $\$$), and the distance between u and the partition block is greater than m. Therefore, u must have at least one $\#$ and at least one $\$$.

(2) Suppose that u has more than one $\#$ (or $\$$). Assume that u has two $\#$'s. (The other cases are similar.) Then u must also have two $\$$'s, because unless the number of $\#$'s equals that of $\$$'s in u, at least one partition block of x cannot have the same numbers of $\#$'s and $\$$'s as u. If the numbers of $\#$'s and $\$$'s in u differ from those of a partition block of x, the distance between u and the partition block exceeds m due to the penalty matrix M. Consider the first partition block of x. The first partition block is $\# *_1^m \$ \# *_2^m \$$ because it must have two $\#$'s and two $\$$'s as u does. For the distance between u and the first partition block of x to be at most m, u must have at least m characters from $\{*_1, *_2\}$. In such cases, however, the distance between u and at least one partition block of x will exceed m.

It remains to show that $u = \#\beta\$$ such that $\beta \in \{a, b\}^m$.
Since u has one $\#$ and one $\$$, x must be partitioned just after every occurrence of $\$$. Let u be of the form $\alpha\#\beta\$\gamma$, where $\alpha, \beta, \gamma \in \{0, 1, a, b, *_1, *_2, \Delta\}^*$. Consider the first two partition blocks $\# *_1^m \$$
and $\# *_2^m \$$ of x. If $\beta$ contains i $*_1$'s for $i \ge 1$, $\beta$ must also have i $*_2$'s and the remaining $m - 2i$ characters in $\beta$ must be from $\{a, b\}$ so that the distances between u and the first two partition blocks of x do not exceed m. However, this makes the distance between u and any other partition block of x exceed m due to the $*_1$'s and $*_2$'s in $\beta$. Hence $\beta$ cannot have $*_1$ or $*_2$. Also, $\beta$ cannot have any character from $\{0, 1, \Delta\}$ since 0, 1 and $\Delta$ have cost 2 with $*_1$ and $*_2$ in the first two partition blocks of x. For the distances between u and the first two partition blocks of x to be at most m, $\alpha$ and $\gamma$ must be empty and $\beta$ must be of the form $\{a, b\}^m$. See Fig. 4.

Fig. 4. An alignment of $S \cup \{u\}$.

Theorem 3. The AP problem is NP-complete.

Proof. It is easy to see that the AP problem is in NP. To show that the AP problem is NP-complete, we need to show that S has a common supersequence w such that $|w| \le m$ if and only if there exists a string u such that u is an m-approximate period of x.

(if) By Lemma 1, $u = \#\beta\$$ where $\beta \in \{a, b\}^m$. Since u is an m-approximate period of x, the distance between u and each partition block $\# s_i \$$ is at most m. (The distances between u and the first two partition blocks $\# *_1^m \$$ and $\# *_2^m \$$ are always m.) Consider an alignment of $S \cup \{u\}$. Since $|\beta| = m$ and the distance between $\beta$ and $s_i$ is at most m, each a (resp. b) in $\beta$ must be aligned with 0 (resp. 1) or $\Delta$ in $s_i$. (See cases (a) and (b) in Fig. 4.) If we substitute 0 for a and 1 for b in $\beta$, we obtain a common supersequence w of $s_1, \ldots, s_n$ such that $|w| = m$. (Note that if a or b in $\beta$ is aligned with $\Delta$ for all $s_i$, we can delete the character in $\beta$ and obtain a common supersequence which is shorter than m. See case (c) in Fig. 4.) A similar alignment was used by Wang and Jiang [32].
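The string manipulations in this reduction can be illustrated with a short Python sketch. It covers only the construction of x, its partition after every \$ sign, and the character substitutions between the {a, b} string inside u and a binary common supersequence (the substitution above and its inverse, which the (only if) direction uses); it does not model the penalty matrix of Fig. 3. The two filler symbols of the first two blocks are represented by the letters `x` and `y`, which are arbitrary stand-ins chosen here, and all function names are hypothetical.

```python
STAR1, STAR2 = "x", "y"   # arbitrary stand-ins for the two filler symbols


def build_x(strings, m):
    """Two filler blocks of length m, then one block '#s_i$' per input string."""
    blocks = ["#" + STAR1 * m + "$", "#" + STAR2 * m + "$"]
    blocks += ["#" + s + "$" for s in strings]
    return "".join(blocks)


def partition_blocks(x):
    """Partition x just after every occurrence of '$'."""
    return [block + "$" for block in x.split("$") if block]


def is_subsequence(w, x):
    """True if w can be obtained from x by deleting characters."""
    it = iter(x)
    return all(ch in it for ch in w)


def to_supersequence(beta):
    """(if) direction: substitute 0 for a and 1 for b."""
    return beta.translate(str.maketrans("ab", "01"))


def to_period_candidate(w, m):
    """(only if) direction: substitute a for 0, b for 1, pad to length m."""
    beta = w.translate(str.maketrans("01", "ab"))
    return "#" + beta + "a" * (m - len(beta)) + "$"


S, m = ["01", "10"], 3
x = build_x(S, m)
print(partition_blocks(x))
w = to_supersequence("aba")
print(w, all(is_subsequence(s, w) for s in S))
```

With S = {01, 10} and m = 3, the string aba maps to 010, which is a common supersequence of both strings in S.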
(only if) Let w be a common supersequence of S such that $|w| \le m$. Let $\beta$ be the string constructed by substituting a for 0 and b for 1 in w. (When $|w| < m$, we append some characters from $\{a, b\}$ to $\beta$ so that $|\beta| = m$.) Partition x just after every occurrence of $\$$. The distance between each partition block of x and $\#\beta\$$ is m since each a (resp. b) in $\beta$ can be aligned with 0 (resp. 1), $\Delta$, $*_1$, or $*_2$ in each partition block. Therefore, $\#\beta\$$ is an m-approximate period of x.

5. Conclusion

In this paper, we have introduced the notion of approximate periods and three related problems. We presented polynomial-time algorithms for two of the problems and an NP-completeness proof for the third problem. There can be a variety of problems on approximate periods. For example, the following problem is an interesting one: Given a string x and a distance function $\delta$, find a substring of x that has an approximate period. This is a problem of finding an interesting area (i.e., an approximately periodic area) in a given string.

References

[1] A. Apostolico, D. Breslauer, An optimal O(log log N)-time parallel algorithm for detecting all squares in a string, SIAM J. Comput. 25 (6) (1996).
[2] A. Apostolico, D. Breslauer, Z. Galil, Optimal parallel algorithms for periods, palindromes and squares, in: Proc. 19th Int. Colloq. Automata Languages and Programming, Lecture Notes in Computer Science, Vol. 623, Springer, Berlin, 1992.
[3] A. Apostolico, A. Ehrenfeucht, Efficient detection of quasiperiodicities in strings, Theoret. Comput. Sci. 119 (2) (1993).
[4] A. Apostolico, M. Farach, C.S. Iliopoulos, Optimal superprimitivity testing for strings, Inform. Process. Lett. 39 (1) (1991).
[5] A. Apostolico, F.P. Preparata, Optimal off-line detection of repetitions in a string, Theoret. Comput. Sci. 22 (1983).
[6] O. Berkman, C.S. Iliopoulos, K. Park, The subtree max gap problem with application to parallel string covering, Inform. Comput. 123 (1) (1995).
[7] D. Breslauer, An on-line string superprimitivity test, Inform. Process. Lett. 44 (1992).
[8] D. Breslauer, Testing string superprimitivity in parallel, Inform. Process. Lett. 49 (5) (1994).
[9] T. Crawford, C.S. Iliopoulos, R. Raman, String matching techniques for musical similarity and melodic recognition, Comput. Musicol. 11 (1998).
[10] M. Crochemore, An optimal algorithm for computing the repetitions in a word, Inform. Process. Lett. 12 (5) (1981).
[11] M. Crochemore, C.S. Iliopoulos, M. Korda, Two-dimensional prefix string matching and covering on square matrices, Algorithmica 20 (1998).
[12] M. Crochemore, C.S. Iliopoulos, H. Yu, Algorithms for computing evolutionary chains in molecular and musical sequences, Proc. 9th Austral. Workshop on Combinatorial Algorithms, 1998.
[13] M. Crochemore, W. Rytter, Text Algorithms, Oxford University Press, Oxford.
[14] C.S. Iliopoulos, M. Korda, Optimal parallel superprimitivity testing on square arrays, Parallel Process. Lett. 6 (3) (1996).
[15] C.S. Iliopoulos, D.W.G. Moore, K. Park, Covering a string, Algorithmica 16 (1996).
[16] C.S. Iliopoulos, L. Mouchard, An O(n log n) algorithm for computing all maximal quasiperiodicities in strings, in: Proc. Computing: Australasian Theory Symposium, Lecture Notes in Computer Science, Springer, Berlin, 1999.
[17] C.S. Iliopoulos, K. Park, An optimal O(log log n)-time algorithm for parallel superprimitivity testing, J. Korea Inform. Sci. Soc. 21 (1994).
[18] C.S. Iliopoulos, K. Park, A work-time optimal algorithm for computing all string covers, Theoret. Comput. Sci. 164 (1996).
[19] C.S. Iliopoulos, W.F. Smyth, On-line algorithms for k-covering, Proc. 9th Austral. Workshop on Combinatorial Algorithms, 1998.
[20] S. Kim, K. Park, A dynamic edit distance table, in: Proc. 11th Symp. Combinatorial Pattern Matching, Lecture Notes in Computer Science, Vol. 1848, Springer, Berlin, 2000.
[21] D.E. Knuth, J.H. Morris, V.R. Pratt, Fast pattern matching in strings, SIAM J. Comput. 6 (1) (1977).
[22] G.M. Landau, E.W. Myers, J.P. Schmidt, Incremental string comparison, SIAM J. Comput. 27 (2) (1998).
[23] G.M. Landau, J.P. Schmidt, An algorithm for approximate tandem repeats, in: Proc. 4th Symp. Combinatorial Pattern Matching, Lecture Notes in Computer Science, Vol. 648, Springer, Berlin, 1993.
[24] Y. Li, W.F. Smyth, An optimal on-line algorithm to compute all the covers of a string, preprint.
[25] D. Maier, The complexity of some problems on subsequences and supersequences, J. ACM 25 (2) (1978).
[26] M.G. Main, R.J. Lorentz, An algorithm for finding all repetitions in a string, J. Algorithms 5 (1984).
[27] M. Middendorf, More on the complexity of common superstring and supersequence problems, Theoret. Comput. Sci. 125 (2) (1994).
[28] D. Moore, W.F. Smyth, A correction to "An optimal algorithm to compute all the covers of a string", Inform. Process. Lett. 54 (2) (1995).
[29] K.J. Räihä, E. Ukkonen, The shortest common supersequence problem over binary alphabet is NP-complete, Theoret. Comput. Sci. 16 (1981).
[30] J.P. Schmidt, All highest scoring paths in weighted grid graphs and its application to finding all approximate repeats in strings, SIAM J. Comput. 27 (4) (1998).
[31] P.H. Sellers, Pattern recognition in genetic sequences by mismatch density, Bull. Math. Biol. 46 (4) (1984).
[32] L. Wang, T. Jiang, On the complexity of multiple sequence alignment, J. Comput. Biol. 1 (4) (1994).
More informationAverage Complexity of Exact and Approximate Multiple String Matching
Average Complexity of Exact and Approximate Multiple String Matching Gonzalo Navarro Department of Computer Science University of Chile gnavarro@dcc.uchile.cl Kimmo Fredriksson Department of Computer Science
More informationVarieties of Regularities in Weighted Sequences
Varieties of Regularities in Weighted Sequences Hui Zhang 1, Qing Guo 2 and Costas S. Iliopoulos 3 1 College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang 310023,
More informationEfficient (δ, γ)-pattern-matching with Don t Cares
fficient (δ, γ)-pattern-matching with Don t Cares Yoan José Pinzón Ardila Costas S. Iliopoulos Manolis Christodoulakis Manal Mohamed King s College London, Department of Computer Science, London WC2R 2LS,
More informationOn the number of occurrences of all short factors in almost all words
Theoretical Computer Science 290 (2003) 2031 2035 www.elsevier.com/locate/tcs Note On the number of occurrences of all short factors in almost all words Ioan Tomescu Faculty of Mathematics, University
More informationUniversity of Waterloo. W. F. Smyth. McMaster University. Curtin University of Technology
WEAK REPETITIONS IN STRINGS L. J. Cummings Faculty of Mathematics University of Waterloo W. F. Smyth Department of Computer Science & Systems McMaster University School of Computing Curtin University of
More informationString Regularities and Degenerate Strings
M. Sc. Thesis Defense Md. Faizul Bari (100705050P) Supervisor: Dr. M. Sohel Rahman String Regularities and Degenerate Strings Department of Computer Science and Engineering Bangladesh University of Engineering
More informationA Lower-Variance Randomized Algorithm for Approximate String Matching
A Lower-Variance Randomized Algorithm for Approximate String Matching Mikhail J. Atallah Elena Grigorescu Yi Wu Department of Computer Science Purdue University West Lafayette, IN 47907 U.S.A. {mja,egrigore,wu510}@cs.purdue.edu
More informationEfficient High-Similarity String Comparison: The Waterfall Algorithm
Efficient High-Similarity String Comparison: The Waterfall Algorithm Alexander Tiskin Department of Computer Science University of Warwick http://go.warwick.ac.uk/alextiskin Alexander Tiskin (Warwick)
More informationSORTING SUFFIXES OF TWO-PATTERN STRINGS.
International Journal of Foundations of Computer Science c World Scientific Publishing Company SORTING SUFFIXES OF TWO-PATTERN STRINGS. FRANTISEK FRANEK and WILLIAM F. SMYTH Algorithms Research Group,
More informationSIMPLE ALGORITHM FOR SORTING THE FIBONACCI STRING ROTATIONS
SIMPLE ALGORITHM FOR SORTING THE FIBONACCI STRING ROTATIONS Manolis Christodoulakis 1, Costas S. Iliopoulos 1, Yoan José Pinzón Ardila 2 1 King s College London, Department of Computer Science London WC2R
More informationCompressed Index for Dynamic Text
Compressed Index for Dynamic Text Wing-Kai Hon Tak-Wah Lam Kunihiko Sadakane Wing-Kin Sung Siu-Ming Yiu Abstract This paper investigates how to index a text which is subject to updates. The best solution
More informationA GREEDY APPROXIMATION ALGORITHM FOR CONSTRUCTING SHORTEST COMMON SUPERSTRINGS *
A GREEDY APPROXIMATION ALGORITHM FOR CONSTRUCTING SHORTEST COMMON SUPERSTRINGS * 1 Jorma Tarhio and Esko Ukkonen Department of Computer Science, University of Helsinki Tukholmankatu 2, SF-00250 Helsinki,
More informationNote On Parikh slender context-free languages
Theoretical Computer Science 255 (2001) 667 677 www.elsevier.com/locate/tcs Note On Parikh slender context-free languages a; b; ; 1 Juha Honkala a Department of Mathematics, University of Turku, FIN-20014
More informationOn Pattern Matching With Swaps
On Pattern Matching With Swaps Fouad B. Chedid Dhofar University, Salalah, Oman Notre Dame University - Louaize, Lebanon P.O.Box: 2509, Postal Code 211 Salalah, Oman Tel: +968 23237200 Fax: +968 23237720
More informationNote Watson Crick D0L systems with regular triggers
Theoretical Computer Science 259 (2001) 689 698 www.elsevier.com/locate/tcs Note Watson Crick D0L systems with regular triggers Juha Honkala a; ;1, Arto Salomaa b a Department of Mathematics, University
More informationOn the Number of Distinct Squares
Frantisek (Franya) Franek Advanced Optimization Laboratory Department of Computing and Software McMaster University, Hamilton, Ontario, Canada Invited talk - Prague Stringology Conference 2014 Outline
More informationarxiv: v1 [cs.ds] 2 Dec 2009
Variants of Constrained Longest Common Subsequence arxiv:0912.0368v1 [cs.ds] 2 Dec 2009 Paola Bonizzoni Gianluca Della Vedova Riccardo Dondi Yuri Pirola Abstract In this work, we consider a variant of
More informationDecision Problems Concerning. Prime Words and Languages of the
Decision Problems Concerning Prime Words and Languages of the PCP Marjo Lipponen Turku Centre for Computer Science TUCS Technical Report No 27 June 1996 ISBN 951-650-783-2 ISSN 1239-1891 Abstract This
More informationHow many double squares can a string contain?
How many double squares can a string contain? F. Franek, joint work with A. Deza and A. Thierry Algorithms Research Group Department of Computing and Software McMaster University, Hamilton, Ontario, Canada
More informationChapter 1. Comparison-Sorting and Selecting in. Totally Monotone Matrices. totally monotone matrices can be found in [4], [5], [9],
Chapter 1 Comparison-Sorting and Selecting in Totally Monotone Matrices Noga Alon Yossi Azar y Abstract An mn matrix A is called totally monotone if for all i 1 < i 2 and j 1 < j 2, A[i 1; j 1] > A[i 1;
More informationEfficient Polynomial-Time Algorithms for Variants of the Multiple Constrained LCS Problem
Efficient Polynomial-Time Algorithms for Variants of the Multiple Constrained LCS Problem Hsing-Yen Ann National Center for High-Performance Computing Tainan 74147, Taiwan Chang-Biau Yang and Chiou-Ting
More informationString Range Matching
String Range Matching Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi Department of Computer Science, University of Helsinki Helsinki, Finland firstname.lastname@cs.helsinki.fi Abstract. Given strings
More informationMinimal enumerations of subsets of a nite set and the middle level problem
Discrete Applied Mathematics 114 (2001) 109 114 Minimal enumerations of subsets of a nite set and the middle level problem A.A. Evdokimov, A.L. Perezhogin 1 Sobolev Institute of Mathematics, Novosibirsk
More informationThe number of runs in Sturmian words
The number of runs in Sturmian words Paweł Baturo 1, Marcin Piatkowski 1, and Wojciech Rytter 2,1 1 Department of Mathematics and Computer Science, Copernicus University 2 Institute of Informatics, Warsaw
More informationOptimal Superprimitivity Testing for Strings
Purdue University Purdue e-pubs Computer Science Technical Reports Department of Computer Science 1990 Optimal Superprimitivity Testing for Strings Alberto Apostolico Martin Farach Costas S. Iliopoulos
More informationSUFFIX TREE. SYNONYMS Compact suffix trie
SUFFIX TREE Maxime Crochemore King s College London and Université Paris-Est, http://www.dcs.kcl.ac.uk/staff/mac/ Thierry Lecroq Université de Rouen, http://monge.univ-mlv.fr/~lecroq SYNONYMS Compact suffix
More informationState Complexity of Neighbourhoods and Approximate Pattern Matching
State Complexity of Neighbourhoods and Approximate Pattern Matching Timothy Ng, David Rappaport, and Kai Salomaa School of Computing, Queen s University, Kingston, Ontario K7L 3N6, Canada {ng, daver, ksalomaa}@cs.queensu.ca
More informationApproximate Binary Search Algorithms for Mean Cuts and Cycles
Approximate Binary Search Algorithms for Mean Cuts and Cycles S. Thomas McCormick Faculty of Commerce and Business Administration University of British Columbia Vancouver, BC V6T 1Z2 Canada June 1992,
More informationCounting and Verifying Maximal Palindromes
Counting and Verifying Maximal Palindromes Tomohiro I 1, Shunsuke Inenaga 2, Hideo Bannai 1, and Masayuki Takeda 1 1 Department of Informatics, Kyushu University 2 Graduate School of Information Science
More informationAn efficient alignment algorithm for masked sequences
Theoretical Computer Science 370 (2007) 19 33 www.elsevier.com/locate/tcs An efficient alignment algorithm for masked sequences Jin Wook Kim, Kunsoo Park School of Computer Science and Engineering, Seoul
More informationRestricted b-matchings in degree-bounded graphs
Egerváry Research Group on Combinatorial Optimization Technical reports TR-009-1. Published by the Egerváry Research Group, Pázmány P. sétány 1/C, H1117, Budapest, Hungary. Web site: www.cs.elte.hu/egres.
More informationMore Dynamic Programming
CS 374: Algorithms & Models of Computation, Spring 2017 More Dynamic Programming Lecture 14 March 9, 2017 Chandra Chekuri (UIUC) CS374 1 Spring 2017 1 / 42 What is the running time of the following? Consider
More informationA shrinking lemma for random forbidding context languages
Theoretical Computer Science 237 (2000) 149 158 www.elsevier.com/locate/tcs A shrinking lemma for random forbidding context languages Andries van der Walt a, Sigrid Ewert b; a Department of Mathematics,
More informationMore Dynamic Programming
Algorithms & Models of Computation CS/ECE 374, Fall 2017 More Dynamic Programming Lecture 14 Tuesday, October 17, 2017 Sariel Har-Peled (UIUC) CS374 1 Fall 2017 1 / 48 What is the running time of the following?
More informationShortest Common Superstrings and Scheduling. with Coordinated Starting Times. Institut fur Angewandte Informatik. und Formale Beschreibungsverfahren
Shortest Common Superstrings and Scheduling with Coordinated Starting Times Martin Middendorf Institut fur Angewandte Informatik und Formale Beschreibungsverfahren D-76128 Universitat Karlsruhe, Germany
More informationText Searching. Thierry Lecroq Laboratoire d Informatique, du Traitement de l Information et des
Text Searching Thierry Lecroq Thierry.Lecroq@univ-rouen.fr Laboratoire d Informatique, du Traitement de l Information et des Systèmes. International PhD School in Formal Languages and Applications Tarragona,
More informationNP-hardness of the stable matrix in unit interval family problem in discrete time
Systems & Control Letters 42 21 261 265 www.elsevier.com/locate/sysconle NP-hardness of the stable matrix in unit interval family problem in discrete time Alejandra Mercado, K.J. Ray Liu Electrical and
More informationTheoretical Computer Science. Efficient algorithms to compute compressed longest common substrings and compressed palindromes
Theoretical Computer Science 410 (2009) 900 913 Contents lists available at ScienceDirect Theoretical Computer Science journal homepage: www.elsevier.com/locate/tcs Efficient algorithms to compute compressed
More informationLecture 14 - P v.s. NP 1
CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) February 27, 2018 Lecture 14 - P v.s. NP 1 In this lecture we start Unit 3 on NP-hardness and approximation
More informationAn on{line algorithm is presented for constructing the sux tree for a. the desirable property of processing the string symbol by symbol from left to
(To appear in ALGORITHMICA) On{line construction of sux trees 1 Esko Ukkonen Department of Computer Science, University of Helsinki, P. O. Box 26 (Teollisuuskatu 23), FIN{00014 University of Helsinki,
More informationFinding a longest common subsequence between a run-length-encoded string and an uncompressed string
Journal of Complexity 24 (2008) 173 184 www.elsevier.com/locate/jco Finding a longest common subsequence between a run-length-encoded string and an uncompressed string J.J. Liu a, Y.L. Wang b,, R.C.T.
More informationString Matching with Variable Length Gaps
String Matching with Variable Length Gaps Philip Bille, Inge Li Gørtz, Hjalte Wedel Vildhøj, and David Kofoed Wind Technical University of Denmark Abstract. We consider string matching with variable length
More informationIncluding Interval Encoding into Edit Distance Based Music Comparison and Retrieval
Including Interval Encoding into Edit Distance Based Music Comparison and Retrieval Kjell Lemström; Esko Ukkonen Department of Computer Science; P.O.Box 6, FIN-00014 University of Helsinki, Finland {klemstro,ukkonen}@cs.helsinki.fi
More informationOn-line String Matching in Highly Similar DNA Sequences
On-line String Matching in Highly Similar DNA Sequences Nadia Ben Nsira 1,2,ThierryLecroq 1,,MouradElloumi 2 1 LITIS EA 4108, Normastic FR3638, University of Rouen, France 2 LaTICE, University of Tunis
More informationThe Complexity of Maximum. Matroid-Greedoid Intersection and. Weighted Greedoid Maximization
Department of Computer Science Series of Publications C Report C-2004-2 The Complexity of Maximum Matroid-Greedoid Intersection and Weighted Greedoid Maximization Taneli Mielikäinen Esko Ukkonen University
More informationDynamic Programming. Shuang Zhao. Microsoft Research Asia September 5, Dynamic Programming. Shuang Zhao. Outline. Introduction.
Microsoft Research Asia September 5, 2005 1 2 3 4 Section I What is? Definition is a technique for efficiently recurrence computing by storing partial results. In this slides, I will NOT use too many formal
More informationBOUNDS ON ZIMIN WORD AVOIDANCE
BOUNDS ON ZIMIN WORD AVOIDANCE JOSHUA COOPER* AND DANNY RORABAUGH* Abstract. How long can a word be that avoids the unavoidable? Word W encounters word V provided there is a homomorphism φ defined by mapping
More informationTHE MAXIMAL SUBGROUPS AND THE COMPLEXITY OF THE FLOW SEMIGROUP OF FINITE (DI)GRAPHS
THE MAXIMAL SUBGROUPS AND THE COMPLEXITY OF THE FLOW SEMIGROUP OF FINITE (DI)GRAPHS GÁBOR HORVÁTH, CHRYSTOPHER L. NEHANIV, AND KÁROLY PODOSKI Dedicated to John Rhodes on the occasion of his 80th birthday.
More informationIndependence of certain quantities indicating subword occurrences
Theoretical Computer Science 362 (2006) 222 231 wwwelseviercom/locate/tcs Independence of certain quantities indicating subword occurrences Arto Salomaa Turku Centre for Computer Science, Lemminkäisenkatu
More informationarxiv: v1 [cs.dm] 12 Jun 2016
A Simple Extension of Dirac s Theorem on Hamiltonicity Yasemin Büyükçolak a,, Didem Gözüpek b, Sibel Özkana, Mordechai Shalom c,d,1 a Department of Mathematics, Gebze Technical University, Kocaeli, Turkey
More informationThe Intractability of Computing the Hamming Distance
The Intractability of Computing the Hamming Distance Bodo Manthey and Rüdiger Reischuk Universität zu Lübeck, Institut für Theoretische Informatik Wallstraße 40, 23560 Lübeck, Germany manthey/reischuk@tcs.uni-luebeck.de
More informationThe limitedness problem on distance automata: Hashiguchi s method revisited
Theoretical Computer Science 310 (2004) 147 158 www.elsevier.com/locate/tcs The limitedness problem on distance automata: Hashiguchi s method revisited Hing Leung, Viktor Podolskiy Department of Computer
More informationOutline. Approximation: Theory and Algorithms. Motivation. Outline. The String Edit Distance. Nikolaus Augsten. Unit 2 March 6, 2009
Outline Approximation: Theory and Algorithms The Nikolaus Augsten Free University of Bozen-Bolzano Faculty of Computer Science DIS Unit 2 March 6, 2009 1 Nikolaus Augsten (DIS) Approximation: Theory and
More informationarxiv: v1 [cs.cc] 4 Feb 2008
On the complexity of finding gapped motifs Morris Michael a,1, François Nicolas a, and Esko Ukkonen a arxiv:0802.0314v1 [cs.cc] 4 Feb 2008 Abstract a Department of Computer Science, P. O. Box 68 (Gustaf
More informationOn-line Bin-Stretching. Yossi Azar y Oded Regev z. Abstract. We are given a sequence of items that can be packed into m unit size bins.
On-line Bin-Stretching Yossi Azar y Oded Regev z Abstract We are given a sequence of items that can be packed into m unit size bins. In the classical bin packing problem we x the size of the bins and try
More informationAlgorithms for Three Versions of the Shortest Common Superstring Problem
Algorithms for Three Versions of the Shortest Common Superstring Problem Maxime Crochemore, Marek Cygan, Costas Iliopoulos, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, Tomasz Walen King s College
More informationMinimizing the size of an identifying or locating-dominating code in a graph is NP-hard
Theoretical Computer Science 290 (2003) 2109 2120 www.elsevier.com/locate/tcs Note Minimizing the size of an identifying or locating-dominating code in a graph is NP-hard Irene Charon, Olivier Hudry, Antoine
More informationTheoretical Computer Science
Theoretical Computer Science 410 (2009) 2759 2766 Contents lists available at ScienceDirect Theoretical Computer Science journal homepage: www.elsevier.com/locate/tcs Note Computing the longest topological
More informationMotif Extraction from Weighted Sequences
Motif Extraction from Weighted Sequences C. Iliopoulos 1, K. Perdikuri 2,3, E. Theodoridis 2,3,, A. Tsakalidis 2,3 and K. Tsichlas 1 1 Department of Computer Science, King s College London, London WC2R
More informationOutline. Similarity Search. Outline. Motivation. The String Edit Distance
Outline Similarity Search The Nikolaus Augsten nikolaus.augsten@sbg.ac.at Department of Computer Sciences University of Salzburg 1 http://dbresearch.uni-salzburg.at WS 2017/2018 Version March 12, 2018
More informationApproximation: Theory and Algorithms
Approximation: Theory and Algorithms The String Edit Distance Nikolaus Augsten Free University of Bozen-Bolzano Faculty of Computer Science DIS Unit 2 March 6, 2009 Nikolaus Augsten (DIS) Approximation:
More informationDD2446 Complexity Theory: Problem Set 4
DD2446 Complexity Theory: Problem Set 4 Due: Friday November 8, 2013, at 23:59. Submit your solutions as a PDF le by e-mail to jakobn at kth dot se with the subject line Problem set 4: your full name.
More informationLifting to non-integral idempotents
Journal of Pure and Applied Algebra 162 (2001) 359 366 www.elsevier.com/locate/jpaa Lifting to non-integral idempotents Georey R. Robinson School of Mathematics and Statistics, University of Birmingham,
More informationOn shredders and vertex connectivity augmentation
On shredders and vertex connectivity augmentation Gilad Liberman The Open University of Israel giladliberman@gmail.com Zeev Nutov The Open University of Israel nutov@openu.ac.il Abstract We consider the
More informationAnalysis and Design of Algorithms Dynamic Programming
Analysis and Design of Algorithms Dynamic Programming Lecture Notes by Dr. Wang, Rui Fall 2008 Department of Computer Science Ocean University of China November 6, 2009 Introduction 2 Introduction..................................................................
More informationPeriodicity in Rectangular Arrays
Periodicity in Rectangular Arrays arxiv:1602.06915v2 [cs.dm] 1 Jul 2016 Guilhem Gamard LIRMM CNRS, Univ. Montpellier UMR 5506, CC 477 161 rue Ada 34095 Montpellier Cedex 5 France guilhem.gamard@lirmm.fr
More informationLet S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely
JOURNAL OF COMPUTATIONAL BIOLOGY Volume 8, Number 1, 2001 Mary Ann Liebert, Inc. Pp. 69 78 Perfect Phylogenetic Networks with Recombination LUSHENG WANG, 1 KAIZHONG ZHANG, 2 and LOUXIN ZHANG 3 ABSTRACT
More informationTheoretical Computer Science. Dynamic rank/select structures with applications to run-length encoded texts
Theoretical Computer Science 410 (2009) 4402 4413 Contents lists available at ScienceDirect Theoretical Computer Science journal homepage: www.elsevier.com/locate/tcs Dynamic rank/select structures with
More informationSimilarity Search. The String Edit Distance. Nikolaus Augsten. Free University of Bozen-Bolzano Faculty of Computer Science DIS. Unit 2 March 8, 2012
Similarity Search The String Edit Distance Nikolaus Augsten Free University of Bozen-Bolzano Faculty of Computer Science DIS Unit 2 March 8, 2012 Nikolaus Augsten (DIS) Similarity Search Unit 2 March 8,
More informationConstant-Space String-Matching. in Sublinear Average Time. (Extended Abstract) Wojciech Rytter z. Warsaw University. and. University of Liverpool
Constant-Space String-Matching in Sublinear Average Tie (Extended Abstract) Maxie Crocheore Universite de Marne-la-Vallee Leszek Gasieniec y Max-Planck Institut fur Inforatik Wojciech Rytter z Warsaw University
More informationTheoretical Computer Science. Why greed works for shortest common superstring problem
Theoretical Computer Science 410 (2009) 5374 5381 Contents lists available at ScienceDirect Theoretical Computer Science journal homepage: www.elsevier.com/locate/tcs Why greed works for shortest common
More informationarxiv: v2 [cs.ds] 5 Mar 2014
Order-preserving pattern matching with k mismatches Pawe l Gawrychowski 1 and Przemys law Uznański 2 1 Max-Planck-Institut für Informatik, Saarbrücken, Germany 2 LIF, CNRS and Aix-Marseille Université,
More informationIt is well-known (cf. [2,4,5,9]) that the generating function P w() summed over all tableaux of shape = where the parts in row i are at most a i and a
Counting tableaux with row and column bounds C. Krattenthalery S. G. Mohantyz Abstract. It is well-known that the generating function for tableaux of a given skew shape with r rows where the parts in the
More informationComputability and Complexity
Computability and Complexity Non-determinism, Regular Expressions CAS 705 Ryszard Janicki Department of Computing and Software McMaster University Hamilton, Ontario, Canada janicki@mcmaster.ca Ryszard
More informationPairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55
Pairwise Alignment Guan-Shieng Huang shieng@ncnu.edu.tw Dept. of CSIE, NCNU Pairwise Alignment p.1/55 Approach 1. Problem definition 2. Computational method (algorithms) 3. Complexity and performance Pairwise
More informationCounting Palindromic Binary Strings Without r-runs of Ones
1 3 47 6 3 11 Journal of Integer Sequences, Vol. 16 (013), Article 13.8.7 Counting Palindromic Binary Strings Without r-runs of Ones M. A. Nyblom School of Mathematics and Geospatial Science RMIT University
More informationarxiv: v1 [cs.dm] 18 May 2016
A note on the shortest common superstring of NGS reads arxiv:1605.05542v1 [cs.dm] 18 May 2016 Tristan Braquelaire Marie Gasparoux Mathieu Raffinot Raluca Uricaru May 19, 2016 Abstract The Shortest Superstring
More informationMaximal Unbordered Factors of Random Strings arxiv: v1 [cs.ds] 14 Apr 2017
Maximal Unbordered Factors of Random Strings arxiv:1704.04472v1 [cs.ds] 14 Apr 2017 Patrick Hagge Cording 1 and Mathias Bæk Tejs Knudsen 2 1 DTU Compute, Technical University of Denmark, phaco@dtu.dk 2
More informationarxiv: v1 [cs.ds] 15 Feb 2012
Linear-Space Substring Range Counting over Polylogarithmic Alphabets Travis Gagie 1 and Pawe l Gawrychowski 2 1 Aalto University, Finland travis.gagie@aalto.fi 2 Max Planck Institute, Germany gawry@cs.uni.wroc.pl
More informationSimilarity Search. The String Edit Distance. Nikolaus Augsten.
Similarity Search The String Edit Distance Nikolaus Augsten nikolaus.augsten@sbg.ac.at Dept. of Computer Sciences University of Salzburg http://dbresearch.uni-salzburg.at Version October 18, 2016 Wintersemester
More informationMinimal and Maximal Critical Sets in Room Squares
University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 1996 Minimal and Maximal Critical Sets in Room Squares Ghulam Rasool Chaudhry
More informationOn the reconstruction of the degree sequence
Discrete Mathematics 259 (2002) 293 300 www.elsevier.com/locate/disc Note On the reconstruction of the degree sequence Charles Delorme a, Odile Favaron a, Dieter Rautenbach b;c; ;1 a LRI, Bât. 490, Universite
More informationOn the Complexity of Budgeted Maximum Path Coverage on Trees
On the Complexity of Budgeted Maximum Path Coverage on Trees H.-C. Wirth An instance of the budgeted maximum coverage problem is given by a set of weighted ground elements and a cost weighted family of
More informationGreedy Conjecture for Strings of Length 4
Greedy Conjecture for Strings of Length 4 Alexander S. Kulikov 1, Sergey Savinov 2, and Evgeniy Sluzhaev 1,2 1 St. Petersburg Department of Steklov Institute of Mathematics 2 St. Petersburg Academic University
More informationUpper and Lower Bounds on the Number of Faults. a System Can Withstand Without Repairs. Cambridge, MA 02139
Upper and Lower Bounds on the Number of Faults a System Can Withstand Without Repairs Michel Goemans y Nancy Lynch z Isaac Saias x Laboratory for Computer Science Massachusetts Institute of Technology
More information