Average Case Analysis of the Boyer-Moore Algorithm

Tsung-Hsi Tsai
Institute of Statistical Science, Academia Sinica, Taipei 115, Taiwan
November 19, 2003

Abstract. Some limit theorems (including a Berry-Esseen bound) are derived for the number of comparisons taken by the Boyer-Moore algorithm for finding the occurrences of a given pattern in a random Markovian text. Previously, only special variants of this algorithm had been analyzed. We also propose means of computing the limiting constants for the mean and the variance.

1 Introduction

String matching is a huge area with a wide literature and a large number of applications. The basic problem is to identify all occurrences of a given pattern in a large text. In this paper we study the most widely used string-matching algorithms: the Boyer-Moore (BM) algorithm and its simplified version, the Boyer-Moore-Horspool (BMH) algorithm. We are concerned with the probabilistic behavior of the algorithms, as well as with the evaluation of the constants in the limit theorems.

Applications of string matching are found in many communication media, including language processing, graphics, video and sound. Typical applications include key-word search (or replacement) in text-editing software, search engines, phone-number lookup, search in databases such as gene banks, etc. Although the use of exact string matching in computational biology may be limited, exact string matching is the prototype, both in algorithms and in analysis, of the more useful approximate string matching and hidden string search algorithms; see Flajolet et al. [9].

Throughout the paper, A is a set of q characters, T is a text of n characters from A and S is a pattern of m characters from A. We assume that q ≥ 2 and n ≥ m. Although several factors determine the running time of a program, in this paper we deal only with the number of character comparisons; other cost measures may be analyzed similarly.
We denote by C_n = C_n(A) the number of comparisons used by algorithm A. The most naive algorithm for finding all occurrences of a pattern in a text consists in comparing the characters of the pattern consecutively to each m-length substring of the text. The procedure moves to

the next m-length substring as soon as a mismatch occurs. This simple algorithm has C_n = m(n − m + 1) in the worst case (when all pairwise comparisons have been executed). The Knuth-Morris-Pratt (1977) algorithm, which is based on building an automaton from the pattern, avoids comparing the matched part of the text to the pattern more than once and improves the worst case to C_n ≤ 2n − m (see Cole [8]); the average case of C_n/n is close to 1 + 1/q (see Régnier [13]). However, it has been observed that in many applications the average value of C_n/n for several algorithms is less than one (see [12] for recent experiments). Among them, the BM algorithm is considered the most efficient (see Lecroq's Handbook of Exact String-Matching Algorithms [6]). Despite its popularity in diverse applications, probabilistic analysis of the BM algorithms has not received much attention in the literature. Most known probabilistic results are for the BMH algorithm. We first summarize some related results and then present our new ones.

Baeza-Yates, Gonnet and Régnier (1990 & 1992) applied an analytic approach to the average-case analysis of the BMH algorithm. Under the assumption that the text is independent and identically distributed (iid), they derived an exact expression for the linearity constant µ = lim_{n→∞} E(C_n/n). For earlier results, see Barth (1984) and Régnier (1989). The first distributional result for string-matching algorithms was derived by Mahmoud, Smythe and Régnier (1997). They proved the asymptotic normality of C_n via analytic methods (the text being iid) for the BMH algorithm:

    (C_n − µn) / √(Bn) →_D N(0, 1),

where →_D denotes convergence in distribution. However, the value of B is difficult to compute. Smythe (2001) obtained the same asymptotic normality result, but for Markovian input, by a completely probabilistic approach (via the theory of Markov chains).
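For concreteness, the naive matcher described above can be sketched in a few lines (Python is used for illustration; the function name and the convention of counting the failed comparison are ours, not the paper's):

```python
def naive_search(T, P):
    """Naive matcher: slide an m-length window over T and compare left to
    right; move one position right on a mismatch. Returns (occurrence
    positions, number of character comparisons C_n)."""
    m, n = len(P), len(T)
    occurrences, comparisons = [], 0
    for pos in range(n - m + 1):          # one window per text position
        i = 0
        while i < m and T[pos + i] == P[i]:
            comparisons += 1              # successful comparison
            i += 1
        if i < m:
            comparisons += 1              # the mismatching comparison
        else:
            occurrences.append(pos)       # all m characters matched
    return occurrences, comparisons
```

On T = "aaaaaaaa" and P = "aaa" every window is a full match, so the worst case C_n = m(n − m + 1) = 3 · 6 = 18 is attained.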
A simple expression for the linearity constant µ was also derived, but the state space of his Markov chain seems difficult to simplify further, and the evaluation of B remains unclear.

We propose a similar but simpler Markov-chain approach: we consider the open windows together with a distance counter as a Markov chain and derive a Berry-Esseen bound, the law of the iterated logarithm and a better estimate of the mean of C_n, with explicit determination of µ and B, under the assumption that the text is iid and the pattern is fixed. One major drawback of the usual Markovian approach is that the computational complexity grows exponentially as the pattern length increases. Our main contribution is to construct a state space whose size grows at rate O(m²). We are thus able to compute the linearity constant µ for the BM algorithm when the pattern length is of reasonable size. With this construction of the state space, we can also compute the exact value of B for both the BM and the BMH algorithms. All computations can be done efficiently by mathematical software (such as Maple); numerical results are presented in Section 5.

2 Description of the algorithms

For completeness, we describe the BM algorithm in this section; see Boyer and Moore [5] or Lecroq [6] for more details.

Denote the text by T = t_1 t_2 ⋯ t_n and the pattern by S = s_m s_{m−1} ⋯ s_1; note that in this paper we index the pattern from right to left, because BM-type algorithms scan in that direction inside each comparison window.

The BM algorithm

We begin by aligning S under the first open window t_1 ⋯ t_m. Starting with s_1, compare each pattern character to the corresponding character in the open window, from right to left, until a mismatch is detected or the left end of the window is reached (an occurrence is found if all m characters match). Then S is shifted to the right and a new window in the text is opened. The distance shifted (from the current open window to the next) is determined by the following two rules.

Bad-character rule. For all a ∈ A, let

    B(a) := min{ k ≥ 1 : s_{1+k} = a }.

Good-suffix rule. For all 1 ≤ i ≤ m, let

    G(i) := min{ k ≥ 1 : [s_{min{m, i−1+k}}, s_{1+k}] matches exactly a suffix of [s_{i−1}, s_1], and s_{i+k} ≠ s_i if i + k ≤ m },

where [s_i, s_j], i ≥ j, denotes the sequence s_i s_{i−1} ⋯ s_j, and min ∅ := m.

If there are i − 1 matches, the i-th pair is a mismatch and a is the mismatching character in the text, then the distance shifted is max{G(i), B(a) − i + 1}; we call G(i) the good-suffix shift and B(a) − i + 1 the bad-character shift. If all m characters match, then the distance shifted is G(m).

The BMH algorithm

This is a simplification of the BM algorithm using only the bad-character rule: the distance shifted is B(a), where a is the right-most character of the open window. When the first comparison yields a mismatch, i.e. i = 1, the bad-character shift dominates the good-suffix shift, and thus the distances shifted by the BM and BMH algorithms are identical. When q is large (as for Chinese text), it becomes more likely that a mismatch happens right at the first comparison in an open window; in this case the performance of the BM and BMH algorithms is comparable.

We give an example to demonstrate both algorithms. Suppose A = {a, b, c} and S = abaa.
Then G(1) = 2, G(2) = 1, G(3) = 3, G(4) = 3, and B(a) = 1, B(b) = 2, B(c) = 4.
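These tables can be checked mechanically. The sketch below transcribes the two rules literally in Python (the helper names are ours; indexing follows the paper's right-to-left convention, with min ∅ = m):

```python
def s(P, i):
    """s_i in the paper's right-to-left indexing: s_1 is the last character."""
    return P[len(P) - i]

def bad_character(P, alphabet):
    """B(a) = min{k >= 1 : s_{1+k} = a}, with min over the empty set = m."""
    m = len(P)
    return {a: next((k for k in range(1, m) if s(P, 1 + k) == a), m)
            for a in alphabet}

def good_suffix(P):
    """G(i) = min{k >= 1 : the pattern shifted by k agrees with the suffix
    s_{i-1} ... s_1 wherever both are defined, and s_{i+k} != s_i whenever
    i + k <= m}, with min over the empty set = m."""
    m = len(P)
    G = {}
    for i in range(1, m + 1):
        for k in range(1, m):
            agrees = all(s(P, j + k) == s(P, j)
                         for j in range(1, i) if j + k <= m)
            if agrees and (i + k > m or s(P, i + k) != s(P, i)):
                G[i] = k
                break
        else:
            G[i] = m       # empty set: shift by the full pattern length
    return G
```

For S = abaa this reproduces the table above: good_suffix("abaa") returns {1: 2, 2: 1, 3: 3, 4: 3} and bad_character("abaa", "abc") returns {"a": 1, "b": 2, "c": 4}. The BM shift after a mismatch at the i-th comparison on text character a is then max(G[i], B[a] - i + 1).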

Suppose T = acabaacca. The first three windows of the BM algorithm (left) and of the BMH algorithm (right) are aligned as follows.

    a c a b a a c c a            a c a b a a c c a
    a b a a                      a b a a
        a b a a                      a b a a
              a b a a                  a b a a

In the first window, i = 1 and the mismatching character in the text is b, so the distance shifted is 2 for both the BM and the BMH algorithm. In the second window, all four characters match (an occurrence is found), so the distance shifted is G(4) = 3 for the BM algorithm and B(a) = 1 for the BMH algorithm. The next windows of the two algorithms therefore differ. For the BM algorithm, i = 2 and the mismatching character in the text is c, so the distance shifted is max{G(2), B(c) − 2 + 1} = 3. For the BMH algorithm, the right-most character of the window in the text is c, so the distance shifted is B(c) = 4.

3 A Markovian approach

Assume that the text is iid and the pattern is fixed. Let W_j be the j-th open window in the processing of the algorithm. Obviously, {W_j} is a Markov chain with state space A^m. Let g(W_j) be the distance to shift at W_j and f(W_j) the number of comparisons taken inside the window W_j. Let N_n be one plus the total number of shifts,

    N_n = 1 + max{ k : Σ_{j=1}^{k} g(W_j) ≤ n − m < Σ_{j=1}^{k+1} g(W_j) }.    (1)

We now define the pair of random variables

    W̃_n := (W_{N_n}, Z_n),  where  Z_n = n − m − Σ_{j=1}^{N_n − 1} g(W_j).

The random variable Z_n measures the distance between n and the position of the current window. It is also clear that {W̃_n}_{n≥m} is a Markov chain with state space A^m × {0, 1, ..., m − 1}. Note that not every state is visited by the algorithms.

Reduce the size of the state space

We now propose a means of reducing the size of the state space by properly combining states. Let Γ = {D_1, ..., D_K} be a partition of A^m. To combine the states and at the same time preserve the values of g and f, we require that Γ satisfy the following two conditions.

C1. Every element in the same class gives the same value under f and the same value under g.
C2. The new transition probabilities are well-defined; that is, for all D_i, D_j ∈ Γ,

    p(a, D_j) = p(b, D_j)  for all a, b ∈ D_i,

where p(a, D) is the transition probability from the state a into the class D.

With conditions C1 and C2, {W_j} can be restricted naturally to the state space Γ. We now construct a partition of A^m satisfying conditions C1 and C2. We need a little more notation. Recall that S = s_m ⋯ s_1. Here we use the symbol x to denote an arbitrary character of A, and ⟨a_1, ..., a_k⟩ to denote a character of A different from each of a_1, ..., a_k. We write [s_{i+l−1} ⋯ s_i] for the class of strings of the form

    x ⋯ x s_{i+l−1} ⋯ s_{i+1} s_i        (m − l free characters followed by l pattern characters),

and [⟨a_1, ..., a_k⟩ s_{i+l−2} ⋯ s_i] for the class of strings of the form

    x ⋯ x ⟨a_1, ..., a_k⟩ s_{i+l−2} ⋯ s_{i+1} s_i        (m − l free characters followed by l constrained characters),

where 1 ≤ i ≤ m and 1 ≤ l ≤ m − i + 1. We call l the length of [s_{i+l−1} ⋯ s_i] or of [⟨a_1, ..., a_k⟩ s_{i+l−2} ⋯ s_i]. Let

    Γ_i = { [s_m ⋯ s_i], [⟨s_m⟩ s_{m−1} ⋯ s_i], [⟨s_{m−1}⟩ s_{m−2} ⋯ s_i], ..., [⟨s_{i+1}⟩ s_i] },  for 1 ≤ i ≤ m.

Let Γ_h be the finest disjoint partition of A^m generated by the classes in {Γ_i} and the class [⟨s_m, ..., s_2, s_1⟩]. Note that [s_m ⋯ s_1] ∈ Γ_h and that all classes in Γ_h are of the form [⟨s_m, ..., s_2, s_1⟩], [s_m ⋯ s_i] or [⟨a_1, ..., a_k⟩ s_{i+l−2} ⋯ s_i], where the set {a_i} consists of those s_j with s_{j−1} ⋯ s_{j−l+1} = s_{i+l−2} ⋯ s_i.

Lemma 1. The partition Γ_h satisfies conditions C1 and C2 with respect to the BMH algorithm.

Proof. The right-most character of each class determines the value of the shift function g, except for the class [⟨s_m, ..., s_2, s_1⟩], which gives the value m. Thus each class of Γ_h has only one value under g. We need to show that each class of Γ_h also has only one value under the function f. Let α ∈ Γ_h. It suffices to examine the following two cases. (i) Suppose α = [s_m ⋯ s_i]. The case i = 1 is trivial, so suppose i > 1. If s_m ⋯ s_i ≠ s_{m−i+1} ⋯ s_1, then there is at least one mismatch and thus α has only one value under f. If s_m ⋯ s_i = s_{m−i+1} ⋯ s_1, then [s_m ⋯ s_1] ⊆ α, contrary to the disjointness of Γ_h. (ii) Suppose α = [⟨a_1, ..., a_k⟩ s_{i+l−2} ⋯ s_i]. The same argument applies when s_{i+l−2} ⋯ s_i ≠ s_{l−1} ⋯ s_1. If s_{i+l−2} ⋯ s_i = s_{l−1} ⋯ s_1, then s_l ∈ {a_1, ..., a_k}, and this forces ⟨a_1, ..., a_k⟩ to be a mismatch. Thus α has only one value under f, and this completes the proof of condition C1.
Let α, β ∈ Γ_h have lengths l_α and l_β, respectively. If l_β ≤ g(α), then condition C2 holds trivially. Suppose l_β > g(α), and let β = [s_m ⋯ s_{m−l_β+1}] or β = [⟨b_1, ..., b_κ⟩ s_{j+l_β−2} ⋯ s_j]. Let β_1 be the g(α)-length suffix of β and β_2 the (l_β − g(α))-length prefix of β, namely

    β_1 = [s_{m−l_β+g(α)} ⋯ s_{m−l_β+1}]    (respectively [s_{j+g(α)−1} ⋯ s_j]),
    β_2 = [s_m ⋯ s_{m−l_β+g(α)+1}]          (respectively [⟨b_1, ..., b_κ⟩ s_{j+l_β−2} ⋯ s_{j+g(α)}]);

see the alignment below.

    current window:  ⋯ [—————— α ——————]
    next window:          [— β_2 —][— β_1 —]
    fresh text:                    [—— t ——]

    (α has length l_α; β_2 has length l_β − g(α); β_1 and t have length g(α))

where t is the g(α)-length stretch of new random text. To establish condition C2, we need to show that either

    α ∩ β_2 = ∅  or  α ⊆ β_2.    (2)

The relations (2) are easily checked when l_β − g(α) < l_α. Suppose l_β − g(α) > l_α. Clearly α ⊄ β_2; we claim that α ∩ β_2 = ∅. Let β_3 be the l_α-length suffix of β_2. If α ≠ β_3, then there is at least one mismatch and thus α ∩ β_2 = ∅. If α = β_3, then β′_2 ⊆ α for some β′_2 ∈ Γ_h, contrary to the disjointness of Γ_h.

Suppose l_β − g(α) = l_α. It is again easy to check (2) when α = [s_m ⋯ s_{m−l_α+1}]. Suppose α = [⟨a_1, ..., a_k⟩ s_{i+l_α−2} ⋯ s_i]. Let α_1 be the (l_α − 1)-length suffix of α and β_4 the (l_β − g(α) − 1)-length suffix of β_2, namely

    α_1 = [s_{i+l_α−2} ⋯ s_i],
    β_4 = [s_{m−1} ⋯ s_{m−l_β+g(α)+1}]    (respectively [s_{j+l_β−2} ⋯ s_{j+g(α)}]).

If β_4 ≠ α_1, then α ∩ β_2 = ∅. Suppose β_4 = α_1. There are two further cases. (i) If β = [s_m ⋯ s_{m−l_β+1}], then s_m ∈ {a_1, ..., a_k} by the definition of the set {a_i}; thus α ∩ β_2 = ∅. (ii) If β = [⟨b_1, ..., b_κ⟩ s_{j+l_β−2} ⋯ s_j], then the same reasoning as above gives {b_1, ..., b_κ} ⊆ {a_1, ..., a_k}; thus α ⊆ β_2. This completes the proof of condition C2.

It is easy to see that Γ_h satisfies conditions C1 and C2 with respect to a modified BM algorithm that combines the good-suffix rule with the bad-character rule of the BMH algorithm. For the BM algorithm itself, however, a finer partition is needed, since some classes of Γ_h may carry more than one value of g. We then have to decompose those classes according to the values of g, trace back which classes lead into the split ones, and decompose those further. Such a tracing-back decomposition continues until the length of the split classes is less than two. The proof of condition C2 for the resulting partition is left to the reader. In this way we generate a partition Γ_b satisfying conditions C1 and C2 with respect to the BM algorithm.

For example, suppose A = {1, 2, 3, 4} and S = 432121. Then

    Γ_1 = { [432121], [⟨4⟩32121], [⟨3⟩2121], [⟨2⟩121], [⟨1⟩21], [⟨2⟩1] },
    Γ_2 = { [43212], [⟨4⟩3212], [⟨3⟩212], [⟨2⟩12], [⟨1⟩2] },
    Γ_3 = { [4321], [⟨4⟩321], [⟨3⟩21], [⟨2⟩1] },
    Γ_4 = { [432], [⟨4⟩32], [⟨3⟩2] },
    Γ_5 = { [43], [⟨4⟩3] },
    Γ_6 = { [4] }.
Cancelling one copy of the class [⟨2⟩1] that appears in both Γ_1 and Γ_3, combining [⟨1⟩21] in Γ_1 with [⟨3⟩21] in Γ_3, and combining [⟨1⟩2] in Γ_2 with [⟨3⟩2] in Γ_4, gives the partition Γ_h. In Γ_h, the class [⟨1,3⟩21] carries two shift values: g(221) = 2 and g(421) = 3. We therefore decompose [⟨1,3⟩21] into [221] and [421]. But then p([⟨1,3⟩2], [221]) and p([⟨1,3⟩2], [421]) are not well-defined. The

solution is to decompose the class [⟨1,3⟩2] into [22] and [42]. Then we have the partitions

    Γ_h = { [432121], [⟨4⟩32121], [⟨3⟩2121], [⟨2⟩121],
            [43212], [⟨4⟩3212], [⟨3⟩212], [⟨2⟩12],
            [4321], [⟨4⟩321], [⟨1,3⟩21], [⟨2⟩1],
            [432], [⟨4⟩32], [⟨1,3⟩2],
            [43], [⟨4⟩3],
            [4] },

    Γ_b = { [432121], [⟨4⟩32121], [⟨3⟩2121], [⟨2⟩121],
            [43212], [⟨4⟩3212], [⟨3⟩212], [⟨2⟩12],
            [4321], [⟨4⟩321], [221], [421], [⟨2⟩1],
            [432], [⟨4⟩32], [22], [42],
            [43], [⟨4⟩3],
            [4] }.

Clearly Σ_i |Γ_i| = (m² + m)/2 and |Γ_h| ≤ 1 + Σ_i |Γ_i|. The number of additional classes produced by the decomposition is small; thus |Γ_b| is O(m²).

Determination of the transition probability

Let Γ be a state space constructed as above for {W_j}. Let θ denote the distribution of W_1, and let α, β ∈ Γ. With the same notation as above, the transition probability is

    p(α, β) = θ(β)      when g(α) ≥ l_β,
              θ(β_1)    when g(α) < l_β and α ⊆ β_2,
              0         when g(α) < l_β and α ∩ β_2 = ∅.

Define the space Γ̃ = {(α, i) : α ∈ Γ, 0 ≤ i < g(α)}. Observe that Γ̃ is a state space for {W̃_n}. Let p̃ be the transition probability for {W̃_n} with respect to Γ̃. It is easy to see that

    p̃((α, i), (β, j)) = 1          when 0 ≤ i ≤ g(α) − 2 and (β, j) = (α, i + 1),
                         p(α, β)    when i = g(α) − 1 and j = 0.

4 Asymptotic behavior of the number of comparisons

Recall that C_n is the number of character comparisons taken by the algorithm. Observe that C_n = Σ_{j=1}^{N_n} f(W_j). By the strong law of large numbers for the Markov chain {W_j} (see Chung [7]), there exist two constants µ_f and µ_g such that

    (1/n) Σ_{j=1}^{n} f(W_j) → µ_f  a.s.  and  (1/n) Σ_{j=1}^{n} g(W_j) → µ_g  a.s.    (3)

Clearly N_n → ∞ almost surely as n → ∞. From (3) it follows that

    C_n/n = ( Σ_{j=1}^{N_n} f(W_j) / N_n ) · ( N_n / n ) → µ_f/µ_g =: µ^(S)  a.s.,

since Σ_{j=1}^{N_n−1} g(W_j) ≤ n − m < Σ_{j=1}^{N_n} g(W_j) forces n/N_n → µ_g almost surely. Thus the limiting mean of C_n/n is µ^(S) by the bounded convergence theorem. Define the weight function

    h(W̃_n) = f(W_{N_n}) 1{Z_n = 0}.

Then C_n = Σ_{j=1}^{n} h(W̃_j). We apply the limit theorems of Chung [7] to the Markov chain {W̃_n} and obtain the following theorems.

Theorem 1.

    E(C_n) = µ^(S) n + o(√n),    (4)
    Var(C_n) = B^(S) n + o(n),    (5)

where B^(S) ≥ 0 is a constant specified in (8).

Theorem 2. If B^(S) > 0, then

    lim sup_{n→∞} (C_n − µ^(S) n) / √(2 B^(S) n ln ln n) = 1  a.s.,    (6)

and the corresponding lim inf equals −1 almost surely.

The limit theorems of Chung also yield a central limit theorem. Here we can obtain a stronger result by applying the Berry-Esseen theorem of Bolthausen [4] to the Markov chain {W̃_n}. Bolthausen's theorem covers Markov chains with a possibly infinite discrete state space; its conditions (a third-moment condition on a return time and a first-moment condition on an entrance time) hold automatically for Markov chains with finitely many states.

Theorem 3. If B^(S) > 0, then

    sup_{−∞ < x < ∞} | P( (C_n − µ^(S) n)/√(B^(S) n) < x ) − Φ(x) | = O(n^{−1/2}),    (7)

where Φ(x) is the standard normal distribution function.

Explicit determination of the quantities µ^(S) and B^(S)

Let P be the transition matrix and Ω the class of recurrent states of {W_j}. Let π be the invariant probability measure with respect to P (i.e. πP = π). Then

    µ_f = Σ_{α∈Ω} f(α) π_α  and  µ_g = Σ_{α∈Ω} g(α) π_α.

Let P̃ be the transition matrix and Ω̃ the class of recurrent states of {W̃_n}, and let π̃ be the invariant probability measure with respect to P̃. It is easy to check that π̃_(α,i) = π_α/µ_g. From Chung [7, p. 88],

    B^(S) = Σ_{α∈Ω̃} h_c(α)² π̃_α + 2 Σ_{α∈Ω̃} Σ_{β≠α_0} h_c(α) π̃_α h_c(β) π̃_β ( m_{α,α_0} + m_{α_0,β} − m_{α,β} ),    (8)

where α_0 is a fixed state (the value of B^(S) is independent of the choice of α_0), h_c(α) = h(α) − µ^(S),
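For small m and q, the identities µ_f = Σ f(α)π_α and µ_g = Σ g(α)π_α can also be evaluated without any state-space reduction, by working on the full window chain over A^m. The Python sketch below does this for the BMH algorithm with iid uniform text; the transition p(w, w′) = q^(−g(w)) whenever w′ agrees with w shifted by g(w) follows from the window overlap. The function names and the toy pattern are our own illustration, not the paper's reduced construction over Γ:

```python
from itertools import product

def bmh_window_stats(P, alphabet):
    """Return f, g for BMH: f(w) = comparisons spent in window w
    (right-to-left scan), g(w) = B(last character of w)."""
    m = len(P)
    B = {a: m for a in alphabet}       # characters absent from the shift table
    for j in range(m - 1):             # rightmost occurrence in P[0:m-1] wins
        B[P[j]] = m - 1 - j
    def f(w):
        i = 0
        while i < m and w[m - 1 - i] == P[m - 1 - i]:
            i += 1
        return m if i == m else i + 1  # a failed comparison counts too
    def g(w):
        return B[w[-1]]
    return f, g

def bmh_mu(P, alphabet, iters=500):
    """mu^(S) = mu_f / mu_g on the full (unreduced) window chain over A^m,
    assuming iid uniform text."""
    m, q = len(P), len(alphabet)
    f, g = bmh_window_stats(P, alphabet)
    windows = ["".join(w) for w in product(alphabet, repeat=m)]
    index = {w: i for i, w in enumerate(windows)}
    pi = [1.0 / len(windows)] * len(windows)   # distribution of W_1
    for _ in range(iters):                     # power iteration: pi <- pi P
        new = [0.0] * len(windows)
        for w, pw in zip(windows, pi):
            shift = g(w)
            # the next window keeps w[shift:] and appends `shift` fresh chars
            for tail in product(alphabet, repeat=shift):
                new[index[w[shift:] + "".join(tail)]] += pw / q ** shift
        pi = new
    mu_f = sum(f(w) * pw for w, pw in zip(windows, pi))
    mu_g = sum(g(w) * pw for w, pw in zip(windows, pi))
    return mu_f / mu_g

mu = bmh_mu("abaa", "abc")
```

Since 1 ≤ f, g ≤ m, any such µ^(S) necessarily lies between 1/m and m; the exponential q^m size of this unreduced chain is exactly the drawback that the O(m²) partition Γ_b avoids.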

and m_{α,β} is the expected time of first entrance into state β starting from state α. The matrix M = {m_{α,β}} can be obtained from P̃ via

    M = (I − Z + E Z_dg) D,

where I is the identity matrix, Z is the fundamental matrix for P̃, E is the matrix with all entries equal to 1, Z_dg results from Z by setting the off-diagonal entries to 0, and D is the diagonal matrix with i-th entry 1/π̃_i (see Kemeny and Snell [10]).

5 Evaluation of the constants of the limiting mean and variance

In this section we assume that each t_i is uniformly distributed on A = {1, ..., q}. Averages are taken over all patterns of the same length. First, we compute the constant µ^(S) for the BM algorithm. Figure 1 shows the maximum, average and minimum values of µ^(S) for m = 2 to 10 and q = 2, 4, 8.

Figure 1: The maximum, average and minimum values of µ^(S) for the BM algorithm (m = 2 to 10; q = 2, 4, 8).

Patterns attaining the minimum constant are of the form S = 11⋯1, while patterns attaining the maximum constant are shown in the table below.
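The Kemeny and Snell identity above is easy to check numerically. The sketch below does so for a hypothetical two-state chain (our own toy example, not one of the paper's window chains), whose first-entrance times can also be computed by hand as geometric waiting times:

```python
def mean_first_entrance(P_mat, pi):
    """M = (I - Z + E Z_dg) D, with Z = (I - P + Pi)^(-1) the fundamental
    matrix (Pi has every row equal to pi), E the all-ones matrix, Z_dg the
    diagonal part of Z, and D = diag(1/pi_i). Hardcoded for 2x2 chains so
    the sketch needs no linear-algebra library."""
    # A = I - P + Pi
    A = [[(1.0 if r == c else 0.0) - P_mat[r][c] + pi[c] for c in range(2)]
         for r in range(2)]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    Z = [[A[1][1] / det, -A[0][1] / det],      # 2x2 inverse of A
         [-A[1][0] / det, A[0][0] / det]]
    # Entrywise: m_{r,c} = (1{r=c} - Z[r][c] + Z[c][c]) / pi[c]
    return [[((1.0 if r == c else 0.0) - Z[r][c] + Z[c][c]) / pi[c]
             for c in range(2)] for r in range(2)]
```

For P = [[1/2, 1/2], [1/4, 3/4]] with π = (1/3, 2/3), the formula gives M = [[3, 2], [4, 1.5]]: the diagonal entries are the mean return times 1/π_i, and m_{1,2} = 1/0.5 = 2, m_{2,1} = 1/0.25 = 4 agree with the direct geometric-waiting-time computation.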

[Table: patterns attaining the maximum value of µ^(S), for q = 2, 4, 8; the entries are not recoverable from this copy.]

Let

    σ = ( (1/q^m) Σ_S (µ^(S) − µ)² )^{1/2},

where µ is the average, be the standard deviation of µ^(S). The values of σ are shown in Figure 2.

Figure 2: The standard deviation of the linearity constant for the BM algorithm (q = 2, 4, 8).

We compare our theoretical result with a recent experimental result [12] (see Figure 3). In those experiments, the text consists of 150,000 randomly generated characters, and 50 randomly generated patterns of each length are searched for. The experimental result is very close to our theoretical result for m ≤ 10 and q = 8. The closeness may be explained by the graphs in Figures 2 and 4.

Note that our approach becomes less efficient when computing the exact constant for longer patterns, say of length > 20. However, a good estimate of the constant can be obtained by the following procedure. The idea is to use a property of the good-suffix rule: there is a K such that G(i) is constant for all i ≥ K (usually K ≈ log_q m). Note that the bad-character shift is always less than or equal to G(K). Thus the class [s_K ⋯ s_1] has only one value under g, namely G(K). Choose m_0 ≥ K. We consider the truncated pattern s_{m_0} ⋯ s_1, find the corresponding state space Γ and compute the invariant probability π.

Figure 3: The experimental C_n/n vs. the average of the theoretical constant µ^(S) for the BM algorithm in the cases q = 2 and q = 8.

Note that every class of Γ has only one value under g and only one value under f, except for the class α_0 = [s_{m_0} ⋯ s_1]. Thus

    µ_g = Σ_{α∈Γ} g(α) π_α,  and  µ_f ≤ Σ_{α∈Γ} f(α) π_α + (m − m_0 − 1) π_{α_0}.

Note that π_{α_0} = q^{−m_0} µ_g ≤ q^{−m_0} m_0. We can choose a suitable m_0 to control the order of the error. We compute the constant for each pattern individually, so the number of patterns cannot be too large. In the right-hand part of Figure 3, we use random samples of 2000 patterns to estimate the average of µ^(S).

Finally, we compute the value of B^(S) for the BM and the BMH algorithm for m = 2 to 8 in the cases q = 2, 4, 8. The results are presented in Figures 4 and 5. Note that B^(S) = 0 in the special case S = 12 and A = {1, 2}.
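An experiment in the spirit of [12] takes only a few lines. The sketch below is our own BMH implementation with comparison counting (the text length mirrors the setup described above, and the q = 8 alphabet and pattern are our choices); the observed ratio C_n/n can then be set against the theoretical constants:

```python
import random

def bmh_search(T, P):
    """Boyer-Moore-Horspool: scan each window right to left; afterwards
    shift by B(a), where a is the window's right-most text character.
    Returns (occurrence positions, number of character comparisons)."""
    m, n = len(P), len(T)
    B = {}
    for j in range(m - 1):               # rightmost occurrence wins
        B[P[j]] = m - 1 - j              # B(a) for characters in s_2..s_m
    occ, comparisons, pos = [], 0, 0
    while pos <= n - m:
        i = 0
        while i < m and T[pos + m - 1 - i] == P[m - 1 - i]:
            comparisons += 1
            i += 1
        if i < m:
            comparisons += 1             # the failed comparison also counts
        else:
            occ.append(pos)              # all m characters matched
        pos += B.get(T[pos + m - 1], m)  # characters not in the pattern: m
    return occ, comparisons

# A small Monte Carlo run on iid uniform text (q = 8):
random.seed(1)
T = "".join(random.choice("abcdefgh") for _ in range(150_000))
_, C_n = bmh_search(T, "abcd")
ratio = C_n / len(T)                     # empirical C_n / n
```

On the worked example of Section 2, bmh_search("acabaacca", "abaa") finds the single occurrence at (0-based) position 2 in 6 comparisons, matching the three windows traced there.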

Figure 4: The maximum, average and minimum values of B^(S) for the BM algorithm (m = 2 to 8; q = 2, 4, 8).

6 Conclusion

In this paper we investigated only the original BM and BMH algorithms. Our approach may be readily modified for other variants of the BM algorithm. The extension from iid to Markovian input also seems straightforward; however, the transition matrix remains complex, and a further simplification of µ^(S) and B^(S) seems difficult. We conclude by mentioning some interesting extensions for the BM algorithm.

Random pattern. Evaluate the constants µ and B under the assumption that the pattern is random with a fixed length. Characterize the asymptotic behavior as the pattern length m → ∞ (while remaining small compared with n).

Maximum and minimum. Investigate the asymptotic behavior of the maximum and minimum values of µ^(S), and the combinatorial properties of the patterns attaining these values.

Large deviations. Derive estimates for large-deviation probabilities via the constructed Markov chain.

References

[1] Baeza-Yates, R.A., Gonnet, G.H. and Régnier, M. Analysis of Boyer-Moore type string searching algorithms. Proc. 1st ACM-SIAM Symposium on Discrete Algorithms, San Francisco (1990).

Figure 5: The maximum, average and minimum values of B^(S) for the BMH algorithm (m = 2 to 8; q = 2, 4, 8).

[2] Baeza-Yates, R.A. and Régnier, M. Average running time of the Boyer-Moore-Horspool algorithm. Theoretical Computer Science 92 (1992).

[3] Barth, G. An analytical comparison of two string matching algorithms. Information Processing Letters 18 (1984).

[4] Bolthausen, E. The Berry-Esseen theorem for functionals of discrete Markov chains. Z. Wahrsch. Verw. Gebiete 54 (1980), no. 1.

[5] Boyer, R.S. and Moore, J.S. A fast string searching algorithm. Comm. ACM 20 (1977).

[6] Charras, C. and Lecroq, T. Handbook of exact string-matching algorithms. lecroq/string/.

[7] Chung, K.L. Markov chains with stationary transition probabilities. Second edition, Springer-Verlag, New York.

[8] Cole, R. Tight bounds on the complexity of the Boyer-Moore string matching algorithm. SIAM J. Comput. 23 (1994).

[9] Flajolet, P., Szpankowski, W. and Vallée, B. Hidden word statistics. Preprint (2002).

[10] Kemeny, J.G. and Snell, J.L. Finite Markov chains. Van Nostrand, 1960; Springer.

[11] Mahmoud, H.M., Smythe, R.T. and Régnier, M. Analysis of the Boyer-Moore-Horspool string-matching heuristic. Random Structures Algorithms 10 (1997).

[12] Michailidis, P.D. and Margaritis, K.G. On-line string matching algorithms: survey and experimental results. Intern. J. Computer Math. 76 (2001).

[13] Régnier, M. Knuth-Morris-Pratt algorithm: an analysis. Mathematical Foundations of Computer Science 1989 (Porabka, Poland, 1989), Lecture Notes in Computer Science, Vol. 379, Springer, Berlin, 1989.

[14] Smythe, R.T. The Boyer-Moore-Horspool heuristic with Markovian input. Random Structures Algorithms 18 (2001).


More information

Combinatorial Proof of the Hot Spot Theorem

Combinatorial Proof of the Hot Spot Theorem Combinatorial Proof of the Hot Spot Theorem Ernie Croot May 30, 2006 1 Introduction A problem which has perplexed mathematicians for a long time, is to decide whether the digits of π are random-looking,

More information

LAWS OF LARGE NUMBERS AND TAIL INEQUALITIES FOR RANDOM TRIES AND PATRICIA TREES

LAWS OF LARGE NUMBERS AND TAIL INEQUALITIES FOR RANDOM TRIES AND PATRICIA TREES LAWS OF LARGE NUMBERS AND TAIL INEQUALITIES FOR RANDOM TRIES AND PATRICIA TREES Luc Devroye School of Computer Science McGill University Montreal, Canada H3A 2K6 luc@csmcgillca June 25, 2001 Abstract We

More information

Multiple Choice Tries and Distributed Hash Tables

Multiple Choice Tries and Distributed Hash Tables Multiple Choice Tries and Distributed Hash Tables Luc Devroye and Gabor Lugosi and Gahyun Park and W. Szpankowski January 3, 2007 McGill University, Montreal, Canada U. Pompeu Fabra, Barcelona, Spain U.

More information

Random Bernstein-Markov factors

Random Bernstein-Markov factors Random Bernstein-Markov factors Igor Pritsker and Koushik Ramachandran October 20, 208 Abstract For a polynomial P n of degree n, Bernstein s inequality states that P n n P n for all L p norms on the unit

More information

arxiv: v1 [cs.ds] 9 Apr 2018

arxiv: v1 [cs.ds] 9 Apr 2018 From Regular Expression Matching to Parsing Philip Bille Technical University of Denmark phbi@dtu.dk Inge Li Gørtz Technical University of Denmark inge@dtu.dk arxiv:1804.02906v1 [cs.ds] 9 Apr 2018 Abstract

More information

Heuristics for The Whitehead Minimization Problem

Heuristics for The Whitehead Minimization Problem Heuristics for The Whitehead Minimization Problem R.M. Haralick, A.D. Miasnikov and A.G. Myasnikov November 11, 2004 Abstract In this paper we discuss several heuristic strategies which allow one to solve

More information

An Adaptive Finite-State Automata Application to the problem of Reducing the Number of States in Approximate String Matching

An Adaptive Finite-State Automata Application to the problem of Reducing the Number of States in Approximate String Matching An Adaptive Finite-State Automata Application to the problem of Reducing the Number of States in Approximate String Matching Ricardo Luis de Azevedo da Rocha 1, João José Neto 1 1 Laboratório de Linguagens

More information

Finding Consensus Strings With Small Length Difference Between Input and Solution Strings

Finding Consensus Strings With Small Length Difference Between Input and Solution Strings Finding Consensus Strings With Small Length Difference Between Input and Solution Strings Markus L. Schmid Trier University, Fachbereich IV Abteilung Informatikwissenschaften, D-54286 Trier, Germany, MSchmid@uni-trier.de

More information

The NFA Segments Scan Algorithm

The NFA Segments Scan Algorithm The NFA Segments Scan Algorithm Omer Barkol, David Lehavi HP Laboratories HPL-2014-10 Keyword(s): formal languages; regular expression; automata Abstract: We present a novel way for parsing text with non

More information

A Multiple Sliding Windows Approach to Speed Up String Matching Algorithms

A Multiple Sliding Windows Approach to Speed Up String Matching Algorithms A Multiple Sliding Windows Approach to Speed Up String Matching Algorithms Simone Faro Thierry Lecroq University of Catania, Italy University of Rouen, LITIS EA 4108, France Symposium on Eperimental Algorithms

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

String Matching. Jayadev Misra The University of Texas at Austin December 5, 2003

String Matching. Jayadev Misra The University of Texas at Austin December 5, 2003 String Matching Jayadev Misra The University of Texas at Austin December 5, 2003 Contents 1 Introduction 1 2 Rabin-Karp Algorithm 3 3 Knuth-Morris-Pratt Algorithm 5 3.1 Informal Description.........................

More information

How Many Holes Can an Unbordered Partial Word Contain?

How Many Holes Can an Unbordered Partial Word Contain? How Many Holes Can an Unbordered Partial Word Contain? F. Blanchet-Sadri 1, Emily Allen, Cameron Byrum 3, and Robert Mercaş 4 1 University of North Carolina, Department of Computer Science, P.O. Box 6170,

More information

Lecture 4: September 19

Lecture 4: September 19 CSCI1810: Computational Molecular Biology Fall 2017 Lecture 4: September 19 Lecturer: Sorin Istrail Scribe: Cyrus Cousins Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes

More information

CS 455/555: Mathematical preliminaries

CS 455/555: Mathematical preliminaries CS 455/555: Mathematical preliminaries Stefan D. Bruda Winter 2019 SETS AND RELATIONS Sets: Operations: intersection, union, difference, Cartesian product Big, powerset (2 A ) Partition (π 2 A, π, i j

More information

INF 4130 / /8-2014

INF 4130 / /8-2014 INF 4130 / 9135 26/8-2014 Mandatory assignments («Oblig-1», «-2», and «-3»): All three must be approved Deadlines around: 25. sept, 25. oct, and 15. nov Other courses on similar themes: INF-MAT 3370 INF-MAT

More information

Maximal Unbordered Factors of Random Strings arxiv: v1 [cs.ds] 14 Apr 2017

Maximal Unbordered Factors of Random Strings arxiv: v1 [cs.ds] 14 Apr 2017 Maximal Unbordered Factors of Random Strings arxiv:1704.04472v1 [cs.ds] 14 Apr 2017 Patrick Hagge Cording 1 and Mathias Bæk Tejs Knudsen 2 1 DTU Compute, Technical University of Denmark, phaco@dtu.dk 2

More information

On Boyer-Moore Preprocessing

On Boyer-Moore Preprocessing On Boyer-Moore reprocessing Heikki Hyyrö Department of Computer Sciences University of Tampere, Finland Heikki.Hyyro@cs.uta.fi Abstract robably the two best-known exact string matching algorithms are the

More information

Efficient Sequential Algorithms, Comp309

Efficient Sequential Algorithms, Comp309 Efficient Sequential Algorithms, Comp309 University of Liverpool 2010 2011 Module Organiser, Igor Potapov Part 2: Pattern Matching References: T. H. Cormen, C. E. Leiserson, R. L. Rivest Introduction to

More information

Notes 1 : Measure-theoretic foundations I

Notes 1 : Measure-theoretic foundations I Notes 1 : Measure-theoretic foundations I Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Wil91, Section 1.0-1.8, 2.1-2.3, 3.1-3.11], [Fel68, Sections 7.2, 8.1, 9.6], [Dur10,

More information

String Matching with Variable Length Gaps

String Matching with Variable Length Gaps String Matching with Variable Length Gaps Philip Bille, Inge Li Gørtz, Hjalte Wedel Vildhøj, and David Kofoed Wind Technical University of Denmark Abstract. We consider string matching with variable length

More information

Avoiding Approximate Squares

Avoiding Approximate Squares Avoiding Approximate Squares Narad Rampersad School of Computer Science University of Waterloo 13 June 2007 (Joint work with Dalia Krieger, Pascal Ochem, and Jeffrey Shallit) Narad Rampersad (University

More information

By allowing randomization in the verification process, we obtain a class known as MA.

By allowing randomization in the verification process, we obtain a class known as MA. Lecture 2 Tel Aviv University, Spring 2006 Quantum Computation Witness-preserving Amplification of QMA Lecturer: Oded Regev Scribe: N. Aharon In the previous class, we have defined the class QMA, which

More information

The 123 Theorem and its extensions

The 123 Theorem and its extensions The 123 Theorem and its extensions Noga Alon and Raphael Yuster Department of Mathematics Raymond and Beverly Sackler Faculty of Exact Sciences Tel Aviv University, Tel Aviv, Israel Abstract It is shown

More information

EXPECTED WORST-CASE PARTIAL MATCH IN RANDOM QUADTRIES

EXPECTED WORST-CASE PARTIAL MATCH IN RANDOM QUADTRIES EXPECTED WORST-CASE PARTIAL MATCH IN RANDOM QUADTRIES Luc Devroye School of Computer Science McGill University Montreal, Canada H3A 2K6 luc@csmcgillca Carlos Zamora-Cura Instituto de Matemáticas Universidad

More information

Proofs, Strings, and Finite Automata. CS154 Chris Pollett Feb 5, 2007.

Proofs, Strings, and Finite Automata. CS154 Chris Pollett Feb 5, 2007. Proofs, Strings, and Finite Automata CS154 Chris Pollett Feb 5, 2007. Outline Proofs and Proof Strategies Strings Finding proofs Example: For every graph G, the sum of the degrees of all the nodes in G

More information

Gärtner-Ellis Theorem and applications.

Gärtner-Ellis Theorem and applications. Gärtner-Ellis Theorem and applications. Elena Kosygina July 25, 208 In this lecture we turn to the non-i.i.d. case and discuss Gärtner-Ellis theorem. As an application, we study Curie-Weiss model with

More information

CPT+: A Compact Model for Accurate Sequence Prediction

CPT+: A Compact Model for Accurate Sequence Prediction CPT+: A Compact Model for Accurate Sequence Prediction Ted Gueniche 1, Philippe Fournier-Viger 1, Rajeev Raman 2, Vincent S. Tseng 3 1 University of Moncton, Canada 2 University of Leicester, UK 3 National

More information

Multiple Pattern Matching

Multiple Pattern Matching Multiple Pattern Matching Stephen Fulwider and Amar Mukherjee College of Engineering and Computer Science University of Central Florida Orlando, FL USA Email: {stephen,amar}@cs.ucf.edu Abstract In this

More information

Asymptotic and Exact Poissonized Variance in the Analysis of Random Digital Trees (joint with Hsien-Kuei Hwang and Vytas Zacharovas)

Asymptotic and Exact Poissonized Variance in the Analysis of Random Digital Trees (joint with Hsien-Kuei Hwang and Vytas Zacharovas) Asymptotic and Exact Poissonized Variance in the Analysis of Random Digital Trees (joint with Hsien-Kuei Hwang and Vytas Zacharovas) Michael Fuchs Institute of Applied Mathematics National Chiao Tung University

More information

University of Chicago Autumn 2003 CS Markov Chain Monte Carlo Methods. Lecture 7: November 11, 2003 Estimating the permanent Eric Vigoda

University of Chicago Autumn 2003 CS Markov Chain Monte Carlo Methods. Lecture 7: November 11, 2003 Estimating the permanent Eric Vigoda University of Chicago Autumn 2003 CS37101-1 Markov Chain Monte Carlo Methods Lecture 7: November 11, 2003 Estimating the permanent Eric Vigoda We refer the reader to Jerrum s book [1] for the analysis

More information

The More for Less -Paradox in Transportation Problems with Infinite-Dimensional Supply and Demand Vectors

The More for Less -Paradox in Transportation Problems with Infinite-Dimensional Supply and Demand Vectors The More for Less -Paradox in Transportation Problems with Infinite-Dimensional Supply and Demand Vectors Ingo Althoefer, Andreas Schaefer April 22, 2004 Faculty of Mathematics and Computer Science Friedrich-Schiller-University

More information

Pattern Matching. a b a c a a b. a b a c a b. a b a c a b. Pattern Matching Goodrich, Tamassia

Pattern Matching. a b a c a a b. a b a c a b. a b a c a b. Pattern Matching Goodrich, Tamassia Pattern Matching a b a c a a b 1 4 3 2 Pattern Matching 1 Brute-Force Pattern Matching ( 11.2.1) The brute-force pattern matching algorithm compares the pattern P with the text T for each possible shift

More information

Markov chains and the number of occurrences of a word in a sequence ( , 11.1,2,4,6)

Markov chains and the number of occurrences of a word in a sequence ( , 11.1,2,4,6) Markov chains and the number of occurrences of a word in a sequence (4.5 4.9,.,2,4,6) Prof. Tesler Math 283 Fall 208 Prof. Tesler Markov Chains Math 283 / Fall 208 / 44 Locating overlapping occurrences

More information

A mathematical model for a copolymer in an emulsion

A mathematical model for a copolymer in an emulsion J Math Chem (2010) 48:83 94 DOI 10.1007/s10910-009-9564-y ORIGINAL PAPER A mathematical model for a copolymer in an emulsion F. den Hollander N. Pétrélis Received: 3 June 2007 / Accepted: 22 April 2009

More information

Computing repetitive structures in indeterminate strings

Computing repetitive structures in indeterminate strings Computing repetitive structures in indeterminate strings Pavlos Antoniou 1, Costas S. Iliopoulos 1, Inuka Jayasekera 1, Wojciech Rytter 2,3 1 Dept. of Computer Science, King s College London, London WC2R

More information

Quantum pattern matching fast on average

Quantum pattern matching fast on average Quantum pattern matching fast on average Ashley Montanaro Department of Computer Science, University of Bristol, UK 12 January 2015 Pattern matching In the traditional pattern matching problem, we seek

More information

Approximate Counting via the Poisson-Laplace-Mellin Method (joint with Chung-Kuei Lee and Helmut Prodinger)

Approximate Counting via the Poisson-Laplace-Mellin Method (joint with Chung-Kuei Lee and Helmut Prodinger) Approximate Counting via the Poisson-Laplace-Mellin Method (joint with Chung-Kuei Lee and Helmut Prodinger) Michael Fuchs Department of Applied Mathematics National Chiao Tung University Hsinchu, Taiwan

More information

Competitive Weighted Matching in Transversal Matroids

Competitive Weighted Matching in Transversal Matroids Competitive Weighted Matching in Transversal Matroids Nedialko B. Dimitrov and C. Greg Plaxton University of Texas at Austin 1 University Station C0500 Austin, Texas 78712 0233 {ned,plaxton}@cs.utexas.edu

More information

The Number of Runs in a String: Improved Analysis of the Linear Upper Bound

The Number of Runs in a String: Improved Analysis of the Linear Upper Bound The Number of Runs in a String: Improved Analysis of the Linear Upper Bound Wojciech Rytter Instytut Informatyki, Uniwersytet Warszawski, Banacha 2, 02 097, Warszawa, Poland Department of Computer Science,

More information

String Range Matching

String Range Matching String Range Matching Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi Department of Computer Science, University of Helsinki Helsinki, Finland firstname.lastname@cs.helsinki.fi Abstract. Given strings

More information

arxiv: v2 [math.pr] 4 Sep 2017

arxiv: v2 [math.pr] 4 Sep 2017 arxiv:1708.08576v2 [math.pr] 4 Sep 2017 On the Speed of an Excited Asymmetric Random Walk Mike Cinkoske, Joe Jackson, Claire Plunkett September 5, 2017 Abstract An excited random walk is a non-markovian

More information

Decomposing oriented graphs into transitive tournaments

Decomposing oriented graphs into transitive tournaments Decomposing oriented graphs into transitive tournaments Raphael Yuster Department of Mathematics University of Haifa Haifa 39105, Israel Abstract For an oriented graph G with n vertices, let f(g) denote

More information

The Kemeny constant of a Markov chain

The Kemeny constant of a Markov chain The Kemeny constant of a Markov chain Peter Doyle Version 1.0 dated 14 September 009 GNU FDL Abstract Given an ergodic finite-state Markov chain, let M iw denote the mean time from i to equilibrium, meaning

More information

6.842 Randomness and Computation March 3, Lecture 8

6.842 Randomness and Computation March 3, Lecture 8 6.84 Randomness and Computation March 3, 04 Lecture 8 Lecturer: Ronitt Rubinfeld Scribe: Daniel Grier Useful Linear Algebra Let v = (v, v,..., v n ) be a non-zero n-dimensional row vector and P an n n

More information

Pattern Matching in Constrained Sequences

Pattern Matching in Constrained Sequences Pattern Matching in Constrained Sequences Yongwook Choi and Wojciech Szpankowski Department of Computer Science Purdue University W. Lafayette, IN 47907 U.S.A. Email: ywchoi@purdue.edu, spa@cs.purdue.edu

More information

Optimal spaced seeds for faster approximate string matching

Optimal spaced seeds for faster approximate string matching Optimal spaced seeds for faster approximate string matching Martin Farach-Colton Gad M. Landau S. Cenk Sahinalp Dekel Tsur Abstract Filtering is a standard technique for fast approximate string matching

More information

THE N-VALUE GAME OVER Z AND R

THE N-VALUE GAME OVER Z AND R THE N-VALUE GAME OVER Z AND R YIDA GAO, MATT REDMOND, ZACH STEWARD Abstract. The n-value game is an easily described mathematical diversion with deep underpinnings in dynamical systems analysis. We examine

More information

Compression in the Space of Permutations

Compression in the Space of Permutations Compression in the Space of Permutations Da Wang Arya Mazumdar Gregory Wornell EECS Dept. ECE Dept. Massachusetts Inst. of Technology Cambridge, MA 02139, USA {dawang,gww}@mit.edu University of Minnesota

More information

Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio ( )

Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio ( ) Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio (2014-2015) Etienne Tanré - Olivier Faugeras INRIA - Team Tosca October 22nd, 2014 E. Tanré (INRIA - Team Tosca) Mathematical

More information

Estimates for probabilities of independent events and infinite series

Estimates for probabilities of independent events and infinite series Estimates for probabilities of independent events and infinite series Jürgen Grahl and Shahar evo September 9, 06 arxiv:609.0894v [math.pr] 8 Sep 06 Abstract This paper deals with finite or infinite sequences

More information

Likelihood Analysis of Gaussian Graphical Models

Likelihood Analysis of Gaussian Graphical Models Faculty of Science Likelihood Analysis of Gaussian Graphical Models Ste en Lauritzen Department of Mathematical Sciences Minikurs TUM 2016 Lecture 2 Slide 1/43 Overview of lectures Lecture 1 Markov Properties

More information

Optimal spaced seeds for faster approximate string matching

Optimal spaced seeds for faster approximate string matching Optimal spaced seeds for faster approximate string matching Martin Farach-Colton Gad M. Landau S. Cenk Sahinalp Dekel Tsur Abstract Filtering is a standard technique for fast approximate string matching

More information

The Jordan canonical form

The Jordan canonical form The Jordan canonical form Francisco Javier Sayas University of Delaware November 22, 213 The contents of these notes have been translated and slightly modified from a previous version in Spanish. Part

More information

SORTING SUFFIXES OF TWO-PATTERN STRINGS.

SORTING SUFFIXES OF TWO-PATTERN STRINGS. International Journal of Foundations of Computer Science c World Scientific Publishing Company SORTING SUFFIXES OF TWO-PATTERN STRINGS. FRANTISEK FRANEK and WILLIAM F. SMYTH Algorithms Research Group,

More information

On the static assignment to parallel servers

On the static assignment to parallel servers On the static assignment to parallel servers Ger Koole Vrije Universiteit Faculty of Mathematics and Computer Science De Boelelaan 1081a, 1081 HV Amsterdam The Netherlands Email: koole@cs.vu.nl, Url: www.cs.vu.nl/

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 3 9/10/2008 CONDITIONING AND INDEPENDENCE

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 3 9/10/2008 CONDITIONING AND INDEPENDENCE MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 3 9/10/2008 CONDITIONING AND INDEPENDENCE Most of the material in this lecture is covered in [Bertsekas & Tsitsiklis] Sections 1.3-1.5

More information

An asymptotically tight bound on the adaptable chromatic number

An asymptotically tight bound on the adaptable chromatic number An asymptotically tight bound on the adaptable chromatic number Michael Molloy and Giovanna Thron University of Toronto Department of Computer Science 0 King s College Road Toronto, ON, Canada, M5S 3G

More information

Finding all covers of an indeterminate string in O(n) time on average

Finding all covers of an indeterminate string in O(n) time on average Finding all covers of an indeterminate string in O(n) time on average Md. Faizul Bari, M. Sohel Rahman, and Rifat Shahriyar Department of Computer Science and Engineering Bangladesh University of Engineering

More information

Containment restrictions

Containment restrictions Containment restrictions Tibor Szabó Extremal Combinatorics, FU Berlin, WiSe 207 8 In this chapter we switch from studying constraints on the set operation intersection, to constraints on the set relation

More information

String Matching. Thanks to Piotr Indyk. String Matching. Simple Algorithm. for s 0 to n-m. Match 0. for j 1 to m if T[s+j] P[j] then

String Matching. Thanks to Piotr Indyk. String Matching. Simple Algorithm. for s 0 to n-m. Match 0. for j 1 to m if T[s+j] P[j] then String Matching Thanks to Piotr Indyk String Matching Input: Two strings T[1 n] and P[1 m], containing symbols from alphabet Σ Goal: find all shifts 0 s n-m such that T[s+1 s+m]=p Example: Σ={,a,b,,z}

More information

Online Interval Coloring and Variants

Online Interval Coloring and Variants Online Interval Coloring and Variants Leah Epstein 1, and Meital Levy 1 Department of Mathematics, University of Haifa, 31905 Haifa, Israel. Email: lea@math.haifa.ac.il School of Computer Science, Tel-Aviv

More information

Dyadic diaphony of digital sequences

Dyadic diaphony of digital sequences Dyadic diaphony of digital sequences Friedrich Pillichshammer Abstract The dyadic diaphony is a quantitative measure for the irregularity of distribution of a sequence in the unit cube In this paper we

More information

Negative Examples for Sequential Importance Sampling of Binary Contingency Tables

Negative Examples for Sequential Importance Sampling of Binary Contingency Tables Negative Examples for Sequential Importance Sampling of Binary Contingency Tables Ivona Bezáková Alistair Sinclair Daniel Štefankovič Eric Vigoda arxiv:math.st/0606650 v2 30 Jun 2006 April 2, 2006 Keywords:

More information

SMSTC (2007/08) Probability.

SMSTC (2007/08) Probability. SMSTC (27/8) Probability www.smstc.ac.uk Contents 12 Markov chains in continuous time 12 1 12.1 Markov property and the Kolmogorov equations.................... 12 2 12.1.1 Finite state space.................................

More information

Discovering Most Classificatory Patterns for Very Expressive Pattern Classes

Discovering Most Classificatory Patterns for Very Expressive Pattern Classes Discovering Most Classificatory Patterns for Very Expressive Pattern Classes Masayuki Takeda 1,2, Shunsuke Inenaga 1,2, Hideo Bannai 3, Ayumi Shinohara 1,2, and Setsuo Arikawa 1 1 Department of Informatics,

More information

A conjecture on the alphabet size needed to produce all correlation classes of pairs of words

A conjecture on the alphabet size needed to produce all correlation classes of pairs of words A conjecture on the alphabet size needed to produce all correlation classes of pairs of words Paul Leopardi Thanks: Jörg Arndt, Michael Barnsley, Richard Brent, Sylvain Forêt, Judy-anne Osborn. Mathematical

More information

Explicit expressions for the moments of the size of an (s, s + 1)-core partition with distinct parts

Explicit expressions for the moments of the size of an (s, s + 1)-core partition with distinct parts arxiv:1608.02262v2 [math.co] 23 Nov 2016 Explicit expressions for the moments of the size of an (s, s + 1)-core partition with distinct parts Anthony Zaleski November 24, 2016 Abstract For fixed s, the

More information

Random matrices: Distribution of the least singular value (via Property Testing)

Random matrices: Distribution of the least singular value (via Property Testing) Random matrices: Distribution of the least singular value (via Property Testing) Van H. Vu Department of Mathematics Rutgers vanvu@math.rutgers.edu (joint work with T. Tao, UCLA) 1 Let ξ be a real or complex-valued

More information

Markov Chains CK eqns Classes Hitting times Rec./trans. Strong Markov Stat. distr. Reversibility * Markov Chains

Markov Chains CK eqns Classes Hitting times Rec./trans. Strong Markov Stat. distr. Reversibility * Markov Chains Markov Chains A random process X is a family {X t : t T } of random variables indexed by some set T. When T = {0, 1, 2,... } one speaks about a discrete-time process, for T = R or T = [0, ) one has a continuous-time

More information