The NFA Segments Scan Algorithm

Omer Barkol, David Lehavi
HP Laboratories
HPL

Keyword(s): formal languages; regular expression; automata

Abstract: We present a novel way of parsing text with non-deterministic finite automata. For "real life" regular expressions and text, our algorithm scans only a fraction of the characters, and performs a small number of operations for each of these characters (for synthetic worst-case scenarios, it would perform worse than classical algorithms). Although there are similar approaches, our algorithm is far simpler and less resource consuming than the alternatives we are aware of.

External Posting Date: February 21, 2014 [Fulltext]
Internal Posting Date: February 21, 2014 [Fulltext]
Approved for External Publication

Copyright 2014 Hewlett-Packard Development Company, L.P.

The NFA segments scan algorithm

Omer Barkol and David Lehavi
HP Labs Israel

Abstract. We present a novel way of parsing text with non-deterministic finite automata. For real life regular expressions and text, our algorithm scans only a fraction of the characters, and performs a small number of operations for each of these characters (for synthetic worst-case scenarios, it would perform worse than classical algorithms). Although there are similar approaches, our algorithm is far simpler and less resource consuming than the alternatives we are aware of.

1 Introduction

The pattern matching problem calls for discovering whether a given string x is in a language L. In the case where L is the language of strings containing a given word as a substring, there are several fast (and by now classical) algorithms - see [BM] and [KMP]. In the case where L is the language of strings containing one word out of a given set, there are two (again, rather classical) algorithms - see [AC], [C-W]. In the case where L is a general regular language, we are aware of two approaches: a rather complicated approach presented in [WW], and a rather simple but resource consuming one presented in [Ke] (in which one has to maintain an entire suffix tree for each state in the automaton corresponding to the regular language). Our approach is somewhat similar to the one presented in [Ke], but it avoids the big memory overhead, and is easier to analyze and generalize.

Our method is motivated by our observations on the structure of real life regular expressions on the one hand, and the moral of the Boyer-Moore algorithm on the other: many real life regular expressions are composed of contiguous words connected by either "or" operations or Kleene-* operations on a single node in the automaton. Adapting the Boyer-Moore philosophy, we match the words between the Kleene-*'s, and jump ahead in order to match the next word(s). We summarize this approach in the following principles:

- At any stage of the execution, one should hold all the possible sub-matches of the processed sub-string to the automaton.
- Regarding the non-deterministic finite automaton as a directed graph, instead of storing a sub-match as a path on the automaton, store the path's endpoints.
- Instead of advancing one character at a time over easily matched pieces of the string/automaton and matching a contiguous piece of the string to a path, one can jump and attempt to match another piece of the string to another part of the automaton, and then these paths should be glued.

2 Preliminaries

Notations: For each string x we denote the ith character of x by x_i. We denote by [i, j] the segment (or set) of integers {i, i+1, ..., j-1, j}. Given a non-deterministic automaton M = (Q, Σ, δ, q_0, {q_F}) we define a path on the automaton to be a map f : [i, j] → Q such that for each i ≤ k < j, f(k+1) ∈ δ(f(k), σ) for some σ ∈ Σ. We then say that f[i, j] is a path from q to q′ if f(i) = q and f(j) = q′. We define the distance between two states q, q′ ∈ Q:

    dist(q, q′) := min over f[i, j] a path from q to q′ of (j − i).

We assume that M does not admit sinks (i.e. nodes with self loops as the only outgoing edges).

Before adding some less standard definitions to support our algorithm, we present Thompson's algorithm as given in [Th]:

Algorithm 1: Thompson's algorithm
Input: x = (x_0, ..., x_{|x|-1}) ∈ Σ^{|x|}, M
Output: True if M matches x; False otherwise
 1  p ← 0
 2  Z ← {q_0}
 3  while p < |x| do
 4      if q_F ∈ Z then return True                         // match!
 5      Z ← {q′ | ∃q ∈ Z : q′ ∈ δ(q, x_p)}                   // crawl right
 6      p ← p + 1
 7      if Z = ∅ then return False                           // no active states
 8  end
 9  return False                                             // reached end of string

We now add a few non-standard definitions:

    φ(q) = True  ⇔  ∀σ ∈ Σ : q ∈ δ(q, σ), or q = q_F.
    ψ(q) = True  ⇔  ∃x ∈ Σ*, x ≠ ε : q ∈ δ(q, x), or q = q_0, or q = q_F.

Example 1. If our automaton is the one corresponding to the regular expression .*a.*bca, described by the following diagram:

    [Diagram: start → q_0 --a--> q_1 --b--> q_2 --c--> q_3 --a--> q_F, with a '.' self loop on each of q_0 and q_1]

then φ(q) = ψ(q) = True if and only if q ∈ {q_0, q_1, q_F}.
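For concreteness, the following is a minimal executable sketch of Algorithm 1 over the automaton of Example 1. The encoding (the names STATES, ALPHABET, delta and thompson_match) is our own illustration and not part of the paper; any per-character set-of-successors representation of δ would do.

    # Hypothetical encoding of the Example 1 NFA for the regular expression ".*a.*bca".
    STATES = ["q0", "q1", "q2", "q3", "qF"]
    ALPHABET = set("abc")          # used by the preprocessing sketches further below

    def delta(q, c):
        """delta(q, c): the set of successor states of q on character c."""
        nxt = set()
        if q == "q0":
            nxt.add("q0")                      # '.' self loop
            if c == "a":
                nxt.add("q1")
        elif q == "q1":
            nxt.add("q1")                      # '.' self loop
            if c == "b":
                nxt.add("q2")
        elif q == "q2" and c == "c":
            nxt.add("q3")
        elif q == "q3" and c == "a":
            nxt.add("qF")
        return nxt

    def thompson_match(x, delta, q0="q0", qF="qF"):
        """Algorithm 1 (Thompson): advance the whole front Z one character at a
        time; the front is checked once more after the last character."""
        Z = {q0}
        for c in x:
            if qF in Z:
                return True                                    # match!
            Z = {q2 for q in Z for q2 in delta(q, c)}          # crawl right
            if not Z:
                return False                                   # no active states
        return qF in Z

    # thompson_match("xxabca", delta) -> True,  thompson_match("abcb", delta) -> False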

Intuitively, and keeping our motivation (presented in the introduction) in mind, one should think about nodes satisfying φ as nodes from which we always want to jump. There is no point in reading one character at a time if the only possible path we are trying to extend is one ending with a node satisfying φ(q); in a sense this is exactly the Boyer-Moore approach: if the match of the pattern fails and you have to start looking from the beginning of the pattern (which is the equivalent of a node satisfying φ(q)), you should jump as far as possible before starting. The intuition behind nodes satisfying ψ(q) is more heuristic: these are nodes which typically, and in real life, have a better chance to match. From an algorithmic point of view, the difference between nodes satisfying φ(q) and nodes satisfying ψ(q) is that once we jump from a node satisfying ψ but not φ, we have to glue back segments later.

We also define the following for every q ∈ Q:

    Ψ(q) := {q′ | ∃ path f : [1, m] → Q from q to q′ such that ∀j ∈ [2, m−1] : ¬ψ(f(j))},
    l(q) := max over q′ ∈ Ψ(q) of dist(q, q′).

E.g., in Example 1 above we would have

    Ψ(q_0) = {q_0, q_1},  Ψ(q_1) = {q_1, q_2, q_3, q_F},  Ψ(q_2) = {q_2, q_3, q_F},  Ψ(q_3) = {q_3, q_F},  Ψ(q_F) = {q_F},
    l(q_0) = 1,  l(q_1) = 3,  l(q_2) = 2,  l(q_3) = 1,  l(q_F) = 0.

Note that, as M does not admit sinks, l > 0 for any non-terminal state (either accepting or not). Note also that Ψ and l are computable using reverse BFS from q_F. The set Ψ(q) is simply the set of nodes of the automaton which are reachable from q along paths whose interior nodes do not satisfy ψ; these are the nodes to which we jump (i.e. start a new match of another path) if our current path ends with a node satisfying ψ(q), whereas l(q) is the size of the jump on the string.

Given a string x ∈ Σ*, and a finite disjoint union I = ∪_{i=1}^{t} [a_i, b_i] of ordered intervals inside ℕ, we define P_I to be the set of maps f : I → Q such that

    0 ∈ I ⟹ f(0) = q_0,
    ∀i : f(a_{i+1}) ∈ Ψ(f(b_i)),
    ∀k, i with k ∈ [a_i, b_i − 1] : f(k + 1) ∈ δ(f(k), x_k).

E.g. in the automaton from Example 1, given the string bad and writing functions as sets of ordered pairs, we have

    P_{[0,2] ∪ [5,5]} = { {(0, q_0), (1, q_0), (2, q_1), (5, q_1)},
                          {(0, q_0), (1, q_0), (2, q_1), (5, q_2)},
                          {(0, q_0), (1, q_0), (2, q_1), (5, q_3)},
                          {(0, q_0), (1, q_0), (2, q_1), (5, q_F)} }.

Finally, we define S_I to be the set of pairs of endpoints of the non-contiguous segments of the functions of P_I; i.e. for the automaton in Example 1 and the same string as above we would have

    S_{[0,2] ∪ [5,5]} = { {(q_0, q_1)}, {(q_0, q_1), (q_2, q_2)}, {(q_0, q_1), (q_3, q_3)}, {(q_0, q_1), (q_F, q_F)} }.
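The preprocessed data φ, ψ, Ψ and l can be obtained by straightforward graph searches on the automaton. The sketch below is one possible such preprocessing step, written against the delta/STATES/ALPHABET encoding of the previous sketch; the function names are ours, and the computation follows the definitions above rather than any particular implementation described in the paper.

    def succ(delta, states, alphabet):
        """All-character successor map: succ(...)[q] is the union of delta(q, sigma) over sigma."""
        return {q: {q2 for c in alphabet for q2 in delta(q, c)} for q in states}

    def phi(q, delta, alphabet, qF="qF"):
        """phi(q): q has a self loop on every character, or q is the accepting state."""
        return q == qF or all(q in delta(q, c) for c in alphabet)

    def psi(q, delta, states, alphabet, q0="q0", qF="qF"):
        """psi(q): q lies on a nonempty cycle, or q is the initial or the accepting state."""
        if q in (q0, qF):
            return True
        s = succ(delta, states, alphabet)
        seen, frontier = set(), set(s[q])
        while frontier:                         # BFS over the successors of q
            if q in frontier:
                return True
            seen |= frontier
            frontier = {w for u in frontier for w in s[u]} - seen
        return False

    def big_psi_and_l(q, delta, states, alphabet, q0="q0", qF="qF"):
        """Psi(q) and l(q): states reachable from q along paths whose interior nodes
        all fail psi, and the largest shortest-path distance from q to any of them."""
        s = succ(delta, states, alphabet)
        is_psi = {u: psi(u, delta, states, alphabet, q0, qF) for u in states}
        reach, frontier = {q}, [q]
        while frontier:                         # restricted reachability for Psi(q)
            nxt = []
            for u in frontier:
                if u != q and is_psi[u]:
                    continue                    # psi-nodes may end a path but not be interior
                for v in s[u]:
                    if v not in reach:
                        reach.add(v)
                        nxt.append(v)
            frontier = nxt
        dist, layer = {q: 0}, [q]               # unrestricted BFS distances from q
        while layer:
            nxt = []
            for u in layer:
                for v in s[u]:
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        nxt.append(v)
            layer = nxt
        return reach, max(dist[u] for u in reach)

    # With the Example 1 encoding: big_psi_and_l("q1", delta, STATES, ALPHABET)
    # returns ({'q1', 'q2', 'q3', 'qF'}, 3), i.e. Psi(q_1) and l(q_1) above.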

3 The Segments Scan Algorithm

We are now ready to present our segments scan algorithm. The algorithm implicitly uses interval unions I of only two intervals: [0, p] ∪ [p + b, p + e] (intuitively, p represents a point which Thompson's algorithm has surely reached, and the second segment is eventually supposed to glue to the first; we jump ahead in this fashion for the same reason we do so in the Boyer-Moore algorithm: if we fail, we want to do so while advancing as much as we can on the string). Instead of finding a compact representation of P_I we find a compact representation of S_I: the possible values of f(p) (intuitively: the points we jump from) will be represented by a union of two sets A, B, depending on whether the nodes satisfy φ or not. The pairs corresponding to the second interval of S_I will be represented by a set of pairs denoted S (continuing with our example from above, for S_{[0,2] ∪ [5,5]} we have S = {(q_1, q_1), (q_2, q_2), (q_3, q_3), (q_F, q_F)}). We denote π_2 S = {q | ∃q′ : (q′, q) ∈ S} (i.e. the projection on the second coordinate); we use the analogous notation π_1 for the projection on the first coordinate. We let O, O′ be two boolean oracles (note that usually these oracles are simple functions depending on S, ψ, φ - see Example 2 below):

Algorithm 2: The Segments Scan Algorithm
Input: x = (x_0, ..., x_{|x|-1}) ∈ Σ^{|x|}, M, preprocessed data φ, ψ, Ψ, l
Output: True if M matches x; False otherwise
 1  p, b, e ← 0
 2  S ← {(q_0, q_0)}
 3  A ← ∅
 4  B ← ∅
 5  while p + e ≤ |x| do
 6      if b = 0 then
 7          if q_F ∈ π_2 S then return True                              // match!
 8          if O or p = e = 0 then                                       // jump
 9              B ← π_2 S
10              A ← A ∪ {q ∈ B | φ(q)}
11              p ← p + e
12              b, e ← min over q′ ∈ A ∪ B of l(q′)
13              S ← {(q, q′) | q ∈ A ∪ B, q′ ∈ Ψ(q)}
14      if q_F ∉ π_2 S and e > b and O′ then                             // string end or crawl right
15          if p + e = |x| then return False                             // reached end of string
16          S ← {(q, q″) | ∃q′ : (q, q′) ∈ S, q″ ∈ δ(q′, x_{p+e})}
17          e ← e + 1
18      else                                                             // crawl left
19          b ← b − 1
20          S ← {(q″, q′) | ∃q : (q, q′) ∈ S, q ∈ δ(q″, x_{p+b})}
21          if b = 0 then S ← {(q, q′) ∈ S | q ∈ A ∪ B}                  // glue
22      if S = ∅ then return False                                       // no active segments
23      if ∀(q, q′) ∈ S : φ(q) then b ← 0
24  end
25  return False                                                         // passed end of string
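As an illustration, the jump step (lines 9-13) can be transcribed almost verbatim once φ, Ψ and l are available as lookup tables. The sketch below assumes dictionaries phi (state → bool), big_psi (state → set of states) and l (state → int), e.g. as produced by the preprocessing sketch above; it is not the full algorithm.

    def jump(S, A, p, e, phi, big_psi, l):
        """The jump step of Algorithm 2 (lines 9-13).  phi maps states to booleans,
        big_psi maps states to sets of states, and l maps states to integers."""
        B = {q2 for (_, q2) in S}                              # line 9:  B <- pi_2 S
        A = A | {q for q in B if phi[q]}                       # line 10
        p = p + e                                              # line 11
        b = e = min(l[q] for q in A | B)                       # line 12
        S = {(q, q2) for q in A | B for q2 in big_psi[q]}      # line 13
        return S, A, B, p, b, e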

Example 2. In our experiments (see Section 5) we considered three pairs of oracles:

    O_1 : b = 0 or ∃q ∈ π_2 S : ψ(q)        O′_1 : ∃q ∈ π_2 S : ψ(q)
    O_2 : b = 0                             O′_2 : ∃q ∈ π_2 S : ψ(q)
    O_3 : False                             O′_3 : True
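Read as predicates over the algorithm's state (taking the condition on π_2 S to mean that some state of the front satisfies ψ), the three pairs can be sketched as follows; the names below are our own:

    def some_front_state_is_psi(S, is_psi):
        """Some right endpoint of S (i.e. some state of pi_2 S) satisfies psi."""
        return any(is_psi[q2] for (_, q2) in S)

    # The three oracle pairs (O, O') of Example 2, as functions of (b, S, is_psi).
    ORACLE_PAIRS = {
        1: (lambda b, S, is_psi: b == 0 or some_front_state_is_psi(S, is_psi),
            lambda b, S, is_psi: some_front_state_is_psi(S, is_psi)),
        2: (lambda b, S, is_psi: b == 0,
            lambda b, S, is_psi: some_front_state_is_psi(S, is_psi)),
        3: (lambda b, S, is_psi: False,
            lambda b, S, is_psi: True),
    }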

Theorem 1. The segments scan algorithm returns the same truth value as Thompson's algorithm.

In order to prove the theorem we first claim two lemmas:

Lemma 1. Immediately after line 6 in Thompson's algorithm, the set Z is the set of endpoints of the paths in P_{[0,p]}.

The proof of this lemma is quite straightforward; see [Th] for the details. We turn to prove the invariant our algorithm maintains:

Lemma 2. Immediately after each of the lines 13, 17, 21, 23 in the segments scan algorithm:

    ∀f ∈ P_{[0,p] ∪ [p+b,p+e]} : either f(p) ∈ A ∪ B, or (b = 0 and f(p) ∈ A ∪ B ∪ π_1 S); moreover, (f(p + b), f(p + e)) ∈ S.
    ∀(q, q′) ∈ S : ∃f ∈ P_{[0,p] ∪ [p+b,p+e]} : f(p + b) = q, f(p + e) = q′, f(p) ∈ A ∪ B.

Proof. The initialized values are I = {[0, 0]}, p, b, e = 0, A, B = ∅, and S = {(q_0, q_0)}; thus, as f(0) = q_0 for all f ∈ P_{[0,0]}, the two properties hold. We will show that if the properties hold at the beginning of the loop (i.e. at line 6) then they hold at each of the other lines. We thus induct on the number of times we encounter each of these lines. We now separate into cases:

After line 13: We get to this line either by not entering the if statement of line 6, in which case all values are the same and thus the hypothesis holds, or we did, and then b = 0. We now use the fact that

    [0, p] ∪ [p + b, p + e] = [0, p] ∪ [p + 0, p + e] = [0, p + e],   so   P_{[0,p] ∪ [p+0,p+e]} = P_{[0,p+e]}.

By the induction hypothesis, (f(p + b), f(p + e)) ∈ S, and thus for the new p′ = p + e it holds that f(p′) = f(p + e) ∈ π_2 S = B. Also, for the new values b′, e′, note that b′ = e′, and that by the assignment of line 13, (f(p′ + b′), f(p′ + e′)) = (f(p′ + b′), f(p′ + b′)) ∈ S. Moreover, the only pairs (q, q′) that were added to S in these lines are (f(p′ + b′), f(p′ + b′)), where obviously f(p′) ∈ A ∪ B.

After line 17: By the induction hypothesis, the claim holds when the program was last before line 15; it holds after line 17 by the definition of δ.

After line 21: Here we separate into two cases.

If b > 1, then by the induction hypothesis the claim holds when the program was last before line 18; it holds before line 21 by the definition of δ, and the then-part of the if in line 21 is never executed.

If b = 1 (note that we never get to this line with b = 0), then by the definition of δ the only way in which the induction hypothesis can be violated before line 21 is that the paths parametrized by S may not glue at p to the paths parametrized by A ∪ B; i.e. before line 21 there are pairs (q, q′) ∈ S such that q ∉ A ∪ B, which means that the functions (from [p, p + e] to Q) parametrized by the pair (q, q′) do not glue at p to any of the functions parametrized by A ∪ B. However, this issue is amended by line 21, where we get rid of the bad pairs (implicitly, by setting a unique value for f(p), the one which comes from the set A ∪ B). Thus the hypothesis holds after this line.

After line 23: By the induction hypothesis, the claim holds when the program was last before line 23; it holds after the line by the definition of φ: indeed, if (q, q′) ∈ S and f : [p + b, p + e] → Q, then since ∀σ ∈ Σ : q ∈ δ(q, σ), the function f can be trivially extended to [p + b − 1, p + e] by setting f(p + b − 1) to q; we conclude this argument using a descending induction on b.

Proof (of Theorem 1). By Lemmas 1 and 2, if we get to line 7 in the segments scan algorithm, then - using parameter values from the segments scan algorithm - Thompson's algorithm has reached place p_Thompson = p + e on the string, with front Z = π_2 S. The first conclusion we draw from this fact is that if the segments scan algorithm exits successfully on line 7, then so does Thompson's algorithm. As for the other direction, assume that the segments scan algorithm exits unsuccessfully. We will analyze what happened between the last time the algorithm visited line 7 and the exit point, and prove that Thompson's algorithm exits unsuccessfully as well. Let A, B, p be as they were set after the last visit to line 12, and, arguing by contradiction and assuming that Thompson's algorithm exits successfully, let p_T-final be the number of iterations of Thompson's algorithm. Then by Lemma 2 there is a map f ∈ P_{[p, p_T-final]} such that f(p) ∈ A ∪ B and f(p_T-final) = q_F. By (descending) induction on a and a′ below, this means that for all a, a′ such that p ≤ a ≤ a′ ≤ p_T-final, the following two properties hold:

    P_{[0,p] ∪ [a,a′]} ≠ ∅,
    ∃f ∈ P_{[0,p] ∪ [a,p_T-final]} : f(p_T-final) = q_F.

By the first of these properties we do not pass the if in line 22 before hitting line 7 again, and by the second of these p + e cannot exceed p_T-final (see the condition in line 14) before hitting line 7 again - contradicting the assumption that we had already reached this line for the last time.

3.1 Pruning of redundant extensions

There are four minor changes to the algorithm which may always be used to trim down the sizes of A, B and S. For the sake of simplicity of the exposition we omitted them from the initial algorithm presentation. The four changes we present are by and large independent of one another (we explicitly note when they are not).

1. Currently the set A represents all the nodes in the front which satisfy φ; instead we can make A represent all the nodes q in the front which satisfy φ and that admit a path to q_F which does not pass through other nodes of A (otherwise - why keep q? we can just keep these other nodes of A). I.e., immediately after line 10 we modify A as follows:

    A ← {q ∈ A | ∃ path f from q to q_F : range(f) ∩ A = {q}}.

Note that the predicate above may be precomputed before we execute the algorithm - namely, for each node q satisfying φ we may encode a set of nodes Φ(q) such that we erase q from A only if A ∩ Φ(q) ≠ ∅. (A sketch of this filter appears after the list.)

2. In the case where b = 0, the set π_1 S is glued to A ∪ B. Thus we may work directly with π_2 S instead of S. Representing π_2 S by B, we simply have to make the following changes:

    Line 2: substitute by S ← ∅.
    Line 4: substitute by B ← {q_0}.
    Line 7: substitute the π_2 S in the condition by B.
    Line 9: erase.
    Line 16: substitute by: if b = 0 then B ← {q″ | ∃q′ ∈ B : q″ ∈ δ(q′, x_{p+e})} else S ← {(q, q″) | ∃q′ : (q, q′) ∈ S, q″ ∈ δ(q′, x_{p+e})}.
    Line 21: substitute the then-part by B ← {q′ | (q, q′) ∈ S, q ∈ A ∪ B}.

3. Ideally, instead of extending all the possible segments to the left, we would want to prune pairs (q, q′) such that min over q″ ∈ A ∪ B of dist(q″, q) > b. While this goal is difficult to achieve in general, it is easy enough to prune some of these pairs, by modifying the update of S in the left crawl (line 20) to

    S ← {(q″, q′) | ∃q : (q, q′) ∈ S, q ∈ δ(q″, x_{p+b}) and (q″ ≠ q or q″ ∈ A ∪ B)}.

Note that a similar change should be made in the corresponding modification in the previous paragraph.

4. Finally, allowing crawling to the right from states in A ∪ B bloats the size of S. Such a crawl is redundant, since we eventually crawl left to A ∪ B (either before performing another jump, or after it). Thus we can modify the update of S in line 13 to

    S ← {(q, q′) | q ∈ A ∪ B, q′ ∈ Ψ(q), and (q′ ≠ q or q ∈ A)},

and the condition in line 22 to S = ∅ and A = ∅.
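A sketch of the filter of item 1, as applied at run time (the precomputed Φ(q) encoding is not spelled out here); succ_map is the all-character successor map from the earlier preprocessing sketch, and the function name is ours:

    def prune_A(A, succ_map, qF="qF"):
        """Keep only those q in A from which q_F is reachable without passing
        through any other node of A (succ_map as returned by succ(...) above)."""
        kept = set()
        for q in A:
            blocked = A - {q}
            seen, stack = {q}, [q]
            while stack:                         # DFS that avoids the other A-nodes
                u = stack.pop()
                if u == qF:
                    kept.add(q)
                    break
                for v in succ_map[u]:
                    if v not in seen and v not in blocked:
                        seen.add(v)
                        stack.append(v)
        return kept

    # e.g. A = prune_A(A, succ(delta, STATES, ALPHABET))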

3.2 Jumping, crawling, and oracles

In line 14 we use the oracle O′ to decide whether crawling to the left is better than crawling to the right, whereas in line 8 we use the oracle O to determine when it is better to crawl to the right and when to jump (in Section 5 we show test results for the three oracle pairs presented in Example 2). All the example oracles we considered in Example 2 are motivated by Boyer-Moore; i.e. they have a bias to crawl to the left, which is only violated in cases which do not occur in the absence of loops in the automaton. We do not know whether this design of the oracles is optimal (or even close to optimal) for typical regular expressions and for either typical or worst case strings. While we are not sure how to analyze worst case behaviour for typical regular expressions, we are working on an approach that we hope will prove useful both in the analysis and in the design of better oracles for typical strings, where "typical" here means generated by a Markov process (both for the string and for the regular expression, where for the latter the Markov process is a hierarchical one on the application of regular expression grammatical rules). This approach is motivated by the run-time analyses of the Boyer-Moore algorithm for Markovian inputs in e.g. [B-YR], [S], [Ts].

3.3 Generalization: segment unions of more than two segments

Our algorithm works on unions of segments I which are a union of only two segments, the first of which starts at 0. Hence, our algorithm either updates data about the segment which does not contain 0, or unites the two segments. However, we can modify the definition of Ψ(q) to

    Ψ_k(q) = {q′ | ∃ path f : [1, m] → Q from q to q′ such that #{j ∈ [2, m−1] : ψ(f(j))} ≤ k},

thus allowing the path to contain at most k nodes satisfying ψ (possibly, but not necessarily, adding other requirements, e.g. a distance bound, or simply requiring some special nodes on the path). The algorithmic change would be to work with interval unions I which contain more than two segments; thus, at each iteration of the main loop, we would have to decide not between extending the left or right side of the second segment of S_I - which is currently represented by S - but between extending the left or right side of any of the segments except the first one. As we can store more states of the NFA at once, and if some state scans are more likely to fail than others, this may be an advantage. As usual, this can be determined based on the NFA, the text, or both.
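A sketch of Ψ_k as a search over (state, budget) pairs, again using the successor map and ψ table of the earlier sketches; Ψ_0 coincides with the Ψ computed there:

    def big_psi_k(q, succ_map, is_psi, k):
        """Psi_k(q): states reachable from q along a path whose interior contains
        at most k nodes satisfying psi (is_psi is a precomputed dict of booleans)."""
        reach = {q}
        seen = {(q, 0)}
        stack = [(q, 0)]                 # (endpoint, psi-interior nodes used so far)
        while stack:
            u, used = stack.pop()
            # extending the path past u turns u into an interior node
            new_used = used + (1 if (u != q and is_psi[u]) else 0)
            if new_used > k:
                continue                 # u would exceed the psi budget as an interior node
            for v in succ_map[u]:
                reach.add(v)             # v is a valid endpoint of such a path
                if (v, new_used) not in seen:
                    seen.add((v, new_used))
                    stack.append((v, new_used))
        return reach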

4 Qualitative analysis of the number of character reads, character comparisons, and the front size

The number of character comparisons we perform in the segments scan algorithm is

    Σ over iterations i of the main loop of #S_i

(with some mild change if we use the second acceleration of Section 3.1, since in some iterations we have to add the size of B, which is smaller, instead of that of S), whereas the number of character comparisons of Thompson's algorithm is

    Σ over places p on the string of #Z_p.

In Section 6 we discuss various alternatives for substituting the left and right crawling operations on the entire set S by a constant time operation. In this case, when comparing the segments scan algorithm to Thompson's, one simply has to compare the number of iterations of the segments scan algorithm - which is the number of character reads it performs - with the length of the string up to acceptance/denial by the automaton - which is the number of character reads Thompson's algorithm performs.

Example 3 (Bad regex and input string for the segments scan algorithm). We note that given the regular expression

    b(.{10}a|.{9}a.|.{8}a.{2}|.{7}a.{3}|.{6}a.{4}|.{5}a.{5}|.{4}a.{6}|.{3}a.{7}|.{2}a.{8}|.a.{9}|a.{10})

the input string aaaa..., and assuming the standard assumption above, our algorithm requires 10 times more character comparisons than Thompson's.

Why are we presenting this algorithm then? Simply put, the acceleration of the segments scan algorithm comes from performing big jumps (thus reducing the number of iterations of the main loop), while not increasing the size of S by so much as to cancel this reduction. We cannot make any statements which are true for all input strings and all NFAs (in Section 3.2 we discussed future plans for a better worst case analysis, as well as probabilistic quantitative statements, and how these statements would affect the algorithm). However, we do make two empirical claims which hold for regular expressions and input strings in "real life":

- Most not-very-short input words correspond only to paths on the NFA which stay on nodes satisfying φ(q).
- Most not-very-short input words which correspond to paths on the NFA that do not stay on nodes satisfying φ(q) correspond to a small number of such paths.

The effect of the first rule of thumb is that we may perform big jumps, and that therefore the number of iterations of the algorithm is small. The effect of the second rule of thumb is that after crawling only a small number of letters, the size of S is still small.

5 Experimental results

In our experiments, we implemented the algorithm and tested how many characters out of the input string x are actually read during the run of the segments scan algorithm, given automata for different regular expressions.

We used the three oracle pairs from Example 2, which we denote in the tables below simply by 1, 2, 3, and used the first three optimizations presented in Section 3.1 (but not the fourth). We ran our searches on tests used by boost (see the boost regex benchmarks), on the Mark Twain corpus with the following results:

    regex                                                                     % for 1   % for 2   % for 3
    Twain
    Huck[[:alpha:]]
    [[:alpha:]]+ing
    Tom Sawyer
    Tom Sawyer|Huckleberry Finn
    (Tom Sawyer|Huckleberry Finn).{0,30}river|river.{0,30}(Tom Sawyer|Huckleberry Finn)

and on the html search test benchmarks with the following results:

    regex                                                                     % for 1   % for 2   % for 3
    beman|john|dave
    <p>.*</p>
    <h[1-8][^>]*>.*</h[1-8]>
    <a[^>]+href=("[^"]*"|[^[:space:]]+)[^>]*>
    <img[^>]+src=("[^"]*"|[^[:space:]]+)[^>]*>
    <font[^>]+face=("[^"]*"|[^[:space:]]+)[^>]*>.*</font>

One can observe that the percentage of characters read in this test set, even for complicated regular expressions that look nothing like word matching (as in the last of the html example searches), was as low as 34%. Moreover, note that there is a significant difference depending on the oracle, and thus an oracle that can learn the input (regular expressions and common strings) might be much more efficient.

6 Accelerating the inner loops

There are three inner loops in our algorithm: one in the computation of π_2 S (which is rather standard to accelerate), the right expansion in line 16, and the left expansion in line 20, where we compute

    S ← {(q, q″) | ∃q′ : (q, q′) ∈ S, q″ ∈ δ(q′, x_{p+e})},
    S ← {(q″, q′) | ∃q : (q, q′) ∈ S, q ∈ δ(q″, x_{p+b})},

respectively. Reducing these loops from O(|S|) time operations to O(1) time operations changes the performance measure of the algorithm from the number of character comparisons to the number of character reads (see Section 4). In this section we consider two acceleration methods for these loops: the more conservative method is constructing the DFA corresponding to the segments scan algorithm, whereas the more radical one is reliance on a hardware incarnation of the original NFA.

6.1 Full DFA and hybrid execution

Our algorithm is a complicated way to scan an automaton. Nevertheless, it is still an automaton scanning algorithm, and as such it admits an underlying DFA. I.e., we can construct the corresponding DFA (whose size, in a worst case scenario, is O(size of the NFA squared)). E.g. the segments DFA corresponding to the NFA from Example 1 is given by the following diagram (here we use the oracles O_1, O′_1, and all four optimizations presented in Section 3.1 when computing S):

    [Diagram: the segments DFA corresponding to the NFA of Example 1; each node records the values of S, b, e (and, where relevant, A and B), and each edge is labeled by its accepting character and, where one exists, the increment of p.]

Legend: on outgoing edges from dashed nodes we consider the character x_{p+e}, whereas in full nodes we consider the character x_{p+b}. The first parameter on each edge is the accepting character of the edge; the second parameter, if one exists, is the increment of p. Finally, note that we could represent the DFA partially, and run the scan in a hybrid mode (see [BC]).

6.2 Accelerating left and right expansions, given an NFA with front expansion in O(1)

Assume we have at our disposal an NFA implementation such that front expansion is done in O(1) (this assumption is not that far from reality - see e.g. [SP]). We may then store S as a (possibly sparse) bit matrix C, and utilize the given NFA implementation to accelerate the left and right expansions: i.e. the right expansion of C with the character σ is given by

    C′[i, j] = ⋁ over k ∈ δ^{-1}(j, σ) of C[i, k].
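A plain-Python rendering of this right expansion, with states numbered 0, ..., n−1 (our own sketch; a hardware or bit-parallel implementation would replace the inner loops with wide boolean operations on the rows of C):

    def right_expand(C, delta_inv_sigma, n):
        """One right expansion of the segment matrix C with a fixed character sigma:
        C2[i][j] = OR over k in delta^{-1}(j, sigma) of C[i][k], where
        delta_inv_sigma[j] is the set of states k with j in delta(k, sigma)."""
        C2 = [[False] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                C2[i][j] = any(C[i][k] for k in delta_inv_sigma[j])
        return C2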

7 Conclusion

We have presented a new algorithm for matching strings against regular expressions. This algorithm evidently does not perform well in the worst case, but rather is suited to real life regular expressions as we encounter them in the industry (e.g., in security filtering scenarios or in IT monitoring scenarios, to name a couple). The algorithm is inspired by the Boyer-Moore algorithm in jumping ahead over hopeless matches. By doing so, it actually holds a set of segments of the automaton that might later complete the computation of the automaton on the string. We have shown that the algorithm is well suited to parallelization, and that it can be generalized in many different and promising ways.

References

AC.   A. V. Aho, M. J. Corasick. Efficient string matching: an aid to bibliographic search. Communications of the ACM 18 (6), June 1975.
B-YR. R. A. Baeza-Yates, M. Régnier. Average running time of the Boyer-Moore-Horspool algorithm. Theoretical Computer Science 92 (1), 1992.
BC.   M. Becchi, P. Crowley. A hybrid finite automaton for practical deep packet inspection. CoNEXT 2007.
BM.   R. S. Boyer, J. S. Moore. A fast string searching algorithm. Communications of the ACM 20 (10), 1977.
C-W.  B. Commentz-Walter. A string matching algorithm fast on the average. ICALP 1979 (extended abstract).
G.    Z. Galil. On improving the worst case running time of the Boyer-Moore string matching algorithm. Communications of the ACM 22 (9), September 1979.
Ke.   S. Kearns. Accelerated finite automata enable regular expression searching in sublinear time. Preprint, 2013.
KMP.  D. Knuth, J. H. Morris, V. Pratt. Fast pattern matching in strings. SIAM Journal on Computing 6 (2), 1977.
SP.   R. P. S. Sidhu, V. K. Prasanna. Fast regular expression matching using FPGAs. FCCM 2001.
S.    R. T. Smythe. The Boyer-Moore-Horspool heuristic with Markovian input. Random Structures and Algorithms, 2001.
Th.   K. Thompson. Programming techniques: regular expression search algorithm. Communications of the ACM 11 (6), 1968.
Ts.   Tsung-Hsi Tsai. Average case analysis of the Boyer-Moore algorithm. Random Structures and Algorithms 28 (4), 2006.
WW.   B. W. Watson, R. E. Watson. A Boyer-Moore-style algorithm for regular expression pattern matching. Science of Computer Programming 48, 2003.
