A Multiple Sliding Windows Approach to Speed Up String Matching Algorithms
Simone Faro (University of Catania, Italy) and Thierry Lecroq (University of Rouen, LITIS EA 4108, France)
Symposium on Experimental Algorithms, June 7th-9th 2012, Bordeaux, France

Outline
1 Introduction to exact string matching
2 Classical solutions
3 New solutions
4 Experimental results
SF&TL (Catania & Rouen) String Matching with Multiple Windows SEA / 34

Exact String Matching: Definition
Find all the occurrences of a pattern x of length m in a text y of length n, where x, y ∈ Σ*.
2 instances: x is given first; y is given first.

Exact String Matching: Interests
basic component of many software systems
theoretical problems

Exact String Matching: Theory
linear time since Morris and Pratt (1970)
linear time and constant space
O((n log m)/m) on average [Yao 1979]

Exact String Matching: Solutions
Many!! see

Efficient solutions
S. Faro and T. Lecroq. The Exact Online String Matching Problem: a Review of the Most Recent Results. ACM Computing Surveys 45(2) (2013), to appear.

Exact String Matching: Classical solutions
comparisons: Knuth-Morris-Pratt (KMP), Boyer-Moore (BM)
automata: Backward DAWG Matching (with suffix automaton or oracle) (BDM)
bit-parallelism: Shift-Or (SO), Backward Nondeterministic DAWG Matching (BNDM)
filtering: Karp-Rabin (KR)

Sliding Window
[Figure: a window slides along the text y, shown at the beginning, in the middle and at the end of the text.]

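The sliding window mechanism can be sketched with the simplest possible matcher: check the window against the pattern, then shift it by one position. This is only an illustration of the mechanism, not one of the algorithms from the talk; the function name `naive_search` is made up for this example.

```python
def naive_search(x, y):
    """Return the start positions of all occurrences of pattern x in text y."""
    m, n = len(x), len(y)
    occurrences = []
    # The window y[j .. j+m-1] slides from the beginning to the end of the text.
    for j in range(n - m + 1):
        if y[j:j + m] == x:
            occurrences.append(j)
    return occurrences
```

The classical algorithms that follow keep this loop but replace the fixed shift of one position with a larger, precomputed shift.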
Boyer-Moore (1977)
[Figure: the pattern is compared with a window of the text y. A suffix v of the window matches, then text character b mismatches pattern character a.]
good suffix shift (1): v is aligned with another occurrence of v in the pattern, preceded by a different character c
good suffix shift (2): a suffix of v is aligned with a prefix of the pattern
bad character shift: b is aligned with an occurrence of b in the pattern, with no b to its right

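A minimal sketch of the bad character shift on its own, assuming the two good suffix shifts are left out; it is therefore a simplification, not the full Boyer-Moore algorithm. The names `bad_char_table` and `bm_bad_char_search` are hypothetical.

```python
def bad_char_table(x):
    """For each character of x[0 .. m-2], distance from its rightmost occurrence to the end of x."""
    m = len(x)
    # Later (rightmost) occurrences overwrite earlier ones.
    return {c: m - 1 - i for i, c in enumerate(x[:-1])}


def bm_bad_char_search(x, y):
    """Right-to-left window scan using only the bad character shift."""
    m, n = len(x), len(y)
    if m == 0 or m > n:
        return []
    bc = bad_char_table(x)
    occurrences = []
    j = 0
    while j <= n - m:
        i = m - 1
        # Compare the pattern with the window from right to left.
        while i >= 0 and x[i] == y[j + i]:
            i -= 1
        if i < 0:
            occurrences.append(j)
            j += 1  # conservative shift; the good suffix shift would normally apply here
        else:
            # Align the mismatching text character with its rightmost occurrence
            # in the pattern; never shift by less than one position.
            j += max(1, bc.get(y[j + i], m) - (m - 1 - i))
    return occurrences
```

A character that does not occur in `x[0 .. m-2]` gets the default distance `m`, which moves the pattern completely past the mismatching position.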
Fast Search (Cantone-Faro, 2003)
uses the bad character shift when a mismatch occurs with the pattern's rightmost character
uses the good suffix shift otherwise

TVSBS (Thathoo-Virmani-Lakshmi-Balakrishnan-Sekar, 2006)
compares first the rightmost character of the window, then the leftmost, then all the others
uses both the rightmost character of the window and the following one to perform the shift

[Figure: TVSBS on a window of the text y. The rightmost character is compared first (a = a), then the leftmost (b = b), then forward character comparisons over u until c mismatches d. The Berry-Ravindran shift then uses the pair a e: the pattern is moved to align its occurrence of ae, with no ae further right.]

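The comparison order and the two-character shift can be sketched as follows. This is a sketch in the spirit of TVSBS under one stated assumption: the shift is computed from the rightmost character of the window and the character following it, as the slide text says. The published algorithm may index the character pair differently, and the names `pair_shift_table` and `tvsbs_like_search` are hypothetical.

```python
def pair_shift_table(x):
    """Map a character pair (a, b) to the smallest shift that aligns it with an occurrence in x."""
    m = len(x)
    table = {}
    # Pairs are scanned left to right, so the rightmost occurrence (smallest shift) wins.
    for i in range(m - 1):
        table[(x[i], x[i + 1])] = m - 1 - i
    return table


def tvsbs_like_search(x, y):
    m, n = len(x), len(y)
    if m == 0 or m > n:
        return []
    table = pair_shift_table(x)
    occurrences = []
    j = 0
    while j <= n - m:
        # Rightmost character first, then the leftmost, then the whole window.
        if y[j + m - 1] == x[m - 1] and y[j] == x[0] and y[j:j + m] == x:
            occurrences.append(j)
        if j + m >= n:
            break  # last window: no following character to read
        a, b = y[j + m - 1], y[j + m]
        shift = table.get((a, b))
        if shift is None:
            # The pair does not occur in x: either b can start the pattern
            # (shift m) or the pattern jumps past b entirely (shift m + 1).
            shift = m if x[0] == b else m + 1
        j += shift
    return occurrences
```

Using two characters instead of one makes a maximal shift more likely, since a given pair occurs in the pattern less often than a single character does.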
BNDM (Navarro & Raffinot, 1998)
[Figure: BNDM on the pattern C A T A and the text y = C C A T A C, with one bit mask per character: S_A, S_C, S_G, S_T. The window is read from right to left. At each step the state is ANDed with the mask of the character read (S_T, then S_A, then S_C, then S_C) and then shifted.]

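The AND and shift steps can be sketched with Python integers used as bit vectors. This is a minimal sketch of the textbook formulation of BNDM, not a transcription of the slide animation, so the number of steps per window may differ from the figure. The name `bndm_search` and the dictionary `masks` (playing the role of S_A, S_C, S_G, S_T) are assumptions of this example.

```python
def bndm_search(x, y):
    m, n = len(x), len(y)
    if m == 0 or m > n:
        return []
    # masks[c] has bit (m - 1 - i) set exactly when x[i] == c.
    masks = {}
    for i, c in enumerate(x):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))
    full = (1 << m) - 1   # m ones: every pattern position is still possible
    high = 1 << (m - 1)   # set when the characters read so far form a prefix of x
    occurrences = []
    j = 0
    while j <= n - m:
        i = m       # window characters not yet read
        last = m    # shift to apply; shrinks when a pattern prefix is recognised
        state = full
        while state:
            state &= masks.get(y[j + i - 1], 0)   # AND with the mask of the character read
            i -= 1
            if state & high:
                if i > 0:
                    last = i               # a prefix of x starts at window offset i
                else:
                    occurrences.append(j)  # the whole window was read and matches
            state = (state << 1) & full    # shift, keeping only m bits
        j += last
    return occurrences
```

On the slide's data, `bndm_search("CATA", "CCATAC")` reads T, A, C in the first window, recognises the prefix CAT, shifts by one position and then reports the occurrence at position 1.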
Partition the text in k/2 substrings and use k windows
m < n/k and k is even
[Figure: a general scheme for the multiple sliding windows matcher with (A) 1 window ($w_0$), (B) 2 windows ($w_0$, $w_1$), (C) 4 windows ($w_0$, $w_1$, $w_2$, $w_3$), drawn with the positions $s_0$, ..., $s_3$.]

A General Multiple Sliding Windows Approach
Then process simultaneously the k different text windows $w_0, w_1, \ldots, w_{k-1}$, where
$w_{2i} = y[(2n/k)i \,..\, (2n/k)i + m - 1]$ (left windows, go to the right) and
$w_{2i+1} = y[(2n/k)(i+1) - 1 \,..\, (2n/k)(i+1) + m - 2]$ (right windows, go to the left)
for $i = 0, \ldots, (k-2)/2$.

Ending situation
For each couple of windows $(w_{2i}, w_{2i+1})$ the sliding process ends when the window $w_{2i}$ slides over the window $w_{2i+1}$.
No occurrence can be missed, due to the m − 1 overlapping characters between adjacent substrings.

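The window layout and the ending condition can be sketched as below, with a naive window check and a shift of one position standing in for the real matchers (Fast Search, TVSBS, SBNDM, FSBNDM). Several points are assumptions of this sketch rather than details from the slides: the name `multi_window_search`, the use of integer division for 2n/k, and the clamping of the last right window to position n − m (taken literally, the slide formula would start it at n − 1).

```python
def multi_window_search(x, y, k=2):
    """Search x in y with k windows: k/2 pairs, each sliding towards the other."""
    m, n = len(x), len(y)
    if k < 2 or k % 2 or not 0 < m < n / k:
        raise ValueError("need an even k >= 2 and 0 < m < n/k")
    # Start positions: left windows go to the right, right windows go to the left.
    lefts = [(2 * n * i) // k for i in range(k // 2)]
    rights = [min((2 * n * (i + 1)) // k - 1, n - m) for i in range(k // 2)]
    found = set()
    active = True
    while active:
        active = False
        # One iteration advances every window, mimicking simultaneous processing.
        for p in range(k // 2):
            left, right = lefts[p], rights[p]
            if left > right:
                continue  # the two windows of this pair have crossed
            active = True
            if y[left:left + m] == x:
                found.add(left)
            if y[right:right + m] == x:
                found.add(right)
            lefts[p], rights[p] = left + 1, right - 1
    return sorted(found)
```

Each pair covers a contiguous range of start positions, and a window starting near the end of one range reads up to m − 1 characters of the next substring, which is why occurrences that straddle a boundary are still reported.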
SMART: String Matching Algorithm Research Tool
more than 80 string matching algorithms
a corpus of 12 texts
select/deselect string matching algorithms
output experimental results in LaTeX, xml, html and txt formats
easy to plug in new algorithms

Experiments
Random texts of length n = 4,000,000 on alphabets of size 16, 32 and 64
for m = 2, 4, 8, 16, 32, 64
with algorithms: Fast Search, TVSBS, SBNDM, FSBNDM
and k = 1, 2, 4, 6, 8 windows

Experimental results
[Figure: one panel per algorithm: Fast Search, TVSBS, SBNDM, FSBNDM.]

Experiments
Random texts of length n = 4,000,000 on alphabets of size 16, 32 and 64
for m = 2, 4, 8, 16, 32, 64, 128, 256, 512
and with a protein file and a natural language (English) file
with algorithms:
known: EBOM, HASH, FSBNDM, QF
new: FS-W, FSBNDM-W, SBNDM-W, TVSBS-W

Experimental results
[Figures: alphabet size 32, protein alphabet (size 20) and English text, each shown for short patterns and for long patterns.]

Perspectives
implement other known exact string matching algorithms in this multiple window framework
truly parallelize
apply it to other related problems

Thank you for your attention!
Section 2.4 Section Summary Sequences. Examples: Geometric Progression, Arithmetic Progression Recurrence Relations Example: Fibonacci Sequence Summations Introduction Sequences are ordered lists of elements.
More informationHomework Assignment 6 Answers
Homework Assignment 6 Answers CSCI 2670 Introduction to Theory of Computing, Fall 2016 December 2, 2016 This homework assignment is about Turing machines, decidable languages, Turing recognizable languages,
More informationWhere to Use and How not to Use Polynomial String Hashing
Olympiads in Informatics, 2013, Vol. 7, 90 100 90 2013 Vilnius University Where to Use and How not to Use Polynomial String Hashing Jakub PACHOCKI, Jakub RADOSZEWSKI Faculty of Mathematics, Informatics
More informationPSC Prague Stringology Club
Proceedings of the Prague Stringology Conference 2013 Edited by Jan Holub and Jan Žd árek September 2013 PSC Prague Stringology Club http://www.stringology.org/ Conference Organisation Program Committee
More informationInferring Strings from Graphs and Arrays
Inferring Strings from Graphs and Arrays Hideo Bannai 1, Shunsuke Inenaga 2, Ayumi Shinohara 2,3, and Masayuki Takeda 2,3 1 Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1
More informationString Range Matching
String Range Matching Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi Department of Computer Science, University of Helsinki Helsinki, Finland firstname.lastname@cs.helsinki.fi Abstract. Given strings
More informationFinite Automata. Seungjin Choi
Finite Automata Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 / 28 Outline
More informationParallel Rabin-Karp Algorithm Implementation on GPU (preliminary version)
Bulletin of Networking, Computing, Systems, and Software www.bncss.org, ISSN 2186-5140 Volume 7, Number 1, pages 28 32, January 2018 Parallel Rabin-Karp Algorithm Implementation on GPU (preliminary version)
More informationNondeterministic Finite Automata. Nondeterminism Subset Construction
Nondeterministic Finite Automata Nondeterminism Subset Construction 1 Nondeterminism A nondeterministic finite automaton has the ability to be in several states at once. Transitions from a state on an
More informationCPS 220 Theory of Computation REGULAR LANGUAGES
CPS 22 Theory of Computation REGULAR LANGUAGES Introduction Model (def) a miniature representation of a thing; sometimes a facsimile Iraq village mockup for the Marines Scientific modelling - the process
More informationFinite Automata and Regular languages
Finite Automata and Regular languages Huan Long Shanghai Jiao Tong University Acknowledgements Part of the slides comes from a similar course in Fudan University given by Prof. Yijia Chen. http://basics.sjtu.edu.cn/
More informationState Complexity of Neighbourhoods and Approximate Pattern Matching
State Complexity of Neighbourhoods and Approximate Pattern Matching Timothy Ng, David Rappaport, and Kai Salomaa School of Computing, Queen s University, Kingston, Ontario K7L 3N6, Canada {ng, daver, ksalomaa}@cs.queensu.ca
More informationAnomaly Detection. What is an anomaly?
Anomaly Detection Brian Palmer What is an anomaly? the normal behavior of a process is characterized by a model Deviations from the model are called anomalies. Example Applications versus spyware There
More informationCSE182-L7. Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding CSE182
CSE182-L7 Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding 10-07 CSE182 Bell Labs Honors Pattern matching 10-07 CSE182 Just the Facts Consider the set of all substrings
More informationMechanized Operational Semantics
Mechanized Operational Semantics J Strother Moore Department of Computer Sciences University of Texas at Austin Marktoberdorf Summer School 2008 (Lecture 5: Boyer-Moore Fast String Searching) 1 The Problem
More informationLinear-Time Computation of Local Periods
Linear-Time Computation of Local Periods Jean-Pierre Duval 1, Roman Kolpakov 2,, Gregory Kucherov 3, Thierry Lecroq 4, and Arnaud Lefebvre 4 1 LIFAR, Université de Rouen, France Jean-Pierre.Duval@univ-rouen.fr
More informationQuantum pattern matching fast on average
Quantum pattern matching fast on average Ashley Montanaro Department of Computer Science, University of Bristol, UK 12 January 2015 Pattern matching In the traditional pattern matching problem, we seek
More informationFinite-state machines (FSMs)
Finite-state machines (FSMs) Dr. C. Constantinides Department of Computer Science and Software Engineering Concordia University Montreal, Canada January 10, 2017 1/19 Finite-state machines (FSMs) and state
More informationTheory of Computation
Thomas Zeugmann Hokkaido University Laboratory for Algorithmics http://www-alg.ist.hokudai.ac.jp/ thomas/toc/ Lecture 3: Finite State Automata Motivation In the previous lecture we learned how to formalize
More information