Fingerprint idea. Assume:


Fingerprint idea
Assume:
  We can compute a fingerprint f(P) of P in O(m) time.
  If f(P) ≠ f(T[s..s+m-1]), then P ≠ T[s..s+m-1].
  We can compare fingerprints in O(1).
  We can compute f' = f(T[s+1..s+m]) from f(T[s..s+m-1]) in O(1).
AALG, lecture 3, Simonas Šaltenis, 2004

Algorithm with Fingerprints
Let the alphabet Σ = {0,1,2,3,4,5,6,7,8,9}.
Let the fingerprint be just a decimal number, i.e., f("1045") = 1*10^3 + 0*10^2 + 4*10^1 + 5 = 1045.

Fingerprint-Search(T,P)
  fp ← compute f(P)
  f ← compute f(T[0..m-1])
  for s ← 0 to n - m do
    if fp = f return s
    f ← (f - T[s]*10^(m-1))*10 + T[s+m]
  return -1

[Figure: stepping the window; digit T[s] leaves the old f and digit T[s+m] enters the new f.]
Running time: 2*O(m) + O(n-m) = O(n)
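A minimal Python sketch of Fingerprint-Search, assuming T and P are strings of decimal digits. The function name `fingerprint_search` and the sample text are illustrative and not from the slides. Python integers are unbounded, so the arithmetic on m-digit numbers is not really O(1) here. The sketch also skips the final update, where the pseudocode would read T[n].

```python
def fingerprint_search(text: str, pattern: str) -> int:
    """Return the first shift s with text[s:s+m] == pattern, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    fp = int(pattern)            # fingerprint of P
    f = int(text[:m])            # fingerprint of T[0..m-1]
    high = 10 ** (m - 1)         # weight of the digit that leaves the window
    for s in range(n - m + 1):
        if fp == f:
            return s
        if s + m < n:            # no window to step to after the last shift
            f = (f - int(text[s]) * high) * 10 + int(text[s + m])
    return -1


print(fingerprint_search("25319446766", "5319"))
```

Because both fingerprints come from windows of the same length m, equal fingerprints imply equal strings, so no extra comparison is needed in this version.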

Using a Hash Function
Problem: we cannot assume we can do arithmetic with m-digit-long numbers in O(1) time.
Solution: use a hash function h = f mod q.
  For example, if q = 7, h("52") = 52 mod 7 = 3.
  h(S1) ≠ h(S2) implies S1 ≠ S2.
  But h(S1) = h(S2) does not imply S1 = S2!
  For example, if q = 7, h("73") = 3, but "73" ≠ "52".
Basic mod q arithmetic:
  (a+b) mod q = (a mod q + b mod q) mod q
  (a*b) mod q = ((a mod q)*(b mod q)) mod q
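The two examples and the two mod q identities can be checked with a few lines of Python. This is only a sketch; the helper names `h` and `mod_identities_hold` are made up for illustration.

```python
def h(digits: str, q: int = 7) -> int:
    """Hash of a decimal digit string: its numeric value mod q."""
    return int(digits) % q


def mod_identities_hold(a: int, b: int, q: int) -> bool:
    """Check the sum and product rules of mod q arithmetic for one pair (a, b)."""
    sum_ok = (a + b) % q == ((a % q) + (b % q)) % q
    prod_ok = (a * b) % q == ((a % q) * (b % q)) % q
    return sum_ok and prod_ok


print(h("52"), h("73"))                 # equal hashes, different strings
print(mod_identities_hold(52, 73, 7))
```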

Preprocessing and Stepping
Preprocessing:
  fp = (P[m-1] + 10*(P[m-2] + 10*(P[m-3] + ... + 10*(P[1] + 10*P[0])...))) mod q
  In the same way compute ft from T[0..m-1].
  Example: P = 2531, q = 7, fp = ?
Stepping:
  ft = ((ft - T[s]*(10^(m-1) mod q))*10 + T[s+m]) mod q
  10^(m-1) mod q can be computed once in the preprocessing.
  Example: Let T[...] = 5319, q = 7, what is the corresponding ft?
[Figure: stepping the window; T[s] leaves the old ft and T[s+m] enters the new ft.]

Stepping
T = 25319446766, m = 4, q = 7
T_0 = 2531
  ft = 2531 mod 7 = 4
T_1 = 5319
  ft = ((ft - T[s]*(10^(m-1) mod q))*10 + T[s+m]) mod q
  ft = ((ft - T[0]*(10^3 mod 7))*10 + T[0+4]) mod 7
     = ((4 - 2*(1000 mod 7))*10 + T[4]) mod 7
     = ((4 - 2*6)*10 + 9) mod 7
     = (-8*10 + 9) mod 7
     = -71 mod 7 = 6
Check: 5319 mod 7 = 6
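The stepping computation can be replayed in Python as a quick check. The function name `step` and its parameter names are illustrative. Python's `%` returns a non-negative result for a positive modulus, which matches -71 mod 7 = 6; in languages where `%` can be negative (C and Java, for example) one would typically add q before the final reduction.

```python
def step(ft: int, old_digit: int, new_digit: int, c: int, q: int) -> int:
    """Slide the window by one digit: drop old_digit, append new_digit."""
    return ((ft - old_digit * c) * 10 + new_digit) % q


m, q = 4, 7
c = pow(10, m - 1, q)           # 10^(m-1) mod q, computed once
ft0 = 2531 % q                  # fingerprint of the window 2531
ft1 = step(ft0, 2, 9, c, q)     # drop the leading 2, append 9: window 5319
print(c, ft0, ft1, 5319 % q)
```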

Rabin-Karp Algorithm
Rabin-Karp-Search(T,P)
  q ← a prime larger than m
  c ← 10^(m-1) mod q          // run a loop multiplying by 10 mod q
  fp ← 0; ft ← 0
  for i ← 0 to m-1            // preprocessing
    fp ← (10*fp + P[i]) mod q
    ft ← (10*ft + T[i]) mod q
  for s ← 0 to n - m          // matching
    if fp = ft then           // run a loop to compare strings
      if P[0..m-1] = T[s..s+m-1] return s
    ft ← ((ft - T[s]*c)*10 + T[s+m]) mod q
  return -1
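A Python sketch of Rabin-Karp-Search for strings of decimal digits. The default q = 101 is an arbitrary small prime chosen for illustration (the slide only asks for a prime larger than m), and the guard `s + m < n` is an addition that avoids reading T[n] on the last iteration.

```python
def rabin_karp_search(text: str, pattern: str, q: int = 101) -> int:
    """Return the first shift where pattern occurs in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    c = pow(10, m - 1, q)                      # 10^(m-1) mod q
    fp = ft = 0
    for i in range(m):                         # preprocessing
        fp = (10 * fp + int(pattern[i])) % q
        ft = (10 * ft + int(text[i])) % q
    for s in range(n - m + 1):                 # matching
        # Equal fingerprints may be a collision, so verify the strings.
        if fp == ft and text[s:s + m] == pattern:
            return s
        if s + m < n:
            ft = ((ft - int(text[s]) * c) * 10 + int(text[s + m])) % q
    return -1


print(rabin_karp_search("25319446766", "5319", q=7))
print(rabin_karp_search("7352", "52", q=7))    # "73" collides with "52" mod 7
```

In the second call the window "73" has the same fingerprint as "52", so the explicit string comparison is what rejects shift 0.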

Analysis
If q is a prime, the hash function distributes m-digit strings evenly among the q values.
Thus, only every q-th value of shift s will result in matching fingerprints (which will require comparing strings with O(m) comparisons).
Expected running time (if q > m):
  Preprocessing: O(m)
  Outer loop: O(n-m)
  All inner loops: ((n-m)/q) * m = O(n-m)
  Total time: O(n-m)
Worst-case running time: O(nm)

Rabin-Karp in Practice
If the alphabet has d characters, interpret characters as radix-d digits (replace 10 with d in the algorithm).
Choosing a prime q > m can be done with randomized algorithms in O(m), or q can be fixed to be the largest prime so that 10*q fits in a computer word.
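A sketch of the radix-d variant in Python, treating each byte as a radix-256 digit. The function name `rabin_karp_bytes` is illustrative, and q = 1,000,000,007 is simply a convenient prime assumed here; it is not the "largest prime so that d*q fits in a word" from the slide, since Python integers do not overflow. This version reports all matching shifts rather than only the first.

```python
def rabin_karp_bytes(text: bytes, pattern: bytes,
                     d: int = 256, q: int = 1_000_000_007) -> list:
    """Return every shift at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    c = pow(d, m - 1, q)                       # d^(m-1) mod q
    fp = ft = 0
    for i in range(m):                         # indexing bytes yields ints
        fp = (d * fp + pattern[i]) % q
        ft = (d * ft + text[i]) % q
    shifts = []
    for s in range(n - m + 1):
        if fp == ft and text[s:s + m] == pattern:
            shifts.append(s)
        if s + m < n:
            ft = ((ft - text[s] * c) * d + text[s + m]) % q
    return shifts


print(rabin_karp_bytes(b"Tweedledee and Tweedledum", b"Tweedledum"))
```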

Searching in n comparisons
The goal: each character of the text is compared only once!
Problem with the naïve algorithm: it forgets what was learned from a partial match!
Examples:
  T = "Tweedledee and Tweedledum" and P = "Tweedledum"
  T = "pappappappar" and P = "pappar"

Finite automaton search
P = ababaca

  state   a   b   c   P
    0     1   0   0   a
    1     1   2   0   b
    2     3   0   0   a
    3     1   4   0   b
    4     5   0   0   a
    5     1   4   6   c
    6     7   0   0   a
    7     1   2   0

  i             --  1  2  3  4  5  6  7  8  9  10  11
  T[i]          --  a  b  a  b  a  b  a  c  a  b   a
  state φ(T_i)   0  1  2  3  4  5  4  5  6  7  2   3

Processing time takes Θ(n). But we first have to construct the FA.
Main issue: how to construct the FA?
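The table-driven search can be sketched in Python as follows, using the transition table for P = ababaca over the alphabet {a, b, c} and the text T = abababacaba. The names `DELTA` and `fa_search` are illustrative. The loop does one table lookup per text character, which is where the linear processing time comes from.

```python
# Transition table for P = "ababaca"; DELTA[state][char] is the next state.
DELTA = [
    {"a": 1, "b": 0, "c": 0},
    {"a": 1, "b": 2, "c": 0},
    {"a": 3, "b": 0, "c": 0},
    {"a": 1, "b": 4, "c": 0},
    {"a": 5, "b": 0, "c": 0},
    {"a": 1, "b": 4, "c": 6},
    {"a": 7, "b": 0, "c": 0},
    {"a": 1, "b": 2, "c": 0},
]


def fa_search(text: str, delta: list, m: int):
    """Run the automaton over text; return (state trace, 0-based match shifts)."""
    state = 0
    trace = [state]
    shifts = []
    for i, ch in enumerate(text):
        state = delta[state][ch]
        trace.append(state)
        if state == m:                  # accepting state: a match ends at i
            shifts.append(i - m + 1)
    return trace, shifts


trace, shifts = fa_search("abababacaba", DELTA, 7)
print(trace)
print(shifts)
```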

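Need some notation
φ(w) = state the FA ends up in after processing w.
  Example: φ(abab) = 4 (for P = ababaca).
σ(x) = max{k : P_k is a suffix of x}, where P_k is the k-character prefix of P. Called the suffix function.
  Examples: Let P = ab.
    σ(ε) = 0
    σ(ccaca) = 1
    σ(ccab) = 2
Note: If |P| = m, then σ(x) = m indicates a match.
[Figure: text T with the automaton state under each character; state m marks each match.]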
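The suffix function can be computed directly from its definition. A small Python sketch follows; the name `sigma` and its argument order are illustrative. It tries prefix lengths from longest to shortest, so it is a brute-force check rather than an efficient routine.

```python
def sigma(pattern: str, x: str) -> int:
    """Length of the longest prefix of pattern that is a suffix of x."""
    for k in range(min(len(pattern), len(x)), -1, -1):
        if x.endswith(pattern[:k]):     # k = 0 always succeeds (empty prefix)
            return k
    return 0


print(sigma("ab", ""), sigma("ab", "ccaca"), sigma("ab", "ccab"))
```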

FA Construction
Given: P[1..m]
Let Q = states = {0, 1, ..., m}; state 0 is initial, state m is final.
Define the transition function δ as follows:
  δ(q, a) = σ(P_q a) for each state q and each character a.
Example: δ(5, b) = σ(P_5 b) = σ(ababab) = 4
Intuition: Encountering a b in state 5 means the current substring doesn't match. But you know this substring ends with "abab", and this is the longest suffix that matches the beginning of P. Thus, we go to state 4 and continue processing.
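A direct, unoptimized Python sketch of this construction: for every state q and character a it evaluates σ(P_q a) by trying prefix lengths from longest to shortest. The function name `build_transitions` is illustrative, strings are 0-indexed here (the slide uses P[1..m]), and this brute-force loop is slower than the best known construction.

```python
def build_transitions(pattern: str, alphabet: str) -> list:
    """Return delta as a list of dicts: delta[q][a] = sigma(P_q + a)."""
    m = len(pattern)
    delta = []
    for q in range(m + 1):
        row = {}
        for a in alphabet:
            x = pattern[:q] + a
            k = min(m, q + 1)                   # longest prefix that could fit
            while not x.endswith(pattern[:k]):  # stops at k = 0 at the latest
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


delta = build_transitions("ababaca", "abc")
print(delta[5])
```

For P = ababaca the row for state 5 sends a to state 1, b to state 4, and c to state 6, in line with the example δ(5, b) = 4.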

FA construction example
P = ababaca, m = 7; Q = {0,1,2,3,4,5,6,7}
Prefixes: ε, a, ab, aba, abab, ababa, ababac, ababaca
[Figure: state diagram built up one transition at a time.]
State 1:
  δ(1, a) = σ(P_1 a) = σ(aa) = 1
  δ(1, c) = σ(P_1 c) = σ(ac) = 0
State 2:
  δ(2, b) = σ(P_2 b) = σ(abb) = 0
  δ(2, c) = σ(P_2 c) = σ(abc) = 0
State 5 (fast forward & simplified):
  δ(5, a) = σ(P_5 a) = σ(ababaa) = 1
  δ(5, b) = σ(P_5 b) = σ(ababab) = 4
[Figure: final, simplified automaton for P = ababaca.]

Search
[Figure: the automaton for P = ababaca reads the text T one character at a time, shown over nine frames.]
Accept state, we are done.

Analysis of FA
Searching: O(n) (good)
Preprocessing: O(m|Σ|) (bad)
Memory: O(m|Σ|) (bad)