Prefix-Free Regular-Expression Matching

Transcription:

Prefix-Free Regular-Expression Matching
Yo-Sub Han, Yajun Wang and Derick Wood
Department of Computer Science, HKUST
Prefix-Free Regular-Expression Matching p.1/15

Pattern Matching
Given a pattern P and a text T, find all substrings of T that are in P.
|P| = 1: string pattern matching [BM, KMP]
|P| = k: keyword pattern matching [AC]
P is a regular expression: regular-expression pattern matching!!!

Overview
Basic Notions
Related Work
Regular-Expression Matching
Infix-Free Regular-Expression Matching
Prefix-Free Regular-Expression Matching
Determine whether or not L(E) is prefix-free
Conclusions

Basic Notions
An automaton A is specified by a tuple (Q, Σ, δ, s, F):
  Q is a finite set of states
  Σ is a finite alphabet
  δ ⊆ Q × Σ × Q is the set of transitions
  s ∈ Q is the start state
  F ⊆ Q is the set of final states
λ denotes the null string.
|A| = |Q| + |δ|
|E| = the number of character appearances in a given regular expression E

Basic Notions
Given a transition (p, a, q) in δ:
  p has an out-transition
  q has an in-transition
  p is a source state of q
  q is a target state of p
A is non-returning if the start state of A does not have any in-transitions.
A is non-exiting if a final state of A does not have any out-transitions.
[Figure: a transition from state p to state q]
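The two structural properties defined above can be checked directly on the transition set. The following is a minimal sketch, assuming transitions are stored as (source, character, target) triples; the function names and the example automaton are hypothetical and not part of the slides.

```python
from typing import Set, Tuple

# A transition (p, a, q): source state, character, target state.
Transition = Tuple[str, str, str]


def is_non_returning(delta: Set[Transition], start: str) -> bool:
    """True if the start state has no in-transitions."""
    return all(target != start for (_, _, target) in delta)


def is_non_exiting(delta: Set[Transition], finals: Set[str]) -> bool:
    """True if no final state has an out-transition."""
    return all(source not in finals for (source, _, _) in delta)


# Hypothetical automaton for the language {ab}: s -a-> p -b-> f
delta = {("s", "a", "p"), ("p", "b", "f")}
print(is_non_returning(delta, "s"))   # True
print(is_non_exiting(delta, {"f"}))   # True
```

Both checks are a single pass over δ, so they take O(|δ|) time.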

Basic Notions
Given two strings x and y over Σ, we say:
  x is a prefix of y if there exists z ∈ Σ* such that xz = y.
  x is an infix of y if there exist u, v ∈ Σ* such that uxv = y; we often call x a substring of y.

Basic Notions
We define a language L to be
  prefix-free if no string in L is a prefix of any other string in L.
  infix-free if no string in L is an infix of any other string in L.
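For a finite language, both definitions can be tested by comparing every pair of distinct strings. This is only an illustrative sketch (the function names are hypothetical); regular languages are usually infinite, which is why the slides later test prefix-freeness on an automaton instead.

```python
from typing import Iterable


def is_prefix_free(language: Iterable[str]) -> bool:
    """True if no string is a prefix of another string in the language."""
    words = set(language)
    return not any(x != y and y.startswith(x) for x in words for y in words)


def is_infix_free(language: Iterable[str]) -> bool:
    """True if no string is an infix (substring) of another string in the language."""
    words = set(language)
    return not any(x != y and x in y for x in words for y in words)


print(is_prefix_free({"ab", "b"}))   # True: neither string is a prefix of the other
print(is_infix_free({"ab", "b"}))    # False: "b" is an infix of "ab"
print(is_prefix_free({"a", "ab"}))   # False: "a" is a prefix of "ab"
```

The second call shows that a prefix-free language need not be infix-free, while every infix-free language is prefix-free because a prefix is a special case of an infix.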

Related Work
Given a regular expression E and a text T:
The membership problem: we can determine whether or not T ∈ L(E) in O(mn) time [Thompson].
The decision problem: we can determine whether or not there is a substring of T that is in L(E) in O(mn) time [Aho] or in O(mn / log n) time [Myers].
The recognition problem: we can report all end positions of matching substrings of T in O(mn) time [Aho] or in O(mn / log n) time [Myers].
The identification problem: we can report all (start, end) positions of matching substrings of T in O(mn log n) time [Myers et al.].

The Membership Problem
[Figure: step-by-step simulation of the automaton for E on the text T; the symbols of E and T were lost in extraction]

The Recognition Problem
Given E over Σ, we prepend Σ* to E, thus allowing matching to begin at any position in T.
[Figure: automaton for Σ*E; the symbols of E were lost in extraction]

The Recognition Problem
ExpressionMatching(A, T)
  Q = null({s})
  if f ∈ Q then output λ
  for j = 1 to n
    Q = null(goto(Q, w_j))
    if f ∈ Q then output j
null(Q) computes all states in A that can be reached from a state in the set Q of states by null transitions.
goto(Q, w_j) gives all states that can be reached from a state in Q by a transition with w_j, the current input character.

The Recognition Problem
[Figure: simulation of the automaton for Σ*E on the text T; the symbols of E were lost in extraction]
Given a regular expression E and a text T, we can find all end positions of matching substrings of T in O(mn) worst-case time using O(m) space [Crochemore and Hancart].
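The ExpressionMatching procedure can be sketched in Python as follows. The dictionary encoding of δ, the use of the empty string for null transitions, and the example automaton are assumptions made for illustration; the slides build a Thompson automaton for Σ*E, whereas this sketch hand-codes a small automaton for Σ*ab over Σ = {a, b}.

```python
from typing import Dict, List, Set, Tuple

# Assumed encoding: delta maps (state, character) to a set of target states;
# the empty string "" stands for a null (lambda) transition.
Delta = Dict[Tuple[int, str], Set[int]]


def null(delta: Delta, states: Set[int]) -> Set[int]:
    """All states reachable from `states` by null transitions only."""
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for q in delta.get((p, ""), ()):
            if q not in closure:
                closure.add(q)
                stack.append(q)
    return closure


def goto(delta: Delta, states: Set[int], ch: str) -> Set[int]:
    """All states reachable from `states` by one transition on `ch`."""
    result: Set[int] = set()
    for p in states:
        result |= delta.get((p, ch), set())
    return result


def expression_matching(delta: Delta, s: int, f: int, text: str) -> List[int]:
    """Return the 1-based end positions j with f in Q; 0 plays the role of 'output lambda'."""
    ends = []
    Q = null(delta, {s})
    if f in Q:
        ends.append(0)
    for j, ch in enumerate(text, start=1):
        Q = null(delta, goto(delta, Q, ch))
        if f in Q:
            ends.append(j)
    return ends


# Hypothetical automaton for Σ*ab: state 0 loops on a and b, then 0 -λ-> 1 -a-> 2 -b-> 3.
delta = {(0, "a"): {0}, (0, "b"): {0}, (0, ""): {1}, (1, "a"): {2}, (2, "b"): {3}}
print(expression_matching(delta, 0, 3, "aabab"))   # [3, 5]
```

Each step touches every state and transition at most once, which matches the O(mn) time and O(m) space stated on the slide when the automaton has size O(m).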

The Identification Problem
Given a regular expression E and a text T, we can identify all matching substrings of T that belong to L(E) in O(mn^2) worst-case time using O(m) space.
Note that the algorithm of Myers et al. takes O(mn log n) time using O(m log n) space.
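One straightforward way to reach an O(mn^2) bound is to restart the state-set simulation at every position of T. The sketch below assumes this naive strategy, which may differ from the construction the authors had in mind; the encoding of δ and the example automaton for ab* are hypothetical, and empty matches are ignored.

```python
from typing import Dict, List, Set, Tuple

# Assumed encoding: delta maps (state, character) to a set of target states;
# the empty string "" stands for a null (lambda) transition.
Delta = Dict[Tuple[int, str], Set[int]]


def identify(delta: Delta, s: int, f: int, text: str) -> List[Tuple[int, int]]:
    """Report all 1-based (start, end) positions of non-empty matching substrings."""

    def null(states: Set[int]) -> Set[int]:
        closure, stack = set(states), list(states)
        while stack:
            p = stack.pop()
            for q in delta.get((p, ""), ()):
                if q not in closure:
                    closure.add(q)
                    stack.append(q)
        return closure

    matches = []
    for i in range(len(text)):            # n possible start positions
        Q = null({s})
        for j in range(i, len(text)):     # at most n characters per start
            Q = null({q for p in Q for q in delta.get((p, text[j]), ())})
            if not Q:                     # no state survives: no longer match from i
                break
            if f in Q:
                matches.append((i + 1, j + 1))
    return matches


# Hypothetical automaton for E = ab*: 0 -a-> 1, 1 -b-> 1, with final state 1.
delta = {(0, "a"): {1}, (1, "b"): {1}}
print(identify(delta, 0, 1, "abba"))   # [(1, 1), (1, 2), (1, 3), (4, 4)]
```

Only one state set is alive at any time, so the working space stays O(m) apart from the reported output.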

Infix-Free Regular-Expression Matching
[Figure: nested language families L_IN ⊂ L_PRE ⊂ L_REG (infix-free, prefix-free, regular); text T with positions 1 to 15]
Given an infix-free regular expression E and a text T, we can identify all matching substrings of T that belong to L(E) in O(mn) worst-case time using O(m) space.

Prefix-Free Regular-Expression Matching
[Figure: nested language families L_IN ⊂ L_PRE ⊂ L_REG]
If E is infix-free, we have an O(mn) running time algorithm.
If E is a (normal) regular expression, we have an O(mn^2) running time algorithm.
If E is prefix-free, there are at most n matching substrings of T that belong to L(E), where n is the size of T.

Prefix-Free Regular-Expression Matching
Given a prefix-free regular expression E and a text T, we find all end positions of matching substrings of T in O(mn) time.
[Figure: text T with positions 1 to 15]
Let P = {p_1, p_2, ..., p_k} be the set of end positions of matching substrings, for k ≤ n.
Construct the Thompson automaton A = (Q, Σ, δ, s, f) for E^R.
Scan T^R = w_n ⋯ w_1 starting from the last position p_k in P to find the corresponding start position.

Prefix-Free Regular-Expression Matching
[Figure: text T with positions 1 to 15; the state set Q_15 is marked at position 15]
For the current input position i in T^R, Q_15 is the set of states such that there is a path from s to each state in Q_15 that spells out w_15 w_14 ⋯ w_i.
We keep reading T^R until we meet f.

Prefix-Free Regular-Expression Matching
[Figure: text T with positions 1 to 15; the state sets Q_9, Q_10, Q_13 and Q_15 are marked at positions 9, 10, 13 and 15]
In the worst case, there are k such sets of states and we need O(km) time for each character of T to update these k sets.
Thus, the total running time is O(mn^2) in the worst case, since k is at most n.

Prefix-Free Regular-Expression Matching
If a state r in A is reached from two different states p and q, where p ∈ Q_i and q ∈ Q_j, when reading a character w_h in EM (ExpressionMatching), where h ≤ i < j, then both paths from p and q via r cannot reach f by reading any prefix of the remaining input in EM.
[Figure: reversed text T^R with positions j, i and h; states p ∈ Q_i and q ∈ Q_j both enter state r]

Prefix-Free Regular-Expression Matching
Each state in A appears in at most one reachable set.
Any two sets of reachable states are disjoint.
We need at most O(m) time to update all sets of reachable states simultaneously at each step.
Given a prefix-free regular expression E and a text T, we can identify all matching substrings of T that belong to L(E) in O(mn) worst-case time using O(m) space.

Prefix-Freeness
An FA A is prefix-free if L(A) is prefix-free.
A DFA A is prefix-free if it is non-exiting.
What about the NFA case?
  If an NFA A is prefix-free, then A must be non-exiting.
  However, the reverse does not hold.
[Figure: example NFA with start state s and final state f]

State-Pair Graph
Given a finite-state automaton A = (Q, Σ, δ, s, f), we define the state-pair graph G_A = (V, E), where V is a set of nodes and E is a set of edges, as follows:
  V = {(i, j) | q_i and q_j ∈ Q}
  E = {((i, j), a, (x, y)) | (q_i, a, q_x) and (q_j, a, q_y) ∈ δ and a ∈ Σ}
[Figure: an example automaton with states 1 to 7 and its state-pair graph with nodes (1,1), (2,2), (3,3), (4,4), (4,6), (5,5), (5,7), (6,6) and (7,7)]
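The definition of G_A translates almost literally into set comprehensions. The following sketch assumes δ is given as a set of (p, a, q) triples without null transitions; the function name and the three-state example are hypothetical.

```python
from itertools import product
from typing import Hashable, Set, Tuple

Transition = Tuple[Hashable, str, Hashable]   # (source state, character, target state)


def state_pair_graph(states: Set[Hashable], delta: Set[Transition]):
    """Build the node set V and the edge set E of the state-pair graph."""
    # V: every ordered pair of states.
    nodes = set(product(states, repeat=2))
    # E: one edge for every pair of transitions that read the same character.
    edges = {
        ((p1, p2), a1, (q1, q2))
        for (p1, a1, q1) in delta
        for (p2, a2, q2) in delta
        if a1 == a2
    }
    return nodes, edges


# Hypothetical automaton: 1 -a-> 2, 1 -a-> 3, 2 -b-> 3
nodes, edges = state_pair_graph({1, 2, 3}, {(1, "a", 2), (1, "a", 3), (2, "b", 3)})
print(len(nodes), len(edges))   # 9 5
```

The double loop over δ makes the |δ|^2 bound on the number of edges visible directly.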

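State-Pair Graph & Prefix-Freeness (CPM 2005)
Given a finite-state automaton A, L(A) is prefix-free if and only if there is no path from (1, 1) to (m, j), for any j ≠ m, in G_A (here q_1 is the start state and q_m is the final state).
[Figure: an example automaton with states 1 to 7 and its state-pair graph with nodes (1,1), (2,2), (3,3), (4,4), (4,6), (5,5), (5,7), (6,6) and (7,7)]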

State-Pair Graph & Prefix-Freeness
Given a finite-state automaton A = (Q, Σ, δ, s, f), we can determine whether or not L(A) is prefix-free in O(|Q|^2 + |δ|^2) worst-case time.
  Let G_A = (V, E) be the state-pair graph of A.
  |V| = |Q|^2
  Let δ_i denote the set of out-transitions from state q_i in A.
  |δ| = ∑_{i=1}^{m} |δ_i|, where m = |Q|
  A node (i, j) in G_A can have at most |δ_i| · |δ_j| out-transitions.
  |E| ≤ ∑_{i,j=1}^{m} |δ_i| · |δ_j| = |δ|^2

State-Pair Graph & Prefix-Freeness
Given a regular expression E, we can determine whether or not L(E) is prefix-free in O(|E|^2) worst-case time.
  Construct the Thompson automaton for E.
  |Q| = O(|E|) and |δ| = O(|E|)
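The path condition can be checked with a breadth-first search over reachable state pairs, without materializing the whole graph. The sketch below rests on assumptions that the slides do not spell out: the automaton has no null transitions, it has a single final state f, and every state lies on some path from s to f. It also tests the non-exiting condition first, since under these assumptions an automaton whose final state has an out-transition cannot be prefix-free. The function name and the example automata are hypothetical.

```python
from collections import deque
from typing import Hashable, Set, Tuple

Transition = Tuple[Hashable, str, Hashable]   # (source state, character, target state)


def accepts_prefix_free_language(delta: Set[Transition], s: Hashable, f: Hashable) -> bool:
    """Assumes: no null transitions, one final state f, every state on an s-to-f path."""
    # A prefix-free automaton must be non-exiting.
    if any(p == f for (p, _, _) in delta):
        return False
    out = {}
    for (p, a, q) in delta:
        out.setdefault(p, []).append((a, q))
    # Search the state-pair graph from (s, s).
    seen = {(s, s)}
    queue = deque(seen)
    while queue:
        i, j = queue.popleft()
        # Reaching (f, j) or (j, f) with j != f means some accepted string
        # is a proper prefix of another accepted string.
        if (i == f) != (j == f):
            return False
        for (a, x) in out.get(i, ()):
            for (b, y) in out.get(j, ()):
                if a == b and (x, y) not in seen:
                    seen.add((x, y))
                    queue.append((x, y))
    return True


# Hypothetical automaton for {a, ab}: not prefix-free.
not_pf = {("s", "a", "f"), ("s", "a", "p"), ("p", "b", "f")}
# Hypothetical automaton for {ab, b}: prefix-free.
pf = {("s", "a", "p"), ("p", "b", "f"), ("s", "b", "f")}
print(accepts_prefix_free_language(not_pf, "s", "f"))   # False
print(accepts_prefix_free_language(pf, "s", "f"))       # True
```

The search visits each pair at most once and each pair of equally labeled transitions at most once, in line with the O(|Q|^2 + |δ|^2) bound.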

Conclusions
Solve the prefix-free regular-expression matching problem in O(mn) time using O(m) space, based on Thompson automata.
Determine whether or not L(A) is prefix-free for a given NFA A in polynomial time, based on state-pair graphs.