In Linear Time from Regular Expressions to NFAs


Thompson's Algorithm
Produces O(n) states for regular expressions of length n.
Ken Thompson
[Figure: NFA fragments with ε-transitions, one for each form of the regular expression e.]

Berry-Sethi Approach
Berry-Sethi Algorithm (Gerard Berry, Ravi Sethi) / Glushkov Algorithm (Viktor M. Glushkov)
Produces exactly n+1 states without ε-transitions and demonstrates Equality Systems and Attribute Grammars.
Idea: The automaton tracks (conceptually, via a marker •), in the syntax tree of a regular expression, which subexpressions in e are reachable consuming the rest of the input w.

For example:
[Figure: the marker moving through the syntax tree of the example expression while the input word w is consumed step by step.]

In general:
Input is only consumed at the leaves.
Navigating the tree does not consume input, so navigation steps are ε-transitions.
For a formal construction we need identifiers for states.
As the identifier of a node n we take the subexpression corresponding to the subtree dominated by n.
One regular expression may contain identical subexpressions, so we enumerate the leaves.

For example:
[Figure: syntax tree of the example expression with its leaves numbered 0 to 4.]

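Berry-Sethi Approach (naive version)
Construction (naive version):
States: $\bullet r$ and $r\bullet$, with r ranging over the nodes of e
Start state: $\bullet e$
Final state: $e\bullet$
Transitions: for leaves $r \equiv \boxed{i\,|\,x}$ (leaf number i, labelled x) we require $(\bullet r, x, r\bullet)$.
The leftover transitions are:

r = r_1 | r_2:   $(\bullet r, \epsilon, \bullet r_1)$, $(\bullet r, \epsilon, \bullet r_2)$, $(r_1\bullet, \epsilon, r\bullet)$, $(r_2\bullet, \epsilon, r\bullet)$
r = r_1 · r_2:   $(\bullet r, \epsilon, \bullet r_1)$, $(r_1\bullet, \epsilon, \bullet r_2)$, $(r_2\bullet, \epsilon, r\bullet)$
r = r_1*:        $(\bullet r, \epsilon, r\bullet)$, $(\bullet r, \epsilon, \bullet r_1)$, $(r_1\bullet, \epsilon, \bullet r_1)$, $(r_1\bullet, \epsilon, r\bullet)$
r = r_1?:        $(\bullet r, \epsilon, r\bullet)$, $(\bullet r, \epsilon, \bullet r_1)$, $(r_1\bullet, \epsilon, r\bullet)$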
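The transition table of the naive version can be sketched in Python. This is only an illustration under assumptions that are not in the slides: the syntax tree is encoded as nested tuples, the states •r and r• are written as `("in", r)` and `("out", r)`, and `EXAMPLE` is an assumed expression, (a|b)*a(a|b), chosen to have five leaves.

```python
# Assumed tuple encoding of the syntax tree (not from the slides):
#   ("leaf", i, x)   leaf number i labelled x; x = None stands for epsilon
#   ("alt", r1, r2), ("cat", r1, r2), ("star", r1), ("opt", r1)

def naive_transitions(r):
    """Return all transitions (state, label, state) for the subtree r.

    ("in", r) plays the role of the marker before r, ("out", r) the marker
    after r. A label of None marks an epsilon-transition.
    """
    enter, leave = ("in", r), ("out", r)
    kind = r[0]
    if kind == "leaf":
        # Only leaves carry a label (which may itself be epsilon).
        return [(enter, r[2], leave)]
    if kind == "alt":
        r1, r2 = r[1], r[2]
        own = [(enter, None, ("in", r1)), (enter, None, ("in", r2)),
               (("out", r1), None, leave), (("out", r2), None, leave)]
        return own + naive_transitions(r1) + naive_transitions(r2)
    if kind == "cat":
        r1, r2 = r[1], r[2]
        own = [(enter, None, ("in", r1)),
               (("out", r1), None, ("in", r2)),
               (("out", r2), None, leave)]
        return own + naive_transitions(r1) + naive_transitions(r2)
    if kind == "star":
        r1 = r[1]
        own = [(enter, None, leave), (enter, None, ("in", r1)),
               (("out", r1), None, ("in", r1)), (("out", r1), None, leave)]
        return own + naive_transitions(r1)
    if kind == "opt":
        r1 = r[1]
        own = [(enter, None, leave), (enter, None, ("in", r1)),
               (("out", r1), None, leave)]
        return own + naive_transitions(r1)
    raise ValueError(f"unknown node kind: {kind!r}")

# Assumed example expression (a|b)* a (a|b) with leaves numbered 0 to 4.
EXAMPLE = ("cat",
           ("star", ("alt", ("leaf", 0, "a"), ("leaf", 1, "b"))),
           ("cat", ("leaf", 2, "a"),
                   ("alt", ("leaf", 3, "a"), ("leaf", 4, "b"))))

transitions = naive_transitions(EXAMPLE)
epsilon_moves = [t for t in transitions if t[1] is None]
```

For this assumed expression most of the generated transitions are ε-moves that only navigate the tree; the five leaf transitions are the only ones that read input.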

Discussion:
Most transitions navigate through the expression.
The resulting automaton is in general nondeterministic.

Strategy for the sophisticated version: avoid generating ε-transitions.
Idea: pre-compute helper attributes during a D(epth)F(irst)S(earch)!

Necessary node attributes:
first: the set of read states below r which may be reached first when descending into r
next: the set of read states to the right of r which may be reached first in the traversal after r
last: the set of read states below r which may be reached last when descending into r
empty: can the subexpression r consume ε?

Berry-Sethi Approach: 1st step
$\text{empty}[r] = t$ if and only if $\epsilon \in [\![r]\!]$
For example:
[Figure: the example syntax tree annotated with the empty attribute; the starred node is marked t.]

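Berry-Sethi Approach: 1st step
Implementation: DFS post-order traversal
For leaves $r \equiv \boxed{i\,|\,x}$ we find $\text{empty}[r] = (x \equiv \epsilon)$.
Otherwise:
$\text{empty}[r_1 \mid r_2] = \text{empty}[r_1] \lor \text{empty}[r_2]$
$\text{empty}[r_1 \cdot r_2] = \text{empty}[r_1] \land \text{empty}[r_2]$
$\text{empty}[r_1^*] = t$
$\text{empty}[r_1?] = t$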
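A minimal Python sketch of this post-order computation follows. The nested-tuple encoding of the syntax tree and the sample expression (a|b)*a(a|b) are assumptions made for illustration; they do not come from the slides.

```python
# Assumed tuple encoding of the syntax tree (not from the slides):
#   ("leaf", i, x)   leaf number i labelled x; x = None stands for epsilon
#   ("alt", r1, r2), ("cat", r1, r2), ("star", r1), ("opt", r1)

def empty(r):
    """Return True iff the subexpression r can match the empty word."""
    kind = r[0]
    if kind == "leaf":
        return r[2] is None              # empty[r] = (x is epsilon)
    if kind == "alt":
        return empty(r[1]) or empty(r[2])
    if kind == "cat":
        return empty(r[1]) and empty(r[2])
    if kind in ("star", "opt"):
        return True
    raise ValueError(f"unknown node kind: {kind!r}")

# Assumed example expression (a|b)* a (a|b) with leaves numbered 0 to 4.
STAR = ("star", ("alt", ("leaf", 0, "a"), ("leaf", 1, "b")))
EXAMPLE = ("cat", STAR,
           ("cat", ("leaf", 2, "a"),
                   ("alt", ("leaf", 3, "a"), ("leaf", 4, "b"))))
```

The recursion visits children before combining their results, which is exactly the post-order discipline; a production version would store the value at each node instead of recomputing it.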

Berry-Sethi Approach: 2nd step
The may-set of first reached read states: the set of read states that may be reached from $\bullet r$ (i.e. while descending into r) via sequences of ε-transitions:
$\text{first}[r] = \{\, i \text{ in } r \mid (\bullet r, \epsilon, \bullet\boxed{i\,|\,x}) \in \delta^*,\ x \neq \epsilon \,\}$
For example:
[Figure: the example syntax tree annotated with the first sets.]

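Berry-Sethi Approach: 2nd step
Implementation: DFS post-order traversal
For leaves $r \equiv \boxed{i\,|\,x}$ we find $\text{first}[r] = \{\, i \mid x \neq \epsilon \,\}$.
Otherwise:
$\text{first}[r_1 \mid r_2] = \text{first}[r_1] \cup \text{first}[r_2]$
$$\text{first}[r_1 \cdot r_2] = \begin{cases} \text{first}[r_1] \cup \text{first}[r_2] & \text{if } \text{empty}[r_1] = t \\ \text{first}[r_1] & \text{if } \text{empty}[r_1] = f \end{cases}$$
$\text{first}[r_1^*] = \text{first}[r_1]$
$\text{first}[r_1?] = \text{first}[r_1]$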
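The equations for first translate almost literally into Python. The sketch below assumes a nested-tuple encoding of the syntax tree and uses (a|b)*a(a|b) as an assumed sample expression; neither is taken from the slides.

```python
# Assumed tuple encoding of the syntax tree (not from the slides):
#   ("leaf", i, x)   leaf number i labelled x; x = None stands for epsilon
#   ("alt", r1, r2), ("cat", r1, r2), ("star", r1), ("opt", r1)

def empty(r):
    """True iff r can match the empty word (1st step)."""
    kind = r[0]
    if kind == "leaf":
        return r[2] is None
    if kind == "alt":
        return empty(r[1]) or empty(r[2])
    if kind == "cat":
        return empty(r[1]) and empty(r[2])
    return True                          # "star" and "opt"

def first(r):
    """Set of leaf numbers that may be read first when descending into r."""
    kind = r[0]
    if kind == "leaf":
        return set() if r[2] is None else {r[1]}
    if kind == "alt":
        return first(r[1]) | first(r[2])
    if kind == "cat":
        # The right operand contributes only if the left one may be skipped.
        if empty(r[1]):
            return first(r[1]) | first(r[2])
        return first(r[1])
    return first(r[1])                   # "star" and "opt"

# Assumed example expression (a|b)* a (a|b) with leaves numbered 0 to 4.
TAIL = ("cat", ("leaf", 2, "a"), ("alt", ("leaf", 3, "a"), ("leaf", 4, "b")))
EXAMPLE = ("cat", ("star", ("alt", ("leaf", 0, "a"), ("leaf", 1, "b"))), TAIL)
```

Because the starred subexpression can be skipped, leaf 2 joins leaves 0 and 1 in the first set of the whole expression.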

Berry-Sethi Approach: 3rd step
The may-set of next read states: the set of read states within the subtrees to the right of r that may be reached next via sequences of ε-transitions:
$\text{next}[r] = \{\, i \mid (r\bullet, \epsilon, \bullet\boxed{i\,|\,x}) \in \delta^*,\ x \neq \epsilon \,\}$
For example:
[Figure: the example syntax tree annotated with the first and next sets.]

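Berry-Sethi Approach: 3rd step
Implementation: DFS pre-order traversal
For the root, we find: $\text{next}[e] = \emptyset$
Apart from that we distinguish, based on the context:

r = r_1 | r_2:   $\text{next}[r_1] = \text{next}[r]$, $\text{next}[r_2] = \text{next}[r]$
r = r_1 · r_2:   $$\text{next}[r_1] = \begin{cases} \text{first}[r_2] \cup \text{next}[r] & \text{if } \text{empty}[r_2] = t \\ \text{first}[r_2] & \text{if } \text{empty}[r_2] = f \end{cases}$$
                 $\text{next}[r_2] = \text{next}[r]$
r = r_1*:        $\text{next}[r_1] = \text{first}[r_1] \cup \text{next}[r]$
r = r_1?:        $\text{next}[r_1] = \text{next}[r]$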
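Unlike empty and first, next is passed from parent to child, so it is natural to write it as a pre-order pass that carries the parent's value down. The sketch below is one possible rendering under assumptions not found in the slides: the tree is encoded as nested tuples, only the next sets of the leaves are recorded (in a hypothetical dictionary `out`), and (a|b)*a(a|b) serves as an assumed sample expression.

```python
# Assumed tuple encoding of the syntax tree (not from the slides):
#   ("leaf", i, x)   leaf number i labelled x; x = None stands for epsilon
#   ("alt", r1, r2), ("cat", r1, r2), ("star", r1), ("opt", r1)

def empty(r):
    kind = r[0]
    if kind == "leaf":
        return r[2] is None
    if kind == "alt":
        return empty(r[1]) or empty(r[2])
    if kind == "cat":
        return empty(r[1]) and empty(r[2])
    return True                          # "star" and "opt"

def first(r):
    kind = r[0]
    if kind == "leaf":
        return set() if r[2] is None else {r[1]}
    if kind == "alt":
        return first(r[1]) | first(r[2])
    if kind == "cat":
        return first(r[1]) | first(r[2]) if empty(r[1]) else first(r[1])
    return first(r[1])                   # "star" and "opt"

def fill_next(r, nxt, out):
    """Pre-order pass: nxt is next[r]; record next[i] for every leaf i in out."""
    kind = r[0]
    if kind == "leaf":
        out[r[1]] = set(nxt)
    elif kind == "alt":
        fill_next(r[1], nxt, out)
        fill_next(r[2], nxt, out)
    elif kind == "cat":
        right_first = first(r[2])
        left_next = right_first | nxt if empty(r[2]) else right_first
        fill_next(r[1], left_next, out)
        fill_next(r[2], nxt, out)
    elif kind == "star":
        fill_next(r[1], first(r[1]) | nxt, out)   # may loop back into r1
    elif kind == "opt":
        fill_next(r[1], nxt, out)
    else:
        raise ValueError(f"unknown node kind: {kind!r}")

# Assumed example expression (a|b)* a (a|b) with leaves numbered 0 to 4.
EXAMPLE = ("cat",
           ("star", ("alt", ("leaf", 0, "a"), ("leaf", 1, "b"))),
           ("cat", ("leaf", 2, "a"),
                   ("alt", ("leaf", 3, "a"), ("leaf", 4, "b"))))

next_of_leaf = {}
fill_next(EXAMPLE, set(), next_of_leaf)   # next[e] is the empty set
```

Recording only the leaves is a simplification: the final automaton needs next[i] for leaves i only. The sketch also recomputes first and empty on demand, which is convenient but not linear time; storing the attributes per node would avoid that.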

Berry-Sethi Approach: 4th step
The may-set of last reached read states: the set of read states which may be reached last during the traversal of r, connected to the root via ε-transitions only:
$\text{last}[r] = \{\, i \text{ in } r \mid (\boxed{i\,|\,x}\bullet, \epsilon, r\bullet) \in \delta^*,\ x \neq \epsilon \,\}$
For example:
[Figure: the example syntax tree annotated with the first, next and last sets.]

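Berry-Sethi Approach: 4th step
Implementation: DFS post-order traversal
For leaves $r \equiv \boxed{i\,|\,x}$ we find $\text{last}[r] = \{\, i \mid x \neq \epsilon \,\}$.
Otherwise:
$\text{last}[r_1 \mid r_2] = \text{last}[r_1] \cup \text{last}[r_2]$
$$\text{last}[r_1 \cdot r_2] = \begin{cases} \text{last}[r_1] \cup \text{last}[r_2] & \text{if } \text{empty}[r_2] = t \\ \text{last}[r_2] & \text{if } \text{empty}[r_2] = f \end{cases}$$
$\text{last}[r_1^*] = \text{last}[r_1]$
$\text{last}[r_1?] = \text{last}[r_1]$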
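The last attribute mirrors first, with the roles of the two operands of a concatenation swapped. A short Python sketch, again assuming a nested-tuple tree encoding and the sample expression (a|b)*a(a|b), neither of which comes from the slides:

```python
# Assumed tuple encoding of the syntax tree (not from the slides):
#   ("leaf", i, x)   leaf number i labelled x; x = None stands for epsilon
#   ("alt", r1, r2), ("cat", r1, r2), ("star", r1), ("opt", r1)

def empty(r):
    kind = r[0]
    if kind == "leaf":
        return r[2] is None
    if kind == "alt":
        return empty(r[1]) or empty(r[2])
    if kind == "cat":
        return empty(r[1]) and empty(r[2])
    return True                          # "star" and "opt"

def last(r):
    """Set of leaf numbers that may be read last within r."""
    kind = r[0]
    if kind == "leaf":
        return set() if r[2] is None else {r[1]}
    if kind == "alt":
        return last(r[1]) | last(r[2])
    if kind == "cat":
        # The left operand contributes only if the right one may be skipped.
        if empty(r[2]):
            return last(r[1]) | last(r[2])
        return last(r[2])
    return last(r[1])                    # "star" and "opt"

# Assumed example expression (a|b)* a (a|b) with leaves numbered 0 to 4.
STAR = ("star", ("alt", ("leaf", 0, "a"), ("leaf", 1, "b")))
EXAMPLE = ("cat", STAR,
           ("cat", ("leaf", 2, "a"),
                   ("alt", ("leaf", 3, "a"), ("leaf", 4, "b"))))
```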

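Berry-Sethi Approach (sophisticated version)
Construction (sophisticated version): create an automaton based on the syntax tree's new attributes:
States: $\{\bullet e\} \cup \{\, i\bullet \mid i \text{ a leaf} \,\}$
Start state: $\bullet e$
Final states: $\text{last}[e]$ if $\text{empty}[e] = f$; $\{\bullet e\} \cup \text{last}[e]$ otherwise
Transitions:
$(\bullet e, a, i\bullet)$ if $i \in \text{first}[e]$ and i is labelled with a
$(i\bullet, a, i'\bullet)$ if $i' \in \text{next}[i]$ and $i'$ is labelled with a
We call the resulting automaton $A_e$.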
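Putting the four attributes together, the construction can be sketched end to end in Python. Everything specific here is an assumption for illustration: the nested-tuple tree encoding, the names `berry_sethi` and `accepts`, the string `"start"` standing for the start state •e, and the sample expression (a|b)*a(a|b).

```python
# Assumed tuple encoding: ("leaf", i, x) with x = None for epsilon,
# ("alt", r1, r2), ("cat", r1, r2), ("star", r1), ("opt", r1).

def empty(r):
    k = r[0]
    if k == "leaf":
        return r[2] is None
    if k == "alt":
        return empty(r[1]) or empty(r[2])
    if k == "cat":
        return empty(r[1]) and empty(r[2])
    return True                                    # "star" and "opt"

def first(r):
    k = r[0]
    if k == "leaf":
        return set() if r[2] is None else {r[1]}
    if k == "alt":
        return first(r[1]) | first(r[2])
    if k == "cat":
        return first(r[1]) | first(r[2]) if empty(r[1]) else first(r[1])
    return first(r[1])

def last(r):
    k = r[0]
    if k == "leaf":
        return set() if r[2] is None else {r[1]}
    if k == "alt":
        return last(r[1]) | last(r[2])
    if k == "cat":
        return last(r[1]) | last(r[2]) if empty(r[2]) else last(r[2])
    return last(r[1])

def fill_next(r, nxt, out, label):
    """Pre-order pass recording next[i] and the label of every read leaf i."""
    k = r[0]
    if k == "leaf":
        if r[2] is not None:
            out[r[1]], label[r[1]] = set(nxt), r[2]
    elif k == "alt":
        fill_next(r[1], nxt, out, label)
        fill_next(r[2], nxt, out, label)
    elif k == "cat":
        f2 = first(r[2])
        fill_next(r[1], f2 | nxt if empty(r[2]) else f2, out, label)
        fill_next(r[2], nxt, out, label)
    elif k == "star":
        fill_next(r[1], first(r[1]) | nxt, out, label)
    else:                                          # "opt"
        fill_next(r[1], nxt, out, label)

def berry_sethi(e):
    """Return (start, finals, delta); delta maps (state, symbol) to a state set."""
    nxt, label = {}, {}
    fill_next(e, set(), nxt, label)
    start, delta = "start", {}
    for i in first(e):
        delta.setdefault((start, label[i]), set()).add(i)
    for i, followers in nxt.items():
        for j in followers:
            delta.setdefault((i, label[j]), set()).add(j)
    finals = last(e) | ({start} if empty(e) else set())
    return start, finals, delta

def accepts(automaton, word):
    """Simulate the (generally nondeterministic) automaton on word."""
    start, finals, delta = automaton
    current = {start}
    for symbol in word:
        current = set().union(*(delta.get((q, symbol), set()) for q in current))
    return bool(current & finals)

# Assumed example expression (a|b)* a (a|b) with leaves numbered 0 to 4.
EXAMPLE = ("cat",
           ("star", ("alt", ("leaf", 0, "a"), ("leaf", 1, "b"))),
           ("cat", ("leaf", 2, "a"),
                   ("alt", ("leaf", 3, "a"), ("leaf", 4, "b"))))
A = berry_sethi(EXAMPLE)
```

The automaton for the assumed expression has six states (the start state plus one per leaf) and no ε-transitions. The sketch favours readability over the linear-time bound: it recomputes first and empty recursively instead of storing them at the nodes during the traversals.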

For example:
[Figure: the resulting automaton for the example expression, with states labelled 0 to 4.]
Remarks:
This construction is known as the Berry-Sethi or Glushkov construction.
It is used in XML to define Content Models.
The result may not be what we had in mind.