Lexical Analysis Part III


Chapter 3: Finite Automata. Slides adapted from: Robert van Engelen, Florida State University; Alex Aiken, Stanford University

Design of a Lexical Analyzer Generator: Translate regular expressions to an NFA; translate the NFA to an efficient DFA. The pipeline is regular expressions → NFA → DFA, where the NFA → DFA step is optional: either simulate the NFA to recognize tokens, or simulate the DFA to recognize tokens.

Nondeterministic Finite Automata: A nondeterministic finite automaton (NFA) is a 5-tuple (S, Σ, δ, s0, F) where
S is a finite set of states
Σ is a finite set of symbols, the alphabet
δ is a mapping from S × (Σ ∪ {ε}) to subsets of S
s0 ∈ S is the start state
F ⊆ S is the set of accepting (or final) states

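Transition Graph: An NFA can be diagrammatically represented by a labeled directed graph called a transition graph.
[Figure: start state 0 with a loop on a and b; edges 0 -a→ 1, 1 -b→ 2, 2 -b→ 3; state 3 is accepting]
S = {0, 1, 2, 3}, Σ = {a, b}, s0 = 0, F = {3}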

Transition Table: The mapping δ of an NFA can be represented in a transition table:
δ(0, a) = {0, 1}, δ(0, b) = {0}, δ(1, b) = {2}, δ(2, b) = {3}

State   Input a   Input b
0       {0, 1}    {0}
1       ∅         {2}
2       ∅         {3}

The Language Defined by an NFA: An NFA accepts an input string x if and only if there is some path in the transition graph from the start state to some accepting state whose edge labels spell out x in sequence (ε-edges contribute no symbols). A state transition from one state to another on the path is called a move. The language defined by an NFA is the set of input strings it accepts; for example, the NFA above defines (a|b)*abb.
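The acceptance definition can be checked mechanically. The following is a minimal Python sketch, assuming the transition table above is stored in a dictionary keyed by (state, symbol); the names `DELTA` and `accepts_abb` are hypothetical. Rather than trying paths one at a time, it keeps the set of every state that some path labeled with the prefix read so far can reach, so x is accepted exactly when that set contains an accepting state at the end.

```python
# Transition table of the example NFA for (a|b)*abb (no epsilon-edges):
# delta(0,a) = {0,1}, delta(0,b) = {0}, delta(1,b) = {2}, delta(2,b) = {3}
DELTA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0
ACCEPTING = {3}


def accepts_abb(x: str) -> bool:
    """True if some path labeled x leads from START to an accepting state."""
    current = {START}  # all states reachable on the prefix read so far
    for symbol in x:
        # Follow every possible move; a missing table entry means no edge.
        current = set().union(*(DELTA.get((s, symbol), set()) for s in current))
        if not current:  # no path can continue
            return False
    return bool(current & ACCEPTING)
```

For instance, `accepts_abb("aabb")` is true because one path stays in state 0 for the first a and then follows 0 -a→ 1 -b→ 2 -b→ 3.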

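Example NFA: An NFA that accepts L(aa* | bb*)
[Figure: start state 0 with ε-edges to states 1 and 3; 1 -a→ 2, with a loop on a at 2; 3 -b→ 4, with a loop on b at 4; states 2 and 4 are accepting]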

Deterministic Finite Automata: A deterministic finite automaton (DFA) is a special case of an NFA:
No state has an ε-transition
For each state s and input symbol a there is at most one edge labeled a leaving s
Each entry in the transition table is a single state or is undefined
At most one path exists to accept a string
The simulation algorithm is simple

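Example DFA: A DFA that accepts L((a|b)*abb)
[Figure: start state 0; 0 -a→ 1, 0 -b→ 0, 1 -a→ 1, 1 -b→ 2, 2 -a→ 1, 2 -b→ 3, 3 -a→ 1, 3 -b→ 0; state 3 is accepting]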

Exercise: Select the regular language that denotes the same language as this finite automaton.
[Figure: a three-state finite automaton and several candidate regular expressions; not reproduced]

Exercise: Choose the NFA that accepts the following regular expression: 1* …
[Figure: the full expression and four candidate NFAs, each with states numbered up to 8; not reproduced]

Simulating a DFA:
    s = s0;
    c = nextchar();
    while (c != eof) {
        s = move(s, c);
        c = nextchar();
    }
    if (s in F) return "yes";
    else return "no";
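The DFA simulation loop translates almost line for line into Python. This sketch assumes the transition table of the example DFA for (a|b)*abb and treats an undefined entry as "no move", which rejects; the names `DFA_MOVE` and `dfa_accepts` are hypothetical.

```python
# Transition table of a DFA for (a|b)*abb: states 0..3, start 0, accepting {3}.
DFA_MOVE = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
DFA_START = 0
DFA_FINAL = {3}


def dfa_accepts(x: str) -> bool:
    s = DFA_START
    for c in x:                   # c = nextchar() until eof
        s = DFA_MOVE.get((s, c))  # s = move(s, c)
        if s is None:             # undefined entry: no path, so reject
            return False
    return s in DFA_FINAL
```

Each input character costs one table lookup, which is why the simulation algorithm for a DFA is so simple.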

Design of a Lexical Analyzer Generator: RE to NFA to DFA
Lex specification with regular expressions:
    p1    { action1 }
    p2    { action2 }
    …
    pn    { actionn }
NFA: a new start state s0 with ε-edges to N(p1), N(p2), …, N(pn); the accepting state of each N(pi) is labeled with actioni. The subset construction then converts this NFA into a DFA.

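From Regular Expression to NFA (Thompson's construction), with start state i and accepting state f:
ε: an ε-edge from i to f
a: an edge labeled a from i to f
r1 | r2: i has ε-edges to the start states of N(r1) and N(r2); the accepting states of N(r1) and N(r2) have ε-edges to f
r1 r2: i is the start state of N(r1), the accepting state of N(r1) is joined to the start state of N(r2) (by an ε-edge, or by merging the two states), and the accepting state of N(r2) is f
r*: i has ε-edges to the start state of N(r) and to f; the accepting state of N(r) has ε-edges back to the start state of N(r) and to f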
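The five cases can be turned into a recursive builder. Below is a minimal Python sketch, assuming a hypothetical tuple encoding of the regular expression's syntax tree and joining concatenations with an ε-edge (rather than merging states); `thompson`, `states_of`, and `EPS` are illustrative names. Each call allocates fresh state numbers from a shared counter, so sub-automata never share states.

```python
from itertools import count

EPS = None  # label used for epsilon-edges


def thompson(regex, ids=None):
    """Build an NFA for a regex AST; return (start, accept, edges).

    AST forms (hypothetical encoding): ("eps",), ("sym", a),
    ("alt", r1, r2), ("cat", r1, r2), ("star", r).
    edges is a list of (from_state, label, to_state); label EPS means epsilon.
    """
    ids = ids if ids is not None else count()
    kind = regex[0]
    if kind in ("eps", "sym"):
        i, f = next(ids), next(ids)
        label = EPS if kind == "eps" else regex[1]
        return i, f, [(i, label, f)]
    if kind == "cat":
        s1, f1, e1 = thompson(regex[1], ids)
        s2, f2, e2 = thompson(regex[2], ids)
        # Join N(r1)'s accepting state to N(r2)'s start state with an epsilon-edge.
        return s1, f2, e1 + e2 + [(f1, EPS, s2)]
    if kind == "alt":
        i = next(ids)
        s1, f1, e1 = thompson(regex[1], ids)
        s2, f2, e2 = thompson(regex[2], ids)
        f = next(ids)
        return i, f, e1 + e2 + [(i, EPS, s1), (i, EPS, s2), (f1, EPS, f), (f2, EPS, f)]
    if kind == "star":
        i = next(ids)
        s1, f1, e1 = thompson(regex[1], ids)
        f = next(ids)
        # Enter N(r), skip it entirely, loop back for another pass, or leave.
        return i, f, e1 + [(i, EPS, s1), (i, EPS, f), (f1, EPS, s1), (f1, EPS, f)]
    raise ValueError(f"unknown node {kind!r}")


def states_of(start, accept, edges):
    """Collect every state mentioned by the NFA."""
    return {start, accept} | {u for u, _, _ in edges} | {v for _, _, v in edges}
```

With this variant, building (a|bc)* as `("star", ("alt", ("sym", "a"), ("cat", ("sym", "b"), ("sym", "c"))))` yields ten states, 0 to 9, the same count as the S0 to S9 automaton in the worked example.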

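Example: Construct the NFA for (a|bc)*
First: NFAs for a, b, and c (each a start state S0 with a single edge, labeled a, b, or c, to an accepting state S1)
Second: NFA for bc (the NFA for b followed by the NFA for c)
Third: NFA for a|bc (a new start state with ε-edges into the NFAs for a and bc, whose accepting states have ε-edges to a new accepting state)
Fourth: NFA for (a|bc)* (ten states, S0 to S9: new start and accepting states with ε-edges to enter, skip, repeat, and leave the NFA for a|bc)
Of course, a human would design a simpler one, for example a two-state NFA in which S0 (start and accepting) loops on a, S0 -b→ S1, and S1 -c→ S0. But we can automate production of the complex one...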

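Combining the NFAs of a Set of Regular Expressions:
    a       { action1 }
    abb     { action2 }
    a*b+    { action3 }
[Figure: NFA for a: start 1 -a→ 2; NFA for abb: start 3 -a→ 4 -b→ 5 -b→ 6; NFA for a*b+: start 7 with a loop on a, 7 -b→ 8, with a loop on b at 8. Combined NFA: a new start state 0 with ε-edges to 1, 3, and 7; accepting states 2, 6, and 8 carry action1, action2, and action3]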

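Simulating the Combined NFA: Example 1
Input aaba: {0, 1, 3, 7} -a→ {2, 4, 7} -a→ {7} -b→ {8} -a→ none; retract, action3
Must find the longest match: continue until no further moves are possible, then retract to the last point at which the set of states contained an accepting state (here after aab, where state 8 accepts) and execute its action.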

Simulating the Combined NFA: Example 2
Input abba: {0, 1, 3, 7} -a→ {2, 4, 7} -b→ {5, 8} -b→ {6, 8} -a→ none; retract, {action2, action3}
When two or more accepting states are reached, the action given first in the Lex specification is executed (here action2, since state 6 for abb and state 8 for a*b+ both accept abb).

Errors: What if no rule matches? Add a regular expression that matches all strings not in the lexical specification, create the corresponding new state in the automaton, and put this regular expression last in priority.

Auxiliary Functions: ε-closure() and move(), used in several constructions later:
ε-closure(s) = {s} ∪ {t | s -ε→ ⋯ -ε→ t}
ε-closure(T) = ⋃_{s ∈ T} ε-closure(s)
move(T, a) = {t | s -a→ t and s ∈ T}

Examples for ε-closure() and move(), using the combined NFA for a, abb, and a*b+:
ε-closure({0}) = {0, 1, 3, 7}
move({0, 1, 3, 7}, a) = {2, 4, 7}
ε-closure({2, 4, 7}) = {2, 4, 7}
move({2, 4, 7}, a) = {7}
ε-closure({7}) = {7}
move({7}, b) = {8}
ε-closure({8}) = {8}
move({8}, a) = ∅

Simulating an NFA using ε-closure() and move():
    S = ε-closure({s0});
    c = nextchar();
    while (c != eof) {
        S = ε-closure(move(S, c));
        c = nextchar();
    }
    if (S ∩ F != ∅) return "yes";
    else return "no";
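A minimal Python sketch of ε-closure(), move(), and the simulation loop, assuming the combined NFA for a, abb, and a*b+ is stored as a dictionary from (state, label) to target states, with `None` standing for ε; the names `EDGES`, `eps_closure`, `move`, and `nfa_accepts` are hypothetical. The closure is a depth-first search over ε-edges only.

```python
EPS = None  # label used for epsilon-edges

# Combined NFA for a {action1}, abb {action2}, a*b+ {action3}
EDGES = {
    (0, EPS): {1, 3, 7},
    (1, "a"): {2},
    (3, "a"): {4},
    (4, "b"): {5},
    (5, "b"): {6},
    (7, "a"): {7},
    (7, "b"): {8},
    (8, "b"): {8},
}
START = 0
ACCEPTING = {2, 6, 8}


def eps_closure(states):
    """All states reachable from `states` using only epsilon-edges."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in EDGES.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, a):
    """States reachable from some state in `states` by one edge labeled a."""
    return frozenset(t for s in states for t in EDGES.get((s, a), ()))


def nfa_accepts(x):
    S = eps_closure({START})
    for c in x:
        S = eps_closure(move(S, c))
    return bool(S & ACCEPTING)
```

Calling these functions step by step reproduces the ε-closure() and move() examples, for instance `move(eps_closure({0}), "a")` gives {2, 4, 7}.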

Simulating an NFA: Additional Data Structures
Two stacks: oldstates holds the current set of states; newstates holds the next set of states
Boolean array alreadyon, indexed by NFA states, indicates which states are in newstates
Two-dimensional array move[s, a] representing the transition table

Simulating an NFA: Auxiliary Function
    addstate(s) {
        push s onto newstates;
        alreadyon[s] = TRUE;
        for (t on move[s, ε])
            if (!alreadyon[t])
                addstate(t);
    }

Simulating an NFA: Auxiliary Code
    for (s on oldstates) {
        for (t on move[s, c])
            if (!alreadyon[t])
                addstate(t);
        pop s from oldstates;
    }

Simulating an NFA: Auxiliary Code
    for (s on newstates) {
        pop s from newstates;
        push s onto oldstates;
        alreadyon[s] = FALSE;
    }
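The auxiliary function and the two loops combine into one step of the simulation per input character. Below is a minimal Python sketch, assuming NFA states numbered 0 to n-1 and a hypothetical table `move_table[s]` mapping a symbol (or `EPS` for ε) to a list of target states; Python lists serve as the stacks `oldstates` and `newstates`. Because `alreadyon` is checked before every push, each state enters `newstates` at most once per step.

```python
EPS = None  # label used for epsilon-edges

# Combined NFA for a, abb, a*b+ (state -> {label: [targets]}); states 2, 6, 8 accept.
COMBINED = [
    {EPS: [1, 3, 7]},      # 0
    {"a": [2]},            # 1
    {},                    # 2
    {"a": [4]},            # 3
    {"b": [5]},            # 4
    {"b": [6]},            # 5
    {},                    # 6
    {"a": [7], "b": [8]},  # 7
    {"b": [8]},            # 8
]


def simulate(move_table, start, final, x):
    alreadyon = [False] * len(move_table)
    oldstates, newstates = [], []

    def addstate(s):
        newstates.append(s)
        alreadyon[s] = True
        for t in move_table[s].get(EPS, []):
            if not alreadyon[t]:
                addstate(t)

    def transfer():
        # Move newstates onto oldstates and reset alreadyon for the next step.
        while newstates:
            s = newstates.pop()
            oldstates.append(s)
            alreadyon[s] = False

    addstate(start)  # epsilon-closure of the start state
    transfer()
    for c in x:
        while oldstates:
            s = oldstates.pop()
            for t in move_table[s].get(c, []):
                if not alreadyon[t]:
                    addstate(t)
        transfer()
    return any(s in final for s in oldstates)
```

Compared with building fresh sets on every step, the stacks and the boolean array are allocated once, so a step costs time proportional to the number of NFA states and edges it touches.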

Simulating an NFA: Stream Version
    S = ε-closure({s0});
    Sprev = ∅;
    c = nextchar();
    while (S != ∅) {
        if (S ∩ F != ∅)
            Sprev = S;
        S = ε-closure(move(S, c));
        c = nextchar();
    }
    if (Sprev != ∅) {
        execute highest priority action in Sprev;
        return "yes";
    } else return "error";
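A minimal Python sketch of the stream version with the retract step made explicit, assuming the combined NFA for a, abb, and a*b+ and a hypothetical map `ACTION` from accepting states to action numbers, where a lower number means the rule appears earlier in the Lex specification. Instead of keeping Sprev itself, it records the input position and winning action each time the set of states contains an accepting state, which is exactly the information needed to retract.

```python
EPS = None  # label used for epsilon-edges

# Combined NFA for a {action1}, abb {action2}, a*b+ {action3}
EDGES = {
    (0, EPS): {1, 3, 7},
    (1, "a"): {2}, (3, "a"): {4}, (4, "b"): {5}, (5, "b"): {6},
    (7, "a"): {7}, (7, "b"): {8}, (8, "b"): {8},
}
ACTION = {2: 1, 6: 2, 8: 3}  # accepting state -> action number


def eps_closure(states):
    closure, stack = set(states), list(states)
    while stack:
        for t in EDGES.get((stack.pop(), EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def move(states, c):
    return {t for s in states for t in EDGES.get((s, c), ())}


def next_token(text, pos=0):
    """Return (lexeme, action) for the longest match starting at pos, or None."""
    S = eps_closure({0})
    last = None  # (end position, action) for the last accepting set seen
    i = pos
    while S:  # continue until no further moves are possible
        accepting = [ACTION[s] for s in S if s in ACTION]
        if accepting:
            last = (i, min(accepting))  # earliest rule in the spec wins
        if i == len(text):
            break
        S = eps_closure(move(S, text[i]))
        i += 1
    if last is None:
        return None  # error: no rule matches
    end, action = last  # retract the input to just after the longest match
    return text[pos:end], action
```

On the two worked inputs this returns ("aab", 3) for aaba and ("abb", 2) for abba, matching the retract-and-prioritize behavior described for the combined NFA.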

The Subset Construction Algorithm: An off-line version of the algorithm for simulating NFAs on a full word. The algorithm produces:
Dstates, the set of states of the new DFA, each consisting of a set of states of the NFA
Dtran, the transition table of the new DFA

The Subset Construction Algorithm:
    add ε-closure({s0}) as an unmarked state to Dstates;
    while (there is an unmarked state T in Dstates) {
        mark T;
        for (each input symbol a ∈ Σ) {
            U = ε-closure(move(T, a));
            if (U is not in Dstates)
                add U as an unmarked state to Dstates;
            Dtran[T, a] = U;
        }
    }
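A minimal Python sketch of the subset construction, assuming the combined NFA for a, abb, and a*b+ in the same dictionary encoding as the simulation sketches; `subset_construction` and the other names are hypothetical. DFA states are frozensets of NFA states so they can serve as dictionary keys. One deliberate deviation: an empty U is treated as "no transition" rather than kept as a dead state, which matches the six states A to F listed in Example 2.

```python
EPS = None  # label used for epsilon-edges

# Combined NFA for a {action1}, abb {action2}, a*b+ {action3}
EDGES = {
    (0, EPS): {1, 3, 7},
    (1, "a"): {2}, (3, "a"): {4}, (4, "b"): {5}, (5, "b"): {6},
    (7, "a"): {7}, (7, "b"): {8}, (8, "b"): {8},
}


def eps_closure(states):
    closure, stack = set(states), list(states)
    while stack:
        for t in EDGES.get((stack.pop(), EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, a):
    return frozenset(t for s in states for t in EDGES.get((s, a), ()))


def subset_construction(start=0, sigma=("a", "b")):
    """Return (Dstates, Dtran) with DFA states as frozensets of NFA states."""
    s0 = eps_closure({start})
    dstates, dtran = {s0}, {}
    unmarked = [s0]
    while unmarked:
        T = unmarked.pop()  # mark T
        for a in sigma:
            U = eps_closure(move(T, a))
            if not U:
                continue  # no transition; the empty set is not kept as a dead state
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    return dstates, dtran
```

The order in which unmarked states are processed does not affect the result; a stack is used here only because it is the simplest work list.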

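Subset Construction Example 1: the NFA for (a|b)*abb (states 0 to 10) converted to a DFA with states A to E
Dstates:
A = {0, 1, 2, 4, 7}
B = {1, 2, 3, 4, 6, 7, 8}
C = {1, 2, 4, 5, 6, 7}
D = {1, 2, 4, 5, 6, 7, 9}
E = {1, 2, 4, 5, 6, 7, 10}
[Figure: DFA with start state A; A -a→ B, A -b→ C, B -a→ B, B -b→ D, C -a→ B, C -b→ C, D -a→ B, D -b→ E, E -a→ B, E -b→ C; E is accepting]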

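Subset Construction Example 2: the combined NFA for a, abb, and a*b+
Dstates:
A = {0, 1, 3, 7}
B = {2, 4, 7}
C = {8}
D = {7}
E = {5, 8}
F = {6, 8}
[Figure: DFA with start state A; A -a→ B, A -b→ C, B -a→ D, B -b→ E, C -b→ C, D -a→ D, D -b→ C, E -b→ F, F -b→ C; B accepts with action1, C and E with action3, F with action2]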

Exercise: Choose the DFA that represents the same language as the given NFA.
[Figure: the given NFA and four candidate DFAs; not reproduced]

Recap: Decision procedure for a string s and a regular expression R:
1. Generate an NFA from R
2. Either: convert the NFA to a DFA, then run the DFA simulation algorithm on s
3. Or: run the NFA simulation algorithm on s

Time-Space Tradeoffs (r = regular expression, x = input string):
Automaton   Space (worst case)   Time (worst case)
NFA         O(|r|)               O(|r| × |x|)
DFA         O(2^|r|)             O(|x|)
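The exponential DFA space bound is reachable in practice. A standard illustration is the language of strings over {a, b} whose n-th symbol from the end is a, that is (a|b)*a(a|b)^(n-1): its NFA has n + 1 states, yet the subset construction reaches 2^n DFA states. The Python sketch below counts them; `nth_from_last_nfa` and `count_dfa_states` are hypothetical helper names.

```python
def nth_from_last_nfa(n):
    """NFA for (a|b)*a(a|b)^(n-1): states 0..n, start 0, accepting n, no epsilon-edges."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta


def count_dfa_states(delta, start=0, sigma=("a", "b")):
    """Run the subset construction (no epsilon-edges) and count reachable DFA states."""
    s0 = frozenset({start})
    seen, work = {s0}, [s0]
    while work:
        T = work.pop()
        for a in sigma:
            U = frozenset(t for s in T for t in delta.get((s, a), ()))
            if U not in seen:
                seen.add(U)
                work.append(U)
    return len(seen)
```

Each DFA state must remember which of the last n symbols were a, so `count_dfa_states(nth_from_last_nfa(10))` reaches 1024 states from an 11-state NFA.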

From Regular Expression to DFA Directly: The important states of an NFA are those with a transition on some input symbol, that is, s is an important state if move({s}, a) ≠ ∅ for some a. The subset construction algorithm uses only the important states when it determines ε-closure(move(T, a)).

From Regular Expression to DFA Directly (Algorithm):
Augment the regular expression r with a special end symbol # to make the accepting states important: the new expression is r#
Construct a syntax tree for r#
Traverse the tree to construct the functions nullable, firstpos, lastpos, and followpos

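From Regular Expression to DFA Directly: Syntax Tree of (a|b)*abb#
[Figure: a left-deep chain of concatenation nodes whose leftmost operand is a closure (*) node over an alternation (|) of leaves a and b, followed by leaves a, b, b, and #; the leaves carry position numbers 1 (a), 2 (b), 3 (a), 4 (b), 5 (b), 6 (#)]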

From Regular Expression to DFA Directly: Annotating the Tree
nullable(n): true if the subtree at node n generates a language that includes the empty string
firstpos(n): the set of positions that can match the first symbol of a string generated by the subtree at node n
lastpos(n): the set of positions that can match the last symbol of a string generated by the subtree at node n
followpos(i): the set of positions that can follow position i in some string generated by the tree

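From Regular Expression to DFA Directly: Annotating the Tree
Leaf ε: nullable(n) = true; firstpos(n) = ∅; lastpos(n) = ∅
Leaf with position i: nullable(n) = false; firstpos(n) = {i}; lastpos(n) = {i}
Alternation node c1 | c2: nullable(n) = nullable(c1) or nullable(c2); firstpos(n) = firstpos(c1) ∪ firstpos(c2); lastpos(n) = lastpos(c1) ∪ lastpos(c2)
Concatenation node c1 · c2: nullable(n) = nullable(c1) and nullable(c2); firstpos(n) = if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1); lastpos(n) = if nullable(c2) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
Star node c1*: nullable(n) = true; firstpos(n) = firstpos(c1); lastpos(n) = lastpos(c1)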

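From Regular Expression to DFA Directly: Syntax Tree of (a|b)*abb#, annotated as firstpos / lastpos
Root concatenation (with # at 6): {1, 2, 3} / {6}
Concatenation with b at 5: {1, 2, 3} / {5}
Concatenation with b at 4: {1, 2, 3} / {4}
Concatenation with a at 3: {1, 2, 3} / {3}
Star node (nullable): {1, 2} / {1, 2}
Alternation node: {1, 2} / {1, 2}
Leaves: a at 1: {1} / {1}; b at 2: {2} / {2}; a at 3: {3} / {3}; b at 4: {4} / {4}; b at 5: {5} / {5}; # at 6: {6} / {6}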

From Regular Expression to DFA Directly: followpos
    for each node n in the tree do
        if n is a cat-node with left child c1 and right child c2 then
            for each i in lastpos(c1) do
                followpos(i) := followpos(i) ∪ firstpos(c2)
            end do
        else if n is a star-node
            for each i in lastpos(n) do
                followpos(i) := followpos(i) ∪ firstpos(n)
            end do
        end if
    end do
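The annotation rules and the followpos loop can be computed in a single bottom-up pass. Below is a minimal Python sketch, assuming a hypothetical tuple encoding of the syntax tree (`"leaf"`, `"or"`, `"cat"`, `"star"`, `"eps"`); `annotate` returns (nullable, firstpos, lastpos) for a node and adds followpos entries as a side effect. Visiting nodes in post-order rather than in arbitrary order does not change the result, since followpos only accumulates unions.

```python
from collections import defaultdict


def annotate(node, followpos):
    """Return (nullable, firstpos, lastpos) for node; add followpos entries."""
    kind = node[0]
    if kind == "eps":
        return True, set(), set()
    if kind == "leaf":  # ("leaf", position, symbol)
        return False, {node[1]}, {node[1]}
    if kind == "or":
        n1, f1, l1 = annotate(node[1], followpos)
        n2, f2, l2 = annotate(node[2], followpos)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = annotate(node[1], followpos)
        n2, f2, l2 = annotate(node[2], followpos)
        for i in l1:  # cat-node rule
            followpos[i] |= f2
        return n1 and n2, (f1 | f2 if n1 else f1), (l1 | l2 if n2 else l2)
    if kind == "star":
        _, f1, l1 = annotate(node[1], followpos)
        for i in l1:  # star-node rule: lastpos(n) = lastpos(c1), firstpos(n) = firstpos(c1)
            followpos[i] |= f1
        return True, f1, l1
    raise ValueError(f"unknown node {kind!r}")


# Syntax tree of (a|b)*abb# with positions 1..6.
TREE = ("cat",
        ("cat",
         ("cat",
          ("cat",
           ("star", ("or", ("leaf", 1, "a"), ("leaf", 2, "b"))),
           ("leaf", 3, "a")),
          ("leaf", 4, "b")),
         ("leaf", 5, "b")),
        ("leaf", 6, "#"))
```

Running `annotate(TREE, defaultdict(set))` gives firstpos(root) = {1, 2, 3} and lastpos(root) = {6}, and fills followpos exactly as in the worked example table.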

From Regular Expression to DFA Directly: Algorithm
    s0 := firstpos(root) where root is the root of the syntax tree
    Dstates := {s0} and is unmarked
    while there is an unmarked state T in Dstates do
        mark T
        for each input symbol a ∈ Σ do
            let U be the set of positions that are in followpos(p)
                for some position p in T, such that the symbol at position p is a
            if U is not empty and not in Dstates then
                add U as an unmarked state to Dstates
            end if
            Dtran[T, a] := U
        end do
    end do
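A minimal Python sketch of this loop, assuming the followpos values, position symbols, and firstpos(root) for (a|b)*abb# are given as plain dictionaries and sets (the names `FOLLOWPOS`, `SYMBOL`, `build_dfa`, and `accepts` are hypothetical). A DFA state is accepting when it contains the position of #; empty U values are simply not stored, so a missing Dtran entry means "no move".

```python
# followpos, position symbols, and firstpos(root) for (a|b)*abb#
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
SYMBOL = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
ROOT_FIRSTPOS = {1, 2, 3}
END_POS = 6  # position of the end marker #


def build_dfa(sigma=("a", "b")):
    s0 = frozenset(ROOT_FIRSTPOS)
    dstates, dtran = {s0}, {}
    unmarked = [s0]
    while unmarked:
        T = unmarked.pop()  # mark T
        for a in sigma:
            # Union of followpos(p) over positions p in T labeled a.
            U = frozenset(q for p in T if SYMBOL[p] == a for q in FOLLOWPOS[p])
            if not U:
                continue
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    accepting = {T for T in dstates if END_POS in T}
    return s0, dstates, dtran, accepting


def accepts(x):
    s0, _, dtran, accepting = build_dfa()
    T = s0
    for c in x:
        T = dtran.get((T, c))
        if T is None:
            return False
    return T in accepting
```

The resulting four states, {1, 2, 3}, {1, 2, 3, 4}, {1, 2, 3, 5}, and {1, 2, 3, 6}, are the same as in the worked example, with no intermediate NFA built at all.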

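From Regular Expression to DFA Directly: Example
Node   followpos
1      {1, 2, 3}
2      {1, 2, 3}
3      {4}
4      {5}
5      {6}
6      ∅
[Figure: the followpos graph over positions 1 to 6, and the resulting DFA with start state {1, 2, 3}; {1, 2, 3} -a→ {1, 2, 3, 4}, -b→ {1, 2, 3}; {1, 2, 3, 4} -a→ {1, 2, 3, 4}, -b→ {1, 2, 3, 5}; {1, 2, 3, 5} -a→ {1, 2, 3, 4}, -b→ {1, 2, 3, 6}; {1, 2, 3, 6} -a→ {1, 2, 3, 4}, -b→ {1, 2, 3}; {1, 2, 3, 6} is accepting because it contains position 6 of #]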

Implementing the Transition Function: A two-dimensional table indexed by current state and input character. Several rows might be equal, so compress the table by using an array indexed by current state that holds a pointer to an array indexed by input character; equal rows can then share one array.

Implementing the Transition Function: Alternatively, use adjacency lists. For each state, record a list of transitions in the form of input character-state pairs, ended by a default state for any input character not on the list.

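Implementing the Transition Function: Four-array solution using the arrays default, base, next, and check. To find the transition of state s on input character a, examine entry base[s] + a: if check[base[s] + a] = s, the next state is next[base[s] + a]; otherwise, repeat the lookup with state default[s].
[Figure: arrays default and base indexed by state (states q, r, t shown), and arrays next and check indexed by base[s] + a]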
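A minimal Python sketch of the four-array scheme, applied to the DFA for (a|b)*abb with assumed character codes a = 0 and b = 1. The greedy first-fit packing routine and the choice of default states are illustrative assumptions, not a prescribed method: state 3's row equals state 0's, so it stores nothing, and state 2's row differs from state 1's only on b, so it stores a single entry. `pack` and `next_state` are hypothetical names, and rows are assumed complete over the alphabet so a default state can always answer.

```python
# DFA for (a|b)*abb with a = 0, b = 1; each row maps character code -> next state.
ROWS = [
    {0: 1, 1: 0},  # state 0
    {0: 1, 1: 2},  # state 1
    {0: 1, 1: 3},  # state 2: same as state 1 except on b
    {0: 1, 1: 0},  # state 3: identical to state 0
]
DEFAULT = [None, None, 1, 0]  # hypothetical choice of fallback states


def pack(rows, default):
    """Greedily place each state's non-default entries into next/check."""
    base, nxt, check = [0] * len(rows), [], []
    for s, row in enumerate(rows):
        d = default[s]
        entries = {a: t for a, t in row.items() if d is None or rows[d].get(a) != t}
        b = 0
        # Find the smallest base whose needed slots are all free.
        while any(b + a < len(check) and check[b + a] is not None for a in entries):
            b += 1
        base[s] = b
        for a, t in entries.items():
            while len(check) <= b + a:
                nxt.append(None)
                check.append(None)
            nxt[b + a], check[b + a] = t, s
    return base, nxt, check


def next_state(s, a, default, base, nxt, check):
    """Four-array lookup: fall back through default states until check confirms."""
    while s is not None:
        i = base[s] + a
        if i < len(check) and check[i] == s:
            return nxt[i]
        s = default[s]
    return None  # no transition
```

With this packing, `next` and `check` need only five slots instead of the eight entries of the full 4 × 2 table, and every lookup still returns the original transition.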