Lexical Analysis Part III, Chapter 3: Finite Automata
  Slides adapted from: Robert van Engelen, Florida State University; Alex Aiken, Stanford University
Design of a Lexical Analyzer Generator
  - Translate regular expressions to an NFA
  - Translate the NFA to an efficient DFA (optional)
  [Diagram: regular expressions → NFA → DFA; simulate the NFA to recognize tokens, or simulate the DFA to recognize tokens]
Nondeterministic Finite Automata
  A nondeterministic finite automaton (NFA) is a 5-tuple (S, Σ, δ, s0, F) where
  - S is a finite set of states
  - Σ is a finite set of symbols, the alphabet
  - δ is a mapping from S × (Σ ∪ {ε}) to subsets of S
  - s0 ∈ S is the start state
  - F ⊆ S is the set of accepting (or final) states
Transition Graph
  An NFA can be diagrammatically represented by a labeled directed graph called a transition graph.
  [Figure: transition graph with states 0, 1, 2, 3 and start state 0]
  S = {0,1,2,3}   Σ = {a,b}   s0 = 0   F = {3}
Transition Table
  The mapping δ of an NFA can be represented in a transition table.
    δ(0, a) = {0,1}
    δ(0, b) = {0}
    δ(1, b) = {2}
    δ(2, b) = {3}

    State   Input a   Input b
    0       {0, 1}    {0}
    1                 {2}
    2                 {3}
The Language Defined by an NFA
  - An NFA accepts an input string x if and only if there is some path, with edges labeled with the symbols of x in sequence, from the start state to some accepting state
  - A state transition from one state to another on the path is called a move
  - The language defined by an NFA is the set of input strings it accepts
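The acceptance condition can be checked mechanically by tracking every state the NFA could be in. The following is a minimal sketch, assuming the transition table δ(0, a) = {0,1}, δ(0, b) = {0}, δ(1, b) = {2}, δ(2, b) = {3} with start state 0 and F = {3}; the dictionary encoding and the names `delta`, `step`, and `accepts` are illustrative choices, not part of the slides. This NFA has no ε-transitions, so no closure step is needed here.

```python
# Transition table: keys are (state, symbol) pairs, values are sets of
# next states. A missing entry stands for the empty set.
delta = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
start = 0
accepting = {3}


def step(states, symbol):
    """Return every state reachable from `states` by one move on `symbol`."""
    result = set()
    for s in states:
        result |= delta.get((s, symbol), set())
    return result


def accepts(text):
    """True if some path labeled with `text` leads from start to an accepting state."""
    current = {start}
    for ch in text:
        current = step(current, ch)
    return bool(current & accepting)


print(accepts("abb"), accepts("ab"))
```

A string is accepted as soon as one of the tracked states is accepting at the end of the input, which is exactly the "there is some path" wording of the definition.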
Example NFA
  An NFA that accepts L(aa*|bb*)
  [Figure: transition graph with states 0 to 4 and start state 0]
Deterministic Finite Automata
  A deterministic finite automaton (DFA) is a special case of an NFA:
  - No state has an ε-transition
  - For each state s and input symbol a there is at most one edge labeled a leaving s
  - Each entry in the transition table is a single state or is undefined
  - At most one path exists to accept a string
  - The simulation algorithm is simple
Example DFA
  A DFA that accepts L((a|b)*abb)
  [Figure: transition graph with states 0 to 3 and start state 0]
Exercise
  Select the regular expression that denotes the same language as this finite automaton.
  [Figure: answer options and the transition graph of the automaton]
Exercise
  Choose the NFA that accepts the following regular expression: 1*
  [Figure: four candidate NFAs]
Simulating a DFA
    s = s0;
    c = nextchar();
    while ( c != eof ) {
      s = move(s, c);
      c = nextchar();
    }
    if ( s in F ) return "yes";
    else return "no";
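The DFA loop translates almost line for line into a table lookup. The sketch below assumes a four-state DFA for (a|b)*abb written out by hand; the table `MOVE` and the function name `simulate_dfa` are illustrative, and an undefined table entry is treated as rejection.

```python
# Transition table of a DFA for (a|b)*abb (an assumed example automaton).
MOVE = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
START = 0
ACCEPTING = {3}


def simulate_dfa(text):
    """Run the DFA over `text`; one table lookup per input character."""
    s = START
    for c in text:
        s = MOVE.get((s, c))
        if s is None:          # undefined entry: no move is possible
            return False
    return s in ACCEPTING


print(simulate_dfa("aabb"), simulate_dfa("abba"))
```

Because each step is a single lookup, the running time is proportional to the length of the input, independent of the number of states.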
Design of a Lexical Analyzer Generator: RE to NFA to DFA
  Lex specification with regular expressions:
    p_1 { action_1 }
    p_2 { action_2 }
    ...
    p_n { action_n }
  NFA: [Figure: start state s0 leading to N(p_1), N(p_2), ..., N(p_n); the accepting state of each N(p_i) is labeled with action_i]
  Subset construction: NFA → DFA
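From Regular Expression to NFA
  [Figure: the NFA built for each form of regular expression, each with start state i and accepting state f]
  - ε: a single ε-edge from i to f
  - a: a single edge labeled a from i to f
  - r1 | r2: ε-edges from i into N(r1) and N(r2), and ε-edges from both to f
  - r1 r2: N(r1) followed by N(r2), leading from i to f
  - r*: ε-edges from i into N(r) and from N(r) to f, an ε-edge from i to f that bypasses N(r), and an ε-edge from the end of N(r) back to its start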
Example: Construct the NFA for a(b|c)*
  First: NFAs for a, b, c
  [Figure: three two-state NFAs, each S0 → S1, labeled a, b, and c]
  Second: NFA for b|c
  [Figure: NFA with states S0 to S5]
  Third: NFA for (b|c)*
  [Figure: NFA with states S0 to S7]
Example: Construct the NFA for a(b|c)*
  Fourth: NFA for a(b|c)*
  [Figure: NFA with states S0 to S9]
  Of course, a human would design a simpler one:
  [Figure: two-state NFA, S0 to S1 on a, with b and c handled at S1]
  But we can automate production of the complex one...
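The automated construction can be sketched as a recursive walk over the syntax tree of the regular expression. This is a minimal sketch under stated assumptions: the tuple encoding of the tree (`"sym"`, `"alt"`, `"cat"`, `"star"`), the edge-list representation, and the helper `accepts` are all illustrative, and concatenation is joined with an ε-edge (some presentations merge the two states instead). With that choice, a(b|c)* yields 10 states, matching S0 to S9 in the example.

```python
import itertools

EPS = None  # assumed encoding of an ε label


def thompson(node, edges, ids):
    """Append the edges of N(node) to `edges`; return its (start, final) states."""
    kind = node[0]
    if kind == "sym":                      # single symbol
        i, f = next(ids), next(ids)
        edges.append((i, node[1], f))
    elif kind == "alt":                    # r1 | r2
        i = next(ids)
        i1, f1 = thompson(node[1], edges, ids)
        i2, f2 = thompson(node[2], edges, ids)
        f = next(ids)
        edges += [(i, EPS, i1), (i, EPS, i2), (f1, EPS, f), (f2, EPS, f)]
    elif kind == "cat":                    # r1 r2, linked by an ε-edge
        i, f1 = thompson(node[1], edges, ids)
        i2, f = thompson(node[2], edges, ids)
        edges.append((f1, EPS, i2))
    elif kind == "star":                   # r*
        i = next(ids)
        i1, f1 = thompson(node[1], edges, ids)
        f = next(ids)
        edges += [(i, EPS, i1), (f1, EPS, i1), (i, EPS, f), (f1, EPS, f)]
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return i, f


def accepts(edges, start, final, text):
    """Simulate the NFA given as an edge list (used only to check the result)."""
    def closure(states):
        seen, todo = set(states), list(states)
        while todo:
            s = todo.pop()
            for p, label, q in edges:
                if p == s and label is EPS and q not in seen:
                    seen.add(q)
                    todo.append(q)
        return seen

    current = closure({start})
    for c in text:
        current = closure({q for p, label, q in edges if p in current and label == c})
    return final in current


# a(b|c)* as a syntax tree
TREE = ("cat", ("sym", "a"), ("star", ("alt", ("sym", "b"), ("sym", "c"))))
EDGES = []
IDS = itertools.count()
START, FINAL = thompson(TREE, EDGES, IDS)
print(accepts(EDGES, START, FINAL, "abcb"))
```

Each operator adds at most two states and four ε-edges, so the NFA grows linearly with the size of the regular expression.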
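Combining the NFAs of a Set of Regular Expressions
    a     { action_1 }
    abb   { action_2 }
    a*b+  { action_3 }
  [Figure: the three separate NFAs, with states 1 → 2 for a, 3 → 4 → 5 → 6 for abb, and 7 → 8 for a*b+; below them the combined NFA, in which a new start state 0 has ε-transitions to states 1, 3, and 7]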
Simulating the Combined NFA, Example 1
  [Figure: combined NFA; state 2 is labeled action_1, state 6 action_2, state 8 action_3]
  Input a a b a:
    {0,1,3,7} -a-> {2,4,7} -a-> {7} -b-> {8} -a-> none
    retract, action_3
  Must find the longest match:
  - Continue until no further moves are possible
  - When the last state is accepting: execute the action
Simulating the Combined NFA, Example 2
  [Figure: combined NFA; state 2 is labeled action_1, state 6 action_2, state 8 action_3]
  Input a b b a:
    {0,1,3,7} -a-> {2,4,7} -b-> {5,8} -b-> {6,8} -a-> none
    retract, {action_2, action_3}
  When two or more accepting states are reached, the first action given in the Lex specification is executed
Errors
  What if no rule matches?
  - Create a new state in the automaton corresponding to the regular expression "all strings not in the lexical specification"
  - Put that regular expression last in priority
Auxiliary Functions: ε-closure() and move()
  Used in several constructions later:
    ε-closure(s) = {s} ∪ {t | s -ε-> ... -ε-> t}
    ε-closure(T) = ∪ over s ∈ T of ε-closure(s)
    move(T, a) = {t | s -a-> t and s ∈ T}
Examples for ε-closure() and move()
  [Figure: combined NFA with states 0 to 8]
    ε-closure({0}) = {0,1,3,7}
    move({0,1,3,7}, a) = {2,4,7}
    ε-closure({2,4,7}) = {2,4,7}
    move({2,4,7}, a) = {7}
    ε-closure({7}) = {7}
    move({7}, b) = {8}
    ε-closure({8}) = {8}
    move({8}, a) = ∅
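Both functions are short set computations. The sketch below assumes the combined NFA for the patterns a, abb, and a*b+ with states 0 to 8; the dictionary `NFA`, the empty string as the ε label, and the function names are illustrative choices.

```python
EPS = ""  # assumed encoding of the ε label

# Combined NFA for a, abb, a*b+ (state 0 is the new start state).
NFA = {
    (0, EPS): {1, 3, 7},
    (1, "a"): {2},
    (3, "a"): {4},
    (4, "b"): {5},
    (5, "b"): {6},
    (7, "a"): {7},
    (7, "b"): {8},
    (8, "b"): {8},
}


def eps_closure(states):
    """All states reachable from `states` using ε-edges only."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def move(states, symbol):
    """All states reachable from `states` by a single edge labeled `symbol`."""
    result = set()
    for s in states:
        result |= NFA.get((s, symbol), set())
    return result


print(sorted(eps_closure({0})))
print(sorted(move(eps_closure({0}), "a")))
```

The explicit stack in `eps_closure` plays the role of the worklist: a state is pushed once, when it is first added to the closure.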
Simulating an NFA using ε-closure() and move()
    S = ε-closure(s0);
    c = nextchar();
    while ( c != eof ) {
      S = ε-closure(move(S, c));
      c = nextchar();
    }
    if ( S ∩ F != ∅ ) return "yes";
    else return "no";
Simulating an NFA: Additional Data Structures
  - Two stacks:
      oldstates holds the current set of states
      newstates holds the next set of states
  - Boolean array alreadyon, indexed by NFA states, indicates which states are in newstates
  - Two-dimensional array move[s, a] representing the transition table
Simulating an NFA: Auxiliary Function
    addstate(s) {
      push s onto newstates;
      alreadyon[s] = TRUE;
      for ( t on move[s, ε] )
        if ( !alreadyon[t] )
          addstate(t);
    }
Simulating an NFA: Auxiliary Code
    for ( s on oldstates ) {
      for ( t on move[s, c] )
        if ( !alreadyon[t] )
          addstate(t);
      pop s from oldstates;
    }
Simulating an NFA: Auxiliary Code
    for ( s on newstates ) {
      pop s from newstates;
      push s onto oldstates;
      alreadyon[s] = FALSE;
    }
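The two stacks, the boolean array, and the three code fragments fit together as shown in the following sketch. It assumes the combined NFA for a, abb, and a*b+ (states 0 to 8, accepting states 2, 6, and 8); the names `swap` and `run` and the dictionary form of the `move` table are illustrative additions.

```python
EPS = ""       # assumed encoding of the ε label
N_STATES = 9

MOVE = {
    (0, EPS): {1, 3, 7},
    (1, "a"): {2},
    (3, "a"): {4}, (4, "b"): {5}, (5, "b"): {6},
    (7, "a"): {7}, (7, "b"): {8}, (8, "b"): {8},
}

old_states = []                     # current set of states
new_states = []                     # next set of states
already_on = [False] * N_STATES     # which states are on new_states


def add_state(s):
    """Push s and everything reachable from it by ε-edges onto new_states."""
    new_states.append(s)
    already_on[s] = True
    for t in MOVE.get((s, EPS), ()):
        if not already_on[t]:
            add_state(t)


def swap():
    """Move new_states onto old_states and clear the already_on flags."""
    while new_states:
        s = new_states.pop()
        old_states.append(s)
        already_on[s] = False


def run(text, start=0, accepting=frozenset({2, 6, 8})):
    old_states.clear()
    new_states.clear()
    add_state(start)                # ε-closure of the start state
    swap()
    for c in text:
        while old_states:           # one step on input character c
            s = old_states.pop()
            for t in MOVE.get((s, c), ()):
                if not already_on[t]:
                    add_state(t)
        swap()
    return any(s in accepting for s in old_states)


print(run("abb"), run("aaba"))
```

The `already_on` flags keep each state from being pushed twice in one step, so a step costs time proportional to the number of states plus transitions rather than to the number of paths.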
Simulating an NFA: Stream Version
    S = ε-closure({s0});
    Sprev = ∅;
    c = nextchar();
    while ( S != ∅ ) {
      if ( S ∩ F != ∅ )
        Sprev = S;
      S = ε-closure(move(S, c));
      c = nextchar();
    }
    if ( Sprev != ∅ ) {
      execute highest priority action in Sprev;
      return "yes";
    }
    else return "error";
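A longest-match scanner built on this loop also has to remember where the last accepting set was seen, so that the input can be retracted to that point. The sketch below assumes the combined NFA for a, abb, and a*b+; the table `ACTIONS` (accepting state to action number, lower numbers appearing earlier in the specification) and the function `longest_match` are illustrative names, and end of input is handled by an explicit length check.

```python
EPS = ""  # assumed encoding of the ε label

NFA = {
    (0, EPS): {1, 3, 7},
    (1, "a"): {2},
    (3, "a"): {4}, (4, "b"): {5}, (5, "b"): {6},
    (7, "a"): {7}, (7, "b"): {8}, (8, "b"): {8},
}
ACTIONS = {2: 1, 6: 2, 8: 3}   # accepting state -> action number


def eps_closure(states):
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def move(states, symbol):
    result = set()
    for s in states:
        result |= NFA.get((s, symbol), set())
    return result


def longest_match(text, pos=0):
    """Return (end, action) for the longest match starting at pos, or None."""
    states = eps_closure({0})
    last = None                     # last accepting point seen so far
    i = pos
    while states:
        hits = [ACTIONS[s] for s in states if s in ACTIONS]
        if hits:
            last = (i, min(hits))   # first action in the specification wins
        if i == len(text):
            break
        states = eps_closure(move(states, text[i]))
        i += 1
    return last


print(longest_match("aaba"))
print(longest_match("abba"))
```

On input aaba the scanner reads past the match, finds no further move, and falls back to the end position recorded for action 3; on abba both action 2 and action 3 are available at the final accepting set and the earlier rule is chosen.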
The Subset Construction Algorithm
  - Off-line version of the algorithm for simulation of NFAs on a full word
  - The algorithm produces:
      Dstates, the set of states of the new DFA, consisting of sets of states of the NFA
      Dtran, the transition table of the new DFA
The Subset Construction Algorithm
    add ε-closure(s0) as an unmarked state to Dstates
    while ( there is an unmarked state T in Dstates ) {
      mark T;
      for each input symbol a ∈ Σ {
        U = ε-closure(move(T, a));
        if ( U is not in Dstates )
          add U as an unmarked state to Dstates
        Dtran[T, a] = U;
      }
    }
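The algorithm can be sketched with frozensets as DFA states. The code below assumes the combined NFA for a, abb, and a*b+; the function names and the dictionary encoding are illustrative. One deliberate deviation, stated as an assumption: an empty set U is skipped rather than added as a dead state, so the corresponding Dtran entry is simply left undefined.

```python
EPS = ""  # assumed encoding of the ε label

NFA = {
    (0, EPS): {1, 3, 7},
    (1, "a"): {2},
    (3, "a"): {4}, (4, "b"): {5}, (5, "b"): {6},
    (7, "a"): {7}, (7, "b"): {8}, (8, "b"): {8},
}


def eps_closure(nfa, states):
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def move(nfa, states, symbol):
    result = set()
    for s in states:
        result |= nfa.get((s, symbol), set())
    return result


def subset_construction(nfa, start, alphabet):
    """Return (Dstates, Dtran) for the DFA equivalent to `nfa`."""
    start_set = frozenset(eps_closure(nfa, {start}))
    dstates = {start_set}
    unmarked = [start_set]
    dtran = {}
    while unmarked:
        T = unmarked.pop()                  # taking T off the list marks it
        for a in alphabet:
            U = frozenset(eps_closure(nfa, move(nfa, T, a)))
            if not U:
                continue                    # no dead state: entry stays undefined
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    return dstates, dtran


DSTATES, DTRAN = subset_construction(NFA, 0, "ab")
for state in sorted(DSTATES, key=sorted):
    print(sorted(state))
```

Only subsets that are actually reachable from the start set are created, which is why practical DFAs are usually far smaller than the worst-case exponential bound.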
Subset Construction Example 1
  [Figure: NFA with states 0 to 10 and the resulting DFA with states A (start), B, C, D, E]
  Dstates
    A = {0,1,2,4,7}
    B = {1,2,3,4,6,7,8}
    C = {1,2,4,5,6,7}
    D = {1,2,4,5,6,7,9}
    E = {1,2,4,5,6,7,10}
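Subset Construction Example 2
  [Figure: combined NFA with states 0 to 8 and the resulting DFA with states A (start), B, C, D, E, F; accepting DFA states are labeled with their actions]
  Dstates
    A = {0,1,3,7}
    B = {2,4,7}
    C = {8}
    D = {7}
    E = {5,8}
    F = {6,8}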
Exercise
  Choose the DFA that represents the same language as the given NFA.
  [Figure: the NFA and four candidate DFAs]
Recap
  Decision procedure for a string s and a regular expression R:
  1. Generate an NFA from R
  2. Either: convert the NFA to a DFA, then run the DFA simulation algorithm on s
  3. Or: run the NFA simulation algorithm on s
Time-Space Tradeoffs
  r = regular expression, x = input string

    Automaton   Space (worst case)   Time (worst case)
    NFA         O(|r|)               O(|r| × |x|)
    DFA         O(2^|r|)             O(|x|)
From Regular Expression to DFA Directly
  - The important states of an NFA are those with an outgoing non-ε transition, that is, if move({s}, a) ≠ ∅ for some a, then s is an important state
  - The subset construction algorithm uses only the important states when it determines ε-closure(move(T, a))
From Regular Expression to DFA Directly (Algorithm)
  - Augment the regular expression r with a special end symbol # to make accepting states important: the new expression is r#
  - Construct a syntax tree for r#
  - Traverse the tree to construct the functions nullable, firstpos, lastpos, and followpos
From Regular Expression to DFA Directly: Syntax Tree of (a|b)*abb#
  [Figure: syntax tree with concatenation, closure (*), and alternation (|) nodes; the leaves a, b, a, b, b, # carry the position numbers 1 to 6]
From Regular Expression to DFA Directly: Annotating the Tree
  - nullable(n): true if the language generated by the subtree at node n includes the empty string
  - firstpos(n): the set of positions that can match the first symbol of a string generated by the subtree at node n
  - lastpos(n): the set of positions that can match the last symbol of a string generated by the subtree at node n
  - followpos(i): the set of positions that can follow position i in the tree
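From Regular Expression to DFA Directly: Annotating the Tree
  Leaf ε:
    nullable(n) = true
    firstpos(n) = ∅
    lastpos(n)  = ∅
  Leaf i:
    nullable(n) = false
    firstpos(n) = {i}
    lastpos(n)  = {i}
  Alternation node | with children c1, c2:
    nullable(n) = nullable(c1) or nullable(c2)
    firstpos(n) = firstpos(c1) ∪ firstpos(c2)
    lastpos(n)  = lastpos(c1) ∪ lastpos(c2)
  Concatenation node with children c1, c2:
    nullable(n) = nullable(c1) and nullable(c2)
    firstpos(n) = if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1)
    lastpos(n)  = if nullable(c2) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
  Closure node * with child c1:
    nullable(n) = true
    firstpos(n) = firstpos(c1)
    lastpos(n)  = lastpos(c1)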
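From Regular Expression to DFA Directly: Syntax Tree of (a|b)*abb#
  [Figure: syntax tree annotated with firstpos (left of each node) and lastpos (right of each node); the closure node is the only nullable node]

    Node               firstpos    lastpos
    leaf a (1)         {1}         {1}
    leaf b (2)         {2}         {2}
    a|b                {1, 2}      {1, 2}
    (a|b)*             {1, 2}      {1, 2}
    leaf a (3)         {3}         {3}
    (a|b)*a            {1, 2, 3}   {3}
    leaf b (4)         {4}         {4}
    (a|b)*ab           {1, 2, 3}   {4}
    leaf b (5)         {5}         {5}
    (a|b)*abb          {1, 2, 3}   {5}
    leaf # (6)         {6}         {6}
    (a|b)*abb#         {1, 2, 3}   {6}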
From Regular Expression to DFA Directly: followpos
    for each node n in the tree do
      if n is a cat-node with left child c1 and right child c2 then
        for each i in lastpos(c1) do
          followpos(i) := followpos(i) ∪ firstpos(c2)
        end do
      else if n is a star-node then
        for each i in lastpos(n) do
          followpos(i) := followpos(i) ∪ firstpos(n)
        end do
      end if
    end do
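The four functions can be computed in one bottom-up traversal. The sketch below assumes a tuple encoding of the syntax tree (`"leaf"`, `"eps"`, `"or"`, `"cat"`, `"star"`) and the name `analyze`, both illustrative; it is applied to (a|b)*abb# with positions 1 to 6.

```python
from collections import defaultdict


def analyze(node, followpos):
    """Return (nullable, firstpos, lastpos) of `node` and fill in `followpos`."""
    kind = node[0]
    if kind == "eps":
        return True, set(), set()
    if kind == "leaf":                       # ("leaf", position)
        return False, {node[1]}, {node[1]}
    if kind == "or":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        for i in l1:                         # cat rule for followpos
            followpos[i] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last
    if kind == "star":
        _, f1, l1 = analyze(node[1], followpos)
        for i in l1:                         # star rule for followpos
            followpos[i] |= f1
        return True, f1, l1
    raise ValueError(f"unknown node kind: {kind}")


def leaf(i):
    return ("leaf", i)


# (a|b)*abb# with positions a=1, b=2, a=3, b=4, b=5, #=6
TREE = ("cat",
        ("cat",
         ("cat",
          ("cat", ("star", ("or", leaf(1), leaf(2))), leaf(3)),
          leaf(4)),
         leaf(5)),
        leaf(6))

FOLLOWPOS = defaultdict(set)
NULLABLE, FIRSTPOS, LASTPOS = analyze(TREE, FOLLOWPOS)
for pos in range(1, 7):
    print(pos, sorted(FOLLOWPOS[pos]))
```

For a star node, firstpos and lastpos equal those of its child, so using the child's sets in the star rule gives the same result as using firstpos(n) and lastpos(n).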
From Regular Expression to DFA Directly: Algorithm
    s0 := firstpos(root) where root is the root of the syntax tree
    Dstates := {s0} and is unmarked
    while there is an unmarked state T in Dstates do
      mark T
      for each input symbol a ∈ Σ do
        let U be the set of positions that are in followpos(p) for some position p in T, such that the symbol at position p is a
        if U is not empty and not in Dstates then
          add U as an unmarked state to Dstates
        end if
        Dtran[T, a] := U
      end do
    end do
From Regular Expression to DFA Directly: Example

    Node   followpos
    1      {1, 2, 3}
    2      {1, 2, 3}
    3      {4}
    4      {5}
    5      {6}
    6      -

  [Figure: followpos graph over positions 1 to 6, and the resulting DFA with states {1,2,3} (start), {1,2,3,4}, {1,2,3,5}, {1,2,3,6}]
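Given the followpos table, the DFA states are built the same way as in the subset construction, but over sets of positions. The sketch below hard-codes the table for (a|b)*abb#; the names `SYMBOL_AT`, `build_dfa`, and `accepts` are illustrative, and empty sets are skipped so that the corresponding Dtran entries stay undefined (an assumption of this sketch).

```python
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
SYMBOL_AT = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
FIRSTPOS_ROOT = {1, 2, 3}
END_POSITION = 6            # position of the end marker #


def build_dfa(alphabet="ab"):
    """Return (start, Dstates, Dtran); DFA states are frozensets of positions."""
    start = frozenset(FIRSTPOS_ROOT)
    dstates, unmarked, dtran = {start}, [start], {}
    while unmarked:
        T = unmarked.pop()
        for a in alphabet:
            U = set()
            for p in T:
                if SYMBOL_AT[p] == a:
                    U |= FOLLOWPOS[p]
            U = frozenset(U)
            if not U:
                continue
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    return start, dstates, dtran


def accepts(text):
    start, _, dtran = build_dfa()
    state = start
    for c in text:
        state = dtran.get((state, c))
        if state is None:
            return False
    return END_POSITION in state   # accepting states contain the # position


START, DSTATES, DTRAN = build_dfa()
for state in sorted(DSTATES, key=sorted):
    print(sorted(state))
```

A state is accepting exactly when it contains the position of #, which is why the regular expression is augmented before the tree is built.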
Implementing the Transition Function
  - Two-dimensional table indexed by current state and input character
  - Several rows might be equal
  - Compress the table by using an array indexed by current state, providing a pointer to an array indexed by input character
Implementing the Transition Function
  - Alternatively, use an adjacency list
  - For each state, record a list of transitions in the form of (input character, state) pairs
  - The list is ended by a default state for any input character not on the list
Implementing the Transition Function
  Four-array solution
  [Figure: the arrays default, base, next, and check, with example entries q, r, t]
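One common reading of the four-array scheme is: look in next at offset base[s] + a, trust the entry only if check confirms it belongs to state s, and otherwise retry with the state default[s]. The sketch below follows that reading; the array contents were laid out by hand for a four-state DFA for (a|b)*abb with a encoded as 0 and b as 1, and the names and the sentinel -1 are assumptions of this example.

```python
A, B = 0, 1                 # assumed integer codes for the input symbols

# Full table being compressed:
#   state 0: a->1, b->0     state 1: a->1, b->2
#   state 2: a->1, b->3     state 3: a->1, b->0
DEFAULT = [-1, 0, 0, 0]     # state to consult when an entry is not stored
BASE = [0, 1, 2, 0]         # offset of each state's entries in NEXT/CHECK
NEXT = [1, 0, 2, 3]         # stored target states
CHECK = [0, 0, 1, 2]        # owner state of each stored entry


def next_state(s, a):
    """Look up the transition from state s on symbol code a."""
    while s != -1:
        slot = BASE[s] + a
        if slot < len(CHECK) and CHECK[slot] == s:
            return NEXT[slot]
        s = DEFAULT[s]      # entry not owned by s: fall back to its default
    return -1               # no transition defined


def accepts(text, start=0, accepting=frozenset({3})):
    codes = {"a": A, "b": B}
    s = start
    for c in text:
        s = next_state(s, codes[c])
        if s == -1:
            return False
    return s in accepting


print(accepts("abb"), accepts("abba"))
```

Here states 1, 2, and 3 store only the entries in which they differ from state 0, so four slots replace the eight entries of the full table; the price is an extra lookup whenever the default chain is followed.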