Converting Regular Expressions to Discrete Finite Automata: A Tutorial

David Christiansen
2013-01-03

This is a tutorial on how to convert regular expressions to nondeterministic finite automata (NFA) and how to convert these to deterministic finite automata. It's meant to be straightforward and easy to follow, rather than worrying about every technical detail. Please see Torben Mogensen's book for the nitty-gritty.

In this tutorial, we write concrete regular expressions in typewriter font and variables ranging over regular expressions with various forms of the letter r. The regular expression matching the empty string is written ε, pronounced epsilon. We use α and β, pronounced alpha and beta, to represent arbitrary symbols.

Converting Regular Expressions to NFAs

A regular expression can consist of the following constructions:

- It can match the empty string, represented by ε.
- It can match a symbol α from the alphabet of the language to be matched.
- It can match either of two regular expressions r1 and r2, written r1 | r2.
- It can match a sequence of two regular expressions r1 and r2, written r1 r2.
- It can match zero or more instances of a regular expression r, written r*.

We convert a regular expression to a nondeterministic finite automaton (NFA) by considering each case in the above definition.

By definition, every automaton (whether NFA or DFA) has a single start state. An automaton may have more than one accepting state, but because of the way the rules below are constructed, the resulting NFAs will have exactly one accepting state. Even if this were not the case, it would be easy to create a unique accepting state by simply creating a new unique accepting state and inserting ε-transitions from the previous accepting states to the new one.

We begin with the base cases, that is, the regular expressions α and ε. The symbol α is matched by an automaton that has a start state, an accepting state, and a transition on α from the start state to the accepting state:

[Figure: start state --α--> accepting state]

Because we are creating an NFA, we can use the same construction to match the empty expression, just with an ε-transition to the accepting state rather than a symbol from the alphabet. In other words, the regular expression ε gives rise to the following automaton:

[Figure: start state --ε--> accepting state]

In the remaining cases, that is, the composite regular expressions, we need to represent the result of a recursive use of the conversion procedure. The result of converting some regular expression r to an NFA will be shown as follows:

[Figure: a box labeled r with a state on its left edge and an accepting state on its right edge]

The leftmost state in the box represents the start state of r's automaton, and the accepting state to the right represents the accepting states of r's automaton.

A choice between r1 and r2, written r1 | r2, is constructed by creating a new start state with ε-transitions to the start states of r1's and r2's automata, and an accepting state with ε-transitions from their accepting states, as follows:

[Figure: the NFA for r1 | r2]

A sequence of two regular expressions r1 and r2, written r1 r2, is converted to an NFA by simply attaching the accepting states of r1's NFA to the initial state of r2's NFA:

[Figure: the NFA for r1 r2]

The NFA of the repeating expression r* has an ε-transition from its start state to r's start state, and an ε-transition from r's accepting state to its accepting state. We allow unlimited repetitions of r by inserting an ε-transition from its accepting state to its start state, and we allow r to be skipped by inserting an ε-transition from the new start state directly to the new accepting state.
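The construction rules described so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tutorial's own code: the names NFA, symbol, epsilon, choice, seq and star are hypothetical, states are plain integers, and sequencing is read as an ε-transition from the accepting state of the first NFA to the start state of the second.

```python
from dataclasses import dataclass
from itertools import count

EPS = None  # label used for ε-transitions in this sketch (an assumption)

@dataclass
class NFA:
    start: int    # the single start state
    accept: int   # the single accepting state
    edges: list   # list of (source, label, target) triples

_ids = count()

def fresh():
    """Return a state number that has not been used before."""
    return next(_ids)

def symbol(a):
    """Base case: start --a--> accept."""
    s, t = fresh(), fresh()
    return NFA(s, t, [(s, a, t)])

def epsilon():
    """Base case for ε: the same shape, with an ε-transition."""
    return symbol(EPS)

def choice(n1, n2):
    """r1 | r2: new start and accepting states, joined by ε-transitions."""
    s, t = fresh(), fresh()
    edges = n1.edges + n2.edges + [
        (s, EPS, n1.start), (s, EPS, n2.start),
        (n1.accept, EPS, t), (n2.accept, EPS, t),
    ]
    return NFA(s, t, edges)

def seq(n1, n2):
    """r1 r2: attach n1's accepting state to n2's start state (here via ε)."""
    return NFA(n1.start, n2.accept,
               n1.edges + n2.edges + [(n1.accept, EPS, n2.start)])

def star(n):
    """r*: enter r, leave r, loop back for repetition, or skip r entirely."""
    s, t = fresh(), fresh()
    edges = n.edges + [
        (s, EPS, n.start),   # new start to r's start
        (n.accept, EPS, t),  # r's accepting state to new accepting state
        (t, EPS, s),         # repeat: new accepting state back to new start
        (s, EPS, t),         # skip: new start directly to new accepting state
    ]
    return NFA(s, t, edges)

# Building the NFA for a, then b*, then a
example = seq(symbol("a"), seq(star(symbol("b")), symbol("a")))
```

Keeping exactly one start and one accepting state per NFA is what lets each rule treat its sub-automata as opaque boxes.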

In picture form, the NFA corresponding to r* is as follows:

[Figure: the NFA for r*]

Converting NFAs to DFAs

To convert an NFA to a DFA, we must find a way to remove all ε-transitions and to ensure that there is one transition per symbol in each state. We do this by constructing a DFA in which each state corresponds to a set of some states from the NFA. In the DFA, transitions from a state by some symbol α go to the state that consists of all the possible NFA states that could be reached by α from some NFA state contained in the present DFA state. The resulting DFA simulates the given NFA in the sense that a single DFA transition represents many simultaneous NFA transitions.

The first concept we need is the ε-closure, pronounced epsilon closure. The ε-closure of an NFA state q is the set containing q along with all states in the automaton that are reachable by any number of ε-transitions from q. In the following automaton, the ε-closures are given in the table below:

[Figure: an NFA with states 0, 1, 2 and 3 and transitions on α, β and ε]

State   ε-closure
0       { 0, 2, 3 }
1       { 1 }
2       { 2, 3 }
3       { 3 }

Likewise, we can define the ε-closure of a set of states to be the states reachable by ε-transitions from its members. In other words, this is the union of the ε-closures of its elements.

To convert our NFA to its DFA counterpart, we begin by taking the ε-closure of the start state of our NFA and constructing a new start state in our DFA corresponding to that ε-closure. Next, for each symbol α in our alphabet, we record the set of NFA states that we can reach from that DFA state on that symbol. For each such set, we make a DFA state corresponding to its ε-closure, taking care to do this only once for each set. In case two sets are equal, we simply reuse the existing DFA state that we already constructed. This process is then repeated for each of the new DFA states (that is, sets of NFA states) until we run out of DFA states to process. Finally, every DFA state whose corresponding set of NFA states contains an accepting state is itself marked as an accepting state.

Example

For this example, we'll convert the regular expression ab*a into a DFA. We begin with simple automata matching a and b:

[Figure: the NFAs for a and b]

Next, we construct the automaton matching b* using the automaton for b:

[Figure: the NFA for b*]

Finally, we attach our a-automaton to each end using the rule for sequencing:

[Figure: the NFA for ab*a]

To convert this NFA to a DFA, we begin by labeling the states so we can refer to them during the process:

[Figure: the NFA for ab*a with its states labeled 0 through 7]
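The labeled NFA can also be written down as data, which makes the ε-closure easy to compute mechanically. The sketch below is an illustration only. The edge list NFA_EDGES is an assumption: it is reconstructed from the construction rules and from the ε-closures used in this example, since the figure itself is not reproduced here. The names EPS, NFA_EDGES and epsilon_closure are hypothetical.

```python
EPS = None  # label used for ε-transitions in this sketch

# Assumed transitions of the labeled NFA for ab*a:
# 0 --a--> 1, the b* part spans states 2 to 5, and 6 --a--> 7.
NFA_EDGES = [
    (0, "a", 1),
    (1, EPS, 2),   # sequencing: a into b*
    (2, EPS, 3),   # enter the b automaton
    (3, "b", 4),
    (4, EPS, 5),   # leave the b automaton
    (5, EPS, 2),   # repeat
    (2, EPS, 5),   # skip
    (5, EPS, 6),   # sequencing: b* into a
    (6, "a", 7),
]

def epsilon_closure(states, edges):
    """Return all states reachable from `states` using only ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return frozenset(closure)

print(sorted(epsilon_closure({1}, NFA_EDGES)))  # [1, 2, 3, 5, 6]
print(sorted(epsilon_closure({4}, NFA_EDGES)))  # [2, 3, 4, 5, 6]
```

Passing a set rather than a single state means the same function covers the ε-closure of a set of states, which is the union of the closures of its elements.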

We begin by taking the ε-closure of the start state, 0. As there are no ε-transitions from it, we simply have the singleton set { 0 }. We call the corresponding DFA state S0, and name later DFA states S1, S2 and S3 as they are created. Our DFA is as follows:

State   a     b     NFA States
S0      ?     ?     { 0 }

The NFA state 0 has no transitions on b. Therefore, we mark this table entry as an error. On a, it has a transition to 1. The ε-closure of 1 is { 1, 2, 3, 5, 6 }. We assign this set to a new DFA state, yielding the following automaton:

State   a     b     NFA States
S0      S1    Err   { 0 }
S1      ?     ?     { 1, 2, 3, 5, 6 }

The NFA state 3 has a b-transition to 4, which is the only b-transition in S1's NFA states. The ε-closure of 4 is { 2, 3, 4, 5, 6 }. The NFA state 6 has the only a-transition in S1's NFA states, and it leads to 7, whose ε-closure is simply { 7 }.

State   a     b     NFA States
S0      S1    Err   { 0 }
S1      S3    S2    { 1, 2, 3, 5, 6 }
S2      ?     ?     { 2, 3, 4, 5, 6 }
S3      ?     ?     { 7 }

Analyzing S2, we find a b-transition from the NFA state 3 to the NFA state 4. However, we already have a DFA state corresponding to 4's ε-closure; namely, S2 itself. Likewise, we have an a-transition to 7, whose ε-closure is represented by the DFA state S3.

State   a     b     NFA States
S0      S1    Err   { 0 }
S1      S3    S2    { 1, 2, 3, 5, 6 }
S2      S3    S2    { 2, 3, 4, 5, 6 }
S3      ?     ?     { 7 }

Finally, we examine S3, corresponding to { 7 }. There are no outgoing transitions from 7, so we mark these as errors:

State   a     b     NFA States
S0      S1    Err   { 0 }
S1      S3    S2    { 1, 2, 3, 5, 6 }
S2      S3    S2    { 2, 3, 4, 5, 6 }
S3      Err   Err   { 7 }
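The table-building steps above can be sketched as a worklist algorithm. This is a minimal illustration under assumptions, not a definitive implementation: the edge list NFA_EDGES is reconstructed from the ε-closures in the walkthrough, None stands for an error entry, DFA states are represented directly as frozensets of NFA states, and all function names are hypothetical.

```python
from collections import deque

EPS = None  # label used for ε-transitions in this sketch

# Assumed transitions of the labeled NFA for ab*a
NFA_EDGES = [
    (0, "a", 1), (1, EPS, 2), (2, EPS, 3), (3, "b", 4), (4, EPS, 5),
    (5, EPS, 2), (2, EPS, 5), (5, EPS, 6), (6, "a", 7),
]
NFA_START, NFA_ACCEPT = 0, {7}
ALPHABET = ["a", "b"]

def epsilon_closure(states, edges):
    """Return all states reachable from `states` using only ε-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return frozenset(closure)

def move(states, sym, edges):
    """NFA states reachable from any state in `states` by one `sym` step."""
    return {dst for src, label, dst in edges if src in states and label == sym}

def subset_construction(start, alphabet, edges):
    """Build the DFA table: DFA state -> {symbol: DFA state or None}."""
    start_set = epsilon_closure({start}, edges)
    table = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current in table:
            continue  # each set of NFA states is processed only once
        table[current] = {}
        for sym in alphabet:
            targets = move(current, sym, edges)
            if not targets:
                table[current][sym] = None  # error entry
                continue
            nxt = epsilon_closure(targets, edges)
            table[current][sym] = nxt
            if nxt not in table:
                queue.append(nxt)
    return start_set, table

def accepting_states(table, nfa_accept):
    """DFA states that contain at least one accepting NFA state."""
    return {state for state in table if state & nfa_accept}

dfa_start, dfa_table = subset_construction(NFA_START, ALPHABET, NFA_EDGES)
for state, row in dfa_table.items():
    print(sorted(state), {s: (sorted(t) if t else "Err") for s, t in row.items()})
```

Using frozensets as dictionary keys is what implements the "reuse the existing DFA state" rule: two equal sets of NFA states hash to the same entry.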

Finally, we mark each DFA state (set of NFA states) as accepting if at least one of its member NFA states is accepting. In this case, only 7 is accepting, which means that only S3, the DFA state for { 7 }, is accepting.

State   a     b     Accepting
S0      S1    Err   No
S1      S3    S2    No
S2      S3    S2    No
S3      Err   Err   Yes (from 7)

The DFA is now fully constructed.

Acknowledgments

I would like to thank Janus Varmarken for correcting an error in the first version of this tutorial.