Text Search
Marko Berezovský, Radek Mařík
PAL
Nondeterministic Finite Automata
Transformation NFA to DFA and Simulation of NFA
Text Search Using Automata
Power of Nondeterministic Approach
Regular Expression Search
Dealing with ε-transitions
NFA, DFA & Text search: References
Languages, grammars, automata
Czech instant sources:
[1] M. Demlová: A4B0JAG, http://mth.feld.vut.z/demlov/tehing/jg/predn_jg.html
Pages …-7; in PAL, you may wish to skip: proofs, chapters …4, …
[2] I. Černá, M. Křetínský, A. Kučera: Automaty a formální jazyky I, http://is.muni.z/do/499/el/estud/fi/js0/i00/formlni_jzyky utomty_i.pdf
Chapters … and …, skip the same parts as in [1].
English sources:
[3] B. Melichar, J. Holub, T. Polcar: Text Search Algorithms, http://w.felk.vut.z/li/exe/feth.php/ourses/4mpl/melihr-ts-letures-.pdf
Chapters …4 and …; it is probably too short, there is nothing to skip.
[4] J. E. Hopcroft, R. Motwani, J. D. Ullman: Introduction to Automata Theory, follow the link at http://w.felk.vut.z/doku.php/ourses/4mpl/litertur_odkzy
Chapters …; there is a lot to skip, preferably consult the teacher.
For more references see the PAL links pages:
http://w.felk.vut.z/doku.php/ourses/4mpl/odkzy_zdroje (CZ)
https://w.fel.vut.z/wiki/ourses/e4mpl/referenes (EN)
Finite Automata: Overview
Deterministic Finite Automaton (DFA), Nondeterministic Finite Automaton (NFA)
Both DFA and NFA consist of:
Finite input alphabet $\Sigma$.
Finite set of internal states $Q$.
One starting state $q_0 \in Q$.
Nonempty set of accept states $F \subseteq Q$.
Transition function $\delta$.
DFA transition function is $\delta: Q \times \Sigma \to Q$.
DFA is always in one of its states $q \in Q$.
DFA transits from the current state to another state depending on the current input symbol.
NFA transition function is $\delta: Q \times \Sigma \to \mathcal{P}(Q)$ ($\mathcal{P}(Q)$ is the powerset of $Q$).
NFA is always (simultaneously) in a set of some number of its states.
NFA transits from a set of states to another set of states depending on the current input symbol.
Indeterminism: Basis
NFA A, its transition diagram and its transition table.
[Transition diagram and transition table of NFA A: rows are states, columns are alphabet symbols, accept states are marked F.]
Indeterminism: NFA A processing an input word
NFA at work
[Figure: transition diagram of NFA A with its active states highlighted after each input symbol.]
continue...
Indeterminism: NFA at work (continued)
[Figure: the remaining steps, with the active states highlighted.]
Accepted!
NFA A has processed the word and went through the input characters and the respective sets(!) of states {0}, {…}, {…, 4}, {0, …, 7, …}, {…, …, 7}, {0, 4, …, …}.
Indeterminism: Simulation
NFA simulation without transform to DFA
Each of the current states is occupied by one token. Read an input symbol and move the tokens accordingly. If a token has more movement possibilities, it will split into two or more tokens; if it has no movement possibility, it will leave the board, uhm, the transition diagram.
[Figure: tokens before and after reading a symbol from the input.]
Indeterminism: Simulation
Input: NFA, text in array t
NFA simulation without transform to DFA
Idea: Register all states to which you have just arrived. In the next step, read the input symbol x and move SIMULTANEOUSLY to ALL states to which you can get from ALL active states along transitions marked by x.
SetOfStates S = {q0}, S_tmp;
i = 1;
while( (i <= t.length) && (!S.isEmpty()) ) {
  S_tmp = Set.emptySet();
  for( q in S )                      // for each state in S
    S_tmp.union( delta(q, t[i]) );
  S = S_tmp;
  i++;
}
return S.containsFinalState();       // true or false
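The set-based simulation translates almost line by line into Python. This is a minimal sketch, assuming the transition function is stored as a dictionary mapping (state, symbol) pairs to sets of states, with a missing key meaning "no transition"; the example NFA (words over {a, b} ending in ab) is hypothetical and is not NFA A from the slides.

```python
from typing import Dict, Set, Tuple

# delta: (state, symbol) -> set of next states; a missing key means "no transition".
Delta = Dict[Tuple[int, str], Set[int]]


def nfa_accepts(delta: Delta, q0: int, finals: Set[int], text: str) -> bool:
    """Simulate an NFA on `text` without converting it to a DFA."""
    active: Set[int] = {q0}                  # S = {q0}
    for symbol in text:                      # read t[1], t[2], ... in turn
        if not active:                       # every token has left the diagram
            break
        next_active: Set[int] = set()        # S_tmp = empty set
        for q in active:                     # for each state in S
            next_active |= delta.get((q, symbol), set())
        active = next_active                 # S = S_tmp
    return bool(active & finals)             # does S contain a final state?


# Hypothetical example NFA over {a, b}: accepts exactly the words ending in "ab".
DELTA: Delta = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}

if __name__ == "__main__":
    print(nfa_accepts(DELTA, 0, {2}, "bbab"))   # True
    print(nfa_accepts(DELTA, 0, {2}, "abba"))   # False
```

Each step costs at most |Q| dictionary lookups, so the whole run is O(n·|Q|) set operations, with no up-front DFA construction.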
NFA to DFA: Algorithm
Generating a DFA equivalent to NFA A using transition tables
Data
Each state of the DFA is a subset of states of the NFA.
The start state of the DFA is a one-element set containing the start state of the NFA.
A state of the DFA is an accept state iff it contains at least one accept state of the NFA.
Construction
Create the start state of the DFA and the corresponding first line of its transition table (TT).
For each state Q of DFA not yet processed do {
  Decompose Q into its constituent states Q1, ..., Qk of NFA
  For each symbol x of alphabet do {
    S = union of all references in NFA table at positions [Q1][x], ..., [Qk][x]
    write S into TT of DFA at position [Q][x]
    if (S is not among states of DFA yet)
      add S to the states of DFA and add corresponding line to TT of DFA
  }
  Mark Q as processed
}
// Remember, the empty set is also a set of states, it can be a state of DFA
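A compact sketch of this subset construction in Python. It assumes the NFA transition function is a dictionary from (state, symbol) pairs to sets of states and uses a small hypothetical NFA (words over {a, b} ending in ab). DFA states are frozensets of NFA states, so the empty set shows up naturally as a (trap) state when it is reachable; the dictionary `table` plays the role of TT.

```python
from collections import deque
from typing import Dict, FrozenSet, Set, Tuple

Delta = Dict[Tuple[int, str], Set[int]]
DState = FrozenSet[int]                      # a DFA state is a set of NFA states


def nfa_to_dfa(delta: Delta, q0: int, finals: Set[int], alphabet: str):
    start: DState = frozenset({q0})          # one-element set with the NFA start state
    table: Dict[DState, Dict[str, DState]] = {}   # DFA transition table TT
    todo = deque([start])                    # DFA states not processed yet
    while todo:
        Q = todo.popleft()
        if Q in table:                       # already processed
            continue
        row: Dict[str, DState] = {}
        for x in alphabet:
            # union of the NFA table entries [Q1][x], ..., [Qk][x]
            S = frozenset().union(*(delta.get((q, x), set()) for q in Q))
            row[x] = S
            if S not in table:
                todo.append(S)               # new DFA state (possibly the empty set)
        table[Q] = row                       # mark Q as processed
    dfa_finals = {Q for Q in table if Q & finals}
    return start, table, dfa_finals


# Hypothetical NFA: words over {a, b} ending in "ab".
DELTA: Delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}

if __name__ == "__main__":
    start, table, dfa_finals = nfa_to_dfa(DELTA, 0, {2}, "ab")
    for Q, row in table.items():
        mark = " F" if Q in dfa_finals else ""
        print(sorted(Q), {x: sorted(S) for x, S in row.items()}, mark)
```

Only states reachable from the start state are ever created, which is why unreachable NFA states never appear in the resulting table.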
NFA to DFA: Example
Generating a DFA equivalent to NFA A, step by step, from the transition table of NFA A.
[Transition tables of NFA A and of the DFA under construction: first copy the start state 0; then, row by row, add the new state(s) produced by each unprocessed DFA state; when a row produces no new state, just continue.]
...FINISHED!
[Complete transition table of the DFA equivalent to NFA A.]
Text Search: Repetition
To be used with great caution!
Naïve approach
1. Align the pattern with the beginning of the text.
2. While corresponding symbols of the pattern and the text match each other, move forward by one symbol in the pattern.
3. When a symbol mismatch occurs, shift the pattern forward by one symbol, reset the position in the pattern to the beginning of the pattern and go to 2.
4. When the end of the pattern is passed, report success, shift the pattern forward by one symbol, reset the position in the pattern to its beginning and go to 2.
5. When the end of the text is reached, stop.
Might be efficient in favourable text.
[Figure: the pattern aligned with the start of the text, a pattern shift, and the situation after the while loop; matching and mismatching symbols are marked.]
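The five steps map onto two nested loops. This is a minimal Python sketch, assuming text and pattern are ordinary strings and reporting 0-based start positions (the function name `naive_search` is ours, for illustration only).

```python
from typing import List


def naive_search(text: str, pattern: str) -> List[int]:
    """Return the 0-based start positions of all occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    hits: List[int] = []
    for shift in range(n - m + 1):           # 1. align pattern at position `shift`
        j = 0
        while j < m and text[shift + j] == pattern[j]:
            j += 1                           # 2. symbols match: move forward
        if j == m:                           # 4. end of pattern passed: success
            hits.append(shift)
        # 3./4. shift the pattern by one and restart from its beginning
    return hits                              # 5. end of text reached


if __name__ == "__main__":
    print(naive_search("abracadabra", "abra"))   # [0, 7]
```

In the worst case (for example text aaa...a and pattern aa...ab) every shift compares m symbols, giving about m·(n - m + 1) comparisons; that is the reason for the caution above.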
Text Search: Basis
Alphabet: Finite set of symbols.
Text: Sequence of symbols of the alphabet.
Pattern: Sequence of symbols of the same alphabet.
Goal: Pattern occurrence is to be detected in the text.
Text is often fixed or seldom changed, pattern typically varies (looking for different words in the same document), pattern is often significantly shorter than the text.
Notation
Alphabet: $\Sigma$
Symbols in the text: $t_1, t_2, \dots, t_n$.
Symbols in the pattern: $p_1, p_2, \dots, p_m$.
It holds $m \le n$, usually $m \ll n$.
Example
Text: ...task is very simple but it is used very freq...
Pattern: simple
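Power of Indeterminism: Examples
An NFA which accepts just the single word $p_1p_2p_3p_4$:
0 --p1--> 1 --p2--> 2 --p3--> 3 --p4--> 4   (accept state 4)
NFA A4 which accepts each word with suffix $p_1p_2p_3p_4$, and its transition table (the same chain, with a self-loop labeled by $\Sigma$ at state 0):
        p1    p2    p3    p4    z
0       0,1   0     0     0     0
1             2
2                   3
3                         4
4 F
$z \in \Sigma \setminus \{p_1, p_2, p_3, p_4\}$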
Power of Indeterminism: Easy description (A4 repeated)
NFA A4 which accepts each word with suffix $p_1p_2p_3p_4$ (transition table as on the previous slide), equivalently:
The DFA below is the deterministic equivalent of NFA A4. DFA state 0i stands for the NFA state set {0, i}.
        p1    p2    p3    p4    z
0       01    0     0     0     0
01      01    02    0     0     0
02      01    0     03    0     0
03      01    0     0     04    0
04 F    01    0     0     0     0
Supposing $p_1, p_2, p_3, p_4$ are mutually different!
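When the pattern symbols are mutually different, the DFA table above follows one rule: from {0, i} the next pattern symbol leads to {0, i+1}, p1 always leads to {0, 1}, and every other symbol leads back to {0}. A sketch in Python under that assumption; the pattern is a string whose characters stand for p1, ..., pm, and the names `suffix_dfa` and `occurrence_ends` are hypothetical.

```python
from typing import Dict, List, Tuple


def suffix_dfa(pattern: str, alphabet: str) -> Dict[Tuple[int, str], int]:
    """DFA table for 'any word with suffix pattern', valid only when the
    pattern symbols are mutually different. DFA state i stands for the
    NFA state set {0, i}; state 0 stands for {0}."""
    m = len(pattern)
    if m == 0 or len(set(pattern)) != m:
        raise ValueError("pattern symbols must be nonempty and mutually different")
    table: Dict[Tuple[int, str], int] = {}
    for i in range(m + 1):
        for x in alphabet:
            if i < m and x == pattern[i]:
                table[i, x] = i + 1          # {0, i} --p_{i+1}--> {0, i+1}
            elif x == pattern[0]:
                table[i, x] = 1              # any other p1 --> {0, 1}
            else:
                table[i, x] = 0              # everything else --> {0}
    return table


def occurrence_ends(text: str, pattern: str, alphabet: str) -> List[int]:
    """1-based positions in text where an occurrence of pattern ends."""
    table = suffix_dfa(pattern, alphabet)
    state, ends = 0, []
    for pos, x in enumerate(text, start=1):
        state = table[state, x]
        if state == len(pattern):            # accept state {0, m}
            ends.append(pos)
    return ends


if __name__ == "__main__":
    # "abcd" stands for p1p2p3p4 and "z" for any other symbol.
    print(occurrence_ends("zabcdabcd", "abcd", "abcdz"))   # [5, 9]
```

With repeated pattern symbols this shortcut is wrong, and the general subset construction has to be used instead.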
Power of Indeterminism: Easy construction, example
An NFA which accepts each word with suffix … and its transition table ($z \in \Sigma \setminus \{\ldots\}$).
DFA A7 is the deterministic equivalent of this NFA. It also accepts each word with suffix ….
[Transition diagrams and transition tables of the NFA and of DFA A7.]
Note the structural difference between the NFA and A7.
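Power of Indeterminism: Simple examples
NFA accepting exactly one word $p_1p_2p_3p_4$:
0 --p1--> 1 --p2--> 2 --p3--> 3 --p4--> 4
NFA accepting any word with suffix $p_1p_2p_3p_4$:
the same chain, plus a self-loop labeled by $\Sigma$ at state 0
NFA accepting any word with substring (factor) $p_1p_2p_3p_4$ anywhere in it:
the same chain, plus self-loops labeled by $\Sigma$ at states 0 and 4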
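Power of Indeterminism: Easy modifications
NFA accepting any word with substring (factor) $p_1p_2p_3p_4$ anywhere in it.
[Transition diagram: chain 0 --p1--> 1 --p2--> 2 --p3--> 3 --p4--> 4 with self-loops at states 0 and 4.]
Can be used for searching, but the following reduction is more frequent.
Text search NFA for finding pattern $P = p_1p_2p_3p_4$ in the text.
[Transition diagram: the same chain with a self-loop labeled by $\Sigma$ at state 0 only.]
The NFA stops when the pattern is found.
Want to know the position of the pattern in the text? Equip the transitions with a counter.
[Transition diagram: initialization [pos=0]; self-loop at state 0 labeled $\Sigma$, [pos++]; then the chain 0 --p1--> 1 --p2--> 2 --p3--> 3 --p4--> 4.]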
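The text search NFA is easy to simulate directly: NFA state j means "the last j symbols read equal p1...pj". A minimal Python sketch, assuming string input; unlike the NFA on the slide, which stops at the first occurrence, it keeps going and reports every occurrence. Instead of a counter attached to each token, it counts the symbols read and subtracts the pattern length, which gives the same 0-based start position.

```python
from typing import List


def nfa_search(text: str, pattern: str) -> List[int]:
    """Simulate the text search NFA for pattern and return the 0-based start
    positions of all occurrences (the slide's NFA would stop at the first one)."""
    m = len(pattern)
    if m == 0:
        raise ValueError("empty pattern")
    hits: List[int] = []
    active = {0}                             # NFA state j: j pattern symbols matched
    for pos, x in enumerate(text, start=1):  # pos = number of symbols read so far
        nxt = {0}                            # self-loop over the whole alphabet at state 0
        for j in active:
            if j < m and pattern[j] == x:    # transition j --p_{j+1}--> j+1
                nxt.add(j + 1)
        active = nxt
        if m in active:                      # final state m reached
            hits.append(pos - m)             # occurrence started m symbols ago
    return hits


if __name__ == "__main__":
    print(nfa_search("abracadabra", "abra"))   # [0, 7]
```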
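Power of Indeterminism: Examples
Example: NFA accepting any word with subsequence $p_1p_2p_3p_4$ anywhere in it.
[Transition diagram: chain of states 0 to 4 labeled $p_1, \dots, p_4$, with wait loops.]
Example: NFA accepting any word with subsequence $p_1p_2p_3p_4$ anywhere in it, where one symbol in the sequence may be altered.
[Transition diagram: states 0 to 4 in the first row, a second row of states for the case when one symbol has already been altered.]
Alternatively: NFA accepting any word containing a subsequence Q whose Hamming distance from $p_1p_2p_3p_4$ is at most 1.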
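The two-row NFA can be simulated with states (j, e): j pattern symbols consumed so far, e the number of altered symbols (0 for the first row, 1 for the second). A Python sketch under these assumptions; every state keeps a wait loop, so matched symbols need not be adjacent in the text. The function name is hypothetical.

```python
def has_subsequence_hamming1(text: str, pattern: str) -> bool:
    """True if text contains a subsequence Q of length len(pattern) whose Hamming
    distance from pattern is at most 1. NFA states are pairs (j, e): j pattern
    symbols consumed, e = number of altered symbols so far (0 or 1)."""
    m = len(pattern)
    active = {(0, 0)}
    for x in text:
        nxt = set(active)                    # wait loops: every state may stay
        for j, e in active:
            if j < m:
                if x == pattern[j]:
                    nxt.add((j + 1, e))      # exact symbol
                elif e == 0:
                    nxt.add((j + 1, 1))      # the one allowed altered symbol
        active = nxt
        if any(j == m for j, _ in active):
            return True
    return any(j == m for j, _ in active)
```

For example, with pattern abcd the text xaybzd is accepted: a, b and d match and z takes the place of c.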
Languages Hierarchy: Wider picture
A search NFA can search for more than one pattern simultaneously. The number of patterns can be finite, which also leads to the dictionary automaton (we will meet it later), or infinite, which leads to a regular language.
Chomsky language hierarchy reminder
Grammar   Language                 Automaton
Type-0    Recursively enumerable   Turing machine
Type-1    Context-sensitive        Linear-bounded non-deterministic Turing machine
Type-2    Context-free             Non-deterministic pushdown automaton
Type-3    Regular                  Finite state automaton (NFA or DFA)
Only regular languages can be processed by NFA/DFA. More complex languages cannot. For example, the language of all well-formed (balanced) parentheses is context-free but not regular, so it cannot be recognized by an NFA/DFA.
Regular Languages: A reminder
Operations on regular languages
Let $L_1$ and $L_2$ be any languages. Then
$L_1 \cup L_2$ is the union of $L_1$ and $L_2$. It is the set of all words which are in $L_1$ or in $L_2$.
$L_1.L_2$ is the concatenation of $L_1$ and $L_2$. It is the set of all words $w$ for which $w = w_1w_2$ (concatenation of words $w_1$ and $w_2$), where $w_1 \in L_1$ and $w_2 \in L_2$.
$L_1^*$ is the Kleene star or Kleene closure of language $L_1$. It is the set of all words which are concatenations of any number (incl. zero) of any words of $L_1$ in any order.
Closure property
Whenever $L_1$ and $L_2$ are regular languages, then $L_1 \cup L_2$, $L_1.L_2$, $L_1^*$ are regular languages too.
Example
[Example languages $L_1$, $L_2$ and the resulting sets $L_1 \cup L_2$, $L_1.L_2$ and $L_1^*$.]
$L_1^*$: // this one is not easy to list nicely... or is it?
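Regular languages are usually infinite, so a program can only list them up to some length. A minimal sketch, assuming languages are given as Python sets of strings; `max_len` is a length cut-off introduced here for illustration, and the example language {a, bb} is hypothetical.

```python
from typing import Set


def union(L1: Set[str], L2: Set[str]) -> Set[str]:
    return L1 | L2


def concat(L1: Set[str], L2: Set[str], max_len: int) -> Set[str]:
    """Words w1w2 with w1 in L1, w2 in L2, keeping only those of length <= max_len."""
    return {w1 + w2 for w1 in L1 for w2 in L2 if len(w1) + len(w2) <= max_len}


def star(L: Set[str], max_len: int) -> Set[str]:
    """All words of L* up to length max_len (the empty word included)."""
    result = {""}
    frontier = {""}
    while frontier:
        frontier = {w + u for w in frontier for u in L
                    if u and len(w) + len(u) <= max_len} - result
        result |= frontier
    return result


if __name__ == "__main__":
    # Listing by length, then alphabetically, is one way to show L* "nicely".
    print(sorted(star({"a", "bb"}, 3), key=lambda w: (len(w), w)))
    # ['', 'a', 'aa', 'bb', 'aaa', 'abb', 'bba']
```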
Regular Expressions: Another reminder
Regular expressions defined recursively
Symbol $\varepsilon$ is a regular expression.
Each symbol of alphabet $\Sigma$ is a regular expression.
Whenever $e_1$ and $e_2$ are regular expressions, then also the strings $(e_1)$, $e_1+e_2$, $e_1e_2$, $(e_1)^*$ are regular expressions.
Languages represented by regular expressions (RE) defined recursively
RE $\varepsilon$ represents the language containing only the empty string.
RE $x$, where $x \in \Sigma$, represents the language $\{x\}$.
Let REs $e_1$ and $e_2$ represent languages $L_1$ and $L_2$. Then RE $(e_1)$ represents $L_1$, RE $e_1+e_2$ represents $L_1 \cup L_2$, REs $e_1e_2$, $e_1.e_2$ represent $L_1.L_2$, RE $(e_1)^*$ represents $L_1^*$.
Examples
0+1(0+1)*   all integers in binary without leading 0's
0.(0+1)*1   all finite binary fractions in (0, 1) without trailing 0's
((0+1)(0+1+2+3+4+5+6+7+8+9) + 2(0+1+2+3)):(0+1+2+3+4+5)(0+1+2+3+4+5+6+7+8+9)   all 1440 day's times in format hh:mm from 00:00 to 23:59
(mon+(wedne+t(ue+hur))s+fri+s(atur+un))day   English names of days in the week
(1+2+3+4+5+6+7+8+9)(0+1+2+3+4+5+6+7+8+9)*((2+7)5+(5+0)0)   all decimal integers $\ge 100$ divisible by 25
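The example expressions can be checked mechanically. A sketch using Python's standard `re` module, assuming the usual translation of the slide notation: union "+" becomes "|", and alternatives of single digits such as (0+1+...+9) become character classes like [0-9]. The constant names are ours.

```python
import re

# The slide's expressions rewritten in Python `re` syntax.
BINARY_INT = r"0|1[01]*"                             # 0+1(0+1)*
BINARY_FRACTION = r"0\.[01]*1"                       # 0.(0+1)*1
TIME_HHMM = r"([01][0-9]|2[0-3]):[0-5][0-9]"         # hh:mm from 00:00 to 23:59
DAY_NAME = r"(mon|(wedne|t(ue|hur))s|fri|s(atur|un))day"
DIV_BY_25 = r"[1-9][0-9]*(25|75|50|00)"              # decimal integers >= 100


def matches(regex: str, word: str) -> bool:
    """True if the whole word belongs to the language of regex."""
    return re.fullmatch(regex, word) is not None


if __name__ == "__main__":
    times = [f"{h:02d}:{m:02d}" for h in range(100) for m in range(100)]
    print(sum(matches(TIME_HHMM, t) for t in times))   # 1440
    print(matches(DAY_NAME, "thursday"), matches(DIV_BY_25, "1075"))   # True True
```

`re.fullmatch` is used rather than `re.search` because an RE describes whole words; searching for occurrences inside a text is a different task.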
Regular Expressions: Conversion to NFA
Convert a regular expression to an NFA
Input: Regular expression R containing n characters of the given alphabet.
Output: NFA recognizing the language L(R) described by R.
Create start state S
for each k (1 <= k <= n) {
  assign index k to the k-th character in R
    // this makes all characters in R unique: c[1], c[2], ..., c[n]
  create state S[k]   // S[k] corresponds directly to c[k]
}
for each k (1 <= k <= n) {
  if c[k] can be the first character in some string described by R
    then create transition S -> S[k] labeled by c[k] with index stripped off
  if c[k] can be the last character in some string described by R
    then mark S[k] as final state
  for each p (1 <= p <= n)
    if (c[k] can follow immediately after c[p] in some string described by R)
      then create transition S[p] -> S[k] labeled by c[k] with index stripped off
}
if R describes the empty string then mark S as final state too
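This construction is commonly known as the Glushkov (position) automaton. The questions "can c[k] be first", "can it be last" and "can c[k] follow c[p]" are answered by computing, for each subexpression, whether it accepts the empty string and its sets of first and last positions. A sketch in Python, assuming a small regex syntax of our own: letters or digits as symbols, "+" for union, "*" for star, parentheses, concatenation by juxtaposition, and no explicit ε symbol. The class and function names are hypothetical.

```python
from typing import Dict, Set, Tuple


class Glushkov:
    """Position (Glushkov) automaton for regexes over letters and digits using
    '+' (union), '*' (star), parentheses and concatenation; no epsilon symbol."""

    def __init__(self, regex: str):
        self.s, self.i = regex, 0
        self.chars: Dict[int, str] = {}         # k -> character c[k]
        self.follow: Dict[int, Set[int]] = {}   # p -> {k : c[k] may follow c[p]}
        self.nullable, self.first, self.last = self._alt()
        if self.i != len(self.s):
            raise ValueError(f"unexpected character at {self.i}")

    def _peek(self):
        return self.s[self.i] if self.i < len(self.s) else None

    def _alt(self):                             # e1 + e2 + ...
        n, f, l = self._cat()
        while self._peek() == "+":
            self.i += 1
            n2, f2, l2 = self._cat()
            n, f, l = n or n2, f | f2, l | l2
        return n, f, l

    def _cat(self):                             # e1 e2 ...
        n, f, l = self._star()
        while self._peek() is not None and (self._peek().isalnum() or self._peek() == "("):
            n2, f2, l2 = self._star()
            for p in l:                         # last of left part -> first of right part
                self.follow[p] |= f2
            f = f | f2 if n else f
            l = l | l2 if n2 else l2
            n = n and n2
        return n, f, l

    def _star(self):                            # e*
        n, f, l = self._atom()
        while self._peek() == "*":
            self.i += 1
            for p in l:                         # loop back: last -> first
                self.follow[p] |= f
            n = True
        return n, f, l

    def _atom(self):                            # character or (e)
        c = self._peek()
        if c == "(":
            self.i += 1
            result = self._alt()
            if self._peek() != ")":
                raise ValueError("missing ')'")
            self.i += 1
            return result
        if c is None or not c.isalnum():
            raise ValueError(f"unexpected character at {self.i}")
        self.i += 1
        k = len(self.chars) + 1                 # index k makes this occurrence unique
        self.chars[k] = c
        self.follow[k] = set()
        return False, {k}, {k}

    def to_nfa(self):
        """State 0 is the start state S, state k >= 1 is S[k]."""
        delta: Dict[Tuple[int, str], Set[int]] = {}
        for k in self.first:                    # S -> S[k] if c[k] can be first
            delta.setdefault((0, self.chars[k]), set()).add(k)
        for p, targets in self.follow.items():  # S[p] -> S[k] if c[k] can follow c[p]
            for k in targets:
                delta.setdefault((p, self.chars[k]), set()).add(k)
        finals = set(self.last) | ({0} if self.nullable else set())
        return delta, finals


def accepts(regex: str, word: str) -> bool:
    delta, finals = Glushkov(regex).to_nfa()
    active = {0}
    for x in word:
        active = {k for q in active for k in delta.get((q, x), ())}
    return bool(active & finals)


if __name__ == "__main__":
    for w in ["", "aabcb", "d", "ad", "ba"]:
        print(repr(w), accepts("a*(b+c*)*+d", w))
```

The resulting NFA has exactly n + 1 states and no ε-transitions, matching the pseudocode above.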
Regular Expression to NFA: Example
Regular expression R = …; adding indices to its characters makes every character of R unique.
[Transition diagram of the NFA accepting L(R), with start state S and states S1, …, S7.]
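Regular Expressions: Search NFA
The NFA searches the text for any occurrence of any word of L(R), for R from the previous example.
[Transition diagram with start state S and states S1, …, S7.]
The only difference from the NFA accepting R is the self-loop labeled by $\Sigma$ at the start state S.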
Regular Expressions: More applications (Bonus)
To find a subsequence representing a word of L(R), where R is a regular expression, do the following:
Create an NFA accepting L(R).
Add self-loops to the states of the NFA:
1. Self-loop labeled by $\Sigma$ (whole alphabet) at the start state.
2. Self-loop labeled by $\Sigma \setminus \{x\}$ at each state whose outgoing transition(s) are labeled by a single $x \in \Sigma$. // serves as an "optimized" wait loop
3. Self-loop labeled by $\Sigma$ at each state whose outgoing transitions are labeled by more than a single symbol from $\Sigma$. // serves as a "usual" wait loop
4. No self-loop at all other states. // which have no outgoing transitions = final ones
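Regular Expressions: Subsequence search (Bonus)
The NFA searches the text for any occurrence of any subsequence representing a word of L(R).
[Transition diagram for R = …: start state S and states S1, …, S9, with self-loops added according to the rules of the previous slide.]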
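Regular Expressions: Effectivity of NFA
Transforming an NFA which searches the text for an occurrence of a word of a given regular language into the equivalent DFA might take exponential space and thus also exponential time. Not always, but sometimes yes:
Consider the regular expression R = a(a+b)(a+b)...(a+b) over alphabet {a, b}.
Text search NFA for R: states 0, 1, ..., n; self-loop labeled a, b at state 0; 0 --a--> 1; i --a,b--> i+1 for 1 <= i < n; final state n.
Mystery: Text search NFA for R, why not this one?
[Transition diagram of a different NFA with states 0, …, 7.]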
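Regular Expressions: Effectivity of NFA
R = a(a+b)(a+b)
Text search NFA for R, NFA table:
        a      b
0       0,1    0
1       2      2
2       3      3
3 F     -      -
Text search DFA for R, DFA table (a DFA state such as 013 stands for the NFA state set {0, 1, 3}):
          a      b
0         01     0
01        012    02
02        013    03
03 F      01     0
012       0123   023
013 F     012    02
023 F     013    03
0123 F    0123   023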
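The blow-up can be measured by running the subset construction for growing n. A Python sketch, assuming R = a(a+b)...(a+b) with n - 1 copies of (a+b), so that the text search NFA has n + 1 states; the function names are ours. After reading a text, NFA state i (1 <= i <= n) is active exactly when the i-th symbol from the end is a, so every subset of {1, ..., n} together with 0 is reachable.

```python
from typing import Dict, FrozenSet, Set, Tuple


def search_nfa(n: int) -> Dict[Tuple[int, str], Set[int]]:
    """Text search NFA for R = a(a+b)...(a+b) with n-1 copies of (a+b):
    states 0..n, self-loop a, b at 0, final state n."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta


def reachable_dfa_states(delta, alphabet: str = "ab") -> Set[FrozenSet[int]]:
    """DFA states produced by the subset construction, starting from {0}."""
    start = frozenset({0})
    seen = {start}
    stack = [start]
    while stack:
        Q = stack.pop()
        for x in alphabet:
            S = frozenset(r for q in Q for r in delta.get((q, x), ()))
            if S not in seen:
                seen.add(S)
                stack.append(S)
    return seen


if __name__ == "__main__":
    for n in range(1, 9):
        # NFA size n + 1 versus DFA size 2**n
        print(n, n + 1, len(reachable_dfa_states(search_nfa(n))))
```

For n = 3 this yields the 8 DFA states of the table above.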
Epsilon Transitions: Definition/example
Search the text for more than just an exact match.
NFA with ε-transitions
The transition from one state to another can be performed without reading any input symbol. Such a transition is labeled by the symbol $\varepsilon$.
ε-closure
Symbol ε-CLOSURE(p) denotes the set of all states q which can be reached from state p using only ε-transitions. By definition, let ε-CLOSURE(p) = {p} when there is no ε-transition out from p.
[Transition diagram of an example NFA with ε-transitions, together with the ε-closures of its states.]
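Epsilon Transitions: Removal
Construction of an equivalent NFA without ε-transitions
Input: NFA A with some ε-transitions.
Output: NFA A' without ε-transitions.
1. A' = exact copy of A.
2. Remove all ε-transitions from A'.
3. In A' for each $(q, a)$ do: add to the set $\delta(p, a)$ all such states $r$ for which it holds $q \in \varepsilon\text{-CLOSURE}(p)$ and $r \in \delta(q, a)$.
4. Add to the set of final states $F$ in A' all states $p$ for which it holds $\varepsilon\text{-CLOSURE}(p) \cap F \neq \emptyset$.
Easy construction
[Figure: p --ε--> q --a--> r before the construction; the new transition p --a--> r after it.]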
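Steps 1 to 4 in Python, as a minimal sketch. It assumes ε-transitions are stored separately as a dictionary from a state to the set of its ε-successors, and it uses the common convention that ε-CLOSURE(p) always contains p; with the slide's convention the result is the same, because step 1 already copies A's own transitions and accept states. The small example NFA is hypothetical.

```python
from typing import Dict, Set, Tuple

Delta = Dict[Tuple[int, str], Set[int]]


def eps_closure(eps: Dict[int, Set[int]], p: int) -> Set[int]:
    """All states reachable from p using only epsilon transitions (p included)."""
    closure, stack = {p}, [p]
    while stack:
        q = stack.pop()
        for r in eps.get(q, ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def remove_eps(states, delta: Delta, eps: Dict[int, Set[int]], finals: Set[int]):
    """For every p: delta'(p, a) gets delta(q, a) for each q in eps-CLOSURE(p);
    since p is in its own closure, A's original transitions are copied too.
    p becomes final if its closure contains a final state."""
    new_delta: Delta = {}
    for p in states:
        closure = eps_closure(eps, p)
        for (q, a), targets in delta.items():
            if q in closure:
                new_delta.setdefault((p, a), set()).update(targets)
    new_finals = {p for p in states if eps_closure(eps, p) & finals}
    return new_delta, new_finals


# Hypothetical example: 0 -eps-> 1 -eps-> 2, 1 -a-> 3, 2 -b-> 3, final state 2.
EPS = {0: {1}, 1: {2}}
DELTA: Delta = {(1, "a"): {3}, (2, "b"): {3}}

if __name__ == "__main__":
    print(remove_eps(range(4), DELTA, EPS, {2}))
```

In the example, state 0 gains the transitions on a and b and becomes an accept state, exactly as steps 3 and 4 prescribe.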
Epsilon Transitions: Removed
NFA with ε-transitions: [transition diagram]
Equivalent NFA without ε-transitions: [transition diagram]
New transitions and accept states are highlighted.
Epsilon Transitions: Application
NFA for search for any nonempty substring of pattern $p_1p_2p_3p_4$ over alphabet $\Sigma$. Note the ε-transitions.
[Transition diagram.]
Powerful trick!
Union of two or more NFAs: Create an additional start state S and add ε-transitions from S to the start states of all involved NFAs.
Draw an example yourself!
Epsilon Transitions: Application (cont.)
Equivalent NFA for search for any nonempty substring of pattern $p_1p_2p_3p_4$, with ε-transitions removed.
[Transition diagram.]
States …, 9, … are unreachable. The transformation algorithm NFA -> DFA, if applied, will neglect them.
Epsilon Transitions: Removed / DFA
[Transition table of the NFA above without ε-transitions, columns $p_1, p_2, p_3, p_4, z$.]
[Transition table of the DFA which is equivalent to the previous NFA.]
The DFA in this case has fewer states than the equivalent NFA.
Q: Does it hold for any automaton of this type? Proof?
Text Search with ε-transitions
Text search using NFA simulation without transform to DFA
Input: NFA, text in array t
SetOfStates S = eps_CLOSURE(q0), S_tmp;
int i = 1;
while ((i <= t.length) && (!S.empty())) {
  for (q in S)                       // for each state in S
    if (q.isFinal)
      print(q.final_state_info);     // pattern found
  S_tmp = Set.empty();               // transition to the next
  for (q in S)                       //   set of states
    S_tmp.union(eps_CLOSURE(delta(q, t[i])));
  S = S_tmp;
  i++;                               // next char in text
}
return S.containsFinalState();       // true or false, also covers an occurrence ending at the last char
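A Python sketch of this simulation, combined with the union trick: a new start state 0 with ε-transitions to the start states of one text search NFA per pattern. Assumptions: the label `ANY` (Python `None`) is a hypothetical stand-in for "any symbol of the alphabet" on the wait loops, and the helper names are ours. It checks for final states after each step rather than before, so an occurrence ending at the last symbol is reported as well.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

ANY: Optional[str] = None   # hypothetical label: "any symbol of the alphabet"
Delta = Dict[Tuple[int, Optional[str]], Set[int]]


def eps_closure(states: Iterable[int], eps: Dict[int, Set[int]]) -> Set[int]:
    closure = set(states)
    stack = list(closure)
    while stack:
        q = stack.pop()
        for r in eps.get(q, ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def build_union(patterns: List[str]):
    """Union trick: new start state 0 with epsilon transitions to the start
    states of one text search NFA per pattern."""
    delta: Delta = {}
    eps: Dict[int, Set[int]] = {0: set()}
    finals: Dict[int, str] = {}              # final state -> pattern it reports
    state = 0
    for pat in patterns:
        state += 1
        start = state
        eps[0].add(start)                                  # S --eps--> start
        delta.setdefault((start, ANY), set()).add(start)   # wait loop over Sigma
        for x in pat:
            delta.setdefault((state, x), set()).add(state + 1)
            state += 1
        finals[state] = pat
    return delta, eps, finals


def search(text: str, delta: Delta, eps, finals, q0: int = 0) -> List[Tuple[int, str]]:
    """Report (end position, pattern) for every occurrence; positions are 1-based."""
    found: List[Tuple[int, str]] = []
    S = eps_closure({q0}, eps)
    for i, x in enumerate(text, start=1):
        S_tmp: Set[int] = set()
        for q in S:                          # move along x and along ANY
            S_tmp |= delta.get((q, x), set()) | delta.get((q, ANY), set())
        S = eps_closure(S_tmp, eps)
        found.extend((i, finals[q]) for q in S if q in finals)
    return sorted(found)


if __name__ == "__main__":
    print(search("abcab", *build_union(["ab", "bc"])))
    # [(2, 'ab'), (3, 'bc'), (5, 'ab')]
```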