Languages and Automata
Finite Automata

Informatics 2A: Lecture 3
John Longley
School of Informatics, University of Edinburgh
jrl@inf.ed.ac.uk
22 September 2017

1 Languages and Automata
    What is a language?
    Finite automata: recap
2
    Finite automaton
    Regular language
    DFAs and NFAs
3

Languages and alphabets

Throughout this course, languages will consist of finite sequences of symbols drawn from some given alphabet.

An alphabet Σ is simply some finite set of letters or symbols which we treat as primitive. These might be:
- English letters: Σ = {a, b, ..., z}
- Decimal digits: Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
- ASCII characters: Σ = {0, 1, ..., a, b, ..., ?, !, ...}
- Programming language tokens: Σ = {if, while, x, ==, ...}
- Words in (some fragment of) a natural language.
- Primitive actions performable by a machine or system, e.g. Σ = {insert50p, pressbutton1, ...}

In toy examples, we'll use simple alphabets like {0, 1} or {a, b, c}.

What is a language?

A language over an alphabet Σ will consist of finite sequences (strings) of elements of Σ. E.g. the following are strings over the alphabet Σ = {a, b, c}:

    a    b    ab    cab    bacca    cccccccc

There's also the empty string, which we usually write as ε. (Note that ε isn't itself a symbol in the alphabet!)

A language over Σ is simply a (finite or infinite) set of strings over Σ. A string s is legal in the language L if and only if s ∈ L.

We write Σ* for the set of all possible strings over Σ. So a language L is simply a subset of Σ*. (L ⊆ Σ*)

(N.B. This is just a technical definition: any real language is obviously much more than this!)

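To make "a language is a subset of Σ*" concrete, here is a small Python sketch that enumerates Σ* shortest-first and represents a language by its membership test. The names `strings_over` and `in_example_language`, and the example language itself (strings over {a, b, c} that contain no c), are assumptions made purely for illustration.

```python
from itertools import count, islice, product

SIGMA = ("a", "b", "c")

def strings_over(alphabet):
    """Yield every string over `alphabet`, shortest first (the set is infinite)."""
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

def in_example_language(s):
    """Hypothetical language: strings over {a, b, c} that contain no c."""
    return all(ch in SIGMA for ch in s) and "c" not in s

# The empty string comes first, then all strings of length 1, then length 2.
first = list(islice(strings_over(SIGMA), 13))
legal = [s for s in first if in_example_language(s)]
print(first)
print(legal)
```

Because Σ* is infinite, the generator never terminates on its own; `islice` takes a finite prefix.
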
Ways to define a language

There are many ways in which we might formally define a language:
- Direct mathematical definition, e.g.
      L1 = {a, aa, ab, abbc}
      L2 = {axb | x ∈ Σ*}
      L3 = {a^n b^n | n ≥ 0}
- Regular expressions (see Lecture 5): e.g. a(a + b)*b.
- Formal grammars (see Lecture 9 onwards): e.g. S → ε | aSb.
- Specify some machine that tests if a string is legal or not.

The more complex the language, the more complex the machine might need to be. As we shall see, each level in the Chomsky hierarchy is correlated with a certain class of machines.

Finite automata (a.k.a. finite state machines)

[Figure: a two-state automaton with states even and odd. Each state has a self-loop on 1, and 0 moves between the two states; even is both the start state and the only accepting state.]

This is an example of a finite automaton over Σ = {0, 1}.

At any moment, the machine is in one of 2 states. From any state, each symbol in Σ determines a destination state we can jump to.

The state marked with the in-arrow is picked out as the starting state. So any string in Σ* gives rise to a sequence of states.

Certain states (with double circles) are designated as accepting. We call a string legal if it takes us from the start state to some accepting state. In this way, the machine defines a language L ⊆ Σ*: the language L is the set of all legal strings.

Quick test question...

[Figure: the even/odd automaton shown earlier.]

For the finite state machine shown here, which of the following strings are legal (i.e. accepted)?
1. ε
2. 11
3. 1010
4. 1101

Answer: 1, 2 and 3 are legal; 4 isn't.

More generally, for any current state and any symbol, there may be zero, one or many new states we can jump to.

[Figure: states q0 to q5 in a row. q0 has a self-loop on 0,1 and an edge to q1 on 1; each of q1, ..., q4 has an edge to the next state on 0,1; q0 is the start state and q5 is accepting.]

Here there are two transitions for 1 from q0, and none from q5.

The language associated with the machine is defined to consist of all strings that are accepted under some possible execution run.

The language associated with the example machine above is

    {x ∈ Σ* | the fifth symbol from the end of x is 1}

Formal definition of finite automaton

Formally, a finite automaton with alphabet Σ consists of:
- A finite set Q of states,
- A transition relation Δ ⊆ Q × Σ × Q,
- A set S ⊆ Q of possible starting states,
- A set F ⊆ Q of accepting states.

Example formal definition

[Figure: states q0 to q5 in a row. q0 has a self-loop on 0,1 and an edge to q1 on 1; each of q1, ..., q4 has an edge to the next state on 0,1.]

Q = {q0, q1, q2, q3, q4, q5}
Δ = { (q0, 0, q0), (q0, 1, q0), (q0, 1, q1),
      (q1, 0, q2), (q1, 1, q2),
      (q2, 0, q3), (q2, 1, q3),
      (q3, 0, q4), (q3, 1, q4),
      (q4, 0, q5), (q4, 1, q5) }
S = {q0}
F = {q5}

Regular language

Suppose M = (Q, Δ, S, F) is a finite automaton with alphabet Σ.

We say that a string x ∈ Σ* is accepted if there exists a path through the set of states Q, starting at some state s ∈ S, ending at some state f ∈ F, with each step taken from the relation Δ, and with the path as a whole spelling out the string x.

This enables us to define the language accepted by M:

    L(M) = {x ∈ Σ* | x is accepted by M}

We call a language L ⊆ Σ* regular if L = L(M) for some finite automaton M.

Regular languages are the subject of lectures 4 to 8 of the course.

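The definition of acceptance can be read almost literally as a search for a path. The Python sketch below does exactly that for the six-state machine of the example formal definition, with Δ encoded as a set of (state, symbol, state) triples. The function name `accepted` and this encoding are assumptions for illustration; the search is naive backtracking, which can take exponential time on other automata.

```python
# Transition relation, start states and accepting states of the
# six-state example automaton.
DELTA = {
    ("q0", "0", "q0"), ("q0", "1", "q0"), ("q0", "1", "q1"),
    ("q1", "0", "q2"), ("q1", "1", "q2"),
    ("q2", "0", "q3"), ("q2", "1", "q3"),
    ("q3", "0", "q4"), ("q3", "1", "q4"),
    ("q4", "0", "q5"), ("q4", "1", "q5"),
}
S = {"q0"}
F = {"q5"}

def accepted(x, delta=DELTA, starts=S, finals=F):
    """True if some path from a start state to an accepting state spells x."""
    def path_from(state, rest):
        if not rest:
            return state in finals
        return any(
            path_from(target, rest[1:])
            for (source, symbol, target) in delta
            if source == state and symbol == rest[0]
        )
    return any(path_from(s, x) for s in starts)

print(accepted("10000"))  # True: the fifth symbol from the end is 1
print(accepted("00000"))  # False
```
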
DFAs and NFAs

A finite automaton with alphabet Σ is deterministic if:
- It has exactly one starting state.
- For every state q ∈ Q and symbol a ∈ Σ there is exactly one state q′ for which there exists a transition from q to q′ labelled a in Δ.

The first condition says that S is a singleton set. The second condition says that Δ specifies a function Q × Σ → Q.

Deterministic finite automata are usually abbreviated DFAs. General finite automata are usually called nondeterministic, by way of contrast, and abbreviated NFAs.

Note that every DFA is an NFA.

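The two determinism conditions can be checked mechanically. The following Python sketch does so, assuming the transition relation is given as a set of (state, symbol, state) triples; the function name `is_deterministic` and the variable names are hypothetical.

```python
def is_deterministic(states, alphabet, delta, starts):
    """Check both conditions: one start state, and exactly one
    successor for every (state, symbol) pair."""
    if len(starts) != 1:
        return False
    for q in states:
        for a in alphabet:
            targets = {t for (s, sym, t) in delta if s == q and sym == a}
            if len(targets) != 1:
                return False
    return True

# The even/odd machine: 1 loops on each state, 0 swaps the two states.
EVEN_ODD = {
    ("even", "0", "odd"), ("even", "1", "even"),
    ("odd", "0", "even"), ("odd", "1", "odd"),
}

# The six-state example: two transitions for 1 from q0, none from q5.
SIX_STATE = {
    ("q0", "0", "q0"), ("q0", "1", "q0"), ("q0", "1", "q1"),
    ("q1", "0", "q2"), ("q1", "1", "q2"),
    ("q2", "0", "q3"), ("q2", "1", "q3"),
    ("q3", "0", "q4"), ("q3", "1", "q4"),
    ("q4", "0", "q5"), ("q4", "1", "q5"),
}

print(is_deterministic({"even", "odd"}, {"0", "1"}, EVEN_ODD, {"even"}))  # True
print(is_deterministic({"q0", "q1", "q2", "q3", "q4", "q5"},
                       {"0", "1"}, SIX_STATE, {"q0"}))                   # False
```
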
Example

[Figure: the two-state even/odd automaton.]

This is a DFA (and hence an NFA).

[Figure: the six-state automaton with states q0 to q5.]

This is an NFA but not a DFA.

Challenge question

Consider the following NFA over {a, b, c}:

[Figure: NFA diagram with transitions labelled a, b and c.]

What is the minimum number of states of an equivalent DFA?

Solution

An equivalent DFA must have at least 5 states!

[Figure: DFA diagram with transitions labelled a, b and c, including a garbage state with a self-loop on a,b,c.]

Specifying a DFA

Clearly, a DFA with alphabet Σ can equivalently be given by:
- A finite set Q of states,
- A transition function δ : Q × Σ → Q,
- A single designated starting state s ∈ Q,
- A set F ⊆ Q of accepting states.

Example:

    Q = {even, odd}

    δ :          0       1
        even     odd     even
        odd      even    odd

    s = even
    F = {even}

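The transition table of the even/odd example translates almost directly into code. The Python sketch below stores δ as a nested dictionary (one of several reasonable encodings) and follows the unique path determined by the input; the names `DELTA`, `START`, `ACCEPTING` and `dfa_accepts` are assumptions for illustration.

```python
# delta[state][symbol] gives the unique next state.
DELTA = {
    "even": {"0": "odd", "1": "even"},
    "odd": {"0": "even", "1": "odd"},
}
START = "even"
ACCEPTING = {"even"}

def dfa_accepts(x):
    """Run the DFA on x and report whether it ends in an accepting state."""
    state = START
    for symbol in x:
        state = DELTA[state][symbol]
    return state in ACCEPTING

for x in ["", "11", "1010", "1101"]:
    print(repr(x), dfa_accepts(x))
```

Running it on the four strings from the quick test question gives accepted for the first three and rejected for 1101, in line with the answer given there.
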
Running a finite automaton

DFAs are dead easy to implement and efficient to run. We don't need much more than a two-dimensional array for the transition function δ. Given an input string x it is easy to follow the unique path determined by x and so determine whether or not the DFA accepts x.

It is by no means so obvious how to run an NFA over an input string x. How do we prevent ourselves from making incorrect nondeterministic choices?

Solution: At each stage in processing the string, keep track of all the states the machine might possibly be in.

Executing an NFA: example

Given an NFA N over Σ and a string x ∈ Σ*, how can we in practice decide whether x ∈ L(N)?

We illustrate with the running example below.

[Figure: an NFA with states q0, q1 and q2.]

String to process: aba

Stage 0: initial state

At the start, the NFA can only be in the initial state q0.

[Figure: the same NFA, with the possible current states coloured.]

String to process: aba
Processed so far: ε
Next symbol: a

Stage 1: after processing a

The NFA could now be in either q0 or q1.

[Figure: the same NFA, with the possible current states coloured.]

String to process: aba
Processed so far: a
Next symbol: b

Stage 2: after processing ab

The NFA could now be in either q1 or q2.

[Figure: the same NFA, with the possible current states coloured.]

String to process: aba
Processed so far: ab
Next symbol: a

Stage 3: final state

The NFA could now be in q2 or q0. (It could have got to q2 in two different ways, though we don't need to keep track of this.)

[Figure: the same NFA, with the possible current states coloured.]

String to process: aba
Processed so far: aba

Since we've reached the end of the input string, and the set of possible states includes the accepting state q0, we can say that the string aba is accepted by this NFA.

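The stage-by-stage bookkeeping can be sketched in a few lines of Python. As an assumption for illustration, the sketch runs on the six-state NFA of the example formal definition, whose transitions are listed in full, rather than on the three-state machine of this walkthrough; the names `step`, `trace` and `nfa_accepts` are hypothetical.

```python
# Six-state example NFA: (state, symbol, next_state) triples.
DELTA = {
    ("q0", "0", "q0"), ("q0", "1", "q0"), ("q0", "1", "q1"),
    ("q1", "0", "q2"), ("q1", "1", "q2"),
    ("q2", "0", "q3"), ("q2", "1", "q3"),
    ("q3", "0", "q4"), ("q3", "1", "q4"),
    ("q4", "0", "q5"), ("q4", "1", "q5"),
}
S = {"q0"}
F = {"q5"}

def step(current, symbol, delta=DELTA):
    """All states reachable via `symbol` from some state in `current`."""
    return {t for (s, sym, t) in delta if s in current and sym == symbol}

def trace(x):
    """Sets of possible states at stage 0, 1, ..., len(x)."""
    stages = [set(S)]
    for symbol in x:
        stages.append(step(stages[-1], symbol))
    return stages

def nfa_accepts(x):
    """Accept if the final set of possible states contains an accepting state."""
    return bool(trace(x)[-1] & F)

for i, possible in enumerate(trace("10000")):
    print("stage", i, sorted(possible))
print(nfa_accepts("10000"))  # True
```

Only the set of possible states is stored at each stage, so the number of ways a state was reached is never tracked.
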
The key insight

The process we've just described is a completely deterministic process! Given any current set of coloured states, and any input symbol in Σ, there's only one right answer to the question: What should the new set of coloured states be?

What's more, it's a finite state process. A "state" is simply a choice of coloured states in the original NFA N. If N has n states, there are 2^n such choices.

This suggests how an NFA with n states can be converted into an equivalent DFA with 2^n states.

NFA to DFA: example

Our 3-state NFA gives rise to a DFA with 2^3 = 8 states. The states of this DFA are subsets of {q0, q1, q2}.

[Figure: a DFA whose eight states are {q0,q1,q2}, {q0,q1}, {q1,q2}, {q0,q2}, {q0}, {q1}, {q2} and {}, with transitions labelled a and b, shown alongside the original NFA.]

The accepting states of this DFA are exactly those that contain an accepting state of the original NFA.

NFA to DFA in general

Given an NFA N = (Q, Δ, S, F), we can define an equivalent DFA M = (Q′, δ′, s′, F′) (over the same alphabet Σ) like this:
- Q′ is 2^Q, the set of all subsets of Q. (Also written P(Q).)
- δ′(A, u) = {q′ ∈ Q | ∃q ∈ A. (q, u, q′) ∈ Δ}. (Set of all states reachable via u from some state in A.)
- s′ = S.
- F′ = {A ⊆ Q | ∃q ∈ A. q ∈ F}.

It's then not hard to prove mathematically that L(M) = L(N). (See Kozen for details.)

This process is called determinization.

Coming up in lecture 6: Application of this process to efficient string searching.

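A minimal Python sketch of this construction follows. It departs from the definition in one labelled respect: instead of building all 2^|Q| subsets, it only builds the subsets reachable from the start set S, which is a common variation. The function name `determinize`, the triple encoding of Δ and the use of `frozenset` for DFA states are assumptions; the input is the six-state NFA of the example formal definition.

```python
def determinize(alphabet, delta, starts, finals):
    """Subset construction, restricted to subsets reachable from `starts`."""
    start = frozenset(starts)
    dfa_delta = {}
    seen = {start}
    todo = [start]
    while todo:
        A = todo.pop()
        for u in alphabet:
            # delta'(A, u): states reachable via u from some state in A
            B = frozenset(t for (s, sym, t) in delta if s in A and sym == u)
            dfa_delta[(A, u)] = B
            if B not in seen:
                seen.add(B)
                todo.append(B)
    accepting = {A for A in seen if A & finals}
    return seen, dfa_delta, start, accepting

DELTA = {
    ("q0", "0", "q0"), ("q0", "1", "q0"), ("q0", "1", "q1"),
    ("q1", "0", "q2"), ("q1", "1", "q2"),
    ("q2", "0", "q3"), ("q2", "1", "q3"),
    ("q3", "0", "q4"), ("q3", "1", "q4"),
    ("q4", "0", "q5"), ("q4", "1", "q5"),
}
states, dfa_delta, start, accepting = determinize(
    ("0", "1"), DELTA, {"q0"}, {"q5"}
)

def dfa_accepts(x):
    """Run the constructed DFA on x."""
    A = start
    for u in x:
        A = dfa_delta[(A, u)]
    return A in accepting

print(dfa_accepts("10000"), dfa_accepts("00000"))  # True False
```

Each DFA state is a `frozenset` of NFA states, so it can serve directly as a dictionary key in the transition table.
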
Summary

- We've shown that for any NFA N, we can construct a DFA M with the same associated language.
- Since every DFA is also an NFA, the classes of languages recognised by DFAs and by NFAs coincide: these are the regular languages.
- Often a language can be specified more concisely by an NFA than by a DFA.
- We can automatically convert an NFA to a DFA, at the risk of an exponential blow-up in the number of states.
- To determine whether a string x is accepted by an NFA, we don't actually need to construct the entire DFA. Instead, we efficiently simulate the execution of the DFA on x on a step-by-step basis. (This is called just-in-time simulation.)

End-of-lecture question 1

Let M be the DFA shown earlier:

[Figure: the two-state even/odd automaton.]

Give a simple, concise description of the strings that are in L(M).

Answer: They are the strings containing an even number of 0's.

End-of-lecture question 2

Which of these three languages do you think are regular?

    L1 = {a, aa, ab, abbc}
    L2 = {axb | x ∈ Σ*}
    L3 = {a^n b^n | n ≥ 0}

If not regular, can you explain why not?

Answer:
- L1 is regular (easy to see that any finite set of strings is a regular language).
- L2 is regular (easy to give a DFA).
- L3 is not regular. For the reason, see Lecture 8.

End-of-lecture challenge question 3

Consider our first example NFA over {0, 1}:

[Figure: the six-state automaton with states q0 to q5.]

What is the number of states of the smallest DFA that recognises the same language?

Answer given at end of Lecture 4.

Reference material

- Kozen chapters 3, 5 and 6.
- J & M section 2.2 (rather brief).