Introduction to the Theory of Computation Some Notes for CIS511


Introduction to the Theory of Computation
Some Notes for CIS511

Jean Gallier
Department of Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104, USA
e-mail:

© Jean Gallier
Please, do not reproduce without permission of the author

December 26, 2017

Contents

1 Introduction
2 Basics of Formal Language Theory
   Alphabets, Strings, Languages
   Operations on Languages
3 DFA's, NFA's, Regular Languages
   Deterministic Finite Automata (DFA's)
   The Cross-product Construction
   Nondeterministic Finite Automata (NFA's)
   ϵ-closure
   Converting an NFA into a DFA
   Finite State Automata With Output: Transducers
   An Application of NFA's: Text Search
4 Hidden Markov Models (HMMs)
   Hidden Markov Models (HMMs)
   The Viterbi Algorithm and the Forward Algorithm
5 Regular Languages, Minimization of DFA's
   Morphisms, F-Maps, B-Maps and Homomorphisms of DFA's
   Directed Graphs and Paths
   Labeled Graphs and Automata
   The Closure Definition of the Regular Languages
   Regular Expressions
   Regular Expressions and Regular Languages
   Regular Expressions and NFA's
   Right-Invariant Equivalence Relations on Σ*
   Finding minimal DFA's
   State Equivalence and Minimal DFA's
   The Pumping Lemma
   A Fast Algorithm for Checking State Equivalence
6 Context-Free Grammars And Languages
   Context-Free Grammars
   Derivations and Context-Free Languages
   Normal Forms for Context-Free Grammars
   Regular Languages are Context-Free
   Useless Productions in Context-Free Grammars
   The Greibach Normal Form
   Least Fixed-Points
   Context-Free Languages as Least Fixed-Points
   Least Fixed-Points and the Greibach Normal Form
   Tree Domains and Gorn Trees
   Derivation Trees
   Ogden's Lemma
   Pushdown Automata
   From Context-Free Grammars To PDA's
   From PDA's To Context-Free Grammars
   The Chomsky-Schutzenberger Theorem
7 A Survey of LR-Parsing Methods
   LR(0)-Characteristic Automata
   Shift/Reduce Parsers
   Computation of FIRST
   The Intuition Behind the Shift/Reduce Algorithm
   The Graph Method for Computing Fixed Points
   Computation of FOLLOW
   Algorithm Traverse
   More on LR(0)-Characteristic Automata
   LALR(1)-Lookahead Sets
   Computing FIRST, FOLLOW, etc. in the Presence of ϵ-rules
   LR(1)-Characteristic Automata
8 RAM Programs, Turing Machines
   Partial Functions and RAM Programs
   Definition of a Turing Machine
   Computations of Turing Machines
   RAM-computable functions are Turing-computable
   Turing-computable functions are RAM-computable
   Computably Enumerable and Computable Languages
   The Primitive Recursive Functions
   The Partial Computable Functions
9 Universal RAM Programs and the Halting Problem
   Pairing Functions
   Equivalence of Alphabets
   Coding of RAM Programs
   Kleene's T-Predicate
   A Simple Function Not Known to be Computable
   A Non-Computable Function; Busy Beavers
10 Elementary Recursive Function Theory
   Acceptable Indexings
   Undecidable Problems
   Listable (Recursively Enumerable) Sets
   Reducibility and Complete Sets
   The Recursion Theorem
   Extended Rice Theorem
   Creative and Productive Sets
11 Listable and Diophantine Sets; Hilbert's Tenth
   Diophantine Equations; Hilbert's Tenth Problem
   Diophantine Sets and Listable Sets
   Some Applications of the DPRM Theorem
12 The Post Correspondence Problem; Applications
   The Post Correspondence Problem
   Some Undecidability Results for CFG's
   More Undecidable Properties of Languages
13 Computational Complexity; P and NP
   The Class P
   Directed Graphs, Paths
   Eulerian Cycles
   Hamiltonian Cycles
   Propositional Logic and Satisfiability
   The Class NP, NP-Completeness
   The Cook-Levin Theorem
14 Some NP-Complete Problems
   Statements of the Problems
   Proofs of NP-Completeness
   Succinct Certificates, coNP, and EXP
15 Primality Testing is in NP
   Prime Numbers and Composite Numbers
   Methods for Primality Testing
   Modular Arithmetic, the Groups Z/nZ, (Z/nZ)*
   The Lucas Theorem; Lucas Trees
   Algorithms for Computing Powers Modulo m
   PRIMES is in NP
16 Phrase-Structure and Context-Sensitive Grammars
   Phrase-Structure Grammars
   Derivations and Type-0 Languages
   Type-0 Grammars and Context-Sensitive Grammars


Chapter 1

Introduction

The theory of computation is concerned with algorithms and algorithmic systems: their design and representation, their completeness, and their complexity.

The purpose of these notes is to introduce some of the basic notions of the theory of computation, including concepts from formal languages and automata theory, the theory of computability, some basics of recursive function theory, and an introduction to complexity theory. Other topics such as correctness of programs will not be treated here (there just isn't enough time!).

The notes are divided into three parts. The first part is devoted to formal languages and automata. The second part deals with models of computation, recursive functions, and undecidability. The third part deals with computational complexity, in particular the classes P and NP.


Chapter 2

Basics of Formal Language Theory

2.1 Alphabets, Strings, Languages

Our view of languages is that a language is a set of strings. In turn, a string is a finite sequence of letters from some alphabet. These concepts are defined rigorously as follows.

Definition 2.1. An alphabet Σ is any finite set. We often write Σ = {a_1, ..., a_k}. The a_i are called the symbols of the alphabet.

Examples:
Σ = {a}
Σ = {a, b, c}
Σ = {0, 1}
Σ = {α, β, γ, δ, ϵ, λ, ϕ, ψ, ω, µ, ν, ρ, σ, η, ξ, ζ}

A string is a finite sequence of symbols. Technically, it is convenient to define strings as functions. For any integer n ≥ 1, let

[n] = {1, 2, ..., n},

and for n = 0, let [0] = ∅.

Definition 2.2. Given an alphabet Σ, a string over Σ (or simply a string) of length n is any function u: [n] → Σ. The integer n is the length of the string u, and it is denoted as |u|. When n = 0, the special string u: [0] → Σ of length 0 is called the empty string, or null string, and is denoted as ϵ.

Given a string u: [n] → Σ of length n ≥ 1, u(i) is the i-th letter in the string u. For simplicity of notation, we denote the string u as

u = u_1 u_2 ... u_n,

with each u_i ∈ Σ.

For example, if Σ = {a, b} and u: [3] → Σ is defined such that u(1) = a, u(2) = b, and u(3) = a, we write u = aba. Other examples of strings are work, fun, gabuzomeuh.

Strings of length 1 are functions u: [1] → Σ simply picking some element u(1) = a_i in Σ. Thus, we will identify every symbol a_i ∈ Σ with the corresponding string of length 1.

The set of all strings over an alphabet Σ, including the empty string, is denoted as Σ*. Observe that when Σ = ∅, then ∅* = {ϵ}. When Σ ≠ ∅, the set Σ* is countably infinite. Later on, we will see ways of ordering and enumerating strings.

Strings can be juxtaposed, or concatenated.

Definition 2.3. Given an alphabet Σ, given any two strings u: [m] → Σ and v: [n] → Σ, the concatenation u · v (also written uv) of u and v is the string uv: [m + n] → Σ, defined such that

uv(i) = u(i)      if 1 ≤ i ≤ m,
uv(i) = v(i − m)  if m + 1 ≤ i ≤ m + n.

In particular, uϵ = ϵu = u. Observe that |uv| = |u| + |v|. For example, if u = ga and v = buzo, then uv = gabuzo. It is immediately verified that

u(vw) = (uv)w.

Thus, concatenation is a binary operation on Σ* which is associative and has ϵ as an identity. Note that in general uv ≠ vu, for example for u = a and v = b.

Given a string u ∈ Σ* and n ≥ 0, we define u^n recursively as follows:

u^0 = ϵ,
u^(n+1) = u^n u  (n ≥ 0).

Clearly, u^1 = u, and it is an easy exercise to show that u^n u = u u^n, for all n ≥ 0. For the induction step, we have

u^(n+1) u = (u^n u) u   by definition of u^(n+1)
          = (u u^n) u   by the induction hypothesis
          = u (u^n u)   by associativity
          = u u^(n+1)   by definition of u^(n+1).

Definition 2.4. Given an alphabet Σ, given any two strings u, v ∈ Σ* we define the following notions as follows:

u is a prefix of v iff there is some y ∈ Σ* such that v = uy.
u is a suffix of v iff there is some x ∈ Σ* such that v = xu.
u is a substring of v iff there are some x, y ∈ Σ* such that v = xuy.

We say that u is a proper prefix (suffix, substring) of v iff u is a prefix (suffix, substring) of v and u ≠ v.

For example, ga is a prefix of gabuzo, zo is a suffix of gabuzo, and buz is a substring of gabuzo.

Recall that a partial ordering ≤ on a set S is a binary relation ≤ ⊆ S × S which is reflexive, transitive, and antisymmetric. The concepts of prefix, suffix, and substring define binary relations on Σ* in the obvious way. It can be shown that these relations are partial orderings.

Another important ordering on strings is the lexicographic (or dictionary) ordering.
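Definition 2.4 translates directly into code; here is a minimal Python sketch (the function names are mine, not from the notes):

```python
def is_prefix(u, v):
    # u is a prefix of v iff v = u y for some y
    return v[:len(u)] == u

def is_suffix(u, v):
    # u is a suffix of v iff v = x u for some x
    return u == "" or v[-len(u):] == u

def is_substring(u, v):
    # u is a substring of v iff v = x u y for some x, y
    return any(v[i:i + len(u)] == u for i in range(len(v) - len(u) + 1))
```

Note that ϵ (the empty Python string) is a prefix, suffix, and substring of every string, as the definitions require.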

Definition 2.5. Given an alphabet Σ = {a_1, ..., a_k} assumed totally ordered such that a_1 < a_2 < ··· < a_k, given any two strings u, v ∈ Σ*, we define the lexicographic ordering ⪯ as follows: u ⪯ v

(1) if v = uy, for some y ∈ Σ*, or
(2) if u = x a_i y, v = x a_j z, a_i < a_j, with a_i, a_j ∈ Σ, and for some x, y, z ∈ Σ*.

Note that cases (1) and (2) are mutually exclusive. In case (1) u is a prefix of v. In case (2), u is not a prefix of v and v is not a prefix of u. For example, gallhager ⪯ gallier.

It is fairly tedious to prove that the lexicographic ordering is in fact a partial ordering. In fact, it is a total ordering, which means that for any two strings u, v ∈ Σ*, either u ⪯ v, or v ⪯ u.

The reversal w^R of a string w is defined inductively as follows:

ϵ^R = ϵ,
(ua)^R = a u^R,

where a ∈ Σ and u ∈ Σ*. For example,

reillag = gallier^R.

It can be shown that

(uv)^R = v^R u^R.

Thus,

(u_1 ... u_n)^R = u_n^R ... u_1^R,

and when u_i ∈ Σ, we have

(u_1 ... u_n)^R = u_n ... u_1.

We can now define languages.

Definition 2.6. Given an alphabet Σ, a language over Σ (or simply a language) is any subset L of Σ*.
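The two cases of Definition 2.5 can be handled in one scan of the two strings; a small Python sketch, assuming the symbol order a_1 < ··· < a_k is given by the string order (function name is mine):

```python
def lex_leq(u, v, order):
    # u ⪯ v iff u is a prefix of v (case 1), or at the first
    # position where u and v differ, u's symbol is smaller (case 2)
    for x, y in zip(u, v):
        if x != y:
            return order.index(x) < order.index(y)
    return len(u) <= len(v)   # no difference found: the shorter is a prefix
```

On the example from the text, lex_leq confirms gallhager ⪯ gallier, since the strings share the prefix gall and h < i.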

If Σ ≠ ∅, there are uncountably many languages.

A Quick Review of Finite, Infinite, Countable, and Uncountable Sets

For details and proofs, see Discrete Mathematics, by Gallier. Let N = {0, 1, 2, ...} be the set of natural numbers. Recall that a set X is finite if there is some natural number n ∈ N and a bijection between X and the set [n] = {1, 2, ..., n}. (When n = 0, X = ∅, the empty set.) The number n is uniquely determined. It is called the cardinality (or size) of X and is denoted by |X|. A set is infinite iff it is not finite.

Recall that any injection or surjection of a finite set to itself is in fact a bijection. The above fails for infinite sets.

The pigeonhole principle asserts that there is no bijection between a finite set X and any proper subset Y of X. Consequence: if we think of X as a set of n pigeons and if there are only m < n boxes (corresponding to the elements of Y), then at least two of the pigeons must share the same box.

As a consequence of the pigeonhole principle, a set X is infinite iff it is in bijection with a proper subset of itself. For example, we have a bijection n ↦ 2n between N and the set 2N of even natural numbers, a proper subset of N, so N is infinite.

A set X is countable (or denumerable) if there is an injection from X into N. If X is not the empty set, then X is countable iff there is a surjection from N onto X. It can be shown that a set X is countable iff either it is finite or it is in bijection with N.

We will see later that N × N is countable. As a consequence, the set Q of rational numbers is countable. A set is uncountable if it is not countable. For example, R (the set of real numbers) is uncountable. Similarly, (0, 1) = {x ∈ R | 0 < x < 1} is uncountable. However, there is a bijection between (0, 1) and R (find one!). The set 2^N of all subsets of N is uncountable.

If Σ ≠ ∅, then the set Σ* of all strings over Σ is infinite and countable. Suppose |Σ| = k with Σ = {a_1, ..., a_k}.

If k = 1, write a = a_1, and then

{a}* = {ϵ, a, aa, aaa, ..., a^n, ...}.

We have the bijection n ↦ a^n from N to {a}*.

If k ≥ 2, then we can think of the string u = a_{i_1} ··· a_{i_n} as a representation of the integer ν(u) in base k shifted by (k^n − 1)/(k − 1):

ν(u) = i_1 k^(n−1) + i_2 k^(n−2) + ··· + i_{n−1} k + i_n
     = (k^n − 1)/(k − 1) + (i_1 − 1) k^(n−1) + ··· + (i_{n−1} − 1) k + (i_n − 1)

(with ν(ϵ) = 0). We leave it as an exercise to show that ν: Σ* → N is a bijection. In fact, ν corresponds to the enumeration of Σ* where u precedes v if |u| < |v|, and u precedes v in the lexicographic ordering if |u| = |v|. For example, if k = 2 and if we write Σ = {a, b}, then the enumeration begins with

ϵ, a, b, aa, ab, ba, bb.

On the other hand, if Σ ≠ ∅, the set 2^Σ* of all subsets of Σ* (all languages) is uncountable. Indeed, we can show that there is no surjection from N onto 2^Σ*. First, we show that there is no surjection from Σ* onto 2^Σ*.

We claim that if there is no surjection from Σ* onto 2^Σ*, then there is no surjection from N onto 2^Σ* either. Assume by contradiction that there is a surjection g: N → 2^Σ*. But, if Σ ≠ ∅, then Σ* is infinite and countable, thus we have the bijection ν: Σ* → N. Then the composition

Σ* --ν--> N --g--> 2^Σ*

is a surjection, because the bijection ν is a surjection, g is a surjection, and the composition of surjections is a surjection, contradicting the hypothesis that there is no surjection from Σ* onto 2^Σ*.

To prove that there is no surjection from Σ* onto 2^Σ*, we use a diagonalization argument. This is an instance of Cantor's Theorem.
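The bijection ν is just base-k numbering with digits 1, ..., k instead of 0, ..., k − 1; a quick Python check (my own code, not from the notes):

```python
def nu(u, alphabet):
    # ν(u) = i_1 k^(n-1) + ... + i_n, where u = a_{i_1} ... a_{i_n}
    # and the symbols are indexed 1..k; Horner's rule does the sum.
    k = len(alphabet)
    val = 0
    for c in u:
        val = val * k + alphabet.index(c) + 1
    return val
```

For Σ = {a, b}, the enumeration ϵ, a, b, aa, ab, ba, bb receives the values 0, 1, ..., 6, confirming the length-then-lexicographic order.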

Theorem 2.1. (Cantor) There is no surjection from Σ* onto 2^Σ*.

Proof. Assume there is a surjection h: Σ* → 2^Σ*, and consider the set

D = {u ∈ Σ* | u ∉ h(u)}.

By definition, for any u we have u ∈ D iff u ∉ h(u). Since h is surjective, there is some w ∈ Σ* such that h(w) = D. Then, by definition of D and since D = h(w), we have

w ∈ D iff w ∉ h(w) = D,

a contradiction. Therefore h is not surjective.

Therefore, if Σ ≠ ∅, then 2^Σ* is uncountable. We will try to single out countable "tractable" families of languages. We will begin with the family of regular languages, and then proceed to the context-free languages. We now turn to operations on languages.

2.2 Operations on Languages

A way of building more complex languages from simpler ones is to combine them using various operations. First, we review the set-theoretic operations of union, intersection, and complementation. Given some alphabet Σ, for any two languages L1, L2 over Σ, the union L1 ∪ L2 of L1 and L2 is the language

L1 ∪ L2 = {w ∈ Σ* | w ∈ L1 or w ∈ L2}.

The intersection L1 ∩ L2 of L1 and L2 is the language

L1 ∩ L2 = {w ∈ Σ* | w ∈ L1 and w ∈ L2}.

The difference L1 − L2 of L1 and L2 is the language

L1 − L2 = {w ∈ Σ* | w ∈ L1 and w ∉ L2}.

The difference is also called the relative complement. A special case of the difference is obtained when L1 = Σ*, in which case we define the complement L̄ of a language L as

L̄ = {w ∈ Σ* | w ∉ L}.

The above operations do not use the structure of strings. The following operations use concatenation.

Definition 2.7. Given an alphabet Σ, for any two languages L1, L2 over Σ, the concatenation L1 L2 of L1 and L2 is the language

L1 L2 = {w ∈ Σ* | ∃u ∈ L1, ∃v ∈ L2, w = uv}.

For any language L, we define L^n as follows:

L^0 = {ϵ},
L^(n+1) = L^n L  (n ≥ 0).

The following properties are easily verified:

L ∅ = ∅,
∅ L = ∅,
L {ϵ} = L,
{ϵ} L = L,
(L1 ∪ {ϵ}) L2 = L1 L2 ∪ L2,
L1 (L2 ∪ {ϵ}) = L1 L2 ∪ L1,
L^n L = L L^n.

In general, L1 L2 ≠ L2 L1.

So far, the operations that we have introduced, except complementation (since L̄ = Σ* − L is infinite if L is finite and Σ is nonempty), preserve the finiteness of languages. This is not the case for the next two operations.

Definition 2.8. Given an alphabet Σ, for any language L over Σ, the Kleene *-closure L* of L is the language

L* = ⋃_{n ≥ 0} L^n.

The Kleene +-closure L+ of L is the language

L+ = ⋃_{n ≥ 1} L^n.

Thus, L* is the infinite union

L* = L^0 ∪ L^1 ∪ L^2 ∪ ... ∪ L^n ∪ ...,
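For finite languages, the concatenation and powers of Definition 2.7 are easy to compute, which makes the listed identities checkable on small examples; a Python sketch (helper names are mine):

```python
def concat(L1, L2):
    # L1 L2 = { uv | u in L1, v in L2 }
    return {u + v for u in L1 for v in L2}

def power(L, n):
    # L^0 = {ϵ}, L^(n+1) = L^n L
    Ln = {""}
    for _ in range(n):
        Ln = concat(Ln, L)
    return Ln
```

For instance, (L1 ∪ {ϵ}) L2 = L1 L2 ∪ L2 holds because the ϵ on the left contributes exactly the strings of L2.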

and L+ is the infinite union

L+ = L^1 ∪ L^2 ∪ ... ∪ L^n ∪ ....

Since L^1 = L, both L* and L+ contain L. In fact,

L+ = {w ∈ Σ* | ∃n ≥ 1, ∃u_1 ∈ L, ..., ∃u_n ∈ L, w = u_1 ··· u_n},

and since L^0 = {ϵ},

L* = {ϵ} ∪ {w ∈ Σ* | ∃n ≥ 1, ∃u_1 ∈ L, ..., ∃u_n ∈ L, w = u_1 ··· u_n}.

Thus, the language L* always contains ϵ, and we have

L* = L+ ∪ {ϵ}.

However, if ϵ ∉ L, then ϵ ∉ L+. The following is easily shown:

∅* = {ϵ},
L+ = L* L,
L** = L*,
L* L* = L*.

The Kleene closures have many other interesting properties. Homomorphisms are also very useful. Given two alphabets Σ, ∆, a homomorphism h: Σ* → ∆* between Σ* and ∆* is a function h: Σ* → ∆* such that

h(uv) = h(u) h(v) for all u, v ∈ Σ*.

Letting u = v = ϵ, we get

h(ϵ) = h(ϵ) h(ϵ),

which implies that (why?)
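Since L* is an infinite union, code can only compute a finite approximation L^0 ∪ ··· ∪ L^m; the sketch below (my own helper, assuming finite L) is still enough to check identities such as L* = L+ ∪ {ϵ} up to a bound:

```python
def concat(L1, L2):
    return {u + v for u in L1 for v in L2}

def closure_upto(L, m, start):
    # L^start ∪ L^(start+1) ∪ ... ∪ L^m; start=0 approximates L*,
    # start=1 approximates L+.
    Ln, result = {""}, set()          # Ln holds L^n, beginning with L^0
    for n in range(m + 1):
        if n >= start:
            result |= Ln
        Ln = concat(Ln, L)
    return result
```

The bound m is a truncation, not part of the definitions; any identity verified this way is only checked on the strings built from at most m factors.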

h(ϵ) = ϵ.

If Σ = {a_1, ..., a_k}, it is easily seen that h is completely determined by h(a_1), ..., h(a_k) (why?)

Example: Σ = {a, b, c}, ∆ = {0, 1}, and

h(a) = 01, h(b) = 011, h(c) = 0111.

For example,

h(abc) = 010110111.

Given any language L1 ⊆ Σ*, we define the image h(L1) of L1 as

h(L1) = {h(u) ∈ ∆* | u ∈ L1}.

Given any language L2 ⊆ ∆*, we define the inverse image h^(−1)(L2) of L2 as

h^(−1)(L2) = {u ∈ Σ* | h(u) ∈ L2}.

We now turn to the first formalism for defining languages, Deterministic Finite Automata (DFA's).
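Because a homomorphism is completely determined by the images of the letters, it can be represented as a dictionary; a minimal Python sketch using the h of the example (the function name is mine):

```python
def hom(u, images):
    # h(u_1 ... u_n) = h(u_1) ... h(u_n), and h(ϵ) = ϵ
    return "".join(images[c] for c in u)

# the homomorphism of the example
h = {"a": "01", "b": "011", "c": "0111"}
```

The defining property h(uv) = h(u)h(v) then holds automatically, since join distributes over concatenation.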

Chapter 3

DFA's, NFA's, Regular Languages

The family of regular languages is the simplest, yet interesting, family of languages. We give six definitions of the regular languages:

1. Using deterministic finite automata (DFAs).
2. Using nondeterministic finite automata (NFAs).
3. Using a closure definition involving union, concatenation, and Kleene *.
4. Using regular expressions.
5. Using right-invariant equivalence relations of finite index (the Myhill-Nerode characterization).
6. Using right-linear context-free grammars.

We prove the equivalence of these definitions, often by providing an algorithm for converting one formulation into another. We find that the introduction of NFA's is motivated by the conversion of regular expressions into DFA's. To finish this conversion, we also show that every NFA can be converted into a DFA (using the subset construction). So, although NFA's often allow for more concise descriptions, they do not have more expressive power than DFA's.

NFA's operate according to the paradigm: guess a successful path, and check it in polynomial time. This is the essence of an important class of hard problems known as NP, which will be investigated later.

We will also discuss methods for proving that certain languages are not regular (Myhill-Nerode, pumping lemma). We present algorithms to convert a DFA to an equivalent one with a minimal number of states.

3.1 Deterministic Finite Automata (DFA's)

First we define what DFA's are, and then we explain how they are used to accept or reject strings. Roughly speaking, a DFA is a finite transition graph whose edges are labeled with letters from an alphabet Σ. The graph also satisfies certain properties that make it deterministic. Basically, this means that given any string w, starting from any node, there is a unique path in the graph parsing the string w.

Example 1. A DFA for the language

L1 = {ab}+ = {ab}*{ab},

i.e., L1 = {ab, abab, ababab, ..., (ab)^n, ...}.
Input alphabet: Σ = {a, b}.
State set Q1 = {0, 1, 2, 3}.
Start state: 0.
Set of accepting states: F1 = {2}.
Transition table (function) δ1:

     a   b
0    1   3
1    3   2
2    1   3
3    3   3

Note that state 3 is a trap state or dead state. Here is a graph representation of the DFA specified by the transition function shown above:

Figure 3.1: DFA for {ab}+

Example 2. A DFA for the language

L2 = {ab}* = L1 ∪ {ϵ},

i.e., L2 = {ϵ, ab, abab, ..., (ab)^n, ...}.
Input alphabet: Σ = {a, b}.
State set Q2 = {0, 1, 2}.
Start state: 0.
Set of accepting states: F2 = {0}.
Transition table (function) δ2:

     a   b
0    1   2
1    2   0
2    2   2

State 2 is a trap state or dead state. Here is a graph representation of the DFA specified by the transition function shown above:

Figure 3.2: DFA for {ab}*

Example 3. A DFA for the language

L3 = {a, b}*{abb}.

Note that L3 consists of all strings of a's and b's ending in abb.
Input alphabet: Σ = {a, b}.
State set Q3 = {0, 1, 2, 3}.
Start state: 0.
Set of accepting states: F3 = {3}.
Transition table (function) δ3:

     a   b
0    1   0
1    1   2
2    1   3
3    1   0

Here is a graph representation of the DFA specified by the transition function shown above:

Figure 3.3: DFA for {a, b}*{abb}

Is this a minimal DFA?

Definition 3.1. A deterministic finite automaton (or DFA) is a quintuple D = (Q, Σ, δ, q0, F), where

Σ is a finite input alphabet;
Q is a finite set of states;

F is a subset of Q of final (or accepting) states;
q0 ∈ Q is the start state (or initial state);
δ is the transition function, a function δ: Q × Σ → Q.

For any state p ∈ Q and any input a ∈ Σ, the state q = δ(p, a) is uniquely determined. Thus, it is possible to define the state reached from a given state p ∈ Q on input w ∈ Σ*, following the path specified by w. Technically, this is done by defining the extended transition function δ*: Q × Σ* → Q.

Definition 3.2. Given a DFA D = (Q, Σ, δ, q0, F), the extended transition function δ*: Q × Σ* → Q is defined as follows:

δ*(p, ϵ) = p,
δ*(p, ua) = δ(δ*(p, u), a),

where a ∈ Σ and u ∈ Σ*. It is immediate that δ*(p, a) = δ(p, a) for a ∈ Σ.

The meaning of δ*(p, w) is that it is the state reached from state p following the path from p specified by w. We can show (by induction on the length of v) that

δ*(p, uv) = δ*(δ*(p, u), v)

for all p ∈ Q and all u, v ∈ Σ*. For the induction step, for u ∈ Σ*, and all v = ya with y ∈ Σ* and a ∈ Σ,

δ*(p, uya) = δ(δ*(p, uy), a)        by definition of δ*
           = δ(δ*(δ*(p, u), y), a)  by induction
           = δ*(δ*(p, u), ya)       by definition of δ*.

We can now define how a DFA accepts or rejects a string.

Definition 3.3. Given a DFA D = (Q, Σ, δ, q0, F), the language L(D) accepted (or recognized) by D is the language

L(D) = {w ∈ Σ* | δ*(q0, w) ∈ F}.
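The extended transition function δ* is just a left fold of δ over the letters of w, so Definitions 3.2 and 3.3 fit in a few lines of Python; the sketch below encodes the transition table of the DFA of Example 1 for {ab}+ as a dictionary (the encoding is my own choice):

```python
def delta_star(delta, p, w):
    # δ*(p, ϵ) = p and δ*(p, ua) = δ(δ*(p, u), a)
    for a in w:
        p = delta[(p, a)]
    return p

def accepts(delta, q0, F, w):
    # w ∈ L(D) iff δ*(q0, w) ∈ F
    return delta_star(delta, q0, w) in F

# DFA of Example 1 (L1 = {ab}+); state 3 is the trap state
delta1 = {(0, "a"): 1, (0, "b"): 3, (1, "a"): 3, (1, "b"): 2,
          (2, "a"): 1, (2, "b"): 3, (3, "a"): 3, (3, "b"): 3}
```

Running it on a few strings matches the language: ab and ababab are accepted, while ϵ and aba are not.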

Thus, a string w ∈ Σ* is accepted iff the path from q0 on input w ends in a final state.

The definition of a DFA does not prevent the possibility that a DFA may have states that are not reachable from the start state q0, which means that there is no path from q0 to such states. For example, in the DFA D1 defined by the transition table below and the set of final states F = {1, 2, 3}, the states in the set {0, 1} are reachable from the start state 0, but the states in the set {2, 3, 4} are not (even though there are transitions from 2, 3, 4 to 0, they go in the wrong direction).

Since there is no path from the start state 0 to any of the states in {2, 3, 4}, the states 2, 3, 4 are useless as far as acceptance of strings, so they should be deleted as well as the transitions from them.

Given a DFA D = (Q, Σ, δ, q0, F), the above suggests defining the set Qr of reachable (or accessible) states as

Qr = {p ∈ Q | (∃u ∈ Σ*)(p = δ*(q0, u))}.

The set Qr consists of those states p ∈ Q such that there is some path from q0 to p (along some string u). Computing the set Qr is a reachability problem in a directed graph. There are various algorithms to solve this problem, including breadth-first search or depth-first search.

Once the set Qr has been computed, we can clean up the DFA D by deleting all redundant states in Q − Qr and all transitions from these states. More precisely, we form the DFA

Dr = (Qr, Σ, δr, q0, Qr ∩ F),

where δr: Qr × Σ → Qr is the restriction of δ: Q × Σ → Q to Qr. If D1 is the DFA of the previous example, then the DFA (D1)r is obtained by deleting the states 2, 3, 4. It can be shown that L(Dr) = L(D) (see the homework problems).
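Computing Qr is plain graph reachability; a breadth-first-search sketch in Python (my own code), on a small DFA whose states 2 and 3 are unreachable from the start state 0:

```python
from collections import deque

def reachable_states(delta, q0, sigma):
    # Qr = { p | p = δ*(q0, u) for some string u }: BFS from q0
    seen, todo = {q0}, deque([q0])
    while todo:
        p = todo.popleft()
        for a in sigma:
            q = delta[(p, a)]
            if q not in seen:
                seen.add(q)
                todo.append(q)
    return seen

# states 2, 3 have transitions into {0, 1} but none lead out to them
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0,
         (2, "a"): 0, (2, "b"): 3, (3, "a"): 3, (3, "b"): 2}
```

Trimming the DFA then amounts to restricting delta and F to the returned set.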

A DFA D such that Q = Qr is said to be trim (or reduced). Observe that the DFA Dr is trim. A minimal DFA must be trim. Computing Qr gives us a method to test whether a DFA D accepts a nonempty language. Indeed,

L(D) ≠ ∅ iff Qr ∩ F ≠ ∅.

We now come to the first of several equivalent definitions of the regular languages.

Regular Languages, Version 1

Definition 3.4. A language L is a regular language if it is accepted by some DFA.

Note that a regular language may be accepted by many different DFAs. Later on, we will investigate how to find minimal DFA's. For a given regular language L, a minimal DFA for L is a DFA with the smallest number of states among all DFA's accepting L. A minimal DFA for L must exist since every nonempty subset of natural numbers has a smallest element.

In order to understand how complex the regular languages are, we will investigate the closure properties of the regular languages under union, intersection, complementation, concatenation, and Kleene *. It turns out that the family of regular languages is closed under all these operations. For union, intersection, and complementation, we can use the cross-product construction which preserves determinism. However, for concatenation and Kleene *, there does not appear to be any method involving DFA's only. The way to do it is to introduce nondeterministic finite automata (NFA's), which we do a little later.

3.2 The Cross-product Construction

Let Σ = {a_1, ..., a_m} be an alphabet. Given any two DFA's D1 = (Q1, Σ, δ1, q0,1, F1) and D2 = (Q2, Σ, δ2, q0,2, F2), there is a very useful construction for showing that the union, the intersection, or the relative complement of regular languages is a regular language. Given any two languages L1, L2 over Σ, recall that

L1 ∪ L2 = {w ∈ Σ* | w ∈ L1 or w ∈ L2},
L1 ∩ L2 = {w ∈ Σ* | w ∈ L1 and w ∈ L2},
L1 − L2 = {w ∈ Σ* | w ∈ L1 and w ∉ L2}.

Let us first explain how to construct a DFA accepting the intersection L1 ∩ L2. Let D1 and D2 be DFA's such that L1 = L(D1) and L2 = L(D2). The idea is to construct a DFA simulating D1 and D2 in parallel. This can be done by using states which are pairs (p1, p2) ∈ Q1 × Q2. Thus, we define the DFA D as follows:

D = (Q1 × Q2, Σ, δ, (q0,1, q0,2), F1 × F2),

where the transition function δ: (Q1 × Q2) × Σ → Q1 × Q2 is defined as follows:

δ((p1, p2), a) = (δ1(p1, a), δ2(p2, a)),

for all p1 ∈ Q1, p2 ∈ Q2, and a ∈ Σ. Clearly, D is a DFA, since D1 and D2 are. Also, by the definition of δ, we have

δ*((p1, p2), w) = (δ1*(p1, w), δ2*(p2, w)),

for all p1 ∈ Q1, p2 ∈ Q2, and w ∈ Σ*. Now, we have

w ∈ L(D1) ∩ L(D2)
iff w ∈ L(D1) and w ∈ L(D2),
iff δ1*(q0,1, w) ∈ F1 and δ2*(q0,2, w) ∈ F2,
iff (δ1*(q0,1, w), δ2*(q0,2, w)) ∈ F1 × F2,
iff δ*((q0,1, q0,2), w) ∈ F1 × F2,
iff w ∈ L(D).

Thus, L(D) = L(D1) ∩ L(D2).

We can now modify D very easily to accept L(D1) ∪ L(D2). We change the set of final states so that it becomes (F1 × Q2) ∪ (Q1 × F2). Indeed,

w ∈ L(D1) ∪ L(D2)
iff w ∈ L(D1) or w ∈ L(D2),
iff δ1*(q0,1, w) ∈ F1 or δ2*(q0,2, w) ∈ F2,
iff (δ1*(q0,1, w), δ2*(q0,2, w)) ∈ (F1 × Q2) ∪ (Q1 × F2),
iff δ*((q0,1, q0,2), w) ∈ (F1 × Q2) ∪ (Q1 × F2),
iff w ∈ L(D).

Thus, L(D) = L(D1) ∪ L(D2).

We can also modify D very easily to accept L(D1) − L(D2).
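The cross-product construction itself is only a few lines of code; here is a Python sketch (the encoding as transition dictionaries and the two example DFAs — D1 tracking the parity of a's with F1 = {0}, D2 accepting strings ending in b with F2 = {1} — are mine, not from the notes):

```python
def product(d1, d2, sigma):
    # δ((p1, p2), a) = (δ1(p1, a), δ2(p2, a)) on the state set Q1 × Q2
    s1 = {p for (p, _) in d1}
    s2 = {p for (p, _) in d2}
    return {((p1, p2), a): (d1[(p1, a)], d2[(p2, a)])
            for p1 in s1 for p2 in s2 for a in sigma}

def run(delta, q, w):
    # the extended transition function δ*
    for a in w:
        q = delta[(q, a)]
    return q

d1 = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}  # even number of a's, F1 = {0}
d2 = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}  # ends in b, F2 = {1}
d = product(d1, d2, "ab")
```

Intersection, union, and difference then differ only in which pairs are declared final: F1 × F2, (F1 × Q2) ∪ (Q1 × F2), or F1 × (Q2 − F2).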

We change the set of final states so that it becomes F1 × (Q2 − F2). Indeed,

w ∈ L(D1) − L(D2)
iff w ∈ L(D1) and w ∉ L(D2),
iff δ1*(q0,1, w) ∈ F1 and δ2*(q0,2, w) ∉ F2,
iff (δ1*(q0,1, w), δ2*(q0,2, w)) ∈ F1 × (Q2 − F2),
iff δ*((q0,1, q0,2), w) ∈ F1 × (Q2 − F2),
iff w ∈ L(D).

Thus, L(D) = L(D1) − L(D2).

In all cases, if D1 has n1 states and D2 has n2 states, the DFA D has n1 n2 states.

3.3 Nondeterministic Finite Automata (NFA's)

NFA's are obtained from DFA's by allowing multiple transitions from a given state on a given input. This can be done by defining δ(p, a) as a subset of Q rather than a single state. It will also be convenient to allow transitions on input ϵ. We let 2^Q denote the set of all subsets of Q, including the empty set. The set 2^Q is the power set of Q.

Example 4. An NFA for the language

L3 = {a, b}*{abb}.

Input alphabet: Σ = {a, b}.
State set Q4 = {0, 1, 2, 3}.
Start state: 0.
Set of accepting states: F4 = {3}.
Transition table δ4:

     a        b
0    {0, 1}   {0}
1    ∅        {2}
2    ∅        {3}
3    ∅        ∅

Figure 3.4: NFA for {a, b}*{abb}

Example 5. Let Σ = {a_1, ..., a_n}, let

L^i_n = {w ∈ Σ* | w contains an odd number of a_i's},

and let

L_n = L^1_n ∪ L^2_n ∪ ··· ∪ L^n_n.

The language L_n consists of those strings in Σ* that contain an odd number of some letter a_i ∈ Σ. Equivalently, Σ* − L_n consists of those strings in Σ* with an even number of every letter a_i ∈ Σ. It can be shown that every DFA accepting L_n has at least 2^n states. However, there is an NFA with 2n + 1 states accepting L_n.

We define NFA's as follows.

Definition 3.5. A nondeterministic finite automaton (or NFA) is a quintuple N = (Q, Σ, δ, q0, F), where

Σ is a finite input alphabet;
Q is a finite set of states;
F is a subset of Q of final (or accepting) states;
q0 ∈ Q is the start state (or initial state);
δ is the transition function, a function δ: Q × (Σ ∪ {ϵ}) → 2^Q.

For any state p ∈ Q and any input a ∈ Σ ∪ {ϵ}, the set of states δ(p, a) is uniquely determined. We write q ∈ δ(p, a).

Given an NFA N = (Q, Σ, δ, q0, F), we would like to define the language accepted by N. However, given an NFA N, unlike the situation for DFA's, given a state p ∈ Q and some input w ∈ Σ*, in general there is no unique path from p on input w, but instead a tree of computation paths. For example, given the NFA shown below,

Figure 3.5: NFA for {a, b}*{abb}

from state 0 on input w = ababb we obtain the following tree of computation paths:

Figure 3.6: A tree of computation paths on input ababb

Observe that there are three kinds of computation paths:

1. A path on input w ending in a rejecting state (for example, the leftmost path).
2. A path on some proper prefix of w, along which the computation gets stuck (for example, the rightmost path).
3. A path on input w ending in an accepting state (such as the path ending in state 3).

The acceptance criterion for NFA's is very lenient: a string w is accepted iff the tree of computation paths contains some accepting path (of type (3)). Thus, all failed paths of type (1) and (2) are ignored. Furthermore, there is no charge for failed paths. A string w is rejected iff all computation paths are failed paths of type (1) or (2).

The philosophy of nondeterminism is that an NFA guesses an accepting path and then checks it in polynomial time by following this path. We are only charged for one accepting path (even if there are several accepting paths).

A way to capture this acceptance policy is to extend the transition function δ: Q × (Σ ∪ {ϵ}) → 2^Q to a function

δ*: Q × Σ* → 2^Q.

The presence of ϵ-transitions (i.e., when q ∈ δ(p, ϵ)) causes technical problems, and to overcome these problems, we introduce the notion of ϵ-closure.

3.4 ϵ-closure

Definition 3.6. Given an NFA N = (Q, Σ, δ, q0, F) (with ϵ-transitions), for every state p ∈ Q, the ϵ-closure of p is the set ϵ-closure(p) consisting of all states q such that there is a path from p to q whose spelling is ϵ (an ϵ-path). This means that either q = p, or that all the edges on the path from p to q have the label ϵ.

We can compute ϵ-closure(p) using a sequence of approximations as follows. Define the sequence of sets of states (ϵ-clo_i(p))_{i ≥ 0} as follows:

ϵ-clo_0(p) = {p},
ϵ-clo_{i+1}(p) = ϵ-clo_i(p) ∪ {q ∈ Q | ∃s ∈ ϵ-clo_i(p), q ∈ δ(s, ϵ)}.

Since ϵ-clo_i(p) ⊆ ϵ-clo_{i+1}(p) ⊆ Q for all i ≥ 0, and Q is finite, it can be shown that there is a smallest i, say i0, such that

ϵ-clo_{i0}(p) = ϵ-clo_{i0+1}(p).

It suffices to show that there is some i ≥ 0 such that ϵ-clo_i(p) = ϵ-clo_{i+1}(p), because then there is a smallest such i (since every nonempty subset of N has a smallest element).

Assume by contradiction that ϵ-clo_i(p) ⊂ ϵ-clo_{i+1}(p) for all i ≥ 0. Then, I claim that |ϵ-clo_i(p)| ≥ i + 1 for all i ≥ 0. This is true for i = 0 since ϵ-clo_0(p) = {p}. Since ϵ-clo_i(p) ⊂ ϵ-clo_{i+1}(p), there is some q ∈ ϵ-clo_{i+1}(p) that does not belong to ϵ-clo_i(p), and since by induction |ϵ-clo_i(p)| ≥ i + 1, we get

|ϵ-clo_{i+1}(p)| ≥ |ϵ-clo_i(p)| + 1 ≥ i + 1 + 1 = i + 2,

establishing the induction hypothesis.

If n = |Q|, then |ϵ-clo_n(p)| ≥ n + 1, a contradiction. Therefore, there is indeed some i ≥ 0 such that ϵ-clo_i(p) = ϵ-clo_{i+1}(p), and for the least such i = i0, we have i0 ≤ n − 1.

It can also be shown that

ϵ-closure(p) = ϵ-clo_{i0}(p),

by proving that

1. ϵ-clo_i(p) ⊆ ϵ-closure(p), for all i ≥ 0.
2. ϵ-closure(p)_i ⊆ ϵ-clo_{i0}(p), for all i ≥ 0,

where ϵ-closure(p)_i is the set of states reachable from p by an ϵ-path of length ≤ i.

When N has no ϵ-transitions, i.e., when δ(p, ϵ) = ∅ for all p ∈ Q (which means that δ can be viewed as a function δ: Q × Σ → 2^Q), we have ϵ-closure(p) = {p}.

It should be noted that there are more efficient ways of computing ϵ-closure(p), for example, using a stack (basically, a kind of depth-first search). We present such an algorithm below. It is assumed that the types NFA and stack are defined. If n is the number of states of an NFA N, we let

eclotype = array[1..n] of boolean

function eclosure[N: NFA, p: integer]: eclotype;
begin
  var eclo: eclotype, q, s: integer, st: stack;
  for each q ∈ setstates(N) do
    eclo[q] := false;
  endfor
  eclo[p] := true;
  st := empty;
  trans := deltatable(N);
  st := push(st, p);
  while st ≠ emptystack do
    q := pop(st);
    for each s ∈ trans(q, ϵ) do
      if eclo[s] = false then
        eclo[s] := true;
        st := push(st, s)
      endif
    endfor
  endwhile;
  eclosure := eclo
end

This algorithm can be easily adapted to compute the set of states reachable from a given state p (in a DFA or an NFA).

Given a subset S of Q, we define ϵ-closure(S) as

ϵ-closure(S) = ⋃_{s ∈ S} ϵ-closure(s),

with ϵ-closure(∅) = ∅. When N has no ϵ-transitions, we have ϵ-closure(S) = S. We are now ready to define the extension δ*: Q × Σ* → 2^Q of the transition function δ: Q × (Σ ∪ {ϵ}) → 2^Q.
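The stack-based algorithm above carries over almost verbatim to Python; a sketch, assuming the ϵ-transitions are given as a dict from each state to the set of its ϵ-successors (my encoding, not the notes'):

```python
def eclosure(eps, p):
    # depth-first search along ϵ-edges only, exactly as in the
    # pseudocode: mark p, then pop states and push unmarked ϵ-successors
    closed, stack = {p}, [p]
    while stack:
        q = stack.pop()
        for s in eps.get(q, ()):
            if s not in closed:
                closed.add(s)
                stack.append(s)
    return closed

def eclosure_set(eps, S):
    # ϵ-closure(S) = union of ϵ-closure(s) for s in S; ϵ-closure(∅) = ∅
    out = set()
    for s in S:
        out |= eclosure(eps, s)
    return out
```

Each state is pushed at most once, so the search runs in time proportional to the number of ϵ-edges.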

3.5 Converting an NFA into a DFA

The intuition behind the definition of the extended transition function is that δ*(p, w) is the set of all states reachable from p by a path whose spelling is w.

Definition 3.7. Given an NFA N = (Q, Σ, δ, q0, F) (with ϵ-transitions), the extended transition function δ*: Q × Σ* → 2^Q is defined as follows: for every p ∈ Q, every u ∈ Σ*, and every a ∈ Σ,

δ*(p, ϵ) = ϵ-closure({p}),
δ*(p, ua) = ϵ-closure( ⋃_{s ∈ δ*(p,u)} δ(s, a) ).

In the second equation, if δ*(p, u) = ∅ then δ*(p, ua) = ∅.

The language L(N) accepted by an NFA N is the set

L(N) = {w ∈ Σ* | δ*(q0, w) ∩ F ≠ ∅}.

Observe that the definition of L(N) conforms to the lenient acceptance policy: a string w is accepted iff δ*(q0, w) contains some final state.

We can also extend δ*: Q × Σ* → 2^Q to a function δ̂: 2^Q × Σ* → 2^Q defined as follows: for every subset S of Q, for every w ∈ Σ*,

δ̂(S, w) = ⋃_{s ∈ S} δ*(s, w),

with δ̂(∅, w) = ∅.

Let 𝒬 be the subset of 2^Q consisting of those subsets S of Q that are ϵ-closed, i.e., such that

S = ϵ-closure(S).

If we consider the restriction ∆: 𝒬 × Σ → 𝒬 of δ̂: 2^Q × Σ* → 2^Q to 𝒬 and Σ, we observe that ∆ is the transition function of a DFA. Indeed, this is the transition function of a DFA accepting L(N). It is easy to show that ∆ is defined directly as follows (on subsets S in 𝒬):

∆(S, a) = ϵ-closure( ⋃_{s ∈ S} δ(s, a) ),

with ∆(∅, a) = ∅. Then, the DFA D is defined as follows:

D = (𝒬, Σ, ∆, ϵ-closure({q0}), ℱ),

where ℱ = {S ∈ 𝒬 | S ∩ F ≠ ∅}.

It is not difficult to show that L(D) = L(N), that is, D is a DFA accepting L(N). For this, we show that

∆*(S, w) = δ̂(S, w).

Thus, we have converted the NFA N into a DFA D (and gotten rid of ϵ-transitions).

Since DFA's are special NFA's, the subset construction shows that DFA's and NFA's accept the same family of languages, the regular languages, version 1 (although not with the same complexity).

The states of the DFA D equivalent to N are ϵ-closed subsets of Q. For this reason, the above construction is often called the subset construction. This construction is due to Rabin and Scott. Although theoretically fine, the method may construct useless sets S that are not reachable from the start state ϵ-closure({q_0}). A more economical construction is given next.

An algorithm to convert an NFA into a DFA: the subset construction.

Given an input NFA N = (Q, Σ, δ, q_0, F), a DFA D = (K, Σ, Δ, S_0, 𝓕) is constructed. It is assumed that K is a linear array of sets of states S ⊆ Q, and Δ is a 2-dimensional array, where Δ[i, a] is the index of the target state of the transition from K[i] = S on input a, with S ∈ K and a ∈ Σ.

S_0 := ϵ-closure({q_0}); total := 1; K[1] := S_0;
marked := 0;
while marked < total do
  marked := marked + 1; S := K[marked];
  for each a ∈ Σ do
    U := ⋃_{s ∈ S} δ(s, a); T := ϵ-closure(U);
    if T ∉ K then
      total := total + 1; K[total] := T
    endif;
    Δ[marked, a] := index(T)
  endfor
endwhile;
𝓕 := {S ∈ K | S ∩ F ≠ ∅}

Let us illustrate the subset construction on the NFA of Example 4, an NFA for the language L_3 = {a, b}*{abb}. Transition table δ_4:
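The economical subset construction above can be sketched in Python. This is a sketch under the same conventions as before: the NFA transition table is a dict from (state, symbol) to sets of states, '' standing for ϵ; only subsets reachable from the start state are generated.

```python
def eclosure(delta, S):
    """Epsilon-closure of a set of states S (stack-based DFS)."""
    eclo, st = set(S), list(S)
    while st:
        q = st.pop()
        for s in delta.get((q, ''), set()):
            if s not in eclo:
                eclo.add(s)
                st.append(s)
    return frozenset(eclo)

def subset_construction(sigma, delta, q0, F):
    """Return (K, Delta, S0, final) of a DFA equivalent to the NFA."""
    S0 = eclosure(delta, {q0})
    K = [S0]                          # linear array of DFA states (sets)
    Delta = {}                        # (state index, symbol) -> state index
    marked = 0
    while marked < len(K):            # K[marked] is the next unprocessed set
        S = K[marked]
        for a in sigma:
            U = set()
            for s in S:               # U = union of delta(s, a) over s in S
                U |= delta.get((s, a), set())
            T = eclosure(delta, U)
            if T not in K:            # a newly discovered DFA state
                K.append(T)
            Delta[(marked, a)] = K.index(T)
        marked += 1
    final = {i for i, S in enumerate(K) if S & F}
    return K, Delta, S0, final
```

Running it on the NFA for L_3 = {a, b}*{abb} produces exactly the four reachable states {0}, {0, 1}, {0, 2}, {0, 3} of the worked example below.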

Set of accepting states: F_4 = {3}.

        a        b
  0   {0, 1}   {0}
  1   ∅        {2}
  2   ∅        {3}
  3   ∅        ∅

Figure 3.7: NFA for {a, b}*{abb}

The pointer ⇒ corresponds to marked and the pointer → to total.

Initial transition table, just after entering the while loop:

      index  states   a  b
⇒ →   A      {0}

After the first round through the while loop:

      index  states   a  b
⇒     A      {0}      B  A
→     B      {0, 1}

After just reentering the while loop:

      index  states   a  b
      A      {0}      B  A
⇒ →   B      {0, 1}

After the second round through the while loop:

      index  states   a  b
      A      {0}      B  A
⇒     B      {0, 1}   B  C
→     C      {0, 2}

After the third round through the while loop:

      index  states   a  b
      A      {0}      B  A
      B      {0, 1}   B  C
⇒     C      {0, 2}   B  D
→     D      {0, 3}

After the fourth round through the while loop:

      index  states   a  b
      A      {0}      B  A
      B      {0, 1}   B  C
      C      {0, 2}   B  D
⇒ →   D      {0, 3}   B  A

This is the DFA of Figure 3.3, except that in that example A, B, C, D are renamed 0, 1, 2, 3.

Figure 3.8: DFA for {a, b}*{abb}

3.6 Finite State Automata With Output: Transducers

So far, we have only considered automata that recognize languages, i.e., automata that do not produce any output on any input (except "accept" or "reject"). It is interesting and useful to consider input/output finite state machines. Such automata are called transducers. They compute functions or relations. First, we define a deterministic kind of transducer.

Definition 3.8. A general sequential machine (gsm) is a sextuple M = (Q, Σ, Δ, δ, λ, q_0), where

(1) Q is a finite set of states,

(2) Σ is a finite input alphabet,

(3) Δ is a finite output alphabet,

(4) δ: Q × Σ → Q is the transition function,

(5) λ: Q × Σ → Δ* is the output function and

(6) q_0 is the initial (or start) state.

If λ(p, a) ≠ ϵ for all p ∈ Q and all a ∈ Σ, then M is nonerasing. If λ(p, a) ∈ Δ for all p ∈ Q and all a ∈ Σ, we say that M is a complete sequential machine (csm).

An example of a gsm for which Σ = {a, b} and Δ = {0, 1, 2} is shown in Figure 3.9; each edge is labeled with an input symbol followed by the corresponding output (for example, a/11), and an input string is converted to the output string obtained by concatenating the outputs along the path that it traces.

Figure 3.9: Example of a gsm

In order to define how a gsm works, we extend the transition and the output functions. We define δ*: Q × Σ* → Q and λ*: Q × Σ* → Δ* recursively as follows: for all p ∈ Q, all u ∈ Σ* and all a ∈ Σ,

δ*(p, ϵ) = p
δ*(p, ua) = δ(δ*(p, u), a)
λ*(p, ϵ) = ϵ
λ*(p, ua) = λ*(p, u)λ(δ*(p, u), a).

For any w ∈ Σ*, we let M(w) = λ*(q_0, w), and for any L ⊆ Σ* and L' ⊆ Δ*, let

M(L) = {λ*(q_0, w) | w ∈ L}

and

M^{-1}(L') = {w ∈ Σ* | λ*(q_0, w) ∈ L'}.

Note that if M is a csm, then |M(w)| = |w| for all w ∈ Σ*. Also, a homomorphism is a special kind of gsm: it can be realized by a gsm with one state.

We can use gsm's and csm's to compute certain kinds of functions.

Definition 3.9. A function f: Σ* → Δ* is a gsm (resp. csm) mapping iff there is a gsm (resp. csm) M so that M(w) = f(w), for all w ∈ Σ*.

Remark: Ginsburg and Rose (1966) characterized gsm mappings as follows: A function f: Σ* → Δ* is a gsm mapping iff

(a) f preserves prefixes, i.e., f(x) is a prefix of f(xy);

(b) there is an integer m such that for all w ∈ Σ* and all a ∈ Σ, we have |f(wa)| − |f(w)| ≤ m;

(c) f(ϵ) = ϵ;

(d) for every regular language R ⊆ Δ*, the language f^{-1}(R) = {w ∈ Σ* | f(w) ∈ R} is regular.

A function f: Σ* → Δ* is a csm mapping iff f satisfies (a) and (d), and for all w ∈ Σ*, |f(w)| = |w|.

The following proposition is left as a homework problem.

Proposition 3.1. The family of regular languages (over an alphabet Σ) is closed under both gsm and inverse gsm mappings.

We can generalize the gsm model so that

(1) the device is nondeterministic,

(2) the device has a set of accepting states,

(3) transitions are allowed to occur without new input being processed,

(4) transitions are defined for input strings instead of individual letters.

Here is the definition of such a model, the a-transducer. A much more powerful model of transducer will be investigated later: the Turing machine.
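The extended functions δ* and λ* of a gsm can be computed by a single left-to-right sweep over the input. The following sketch uses a hypothetical one-state gsm (not the machine of Figure 3.9) that maps a to 0 and b to 11, which also illustrates the remark that a homomorphism is a gsm with one state.

```python
def gsm_output(delta, lam, q0, w):
    """Compute M(w) = lambda*(q0, w) for a gsm given by delta and lam."""
    p, out = q0, ''
    for a in w:
        out += lam[(p, a)]    # lambda*(p, ua) = lambda*(p, u) lam(delta*(p, u), a)
        p = delta[(p, a)]     # delta*(p, ua) = delta(delta*(p, u), a)
    return out

# A hypothetical one-state gsm realizing a homomorphism h(a) = 0, h(b) = 11.
delta = {(0, 'a'): 0, (0, 'b'): 0}
lam = {(0, 'a'): '0', (0, 'b'): '11'}
```

Note that this machine is nonerasing but not a csm, since |λ(0, b)| = 2 ≠ 1.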

Definition 3.10. An a-transducer (or nondeterministic sequential transducer with accepting states) is a sextuple M = (K, Σ, Δ, λ, q_0, F), where

(1) K is a finite set of states,

(2) Σ is a finite input alphabet,

(3) Δ is a finite output alphabet,

(4) q_0 ∈ K is the start (or initial) state,

(5) F ⊆ K is the set of accepting (or final) states and

(6) λ ⊆ K × Σ* × Δ* × K is a finite set of quadruples called the transition function of M.

If λ ⊆ K × Σ+ × Δ* × K, then M is ϵ-free.

Clearly, a gsm is a special kind of a-transducer. An a-transducer defines a binary relation between Σ* and Δ*, or equivalently, a function M: Σ* → 2^{Δ*}. We can explain what this function is by describing how an a-transducer makes a sequence of moves from configurations to configurations.

The current configuration of an a-transducer is described by a triple

(p, u, v) ∈ K × Σ* × Δ*,

where p is the current state, u is the remaining input, and v is some output produced so far.

We define the binary relation ⊢_M on K × Σ* × Δ* as follows: for all p, q ∈ K, u, α ∈ Σ*, β, v ∈ Δ*, if (p, u, v, q) ∈ λ, then

(p, uα, β) ⊢_M (q, α, βv).

Let ⊢*_M be the transitive and reflexive closure of ⊢_M.

The function M: Σ* → 2^{Δ*} is defined such that for every w ∈ Σ*,

M(w) = {y ∈ Δ* | (q_0, w, ϵ) ⊢*_M (f, ϵ, y), f ∈ F}.

For any language L ⊆ Σ*, let

M(L) = ⋃_{w ∈ L} M(w).

For any y ∈ Δ*, let

M^{-1}(y) = {w ∈ Σ* | y ∈ M(w)}

and for any language L' ⊆ Δ*, let

M^{-1}(L') = ⋃_{y ∈ L'} M^{-1}(y).

Remark: Notice that if w ∈ M^{-1}(L'), then there exists some y ∈ L' such that w ∈ M^{-1}(y), i.e., y ∈ M(w). This does not imply that M(w) ⊆ L', only that M(w) ∩ L' ≠ ∅.

One should realize that for any L' ⊆ Δ* and any a-transducer M, there is some a-transducer M' (from Δ* to 2^{Σ*}) so that M'(L') = M^{-1}(L').

The following proposition is left as a homework problem:

Proposition 3.2. The family of regular languages (over an alphabet Σ) is closed under both a-transductions and inverse a-transductions.

3.7 An Application of NFA's: Text Search

A common problem in the age of the Web (and on-line text repositories) is the following: given a set of words, called the keywords, find all the documents that contain one (or all) of those words.

Search engines are a popular example of this process. Search engines use inverted indexes (for each word appearing on the Web, a list of all the places where that word occurs is stored). However, there are applications that are unsuited for inverted indexes, but are good for automaton-based techniques. Some text-processing programs, such as advanced forms of the UNIX grep command (such as egrep or fgrep), are based on automaton-based techniques. The characteristics that make an application suitable for searches that use automata are:

(1) The repository on which the search is conducted is rapidly changing.

(2) The documents to be searched cannot be catalogued. For example, Amazon.com creates pages on the fly in response to queries.

We can use an NFA to find occurrences of a set of keywords in a text. This NFA signals by entering a final state that it has seen one of the keywords. The form of such an NFA is special.

(1) There is a start state, q_0, with a transition to itself on every input symbol from the alphabet Σ.

(2) For each keyword w = w_1···w_k (with w_i ∈ Σ), there are k states q_1^(w), ..., q_k^(w), and there is a transition from q_0 to q_1^(w) on input w_1, a transition from q_1^(w) to q_2^(w) on input w_2, and so on, until a transition from q_{k−1}^(w) to q_k^(w) on input w_k. The state q_k^(w) is an accepting state and indicates that the keyword w = w_1···w_k has been found.

The NFA constructed above can then be converted to a DFA using the subset construction. Here is an example where Σ = {a, b} and a set of three keywords is used, as shown in Figure 3.10.

Figure 3.10: NFA for the three keywords.

Applying the subset construction to the NFA, we obtain the DFA whose transition table is shown in Figure 3.11; its states are subsets of NFA states such as {q_0}, {q_0, q_1}, etc., and its final states are those subsets containing an accepting state of the NFA.

Figure 3.11: DFA for the keywords.

The good news is that, due to the very special structure of the NFA, the number of states of the corresponding DFA is at most the number of states of the original NFA! We find that the states of the DFA are (check it yourself!):

(1) The set {q_0}, associated with the start state q_0 of the NFA.

(2) For any state p ≠ q_0 of the NFA reached from q_0 along a path corresponding to a string u = u_1···u_m, the set consisting of:

(a) q_0,

(b) p,

(c) the set of all states q of the NFA reachable from q_0 by following a path whose symbols form a nonempty suffix of u, i.e., a string of the form u_j u_{j+1}···u_m.

As a consequence, we get an efficient (w.r.t. time and space) method to recognize a set of keywords. In fact, this DFA recognizes leftmost occurrences of keywords in a text (we can stop as soon as we enter a final state).
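The keyword NFA can be simulated directly on a text, which amounts to running the subset-construction DFA built on the fly; the sketch below stops at the leftmost occurrence, as described above. The keywords used here are illustrative, not those of Figure 3.10.

```python
def find_keyword(text, keywords):
    """Return the position just after the leftmost keyword match, or -1."""
    # The active NFA states: q0 is always active (self-loop on every
    # symbol), so every keyword always has its empty prefix pending.
    active = {(w, 0) for w in keywords}        # (keyword, chars matched)
    for t, c in enumerate(text):
        nxt = {(w, 0) for w in keywords}       # q0's self-loop restarts prefixes
        for w, k in active:
            if w[k] == c:
                if k + 1 == len(w):            # reached accepting state q_k^(w)
                    return t + 1               # leftmost occurrence found
                nxt.add((w, k + 1))
        active = nxt
    return -1
```

For example, on the text "babba" with keywords "abb" and "bba", the earliest match is "abb", which ends at position 4.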


Chapter 4

Hidden Markov Models (HMMs)

4.1 Hidden Markov Models (HMMs)

There is a variant of the notion of DFA with output, for example a transducer such as a gsm (generalized sequential machine), which is widely used in machine learning. This machine model is known as a hidden Markov model, for short HMM. These notes are only an introduction to HMMs and are by no means complete. For more comprehensive presentations of HMMs, see the references at the end of this chapter.

There are three new twists compared to traditional gsm models:

(1) There is a finite set of states Q with n elements, a bijection σ: Q → {1, ..., n}, and the transitions between states are labeled with probabilities rather than symbols from an alphabet. For any two states p and q in Q, the edge from p to q is labeled with a probability A(i, j), with i = σ(p) and j = σ(q). The probabilities A(i, j) form an n × n matrix A = (A(i, j)).

(2) There is a finite set O of size m (called the observation space) of possible outputs that can be emitted, a bijection ω: O → {1, ..., m}, and for every state q ∈ Q, there is a probability B(i, j) that output O ∈ O is emitted (produced), with i = σ(q) and j = ω(O). The probabilities B(i, j) form an n × m matrix B = (B(i, j)).

(3) Sequences of outputs O = (O_1, ..., O_T) (with O_t ∈ O for t = 1, ..., T) emitted by the model are directly observable, but the sequences of states S = (q_1, ..., q_T) (with q_t ∈ Q for t = 1, ..., T) that caused some sequence of output to be emitted are not observable. In this sense the states are hidden, and this is the reason for calling this model a hidden Markov model.

Remark: We could define a state transition probability function A: Q × Q → [0, 1] by A(p, q) = A(σ(p), σ(q)), and a state observation probability function B: Q × O → [0, 1] by B(p, O) = B(σ(p), ω(O)). The function A conveys exactly the same amount of information

as the matrix A, and the function B conveys exactly the same amount of information as the matrix B. The only difference is that the arguments of the functions are states (and outputs) rather than integers, so in that sense the functions are perhaps more natural; we can think of them as implementations of the matrices A and B. Most of the literature is rather sloppy about this. We will use matrices.

Before going any further, we wish to address a notational issue that everyone who writes about state-processes faces. This issue is a bit of a headache which needs to be resolved to avoid a lot of confusion.

The issue is how to denote the states, the outputs, as well as (ordered) sequences of states and sequences of outputs. In most problems, states and outputs have meaningful names. For example, if we wish to describe the evolution of the temperature from day to day, it makes sense to use two states "Cold" and "Hot", and to describe whether a given individual has a drink by "D", and no drink by "N". Thus our set of states is Q = {Cold, Hot}, and our set of outputs is O = {N, D}.

However, when computing probabilities, we need to use matrices whose rows and columns are indexed by positive integers, so we need a mechanism to associate a numerical index to every state and to every output, and this is the purpose of the bijections σ: Q → {1, ..., n} and ω: O → {1, ..., m}. In our example, we define σ by σ(Cold) = 1 and σ(Hot) = 2, and ω by ω(N) = 1 and ω(D) = 2.

Some authors circumvent (or do they?) this notational issue by assuming that the set of outputs is O = {1, 2, ..., m}, and that the set of states is Q = {1, 2, ..., n}. The disadvantage of doing this is that in real situations, it is often more convenient to name the outputs and the states with more meaningful names than 1, 2, 3, etc. With respect to this, Mitch Marcus pointed out to me that the task of naming the elements of the output alphabet can be challenging, for example in speech recognition.

Let us now turn to sequences.
For example, consider the sequence of six states (from the set Q = {Cold, Hot}),

S = (Cold, Cold, Hot, Cold, Hot, Hot).

Using the bijection σ: {Cold, Hot} → {1, 2} defined above, the sequence S is completely determined by the sequence of indices

σ(S) = (σ(Cold), σ(Cold), σ(Hot), σ(Cold), σ(Hot), σ(Hot)) = (1, 1, 2, 1, 2, 2).

More generally, we will denote a sequence of length T ≥ 1 of states from a set Q of size n by

S = (q_1, q_2, ..., q_T),

with q_t ∈ Q for t = 1, ..., T. Using the bijection σ: Q → {1, ..., n}, the sequence S is completely determined by the sequence of indices

σ(S) = (σ(q_1), σ(q_2), ..., σ(q_T)),

where σ(q_t) is some index from the set {1, ..., n}, for t = 1, ..., T. The problem now is, what is a better notation for the index denoted by σ(q_t)? Of course, we could use σ(q_t), but this is a heavy notation, so we adopt the notational convention to denote the index σ(q_t) by i_t.¹

Going back to our example we have

S = (q_1, q_2, q_3, q_4, q_5, q_6) = (Cold, Cold, Hot, Cold, Hot, Hot),
σ(S) = (σ(q_1), σ(q_2), σ(q_3), σ(q_4), σ(q_5), σ(q_6)) = (1, 1, 2, 1, 2, 2),

so the sequence of indices (i_1, i_2, i_3, i_4, i_5, i_6) = (σ(q_1), σ(q_2), σ(q_3), σ(q_4), σ(q_5), σ(q_6)) is given by

σ(S) = (i_1, i_2, i_3, i_4, i_5, i_6) = (1, 1, 2, 1, 2, 2).

So, the fourth index i_4 has the value 1.

We apply a similar convention to sequences of outputs. For example, consider the sequence of six outputs (from the set O = {N, D}),

O = (N, D, N, N, N, D).

Using the bijection ω: {N, D} → {1, 2} defined above, the sequence O is completely determined by the sequence of indices

ω(O) = (ω(N), ω(D), ω(N), ω(N), ω(N), ω(D)) = (1, 2, 1, 1, 1, 2).

More generally, we will denote a sequence of length T ≥ 1 of outputs from a set O of size m by

O = (O_1, O_2, ..., O_T),

with O_t ∈ O for t = 1, ..., T. Using the bijection ω: O → {1, ..., m}, the sequence O is completely determined by the sequence of indices

ω(O) = (ω(O_1), ω(O_2), ..., ω(O_T)),

where ω(O_t) is some index from the set {1, ..., m}, for t = 1, ..., T. This time, we adopt the notational convention to denote the index ω(O_t) by ω_t.

Going back to our example

O = (O_1, O_2, O_3, O_4, O_5, O_6) = (N, D, N, N, N, D),

¹ We contemplated using the notation σ_t for σ(q_t) instead of i_t. However, we feel that this would deviate too much from the common practice found in the literature, which uses the notation i_t. This is not to say that the literature is free of horribly confusing notation!

we have

ω(O) = (ω(O_1), ω(O_2), ω(O_3), ω(O_4), ω(O_5), ω(O_6)) = (1, 2, 1, 1, 1, 2),

so the sequence of indices (ω_1, ω_2, ω_3, ω_4, ω_5, ω_6) = (ω(O_1), ω(O_2), ω(O_3), ω(O_4), ω(O_5), ω(O_6)) is given by

ω(O) = (ω_1, ω_2, ω_3, ω_4, ω_5, ω_6) = (1, 2, 1, 1, 1, 2).

Remark: What is very confusing is this: to assume that our state set is Q = {q_1, ..., q_n}, and to denote a sequence of states of length T as S = (q_1, q_2, ..., q_T). The symbol q_1 in the sequence S may actually refer to q_3 in Q, etc.

We feel that the explicit introduction of the bijections σ: Q → {1, ..., n} and ω: O → {1, ..., m}, although not standard in the literature, yields a mathematically clean way to deal with sequences which is not too cumbersome, although this latter point is a matter of taste.

HMM's are among the most effective tools to solve the following types of problems:

(1) DNA and protein sequence alignment in the face of mutations and other kinds of evolutionary change.

(2) Speech understanding, also called automatic speech recognition. When we talk, our mouths produce sequences of sounds from the sentences that we want to say. This process is complex. Multiple words may map to the same sound, words are pronounced differently as a function of the word before and after them, we all form sounds slightly differently, and so on. All a listener can hear (perhaps a computer system) is the sequence of sounds, and the listener would like to reconstruct the mapping (backward) in order to determine what words we were attempting to say. For example, when you talk to your TV to pick a program, say "game of thrones", you don't want to get "Jessica Jones".

(3) Optical character recognition (OCR). When we write, our hands map from an idealized symbol to some set of marks on a page (or screen). The marks are observable, but the process that generates them isn't. A system performing OCR, such as a system used by the post office to read addresses, must discover which word is most likely to correspond to the mark it reads.

Here is an example illustrating the notion of HMM.

Example 4.1.
Say we consider the following behavior of some professor at some university. On a hot day (denoted by "Hot"), the professor comes to class with a drink (denoted D) with probability 0.7, and with no drink (denoted N) with probability 0.3. On the other hand, on

a cold day (denoted by "Cold"), the professor comes to class with a drink with probability 0.2, and with no drink with probability 0.8.

Suppose a student intrigued by this behavior recorded a sequence showing whether the professor came to class with a drink or not, say NNND. Several months later, the student would like to know whether the weather was hot or cold the days he recorded the drinking behavior of the professor.

Now the student heard about machine learning, so he constructs a probabilistic (hidden Markov) model of the weather. Based on some experiments, he determines the probability of going from a hot day to another hot day to be 0.75, the probability of going from a hot day to a cold day to be 0.25, the probability of going from a cold day to another cold day to be 0.7, and the probability of going from a cold day to a hot day to be 0.3. He also knows that when he started his observations, it was a cold day with probability 0.45, and a hot day with probability 0.55.

In this example, the set of states is Q = {Cold, Hot}, and the set of outputs is O = {N, D}. We have the bijection σ: {Cold, Hot} → {1, 2} given by σ(Cold) = 1 and σ(Hot) = 2, and the bijection ω: {N, D} → {1, 2} given by ω(N) = 1 and ω(D) = 2.

The above data determine an HMM depicted in Figure 4.1.

Figure 4.1: Example of an HMM modeling the drinking behavior of a professor at the University of Pennsylvania.

The portion of the state diagram involving the states Cold, Hot, is analogous to an NFA in which the transition labels are probabilities; it is the underlying Markov model of the HMM. For any given state, the probabilities on the outgoing edges sum to 1. The start state is a convenient way to express the probabilities of starting either in state Cold or in state

Hot. Also, from each of the states Cold and Hot, we have emission probabilities of producing the output N or D, and these probabilities also sum to 1.

We can also express these data using matrices. The matrix

A = ( 0.7   0.3
      0.25  0.75 )

describes the transitions of the Markov model, the vector

π = ( 0.45
      0.55 )

describes the probabilities of starting either in state Cold or in state Hot, and the matrix

B = ( 0.8  0.2
      0.3  0.7 )

describes the emission probabilities. Observe that the rows of the matrices A and B sum to 1. Such matrices are called row-stochastic matrices. The entries in the vector π also sum to 1.

The student would like to solve what is known as the decoding problem: given the output sequence NNND, find the most likely state sequence of the Markov model that produces the output sequence NNND. Is it (Cold, Cold, Cold, Cold), or (Hot, Hot, Hot, Hot), or (Hot, Cold, Cold, Hot), or (Cold, Cold, Cold, Hot)? Given the probabilities of the HMM, it seems unlikely that it is (Hot, Hot, Hot, Hot), but how can we find the most likely one?

Let us consider another example taken from Stamp [20].

Example 4.2. Suppose we want to determine the average annual temperature at a particular location over a series of years in a distant past where thermometers did not exist. Since we can't go back in time, we look for indirect evidence of the temperature, say in terms of the size of tree growth rings. For simplicity, assume that we consider the two temperatures Cold and Hot, and three different sizes of tree rings: small, medium and large, which we denote by S, M, L.

In this example, the set of states is Q = {Cold, Hot}, and the set of outputs is O = {S, M, L}. We have the bijection σ: {Cold, Hot} → {1, 2} given by σ(Cold) = 1 and σ(Hot) = 2, and the bijection ω: {S, M, L} → {1, 2, 3} given by ω(S) = 1, ω(M) = 2, and ω(L) = 3.

The HMM shown in Figure 4.2 is a model of the situation. Suppose we observe the sequence of tree growth rings (S, M, S, L). What is the most likely sequence of temperatures over a four-year period which yields the observations (S, M, S, L)?

Figure 4.2: Example of an HMM modeling the temperature in terms of tree growth rings.

Going back to Example 4.1, we need to figure out the probability that a sequence of states S = (q_1, q_2, ..., q_T) produces the output sequence O = (O_1, O_2, ..., O_T). Then the probability that we want is just the product of the probability that we begin with state q_1, times the product of the probabilities of each of the transitions, times the product of the emission probabilities. With our notational conventions, σ(q_t) = i_t and ω(O_t) = ω_t, so we have

Pr(S, O) = π(i_1)B(i_1, ω_1) ∏_{t=2}^{T} A(i_{t−1}, i_t)B(i_t, ω_t).

In our example, ω(O) = (ω_1, ω_2, ω_3, ω_4) = (1, 1, 1, 2), which corresponds to NNND. The brute-force method is to compute these probabilities for all 2^4 = 16 sequences of states of length 4 (in general, there are n^T sequences of length T). For example, for the sequence S = (Cold, Cold, Cold, Hot), associated with the sequence of indices σ(S) = (i_1, i_2, i_3, i_4) = (1, 1, 1, 2), we find that

Pr(S, NNND) = π(1)B(1, 1)A(1, 1)B(1, 1)A(1, 1)B(1, 1)A(1, 2)B(2, 2)
            = 0.45 · 0.8 · 0.7 · 0.8 · 0.7 · 0.8 · 0.3 · 0.7 ≈ 0.0237.

A much more efficient way to proceed is to use a method based on dynamic programming. Recall the bijection σ: {Cold, Hot} → {1, 2}, so that we will refer to the state Cold as 1, and to the state Hot as 2. For t = 1, 2, 3, 4, for every state i = 1, 2, we compute score(i, t) to be the highest probability that a sequence of length t ending in state i produces the output sequence (O_1, ..., O_t), and for t ≥ 2, we let pred(i, t) be the state that precedes state i in a best sequence of length t ending in i.
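The brute-force method just described can be sketched as follows, using the matrices of Example 4.1. Indices are 0-based here (Cold = 0, Hot = 1, N = 0, D = 1), unlike the 1-based indices in the text.

```python
from itertools import product

A = [[0.70, 0.30], [0.25, 0.75]]   # transition probabilities
B = [[0.80, 0.20], [0.30, 0.70]]   # emission probabilities
pi = [0.45, 0.55]                  # initial distribution

def joint_prob(S, O):
    """Pr(S, O) = pi(i1) B(i1, w1) * prod_{t>=2} A(i_{t-1}, i_t) B(i_t, w_t)."""
    p = pi[S[0]] * B[S[0]][O[0]]
    for t in range(1, len(S)):
        p *= A[S[t-1]][S[t]] * B[S[t]][O[t]]
    return p

def brute_force_decode(O, n=2):
    """Maximize Pr(S, O) over all n^T state sequences S."""
    return max(product(range(n), repeat=len(O)),
               key=lambda S: joint_prob(S, O))
```

On NNND = [0, 0, 0, 1] this enumerates all 16 state sequences and returns (Cold, Cold, Cold, Hot), in agreement with the Viterbi computation worked out next.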

Recall that in our example, ω(O) = (ω_1, ω_2, ω_3, ω_4) = (1, 1, 1, 2), which corresponds to NNND.

Initially, we set

score(j, 1) = π(j)B(j, ω_1), j = 1, 2,

and since ω_1 = 1 we get score(1, 1) = 0.45 · 0.8 = 0.36, which is the probability of starting in state Cold and emitting N, and score(2, 1) = 0.55 · 0.3 = 0.165, which is the probability of starting in state Hot and emitting N.

Next we compute score(1, 2) and score(2, 2) as follows. For j = 1, 2, for i = 1, 2, compute temporary scores

tscore(i, j) = score(i, 1)A(i, j)B(j, ω_2);

then pick the best of the temporary scores,

score(j, 2) = max_i tscore(i, j).

Since ω_2 = 1, we get tscore(1, 1) = 0.36 · 0.7 · 0.8 = 0.2016, tscore(2, 1) = 0.165 · 0.25 · 0.8 = 0.033, and tscore(1, 2) = 0.36 · 0.3 · 0.3 = 0.0324, tscore(2, 2) = 0.165 · 0.75 · 0.3 = 0.0371. Then

score(1, 2) = max{tscore(1, 1), tscore(2, 1)} = max{0.2016, 0.033} = 0.2016,

which is the largest probability that a sequence of two states emitting the output (N, N) ends in state Cold, and

score(2, 2) = max{tscore(1, 2), tscore(2, 2)} = max{0.0324, 0.0371} = 0.0371,

which is the largest probability that a sequence of two states emitting the output (N, N) ends in state Hot. Since the state that leads to the optimal score score(1, 2) is 1, we let pred(1, 2) = 1, and since the state that leads to the optimal score score(2, 2) is 2, we let pred(2, 2) = 2.

We compute score(1, 3) and score(2, 3) in a similar way. For j = 1, 2, for i = 1, 2, compute

tscore(i, j) = score(i, 2)A(i, j)B(j, ω_3);

then pick the best of the temporary scores,

score(j, 3) = max_i tscore(i, j).

Since ω_3 = 1, we get tscore(1, 1) = 0.2016 · 0.7 · 0.8 = 0.1129, tscore(2, 1) = 0.0371 · 0.25 · 0.8 = 0.0074, and tscore(1, 2) = 0.2016 · 0.3 · 0.3 = 0.0181, tscore(2, 2) = 0.0371 · 0.75 · 0.3 = 0.0083. Then

score(1, 3) = max{tscore(1, 1), tscore(2, 1)} = max{0.1129, 0.0074} = 0.1129,

which is the largest probability that a sequence of three states emitting the output (N, N, N) ends in state Cold, and

score(2, 3) = max{tscore(1, 2), tscore(2, 2)} = max{0.0181, 0.0083} = 0.0181,

which is the largest probability that a sequence of three states emitting the output (N, N, N) ends in state Hot. We also get pred(1, 3) = 1 and pred(2, 3) = 1.

Finally, we compute score(1, 4) and score(2, 4) in a similar way. For j = 1, 2, for i = 1, 2, compute

tscore(i, j) = score(i, 3)A(i, j)B(j, ω_4);

then pick the best of the temporary scores,

score(j, 4) = max_i tscore(i, j).

Since ω_4 = 2, we get tscore(1, 1) = 0.1129 · 0.7 · 0.2 = 0.0158, tscore(2, 1) = 0.0181 · 0.25 · 0.2 = 0.0009, and tscore(1, 2) = 0.1129 · 0.3 · 0.7 = 0.0237, tscore(2, 2) = 0.0181 · 0.75 · 0.7 = 0.0095. Then

score(1, 4) = max{tscore(1, 1), tscore(2, 1)} = max{0.0158, 0.0009} = 0.0158,

which is the largest probability that a sequence of four states emitting the output (N, N, N, D) ends in state Cold, and

score(2, 4) = max{tscore(1, 2), tscore(2, 2)} = max{0.0237, 0.0095} = 0.0237,

which is the largest probability that a sequence of four states emitting the output (N, N, N, D) ends in state Hot, and pred(1, 4) = 1 and pred(2, 4) = 1.

Since max{score(1, 4), score(2, 4)} = max{0.0158, 0.0237} = 0.0237, the state with the maximum score is Hot, and by following the predecessor list (also called a backpointer list), we find that the most likely state sequence to produce the output sequence NNND is (Cold, Cold, Cold, Hot).

The stages of the computations of score(j, t) for j = 1, 2 and t = 1, 2, 3, 4 can be recorded in the following diagram called a lattice, or a trellis (which means lattice in French!), whose top row corresponds to Cold and whose bottom row corresponds to Hot.

Double arrows represent the predecessor edges. For example, the predecessor pred(2, 3) of the third node on the bottom row, labeled with the score 0.0181 (which corresponds to

Hot), is the second node on the first row, labeled with the score 0.2016 (which corresponds to Cold). The two incoming arrows to the third node on the bottom row are labeled with the temporary scores 0.0181 and 0.0083. The node with the highest score at time t = 4 is Hot, with score 0.0237 (shown in bold), and by following the double arrows backward from this node, we obtain the most likely state sequence (Cold, Cold, Cold, Hot).

The method we just described is known as the Viterbi algorithm.

We now define HMM's in general, and then present the Viterbi algorithm.

Definition 4.1. A hidden Markov model, for short HMM, is a quintuple M = (Q, O, π, A, B) where

• Q is a finite set of states with n elements, and there is a bijection σ: Q → {1, ..., n}.

• O is a finite output alphabet (also called a set of possible observations) with m observations, and there is a bijection ω: O → {1, ..., m}.

• A = (A(i, j)) is an n × n matrix called the state transition probability matrix, with

A(i, j) ≥ 0, 1 ≤ i, j ≤ n, and ∑_{j=1}^{n} A(i, j) = 1, i = 1, ..., n.

• B = (B(i, j)) is an n × m matrix called the state observation probability matrix (also called confusion matrix), with

B(i, j) ≥ 0, 1 ≤ i ≤ n, 1 ≤ j ≤ m, and ∑_{j=1}^{m} B(i, j) = 1, i = 1, ..., n.

A matrix satisfying the above conditions is said to be row stochastic. Both A and B are row-stochastic.

We also need to state the conditions that make M a Markov model. To do this rigorously requires the notion of random variable and is a bit tricky (see the remark below), so we will cheat as follows:

(a) Given any sequence of states (q_0, ..., q_{t−2}, p, q), the conditional probability that q is the t-th state given that the previous states were q_0, ..., q_{t−2}, p is equal to the conditional probability that q is the t-th state given that the previous state at time t − 1 is p:

Pr(q | q_0, ..., q_{t−2}, p) = Pr(q | p).

This is the Markov property. Informally, the next state q of the process at time t is independent of the past states q_0, ..., q_{t−2}, provided that the present state p at time t − 1 is known.

(b) Given any sequence of states (q_0, ..., q_i, ..., q_t), and given any sequence of outputs (O_0, ..., O_i, ..., O_t), the conditional probability that the output O_i is emitted depends only on the state q_i, and not on any other states or any other observations:

Pr(O_i | q_0, ..., q_i, ..., q_t, O_0, ..., O_i, ..., O_t) = Pr(O_i | q_i).

This is the output independence condition. Informally, the output function is near-sighted.

Examples of HMMs are shown in Figure 4.1, Figure 4.2, and Figure 4.3. Note that an output is emitted when visiting a state, not when making a transition, as in the case of a gsm. So the analogy with the gsm model is only partial; it is meant as a motivation for HMMs.

The hidden Markov model was developed by L. E. Baum and colleagues at the Institute for Defense Analyses at Princeton (including Petrie, Eagon, Sell, Soules, and Weiss) starting in 1966.

If we ignore the output components O and B, then we have what is called a Markov chain. A good interpretation of a Markov chain is the evolution over (discrete) time of the populations of n species that may change from one species to another. The probability A(i, j) is the fraction of the population of the i-th species that changes to the j-th species. If we denote the populations at time t by the row vector x = (x_1, ..., x_n), and the populations at time t + 1 by y = (y_1, ..., y_n), then

y_j = A(1, j)x_1 + ··· + A(i, j)x_i + ··· + A(n, j)x_n, 1 ≤ j ≤ n,

in matrix form, y = xA. The condition ∑_{j=1}^{n} A(i, j) = 1 expresses that the total population is preserved, namely

y_1 + ··· + y_n = x_1 + ··· + x_n.

Remark: This remark is intended for the reader who knows some probability theory, and it can be skipped without any negative effect on understanding the rest of this chapter. Given a probability space (Ω, F, µ) and any countable set Q (for simplicity we may assume Q is finite), a stochastic discrete-parameter process with state space Q is a countable family (X_t)_{t∈N} of random variables X_t: Ω → Q. We can think of t as time, and for any q ∈ Q, of Pr(X_t = q) as the probability that the process X is in state q at time t.
If

Pr(X_t = q | X_0 = q_0, ..., X_{t−2} = q_{t−2}, X_{t−1} = p) = Pr(X_t = q | X_{t−1} = p)

for all q_0, ..., q_{t−2}, p, q ∈ Q and for all t ≥ 1, and if the probability on the right-hand side is independent of t, then we say that X = (X_t)_{t∈N} is a time-homogeneous Markov chain, for short, Markov chain. Informally, the next state X_t of the process is independent of the past states X_0, ..., X_{t−2}, provided that the present state X_{t−1} is known.

Since for simplicity Q is assumed to be finite, there is a bijection σ: Q → {1, ..., n}, and then the process X is completely determined by the probabilities

a_{ij} = Pr(X_t = q | X_{t−1} = p), i = σ(p), j = σ(q), p, q ∈ Q,

and if Q is a finite state space of size n, these form an n × n matrix A = (a_{ij}) called the Markov matrix of the process X. It is a row-stochastic matrix.

The beauty of Markov chains is that if we write

π(i) = Pr(X_0 = i)

for the initial probability distribution, then the joint probability distribution of X_0, X_1, ..., X_t is given by

Pr(X_0 = i_0, X_1 = i_1, ..., X_t = i_t) = π(i_0)A(i_0, i_1) ··· A(i_{t−1}, i_t).

The above expression only involves π and the matrix A, and makes no mention of the original measure space. Therefore, it doesn't matter what the probability space is!

Conversely, given an n × n row-stochastic matrix A, let Ω be the set of all countable sequences ω = (ω_0, ω_1, ..., ω_t, ...) with ω_t ∈ Q = {1, ..., n} for all t ∈ N, and let X_t: Ω → Q be the projection on the t-th component, namely X_t(ω) = ω_t.² Then it is possible to define a σ-algebra (also called a σ-field) B and a measure µ on B such that (Ω, B, µ) is a probability space, and X = (X_t)_{t∈N} is a Markov chain with corresponding Markov matrix A.

To define B, proceed as follows. For every t ∈ N, let F_t be the family of all unions of subsets of Ω of the form

{ω ∈ Ω | (X_0(ω) ∈ S_0) ∧ (X_1(ω) ∈ S_1) ∧ ··· ∧ (X_t(ω) ∈ S_t)},

where S_0, S_1, ..., S_t are subsets of the state space Q = {1, ..., n}. It is not hard to show that each F_t is a σ-algebra. Then let

F = ⋃_{t≥0} F_t.

Each set in F is a set of paths for which a finite number of outcomes are restricted to lie in certain subsets of Q = {1, ..., n}. All other outcomes are unrestricted. In fact, every subset C in F is a countable union C = ⋃_{i∈N} B_i^(t) of sets of the form

B_i^(t) = {ω ∈ Ω | ω = (q_0, q_1, ..., q_t, s_{t+1}, ..., s_j, ...), q_0, q_1, ..., q_t ∈ Q}
       = {ω ∈ Ω | X_0(ω) = q_0, X_1(ω) = q_1, ..., X_t(ω) = q_t}.

² It is customary in probability theory to denote events by the letter ω. In the present case, ω denotes a countable sequence of elements from Q. This notation has nothing to do with the bijection ω: O → {1, ..., m} occurring in Definition 4.1.

The sequences in B_i^{(t)} are those beginning with the fixed sequence (q_0, q_1, ..., q_t). One can show that F is a field of sets, but not necessarily a σ-algebra, so we form the smallest σ-algebra G containing F. Using the matrix A we can define the measure ν(B_i^{(t)}) as the product of the probabilities along the sequence (q_0, q_1, ..., q_t). Then it can be shown that ν can be extended to a measure µ on G, and we let B be the σ-algebra obtained by adding to G all subsets of sets of measure zero.

The resulting probability space (Ω, B, µ) is usually called the sequence space, and the measure µ is called the tree measure. Then it is easy to show that the family of random variables X_t: Ω → Q on the probability space (Ω, B, µ) is a time-homogeneous Markov chain whose Markov matrix is the original matrix A. The above construction is presented in full detail in Kemeny, Snell, and Knapp [11] (Chapter 2, Sections 1 and 2).

Most presentations of Markov chains do not even mention the probability space over which the random variables X_t are defined. This makes the whole thing quite mysterious, since the probabilities Pr(X_t = q) are by definition given by

Pr(X_t = q) = µ({ω ∈ Ω | X_t(ω) = q}),

which requires knowing the measure µ. This is even more problematic if we start with a stochastic matrix. What are the random variables X_t, what are they defined on? The above construction puts things on firm ground.

There are three types of problems that can be solved using HMMs:

(1) The decoding problem: Given an HMM M = (Q, O, π, A, B), for any observed output sequence O = (O_1, O_2, ..., O_T) of length T ≥ 1, find a most likely sequence of states S = (q_1, q_2, ..., q_T) that produces the output sequence O. More precisely, with our notational convention that σ(q_t) = i_t and ω(O_t) = ω_t, this means finding a sequence S such that the probability

Pr(S, O) = π(i_1)B(i_1, ω_1) ∏_{t=2}^{T} A(i_{t-1}, i_t)B(i_t, ω_t)

is maximal. This problem is solved effectively by the Viterbi algorithm that we outlined before.
(2) The evaluation problem, also called the likelihood problem: Given a finite collection {M_1, ..., M_L} of HMMs with the same output alphabet O, for any output sequence O = (O_1, O_2, ..., O_T) of length T ≥ 1, find which model M_ℓ is most likely to have generated O. More precisely, given any model M_k, we compute the probability tprob_k that M_k could have produced O along any path. Then we pick an HMM M_ℓ for which tprob_ℓ is maximal. We will return to this point after having described the Viterbi algorithm. A variation of the Viterbi algorithm called the forward algorithm effectively solves the evaluation problem.

(3) The training problem, also called the learning problem: Given a set {O^1, ..., O^r} of output sequences on the same output alphabet O, usually called a set of training data, and given Q, find the "best" π, A, and B for an HMM M that produces all the sequences in the training set, in the sense that the HMM M = (Q, O, π, A, B) is the most likely to have produced the sequences in the training set. The technique used here is called expectation maximization, or EM. It is an iterative method that starts with an initial triple π, A, B, and tries to improve it. There is such an algorithm known as the Baum-Welch or forward-backward algorithm, but it is beyond the scope of this introduction.

Let us now describe the Viterbi algorithm in more detail.

4.2 The Viterbi Algorithm and the Forward Algorithm

Given an HMM M = (Q, O, π, A, B), for any observed output sequence O = (O_1, O_2, ..., O_T) of length T ≥ 1, we want to find a most likely sequence of states S = (q_1, q_2, ..., q_T) that produces the output sequence O.

Using the bijections σ: Q → {1, ..., n} and ω: O → {1, ..., m}, we can work with sequences of indices, and recall that we denote the index σ(q_t) associated with the t-th state q_t in the sequence S by i_t, and the index ω(O_t) associated with the t-th output O_t in the sequence O by ω_t. Then we need to find a sequence S such that the probability

Pr(S, O) = π(i_1)B(i_1, ω_1) ∏_{t=2}^{T} A(i_{t-1}, i_t)B(i_t, ω_t)

is maximal. In general, there are n^T sequences of length T. This problem can be solved efficiently by a method based on dynamic programming.

For any t, 1 ≤ t ≤ T, for any state q ∈ Q, if σ(q) = j, then we compute score(j, t), which is the largest probability that a sequence (q_1, ..., q_{t-1}, q) of length t ending with q has produced the output sequence (O_1, ..., O_{t-1}, O_t).

The point is that if we know score(k, t − 1) for k = 1, ..., n (with t ≥ 2), then we can find score(j, t) for j = 1, ..., n, because if we write k = σ(q_{t-1}) and j = σ(q) (recall that ω_t = ω(O_t)), then the probability associated with the path (q_1, ..., q_{t-1}, q) is

tscore(k, j) = score(k, t − 1)A(k, j)B(j, ω_t).

[Illustration: the path q_1, ..., q_{t-1}, q over outputs O_1, ..., O_{t-1}, O_t; under σ the states have indices i_1, ..., k, j and under ω the outputs have indices ω_1, ..., ω_{t-1}, ω_t; score(k, t−1) is accumulated up to q_{t-1}, followed by the transition A(k, j) and the emission B(j, ω_t).]

So to maximize this probability, we just have to find the maximum of the probabilities tscore(k, j) over all k, that is, we must have

score(j, t) = max_k tscore(k, j).

[Illustration: each of the states σ^{-1}(1), ..., σ^{-1}(k), ..., σ^{-1}(n) contributes a candidate tscore(1, j), ..., tscore(k, j), ..., tscore(n, j) toward the state q = σ^{-1}(j).]

To get started, we set score(j, 1) = π(j)B(j, ω_1) for j = 1, ..., n.

The algorithm goes through a forward phase for t = 1, ..., T, during which it computes the probabilities score(j, t) for j = 1, ..., n. When t = T, we pick an index j such that score(j, T) is maximal. The machine learning community is fond of the notation

j = argmax_k score(k, T)

to express the above fact. Typically, the smallest index j corresponding to the maximum element in the list of probabilities (score(1, T), score(2, T), ..., score(n, T)) is returned. This gives us the last state q_T = σ^{-1}(j) in an optimal sequence that yields the output sequence O.

The algorithm then goes through a path retrieval phase. To do this, when we compute

score(j, t) = max_k tscore(k, j),

we also record the index k = σ(q_{t-1}) of the state q_{t-1} in the best sequence (q_1, ..., q_{t-1}, q_t) for which tscore(k, j) is maximal (with j = σ(q_t)), as pred(j, t) = k. The index k is often called the backpointer of j at time t. This index may not be unique; we just pick one of them. Again, this can be expressed by

pred(j, t) = argmax_k tscore(k, j).

Typically, the smallest index k corresponding to the maximum element in the list of probabilities (tscore(1, j), tscore(2, j), ..., tscore(n, j)) is returned. The predecessors pred(j, t) are only defined for t = 2, ..., T, but we can let pred(j, 1) = 0.

Observe that the path retrieval phase of the Viterbi algorithm is very similar to the phase of Dijkstra's algorithm for finding a shortest path that follows the prev array. One should not confuse this phase with what is called the backward algorithm, which is used in solving the learning problem. The forward phase of the Viterbi algorithm is quite different from Dijkstra's algorithm, and the Viterbi algorithm is actually simpler (it computes score(j, t) for all states and for t = 1, ..., T, whereas Dijkstra's algorithm maintains a list of unvisited vertices, and needs to pick the next vertex). The major difference is that the Viterbi algorithm maximizes a product of weights along a path, but Dijkstra's algorithm minimizes a sum of weights along a path. Also, the Viterbi algorithm knows the length of the path (T) ahead of time, but Dijkstra's algorithm does not.

The Viterbi algorithm, invented by Andrew Viterbi in 1967, is shown below. The input to the algorithm is M = (Q, O, π, A, B) and the sequence of indices ω(O) = (ω_1, ..., ω_T) associated with the observed sequence O = (O_1, O_2, ..., O_T) of length T ≥ 1, with ω_t = ω(O_t) for t = 1, ..., T. The output is a sequence of states (q_1, ..., q_T). This sequence is determined by the sequence of indices (I_1, ..., I_T); namely, q_t = σ^{-1}(I_t).
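Before stating the pseudocode, the two phases just described can be transcribed into executable form. The following is a sketch (not the notes' own code): indices are 0-based instead of 1-based, and the toy parameters π, A, B at the end are made up for illustration.

```python
def viterbi(pi, A, B, omega):
    """Return (highest probability, best state sequence) for the
    observation index sequence omega: forward phase computing the
    scores and backpointers, then the path retrieval phase."""
    n, T = len(pi), len(omega)
    score = [[0.0] * T for _ in range(n)]
    pred = [[0] * T for _ in range(n)]
    for j in range(n):                        # initialization (t = 1)
        score[j][0] = pi[j] * B[j][omega[0]]
    for t in range(1, T):                     # forward phase
        for j in range(n):
            tscore = [score[k][t - 1] * A[k][j] * B[j][omega[t]]
                      for k in range(n)]
            best_k = tscore.index(max(tscore))    # smallest argmax
            score[j][t] = tscore[best_k]
            pred[j][t] = best_k
    # path retrieval phase, following the backpointers
    j = max(range(n), key=lambda i: score[i][T - 1])
    path = [j]
    for t in range(T - 1, 0, -1):
        path.append(pred[path[-1]][t])
    path.reverse()
    return score[j][T - 1], path

pi = [0.5, 0.5]                               # made-up two-state HMM
A = [[0.5, 0.5], [0.5, 0.5]]                  # transition matrix
B = [[0.9, 0.1], [0.2, 0.8]]                  # emission matrix
print(viterbi(pi, A, B, [0, 1]))
```

Both nested loops and the retrieval loop are visible here, matching the O(n²T) complexity claimed later in the text.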
The Viterbi Algorithm

begin
  for j = 1 to n do
    score(j, 1) = π(j)B(j, ω_1)
  endfor;

  (∗ forward phase to find the best (highest) scores ∗)
  for t = 2 to T do
    for j = 1 to n do
      for k = 1 to n do
        tscore(k) = score(k, t − 1)A(k, j)B(j, ω_t)
      endfor;
      score(j, t) = max_k tscore(k);
      pred(j, t) = argmax_k tscore(k)
    endfor
  endfor;
  (∗ second phase to retrieve the optimal path ∗)
  I_T = argmax_j score(j, T);
  q_T = σ^{-1}(I_T);
  for t = T to 2 by −1 do
    I_{t-1} = pred(I_t, t);
    q_{t-1} = σ^{-1}(I_{t-1})
  endfor
end

An illustration of the Viterbi algorithm applied to Example 4.1 was presented after Example 4.3. If we run the Viterbi algorithm on the output sequence (S, M, S, L) of Example 4.2, we find that the sequence (Cold, Cold, Cold, Hot) has the highest probability among all sequences of length four.

One may have noticed that the numbers involved, being products of probabilities, become quite small. Indeed, underflow may arise in dynamic programming. Fortunately, there is a simple way to avoid underflow by taking logarithms. We initialize the algorithm by computing

score(j, 1) = log[π(j)] + log[B(j, ω_1)],

and in the step where tscore is computed we use the formula

tscore(k) = score(k, t − 1) + log[A(k, j)] + log[B(j, ω_t)].

It is immediately verified that the time complexity of the Viterbi algorithm is O(n²T).

Let us now turn to the second problem, the evaluation problem (or likelihood problem). This time, given a finite collection {M_1, ..., M_L} of HMMs with the same output alphabet O, for any observed output sequence O = (O_1, O_2, ..., O_T) of length T ≥ 1, find which model M_ℓ is most likely to have generated O. More precisely, given any model M_k,

we compute the probability tprob_k that M_k could have produced O along any sequence of states S = (q_1, ..., q_T). Then we pick an HMM M_ℓ for which tprob_ℓ is maximal. The probability tprob_k that we are seeking is given by

tprob_k = Pr(O) = Σ_{(i_1,...,i_T) ∈ {1,...,n}^T} Pr((q_{i_1}, ..., q_{i_T}), O)
        = Σ_{(i_1,...,i_T) ∈ {1,...,n}^T} π(i_1)B(i_1, ω_1) ∏_{t=2}^{T} A(i_{t-1}, i_t)B(i_t, ω_t),

where {1, ..., n}^T denotes the set of all sequences of length T consisting of elements from the set {1, ..., n}.

It is not hard to see that a brute-force computation requires 2Tn^T multiplications. Fortunately, it is easy to adapt the Viterbi algorithm to compute tprob_k efficiently. Since we are not looking for an explicit path, there is no need for the second phase, and during the forward phase, going from t − 1 to t, rather than finding the maximum of the scores tscore(k) for k = 1, ..., n, we just set score(j, t) to the sum over k of the temporary scores tscore(k). At the end, tprob_k is the sum over j of the probabilities score(j, T).

The algorithm solving the evaluation problem, known as the forward algorithm, is shown below. The input to the algorithm is M = (Q, O, π, A, B) and the sequence of indices ω(O) = (ω_1, ..., ω_T) associated with the observed sequence O = (O_1, O_2, ..., O_T) of length T ≥ 1, with ω_t = ω(O_t) for t = 1, ..., T. The output is the probability tprob.

The Forward Algorithm

begin
  for j = 1 to n do
    score(j, 1) = π(j)B(j, ω_1)
  endfor;
  for t = 2 to T do
    for j = 1 to n do
      for k = 1 to n do
        tscore(k) = score(k, t − 1)A(k, j)B(j, ω_t)
      endfor;
      score(j, t) = Σ_k tscore(k)
    endfor

  endfor;
  tprob = Σ_j score(j, T)
end

We can now run the above algorithm on M_1, ..., M_L to compute tprob_1, ..., tprob_L, and we pick the model M_ℓ for which tprob_ℓ is maximal. As for the Viterbi algorithm, the time complexity of the forward algorithm is O(n²T).

Underflow is also a problem with the forward algorithm. At first glance it looks like taking logarithms does not help, because there is no simple expression for log(x_1 + ··· + x_n) in terms of the log x_i. Fortunately, we can use the log-sum-exp trick (which I learned from Mitch Marcus), namely the identity

log(Σ_{i=1}^{n} e^{x_i}) = a + log(Σ_{i=1}^{n} e^{x_i − a})

for all x_1, ..., x_n ∈ R and all a ∈ R (take exponentials on both sides). Then, if we pick a = max_{1≤i≤n} x_i, we get 1 ≤ Σ_{i=1}^{n} e^{x_i − a} ≤ n, so

max_{1≤i≤n} x_i ≤ log(Σ_{i=1}^{n} e^{x_i}) ≤ max_{1≤i≤n} x_i + log n,

which shows that max_{1≤i≤n} x_i is a good approximation for log(Σ_{i=1}^{n} e^{x_i}). For any positive reals y_1, ..., y_n, if we let x_i = log y_i, then we get

log(Σ_{i=1}^{n} y_i) = a + log(Σ_{i=1}^{n} e^{log(y_i) − a}),  with a = max_{1≤i≤n} log y_i.

We will use this trick to compute

log(score(j, t)) = log(Σ_{k=1}^{n} e^{log(tscore(k))}) = a + log(Σ_{k=1}^{n} e^{log(tscore(k)) − a}),

with a = max_{1≤k≤n} log(tscore(k)), where tscore(k) could be very small, but log(tscore(k)) is not, so computing log(tscore(k)) − a does not cause underflow, and

1 ≤ Σ_{k=1}^{n} e^{log(tscore(k)) − a} ≤ n,

since log(tscore(k)) − a ≤ 0 and one of these terms is equal to zero, so even if some of the terms e^{log(tscore(k)) − a} are very small, this does not cause any trouble. We will also use this trick to compute log(tprob) = log(Σ_{j=1}^{n} score(j, T)) in terms of the log(score(j, T)).
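The forward algorithm and its log-space variant can both be sketched in a few lines of Python. This is not the notes' own code: indices are 0-based, the helper names are made up, and the toy parameters at the end are invented for illustration.

```python
import math

def forward(pi, A, B, omega):
    """tprob = Pr(O): the Viterbi forward phase, but summing over k
    instead of maximizing."""
    n = len(pi)
    score = [pi[j] * B[j][omega[0]] for j in range(n)]
    for t in range(1, len(omega)):
        score = [sum(score[k] * A[k][j] * B[j][omega[t]] for k in range(n))
                 for j in range(n)]
    return sum(score)

def logsumexp(xs):
    """log(sum of e^x) computed stably as a + log(sum of e^(x - a))."""
    a = max(xs)
    return a + math.log(sum(math.exp(x - a) for x in xs))

def forward_log(pi, A, B, omega):
    """The same computation carried out entirely in log space."""
    n = len(pi)
    ls = [math.log(pi[j]) + math.log(B[j][omega[0]]) for j in range(n)]
    for t in range(1, len(omega)):
        ls = [logsumexp([ls[k] + math.log(A[k][j]) + math.log(B[j][omega[t]])
                         for k in range(n)])
              for j in range(n)]
    return logsumexp(ls)

pi = [0.5, 0.5]                               # made-up two-state HMM
A = [[0.5, 0.5], [0.5, 0.5]]
B = [[0.9, 0.1], [0.2, 0.8]]
print(forward(pi, A, B, [0, 1]))              # sums over all n^T state paths
print(math.exp(forward_log(pi, A, B, [0, 1])))  # same value, via logs
```

Exponentiating the log-space result recovers the direct result, which is a convenient sanity check when T is small enough that no underflow occurs.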

We leave it as an exercise to the reader to modify the forward algorithm so that it computes log(score(j, t)) and log(tprob) using the log-sum-exp trick. If you use Matlab, then this is quite easy, because Matlab does a lot of the work for you, since it can apply operators such as exp or sum to vectors.

Example 4.3. To illustrate the forward algorithm, assume that our observant student also recorded the drinking behavior of a professor at Harvard, and that he came up with the HMM shown in Figure 4.3.

[HMM diagram: start state, states Cold and Hot, outputs N and D.]

Figure 4.3: Example of an HMM modeling the drinking behavior of a professor at Harvard.

However, the student can't remember whether he observed the sequence NNND at Penn or at Harvard. So he runs the forward algorithm on both HMMs to find the most likely model. Do it!

Following Jurafsky, the following chronology shows how the Viterbi algorithm has had applications in many separate fields.

Citation — Field
Viterbi (1967) — information theory
Vintsyuk (1968) — speech processing
Needleman and Wunsch (1970) — molecular biology
Sakoe and Chiba (1971) — speech processing
Sankoff (1972) — molecular biology
Reichert et al. (1973) — molecular biology
Wagner and Fischer (1974) — computer science

Readers who wish to learn more about HMMs should begin with Stamp [20], a great tutorial which contains a very clear and easy to read presentation. Another nice introduction is given in Rich [19] (Chapter 5, Section 5.11). A much more complete, yet accessible, coverage of HMMs is found in Rabiner's tutorial [17]. Jurafsky and Martin's online Chapter 9 (Hidden Markov Models) is also a very good and informal tutorial (see jurafsky/slp3/9.pdf).

A very clear and quite accessible presentation of Markov chains is given in Cinlar [4]. Another thorough but a bit more advanced presentation is given in Brémaud [3]. Other presentations of Markov chains can be found in Mitzenmacher and Upfal [14], and in Grimmett and Stirzaker [10].

Acknowledgments: I would like to thank Mitch Marcus, Jocelyn Quaintance, and Joao Sedoc for scrutinizing my work and for many insightful comments.


Chapter 5

Regular Languages and Equivalence Relations, The Myhill-Nerode Characterization, State Equivalence

5.1 Morphisms, F-Maps, B-Maps and Homomorphisms of DFA's

It is natural to wonder whether there is a reasonable notion of a mapping between DFA's. It turns out that this is indeed the case and there is a notion of a map between DFA's which is very useful in the theory of DFA minimization (given a DFA, find an equivalent DFA of minimal size).

Obviously, a map between DFA's should be a certain kind of graph homomorphism, which means that given two DFA's D_1 = (Q_1, Σ, δ_1, q_{0,1}, F_1) and D_2 = (Q_2, Σ, δ_2, q_{0,2}, F_2), we have a function h: Q_1 → Q_2 mapping every state p ∈ Q_1 of D_1 to some state q = h(p) ∈ Q_2 of D_2, in such a way that for every input symbol a ∈ Σ, the transition on a from p to δ_1(p, a) is mapped to the transition on a from h(p) to h(δ_1(p, a)), so that

h(δ_1(p, a)) = δ_2(h(p), a),

which can be expressed by the commutativity of the following diagram:

        p ----h----> h(p)
        |             |
        a             a
        v             v
  δ_1(p, a) --h--> δ_2(h(p), a)

In order to be useful, a map of DFA's h: D_1 → D_2 should induce a relationship between the languages L(D_1) and L(D_2), such as L(D_1) ⊆ L(D_2), L(D_2) ⊆ L(D_1) or L(D_1) = L(D_2). This can indeed be achieved by requiring some simple condition on the way final states are related by h.

For any function h: X → Y, and for any two subsets A ⊆ X and B ⊆ Y, recall that

h(A) = {h(a) ∈ Y | a ∈ A}

is the (direct) image of A by h and

h^{-1}(B) = {x ∈ X | h(x) ∈ B}

is the inverse image of B by h, and h^{-1}(B) makes sense even if h is not invertible.

The following definition is adapted from Eilenberg [8] (Automata, Languages and Machines, Vol. A, Academic Press, 1974; see Chapter III, Section 4).

Definition 5.1. Given two DFA's D_1 = (Q_1, Σ, δ_1, q_{0,1}, F_1) and D_2 = (Q_2, Σ, δ_2, q_{0,2}, F_2), a morphism h: D_1 → D_2 of DFA's is a function h: Q_1 → Q_2 satisfying the following conditions:

(1) h(δ_1(p, a)) = δ_2(h(p), a), for all p ∈ Q_1 and all a ∈ Σ, which can be expressed by the commutativity of the following diagram:

        p ----h----> h(p)
        |             |
        a             a
        v             v
  δ_1(p, a) --h--> δ_2(h(p), a)

(2) h(q_{0,1}) = q_{0,2}.

An F-map of DFA's, for short, a map, is a morphism of DFA's h: D_1 → D_2 that satisfies the condition

(3a) h(F_1) ⊆ F_2.

A B-map of DFA's is a morphism of DFA's h: D_1 → D_2 that satisfies the condition

(3b) h^{-1}(F_2) ⊆ F_1.

A proper homomorphism of DFA's, for short, a homomorphism, is an F-map of DFA's that is also a B-map of DFA's; namely, a homomorphism satisfies (3a) & (3b).

Now, for any function f: X → Y and any two subsets A ⊆ X and B ⊆ Y, recall that f(A) ⊆ B iff A ⊆ f^{-1}(B). Thus, (3a) & (3b) is equivalent to the condition (3c) below; that is, a homomorphism of DFA's is a morphism satisfying the condition

(3c) h^{-1}(F_2) = F_1.

Note that the condition for being a proper homomorphism of DFA's (condition (3c)) is not equivalent to h(F_1) = F_2. Condition (3c) forces h(F_1) = F_2 ∩ h(Q_1), and furthermore, for every p ∈ Q_1, whenever h(p) ∈ F_2, then p ∈ F_1.

Example 5.1. Figure 5.1 shows a map h of DFA's, with

h(A) = h(C) = 0,  h(B) = 1,  h(D) = 2,  h(E) = 3.

It is easy to check that h is actually a (proper) homomorphism.

[DFA diagram: D_1 with states A, B, C, D, E mapped by h to the states 0, 1, 2, 3 of D_2.]

Figure 5.1: A map of DFA's.

The reader should check that if f: D_1 → D_2 and g: D_2 → D_3 are morphisms (resp. F-maps, resp. B-maps), then g ∘ f: D_1 → D_3 is also a morphism (resp. an F-map, resp. a B-map).
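Conditions (1), (2), (3a), and (3b) are all finite checks, so they are easy to verify mechanically. The following sketch is not from the notes (the encoding of a DFA as a tuple and all names are made up); the small example collapses a two-state DFA onto a one-state DFA:

```python
# A DFA is encoded as a tuple (Q, Sigma, delta, q0, F), delta a dict.

def is_morphism(D1, D2, h):
    Q1, Sigma, d1, q01, _ = D1
    _, _, d2, q02, _ = D2
    return (h[q01] == q02 and                      # condition (2)
            all(h[d1[(p, a)]] == d2[(h[p], a)]     # condition (1)
                for p in Q1 for a in Sigma))

def is_F_map(D1, D2, h):                           # (3a): h(F1) subset of F2
    return is_morphism(D1, D2, h) and all(h[p] in D2[4] for p in D1[4])

def is_B_map(D1, D2, h):                           # (3b): h^-1(F2) subset of F1
    return is_morphism(D1, D2, h) and \
        all(p in D1[4] for p in D1[0] if h[p] in D2[4])

# D1 accepts strings of a's of length >= 1; D2 accepts all strings of a's.
D1 = ({'s', 't'}, {'a'}, {('s', 'a'): 't', ('t', 'a'): 't'}, 's', {'t'})
D2 = ({'u'}, {'a'}, {('u', 'a'): 'u'}, 'u', {'u'})
h = {'s': 'u', 't': 'u'}

print(is_morphism(D1, D2, h), is_F_map(D1, D2, h), is_B_map(D1, D2, h))
```

Here h is an F-map but not a B-map (the non-final start state s is sent into F_2), consistent with L(D_1) ⊆ L(D_2) but L(D_2) ⊄ L(D_1), as Proposition 5.1 below predicts.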

Remark: In previous versions of these notes, an F-map was called simply a map and a B-map was called an F^{-1}-map. Over the years, the old terminology proved to be confusing. We hope the new one is less confusing!

Note that an F-map or a B-map is a special case of the concept of simulation of automata. A proper homomorphism is a special case of a bisimulation. Bisimulations play an important role in real-time systems and in concurrency theory.

The main motivation behind these definitions is that when there is an F-map h: D_1 → D_2, somehow D_2 simulates D_1, and it turns out that L(D_1) ⊆ L(D_2). When there is a B-map h: D_1 → D_2, somehow D_1 simulates D_2, and it turns out that L(D_2) ⊆ L(D_1). When there is a proper homomorphism h: D_1 → D_2, somehow D_1 bisimulates D_2, and it turns out that L(D_2) = L(D_1).

A DFA morphism f: D_1 → D_2 is an isomorphism iff there is a DFA morphism g: D_2 → D_1 so that

g ∘ f = id_{D_1} and f ∘ g = id_{D_2}.

Similarly, an F-map f: D_1 → D_2 is an isomorphism iff there is an F-map g: D_2 → D_1 so that

g ∘ f = id_{D_1} and f ∘ g = id_{D_2}.

Finally, a B-map f: D_1 → D_2 is an isomorphism iff there is a B-map g: D_2 → D_1 so that

g ∘ f = id_{D_1} and f ∘ g = id_{D_2}.

The map g is unique and it is denoted f^{-1}. The reader should prove that if a DFA F-map is an isomorphism, then it is also a proper homomorphism, and if a DFA B-map is an isomorphism, then it is also a proper homomorphism.

If h: D_1 → D_2 is a morphism of DFA's, it is easily shown by induction on the length of w that

h(δ_1^*(p, w)) = δ_2^*(h(p), w),

for all p ∈ Q_1 and all w ∈ Σ^*, which corresponds to the commutativity of the following diagram:

        p ----h----> h(p)
        |             |
        w             w
        v             v
  δ_1^*(p, w) --h--> δ_2^*(h(p), w)

This is the generalization of the commutativity of the diagram in condition (1) of Definition 5.1, where an arbitrary string w ∈ Σ^* is allowed instead of just a single symbol a ∈ Σ. This is the crucial property of DFA morphisms. It says that for every string w ∈ Σ^*, if we pick any state p ∈ Q_1 as starting point in D_1, then the image of the path from p on

input w in D_1 is the path in D_2 from the image h(p) ∈ Q_2 of p on the same input w. In particular, the image h(δ_1^*(p, w)) of the state reached from p on input w in D_1 is the state δ_2^*(h(p), w) in D_2 reached from h(p) on input w.

Example 5.2. For example, going back to the DFA map shown in Figure 5.1, the image of the path

C → B → D → B → D → E

from C on input w in D_1 is the path

0 → 1 → 2 → 1 → 2 → 3

from 0 on the same input w in D_2.

As a consequence, we have the following proposition:

Proposition 5.1. If h: D_1 → D_2 is an F-map of DFA's, then L(D_1) ⊆ L(D_2). If h: D_1 → D_2 is a B-map of DFA's, then L(D_2) ⊆ L(D_1). Finally, if h: D_1 → D_2 is a proper homomorphism of DFA's, then L(D_1) = L(D_2).

One might think that there may be many DFA morphisms between two DFA's D_1 and D_2, but this is not the case. In fact, if every state of D_1 is reachable from the start state, then there is at most one morphism from D_1 to D_2.

Given a DFA D = (Q, Σ, δ, q_0, F), the set Q_r of accessible or reachable states is the subset of Q defined as

Q_r = {p ∈ Q | ∃w ∈ Σ^*, δ^*(q_0, w) = p}.

The set Q_r can be easily computed by stages. A DFA is accessible, or trim, if Q = Q_r; that is, if every state is reachable from the start state.

A morphism (resp. F-map, B-map) h: D_1 → D_2 is surjective if h(Q_1) = Q_2. The following proposition is easy to show:

Proposition 5.2. If D_1 is trim, then there is at most one morphism h: D_1 → D_2 (resp. F-map, resp. B-map). If D_2 is also trim and we have a morphism h: D_1 → D_2, then h is surjective.

It can also be shown that a minimal DFA D_L for L is characterized by the property that there is a unique surjective proper homomorphism h: D → D_L from any trim DFA D accepting L to D_L.

Another useful notion is the notion of a congruence on a DFA.

Definition 5.2. Given any DFA D = (Q, Σ, δ, q_0, F), a congruence ≡ on D is an equivalence relation ≡ on Q satisfying the following conditions: For all p, q ∈ Q and all a ∈ Σ,

(1) If p ≡ q, then δ(p, a) ≡ δ(q, a).

(2) If p ≡ q and p ∈ F, then q ∈ F.

It can be shown that a proper homomorphism of DFA's h: D_1 → D_2 induces a congruence ≡_h on D_1 defined as follows:

p ≡_h q iff h(p) = h(q).

Given a congruence ≡ on a DFA D, we can define the quotient DFA D/≡, and there is a surjective proper homomorphism π: D → D/≡. We will come back to this point when we study minimal DFA's.

5.2 Directed Graphs and Paths

It is often useful to view DFA's and NFA's as labeled directed graphs.

Definition 5.3. A directed graph is a quadruple G = (V, E, s, t), where V is a set of vertices, or nodes, E is a set of edges, or arcs, and s, t: E → V are two functions, s being called the source function, and t the target function. Given an edge e ∈ E, we also call s(e) the origin (or source) of e, and t(e) the endpoint (or target) of e.

Remark: the functions s, t need not be injective or surjective. Thus, we allow isolated vertices.

Example: Let G be the directed graph defined such that

E = {e_1, e_2, e_3, e_4, e_5, e_6, e_7, e_8},  V = {v_1, v_2, v_3, v_4, v_5, v_6},

and

s(e_1) = v_1, s(e_2) = v_2, s(e_3) = v_3, s(e_4) = v_4, s(e_5) = v_2, s(e_6) = v_5, s(e_7) = v_5, s(e_8) = v_5,
t(e_1) = v_2, t(e_2) = v_3, t(e_3) = v_4, t(e_4) = v_2, t(e_5) = v_5, t(e_6) = v_5, t(e_7) = v_6, t(e_8) = v_6.

Such a graph can be represented by the following diagram:

[Diagram of the graph above, with vertices v_1, ..., v_6 and edges e_1, ..., e_8.]

Figure 5.2: A directed graph.

In drawing directed graphs, we will usually omit edge names (the e_i), and sometimes even the node names (the v_j). We now define paths in a directed graph.

Definition 5.4. Given a directed graph G = (V, E, s, t), for any two nodes u, v ∈ V, a path from u to v is a triple π = (u, e_1 ... e_n, v), where e_1 ... e_n is a string (sequence) of edges in E such that s(e_1) = u, t(e_n) = v, and t(e_i) = s(e_{i+1}), for all i such that 1 ≤ i ≤ n − 1. When n = 0, we must have u = v, and the path (u, ϵ, u) is called the null path from u to u. The number n is the length of the path. We also call u the source (or origin) of the path, and v the target (or endpoint) of the path. When there is a nonnull path π from u to v, we say that u and v are connected.

Remark: In a path π = (u, e_1 ... e_n, v), the expression e_1 ... e_n is a sequence, and thus, the e_i are not necessarily distinct.

For example, the following are paths:

π_1 = (v_1, e_1 e_5 e_7, v_6),

π_2 = (v_2, e_2 e_3 e_4 e_2 e_3 e_4 e_2 e_3 e_4, v_2),

and

π_3 = (v_1, e_1 e_2 e_3 e_4 e_2 e_3 e_4 e_5 e_6 e_6 e_8, v_6).

Clearly, π_2 and π_3 are of a different nature from π_1. Indeed, they contain cycles. This is formalized as follows.

Definition 5.5. Given a directed graph G = (V, E, s, t), for any node u ∈ V, a cycle (or loop) through u is a nonnull path of the form π = (u, e_1 ... e_n, u) (equivalently, t(e_n) = s(e_1)). More generally, a nonnull path π = (u, e_1 ... e_n, v) contains a cycle iff for some i, j, with 1 ≤ i ≤ j ≤ n, t(e_j) = s(e_i). In this case, letting w = t(e_j) = s(e_i), the path (w, e_i ... e_j, w) is a cycle through w. A path π is acyclic iff it does not contain any cycle. Note that each null path (u, ϵ, u) is acyclic.

Obviously, a cycle π = (u, e_1 ... e_n, u) through u is also a cycle through every node t(e_i). Also, a path π may contain several different cycles.

Paths can be concatenated as follows.

Definition 5.6. Given a directed graph G = (V, E, s, t), two paths π_1 = (u, e_1 ... e_m, v) and π_2 = (u', e'_1 ... e'_n, v') can be concatenated provided that v = u', in which case their concatenation is the path

π_1 π_2 = (u, e_1 ... e_m e'_1 ... e'_n, v').

It is immediately verified that the concatenation of paths is associative, and that the concatenation of the path π = (u, e_1 ... e_m, v) with the null path (u, ϵ, u) or with the null path (v, ϵ, v) is the path π itself.

The following fact, although almost trivial, is used all the time, and is worth stating in detail.

Proposition 5.3. Given a directed graph G = (V, E, s, t), if the set of nodes V contains m ≥ 1 nodes, then every path π of length at least m contains some cycle.

A consequence of Proposition 5.3 is that in a finite graph with m nodes, given any two nodes u, v ∈ V, in order to find out whether there is a path from u to v, it is enough to consider paths of length ≤ m − 1. Indeed, if there is a path between u and v, then there is some path π of minimal length (not necessarily unique, but this doesn't matter). If this minimal path has length at least m, then by the proposition, it contains a cycle.
However, by deleting this cycle from the path π, we get an even shorter path from u to v, contradicting the minimality of π.

We now turn to labeled graphs.
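Before moving on, the reachability consequence of Proposition 5.3 can be sketched in code: expanding the set of nodes reachable from u for only m − 1 steps already finds everything reachable. This is an illustrative sketch (the function name and the tiny example graph are made up), not from the notes.

```python
# Decide whether v is reachable from u in a graph with m nodes by
# exploring paths of length at most m - 1 (enough, by Proposition 5.3).
# Edges are (source, target) pairs.

def reachable(nodes, edges, u, v):
    succ = {x: [] for x in nodes}
    for s, t in edges:
        succ[s].append(t)
    frontier = {u}
    seen = {u}
    for _ in range(len(nodes) - 1):      # at most m - 1 steps suffice
        frontier = {t for s in frontier for t in succ[s]} - seen
        seen |= frontier
    return v in seen

nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 1)]         # a 3-cycle, node 4 isolated
print(reachable(nodes, edges, 1, 3))     # True
print(reachable(nodes, edges, 1, 4))     # False
```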

5.3 Labeled Graphs and Automata

In fact, we only need edge-labeled graphs.

Definition 5.7. A labeled directed graph is a tuple G = (V, E, L, s, t, λ), where V is a set of vertices, or nodes, E is a set of edges, or arcs, L is a set of labels, s, t: E → V are two functions, s being called the source function, and t the target function, and λ: E → L is the labeling function. Given an edge e ∈ E, we also call s(e) the origin (or source) of e, t(e) the endpoint (or target) of e, and λ(e) the label of e.

Note that the function λ need not be injective or surjective. Thus, distinct edges may have the same label.

Example: Let G be the directed graph defined such that

E = {e_1, e_2, e_3, e_4, e_5, e_6, e_7, e_8},  V = {v_1, v_2, v_3, v_4, v_5, v_6},  L = {a, b},

and

s(e_1) = v_1, s(e_2) = v_2, s(e_3) = v_3, s(e_4) = v_4, s(e_5) = v_2, s(e_6) = v_5, s(e_7) = v_5, s(e_8) = v_5,
t(e_1) = v_2, t(e_2) = v_3, t(e_3) = v_4, t(e_4) = v_2, t(e_5) = v_5, t(e_6) = v_5, t(e_7) = v_6, t(e_8) = v_6,

and λ assigns to each edge e_1, ..., e_8 a label in L = {a, b}.

Such a labeled graph can be represented by the following diagram:

In drawing labeled graphs, we will usually omit edge names (the e_i), and sometimes even the node names (the v_j).

Paths, cycles, and concatenation of paths are defined just as before (that is, we ignore the labels). However, we can now define the spelling of a path.

Definition 5.8. Given a labeled directed graph G = (V, E, L, s, t, λ), for any two nodes u, v ∈ V, for any path π = (u, e_1 ... e_n, v), the spelling of the path π is the string of labels

λ(e_1) ··· λ(e_n).

When n = 0, the spelling of the null path (u, ϵ, u) is the null string ϵ.

For example, the spelling of the path

π_3 = (v_1, e_1 e_2 e_3 e_4 e_2 e_3 e_4 e_5 e_6 e_6 e_8, v_6)

[Diagram of the labeled graph above, with each edge carrying its label from L = {a, b}.]

Figure 5.3: A labeled directed graph.

is λ(e_1)λ(e_2)λ(e_3)λ(e_4)λ(e_2)λ(e_3)λ(e_4)λ(e_5)λ(e_6)λ(e_6)λ(e_8).

Every DFA and every NFA can be viewed as a labeled graph, in such a way that the set of spellings of paths from the start state to some final state is the language accepted by the automaton in question.

Given a DFA D = (Q, Σ, δ, q_0, F), where δ: Q × Σ → Q, we associate the labeled directed graph G_D = (V, E, L, s, t, λ) defined as follows:

V = Q,
E = {(p, a, q) | q = δ(p, a), p, q ∈ Q, a ∈ Σ},
L = Σ,  s((p, a, q)) = p,  t((p, a, q)) = q,  λ((p, a, q)) = a.

Such labeled graphs have a special structure that can easily be characterized. It is easily shown that a string w ∈ Σ^* is in the language L(D) = {w ∈ Σ^* | δ^*(q_0, w) ∈ F} iff w is the spelling of some path in G_D from q_0 to some final state.
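The translation of a DFA into its labeled graph G_D, and the rephrasing of acceptance in terms of spellings, can be made concrete. This sketch is not from the notes; the two-state DFA below (which accepts strings over {a, b} ending in a) is made up for illustration.

```python
# A DFA as a labeled graph: edges are triples (p, a, q) with q = delta(p, a),
# and w is accepted iff w is the spelling of a path from q0 to a final state.

delta = {('s', 'a'): 't', ('s', 'b'): 's',
         ('t', 'a'): 't', ('t', 'b'): 's'}
q0, finals = 's', {'t'}

edges = [(p, a, q) for (p, a), q in delta.items()]   # the edge set of G_D

def accepts_by_spelling(w):
    """Follow the unique path from q0 whose spelling is w."""
    state = q0
    for a in w:
        # in a DFA's graph there is exactly one edge out of `state` labeled a
        state = next(q for (p, lab, q) in edges if p == state and lab == a)
    return state in finals

print(accepts_by_spelling('ba'))   # True: the path s -b-> s -a-> t ends in t
print(accepts_by_spelling('ab'))   # False: the path s -a-> t -b-> s ends in s
```

Because δ is a total function, each string spells exactly one path from q_0, which is what makes this lookup deterministic; for an NFA one would instead search over all edges (p, a, q) with q ∈ δ(p, a).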

Similarly, given an NFA N = (Q, Σ, δ, q_0, F), where δ: Q × (Σ ∪ {ϵ}) → 2^Q, we associate the labeled directed graph G_N = (V, E, L, s, t, λ) defined as follows:

V = Q,
E = {(p, a, q) | q ∈ δ(p, a), p, q ∈ Q, a ∈ Σ ∪ {ϵ}},
L = Σ ∪ {ϵ},  s((p, a, q)) = p,  t((p, a, q)) = q,  λ((p, a, q)) = a.

Remark: When N has no ϵ-transitions, we can let L = Σ.

Such labeled graphs also have a special structure that can easily be characterized. Again, a string w ∈ Σ^* is in the language L(N) = {w ∈ Σ^* | δ^*(q_0, w) ∩ F ≠ ∅} iff w is the spelling of some path in G_N from q_0 to some final state.

5.4 The Closure Definition of the Regular Languages

Let Σ = {a_1, ..., a_m} be some alphabet. We would like to define a family of languages, R(Σ), by singling out some very basic (atomic) languages, namely the languages {a_1}, ..., {a_m}, the empty language, and the trivial language {ϵ}, and then forming more complicated languages by repeatedly forming union, concatenation and Kleene ∗ of previously constructed languages. By doing so, we hope to get a family of languages (R(Σ)) that is closed under union, concatenation, and Kleene ∗. This means that for any two languages L_1, L_2 ∈ R(Σ), we also have L_1 ∪ L_2 ∈ R(Σ) and L_1 L_2 ∈ R(Σ), and for any language L ∈ R(Σ), we have L^* ∈ R(Σ). Furthermore, we would like R(Σ) to be the smallest family with these properties. How do we achieve this rigorously?

First, let us look more closely at what we mean by a family of languages. Recall that a language (over Σ) is any subset L of Σ^*. Thus, the set of all languages is 2^{Σ^*}, the power set of Σ^*. If Σ is nonempty, this is an uncountable set. Next, we define a family L of languages to be any subset of 2^{Σ^*}. This time, the set of families of languages is 2^{2^{Σ^*}}. This is a huge set. We can use the inclusion relation on 2^{2^{Σ^*}} to define a partial order on families of languages. So, L_1 ⊆ L_2 iff for every language L, if L ∈ L_1 then L ∈ L_2.

We can now state more precisely what we are trying to do. Consider the following properties for a family of languages L:

(1) We have {a_1}, ..., {a_m}, ∅, {ϵ} ∈ L, i.e., L contains the "atomic" languages.

(2a) For all L_1, L_2 ∈ L, we also have L_1 ∪ L_2 ∈ L.
(2b) For all L_1, L_2 ∈ L, we also have L_1 L_2 ∈ L.

(2c) For all L ∈ L, we also have L^* ∈ L.

In other words, L is closed under union, concatenation and Kleene ∗.

Now, what we want is the smallest (w.r.t. inclusion) family of languages that satisfies properties (1) and (2)(a)(b)(c). We can construct such a family using an inductive definition. This inductive definition constructs a sequence of families of languages, (R(Σ)_n)_{n≥0}, called the stages of the inductive definition, as follows:

R(Σ)_0 = {{a_1}, ..., {a_m}, ∅, {ϵ}},
R(Σ)_{n+1} = R(Σ)_n ∪ {L_1 ∪ L_2, L_1 L_2, L^* | L_1, L_2, L ∈ R(Σ)_n}.

Then, we define R(Σ) by

R(Σ) = ⋃_{n≥0} R(Σ)_n.

Thus, a language L belongs to R(Σ) iff it belongs to R(Σ)_n, for some n ≥ 0. For example, if Σ = {a, b}, we have

R(Σ)_1 = {{a}, {b}, ∅, {ϵ}, {a, b}, {a, ϵ}, {b, ϵ}, {aa}, {ab}, {ba}, {bb}, {a}^*, {b}^*}.

Some of the languages that will appear in R(Σ)_2 are:

{a, bb}, {ab, ba}, {aab}, {abab}, {a}^*{b}^*, {aa}^*, {a, b}^*.

Observe that

R(Σ)_0 ⊆ R(Σ)_1 ⊆ R(Σ)_2 ⊆ ··· ⊆ R(Σ)_n ⊆ R(Σ)_{n+1} ⊆ ··· ⊆ R(Σ),

so that if L ∈ R(Σ)_n, then L ∈ R(Σ)_p, for all p ≥ n. Also, there is a smallest n for which L ∈ R(Σ)_n (the birthdate of L!). In fact, all these inclusions are strict. Note that each R(Σ)_n only contains a finite number of languages (but some of the languages in R(Σ)_n are infinite, because of Kleene ∗).

Then we define the Regular languages, Version 2, as the family R(Σ).

Of course, it is far from obvious that R(Σ) coincides with the family of languages accepted by DFA's (or NFA's), what we call the regular languages, version 1. However, this is the case, and this can be demonstrated by giving two algorithms. Actually, it will be slightly more convenient to define a notation system, the regular expressions, to denote the languages in R(Σ). Then, we will give an algorithm that converts a regular expression R into an NFA N_R, so that L_R = L(N_R), where L_R is the language (in R(Σ)) denoted by R. We will also give an algorithm that converts an NFA N into a regular expression R_N, so that L(R_N) = L(N).

But before doing all this, we should make sure that R(Σ) is indeed the family that we are seeking. This is the content of

Proposition 5.4. The family R(Σ) is the smallest family of languages which contains the atomic languages {a_1}, ..., {a_m}, ∅, {ϵ}, and is closed under union, concatenation, and Kleene ∗.

Proof. There are two things to prove.

(i) We need to prove that R(Σ) has properties (1) and (2)(a)(b)(c).

(ii) We need to prove that R(Σ) is the smallest family having properties (1) and (2)(a)(b)(c).

(i) Since R(Σ)_0 = {{a_1}, ..., {a_m}, ∅, {ϵ}}, it is obvious that (1) holds. Next, assume that L_1, L_2 ∈ R(Σ). This means that there are some integers n_1, n_2 ≥ 0, so that L_1 ∈ R(Σ)_{n_1} and L_2 ∈ R(Σ)_{n_2}. Now, it is possible that n_1 ≠ n_2, but if we let n = max{n_1, n_2}, as we observed that R(Σ)_p ⊆ R(Σ)_q whenever p ≤ q, we are guaranteed that both L_1, L_2 ∈ R(Σ)_n. However, by the definition of R(Σ)_{n+1} (that's why we defined it this way!), we have L_1 ∪ L_2 ∈ R(Σ)_{n+1} ⊆ R(Σ). The same argument proves that L_1 L_2 ∈ R(Σ)_{n+1} ⊆ R(Σ). Also, if L ∈ R(Σ)_n, we immediately have L^* ∈ R(Σ)_{n+1} ⊆ R(Σ). Therefore, R(Σ) has properties (1) and (2)(a)(b)(c).

(ii) Let L be any family of languages having properties (1) and (2)(a)(b)(c). We need to prove that R(Σ) ⊆ L. If we can prove that R(Σ)_n ⊆ L for all n ≥ 0, we are done (since then, R(Σ) = ⋃_{n≥0} R(Σ)_n ⊆ L). We prove by induction on n that R(Σ)_n ⊆ L, for all n ≥ 0.

The base case n = 0 is trivial, since L has (1), which says that R(Σ)_0 ⊆ L.

Assume inductively that R(Σ)_n ⊆ L. We need to prove that R(Σ)_{n+1} ⊆ L. Pick any L ∈ R(Σ)_{n+1}. Recall that

R(Σ)_{n+1} = R(Σ)_n ∪ {L_1 ∪ L_2, L_1 L_2, L^* | L_1, L_2, L ∈ R(Σ)_n}.

If L ∈ R(Σ)_n, then L ∈ L, since R(Σ)_n ⊆ L by the induction hypothesis. Otherwise, there are three cases:

(a) L = L_1 ∪ L_2, where L_1, L_2 ∈ R(Σ)_n. By the induction hypothesis, R(Σ)_n ⊆ L, so we get L_1, L_2 ∈ L; since L has (2a), we have L_1 ∪ L_2 ∈ L.

(b) L = L_1 L_2, where L_1, L_2 ∈ R(Σ)_n. By the induction hypothesis, R(Σ)_n ⊆ L, so we get L_1, L_2 ∈ L; since L has (2b), we have L_1 L_2 ∈ L.

(c) L = L_1^*, where L_1 ∈ R(Σ)_n. By the induction hypothesis, R(Σ)_n ⊆ L, so we get L_1 ∈ L; since L has (2c), we have L_1^* ∈ L.

Thus, in all cases, we showed that L ∈ L, and so R(Σ)_{n+1} ⊆ L, which proves the induction step.

Note: a given language L may be built up in different ways. For example,

{a, b}* = ({a}* {b}*)*.

Students should study carefully the above proof. Although simple, it is the prototype of many proofs appearing in the theory of computation.

5.5 Regular Expressions

The definition of the family of languages R(Σ) given in the previous section in terms of an inductive definition is good for proving properties of these languages, but it is not very convenient for manipulating them in a practical way. To do so, it is better to introduce a symbolic notation system, the regular expressions. Regular expressions are certain strings formed according to rules that mimic the inductive rules for constructing the families R(Σ)_n.

The set of regular expressions R(Σ) over an alphabet Σ is a language defined on an alphabet Δ defined as follows. Given an alphabet Σ = {a_1, ..., a_m}, consider the new alphabet

Δ = Σ ∪ {+, ·, *, (, ), ∅, ϵ}.

We define the family (R(Σ)_n) of languages over Δ as follows:

R(Σ)_0 = {a_1, ..., a_m, ∅, ϵ},
R(Σ)_{n+1} = R(Σ)_n ∪ {(R_1 + R_2), (R_1 · R_2), R* | R_1, R_2, R ∈ R(Σ)_n}.

Then, we define R(Σ) as

R(Σ) = ⋃_{n ≥ 0} R(Σ)_n.

Note that every language R(Σ)_n is finite. For example, if Σ = {a, b}, then R(Σ)_1 consists of the four base expressions a, b, ∅, ϵ, together with all the sums (R_1 + R_2) and products (R_1 · R_2) with R_1, R_2 ∈ {a, b, ∅, ϵ}, such as (a + b), (b + a), (a + ϵ), (ϵ + a), (a + ∅), (∅ + ∅), (a · b), (b · a), (a · ϵ), (ϵ · ϵ), (∅ · ∅), ..., and the starred expressions a*, b*, ∅*, ϵ*.

Some of the regular expressions appearing in R(Σ)_2 are, for example:

(a + (a · b)), ((a · b) + (b · a)), ((a · b) · b*), ((a · b) · (b · a)), (a · b)*, ((a + b) · a*), (a* · b*).

Definition 5.9. The set R(Σ) is the set of regular expressions (over Σ).

Proposition 5.5. The language R(Σ) is the smallest language which contains the symbols a_1, ..., a_m, ∅, ϵ from Δ, and such that (R_1 + R_2), (R_1 · R_2), and R* also belong to R(Σ), when R_1, R_2, R ∈ R(Σ).

For simplicity of notation, we write

(R_1 R_2)

instead of

(R_1 · R_2).

Examples: R = (a + b)*, S = (a* b*)*, and

T = ((a + b)* a)((a + b)(a + b) · · · (a + b)),

with n copies of (a + b) in the second factor.

5.6 Regular Expressions and Regular Languages

Every regular expression R ∈ R(Σ) can be viewed as the name, or denotation, of some language L ∈ R(Σ). Similarly, every language L ∈ R(Σ) is the interpretation (or meaning) of some regular expression R ∈ R(Σ). Think of a regular expression R as a program, and of L(R) as the result of the execution or evaluation of R by L. This can be made rigorous by defining a function

L : R(Σ) → R(Σ).

This function is defined recursively as follows:

L[a_i] = {a_i},
L[∅] = ∅,
L[ϵ] = {ϵ},
L[(R_1 + R_2)] = L[R_1] ∪ L[R_2],
L[(R_1 · R_2)] = L[R_1] L[R_2],
L[R*] = L[R]*.
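The recursive clauses above translate directly into an evaluator. The Python sketch below is an illustration only: expressions are encoded as nested tuples (a representation chosen here, not in the text), and every language is truncated to strings of length at most MAX_LEN so that Kleene * stays finite.

```python
# Illustration only: the evaluation function L[-] on regular expressions,
# with every language truncated to strings of length <= MAX_LEN so that
# Kleene * stays finite (an assumption of this sketch, not of the text).
# Expressions are tuples: ("sym", "a"), ("empty",), ("eps",),
# ("+", R1, R2), (".", R1, R2), ("*", R).

MAX_LEN = 4

def lang(r):
    kind = r[0]
    if kind == "sym":          # L[a_i] = {a_i}
        return {r[1]}
    if kind == "empty":        # L[emptyset] = {}
        return set()
    if kind == "eps":          # L[eps] = {""}
        return {""}
    if kind == "+":            # L[(R1 + R2)] = L[R1] union L[R2]
        return lang(r[1]) | lang(r[2])
    if kind == ".":            # L[(R1 . R2)] = L[R1] L[R2]
        return {u + v for u in lang(r[1]) for v in lang(r[2])
                if len(u + v) <= MAX_LEN}
    if kind == "*":            # L[R*] = L[R]*, truncated at MAX_LEN
        base, result = lang(r[1]), {""}
        while True:
            new = {u + v for u in result for v in base
                   if len(u + v) <= MAX_LEN} - result
            if not new:
                return result
            result |= new
    raise ValueError("unknown expression: %r" % (r,))

r = ("*", ("+", ("sym", "a"), ("sym", "b")))   # (a + b)*, truncated
```

For instance, lang(r) contains every string over {a, b} of length at most 4, including ϵ.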

Proposition 5.6. For every regular expression R ∈ R(Σ), the language L[R] is regular (Version 2), i.e., L[R] ∈ R(Σ). Conversely, for every regular (Version 2) language L ∈ R(Σ), there is some regular expression R ∈ R(Σ) such that L = L[R].

Proof. To prove that L[R] ∈ R(Σ) for all R ∈ R(Σ), we prove by induction on n ≥ 0 that if R ∈ R(Σ)_n, then L[R] ∈ R(Σ)_n. To prove that L is surjective, we prove by induction on n ≥ 0 that if L ∈ R(Σ)_n, then there is some R ∈ R(Σ)_n such that L = L[R].

Note: the function L is not injective. Example: if R = (a + b)*, S = (a* b*)*, then

L[R] = L[S] = {a, b}*.

For simplicity, we often denote L[R] as L_R. As examples (written in the simplified notation), we have

L[(ab + b)] = {ab, b},
L[b* a b* a b*] = {w ∈ {a, b}* | w has two a's},
L[(b* a b* a)* b*] = {w ∈ {a, b}* | w has an even number of a's},
L[(b* a b* a)* b* a b*] = {w ∈ {a, b}* | w has an odd number of a's}.

Remark. If

R = ((a + b)* a)((a + b)(a + b) · · · (a + b)),

with n copies of (a + b) in the second factor, it can be shown that any minimal DFA accepting L_R has 2^{n+1} states. Yet, both ((a + b)* a) and ((a + b)(a + b) · · · (a + b)) (with n copies of (a + b)) denote languages that can be accepted by small DFA's (of size 2 and n + 2).

Definition 5.10. Two regular expressions R, S ∈ R(Σ) are equivalent, denoted as R ≅ S, iff L[R] = L[S].

It is immediate that ≅ is an equivalence relation. The relation ≅ satisfies some (nice) identities. For example:

((a + b) + c) ≅ (a + (b + c)),
((a b)((b b)(c c))) ≅ (((a b)(b b))(c c)),
(a*)* ≅ a*,

and more generally

((R_1 + R_2) + R_3) ≅ (R_1 + (R_2 + R_3)),
((R_1 R_2) R_3) ≅ (R_1 (R_2 R_3)),
(R_1 + R_2) ≅ (R_2 + R_1),
(R* R*) ≅ R*,
(R*)* ≅ R*.

There is an algorithm to test the equivalence of regular expressions, but its complexity is exponential. Such an algorithm uses the conversion of a regular expression to an NFA, and the subset construction for converting an NFA to a DFA. Then the problem of deciding whether two regular expressions R and S are equivalent is reduced to testing whether two DFA's D_1 and D_2 accept the same language (the equivalence problem for DFA's). This last problem is equivalent to testing whether

L(D_1) − L(D_2) = ∅ and L(D_2) − L(D_1) = ∅.

But L(D_1) − L(D_2) (and similarly L(D_2) − L(D_1)) is accepted by a DFA obtained by the cross-product construction for the relative complement (with final states F_1 × (Q_2 − F_2) and (Q_1 − F_1) × F_2). Thus, in the end, the equivalence problem for regular expressions reduces to the problem of testing whether a DFA D = (Q, Σ, δ, q_0, F) accepts the empty language, which is equivalent to Q_r ∩ F = ∅. This last problem is a reachability problem in a directed graph which is easily solved in polynomial time.

It is an open problem to prove that the problem of testing the equivalence of regular expressions cannot be decided in polynomial time.

In the next two sections we show the equivalence of NFA's and regular expressions, by providing an algorithm to construct an NFA from a regular expression, and an algorithm for constructing a regular expression from an NFA. This will show that the regular languages Version 1 coincide with the regular languages Version 2.

5.7 Regular Expressions and NFA's

Proposition 5.7. There is an algorithm, which, given any regular expression R ∈ R(Σ), constructs an NFA N_R accepting L_R, i.e., such that L_R = L(N_R).

In order to ensure the correctness of the construction as well as to simplify the description of the algorithm, it is convenient to assume that our NFA's satisfy the following conditions:

1. Each NFA has a single final state, t, distinct from the start state, s.

2.
There are no incoming transitions into the start state, s, and no outgoing transitions from the final state, t.

3. Every state has at most two incoming and two outgoing transitions.

Here is the algorithm. For the base case, either

(a) R = a_i, in which case N_R is the NFA with the single transition s --a_i--> t (Figure 5.4: NFA for a_i);

(b) R = ϵ, in which case N_R is the NFA with the single transition s --ϵ--> t (Figure 5.5: NFA for ϵ);

(c) R = ∅, in which case N_R is the NFA with states s and t and no transition at all (Figure 5.6: NFA for ∅).

The recursive clauses are as follows:

(i) If our expression is (R + S), the algorithm is applied recursively to R and S, generating NFA's N_R and N_S, and then these two NFA's are combined in parallel as shown in Figure 5.7: a new start state s has ϵ-transitions to the old start states s_1 of N_R and s_2 of N_S, and the old final states t_1 of N_R and t_2 of N_S have ϵ-transitions to a new final state t.

Figure 5.7: NFA for (R + S)

(ii) If our expression is (R · S), the algorithm is applied recursively to R and S, generating NFA's N_R and N_S, and then these NFA's are combined sequentially as shown in Figure 5.8, by merging the old final state, t_1, of N_R with the old start state, s_2, of N_S.

Figure 5.8: NFA for (R · S)

Note that since there are no incoming transitions into s_2 in N_S, once we enter N_S, there is no way of reentering N_R, and so the construction is correct (it yields the concatenation L_R L_S).

(iii) If our expression is R*, the algorithm is applied recursively to R, generating the NFA N_R. Then we construct the NFA shown in Figure 5.9 by adding an ϵ-transition from the old final state, t_1, of N_R to the old start state, s_1, of N_R, and, as ϵ is not necessarily accepted by N_R, we add an ϵ-transition from s to t: a new start state s has ϵ-transitions to s_1 and to t, and t_1 has ϵ-transitions to s_1 and to a new final state t.

Figure 5.9: NFA for R*

Since there are no outgoing transitions from t_1 in N_R, we can only loop back to s_1 from t_1 using the new ϵ-transition from t_1 to s_1, and so the NFA of Figure 5.9 does accept L_R*.

The algorithm that we just described is sometimes called the sombrero construction. As a corollary of this construction, we get

Reg. languages Version 2 ⊆ Reg. languages Version 1.

The reader should check that if one constructs the NFA corresponding to the regular expression (a + b)* abb and then applies the subset construction, one gets the non-minimal DFA of Figure 5.10.
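The three base cases and three recursive clauses of the sombrero construction can be turned into code directly. The Python sketch below (an illustration, not the text's algorithm verbatim) builds the transition list of N_R from the tuple encoding of expressions used earlier, and includes a small ϵ-closure simulator to check acceptance; all names here are choices of this sketch.

```python
EPS = ""  # label used for epsilon-transitions in this sketch

def thompson(r, counter=None):
    # returns (start state s, final state t, list of edges (p, label, q))
    if counter is None:
        counter = [0]
    def fresh():
        counter[0] += 1
        return counter[0]
    s, t = fresh(), fresh()
    edges, kind = [], r[0]
    if kind == "sym":                    # base case (a): s --a_i--> t
        edges.append((s, r[1], t))
    elif kind == "eps":                  # base case (b): s --eps--> t
        edges.append((s, EPS, t))
    elif kind == "empty":                # base case (c): no transition
        pass
    elif kind == "+":                    # clause (i): parallel combination
        for sub in (r[1], r[2]):
            s1, t1, e1 = thompson(sub, counter)
            edges += e1 + [(s, EPS, s1), (t1, EPS, t)]
    elif kind == ".":                    # clause (ii): merge t1 with s2
        s1, t1, e1 = thompson(r[1], counter)
        s2, t2, e2 = thompson(r[2], counter)
        # s2 has no incoming transitions, so renaming its outgoing
        # edges to start at t1 performs the merge
        e2 = [(t1 if p == s2 else p, x, q) for (p, x, q) in e2]
        return s1, t2, e1 + e2
    elif kind == "*":                    # clause (iii): loop back, allow eps
        s1, t1, e1 = thompson(r[1], counter)
        edges += e1 + [(s, EPS, s1), (t1, EPS, s1),
                       (t1, EPS, t), (s, EPS, t)]
    return s, t, edges

def accepts(nfa, w):
    s, t, edges = nfa
    def eclose(states):
        while True:
            more = {q for (p, x, q) in edges if p in states and x == EPS}
            if more <= states:
                return states
            states = states | more
    cur = eclose({s})
    for c in w:
        cur = eclose({q for (p, x, q) in edges if p in cur and x == c})
    return t in cur
```

For instance, the NFA built for (a + b)* accepts both ϵ and abba, while the NFA for (a · b*) rejects ba.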

Figure 5.10: A non-minimal DFA for {a, b}* {abb} (states A, B, C, D, E)

We now consider the construction of a regular expression from an NFA.

Proposition 5.8. There is an algorithm, which, given any NFA N, constructs a regular expression R ∈ R(Σ) denoting L(N), i.e., such that L_R = L(N).

As a corollary,

Reg. languages Version 1 ⊆ Reg. languages Version 2.

This is the node elimination algorithm. The general idea is to allow more general labels on the edges of an NFA, namely, regular expressions. Then, such generalized NFA's are simplified by eliminating nodes one at a time, and readjusting labels.

Preprocessing, phase 1:

If necessary, we add a new start state with an ϵ-transition to the old start state, if there are incoming edges into the old start state.

If necessary, we add a new (unique) final state with ϵ-transitions from each of the old final states to the new final state, if there is more than one final state or some outgoing edge from any of the old final states.

At the end of this phase, the start state, say s, is a source (no incoming edges), and the final state, say t, is a sink (no outgoing edges).

Preprocessing, phase 2:

We need to flatten parallel edges. For any pair of states (p, q) (p = q is possible), if there are k edges from p to q labeled u_1, ..., u_k, then we create a single edge labeled with the regular expression

u_1 + · · · + u_k.

For any pair of states (p, q) (p = q is possible) such that there is no edge from p to q, we put an edge labeled ∅.

At the end of this phase, the resulting generalized NFA is such that for any pair of states (p, q) (where p = q is possible), there is a unique edge labeled with some regular expression denoted as R_{p,q}. When R_{p,q} = ∅, this really means that there is no edge from p to q in the original NFA N.

By interpreting each R_{p,q} as a function call (really, a macro) to the NFA N_{p,q} accepting L[R_{p,q}] (constructed using the previous algorithm), we can verify that the original language L(N) is accepted by this new generalized NFA.

Node elimination only applies if the generalized NFA has at least one node distinct from s and t. Pick any node r distinct from s and t. For every pair (p, q) where p ≠ r and q ≠ r, replace the label of the edge from p to q as indicated below:

Figure 5.11: Before eliminating node r: the edge from p to q is labeled R_{p,q}, the edge from p to r is labeled R_{p,r}, the loop on r is labeled R_{r,r}, and the edge from r to q is labeled R_{r,q}.

Figure 5.12: After eliminating node r: the edge from p to q is labeled

R_{p,q} + R_{p,r} R_{r,r}* R_{r,q}.

At the end of this step, delete the node r and all edges adjacent to r.

Note that p = q is possible, in which case the triangle is flat. It is also possible that p = s or q = t. Also, this step is performed for all pairs (p, q), which means that both (p, q) and (q, p) are considered (when p ≠ q). Note that this step only has an effect if there are edges from p to r and from r to q in the original NFA N. Otherwise, r can simply be deleted, as well as the edges adjacent to r. Other simplifications can be made. For example, when R_{r,r} = ∅, we can simplify R_{p,r} R_{r,r}* R_{r,q} to R_{p,r} R_{r,q}. When R_{p,q} = ∅, the new label is just R_{p,r} R_{r,r}* R_{r,q}.

The order in which the nodes are eliminated is irrelevant, although it affects the size of the final expression. The algorithm stops when the only remaining nodes are s and t. Then, the label R of the edge from s to t is a regular expression denoting L(N). For example, let

L = {w ∈ Σ* | w contains an odd number of a's or an odd number of b's}.

An NFA for L after the preprocessing phase is shown in Figure 5.13.

Figure 5.13: NFA for L (after preprocessing phase), with states 0, 1, 2, 3, 4, 5: state 0 is the new start state, with an ϵ-transition to state 1, and state 5 is the new final state, with ϵ-transitions from each of the old final states.

After eliminating node 2, we obtain the generalized NFA of Figure 5.14.

Figure 5.14: NFA for L (after eliminating node 2)

After eliminating node 3, we obtain the generalized NFA of Figure 5.15.

Figure 5.15: NFA for L (after eliminating node 3)

After eliminating node 4, we obtain the generalized NFA of Figure 5.16.

Figure 5.16: NFA for L (after eliminating node 4): state 0 has an ϵ-transition to state 1, and state 1 has a loop labeled S and an edge labeled T to the final state 5,

where

T = a + b + (ab + ba)(aa + bb)*(ϵ + a + b)

and

S = aa + bb + (ab + ba)(aa + bb)*(ab + ba).

Finally, after eliminating node 1, we get:

R = (aa + bb + (ab + ba)(aa + bb)*(ab + ba))* (a + b + (ab + ba)(aa + bb)*(ϵ + a + b)).
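The node-elimination step can be sketched in a few lines of Python. This is an illustration only: edge labels are regular expressions written as plain strings, with "0" standing for the ∅ label and "e" for ϵ (conventions of this sketch, not of the text), and s is assumed to be a source and t a sink, as the preprocessing phases guarantee. The ∅-simplifications described above are applied as labels are combined.

```python
# Illustration of node elimination on a generalized NFA.
# Labels are regex strings; "0" = the empty-set label, "e" = epsilon.

def rx_union(x, y):
    if x == "0": return y              # 0 + R = R
    if y == "0": return x
    return "(" + x + "+" + y + ")"

def rx_cat(x, y):
    if x == "0" or y == "0": return "0"   # R 0 = 0
    if x == "e": return y                 # e R = R
    if y == "e": return x
    return x + y

def rx_star(x):
    return "e" if x in ("0", "e") else "(" + x + ")*"

def node_eliminate(states, labels, s, t):
    labels = dict(labels)              # (p, q) -> label, "0" if absent
    get = lambda p, q: labels.get((p, q), "0")
    for r in [x for x in states if x not in (s, t)]:
        loop = rx_star(get(r, r))
        for p in states:
            for q in states:
                if r in (p, q):
                    continue
                # new label: R_pq + R_pr (R_rr)* R_rq
                via = rx_cat(rx_cat(get(p, r), loop), get(r, q))
                labels[(p, q)] = rx_union(get(p, q), via)
        labels = {k: v for k, v in labels.items() if r not in k}
        states = [x for x in states if x != r]
    return get(s, t)
```

For instance, on the generalized NFA with edges s --ϵ--> 1, a loop a on 1, and 1 --b--> t, eliminating node 1 yields (a)*b.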

5.8 Right-Invariant Equivalence Relations on Σ*

The purpose of this section is to give one more characterization of the regular languages in terms of certain kinds of equivalence relations on strings. Pushing this characterization further, we will be able to show how minimal DFA's can be found.

Let D = (Q, Σ, δ, q_0, F) be a DFA. The DFA D may be redundant, for example, if there are states that are not accessible from the start state. The set Q_r of accessible or reachable states is the subset of Q defined as

Q_r = {p ∈ Q | ∃w ∈ Σ*, δ*(q_0, w) = p}.

If Q ≠ Q_r, we can clean up D by deleting the states in Q − Q_r and restricting the transition function δ to Q_r. This way, we get an equivalent DFA D_r such that L(D) = L(D_r), where all the states of D_r are reachable. From now on, we assume that we are dealing with DFA's such that D = D_r, called trim, or reachable.

Recall that an equivalence relation ≃ on a set A is a relation which is reflexive, symmetric, and transitive. Given any a ∈ A, the set

{b ∈ A | a ≃ b}

is called the equivalence class of a, and it is denoted as [a]_≃, or even as [a]. Recall that for any two elements a, b ∈ A, [a] ∩ [b] = ∅ iff a ̸≃ b, and [a] = [b] iff a ≃ b. The set of equivalence classes associated with the equivalence relation ≃ is a partition Π of A (also denoted as A/≃). This means that it is a family of nonempty pairwise disjoint sets whose union is equal to A itself. The equivalence classes are also called the blocks of the partition Π. The number of blocks in the partition Π is called the index of ≃ (and Π).

Given any two equivalence relations ≃_1 and ≃_2 with associated partitions Π_1 and Π_2, we write ≃_1 ⊆ ≃_2 iff every block of the partition Π_1 is contained in some block of the partition Π_2. Then, every block of the partition Π_2 is the union of blocks of the partition Π_1, and we say that ≃_1 is a refinement of ≃_2 (and similarly, Π_1 is a refinement of Π_2). Note that Π_2 has at most as many blocks as Π_1 does.

We now define an equivalence relation on strings induced by a DFA.
This equivalence is a kind of observational equivalence, in the sense that we decide that two strings u, v are equivalent iff, when feeding first u and then v to the DFA, u and v drive the DFA to the same state. From the point of view of the observer, u and v have the same effect (reaching the same state).

Definition 5.11. Given a DFA D = (Q, Σ, δ, q_0, F), we define the relation ≡_D on Σ* as follows: for any two strings u, v ∈ Σ*,

u ≡_D v iff δ*(q_0, u) = δ*(q_0, v).
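This definition can be explored mechanically: for a trim DFA, the classes of ≡_D are in bijection with the states (Proposition 5.9 below), so one shortest representative string per class can be computed by breadth-first search from the start state. The Python sketch below is an illustration only; the one-letter DFA used in it, which counts a's modulo 3, is a hypothetical example chosen for this sketch.

```python
# Illustration: one shortest representative string per class of ≡_D,
# found by BFS from the start state of a trim DFA.

from collections import deque

def class_representatives(delta, start, alphabet):
    rep = {start: ""}                 # state -> shortest representative
    queue = deque([start])
    while queue:
        p = queue.popleft()
        for a in alphabet:
            q = delta[(p, a)]
            if q not in rep:          # first visit = shortest string
                rep[q] = rep[p] + a
                queue.append(q)
    return rep

# Hypothetical DFA over {a}: states 0, 1, 2 count a's modulo 3.
delta = {(i, "a"): (i + 1) % 3 for i in range(3)}
reps = class_representatives(delta, 0, "a")
```

The three representatives found are ϵ, a, and aa, one per state.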

Example 5.3. We can figure out what the equivalence classes of ≡_D are for the DFA over Σ = {a} whose three states 0, 1, 2 form a cycle on input a (so the state reached by a^k is k mod 3), with 0 both start state and (unique) final state. For example,

aaa ≡_D ϵ, aaaa ≡_D a, aaaaa ≡_D aa.

There are three equivalence classes:

[ϵ], [a], [aa].

Observe that L(D) = [ϵ]. Also, the equivalence classes are in one-to-one correspondence with the states of D.

The relation ≡_D turns out to have some interesting properties. In particular, it is right-invariant, which means that for all u, v, w ∈ Σ*, if u ≡_D v, then uw ≡_D vw.

Proposition 5.9. Given any (trim) DFA D = (Q, Σ, δ, q_0, F), the relation ≡_D is an equivalence relation which is right-invariant and has finite index. Furthermore, if Q has n states, then the index of ≡_D is n, and every equivalence class of ≡_D is a regular language. Finally, L(D) is the union of some of the equivalence classes of ≡_D.

Proof. The fact that ≡_D is an equivalence relation is a trivial verification. To prove that ≡_D is right-invariant, we first prove by induction on the length of v that for all u, v ∈ Σ*, for all p ∈ Q,

δ*(p, uv) = δ*(δ*(p, u), v).

Then, if u ≡_D v, which means that δ*(q_0, u) = δ*(q_0, v), we have

δ*(q_0, uw) = δ*(δ*(q_0, u), w) = δ*(δ*(q_0, v), w) = δ*(q_0, vw),

which means that uw ≡_D vw. Thus, ≡_D is right-invariant. We still have to prove that ≡_D has index n. Define the function f : Σ* → Q such that

f(u) = δ*(q_0, u).

Note that if u ≡_D v, which means that δ*(q_0, u) = δ*(q_0, v), then f(u) = f(v). Thus, the function f : Σ* → Q has the same value on all the strings in some equivalence class [u], so it induces a function f̂ : Π → Q defined such that

f̂([u]) = f(u)

for every equivalence class [u] ∈ Π, where Π = Σ*/≡_D is the partition associated with ≡_D. This function is well defined since f(v) has the same value for all elements v in the equivalence class [u].

However, the function f̂ : Π → Q is injective (one-to-one), since f̂([u]) = f̂([v]) is equivalent to f(u) = f(v) (since by definition of f̂ we have f̂([u]) = f(u) and f̂([v]) = f(v)), which by definition of f means that δ*(q_0, u) = δ*(q_0, v), which means precisely that u ≡_D v, that is, [u] = [v].

Since Q has n states, Π has at most n blocks. Moreover, since every state is accessible, for every q ∈ Q, there is some w ∈ Σ* so that δ*(q_0, w) = q, which shows that f̂([w]) = f(w) = q. Consequently, f̂ is also surjective. But then, being injective and surjective, f̂ is bijective and Π has exactly n blocks.

Every equivalence class of Π is a set of strings of the form

{w ∈ Σ* | δ*(q_0, w) = p},

for some p ∈ Q, which is accepted by the DFA

D_p = (Q, Σ, δ, q_0, {p})

obtained from D by changing F to {p}. Thus, every equivalence class is a regular language. Finally, since

L(D) = {w ∈ Σ* | δ*(q_0, w) ∈ F} = ⋃_{f ∈ F} {w ∈ Σ* | δ*(q_0, w) = f} = ⋃_{f ∈ F} L(D_f),

we see that L(D) is the union of the equivalence classes corresponding to the final states in F.

One should not be too optimistic and hope that every equivalence relation on strings is right-invariant.

Example 5.4. For example, if Σ = {a}, the equivalence relation ≃ given by the partition with the two blocks

{ϵ, a, a⁴, a⁹, a¹⁶, ..., a^{n²}, ... | n ≥ 0} and {a², a³, a⁵, a⁶, a⁷, a⁸, ..., a^m, ... | m is not a square}

is not right-invariant: we have a ≃ a⁴, yet by concatenating on the right with a⁵, since |a a⁵| = 6 and |a⁴ a⁵| = 9, we get a⁶ ̸≃ a⁹, that is, a⁶ and a⁹ are not equivalent. It turns out that the problem is that neither equivalence class is a regular language.

It is worth noting that a right-invariant equivalence relation is not necessarily left-invariant, which means that if u ≃ v then wu ≃ wv.

Example 5.5. For example, if ≃ is given by the four equivalence classes

C_1 = {bb}*, C_2 = {bb}*{a}, C_3 = {bb}*{b}, C_4 = {a, b}* − (C_1 ∪ C_2 ∪ C_3),

then we can check that ≃ is right-invariant by figuring out the inclusions C_i{a} ⊆ C_j and C_i{b} ⊆ C_j, which are recorded in the following table:

      a    b
C_1  C_2  C_3
C_2  C_4  C_4
C_3  C_4  C_1
C_4  C_4  C_4

However, both ϵ, bb ∈ C_1, yet a ϵ = a ∈ C_2 and a bb ∈ C_4, so ≃ is not left-invariant.

The remarkable fact due to Myhill and Nerode is that Proposition 5.9 has a converse. Indeed, given a right-invariant equivalence relation of finite index, it is possible to reconstruct a DFA, and by a suitable choice of final state, every equivalence class is accepted by such a DFA. Let us show how this DFA is constructed using a simple example.

Example 5.6. Consider the equivalence relation ≃ on {a, b}* given by the three equivalence classes

C_1 = {ϵ}, C_2 = {a}{a, b}*, C_3 = {b}{a, b}*.

We leave it as an easy exercise to check that ≃ is right-invariant. For example, if u ≃ v and u, v ∈ C_2, then u = ax and v = ay for some x, y ∈ {a, b}*, so for any w ∈ {a, b}* we have uw = axw and vw = ayw, which means that we also have uw, vw ∈ C_2, thus uw ≃ vw.

For any subset C ⊆ {a, b}* and any string w ∈ {a, b}*, define Cw as the set of strings

Cw = {uw | u ∈ C}.

There are two reasons why a DFA can be recovered from the right-invariant equivalence relation ≃:

(1) For every equivalence class C_i and every string w, there is a unique equivalence class C_j such that C_i w ⊆ C_j. Actually, it is enough to check the above property for strings w of length 1 (i.e., symbols in the alphabet), because the property for arbitrary strings follows by induction.

(2) For every w ∈ Σ* and every class C_i,

C_1 w ⊆ C_i iff w ∈ C_i,

where C_1 is the equivalence class of the empty string.

We can make a table recording these inclusions.

Example 5.7. Continuing Example 5.6, we get:

      a    b
C_1  C_2  C_3
C_2  C_2  C_2
C_3  C_3  C_3

For example, from C_1 = {ϵ} we have C_1{a} = {a} ⊆ C_2 and C_1{b} = {b} ⊆ C_3; for C_2 = {a}{a, b}*, we have C_2{a} = {a}{a, b}*{a} ⊆ C_2 and C_2{b} = {a}{a, b}*{b} ⊆ C_2; and for C_3 = {b}{a, b}*, we have C_3{a} = {b}{a, b}*{a} ⊆ C_3 and C_3{b} = {b}{a, b}*{b} ⊆ C_3.

The key point is that the above table is the transition table of a DFA with start state C_1 = [ϵ]. Furthermore, if C_i (i = 1, 2, 3) is chosen as a single final state, the corresponding DFA D_i accepts C_i. This is the converse of Myhill-Nerode!

Observe that the inclusions C_i w ⊆ C_j may be strict inclusions. For example, C_1{a} = {a} is a proper subset of C_2 = {a}{a, b}*.

Let us do another example.

Example 5.8. Consider the equivalence relation ≃ given by the four equivalence classes

C_1 = {ϵ}, C_2 = {a}, C_3 = {b}+, C_4 = {a}{a, b}+ ∪ {b}+{a}{a, b}*.

We leave it as an easy exercise to check that ≃ is right-invariant. We obtain the following table of inclusions C_i{a} ⊆ C_j and C_i{b} ⊆ C_j:

      a    b
C_1  C_2  C_3
C_2  C_4  C_4
C_3  C_4  C_3
C_4  C_4  C_4

For example, from C_3 = {b}+ we get C_3{a} = {b}+{a} ⊆ C_4, and C_3{b} = {b}+{b} ⊆ C_3.

The above table is the transition function of a DFA with four states and start state C_1. If C_i (i = 1, 2, 3, 4) is chosen as a single final state, the corresponding DFA D_i accepts C_i. Here is the general result.

Proposition 5.10. Given any equivalence relation ≃ on Σ*, if ≃ is right-invariant and has finite index n, then every equivalence class (block) in the partition Π associated with ≃ is a regular language.

Proof. Let C_1, ..., C_n be the blocks of Π, and assume that C_1 = [ϵ] is the equivalence class of the empty string.

First, we claim that for every block C_i and every w ∈ Σ*, there is a unique block C_j such that C_i w ⊆ C_j, where C_i w = {uw | u ∈ C_i}. For every u ∈ C_i, the string uw belongs to one and only one of the blocks of Π, say C_j. For any other string v ∈ C_i, since (by definition) u ≃ v, by right invariance, we get uw ≃ vw, but since uw ∈ C_j and C_j is an equivalence class, we also have vw ∈ C_j. This proves the first claim.

We also claim that for every w ∈ Σ* and every block C_i,

C_1 w ⊆ C_i iff w ∈ C_i.

If C_1 w ⊆ C_i, since C_1 = [ϵ], we have ϵw = w ∈ C_i. Conversely, if w ∈ C_i, for any v ∈ C_1 = [ϵ], since ϵ ≃ v, by right invariance we have w ≃ vw, and thus vw ∈ C_i, which shows that C_1 w ⊆ C_i.

For every class C_k, let

D_k = ({1, ..., n}, Σ, δ, 1, {k}),

where δ(i, a) = j iff C_i{a} ⊆ C_j. We will prove the following equivalence:

δ*(i, w) = j iff C_i w ⊆ C_j.

For this, we prove the following two implications by induction on |w|:

(a) If δ*(i, w) = j, then C_i w ⊆ C_j, and

(b) If C_i w ⊆ C_j, then δ*(i, w) = j.

The base case (w = ϵ) is trivial for both (a) and (b). We leave the proof of the induction step for (a) as an exercise and give the proof of the induction step for (b), because it is more subtle.

Let w = ua, with a ∈ Σ and u ∈ Σ*. If C_i ua ⊆ C_j, then by the first claim, we know that there is a unique block, C_k, such that C_i u ⊆ C_k. Furthermore, there is a unique block, C_h, such that C_k{a} ⊆ C_h, but C_i u ⊆ C_k implies C_i ua ⊆ C_k{a}, so we get C_i ua ⊆ C_h. However, by the uniqueness of the block, C_j, such that C_i ua ⊆ C_j, we must have C_h = C_j. By the induction hypothesis, as C_i u ⊆ C_k, we have

δ*(i, u) = k

and, by definition of δ, as C_k{a} ⊆ C_j (= C_h), we have δ(k, a) = j, so we deduce that

δ*(i, ua) = δ(δ*(i, u), a) = δ(k, a) = j,

as desired. Then, using the equivalence just proved and the second claim, we have

L(D_k) = {w ∈ Σ* | δ*(1, w) ∈ {k}} = {w ∈ Σ* | δ*(1, w) = k} = {w ∈ Σ* | C_1 w ⊆ C_k} = {w ∈ Σ* | w ∈ C_k} = C_k,

proving that every block, C_k, is a regular language.

In general it is false that C_i{a} = C_j for some block C_j, and we can only claim that C_i{a} ⊆ C_j.

We can combine Proposition 5.9 and Proposition 5.10 to get the following characterization of a regular language due to Myhill and Nerode:

Theorem 5.11. (Myhill-Nerode) A language L (over an alphabet Σ) is a regular language iff it is the union of some of the equivalence classes of an equivalence relation ≃ on Σ*, which is right-invariant and has finite index.

Given two DFA's D_1 and D_2, whether or not there is a morphism h : D_1 → D_2 depends on the relationship between ≡_{D_1} and ≡_{D_2}. More specifically, we have the following proposition:

Proposition 5.12. Given two DFA's D_1 and D_2, with D_1 trim, the following properties hold:

(1) There is a DFA morphism h : D_1 → D_2 iff ≡_{D_1} ⊆ ≡_{D_2}.

(2) There is a DFA F-map h : D_1 → D_2 iff

≡_{D_1} ⊆ ≡_{D_2} and L(D_1) ⊆ L(D_2);

(3) There is a DFA B-map h : D_1 → D_2 iff

≡_{D_1} ⊆ ≡_{D_2} and L(D_2) ⊆ L(D_1).

Furthermore, h is surjective iff D_2 is trim.

Theorem 5.11 can also be used to prove that certain languages are not regular. A general scheme (not the only one) goes as follows: if L is not regular, then it must be infinite. Now, we argue by contradiction. If L was regular, then by Myhill-Nerode, there would be some equivalence relation ≃, which is right-invariant and of finite index, and such that L is the union of some of the classes of ≃. Because Σ* is infinite and ≃ has only finitely many equivalence classes, there are strings x, y ∈ Σ* with x ≠ y so that

x ≃ y.

If we can find a third string, z ∈ Σ*, such that

xz ∈ L and yz ∉ L,

then we reach a contradiction. Indeed, by right invariance, from x ≃ y, we get xz ≃ yz. But, L is the union of equivalence classes of ≃, so if xz ∈ L, then we should also have yz ∈ L, contradicting yz ∉ L. Therefore, L is not regular.

Then the scenario is this: to prove that L is not regular, first we check that L is infinite. If so, we try to find three strings x, y, z, where x and y ≠ x are prefixes of strings in L such that x ≃ y, where ≃ is a right-invariant relation of finite index such that L is the union of equivalence classes of ≃ (which must exist by Myhill-Nerode, since we are assuming by contradiction that L is regular), and where z is chosen so that xz ∈ L and yz ∉ L.

Example 5.9. For example, we prove that L = {a^n b^n | n ≥ 1} is not regular. Assuming for the sake of contradiction that L is regular, there is some equivalence relation ≃ which is right-invariant and of finite index and such that L is the union of some of the classes of ≃. Since the sequence

a, aa, aaa, ..., a^i, ...

is infinite and ≃ has a finite number of classes, two of these strings must belong to the same class, which means that a^i ≃ a^j for some i ≠ j. But since ≃ is right invariant, by concatenating with b^i on the right, we see that

a^i b^i ≃ a^j b^i for some i ≠ j.
However a^i b^i ∈ L, and since L is the union of classes of ≃, we also have a^j b^i ∈ L for i ≠ j, which is absurd, given the definition of L. Thus, in fact, L is not regular.
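The scheme above can be checked empirically: for L = {a^n b^n | n ≥ 1}, any two distinct prefixes a^i and a^j are separated by the extension z = b^i. The Python sketch below (an illustration only; the membership test is bounded at a hypothetical max_n) computes such distinguishing extensions.

```python
# Illustration of the non-regularity scheme for L = {a^n b^n | n >= 1}:
# a^i and a^j (i != j) are distinguished by the extension z = b^i, since
# a^i b^i is in L but a^j b^i is not.

def in_L(w, max_n=50):
    # bounded membership test for L = {a^n b^n | n >= 1}
    return any(w == "a" * n + "b" * n for n in range(1, max_n + 1))

def distinguishing_extension(x, y, i):
    z = "b" * i
    return z if in_L(x + z) != in_L(y + z) else None

# witnesses separating a^i from a^j for a few pairs (i, j)
pairs = [(2, 5), (3, 4), (1, 7)]
witnesses = [distinguishing_extension("a" * i, "a" * j, i) for i, j in pairs]
```

Every pair yields a witness, confirming that no right-invariant equivalence of finite index can have a^i ≃ a^j.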

Here is another illustration of the use of the Myhill-Nerode Theorem to prove that a language is not regular.

Example 5.10. We claim that the language L = {a^{n!} | n ≥ 1} is not regular, where n! (n factorial) is given by 0! = 1 and (n + 1)! = (n + 1) n!.

Assume L is regular. Then, there is some equivalence relation ≃ which is right-invariant and of finite index and such that L is the union of some of the classes of ≃. Since the sequence

a, a², ..., a^n, ...

is infinite, two of these strings must belong to the same class, which means that a^p ≃ a^q for some p, q with 1 ≤ p < q. As q! ≥ q for all q ≥ 0 and q > p, we can concatenate on the right with a^{q!−p} and we get

a^p a^{q!−p} ≃ a^q a^{q!−p},

that is,

a^{q!} ≃ a^{q!+q−p}.

Since p < q we have q! < q! + q − p. If we can show that

q! + q − p < (q + 1)!,

we will obtain a contradiction, because then a^{q!+q−p} ∉ L, yet a^{q!+q−p} ≃ a^{q!} and a^{q!} ∈ L, contradicting Myhill-Nerode. Now, as 1 ≤ p < q, we have q − p ≤ q − 1, so if we can prove that

q! + q − p ≤ q! + q − 1 < (q + 1)!,

we will be done. However, q! + q − 1 < (q + 1)! is equivalent to

q − 1 < (q + 1)! − q!,

and since (q + 1)! − q! = (q + 1) q! − q! = q q!, we simply need to prove that

q − 1 < q ≤ q q!,

which holds for q ≥ 1.

There is another version of the Myhill-Nerode Theorem involving congruences which is also quite useful. An equivalence relation ≃ on Σ* is left and right-invariant iff for all x, y, u, v ∈ Σ*,

if x ≃ y then uxv ≃ uyv.

An equivalence relation ≃ on Σ* is a congruence iff for all u_1, u_2, v_1, v_2 ∈ Σ*,

if u_1 ≃ v_1 and u_2 ≃ v_2 then u_1 u_2 ≃ v_1 v_2.

It is easy to prove that an equivalence relation is a congruence iff it is left and right-invariant. For example, assume that ≃ is a left and right-invariant equivalence relation, and assume that

u_1 ≃ v_1 and u_2 ≃ v_2.

By right-invariance applied to u_1 ≃ v_1, we get

u_1 u_2 ≃ v_1 u_2,

and by left-invariance applied to u_2 ≃ v_2 we get

v_1 u_2 ≃ v_1 v_2.

By transitivity, we conclude that

u_1 u_2 ≃ v_1 v_2,

which shows that ≃ is a congruence. Proving that a congruence is left and right-invariant is even easier.

There is a version of Proposition 5.9 that applies to congruences, and for this we define the relation ≈_D as follows: For any (trim) DFA, D = (Q, Σ, δ, q_0, F), for all x, y ∈ Σ*,

x ≈_D y iff (∀q ∈ Q)(δ*(q, x) = δ*(q, y)).

Proposition 5.13. Given any (trim) DFA, D = (Q, Σ, δ, q_0, F), the relation ≈_D is an equivalence relation which is left and right-invariant and has finite index. Furthermore, if Q has n states, then the index of ≈_D is at most n^n and every equivalence class of ≈_D is a regular language. Finally, L(D) is the union of some of the equivalence classes of ≈_D.

Proof. We leave most of the proof of Proposition 5.13 as an exercise. The last two parts of the proposition are proved using the following facts:

(1) Since ≈_D is left and right-invariant and has finite index, in particular, ≈_D is right-invariant and has finite index, so by Proposition 5.10 every equivalence class of ≈_D is regular.

(2) Observe that ≈_D ⊆ ≡_D, since the condition δ*(q, x) = δ*(q, y) holds for every q ∈ Q, so in particular for q = q_0. But then, every equivalence class of ≡_D is the union of equivalence classes of ≈_D and since, by Proposition 5.9, L is the union of equivalence classes of ≡_D, we conclude that L is also the union of equivalence classes of ≈_D. This completes the proof.

Using Proposition 5.13 and Proposition 5.10, we obtain another version of the Myhill-Nerode Theorem.

Theorem 5.14. (Myhill-Nerode, Congruence Version) A language L (over an alphabet Σ) is a regular language iff it is the union of some of the equivalence classes of an equivalence relation ≃ on Σ*, which is a congruence and has finite index.

We now consider an equivalence relation associated with a language L.

5.9 Finding Minimal DFA's

Given any language L (not necessarily regular), we can define an equivalence relation ρ_L on Σ* which is right-invariant, but not necessarily of finite index. The equivalence relation ρ_L is such that L is the union of equivalence classes of ρ_L. Furthermore, when L is regular, the relation ρ_L has finite index. In fact, this index is the size of a smallest DFA accepting L. As a consequence, if L is regular, a simple modification of the proof of Proposition 5.10 applied to ≃ = ρ_L yields a minimal DFA D_{ρ_L} accepting L.

Then, given any trim DFA D accepting L, the equivalence relation ρ_L can be translated to an equivalence relation ≡ on states, in such a way that for all u, v ∈ Σ*,

u ρ_L v iff ϕ(u) ≡ ϕ(v),

where ϕ : Σ* → Q is the function (run the DFA D on u from q_0) given by

ϕ(u) = δ*(q_0, u).

One can then construct a quotient DFA D/≡ whose states are obtained by merging all states in a given equivalence class of states into a single state, and the resulting DFA D/≡ is a minimal DFA. Even though D/≡ appears to depend on D, it is in fact unique, and isomorphic to the abstract DFA D_{ρ_L} induced by ρ_L.

The last step in obtaining the minimal DFA D/≡ is to give a constructive method to compute the state equivalence relation ≡. This can be done by constructing a sequence of approximations ≈_i, where each ≈_{i+1} refines ≈_i. It turns out that if D has n states, then there is some index i_0 ≤ n − 2 such that

≈_j = ≈_{i_0} for all j ≥ i_0 + 1,

and that

≡ = ≈_{i_0}.

Furthermore, ≈_{i+1} can be computed inductively from ≈_i. In summary, we obtain an iterative algorithm for computing ≡ that terminates in at most n − 2 steps.

Definition 5.12. Given any language L (over Σ), we define the right-invariant equivalence ρ_L associated with L as the relation on Σ* defined as follows: for any two strings u, v ∈ Σ*,

u ρ_L v iff ∀w ∈ Σ* (uw ∈ L iff vw ∈ L).

It is clear that the relation ρ_L is an equivalence relation, and it is right-invariant. To show right-invariance, argue as follows: if u ρ_L v, then for any w ∈ Σ*, since u ρ_L v means that

uz ∈ L iff vz ∈ L

for all z ∈ Σ*, in particular the above equivalence holds for all z of the form z = wy for any arbitrary y ∈ Σ*, so we have

uwy ∈ L iff vwy ∈ L

for all y ∈ Σ*, which means that uw ρ_L vw.

It is also clear that L is the union of the equivalence classes of strings in L. This is because if u ∈ L and u ρ_L v, by letting w = ϵ in the definition of ρ_L, we get

u ∈ L iff v ∈ L,

and since u ∈ L, we also have v ∈ L. This implies that if u ∈ L then [u]_{ρ_L} ⊆ L, and so

L = ⋃_{u ∈ L} [u]_{ρ_L}.

Example 5.11. For example, consider the regular language

L = {a} ∪ {b^m | m ≥ 1}.

We leave it as an exercise to show that the equivalence relation ρ_L consists of the four equivalence classes

C_1 = {ϵ}, C_2 = {a}, C_3 = {b}+, C_4 = {a}{a, b}+ ∪ {b}+{a}{a, b}*

encountered earlier in Example 5.8. Observe that

L = C_2 ∪ C_3.

When L is regular, we have the following remarkable result:

Proposition 5.15. Given any regular language L, for any (trim) DFA D = (Q, Σ, δ, q_0, F) such that L = L(D), ρ_L is a right-invariant equivalence relation, and we have ≡_D ⊆ ρ_L. Furthermore, if ρ_L has m classes and Q has n states, then m ≤ n.

Proof. By definition, u ≡_D v iff δ*(q_0, u) = δ*(q_0, v). Since w ∈ L(D) iff δ*(q_0, w) ∈ F, the fact that u ρ_L v can be expressed as

∀w ∈ Σ* (uw ∈ L iff vw ∈ L)
iff ∀w ∈ Σ* (δ*(q_0, uw) ∈ F iff δ*(q_0, vw) ∈ F)
iff ∀w ∈ Σ* (δ*(δ*(q_0, u), w) ∈ F iff δ*(δ*(q_0, v), w) ∈ F),

and if δ*(q_0, u) = δ*(q_0, v), this shows that u ρ_L v. Since the number of classes of ≡_D is n and ≡_D ⊆ ρ_L, the equivalence relation ρ_L has at most as many classes as ≡_D, and m ≤ n.

Proposition 5.15 shows that when L is regular, the index m of ρ_L is finite, and it is a lower bound on the size of all DFA's accepting L. It remains to show that a DFA with m states accepting L exists.

However, going back to the proof of Proposition 5.10, starting with the right-invariant equivalence relation ρ_L of finite index m, if L is the union of the classes C_{i_1}, ..., C_{i_k}, the DFA

D_{ρ_L} = ({1, ..., m}, Σ, δ, 1, {i_1, ..., i_k}),

where δ(i, a) = j iff C_i{a} ⊆ C_j, is such that L = L(D_{ρ_L}). In summary, if L is regular, then the index of ρ_L is equal to the number of states of a minimal DFA for L, and D_{ρ_L} is a minimal DFA accepting L.

Example 5.12. For example, if

L = {a} ∪ {b^m | m ≥ 1},

then we saw in Example 5.11 that ρ_L consists of the four equivalence classes

C_1 = {ϵ}, C_2 = {a}, C_3 = {b}+, C_4 = {a}{a, b}+ ∪ {b}+{a}{a, b}*,

and we showed in Example 5.8 that the transition table of D_{ρ_L} is given by

      a    b
C_1  C_2  C_3
C_2  C_4  C_4
C_3  C_4  C_3
C_4  C_4  C_4

By picking the final states to be C_2 and C_3, we obtain the minimal DFA D_{ρ_L} accepting L = {a} ∪ {b^m | m ≥ 1}.

In the next section, we give an algorithm which allows us to find D_{ρ_L}, given any DFA D accepting L. This algorithm finds which states of D are equivalent.
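The table of Example 5.12 can be read directly as the transition function of D_{ρ_L} and exercised on sample strings. In the Python sketch below (an illustration only), the four classes are named by the integers 1 to 4, the start state is 1 (the class C_1 of ϵ), and the final states are {2, 3}, so the DFA accepts exactly L = {a} ∪ {b^m | m ≥ 1}.

```python
# Illustration: the transition table of Example 5.12 as a DFA with
# start state 1 (= C_1) and final states {2, 3} (= C_2, C_3).

def run(table, start, finals, w):
    q = start
    for c in w:
        q = table[q][c]
    return q in finals

table = {1: {"a": 2, "b": 3},
         2: {"a": 4, "b": 4},
         3: {"a": 4, "b": 3},
         4: {"a": 4, "b": 4}}
```

For instance, the strings a and bbb are accepted, while ϵ and ab are rejected.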

5.10 State Equivalence and Minimal DFA's

The proof of Proposition 5.15 suggests the following definition of an equivalence between states:

Definition 5.13. Given any DFA D = (Q, Σ, δ, q0, F), the relation ≡ on Q, called state equivalence, is defined as follows: for all p, q ∈ Q,

p ≡ q iff ∀w ∈ Σ* (δ*(p, w) ∈ F iff δ*(q, w) ∈ F). (∗)

When p ≡ q, we say that p and q are indistinguishable.

It is trivial to verify that ≡ is an equivalence relation, and that it satisfies the following property:

if p ≡ q then δ(p, a) ≡ δ(q, a), for all a ∈ Σ.

To prove the above, since the condition defining ≡ must hold for all strings w ∈ Σ*, in particular it must hold for all strings of the form w = au with a ∈ Σ and u ∈ Σ*, so if p ≡ q then we have

(∀a ∈ Σ)(∀u ∈ Σ*)(δ*(p, au) ∈ F iff δ*(q, au) ∈ F)
iff (∀a ∈ Σ)(∀u ∈ Σ*)(δ*(δ*(p, a), u) ∈ F iff δ*(δ*(q, a), u) ∈ F)
iff (∀a ∈ Σ)(∀u ∈ Σ*)(δ*(δ(p, a), u) ∈ F iff δ*(δ(q, a), u) ∈ F)
iff (∀a ∈ Σ)(δ(p, a) ≡ δ(q, a)).

Since condition (∗) in Definition 5.13 must hold for w = ϵ, in this case we get

δ*(p, ϵ) ∈ F iff δ*(q, ϵ) ∈ F,

which, since δ*(p, ϵ) = p and δ*(q, ϵ) = q, is equivalent to

p ∈ F iff q ∈ F.

Therefore, if two states p, q are equivalent, then either both p, q ∈ F or both p, q ∉ F. This implies that a final state and a rejecting state are never equivalent.

Example. The reader should check that states A and C in the DFA below are equivalent and that no other distinct states are equivalent.

Figure 5.17: A non-minimal DFA for {a, b}*{abb} (states A, B, C, D, E)

It is illuminating to express state equivalence as the equality of two languages. Given the DFA D = (Q, Σ, δ, q0, F), let D_p = (Q, Σ, δ, p, F) be the DFA obtained from D by redefining the start state to be p. Then, it is clear that

p ≡ q iff L(D_p) = L(D_q).

This simple observation implies that there is an algorithm to test state equivalence. Indeed, we simply have to test whether the DFA's D_p and D_q accept the same language, and this can be done using the cross-product construction. Indeed, L(D_p) = L(D_q) iff L(D_p) − L(D_q) = ∅ and L(D_q) − L(D_p) = ∅. Now, if (D_p × D_q)_{1−2} denotes the cross-product DFA with starting state (p, q) and with final states F × (Q − F), and (D_p × D_q)_{2−1} denotes the cross-product DFA also with starting state (p, q) and with final states (Q − F) × F, we know that

L((D_p × D_q)_{1−2}) = L(D_p) − L(D_q) and L((D_p × D_q)_{2−1}) = L(D_q) − L(D_p),

so all we need to do is to test whether (D_p × D_q)_{1−2} and (D_p × D_q)_{2−1} accept the empty language. However, we know that this is the case iff the set of states reachable from (p, q) in (D_p × D_q)_{1−2} contains no state in F × (Q − F), and the set of states reachable from (p, q) in (D_p × D_q)_{2−1} contains no state in (Q − F) × F.

Actually, the graphs of (D_p × D_q)_{1−2} and (D_p × D_q)_{2−1} are identical, so we only need to check that no state in

(F × (Q − F)) ∪ ((Q − F) × F)

is reachable from (p, q) in that graph. This algorithm to test state equivalence is not the most efficient, but it is quite reasonable (it runs in polynomial time).

If L = L(D), Theorem 5.16 below shows the relationship between ρ_L and ≡ and, more generally, between the DFA D_{ρ_L} and the DFA D/≡ obtained as the quotient of the DFA D modulo the equivalence relation ≡ on Q.

The minimal DFA D/≡ is obtained by merging the states in each block C_i of the partition Π associated with ≡, forming states corresponding to the blocks C_i, and drawing
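The reachability test just described is straightforward to implement. The sketch below (hypothetical names; a DFA is encoded as a dict mapping (state, symbol) pairs to states) explores the product graph from (p, q) by breadth-first search and reports inequivalence as soon as a reachable pair lies in (F × (Q − F)) ∪ ((Q − F) × F). The transition table used for testing is the five-state DFA for {a, b}*{abb} that reappears in the marking-algorithm example later in this section.

```python
from collections import deque

def states_equivalent(delta, finals, sigma, p, q):
    """p and q are equivalent iff no 'bad' pair (exactly one component
    final) is reachable from (p, q) in the cross-product graph."""
    seen = {(p, q)}
    todo = deque([(p, q)])
    while todo:
        r, s = todo.popleft()
        if (r in finals) != (s in finals):   # bad pair: some w distinguishes p, q
            return False
        for a in sigma:
            nxt = (delta[(r, a)], delta[(s, a)])
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return True

# DFA for {a,b}*{abb}: states A..E, final state E.
delta = {('A', 'a'): 'B', ('A', 'b'): 'C',
         ('B', 'a'): 'B', ('B', 'b'): 'D',
         ('C', 'a'): 'B', ('C', 'b'): 'C',
         ('D', 'a'): 'B', ('D', 'b'): 'E',
         ('E', 'a'): 'B', ('E', 'b'): 'C'}
print(states_equivalent(delta, {'E'}, 'ab', 'A', 'C'))   # True: A and C are equivalent
print(states_equivalent(delta, {'E'}, 'ab', 'A', 'B'))   # False
```

Since the product graph has at most |Q|² states, the search terminates, matching the polynomial-time claim above.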

a transition on input a from block C_i to block C_j of Π iff there is a transition q = δ(p, a) from any state p ∈ C_i to any state q ∈ C_j on input a. The start state is the block containing q0, and the final states are the blocks consisting of final states.

Example. For example, consider the DFA D1 accepting L = {a, b}*{abb} shown in Figure 5.18.

Figure 5.18: A non-minimal DFA D1 for L = {a, b}*{abb} (states 0, 1, 2, 3, 4, 5)

This is not a minimal DFA. In fact, 0 ≡ 2 and 3 ≡ 5. Here is the minimal DFA for L:

Figure 5.19: A minimal DFA D2 for L = {a, b}*{abb} (states {0, 2}, 1, {3, 5}, 4)

The minimal DFA D2 is obtained by merging the states in the equivalence class {0, 2} into a single state, similarly merging the states in the equivalence class {3, 5} into a single

state, and drawing the transitions between equivalence classes. We obtain the DFA shown in Figure 5.19.

Formally, the quotient DFA D/≡ is defined such that

D/≡ = (Q/≡, Σ, δ/≡, [q0], F/≡),

where δ/≡([p], a) = [δ(p, a)].

Theorem 5.16. For any (trim) DFA D = (Q, Σ, δ, q0, F) accepting the regular language L = L(D), the function ϕ: Σ* → Q defined such that

ϕ(u) = δ*(q0, u)

satisfies the property

u ρ_L v iff ϕ(u) ≡ ϕ(v) for all u, v ∈ Σ*,

and induces a bijection ϕ̂: Σ*/ρ_L → Q/≡, defined such that

ϕ̂([u]_{ρ_L}) = [δ*(q0, u)].

Furthermore, there is a transition on input a from [u]_{ρ_L} to [v]_{ρ_L} in D_{ρ_L} (that is, [u]_{ρ_L} a ⊆ [v]_{ρ_L}) iff δ(ϕ(u), a) ≡ ϕ(v).

Consequently, ϕ̂ induces an isomorphism of DFA's, ϕ̂: D_{ρ_L} → D/≡ (i.e., an invertible F-map whose inverse is also an F-map; we know from a homework problem that such a map, ϕ̂, must be a proper homomorphism whose inverse is also a proper homomorphism).

Proof. Since ϕ(u) = δ*(q0, u) and ϕ(v) = δ*(q0, v), the fact that ϕ(u) ≡ ϕ(v) can be expressed as

∀w ∈ Σ* (δ*(δ*(q0, u), w) ∈ F iff δ*(δ*(q0, v), w) ∈ F)
iff ∀w ∈ Σ* (δ*(q0, uw) ∈ F iff δ*(q0, vw) ∈ F),

which is exactly u ρ_L v. Therefore,

u ρ_L v iff ϕ(u) ≡ ϕ(v).

From the above, we see that the equivalence class [ϕ(u)] of ϕ(u) does not depend on the choice of the representative in the equivalence class [u]_{ρ_L} of u ∈ Σ*, since for any v ∈ Σ*, if u ρ_L v then ϕ(u) ≡ ϕ(v), so [ϕ(u)] = [ϕ(v)]. Therefore, the function ϕ: Σ* → Q maps

each equivalence class [u]_{ρ_L} modulo ρ_L to the equivalence class [ϕ(u)] modulo ≡, and so the function ϕ̂: Σ*/ρ_L → Q/≡ given by

ϕ̂([u]_{ρ_L}) = [ϕ(u)] = [δ*(q0, u)]

is well-defined. Moreover, ϕ̂ is injective, since ϕ̂([u]) = ϕ̂([v]) iff ϕ(u) ≡ ϕ(v) iff (from above) u ρ_L v iff [u] = [v]. Since every state in Q is accessible, for every q ∈ Q, there is some u ∈ Σ* so that ϕ(u) = δ*(q0, u) = q, so ϕ̂([u]) = [q] and ϕ̂ is surjective. Therefore, we have a bijection ϕ̂: Σ*/ρ_L → Q/≡.

Since ϕ(u) = δ*(q0, u), we have

δ(ϕ(u), a) = δ(δ*(q0, u), a) = δ*(q0, ua) = ϕ(ua),

and thus, δ(ϕ(u), a) ≡ ϕ(v) can be expressed as ϕ(ua) ≡ ϕ(v). By the previous part, this is equivalent to ua ρ_L v, and we claim that this is equivalent to [u]_{ρ_L} a ⊆ [v]_{ρ_L}.

First, if [u]_{ρ_L} a ⊆ [v]_{ρ_L}, then ua ∈ [v]_{ρ_L}, that is, ua ρ_L v. Conversely, if ua ρ_L v, then for every u′ ∈ [u]_{ρ_L}, we have u′ ρ_L u, so by right-invariance we get u′a ρ_L ua, and since ua ρ_L v, we get u′a ρ_L v, that is, u′a ∈ [v]_{ρ_L}. Since u′ ∈ [u]_{ρ_L} is arbitrary, we conclude that [u]_{ρ_L} a ⊆ [v]_{ρ_L}. Therefore, we proved that

δ(ϕ(u), a) ≡ ϕ(v) iff [u]_{ρ_L} a ⊆ [v]_{ρ_L}.

It is then easy to check (do it!) that ϕ̂ induces an F-map of DFA's which is an isomorphism (i.e., an invertible F-map whose inverse is also an F-map), ϕ̂: D_{ρ_L} → D/≡.

Theorem 5.16 shows that the DFA D_{ρ_L} is isomorphic to the DFA D/≡ obtained as the quotient of the DFA D modulo the equivalence relation ≡ on Q. Since D_{ρ_L} is a minimal DFA accepting L, so is D/≡.

Example. Consider the following DFA D, with start state 1 and final states 2 and 3. It is easy to see that L(D) = {a} ∪ {b^m | m ≥ 1}.

It is not hard to check that states 4 and 5 are equivalent, and no other pairs of distinct states are equivalent. The quotient DFA D/≡ is obtained by merging states 4 and 5, and we obtain the following minimal DFA, with start state 1 and final states 2 and 3. This DFA is isomorphic to the DFA D_{ρ_L} obtained earlier.

There are other characterizations of the regular languages. Among those, the characterization in terms of right derivatives is of particular interest because it yields an alternative construction of minimal DFA's.

Definition. Given any language L ⊆ Σ*, for any string u ∈ Σ*, the right derivative of L by u, denoted L/u, is the language

L/u = {w ∈ Σ* | uw ∈ L}.

Theorem. If L ⊆ Σ* is any language, then L is regular iff it has finitely many right derivatives. Furthermore, if L is regular, then all its right derivatives are regular and their number is equal to the number of states of the minimal DFA's for L.

Proof. It is easy to check that

L/u = L/v iff u ρ_L v.

The above shows that ρ_L has a finite number of classes, say m, iff there is a finite number of right derivatives, say n, and if so, m = n. If L is regular, then we know that the number of equivalence classes of ρ_L is the number of states of the minimal DFA's for L, so the number of right derivatives of L is equal to the size of the minimal DFA's for L.

Conversely, if the number of derivatives is finite, say m, then ρ_L has m classes and by Myhill-Nerode, L is regular. It remains to show that if L is regular then every right derivative is regular. Let D = (Q, Σ, δ, q0, F) be a DFA accepting L. If p = δ*(q0, u), then let

D_p = (Q, Σ, δ, p, F),

that is, D with p as start state. It is clear that

L/u = L(D_p),

so L/u is regular for every u ∈ Σ*. Also observe that if |Q| = n, then there are at most n DFA's D_p, so there are at most n right derivatives, which is another proof of the fact that a regular language has a finite number of right derivatives.

If L is regular, then the construction of a minimal DFA for L can be recast in terms of right derivatives. Let L/u1, L/u2, ..., L/um be the set of all the right derivatives of L. Of course, we may assume that u1 = ϵ. We form the DFA whose states are the right derivatives L/u_i. For every state L/u_i, for every a ∈ Σ, there is a transition on input a from L/u_i to L/u_j = L/(u_i a). The start state is L = L/u1 and the final states are the right derivatives L/u_i for which ϵ ∈ L/u_i.

We leave it as an exercise to check that the above DFA accepts L. One way to do this is to recall that L/u = L/v iff u ρ_L v, and to observe that the above construction mimics the construction of D_{ρ_L} as in the Myhill-Nerode proposition (Proposition 5.10). This DFA is minimal since the number of right derivatives is equal to the size of the minimal DFA's for L.

We now return to state equivalence. Note that if F = ∅, then ≡ has a single block (Q), and if F = Q, then ≡ has a single block (F). In the first case, the minimal DFA is the one-state DFA rejecting all strings. In the second case, the minimal DFA is the one-state DFA accepting all strings. When F ≠ ∅ and F ≠ Q, there are at least two states in Q, and ≡ also has at least two blocks, as we shall see shortly.

It remains to compute ≡ explicitly. This is done using a sequence of approximations. In view of the previous discussion, we are assuming that F ≠ ∅ and F ≠ Q, which means that n ≥ 2, where n is the number of states in Q.

Definition. Given any DFA D = (Q, Σ, δ, q0, F), for every i ≥ 0, the relation ≡_i on Q, called i-state equivalence, is defined as follows: for all p, q ∈ Q,

p ≡_i q iff ∀w ∈ Σ*, |w| ≤ i (δ*(p, w) ∈ F iff δ*(q, w) ∈ F).

When p ≡_i q, we say that p and q are i-indistinguishable.
Since state equivalence is defined such that

p ≡ q iff ∀w ∈ Σ* (δ*(p, w) ∈ F iff δ*(q, w) ∈ F),

we note that testing the condition

δ*(p, w) ∈ F iff δ*(q, w) ∈ F

for all strings in Σ* is equivalent to testing the above condition for all strings of length at most i, for all i ≥ 0, i.e.

p ≡ q iff ∀i ≥ 0 ∀w ∈ Σ*, |w| ≤ i (δ*(p, w) ∈ F iff δ*(q, w) ∈ F).

Since ≡_i is defined such that

p ≡_i q iff ∀w ∈ Σ*, |w| ≤ i (δ*(p, w) ∈ F iff δ*(q, w) ∈ F),

we conclude that

p ≡ q iff ∀i ≥ 0 (p ≡_i q).

This identity can also be expressed as

≡ = ⋂_{i ≥ 0} ≡_i.

If we assume that F ≠ ∅ and F ≠ Q, observe that ≡_0 has exactly two equivalence classes F and Q − F, since ϵ is the only string of length 0, and since the condition

δ*(p, ϵ) ∈ F iff δ*(q, ϵ) ∈ F

is equivalent to the condition

p ∈ F iff q ∈ F.

It is also obvious from the definition of ≡_i that

≡_{i+1} ⊆ ≡_i ⊆ ··· ⊆ ≡_1 ⊆ ≡_0.

If this sequence were strictly decreasing for all i ≥ 0, the partition associated with ≡_{i+1} would contain at least one more block than the partition associated with ≡_i, and since we start with a partition with two blocks, the partition associated with ≡_i would have at least i + 2 blocks. But then, for i = n − 1, the partition associated with ≡_{n−1} would have at least n + 1 blocks, which is absurd since Q has only n states. Therefore, there is a smallest integer i0 ≤ n − 2 such that

≡_{i0+1} = ≡_{i0}.

Thus, it remains to compute ≡_{i+1} from ≡_i, which can be done using the following proposition. The proposition also shows that ≡ = ≡_{i0}.

Proposition 5.18. For any (trim) DFA D = (Q, Σ, δ, q0, F), for all p, q ∈ Q,

p ≡_{i+1} q iff p ≡_i q and δ(p, a) ≡_i δ(q, a), for every a ∈ Σ.

Furthermore, if F ≠ ∅ and F ≠ Q, there is a smallest integer i0 ≤ n − 2 such that

≡_{i0+1} = ≡_{i0} = ≡.

Proof. By the definition of the relation ≡_i,

p ≡_{i+1} q iff ∀w ∈ Σ*, |w| ≤ i + 1 (δ*(p, w) ∈ F iff δ*(q, w) ∈ F).

The trick is to observe that the condition

δ*(p, w) ∈ F iff δ*(q, w) ∈ F

holds for all strings of length at most i + 1 iff it holds for all strings of length at most i and for all strings of length between 1 and i + 1. This is expressed as

p ≡_{i+1} q iff ∀w ∈ Σ*, |w| ≤ i (δ*(p, w) ∈ F iff δ*(q, w) ∈ F)
and ∀w ∈ Σ*, 1 ≤ |w| ≤ i + 1 (δ*(p, w) ∈ F iff δ*(q, w) ∈ F).

Obviously, the first condition in the conjunction is p ≡_i q, and since every string w such that 1 ≤ |w| ≤ i + 1 can be written as au where a ∈ Σ and 0 ≤ |u| ≤ i, the second condition in the conjunction can be written as

∀a ∈ Σ ∀u ∈ Σ*, |u| ≤ i (δ*(p, au) ∈ F iff δ*(q, au) ∈ F).

However, δ*(p, au) = δ*(δ(p, a), u) and δ*(q, au) = δ*(δ(q, a), u), so that the above condition is really

∀a ∈ Σ (δ(p, a) ≡_i δ(q, a)).

Thus, we showed that

p ≡_{i+1} q iff p ≡_i q and ∀a ∈ Σ (δ(p, a) ≡_i δ(q, a)).

Thus, if ≡_{i+1} = ≡_i for some i ≥ 0, using induction, we also have ≡_{i+j} = ≡_i for all j ≥ 1. Since ≡ = ⋂_{i ≥ 0} ≡_i and ≡_{i+1} ⊆ ≡_i, and since we know that there is a smallest index, say i0, such that ≡_j = ≡_{i0} for all j ≥ i0 + 1, we have ≡ = ≡_{i0}.

Using Proposition 5.18, we can compute ≡ inductively, starting from ≡_0 = (F, Q − F), and computing ≡_{i+1} from ≡_i, until the sequence of partitions associated with the ≡_i stabilizes. Note that if F = Q or F = ∅, then ≡ = ≡_0, and the inductive characterization of Proposition 5.18 holds trivially.

There are a number of algorithms for computing ≡, or to determine whether p ≡ q for some given p, q ∈ Q. A simple method to compute ≡ is described in Hopcroft and Ullman. The basic idea is to propagate inequivalence, rather than equivalence. The method consists in forming a triangular array corresponding to all unordered pairs (p, q), with p ≠ q (the rows and the columns of this triangular array are indexed by the
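Proposition 5.18 is exactly the loop of a small program: start from the two-block partition (F, Q − F) and refine each state's block by the blocks of its successors until nothing splits. Here is a minimal Python sketch (hypothetical names), run on the five-state DFA for {a, b}*{abb} used in the example that follows.

```python
def equivalence_classes(delta, states, finals, sigma):
    """Compute the state equivalence ~ by the refinement of Prop. 5.18:
    p ~_{i+1} q  iff  p ~_i q and delta(p, a) ~_i delta(q, a) for all a."""
    block = {p: (p in finals) for p in states}       # ~_0 = (F, Q - F)
    while True:
        # p's new block id: its old block plus the blocks of its successors
        refined = {p: (block[p],) + tuple(block[delta[(p, a)]] for a in sigma)
                   for p in states}
        if len(set(refined.values())) == len(set(block.values())):
            break                                     # partition stabilized
        block = refined
    classes = {}
    for p in states:
        classes.setdefault(block[p], set()).add(p)
    return sorted(sorted(c) for c in classes.values())

delta = {('A', 'a'): 'B', ('A', 'b'): 'C',
         ('B', 'a'): 'B', ('B', 'b'): 'D',
         ('C', 'a'): 'B', ('C', 'b'): 'C',
         ('D', 'a'): 'B', ('D', 'b'): 'E',
         ('E', 'a'): 'B', ('E', 'b'): 'C'}
print(equivalence_classes(delta, 'ABCDE', {'E'}, 'ab'))
```

Since refinement can only split blocks, the block count is nondecreasing and bounded by |Q|, so the loop terminates after at most n − 2 rounds, as the proposition guarantees.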

states in Q, where the entries are below the descending diagonal). Initially, the entry (p, q) is marked iff p and q are not 0-equivalent, which means that p and q are not both in F or not both in Q − F. Then, we process every unmarked entry on every row as follows: for any unmarked pair (p, q), we consider the pairs (δ(p, a), δ(q, a)), for all a ∈ Σ. If any pair (δ(p, a), δ(q, a)) is already marked, this means that δ(p, a) and δ(q, a) are inequivalent, and thus p and q are inequivalent, and we mark the pair (p, q). We continue in this fashion, until at the end of a round during which all the rows are processed, nothing has changed. When the algorithm stops, all marked pairs are inequivalent, and all unmarked pairs correspond to equivalent states.

Let us illustrate the above method.

Example. Consider the following DFA accepting {a, b}*{abb}:

        a    b
   A    B    C
   B    B    D
   C    B    C
   D    B    E
   E    B    C

The start state is A, and the set of final states is F = {E}. (This is the DFA displayed in Figure 5.17.)

The initial (half) array is as follows, using × to indicate that the corresponding pair (say, (E, A)) consists of inequivalent states, and · to indicate that nothing is known yet:

   B   ·
   C   ·   ·
   D   ·   ·   ·
   E   ×   ×   ×   ×
       A   B   C   D

After the first round, we have

   B   ·
   C   ·   ·
   D   ×   ×   ×
   E   ×   ×   ×   ×
       A   B   C   D

After the second round, we have

   B   ×
   C   ·   ×
   D   ×   ×   ×
   E   ×   ×   ×   ×
       A   B   C   D

Finally, nothing changes during the third round, and thus, only A and C are equivalent, and we get the four equivalence classes

({A, C}, {B}, {D}, {E}).

We obtain the minimal DFA shown in Figure 5.20.

Figure 5.20: A minimal DFA accepting {a, b}*{abb}

There are ways of improving the efficiency of this algorithm; see Hopcroft and Ullman for such improvements. Fast algorithms for testing whether p ≡ q for some given p, q ∈ Q also exist. One of these algorithms is based on forward closures, following an idea of Knuth. Such an algorithm is related to a fast unification algorithm; see Section 5.12.

5.11 The Pumping Lemma

Another useful tool for proving that languages are not regular is the so-called pumping lemma.

Proposition (Pumping lemma). Given any DFA D = (Q, Σ, δ, q0, F), there is some m ≥ 1 such that for every w ∈ Σ*, if w ∈ L(D) and |w| ≥ m, then there exists a decomposition of w as w = uxv, where

(1) x ≠ ϵ,
(2) ux^i v ∈ L(D), for all i ≥ 0, and
(3) |ux| ≤ m.

Moreover, m can be chosen to be the number of states of the DFA D.

Proof. Let m be the number of states in Q, and let w = w1 ... wn. Since Q contains the start state q0, m ≥ 1. Since |w| ≥ m, we have n ≥ m. Since w ∈ L(D), let (q0, q1, ..., qn) be the sequence of states in the accepting computation of w (where qn ∈ F). Consider the subsequence

(q0, q1, ..., qm).

This sequence contains m + 1 states, but there are only m states in Q, and thus we have qi = qj, for some i, j such that 0 ≤ i < j ≤ m. Then, letting u = w1 ... wi, x = w_{i+1} ... wj, and v = w_{j+1} ... wn, it is clear that the conditions of the proposition hold.

An important consequence of the pumping lemma is that if a DFA D has m states and if there is some string w ∈ L(D) such that |w| ≥ m, then L(D) is infinite. Indeed, by the pumping lemma, w ∈ L(D) can be written as w = uxv with x ≠ ϵ, and ux^i v ∈ L(D) for all i ≥ 0. Since x ≠ ϵ, we have |x| > 0, so for all i, j ≥ 0 with i < j we have

|ux^i v| < |ux^i v| + (j − i)|x| = |ux^j v|,

which implies that ux^i v ≠ ux^j v for all i < j, and the set of strings

{ux^i v | i ≥ 0} ⊆ L(D)

is an infinite subset of L(D), which is itself infinite.

As a consequence, if L(D) is finite, there are no strings w in L(D) such that |w| ≥ m. In this case, since the premise of the pumping lemma is false, the pumping lemma holds vacuously; that is, if L(D) is finite, the pumping lemma yields no information.

Another corollary of the pumping lemma is that there is a test to decide whether a DFA D accepts an infinite language L(D).

Proposition. Let D be a DFA with m states. The language L(D) accepted by D is infinite iff there is some string w ∈ L(D) such that m ≤ |w| < 2m.
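The proposition gives a (naive but correct) decision procedure: search for an accepted string whose length falls in [m, 2m). The brute-force sketch below (hypothetical names; exponential in m, so only for tiny DFAs) illustrates it on two machines: the five-state DFA for {a, b}*{abb}, whose language is infinite, and a three-state DFA (one of them a dead state) accepting only the string a.

```python
from itertools import product

def accepts(delta, q0, finals, w):
    q = q0
    for a in w:
        q = delta[(q, a)]
    return q in finals

def is_infinite(delta, states, q0, finals, sigma):
    """L(D) is infinite iff D accepts some w with m <= |w| < 2m, m = |Q|."""
    m = len(states)
    return any(accepts(delta, q0, finals, w)
               for n in range(m, 2 * m)
               for w in product(sigma, repeat=n))

# Infinite language {a,b}*{abb} (5 states).
d1 = {('A', 'a'): 'B', ('A', 'b'): 'C',
      ('B', 'a'): 'B', ('B', 'b'): 'D',
      ('C', 'a'): 'B', ('C', 'b'): 'C',
      ('D', 'a'): 'B', ('D', 'b'): 'E',
      ('E', 'a'): 'B', ('E', 'b'): 'C'}
print(is_infinite(d1, 'ABCDE', 'A', {'E'}, 'ab'))   # True

# Finite language {a}: states 0 (start), 1 (final), 2 (dead).
d2 = {(0, 'a'): 1, (0, 'b'): 2,
      (1, 'a'): 2, (1, 'b'): 2,
      (2, 'a'): 2, (2, 'b'): 2}
print(is_infinite(d2, [0, 1, 2], 0, {1}, 'ab'))     # False
```

A practical implementation would instead look for a cycle among useful states; the enumeration above merely mirrors the statement of the proposition.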

If L(D) is infinite, there are strings of length ≥ m in L(D), but a priori there is no guarantee that there are short strings w in L(D), that is, strings whose length is uniformly bounded by some function of m independent of D. The pumping lemma ensures that there are such strings, and the function is m ↦ 2m.

Typically, the pumping lemma is used to prove that a language is not regular. The method is to proceed by contradiction, i.e., to assume (contrary to what we wish to prove) that a language L is indeed regular, and derive a contradiction of the pumping lemma. Thus, it would be helpful to see what the negation of the pumping lemma is, and for this, we first state the pumping lemma as a logical formula. We will use the following abbreviations:

nat = {0, 1, 2, ...},
pos = {1, 2, ...},
A ≡ w = uxv,
B ≡ x ≠ ϵ,
C ≡ |ux| ≤ m,
P ≡ ∀i: nat (ux^i v ∈ L(D)).

The pumping lemma can be stated as

∀D: DFA ∃m: pos ∀w: Σ* ((w ∈ L(D) ∧ |w| ≥ m) ⊃ (∃u, x, v: Σ* A ∧ B ∧ C ∧ P)).

Recalling that

¬(A ∧ B ∧ C ∧ P) ≡ ¬(A ∧ B ∧ C) ∨ ¬P ≡ (A ∧ B ∧ C) ⊃ ¬P

and

¬(R ⊃ S) ≡ R ∧ ¬S,

the negation of the pumping lemma can be stated as

∃D: DFA ∀m: pos ∃w: Σ* ((w ∈ L(D) ∧ |w| ≥ m) ∧ (∀u, x, v: Σ* (A ∧ B ∧ C) ⊃ ¬P)).

Since

¬P ≡ ∃i: nat (ux^i v ∉ L(D)),

in order to show that the pumping lemma is contradicted, one needs to show that for some DFA D, for every m ≥ 1, there is some string w ∈ L(D) of length at least m, such that for every possible decomposition w = uxv satisfying the constraints x ≠ ϵ and |ux| ≤ m, there is some i ≥ 0 such that ux^i v ∉ L(D).

When proceeding by contradiction, we have a language L that we are (wrongly) assuming to be regular, and we can use any DFA D accepting L. The creative part of the argument is to pick the right w ∈ L (making no assumption on m other than |w| ≥ m).

As an illustration, let us use the pumping lemma to prove that L1 = {a^n b^n | n ≥ 1} is not regular. The usefulness of the condition |ux| ≤ m lies in the fact that it reduces the number of legal decompositions uxv of w. We proceed by contradiction. Thus, let us assume that L1 = {a^n b^n | n ≥ 1} is regular. If so, it is accepted by some DFA D. Now, we wish to contradict the pumping lemma. For every m ≥ 1, let

w = a^m b^m.

Clearly, w = a^m b^m ∈ L1 and |w| ≥ m. Then, every legal decomposition u, x, v of w is such that

w = (a ··· a)(a ··· a)(a ··· a b ··· b)
        u        x            v

where x ≠ ϵ and x ends within the a's, since |ux| ≤ m. Since x ≠ ϵ, the string uxxv is of the form a^n b^m where n > m, and thus uxxv ∉ L1, contradicting the pumping lemma.

Let us consider two more examples. Let

L2 = {a^m b^n | 1 ≤ m < n}.

We claim that L2 is not regular. Our first proof uses the pumping lemma. For any m ≥ 1, pick w = a^m b^{m+1}. We have w ∈ L2 and |w| ≥ m, so we need to contradict the pumping lemma. Every legal decomposition u, x, v of w is such that

w = (a ··· a)(a ··· a)(a ··· a b ··· b)
        u        x            v

where x ≠ ϵ and x ends within the a's, since |ux| ≤ m. Since x ≠ ϵ and x consists of a's, the string ux^2 v = uxxv contains at least m + 1 a's and still m + 1 b's, so ux^2 v ∉ L2, contradicting the pumping lemma.

Our second proof uses Myhill-Nerode. Let ≃ be a right-invariant equivalence relation of finite index such that L2 is the union of classes of ≃. If we consider the infinite sequence

a, a^2, ..., a^n, ...

since ≃ has a finite number of classes, there are two strings a^m and a^n with m < n such that a^m ≃ a^n. By right-invariance, by concatenating on the right with b^n, we obtain a^m b^n ≃ a^n b^n, and since m < n we have a^m b^n ∈ L2 but a^n b^n ∉ L2, a contradiction.

Let us now consider the language

L3 = {a^m b^n | m ≠ n}.

This time let us begin by using Myhill-Nerode to prove that L3 is not regular. The proof is the same as before: we obtain a^m b^n ≃ a^n b^n, and the contradiction is that a^m b^n ∈ L3 and a^n b^n ∉ L3.

Let us now try to use the pumping lemma to prove that L3 is not regular. For any m ≥ 1, pick w = a^m b^{m+1} ∈ L3. As in the previous case, every legal decomposition u, x, v of w is such that

w = (a ··· a)(a ··· a)(a ··· a b ··· b)
        u        x            v

where x ≠ ϵ and x ends within the a's, since |ux| ≤ m. However, this time we have a problem, namely that we know that x is a nonempty string of a's but we don't know how many, so we can't guarantee that pumping up x will yield exactly the string a^{m+1} b^{m+1}. We made the wrong choice for w. There is a choice that will work, but it is a bit tricky.

Fortunately, there is another simpler approach. Recall that the regular languages are closed under the boolean operations (union, intersection and complementation). Thus, L3 is not regular iff its complement L̄3 is not regular. Observe that L̄3 contains {a^n b^n | n ≥ 1}, which we showed to be nonregular. But there is another problem, which is that L̄3 contains other strings besides strings of the form a^n b^n, for example strings of the form b^m a^n with m, n > 0.

Again, we can take care of this difficulty using the closure operations of the regular languages. If we can find a regular language R such that L̄3 ∩ R is not regular, then L̄3 itself is not regular, since otherwise, as L̄3 and R are regular, L̄3 ∩ R would also be regular. In our case, we can use R = {a}^+{b}^+ to obtain

L̄3 ∩ {a}^+{b}^+ = {a^n b^n | n ≥ 1}.

Since {a^n b^n | n ≥ 1} is not regular, we reached our final contradiction. Observe how we use the language R to clean up L̄3 by intersecting it with R.

To complete a direct proof using the pumping lemma, the reader should try w = a^{m!} b^{(m+1)!}.

The use of the closure operations of the regular languages is often a quick way of showing that a language L is not regular, by reducing the problem of proving that L is not regular to the problem of proving that some well-known language is not regular.

5.12 A Fast Algorithm for Checking State Equivalence Using a Forward-Closure

Given two states p, q ∈ Q, if p ≡ q, then we know that δ(p, a) ≡ δ(q, a), for all a ∈ Σ. This suggests a method for testing whether two distinct states p, q are equivalent.
Starting with the relation R = {(p, q)}, construct the smallest equivalence relation R† containing R with the property that whenever (r, s) ∈ R†, then (δ(r, a), δ(s, a)) ∈ R†, for all a ∈ Σ. If we ever encounter a pair (r, s) such that r ∈ F and s ∉ F, or r ∉ F and s ∈ F, then r and s are inequivalent, and so are p and q. Otherwise, it can be shown that p and q are indeed equivalent. Thus, testing for the equivalence of two states reduces to finding an efficient

method for computing the forward closure of a relation defined on the set of states of a DFA. Such a method was worked out by John Hopcroft and Richard Karp and published in a 1971 Cornell technical report. This method is based on an idea of Donald Knuth for solving Exercise 11 of The Art of Computer Programming, Vol. 1, second edition, 1973; a sketch of the solution for this exercise is given on page 594. As far as I know, Hopcroft and Karp's method was never published in a journal, but a simple recursive algorithm does appear on page 144 of Aho, Hopcroft and Ullman's The Design and Analysis of Computer Algorithms, first edition, 1974. Essentially the same idea was used by Paterson and Wegman to design a fast unification algorithm (in 1978).

We make a few definitions. A relation S ⊆ Q × Q is a forward closure iff it is an equivalence relation and whenever (r, s) ∈ S, then (δ(r, a), δ(s, a)) ∈ S, for all a ∈ Σ. The forward closure of a relation R ⊆ Q × Q is the smallest equivalence relation R† containing R which is forward closed.

We say that a forward closure S is good iff whenever (r, s) ∈ S, then good(r, s), where good(r, s) holds iff either both r, s ∈ F, or both r, s ∉ F. Obviously, bad(r, s) iff ¬good(r, s).

Given any relation R ⊆ Q × Q, recall that the smallest equivalence relation containing R is the relation (R ∪ R^{−1})* (where R^{−1} = {(q, p) | (p, q) ∈ R}, and (R ∪ R^{−1})* is the reflexive and transitive closure of (R ∪ R^{−1})); denote this equivalence closure of R by R^≈. The forward closure of R can be computed inductively by defining the sequence of relations R_i ⊆ Q × Q as follows:

R_0 = R^≈
R_{i+1} = (R_i ∪ {(δ(r, a), δ(s, a)) | (r, s) ∈ R_i, a ∈ Σ})^≈.

It is not hard to prove that R_{i0+1} = R_{i0} for some least i0, and that R† = R_{i0} is the smallest forward closure containing R. The following two facts can also be established.

(a) If R† is good, then p ≡ q.

(b) If p ≡ q, then

R† ⊆ ≡, (5.1)

that is, equation (5.1) holds. This implies that R† is good.

As a consequence, we obtain the correctness of our procedure: p ≡ q iff the forward closure R† of the relation R = {(p, q)} is good.
In practice, we maintain a partition Π representing the equivalence relation that we are closing under forward closure. We add each new pair (δ(r, a), δ(s, a)) one at a time, and immediately form the smallest equivalence relation containing the new relation. If δ(r, a) and δ(s, a) already belong to the same block of Π, we consider another pair; else we merge the blocks corresponding to δ(r, a) and δ(s, a), and then consider another pair.

The algorithm is recursive, but it can easily be implemented using a stack. To manipulate partitions efficiently, we represent them as lists of trees (forests). Each equivalence class C in the partition Π is represented by a tree structure consisting of nodes and parent pointers, with the pointers from the sons of a node to the node itself. The root has a null pointer. Each node also maintains a counter keeping track of the number of nodes in the subtree rooted at that node.

Note that pointers can be avoided. We can represent a forest of n nodes as a list of n pairs of the form (father, count). If (father, count) is the i-th pair in the list, then father = 0 iff node i is a root node; otherwise, father is the index of the node in the list which is the parent of node i. The number count is the total number of nodes in the tree rooted at the i-th node. For example, the following list of nine nodes

((0, 3), (0, 2), (1, 1), (0, 2), (0, 2), (1, 1), (2, 1), (4, 1), (5, 1))

represents the forest consisting of the four trees shown in Figure 5.21.

Figure 5.21: A forest of four trees

Two functions union and find are defined as follows. Given a state p, find(p, Π) finds the root of the tree containing p as a node (not necessarily a leaf). Given two root nodes p, q, union(p, q, Π) forms a new partition by merging the two trees with roots p and q as follows: if the counter of p is smaller than that of q, then let the root of p point to q, else let the root of q point to p.

For example, given the two trees shown on the left in Figure 5.22, find(6, Π) returns 3 and find(8, Π) returns 4. Then union(3, 4, Π) yields the tree shown on the right in Figure 5.22.

Figure 5.22: Applying the function union to the trees rooted at 3 and 4

In order to speed up the algorithm, using an idea due to Tarjan, we can modify find as follows: during a call find(p, Π), as we follow the path from p to the root r of the tree containing p, we redirect the parent pointer of every node q on the path from p (including p itself) to r (we perform path compression). For example, applying find(8, Π) to the tree shown on the right in Figure 5.22 yields the tree shown in Figure 5.23.

Figure 5.23: The result of applying find with path compression

Then, the algorithm is as follows:
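The (father, count) representation and both operations fit in a few lines. The sketch below is a hypothetical Python encoding (dicts keyed by node number, with father = 0 meaning "root", as in the text) of find with path compression and union by size, exercised on the nine-node forest listed above.

```python
def find(p, parent):
    """Return the root of p's tree, compressing the path on the way (Tarjan)."""
    root = p
    while parent[root] != 0:
        root = parent[root]
    while parent[p] != 0:        # second pass: point every node on the path at root
        parent[p], p = root, parent[p]
    return root

def union(p, q, parent, count):
    """Merge the trees rooted at p and q: the smaller tree is hung under
    the root of the larger one, and the size counter is updated."""
    if count[p] < count[q]:
        p, q = q, p
    parent[q] = p
    count[p] += count[q]

# The nine-node forest ((0,3),(0,2),(1,1),(0,2),(0,2),(1,1),(2,1),(4,1),(5,1)).
parent = {1: 0, 2: 0, 3: 1, 4: 0, 5: 0, 6: 1, 7: 2, 8: 4, 9: 5}
count  = {1: 3, 2: 2, 3: 1, 4: 2, 5: 2, 6: 1, 7: 1, 8: 1, 9: 1}

print(find(7, parent))                                   # 2: node 7 hangs under root 2
union(find(1, parent), find(8, parent), parent, count)   # merge the trees of 1 and 8
print(find(8, parent))                                   # 1: 8's tree (root 4) joined 1's
```

With path compression and union by size, a sequence of operations runs in nearly constant amortized time per operation.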

function unif[p, q, Π, dd]: flag;
begin
  trans := left(dd); ff := right(dd);
  pq := (p, q); st := (pq); flag := 1;
  k := Length(first(trans));
  while st ≠ () ∧ flag ≠ 0 do
    uv := top(st); uu := left(uv); vv := right(uv); pop(st);
    if bad(ff, uv) = 1 then
      flag := 0
    else
      u := find(uu, Π); v := find(vv, Π);
      if u ≠ v then
        union(u, v, Π);
        for i = 1 to k do
          u1 := delta(trans, uu, k − i + 1);
          v1 := delta(trans, vv, k − i + 1);
          uv := (u1, v1); push(st, uv)
        endfor
      endif
    endif
  endwhile
end

The initial partition Π is the identity relation on Q, i.e., it consists of the blocks {q} for all states q ∈ Q. The algorithm uses a stack st. We are assuming that the DFA dd is specified by a list of two sublists, the first list, denoted left(dd) in the pseudo-code above, being a representation of the transition function, and the second one, denoted right(dd), the set of final states. The transition function itself is a list of lists, where the i-th list represents the i-th row of the transition table for dd. The function delta is such that delta(trans, i, j) returns the j-th state in the i-th row of the transition table of dd.

For example, we have the DFA

dd = (((2, 3), (2, 4), (2, 3), (2, 5), (2, 3), (7, 6), (7, 8), (7, 9), (7, 6)), (5, 9))

consisting of 9 states labeled 1, ..., 9, and two final states 5 and 9, shown in Figure 5.24. Also, the alphabet has two letters, since every row in the transition table consists of two entries. For example, the two transitions from state 3 are given by the pair (2, 3), which indicates that δ(3, a) = 2 and δ(3, b) = 3.

The sequence of steps performed by the algorithm starting with p = 1 and q = 6 is shown below. At every step, we show the current pair of states, the partition, and the stack.

Figure 5.24: Testing state equivalence in a DFA

p = 1, q = 6, Π = {{1, 6}, {2}, {3}, {4}, {5}, {7}, {8}, {9}}, st = {{1, 6}}

Figure 5.25: Testing state equivalence in a DFA

p = 2, q = 7, Π = {{1, 6}, {2, 7}, {3}, {4}, {5}, {8}, {9}}, st = {{3, 6}, {2, 7}}

Figure 5.26: Testing state equivalence in a DFA

p = 4, q = 8, Π = {{1, 6}, {2, 7}, {3}, {4, 8}, {5}, {9}}, st = {{3, 6}, {4, 8}}

Figure 5.27: Testing state equivalence in a DFA

p = 5, q = 9, Π = {{1, 6}, {2, 7}, {3}, {4, 8}, {5, 9}}, st = {{3, 6}, {5, 9}}

Figure 5.28: Testing state equivalence in a DFA

p = 3, q = 6, Π = {{1, 3, 6}, {2, 7}, {4, 8}, {5, 9}}, st = {{3, 6}, {3, 6}}

Since states 3 and 6 belong to the first block of the partition, the algorithm terminates. Since no block of the partition contains a bad pair, the states p = 1 and q = 6 are equivalent.

Let us now test whether the states p = 3 and q = 7 are equivalent.

Figure 5.29: Testing state equivalence in a DFA

p = 3, q = 7, Π = {{1}, {2}, {3, 7}, {4}, {5}, {6}, {8}, {9}}, st = {{3, 7}}

Figure 5.30: Testing state equivalence in a DFA

p = 2, q = 7, Π = {{1}, {2, 3, 7}, {4}, {5}, {6}, {8}, {9}}, st = {{3, 8}, {2, 7}}

Figure 5.31: Testing state equivalence in a DFA

p = 4, q = 8, Π = {{1}, {2, 3, 7}, {4, 8}, {5}, {6}, {9}}, st = {{3, 8}, {4, 8}}

Figure 5.32: Testing state equivalence in a DFA

p = 5, q = 9, Π = {{1}, {2, 3, 7}, {4, 8}, {5, 9}, {6}}, st = {{3, 8}, {5, 9}}

Figure 5.33: Testing state equivalence in a DFA

p = 3, q = 6, Π = {{1}, {2, 3, 6, 7}, {4, 8}, {5, 9}}, st = {{3, 8}, {3, 6}}

Figure 5.34: Testing state equivalence in a DFA

p = 3, q = 8, Π = {{1}, {2, 3, 4, 6, 7, 8}, {5, 9}}, st = {{3, 8}}

Figure 5.35: Testing state equivalence in a DFA

p = 3, q = 9, Π = {{1}, {2, 3, 4, 6, 7, 8}, {5, 9}}, st = {{3, 9}}

Since the pair (3, 9) is a bad pair, the algorithm stops, and the states p = 3 and q = 7 are inequivalent.


Chapter 6

Context-Free Grammars, Context-Free Languages, Parse Trees and Ogden's Lemma

6.1 Context-Free Grammars

A context-free grammar basically consists of a finite set of grammar rules. In order to define grammar rules, we assume that we have two kinds of symbols: the terminals, which are the symbols of the alphabet underlying the languages under consideration, and the nonterminals, which behave like variables ranging over strings of terminals. A rule is of the form A → α, where A is a single nonterminal, and the right-hand side α is a string of terminal and/or nonterminal symbols. As usual, first we need to define what the object is (a context-free grammar), and then we need to explain how it is used. Unlike automata, grammars are used to generate strings, rather than recognize strings.

Definition 6.1. A context-free grammar (for short, CFG) is a quadruple G = (V, Σ, P, S), where

V is a finite set of symbols called the vocabulary (or set of grammar symbols);
Σ ⊆ V is the set of terminal symbols (for short, terminals);
S ∈ (V − Σ) is a designated symbol called the start symbol;
P ⊆ (V − Σ) × V* is a finite set of productions (or rewrite rules, or rules).

The set N = V − Σ is called the set of nonterminal symbols (for short, nonterminals). Thus, P ⊆ N × V*, and every production ⟨A, α⟩ is also denoted as A → α. A production of the form A → ϵ is called an epsilon rule, or null rule.
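Definition 6.1 translates directly into data. The sketch below is a hypothetical Python encoding (a production ⟨A, α⟩ is a pair of strings) of the grammar with rules E → aEb and E → ab (Example 1 below), together with a helper that applies a rule at the leftmost occurrence of its nonterminal, which is how productions are used as rewrite rules.

```python
# A CFG per Definition 6.1: G = (V, Sigma, P, S); nonterminals are V - Sigma.
Sigma = {'a', 'b'}
V = Sigma | {'E'}
S = 'E'
P = [('E', 'aEb'), ('E', 'ab')]

# Well-formedness: the start symbol and all left-hand sides are nonterminals.
assert S in V - Sigma and all(A in V - Sigma for A, _ in P)

def rewrite_leftmost(s, prod):
    """Apply production A -> gamma at the leftmost occurrence of A in s."""
    A, gamma = prod
    i = s.index(A)
    return s[:i] + gamma + s[i + 1:]

# Generate a^3 b^3:  E => aEb => aaEbb => aaabbb
s = S
s = rewrite_leftmost(s, P[0])    # 'aEb'
s = rewrite_leftmost(s, P[0])    # 'aaEbb'
s = rewrite_leftmost(s, P[1])    # 'aaabbb'
print(s)                         # aaabbb
```

Each call performs one rewrite step; chaining them is exactly the derivation relation made precise in the next section.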

Remark: Context-free grammars are sometimes defined as G = (V_N, V_T, P, S). The correspondence with our definition is that Σ = V_T and N = V_N, so that V = V_N ∪ V_T. Thus, in this other definition, it is necessary to assume that V_T ∩ V_N = ∅.

Example 1. G₁ = ({E, a, b}, {a, b}, P, E), where P is the set of rules

E → aEb,
E → ab.

As we will see shortly, this grammar generates the language L₁ = {aⁿbⁿ | n ≥ 1}, which is not regular.

Example 2. G₂ = ({E, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E), where P is the set of rules

E → E + E,
E → E ∗ E,
E → (E),
E → a.

This grammar generates a set of arithmetic expressions.

6.2 Derivations and Context-Free Languages

The productions of a grammar are used to derive strings. In this process, the productions are used as rewrite rules. Formally, we define the derivation relation associated with a context-free grammar. First, let us review the concepts of transitive closure and reflexive and transitive closure of a binary relation.

Given a set A, a binary relation R on A is any set of ordered pairs, i.e. R ⊆ A × A. For short, instead of binary relation, we often simply say relation. Given any two relations R, S on A, their composition R ∘ S is defined as

R ∘ S = {(x, y) ∈ A × A | ∃z ∈ A, (x, z) ∈ R and (z, y) ∈ S}.

The identity relation I_A on A is the relation defined such that

I_A = {(x, x) | x ∈ A}.

For short, we often denote I_A as I. Note that

R ∘ I = I ∘ R = R

for every relation R on A. Given a relation R on A, for any n ≥ 0 we define Rⁿ as follows:

R⁰ = I,
Rⁿ⁺¹ = Rⁿ ∘ R.
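For a finite relation, the closure operations just reviewed can be computed directly by iterating composition until nothing new is added. A minimal sketch (not from the notes), representing a relation as a Python set of pairs:

```python
def compose(R, S):
    """Composition R ∘ S = {(x, y) | ∃z, (x, z) ∈ R and (z, y) ∈ S}."""
    return {(x, y) for (x, z1) in R for (z2, y) in S if z1 == z2}

def transitive_closure(R):
    """R⁺ = union of Rⁿ for n ≥ 1, computed by iterating until stable."""
    closure = set(R)
    while True:
        new = closure | compose(closure, R)
        if new == closure:
            return closure
        closure = new

R = {(1, 2), (2, 3), (3, 4)}
print(sorted(transitive_closure(R)))  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

The reflexive and transitive closure R* is then obtained by adding the identity pairs on the elements of interest.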

It is obvious that R¹ = R. It is also easily verified by induction that Rⁿ ∘ R = R ∘ Rⁿ. The transitive closure R⁺ of the relation R is defined as

R⁺ = ⋃_{n≥1} Rⁿ.

It is easily verified that R⁺ is the smallest transitive relation containing R, and that (x, y) ∈ R⁺ iff there is some n ≥ 1 and some x₀, x₁, ..., xₙ ∈ A such that x₀ = x, xₙ = y, and (xᵢ, xᵢ₊₁) ∈ R for all i, 0 ≤ i ≤ n − 1. The transitive and reflexive closure R* of the relation R is defined as

R* = ⋃_{n≥0} Rⁿ.

Clearly, R* = R⁺ ∪ I. It is easily verified that R* is the smallest transitive and reflexive relation containing R.

Definition 6.2. Given a context-free grammar G = (V, Σ, P, S), the (one-step) derivation relation =⇒_G associated with G is the binary relation =⇒_G ⊆ V* × V* defined as follows: for all α, β ∈ V*, we have

α =⇒_G β

iff there exist λ, ρ ∈ V*, and some production (A → γ) ∈ P, such that

α = λAρ and β = λγρ.

The transitive closure of =⇒_G is denoted as =⇒⁺_G and the reflexive and transitive closure of =⇒_G is denoted as =⇒*_G. When the grammar G is clear from the context, we usually omit the subscript G in =⇒_G, =⇒⁺_G, and =⇒*_G.

A string α ∈ V* such that S =⇒* α is called a sentential form, and a string w ∈ Σ* such that S =⇒* w is called a sentence. A derivation α =⇒* β involving n steps is denoted as α =⇒ⁿ β.

Note that a derivation step α =⇒_G β is rather nondeterministic. Indeed, one can choose among various occurrences of nonterminals A in α, and also among various productions A → γ with left-hand side A.

For example, using the grammar G₁ = ({E, a, b}, {a, b}, P, E), where P is the set of rules

E → aEb,
E → ab,

every derivation from E is of the form

E =⇒ⁿ aⁿEbⁿ =⇒ aⁿabbⁿ = aⁿ⁺¹bⁿ⁺¹,

or

E =⇒ⁿ aⁿEbⁿ =⇒ aⁿaEbbⁿ = aⁿ⁺¹Ebⁿ⁺¹,

where n ≥ 0.

Grammar G₁ is very simple: every string aⁿbⁿ has a unique derivation. This is usually not the case. For example, using the grammar G₂ = ({E, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E), where P is the set of rules

E → E + E,
E → E ∗ E,
E → (E),
E → a,

the string a + a ∗ a has the following distinct derivations, where the boldface indicates which occurrence of E is rewritten:

E =⇒ E ∗ E =⇒ E + E ∗ E =⇒ a + E ∗ E =⇒ a + a ∗ E =⇒ a + a ∗ a,

and

E =⇒ E + E =⇒ a + E =⇒ a + E ∗ E =⇒ a + a ∗ E =⇒ a + a ∗ a.

In the above derivations, the leftmost occurrence of a nonterminal is chosen at each step. Such derivations are called leftmost derivations. We could systematically rewrite the rightmost occurrence of a nonterminal, getting rightmost derivations. The string a + a ∗ a also has the following two rightmost derivations, where the boldface indicates which occurrence of E is rewritten:

E =⇒ E + E =⇒ E + E ∗ E =⇒ E + E ∗ a =⇒ E + a ∗ a =⇒ a + a ∗ a,

and

E =⇒ E ∗ E =⇒ E ∗ a =⇒ E + E ∗ a =⇒ E + a ∗ a =⇒ a + a ∗ a.

The language generated by a context-free grammar is defined as follows.

Definition 6.3. Given a context-free grammar G = (V, Σ, P, S), the language generated by G is the set

L(G) = {w ∈ Σ* | S =⇒⁺ w}.

A language L ⊆ Σ* is a context-free language (for short, CFL) iff L = L(G) for some context-free grammar G.

It is technically very useful to consider derivations in which the leftmost nonterminal is always selected for rewriting, and dually, derivations in which the rightmost nonterminal is always selected for rewriting.

Definition 6.4. Given a context-free grammar G = (V, Σ, P, S), the (one-step) leftmost derivation relation =⇒_lm associated with G is the binary relation =⇒_lm ⊆ V* × V* defined as follows: for all α, β ∈ V*, we have

α =⇒_lm β

iff there exist u ∈ Σ*, ρ ∈ V*, and some production (A → γ) ∈ P, such that

α = uAρ and β = uγρ.

The transitive closure of =⇒_lm is denoted as =⇒⁺_lm and the reflexive and transitive closure of =⇒_lm is denoted as =⇒*_lm. The (one-step) rightmost derivation relation =⇒_rm associated with G is the binary relation =⇒_rm ⊆ V* × V* defined as follows: for all α, β ∈ V*, we have

α =⇒_rm β

iff there exist λ ∈ V*, v ∈ Σ*, and some production (A → γ) ∈ P, such that

α = λAv and β = λγv.

The transitive closure of =⇒_rm is denoted as =⇒⁺_rm and the reflexive and transitive closure of =⇒_rm is denoted as =⇒*_rm.

Remarks: It is customary to use the symbols a, b, c, d, e for terminal symbols, and the symbols A, B, C, D, E for nonterminal symbols. The symbols u, v, w, x, y, z denote terminal strings, and the symbols α, β, γ, λ, ρ, µ denote strings in V*. The symbols X, Y, Z usually denote symbols in V.

Given a context-free grammar G = (V, Σ, P, S), parsing a string w consists in finding out whether w ∈ L(G), and if so, in producing a derivation for w.

The following proposition is technically very important. It shows that leftmost and rightmost derivations are universal. This has some important practical implications for the complexity of parsing algorithms.
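The derivation relations are easy to animate for a small grammar. A minimal sketch (not from the notes), using G₂ with sentential forms as plain strings and a hypothetical RULES table; since every grammar symbol of G₂ is a single character, rewriting is a simple string splice:

```python
# Grammar G2: nonterminal 'E'; terminals '+', '*', '(', ')', 'a'.
RULES = {'E': ['E+E', 'E*E', '(E)', 'a']}

def one_step(alpha):
    """All β with α =⇒ β: rewrite any one occurrence of a nonterminal."""
    out = set()
    for i, sym in enumerate(alpha):
        for rhs in RULES.get(sym, []):
            out.add(alpha[:i] + rhs + alpha[i+1:])
    return out

def leftmost_step(alpha):
    """All β with α =⇒_lm β: rewrite only the leftmost nonterminal."""
    for i, sym in enumerate(alpha):
        if sym in RULES:
            return {alpha[:i] + rhs + alpha[i+1:] for rhs in RULES[sym]}
    return set()

# The second leftmost derivation of a+a*a shown in the text:
steps = ['E', 'E+E', 'a+E', 'a+E*E', 'a+a*E', 'a+a*a']
assert all(b in leftmost_step(a) for a, b in zip(steps, steps[1:]))
```

A terminal string admits no further steps, so `leftmost_step('a+a*a')` returns the empty set.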

Proposition 6.1. Let G = (V, Σ, P, S) be a context-free grammar. For every w ∈ Σ*, for every derivation S =⇒⁺ w, there is a leftmost derivation S =⇒⁺_lm w, and there is a rightmost derivation S =⇒⁺_rm w.

Proof. Of course, we have to somehow use induction on derivations, but this is a little tricky, and it is necessary to prove a stronger fact. We treat leftmost derivations, rightmost derivations being handled in a similar way.

Claim: For every w ∈ Σ*, for every α ∈ V⁺, for every n ≥ 1, if α =⇒ⁿ w, then there is a leftmost derivation α =⇒ⁿ_lm w.

The claim is proved by induction on n. For n = 1, there exist some λ, ρ ∈ V* and some production A → γ, such that α = λAρ and w = λγρ. Since w is a terminal string, λ, ρ, and γ are terminal strings. Thus, A is the only nonterminal in α, and the derivation step α =⇒¹ w is a leftmost step (and a rightmost step!).

If n > 1, then the derivation α =⇒ⁿ w is of the form α =⇒ α₁ =⇒ⁿ⁻¹ w. There are two subcases.

Case 1. If the derivation step α =⇒ α₁ is a leftmost step α =⇒_lm α₁, by the induction hypothesis, there is a leftmost derivation α₁ =⇒ⁿ⁻¹_lm w, and we get the leftmost derivation

α =⇒_lm α₁ =⇒ⁿ⁻¹_lm w.

Case 2. The derivation step α =⇒ α₁ is not a leftmost step. In this case, there must be some u ∈ Σ*, µ, ρ ∈ V*, some nonterminals A and B, and some production B → δ, such that

α = uAµBρ and α₁ = uAµδρ,

where A is the leftmost nonterminal in α. Since we have a derivation α₁ =⇒ⁿ⁻¹ w of length n − 1, by the induction hypothesis, there is a leftmost derivation

α₁ =⇒ⁿ⁻¹_lm w.

Since α₁ = uAµδρ where A is the leftmost nonterminal in α₁, the first step in the leftmost derivation α₁ =⇒ⁿ⁻¹_lm w is of the form

uAµδρ =⇒_lm uγµδρ,

for some production A → γ. Thus, we have a derivation of the form

α = uAµBρ =⇒ uAµδρ =⇒_lm uγµδρ =⇒ⁿ⁻²_lm w.

We can commute the first two steps involving the productions B → δ and A → γ, and we get the derivation

α = uAµBρ =⇒_lm uγµBρ =⇒ uγµδρ =⇒ⁿ⁻²_lm w.

This may no longer be a leftmost derivation, but the first step is leftmost, and we are back in Case 1. Thus, we conclude by applying the induction hypothesis to the derivation uγµBρ =⇒ⁿ⁻¹ w, as in Case 1.

Proposition 6.1 implies that

L(G) = {w ∈ Σ* | S =⇒⁺_lm w} = {w ∈ Σ* | S =⇒⁺_rm w}.

We observed that if we consider the grammar G₂ = ({E, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E), where P is the set of rules

E → E + E,
E → E ∗ E,
E → (E),
E → a,

the string a + a ∗ a has the following two distinct leftmost derivations, where the boldface indicates which occurrence of E is rewritten:

E =⇒ E ∗ E =⇒ E + E ∗ E =⇒ a + E ∗ E =⇒ a + a ∗ E =⇒ a + a ∗ a,

and

E =⇒ E + E =⇒ a + E =⇒ a + E ∗ E =⇒ a + a ∗ E =⇒ a + a ∗ a.

When this happens, we say that we have an ambiguous grammar. In some cases, it is possible to modify a grammar to make it unambiguous. For example, the grammar G₂ can be modified as follows. Let

G₃ = ({E, T, F, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E),

where P is the set of rules

E → E + T,
E → T,
T → T ∗ F,
T → F,
F → (E),
F → a.

We leave as an exercise to show that L(G₃) = L(G₂), and that every string in L(G₃) has a unique leftmost derivation. Unfortunately, it is not always possible to modify a context-free grammar to make it unambiguous. There exist context-free languages that have no unambiguous context-free grammars. For example, the language

L₃ = {aᵐbᵐcⁿ | m, n ≥ 1} ∪ {aᵐbⁿcⁿ | m, n ≥ 1}

is context-free, since it is generated by the following context-free grammar:

S → S₁,
S → S₂,
S₁ → XC,
S₂ → AY,
X → aXb,
X → ab,
Y → bYc,
Y → bc,
A → aA,
A → a,
C → cC,
C → c.

However, it can be shown that L₃ has no unambiguous grammars. All this motivates the following definition.

Definition 6.5. A context-free grammar G = (V, Σ, P, S) is ambiguous if there is some string w ∈ L(G) that has two distinct leftmost derivations (or two distinct rightmost derivations). Thus, a grammar G is unambiguous if every string w ∈ L(G) has a unique leftmost derivation (or a unique rightmost derivation). A context-free language L is inherently ambiguous if every CFG G for L is ambiguous.

Whether or not a grammar is ambiguous affects the complexity of parsing. Parsing algorithms for unambiguous grammars are more efficient than parsing algorithms for ambiguous grammars.

We now consider various normal forms for context-free grammars.

6.3 Normal Forms for Context-Free Grammars, Chomsky Normal Form

One of the main goals of this section is to show that every CFG G can be converted to an equivalent grammar in Chomsky Normal Form (for short, CNF). A context-free grammar

G = (V, Σ, P, S) is in Chomsky Normal Form iff its productions are of the form

A → BC,
A → a, or
S → ϵ,

where A, B, C ∈ N, a ∈ Σ, S → ϵ is in P iff ϵ ∈ L(G), and S does not occur on the right-hand side of any production.

Note that a grammar in Chomsky Normal Form does not have ϵ-rules, i.e., rules of the form A → ϵ, except when ϵ ∈ L(G), in which case S → ϵ is the only ϵ-rule. It also does not have chain rules, i.e., rules of the form A → B, where A, B ∈ N. Thus, in order to convert a grammar to Chomsky Normal Form, we need to show how to eliminate ϵ-rules and chain rules. This is not the end of the story, since we may still have rules of the form A → α where either |α| ≥ 3, or |α| = 2 and α contains terminals. However, dealing with such rules is a simple recoding matter, and we first focus on the elimination of ϵ-rules and chain rules. It turns out that ϵ-rules must be eliminated first.

The first step to eliminate ϵ-rules is to compute the set E(G) of erasable (or nullable) nonterminals

E(G) = {A ∈ N | A =⇒⁺ ϵ}.

The set E(G) is computed using a sequence of approximations Eᵢ defined as follows:

E₀ = {A ∈ N | (A → ϵ) ∈ P},
Eᵢ₊₁ = Eᵢ ∪ {A | (A → B₁...Bⱼ...Bₖ) ∈ P, Bⱼ ∈ Eᵢ, 1 ≤ j ≤ k}.

Clearly, the Eᵢ form an ascending chain

E₀ ⊆ E₁ ⊆ ... ⊆ Eᵢ ⊆ Eᵢ₊₁ ⊆ ... ⊆ N,

and since N is finite, there is a least i, say i₀, such that E_{i₀} = E_{i₀+1}. We claim that E(G) = E_{i₀}. Actually, we prove the following proposition.

Proposition 6.2. Given a context-free grammar G = (V, Σ, P, S), one can construct a context-free grammar G′ = (V′, Σ, P′, S′) such that:

(1) L(G′) = L(G);
(2) P′ contains no ϵ-rules other than S′ → ϵ, and S′ → ϵ ∈ P′ iff ϵ ∈ L(G);
(3) S′ does not occur on the right-hand side of any production in P′.

Proof. We begin by proving that E(G) = E_{i₀}. For this, we prove that E(G) ⊆ E_{i₀} and E_{i₀} ⊆ E(G).

To prove that E_{i₀} ⊆ E(G), we proceed by induction on i. Since E₀ = {A ∈ N | (A → ϵ) ∈ P}, we have A =⇒¹ ϵ, and thus A ∈ E(G). By the induction hypothesis, Eᵢ ⊆

E(G). If A ∈ Eᵢ₊₁, either A ∈ Eᵢ and then A ∈ E(G), or there is some production (A → B₁...Bⱼ...Bₖ) ∈ P, such that Bⱼ ∈ Eᵢ for all j, 1 ≤ j ≤ k. By the induction hypothesis, Bⱼ =⇒⁺ ϵ for each j, 1 ≤ j ≤ k, and thus

A =⇒ B₁...Bⱼ...Bₖ =⇒⁺ B₂...Bⱼ...Bₖ =⇒⁺ ... =⇒⁺ Bₖ =⇒⁺ ϵ,

which shows that A ∈ E(G).

To prove that E(G) ⊆ E_{i₀}, we also proceed by induction, but on the length of a derivation A =⇒⁺ ϵ. If A =⇒¹ ϵ, then A → ϵ ∈ P, and thus A ∈ E₀ since E₀ = {A ∈ N | (A → ϵ) ∈ P}. If A =⇒ⁿ⁺¹ ϵ, then

A =⇒ α =⇒ⁿ ϵ,

for some production A → α ∈ P. If α contains terminals or nonterminals not in E(G), it is impossible to derive ϵ from α, and thus, we must have α = B₁...Bⱼ...Bₖ, with Bⱼ ∈ E(G) for all j, 1 ≤ j ≤ k. However, Bⱼ =⇒^{nⱼ} ϵ where nⱼ ≤ n, and by the induction hypothesis, Bⱼ ∈ E_{i₀}. But then, we get A ∈ E_{i₀+1} = E_{i₀}, as desired.

Having shown that E(G) = E_{i₀}, we construct the grammar G′. Its set of productions P′ is defined as follows. First, we create the production S′ → S where S′ ∉ V, to make sure that S′ does not occur on the right-hand side of any rule in P′. Let

P₁ = {A → α ∈ P | α ∈ V⁺} ∪ {S′ → S},

and let P₂ be the set of productions

P₂ = {A → α₁α₂...αₖαₖ₊₁ | ∃α₁ ∈ V*, ..., ∃αₖ₊₁ ∈ V*, ∃B₁ ∈ E(G), ..., ∃Bₖ ∈ E(G),
A → α₁B₁α₂...αₖBₖαₖ₊₁ ∈ P, k ≥ 1, α₁...αₖ₊₁ ≠ ϵ}.

Note that ϵ ∈ L(G) iff S ∈ E(G). If S ∉ E(G), then let P′ = P₁ ∪ P₂, and if S ∈ E(G), then let P′ = P₁ ∪ P₂ ∪ {S′ → ϵ}. We claim that L(G′) = L(G), which is proved by showing that every derivation using G can be simulated by a derivation using G′, and vice-versa. All the conditions of the proposition are now met.

From a practical point of view, the construction of Proposition 6.2 is very costly. For example, given a grammar containing the productions

S → ABCDEF,
A → ϵ,
B → ϵ,
C → ϵ,
D → ϵ,
E → ϵ,
F → ϵ,
...,

eliminating ϵ-rules will create 2⁶ − 1 = 63 new rules corresponding to the 63 nonempty subsets of the set {A, B, C, D, E, F}.

We now turn to the elimination of chain rules. It turns out that matters are greatly simplified if we first apply Proposition 6.2 to the input grammar G, and we explain the construction assuming that G = (V, Σ, P, S) satisfies the conditions of Proposition 6.2. For every nonterminal A ∈ N, we define the set

I_A = {B ∈ N | A =⇒⁺ B}.

The sets I_A are computed using approximations I_{A,i} defined as follows:

I_{A,0} = {B ∈ N | (A → B) ∈ P},
I_{A,i+1} = I_{A,i} ∪ {C ∈ N | (B → C) ∈ P, and B ∈ I_{A,i}}.

Clearly, for every A ∈ N, the I_{A,i} form an ascending chain

I_{A,0} ⊆ I_{A,1} ⊆ ... ⊆ I_{A,i} ⊆ I_{A,i+1} ⊆ ... ⊆ N,

and since N is finite, there is a least i, say i₀, such that I_{A,i₀} = I_{A,i₀+1}. We claim that I_A = I_{A,i₀}. Actually, we prove the following proposition.

Proposition 6.3. Given a context-free grammar G = (V, Σ, P, S), one can construct a context-free grammar G′ = (V′, Σ, P′, S′) such that:

(1) L(G′) = L(G);
(2) Every rule in P′ is of the form A → α where |α| ≥ 2, or A → a where a ∈ Σ, or S′ → ϵ, and S′ → ϵ ∈ P′ iff ϵ ∈ L(G);
(3) S′ does not occur on the right-hand side of any production in P′.

Proof. First, we apply Proposition 6.2 to the grammar G, obtaining a grammar G₁ = (V₁, Σ, P₁, S₁). The proof that I_A = I_{A,i₀} is similar to the proof that E(G) = E_{i₀}. First, we prove that I_{A,i} ⊆ I_A by induction on i. This is straightforward. Next, we prove that I_A ⊆ I_{A,i₀} by induction on derivations of the form A =⇒⁺ B. In this part of the proof, we use the fact that G₁ has no ϵ-rules except perhaps S₁ → ϵ, and that S₁ does not occur on the right-hand side of any rule. This implies that a derivation A =⇒ⁿ⁺¹ C is necessarily of the form A =⇒ⁿ B =⇒ C for some B ∈ N. Then, in the induction step, we have B ∈ I_{A,i₀}, and thus C ∈ I_{A,i₀+1} = I_{A,i₀}.

We now define the following sets of rules. Let

P₂ = P₁ − {A → B | A → B ∈ P₁},

and let

P₃ = {A → α | B → α ∈ P₁, α ∉ N₁, B ∈ I_A}.

We claim that G′ = (V₁, Σ, P₂ ∪ P₃, S₁) satisfies the conditions of the proposition.
For example, S₁ does not appear on the right-hand side of any production, since the productions in P₃ have right-hand sides from P₁, and S₁ does not appear on the right-hand side in P₁. It is also easily shown that L(G′) = L(G₁) = L(G).
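Both saturation computations in this section, the nullable set E(G) of Proposition 6.2 and the chain sets I_A of Proposition 6.3, follow the same pattern: iterate the defining equation until the approximations stop growing. A minimal sketch (not from the notes), with a hypothetical representation of a grammar as a list of (lhs, rhs) pairs where rhs is a tuple of symbols:

```python
def nullable(nonterminals, rules):
    """E(G): least i with E_{i+1} = E_i, where E_{i+1} adds every A
    having a rule A -> B1...Bk with all Bj already in E_i."""
    E = {A for A, rhs in rules if rhs == ()}          # E_0: rules A -> epsilon
    changed = True
    while changed:
        changed = False
        for A, rhs in rules:
            if A not in E and rhs and all(s in E for s in rhs):
                E.add(A)
                changed = True
    return E

def chain_sets(nonterminals, rules):
    """I_A = {B in N | A =>+ B}, saturating I_{A,i+1} = I_{A,i} u {C | B -> C, B in I_{A,i}}."""
    chain = lambda B: {rhs[0] for B2, rhs in rules
                       if B2 == B and len(rhs) == 1 and rhs[0] in nonterminals}
    I = {A: chain(A) for A in nonterminals}
    changed = True
    while changed:
        changed = False
        for A in nonterminals:
            for B in list(I[A]):
                new = chain(B) - I[A]
                if new:
                    I[A] |= new
                    changed = True
    return I

# The expression grammar E -> E+T | T, T -> T*F | F, F -> (E) | a:
N = {'E', 'T', 'F'}
P = [('E', ('E', '+', 'T')), ('E', ('T',)),
     ('T', ('T', '*', 'F')), ('T', ('F',)),
     ('F', ('(', 'E', ')')), ('F', ('a',))]
assert chain_sets(N, P) == {'E': {'T', 'F'}, 'T': {'F'}, 'F': set()}
```

This grammar has no ϵ-rules, so `nullable(N, P)` returns the empty set.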

Let us apply the method of Proposition 6.3 to the grammar

G₃ = ({E, T, F, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E),

where P is the set of rules

E → E + T,
E → T,
T → T ∗ F,
T → F,
F → (E),
F → a.

We get I_E = {T, F}, I_T = {F}, and I_F = ∅. The new grammar G′₃ has the set of rules

E → E + T,
E → T ∗ F,
E → (E),
E → a,
T → T ∗ F,
T → (E),
T → a,
F → (E),
F → a.

At this stage, the grammar obtained in Proposition 6.3 no longer has ϵ-rules (except perhaps S′ → ϵ iff ϵ ∈ L(G)) or chain rules. However, it may contain rules A → α with |α| ≥ 3, or with |α| = 2 and where α contains terminal(s). To obtain the Chomsky Normal Form, we need to eliminate such rules. This is not difficult, but notationally it is a bit messy.

Proposition 6.4. Given a context-free grammar G = (V, Σ, P, S), one can construct a context-free grammar G′ = (V′, Σ, P′, S′) such that L(G′) = L(G) and G′ is in Chomsky Normal Form, that is, a grammar whose productions are of the form

A → BC,
A → a, or
S′ → ϵ,

where A, B, C ∈ N′, a ∈ Σ, S′ → ϵ is in P′ iff ϵ ∈ L(G), and S′ does not occur on the right-hand side of any production in P′.

Proof. First, we apply Proposition 6.3, obtaining G₁. Let Σ_r be the set of terminals occurring on the right-hand side of rules A → α ∈ P₁, with |α| ≥ 2. For every a ∈ Σ_r, let X_a be a new nonterminal not in V₁. Let

P₂ = {X_a → a | a ∈ Σ_r}.

Let P_{1,r} be the set of productions

A → α₁a₁α₂ ⋯ αₖaₖαₖ₊₁,

where a₁, ..., aₖ ∈ Σ_r and αᵢ ∈ N₁*. For every production

A → α₁a₁α₂ ⋯ αₖaₖαₖ₊₁

in P_{1,r}, let

A → α₁X_{a₁}α₂ ⋯ αₖX_{aₖ}αₖ₊₁

be a new production, and let P₃ be the set of all such productions. Let

P₄ = (P₁ − P_{1,r}) ∪ P₂ ∪ P₃.

Now, productions A → α in P₄ with |α| ≥ 2 do not contain terminals. However, we may still have productions A → α ∈ P₄ with |α| ≥ 3. We can perform some recoding using some new nonterminals. For every production of the form

A → B₁ ⋯ Bₖ,

where k ≥ 3, create the new nonterminals

[B₁ ⋯ Bₖ₋₁], [B₁ ⋯ Bₖ₋₂], ..., [B₁B₂B₃], [B₁B₂],

and the new productions

A → [B₁ ⋯ Bₖ₋₁]Bₖ,
[B₁ ⋯ Bₖ₋₁] → [B₁ ⋯ Bₖ₋₂]Bₖ₋₁,
...,
[B₁B₂B₃] → [B₁B₂]B₃,
[B₁B₂] → B₁B₂.

All the productions are now in Chomsky Normal Form, and it is clear that the same language is generated.

Applying the first phase of the method of Proposition 6.4 to the grammar G′₃, we get the

rules

E → E X_+ T,
E → T X_* F,
E → X_( E X_),
E → a,
T → T X_* F,
T → X_( E X_),
T → a,
F → X_( E X_),
F → a,
X_+ → +,
X_* → ∗,
X_( → (,
X_) → ).

After applying the second phase of the method, we get the following grammar in Chomsky Normal Form:

E → [E X_+] T,
[E X_+] → E X_+,
E → [T X_*] F,
[T X_*] → T X_*,
E → [X_( E] X_),
[X_( E] → X_( E,
E → a,
T → [T X_*] F,
T → [X_( E] X_),
T → a,
F → [X_( E] X_),
F → a,
X_+ → +,
X_* → ∗,
X_( → (,
X_) → ).

For large grammars, it is often convenient to use the abbreviation which consists in grouping productions having a common left-hand side, and listing the right-hand sides separated

by the symbol |. Thus, a group of productions

A → α₁,
A → α₂,
...,
A → αₖ,

may be abbreviated as

A → α₁ | α₂ | ⋯ | αₖ.

An interesting corollary of the CNF is the following decidability result. There is an algorithm which, given a context-free grammar G, given any string w ∈ Σ*, decides whether w ∈ L(G). Indeed, we first convert G to a grammar G′ in Chomsky Normal Form. If w = ϵ, we can test whether ϵ ∈ L(G), since this is the case iff S′ → ϵ ∈ P′. If w ≠ ϵ, letting n = |w|, note that since the rules are of the form A → BC or A → a, where a ∈ Σ, any derivation for w has n − 1 + n = 2n − 1 steps. Thus, we enumerate all (leftmost) derivations of length 2n − 1. There are much better parsing algorithms than this naive algorithm.

We now show that every regular language is context-free.

6.4 Regular Languages are Context-Free

The regular languages can be characterized in terms of very special kinds of context-free grammars, right-linear (and left-linear) context-free grammars.

Definition 6.6. A context-free grammar G = (V, Σ, P, S) is left-linear iff its productions are of the form

A → Ba,
A → a,
A → ϵ,

where A, B ∈ N, and a ∈ Σ. A context-free grammar G = (V, Σ, P, S) is right-linear iff its productions are of the form

A → aB,
A → a,
A → ϵ,

where A, B ∈ N, and a ∈ Σ.

The following proposition shows the equivalence between NFA's and right-linear grammars.

Proposition 6.5. A language L is regular if and only if it is generated by some right-linear grammar.

Proof. Let L = L(D) for some DFA D = (Q, Σ, δ, q₀, F). We construct a right-linear grammar G as follows. Let V = Q ∪ Σ, S = q₀, and let P be defined as follows:

P = {p → aq | q = δ(p, a), p, q ∈ Q, a ∈ Σ} ∪ {p → ϵ | p ∈ F}.

It is easily shown by induction on the length of w that

p =⇒* wq iff q = δ*(p, w),

and thus, L(D) = L(G).

Conversely, let G = (V, Σ, P, S) be a right-linear grammar. First, let G′ = (V′, Σ, P′, S) be the right-linear grammar obtained from G by adding the new nonterminal E to N, replacing every rule in P of the form A → a where a ∈ Σ by the rule A → aE, and adding the rule E → ϵ. It is immediately verified that L(G′) = L(G). Next, we construct the NFA M = (Q, Σ, δ, q₀, F) as follows: Q = N′ = N ∪ {E}, q₀ = S, F = {A ∈ N′ | A → ϵ}, and

δ(A, a) = {B ∈ N′ | A → aB ∈ P′},

for all A ∈ N′ and all a ∈ Σ. It is easily shown by induction on the length of w that

A =⇒* wB iff B ∈ δ*(A, w),

and thus, L(M) = L(G′) = L(G).

A similar proposition holds for left-linear grammars. It is also easily shown that the regular languages are exactly the languages generated by context-free grammars whose rules are of the form

A → Bu,
A → u,

where A, B ∈ N, and u ∈ Σ*.

6.5 Useless Productions in Context-Free Grammars

Given a context-free grammar G = (V, Σ, P, S), it may contain rules that are useless for a number of reasons. For example, consider the grammar G₃ = ({E, A, a, b}, {a, b}, P, E), where P is the set of rules

E → aEb,
E → ab,
E → A,
A → bAa.

The problem is that the nonterminal A does not derive any terminal strings, and thus, it is useless, as well as the last two productions. Let us now consider the grammar G₄ = ({E, A, a, b, c, d}, {a, b, c, d}, P, E), where P is the set of rules

E → aEb,
E → ab,
A → cAd,
A → cd.

This time, the nonterminal A generates strings of the form cⁿdⁿ, but there is no derivation E =⇒⁺ α from E where A occurs in α. The nonterminal A is not connected to E, and the last two rules are useless. Fortunately, it is possible to find such useless rules, and to eliminate them.

Let T(G) be the set of nonterminals that actually derive some terminal string, i.e.

T(G) = {A ∈ (V − Σ) | ∃w ∈ Σ*, A =⇒⁺ w}.

The set T(G) can be defined by stages. We define the sets Tₙ (n ≥ 1) as follows:

T₁ = {A ∈ (V − Σ) | ∃(A → w) ∈ P, with w ∈ Σ*},

and

Tₙ₊₁ = Tₙ ∪ {A ∈ (V − Σ) | ∃(A → β) ∈ P, with β ∈ (Tₙ ∪ Σ)*}.

It is easy to prove that there is some least n such that Tₙ₊₁ = Tₙ, and that for this n, T(G) = Tₙ.

If S ∉ T(G), then L(G) = ∅, and G is equivalent to the trivial grammar G′ = ({S}, Σ, ∅, S). If S ∈ T(G), then let U(G) be the set of nonterminals that are actually useful, i.e.,

U(G) = {A ∈ T(G) | ∃α, β ∈ (T(G) ∪ Σ)*, S =⇒* αAβ}.

The set U(G) can also be computed by stages. We define the sets Uₙ (n ≥ 1) as follows:

U₁ = {A ∈ T(G) | ∃(S → αAβ) ∈ P, with α, β ∈ (T(G) ∪ Σ)*},

and

Uₙ₊₁ = Uₙ ∪ {B ∈ T(G) | ∃(A → αBβ) ∈ P, with A ∈ Uₙ, α, β ∈ (T(G) ∪ Σ)*}.

It is easy to prove that there is some least n such that Uₙ₊₁ = Uₙ, and that for this n, U(G) = Uₙ ∪ {S}. Then, we can use U(G) to transform G into an equivalent CFG in

which every nonterminal is useful (i.e., for which V − Σ = U(G)). Indeed, simply delete all rules containing symbols not in U(G). The details are left as an exercise. We say that a context-free grammar G is reduced if all its nonterminals are useful, i.e., N = U(G).

It should be noted that although dull, the above considerations are important in practice. Certain algorithms for constructing parsers, for example, LR-parsers, may loop if useless rules are not eliminated!

We now consider another normal form for context-free grammars, the Greibach Normal Form.

6.6 The Greibach Normal Form

Every CFG G can also be converted to an equivalent grammar in Greibach Normal Form (for short, GNF). A context-free grammar G = (V, Σ, P, S) is in Greibach Normal Form iff its productions are of the form

A → aBC,
A → aB,
A → a, or
S → ϵ,

where A, B, C ∈ N, a ∈ Σ, S → ϵ is in P iff ϵ ∈ L(G), and S does not occur on the right-hand side of any production.

Note that a grammar in Greibach Normal Form does not have ϵ-rules other than possibly S → ϵ. More importantly, except for the special rule S → ϵ, every rule produces some terminal symbol.

An important consequence of the Greibach Normal Form is that every nonterminal is not left recursive. A nonterminal A is left recursive iff A =⇒⁺ Aα for some α ∈ V*. Left recursive nonterminals cause top-down deterministic parsers to loop. The Greibach Normal Form provides a way of avoiding this problem.

There are no easy proofs that every CFG can be converted to a Greibach Normal Form. A particularly elegant method due to Rosenkrantz using least fixed-points and matrices will be given in Section 6.9.

Proposition 6.6. Given a context-free grammar G = (V, Σ, P, S), one can construct a context-free grammar G′ = (V′, Σ, P′, S′) such that L(G′) = L(G) and G′ is in Greibach Normal Form, that is, a grammar whose productions are of the form

A → aBC,
A → aB,
A → a, or
S′ → ϵ,

where A, B, C ∈ N′, a ∈ Σ, S′ → ϵ is in P′ iff ϵ ∈ L(G), and S′ does not occur on the right-hand side of any production in P′.

6.7 Least Fixed-Points

Context-free languages can also be characterized as least fixed-points of certain functions induced by grammars. This characterization yields a rather quick proof that every context-free grammar can be converted to Greibach Normal Form. This characterization also reveals very clearly the recursive nature of the context-free languages. We begin by reviewing what we need from the theory of partially ordered sets.

Definition 6.7. Given a partially ordered set ⟨A, ≤⟩, an ω-chain (aₙ)ₙ≥₀ is a sequence such that aₙ ≤ aₙ₊₁ for all n ≥ 0. The least upper bound of an ω-chain (aₙ) is an element a ∈ A such that:

(1) aₙ ≤ a, for all n ≥ 0;
(2) For any b ∈ A, if aₙ ≤ b for all n ≥ 0, then a ≤ b.

A partially ordered set ⟨A, ≤⟩ is an ω-chain complete poset iff it has a least element ⊥, and iff every ω-chain has a least upper bound denoted as ⨆ aₙ.

Remark: The ω in ω-chain means that we are considering countable chains (ω is the ordinal associated with the order-type of the set of natural numbers). This notation may seem arcane, but is standard in denotational semantics.

For example, given any set X, the power set 2^X ordered by inclusion ⊆ is an ω-chain complete poset with least element ∅. The Cartesian product 2^X × ⋯ × 2^X (n factors) ordered such that

(A₁, ..., Aₙ) ≤ (B₁, ..., Bₙ) iff Aᵢ ⊆ Bᵢ (where Aᵢ, Bᵢ ∈ 2^X)

is an ω-chain complete poset with least element (∅, ..., ∅).

We are interested in functions between partially ordered sets.

Definition 6.8. Given any two partially ordered sets ⟨A₁, ≤₁⟩ and ⟨A₂, ≤₂⟩, a function f : A₁ → A₂ is monotonic iff for all x, y ∈ A₁,

x ≤₁ y implies that f(x) ≤₂ f(y).

If ⟨A₁, ≤₁⟩ and ⟨A₂, ≤₂⟩ are ω-chain complete posets, a function f : A₁ → A₂ is ω-continuous iff it is monotonic, and for every ω-chain (aₙ),

f(⨆ aₙ) = ⨆ f(aₙ).

Remark: Note that we are not requiring that an ω-continuous function f : A₁ → A₂ preserve least elements, i.e., it is possible that f(⊥₁) ≠ ⊥₂.

We now define the crucial concept of a least fixed-point.

Definition 6.9. Let ⟨A, ≤⟩ be a partially ordered set, and let f : A → A be a function. A fixed-point of f is an element a ∈ A such that f(a) = a. The least fixed-point of f is an element a ∈ A such that f(a) = a, and for every b ∈ A such that f(b) = b, then a ≤ b.

The following proposition gives sufficient conditions for the existence of least fixed-points. It is one of the key propositions in denotational semantics.

Proposition 6.7. Let ⟨A, ≤⟩ be an ω-chain complete poset with least element ⊥. Every ω-continuous function f : A → A has a unique least fixed-point x₀ given by

x₀ = ⨆ fⁿ(⊥).

Furthermore, for any b ∈ A such that f(b) ≤ b, then x₀ ≤ b.

Proof. First, we prove that the sequence

⊥, f(⊥), f²(⊥), ..., fⁿ(⊥), ...

is an ω-chain. This is shown by induction on n. Since ⊥ is the least element of A, we have ⊥ ≤ f(⊥). Assuming by induction that fⁿ(⊥) ≤ fⁿ⁺¹(⊥), since f is ω-continuous, it is monotonic, and thus we get fⁿ⁺¹(⊥) ≤ fⁿ⁺²(⊥), as desired. Since A is an ω-chain complete poset, the ω-chain (fⁿ(⊥)) has a least upper bound

x₀ = ⨆ fⁿ(⊥).

Since f is ω-continuous, we have

f(x₀) = f(⨆ fⁿ(⊥)) = ⨆ f(fⁿ(⊥)) = ⨆ fⁿ⁺¹(⊥) = x₀,

and x₀ is indeed a fixed-point of f.

Clearly, if f(b) ≤ b implies that x₀ ≤ b, then f(b) = b implies that x₀ ≤ b. Thus, assume that f(b) ≤ b for some b ∈ A. We prove by induction on n that fⁿ(⊥) ≤ b. Indeed, ⊥ ≤ b, since ⊥ is the least element of A. Assuming by induction that fⁿ(⊥) ≤ b, by monotonicity of f, we get

f(fⁿ(⊥)) ≤ f(b),

and since f(b) ≤ b, this yields

fⁿ⁺¹(⊥) ≤ b.

Since fⁿ(⊥) ≤ b for all n ≥ 0, we have

x₀ = ⨆ fⁿ(⊥) ≤ b.
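When the poset is finite, the ascending chain ⊥, f(⊥), f²(⊥), ... of Proposition 6.7 stabilizes after finitely many steps, so the least fixed-point is directly computable. A minimal sketch (not from the notes) of this Kleene iteration on the poset 2^X with ⊥ = ∅, illustrated with a hypothetical reachability operator on a small graph:

```python
def least_fixed_point(f, bottom):
    """Iterate bottom, f(bottom), f(f(bottom)), ... until a fixed point
    is reached; on a finite poset with monotonic f this terminates."""
    x = bottom
    while True:
        fx = f(x)
        if fx == x:
            return x
        x = fx

# Reachability from node 0 as the least fixed point of
# f(S) = {0} ∪ {y | x ∈ S, (x, y) an edge}, on the poset (2^X, ⊆).
edges = {(0, 1), (1, 2), (2, 1)}
f = lambda S: frozenset({0} | {y for x in S for (a, y) in edges if a == x})
assert least_fixed_point(f, frozenset()) == {0, 1, 2}
```

The same iteration scheme underlies the approximation chains Eᵢ, I_{A,i}, Tₙ, and Uₙ used earlier in this chapter, and the chain Φⁿ_G(∅, ..., ∅) of the next section.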

The second part of Proposition 6.7 is very useful to prove that functions have the same least fixed-point. For example, under the conditions of Proposition 6.7, if g : A → A is another ω-continuous function, letting x₀ be the least fixed-point of f and y₀ be the least fixed-point of g, if f(y₀) ≤ y₀ and g(x₀) ≤ x₀, we can deduce that x₀ = y₀. Indeed, since f(y₀) ≤ y₀ and x₀ is the least fixed-point of f, we get x₀ ≤ y₀, and since g(x₀) ≤ x₀ and y₀ is the least fixed-point of g, we get y₀ ≤ x₀, and therefore x₀ = y₀.

Proposition 6.7 also shows that the least fixed-point x₀ of f can be approximated as much as desired, using the sequence (fⁿ(⊥)). We will now apply this fact to context-free grammars. For this, we need to show how a context-free grammar G = (V, Σ, P, S) with m nonterminals induces an ω-continuous map

Φ_G : 2^{Σ*} × ⋯ × 2^{Σ*} → 2^{Σ*} × ⋯ × 2^{Σ*}   (m factors on each side).

6.8 Context-Free Languages as Least Fixed-Points

Given a context-free grammar G = (V, Σ, P, S) with m nonterminals A₁, ..., Aₘ, grouping all the productions having the same left-hand side, the grammar G can be concisely written as

A₁ → α₁,₁ + ⋯ + α₁,ₙ₁,
⋯
Aᵢ → αᵢ,₁ + ⋯ + αᵢ,ₙᵢ,
⋯
Aₘ → αₘ,₁ + ⋯ + αₘ,ₙₘ.

Given any set A, let P_fin(A) be the set of finite subsets of A.

Definition 6.10. Let G = (V, Σ, P, S) be a context-free grammar with m nonterminals A₁, ..., Aₘ. For any m-tuple Λ = (L₁, ..., Lₘ) of languages Lᵢ ⊆ Σ*, we define the function

Φ[Λ] : P_fin(V*) → 2^{Σ*}

inductively as follows:

Φ[Λ](∅) = ∅,
Φ[Λ]({ϵ}) = {ϵ},
Φ[Λ]({a}) = {a}, if a ∈ Σ,
Φ[Λ]({Aᵢ}) = Lᵢ, if Aᵢ ∈ N,
Φ[Λ]({αX}) = Φ[Λ]({α})Φ[Λ]({X}), if α ∈ V⁺, X ∈ V,
Φ[Λ](Q ∪ {α}) = Φ[Λ](Q) ∪ Φ[Λ]({α}), if Q ∈ P_fin(V*), Q ≠ ∅, α ∈ V*, α ∉ Q.

Then, writing the grammar G as

A₁ → α₁,₁ + ⋯ + α₁,ₙ₁,
⋯
Aᵢ → αᵢ,₁ + ⋯ + αᵢ,ₙᵢ,
⋯
Aₘ → αₘ,₁ + ⋯ + αₘ,ₙₘ,

we define the map

Φ_G : 2^{Σ*} × ⋯ × 2^{Σ*} → 2^{Σ*} × ⋯ × 2^{Σ*}

such that

Φ_G(L₁, ..., Lₘ) = (Φ[Λ]({α₁,₁, ..., α₁,ₙ₁}), ..., Φ[Λ]({αₘ,₁, ..., αₘ,ₙₘ}))

for all Λ = (L₁, ..., Lₘ) ∈ 2^{Σ*} × ⋯ × 2^{Σ*}.

One should verify that the map Φ[Λ] is well defined, but this is easy. The following proposition is easily shown:

Proposition 6.8. Given a context-free grammar G = (V, Σ, P, S) with m nonterminals A₁, ..., Aₘ, the map

Φ_G : 2^{Σ*} × ⋯ × 2^{Σ*} → 2^{Σ*} × ⋯ × 2^{Σ*}

is ω-continuous.

Now, 2^{Σ*} × ⋯ × 2^{Σ*} is an ω-chain complete poset, and the map Φ_G is ω-continuous. Thus, by Proposition 6.7, the map Φ_G has a least fixed-point. It turns out that the components of this least fixed-point are precisely the languages generated by the grammars (V, Σ, P, Aᵢ). Before proving this fact, let us give an example illustrating it.

Example. Consider the grammar G = ({A, B, a, b}, {a, b}, P, A) defined by the rules

A → BB + ab,
B → aBb + ab.

The least fixed-point of Φ_G is the least upper bound of the chain

(Φⁿ_G(∅, ∅)) = ((Φⁿ_{G,A}(∅, ∅), Φⁿ_{G,B}(∅, ∅))),

where

Φ⁰_{G,A}(∅, ∅) = Φ⁰_{G,B}(∅, ∅) = ∅,

and

Φⁿ⁺¹_{G,A}(∅, ∅) = Φⁿ_{G,B}(∅, ∅)Φⁿ_{G,B}(∅, ∅) ∪ {ab},
Φⁿ⁺¹_{G,B}(∅, ∅) = aΦⁿ_{G,B}(∅, ∅)b ∪ {ab}.

It is easy to verify that

Φ¹_{G,A}(∅, ∅) = {ab},
Φ¹_{G,B}(∅, ∅) = {ab},
Φ²_{G,A}(∅, ∅) = {abab, ab},
Φ²_{G,B}(∅, ∅) = {aabb, ab},
Φ³_{G,A}(∅, ∅) = {abab, abaabb, aabbab, aabbaabb, ab},
Φ³_{G,B}(∅, ∅) = {aaabbb, aabb, ab}.

By induction, we can easily prove that the two components of the least fixed-point are the languages

L_A = {aᵐbᵐaⁿbⁿ | m, n ≥ 1} ∪ {ab} and L_B = {aⁿbⁿ | n ≥ 1}.

Letting G_A = ({A, B, a, b}, {a, b}, P, A) and G_B = ({A, B, a, b}, {a, b}, P, B), it is indeed true that L_A = L(G_A) and L_B = L(G_B).

We have the following theorem due to Ginsburg and Rice:

Theorem 6.9. Given a context-free grammar G = (V, Σ, P, S) with m nonterminals A₁, ..., Aₘ, the least fixed-point of the map Φ_G is the m-tuple of languages

(L(G_{A₁}), ..., L(G_{Aₘ})),

where G_{Aᵢ} = (V, Σ, P, Aᵢ).

Proof. Writing G as

A₁ → α₁,₁ + ⋯ + α₁,ₙ₁,
⋯
Aᵢ → αᵢ,₁ + ⋯ + αᵢ,ₙᵢ,
⋯
Aₘ → αₘ,₁ + ⋯ + αₘ,ₙₘ,

let M = max{|αᵢ,ⱼ|} be the maximum length of right-hand sides of rules in P. Let

Φⁿ_G(∅, ..., ∅) = (Φⁿ_{G,1}(∅, ..., ∅), ..., Φⁿ_{G,m}(∅, ..., ∅)).

Then, for any w ∈ Σ*, observe that

w ∈ Φ¹_{G,i}(∅, ..., ∅)

iff there is some rule Aᵢ → αᵢ,ⱼ with w = αᵢ,ⱼ, and that

w ∈ Φⁿ_{G,i}(∅, ..., ∅)

for some n ≥ 2 iff there is some rule Aᵢ → αᵢ,ⱼ with αᵢ,ⱼ of the form

αᵢ,ⱼ = u₁A_{j₁}u₂ ⋯ uₖA_{jₖ}uₖ₊₁,

where u₁, ..., uₖ₊₁ ∈ Σ*, k ≥ 1, and some w₁, ..., wₖ ∈ Σ* such that

wₕ ∈ Φⁿ⁻¹_{G,jₕ}(∅, ..., ∅),

and

w = u₁w₁u₂ ⋯ uₖwₖuₖ₊₁.

We prove the following two claims.

Claim 1: For every w ∈ Σ*, if Aᵢ =⇒ⁿ w, then w ∈ Φᵖ_{G,i}(∅, ..., ∅), for some p ≥ 1.

Claim 2: For every w ∈ Σ*, if w ∈ Φⁿ_{G,i}(∅, ..., ∅), with n ≥ 1, then Aᵢ =⇒ᵖ w for some p ≤ (M + 1)ⁿ⁻¹.

Proof of Claim 1. We proceed by induction on n. If Aᵢ =⇒¹ w, then w = αᵢ,ⱼ for some rule Aᵢ → αᵢ,ⱼ, and by the remark just before the claim, w ∈ Φ¹_{G,i}(∅, ..., ∅). If Aᵢ =⇒ⁿ⁺¹ w with n ≥ 1, then

Aᵢ =⇒ αᵢ,ⱼ =⇒ⁿ w

for some rule Aᵢ → αᵢ,ⱼ. If

αᵢ,ⱼ = u₁A_{j₁}u₂ ⋯ uₖA_{jₖ}uₖ₊₁,

where u₁, ..., uₖ₊₁ ∈ Σ*, k ≥ 1, then A_{jₕ} =⇒^{nₕ} wₕ, where nₕ ≤ n, and

w = u₁w₁u₂ ⋯ uₖwₖuₖ₊₁

for some w₁, ..., wₖ ∈ Σ*. By the induction hypothesis,

wₕ ∈ Φ^{pₕ}_{G,jₕ}(∅, ..., ∅),

for some pₕ ≥ 1, for every h, 1 ≤ h ≤ k. Letting p = max{p₁, ..., pₖ}, since each sequence (Φ^q_{G,i}(∅, ..., ∅)) is an ω-chain, we have wₕ ∈ Φᵖ_{G,jₕ}(∅, ..., ∅) for every h, 1 ≤ h ≤ k, and by the remark just before the claim, w ∈ Φᵖ⁺¹_{G,i}(∅, ..., ∅).

Proof of Claim 2. We proceed by induction on n. If w ∈ Φ^1_{G,i}(∅, ..., ∅), then by the remark just before the claim, w = α_{i,j} for some rule A_i → α_{i,j}, and A_i ⇒^1 w.

If w ∈ Φ^n_{G,i}(∅, ..., ∅) for some n ≥ 2, then there is some rule A_i → α_{i,j} with α_{i,j} of the form

α_{i,j} = u_1 A_{j_1} u_2 ··· u_k A_{j_k} u_{k+1},

where u_1, ..., u_{k+1} ∈ Σ*, k ≥ 1, and some w_1, ..., w_k ∈ Σ* such that

w_h ∈ Φ^{n−1}_{G,j_h}(∅, ..., ∅),

and w = u_1 w_1 u_2 ··· u_k w_k u_{k+1}. By the induction hypothesis, A_{j_h} ⇒^{p_h} w_h with p_h ≤ (M + 1)^{n−2}, and thus

A_i ⇒^1 u_1 A_{j_1} u_2 ··· u_k A_{j_k} u_{k+1} ⇒^{p_1} ··· ⇒^{p_k} w,

so that A_i ⇒^p w with

p ≤ p_1 + ··· + p_k + 1 ≤ M(M + 1)^{n−2} + 1 ≤ (M + 1)^{n−1},

since k ≤ M.

Combining Claim 1 and Claim 2, we have

L(G_{A_i}) = ⋃_n Φ^n_{G,i}(∅, ..., ∅),

which proves that the least fixed-point of the map Φ_G is the m-tuple of languages (L(G_{A_1}), ..., L(G_{A_m})).

We now show how Theorem 6.9 can be used to give a short proof that every context-free grammar can be converted to Greibach Normal Form.

6.9 Least Fixed-Points and the Greibach Normal Form

The hard part in converting a grammar G = (V, Σ, P, S) to Greibach Normal Form is to convert it to a grammar in so-called weak Greibach Normal Form, where the productions are of the form

A → aα, or
S → ϵ,

where a ∈ Σ, α ∈ V*, and if S → ϵ is a rule, then S does not occur on the right-hand side of any rule.

Indeed, if we first convert G to Chomsky Normal Form, it turns out that we will get rules of the form A → BC, A → B, or A → a. Using the algorithm for eliminating ϵ-rules and chain rules, we can first convert the original grammar to a grammar with no chain rules and no ϵ-rules except possibly S → ϵ, in which case, S does not appear on the right-hand side of rules. Thus, for the purpose of converting to weak Greibach Normal Form, we can assume that we are dealing with grammars without chain rules and without ϵ-rules. Let us also assume that we computed the set T(G) of nonterminals that actually derive some terminal string, and that useless productions involving symbols not in T(G) have been deleted.

Let us explain the idea of the conversion using the following grammar:

A → AaB + BB + b,
B → Bd + BAa + bA + c.

The first step is to group the right-hand sides α into two categories: those whose leftmost symbol is a terminal (α ∈ ΣV*) and those whose leftmost symbol is a nonterminal (α ∈ NV*). It is also convenient to adopt a matrix notation, and we can write the above grammar as

(A, B) = (A, B) ( aB   ∅
                  B    {d, Aa} ) + (b, {bA, c}).

Thus, we are dealing with matrices (and row vectors) whose entries are finite subsets of V*. For notational simplicity, braces around singleton sets are omitted.

The finite subsets of V* form a semiring, where addition is union, and multiplication is concatenation. Addition and multiplication of matrices are as usual, except that the semiring operations are used.

We will also consider matrices whose entries are languages over Σ. Again, the languages over Σ form a semiring, where addition is union, and multiplication is concatenation. The identity element for addition is ∅, and the identity element for multiplication is {ϵ}. As above, addition and multiplication of matrices are as usual, except that the semiring operations are used.
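The semiring operations just described are straightforward to experiment with. The following sketch (in Python, with illustrative names; finite sets of strings stand for finite languages) implements addition as union, multiplication as concatenation, and the resulting matrix product:

```python
# A sketch of the semiring of languages: addition is union, multiplication
# is concatenation; matrices multiply as usual with these operations in
# place of + and *. All function names here are illustrative.

def lang_add(A, B):
    # semiring addition: union of two languages
    return A | B

def lang_mul(A, B):
    # semiring multiplication: concatenation of two languages
    return {u + v for u in A for v in B}

def mat_mul(M, N):
    # product of matrices whose entries are languages (sets of strings)
    rows, inner, cols = len(M), len(N), len(N[0])
    R = [[set() for _ in range(cols)] for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = set()
            for k in range(inner):
                acc = lang_add(acc, lang_mul(M[i][k], N[k][j]))
            R[i][j] = acc
    return R
```

Note that the additive identity is the empty set ∅ and the multiplicative identity is {ϵ}, represented here as the set containing the empty string, so the identity matrix has {''} on its diagonal and set() elsewhere.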
For example, given any languages A_{i,j} and B_{i,j} over Σ, where i, j ∈ {1, 2}, we have

( A_{1,1}  A_{1,2} ) ( B_{1,1}  B_{1,2} )   ( A_{1,1}B_{1,1} ∪ A_{1,2}B_{2,1}   A_{1,1}B_{1,2} ∪ A_{1,2}B_{2,2} )
( A_{2,1}  A_{2,2} ) ( B_{2,1}  B_{2,2} ) = ( A_{2,1}B_{1,1} ∪ A_{2,2}B_{2,1}   A_{2,1}B_{1,2} ∪ A_{2,2}B_{2,2} )

Letting X = (A, B), K = (b, {bA, c}), and

H = ( aB   ∅
      B    {d, Aa} )

the above grammar can be concisely written as

X = XH + K.

More generally, given any context-free grammar G = (V, Σ, P, S) with m nonterminals A_1, ..., A_m, assuming that there are no chain rules, no ϵ-rules, and that every nonterminal belongs to T(G), letting

X = (A_1, ..., A_m),

we can write G as

X = XH + K,

for some appropriate m × m matrix H in which every entry contains a set (possibly empty) of strings in V+, and some row vector K in which every entry contains a set (possibly empty) of strings α each beginning with a terminal (α ∈ ΣV*).

Given an m × m square matrix A = (A_{i,j}) of languages over Σ, we can define the matrix A* whose entry A*_{i,j} is given by

A*_{i,j} = ⋃_{n ≥ 0} A^n_{i,j},

where A^0 = Id_m, the identity matrix, and A^n is the n-th power of A. Similarly, we define A+ where

A+_{i,j} = ⋃_{n ≥ 1} A^n_{i,j}.

Given a matrix A where the entries are finite subsets of V*, where N = {A_1, ..., A_m}, for any m-tuple Λ = (L_1, ..., L_m) of languages over Σ, we let

Φ[Λ](A) = (Φ[Λ](A_{i,j})).

Given a system X = XH + K where H is an m × m matrix and X, K are row matrices, if H and K do not contain any nonterminals, we claim that the least fixed-point of the grammar G associated with X = XH + K is KH*. This is easily seen by computing the approximations X^n = Φ^n_G(∅, ..., ∅). Indeed, X^0 = K, and

X^n = KH^n + KH^{n−1} + ··· + KH + K = K(H^n + H^{n−1} + ··· + H + Id_m).

Similarly, if Y is an m × m matrix of nonterminals, the least fixed-point of the grammar associated with Y = HY + H is H+ (provided that H does not contain any nonterminals).

Given any context-free grammar G = (V, Σ, P, S) with m nonterminals A_1, ..., A_m, writing G as X = XH + K as explained earlier, we can form another grammar GH by creating m² new nonterminals Y_{i,j}, where the rules of this new grammar are defined by the system of two matrix equations

X = KY + K,
Y = HY + H,

where Y = (Y_{i,j}).

The following proposition is the key to the Greibach Normal Form.

Proposition 6.10. Given any context-free grammar G = (V, Σ, P, S) with m nonterminals A_1, ..., A_m, writing G as X = XH + K as explained earlier, if GH is the grammar defined by the system of two matrix equations

X = KY + K,
Y = HY + H,

as explained above, then the components in X of the least fixed-points of the maps Φ_G and Φ_{GH} are equal.

Proof. Let U be the least fixed-point of Φ_G, and let (V, W) be the least fixed-point of Φ_{GH}. We shall prove that U = V. For notational simplicity, let us denote Φ[U](H) as H[U] and Φ[U](K) as K[U].

Since U is the least fixed-point of X = XH + K, we have

U = UH[U] + K[U].

Since H[U] and K[U] do not contain any nonterminals, by a previous remark, K[U]H[U]* is the least fixed-point of X = XH[U] + K[U], and thus,

K[U]H[U]* ⊆ U.

On the other hand, by monotonicity,

K[U]H[U]* H[K[U]H[U]*] + K[K[U]H[U]*] ⊆ K[U]H[U]* H[U] + K[U] = K[U]H[U]*,

and since U is the least fixed-point of X = XH + K,

U ⊆ K[U]H[U]*.

Therefore, U = K[U]H[U]*. We can prove in a similar manner that W = H[V]+.

Let Z = H[U]+. We have

K[U]Z + K[U] = K[U]H[U]+ + K[U] = K[U]H[U]* = U,

and

H[U]Z + H[U] = H[U]H[U]+ + H[U] = H[U]+ = Z,

and since (V, W) is the least fixed-point of X = KY + K and Y = HY + H, we get V ⊆ U and W ⊆ H[U]+.

We also have

V = K[V]W + K[V] = K[V]H[V]+ + K[V] = K[V]H[V]*,

and

VH[V] + K[V] = K[V]H[V]* H[V] + K[V] = K[V]H[V]* = V,

and since U is the least fixed-point of X = XH + K, we get U ⊆ V. Therefore, U = V, as claimed.

Note that the above proposition actually applies to any grammar. Applying Proposition 6.10 to our example grammar, we get the following new grammar:

(A, B) = (b, {bA, c}) ( Y_1  Y_2
                        Y_3  Y_4 ) + (b, {bA, c}),

( Y_1  Y_2 )   ( aB   ∅        ) ( Y_1  Y_2 )   ( aB   ∅        )
( Y_3  Y_4 ) = ( B    {d, Aa} ) ( Y_3  Y_4 ) + ( B    {d, Aa} ).

There are still some nonterminals appearing as leftmost symbols, but using the equations defining A and B, we can replace A with

{bY_1, bAY_3, cY_3, b}

and B with

{bY_2, bAY_4, cY_4, bA, c},

obtaining a system in weak Greibach Normal Form. This amounts to converting the matrix

H = ( aB   ∅
      B    {d, Aa} )

to the matrix

L = ( aB                          ∅
      {bY_2, bAY_4, cY_4, bA, c}  {d, bY_1a, bAY_3a, cY_3a, ba} ).

The weak Greibach Normal Form corresponds to the new system

X = KY + K,
Y = LY + L.
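The earlier remark that the least fixed-point of X = XH + K is KH* (when H and K contain no nonterminals) can be checked by direct iteration on a tiny 1 × 1 instance. Since the solution is an infinite language, the sketch below truncates strings to a maximum length so that the iteration stabilizes; the names and the cutoff are illustrative assumptions, not from the text.

```python
# Least fixed-point of X = X·H + K by iteration, for the 1x1 system
# X = X{a} + {b}, whose solution is b a*. Strings longer than MAX_LEN are
# discarded so that the iteration reaches a (truncated) fixed point.

MAX_LEN = 5

def concat(A, B):
    # concatenation of languages, truncated to MAX_LEN
    return {u + v for u in A for v in B if len(u + v) <= MAX_LEN}

def lfp(H, K):
    X = set()
    while True:
        X_next = concat(X, H) | K   # one application of X |-> XH + K
        if X_next == X:
            return X
        X = X_next
```

Running `lfp({'a'}, {'b'})` yields the truncation of b a* to length 5, i.e. {'b', 'ba', 'baa', 'baaa', 'baaaa'}, in agreement with KH*.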

This method works in general for any input grammar with no ϵ-rules, no chain rules, and such that every nonterminal belongs to T(G). Under these conditions, the row vector K contains some nonempty entry, all strings in K are in ΣV*, and all strings in H are in V+. After obtaining the grammar GH defined by the system

X = KY + K,
Y = HY + H,

we use the system X = KY + K to express every nonterminal A_i in terms of expressions containing strings α_{i,j} involving a terminal as the leftmost symbol (α_{i,j} ∈ ΣV*), and we replace all leftmost occurrences of nonterminals in H (occurrences A_i in strings of the form A_iβ, where β ∈ V*) using the above expressions. In this fashion, we obtain a matrix L, and it is immediately shown that the system

X = KY + K,
Y = LY + L,

generates the same tuple of languages. Furthermore, this last system corresponds to a weak Greibach Normal Form.

If we start with a grammar in Chomsky Normal Form (with no production S → ϵ) such that every nonterminal belongs to T(G), we actually get a Greibach Normal Form (the entries in K are terminals, and the entries in H are nonterminals). Thus, we have justified Proposition 6.6. The method is also quite economical, since it introduces only m² new nonterminals. However, the resulting grammar may contain some useless nonterminals.

6.10 Tree Domains and Gorn Trees

Derivation trees play a very important role in parsing theory and in the proof of a strong version of the pumping lemma for the context-free languages known as Ogden's lemma. Thus, it is important to define derivation trees rigorously. We do so using Gorn trees.

Let N+ = {1, 2, 3, ...}.

Definition. A tree domain D is a nonempty subset of strings in N+* satisfying the conditions:

(1) For all u, v ∈ N+*, if uv ∈ D, then u ∈ D.

(2) For all u ∈ N+*, for every i ∈ N+, if ui ∈ D then uj ∈ D for every j, 1 ≤ j ≤ i.

The tree domain

D = {ϵ, 1, 2, 11, 21, 22, 221, 222, 2211}

is represented as follows:

ϵ
├── 1
│   └── 11
└── 2
    ├── 21
    └── 22
        ├── 221
        │   └── 2211
        └── 222

A tree labeled with symbols from a set Δ is defined as follows.

Definition. Given a set Δ of labels, a Δ-tree (for short, a tree) is a total function t : D → Δ, where D is a tree domain. The domain of a tree t is denoted as dom(t). Every string u ∈ dom(t) is called a tree address or a node.

Let Δ = {f, g, h, a, b}. The tree t : D → Δ, where D is the tree domain of the previous example and t is the function whose graph is

{(ϵ, f), (1, h), (2, g), (11, a), (21, a), (22, f), (221, h), (222, b), (2211, a)}

is represented as follows:

f
├── h
│   └── a
└── g
    ├── a
    └── f
        ├── h
        │   └── a
        └── b

The outdegree (sometimes called ramification) r(u) of a node u is the cardinality of the set {i | ui ∈ dom(t)}.
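A convenient concrete encoding of Gorn trees is a dictionary mapping addresses (strings of digits, assuming outdegree at most 9) to labels. A minimal sketch, using the example tree above; the code and its names are illustrative, not part of the text:

```python
# The example tree as a dict from Gorn addresses to labels; '' is the root ϵ.
t = {'': 'f', '1': 'h', '2': 'g', '11': 'a', '21': 'a',
     '22': 'f', '221': 'h', '222': 'b', '2211': 'a'}

def is_tree_domain(D):
    # condition (1): prefix-closed; condition (2): closed under smaller
    # sibling indices (if ui is in D, so is uj for 1 <= j <= i)
    prefix_closed = all(u[:-1] in D for u in D if u)
    sibling_closed = all(u[:-1] + str(j) in D
                         for u in D if u
                         for j in range(1, int(u[-1])))
    return prefix_closed and sibling_closed

def outdegree(t, u):
    # r(u): number of immediate successors ui of u in dom(t)
    return sum(1 for v in t if len(v) == len(u) + 1 and v.startswith(u))
```

For instance, `outdegree(t, '')` is 2 and `outdegree(t, '11')` is 0, so the address 11 is a leaf.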

Note that the outdegree of a node can be infinite. Most of the trees that we shall consider will be finite-branching, that is, for every node u, r(u) will be an integer, and hence finite. If the outdegree of all nodes in a tree is bounded by n, then we can view the domain of the tree as being defined over {1, 2, ..., n}*.

A node of outdegree 0 is called a leaf. The node whose address is ϵ is called the root of the tree. A tree is finite if its domain dom(t) is finite. Given a node u in dom(t), every node of the form ui in dom(t) with i ∈ N+ is called a son (or immediate successor) of u.

Tree addresses are totally ordered lexicographically: u ≤ v if either u is a prefix of v or, there exist strings x, y, z ∈ N+* and i, j ∈ N+, with i < j, such that u = xiy and v = xjz. In the first case, we say that u is an ancestor (or predecessor) of v (or u dominates v) and in the second case, that u is to the left of v. If y = ϵ and z = ϵ, we say that xi is a left brother (or left sibling) of xj, (i < j). Two tree addresses u and v are independent if u is not a prefix of v and v is not a prefix of u.

Given a finite tree t, the yield of t is the string

t(u_1)t(u_2) ··· t(u_k),

where u_1, u_2, ..., u_k is the sequence of leaves of t in lexicographic order. For example, the yield of the tree of the previous example is aaab (its leaves, in lexicographic order, are the addresses 11, 21, 2211, 222).

Given a finite tree t, the depth of t is the integer

d(t) = max{|u| | u ∈ dom(t)}.

Given a tree t and a node u in dom(t), the subtree rooted at u is the tree t/u, whose domain is the set {v | uv ∈ dom(t)} and such that t/u(v) = t(uv) for all v in dom(t/u).

Another important operation is the operation of tree replacement (or tree substitution).

Definition. Given two trees t_1 and t_2 and a tree address u in t_1, the result of substituting t_2 at u in t_1, denoted by t_1[u ← t_2], is the function whose graph is the set of pairs

{(v, t_1(v)) | v ∈ dom(t_1), u is not a prefix of v} ∪ {(uv, t_2(v)) | v ∈ dom(t_2)}.

Let t_1 and t_2 be the trees defined by the following diagrams:

Tree t_1:

f
├── h
│   └── a
└── g
    ├── a
    └── f
        ├── h
        │   └── a
        └── b

Tree t_2:

g
├── a
└── b

The tree t_1[22 ← t_2] is defined by the following diagram:

f
├── h
│   └── a
└── g
    ├── a
    └── g
        ├── a
        └── b

We can now define derivation trees and relate derivations to derivation trees.
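Tree substitution translates directly into the dictionary encoding of Gorn trees: drop the pairs of t_1 whose address has u as a prefix, then graft t_2 with its addresses prefixed by u. A sketch, checked on a reconstruction of the example trees (labels here are illustrative):

```python
# t1[u <- t2]: keep the pairs of t1 whose address does not have u as a
# prefix, then add t2 with all of its addresses prefixed by u.

def subst(t1, u, t2):
    keep = {v: lab for v, lab in t1.items() if not v.startswith(u)}
    graft = {u + v: lab for v, lab in t2.items()}
    return {**keep, **graft}

t1 = {'': 'f', '1': 'h', '2': 'g', '11': 'a', '21': 'a',
      '22': 'f', '221': 'h', '222': 'b', '2211': 'a'}
t2 = {'': 'g', '1': 'a', '2': 'b'}
```

Here `subst(t1, '22', t2)` removes the subtree at address 22 (addresses 22, 221, 222, 2211) and grafts t_2 there, exactly as in the diagram above.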

6.11 Derivation Trees

Definition. Given a context-free grammar G = (V, Σ, P, S), for any A ∈ N, an A-derivation tree for G is a (V ∪ {ϵ})-tree t (a tree with set of labels (V ∪ {ϵ})) such that:

(1) t(ϵ) = A;

(2) For every nonleaf node u ∈ dom(t), if u1, ..., uk are the successors of u, then either there is a production B → X_1 ··· X_k in P such that t(u) = B and t(ui) = X_i for all i, 1 ≤ i ≤ k, or B → ϵ ∈ P, t(u) = B and t(u1) = ϵ.

A complete derivation (or parse tree) is an S-tree whose yield belongs to Σ*.

A derivation tree for the grammar

G_3 = ({E, T, F, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E),

where P is the set of rules

E → E + T,
E → T,
T → T ∗ F,
T → F,
F → (E),
F → a,

is shown in Figure 6.1. The yield of the derivation tree is a + a ∗ a.

Figure 6.1: A complete derivation tree

Derivation trees are associated to derivations inductively as follows.

Definition. Given a context-free grammar G = (V, Σ, P, S), for any A ∈ N, if π : A ⇒^n α is a derivation in G, we construct an A-derivation tree t_π with yield α as follows.

(1) If n = 0, then t_π is the one-node tree such that dom(t_π) = {ϵ} and t_π(ϵ) = A.

(2) If A ⇒^{n−1} λBρ ⇒ λγρ = α, then if t_1 is the A-derivation tree with yield λBρ associated with the derivation A ⇒^{n−1} λBρ, and if t_2 is the tree associated with the production B → γ (that is, if γ = X_1 ··· X_k, then dom(t_2) = {ϵ, 1, ..., k}, t_2(ϵ) = B, and t_2(i) = X_i for all i, 1 ≤ i ≤ k, or if γ = ϵ, then dom(t_2) = {ϵ, 1}, t_2(ϵ) = B, and t_2(1) = ϵ), then

t_π = t_1[u ← t_2],

where u is the address of the leaf labeled B in t_1. The tree t_π is the A-derivation tree associated with the derivation A ⇒^n α.

Given the grammar

G_2 = ({E, +, ∗, (, ), a}, {+, ∗, (, ), a}, P, E),

where P is the set of rules

E → E + E,
E → E ∗ E,
E → (E),
E → a,

the parse trees associated with two derivations of the string a + a ∗ a are shown in Figure 6.2.

Figure 6.2: Two derivation trees for a + a ∗ a

The following proposition is easily shown.

Proposition 6.11. Let G = (V, Σ, P, S) be a context-free grammar. For any derivation A ⇒^n α, there is a unique A-derivation tree associated with this derivation, with yield α. Conversely, for any A-derivation tree t with yield α, there is a unique leftmost derivation A ⇒*_{lm} α in G having t as its associated derivation tree.

We will now prove a strong version of the pumping lemma for context-free languages due to Bill Ogden (1968).

6.12 Ogden's Lemma

Ogden's lemma states some combinatorial properties of parse trees that are deep enough. The yield w of such a parse tree can be split into 5 substrings u, v, x, y, z such that

w = uvxyz,

where u, v, x, y, z satisfy certain conditions.

It turns out that we get a more powerful version of the lemma if we allow ourselves to mark certain occurrences of symbols in w before invoking the lemma. We can imagine that marked occurrences in a nonempty string w are occurrences of symbols in w in boldface, or red, or any given color (but one color only). For example, given a string w over {a, b}, we can mark the symbols of even index.

More rigorously, we can define a marking of a nonnull string w : {1, ..., n} → Σ as any function m : {1, ..., n} → {0, 1}. Then, a letter w_i in w is a marked occurrence iff m(i) = 1, and an unmarked occurrence if m(i) = 0. The number of marked occurrences in w is equal to

Σ_{i=1}^{n} m(i).

Ogden's lemma only yields useful information for grammars G generating an infinite language. We could make this hypothesis, but it seems more elegant to use the precondition that the lemma only applies to strings w ∈ L(G) such that w contains at least K marked occurrences, for a constant K large enough. If K is large enough, L(G) will indeed be infinite.

Proposition 6.12. For every context-free grammar G, there is some integer K > 1 such that, for every string w ∈ Σ+, for every marking of w, if w ∈ L(G) and w contains at least K marked occurrences, then there exists some decomposition of w as w = uvxyz, and some A ∈ N, such that the following properties hold:

(1) There are derivations S ⇒+ uAz, A ⇒+ vAy, and A ⇒+ x, so that

uv^n xy^n z ∈ L(G)

for all n ≥ 0 (the pumping property);

(2) x contains some marked occurrence;

(3) Either (both u and v contain some marked occurrence), or (both y and z contain some marked occurrence);

(4) vxy contains less than K marked occurrences.

Proof. Let t be any parse tree for w. We call a leaf of t a marked leaf if its label is a marked occurrence in the marked string w. The general idea is to make sure that K is large enough so that parse trees with yield w contain enough repeated nonterminals along some path from the root to some marked leaf. Let r = |N|, and let

p = max{2, max{|α| | (A → α) ∈ P}}.

We claim that K = p^{2r+3} does the job.

The key concept in the proof is the notion of a B-node. Given a parse tree t, a B-node is a node with at least two immediate successors u_1, u_2, such that for i = 1, 2, either u_i is a marked leaf, or u_i has some marked leaf as a descendant.

We construct a path from the root to some marked leaf, so that for every B-node, we pick the leftmost successor with the maximum number of marked leaves as descendants. Formally, define a path (s_0, ..., s_n) from the root to some marked leaf, so that:

(i) Every node s_i has some marked leaf as a descendant, and s_0 is the root of t;

(ii) If s_j is in the path, s_j is not a leaf, and s_j has a single immediate descendant which is either a marked leaf or has marked leaves as its descendants, let s_{j+1} be that unique immediate descendant of s_j;

(iii) If s_j is a B-node in the path, then let s_{j+1} be the leftmost immediate successor of s_j with the maximum number of marked leaves as descendants (assuming that if s_{j+1} is a marked leaf, then it is its own descendant);

(iv) If s_j is a leaf, then it is a marked leaf and n = j.

We will show that the path (s_0, ..., s_n) contains at least 2r + 3 B-nodes.

Claim: For every i, 0 ≤ i ≤ n, if the path (s_i, ..., s_n) contains d B-nodes, then s_i has at most p^d marked leaves as descendants.

Proof. We proceed by backward induction, i.e., by induction on n − i. For i = n, there are no B-nodes, so that d = 0, and there is indeed p^0 = 1 marked leaf s_n. Assume that the claim holds for the path (s_{i+1}, ..., s_n).

If s_i is not a B-node, then the number d of B-nodes in the path (s_{i+1}, ..., s_n) is the same as the number of B-nodes in the path (s_i, ..., s_n), and s_{i+1} is the only immediate successor of s_i having a marked leaf as a descendant. By the induction hypothesis, s_{i+1} has at most p^d marked leaves as descendants, and this is also an upper bound on the number of marked leaves which are descendants of s_i.

If s_i is a B-node, then if there are d B-nodes in the path (s_{i+1}, ..., s_n), there are d + 1 B-nodes in the path (s_i, ..., s_n). By the induction hypothesis, s_{i+1} has at most p^d marked leaves as descendants. Since s_i is a B-node, s_{i+1} was chosen to be the leftmost immediate successor of s_i having the maximum number of marked leaves as descendants. Thus, since

the outdegree of s_i is at most p, and each of its immediate successors has at most p^d marked leaves as descendants, the node s_i has at most p·p^d = p^{d+1} marked leaves as descendants, as desired.

Applying the claim to s_0, since w has at least K = p^{2r+3} marked occurrences, we have p^d ≥ p^{2r+3}, and since p ≥ 2, we have d ≥ 2r + 3, and the path (s_0, ..., s_n) contains at least 2r + 3 B-nodes (note that this would not follow if we had p = 1).

Let us now select the lowest 2r + 3 B-nodes in the path (s_0, ..., s_n), and denote them (b_1, ..., b_{2r+3}). Every B-node b_i has at least two immediate successors u_i < v_i such that u_i or v_i is on the path (s_0, ..., s_n). If the path goes through u_i, we say that b_i is a right B-node and if the path goes through v_i, we say that b_i is a left B-node.

Since 2r + 3 = r + 2 + r + 1, either there are r + 2 left B-nodes or there are r + 2 right B-nodes in the path (b_1, ..., b_{2r+3}). Let us assume that there are r + 2 left B-nodes, the other case being similar.

Let (d_1, ..., d_{r+2}) be the lowest r + 2 left B-nodes in the path. Since there are r + 1 B-nodes in the sequence (d_2, ..., d_{r+2}), and there are only r distinct nonterminals, there are two nodes d_i and d_j, with 2 ≤ i < j ≤ r + 2, such that t(d_i) = t(d_j) = A, for some A ∈ N. We can assume that d_i is an ancestor of d_j, and thus, d_j = d_i α, for some α ≠ ϵ.

If we prune out the subtree t/d_i rooted at d_i from t, we get an S-derivation tree having a yield of the form uAz, and we have a derivation of the form S ⇒+ uAz, since there are at least r + 2 left B-nodes on the path, and we are looking at the lowest r + 1 left B-nodes. Considering the subtree t/d_i, pruning out the subtree t/d_j rooted at α in t/d_i, we get an A-derivation tree having a yield of the form vAy, and we have a derivation of the form A ⇒+ vAy. Finally, the subtree t/d_j is an A-derivation tree with yield x, and we have a derivation A ⇒+ x. This proves (1) of the lemma.

Since s_n is a marked leaf and a descendant of d_j, x contains some marked occurrence, proving (2).

Since d_1 is a left B-node, some left sibling of the immediate successor of d_1 on the path has some distinguished leaf in u as a descendant. Similarly, since d_i is a left B-node, some left sibling of the immediate successor of d_i on the path has some distinguished leaf in v as a descendant. This proves (3).

(d_j, ..., b_{2r+3}) has at most 2r + 1 B-nodes, and by the claim shown earlier, d_j has at most p^{2r+1} marked leaves as descendants. Since p^{2r+1} < p^{2r+3} = K, this proves (4).

Observe that condition (2) implies that x ≠ ϵ, and condition (3) implies that either u ≠ ϵ and v ≠ ϵ, or y ≠ ϵ and z ≠ ϵ. Thus, the pumping condition (1) implies that the set

{uv^n xy^n z | n ≥ 0}

is an infinite subset of L(G), and L(G) is indeed infinite, as we mentioned earlier. Note that K ≥ 3, and in fact, K ≥ 32.

The standard pumping lemma due to Bar-Hillel, Perles, and Shamir, is obtained by letting all occurrences be marked in w ∈ L(G).

Proposition 6.13. For every context-free grammar G (without ϵ-rules), there is some integer K > 1 such that, for every string w ∈ Σ+, if w ∈ L(G) and |w| ≥ K, then there exists some decomposition of w as w = uvxyz, and some A ∈ N, such that the following properties hold:

(1) There are derivations S ⇒+ uAz, A ⇒+ vAy, and A ⇒+ x, so that

uv^n xy^n z ∈ L(G)

for all n ≥ 0 (the pumping property);

(2) x ≠ ϵ;

(3) Either v ≠ ϵ or y ≠ ϵ;

(4) |vxy| ≤ K.

A stronger version could be stated, and we are just following tradition in stating this standard version of the pumping lemma.

Ogden's lemma or the pumping lemma can be used to show that certain languages are not context-free. The method is to proceed by contradiction, i.e., to assume (contrary to what we wish to prove) that a language L is indeed context-free, and derive a contradiction of Ogden's lemma (or of the pumping lemma). Thus, as in the case of the regular languages, it would be helpful to see what the negation of Ogden's lemma is, and for this, we first state Ogden's lemma as a logical formula.

For any nonnull string w : {1, ..., n} → Σ, for any marking m : {1, ..., n} → {0, 1} of w, for any substring y of w, where w = xyz, with |x| = h and k = |y|, the number of marked occurrences in y, denoted as |m(y)|, is defined as

|m(y)| = Σ_{i=h+1}^{h+k} m(i).

We will also use the following abbreviations:

nat = {0, 1, 2, ...},
nat32 = {32, 33, ...},
A ≡ w = uvxyz,
B ≡ |m(x)| ≥ 1,
C ≡ (|m(u)| ≥ 1 ∧ |m(v)| ≥ 1) ∨ (|m(y)| ≥ 1 ∧ |m(z)| ≥ 1),
D ≡ |m(vxy)| < K,
P ≡ ∀n : nat (uv^n xy^n z ∈ L(G)).

Ogden's lemma can then be stated as

∀G : CFG ∃K : nat32 ∀w : Σ* ∀m : marking
((w ∈ L(G) ∧ |m(w)| ≥ K) ⊃ (∃u, v, x, y, z : Σ* (A ∧ B ∧ C ∧ D ∧ P))).

Recalling that

¬(A ∧ B ∧ C ∧ D ∧ P) ≡ ¬((A ∧ B ∧ C ∧ D) ∧ P) ≡ ((A ∧ B ∧ C ∧ D) ⊃ ¬P)

and

¬(P ⊃ Q) ≡ P ∧ ¬Q,

the negation of Ogden's lemma can be stated as

∃G : CFG ∀K : nat32 ∃w : Σ* ∃m : marking
((w ∈ L(G) ∧ |m(w)| ≥ K) ∧ (∀u, v, x, y, z : Σ* ((A ∧ B ∧ C ∧ D) ⊃ ¬P))).

Since

¬P ≡ ∃n : nat (uv^n xy^n z ∉ L(G)),

in order to show that Ogden's lemma is contradicted, one needs to show that for some context-free grammar G, for every K ≥ 2, there is some string w ∈ L(G) and some marking m of w with at least K marked occurrences in w, such that for every possible decomposition w = uvxyz satisfying the constraints A ∧ B ∧ C ∧ D, there is some n ≥ 0 such that uv^n xy^n z ∉ L(G).

When proceeding by contradiction, we have a language L that we are (wrongly) assuming to be context-free and we can use any CFG grammar G generating L. The creative part of the argument is to pick the right w ∈ L and the right marking of w (not making any assumption on K).

As an illustration, we show that the language

L = {a^n b^n c^n | n ≥ 1}

is not context-free. Since L is infinite, we will be able to use the pumping lemma.

The proof proceeds by contradiction. If L was context-free, there would be some context-free grammar G such that L = L(G), and some constant K > 1 as in Ogden's lemma. Let w = a^K b^K c^K, and choose the b's as marked occurrences. Then by Ogden's lemma, x contains some marked occurrence, and either both u, v or both y, z contain some marked occurrence. Assume that both u and v contain some b. We have the following situation:

u = a ··· a b ··· b,   v = b ··· b,   xyz = b ··· b c ··· c.

If we consider the string uvvxyyz, the number of a's is still K, but the number of b's is strictly greater than K since v contains at least one b, and thus uvvxyyz ∉ L, a contradiction.

If both y and z contain some b we will also reach a contradiction because in the string uvvxyyz, the number of c's is still K, but the number of b's is strictly greater than K.

Having reached a contradiction in all cases, we conclude that L is not context-free.

Let us now show that the language

L = {a^m b^n c^m d^n | m, n ≥ 1}

is not context-free.

Again, we proceed by contradiction. This time, let

w = a^K b^K c^K d^K,

where the b's and c's are marked occurrences.

By Ogden's lemma, either both u, v contain some marked occurrence, or both y, z contain some marked occurrence, and x contains some marked occurrence. Let us first consider the case where both u, v contain some marked occurrence.

If v contains some b, since uvvxyyz ∈ L, v must contain only b's, since otherwise we would have a bad string in L, and we have the following situation:

u = a ··· a b ··· b,   v = b ··· b,   xyz = b ··· b c ··· c d ··· d.

Since uvvxyyz ∈ L, the only way to preserve an equal number of b's and d's is to have y ∈ d+. But then, vxy contains c^K, which contradicts (4) of Ogden's lemma.

If v contains some c, since x also contains some marked occurrence, it must be some c, and v contains only c's and we have the following situation:

u = a ··· a b ··· b c ··· c,   v = c ··· c,   xyz = c ··· c d ··· d.

Since uvvxyyz ∈ L and the number of b's is still K whereas the number of c's is strictly more than K, this case is impossible.

Let us now consider the case where both y, z contain some marked occurrence. Reasoning as before, the only possibility is that v ∈ a+ and y ∈ c+:

u = a ··· a,   v = a ··· a,   x = a ··· a b ··· b c ··· c,   y = c ··· c,   z = c ··· c d ··· d.

But then, vxy contains b^K, which contradicts (4) of Ogden's Lemma. Since a contradiction was obtained in all cases, L is not context-free.
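The case analysis above can be cross-checked mechanically for a small fixed K: the sketch below enumerates every decomposition w = uvxyz of w = a^K b^K c^K with |vy| ≥ 1 and |vxy| ≤ K and verifies that each one already fails pumping for n = 0 or n = 2 (a finite proxy for "for all n"; the code and the choice K = 3 are illustrative, not from the text).

```python
def in_L(s):
    # membership in L = { a^n b^n c^n | n >= 1 }
    n = len(s) // 3
    return len(s) % 3 == 0 and n >= 1 and s == 'a' * n + 'b' * n + 'c' * n

def some_decomposition_pumps(w, K):
    # True iff some w = uvxyz with |vy| >= 1 and |vxy| <= K keeps
    # u v^n x y^n z in L for both n = 0 and n = 2 (necessary conditions only)
    for i in range(len(w) + 1):                            # u = w[:i]
        for j in range(i, len(w) + 1):                     # v = w[i:j]
            for k in range(j, len(w) + 1):                 # x = w[j:k]
                for l in range(k, min(i + K, len(w)) + 1): # y = w[k:l]
                    u, v, x, y, z = w[:i], w[i:j], w[j:k], w[k:l], w[l:]
                    if not v + y:
                        continue
                    if all(in_L(u + v * n + x + y * n + z) for n in (0, 2)):
                        return True
    return False
```

Since |vxy| ≤ K = 3, the window vxy touches at most two of the three letter regions of a^3 b^3 c^3, so removing v and y (the n = 0 case) necessarily unbalances the counts, and the search finds no surviving decomposition.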

Ogden's lemma can also be used to show that the context-free language

{a^m b^n c^n | m, n ≥ 1} ∪ {a^m b^m c^n | m, n ≥ 1}

is inherently ambiguous. The proof is quite involved.

Another corollary of the pumping lemma is that it is decidable whether a context-free grammar generates an infinite language.

Proposition 6.14. Given any context-free grammar, G, if K is the constant of Ogden's lemma, then the following equivalence holds: L(G) is infinite iff there is some w ∈ L(G) such that K ≤ |w| < 2K.

Proof. Let K = p^{2r+3} be the constant from the proof of Proposition 6.12. If there is some w ∈ L(G) such that |w| ≥ K, we already observed that the pumping lemma implies that L(G) contains an infinite subset of the form {uv^n xy^n z | n ≥ 0}. Conversely, assume that L(G) is infinite. If |w| < K for all w ∈ L(G), then L(G) is finite. Thus, there is some w ∈ L(G) such that |w| ≥ K. Let w ∈ L(G) be a minimal string such that |w| ≥ K. By the pumping lemma, we can write w as w = uvxyz, where x ≠ ϵ, vy ≠ ϵ, and |vxy| ≤ K. By the pumping property, uxz ∈ L(G). If |w| ≥ 2K, then

|uxz| = |uvxyz| − |vy| > |uvxyz| − |vxy| ≥ 2K − K = K,

and |uxz| < |uvxyz|, contradicting the minimality of w. Thus, we must have |w| < 2K.

In particular, if G is in Chomsky Normal Form, it can be shown that we just have to consider derivations of length at most 4K − 3.

6.13 Pushdown Automata

We have seen that the regular languages are exactly the languages accepted by DFA's or NFA's. The context-free languages are exactly the languages accepted by pushdown automata, for short, PDA's. However, although there are two versions of PDA's, deterministic and nondeterministic, contrary to the fact that every NFA can be converted to a DFA, nondeterministic PDA's are strictly more powerful than deterministic PDA's (DPDA's). Indeed, there are context-free languages that cannot be accepted by DPDA's. Thus, the natural machine model for the context-free languages is nondeterministic, and for this reason, we just use the abbreviation PDA, as opposed to NPDA.

We adopt a definition of a PDA in which the pushdown store, or stack, must not be empty for a move to take place. Other authors allow PDA's to make a move when the stack is empty. Novices seem to be confused by such moves, and this is why we do not allow moves with an empty stack.

Intuitively, a PDA consists of an input tape, a nondeterministic finite-state control, and a stack.

Given any set X, possibly infinite, let P_fin(X) be the set of all finite subsets of X.

Definition. A pushdown automaton is a 7-tuple M = (Q, Σ, Γ, δ, q_0, Z_0, F), where

Q is a finite set of states;
Σ is a finite input alphabet;
Γ is a finite pushdown store (or stack) alphabet;
q_0 ∈ Q is the start state (or initial state);
Z_0 ∈ Γ is the initial stack symbol (or bottom marker);
F ⊆ Q is the set of final (or accepting) states;
δ : Q × (Σ ∪ {ϵ}) × Γ → P_fin(Q × Γ*) is the transition function.

A transition is of the form (q, γ) ∈ δ(p, a, Z), where p, q ∈ Q, Z ∈ Γ, γ ∈ Γ* and a ∈ Σ ∪ {ϵ}. A transition of the form (q, γ) ∈ δ(p, ϵ, Z) is called an ϵ-transition (or ϵ-move).

The way a PDA operates is explained in terms of Instantaneous Descriptions, for short ID's. Intuitively, an Instantaneous Description is a snapshot of the PDA. An ID is a triple of the form

(p, u, α) ∈ Q × Σ* × Γ*.

The idea is that p is the current state, u is the remaining input, and α represents the stack. It is important to note that we use the convention that the leftmost symbol in α represents the topmost stack symbol.

Given a PDA M, we define a relation ⊢_M between pairs of ID's. This is very similar to the derivation relation ⇒_G associated with a context-free grammar. Intuitively, a PDA scans the input tape symbol by symbol from left to right, making moves that cause a change of state, an update to the stack (but only at the top), and either advancing the reading head to the next symbol, or not moving the reading head during an ϵ-move.

Definition. Given a PDA

M = (Q, Σ, Γ, δ, q_0, Z_0, F),

the relation ⊢_M is defined as follows:

(1) For any move (q, γ) ∈ δ(p, a, Z), where p, q ∈ Q, Z ∈ Γ, a ∈ Σ, for every ID of the form (p, av, Zα), we have

(p, av, Zα) ⊢_M (q, v, γα).

(2) For any move (q, γ) ∈ δ(p, ϵ, Z), where p, q ∈ Q, Z ∈ Γ, for every ID of the form (p, u, Zα), we have

(p, u, Zα) ⊢_M (q, u, γα).
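The single-step relation just defined can be transcribed almost literally. In the sketch below (illustrative names, not from the text), the transition function is a dict mapping (state, input symbol or '' for ϵ, stack symbol) to a set of (state, pushed string) pairs, and an ID is a triple (state, remaining input, stack) with the top of the stack leftmost:

```python
def step(delta, id_):
    # all IDs reachable from id_ in one |- move; no move on an empty stack
    p, u, alpha = id_
    if not alpha:
        return set()
    Z, rest = alpha[0], alpha[1:]
    succ = set()
    for (q, gamma) in delta.get((p, '', Z), set()):        # clause (2): ϵ-moves
        succ.add((q, u, gamma + rest))
    if u:
        for (q, gamma) in delta.get((p, u[0], Z), set()):  # clause (1)
            succ.add((q, u[1:], gamma + rest))
    return succ
```

Pushing `gamma + rest` mirrors the convention that the leftmost symbol of the stack string is the top: the popped symbol Z is replaced by γ, written in front of the remainder of the stack.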

As usual, ⊢+_M is the transitive closure of ⊢_M, and ⊢*_M is the reflexive and transitive closure of ⊢_M.

A move of the form

(p, au, Zα) ⊢_M (q, u, α),

where a ∈ Σ ∪ {ϵ}, is called a pop move.

A move on a real input symbol a ∈ Σ causes this input symbol to be consumed, and the reading head advances to the next input symbol. On the other hand, during an ϵ-move, the reading head stays put.

When

(p, u, α) ⊢*_M (q, v, β)

we say that we have a computation.

There are several equivalent ways of defining acceptance by a PDA.

Definition. Given a PDA

M = (Q, Σ, Γ, δ, q_0, Z_0, F),

the following languages are defined:

(1) T(M) = {w ∈ Σ* | (q_0, w, Z_0) ⊢*_M (f, ϵ, α), where f ∈ F, and α ∈ Γ*}. We say that T(M) is the language accepted by M by final state.

(2) N(M) = {w ∈ Σ* | (q_0, w, Z_0) ⊢*_M (q, ϵ, ϵ), where q ∈ Q}. We say that N(M) is the language accepted by M by empty stack.

(3) L(M) = {w ∈ Σ* | (q_0, w, Z_0) ⊢*_M (f, ϵ, ϵ), where f ∈ F}. We say that L(M) is the language accepted by M by final state and empty stack.

In all cases, note that the input w must be consumed entirely.

The following proposition shows that the acceptance mode does not matter for PDA's. As we will see shortly, it does matter for DPDA's.

Proposition 6.15. For any language L, the following facts hold.

(1) If L = T(M) for some PDA M, then L = L(M') for some PDA M'.

(2) If L = N(M) for some PDA M, then L = L(M') for some PDA M'.

(3) If L = L(M) for some PDA M, then L = T(M') for some PDA M'.

(4) If L = L(M) for some PDA M, then L = N(M') for some PDA M'.

In view of Proposition 6.15, the three acceptance modes T, N, L are equivalent.

The following PDA accepts the language

L = {a^n b^n | n ≥ 1}

by empty stack. Q = {1, 2}, Γ = {Z_0, a};

(1, a) ∈ δ(1, a, Z_0),
(1, aa) ∈ δ(1, a, a),
(2, ϵ) ∈ δ(1, b, a),
(2, ϵ) ∈ δ(2, b, a).

The following PDA accepts the language

L = {a^n b^n | n ≥ 1}

by final state (and also by empty stack). Q = {1, 2, 3}, Γ = {Z_0, A, a}, F = {3};

(1, A) ∈ δ(1, a, Z_0),
(1, aA) ∈ δ(1, a, A),
(1, aa) ∈ δ(1, a, a),
(2, ϵ) ∈ δ(1, b, a),
(2, ϵ) ∈ δ(2, b, a),
(3, ϵ) ∈ δ(1, b, A),
(3, ϵ) ∈ δ(2, b, A).

DPDA's are defined as follows.

Definition. A PDA

M = (Q, Σ, Γ, δ, q_0, Z_0, F)

is a deterministic PDA (for short, DPDA), iff the following conditions hold for all (p, Z) ∈ Q × Γ: either

(1) |δ(p, a, Z)| = 1 for all a ∈ Σ, and δ(p, ϵ, Z) = ∅, or

(2) δ(p, a, Z) = ∅ for all a ∈ Σ, and |δ(p, ϵ, Z)| = 1.
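The first example PDA above can be simulated directly. A sketch of acceptance by empty stack follows (breadth-first search over reachable IDs; since this particular PDA has no ϵ-moves, every step consumes an input symbol and the search terminates):

```python
# Transitions of the PDA accepting { a^n b^n | n >= 1 } by empty stack;
# 'Z' stands for Z0, and the top of the stack is the leftmost character.
DELTA = {
    (1, 'a', 'Z'): {(1, 'a')},    # (1, a)  in delta(1, a, Z0)
    (1, 'a', 'a'): {(1, 'aa')},   # (1, aa) in delta(1, a, a)
    (1, 'b', 'a'): {(2, '')},     # (2, ϵ)  in delta(1, b, a)
    (2, 'b', 'a'): {(2, '')},     # (2, ϵ)  in delta(2, b, a)
}

def accepts_by_empty_stack(w):
    frontier = {(1, w, 'Z')}
    while frontier:
        nxt = set()
        for (p, u, alpha) in frontier:
            if u == '' and alpha == '':
                return True        # input consumed and stack empty
            if u and alpha:
                Z, rest = alpha[0], alpha[1:]
                for (q, gamma) in DELTA.get((p, u[0], Z), set()):
                    nxt.add((q, u[1:], gamma + rest))
        frontier = nxt
    return False
```

For instance, on input ab the IDs visited are (1, ab, Z) ⊢ (1, b, a) ⊢ (2, ϵ, ϵ), so ab is accepted, while on a or abb the machine gets stuck before the stack and the input are both empty.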

A DPDA operates in realtime iff it has no ϵ-transitions.

It turns out that for DPDA's the most general acceptance mode is by final state. Indeed, there are languages that can only be accepted deterministically as T(M). The language

L = {a^m b^n | m ≥ n ≥ 1}

is such an example. The problem is that a^m b is a prefix of all strings a^m b^n, with m ≥ n ≥ 2.

A language L is a deterministic context-free language iff L = T(M) for some DPDA M.

It is easily shown that if L = N(M) (or L = L(M)) for some DPDA M, then L = T(M') for some DPDA M' easily constructed from M.

A PDA is unambiguous iff for every w ∈ Σ*, there is at most one computation

(q_0, w, Z_0) ⊢* ID_n,

where ID_n is an accepting ID.

There are context-free languages that are not accepted by any DPDA. For example, it can be shown that the languages

L_1 = {a^n b^n | n ≥ 1} ∪ {a^n b^{2n} | n ≥ 1}, and
L_2 = {ww^R | w ∈ {a, b}*},

are not accepted by any DPDA. Also note that unambiguous grammars for these languages can be easily given.

We now show that every context-free language is accepted by a PDA.

6.14 From Context-Free Grammars To PDA's

We show how a PDA can be easily constructed from a context-free grammar. Although simple, the construction is not practical for parsing purposes, since the resulting PDA is horribly nondeterministic.

Given a context-free grammar G = (V, Σ, P, S), we define a one-state PDA M as follows:

Q = {q_0}; Γ = V; Z_0 = S; F = ∅;

For every rule (A → α) ∈ P, there is a transition

(q_0, α) ∈ δ(q_0, ϵ, A).

For every a ∈ Σ, there is a transition

(q_0, ϵ) ∈ δ(q_0, a, a).

The intuition is that a computation of M mimics a leftmost derivation in G. One might say that we have a pop/expand PDA.
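The pop/expand PDA can be sketched as a recursive nondeterministic search: expand the top nonterminal by each of its productions (an ϵ-move), or pop a terminal that matches the next input symbol. The grammar S → aSb | ab used here is an illustrative choice, not from the text, and the fuel bound is a safety guard against runaway expansion.

```python
# Pop/expand PDA for the grammar S -> aSb | ab (uppercase = nonterminal).
# Acceptance is by empty stack: the input must be consumed exactly when
# the stack becomes empty.

RULES = {'S': ['aSb', 'ab']}

def accepts(w, bound=50):
    def search(u, stack, fuel):
        if fuel == 0:
            return False
        if not stack:
            return u == ''        # empty stack: accept iff input consumed
        top, rest = stack[0], stack[1:]
        if top.isupper():         # expand: (q0, alpha) in delta(q0, ϵ, A)
            return any(search(u, alpha + rest, fuel - 1)
                       for alpha in RULES.get(top, []))
        # pop: (q0, ϵ) in delta(q0, a, a), consuming one input symbol
        return bool(u) and u[0] == top and search(u[1:], rest, fuel - 1)
    return search(w, 'S', bound)
```

The stack starts as the single symbol S (= Z_0), and each successful run of `search` traces a leftmost derivation of the input, matching the intuition stated above.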

Proposition 6.16. Given any context-free grammar G = (V, Σ, P, S), the PDA M just described accepts L(G) by empty stack, i.e., L(G) = N(M).

Proof sketch. The following two claims are proved by induction.

Claim 1: For all u, v ∈ Σ* and all α ∈ NV* ∪ {ϵ}, if S ⇒*lm uα, then

(q0, uv, S) ⊢* (q0, v, α).

Claim 2: For all u, v ∈ Σ* and all α ∈ V*, if

(q0, uv, S) ⊢* (q0, v, α),

then S ⇒*lm uα.

We now show how a PDA can be converted to a context-free grammar.

6.15 From PDA's To Context-Free Grammars

The construction of a context-free grammar from a PDA is not really difficult, but it is quite messy. The construction is simplified if we first convert a PDA to an equivalent PDA such that for every move (q, γ) ∈ δ(p, a, Z) (where a ∈ Σ ∪ {ϵ}), we have |γ| ≤ 2. In some sense, we form a kind of PDA in Chomsky Normal Form.

Proposition 6.17. Given any PDA

M = (Q, Σ, Γ, δ, q0, Z0, F),

another PDA

M′ = (Q′, Σ, Γ′, δ′, q0′, Z0′, F′)

can be constructed, such that L(M) = L(M′) and the following conditions hold:

(1) There is a one-to-one correspondence between accepting computations of M and M′;
(2) If M has no ϵ-moves, then M′ has no ϵ-moves; if M is unambiguous, then M′ is unambiguous;
(3) For all p ∈ Q′, all a ∈ Σ ∪ {ϵ}, and all Z ∈ Γ′, if (q, γ) ∈ δ′(p, a, Z), then q ≠ q0′ and |γ| ≤ 2.

The crucial point of the construction is that accepting computations of a PDA accepting by empty stack and final state can be decomposed into subcomputations of the form

(p, uv, Zα) ⊢* (q, v, α),

where for every intermediate ID (s, w, β), we have β = γα for some γ ≠ ϵ.

The nonterminals of the grammar constructed from the PDA M are triples of the form [p, Z, q] such that

(p, u, Z) ⊢+ (q, ϵ, ϵ)

for some u ∈ Σ*.

Given a PDA M = (Q, Σ, Γ, δ, q0, Z0, F) satisfying the conditions of Proposition 6.17, we construct a context-free grammar G = (V, Σ, P, S) as follows:

V = {[p, Z, q] | p, q ∈ Q, Z ∈ Γ} ∪ Σ ∪ {S},

where S is a new symbol, and the productions are defined as follows: for all p, q ∈ Q, all a ∈ Σ ∪ {ϵ}, all X, Y, Z ∈ Γ, we have:

(1) S → ϵ ∈ P, if q0 ∈ F;
(2) S → a ∈ P, if (f, ϵ) ∈ δ(q0, a, Z0), and f ∈ F;
(3) S → a[p, X, f] ∈ P, for every f ∈ F, if (p, X) ∈ δ(q0, a, Z0);
(4) S → a[p, X, s][s, Y, f] ∈ P, for every f ∈ F, for every s ∈ Q, if (p, XY) ∈ δ(q0, a, Z0);
(5) [p, Z, q] → a ∈ P, if (q, ϵ) ∈ δ(p, a, Z) and p ≠ q0;
(6) [p, Z, s] → a[q, X, s] ∈ P, for every s ∈ Q, if (q, X) ∈ δ(p, a, Z) and p ≠ q0;
(7) [p, Z, t] → a[q, X, s][s, Y, t] ∈ P, for every s, t ∈ Q, if (q, XY) ∈ δ(p, a, Z) and p ≠ q0.

Proposition 6.18. Given any PDA M = (Q, Σ, Γ, δ, q0, Z0, F) satisfying the conditions of Proposition 6.17, the context-free grammar G = (V, Σ, P, S) constructed as above generates L(M), i.e., L(G) = L(M). Furthermore, G is unambiguous iff M is unambiguous.

Proof sketch. We have to prove that

L(G) = {w ∈ Σ+ | (q0, w, Z0) ⊢+ (f, ϵ, ϵ), f ∈ F} ∪ {ϵ | q0 ∈ F}.

For this, the following claim is proved by induction.

Claim: For all p, q ∈ Q, all Z ∈ Γ, all k ≥ 1, and all w ∈ Σ*,

[p, Z, q] ⇒k lm w iff (p, w, Z) ⊢+ (q, ϵ, ϵ).

Using the claim, it is possible to prove that L(G) = L(M).

In view of Propositions 6.16 and 6.18, the family of context-free languages is exactly the family of languages accepted by PDA's. It is harder to give a grammatical characterization of the deterministic context-free languages. One method is to use Knuth LR(k)-grammars. Another characterization can be given in terms of strict deterministic grammars due to Harrison and Havel.

6.16 The Chomsky-Schutzenberger Theorem

Unfortunately, there is no characterization of the context-free languages analogous to the characterization of the regular languages in terms of closure properties (R(Σ)). However, there is a famous theorem due to Chomsky and Schutzenberger showing that every context-free language can be obtained from a special language, the Dyck set, in terms of homomorphisms, inverse homomorphisms and intersection with the regular languages.

Definition. Given the alphabet Σ2 = {a, ā, b, b̄}, define the relation ≃ on Σ2* as follows: For all u, v ∈ Σ2*,

u ≃ v iff ∃x, y ∈ Σ2*, u = xaāy, v = xy or u = xbb̄y, v = xy.

Let ≃* be the reflexive and transitive closure of ≃, and let

D2 = {w ∈ Σ2* | w ≃* ϵ}.

This is the Dyck set on two letters. It is not hard to prove that D2 is context-free.

Theorem 6.19. (Chomsky-Schutzenberger) For every PDA M = (Q, Σ, Γ, δ, q0, Z0, F), there is a regular language R and two homomorphisms g, h such that

L(M) = h(g−1(D2) ∩ R).
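Membership in D2 is easy to decide: a word reduces to ϵ under the cancellations aā → ϵ and bb̄ → ϵ exactly when the usual stack-based bracket check succeeds. The sketch below (not from the notes) writes the four symbols of Σ2 concretely as '(', ')', '[', ']'.

```python
# A sketch checking membership in the Dyck set D2, with the alphabet
# Sigma_2 = {a, a-bar, b, b-bar} written concretely as '(', ')', '[', ']'.
# A word reduces to eps under the cancellations a a-bar -> eps and
# b b-bar -> eps iff the bracket check below succeeds.
MATCH = {')': '(', ']': '['}

def in_dyck2(w: str) -> bool:
    stack = []
    for c in w:
        if c in '([':
            stack.append(c)                       # an opening symbol a or b
        elif c in MATCH:
            if not stack or stack[-1] != MATCH[c]:
                return False                      # nothing cancels this a-bar / b-bar
            stack.pop()
        else:
            raise ValueError(f"symbol {c!r} not in Sigma_2")
    return not stack
```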

Observe that Theorem 6.19 yields another proof of the fact that the language accepted by a PDA is context-free. Indeed, the context-free languages are closed under homomorphisms, inverse homomorphisms, intersection with the regular languages, and D2 is context-free.

From the characterization of a-transducers in terms of homomorphisms, inverse homomorphisms, and intersection with regular languages, we deduce that every context-free language is the image of D2 under some a-transduction.

Chapter 7

A Survey of LR-Parsing Methods

In this chapter, we give a brief survey on LR-parsing methods. We begin with the definition of characteristic strings and the construction of Knuth's LR(0)-characteristic automaton. Next, we describe the shift/reduce algorithm. The need for lookahead sets is motivated by the resolution of conflicts. A unified method for computing FIRST, FOLLOW and LALR(1) lookahead sets is presented. The method uses the same graph algorithm Traverse which finds all nodes reachable from a given node and computes the union of predefined sets assigned to these nodes. Hence, the only difference between the various algorithms for computing FIRST, FOLLOW and LALR(1) lookahead sets lies in the fact that the initial sets and the graphs are computed in different ways. The method can be viewed as an efficient way for solving a set of simultaneously recursive equations with set variables. The method is inspired by DeRemer and Pennello's method for computing LALR(1) lookahead sets. However, DeRemer and Pennello use a more sophisticated graph algorithm for finding strongly connected components. We use a slightly less efficient but simpler algorithm (a depth-first search). We conclude with a brief presentation of LR(1) parsers.

7.1 LR(0)-Characteristic Automata

The purpose of LR-parsing, invented by D. Knuth in the mid sixties, is the following: Given a context-free grammar G, for any terminal string w ∈ Σ*, find out whether w belongs to the language L(G) generated by G, and if so, construct a rightmost derivation of w, in a deterministic fashion. Of course, this is not possible for all context-free grammars, but only for those that correspond to languages that can be recognized by a deterministic PDA (DPDA). Knuth's major discovery was that for a certain type of grammars, the LR(k)-grammars, a certain kind of DPDA could be constructed from the grammar (shift/reduce parsers). The k in LR(k) refers to the amount of lookahead that is necessary in order to proceed deterministically.
It turns out that k = 1 is sufficient, but even in this case, Knuth's construction produces very large DPDA's, and his original LR(1) method is not practical. Fortunately, around 1969, Frank DeRemer, in his MIT Ph.D. thesis, investigated a practical restriction of Knuth's method, known as SLR(k), and soon after, the LALR(k) method was

discovered. The SLR(k) and the LALR(k) methods are both based on the construction of the LR(0)-characteristic automaton from a grammar G, and we begin by explaining this construction. The additional ingredient needed to obtain an SLR(k) or an LALR(k) parser from an LR(0) parser is the computation of lookahead sets. In the SLR case, the FOLLOW sets are needed, and in the LALR case, a more sophisticated version of the FOLLOW sets is needed. We will consider the construction of these sets in the case k = 1. We will discuss the shift/reduce algorithm and consider briefly ways of building LR(1)-parsing tables.

For simplicity of exposition, we first assume that grammars have no ϵ-rules. This restriction will be lifted in Section 7.10.

Given a reduced context-free grammar G = (V, Σ, P, S′) augmented with a start production S′ → S, where S′ does not appear in any other productions, the set CG of characteristic strings of G is the following subset of V* (watch out, not Σ*):

CG = {αβ ∈ V* | S′ ⇒*rm αBv ⇒rm αβv,
      α, β ∈ V*, v ∈ Σ*, B → β ∈ P}.

In words, CG is a certain set of prefixes of sentential forms obtained in rightmost derivations: Those obtained by truncating the part of the sentential form immediately following the rightmost symbol in the righthand side of the production applied at the last step.

The fundamental property of LR-parsing, due to D. Knuth, is that CG is a regular language. Furthermore, a DFA, DCG, accepting CG, can be constructed from G.

Conceptually, it is simpler to construct the DFA accepting CG in two steps:

(1) First, construct a nondeterministic automaton with ϵ-rules, NCG, accepting CG.
(2) Apply the subset construction (Rabin and Scott's method) to NCG to obtain the DFA DCG.

In fact, a careful inspection of the two steps of this construction reveals that it is possible to construct DCG directly in a single step, and this is the construction usually found in most textbooks on parsing.

The nondeterministic automaton NCG accepting CG is defined as follows: The states of NCG are marked productions, where a marked production is a string of the form A → α.β, where A → αβ is a production, and "." is a symbol not in V called the dot and which can appear anywhere within αβ.

The start state is S′ → .S, and the transitions are defined as follows:

(a) For every terminal a ∈ Σ, if A → α.aβ is a marked production, with α, β ∈ V*, then there is a transition on input a from state A → α.aβ to state A → αa.β obtained by shifting the dot. Such a transition is shown in Figure 7.1.

Figure 7.1: Transition on terminal input a (from state A → α.aβ to state A → αa.β).

Figure 7.2: Transitions from state A → α.Bβ (on input B to state A → αB.β, and on input ϵ to the states B → .γ1, ..., B → .γm).

(b) For every nonterminal B ∈ N, if A → α.Bβ is a marked production, with α, β ∈ V*, then there is a transition on input B from state A → α.Bβ to state A → αB.β (obtained by shifting the dot), and transitions on input ϵ (the empty string) to all states B → .γi, for all productions B → γi with left-hand side B. Such transitions are shown in Figure 7.2.

(c) A state is final if and only if it is of the form A → β. (that is, the dot is in the rightmost position).

The above construction is illustrated by the following example:

Example 1. Consider the grammar G1 given by:

S → E
E → aEb
E → ab

The NFA for CG1 is shown in Figure 7.3. The result of making the NFA for CG1 deterministic is shown in Figure 7.4 (where transitions to the dead state have been omitted). The internal structure of the states 1, ..., 6 is shown below:

1 : S → .E
    E → .aEb
    E → .ab
2 : E → a.Eb
    E → a.b
    E → .aEb
    E → .ab
3 : E → aE.b
4 : S → E.
5 : E → ab.
6 : E → aEb.

The next example is slightly more complicated.

Example 2. Consider the grammar G2 given by:

S → E
E → E + T
E → T
T → T ∗ a
T → a

The result of making the NFA for CG2 deterministic is shown in Figure 7.5 (where transitions to the dead state have been omitted). The internal structure of the states 1, ..., 8

Figure 7.3: NFA for CG1 (the states are the marked productions of G1, with ϵ-transitions into the states E → .aEb and E → .ab).

Figure 7.4: DFA for CG1 (states 1-6, with transitions on a, b and E).

Figure 7.5: DFA for CG2 (states 1-8).

is shown below:

1 : S → .E
    E → .E + T
    E → .T
    T → .T ∗ a
    T → .a
2 : E → E. + T
    S → E.
3 : E → T.
    T → T. ∗ a
4 : T → a.
5 : E → E + .T
    T → .T ∗ a
    T → .a
6 : T → T ∗ .a
7 : E → E + T.
    T → T. ∗ a
8 : T → T ∗ a.

Note that some of the marked productions are more important than others. For example, in state 5, the marked production

E → E + .T

determines the state. The other two items T → .T ∗ a and T → .a are obtained by ϵ-closure.

We call a marked production of the form A → α.β, where α ≠ ϵ, a core item. A marked production of the form A → β. is called a reduce item. Reduce items only appear in final

states. If we also call S′ → .S a core item, we observe that every state is completely determined by its subset of core items. The other items in the state are obtained via ϵ-closure. We can take advantage of this fact to write a more efficient algorithm to construct in a single pass the LR(0)-automaton. Also observe the so-called spelling property: All the transitions entering any given state have the same label.

Given a state s, if s contains both a reduce item A → γ. and a shift item B → α.aβ, where a ∈ Σ, we say that there is a shift/reduce conflict in state s on input a. If s contains two (distinct) reduce items A1 → γ1. and A2 → γ2., we say that there is a reduce/reduce conflict in state s.

A grammar is said to be LR(0) if the DFA DCG has no conflicts. This is the case for the grammar G1. However, it should be emphasized that this is extremely rare in practice. The grammar G1 is just a very nice, toy example. In fact, G2 is not LR(0). To eliminate conflicts, one can either compute SLR(1)-lookahead sets, using FOLLOW sets (see Section 7.6), or sharper lookahead sets, the LALR(1) sets (see Section 7.9). For example, the computation of SLR(1)-lookahead sets for G2 will eliminate the conflicts.

We will describe methods for computing SLR(1)-lookahead sets and LALR(1)-lookahead sets in Sections 7.6, 7.9, and 7.10. A more drastic measure is to compute the LR(1)-automaton, in which the states incorporate lookahead symbols (see Section 7.11). However, as we said before, this is not a practical method for large grammars.

In order to motivate the construction of a shift/reduce parser from the DFA accepting CG, let us consider a rightmost derivation for w = aaabbb in reverse order for the grammar

0 : S → E
1 : E → aEb
2 : E → ab

aaabbb = α1 β1 v1                (α1 = aa, β1 = ab, v1 = bb)
aaEbb = α1 B1 v1 = α2 β2 v2      (α2 = a, β2 = aEb, v2 = b)
aEb = α2 B2 v2 = α3 β3 v3        (α3 = v3 = ϵ, β3 = aEb)
E = α3 B3 v3 = α4 β4 v4          (α4 = v4 = ϵ, β4 = E)
S = α4 B4 v4

Figure 7.6: DFA for CG1.

Observe that the strings αi βi for i = 1, 2, 3, 4 are all accepted by the DFA for CG shown in Figure 7.6.

Also, every step from αi βi vi to αi Bi vi is the inverse of the derivation step using the production Bi → βi, and the marked production Bi → βi. is one of the reduce items in the final state reached after processing αi βi with the DFA for CG.

This suggests that we can parse w = aaabbb by recursively running the DFA for CG. The first time (which corresponds to step 1) we run the DFA for CG on w, some string α1 β1 is accepted and the remaining input is v1. Then, we reduce β1 to B1 using a production B1 → β1 corresponding to some reduce item B1 → β1. in the final state s1 reached on input α1 β1.

We now run the DFA for CG on input α1 B1 v1. The string α2 β2 is accepted, and we have α1 B1 v1 = α2 β2 v2. We reduce β2 to B2 using a production B2 → β2 corresponding to some reduce item B2 → β2. in the final state s2 reached on input α2 β2. We now run the DFA for CG on input α2 B2 v2, and so on.

At the (i+1)th step (i ≥ 1), we run the DFA for CG on input αi Bi vi. The string αi+1 βi+1 is accepted, and we have αi Bi vi = αi+1 βi+1 vi+1. We reduce βi+1 to Bi+1 using a production Bi+1 → βi+1 corresponding to some reduce item Bi+1 → βi+1. in the final state si+1 reached on input αi+1 βi+1. The string βi+1 in αi+1 βi+1 vi+1 is often called a handle.

Then we run again the DFA for CG on input αi+1 Bi+1 vi+1. Now, because the DFA for CG is deterministic, there is no need to rerun it on the entire string αi+1 Bi+1 vi+1, because on input αi+1 it will take us to the same state, say pi+1, that it reached on input αi+1 βi+1 vi+1!
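The DFA used above can itself be built mechanically. The sketch below (not from the notes) applies the closure/goto construction of Section 7.1 directly to the grammar G1 of Example 1; items are encoded as pairs (production index, dot position), and the subset construction is folded into goto, as in most textbooks.

```python
# A sketch of the LR(0) construction for the grammar G1 of Example 1
# (0: S -> E, 1: E -> aEb, 2: E -> ab).  States are frozensets of items,
# where an item (p, d) is production p with the dot before position d.
GRAMMAR = [('S', 'E'), ('E', 'aEb'), ('E', 'ab')]

def closure(items):
    # Add B -> .gamma for every item with the dot in front of nonterminal B.
    items = set(items)
    while True:
        new = {(i, 0)
               for (p, d) in items if d < len(GRAMMAR[p][1])
               for (i, (lhs, rhs)) in enumerate(GRAMMAR)
               if GRAMMAR[p][1][d] == lhs}
        if new <= items:
            return frozenset(items)
        items |= new

def goto(state, X):
    # Shift the dot over X in every item where X follows the dot.
    return closure({(p, d + 1) for (p, d) in state
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == X})

def lr0_states():
    start = closure({(0, 0)})          # closure of S -> .E
    states, todo = {start}, [start]
    while todo:
        s = todo.pop()
        for X in 'abE':
            t = goto(s, X)
            if t and t not in states:
                states.add(t)
                todo.append(t)
    return states
```

Running lr0_states() yields the six states listed in Example 1, three of which (states 4, 5, 6) contain a reduce item.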

The trick is that we can use a stack to keep track of the sequence of states used to process αi+1 βi+1. Then, to perform the reduction of αi+1 βi+1 to αi+1 Bi+1, we simply pop a number of states equal to |βi+1|, uncovering a new state pi+1 on top of the stack, and from state pi+1 we perform the transition on input Bi+1 to a state qi+1 (in the DFA for CG), so we push state qi+1 on the stack, which now contains the sequence of states on input αi+1 Bi+1 that takes us to qi+1. Then we resume scanning vi+1 using the DFA for CG, pushing each state being traversed on the stack until we hit a final state.

At this point we find the new string αi+2 βi+2 that leads to a final state and we continue as before. The process stops when the remaining input vi+1 becomes empty and when the reduce item S′ → S. (here, S → E.) belongs to the final state si+1.

Figure 7.7: DFA for CG1.

For example, on input α2 β2 = aaEb, we have the sequence of states:

1 2 2 3 6

State 6 contains the marked production E → aEb., so we pop the three topmost states, obtaining the stack

1 2

and then we make the transition from state 2 on input E, which takes us to state 3, so we push 3 on top of the stack, obtaining

1 2 3

We continue from state 3 on input b.

Basically, the recursive calls to the DFA for CG are implemented using a stack.
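The stack discipline just described can be sketched as a table-driven loop. The fragment below (not from the notes) hard-codes the action/goto tables for G1 that are constructed in Section 7.2, taking them as given; it returns the sequence of reducing productions, or None on error.

```python
# A sketch of the stack-based shift/reduce driver for G1
# (0: S -> E, 1: E -> aEb, 2: E -> ab), with the Section 7.2 tables
# for G1 taken as given.
ACTION = {(1, 'a'): 's2', (2, 'a'): 's2', (2, 'b'): 's5', (3, 'b'): 's6',
          (4, '$'): 'acc',
          (5, 'a'): 'r2', (5, 'b'): 'r2', (5, '$'): 'r2',
          (6, 'a'): 'r1', (6, 'b'): 'r1', (6, '$'): 'r1'}
GOTO = {(1, 'E'): 4, (2, 'E'): 3}
PRODS = {1: ('E', 3), 2: ('E', 2)}     # production number -> (lhs, |rhs|)

def parse(w: str):
    stack, inp, reductions = [1], list(w) + ['$'], []
    while True:
        act = ACTION.get((stack[-1], inp[0]))
        if act is None:
            return None                      # blank entry: abort the parse
        if act == 'acc':
            return reductions
        if act[0] == 's':                    # shift: push state, advance cursor
            stack.append(int(act[1:]))
            inp.pop(0)
        else:                                # reduce: pop |rhs| states, then goto
            n = int(act[1:])
            lhs, k = PRODS[n]
            del stack[len(stack) - k:]
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(n)
```

Read bottom-up, the returned reduction sequence is a rightmost derivation in reverse, as observed in Section 7.2.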

What is not clear is, during step i + 1, when reaching a final state si+1, how do we know which production Bi+1 → βi+1 to use in the reduction step? Indeed, state si+1 could contain several reduce items Bi+1 → βi+1.. This is where we assume that we were able to compute some lookahead information, that is, for every final state s and every input a, we know which unique production n: Bi+1 → βi+1 applies. This is recorded in a table named action, such that action(s, a) = rn, where r stands for reduce. Typically we compute SLR(1) or LALR(1) lookahead sets. Otherwise, we could pick some reducing production nondeterministically and use backtracking. This works, but the running time may be exponential.

The DFA for CG and the action table giving us the reductions can be combined to form a bigger action table which specifies completely how the parser using a stack works. This kind of parser, called a shift-reduce parser, is discussed in the next section. In order to make it easier to compute the reduce entries in the parsing table, we assume that the end of the input w is signalled by a special endmarker traditionally denoted by $.

7.2 Shift/Reduce Parsers

A shift/reduce parser is a modified kind of DPDA. Firstly, push moves, called shift moves, are restricted so that exactly one symbol is pushed on top of the stack. Secondly, more powerful kinds of pop moves, called reduce moves, are allowed. During a reduce move, a finite number of stack symbols may be popped off the stack, and the last step of a reduce move, called a goto move, consists of pushing one symbol on top of the new topmost symbol in the stack. Shift/reduce parsers use parsing tables constructed from the LR(0)-characteristic automaton DCG associated with the grammar. The shift and goto moves come directly from the transition table of DCG, but the determination of the reduce moves requires the computation of lookahead sets. The SLR(1) lookahead sets are obtained from some sets called the FOLLOW sets (see Section 7.6), and the LALR(1) lookahead sets LA(s, A → γ) require fancier FOLLOW sets (see Section 7.9).
The construction of shift/reduce parsers is made simpler by assuming that the end of input strings w ∈ Σ* is indicated by the presence of an endmarker, usually denoted $, and assumed not to belong to Σ.

Consider the grammar G1 of Example 1, where we have numbered the productions 0, 1, 2:

0 : S → E
1 : E → aEb
2 : E → ab

The parsing tables associated with the grammar G1 are shown below:

      a     b     $     E
1     s2                4
2     s2    s5          3
3           s6
4                 acc
5     r2    r2    r2
6     r1    r1    r1

Figure 7.8: DFA for CG1.

Entries of the form si are shift actions, where i denotes one of the states, and entries of the form rn are reduce actions, where n denotes a production number (not a state). The special action acc means accept, and signals the successful completion of the parse. Entries of the form i, in the rightmost column, are goto actions. All blank entries are error entries, and mean that the parse should be aborted.

We will use the notation action(s, a) for the entry corresponding to state s and terminal a ∈ Σ ∪ {$}, and goto(s, A) for the entry corresponding to state s and nonterminal A ∈ N ∪ {S′}.

Assuming that the input is w$, we now describe in more detail how a shift/reduce parser proceeds. The parser uses a stack in which states are pushed and popped. Initially, the stack contains state 1 and the cursor pointing to the input is positioned on the leftmost symbol. There are four possibilities:

(1) If action(s, a) = sj, then push state j on top of the stack, and advance to the next input symbol in w$. This is a shift move.

(2) If action(s, a) = rn, then do the following: First, determine the length k = |γ| of the righthand side of the production n: A → γ. Then, pop the topmost k symbols off the stack (if k = 0, no symbols are popped). If p is the new top state on the stack (after the k pop moves), push the state goto(p, A) on top of the stack, where A is the

lefthand side of the reducing production A → γ. Do not advance the cursor in the current input. This is a reduce move.

(3) If action(s, $) = acc, then accept. The input string w belongs to L(G).

(4) In all other cases, error, abort the parse. The input string w does not belong to L(G).

Observe that no explicit state control is needed. The current state is always the current topmost state in the stack.

We illustrate below a parse of the input aaabbb$.

stack     remaining input     action
1         aaabbb$             s2
12        aabbb$              s2
122       abbb$               s2
1222      bbb$                s5
12225     bb$                 r2
1223      bb$                 s6
12236     b$                  r1
123       b$                  s6
1236      $                   r1
14        $                   acc

Observe that the sequence of reductions read from bottom-up yields a rightmost derivation of aaabbb from E (or from S, if we view the action acc as the reduction by the production S → E). This is a general property of LR-parsers.

The SLR(1) reduce entries in the parsing tables are determined as follows: For every state s containing a reduce item B → γ., if B → γ is the production number n, enter the action rn for state s and every terminal a ∈ FOLLOW(B). If the resulting shift/reduce parser has no conflicts, we say that the grammar is SLR(1). For the LALR(1) reduce entries, enter the action rn for state s and production n: B → γ, for all a ∈ LA(s, B → γ). Similarly, if the shift/reduce parser obtained using LALR(1)-lookahead sets has no conflicts, we say that the grammar is LALR(1).

7.3 Computation of FIRST

In order to compute the FOLLOW sets, we first need to compute the FIRST sets! For simplicity of exposition, we first assume that grammars have no ϵ-rules. The general case will be treated in Section 7.10.

Given a context-free grammar G = (V, Σ, P, S′) (augmented with a start production S′ → S), for every nonterminal A ∈ N = V − Σ, let

FIRST(A) = {a | a ∈ Σ, A ⇒+ aα, for some α ∈ V*}.

For a terminal a ∈ Σ, let FIRST(a) = {a}.

The key to the computation of FIRST(A) is the following observation: a is in FIRST(A) if either a is in

INITFIRST(A) = {a | a ∈ Σ, A → aα ∈ P, for some α ∈ V*},

or a is in

⋃{FIRST(B) | A → Bα ∈ P, for some α ∈ V*, B ≠ A}.

Note that the second assertion is true because, if B ⇒+ aδ, then A ⇒ Bα ⇒+ aδα, and so, FIRST(B) ⊆ FIRST(A) whenever A → Bα ∈ P, with A ≠ B. Hence, the FIRST sets are the least solution of the following set of recursive equations: For each nonterminal A,

FIRST(A) = INITFIRST(A) ∪ ⋃{FIRST(B) | A → Bα ∈ P, A ≠ B}.

In order to explain the method for solving such systems, we will formulate the problem in more general terms, but first, we describe a naive version of the shift/reduce algorithm that hopefully demystifies the optimized version described in Section 7.2.

7.4 The Intuition Behind the Shift/Reduce Algorithm

Let DCG = (K, V, δ, q0, F) be the DFA accepting the regular language CG, and let δ* be the extension of δ to K × V*. Let us assume that the grammar G is either SLR(1) or LALR(1), which implies that it has no shift/reduce or reduce/reduce conflicts. We can use the DFA DCG accepting CG recursively to parse L(G). The function CG is defined as follows: Given any string μ ∈ V*,

CG(μ) = error                if δ*(q0, μ) = error;
CG(μ) = (δ*(q0, θ), θ, v)    if δ*(q0, θ) ∈ F, μ = θv and θ is the shortest
                             prefix of μ s.t. δ*(q0, θ) ∈ F.

The naive shift-reduce algorithm is shown below:

begin
  accept := true;
  stop := false;
  μ := w$; {input string}
  while ¬stop do
    if CG(μ) = error then

      begin
        stop := true;
        accept := false
      end
    else
      begin
        Let (q, θ, v) = CG(μ);
        Let B → β be the production so that action(q, FIRST(v)) = B → β,
          and let θ = αβ;
        if B → β = S′ → S then
          stop := true
        else
          μ := αBv {reduction}
        endif
      end
    endif
  endwhile
end

The idea is to recursively run the DFA DCG on the sentential form μ, until the first final state q is hit. Then, the sentential form μ must be of the form αβv, where v is a terminal string ending in $, and the final state q contains a reduce item of the form B → β, with action(q, FIRST(v)) = B → β. Thus, we can reduce μ = αβv to αBv, since we have found a rightmost derivation step, and repeat the process.

Note that the major inefficiency of the algorithm is that when a reduction is performed, the prefix α of μ is reparsed entirely by DCG. Since DCG is deterministic, the sequence of states obtained on input α is uniquely determined. If we keep the sequence of states produced on input θ by DCG in a stack, then it is possible to avoid reparsing α. Indeed, all we have to do is update the stack so that just before applying DCG to αBv, the sequence of states in the stack is the sequence obtained after parsing α. This stack is obtained by popping the |β| topmost states and performing an update which is just a goto move. This is the standard version of the shift/reduce algorithm!

7.5 The Graph Method for Computing Fixed Points

Let X be a finite set representing the domain of the problem (in Section 7.3 above, X = Σ), let F(1), ..., F(N) be N sets to be computed and let I(1), ..., I(N) be N given subsets of X. The sets I(1), ..., I(N) are the initial sets. We also have a directed graph G whose set of nodes is {1, ..., N} and which represents relationships among the sets F(i), where 1 ≤ i ≤ N. The graph G has no parallel edges and no loops, but it may have cycles. If there is an edge from i to j, this is denoted by iGj (note that the absence of loops means that iGi never holds). Also, the existence of a path from i to j is denoted by iG+j. The graph G represents a relation, and G+ is the graph of the transitive closure of this relation.
The existence of a path from i to j, including the null path, is denoted by iG*j. Hence, G* is the

reflexive and transitive closure of G. We want to solve for the least solution of the system of recursive equations:

F(i) = I(i) ∪ ⋃{F(j) | iGj, i ≠ j}, 1 ≤ i ≤ N.

Since (2^X)^N is a complete lattice under the inclusion ordering (which means that every family of subsets has a least upper bound, namely, the union of this family), it is an ω-complete poset, and since the function F : (2^X)^N → (2^X)^N induced by the system of equations is easily seen to preserve least upper bounds of ω-chains, the least solution of the system can be computed by the standard fixed point technique (as explained in Section 3.7 of the class notes). We simply compute the sequence of approximations (F^k(1), ..., F^k(N)), where

F^0(i) = ∅, 1 ≤ i ≤ N,

and

F^{k+1}(i) = I(i) ∪ ⋃{F^k(j) | iGj, i ≠ j}, 1 ≤ i ≤ N.

It is easily seen that we can stop at k = N − 1, and the least solution is given by

F(i) = F^1(i) ∪ F^2(i) ∪ ··· ∪ F^N(i), 1 ≤ i ≤ N.

However, the above expression can be simplified to

F(i) = ⋃{I(j) | iG*j}, 1 ≤ i ≤ N.

This last expression shows that in order to compute F(i), it is necessary to compute the union of all the initial sets I(j) reachable from i (including i). Hence, any transitive closure algorithm or graph traversal algorithm will do. For simplicity and for pedagogical reasons, we use a depth-first search algorithm.

Going back to FIRST, we see that all we have to do is to compute the INITFIRST sets, the graph GFIRST, and then use the graph traversal algorithm. The graph GFIRST is computed as follows: The nodes are the nonterminals and there is an edge from A to B (A ≠ B) if and only if there is a production of the form A → Bα, for some α ∈ V*.

Example 1. Computation of the FIRST sets for the grammar G1 given by the rules:

S → E$
E → E + T
E → T
T → T ∗ F
T → F
F → (E)
F → a
F → b
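The simplified expression F(i) = ⋃{I(j) | iG*j} can be sketched directly as code. The fragment below is not from the notes: it computes the reachable set of each node by a depth-first search and takes the union of the corresponding initial sets, and then applies this to the grammar above, whose graph GFIRST has the edges E → T and T → F.

```python
# A sketch of the graph method of Section 7.5: F(i) is the union of the
# initial sets I(j) over all j reachable from i (including i itself).
def solve(I, edges):
    # I: dict node -> initial set; edges: dict node -> list of successors.
    def reachable(i):
        seen, todo = set(), [i]
        while todo:
            n = todo.pop()
            if n not in seen:
                seen.add(n)
                todo.extend(edges.get(n, []))
        return seen
    return {i: set().union(*(I[j] for j in reachable(i))) for i in I}

# The FIRST computation for the grammar of Example 1 in this section:
# only INITFIRST(F) is nonempty, and GFIRST has edges E -> T -> F.
FIRST = solve({'E': set(), 'T': set(), 'F': {'(', 'a', 'b'}},
              {'E': ['T'], 'T': ['F']})
```

The same solver works unchanged for FOLLOW and for the FOLLOW(p, A) sets of Section 7.9; only the initial sets and the graph differ.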

Figure 7.9: Graph GFIRST for G1 (edges E → T and T → F).

We get

INITFIRST(E) = ∅,
INITFIRST(T) = ∅,
INITFIRST(F) = {(, a, b}.

The graph GFIRST is shown in Figure 7.9. We obtain the following FIRST sets:

FIRST(E) = FIRST(T) = FIRST(F) = {(, a, b}.

7.6 Computation of FOLLOW

Recall the definition of FOLLOW(A) for a nonterminal A:

FOLLOW(A) = {a | a ∈ Σ, S ⇒+ αAaβ, for some α, β ∈ V*}.

Note that a is in FOLLOW(A) if either a is in

INITFOLLOW(A) = {a | a ∈ Σ, B → αAXβ ∈ P, a ∈ FIRST(X), α, β ∈ V*}

or a is in

⋃{FOLLOW(B) | B → αA ∈ P, α ∈ V*, A ≠ B}.

Indeed, if S ⇒+ λBρ, then S ⇒+ λBρ ⇒ λαAρ, and so,

FOLLOW(B) ⊆ FOLLOW(A) whenever B → αA is in P, with A ≠ B.

Hence, the FOLLOW sets are the least solution of the set of recursive equations: For all nonterminals A,

FOLLOW(A) = INITFOLLOW(A) ∪ ⋃{FOLLOW(B) | B → αA ∈ P, α ∈ V*, A ≠ B}.

According to the method explained above, we just have to compute the INITFOLLOW sets (using FIRST) and the graph GFOLLOW, which is computed as follows: The nodes are the nonterminals and there is an edge from A to B (A ≠ B) if and only if there is a production

of the form B → αA in P, for some α ∈ V*. Note the duality between the construction of the graph GFIRST and the graph GFOLLOW.

Figure 7.10: Graph GFOLLOW for G1 (edges T → E and F → T).

Example 2. Computation of the FOLLOW sets for the grammar G1.

INITFOLLOW(E) = {+, ), $},
INITFOLLOW(T) = {∗},
INITFOLLOW(F) = ∅.

The graph GFOLLOW is shown in Figure 7.10. We have

FOLLOW(E) = INITFOLLOW(E),
FOLLOW(T) = INITFOLLOW(T) ∪ INITFOLLOW(E),
FOLLOW(F) = INITFOLLOW(F) ∪ INITFOLLOW(T) ∪ INITFOLLOW(E),

and so

FOLLOW(E) = {+, ), $},
FOLLOW(T) = {+, ∗, ), $},
FOLLOW(F) = {+, ∗, ), $}.

7.7 Algorithm Traverse

The input is a directed graph Gr having N nodes, and a family of initial sets I[i], 1 ≤ i ≤ N. We assume that a function successors is available, which returns for each node n in the graph the list successors[n] of all immediate successors of n. The output is the list of sets F[i], 1 ≤ i ≤ N, solution of the system of recursive equations of Section 7.5. Hence,

F[i] = ⋃{I[j] | iGr*j}, 1 ≤ i ≤ N.

The procedure Reachable visits all nodes reachable from a given node. It uses a stack STACK and a boolean array VISITED to keep track of which nodes have been visited. The procedures Reachable and traverse are shown in Figure 7.11.

Procedure Reachable(Gr : graph; startnode : node; I : listofsets;
                    var F : listofsets);
var currentnode, succnode, i : node;
    STACK : stack;
    VISITED : array[1..N] of boolean;
begin
  for i := 1 to N do
    VISITED[i] := false;
  STACK := EMPTY;
  push(STACK, startnode);
  while STACK ≠ EMPTY do
  begin
    currentnode := top(STACK);
    pop(STACK);
    VISITED[currentnode] := true;
    for each succnode ∈ successors(currentnode) do
      if ¬VISITED[succnode] then
      begin
        push(STACK, succnode);
        F[startnode] := F[startnode] ∪ I[succnode]
      end
  end
end

The sets F[i], 1 ≤ i ≤ N, are computed as follows:

begin
  for i := 1 to N do
    F[i] := I[i];
  for startnode := 1 to N do
    Reachable(Gr, startnode, I, F)
end

Figure 7.11: Algorithm traverse

7.8 More on LR(0)-Characteristic Automata

Let G = (V, Σ, P, S′) be an augmented context-free grammar with augmented start production S′ → S$ (where S′ only occurs in the augmented production). The rightmost derivation relation is denoted by ⇒rm. Recall that the set CG of characteristic strings for the grammar G is defined by

CG = {αβ ∈ V* | S′ ⇒*rm αAv ⇒rm αβv, αβ ∈ V*, v ∈ Σ*}.

The fundamental property of LR-parsing, due to D. Knuth, is stated in the following theorem:

Theorem 7.1. Let G be a context-free grammar and assume that every nonterminal derives some terminal string. The language CG (over V*) is a regular language. Furthermore, a deterministic automaton DCG accepting CG can be constructed from G.

The construction of DCG can be found in various places, including the book on Compilers by Aho, Sethi and Ullman. We explained this construction in Section 7.1. The proof that the NFA NCG constructed as indicated in Section 7.1 is correct, i.e., that it accepts precisely CG, is nontrivial, but not really hard either. This will be the object of a homework assignment! However, note a subtle point: The construction of NCG is only correct under the assumption that every nonterminal derives some terminal string. Otherwise, the construction could yield an NFA NCG accepting strings not in CG.

Recall that the states of the characteristic automaton CGA are sets of items (or marked productions), where an item is a production with a dot anywhere in its right-hand side. Note that in constructing CGA, it is not necessary to include the state {S′ → S$.} (the endmarker $ is only needed to compute the lookahead sets).

If a state p contains a marked production of the form A → β., where the dot is the rightmost symbol, state p is called a reduce state and A → β is called a reducing production for p. Given any state q, we say that a string β ∈ V* accesses q if there is a path from some state p to the state q on input β in the automaton CGA. Given any two states p, q ∈ CGA, for any β ∈ V*, if there is a sequence of transitions in CGA from p to q on input β, this is denoted by

p →β q.
The initial state, which is the closure of the item S′ → .S$, is denoted by 1. The LALR(1)-lookahead sets are defined in the next section.

7.9 LALR(1)-Lookahead Sets

For any reduce state q and any reducing production A → β for q, let

LA(q, A → β) = {a | a ∈ Σ, S′ ⇒*rm αAav ⇒rm αβav, α ∈ V*, v ∈ Σ*, and αβ accesses q}.

In words, LA(q, A → β) consists of the terminal symbols for which the reduction by production A → β in state q is the correct action (that is, for which the parse will terminate successfully). The LA sets can be computed using the FOLLOW sets defined below.

For any state p and any nonterminal A, let

FOLLOW(p, A) = {a | a ∈ Σ, S′ ⇒*rm αAav, α ∈ V*, v ∈ Σ*, and α accesses p}.

Since for any derivation

S′ ⇒*rm αAav ⇒rm αβav

where αβ accesses q, there is a state p such that p →β q and α accesses p, it is easy to see that the following result holds:

Proposition 7.2. For every reduce state q and any reducing production A → β for q, we have

LA(q, A → β) = ⋃{FOLLOW(p, A) | p →β q}.

Also, we let

LA({S′ → S.$}, S′ → S$) = FOLLOW(1, S).

Intuitively, when the parser makes the reduction by production A → β in state q, each state p as above is a possible top of stack after the states corresponding to β are popped. Then the parser must read A in state p, and the next input symbol will be one of the symbols in FOLLOW(p, A).

The computation of FOLLOW(p, A) is similar to that of FOLLOW(A). First, we compute INITFOLLOW(p, A), given by

INITFOLLOW(p, A) = {a | a ∈ Σ, ∃q, r, p →A q →a r}.

These are the terminals that can be read in CGA after the goto transition on nonterminal A has been performed from p. These sets can be easily computed from CGA. Note that for the state p whose core item is S′ → S.$, we have

INITFOLLOW(p, S) = {$}.

Next, observe that if B → αA is a production and if

S′ ⇒*rm λBav

where λ accesses p′, then

S′ ⇒*rm λBav ⇒rm λαAav

where λ accesses p′ and p′ --α--> p. Hence λα accesses p, and

FOLLOW(p′, B) ⊆ FOLLOW(p, A)

whenever there is a production B → αA and p′ --α--> p. From this, the following recursive equations are easily obtained: for all p and all A,

FOLLOW(p, A) = INITFOLLOW(p, A) ∪ ∪{FOLLOW(p′, B) | B → αA ∈ P, α ∈ V*, and p′ --α--> p}.

From Section 7.5, we know that these sets can be computed by using the algorithm traverse. All we need is to compute the graph GLA.

The nodes of the graph GLA are the pairs (p, A), where p is a state and A is a nonterminal. There is an edge from (p, A) to (p′, B) if and only if there is a production of the form B → αA in P for some α ∈ V* and p′ --α--> p in CGA. Note that it is only necessary to consider nodes (p, A) for which there is a nonterminal transition on A from p. Such pairs can be obtained from the parsing table.

Also, using the spelling property, that is, the fact that all transitions entering a given state have the same label, it is possible to compute the relation lookback defined as follows:

(q, A) lookback (p, A) iff p --β--> q

for some reduce state q and reducing production A → β.

The above considerations show that the FOLLOW sets of Section 7.6 are obtained by ignoring the state component from FOLLOW(p, A). We now consider the changes that have to be made when ϵ-rules are allowed.

7.10 Computing FIRST, FOLLOW, etc. in the Presence of ϵ-Rules

First, it is necessary to compute the set E of erasable nonterminals, that is, the set of nonterminals A such that A =⇒+ ϵ. We let E be a boolean array and change be a boolean flag. An algorithm for computing E is shown in Figure 7.12.

Then, in order to compute FIRST, we compute

INITFIRST(A) = {a ∈ Σ | A → aα ∈ P, or A → A1 · · · Ak aα ∈ P, for some α ∈ V*, and E(A1) = · · · = E(Ak) = true}.

The graph GFIRST is obtained as follows: the nodes are the nonterminals, and there is an edge from A to B if and only if either there is a production A → Bα, or a production A → A1 · · · Ak Bα, for some α ∈ V*, with E(A1) = · · · = E(Ak) = true. Then, we extend
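The recursive equations for FOLLOW(p, A) have the generic shape F(x) = INIT(x) ∪ ∪{F(y) | (x, y) an edge of the graph}. As a sketch of how equations of this shape can be solved, here is a plain worklist iteration standing in for the algorithm traverse of Section 7.5; the node and edge data below are illustrative fragments, not the full graph GLA of any grammar.

```python
# Solve F(x) = INIT(x) ∪ ⋃ { F(y) | (x, y) an edge } by iterating to a
# fixed point. edges[x] lists the nodes y whose set flows into F(x).

def solve(nodes, edges, init):
    F = {x: set(init.get(x, ())) for x in nodes}
    changed = True
    while changed:
        changed = False
        for x in nodes:
            for y in edges.get(x, ()):
                if not F[y] <= F[x]:   # propagate anything F(x) is missing
                    F[x] |= F[y]
                    changed = True
    return F

# Hypothetical fragment: (1,L) feeds from (1,R), which feeds from (1,S).
nodes = [(1, 'S'), (1, 'R'), (1, 'L')]
edges = {(1, 'R'): [(1, 'S')], (1, 'L'): [(1, 'R')]}
init = {(1, 'S'): {'$'}, (1, 'L'): {'='}}
print(solve(nodes, edges, init))
# {(1,'S'): {'$'}, (1,'R'): {'$'}, (1,'L'): {'=', '$'}}
```

The same routine also solves the INITFOLLOW equations over the graph GREAD introduced later in this section, since they have the same shape.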

begin
  for each nonterminal A do
    E(A) := false;
  for each nonterminal A such that A → ϵ ∈ P do
    E(A) := true;
  change := true;
  while change do
  begin
    change := false;
    for each A → A1 · · · An ∈ P s.t. E(A1) = · · · = E(An) = true do
      if E(A) = false then
      begin
        E(A) := true;
        change := true
      end
  end
end

Figure 7.12: Algorithm for computing E
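The algorithm of Figure 7.12 transcribes directly into code. The grammar encoding below (productions as (lhs, rhs) pairs, with rhs a tuple of symbols) is an assumption of this sketch, not the notes' representation.

```python
# Fixed-point computation of the erasable nonterminals, as in Figure 7.12.

def erasable(productions):
    nonterminals = {lhs for lhs, _ in productions}
    E = {A: False for A in nonterminals}
    # Base case: productions A -> epsilon (empty right-hand side).
    for lhs, rhs in productions:
        if len(rhs) == 0:
            E[lhs] = True
    change = True
    while change:
        change = False
        for lhs, rhs in productions:
            # A -> A1 ... An with every symbol already known erasable.
            if not E[lhs] and all(E.get(X, False) for X in rhs):
                E[lhs] = True
                change = True
    return E

# Hypothetical grammar: S -> AB, A -> aA | ϵ, B -> b | ϵ.
G = [("S", ("A", "B")), ("A", ("a", "A")), ("A", ()), ("B", ("b",)), ("B", ())]
print(erasable(G))  # all of S, A, B are erasable
```

Terminals never appear in E, so `E.get(X, False)` correctly blocks propagation through any right-hand side containing a terminal.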

FIRST to strings in V+, in the obvious way. Given any string α ∈ V+, if |α| = 1, then α = X for some X ∈ V, and

FIRST(α) = FIRST(X)

as before, else if α = X1 · · · Xn with n ≥ 2 and Xi ∈ V, then

FIRST(α) = FIRST(X1) ∪ · · · ∪ FIRST(Xk),

where k, 1 ≤ k ≤ n, is the largest integer so that

E(X1) = · · · = E(Xk) = true.

To compute FOLLOW, we first compute

INITFOLLOW(A) = {a ∈ Σ | B → αAβ ∈ P, α ∈ V*, β ∈ V+, and a ∈ FIRST(β)}.

The graph GFOLLOW is computed as follows: the nodes are the nonterminals. There is an edge from A to B if either there is a production of the form B → αA, or B → αAA1 · · · Ak, for some α ∈ V*, and with E(A1) = · · · = E(Ak) = true.

The computation of the LALR(1) lookahead sets is also more complicated because another graph is needed in order to compute INITFOLLOW(p, A).

First, the graph GLA is defined in the following way: the nodes are still the pairs (p, A), as before, but there is an edge from (p, A) to (p′, B) if and only if either there is some production B → αA, for some α ∈ V* and p′ --α--> p, or a production B → αAβ, for some α ∈ V*, β ∈ V+, β =⇒+ ϵ, and p′ --α--> p.

The sets INITFOLLOW(p, A) are computed in the following way. First, let

DR(p, A) = {a ∈ Σ | ∃q, r, p --A--> q --a--> r}.

The sets DR(p, A) are the direct read sets. Note that for the state p whose core item is S′ → S.$, we have DR(p, S) = {$}. Then,

INITFOLLOW(p, A) = DR(p, A) ∪ {a ∈ Σ | S′ =⇒*rm αAβav =⇒*rm αAav, α ∈ V*, β ∈ V+, β =⇒+ ϵ, and α accesses p}.

The set INITFOLLOW(p, A) is the set of terminals that can be read before any handle containing A is reduced.

The graph GREAD is defined as follows: the nodes are the pairs (p, A), and there is an edge from (p, A) to (r, C) if and only if p --A--> r and r --C--> s, for some s, with E(C) = true. Then, it is not difficult to show that the INITFOLLOW sets are the least solution of the set of recursive equations

INITFOLLOW(p, A) = DR(p, A) ∪ ∪{INITFOLLOW(r, C) | (p, A) GREAD (r, C)}.
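As a sketch, the extension of FIRST to strings can be computed symbol by symbol, accumulating FIRST of each symbol until the first non-erasable one is reached. The FIRST table and erasability map below are hypothetical data, and treating terminals as their own FIRST sets is an assumption of this sketch.

```python
# FIRST of a nonempty string alpha (a sequence of grammar symbols),
# given per-symbol FIRST sets and the erasability map E.

def first_string(alpha, FIRST, E):
    result = set()
    for X in alpha:
        result |= FIRST[X]
        if not E.get(X, False):   # stop at the first non-erasable symbol
            break
    return result

# Hypothetical tables: terminals a, b are their own FIRST; A is erasable.
FIRST = {'a': {'a'}, 'b': {'b'}, 'A': {'a'}, 'B': {'b'}}
E = {'A': True, 'B': False}
print(first_string(['A', 'B', 'a'], FIRST, E))  # {'a', 'b'}
```

Since A is erasable, FIRST(B) is included; since B is not, the trailing a contributes nothing.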

Hence the INITFOLLOW sets can be computed using the algorithm traverse on the graph GREAD and the sets DR(p, A), and then the FOLLOW sets can be computed using traverse again, with the graph GLA and the sets INITFOLLOW. Finally, the sets LA(q, A → β) are computed from the FOLLOW sets using the graph lookback.

From Section 7.5, we note that F(i) = F(j) whenever there is a path from i to j and a path from j to i, that is, whenever i and j are strongly connected. Hence, the solution of the system of recursive equations can be computed more efficiently by finding the maximal strongly connected components of the graph G, since F has the same value on each strongly connected component. This is the approach followed by DeRemer and Pennello in: Efficient Computation of LALR(1) Lookahead Sets, by F. DeRemer and T. Pennello, TOPLAS, Vol. 4, No. 4, October 1982.

We now give an example of a grammar which is LALR(1) but not SLR(1).

Example 3. The grammar G2 is given by:

S′ → S$
S → L = R
S → R
L → *R
L → id
R → L

The states of the characteristic automaton CGA2 are:

1: S′ → .S$
   S → .L = R
   S → .R
   L → .*R
   L → .id
   R → .L
2: S′ → S.$
3: S → L. = R
   R → L.
4: S → R.
5: L → *.R
   R → .L
   L → .*R
   L → .id
6: L → id.
7: S → L = .R
   R → .L
   L → .*R
   L → .id
8: L → *R.
9: R → L.
10: S → L = R.

We find that

INITFIRST(S) = ∅
INITFIRST(L) = {*, id}
INITFIRST(R) = ∅.

The graph GFIRST is shown in Figure 7.13. Then, we find that

FIRST(S) = {*, id}
FIRST(L) = {*, id}
FIRST(R) = {*, id}.

[Figure 7.13: The graph GFIRST, with nodes S, L, R]

[Figure 7.14: The graph GFOLLOW, with nodes S, L, R]

We also have

INITFOLLOW(S) = {$}
INITFOLLOW(L) = {=}
INITFOLLOW(R) = ∅.

The graph GFOLLOW is shown in Figure 7.14. Then, we find that

FOLLOW(S) = {$}
FOLLOW(L) = {=, $}
FOLLOW(R) = {=, $}.

Note that there is a shift/reduce conflict in state 3 on input =, since there is a shift on input = (because S → L. = R is in state 3), and a reduce for R → L, since = is in FOLLOW(R). However, as we shall see, the conflict is resolved if the LALR(1) lookahead sets are computed. The graph GLA is shown in Figure 7.15.

[Figure 7.15: The graph GLA, with nodes (1,S), (1,R), (5,R), (7,R), (1,L), (5,L), (7,L)]

We get the following INITFOLLOW and FOLLOW sets:

INITFOLLOW(1, S) = {$}     FOLLOW(1, S) = {$}
INITFOLLOW(1, R) = ∅       FOLLOW(1, R) = {$}
INITFOLLOW(1, L) = {=}     FOLLOW(1, L) = {=, $}
INITFOLLOW(5, R) = ∅       FOLLOW(5, R) = {=, $}
INITFOLLOW(5, L) = ∅       FOLLOW(5, L) = {=, $}
INITFOLLOW(7, R) = ∅       FOLLOW(7, R) = {$}
INITFOLLOW(7, L) = ∅       FOLLOW(7, L) = {$}.

Thus, we get

LA(2, S′ → S$) = FOLLOW(1, S) = {$}
LA(3, R → L) = FOLLOW(1, R) = {$}
LA(4, S → R) = FOLLOW(1, S) = {$}
LA(6, L → id) = FOLLOW(1, L) ∪ FOLLOW(5, L) ∪ FOLLOW(7, L) = {=, $}
LA(8, L → *R) = FOLLOW(1, L) ∪ FOLLOW(5, L) ∪ FOLLOW(7, L) = {=, $}
LA(9, R → L) = FOLLOW(5, R) ∪ FOLLOW(7, R) = {=, $}
LA(10, S → L = R) = FOLLOW(1, S) = {$}.

Since LA(3, R → L) does not contain =, the conflict is resolved.

[Figure 7.16: Transition on terminal input a, from state (A → α.aβ, b) to state (A → αa.β, b)]

7.11 LR(1)-Characteristic Automata

We conclude this brief survey on LR-parsing by describing the construction of LR(1)-parsers. The new ingredient is that when we construct an NFA accepting CG, we incorporate lookahead symbols into the states. Thus, a state is a pair (A → α.β, b), where A → α.β is a marked production, as before, and b ∈ Σ ∪ {$} is a lookahead symbol. The new twist in the construction of the nondeterministic characteristic automaton is the following:

The start state is (S′ → .S, $), and the transitions are defined as follows:

(a) For every terminal a ∈ Σ, there is a transition on input a from state (A → α.aβ, b) to the state (A → αa.β, b) obtained by shifting the dot (where a = b is possible). Such a transition is shown in Figure 7.16.

(b) For every nonterminal B ∈ N, there is a transition on input B from state (A → α.Bβ, b) to state (A → αB.β, b) (obtained by shifting the dot), and transitions on input ϵ (the empty string) to all states (B → .γ, a), for all productions B → γ with left-hand side B and all a ∈ FIRST(βb). Such transitions are shown in Figure 7.17.

(c) A state is a final state if and only if it is of the form (A → β., b) (that is, the dot is in the rightmost position).

Example 3. Consider the grammar G3 given by:

0: S → E
1: E → aEb
2: E → ϵ

[Figure 7.17: Transitions from state (A → α.Bβ, b): on input B to (A → αB.β, b), and on input ϵ to the states (B → .γ, a)]

The result of making the NFA for CG3 deterministic is shown in Figure 7.18 (where transitions to the dead state have been omitted). The internal structure of the states 1, ..., 8 is shown below:

1: S → .E, $
   E → .aEb, $
   E → ., $
2: E → a.Eb, $
   E → .aEb, b
   E → ., b
3: E → a.Eb, b
   E → .aEb, b
   E → ., b
4: E → aE.b, $
5: E → aEb., $
6: E → aE.b, b
7: E → aEb., b
8: S → E., $

The LR(1)-shift/reduce parser associated with DCG is built as follows: the shift and goto entries come directly from the transitions of DCG, and for every state s, for every item

[Figure 7.18: DFA for CG3]

(A → γ., b) in s, we enter the entry rn for state s and input b, where A → γ is production number n. If the resulting parser has no conflicts, we say that the grammar is an LR(1) grammar. The LR(1)-shift/reduce parser for G3 is shown below:

      a    b    $    E
 1   s2        r2    8
 2   s3   r2         4
 3   s3   r2         6
 4        s5
 5             r1
 6        s7
 7        r1
 8             acc

Observe that there are three pairs of states, (2, 3), (4, 6), and (5, 7), where both states in a common pair only differ by the lookahead symbols. We can merge the states corresponding to each pair, because the marked items are the same, but now we have to allow lookahead sets. Thus, the merging of (2, 3) yields

2: E → a.Eb, {b, $}
   E → .aEb, {b}
   E → ., {b},

the merging of (4, 6) yields

3: E → aE.b, {b, $},

the merging of (5, 7) yields

4: E → aEb., {b, $}.

We obtain a merged DFA with only five states, and the corresponding shift/reduce parser is given below:

      a    b    $    E
 1   s2        r2    8
 2   s2   r2         3
 3        s4
 4        r1   r1
 8             acc

The reader should verify that this is the LALR(1)-parser. The reader should also check that the SLR(1)-parser is given below:

      a    b    $    E
 1   s2   r2   r2    5
 2   s2   r2   r2    3
 3        s4
 4        r1   r1
 5             acc

The difference between the two parsing tables is that the LALR(1)-lookahead sets are sharper than the SLR(1)-lookahead sets. This is because the computation of the LALR(1)-lookahead sets uses a sharper version of the FOLLOW sets.

It can also be shown that if a grammar is LALR(1), then the merging of states of an LR(1)-parser always succeeds and yields the LALR(1) parser. Of course, this is a very inefficient way of producing LALR(1) parsers, and much better methods exist, such as the graph method described in these notes. However, there are cases where the merging fails. Sufficient conditions for successful merging have been investigated, but there is still room for research in this area.
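The merging step can be sketched as grouping LR(1) states by their core (the set of marked items, lookaheads ignored) and unioning the lookaheads of identical items. The dictionary encoding of states below is an assumption of this sketch, not the notes' representation.

```python
# Group LR(1) states by core and union lookaheads, as in LALR(1) merging.
from collections import defaultdict

def merge_by_core(states):
    """states: dict mapping state id -> set of (item, lookahead) pairs."""
    merged = defaultdict(lambda: defaultdict(set))
    for sid, items in states.items():
        core = frozenset(item for item, _ in items)
        for item, la in items:
            merged[core][item].add(la)
    # Each result maps an item of the core to its merged lookahead set.
    return [dict(m) for m in merged.values()]

# The pair (4, 6) above: the same item with lookaheads $ and b.
states = {4: {("E -> aE.b", "$")}, 6: {("E -> aE.b", "b")}}
print(merge_by_core(states))  # [{'E -> aE.b': {'$', 'b'}}]
```

Merging by core never changes shift or goto entries (they depend only on the core); only the reduce lookaheads coarsen, which is why merging can create conflicts for grammars that are LR(1) but not LALR(1).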


Chapter 8

RAM Programs, Turing Machines, and the Partial Recursive Functions

See the scanned version of this chapter found in the web page for CIS511.

8.1 Partial Functions and RAM Programs

We define an abstract machine model for computing functions

f: Σ* × · · · × Σ* → Σ*  (n arguments),

where Σ = {a1, ..., ak} is some input alphabet. Numerical functions f: N^n → N can be viewed as functions defined over the one-letter alphabet {a1}, using the bijection m ↦ a1^m.

Let us recall the definition of a partial function.

Definition 8.1. A binary relation R ⊆ A × B between two sets A and B is functional iff, for all x ∈ A and y, z ∈ B, (x, y) ∈ R and (x, z) ∈ R implies that y = z. A partial function is a triple f = ⟨A, G, B⟩, where A and B are arbitrary sets (possibly empty) and G is a functional relation (possibly empty) between A and B, called the graph of f.

Hence, a partial function is a functional relation such that every argument has at most one image under f. The graph of a function f is denoted as graph(f). When no confusion can arise, a function f and its graph are usually identified.

A partial function f = ⟨A, G, B⟩ is often denoted as f: A → B. The domain dom(f) of a partial function f = ⟨A, G, B⟩ is the set

dom(f) = {x ∈ A | ∃y ∈ B, (x, y) ∈ G}.

For every element x ∈ dom(f), the unique element y ∈ B such that (x, y) ∈ graph(f) is denoted as f(x). We say that f(x) converges, also denoted as f(x)↓. If x ∈ A and x ∉ dom(f), we say that f(x) diverges, also denoted as f(x)↑.

Intuitively, if a function is partial, it does not return any output for any input not in its domain. This corresponds to an infinite computation. A partial function f: A → B is a total function iff dom(f) = A. It is customary to call a total function simply a function.

We now define a model of computation known as the RAM programs, or Post machines. RAM programs are written in a sort of assembly language involving simple instructions manipulating strings stored in registers. Every RAM program uses a fixed and finite number of registers denoted as R1, ..., Rp, with no limitation on the size of the strings held in the registers.

RAM programs can be defined either in flowchart form or in linear form. Since the linear form is more convenient for coding purposes, we present RAM programs in linear form.

A RAM program P (in linear form) consists of a finite sequence of instructions using a finite number of registers R1, ..., Rp. Instructions may optionally be labeled with line numbers denoted as N1, ..., Nq. It is neither mandatory to label all instructions, nor to use distinct line numbers! Thus, the same line number can be used in more than one line. As we will see later on, this makes it easier to concatenate two different programs without performing a renumbering of line numbers. Every instruction has four fields, not necessarily all used. The main field is the op-code. Here is an example of a RAM program to concatenate two strings x1 and x2.

    R3 ← R1
    R4 ← R2
N0  R4 jmpa N1b
    R4 jmpb N2b
    jmp N3b
N1  adda R3
    tail R4
    jmp N0a
N2  addb R3
    tail R4
    jmp N0a
N3  R1 ← R3
    continue

Definition 8.2. RAM programs are constructed from seven types of instructions, shown below:

(1j)  N addj Y
(2)   N tail Y
(3)   N clr Y
(4)   N Y ← X
(5a)  N jmp N1a
(5b)  N jmp N1b
(6ja) N Y jmpj N1a
(6jb) N Y jmpj N1b
(7)   N continue

1. An instruction of type (1j) concatenates the letter aj to the right of the string held by register Y (1 ≤ j ≤ k). The effect is the assignment Y := Y aj.

2. An instruction of type (2) deletes the leftmost letter of the string held by the register Y. This corresponds to the function tail, defined such that

tail(ϵ) = ϵ,
tail(aj u) = u.

The effect is the assignment Y := tail(Y).

3. An instruction of type (3) clears register Y, i.e., sets its value to the empty string ϵ. The effect is the assignment Y := ϵ.

4. An instruction of type (4) assigns the value of register X to register Y. The effect is the assignment Y := X.

5. An instruction of type (5a) or (5b) is an unconditional jump. The effect of (5a) is to jump to the closest line number N1 occurring above the instruction being executed, and the effect of (5b) is to jump to the closest line number N1 occurring below the instruction being executed.

6. An instruction of type (6ja) or (6jb) is a conditional jump. Let head be the function defined as follows:

head(ϵ) = ϵ,
head(aj u) = aj.

The effect of (6ja) is to jump to the closest line number N1 occurring above the instruction being executed iff head(Y) = aj, else to execute the next instruction (the one immediately following the instruction being executed). The effect of (6jb) is to jump to the closest line number N1 occurring below the instruction being executed iff head(Y) = aj, else to execute the next instruction.

When computing over N, instructions of type (6ja) or (6jb) jump to the closest N1 above or below iff Y is nonnull.

7. An instruction of type (7) is a no-op, i.e., the registers are unaffected. If there is a next instruction, then it is executed, else the program stops.

Obviously, a program is syntactically correct only if certain conditions hold.

Definition 8.3. A RAM program P is a finite sequence of instructions as in Definition 8.2, satisfying the following conditions:

(1) For every jump instruction (conditional or not), the line number to be jumped to must exist in P.

(2) The last instruction of a RAM program is a continue.

The reason for allowing multiple occurrences of line numbers is to make it easier to concatenate programs without having to perform a renaming of line numbers. The technical choice of jumping to the closest address N1 above or below comes from the fact that it is easy to search up or down using primitive recursion, as we will see later on.

For the purpose of computing a function f: Σ* × · · · × Σ* → Σ* (n arguments) using a RAM program P, we assume that P has at least n registers called input registers, and that these registers R1, ..., Rn are initialized with the input values of the function f. We also assume that the output is returned in register R1.

The following RAM program concatenates two strings x1 and x2 held in registers R1 and R2.

    R3 ← R1
    R4 ← R2
N0  R4 jmpa N1b
    R4 jmpb N2b
    jmp N3b
N1  adda R3
    tail R4
    jmp N0a
N2  addb R3
    tail R4
    jmp N0a
N3  R1 ← R3
    continue
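To make the jump conventions concrete (repeated labels are allowed, and a jump goes to the closest occurrence of its label above for suffix 'a', below for suffix 'b'), here is a small interpreter sketch. The tuple encoding of instructions is hypothetical, chosen only for this illustration.

```python
# A toy interpreter for the RAM instruction set of Definition 8.2.
# Each instruction is (label, op, args), label possibly None; registers
# hold strings. A fuel bound guards against non-terminating programs.

def run(prog, regs, fuel=10000):
    def find(label, i, direction):
        rng = range(i - 1, -1, -1) if direction == 'a' else range(i + 1, len(prog))
        for j in rng:
            if prog[j][0] == label:
                return j
        raise ValueError("missing label " + label)
    pc = 0
    while pc < len(prog) and fuel > 0:
        fuel -= 1
        _, op, args = prog[pc]
        if op == 'add':            # add_j Y: args = (letter, register)
            regs[args[1]] = regs.get(args[1], '') + args[0]
        elif op == 'tail':         # delete the leftmost letter
            regs[args[0]] = regs.get(args[0], '')[1:]
        elif op == 'clr':
            regs[args[0]] = ''
        elif op == 'move':         # Y <- X
            regs[args[0]] = regs.get(args[1], '')
        elif op == 'jmp':          # args = (target label, 'a' or 'b')
            pc = find(args[0], pc, args[1]); continue
        elif op == 'cjmp':         # Y jmp_j N1: args = (reg, letter, label, dir)
            if regs.get(args[0], '')[:1] == args[1]:
                pc = find(args[2], pc, args[3]); continue
        elif op == 'continue':
            pass
        pc += 1
    return regs

# The concatenation program above, over Sigma = {a, b}.
prog = [
    (None, 'move', ('R3', 'R1')), (None, 'move', ('R4', 'R2')),
    ('N0', 'cjmp', ('R4', 'a', 'N1', 'b')),
    (None, 'cjmp', ('R4', 'b', 'N2', 'b')),
    (None, 'jmp', ('N3', 'b')),
    ('N1', 'add', ('a', 'R3')), (None, 'tail', ('R4',)), (None, 'jmp', ('N0', 'a')),
    ('N2', 'add', ('b', 'R3')), (None, 'tail', ('R4',)), (None, 'jmp', ('N0', 'a')),
    ('N3', 'move', ('R1', 'R3')), (None, 'continue', ()),
]
print(run(prog, {'R1': 'ab', 'R2': 'ba'})['R1'])  # abba
```

Run on R1 = ab and R2 = ba, the program loops through N0, copying the head of R4 onto the end of R3 one letter at a time, and leaves abba in R1.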

Since Σ = {a, b}, for more clarity, we wrote jmpa instead of jmp1, jmpb instead of jmp2, adda instead of add1, and addb instead of add2.

Definition 8.4. A RAM program P computes the partial function ϕ: (Σ*)^n → Σ* if the following conditions hold: for every input (x1, ..., xn) ∈ (Σ*)^n, having initialized the input registers R1, ..., Rn with x1, ..., xn, the program eventually halts iff ϕ(x1, ..., xn) converges, and if and when P halts, the value of R1 is equal to ϕ(x1, ..., xn). A partial function ϕ is RAM-computable iff it is computed by some RAM program.

For example, the following program computes the erase function E defined such that E(u) = ϵ for all u ∈ Σ*:

clr R1
continue

The following program computes the j-th successor function Sj defined such that Sj(u) = u aj for all u ∈ Σ*:

addj R1
continue

The following program (with n input variables) computes the projection function P_i^n defined such that

P_i^n(u1, ..., un) = ui,

where n ≥ 1 and 1 ≤ i ≤ n:

R1 ← Ri
continue

Note that P_1^1 is the identity function.

Having a programming language, we would like to know how powerful it is, that is, we would like to know what kind of functions are RAM-computable. At first glance, RAM programs don't do much, but this is not so. Indeed, we will see shortly that the class of RAM-computable functions is quite extensive.

One way of getting new programs from previous ones is via composition. Another one is by primitive recursion. We will investigate these constructions after introducing another model of computation, Turing machines.

Remarkably, the classes of (partial) functions computed by RAM programs and by Turing machines are identical. This is the class of partial computable functions, also called partial recursive functions, a term which is now considered old-fashioned. This class can be given several other definitions. We will present the definition of the so-called µ-recursive functions (due to Kleene).

The following proposition will be needed to simplify the encoding of RAM programs as numbers.

Proposition 8.1. Every RAM program can be converted to an equivalent program using only the following types of instructions:

(1j)  N addj Y
(2)   N tail Y
(6ja) N Y jmpj N1a
(6jb) N Y jmpj N1b
(7)   N continue

The proof is fairly simple. For example, instructions of the form Ri ← Rj can be eliminated by transferring the contents of Rj into an auxiliary register Rk, and then by transferring the contents of Rk into Ri and Rj.

8.2 Definition of a Turing Machine

We define a Turing machine model for computing functions

f: Σ* × · · · × Σ* → Σ*  (n arguments),

where Σ = {a1, ..., aN} is some input alphabet. We only consider deterministic Turing machines. A Turing machine also uses a tape alphabet Γ such that Σ ⊆ Γ. The tape alphabet contains some special symbol B ∉ Σ, the blank.

In this model, a Turing machine uses a single tape. This tape can be viewed as a string over Γ. The tape is both an input tape and a storage mechanism. Symbols on the tape can be overwritten, and the tape can grow either on the left or on the right. There is a read/write head pointing to some symbol on the tape.

Definition 8.5. A (deterministic) Turing machine (or TM) M is a sextuple M = (K, Σ, Γ, {L, R}, δ, q0), where

- K is a finite set of states;
- Σ is a finite input alphabet;
- Γ is a finite tape alphabet, s.t. Σ ⊆ Γ, K ∩ Γ = ∅, and with blank B ∉ Σ;
- q0 ∈ K is the start state (or initial state);
- δ is the transition function, a (finite) set of quintuples

  δ ⊆ K × Γ × Γ × {L, R} × K,

  such that for all (p, a) ∈ K × Γ, there is at most one triple (b, m, q) ∈ Γ × {L, R} × K such that (p, a, b, m, q) ∈ δ.

A quintuple (p, a, b, m, q) ∈ δ is called an instruction. It is also denoted as p, a → b, m, q. The effect of an instruction is to switch from state p to state q, overwrite the symbol currently scanned (a) with b, and move the read/write head either left or right, according to m.

Here is an example of a Turing machine.

K = {q0, q1, q2, q3}; Σ = {a, b}; Γ = {a, b, B}; the instructions in δ are:

q0, B → B, R, q3,
q0, a → b, R, q1,
q0, b → a, R, q1,
q1, a → b, R, q1,
q1, b → a, R, q1,
q1, B → B, L, q2,
q2, a → a, L, q2,
q2, b → b, L, q2,
q2, B → B, R, q3.

8.3 Computations of Turing Machines

To explain how a Turing machine works, we describe its action on instantaneous descriptions. We take advantage of the fact that K ∩ Γ = ∅ to define instantaneous descriptions.

Definition 8.6. Given a Turing machine M = (K, Σ, Γ, {L, R}, δ, q0), an instantaneous description (for short, an ID) is a (nonempty) string in Γ* K Γ+, that is, a string of the form

upav,

where u, v ∈ Γ*, p ∈ K, and a ∈ Γ.

The intuition is that an ID upav describes a snapshot of a TM in the current state p, whose tape contains the string uav, and with the read/write head pointing to the symbol a. Thus, in upav, the state p is just to the left of the symbol a presently scanned by the read/write head.

We explain how a TM works by showing how it acts on IDs.

Definition 8.7. Given a Turing machine M = (K, Σ, Γ, {L, R}, δ, q0), the yield relation (or compute relation) ⊢ is a binary relation defined on the set of IDs as follows. For any two IDs ID1 and ID2, we have ID1 ⊢ ID2 iff either

(1) (p, a, b, R, q) ∈ δ, and either

(a) ID1 = upacv, c ∈ Γ, and ID2 = ubqcv, or
(b) ID1 = upa and ID2 = ubqB;

(2) (p, a, b, L, q) ∈ δ, and either

(a) ID1 = ucpav, c ∈ Γ, and ID2 = uqcbv, or
(b) ID1 = pav and ID2 = qBbv.

Note how the tape is extended by one blank after the rightmost symbol in case (1)(b), and by one blank before the leftmost symbol in case (2)(b). As usual, we let ⊢+ denote the transitive closure of ⊢, and we let ⊢* denote the reflexive and transitive closure of ⊢.

We can now explain how a Turing machine computes a partial function

f: Σ* × · · · × Σ* → Σ*  (n arguments).

Since we allow functions taking n ≥ 1 input strings, we assume that Γ contains the special delimiter "," not in Σ, used to separate the various input strings. It is convenient to assume that a Turing machine cleans up its tape when it halts, before returning its output. For this, we will define proper IDs.

Definition 8.8. Given a Turing machine M = (K, Σ, Γ, {L, R}, δ, q0), where Γ contains some delimiter "," not in Σ in addition to the blank B, a starting ID is of the form

q0 w1,w2,...,wn

where w1, ..., wn ∈ Σ* and n ≥ 2, or q0 w with w ∈ Σ+, or q0 B.

A blocking (or halting) ID is an ID upav such that there are no instructions (p, a, b, m, q) ∈ δ for any (b, m, q) ∈ Γ × {L, R} × K.

A proper ID is a halting ID of the form

B^k p w B^l,

where w ∈ Σ*, and k, l ≥ 0 (with l ≥ 1 when w = ϵ).

Computation sequences are defined as follows.

Definition 8.9. Given a Turing machine M = (K, Σ, Γ, {L, R}, δ, q0), a computation sequence (or computation) is a finite or infinite sequence of IDs

ID0, ID1, ..., IDi, IDi+1, ...,

such that IDi ⊢ IDi+1 for all i ≥ 0. A computation sequence halts iff it is a finite sequence of IDs, so that ID0 ⊢* IDn, and IDn is a halting ID. A computation sequence diverges if it is an infinite sequence of IDs.

We now explain how a Turing machine computes a partial function.

Definition 8.10. A Turing machine M = (K, Σ, Γ, {L, R}, δ, q0) computes the partial function

f: Σ* × · · · × Σ* → Σ*  (n arguments)

iff the following conditions hold:

(1) For every w1, ..., wn ∈ Σ*, given the starting ID

ID0 = q0 w1,w2,...,wn

or q0 w with w ∈ Σ+, or q0 B, the computation sequence of M from ID0 halts in a proper ID iff f(w1, ..., wn) is defined.

(2) If f(w1, ..., wn) is defined, then M halts in a proper ID of the form

IDn = B^k p f(w1, ..., wn) B^h,

which means that it computes the right value.

A function f (over Σ*) is Turing computable iff it is computed by some Turing machine M.

Note that by (1), the TM M may halt in an improper ID, in which case f(w1, ..., wn) must be undefined. This corresponds to the fact that we only accept to retrieve the output of a computation if the TM has cleaned up its tape, i.e., produced a proper ID. In particular, intermediate calculations have to be erased before halting.

Example. K = {q0, q1, q2, q3}; Σ = {a, b}; Γ = {a, b, B}; the instructions in δ are:

q0, B → B, R, q3,
q0, a → b, R, q1,
q0, b → a, R, q1,
q1, a → b, R, q1,
q1, b → a, R, q1,
q1, B → B, L, q2,
q2, a → a, L, q2,
q2, b → b, L, q2,
q2, B → B, R, q3.

The reader can easily verify that this machine exchanges the a's and b's in a string: on input w, the output is w with every a replaced by b and every b replaced by a.

8.4 RAM-Computable Functions are Turing-Computable

Turing machines can simulate RAM programs, and as a result, we have the following theorem.

Theorem 8.2. Every RAM-computable function is Turing-computable. Furthermore, given a RAM program P, we can effectively construct a Turing machine M computing the same function.

The idea of the proof is to represent the contents of the registers R1, ..., Rp on the Turing machine tape by the string

#r1#r2# · · · #rp#,
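A machine of this kind is easy to simulate by acting on IDs as in Definition 8.7. The sketch below runs the example machine; representing an ID as the triple (left string, state, right string) is an implementation choice of this sketch, not from the notes.

```python
# A toy TM simulator. delta maps (state, scanned symbol) to
# (written symbol, move, next state), per Definition 8.5; the head scans
# the first symbol of `right`, and blanks are added as the tape grows.

def run_tm(delta, q0, tape, blank='B', fuel=10000):
    left, state, right = '', q0, tape or blank
    while fuel > 0:
        fuel -= 1
        a = right[0]
        if (state, a) not in delta:
            return left + state + right        # a halting ID
        b, move, q = delta[(state, a)]
        if move == 'R':
            left, state, right = left + b, q, right[1:] or blank
        else:  # move == 'L'
            if left == '':
                left, state, right = '', q, blank + b + right[1:]
            else:
                left, state, right = left[:-1], q, left[-1] + b + right[1:]
    raise RuntimeError("no halting ID reached within fuel bound")

# The example machine that exchanges a's and b's.
delta = {
    ('q0', 'B'): ('B', 'R', 'q3'),
    ('q0', 'a'): ('b', 'R', 'q1'), ('q0', 'b'): ('a', 'R', 'q1'),
    ('q1', 'a'): ('b', 'R', 'q1'), ('q1', 'b'): ('a', 'R', 'q1'),
    ('q1', 'B'): ('B', 'L', 'q2'),
    ('q2', 'a'): ('a', 'L', 'q2'), ('q2', 'b'): ('b', 'L', 'q2'),
    ('q2', 'B'): ('B', 'R', 'q3'),
}
print(run_tm(delta, 'q0', 'aab'))  # Bq3bbaB
```

On input aab the machine halts in the proper ID B q3 bba B, i.e., it outputs bba, as expected.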

where # is a special marker and ri represents the string held by Ri. We also use Proposition 8.1 to reduce the number of instructions to be dealt with. The Turing machine M is built of blocks, each block simulating the effect of some instruction of the program P. The details are a bit tedious, and can be found in the notes or in Machtey and Young.

8.5 Turing-Computable Functions are RAM-Computable

RAM programs can also simulate Turing machines.

Theorem 8.3. Every Turing-computable function is RAM-computable. Furthermore, given a Turing machine M, one can effectively construct a RAM program P computing the same function.

The idea of the proof is to design a RAM program containing an encoding of the current ID of the Turing machine M in register R1, and to use other registers R2, R3 to simulate the effect of executing an instruction of M by updating the ID of M in R1. The details are tedious and can be found in the notes.

Another proof can be obtained by proving that the class of Turing computable functions coincides with the class of partial computable functions (formerly called partial recursive functions). Indeed, it turns out that both RAM programs and Turing machines compute precisely the class of partial recursive functions. For this, we need to define the primitive recursive functions.

Informally, a primitive recursive function is a total recursive function that can be computed using only for loops, that is, loops in which the number of iterations is fixed (unlike in a while loop). A formal definition of the primitive recursive functions is given in Section 8.7.

Definition 8.11. Let Σ = {a1, ..., aN}. The class of partial computable functions, also called partial recursive functions, is the class of partial functions (over Σ*) that can be computed by RAM programs (or equivalently by Turing machines). The class of computable functions, also called recursive functions, is the subset of the class of partial computable functions consisting of the functions defined for every input (i.e., total functions).

We can also deal with languages.

8.6 Computably Enumerable Languages and Computable Languages

We define the computably enumerable languages, also called listable languages, and the computable languages. The old-fashioned terminology for computably enumerable languages is recursively enumerable languages, and for computable languages it is recursive languages. We assume that the TMs under consideration have a tape alphabet containing the special symbols 0 and 1.

Definition 8.12. Let Σ = {a1, ..., aN}. A language L ⊆ Σ* is (Turing) computably enumerable (for short, a c.e. set), or (Turing) listable, or recursively enumerable (for short, an r.e. set), iff there is some TM M such that for every w ∈ L, M halts in a proper ID with the output 1, and for every w ∉ L, either M halts in a proper ID with the output 0, or it runs forever.

A language L ⊆ Σ* is (Turing) computable (or recursive) iff there is some TM M such that for every w ∈ L, M halts in a proper ID with the output 1, and for every w ∉ L, M halts in a proper ID with the output 0.

Thus, given a computably enumerable language L, for some w ∉ L, it is possible that a TM accepting L runs forever on input w. On the other hand, for a computable (recursive) language L, a TM accepting L always halts in a proper ID.

When dealing with languages, it is often useful to consider nondeterministic Turing machines. Such machines are defined just like deterministic Turing machines, except that their transition function δ is just a (finite) set of quintuples

δ ⊆ K × Γ × Γ × {L, R} × K,

with no particular extra condition.

It can be shown that every nondeterministic Turing machine can be simulated by a deterministic Turing machine, and thus, nondeterministic Turing machines also accept the class of c.e. sets.

It can be shown that a computably enumerable language is the range of some computable (recursive) function. It can also be shown that a language L is computable (recursive) iff both L and its complement are computably enumerable. There are computably enumerable languages that are not computable (recursive).
Turing machines were invented by Turing in 1936. The primitive recursive functions were known to Hilbert; Gödel formalized their definition in 1931. The partial recursive functions were defined by Kleene around 1934.

Church also introduced the λ-calculus as a model of computation in the early 1930s. Other models: Post systems, Markov systems. The equivalence of the various models of computation was shown around 1935/36. RAM programs were only defined around 1963 (they are a slight generalization of Post systems).

A further study of the partial recursive functions requires the notions of pairing functions and of universal functions (or universal Turing machines).

8.7 The Primitive Recursive Functions

The class of primitive recursive functions is defined in terms of base functions and closure operations.

Definition 8.13. Let Σ = {a1, ..., aN}. The base functions over Σ are the following functions:

(1) The erase function E, defined such that E(w) = ϵ, for all w ∈ Σ*;

(2) For every j, 1 ≤ j ≤ N, the j-successor function Sj, defined such that Sj(w) = w aj, for all w ∈ Σ*;

(3) The projection functions P_i^n, defined such that

P_i^n(w1, ..., wn) = wi,

for every n ≥ 1, every i, 1 ≤ i ≤ n, and for all w1, ..., wn ∈ Σ*.

Note that P_1^1 is the identity function on Σ*. Projection functions can be used to permute the arguments of another function.

A crucial closure operation is (extended) composition.

Definition 8.14. Let Σ = {a1, ..., aN}. For any function

g: Σ* × · · · × Σ* → Σ*  (m arguments),

and any m functions

hi: Σ* × · · · × Σ* → Σ*  (n arguments each),

the composition of g and the hi is the function

f: Σ* × · · · × Σ* → Σ*  (n arguments),

denoted as g ∘ (h1, ..., hm), such that

f(w1, ..., wn) = g(h1(w1, ..., wn), ..., hm(w1, ..., wn)),

for all w1, ..., wn ∈ Σ*.

As an example, f = g ∘ (P_2^2, P_1^2) is such that

f(w1, w2) = g(P_2^2(w1, w2), P_1^2(w1, w2)) = g(w2, w1).

Another crucial closure operation is primitive recursion.

Definition 8.15. Let Σ = {a1, ..., aN}. For any function

g: Σ* × · · · × Σ* → Σ*  (m − 1 arguments),

where m ≥ 2, and any N functions

hi: Σ* × · · · × Σ* → Σ*  (m + 1 arguments each),

the function

f: Σ* × · · · × Σ* → Σ*  (m arguments)

is defined by primitive recursion from g and h1, ..., hN, if

f(ϵ, w2, ..., wm) = g(w2, ..., wm),
f(u a1, w2, ..., wm) = h1(u, f(u, w2, ..., wm), w2, ..., wm),
  · · ·
f(u aN, w2, ..., wm) = hN(u, f(u, w2, ..., wm), w2, ..., wm),

for all u, w2, ..., wm ∈ Σ*. When m = 1, for some fixed w ∈ Σ*, we have

f(ϵ) = w,
f(u a1) = h1(u, f(u)),
  · · ·
f(u aN) = hN(u, f(u)),

for all u ∈ Σ*.

For numerical functions (i.e., when Σ = {a1}), the scheme of primitive recursion is simpler:

f(0, x2, ..., xm) = g(x2, ..., xm),
f(x + 1, x2, ..., xm) = h1(x, f(x, x2, ..., xm), x2, ..., xm),

for all x, x2, ..., xm ∈ N. The successor function S is the function S(x) = x + 1.

Addition, multiplication, exponentiation, and super-exponentiation can be defined by primitive recursion as follows (being a bit loose, we should use some projections...):

add(0, n) = P_1^1(n) = n,
add(m + 1, n) = S ∘ P_2^3(m, add(m, n), n) = S(add(m, n)),

mult(0, n) = E(n) = 0,
mult(m + 1, n) = add ∘ (P_2^3, P_3^3)(m, mult(m, n), n) = add(mult(m, n), n),

rexp(0, n) = S ∘ E(n) = 1,
rexp(m + 1, n) = mult(rexp(m, n), n),

exp(m, n) = rexp ∘ (P_2^2, P_1^2)(m, n),

supexp(0, n) = 1,
supexp(m + 1, n) = exp(n, supexp(m, n)).

We usually write m + n for add(m, n), m * n or even mn for mult(m, n), and m^n for exp(m, n).

There is a minus operation on N named monus. This operation, denoted by ∸, is defined by

m ∸ n = m − n if m ≥ n, and m ∸ n = 0 if m < n.

To show that it is primitive recursive, we first define the function pred. Let pred be the primitive recursive function given by

pred(0) = 0,
pred(m + 1) = P_1^2(m, pred(m)) = m.

Then monus is defined by

monus(m, 0) = m,
monus(m, n + 1) = pred(monus(m, n)),

except that the above is not a legal primitive recursion. It is left as an exercise to give a proper primitive recursive definition of monus.
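In code, the numeric scheme of primitive recursion is just a fold over 0, 1, ..., x − 1. The helper primrec below is a sketch of that scheme; as the text notes, the recursion for monus is on the second argument, so the lambda for monus merely computes the right values rather than being itself a legal primitive recursion.

```python
# primrec(g, h) builds the unique f with
#   f(0, ...) = g(...)   and   f(x + 1, ...) = h(x, f(x, ...), ...).

def primrec(g, h):
    def f(x, *rest):
        acc = g(*rest)
        for i in range(x):          # unfold the recursion bottom-up
            acc = h(i, acc, *rest)
        return acc
    return f

add = primrec(lambda n: n, lambda x, r, n: r + 1)       # S(add(x, n))
mult = primrec(lambda n: 0, lambda x, r, n: add(r, n))  # add(mult(x,n), n)
pred = primrec(lambda: 0, lambda x, r: x)               # pred(m + 1) = m
monus = lambda m, n: primrec(lambda: m, lambda x, r: pred(r))(n)

print(add(3, 4), mult(3, 4), monus(5, 2), monus(2, 5))  # 7 12 3 0
```

The loop makes explicit why every primitive recursive function is total: the number of iterations is fixed by the first argument before the loop starts.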

As an example over {a, b}, the following function g: Σ* × Σ* → Σ* is defined by primitive recursion:

    g(ϵ, v) = P¹₁(v),
    g(u a_i, v) = S_i ∘ P³₂(u, g(u, v), v),

where 1 ≤ i ≤ N. It is easily verified that g(u, v) = vu. Then

    f = g ∘ (P²₂, P²₁)

computes the concatenation function, i.e., f(u, v) = uv.

The following functions are also primitive recursive:

    sg(n) = 1 if n > 0,  0 if n = 0,

as well as

    sḡ(n) = 0 if n > 0,  1 if n = 0,

and

    |m − n| = m ∸ n + n ∸ m,

and

    eq(m, n) = 1 if m = n,  0 if m ≠ n.

Indeed,

    sg(0) = 0,
    sg(n + 1) = S ∘ E ∘ P²₁(n, sg(n)),
    sḡ(n) = S(E(n)) ∸ sg(n) = 1 ∸ sg(n),
    eq(m, n) = sḡ(|m − n|).

Finally, the function

    cond(m, n, p, q) = p if m = n,  q if m ≠ n,

is primitive recursive since

    cond(m, n, p, q) = eq(m, n) · p + sḡ(eq(m, n)) · q.
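The sign, equality, and conditional functions above compose exactly as in the text. A small sketch (our names; `sg_bar` stands for the complemented sign function sḡ):

```python
def monus(m, n):
    # truncated subtraction m -. n
    return m - n if m >= n else 0

def sg(n):
    return 1 if n > 0 else 0

def sg_bar(n):
    return 1 - sg(n)               # 1 iff n = 0

def absdiff(m, n):
    return monus(m, n) + monus(n, m)   # |m - n|

def eq(m, n):
    return sg_bar(absdiff(m, n))

def cond(m, n, p, q):
    # p if m = n, q otherwise, written without branching
    return eq(m, n) * p + sg_bar(eq(m, n)) * q
```
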

We can also design a more general version of cond. For example, define compare_≤ as

    compare_≤(m, n) = 1 if m ≤ n,  0 if m > n,

which is given by

    compare_≤(m, n) = 1 ∸ sg(m ∸ n).

Then we can define

    cond_≤(m, n, p, q) = p if m ≤ n,  q if m > n,

with

    cond_≤(m, n, p, q) = compare_≤(m, n) · p + sḡ(compare_≤(m, n)) · q.

The above allows us to define functions by cases.

Definition. Let Σ = {a_1, ..., a_N}. The class of primitive recursive functions is the smallest class of functions (over Σ*) which contains the base functions and is closed under composition and primitive recursion.

We leave as an exercise to show that every primitive recursive function is a total function. The class of primitive recursive functions may not seem very big, but it contains all the total functions that we would ever want to compute.

Although it is rather tedious to prove, the following theorem can be shown.

Theorem 8.4. For an alphabet Σ = {a_1, ..., a_N}, every primitive recursive function is Turing-computable.

The best way to prove the above theorem is to use the computation model of RAM programs. Indeed, it was shown in Theorem 8.2 that every RAM program can be converted to a Turing machine. It is also rather easy to show that the primitive recursive functions are RAM-computable.

In order to define new functions it is also useful to use predicates.

Definition. An n-ary predicate P (over Σ*) is any subset of (Σ*)ⁿ. We write that a tuple (x_1, ..., x_n) satisfies P as (x_1, ..., x_n) ∈ P or as P(x_1, ..., x_n). The characteristic function of a predicate P is the function C_P: (Σ*)ⁿ → {a_1}* defined by

    C_P(x_1, ..., x_n) = a_1 iff P(x_1, ..., x_n),  ϵ iff not P(x_1, ..., x_n).

A predicate P is primitive recursive iff its characteristic function C_P is primitive recursive.

We leave to the reader the obvious adaptation of the notion of primitive recursive predicate to functions defined over N. In this case, 0 plays the role of ϵ and 1 plays the role of a_1.

It is easily shown that if P and Q are primitive recursive predicates (over (Σ*)ⁿ), then P ∨ Q, P ∧ Q and ¬P are also primitive recursive.

As an exercise, the reader may want to prove that the predicate (defined over N)

    prime(n) iff n is a prime number

is a primitive recursive predicate. For any fixed k ≥ 1, the function

    ord(k, n) = exponent of the kth prime in the prime factorization of n

is a primitive recursive function.

We can also define functions by cases.

Proposition 8.5. If P_1, ..., P_n are pairwise disjoint primitive recursive predicates (which means that P_i ∩ P_j = ∅ for all i ≠ j) and f_1, ..., f_{n+1} are primitive recursive functions, the function g defined below is also primitive recursive:

    g(x) = f_1(x) iff P_1(x),  ...,  f_n(x) iff P_n(x),  f_{n+1}(x) otherwise

(writing x for (x_1, ..., x_n)).

It is also useful to have bounded quantification and bounded minimization.

Definition. If P is an (n + 1)-ary predicate, then the bounded existential predicate ∃y/x P(y, z) holds iff some prefix y of x makes P(y, z) true. The bounded universal predicate ∀y/x P(y, z) holds iff every prefix y of x makes P(y, z) true.

Proposition 8.6. If P is an (n + 1)-ary primitive recursive predicate, then ∃y/x P(y, z) and ∀y/x P(y, z) are also primitive recursive predicates.

As an application, we can show that the equality predicate, u = v?, is primitive recursive.

Definition. If P is an (n + 1)-ary predicate, then the bounded minimization of P, min y/x P(y, z), is the function defined such that min y/x P(y, z) is the shortest prefix y of x such that P(y, z) if such a y exists, and x a_1 otherwise. The bounded maximization of P, max y/x P(y, z), is the function defined such that max y/x P(y, z) is the longest prefix y of x such that P(y, z) if such a y exists, and x a_1 otherwise.

Proposition 8.7. If P is an (n + 1)-ary primitive recursive predicate, then min y/x P(y, z) and max y/x P(y, z) are primitive recursive functions.

So far, the primitive recursive functions do not yield all the Turing-computable functions. In order to get a larger class of functions, we need the closure operation known as minimization.

8.8 The Partial Computable Functions

Minimization can be viewed as an abstract version of a while loop. Let Σ = {a_1, ..., a_N}. For any function

    g: Σ* × ··· × Σ* → Σ*   (m + 1 arguments),

where m ≥ 0, for every j, 1 ≤ j ≤ N, the function

    f: Σ* × ··· × Σ* → Σ*   (m arguments)

looks for the shortest string u over a_j (for a given j) such that g(u, w_1, ..., w_m) = ϵ:

    u := ϵ;
    while g(u, w_1, ..., w_m) ≠ ϵ do
        u := u a_j;
    endwhile
    let f(w_1, ..., w_m) = u

The operation of minimization (sometimes called minimalization) is defined as follows.

Definition. Let Σ = {a_1, ..., a_N}. For any function

    g: Σ* × ··· × Σ* → Σ*   (m + 1 arguments),

where m ≥ 0, for every j, 1 ≤ j ≤ N, the function

    f: Σ* × ··· × Σ* → Σ*   (m arguments)

is defined by minimization over {a_j}* from g, if the following conditions hold for all w_1, ..., w_m ∈ Σ*:

(1) f(w_1, ..., w_m) is defined iff there is some n ≥ 0 such that g(a_j^p, w_1, ..., w_m) is defined for all p, 0 ≤ p ≤ n, and g(a_j^n, w_1, ..., w_m) = ϵ.

(2) When f(w_1, ..., w_m) is defined, f(w_1, ..., w_m) = a_j^n, where n is such that

    g(a_j^n, w_1, ..., w_m) = ϵ

and

    g(a_j^p, w_1, ..., w_m) ≠ ϵ

for every p, 0 ≤ p ≤ n − 1.

We also write

    f(w_1, ..., w_m) = min_j u [g(u, w_1, ..., w_m) = ϵ].

Note: When f(w_1, ..., w_m) is defined, f(w_1, ..., w_m) = a_j^n, where n is the smallest integer such that condition (1) holds. It is very important to require that all the values g(a_j^p, w_1, ..., w_m) be defined for all p, 0 ≤ p ≤ n, when defining f(w_1, ..., w_m). Failure to do so allows non-computable functions.

Remark: Kleene used the µ-notation:

    f(w_1, ..., w_m) = µ_j u [g(u, w_1, ..., w_m) = ϵ],

actually, its numerical form:

    f(x_1, ..., x_m) = µx [g(x, x_1, ..., x_m) = 0].

The class of partial computable functions is defined as follows.

Definition. Let Σ = {a_1, ..., a_N}. The class of partial computable functions, also called partial recursive functions, is the smallest class of partial functions (over Σ*) which contains the base functions and is closed under composition, primitive recursion, and minimization. The class of computable functions, also called recursive functions, is the subset of the class of partial computable functions consisting of functions defined for every input (i.e., total functions).

One of the major results of computability theory is the following theorem.
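The numerical µ-operator is precisely an unbounded search, so it is naturally partial: when no witness exists, the search never returns. A sketch under our own naming (`mu`, `my_isqrt`):

```python
def mu(g):
    """Return f with f(*xs) = least x such that g(x, *xs) == 0, assuming
    g(p, *xs) is defined for every p up to that x.  If no such x exists,
    the while loop runs forever -- f is a *partial* function."""
    def f(*xs):
        x = 0
        while g(x, *xs) != 0:
            x += 1
        return x
    return f

# Integer square root as a minimization: the least x with (x+1)**2 > n.
my_isqrt = mu(lambda x, n: 0 if (x + 1) ** 2 > n else 1)
```

Since `my_isqrt` always finds a witness, it happens to be total; a `g` that is never 0 for some input yields a genuinely partial `f`.
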

Theorem 8.8. For an alphabet Σ = {a_1, ..., a_N}, every partial computable function (partial recursive function) is Turing-computable. Conversely, every Turing-computable function is a partial computable function (partial recursive function). Similarly, the class of computable functions (recursive functions) is equal to the class of Turing-computable functions that halt in a proper ID for every input.

To prove that every partial computable function is indeed Turing-computable, since by Theorem 8.2 every RAM program can be converted to a Turing machine, the simplest thing to do is to show that every partial computable function is RAM-computable. For the converse, one can show that given a Turing machine, there is a primitive recursive function describing how to go from one ID to the next. Then minimization is used to guess whether a computation halts. The proof shows that every partial computable function needs minimization at most once. The characterization of the computable functions in terms of TMs follows easily.

There are computable functions (recursive functions) that are not primitive recursive. Such an example is given by Ackermann's function. This is a function A: N × N → N which is defined by the following recursive clauses:

    A(0, y) = y + 1,
    A(x + 1, 0) = A(x, 1),
    A(x + 1, y + 1) = A(x, A(x + 1, y)).

It turns out that A is a computable function which is not primitive recursive. It can be shown that:

    A(0, x) = x + 1,
    A(1, x) = x + 2,
    A(2, x) = 2x + 3,
    A(3, x) = 2^(x+3) − 3,
    A(4, x) = a tower of x + 3 twos, minus 3,

with A(4, 0) = 16 − 3 = 13. For example, A(4, 1) = 2^16 − 3 = 65533 and A(4, 2) = 2^65536 − 3.
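The three clauses transcribe directly into code. A sketch (ours, not from the text); the double recursion blows up so fast that only tiny arguments are feasible, which is an informal hint at why A escapes primitive recursion:

```python
def ackermann(x, y):
    """Direct transcription of the three defining clauses of A."""
    if x == 0:
        return y + 1
    if y == 0:
        return ackermann(x - 1, 1)
    return ackermann(x - 1, ackermann(x, y - 1))
```

For instance, the closed forms quoted above give A(2, 3) = 2·3 + 3 = 9 and A(3, 3) = 2^6 − 3 = 61, which the function reproduces.
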

Actually, it is not so obvious that A is a total function. This can be shown by induction, using the lexicographic ordering ⪯ on N × N, which is defined as follows:

    (m, n) ⪯ (m′, n′) iff either
        m = m′ and n = n′, or
        m < m′, or
        m = m′ and n < n′.

We write (m, n) ≺ (m′, n′) when (m, n) ⪯ (m′, n′) and (m, n) ≠ (m′, n′).

We prove that A(m, n) is defined for all (m, n) ∈ N × N by complete induction over the lexicographic ordering on N × N.

In the base case, (m, n) = (0, 0), and since A(0, n) = n + 1, we have A(0, 0) = 1, and A(0, 0) is defined.

For (m, n) ≠ (0, 0), the induction hypothesis is that A(m′, n′) is defined for all (m′, n′) ≺ (m, n). We need to conclude that A(m, n) is defined.

If m = 0, since A(0, n) = n + 1, A(0, n) is defined.

If m ≠ 0 and n = 0, since (m − 1, 1) ≺ (m, 0), by the induction hypothesis, A(m − 1, 1) is defined, but A(m, 0) = A(m − 1, 1), and thus A(m, 0) is defined.

If m ≠ 0 and n ≠ 0, since (m, n − 1) ≺ (m, n), by the induction hypothesis, A(m, n − 1) is defined. Since (m − 1, A(m, n − 1)) ≺ (m, n), by the induction hypothesis, A(m − 1, A(m, n − 1)) is defined. But A(m, n) = A(m − 1, A(m, n − 1)), and thus A(m, n) is defined.

Thus, A(m, n) is defined for all (m, n) ∈ N × N.

It is possible to show that A is a recursive function, although the quickest way to prove it requires some fancy machinery (the recursion theorem). Proving that A is not primitive recursive is harder.

The following proposition shows that restricting ourselves to total functions is too limiting. Let F be any set of total functions that contains the base functions and is closed under composition and primitive recursion (and thus, F contains all the primitive recursive functions).

Definition. We say that a function f: Σ* × Σ* → Σ* is universal for the one-argument functions in F iff for every function g: Σ* → Σ* in F, there is some n ∈ N such that

    f(a_1^n, u) = g(u)

for all u ∈ Σ*.

Proposition 8.9. For any countable set F of total functions containing the base functions and closed under composition and primitive recursion, if f is a universal function for the functions g: Σ* → Σ* in F, then f ∉ F.

Proof. Assume that the universal function f is in F. Let g be the function such that

    g(u) = f(a_1^{|u|}, u) a_1

for all u ∈ Σ*. We claim that g ∈ F. It is enough to prove that the function h such that h(u) = a_1^{|u|} is primitive recursive, which is easily shown. Then, because f is universal, there is some m such that

    g(u) = f(a_1^m, u)

for all u ∈ Σ*. Letting u = a_1^m, we get

    g(a_1^m) = f(a_1^m, a_1^m) = f(a_1^m, a_1^m) a_1,

a contradiction.

Thus, either a universal function for F is partial, or it is not in F.


Chapter 9: Universal RAM Programs and Undecidability of the Halting Problem

9.1 Pairing Functions

Pairing functions are used to encode pairs of integers into single integers, or more generally, finite sequences of integers into single integers. We begin by exhibiting a bijective pairing function J: N² → N. [The grid of values of J against the x and y axes is omitted here.] The function J corresponds to a certain way of enumerating pairs of integers (x, y). Note that the value of x + y is constant along each descending diagonal, and consequently, we have

    J(x, y) = 1 + 2 + ··· + (x + y) + x
            = ((x + y)(x + y + 1) + 2x)/2
            = ((x + y)² + 3x + y)/2,

that is,

    J(x, y) = ((x + y)² + 3x + y)/2.

For example, J(0, 3) = 6, J(1, 2) = 7, J(2, 2) = 12, J(3, 1) = 13, J(4, 0) = 14.

Let K: N → N and L: N → N be the projection functions onto the axes, that is, the unique functions such that

    K(J(a, b)) = a and L(J(a, b)) = b,

for all a, b ∈ N. For example, K(11) = 1 and L(11) = 3; K(12) = 2 and L(12) = 2; K(13) = 3 and L(13) = 1.

The functions J, K, L are called Cantor's pairing functions. They were used by Cantor to prove that the set Q of rational numbers is countable.

Clearly, J is primitive recursive, since it is given by a polynomial. It is not hard to prove that J is injective and surjective, and that it is strictly monotonic in each argument, which means that for all x, x′, y, y′ ∈ N, if x < x′ then J(x, y) < J(x′, y), and if y < y′ then J(x, y) < J(x, y′).

The projection functions can be computed explicitly, although this is a bit tricky. We only need to observe that by monotonicity of J,

    x ≤ J(x, y) and y ≤ J(x, y),

and thus

    K(z) = min(x ≤ z)(∃y ≤ z)[J(x, y) = z],
    L(z) = min(y ≤ z)(∃x ≤ z)[J(x, y) = z].

Therefore, K and L are primitive recursive. It can be verified that J(K(z), L(z)) = z, for all z ∈ N.

More explicit formulae can be given for K and L. If we define

    Q_1(z) = ⌊(⌊√(8z + 1)⌋ + 1)/2⌋ − 1,
    Q_2(z) = 2z − (Q_1(z))²,

then it can be shown that

    K(z) = (Q_2(z) − Q_1(z))/2,
    L(z) = Q_1(z) − (Q_2(z) − Q_1(z))/2.

In the above formulae, the function m ↦ ⌊√m⌋ yields the largest integer s such that s² ≤ m. It can be computed by a RAM program.
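The pairing formula and the explicit projection formulae can be checked directly. A sketch (ours); `math.isqrt` computes ⌊√m⌋ exactly on integers:

```python
from math import isqrt

def J(x, y):
    # J(x, y) = ((x + y)^2 + 3x + y) / 2, written with the triangular-number form
    return (x + y) * (x + y + 1) // 2 + x

def K(z):
    # Q1(z) is the index x + y of the diagonal containing z.
    q1 = (isqrt(8 * z + 1) + 1) // 2 - 1
    q2 = 2 * z - q1 * q1
    return (q2 - q1) // 2

def L(z):
    q1 = (isqrt(8 * z + 1) + 1) // 2 - 1
    q2 = 2 * z - q1 * q1
    return q1 - (q2 - q1) // 2
```
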

The pairing function J(x, y) is also denoted as ⟨x, y⟩, and K and L are also denoted as Π_1 and Π_2.

By induction, we can define bijections between Nⁿ and N for all n ≥ 1. We let ⟨z⟩_1 = z, ⟨x_1, x_2⟩_2 = ⟨x_1, x_2⟩, and

    ⟨x_1, ..., x_n, x_{n+1}⟩_{n+1} = ⟨x_1, ..., x_{n−1}, ⟨x_n, x_{n+1}⟩⟩_n.

For example,

    ⟨x_1, x_2, x_3⟩_3 = ⟨x_1, ⟨x_2, x_3⟩⟩_2 = ⟨x_1, ⟨x_2, x_3⟩⟩,
    ⟨x_1, x_2, x_3, x_4⟩_4 = ⟨x_1, x_2, ⟨x_3, x_4⟩⟩_3 = ⟨x_1, ⟨x_2, ⟨x_3, x_4⟩⟩⟩,
    ⟨x_1, x_2, x_3, x_4, x_5⟩_5 = ⟨x_1, x_2, x_3, ⟨x_4, x_5⟩⟩_4 = ⟨x_1, ⟨x_2, ⟨x_3, ⟨x_4, x_5⟩⟩⟩⟩.

It can be shown by induction on n that

    ⟨x_1, ..., x_n, x_{n+1}⟩_{n+1} = ⟨x_1, ⟨x_2, ..., x_{n+1}⟩_n⟩.

The function ⟨−, ..., −⟩_n: Nⁿ → N is called an extended pairing function. Observe that if z = ⟨x_1, ..., x_5⟩_5, then x_1 = Π_1(z), x_2 = Π_1(Π_2(z)), x_3 = Π_1(Π_2(Π_2(z))), x_4 = Π_1(Π_2(Π_2(Π_2(z)))), x_5 = Π_2(Π_2(Π_2(Π_2(z)))).

We can also define a uniform projection function Π with the following property: if z = ⟨x_1, ..., x_n⟩_n, with n ≥ 2, then Π(i, n, z) = x_i for all i, where 1 ≤ i ≤ n. The idea is to view z as an n-tuple, and Π(i, n, z) as the i-th component of that n-tuple. The function Π is defined by cases as follows:

    Π(i, 0, z) = 0, for all i ≥ 0,
    Π(i, 1, z) = z, for all i ≥ 0,
    Π(i, 2, z) = Π_1(z), if 0 ≤ i ≤ 1,
    Π(i, 2, z) = Π_2(z), for all i ≥ 2,

and for all n ≥ 2,

    Π(i, n + 1, z) = Π(i, n, z) if 0 ≤ i < n,
                     Π_1(Π(n, n, z)) if i = n,
                     Π_2(Π(n, n, z)) if i > n.

By a previous exercise, this is a legitimate primitive recursive definition. Some basic properties of Π are given as exercises. In particular, the following properties are easily shown:

(a) ⟨0, ..., 0⟩_n = 0, ⟨x, 0⟩ = ⟨x, 0, ..., 0⟩_n;
(b) Π(0, n, z) = Π(1, n, z) and Π(i, n, z) = Π(n, n, z), for all i ≥ n and all n, z ∈ N;
(c) ⟨Π(1, n, z), ..., Π(n, n, z)⟩_n = z, for all n ≥ 1 and all z ∈ N;
(d) Π(i, n, z) ≤ z, for all i, n, z ∈ N;
(e) There is a primitive recursive function Large, such that

    Π(i, n + 1, Large(n + 1, z)) = z,

for i, n, z ∈ N.

As a first application, we observe that we need only consider partial computable functions (partial recursive functions)¹ of a single argument. Indeed, let φ: Nⁿ → N be a partial computable function of n ≥ 2 arguments. Let

    φ̄(z) = φ(Π(1, n, z), ..., Π(n, n, z)),

for all z ∈ N. Then φ̄ is a partial computable function of a single argument, and φ can be recovered from φ̄, since

    φ(x_1, ..., x_n) = φ̄(⟨x_1, ..., x_n⟩).

Thus, using ⟨−, ..., −⟩ and Π as coding and decoding functions, we can restrict our attention to functions of a single argument.

Next, we show that there exist coding and decoding functions between Σ* and {a_1}*, and that partial computable functions over Σ* can be recoded as partial computable functions over {a_1}*. Since {a_1}* is isomorphic to N, this shows that we can restrict our attention to functions defined over N.

9.2 Equivalence of Alphabets

Given an alphabet Σ = {a_1, ..., a_k}, strings over Σ can be ordered by viewing strings as numbers in a number system where the digits are a_1, ..., a_k. In this number system, which is almost the number system with base k, the string a_1 corresponds to zero, and a_k to k − 1. Hence, we have a kind of shifted number system in base k. For example, if Σ = {a, b, c}, a listing of Σ* in the ordering corresponding to the number system begins with

    a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, aab, aac, aba, abb, abc, ...

¹The term partial recursive is now considered old-fashioned. Many researchers have switched to the term partial computable.

Clearly, there is an ordering function from Σ* to N which is a bijection. Indeed, if u = a_{i_1} ··· a_{i_n}, this function f: Σ* → N is given by

    f(u) = i_1 k^{n−1} + i_2 k^{n−2} + ··· + i_{n−1} k + i_n.

Since we also want a decoding function, we define the coding function C_k: Σ* → Σ* as follows: C_k(ϵ) = ϵ, and if u = a_{i_1} ··· a_{i_n}, then

    C_k(u) = a_1^{i_1 k^{n−1} + i_2 k^{n−2} + ··· + i_{n−1} k + i_n}.

The function C_k is primitive recursive, because

    C_k(ϵ) = ϵ,
    C_k(x a_i) = C_k(x)^k a_1^i.

The inverse of C_k is a function D_k: {a_1}* → Σ*. However, primitive recursive functions are total, and we need to extend D_k to Σ*. This is easily done by letting

    D_k(x) = D_k(a_1^{|x|})

for all x ∈ Σ*. It remains to define D_k by primitive recursion over {a_1}*. For this, we introduce three auxiliary functions p, q, r, defined as follows. Let

    p(ϵ) = ϵ,
    p(x a_i) = x a_i, if i ≠ k,
    p(x a_k) = p(x).

Note that p(x) is the result of deleting consecutive a_k's in the tail of x. Let

    q(ϵ) = ϵ,
    q(x a_i) = q(x) a_1.

Note that q(x) = a_1^{|x|}. Finally, let

    r(ϵ) = a_1,
    r(x a_i) = x a_{i+1}, if i ≠ k,
    r(x a_k) = x a_k.

The function r is almost the successor function, for the ordering. Then, the trick is that D_k(x a_i) is the successor of D_k(x) in the ordering, and if

    D_k(x) = y a_j a_k^n
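The ordering function f and its inverse can be sketched with ordinary integer arithmetic, sidestepping the unary coding. This is our illustration (the names `rank` and `unrank` are hypothetical); the `divmod(m - 1, k)` shift reflects digits running 1..k rather than 0..k−1:

```python
def rank(u, alphabet):
    """f(u) = i1*k^(n-1) + ... + in, where a_i is the i-th letter (1-based)."""
    k = len(alphabet)
    val = 0
    for ch in u:
        val = val * k + (alphabet.index(ch) + 1)
    return val

def unrank(m, alphabet):
    """Inverse of rank: the m-th string in the shifted base-k ordering."""
    k = len(alphabet)
    out = []
    while m > 0:
        m, d = divmod(m - 1, k)   # digit values are 1..k, hence the shift
        out.append(alphabet[d])
    return "".join(reversed(out))
```

With `alphabet = "abc"` this reproduces the listing a, b, c, aa, ab, ac, ... given above.
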

with j ≠ k, since the successor of y a_j a_k^n is y a_{j+1} a_1^n, we can use r. Thus, we have

    D_k(ϵ) = ϵ,
    D_k(x a_i) = r(p(D_k(x))) q(D_k(x) − p(D_k(x))).

Then both C_k and D_k are primitive recursive, and C_k ∘ D_k = D_k ∘ C_k = id.

Let φ: Σ* × ··· × Σ* → Σ* be a partial function over Σ*, and let

    φ⁺(x_1, ..., x_n) = C_k(φ(D_k(x_1), ..., D_k(x_n))).

The function φ⁺ is defined over {a_1}*. Also, for any partial function ψ over {a_1}*, let

    ψ♯(x_1, ..., x_n) = D_k(ψ(C_k(x_1), ..., C_k(x_n))).

We claim that if ψ is a partial computable function over {a_1}*, then ψ♯ is partial computable over Σ*, and that if φ is a partial computable function over Σ*, then φ⁺ is partial computable over {a_1}*.

First, ψ can be extended to Σ* by letting ψ(x) = ψ(a_1^{|x|}) for all x ∈ Σ*, and so, if ψ is partial computable, then so is ψ♯, by composition. This seems equally obvious for φ and φ⁺, but there is a difficulty. The problem is that φ⁺ is defined as a composition of functions over Σ*. We have to show how φ⁺ can be defined directly over {a_1}* without using any additional alphabet symbols. This is done in Machtey and Young [13]; see Section 2.2.

Pairing functions can also be used to prove that certain functions are primitive recursive, even though their definition is not a legal primitive recursive definition. For example, consider the Fibonacci function defined as follows:

    f(0) = 1,
    f(1) = 1,
    f(n + 2) = f(n + 1) + f(n),

for all n ∈ N. This is not a legal primitive recursive definition, since f(n + 2) depends both on f(n + 1) and f(n). In a primitive recursive definition, g(y + 1, x) is only allowed to depend upon g(y, x).

Definition 9.1. Given any function f: Nⁿ → N, the function f̄: N^{n+1} → N defined such that

    f̄(y, x) = ⟨f(0, x), ..., f(y, x)⟩_{y+1}

is called the course-of-values function for f.

The following proposition holds.

Proposition 9.1. Given any function f: Nⁿ → N, if f is primitive recursive, then so is f̄.

Proof. First, it is necessary to define a function con such that if x = ⟨x_1, ..., x_m⟩ and y = ⟨y_1, ..., y_n⟩, where m, n ≥ 1, then

    con(m, x, y) = ⟨x_1, ..., x_m, y_1, ..., y_n⟩.

This fact is left as an exercise. Now, if f is primitive recursive, let

    f̄(0, x) = f(0, x),
    f̄(y + 1, x) = con(y + 1, f̄(y, x), f(y + 1, x)),

showing that f̄ is primitive recursive. Conversely, if f̄ is primitive recursive, then

    f(y, x) = Π(y + 1, y + 1, f̄(y, x)),

and so f is primitive recursive.

Remark: Why is it that

    f̄(y + 1, x) = ⟨f̄(y, x), f(y + 1, x)⟩

does not work?

We define course-of-values recursion as follows.

Definition 9.2. Given any two functions g: Nⁿ → N and h: N^{n+2} → N, the function f: N^{n+1} → N is defined by course-of-values recursion from g and h if

    f(0, x) = g(x),
    f(y + 1, x) = h(y, f̄(y, x), x).

The following proposition holds.

Proposition 9.2. If f: N^{n+1} → N is defined by course-of-values recursion from g and h, and g, h are primitive recursive, then f is primitive recursive.

Proof. We prove that f̄ is primitive recursive. Then, by Proposition 9.1, f is also primitive recursive. To prove that f̄ is primitive recursive, observe that

    f̄(0, x) = f(0, x) = g(x),
    f̄(y + 1, x) = con(y + 1, f̄(y, x), h(y, f̄(y, x), x)).

When we use Proposition 9.2 to prove that a function is primitive recursive, we rarely bother to construct a formal course-of-values recursion. Instead, we simply indicate how the value of f(y + 1, x) can be obtained in a primitive recursive manner from f(0, x) through f(y, x). Thus, an informal use of Proposition 9.2 shows that the Fibonacci function is primitive recursive. A rigorous proof of this fact is left as an exercise.
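An informal sketch of course-of-values recursion (our construction, not from the text): instead of coding the history ⟨f(0), ..., f(y)⟩ with the pairing function, a Python tuple stands in for f̄, which is the illustrative liberty here.

```python
def course_of_values(g0, h):
    """f(0) = g0, f(y+1) = h(y, hist) where hist = (f(0), ..., f(y)).
    The tuple plays the role of the coded course-of-values function f-bar."""
    def f(y):
        hist = (g0,)
        for k in range(y):
            hist = hist + (h(k, hist),)   # extend the history, as con does
        return hist[y]
    return f

# Fibonacci: f(0) = f(1) = 1, f(n+2) = f(n+1) + f(n).
fib = course_of_values(1, lambda k, hist: hist[k] + hist[k - 1] if k >= 1 else 1)
```
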

9.3 Coding of RAM Programs

In this section, we present a specific encoding of RAM programs which allows us to treat programs as integers. Encoding programs as integers also allows us to have programs that take other programs as input, and we obtain a universal program. Universal programs have the property that given two inputs, the first one being the code of a program and the second one an input data, the universal program simulates the actions of the encoded program on the input data. A coding scheme is also called an indexing or a Gödel numbering, in honor of Gödel, who invented this technique.

From results of the previous chapter, without loss of generality, we can restrict our attention to RAM programs computing partial functions of one argument over N. Furthermore, we only need the following kinds of instructions, each instruction being coded as shown below. Since we are considering functions over the natural numbers, which corresponds to a one-letter alphabet, there is only one kind of instruction of the form add and jmp (and add increments by 1 the contents of the specified register Rj).

    Ni add Rj        code = ⟨1, i, j, 0⟩
    Ni tail Rj       code = ⟨2, i, j, 0⟩
    Ni continue      code = ⟨3, i, 1, 0⟩
    Ni Rj jmp Nka    code = ⟨4, i, j, k⟩
    Ni Rj jmp Nkb    code = ⟨5, i, j, k⟩

Recall that a conditional jump causes a jump to the closest address Nk above or below iff Rj is nonzero, and if Rj is null, the next instruction is executed. We assume that all lines in a RAM program are numbered. This is always feasible, by labeling unnamed instructions with a new and unused line number.

The code of an instruction I is denoted as #I. To simplify the notation, we introduce the following decoding primitive recursive functions Typ, Nam, Reg, and Jmp, defined as follows:

    Typ(x) = Π(1, 4, x),
    Nam(x) = Π(2, 4, x),
    Reg(x) = Π(3, 4, x),
    Jmp(x) = Π(4, 4, x).

These functions yield the type, line number, register name, and line number jumped to, if any, for an instruction coded by x. Note that we have no need to interpret the values of these functions if x does not code an instruction.
We can define the primitive recursive predicate INST, such that INST(x) holds iff x codes an instruction. First, we need the connective ⇒ (implies), defined such that P ⇒ Q iff ¬P ∨ Q.

Then INST(x) holds iff:

    [1 ≤ Typ(x) ≤ 5] ∧ [1 ≤ Reg(x)]
    ∧ [Typ(x) ≤ 3 ⇒ Jmp(x) = 0]
    ∧ [Typ(x) = 3 ⇒ Reg(x) = 1].

Programs are coded as follows. If P is a RAM program composed of the n instructions I_1, ..., I_n, the code of P, denoted as #P, is

    #P = ⟨n, #I_1, ..., #I_n⟩.

Recall from a previous exercise that

    ⟨n, #I_1, ..., #I_n⟩ = ⟨n, ⟨#I_1, ..., #I_n⟩⟩.

Also recall that

    ⟨x, y⟩ = ((x + y)² + 3x + y)/2.

Consider the following program Padd2 computing the function add2: N → N given by add2(n) = n + 2.

    I_1: 1 add R1
    I_2: 2 add R1
    I_3: 3 continue

We have

    #I_1 = ⟨1, 1, 1, 0⟩_4 = ⟨1, ⟨1, ⟨1, 0⟩⟩⟩ = 37,
    #I_2 = ⟨1, 2, 1, 0⟩_4 = ⟨1, ⟨2, ⟨1, 0⟩⟩⟩ = 92,
    #I_3 = ⟨3, 3, 1, 0⟩_4 = ⟨3, ⟨3, ⟨1, 0⟩⟩⟩ = 234,

and

    #Padd2 = ⟨3, #I_1, #I_2, #I_3⟩_4 = ⟨3, ⟨37, ⟨92, 234⟩⟩⟩.

The codes get big fast!

We define the primitive recursive functions Ln, Pg, and Line, such that:

    Ln(x) = Π(1, 2, x),
    Pg(x) = Π(2, 2, x),
    Line(i, x) = Π(i, Ln(x), Pg(x)).
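The instruction codes above can be reproduced mechanically with the pairing function. A sketch (the helper names `pair` and `code4` are ours):

```python
def pair(x, y):
    # Cantor pairing <x, y> = ((x + y)^2 + 3x + y) / 2
    return (x + y) * (x + y + 1) // 2 + x

def code4(a, b, c, d):
    # <a, b, c, d>_4 = <a, <b, <c, d>>>
    return pair(a, pair(b, pair(c, d)))

i1 = code4(1, 1, 1, 0)        # "1 add R1"
i2 = code4(1, 2, 1, 0)        # "2 add R1"
i3 = code4(3, 3, 1, 0)        # "3 continue"
padd2 = code4(3, i1, i2, i3)  # #Padd2 = <3, #I1, #I2, #I3>_4
```

Running this confirms the values 37, 92, and 234, and shows how quickly #Padd2 itself explodes in size.
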

The function Ln yields the length of the program (the number of instructions), Pg yields the sequence of instructions in the program (really, a code for the sequence), and Line(i, x) yields the code of the ith instruction in the program. Again, if x does not code a program, there is no need to interpret these functions. However, note that by a previous exercise, it happens that Line(0, x) = Line(1, x), and Line(i, x) = Line(Ln(x), x) for all i ≥ Ln(x).

The primitive recursive predicate PROG is defined such that PROG(x) holds iff x codes a program. Thus, PROG(x) holds if each line codes an instruction, each jump has an instruction to jump to, and the last instruction is a continue. Thus, PROG(x) holds iff

    ∀i ≤ Ln(x) [i ≥ 1 ⇒ [INST(Line(i, x)) ∧ Typ(Line(Ln(x), x)) = 3
      ∧ [Typ(Line(i, x)) = 4 ⇒ ∃j ≤ i − 1 [j ≥ 1 ∧ Nam(Line(j, x)) = Jmp(Line(i, x))]]
      ∧ [Typ(Line(i, x)) = 5 ⇒ ∃j ≤ Ln(x) [j > i ∧ Nam(Line(j, x)) = Jmp(Line(i, x))]]]]

Note that we have used the fact, proved as an exercise, that if f is a primitive recursive function and P is a primitive recursive predicate, then ∃x ≤ f(y) P(x) is primitive recursive.

We are now ready to prove a fundamental result in the theory of algorithms. This result points out some of the limitations of the notion of algorithm.

Theorem 9.3. (Undecidability of the halting problem) There is no RAM program Decider which halts for all inputs and has the following property when started with input x in register R1 and with input i in register R2 (the other registers being set to zero):

(1) Decider halts with output 1 iff i codes a program that eventually halts when started on input x (all other registers set to zero).

(2) Decider halts with output 0 in R1 iff i codes a program that runs forever when started on input x in R1 (all other registers set to zero).

(3) If i does not code a program, then Decider halts with output 2 in R1.

Proof. Assume that Decider is such a RAM program, and let Q be the following program with a single input:

    Program Q (code q):
          R2 ← R1
          Decider
    N1    continue
          R1 jmp N1a
          continue

Let i be the code of some program P. The key point is that the termination behavior of Q on input i is exactly the opposite of the termination behavior of Decider on input i and code i.

(1) If Decider says that the program P coded by i halts on input i, then R1 just after the continue in line N1 contains 1, and Q loops forever.

(2) If Decider says that the program P coded by i loops forever on input i, then R1 just after the continue in line N1 contains 0, and Q halts.

The program Q can be translated into a program using only instructions of type 1, 2, 3, 4, 5, described previously, and let q be the code of this program Q.

Let us see what happens if we run the program Q on input q in R1 (all other registers set to zero). Just after execution of the assignment R2 ← R1, the program Decider is started with q in both R1 and R2. Since Decider is supposed to halt for all inputs, it eventually halts with output 0 or 1 in R1. If Decider halts with output 1 in R1, then Q goes into an infinite loop, while if Decider halts with output 0 in R1, then Q halts. But then, because of the definition of Decider, we see that Decider says that Q halts when started on input q iff Q loops forever on input q, and that Q loops forever on input q iff Q halts on input q, a contradiction. Therefore, Decider cannot exist.

If we identify the notion of algorithm with that of a RAM program which halts for all inputs, the above theorem says that there is no algorithm for deciding whether a RAM program eventually halts for a given input. We say that the halting problem for RAM programs is undecidable (or unsolvable).

The above theorem also implies that the halting problem for Turing machines is undecidable. Indeed, if we had an algorithm for solving the halting problem for Turing machines, we could solve the halting problem for RAM programs as follows: first, apply the algorithm for translating a RAM program into an equivalent Turing machine, and then apply the algorithm solving the halting problem for Turing machines. The argument is typical in computability theory and is called a reducibility argument.
Our next goal is to define a primitive recursive function that describes the computation of RAM programs. Assume that we have a RAM program P using n registers R1, ..., Rn, whose contents are denoted as r_1, ..., r_n. We can code r_1, ..., r_n into a single integer ⟨r_1, ..., r_n⟩. Conversely, every integer x can be viewed as coding the contents of R1, ..., Rn, by taking the sequence Π(1, n, x), ..., Π(n, n, x). Actually, it is not necessary to know n, the number of registers, if we make the following observation:

    Reg(Line(i, x)) ≤ Line(i, x) ≤ Pg(x)

for all i, x ∈ N. Then, if x codes a program, then R1, ..., Rx certainly include all the registers in the program. Also note that from a previous exercise,

    ⟨r_1, ..., r_n, 0, ..., 0⟩ = ⟨r_1, ..., r_n, 0⟩.

We now define the primitive recursive functions Nextline, Nextcont, and Comp, describing the computation of RAM programs.

Definition 9.3. Let x code a program and let i be such that 1 ≤ i ≤ Ln(x). The following functions are defined:

(1) Nextline(i, x, y) is the number of the next instruction to be executed after executing the ith instruction in the program coded by x, where the contents of the registers is coded by y.

(2) Nextcont(i, x, y) is the code of the contents of the registers after executing the ith instruction in the program coded by x, where the contents of the registers is coded by y.

(3) Comp(x, y, m) = ⟨i, z⟩, where i and z are defined such that after running the program coded by x for m steps, where the initial contents of the program registers are coded by y, the next instruction to be executed is the ith one, and z is the code of the current contents of the registers.

Proposition 9.4. The functions Nextline, Nextcont, and Comp are primitive recursive.

Proof. (1) Nextline(i, x, y) = i + 1, unless the ith instruction is a jump and the contents of the register being tested is nonzero:

    Nextline(i, x, y) =
      max j ≤ Ln(x) [j < i ∧ Nam(Line(j, x)) = Jmp(Line(i, x))]
          if Typ(Line(i, x)) = 4 ∧ Π(Reg(Line(i, x)), x, y) ≠ 0,
      min j ≤ Ln(x) [j > i ∧ Nam(Line(j, x)) = Jmp(Line(i, x))]
          if Typ(Line(i, x)) = 5 ∧ Π(Reg(Line(i, x)), x, y) ≠ 0,
      i + 1 otherwise.

Note that according to this definition, if the ith line is the final continue, then Nextline signals that the program has halted by yielding Nextline(i, x, y) > Ln(x).

(2) We need two auxiliary functions Add and Sub defined as follows.

Add(j, x, y) is the number coding the contents of the registers used by the program coded by x after register Rj coded by Π(j, x, y) has been increased by 1, and

Sub(j, x, y) codes the contents of the registers after register Rj has been decremented by 1 (y codes the previous contents of the registers). It is easy to see that

    Sub(j, x, y) = min z ≤ y [Π(j, x, z) = Π(j, x, y) ∸ 1
        ∧ ∀k ≤ x [0 < k ≠ j ⇒ Π(k, x, z) = Π(k, x, y)]].

The definition of Add is slightly more tricky. We leave as an exercise to the reader to prove that:

    Add(j, x, y) = min z ≤ Large(x, y + 1) [Π(j, x, z) = Π(j, x, y) + 1
        ∧ ∀k ≤ x [0 < k ≠ j ⇒ Π(k, x, z) = Π(k, x, y)]],

where the function Large is the function defined in an earlier exercise. Then

    Nextcont(i, x, y) =
      Add(Reg(Line(i, x)), x, y) if Typ(Line(i, x)) = 1,
      Sub(Reg(Line(i, x)), x, y) if Typ(Line(i, x)) = 2,
      y if Typ(Line(i, x)) ≥ 3.

(3) Recall that Π_1(z) = Π(1, 2, z) and Π_2(z) = Π(2, 2, z). The function Comp is defined by primitive recursion as follows:

    Comp(x, y, 0) = ⟨1, y⟩,
    Comp(x, y, m + 1) = ⟨Nextline(Π_1(Comp(x, y, m)), x, Π_2(Comp(x, y, m))),
                         Nextcont(Π_1(Comp(x, y, m)), x, Π_2(Comp(x, y, m)))⟩.

Recall that Π_1(Comp(x, y, m)) is the number of the next instruction to be executed and that Π_2(Comp(x, y, m)) codes the current contents of the registers.

We can now reprove that every RAM-computable function is partial computable. Indeed, assume that x codes a program P. We define the partial function End so that for all x, y, where x codes a program and y codes the contents of its registers, End(x, y) is the number of steps for which the computation runs before halting, if it halts. If the program does not halt, then End(x, y) is undefined. We have

    End(x, y) = min m [Π_1(Comp(x, y, m)) = Ln(x)].

If y is the value of the register R1 before the program P coded by x is started, recall that the contents of the registers is coded by ⟨y, 0⟩. Noticing that 0 and 1 do not code programs, we note that if x codes a program, then x ≥ 2, and Π_1(z) = Π(1, x, z) is the contents of R1 as coded by z.
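Taken together, Nextline and Nextcont are a single-step transition function, and Comp iterates it; in other words, they amount to a small RAM interpreter. The sketch below (our simplification) operates on decoded instruction lists and a register dictionary instead of the coded integers, but the step logic mirrors the definitions above.

```python
def step(prog, i, regs):
    """One step: return (next line, new registers), mirroring Nextline/Nextcont.
    prog is a list of (typ, name, reg, jmp) tuples; lines are 1-indexed."""
    typ, name, reg, jmp = prog[i - 1]
    if typ == 1:                                   # add Rj
        regs = dict(regs); regs[reg] = regs.get(reg, 0) + 1
        return i + 1, regs
    if typ == 2:                                   # tail Rj (decrement, floor 0)
        regs = dict(regs); regs[reg] = max(regs.get(reg, 0) - 1, 0)
        return i + 1, regs
    if typ in (4, 5) and regs.get(reg, 0) != 0:    # conditional jump, Rj nonzero
        lines = range(i - 1, 0, -1) if typ == 4 else range(i + 1, len(prog) + 1)
        for j in lines:                            # closest line named Nk
            if prog[j - 1][1] == jmp:
                return j, regs
    return i + 1, regs                             # continue, or jump not taken

def run(prog, n):
    """Iterate step until the line index leaves the program, like Comp + End."""
    i, regs = 1, {1: n}
    while i <= len(prog):
        i, regs = step(prog, i, regs)
    return regs.get(1, 0)

# Padd2: "1 add R1; 2 add R1; 3 continue" as (typ, name, reg, jmp) tuples.
padd2 = [(1, 1, 1, 0), (1, 2, 1, 0), (3, 3, 1, 0)]
```
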

Since Comp(x, y, m) = ⟨i, z⟩, we have

    Π_1(Comp(x, y, m)) = i,

where i is the number (index) of the instruction reached after running the program P coded by x with initial values of the registers coded by y for m steps. Thus, P halts iff i is the last instruction in P, namely Ln(x), iff Π_1(Comp(x, y, m)) = Ln(x).

End is a partial computable function; it can be computed by a RAM program involving only one while loop searching for the number of steps m. However, in general, End is not a total function.

If φ is the partial computable function computed by the program P coded by x, then we have

    φ(y) = Π_1(Π_2(Comp(x, ⟨y, 0⟩, End(x, ⟨y, 0⟩)))).

This is because if m = End(x, ⟨y, 0⟩) is the number of steps after which the program P coded by x halts on input y, then

    Comp(x, ⟨y, 0⟩, m) = ⟨Ln(x), z⟩,

where z is the code of the register contents when the program stops. Consequently

    z = Π_2(Comp(x, ⟨y, 0⟩, m)) = Π_2(Comp(x, ⟨y, 0⟩, End(x, ⟨y, 0⟩))).

The value of the register R1 is Π_1(z), that is,

    φ(y) = Π_1(Π_2(Comp(x, ⟨y, 0⟩, End(x, ⟨y, 0⟩)))).

Observe that φ is written in the form φ = g ∘ min f, for some primitive recursive functions f and g.

We can also exhibit a partial computable function which enumerates all the unary partial computable functions. It is a universal function. Abusing the notation slightly, we will write φ(x, y) for φ(⟨x, y⟩), viewing φ as a function of two arguments (however, φ is really a function of a single argument). We define the function φ_univ as follows:

    φ_univ(x, y) = Π_1(Π_2(Comp(x, ⟨y, 0⟩, End(x, ⟨y, 0⟩)))) if PROG(x),
                   undefined otherwise.

The function φ_univ is a partial computable function with the following property: for every x coding a RAM program P, for every input y,

    φ_univ(x, y) = φ_x(y),

the value of the partial computable function ϕ_x computed by the RAM program P coded by x. If x does not code a program, then ϕ_univ(x, y) is undefined for all y.

By Proposition 8.9, the partial function ϕ_univ is not computable (recursive).² Indeed, being an enumerating function for the partial computable functions, it is an enumerating function for the total computable functions, and thus, it cannot be computable. Being a partial function saves us from a contradiction.

The existence of the function ϕ_univ leads us to the notion of an indexing of the RAM programs. We can define a listing of the RAM programs as follows. If x codes a program (that is, if PROG(x) holds) and P is the program that x codes, we call this program P the xth RAM program and denote it as P_x. If x does not code a program, we let P_x be the program that diverges for every input:

N1    add R1
N1    R1 jmp N1a
N1    continue

Therefore, in all cases, P_x stands for the xth RAM program. Thus, we have a listing of RAM programs, P_0, P_1, P_2, P_3, ..., such that every RAM program (of the restricted type considered here) appears in the list exactly once, except for the infinite loop program. For example, the program Padd2 (adding 2 to an integer) appears as P_x for some specific (very large) index x. In particular, note that ϕ_univ being a partial computable function, it is computed by some RAM program UNIV that has a code univ and is the program P_univ in the list.

Having an indexing of the RAM programs, we also have an indexing of the partial computable functions.

Definition 9.4. For every integer x ≥ 0, we let P_x be the RAM program coded by x as defined earlier, and ϕ_x be the partial computable function computed by P_x.

For example, the function add2 (adding 2 to an integer) appears as ϕ_x for some specific (very large) index x.

Remark: Kleene used the notation {x} for the partial computable function coded by x. Due to the potential confusion with singleton sets, we follow Rogers, and use the notation ϕ_x.

² The term recursive function is now considered old-fashioned. Many researchers have switched to the term computable function.
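The shape ϕ = g ◦ min f noted above is the general pattern: a single unbounded search sandwiched between primitive recursive pre- and post-processing. A minimal sketch of the unbounded minimization operator, which, like End, diverges when no witness exists:

```python
def mu(f):
    """Unbounded minimization: least m with f(m) == 0 (may diverge)."""
    m = 0
    while f(m) != 0:
        m += 1
    return m
```

For instance, `mu(lambda m: m * m - 25)` returns 5; applied to a function that is never zero, the while loop never exits, which is precisely how partial (rather than total) computable functions arise from this operator.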

It is important to observe that different programs P_x and P_y may compute the same function, that is, while P_x ≠ P_y for all x ≠ y, it is possible that ϕ_x = ϕ_y. In fact, it is undecidable whether ϕ_x = ϕ_y.

The existence of the universal function ϕ_univ is sufficiently important to be recorded in the following proposition.

Proposition 9.5. For the indexing of RAM programs defined earlier, there is a universal partial computable function ϕ_univ such that, for all x, y ∈ N, if ϕ_x is the partial computable function computed by P_x, then

ϕ_x(y) = ϕ_univ(⟨x, y⟩).

The program UNIV computing ϕ_univ can be viewed as an interpreter for RAM programs. By giving the universal program UNIV the "program" x and the "data" y, we get the result of executing program P_x on input y. We can view the RAM model as a stored program computer.

By Theorem 9.3 and Proposition 9.5, the halting problem for the single program UNIV is undecidable. Otherwise, the halting problem for RAM programs would be decidable, a contradiction. It should be noted that the program UNIV can actually be written (with a certain amount of pain).

The object of the next section is to show the existence of Kleene's T-predicate. This will yield another important normal form. In addition, the T-predicate is a basic tool in recursion theory.

9.4 Kleene's T-Predicate

In Section 9.3, we encoded programs. The idea of this section is to also encode computations of RAM programs. Assume that x codes a program, that y is some input (not a code), and that z codes a computation of P_x on input y. The predicate T(x, y, z) is defined as follows:

T(x, y, z) holds iff x codes a RAM program, y is an input, and z codes a halting computation of P_x on input y.

We will show that T is primitive recursive. First, we need to encode computations.
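The interpreter idea is easy to prototype. A toy version in Python, using a hypothetical unencoded program format (a list of instruction tuples and absolute instruction indices, instead of the Gödel numbers and labels of Section 9.3):

```python
def run_ram(prog, y, max_steps=10**6):
    """Run a toy RAM program on input y (in R1); return R1 on halt,
    or None if no continue is reached within max_steps."""
    regs, i = {1: y}, 0
    for _ in range(max_steps):
        op = prog[i]
        if op[0] == 'add':                  # add 1 to register op[1]
            regs[op[1]] = regs.get(op[1], 0) + 1; i += 1
        elif op[0] == 'tail':               # subtract 1, floored at 0
            regs[op[1]] = max(regs.get(op[1], 0) - 1, 0); i += 1
        elif op[0] == 'jmp':                # jump to op[2] if R(op[1]) != 0
            i = op[2] if regs.get(op[1], 0) != 0 else i + 1
        else:                               # ('continue',): halt
            return regs.get(1, 0)
    return None

# Padd2: add 2 to R1, then halt.
padd2 = [('add', 1), ('add', 1), ('continue',)]
```

`run_ram(padd2, 5)` returns 7; feeding it a diverging program makes it spin until the step bound, mirroring the fact that UNIV inherits the undecidability of the halting problem.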
We say that z codes a computation of length n ≥ 1 if

z = ⟨n + 2, ⟨1, y_0⟩, ⟨i_1, y_1⟩, ..., ⟨i_n, y_n⟩⟩,

where each i_j is the physical location of the next instruction to be executed and each y_j codes the contents of the registers just before execution of the instruction at location i_j. Also, y_0 codes the initial contents of the registers, that is, y_0 = ⟨y, 0⟩, for some input y. We let Ln(z) = Π_1(z). Note that i_j denotes the physical location of the next instruction to

be executed in the sequence of instructions constituting the program coded by x, and not the line number (label) of this instruction. Thus, the first instruction to be executed is in location 1, 1 ≤ i_j ≤ Ln(x), and i_{n−1} = Ln(x). Since the last instruction which is executed is the last physical instruction in the program, namely a continue, there is no next instruction to be executed after that, and i_n is irrelevant. Writing the definition of T is a little simpler if we let i_n = Ln(x) + 1.

Definition 9.5. The T-predicate is the primitive recursive predicate defined as follows: T(x, y, z) iff PROG(x) and (Ln(z) ≥ 3) and

∀j ≤ Ln(z) − 3 [0 ≤ j ⊃
  Nextline(Π_1(Π(j + 2, Ln(z), z)), x, Π_2(Π(j + 2, Ln(z), z))) = Π_1(Π(j + 3, Ln(z), z)) and
  Nextcont(Π_1(Π(j + 2, Ln(z), z)), x, Π_2(Π(j + 2, Ln(z), z))) = Π_2(Π(j + 3, Ln(z), z)) and
  Π_1(Π(Ln(z) − 1, Ln(z), z)) = Ln(x) and
  Π_1(Π(2, Ln(z), z)) = 1 and
  y = Π_1(Π_2(Π(2, Ln(z), z))) and Π_2(Π_2(Π(2, Ln(z), z))) = 0].

The reader can verify that T(x, y, z) holds iff x codes a RAM program, y is an input, and z codes a halting computation of P_x on input y.

In order to extract the output of P_x from z, we define the primitive recursive function Res as follows:

Res(z) = Π_1(Π_2(Π(Ln(z), Ln(z), z))).

The explanation for this formula is that Res(z) is the contents of register R1 when P_x halts, that is, Π_1(y_{Ln(z)}).

Using the T-predicate, we get the so-called Kleene normal form.

Theorem 9.6. (Kleene Normal Form) Using the indexing of the partial computable functions defined earlier, we have

ϕ_x(y) = Res[min z (T(x, y, z))],

where T(x, y, z) and Res are primitive recursive.

Note that the universal function ϕ_univ can be defined as

ϕ_univ(x, y) = Res[min z (T(x, y, z))].

There is another important property of the partial computable functions, namely, that composition is effective. We need two auxiliary primitive recursive functions. The function Conprogs creates the code of the program obtained by concatenating the programs P_x and P_y, and for i ≥ 2, Cumclr(i) is the code of the program which clears registers R2, ..., Ri.
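Stripped of the pairing-function encoding, the T-predicate is just a step-by-step check of a trace. A sketch in which the computation is a plain Python list of (i_j, y_j) pairs, with hypothetical stand-ins `nextline`, `nextcont`, and `ln` for Nextline, Nextcont, and Ln:

```python
def T_check(x, y, trace, nextline, nextcont, ln):
    """Does `trace` record a halting computation of program x on input y?"""
    if not trace or trace[0] != (1, (y, 0)):
        return False                     # must start at line 1 on <y, 0>
    for (i, r), (i2, r2) in zip(trace, trace[1:]):
        if (i2, r2) != (nextline(i, x, r), nextcont(i, x, r)):
            return False                 # some step is not a legal move
    return trace[-1][0] == ln(x)         # must end at the final continue
```

Once such a z is found, Res simply projects R1 out of the last register snapshot; the search min z (T(x, y, z)) is again a single unbounded minimization.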
To get Cumclr, we can use the function clr(i) such that clr(i) is the code of the program

N1    tail Ri
N1    Ri jmp N1a
N     continue

We leave it as an exercise to prove that clr, Conprogs, and Cumclr are primitive recursive.

Theorem 9.7. There is a primitive recursive function c such that ϕ_{c(x,y)} = ϕ_x ◦ ϕ_y.

Proof. If both x and y code programs, then ϕ_x ◦ ϕ_y can be computed as follows: run P_y, clear all registers but R1, then run P_x. Otherwise, let loop be the index of the infinite loop program:

c(x, y) = Conprogs(y, Conprogs(Cumclr(y), x))  if PROG(x) and PROG(y),
          loop                                 otherwise.

9.5 A Simple Function Not Known to be Computable

The 3n + 1 problem proposed by Collatz around 1937 is the following: Given any positive integer n ≥ 1, construct the sequence c_i(n) as follows, starting with i = 1:

c_1(n) = n,
c_{i+1}(n) = c_i(n)/2     if c_i(n) is even,
             3c_i(n) + 1  if c_i(n) is odd.

Observe that for n = 1, we get the infinite periodic sequence

1 ⇒ 4 ⇒ 2 ⇒ 1 ⇒ 4 ⇒ 2 ⇒ 1 ⇒ ···,

so we may assume that we stop the first time that the sequence c_i(n) reaches the value 1 (if it actually does). Such an index i is called the stopping time of the sequence. And this is the problem:

Conjecture (Collatz): For any starting integer value n ≥ 1, the sequence (c_i(n)) always reaches 1.

Starting with n = 3, we get the sequence

3 ⇒ 10 ⇒ 5 ⇒ 16 ⇒ 8 ⇒ 4 ⇒ 2 ⇒ 1.

Starting with n = 5, we get the sequence

5 ⇒ 16 ⇒ 8 ⇒ 4 ⇒ 2 ⇒ 1.
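The sequence itself is trivial to program; what nobody knows is whether the loop below always terminates. A direct transcription in Python, with the stopping time counted as the index i (the number of terms of the sequence):

```python
def collatz_stopping_time(n):
    """The least i with c_i(n) = 1, where c_1(n) = n.
    Termination for every n >= 1 is exactly the Collatz conjecture."""
    i, c = 1, n
    while c != 1:
        c = c // 2 if c % 2 == 0 else 3 * c + 1
        i += 1
    return i
```

For n = 3 this returns 8, matching the eight-term sequence above; for n = 5 it returns 6.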

Starting with n = 6, we get the sequence

6 ⇒ 3 ⇒ 10 ⇒ 5 ⇒ 16 ⇒ 8 ⇒ 4 ⇒ 2 ⇒ 1.

Starting with n = 7, we get the sequence

7 ⇒ 22 ⇒ 11 ⇒ 34 ⇒ 17 ⇒ 52 ⇒ 26 ⇒ 13 ⇒ 40 ⇒ 20 ⇒ 10 ⇒ 5 ⇒ 16 ⇒ 8 ⇒ 4 ⇒ 2 ⇒ 1.

One might be surprised to find that for n = 27, it takes 111 steps to reach 1, and for n = 97, it takes 118 steps. I computed the stopping times for n up to 10^7 and found that the largest stopping time, 525 (524 steps), is obtained for n = 837799. The terms of this sequence reach very large values (over 10^9). The graph of the sequence for n = 837799 is shown in Figure 9.1.

Figure 9.1: Graph of the sequence for n = 837799.

We can define the partial computable function C (with positive integer inputs) by

C(n) = the smallest i such that c_i(n) = 1, if it exists.

Then the Collatz conjecture is equivalent to asserting that the function C is (total) computable. The graph of the function C for 1 ≤ n ≤ 10^7 is shown in Figure 9.2.

So far, the conjecture remains open. It has been checked by computer for a very large range of integers.

Figure 9.2: Graph of the function C for 1 ≤ n ≤ 10^7.

9.6 A Non-Computable Function; Busy Beavers

Total functions that are not computable must grow very fast and thus are very complicated. Yet, in 1962, Radó published a paper in which he defined two functions Σ and S (involving computations of Turing machines) that are total and not computable.

Consider Turing machines with a tape alphabet Γ = {1, B} with two symbols (B being the blank). We also assume that these Turing machines have a special final state q_F, which is a blocking state (there are no transitions from q_F). We do not count this state when counting the number of states of such Turing machines. The game is to run such Turing machines with a fixed number of states n, starting on the blank tape, with the goal of producing the maximum number of (not necessarily consecutive) ones (1).

Definition 9.6. The function Σ (defined on the positive natural numbers) is defined as the maximum number Σ(n) of (not necessarily consecutive) 1's written on the tape after a Turing machine with n ≥ 1 states started on the blank tape halts. The function S is defined as the maximum number S(n) of moves that can be made by a Turing machine of the above type with n states before it halts, started on the blank tape.

A Turing machine with n states that writes the maximum number Σ(n) of 1's when started on the blank tape is called a busy beaver.

Busy beavers are hard to find, even for small n. First, it can be shown that the number of distinct Turing machines of the above kind with n states is (4(n + 1))^{2n}. Second, since it is undecidable whether a Turing machine halts on a given input, it is hard to tell which machines loop or halt after a very long time. Here is a summary of what is known for 1 ≤ n ≤ 6. Observe that the exact values of Σ(5), Σ(6), S(5) and S(6) are unknown.

n    Σ(n)              S(n)
1    1                 1
2    4                 6
3    6                 21
4    13                107
5    ≥ 4098            ≥ 47,176,870
6    ≥ 95,524,079      ≥ 8,690,333,381,690,951
6    ≥ 3.5 × 10^18     ≥ 7.4 × 10^36

The first entry in the table for n = 6 corresponds to a machine due to Heiner Marxen (1999). This record was surpassed by Pavel Kropitz in 2010, which corresponds to the second entry for n = 6. The machines achieving the record in 2017 for n = 4, 5, 6 are shown below, where the blank is denoted − instead of B, and where the special halting state is denoted H:

4-state busy beaver:

     A        B        C        D
−    (1,R,B)  (1,L,A)  (1,R,H)  (1,R,D)
1    (1,L,B)  (−,L,C)  (1,L,D)  (−,R,A)

The above machine outputs 13 ones in 107 steps.

5-state best contender:

     A        B        C        D        E
−    (1,R,B)  (1,R,C)  (1,R,D)  (1,L,A)  (1,R,H)
1    (1,L,C)  (1,R,B)  (−,L,E)  (1,L,D)  (−,L,A)

The above machine outputs 4098 ones in 47,176,870 steps.
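Claims like these can be checked mechanically with a few lines of code. A small simulator (blank written as 0, a mark as 1, transition tables as Python dicts), set up here for the 4-state busy beaver above; the step count includes the final transition into H:

```python
def run_tm(delta, start='A', halt='H', limit=10**7):
    """Simulate a 2-symbol TM; return (steps, number of 1's on the tape)."""
    tape, pos, state, steps = {}, 0, start, 0
    while state != halt and steps < limit:
        write, move, state = delta[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += 1 if move == 'R' else -1
        steps += 1
    return steps, sum(tape.values())

# The 4-state busy beaver from the table above.
bb4 = {
    ('A', 0): (1, 'R', 'B'), ('A', 1): (1, 'L', 'B'),
    ('B', 0): (1, 'L', 'A'), ('B', 1): (0, 'L', 'C'),
    ('C', 0): (1, 'R', 'H'), ('C', 1): (1, 'L', 'D'),
    ('D', 0): (1, 'R', 'D'), ('D', 1): (0, 'R', 'A'),
}
```

Running `run_tm(bb4)` halts with 13 ones after 107 steps, as stated above; the same loop with the 5-state table reproduces the 47,176,870-step run, though it takes a while in pure Python.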

6-state contender (Heiner Marxen):

     A        B        C        D        E        F
−    (1,R,B)  (1,L,C)  (−,R,F)  (1,R,A)  (1,L,H)  (−,L,A)
1    (1,R,A)  (1,L,B)  (1,L,D)  (−,L,E)  (1,L,F)  (−,L,C)

The above machine outputs 95,524,079 ones in 8,690,333,381,690,951 steps.

6-state best contender (Pavel Kropitz):

     A        B        C        D        E        F
−    (1,R,B)  (1,R,C)  (1,L,D)  (1,R,E)  (1,L,A)  (1,L,H)
1    (1,L,E)  (1,R,F)  (−,R,B)  (−,L,C)  (−,R,D)  (1,R,C)

The above machine outputs at least 3.5 × 10^18 ones!

The reason why it is so hard to compute Σ and S is that they are not computable!

Theorem 9.8. The functions Σ and S are total functions that are not computable (not recursive).

Proof sketch. The proof consists in showing that Σ (and similarly for S) eventually outgrows any computable function. More specifically, we claim that for every computable function f, there is some positive integer k_f such that

Σ(n + k_f) ≥ f(n)  for all n ≥ 0.

We simply have to pick k_f to be the number of states of a Turing machine M_f computing f. Then we can create a Turing machine M_{n,f} that works as follows. Using n of its states, it writes n ones on the tape, and then it simulates M_f with input 1^n. Since the output of M_{n,f} started on the blank tape consists of f(n) ones, and since Σ(n + k_f) is the maximum number of ones that a Turing machine with n + k_f states will output when it stops, we must have

Σ(n + k_f) ≥ f(n)  for all n ≥ 0.

Next observe that Σ(n) < Σ(n + 1), because we can create a Turing machine with n + 1 states which simulates a busy beaver machine with n states, and then writes an extra 1 when the busy beaver stops, by making a transition to the (n + 1)th state. It follows immediately that if m < n then Σ(m) < Σ(n).

If Σ were computable, then so would be the function g given by g(n) = Σ(2n). By the above, we would have

Σ(n + k_g) ≥ g(n) = Σ(2n)  for all n ≥ 0,

and for n > k_g, since 2n > n + k_g and Σ is strictly increasing, we would have Σ(n + k_g) < Σ(2n), contradicting the fact that Σ(n + k_g) ≥ Σ(2n).

Since by definition S(n) is the maximum number of moves that can be made by a Turing machine of the above type with n states before it halts, S(n) ≥ Σ(n). Then the same reasoning as above shows that S is not a computable function.

The zoo of computable and non-computable functions is illustrated in Figure 9.3.

[Figure 9.3 here: a diagram classifying functions into primitive recursive (e.g. add, mult, exp, supexp), total computable (e.g. deciders, rational expressions, functions terminating for all inputs), and partial computable (e.g. ϕ_univ(x, y), partial deciders, functions built from primitive recursion and minimization (while loops)), with the busy beaver functions and the 3x + 1 problem at the frontier of what can be computed.]

Figure 9.3: Computability Classification of Functions.

80 CHAPTER 2. DFA S, NFA S, REGULAR LANGUAGES. 2.6 Finite State Automata With Output: Transducers

80 CHAPTER 2. DFA S, NFA S, REGULAR LANGUAGES. 2.6 Finite State Automata With Output: Transducers 80 CHAPTER 2. DFA S, NFA S, REGULAR LANGUAGES 2.6 Finite Stte Automt With Output: Trnsducers So fr, we hve only considered utomt tht recognize lnguges, i.e., utomt tht do not produce ny output on ny input

More information

Formal Languages and Automata

Formal Languages and Automata Moile Computing nd Softwre Engineering p. 1/5 Forml Lnguges nd Automt Chpter 2 Finite Automt Chun-Ming Liu cmliu@csie.ntut.edu.tw Deprtment of Computer Science nd Informtion Engineering Ntionl Tipei University

More information

AUTOMATA AND LANGUAGES. Definition 1.5: Finite Automaton

AUTOMATA AND LANGUAGES. Definition 1.5: Finite Automaton 25. Finite Automt AUTOMATA AND LANGUAGES A system of computtion tht only hs finite numer of possile sttes cn e modeled using finite utomton A finite utomton is often illustrted s stte digrm d d d. d q

More information

Introduction to the Theory of Computation Some Notes for CIS262

Introduction to the Theory of Computation Some Notes for CIS262 Introduction to the Theory of Computtion Some Notes for CIS262 Jen Gllier Deprtment of Computer nd Informtion Science University of Pennsylvni Phildelphi, PA 19104, USA e-mil: jen@cis.upenn.edu c Jen Gllier

More information

Finite-State Automata: Recap

Finite-State Automata: Recap Finite-Stte Automt: Recp Deepk D Souz Deprtment of Computer Science nd Automtion Indin Institute of Science, Bnglore. 09 August 2016 Outline 1 Introduction 2 Forml Definitions nd Nottion 3 Closure under

More information

Nondeterminism and Nodeterministic Automata

Nondeterminism and Nodeterministic Automata Nondeterminism nd Nodeterministic Automt 61 Nondeterminism nd Nondeterministic Automt The computtionl mchine models tht we lerned in the clss re deterministic in the sense tht the next move is uniquely

More information

Designing finite automata II

Designing finite automata II Designing finite utomt II Prolem: Design DFA A such tht L(A) consists of ll strings of nd which re of length 3n, for n = 0, 1, 2, (1) Determine wht to rememer out the input string Assign stte to ech of

More information

Finite Automata-cont d

Finite Automata-cont d Automt Theory nd Forml Lnguges Professor Leslie Lnder Lecture # 6 Finite Automt-cont d The Pumping Lemm WEB SITE: http://ingwe.inghmton.edu/ ~lnder/cs573.html Septemer 18, 2000 Exmple 1 Consider L = {ww

More information

Assignment 1 Automata, Languages, and Computability. 1 Finite State Automata and Regular Languages

Assignment 1 Automata, Languages, and Computability. 1 Finite State Automata and Regular Languages Deprtment of Computer Science, Austrlin Ntionl University COMP2600 Forml Methods for Softwre Engineering Semester 2, 206 Assignment Automt, Lnguges, nd Computility Smple Solutions Finite Stte Automt nd

More information

Convert the NFA into DFA

Convert the NFA into DFA Convert the NF into F For ech NF we cn find F ccepting the sme lnguge. The numer of sttes of the F could e exponentil in the numer of sttes of the NF, ut in prctice this worst cse occurs rrely. lgorithm:

More information

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true.

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true. York University CSE 2 Unit 3. DFA Clsses Converting etween DFA, NFA, Regulr Expressions, nd Extended Regulr Expressions Instructor: Jeff Edmonds Don t chet y looking t these nswers premturely.. For ech

More information

Regular Expressions (RE) Regular Expressions (RE) Regular Expressions (RE) Regular Expressions (RE) Kleene-*

Regular Expressions (RE) Regular Expressions (RE) Regular Expressions (RE) Regular Expressions (RE) Kleene-* Regulr Expressions (RE) Regulr Expressions (RE) Empty set F A RE denotes the empty set Opertion Nottion Lnguge UNIX Empty string A RE denotes the set {} Alterntion R +r L(r ) L(r ) r r Symol Alterntion

More information

Speech Recognition Lecture 2: Finite Automata and Finite-State Transducers. Mehryar Mohri Courant Institute and Google Research

Speech Recognition Lecture 2: Finite Automata and Finite-State Transducers. Mehryar Mohri Courant Institute and Google Research Speech Recognition Lecture 2: Finite Automt nd Finite-Stte Trnsducers Mehryr Mohri Cournt Institute nd Google Reserch mohri@cims.nyu.com Preliminries Finite lphet Σ, empty string. Set of ll strings over

More information

First Midterm Examination

First Midterm Examination Çnky University Deprtment of Computer Engineering 203-204 Fll Semester First Midterm Exmintion ) Design DFA for ll strings over the lphet Σ = {,, c} in which there is no, no nd no cc. 2) Wht lnguge does

More information

1 Nondeterministic Finite Automata

1 Nondeterministic Finite Automata 1 Nondeterministic Finite Automt Suppose in life, whenever you hd choice, you could try oth possiilities nd live your life. At the end, you would go ck nd choose the one tht worked out the est. Then you

More information

Minimal DFA. minimal DFA for L starting from any other

Minimal DFA. minimal DFA for L starting from any other Miniml DFA Among the mny DFAs ccepting the sme regulr lnguge L, there is exctly one (up to renming of sttes) which hs the smllest possile numer of sttes. Moreover, it is possile to otin tht miniml DFA

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 CMSC 330 1 Types of Finite Automt Deterministic Finite Automt (DFA) Exctly one sequence of steps for ech string All exmples so fr Nondeterministic

More information

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz University of Southern Cliforni Computer Science Deprtment Compiler Design Fll Lexicl Anlysis Smple Exercises nd Solutions Prof. Pedro C. Diniz USC / Informtion Sciences Institute 4676 Admirlty Wy, Suite

More information

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University CS415 Compilers Lexicl Anlysis nd These slides re sed on slides copyrighted y Keith Cooper, Ken Kennedy & Lind Torczon t Rice University First Progrmming Project Instruction Scheduling Project hs een posted

More information

Formal languages, automata, and theory of computation

Formal languages, automata, and theory of computation Mälrdlen University TEN1 DVA337 2015 School of Innovtion, Design nd Engineering Forml lnguges, utomt, nd theory of computtion Thursdy, Novemer 5, 14:10-18:30 Techer: Dniel Hedin, phone 021-107052 The exm

More information

CMPSCI 250: Introduction to Computation. Lecture #31: What DFA s Can and Can t Do David Mix Barrington 9 April 2014

CMPSCI 250: Introduction to Computation. Lecture #31: What DFA s Can and Can t Do David Mix Barrington 9 April 2014 CMPSCI 250: Introduction to Computtion Lecture #31: Wht DFA s Cn nd Cn t Do Dvid Mix Brrington 9 April 2014 Wht DFA s Cn nd Cn t Do Deterministic Finite Automt Forml Definition of DFA s Exmples of DFA

More information

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Finite Automt Theory nd Forml Lnguges TMV027/DIT321 LP4 2018 Lecture 10 An Bove April 23rd 2018 Recp: Regulr Lnguges We cn convert between FA nd RE; Hence both FA nd RE ccept/generte regulr lnguges; More

More information

Coalgebra, Lecture 15: Equations for Deterministic Automata

Coalgebra, Lecture 15: Equations for Deterministic Automata Colger, Lecture 15: Equtions for Deterministic Automt Julin Slmnc (nd Jurrin Rot) Decemer 19, 2016 In this lecture, we will study the concept of equtions for deterministic utomt. The notes re self contined

More information

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. Comparing DFAs and NFAs (cont.) Finite Automata 2

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. Comparing DFAs and NFAs (cont.) Finite Automata 2 CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 Types of Finite Automt Deterministic Finite Automt () Exctly one sequence of steps for ech string All exmples so fr Nondeterministic Finite Automt

More information

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb.

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb. CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 Types of Finite Automt Deterministic Finite Automt () Exctly one sequence of steps for ech string All exmples so fr Nondeterministic Finite Automt

More information

Finite Automata. Informatics 2A: Lecture 3. John Longley. 22 September School of Informatics University of Edinburgh

Finite Automata. Informatics 2A: Lecture 3. John Longley. 22 September School of Informatics University of Edinburgh Lnguges nd Automt Finite Automt Informtics 2A: Lecture 3 John Longley School of Informtics University of Edinburgh jrl@inf.ed.c.uk 22 September 2017 1 / 30 Lnguges nd Automt 1 Lnguges nd Automt Wht is

More information

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2016

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2016 CS125 Lecture 12 Fll 2016 12.1 Nondeterminism The ide of nondeterministic computtions is to llow our lgorithms to mke guesses, nd only require tht they ccept when the guesses re correct. For exmple, simple

More information

Closure Properties of Regular Languages

Closure Properties of Regular Languages Closure Properties of Regulr Lnguges Regulr lnguges re closed under mny set opertions. Let L 1 nd L 2 e regulr lnguges. (1) L 1 L 2 (the union) is regulr. (2) L 1 L 2 (the conctention) is regulr. (3) L

More information

Speech Recognition Lecture 2: Finite Automata and Finite-State Transducers

Speech Recognition Lecture 2: Finite Automata and Finite-State Transducers Speech Recognition Lecture 2: Finite Automt nd Finite-Stte Trnsducers Eugene Weinstein Google, NYU Cournt Institute eugenew@cs.nyu.edu Slide Credit: Mehryr Mohri Preliminries Finite lphet, empty string.

More information

First Midterm Examination

First Midterm Examination 24-25 Fll Semester First Midterm Exmintion ) Give the stte digrm of DFA tht recognizes the lnguge A over lphet Σ = {, } where A = {w w contins or } 2) The following DFA recognizes the lnguge B over lphet

More information

NFAs continued, Closure Properties of Regular Languages

NFAs continued, Closure Properties of Regular Languages Algorithms & Models of Computtion CS/ECE 374, Fll 2017 NFAs continued, Closure Properties of Regulr Lnguges Lecture 5 Tuesdy, Septemer 12, 2017 Sriel Hr-Peled (UIUC) CS374 1 Fll 2017 1 / 31 Regulr Lnguges,

More information

The University of Nottingham SCHOOL OF COMPUTER SCIENCE A LEVEL 2 MODULE, SPRING SEMESTER LANGUAGES AND COMPUTATION ANSWERS

The University of Nottingham SCHOOL OF COMPUTER SCIENCE A LEVEL 2 MODULE, SPRING SEMESTER LANGUAGES AND COMPUTATION ANSWERS The University of Nottinghm SCHOOL OF COMPUTER SCIENCE LEVEL 2 MODULE, SPRING SEMESTER 2016 2017 LNGUGES ND COMPUTTION NSWERS Time llowed TWO hours Cndidtes my complete the front cover of their nswer ook

More information

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4 Intermedite Mth Circles Wednesdy, Novemer 14, 2018 Finite Automt II Nickols Rollick nrollick@uwterloo.c Regulr Lnguges Lst time, we were introduced to the ide of DFA (deterministic finite utomton), one

More information

Chapter Five: Nondeterministic Finite Automata. Formal Language, chapter 5, slide 1

Chapter Five: Nondeterministic Finite Automata. Formal Language, chapter 5, slide 1 Chpter Five: Nondeterministic Finite Automt Forml Lnguge, chpter 5, slide 1 1 A DFA hs exctly one trnsition from every stte on every symol in the lphet. By relxing this requirement we get relted ut more

More information

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true.

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true. York University CSE 2 Unit 3. DFA Clsses Converting etween DFA, NFA, Regulr Expressions, nd Extended Regulr Expressions Instructor: Jeff Edmonds Don t chet y looking t these nswers premturely.. For ech

More information

Regular expressions, Finite Automata, transition graphs are all the same!!

Regular expressions, Finite Automata, transition graphs are all the same!! CSI 3104 /Winter 2011: Introduction to Forml Lnguges Chpter 7: Kleene s Theorem Chpter 7: Kleene s Theorem Regulr expressions, Finite Automt, trnsition grphs re ll the sme!! Dr. Neji Zgui CSI3104-W11 1

More information

Lecture 08: Feb. 08, 2019

Lecture 08: Feb. 08, 2019 4CS4-6:Theory of Computtion(Closure on Reg. Lngs., regex to NDFA, DFA to regex) Prof. K.R. Chowdhry Lecture 08: Fe. 08, 2019 : Professor of CS Disclimer: These notes hve not een sujected to the usul scrutiny

More information

CSCI 340: Computational Models. Kleene s Theorem. Department of Computer Science

CSCI 340: Computational Models. Kleene s Theorem. Department of Computer Science CSCI 340: Computtionl Models Kleene s Theorem Chpter 7 Deprtment of Computer Science Unifiction In 1954, Kleene presented (nd proved) theorem which (in our version) sttes tht if lnguge cn e defined y ny

More information

Revision Sheet. (a) Give a regular expression for each of the following languages:

Revision Sheet. (a) Give a regular expression for each of the following languages: Theoreticl Computer Science (Bridging Course) Dr. G. D. Tipldi F. Bonirdi Winter Semester 2014/2015 Revision Sheet University of Freiurg Deprtment of Computer Science Question 1 (Finite Automt, 8 + 6 points)

More information

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2014

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2014 CS125 Lecture 12 Fll 2014 12.1 Nondeterminism The ide of nondeterministic computtions is to llow our lgorithms to mke guesses, nd only require tht they ccept when the guesses re correct. For exmple, simple

More information

3 Regular expressions

3 Regular expressions 3 Regulr expressions Given n lphet Σ lnguge is set of words L Σ. So fr we were le to descrie lnguges either y using set theory (i.e. enumertion or comprehension) or y n utomton. In this section we shll

More information

Grammar. Languages. Content 5/10/16. Automata and Languages. Regular Languages. Regular Languages

Grammar. Languages. Content 5/10/16. Automata and Languages. Regular Languages. Regular Languages 5//6 Grmmr Automt nd Lnguges Regulr Grmmr Context-free Grmmr Context-sensitive Grmmr Prof. Mohmed Hmd Softwre Engineering L. The University of Aizu Jpn Regulr Lnguges Context Free Lnguges Context Sensitive

More information

Model Reduction of Finite State Machines by Contraction

Model Reduction of Finite State Machines by Contraction Model Reduction of Finite Stte Mchines y Contrction Alessndro Giu Dip. di Ingegneri Elettric ed Elettronic, Università di Cgliri, Pizz d Armi, 09123 Cgliri, Itly Phone: +39-070-675-5892 Fx: +39-070-675-5900

More information

Harvard University Computer Science 121 Midterm October 23, 2012

Harvard University Computer Science 121 Midterm October 23, 2012 Hrvrd University Computer Science 121 Midterm Octoer 23, 2012 This is closed-ook exmintion. You my use ny result from lecture, Sipser, prolem sets, or section, s long s you quote it clerly. The lphet is

More information

Theory of Computation Regular Languages. (NTU EE) Regular Languages Fall / 38

Theory of Computation Regular Languages. (NTU EE) Regular Languages Fall / 38 Theory of Computtion Regulr Lnguges (NTU EE) Regulr Lnguges Fll 2017 1 / 38 Schemtic of Finite Automt control 0 0 1 0 1 1 1 0 Figure: Schemtic of Finite Automt A finite utomton hs finite set of control

More information

CHAPTER 1 Regular Languages. Contents. definitions, examples, designing, regular operations. Non-deterministic Finite Automata (NFA)

CHAPTER 1 Regular Languages. Contents. definitions, examples, designing, regular operations. Non-deterministic Finite Automata (NFA) Finite Automt (FA or DFA) CHAPTER Regulr Lnguges Contents definitions, exmples, designing, regulr opertions Non-deterministic Finite Automt (NFA) definitions, equivlence of NFAs DFAs, closure under regulr

More information

Homework 3 Solutions

Homework 3 Solutions CS 341: Foundtions of Computer Science II Prof. Mrvin Nkym Homework 3 Solutions 1. Give NFAs with the specified numer of sttes recognizing ech of the following lnguges. In ll cses, the lphet is Σ = {,1}.

More information

CS 311 Homework 3 due 16:30, Thursday, 14 th October 2010

CS 311 Homework 3 due 16:30, Thursday, 14 th October 2010 CS 311 Homework 3 due 16:30, Thursdy, 14 th Octoer 2010 Homework must e sumitted on pper, in clss. Question 1. [15 pts.; 5 pts. ech] Drw stte digrms for NFAs recognizing the following lnguges:. L = {w

More information

CS 301. Lecture 04 Regular Expressions. Stephen Checkoway. January 29, 2018

CS 301. Lecture 04 Regular Expressions. Stephen Checkoway. January 29, 2018 CS 301 Lecture 04 Regulr Expressions Stephen Checkowy Jnury 29, 2018 1 / 35 Review from lst time NFA N = (Q, Σ, δ, q 0, F ) where δ Q Σ P (Q) mps stte nd n lphet symol (or ) to set of sttes We run n NFA

More information

NFAs continued, Closure Properties of Regular Languages

Algorithms & Models of Computation, CS/ECE 374, Spring 2019. Lecture 5, Tuesday, January 29, 2019. Regular languages, DFAs, NFAs: languages accepted by DFAs, NFAs, and regular expressions
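One standard closure property is closure under intersection, by the product construction: run two DFAs in lockstep on paired states. The two component DFAs below ("even number of 0s" and "ends in 1") are my own illustrative choices.

```python
# Product construction: the product DFA accepts iff both components accept.
from itertools import product

def product_dfa(d1, d2, states1, states2, alphabet):
    delta = {}
    for (p, q), a in product(product(states1, states2), alphabet):
        delta[((p, q), a)] = (d1[(p, a)], d2[(q, a)])
    return delta

def run(delta, state, accepting, word):
    for a in word:
        state = delta[(state, a)]
    return state in accepting

d1 = {('e','0'):'o', ('e','1'):'e', ('o','0'):'e', ('o','1'):'o'}  # even 0s
d2 = {('n','0'):'n', ('n','1'):'y', ('y','0'):'n', ('y','1'):'y'}  # ends in 1
dp = product_dfa(d1, d2, {'e','o'}, {'n','y'}, {'0','1'})
acc = {('e','y')}  # accept iff BOTH accept: even 0s AND ends in 1

assert run(dp, ('e','n'), acc, '001')    # two 0s, ends in 1
assert not run(dp, ('e','n'), acc, '01') # one 0
```

Taking `acc` to be every pair where *either* component accepts gives closure under union with the same construction.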

Java II Finite Automata I

Bernd Kiefer, Bernd.Kiefer@dfki.de, Deutsches Forschungszentrum für Künstliche Intelligenz. Finite Automata I, p. 1/13. Processing Regular Expressions: we already learned about Java's regular expression
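The excerpt refers to Java's regular-expression support; the same ideas are available in Python's `re` module, shown here as an analogue (the patterns are my own examples).

```python
# Regex processing analogous to Java's java.util.regex, using Python's re.
import re

pattern = re.compile(r'(ab)*')           # zero or more repetitions of "ab"
assert pattern.fullmatch('abab') is not None
assert pattern.fullmatch('aba') is None

# Scanning a string for all matches, as a lexer front end would:
assert re.findall(r'\d+', 'q0 -> q12 on a') == ['0', '12']
```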

GNFA

Definition: a generalized nondeterministic finite automaton (GNFA) is a graph whose edges are labeled by regular expressions, with a unique start state with in-degree 0, and a unique final state with

Theory of Computation Regular Languages

Bow-Yaw Wang, Academia Sinica, Spring 2012, slide 1/38. [Figure: Schematic of Finite Automata.]

Theory of Automata CS402

Theory of Automata, table of contents: Lecture No. 1 — Summary; What does automata mean?; Introduction to languages; Alphabets; Strings; Defining Languages. Lecture No. 2 —

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata

Initial text by Maggie Johnson. Introduction: several children's games fit the following description: pieces are set up on a playing board; dice are thrown or

Chapter 2 Finite Automata

2.1 Introduction. Finite automata: a first model of the notion of effective procedure. (They also have many other applications.) The concept of a finite automaton can be derived by examining what

Finite Automata. Informatics 2A: Lecture 3. Mary Cryan. 21 September 2018. School of Informatics, University of Edinburgh

Mary Cryan, School of Informatics, University of Edinburgh (mcryan@inf.ed.ac.uk), 21 September 2018. Languages and automata: What is a language? Finite automata: recap. Some formal definitions

Table of contents: Lecture No. 1 — Summary; What does automata mean?; Introduction to languages; Alphabets; Strings

Table of contents: Lecture No. 1 — Summary; What does automata mean?; Introduction to languages; Alphabets; Strings; Defining Languages. Lecture No. 2 — Summary; Kleene Star Closure; Recursive

CS375: Logic and Theory of Computing

Fuhua (Frank) Cheng, Department of Computer Science, University of Kentucky. Table of Contents: Week 1: Preliminaries (set algebra, relations, functions) (read Chapters 1-4). Weeks

CS 330 Formal Methods and Models

Dan Richards, George Mason University, Spring 2017. Quiz Solutions. Quiz 1, Propositional Logic, date: February 2. 1. Prove (((p → q) ∧ ¬q) → ¬p) is a tautology (a) (3pts) by truth table.

Let's start with an example:

Here you see labeled circles that are states, and labeled arrows that are transitions. One of the states is marked "start". One of the states has a double circle; this is a terminal state

Scanner. Specifying patterns. Specifying patterns. Operations on languages. A scanner must recognize the units of syntax Some parts are easy:

Scanner: specifying patterns. [Diagram: source code → scanner (maps characters into tokens, reports errors) → tokens (the basic unit of syntax) → parser → IR.] A scanner must recognize the units of syntax. Some parts are easy: x = x + y; becomes
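A toy scanner for the excerpt's example "x = x + y;" can be sketched with Python's `re` module; the token names and the minimal token set here are my own choices, not the course's.

```python
# Toy scanner: map characters into (TOKEN, lexeme) pairs.
import re

TOKEN_SPEC = [
    ('ID',     r'[A-Za-z_]\w*'),  # identifiers
    ('ASSIGN', r'='),
    ('PLUS',   r'\+'),
    ('SEMI',   r';'),
    ('SKIP',   r'\s+'),           # whitespace, discarded
]
MASTER = re.compile('|'.join(f'(?P<{n}>{p})' for n, p in TOKEN_SPEC))

def tokenize(src):
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(src)
            if m.lastgroup != 'SKIP']

assert tokenize('x = x + y;') == [
    ('ID', 'x'), ('ASSIGN', '='), ('ID', 'x'),
    ('PLUS', '+'), ('ID', 'y'), ('SEMI', ';'),
]
```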

Name Ima Sample ASU ID

Name: Ima Sample. ASU ID: 2468024680. CSE 355 Test 1, Fall 2016. 30 September 2016, 8:35-9:25 a.m., LSA 191. Regrading of midterms: if you believe that your grade has not been added up correctly, return the entire paper to

CISC 4090 Theory of Computation

9/6/28: Stereotypical computer. CISC 4090 Theory of Computation: Finite state machines & Regular languages. Professor Daniel Leeds, dleeds@fordham.edu, JMH 332. The central processing unit (CPU) performs all the instructions

CM10196 Topic 4: Functions and Relations

Guy McCusker. W. Functions and relations. Perhaps the most widely used notion in all of mathematics is that of a function. Informally, a function is an operation which takes an input

Automata Theory 101. Introduction. Outline. Introduction Finite Automata Regular Expressions ω-automata. Ralf Huuck.

Ralf Huuck. Outline: Introduction; Finite Automata; Regular Expressions; ω-Automata. Session 1, 2006. Acknowledgement: some slides are based on Wolfgang Thomas' excellent

CS 275 Automata and Formal Language Theory

Course Notes, Part II: The Recognition Problem (II). Chapter II.6: Push Down Automata. Remark: this material is no longer taught and not directly exam relevant. Anton Setzer (based

Regular Expressions, Pumping Lemma, Right Linear Grammars. Ling 106, March 25, 2002

1 Regular Expressions. A regular expression describes or generates a language: it is a kind of shorthand for listing the members of a language. 5. (ΣΣ)* = {w | w is a string of even length}. 6. 11 ∪ 00 = {11, 00}. 7. (11 ∪ 00)Σ* = {w | w begins with either 11 or 00}. 8. (0 ∪ ε)1* = 01* ∪ 1*. 9.
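An identity such as item 8, (0 ∪ ε)1* = 01* ∪ 1*, can be spot-checked by brute force over all short binary strings using Python's `re` module (this is a finite check, not a proof).

```python
# Spot-check the identity (0 ∪ ε)1* = 01* ∪ 1* on all strings up to length 5.
import re
from itertools import product

lhs = re.compile(r'(0|)1*')   # (0 ∪ ε)1*
rhs = re.compile(r'01*|1*')   # 01* ∪ 1*

words = [''.join(w) for n in range(6) for w in product('01', repeat=n)]
assert all((lhs.fullmatch(w) is None) == (rhs.fullmatch(w) is None)
           for w in words)
```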

More general families of infinite graphs

Antoine Meyer, Formal Methods Update 2006, IIT Guwahati. Prefix-recognizable graphs. Theorem: let G be a graph; the following statements are equivalent: G is defined by relations of the

Talen en Automaten — Test 1, Mon 7th Dec 2015, 15h45-17h30

This test consists of four exercises over 5 pages. Explain your approach, and write your answer to each exercise on a separate page. You can score a maximum of 100

NFA DFA Example 3 CMSC 330: Organization of Programming Languages. Equivalence of DFAs and NFAs. Equivalence of DFAs and NFAs (cont.

CMSC 330: Organization of Programming Languages — Finite Automata, con't. NFA → DFA, Example 3: DFA states {B,D,E}, {A,E}, {C,D}, {E}; R = { {A,E}, {B,D,E}, {C,D}, {E} }. Equivalence of DFAs and NFAs: any string from {A} to either {D} or {C,D}
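The DFA states like {A,E} and {B,D,E} in the excerpt come from the subset construction: each DFA state is a *set* of NFA states. The original NFA is not given, so the example below (my own: strings over {a,b} ending in "ab") only illustrates the generic algorithm.

```python
# Subset construction: determinize an NFA by exploring reachable state sets.

def nfa_to_dfa(delta, start, accepting, alphabet):
    start_set = frozenset({start})
    dfa_delta, todo, seen = {}, [start_set], {start_set}
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = frozenset(q for s in S for q in delta.get((s, a), set()))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    dfa_accepting = {S for S in seen if S & accepting}
    return dfa_delta, start_set, dfa_accepting

# Example NFA (mine): accepts strings ending in "ab".
delta = {('A', 'a'): {'A', 'B'}, ('A', 'b'): {'A'}, ('B', 'b'): {'C'}}
dd, s0, acc = nfa_to_dfa(delta, 'A', {'C'}, 'ab')

def run(dd, s, acc, w):
    for a in w:
        s = dd[(s, a)]
    return s in acc

assert run(dd, s0, acc, 'aab')
assert not run(dd, s0, acc, 'aba')
```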

Lecture 9: LTL and Büchi Automata

1. LTL Property Patterns. Quite often the requirements of a system follow some simple patterns. Sometimes we want to specify that a property should only hold in a certain context, called

More on automata. Michael George. March 24 - April 7, 2014

1 Automata constructions. Now that we have a formal model of a machine, it is useful to make some general constructions. 1.1 DFA Union / Product construction. Suppose

Tutorial Automata and formal Languages

Notes for the tutorial in the summer term 2017. Sebastian Küpper, Christine Mika, 8 August 2017. 1 Introduction: Notations and basic Definitions. At the beginning of the tutorial we

Lecture 09: Myhill-Nerode Theorem

CS 373: Theory of Computation, Madhusudan Parthasarathy. Lecture 09: Myhill-Nerode Theorem, 16 February 2010. In this lecture, we will see that every language has a unique minimal DFA. We will see this fact from two perspectives

CS 373, Spring 2009: Solutions to Mock Midterm 1 (based on the first midterm in CS 273, Fall 2008)

Problem 1: Short answer (8 points). The answers to these problems should be short and not complicated. (a) If an NFA M accepts

Anatomy of a Deterministic Finite Automaton. Deterministic Finite Automata. A machine so simple that you can understand it in less than one minute

Victor Adamchik, Danny Sleator. Great Theoretical Ideas In Computer Science, CS 15-251, Carnegie Mellon University. Deterministic Finite Automata. Finite Automata: a machine so simple that you can

PART 2. REGULAR LANGUAGES, GRAMMARS AND AUTOMATA

PART 2. REGULAR LANGUAGES, GRAMMARS AND AUTOMATA PART 2. REGULAR LANGUAGES, GRAMMARS AND AUTOMATA RIGHT LINEAR LANGUAGES. Right Liner Grmmr: Rules of the form: A α B, A α A,B V N, α V T + Left Liner Grmmr: Rules of the form: A Bα, A α A,B V N, α V T

More information
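Right-linear rules translate directly into an NFA: nonterminals become states, A → aB becomes an edge A --a--> B, and a terminating rule A → a becomes an edge into a fresh accepting state. The sketch below restricts α to a single terminal per rule for simplicity, and the grammar shown is my own example.

```python
# Right-linear grammar -> NFA, for rules A -> aB and A -> a.

def grammar_to_nfa(rules, start):
    """rules: list of (A, a, B) with B=None meaning the rule A -> a."""
    delta, FINAL = {}, '#final'
    for A, a, B in rules:
        delta.setdefault((A, a), set()).add(B if B is not None else FINAL)
    return delta, start, {FINAL}

# S -> aS | bA,  A -> b   (generates a* b b)
rules = [('S', 'a', 'S'), ('S', 'b', 'A'), ('A', 'b', None)]
delta, q0, acc = grammar_to_nfa(rules, 'S')

def run_nfa(delta, start, accepting, w):
    cur = {start}
    for a in w:
        cur = {q for s in cur for q in delta.get((s, a), set())}
    return bool(cur & accepting)

assert run_nfa(delta, q0, acc, 'aabb')
assert not run_nfa(delta, q0, acc, 'ab')
```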

Parse trees, ambiguity, and Chomsky normal form

In this lecture we will discuss a few important notions connected with context-free grammars, including parse trees, ambiguity, and a special form for context-free grammars

Converting Regular Expressions to Discrete Finite Automata: A Tutorial

David Christiansen, 2013-01-03. This is a tutorial on how to convert regular expressions to nondeterministic finite automata (NFA) and how to convert
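The regex-to-NFA direction is usually done with a Thompson-style construction: each operator builds an ε-NFA from the ε-NFAs of its operands. The combinator names below are my own; the tutorial's notation may differ.

```python
# Compact Thompson construction: each combinator returns (delta, start, accept).
import itertools

EPS = ''
_ids = itertools.count()

def lit(a):                       # single-symbol NFA
    s, f = next(_ids), next(_ids)
    return {(s, a): {f}}, s, f

def cat(m, n):                    # concatenation: glue with an ε-edge
    (d1, s1, f1), (d2, s2, f2) = m, n
    d = {**d1, **d2}
    d.setdefault((f1, EPS), set()).add(s2)
    return d, s1, f2

def alt(m, n):                    # union: new start/final around both
    (d1, s1, f1), (d2, s2, f2) = m, n
    s, f = next(_ids), next(_ids)
    d = {**d1, **d2, (s, EPS): {s1, s2}}
    d.setdefault((f1, EPS), set()).add(f)
    d.setdefault((f2, EPS), set()).add(f)
    return d, s, f

def star(m):                      # Kleene star: skip edge + loop-back edge
    d1, s1, f1 = m
    s, f = next(_ids), next(_ids)
    d = dict(d1)
    d[(s, EPS)] = {s1, f}
    d.setdefault((f1, EPS), set()).update({s1, f})
    return d, s, f

def accepts(machine, w):
    d, s, f = machine
    def close(S):
        S, todo = set(S), list(S)
        while todo:
            q = todo.pop()
            for r in d.get((q, EPS), set()):
                if r not in S:
                    S.add(r)
                    todo.append(r)
        return S
    cur = close({s})
    for a in w:
        cur = close({r for q in cur for r in d.get((q, a), set())})
    return f in cur

# (a|b)* abb
m = cat(cat(cat(star(alt(lit('a'), lit('b'))), lit('a')), lit('b')), lit('b'))
assert accepts(m, 'abb')
assert accepts(m, 'aababb')
assert not accepts(m, 'abab')
```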

Formal Language and Automata Theory (CS21004)

Formal Language and Automata Theory (CS21004), Kharagpur. Table of Contents: sections 1-3.

Deterministic Finite Automata

Generators vs. Checkers: regular expressions are one way to specify a formal language. A string generator generates strings in the language. Deterministic Finite Automata (DFA) are another

State Minimization for DFAs

Read K & S 2.7. Do Homework 10. Consider: is this a minimal machine? Step (1): get rid of unreachable states. State b is unreachable. Step (2): get rid
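The two steps in the excerpt can be sketched as code: (1) drop unreachable states, (2) merge indistinguishable states by iterated partition refinement until the partition stabilizes. The example DFA is my own; it recognizes "ends in 1" with one redundant accepting state.

```python
# DFA minimization sketch: reachability pruning + partition refinement.

def minimize(alphabet, delta, start, accepting):
    # Step (1): keep only states reachable from the start state.
    reach, todo = {start}, [start]
    while todo:
        q = todo.pop()
        for a in alphabet:
            r = delta[(q, a)]
            if r not in reach:
                reach.add(r)
                todo.append(r)
    # Step (2): refine the accepting / non-accepting split until stable.
    part = {q: int(q in accepting) for q in reach}
    while True:
        sig = {q: (part[q], tuple(part[delta[(q, a)]] for a in sorted(alphabet)))
               for q in reach}
        renum = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {q: renum[sig[q]] for q in reach}
        if new == part:
            return len(set(part.values()))  # number of minimal-DFA states
        part = new

# "Ends in 1" with a duplicated accepting state r (indistinguishable from q).
delta = {('p','0'):'p', ('p','1'):'q', ('q','0'):'p', ('q','1'):'r',
         ('r','0'):'p', ('r','1'):'r'}
assert minimize({'0','1'}, delta, 'p', {'q','r'}) == 2
```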

Worked out examples Finite Automata

Example: design a Finite State Automaton which reads a binary string and accepts only those that end with [...]. Since we are in the topic of Non-Deterministic Finite Automata (NFA), we will

p-adic Egyptian Fractions

Contents: 1 Introduction. 2 Traditional Egyptian Fractions and Greedy Algorithm. 3 Set-up. 4 p-Greedy Algorithm. 5 p-Egyptian Traditional. 6 Conclusion. 1 Introduction: an Egyptian fraction
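The greedy algorithm named in the table of contents repeatedly subtracts the largest unit fraction not exceeding the remainder: from p/q, take 1/⌈q/p⌉ and recurse. A minimal sketch:

```python
# Greedy (Fibonacci-Sylvester) Egyptian fraction expansion of p/q.
from fractions import Fraction
from math import ceil

def egyptian(p, q):
    x, denoms = Fraction(p, q), []
    while x > 0:
        d = ceil(1 / x)          # largest unit fraction 1/d with 1/d <= x
        denoms.append(d)
        x -= Fraction(1, d)
    return denoms

assert egyptian(5, 6) == [2, 3]  # 5/6 = 1/2 + 1/3
assert sum(Fraction(1, d) for d in egyptian(4, 13)) == Fraction(4, 13)
```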

CS 310 (sec 20) - Winter 2003 - Final Exam SOLUTIONS

1. (Logic) Use truth tables to prove the following logical equivalences, where ↑ is NAND: (a) p ∨ q ≡ (p ↑ p) ↑ (q ↑ q); (b) p ∧ q ≡ (p ↑ q) ↑ (p ↑ q); (c)

SWEN 224 Formal Foundations of Programming WITH ANSWERS

Te Whare Wānanga o te Ūpoko o te Ika a Māui — Victoria University of Wellington. EXAMINATIONS 2011, END-OF-YEAR. SWEN 224 Formal Foundations of Programming. Time Allowed: 3 Hours


CSE 260-002: Exam 3 - ANSWERS, Spring 2011. Time: 50 minutes

This exam has 4 pages and 10 problems totaling 100 points. This exam is closed book and closed notes. 1. Warshall's algorithm for transitive closure computation is
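Warshall's algorithm, as referenced in problem 1, computes the transitive closure of a relation: reach[i][j] becomes true iff there is a path from i to j, by allowing each vertex k in turn as an intermediate.

```python
# Warshall's algorithm on a boolean adjacency matrix.

def warshall(adj):
    n = len(adj)
    reach = [row[:] for row in adj]          # copy; don't mutate the input
    for k in range(n):                       # allow k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return reach

# Edges 0->1 and 1->2: the closure adds the path 0->2.
adj = [[False, True,  False],
       [False, False, True],
       [False, False, False]]
assert warshall(adj)[0][2] is True
```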

FABER Formal Languages, Automata and Models of Computation

DVA337 FABER: Formal Languages, Automata and Models of Computation, Lecture 5. School of Innovation, Design and Engineering, Mälardalen University, 2015. Recap of lecture 4: by definition every DFA is an NFA, and by the subset construction every NFA has an equivalent DFA; state

Non-deterministic Finite Automata

Eliminating non-determinism. Radboud University Nijmegen. H. Geuvers and T. van Laarhoven, Institute for Computing and Information Sciences, Intelligent


CDM Automata on Infinite Words

1 Infinite Words. Klaus Sutner, Carnegie Mellon University, 60-omega, 2017/12/15. Deterministic Languages; Muller and Rabin Automata; Towards Infinity. 3 Infinite Words. 4 As a matter of principle,

Myhill-Nerode Theorem

Overview: Myhill-Nerode Theorem. Correspondence between DFAs and MN relations. Canonical DFA for L. Computing the canonical DFA. Deepak D'Souza, Department of Computer Science and Automation, Indian Institute

Farey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University

U.U.D.M. Project Report 2017:24. Farey Fractions, Rickard Fernström. Degree project in mathematics (examensarbete i matematik), 15 hp. Supervisor (handledare): Andreas Strömbergsson. Examiner (examinator): Jörgen Östensson. June 2017. Department of Mathematics, Uppsala University. Farey Fractions
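The Farey sequence F_n collects all reduced fractions in [0, 1] with denominator at most n, in increasing order. A direct (non-optimized) sketch, relying on `Fraction` to reduce and deduplicate:

```python
# Farey sequence F_n via Fraction normalization and sorting.
from fractions import Fraction

def farey(n):
    fracs = {Fraction(a, b) for b in range(1, n + 1) for a in range(b + 1)}
    return sorted(fracs)

assert farey(3) == [Fraction(0), Fraction(1, 3), Fraction(1, 2),
                    Fraction(2, 3), Fraction(1)]
```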

In-depth introduction to main models, concepts of theory of computation:

CMPSCI 601: Introduction, Lecture 1. In-depth introduction to main models and concepts of the theory of computation. Computability: what can be computed in principle. Logic: how can we express our requirements. Complexity:

Some Theory of Computation Exercises Week 1

Section 1: Deterministic Finite Automata. Question 1.3: [state diagram: states q1-q5 with transitions on d and u]. Question 1.4, Part c — {w | w has an even number of a's and one or two b's}: first we ask whether

Fundamentals of Computer Science

Chapter 3: NFA and DFA equivalence; Regular expressions. Henrik Björklund, Umeå University, January 23, 2014. NFA and DFA equivalence: as we shall see, it turns out that NFAs and DFAs are equivalent,

Formal Languages and Automata Theory. D. Goswami and K. V. Krishna

D. Goswami and K. V. Krishna, November 5, 2010. Contents: 1 Mathematical Preliminaries. 2 Formal Languages. 2.1 Strings. 2.2 Languages.

Non-deterministic Finite Automata

From Regular Expressions to NFA-ε. Eliminating non-determinism. Radboud University Nijmegen. H. Geuvers and J. Rot, Institute for Computing and Information